Network Working Group                                        B. Sarikaya
Internet-Draft                                                 L. Dunbar
Intended status: Standards Track                              Huawei USA
Expires: August 3, 2015                                    B. Khasnabish
                                                           ZTE (TX) Inc.
                                                                  F. Xia
                                                              Huawei USA
                                                        January 30, 2015


         Virtual Machine Mobility Protocol for Overlay Networks
                draft-sarikaya-nvo3-vmm-dmm-pmip-05.txt

Abstract

   This document specifies a virtual machine mobility protocol for data
   centers built with an overlay-based network virtualization approach.
   After moving to the new Network Virtualization Edge, the virtual
   machine sends a gratuitous Address Resolution Protocol request in
   IPv4, or an unsolicited Neighbor Advertisement message in IPv6,
   which is broadcast or sent to all nodes.  These messages enable the
   Network Virtualization Edges to update the tables that map virtual
   machine MAC addresses to tunnel endpoints.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 3, 2015.

Sarikaya, et al.         Expires August 3, 2015                 [Page 1]
Internet-Draft            VM Mobility Solution              January 2015

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Terminology
   3.  Requirements
   4.  Overview of the protocol
   5.  IPv6 Operation
     5.1.  IPv6 Unsolicited Neighbor Advertisement
     5.2.  Encapsulation
   6.  IPv4 Operation
     6.1.  Gratuitous ARP
     6.2.  Encapsulation
   7.  Handling Packets in Flight
   8.  Moving Local State of VM
   9.  Handling of Hot, Warm and Cold Virtual Machine Mobility
   10. Virtual Machine Operation
     10.1.  Virtual Machine Lifecycle Management
   11. Security Considerations
   12. IANA Considerations
   13. Acknowledgements
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Authors' Addresses

1.  Introduction

   Data center networks are increasingly used by telecom operators as
   well as by enterprises.
   In this document we are interested in overlay-based data center
   networks supporting multitenancy.  These networks are organized as
   one large Layer 2 network geographically distributed in several
   buildings.  Virtualization, which is used in almost all of today's
   data centers, enables many virtual machines to run on a single
   physical computer or compute server.  Virtual machines (VMs) need a
   hypervisor running on the physical compute server to provide them
   with shared processor, memory, and storage.  Network connectivity is
   provided by the Network Virtualization Edge (NVE)
   [I-D.ietf-nvo3-arch], [I-D.ietf-nvo3-nve-nva-cp-req].  Being able to
   move VMs dynamically (live migration) from one server to another
   allows for dynamic load balancing or work distribution, and thus it
   is a highly desirable feature [RFC7364].

   There are many challenges and requirements related to migration,
   mobility, and interconnection of Virtual Machines (VMs) and Virtual
   Network Elements (VNEs).  Retaining IP addresses after a move is a
   key requirement [RFC7364].  This requirement must be met in order to
   maintain existing transport connections.

   In view of the many virtual machine mobility schemes that exist
   today, there is a desire to define a standard control plane protocol
   for virtual machine mobility.  The protocol should be based on IPv4
   or IPv6.  In this document we specify such a protocol.

2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119]
   and [I-D.ietf-nvo3-arch].

   This document uses the terminology defined in [RFC7364].  In
   addition we make the following definitions:

   Hot VM Mobility.  A given VM could be moved from one server to
      another in running state.

   Warm VM Mobility.  In case of warm VM mobility, the VM states are
      mirrored to the secondary server (or domain) at predefined
      (configurable) regular intervals.  This reduces the overhead and
      complexity, but it may also lead to a situation in which the two
      servers do not contain exactly the same data (state information).

   Cold VM Mobility.  A given VM could be moved from one server to
      another in stopped or suspended state.

3.  Requirements

   This section states requirements on data center network virtual
   machine mobility.

   The data center network MUST support virtual machine mobility in
   IPv6.

   IPv4 SHOULD also be supported in virtual machine mobility.

   The virtual machine mobility protocol SHOULD NOT support host
   routes.

   The virtual machine mobility protocol SHOULD NOT support triangular
   routing.

   The virtual machine mobility protocol SHOULD NOT need to use
   tunneling except for handling packets in flight.

4.  Overview of the protocol

   Being able to move virtual machines dynamically from one server to
   another allows for dynamic load balancing or work distribution, and
   thus it is a highly desirable feature.  In a Layer 2 based data
   center approach, a virtual machine moving to another server does not
   change its IP address.  Because of this, an IP-based virtual machine
   mobility protocol is not needed.  However, when a virtual machine
   moves, NVEs need to change their caches associating the VM's Layer 2
   (MAC) address with the NVE's IP address.  Such a change enables an
   NVE to send outgoing MAC frames addressed to the virtual machine.

   A virtual machine moves from its source NVE to a new, destination
   NVE.  The move is initiated by the source NVE and takes place within
   the same L2 link.  The virtual machine's IP address(es) do not
   change, but the virtual machine is now under a new NVE, while
   previously communicating NVEs will continue to send their packets to
   the source NVE.
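
   The cache change described above can be illustrated with a minimal
   Python sketch.  It is not part of the protocol specification: the
   class NveMappingTable, its method names, and the sample addresses
   are hypothetical, and the sketch assumes that a receiving NVE learns
   the new NVE address from the outer source IP address of the
   encapsulated announcement.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NveMappingTable:
    """Hypothetical per-NVE cache: VM MAC address -> IP address of the serving NVE."""

    entries: Dict[str, str] = field(default_factory=dict)

    def lookup(self, vm_mac: str) -> Optional[str]:
        """Return the NVE IP address currently cached for this VM MAC, if any."""
        return self.entries.get(vm_mac.lower())

    def on_mobility_announcement(self, vm_mac: str, outer_src_ip: str) -> bool:
        """Record the NVE that relayed the VM's gratuitous ARP or unsolicited NA.

        Returns True when the cached mapping changed, i.e. the VM has moved.
        Assumption: outer_src_ip is the outer source address of the tunnel packet.
        """
        key = vm_mac.lower()
        changed = self.entries.get(key) != outer_src_ip
        self.entries[key] = outer_src_ip
        return changed


table = NveMappingTable()
# VM first seen behind the source NVE (documentation addresses).
table.on_mobility_announcement("52:54:00:12:34:56", "192.0.2.10")
# After the move, the announcement arrives via the destination NVE.
moved = table.on_mobility_announcement("52:54:00:12:34:56", "192.0.2.20")
```

   After the second call, frames addressed to the VM's MAC address
   would be encapsulated towards 192.0.2.20 rather than 192.0.2.10.
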
   The Address Resolution Protocol (ARP) cache in IPv4, or the neighbor
   cache in IPv6, in the NVEs needs to be updated.

   It takes a few seconds for a VM to move from its source NVE to the
   new destination NVE.  During this period, a tunnel is needed so that
   the source NVE can forward packets to the destination NVE.

   In IPv4, immediately after the move the virtual machine sends a
   gratuitous ARP request message containing its IPv4 address and its
   Layer 2 (MAC) address at its new NVE, the destination NVE.  This
   message is sent to the broadcast address, i.e., as a broadcast MAC
   frame, to the destination NVE.  The NVE sends it as a MAC frame
   after encapsulation, such as VXLAN [RFC7348], which includes an IPv4
   and UDP header.  The outer MAC frame contains the destination NVE's
   MAC address in the Outer Source MAC Address field.  The frame, being
   a broadcast frame, needs to be carried to all nodes in the whole L2
   link.  One mechanism is to establish a single VLAN [RFC6820].  All
   NVEs in the VLAN receive this frame and, after decapsulation, the
   ARP request.  All NVEs communicating with this virtual machine MUST
   update their ARP cache with the destination NVE's IPv4 address as
   the new value corresponding to the virtual machine's MAC address.
   This update enables the communicating NVEs to send all new IP
   packets to the destination NVE under which the virtual machine is
   located after the move.

   IPv6 operation is slightly different:

   In IPv6, immediately after the move the virtual machine sends an
   unsolicited Neighbor Advertisement message containing its IPv6
   address and Layer 2 (MAC) address at its new NVE, the destination
   NVE.  This message is sent to the IPv6 Solicited-Node multicast
   address corresponding to the target address, which is the VM's IPv6
   address.  The NVE sends it as a MAC frame after encapsulation, which
   possibly includes an IPv6 and UDP header.
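
   As an illustration of the Solicited-Node multicast address mentioned
   above, the following Python sketch derives it from a VM address by
   appending the low-order 24 bits of the address to the prefix
   FF02:0:0:0:0:1:FF00::/104.  It also derives the Ethernet multicast
   MAC address, which is 33-33 followed by the low-order 32 bits of the
   IPv6 multicast address.  The function names and the sample VM
   address are assumptions made for this example only.

```python
import ipaddress


def solicited_node_multicast(target: str) -> ipaddress.IPv6Address:
    """Return the Solicited-Node multicast address for an IPv6 unicast address."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    prefix = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(prefix | low24)


def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Map an IPv6 multicast address to its Ethernet multicast MAC address."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return "-".join(f"{octet:02X}" for octet in octets)


# Hypothetical VM address taken from the IPv6 documentation prefix.
vm_address = "2001:db8::a1b2:c3d4"
group = solicited_node_multicast(vm_address)
print(group, multicast_mac(group))
```

   Only nodes whose addresses share the same low-order 24 bits listen
   on a given Solicited-Node group, which is why such frames need not
   reach every segment of the network.
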
   The outer MAC frame contains the destination NVE's MAC address in
   the Outer Source MAC Address field.  When querying for a target IP
   address, the neighbor discovery protocol maps the target address,
   i.e., the IPv6 address of the destination NVE, into an IPv6
   Solicited-Node multicast address, which has the form
   FF02:0:0:0:0:1:FFXX:XXXX and contains the low-order 24 bits of the
   target address.  This frame is sent as a multicast frame rather than
   as a broadcast frame as in ARP.  This has the benefit that the
   multicast frames do not necessarily need to be sent to all parts of
   the network, i.e., the frames can be sent only to segments where
   listeners for the Solicited-Node multicast address reside [RFC6820].
   The NVE sends the packet with link-local scope, as FF02 indicates.
   All member NVEs receive this packet after decapsulation.  All NVEs
   communicating with this virtual machine MUST update their neighbor
   cache with the destination NVE's IPv6 address as the new value
   corresponding to the virtual machine's MAC address.  This update
   enables the communicating NVEs to send all new IP packets to the
   destination NVE under which the virtual machine is located after the
   move.

   Note that gratuitous ARP or unsolicited Neighbor Advertisement
   messages are normally used to announce link-layer (MAC) address
   changes [RFC4861].  In the VM mobility case, these messages are
   instead used to announce IP address changes, i.e., the change of the
   NVE IP address associated with the virtual machine's MAC address.

5.  IPv6 Operation

5.1.  IPv6 Unsolicited Neighbor Advertisement

   The virtual machine, as an IPv6 node, sends an unsolicited Neighbor
   Advertisement after it moves to the destination NVE to inform
   neighboring nodes of the change in its attachment.  The Neighbor
   Advertisement contains information required by nodes to determine
   the type of Neighbor Advertisement message, the sender's role on the
   network, and typically the link-layer address of the sender.

   The IPv6 header and options of the Neighbor Advertisement message
   carry the following settings:

      For an unsolicited Neighbor Advertisement, the Source Address
      field is set to a unicast address of the virtual machine.  The
      Destination Address field is set to the link-local scope
      all-nodes multicast address (FF02::1).  The Hop Limit field is
      set to 255.  The Target Link-Layer Address option is set to the
      virtual machine's MAC address.

   Assuming that the local link is Ethernet, the virtual machine
   encapsulates the IPv6 datagram in a MAC frame.  The Ethernet header
   of the Neighbor Advertisement message carries the following
   settings:

      The Source Address field is set to the MAC address of the virtual
      machine.  For an unsolicited Neighbor Advertisement, the
      Destination Address field is set to 33-33-00-00-00-01, which is
      the Ethernet MAC address corresponding to the link-local scope
      all-nodes multicast address.

5.2.  Encapsulation

   Encapsulation depends on the encapsulation layer protocol used in
   the data center.  Details of VXLAN-type encapsulation are TBD.

6.  IPv4 Operation

6.1.  Gratuitous ARP

   A gratuitous ARP message could be broadcast as an ARP request
   containing the sender's protocol address (SPA) in the target
   protocol address field (TPA=SPA), with the target hardware address
   (THA) set to zero.  An alternative is to broadcast an ARP reply with
   the sender's hardware and protocol addresses (SHA and SPA)
   duplicated in the target hardware address and target protocol
   address fields (THA=SHA, TPA=SPA).

   A gratuitous ARP message is not sent to solicit a reply.  Instead,
   it updates any cached entries in the ARP tables of other NVEs that
   receive the packet.  The operation code may indicate a request or a
   reply because the ARP standard specifies that the opcode is only
   processed after the ARP table has been updated from the address
   fields [RFC0826].

   The gratuitous ARP request/reply message is sent in an Ethernet
   frame to the NVE by the VM.
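
   The two gratuitous ARP variants described above can be sketched in
   Python as follows.  This is an illustrative sketch rather than a
   normative encoding: the helper names and sample addresses are
   hypothetical, the frame is left unpadded, and no VXLAN encapsulation
   is applied.

```python
import struct

BROADCAST_MAC = bytes([0xFF] * 6)  # all-ones Ethernet broadcast address
ETHERTYPE_ARP = 0x0806
HTYPE_ETHERNET = 1
PTYPE_IPV4 = 0x0800
OP_REQUEST, OP_REPLY = 1, 2


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def ipv4_bytes(addr: str) -> bytes:
    """Convert dotted-quad notation into 4 raw bytes."""
    return bytes(int(part) for part in addr.split("."))


def gratuitous_arp(vm_mac: str, vm_ip: str, as_reply: bool = False) -> bytes:
    """Build a broadcast Ethernet frame carrying a gratuitous ARP message.

    Request form: TPA=SPA and THA set to zero.
    Reply form:   TPA=SPA and THA=SHA.
    """
    sha, spa = mac_bytes(vm_mac), ipv4_bytes(vm_ip)
    opcode = OP_REPLY if as_reply else OP_REQUEST
    tha = sha if as_reply else bytes(6)
    ethernet = BROADCAST_MAC + sha + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack("!HHBBH", HTYPE_ETHERNET, PTYPE_IPV4, 6, 4, opcode)
    return ethernet + arp + sha + spa + tha + spa


# Hypothetical VM addresses (documentation IPv4 range).
request = gratuitous_arp("52:54:00:12:34:56", "192.0.2.7")
reply = gratuitous_arp("52:54:00:12:34:56", "192.0.2.7", as_reply=True)
```

   Both variants carry the same sender fields, so a receiver that
   updates its table from the sender addresses before inspecting the
   opcode treats them identically.
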
   The destination address of the Ethernet frame is set to the
   broadcast address for the hardware (all ones in the case of
   Ethernet).

6.2.  Encapsulation

   Encapsulation depends on the encapsulation layer protocol used in
   the data center.  Details of VXLAN-type encapsulation are TBD.

7.  Handling Packets in Flight

   The source hypervisor may receive packets from the virtual machine's
   ongoing communications.  These packets should not be lost; they
   should be sent to the destination hypervisor to be delivered to the
   virtual machine.  The steps involved in handling packets in flight
   are as follows:

   Preparation Step
      It takes some time, possibly a few seconds, for a VM to move from
      its source hypervisor to a new destination one.  During this
      period, a tunnel needs to be established so that the source NVE
      can forward packets to the destination NVE.

   Tunnel Establishment - IPv6
      In-flight packets are tunneled to the destination NVE using an
      encapsulation protocol such as VXLAN in IPv6.  The source NVE
      gets the destination NVE address from the NVA along with the
      request to move the virtual machine.

   Tunnel Establishment - IPv4
      In-flight packets are tunneled to the destination NVE using an
      encapsulation protocol such as VXLAN in IPv4.  The source NVE
      gets the destination NVE address from the gratuitous ARP message
      sent from the destination NVE.

   Tunneling Packets - IPv6
      IPv6 packets for the migrating virtual machine are received at
      the destination NVE encapsulated in an IPv6 header.  The
      destination NVE decapsulates the packet and sends the IPv6 packet
      to the migrating VM.

   Tunneling Packets - IPv4
      IPv4 packets for the migrating virtual machine are received at
      the destination NVE encapsulated in an IPv4 header.  The
      destination NVE decapsulates the packet and sends the IPv4 packet
      to the migrating VM.

   Stop Tunneling Packets
      When the source NVE receives the gratuitous ARP or unsolicited
      Neighbor Advertisement, it MUST stop tunneling packets.

8.  Moving Local State of VM

   After VM mobility related signaling (VM Mobility Registration
   Request/Reply), the virtual machine state needs to be transferred to
   the destination hypervisor.  The state includes its memory and file
   system.  The source NVE opens a TCP connection with the destination
   NVE over which the VM's memory state is transferred.

   File system or local storage is more complicated to transfer.  The
   transfer should ensure consistency, i.e., the VM at the destination
   should find the same file system it had at the source.  Precopying
   is a commonly used technique for transferring the file system.
   First, the whole disk image is transferred while the VM continues to
   run.  After the VM is moved, any changes in the file system are
   packaged together and sent to the destination hypervisor, which
   applies these changes to the file system locally at the destination.

9.  Handling of Hot, Warm and Cold Virtual Machine Mobility

   Cold virtual machine mobility is facilitated by the VM initially
   sending an ARP or Neighbor Discovery message, which enables the
   correspondents to direct their communication to the destination NVE
   in the link.  A registration message needs to be sent to the source
   NVE because the messages from all other correspondents will be
   routed to the source NVE.  Previous source NVEs in the chain (if
   any) need not be informed of the move.  Cold VM mobility also allows
   all previous source NVEs to delete binding update list entries of
   the VM.

   The VMs that are used for cold standby receive scheduled backup
   information, but less frequently than they would for the warm
   standby option.  Therefore, the cold mobility option can be used for
   non-critical applications and services.

   In the case of the warm standby option, the backup VMs receive
   backup information at regular intervals.  The duration of the
   interval determines the warmth of the standby option.
   The larger the duration, the less warm (and hence the colder) the
   standby option becomes.

   In the case of the hot standby option, the VMs in both primary and
   secondary domains have identical information and can provide
   services simultaneously, as in a load-sharing mode of operation.  If
   the VMs in the primary domain fail, there is no need to actively
   move the VMs to the secondary domain because the VMs in the
   secondary domain already contain identical information.  The hot
   standby option is the most costly mechanism for providing
   redundancy, and hence this option is utilized only for
   mission-critical applications and services.

10.  Virtual Machine Operation

   Virtual machines are not involved in any mobility signalling.  Once
   a VM moves to the destination hypervisor, its IP address does not
   change and the VM should be able to continue to receive packets at
   its address(es).  This happens in hot VM mobility scenarios.

   The virtual machine sends a gratuitous Address Resolution Protocol
   or unsolicited Neighbor Advertisement message upstream after each
   move.

10.1.  Virtual Machine Lifecycle Management

   Managing the lifecycle of a VM includes creating the VM with all of
   the required resources and managing them seamlessly as the VM
   migrates from one service to another during its lifetime.  The
   on-boarding process includes the following steps:

   1.  Sending an allowed (authorized/authenticated) request to the
       Network Virtualization Authority (NVA) in an acceptable format
       with mandatory/optional virtualized resources {cpu, memory,
       storage, process/thread support, etc.} and interface
       information.

   2.  Receiving an acknowledgement from the NVA regarding availability
       and usability of virtualized resources and interface package.

   3.  Sending a confirmation message to the NVA with a request for
       approval to adapt/adjust/modify the virtualized resources and
       interface package for utilization in a service.

11.  Security Considerations

   TBD.

12.  IANA Considerations

   This document makes no request to IANA.

13.  Acknowledgements

   The authors are grateful to Tom Herbert for his comments.

14.  References

14.1.  Normative References

   [RFC0826]  Plummer, D., "Ethernet Address Resolution Protocol: Or
              converting network protocol addresses to 48.bit Ethernet
              address for transmission on Ethernet hardware", STD 37,
              RFC 826, November 1982.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              June 1999.

   [RFC5213]  Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury,
              K., and B. Patil, "Proxy Mobile IPv6", RFC 5213,
              August 2008.

   [RFC5844]  Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy
              Mobile IPv6", RFC 5844, May 2010.

   [RFC3007]  Wellington, B., "Secure Domain Name System (DNS) Dynamic
              Update", RFC 3007, November 2000.

   [RFC3022]  Srisuresh, P. and K. Egevang, "Traditional IP Network
              Address Translator (Traditional NAT)", RFC 3022,
              January 2001.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
              "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
              September 2007.

   [RFC6820]  Narten, T., Karir, M., and I. Foo, "Address Resolution
              Problems in Large Data Center Networks", RFC 6820,
              January 2013.

   [I-D.ietf-nvo3-vm-mobility-issues]
              Rekhter, Y., Henderickx, W., Shekhar, R., Fang, L.,
              Dunbar, L., and A. Sajassi, "Network-related VM Mobility
              Issues", draft-ietf-nvo3-vm-mobility-issues-03 (work in
              progress), June 2014.

   [I-D.ietf-nvo3-arch]
              Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T.
              Narten, "An Architecture for Overlay Networks (NVO3)",
              draft-ietf-nvo3-arch-02 (work in progress), October 2014.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, August 2014.

   [RFC7364]  Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L.,
              and M. Napierala, "Problem Statement: Overlays for
              Network Virtualization", RFC 7364, October 2014.

   [RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
              Rekhter, "Framework for Data Center (DC) Network
              Virtualization", RFC 7365, October 2014.

14.2.  Informative References

   [I-D.ietf-nvo3-nve-nva-cp-req]
              Kreeger, L., Dutt, D., Narten, T., and D. Black, "Network
              Virtualization NVE to NVA Control Protocol Requirements",
              draft-ietf-nvo3-nve-nva-cp-req-03 (work in progress),
              October 2014.

   [I-D.wkumari-dcops-l3-vmmobility]
              Kumari, W. and J. Halpern, "Virtual Machine mobility in
              L3 Networks.", draft-wkumari-dcops-l3-vmmobility-00 (work
              in progress), August 2011.

   [I-D.shima-clouds-net-portability-reqs-and-models]
              Shima, K., Sekiya, Y., and K. Horiba, "Network
              Portability Requirements and Models for Cloud
              Environment",
              draft-shima-clouds-net-portability-reqs-and-models-01
              (work in progress), October 2011.

   [I-D.raggarwa-data-center-mobility]
              Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R.,
              Fang, L., and A. Sajassi, "Data Center Mobility based on
              E-VPN, BGP/MPLS IP VPN, IP Routing and NHRP",
              draft-raggarwa-data-center-mobility-07 (work in
              progress), June 2014.

   [I-D.khasnabish-vmmi-problems]
              Khasnabish, B., Liu, B., Lei, B., and F. Wang, "Mobility
              and Interconnection of Virtual Machines and Virtual
              Network Elements", draft-khasnabish-vmmi-problems-03
              (work in progress), December 2012.

Authors' Addresses

   Behcet Sarikaya
   Huawei USA
   5340 Legacy Dr. Building 3
   Plano, TX 75024

   Email: sarikaya@ieee.org


   Linda Dunbar
   Huawei USA
   5340 Legacy Dr.
   Building 3
   Plano, TX 75024

   Email: linda.dunbar@huawei.com


   Bhumip Khasnabish
   ZTE (TX) Inc.
   55 Madison Avenue, Suite 160
   Morristown, NJ 07960

   Email: vumip1@gmail.com, bhumip.khasnabish@ztetx.com


   Frank Xia
   Huawei USA
   Nanjing, China

   Phone: +1 972-509-5599
   Email: xiayangsong@huawei.com