Network Working Group                                        B. Sarikaya
Internet-Draft                                                 L. Dunbar
Intended status: Standards Track                              Huawei USA
Expires: April 24, 2015                                    B. Khasnabish
                                                           ZTE (TX) Inc.
                                                        October 21, 2014


   Virtual Machine Mobility Protocol Using Distributed Registrations
                 draft-sarikaya-nvo3-vmm-dmm-pmip-04.txt

Abstract

   This document specifies a new IP level protocol for seamless virtual machine mobility in data centers. The source network virtualization edge registers a newly created virtual machine with the centrally available management node. When the virtual machine moves to the destination network virtualization edge, the destination network virtualization edge updates the virtual machine record in the management node. The management node sends a registration message to all previous source network virtualization edges in order to direct ongoing traffic to the destination network virtualization edge. Solutions are presented for hot, warm, and cold virtual machine mobility and for intra data center and inter data center virtual machine mobility.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 24, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

Sarikaya, et al.
Expires April 24, 2015                                          [Page 1]

Internet-Draft        VM Mobility Solution Protocol         October 2014

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Terminology
   3.  Requirements
   4.  Architecture
     4.1.  VM Mobility Protocol Architecture
   5.  VM Mobility Protocol Operation
   6.  Moving Local State of VM
   7.  Handling of Hot, Warm and Cold Virtual Machine Mobility
     7.1.  Route Optimization
     7.2.  Intra Data Center Hot Virtual Machine Mobility
     7.3.  Inter Data Center Hot Virtual Machine Mobility
   8.  Virtual Machine Operation
     8.1.  Virtual Machine Lifecycle Management
   9.  Handling IPv4/IPv6 Virtual Machine Mobility and NAT Issues
   10. Security Considerations
   11. IANA Considerations
   12. Acknowledgements
   13. References
     13.1.  Normative References
     13.2.  Informative references
   Authors' Addresses

1.  Introduction

   Data center networks are increasingly used by telecom operators as well as by enterprises. Currently these networks are organized either as Layer 3 switched data center networks or as one large Layer 2 network geographically distributed over several buildings. Virtualization, which is used in almost all of today's data centers, enables many virtual machines to run on a single physical computer or compute server. Virtual machines (VMs) need a hypervisor running on the physical compute server to provide them with shared processor, memory, and storage resources. Network connectivity is provided by the network virtualization edge (NVE) [I-D.ietf-nvo3-arch], [I-D.ietf-nvo3-nve-nva-cp-req]. The ability to move VMs dynamically from one server to another, known as live migration, allows for dynamic load balancing or work distribution and is therefore a highly desirable feature [RFC7364], [I-D.ietf-nvo3-vm-mobility-issues].

   There are many challenges and requirements related to migration, mobility, and interconnection of Virtual Machines (VMs) and Virtual Network Elements (VNEs). Retaining IP addresses after a move is a key requirement [RFC7364]. This requirement exists in order to maintain existing transport connections.

   In view of the many virtual machine mobility schemes that exist today, there is a desire to define a standard control plane protocol for Layer 3 based virtual machine mobility. The protocol should be based on IPv4 or IPv6. In this document we specify such a protocol.

2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119] and [I-D.ietf-nvo3-arch].

   This document uses the terminology defined in [RFC7365].
   In addition we make the following definitions:

   Hot VM Mobility.  A given VM can be moved from one server to another in the running state.

   Warm VM Mobility.  In the case of warm VM mobility, the VM states are mirrored to the secondary server (or domain) at predefined (configurable) regular intervals. This reduces overhead and complexity, but it may also lead to a situation in which the two servers do not contain exactly the same data (state information).

   Cold VM Mobility.  A given VM can be moved from one server to another in the stopped or suspended state.

3.  Requirements

   This section states requirements on data center network virtual machine mobility.

   The data center network MUST support virtual machine mobility in IPv6.

   IPv4 SHOULD also be supported in virtual machine mobility.

   Tunneling MAY be used between VMs in the same or different data centers. Tunneling MUST NOT be used between a VM located in a data center and a host in some other site.

   Host routes MAY be used between VMs in different data centers and between a VM located in a data center and a host in some other site [I-D.shima-clouds-net-portability-reqs-and-models].

   Triangular routing MAY be used between VMs in different data centers. The use of triangular routing SHOULD be minimized between a VM located in a data center and a host in some other site.

4.  Architecture

   A data center is Layer 2 based if packets are switched inside a rack and bridged among the racks, i.e., handled completely in Layer 2. From the IP point of view, the nodes are connected to a single link. Layer 2 based networks make it easy to move virtual machines from one server to another, but address resolution protocols such as ARP do not scale well in them [RFC6820].

   In this document we assume a Layer 3 based data center network and design a live virtual machine migration protocol for it. The design makes minimal use of the Proxy Mobile IPv6 protocol [RFC5213].

4.1.  VM Mobility Protocol Architecture

   Virtual machines connect to the network using a virtual interface supported by the NVE. In this document, the NVE to which a VM is attached becomes the source NVE after the VM moves to another NVE, called the destination NVE (see Figure 1).

   The Top of Rack Switch (ToR) is a switch used to connect the servers in a data center to the data center network. The Border Router (BR) is the data center border router that provides connectivity between VMs and the hosts communicating with them.

   The data center has an associated storage center. The storage center is connected to the data center using fast means such as fiber channel (fc).

   When a VM is created, it registers with the data center management system, called the Virtual Machine (VM) Manager. The management system keeps a record of all VMs and their most recent addresses. The VM Manager manages all intra- and inter-data center VM mobility.

   After a VM is created, it starts to serve its users. During this process, the VM may be moved at any time. The VM moves from the source NVE to the destination NVE. Because of the requirement to retain IP addresses after a move, the IP address(es) of the VM do not change even if the VM moves to a different subnet. In Figure 1, if a VM moves from NVE A to NVE B, NVE A is the source NVE and NVE B is the destination NVE.

   [Figure: the Internet at the top, connected through two Border Routers (BR) to the data center. The data center contains routers (R), an aggregation switch, the Network Management Node, a Storage Center attached over fiber channel (fc), switches, NVE A and NVE B, and servers whose hypervisors host the VMs, together with other servers.]

                     Figure 1: Architecture of VMM

5.  VM Mobility Protocol Operation

   When a virtual machine is created, the source Network Virtualization Edge sends a VM mobility registration message, called the VM Mobility Registration Request (registration request for short), to the management node. The message is carried as a UDP message. The message data contains various options, which are structured in Type Length Value (TLV) format.

   The VM Mobility Registration Request message contains the MAC address of the VM, a Virtual Machine Identifier option containing the VM-ID, and a Virtual Machine Address option containing the VM address. More than one Virtual Machine Address option can be included, possibly one for each interface of the VM. The source address of the VM Mobility Registration Request packet is used as the new address to which packets need to be sent.

   The source NVE keeps all these values for each VM in a data structure called the Binding Update List [RFC5213], with one entry for each VM.

   The management node records the virtual machine information, including the source address, in a binding cache entry for this virtual machine. The management node sends a VM Mobility Registration Reply message to the NVE. The reply is also carried as a UDP message.
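
   As an illustration of the TLV-structured options described above, the following Python sketch encodes and decodes the options of a registration request. It is a minimal sketch under stated assumptions, not the wire format of this protocol: the option type codes, the one-byte Type and Length fields, the encoding of the MAC address as an option, and all function names are hypothetical, since this document does not define them.

```python
import ipaddress
import struct
from typing import List, Tuple

# Hypothetical option type codes; this document does not assign any values.
OPT_VM_MAC = 1
OPT_VM_ID = 2
OPT_VM_ADDRESS = 3


def encode_tlv(opt_type: int, value: bytes) -> bytes:
    """Encode one option as Type (1 byte), Length (1 byte), Value."""
    if len(value) > 255:
        raise ValueError("value too long for a 1-byte Length field")
    return struct.pack("!BB", opt_type, len(value)) + value


def decode_tlvs(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a byte string into (type, value) pairs."""
    options = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        opt_type, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        if offset + length > len(data):
            raise ValueError("truncated TLV value")
        options.append((opt_type, data[offset:offset + length]))
        offset += length
    return options


def build_registration_request(mac: bytes, vm_id: bytes,
                               vm_addresses: List[str]) -> bytes:
    """Build the option part of a registration request.

    One Virtual Machine Address option is emitted per address, so a VM
    with several interfaces yields several options.
    """
    payload = encode_tlv(OPT_VM_MAC, mac) + encode_tlv(OPT_VM_ID, vm_id)
    for addr in vm_addresses:
        packed = ipaddress.ip_address(addr).packed  # 4 or 16 bytes
        payload += encode_tlv(OPT_VM_ADDRESS, packed)
    return payload
```

   An NVE could hand the resulting bytes to a UDP socket addressed to the management node. The port number and any fixed message header are left out here because this document does not specify them.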
   The VM Mobility Registration Reply message MUST contain a status field, which is set to either accepted or rejected.

   The virtual machine moves from its source NVE to a new, destination NVE. The move could be initiated by the VM Manager. The virtual machine IP address(es) do not change. However, the address of the serving NVE changes. Because of this, a registration message exchange MUST be conducted after the move.

   The destination NVE MUST send a VM Mobility Registration Request message to the management node. The management node receives the VM Mobility Registration Request message and searches the binding cache for a matching entry. Once the match is found, the entry is modified to point to the new IP address of the serving NVE. Previous NVE addresses are kept in the entry. The management node sends a reply (VM Mobility Registration Reply message) to the destination NVE to indicate the acceptance of the registration.

   The source NVE and any previous source NVEs need to be informed of the new IP address(es) of the serving NVE. For this purpose, the management node sends a VM Mobility Registration Request message to the source NVEs. Each NVE verifies that this message comes from the management node and rejects it otherwise. The source NVE sends a VM Mobility Registration Reply back to the management node.

   The source NVE creates a host route pointing the old serving NVE address to the new serving NVE address. The old serving NVE address is obtained from the Binding Update List entry matching this VM, and the new serving NVE address is obtained from the VM Mobility Registration Request message received from the management node.

   Tunneling in the data plane MAY be supported instead of establishing host routes in the control plane. In this case, the source NVE initiates a tunnel interface to the destination NVE.
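
   The registration handling after a move, as described in this section, can be sketched in Python as follows. This is an illustrative model of the host route case only, under stated assumptions: the class names, the method names, and the use of plain address strings are hypothetical, and comparing the sender address with a configured management node address is a stand-in for whatever verification a deployment actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class BindingCacheEntry:
    vm_id: str
    serving_nve: str                                  # NVE currently serving the VM
    previous_nves: List[str] = field(default_factory=list)


class ManagementNode:
    """Hypothetical model of the management node's binding cache."""

    def __init__(self) -> None:
        self.binding_cache: Dict[str, BindingCacheEntry] = {}

    def handle_registration(self, vm_id: str, nve_address: str) -> List[str]:
        """Process a registration request; return the source NVEs to notify."""
        entry = self.binding_cache.get(vm_id)
        if entry is None:
            # First registration: the VM was just created at this NVE.
            self.binding_cache[vm_id] = BindingCacheEntry(vm_id, nve_address)
            return []
        if entry.serving_nve == nve_address:
            return []                                 # re-registration, no move
        # The VM moved: keep the previous NVE address and point the entry
        # to the new serving NVE.
        entry.previous_nves.append(entry.serving_nve)
        entry.serving_nve = nve_address
        return [nve for nve in entry.previous_nves if nve != nve_address]


class SourceNVE:
    """Hypothetical model of a source NVE's Binding Update List."""

    def __init__(self, address: str, management_node: str) -> None:
        self.address = address
        self.management_node = management_node
        self.binding_update_list: Dict[str, str] = {}  # vm_id -> old serving NVE
        self.host_routes: Dict[str, Tuple[str, str]] = {}

    def handle_notification(self, sender: str, vm_id: str, new_nve: str) -> bool:
        """Install a host route; return True if accepted, False if rejected."""
        if sender != self.management_node:
            return False                              # not from the management node
        old_nve = self.binding_update_list.get(vm_id)
        if old_nve is None:
            return False                              # no entry for this VM
        # Host route from the old serving NVE address to the new one.
        self.host_routes[vm_id] = (old_nve, new_nve)
        return True
```

   In this sketch the return value of handle_registration stands in for the VM Mobility Registration Request messages that the management node sends to the source NVEs, and the boolean returned by handle_notification stands in for the accepted or rejected status of the reply.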
   The source NVE encapsulates all packets for this virtual machine (IPv4-in-IPv4 or IPv6-in-IPv6) and sends them to the tunnel interface.

   The VM Mobility Registration Request message contains a Lifetime field, a 16-bit unsigned integer. The Lifetime field contains the lifetime of the registration, expressed as a number of time units of 4 seconds each. The source NVE sends its suggested value, and the management node sends the final value of the lifetime, which is equal to or less than the suggested value. In order to extend a binding that is expiring, the Hypervisor sends periodic reregistration messages (VM Mobility Registration Requests).

   All source NVEs keep one entry in their Binding Update List for each virtual machine that was in communication before it was moved, i.e., for VMs in hot VM mobility. The Binding Update List is used to create the host routes or for tunneling.

   In the data plane, if host routes are established, the source NVE sends all packets from ongoing connections of the virtual machine to the destination NVE using the host route. The destination NVE receives each packet and sends it to the VM. This delivery mechanism does not avoid triangular routing, but it avoids tunneling. Route optimization, i.e., avoiding triangular routing, is explained in Section 7.

   At the source NVE, virtual machine entries are kept in the Binding Update List until all inbound traffic to the virtual machine stops. A timer may be used for this purpose. When the timer times out, the entry is deleted.

   After the move, the virtual machine sends an IPv4 gratuitous Address Resolution Protocol message or an IPv6 unsolicited Neighbor Advertisement message upstream. This message updates the ARP or neighbor caches of the VMs on the same link, so that all traffic from new connections is directed to the new location of the virtual machine with no tunneling or triangular routing.

6.  Moving Local State of VM

   After the VM mobility related signaling (VM Mobility Registration Request/Reply), the virtual machine state needs to be transferred to the destination Hypervisor. The state includes the memory and file system of the VM. The source Hypervisor opens a TCP connection to the destination Hypervisor, over which the VM's memory state is transferred.

   The file system or local storage is more complicated to transfer. The transfer should ensure consistency, i.e., the VM at the destination should find the same file system it had at the source. Precopying is a commonly used technique for transferring the file system. First the whole disk image is transferred while the VM continues to run. After the VM is moved, any changes in the file system are packaged together and sent to the destination Hypervisor, which applies these changes to the file system locally at the destination.

7.  Handling of Hot, Warm and Cold Virtual Machine Mobility

   Cold virtual machine mobility is facilitated by the VM initially sending an ARP or Neighbor Discovery message, which enables the correspondents to direct their communication to the destination NVE on the link. A registration message needs to be sent to the source NVE because the messages from all other correspondents will be routed to the source NVE. Previous source NVEs in the chain (if any) need not be informed of the move. Cold VM mobility also allows all previous source NVEs to delete the Binding Update List entries of the VM.

   The VMs that are used for cold standby receive scheduled backup information, but less frequently than would be the case for the warm standby option. Therefore, the cold mobility option can be used for non-critical applications and services.

   In the case of the warm standby option, the backup VMs receive backup information at regular intervals. The duration of the interval determines the warmth of the standby option. The longer the interval, the less warm (and hence the colder) the standby option becomes.

   In the case of the hot standby option, the VMs in both the primary and secondary domains have identical information and can provide services simultaneously, as in a load-share mode of operation. If the VMs in the primary domain fail, there is no need to actively move the VMs to the secondary domain because the VMs in the secondary domain already contain identical information. The hot standby option is the most costly mechanism for providing redundancy, and hence this option is utilized only for mission-critical applications and services.

7.1.  Route Optimization

   When a VM in motion has ongoing communications with outside hosts, the packets continue to be received at the source NVEs. Source NVEs create host routes or tunnels based on the Binding Update List entries they have for the VM. These host routes enable them to route ongoing communications to the destination NVE. If the VM moved to a different data center, the packets are routed to the new data center.

   Host routes avoid tunneling. However, host routes do not avoid triangular routing. Route optimization is needed to avoid triangular routing. In mobility protocols, route optimization is achieved by establishing a direct route between all communicating hosts (also known as correspondent nodes) and the destination virtual machine. Such a solution requires host modifications and is not scalable in virtual machine mobility.

   Optimal IP routing of the incoming traffic is divided into two components: intra data center traffic and inter data center traffic.

7.2.  Intra Data Center Hot Virtual Machine Mobility

   Optimal IP routing of the incoming intra data center traffic is achieved as follows. After sending the VM Mobility Registration Request message to the source NVEs as described in Section 5, the management node also exchanges VM Mobility Registration Request/Reply messages with the default router of the source NVE.
   The default router is usually the Top of Rack switch, but it could be a different node depending on the configuration of the data center. The default router interprets (source NVE, destination NVE) pairs as host routes for virtual machines. The default router establishes these host routes and uses them to redirect traffic from any correspondent nodes or from VMs in the servers.

   The default routers MUST allow configuration of the host routes generated by the VM Mobility Registration protocol. The VM is not moved until the default router of the destination NVE has announced the route using an Interior Gateway Protocol (IGP), e.g., OSPF or IS-IS. The VMs can wait to move until the host route is set up. The VM Mobility Registration protocol is used to inform both routers that this process is under way.

7.3.  Inter Data Center Hot Virtual Machine Mobility

   Optimal IP routing of the incoming inter data center traffic can be achieved by propagating the host routes using inter-domain routing protocols such as the Border Gateway Protocol (BGP) [RFC4271]. If the host routes are propagated within a data center using IGPs, the normal redistribution mechanism can, by policy, redistribute the host routes at the Border Router. A BGP community can be attached to the host routes to make them easier to process.

   The Border Router (BR) MAY send a BGP UPDATE message to its BGP peers. Source NVEs receiving incoming traffic for a VM that has moved first try to reroute the traffic using host routes. The next action should be to inform the border router so that it initiates a BGP UPDATE message. A source NVE may inform the BR of each host route that it has in its Binding Update List for the VM.

   The border router generates an UPDATE message using the information it received from the source NVEs. The UPDATE message contains an ORIGIN path attribute, which is set to IGP.
   The IPv4 or IPv6 address prefix of the VM when it was at the source NVE and the destination prefix are contained in the Network Layer Reachability Information (NLRI) field of the UPDATE message.

   UPDATE messages with host routes should be exchanged only among a particular set of data centers, possibly the data centers belonging to the same operator. This constrained propagation can be achieved by policy enforcement.

8.  Virtual Machine Operation

   Virtual machines are not involved in any mobility signalling. Once a VM moves to the destination hypervisor, it should be able to continue to receive packets sent to its previous address(es). This happens in hot VM mobility scenarios.

   The virtual machine sends a gratuitous Address Resolution Protocol message or an unsolicited Neighbor Advertisement message upstream after each move.

8.1.  Virtual Machine Lifecycle Management

   Managing the lifecycle of a VM includes creating the VM with all of the required resources and managing them seamlessly as the VM migrates from one service to another during its lifetime. The on-boarding process includes the following steps:

   1.  Sending an allowed (authorized/authenticated) request to the Management/Orchestration module in an acceptable format with mandatory/optional virtualized resources {cpu, memory, storage, process/thread support, etc.} and interface information

   2.  Receiving an acknowledgement from the Management/Orchestration module regarding the availability and usability of the virtualized resources and interface package

   3.  Sending a confirmation message to the Management/Orchestration module with a request for approval to adapt/adjust/modify the virtualized resources and interface package for utilization in a service

9.  Handling IPv4/IPv6 Virtual Machine Mobility and NAT Issues

   The virtual machine registration protocol uses UDP request and reply messages. In the case of IPv4, the fields include IPv4 addresses.
   In the case of IPv6, the fields include IPv6 prefixes.

   In the case of IPv4, Network Address and Port Translation (NAPT) may be in use. In this document we assume that, for intra DC VM mobility, the NAPT operation happens in some upstream node such as the border router and thus does not affect the operation of the virtual machine registration protocol. This enables intra data center VM mobility of privately addressed VMs.

   For inter DC VM mobility, the NAPT operation has to be considered. Private IPv4 addresses in the VM Mobility Registration messages have to be converted into public IPv4 addresses at the NAPT box before the messages are sent out to the destination data center. For incoming packets, the NAPT box has to convert public addresses into their corresponding private addresses before sending the registration packet downstream.

10.  Security Considerations

   TBD.

11.  IANA Considerations

   TBD.

12.  Acknowledgements

   The authors are grateful to Tom Herbert for his comments.

13.  References

13.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629, June 1999.

   [RFC5213]  Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K., and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008.

   [RFC5844]  Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy Mobile IPv6", RFC 5844, May 2010.

   [RFC3007]  Wellington, B., "Secure Domain Name System (DNS) Dynamic Update", RFC 3007, November 2000.

   [RFC3022]  Srisuresh, P. and K. Egevang, "Traditional IP Network Address Translator (Traditional NAT)", RFC 3022, January 2001.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC6820]  Narten, T., Karir, M., and I. Foo, "Address Resolution Problems in Large Data Center Networks", RFC 6820, January 2013.

   [I-D.ietf-nvo3-vm-mobility-issues]
              Rekhter, Y., Henderickx, W., Shekhar, R., Fang, L., Dunbar, L., and A. Sajassi, "Network-related VM Mobility Issues", draft-ietf-nvo3-vm-mobility-issues-03 (work in progress), June 2014.

   [I-D.ietf-nvo3-arch]
              Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T. Narten, "An Architecture for Overlay Networks (NVO3)", draft-ietf-nvo3-arch-01 (work in progress), February 2014.

   [RFC7364]  Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L., and M. Napierala, "Problem Statement: Overlays for Network Virtualization", RFC 7364, October 2014.

   [RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. Rekhter, "Framework for Data Center (DC) Network Virtualization", RFC 7365, October 2014.

13.2.  Informative references

   [I-D.ietf-nvo3-nve-nva-cp-req]
              Kreeger, L., Dutt, D., Narten, T., and D. Black, "Network Virtualization NVE to NVA Control Protocol Requirements", draft-ietf-nvo3-nve-nva-cp-req-02 (work in progress), April 2014.

   [I-D.wkumari-dcops-l3-vmmobility]
              Kumari, W. and J. Halpern, "Virtual Machine mobility in L3 Networks.", draft-wkumari-dcops-l3-vmmobility-00 (work in progress), August 2011.

   [I-D.shima-clouds-net-portability-reqs-and-models]
              Shima, K., Sekiya, Y., and K. Horiba, "Network Portability Requirements and Models for Cloud Environment", draft-shima-clouds-net-portability-reqs-and-models-01 (work in progress), October 2011.

   [I-D.raggarwa-data-center-mobility]
              Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R., Fang, L., and A. Sajassi, "Data Center Mobility based on E-VPN, BGP/MPLS IP VPN, IP Routing and NHRP", draft-raggarwa-data-center-mobility-07 (work in progress), June 2014.

   [I-D.khasnabish-vmmi-problems]
              Khasnabish, B., Liu, B., Lei, B., and F. Wang, "Mobility and Interconnection of Virtual Machines and Virtual Network Elements", draft-khasnabish-vmmi-problems-03 (work in progress), December 2012.

Authors' Addresses

   Behcet Sarikaya
   Huawei USA
   5340 Legacy Dr. Building 3
   Plano, TX  75024

   Email: sarikaya@ieee.org


   Linda Dunbar
   Huawei USA
   5340 Legacy Dr. Building 3
   Plano, TX  75024

   Email: linda.dunbar@huawei.com


   Bhumip Khasnabish
   ZTE (TX) Inc.
   55 Madison Avenue, Suite 160
   Morristown, NJ  07960

   Email: vumip1@gmail.com, bhumip.khasnabish@ztetx.com