Network Working Group                                        B. Sarikaya
Internet-Draft                                                 L. Dunbar
Intended status: Standards Track                              Huawei USA
Expires: August 22, 2013                               February 18, 2013


    Virtual Machine Mobility Protocol Using Distributed Proxy Mobile
                                  IPv6
                 draft-sarikaya-nvo3-vmm-dmm-pmip-00.txt

Abstract

This document specifies a new IP-level protocol for seamless virtual
machine mobility in data centers.  The source hypervisor registers a
newly created virtual machine with a centrally available management
node.  When the virtual machine moves to a destination hypervisor, the
destination hypervisor updates the virtual machine's record in the
management node.  The management node then sends registration messages
to all previous source hypervisors so that ongoing traffic is directed
to the destination hypervisor.  Cold and hot virtual machine mobility
are achieved using dynamic Domain Name System updates and host routes.
Both intra data center and inter data center hot virtual machine
mobility solutions are presented.

Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF).  Note that other groups may also distribute working
documents as Internet-Drafts.  The list of current Internet-Drafts is
at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

This Internet-Draft will expire on August 22, 2013.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document.  Code Components extracted from this document must include
Simplified BSD License text as described in Section 4.e of the Trust
Legal Provisions and are provided without warranty as described in the
Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Terminology
   3.  Requirements
   4.  Architecture
     4.1.  VM Mobility Protocol Architecture
   5.  VM Mobility Protocol Operation
   6.  Moving Local State of VM
   7.  Handling of Hot and Cold Virtual Machine Mobility
     7.1.  Intra Data Center Hot Virtual Machine Mobility
     7.2.  Inter Data Center Hot Virtual Machine Mobility
   8.  Virtual Machine Operation
   9.  Handling IPv4 Virtual Machine Mobility
   10. Security Considerations
   11. IANA Considerations
   12. Acknowledgements
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses
1.  Introduction

Data center networks are increasingly used by telecom operators as well
as by enterprises.  Currently these networks are organized as one large
Layer 2 network in a single building.  In some cases such a network is
extended geographically with virtual private network (VPN)
technologies, forming an even larger Layer 2 network.

Virtualization, which is used in almost all of today's data centers,
enables many virtual machines to run on a single physical computer or
compute server.  Virtual machines (VMs) need a hypervisor running on
the physical compute server to provide them with shared
processor/memory/storage and network connectivity
[I-D.kreeger-nvo3-overlay-cp].  Being able to move VMs dynamically from
one server to another, known as live migration, allows for dynamic load
balancing or work distribution and is thus a highly desirable feature
[I-D.ietf-nvo3-vm-mobility-issues].

VM mobility is currently provided by Layer 2 [802.1Qbg] or Layer 2.5
techniques [I-D.raggarwa-data-center-mobility].  In view of the many
virtual machine mobility schemes that exist today, there is a desire to
define a standard control plane protocol for Layer 3 based virtual
machine mobility.  The protocol should be based on IPv6.  In this
document we specify such a protocol.

2.  Conventions and Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

This document uses the terminology defined in
[I-D.ietf-nvo3-vm-mobility-issues].

3.  Requirements

This section states the requirements on virtual machine mobility in
data center networks.

The data center network MUST support virtual machine mobility in IPv6.
IPv4 SHOULD also be supported in virtual machine mobility.

Tunneling MUST NOT be used between VMs in different data centers.
Tunneling MUST NOT be used between a VM located in a data center and a
host in some other site.

Host routes MAY be used between VMs in different data centers and
between a VM located in a data center and a host in some other site
[I-D.shima-clouds-net-portability-reqs-and-models].

Triangular routing MAY be used between VMs in different data centers.
The use of triangular routing SHOULD be minimized between a VM located
in a data center and a host in some other site.

4.  Architecture

A data center is Layer 2 based if packets are switched inside a rack
and bridged among the racks, i.e., handled entirely in Layer 2.  From
an IP point of view, all nodes are then connected to a single link.
Layer 2 based networks make it easy to move virtual machines from one
server to another, but they do not scale well because of address
resolution protocols such as ARP [RFC6820].  In this document we assume
a Layer 3 based data center network and design a live virtual machine
migration protocol.  The design makes minimum use of the Proxy Mobile
IPv6 protocol [RFC5213].
This document is inspired by recent work on Distributed Proxy Mobile
IPv6, such as [I-D.bernardos-dmm-distributed-anchoring] and
[I-D.seite-dmm-dma], which extends the Proxy Mobile IPv6 protocol for
distributed anchoring.

4.1.  VM Mobility Protocol Architecture

Virtual machines connect to the network using a virtual interface
supported by the hypervisor.  In this document, a hypervisor that
supports distributed Proxy Mobile IP based virtual machine mobility is
called an mHypervisor, or mHS for short.  For VMs that are created
locally and have not yet moved, the local mHS is the serving or
destination mHS.  After a VM moves to a new mHypervisor, the previous
mHS is called the anchor or source mHS (see Figure 1).

The Top of Rack switch (ToR) connects the servers in a data center to
the data center network.  The Border Router (BR) is the data center
border router that provides connectivity between the VMs and the hosts
communicating with them.  The data center has an associated storage
center.  The storage center is connected to the data center by a fast
interconnect such as Fibre Channel (fc).

When a VM is created, it is registered with the data center management
system.  The management system keeps a record of all VMs and their
most recent addresses, and it manages all intra- and inter-data center
VM mobility.  After a VM is created, it starts to serve its users; at
any point during this process the VM may be moved.  Live VM migration
is done by the hypervisor.  The VM moves from the source hypervisor
(anchor mHS) to the destination hypervisor (serving mHS).  If the VM
moves to a different subnet, its IP address(es) change.  In Figure 1,
if a VM moves from Hypervisor A to Hypervisor B, Hypervisor A is the
source and Hypervisor B is the destination hypervisor.

                          I N T E R N E T
                              |     |
    _____________________________________________________________
   |                                                             |
   |   ------    ------                                  Data    |
   |   | BR |    | BR |---------fc----------|           Center   |
   |   ------    ------                     |                    |
   |      |__________|               _______|______              |
   |     ( Data Center )            |              |             |
   |     (   Network   )            |Storage Center|             |
   |     ( Management  )            |______________|             |
   |     (    Node     )                                         |
   |         |      |                                            |
   |   ------------   -----                                      |
   |   |ToR Switch|   |ToR|--------------------+                 |
   |   ------------   -----                    |                 |
   |      |              |                     |                 |
   |   ------------   ------------             |                 |
   |   |  Server  |   |  Server  |      Other Servers            |
   |   |Hypervisor|   |Hypervisor|                               |
   |   |    A     |   |    B     |                               |
   |   |  ----    |   |  ----    |                               |
   |   |  |VM|    |   |  |VM|    |                               |
   |   |  ----    |   |  ----    |                               |
   |   |  ----    |   |  ----    |                               |
   |   |  |VM|    |   |  |VM|    |                               |
   |   |  ----    |   |  ----    |                               |
   |   ------------   ------------                               |
   |      |-- Other servers                                      |
    -------------------------------------------------------------

                  Figure 1: Architecture of DPMIP VMM

5.  VM Mobility Protocol Operation

When a virtual machine is created, its hypervisor, the source
hypervisor, sends a registration message called a VM Registration
Request, or registration request for short, to the management node.
The message is structured as a Proxy Binding Update (PBU) message
[RFC5213], i.e., an IPv6 extension header carrying length, type,
checksum, sequence number and lifetime fields followed by the message
data.  The message data contains various options structured in
Type-Length-Value (TLV) format.
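As an illustration, the sketch below shows how a hypervisor might
encode such a request, carrying the link-layer identifier, VM
identifier, and address options described in the next paragraph.  The
option type codes and the simplified header layout are placeholders
(this draft assigns no wire-format values); Python is used here only
as a sketch language.

   # Illustrative encoding of a VM Registration Request: a PBU-like
   # header followed by options in Type-Length-Value (TLV) format.
   # Type codes and header layout are placeholders, not assigned
   # values.
   import socket
   import struct

   OPT_VM_LINK_LAYER_ID = 1   # placeholder type codes
   OPT_VM_ID            = 2
   OPT_VM_ADDRESS       = 3

   def tlv(opt_type: int, value: bytes) -> bytes:
       """Encode one option: Type (1 byte), Length (1 byte), Value."""
       return struct.pack("!BB", opt_type, len(value)) + value

   def vm_registration_request(seq: int, lifetime: int,
                               options: list) -> bytes:
       """Simplified header (sequence number and lifetime, the latter
       in 4-second units) followed by the concatenated TLV options."""
       return struct.pack("!HH", seq, lifetime) + b"".join(options)

   msg = vm_registration_request(
       seq=1,
       lifetime=900,  # 900 * 4 seconds = 1 hour
       options=[
           tlv(OPT_VM_LINK_LAYER_ID, bytes.fromhex("020000102030")),
           tlv(OPT_VM_ID, b"vm-42"),
           tlv(OPT_VM_ADDRESS,
               socket.inet_pton(socket.AF_INET6, "2001:db8:1::42")),
       ])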
The registration request message contains a Virtual Machine Link Layer
Identifier option (renamed from the Mobile Node Link-layer Identifier
option), in which the link-layer identifier is the MAC address of the
VM; a Virtual Machine Identifier option (renamed from the Mobile Node
Identifier option) containing the VM-ID; and a Virtual Machine Address
option (renamed from the Home Network Prefix option) containing the VM
address.  More than one Virtual Machine Address option can be
included, possibly one for each interface of the VM.  The source
address of the VM Registration Request packet is used as the Proxy
Care-of Address (Proxy-CoA).  The source hypervisor keeps all these
values for each VM in a data structure called the Binding Update List
[RFC5213], with one entry per VM.

The management node records the virtual machine information in a
binding cache entry for this virtual machine, including the source
address of the VM Registration Request packet as the Proxy-CoA.  The
management node sends a VM Registration Reply message to the
hypervisor.  The message is structured as a Proxy Binding
Acknowledgement message [RFC5213] and contains a status field that is
set to accepted or rejected.

The virtual machine eventually moves from its source hypervisor to a
new, destination hypervisor.  The move is initiated by the source
hypervisor.  If the move is not within the same Layer 2 link, the
virtual machine's IP address(es) change; the VM obtains a new IP
address from the destination hypervisor.  The destination hypervisor
MUST send a VM Registration Request message to the management node.

The management node receives the VM Registration Request message and
searches the binding cache for a matching entry.  Once the match is
found, the entry is modified to point to the new IP address(es) and
Proxy-CoA.  Previous Proxy-CoAs are kept in the entry.  The management
node sends a reply (VM Registration Reply message) to the destination
hypervisor to indicate the acceptance of the registration.

The source hypervisor, and any previous source hypervisors, need to be
informed of the new IP address(es) of the VM.  For this purpose, the
management node sends a VM Registration Request message to the source
hypervisors.  A hypervisor verifies that this message comes from the
management node and rejects any such message otherwise.

The source hypervisor sends a VM Registration Reply back to the
management node.  The source hypervisor then creates a host route
directing traffic for the old VM address to the new VM address.  The
old VM address is obtained from the Binding Update List entry matching
this VM, and the new VM address from the VM Registration Request
message received from the management node.

The VM Registration Request message contains a Lifetime field, a
16-bit unsigned integer giving the lifetime of the registration in
units of 4 seconds.  The source hypervisor sends its suggested value,
and the management node returns the final lifetime, which is equal to
or less than the suggested value.  In order to extend a binding that
is expiring, the hypervisor sends periodic re-registration messages
(VM Registration Requests).

All source hypervisors keep one entry in their Binding Update List for
each virtual machine that had ongoing communications before it moved,
i.e., VMs in hot VM mobility.  The entries for VMs in cold VM mobility
are removed after a VM Registration Request message is received from
the management node.  The Binding Update List is used to create the
host routes, as sketched below.
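A minimal sketch of how a source hypervisor might install such a host
route on Linux follows, assuming the iproute2 "ip" command is
available; the addresses in the example are illustrative.

   # Install and remove the host route described above.  old_addr
   # comes from the Binding Update List entry; new_addr comes from
   # the VM Registration Request sent by the management node.
   import subprocess

   def install_host_route(old_addr: str, new_addr: str) -> None:
       """Route packets destined to the VM's old /128 address
       toward its new address."""
       subprocess.run(
           ["ip", "-6", "route", "replace", old_addr + "/128",
            "via", new_addr],
           check=True)

   def remove_host_route(old_addr: str) -> None:
       """Delete the host route once inbound traffic to the old
       address stops (e.g., on timer expiry, see below)."""
       subprocess.run(
           ["ip", "-6", "route", "del", old_addr + "/128"],
           check=True)

   install_host_route("2001:db8:1::42", "2001:db8:2::42")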
The source hypervisor sends all packets belonging to the virtual
machine's ongoing connections to the destination hypervisor using the
host route.  The destination hypervisor receives these packets and
delivers them to the VM.  This delivery mechanism does not avoid
triangular routing, but it avoids tunneling.  Route optimization,
i.e., avoiding triangular routing, is explained in Section 7.

At the source hypervisor, virtual machine entries are kept in the
Binding Update List until all inbound traffic to the virtual machine
stops.  A timer may be used for this purpose; when the timer expires,
the entry is deleted.

The destination hypervisor of the virtual machine performs a dynamic
Domain Name System update [RFC3007].  This update is done for all the
services that the VM provides, which means that all traffic from new
connections is directed to the new location of the virtual machine
with no tunneling or triangular routing.
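As a minimal sketch, such a dynamic DNS update could be performed with
the dnspython library, using an RFC 2136 dynamic update secured with a
TSIG key in line with [RFC3007]; the zone, record name, key material,
and server address below are all illustrative.

   # Replace the VM's AAAA record with its new address after a move.
   import dns.query
   import dns.tsigkeyring
   import dns.update

   keyring = dns.tsigkeyring.from_text({
       "vmm-key.": "c2VjcmV0LWtleS1tYXRlcmlhbA=="  # example secret
   })

   update = dns.update.Update("dc.example.com",
                              keyring=keyring, keyname="vmm-key.")
   update.delete("vm42", "AAAA")                      # old address(es)
   update.add("vm42", 300, "AAAA", "2001:db8:2::42")  # new address

   response = dns.query.tcp(update, "192.0.2.53")  # authoritative DNS
   assert response.rcode() == 0                    # NOERROR on success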
6.  Moving Local State of VM

After the VM mobility related signaling (VM Registration
Request/Reply), the virtual machine's state needs to be transferred to
the destination hypervisor.  The state includes the VM's memory and
its file system.  The source hypervisor opens a TCP connection with
the destination hypervisor over which the VM's memory state is
transferred.

The file system, or local storage, is more complicated to transfer.
The transfer should ensure consistency, i.e., the VM at the
destination should find the same file system it had at the source.
Precopying is a commonly used technique for transferring the file
system.  First the whole disk image is transferred while the VM
continues to run.  After the VM is moved, any changes in the file
system are packaged together and sent to the destination hypervisor,
which applies these changes to the file system locally at the
destination.
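A minimal sketch of the memory-state transfer over the TCP connection
mentioned above follows.  The port number, the simple length-prefixed
framing, and the idea that the hypervisor can hand over the memory as
a sequence of byte chunks are illustrative assumptions; a real
hypervisor would use its own snapshot and dirty-page tracking
machinery.

   # Stream length-prefixed memory chunks from source to destination.
   import socket
   import struct

   STATE_PORT = 4789  # illustrative port

   def send_memory_state(dst_proxy_coa: str, pages) -> None:
       """Source side: send each chunk with a 4-byte length prefix;
       a zero-length frame marks the end of a precopy round."""
       with socket.create_connection((dst_proxy_coa,
                                      STATE_PORT)) as sock:
           for page in pages:                     # bytes objects
               sock.sendall(struct.pack("!I", len(page)) + page)
           sock.sendall(struct.pack("!I", 0))     # end-of-round

   def receive_memory_state(listen_addr: str = "::") -> list:
       """Destination side: accept one connection and collect
       chunks until the end-of-round marker arrives."""
       pages = []
       with socket.create_server((listen_addr, STATE_PORT),
                                 family=socket.AF_INET6) as srv:
           conn, _ = srv.accept()
           with conn:
               while True:
                   hdr = conn.recv(4, socket.MSG_WAITALL)
                   (size,) = struct.unpack("!I", hdr)
                   if size == 0:
                       return pages
                   pages.append(conn.recv(size, socket.MSG_WAITALL))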
7.  Handling of Hot and Cold Virtual Machine Mobility

When the VM moves to the destination hypervisor, it starts a secure
dynamic DNS update using [RFC3007] and registers all its services in
the DNS.  The dynamic DNS update solves the cold VM mobility problem,
since all new communication to this VM can only be initiated at the
new addresses that the VM acquired at the destination hypervisor.
Cold VM mobility also allows all source hypervisors to delete the
Binding Update List entries of the VM.

When a VM in motion has ongoing communications with outside hosts, its
packets continue to be received at the source hypervisors.  The source
hypervisors create host routes based on the Binding Update List
entries they have for the VM.  These host routes enable them to route
the ongoing communications to the destination hypervisor.  If the VM
moved to a different data center, the packets are routed to the new
data center.

Host routes avoid the tunneling inherent in mobility protocols such as
Proxy Mobile IPv6 [RFC5213].  However, host routes do not avoid
triangular routing; route optimization is needed for that.  In
mobility protocols, route optimization is achieved by establishing a
direct route between all communicating hosts, a.k.a. correspondent
nodes, and the destination virtual machine.  Such a solution requires
host modifications and does not scale in virtual machine mobility.

Optimal IP routing in virtual machine mobility has two components
[I-D.ietf-nvo3-vm-mobility-issues]: outgoing traffic and incoming
traffic.

Optimal IP routing for the outgoing traffic can be achieved by
assigning a default router that is topologically closest to the ToR
connecting the server that presently hosts the VM.  This can be done
by limiting the Layer 2 network to each rack and having the ToR act as
the default gateway for all the servers connected to it.

Optimal IP routing of the incoming traffic is divided into two
components: intra data center traffic and inter data center traffic.

7.1.  Intra Data Center Hot Virtual Machine Mobility

Optimal IP routing of the incoming intra data center traffic is
achieved as follows.  After sending the VM Registration Request
message to the source hypervisors as in Section 5, the management node
exchanges VM Registration Request/Reply messages with the default
router of the source hypervisor.  In this document the default router
is usually the Top of Rack switch, but it could be a different node
depending on the configuration of the data center.

The default router interprets the Home Network Prefix values, in pairs
of old and new addresses, as host routes for virtual machines.  The
default router (ToR) establishes these host routes and uses them to
redirect traffic from any correspondent nodes or from VMs in the
servers connected to the ToR.  This solution requires all default
routers in the data center to implement the protocol defined in this
document.

7.2.  Inter Data Center Hot Virtual Machine Mobility

Optimal IP routing of the incoming inter data center traffic can be
achieved by propagating the host routes using routing protocols such
as the Border Gateway Protocol (BGP) [RFC4271].  Propagation of the
host routes can be initiated by the ToRs towards their border routers.
A border router (BR) MAY send a BGP UPDATE message to its BGP peers.

A source hypervisor receiving incoming traffic for a VM that has moved
first tries to reroute the traffic using its host routes.  The next
action is to inform the border router so that it initiates a BGP
UPDATE message: the source hypervisor may report to the BR each host
route it has in its Binding Update List for the VM.  The border router
generates an UPDATE message using the information it received from the
source hypervisors.  The UPDATE message advertises the IPv4 or IPv6
address prefix values that the VM had when it was at the source
hypervisor; the destination prefix is carried in the Network Layer
Reachability Information (NLRI) field of the UPDATE message.

UPDATE messages with host routes should be exchanged only among a
particular set of data centers, possibly the data centers belonging to
the same operator.  This constrained propagation can be achieved by
policy enforcement.

8.  Virtual Machine Operation

Virtual machines are not involved in any mobility signaling.  Once a
VM moves to the destination hypervisor, it should be able to continue
to receive packets addressed to its previous address(es).  This
happens in hot VM mobility scenarios.

The VM establishes a virtual interface for each of its previous
addresses.  A virtual interface is established only if there is
ongoing communication on a previously acquired address; that address
is assigned to the virtual interface.  These virtual interfaces enable
the VM to continue to receive packets sent to its previous addresses.
A virtual interface is deleted when there is no longer any
communication over its address.
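A minimal sketch of this on a Linux guest follows, using a dummy
interface to hold the previous address; the interface name and address
are illustrative.

   # Keep a previous address reachable inside the VM.
   import subprocess

   def keep_previous_address(old_addr: str,
                             ifname: str = "vmm-prev0") -> None:
       """Create a virtual (dummy) interface carrying the VM's
       previous address so ongoing connections keep working."""
       subprocess.run(["ip", "link", "add", ifname, "type", "dummy"],
                      check=True)
       subprocess.run(["ip", "-6", "address", "add",
                       old_addr + "/128", "dev", ifname], check=True)
       subprocess.run(["ip", "link", "set", ifname, "up"], check=True)

   def drop_previous_address(ifname: str = "vmm-prev0") -> None:
       """Delete the virtual interface once communication over the
       previous address has stopped."""
       subprocess.run(["ip", "link", "del", ifname], check=True)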
9.  Handling IPv4 Virtual Machine Mobility

Virtual machines may be created to serve legacy clients and may
therefore be assigned IPv4 addresses.  In this section we specify how
IPv4 virtual machine mobility is handled.

The source hypervisor registers the virtual machine by sending a VM
Registration Request message structured as in Section 5, except that
instead of the Virtual Machine Address option it carries a Virtual
Machine IPv4 Address option, renamed from the IPv4 Home Address
Request option defined in [RFC5844], in which the IPv4 address of the
virtual machine is placed.  The hypervisor also adds a Virtual Machine
Default Router Address option, renamed from the IPv4 Default-Router
Address option defined in [RFC5844], in which the hypervisor's IPv4
address is placed.  The hypervisor, being dual-stack, sends the VM
Registration Request message in IPv6.

The management node replies to the hypervisor with a VM Registration
Reply message after registering the IPv4 address in the binding cache.
In the reply, the IPv4 Home Address Reply option is not used, because
virtual machines are assigned IPv4 addresses by their hypervisors.
The registration contains a lifetime value as in IPv6.  The management
node records the virtual machine information in a binding cache entry
for this virtual machine with the Virtual Machine Default Router
Address as the Proxy-CoA.  Any traffic tunneled to this virtual
machine is directed to the Proxy-CoA.

When the virtual machine moves to a destination hypervisor, that
hypervisor registers the VM with the management node as in Section 5.
Next, the management node sends a VM Registration Request message to
the source hypervisor(s).  For IPv4 VMs, the source hypervisor MUST
create IPv4 host routes.  The old VM IPv4 address is obtained from the
Binding Update List entry matching this virtual machine, and the new
VM IPv4 address from the VM Registration Request message.  The source
hypervisor transfers the virtual machine state to the destination
hypervisor in IPv4 over TCP connection(s) using the old and new
Proxy-CoA IPv4 addresses.

In case Network Address Translation (NAT) [RFC3022] is used, changing
the NAT box after mobility invalidates all private addresses the VM
had at the source hypervisor.  The protocol described in this document
can handle VM mobility in the presence of NAT when the NAT box is
centrally located in the data center, such as at the border router.
In this case only intra data center VM mobility of privately addressed
VMs can be handled.

10.  Security Considerations

TBD.

11.  IANA Considerations

TBD.

12.  Acknowledgements

TBD.

13.  References

13.1.  Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
           June 1999.

[RFC5213]  Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K.,
           and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008.

[RFC5844]  Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy
           Mobile IPv6", RFC 5844, May 2010.

[RFC3007]  Wellington, B., "Secure Domain Name System (DNS) Dynamic
           Update", RFC 3007, November 2000.

[RFC3022]  Srisuresh, P. and K. Egevang, "Traditional IP Network
           Address Translator (Traditional NAT)", RFC 3022,
           January 2001.

[RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
           Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC6820]  Narten, T., Karir, M., and I. Foo, "Address Resolution
           Problems in Large Data Center Networks", RFC 6820,
           January 2013.

[I-D.ietf-nvo3-vm-mobility-issues]
           Rekhter, Y., Henderickx, W., Shekhar, R., Fang, L.,
           Dunbar, L., and A. Sajassi, "Network-related VM Mobility
           Issues", draft-ietf-nvo3-vm-mobility-issues-00 (work in
           progress), December 2012.

13.2.  Informative References

[I-D.narten-nvo3-overlay-problem-statement]
           Narten, T., Black, D., Dutt, D., Fang, L., Gray, E.,
           Kreeger, L., Napierala, M., and M. Sridharan, "Problem
           Statement: Overlays for Network Virtualization",
           draft-narten-nvo3-overlay-problem-statement-04 (work in
           progress), August 2012.

[I-D.kreeger-nvo3-overlay-cp]
           Kreeger, L., Dutt, D., Narten, T., and M. Sridharan,
           "Network Virtualization Overlay Control Protocol
           Requirements", draft-kreeger-nvo3-overlay-cp-02 (work in
           progress), October 2012.

[I-D.wkumari-dcops-l3-vmmobility]
           Kumari, W. and J. Halpern, "Virtual Machine mobility in L3
           Networks", draft-wkumari-dcops-l3-vmmobility-00 (work in
           progress), August 2011.

[I-D.shima-clouds-net-portability-reqs-and-models]
           Shima, K., Sekiya, Y., and K. Horiba, "Network Portability
           Requirements and Models for Cloud Environment",
           draft-shima-clouds-net-portability-reqs-and-models-01
           (work in progress), October 2011.

[I-D.bernardos-dmm-distributed-anchoring]
           Bernardos, C. and J. Zuniga, "PMIPv6-based distributed
           anchoring", draft-bernardos-dmm-distributed-anchoring-01
           (work in progress), September 2012.

[I-D.seite-dmm-dma]
           Seite, P., Bertin, P., and J. Lee, "Distributed Mobility
           Anchoring", draft-seite-dmm-dma-06 (work in progress),
           January 2013.

[I-D.raggarwa-data-center-mobility]
           Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R.,
           Fang, L., and A. Sajassi, "Data Center Mobility based on
           E-VPN, BGP/MPLS IP VPN, IP Routing and NHRP",
           draft-raggarwa-data-center-mobility-04 (work in progress),
           December 2012.

[802.1Qbg] IEEE Draft Std 802.1Qbg/D2.2, "MAC Bridges and Virtual
           Bridged Local Area Networks - Amendment XX: Edge Virtual
           Bridging", February 2012.

Authors' Addresses

   Behcet Sarikaya
   Huawei USA
   5340 Legacy Dr. Building 3
   Plano, TX  75024

   Email: sarikaya@ieee.org


   Linda Dunbar
   Huawei USA
   5340 Legacy Dr. Building 3
   Plano, TX  75024

   Email: linda.dunbar@huawei.com