Network Working Group                                 Bhumip Khasnabish
Internet-Draft                                             ZTE USA, Inc.
Intended status: Informational                                   Bin Liu
Expires: December 31, 2012                               ZTE Corporation
                                                              Baohua Lei
                                                               Feng Wang
                                                           China Telecom
                                                           June 29, 2012

   Requirements for Mobility and Interconnection of Virtual Machine and
                        Virtual Network Elements
                  draft-khasnabish-vmmi-problems-01.txt

Abstract

   In this draft, we discuss the challenges and requirements related to
   the migration, mobility, and interconnection of Virtual Machines
   (VMs) and Virtual Network Elements (VNEs).  We also describe the
   limitations of various types of virtual local area networking (VLAN)
   and virtual private networking (VPN) techniques that are
   traditionally expected to support such migration, mobility, and
   interconnection.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 31, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Conventions used in this document
   2.  Terminology and Concepts
   3.  Network-Related Problem Specification
     3.1.  The Evolution Problems of the Logical Network Topology in
           VMMI Environments
     3.2.  Cloud Service Virtualization Requirements
       3.2.1.  Requirements for Logical Elements
       3.2.2.  Requirements for Resource Allocation Gateway (RA GW)
               Function
       3.2.3.  Performance Requirements
       3.2.4.  Fault Tolerance Capability Requirements
       3.2.5.  Network Model
       3.2.6.  Types and Applications of VPN Interconnection between
               DCs which Provide Cloud Services
         3.2.6.1.  Types of VPNs
         3.2.6.2.  Applications of L2VPN in DCs
         3.2.6.3.  Applications of L3VPN in DCs
       3.2.7.  VN Requirements
       3.2.8.  Packet Encapsulation Problems
       3.2.9.  Network Bandwidth Efficiency Problem of Resource Use
       3.2.10. VM Migration Problem in Mixed IPv4 and IPv6
               Environments
         3.2.10.1. Real-time Perception of Availability of Global
                   Network and Storage Resources
         3.2.10.2. The Real-time Perception of Globally Available
                   Network Resources and Requested Network Resources
                   for Matching with Storage Resources
         3.2.10.3. The Real-time Perception of Globally Requested
                   Network Resources for Matching with Storage
                   Resources
       3.2.11. Selection of Migration
         3.2.11.1. Requirements with Different Network Environments
                   and Protocols
         3.2.11.2. Requirements for Live Migration of Virtual
                   Machines
       3.2.12. Access and Migration of VMs without Users' Perception
         3.2.12.1. VM Migration Problems and Strategies in the WAN
                   with Traffic Roundabout as a Prerequisite
         3.2.12.2. VM Migration Problems and Strategies in the WAN
                   without Traffic Roundabout as a Target
       3.2.13. Review of VXLAN, NVGRE, and NVO3
       3.2.14. The East-West Traffic Problem
       3.2.15. Data Center Interconnection Fabric Related Problems
       3.2.16. MAC, IP, and ARP Explosion Problems
       3.2.17. Suppressing Flooding within VLAN
       3.2.18. Convergence and Multipath Support
       3.2.19. Routing Control - Multicast Processing
       3.2.20. Problems and Requirements related to DMTF
   4.  Control & Mobility Related Problem Specification
     4.1.  General Requirements and Problems of State Migration
       4.1.1.  Foundation of Migration Scheduling
       4.1.2.  Authentication for Migration
       4.1.3.  Consultation for Assessing Migratability
       4.1.4.  Standardization of Migration State
     4.2.  Mobility in Virtualized Environments
     4.3.  VM Mobility Requirements
       4.3.1.  Summarization of Mobility
       4.3.2.  Problem Statement
   5.  Network Management Related Problem Specification
     5.1.  Data Center Maintenance
     5.2.  Load Balancing after VM Migration and Integration
     5.3.  Security and Authentication of VMMI
     5.4.  Efficiency of Data Migration and Fault Processing
     5.5.  Robustness Problems
       5.5.1.  Robustness of VM Migration
       5.5.2.  Robustness of VNE
   6.  Acknowledgement
   7.  References
   8.  Security Considerations
   9.  IANA Consideration
   10. Normative References
   Authors' Addresses
1.  Introduction

   There are many challenges related to VM migration and
   interconnection among two or more data centers (DCs).  The
   techniques that can be used for VM migration and data center
   interconnection should support the required levels of performance,
   security, and scalability, along with simplicity and cost-effective
   management, operations, and maintenance.

   In this draft, the issues and requirements for moving virtual
   machines are summarized with reference to the necessary conditions
   for migration, business needs, state classification, security, and
   efficiency.  We then list the requirements for VM migration in the
   current mixed IPv4 and IPv6 environment.  On the choice of the
   migration solution, the requirements for techniques that are useful
   on large-scale Layer-2 networks and on segmented IP networks are
   discussed.  We summarize the requirements of virtual networks for
   VM migration, virtual networking, and operations in DCI modes.

   In the following sections of this draft, we first describe the
   general challenges at a high level, and then analyze the
   requirements for VM migration.  We then discuss the commonly used
   solutions and their limitations, along with the desired features of
   a potential reference solution.  A more detailed solution survey
   will be presented in a companion draft.

1.1.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [RFC2119].

2.  Terminology and Concepts

   o ACL: Access Control List

   o ARP: Address Resolution Protocol

   o DC: Data Center

   o DCB/DCBR: Data Center Border Router

   o DC GW: Data Center Gateway

   o DCI: Data Center Interconnection

   o DCS: Data Center Switch

   o FDB: Forwarding DataBase

   o HPC: High-Performance Computing

   o IDC: Internet Data Center

   o IGMP: Internet Group Management Protocol

   o IOMMU: Input/Output Memory Management Unit

   o IP: Internet Protocol

   o IP VPN: Layer-3 VPN, defined in the L3VPN working group

   o ISATAP: Intra-Site Automatic Tunnel Addressing Protocol

   o LISP: Locator/ID Separation Protocol

   o MatrixDCN: Matrix-based fabric for Data Center Networks

   o NHRP: Next Hop Resolution Protocol

   o NVO3: Network Virtualization Overlays (over Layer-3)

   o OTV: Overlay Transport Virtualization

   o PaaS: Platform as a Service

   o PIM: Protocol Independent Multicast

   o PBB: Provider Backbone Bridge

   o PM: Physical Machine

   o QoS: Quality of Service

   o RA GW: Resource Allocation GateWay

   o STP: Spanning Tree Protocol

   o TNI: Tenant Network Identifier

   o ToR: Top of the Rack
   o TRILL: Transparent Interconnection of Lots of Links

   o VLAN: Virtual Local Area Networking

   o VM: Virtual Machine

   o VMMI: Virtual Machine Mobility and Interconnection

   o VN: Virtual Network

   o VNI: Virtual Network Identifier

   o VNE: Virtual Network Entity (a virtualized layer-3/network entity
     with associated virtualized ports and virtualized processing
     capabilities)

   o VPN: Virtual Private Network

   o VPLS: Virtual Private LAN Service

   o VRRP: Virtual Router Redundancy Protocol

   o VSE: Virtual Switching Entity (a virtualized layer-2/switch
     entity with associated virtualized ports and virtualized
     processing capabilities)

   o VSw: Virtual Switch

   o WAN: Wide Area Network

3.  Network-Related Problem Specification

   In this section, we describe the background of VM and VNE migration
   between data centers.

   Why do VMs and VNEs need to be migrated?  First of all, in case of
   overload and during any natural disaster, business-critical data
   center applications need to be migrated to other data centers as
   quickly as possible.  As a pre-condition of data center migration
   and/or integration, some of the applications can be migrated
   without interruption from one data center to another.  Because of
   considerations such as address resources, cooling, and physical
   space in the primary data center, some of the virtual machines can
   be migrated to the backup data center(s) even under normal
   operating conditions.

   Secondly, through seamless management of VM migration, it may be
   possible to save operations, maintenance, and upgrade costs.  For
   example, a legacy server may be physically large, whereas a
   present-day server may be relatively small.  VM migration allows
   users to consolidate workloads onto a single server and retire a
   set of legacy servers, and thus saves a substantial amount of
   physical rack space.  In addition, a virtual machine server
   presents unified "virtual hardware", unlike a legacy server, which
   may have a number of different hardware resources.  After
   migration, the server can be managed through a unified interface.
   We note that by using virtual machine software, such as the
   high-availability tools provided by VMware, it is possible -- when
   a server shuts down due to a failure -- to automatically switch to
   another virtual server in the network without causing any
   disruption in operation.

   In short, migration of VMs under many desirable scenarios has the
   advantages of lowering operations costs, simplifying maintenance,
   improving system load balancing, enhancing system error tolerance,
   and optimizing system-wide power and space management.

   In general, a data center architecture consists of the following
   components:

   o Gateways (Data Center Gateway, Resource Allocation Gateway)

   o Core router / switch

   o Aggregation layer switch

   o Access layer ToR switch

   o Virtual switch

   o Interconnection network between DCs

   o Servers

   o Firewall system, etc.
   Overall, the requirement of VM migration brings the following
   challenges to the forefront of data center operations and
   management:

   (A) How to accommodate a large number of tenants in each isolated
   network in a data center;

   (B) From one DC to another within one administrative domain, (i)
   how to ensure that the necessary conditions of migration are
   satisfied, (ii) how to ensure that a successful migration occurs
   without service disruption, and (iii) how to ensure successful
   rollback when any unforeseen problem occurs in the migration
   process.

   (C) From one administrative domain to another, how to solve the
   problem of seamless communication between the domains.  There are
   several different solutions based on the current Layer-2 (L2) DC
   interconnect technology, and each can solve different problems in
   different scenarios.  In an L2 network, VXLAN
   [draft-mahalingam-dutt-dcops-vxlan-01] is used to resolve the VLAN
   number limitation problem, and NVGRE
   [draft-sridharan-virtualization-nvgre-00] attempts to solve similar
   problems but artificially causes interoperability problems between
   domains.  If unification of the packet encapsulations used in the
   different solutions can be achieved, it is bound to promote
   seamless migration of VMs among DCs, along with the desired
   integration of cloud computing and networking.

   (D) How to utilize IP-based technology to support migration of VMs
   over a Layer-3 (L3) network.  For example, VPN technology can be
   used to carry L2 and L3 traffic across the IP/MPLS core network.

   (E) How to resolve the problems related to mobility and portability
   of VMs among DCs is also an important aspect to consider.

   We discuss the above in more detail in the following sections.  A
   related draft [DCN Ops Req] discusses data center network and
   operations requirements.

3.1.  The Evolution Problems of the Logical Network Topology in VMMI
      Environments

   The question is whether there is any relation between VM migration
   and the topology of the network within a data center.  In simple
   implementations, seamless VM migration should be realized over a
   Layer-2 network.  Since a large number of VMs and their
   applications run in the same Layer-2 domain, VM migration may put
   significant stress on the bandwidth utilization of the data center
   switching network.  In order to improve bandwidth utilization, it
   is required to upgrade the load-balancing capability of a network
   which has numerous equal-cost multiple paths (ECMP) between
   different points.  Multi-root trees (such as Fat Tree, MatrixDCN,
   and other network topologies) and several protocols support ECMP;
   it can be achieved by configuring the appropriate routing, or
   through TRILL or SPB.  However, implementing TRILL or SPB requires
   elimination or upgrading of the existing equipment.  If we can
   encode the positions of nodes in a Fat Tree or MatrixDCN topology
   in their IP or MAC addresses, we can realize seamless and
   transparent VM migration within the data center, even on the
   premise that the large Layer-2 network is composed of existing
   low-end switching equipment.
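   To make the position-encoding idea concrete, the following is a
   minimal sketch in Python.  It assumes a hypothetical Fat Tree
   addressing convention of the form 10.<pod>.<switch>.<host>; the
   concrete scheme used by any given fabric may differ.

      from ipaddress import IPv4Address

      # Hypothetical Fat Tree addressing: 10.<pod>.<edge-switch>.<host>.
      # Encoding a node's position in its address lets a switch derive
      # forwarding decisions locally, without large forwarding tables.

      def position(addr: str) -> tuple[int, int, int]:
          """Return the (pod, edge_switch, host) encoded in an address."""
          octets = IPv4Address(addr).packed
          return octets[1], octets[2], octets[3]

      def same_pod(a: str, b: str) -> bool:
          """True if two servers share a pod, so traffic can stay
          below the core layer."""
          return position(a)[0] == position(b)[0]

      assert position("10.3.2.5") == (3, 2, 5)
      assert not same_pod("10.3.2.5", "10.4.1.9")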
   Note that although Ethernet and IP protocols are meant to support
   arbitrary topologies, these Layer-2 and Layer-3 network protocols
   are not flexible enough for use in data center environments.  The
   lack of flexibility may result in lack of scalability, management
   difficulties, inflexible communications, and poor fault tolerance.
   These ultimately result in lack of support for flexible VM
   migration in increasingly larger and more complex Layer-2 networks.
   However, if we can solve these problems, we will be able to achieve
   flexible migration of VMs in scalable, fault-tolerant Layer-2 data
   center networks.

   Some solutions are moving in the direction of solving the problems
   above; several new topological models and routing architectures
   have been proposed.  These include the Fat Tree fabric and the
   MatrixDCN fabric.  MatrixDCN is a new style of network fabric for
   data center networks.  These fabrics can support super-large-scale
   networks with more than 1,000,000 servers without performance
   degradation.  Furthermore, through ECMP technology, MatrixDCN can
   eliminate the bandwidth bottleneck problems of the canonical
   tree-structured data center networks.  The MatrixDCN fabric is
   described in [Matrix DCN, I-D.sun-matrix-dcn].

3.2.  Cloud Service Virtualization Requirements

   The following sub-sections present the requirements of logical and
   physical elements for Cloud/DC service virtualization and their
   operations.

3.2.1.  Requirements for Logical Elements

   o Resource Allocation Gateway (RA GW)

   Network service providers provide virtualized basic network
   resources for tenants between data centers.  Within the data
   center, the facilities include virtualized computing and
   virtualized storage resources.  The RA gateway's role is to provide
   access to the virtualized resources.  These resources are divided
   into the following three categories: networking resources,
   computing resources, and storage resources.  The RA gateway
   compares the demanded networking, computing, and storage resources
   with the available resources, identifies the corresponding
   relations, and achieves globally reasonable matching and scheduling
   of resources.  The DC GW's function, described below, is a subset
   of the RA GW functions.

   o Data Center Gateway (DC GW)

   The DC gateway provides access to the data center for different
   outside users, including Internet access and VPN connection users.
   In the existing DC network model, the DC GW may be a router with
   virtual routing capabilities, or may be a PE device of an
   IPVPN/L2VPN connection.  Core nodes which perform the roles of DC
   GWs may also provide Internet connectivity, inter-DC connectivity,
   and VPN support.

   o Core Router / Switch

   These are high-end core nodes / switches with routing capabilities
   located in the core layer, connecting aggregation layer switches.

   o Aggregation Layer Switch

   This switch aggregates traffic from the ToR switches and forwards
   the downstream traffic.  The switch can be a normal aggregation
   switch, or multiple switches virtualized into a single stack
   switch.

   o Access Layer ToR Switch

   Access layer ToR switches are usually dual-homed to the parent node
   switch.

   o Virtual Switch

   This is a virtual software switch which runs on a server.

   The requirements related to the above demand that an L2/L3 tunnel
   be terminated at one of the entities mentioned above.

3.2.2.  Requirements for Resource Allocation Gateway (RA GW) Function

   The emerging DC and network providers offer virtualized computing,
   storage, and networking resources and related services.  Tenants
   may use overlapping addresses, and they share a pool of storage and
   networking resources.

   Therefore, a virtual platform is needed, with control and
   management capabilities for virtual machines, virtual services,
   virtual storage, and virtual networks.  What tenants see is a
   subset of the above four entities.  The virtualized platform is
   built on the framework of the physical network, physical servers,
   physical switches and routers, and physical storage devices.
   Through the virtual platform, the tenants are offered globally
   scheduled resources for sharing throughout the entire system.  The
   RA GW collects information related to the system-wide availability
   of computing, storage, and networking resources.  The RA GW then
   allocates appropriate quantities of computing, storage, and
   networking resources to the tenants according to certain policies
   and the demands for resources.
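   As a rough illustration of this matching function, here is a
   minimal sketch in Python.  The resource categories follow the text
   above, while the scoring policy (preferring the tightest bandwidth
   fit) is an assumption for illustration, not a defined RA GW
   algorithm.

      from dataclasses import dataclass

      @dataclass
      class Resources:
          cpu_cores: int       # computing resources
          storage_gb: int      # storage resources
          bandwidth_mbps: int  # networking resources

      @dataclass
      class Site:
          name: str
          available: Resources

      def fits(demand: Resources, avail: Resources) -> bool:
          return (avail.cpu_cores >= demand.cpu_cores and
                  avail.storage_gb >= demand.storage_gb and
                  avail.bandwidth_mbps >= demand.bandwidth_mbps)

      def allocate(demand: Resources, sites: list[Site]) -> Site | None:
          """Pick the feasible site whose spare bandwidth is the
          tightest fit -- a stand-in for the RA GW's 'globally
          reasonable matching' policy."""
          candidates = [s for s in sites if fits(demand, s.available)]
          if not candidates:
              return None
          return min(candidates, key=lambda s:
                     s.available.bandwidth_mbps - demand.bandwidth_mbps)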
   Note that in order to prevent any single point of failure, the RA
   GW needs to have backup support.  The global resource availability
   information and scheduling information (exchanged between the
   resource allocation gateway and the backup resource allocation
   gateway) also need real-time backup.

   It is possible to provide automatic matching and scheduling of the
   virtualized resources, dynamically adjusted according to the
   operating conditions.  This can optimize the utilization of
   computing resources, networking resources such as IDC
   interconnection resources and IDC internal routing and switching
   resources, and storage resources.  It should consider the
   optimization of network path routing when matching network
   resources.  Routing selection can be based on the degree of
   matching between the required bandwidth and the bandwidth that can
   be provided, the shortest path, the service level, and the user and
   usage level.  These factors need to be considered in the
   decision-making process.

3.2.3.  Performance Requirements

   Any preferred solution should be able to easily support a large
   number of tenants sharing the data center resources.  It is also
   required to support a large (more than 4K) number of VLANs.  For
   example, there are a number of VPN applications -- VPLS or IP VPN
   -- which serve more than 10K tenants, each requiring multiple
   VLANs.  In this scenario, the availability of 4K VLANs is not
   sufficient for the tenants.

   The solution should guarantee high quality of service, and must
   ensure that a large number of network connections are not
   interrupted even during overload or minor failure conditions.  The
   connectivity should meet carrier-class reliability and availability
   requirements.

3.2.4.  Fault Tolerance Capability Requirements

   In the event of any fault or error, it is required to quickly
   recover from the error condition.  Error recovery includes network
   fault recovery, computing power recovery, VM migration recovery,
   and storage recovery.  Among them, network fault recovery and
   computing power recovery are the fundamental requirements for VM
   migration recovery and storage recovery.

   Network fault recovery: Once an error or fault condition is
   identified in virtual network connectivity, alarms should be
   triggered, and recovery using a backup virtual network should be
   automatically activated.

   Computing capability recovery: Once the computing capability fails,
   an efficient detection mechanism is needed to find the problem, so
   that services can be scheduled to the backup virtual machines
   designated for those services.
   VM migration recovery: In the event of a VM migration failure, it
   is required to automatically restore the virtual machines to their
   original state so that users' services are not adversely impacted.

   Storage recovery: In the event of storage failures, it is required
   to automatically find a backup virtual storage resource so that it
   can be enabled or activated immediately.

   The response and recovery times should be very short in order to
   minimize service delays and disruptions.

   After the VM migration, it is required to consider the impact on
   the switching network, for example, whether the new network
   environment will suffer from insufficient bandwidth.  Although an
   initial judgment is made at the consultation phase before the
   migration, it cannot guarantee that no problem will occur after the
   migration.  In addition, if the destination DC needs to activate
   standby servers and additional network resources, it may be
   worthwhile to consider allocating and activating additional server
   and network resources in advance.  And, in some cases, some routing
   policies -- on network segments and server clusters -- may need to
   be adjusted as well after migration.

3.2.5.  Network Model

   Traditionally, the DCs have their own private networks for the
   interconnection among themselves.  Alternatively, the data centers
   can use an independent WAN service provider's interconnection
   facilities for primary and/or secondary connections.

3.2.6.  Types and Applications of VPN Interconnection between DCs
        which Provide Cloud Services

3.2.6.1.  Types of VPNs

   Layer-3 VPN:
      BGP/MPLS IP Virtual Private Networks (VPNs) [RFC 4364]

   Layer-2 VPN:
      PBB + L2VPN
      TRILL + L2VPN
      VLAN + L2VPN
      NVGRE [draft-sridharan-virtualization-nvgre-00]
      PBB VPLS
      E-VPN
      PBB-EVPN
      VPLS
      VPWS

3.2.6.2.  Applications of L2VPN in DCs

   It is a very common practice to use L2 interconnection technologies
   for DC interconnection across geographical regions.  Note that VPN
   technology is also used to carry L2 and L3 traffic across the
   IP/MPLS core network.  This technology can be used within the same
   DC to support scalability or interconnection across L3 domains.
   VPLS is commonly used for IP/MPLS connection over the WAN, and it
   supports transparent LAN services.  IP VPN, including BGP/MPLS IP
   VPN and IPsec VPN, has been used in a common IP/MPLS core network
   to provide virtual IP routing instances.

   The implementation of PBB plus L2VPN can take advantage of some of
   the existing technologies.  It is flexible to use a VPN network in
   the cloud computing environment, and it can support a sufficient
   number of VPN connections/sessions (networking resources), which is
   much larger than the 4K VLANs of the plain L2VPN mode.  Therefore,
   it can achieve an effect similar to that of VXLAN.  Note that PBB
   can not only support access to more than 16M virtual LAN instances,
   it can also separate the customers and provide different domains
   through isolated MAC address spaces.

   The use of PBB encapsulation has one major advantage: since the
   VMs' MAC addresses are not processed by the ToRs and core switches,
   the MAC table size of the ToRs and core switches may be reduced by
   up to two orders of magnitude; the specific number is related to
   the number of virtual machines in each server and the number of VM
   virtual interfaces.
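   The following minimal sketch illustrates why PBB (MAC-in-MAC, IEEE
   802.1ah) hides customer MAC addresses from the backbone.  The field
   layout is simplified (the I-TAG flag bits are left zero), and all
   constants other than the standard TPIDs are illustrative.

      import struct

      def pbb_encapsulate(customer_frame: bytes, b_da: bytes,
                          b_sa: bytes, b_vid: int, i_sid: int) -> bytes:
          """Wrap a customer Ethernet frame in a simplified 802.1ah
          header.  Core switches forward on the backbone MACs
          (b_da/b_sa) and B-VID only, so their MAC tables scale with
          the number of backbone edge bridges, not with the number of
          VMs.  The 24-bit I-SID separates more than 16M service
          instances."""
          assert i_sid < 2**24 and b_vid < 2**12
          b_tag = struct.pack("!HH", 0x88A8, b_vid)   # B-TAG TPID + B-VID
          i_tag = struct.pack("!HI", 0x88E7, i_sid)   # I-TAG TPID + I-SID
          return b_da + b_sa + b_tag + i_tag + customer_frame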
   One solution to the problems in a DC is to deploy other
   technologies in the existing DC network.  A service provider can
   separate its VLAN domains into different VLAN islands; in this way,
   each island can support up to 4K VLANs.  The VLAN islands can be
   interconnected via VPLS, with the DC GWs used as the VPLS PEs.  If
   the existing VLAN-based solution is retained only in the VSw, while
   the number of tenants in some VLAN islands is more than 4K, the
   service provider needs to deploy VPLS deeper in the DC network.
   This is equivalent to supporting L2VPN from the ToRs, and using the
   existing VPLS solutions to enable MPLS on the ToR and core DC
   elements.

3.2.6.3.  Applications of L3VPN in DCs

   IP VPN technology can also be used for data center network
   virtualization.  For example, multi-tenant L3 virtualization can be
   achieved by assigning a different IP VPN instance to each tenant
   who needs L3 virtualization in a DC network.

   There are many advantages of using IP VPN as a Layer-3
   virtualization solution within the DC compared to using existing
   virtual routing DC technology.  Some of the advantages are
   mentioned below:

   (1) It supports many VRF-to-VRF tunneling options covering
   different operational models: BGP/MPLS IP VPN, IP or L3 VPN over
   GRE, etc.

   (2) The IP VPN instances used for cloud services below the WAN can
   connect directly to the IP VPNs in the WAN.

3.2.7.  VN Requirements

   The Virtual Networks (VNs) consist of the virtual IDC network and
   the virtual DC internal switching network.  These VNs are built on
   the basis of the physical networks.  VM migration is not affected
   by the physical network: as long as a VM stays within the scope of
   the VN, it is free to migrate if the necessary conditions are
   satisfied.  In addition, the network architecture and
   forwarding/switching capacity should match between the source
   network and the destination network, without causing any concern
   for the physical network.

   The physical characteristics of the network, such as VLAN, IP
   subnet, L2 protocol entities, QoS supporting entities, etc., are
   abstracted as the logical elements of a VN.  Because the VMs
   operate in the VN environment, each VM has its associated logical
   elements, such as CPU, processes, I/O, memory, disk, etc., and the
   VN also has a corresponding set of logical elements.

   In general, the VNs are isolated from each other.  The VMs within
   each VN communicate using their own internal addresses, and send
   and receive Ethernet packets.  VNs are not tied to a specific
   implementation; the implementation can use the Internet, L2VPN,
   L3VPN, GRE, etc.  From the VN layer, IP can be used to make that
   distinction.  Traffic traverses a firewall into the VN, and ACLs
   and other security policies are also needed in the access layer.

3.2.8.  Packet Encapsulation Problems

   In order to implement a virtual network (VN), a method similar to
   an overlay address is required.  The overlay address can be
   realized by VXLAN or by the I-SID of PBB+L2VPN.  The overlay
   address works as an identifier corresponding to every instance of a
   VN.  The implementation model requires that the edge switch or
   router acting as the DC GW perform the encapsulation and
   de-encapsulation of the tunnel packets.  The various VNs within the
   DC rely on the overlay address in order to distinguish and separate
   one from the other.  Each VN also contains 4K VLANs for its
   internal use.  The data packets travel to the DC interconnection
   network through the DC GW, and are encapsulated for subsequent
   transmission.

   The main issue related to the above is the support of
   encapsulation.  In an L2 network, VXLAN supports the VLAN expansion
   requirements.  In NVGRE, a similar problem is resolved in a
   different way.  Therefore, in order to achieve seamless migration
   of VMs across DCs that support different VLAN expansion mechanisms,
   unification of packet encapsulation methods is required.
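   To make the unification problem concrete, the sketch below models a
   DC GW that re-maps a VN's overlay identifier between two
   encapsulation domains.  The class names and the idea of a per-VN
   translation table are illustrative assumptions, not part of any of
   the cited proposals.

      from dataclasses import dataclass

      @dataclass
      class OverlayFrame:
          encap: str    # "vxlan" or "nvgre"
          vn_id: int    # 24-bit VNI (VXLAN) or TNI (NVGRE)
          inner: bytes  # original tenant Ethernet frame

      # Hypothetical per-VN mapping between the identifier spaces of
      # two domains; both VNI and TNI are 24 bits wide, so a 1:1
      # re-mapping is possible in principle.
      VN_TRANSLATION = {("vxlan", 0x1234): ("nvgre", 0x1234)}

      def translate(frame: OverlayFrame) -> OverlayFrame:
          """Re-encapsulate a frame at the border between two domains
          that use different VLAN-expansion mechanisms."""
          encap, vn_id = VN_TRANSLATION[(frame.encap, frame.vn_id)]
          return OverlayFrame(encap=encap, vn_id=vn_id, inner=frame.inner)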
3.2.9.  Network Bandwidth Efficiency Problem of Resource Use

   A single data center site cannot have infinite capacity, due to
   limitations of space, power, cooling, and the electrical and cable
   plant, so a large data center is usually composed of several
   geographically separated sites to ensure scalability and
   reliability.  Technologies such as OTV and VPLS-over-GRE have been
   proposed to support Layer-2 connection between different sites of
   one data center.  Inter-site bandwidth is limited compared with
   intra-site bandwidth, so it is apt to become the bottleneck of
   communications in the data center.  One proposal improves the
   utilization of inter-site bandwidth through IP compression
   technology; it describes the position and processing procedure of
   the IP compression model and its relationship with OTV and
   VPLS-over-GRE.  This problem and solution are described in
   [draft-sun-ip-compression-dcn-00].

3.2.10.  VM Migration Problem in Mixed IPv4 and IPv6 Environments

   With the proliferation of IPv6 technology, the existing IPv4
   networks will have attached IPv6 hosts.  This is driving the
   development of a series of tunnel technologies, e.g., 6to4 tunnel
   technology, ISATAP tunnel technology, and so on.  An ISATAP tunnel
   is a point-to-point automatic tunnel technology, and a 6to4 tunnel
   is a multipoint automatic tunnel technology which is mainly used
   for attaching multiple IPv6 islands over an IPv4 network to connect
   to the IPv6 network.  The ISATAP and 6to4 tunnel technologies work
   through an IPv4 address embedded in the destination address of the
   IPv6 packets, from which the tunnel endpoint is automatically
   obtained.
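   The address-embedding rule is mechanical, as the following sketch
   shows.  The 2002::/16 6to4 prefix and the ISATAP interface
   identifier ::0:5efe:a.b.c.d are the standard forms; the helper
   names are ours.

      from ipaddress import IPv4Address, IPv6Address

      def sixto4_prefix(v4: str) -> str:
          """6to4: the IPv4 address is embedded after the 2002::/16
          prefix, giving the site a 2002:V4ADDR::/48 prefix."""
          packed = IPv4Address(v4).packed
          return str(IPv6Address(b"\x20\x02" + packed + b"\x00" * 10)) + "/48"

      def isatap_address(prefix64: bytes, v4: str) -> str:
          """ISATAP: interface identifier ::0:5efe:<IPv4>, appended to
          an 8-byte /64 prefix."""
          iid = b"\x00\x00\x5e\xfe" + IPv4Address(v4).packed
          return str(IPv6Address(prefix64 + iid))

      # A tunnel endpoint can recover the IPv4 destination directly
      # from the IPv6 destination address, which is what makes these
      # tunnels automatic.
      assert sixto4_prefix("192.0.2.1") == "2002:c000:201::/48"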
   The following issues are pertinent to the migration of VMs across
   data centers in a mixed (IPv4 and IPv6) network environment.

3.2.10.1.  Real-time Perception of Availability of Global Network and
           Storage Resources

   In the current system, the status of availability of network
   resources and storage resources may not be reported in hard real
   time.  This may cause a mismatch between the reported and the
   actually available virtual machine/storage system resources in the
   data centers.  However, on a global scale, the compute and storage
   resources in the distributed data center system may need to be used
   more efficiently.  Without real-time, up-to-date information about
   system resource availability, the network resources cannot be used
   efficiently.  Therefore, a management model needs to be
   established.  This model needs to keep track of system-wide network
   resources and storage resources, and dispatch them on an as-needed
   basis.  The management model can be integrated into the framework
   of virtual machine migration currently being discussed in DMTF
   [DMTF VSMP].

   The real challenges here are how to learn about the availability of
   system-wide networking, compute, and storage resources.  A set of
   uniform methods, mechanisms, and protocols would be very useful to
   resolve these issues.

3.2.10.2.  The Real-time Perception of Globally Available Network
           Resources and Requested Network Resources for Matching with
           Storage Resources

   In mixed IPv4 and IPv6 networks, a multi-tunneling VPN gateway
   solution may be useful to resolve the problem of establishing
   communication between heterogeneous networks.  This will be helpful
   for supporting seamless communication across heterogeneous data
   centers about the availability of system-wide resources.

3.2.10.3.  The Real-time Perception of Globally Requested Network
           Resources for Matching with Storage Resources

   Access to data center virtual machine / storage resources can be
   accurately performed when we have a set of standardized APIs,
   resource (memory, storage, processing, communications, etc.)
   formats, and communication protocols.  The availability of virtual
   machine / storage system resources in the global scope needs to be
   registered, and their status needs to be reported to the resource
   management system in the cloud system.  Eventually, the resource
   management system in the cloud system is kept well informed of
   system-wide network resources.

3.2.11.  Selection of Migration

3.2.11.1.  Requirements with Different Network Environments and
           Protocols

   Currently in large-scale DCs, Layer-2 interconnection techniques
   are mainly used for the migration of virtual machines, but Layer-3
   interconnection techniques for VM migration also exist.  These two
   technologies are suitable for different implementation environments
   and scenarios.

   The former is often used for frequent data migration with strict
   requirements on data security, such as data migration and backup
   for banks, whereas the latter is commonly used for data migration
   for personal or mobile users, or bulk data transfer between
   different service providers.

   Because of users' demands for a unified management platform, it
   will become more and more important to build distributed PaaS
   across different cloud/DC service providers.  No user is willing to
   maintain too many independent platforms.  At the same time, sharing
   of resources across multiple data centers is becoming a major
   trend.  As a result, it will become very cumbersome for data center
   managers to build a large number of VPN connections for all data
   centers.  What may be needed is a portal operator, who can manage
   all the internal VPN connections between the clouds/DCs and can
   unify the scheduling of data/VM migration in order to achieve
   optimum utilization of resources.

3.2.11.2.  Requirements for Live Migration of Virtual Machines

   The scenarios for live migration of VMs across DCs include the
   following: (a) migration across IPv4 networks and across IPv6
   networks, (b) migration from IPv4 to IPv6 networks and vice versa,
   and (c) migration based on Mobile IP.

   Live migration of VMs may be more suitable for mobile applications
   for small-scale and home users.
   The complexity of the network can be fully shielded from the users,
   as long as both the source and the destination have either IPv4 or
   IPv6 addresses.  This migration paradigm can be more secure and
   applicable in a Layer-3 networking environment.

3.2.12.  Access and Migration of VMs without Users' Perception

   For VM migration without users' perception, it is required to
   achieve migration of VMs from one DC to another without causing any
   significant disruption of services.  In essence, the users should
   not be able to perceive that the VM migration has occurred.  To
   achieve this, none (or only an insignificant amount) of the
   critical data packets can be lost during the process of VM
   migration.  The following two considerations are helpful to achieve
   this:

   i.  First, taking the existence of the traffic roundabout problem
   as a prerequisite, consider how to minimize the roundabout traffic.

   ii. Second, taking the absence of traffic roundabout as a target,
   consider how to make both the migration and the traffic path
   changes imperceptible to the user.

   The following are the relevant problems and possible solutions in
   these two areas.

3.2.12.1.  VM Migration Problems and Strategies in the WAN with
           Traffic Roundabout as a Prerequisite

   [Figure: users a, b, and c attach through MAN A, MAN B, and MAN C,
   respectively, to a common backbone network.  VM-A migrates from a
   server behind the VM-A gateway in MAN A to a server behind the
   gateway in MAN B, while remote traffic still enters through the
   VM-A gateway in MAN A.]

                 Figure 1: Roundabout Traffic Scenario

3.2.12.1.1.  VM Migration Requirements

   For migration in a Layer-2 (L2) network, it is required to keep the
   VM MAC / IP addresses the same as they are in the source domain.
   This will help live VM migration and seamless inter-DC
   communications among the service providers.

3.2.12.1.2.  A Scenario

   Let us consider the scenario where a VM needs to be migrated from
   the IDC in metro A to the IDC in metro B.  There is almost no
   traffic roundabout for users within the metro (such as user a).
   For access to IDC services over the WAN, such as from a user in
   metro C, the client traffic must first reach the VM-A gateway after
   the VM migration, and is then sent to the migrated VM through the
   Layer-2 tunnel.

3.2.12.1.3.  A Possible Solution

   Through mechanisms such as the DNS service, businesses can access
   services from a location/DC which is as close as possible, and the
   roundabout routes can be minimized after migration.  However, the
   shortcoming of this approach is that, for access across the metro
   network, traffic roundabout issues remain.  This approach evades
   rather than completely solves the problem.  Moreover, additional
   processing is involved in the control of the DNS service, which
   increases the complexity of the solution.

3.2.12.2.  VM Migration Problems and Strategies in the WAN without
           Traffic Roundabout as a Target

   In this process of VM migration, in order to achieve real-time
   migration without users' perception, the entire state of the
   management programs (including firewalls) needs to migrate as the
   VMs migrate.  The state migration of the firewalls is the key to
   ensuring that the packets in the original firewalls' data flows are
   neither lost nor mis-routed during the VM migration.  Before a VM
   migrates to a new DC environment, the firewalls have recorded the
   existing VM connections' session tables.  After the VM migration,
   the firewalls in the new DC location will be used for access to the
   VM.  If the firewalls in the new location do not have the session
   tables of the original firewalls' data flows, packets will be lost
   or mis-routed: the original sessions will be disconnected, and the
   users' data flows will fail to access the VM.  To solve this
   problem, the original firewall's in-use session tables need to be
   migrated and synchronized with the session tables of the firewall
   at the new VM location.  A session table should contain at least
   the following information: source IP address, destination IP
   address, source port, destination port, protocol type, VLAN ID,
   time of expiration, and public guard information for firewall
   defense.
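   A minimal sketch of such a session entry and its transfer is shown
   below.  The field set follows the list above, while the
   serialization format and the function names are illustrative
   assumptions.

      import json
      from dataclasses import dataclass, asdict

      @dataclass
      class Session:
          src_ip: str
          dst_ip: str
          src_port: int
          dst_port: int
          protocol: int      # e.g., 6 = TCP, 17 = UDP
          vlan_id: int
          expires_at: float  # absolute expiration time
          guard_info: dict   # public guard information for defense

      def export_sessions(table: list[Session]) -> bytes:
          """Serialize the in-use session table for transfer to the
          destination firewall before the VM is switched over."""
          return json.dumps([asdict(s) for s in table]).encode()

      def import_sessions(blob: bytes) -> list[Session]:
          """Rebuild the table on the destination firewall so that
          existing flows are neither dropped nor mis-routed."""
          return [Session(**d) for d in json.loads(blob)]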
   Since the firewall's session table needs to migrate when a VM
   migrates, the deployment of the source and destination firewalls
   should be known in advance.  There are at least two kinds of
   firewall deployment.

   The first kind is centralized deployment.  In this case, the
   firewalls are placed at the connection point between the DC and the
   WAN.  Each DC has firewalls either on or adjacent to the core
   switches.  The second kind is distributed deployment.  In this
   case, the firewalls are distributed on the aggregation switches or
   access switches.  The former's advantages are convenient management
   and deployment; its disadvantage is that the firewalls can easily
   become a bottleneck because of centralized/aggregated processing.
   The latter's advantage is distributed processing of huge VM data
   flows in a large L2 network.

   After knowing the deployment of the firewalls, it is necessary to
   determine how to migrate the firewall session table from the source
   location to the destination location.  Since the location and
   number of firewalls differ between the centralized and distributed
   deployments, the mechanisms utilized to migrate the session tables
   in these two deployments are not exactly the same.  These are new
   challenges to be addressed for VM migration.

3.2.13.  Review of VXLAN, NVGRE, and NVO3

   In order to solve the problem of an insufficient number of VLANs in
   the DC, techniques like VXLAN and NVGRE have adopted two major
   strategies: one is encapsulation and the other is tunneling.  Both
   VXLAN and NVGRE use encapsulation and tunneling to create a large
   number of VLAN-like subnets, which can be extended over Layer-2 and
   Layer-3 networks.  This solves the problem of the limitation on the
   number of VLANs defined by IEEE 802.1Q, and helps achieve shared
   load balancing in multi-tenant environments in both public and
   private networks.

   The VXLAN technology was introduced in 2011, and it is designed to
   address the number restrictions of 802.1Q VLANs.  Technologies like
   MAC-in-MAC and MAC-in-GRE also extend the number of VLANs.
   However, VXLAN additionally attempts to address the issues related
   to inadequate utilization of link resources and to monitor packets
   after re-encapsulation of the header more effectively.  The frame
   format of VXLAN is the same as those of OTV and LISP, although
   these three solutions solve different problems of IDC
   interconnection and VM migration.  Also, in VXLAN, the packet is
   encapsulated in MAC-in-UDP, and the addressing is extended to 24
   bits, which is an effective solution to the restriction on VLAN
   numbers.  The UDP encapsulation enables the logical virtual network
   to extend across different subnets, and it also supports the
   migration of VMs across subnets.  The change in the frame's
   structure adds the field for extending the VLAN space.
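   For concreteness, the following sketch packs the 8-byte VXLAN
   header (flags plus the 24-bit VNI, as defined in
   [draft-mahalingam-dutt-dcops-vxlan-01]) in front of an inner
   Ethernet frame.  Deriving the outer UDP source port from a hash of
   the inner headers, for ECMP entropy, follows the practice described
   in that draft; the hash choice here is illustrative.

      import struct
      import zlib

      VXLAN_PORT = 4789  # IANA-assigned VXLAN destination port

      def vxlan_header(vni: int) -> bytes:
          """8-byte VXLAN header: flags (I bit set) + 24-bit VNI."""
          assert vni < 2**24
          return struct.pack("!II", 0x08 << 24, vni << 8)

      def entropy_source_port(inner_frame: bytes) -> int:
          """Hash the inner headers into the outer UDP source port so
          that ECMP in the underlay spreads tenant flows."""
          return 49152 + (zlib.crc32(inner_frame[:34]) % 16384)

      def vxlan_encapsulate(inner_frame: bytes, vni: int):
          """Return (udp_src_port, udp_payload) for the outer packet."""
          return (entropy_source_port(inner_frame),
                  vxlan_header(vni) + inner_frame)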
   Note that VXLAN solves a different problem than OTV.  OTV solves
   the problem of IDC interconnection by building an IP tunnel between
   different data centers through MAC-in-IP.  VXLAN mainly solves the
   problem of the limitation of VLAN resources in DCs due to the
   increase in the number of tenants; the key is the expansion of the
   VNI field to increase the number of VLANs.  Both techniques can be
   applied to VM migration, since the two packet formats are almost
   the same and completely compatible.

   NVGRE specifies a 24-bit Tenant Network Identifier (TNI) and
   resolves some issues related to supporting multiple tenants in a DC
   network.  It uses GRE to create an independent virtual Layer-2
   network, which can expand across subnet borders without requiring
   the physical Layer-2 network to expand.  Terminals supporting NVGRE
   insert the TNI indicators into the GRE headers to separate the
   TNIs.

   NVGRE and VXLAN solve the same problem, and the two technologies
   were proposed almost at the same time.  However, there are some
   differences between them: VXLAN not only adds the VXLAN header
   (VNI), but also adds an outer UDP encapsulation on the packet,
   which facilitates live migration of VMs across subnets.  In
   addition, differentiated services can be supported for tenants in
   the same subnet because of the use of UDP.  Both proposals are
   built on the assumption that load balancing is a necessary
   condition for efficient operation: VXLAN randomizes the source port
   number to achieve load balancing, while NVGRE uses the reserved 8
   bits in the GRE Key field.  However, there may be opportunities to
   improve the capabilities of the control plane for both mechanisms
   in the future.

3.2.14.  The East-West Traffic Problem

   Let us first discuss the background of the East-West traffic
   problem.  There are a variety of applications in the DC, such as
   distributed computing, distributed storage, and distributed search.
   These applications and services need frequent exchanges of
   transactions between the business servers across the DCs.
   According to the traditional three-tier network model, the data
   streams first flow north-south and only finally flow east-west.  In
   order to improve the forwarding efficiency of the data streams, it
   is necessary to update the existing network model and network
   forwarding technology.  Among others, the Layer-2 multipath
   technology being studied is one of the directions for solving this
   problem.

   Distributed computing is the basis of the transformation of the
   existing IT services.  It allows scalable and efficient use of the
   sometimes underutilized computing and storage resources scattered
   across the data centers.  In typical data centers, the average
   server utilization is often low in the existing network.
   The concepts of virtualization and distributed computing can solve
   the problem of the capacity limitation of a single server in
   demanding environments in certain DCs, via on-demand utilization of
   resources and without impacting performance.  This revolutionary
   technology of distributed computing and services using resources in
   the DCs also produces several horizontal (east-west) flows of
   traffic.  The application of distributed computing technology on
   the servers produces a large number of interactive traffic streams
   between servers.

   In addition, the type of IDC influences the traffic model both
   within and across data centers.  The first type of IDC is run by
   telecom operators, who usually not only operate DCs but also supply
   bandwidth for Level-2 ISPs.  The second type is the traditional ISP
   companies with strong capabilities.  The third type is IT
   enterprises which invest in the construction of DCs.  The fourth
   type is the high-performance computing (HPC) centers that are built
   by universities, research institutes, and other organizations.
   Note that in these types of DCs, the south-north traffic flow is
   significantly smaller than the horizontal flow, and this poses the
   greatest challenges to network design and installation.  In
   addition to the normal flow of traffic due to distributed
   computing, storage, communications, and management, hot backup and
   VM migration requirements produce sudden lateral flows of traffic,
   and associated challenges.

   There are three potential solutions to the distributed horizontal
   flow of traffic, as described below.

   A. The first one is to solve the problem of east-west traffic
   within the server clusters by exploiting representative
   technologies such as vSwitch, DCell, BCube, and DCTCP.

   B. The second solution works through the server network and
   Ethernet network, by exploiting technologies such as IEEE 802.1Qbg,
   VEPA, and UCS.

   C. The third solution is the network-based solution.  The tree
   structure of the traditional DC network is not inherently efficient
   for horizontal flows of traffic.  The problems can be solved in two
   ways: (i) the direction of radical change: radical deformations
   that change the tree structure to multipath; and (ii) the direction
   of mild improvement: change large L2 trees to small L2 trees and
   meet the requirements by expanding the interconnection capacity of
   the upper node, clustering/stacking systems, and link trunking.

   The requirements related to the above are as follows: stacking
   technology across the data center requires specialized interfaces,
   and the feasible transmission distance is limited.

   The problems related to the above include the following: (a)
   although TRILL resolves the multipath problem at the Layer-2
   protocol level, it negatively impacts the multipath properties of
   the Layer-3 protocol.  This is because the Virtual Router
   Redundancy Protocol (VRRP) supports only one active default router,
   which means that the multipath characteristics cannot be fully
   utilized at the Layer-3 protocol level.  In addition, TRILL does
   not define how to deal with the problem of overlapping namespaces,
   nor does it provide any solution to the requirement of supporting
   more than 4K VLANs.

3.2.15.  Data Center Interconnection Fabric Related Problems

   One of the most important factors that directly impact VMMI is the
   connectivity among the relevant data centers.  There are many
   features that determine this required connectivity.
   These features of connectivity include bandwidth, security, quality
   of service, load-balancing capability, etc.  They are frequently
   utilized to decide whether a VM can join a host in real time, or
   whether it needs to join a VRF in a certain unit of VMs.  This
   connectivity fabric should be open and transparent, which can be
   achieved by developing simple extensions to some of the existing
   technologies.  The scheme should have strong openness and
   compatibility; it must also be easy to deploy any required
   extensions.

   The requirements related to the above are as follows:

   o The negative impact of ARP, MAC, and IP entry explosion on an
     individual network which contains a large number of tenants
     should be minimized by DC and DC-interconnect technologies.

   o The link capacity of both the intra-DC and inter-DC networks
     should be effectively utilized.  Efficient utilization of the
     link capacity requires that traffic be forwarded on the shortest
     path between two VMs, both within a DC and across DCs.

   o Support of east-west traffic between customers' applications
     located in different DCs.

   o Management of VMs across DCs.

   o Mobility of VMs and their migration across DCs.

   Many mature VPN technologies can be utilized to provide
   connectivity between DCs.  The extension of VLANs and virtual
   domains between DCs may also be utilized for this purpose.

3.2.16.  MAC, IP, and ARP Explosion Problems

   Network devices within data centers encounter many problems in
   supporting the conventional communication framework, because they
   need to accommodate a huge number of IP addresses, MAC addresses,
   and ARP entries.  Each blade server usually supports at least 16-40
   VMs, and each VM has its own MAC address and IP address.  The
   growth of entities like the FDB table and the MAC table causes an
   increase in convergence time.  In order to accommodate this large
   number of servers, different options for the network topology, for
   example, a fat tree topology or a conventional network topology,
   may be considered.

   The number of ARP packets grows not only with the number of virtual
   L2 domains or ELANs instantiated on a server, but also with the
   number of VMs in each domain.  Therefore, scenarios like overload
   of ARP entries on the server/hypervisor, exhaustion of ARP entries
   on the routers/PEs, and processing overload of L3 service
   appliances must be efficiently resolved.  These problems can easily
   propagate throughout the Layer-2 switching network.  Consequently,
   what is needed to resolve these problems includes (a) automated
   management of MAC/IP/ARP in the IDC, and (b) network deployment
   that reduces the explosion in MAC table size requirements in DCs.
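   One commonly discussed mitigation, sketched below under our own
   naming, is a directory-based ARP proxy at the ToR or vSwitch: it
   answers ARP requests from a pre-populated mapping instead of
   flooding them, which addresses both the ARP explosion above and the
   flooding concern in the next subsection.

      # Hypothetical directory-based ARP proxy at the ToR/vSwitch
      # edge.  IP-to-MAC bindings are pushed by a management system,
      # so ARP requests are answered locally instead of being flooded
      # within the VLAN.

      ARP_DIRECTORY: dict[str, str] = {}  # "10.1.2.3" -> "52:54:00:ab:cd:ef"

      def learn(ip: str, mac: str) -> None:
          """Binding pushed by the orchestration system (e.g., on VM
          boot or after migration), not learned from flooding."""
          ARP_DIRECTORY[ip] = mac

      def handle_arp_request(target_ip: str) -> str | None:
          """Answer from the directory; None only for genuinely
          unknown hosts, the sole case where flooding may remain."""
          return ARP_DIRECTORY.get(target_ip)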
3.2.17.  Suppressing Flooding within VLAN

   Efficient operation of data centers requires that the flooding of
   broadcast, multicast, and unknown unicast frames within a VLAN
   (which may be caused by improper configuration) be reduced.

3.2.18.  Convergence and Multipath Support

   Although STP is used to solve the broadcast storm problem in loops,
   it may cause network oscillation, resulting in inefficient
   utilization of resources.  Possible solutions to this problem
   include switch virtualization, the use of TRILL and SPB, etc.
   Consequently, standardization of switch virtualization and support
   for complex network topologies in TRILL/SPB would be very helpful.

3.2.19.  Routing Control - Multicast Processing

   In order to achieve efficient operation of data centers, the
   overheads and delays due to the processing of (a) different types
   of packets, such as unicast, multicast, and broadcast, (b) ARP
   packets, and (c) load-balancing/-sharing mechanisms must be
   minimized.

   Note that STP bridging is often used to perform IGMP and/or PIM
   snooping to optimize multicast data delivery.  However, since this
   snooping mechanism operates on the local STP topology, all traffic
   goes through the root bridge of each spanning tree.  This type of
   traversal may lead to sub-optimal multicast traffic transmission.
   There are also additional overheads because each customer multicast
   group is associated with a forwarding tree throughout the Ethernet
   switching network.  Consequently, the development and
   standardization of efficient Layer-2 multicast mechanisms to
   support intra- and inter-DC VM mobility would be very useful.

3.2.20.  Problems and Requirements related to DMTF

   o Computing Resources
   It is required to standardize the format for virtualizing computing
   resources.  Best practices for utilizing a standardized format for
   the mobility and interconnection management of virtualized
   computing resources would also be very useful.

   o Storage Resources
   It is required to standardize the format for virtualizing storage
   resources.  Best practices for utilizing a standardized format for
   the mobility and interconnection management of virtualized storage
   resources would also be very useful.

   o Memory Resources
   It is required to standardize the format for virtualizing memory
   resources.  Best practices for utilizing a standardized format for
   the mobility and interconnection management of virtualized memory
   resources would also be very useful.

   o Switching Resources
   It is required to standardize the format for virtualizing switching
   resources.  Best practices for utilizing a standardized format for
   the mobility and interconnection management of virtualized
   switching resources would also be very useful.

   o Networking Resources
   It is required to standardize the format for virtualizing
   networking resources.  Best practices for utilizing a standardized
   format for the mobility and interconnection management of
   virtualized networking resources would also be very useful.

4.  Control & Mobility Related Problem Specification

4.1.  General Requirements and Problems of State Migration

4.1.1.  Foundation of Migration Scheduling

   A series of inspections needs to be done before initiating the VM
   migration process.  The hypervisor should be able to confirm which
   data centers need to be interconnected for migrating VM data over
   the network.  The hypervisor should also be able to confirm which
   subnets and servers in the current network are most suitable to
   accommodate the migrated VMs.

4.1.2.  Authentication for Migration

   For VM migration, authentication is required for all of the
   following entities: network resources, processor, memory and
   storage resources, load balancer, firewall, etc.

4.1.3.  Consultation for Assessing Migratability

   After successful authentication, it is required to check that the
   inter-DC networking resources can support the migration of VMs.
   The required resources include network bandwidth resources, storage
   resources, resource pool scheduling or management resources, and so
   on.  A sketch of these pre-migration checks follows.
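   The following sketch strings the steps of Sections 4.1.1 - 4.1.3
   together.  The check names and the order of evaluation are our own
   illustration of the consultation phase, not a standardized
   procedure.

      from typing import Callable

      # Each pre-migration check is a named predicate; all must pass
      # before the migration itself is scheduled (see also Figure 2
      # below).  The concrete checks mirror Sections 4.1.1 - 4.1.3.
      Check = tuple[str, Callable[[], bool]]

      def consult(checks: list[Check]) -> tuple[bool, list[str]]:
          """Run the consultation phase; return (ok, failed names)."""
          failed = [name for name, pred in checks if not pred()]
          return (not failed, failed)

      # Example wiring with stub predicates:
      ok, failed = consult([
          ("inter-DC connectivity", lambda: True),   # 4.1.1
          ("suitable subnet/server", lambda: True),  # 4.1.1
          ("authentication", lambda: True),          # 4.1.2
          ("bandwidth reservation", lambda: True),   # 4.1.3
          ("storage reservation", lambda: True),     # 4.1.3
      ])
      assert ok and not failed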
4.1.4. Standardization of Migration State

As an example of standardizing the VM state migration process, the related entities should be aware of each other's state. The flow of activities may be as follows: global detection -> authentication processing -> capability negotiation -> session establishment -> initialization of instance -> establishment of the beginning stage -> begin migration -> migration and migration-exception handling -> finish migration -> end stage -> destruction of instances -> (back to) global detection.

                +---------------------------+
                |                           |
               \|/                          |
      +--------------------+                |
      |  global detection  |                |
      +--------------------+                |
                |                           |
               \|/                          |
      +--------------------+                |
      |  authentication    |                |
      |  processing        |                |
      +--------------------+                |
                |                           |
               \|/                          |
      +--------------------+                |
      |  capability        |                |
      |  negotiation       |                |
      +--------------------+                |
                |                           |
               \|/                          |
      +--------------------+                |
      |  session           |                |
      |  establishment     |                |
      +--------------------+                |
                |                           |
               \|/                          |
      +--------------------+                |
      |  initialization    |  establish the |
      |  of instance       |  beginning     |
      +--------------------+  stage         |
                |                           |
               \|/                          |
      +--------------------+                |
      |  begin migration   |<-------+       |
      +--------------------+        |       |
                |                   |       |
               \|/                  |       |
            migration   Y    +------------+ |
            exception? ----->| exception  | |
                |            | processing | |
                | N          +------------+ |
               \|/                          |
      +--------------------+                |
      |  finish migration  |                |
      +--------------------+                |
                |                           |
               \|/                          |
      +--------------------+                |
      |  destruction       |  end stage     |
      |  of instances      |                |
      +--------------------+                |
                |                           |
                +---------------------------+

   Figure 2: A Flow Chart for State Migration between Data Centers

4.2. Mobility in Virtualized Environments

In order to support VM mobility, VMs must be able to migrate easily and repeatedly -- that is, as often as needed by the applications and services -- among a large number (i.e., more than two) of DCs. Seamless migration of VMs in mixed IPv4 and IPv6 VPN environments should be supported by using appropriate DC GWs.

VMs in the resource pool should support mobility. These mobile VMs can move either within a DC or from one DC to another, remote DC. The mobility can be triggered by factors such as a natural disaster, load imbalance, a cost-reduction campaign (space, electricity, etc.), and so on. When a VM is migrated to a new location, it should maintain the existing client sessions. The VM's MAC and IP addresses should be preserved, and the state of the VM's sessions should be copied to the new location.

Some widely used virtual machine migration tools require that the management programs on the source server and the destination server be directly connected via an L2 network. The objective is to facilitate smooth VM migration. One example of such a tool is VMware's vMotion virtual machine migration tool.

(1) Firstly, a vMotion ELAN may need to provide protection and load balancing across multiple DC networks.

(2) Secondly, in the current vMotion procedure, the new location of the VM must be part of the tenant ELAN domain. When the new VM is activated, a Gratuitous ARP is sent, and the MAC FIB entries in the tenant ELAN are updated to direct traffic for that VM to its new location, as illustrated in the sketch below.

(3) Thirdly, if the path requires IP forwarding, the reachability information of the VM must be updated so that traffic follows the shortest path to the VM's new location.
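As an illustration of step (2) above, the following sketch (in Python, assuming the Scapy packet library is available; the interface name and addresses are placeholders from the documentation ranges) emits the kind of Gratuitous ARP that refreshes the tenant ELAN's MAC FIB entries after a VM is activated at its new location:

   # Illustrative Gratuitous ARP after VM activation (cf. step (2));
   # assumes the Scapy packet library. Addresses are placeholders.
   from scapy.all import Ether, ARP, sendp

   vm_mac = "52:54:00:12:34:56"   # preserved MAC of the migrated VM
   vm_ip  = "192.0.2.10"          # preserved IP of the migrated VM

   garp = (Ether(src=vm_mac, dst="ff:ff:ff:ff:ff:ff") /
           ARP(op=2,                   # ARP reply ("is-at")
               hwsrc=vm_mac, psrc=vm_ip,
               hwdst="ff:ff:ff:ff:ff:ff", pdst=vm_ip))

   # Broadcast on the tenant ELAN; switches relearn the VM's MAC on
   # the port facing the new location and update their FIB entries.
   sendp(garp, iface="eth0")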
4.3. VM Mobility Requirements

4.3.1. Summary of Mobility

Mobility refers to the movement of a VM from one server to another server, within one DC or to a different DC, while the VM's original IP and MAC addresses are maintained throughout the process. VM mobility does not change the VLAN/subnet connection of the VM, and it requires that the serving VLAN be extended to the new location of the VM. In summary, a seamless mobility solution in the DC is based on IP routing, BGP/MPLS MAC-VPN, BGP/MPLS IP VPNs, and NHRP [VM-Mobility].

4.3.2. Problem Statement

The following are the major issues related to supporting seamless mobility of VMs.

The first problem is that the participating source server and destination server in the VM migration process may be located in different data centers. It may be required to extend the Layer-2 network beyond what is covered by the L2 network of the source DC. This may create islands of the same VLAN in different (geographically dispersed) data centers.

The second problem is that optimal forwarding in a VLAN that supports VM mobility may involve traffic management over multiple data centers.

The third problem is that support of seamless VM mobility across DCs may not always achieve optimal intra-VLAN forwarding.

The fourth problem is that support of seamless VM mobility across DCs may not always result in optimal routing.

5. Network Management Related Problem Specification

5.1. Data Center Maintenance

We note that the servers and the applications/services in the data center should maintain uninterrupted service during the migration process. The following are some prerequisites for providing uninterrupted service during migration (a sketch of the request-buffering step appears after this list):

o  It is required to ensure that networking and communication services remain uninterrupted between the source node and the destination node during the migration.

o  A stateful migration may be preferred. It may be desirable not to respond to users' requests until the migration completes successfully. The service management program in the source server records the current state of the VM and saves users' requests for any service/operation on the VM at the source node.

o  It is required to copy the state data of the source VM to the target VM in another DC; the new VM in the target node (DC) can then be activated to accept service requests.

o  The service management program in the source server needs to store (cache) both the operation requests and the current state of the source VM, and send them over the network to the service management program in the target server. As soon as the target server and VM become ready, the service management program in the target server delivers the received operation requests to the target VM. The target VM takes the received final state of the source VM as its initial operational parameters.

However, in real-life operations, a system malfunction may occur in any one of the above four steps/scenarios. For example, it may be difficult to ensure uninterrupted communication/networking between the source node and the destination node during the entire migration process.
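A minimal sketch of the request-buffering behavior described in the last bullet above (in Python; the class and method names are illustrative assumptions, not an actual product interface):

   # Minimal sketch of the source-side service management program's
   # buffering role (last bullet above). Names are illustrative.
   import queue

   class MigrationBuffer:
       def __init__(self):
           self.pending = queue.Queue()  # requests held during migration
           self.vm_state = None          # last recorded state of the VM

       def record_state(self, state):
           # Record the source VM's current state before cut-over.
           self.vm_state = state

       def hold_request(self, request):
           # During migration, do not answer; queue for later replay.
           self.pending.put(request)

       def drain_to_target(self, send):
           # Once the target VM is ready, ship the final state and
           # replay the buffered requests in arrival order via the
           # supplied 'send' callable (transport is out of scope).
           send(("state", self.vm_state))
           while not self.pending.empty():
               send(("request", self.pending.get()))

   buf = MigrationBuffer()
   buf.record_state({"sessions": 12, "dirty_pages": 4096})
   buf.hold_request("read /var/log/app.log")
   buf.drain_to_target(print)   # 'print' stands in for the transport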
Maintaining sustainable network QoS may be complex, and VM migration may take an excessively long time due to a lack of timely availability of the required nodal/DC resources.

If the VM migration time is excessively long, users may need to be allowed to continue using the source VM, and the data changes made during the migration must also be recorded. At the same time, measures must be taken to ensure that the amount of change in the database and applications is as small as possible. This will help achieve faster recovery, and the interruption due to VM migration will be almost imperceptible to the users.

It may be useful if the IETF proposes a standard definition of uninterrupted service for the VM migration scenario. This definition, along with its parameters, could be the basis for checking the maturity of various VM migration solutions. The definition should take into account the time that users/services can tolerate without perceiving any interruption in operation. The total time is the sum of the times required to execute the four steps/processes mentioned at the beginning of this section. It may be expected that the most mature solution for each of the steps/processes will offer the fastest and best solution to the VM migration process.

The next problem related to this topic is the physical device compatibility problem. When migrating a VM from one Physical Machine (PM) to another, if the VM depends on a special driver or hardware feature that is not available in the target PM, the migration process will fail. For example, if a VM uses IOMMU technology, which allows the VM to access real hardware directly (not emulated by the hypervisor, for high performance), and this device is not available in the target PM, the VM migration process will fail. Therefore, a basic requirement related to VM migration is a strict compatibility check between the source and target PMs before initiating the migration process; an illustrative sketch of such a check is given at the end of this section.

Another problem related to this topic is the migration of VMs between heterogeneous hypervisors. We note that some virtual network functions, such as the vSwitch in VMware, are implemented in the hypervisor.

Additional requirements related to the above are as follows: stateful and stateless VMMI processing need to be treated separately. Stateless VMMI processing refers to the fact that the protocol state of a transaction does not need to be preserved in memory. This lack of state means that if follow-up processing requires information from an earlier transaction, that information must be retransmitted, which can lead to a significant increase in the amount of data that needs to be transferred as the number of connections increases. For stateless VM migration, there is no need to transfer previous state information, and hence lightweight processing and fast response can be achieved.
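An illustrative version of the compatibility pre-check mentioned above (in Python; the capability names are invented placeholders, and a real check would compare hypervisor- and hardware-specific feature lists):

   # Illustrative source/target PM compatibility check (see the
   # physical device compatibility problem above). The capability
   # names are invented placeholders.

   def check_pm_compatibility(vm_required_caps, target_pm_caps):
       """Return the set of capabilities the target PM is missing."""
       return set(vm_required_caps) - set(target_pm_caps)

   missing = check_pm_compatibility(
       vm_required_caps=["iommu_passthrough", "sse4.2", "10g_nic"],
       target_pm_caps=["sse4.2", "10g_nic"])

   if missing:
       # Per the requirement above: refuse to start the migration.
       print("migration blocked; target PM lacks:", sorted(missing))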
5.2. Load Balancing after VM Migration and Integration

When virtual machines migrate between data centers, there may be a requirement to place computation near the users according to the "follow the sun" principle, or to balance load across multiple sites. In addition, to reduce energy consumption, cooling costs, and other similar expenses, virtual machines can be consolidated into fewer, less active data centers; this is the future trend of so-called "green" data centers. The challenge related to this topic is how to solve the resulting load-balancing problem.

For example, before the migration of a VM, the load on the source VM's server and its network traffic distribution may be locally balanced, and the load on the destination VM's server and its network traffic distribution may likewise be locally balanced. However, after the migration of the VM from the source server to the destination server, both the load conditions and the traffic distribution may remain unbalanced, even for some extended time period. Therefore, it may be useful to define and enforce a set of policies in order to allocate VMs and other networking and computing resources uniformly across data centers. Of course, the software, hardware, and networking environments of the source and destination servers should also be as similar as possible.

5.3. Security and Authentication of VMMI

During the VMMI / VM migration process, proper consideration must be given to security-related matters; this includes solving traffic-detour issues, ensuring that firewall functionalities are appropriately applied, and so on.

Therefore, in addition to authorization and authentication, appropriate policies and measures to check/enforce the security level must be in place while migrating VMs from one DC to another, especially from a private DC to a public DC in the Cloud [NIST 800-145, Cloud/DataCenter SDO Survey].

For example, when a VM is migrated to the destination DC network, the corresponding switch port of the VM and its host server should adopt the port policy of the source switch. The completion time of the VM migration and the time at which the policy is installed must be synchronized. If the former is earlier than the latter, services may not get a timely response; if the former is later than the latter, the required level of network security may not be in place for a period of time. What may be helpful in such an environment is the creation and maintenance of a well-designed interactive state machine.

5.4. Efficiency of Data Migration and Fault Processing

It may be useful to streamline the data before commencing VM migration. Incremental migration may help improve VM migration efficiency; for example, one may plan to transfer only the differential (changed) data during the VM migration process between two DCs. However, this strategy carries the risk of propagating faults between DCs.

In addition, if VM migration occurs between heterogeneous database systems, such as the transfer of data from an Oracle database on a Linux system to an SQL Server database on a Windows system, it is necessary to define the security measures and policies that apply when a fault occurs. The processing of VM migration may be slower when a database migration operation fails, and there may be a need to roll back to previous stable states for all of the databases involved in the VM migration. Similar issues are being discussed in the DMTF [DMTF VSMP] as well.

5.5. Robustness Problems

5.5.1. Robustness of VM Migration

During normal operations, VMs may encounter a series of challenges, e.g., CPU overload, memory and storage stress, disk space limitations, excessive program response time, database write failures, file system failures, etc. If any of these issues cannot be resolved in a timely fashion, it will lead to the collapse of the VM migration process. A minimal detection sketch is given below.
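A minimal sketch of detecting such conditions before or during migration (in Python; the thresholds and metric names are illustrative assumptions):

   # Minimal health check for the conditions listed in Section 5.5.1.
   # Thresholds and metric names are illustrative assumptions.

   THRESHOLDS = {
       "cpu_util":     0.95,   # fraction of CPU busy
       "memory_util":  0.90,   # fraction of memory in use
       "disk_free_gb": 5.0,    # minimum free disk space
       "resp_time_ms": 2000,   # maximum program response time
   }

   def health_issues(metrics):
       """Return the list of threshold violations in 'metrics'."""
       issues = []
       if metrics["cpu_util"] > THRESHOLDS["cpu_util"]:
           issues.append("CPU overload")
       if metrics["memory_util"] > THRESHOLDS["memory_util"]:
           issues.append("memory stress")
       if metrics["disk_free_gb"] < THRESHOLDS["disk_free_gb"]:
           issues.append("disk space limitation")
       if metrics["resp_time_ms"] > THRESHOLDS["resp_time_ms"]:
           issues.append("excessive response time")
       return issues

   # If issues persist, trigger the snapshot-based recovery described
   # next, rather than letting the migration collapse.
   print(health_issues({"cpu_util": 0.99, "memory_util": 0.50,
                        "disk_free_gb": 40.0, "resp_time_ms": 120}))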
As a part of the recovery process, the VM management process should take a snapshot of all data in the VM and copy it into a blank VM (VM template) on the current or a distant server, with the objective of preventing any service disruption. The snapshot can be stateful or stateless, depending on (a) the status, nature, and function of the owner to which the various data in the VM belong, and (b) the replication strategy. For example, for the data in a database, a stateful snapshot needs to be taken, because the database itself has the ability to record its own running state.

We note that incremental migration of VM state alone is not sufficient to guarantee service continuity; an alternative solution may be warranted. During the VM migration process, if the rate at which the VM writes (dirties) data exceeds the rate at which data is transferred from the source VM location to the destination VM location, the VM state transfer has to be paused to allow time for bulk data transfer. During this adjustment period, service downtime will occur. It is required to develop methods and mechanisms to overcome such service discontinuity.

5.5.2. Robustness of VNE

During normal operations, VNEs may encounter a series of challenges, e.g., CPU overload, memory stress, space limitations of the MAC table and the forwarding table, lack of routing convergence, excessive program response time, file system failures, etc. If any of these issues cannot be resolved in a timely fashion, it will lead to the collapse of the VNE migration.

As a part of the recovery process, the VNE management process should take a snapshot of all data in the VNE and copy it into an idle/unassigned VNE on the current or a distant node, with the objective of preventing any service disruption. The snapshot can be stateful or stateless, depending on (a) the status, nature, and function of the owner to which the various data in the VNE belong, and (b) the replication strategy.

For example, for a stateful snapshot of a VNE, both the protocol state and the contents of the forwarding table need to be captured and transferred to the new (migrated) location of the VNE.

6. Acknowledgement

The following experts have provided valuable comments on earlier versions of this draft: Thomas Narten, Christopher Liljenstolpe, Steven Blake, Ashish Dalela, Melinda Shore, David Black, Joel M. Halpern, Vishwas Manral, Lizhong Jin, Juergen Schoenwaelder, Donald Eastlake, and Truman Boyes. We express our sincere thanks to them and hope that they will continue to provide suggestions in the future.

7. References

[PBB-VPLS] Balus, F., et al., "Extensions to VPLS PE model for Provider Backbone Bridging", draft-ietf-l2vpn-pbb-vpls-pe-model-04.txt (work in progress), October 2011.

[VM-Mobility] Aggarwal, R., et al., "Data Center Mobility based on BGP/MPLS, IP Routing and NHRP", draft-raggarwa-data-center-mobility-01.txt (work in progress), September 2011.

[DCN Ops Req] Dalela, A., "Datacenter Network and Operations Requirements", draft-dalela-dc-requirements-00.txt (work in progress), December 30, 2011.

[DMTF VSMP] DMTF, "Virtual System Migration Profile", DSP1081, Version 1.0.0c, May 2010.

[VPN Applicability] Bitar, N., "Cloud Networking: Framework and VPN Applicability", draft-bitar-datacenter-vpn-applicability-01.txt (work in progress), October 2011.

[VXLAN] Mahalingam, M., et al., "VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-01.txt (work in progress), February 24, 2012.

[NIST 800-145] Mell, P. and T. Grance, "The NIST Definition of Cloud Computing", NIST Special Publication 800-145, http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf, September 2011.

[Cloud/DataCenter SDO Survey] Khasnabish, B. and C. JunSheng, "Cloud/DataCenter SDO Activities Survey and Analysis", draft-khasnabish-cloud-sdo-survey-02.txt (work in progress), December 28, 2011.

[NVGRE] Sridharan, M., et al., "NVGRE: Network Virtualization using Generic Routing Encapsulation", draft-sridharan-virtualization-nvgre-00.txt (work in progress), September 2011.

[NVO3] Narten, T., "NVO3: Network Virtualization", l2vpn-9.pdf, November 2011.

[Network State Migration] Gu, Y., "draft-gu-opsawg-policies-migration-01", draft-gu-opsawg-policies-migration-01.txt (work in progress), October 2011.

[Matrix DCN] Sun, et al., "Matrix Fabric based Data Center Network", draft-sun-matrix-dcn-00.txt (work in progress), 2012.

8. Security Considerations

To be added later, on an as-needed basis.

9. IANA Considerations

The extensions discussed in this draft are related to the DC operations environment.

10. Normative References

[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.

Authors' Addresses

Bhumip Khasnabish
ZTE USA, Inc.
55 Madison Avenue, Suite 160
Morristown, NJ 07960
USA
Phone: +001-781-752-8003
Email: vumip1@gmail.com, bhumip.khasnabish@zteusa.com

Bin Liu
ZTE Corporation
15F, ZTE Plaza, No. 19 East Huayuan Road, Haidian District
Beijing 100191
P.R. China
Phone: +86-10-59932098
Email: richard.bohan.liu@gmail.com, liu.bin21@zte.com.cn

Baohua Lei
China Telecom
118, St. Xizhimennei, Office 709, Xicheng District
Beijing
P.R. China
Phone: +86-10-58552124
Email: leibh@ctbri.com.cn

Feng Wang
China Telecom
118, St. Xizhimennei, Office 709, Xicheng District
Beijing
P.R. China
Phone: +86-10-58552866
Email: wangfeng@ctbri.com.cn