Internet Engineering Task Force V. Grado Internet-Draft T. Tsou Intended status: Informational Huawei Technologies Expires: December 15, 2011 N. So Verizon Communications Inc. June 13, 2011 Virtual Resource Operations & Management in the Data Center draft-tsou-vrom-problem-statement-02 Abstract TThe dynamic allocation of computing resources on a massive scale through the use of virtual machines to serve a large number of customers and applications simultaneously brings a number of benefits but also a number of challenges to data center operations. Such challenges range from acquiring the information needed to provision the physical servers, storage and networking elements, through accounting for resource and application usage at the user level. In particular, this document describes the problem of operational and management challenges that virtualization brings in the (carrier) data center as an enabler of new technologies such as self- provisioning and elastic capacity and related benefits of consolidation, reduced total cost of ownership, and energy management. This document does not cover the problem of address resolution in massive data centers. It does not cover technologies related to VDI either. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on December 15, 2011. Copyright Notice Grado, et al. Expires December 15, 2011 [Page 1] Internet-Draft VRO&M June 2011 Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Operational Challenges for Virtualization . . . . . . . . . . . 4 3.1. Unique Requirements from Virtualization . . . . . . . . . . 4 3.2. Operations and Management in a Virtualized DC . . . . . . . 5 4. VM Performance and Configuration Management Challenges . . . . 6 4.1. Performance Management Challenges in Virtualization . . . . 6 4.2. VM Configuration and Inventory Operational Challenges . . . 6 5. Operational Challenges in Services with Virtual Resources . . . 6 5.1. VM Migration Operational Challenges . . . . . . . . . . . . 7 5.2. VPC Operational and Management Challenges . . . . . . . . . 7 6. Conclusion and Recommendation . . . . . . . . . . . . . . . . . 8 7. Manageability Considerations . . . . . . . . . . . . . . . . . 8 8. Security Considerations . . . . . . . . . . . . . . . . . . . . 8 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 8 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 8 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 8 Grado, et al. Expires December 15, 2011 [Page 2] Internet-Draft VRO&M June 2011 1. Introduction There is currently a strong movement toward virtualization of data center resources, with the aim of improving physical resource utilization, reducing energy consumption as a result, and improving responsiveness to demands for data center resources. Along with this is a parallel movement toward outsourcing data center operations, with the result that multiple enterprises may share the same physical resources for their own computing and storage requirements. Both in- house and outsourced data center virtualization raise obvious concerns over data security and regulatory compliance, but this is just one aspect of the operational and management challenges raised by large-scale resource virtualization. IThe basic unit of resource virtualization in this architecture is the virtual machine (VM), running over a "hypervisor" layer and sharing a physical server with other virtual machines and a management entity. The virtual machine has its own guest operating system, set of one or more applications, and allocations of processing, storage, and networking resources. The need to mix and match products from different vendors can lead to interoperability challenges that need to be addressed by standards from the start, or risk vendor lock-in. This document focuses on the problem statement of various data center virtual resources operations and management areas. This document does not cover the problem of address resolution in massive data centers nor the problem of technologies known as VDI. 2. Terminology CE: Customer Edge DC: Data Center DE: Data Center Edge PE: Provider Edge VDC: Virtualized Data Center VDI: Virtualized Desktop Infrastructure VPC: Virtual Private Clouds (or VPN based Clouds) VM: Virtual Machine (or host) Grado, et al. Expires December 15, 2011 [Page 3] Internet-Draft VRO&M June 2011 SLA: Service Level Agreement 3. Operational Challenges for Virtualization 3.1. Unique Requirements from Virtualization There are operational challenges and requirements ensuing from virtualized resources that are unique and not present in conventional implementations. In virtualized resources, a virtual machine (VM) embodies virtual hardware that is emulated by a hypervisor (or a similar mechanism in virtual networking resources), that mediates all interactions with the underlying physical hardware. That mechanism is transparent to the guest operating system, running completely independent of other guests VM in the same physical resources. The hypervisor then performs the mapping between the virtual resources of the VM (usually an application and a guest operating system) and the physical hardware of a server, storage, or network. The hypervisor is the component responsible for managing physical resources to allocate them fairly to the multiple VMs running on a host. The main physical resource pools that the hypervisor needs to manage for carrying out its job are as follows: o CPU: An configurable amount of CPU assigned to a VM, during creation, regardless of the real amount of physical CPU. A CPU scheduler is used by the hypervisor to process the CPU requests from the VMs. o Disk: A single large file allocated on one the host's datastores as a virtual disk for each VM. Disk I/O requests are also queued for each VM. o Memory: A fixed amount of memory that gets mapped into virtual memory pages and in turn to physical memory pages. The hypervisor must ensure there is no overallocation of virtual memory that the physical memory cannot handle. o Network: It includes a virtual network to provide the same functionality as a physical network, including IP address, virtual NIC, switches and firewalls. Because the network traffic is only handled between VMs, many times there is no visibility to external physical tools. Grado, et al. Expires December 15, 2011 [Page 4] Internet-Draft VRO&M June 2011 3.2. Operations and Management in a Virtualized DC From the above, a number of operational challenges arise in a virtualized environment that cover different leves of service in a DC. Some of challenges are: 1. New devices and elements * Monitor VM lifecycle, including VM migration ("lift & shift") * Address management for VM lifecycle support * Resource monitoring for faults and abnormal conditions * Resource availability, peformance metrics and usage based metering * Hypervisor status and interface monitoring 2. Infrastructure management support * Connectivity needs for virtualization management * Policy management and enforcement in the Virtualized DC * IPFIX for virtualization performance management * Interoperability of multiple hypervisors * Open programmatic interfaces to support access and management of Datacenter contents and resources 3. Service management enablement * Supporting secure low-latency VLAN and VPN connections in large scale on on-demand (pay as you go) basis for capacity management of dedicated pool of resources * Service hosting, co-location, and distributed virtualized redundancy for seamless scaling * Facilities management including premises, security, privacy and data integrity management for regulatory compliancy * Management of VPN-based clouds Grado, et al. Expires December 15, 2011 [Page 5] Internet-Draft VRO&M June 2011 4. VM Performance and Configuration Management Challenges 4.1. Performance Management Challenges in Virtualization From the discussion in the previous sections, it is clear that performance management for virtualized resources is a very critical area. This can include capacity management, as well as availability management, since they provide a status of the health of the network resources and services, including the health of the VMs and hypervisors. While a hypervisor is in charge of load balancing and keeping tab of resource utilization, additional mechanisms need to be in place to obtain the best performance. There is a need to obtain key metrics from the VMs that can be used to support a more robust management of resources and services. A protocol such as IPFIX that can tap into VMs to obtain flow metrics needs to be devised (or existing ones enhanced). The source of the metrics for virtualized resources are diferent from the sources in physical resources, as described above. Metrics such as uptime that are provided by mechanisms within IPPM and PMOL recommendations need to be obtained for virtualized hosts as well. 4.2. VM Configuration and Inventory Operational Challenges Another critical challenge arising from the creation of virtual hosts is 'sprawl' that can happen over time when there is lack of control and monitoring in the lifecycle of a large quantity of VMs. Besides service and performance problems that might arise, configuration management issues will ensue from VM that are consuming resources in the background, if unmonitored, some may become out of sync with policy and compliance, with fine-tuning applications, with bandwith management, etc. There is a need to carry out configuration management for virtual resources and hosts that include discovery, inventory and backup, for the both the virtual and physical resources. There is a need for a protocol like NETCONF that also covers virtual hosts. 5. Operational Challenges in Services with Virtual Resources Grado, et al. Expires December 15, 2011 [Page 6] Internet-Draft VRO&M June 2011 5.1. VM Migration Operational Challenges VM migration, also called by other names such as VM Motion, "lift & shift", etc., implies moving a VM to another location within a data center, or even to a different data center, with the consequent operational and management challenges. Just to name a few of the challenges: o Policy reconfiguration in the destination device o Other dynamic information updating in destination o Address management and reconfiguration when involving different data centers In addition, there are a few complex models for the interconnection of service providers supporting virtualized resources already working their way into real implementations that will allow more complex VM migration schemes but that also represent their own set of operational and management challenges. 5.2. VPC Operational and Management Challenges From the virtualized resources deployment models, this model brings together most of the operational requirements into the unified computing stack, and in particular the network side, directly. VPC embodies services that are delivered over a virtual private network (VPN) and therefore the VPN protocols and implementations need to be enhanced to support the "characteristics" mentioned above in addition to their own operational requirements. At a higher level, a VPC needs to meet SLA and enfore policies, meet demands from the management of order requests, self-provisioning, usage-based metering, and management, either through programmatic interfaces or by other means. There are two cases, 1) the pure play virtualization-enabled provider, that needs to use another carrier to interconnect the different data centers, and 2) the carrier offering over their own network. In both of those cases, the VPN protocols need operational enhancements to support end-to-end SLA monitoring or even just internal service level objectives in addition to customer SLA, all which requires corresponding metrics, based on usage per resource and per customer. In the second case, there are additional improvements that can be made as in the case of the deployment of a DC Edge device (DE), a box in between the DC and the network edge that will simplify Grado, et al. Expires December 15, 2011 [Page 7] Internet-Draft VRO&M June 2011 provisioning and operations by eliminating the need of a CE-PE pair. 6. Conclusion and Recommendation With the new networking, server and storage technologies converging in to the DC in the form of unified computing solutions at whose core is a virtualization stack, many new operational and management challenges arise. Therefore, we recommend that the IETF engage in the study of the problem of virtualized resources operations and management and, if appropriate, the development of interoperable solutions. 7. Manageability Considerations This document does not add additional manageability considerations. 8. Security Considerations To come. 9. IANA Considerations This memo currently includes no request to IANA. 10. Acknowledgements Awaiting comments. Authors' Addresses Victor M. Grado Huawei Technologies 2330 Central Expwy Santa Clara, CA 95050 US Phone: Email: vgrado@huawei.com Grado, et al. Expires December 15, 2011 [Page 8] Internet-Draft VRO&M June 2011 Tina Tsou Huawei Technologies 2330 Central Expwy Santa Clara, CA 95050 US Phone: Email: tena@huawei.com Ning So Verizon Communications Inc. 2400 N. Glenville Ave Richardson, TX 75080 US Phone: Email: ning.so@verizonbusiness.com Grado, et al. Expires December 15, 2011 [Page 9]