Network Working Group N. Zong Internet-Draft Huawei Technologies Intended status: Informational July 15, 2013 Expires: January 16, 2014 Problem Statement for Reliable Virtualized Network Function (VNF) Pool draft-zong-vnfpool-problem-statement-00 Abstract Network Function Virtualization (NFV), conceptualized by the European Telecommunications Standards Institute (ETSI) NFV Industry Specification Group (ISG) , is gaining significant momentum within the the telecoms industry. A key area currently being discussed within the ETSI NFV ISG is the reliability and availability of the network service implemented by a set of Virtualized Network Functions (VNFs) building on top of the virtualization infrastructure. This document mainly focus on problem statement and gap analysis. It provides an overview of the problem space related to NFV reliability, it then briefly reviews an applicable architecture to scope potential solution space. Finally it identifies the gaps of several existing approaches to NFV reliability for potential reuse and extension. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on January 16, 2014. Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. Zong Expires January 16, 2014 [Page 1] Internet-Draft Reliable VNF Pool July 2013 This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Background . . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 3 3.1. Failure on Hypervisor . . . . . . . . . . . . . . . . . . 4 3.2. Reliable Data Connection . . . . . . . . . . . . . . . . 4 3.3. Failure Separation . . . . . . . . . . . . . . . . . . . 4 3.4. Reliability Class . . . . . . . . . . . . . . . . . . . . 5 4. Reliable Virtualized Network Function Pool . . . . . . . . . 5 5. Gap Analysis and Related Works . . . . . . . . . . . . . . . 7 5.1. Reliable Server Pool . . . . . . . . . . . . . . . . . . 7 5.2. Multipath TCP . . . . . . . . . . . . . . . . . . . . . . 8 5.3. VNF Forwarding Graph . . . . . . . . . . . . . . . . . . 9 6. Security Considerations . . . . . . . . . . . . . . . . . . . 9 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 10 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 10 9.1. Normative References . . . . . . . . . . . . . . . . . . 10 9.2. Informative References . . . . . . . . . . . . . . . . . 10 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 10 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 10 1. Background Network Function Virtualization (NFV) utilizes IT virtualization technology to consolidate application specific network equipment onto industry standard high volume servers and switches, which could be located within the Data Center (DC), network nodes and customer premises. For a full overview from the principle operators involved with establishing NFV see [NFV-WP]. The European Telecommunications Standards Institute (ETSI) established an NFV Industry Specification Group (ISG) to study the operator use cases, high level requirements, and functional architecture of NFV. A key group within the NFV ISG is the Reliability and Availability Working Group (RELAV WG) tasked with identifying and documenting the important aspects of NFV reliability Zong Expires January 16, 2014 [Page 2] Internet-Draft Reliable VNF Pool July 2013 and availability. A fundamental objective is to manage Virtualized Network Function (VNF) building on top of the virtualization infrastructure to meet a various levels of reliability and availability requirements of the network services [NFV-REL]. In this document, we firstly overviews the problem space related to NFV reliability. We then present an applicable architecture of reliable VNF to scope potential solution space. This document is not intended as a catchall for all reliability issues identified by the RELAV working group. We focus on providing the base for reliable VNF instances and reliable transport connections between VNF instances, which will support a wide range of network services. Finally, we identify the gaps of several existing approaches to NFV reliability for potential reuse and extension to existing mechanisms. 2. Terminology Network Service: a service (e.g. telephony, messaging, Internet connectivity) that is composed of a set of network functions (e.g. firewall, load balancer). Virtualized Network Function (VNF): a VNF provides the same functional behaviour and interfaces as the equivalent network function, but is deployed as software instances (e.g. VMs) inside the virtualization infrastructure (e.g. hypervisor) [NFV-TERM]. VNF Pool: a set of software instances (e.g. VMs) where each instance can be configured to implement a specific VNF, and connected with each other to support network service. Pool Manager (PM): an entity that interacts with VNF pool for pool management, and network service for service request and response. 3. Problem Statement In the context of NFV, a network service essentially consists of a set of VNF with each VNF building on top of virtualization infrastructure to implement a specific VNF, and the data connections between VNFs, as shown below. Network Service (e.g. VOIP, Web) +----------+ +----------+ +----------+ | VNF | data conn | VNF | data conn | VNF | | |-----------| |- ... ... -| | | +----------+ +----------+ +----------+ | |_____________________________________________________________| ^ | Virtualization Zong Expires January 16, 2014 [Page 3] Internet-Draft Reliable VNF Pool July 2013 +----------------------------------------------------------+ | Virtualization Infrastructure (e.g.hypervisors) | +----------------------------------------------------------+ Figure 1: Network Service and Virtualized Network Function. Although there are existing approaches to reliable network service, such as active-standby server mode, there are several reliability issues unique to the NFV environment. These issues are described in the following subsections. 3.1. Failure on Hypervisor It is likely that more than one VNF instance will run on top of a single hypervisor. Thus the failure of a single hypervisor will affect multiple VNF instances which will potentially result in a wider failure of network services. Therefore, the need to ensure that the hypervisor does not become a single point of failure is critical. Failover approaches to hypervisor layering, including hypervisor monitoring, fault detection, resource scaling, migration of VNF instances between host hypervisors also need consideration. 3.2. Reliable Data Connection Establishing reliable VNF instances is important, as is reliable data connections between VNF instances for reliable communication within the network service. It is possible to achieve a certain level of resiliency for data connection utilizing hypervisor layers, e.g., virtual network interfaces, however there will always be potential factors such as congestion and link failures in the physical network layer that will affect the data connections between VNF instances. Therefore, failover mechanisms which include link status monitoring and redirection of traffic from the fault affected data path to another data path is required. 3.3. Failure Separation It is important to limit the impact on the service performance in case of potential failure. The failure of a single hypervisor should affect a minimum number of VNF instances, as well as a minimum number of concurrent network services, based on the fact that multiple VNF instances in one or more network services are likely to be hosted by the same hypervisor. Therefore, an application may need to define some affinity rules regarding the deployment of VNF instances, e.g., separate hypervisors, separate DC sites. Zong Expires January 16, 2014 [Page 4] Internet-Draft Reliable VNF Pool July 2013 3.4. Reliability Class It is well-known that network services will require different levels of reliability. For example, real-time applications will required reliable VNF instances to negate disruption to delay-sensitive services. However, different VNF instances may have varying reliability and performance due to some varying factors, these include physical resource state (e.g., server load, network bandwidth). Therefore, a network service may need to request for certain class of reliability which would be provided once the admission control (policy) has established application or user rights for the requested reliability level. 4. Reliable Virtualized Network Function Pool Some implementations of Cloud Management System (CMS) may have already provided APIs to the cloud applications to improve reliability, such as status notifications from different Infrastructure as a Service (IaaS) products via plug-ins. However, certain degree of standardization is required in order to allow application, orchestrator and virtualization infrastructure to be developed independently from each other. Reliability and availability is a wide ranging problem space within the ETSI NFV ISG and wider NFV environment. We don't target on solving all the reliability issues of NFV. Instead, we focus on developing some tools to improve the reliability of NFV by using reliable VNF pool - including the set of reliable VNF instances and the reliable transport connections between VNF instances, which are applicable to a wide range of network services. We introduce an applicable architecture of reliable VNF pool. Note that the main purpose of this section is to scope potential solution space. The specification of the components and interfaces of the reliable VNF pool should be addressed in separate drafts [RSNDP]. A high level diagram of reliable VNF pool is illustrated as below. +-----------------+ | Network Service | +-----------------+ ^ | Service Request / Response V +--------------+ | Pool Manager | +--------------+ ^ | Pool Management Zong Expires January 16, 2014 [Page 5] Internet-Draft Reliable VNF Pool July 2013 | (e.g. VNF instance failover, | transport conn failover) V +--------------------------------------------------+ | +----------+ +----------+ +----------+ | | | VNF | | VNF | | VNF | | | | Instance | | Instance | ... ... | Instance | | | +----------+ +----------+ +----------+ | | VNF Pool | +--------------------------------------------------+ Figure 2: Reliable VNF Pool. There are two major parts in the reliable VNF pool. The first part is the interface between Pool Manager (PM) and VNF pool, which may include the following functions: 1) VNF instance registration to PM. Characteristics of a VNF instance include instance ID, VNF type, host hypervisor ID. 2) VNF instance status collection and fault management by PM. The status of VNF instance may include load, liveness. Fault management of VNF instance may include replacement of VNF instance, as well as re-establishment of the associated transport connections with other VNF instances. 3) Hypervisor status collection and fault management by PM. Fault management of hypervisor may include migration of VNF instances between host hypervisors, as well as re-establishment of the associated transport connections. 4) Transport connection status collection and fault management by PM. The status of transport connection may include congestion, link failure. Fault management of transport connection may include redirection of data traffic from one path to another. The second part is the interface between PM and network service, which may include the following functions: 1) Reliability class. A network service may request to a PM the desired or required class of reliability of the service. The PM will accordingly select VNF instances and establish transport connections to fulfill specific reliability requirements; otherwise a PM will notify the service regarding the resource availability. Zong Expires January 16, 2014 [Page 6] Internet-Draft Reliable VNF Pool July 2013 2) Failure separation. A network service may request to a PM specific affinity rules regarding the deployment of VNF instances, e.g., separate hypervisors, separate DC sites. The PM will select VNF instances based on the affinity criteria to fulfill the request; in the event that the request cannot be met the PM will notify the service the resource availability. 5. Gap Analysis and Related Works This section presents some prior work and discusses the suitability of existing solutions. Where applicable, the document will also highlight work which may be extended to meet the requirements and objectives of reliable VNF pools. 5.1. Reliable Server Pool Reliable Server Pool (RSerPool) supports high availability and scalability of applications through the use of pools of servers [RFC5351]. The basic functions of RSerPool are: 1) Server pool management including server registration, server fault management, load balancing, etc.; 2) Receive requests and a way for the client to bind to a desired server. The main protocol developed by RSerPool is called Aggregate Server Access Protocol (ASAP), which is responsible for the abstraction of the transport layer protocols (e.g. TCP, SCTP), load balancing, fault management, as well as presentation to the applications via a unified primitive interface [RFC5352]. The architecture of RSerPool is shown as below. +--------------+ | Application | +--------------+ ^ | Service Request / Response V +--------------+ | PR | +--------------+ ^ | Pool Management | (e.g. PE failover) V +--------------------------------------------------+ | +----------+ +----------+ +----------+ | Zong Expires January 16, 2014 [Page 7] Internet-Draft Reliable VNF Pool July 2013 | | PE | | PE | ... ... | PE | | | +----------+ +----------+ +----------+ | | Server Pool | +--------------------------------------------------+ Figure 3: Reliable Server Pool. The similarity and applicability of RSerPool to reliable VNF pool includes: 1) Pool Elements (PEs) can be regarded as VNF instances; 2) Pool Registrar (PR) in RSerPool has similar role with PM in reliable VNF pool in the perspective of PE registration, PE fault management, PE selection, etc. Nevertheless, there are some gaps for RSerPool such as: 1) No reliability class and failure separation support; 2) No transport layer reliability between any pair of VNF instances; 3) No hypervisor layer fault management. 5.2. Multipath TCP Multipath TCP (MPTCP) is a modified version of TCP that implements a multipath transport and achieves enhanced reliability of the data connection by pooling multiple paths within a transport connection, transparently to the application [RFC6182]. MPTCP is primarily concerned with utilizing multiple paths end-to-end, where one or both of the end hosts are multi-homed. The following diagram illustrates a typical usage scenario for MPTCP [RFC6182]. +------+ __________ +------+ | |A1 ______ ( ) ______ B1| | | Host |--/ ( ) \--| Host | | | ( Internet ) | | | A |--\______( )______/--| B | | |A2 (__________) B2| | +------+ +------+ Figure 4: Scenario of MPTCP. The applicability of MPTCP to reliable VNF pool includes: Zong Expires January 16, 2014 [Page 8] Internet-Draft Reliable VNF Pool July 2013 1) Transport layer reliability based on multiple transport between VNF instances. There are some identified constraints to the implementation of MPTCP such as potential shared bottlenecks, interpose of middleware terminating TCP sessions which is common in the context of end-to-end virtualized network service. Other gaps for MPTCP will be further studied. 5.3. VNF Forwarding Graph VNF forwarding graph (a.k.a. service chain in a wider sense) defines the sequence of VNF instances per user session must traverse [NFVUC]. An example of a VNF forwarding graph is where user packets traverse a sequence of following VNF instances: 1) Intrusion Detection Device; 2) Firewall; 3) Network Address Translation (NAT); 4) Server Load Balancer. Different network services have different VNF forwarding graphs based on specific user and therefore service logic. VNF forwarding graph and reliable VNF pool are independent but complementary with each other in the following aspects: 1) VNF forwarding graph determines the sequential relation between VNF instances, while reliable VNF pool selects reliable VNF instances and reliable transport connections between VNF instances; 2) Reliable VNF pool focuses on transport layer (e.g. SCTP, MPTCP) for reliable data connection, which is independent to packet forwarding layer (e.g. L2/L3). 6. Security Considerations TBD. Zong Expires January 16, 2014 [Page 9] Internet-Draft Reliable VNF Pool July 2013 7. IANA Considerations This document has no actions for IANA. 8. Acknowledgements The authors would like to than Daniel King from Lancaster University, UK for the valuable comments to this draft. 9. References 9.1. Normative References TBD. 9.2. Informative References [NFV-WP] NFV Whitepaper: "Network Function Virtualization", issue 1, 2012, http://portal.etsi.org/NFV/NFV_White_Paper.pdf. [NFV-REL] ETSI GS NFV REL 001: "Network Function Virtualization; Resiliency Requirements", Version 0.0.1, 2013. [NFV-TERM] ETSI GS NFV 003: "Terminology for Main Conceptional Entities in NFV", Version 0.0.4, 2013. [RSNDP] Q. Wu, "An Overview of Reliable Service Nodes discovery and provision Protocols", draft-wu-rsndp-overview-00, 2013. [RFC5351] P. Lei, L. Ong, M. Tuexen and T. Dreibholz, "An Overview of Reliable Server Pooling Protocols", RFC5351, September 2008. [RFC5352] R. Stewart, Q. Xie, M. Stillman and M. Tuexen, "Aggregate Server Access Protocol (ASAP)", RFC5352, September 2008. [RFC6182] A. Ford, C. Raiciu, M. Handley, S. Barre and J. Iyengar, "Architectural Guidelines for Multipath TCP Development", RFC6182, March 2011. [NFV-UC] ETSI GS NFV 001: "Network Function Virtualization; Use Cases", Version 0.0.2, 2013. 10. References Author's Address Zong Expires January 16, 2014 [Page 10] Internet-Draft Reliable VNF Pool July 2013 Ning Zong Huawei Technologies Email: zongning@huawei.com Zong Expires January 16, 2014 [Page 11]