Network Working Group L. Xia
Internet-Draft Q. Wu
Intended status: Standards Track Huawei
Expires: January 16, 2014 D. King
Lancaster University
July 15, 2013

Use cases and Requirements for Virtual Service Node Pool Management
draft-xia-vsnpool-management-use-case-00

Abstract

Network edge applications such as subscriber termination, firewalls, tunnel switching, intrusion detection, and routing are currently provided using dedicated network function hardware. As network functions migrate from dedicated hardware platforms into a virtualized environment, a set of use cases with application-specific requirements is beginning to emerge. These use cases and requirements cover a broad range of capabilities and objectives, and will require detailed investigation and documentation in order to identify relevant architecture, protocol, and procedure solutions.

This document provides an analysis of the key management requirements for applications that may be hosted within a virtualized environment. These engineering requirements are based on a variety of goals, including virtual application security, reliability, scalability, performance, management, and automation.

Note that this document is not intended to provide or recommend solutions.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 16, 2014.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

Network virtualization technologies are finding increasing support among network and Data Center (DC) operators. This is due to demonstrable capital cost reduction and operational energy savings, simplification of service management, and the potential for increased resiliency and elasticity.

Within traditional DC networks, multiple middleboxes, including FW (Firewall), NAT (Network Address Translation), LB (Load Balancer), and WoC (WAN Optimization Controller), are used to provide services, traffic control, and optimization. Each function is an essential part of the entire DC network and of the overall service chain. Collectively, these functions and capabilities can be termed service nodes.

When the DC network is virtualized, a significant number of service nodes, and of function instances within those service nodes, can be instantiated as virtual entities. In essence, the middlebox capability is implemented in software running on commodity hardware, i.e., well-defined industry-standard servers. This allows single functions or groups of functions to be created, scaled, migrated, modified, and deleted across few or many service nodes.

These virtual service nodes are location independent, i.e., they may exist across distributed or centralized DC hardware. This architecture poses new issues and significant challenges for automatic provisioning across the DC network, while maintaining high availability, fault tolerance, load balancing, and a plethora of other requirements, some of which are technology based and some policy based.

Today, architectures and protocols have been defined for the management and operation of server hardware supporting applications. These hardware resources are known as server node pools, which may be accessed by other servers and clients. Server node pools have a well-established set of requirements related to management, availability, scalability, and performance. Within this document, the virtualization of server node pools is referred to as a Virtual Service Node Pool (VSNP).

[VNF-PS] provides an overview of the problem space related to service node reliability. This document provides an analysis of the key applications that may be hosted within a virtualized environment and of their engineering requirements, which are based on a variety of objectives related to virtual application security, reliability, scalability, performance, management, and automation.

This document is not intended to provide or recommend solutions. The intention of this document is to present an agreed set of objectives for VSNPs, identify requirements and present architecture framing.

2. Terminology

Broadband Network Gateway (BNG):
IP edge router where bandwidth and QoS policies may be applied to support multi-service delivery [TR-101].
Call Session Control Function (CSCF):
A function that is used to manage the mobile IP Multimedia Subsystem (IMS) signaling from users to services and network gateways.
Hypervisor:
Software running on a server that allows multiple VMs to run on the same physical server. The hypervisor manages and provides network connectivity to VMs [NVO3-FWK].
IP Multimedia Subsystem (IMS):
The IP Multimedia Subsystem used within mobile core networks.
Network Functions Virtualization (NFV):
Moving network functions from dedicated hardware platforms onto industry-standard high-volume servers, switches, and storage.
Residential Gateway (RGW):
Customer premises device that connects the home network to the operator's broadband network.
Set-top Box (STB):
This device contains audio and video decoders and is intended to be connected to a television set and a media source.
Virtual Machine (VM):
Software abstraction of underlying hardware.
Virtualized Server (VS):
A virtualized server runs a hypervisor supporting one or more VMs [NVO3-FWK].
Virtual Service Node Pool (VSNP):
Virtualized server resources supporting a variety of applications.

3. Application Availability and Reliability

Shifting towards the virtualization of hardware functions presents a number of challenges and requirements related to application availability and reliability. Redundancy via multiple instances of a virtualized network function in a virtualized server (VS) or virtual service node pool (VSNP) may ultimately insulate applications and client services from certain hardware-related failures and errors, but it does not present a scalable solution.

Hosted applications in large DC environments may need to deal with traffic from millions of hosts. Furthermore, there are separate availability and reliability requirements and objectives for the VS, for the VSNP, and for the connectivity between VSNPs or even traditional service node pools.

3.1. Virtualized Server

As highlighted earlier in this document, a number of functions or function instances may be provided on a VS. Consider, as an example, a VS that provides a firewall (FW) application through multiple firewall function instances for reliability. If one function instance fails or the VS has insufficient resources, it is important to detect the fault and take the necessary action to resolve the problem, ensuring that client traffic continues to be inspected and forwarded by the FW application running on the VS. This example can be articulated as a number of objectives, documented as requirements, which are detailed below:
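As an illustration of the fault-handling objective in this example, the following Python sketch models FW function instances on one VS that report heartbeats; new client flows are assigned only to healthy instances with spare capacity. The heartbeat timeout, the `FwInstance` type, and the selection policy are hypothetical assumptions, not mechanisms defined by this document.

```python
from dataclasses import dataclass

# Hypothetical fault-detection parameter; this document does not specify one.
HEARTBEAT_TIMEOUT = 3.0  # seconds without a heartbeat before an instance is treated as failed


@dataclass
class FwInstance:
    """One FW function instance on a VS (illustrative model only)."""
    name: str
    capacity: int              # maximum number of flows the instance can inspect
    load: int = 0              # flows currently assigned to it
    last_heartbeat: float = 0.0

    def healthy(self, now: float) -> bool:
        return now - self.last_heartbeat <= HEARTBEAT_TIMEOUT


def pick_instance(instances, now):
    """Return the healthy, non-saturated instance with the most spare capacity."""
    candidates = [i for i in instances if i.healthy(now) and i.load < i.capacity]
    if not candidates:
        return None  # all instances failed or lack resources: escalate beyond this VS
    return max(candidates, key=lambda i: i.capacity - i.load)


def assign_flow(instances, now):
    """Assign a new client flow so that it keeps being inspected despite instance faults."""
    inst = pick_instance(instances, now)
    if inst is not None:
        inst.load += 1
    return inst


now = 100.0
fws = [
    FwInstance("fw1", capacity=2, last_heartbeat=90.0),          # stale heartbeat: failed
    FwInstance("fw2", capacity=2, load=1, last_heartbeat=99.0),
    FwInstance("fw3", capacity=2, last_heartbeat=99.5),
]
print(assign_flow(fws, now).name)  # fw3
```

Returning None rather than overloading a saturated instance leaves the decision to a higher layer, such as the VSNP described in the next section.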

3.2. Virtual Service Nodes Pool (VSNP)

A VS may have one or more virtual network functions running on its hypervisor. These virtual network functions may all provide the same type of service, or each may provide a different type of service. In many cases, these virtual network function instances may belong to several different VSs.

In order to manage server virtualization across a set of VSs and to provide fault tolerance and load sharing across them, VSNPs may be instantiated. A VSNP allows a large number of virtual network function instances, running on different hypervisors and belonging to different VSs, to register into and deregister from the pool, which facilitates their migration. In case of function instance failure or VS overload, such VSNPs can be used to support traditional service node replacement or the addition of service nodes. Therefore, a number of objectives similar to those for VS instances, documented as requirements, are detailed below:
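A minimal sketch of the VSNP registration behaviour described above, assuming a simple in-memory registry keyed by service type. The `VsnPool` class and its methods are hypothetical; the sketch only shows instances from different VSs registering and deregistering, and the pool proposing a same-type instance on another VS when one VS fails or is overloaded.

```python
from collections import defaultdict


class VsnPool:
    """Hypothetical VSNP registry: service type -> {instance id: hosting VS}."""

    def __init__(self):
        self._registry = defaultdict(dict)

    def register(self, service, instance_id, vs):
        self._registry[service][instance_id] = vs

    def deregister(self, service, instance_id):
        self._registry[service].pop(instance_id, None)

    def replacement_for(self, service, failed_vs):
        """Return an instance of the same service type hosted on a different VS."""
        for instance_id, vs in sorted(self._registry[service].items()):
            if vs != failed_vs:
                return instance_id
        return None  # no spare instance: a new one would have to be instantiated


pool = VsnPool()
pool.register("firewall", "fw-a", vs="VS1")
pool.register("firewall", "fw-b", vs="VS2")
pool.register("nat", "nat-a", vs="VS1")
print(pool.replacement_for("firewall", failed_vs="VS1"))  # fw-b
```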

3.3. The Connectivity between Service Nodes

The connectivity between service nodes can be used to deliver service through a set of service nodes to meet the service requirements.

3.4. The Connectivity between Virtual Service Nodes Pools

A single virtual service node pool cannot provide registration service for all the virtual network function instances running on different hypervisors and belonging to different VSs. Therefore, multiple virtual service node pools are usually used to provide a fully distributed and fault-tolerant registration service.

The connectivity between virtual service node pools can be used to keep the data concerning virtual network function instances synchronized across the different pools. By this means, every pool can acquire the overall information about all the virtual service nodes, and the pools can provide protection for each other. A number of mechanisms, documented as requirements, are detailed below:
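One possible way for pools to keep their views synchronized is a versioned merge, sketched below under the assumption that each registration record carries a version number and that deregistrations are kept as tombstones. This is an illustrative technique, not a synchronization protocol specified by this document; the `Record` type and `merge` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Record:
    vs: str        # virtualized server hosting the instance
    version: int   # incremented whenever a pool changes the record
    active: bool   # False marks a deregistration (tombstone)


def merge(local: Dict[str, Record], remote: Dict[str, Record]) -> Dict[str, Record]:
    """Merge two pools' views; for each instance the record with the higher version wins."""
    merged = dict(local)
    for instance_id, rec in remote.items():
        mine = merged.get(instance_id)
        if mine is None or rec.version > mine.version:
            merged[instance_id] = rec
    return merged


pool_a = {"fw-1": Record("VS1", 1, True), "fw-2": Record("VS2", 3, False)}
pool_b = {"fw-2": Record("VS2", 2, True), "nat-1": Record("VS3", 1, True)}
view = merge(pool_a, pool_b)
print(sorted(k for k, r in view.items() if r.active))  # ['fw-1', 'nat-1']
```

Because the higher version always wins, two pools that exchange their tables converge to the same view regardless of merge order, provided that concurrent updates never reuse the same version number for one instance.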

3.5. The Connectivity between Virtual Service Node Pool and Service Node

The connectivity between a virtual service node pool and a service node is used by the pool to provide registration service to virtual network function instances belonging to different VSs, and to provide failover of the service node. A set of virtual service node pools can be configured to provide reliable registration: when a service node cannot get a registration response from one virtual service node pool, it can go to another pool to register.
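The failover behaviour of a service node can be sketched as trying each configured pool in turn. The `send_register` transport and the `NoResponse` exception are hypothetical placeholders for whatever registration protocol is eventually chosen.

```python
from typing import Callable, Optional, Sequence


class NoResponse(Exception):
    """Raised by the (hypothetical) transport when a pool does not answer."""


def register_with_failover(node_id: str,
                           pools: Sequence[str],
                           send_register: Callable[[str, str], str]) -> Optional[str]:
    """Try each configured VSNP in order; return the pool that accepted the registration."""
    for pool in pools:
        try:
            send_register(pool, node_id)
            return pool
        except NoResponse:
            continue  # no registration response: fall back to the next pool
    return None  # no pool answered


def fake_send(pool, node_id):
    """Simulated transport: pool-1 is unreachable, every other pool acknowledges."""
    if pool == "pool-1":
        raise NoResponse(pool)
    return "ack"


print(register_with_failover("sn-7", ["pool-1", "pool-2", "pool-3"], fake_send))  # pool-2
```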

4. Use Cases

4.1. IP Multimedia Core Network Subsystem (IMS)

A key use case for NFV is the virtualization of key mobile core network functions. The ETSI NFV use case document [NFV-ISG-UC] describes requirements for the Serving and PDN Gateways (S/P-GW) used for Packet Data Network (PDN) connections, and for IMS sessions (see Figure 1). Typically these services are time dependent and may require a large amount of computing resources; it is therefore desirable to scale them according to their specific computing requirements. Virtualization can be applied to the Evolved Packet Core (EPC) and the IMS to provide end-to-end service with service availability and resilience. When these virtualized network functions (e.g., virtualized S/P-GW and IMS functions) fail or are overloaded, they can be dynamically relocated, and the relocation of the sessions and/or connections they manage must be handled accordingly. [NFV-REL-REQ] also notes that traffic destined for the original virtualized network function instance must be routed to the new location, and that it is desirable for the movement of the virtual network function to be transparent to other virtual network function instances and/or physical network entities, such as client applications on the UE. That is to say, other virtual network function instances do not need to take any special action in response to the movement.
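One way to make relocation transparent to peers is an indirection layer that maps a stable logical name to the current location of the instance. The sketch below assumes such a directory; the `LocationDirectory` class, the names, and the addresses are hypothetical, and this is not a mechanism required by [NFV-REL-REQ].

```python
class LocationDirectory:
    """Hypothetical map from a stable logical VNF name to its current network location."""

    def __init__(self):
        self._where = {}
        self._sessions = {}  # logical name -> set of session ids managed by the VNF

    def deploy(self, vnf, address):
        self._where[vnf] = address
        self._sessions.setdefault(vnf, set())

    def open_session(self, vnf, session_id):
        self._sessions[vnf].add(session_id)

    def resolve(self, vnf):
        # Peers (other VNFs, clients on the UE) always resolve the logical name,
        # so they need no special action when the instance moves.
        return self._where[vnf]

    def relocate(self, vnf, new_address):
        """Move the VNF and return the sessions that must be migrated with it."""
        self._where[vnf] = new_address
        return sorted(self._sessions[vnf])


d = LocationDirectory()
d.deploy("vS-CSCF-1", "10.0.0.5")
d.open_session("vS-CSCF-1", "sip-call-42")
moved = d.relocate("vS-CSCF-1", "10.0.1.9")  # e.g. triggered by overload or failure
print(d.resolve("vS-CSCF-1"), moved)         # 10.0.1.9 ['sip-call-42']
```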

+----------------+   +---------------------------------+
| vEPC           |   |    vIMS                         |
|                |   |                                 |
|  +---------+   |   |                 +----------+    |
|  |         |   |   |                 |          |    |
|  | vP/SGW  +---+-+-|              +--+ vS-CSCF  |    |
|  |         |   | | |              |  |          |    |
|  +---------+   | | | +--------+   |  +----------+    |
|Overload/Failure| |-+-|        +---| Overload/Failure |
|                |   | | P-CSCF |                      |
|                | ++++|        +++++                  |
|  +---------+   | + | +--------+   +  +----------+    |
|  |         |   | + |              +  |          |    |
|  | vP/SGW  +++++++ |              +++| vS-CSCF  |    |
|  |         |   |   |                 |          |    |
|  +---------+   |   |                 +----------+    |
|                |   |                                 |
|  PDN Connection|   |      IMS Session                |
+----------------+   +---------------------------------+

Figure 1: Virtualized Mobile Core Network and IMS

In this use case, the following requirements need to be satisfied:

4.2. Resilience for Stateful Service

The service continuity use case in ETSI [NFV-REL-REQ] describes how virtual middlebox appliances providing layer-3 to layer-7 services may need to maintain state information, e.g., a stateful vFW. In case of hardware failure or processing overload, it is necessary to move that state information to a location where the vFW can keep accessing it. At the same time, the vFW function instance offering the firewall service can be moved as well, so that the offered service and its performance are maintained.

Another typical example is a session-based service such as SIP. The state information can be restored either in the same VM to which the vCSCF is moved (1:1 resiliency) or in a different VM (1:N resiliency), as long as the vCSCF can keep accessing it.

When multiple vFWs run on one VM and not enough resources are available at the time of failure, a possible approach is to move only some of the virtual network function instances to a new location, preferably based on their Service Level Agreements (SLAs). Two strategies can be taken: one is to move as many vFWs as possible to the new location according to the available resources; the other is to suspend one or more virtual network function instances already running at the new location and move all vFWs from the failed hardware.
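The two strategies can be contrasted with a short sketch. It assumes each vFW is described by a resource demand and an SLA priority (a lower number means more important), and that the second strategy only suspends instances less important than every failed vFW; these are illustrative assumptions rather than policies defined in this document.

```python
from typing import List, Tuple

# (name, resource units needed, SLA priority: lower number = more important)
Vnf = Tuple[str, int, int]


def move_as_many_as_possible(failed: List[Vnf], free: int) -> List[str]:
    """Strategy 1: relocate failed vFWs in SLA order while resources remain."""
    moved = []
    for name, need, _prio in sorted(failed, key=lambda v: v[2]):
        if need <= free:
            moved.append(name)
            free -= need
    return moved


def suspend_and_move_all(failed: List[Vnf], running: List[Vnf], free: int) -> List[str]:
    """Strategy 2: suspend running VNFs at the new location until every failed vFW fits.

    Returns the names of the suspended instances. Only instances less important
    than every failed vFW are suspended (an assumed policy).
    """
    if not failed:
        return []
    need = sum(units for _, units, _ in failed)
    least_important_failed = max(prio for _, _, prio in failed)
    suspended = []
    # Consider the least important running instances first.
    for name, units, prio in sorted(running, key=lambda v: v[2], reverse=True):
        if free >= need or prio <= least_important_failed:
            break
        suspended.append(name)
        free += units
    if free < need:
        raise RuntimeError("not enough resources even after suspension")
    return suspended


failed = [("vFw1", 2, 1), ("vFw2", 2, 2)]   # on the failed server
running = [("vFw3", 2, 3), ("vFw4", 1, 1)]  # already at the new location
print(move_as_many_as_possible(failed, free=2))       # ['vFw1']
print(suspend_and_move_all(failed, running, free=2))  # ['vFw3']
```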

                      Limited |                          |
                      Resource|                   Suspend|
                              V                          V
   +----+ +----+     +----+ +----+     +----+ +----+  +----+
   |vFw1| |vFw1|     |vFw1| |vFw2|     |vFw1| |vFw1|  |vFw3|
   +----+ +----+     +----+ +----+     +----+ +----+  +----+
   +------------+    +------------+    +-------------------+
   |    VM      |    |    VM      |    |        VM         |
   +------------+    +------------+    +-------------------+
   +------------+    +------------+    +-------------------+
 /-\            |    |            |    |                   |
|  ||  Server   |    |   Server   |    |      Server       |
 \-/            |    |            |    |                   |
   +------------+    +------------+    +-------------------+
Hardware
Failure

Figure 2: Resilience for Stateful Service

In this case, the following requirements need to be satisfied:

4.3. Auto Scale of Virtual Network Function Instances

The ETSI use case [NFV-INF-UC] and [NFV-REL-REQ] describe adjusting resources to achieve dynamic scaling of VMs: the management and orchestration entity may be configured to support dynamic scaling (increase or decrease) of the allocated VMs hosting virtual network functions (see Figure 3). If a Virtual Network Function Instance receives more service requests than can be accommodated on one physical hardware node, processing overload starts to occur. In this case, moving the Virtual Network Function Instance to another physical node with the same performance will simply recreate the same overload situation. A more desirable approach is to replicate the Virtual Network Function Instance, distribute the replicas across multiple physical hardware nodes, and at the same time distribute the incoming requests among those nodes. For example, a virtual network function instance requiring increased performance might be partitioned across multiple VMs. To guarantee this performance, the hypervisor dynamically adjusts (scales up or down) the resources given to each virtual network function instance in line with current or predicted performance needs.
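The arithmetic behind replication rather than relocation is sketched below: if requests arrive faster than one node can serve them, ceil(rate / capacity) replicas are needed, and requests are then spread over them. Round-robin distribution and the parameter names are assumptions made for illustration; a service router could equally use hashing or load-aware policies.

```python
import math
from itertools import cycle


def replicas_needed(request_rate: float, per_node_capacity: float) -> int:
    """Replicas required so that no physical node exceeds its capacity.

    Moving one overloaded instance to an identical node does not help; the
    load has to be split across ceil(rate / capacity) replicas.
    """
    return max(1, math.ceil(request_rate / per_node_capacity))


def distribute(requests, replicas):
    """Round-robin incoming requests across replicas (one simple policy)."""
    assignment = {r: [] for r in replicas}
    for req, rep in zip(requests, cycle(replicas)):
        assignment[rep].append(req)
    return assignment


n = replicas_needed(request_rate=2500, per_node_capacity=1000)  # 3
vms = [f"vIPS/IDS#{i}" for i in range(1, n + 1)]
print(n, distribute(["u1", "u2", "u3", "u4", "u5"], vms))
```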

                         +--------------+
 +-------------------+   |              |
 |                   |   |Management and|
 |                   <===>Orchestration |
 |    +---------+    |   |    Entity    |
 |    |   #1    |    |   +--------------+
 |  --| vIPS/IDS|--  |           /\
 |  | +---------+ |  |           ||         +---------+
 |  |             |--|--         ||      <--|End User1|
 |  |    VM #1    |  | |         ||         +---------+
 |  +-------------+  | |    +----\/---+
 |                   | |    |         |     +---------+
 |    +---------+    | |    |         |  <--|End User2|
 |    |   #2    |    | |    |         |     +---------+
 |  --| vIPS/IDS|--  | |    |         |
 |  | +---------+ |  | |    |         |     +---------+
 |  |             ---|------- Service |  <--|End User3|
 |  |    VM #2    |  | |    | Router  |     +---------+
 |  +-------------+  | |    |         |     +---------+
 |                   | |    |         |  <--|End User4|
 |    +---------+    | |    |         |     +---------+
 |    |   #3    |    | |    |         |     +---------+
 |  --| vIPS/IDS|--  | |    |         |  <--|End User5|
 |  | +---------+ |  | |    +---------+     +---------+
 |  |             ---|--                        :
 |  |    VM #3    |  |
 |  +-------------+  |                          :
 |                   |
 +-------------------+

Figure 3: Auto Scale of Virtual Network Function Instances

In this case, the following requirements need to be satisfied:

4.4. Reliable Network Connectivity between Network Nodes

In the reliable network connectivity between network nodes use case described in ETSI [NFV-INF-UC], the Management and Orchestration entities must be informed of changes in the network connectivity resources between network nodes. For example, some network connectivity resources may be temporarily put into power-saving mode when they are not in use. As another example, one network connectivity resource may temporarily enter a fault state and later return to an active state, while another may enter a permanent fault state and become unavailable for use.
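The state changes listed above can be modelled as a small state table kept by the orchestrator, sketched here. The state names, the `ConnectivityView` class, the link names, and the rule that a permanently failed resource is never re-admitted are illustrative assumptions.

```python
from enum import Enum


class LinkState(Enum):
    ACTIVE = "active"
    POWER_SAVING = "power-saving"        # idle, can be woken up on demand
    TEMPORARY_FAULT = "temporary-fault"  # expected to return to ACTIVE
    PERMANENT_FAULT = "permanent-fault"  # not available for use


class ConnectivityView:
    """Orchestrator-side record of connectivity resources between nodes (hypothetical)."""

    def __init__(self):
        self.links = {}

    def notify(self, link, state: LinkState):
        # Called whenever the infrastructure reports a state change.
        if self.links.get(link) is LinkState.PERMANENT_FAULT:
            return  # assumed policy: a permanently failed resource is never revived
        self.links[link] = state

    def usable(self):
        """Links that can carry traffic now, or after being woken from power saving."""
        ok = {LinkState.ACTIVE, LinkState.POWER_SAVING}
        return sorted(link for link, state in self.links.items() if state in ok)


view = ConnectivityView()
for link in ("vDPI-vCache", "vCache-vFW", "vFW-vNATPT"):
    view.notify(link, LinkState.ACTIVE)
view.notify("vDPI-vCache", LinkState.POWER_SAVING)
view.notify("vCache-vFW", LinkState.TEMPORARY_FAULT)
view.notify("vFW-vNATPT", LinkState.PERMANENT_FAULT)
view.notify("vFW-vNATPT", LinkState.ACTIVE)  # ignored
view.notify("vCache-vFW", LinkState.ACTIVE)  # recovered from the temporary fault
print(view.usable())  # ['vCache-vFW', 'vDPI-vCache']
```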

    +------------+
    |Orchestrator|
    +------------+

                      Web
         vDPI       vCache      vFW         vNATPT

       +--------+ +--------+  +--------+ +--------+
       | +----+ | | +----+ |  | +-++-+ | | +----+ |
  |------|    ------|    -------| || | ----|    |<-----
  |    | |    | | | |    | |  | | || | | | |    | |   |
  |    | +----+ | | +----+ |  | +-++-+ | | +----+ |   |
  |    |        | |        |  |        | |        |   |
+----+ |        | | +----+ |  | +-++-+ | |        |   V| ,--,--,--.
|    | |        | | |    | |  | | || | | |        |  ,-'          `-.
|    |<->---------- |    |----- | || |-----------<-->    Internet   )
|    | |        | | +----+ |  | +-++-+ | |        |  `-.          ,-'
+-|--+ |        | |        |  |        | |        |   A `--'--'--'
  |    | +----+ | |        |  | +-++-+ | | +----+ |   |
  |    | |    ------------------| || ------|    |<----|
  --------    | | |        |  | | || | | | |    | |
       | +----+ | |        |  | +-++-+ | | +----+ |
       +--------+ +--------+  +--------+ +--------+

Figure 4: Reliable Network Connectivity

In this case, the following requirements need to be satisfied:

4.5. Existing Operating Virtual Network Function Instance Replacement

                           Direct flow to new    |   |
          +------------+        vFW              |   |
          |Orchestrator|---------------|         |   |
          +-|---------|+               |       +-V---V+
            |         |                --------|,--,--|/
 Create and launch    | Report Statist    ,-'  +------+`-.
     new vFW          | (Traffic,CPU     (               ')
            |         |   Failure..)      `-. +-------+,-'
            |         |                      `|  APP  |
   +--------|---+  +--|---------+             | Server|
   |Host2       |  |Host1       |             +-------+
   |            |  |            |
   | +---++---+ |  | +---++---+ |
   | |vFW||vFW| |  | |vFW||vFW| |
   | +---++---+ |  | +---++---+ |
   | +---++---+ |  | +---++---+ |
   | |vFW||vFW| |  | |vFW||vFW| |
   | +---++---+ |  | +---++---+ |
   +------------+  +------------+

Figure 5: Existing vFW Replacement
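Figure 5 depicts the replacement loop: Host1 reports statistics (traffic, CPU, failures) to the orchestrator, which creates and launches a new vFW on Host2 and directs the flow to it. A minimal sketch of that decision follows, assuming hypothetical CPU and traffic thresholds (this document specifies none), a hypothetical `VfwReport` type, and a plain dictionary standing in for the traffic steering state.

```python
from dataclasses import dataclass

# Hypothetical thresholds; this document does not specify any values.
CPU_LIMIT = 0.90
TRAFFIC_LIMIT_MBPS = 800.0


@dataclass
class VfwReport:
    """Statistics a host reports to the orchestrator (cf. Figure 5)."""
    instance: str
    host: str
    cpu: float           # utilisation, 0.0 to 1.0
    traffic_mbps: float
    failed: bool = False


def needs_replacement(r: VfwReport) -> bool:
    return r.failed or r.cpu > CPU_LIMIT or r.traffic_mbps > TRAFFIC_LIMIT_MBPS


def replace(r: VfwReport, spare_host: str, steering: dict) -> str:
    """Create and launch a new vFW on spare_host and direct its flows there."""
    new_instance = f"{r.instance}-new@{spare_host}"  # placeholder for real VM instantiation
    for flow, target in list(steering.items()):
        if target == r.instance:
            steering[flow] = new_instance
    return new_instance


steering = {"flow-to-app-server": "vfw-3"}
report = VfwReport("vfw-3", "Host1", cpu=0.97, traffic_mbps=420.0)
if needs_replacement(report):
    replace(report, "Host2", steering)
print(steering)  # {'flow-to-app-server': 'vfw-3-new@Host2'}
```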

In the replacement of an existing operating VNF instance use case described in ETSI [NFV-INF-UC], the Management and Orchestration entity may be configured to support virtualized network function replacement. For example, a Network Service Provider has a virtual firewall (vFW) in operation. When the operating vFW is overloaded or fails, the Management and Orchestration entity determines that this vFW instance needs to be replaced by another vFW instance. In this case, the following requirements need to be satisfied:

4.6. Reliable Traffic Steering

Aggregation and mobile-backhaul networks share several characteristics: they include a large number of nodes, middlebox appliances, and applications providing layer-3 to layer-7 services, and their connections are relatively static tunnels that multiplex traffic from many flows (see Figure 6). These networks are also known for their stringent requirements on reliability and short recovery times. Virtualization of the aggregation network will optimize resource allocation and improve traffic forwarding.

Within these networks, subscriber traffic may be steered through more than one appliance or may bypass some appliances completely. For example, traffic may pass through virtualized DPI and FW functions; however, once the virtualized DPI function has determined the type of a flow, the operator may decide to modify the services applied to it. If the flow is an Internet video stream, for instance, it may no longer need to pass through the FW service, which reduces the traffic load on the FW. Similarly, to reduce the traffic load on some appliances or to isolate faults on them, once the service type has been detected, subsequent packets of the same flow may no longer need to pass through the LB service either; hence the path of the flow can be updated.
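The path update can be sketched as a per-flow steering table that is rewritten once the vDPI classifies the flow. The chain order, the class name `internet-video`, and the bypass policy are hypothetical examples based on the scenario above.

```python
# Hypothetical default chain for unclassified traffic.
DEFAULT_CHAIN = ["vDPI", "vFW", "vLB", "vNAT"]

# Hypothetical policy: services a flow class may bypass once classified by the vDPI.
BYPASS = {"internet-video": {"vFW", "vLB"}}


def path_for(flow_class=None):
    """Return the service chain for a flow; unknown or unclassified flows get the full chain."""
    skip = BYPASS.get(flow_class, set())
    return [svc for svc in DEFAULT_CHAIN if svc not in skip]


class SteeringTable:
    """Per-flow paths, updated when the vDPI reports a classification."""

    def __init__(self):
        self.paths = {}

    def new_flow(self, flow_id):
        self.paths[flow_id] = path_for()

    def classified(self, flow_id, flow_class):
        # Subsequent packets of the same flow follow the shortened path.
        self.paths[flow_id] = path_for(flow_class)


table = SteeringTable()
table.new_flow("f1")
table.classified("f1", "internet-video")
print(table.paths["f1"])  # ['vDPI', 'vNAT']
```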

                     --,--.,--,--,--.--,--.
                  ,-'                      `-.
              ,                              -
    Home     (     -------                  | |  -
  Environment(   +-|--+ +-|-++----++----+ +----+  )
+-----------+(   |vDPI| |vLB||vFW1||vNAT| |vFW2|  )
|           |(   +----+ +---++----++----+ +----+  )
|  +----+   |(     \      |                /  /   )
|  |STB |\  |(      \     |               /  /    )
|  +----+  \|--`       \  /       /-------/  /    )
|           |(  \    +---+ ,--,+---+_._ _ _ /    -)
|  +----+   |(   --- |   |----'|SBR|-- .          )
|  |PC  |++++++++++++|SBR|     +---+  |')         )
|  +----+   |(------ |   |+        +---+          )
|  +----+  /|(       +---+ ++++'++'|   |-------   )
|  |iPad|/  |(                     |SBR|          )
|  +----+   |(                     |   |++++++-   )
|           |(                     +---+          )
+-----------+ .                                   )
               `-  SBR-Service Border Router   ,-'
                 `-.  --,--.,--,--,--.--,- ,

Figure 6: Reliable traffic steering

In this case, the following requirements need to be satisfied:

4.7. Reliable Quality Content Offering

In the virtualization of CDNs described in the ETSI [NFV-ISG-UC] use case (see Figure 7), the CDN controller (a centralized component) selects a Cache Node (CN), or a pool of CNs, to satisfy user requests. A number of CNs are distributed within the network to meet user requests and deliver content [RFC6707]. To deal with the exponential growth of content traffic delivered to users, while achieving acceptable performance in the shift from broadcast to unicast delivery [RFC6707], the CDN controller and CNs may be virtualized and the content placed closer to the user. This saves network bandwidth and allows high-bandwidth content to be delivered more reliably. Deploying CNs as virtual appliances on standardized commodity hardware also allows efficient and cost-effective scaling and delivery of content.
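A sketch of how a CDN controller might select a cache node, assuming each CN advertises a distance to the user, its current load, and whether it already holds the requested content. The metric, the 0.8 load threshold, the `CacheNode` type, and the tie-breaking order are assumptions for illustration only; this document does not define a selection algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheNode:
    name: str
    distance: int       # e.g. hop count or RTT bucket to the user (assumed metric)
    load: float         # current utilisation, 0.0 to 1.0
    has_content: bool


MAX_LOAD = 0.8  # hypothetical admission threshold


def select_cn(nodes: List[CacheNode]) -> Optional[CacheNode]:
    """Pick a non-overloaded CN, preferring cached content, then distance, then load."""
    eligible = [n for n in nodes if n.load < MAX_LOAD]
    if not eligible:
        return None
    return min(eligible, key=lambda n: (not n.has_content, n.distance, n.load))


nodes = [CacheNode("vCN1", 2, 0.30, False),
         CacheNode("vCN2", 4, 0.50, True),
         CacheNode("vCN3", 1, 0.95, True)]  # closest, but overloaded
print(select_cn(nodes).name)  # vCN2
```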

              |    +----------+   |
              |    |  CDN     |   |
              |    |Controller|   |
            +-+---++----------+ +-+---+ +-------+ +-------+
            |vCN1 |             |vCN2 | | CSP-1 | | CSP-2 |
            +-+-|-+             +|+---+ +-------+ +-------+
              | \                /|         |         |
              |  \ ,--,--,--.   / |        ,--,--,--./
+----------+  | ,-'          `-/  |     ,-'          `-.
| End User | =|(CDN Provider 'B')=|====(CDN Provider 'A')
+----------+  | `-. (CDN-B)  ,-'  |     `-.  (CDN-A) ,-'
              |    `--/--'\-'     |        `--'--'--'
              |      /     \    --+---+
           +--+--+  /       \---|vCN1 |
           |vCN2 |-/            ------+
           +--+--|                |
              |                   |
              |                   |

                 vCN1 = vCache Node 1, vCN2 = vCache Node 2
                 CSP-1 = Content Service Provider 1
                 CSP-2 = Content Service Provider 2

Figure 7: Virtualized CDNs network

In this case, the following requirements should be satisfied:

4.8. Availability of High Bandwidth Access

The ETSI Virtualization of the Home Environment use case [NFV-ISG-UC] describes how home devices providing service and functionality, including the Residential Gateway (RGW) and Set-top Box (STB), are virtualized and migrated to a service platform located in the network, for simpler provisioning and better service integration (see Figure 8). The virtualized RGW (vRGW) [WT-317] provides private addresses to the home and delivers services to home devices. The virtual STB (vSTB) uses a public IP address to communicate with the vRGW and its service platforms (IPTV or Internet platforms) via the Broadband Network Gateway (BNG).

+-----------------+                                  ----
| Home Network    |                              ///-    -\\\
|                 |                             /            \
|  +------+       |                            |              |
|  | STB  |-------+-+                       +--|  Data Center |
|  +------+       | |                       |  |              |
|                 | |                       |   \            /
| +------------+  | |+------+     +-----+   |    \\\-    -///
| |            |  | ||      |     |     |   |        ----
| | Small      +--+-++ vRGW +-----|vSTB |   |
| | HDMI Dongle|  | ||      |     |     |   |
| |            |  | |+------+     +--+--+   |
| +------------+  | |                |      |        ----
|                 | |                |      |    ///-    -\\\
|                 | |             +--+--+   |   /            \
|  +------+       | |             |     |   |  |              |
|  | STB  +-------+-+             |BNG  +---+--+   Internet   |
|  +------+       |               |     |      |              |
|                 |               +-----+       \            /
+-----------------+                              \\\-    -///
                                                     ----
               Figure 8: Virtualized Home Network

Virtualization of media services such as those provided by the vSTB will pose a variety of CPU, memory and bandwidth challenges:

In this use case, the following requirements must be satisfied:

5. IANA Considerations

This document has no actions for IANA.

6. Security Considerations

TBC.

7. Informative References

[NFV-ISG-UC] ETSI, "Network Function Virtualisation; Use Cases", ISG NFV Use Case, June 2013.
[NFV-INF-UC] ETSI, "Network Functions Virtualisation Infrastructure Architecture Part 2: Use Cases", ISG INF Use Case, June 2013.
[NFV-REL-REQ] ETSI, "Network Function Virtualisation Resiliency Requirements", ISG REL Requirements, June 2013.
[VNF-PS] Zong, N., "Problem Statement for Reliable Virtualized Network Function (VNF) Pool", July 2013.
[TR-101] Broadband Forum, "Migration to Ethernet-Based DSL Aggregation", 2006.
[WT-317] Broadband Forum, "Network Enhanced Residential Gateway", 2013.
[BBF-FSC-UC] Broadband Forum, "Flexible Service Chaining", 2013.
[NVO3-FWK] Lasserre, M., "Framework for DC Network Virtualization", Internet-Draft draft-ietf-nvo3-framework-00, September 2012.
[RFC6707] Niven-Jenkins, B., "Content Distribution Network Interconnection (CDNI) Problem Statement", RFC 6707, September 2012.

Authors' Addresses

Liang Xia Huawei 101 Software Avenue, Yuhua District Nanjing, Jiangsu 210012 China EMail: frank.xialiang@huawei.com
Qin Wu Huawei 101 Software Avenue, Yuhua District Nanjing, Jiangsu 210012 China EMail: sunseawq@huawei.com
Daniel King Lancaster University UK EMail: d.king@lancaster.ac.uk