Network Working Group                                             L. Jin
Internet-Draft                                                       ZTE
Intended status: Informational                             B. Khasnabish
Expires: November 12, 2012                                       ZTE USA
                                                            May 11, 2012


         Architecture of PSN Independent Overlay Network (PION)
                 draft-kj-nvo3-pion-architecture-00.txt

Abstract

   This draft introduces the PSN independent overlay network (PION)
   architecture for intra- and inter-datacenter (DC) connections.  The
   motivations, protocol layers, and applications of PION are also
   discussed.  PION provides a virtualized network that is independent
   of the underlying PSN in order to maximize the reuse of IETF protocol
   definitions and implementations.  The inter- and intra-DC connection
   provided by PION could be from endpoint to endpoint, or endpoint to
   network, or network to network.  The packet transport capabilities
   provided by the overlay network are determined by the capability of
   the underlying PSN.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 12, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of



Jin & Khasnabish        Expires November 12, 2012               [Page 1]

Internet-Draft     draft-kj-nvo3-pion-architecture-00           May 2012


   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  List of Acronyms . . . . . . . . . . . . . . . . . . . . . . .  3
   3.  PION Definition  . . . . . . . . . . . . . . . . . . . . . . .  3
   4.  PION Motivation  . . . . . . . . . . . . . . . . . . . . . . .  4
   5.  PION Protocol model  . . . . . . . . . . . . . . . . . . . . .  5
     5.1.  Protocol Layers  . . . . . . . . . . . . . . . . . . . . .  5
     5.2.  Encapsulation Layer  . . . . . . . . . . . . . . . . . . .  6
     5.3.  Tenant Network Identifier Layer  . . . . . . . . . . . . .  7
     5.4.  PSN Layer Encapsulation  . . . . . . . . . . . . . . . . .  7
     5.5.  Associate tenant and PSN Layer . . . . . . . . . . . . . .  7
   6.  Network Architecture . . . . . . . . . . . . . . . . . . . . .  8
   7.  Applicability of PION  . . . . . . . . . . . . . . . . . . . .  8
     7.1.  PION over IP PSN . . . . . . . . . . . . . . . . . . . . .  9
     7.2.  PION over MPLS PSN . . . . . . . . . . . . . . . . . . . .  9
   8.  Control Plane Consideration  . . . . . . . . . . . . . . . . . 10
   9.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 11
   10. Informative References . . . . . . . . . . . . . . . . . . . . 11
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 12

1.  Introduction

   This draft introduces the architecture of the PSN independent overlay
   network (PION) for intra- and inter-datacenter connections, together
   with its motivation, protocol layers, and applications.  The PSN in
   this draft refers to an IP or MPLS network.  PION provides a
   virtualized network that is independent of the underlying PSN, so as
   to maximize the reuse of IETF protocol definitions and
   implementations.  This means that the overlay network can run over
   any underlying PSN layer and reuse the capabilities of that layer.

   The inter- and intra-DC connection provided by PION could be from
   endpoint to endpoint, or endpoint to network, or network to network.
   The packet transport capabilities provided by the overlay network are
   determined by the capability of the underlying PSN.  Making the
   overlay network independent of the underlying PSN allows it to
   benefit from different kinds of underlying PSN capabilities.


2.  List of Acronyms

   PSN: Packet Switched Network

   PION: PSN independent overlay network

   PION Header: includes an encapsulation layer and a tenant ID layer

   Tenant Packet: a customer packet encapsulated with a PION header

   BW: BandWidth

   ECMP: Equal Cost Multi-Path

   MPLS: Multiprotocol Label Switching

   NVE: Network Virtualization Edge

   QoS: Quality of Service

   SLA: Service Level Agreement


3.  PION Definition

   A PSN independent overlay network (PION) provides an overlay network
   for datacenters, with intra- and inter-datacenter connections
   between end-points over various kinds of underlying PSN.  PION could
   provide service bandwidth and QoS assurance, multicast, traffic
   engineering, security and other capabilities by relying on different
   kinds of underlying PSN capabilities.  PION is designed to isolate
   traffic and addresses among different tenants, and to be scalable
   enough to accommodate millions of end-points.

   PION provides the following functionalities:

   1.  Be PSN independent, to maximize reuse of existing IETF-defined
   PSN technologies;

   2.  Provide traffic and address isolation for each tenant;

   3.  Provide good scalability, to accommodate two million VMs running
   on more than one hundred thousand physical servers;

   4.  Provide differentiated service for different tenants, including
   bandwidth, QoS, and so on.


4.  PION Motivation

   As a core requirement, emerging datacenters need to support both
   multi-tenancy and high scalability.  The packet transport
   capabilities provided by the overlay network are determined by the
   capability of the underlying PSN.  Making the overlay network
   independent of the underlying PSN allows it to benefit from different
   kinds of underlying PSN capabilities.

   The IP PSN could provide the maximum connection availability for the
   overlay network, and it also provides stateless IP connections, which
   ease the operation of PSN tunnels.  Approaches such as VXLAN
   [I-D.mahalingam-dutt-dcops-vxlan], NVGRE
   [I-D.sridharan-virtualization-nvgre], and STT [I-D.davie-stt] allow a
   Layer 2 overlay network to be set up over UDP, IP, or even a TCP-like
   encapsulation.

   The MPLS PSN is now widely deployed in wide area networks, and has
   been proven capable of providing connections with bandwidth
   guarantees, differentiated QoS assurance, high resiliency, and so on.
   The MPLS PSN has some capabilities that the IP PSN lacks.  One
   typical example is given below:

   1.  Bandwidth must be guaranteed per tenant, rather than provisioned
   as a pool shared among tenants.  The IP connection resources among
   NVEs serve all tenants, so connections dedicated to individual
   tenants cannot be set up.  Some "Gold" class tenants may require a
   bandwidth guarantee for their service.  If tenants of other
   categories (e.g., Silver, Bronze, etc.) share a connection with Gold
   category tenants, so that the traffic flows of all tenant categories
   are transferred over the same connection, the bandwidth desired by
   the Gold-class tenants may not be guaranteed.  When the overlay
   network crosses a WAN, the bandwidth guarantee problem is exacerbated
   by the limited bandwidth in the WAN.  The MPLS PSN has the capability
   to provide tenant-aware traffic transport.  For example, when the
   connection provided by the overlay network crosses an IP/MPLS-enabled
   WAN, the traffic of a specified tenant could be traffic engineered by
   the IP/MPLS network, which would greatly improve the transport
   quality of the tenant service.

   The purpose of the PSN independent overlay network (PION) is to reuse
   the various existing IETF-defined PSN technologies, while keeping the
   tenant packet encapsulation uniform over different types of PSN
   connections/tunnels.  The PSN here mainly refers to IP and MPLS;
   Layer 2 PSN technologies are excluded.


5.  PION Protocol model

5.1.  Protocol Layers

   PION protocol layering model is shown below:


       +-------------------------------------------+
       |                Customer Payload           |
       |                      ~~~                  |
       /===========================================\
       H           Tenant Network Identifier       H
       H-------------------------------------------H  <--Tenant Header
       H                 Encapsulation             H
       \===========================================/
       |                  PSN Layer                |
       +-------------------------------------------+
       |                  Data-Link                |
       +-------------------------------------------+
       |                Physical Layer             |
       +-------------------------------------------+

                         Figure 1

   The customer payload in a datacenter would typically be an Ethernet
   payload, but other types of payload, e.g., IP payload, are not
   precluded.

   The encapsulation layer provides packet transport with some
   capabilities that other layers could not provide.  For more detail,
   see Section 5.2.

   The Tenant Network Identifier (TNI) layer provides customer traffic
   and address isolation among different tenants.  This identifier may
   be unique per NVE, but MUST be unique per connection between two
   NVEs.

   The PSN layer provides physical network transport for the virtualized
   network in the datacenter, and reuses IETF-defined protocols as much
   as possible.  The ECMP transport capability of the PSN layer should
   be able to hash the traffic per flow per tenant.

   The data-link and physical layers are outside the scope of this
   document.
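
   As an illustration only, the tenant header of Figure 1 can be
   sketched in Python.  This draft does not define a wire format, so
   the field names, the field widths (8-bit payload type, 16-bit flow
   entropy, 32-bit sequence number, 32-bit TNI), the field order, and
   the payload-type code points below are all assumptions made for this
   sketch.

```python
import struct
from dataclasses import dataclass

# Hypothetical payload-type code points; the draft does not assign any.
PAYLOAD_ETHERNET = 1
PAYLOAD_IP = 2

# Hypothetical fixed layout, network byte order:
#   B = payload type, x = pad byte, H = flow entropy,
#   I = sequence number (encapsulation layer), I = TNI (TNI layer).
_PION_HEADER = struct.Struct("!BxHII")


@dataclass(frozen=True)
class PionHeader:
    payload_type: int
    flow_entropy: int
    sequence: int
    tni: int

    def pack(self) -> bytes:
        return _PION_HEADER.pack(
            self.payload_type, self.flow_entropy, self.sequence, self.tni
        )

    @classmethod
    def unpack(cls, data: bytes) -> "PionHeader":
        return cls(*_PION_HEADER.unpack_from(data))


def build_tenant_packet(header: PionHeader, customer_payload: bytes) -> bytes:
    """Prepend the PION header; the PSN layer would then wrap the result."""
    return header.pack() + customer_payload


def parse_tenant_packet(packet: bytes):
    """Split a tenant packet into its PION header and customer payload."""
    header = PionHeader.unpack(packet)
    return header, packet[_PION_HEADER.size:]
```

   In this sketch the PSN, data-link, and physical layers are left to
   the underlying network stack, which matches the layering intent of
   Figure 1 but is not a normative encoding.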

5.2.  Encapsulation Layer

   There are several functions/services that the encapsulation layer
   could provide.  This draft lists the following functions/services:

   1.  Customer payload indication, to indicate the type of customer
   payload.

   2.  Packet sequencing and fragmentation capability.

   3.  Flow entropy value, to add flow-based entropy by tagging all the
   packets from a flow with an entropy label.

   The customer payload could be Ethernet in many cases, but an IP
   payload is not precluded.  An indication value in the encapsulation
   layer could be provided to indicate the customer payload type.  Some
   applications using UDP transport need packets to be sequenced, and
   need information about packet loss.  Some applications require larger
   packets to improve efficiency; packet fragmentation is then required,
   and is preferably performed in hardware.  The encapsulation layer has
   the capability to provide packet fragmentation information.  Some PSN
   connections used by PION, e.g., GRE, do not provide ECMP capability.
   The encapsulation layer would provide such ECMP capability by adding
   a flow entropy value to indicate flow-based entropy; all the packets
   from one flow must be tagged with the same entropy value.

   When the PSN layer uses UDP encapsulation, the entropy value could be
   carried in the UDP source port, and the flow entropy value in the
   encapsulation layer could then be omitted.

   When the PSN layer uses a GRE tunnel, the flow entropy value in the
   encapsulation layer should be added if per-flow ECMP is required.

   When the PSN layer uses TCP-like [I-D.davie-stt] encapsulation,
   sequencing and fragmentation could be provided by the IP layer, and
   the sequencing and fragmentation capability in the tenant header
   could then be omitted.  The entropy value could be carried in the TCP
   source port, and the flow entropy value in the encapsulation layer
   could then be omitted.

   When the PSN layer uses an MPLS tunnel, the sequencing and
   fragmentation capability in the tenant header would be applied if
   required.  The entropy value could be carried in the MPLS flow label,
   and the flow entropy value in the encapsulation layer could then be
   omitted.
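
   The per-PSN rules above for where the flow entropy value is carried
   can be summarized in a short Python sketch.  The enumeration names,
   the result dictionary, the 16-bit width, and the use of CRC-32 as
   the hash are assumptions chosen for illustration; the draft only
   requires that all packets of one flow carry the same entropy value.

```python
import zlib
from enum import Enum


class PsnType(Enum):
    UDP = "udp"
    GRE = "gre"
    TCP_LIKE = "tcp-like"
    MPLS = "mpls"


# Field of the PSN layer that can carry flow entropy, per the text above.
# None means the PSN layer offers no such field, so the value stays in
# the PION encapsulation layer.
PSN_ENTROPY_FIELD = {
    PsnType.UDP: "udp_source_port",
    PsnType.GRE: None,
    PsnType.TCP_LIKE: "tcp_source_port",
    PsnType.MPLS: "mpls_flow_label",
}


def flow_entropy(flow_key: bytes, bits: int = 16) -> int:
    """Map a flow key to a stable value, identical for every packet of a flow.

    CRC-32 is an arbitrary choice for this sketch; any stable hash works.
    """
    return zlib.crc32(flow_key) & ((1 << bits) - 1)


def place_entropy(psn: PsnType, flow_key: bytes) -> dict:
    """Report which field carries the entropy value for a given PSN type."""
    value = flow_entropy(flow_key)
    field = PSN_ENTROPY_FIELD[psn]
    if field is None:
        # GRE case: needed only if per-flow ECMP is required.
        return {"carried_in": "pion_encapsulation_layer", "value": value}
    return {"carried_in": field, "value": value}
```

   A flow key here would typically be built from the tenant's inner
   addresses and ports, which is also an assumption of this sketch.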

5.3.  Tenant Network Identifier Layer

   The tenant network identifier (TNI) could be an integer that
   indicates the tenant membership of each customer packet.  One example
   is to use an explicit integer, like a VLAN ID.  Taking the datacenter
   as an example, an explicit tenant ID simplifies interoperation in an
   inter-datacenter connection environment.  Whichever control plane
   allocates the TNI, by static configuration or by dynamic allocation,
   overlay networks with different control planes can always
   interoperate using the same TNI.  This is particularly useful when
   interconnecting two datacenters with different control planes: the
   operator only needs to ensure that both use the same TNI (or apply
   TNI translation) in order to interoperate.
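
   The TNI translation option mentioned above could look like the
   following Python sketch.  The class name, its methods, and the
   example TNI values are hypothetical; the draft does not specify how
   translation is configured or where it is performed.

```python
from typing import Dict


class TniTranslator:
    """Translate TNIs at the boundary between two datacenters."""

    def __init__(self, local_to_remote: Dict[int, int]) -> None:
        # Two local tenants must never share a remote TNI, otherwise
        # tenant isolation would be lost on the remote side.
        if len(set(local_to_remote.values())) != len(local_to_remote):
            raise ValueError("two local TNIs map to the same remote TNI")
        self._outbound = dict(local_to_remote)
        self._inbound = {remote: local for local, remote in local_to_remote.items()}

    def to_remote(self, local_tni: int) -> int:
        # Unknown tenants raise KeyError rather than leaking across sites.
        return self._outbound[local_tni]

    def to_local(self, remote_tni: int) -> int:
        return self._inbound[remote_tni]
```

   A tenant that uses the same TNI in both datacenters is expressed as
   an identity entry, for example TniTranslator({100: 7100, 200: 200}).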

5.4.  PSN Layer Encapsulation

   The PSN layer required for PION could be any kind of PSN connection
   that is capable of transmitting tenant packets.  In general, two
   kinds of PSN connection could be provided: IP and MPLS.

5.5.  Associate tenant and PSN Layer

   It is the NVE's responsibility to associate the tenant with a PSN
   connection to a peer NVE, which could be done by configuration or in
   another implementation-specific way.  Different types of PSN
   connections could be used between different NVEs within one tenant.
   The NVE should have the capability to set up the specified PSN
   connection if required.  For example, if only IP connections are
   required between or among NVEs, the NVEs need IP connection setup
   capability.  If an NVE requires a connection with a BW guarantee to a
   peer NVE located in another datacenter across a WAN, the NVE should
   set up a hierarchical MPLS LSP as specified in Section 7.2, and
   specify the bandwidth required.
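
   A minimal Python sketch of such an association table is given below.
   The class and field names, the bandwidth unit, and the fallback to a
   shared IP connection for tenants without a specific entry are
   assumptions of this sketch, not behavior defined by this draft.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PsnConnection:
    kind: str                             # "ip" or "mpls"
    bandwidth_mbps: Optional[int] = None  # None: no guarantee requested


# Assumed default: tenants without a specific entry share an IP connection.
SHARED_IP = PsnConnection("ip")


class NveAssociationTable:
    """Maps (TNI, peer NVE) to the PSN connection used toward that peer."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[int, str], PsnConnection] = {}

    def associate(self, tni: int, peer_nve: str, conn: PsnConnection) -> None:
        if conn.kind not in ("ip", "mpls"):
            raise ValueError("PION considers only IP and MPLS PSNs")
        if conn.kind == "ip" and conn.bandwidth_mbps is not None:
            # Per Section 4, IP connections are shared by all tenants.
            raise ValueError("a per-tenant bandwidth guarantee needs an MPLS LSP")
        self._table[(tni, peer_nve)] = conn

    def lookup(self, tni: int, peer_nve: str) -> PsnConnection:
        return self._table.get((tni, peer_nve), SHARED_IP)
```

   With this table, one tenant can use an MPLS LSP with a bandwidth
   guarantee toward a remote gateway NVE while still using plain IP
   connections toward NVEs in the same datacenter.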






6.  Network Architecture

   One important application of the overlay network is to provide
   intra- and inter-datacenter connections between end-points, or
   between an end-point and a network; see the figure below.


         /-----DC1-----\ /-------WAN-------\ /-----DC2-----\
       +---+           | |                 | |           +---+
       |NVE|--\        | |                 | |        /--|NVE|
       | 1 |   \                                     /   | 3 |
       +---+    \   +-------+           +-------+   /    +---+
                 \--|  Edge |           |  Edge |--/
                 /--|Router1|-----------|Router2|--\
                /   +-------+           +-------+   \
       +---+   /                                     \   +---+
       |NVE|--/                                       \--|NVE|
       | 2 |           | |                 | |           | 4 |
       +---+           | |                 | |           +---+
         \-----DC1-----/ \-------WAN-------/ \-----DC2-----/

                              Figure 2

   There are two datacenters, DC1 and DC2, managed by the same
   administrators, and the two datacenters are connected by a WAN, which
   would be an IP/MPLS network.  For intra-datacenter connection, the
   overlay network connects end-points within a datacenter (e.g.,
   between NVE1 and NVE2, or between NVE3 and NVE4), and should be
   scalable without being restricted by the topology of the underlying
   datacenter network.

   For inter-datacenter connection, the overlay network connects
   end-points in two different datacenters (e.g., NVE1 and NVE3, if NVE1
   and NVE3 are the gateways).  In this case, the overlay network
   crosses a wide area network, where network resources are always
   limited.  The overlay network should have the capability to provide
   QoS/BW guarantees per tenant, even across a WAN, where large network
   providers have already widely deployed MPLS technology.  In addition,
   the MPLS network has the capability to provide traffic engineering,
   QoS/BW guarantees, and higher reliability.  The overlay network
   should be able to benefit from the underlying network in order to
   provide high-quality service.


7.  Applicability of PION

7.1.  PION over IP PSN

   Most datacenters have IP transport capability, and the end-points can
   reasonably be assumed to be IP reachable in most cases.  The PSN
   layer would be an IP layer with UDP encapsulation, GRE encapsulation,
   or TCP-like [I-D.davie-stt] encapsulation.  If security is required,
   IPsec could be used as a PSN tunnel.

   The IP connection is indexed by destination IP address, and all
   tenants would share the same IP connection when sending to the same
   destination.  To provide a PSN tunnel per destination per tenant,
   see Section 7.2.

7.2.  PION over MPLS PSN

   An MPLS LSP could be set up per destination per tenant to provide
   optimized traffic transmission for the overlay network, which would
   greatly improve the service quality, especially when interconnecting
   different datacenters.  In addition, a hierarchical MPLS LSP could
   provide flexible connections across different domains.

   PION over MPLS PSN does not require all the nodes in the network to
   be MPLS enabled.  A hierarchical MPLS LSP over IP could be used to
   adapt to the IP environment inside a datacenter.  Most deployments
   would only require the NVEs and edge routers in the datacenter to be
   MPLS capable.  One typical deployment use case for an MPLS tunnel per
   destination per tenant would be PION deployment across a WAN.  See
   the figure below.


           /----DC1----\   /----WAN--------\   /----DC2---\
          /             \ /                 \ /            \
        +---+        +-------+           +-------+        +---+
        |NVE|        |  Edge |           |  Edge |        |NVE|
        | 1 |--------|Router1|-----------|Router2|--------| 2 |
        +---+        +-------+           +-------+        +---+
        ||               |                   |               ||
        ||<=============>|<=================>|<=============>||
        |     T1(IP)           T2(MPLS)            T3(IP)     |
        |<--------------------------------------------------->|
                         End to End MPLS Tunnel

                              Figure 3

   The two interconnected datacenters are connected across a WAN that is
   MPLS enabled.  The MPLS tunnel per destination address per tenant ID
   provided for a "Gold" class tenant could be served by dedicated
   network resources.

   Assume that only the WAN, where congestion always occurs, is required
   to provide a bandwidth guarantee.  When setting up a connection from
   NVE1 to NVE2, where the two NVEs are gateways for the specified
   tenant, there would be a hierarchical LSP from NVE1 to NVE2.  The
   underlying Tunnel1 (T1 in the figure) between NVE1 and Edge Router1,
   and the underlying Tunnel3 (T3 in the figure) between Edge Router2
   and NVE2, could be IP connections within the datacenter (e.g., GRE
   encapsulated).  The underlying Tunnel2 (T2 in the figure) between
   Edge Router1 and Edge Router2 could be an MPLS-TE tunnel, which would
   provide a QoS/BW guarantee.  The allocated MPLS label is an inner
   label that associates the different underlying tunnels in different
   domains.  The inner MPLS label is switched only at the underlying
   tunnel stitching points, e.g., Edge Router1 and Edge Router2.

   In the above case, only the NVEs and Edge Routers are required to be
   MPLS capable, and the MPLS network in the WAN could be optimized to
   provide high-quality service to the tenant.  The edge router in the
   above case is designed to be tenant-aware, so that it can optimize
   the tenant traffic in a standard IETF way.
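
   The inner-label switching at the tunnel stitching points of Figure 3
   can be traced with a small Python sketch.  The table layout, the
   function names, and the label values used in the example are
   hypothetical; label allocation and signaling are not defined here.

```python
from typing import Dict, List, Tuple

# Per stitching point: incoming inner label ->
#   (outgoing inner label, outgoing underlying tunnel).
StitchTable = Dict[int, Tuple[int, str]]


def stitch(table: StitchTable, inner_label: int) -> Tuple[int, str]:
    """Swap the inner label and select the next underlying tunnel."""
    return table[inner_label]


def trace_end_to_end(
    ingress_label: int,
    ingress_tunnel: str,
    stitch_points: List[StitchTable],
) -> List[Tuple[str, int]]:
    """Return the (underlying tunnel, inner label) pairs along the LSP."""
    path = [(ingress_tunnel, ingress_label)]
    label = ingress_label
    for table in stitch_points:
        label, tunnel = stitch(table, label)
        path.append((tunnel, label))
    return path


# Example values only: NVE1 pushes inner label 1001 and sends over T1.
edge_router1 = {1001: (2001, "T2(MPLS-TE)")}
edge_router2 = {2001: (3001, "T3(IP)")}
lsp = trace_end_to_end(1001, "T1(IP)", [edge_router1, edge_router2])
```

   In this sketch the nodes inside each underlying tunnel never inspect
   the inner label, which is why only the NVEs and edge routers need to
   be MPLS capable.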


8.  Control Plane Consideration

   There are three kinds of control plane functions for PION:

   1.  The first is between the end-point and the NVE, and is used to
   signal the behavior requirements of the end-point to the NVE.  One
   option for implementing this first control plane is to reuse VDP, as
   defined by IEEE.

   2.  The second is among NVEs, to synchronize the mapping between
   end-points and PION connections among NVEs.  Please refer to the
   requirements in [I-D.kreeger-nvo3-overlay-cp] for more detail.

   One option for implementing the second control plane above within one
   datacenter is to use one centralized server; a standard interface
   between the centralized server and the NVE should be defined.  The
   centralized server would collect all the mapping information from
   each NVE through this standard interface.  When the NVE receives a
   packet for which it has no forwarding entry, it would request the
   correct forwarding entry from the centralized server and install it
   with an appropriate lifetime.

   It is also possible to employ two or more centralized servers in one
   datacenter.  Different centralized servers should be able to
   synchronize the mapping information, and a standard interface between
   different centralized servers should be defined.

   When interconnecting two datacenters, a standard interface between
   the corresponding two centralized servers should also be defined;
   this interface would be the same as the one used within a datacenter.
   All the PION mapping information would be exchanged between the two
   centralized servers through this standard interface.

   3.  An additional control plane function for PION is to set up the
   PSN connection.  The IP connection within an IP PSN is set up by
   normal routing protocols, and the IETF-defined control plane could be
   reused.  The control plane for an MPLS connection per destination per
   tenant would need to be defined; one possible way is to reuse MP-BGP
   or XMPP.
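
   The centralized-server option described for the second control plane
   function can be sketched in Python as follows.  Because the standard
   interface between the server and the NVE is not yet defined, the
   class names, method names, the (TNI, end-point) key, and the default
   lifetime are all assumptions of this sketch.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[int, str]  # (TNI, end-point address)


class MappingServer:
    """Stand-in for the centralized server and its undefined interface."""

    def __init__(self) -> None:
        self._mappings: Dict[Key, str] = {}

    def register(self, tni: int, endpoint: str, nve: str) -> None:
        self._mappings[(tni, endpoint)] = nve

    def resolve(self, tni: int, endpoint: str) -> Optional[str]:
        return self._mappings.get((tni, endpoint))


class NveForwardingCache:
    """Forwarding entries fetched on demand and kept for a fixed lifetime."""

    def __init__(
        self,
        server: MappingServer,
        lifetime_s: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._server = server
        self._lifetime_s = lifetime_s
        self._clock = clock
        self._entries: Dict[Key, Tuple[str, float]] = {}

    def next_hop(self, tni: int, endpoint: str) -> Optional[str]:
        key = (tni, endpoint)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # entry still within its lifetime
        nve = self._server.resolve(tni, endpoint)
        if nve is None:
            self._entries.pop(key, None)
            return None
        self._entries[key] = (nve, now + self._lifetime_s)
        return nve
```

   The injectable clock is a design choice that makes the lifetime
   behavior easy to exercise without waiting for real time to pass.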


9.  Acknowledgments

   The authors would like to thank Igor Gashinsky, David McDysan,
   Patricia Thaler, Thomas Morin, and Vishwas Manral for their review
   and contributions.


10.  Informative References

   [I-D.davie-stt]
              Davie, B. and J. Gross, "A Stateless Transport Tunneling
              Protocol for Network Virtualization (STT)",
              draft-davie-stt-01 (work in progress), March 2012.

   [I-D.kreeger-nvo3-overlay-cp]
              Black, D., Dutt, D., Kreeger, L., Sridharan, M., and T.
              Narten, "Network Virtualization Overlay Control Protocol
              Requirements", draft-kreeger-nvo3-overlay-cp-00 (work in
              progress), January 2012.

   [I-D.mahalingam-dutt-dcops-vxlan]
              Sridhar, T., Bursell, M., Kreeger, L., Dutt, D., Wright,
              C., Mahalingam, M., Duda, K., and P. Agarwal, "VXLAN: A
              Framework for Overlaying Virtualized Layer 2 Networks over
              Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-01
              (work in progress), February 2012.

   [I-D.sridharan-virtualization-nvgre]
              Sridharan, M., Duda, K., Ganga, I., Greenberg, A., Lin,
              G., Pearson, M., Thaler, P., Tumuluri, C., and Y. Wang,
              "NVGRE: Network Virtualization using Generic Routing
              Encapsulation", draft-sridharan-virtualization-nvgre-00
              (work in progress), September 2011.

   [RFC6513]  Rosen, E. and R. Aggarwal, "Multicast in MPLS/BGP IP
              VPNs", RFC 6513, February 2012.

Authors' Addresses

   Lizhong Jin
   ZTE
   889, Bibo Road
   Shanghai, 201203, China

   Email: lizhong.jin@zte.com.cn, lizho.jin@gmail.com


   Bhumip Khasnabish
   ZTE USA, Inc.
   55 Madison Avenue, Suite 160
   Morristown, NJ 07960 USA

   Email: bhumip.khasnabish@zteusa.com, vumip1@gmail.com