Network Virtualization Overlays Working Group Q. Wu
Internet-Draft Huawei
Intended status: Standards Track April 23, 2013
Expires: October 25, 2013

Proposed Control Plane requirements for Network Virtualization Overlays
draft-wu-nvo3-nve2nve-05

Abstract

This document focuses on control plane aspects of both the Tenant System to NVE control interface and the NVE to Network Virtualization Authority (NVA) control interface that an NVE uses to enable communication between tenant systems. It is complementary to [I.D-kreeger-nvo3-hypervisor-nve-cp], which describes the high-level control plane requirements for the interaction between a tenant system and an NVE when the two entities are not co-located on the same physical device.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on October 25, 2013.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.



1. Introduction

In [I.D-ietf-nvo3-overlay-problem-statement], two control planes are identified to realize an overlay solution: the NVE to NVA control plane, which deals with address mapping dissemination, and the Tenant System to NVE control plane, which deals with VM attachment and detachment.

In [I.D-ietf-nvo3-framework], three control plane components are defined to build these two control planes and provide the corresponding capabilities.

In [I.D-fw-nvo3-server2vcenter], the control interface between the NVE and the Oracle backend system, or Network Virtualization Authority (NVA), is defined to provide this capability.

This document focuses on control plane aspects of both the Tenant System to NVE control interface and the NVE to NVA control interface that an NVE uses to enable communication between tenant systems. It is complementary to [I.D-kreeger-nvo3-hypervisor-nve-cp], which describes the high-level control plane requirements for the interaction between a tenant system and an NVE when the two entities are not co-located on the same physical device.

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 [RFC2119].

Site:

   If multiple tenant systems connect to a VN through one NVE, the collection of these tenant systems and the NVE associated with them is referred to as a site or virtual network subnet.

Tenant System:

   A physical or virtual system that can play the role of a host or of a forwarding element such as a router, switch, or firewall. It belongs to a single tenant and connects to one or more VNs of that tenant.

vNIC:

   A vNIC is similar to a physical NIC. Each virtual machine has one or more vNIC adapters that it uses to communicate with both the virtual and physical networks. Each vNIC has its own MAC address and can be assigned one or more IP addresses, just like a NIC in a non-virtualized machine.

3. NVO3 Control plane Overview

Figure 1 shows an example NVO3 networking architecture to give an overview of the NVO3 control plane for interconnection between VNs or between a VN and a non-VN. There are four basic network components that make up this architecture: the tenant system, the local NVE, the remote NVE, and the Network Virtualization Authority. This example NVO3 networking architecture assumes that:

      +---------------+-------------+--------------+
      | VN1           | +--------+  |   +--------+ |
      |               | |VM1VM2VM3  |   |VM4VM5VM6 |
      |               | +--------+  |   +--------+ |
      |               | |        |  |   |        | |
      |               | |Server1 |  |   |Server2 | |
      |               | |        |  |   |        | |
      |               | +--------+  |   +--------+ |
      |               | +---------+ |   +---------+|
      |              -+-|NVE Edge1+-+---+NVE Edge2++-
      |            // | +---------+ |   +---------+| \\
      |           |   |Site1        |              |   |
      |           |   +-------------+  +VN3--------+---+------------+
      | |--+---+ +---+                 |           |+---+ +---|--+  |
      | |VMd S | |   |                 |           ||   | | S |VMh  |
      | |  | e | |NVE|                 |           ||NVE| | e |  |  |
      | |  | r | |   |                 |           ||   | | r |  |  |
      | |VMe v | | E |   ,---------.   |           || E | | v |VMi  |
      | |  | e | | d | Interconnection |           || d | | e |  |  |
      | |  | r | | g |(               )|           || g | | r |  |  |
      | |VMf   | | e | `Functionality' |           || e | |   |VMj  |
      | |  | 5 | | 5 |   `---------'   |           || 6 | | 6 |  |  |
      | |VMg   | |   |                 |           ||   | |   |VMk  |
      | |--|---+ ++--+                 |           |++--+ |---+--+  |
      +-----------+--------------------+-----------+   |            |
                  |                    |               |            |
                  |  +VN2--------------+------------+  |            |
                 .   |... .............|Site2..... .|  |            |
                 .|  | +---------+ ..  |+---------+ |//             |
                 . \\| |NVE Edge3+-----++NVE Edge4+-+               |
                 .   | +---------+     |+---------+ |               |
                 .   | +--------+  ..  | +--------+ | .             |
                 .   | |        |  ..  | |        | | .             |
                 .   | |Server3 |  ..  | |Server4 | | .             |
                 .   | |        |  ..  | |        | | .             |
                 .   | +--------+  ..  | +--------+ | .             |
                 .   | |VM7VM8VM9  ..  | |VMaVMbVMc | .             |
                 ....| ---------+......|.---------+ |..             |
                     +-----------------+------------+               |
                                       +----------------------------+

Figure 1: Example NVO3 control plane Overview

4. Mapping table entry at the NVE and Network Virtualization Authority

Suppose a VM has multiple vNICs; each vNIC corresponds to one or more tenant system interfaces. A tenant system may be able to connect directly to multiple VNs without traversing an NVE or gateway (e.g., the tenant system acts as a gateway to connect to two different VNs). Therefore, every NVE pair (local NVE and remote NVE) associated with the tenant system MUST maintain at least one mapping table entry for each currently attached tenant system (in the case where a TS has multiple tenant system interfaces, there may be multiple mapping table entries corresponding to one TS). Each mapping table entry corresponds to the tenant system's connection to each VN/PN, or to one tenant system interface, and conceptually may contain all or a subset of the following fields:

In addition, the Network Virtualization Authority may also maintain a mapping table entry for each currently attached tenant system or each newly joined NVE. Each mapping table entry corresponds to one tenant system interface, or to the tenant system's connection to each VN/Physical Network (PN), and conceptually may contain all or a subset of the following fields:
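The concrete field lists are not reproduced in this version of the draft. As a non-normative illustration only, the following Python sketch models one possible shape of such a mapping table, keyed by (VNID, TSI IP address) and carrying fields mentioned elsewhere in this document (VNID, TSI MAC and IP address, anchoring NVE address, and the BID of Section 4.2); the exact field set is an assumption.

   from dataclasses import dataclass
   from typing import Dict, Optional, Tuple

   @dataclass
   class MappingEntry:
       # Illustrative fields only; the draft's own field list is
       # not reproduced here.
       vnid: int                  # VN Identifier of the connection
       tsi_mac: str               # MAC address assigned to the TSI (vNIC MAC)
       tsi_ip: str                # IP address assigned to the TSI
       nve_addr: str              # underlay address of the anchoring NVE
       bid: Optional[int] = None  # Binding Identifier (Section 4.2)

   class MappingTable:
       # One instance per NVE (or per NVA).
       def __init__(self) -> None:
           self._entries: Dict[Tuple[int, str], MappingEntry] = {}

       def add(self, entry: MappingEntry) -> None:
           # A TS with several TSIs contributes several entries,
           # one per (VNID, TSI IP address) connection.
           self._entries[(entry.vnid, entry.tsi_ip)] = entry

       def lookup(self, vnid: int, dest_ip: str) -> Optional[MappingEntry]:
           return self._entries.get((vnid, dest_ip))

       def remove(self, vnid: int, tsi_ip: str) -> None:
           self._entries.pop((vnid, tsi_ip), None)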

4.1. Mapping table Entry Fields relationship

One tenant system corresponds to one VM. Each tenant system that is a virtual system may have multiple vNIC adapters that it uses to communicate with both the virtual and physical networks. The vNICs a tenant system has should belong to a single tenant. Each vNIC must be assigned one and only one unique MAC address. In addition, each vNIC has at least one IP address. When a VM using one vNIC connects to multiple VNs, the vNIC should be assigned multiple IP addresses, with each IP address connecting to a different VN. In this case, the VM may create multiple binding cache entries, each associating one of the multiple IP addresses with the same unique vNIC MAC address. A vNIC's MAC address may be modified or replaced with a new MAC address at any time. However, one vNIC should not use more than one MAC address to connect to multiple VNs at the same time. When multiple vNICs hosted in the same VM connect to multiple VNs, some of these vNICs are allowed to connect to different VNs through the same NVE.

Each tenant system uses a Tenant System Interface (TSI) to interface with a Virtual Network Instance (VNI) at the NVE via a Virtual Access Point (VAP). Each TSI can be identified by the pair of MAC address and IP address that the tenant system assigns to the TSI. Each VAP can be identified by the logical interface identifiers (e.g., VLAN ID, internal vSwitch interface ID connected to a VM) that the NVE assigns to the VAP. In order to establish the network connection between the tenant system and the NVE and associate both with the same VN, the VNID should be used to correlate one TSI with one VAP belonging to the same VNI.

          +-------------------------+
          |                         |
          |    VM (Tenant System)   |
          |                         |
          +-+-----+-----+-------+---+
            |     |     |       |
            |     |     |       |
            |     |     |       |
            +     +     +   ... +
          vNIC1 vNIC2  vNIC3
            |     |     |
            |     |     |
            |     |     |
            |     |     |
           VN1    |    VN4
                  |
            |-----+--+
            |        |
            |        |
           VN2     VN3

   Tenant System Interfaces (TSI):
   TSIa [VNID1, MAC addr1, IP addr1] corresponding to vNIC1

   TSIb [VNID2, MAC addr2, IP addr2] corresponding to vNIC2
   TSIc [VNID3, MAC addr2, IP addr3] corresponding to vNIC2

   TSId [VNID4, MAC addr3, IP addr4] corresponding to vNIC3

   Legend:
   PN --  Physical Network
                  Figure 2. VM information Hierarchy


  +-------------------------+
  |                         |
  |          NVE            |
  |                         |
  |VNI1  VNI2  VNI3    VNIx |
  +-+-----+-----+-------+---+
    |     |     |       |
    |     |     |       |
   VAP1  VAP2  VAP3     |
    +     +     +   ... +
    |     |     |
    |     |     +--------+
    |     |              |
    |     |              |
    |     |              |
   TSIa  TSIb           TSIc
 +--|-----|--+     +-----+-----+
 |           |     |           |
 |  Tenant   |     |  Tenant   |
 |  System1  |     |  System2  |
 |           |     |           |
 +-----------+     +-----------+

           Figure x  Interfaces between Tenant system and NVE 
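As a non-normative illustration of the figure above, the following Python sketch binds a TSI to the VAP of the VNI carrying the same VNID; the identifier shapes are assumptions consistent with Figure 2.

   from dataclasses import dataclass
   from typing import List, Optional

   @dataclass
   class TSI:
       vnid: int
       mac: str
       ip: str

   @dataclass
   class VAP:
       vnid: int
       logical_if: str   # e.g., VLAN ID or internal vSwitch interface ID

   def correlate(tsi: TSI, vaps: List[VAP]) -> Optional[VAP]:
       # Bind the TSI to the VAP of the VNI with the matching VNID.
       for vap in vaps:
           if vap.vnid == tsi.vnid:
               return vap
       return None   # no VNI for this VN is instantiated at the NVE

   # TSIa from Figure 2, attached through the VAP of VNI1
   vaps = [VAP(vnid=1, logical_if="vlan-100"), VAP(vnid=2, logical_if="vsw-7")]
   print(correlate(TSI(vnid=1, mac="MAC-addr1", ip="IP-addr1"), vaps))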

                            VM1(Tenant System)
                             |
                   |---------+--------+
                   |         |        |
                   |         |        |
             vNIC1(TSI1) vNIC2(TSI2) ..vNICx(TSIx)
                   |
                   |
                   |
                  NVE
                   |
           +-------+------+
           |       |      |
          VN1     VN2    VN3


          Binding Cache Database
          Binding corresponding to TSI1
          binding[vNIC1 MAC addr, IP1 addr, BID1]
          binding[vNIC1 MAC addr, IP2 addr, BID2]
          binding[vNIC1 MAC addr, IP3 addr, BID3]

         Figure 3 Simultaneous multiple connections
         for Layer 3 Virtual Network Service


                           VM1 (Tenant System)
                            |
                  |---------+--------+
                  |         |        |
                  |         |        |
             vNIC1(TSI1) vNIC2(TSI2) ..vNICx(TSIx)
                  |
                  |
                  |
                 NVE
                  |
          +-------+------+
          |       |      |
         VN1     VN2    VN3

         Binding Cache Database
         Binding Corresponding to TSI1
         binding[vNIC1 MAC addr, VLAN ID1, BID1]
         binding[vNIC1 MAC addr, VLAN ID2, BID2]
         binding[vNIC1 MAC addr, VLAN ID3, BID3]

         Figure 4. Simultaneous multiple connections
          for Layer 2 Virtual Network Service
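
Figures 3 and 4 can be read as the following non-normative sketch of a binding cache: one entry per VN connection, all sharing the vNIC's single MAC address and distinguished by an IP address (layer 3 service) or a VLAN ID (layer 2 service) plus a BID. The sequential BID numbering is an assumption.

   from typing import List, NamedTuple

   class Binding(NamedTuple):
       mac: str        # vNIC MAC address (one per vNIC)
       selector: str   # IP address (L3 service) or VLAN ID (L2 service)
       bid: int        # Binding Identifier

   def binding_cache(mac: str, selectors: List[str]) -> List[Binding]:
       # One binding cache entry per VN connection of the vNIC.
       return [Binding(mac, s, bid) for bid, s in enumerate(selectors, 1)]

   # Figure 3 (layer 3 service): three IP addresses on vNIC1
   for b in binding_cache("vNIC1-MAC", ["IP1", "IP2", "IP3"]):
       print(b)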


4.2. Multiple TSIs for multiple simultaneous connection support

If a tenant system has multiple Tenant System Interfaces (TSIs), it may use these TSIs to connect to multiple VNs simultaneously. Each TSI uses a different TSI Identifier (e.g., the combination of the vNIC's MAC address, IP address, and VNID) and corresponds to one connection to a VN. In order to distinguish one TSI from another, the tenant system may assign a Binding Identifier (BID) to each IP address that is used to connect to a VN. The BID should be unique for a given Tenant System Interface. If the tenant system has only one Tenant System Interface, the assignment of a BID is not needed until it has multiple TSIs, at which time all TSIs of the TS MUST be mapped to BIDs. The BID is suggested to be stored in the mapping table entry at the NVE and the Network Virtualization Authority (NVA).
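A minimal, non-normative sketch of this BID assignment rule follows: no BID is assigned while the tenant system has a single TSI, and the appearance of a second TSI triggers BID assignment for all TSIs. Class and method names are assumptions.

   from typing import Dict, Optional

   class TenantSystem:
       def __init__(self) -> None:
           self._tsis: Dict[str, Optional[int]] = {}   # TSI id -> BID
           self._next_bid = 1

       def add_tsi(self, tsi_id: str) -> None:
           self._tsis[tsi_id] = None
           if len(self._tsis) > 1:
               # Multiple TSIs now exist: all of them MUST carry a BID.
               for key in list(self._tsis):
                   if self._tsis[key] is None:
                       self._tsis[key] = self._next_bid
                       self._next_bid += 1

       def bid_of(self, tsi_id: str) -> Optional[int]:
           return self._tsis[tsi_id]

   ts = TenantSystem()
   ts.add_tsi("TSIa")          # single TSI: no BID yet
   assert ts.bid_of("TSIa") is None
   ts.add_tsi("TSIb")          # second TSI appears: both get BIDs
   assert ts.bid_of("TSIa") == 1 and ts.bid_of("TSIb") == 2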

4.3. Forwarding functionality at the tenant system

Suppose tenant system A has only one vNIC and corresponds to a VM. When tenant system A plays the role of forwarding functionality to connect two VNs, the following three cases should be considered.

For (a), tenant system A, or an external system close to tenant system A, should support layer 3 forwarding functionality. When a source tenant system in one VN communicates with a destination tenant system in another VN through tenant system A, and tenant system A supports layer 3 forwarding, tenant system A should forward IP packets on behalf of the source and destination tenant systems irrespective of the data plane encapsulation format (e.g., VXLAN, NVGRE, or MPLS over GRE). If the two VNs use different data plane encapsulation formats, tenant system A should also support converting one data plane encapsulation format into the other. If tenant system A does not support layer 3 forwarding, the external system close to tenant system A should associate the TSI to the local NVE using information such as the VNID, TS MAC address, and VLAN tag, and forward IP packets on behalf of the source and destination tenant systems.

For (b), tenant system A's vNIC, or an external system close to tenant system A, should support layer 2 forwarding functionality. When a source tenant system in one VN communicates with a destination tenant system in another VN through tenant system A, and tenant system A supports layer 2 forwarding, tenant system A should know which tenant systems connected to it are allowed layer 2 forwarding, and then forward layer 2 frames on behalf of the source and destination tenant systems based on this allowed list. If the two layer 2 VNs support different data plane encapsulation formats, tenant system A should also support converting one format to the other. If tenant system A does not support layer 2 forwarding, the external system close to tenant system A should associate the TSI to the local NVE using information such as the VNID, TS MAC address, and VLAN tag, and forward layer 2 frames on behalf of the source and destination tenant systems.

For (c), tenant system A, or an external system close to tenant system A, should support both layer 2 and layer 3 forwarding. When a source tenant system in a layer 2 VN communicates with a destination tenant system in a layer 3 VN through tenant system A, and tenant system A supports both layer 2 and layer 3 forwarding, tenant system A should support translating layer 2 frames into layer 3 packets and forward traffic between the layer 2 VN and the layer 3 VN. If the two VNs support different data plane encapsulation formats, tenant system A should also support converting one format to the other. If tenant system A does not support layer 2 or layer 3 forwarding, the external system close to tenant system A should associate the TSI to the local NVE using information such as the VNID, TS MAC address, and VLAN tag, and forward traffic on behalf of the source and destination tenant systems.

When tenant system A plays the role of interconnection functionality between a VN and a non-VN, suppose a source tenant system in the VN communicates with a destination end device in the non-VN environment through tenant system A. In this case, tenant system A acts as an NVO3 gateway between the VN and the non-VN, peering with other gateways. It should be explicitly configured with a list of destination MAC addresses that are allowed to pass to the non-VN environment, and it should perform translation between the VNID and the non-VN label when forwarding traffic between the VN interface and the non-VN interface. For outgoing frames on the VN-connected interface, tenant system A decapsulates the NVO3 outer header and forwards the inner frame to the non-VN environment based on the configured allowed list. For incoming frames on the non-VN-connected interface (e.g., a WAN interface), tenant system A should map the incoming frames from the end device to the specific VN based on inner Ethernet frame information (e.g., VLAN ID). A mapping table is set up at tenant system A to perform VNID lookup on the VN side and label lookup on the non-VN side.
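A non-normative sketch of the gateway state described above follows: an explicitly configured allowed destination MAC list for the non-VN direction and a VNID-to-label mapping consulted in both directions. All names and shapes are assumptions.

   from dataclasses import dataclass
   from typing import Dict, Set

   @dataclass
   class GatewayMap:
       allowed_macs: Set[str]         # explicit allowed list (non-VN side)
       vnid_to_label: Dict[int, int]  # VNID -> non-VN label (e.g., VLAN ID)

       def egress_allowed(self, inner_dst_mac: str) -> bool:
           # Outgoing frames towards the non-VN side are filtered
           # against the explicitly configured allowed list.
           return inner_dst_mac in self.allowed_macs

       def to_non_vn(self, vnid: int) -> int:
           return self.vnid_to_label[vnid]

       def to_vn(self, label: int) -> int:
           # Reverse lookup for incoming frames on the non-VN interface.
           for vnid, lbl in self.vnid_to_label.items():
               if lbl == label:
                   return vnid
           raise KeyError(label)

   gw = GatewayMap(allowed_macs={"MAC-h"}, vnid_to_label={3: 300})
   assert gw.egress_allowed("MAC-h") and gw.to_vn(300) == 3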

5. NVE to NVA Control plane protocol functionality

The core functional entities of the NVE to NVA control plane infrastructure are the NVE and the Network Virtualization Authority. The Network Virtualization Authority is responsible for maintaining tenant system reachability state and is the topological anchor point for tenant system MAC addresses or Tenant System Names (i.e., vNIC names). The NVE performs address mapping management on behalf of tenant systems and resides on the access link where the tenant system is anchored. The NVE is responsible for detecting VM movements to and from the access link and for initiating location binding registrations to the tenant system's NVA. There can be multiple NVAs in a VN, each serving a different group of tenant systems.

5.1. NVE connect/disconnect notification

When a tenant system connects to a VN by attaching to a local NVE, the local NVE should also be added into the VN context together with the tenant system information. This helps the Network Virtualization Authority know which NVE a group of tenant systems is attached to, i.e., the current location of these tenant systems. When the last tenant system disconnects from a VN through a local NVE, that local NVE should also be removed from the VN context. This update should also be propagated to the Network Virtualization Authority so that it knows there are no tenant systems associated with that NVE.
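A non-normative sketch of the bookkeeping implied by this section: the NVA only needs to learn about an NVE when the first tenant system of the VN attaches through it, and to forget the NVE when the last one detaches. The class below is an assumed model, not a protocol definition.

   from collections import defaultdict

   class VnContextMembers:
       def __init__(self) -> None:
           self._ts_per_nve = defaultdict(set)   # NVE id -> set of TS ids

       def attach(self, nve: str, ts: str) -> bool:
           first = not self._ts_per_nve[nve]
           self._ts_per_nve[nve].add(ts)
           return first          # True: add this NVE to the VN context

       def detach(self, nve: str, ts: str) -> bool:
           self._ts_per_nve[nve].discard(ts)
           if not self._ts_per_nve[nve]:
               del self._ts_per_nve[nve]
               return True       # True: last TS left; remove NVE from VN
           return False

   vn1 = VnContextMembers()
   assert vn1.attach("NVE-Edge1", "VM1")       # first TS: notify the NVA
   assert not vn1.attach("NVE-Edge1", "VM2")   # NVE already in VN context
   vn1.detach("NVE-Edge1", "VM1")
   assert vn1.detach("NVE-Edge1", "VM2")       # last TS left: notify the NVA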

5.2. VN membership Registration and Query

In order to enable tenant system A to communicate with any tenant system that is not under the same local NVE, the mapping table should be distributed to all the remote NVEs that belong to the same VN, even if no tenant system behind a given remote NVE communicates with tenant system A. But how does the local NVE learn the list of remote NVEs that belong to the same VN? To address this, when a tenant system connects to a VN by attaching to a local NVE, the VN membership (e.g., VNID, VN name, and the list of NVEs that belong to the VN) should be registered with the Network Virtualization Authority. When the local NVE needs to know to which remote NVEs it should forward a data packet, it can query the Network Virtualization Authority. The Network Virtualization Authority can redirect the query from the local NVE to the remote NVEs based on the VN membership registration and obtain the answer from the right remote NVE. In addition, the VN membership may contain the detailed mapping between tenant system, NVE, and VN in the form of TESID=<VNID, NVE_ID, TS_ID>. In this case, the Network Virtualization Authority can directly answer the request from the local NVE, as sketched below.
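A minimal, non-normative model of the registration and query described above, using the TESID=<VNID, NVE_ID, TS_ID> form; the method names are assumptions.

   from typing import List, NamedTuple, Set

   class TESID(NamedTuple):
       vnid: int     # TESID = <VNID, NVE_ID, TS_ID> (Section 5.2)
       nve_id: str
       ts_id: str

   class NVA:
       def __init__(self) -> None:
           self._members: Set[TESID] = set()

       def register(self, tesid: TESID) -> None:
           self._members.add(tesid)

       def nves_in_vn(self, vnid: int, asking_nve: str) -> List[str]:
           # Directly answer "which remote NVEs share my VN?".
           return sorted({m.nve_id for m in self._members
                          if m.vnid == vnid and m.nve_id != asking_nve})

   nva = NVA()
   nva.register(TESID(1, "NVE-Edge1", "VM1"))
   nva.register(TESID(1, "NVE-Edge2", "VM4"))
   print(nva.nves_in_vn(1, "NVE-Edge1"))   # ['NVE-Edge2']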

5.3. Address Mapping information reflection/distribution

Data plane learning can be used to build the mapping table without the need for a control plane protocol. However, it requires each data packet to be flooded to the whole VN. In order to eliminate the flooding introduced by data plane learning, a control protocol is needed to provide both the MAC address and the IP address in the form of mapping information. When the VN membership registration completes, the NVE can forward such address mapping information directly to all the remote NVEs based on the VN membership; alternatively, the NVE can forward the address mapping information to the Network Virtualization Authority and let the Network Virtualization Authority reflect it to all the relevant remote NVEs based on the VN membership.
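The two distribution modes can be summarized in a short, non-normative sketch; the transport is abstracted as a send callback, and all names are assumptions.

   from typing import Callable, Dict, List

   def distribute(mapping: Dict, members: List[str],
                  send: Callable[[str, Dict], None],
                  via_nva: bool = False, nva_addr: str = "nva") -> None:
       if via_nva:
           send(nva_addr, mapping)   # one message; the NVA reflects it
       else:
           for nve in members:       # direct push to every remote NVE
               send(nve, mapping)

   sent: List[str] = []
   mapping = {"vnid": 1, "ip": "10.0.0.4", "nve": "NVE-Edge2"}
   distribute(mapping, ["NVE-Edge2", "NVE-Edge3"],
              lambda dst, m: sent.append(dst))
   print(sent)   # ['NVE-Edge2', 'NVE-Edge3']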

5.4. VN context moving

In some cases, a tenant system may be detached from one NVE and move to another NVE. In such cases, the VN context should be moved from the NVE to which the tenant system was previously attached to the new NVE to which the tenant system is currently attached. In order to achieve this, the per-tenant-system VN context, including the VN profile, can be maintained at the Network Virtualization Authority and retrieved at the new location based on the VN Identifier (VNID).

6. Hypervisor-to-NVE Control Plane Protocol Functionality

6.1. Multiple TSIs of one TS for multiple simultaneous connections support

Typically, a TSI or a vNIC is assigned a single MAC address and a single IP address. However, a TSI may be assigned multiple IP addresses, each used to connect to one VN. In such a case, a BID may be assigned for each TSI when the tenant system wants to simultaneously register multiple IP addresses with the MAC address corresponding to one TSI at the local NVE. If a TSI has only one IP address, the assignment of a BID is not needed until it has multiple IP addresses to register, at which time all of the IP addresses of the TSI MUST be mapped to BIDs. When a tenant system registers a given BID for the first time, it MUST send the BID together with the IP address of the TSI. For any subsequent registrations that either re-register or de-register the same BID, the TS only needs to send the BID and does not need to send the IP address of the TSI associated with the BID.
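A non-normative sketch of these registration rules follows: the first registration of a BID must carry the TSI's IP address, while later re-registrations and de-registrations carry the BID alone. Class and method names are assumptions.

   from typing import Dict, Optional

   class NveBindingRegistry:
       def __init__(self) -> None:
           self._bindings: Dict[int, str] = {}   # BID -> TSI IP address

       def register(self, bid: int, ip: Optional[str] = None) -> None:
           if bid not in self._bindings:
               if ip is None:
                   raise ValueError("first registration needs the TSI IP")
               self._bindings[bid] = ip
           # Re-registration: the BID alone refreshes the binding.

       def deregister(self, bid: int) -> None:
           self._bindings.pop(bid, None)         # BID alone is sufficient

   reg = NveBindingRegistry()
   reg.register(1, "IP1")    # first time: BID + IP address
   reg.register(1)           # refresh: BID only
   reg.deregister(1)         # de-register: BID only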

7. Key functions aspect for signaling control/forwarding info to NVEs

7.1. Create and Update tenant Virtual Network (VN)

The tenant virtual network (VN) is a collection of tenant systems, Network Virtualization Edges (NVEs), and other end systems that are interconnected with each other. The tenant VN also consists of a set of sites, each of which can send traffic directly to the others.

In order to create or update a tenant VN, when a tenant system is attached to a local NVE, the tenant system should inform the attached local NVE which VN the tenant system belongs to.

7.2. Associate the NVE and tenant system with VN context

The VN context includes a set of configuration attributes defining access and tunnel policies and (L2 and/or L3) forwarding functions. When a tenant system is attached to a local NVE, a VN network instance should be allocated at the local NVE. The tenant system should be associated with the specific VN context using the Virtual Network Instance (VNI). The tenant system should also inform the attached local NVE which VN context it belongs to. The VN context can therefore be bound to the data path from the tenant system to the local NVE and to the tunnels from the local NVE associated with the tenant system to all the remote NVEs that belong to the same VN as the local NVE. For the data path between the tenant system and the local NVE, network policy can be installed on the underlying switched network, and forwarding tables can be populated on each network element in the underlying network based on the specific VNI associated with the tenant system. For the tunnels from the local NVE to the remote NVEs, traffic engineering information can be applied to each tunnel based on the VNI associated with the tenant system.
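As a non-normative sketch, the per-VNI state described above might be modeled as follows, with placeholder policy and traffic engineering values standing in for real configuration; all names are assumptions.

   from dataclasses import dataclass, field
   from typing import Dict, List

   @dataclass
   class VNContext:
       vnid: int
       # TS id -> access policy on the TS-to-NVE data path
       access_policy: Dict[str, str] = field(default_factory=dict)
       # remote NVE -> traffic engineering info for the tunnel
       tunnel_te: Dict[str, str] = field(default_factory=dict)

   def associate(vni_table: Dict[int, VNContext], vnid: int,
                 ts_id: str, remote_nves: List[str]) -> VNContext:
       ctx = vni_table.setdefault(vnid, VNContext(vnid))
       ctx.access_policy[ts_id] = "underlay-path-policy"   # placeholder
       for nve in remote_nves:
           ctx.tunnel_te.setdefault(nve, "vn-te-profile")  # placeholder
       return ctx

   table: Dict[int, VNContext] = {}
   associate(table, 1, "VM1", ["NVE-Edge2"])
   print(table[1])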

7.3. Populate mapping tables information at the local NVE

In some cases, two tenant systems may be attached to the same local NVE. In order to allow the NVE to locally route traffic between two tenant systems that are attached to the same NVE, the mapping table that maps a final destination address to the proper tunnel should be populated at the local NVE.

In some cases, two tenant systems may connect to different VNs through the same interconnection functionality. In order to allow the two tenant systems to communicate between the two VNs, the mapping table that maps a final destination address to the proper tunnel should be populated both in the NVEs associated with the two communicating tenant systems and in the interconnection functionality associated with the corresponding NVEs.

7.4. Distribute the mapping table information to remote NVEs in the VN

When a packet sent from one tenant system arrives at the ingress NVE associated with that tenant system, in order to determine to which tunnel the packet needs to be sent, the mapping table that maps a final destination address to the proper tunnel should also be distributed to all the remote NVEs in the VN, using either a control plane protocol or dynamic data plane learning. The mapping table may be advertised directly to the other remote NVEs that belong to the same VN, or first advertised to a centralized controller that maintains a global view of the NVEs belonging to the same VN, which then distributes the mapping tables to all the relevant remote NVEs in that VN.
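Sections 7.3 and 7.4 together imply the following ingress decision at an NVE, sketched non-normatively below with a flat dictionary standing in for the mapping table; names and shapes are assumptions.

   from typing import Dict, Tuple

   # (VNID, destination IP) -> underlay address of the anchoring NVE
   MappingTable = Dict[Tuple[int, str], str]

   def forward(table: MappingTable, vnid: int, dst_ip: str,
               local_nve: str) -> str:
       nve_addr = table.get((vnid, dst_ip))
       if nve_addr is None:
           return "no mapping: query the NVA or flood"
       if nve_addr == local_nve:
           return "local delivery"   # both TSes behind this NVE (7.3)
       return f"encapsulate towards {nve_addr}"   # tunnel choice (7.4)

   table = {(1, "10.0.0.4"): "NVE-Edge2", (1, "10.0.0.9"): "NVE-Edge1"}
   print(forward(table, 1, "10.0.0.9", "NVE-Edge1"))   # local delivery
   print(forward(table, 1, "10.0.0.4", "NVE-Edge1"))   # encapsulate ...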

7.5. The mapping table information update at the NVE when VM moves or connection fails

In some cases, a tenant system may be detached from one NVE and move to another NVE. In such cases, the mapping table entry should be removed from the NVE to which the tenant system was previously attached, and a new mapping table entry should be created at the new NVE to which the tenant system is currently attached. Such a mapping table change should be updated at each remote NVE associated with the tenant system and at the centralized controller, e.g., the Network Virtualization Authority.

In some cases, a tenant system may fail to connect to the VN through the NVE. In such cases, the mapping table entry should be removed from the NVE to which the tenant system is currently attached. In addition, the mapping table should be updated at each remote NVE in the same VN through which the tenant system is communicating with the destination tenant system.
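A non-normative sketch of the move handling in this section: the stale entry anchored at the old NVE is removed and the new anchor installed; in a deployment, the same update would be propagated to every remote NVE in the VN and to the NVA. Names are assumptions.

   from typing import Dict, Tuple

   def handle_vm_move(table: Dict[Tuple[int, str], str], vnid: int,
                      ts_ip: str, old_nve: str, new_nve: str) -> None:
       # Remove the stale entry, then install the new anchor.
       if table.get((vnid, ts_ip)) == old_nve:
           del table[(vnid, ts_ip)]
       table[(vnid, ts_ip)] = new_nve

   table = {(1, "10.0.0.4"): "NVE-Edge2"}
   handle_vm_move(table, 1, "10.0.0.4", "NVE-Edge2", "NVE-Edge3")
   print(table)   # {(1, '10.0.0.4'): 'NVE-Edge3'}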

7.6. The VN context re-association at the NVE when VM moves

In some cases, a tenant system may be detached from one NVE and move to another NVE. In such cases, the VN context should be moved from the NVE to which the tenant system was previously attached to the new NVE to which the tenant system is currently attached. In order to achieve this, the per-tenant-system VN context can be maintained in a centralized database and retrieved at the new location based on the VN Identifier (VNID).

8. IANA Considerations

This document has no actions for IANA.

9. Security Considerations

TBC.

10. References

10.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", March 1997.
[I.D-ietf-nvo3-overlay-problem-statement] Narten, T., "Problem Statement: Overlays for Network Virtualization", ID draft-ietf-nvo3-overlay-problem-statement-02, February 2013.
[I.D-ietf-nvo3-framework] Lasserre, M., "Framework for DC Network Virtualization", ID draft-ietf-nvo3-framework-00, September 2012.
[I.D-kreeger-nvo3-hypervisor-nve-cp] Kreeger, L., "Network Virtualization Hypervisor-to-NVE Overlay Control Protocol Requirements", ID draft-kreeger-nvo3-hypervisor-nve-cp-01, February 2013.

10.2. Informative References

[I.D-fw-nvo3-server2vcenter] Wu, Q. and R. Scott, "Network Virtualization Architecture", ID draft-fw-nvo3-server2vcenter-01, January 2013.

Appendix A. Change Log

Note to the RFC-Editor: please remove this section prior to publication as an RFC.

A.1. draft-wu-nvo3-nve2nve-05

The following are the major changes relative to the previous version:

A.2. draft-wu-nvo3-nve2nve-04

The following are the major changes relative to the previous version:

Author's Address

Qin Wu
Huawei
101 Software Avenue, Yuhua District
Nanjing, Jiangsu  210012
China

EMail: bill.wu@huawei.com