Internet DRAFT - draft-chen-svdc

draft-chen-svdc



 



INTERNET-DRAFT                                              Congjie Chen
Intended Status: Standards Track                                  Dan Li
Expires: Feb 2016                                    Tsinghua University
                                                                  Jun Li
                                                    University of Oregon
                                                             August 2015
                                                                        


SVDC: Software Defined Data Center Network Virtualization Architecture 
                           draft-chen-svdc-00


Abstract

   This document describes SVDC, a highly-scalable and low-overhead
   virtualization architecture designed for large layer-2 data center
   networks. By leveraging the emerging software defined network
   framework, SVDC decouples the global identifier of a virtual network
   from the identifier carried in the packet header. Hence, SVDC can
   scale to a large scale of virtual networks with a very short tag in
   the packet header, which is never achieved by previous network
   virtualization solutions. SVDC enhances MAC-in-MAC encapsulation in a
   way that packets with overlapped MAC addresses are correctly
   forwarded even without in-packet global identifiers to differentiate
   the virtual networks they belong to. Besides, scalable and efficient
   layer-2 multicast and broadcast within virtual networks are also
   supported in SVDC. This document also introduces a basic framework to
   illustrate SVDC deployment.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

 


Chen, et al.                Expires Feb 2016                    [Page 1]

INTERNET DRAFT                    SVDC                       August 2015


   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Copyright and License Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This Internet-Draft will expire on January, 2016.


Table of Contents

   1  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1  Terminology . . . . . . . . . . . . . . . . . . . . . . . .  4
   2.  SVDC Architecture  . . . . . . . . . . . . . . . . . . . . . .  5
     2.1  Virtual Switch  . . . . . . . . . . . . . . . . . . . . . .  8
     2.2  Edge switches . . . . . . . . . . . . . . . . . . . . . . .  9
     2.3  SVDC Controller . . . . . . . . . . . . . . . . . . . . . . 10
   3.  Packet Forwarding  . . . . . . . . . . . . . . . . . . . . . . 11
     3.1  Unicast Traffic . . . . . . . . . . . . . . . . . . . . . . 11
     3.2 Multicast/Broadcast Traffic  . . . . . . . . . . . . . . . . 12
     3.3  SVDC Frame Format . . . . . . . . . . . . . . . . . . . . . 13
   4.  SVDC Deployment Considerations . . . . . . . . . . . . . . . . 14
     4.1  VM Migration  . . . . . . . . . . . . . . . . . . . . . . . 14
     4.2  Fault Tolerance . . . . . . . . . . . . . . . . . . . . . . 15
   5  Security Considerations . . . . . . . . . . . . . . . . . . . . 15
   6  IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 16
   7  References  . . . . . . . . . . . . . . . . . . . . . . . . . . 16
     7.1  Normative References  . . . . . . . . . . . . . . . . . . . 16
     7.2  Informative References  . . . . . . . . . . . . . . . . . . 16
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 17





 


Chen, et al.                Expires Feb 2016                    [Page 2]

INTERNET DRAFT                    SVDC                       August 2015


1  Introduction

   Due to the simplicity and easiness to manage, large layer-2 network
   is widely accepted as the fabric to build a data center network.
   Scalable layer-2 architectures, for example, TRILL [RFC6325] and SPB
   [802.1aq] are proposed as industry standards. A large layer-2 network
   segment can even cross the Internet via virtualization services such
   as VPLS [RFC4762]. However, this kind of layer-2 network fabric
   design mainly focus on routing/forwarding rules in the network, and
   it is still an open issue how to run a multi-tenant network
   virtualization scheme on top of the large layer-2 network fabrics.
   Existing network virtualization solutions, including VLAN [802.1q],
   VXLAN [RFC7348] and [NVGRE] either face severe scalability problem or
   are not specifically designed for layer-2 networks. Particularly, 
   designing a virtualization solution for large layer-2 network needs
   to address following challenges. 

   For a large-scale, geographically distributed layer-2 network
   operated by a cloud provider, the potential number of tenants and
   virtual networks can be huge. Network virtualization based on VLAN
   can support at most 4094 virtual networks, which is obviously not
   enough. Although VXLAN [RFC7348] and [NVGRE] can support 16,777,216
   virtual networks, they are at the cost of using much more bits in the
   packet header. The fundamental issue is, in existing network
   virtualization proposals, the number of virtual networks that can be
   differentiated depends on the number of bits used in the packet
   header.

   Given the possible overlapped MAC addresses for VMs in different
   virtual networks and the limited forwarding table size in data center
   switches, it is inevitable to encapsulate the original MAC address of
   a packet when transmitting it in the core network. MAC-in-UDP
   encapsulation used in VXLAN [RFC7348] incur unnecessary packet header
   overhead for a layer-2 network. MAC-in-MAC encapsulation framework is
   more applicable in the multi-tenant large layer-2 network where MAC
   addresses of VMs largely overlap.

   Multicast service is common in data center networks, but how to
   support scalable multicast service in a multi-tenant virtualized
   large layer-2 network is still open. A desired capability with a
   layer-2 network virtualization framework is to support efficient and
   scalable layer-2 multicast as well as broadcast.

   This document describes SVDC, which leverages the framework of [SDN]
   to address the challenges above, and achieves the goal of a high
   scalability and low overhead large layer-2 network virtualization
   architecture. It decouples the global identifier of a virtual network
   and the in-packet tag to encompass a great scale of virtual networks
 


Chen, et al.                Expires Feb 2016                    [Page 3]

INTERNET DRAFT                    SVDC                       August 2015


   with a minimal tag length in the packet header. The global identifer
   is maintained in the SVDC controller while the in-packet identifier
   is only used to differentiate virtual networks residing in the same
   server. To mask the VM MAC address overlap in the core network, SVDC
   uses MAC-in-MAC encapsulation in ingress edge switches and employs
   two techniques to guarantee correct packet forwarding in the first
   hop and last hop without in-packet global virtual network identifier.
   What's more, SVDC can efficiently support up to tens of billions of
   multicast and broadcast groups with possible overlapping multicast or
   broadcast addresses in different virtual networks in a layer-2
   network by the same framework as in unicast.

1.1  Terminology

   This document uses the following terminology.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   Virtual Network (VN): A VN is a logical abstraction of a physical
   network that provides L2 network services to a set of Tenant Systems.

   Virtual Machine (VM): It is an instance of OS's running on top of
   hypervisor over a physical machine or server. Multiple VMs can share
   the same physical server via the hypervisor, yet are completely
   isolated from each other in terms of compute, storage, and other OS
   resources.

   Virtual Switch (vSwitch): A function within a hypervisor (typically
   implemented in software) that provides similar forwarding services to
   a physical Ethernet switch.  A vSwitch forwards Ethernet frames
   between VMs running on the same server or between a VM and a physical
   Network Interface Card (NIC) connecting the server to a physical
   Ethernet switch or router. A vSwitch also enforces network isolation
   between VMs that by policy are not permitted to communicate with each
   other. (e.g., by honoring VLANs).

   Global Tenant Network Identifier (GTID): A GTID is a global
   identifier of a virtual network. It is never carried in packets that
   VMs send out but maintained in the SVDC controller.

   Local Tenant Network Identifier (LTID): A LTID is a local identifier
   that is used to differentiate virtual networks on the same server.
   For the same virtual network, its LTID in different servers can
   either be different or the same. When a new virtual network is
   created, it will be assigned a LTID in each server that hosts its
   VMs.
 


Chen, et al.                Expires Feb 2016                    [Page 4]

INTERNET DRAFT                    SVDC                       August 2015


   Global Identifier of a Multicast/Broadcast Group (Group-G): It is
   used to denote the address of a multicast/broadcast group that can be
   used in the physical network in the SVDC architecture. When a new
   multicast/broadcast group wants to send traffic across the core
   network, an available Group-G will be assigned to it. When all the
   receivers of a group leave a multicast group, or a broadcast group
   lacks of activity for a long duration, the corresponding Group-G will
   be removed.

   Local Identifier of a Multicast/Broadcast Group (Group-L): It is used
   to denote the address of a multicast/broadcast group within a virtual
   network. Group-L in different virtual networks can be overlapped.

   Edge Switch Identifier (EID): It is used to denote the identifier of
   an edge switch. Any identifier of a switch such as the MAC address of
   a switch can be represented as it.

   Server Identifier (SID): It is used to denote the identifier of a
   physical server just like EID.

   Virtual Machine MAC Address (VMAC): This is the MAC address assigned
   to the virtual NIC of each VM. It is visible to VMs and applications
   running within VMs.

   Egress Port Identifier (p-ID): It denotes the outgoing port to which
   the egress edge switch should forward the packet.


2.  SVDC Architecture

   The basic architecture of SVDC is depicted in Figure 1.

















 


Chen, et al.                Expires Feb 2016                    [Page 5]

INTERNET DRAFT                    SVDC                       August 2015


      +--------------------+        +--------------------+
      |       Server 1     |        |       Server 2     |
      | +----------+       |        | +-----------+      |
      | |   VN 1   |       |        | |   VN 2    |      |
      | | +-------+|       |        | | +-------+ |      |
      | | | VM 1  ||       |        | | | VM 2  | |      |
      | | | VMAC 1||       |        | | | VMAC 2| |      |
      | | +-------+|       |        | | +-------+ |      |
      | |     |    |       |        | |     |     |      |
      | +----------+       |        | +-----------+      |
      |       |            |        |       |            |
      | +----------------+ |        | +----------------+ |
      | |Virtual Switch 1| |        | |Virtual Switch 2| |
      | +----------------+ |        | +----------------+ |
      |           |        |        |           |        |
      +--------------------+        +--------------------+
                  |                                | 
        +-------------+               +----------------+
        |Ingress Edge |               |  Ingress Edge  |
        |  Switch 1   |               |    Switch 2    |
        +-------------+               +----------------+
             |     |                         |     |
             |     |        ,----------.     |     |
             |     |      ,'            `.   |     |
             |     +-----(  Core Network  )--+     |
             |            `.            ,'         |
             |              `-+-------+'           |
             |                                     |
             |     +-------------------------+     | 
             +-----|     SVDC Controller     |-----+ 
                   +-------------------------+        

                         Figure 1 SVDC Architecture

   In minimum configuration, the SVDC architecture only contains an SVDC
   controller and the updated edge switches. The controller interacts
   with the edge switches using an SDN protocol like [OPENFLOW]. A very
   light-weight modification on the virtual switch is required to fill
   the server-local identifier of a virtual network into the packet.
   Core switches and VMs just run legacy protocols, and can be unaware
   of SVDC.

   In the core network, any kind of layer-2 forwarding schemes can be
   used, for example, Spanning Tree Protocol (STP) [802.1D], TRILL
   protocol [RFC6325] and Shortest Path Bridging protocol [802.1aq] for
   unicast, while global multicast tree formation protocol for
   multicast. However, up to the operator's configuration, the SVDC
   controller can also use [OPENFLOW] to configure the unicast/multicast
 


Chen, et al.                Expires Feb 2016                    [Page 6]

INTERNET DRAFT                    SVDC                       August 2015


   forwarding entries in the core network. SVDC can seamlessly coexist
   with any forwarding fabric in the core network, either SDN or non-
   SDN.

   Every virtual switch maintains a local FIB table with entries
   destined to VMs on the local server, while packets sent to all the
   other VMs are simply forwarded to the edge switch it connects to. An
   edge switch maintains both a unicast encapsulation table and a
   multicast encapsulation table, used in MAC-in-MAC encapsulation for
   every packet. When the first packet of a flow arrives at an ingress
   edge switch, the encapsulation table lookup will fail and then the
   packet is directed to the SVDC controller. The SVDC controller then
   looks up its mapping tables which maintain the global information of
   the network, and responds to the ingress switch with the information
   to update its encapsulation table. Subsequent packets of the flow
   will be directly encapsulated by looking up the encapsulation table,
   without interrupting the SVDC controller again. Multicast group join
   requests are also directed to the SVDC controller, and then the
   controller updates the multicast decapsulation table in corresponding
   egress switches with group membership.

   SVDC supports a great scale of virtual networks by maintaining a
   global identifier for every virtual network in the SVDC controller,
   but never carrying the identifier in the packet. Instead, a server-
   local identifier is carried in the packet header to identify a
   virtual network on a certain physical server. The SVDC controller
   maintains the mapping relationship between the global and local
   identifiers, and is responsible for the translation when the first
   packet of a flow is directed to the SVDC controller. The translation
   includes both mapping a server-local virtual network identifier to
   the global identifier, and vice-versa. SVDC reuses the 12-bit VLAN
   [802.1q] field as the in-packet server-local virtual network
   identifier, which should be adequate since the number of virtual
   networks in a physical server cannot exceed 4096.

   To minimize the packet header overhead introduced due to
   encapsulating the original Ethernet packets from VMs in a layer-2
   network, SVDC uses MAC-in-MAC encapsulation in ingress switches. It
   not only masks the MAC address overlap from VMs in different virtual
   networks, but also minimizes the number of forwarding entries in core
   switches. The key point here is how to guarantee correct packet
   forwarding in the first hop and last hop, since no information is
   carried in the packet to globally differentiate the virtual networks
   in a direct way. SVDC has two approaches to deal with these problems.

   First, for the ingress switch to identify the virtual network an
   incoming packet belongs to, only the server-local identifier carried
   in the VLAN field is not enough. But the VLAN field together with the
 


Chen, et al.                Expires Feb 2016                    [Page 7]

INTERNET DRAFT                    SVDC                       August 2015


   incoming port of the switch are just enough for the identification,
   since the incoming port of the switch can uniquely identify the
   physical server where the packet is sent from.

   Second, when the egress switch decapsulates the outer MAC header, it
   needs a way to correctly forward the packet to an outgoing port.
   Local table lookup cannot help because the in-packet virtual network
   identifier is not the global one and thus can overlap. The way we
   come up with is to reuse the VLAN field of the outer MAC header to
   indicate the forwarding port in the egress switch. The field is
   filled in the ingress switch for a unicast packet by looking up the
   unicast encapsulation table, and filled in the egress switch for a
   multicast packet by looking up the multicast decapsulation table. The
   12-bit VLAN tag is also more than enough to identify different
   servers connecting the egress switch, unless the egress switch has
   more than 4096 ports, which cannot happen in practice.

   SVDC encompasses multicast and broadcast within each virtual network
   with possible overlapping group addresses. In order to avoid traffic
   leakage among virtual networks, the SVDC controller maps each
   multicast group or broadcast in a virtual network to a global
   multicast group, which can be identified by the global multicast
   group address, composed of 23-bit multicast MAC address and 12-bit
   VLAN field. This 35-bit global multicast group address is enough to
   support a potentially huge number of multicast/broadcast groups
   within virtual networks and can be carried in the outer Ethernet
   header.

   The following sections will describe the design detail of each
   component in SVDC architecture.

2.1  Virtual Switch

   Every virtual switch configures its FIB table entries towards VMs in
   the local server, and sets the forwarding port of the default entry
   towards the edge switch connecting to the server it resides in. The
   key of the FIB table entry in virtual switch is a tuple (LTID,VMAC),
   which uniquely identifies a VM in a physical server. Note that in
   SVDC, VMs are not aware of the virtualized network infrastructure,
   and thus the Ethernet header sent by a VM does not contain any LTID.

   When a virtual switch receives an Ethernet packet, it first
   determines whether it is from a local VM or from the outbound port.
   If from a local VM, the virtual switch adds the LTID in the VLAN
   field of the Ethernet header based on the incoming port and then
   forwards it out. If from the outbound port, operations on it depend
   on whether it is a unicast packet or a multicast/broadcast packet.
   For a unicast packet, the virtual switch directly looks up the FIB
 


Chen, et al.                Expires Feb 2016                    [Page 8]

INTERNET DRAFT                    SVDC                       August 2015


   table and forwards it to a certain VM in the local server; for a
   broadcast packet, the virtual switch forwards it to all VMs within
   the same virtual network on the local server; while for a tenant-
   defined multicast packet, the virtual switch forwards it towards VMs
   that are interested in it, which can be learned by snooping the
   multicast group join message sent by VMs.

2.2  Edge switches 

   Edge switches bear most intelligence of the data plane in SVDC. It is
   responsible for rewriting VLAN field in the inner Ethernet packet
   header and encapsulating/decapsulating the original Ethernet packets.

   Every ingress edge switch maintains a unicast encapsulation table
   which maps from (in-port, LTID-s, VM-d) to (LTID-d, ES-d, p-ID),
   where in-port is the incoming port of the packet, LTID-s is the LTID
   of the virtual network in the source server, VM-d is the MAC address
   of the destination VM in the original Ethernet header, LTID-d is the
   LTID of the virtual network in the destination server, ES-d is the
   MAC address of the egress edge switch, and p-ID is the outgoing port
   to which the egress edge switch should forward the packet. If the
   lookup hits, the ingress edge switch will do the following
   operations. First, it rewrites LTID-s in the VLAN field of the
   original Ethernet header as LTID-d. Second, it encapsulates the
   packet by adding an outer Ethernet header, with ES-d as the
   destination MAC address, its own MAC address (ES-s) as the source MAC
   address, and p-ID as the VLAN field. Third, it forwards the
   encapsulated packet by looking up the forwarding table. However, if
   the lookup fails, the ingress edge switch will direct the packet to
   the SVDC controller with incoming port of the packet, which helps the
   controller obtain the information required to install an
   encapsulation entry in the unicast encapsulation table.

   A multicast encapsulation table is also maintained, which maps from
   the tuple (in-port, LTID-s, Group-L) to the global multicast group
   address Group-G to fill in the outer Ethernet header. If the lookup
   hits, it encapsulates the multicast/broadcast packets with Group-G as
   the destination MAC address and VLAN ID while ES-s as the source MAC
   address. If the lookup misses, it will send this packet to the SVDC
   controller to update the multicast encapsulation table.

   Since VMs of a certain group can have different LTIDs in different
   servers, egress edge switches should rewrite LTID in the inner
   Ethernet header for each packet duplication destined to different
   servers. Thus, every egress edge switch maintains a multicast
   decapsulation table, which maps from Group-G to multiple (Out-PORT,
   LTID-d) tuples, where Out-PORT is an output port of a
   multicast/broadcast packet duplication and LTID-d is the LTID of the
 


Chen, et al.                Expires Feb 2016                    [Page 9]

INTERNET DRAFT                    SVDC                       August 2015


   virtual network in the destination server connecting to the Out-PORT.
   Entries in this table are inserted by the SVDC controller when the
   multicast group join message sent by a VM is directed to it. When an
   egress edge switch receives a multicast packet, it first duplicates
   this packet as the number of (Out-PORT,LTID-d) tuples. Then, it
   decapsulates each packet duplication, rewrites the LTID in the inner
   Ethernet header of each packet duplication as indicated by LTID-d and
   sends each packet duplication towards the destination server as
   indicated by the Out-PORT.

2.3  SVDC Controller

   The SVDC controller keeps several groups of mapping tables based on
   its global knowledge of the network.

      - LT-GT MAP: (SID, LTID) is mapped to GTID.
        It is used to identify the global identifier of a virtual
        network based on a physical server identifier and its local
        virtual network identifier.

      - VM-LT MAP: (GTID, VMAC) is mapped to (SID,LTID).
        Based on the global identifier of a virtual network and a
        certain MAC address, we can uniquely identify the physical
        server a VM resides in as well as the local identifier of the
        virtual network on that server.

      - SID-ES MAP: (EID, port) is mapped to SID and vice versa.
        This mapping table can be directly obtained from the network
        topology and it is used to identify the server connected to a
        certain port of an edge switch or vice versa.

      - GL-GG MAP: (GTID,Group-L) is mapped to Group-G.
        It is used to map a multicast group or broadcast address within
        a virtual network to its global multicast group address.

   The main function of the SVDC controller is to respond to requests
   from edge switches with information they need, which helps install
   the encapsulation/decapsulation table entries in the ingress/egress
   edge switches. When an ingress edge switch receives the first packet
   of a flow, it directs the packet to the controller with the incoming
   port of the packet and queries the controller for the information
   required.

   If it is a unicast data packet, the controller first uses SID-ES MAP
   to get the SID of the source server. By source server's SID and LTID
   in the original packet, the controller then identifies GTID of the
   virtual network by LT-GT MAP. Based on the GTID and the destination
   MAC address of the original packet, the controller can use VM-LT MAP
 


Chen, et al.                Expires Feb 2016                   [Page 10]

INTERNET DRAFT                    SVDC                       August 2015


   to further identify the destination SID and LTID of the virtual
   network in the destination server. Finally, the controller depends on
   the SID-ES MAP again to get the MAC address of the egress edge switch
   as well as the port number of the egress edge switch connecting to
   the destination server. Now, the SVDC controller can return all the
   information needed by the ingress edge switch to construct an unicast
   encapsulation table entry.

   If it is a multicast data packet, the controller uses SID-ES MAP and
   LT-GT MAP sequentially to get the GTID of the virtual network as
   aforementioned. Then, if the controller can find a corresponding
   entry in GL-GG MAP to get Group-G, it returns Group-G to the ingress
   switch to build the multicast encapsulation table. If not, it will
   find an available global multicast group address Group-G, insert a
   new entry to GL-GG MAP, and return the new Group-G to the ingress
   edge switch.

   If it is a multicast group join request, the SVDC controller first
   gets the GTID of the virtual network by using SID-ES MAP and LT-GT
   MAP sequentially. Then, it looks up the GL-GG MAP to find the
   corresponding Group-G. If the SVDC controller can find one, it just
   responds to the edge switch with this information. If not, the SVDC
   controller will find an available Group-G and insert a new entry to
   the GL-GG MAP before it responds it to the edge switch. After the
   edge switch gets the Group-G from the SVDC controller, it inserts a
   new entry into the multicast decapsulation table with Out-PORT as the
   incoming port of the multicast group join request and LTID-d as the
   LTID of it.

   If the cloud provider's layer-2 data center networks are
   geographically distributed across the Internet, the SVDC controller
   needs to maintain the information of all cloud data center networks
   of this cloud provider. In practice, each data center network has a
   controller and the global information is synchronized among the
   controllers periodically.


3.  Packet Forwarding


3.1  Unicast Traffic

   When a unicast packet is generated by a VM and sent out to the local
   virtual switch, it carries the destination MAC address (VM-d), the
   source MAC address (VM-s), and leaves the VLAN field empty.

   The virtual switch then adds the local LTID (LTID-s) into the VLAN
   field of the packet and looks up the local FIB table for forwarding.
 


Chen, et al.                Expires Feb 2016                   [Page 11]

INTERNET DRAFT                    SVDC                       August 2015


   If the destination VM is within the local server, the packet will be
   directly forwarded to it. Otherwise, the packet is delivered to the
   ingress edge switch ES-s.

   Next, the ingress edge switch ES-s looks up its encapsulation table
   using (in-port, LTID-s, VM-d) as key. If missed, the ingress edge
   switch directs the packet to the controller and the controller
   installs the encapsulation entry for the flow. If hit, the ingress
   edge switch obtains the tuple (LTID-d, ES-d, p-ID). Then VLAN field
   of the original Ethernet header is changed from LTID-s to LTID-d, and
   an outer Ethernet header is added. The ingress edge switch
   immediately looks up the FIB table to forward the packet.

   After that, the packet is delivered by core switches towards the
   egress edge switch ES-d. The egress edge switch gets the VLAN field
   of the outer Ethernet header p-ID, decapsulates the outer Ethernet
   header, and forwards it to the port p-ID.

   Finally the packet arrives at the destination virtual switch. The
   virtual switch looks up the FIB table based on LTID-d and VM-d, and
   delivers it to the destination VM.


3.2 Multicast/Broadcast Traffic

   When a VM generates a multicast packet, the destination address field
   of the Ethernet header is filled with the layer-2 multicast group
   address, denoted as Group-L. This packet then goes to the virtual
   switch, which inserts LTID-s into the VLAN field and forwards it
   towards the ingress edge switch.

   The ingress edge switch ES-s looks up its multicast encapsulation
   table using (in-port, LTID-s, Group-L) as key. If missed, the ingress
   edge switch directs the packet to the controller. Then, the
   controller installs the multicast encapsulation entry into the
   ingress edge switch and the multicast decapsulation entries into the
   egress edge switches. If hit, the ingress edge switch gets the global
   multicast group address Group-G to fill in the outer Ethernet header.

   This packet is then forwarded towards the egress edge switches along
   the multicast tree. When an egress edge switch receives this packet,
   it takes Group-G filled in the outer Ethernet header as key and gets
   multiple (Out-PORT,LTID-d) tuples. It then duplicates the packet as
   the number of the tuples, decapsulates each packet duplication,
   rewrites the LTID of it and forwards it towards the Out-PORT.

   Finally, the packet arrives at the destination virtual switch and is
   forwarded towards VMs which have joined the multicast group in the
 


Chen, et al.                Expires Feb 2016                   [Page 12]

INTERNET DRAFT                    SVDC                       August 2015


   virtual network.


3.3  SVDC Frame Format

   To mask the overlapped VM MAC addresses and mitigate the limitation
   of the forwarding table size in switches. SVDC enhances MAC-in-MAC
   encapsulation to guarantee correct packet forwarding. Figure 2
   demonstrates the packet format of the MAC-in-MAC encapsulation used
   in SVDC.

    Outer Ethernet Header:             
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |             Outer Destination MAC Address                     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Outer Destination MAC Address | Outer Source MAC Address      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                Outer Source MAC Address                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |Ethertype = SVDC Ethertype     | Outer.VLAN Tag (p-ID)         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Inner Ethernet Header:
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |             Inner Destination MAC Address                     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Inner Destination MAC Address | Inner Source MAC Address      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                  Inner Source MAC Address                     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |Ethertype = C-Tag [802.1q]     | Inner.VLAN Tag (LTID)         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Payload:
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Ethertype of Original Payload |                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
       |                                  Original Ethernet Payload    |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Frame Check Sequence:
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |               New FCS (Frame Check Sequence)                  |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 2. SVDC MAC-in-MAC Packet Format

   The outer Ethernet header: The source Ethernet address in the outer
   Ethernet header is set to the MAC address of the ingress edge switch.
   The destination Ethernet address is either set to the MAC address of
 


Chen, et al.                Expires Feb 2016                   [Page 13]

INTERNET DRAFT                    SVDC                       August 2015


   the egress edge switch (in unicast traffic) or set to the first 48
   bits of the Global-G assigned to the virtual network (in
   multicast/broadcast). To distinguish SVDC packets, the Ethertype of
   the outer Ethernet header needs to be set to a specific SVDC
   Ethertype. The outer VLAN information is used to indicate either the
   egress port of the packet in the egress edge switch (in unicast
   traffic) or the last 12 bits of the Global-G of the virtual network.

   The Inner Ethernet header: The source and destination Ethernet
   address in the inner Ethernet header is set to the MAC address of the
   source and destination VM, respectively. Value of the VLAN tag is
   used to indicate the LTID of the virtual network this packet belongs
   to in the destination server. The payload of the inner Ethernet
   header includes the Ethertype of the original payload and the
   original Ethernet payload.


4.  SVDC Deployment Considerations


4.1  VM Migration

   To handle VM migration, a central VM manager which can communicate
   with all hosts needs to be deployed in the network. The SVDC
   controller needs to be co-located with this central VM manager. In
   this scenario, when a VM is about to migrate, the VM manager will
   notify the SVDC controller about the destination server ID, the IP
   address and the GTID of this VM.

   SVDC controller needs to check whether a LTID is assigned to the
   virtual network of this VM in the destination server before VM
   migration starts. If not, a LTID will be created and the virtual
   switch on the destination server will be configured.

   After VM migration completes, a gratuitous ARP message is sent from
   the destination server to announce the new location of the VM. This
   ARP message is directed to SVDC controller for broadcast entries
   query when it arrives at the edge switch. In this way, SVDC can
   confirm VM migration completion and update the location information
   of this VM in its mapping tables.

   To maintain the communication states destined for the migrated VM in
   edge switches, SVDC controller broadcasts an entry update message to
   all edge switches immediately after it receives the gratuitous ARP
   message. This message contains the (LTID, ES, p-ID) tuple the
   migrated VM uses after migration. All edge switches that maintain
   encapsulation table entries toward the migrated VM update their
   encapsulation tables and keep the communication states towards the
 


Chen, et al.                Expires Feb 2016                   [Page 14]

INTERNET DRAFT                    SVDC                       August 2015


   migrated VM. The gratuitous ARP message is then sent to VMs within
   the same virtual networks to update the ARP tables of them.


4.2  Fault Tolerance

   An important aspect of large virtualized data center network is the
   increased likelihood of failures. SVDC tolerates server failures as
   well as edge switch failures, because no "hard state" is associated
   with a specific virtual switch or edge switch. In large virtualized
   data center, it is rational to assume that there are virtual network
   and physical network management systems which are responsible for
   detecting failed virtual switches or edge switches.

   However, it is necessary for SVDC to handle failures of controller
   instances or control links between controller instances and edge
   switches. To handle failures of controller instances, more than one
   controller instances can be used to manage each network element. All
   controller instances will synchronize network information
   periodically. They can work in hot backup or cold backup mode. When
   one controller instance fails, another instance can replace it in
   time. To handle failures of control links, traditional routing
   protocols that are fault-tolerant, e.g. Spanning-Tree protocol
   [802.1D], can be applied to the out-band management network
   deployment. For in-band management network deployment, we assume the
   layer-2 routing scheme in the core network can take the
   responsibility to handle link failures.


5  Security Considerations

   Since SVDC enhances MAC-in-MAC technique to implement network
   virtualization, it faces several security challenges that traditional
   Ethernet network also faces, such as layer-2 traffic snooping, packet
   flooding causing denial of service attack and MAC address spoofing.
   In SVDC, malicious end-point can choose to attack the SVDC controller
   by forging a great number of communication request with different
   source and destination pairs or hijack the MAC address of the edge
   switch to interfere the normal communication between the SVDC
   controller and the edge switches.

   Traditional layer-2 technique can be deployed in SVDC to handle these
   problems, for example, IEEE 802.1 port admission control mechanism
   [802.1X] can be used to mitigate the spoofing problem. The security
   of the communication channel between edge switches and the SVDC
   controller relies on security mechanism in transport layer. 


 


Chen, et al.                Expires Feb 2016                   [Page 15]

INTERNET DRAFT                    SVDC                       August 2015


6  IANA Considerations

   This document has no actions for IANA, but SVDC needs to be assigned
   a new ethertype.


7  References

7.1  Normative References


   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

7.2  Informative References

   [802.1aq] IEEE, "Standard for Local and metropolitan area networks --
              Media Access Control (MAC) Bridges and Virtual Bridged
              Local Area Networks -- Amendment 20: Shortest Path
              Bridging", IEEE P802.1aq-2012, 2012.

   [802.1D] IEEE, "Draft Standard for Local and Metropolitan Area
              Networks/ Media Access Control (MAC) Bridges", IEEE
              P802.1D-2004, 2004.

   [802.1q] IEEE, "Standards for Local and Metropolitan Area Networks:
              Virtual Bridged Local Area Networks.", IEEE Standard
              802.1Q, 2005 Edition, May 2006.

   [802.1X] IEEE, "IEEE Standard for Local and Metropolitan area
              networks -- Port-Based Network Access Control", IEEE Std
              802.1X-2010, February 2010.

   [RFC4762] Lasserre, M. and Kompella, V., "Virtual private LAN service
              (VPLS) using label distribution protocol (LDP) signaling",
              RFC 4762, January 2007.

   [RFC6325] Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
              Ghanwani, "Routing Bridges (RBridges): Base Protocol
              Specification", RFC 6325, July 2011.

   [RFC7348] Mahalingam, M., Dutt, D., Duda, K., and Agarwal, P.,
              "Virtual eXtensible Local Area Network (VXLAN): A
              Framework for Overlaying Virtualized Layer 2 Networks over
              Layer 3 Networks", RFC 7348, August 2014.

   [NVGRE] Sridharan, M., A. Greenberg, N. Venkataramiah, Y. Wang, K.
              Duda, I. Ganga, G. Lin, M. Pearson, P. Thaler, and C.
 


Chen, et al.                Expires Feb 2016                   [Page 16]

INTERNET DRAFT                    SVDC                       August 2015


              Tumuluri. "NVGRE: Network virtualization using generic
              routing encapsulation." IETF draft, April, 2015.

   [SDN] Open Networking Foundation White Paper, "Software-Defined
              Networking: The New Norm for Networks", April 2012.

   [OPENFLOW] McKeown, N., T. Anderson, H. Balakrishnan, G. Parulkar, L.
              Peterson, J. Rexford, S. Shenker, and J. Turner.
              "OpenFlow: enabling innovation in campus networks
              (OpenFlow White Paper)." Online:
              http://www.openflowswitch.org 2008.



Authors' Addresses


   Congjie Chen
   4-104, FIT Building, 
   Tsinghua University, 
   Hai Dian District, 
   Beijing, China

   EMail: ccjguangzhou@gmail.com



   Dan Li
   4-104, FIT Building, 
   Tsinghua University, 
   Hai Dian District, 
   Beijing, China

   EMail: tolidan@tsinghua.edu.cn



   Jun Li
   Network and Security Research Laboratory, 
   Department of Computer and Information Science, 
   University of Oregon,
   1585 E 13th Ave.
   Eugene, OR 97403

   EMail: lijun@cs.uoregon.edu






Chen, et al.                Expires Feb 2016                   [Page 17]