Internet Engineering Task Force                             Nabil Bitar
Internet Draft                                                  Verizon
Intended status: Informational
Expires: November 2012                                    Marc Lasserre
                                                           Florin Balus
                                                         Alcatel-Lucent




                                                           May 15, 2012




                       NVO3 Data Plane Requirements
                 draft-bitar-lasserre-nvo3-dp-reqs-00.txt





Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on November 15, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.



Abstract

   Several IETF drafts relate to the use of overlay networks to
   support large scale virtual data centers. This draft provides a
   list of data plane requirements for Network Virtualization over L3
   (NVO3) that have to be addressed in solution documents.



Table of Contents

   1. Introduction
      1.1. Conventions used in this document
      1.2. General terminology
   2. Data Path Overview
   3. Data Plane Requirements
      3.1. Virtual Access Points (VAPs)
      3.2. Virtual Network Instance (VNI)
      3.2.1. L2 VNI
      3.2.2. L3 VNI
      3.3. Overlay Module
      3.3.1. NVO3 overlay header
      3.3.1.1. Virtual Network Identifier (VNID)
      3.3.1.2. Service QoS identifier
      3.3.2. NVE Tunneling function
      3.3.2.1. LAG and ECMP
      3.3.2.2. DiffServ and ECN marking
      3.3.2.3. Handling of BUM traffic
      3.4. External NVO3 connectivity
      3.4.1. GW Types
      3.4.1.1. VPN and Internet GWs
      3.4.1.2. Inter-DC GW
      3.4.1.3. Intra-DC gateways
      3.4.2. Path optimality between NVEs and Gateways
      3.4.2.1. Triangular Routing Issues, a.k.a. Traffic Tromboning
      3.5. Path MTU
      3.6. Hierarchical NVE
      3.7. NVE Multi-Homing Requirements
      3.8. OAM
      3.9. Other considerations
      3.9.1. Data Plane Optimizations
      3.9.2. NVE location trade-offs
   4. Security Considerations
   5. IANA Considerations
   6. References
      6.1. Normative References
      6.2. Informative References
   7. Acknowledgments



1. Introduction

1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying RFC 2119 significance.

1.2. General terminology

   The terminology defined in [NVO3-framework] is used throughout this
   document. Terminology specific to this memo is defined here and is
   introduced as needed in later sections.

   DC: Data Center

   BUM: Broadcast, Unknown Unicast, Multicast traffic

   TES: Tenant End System

   VAP: Virtual Access Point

   VNI: Virtual Network Instance

   VNID: Virtual Network Identifier

2. Data Path Overview

   The NVO3 framework [NVO3-framework] defines the generic NVE model
   depicted in Figure 1:

                      +------- L3 Network ------+
                      |                         |
                      |       Tunnel Overlay    |
         +------------+--------+       +--------+------------+
         | +----------+------+ |       | +------+----------+ |
         | | Overlay Module  | |       | | Overlay Module  | |
         | +--------+--------+ |       | +--------+--------+ |
         |          | VNID     |       |          | VNID     |
         |          |          |       |          |          |
         |  +-------+-------+  |       |  +-------+-------+  |
         |  |      VNI      |  |       |  |      VNI      |  |
    NVE1 |  +-+-----------+-+  |       |  +-+-----------+-+  | NVE2
         |    |   VAPs    |    |       |    |   VAPs    |    |
         +----+-----------+----+       +----+-----------+----+
              |           |                 |           |
       -------+-----------+-----------------+-----------+-------
              |           |     Tenant      |           |
              |           |   Service IF    |           |
            Tenant End Systems            Tenant End Systems

              Figure 1: Generic reference model for NV Edge

   When an ingress NVE receives a frame from a Tenant End System over
   a local VAP, the frame needs to be parsed in order to identify
   which virtual network instance it belongs to. The parsing function
   can examine various fields in the data frame (e.g., the VLAN ID)
   and/or the interface/port on which the frame was received.

   Once the corresponding VNI is identified, a lookup is performed to
   determine where the frame needs to be sent. This lookup can be
   based on any combination of fields in the data frame (e.g.,
   source/destination MAC addresses and/or source/destination IP
   addresses). Note that additional criteria such as 802.1p and/or
   DSCP markings can be used to select an appropriate tunnel or local
   VAP destination.

   Lookup tables can be populated using different techniques: data
   plane learning, management plane configuration, or a distributed
   control plane. Management and control planes are not in the scope
   of this document. Data plane learning is described here because it
   has implications on the data plane processing function.

   The result of this lookup yields the corresponding tunnel
   information needed to build the overlay encapsulation header. This
   information includes the destination L3 address of the egress NVE.
   Note that this lookup might yield a list of tunnels such as when
   ingress replication is used for BUM traffic.

   The overlay tunnel encapsulation header MUST include a context
   identifier which the egress NVE will use to identify which VNI this
   frame belongs to.

   The egress NVE checks the context identifier, removes the
   encapsulation header, and then forwards the original frame towards
   the appropriate recipient, usually a local VAP.
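
   As a purely illustrative example, the following Python sketch walks
   through this ingress processing sequence: VAP identification, VNI
   selection, per-VNI lookup and encapsulation. All table contents,
   names and the 4-byte overlay header layout are hypothetical
   assumptions, not a proposed NVO3 encapsulation.

      # Hypothetical ingress NVE data path; names and header layout
      # are illustrative only.
      import struct

      vap_table = {("port1", 100): "vni-blue"}  # (port, VLAN) -> VNI
      vnid = {"vni-blue": 10001}                # VNI -> VNID value
      fib = {"vni-blue": {"00:aa:bb:cc:dd:ee": "192.0.2.2"}}

      def ingress(port, vlan, dst_mac, frame):
          """Identify the VAP/VNI, look up the egress NVE and
          prepend an overlay header carrying the VNID."""
          vni = vap_table[(port, vlan)]
          egress_nve = fib[vni].get(dst_mac)
          if egress_nve is None:
              return None          # BUM handling, see Section 3.3.2.3
          hdr = struct.pack("!I", vnid[vni] & 0xFFFFFF)  # 24-bit VNID
          return egress_nve, hdr + frame

      print(ingress("port1", 100, "00:aa:bb:cc:dd:ee", b"payload"))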

3. Data Plane Requirements

3.1. Virtual Access Points (VAPs)

   The NVE forwarding plane MUST support VAP identification through the
   following mechanisms:

   - Using the local interface on which the frames are received,
     where the local interface may be an internal, virtual port in a
     VSwitch or a physical port on the ToR switch
   - Using the local interface and some fields in the frame header,
     e.g., one or multiple VLAN IDs or the source MAC address

3.2. Virtual Network Instance (VNI)

   VAPs are associated with a specific VNI at service instantiation
   time.

   A VNI identifies a per-tenant private context, i.e., per-tenant
   policies and a FIB table, to allow overlapping address spaces
   between tenants.

   There are different VNI types differentiated by the virtual network
   service they provide to Tenant End Systems. Network virtualization
   can be provided by L2 and/or L3 VNIs.

3.2.1. L2 VNI

   An L2 VNI MUST provide an emulated Ethernet multipoint service as
   if Tenant End Systems were interconnected by an 802.1Q LAN over a
   set of NVO3 tunnels. An L2 VNI provides a per-tenant virtual
   switching instance with MAC addressing isolation and L3 tunneling.
   Loop avoidance capability MUST be provided.

   In the absence of a management or control plane, data plane learning
   MUST be used to populate forwarding tables. Forwarding table entries
   provide mapping information between MAC addresses and L3 tunnel
   destination addresses. As frames arrive from VAPs or from overlay
   tunnels, the MAC learning procedures described in IEEE 802.1Q are
   used: The source MAC address is learned against the VAP or the NVO3
   tunnel on which the frame arrived.
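
   A minimal sketch of this learning behavior follows; the table and
   arrival-point representations are illustrative assumptions.

      # Sketch of 802.1Q-style data plane MAC learning per VNI.
      l2_tables = {}  # VNI -> {source MAC: VAP or NVO3 tunnel}

      def learn(vni, src_mac, arrival_point):
          """Learn the source MAC against the VAP or tunnel on which
          the frame arrived."""
          l2_tables.setdefault(vni, {})[src_mac] = arrival_point

      def lookup(vni, dst_mac):
          """Return the learned destination, or None to flood."""
          return l2_tables.get(vni, {}).get(dst_mac)

      # A frame arriving over a tunnel binds its source MAC to that
      # tunnel; a frame arriving over a VAP binds it to the VAP.
      learn("vni-blue", "00:aa:bb:cc:dd:ee", ("tunnel", "192.0.2.2"))
      assert lookup("vni-blue", "00:aa:bb:cc:dd:ee") == \
             ("tunnel", "192.0.2.2")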

   Broadcast, Unknown Unicast and Multicast (BUM) traffic handling
   MUST be supported. To achieve this, the NVE must be able to build
   at least a default flooding tree per VNI. The flooding tree is
   equivalent to a multicast (*,G) construct where all the NVEs on
   which the corresponding VNI is instantiated are members.

   The ability to establish or pre-provision multicast trees at the NVE
   SHOULD be supported.

   It MUST also be possible to select whether the NVE provides
   optimized multicast trees inside the VNI for individual tenant
   multicast groups or whether the default VNI flooding tree is used.
   If the former option is selected, the VNI MUST be able to snoop
   IGMP/MLD messages in order to efficiently join/prune End Systems
   from multicast trees.
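
   A sketch of the per-VNI group membership state such snooping would
   maintain is shown below; the structure and names are assumptions
   made for illustration.

      # IGMP/MLD snooping state: tenant multicast groups are pruned
      # to interested members instead of the default flooding tree.
      members = {}  # (VNI, group) -> set of member VAPs/tunnels

      def report(vni, group, member):   # join message snooped
          members.setdefault((vni, group), set()).add(member)

      def leave(vni, group, member):    # leave message snooped
          members.get((vni, group), set()).discard(member)

      def destinations(vni, group, flood_tree):
          """Optimized tree when members are known, else flooding."""
          return members.get((vni, group)) or flood_tree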

3.2.2. L3 VNI

   L3 VNIs MUST provide virtualized IP routing and forwarding. L3
   VNIs MUST support a per-tenant routing instance with IP addressing
   isolation and L3 tunneling for interconnecting instances of the
   same VNI on NVEs.

   In the case of an L3 VNI, the inner TTL field MUST be decremented
   by 1 as if the NVO3 egress NVE were one hop away. The TTL field in
   the outer IP header must be set to a value appropriate for
   delivery of the encapsulated frame to the tunnel exit point. Thus,
   the default behavior must be the TTL pipe model where the overlay
   network looks like one hop to the sending NVE. Configuration of
   the uniform TTL model, where the outer tunnel TTL is set equal to
   the inner TTL on the ingress NVE and the inner TTL is set to the
   outer TTL value on egress, should be supported.
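
   The two TTL models can be summarized with the following sketch,
   where the configured underlay TTL value is an assumed local
   policy.

      # Pipe (default) vs. uniform TTL handling at the NVE.
      UNDERLAY_TTL = 64  # assumed local policy for the pipe model

      def on_encap(inner_ttl, model="pipe"):
          """Return (inner TTL, outer TTL) at the ingress NVE."""
          inner_ttl -= 1  # the overlay counts as one hop
          outer = inner_ttl if model == "uniform" else UNDERLAY_TTL
          return inner_ttl, outer

      def on_decap(inner_ttl, outer_ttl, model="pipe"):
          """Return the inner TTL forwarded by the egress NVE."""
          return outer_ttl if model == "uniform" else inner_ttl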

   L2 and L3 VNIs can be deployed in isolation or in combination to
   optimize traffic flows per tenant across the overlay network. For
   example, an L2 VNI may be configured across a number of NVEs to
   offer L2 multi-point service connectivity while an L3 VNI can be
   co-located to offer local routing capabilities and gateway
   functionality. In addition, integrated routing and bridging per
   tenant must be supported on an NVE. An instantiation of such a
   service may be realized by interconnecting an L2 VNI as access to
   an L3 VNI on the NVE.

   The L3 VNI does not require support for Broadcast and Unknown
   Unicast traffic. The L3 VNI MAY provide support for customer
   multicast groups. This paragraph will be expanded in a future
   version of the draft.

3.3. Overlay Module

   The overlay module performs a number of functions related to NVO3
   header and tunnel processing. Specifically, for an L2 VNI it
   provides the capability to encapsulate and send Ethernet traffic
   over NVO3 tunnels. For an L3 VNI it provides the capability to
   encapsulate and carry IP traffic (both IPv4 and IPv6) over NVO3
   tunnels.

3.3.1. NVO3 overlay header

   An NVO3 overlay header MUST be included after the tunnel
   encapsulation header when forwarding tenant traffic. This section
   describes the fields that need to be included as part of the NVO3
   overlay header. In this version the focus is on the VN instance
   and service QoS fields. Future versions may include additional
   fields.

3.3.1.1. Virtual Network Identifier (VNID)

   A VNID MUST be included in the overlay encapsulation header by the
   ingress NVE when encapsulating a tenant Ethernet frame or IP
   packet on the overlay tunnel. The egress NVE uses the VNID to
   identify the VN context in which the encapsulated frame should be
   processed. The VNID can be either a globally unique identifier
   (within an administrative domain) or a locally significant
   identifier. It MUST be easily parsed and processed by the data
   path and MAY be distributable by a control plane or configured via
   a management plane.

   The VNID MUST be large enough to scale to hundreds of thousands of
   virtual networks.
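
   As a sizing illustration only (no field width is mandated here), a
   16-bit identifier cannot satisfy this requirement, while 20- or
   24-bit identifiers can:

      # Identifier width vs. number of addressable virtual networks.
      for bits in (16, 20, 24):
          print(bits, "bits ->", 2 ** bits, "virtual networks")
      # 16 bits ->    65536  (insufficient)
      # 20 bits ->  1048576
      # 24 bits -> 16777216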

3.3.1.2. Service QoS identifier

   Traffic flows originating from different applications may rely on
   differentiated forwarding treatment to meet end-to-end
   availability and performance objectives. Such applications may
   span one or more overlay networks. To enable such treatment,
   support for multiple Classes of Service (CoS) across or between
   overlay networks is required.

   To effectively enforce CoS across or between overlay networks, NVEs
   should be able to map CoS markings between networking layers, e.g.,
   Tenant End Systems, Overlays, and/or Underlay, enabling each
   networking layer to independently enforce its own CoS policies. For
   example:

   - TES (e.g. VM) CoS

        o Tenant CoS policies MAY be defined by Tenant administrators

        o QoS fields (e.g. IP DSCP and/or Ethernet 802.1p) in the
          tenant frame are used to indicate application level CoS
          requirements

   - NVE CoS

        o NVE MAY classify packets based on Tenant CoS markings or
          other mechanisms (e.g., DPI) to identify the proper service
          CoS to be applied across the overlay network

        o NVE service CoS levels are normalized to a common set (for
          example 8 levels) across multiple tenants; NVE uses per
          tenant policies to map Tenant CoS to the normalized service
          CoS fields in the NVO3 header

   - Underlay CoS

        o The underlay/core network may use a different CoS set (for
          example 4 levels) than the NVE CoS as the core devices may
          have different QoS capabilities compared with NVEs.

        o The Underlay CoS may also change as the NVO3 tunnels pass
          between different domains.

   Support for NVE Service CoS SHOULD be provided through a QoS field
   inside the NVO3 overlay header. Examples of service CoS provided
   as part of a service tag are the 802.1p and DE bits in VLAN and
   PBB I-SID tags, and the MPLS EXP bits in VPN labels.
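
   A sketch of such per-tenant CoS normalization follows; all mapping
   values are illustrative policy choices, not specified behavior.

      # Tenant CoS -> normalized NVE service CoS -> underlay CoS.
      tenant_policy = {  # per tenant: 802.1p value -> service CoS
          "tenant-a": {0: 0, 5: 4, 7: 6},
          "tenant-b": {0: 1, 5: 5, 7: 7},
      }
      underlay_map = {0: 0, 1: 0, 2: 1, 3: 1,  # 8 service levels
                      4: 2, 5: 2, 6: 3, 7: 3}  # folded into 4

      def mark(tenant, tenant_pcp):
          """Return (NVO3 header service CoS, underlay CoS)."""
          service = tenant_policy[tenant].get(tenant_pcp, 0)
          return service, underlay_map[service]

      print(mark("tenant-a", 5))  # (4, 2)
      print(mark("tenant-b", 5))  # (5, 2)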

3.3.2. NVE Tunneling function

   This section describes NVE tunneling requirements. From an
   encapsulation perspective, IPv4 and IPv6 encapsulations MUST be
   supported; MPLS tunneling MAY be supported.

3.3.2.1. LAG and ECMP

   For performance reasons, multipath over LAG and ECMP paths SHOULD be
   supported.

   LAG (Link Aggregation Group) [IEEE 802.1AX-2008] and ECMP (Equal
   Cost Multi-Path) are commonly used techniques to perform
   load-balancing of microflows over a set of parallel links, either
   at Layer 2 (LAG) or Layer 3 (ECMP). Existing deployed hardware
   implementations of LAG and ECMP use a hash of various fields in
   the encapsulation (outermost) header(s) (e.g., source and
   destination MAC addresses for non-IP traffic; source and
   destination IP addresses, L4 protocol, and L4 source and
   destination port numbers). Furthermore, hardware deployed in the
   underlay network(s) will most often be unaware of the carried,
   innermost L2 frames or L3 packets transmitted by the TES. Thus, in
   order to perform fine-grained load-balancing over LAG and ECMP
   paths in the underlying network, the NVO3 encapsulation headers
   and/or tunneling methods MUST contain an "entropy field" or
   "entropy label" so that the underlying network can perform
   fine-grained load-balancing of the NVO3 encapsulated traffic
   (e.g., [RFC6391], [RFC6438], [draft-kompella-mpls-entropy-label-
   02]). It is recommended that this entropy label/field be applied
   at the ingress VNI, likely using information gleaned from the
   ingress VAP. If necessary, the entropy label/field is discarded at
   the egress VNI.

   All packets that belong to a specific flow MUST follow the same path
   in order to prevent packet re-ordering. This is typically achieved
   by ensuring that the fields used for hashing are identical for a
   given flow.

   All paths available to the overlay network SHOULD be used
   efficiently. Different flows SHOULD be distributed as evenly as
   possible across multiple underlay network paths. This can be
   achieved by ensuring that some fields used for hashing are
   randomly generated.
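
   The following sketch shows one way an ingress NVE could derive
   such an entropy value from the inner headers; the hash inputs, the
   16-bit width and where the value is carried are all assumptions.

      # Per-flow entropy derived from the inner 5-tuple so underlay
      # LAG/ECMP hashing spreads flows while keeping all packets of
      # a flow on a single path (no reordering).
      import zlib

      def entropy(src_ip, dst_ip, proto, sport, dport):
          key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}"
          return zlib.crc32(key.encode()) & 0xFFFF

      # Identical 5-tuples always hash to the same value.
      assert entropy("10.0.0.1", "10.0.0.2", 6, 1234, 80) == \
             entropy("10.0.0.1", "10.0.0.2", 6, 1234, 80)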

3.3.2.2. DiffServ and ECN marking

   When traffic is encapsulated in a tunnel header, there are
   numerous options as to how the Differentiated Services Code Point
   (DSCP) and Explicit Congestion Notification (ECN) markings are set
   in the outer header and propagated to the inner header on
   decapsulation.

   [RFC2983] defines two modes for mapping the DSCP markings from inner
   to outer headers and vice versa.  The Uniform model copies the inner
   DSCP marking to the outer header on tunnel ingress, and copies that
   outer header value back to the inner header at tunnel egress.  The
   Pipe model sets the DSCP value to some value based on local policy
   at ingress and does not modify the inner header on egress.  Both
   models SHOULD be supported.

   ECN marking MUST be performed according to [RFC6040] which describes
   the correct ECN behavior for IP tunnels.
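
   The two DSCP models of [RFC2983] can be sketched as follows; the
   pipe-model policy value is an assumption.

      # Uniform vs. pipe DSCP handling at tunnel endpoints.
      PIPE_DSCP = 0  # assumed local policy for the pipe model

      def dscp_on_encap(inner_dscp, model="uniform"):
          """Outer-header DSCP chosen at tunnel ingress."""
          return inner_dscp if model == "uniform" else PIPE_DSCP

      def dscp_on_decap(inner_dscp, outer_dscp, model="uniform"):
          """Inner-header DSCP after tunnel egress: uniform copies
          the outer value back; pipe leaves the inner untouched."""
          return outer_dscp if model == "uniform" else inner_dscp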

3.3.2.3. Handling of BUM traffic

   NVO3 data plane support for either ingress replication or
   point-to-multipoint tunnels is required to send traffic destined
   to multiple locations on a per-VNI basis (e.g., L2/L3 multicast
   traffic, L2 broadcast and unknown unicast traffic). The two
   methods may be used simultaneously.

   L2 NVEs MUST support ingress replication and SHOULD support point-
   to-multipoint tunnels. L3 VNIs MAY support either one of the two
   methods.

   There is a bandwidth versus state trade-off between the two
   approaches. User-definable knobs MUST be provided to select which
   method(s) is used, based upon the amount of replication required
   (i.e. the number of hosts per group), the amount of multicast
   state to maintain, the duration of multicast flows and the
   scalability of multicast protocols.

   When ingress replication is used, NVEs must track, for each VNI,
   the related tunnel endpoints to which they need to replicate
   frames.
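
   A sketch of that per-VNI state and its use follows; the endpoint
   list would be populated by provisioning or a control plane, and
   the values shown are illustrative.

      # Per-VNI ingress replication of BUM frames.
      bum_endpoints = {"vni-blue": ["192.0.2.2", "192.0.2.3"]}

      def replicate(vni, frame, send):
          """Send one encapsulated copy per remote NVE in the VNI."""
          for nve in bum_endpoints.get(vni, []):
              send(frame, dst=nve)

      replicate("vni-blue", b"broadcast-frame",
                lambda frame, dst: print("copy to", dst))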

   For point-to-multipoint tunnels, bandwidth efficiency is increased
   at the cost of more state in the core nodes. The ability to
   auto-discover or pre-provision the mapping between VNI multicast
   trees and related tunnel endpoints at the NVE and/or throughout
   the core SHOULD be supported.

3.4. External NVO3 connectivity

   NVO3 services MUST interoperate with current VPN and Internet
   services. This may happen inside one DC during a migration phase or
   as NVO3 services are delivered to the outside world via Internet or
   VPN gateways.

   Moreover, the compute and storage services delivered by an NVO3
   domain may span multiple DCs, requiring Inter-DC connectivity.
   From a DC perspective, a set of gateway devices is required in all
   of these cases, albeit with different functionalities influenced
   by the overlay type used across the WAN, the service type and the
   DC network technologies used at each DC site.

   A GW handling the connectivity between NVO3 and external domains
   represents a single point of failure that may affect multiple tenant
   services. Redundancy between NVO3 and external domains MUST be
   supported.

3.4.1. GW Types

3.4.1.1. VPN and Internet GWs

   Tenant sites may already be interconnected using one of the
   existing VPN services and technologies (VPLS or IP VPN). A VPN GW
   is required to translate between the NVO3 and VPN encapsulation
   and forwarding procedures. Internet-connected tenants require
   translation from the NVO3 encapsulation to IP in the NVO3 gateway.
   The translation function SHOULD NOT require provisioning touches
   and SHOULD NOT use intermediate hand-offs, for example VLANs.

3.4.1.2. Inter-DC GW

   Inter-DC connectivity may be required to provide support for
   features like disaster prevention or compute load re-distribution.
   This may be provided through a set of gateways interconnected
   through a WAN. This type of connectivity may be provided either
   through extension of the NVO3 tunneling domain or via VPN GWs.

3.4.1.3. Intra-DC gateways

   Even within one DC there may be End Devices that do not support
   NVO3 encapsulation, for example bare metal servers, hardware
   appliances and storage. A gateway device, e.g. a ToR switch, is
   required to translate between the NVO3 and Ethernet VLAN
   encapsulations.

3.4.2. Path optimality between NVEs and Gateways

   Within the NVO3 overlay, a default assumption is that NVO3 traffic
   will be equally load-balanced across the underlying network
   consisting of LAG and/or ECMP paths. This assumption is valid only
   as long as: a) all traffic is load-balanced equally among each of
   the component-links and paths; and, b) each of the
   component-links/paths is of identical capacity. During the course
   of normal operation of the underlying network, one or more of the
   component-links/paths of a LAG may be taken out of service in
   order to be repaired, e.g., due to hardware failure of cabling,
   optics, etc. In such cases, the administrator should configure the
   underlying network such that an entire LAG bundle will be reported
   as operationally down if there is a failure of any single
   component-link member of the LAG bundle (e.g., N = M configuration
   of the LAG bundle); thus, traffic will be carried sufficiently by
   alternate, available (potentially ECMP) paths in the underlying
   network. This is likely an adequate assumption for Intra-DC
   traffic where presumably the cost of additional protection
   capacity along alternate paths is not prohibitive. Thus, there are
   likely no additional requirements on NVO3 solutions to accommodate
   this type of underlying network configuration and administration.

   There is a similar case with ECMP, used Intra-DC, where failure of
   a single component-path of an ECMP group would result in traffic
   shifting onto the surviving members of the ECMP group.
   Unfortunately, there are no automatic recovery methods in IP
   routing protocols to detect a simultaneous failure of more than
   one component-path in an ECMP group, operationally disable the
   entire ECMP group and allow traffic to shift onto alternative
   paths. This problem is attributable to the underlying network and,
   thus, out of scope of any NVO3 solution.

   On the other hand, for Inter-DC and DC-to-External-Network cases
   that use a WAN, the costs of the underlying network and/or service
   (e.g., an IP VPN service) are higher; therefore, administrators
   are required to both: a) ensure high availability (active-backup
   failover or active-active load-balancing); and, b) maintain
   substantial utilization of the WAN transport capacity at nearly
   all times, particularly in the case of active-active
   load-balancing. With respect to the data plane requirements of
   NVO3 solutions, in the case of active-backup fail-over, all of the
   ingress NVEs MUST dynamically learn of the failure of an active
   NVE GW when the backup NVE GW announces itself into the NVO3
   overlay immediately following a failure of the previously active
   NVE GW, and update their forwarding tables accordingly (e.g.,
   through data plane learning and/or translation of a gratuitous
   ARP, IPv6 Router Advertisement, etc.). Note that active-backup
   fail-over could be used to accomplish a crude form of
   load-balancing by, for example, manually configuring each tenant
   to use a different NVE GW, in a round-robin fashion. On the other
   hand, with respect to active-active load-balancing across
   physically separate NVE GWs (e.g., two separate chassis), an NVO3
   solution SHOULD support forwarding tables that can simultaneously
   map a single egress NVE to more than one NVO3 tunnel. The
   granularity of such mappings, in both active-backup and
   active-active, MUST be unique to each tenant.
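
   A sketch of a forwarding table meeting the active-active
   requirement above, mapping one egress NVE to several tunnels with
   a consistent per-flow choice, is shown below; the structure and
   addresses are illustrative assumptions.

      # Per-tenant mapping of an egress NVE to multiple NVO3 tunnels.
      import zlib

      gw_tunnels = {
          "tenant-a": {"gw": ["198.51.100.1", "198.51.100.2"]},
          "tenant-b": {"gw": ["198.51.100.2"]},  # single active GW
      }

      def pick_tunnel(tenant, egress_nve, flow_key):
          """Pick one active tunnel, consistently per flow."""
          tunnels = gw_tunnels[tenant][egress_nve]
          return tunnels[zlib.crc32(flow_key) % len(tunnels)]

      print(pick_tunnel("tenant-a", "gw", b"flow-1"))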

3.4.2.1. Triangular Routing Issues, a.k.a. Traffic Tromboning

   An L2/ELAN over NVO3 service may span multiple racks distributed
   across different DC regions. Multiple ELANs belonging to one
   tenant may be interconnected or connected to the outside world
   through multiple Router/VRF gateways distributed throughout the DC
   regions. In this scenario, without aid from an NVO3 or other type
   of solution, traffic from an ingress NVE destined to external
   gateways will take a non-optimal path that results in higher
   latency and cost (since it uses more expensive WAN resources). In
   the case of traffic from an IP/MPLS network destined toward the
   entrance to an NVO3 overlay, well-known IP routing techniques may
   be used to optimize traffic into the NVO3 overlay (at the expense
   of additional routes in the IP/MPLS network). In summary, these
   issues are well known as triangular routing.

   Procedures for gateway selection to avoid triangular routing
   issues SHOULD be provided. The details of such procedures are,
   most likely, part of the NVO3 Management and/or Control Plane
   requirements and, thus, out of scope of this document. However, a
   key requirement on the data plane of any NVO3 solution to avoid
   triangular routing is stated above in Section 3.4.2 with respect
   to active-active load-balancing. More specifically, an NVO3
   solution SHOULD support forwarding tables that can simultaneously
   map a single egress NVE to more than one NVO3 tunnel. The
   expectation is that, through the Control and/or Management Planes,
   this mapping information may be dynamically manipulated to, for
   example, provide the closest geographic and/or topological exit
   point (egress NVE) for each ingress NVE.

3.5. Path MTU

   The tunnel overlay header can cause the MTU of the path to the
   egress tunnel endpoint to be exceeded.

   IP fragmentation should be avoided for performance reasons.

   The interface MTU as seen by a Tenant End System SHOULD be
   adjusted such that no fragmentation is needed. This can be
   achieved by configuration or be discovered dynamically (an
   illustrative computation follows the list below).

   At least one of the following options MUST be supported:

     o Classical ICMP-based Path MTU Discovery [RFC1191] [RFC1981]
        or Extended Path MTU Discovery techniques such as defined in
        [RFC4821]

     o Segmentation and reassembly support at the overlay layer,
        without relying on the Tenant End Systems to know about the
        end-to-end MTU

     o The underlay network may be designed in such a way that the MTU
        can accommodate the extra tunnel overhead.
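
   For illustration, a tenant-facing MTU could be derived as below;
   the 36-byte overhead is an assumed figure, since no NVO3 header
   size is defined in this document.

      # Tenant MTU = underlay MTU minus total encapsulation overhead.
      UNDERLAY_MTU = 1500
      OVERHEAD = 20 + 8 + 8  # assumed outer IPv4 + UDP + overlay hdr

      tenant_mtu = UNDERLAY_MTU - OVERHEAD
      print(tenant_mtu)      # 1464, advertised to the TES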

3.6. Hierarchical NVE

   It might be desirable to support the concept of hierarchical NVEs,
   such as spoke NVEs and hub NVEs, in order to address possible NVE
   performance limitations and service connectivity optimizations.

   For instance, the amount of tunneling state to handle within a
   single NVE may become too large. This can happen with both IP
   based tunneling and more specifically with MPLS based tunneling.

   NVEs can be connected in either an any-to-any or a hub-and-spoke
   topology on a per-VNI basis.

3.7. NVE Multi-Homing Requirements

   Multi-homing to a set of NVEs may be required in certain scenarios:

     . End Device dual-homed to two ToR switches acting as NVEs
     . Multi-homing into NVE-GWs providing connectivity between
        domains using different technologies
     . Hierarchical NVEs: Spoke NVE multi-homed to Hub NVEs

   This section will be extended in the next revision.

3.8. OAM

   An NVE may be able to originate/terminate OAM messages for
   connectivity verification, performance monitoring, statistics
   gathering and fault isolation. Depending on configuration, NVEs
   SHOULD be able to process or transparently tunnel OAM messages, as
   well as support alarm propagation capabilities.

   Given the critical requirement to load-balance NVO3 encapsulated
   packets over LAG and ECMP paths, it will be equally critical to
   ensure that existing and/or new OAM tools allow NVE administrators
   to proactively and/or reactively monitor the health of the various
   component-links that comprise both LAG and ECMP paths carrying
   NVO3 encapsulated packets. For example, it will be important that
   such OAM tools allow NVE administrators to reveal the set of
   underlying network hops (topology) so that the underlying network
   administrators can use this information to quickly perform fault
   isolation and restore the underlying network.

   The NVE MUST provide the ability to reveal the set of ECMP and/or
   LAG paths used by NVO3 encapsulated packets in the underlying
   network from an ingress NVE to an egress NVE. The NVE MUST provide
   a "ping"-like functionality that may be used to determine the
   health (liveness) of remote NVEs or their VNIs. The NVE SHOULD
   provide a "ping"-like functionality to more expeditiously aid in
   troubleshooting performance problems, e.g., black-holing or other
   types of congestion occurring in the underlying network, for NVO3
   encapsulated packets carried over LAG and/or ECMP paths.
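
   One way such a tool could exercise the component paths is to sweep
   the entropy value across probes, as sketched below; send_probe()
   is a hypothetical primitive, not an existing OAM message.

      # Probe liveness toward a remote NVE across multiple underlay
      # LAG/ECMP component paths by varying the entropy value.
      def sweep(egress_nve, vni, send_probe, entropy_values):
          """Map each entropy value to its probe result."""
          return {e: send_probe(egress_nve, vni, entropy=e)
                  for e in entropy_values}

      # Example with a stand-in probe function:
      print(sweep("192.0.2.2", 10001,
                  lambda nve, vni, entropy: "ok", range(4)))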

3.9. Other considerations

3.9.1. Data Plane Optimizations

   Data plane forwarding and encapsulation choices SHOULD consider
   the limitations of possible NVE implementations, specifically
   software based implementations (e.g., servers running VSwitches).

   NVEs should provide efficient processing of traffic. For instance,
   packet alignment, the use of offsets to minimize header parsing
   and padding techniques SHOULD be considered when designing NVO3
   encapsulation types.

   Software-based NVEs SHOULD make use of hardware assist provided by
   NICs in order to speed up packet processing.

3.9.2. NVE location trade-offs

   In the case of DC traffic, traffic originating from a VM is native
   Ethernet traffic. This traffic can be switched by a local VM
   switch or ToR switch and then by a DC gateway. The NVE function
   can thus be supported in various DC network elements: a VM, VM
   switch, ToR switch or DC GW.

   The following criteria SHOULD be considered when deciding where
   the NVE processing boundary is placed:

     o Processing and memory requirements

          o Datapath (e.g. lookups, filtering,
            encapsulation/decapsulation)

          o Control plane processing (e.g. routing, signaling, OAM)

     o FIB/RIB size

     o Multicast support

          o Routing protocols

          o Packet replication capability

     o Fragmentation support

     o QoS transparency

     o Resiliency

4. Security Considerations

   The tenant-to-overlay mapping function can introduce significant
   security risks if appropriate protocols that support
   authentication are not used.

   No other new security issues are introduced beyond those described
   already in the related L2VPN and L3VPN RFCs.

5. IANA Considerations

   IANA does not need to take any action for this draft.

6. References

6.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

6.2. Informative References

   [NVOPS]  Narten, T. et al, "Problem Statement: Overlays for Network
             Virtualization", draft-narten-nvo3-overlay-problem-
             statement (work in progress)

   [NVO3-framework]  Lasserre, M. et al, "Framework for DC Network
             Virtualization", draft-lasserre-nvo3-framework (work in
             progress)

   [OVCPREQ] Kreeger, L. et al, "Network Virtualization Overlay Control
             Protocol Requirements", draft-kreeger-nvo3-overlay-cp
             (work in progress)

   [FLOYD]  Sally Floyd, Allyn Romanow, "Dynamics of TCP Traffic over
             ATM Networks", IEEE JSAC, V. 13 N. 4, May 1995

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

   [RFC1191] Mogul, J. and S. Deering, "Path MTU Discovery", RFC
             1191, November 1990

   [RFC1981] McCann, J. et al, "Path MTU Discovery for IP version
             6", RFC 1981, August 1996

   [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path
             MTU Discovery", RFC 4821, March 2007

   [RFC2983] Black, D., "Differentiated Services and Tunnels", RFC
             2983, October 2000

   [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion
             Notification", RFC 6040, November 2010

   [RFC6438] Carpenter, B. and S. Amante, "Using the IPv6 Flow Label
             for Equal Cost Multipath Routing and Link Aggregation
             in Tunnels", RFC 6438, November 2011

   [RFC6391] Bryant, S. et al, "Flow-Aware Transport of Pseudowires
             over an MPLS Packet Switched Network", RFC 6391,
             November 2011

7. Acknowledgments

   In addition to the authors the following people have contributed to
   this document:

   Shane Amante, Level3

   Dimitrios Stiliadis, Rotem Salomonovitch, Alcatel-Lucent

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Nabil Bitar
   Verizon
   40 Sylvan Road
   Waltham, MA 02145
   Email: nabil.bitar@verizon.com

   Marc Lasserre
   Alcatel-Lucent
   Email: marc.lasserre@alcatel-lucent.com

   Florin Balus
   Alcatel-Lucent
   777 E. Middlefield Road
   Mountain View, CA, USA 94043
   Email: florin.balus@alcatel-lucent.com
