


NVO3 working group                                  L. Dunbar
Internet Draft                                    D. Eastlake
Category: Informational                                Huawei


Expires: April 4 2014                      September 20, 2013




                         NVO3 NVA Gap Analysis

                 draft-dunbar-nvo3-nva-gap-analysis-01

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents




Dunbar, et al         Expires November 2014           [Page 1]

Internet-Draft    draft-dunbar-nv03-NVA-gap-analysis   June 2013



   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided
   without warranty as described in the Simplified BSD License.



Abstract

   This draft identifies the gaps between existing solutions and the
   requirements for NVO3's NVE<->NVA control plane. The gap analysis
   provides a basis for future work on solutions for the NVE<->NVA
   control plane.



Table of Contents


   1. Introduction
   2. Terminology
   3. Overall Requirement for NVE<->NVA Control Plane
   4. Existing Directory Components
      4.1. Types of NVA
      4.2. Key components of the information kept in the NVA
      4.3. Mapping Entries Distribution Mechanism
         4.3.1. Push Mode
         4.3.2. Pull Mode
         4.3.3. Hybrid Mode
   5. Redundancy
   6. Inconsistency Processing
   7. Gap Summary
      7.1. Features necessary to NVO3 but not present in TRILL
      7.2. Additional detailed requirement applicable to NVO3's NVA
   8. Security Considerations
   9. IANA Considerations
   10. Acknowledgements
   11. References
      11.1. Normative References
      11.2. Informative References
   Authors' Addresses











1. Introduction

   This draft identifies the gaps between existing solutions and
   NVO3's requirements for the Network Virtualization Authority
   (NVA). The gap analysis provides a basis for future work on NVA
   solutions.

   The existing solutions analyzed in this draft are the LISP
   mapping database system and TRILL's directory mechanism.

   Section 4.5 of [nvo3-problem-statement] describes the back-end
   Network Virtualization Authority (NVA) that is responsible for
   distributing the mapping information for the entire overlay
   system. [nvo3-nve-nva-cp-req] defines the requirements for the
   control plane between the NVA and the NVE.

   LISP, TRILL [RFC6325], and NVO3 have many similarities: LISP uses
   an IP header to build the overlay, TRILL uses the TRILL header,
   and NVO3 uses L3 headers plus a VNID. This draft analyzes the
   TRILL directory mechanisms, along with parts of the LISP mapping
   database system, that are applicable to NVO3's NVA<->NVE control
   plane and summarizes the gaps.

2. Terminology

   The following terms are used interchangeably in this document:

             - The terms "Subnet" and "VLAN", because it is common to
               map one subnet to one VLAN.
             - The terms "Directory" and "Network Virtualization
               Authority (NVA)"
             - The terms "NVE" and "Edge"


   Bridge:  IEEE Std 802.1Q-2011 compliant device [802.1Q]. In this
             draft, Bridge is used interchangeably with Layer 2
             switch.

   DA:     Destination Address

   DC:      Data Center

   EoR:    End of Row switches in data center. Also known as
             aggregation switches.

   End Station:    Guest OS running on a physical server or on a
             virtual machine. An end station in this document has at
             least one IP address and at least one MAC address, which
             could be in the DA or SA field of a data frame.

   LISP:    Locator/ID Separation Protocol

   RBridge: "Routing Bridge", an alternative name for a TRILL
             switch.

   NVA:    Network Virtualization Authority

   NVE:    Network Virtualization Edge

   SA:     Source Address

   Station: A node, or a virtual node, with IP and/or MAC addresses,
             which could be in the DA or SA of a data frame.

   ToR:    Top of Rack switch in a data center. Also known as an
             access switch in some data centers.

   TRILL:   Transparent Interconnection of Lots of Links [RFC6325]

   TRILL switch: A device implementing the TRILL protocol [RFC6325]

   TS:     Tenant System

   VM:     Virtual Machine

   VN:     Virtual Network

   VNID:    Virtual Network Instance Identifier



3. Overall Requirement for NVE<->NVA Control Plane

   Section 3.1 of [nvo3-cp-req] describes the basic requirement of
   inner address to outer address mapping for NVO3. An NVE needs to
   know the mapping of the Tenant System destination (inner) address
   to the (outer) IP address, on the underlying network, of the
   egress NVE, in the same way that a TRILL edge node needs to know
   how the inner MAC/VLAN is mapped to the egress TRILL edge.

   Section 3.1 of [nvo3-cp-req] states that a protocol is needed to
   provide this inner to outer mapping and VN Context to each NVE
   that requires it and keep the mapping updated in a timely manner.
   Timely updates are important for maintaining connectivity between
   Tenant Systems.

   TRILL's directory mechanism and the LISP mapping database system
   aim at the same goal as NVO3's NVE-NVA control plane, i.e.
   distributing the mapping table that edge nodes use to tunnel
   traffic across the underlying network. It is therefore worthwhile
   to examine TRILL's directory mechanism and the LISP mapping
   database system and to analyze the gaps.

4. Existing Directory Components

   For ease of description, the table below maps the terminology
   used by TRILL and LISP to that of NVO3. This document uses NVO3
   terminology as much as possible when describing TRILL's directory
   assistance mechanism.



   NVO3      LISP             TRILL

   ----     --------         --------------------------

   NVE       Edge             Edge, TRILL Edge or RBridge Edge

   NVA       MapServer        Directory



   4.1. Types of NVA

   NVAs can be centralized or distributed, with each NVA holding the
   mapping information for a subset of VNs. A centralized NVA could
   have multiple entities for redundancy. An NVA could be
   instantiated on a server/VM attached to an NVE, very much like a
   TS attached to an NVE, or could be integrated with an NVE. When
   an NVA is a standalone server/VM attached to an NVE, other NVEs
   have to be able to reach it via the attached NVE. An NVA can
   also be instantiated on an NVE that doesn't have any TSs
   attached. The NVE-NVA control plane requires more functions on
   NVEs when the NVA is attached to an NVE than when the NVA is
   instantiated on an NVE.












   4.2. Key components of the information kept in the NVA

   The information held by the TRILL directories is inner-outer
   address mapping information as well as hosts' VLAN IDs. The same
   is true for NVO3's NVA. For each TS (or VM), the TRILL directory
   has the following attributes:

      1. Inner Address: the TS (host) address and its address family
        (IPv4/IPv6, MAC, virtual network identifier MPLS/VLAN, etc.).
      2. Outer Address: the list of locally attached edges (NVEs).
        Normally one TS is attached to one edge, but a TS can also
        be attached to two edges for redundancy (dual homing).
        Attaching one TS to more than two edges is possible but
        rare.
      3. Timer that tells NVEs how long to keep the entry once it
        has been pushed down to or pulled by NVEs.
      4. Optionally, the list of interested remote edges (NVEs). This
        information lets the NVA promptly update the relevant edges
        (NVEs) when this TS's attachment to edges (NVEs) changes.
        This information doesn't have to be kept per TS; it can be
        kept per VN.

     NVO3's NVA will need one additional attribute: VN Context (VN
     ID and/or VN Name).
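
     As an illustration only, the attributes listed above, together
     with the VN Context, could be held in a record like the one in
     the following Python sketch. The class name, the field names,
     and the sample addresses are assumptions made for this example;
     neither TRILL nor NVO3 defines them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MappingEntry:
    """One NVA record for a Tenant System (hypothetical layout)."""
    vn_context: str                 # VN ID and/or VN Name (NVO3 addition)
    inner_family: str               # e.g. "ipv4", "ipv6", "mac"
    inner_address: str              # TS (host) address
    outer_addresses: List[str]      # attached NVEs; usually 1, 2 if dual-homed
    ttl_seconds: int                # how long an NVE may keep the entry
    interested_nves: Optional[List[str]] = None  # optional; may be kept per VN

    def is_dual_homed(self) -> bool:
        return len(self.outer_addresses) >= 2


entry = MappingEntry("vn-blue", "ipv4", "10.1.1.5",
                     ["192.0.2.1", "192.0.2.2"], ttl_seconds=300)
```

     Leaving interested_nves unset models the case where the list of
     interested edges is kept per VN instead of per TS.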



   4.3. Mapping Entries Distribution Mechanism

     A directory can offer its services in Push Mode, in Pull Mode,
     or in a combination of the two.

    4.3.1. Push Mode

      Under this mode, Directory Server(s) push the inner-outer
      mappings for all the entries of the VNs that are enabled on an
      edge node (NVE). If the destination of a data frame arriving
      at the ingress edge (NVE) can't be found in the inner-outer
      mapping database that was pushed down from the Directory
      Server(s) (or NVA), the ingress edge could be configured with
      one or more of the following policies:

             - simply drop the data frame,
             - use legacy method(s) to forward the data frame to
               other edges, or
             - start the "pull" process to get the information from
               Pull Directory Server(s) (or NVA). While the edge is
               waiting for the reply to the pull request, it can
               either drop or queue the packet.

      Again, the VN Context (VNID or VN name) needs to be added for
      NVO3.
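
      The lookup and the miss policies described above can be
      sketched as follows. This is a minimal Python illustration
      under stated assumptions: the names MissPolicy and
      forward_decision, the action strings, and the table layout are
      hypothetical, and the sketch applies a single configured
      policy per edge although the text allows more than one.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class MissPolicy(Enum):
    DROP = "drop"
    LEGACY = "legacy"   # forward with legacy method(s)
    PULL = "pull"       # query a Pull Directory (NVA)


# Key: (VN Context, inner address) -> outer address of the egress NVE.
PushedTable = Dict[Tuple[str, str], str]


def forward_decision(table: PushedTable, vn: str, inner_da: str,
                     policy: MissPolicy) -> Tuple[str, Optional[str]]:
    """Return (action, egress NVE) for one ingress frame."""
    outer = table.get((vn, inner_da))
    if outer is not None:
        return ("encapsulate", outer)
    # Miss in the pushed table: apply the configured policy.
    return (policy.value, None)


table = {("vn-blue", "10.1.1.5"): "192.0.2.1"}
hit = forward_decision(table, "vn-blue", "10.1.1.5", MissPolicy.DROP)
miss = forward_decision(table, "vn-blue", "10.1.1.9", MissPolicy.PULL)
```

      Keying the table on the VN Context as well as the inner
      address reflects the point that NVO3 needs the VN Context in
      each mapping.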

      One drawback of the Push Mode is that it usually pushes more
      mapping entries to an edge (NVE) than the edge needs. Under
      the normal process of edge cache aging and unknown destination
      address flooding, rarely used entries would have been removed.
      It is difficult for Directory Servers (NVA) to predict the
      communication patterns among TSs within one VN, so the
      Directory Servers are likely to push down all the entries for
      all the VNs that are enabled on the NVE.

      In addition, Push Mode leaves no room for source-based policy:
      it is all or nothing.

   4.3.1.1. Requesting Push Service

      In the Push Mode, an edge node (NVE) needs a way to request
      that directory server(s) (NVA) start the pushing process, e.g.
      when the NVE is initialized or restarted. Alternatively, the
      process can start automatically, as in a routing protocol.

      Push Directory servers (NVAs) advertise their availability to
      push mapping information for a particular virtual network to
      all edges that participate in the VN. There could be multiple
      directories (NVAs), each having mapping information for a
      subset of VNs.

      A TRILL edge uses modified Virtual Network scoped instances of
      the IS-IS reliable link state flooding protocol, a.k.a. the
      ESADI protocol mechanism, to announce all the Virtual Networks
      in which it is participating to the directories (NVAs) that
      have the mapping information for those VNs. In this way an
      edge subscribes to push directory information.

      The subscription is VN scoped, so a directory server doesn't
      need to push down the entire set of mapping entries. Each Push
      Directory server also has a priority. For robustness, the one
      or two directories with the highest priority are considered
      Active in pushing information for the VN to all edges that
      have subscribed to that VN.
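
      The priority-based choice of Active directories can be
      sketched as follows. The Python below is illustrative only: it
      assumes that a larger number means a higher priority and that
      ties are broken by directory name, neither of which is stated
      in this document, and the names PushDirectory and
      active_directories are hypothetical.

```python
from typing import List, NamedTuple


class PushDirectory(NamedTuple):
    name: str
    priority: int       # larger value = higher priority (assumed)
    vns: frozenset      # VNs for which this directory holds mappings


def active_directories(dirs: List[PushDirectory], vn: str,
                       max_active: int = 2) -> List[str]:
    """Pick the highest-priority directories serving a VN."""
    candidates = [d for d in dirs if vn in d.vns]
    # Tie-break on name so every edge reaches the same answer (assumption).
    candidates.sort(key=lambda d: (-d.priority, d.name))
    return [d.name for d in candidates[:max_active]]


dirs = [
    PushDirectory("nva-a", 10, frozenset({"vn-blue", "vn-red"})),
    PushDirectory("nva-b", 30, frozenset({"vn-blue"})),
    PushDirectory("nva-c", 20, frozenset({"vn-blue"})),
]
active = active_directories(dirs, "vn-blue")
```

      Because the selection is scoped to one VN, a directory that is
      Active for one VN need not be Active for another.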



   4.3.1.2. Incremental Push Service

      Whenever a TS's association with an edge (NVE) changes, e.g.
      because the TS is added, removed, or de-commissioned, an
      incremental update can be sent to the edges that are affected
      by the change. Directory servers (NVA) and edges (NVEs)
      therefore have to maintain sequence numbers.

      If the Push Directory server is configured to believe it has
      complete mapping information for VN X, then, after it has
      actually transmitted all of its ESADI-LSPs for X, it waits its
      CSNP time (see Section 6.1 of [ESADI]) and then updates its
      ESADI-Parameters APPsub-TLV to set the Complete Push (CP) bit
      to one. It then maintains the CP bit as one as long as it is
      Active.
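
      To show why both sides need sequence numbers, the following
      Python sketch applies numbered incremental updates at an edge.
      It is a simplification under stated assumptions: it uses one
      counter per VN that increases by one per update, which is not
      the ESADI encoding, and the class name, method name, and
      return strings are hypothetical.

```python
from typing import Dict, Optional


class PushedVnTable:
    """Edge-side table for one VN, updated by numbered increments."""

    def __init__(self) -> None:
        self.last_seq = 0
        self.entries: Dict[str, str] = {}   # inner address -> outer address

    def apply(self, seq: int, inner: str, outer: Optional[str]) -> str:
        if seq <= self.last_seq:
            return "duplicate"              # already applied; ignore
        if seq != self.last_seq + 1:
            return "gap"                    # update missed; ask for a resync
        if outer is None:
            self.entries.pop(inner, None)   # TS removed or de-commissioned
        else:
            self.entries[inner] = outer     # TS added or moved
        self.last_seq = seq
        return "applied"


t = PushedVnTable()
results = [t.apply(1, "10.1.1.5", "192.0.2.1"),
           t.apply(1, "10.1.1.5", "192.0.2.1"),
           t.apply(3, "10.1.1.6", "192.0.2.2"),
           t.apply(2, "10.1.1.5", None)]
```

      Without the sequence check, the edge could not tell a
      retransmitted update from a new one, or notice that an update
      was lost.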

    4.3.2. Pull Mode

      Under this mode, an NVE pulls the mapping entry from the
      directory servers (or NVA) when its cache doesn't have the
      entry.

      The main advantage of Pull Mode is that state is stored only
      where it needs to be stored and only when it is required. In
      addition, in the Pull Mode, edge nodes (NVEs) can age out
      mapping entries if they haven't been used for a certain period
      of time. Therefore, each edge (NVE) will only keep the entries
      that are frequently used, so its mapping table size will be
      smaller than a complete table pushed down from NVA.
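
      The on-demand fill and the aging behaviour described above can
      be sketched as follows. This Python illustration assumes a
      synchronous query_nva callback and an idle timeout chosen by
      the caller; both are hypothetical, and a real NVE would issue
      the query asynchronously.

```python
from typing import Callable, Dict, Optional, Tuple


class PullCache:
    """NVE cache that fills on demand and ages out idle entries."""

    def __init__(self, query_nva: Callable[[str, str], Optional[str]],
                 idle_timeout: float) -> None:
        self.query_nva = query_nva
        self.idle_timeout = idle_timeout
        # (VN, inner address) -> (outer address, time last used)
        self.entries: Dict[Tuple[str, str], Tuple[str, float]] = {}

    def lookup(self, vn: str, inner: str, now: float) -> Optional[str]:
        key = (vn, inner)
        hit = self.entries.get(key)
        if hit is not None and now - hit[1] <= self.idle_timeout:
            self.entries[key] = (hit[0], now)   # refresh last-used time
            return hit[0]
        outer = self.query_nva(vn, inner)       # cache miss: pull from NVA
        if outer is not None:
            self.entries[key] = (outer, now)
        else:
            self.entries.pop(key, None)
        return outer

    def age_out(self, now: float) -> None:
        self.entries = {k: v for k, v in self.entries.items()
                        if now - v[1] <= self.idle_timeout}


calls = []


def fake_nva(vn, inner):
    calls.append((vn, inner))
    return {"10.1.1.5": "192.0.2.1"}.get(inner)


cache = PullCache(fake_nva, idle_timeout=60.0)
first = cache.lookup("vn-blue", "10.1.1.5", now=0.0)
second = cache.lookup("vn-blue", "10.1.1.5", now=30.0)
cache.age_out(now=200.0)
```

      The second lookup is served from the cache, so the NVA is
      queried only once, and the entry is dropped after it has been
      idle for longer than the timeout.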

      The drawback of Pull Mode is that it might take some time for
      an NVE to pull the needed mapping from the NVA. Until the NVE
      gets the response from the NVA, it has to buffer subsequent
      data frames destined to the same target, and the buffer could
      overflow before the response arrives. However, this scenario
      should not happen very often in a data center environment,
      because the TSs are most likely end systems that have to wait
      for a (TCP) acknowledgement before sending subsequent data
      frames. Another option is to forward, not flood, subsequent
      frames to a default location, i.e. to a re-encapsulating NVE.



      The practice of an edge waiting and dropping packets upon
      receiving an unknown DA is not new. Most deployed routers
      today drop packets while waiting for target addresses to be
      resolved, because it is too expensive to queue subsequent
      packets during resolution. Upon receiving a packet whose DA is
      not in its ARP/ND cache, a router sends an ARP/ND request to
      the target and waits for the ARP or ND response. This practice
      minimizes flooding when targets don't exist in the subnet.
      When the target doesn't exist in the subnet, routers generally
      re-send the ARP/ND request a few more times before dropping
      the packets. The time routers spend waiting for an ARP/ND
      response when the target doesn't exist in the subnet can be
      longer than the time the Pull Mode takes to get the mapping
      from the NVA.

   4.3.2.1. Pull Requests

      Here are some events that can trigger the pulling process:

        o An edge node (NVE) receives an ingress data frame with a
          destination whose attached edge (NVE) is unknown, or
        o The edge node (NVE) receives an ingress ARP/ND request for
          a target whose link address (MAC) or attached edge (NVE)
          is unknown.

      Each Pull request can have queries for multiple inner-outer
      mapping entries.

   4.3.2.2. Pull Response

      A Pull Response can take one of several forms:

      1. A valid inner-outer address mapping, coupled with a timer
         indicating how long the entry can be cached by the edge
         (NVE). The cache timer should be short in an environment
         where VMs move frequently. The cache timer can also be
         configured.

      2. The target being queried is not available. The response
         should include the policy that tells the requester whether
         to forward the data frame in the legacy way or to drop it.

      3. The requester is administratively prohibited from getting an
         informative response.


      If no response is received to a Pull Directory request within
      a configurable timeout, the request should be re-transmitted
      with the same Sequence Number up to a configurable number of
      times that defaults to three.
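
      The retransmission rule above can be sketched as follows. In
      this Python illustration the send callback stands in for the
      transport and returns None when the configurable timeout
      expires; the callback, the function name, and the sample
      values are hypothetical.

```python
from typing import Callable, List, Optional


def pull_with_retry(send: Callable[[int, List[str]], Optional[dict]],
                    seq: int, queries: List[str],
                    max_retransmits: int = 3) -> Optional[dict]:
    """Send one Pull request; re-send it unchanged if no response arrives.

    `send` is a hypothetical transport hook that returns the response,
    or None when the configurable timeout expires.
    """
    for _attempt in range(1 + max_retransmits):   # original + retransmissions
        response = send(seq, queries)             # same Sequence Number each time
        if response is not None:
            return response
    return None


attempts = []


def flaky_send(seq, queries):
    attempts.append(seq)
    return {"10.1.1.5": "192.0.2.1"} if len(attempts) == 3 else None


reply = pull_with_retry(flaky_send, seq=7, queries=["10.1.1.5", "10.1.1.6"])
```

      Reusing the Sequence Number lets the directory and the edge
      match a late response to the original request. The request in
      the example also carries two queries, as Section 4.3.2.1
      permits.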

   4.3.2.3. Cache Consistency

      It is important that the cached information be kept consistent
      with the actual placement of VMs. Therefore, it is highly
      desirable to have a mechanism that prevents NVEs from using
      stale mapping entries.

      When data at a Pull Directory changes, e.g. an entry is deleted
      or a new entry is added, and there may be unexpired stale
      information at a querying edge (NVE), the Pull Directory MUST
      send an unsolicited message to the edge (NVE).

      To achieve this goal, a Pull Directory server MUST maintain
      one of the following, in order of increasing specificity:

      1. An overall record per VN of when the last returned query
      data will expire at a requester and when the last query record
      specific negative response will expire.

      2. For each unit of data (IA APPsub-TLV Address Set) held by
      the server and each address about which a negative response
      was sent, when the last expected response with that unit or
      negative response will expire at a requester.

      Note: Caching negative replies is especially important,
      because there are many invalid address queries. A study has
      shown that for each valid ND query there are hundreds of
      invalid address queries.



      3. For each unit of data held by the server and each address
      about which a negative response was sent, a list of Edges that
      were sent that unit as the response or sent a negative
      response to the address, with the expected time to expiration
      at each of them.
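
      The most specific of the three options, a per-unit list of
      edges with their expiration times, can be sketched as follows.
      The Python below is an illustration under stated assumptions:
      the class and method names, the string used as the unit key,
      and the use of plain numbers for time are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class PullDirectoryLedger:
    """Remembers which edges hold which answers, and until when."""

    def __init__(self) -> None:
        # unit key -> {edge name -> expiry time at that edge}
        self.holders: Dict[str, Dict[str, float]] = defaultdict(dict)

    def record_response(self, unit: str, edge: str,
                        now: float, lifetime: float) -> None:
        # Covers positive and negative responses alike.
        self.holders[unit][edge] = now + lifetime

    def edges_to_notify(self, unit: str, now: float) -> List[str]:
        """Edges that may still cache `unit` when it changes."""
        live = {e: t for e, t in self.holders.get(unit, {}).items() if t > now}
        return sorted(live)


ledger = PullDirectoryLedger()
ledger.record_response("vn-blue/10.1.1.5", "nve-1", now=0.0, lifetime=60.0)
ledger.record_response("vn-blue/10.1.1.5", "nve-2", now=0.0, lifetime=10.0)
stale_holders = ledger.edges_to_notify("vn-blue/10.1.1.5", now=30.0)
```

      With this record the directory sends the unsolicited message
      only to edges whose cached copy has not yet expired. The less
      specific options need less state but notify more edges than
      necessary.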



   4.3.2.4. Pull Request Errors

      If errors occur at the query level, they MUST be reported in a
      response message separate from the results of any successful
      queries. If multiple queries in a request have different
      errors, they MUST be reported in separate response messages.
      If multiple queries in a request have the same error, this
      error response MAY be reported in one response message.
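
      The grouping rules above can be sketched as follows. This
      Python illustration takes the MAY option of combining queries
      that share an error; the function name, the message layout,
      and the error names are hypothetical and are not taken from
      any protocol specification.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (query, result, error): exactly one of result/error is set.
Outcome = Tuple[str, Optional[str], Optional[str]]


def build_responses(outcomes: List[Outcome]) -> List[dict]:
    """Split one request's outcomes into response messages."""
    ok: Dict[str, Optional[str]] = {}
    by_error: Dict[str, List[str]] = defaultdict(list)
    for query, result, error in outcomes:
        if error is None:
            ok[query] = result
        else:
            by_error[error].append(query)
    responses: List[dict] = []
    if ok:
        # Successful results travel in their own message.
        responses.append({"error": None, "results": ok})
    # One message per distinct error; same-error queries share a message.
    for error, queries in by_error.items():
        responses.append({"error": error, "queries": queries})
    return responses


msgs = build_responses([
    ("10.1.1.5", "192.0.2.1", None),
    ("10.1.1.6", None, "unknown-vn"),
    ("10.1.1.7", None, "unknown-vn"),
    ("10.1.1.8", None, "malformed-address"),
])
```

      In the example, one request with four queries produces three
      response messages: one for the successful result and one for
      each of the two distinct errors.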



   4.3.2.5. Redundant Pull Directories (NVAs)

      There could be multiple directories (NVAs) holding mapping
      information for a particular VN for reliability or scalability
      purposes. Pull Directories (NVAs) advertise themselves by
      setting the Pull Directory flag in their Interested VNs
      sub-TLV [rfc6326bis].

      A pull request can be sent to any of them that is reachable,
      but it is RECOMMENDED that pull requests be sent to the server
      (NVA) that is least cost from the requesting edge (NVE).
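
      The RECOMMENDED choice of server can be sketched as follows.
      The Python below assumes that the edge already knows a path
      cost to each Pull Directory and marks unreachable ones with
      None; the function name and the tie-break on directory name
      are assumptions made for this example.

```python
from typing import Dict, Optional


def pick_pull_directory(costs: Dict[str, Optional[float]]) -> Optional[str]:
    """Choose the reachable Pull Directory with the lowest path cost.

    `costs` maps a directory name to its cost from this edge, or to
    None when the directory is currently unreachable.
    """
    reachable = {name: c for name, c in costs.items() if c is not None}
    if not reachable:
        return None
    # Name is a deterministic tie-break (assumption, not from the draft).
    return min(reachable, key=lambda name: (reachable[name], name))


choice = pick_pull_directory({"nva-a": 30.0, "nva-b": 10.0, "nva-c": None})
```

      If the least-cost directory becomes unreachable, calling the
      function again with updated costs yields the next best one.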


    4.3.3. Hybrid Mode

      For edge nodes that have a great number of VNs enabled, and a
      large combined number of hosts under all those VNs, managing
      the inner-outer address mappings for the hosts under all those
      VNs can be a challenge. This is especially true for Data
      Center gateway nodes, which need to communicate with a
      majority of VNs, if not all.

      For those edge nodes, a hybrid mode should be considered: the
      Push Mode is used for some VNs and the Pull Mode for other
      VNs. The network operator decides by configuration which VNs'
      mapping entries are pushed down from directories (NVA) and
      which VNs' mapping entries are pulled.
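
      A per-VN configuration of this kind can be sketched as
      follows. The Python below is illustrative only: the table
      name, the VN names, and the choice of Pull Mode as the default
      for unlisted VNs are assumptions, not requirements of this
      document.

```python
from typing import Dict

# Operator configuration on one edge: VN -> distribution mode.
VN_MODE: Dict[str, str] = {
    "vn-blue": "push",
    "vn-red": "pull",
}


def distribution_mode(vn: str, default: str = "pull") -> str:
    """Return how this edge obtains mapping entries for a VN."""
    mode = VN_MODE.get(vn, default)
    if mode not in ("push", "pull"):
        raise ValueError(f"unsupported mode for {vn}: {mode}")
    return mode
```

      For example, distribution_mode("vn-blue") selects Push Mode,
      while a VN that is not listed falls back to the default.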

      In addition, the directory can inform the Edge to use the
      legacy way to forward if it doesn't have the mapping
      information, or that the Edge is administratively prohibited
      from forwarding data frames to the requested target.



5. Redundancy

   For redundancy, more than one directory (NVA) should hold the
   mapping information for each VN. At any given time, only one or
   a small number of push directories are considered active for a
   particular VN. All NVAs should announce their capability and
   priority to all the edges.

6. Inconsistency Processing

   If an edge (NVE) notices that a Push Directory server (NVA) is no
   longer reachable [RFCclear], it MUST ignore any Push Directory
   data from that server because it is no longer being updated and
   may be stale.

   There may be transient conflicts between mapping information from
   different Push Directory servers (NVAs) or conflicts between
   locally learned information and information received from a Push
   Directory server. TRILL associates a confidence level with
   address table information so, in case of such conflicts,
   information with a higher confidence value is preferred over
   information with a lower confidence. In case of equal confidence,
   Push Directory information is preferred to locally learned
   information and if information from Push Directory servers
   conflicts, the information from the higher priority Push
   Directory server is preferred.
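
   The preference order described above can be sketched as follows.
   In this Python illustration the numeric confidence values, the
   field names, and the assumption that a larger number means a
   higher Push Directory priority are hypothetical.

```python
from typing import NamedTuple, Tuple


class Candidate(NamedTuple):
    outer: str
    confidence: int          # TRILL-style confidence level
    from_push_dir: bool      # False means locally learned
    dir_priority: int = 0    # only meaningful for Push Directory data


def _rank(c: Candidate) -> Tuple[int, bool, int]:
    # Order of comparison: confidence, then source, then directory priority.
    return (c.confidence, c.from_push_dir,
            c.dir_priority if c.from_push_dir else 0)


def preferred(a: Candidate, b: Candidate) -> Candidate:
    """Apply the preference order described above to two candidates."""
    return a if _rank(a) >= _rank(b) else b


learned = Candidate("192.0.2.9", confidence=32, from_push_dir=False)
low = Candidate("192.0.2.1", confidence=32, from_push_dir=True, dir_priority=10)
high = Candidate("192.0.2.2", confidence=32, from_push_dir=True, dir_priority=20)
winner = preferred(preferred(learned, low), high)
```

   With equal confidence, the Push Directory entries beat the
   locally learned one, and the entry from the higher priority
   directory wins overall.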

7. Gap Summary

   7.1. Features necessary to NVO3 but not present in TRILL

     NVO3's NVA will need one additional attribute: VN context (VN
     ID and/or VN Name).

     For data center networks that don't have the IS-IS protocol
     enabled, other mechanisms have to be considered.












   7.2. Additional detailed requirement applicable to NVO3's NVA

   The following detailed requirements of TRILL's directory should
   be considered for the NVO3 NVA as well:

   - Push Mode:
        o For redundancy purposes, for each VN there should be
           multiple NVA entities holding the mapping information for
           the TSs in the VN. At any given time, only one or a small
           number of the NVAs are considered Active for a
           particular VN. All NVAs should announce their capability
           and priority to all the edges.
        o If the destination of a data frame arriving at the ingress
           edge (NVE) can't be found in the inner-outer mapping table
           that was pushed down from the Directory Server(s) (NVA),
           the ingress edge could be configured to:

               - simply drop the data frame,
               - flood it to all other edges that are in the same VN,
                 or
               - start the "pull" process to get the information
                 from Pull Directory Server(s) (or NVA).
        o If an NVE loses its connection to an NVA, it MUST ignore
           any Push Directory data from that server because it is no
           longer being updated and may be stale.
        o When a transient conflict occurs, higher priority data
           takes precedence.

   - Pull Mode:
        o The Pull Directory response could indicate that the
           address being queried is not available in NVA or that the
           requestor is administratively prohibited from getting an
           informative response.
        o The timer for ingress NVE caching should be short in an
           environment where VMs move frequently. The cache timer
           could be configured or could be sent along with the Pulled
           reply from the NVA.
        o Each Pull request can have multiple queries for different
           TSs.
        o It is highly desirable to have a mechanism to prevent NVEs
           from using the stale mapping entries pulled from NVA.
        o While waiting for the query response from the NVA, the NVE
           has to buffer subsequent data frames destined to the same
           target. The buffer could overflow before the NVE gets the
           response from the NVA.
        o If no response is received to a Pull Directory request
           within a configurable timeout, the request should be re-
           transmitted with the same Sequence Number up to a
           configurable number of times.

   - Hybrid Mode:
        o NVE can be configured to get some VN's mapping entries via
           push mode and other VN's mapping entries via pull mode.


8. Security Considerations

    Accurate mapping of inner addresses to outer addresses is
    important to the correct delivery of information. The security
    of specific directory assisted mechanisms will be discussed in
    the document or documents specifying those mechanisms.

    For general TRILL security considerations, see [RFC6325].

9. IANA Considerations

   This document requires no IANA actions. RFC Editor: please delete
   this section before publication.

10. Acknowledgements

   Special thanks to Dino Farinacci for valuable suggestions and
   comments to this draft.

11. References

   11.1. Normative References

   As an Informational document, this draft has no Normative
   References.


   [nvo3-nve-nva-cp-req] draft-ietf-nvo3-nve-nva-cp-req-00, "Network
             Virtualization NVE to NVA Control Protocol
             Requirements", Kreeger, et al. July 31, 2013.













   11.2. Informative References

   [802.1Q] IEEE Std 802.1Q-2011, "IEEE Standard for Local and
             metropolitan area networks - Virtual Bridged Local Area
             Networks", May 2011.

   [802.1Qbg] IEEE Std 802.1Qbg-2012, "Media Access Control (MAC)
          Bridges and Virtual Bridged Local Area Networks -- Edge
          Virtual Bridging", July 2012.

   [RFC826] Plummer, D., "An Ethernet Address Resolution Protocol",
             RFC 826, November 1982.

   [RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
             "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
             September 2007.

   [RFC6325] Perlman, et al., "Routing Bridges (RBridges): Base
             Protocol Specification", RFC 6325,
             https://datatracker.ietf.org/doc/rfc6325/, July 2011.

   [RFC6439] Perlman, et al., "Routing Bridges (RBridges): Appointed
             Forwarders", RFC 6439,
             https://datatracker.ietf.org/doc/rfc6439/, November 2011.






 Authors' Addresses

   Linda Dunbar
   Huawei Technologies
   5430 Legacy Drive, Suite #175
   Plano, TX 75024, USA
   Phone: (469) 277 5840
   Email: linda.dunbar@huawei.com












   Donald Eastlake
   Huawei Technologies
   155 Beaver Street
   Milford, MA 01757 USA
   Phone: 1-508-333-2270
   Email: d3e3e3@gmail.com









































