Network Working Group                                Tissa Senevirathne 
 Internet Draft                                                (Force10) 
 Document: draft-tsenevir-l2vpn-pmesh-00.txt                            
 Category: Informational                                                 
                                                                         
                                                              June, 2001 
  
  
    Use of Partial meshed tunnels to achieve forwarding behavior of full 
                              meshed tunnels. 
   
 Status of this Memo 
  
     
    This document is an Internet-Draft and is in full conformance with 
       all provisions of Section 10 of RFC2026 [1].  
     
    Internet-Drafts are working documents of the Internet Engineering 
    Task Force (IETF), its areas, and its working groups. Note that 
    other groups may also distribute working documents as Internet-
    Drafts. Internet-Drafts are draft documents valid for a maximum of 
    six months and may be updated, replaced, or obsoleted by other 
    documents at any time. It is inappropriate to use Internet- Drafts 
    as reference material or to cite them other than as "work in 
    progress."  
    The list of current Internet-Drafts can be accessed at 
    http://www.ietf.org/ietf/1id-abstracts.txt  
    The list of Internet-Draft Shadow Directories can be accessed at 
    http://www.ietf.org/shadow.html. 
     
     
    For potential updates to the above required-text see: 
    http://www.ietf.org/ietf/1id-guidelines.txt 
     
 Placement of this Memo in Sub-IP Area 
     
    RELATED DOCUMENTS: 
     
      See reference. 
     
    WHERE DOES IT FIT IN THE PICTURE OF THE SUB-IP WORK 
     
    The ID presented fits in to the PPVPN WG and/or CCAMP WG.  
     
    WHY IS IT TARGETED AT THIS WG(s) 
     
    Sub-IP and IP tunnels are becoming a popular method in carrying 
    data transparently over the provider or the core network. 
     
    Use of such tunnels are key component of PPVPN infrastructure. On 
    the other hand CCAMP WG charter includes defining common control 
    and measurement plane. Hence optimal use of tunnels is an integral 
    part of the control infrastructure. 
     
  
  Senevirathne       Informational - December 2001                   1 
                   draft-tsenevir-l2vpn-pmesh-00.txt        June, 2001 
  
  
    JUSTIFICATION 
     
    Increasing number of service providers are offering Ethernet 
    services to the customer. In the core of the network IP or Sub-IP 
    technologies are used. In general, Ethernet services provided to 
    customers are VPN service having multi-points of service access (as 
    opposed to point-to-point). 
     
    Requirement to use fully meshed networks seriously affects the 
    scalability of Layer 2 NBVPN. The methods presented in this 
    document facilitate service providers to offer scalable Layer 2 VPN 
    solutions. 
     
 1. Abstract 
     
    This document presents methods to achieve proper forwarding of 
    Broadcast, Multicast and Unknown traffic over a set of partial mesh 
    tunnels. In addition, the methods presented in this document may be 
    used to achieve loop free topology.  
     
   
 Senevirathne        Informational - December 2001                   2 
                   draft-tsenevir-l2vpn-pmesh-00.txt        June, 2001 
  
  
     
 2. Conventions used in this document 
     
    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
    "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in 
    this document are to be interpreted as described in RFC-2119 [2]. 
     
    Table of Content 
     
    1. Abstract......................................................2 
    2. Conventions used in this document.............................3 
    3. Introduction..................................................3 
    4.0 Deployment scenario for Layer 2 NBVPN........................4 
    5.0 Objective and Methodology....................................5 
    6.0 Building Blocks..............................................6 
    7.0 Application of Minimum Spanning Tree Algorithm...............7 
    7.1 Calculation of Minimum Spanning Tree.........................7 
    7.2 Calculation of Source Mesh...................................7 
    7.3 Calculation of Intermediate Mesh.............................8 
    8.0 Application of Shortest Path First (SPF)algorithm:...........8 
    8.1 Calculation of Source Mesh...................................8 
    8.2 Calculation of Intermediate Mesh.............................9 
    9.0 Comparison between SPF and STP algorithms....................9 
    10.0 Interaction between various components.....................10 
    11.0 Other Applications:........................................10 
    12.0 Security Considerations....................................11 
    13.0 References.................................................11 
    14.0 Acknowledgments............................................11 
    15.0 Author's Addresses.........................................11 
    Full Copyright Statement........................................12 
     
     
 3. Introduction 
  
    Problem Definition 
     
    Layer 2 NBVPN services use tunneling methods such as IPSec or MPLS 
    to carry customer's Layer 2 traffic over the provider's network. 
    Layer 2 NBVPN deployment, in general, requires many-to-many 
    connectivity. Unknown unicast, Broadcast and Multicast traffic are 
    required to be forwarded to all end-points of the Layer 2 NBVPN. 
    When there are more than two end-points in the Layer 2 NBVPN, to 
    achieve the required Layer 2 behavior, a set of fully meshed 
    tunnels are required. However requirement of such fully meshed 
    tunnels seriously affects the scalability of Layer 2 NBVPN that 
    require many-to-many connectivity.  
     
     
    In this document we provide methods to achieve required Layer 2 
    behavior using set of partially meshed tunnels.  
     
   
 Senevirathne        Informational - December 2001                   3 
                   draft-tsenevir-l2vpn-pmesh-00.txt        June, 2001 
  
  
     In the first part of the document a typical deployment scenario is 
    presented. In this section we also explain the proposed solution 
    using an example. 
     
    In the second part required building blocks for the proposed 
    solution is discussed. 
     
    Later in the document use of Spanning Tree and Shortest Path First 
    Algorithms are discussed. Also discussed are the advantages, and 
    disadvantages of using Spanning Tree Algorithms vs. Shortest Path 
    First Algorithms for the purpose. 
     
    Scope of the Document 
     
    In this document word node and end-point is used to identify PE 
    devices. The discussion in the document does not focus on devices 
    in the core of the network. The "tunnel" in this discussion refer 
    to a logical connection between two PE devices.  
     
  
 4.0 Deployment scenario for Layer 2 NBVPN 
     
     
    Consider the scenario where there are (n)end points in the Layer 2 
    NBVPN. Now each PE device is required to maintain (n -1) tunnels to 
    all other end points. As a result the network is required to 
    maintain n*(n-1) set of tunnels. 
     
    Diagram below depicts partial mesh deployment of a 4 end-point 
    Layer 2 NBVPN deployment. Typically, some end-points may be 
    connected to all other end-points and some may only be connected to 
    a sub set of end-points. 
     
     
     
     --------                               ------- 
    |  (PE)  |                             |  (PE)  | 
    |   A    |---------                    |   D    | 
    |        |         \                   |        | 
     --------           \                   -------- 
         |               \                      | 
         |                \                     | 
         |                 \                    | 
         |                  \                   | 
         |                   \                  | 
     --------                 \            -------- 
    |  (PE)  |                  --------- |  (PE)   | 
    |   B    |----------------------------|   C     | 
    |        |                            |         | 
     --------                              --------- 
     
     
     
   
 Senevirathne        Informational - December 2001                   4 
                   draft-tsenevir-l2vpn-pmesh-00.txt        June, 2001 
  
  
    Fig: Partial Mesh connectivity 
     
    In theory, a partial mesh graph G may be represented by union of 
    set of fully mesh graphs g and set of partial mesh graphs g'. In 
    the above diagram graph G {A, B, C, D} contain fully mesh graph g 
    {A,B,C} and partial mesh graph g' {C,D}. 
     
    Discussions in this document will focus on the forwarding at node A 
    and C. Node A represents partial mesh node. Node C represents a 
    fully meshed node. However, the discussion is equally applicable to 
    all other nodes in other graphs with any arbitrarily number of 
    nodes. Only restriction is that graph is required to be connected 
    (ie no disconnected nodes). 
     
     
 5.0 Objective and Methodology 
     
    Objective 
     
    Each node must be able to derive the mesh topology of the Layer 2 
    NBVPN domain. The derived topology by each node must be identical. 
    As a result each node is capable of deriving nodes that have 
    already received traffic from the upstream. Forwarding policies 
    deduce set of outgoing tunnels for each incoming tunnel. Note: for 
    fully meshed nodes the outgoing tunnel set is NULL. 
     
    Methodology 
     
    Each end-point advertise its preference (by some means) to receive 
    traffic over a given tunnel T that is terminating/starting at the 
    end-point. A tunnel T is represented using end-point addresses 
    (id). The preference value assigned must be coordinated via global 
    policies. As an example, lower numerical value represents higher 
    preference.  
     
    We assume that there is control plane IP connectivity between all 
    end-points. Intermediate devices in the IP plane do not modify the 
    tunnel preference as they forward advertisements. 
     
    Each source node chooses set of optimal tunnels, using the 
    preference information received. If there is no direct tunnel the 
    best intermediate end-point is selected. The set of tunnels that a 
    source end-point use to forward unknown, multicast and broadcast 
    traffic is called Source Mesh and denoted by Ms{A} where A is the 
    end-point id. Ms{A} = [A-B,A-C]. 
     
    Each end-point also maintains set of tunnels for each incoming 
    tunnel. These tunnels are called intermediate mesh and represented 
    by receiving Tunnel and intermediate end-point address. The 
    intermediate Mesh is denoted by Mi{k,t}; where j is the 
    intermediate node and t is the tunnel that traffic is arriving. As 
    an example intermediate mesh for tunnel A-C at end-point C is Mi{C, 
    A-C}. Mi{C,A-C} = [D]. 
   
 Senevirathne        Informational - December 2001                   5 
                   draft-tsenevir-l2vpn-pmesh-00.txt        June, 2001 
  
  
     
    The above source and intermediate graphs(mesh) are derived using a 
    Minimum Spanning Tree or a Shortest Path First Algorithm. Based on 
    the methods presented in discussion the intermediate mesh derived 
    by an end-point is a subset of source mesh of the corresponding 
    source end-point. As an example Mi{C, A-C} is a subset of Ms{A}. 
    Thus guaranteeing loop free forwarding. 
      
    [3] has specified the requirement to maintain a separate Virtual 
    Forwarding Instance (VFI) by each PE device for each Layer 2 NBVPN 
    domain. [3] also specify the requirement for each VFI to contain a 
    flooding scope. Flooding scope of VFI represents tunnels and local 
    ports that any unknown, broadcast or multicast packets should be 
    forwarded. We propose to use multiple flooding scopes; flooding 
    scope for locally originating traffic is called source flooding and 
    denoted by Fs{i} where i is the end-point identifier. At end-point 
    A source flooding scope is denoted by Fs{A}. Flooding scope of non 
    locally originating traffic is called intermediate flooding scope 
    and denoted by Fi{j-t} where j is the intermediate endpoint id and 
    t is the receiving tunnel id. As an example at end-point C there 
    are three intermediate flooding scopes: Fi{C,A-C}, Fi{C, B-C}, 
    Fi{C, D-C} 
     
    As a result; in theory each end-point for each Layer 2 NBVPN domain 
    has a single source flooding scope and multiple intermediate 
    flooding scopes (each for each tunnel). 
  
    Although exact implementation details of multiple flooding scopes 
    are beyond the scope of this document we would like to present a 
    simple method to implement multiple flooding scopes. A CAM (Content 
    Addressable Memory) lookup with ingress tunnel (port) id and Layer 
    2 NBVPN domain Id may be used to obtain the appropriate flooding 
    scope. Similarly, if MPLS is used as the tunneling method; incoming 
    Label may be used to derive the corresponding flooding scope.  
     
     
 6.0 Building Blocks 
     
    The methods presented in this document can be broadly classified in 
    to four major blocks. These building blocks collectively specify 
    the implementation of Layer 2 NBVPN using partial meshed tunneling 
    topologies. 
     
    1. Global policy for reachability preference. It is important that 
    all end-points use the same set of polices. Uses of such policies 
    assure proper forwarding behavior. Reachability preference policies 
    are used to derive the source and intermediate mesh. 
     
     
 2. Advertisement protocol for advertisement of  reachability 
 preferences. The protocol used for advertisement of preferences MAY be 
 Link State. Use of such protocol guarantees faster convergence. OSPF 
 Opaque LSA can be easily adapted for the purpose. 
   
 Senevirathne        Informational - December 2001                   6 
                   draft-tsenevir-l2vpn-pmesh-00.txt        June, 2001 
  
  
      
    3. Tunneling methods. Tunneling protocols such as IPSec or MPLS can 
    be used to implement required tunnels. 
     
     
    4. Tree Algorithm. A Tree generation algorithm that is capable of 
    building loop free graphs that use minimum cost concepts MUST be 
    used. In this discussion we propose to use either a Minimum 
    Spanning Tree Algorithm or Shortest Path First algorithm. Prim 
    Algorithm [4] that is used in 802.1w [5] specification may be used 
    for Minimum Spanning Tree. Dijkstra algorithm that is used in OSPF 
    [6] may be used for Shortest Path First Algorithm.   
     
     
 7.0 Application of Minimum Spanning Tree Algorithm 
     
    When using ST there is a single tree for the entire Layer 2 NBVPN, 
    rooted at some node. In order to identify the root node for Layer 2 
    NBPVN, the participating nodes advertise the node priority. In 
    addition the nodes are also required advertise the reachability 
    preference for each tunnel that originate at the node. 
     
    edge representation = {node-id, nexthop-node-id} 
     
    node-id = IP address of the node. 
     
    node-preference = [integer] 
     
    Semantics of node-preference is a global policy. As an example 
    numerically lower numbers may represent higher preference. 
     
 7.1 Calculation of Minimum Spanning Tree 
     
    Step 1: Select the node with the highest preference as the root 
    node. In the event of a tie use the node-id as the tie barker. 
     
    Step 2: Derive the Spanning Tree for the Layer 2 NBVPN using a 
    minimum spanning tree algorithm. Here we propose to use Prim [4] 
    Algorithm for the purpose. 
     
    Step 3. Let the derived Spanning Tree is T. 
  
 7.2 Calculation of Source Mesh 
     
    Step 1. Select source node v. 
     
    Step 2. Remove all the edges in the graph T except the edges that 
    are directly connected to the node v. Let say this is graph T'. 
     
    Step 3. Graph T' is the source Mesh Ms for node v. 
     
     
   
 Senevirathne        Informational - December 2001                   7 
                   draft-tsenevir-l2vpn-pmesh-00.txt        June, 2001 
  
  
    Step 4. The Graph T' represent the set of active tunnels to forward 
    traffic (unknown, broadcast and multicast) originating from the 
    node v. 
     
     
 7.3 Calculation of Intermediate Mesh 
     
    Let v is the local node. 
     
    Let T' is the source Mesh derived for the Local node v. 
     
    Let g is set of all nodes that have a directed edge with the local 
    node v. 
     
    for each node u in g 
  
    Step 1: Remove corresponding edge in T'. 
     
    Step 2: The resultant graph T" is (intermediate mesh) Mi{v, u-v}. 
    Broadcast, unknown and multicast traffic arriving on tunnel {u-v} 
    MUST be forwarded on T". 
  
    Repeat the above process for each tunnel starting/terminating at 
    node v. 
     
     
 8.0 Application of Shortest Path First (SPF)algorithm: 
     
     
    We propose to use Dijkstra algorithm for the purpose of calculating 
    Source Mesh and Intermediate Mesh. Nodes (endpoints), edges 
    (tunnels) are represented as below. 
     
    edge representation = {node-id, nexthop-node-id} 
     
    node-id = IP address of the node. 
     
    Preference = [integer]; preference to receive traffic over a 
    tunnel. Deduction of preference is a global policy that all nodes 
    agrees. 
     
    Treat each Layer 2 NBVPN domain as a single graph. In analogy to 
    OSPF, that is single area. 
     
 8.1 Calculation of Source Mesh 
     
    Step 1. Calculate the SPF tree T for the source node v. 
     
    Step 2. Remove all the edges in the graph T except the edges that 
    are directly connected to the node v. Let say this is graph T'. 
     
    Step 3. Graph T' is the source Mesh Ms for node v. 
     
   
 Senevirathne        Informational - December 2001                   8 
                   draft-tsenevir-l2vpn-pmesh-00.txt        June, 2001 
  
  
     
    Step 4. The Graph T' represent the set of active tunnels to forward 
    traffic (unknown, broadcast and multicast) originating from the 
    node v. 
     
     
 8.2 Calculation of Intermediate Mesh 
     
    Let v is the local node. 
     
    Let g is set of all nodes that have a directed edge with the local 
    node v. 
     
    for each node u in g 
     
    Step 1: calculate the SPF tree T for node u such that node u is 
    member of g. 
     
    Step 2: Traverse from u to each node in T. Remove nodes that does 
    not require traversing via local node v. Let the resultant graph 
    T'. 
     
    Step 3: select local node v. Remove all the edges in T' that do not 
    have a direct edge with v. Remove edge {u-v}. Let T'' is the 
    resultant graph. 
     
    Step 4: The resultant graph T'' is the set of active tunnels for 
    traffic arriving on tunnel {u-v}. Broadcast, unknown and multicast 
    traffic arriving on tunnel {u-v} MUST be forwarded on T''. 
     
    Repeat the above process for each tunnel starting/terminating at 
    node v. 
     
     
 9.0 Comparison between SPF and STP algorithms 
     
    When using Spanning Tree algorithm, there is a single Spanning Tree 
    for the given Layer 2 NBVPN domain (graph). The Tree is rooted at 
    the root node that was selected based on some criteria. As a 
    result, path taken by some nodes to reach some other nodes may not 
    be optimal.  
     
    When using Shortest Path First Algorithms, there is a separate 
    Shortest Path Tree for each node. As a result path taken by traffic 
    originating at the node is always assured to take the best path. 
     
    However, SPF requires each node to derive SPF trees for each node. 
    On the other hand Spanning Tree algorithm requires deriving only a 
    single tree. All intermediate meshes and Source mesh can be derived 
    from the Spanning Tree (there is only one tree for the network/VPN 
    domain). Hence Spanning Tree requires less iteration of the 
    algorithm. 
     
   
 Senevirathne        Informational - December 2001                   9 
                   draft-tsenevir-l2vpn-pmesh-00.txt        June, 2001 
  
  
    Dijkstra is a very popular algorithm used to derive Shortest Path 
    Trees. Dijkstra has computational complexity of O(n^2) ; where n is 
    number of nodes. 
     
    Prim's algorithm is a popular algorithm used for Spanning Tree. 
    Prim's Algorithm has computational complexity of O(n^2) ; where n 
    is number of nodes. Kruskal's [7] algorithm is a variation of 
    Prim's. Kruskal's Algorithm has a computational complexity of 
    O(E*logE) where E is number of edges and E << N^2. 
     
 10.0 Interaction between various components 
     
     --------       ---------                   -------- 
    | NBVPN1 |     | NBVPN2  |                 |NBVPNn   | 
    | MDB    |     | MDB     |                 |MDB      | 
     --------       ---------                   -------- 
       |               |                             | 
       |               |                             | 
       |     ----------------                        | 
       |    | OSPF Opaque/   |                       | 
        - - | BGP-MP mux     |----------------------- 
             ---------------- 
                     | 
                     | 
                     | 
      -------------------------------------------- 
     |                                            | 
     |       Core Protocol Engine (OSPF/BGP)      | 
     |                                            | 
      -------------------------------------------- 
     
     
    NBVPN(i)MDB - Represent reachability information received from 
    other end-points. SPF or STP calculation for source Mesh and 
    intermediate Mesh for the NBVPN domain is performed in this 
    context. 
     
    OSPF Opaque/BGP-MP Mux - This module performs multiplexing of 
    tunnel reachability information received to the correct NBVPN 
    instance. 
     
    Core Protocol Engine - This module represent the protocol 
    implementation. 
     
 11.0 Other Applications: 
     
    The methods presented in this document may be easily applicable to 
    any other applications that require optimum path selection via 
    transit node. Optical Lambda switching is such application that may 
    use the methods presented in this document. 
     
    Internet Exchange Points (IEP) are a new evolving concept. IEP use 
    a shared network fabric to provide multi-party peering for customer 
   
 Senevirathne        Informational - December 2001                  10 
                   draft-tsenevir-l2vpn-pmesh-00.txt        June, 2001 
  
  
    sites. Methods presented in this discussion can be easily adapted 
    to provide multi-party peering using partially meshed networks. 
     
 12.0 Security Considerations 
  
    A security analysis of the methods presented in this discussion has 
    not yet been performed. 
     
 13.0 References 
     
  
    1  Bradner, S., "The Internet Standards Process -- Revision 3", BCP 
       9, RFC 2026, October 1996. 
     
    2  Bradner, S., "Key words for use in RFCs to Indicate Requirement 
       Levels", BCP 14, RFC 2119, March 1997 
     
    3  Senevirathne, T., and et.al, "Requirements for Layer 2 Network 
       Based VPN", Work in Progress, May 2001. 
     
    4  Gross, J., and Yellen, J, "Graph Theory and its Applications", 
       CRC Press, 1998. 
     
    5  IEEE/ISO , "Amendment 2-Rapid Reconfiguration", IEEE802.1w, 
       March 26, 2001. 
     
    6  Moy, J., OSPF Version 2, RFC 1583, March 1994. 
     
    7  Aho, A.V., and et.al., "Data Structures and Algorithms", 
       Addison-Wesley 1983. 
     
    
    
    
    
14.0 Acknowledgments 
    
   Several people provided suggestions and comments and volunteered to 
   review this draft. The suggestions and feedback received helped this 
   document to evolve to a draft. Waldemar Augustyn and Andrew Smith 
   provided valuable suggestions and comments. 
    
    
15.0 Author's Addresses 
    
   Tissa Senevirathne 
   Force10 Networks 
   1440 McCarthy Blvd 
   Milipitas, CA 95035
   Phone: 408-965-5103 
   Email: tissa@force10networks.com 
    
   
 Senevirathne        Informational - December 2001                  11 
                  draft-tsenevir-l2vpn-pmesh-00.txt        June, 2001 
 
 
    
Full Copyright Statement 
 
   "Copyright (C) The Internet Society (2001). All Rights Reserved. 
   This document and translations of it may be copied and furnished to 
   others, and derivative works that comment on or otherwise explain it 
   or assist in its implmentation may be prepared, copied, published 
   and distributed, in whole or in part, without restriction of any 
   kind, provided that the above copyright notice and this paragraph 
   are included on all such copies and derivative works. However, this 
   document itself may not be modified in any way, such as by removing 
   the copyright notice or references to the Internet Society or other 
   Internet organizations, except as needed for the purpose of 
   developing Internet standards in which case the procedures for 
   copyrights defined in the Internet Standards process must be 
   followed, or as required to translate it into 
 
  
Senevirathne        Informational - December 2001                  12