TRILL Working Group                                          R. Perlman 
Internet Draft                                                      Sun 
Expires: June 2007                                               S. Gai 
                                                          Nuova Systems 
                                                                S. Sane 
                                                                  Cisco 
                                                               J. Touch 
                                                                USC/ISI 
                                                      December 13, 2006 
                                    
 
                   Rbridges: Base Protocol Specification 
                 draft-ietf-trill-rbridge-protocol-01.txt 


Status of this Memo 

   By submitting this Internet-Draft, each author represents that       
   any applicable patent or other IPR claims of which he or she is       
   aware have been or will be disclosed, and any of which he or she       
   becomes aware will be disclosed, in accordance with Section 6 of       
   BCP 79. 

   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that 
   other groups may also distribute working documents as Internet-
   Drafts. 

   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time.  It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 

   The list of current Internet-Drafts can be accessed at 
        http://www.ietf.org/ietf/1id-abstracts.txt 

   The list of Internet-Draft Shadow Directories can be accessed at 
        http://www.ietf.org/shadow.html 

   This Internet-Draft will expire on June 13, 2007. 

Abstract 

   RBridges provide the ability to have an entire campus, with multiple 
   physical links, look to IP like a single subnet. The design allows 
   for zero configuration of switches within a campus, optimal pair-wise 
   routing, safe forwarding even during periods of temporary loops, and 
   the ability to cut down on ARP/ND traffic. The design also supports 
   VLANs, and allows forwarding tables to be based on RBridge 
 
 
Perlman                 Expires June 13, 2007                  [Page 1] 

Internet-Draft             RBridge Protocol               December 2006 
    

   destinations (rather than endnode destinations), which allows 
   internal routing tables to be substantially smaller than in 
   conventional bridge systems.  

Table of Contents 

    
   1. Introduction...................................................2 
   2. Detailed RBridge Design........................................7
      2.1. Link State Protocol.......................................7
         2.1.1. Separate Instances...................................7
         2.1.2. Multiple RBridge IS-IS Instances.....................7
      2.2. Distribution Tree Calculation.............................9
      2.3. Pruning the Ingress RBridge Tree.........................10
      2.4. Designated RBridge.......................................11
      2.5. Wiring Closet Topology...................................13 
      2.6. Learning Endnode Location................................14 
      2.7. Forwarding Behavior......................................14 
         2.7.1. Receipt of a Native Packet..........................14 
         2.7.2. Receipt of an In-transit Packet.....................14 
            2.7.2.1. Flooded Packet.................................15 
            2.7.2.2. Unicast Packet.................................15 
      2.8. IGMP Learning............................................16 
      2.9. RBridge Nicknames........................................16 
      2.10. Forwarding Header on 802 Links..........................17 
      2.11. Handling ARP/ND Queries.................................18 
      2.12. Discovering IP Multicast Routers........................20 
      2.13. Assuring Freshness of Endnode Information...............20 
   3. RBridge Addresses, Parameters, and Constants..................20
   4. Security Considerations.......................................21 
   5. IANA Considerations...........................................21 
   6. Conclusions...................................................21 
   7. Acknowledgments...............................................21 
   8. References....................................................22 
      8.1. Normative References.....................................22 
      8.2. Informative References...................................22 
   Author's Addresses...............................................22 
   Intellectual Property Statement..................................23 
   Disclaimer of Validity...........................................24 
   Copyright Statement..............................................24 
   Acknowledgment...................................................24 
    
1. Introduction 

   In traditional IPv4 and IPv6 networks, each link must have a unique 
   prefix.  This means that a node that moves from one link to another 
   must change its IP address, and a node with multiple links must have
   multiple addresses.  It also means that a company with many links 
   (separated by routers) will have difficulty making full use of its IP 
   address block (since any link not fully populated will waste 
   addresses), and IP routers require significant configuration. Bridges 
   avoid these problems because bridges can transparently glue many 
   physical links into what appears to IP to be a single LAN. 

   However, bridge routing via the spanning tree using the layer 2 
   header has some disadvantages: 

   o  The spanning tree limits which links can be used, and therefore 
      concentrates traffic onto selected links 

   o  Forwarding based on a header without a TTL is dangerous, because
      temporary loops might arise (due to topology changes, lost
      spanning tree messages, or components such as repeaters coming
      up)

   o  Routes cannot be pair-wise shortest paths, but instead whatever 
      path remains after the spanning tree eliminates redundant paths 

   We define the term "campus" to be the set of links connected by any 
   combination of RBridges and bridges. A campus appears to IP nodes to 
   be a single subnet. 

   This document presents the design for RBridges (routing bridges),
   which combine the advantages of bridges and routers. Like bridges,
   RBridges are zero configuration, and are transparent to IP nodes.
   Like routers, RBridges forward on pair-wise shortest paths, and do
   not have dangerous behavior during temporary loops. RBridges have the
   additional advantage that they can optimize ARP (IPv4) and ND (IPv6)
   by avoiding the broadcast/multicast behavior of the queries.

   RBridges are fully compatible with current bridges as well as current 
   IPv4 and IPv6 routers and endnodes.  They are as invisible to current 
   IP routers as bridges are, and like routers, they terminate a bridged 
   spanning tree. 

   The main idea is to have RBridges run a link state protocol amongst 
   themselves. This enables them to have enough information to compute 
   pairwise optimal paths for unicast, and to calculate distribution 
   trees for delivery of packets to unknown destinations, or 
   multicast/broadcast packets. 

   RBridges must learn the location of endnodes. They learn the location 
   and layer 2 addresses of attached nodes from the source address of 
   data frames, as bridges do. Additionally, in order to facilitate proxy
   ARP or proxy ND optimizations, RBridges also learn the (layer 3, 
   layer 2) addresses of attached IP nodes from ARP or ND replies. 

   Once an RBridge learns the location of a directly attached endnode, 
   it informs the other RBridges in its link state information. 

   RBridge forwarding can be done, as with a router, via pairwise 
   shortest paths. 

   To mitigate the temporary loop issues with bridges, RBridges must 
   always forward based on a header with a hop count. Although the hop 
   count will quickly discard looping frames, it is also desirable not 
   to spawn additional copies of frames. This can be accomplished by 
   having RBridges specify the next RBridge recipient while forwarding 
   across a shared-media link. 

   Frames must be encapsulated as they travel between RBridges for 
   several reasons: 

   1. to prevent source MAC learning from frames in transit

   2. so that the frames can be directed towards the egress RBridge. 
      This enables forwarding tables of RBridges to be sized with the 
      number of RBridges rather than the total number of nodes in the 
      common broadcast domain 

   3. so that frames in transit can include a hop count (for links, like 
      Ethernet, that do not already contain a hop count) 

   In order to coexist with Ethernet bridges on Ethernet links, frames 
   in transit on Ethernet links must be encapsulated with an Ethernet 
   header. The outer header of an RBridge-forwarded frame must look, to 
   an Ethernet bridge on the path between two RBridges, like the header 
   of a normal frame that the bridge will forward. To enable RBridges to 
   distinguish encapsulated frames, a new Ethertype (to be assigned) 
   will be used in the outer header. 

   Inside that header is a shim header that RBridges will add to the 
   frame that will contain: 

   o  the ingress-RBridge (in the case of a broadcast/multicast/unknown 
      destination frame), or egress-RBridge (in the case of a unicast 
      frame to a known destination) 

   o  a hop count 


   Inside the shim header is the original frame, as injected into the 
   campus. 
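
   As an illustration only, the layering described above (outer Ethernet
   header, shim header, original frame) can be sketched in Python. The
   class and function names, the placeholder Ethertype value, and the
   rule of discarding a frame whose hop count would drop to zero are
   assumptions made for this sketch; the Ethertype is still to be
   assigned.

```python
from dataclasses import dataclass

# Placeholder only: the real Ethertype is "to be assigned".
RBRIDGE_ETHERTYPE = 0xFFFF  # hypothetical value


@dataclass(frozen=True)
class ShimHeader:
    rbridge: str    # ingress RBridge (flooded) or egress RBridge (known unicast)
    hop_count: int


@dataclass(frozen=True)
class EncapsulatedFrame:
    outer_dst: str      # next RBridge recipient, or a multicast address
    outer_src: str      # the transmitting RBridge
    ethertype: int
    shim: ShimHeader
    inner_frame: bytes  # original frame, as injected into the campus


def encapsulate(inner_frame, outer_src, outer_dst, rbridge, hop_count):
    """Wrap a native frame in a shim header and an outer Ethernet header."""
    shim = ShimHeader(rbridge=rbridge, hop_count=hop_count)
    return EncapsulatedFrame(outer_dst, outer_src, RBRIDGE_ETHERTYPE,
                             shim, inner_frame)


def next_hop(frame, outer_src, outer_dst):
    """Rewrite the outer header for the next hop and decrement the hop count.

    Returns None when the frame should be discarded (assumed rule: the
    hop count may not reach zero).
    """
    if frame.shim.hop_count <= 1:
        return None
    shim = ShimHeader(frame.shim.rbridge, frame.shim.hop_count - 1)
    return EncapsulatedFrame(outer_dst, outer_src, frame.ethertype,
                             shim, frame.inner_frame)
```

   Note that only the outer header and the hop count change from hop to
   hop; the inner frame is carried unmodified.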

   RBridges must also support VLANs. 

   A VLAN is a mechanism used within layer 2 to partition endnodes into
   different communities. The usual method of determining which
   community a frame belongs to is based on the port on which it is
   received. The first bridge inserts a VLAN tag, based on its port
   configuration, and the last bridge removes the VLAN tag. However,
   sometimes the VLAN tag might be inserted by an endnode on the link
   (where "endnode" is a source or sink of traffic on the bridged LAN).

   RBridges will be configured with VLAN membership per port, just as
   bridges are. They will also ensure that a frame originating on a
   particular VLAN is delivered only to other links in the same VLAN.

   A side-effect of VLANs is that they make RBridges more scalable, since
   endnode membership in a VLAN is only of interest to RBridges that 
   have an attached port configured to be in that VLAN. This means that 
   endnode membership in VLAN A only needs to be announced to RBridges 
   attached to a link in VLAN A. 

   There are several types of frames which RBridges must deliver, and 
   which are handled slightly differently: 

   1. frames for known unicast destinations 

   2. frames for unknown unicast destinations 

   3. frames for layer 2 multicast addresses derived from IP multicast 
      addresses 

   4. frames for layer 2 broadcast/multicast addresses which are not
      derived from IP multicast addresses

   5. ARP/ND queries 

   6. IGMP membership reports 

   If a frame belongs in a particular VLAN, the frame must be delivered 
   only to links in that VLAN. This is true for both broadcast/multicast 
   frames, and unicast frames. 

   RBridges will calculate a distribution tree for each potential root
   RBridge, which we will refer to as the "ingress RBridge tree". In
   theory, RBridges could have calculated a single spanning tree for the
   entire campus. However, it was decided that the additional 
   computation necessary to compute ingress RBridge trees was warranted 
   because: 

   1. it optimizes the distribution path and (almost always) the cost of
      delivery when the destination links are a subset of all the links
      in the campus. Delivery is only to a subset of links in the case
      of VLANs and IP multicasts

   2. for unknown destinations, out-of-order delivery is minimized 
      because in the case where a flow starts before the location of the 
      destination is known by the RBridges, the path to the destination 
      through the per-ingress-RBridge tree will be the same as the path 
      directly to the destination 

   RBridges will not use the bridge spanning tree algorithm to calculate 
   trees. Instead, the trees are calculated based on the link state 
   information, selecting a particular RBridge as the root, and with a 
   deterministic tie-breaker so all RBridges calculate the same 
   distribution tree based on the same root and same link state 
   database. Therefore the tree calculation is done without requiring 
   any additional exchange of information between RBridges. 

   Other than the two arguments above (optimal cost to deliver traffic 
   from source to a set of destinations, and minimizing out of order 
   delivery), a single tree could suffice for all multicast traffic. 

   Another option is to calculate a separate tree for each ingress
   RBridge, and distribute multicast along the tree with the ingress
   RBridge as root (where VLAN-tagged traffic and IP multicast traffic
   can be pruned, but otherwise all multicast traffic with the same
   ingress travels on the same links). Two reasons this solution might
   not be preferable:

   1. In some cases, a different tradeoff might be wanted in terms of 
      expense of computation vs. optimality of traffic distribution (so 
      fewer trees would be desired) 

   2. It might be desirable to allow choosing a different distribution 
      tree than the one rooted at the ingress RBridge, in order to allow 
      multipathing of multicast traffic injected by a particular 
      RBridge. 

   For this reason, we allow an RBridge R1 to announce (via a flag in
   its link state announcement) whether RBridges should compute a tree
   rooted at R1. The default is yes. If R1 is a tree root, then any
   RBridge R2 can choose the R1-tree for distribution of multicast
   traffic that R2 is injecting into the campus. In the shim header,
   RBridges can specify which (bidirectional) tree the multicast packet
   should travel along.

2. Detailed RBridge Design

2.1. Link State Protocol 

   Running a link state protocol among RBridges is straightforward.  It
   is the same as running a level 1 routing protocol in an area, with
   endnode addresses being layer 2 addresses rather than, say, IP
   addresses.  IS-IS is a natural choice for a link state protocol
   because it is easy in IS-IS to define new TLVs for carrying new
   information, and because IS-IS can be done with zero configuration.
   All that is required to run IS-IS is for each RBridge to have a
   unique 6-byte system ID, which can be any of the RBridge's MAC
   addresses.

2.1.1. Separate Instances 

   The instance of IS-IS that RBridges will implement is separate from 
   any routing protocol that IP routers will implement, just as the 
   spanning tree messages are not implemented by IP routers. 

   To prevent potential confusion between an IS-IS instance being run by 
   IP routers and the IS-IS being run by RBridges, RBridge IS-IS 
   messages will be sent to a different layer 2 multicast address than 
   layer 3 IS-IS routing messages.  The RBridge IS-IS instance is also 
   differentiated by having a distinct, constant "area address" (the
   value 0) that would never appear as a real IS-IS area address. 

   RBridge IS-IS messages will be sent with the same Ethertype (in the 
   outer header) as RBridge-encapsulated data packets. RBridge IS-IS 
   messages will be differentiated from RBridge-encapsulated data 
   packets because RBridges will use a different multicast address (in 
   the outer header) for IS-IS messages than for encapsulated multicast 
   data messages. Unicast RBridge-encapsulated packets are sent to a 
   specific neighbor, so would not have a group address in the outer 
   header.  
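
   The demultiplexing rule above can be sketched as follows. This is a
   hypothetical illustration: the Ethertype and the IS-IS multicast
   address are not yet assigned, so both constants below are invented
   placeholders (the address is deliberately a locally administered
   group address), and the function name is an assumption.

```python
# Invented placeholders; neither value is assigned by this document.
RBRIDGE_ETHERTYPE = 0xFFFF
RBRIDGE_ISIS_MCAST = bytes.fromhex("030000000001")


def classify(outer_dst: bytes, ethertype: int) -> str:
    """Classify a received frame using only its outer Ethernet header."""
    if ethertype != RBRIDGE_ETHERTYPE:
        return "native"
    if outer_dst == RBRIDGE_ISIS_MCAST:
        return "rbridge-isis"
    # The group bit is the low-order bit of the first address octet.
    if outer_dst[0] & 0x01:
        return "encapsulated-multicast"
    # Sent to a specific neighbor, so no group address in the outer header.
    return "encapsulated-unicast"
```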

2.1.2. Multiple RBridge IS-IS Instances

   Two types of information are carried in RBridge link state
   announcements: "core-RBridge information" and "endnode information".
   In theory this information could all be contained in one instance of
   RBridge IS-IS. However, since endnode information for a particular
   VLAN only needs to be known to RBridges that are connected to links
   configured to be in that VLAN, each RBridge R1 will run a "core"
   instance of IS-IS for the core-RBridge information, and an instance
   per VLAN that R1 is attached to, for the endnode information for
   those VLANs.

   The core-RBridge information, which is carried in the core-RBridge 
   instance, is: 

   1. the system IDs of RBridges which are neighbors of RBridge R1, and 
      the cost of the link to each of those neighbors 

   2. VLAN numbers of VLANs directly connected to R1 

   3. Flag indicating whether RBridges should calculate a tree rooted at 
      R1 (default = yes) 

   Even if RBridge R2 is not connected to VLAN A, it is relevant to R2 
   that R1 is connected to VLAN A, even though R2 does not need to know 
   which endnodes are in VLAN A. The reason for this is to allow R2 to 
   filter multicast/unknown destination packets that are VLAN-tagged. If 
   R2 is forwarding a multicast packet tagged with VLAN A, R2 need not 
   forward it onto branches of the distribution tree that have no 
   downstream VLAN A links. 

   The endnode information for VLAN A, which is carried in the VLAN A 
   IS-IS instance injected by R1, contains: 

   1. L2INFO: layer 2 addresses of nodes on a VLAN A link attached to R1 
      which have transmitted frames but have not transmitted ARP or ND 
      replies (i.e., these are not known to be IP nodes) 

   2. L3and2INFO: layer 3, layer 2 addresses of IP nodes attached to R1, 
      which R1 has learned through ARP/ND replies emitted by endnodes on 
      an attached VLAN A link.  For data compression, only the portion 
      of the address following the campus-wide prefix need be carried.  
      (This is a more important optimization for IPv6 than for IPv4) 

   3. Multicast Router attached: This is one bit of information that 
      indicates whether there is an IP multicast router attached. This 
      information is used because IGMP Membership Reports must be 
      transmitted to all links with IGMP routers, and not to links 
      without IGMP routers. Also, all packets for IP-derived multicast 
      addresses must be transmitted to all links with IGMP routers 
      (within the VLAN), in addition to links from which an IP node has 
      explicitly asked to join the group which the packet is for. 


   4. Layer 2 addresses derived from IPv4 or IPv6 IGMP notification 
      messages received from attached endnodes, indicating the location 
      of listeners for these multicast addresses. ***Note: Should this 
      be layer 3 group addresses? If it's layer 2, then multiple IP 
      multicast groups will map to the same layer 2 multicast address*** 

   If R1 has learned endnode E's location first from a data packet (and
   therefore has included E's layer 2 address in the L2INFO), and later
   E transmits an ARP/ND reply, R1 MUST include E in the L3and2INFO, and
   MAY remove E from L2INFO.
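
   A minimal sketch of this bookkeeping for one VLAN, assuming the
   endnode information is held in a dictionary with a set for L2INFO and
   an address map for L3and2INFO. The function names and the prune_l2
   switch (which models the optional removal) are assumptions for
   illustration.

```python
def new_endnode_info():
    """Per-VLAN endnode information advertised by one RBridge."""
    return {"L2INFO": set(), "L3and2INFO": {}}


def learn_from_data_frame(info, mac):
    """Record a source address seen in a data frame.

    L2INFO lists only nodes not known to be IP nodes, so an address
    already present in L3and2INFO is not added again.
    """
    if mac not in info["L3and2INFO"].values():
        info["L2INFO"].add(mac)


def learn_from_arp_nd_reply(info, ip, mac, prune_l2=True):
    """Record a (layer 3, layer 2) pair learned from an ARP or ND reply."""
    info["L3and2INFO"][ip] = mac      # MUST be included
    if prune_l2:
        info["L2INFO"].discard(mac)   # MAY be removed
```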

   Given that RBridges must already support delivery only to links 
   within a VLAN (for multicast or unknown frames marked with the VLAN's 
   tag), the same mechanism is used by the per-VLAN instance of IS-IS to 
   distribute endnode information solely to RBridges within a VLAN. 

   The per-VLAN instance of IS-IS will appear to the RBridges to consist 
   of a single link. R1 will originate a VLAN-A-specific IS-IS frame. 
   All RBridges will recognize the frame as a VLAN A multicast frame 
   (even if they are not connected to VLAN A), and prune the specified 
   distribution tree so as to only deliver the frame along branches with 
   VLAN A links. This is the same behavior core RBridges would have for 
   any VLAN A multicast/broadcast/unknown destination frame. RBridges 
   that are connected to VLAN A links will, in addition to forwarding 
   along the specified distribution tree, process the frame in their 
   VLAN-A IS-IS instance. 

   Thus suppose that RBridges R1, R2, and R3 are all on VLAN A, on links 
   scattered throughout the campus. The VLAN A IS-IS instance will 
   appear to be a single link (broadcast domain) with R1, R2, and R3 as 
   neighbors. The only information carried in the instance is the 
   endnode information for VLAN A. The other RBridges on the campus 
   facilitate delivery within the VLAN A broadcast domain, and therefore 
   may be on the path between R1 and R2, but will treat the VLAN A 
   instance link state frames as ordinary datagrams. 

   The way that RBridges distinguish which IS-IS instance the link state 
   information is for is based on the VLAN tag in the inner header. 

2.2. Distribution Tree Calculation 

   Some frames (e.g., to unknown destinations, or multicast
   destinations) will need to be delivered to multiple links. RBridges
   must calculate at least one tree, and the default is to calculate a
   tree for every RBridge. However, to avoid requiring the RBridges in
   a campus to calculate that many trees, each RBridge MAY be
   configured to indicate that it should not be the root of a
   distribution tree.

   The RBridge with lowest ID MUST have the flag set to "yes" (I should 
   be the root of a tree). 

   In IS-IS a shared link is modeled as a pseudonode, with a 7-byte ID 
   consisting of a 6-byte ID owned by the Designated Router (DR), plus a 
   nonzero byte assigned by the DR. The "I want to be a Root" flag is 
   defaulted to "no" for pseudonodes. 

   Calculation of a tree rooted at R1 is done by performing the SPF 
   calculation with R1 as the root, and with a deterministic tie-
   breaker, so that all RBridges calculate the same distribution tree. 
   The tie-breaker is that if a node N can be attached to either parent 
   P1 or P2 with the same minimal path cost from R1 to N, then choose P1 
   if P1's ID is lower than P2's.
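
   The calculation can be sketched as a Dijkstra-style SPF in which an
   equal-cost alternative replaces the current parent only when its ID
   is lower. This is an illustrative sketch, not a normative algorithm:
   it assumes strictly positive link costs, a symmetric link map, and
   IDs that compare as plain strings (real system IDs are 6-byte
   values). The function name is hypothetical.

```python
import heapq


def distribution_tree(links, root):
    """Return a {node: parent} map for the tree rooted at `root`.

    `links` maps each node ID to a {neighbor ID: cost} dictionary.
    """
    dist = {root: 0}
    parent = {root: None}
    done = set()
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for nbr, cost in links[node].items():
            if nbr in done:
                continue
            nd = d + cost
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                parent[nbr] = node
                heapq.heappush(heap, (nd, nbr))
            elif nd == dist[nbr] and node < parent[nbr]:
                # Same minimal cost via two parents: keep the lower ID.
                parent[nbr] = node
    return parent
```

   Because every RBridge runs the same calculation over the same link
   state database, each arrives at the same parent map without any
   further exchange of messages.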

   The calculated tree is a bidirectional tree. Each RBridge R keeps a 
   set of adjacencies (port, neighbor pair) selected for each 
   distribution tree. So for instance, for the distribution tree rooted 
   at R1, R chooses the adjacency which connects R to its parent in that 
   SPF tree, as well as any adjacencies that connect children to R. Once 
   the adjacencies are chosen, it is irrelevant which ones are towards 
   the root R1, and which are away from R1. So R might have calculated 
   that adjacencies a, c, and f are in the tree. That means that if 
   there is a multicast packet that indicates it should be transmitted 
   on distribution tree R1, and it is received on any adjacency other 
   than a, c, or f, R should discard the packet. If it is received on 
   any of the selected adjacencies (a, c, or f), then R should forward 
   onto the other two adjacencies.  
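
   The forwarding rule in this paragraph reduces to a set operation. In
   the sketch below, an adjacency is identified by the neighbor's ID
   alone (the port half of the pair is omitted for brevity), and the
   tree is given as a {node: parent} map; both representations and the
   function names are assumptions made for illustration.

```python
def tree_neighbors(parent, r):
    """Adjacencies of RBridge `r` in a tree given as a {node: parent} map."""
    nbrs = {child for child, p in parent.items() if p == r}
    if parent.get(r) is not None:
        nbrs.add(parent[r])
    return nbrs


def forward_on_tree(tree_adjacencies, arrival):
    """Return the adjacencies to forward onto; an empty set means discard."""
    if arrival not in tree_adjacencies:
        return set()
    # Direction relative to the root is irrelevant once the set is chosen.
    return tree_adjacencies - {arrival}
```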

2.3. Pruning the Ingress RBridge Tree

   Packets which must be flooded (e.g., multicasts, unknown 
   destinations), are flooded along the selected distribution tree 
   rooted at the RBridge specified in the shim header, and pruned based 
   on whether there are potential receivers downstream of each of the 
   branches. In the case of a VLAN-tagged packet, it is forwarded only 
   on branches that have RBridges participating in that VLAN reachable 
   via that branch. 

   Further pruning is done in the case of IGMP Notification Messages, 
   where these are to be delivered only to ports with IP Multicast 
   Routers. In the case of a multicast derived from an IP multicast, 
   these multicast data packets are delivered only to links that have 
   registered listeners, plus links which have IP Multicast routers. 
 
 
   The actual tree to forward along is chosen based on the specified 
   RBridge in the shim header, say R1. Say that RBridge R knows that 
   adjacencies (a, c, and f) are in the R1-distribution tree. 

   R marks pruning information for each of the adjacencies for the R1-
   tree. For each adjacency for each tree, R marks: 

   o  Flag for whether there are downstream IP routers 

   o  Set of VLANs reachable downstream 

   o  Set of layer 2 multicast addresses derived from IP multicast 
      groups for which there are receivers that have joined the group 

   Pruning is first done by VLAN tag. 

   Further pruning is done if: 

   o  The inner packet is an IGMP Notification message, in which case
      the frame is sent only on links with downstream IP Multicast
      routers (in the VLAN indicated in the frame's inner header)

   o  The inner packet is an IP multicast data packet, in which case the
      frame is sent only on links that either have downstream IP
      multicast listeners (in the indicated VLAN) or downstream IP
      multicast routers (in the indicated VLAN).
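
   The pruning decision for one adjacency of one tree can be sketched as
   below. The sketch assumes the pruning information is kept per VLAN
   (so that "in the indicated VLAN" can be checked directly); the class
   name, the function name, and the three frame kinds are hypothetical
   labels introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class VlanPruneInfo:
    """What is reachable downstream of one adjacency, within one VLAN."""
    mcast_router_downstream: bool = False
    groups: set = field(default_factory=set)  # layer 2 group addresses joined


def should_forward(adjacency_info, vlan, kind, group=None):
    """Decide whether to send a flooded frame on this adjacency.

    `adjacency_info` maps each VLAN reachable downstream to its
    VlanPruneInfo. `kind` is "igmp", "ip-multicast", or "other".
    """
    info = adjacency_info.get(vlan)
    if info is None:
        return False                      # pruned by VLAN tag first
    if kind == "igmp":
        return info.mcast_router_downstream
    if kind == "ip-multicast":
        return info.mcast_router_downstream or group in info.groups
    return True
```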

   For each link for which R is Designated RBridge, R additionally 
   checks to see if it should decapsulate the frame and send it to the 
   link (e.g., if it is a distributed ARP in the right VLAN for that 
   link), or process the packet (e.g., if it is a per-VLAN IS-IS 
   instance link state announcement for a VLAN that R is attached to). 

2.4. Designated RBridge

   One RBridge on each link needs to be elected to have special duties. 
   This elected RBridge is known as the Designated RBridge. IS-IS 
   already holds such an election. 

   The Designated RBridge is the one on the link that will learn and 
   advertise the identities of attached endnodes, encapsulate and 
   forward frames that originate on that link to the rest of the campus, 
   decapsulate and forward frames onto that link received from other 
   RBridges, initiate a distributed ARP when an ARP query is received 
   for an unknown destination, and answer ARP queries when the target 
   node is known. 

 
   It is dangerous to have multiple RBridges being Designated RBridge. 
   This could temporarily happen if a partitioned bridged LAN were 
   connected with a bridge or repeater. The situation will resolve once 
   the better priority RBridge's IS-IS Hello is received by the other 
   RBridges on the link. However, it is possible that some intervening 
   bridges might be discarding the IS-IS Hello messages due to being in 
   preforwarding state. 

   The one message type that is not delayed due to preforwarding state
   is the spanning tree BPDU. If RBridges listen to BPDUs, and the LAN
   for which R1 was DR and the LAN for which R2 was DR get joined, then
   one or the other of R1 or R2 (let's say R2) will notice that the
   bridge Root has changed identity.

   The conservative thing to do would be to invoke something like a 
   preforwarding state, in which R2 stops forwarding anything to or from 
   the link until it is sure the IS-IS link election would have 
   completed. But the IS-IS election could get slowed down due to 
   bridges in preforwarding state, and it would be undesirable to 
   disrupt traffic to and from the link just because the root ID has 
   changed. 

   The solution is to have RBridges participate in the spanning tree 
   election, with higher priority for becoming root (actually, lowest 
   numerical priority value) than any of the bridges, and with the same 
   priority as for becoming Designated RBridge on the link. Then an 
   RBridge is Designated RBridge if and only if it is the spanning tree 
   Root. 

   Note that RBridges MUST NOT merge spanning trees from different 
   ports. If two ports of R1 are connected to the same bridged LAN, then 
   the regular bridge spanning tree algorithm will partition the LAN 
   into distinct LANs for each of R1's ports. However, if two of R1's 
   ports are connected to the same shared medium (without any bridges 
   between the ports), then the regular bridge spanning tree algorithm 
   will turn off one of R1's ports. 

   So for example, R1 will initiate BPDUs on each of its ports, with 
   itself as Root (with highest, i.e., numerically lowest priority), 0 
   cost from Root, and the port ID. There are several possible cases: 

   o  R1 is the highest priority RBridge on the bridged LAN, in which 
      case it will become spanning tree Root and Designated RBridge 


   o  R1 receives a BPDU from itself (because two of its ports are on 
      the same shared medium without any bridges between). In this case, 
      the numerically lowest port will stay on, and the other port(s) 
      will go into spanning tree backup state. 

   o  R1 receives a BPDU from someone else with higher priority 
      (numerically lower priority|ID), in which case R1 is not Root, and 
      not Designated RBridge. It is possible this is due to a bridge 
      being configured with the lowest priority, and then if R1 declines 
      being DR, the LAN becomes orphaned from the campus. We could treat 
      this case as a misconfiguration of bridges, or we could attempt to 
      solve it by having R1 eventually discover it is not receiving any 
      IS-IS Hellos, and become DR even though it is not spanning tree 
      Root. ******question here-do we care about this case?******** 

2.5. Wiring Closet Topology 

   In the case where there are two (or more) groups of endnodes, each
   attached to a bridge (say B1 and B2 respectively), and each bridge is
   attached to an RBridge (say R1 and R2 respectively), with a link
   connecting B1 and B2, it is desirable to use the B1-B2 link only as a
   backup in case one of R1 and R2, or one of the links B1-R1 and B2-R2,
   fails.

   Default behavior would be that one of R1 or R2 (say R1) would become
   Designated RBridge, and forward traffic to/from the link, so endnodes
   attached to B2 would be connected to the campus via the path
   B2-B1-R1, rather than the desired B2-R2.

   The solution is to configure R1 and R2 to be part of a "wiring closet 
   group", with a configured ID (which can be R1 or R2's ID). Both R1 
   and R2 participate in the bridge spanning tree on the configured 
   ports as root R1, which will cause the spanning tree to break the B1-
   B2 link as desired, and both R1 and R2 will act as Designated RBridge 
   on each of their respective partitions. 

   In the BPDU, Root will be "R1", cost to Root will be 0, Designated 
   Bridge ID will be "R1" when R1 transmits, and "R2" when R2 transmits, 
   and port ID will be a distinct value chosen by each of R1 and R2 to 
   distinguish each of its own ports. If R1 and R2 were actually on the 
   same shared medium with no bridges between them, the result will be 
   that the one with the larger ID will see "better" BPDUs (because of 
   the tie-breaker on the third field), and will turn off the port. 
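
   The BPDU contents and the tie-breaker described above can be 
   sketched as follows. The tuple layout and the names 
   wiring_closet_bpdu and must_turn_off_port are assumptions made for 
   illustration; IDs are modeled as plain integers.

```python
from typing import Tuple

# (Root, cost to Root, Designated Bridge ID, port ID); lower compares better.
Bpdu = Tuple[int, int, int, int]


def wiring_closet_bpdu(group_id: int, own_id: int, port_id: int) -> Bpdu:
    """BPDU sent by a member of a wiring closet group.

    Root is the configured group ID and cost to Root is always 0; only the
    Designated Bridge ID and port ID differ between members.
    """
    return (group_id, 0, own_id, port_id)


def must_turn_off_port(mine: Bpdu, heard: Bpdu) -> bool:
    """True if the BPDU heard on a port is "better" than the one we send."""
    return heard < mine
```

   With group ID 1, R1 sends (1, 0, 1, p) and R2 sends (1, 0, 2, p). 
   The first two fields tie, so the third field decides, and R2 (the 
   larger ID) turns off its port if it hears R1 directly.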

   The only misconfiguration that can occur is if the link R1-R2 is on 
   the cut set of the campus, and there are bridges between R1 and R2, 
   and R2 is configured to believe it is in a wiring closet topology. 
   In that case, the link will become partitioned and the campus will 
   become partitioned. 

2.6. Learning Endnode Location 

   RBridges learn endnode location from data frames. They learn (layer 
   3, layer 2) pairs (for the purpose of supporting ARP/ND optimization) 
   from listening to ARP or ND replies. 

   This endnode information is learned by the DR, and distributed to 
   other RBridges through the link state protocol. 

2.7. Forwarding Behavior 

2.7.1. Receipt of a Native Packet 

   R1 receives a native (i.e., not RBridge-encapsulated) unicast frame. 
   R1 knows that this is a native frame because the Ethertype is not 
   "RBridge encapsulated frame". The destination in the layer 2 header 
   is D, the source is S. 

   R1 inserts a VLAN tag if required, according to the same rules as 
   bridges do. 

   Once the VLAN (if any) is established, the layer 2 address of D is 
   looked up in the destination table for that VLAN to find the egress 
   RBridge R2, or discover that D is unknown. 

   If D is known, with egress R2, then R1 encapsulates the packet, with 
   R2 indicated in the shim header as egress RBridge. In the outer 
   header, R1 puts "R1" as source, and next hop RBridge (in the path to 
   R2) as "destination", and "encapsulated RBridge packet" as the 
   Ethertype. 

   If D is unknown, R1 encapsulates the packet, with "R1" indicated as 
   ingress RBridge in the shim header, and outer header with source=R1, 
   destination = "all-RBridges". The egress RBridge field indicates the 
   chosen distribution tree. The default is for R1 to put its own 
   nickname there. However, R1 MAY be configured to select some other 
   tree. If R1 is configured to decline to be a tree root, then R1 MUST 
   select some other RBridge which has elected to be a tree root. 
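
   The ingress decision above might be sketched as follows. The 
   Encapsulated record, the table shapes (a dictionary keyed by 
   (VLAN, address) and a next-hop dictionary), and the function name 
   are assumptions for the example. Local delivery, VLAN tag insertion, 
   and multicast destinations are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

ALL_RBRIDGES = "all-RBridges"  # placeholder for the to-be-assigned address


@dataclass
class Encapsulated:
    outer_src: str   # transmitting RBridge
    outer_dst: str   # next hop RBridge, or ALL_RBRIDGES when flooding
    ingress: str     # shim header: ingress RBridge
    egress: str      # shim header: egress RBridge, or chosen tree root
    inner: bytes     # original native frame


def ingress_encapsulate(me: str, vlan: int, dst: str, frame: bytes,
                        endnodes: Dict[Tuple[int, str], str],
                        next_hop: Dict[str, str],
                        tree_root: Optional[str] = None) -> Encapsulated:
    """Encapsulate a native unicast frame received by RBridge `me`."""
    egress = endnodes.get((vlan, dst))
    if egress is not None:
        # D is known: unicast towards the egress RBridge via the next hop.
        return Encapsulated(me, next_hop[egress], me, egress, frame)
    # D is unknown: flood on a distribution tree, by default our own.
    return Encapsulated(me, ALL_RBRIDGES, me, tree_root or me, frame)
```

   An RBridge configured to decline to be a tree root would always pass 
   tree_root explicitly.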

2.7.2. Receipt of an In-transit Packet 

   RBridge R1 receives an encapsulated frame (as indicated by 
   Ethertype="RBridge-encapsulated"). 

 
2.7.2.1. Flooded Packet 

   If the destination in the outer header is "all-RBridges", then R1 
   forwards along the ingress RBridge tree indicated by the shim header. 

   If the frame's inner header indicates it is for a specific VLAN, 
   links in that indicated ingress RBridge tree that do not lead to 
   links in that VLAN are pruned for this packet. Furthermore, if the 
   frame contains an IP multicast packet, then R1 forwards only on 
   branches that R1 has learned, through IGMP, lead to links with 
   receivers for this IP multicast group. 

   In addition, for links for which R1 is Designated, R1 decapsulates 
   the packet and transmits the packet onto those links (unless the 
   packet is IP multicast or VLAN-tagged, and the packet does not belong 
   on that link). 

   If the frame belongs in VLAN A (based on the presence of a tag in 
   the inner header), then R1 (the ingress RBridge) looks up D's 
   location in R1's table of VLAN A endnodes. 

   If the native frame's destination is a layer 2 multicast, then if 
   the frame is a BPDU, the RBridge drops the frame. 

   If the native frame's destination is "all-RBridges" with Ethertype 
   "IS-IS", then R1 processes the link state packet. 

   If the packet is an IGMP announcement, which will be transmitted to 
   an IP-derived layer 2 multicast address of "all IP routers", then the 
   RBridge learns, based on the "ingress RBridge" in the shim header, 
   the mapping between egress RBridges and IP multicast address 
   listeners. 

2.7.2.2. Unicast Packet 

   If the destination in the outer header is not R1, then R1 drops the 
   frame. 

   If the shim header indicates R1 is the egress RBridge, then R1 
   extracts the inner frame and forwards it onto the link containing the 
   destination, or processes the packet if the destination in the inner 
   frame is R1. 

   Else, R1 looks up the egress RBridge R2 indicated in the shim header, 
   in its forwarding table, and forwards the packet towards R2, by 
   replacing the outer header with one with source=R1, 
   destination=nexthop RBridge towards R2, and Ethertype "encapsulated 
   RBridge". 
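
   The three unicast rules of this section can be summarized in a 
   short sketch. The function name, the action labels, and the 
   next-hop dictionary are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple


def handle_unicast(me: str, outer_dst: str, shim_egress: str,
                   next_hop: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Return (action, new outer destination) for an encapsulated unicast."""
    if outer_dst != me:
        # The frame was not addressed to this RBridge at layer 2.
        return ("drop", None)
    if shim_egress == me:
        # We are the egress: extract the inner frame and deliver it.
        return ("decapsulate", None)
    # Otherwise rewrite the outer header (source = me) and forward
    # towards the egress RBridge named in the shim header.
    return ("forward", next_hop[shim_egress])
```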

2.8. IGMP Learning 

   RBridges learn, based on seeing IGMP packets, which multicast 
   addresses should be forwarded onto which links. 

   IGMP messages have to be forwarded throughout the campus, since IP 
   routers in the broadcast domain also need to see these messages. 

   IGMP messages are forwarded by RBridges throughout the campus like 
   any layer 2 multicast. They are recognized by having protocol 
   type=2 in the IP header. In addition, they are processed by RBridges 
   in order to extract, from announcements, what egress RBridges have 
   receivers for which groups.  

2.9. RBridge Nicknames 

   To make the shim header smaller, RBridges dynamically acquire 2-byte 
   nicknames that are unique within the campus. The nickname allocation 
   protocol is piggybacked on the core IS-IS RBridge instance as 
   follows: 

   We will assign a new TLV type value to be carried in the IS-IS core 
   instance LSPs.  The TLV will carry the nickname the LSP source wishes 
   to use. 
    

   Each RBridge chooses its own nickname.  However, each RBridge is also 
   responsible for ensuring that its nickname is unique.  If R1 chooses 
   nickname x, and R1 discovers, through receipt of R2's LSP, that R2 
   has also chosen x, then the RBridge with the lower system ID keeps 
   the nickname, and the other one must choose a new nickname. 
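
   The collision rule above can be sketched as follows. System IDs are 
   modeled as equal-length byte strings compared lexicographically, and 
   the pick_new callback stands in for whatever selection algorithm the 
   RBridge uses; both are assumptions for the example.

```python
from typing import Callable, Dict, Set


def resolve_nickname(my_id: bytes, my_nick: int,
                     lsp_nicknames: Dict[bytes, int],
                     pick_new: Callable[[Set[int]], int]) -> int:
    """Return the nickname this RBridge should use after reading the LSPs.

    lsp_nicknames maps each system ID seen in the link state database to
    the nickname it announced.
    """
    in_use = {nick for sid, nick in lsp_nicknames.items() if sid != my_id}
    for sid, nick in lsp_nicknames.items():
        # The RBridge with the lower system ID keeps a contested nickname.
        if sid != my_id and nick == my_nick and sid < my_id:
            return pick_new(in_use)
    return my_nick
```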
    

   If two RBridge domains merge, then there might be a lot of nickname 
   collisions for a short time, but as soon as each side receives the 
   link state packets of the other, the RBridges that need to change 
   nicknames will quickly become aware of this, and choose new nicknames 
   that do not, to the best of their ability, collide with any existing 
   nicknames. 
    

   To minimize the probability of nickname collisions, each RBridge 
   chooses its nickname randomly from the set of assigned nicknames. 
   Alternatively, we could use some sort of hash algorithm (such as the 
   bottom 16 bits of the MD5 of the RBridge's system ID), to choose the 
   first nickname, and then if there is a collision, go to the next 16 
   bits of the MD5, and so on, until all 128 bits of the MD5 hash are 
   exhausted, in which case the RBridge hashes its own system ID again, 
   this time together with the constant "1". 
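
   The hash-based alternative might look like the sketch below. 
   Several details are assumptions, since the text does not fix them: 
   the constant is appended to the system ID as an ASCII digit, the 
   digest is treated as a big-endian integer, and after the "1" round 
   the sketch simply continues with "2", "3", and so on.

```python
import hashlib
from typing import Iterator, Set


def nickname_candidates(system_id: bytes) -> Iterator[int]:
    """Yield 16-bit nickname candidates derived from MD5(system_id)."""
    suffix = b""
    counter = 0
    while True:
        digest = int.from_bytes(hashlib.md5(system_id + suffix).digest(), "big")
        for i in range(8):                       # 128 bits = 8 x 16 bits
            yield (digest >> (16 * i)) & 0xFFFF  # bottom 16 bits first
        # All 128 bits used: hash again with the next constant appended.
        counter += 1
        suffix = str(counter).encode("ascii")


def choose_nickname(system_id: bytes, in_use: Set[int]) -> int:
    """Return the first candidate that is not already in use.

    Does not terminate if every 16-bit value is in use.
    """
    for candidate in nickname_candidates(system_id):
        if candidate not in in_use:
            return candidate
```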
    

   There is no reason for all RBridges to use the same algorithm for 
   choosing nicknames.  Picking them at random, or using a hash, is an 
   attempt to avoid collisions when the network starts up, but that is 
   only an optimization.  Even if all RBridges used the same algorithm 
   (say, as a worst case, they all start with "1" and count up 
   sequentially until they find an uncontested nickname), the network 
   will eventually stabilize.  And once it is stable, nicknames should 
   remain stable even as RBridges go up or down. 
    
   To minimize the probability of a new RBridge usurping a nickname 
   already in use, an RBridge should wait to acquire the link state 
   database from a neighbor before it announces its own nickname. 

2.10. Forwarding Header on 802 Links 

   It is essential that RBridges coexist with ordinary bridges. 
   Therefore, a frame in transit must look to ordinary bridges like an 
   ordinary layer 2 frame. However, it must also be differentiable from 
   a native layer 2 frame by RBridges. To accomplish this, we use a new 
   layer 2 protocol type ("Ethertype"). 

   A frame in transit on an 802 link will therefore have two 802 
   headers, since the original frame (including the original 802 header) 
   will be tunneled by the RBridges. But rather than just having an 
   additional 802 header, we include additional information between the 
   two headers; at least a hop count. 

   An encapsulated frame would look as follows: 

               +--------------+-------------+-----------------+  
               | outer header | shim header | original frame  | 
               +--------------+-------------+-----------------+ 
    
                        Figure 1 Encapsulated Frame 

   The outer header contains: 

   o  L2 destination = next RBridge, or for flooded frames, a new (to be 
      assigned) multicast layer 2 address meaning "all RBridges" 

 
   o  L2 source = transmitting RBridge (the one that most recently 
      handled this frame) 

   o  protocol type = "to be assigned...RBridge encapsulated frame" 

   The 6-byte shim header includes: 

   o  TTL = starts at some value and is decremented by each RBridge. 
      The frame is discarded if TTL=0. This field is 16 bits: 6 bits 
      are used for the TTL, and the remaining 10 bits are reserved. 

   o  ingress RBridge nickname. 16 bits 

   o  egress RBridge nickname (or selected distribution tree, in the 
      case of multicast). 16 bits 
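
   As an illustration, the 6-byte shim header could be packed and 
   parsed as below. The bit layout is an assumption: the text gives 
   the field sizes but not their positions, so this sketch places the 
   TTL in the high-order 6 bits of the first 16-bit word, leaves the 
   reserved bits zero, and uses network byte order with the fields in 
   the order listed. Discarding when the decremented TTL would reach 0 
   is likewise only one reading of the rule above.

```python
import struct
from typing import Optional, Tuple

SHIM_FORMAT = "!HHH"  # three 16-bit words, network byte order (6 bytes)


def pack_shim(ttl: int, ingress: int, egress: int) -> bytes:
    """Build a shim header; TTL occupies the top 6 bits of word 0."""
    if not 0 <= ttl < 64:
        raise ValueError("TTL must fit in 6 bits")
    return struct.pack(SHIM_FORMAT, ttl << 10, ingress, egress)


def unpack_shim(shim: bytes) -> Tuple[int, int, int]:
    """Return (ttl, ingress nickname, egress nickname)."""
    first, ingress, egress = struct.unpack(SHIM_FORMAT, shim[:6])
    return first >> 10, ingress, egress


def forward_shim(shim: bytes) -> Optional[bytes]:
    """Decrement the TTL; return None if the frame must be discarded."""
    ttl, ingress, egress = unpack_shim(shim)
    if ttl <= 1:
        return None
    return pack_shim(ttl - 1, ingress, egress)
```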

2.11. Handling ARP/ND Queries 

   We will use the term "optimized ARP/ND response" to cover several 
   possible behaviors an RBridge might utilize. Non-optimized behavior 
   would consist of treating an ARP or ND query as an ordinary layer 2 
   broadcast/multicast, and sending the query to all links in the 
   campus, allowing the target to respond as to an ordinary ARP/ND 
   query. This behavior is essential when the location of the target is 
   unknown, although RBridges could suppress multiple queries to the 
   same target within some amount of time. 

   When the target's location is assumed to be known by the first 
   RBridge, it need not flood the query. Alternative behaviors of the 
   first Designated RBridge that receives the ARP/ND query would be to: 

   1. send a response directly to the querier, with the layer 2 address 
      of the target, as believed by the RBridge 

   2. encapsulate the ARP/ND query to the target's Designated RBridge, 
      and have the Designated RBridge at the target forward the query to 
      the target. This behavior has the advantage that a response to the 
      query will be definitive. If the query does not reach the target, 
      then the querier will not get a response 

   3. block ARP/ND queries that occur for some time after a query to the 
      same target has been launched, and then respond to the querier 
      when the response to the recently-launched query to that target is 
      received 

   The reason not to do the most optimized behavior all the time is for 
   timeliness of detecting a stale cache. Also, in the case of SEND, 
   cryptography might prevent behavior 1, since the RBridge would not be 
   able to sign the response with the target's private key. 

   It is not essential that all RBridges use the same strategy for which 
   option to select for a particular query. However, once the first 
   Designated RBridge decides on a strategy for a particular query, the 
   other RBridges must carry that through. If the first RBridge responds 
   directly to the querier, or blocks the query, then no other RBridges 
   are involved. 

   If the first Designated RBridge R1 decides to unicast the query to 
   the target's Designated RBridge R2, then R2 must decapsulate the 
   query, and initiate an ARP/ND query on the target's link. When/if the 
   target responds, R2 must encapsulate and unicast the response to R1, 
   which will decapsulate the response and send it to the querier. 

   If the first Designated RBridge R1 decides to flood the query (which 
   it MUST do if the target is unknown, but MAY do if it wants to assure 
   freshness of the information), the query is encapsulated to be 
   flooded through the indicated VLAN. 

   The distributed ARP query is carried by RBridges through the RBridge 
   spanning tree. Each Designated RBridge, in addition to forwarding the 
   query through the spanning tree, initiates an ARP query on its 
   link(s). If a reply is received from the target by Designated RBridge 
   R2, R2 initiates a link state update to inform all the other RBridges 
   of D's location, layer 3 address, and layer 2 address, in addition to 
   forwarding the reply to the querier. 

   It is the querier's Designated RBridge R1 that chooses which strategy 
   to employ when seeing an ARP query. 

   Some mix of these strategies (responding directly, unicasting the 
   query to the target's Designated RBridge, or flooding the query) 
   might be the best solution. For instance, even if the target's 
   location and (layer 3, layer 2) correspondence is in the link state 
   information R1 received from R2, if the target's location has not 
   been recently verified by R1 through a broadcast ARP/ND or unicast 
   query to the target, then R1 MAY broadcast or unicast the query or 
   respond directly. So for instance, RBridges could keep track of the 
   last time a broadcast ARP/ND occurred for each endnode E (by any 
   source, and injected by any RBridge). Let's say the parameter is 20 
   seconds. If a source S on RBridge R1's link does an ARP/ND for D, if 
   R1 has not seen an ARP/ND for D within the last 20 seconds, R1  
   unicasts the query to force a reply from the target; otherwise it 
   proxies the reply. 
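
   The 20-second example above can be sketched as a small decision 
   function. The function name, the strategy labels, and the use of a 
   floating-point timestamp are assumptions for illustration; an 
   implementation MAY choose differently, for example by flooding even 
   when the target is known.

```python
from typing import Optional

FRESHNESS_SECONDS = 20.0  # example parameter value from the text


def choose_strategy(target_known: bool, last_arp_seen: Optional[float],
                    now: float,
                    freshness: float = FRESHNESS_SECONDS) -> str:
    """Pick how the querier's Designated RBridge handles an ARP/ND query.

    last_arp_seen is the time an ARP/ND for the target was last seen,
    or None if no such query has been seen.
    """
    if not target_known:
        return "flood"    # MUST flood when the target is unknown
    if last_arp_seen is None or now - last_arp_seen > freshness:
        return "unicast"  # force a reply from the target itself
    return "proxy"        # answer on the target's behalf
```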

 
   When R2 forwards a unicast ARP/ND query, if the target does not 
   respond, then R2 MAY replay the query, and if the target does not 
   respond, R2 will remove the target from its link state information. 

2.12. Discovering IP Multicast Routers 

   Until Multicast Router Discovery (RFC 4286) is universally deployed, 
   RBridges must also discover IP multicast routers by observing the 
   PIM messages they transmit. So an RBridge concludes there is an IP 
   multicast router on a port if it receives either an MRD message or a 
   PIM message on that port. A PIM message is recognized because the 
   protocol type in the IP header is decimal 103. 
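
   Recognizing a PIM message by its protocol type can be sketched as 
   follows. The sketch handles IPv4 only and assumes the caller has 
   already stripped the layer 2 header; the function names are 
   illustrative. MRD detection is not shown.

```python
IPPROTO_IGMP = 2   # protocol type used to recognize IGMP (Section 2.8)
IPPROTO_PIM = 103  # protocol type used to recognize PIM


def ipv4_protocol(packet: bytes) -> int:
    """Return the protocol field (byte offset 9) of an IPv4 header."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return packet[9]


def is_pim(packet: bytes) -> bool:
    """True if the IPv4 packet carries PIM."""
    return ipv4_protocol(packet) == IPPROTO_PIM
```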

2.13. Assuring Freshness of Endnode Information 

   Designated RBridge R1 can ensure freshness of its endnode information 
   by doing ARP/ND queries periodically to ensure that the endnodes are 
   actually there. This can be a problem if the endnodes are in 
   power-saver mode, so whether R1 should "ping" the endnodes by doing 
   ARP/ND queries should be a configuration parameter on R1. 

3. RBridge Addresses, Parameters, and Constants 

   Each RBridge needs a unique ID within the campus.  The simplest such 
   address is a unique 6-byte ID, since such an ID is easily obtainable 
   as any of the EUI-48's owned by that RBridge.  IS-IS already requires 
   each router to have such an address. 

   One parameter is the value to which the hop count (TTL) in the shim 
   header is initially set.  Recommended default=20. 

   A new Ethertype must be assigned to indicate an RBridge-encapsulated 
   frame. 

   A layer 2 multicast address for "all RBridges" must be assigned for 
   use as the destination address in flooded frames. A different layer 2 
   multicast address for "IS-IS" must be assigned for use as the 
   destination address in IS-IS packets. 

   To support VLANs, RBridges (like bridges today), must be configured, 
   for each port, with the VLAN in which that port belongs. 

   We may want a parameter to determine whether an RBridge should 
   periodically do queries to ensure that the endnode information is 
   fresh, and if so, with what frequency. 


   A parameter indicates whether an RBridge wants to be the root of a 
   distribution tree. 

   Configuration for wiring closet topology consists of system ID of the 
   RBridge with lowest ID. If R1 and R2 are part of a wiring closet 
   topology, only R2 needs to be configured to know about this, and that 
   R1 is the ID it should use in the spanning tree protocol on the 
   specified port. 

4. Security Considerations 

   The goal is for RBridges to not add additional security issues over 
   what would be present with traditional bridges.  RBridges will not be 
   able to prevent nodes from impersonating other nodes, for instance, 
   by issuing bogus ARP replies.  However, RBridges will not interfere 
   with any schemes that would secure neighbor discovery. 

   As with routing schemes, authentication of RBridge messages would be 
   a simple addition to the design (and it would be accomplished the 
   same way as it would be in IS-IS).  However, any sort of 
   authentication requires additional configuration, which might 
   interfere with the perception that RBridges, like bridges, are zero 
   configuration. 

5. IANA Considerations 

   A new Ethertype must be assigned to indicate an RBridge-encapsulated 
   frame. 

   A layer 2 multicast address for "all RBridges" must be assigned for 
   use as the destination address in flooded frames. 

6. Conclusions 

   This design allows transparent interconnection of multiple links into 
   a single IP subnet.  Management would be just like with bridges 
   (plug-and-play).  But this design avoids the disadvantages of 
   bridges.  Temporary loops are not a problem so failover can be as 
   fast as possible, and shortest paths can be followed. 

   The design is compatible with current IP nodes and routers, and with 
   current bridges. 

7. Acknowledgments 

   Many people have contributed to this design, including the working 
   group chairs Erik Nordmark and Donald Eastlake, and many other 
   members of the working group such as Dino Farinacci and Eric Gray. We 
   invite you to join the mailing list at http://www.postel.org/rbridge. 

   This draft was written using 2-Word-v2.0.template.dot. 

8. References 

8.1. Normative References 

       [1] IEEE, "IEEE 802.1D bridging standard". 

       [2] Haberman, B., Martin, J., "Multicast Router Discovery", RFC 
             4286, Dec 2005. 

       [3] Christensen, M., Kimball, K., Solensky, F., "Considerations 
             for IGMP and MLD Snooping Switches", draft-ietf-magma-
             snoop-12.txt. 

       [4] Cain, B., "Internet Group Management Protocol, Version 3", 
             RFC 3376, October 2002. 

8.2. Informative References 

       [5] Bryant, S., Perlman, R., Atlas, A., Fedyk, D., "TRILL using 
             Pseudo-Wire Emulation (PWE) Encapsulation", internet draft 
             draft-bryant-perlman-trill-pwe-encap-00. 

       [6] Narten, T., Nordmark, E. and W. Simpson, "Neighbor Discovery 
             for IP Version 6 (IPv6)", RFC 2461 (Standards Track), 
             December 1998. 

       [7] Perlman, R., "RBridges: Transparent Routing", Proc. Infocom 
             2004, March 2004. 

       [8] Perlman, R., "Interconnection: Bridges, Routers, Switches, 
             and Internetworking Protocols", Addison Wesley Chapter 3, 
             1999. 

Author's Addresses 

   Radia Perlman 
   Sun Microsystems 
       
   Email: Radia.Perlman@sun.com 
    

   Silvano Gai 
   Nuova Systems 
       
   Email: sgai@nuovasystems.com 
    

   Sanjay Sane 
   Cisco 
       
   Email: sanjays@cisco.com 
    

   Joe Touch 
   USC/ISI 
   4676 Admiralty Way 
   Marina del Rey, CA 90292-6695 
   U.S.A. 
       
   Phone: +1 (310) 448-9151 
   Email: touch@isi.edu 
   URL:   http://www.isi.edu/touch 
 

Intellectual Property Statement 

   The IETF takes no position regarding the validity or scope of any 
   Intellectual Property Rights or other rights that might be claimed to 
   pertain to the implementation or use of the technology described in 
   this document or the extent to which any license under such rights 
   might or might not be available; nor does it represent that it has 
   made any independent effort to identify any such rights.  Information 
   on the procedures with respect to rights in RFC documents can be 
   found in BCP 78 and BCP 79. 

   Copies of IPR disclosures made to the IETF Secretariat and any 
   assurances of licenses to be made available, or the result of an 
   attempt made to obtain a general license or permission for the use of 
   such proprietary rights by implementers or users of this 
   specification can be obtained from the IETF on-line IPR repository at 
   http://www.ietf.org/ipr. 

   The IETF invites any interested party to bring to its attention any 
   copyrights, patents or patent applications, or other proprietary 
   rights that may cover technology that may be required to implement 
   this standard.  Please address the information to the IETF at 
   ietf-ipr@ietf.org. 

 
Disclaimer of Validity 

   This document and the information contained herein are provided on an 
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 

Copyright Statement 

   Copyright (C) The Internet Society (2006). 

   This document is subject to the rights, licenses and restrictions 
   contained in BCP 78, and except as set forth therein, the authors 
   retain all their rights. 

Acknowledgment 

   Funding for the RFC Editor function is currently provided by the 
   Internet Society. 

    