Network Working Group                                        R. Perlman 
Internet Draft                                                      Sun 
Expires: November 2005                                         J. Touch 
                                                                USC/ISI 
                                                               A. Yegin 
                                                                Samsung 
                                                            May 2, 2005 
                                    
 
                       RBridges: Transparent Routing 
                       draft-perlman-rbridge-03.txt 


Status of this Memo 

   By submitting this Internet-Draft, each author represents that       
   any applicable patent or other IPR claims of which he or she is       
   aware have been or will be disclosed, and any of which he or she       
   becomes aware will be disclosed, in accordance with Section 6 of       
   BCP 79. 

   This document may not be modified, and derivative works of it may not 
   be created, except to publish it as an RFC and to translate it into 
   languages other than English. 

   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that 
   other groups may also distribute working documents as Internet-
   Drafts. 

   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time.  It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 

   The list of current Internet-Drafts can be accessed at 
        http://www.ietf.org/ietf/1id-abstracts.txt 

   The list of Internet-Draft Shadow Directories can be accessed at 
        http://www.ietf.org/shadow.html 

   This Internet-Draft will expire on November 2, 2005. 

Copyright Notice 

   Copyright (C) The Internet Society (2005).  All Rights Reserved. 

 
Perlman                Expires November 2, 2005                [Page 1] 

Internet-Draft      RBridges: Transparent Routing              May 2005 
    

Abstract 

   RBridges provide the ability to have an entire campus, with multiple 
   physical links, look to IP like a single subnet. The design allows 
   for zero configuration of switches within a campus, optimal pair-wise 
   routing, safe forwarding even during periods of temporary loops, and 
   the ability to cut down on ARP/ND traffic. The design also supports 
   VLANs, and allows forwarding tables to be based on RBridge 
   destinations (rather than endnode destinations), which allows 
   internal routing tables to be substantially smaller than in 
   conventional bridge systems.  

   This document is a work in progress; we invite you to participate on 
   the mailing list at http://www.postel.org/RBridge  

Table of Contents 

    
   1. Introduction
   2. Detailed RBridge Design
      2.1. Link State Protocol
      2.2. Spanning Tree
      2.3. Designated RBridge
      2.4. Learning Endnode Location
      2.5. Forwarding Behavior
      2.6. Forwarding Header on 802 Links
      2.7. Distributed ARP Query
   3. RBridge Addresses, Parameters, and Constants
   4. Handling ARP Queries
   5. Issues
      5.1. How Many Spanning Trees?
         5.1.1. Per-ingress Spanning Tree
         5.1.2. Per VLAN
         5.1.3. Single Spanning Tree
      5.2. Reasons Not to Optimize Handling of IP packets
         5.2.1. Avoiding Encapsulation for On-campus IP Packets
         5.2.2. Avoiding Encapsulation for Off-campus IP Packets
      5.3. Supporting Heterogeneous Link Types
      5.4. Effects on L3 TTL
      5.5. Using L3 encapsulation
      5.6. Optimizing ARP/ND
   6. Security Considerations
   7. Conclusions
   8. Acknowledgments
   9. References
      9.1. Normative References
      9.2. Informative References
   Author's Addresses
   Intellectual Property Statement
   Disclaimer of Validity
   Copyright Statement
   Acknowledgment
    
1. Introduction 

   In traditional IPv4 and IPv6 networks, each link must have a unique 
   prefix.  This means that a node that moves from one link to another 
   must change its IP address, and a node with multiple links must have 
   multiple addresses.  It also means that a company with many links 
   (separated by routers) will have difficulty making full use of its IP 
   address block (since any link not fully populated will waste 
   addresses), and routers require significant configuration. 

   Bridges avoid these problems because bridges can transparently glue
   many physical links into what appears to IP to be a single LAN.
   However, bridging has drawbacks of its own.  Routing via the spanning
   tree concentrates traffic onto selected links.  Bridges forward based
   on a header that carries no hop count, so temporary loops (which
   might arise due to topology changes, lost spanning tree messages, or
   components such as repeaters coming up) are very dangerous and may
   cause exponential proliferation of packets.  Finally, routes cannot
   be pair-wise shortest paths; they are instead whatever paths remain
   after the spanning tree eliminates redundant paths.

   We define the term "campus" to be the set of links connected by any
   combination of RBridges and bridges.  A campus is terminated by
   traditional IP routers, in the same way that an IP subnet would be
   terminated by an IP router.  A campus will look to IP nodes like a
   single IP subnet, whether the interconnection of the links is done
   with bridges, RBridges, or some combination of the two.

   There have been proposals for having routers within a campus 
   automatically number links with distinct IP subnet numbers.  Although 
   this makes a campus plug-and-play, it requires a large number of IP 
   subnet numbers, a node must change its address if it moves to a 
   different link, and addresses of nodes might fluctuate as the 
   topology changes and links must be renumbered. 

   This proposal introduces RBridges [8] (Routing Bridges), which 
   combine the advantages of bridges and routers. Like bridges, RBridges 
   are zero configuration, and are transparent to IP nodes. Like 
   routers, RBridges forward on pair-wise shortest paths, and do not
   have dangerous behavior during temporary loops. RBridges have the
   additional advantage that they can suppress the broadcast/multicast 
   for neighbor discovery by doing proxy ARP (IPv4) or proxy ND (IPv6). 

   RBridges are fully compatible with current bridges as well as current 
   IPv4 and IPv6 routers and endnodes.  They are as invisible to current 
   IP routers as bridges are, and like routers, they terminate a bridged 
   spanning tree. 

   The main idea is to have RBridges run a link state protocol amongst 
   themselves (IS-IS is ideal, since its TLV encoding easily allows new 
   information to be carried in link state information, as this proposal 
   requires, and also makes zero configuration easier because IS-IS does 
   not require assigning IP addresses to the RBridges). 

   The next step is for RBridges to learn the location of endnodes. They 
   can learn the location and layer 2 addresses of attached nodes from 
   the source address of data packets, as bridges do. Additionally, in 
   order to facilitate proxy ARP or proxy ND optimizations, RBridges can
   also learn the (layer 3, layer 2) addresses of attached IP nodes from 
   ARP or ND replies. 

   Once an RBridge learns the location of a directly attached endnode, 
   it informs the other RBridges in its link state information. 

   RBridge forwarding can be done, as with a router, via pairwise 
   shortest paths.  RBridges could also utilize forwarding 
   optimizations, e.g., MPLS. 

   To prevent the temporary loop issues with bridges, RBridges must 
   always forward based on a header with a hop count. Although the hop 
   count will quickly discard looping packets, it is also desirable not 
   to spawn additional copies of packets. This can be accomplished by 
   having RBridges specify the next RBridge recipient while forwarding 
   across a shared-media link. 

   For two reasons, packets must be encapsulated as they are traveling 
   between RBridges: 

   1. so that intermediate RBridges (and bridges) will not be confused 
      about the location of the source by learning the source address 
      from packets in transit 

   2. so that the packet can be directed towards the egress RBridge, and 
      can include a hop count (for links, like Ethernet, that do not 
      already contain a hop count). 

 
   RBridges are similar to Recursive Routers, which provide similar 
   transit to emulate a single L3 router, in that case using L3 + L2 
   encapsulation [10][11]. 

   A VLAN is a broadcast domain. That means that a layer 2 broadcast
   (or multicast) packet sent to a VLAN must only be delivered to links
   that are in that VLAN. A packet for a particular VLAN may transit any
   link on the campus, but an unencapsulated VLAN packet must only be
   delivered to links that the RBridges know to support that VLAN.
   Support of VLANs traditionally requires configuring the bridges (or
   in this case RBridges) so that they know which links belong
   to which VLANs. In theory some other mechanism might allow an RBridge 
   to know which VLANs should be supported on which port. The RBridge 
   design does not care how RBridges discover which VLANs are supported 
   by each of their ports, but for simplicity we assume here that 
   RBridges (like bridges) are configured with this information. 

   RBridges must calculate a spanning tree for each broadcast domain. In 
   a campus without VLANs, this means a single spanning tree would be 
   used for delivery of packets with unknown or group address layer 2 
   destination. 

   It is possible to support VLANs with a single spanning tree, and just 
   avoid forwarding the decapsulated packet onto links that do not 
   support that VLAN. However, it will allow for more optimal delivery 
   if a different spanning tree is calculated for each broadcast domain. 

   It is not necessary to use the bridge spanning tree algorithm to 
   calculate the spanning trees. Instead, they can be calculated based 
   on the link state information. Using the link state protocol to 
   calculate spanning trees makes the design very flexible and 
   efficient. The link state database gives sufficient information so 
   that RBridges can calculate a single spanning tree, spanning trees 
   per VLAN, or per-ingress RBridge spanning trees without requiring any 
   additional exchange of information between RBridges. 

2. Detailed RBridge Design 

2.1. Link State Protocol 

   Running a link state protocol among RBridges is straightforward.  It 
   is the same as running a level 1 routing protocol in an area.  IS-IS 
   is a more appropriate choice than OSPF in this case because it is 
   easy in IS-IS to define new TLVs for carrying new information. 
   However, the instance of IS-IS that RBridges will implement will be 
   separate from any routing protocol that IP routers will implement, 
   just as the spanning tree messages are not implemented by IP routers. 
 
 
   To keep the instances separate, RBridge routing messages should be 
   sent to a different layer 2 multicast address than IS-IS routing 
   messages.  Alternatively, they can be differentiated by having a 
   different "area address", where, in order to keep RBridges 
   configuration-free, the RBridge area address would be a constant for 
   all RBridges, and would not be one that would ever appear as a real 
   IS-IS area address. 

   Additional information that RBridge link state information will carry 
   is: 

   o  layer 2 addresses of nodes within the campus which have 
      transmitted packets but have not transmitted ARP or ND replies  

   o  layer 3, layer 2 addresses of IP nodes within the campus.  For 
      data compression, perhaps only the portion of the address 
      following the campus-wide prefix need be carried.  This will be 
      more of an issue for IPv6 than for IPv4. 

   o  VLANs directly connected to this RBridge 

   The endnode information need only be
   delivered to RBridges supporting the VLAN in which the endnode 
   resides. So for instance, if endnode E is discovered through a VLAN A 
   packet, then E's location need only be delivered to other RBridges 
   that are attached to VLAN A links. 

   Given that RBridges must support delivery only to links within a VLAN 
   (for multicast or unknown packets marked with the VLAN's tag), this 
   mechanism can be used to advertise endnode information solely to 
   RBridges within a VLAN. Although a separate instance of the link 
   state protocol could be run for this purpose, the topology is so 
   restricted (just a single broadcast domain), that it might be 
   preferable to design a special case mechanism where each DR 
   advertises its attached endnodes, and receives explicit acks from the 
   other RBridges. 

2.2. Spanning Tree 

   There will be cases when RBridges may need to send packets to all 
   links.  These cases include: 

   o  layer 2 multicast or broadcast packets 

   o  unknown layer 2 destination addresses 

   o  distributed RBridge layer 3 address location query 
 
 
   In these cases the packets must be sent through a spanning tree.
   However, there is no need to implement a separate spanning tree 
   protocol in addition to the link state protocol.  Instead, the link 
   state information can be used to create a single spanning tree 
   throughout the campus.  This is done by choosing the RBridge with 
   lowest ID, and calculating the Dijkstra tree with that RBridge as 
   Root. 

   In the case of multiple equal cost links, some tie-breaker must be 
   used to ensure that all RBridges calculate the same spanning tree. We 
   suggest using the ID of the parent as the tie breaker (if a node can
   be attached to either parent P1 or P2 with the same cost, choose P1
   if P1's ID is lower than P2's).
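
   The calculation above can be sketched as follows.  This is only an
   illustrative sketch, not part of the proposal: the function name,
   the representation of links as a dictionary of (RBridge, RBridge)
   pairs with positive costs, and the use of small integers as RBridge
   IDs are all assumptions made for the example.

```python
import heapq


def spanning_tree(links):
    """Return (root, parent) for the tree rooted at the lowest RBridge ID.

    links maps an undirected pair (a, b) to a positive cost.  parent maps
    every non-root RBridge to its parent in the tree.  Among equal-cost
    paths, the parent with the lowest ID wins.
    """
    adjacency = {}
    for (a, b), cost in links.items():
        adjacency.setdefault(a, []).append((b, cost))
        adjacency.setdefault(b, []).append((a, cost))

    root = min(adjacency)          # RBridge with the lowest ID is the Root
    parent = {}
    # Heap entries are (distance, parent ID, node).  Ordering by parent ID
    # second implements the tie-breaker for equal-cost attachments.
    heap = [(0, root, root)]
    while heap:
        distance, via, node = heapq.heappop(heap)
        if node in parent:
            continue               # already attached through a better entry
        parent[node] = via
        for neighbor, cost in adjacency[node]:
            if neighbor not in parent:
                heapq.heappush(heap, (distance + cost, node, neighbor))
    del parent[root]               # the Root has no parent
    return root, parent


# RBridge 4 can attach through 2 or 3 at equal cost; parent 2 is chosen.
example_links = {(1, 2): 1, (1, 3): 1, (2, 4): 1, (3, 4): 1}
root, parent = spanning_tree(example_links)
```

   Because every RBridge holds the same link state database and applies
   the same deterministic rule, each one arrives at the same tree
   without exchanging any further messages.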

   In the case of multicast L2 addresses, the RBridge may treat these as 
   broadcast, or may include existing techniques for emulating multicast 
   at L2, i.e., snooping IGMP and/or PIM-SM packets to configure an 
   internal, L2 multicast tree. 

   For a packet tagged with a VLAN ID (e.g., VLAN A), the packet is only 
   delivered to links that support VLAN A. It would provide for more 
   optimal delivery if a different spanning tree were calculated for 
   each VLAN. This would be done by choosing the RBridge with lowest ID 
   that connects to that VLAN as root, and calculating a tree of 
   shortest paths from that RBridge. RBridges that do not support VLAN A 
   may be on the delivery path for VLAN A packets, but they will not 
   decapsulate the packet onto links that are not VLAN A links. 
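
   A minimal sketch of the per-VLAN root choice and of the
   decapsulation check described above is given below.  The data
   layout (a dictionary from RBridge ID to the set of VLANs it connects
   to) and both function names are assumptions made purely for
   illustration.

```python
def vlan_roots(vlans_by_rbridge):
    """Return {vlan: root}, where root is the lowest RBridge ID on that VLAN."""
    roots = {}
    for rbridge_id, vlans in vlans_by_rbridge.items():
        for vlan in vlans:
            if vlan not in roots or rbridge_id < roots[vlan]:
                roots[vlan] = rbridge_id
    return roots


def may_decapsulate(port_vlans, packet_vlan):
    """A packet is decapsulated onto a port only if the port is in its VLAN."""
    return packet_vlan in port_vlans


# RBridge 1 connects only to VLAN B, so the VLAN A tree is rooted at 2.
attached = {1: {"B"}, 2: {"A", "B"}, 3: {"A"}}
roots = vlan_roots(attached)
```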

   If IGMP snooping is used to know where recipients of a multicast 
   packet reside, then the total number of packet-hops to deliver the 
   packet can be optimized by calculating a separate spanning tree per 
   ingress RBridge. This, however, requires a lot more computation (one 
   tree per RBridge). The tradeoffs will be discussed in the "Issues" 
   section at the end of this document. 

2.3. Designated RBridge 

   It is useful for one RBridge on each link to have special duties. 
   Thus one RBridge per link should be elected Designated RBridge. IS-IS 
   already holds such an election. 

   The Designated RBridge is the one on the link that will learn the 
   identities of attached endnodes, initiate a distributed ARP when an 
   ARP query is received for an unknown destination, and answer ARP 
   queries when the target node is known. 


2.4. Learning Endnode Location 

   RBridges learn endnode location from data packets. They learn (layer 
   3, layer 2) pairs (for the purpose of supporting proxy ARP/ND) from 
   listening to ARP or ND replies. 

   This endnode information is learned by the DR, and distributed to 
   other RBridges through the link state protocol. 
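
   The learning step might be sketched as follows.  The class, its
   method names, and the shape of the advertised record are
   hypothetical; the draft does not define an encoding for this
   information.  The sketch only separates endnodes seen in data
   packets from endnodes for which a (layer 3, layer 2) pair is known.

```python
class DesignatedRBridge:
    """Toy model of the endnode information a DR collects on its link."""

    def __init__(self, rbridge_id):
        self.rbridge_id = rbridge_id
        self.l2_only = set()   # seen in data packets, no ARP/ND reply yet
        self.l3_to_l2 = {}     # (layer 3 -> layer 2) learned from replies

    def learn_from_data(self, l2_source):
        """Learn an attached endnode from a data packet's source address."""
        if l2_source not in self.l3_to_l2.values():
            self.l2_only.add(l2_source)

    def learn_from_reply(self, l3_address, l2_address):
        """Learn a (layer 3, layer 2) pair from an ARP or ND reply."""
        self.l3_to_l2[l3_address] = l2_address
        self.l2_only.discard(l2_address)

    def endnode_advertisement(self):
        """Endnode information to carry in this RBridge's link state."""
        return {
            "rbridge": self.rbridge_id,
            "l2_only": sorted(self.l2_only),
            "l3_to_l2": dict(self.l3_to_l2),
        }


dr = DesignatedRBridge("Rb1")
dr.learn_from_data("a")
dr.learn_from_data("c")
dr.learn_from_reply("10.0.0.1", "a")   # example address only
advertisement = dr.endnode_advertisement()
```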

2.5. Forwarding Behavior 

   When a DR R1 receives a native packet with layer 2 source address S and
   layer 2 destination address D, R1 looks up the location of D. If D is
   claimed by egress RBridge R2, then R1 encapsulates the packet, 
   directing it towards R2. 

   When an RBridge receives an encapsulated packet, it forwards based on 
   the specified egress RBridge (rather than the ultimate destination 
   endnode). 

   If the packet belongs in VLAN A, then R1 (the ingress RBridge) looks 
   up D's location in R1's table of VLAN A endnodes. 
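
   The two lookups can be sketched as below.  The function names and
   table layouts are assumptions.  Two branches go slightly beyond the
   text of this section and are labeled as such in the comments: an
   unknown destination is sent through the spanning tree (as described
   in Section 2.2), and a destination attached to the ingress itself is
   assumed to need no encapsulation.

```python
def ingress_decision(own_id, endnode_tables, vlan, l2_destination):
    """Decide what an ingress RBridge does with a native packet.

    endnode_tables maps a VLAN to {layer 2 address: egress RBridge ID}.
    """
    egress = endnode_tables.get(vlan, {}).get(l2_destination)
    if egress is None:
        return ("flood_on_spanning_tree", None)   # unknown destination
    if egress == own_id:
        return ("deliver_locally", None)          # assumption: no encapsulation
    return ("encapsulate", egress)


def transit_decision(own_id, next_hop_table, egress):
    """Forward an encapsulated packet based only on its egress RBridge."""
    if egress == own_id:
        return ("decapsulate", None)
    return ("forward", next_hop_table[egress])


tables = {"A": {"D": "R2"}}
first = ingress_decision("R1", tables, "A", "D")
second = ingress_decision("R1", tables, "A", "unknown-node")
third = transit_decision("R3", {"R2": "R2"}, "R2")
```

   Note that the transit lookup is keyed by RBridge, not by endnode,
   which is why the forwarding tables can stay small.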

2.6. Forwarding Header on 802 Links 

   It is essential that RBridges coexist with ordinary bridges. 
   Therefore, a packet in transit must look to ordinary bridges like an 
   ordinary layer 2 packet. However, it must also be differentiable from 
   a native layer 2 packet by RBridges. To accomplish this, we use a new 
   layer 2 protocol type ("Ethertype"). 

   A packet in transit on an 802 link will therefore have two 802 
   headers, since the original frame (including the original 802 header) 
   will be tunneled by the RBridges. But rather than just having an 
   additional 802 header, we include additional information between the 
   two headers; at least a hop count. 

   An encapsulated packet would look as follows: 

               +--------------+-------------+-----------------+  
               | outer header | shim header | original packet | 
               +--------------+-------------+-----------------+ 
    
                       Figure 1 Encapsulated packet 


   The outer header contains: 

   o  L2 destination = next RBridge 

   o  L2 source = transmitting RBridge (the one that most recently
      handled this packet)

   o  protocol type = "to be assigned...RBridge encapsulated packet" 

   The shim header includes: 

   o  TTL = starts at some value and is decremented by each RBridge.
      The packet is discarded when the TTL reaches 0.

   o  egress RBridge (in the case of unicast), or ingress RBridge (in 
      the case of multicast) 

   Note that one variation is to have the egress RBridge specified in 
   the outer header rather than in the shim header. This will mean that 
   some packet duplication might occur during temporary loops. But the 
   advantage is that the header will be 6 bytes smaller. This is 
   discussed in the "issues" section. 
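
   One possible byte layout for the headers listed above is sketched
   below.  The draft fixes neither field widths nor the protocol type,
   so everything here is an assumption: a 1-byte TTL, a 6-byte RBridge
   ID in the shim header, and 0x88B5 (an IEEE local experimental
   Ethertype) standing in for the value that is still to be assigned.

```python
import struct

ETHERTYPE_PLACEHOLDER = 0x88B5             # stand-in, not an assigned value
OUTER = struct.Struct("!6s6sH")            # L2 destination, L2 source, type
SHIM = struct.Struct("!B6s")               # TTL, egress (or ingress) RBridge


def encapsulate(next_rbridge, sender, ttl, rbridge_id, original_frame):
    """Prepend an outer header and a shim header to the original frame."""
    outer = OUTER.pack(next_rbridge, sender, ETHERTYPE_PLACEHOLDER)
    shim = SHIM.pack(ttl, rbridge_id)
    return outer + shim + original_frame


def decapsulate(frame):
    """Split an encapsulated frame back into its header fields and payload."""
    next_rbridge, sender, ethertype = OUTER.unpack_from(frame, 0)
    if ethertype != ETHERTYPE_PLACEHOLDER:
        raise ValueError("not an RBridge-encapsulated frame")
    ttl, rbridge_id = SHIM.unpack_from(frame, OUTER.size)
    payload = frame[OUTER.size + SHIM.size:]
    return next_rbridge, sender, ttl, rbridge_id, payload


# Locally administered example addresses, chosen only for illustration.
B1Y = bytes.fromhex("0200000000b1")
B2X = bytes.fromhex("0200000000b2")
B3Y = bytes.fromhex("0200000000b3")
inner = b"original 802 frame"
frame = encapsulate(B2X, B1Y, 20, B3Y, inner)
```

   Under these assumed widths the shim header adds 7 bytes, of which 6
   are the RBridge ID that the variation above would move into the
   outer header.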

   The following is a walk-through of a packet traversing an RBridge 
   campus. Consider a packet consisting of "data" to be sent from node A 
   to node B through an RBridge campus (dotted area) as per Figure 2. 

                     ............................... 
                     .                             . 
         +--------+  .+-----+    +-----+    +-----+.   +--------+ 
         |        |  .|     |    |     |    |     |.   |        | 
         | Host A ----- Rb1 ------ Rb2 ------ Rb3 ------ Host B | 
         |        |  .|     |    |     |    |     |.   |        | 
         +--------+  .+-----+    +-----+    +-----+.   +--------+ 
                     .                             . 
                     .              RBridge campus . 
                     ............................... 
    
       Figure 2 Sample path for packet traversing an RBridge campus 

   In this figure, Host A is the source, Host B the sink, and Rb1..Rb3 
   are nodes of the RBridge campus. Rb1 is the ingress, and Rb3 is the 
   egress. Additionally, layer 2 (L2) addresses are as shown below the 
   components on the particular ports in Figure 3; note that addresses 
   are required for RBridge nodes for encapsulation and routing within 
   the campus. Different addresses are shown for each port on an RBridge 
   node for simplicity, although this is not required. 
 
 
                     ............................... 
                     .                             . 
         +--------+  .+-----+    +-----+    +-----+.   +--------+ 
         |        |  .|     |    |     |    |     |.   |        | 
         | Host A ----- Rb1 ------ Rb2 ------ Rb3 ------ Host B | 
         |        a   b1x b1y    b2x b2y    b3x b3y    b        | 
         |        |  .|     |    |     |    |     |.   |        | 
         +--------+  .+-----+    +-----+    +-----+.   +--------+ 
                     .                             . 
                     .              RBridge campus . 
                     ............................... 
    
                Figure 3 Sample path including L2 addresses 

   Consider the originating packet as per Figure 4; "L2 a->b" means the 
   layer 2 (L2) source address is "a" and the L2 destination address is 
   "b", and "IP A->B" means the IP source address is A and the IP 
   destination is B. 

                      +---------+---------+--------+  
                      | L2 a->b | IP A->B |  data  | 
                      +---------+---------+--------+ 
 
                  Figure 4 Packet as originated at Host A 

   The ingress RBridge Rb1 looks up 'b' in its encapsulation tables, 
   which indicate that Rb3 is the egress RBridge. The packet gets 
   wrapped to direct it to Rb3 using a shim header (SH), where the 
   destination is based on the L2 address of Rb3 (the egress) and uses a 
   TTL of 20, as shown in Figure 5. 

              +-----------------+---------+---------+--------+  
              | SH ->b3y TTL=20 | L2 a->b | IP A->B |  data  | 
              +-----------------+---------+---------+--------+ 
 
                     Figure 5 Packet with shim header 

   Note that the shim header includes only egress addresses for unicast 
   packets; for multicast packets, ingress L2 is used instead. 

   Rb1 then looks up the shim header destination in its (campus) 
   forwarding tables, yielding Rb2 as the next hop inside the campus. 
   Rb1 then sends the packet on to Rb2 by adding the appropriate L2 
   header, as shown in Figure 6. 


      +-------------+-----------------+---------+---------+--------+  
      | L2 b1y->b2x | SH ->b3y TTL=20 | L2 a->b | IP A->B |  data  | 
      +-------------+-----------------+---------+---------+--------+ 
    
                  Figure 6 Packet as sent from Rb1 to Rb2 

   Rb2 unwraps the outermost L2, decrements the shim TTL, and looks up 
   the shim destination's next hop (which is Rb3 here). Rb2 then adds a 
   new L2 header addressed to Rb3, as shown in Figure 7. 

      +-------------+-----------------+---------+---------+--------+  
      | L2 b2y->b3x | SH ->b3y TTL=19 | L2 a->b | IP A->B |  data  | 
      +-------------+-----------------+---------+---------+--------+ 
    
                  Figure 7 Packet as sent from Rb2 to Rb3 

   Rb3 unwraps the outer L2, notices that the shim destination has been 
   reached (itself), and unwraps the shim too. At that point, it 
   proceeds to send the original packet shown in Figure 4 to Host B. 
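
   The walk-through can be replayed with a small simulation.  The
   tables below are hand-filled to match Figure 3, and the dictionary
   representation of a frame is an assumption made for readability; a
   real RBridge would derive the tables from the link state database.

```python
INITIAL_TTL = 20

# Endnode L2 address -> L2 address of the egress RBridge.
ENCAPSULATION_TABLE = {"b": "b3y"}

# Per RBridge: shim destination -> (outgoing L2 source, next-hop L2 address).
FORWARDING = {
    "Rb1": {"b3y": ("b1y", "b2x")},
    "Rb2": {"b3y": ("b2y", "b3x")},
}

OWN_ADDRESSES = {
    "Rb1": {"b1x", "b1y"},
    "Rb2": {"b2x", "b2y"},
    "Rb3": {"b3x", "b3y"},
}


def ingress(rbridge, original):
    """Wrap a native packet in a shim header and an outer L2 header."""
    egress = ENCAPSULATION_TABLE[original["l2"][1]]
    outer = FORWARDING[rbridge][egress]
    return {"l2": outer, "shim": (egress, INITIAL_TTL), "inner": original}


def receive(rbridge, frame):
    """Process an encapsulated frame at a transit or egress RBridge."""
    egress, ttl = frame["shim"]
    if egress in OWN_ADDRESSES[rbridge]:
        return frame["inner"]          # egress: unwrap outer L2 and shim
    ttl -= 1
    if ttl <= 0:
        return None                    # hop count exhausted: discard
    outer = FORWARDING[rbridge][egress]
    return {"l2": outer, "shim": (egress, ttl), "inner": frame["inner"]}


original = {"l2": ("a", "b"), "ip": ("A", "B"), "data": "data"}
figure6 = ingress("Rb1", original)     # sent from Rb1 to Rb2
figure7 = receive("Rb2", figure6)      # sent from Rb2 to Rb3
delivered = receive("Rb3", figure7)    # handed to Host B
```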

2.7. Distributed ARP Query 

   The distributed ARP query is carried by RBridges through the RBridge 
   spanning tree. Each Designated RBridge, in addition to forwarding the 
   query through the spanning tree, initiates an ARP query on its 
   link(s). If a reply is received by Designated RBridge R2, R2 
   initiates a link state update to inform all the other RBridges of D's 
   location, layer 3 address, and layer 2 address. 

   The distributed ARP query must be sent to a (new, to be assigned) 
   layer 2 multicast address. The fields it must contain are:  

   Outer Layer 2 header: 

   o  destination = newly defined L2 multicast address

   o  source = transmitting RBridge (replaced hop by hop) 

   o  protocol type = same as encapsulated RBridge 

   Shim header: 

   o  TTL (for safety if the RBridge spanning tree has temporary loops, 
      and where the L2 header lacks an existing TTL) 


   o  ingress RBridge (rather than egress RBridge, which would be 
      specified in unicast packets to known destinations); this is used 
      for ingress-specific forwarding, e.g., for VLANs 

   RBridge payload: 

   o  original ARP or ND query 

   Intermediate RBridges decrement the above TTL, and replace the source 
   RBridge with their own layer 2 address on the outgoing interface. 
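
   The per-hop handling of a distributed ARP query might look like the
   sketch below.  The multicast address is still to be assigned, so a
   placeholder string is used; the function names and the dictionary
   representation of the frame are likewise assumptions.

```python
DEFAULT_TTL = 20
QUERY_MULTICAST = "multicast-address-to-be-assigned"   # placeholder


def originate_query(ingress_id, own_l2, original_query):
    """Build the distributed query sent by the ingress RBridge."""
    return {
        "l2_destination": QUERY_MULTICAST,
        "l2_source": own_l2,
        "ttl": DEFAULT_TTL,
        "ingress": ingress_id,
        "payload": original_query,
    }


def relay_query(frame, own_l2):
    """Decrement the TTL and replace the L2 source; None means discard."""
    ttl = frame["ttl"] - 1
    if ttl <= 0:
        return None
    relayed = dict(frame)              # ingress and payload are unchanged
    relayed["ttl"] = ttl
    relayed["l2_source"] = own_l2
    return relayed


query = originate_query("Rb1", "b1y", "original ARP query")
relayed = relay_query(query, "b2y")
```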

3. RBridge Addresses, Parameters, and Constants 

   Each RBridge needs a unique ID within the campus.  The simplest such 
   address is a unique 6-byte ID, since such an ID is easily obtainable 
   as any of the EUI-48's owned by that RBridge.  IS-IS already requires 
   each router to have such an address. 

   One parameter is the initial value of the hop count (TTL) in the
   shim header.  The recommended default is 20.

   A new Ethertype must be assigned to indicate an RBridge-encapsulated 
   packet. 

   A layer 2 multicast address must be assigned for use as the 
   destination address in distributed ARP queries. 

   To support VLANs, RBridges (like bridges today) must be configured,
   for each port, with the VLAN to which that port belongs.

4. Handling ARP Queries 

   If the target address is unknown, initiate a distributed ARP query. 
   If the target address is known, reply with a proxy ARP reply, giving 
   the target's true layer 2 address. 

   When initiating a distributed ARP query (or IPv6 neighbor 
   solicitation) remember the address of the requesting node.  When the 
   information is discovered, respond to the requester. 
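
   The rules above can be sketched as a small state machine.  The class
   and method names are hypothetical, and the returned tuples merely
   name the action to take; no packet formats are implied.

```python
class ArpQueryHandler:
    """Toy model of how a Designated RBridge answers ARP/ND queries."""

    def __init__(self):
        self.known = {}      # target layer 3 address -> layer 2 address
        self.pending = {}    # target layer 3 address -> waiting requesters

    def on_query(self, requester, target_l3):
        """Answer by proxy if the target is known, else query the campus."""
        if target_l3 in self.known:
            return ("proxy_reply", requester, self.known[target_l3])
        # Remember who asked so the answer can be returned later.
        self.pending.setdefault(target_l3, []).append(requester)
        return ("distributed_query", target_l3)

    def on_learned(self, target_l3, target_l2):
        """Record a discovered address and reply to every waiting requester."""
        self.known[target_l3] = target_l2
        waiting = self.pending.pop(target_l3, [])
        return [("proxy_reply", requester, target_l2) for requester in waiting]


handler = ArpQueryHandler()
first = handler.on_query("a", "B")       # B unknown: distributed query
replies = handler.on_learned("B", "b")   # answer arrives via link state
second = handler.on_query("c", "B")      # B now known: immediate proxy reply
```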


5. Issues 

5.1. How Many Spanning Trees? 

5.1.1. Per-ingress Spanning Tree 

   If a separate spanning tree is calculated per ingress RBridge, then 
   delivery of both broadcast and multicast packets, where the recipient 
   locations are known through some mechanism such as IGMP snooping, can 
   be optimized (for number of packet hops to deliver the multicast 
   packet). 

   Also, if a separate spanning tree is calculated per ingress RBridge, 
   then out of order delivery is minimized when RBridges learn the 
   location of the destination, since the packet will traverse the same 
   path whether it is being delivered via the "destination unknown" tree 
   to that broadcast domain, or the direct path to that destination. 

   However, there is obvious overhead involved in calculating separate 
   spanning trees. 

   This mechanism of avoiding out of order delivery by calculating 
   separate spanning trees per ingress RBridge was presented at the IETF 
   TRILL BOF on March 10, 2005. 

5.1.2. Per VLAN 

   If there are not many links that support VLAN A, then the total
   number of packet hops to deliver a packet within the VLAN A broadcast
   domain is minimized by calculating a separate spanning tree for each
   VLAN.

   It would be possible to still support VLANs with a single spanning 
   tree, by having RBridges only decapsulate a VLAN A packet onto VLAN A 
   links, but the number of transit links such a packet would traverse 
   would be more than necessary (assuming that the location of VLAN A 
   links within the campus is somewhat sparse). 

5.1.3. Single Spanning Tree 

   Broadcast, multicast, and VLANs can be supported with a single
   spanning tree, which is the simplest solution and requires the least
   computation and the smallest forwarding tables in the RBridges. In
   that case all such packets would be delivered to all the RBridges,
   and the Designated RBridges alone would do the filtering, by not
   forwarding a packet onto links that it does not belong on. So from
   the endnodes' point of view, things are still correct; a packet will
   only be delivered to the proper links. But the cost to deliver the
   packet within the core can be much greater.

   Additionally, the more different spanning trees that are utilized, 
   the more all the links within the core can be fully utilized. 

   The cases in which a broadcast/multicast packet is not delivered to 
   all the links in the campus are: 

   o  when there is a VLAN tag, in which case the packet will only be 
      delivered to links that support that VLAN 

   o  when the layer 2 multicast is derived from an IP multicast, and 
      the RBridges have learned, through IGMP snooping, which links wish 
      to receive the packet 
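
   The two exceptions can be expressed as a per-link delivery check.
   This is a sketch under assumed data structures: a link is a
   dictionary holding the VLANs it supports and, where IGMP snooping
   has produced any information, the set of groups wanted on that link.

```python
def should_deliver(link, packet):
    """Return True if a broadcast/multicast packet belongs on this link."""
    vlan = packet.get("vlan")
    if vlan is not None and vlan not in link["vlans"]:
        return False                   # VLAN-tagged, link not in that VLAN
    group = packet.get("ip_group")
    wanted = link.get("igmp_groups")   # None: nothing learned by snooping
    if group is not None and wanted is not None and group not in wanted:
        return False                   # no receiver for this group here
    return True


link_a = {"vlans": {"A"}, "igmp_groups": {"239.1.1.1"}}
link_b = {"vlans": {"B"}}
tagged = {"vlan": "A"}
multicast = {"vlan": "A", "ip_group": "239.2.2.2"}
```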

5.2. Reasons Not to Optimize Handling of IP packets 

   There are two optimizations that were considered but abandoned due to 
   their impact on transparency, i.e., that an RBridge should appear 
   like a bridged network to upper layer protocols. These optimizations 
   focus on ways of merging the shim layer functionality with the 
   existing headers of IP packets. 

5.2.1. Avoiding Encapsulation for On-campus IP Packets 

   In theory, on-campus IP packets need not be encapsulated with an 
   additional layer 2 header.  The original layer 2 header can be 
   discarded and replaced with one where the layer 2 destination is 
   replaced by the next RBridge, and the source layer 2 address is 
   replaced by something that will not confuse bridge learning (since 
   packets will be injected into each segment from unpredictable 
   directions because shortest path routes will be used). 

   The disadvantages of this approach are: 

   o  the IP header's TTL would be decremented by each RBridge, making 
      the customer aware that bridges have been replaced by RBridges, 
      and possibly breaking IP protocols that expect the TTL not to be 
      decremented over an L2 system 

   o  the original layer 2 addresses might need to be preserved for some 
      conceivable uses 

   The real disadvantage, though, is that RBridges would have to have 
   more complex forwarding behavior. They would need to forward based on 
   layer 2 addresses sometimes, and layer 3 addresses at other times. 
   Even if all packets were IP, RBridges would need to forward packets 
   for off-campus IP destinations based on the layer 2 address of the IP 
   router. 

5.2.2. Avoiding Encapsulation for Off-campus IP Packets 

   Likewise, in theory, off-campus IP packets need not be encapsulated. 
   The TTL in the IP header can be decremented.  The same disadvantages 
   as for on-campus IP packets apply, including the concerns on the 
   impact of decremented TTL on other IP protocol behavior.  However, 
   there is the additional disadvantage that since the actual layer 2 
   destination has to be preserved end-to-end there is the danger of 
   packet proliferation if multiple RBridges decide to forward the 
   packet, which can occur while the topology is adjusting. 

5.3. Supporting Heterogeneous Link Types 

   It is easy to support link types other than 802 links with RBridges. 
   However, mixing link types within a single campus raises 
   complexities, such as packet size, incompatible layer 2 addresses, 
   and other layer 2 features (such as priority) that might be lost when 
   trying to "bridge" two different link types. 

5.4. Effects on L3 TTL 

   In general, an RBridge should have no effect on a Layer 3 (e.g., IP)
   TTL field, since the RBridge is a Layer 2 device.  The TTLs that
   ensure loop-free operation in an RBridge system should be carried in
   the encapsulation header, and should not affect any of the headers of
   the packet passed through the RBridge system.  The RBridge should do
   nothing to transited packets other than what would be done by an
   equivalent L2 system.
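
   As an illustration of keeping loop protection in the encapsulation
   header only, the following Python sketch models an ingress RBridge
   that encapsulates, transit RBridges that decrement a hop count in
   the outer header, and an egress RBridge that decapsulates.  It is a
   sketch under stated assumptions rather than a header definition: the
   class names, the hop_count field name, and the initial value of 16
   are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InnerPacket:
    """The transited packet; frozen so no RBridge can alter it."""
    ip_ttl: int
    payload: bytes


@dataclass(frozen=True)
class Encapsulated:
    """Encapsulation header (hypothetical layout) wrapping the inner packet."""
    hop_count: int
    inner: InnerPacket


def encapsulate(pkt: InnerPacket, initial_hops: int = 16) -> Encapsulated:
    """Ingress RBridge: add the encapsulation header with a fresh hop count."""
    return Encapsulated(hop_count=initial_hops, inner=pkt)


def transit(enc: Encapsulated) -> Optional[Encapsulated]:
    """Transit RBridge: decrement only the outer hop count.

    Returns None when the hop count is exhausted, i.e. the packet is
    dropped to break a (temporary) loop.
    """
    if enc.hop_count <= 1:
        return None
    return Encapsulated(hop_count=enc.hop_count - 1, inner=enc.inner)


def decapsulate(enc: Encapsulated) -> InnerPacket:
    """Egress RBridge: strip the encapsulation; the inner TTL is untouched."""
    return enc.inner
```

   However many RBridges the packet crosses, the IP TTL seen by the
   destination is the one the source link delivered, as it would be
   across an equivalent bridged network.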

5.5. Using L3 encapsulation 

   RBridges may use L3 (e.g., IP) encapsulation to provide a routable
   internal address and a loop-check indicator.  This allows the RBridge
   system to use L3 routing algorithms (e.g., OSPF) and existing L3
   implementations.  As with any RBridge system, packets are forwarded
   only within the preconfigured RBridge system.  Intermediate L2
   bridges are allowed whether L2 or L3 encapsulation is used.  L3
   encapsulation processing, including ICMP handling, fragmentation,
   etc., is well-defined (e.g., RFC2003).

   In this case, the L3 encapsulation should not decrement the TTL of 
   the inner transited packet, since (as per RFC2003) the RBridge system 
   would not be considered a forwarding (i.e., L3) 'tunnel'.  Further, 
   changing the IP TTL would potentially affect the reachability of all 
   1's broadcast or multicast, which would not reach the full L2 subnet. 

   The primary disadvantage of L3 encapsulation is the increased
   overhead of encapsulation (e.g., adding both an L3 header and a
   subsequent outer L2 header) and the complexity of providing L2
   services (notably broadcast) within the L3 subnet (RFC1122,
   RFC1812).  Note that L3 supports fragmentation and reassembly for
   tunnels, for both IPv4 and IPv6 encapsulation.  Reassembly would be
   required at the egress, which increases the load on the egress
   RBridge in tracking and storing the fragments, but the process is
   generally transparent to the resulting transited packet.  The
   primary effect would arise if there were a large amount of
   reordering (increasing the reassembly load) or high packet loss
   (resulting in failed reassembly and thus lost packets).  In the
   latter case, packet loss is amplified because the fragments of a
   single transited packet do not share fate: losing any one fragment
   loses the whole packet.
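
   The loss amplification can be made concrete with a small
   calculation.  The Python sketch below assumes, purely for
   illustration, that each fragment is lost independently with the
   same probability; real losses are often correlated, so the result
   is an estimate and not a prediction.  The function names and the
   example sizes are hypothetical.

```python
import math


def fragments_needed(packet_len: int, fragment_payload: int) -> int:
    """Number of fragments needed to carry packet_len bytes of payload."""
    return math.ceil(packet_len / fragment_payload)


def packet_loss_probability(fragment_loss: float, n_fragments: int) -> float:
    """Probability that reassembly fails at the egress RBridge.

    Reassembly succeeds only if every fragment arrives, so with
    independent per-fragment loss p the transited packet is lost with
    probability 1 - (1 - p) ** n.
    """
    return 1.0 - (1.0 - fragment_loss) ** n_fragments


# Example: a 1500-byte transited packet carried in 1480-byte fragments
# needs two fragments; at 1% fragment loss the packet loss rate is
# about 1.99%, nearly double the underlying loss rate.
n = fragments_needed(1500, 1480)
amplified = packet_loss_probability(0.01, n)
```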

5.6. Optimizing ARP/ND 

   There are various alternatives for how an RBridge could handle
   ARP/ND queries when the target is known (because its location has
   been disseminated through the link state protocol). Listed from most
   expensive to least expensive:

   o  treat ARP/ND like any multicast packet, and send along the 
      (appropriate) spanning tree, and let the target respond 

   o  route the ARP/ND to the RBridge that claims attachment to the 
      target 

   o  do proxy ARP/ND 

   The only reason not to do proxy ARP/ND is the case in which the
   target node has actually moved and the move has not yet been
   discovered by the RBridges. If the actual target needs to respond,
   then obviously the target is there. If the query is routed to the
   expected link, then there won't be a false positive, but the real
   location of the target may not be found if the target has moved.

   Some mix of these strategies might be the best solution. For
   instance, if the target's location has not been recently verified
   through a broadcast ARP/ND, then the source's RBridge should
   broadcast the ARP/ND. Otherwise it should do proxy ARP/ND. To do
   this, RBridges could keep track of the last time a broadcast
   ARP/ND occurred for each endnode E (by any source, and injected by
   any RBridge). Let's say the parameter is 20 seconds. If a source S on
   RBridge R1's link does an ARP/ND for D, if R1 has not seen an ARP/ND 
   for D within the last 20 seconds, R1 broadcasts the query; otherwise 
   it proxies the reply. 
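
   The mixed strategy described above can be sketched as a small
   per-RBridge policy object.  This Python sketch is illustrative
   only: the class and method names are hypothetical, the 20-second
   value is the example parameter from the text, and the caller is
   assumed to supply the current time (for instance from a monotonic
   clock) and to call note_broadcast() whenever it observes a
   broadcast ARP/ND for a target, whichever RBridge injected it.

```python
from typing import Dict

BROADCAST = "broadcast"  # send the query along the spanning tree
PROXY = "proxy"          # answer on behalf of the target


class ArpNdPolicy:
    """Decide between broadcasting and proxying an ARP/ND query."""

    def __init__(self, freshness_seconds: float = 20.0) -> None:
        self.freshness_seconds = freshness_seconds
        # Last time a broadcast ARP/ND was seen for each target endnode.
        self._last_broadcast: Dict[str, float] = {}

    def note_broadcast(self, target: str, now: float) -> None:
        """Record a broadcast ARP/ND for target, from any source."""
        self._last_broadcast[target] = now

    def decide(self, target: str, now: float) -> str:
        """Return BROADCAST or PROXY for a query about target."""
        last = self._last_broadcast.get(target)
        if last is None or now - last > self.freshness_seconds:
            # Location not recently verified: broadcast, and remember
            # that a broadcast for this target has now occurred.
            self.note_broadcast(target, now)
            return BROADCAST
        # Verified recently enough: proxy the reply.
        return PROXY
```

   For example, with the default parameter a query for D at time 0
   is broadcast, a second query at time 5 is proxied, and a query at
   time 25 is broadcast again because more than 20 seconds have
   passed since the last broadcast for D.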

6. Security Considerations 

   The goal is for RBridges to not add additional security issues over 
   what would be present with traditional bridges.  RBridges will not be 
   able to prevent nodes from impersonating other nodes, for instance, 
   by issuing bogus ARP replies.  However, RBridges will not interfere 
   with any schemes that would secure neighbor discovery. 

   As with routing schemes, authentication of RBridge messages would be 
   a simple addition to the design (and it would be accomplished the 
   same way as it would be in IS-IS).  However, any sort of 
   authentication requires additional configuration, which might 
   interfere with the perception that RBridges, like bridges, are zero 
   configuration. 

7. Conclusions 

   This design allows transparent interconnection of multiple links into 
   a single IP subnet.  Management would be just like with bridges 
   (plug-and-play).  But this design avoids the disadvantages of 
   bridges.  Temporary loops are not a problem so failover can be as 
   fast as possible, and shortest paths can be followed. 

   The design is compatible with current IP nodes and routers, and with 
   current bridges. 

8. Acknowledgments 

   We anticipate that many people will contribute to this design, and 
   invite you to join the mailing list at http://www.postel.org/rbridge 

9. References 

9.1. Normative References 

   [1]   Perkins, C., "IP Encapsulation within IP", RFC 2003 (Standards 
         Track), October 1996. 

   [2]   Braden, R., "Requirements for Internet Hosts - Communication 
         Layers", STD 3, RFC 1122, October 1989. 

   [3]   Baker, F., "Requirements for IP Version 4 Routers", RFC 1812 
         (Standards Track), June 1995. 

   [4]   Plummer, D., "Ethernet Address Resolution Protocol: Or 
         converting network protocol addresses to 48.bit Ethernet 
         address for transmission on Ethernet hardware", STD 37, RFC 
         826, November 1982. 

   [5]   Narten, T., Nordmark, E. and W. Simpson, "Neighbor Discovery 
         for IP Version 6 (IPv6)", RFC 2461 (Standards Track), December 
         1998. 

   [6]   Callon, R., "Use of OSI IS-IS for routing in TCP/IP and dual 
         environments", RFC 1195, December 1990. 

   [7]   IEEE, "IEEE 802.1d bridging standard".

   [8]   Perlman, R., "RBridges: Transparent Routing", Proc. Infocom
         2004, March 2004.

   [9]   Perlman, R., "Interconnection: Bridges, Routers, Switches, and 
         Internetworking Protocols", Addison Wesley Chapter 3, 1999. 

   [10]  Touch, J., "Dynamic Internet overlay deployment and management 
         using the X-Bone", Computer Networks Vol. 36, No. 2-3, July 
         2001. 

   [11]  Touch, J., Wang, Y., Eggert, L. and G. Finn, "A Virtual 
         Internet Architecture", ISI Technical Report ISI-TR-570, 
         Presented at the Workshop on Future Directions in Network 
         Architecture (FDNA) 2003 at Sigcomm 2003, March 2003. 

9.2. Informative References 

   [12]  Harkins, D. and D. Carrel, "The Internet Key Exchange (IKE)", 
         RFC 2409 (Standards Track), November 1998. 

   [13]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, 
         November 1990. 

   [14]  Lahey, K., "TCP Problems with Path MTU Discovery", RFC 2923 
         (Informational), September 2000. 

   [15]  Kent, S., "IP Encapsulating Security Payload (ESP)", 
         draft-ietf-ipsec-esp-v3-10 (work in progress), March 2005. 

   [16]  Kent, S., "IP Authentication Header",
         draft-ietf-ipsec-rfc2402bis-11 (work in progress), March 2005.

   [17]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", 
         draft-ietf-ipsec-ikev2-17 (work in progress), Oct. 2004. 

Authors' Addresses 

   Radia Perlman 
   Sun Microsystems 
       
   Email: Radia.Perlman@sun.com 
    

   Joe Touch 
   USC/ISI 
   4676 Admiralty Way 
   Marina del Rey, CA 90292 U.S.A. 
       
   Phone: +1 (310) 448-9151 
   Email: touch@isi.edu 
    

   Alper Yegin 
   Samsung Advanced Institute of Technology 
       
   Email: alper.yegin@samsung.com 
    

Intellectual Property Statement 

   The IETF takes no position regarding the validity or scope of any 
   Intellectual Property Rights or other rights that might be claimed to 
   pertain to the implementation or use of the technology described in 
   this document or the extent to which any license under such rights 
   might or might not be available; nor does it represent that it has 
   made any independent effort to identify any such rights.  Information 
   on the procedures with respect to rights in RFC documents can be 
   found in BCP 78 and BCP 79. 

   Copies of IPR disclosures made to the IETF Secretariat and any 
   assurances of licenses to be made available, or the result of an 
   attempt made to obtain a general license or permission for the use of 
   such proprietary rights by implementers or users of this 
   specification can be obtained from the IETF on-line IPR repository at 
   http://www.ietf.org/ipr. 

   The IETF invites any interested party to bring to its attention any 
   copyrights, patents or patent applications, or other proprietary 
   rights that may cover technology that may be required to implement 
   this standard.  Please address the information to the IETF at 
   ietf-ipr@ietf.org 

Disclaimer of Validity 

   This document and the information contained herein are provided on an 
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 

Copyright Statement 

   Copyright (C) The Internet Society (2005). 

   This document is subject to the rights, licenses and restrictions 
   contained in BCP 78, and except as set forth therein, the authors 
   retain all their rights. 

Acknowledgment 

   Funding for the RFC Editor function is currently provided by the 
   Internet Society. 
