Network Working Group R. Perlman Internet Draft Sun Expires: November 2005 J. Touch USC/ISI A. Yegin Samsung May 2, 2005 RBridges: Transparent Routing draft-perlman-rbridge-03.txt Status of this Memo By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. This document may not be modified, and derivative works of it may not be created, except to publish it as an RFC and to translate it into languages other than English. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html This Internet-Draft will expire on November 2, 2005. Copyright Notice Copyright (C) The Internet Society (2005). All Rights Reserved. Perlman Expires November 2, 2005 [Page 1] Internet-Draft RBridges: Transparent Routing May 2005 Abstract RBridges provide the ability to have an entire campus, with multiple physical links, look to IP like a single subnet. The design allows for zero configuration of switches within a campus, optimal pair-wise routing, safe forwarding even during periods of temporary loops, and the ability to cut down on ARP/ND traffic. 
The design also supports VLANs, and allows forwarding tables to be based on RBridge destinations (rather than endnode destinations), which allows internal routing tables to be substantially smaller than in conventional bridge systems.

This document is a work in progress; we invite you to participate on the mailing list at http://www.postel.org/RBridge

Table of Contents

   1. Introduction...................................................3
   2. Detailed RBridge Design........................................5
      2.1. Link State Protocol.......................................5
      2.2. Spanning Tree.............................................6
      2.3. Designated RBridge........................................7
      2.4. Learning Endnode Location.................................8
      2.5. Forwarding Behavior.......................................8
      2.6. Forwarding Header on 802 Links............................8
      2.7. Distributed ARP Query....................................11
   3. RBridge Addresses, Parameters, and Constants..................12
   4. Handling ARP Queries..........................................12
   5. Issues........................................................13
      5.1. How Many Spanning Trees?.................................13
         5.1.1. Per-ingress Spanning Tree...........................13
         5.1.2. Per VLAN............................................13
         5.1.3. Single Spanning Tree................................13
      5.2. Reasons Not to Optimize Handling of IP packets...........14
         5.2.1. Avoiding Encapsulation for On-campus IP Packets.....14
         5.2.2. Avoiding Encapsulation for Off-campus IP Packets....15
      5.3. Supporting Heterogeneous Link Types......................15
      5.4. Effects on L3 TTL........................................15
      5.5. Using L3 encapsulation...................................15
      5.6. Optimizing ARP/ND........................................16
   6. Security Considerations.......................................17
   7. Conclusions...................................................17
   8. Acknowledgments...............................................17
   9. References....................................................17
      9.1. Normative References.....................................17
      9.2. Informative References...................................18
   Author's Addresses...............................................19
   Intellectual Property Statement..................................19
   Disclaimer of Validity...........................................20
   Copyright Statement..............................................20
   Acknowledgment...................................................20

1. Introduction

In traditional IPv4 and IPv6 networks, each link must have a unique prefix. This means that a node that moves from one link to another must change its IP address, and a node with multiple links must have multiple addresses. It also means that a company with many links (separated by routers) will have difficulty making full use of its IP address block (since any link not fully populated will waste addresses), and routers require significant configuration.

Bridges avoid these problems because bridges can transparently glue many physical links into what appears to IP to be a single LAN. However, bridging has drawbacks of its own. Routing via the spanning tree concentrates traffic onto selected links. Bridges forward based on a header that has no hop count, so temporary loops (which might arise from topology changes, lost spanning tree messages, or components such as repeaters coming up) are very dangerous and may cause exponential proliferation of packets. Finally, routes cannot be pair-wise shortest paths; they are instead whatever paths remain after the spanning tree eliminates redundant paths.
We define the term "campus" to be the set of links connected by any combination of RBridges and bridges. In other words, the campus is terminated by traditional IP routers, in the same way that an IP subnet would be terminated by an IP router. A campus will look to IP nodes like a single IP subnet, whether the interconnection of the links is done with bridges, RBridges, or some combination of the two.

There have been proposals for having routers within a campus automatically number links with distinct IP subnet numbers. Although this makes a campus plug-and-play, it has drawbacks: it requires a large number of IP subnet numbers, a node must change its address if it moves to a different link, and the addresses of nodes might fluctuate as the topology changes and links are renumbered.

This proposal introduces RBridges [8] (Routing Bridges), which combine the advantages of bridges and routers. Like bridges, RBridges are zero configuration, and are transparent to IP nodes. Like routers, RBridges forward on pair-wise shortest paths, and do not have dangerous behavior during temporary loops. RBridges have the additional advantage that they can suppress the broadcast/multicast for neighbor discovery by doing proxy ARP (IPv4) or proxy ND (IPv6).

RBridges are fully compatible with current bridges as well as current IPv4 and IPv6 routers and endnodes. They are as invisible to current IP routers as bridges are, and like routers, they terminate a bridged spanning tree.
The main idea is to have RBridges run a link state protocol amongst themselves (IS-IS is ideal, since its TLV encoding easily allows new information to be carried in link state information, as this proposal requires, and also makes zero configuration easier because IS-IS does not require assigning IP addresses to the RBridges).

The next step is for RBridges to learn the location of endnodes. They can learn the location and layer 2 addresses of attached nodes from the source address of data packets, as bridges do. Additionally, in order to facilitate proxy ARP or proxy ND optimizations, RBridges can also learn the (layer 3, layer 2) addresses of attached IP nodes from ARP or ND replies.

Once an RBridge learns the location of a directly attached endnode, it informs the other RBridges in its link state information.

RBridge forwarding can be done, as with a router, via pairwise shortest paths. RBridges could also utilize forwarding optimizations, e.g., MPLS.

To prevent the temporary loop issues with bridges, RBridges must always forward based on a header with a hop count. Although the hop count will quickly discard looping packets, it is also desirable not to spawn additional copies of packets. This can be accomplished by having RBridges specify the next RBridge recipient while forwarding across a shared-media link.

For two reasons, packets must be encapsulated as they are traveling between RBridges:

   1. so that intermediate RBridges (and bridges) will not be confused about the location of the source by learning the source address from packets in transit

   2. so that the packet can be directed towards the egress RBridge, and can include a hop count (for links, like Ethernet, that do not already contain a hop count).

RBridges are similar to Recursive Routers, which provide similar transit to emulate a single L3 router, in that case using L3 + L2 encapsulation [10][11].
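As an informal illustration of the hop-count and next-recipient rules described above, the following Python sketch models how a transit RBridge might process an encapsulated packet. It is not part of the design: the names Encapsulated and forward, the field names, and the per-RBridge next-hop table are all assumptions made for the example.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Encapsulated:
    next_hop: str   # outer header: RBridge that should process this copy
    sender: str     # outer header: RBridge that transmitted this copy
    egress: str     # encapsulation header: egress RBridge
    ttl: int        # encapsulation header: hop count
    inner: bytes    # original frame, carried unchanged

def forward(pkt, me, next_hop_table):
    """Process pkt at RBridge `me`; return (action, packet or None).

    next_hop_table is a hypothetical map from egress RBridge to the
    next RBridge on the path toward it.
    """
    if pkt.next_hop != me:
        # Only the named recipient acts, so a shared-media link does
        # not spawn additional copies.
        return "ignore", None
    if pkt.egress == me:
        return "deliver", pkt        # decapsulate and deliver pkt.inner
    ttl = pkt.ttl - 1
    if ttl <= 0:
        return "drop", None          # hop count exhausted: a loop is cut off
    out = replace(pkt, next_hop=next_hop_table[pkt.egress], sender=me, ttl=ttl)
    return "forward", out
```

In this sketch, a packet caught in a temporary loop between two RBridges is forwarded a bounded number of times and then discarded, rather than circulating or multiplying.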
A VLAN is a broadcast domain. That means that a layer 2 broadcast (multicast) packet sent to a VLAN must only be delivered to links that are in that VLAN. A packet for a particular VLAN may transit any link on the campus, but an unencapsulated VLAN packet must only be delivered to links that RBridges have been configured to recognize as supporting that VLAN.

Support of VLANs does traditionally require configuration of the bridges (or in this case RBridges) to know which links belong to which VLANs. In theory some other mechanism might allow an RBridge to know which VLANs should be supported on which port. The RBridge design does not care how RBridges discover which VLANs are supported by each of their ports, but for simplicity we assume here that RBridges (like bridges) are configured with this information.

RBridges must calculate a spanning tree for each broadcast domain. In a campus without VLANs, this means a single spanning tree would be used for delivery of packets with unknown or group address layer 2 destination. It is possible to support VLANs with a single spanning tree, and just avoid forwarding the decapsulated packet onto links that do not support that VLAN. However, it will allow for more optimal delivery if a different spanning tree is calculated for each broadcast domain. It is not necessary to use the bridge spanning tree algorithm to calculate the spanning trees. Instead, they can be calculated based on the link state information.

Using the link state protocol to calculate spanning trees makes the design very flexible and efficient. The link state database gives sufficient information so that RBridges can calculate a single spanning tree, spanning trees per VLAN, or per-ingress RBridge spanning trees without requiring any additional exchange of information between RBridges.

2. Detailed RBridge Design

2.1. Link State Protocol

Running a link state protocol among RBridges is straightforward. It is the same as running a level 1 routing protocol in an area.
IS-IS is a more appropriate choice than OSPF in this case because it is easy in IS-IS to define new TLVs for carrying new information. However, the instance of IS-IS that RBridges will implement will be separate from any routing protocol that IP routers will implement, just as the spanning tree messages are not implemented by IP routers.

To keep the instances separate, RBridge routing messages should be sent to a different layer 2 multicast address than IS-IS routing messages. Alternatively, they can be differentiated by having a different "area address", where, in order to keep RBridges configuration-free, the RBridge area address would be a constant for all RBridges, and would not be one that would ever appear as a real IS-IS area address.

Additional information that RBridge link state information will carry is:

   o layer 2 addresses of nodes within the campus which have transmitted packets but have not transmitted ARP or ND replies

   o (layer 3, layer 2) addresses of IP nodes within the campus. For data compression, perhaps only the portion of the address following the campus-wide prefix need be carried. This will be more of an issue for IPv6 than for IPv4.

   o VLANs directly connected to this RBridge

The endnode information need only be delivered to RBridges supporting the VLAN in which the endnode resides. So for instance, if endnode E is discovered through a VLAN A packet, then E's location need only be delivered to other RBridges that are attached to VLAN A links.

Given that RBridges must support delivery only to links within a VLAN (for multicast or unknown packets marked with the VLAN's tag), this mechanism can be used to advertise endnode information solely to RBridges within a VLAN.
Although a separate instance of the link state protocol could be run for this purpose, the topology is so restricted (just a single broadcast domain), that it might be preferable to design a special case mechanism where each DR advertises its attached endnodes, and receives explicit acks from the other RBridges.

2.2. Spanning Tree

There will be cases when RBridges may need to send packets to all links. These cases include:

   o layer 2 multicast or broadcast packets

   o unknown layer 2 destination addresses

   o distributed RBridge layer 3 address location query

In these cases the packets must be sent through a spanning tree. However, there is no need to implement a separate spanning tree protocol in addition to the link state protocol. Instead, the link state information can be used to create a single spanning tree throughout the campus. This is done by choosing the RBridge with lowest ID, and calculating the Dijkstra tree with that RBridge as Root. In the case of multiple equal cost links, some tie-breaker must be used to ensure that all RBridges calculate the same spanning tree. We suggest using the ID of the parent as the tie breaker (if a node can be attached to either parent P1 or P2 with the same cost, choose P1 if P1's ID is lower than P2's).

In the case of multicast L2 addresses, the RBridge may treat these as broadcast, or may include existing techniques for emulating multicast at L2, i.e., snooping IGMP and/or PIM-SM packets to configure an internal, L2 multicast tree.

For a packet tagged with a VLAN ID (e.g., VLAN A), the packet is only delivered to links that support VLAN A. It would provide for more optimal delivery if a different spanning tree were calculated for each VLAN. This would be done by choosing the RBridge with lowest ID that connects to that VLAN as root, and calculating a tree of shortest paths from that RBridge.
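The tree calculation described in this section can be sketched as follows. This is a minimal Python illustration, assuming the link state database has already been reduced to a symmetric map of positive link costs between RBridge IDs; the function name spanning_tree and the shape of its input are assumptions of the example, not part of the design.

```python
import heapq

def spanning_tree(links, root=None):
    """Return {rbridge_id: parent_id} for the shortest-path tree.

    links maps each RBridge ID to {neighbor ID: positive link cost}.
    The root defaults to the lowest RBridge ID. For a per-VLAN tree the
    caller would pass the lowest ID among RBridges attached to that VLAN.
    """
    if root is None:
        root = min(links)
    dist = {root: 0}
    parent = {root: None}
    done = set()
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                 # stale heap entry
        done.add(u)
        for v, cost in links[u].items():
            if v in done:
                continue
            nd = d + cost
            better = v not in dist or nd < dist[v]
            # Tie-breaker: on equal cost, prefer the parent with the lower ID.
            tie = v in dist and nd == dist[v] and u < parent[v]
            if better or tie:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent
```

Because every RBridge holds the same link state database and applies the same root choice and tie-breaker, each one arrives at the same tree without exchanging any further messages.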
RBridges that do not support VLAN A may be on the delivery path for VLAN A packets, but they will not decapsulate the packet onto links that are not VLAN A links.

If IGMP snooping is used to know where recipients of a multicast packet reside, then the total number of packet-hops to deliver the packet can be optimized by calculating a separate spanning tree per ingress RBridge. This, however, requires a lot more computation (one tree per RBridge). The tradeoffs will be discussed in the "Issues" section at the end of this document.

2.3. Designated RBridge

It is useful for one RBridge on each link to have special duties. Thus one RBridge per link should be elected Designated RBridge (DR). IS-IS already holds such an election. The Designated RBridge is the one on the link that will learn the identities of attached endnodes, initiate a distributed ARP when an ARP query is received for an unknown destination, and answer ARP queries when the target node is known.

2.4. Learning Endnode Location

RBridges learn endnode location from data packets. They learn (layer 3, layer 2) pairs (for the purpose of supporting proxy ARP/ND) from listening to ARP or ND replies.

This endnode information is learned by the DR, and distributed to other RBridges through the link state protocol.

2.5. Forwarding Behavior

When a DR R1 receives a native packet with layer 2 source address S and layer 2 destination address D, R1 looks up the location of D. If D is claimed by egress RBridge R2, then R1 encapsulates the packet, directing it towards R2. When an RBridge receives an encapsulated packet, it forwards based on the specified egress RBridge (rather than the ultimate destination endnode).

If the packet belongs in VLAN A, then R1 (the ingress RBridge) looks up D's location in R1's table of VLAN A endnodes.

2.6. Forwarding Header on 802 Links

It is essential that RBridges coexist with ordinary bridges.
Therefore, a packet in transit must look to ordinary bridges like an ordinary layer 2 packet. However, it must also be differentiable from a native layer 2 packet by RBridges. To accomplish this, we use a new layer 2 protocol type ("Ethertype").

A packet in transit on an 802 link will therefore have two 802 headers, since the original frame (including the original 802 header) will be tunneled by the RBridges. But rather than just having an additional 802 header, we include additional information between the two headers; at least a hop count.

An encapsulated packet would look as follows:

   +--------------+-------------+-----------------+
   | outer header | shim header | original packet |
   +--------------+-------------+-----------------+

                Figure 1 Encapsulated packet

The outer header contains:

   o L2 destination = next RBridge

   o L2 source = transmitting RBridge (the one that most recently handled this packet)

   o protocol type = "to be assigned...RBridge encapsulated packet"

The shim header includes:

   o TTL = starts at some value and is decremented by each RBridge. The packet is discarded if the TTL reaches 0

   o egress RBridge (in the case of unicast), or ingress RBridge (in the case of multicast)

Note that one variation is to have the egress RBridge specified in the outer header rather than in the shim header. This will mean that some packet duplication might occur during temporary loops. But the advantage is that the header will be 6 bytes smaller. This is discussed in the "Issues" section.

The following is a walk-through of a packet traversing an RBridge campus. Consider a packet consisting of "data" to be sent from node A to node B through an RBridge campus (dotted area) as per Figure 2.

             .............................
             .                           .
 +--------+  .+-----+   +-----+   +-----+.  +--------+
 |        |  .|     |   |     |   |     |.  |        |
 | Host A |---| Rb1 |---| Rb2 |---| Rb3 |---| Host B |
 |        |  .|     |   |     |   |     |.  |        |
 +--------+  .+-----+   +-----+   +-----+.  +--------+
             .                           .
             .      RBridge campus       .
             .............................

     Figure 2 Sample path for packet traversing an RBridge campus

In this figure, Host A is the source, Host B the sink, and Rb1..Rb3 are nodes of the RBridge campus. Rb1 is the ingress, and Rb3 is the egress. Additionally, layer 2 (L2) addresses are as shown below the components on the particular ports in Figure 3; note that addresses are required for RBridge nodes for encapsulation and routing within the campus. Different addresses are shown for each port on an RBridge node for simplicity, although this is not required.

             .............................
             .                           .
 +--------+  .+-----+   +-----+   +-----+.  +--------+
 |        |  .|     |   |     |   |     |.  |        |
 | Host A |---| Rb1 |---| Rb2 |---| Rb3 |---| Host B |
 |       a|  b1x   b1y b2x   b2y b3x   b3y  |b       |
 +--------+  .+-----+   +-----+   +-----+.  +--------+
             .                           .
             .      RBridge campus       .
             .............................

          Figure 3 Sample path including L2 addresses

Consider the originating packet as per Figure 4; "L2 a->b" means the layer 2 (L2) source address is "a" and the L2 destination address is "b", and "IP A->B" means the IP source address is A and the IP destination is B.

   +---------+---------+--------+
   | L2 a->b | IP A->B |  data  |
   +---------+---------+--------+

      Figure 4 Packet as originated at Host A

The ingress RBridge Rb1 looks up 'b' in its encapsulation tables, which indicate that Rb3 is the egress RBridge. The packet gets wrapped to direct it to Rb3 using a shim header (SH), where the destination is based on the L2 address of Rb3 (the egress) and uses a TTL of 20, as shown in Figure 5.

   +-----------------+---------+---------+--------+
   | SH ->b3y TTL=20 | L2 a->b | IP A->B |  data  |
   +-----------------+---------+---------+--------+

             Figure 5 Packet with shim header

Note that the shim header includes only egress addresses for unicast packets; for multicast packets, the ingress L2 address is used instead.

Rb1 then looks up the shim header destination in its (campus) forwarding tables, yielding Rb2 as the next hop inside the campus. Rb1 then sends the packet on to Rb2 by adding the appropriate L2 header, as shown in Figure 6.

   +-------------+-----------------+---------+---------+--------+
   | L2 b1y->b2x | SH ->b3y TTL=20 | L2 a->b | IP A->B |  data  |
   +-------------+-----------------+---------+---------+--------+

               Figure 6 Packet as sent from Rb1 to Rb2

Rb2 unwraps the outermost L2, decrements the shim TTL, and looks up the shim destination's next hop (which is Rb3 here). Rb2 then adds a new L2 header addressed to Rb3, as shown in Figure 7.

   +-------------+-----------------+---------+---------+--------+
   | L2 b2y->b3x | SH ->b3y TTL=19 | L2 a->b | IP A->B |  data  |
   +-------------+-----------------+---------+---------+--------+

               Figure 7 Packet as sent from Rb2 to Rb3

Rb3 unwraps the outer L2, notices that the shim destination has been reached (itself), and unwraps the shim too. At that point, it proceeds to send the original packet shown in Figure 4 to Host B.

2.7. Distributed ARP Query

The distributed ARP query is carried by RBridges through the RBridge spanning tree. Each Designated RBridge, in addition to forwarding the query through the spanning tree, initiates an ARP query on its link(s). If a reply from the target D is received by Designated RBridge R2, R2 initiates a link state update to inform all the other RBridges of D's location, layer 3 address, and layer 2 address.

The distributed ARP query must be sent to a (new, to be assigned) layer 2 multicast address.
The fields it must contain are:

Outer Layer 2 header:

   o destination = newly defined L2 multicast address

   o source = transmitting RBridge (replaced hop by hop)

   o protocol type = same as for RBridge-encapsulated packets

Shim header:

   o TTL (for safety if the RBridge spanning tree has temporary loops, and where the L2 header lacks an existing TTL)

   o ingress RBridge (rather than egress RBridge, which would be specified in unicast packets to known destinations); this is used for ingress-specific forwarding, e.g., for VLANs

RBridge payload:

   o original ARP or ND query

Intermediate RBridges decrement the above TTL, and replace the source RBridge with their own layer 2 address on the outgoing interface.

3. RBridge Addresses, Parameters, and Constants

Each RBridge needs a unique ID within the campus. The simplest such address is a unique 6-byte ID, since such an ID is easily obtainable as any of the EUI-48's owned by that RBridge. IS-IS already requires each router to have such an address.

One parameter is the value to which the hop count in the encapsulation header is initially set. The recommended default is 20.

A new Ethertype must be assigned to indicate an RBridge-encapsulated packet.

A layer 2 multicast address must be assigned for use as the destination address in distributed ARP queries.

To support VLANs, RBridges (like bridges today) must be configured, for each port, with the VLAN to which that port belongs.

4. Handling ARP Queries

If the target address is unknown, initiate a distributed ARP query. If the target address is known, reply with a proxy ARP reply, giving the target's true layer 2 address.

When initiating a distributed ARP query (or IPv6 neighbor solicitation), remember the address of the requesting node. When the information is discovered, respond to the requester.

5. Issues

5.1. How Many Spanning Trees?

5.1.1. Per-ingress Spanning Tree

If a separate spanning tree is calculated per ingress RBridge, then delivery of both broadcast and multicast packets, where the recipient locations are known through some mechanism such as IGMP snooping, can be optimized (for number of packet hops to deliver the multicast packet).

Also, if a separate spanning tree is calculated per ingress RBridge, then out of order delivery is minimized when RBridges learn the location of the destination, since the packet will traverse the same path whether it is being delivered via the "destination unknown" tree to that broadcast domain, or the direct path to that destination. However, there is obvious overhead involved in calculating separate spanning trees.

This mechanism of avoiding out of order delivery by calculating separate spanning trees per ingress RBridge was presented at the IETF TRILL BOF on March 10, 2005.

5.1.2. Per VLAN

If there are not many links that support VLAN A, then the total number of packet hops to deliver a packet within the VLAN A broadcast domain is minimized by calculating a separate spanning tree for each VLAN. It would be possible to still support VLANs with a single spanning tree, by having RBridges only decapsulate a VLAN A packet onto VLAN A links, but the number of transit links such a packet would traverse would be more than necessary (assuming that the location of VLAN A links within the campus is somewhat sparse).

5.1.3. Single Spanning Tree

Broadcast, multicast, and VLANs can be supported with a single spanning tree, which is the simplest solution and requires the least computation and the smallest forwarding tables in the RBridges. In that case all such packets would be delivered to all the RBridges, and the Designated RBridges would simply not forward a packet onto links on which it does not belong.
So from the endnodes' point of view, things are still correct; a packet will only be delivered to the proper links. But the cost to deliver the packet within the core can be much greater. Additionally, the more different spanning trees that are utilized, the more all the links within the core can be fully utilized.

The cases in which a broadcast/multicast packet is not delivered to all the links in the campus are:

   o when there is a VLAN tag, in which case the packet will only be delivered to links that support that VLAN

   o when the layer 2 multicast is derived from an IP multicast, and the RBridges have learned, through IGMP snooping, which links wish to receive the packet

5.2. Reasons Not to Optimize Handling of IP packets

There are two optimizations that were considered but abandoned due to their impact on transparency, i.e., that an RBridge should appear like a bridged network to upper layer protocols. These optimizations focus on ways of merging the shim layer functionality with the existing headers of IP packets.

5.2.1. Avoiding Encapsulation for On-campus IP Packets

In theory, on-campus IP packets need not be encapsulated with an additional layer 2 header. The original layer 2 header can be discarded and replaced with one where the layer 2 destination is replaced by the next RBridge, and the source layer 2 address is replaced by something that will not confuse bridge learning (since packets will be injected into each segment from unpredictable directions because shortest path routes will be used).
The disadvantages of this approach are:

   o the IP header's TTL would be decremented by each RBridge, making the customer aware that bridges have been replaced by RBridges, and possibly breaking IP protocols that expect the TTL not to be decremented over an L2 system

   o the original layer 2 addresses might need to be preserved for some conceivable uses

The real disadvantage, though, is that RBridges would have to have more complex forwarding behavior. They would need to forward based on layer 2 addresses sometimes, and layer 3 addresses at other times. Even if all packets were IP, RBridges would need to forward packets for off-campus IP destinations based on the layer 2 address of the IP router.

5.2.2. Avoiding Encapsulation for Off-campus IP Packets

Likewise, in theory, off-campus IP packets need not be encapsulated. The TTL in the IP header can be decremented. The same disadvantages as for on-campus IP packets apply, including the concerns about the impact of a decremented TTL on other IP protocol behavior. However, there is the additional disadvantage that since the actual layer 2 destination has to be preserved end-to-end there is the danger of packet proliferation if multiple RBridges decide to forward the packet, which can occur while the topology is adjusting.

5.3. Supporting Heterogeneous Link Types

It is easy to support link types other than 802 links with RBridges. However, mixing link types within a single campus raises complexities, such as packet size, incompatible layer 2 addresses, and other layer 2 features (such as priority) that might be lost when trying to "bridge" two different link types.

5.4. Effects on L3 TTL

In general, an RBridge should have no effect on a Layer 3 (e.g., IP) TTL field, since the RBridge is a Layer 2 device.
The TTLs which ensure loop-free operation in an RBridge system should occur in the encapsulation header, and not affect any of the headers of the packet passed through the RBridge system. The RBridge should do nothing to transited packets other than that which would be done by an equivalent L2 system.

5.5. Using L3 encapsulation

RBridges may use L3 (e.g., IP) encapsulation to provide a routable internal address and a loop-check indicator. This allows the RBridge system to use L3 routing algorithms, e.g., OSPF, using existing L3 implementations. As with any RBridge system, packets are forwarded only within the preconfigured RBridge system. Intermediate L2 bridges are allowed whether L2 or L3 encapsulation is used.

L3 encapsulation processing, including ICMP handling, fragmentation, etc., is well-defined (e.g., RFC2003). In this case, the L3 encapsulation should not decrement the TTL of the inner transited packet, since (as per RFC2003) the RBridge system would not be considered a forwarding (i.e., L3) 'tunnel'. Further, changing the IP TTL would potentially affect the reachability of all-ones broadcast or multicast, which would not reach the full L2 subnet.

The primary disadvantage to L3 encapsulation is the increased overhead of encapsulation (e.g., adding both an L3 and subsequent outer L2 header) and complexity of providing L2 services (broadcast notably) within the L3 subnet (RFC1122, RFC1812).

Note that L3 supports fragmentation and reassembly for tunnels, notably both for IPv4 and IPv6 encapsulation. Reassembly would be required at the egress, which increases the load on the egress RBridge in tracking and storing the fragments, but the resulting transited packet is generally transparent to the process.
The primary effect would be if there were a large amount of reordering (increasing the reassembly load) or high packet loss (resulting in failed reassembly and thus lost packets). In the latter case, packet loss is amplified because of the lack of fate sharing of the fragments of a single transited packet.

5.6. Optimizing ARP/ND

There are various alternatives for how an RBridge could handle ARPs/NDs when the target is known (because of having been disseminated through the link state protocol). Listed from most expensive to least expensive:

   o treat ARP/ND like any multicast packet, and send along the (appropriate) spanning tree, and let the target respond

   o route the ARP/ND to the RBridge that claims attachment to the target

   o do proxy ARP/ND

The only reason not to do proxy ARP/ND is in case the target node has actually moved, and has not yet been discovered by the RBridges. If the actual target needs to respond, then obviously the target is there. If the query is routed to the expected link, then there won't be a false positive, but the real location of the target may not be found, if the target has moved.

Some mix of these strategies might be the best solution. For instance, if the target's location has not been recently verified through a broadcast ARP/ND, then the source's RBridge should broadcast the ARP/ND. Otherwise it should do proxy ARP.

So for instance, RBridges could keep track of the last time a broadcast ARP/ND occurred for each endnode E (by any source, and injected by any RBridge). Let's say the parameter is 20 seconds. If a source S on RBridge R1's link does an ARP/ND for D, if R1 has not seen an ARP/ND for D within the last 20 seconds, R1 broadcasts the query; otherwise it proxies the reply.

6. Security Considerations

The goal is for RBridges to not add additional security issues over what would be present with traditional bridges.
RBridges will not be able to prevent nodes from impersonating other nodes, for instance, by issuing bogus ARP replies. However, RBridges will not interfere with any schemes that would secure neighbor discovery.

As with routing schemes, authentication of RBridge messages would be a simple addition to the design (and it would be accomplished the same way as it would be in IS-IS). However, any sort of authentication requires additional configuration, which might interfere with the perception that RBridges, like bridges, are zero configuration.

7. Conclusions

This design allows transparent interconnection of multiple links into a single IP subnet. Management would be just like with bridges (plug-and-play). But this design avoids the disadvantages of bridges. Temporary loops are not a problem so failover can be as fast as possible, and shortest paths can be followed.

The design is compatible with current IP nodes and routers, and with current bridges.

8. Acknowledgments

We anticipate that many people will contribute to this design, and invite you to join the mailing list at http://www.postel.org/rbridge

9. References

9.1. Normative References

[1] Perkins, C., "IP Encapsulation within IP", RFC 2003 (Standards Track), October 1996.

[2] Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.

[3] Baker, F., "Requirements for IP Version 4 Routers", RFC 1812 (Standards Track), June 1995.

[4] Plummer, D., "Ethernet Address Resolution Protocol: Or converting network protocol addresses to 48.bit Ethernet address for transmission on Ethernet hardware", STD 37, RFC 826, November 1982.

[5] Narten, T., Nordmark, E. and W. Simpson, "Neighbor Discovery for IP Version 6 (IPv6)", RFC 2461 (Standards Track), December 1998.

[6] Callon, R., "Use of OSI IS-IS for routing in TCP/IP and dual environments", RFC 1195, December 1990.
[7] IEEE, "IEEE 802.1d bridging standard".

[8] Perlman, R., "RBridges: Transparent Routing", Proc. Infocom 2004, March 2004.

[9] Perlman, R., "Interconnection: Bridges, Routers, Switches, and Internetworking Protocols", Addison Wesley, Chapter 3, 1999.

[10] Touch, J., "Dynamic Internet overlay deployment and management using the X-Bone", Computer Networks Vol. 36, No. 2-3, July 2001.

[11] Touch, J., Wang, Y., Eggert, L. and G. Finn, "A Virtual Internet Architecture", ISI Technical Report ISI-TR-570, Presented at the Workshop on Future Directions in Network Architecture (FDNA) 2003 at Sigcomm 2003, March 2003.

9.2. Informative References

[12] Harkins, D. and D. Carrel, "The Internet Key Exchange (IKE)", RFC 2409 (Standards Track), November 1998.

[13] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, November 1990.

[14] Lahey, K., "TCP Problems with Path MTU Discovery", RFC 2923 (Informational), September 2000.

[15] Kent, S., "IP Encapsulating Security Payload (ESP)", draft-ietf-ipsec-esp-v3-10 (work in progress), March 2005.

[16] Kent, S., "IP Authentication Header", draft-ietf-ipsec-rfc2402bis-011 (work in progress), March 2005.

[17] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", draft-ietf-ipsec-ikev2-17 (work in progress), October 2004.

Author's Addresses

Radia Perlman
Sun Microsystems
Email: Radia.Perlman@sun.com

Joe Touch
USC/ISI
4676 Admiralty Way
Marina del Rey, CA 90292
U.S.A.
Phone: +1 (310) 448-9151
Email: touch@isi.edu

Alper Yegin
Samsung Advanced Institute of Technology
Email: alper.yegin@samsung.com

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2005).
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.