Network Working Group                                       Dave Thaler
Internet-Draft                                        Christian Huitema
Expires: November 2001                                        Microsoft
                                                            14 May 2001

                  Multi-link Subnet Support in IPv6

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2001). All Rights Reserved.

Expires November 2001 [Page 1]
Draft Multilink Subnets May 2001

Abstract

Bridging disparate links into a single entity has several operational advantages. A single subnet prefix is sufficient to support multiple physical links. There is no need to allocate subnet numbers to the different networks, simplifying management.

This document introduces the concept of a "multilink subnet", defined as a collection of independent links, connected by routers, but sharing a common subnet prefix. It then provides a summary of multiple potential approaches, as a basis for working group discussion.

1. Introduction

Bridging disparate links into a single entity has several operational advantages. A single subnet prefix is sufficient to support multiple physical links. There is no need to allocate subnet numbers to the different networks, simplifying management.

However, not all link-layer media can be easily bridged.
Classic IEEE 802 bridging technology fails when the media does not naturally support IEEE 802 addressing. Furthermore, the operation becomes problematic when the different links don't support the same MTU size. Finally, bridging cannot be easily implemented when the network interface cannot be easily placed in "promiscuous" mode.

This document introduces the concept of a "multilink subnet", defined as a collection of independent links, connected by routers, but sharing a common subnet prefix. Herein we discuss many of the problems surrounding this concept, along with possible solutions. The initial version of this draft will not specify behavior, but merely discuss the tradeoffs. A later version will narrow the solution space to a recommended approach.

2. Terminology

multilink subnet: a collection of independent links, connected by routers, but sharing a common subnet prefix.

subnet scope: multicast SCOP value 3, as specified in [ADDRARCH], which covers a (potentially multilink) subnet. This is the next larger multicast scope above link scope.

subnet scope zone: a set of interfaces of a node that are connected to the same subnet, which may be a multilink subnet.

intra-subnet router (ISR): a router with multiple interfaces in the same subnet scope zone.

3. Design Goals

Multilink subnets are designed with the following goals in mind:

o Existing IPv6 end hosts should continue to work when connected to a multilink subnet, without requiring any change to their behavior. For example, the host behavior parts of Router Discovery, Neighbor Discovery [ND], and Multicast Listener Discovery [MLD] must be supported.
o Leave link-local address behavior unchanged. Link-local behavior continues to function only within a link, not across a multilink subnet.
o Support sending and receiving unicast and anycast traffic at the site and global scopes.
o Support sending and receiving multicast traffic at the subnet scope and above.
o Prevent routing loops.
o Support nodes moving between links within the subnet, with a reasonably fast convergence time (on the same order as Neighbor Unreachability Detection).

4. Overview

4.1. Router Discovery

Router Discovery continues to work on a per-link basis, as specified in [ND]. When sending Router Advertisements (RAs) with a Prefix Information Option, there are two possibilities for how an ISR can influence the Neighbor Discovery procedure used.

4.1.1. Making hosts not use ND

If the ISR sets the A (autonomous address-configuration) flag on, and the L (on-link) flag off, then hosts on the link will attempt stateless address configuration [ADDRCONF] in the given prefix, but will not treat the prefix as being on-link. As a result, neighbor discovery is effectively disabled and packets to new destinations always go to the router first, which will then either forward them if the destination is off-link, or redirect them if the destination is on-link.

In the remainder of this document, we will refer to this mechanism as the "off-link" mechanism, since hosts initially treat all addresses in the subnet as being off-link.

4.1.2. Making hosts use ND

If the ISR sets both the A and the L flags, then hosts on the link will perform stateless address configuration and neighbor discovery as usual. However, since Neighbor Solicitations (NSs) from existing hosts are sent to a link-scoped solicited-node multicast address, they will never reach nodes on other links within the subnet. Instead, ISRs must either know the location of the destination a priori, or else be able to relay such NS's to other links, either using link-scoped NS's relayed link-by-link, or using a subnet-scoped NS.

In the remainder of this document, we will refer to this mechanism as the "on-link" mechanism, since hosts treat all addresses in the subnet as being on-link.

4.1.3. Effects on Duplicate Address Detection

In either approach above, existing nodes will still do Duplicate Address Detection using the link-scoped solicited-node multicast address.

One problem arises from the statement in [ND] that: "the link-local address MUST be tested for uniqueness, and if no duplicate address is detected, an implementation MAY choose to skip Duplicate Address Detection for additional addresses derived from the same interface identifier". Collisions would result if the interface identifier were unique on the link, but not across the entire multilink subnet. To avoid this, ISRs must get involved in duplicate address detection even for link-local addresses, to ensure that they are unique across a multilink subnet.

To assist in DAD, ISRs must listen on all solicited-node multicast addresses (in practice, this means all multicast groups). Their actual behavior is discussed later.

4.2. Neighbor Discovery

Neighbor Discovery would work differently, depending on whether the on-link or off-link mechanism is used, as described in the previous section.

Off-link mechanism
If the subnet is treated as being off-link, all packets are sent to a default router. It is then the default router's responsibility to figure out the next-hop of the packets. If the next-hop is on-link, it sends a Redirect to the source.

On-link mechanism
If the subnet is treated as being on-link, nodes will send NS's to the solicited-node multicast address. (If a node has interfaces attached to multiple links in the subnet, NS's MAY be sent on each link.) If the next-hop is off-link, a router will respond with a proxy Neighbor Advertisement (NA) containing its own link-layer address.

In either case, it is the router's responsibility to determine whether a destination in the subnet is on-link.
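
As an illustration only, the router-side decision described in this section can be sketched in a few lines of Python. The function name isr_decision, the Action values, and the use of small integers as link identifiers are assumptions made for this sketch and are not defined by this draft. How the router learns which link the destination is on is deliberately left out.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for what an ISR does; not protocol constants."""
    FORWARD = "forward the packet toward the destination's link"
    REDIRECT = "send a Redirect to the source"
    PROXY_NA = "answer the NS with a proxy NA carrying the ISR's own link-layer address"
    IGNORE = "stay silent and let the target answer for itself"


def isr_decision(mechanism, source_link, destination_link):
    """Return the ISR's action once it knows which link the destination is on.

    mechanism is "off-link" (the ISR received a data packet as default
    router) or "on-link" (the ISR heard a Neighbor Solicitation).
    """
    same_link = source_link == destination_link
    if mechanism == "off-link":
        # Destination is a neighbor of the source: tell the source so.
        return Action.REDIRECT if same_link else Action.FORWARD
    if mechanism == "on-link":
        # Destination is elsewhere in the subnet: answer on its behalf.
        return Action.IGNORE if same_link else Action.PROXY_NA
    raise ValueError(f"unknown mechanism: {mechanism!r}")
```

Under these assumptions, isr_decision("on-link", 1, 2) selects the proxy NA, while isr_decision("off-link", 1, 1) selects the Redirect.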
While it is resolving a next-hop, the router also remembers each node sending an NS for the destination so that upon receipt of an NA, it can send an NA to each one, containing its own link-layer address as the Target Link Layer Address.

As specified in [ND], proxy Neighbor Advertisements sent by ISR's on behalf of remote targets should always have the Override bit clear, since the presence of multiple ISR's responding is analogous to making the target address be an anycast address.

4.3. Basic Unicast

In this section, we step through an example of basic unicast communication, assuming that address configuration has already completed, and the router's routing table and neighbor cache already have any required information. A subsequent section will discuss such mechanisms for inter-router communication.

In the simple scenario depicted in Figure 1 below, two links, (1) and (2) on a common subnet with global prefix G, are connected by an ISR B. Node A has link-layer address a on link 1, and has acquired global IPv6 address Ga, and link-local IPv6 address La. Similarly, ISR B has on link 1, link-layer address b1, and IPv6 addresses Gb1 and Lb1, and on link 2, link-layer address b2 and IPv6 addresses Gb2 and Lb2. Node C has link-layer address c2 on link 2, and IPv6 addresses Gc and Lc. Node D has link-layer address d1 on link 1, and IPv6 addresses Gd and Ld.

    +---+                      +---+
    | A |                      | D |
    +-+-+                      +-+-+
      |                          |
    --+------------+-------------+--------------(1)--
                   |
                 +-+-+
                 | B |
                 +-+-+
                   |
    ---------------+-------------+--------------(2)--
                                 |
                               +-+-+
                               | C |
                               +---+

                 Figure 1: Simple Scenario

Off-link mechanism
When A wants to start communication with Gc, it finds that the destination address matches no on-link prefix, and so sends the packet directly to its default router B.
B first applies its usual packet validation rules (including decrementing the Hop Limit in the IPv6 header). B knows that C is on-link to link 2, with link-layer address c2, and so if the packet is not dropped, it forwards the packet to C.

When A wants to communicate with Gd, it again finds that the destination address matches no on-link prefix, and so sends the packet directly to its default router B. B knows that D is on-link to the same link as A, and so responds with a Redirect.

On-link mechanism
When A wants to start communication with Gc, it finds that the destination address matches an on-link prefix, and so sends an NS to the solicited-node multicast address Sc constructed from Gc. The NS message is received by the ISR B, which listens on all multicast groups. B knows that C is on-link to link 2, and responds to A with an NA containing its own link-layer address b1 as the Target Link-Layer Address. After this, A can send packets to the address Gc. The packets will be sent to the link address b1; they will be received by B, which will apply its usual validation rules (including decrementing the Hop Limit in the IPv6 header), and relay them to the address c2 on link 2.

When A wants to communicate with Gd, it again finds that the destination address matches an on-link prefix, and so sends an NS to its solicited-node multicast address. D receives the NS and responds. B also receives the NS, but knows that D is on the same link as A, and so does not respond.

We note that B does not need to turn on "promiscuous mode" listening, at least for unicast packets; it merely needs to listen to all multicast addresses. We also did not assume that the links had to use IEEE 802 addresses, or in fact any form of consistent addressing. B can also handle MTU discovery procedures, returning an ICMP message if either A or C sends a packet that is too long.

4.4. Multicast

Most multicast routing protocols are based on a "Reverse-Path Forwarding" check. That is, they drop a packet if the packet does not arrive on the link towards a given address (e.g., the source address, or a Rendezvous Point address associated with the group address).

Thus, multicast will work as long as a router can tell which link is towards any address within the subnet. Note that in particular, simply using the subnet route is not sufficient in a multilink subnet. A router requires either the equivalent of host routes (or neighbor cache entries) for RPF, or that a non-RPF-based mechanism (such as a spanning tree) is used within the subnet.

5. Intra-Router Communication

In the network depicted in Figure 2, we have now three links, and also three intra-subnet routers (ISRs), B, E, and F.

    +---+                      +---+
    | A |                      | D |
    +-+-+                      +-+-+
      |                          |
    --+------------+-------------+----------+---(1)--
                   |                        |
                 +-+-+                    +-+-+
                 | B |                    | E |
                 +-+-+                    +-+-+
                   |                        |
        -----------+-------------+----------+---(2)--
                                 |
                               +-+-+
                               | F |
                               +-+-+
                                 |
                           ------+----------+--------------(3)--
                                            |
                                          +-+-+
                                          | C |
                                          +---+

                 Figure 2: Multiple-Routers Scenario

The network is sufficiently complex to expose problems inherent to bridging:

o If A sends an NS packet, that packet is received by both B and E. Depending on the intra-router communication mechanism, this could lead to duplicate transmissions on link 2, and possibly to random behaviors, or to loops.
o If A sends a multicast packet, and that packet is relayed by both B and E, it would lead to duplicate traffic, or even potential loops. It may not be relayed at all, if neither B nor E realizes there is a group member hidden behind F.

There are (at least) three possible approaches to solving the above problems which might meet our design goals. We discuss each approach in turn below, with examples using Figure 2 when no previous state is known.
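
To make the first problem concrete, the following Python sketch transcribes the attachments shown in Figure 2 and counts how many copies of a packet appear on each neighboring link if every ISR that hears it relays it once, with no coordination. The ATTACHMENTS table and the function name relayed_copies are illustrative assumptions, not part of any mechanism proposed here.

```python
# Links each node is attached to, transcribed from Figure 2.
ATTACHMENTS = {
    "A": {1},
    "D": {1},
    "B": {1, 2},
    "E": {1, 2},
    "F": {2, 3},
    "C": {3},
}


def relayed_copies(origin_link):
    """Count copies placed on each other link after one uncoordinated relay.

    Every node attached to origin_link and to at least one other link
    (that is, every ISR on origin_link) copies the packet once onto each
    of its other links.
    """
    copies = {}
    for links in ATTACHMENTS.values():
        if origin_link in links and len(links) > 1:
            for other in links - {origin_link}:
                copies[other] = copies.get(other, 0) + 1
    return copies
```

For a packet originating on link 1 this gives two copies on link 2 (one each from B and E). For a packet on link 2 it gives two copies back on link 1 and one on link 3, which is how a loop would start if the relayed copies were relayed again.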
Some of these methods use a "Local Distance" Option:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |   Reserved    |   Hop Count   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The option contains five fields, encoded on 8 octets.

The Hop Count field contains an 8-bit unsigned integer being the number of hops between the advertising station and the source or the target address. It is used to assist in loop prevention and provide shortest paths.

The Timestamp is a 32-bit integer (in seconds) that describes the time at which the source or target address was last advertised by the actual node with that address. It is used to ensure that neighbor discovery messages do not loop forever if the propagation delay across the subnet is significant. (Authors' note: is there a way to make this work without synchronized clocks? Is a Timestamp really required?)

If this option is used, it is expected that an ISR's neighbor cache entries would also contain the Hop Count and Timestamp information associated with the link-layer address used.

5.1. Method A: Creating a spanning tree

IEEE 802 bridges avoid loops by constructing a spanning tree, selecting which bridges will be allowed to relay packets between any two links. We could replicate such a protocol on top of IPv6, but doing so is not necessarily the best solution:

o We would need a standard, but defining a spanning tree discovery protocol on top of IPv6 introduces a great deal of complexity, and may require long debates.
o Implementing an independent protocol is probably harder than simply extending the neighbor discovery procedures.
o Extending the neighbor discovery procedure can allow faster handling of topology changes, which could be very useful in an ad hoc networking environment.

The basic idea of this approach is that a simple spanning tree is created among ISRs by creating a new ICMP message that is multicast on each link, to elect a "core" ISR, using the same mechanism as the PIM Bootstrap election mechanism. That is, each ISR begins thinking it is the core. An ISR loses the core election if it hears about another core with a lower interface id. Multicast announcements are originated periodically by the core, and relayed hop-by-hop by other routers. The Local Distance option would be used to track distance from the core, and select the best next-hop towards the core.

Once a spanning tree is generated, multicast packets can then be sent along the spanning tree without any RPF check inside the subnet. One of the other two methods below could be used for unicast traffic, possibly restricted to communication along the spanning tree to provide more assurance against loops.

5.2. Method B: Flooding Neighbor Solicitations

The basic idea here is that an ISR would, when needing to resolve a target address to a next-hop, send a Neighbor Solicitation on each attached link in the subnet. After sending an NS, the router suppresses sending of any other NS's for the same target address for a short interval (which must be less than ND's RetransTimer).

A Neighbor Advertisement would be sent in response to an NS only by (a) the actual node with the target address, or (b) an ISR which has received an NA in response to a relayed NS it sent as a result of receiving the first NS. Specifically, an NA is not sent just because the ISR has a neighbor cache entry for the target. This is needed because only an NA from the actual target provides a liveness indication, and avoids circular state refreshes among ISRs.
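
The two rules above (suppress repeated NS's for a short interval, and send a proxy NA only after an NA has actually been received) can be sketched as follows. This is a minimal sketch under stated assumptions, not a specification: the class name FloodingIsr, its method names, the 0.5 second default interval, and the choice to relay onto every attached link except the one the NS arrived on are all hypothetical. The sketch covers only the case where the ISR is triggered by an NS, and it keeps remembering requesters even while relaying is suppressed.

```python
class FloodingIsr:
    """Hypothetical per-router state for Method B; names are illustrative."""

    def __init__(self, links, suppress_interval=0.5):
        self.links = set(links)                     # attached links in the subnet
        self.suppress_interval = suppress_interval  # assumed; must stay below RetransTimer
        self.last_ns_sent = {}                      # target -> time the last NS was relayed
        self.waiting = {}                           # target -> {(link, requester), ...}

    def on_ns(self, target, requester, in_link, now):
        """Handle an NS heard on in_link; return the links to relay it onto."""
        # Remember who asked, so a later NA can be proxied back to them.
        self.waiting.setdefault(target, set()).add((in_link, requester))
        last = self.last_ns_sent.get(target)
        if last is not None and now - last < self.suppress_interval:
            return []  # an NS for this target was relayed very recently
        self.last_ns_sent[target] = now
        return sorted(self.links - {in_link})

    def on_na(self, target):
        """Handle an NA for target; return the (link, requester) pairs to answer.

        A proxy NA is produced only here, never from cached state alone.
        """
        return sorted(self.waiting.pop(target, set()))
```

For example, an ISR attached to links 1 and 2 that hears an NS from A on link 1 relays it onto link 2, ignores for relaying purposes a second NS heard shortly afterwards, and answers its recorded requesters only once an NA for the target arrives.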
Since multiple paths may exist, to assist in loop prevention and provide shortest paths, a new "Local Distance" option in NA's can be defined, which contains the number of hops from the actual target. The absence of such an option implies the value 0. When proxying an NA, an ISR would include the Local Distance option with an incremented value. Legacy nodes will ignore the option, but ISRs (and new nodes if they wish) can use the option to prefer link-layer addresses with a lower Local Distance.

To route actual packets, an ISR's route lookup would determine that the longest matching route is on-link to multiple links. The router would consult its (conceptual) neighbor cache, and use the next-hop with the lowest Local Distance. The same procedure would apply to multicast packets as well, when the router would look up the RPF address.

5.2.1. On-link mechanism example

In Figure 2, when A wishes to communicate with Gc, both B and E will receive an NS from A. Each will originate an NS for Gc on link 2. B, E, and F will receive the NS's on link 2. B and E will ignore each other's NS since they have just sent an NS for the same address. F will receive the NS's and the first one will cause it to create a neighbor cache entry in the INCOMPLETE state, and originate its own NS on link 3. When C receives this NS, it will respond with an NA. When F receives the NA from C, it will respond to B and E with an NA with its own link-layer address f2 as the Target Link Layer Address, and a "Local Distance = 1" option. B and E will then respond to A with NAs containing b1 and e1, respectively, as the Target Link Layer Address, and a "Local Distance = 2" option.

5.2.2. Off-link mechanism example

In Figure 2, when A wishes to communicate with Gc, it will send packets to a default router, say, B. B will send an NS on link 1, which will be received by E, and on link 2, which will be received by E and F.
Depending on timing, E may send an NS on link 1 or link 2 or neither. (If a short delay were inserted before sending, both could be suppressed.) F will send an NS on link 3, to which C will reply with an NA. Upon receiving the NA, F sends an NA to all nodes from which it has seen an NS for Gc, namely B and possibly E. B (and possibly E as well) will then send an NA on link 1, after which A can communicate with C.

5.3. Method C: Proactively populate host routes

The basic idea here is that ISR's would inject host routes into a routing protocol used within (at least) the subnet upon detecting a new node on a directly-connected link.

This method requires no ND proxying. Instead, when a node sends an NS as part of its DAD attempts, an ISR on the link would consult its routing table. If a host route already exists (for another node), it would respond with an NA, causing the node to detect a duplicate. If no host route exists, one is created and advertised to other ISRs.

Once host routes exist, either the off-link or the on-link mechanism could be used. In addition, multicast works with no changes, since host routes would be used for RPF checks.

Another advantage is that since all resolution is done by ISR's "a priori", no additional delay is incurred when A wants to communicate with C. If the on-link mechanism is used, no neighbor discovery delay exists at all. Packets are immediately forwarded along the correct path. This approach avoids all bursty-source problems, at the expense of larger routing tables (at least within the subnet).

One potential problem that would need to be addressed is how to prevent collisions if two hosts on separate links simultaneously try to assign the same interface id.

5.3.1. On-link mechanism example

A sends an NS for target address Gc, which is received by B and E.
Each finds that a host route exists via F, and replies with an NA containing its own link-layer address. A selects one of them (say b1) for its neighbor cache entry. Subsequent packets are sent to b1, and forwarded along the host route to F. F has a neighbor cache entry on link 3 (if stale, F resends an NS on link 3 to confirm that C is still present).

5.3.2. Off-link mechanism example

Since A determines that Gc is off-link, A sends packets destined to Gc to its default router, say B, where they follow a host route to F. Again, F has a neighbor cache entry on link 3 (if stale, F resends an NS on link 3 to confirm that C is still present).

6. Security Considerations

TBD.

7. Acknowledgements

Brian Zill and Hesham Soliman participated in discussions that led to this draft.

8. Authors' Addresses

Dave Thaler
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
Phone: +1 425 703 8835
EMail: dthaler@microsoft.com

Christian Huitema
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
EMail: huitema@microsoft.com

9. References

[ADDRARCH] Hinden, R., and S. Deering, "IP Version 6 Addressing Architecture", RFC 2373, July 1998.

[ADDRCONF] Thomson, S., and T. Narten, "IPv6 Stateless Address Autoconfiguration", RFC 2462, December 1998.

[MLD] Deering, S., Fenner, W., and B. Haberman, "Multicast Listener Discovery (MLD) for IPv6", RFC 2710, October 1999.

[ND] Narten, T., Nordmark, E., and W. Simpson, "Neighbor Discovery for IP Version 6 (IPv6)", RFC 2461, December 1998.

10. Full Copyright Statement

Copyright (C) The Internet Society (1999). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.