Network Working Group                                  Dave Thaler
Internet-Draft                                   Christian Huitema
Expires: November 2001                                   Microsoft
                                                       14 May 2001


                Multi-link Subnet Support in IPv6
          <draft-thaler-ipngwg-multilink-subnets-00.txt>


Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.  Note that
other groups may also distribute working documents as Internet-
Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time.  It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as "work
in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.


Copyright Notice

Copyright (C) The Internet Society (2001).  All Rights Reserved.


Expires November 2001                                     [Page 1]


Draft                    Multilink Subnets                May 2001


Abstract

Bridging disparate links into a single entity has several
operational advantages.  A single subnet prefix is sufficient to
support multiple physical links.  There is no need to allocate
subnet numbers to the different networks, simplifying management.
This document introduces the concept of a "multilink subnet",
defined as a collection of independent links, connected by
routers, but sharing a common subnet prefix.  It then provides a
summary of multiple potential approaches, as a basis for working
group discussion.


1.  Introduction

Bridging disparate links into a single entity has several
operational advantages.  A single subnet prefix is sufficient to
support multiple physical links.  There is no need to allocate
subnet numbers to the different networks, simplifying management.

However, not all link-layer media can be easily bridged.  Classic
IEEE 802 bridging technology fails when the media does not
naturally support IEEE 802 addressing.  Furthermore, the operation
becomes problematic when the different links don't support the
same MTU size.  Finally, bridging cannot be easily implemented
when the network interface cannot be easily placed in
"promiscuous" mode.

This document introduces the concept of a "multilink subnet",
defined as a collection of independent links, connected by
routers, but sharing a common subnet prefix.  Herein we discuss
many of the problems that arise in supporting such subnets, along
with possible solutions.  The initial version of this draft does
not specify behavior, but merely discusses the tradeoffs.  A later
version will narrow the solution space to a recommended approach.


2.  Terminology

multilink subnet:
      a collection of independent links, connected by routers, but
      sharing a common subnet prefix.

subnet scope:
      multicast SCOP value 3, as specified in [ADDRARCH], which
      covers a (potentially multilink) subnet.  This is the next
      larger multicast scope above link scope.

subnet scope zone:
      a set of interfaces of a node that are connected to the same
      subnet, which may be a multilink subnet.

intra-subnet router (ISR):
      a router with multiple interfaces in the same subnet scope
      zone.


3.  Design Goals

Multilink subnets are designed with the following goals in mind:

o    Existing IPv6 end hosts should continue to work when
     connected to a multilink subnet, without requiring any change
     to their behavior.  For example, the host behavior parts of
     Router Discovery, Neighbor Discovery [ND], and Multicast
     Listener Discovery [MLD], must be supported.

o    Leave link-local address behavior unchanged.  Link-local
     behavior continues to function only within a link, not across
     a multilink subnet.

o    Support sending and receiving unicast and anycast traffic at
     the site and global scopes.

o    Support sending and receiving multicast traffic at the subnet
     scope and above.

o    Prevent routing loops.

o    Support nodes moving between links within the subnet, with a
     reasonably fast convergence time (on the same order as
     Neighbor Unreachability Detection).


4.  Overview

4.1.  Router Discovery

Router Discovery continues to work on a per-link basis, as
specified in [ND].  When sending Router Advertisements (RAs) with
a Prefix Information Option, there are two possibilities for how
an ISR can influence the Neighbor Discovery procedure used.


4.1.1.  Making hosts not use ND

If the ISR sets the A (autonomous address-configuration) flag on,
and the L (on-link) flag off, then hosts on the link will attempt
stateless address configuration [ADDRCONF] in the given prefix,
but will not treat the prefix as being on-link.  As a result,
neighbor discovery is effectively disabled and packets to new
destinations always go to the router first, which will then either
forward them if the destination is off-link, or redirect them if
the destination is on-link.

In the remainder of this document, we will refer to this mechanism
as the "off-link" mechanism, since hosts initially treat all
addresses in the subnet as being off-link.


4.1.2.  Making hosts use ND

If the ISR sets both the A and the L flags, then hosts on the link
will perform stateless address configuration and neighbor
discovery as usual.  However, since Neighbor Solicitations (NSs)
from existing hosts are sent to a link-scoped solicited-node
multicast address, they will never reach nodes on other links
within the subnet.  Instead, ISRs must either know the location of
the destination a priori, or else be able to relay such NS's to
other links, either using link-scoped NS's relayed link-by-link,
or using a subnet-scoped NS.

In the remainder of this document, we will refer to this mechanism
as the "on-link" mechanism, since hosts treat all addresses in the
subnet as being on-link.
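
The two flag combinations above can be summarized in a short
Python sketch.  The function name and the returned labels are
illustrative assumptions, not part of [ND]; only the meaning of
the A and L flags is taken from the text.

```python
def host_behavior(a_flag: bool, l_flag: bool) -> dict:
    """Summarize how an unmodified host reacts to a Prefix
    Information Option.

    a_flag: A (autonomous address-configuration) flag
    l_flag: L (on-link) flag
    """
    return {
        # With A set, the host forms an address in the prefix.
        "autoconfigures_address": a_flag,
        # With L set, the host resolves in-prefix destinations with
        # an NS; otherwise it sends packets to a default router.
        "first_hop": "neighbor-solicitation" if l_flag
                     else "default-router",
    }


OFF_LINK = host_behavior(a_flag=True, l_flag=False)  # Section 4.1.1
ON_LINK = host_behavior(a_flag=True, l_flag=True)    # Section 4.1.2
```

In both cases the host autoconfigures an address; only the first
hop chosen for destinations inside the prefix differs.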


4.1.3.  Effects on Duplicate Address Detection

In either approach above, existing nodes will still do Duplicate
Address Detection using the link-scoped solicited-node multicast
address.

One problem arises from the statement in [ADDRCONF] that: "the
link-local address MUST be tested for uniqueness, and if no
duplicate address is detected, an implementation MAY choose to
skip Duplicate Address Detection for additional addresses derived
from the same interface identifier".

Collisions would result if the interface identifier were unique on
the link, but not across the entire multilink subnet.  To avoid
this, ISRs must get involved in duplicate address detection even
for link-local addresses, to ensure that they are unique across a
multilink subnet.

To assist in DAD, ISRs must listen on all solicited-node multicast
addresses (in practice, this means all multicast groups).  Their
actual behavior is discussed later.
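
As an illustration of why DAD probes stay on one link, the sketch
below derives the solicited-node multicast address defined in
[ADDRARCH] (the prefix ff02::1:ff00:0/104 plus the low-order 24
bits of the unicast address).  The interface identifier and the
global prefix are made-up example values.

```python
import ipaddress


def solicited_node(addr: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast address for addr."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


# Hypothetical addresses built from one interface identifier.
link_local = "fe80::0211:22ff:fe33:4455"
global_addr = "2001:db8:0:1:0211:22ff:fe33:4455"

# Both map to the same link-scoped (ff02::/16) group, so a DAD
# probe for either one is heard only on the local link.
group = solicited_node(link_local)
```

Because the group has link scope, a node on another link of the
subnet using the same interface identifier never sees the probe
unless an ISR intervenes.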


4.2.  Neighbor Discovery

Neighbor Discovery would work differently, depending on whether
the on-link or off-link mechanism is used, as described in the
previous section.


Off-link mechanism
     If the subnet is treated as being off-link, all packets are
     sent to a default router.  It is then the default router's
     responsibility to figure out the next-hop of the packets.  If
     the next-hop is on-link, it sends a Redirect to the source.

On-link mechanism
     If the subnet is treated as being on-link, nodes will send
     NS's to the solicited-node multicast address.  (If a node has
     interfaces attached to multiple links in the subnet, NS's MAY
     be sent on each link.)  If the next-hop is off-link, a router
     will respond with a proxy Neighbor Advertisement (NA)
     containing its own link-layer address.

In either case, it is the router's responsibility to determine
whether a destination in the subnet is on-link.  While it is
resolving a next-hop, the router also remembers each node sending
an NS for the destination so that upon receipt of an NA, it can
send an NA to each one, containing its own link-layer address as
the Target Link Layer Address.

As specified in [ND], proxy Neighbor Advertisements sent by ISR's
on behalf of remote targets should always have the Override bit
clear, since the presence of multiple ISR's responding is
analogous to making the target address be an anycast address.
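
The bookkeeping described above might be sketched as follows.
This is a minimal illustration for a single interface; the class
name, the dictionary layout of an NA, and the string identifiers
are all assumptions made for the example.

```python
from collections import defaultdict


class PendingResolutions:
    """Track which nodes are waiting for a target to be resolved."""

    def __init__(self, own_lladdr: str):
        self.own_lladdr = own_lladdr
        self.waiting = defaultdict(list)  # target -> [solicitor, ...]

    def on_ns(self, solicitor: str, target: str) -> None:
        """Remember a node that sent an NS for a pending target."""
        if solicitor not in self.waiting[target]:
            self.waiting[target].append(solicitor)

    def on_na(self, target: str) -> list:
        """On receipt of an NA, build one proxy NA per solicitor."""
        return [
            {"to": solicitor,
             "target": target,
             "target_lladdr": self.own_lladdr,  # router's address
             "override": False}                 # Override bit clear
            for solicitor in self.waiting.pop(target, [])
        ]
```

For example, if A and D both solicit Gc while B is resolving it,
on_na("Gc") yields two proxy NAs, each carrying b1 as the Target
Link-Layer Address.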


4.3.  Basic Unicast

In this section, we step through an example of basic unicast
communication, assuming that address configuration has already
completed, and the router's routing table and neighbor cache
already have any required information.  A subsequent section will
discuss such mechanisms for inter-router communication.

In the simple scenario depicted in Figure 1 below, two links, (1)
and (2) on a common subnet with global prefix G, are connected by
an ISR B.  Node A has link-layer address a on link 1, and has
acquired global IPv6 address Ga, and link-local IPv6 address La.
Similarly, ISR B has, on link 1, link-layer address b1 and IPv6
addresses Gb1 and Lb1, and, on link 2, link-layer address b2 and
IPv6 addresses Gb2 and Lb2.  Node C has link-layer address c2
on link 2, and IPv6 addresses Gc and Lc.  Node D has link-layer
address d1 on link 1, and IPv6 addresses Gd and Ld.

+---+                      +---+
| A |                      | D |
+-+-+                      +-+-+
  |                          |
--+------------+-------------+--------------(1)--
               |
             +-+-+
             | B |
             +-+-+
               |
---------------+-------------+--------------(2)--
                             |
                           +-+-+
                           | C |
                           +---+

                    Figure 1: Simple Scenario


Off-link mechanism
     When A wants to start communication with Gc, it finds that
     the destination address matches no on-link prefix, and so
     sends the packet directly to its default router B.  B first
     applies its usual packet validation rules (including
     decrementing the Hop Limit in the IPv6 header).  B knows that
     C is on-link to link 2, with link-layer address c2, and so if
     the packet is not dropped, it forwards the packet to C.

     When A wants to communicate with Gd, it again finds that the
     destination address matches no on-link prefix, and so sends
     the packet directly to its default router B.  B knows that D
     is on-link to the same link as A, and so responds with a
     Redirect.


On-link mechanism
     When A wants to start communication with Gc, it finds that
     the destination address matches an on-link prefix, and so
     sends an NS to the solicited-node multicast address Sc
     constructed from Gc.  The NS message is received by the ISR
     B, which listens on all multicast groups.  B knows that C is
     on-link to link 2, and responds to A with an NA containing
     its own link-layer address b1 as the Target Link-Layer
     Address.

     After this, A can send packets to the address Gc.  The
     packets will be sent to the link address b1; they will be
     received by B, which will apply its usual validation rules
     (including decrementing the Hop Limit in the IPv6 header),
     and relay them to the address c2 on link 2.

     When A wants to communicate with Gd, it again finds that the
     destination address matches an on-link prefix, and so sends
     an NS to its solicited-node multicast address.  D receives
     the NS and responds.  B also receives the NS, but knows that
     D is on the same link as A, and so does not respond.

We note that B does not need to turn on "promiscuous mode"
listening, at least for unicast packets; it merely needs to listen
to all multicast addresses. We also did not assume that the links
had to use IEEE 802 addresses, or in fact any form of consistent
addressing.  B can also handle MTU discovery procedures, returning
an ICMP Packet Too Big message if either A or C sends a packet
that is too long for the other link.
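
The decisions made by ISR B in the two walk-throughs above can be
condensed into a small sketch.  The table of locations stands in
for B's routing table and neighbor cache, and the function names
and returned labels are assumptions chosen for the example.

```python
# Which link each global address sits on, as known to ISR B in
# Figure 1.
LOCATION = {"Ga": 1, "Gd": 1, "Gc": 2}


def off_link_action(src: str, dst: str) -> str:
    """B's reaction to a data packet that src sent via B to dst."""
    if LOCATION[src] == LOCATION[dst]:
        return "redirect"  # dst is on the sender's own link
    return "forward"       # relay the packet to the other link


def on_link_action(src: str, target: str) -> str:
    """B's reaction to an NS from src for target."""
    if LOCATION[src] == LOCATION[target]:
        return "silent"    # the target answers for itself
    return "proxy-na"      # answer with B's own link-layer address
```

With these definitions, traffic from Ga to Gc is forwarded (or
answered with a proxy NA), while traffic from Ga to Gd triggers a
Redirect (or no response at all).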


4.4.  Multicast

Most multicast routing protocols are based on a "Reverse-Path
Forwarding" check.  That is, they drop a packet if the packet does
not arrive on the link towards a given address (e.g., the source
address, or a Rendezvous Point address associated with the group
address).  Thus, multicast will work as long as a router can tell
which link is towards any address within the subnet.  Note that in
particular, simply using the subnet route is not sufficient in a
multilink subnet.  A router requires either the equivalent of host
routes (or neighbor cache entries) for RPF, or that a non-RPF-
based mechanism (such as a spanning tree) is used within the
subnet.
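
To make the RPF issue concrete, the sketch below performs a
longest-prefix match to find the interface towards a source.  The
prefix, the addresses, and the interface names are hypothetical,
and modeling the subnet route as pointing at a single link is a
simplification made for the example.

```python
import ipaddress


def rpf_interface(routes, address):
    """Longest-prefix match: interface toward address, or None."""
    addr = ipaddress.IPv6Address(address)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda net: net.prefixlen)]


def rpf_accept(routes, source, arrival_interface):
    """Accept a packet only if it arrived on the RPF interface."""
    return rpf_interface(routes, source) == arrival_interface


net = ipaddress.IPv6Network
# Subnet route only: it cannot say which link a given source is on.
subnet_only = {net("2001:db8:0:1::/64"): "link1"}
# A host route for a source on link 2 makes the check succeed.
with_host_route = dict(subnet_only)
with_host_route[net("2001:db8:0:1::c/128")] = "link2"
```

With only the subnet route, a packet from the source on link 2 is
dropped by the RPF check; once the host route is present, the same
packet is accepted.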


5.  Inter-Router Communication

In the network depicted in Figure 2, we have now three links, and
also three intra-subnet routers (ISRs), B, E, and F.

+---+                      +---+
| A |                      | D |
+-+-+                      +-+-+
  |                          |
--+------------+-------------+----------+---(1)--
               |                        |
             +-+-+                    +-+-+
             | B |                    | E |
             +-+-+                    +-+-+
               |                        |
    -----------+-------------+----------+---(2)--
                             |
                           +-+-+
                           | F |
                           +-+-+
                             |
            ------+----------+--------------(3)--
                  |
                +-+-+
                | C |
                +---+

               Figure 2: Multiple-Routers Scenario


The network is sufficiently complex to expose problems inherent to
bridging:

o    If A sends an NS packet, that packet is received by both B
     and E.  Depending on the inter-router communication
     mechanism, this could lead to duplicate transmissions on link
     2, and possibly to random behaviors, or to loops.

o    If A sends a multicast packet, and that packet is relayed by
     both B and E, it would lead to duplicate traffic, or even
     potential loops.  It may not be relayed at all, if neither B
     nor E realizes there is a group member hidden behind F.

There are (at least) three possible approaches to solving the
above problems which might meet our design goals.  We discuss each
approach in turn below, with examples using Figure 2 when no
previous state is known.

Some of these methods use a "Local Distance" Option:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |    Length     |    Reserved   |  Hop Count    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Timestamp                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The option contains five fields, encoded on 8 octets.

The Hop Count field contains an 8-bit unsigned integer being the
number of hops between the advertising station and the source or
the target address.  It is used to assist in loop prevention and
provide shortest paths.

The Timestamp is a 32-bit integer (in seconds) that describes the
time at which the source or target address was last advertised by
the actual node with that address.  It is used to ensure that
neighbor discovery messages do not loop forever if the propagation
delay across the subnet is significant.  (Authors' note: is there
a way to make this work without synchronized clocks?  Is a
Timestamp really required?)

If this option is used, it is expected that an ISR's neighbor
cache entries would also contain the Hop Count and Timestamp
information associated with the link-layer address used.
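
As a sketch of the 8-octet layout shown above, the option could be
encoded and decoded as follows.  The option Type value is
hypothetical, since this draft does not assign one, and treating
Length as a count of 8-octet units (as for other options in [ND])
is likewise an assumption.

```python
import struct

# Hypothetical option type: the draft does not assign a value.
LOCAL_DISTANCE_TYPE = 250
# Type, Length, Reserved, Hop Count (8 bits each), Timestamp (32).
_FORMAT = "!BBBBI"


def pack_local_distance(hop_count: int, timestamp: int) -> bytes:
    """Encode the option in network byte order (8 octets)."""
    return struct.pack(_FORMAT, LOCAL_DISTANCE_TYPE, 1, 0,
                       hop_count, timestamp)


def unpack_local_distance(data: bytes):
    """Decode the option and return (hop_count, timestamp)."""
    opt_type, length, _reserved, hop_count, timestamp = \
        struct.unpack(_FORMAT, data)
    if opt_type != LOCAL_DISTANCE_TYPE or length != 1:
        raise ValueError("not a Local Distance option")
    return hop_count, timestamp
```

A round trip such as unpack_local_distance(pack_local_distance(2,
990000000)) returns the original hop count and timestamp.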


5.1.  Method A: Creating a spanning tree

IEEE 802 bridges avoid loops by constructing a spanning tree,
selecting which bridges will be allowed to relay packets between
any two links.  We could replicate such a protocol on top of IPv6,
but doing so is not necessarily the best solution:

o    We would need a standard, but defining a spanning tree
     discovery protocol on top of IPv6 introduces a great deal of
     complexity, and may require long debates.

o    Implementing an independent protocol is probably harder than
     simply extending the neighbor discovery procedures.

o    Extending the neighbor discovery procedure can allow faster
     handling of topology changes, which could be very useful in
     an ad hoc networking environment.

The basic idea of this approach is that a simple spanning tree is
created among ISRs by creating a new ICMP message that is
multicast on each link, to elect a "core" ISR, using the same
mechanism as the PIM Bootstrap election mechanism.  That is, each
ISR begins thinking it is the core.  An ISR loses the core
election if it hears about another core with a lower interface id.
Multicast announcements are originated periodically by the core,
and relayed hop-by-hop by other routers.  The Local Distance
option would be used to track distance from the core, and select
the best next-hop towards the core.
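
The election rule and the use of the Local Distance option might
be sketched as below.  Interface ids are modeled as plain
integers, and the function names and announcement tuples are
assumptions made for the example; no message format is implied.

```python
def elect_core(own_id: int, heard_ids) -> int:
    """Return the interface id this ISR believes is the core.

    Each ISR starts out believing it is the core and yields
    whenever it hears of a core with a lower interface id.
    """
    core = own_id
    for candidate in heard_ids:
        if candidate < core:
            core = candidate
    return core


def next_hop_toward_core(announcements):
    """Pick the neighbor advertising the lowest hop count.

    announcements: list of (neighbor, hop_count) pairs heard for
    the elected core.
    """
    return min(announcements, key=lambda item: item[1])[0]
```

An ISR with id 0x20 that hears announcements for cores 0x30 and
0x10 concludes that 0x10 is the core, and then relays via
whichever neighbor reports the smallest distance to it.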

Once a spanning tree is generated, multicast packets can then be
sent along the spanning tree without any RPF check inside the
subnet.

One of the other two methods below could be used for unicast
traffic, possibly restricted to communication along the spanning
tree to provide more assurance against loops.


5.2.  Method B: Flooding Neighbor Solicitations

The basic idea here is that an ISR would, when needing to resolve
a target address to a next-hop, send a Neighbor Solicitation on
each attached link in the subnet.  After sending an NS, the router
suppresses sending of any other NS's for the same target address
for a short interval (which must be less than ND's RetransTimer).
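
A minimal sketch of the suppression rule follows.  The class name
and the use of caller-supplied timestamps are assumptions; the
only constraint taken from the text is that the interval be
shorter than ND's RetransTimer.

```python
class NsSuppressor:
    """Suppress repeated NS's for one target within an interval."""

    def __init__(self, interval: float):
        self.interval = interval  # seconds; must be < RetransTimer
        self.last_sent = {}       # target -> time of last NS sent

    def should_send(self, target: str, now: float) -> bool:
        """Return True, and record the send, if an NS may go out."""
        last = self.last_sent.get(target)
        if last is not None and now - last < self.interval:
            return False
        self.last_sent[target] = now
        return True
```

With an interval of 0.5 seconds, a second NS for the same target
0.2 seconds after the first is suppressed, while one sent 0.7
seconds after the first is allowed.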


A Neighbor Advertisement would be sent in response to an NS only
by (a) the actual node with the target address, or (b) an ISR
which has received an NA in response to a relayed NS it sent as a
result of receiving the first NS.  Specifically, an NA is not sent
just because the ISR has a neighbor cache entry for the target.
This is needed because only an NA from the actual target provides
a liveness indication, and avoids circular state refreshes among
ISRs.

Since multiple paths may exist, to assist in loop prevention and
provide shortest paths, a new "Local Distance" option in NA's can
be defined, which contains the number of hops from the actual
target.  The absence of such an option implies the value 0.  When
proxying an NA, an ISR would include the Local Distance option
with an incremented value.  Legacy nodes will ignore the option,
but ISRs (and new nodes if they wish) can use the option to prefer
link-layer addresses with a lower Local Distance.

To route actual packets, an ISR's route lookup would determine
that the longest matching route is on-link to multiple links.  The
router would consult its (conceptual) neighbor cache, and use the
next-hop with the lowest Local Distance.  The same procedure would
apply to multicast packets as well, when the router would look up
the RPF address.
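
The handling of the Local Distance value could look roughly like
this.  The function names and the (link-layer address, distance)
pairs are assumptions used to illustrate the rule; they do not
describe an actual neighbor cache layout.

```python
def proxied_distance(received_distance=None) -> int:
    """Local Distance an ISR places in a proxy NA.

    An NA without the option (for example, one sent by the actual
    target) counts as distance 0.
    """
    return (received_distance or 0) + 1


def lowest_distance_next_hop(cache_entries):
    """Pick the entry with the lowest Local Distance.

    cache_entries: list of (link_layer_address, local_distance)
    pairs learned for one target, possibly on different links.
    """
    return min(cache_entries, key=lambda entry: entry[1])[0]
```

An ISR adjacent to the target therefore advertises distance 1, the
next ISR advertises distance 2, and a router that hears several
candidates forwards via the one with the smallest value.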


5.2.1.  On-link mechanism example

In Figure 2, when A wishes to communicate with Gc, both B and E
will receive an NS from A.  Each will originate an NS for Gc on
link 2.  B, E, and F will receive the NS's on link 2.  B and E
will ignore each other's NS since they have just sent an NS for
the same address.

F will receive the NS's and the first one will cause it to create
a neighbor cache entry in the INCOMPLETE state, and originate its
own NS on link 3.  When C receives this NS, it will respond with
an NA.

When F receives the NA from C, it will respond to B and E with an
NA with its own link-layer address f2 as the Target Link Layer
Address, and a "Local Distance = 1" option.  B and E will then
respond to A with NAs containing b1 and e1, respectively, as the
Target Link Layer Address, and a "Local Distance = 2" option.


5.2.2.  Off-link mechanism example

In Figure 2, when A wishes to communicate with Gc, it will send
packets to a default router, say, B.  B will send an NS on link 1,
which will be received by E, and on link 2 which will be received
by E and F.  Depending on timing, E may send an NS on link 1 or
link 2 or neither.  (If a short delay were inserted before
sending, both could be suppressed.)  F will send an NS on link 3,
to which C will reply with an NA.  Upon receiving the NA, F sends
an NA to all nodes from which it has seen an NS for Gc, namely B
and possibly E.

B (and possibly E as well) will then send an NA on link 1, after
which A can communicate with C.


5.3.  Method C: Proactively populate host routes

The basic idea here is that ISR's would inject host routes into a
routing protocol used within (at least) the subnet upon detecting
a new node on a directly-connected link.

This method requires no ND proxying.  Instead, when a node sends
an NS as part of its DAD attempts, an ISR on the link would
consult its routing table.  If a host route already exists (for
another node), it would respond with an NA, causing the node to
detect a duplicate.  If no host route exists, one is created and
advertised to other ISRs.
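
The DAD handling just described might be sketched as follows.  How
an ISR distinguishes the existing owner from another node is not
specified here; comparing the link and the link-layer source of
the probe is an assumption made for the example, as are the class
and method names.

```python
class HostRouteTable:
    """Host routes known to an ISR, keyed by address."""

    def __init__(self):
        self.routes = {}      # address -> (link, owner lladdr)
        self.advertised = []  # addresses announced to other ISRs

    def on_dad_ns(self, address: str, link: int, lladdr: str) -> str:
        """Handle an NS sent by a node as part of a DAD attempt."""
        existing = self.routes.get(address)
        if existing is not None and existing != (link, lladdr):
            return "send-na"  # another node owns it: duplicate
        if existing is None:
            # New node: create a host route and advertise it.
            self.routes[address] = (link, lladdr)
            self.advertised.append(address)
        return "no-reply"
```

The first probe for an address creates and advertises a host
route; a later probe for the same address from a different node
is answered with an NA so that the prober detects the duplicate.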

Once host routes exist, either the off-link or the on-link
mechanism could be used.  In addition, multicast works with no
changes, since host routes would be used for RPF checks.

Another advantage is that since all resolution is done by ISR's "a
priori", no additional delay is incurred when A wants to
communicate with C.  If the on-link mechanism is used, no neighbor
discovery delay exists at all.  Packets are immediately forwarded
along the correct path.  This approach avoids all bursty-source
problems, at the expense of larger routing tables (at least within
the subnet).

One potential problem that would need to be addressed is how to
prevent collisions if two hosts on separate links simultaneously
try to assign the same interface id.


5.3.1.  On-link mechanism example

A sends an NS for target address Gc, which is received by B and E.
Each finds that a host route exists via F, and replies with an NA
containing its own link-layer address.

A selects one of them (say b1) for its neighbor cache entry.
Subsequent packets are sent to b1, and forwarded along the host
route to F.  F has a neighbor cache entry on link 3 (if stale, F
resends an NS on link 3 to confirm that C is still present).


5.3.2.  Off-link mechanism example

Since A determines that Gc is off-link, A sends packets destined
to Gc to its default router, say B, where they follow a host route
to F.  Again, F has a neighbor cache entry on link 3 (if stale, F
resends an NS on link 3 to confirm that C is still present).


6.  Security Considerations

TBD.


7.  Acknowledgements

Brian Zill and Hesham Soliman participated in discussions that led
to this draft.


8.  Authors' Addresses

     Dave Thaler
     Microsoft Corporation
     One Microsoft Way
     Redmond, WA  98052-6399
     Phone: +1 425 703 8835
     EMail: dthaler@microsoft.com

     Christian Huitema
     Microsoft Corporation
     One Microsoft Way
     Redmond, WA  98052-6399
     EMail: huitema@microsoft.com


9.  References

[ADDRARCH]
     Hinden, R., and S. Deering, "IP Version 6 Addressing
     Architecture", RFC 2373, July 1998.

[ADDRCONF]
     Thomson, S., and T. Narten, "IPv6 Stateless Address
     Autoconfiguration", RFC 2462, December 1998.

[MLD]
     Deering, S., Fenner, W., and B. Haberman, "Multicast Listener
     Discovery (MLD) for IPv6", RFC 2710, October 1999.

[ND] Narten, T., Nordmark, E., and W. Simpson, "Neighbor Discovery
     for IP Version 6 (IPv6)", RFC 2461, December 1998.


10.  Full Copyright Statement

Copyright (C) The Internet Society (2001).  All Rights Reserved.

This document and translations of it may be copied and furnished
to others, and derivative works that comment on or otherwise
explain it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without
restriction of any kind, provided that the above copyright notice
and this paragraph are included on all such copies and derivative
works.  However, this document itself may not be modified in any
way, such as by removing the copyright notice or references to the
Internet Society or other Internet organizations, except as needed
for the purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards
process must be followed, or as required to translate it into
languages other than English.

The limited permissions granted above are perpetual and will not
be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on
an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

