Inter-Domain Multicast Routing (IDMR)                      A. Ballardie
INTERNET-DRAFT                                               Consultant
                                                                B. Cain
                                                           Bay Networks
                                                               Z. Zhang
                                                           Bay Networks
                                                             March 1998
                                                   Expires October 1998

         Core Based Trees (CBT version 3) Multicast Routing

                    -- Protocol Specification --

Status of this Memo

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress."

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.

Abstract

   This document describes the Core Based Tree (CBT version 3) network
   layer multicast routing protocol. CBT builds a shared multicast
   distribution tree per group, and is suited to inter- and
   intra-domain multicast routing. CBT may use a separate multicast
   routing table, or it may use that of underlying unicast routing, to
   establish paths between senders and receivers. The CBT architecture
   is described in [1].

   This specification supersedes and obsoletes RFC 2189. Changes from
   RFC 2189 include support for source specific joining and pruning to
   provide better CBT transit domain capability, new packet formats,
   and new robustness features. Section 1 documents the primary
   changes to RFC 2189.

   This document is progressing through the IDMR working group of the
   IETF. CBT related documents include [1, 2, 3, 5, 8]. For all
   IDMR-related documents, see http://www.cs.ucl.ac.uk/ietf/idmr.

TABLE OF CONTENTS

   1. Changes from RFC 2189
   2. Building a CBT Multicast Domain
   3. Introduction & Terminology
   4. CBT Functional Overview
      4.1. The First Step: Joining the Tree
      4.2. Transient State
      4.3. Getting on-tree
      4.4. Pruning & Prune State
      4.5. The Forwarding Cache
      4.6. Packet Forwarding
      4.7. The "Keepalive" Protocol
      4.8. Control Message Precedence & Forwarding Criteria
      4.9. Broadcast LANs
      4.10. The "all-cbt-routers" Group
      4.11. Non-Member Sending
   5. Protocol Specification Details
      5.1. CBT HELLO Protocol
         5.1.1. Sending HELLOs
         5.1.2. Receiving HELLOs
      5.2. JOIN_REQUEST Processing
         5.2.1. Sending JOIN_REQUESTs
         5.2.2. Receiving JOIN_REQUESTs
         5.2.3. Additional Aspects Related to Receiving Multicast
                JOIN_REQUESTs
      5.3. JOIN_ACK Processing
         5.3.1. Sending JOIN_ACKs
         5.3.2. Receiving JOIN_ACKs
      5.4. QUIT_NOTIFICATION Processing
         5.4.1. Sending QUIT_NOTIFICATIONs
         5.4.2. Receiving QUIT_NOTIFICATIONs
      5.5. ECHO_REQUEST Processing
         5.5.1. Sending ECHO_REQUESTs
         5.5.2. Receiving ECHO_REQUESTs
      5.6. ECHO_REPLY Processing
         5.6.1. Sending ECHO_REPLYs
         5.6.2. Receiving ECHO_REPLYs
      5.7. FLUSH_TREE Processing
         5.7.1. Sending FLUSH_TREE messages
         5.7.2. Receiving FLUSH_TREE messages
   6. Timers and Default Values
   7. CBT Packet Formats and Message Types
      7.1. CBT Common Control Packet Header
      7.2. Packet Format for CBT Control Packet Types 0 - 6
         7.2.1. Option Type Definitions
   8. Core Router Discovery
      8.1. "Bootstrap" Mechanism Overview
      8.2. Bootstrap Message Format
      8.3. Candidate Core Advertisement Message Format
   Acknowledgements
   References
   Author Information

1. Changes from RFC 2189

   o  forwarding cache support for entries of different granularities,
      i.e. (*, G), (*, Core), or (S, G), and support for S and/or G
      masks for representing S and/or G aggregates

   o  included support for joins, quits (prunes), and flushes of
      different granularities, i.e. (*, G), (*, Core), or (S, G),
      where S and/or G can be aggregates

   o  optional one-way join capability

   o  improved the LAN HELLO protocol and included a state diagram

   o  revised packet format, and provided option support for all
      control packets

   o  added downstream state timeout to CBT router

   o  revised the CBT "keepalive" mechanism between adjacent on-tree
      CBT routers

   o  provided additional clarification of protocol events and
      mechanisms throughout

   Unfortunately, most of these changes are not backwards compatible
   with RFC 2189. At the time of writing, however, they come ahead of
   any widespread implementation or deployment.

2. Building a CBT Multicast Domain

   When building a CBT multicast domain that attaches to other
   multicast domains, this document should be used in conjunction with
   draft-ietf-idmr-cbt-br-spec-**.txt, which describes the CBT Border
   Router Specification and discusses various issues related to CBT
   domain interconnection.

3. Introduction & Terminology

   In CBT, a "core router" (or just "core") is a router which acts as
   a "meeting point" between a sender and group receivers. The term
   "rendezvous point (RP)" is used equivalently in some documents [2].
   A router that is part of a CBT distribution tree is known as an
   "on-tree" router. A router which is on-tree for a group is one
   which has forwarding state for the group.

   We refer to a broadcast interface as any interface that is
   multicast capable.

   An "upstream" interface (or router) is one which is on the path
   towards the group's core router with respect to this router. A
   "downstream" interface (or router) is one which is on the path away
   from the group's core router with respect to this router.

4. CBT Functional Overview

   The CBT protocol is designed to build and maintain a shared
   multicast distribution tree that spans only those networks and
   links leading to interested receivers.

4.1. The First Step: Joining the Tree

   As a first step, a host expresses its interest in joining a group
   by multicasting an IGMP host membership report [3] across its
   attached link.

   Note that all CBT routers, similar to other multicast protocol
   routers, are expected to participate in IGMP for the purpose of
   monitoring directly attached group memberships, and acting as IGMP
   querier should the need arise.

   On receiving an IGMP Group Membership Report, a local CBT router
   invokes the tree joining process (unless it has already) by
   generating a JOIN_REQUEST message, which is sent to the next hop on
   the path towards the group's core router (how the local router
   discovers which core to join is discussed in section 8). This join
   message must be explicitly acknowledged (JOIN_ACK) either by the
   core router itself, or by another router that is on the path
   between the sending router and the core, which itself has already
   successfully joined the tree.

   By default, joins/join-acks create bi-directional forwarding state,
   i.e. data can flow in the direction downstream -> upstream, or
   upstream -> downstream. In some circumstances a join/join-ack may
   include an option which results in uni-directional forwarding
   state; an interface over which a uni-directional join-ack is
   forwarded (not received) is marked as pruned. Data is permitted to
   be received via a pruned interface, but must not be forwarded over
   a pruned interface. Prune state can also be instantiated by the
   QUIT_NOTIFICATION message (see section 4.8).
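
   As a rough illustration of the join/acknowledgement step described
   above, the following Python sketch walks a join hop by hop towards
   the core and returns the router that would acknowledge it. The
   function name, the router labels, and the list-based path are
   hypothetical; join options and the granularity rules of section 4.8
   are deliberately left out.

```python
from typing import List, Set


def join_acknowledger(path_to_core: List[str], on_tree: Set[str]) -> str:
    """Return the router that acknowledges a join sent along path_to_core.

    path_to_core lists the hops after the originating router and ends
    with the core itself. The first hop that is already on-tree
    terminates the join; if there is none, the join reaches the core,
    which acknowledges it.
    """
    if not path_to_core:
        raise ValueError("path must end at the core router")
    for router in path_to_core:
        if router in on_tree:
            return router
    return path_to_core[-1]


# No router on the path has joined yet, so the core acknowledges.
print(join_acknowledger(["R4", "R1", "core"], on_tree=set()))           # core
# R4 and R1 are now on-tree, so a later join stops at R4.
print(join_acknowledger(["R4", "R1", "core"], on_tree={"R4", "R1"}))    # R4
```

   The acknowledgement then retraces the join's path in reverse, which
   is why every router the join traverses keeps state for it.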

   A join-request is made uni-directional by the inclusion of the
   "uni-directional" join option (see section 7.2.1), which is copied
   to the corresponding join-ack; join-request options are always
   copied to the corresponding join-ack.

   CBT now supports source specific joins/prunes so as to be better
   equipped when deployed in a transit domain; source specific control
   messages are only ever generated by CBT Border Routers (BRs).
   Source specific control messages follow G, not S, i.e. they are
   routed towards the core (not S) and no further. Thus, (S, G) state
   only exists on the "core tree" in a CBT domain - those routers and
   links between a BR and core router.

4.2. Transient State

   The join message sets up transient join state in the router that
   originates it (a LAN's designated router (DR)) and the routers it
   traverses (an exception is described in section 4.9), and this
   state consists of <group, [source], downstream address, upstream
   address>; "source" is optional, and relevant only to source
   specific control messages. On broadcast networks "downstream
   address" is the local IP address of the interface over which this
   router received the join (or IGMP Host Membership Report), and
   "upstream address" is the local IP address of the interface over
   which this router forwarded the join (according to unicast
   routing). On non-broadcast networks "downstream address" is the IP
   address of the join's previous hop, and "upstream address" is the
   IP address of the next hop (according to unicast routing).
   Transient state eventually times out unless the join is explicitly
   acknowledged. When a join is acknowledged, the transient join state
   is transferred to the router's multicast forwarding cache.

   If "downstream address" implies a broadcast LAN, the transient
   state MUST be able to distinguish between a member host being
   reachable over that interface, and a downstream router being
   reachable over that interface.
   This is necessary so that, on receipt of a JOIN_ACK, a router with
   transient state knows whether "downstream address" only leads to a
   group member, in which case the JOIN_ACK is not forwarded, or
   whether "downstream address" leads to a downstream router that
   either originated or forwarded the join prior to this router
   receiving it, in which case this router must forward a received
   JOIN_ACK. Precisely how this distinction is made is implementation
   dependent. A router must also be able to distinguish these two
   conditions with respect to its forwarding cache.

4.3. Getting "On-tree"

   A router which terminates a JOIN_REQUEST (see section 4.8) sends a
   JOIN_ACK in response. A join acknowledgement (JOIN_ACK) traverses
   the reverse path of the corresponding join message, which is
   possible due to the presence of the transient join state. Once the
   acknowledgement reaches the router that originated the join
   message, the new receiver can receive traffic sent to the group.

   A router is not considered "on-tree" until it has received a
   JOIN_ACK for a previously sent/forwarded JOIN_REQUEST, and has
   instantiated the relevant forwarding state.

   Loops cannot be created in a CBT tree because a) there is only one
   active core per group, and b) tree building/maintenance scenarios
   which may lead to the creation of tree loops are avoided. For
   example, if a router's parent router for a group becomes
   unreachable, the router (child) immediately "flushes" all of its
   downstream branches, allowing them to individually rejoin if
   necessary. Transient unicast loops do not pose a threat because a
   new join message that loops back on itself will never get
   acknowledged, and thus eventually times out.

4.4. Pruning and Prune State

   Any of a forwarding cache entry's children can be "pruned" by the
   immediate downstream router (child).
   In CBT, pruning is implemented by means of the QUIT_NOTIFICATION
   message, which is sent hop-by-hop in the direction: downstream -->
   upstream. A pruned child must be distinguishable from a non-pruned
   child - how is implementation dependent. One possible way would be
   to associate a "prune bit" with each child in the forwarding cache.

   The granularity of a quit (prune) can be (*, G), (*, Core), or
   (S, G). (*, Core) and (S, G) prunes are only relevant to core tree
   branches, i.e. those routers between a CBT BR and a core
   (inclusive). (*, G) prunes are applicable anywhere on a CBT tree.

   Refer to section 4.8 for the procedures relating to receiving and
   forwarding a quit (prune) message.

   Data is permitted to be received via a pruned interface, but must
   not be forwarded over a pruned interface. Thus, pruning is always
   uni-directional - it can stop data flowing downstream, but does not
   prevent data from flowing upstream.

   CBT BRs are able to take advantage of this uni-directionality; if
   the BR does not have any directly attached group members, and is
   not serving a neighbouring domain with group traffic, it can elect
   not to receive traffic for the group sourced inside, or received
   via, the CBT domain. At the same time, if the BR is the ingress BR
   for a particular (*, G), or (S, G), externally sourced traffic for
   (*, G) or (S, G) need not be encapsulated by the ingress BR and
   unicast to the relevant core router - the BR can send the traffic
   using native IP multicast.

4.5. The Forwarding Cache

   A CBT router MUST implement a multicast forwarding cache which
   supports source specific (i.e. (S, G)) as well as source
   independent (i.e. (*, G) and (*, Core)) entries. This forwarding
   cache is known as the router's private CBT forwarding cache, or
   PFC.

   All implementations SHOULD also implement a shared (i.e. protocol
   independent) multicast forwarding cache - recommended in [8] to
   facilitate interoperability - which is only used by Border Routers
   and shared by all protocols operating on the Border Router (hence
   "shared"). This forwarding cache is known as the router's shared
   forwarding cache, or SFC. By having all CBT implementations support
   an SFC, any CBT router is eligible to become a Border Router.

   (*, Core) entries are only relevant to a CBT PFC. This state is
   represented in the cache by specifying the core's IP unicast
   address in place of a group address/group address range.

   With respect to representing groups (G's) in the forwarding cache,
   G may be an individual Class D 32-bit group address, or may be a
   prefix representing a contiguous range of group addresses (a group
   aggregate). Similarly, for source specific PFC entries, S can be an
   aggregate. Therefore, the PFC SHOULD support the inclusion of masks
   or mask lengths to be associated with each of S and G.

   In CBT, all PFC entries require that an entry's "upstream"
   interface is distinguishable as such - how is implementation
   dependent. CBT uses the term "parent" interchangeably with
   "upstream", and "child/children" interchangeably with "downstream".

   Whenever the sending/receiving of a CBT join or prune results in
   the instantiation of more specific state in the router (e.g.
   (*, Core) state exists, then a (*, G) join arrives), the children
   of the new entry represent the union of the children from all other
   less specific forwarding cache entries, as well as the child
   (interface) over which the message was received (if not already
   included). This is so that at most a single forwarding cache entry
   need be matched with an incoming packet.

   Note that in CBT, there is no notion of "expected" or "incoming"
   interface for (S, G) forwarding entries - these are treated just
   like (*, G) entries.
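
   The child-list rule for newly instantiated, more specific state can
   be sketched as follows. This is only an illustration under
   simplifying assumptions: the Entry class, the tuple keys, and the
   interface labels are hypothetical, masks and aggregates are
   ignored, and the caller decides which existing keys count as less
   specific.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set, Tuple

Key = Tuple[str, str]  # (source or "*", group or core address)


@dataclass
class Entry:
    parent: Optional[str]                       # upstream interface; None on a core
    children: Set[str] = field(default_factory=set)


def add_more_specific(cache: Dict[Key, Entry], new_key: Key,
                      less_specific: Iterable[Key], arrival_iface: str,
                      parent: Optional[str]) -> Entry:
    """Create new_key whose children are the union of the children of
    all less specific entries plus the interface the message arrived on."""
    children = {arrival_iface}
    for key in less_specific:
        if key in cache:
            children |= cache[key].children
    cache[new_key] = Entry(parent=parent, children=children)
    return cache[new_key]


# (*, Core) state exists with one child; a (*, g1) join then arrives
# over a different interface.
cache = {("*", "core"): Entry(parent="to_R1", children={"to_BR"})}
entry = add_more_specific(cache, ("*", "g1"), [("*", "core")],
                          arrival_iface="to_R3", parent="to_R1")
print(sorted(entry.children))  # ['to_BR', 'to_R3']
```

   Copying the children at instantiation time is what allows a single
   best-match lookup at forwarding time.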

   Take the following example:

                                core
                                 |\
                                 | \
                                R1  \
                                 |  R2 - s1
                                 |  |
                                 |  |
                                R4-R3 - g1
                                 |
                                 |
                                BR

                              Figure 1.

   In figure 1 suppose R3 joins (*, g1) via the path R4 --> R1 -->
   core. BR joins (*, core) via the path R4 --> R1 --> core. BR issues
   a (s1, g1) QUIT_NOTIFICATION resulting in the instantiation of
   (s1, g1) between BR and the core. Thus, on R4 (s1, g1) and (*, g1)
   states exist. Assuming (s1, g1) state was instantiated on R4 AFTER
   (*, g1) state, R4's (s1, g1) child list comprises two interfaces,
   one pointing to BR, the other pointing to R3 (the latter copied
   from R4's (*, g1) entry). R4's (s1, g1) parent points towards R1.
   With respect to R4's (s1, g1) entry, it is not possible for R4 to
   determine which is the correct incoming interface for s1 traffic,
   since R2 may send s1 traffic towards the core, or towards R3. Thus,
   R4 may receive (s1, g1) traffic via any of its on-tree interfaces,
   though R4 will not forward the traffic over a pruned child.

   A router may delete a forwarding cache entry whose children are ALL
   marked as pruned as a result of receiving quit messages, provided
   there exists no less specific state with at least one non-pruned
   child.

   A core router's "parent" is always NULL.

4.6. Packet Forwarding

   When a data packet arrives, the forwarding cache is searched for a
   best matching (according to longest match) entry. If no match is
   found the packet is discarded. If the packet arrived natively it is
   accepted if it arrives via an on-tree interface, i.e. any interface
   listed in a matching entry, otherwise the packet is discarded.
   Assuming the packet is accepted, a copy of the packet is forwarded
   over each other (outgoing) non-pruned interface listed in the
   matching entry.
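
   The rule for natively received packets can be sketched as below.
   This is a simplified illustration rather than a complete
   implementation: the Entry class and interface labels are
   hypothetical, the lookup tries only an exact (S, G) key and then
   (*, G), and mask-based longest matching and (*, Core) entries
   (which would need a group-to-core mapping) are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Entry:
    parent: Optional[str]                         # None on a core router
    children: Set[str] = field(default_factory=set)
    pruned: Set[str] = field(default_factory=set)  # pruned children


def lookup(cache: Dict[Tuple[str, str], Entry],
           source: str, group: str) -> Optional[Entry]:
    """Most specific entry first: (S, G), then (*, G)."""
    entry = cache.get((source, group))
    if entry is None:
        entry = cache.get(("*", group))
    return entry


def forward_native(cache: Dict[Tuple[str, str], Entry],
                   source: str, group: str, arrival_iface: str) -> Set[str]:
    """Return the interfaces a natively received packet is copied to."""
    entry = lookup(cache, source, group)
    if entry is None:
        return set()                      # no match: discard
    on_tree = set(entry.children)
    if entry.parent is not None:
        on_tree.add(entry.parent)
    if arrival_iface not in on_tree:
        return set()                      # arrived off-tree: discard
    # Receiving via a pruned interface is allowed; sending over one is not.
    return {i for i in on_tree
            if i != arrival_iface and i not in entry.pruned}


# R4's (s1, g1) entry from figure 1, with the child towards BR pruned.
cache = {("s1", "g1"): Entry("to_R1", {"to_BR", "to_R3"}, {"to_BR"})}
print(sorted(forward_native(cache, "s1", "g1", "to_R3")))  # ['to_R1']
print(sorted(forward_native(cache, "s1", "g1", "to_BR")))  # ['to_R1', 'to_R3']
```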

   If the packet arrived IP-in-IP encapsulated and the packet has
   reached its final destination, the packet is decapsulated and
   treated as described above, EXCEPT the packet need not have arrived
   via an on-tree interface according to the matching entry.

4.7. The "Keepalive" Protocol

   The CBT forwarding state created by join/ack messages is soft
   state. This soft state is maintained by a separate "keepalive"
   mechanism rather than by join/ack refreshes.

   The CBT "keepalive" mechanism operates between adjacent on-tree
   routers. The keepalive mechanism is implemented by means of group
   specific ECHO_REQUEST and ECHO_REPLY messages, with the child
   routers responsible for periodically (explicitly) querying the
   parent router. The parent router (implicitly) monitors its children
   by expecting to periodically receive queries (ECHO_REQUESTs) from
   each child (per child router on non-broadcast networks; per child
   interface on broadcast networks). The repeated absence of either an
   expected query (ECHO_REQUEST) or expected response (ECHO_REPLY)
   results in the corresponding interface being marked as pruned in
   the router's forwarding cache. This constitutes a state timeout due
   to an exception condition. An interface can also be pruned in an
   explicit and timely fashion by means of either a QUIT_NOTIFICATION
   (downstream to upstream) or FLUSH_TREE (upstream to downstream)
   message.

   Note that the network path comprising a CBT branch only changes due
   to connectivity failure. An implementation could, however, invoke
   the tearing down and rebuilding of a tree branch whenever an
   underlying routing change occurs, irrespective of whether that
   change is due to connectivity failure. This is not CBT's default
   behaviour.

4.8. Control Message Precedence & Forwarding Criteria

   When a router receives a CBT join or quit (prune) message, if the
   message contains state for which the receiving router has no
   matching (according to longest match) state in its forwarding
   cache, the receiving router creates a forwarding cache entry for
   the corresponding state and forwards the control message upstream.

   CBT join and quit (prune) messages are forwarded as far upstream as
   the corresponding core router, or first router encountered with
   equally- or less specific state AND at least one other non-pruned
   child for that state. Forwarding state corresponding exactly to the
   granularity of the join/quit is instantiated in all routers between
   the join/quit originator and join/quit terminator, inclusive.

   Take the following examples:

                  core                     core
                   |                        |
                   |                        |
                   R                        R
                   |                       / \
                   |                      /   \
                   BR                   BR1   BR2

                Figure 2.                Figure 3.

   Assume in figure 2 BR has instantiated a priori (*, Core) state
   between itself and the core. It subsequently wishes to prune (*, G)
   from (*, Core) so sends a (*, G) QUIT_NOTIFICATION upstream. When
   the quit (prune) reaches router R, R already has less specific
   state (i.e. (*, Core)), but this quit message results in its only
   child interface (leading to BR) being marked as pruned under newly
   instantiated (*, G) state. Since router R has no other child under
   either (*, Core) or (*, G) states, R can forward the received
   (*, G) quit.

   Assume in figure 3 neither BR1 nor BR2 has a priori state between
   itself and the core. Assume that BR1 is explicitly notified by its
   neighbouring domain of group membership for G, causing BR1 to send
   a (*, G) (bi-directional) JOIN_REQUEST towards the core, via router
   R. R receives the join, instantiates (*, G) state, and forwards the
   join since it has no pre-existing equal- or less specific state.
   Assume subsequently that BR2 is explicitly notified by its
   neighbouring domain of interest in (S, G) traffic.
   BR2 sends a (S, G) (bi-directional) JOIN_REQUEST towards the core,
   via router R. R receives the join, and already has less specific
   state (i.e. (*, G)) with one other child (leading to BR1). Thus, R
   instantiates (S, G) state (copying children from (*, G) state) and
   includes the child (pointing to BR2), but does not forward the
   (S, G) join due to the pre-existence of less specific state with
   one other non-pruned child.

   A router with a forwarding cache entry whose children are ALL
   pruned can remove (delete) the corresponding entry UNLESS there
   exists less specific state with at least one non-pruned child. If
   an entry is eligible for deletion, a quit representing the same
   granularity as the forwarding cache entry is sent upstream.

   Returning to the example used with figure 2 above, when router R
   receives the (*, G) quit sent by BR and instantiates the
   corresponding state, R can send a (*, G) quit upstream since there
   are no less specific entries with _other_ non-pruned children.
   However, the (*, G) state in R cannot be removed despite all (*, G)
   children being pruned (in this example there is only one child)
   because a less specific (i.e. (*, Core)) cache entry exists with a
   non-pruned child.

   CBT flush messages are forwarded downstream removing all equally-
   and more specific state. A flush message is terminated by a leaf
   router, or a router with less specific state; the flush message
   does not affect the terminating router's less specific state.

4.9. Broadcast LANs

   It cannot be assumed all of the routers on a broadcast link have a
   uniform view of unicast routing; this is particularly the case when
   a broadcast link spans two or more unicast routing domains. This
   could lead to multiple upstream tree branches being formed for any
   one group (an error condition) unless steps are taken to ensure all
   routers on the link agree on a single LAN upstream forwarding
   router.

   CBT routers attached to a broadcast link participate in an explicit
   election mechanism that elects a single router, the designated
   router (DR); the DR is a "join broker" for all LAN routers in so
   far as joins are routed according to the DR's view of routing -
   without a DR there could be conflicts potentially resulting in tree
   loops. The router that actually forwards a join off-LAN for a group
   (towards the group's core) is known as the LAN "upstream router"
   for that group. A group's LAN upstream router may or may not be the
   LAN DR.

   With regard to a JOIN_REQUEST being multicast onto a broadcast LAN,
   the LAN DR decides over which interface to forward it. Depending on
   the group's core location, the DR may re-direct (unicast) the join
   back across the same link as it arrived to what it considers is the
   best next hop towards the core. In this case, the LAN DR does not
   keep any transient state for the JOIN_REQUEST it passed on. This
   best next hop router is then the LAN upstream forwarder for the
   corresponding group. This re-direction only applies to joins, which
   are relatively infrequent - native multicast data never traverses a
   link more than once.

   For the case where a DR *originates* a join, and has to unicast it
   to a LAN neighbour, the DR MUST keep transient state for the join.

   On broadcast LANs it is necessary for a router to be able to
   distinguish between a directly attached (downstream) group member,
   and any (at least one) downstream on-tree router(s). For a router
   to be able to send a QUIT_NOTIFICATION (prune) upstream it must be
   sure it has neither (downstream) directly attached group members
   nor on-tree routers reachable via a downstream interface. How this
   is achieved is implementation-dependent.
   One possible way would be for a CBT forwarding cache to maintain 2
   extra bits for each child entry - one bit to indicate the presence
   of a group member on that interface, the other bit indicating the
   presence of an on-tree router on that interface. Both these bits
   must be clear (i.e. unset) before this router can send a
   QUIT_NOTIFICATION for the corresponding state upstream.

4.10. The "all-cbt-routers" Group

   The IP destination address of CBT control messages is either the
   "all-cbt-routers" group address, or a unicast address, as
   appropriate.

   All CBT control messages are multicast over broadcast links to the
   "all-cbt-routers" group (IANA assigned as 224.0.0.15), with IP
   TTL 1. The exception to this is if a DR decides to forward a
   control packet back over the interface on which it arrived, in
   which case the DR unicasts the control packet. The IP source
   address of CBT control messages is the address of the sending
   router's outgoing interface.

   CBT control messages are unicast over non-broadcast media.

   A CBT control message originated or forwarded by a router is never
   processed by itself.

4.11. Non-Member Sending

   This section is relevant to non-member sending where the data is
   sourced inside the CBT domain.

   A host always originates native multicast data. All multicast
   traffic is received promiscuously by CBT routers. All but the LAN's
   designated router (DR) discard the packet. The DR looks up the
   relevant mapping, encapsulates (IP-in-IP) the data, and unicasts it
   to the group's core router. Consequently, no group state is
   required in the network between the first hop router and the
   group's core. On arriving at the core router, the data packet is
   decapsulated and disseminated over the group tree in the manner
   already described.

5. Protocol Specification Details

   Details of the CBT protocol are presented in the context of a
   single router implementation.

5.1. CBT HELLO Protocol

   The HELLO protocol is used to elect a designated router (DR) on
   broadcast-type links.

   A router represents its status as a link's DR by setting the
   DR-flag on that interface; a DR flag is associated with each of a
   router's broadcast interfaces. This flag can only assume one of two
   values: TRUE or FALSE. By default, this flag is FALSE.

   A network manager can influence a router's DR eligibility by
   optionally configuring a HELLO preference, which is included in the
   router's HELLO messages. Valid configuration values range from 1 to
   254 (decimal), 1 representing the "most eligible" value. In the
   absence of explicit configuration, a router assumes the default
   HELLO preference value of 255. The elected DR uses HELLO preference
   zero (0) in HELLO advertisements, irrespective of any configured
   preference. The DR continues to use preference zero for as long as
   it is running.

   HELLO messages are multicast periodically to the all-cbt-routers
   group, 224.0.0.15, using IP TTL 1. The advertisement period is
   specified by a hello timer, which is [HELLO_INTERVAL] seconds.

   HELLO messages have a suppressing effect on those routers which
   would advertise a "lesser preference" in their HELLO messages; a
   router resets its hello timer if the received HELLO is "better"
   than its own. Thus, in steady state, the HELLO protocol incurs very
   little traffic overhead.

   The DR election winner is that which advertises the lowest HELLO
   preference, or the lowest-addressed in the event of a tie.

   The situation where two or more routers attached to the same
   broadcast link are advertising HELLO preference 0 should never
   arise. However, should this situation arise, all but the lowest
   addressed zero-advertising router relinquish their claim as DR
   immediately by unsetting the DR flag on the corresponding
   interface.
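
   The election rule above (lowest preference wins, lowest address
   breaks a tie) amounts to an ordering on (preference, address)
   pairs. A minimal Python sketch follows; the function names and the
   example addresses are hypothetical, and IPv4 addresses are compared
   numerically, which is an assumption about what "lowest-addressed"
   means.

```python
import ipaddress
from typing import Iterable, Tuple

Hello = Tuple[int, str]  # (HELLO preference, sender IPv4 address)


def rank(hello: Hello) -> Tuple[int, int]:
    """Sort key: lower preference first, then numerically lower address."""
    preference, address = hello
    return (preference, int(ipaddress.IPv4Address(address)))


def is_better(received: Hello, own: Hello) -> bool:
    """True if the received HELLO beats the one this router would send."""
    return rank(received) < rank(own)


def elect_dr(hellos: Iterable[Hello]) -> Hello:
    """Return the winning HELLO among those seen on a link."""
    return min(hellos, key=rank)


link = [(255, "192.0.2.7"), (10, "192.0.2.9"), (10, "192.0.2.3")]
print(elect_dr(link))                                    # (10, '192.0.2.3')
# Two routers both claiming preference 0: the lower address keeps the DR role.
print(is_better((0, "192.0.2.1"), (0, "192.0.2.2")))     # True
```

   Comparing addresses as integers rather than strings matters here:
   as text, "192.0.2.10" would sort before "192.0.2.9".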
   The relinquishing router(s) subsequently advertise their previously
   used preference value in HELLO advertisements.

5.1.1. Sending HELLOs

   When a router starts up, it multicasts two HELLO messages over each
   of its broadcast interfaces in succession. The DR flag is initially
   unset (FALSE) on each broadcast interface. This avoids the
   situation in which each router on a broadcast subnet believes it is
   the DR, thus preventing the multiple forwarding of join-requests
   should they arrive during this start up period. If, after sending a
   HELLO message, no "better" HELLO message is received after HOLDTIME
   seconds, the router assumes the role of DR on the corresponding
   interface.

   Whenever a router's status goes from non-DR to DR it immediately
   sends a zero preferenced HELLO message.

   Once a router becomes DR on an interface, it should remain DR for
   as long as it is running (assuming a lower-addressed router on the
   same subnet does not advertise a zero-preferenced HELLO message).

   A router sends a HELLO message whenever its hello timer expires, or
   its transition timer [DR_TRANS_TIMER] (if running) expires.
   Whenever a router sends a HELLO message, it resets its hello timer.
   The hello timer of the DR is [HELLO_INTERVAL] seconds. The hello
   timer of all other (non-DR) routers is [HELLO_INTERVAL] + rnd
   seconds, where "rnd" is a random interval between 1 and [HOLDTIME]
   seconds.

5.1.2. Receiving HELLOs

   A router does not respond to a HELLO message if the received HELLO
   is "better" than its own, or equally preferenced but lower
   addressed. In this case, if the router has a transition timer
   [DR_TRANS_TIMER] running on the same interface, the timer is
   cancelled.

   A router must respond to a HELLO message if that received is lesser
   preferenced (or equally preferenced but higher addressed) than
   would be sent by this router over the same interface.
   The response HELLO is sent immediately by the DR; a non-DR sends it
   on expiry of an interval timer set to between one and [HOLDTIME]
   seconds. This interval is known as the [DR_TRANS_TIMER] interval.
   Non-DRs cancel this transition timer if a better HELLO is received
   whilst the timer is running.

   Figure 4 shows the state diagram for the HELLO protocol. The
   following apply to the state diagram:

   o  for the DR, hello timer = HELLO_INTERVAL

   o  for non-DR(s), hello timer = HELLO_INTERVAL + rnd

   o  rnd = random delay timer between 1 and HOLDTIME seconds

   o  the DR always sends HELLO messages with preference zero

   o  trans timer ([DR_TRANS_TIMER]) is a transition timer, set to rnd

   [Figure 4 is a state diagram with the states Init, DR, Not DR,
   "Not DR & recd worse hello", and DR wait. Its transitions carry the
   following event (E) and action (A) labels:

   o  Start to Init. A: send 2 HELLOs; start HOLDTIME.

   o  From Init. E: recv better HELLO. A: reset hello timer.

   o  From Init. E: recv worse HELLO.

   o  Into DR (two transitions). E: HOLDTIME expires. A: send HELLO;
      reset hello timer.

   o  From DR. E: recv worse HELLO. A: send HELLO; reset hello timer.

   o  From DR. E: hello timer expires. A: send HELLO; reset hello
      timer.

   o  "Not DR & recd worse hello" to Not DR. E: recv better HELLO.
      A: cancel trans timer; reset hello timer.

   o  DR wait to Not DR. E: recv better HELLO. A: reset hello timer.

   o  Not DR to "Not DR & recd worse hello". E: recv worse HELLO.
      A: start trans timer.

   o  Not DR to DR wait. E: hello timer expires. A: send HELLO; start
      HOLDTIME; reset hello timer.

   o  From Not DR. E: recv better HELLO. A: reset hello timer.

   o  E: trans timer expires. A: send HELLO; start HOLDTIME; reset
      hello timer.]

               Figure 4: HELLO Protocol State Diagram

5.2.  JOIN_REQUEST Processing

   A JOIN_REQUEST is the CBT control message used to register a member
   host's interest in joining the distribution tree for the group.

   A JOIN_REQUEST can be of (*, G), (*, Core), or (S, G) granularity.

5.2.1.  Sending JOIN_REQUESTs

   A JOIN_REQUEST can only be originated by a LAN designated router
   (DR), or by a CBT Border Router (BR). A join message cannot be sent
   by a router that is the core router for the group.

   A join message is sent hop-by-hop towards the core router for the
   group (see section 8, Core Router Discovery).

   Refer to section 4.8 for the procedures relating to
   forwarding/receiving a join message.

   A router sending a join message caches state for each join
   sent/forwarded. This state is known as "transient join state". The
   router MUST be able to distinguish between reaching a group member
   host, or a router, or both, via its "Downstream address". How this
   is achieved is implementation dependent (see section 4.9).

   A join originator is responsible for any retransmissions of this
   message if a response is not received within [RTX_INTERVAL].
   Retransmissions are not generated by any router other than the join
   originator.

   It is an error if no response is received after [JOIN_TIMEOUT]
   seconds. If this error condition occurs, the joining process may be
   re-invoked by the receipt of the next IGMP host membership report
   from a locally attached member host. IGMP host membership reporting
   may not be applicable to a CBT BR, and so it is recommended that
   [JOIN_TIMEOUT] be extended to, for example, 3 times the default
   value (see section 6).

5.2.2.  Receiving JOIN_REQUESTs

   If a JOIN_REQUEST is eligible for forwarding upstream (see section
   4.8), transient join state is created for this join (unless it
   already exists) and the join is forwarded upstream.

   If this transient join state is not "confirmed" with a join
   acknowledgement (JOIN_ACK), the state is timed out after
   [TRANSIENT_TIMEOUT] seconds.

   A join cannot be acknowledged by an on-tree router if the join
   arrives via the router's parent interface for the group.

   A router which originates an acknowledgement for a join never
   forwards the join further.

5.2.3.  Additional Aspects Related to Receiving Multicast JOIN_REQUESTs

   Some aspects related to receiving multicast joins have already been
   discussed in section 4.9.

   In addition to that section, if a router receives a multicast join
   and the router has a child interface deletion timer
   [CHILD_DEL_TIMER] running on the same interface that is equally- or
   less-specific than the received join, the timer is cancelled (see
   section 5.4.2). This router acknowledges the received join.

5.3.  JOIN_ACK Processing

   A JOIN_ACK is the mechanism used by a router to confirm to a
   downstream router that the upstream router has instantiated the
   desired forwarding state.

   A JOIN_ACK must be of the same granularity as the corresponding
   JOIN_REQUEST, and any JOIN_REQUEST options must be copied to the
   JOIN_ACK. The downstream router receiving the join-ack transfers its
   corresponding transient state to its forwarding cache, then removes
   the relevant transient state.

5.3.1.  Sending JOIN_ACKs

   A router which terminates a JOIN_REQUEST (see section 4.8) sends a
   JOIN_ACK in response. A JOIN_ACK is sent over the same interface as
   that on which the corresponding JOIN_REQUEST was received. Any
   options present in the join must be copied to the join-ack.
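   The handling of transient join state and the construction of a
   JOIN_ACK described in sections 5.2.2 through 5.3.1 can be sketched
   as follows. This is a minimal sketch, not an implementation of the
   protocol: the class, the dictionary message layout, and the state
   keys are hypothetical, and the timeout uses the section 6 default
   ([TRANSIENT_TIMEOUT] = [JOIN_TIMEOUT] = 3.5 * 5 seconds).

```python
TRANSIENT_TIMEOUT = 17.5  # seconds, section 6 default


class TransientJoins:
    """Hypothetical table of unconfirmed joins, keyed by state."""

    def __init__(self):
        self._pending = {}  # state key -> (created_at, downstream if)

    def on_join_forwarded(self, key, downstream_if, now):
        # Transient state is created only if it does not already exist.
        self._pending.setdefault(key, (now, downstream_if))

    def on_join_ack(self, key):
        """Confirm a join; return its downstream interface or None."""
        entry = self._pending.pop(key, None)
        return None if entry is None else entry[1]

    def expire(self, now):
        """Drop joins left unconfirmed for TRANSIENT_TIMEOUT seconds."""
        stale = [k for k, (created, _) in self._pending.items()
                 if now - created >= TRANSIENT_TIMEOUT]
        for k in stale:
            del self._pending[k]
        return stale


def build_join_ack(join_request):
    """Mirror a JOIN_REQUEST: same granularity, options copied."""
    return {"type": "JOIN_ACK",
            "state": join_request["state"],
            "options": list(join_request["options"])}
```

   Popping the entry in on_join_ack models the removal of transient
   state once it has been confirmed; installing the forwarding cache
   entry itself is left to the caller.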

   The sending of a JOIN_ACK that includes the "uni-directional" option
   over a child results in the child being pruned. The sending of a
   JOIN_ACK that includes no options over a child that is marked as
   pruned results in that child being "un-pruned".

5.3.2.  Receiving JOIN_ACKs

   An arriving JOIN_ACK must be matched to the corresponding entry in
   the router's cached transient state. If no match is found, the
   JOIN_ACK is discarded. If a match is found, a CBT forwarding cache
   entry is created (or updated) by transferring the necessary
   transient join state to the router's forwarding cache. The interface
   over which the join-ack arrives becomes the entry's parent.

   If the router's transient join state indicates that a router is
   present downstream, it forwards the join-ack accordingly. A join-ack
   is not forwarded downstream if this router's transient state
   indicates ONLY group member hosts reside downstream (as opposed to
   router(s)). A router's transient and forwarding states MUST be able
   to distinguish these two conditions.

   Once transient state has been confirmed by transferring it to the
   forwarding cache, the transient state is deleted.

5.4.  QUIT_NOTIFICATION Processing

   A QUIT_NOTIFICATION (quit or prune) is both a means of improving,
   i.e. speeding up, group leave latency for CBT leaf routers, and a
   means for CBT Border Routers to elect not to receive traffic either
   from sources within, or via, the CBT domain.

   A quit (prune) can be of (*, G), (*, Core), or (S, G) granularity. A
   single quit message can carry information representing multiple
   different states.

5.4.1.  Sending QUIT_NOTIFICATIONs

   A CBT router *originates* a QUIT_NOTIFICATION of the relevant
   granularity when all children of a forwarding cache entry become
   pruned, AND there exists no less specific state with at least one
   other non-pruned child.
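   The origination condition above can be sketched as a predicate.
   This is a sketch under assumptions: the function name and data
   layout are hypothetical, deciding which entries count as "less
   specific" is left to the caller, and any non-pruned child found on
   less specific state is treated as blocking the quit.

```python
def should_originate_quit(children, less_specific_entries=()):
    """Decide whether a forwarding cache entry should originate a quit.

    children: dict mapping child -> True if that child is pruned.
    less_specific_entries: child dicts of the entries that the caller
    regards as less specific than this one.
    """
    if not all(children.values()):
        return False  # at least one child of this entry is not pruned
    # A non-pruned child on less specific state keeps the branch alive.
    return not any(not pruned
                   for entry in less_specific_entries
                   for pruned in entry.values())
```

   For example, an entry whose only child is pruned still does not
   originate a quit while a less specific entry has a non-pruned child
   on the same router.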

   Forwarding rules for a quit are explained in section 4.8.

   A QUIT_NOTIFICATION is not acknowledged. To help ensure consistency
   between a child and parent router given the potential for loss of a
   QUIT_NOTIFICATION, a total of [MAX_RTX] QUIT_NOTIFICATIONs are sent,
   each HOLDTIME seconds after the previous one.

5.4.2.  Receiving QUIT_NOTIFICATIONs

   The receipt of a valid QUIT_NOTIFICATION results in the arrival
   interface being marked as pruned. Rules regarding the forwarding of
   a received quit (prune) are explained in section 4.8.

   If a quit is accepted and was unicast, the child via which the quit
   was received is added to the entry's child list (if not already
   present), and immediately marked as pruned.

   If the quit is accepted and was multicast, and the receiving router
   has pre-existing forwarding cache state of equal granularity, the
   router sets a child interface deletion timer [CHILD_DEL_TIMER] on
   the arrival interface with the same granularity.

   Because this router might be acting as a parent router for multiple
   downstream routers attached to the arrival link, the
   [CHILD_DEL_TIMER] interval gives those routers that did not send the
   QUIT_NOTIFICATION, but received it over their parent interface, the
   opportunity to ensure that the parent router does not remove the
   link from its child interface list. Therefore, on receipt of a
   multicast QUIT_NOTIFICATION over a PARENT interface, a receiving
   router schedules an ECHO_REQUEST for the group for sending at a
   random interval between 0 (zero) and HOLDTIME seconds. The
   granularity of the echo MUST be equal to or less specific than that
   of the received quit.

   The receipt of an ECHO_REQUEST for the group by the parent router
   over a child interface on which [CHILD_DEL_TIMER] is running for the
   group results in the timer being cancelled, provided the echo is
   equal to or less specific than the granularity of the timer.
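   The timing relationships in sections 5.4.1 and 5.4.2 can be worked
   through with the section 6 defaults. The sketch below is
   illustrative only: the function names are hypothetical, and the
   random echo delay is assumed to be drawn uniformly.

```python
import random

HOLDTIME = 3                      # seconds, default [HOLDTIME]
MAX_RTX = 3                       # default [MAX_RTX]
CHILD_DEL_TIMER = 1.5 * HOLDTIME  # seconds, default [CHILD_DEL_TIMER]


def quit_send_times(start):
    """Send times of the [MAX_RTX] quits, HOLDTIME seconds apart."""
    return [start + i * HOLDTIME for i in range(MAX_RTX)]


def echo_delay(rng=random):
    """Delay before a sibling's ECHO_REQUEST after a multicast quit."""
    return rng.uniform(0, HOLDTIME)
```

   With these defaults the quits go out at 0, 3, and 6 seconds, and a
   sibling's echo is scheduled at most 3 seconds after the quit, which
   is inside the parent's 4.5-second [CHILD_DEL_TIMER] (ignoring
   transmission delay), so the parent has time to cancel the timer.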

   If the [CHILD_DEL_TIMER] expires, it implies no downstream on-tree
   router is present on that interface. If no group member is present
   on the same interface, the child can be marked as pruned in the
   relevant forwarding cache entry.

5.5.  ECHO_REQUEST Processing

   The ECHO_REQUEST/ECHO_REPLY messages constitute a "keepalive"
   mechanism which allows a group's child and parent routers to monitor
   each other's liveness.

   ECHO_REQUESTs can be of (*, G), (*, Core), or (S, G) granularity. A
   single echo can carry information representing multiple different
   states.

   The following timers are specifically relevant to the "keepalive"
   mechanism. The granularity of the timers corresponds to the
   granularity of the state that is to be "kept alive", i.e. it can be
   (*, G), (*, Core), or (S, G), and is per interface: [ECHO_INTERVAL],
   [UPSTREAM_EXPIRE_TIME] (monitors parent interface), and
   [DOWNSTREAM_EXPIRE_TIME] (monitors child interface).

5.5.1.  Sending ECHO_REQUESTs

   Whenever a router creates a forwarding cache entry due to the
   receipt of a JOIN_ACK, the router begins the periodic sending of
   ECHO_REQUEST messages over its parent interface. The granularity of
   the echo is equal to that of the sending router's forwarding cache
   entry, i.e. (*, G), (*, Core), or (S, G). An ECHO_REQUEST is
   multicast (224.0.0.15, TTL 1) or unicast, as appropriate.

   ECHO_REQUEST messages are sent at [ECHO_INTERVAL] second intervals.
   To avoid undesirable synchronisation effects, each of a router's
   interface [ECHO_INTERVAL] timers includes a random response
   interval.

   Whenever an ECHO_REQUEST is sent, [ECHO_INTERVAL] is reset for each
   (*, G), (*, Core), or (S, G) reported in the ECHO_REQUEST.

   If no response is forthcoming, the upstream interface timer
   [UPSTREAM_EXPIRE_TIME] running on the upstream interface for the
   state reported in the ECHO_REQUEST will eventually expire.
   A FLUSH_TREE message is sent over all pruned and non-pruned
   children. The flush message reports the same state granularity as
   the echo for which no response was forthcoming.

5.5.2.  Receiving ECHO_REQUESTs

   Whenever an ECHO_REQUEST is received on an interface, if the
   router's interface is a parent interface for the reported state(s)
   it resets its [ECHO_INTERVAL] timer on that interface for those
   state(s), if appropriate. This implies that an ECHO_REQUEST which is
   multicast on a LAN suppresses the ECHO_REQUEST that is about to be
   sent by any other router(s) for the same state(s) over the same
   interface.

   If the router's receiving interface is a child interface for the
   reported state(s), it resets its [DOWNSTREAM_EXPIRE_TIME] timer on
   that interface for those state(s), if appropriate, and sends an
   ECHO_REPLY to the same child reporting all states for which this
   router is the parent for that child.

   Failure to receive an ECHO_REQUEST for a state(s) from a child after
   [DOWNSTREAM_EXPIRE_TIME] results in the immediate removal of the
   child from the relevant forwarding cache entry if the child is
   reachable via a non-broadcast network. If the child is reachable via
   a broadcast network, the expiry of [DOWNSTREAM_EXPIRE_TIME] results
   in the removal of the child from the router's relevant forwarding
   cache entry only if the router is sure no other downstream on-tree
   routers are reachable via the same interface, and no group members
   are present on that interface.

5.6.  ECHO_REPLY Processing

   ECHO_REPLY messages are sent in immediate response to ECHO_REQUEST
   messages received over a valid child interface for the reported
   state(s). The ECHO_REPLY reports all state(s) for which this router
   considers itself the parent to the echo-requesting child.

   A single ECHO_REPLY can carry information representing multiple
   different states.

5.6.1.  Sending ECHO_REPLY messages

   An ECHO_REPLY message is sent in immediate response to receiving an
   ECHO_REQUEST message via one of this router's valid children for the
   reported state(s). The ECHO_REPLY contains a list of all states for
   which this router considers itself the parent to the child.

5.6.2.  Receiving ECHO_REPLY messages

   For each state reported in an ECHO_REPLY message received from a
   valid parent, the timers [UPSTREAM_EXPIRE_TIME] and [ECHO_INTERVAL]
   are refreshed.

   Failure to receive the relevant ECHO_REPLY within [HOLDTIME] seconds
   of sending an ECHO_REQUEST results in the corresponding ECHO_REQUEST
   being resent. An ECHO_REQUEST can be resent a maximum of [MAX_RTX]
   times. If no response is forthcoming, the corresponding state(s) is
   removed from the parent after [UPSTREAM_EXPIRE_TIME] seconds, and a
   FLUSH_TREE message is sent over each of the children represented by
   the state(s). [Note: If this router has directly attached members
   for any of the flushed groups, the receipt of an IGMP host
   membership report for any of those groups will prompt this router to
   rejoin the corresponding tree(s).]

5.7.  FLUSH_TREE Processing

   The FLUSH_TREE (flush) message is the mechanism by which a router
   invokes the tearing down of all its downstream branches for a
   particular group.

   A flush can be of (*, G), (*, Core), or (S, G) granularity. A single
   flush message can carry information representing multiple different
   states.

5.7.1.  Sending FLUSH_TREE messages

   A FLUSH_TREE message is sent over all pruned and non-pruned children
   whenever a router loses connectivity to its parent. Once a flush
   message(s) has been sent, the relevant forwarding cache
   entry/entries are deleted.

5.7.2.  Receiving FLUSH_TREE messages

   CBT flush messages are forwarded downstream, removing all equally-
   and more specific state.
   A flush message is terminated by a leaf router, or a router with
   less specific state; the flush message does not affect the
   terminating router's less specific state.

6.  Timers and Default Values

   This section provides a summary of the timers described above,
   together with their recommended default values. Other values may be
   configured; if so, the values used should be consistent across all
   CBT routers attached to the same network.

   o  [HELLO_INTERVAL]: the interval between sending HELLO messages.
      Default: 60 seconds.

   o  [HELLO_PREFERENCE]: Default: 255.

   o  [HOLDTIME]: generic response interval. Default: 3 seconds.

   o  [DR_TRANS_TIMER]: random delay timer used in transition from
      non-DR to DR. Default: delay set at between 1 and [HOLDTIME]
      seconds.

   o  [MAX_RTX]: default maximum number of retransmissions. Default: 3.

   o  [RTX_INTERVAL]: message retransmission time. Default: 5 seconds.

   o  [JOIN_TIMEOUT]: raise exception due to tree join failure.
      Default: (3.5*[RTX_INTERVAL]) seconds.

   o  [TRANSIENT_TIMEOUT]: delete (unconfirmed) transient state.
      Default: [JOIN_TIMEOUT] seconds.

   o  [CHILD_DEL_TIMER]: remove child interface from forwarding cache.
      Default: (1.5*[HOLDTIME]) seconds.

   o  [UPSTREAM_EXPIRE_TIME]: time to send a QUIT_NOTIFICATION to our
      non-responding parent. Default: ([MAX_RTX]*[RTX_INTERVAL] +
      [HOLDTIME]) seconds.

   o  [DOWNSTREAM_EXPIRE_TIME]: not heard from child, time to remove
      child interface. Default: ([ECHO_INTERVAL] +
      [UPSTREAM_EXPIRE_TIME]) seconds.

   o  [ECHO_INTERVAL]: interval between sending ECHO_REQUESTs to parent
      routers. Default: 60 + rnd seconds, where "rnd" is between 0 and
      [HOLDTIME] seconds.

7.  CBT Packet Formats and Message Types

   CBT control packets are encapsulated in IP. CBT has been assigned IP
   protocol number 7 by IANA [4].

7.1.  CBT Common Control Packet Header

   All CBT control messages have a common fixed length header.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  vers |  type |   addr len    |           checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 5. CBT Common Control Packet Header

   This CBT specification is version 3.

   CBT packet types are:

   o  type 0: HELLO

   o  type 1: JOIN_REQUEST

   o  type 2: JOIN_ACK

   o  type 3: QUIT_NOTIFICATION

   o  type 4: ECHO_REQUEST

   o  type 5: ECHO_REPLY

   o  type 6: FLUSH_TREE

   o  type 7: Bootstrap Message (optional)

   o  type 8: Candidate Core Advertisement (optional)

   The remaining header fields are:

   o  Addr Length: address length in bytes of unicast or multicast
      addresses carried in the control packet.

   o  Checksum: the 16-bit one's complement of the one's complement sum
      of the entire CBT control packet.

7.2.  Packet Format for CBT Control Packet Types 0 - 6

   The following packet format is used on CBT control packet types
   0 - 6. For the format of CBT "Bootstrap" control packets, see
   section 8 below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   CBT Control Packet Header                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  # of groups  | # of options  |           reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  group (or Core) address #1                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       group address #2                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       group address #n                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  option type  |  option len   |        option value...        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 6. CBT Control Packet Format for Types 0 - 6.
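   The common header of section 7.1 and its checksum can be sketched as
   below. This is a sketch under stated assumptions rather than a
   reference encoder: the field widths (4-bit vers, 4-bit type, 8-bit
   addr len, 16-bit checksum) are read off the bit ruler in Figure 5,
   network byte order is assumed, and the function names are
   hypothetical.

```python
import struct

CBT_VERSION = 3


def internet_checksum(data):
    """16-bit one's complement of the one's complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries
    return ~total & 0xFFFF


def build_packet(msg_type, addr_len, body=b""):
    """Prepend the common header to `body` and fill in the checksum."""
    first = (CBT_VERSION << 4) | (msg_type & 0x0F)
    # The checksum field is zero while the checksum is computed.
    zeroed = struct.pack("!BBH", first, addr_len, 0) + body
    checksum = internet_checksum(zeroed)
    return struct.pack("!BBH", first, addr_len, checksum) + body
```

   A receiver can verify a packet by running internet_checksum over the
   whole packet as received; an undamaged packet yields zero.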

   Control Packet Field Definitions:

   o  # of groups: the number of individual (non-contiguous, or set of
      contiguous) groups that are included in the main body of this
      message.

   o  # of options: the number of distinct options (as defined by
      option type) carried in this control packet.

   o  group address #n: multicast group address. A control packet
      representing all groups associated with a core router (*, Core)
      includes only one group address field, which contains the unicast
      IP address of the relevant core router. Any group(s) exempted
      from those represented in the main body of the message, i.e.
      groups to which this message should not apply, are represented
      using option type 4 (see below).

   o  option type: unique option identifier.

   o  option len: option length. The number of bytes consumed by this
      option's value.

   o  option value: variable length option value.

   NOTE: all control messages are padded to a 32-bit boundary.

7.2.1.  Option Type Definitions

   o  type 1: Hello Preference. Applicable only to HELLO packets to
      denote this HELLO packet's preference value. This option consumes
      1 byte of "option value".

   o  type 2: Uni-directional. Applicable only to JOIN_REQUESTs to
      indicate a uni-directional join.

   o  type 3: Inclusion List. Enables the reporting of a contiguous set
      of groups to which this control message should apply, using a
      group mask. The mask is represented by an 8-bit "masklen" field
      which is always included as the first 8 bits of this option's
      value. One or more group prefixes follow, padded out (zeroed) to
      32 bits.

   o  type 4: Exclusion List. This option allows for the reporting of
      group(s) to be exempted from the set reported elsewhere in this
      control packet. A contiguous range of groups may be specified
      using a group mask. The mask is represented by an 8-bit "masklen"
      field which is always included as the first 8 bits of this
      option's value. One or more group prefixes follow, padded out
      (zeroed) to 32 bits.

   o  type 5: (Source, Group) Info. This option enables a control
      message to report source specific group information, e.g. it is
      used on (S, G) joins, (S, G) quits, and (S, G) flushes.

      The first 8 bits of this option's value represent the number of
      distinct (source, group) pairs ("# (S,G)") contained in this
      option. The following 8 bits represent the mask length
      ("masklen") to be applied to "source", which comprises the
      following 32 bits, padded out (zeroed) to 32 bits if necessary.
      "Group" is encoded exactly like "source", using an 8-bit masklen,
      followed by a group prefix, padded out to 32 bits if necessary.
      This sequence is repeated "# (S,G)" times.

8.  Core Router Discovery

   There are two available options for CBTv2 core discovery; the
   "bootstrap" mechanism (as currently specified with the PIM sparse
   mode protocol [2]) is applicable only to intra-domain core
   discovery, and allows for a "plug & play" type operation with
   minimal configuration. The disadvantage of the bootstrap mechanism
   is that it is much more difficult to affect the shape, and thus
   optimality, of the resulting distribution tree. Also, to be
   applicable, all CBT routers within a domain must implement the
   bootstrap mechanism.

   The other option is to manually configure leaf routers with
   group-to-core mappings (note: leaf routers only); this imposes a
   degree of administrative burden, since the mapping for a particular
   group must be coordinated across all leaf routers to ensure
   consistency. Hence, this method does not scale particularly well.
   However, it is likely that "better" trees will result from this
   method, and it is also the only option currently available for
   inter-domain core discovery.

8.1.  "Bootstrap" Mechanism Overview

   It is unlikely that the bootstrap mechanism will be appended to a
   well-known network layer protocol, such as IGMP [3], though this
   would facilitate its ubiquitous (intra-domain) deployment.
   Therefore, each multicast routing protocol requiring the bootstrap
   mechanism must implement it as part of the multicast routing
   protocol itself.

   A summary of the operation of the bootstrap mechanism follows
   (details are provided in [6]). It is assumed that all routers within
   the domain implement the "bootstrap" protocol, or at least forward
   bootstrap protocol messages.

   A subset of the domain's routers are configured to be CBT candidate
   core routers. Each candidate core router periodically (default every
   60 secs) advertises itself to the domain's Bootstrap Router (BSR),
   using "Core Advertisement" messages. The BSR is itself elected
   dynamically from all (or participating) routers in the domain. The
   domain's elected BSR collects "Core Advertisement" messages from
   candidate core routers and periodically advertises a candidate core
   set (CC-set) to each other router in the domain, using traditional
   hop-by-hop unicast forwarding. The BSR uses "Bootstrap Messages" to
   advertise the CC-set. Together, "Core Advertisements" and "Bootstrap
   Messages" comprise the "bootstrap" protocol.

   When a router receives an IGMP host membership report from one of
   its directly attached hosts, the local router uses a hash function
   on the reported group address, the result of which is used as an
   index into the CC-set. This is how local routers discover which core
   to use for a particular group.

   Note that the hash function is specifically tailored such that a
   small number of consecutive groups always hash to the same core.
   Furthermore, bootstrap messages can carry a "group mask",
   potentially limiting a CC-set to a particular range of groups.
   This can help reduce traffic concentration at the core.

   If a BSR detects a particular core as being unreachable (it has not
   announced its availability within some period), it deletes the
   relevant core from the CC-set sent in its next bootstrap message.
   This is how a local router discovers a group's core is unreachable;
   the router must re-hash for each affected group and join the new
   core after removing the old state. The removal of the "old" state
   follows the sending of a QUIT_NOTIFICATION upstream, and a
   FLUSH_TREE message downstream.

8.2.  Bootstrap Message Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               CBT common control packet header                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       For full Bootstrap Message specification, see [6]       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 7. Bootstrap Message Format

8.3.  Candidate Core Advertisement Message Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               CBT common control packet header                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  For full Candidate Core Adv. Message specification, see [6]  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 8. Candidate Core Advertisement Message Format

Acknowledgements

   Special thanks go to Paul Francis, NTT Japan, for the original
   brainstorming sessions that brought about this work.

   The use of a single core model since CBTv2 owes much to Clay Shields
   and his work on Ordered CBT (OCBT) [7].
   Clay identified and proved several failure modes of CBT(v1) as it
   was specified with multiple cores, and also suggested using an
   unreliable quit mechanism, which has appeared since the CBTv2
   specification as the QUIT_NOTIFICATION. Clay also provided more
   general constructive comments on the CBT architecture and
   specification.

   Others that have contributed to the progress of CBT include Ken
   Carlberg, Eric Crawley, Jon Crowcroft, Bill Fenner, Mark Handley,
   Ahmed Helmy, Nitin Jain, Alan O'Neill, Steven Ostrowsksi, Radia
   Perlman, Scott Reeve, Benny Rodrig, Martin Tatham, Dave Thaler, Sue
   Thompson, Paul White, and other participants of the IETF IDMR
   working group.

   Thanks also to 3Com Corporation and British Telecom Plc for
   assisting with funding this work.

References

   [1] Core Based Trees (CBT) Multicast Routing Architecture; A.
   Ballardie; RFC 2201; ftp://ds.internic.net/rfc/rfc2201.txt.

   [2] Protocol Independent Multicast (PIM) Sparse Mode/Dense Mode; D.
   Estrin et al.; http://netweb.usc.edu/pim; RFC XXXX and working
   drafts.

   [3] Internet Group Management Protocol, version 2 (IGMPv2); W.
   Fenner;
   ftp://ds.internic.net/internet-drafts/draft-ietf-idmr-igmp-v2-08.txt;
   Working draft, 1998.

   [4] Assigned Numbers; J. Reynolds and J. Postel; RFC 1700, October
   1994.

   [5] CBT Multicast Border Router Specification; A. Ballardie, B.
   Cain, Z. Zhang;
   ftp://ds.internic.net/internet-drafts/draft-ietf-idmr-cbt-br-spec-**.txt;
   Working draft, March 1998.

   [6] A Dynamic Bootstrap Mechanism for Rendezvous-based Multicast
   Routing; D. Estrin et al.; Technical Report;
   http://catarina.usc.edu/pim

   [7] The Ordered Core Based Tree Protocol; C. Shields and J.J.
   Garcia-Luna-Aceves; In Proceedings of IEEE Infocom'97, Kobe, Japan,
   April 1997;
   http://www.cse.ucsc.edu/research/ccrg/publications/infocomm97ocbt.ps.gz

   [8] Interoperability Rules for Multicast Routing Protocols; D.
   Thaler;
   ftp://ds.internic.net/internet-drafts/draft-thaler-multicast-interop-01.txt;
   March 1997.

Author Information:

   Tony Ballardie,
   Research Consultant,
   e-mail: ABallardie@acm.org

   Brad Cain,
   Bay Networks Inc.,
   3, Federal Street,
   Billerica, MA 01821, USA.
   e-mail: bcain@baynetworks.com
   voice: +1 978 916 1316

   Zhaohui "Jeffrey" Zhang,
   Bay Networks Inc.,
   600 Technology Park Drive,
   Billerica, MA 01821, USA.
   Phone: +1 (978) 439 0280
   Fax: +1 978 670 8760
   e-mail: zzhang@baynetworks.com