HTTP/1.1 200 OK Date: Tue, 09 Apr 2002 02:53:05 GMT Server: Apache/1.3.20 (Unix) Last-Modified: Fri, 09 Feb 1996 23:00:00 GMT ETag: "2edafb-b59f-311bd1f0" Accept-Ranges: bytes Content-Length: 46495 Connection: close Content-Type: text/plain Inter-Domain Multicast Routing (IDMR) A. J. Ballardie INTERNET-DRAFT University College London February 9th, 1996 Core Based Trees (CBT) Multicast Architecture Status of this Memo This document is an Internet Draft. Internet Drafts are working do- cuments of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute work- ing documents as Internet Drafts). Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress." Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other In- ternet Draft. Abstract CBT is a new multicast architecture that builds a single, shared delivery tree. Traditional multicast algorithms build a source-based tree per sender subnetwork, as does DVMRP, the protocol currently de- ployed in the Internet's multicast backbone (MBONE) [1]. The primary advantage of the shared tree approach is that it typically offers more favourable scaling characteristics than do existing multicast algorithms. The CBT protocol [2] is a network layer multicast protocol that builds a shared delivery tree. The CBT protocol also allows for the incorporation of security features [2, 3], and the potential to reserve network resources as part of delivery tree set-up [14]. The sending and receiving of multicast data by hosts on a subnetwork conforms to the traditional IP multicast service model [4]; multicast data on a subnetwork appears no different to that of existing schemes. Expires August 9th, 1996 [Page 1] INTERNET-DRAFT CBT Multicast Architecture February 1996 CBT is progressing through the IDMR working group of the IETF. The CBT protocol is described in an accompanying document [2]. For this, and all IDMR-related documents, see http://www.cs.ucl.ac.uk/ietf/idmr TABLE OF CONTENTS 1. Background................................................... 3 2. Introduction................................................. 4 3. Source Based Tree Algorithms & the Motivation for Shared Trees.............................. 5 3.1 Distance-Vector Multicast Algorithm...................... 5 3.2 Link State Multicast Algorithm........................... 6 3.3 The Motivation for Shared Trees.......................... 7 4. CBT - The New Architecture................................... 8 4.1 Design Requirements...................................... 8 4.2 Architectural Overview................................... 11 4.3 Core Selection, Placement, and Management................ 13 5. Architectural Justification: Comparisons & Simulation Results............................. 15 5.1 Traffic Concentration.................................... 19 5.2 Shared Trees and Load Balancing.......................... 19 6. Conclusions.................................................. 19 Acknowledgements................................................ 20 APPENDIX........................................................ 21 Expires August 9th, 1996 [Page 2] INTERNET-DRAFT CBT Multicast Architecture February 1996 1. Background Centre based forwarding was first described in the early 1980s by Wall in his PhD thesis on broadcast and selective broadcast. At this time, the benefits and uses of multicast were not fully understood. It wasn't until much later that the IP multicast address space was defined (class D space), and Deering defined the IP multicast service model [4]. Deering's work was pioneering in that he invented algo- rithms that allowed hosts to arbitrarily join a multicast group (IGMP [5]), and enable a local multicast-capable router to become part of a receiver-initiated wide-area delivery tree. The latter consituted algorithms that build sourced-based delivery trees, with one delivery tree per sender subnetwork. These algorithms are documented in [4]. The multicast-capable part of the Internet (MBONE [1]) primarily implements an a protocol instance of the distance-vector multicast algorithm [4] called DVMRP. Now we have several years practical experience with multicast, and a diversity of multicast applications, from shared workspace tools, to audio- and video-conferencing tools, with new applications emerging all the time (e.g. distributed video gaming), some of which have wide-ranging multicast requirements. Furthermore, the MBONE has been constantly growing since the first public audiocast (audio multicast) of a 1992 IETF meeting [6] took place. Then, the MBONE intercon- nected 40 subnets in 4 different countries. Statistics suggest that the MBONE has doubled in size over the past eight months, and as of July 1995 comprises over 2,500 subnetworks spanning 25 countries [1]. The obvious popularity and growth of multicast means that the scaling aspects of wide-area multicasting cannot be overlooked; some predic- tions talk of thousands of groups being present at any one time in the Internet. Distributed Interactive Simulation (DIS) applications [15] have real-time requirements (in terms of join latency, group membership, group sender populations) that far exceed the require- ments of most applications currently in use. Scalability is measured in terms of network state maintenance, bandwidth efficiency, and protocol overhead. Other factors that can affect these parameters include sender set size, and wide-area dis- tribution of group members. Expires August 9th, 1996 [Page 3] INTERNET-DRAFT CBT Multicast Architecture February 1996 2. Introduction Deering's algorithms build source-based multicast delivery trees, that is, the delivery tree is rooted at a multicast-capable router on the subnetwork directly attached to a sender. The IGMP protocol [5] operates between hosts and multicast router(s) on a subnetwork, ena- bling multicast routers to monitor group presence on attached sub- nets, so that on receipt of a multicast packet, a multicast router knows over which interfaces group members are located. Multicasting on the local subnetwork does not require either the presence of a multicast router or the implementation of a multicast algorithm; on most shared media (e.g. Ethernet), a host, which need not necessarily be a member, simply sends a multicast data packet, which is received by any member hosts. For multicasts to extend beyond the scope of the local subnetwork, the subnet must have a multicast-capable router attached, which itself is attached (possibly virtually) to another multicast-capable router, and so on. The col- lection of these (virtually) connected multicast routers forms the Internet's MBONE [1]. Each multicast router must implement the same multicast routing protocol to ensure full multicast connectivity (footnote). For wide-area multicasting then, multicast routers "con- spire" to get multicast data to the group's receivers, wherever they be located. This is the job of the multicast routing algorithm. Different algorithms use different techniques for establishing a dis- tribution tree. If we classify these algorithms into source-based tree algorithms and shared tree algorithms, we'll see that the dif- ferent classes have considerably different scaling characteristics, and the characteristics of the resulting trees differ too, for exam- ple, average delay. Let's look at source-based tree algorithms first. _________________________ Domains that deploy different multicast protocols may be interconnected using a common multicast protocol at domain boundaries (border routers). A special protocol interface needs to be implemented in these border routers. Expires August 9th, 1996 [Page 4] INTERNET-DRAFT CBT Multicast Architecture February 1996 3. Source-Based Tree Algorithms & the Motivation for Shared Trees The strategy we'll use for motivating (CBT) shared tree multicast is based, in part, in explaining the characteristics of source-based tree multicast, in particular its scalability. We then summarize the primary motivation for shared trees in section 3.3. Source-based tree multicast algorithms are often referred to as "dense-mode" algorithms; they assume that the receiver population densely populates the domain of operation, and therefore the accom- panying overhead (in terms of state, bandwidth usage, and/or process- ing costs) is justified. Whilst this might be true of a local environment, it is almost never true of the wide-area environment, especially since wide-area group membership tends to be sparsely dis- tributed throughout the Internet. There may be "pockets" of dense- ness, but if one views the global picture, wide-area groups tend to be sparsely distributed. Source-based multicast trees are either built by a distance-vector style algorithm, which may be implemented separately from the unicast routing algorithm (as is the case with DVMRP), or the multicast tree may be built using the information present in the underlying unicast routing table (as is the case with PIM-DM [7]). The other algorithm used for building source-based trees is the link-state algorithm (a protocol instance being M-OSPF [8]). 3.1. Distance-Vector Multicast Algorithm The distance-vector multicast algorithm builds a multicast delivery tree using a variant of the Reverse-Path Forwarding technique [9]. The technique basically is as follows: when a multicast router receives a multicast data packet, if the packet arrives on the inter- face used to reach the source of the packet, the packet is forwarded over all outgoing interfaces, except leaf subnets (footnote) with no members attached. Otherwise the packet is discarded. This consti- tutes a "broadcast & prune" approach. Subsequently, outgoing inter- faces may be "pruned"; when a data packet reaches a leaf router, if that router has no membership registered on its directly attached subnetworks, the router may send a prune message one hop back towards _________________________ A "leaf" subnet is one with no downstream routers at- tached. A "leaf" router is one which no other router uses to reach the source. Expires August 9th, 1996 [Page 5] INTERNET-DRAFT CBT Multicast Architecture February 1996 the source. The receiving router then checks its leaf subnets for group membership, and checks whether it has received a prune from all of its downstream routers. If the checks succeed, the router itself can send a prune upstream over the interface leading to the source. Thus, prune messages prune the tree branches not leading to group members. Prune messages are stored per pair, and are subject to timeout, after which data flows over the branch again. If there are still no members present, the pruning process repeats itself. State that "times out" in this way is referred to as "soft state". It can be inferred from the above algorithm that multicast routers must maintain state per group per active source. This source-specific state manifests itself in the multicast forwarding table, where, for each active source, the outgoing interface list is dependent on the prunes received, and on the time since they were received (because they time out). As we said, most wide-area groups are likely to be sparsely distri- buted. As a result, it is fair to assume that some potentially large number of routers in the internetwork, i.e. those not leading to downstream receivers, are incurred the cost of storing state. The "broadcast & prune" nature of the distance-vector multicast algo- rithm, the sparseness of receivers, combined with the fact that many wide-area links have limited bandwidth, demonstrates that such an algorithm cannot scale to a large internetwork that supports large numbers of groups. 3.2. Link-State Multicast Algorithm Routers implementing a link state algorithm periodically collect reachability information to their directly attached neighbours, then flood this throughout the routing domain in so-called link state update packets. Deering extended the link state algorithm for multi- casting by having a router additionally detect group membership changes on its incident links before flooding this information in link state packets. Each router then, has a complete, up-to-date image of a domain's topology and group membership. On receiving a multicast data packet, each router uses its membership and topology information to calculate a shortest-path tree rooted at the sender subnetwork. Provided the Expires August 9th, 1996 [Page 6] INTERNET-DRAFT CBT Multicast Architecture February 1996 calculating router falls within the computed tree, it forwards the data packet over the interfaces defined by its calculation. Hence, multicast data packets only ever traverse routers leading to members, either directly attached, or further downstream. That is, the delivery tree is a true multicast tree right from the start. However, the flooding (reliable broadcasting) of group membership information is the predominant factor preventing the link state mul- ticast algorithm being applicable over the wide-area. The other lim- iting factor is the processing cost of the Dijkstra calculation to compute the shortest-path tree for each active source. 3.3. The Motivation for Shared Trees The algorithms described in the previous sections clearly motivate the need for a multicast algorithm(s) that is more scalable. CBT was designed primarily to address the topic of scalability; a shared tree architecture offers an improvement in scalability over source tree architectures by a factor of the number of active sources (where source is a subnetwork aggregate). Source trees scale O(S * G), since all sources use the same shared tree; shared trees eliminate the source (S) scaling factor, and hence a shared tree scales O(G). The implication of this is that applications with many active senders, such as distributed interactive simulation applications, and distri- buted video-gaming (where most receivers are also senders), scale considerably better if shared trees are used. In the table below we compare the amount of state required by CBT and DVMRP for different group sizes with different numbers of active sources: Expires August 9th, 1996 [Page 7] INTERNET-DRAFT CBT Multicast Architecture February 1996 |--------------|---------------------------------------------------| | Number of | | | | | groups | 10 | 100 | 1000 | ==================================================================== | Group size | | | | | (# members) | 20 | 40 | 60 | -------------------------------------------------------------------| | No. of srcs | | | | | | | | | | | per group |10% | 50% |100% |10% | 50% |100% |10% | 50% | 100% | -------------------------------------------------------------------- | No. of DVMRP | | | | | | | | | | | router | | | | | | | | | | | entries | 20 | 100 | 200 |400 | 2K | 4K | 6K | 30K | 60K | -------------------------------------------------------------------- | No. of CBT | | | | | router | | | | | entries | 10 | 100 | 1000 | |------------------------------------------------------------------| Figure 1: Comparison of DVMRP and CBT Router State There are also additional significant bandwidth and state savings with shared trees in contrast to source trees; firstly, the tree only spans a group's receivers (including links/routers leading to receivers) -- there is no cost to routers/links in other parts of the network. Secondly, routers between a non-member sender and the delivery tree are not incurred any cost pertaining to multicast, and indeed, these routers need not even be multicast-capable -- packets from non-member senders are encapsulated and unicast towards a core on the tree. 4. CBT - The New Architecture 4.1. Design Requirements The CBT shared tree design was geared towards several design objec- tives: Expires August 9th, 1996 [Page 8] INTERNET-DRAFT CBT Multicast Architecture February 1996 + scalability + routing protocol independence + robustness + interoperability As we have mentioned, a shared multicast tree improves scalability by an order of magnitude. A facet of existing source tree algorithms is that they are all tied inextricably, one way or another, to a particular routing protocol. For example, DVMRP is based heavily on RIP, and relies for its correct operation on some of the features of RIP (e.g. poison reverse). Similarly, with M-OSPF. With the multicast infrastructure fast converging on the unicast infrastructure, which is heterogeneous in nature, the extent to which a particular multicast algorithm can be deployed is severely limited if deployment is dependent on the presence of a particular unicast algorithm. Therefore, it makes good sense to separate out these dependencies from the multicast algorithm; this is exactly what CBT does, as does PIM [7]. The CBT protocol makes use of the underlying unicast routing table when building its delivery tree, independent of how the unicast table(s) is computed. Source-based tree algorithms are clearly robust; a sender simply sends its data, and intervening routers "conspire" to get the data where it needs to, creating state along the way. This is the so- called "data driven" approach -- there is no set-up protocol involved. It is not as easy to achieve the same degree of robustness in shared tree algorithms, since there is network entity involved which is the focal point of the tree, which effectively, keeps a shared tree fully connected. This entity is just a pre-nominated router, to which all receivers (their directly-attached routers) must explicitly join. In actual fact, any shared tree is likely to have several such so-called "cores", or "rendevous points (RPs)" to increase robustness. CBT and PIM diverge in their approaches to robustness; PIM attempts to maintain a data-driven approach, with only one RP active at any one time. In CBT, all cores are active. PIM's desire to emulate the robustness of source trees comes at a cost, especially in terms of Expires August 9th, 1996 [Page 9] INTERNET-DRAFT CBT Multicast Architecture February 1996 protocol complexity, and an overall increased bandwidth requirement [10]. CBT, on the other hand, adopts the "hard state" approach, whereby a tree branch is created reflecting underlying unicast at the instant it is created; thereafter, the same tree branch does not necessarily change to reflect underlying unicast changes. This has positive implications for the case where a branch is created together with a resource reservation. The reservation remains fixed unless a neighbouring link/router fails, in which case there is no option but to rejoin the tree by means of explicit protocol, and request the resources again. However, the router to which a branch is rejoined may not be able to honour the same reservation request. CBT branches are torn down by means of explicit protocol messages, whereas PIM branches time out. PIM incurs protocol complexity to manage its various timers (to keep state where it is required), and protocol "refresh" messages consume bandwidth. CBT implements a "keepalive" mechanism between adjacent routers on a tree; these are CBT's only periodic tree maintenance messages, and they are aggre- gated such that there is one per link, not one per group per link. Both protocols reduce the frequency of tree maintenance messages as the number of messages increases. A comparison of protocol costs/state/scalability etc. is summarised in section 5, and is presented in detail in [10]. With regards to interoperability, the type of multicast delivery tree interconnecting member hosts (subnets) over the wide-area is tran- sparent to those hosts; hosts send/receive multicast data in tradi- tional IP format, and hosts are not involved explicitly in any way with tree set-up. This is the sole responsibility of local multicast routers. CBT also accommodates interoperability between "clouds" or domains operating different multicast protocols. It is expected that intero- perability between different multicast protocols only be relevant at domain (or region, or "cloud") boundaries; inside a particular domain only a single multicast protocol is expected to be present. One method of how different trees can be "spliced" together at a domain boundary is presented in [11]. Homogeneous clouds, as described in the previous paragraph, means that multicast data can travel in what we call "native mode", i.e. no encapsulation is required that keeps data, from different groups using different multicast protocols, separate. CBT also specifies a "CBT mode", where a particular cloud is heterogeneous, i.e. multiple Expires August 9th, 1996 [Page 10] INTERNET-DRAFT CBT Multicast Architecture February 1996 multicast protocols are present possibly on the same subnet; CBT mode data is encapsulated with a CBT header to distinguish it from any other multicast traffic, for example, traffic that is using a source based tree. These two "modes" are described in the CBT specification document [2]. 4.2. Architectural Overview Shared trees were first described by Wall in his investigation into low-delay approaches to broadcast and selective broadcast [12]. Wall concluded that delay will not be minimal, as with shortest-path trees, but the delay can be kept within bounds that may be accept- able. Simulations have recently been carried out to compare various scaling characteristics of centre-based and shortest-path trees. A summary of these simulations can be found in section 5, and the details are presented in [10]. A CBT shared tree involves having a small set of pre-nominated cores (routers), to which routers connected to member hosts join by means of an explicit protocol control message. Any core is eligible for joining, although only a single core is targetted by a join message. On receipt of a join, the core router replies with an acknowledge- ment, which traverses the reverse-path of the corresponding join (the join sets up transient state along the way). As such, tree branches are created, and parent-child relationships set up between adjacent CBT routers on the tree, the parent being the router nearer the core. When the ack is received, the router creates a CBT forwarding infor- mation base (FIB) entry, listing interfaces corresponding to a par- ticular group. Due to the way the tree is formed, each tree link is symmetric, and the tree reflects an undirected graph. Tree branches are made up of so-called non-core routers, which form a shortest forward path between a member-host's directly attached router, and the core. A router at the end of a branch is known as a leaf router on the tree. A diagram showing a single-core CBT tree is shown in the figure below. Only one core is shown to demonstrate the principle. Expires August 9th, 1996 [Page 11] INTERNET-DRAFT CBT Multicast Architecture February 1996 b b b-----b \ | | \ | | b---b b------b / \ / KEY.... / \/ b X---b-----b X = Core / \ b = non-core router / \ / \ b b------b / \ | / \ | b b b Figure 2: Single-Core CBT Tree Join messages do not necessarily travel all the way to the core before being acknowledged; if a join message hits a router on the tree (on-tree router) on its journey towards a core, that on-tree router terminates the join and acknowledges it. A router that is itself in the process of joining the tree (i.e. one that has not yet received an ack itself) can terminate and cache any subsequent received joins, but cannot acknowledge them until its own join has been successfully acknowledged. For simplicity, part of the CBT protocol [2] is data driven; one of a set of cores for a group is elected (by a group initiator) as the PRIMARY core, all others are termed SECONDARY cores. The ordering of the secondary cores is irrelevant, however, the primary must be uni- form across the whole group. Whenever a secondary core is joined, it first acks the join then proceeds to join the primary core -- the primary core simply listens for incoming joins, it need never send a join itself. In this way, a CBT tree becomes fully connected in response to members joining. The tree joining process is triggered when a subnet's CBT-capable router receives an IGMP group membership report [5]. If more than one CBT router is present on a subnetwork, only one router, the subnet's designated router (DR), generates a join message. A subnet's DR is implicitly elected (i.e. no CBT protocol involvement) as a "side- effect" of IGMP. Expires August 9th, 1996 [Page 12] INTERNET-DRAFT CBT Multicast Architecture February 1996 CBT builds a loop-free shared tree. In some scenarios it is necessary to take explicit precautions against loops, in others it is not. For example, a loop cannot form between a router that wishes to join the tree, if that router has no child tree branches (interfaces). There is a potential for loops if a joining router has at least one child attached. This scenario would constitute a rejoin. Once the rejoining router has received an ack, it must generate a loop detection which is sent over its parent interface. Receiving routers must forward this packet over their parent interface only, unless it is received by its originator, in which case a loop is present, or it is recieved by the primary core, in which case no loop is present. In the former case, the loop is broken by the originator sending a quit packet to its new parent, and the latter case solicits an ack from the primary core, confirming the absence of a loop. Tree maintenance takes place between adjacent on-tree routers in the form of protocol "keepalive" messages. Only one of these need be sent per link (not per group), and this message is sent in the direction child -> parent. The parent sends a corresponding acknowledgement. This is how tree connectivity is monitored, and breakages/failures detected. When a CBT router receives a multicast data packet, it simply for- wards it over all outgoing interfaces, as specified in its FIB entry for the group. Protocol mechanisms help ensure that data packets never loop. The CBT protocol is simple and lightweight. The CBT protocol specifi- cation is described in an separate document [2]. 4.3. Core Selection, Placement, and Management The issues of core (RP) selection, placement, and management are still under review, although several recent advancements have been made [13]. Until recently, it was thought that hosts would need to be explicitly involved in discovering core addresses corresponding to a particular group. This would require host system changes, and modifications to applications, such that an application could request the host to establish a mapping for a given group, as well as some sort of core advertisement protocol in which hosts and core routers Expires August 9th, 1996 [Page 13] INTERNET-DRAFT CBT Multicast Architecture February 1996 participate. A dependency on host or application involvement with core (RP) discovery is therefore very undesirable. Indeed, the type of multi- cast tree to be joined should be invisible to hosts (and even appli- cations). A new proposal has recently emerged called Hierarchical PIM [13], which proposes having a multi-level hierarchy of RPs (cores), but which are known only to multicast routers; cores (RPs) remain invisi- ble to hosts. Whilst its name suggests it is specific to PIM, its principles are not. Each level of hierarchy corresponds to a multicast "scope level", with a certain multicast address range corresponding to each scope. This therefore, requires the multicast address space to be officially "administratively scoped"; a group initiator chooses a multicast address based on the perceived scope of the group. On receipt of an IGMP group membership report from a host, a local router (the lowest-level of the core (RP) hierarchy) knows, based on the group address, whether it has to join a core at the next level up in the hierarchy; each candidate RP at a particular level advertises itself within its own level (using a well-known scoped multicast address), and to the level below. Hence, an RP should always know how to reach an RP at the next level up in the hierarchy. A next-level RP is chosen by using a hash function across the group address. The lower the hierarchy level, the more densely RPs populate that level. However, at the top level (for globally-scoped groups) it is expected there will be sufficient RP distribution so that shared tree paths (i.e. delay) is reasonably efficient. It is expected that there will probably be six or seven levels of hierarchy (local, region, country, continent, etc.), with the candi- date RPs changing relatively infrequently (say every 24 hrs), with the candidate RP announcements being made more frequently, e.g. every few minutes. The finer details of this scheme have still to be worked out, but due to the significant advantage of being host-transparent, it is highly likely that this RP/Core selection/placement/management scheme will be adopted (footnote). _________________________ This work is currently progressing under the auspices of the IDMR working group of the IETF. Expires August 9th, 1996 [Page 14] INTERNET-DRAFT CBT Multicast Architecture February 1996 5. Architectural Justification: Comparisons & Simulation Results This majority of this section summarises the results of in-depth simulations carried out by Harris Corp., USA, investigating the per- formance and resource cost comparisons of multicast algorithms for distributed interactive simulation applications [10]. More pre- cisely, the report summarises the work on the Real-time Information Transfer & Networking (RITN) program, comparing the cost and perfor- mance of PIM [7] and CBT [2] in a DIS environment. As we said ear- lier, DIS applications have wide-ranging requirements. We feel it is important to take into account a wide range of requirements so that future applications can be accommodated with ease; all other multi- cast architectures are tailored to the requirements of applications in the current Internet, particularly audio and video applications. Figure 3 shows a comparison of application requirements [10]. We also present results into the study of whether source-based trees or shared trees are the best choice for multicasting in the RITN pro- gram. In the study of shortest-path trees (SPTs) vs. shared trees, PIM Dense-Mode and PIM-SM with SPTs were used as SPTs, with CBT and PIM-SM used as shared trees. This section assumes the reader is fami- liar with the different modes of PIM [7]. Expires August 9th, 1996 [Page 15] INTERNET-DRAFT CBT Multicast Architecture February 1996 |--------------|----------------------------------------------------| | | Paradigm | | |----------------------------------------------------| | Requirement | | audio/video | audio/video | | | DIS | lecture dist'n | conference | ===================================================================== | senders | many | small number | small number | --------------------------------------------------------------------- | receivers |also senders | many recvrs | also senders | --------------------------------------------------------------------- | no. of grps | | | | | per appl'n | many | one or few | one or few | --------------------------------------------------------------------- | Data tx | real time | real time | real time | --------------------------------------------------------------------- | e2e delay | small | non-critical | moderate | --------------------------------------------------------------------- | set up | real time | non real time | non real time | --------------------------------------------------------------------- | join/leave | rapid for | rapid for | can be rapid for | | dynamics | participants| receivers | participants | --------------------------------------------------------------------- | | must be | must be scalable| | | scalability | scalable to | to large n/ws | need scalability | | | large n/ws &| and large nos | large n/ws | | | large grps, | of recvrs per | | | | with large | group | | | | nos senders | | | | | & recvrs per| | | | | group | | | --------------------------------------------------------------------- | | based upon | | | | multicast | the DIS | rooted on src | incl participants | | tree | virtual | and includes | and can slowly move| | | space, with | current recvrs | over phys topology | | | rapid mvmt | | | | | over phys | | | | | topology | | | --------------------------------------------------------------------- Figure 3: Comparison of Multicast Requirements Expires August 9th, 1996 [Page 16] INTERNET-DRAFT CBT Multicast Architecture February 1996 The following criteria were considered in the simulations: + end-to-end delay + group join time + scalability (all participants are both senders & receivers) + bandwidth utilization + overhead traffic + protocol complexity A brief summary of the the results of the evaluation follow. For a detailed description of the simulations and test environments, refer to [10]. + End-to-end delay. It was shown that PIM-DM and PIM-SM with SPTs deliver packets between 13% and 31% faster than CBT. PIM-SM has about the same delay characteristics as CBT. Processing delay was not taken into account here, and this stands in PIM's favour, since PIM routers are likely to have much larger routing tables, and thus, much greater search times. + Join time. There was very little difference between any of the algorithms. + Bandwidth efficiency. It was shown that PIM-SM with shared trees, and PIM-SM with SPTs both required only about 4% more bandwidth in total, to deliver data to hosts. PIM-DM however, is very bandwidth inefficient, but this improves greatly as the network becomes densely populated with group receivers. + Overhead comparisons. CBT exhibited the lowest overhead percen- tage, even less than PIM-SM with shared trees. PIM-DM was shown to have more than double the overhead of PIM-SM with SPTs due to the periodic broadcasting & pruning. The Harris simulations [10] measured the average number of rout- ing table entries required at each router for CBT, PIM-DM, PIM- SM with SPTs, and PIM-SM with shared trees. The following param- eters were used in the simulations: + N = number of active multicast groups in the network. Expires August 9th, 1996 [Page 17] INTERNET-DRAFT CBT Multicast Architecture February 1996 + n = average number of senders to a group. + b = fraction of groups moving to source tree in PIM-SM. + c = average percentage of routers on a shared tree for a group (or source tree for a particular sender). + s = average percentage of routers on a source-based tree for a group (but not on shared tree). + d = average number of hops from a host to the RP. + r = total number of routers in the network. The following results were calculated, given b = 1, c = 0.5, s = 0.25, and d/r = 0.05. |--------------|----------------------------------------------------| | | Group size parameters | | |----------------------------------------------------| | Protocol | N = 1000 | N = 1000 | N = 20,000 | N = 20,000 | | | n = 10 | n = 200 | n = 10 | n = 200 | ===================================================================== | | | | | | | CBT | 500 | 500 | 10,000 | 10,000 | --------------------------------------------------------------------- | | | | | | | PIM-Dense | 10,000 | 200,000 | 200,000 | 4,000,000 | --------------------------------------------------------------------- | PIM-Sparse | | | | | | with SPT | 8000 | 150,500 | 160,000 | 3,010,000 | --------------------------------------------------------------------- | PIM-Sparse, | | | | | | shared tree | 1000 | 1,500 | 20,000 | 210,000 | --------------------------------------------------------------------- Figure 4: Comparison of Router State Requirements + Complexity comparisons. Protocol complexity, protocol traffic overhead, and routing table size were considered here. CBT was found to be considerably simpler than all other schemes, on all counts (footnote). _________________________ The comparisons carried out were subjective. Expires August 9th, 1996 [Page 18] INTERNET-DRAFT CBT Multicast Architecture February 1996 5.1. Traffic Concentration One of the criticisms leveled at shared trees is that traffic is aggregated onto a smaller subset of network links, than with source tree protocols. It has been shown that if shared trees, spanning a large number of receivers, with some (not insignificant proportion of these being also senders), if such a shared tree has only a small number of cores (RPs), traffic concentration is a problem on the core incident links, and for the core itself. The Harris simulation results [10] suggest increasing the number of cores as group size increases, thereby largely eliminating the problem. 5.2. Shared Trees and Load Balancing An important question raised in the SPT vs. CBT debate is: how effec- tively can load sharing be achieved by the different schemes? The scope of ability for DVMRP to do load-sharing is limited primarily by its forwarding algorithm (RPF); this allows for only single-path routing. With shared tree schemes however, each receiver can choose which of the small selection of cores it wishes to join. Cores and on-tree nodes can be configured to accept only a certain number of joins, forcing a receiver to join via a different path. This flexibility gives shared tree schemes the ability to achieve load balancing. In general, spread over all groups, CBT has the ability to randomize the group set over different trees (spanning different links around the centre of the network), something that would not seem possible under SPT schemes. 6. Conclusions In comparing CBT with the other shared tree architecture, PIM, CBT was found to be more favourable in terms of scalability, complexity, and overhead. Other characteristics were found to be very similar. When comparing SPTs with shared trees, we find that the end-to-end delays of shared trees are usually acceptable, and can be improved by strategic core placement. Routing table size is another important factor in support of shared trees, as figures 1 and 4 clearly illus- trate. Expires August 9th, 1996 [Page 19] INTERNET-DRAFT CBT Multicast Architecture February 1996 Shared trees also have more potential to load balance traffic across different links or trees. In the absence of multi-path multicast routing, this does not seem possible with source-based tree tech- niques. Acknowledgements Special thanks goes to Paul Francis, NTT Japan, for the original brainstorming sessions that brought about this work. Thanks also to Bibb Cain et al. (Harris Corporation) for allowing the publication of their simulation results, and the duplication of fig- ures 3 and 4. Ken Carlberg (SAIC) reviewed the text, and provided useful feedback and suggestions. I would also like to thank the participants of the IETF IDMR working group meetings for their general constructive comments and sugges- tions since the inception of CBT. Expires August 9th, 1996 [Page 20] INTERNET-DRAFT CBT Multicast Architecture February 1996 APPENDIX The following formulae were used by Harris Corp. in calculating the values in figure 4. The meaning of the formulae arguments precedes figure 4. + average no. of entries in each PIM-DM router is approximated by: T(PIM-DM) ~ N * n + average no. of entries a router maintains is the likelihood, c, that the router will be on a shared tree, times the total number, N, of shared trees: T(CBT) = N * c + average no. of entries a router maintains due to each src based tree is the average no. of hops, d, from a host to the RP, divided by the number of routers, r, in the network: T(PIM-SM, shared tree) = N * c + N * n * d/r + average no. of routing table entries for PIM-SM with some groups setting up source-based trees: T(PIM, SPT) = N * [B * n + 1] * c + b * n * s For more detailed information on these formulae, refer to [10]. References [1] MBONE, The Multicast BackbONE; M. Macedonia and D. Brutzman; available from http://www.cs.ucl.ac.uk/mice/mbone_review.html. [2] Core Based Trees (CBT) Multicast: Protocol Specification; A. J. Ballardie, N. Jain, and S. Reeve; ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-cbt-spec-04.txt. [3] A. J. Ballardie. Scalable Multicast Key Distribution (ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-mkd-01.{ps,txt}). Work- ing draft, 1995. [4] DVMRP. Described in "Multicast Routing in a Datagram Internet- work", S. Deering, PhD Thesis, 1990. Available via anonymous ftp from: Expires August 9th, 1996 [Page 21] INTERNET-DRAFT CBT Multicast Architecture February 1996 gregorio.stanford.edu:vmtp/sd-thesis.ps. [5] W. Fenner. Internet Group Management Protocol, version 2 (IGMPv2), (draft-idmr-igmp-v2-01.txt). Working draft. [6] First IETF Internet Audiocast; S. Casner, S. Deering; In proceed- ings of ACM Sigcomm'92. [7] D. Estrin et al. USC/ISI, Work in progress. (http://netweb.usc.edu/pim/). [8] J. Moy. Multicast Routing Extensions to OSPF. Communications of the ACM, 37(8): 61-66, August 1994. [9] Y.K. Dalal and R.M. Metcalfe. Reverse path forwarding of broad- cast packets. Communications of the ACM, 21(12):1040--1048, 1978. [10] T. Billhartz, J. Bibb Cain, E. Farrey-Goudreau, and D. Feig. Performance and Resource Cost Comparisons of Multicast Routing Algo- rithms for Distributed Interactive Simulation Applications; a report prepared for NRL ****, July 1995. [11] A. J. Ballardie. Defining a Level-1/Level-2 Interface in the Presence of Multiple Protocols. Working draft, January 1996. ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-hier-intf-00.txt [12] D. Wall. Mechanisms for Broadcast and Selective Broadcast. PhD thesis, Stanford University, June 1980. Technical Report #90. [13] M. Handley, J. Crowcroft, I. Wakeman. Hierarchical Rendezvous Point proposal, work in progress. (http://www.cs.ucl.ac.uk/staff/M.Handley/hpim.ps) and (ftp://cs.ucl.ac.uk/darpa/IDMR/IETF-DEC95/hpim-slides.ps). [14] A. J. Ballardie. "A New Approach to Multicast Communication in a Datagram Internetwork", PhD Thesis, 1995. Available via anonymous ftp from: cs.ucl.ac.uk:darpa/IDMR/ballardie-thesis.ps.Z. [15] D. J. Hook, J. 0. Calvin, M. K. Newton, D. A. Fuscio. "An Approach to DIS Scalability", Eleventh DIS Workshop, Orlando, Florida, Sept. 26th-30th, 1994, pp 94-141. Expires August 9th, 1996 [Page 22] INTERNET-DRAFT CBT Multicast Architecture February 1996 Author's Address: Tony Ballardie, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, ENGLAND, U.K. Tel: ++44 (0)71 419 3462 e-mail: A.Ballardie@cs.ucl.ac.uk Expires August 9th, 1996 [Page 23]