Inter-Domain Multicast Routing (IDMR)			 A. J. Ballardie
INTERNET-DRAFT				       University College London
						      February 9th, 1996


	     Core Based	Trees (CBT) Multicast Architecture
		   <draft-ietf-idmr-cbt-arch-03.txt>

Status of this Memo

   This	document is an Internet	Draft.	Internet Drafts	are working do-
   cuments of the Internet Engineering Task Force (IETF), its Areas, and
   its Working Groups. Note that other groups may also distribute work-
   ing documents as Internet Drafts).

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or	obsoleted by
   other documents at any time.	 It is not appropriate to use Internet
   Drafts as reference material	or to cite them	other than as a	"working
   draft" or "work in progress."

   Please check	the I-D	abstract listing contained in each Internet
   Draft directory to learn the	current	status of this or any other In-
   ternet Draft.


Abstract

   CBT is a new	multicast architecture that builds a single, shared
   delivery tree. Traditional multicast	algorithms build a source-based
   tree	per sender subnetwork, as does DVMRP, the protocol currently de-
   ployed in the Internet's multicast backbone (MBONE) [1].  The primary
   advantage of	the shared tree	approach is that it typically offers
   more	favourable scaling characteristics than	do existing multicast
   algorithms.

   The CBT protocol [2]	is a network layer multicast protocol that
   builds a shared delivery tree. The CBT protocol also	allows for the
   incorporation of security features [2, 3], and the potential	to
   reserve network resources as	part of	delivery tree set-up [14].

   The sending and receiving of	multicast data by hosts	on a subnetwork
   conforms to the traditional IP multicast service model [4]; multicast
   data	on a subnetwork	appears	no different to	that of	existing
   schemes.


Expires	August 9th, 1996					[Page 1]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


   CBT is progressing through the IDMR working group of	the IETF. The
   CBT protocol	is described in	an accompanying	document [2]. For this,
   and all IDMR-related	documents, see http://www.cs.ucl.ac.uk/ietf/idmr


TABLE OF CONTENTS

  1. Background................................................... 3

  2. Introduction................................................. 4

  3. Source Based Tree Algorithms &
     the Motivation for	Shared Trees.............................. 5

     3.1 Distance-Vector Multicast Algorithm...................... 5

     3.2 Link State Multicast Algorithm........................... 6

     3.3 The Motivation	for Shared Trees.......................... 7

  4. CBT - The New Architecture................................... 8

     4.1 Design	Requirements...................................... 8

     4.2 Architectural Overview................................... 11

     4.3 Core Selection, Placement, and	Management................ 13

  5. Architectural Justification:
     Comparisons & Simulation Results............................. 15

     5.1 Traffic Concentration.................................... 19

     5.2 Shared	Trees and Load Balancing.......................... 19

  6. Conclusions.................................................. 19

  Acknowledgements................................................ 20

  APPENDIX........................................................ 21


Expires	August 9th, 1996					[Page 2]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


1.  Background

   Centre based	forwarding was first described in the early 1980s by
   Wall	in his PhD thesis on broadcast and selective broadcast.	 At this
   time, the benefits and uses of multicast were not fully understood.
   It wasn't until much	later that the IP multicast address space was
   defined (class D space), and	Deering	defined	the IP multicast service
   model [4]. Deering's	work was pioneering in that he invented	algo-
   rithms that allowed hosts to	arbitrarily join a multicast group (IGMP
   [5]), and enable a local multicast-capable router to	become part of a
   receiver-initiated wide-area	delivery tree. The latter consituted
   algorithms that build sourced-based delivery	trees, with one	delivery
   tree	per sender subnetwork. These algorithms	are documented in [4].

   The multicast-capable part of the Internet (MBONE [1]) primarily
   implements an a protocol instance of	the distance-vector multicast
   algorithm [4] called	DVMRP.

   Now we have several years practical experience with multicast, and a
   diversity of	multicast applications,	from shared workspace tools, to
   audio- and video-conferencing tools,	with new applications emerging
   all the time	(e.g. distributed video	gaming), some of which have
   wide-ranging	multicast requirements.	 Furthermore, the MBONE	has been
   constantly growing since the	first public audiocast (audio multicast)
   of a	1992 IETF meeting [6] took place.  Then, the MBONE intercon-
   nected 40 subnets in	4 different countries.	Statistics suggest that
   the MBONE has doubled in size over the past eight months, and as of
   July	1995 comprises over 2,500 subnetworks spanning 25 countries [1].

   The obvious popularity and growth of	multicast means	that the scaling
   aspects of wide-area	multicasting cannot be overlooked; some	predic-
   tions talk of thousands of groups being present at any one time in
   the Internet.  Distributed Interactive Simulation (DIS) applications
   [15]	have real-time requirements (in	terms of join latency, group
   membership, group sender populations) that far exceed the require-
   ments of most applications currently	in use.

   Scalability is measured in terms of network state maintenance,
   bandwidth efficiency, and protocol overhead.	Other factors that can
   affect these	parameters include sender set size, and	wide-area dis-
   tribution of	group members.


Expires	August 9th, 1996					[Page 3]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


2.  Introduction

   Deering's algorithms	build source-based multicast delivery trees,
   that	is, the	delivery tree is rooted	at a multicast-capable router on
   the subnetwork directly attached to a sender. The IGMP protocol [5]
   operates between hosts and multicast	router(s) on a subnetwork, ena-
   bling multicast routers to monitor group presence on	attached sub-
   nets, so that on receipt of a multicast packet, a multicast router
   knows over which interfaces group members are located.

   Multicasting	on the local subnetwork	does not require either	the
   presence of a multicast router or the implementation	of a multicast
   algorithm; on most shared media (e.g. Ethernet), a host, which need
   not necessarily be a	member,	simply sends a multicast data packet,
   which is received by	any member hosts. For multicasts to extend
   beyond the scope of the local subnetwork, the subnet	must have a
   multicast-capable router attached, which itself is attached (possibly
   virtually) to another multicast-capable router, and so on. The col-
   lection of these (virtually)	connected multicast routers forms the
   Internet's MBONE [1]. Each multicast	router must implement the same
   multicast routing protocol to ensure	full multicast connectivity
   (footnote).	For wide-area multicasting then, multicast routers "con-
   spire" to get multicast data	to the group's receivers, wherever they
   be located.	This is	the job	of the multicast routing algorithm.

   Different algorithms	use different techniques for establishing a dis-
   tribution tree. If we classify these	algorithms into	source-based
   tree	algorithms and shared tree algorithms, we'll see that the dif-
   ferent classes have considerably different scaling characteristics,
   and the characteristics of the resulting trees differ too, for exam-
   ple,	average	delay. Let's look at source-based tree algorithms first.


_________________________
Domains	that deploy different multicast	 protocols  may
be  interconnected using a common multicast protocol at
domain boundaries (border routers). A special  protocol
interface  needs  to  be  implemented  in  these border
routers.


Expires	August 9th, 1996					[Page 4]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


3.  Source-Based Tree Algorithms & the Motivation for Shared Trees

   The strategy	we'll use for motivating (CBT) shared tree multicast is
   based, in part, in explaining the characteristics of	source-based
   tree	multicast, in particular its scalability. We then summarize the
   primary motivation for shared trees in section 3.3.

   Source-based	tree multicast algorithms are often referred to	as
   "dense-mode"	algorithms; they assume	that the receiver population
   densely populates the domain	of operation, and therefore the	accom-
   panying overhead (in	terms of state,	bandwidth usage, and/or	process-
   ing costs) is justified.  Whilst this might be true of a local
   environment,	it is almost never true	of the wide-area environment,
   especially since wide-area group membership tends to	be sparsely dis-
   tributed throughout the Internet.  There may	be "pockets" of	dense-
   ness, but if	one views the global picture, wide-area	groups tend to
   be sparsely distributed.

   Source-based	multicast trees	are either built by a distance-vector
   style algorithm, which may be implemented separately	from the unicast
   routing algorithm (as is the	case with DVMRP), or the multicast tree
   may be built	using the information present in the underlying	unicast
   routing table (as is	the case with PIM-DM [7]). The other algorithm
   used	for building source-based trees	is the link-state algorithm (a
   protocol instance being M-OSPF [8]).


3.1.  Distance-Vector Multicast	Algorithm

   The distance-vector multicast algorithm builds a multicast delivery
   tree	using a	variant	of the Reverse-Path Forwarding technique [9].
   The technique basically is as follows: when a multicast router
   receives a multicast	data packet, if	the packet arrives on the inter-
   face	used to	reach the source of the	packet,	the packet is forwarded
   over	all outgoing interfaces, except	leaf subnets (footnote)	with no
   members attached. Otherwise the packet is discarded.	 This consti-
   tutes a "broadcast &	prune" approach. Subsequently, outgoing	inter-
   faces may be	"pruned"; when a data packet reaches a leaf router, if
   that	router has no membership registered on its directly attached
   subnetworks,	the router may send a prune message one	hop back towards
_________________________
A "leaf" subnet	is one with no downstream  routers  at-
tached.	 A  "leaf"  router is one which	no other router
uses to	reach the source.


Expires	August 9th, 1996					[Page 5]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


   the source. The receiving router then checks	its leaf subnets for
   group membership, and checks	whether	it has received	a prune	from all
   of its downstream routers. If the checks succeed, the router	itself
   can send a prune upstream over the interface	leading	to the source.
   Thus, prune messages	prune the tree branches	not leading to group
   members. Prune messages are stored per <source, group> pair,	and are
   subject to timeout, after which data	flows over the branch again.  If
   there are still no members present, the pruning process repeats
   itself.  State that "times out" in this way is referred to as "soft
   state".

   It can be inferred from the above algorithm that multicast routers
   must	maintain state per group per active source. This source-specific
   state manifests itself in the multicast forwarding table, where, for
   each	active source, the outgoing interface list is dependent	on the
   prunes received, and	on the time since they were received (because
   they	time out).

   As we said, most wide-area groups are likely	to be sparsely distri-
   buted.  As a	result,	it is fair to assume that some potentially large
   number of routers in	the internetwork, i.e. those not leading to
   downstream receivers, are incurred the cost of storing <source,
   group> state.

   The "broadcast & prune" nature of the distance-vector multicast algo-
   rithm, the sparseness of receivers, combined	with the fact that many
   wide-area links have	limited	bandwidth, demonstrates	that such an
   algorithm cannot scale to a large internetwork that supports	large
   numbers of groups.


3.2.  Link-State Multicast Algorithm

   Routers implementing	a link state algorithm periodically collect
   reachability	information to their directly attached neighbours, then
   flood this throughout the routing domain in so-called link state
   update packets. Deering extended the	link state algorithm for multi-
   casting by having a router additionally detect group	membership
   changes on its incident links before	flooding this information in
   link	state packets.

   Each	router then, has a complete, up-to-date	image of a domain's
   topology and	group membership. On receiving a multicast data	packet,
   each	router uses its	membership and topology	information to calculate
   a shortest-path tree	rooted at the sender subnetwork. Provided the


Expires	August 9th, 1996					[Page 6]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


   calculating router falls within the computed	tree, it forwards the
   data	packet over the	interfaces defined by its calculation. Hence,
   multicast data packets only ever traverse routers leading to	members,
   either directly attached, or	further	downstream. That is, the
   delivery tree is a true multicast tree right	from the start.

   However, the	flooding (reliable broadcasting) of group membership
   information is the predominant factor preventing the	link state mul-
   ticast algorithm being applicable over the wide-area.  The other lim-
   iting factor	is the processing cost of the Dijkstra calculation to
   compute the shortest-path tree for each active source.


3.3.  The Motivation for Shared	Trees

   The algorithms described in the previous sections clearly motivate
   the need for	a multicast algorithm(s) that is more scalable.	CBT was
   designed primarily to address the topic of scalability; a shared tree
   architecture	offers an improvement in scalability over source tree
   architectures by a factor of	the number of active sources (where
   source is a subnetwork aggregate). Source trees scale O(S * G), since
   all sources use the same shared tree; shared	trees eliminate	the
   source (S) scaling factor, and hence	a shared tree scales O(G). The
   implication of this is that applications with many active senders,
   such	as distributed interactive simulation applications, and	distri-
   buted video-gaming (where most receivers are	also senders), scale
   considerably	better if shared trees are used.

   In the table	below we compare the amount of state required by CBT and
   DVMRP for different group sizes with	different numbers of active
   sources:


Expires	August 9th, 1996					[Page 7]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


     |--------------|---------------------------------------------------|
     |	Number of   |		     |		      |			|
     |	  groups    |	     10	     |	     100      |	       1000	|
     ====================================================================
     |	Group size  |		     |		      |			|
     | (# members)  |	     20	     |	     40	      |		60	|
     -------------------------------------------------------------------|
     | No. of srcs  |	 |     |     |	  |	|     |	   |	 |	|
     |	per group   |10% | 50% |100% |10% | 50%	|100% |10% | 50% | 100%	|
     --------------------------------------------------------------------
     | No. of DVMRP |	 |     |     |	  |	|     |	   |	 |	|
     |	  router    |	 |     |     |	  |	|     |	   |	 |	|
     |	 entries    | 20 | 100 | 200 |400 | 2K	| 4K  |	6K | 30K | 60K	|
     --------------------------------------------------------------------
     | No. of CBT   |		     |		      |			|
     |	router	    |		     |		      |			|
     |	entries	    |	    10	     |	     100      |	      1000	|
     |------------------------------------------------------------------|

	    Figure 1: Comparison of DVMRP and CBT Router State

   There are also additional significant bandwidth and state savings
   with	shared trees in	contrast to source trees; firstly, the tree only
   spans a group's receivers (including	links/routers leading to
   receivers) -- there is no cost to routers/links in other parts of the
   network. Secondly, routers between a	non-member sender and the
   delivery tree are not incurred any cost pertaining to multicast, and
   indeed, these routers need not even be multicast-capable -- packets
   from	non-member senders are encapsulated and	unicast	towards	a core
   on the tree.


4.  CBT	- The New Architecture


4.1.  Design Requirements

   The CBT shared tree design was geared towards several design	objec-
   tives:


Expires	August 9th, 1996					[Page 8]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


   +	scalability

   +	routing	protocol independence

   +	robustness

   +	interoperability

   As we have mentioned, a shared multicast tree improves scalability by
   an order of magnitude.

   A facet of existing source tree algorithms is that they are all tied
   inextricably, one way or another, to	a particular routing protocol.
   For example,	DVMRP is based heavily on RIP, and relies for its
   correct operation on	some of	the features of	RIP (e.g. poison
   reverse). Similarly,	with M-OSPF.

   With	the multicast infrastructure fast converging on	the unicast
   infrastructure, which is heterogeneous in nature, the extent	to which
   a particular	multicast algorithm can	be deployed is severely	limited
   if deployment is dependent on the presence of a particular unicast
   algorithm. Therefore, it makes good sense to	separate out these
   dependencies	from the multicast algorithm; this is exactly what CBT
   does, as does PIM [7]. The CBT protocol makes use of	the underlying
   unicast routing table when building its delivery tree, independent of
   how the unicast table(s) is computed.

   Source-based	tree algorithms	are clearly robust; a sender simply
   sends its data, and intervening routers "conspire" to get the data
   where it needs to, creating state along the way. This is the	so-
   called "data	driven"	approach -- there is no	set-up protocol
   involved.

   It is not as	easy to	achieve	the same degree	of robustness in shared
   tree	algorithms, since there	is network entity involved which is the
   focal point of the tree, which effectively, keeps a shared tree fully
   connected. This entity is just a pre-nominated router, to which all
   receivers (their directly-attached routers) must explicitly join. In
   actual fact,	any shared tree	is likely to have several such so-called
   "cores", or "rendevous points (RPs)"	to increase robustness.

   CBT and PIM diverge in their	approaches to robustness; PIM attempts
   to maintain a data-driven approach, with only one RP	active at any
   one time. In	CBT, all cores are active. PIM's desire	to emulate the
   robustness of source	trees comes at a cost, especially in terms of


Expires	August 9th, 1996					[Page 9]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


   protocol complexity,	and an overall increased bandwidth requirement
   [10]. CBT, on the other hand, adopts	the "hard state" approach,
   whereby a tree branch is created reflecting underlying unicast at the
   instant it is created; thereafter, the same tree branch does	not
   necessarily change to reflect underlying unicast changes. This has
   positive implications for the case where a branch is	created	together
   with	a resource reservation.	The reservation	remains	fixed unless a
   neighbouring	link/router fails, in which case there is no option but
   to rejoin the tree by means of explicit protocol, and request the
   resources again. However, the router	to which a branch is rejoined
   may not be able to honour the same reservation request.

   CBT branches	are torn down by means of explicit protocol messages,
   whereas PIM branches	time out. PIM incurs protocol complexity to
   manage its various timers (to keep state where it is	required), and
   protocol "refresh" messages consume bandwidth. CBT implements a
   "keepalive" mechanism between adjacent routers on a tree; these are
   CBT's only periodic tree maintenance	messages, and they are aggre-
   gated such that there is one	per link, not one per group per	link.
   Both	protocols reduce the frequency of tree maintenance messages as
   the number of messages increases.

   A comparison	of protocol costs/state/scalability etc. is summarised
   in section 5, and is	presented in detail in [10].

   With	regards	to interoperability, the type of multicast delivery tree
   interconnecting member hosts	(subnets) over the wide-area is	tran-
   sparent to those hosts; hosts send/receive multicast	data in	tradi-
   tional IP format, and hosts are not involved	explicitly in any way
   with	tree set-up. This is the sole responsibility of	local multicast
   routers.

   CBT also accommodates interoperability between "clouds" or domains
   operating different multicast protocols. It is expected that	intero-
   perability between different	multicast protocols only be relevant at
   domain (or region, or "cloud") boundaries; inside a particular domain
   only	a single multicast protocol is expected	to be present. One
   method of how different trees can be	"spliced" together at a	domain
   boundary is presented in [11].

   Homogeneous clouds, as described in the previous paragraph, means
   that	multicast data can travel in what we call "native mode", i.e.
   no encapsulation is required	that keeps data, from different	groups
   using different multicast protocols,	separate. CBT also specifies a
   "CBT	mode", where a particular cloud	is heterogeneous, i.e. multiple


Expires	August 9th, 1996				       [Page 10]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


   multicast protocols are present possibly on the same	subnet;	CBT mode
   data	is encapsulated	with a CBT header to distinguish it from any
   other multicast traffic, for	example, traffic that is using a source
   based tree. These two "modes" are described in the CBT specification
   document [2].


4.2.  Architectural Overview

   Shared trees	were first described by	Wall in	his investigation into
   low-delay approaches	to broadcast and selective broadcast [12]. Wall
   concluded that delay	will not be minimal, as	with shortest-path
   trees, but the delay	can be kept within bounds that may be accept-
   able.  Simulations have recently been carried out to	compare	various
   scaling characteristics of centre-based and shortest-path trees.  A
   summary of these simulations	can be found in	section	5, and the
   details are presented in [10].

   A CBT shared	tree involves having a small set of pre-nominated cores
   (routers), to which routers connected to member hosts join by means
   of an explicit protocol control message. Any	core is	eligible for
   joining, although only a single core	is targetted by	a join message.
   On receipt of a join, the core router replies with an acknowledge-
   ment, which traverses the reverse-path of the corresponding join (the
   join	sets up	transient state	along the way).	As such, tree branches
   are created,	and parent-child relationships set up between adjacent
   CBT routers on the tree, the	parent being the router	nearer the core.
   When	the ack	is received, the router	creates	a CBT forwarding infor-
   mation base (FIB) entry, listing interfaces corresponding to	a par-
   ticular group. Due to the way the tree is formed, each tree link is
   symmetric, and the tree reflects an undirected graph.

   Tree	branches are made up of	so-called non-core routers, which form a
   shortest forward path between a member-host's directly attached
   router, and the core. A router at the end of	a branch is known as a
   leaf	router on the tree.

   A diagram showing a single-core CBT tree is shown in	the figure
   below. Only one core	is shown to demonstrate	the principle.


Expires	August 9th, 1996				       [Page 11]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


	   b	  b	b-----b
	    \	  |	|
	     \	  |	|
	      b---b	b------b
	     /	   \  /			  KEY....
	    /	    \/
	   b	     X---b-----b	  X = Core
		    / \			  b = non-core router
		   /   \
		  /	\
		  b	 b------b
		 / \	 |
		/   \	 |
	       b     b	 b

		      Figure 2:	Single-Core CBT	Tree

   Join	messages do not	necessarily travel all the way to the core
   before being	acknowledged; if a join	message	hits a router on the
   tree	(on-tree router) on its	journey	towards	a core,	that on-tree
   router terminates the join and acknowledges it. A router that is
   itself in the process of joining the	tree (i.e. one that has	not yet
   received an ack itself) can terminate and cache any subsequent
   received joins, but cannot acknowledge them until its own join has
   been	successfully acknowledged.

   For simplicity, part	of the CBT protocol [2]	is data	driven;	one of a
   set of cores	for a group is elected (by a group initiator) as the
   PRIMARY core, all others are	termed SECONDARY cores.	The ordering of
   the secondary cores is irrelevant, however, the primary must	be uni-
   form	across the whole group.	Whenever a secondary core is joined, it
   first acks the join then proceeds to	join the primary core -- the
   primary core	simply listens for incoming joins, it need never send a
   join	itself.	In this	way, a CBT tree	becomes	fully connected	in
   response to members joining.

   The tree joining process is triggered when a	subnet's CBT-capable
   router receives an IGMP group membership report [5].	If more	than one
   CBT router is present on a subnetwork, only one router, the subnet's
   designated router (DR), generates a join message. A subnet's	DR is
   implicitly elected (i.e. no CBT protocol involvement) as a "side-
   effect" of IGMP.


Expires	August 9th, 1996				       [Page 12]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


   CBT builds a	loop-free shared tree. In some scenarios it is necessary
   to take explicit precautions	against	loops, in others it is not.  For
   example, a loop cannot form between a router	that wishes to join the
   tree, if that router	has no child tree branches (interfaces). There
   is a	potential for loops if a joining router	has at least one child
   attached. This scenario would constitute a rejoin. Once the rejoining
   router has received an ack, it must generate	a loop detection which
   is sent over	its parent interface. Receiving	routers	must forward
   this	packet over their parent interface only, unless	it is received
   by its originator, in which case a loop is present, or it is	recieved
   by the primary core,	in which case no loop is present. In the former
   case, the loop is broken by the originator sending a	quit packet to
   its new parent, and the latter case solicits	an ack from the	primary
   core, confirming the	absence	of a loop.

   Tree	maintenance takes place	between	adjacent on-tree routers in the
   form	of protocol "keepalive"	messages. Only one of these need be sent
   per link (not per group), and this message is sent in the direction
   child -> parent. The	parent sends a corresponding acknowledgement.
   This	is how tree connectivity is monitored, and breakages/failures
   detected.

   When	a CBT router receives a	multicast data packet, it simply for-
   wards it over all outgoing interfaces, as specified in its FIB entry
   for the group. Protocol mechanisms help ensure that data packets
   never loop.

   The CBT protocol is simple and lightweight. The CBT protocol	specifi-
   cation is described in an separate document [2].


4.3.  Core Selection, Placement, and Management

   The issues of core (RP) selection, placement, and management	are
   still under review, although	several	recent advancements have been
   made	[13].

   Until recently, it was thought that hosts would need	to be explicitly
   involved in discovering core	addresses corresponding	to a particular
   group. This would require host system changes, and modifications to
   applications, such that an application could	request	the host to
   establish a <core, group> mapping for a given group,	as well	as some
   sort	of core	advertisement protocol in which	hosts and core routers


Expires	August 9th, 1996				       [Page 13]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


   participate.

   A dependency	on host	or application involvement with	core (RP)
   discovery is	therefore very undesirable. Indeed, the	type of	multi-
   cast	tree to	be joined should be invisible to hosts (and even appli-
   cations).

   A new proposal has recently emerged called Hierarchical PIM [13],
   which proposes having a multi-level hierarchy of RPs	(cores), but
   which are known only	to multicast routers; cores (RPs) remain invisi-
   ble to hosts. Whilst	its name suggests it is	specific to PIM, its
   principles are not.

   Each	level of hierarchy corresponds to a multicast "scope level",
   with	a certain multicast address range corresponding	to each	scope.
   This	therefore, requires the	multicast address space	to be officially
   "administratively scoped"; a	group initiator	chooses	a multicast
   address based on the	perceived scope	of the group. On receipt of an
   IGMP	group membership report	from a host, a local router (the
   lowest-level	of the core (RP) hierarchy) knows, based on the	group
   address, whether it has to join a core at the next level up in the
   hierarchy; each candidate RP	at a particular	level advertises itself
   within its own level	(using a well-known scoped multicast address),
   and to the level below. Hence, an RP	should always know how to reach
   an RP at the	next level up in the hierarchy.	A next-level RP	is
   chosen by using a hash function across the group address.

   The lower the hierarchy level, the more densely RPs populate	that
   level. However, at the top level (for globally-scoped groups) it is
   expected there will be sufficient RP	distribution so	that shared tree
   paths (i.e. delay) is reasonably efficient.

   It is expected that there will probably be six or seven levels of
   hierarchy (local, region, country, continent, etc.),	with the candi-
   date	RPs changing relatively	infrequently (say every	24 hrs), with
   the candidate RP announcements being	made more frequently, e.g.
   every few minutes.

   The finer details of	this scheme have still to be worked out, but due
   to the significant advantage	of being host-transparent, it is highly
   likely that this RP/Core selection/placement/management scheme will
   be adopted (footnote).
_________________________
This work is currently progressing under  the  auspices
of the IDMR working group of the IETF.


Expires	August 9th, 1996				       [Page 14]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


5.  Architectural Justification: Comparisons & Simulation Results

   This	majority of this section summarises the	results	of in-depth
   simulations carried out by Harris Corp., USA, investigating the per-
   formance and	resource cost comparisons of multicast algorithms for
   distributed interactive simulation applications [10].  More pre-
   cisely, the report summarises the work on the Real-time Information
   Transfer & Networking (RITN)	program, comparing the cost and	perfor-
   mance of PIM	[7] and	CBT [2]	in a DIS environment. As we said ear-
   lier, DIS applications have wide-ranging requirements. We feel it is
   important to	take into account a wide range of requirements so that
   future applications can be accommodated with	ease; all other	multi-
   cast	architectures are tailored to the requirements of applications
   in the current Internet, particularly audio and video applications.
   Figure 3 shows a comparison of application requirements [10].

   We also present results into	the study of whether source-based trees
   or shared trees are the best	choice for multicasting	in the RITN pro-
   gram. In the	study of shortest-path trees (SPTs) vs.	shared trees,
   PIM Dense-Mode and PIM-SM with SPTs were used as SPTs, with CBT and
   PIM-SM used as shared trees.	This section assumes the reader	is fami-
   liar	with the different modes of PIM	[7].


Expires	August 9th, 1996				       [Page 15]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


     |--------------|----------------------------------------------------|
     |		    |			 Paradigm			 |
     |		    |----------------------------------------------------|
     | Requirement  |		  |  audio/video    |	 audio/video	 |
     |		    |	 DIS	  | lecture dist'n  |	  conference	 |
     =====================================================================
     |	 senders    |	 many	  |  small number   |	small number	 |
     ---------------------------------------------------------------------
     |	receivers   |also senders |  many recvrs    |  also senders	 |
     ---------------------------------------------------------------------
     | no. of grps  |		  |		    |			 |
     | per appl'n   |	 many	  | one	or few	    |  one or few	 |
     ---------------------------------------------------------------------
     | Data tx	    |  real time  |  real time	    |  real time	 |
     ---------------------------------------------------------------------
     | e2e delay    |	 small	  |  non-critical   |	moderate	 |
     ---------------------------------------------------------------------
     |	set up	    |  real time  | non	real time   |	non real time	 |
     ---------------------------------------------------------------------
     | join/leave   | rapid for	  | rapid for	    | can be rapid for	 |
     | dynamics	    | participants| receivers	    |  participants	 |
     ---------------------------------------------------------------------
     |		    | must be	  | must be scalable|			 |
     | scalability  | scalable to | to large n/ws   |	need scalability |
     |		    | large n/ws &| and	large nos   |	large n/ws	 |
     |		    | large grps, | of recvrs per   |			 |
     |		    | with large  | group	    |			 |
     |		    | nos senders |		    |			 |
     |		    | &	recvrs per|		    |			 |
     |		    | group	  |		    |			 |
     ---------------------------------------------------------------------
     |		    | based upon  |		    |			 |
     | multicast    | the DIS	  |  rooted on src  | incl participants	 |
     |	 tree	    | virtual	  |  and includes   | and can slowly move|
     |		    | space, with |  current recvrs | over phys	topology |
     |		    | rapid mvmt  |		    |			 |
     |		    | over phys	  |		    |			 |
     |		    | topology	  |		    |			 |
     ---------------------------------------------------------------------

	      Figure 3:	Comparison of Multicast	Requirements


Expires	August 9th, 1996				       [Page 16]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


   The following criteria were considered in the simulations:

   +	end-to-end delay

   +	group join time

   +	scalability (all participants are both senders & receivers)

   +	bandwidth utilization

   +	overhead traffic

   +	protocol complexity

   A brief summary of the the results of the evaluation	follow.	For a
   detailed description	of the simulations and test environments, refer
   to [10].

   +	End-to-end delay. It was shown that PIM-DM and PIM-SM with SPTs
	deliver	packets	between	13% and	31% faster than	CBT. PIM-SM has
	about the same delay characteristics as	CBT. Processing	delay
	was not	taken into account here, and this stands in PIM's
	favour,	since PIM routers are likely to	have much larger routing
	tables,	and thus, much greater search times.

   +	Join time. There was very little difference between any	of the
	algorithms.

   +	Bandwidth efficiency. It was shown that	PIM-SM with shared
	trees, and PIM-SM with SPTs both required only about 4%	more
	bandwidth in total, to deliver data to hosts. PIM-DM however, is
	very bandwidth inefficient, but	this improves greatly as the
	network	becomes	densely	populated with group receivers.

   +	Overhead comparisons. CBT exhibited the	lowest overhead	percen-
	tage, even less	than PIM-SM with shared	trees. PIM-DM was shown
	to have	more than double the overhead of PIM-SM	with SPTs due to
	the periodic broadcasting & pruning.

	The Harris simulations [10] measured the average number	of rout-
	ing table entries required at each router for CBT, PIM-DM, PIM-
	SM with	SPTs, and PIM-SM with shared trees. The	following param-
	eters were used	in the simulations:

	 +  N =	number of active multicast groups in the network.


Expires	August 9th, 1996				       [Page 17]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


	 +  n =	average	number of senders to a group.

	 +  b =	fraction of groups moving to source tree in PIM-SM.

	 +  c =	average	percentage of routers on a shared tree for a
	 group (or source tree for a particular	sender).

	 +  s =	average	percentage of routers on a source-based	tree for
	 a group (but not on shared tree).

	 +  d =	average	number of hops from a host to the RP.

	 +  r =	total number of	routers	in the network.

	 The following results were calculated,	given b	= 1, c = 0.5,
	 s = 0.25, and d/r = 0.05.


	   |--------------|----------------------------------------------------|
	   |		  |		  Group	size parameters		       |
	   |		  |----------------------------------------------------|
	   |   Protocol	  |   N	= 1000	| N = 1000  | N	= 20,000  | N =	20,000 |
	   |		  |    n = 10	|  n = 200  |	n = 10	  |   n	= 200  |
	   =====================================================================
	   |		  |		|	    |		  |	       |
	   |	 CBT	  |	500	|    500    |	10,000	  |   10,000   |
	   ---------------------------------------------------------------------
	   |		  |		|	    |		  |	       |
	   |  PIM-Dense	  |   10,000	|  200,000  |	200,000	  |  4,000,000 |
	   ---------------------------------------------------------------------
	   |  PIM-Sparse  |		|	    |		  |	       |
	   |   with SPT	  |    8000	|  150,500  |	160,000	  |  3,010,000 |
	   ---------------------------------------------------------------------
	   | PIM-Sparse,  |		|	    |		  |	       |
	   | shared tree  |    1000	|   1,500   |	20,000	  |   210,000  |
	   ---------------------------------------------------------------------

	       Figure 4: Comparison of Router State Requirements

    +	 Complexity comparisons. Protocol complexity, protocol traffic
	 overhead, and routing table size were considered here.	CBT was
	 found to be considerably simpler than all other schemes, on all
	 counts	(footnote).
_________________________
The comparisons	carried	out were subjective.


Expires	August 9th, 1996				       [Page 18]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


5.1.  Traffic Concentration

   One of the criticisms leveled at shared trees is that traffic is
   aggregated onto a smaller subset of network links, than with	source
   tree	protocols.

   It has been shown that if shared trees, spanning a large number of
   receivers, with some	(not insignificant proportion of these being
   also	senders), if such a shared tree	has only a small number	of cores
   (RPs), traffic concentration	is a problem on	the core incident links,
   and for the core itself. The	Harris simulation results [10] suggest
   increasing the number of cores as group size	increases, thereby
   largely eliminating the problem.

5.2.  Shared Trees and Load Balancing

   An important	question raised	in the SPT vs. CBT debate is: how effec-
   tively can load sharing be achieved by the different	schemes? The
   scope of ability for	DVMRP to do load-sharing is limited primarily by
   its forwarding algorithm (RPF); this	allows for only	single-path
   routing.

   With	shared tree schemes however, each receiver can choose which of
   the small selection of cores	it wishes to join. Cores and on-tree
   nodes can be	configured to accept only a certain number of joins,
   forcing a receiver to join via a different path. This flexibility
   gives shared	tree schemes the ability to achieve load balancing.

   In general, spread over all groups, CBT has the ability to randomize
   the group set over different	trees (spanning	different links	around
   the centre of the network), something that would not	seem possible
   under SPT schemes.

6.  Conclusions

   In comparing	CBT with the other shared tree architecture, PIM, CBT
   was found to	be more	favourable in terms of scalability, complexity,
   and overhead. Other characteristics were found to be	very similar.

   When	comparing SPTs with shared trees, we find that the end-to-end
   delays of shared trees are usually acceptable, and can be improved by
   strategic core placement. Routing table size	is another important
   factor in support of	shared trees, as figures 1 and 4 clearly illus-
   trate.


Expires	August 9th, 1996				       [Page 19]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


   Shared trees	also have more potential to load balance traffic across
   different links or trees.  In the absence of	multi-path multicast
   routing, this does not seem possible	with source-based tree tech-
   niques.


   Acknowledgements

   Special thanks goes to Paul Francis,	NTT Japan, for the original
   brainstorming sessions that brought about this work.

   Thanks also to Bibb Cain et al. (Harris Corporation)	for allowing the
   publication of their	simulation results, and	the duplication	of fig-
   ures	3 and 4.

   Ken Carlberg	(SAIC) reviewed	the text, and provided useful feedback
   and suggestions.

   I would also	like to	thank the participants of the IETF IDMR	working
   group meetings for their general constructive comments and sugges-
   tions since the inception of	CBT.


Expires	August 9th, 1996				       [Page 20]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


   APPENDIX

   The following formulae were used by Harris Corp. in calculating the
   values in figure 4. The meaning of the formulae arguments precedes
   figure 4.

   +	average	no. of entries in each PIM-DM router is	approximated by:

	T(PIM-DM) ~ N *	n

   +	average	no. of entries a router	maintains is the likelihood, c,
	that the router	will be	on a shared tree, times	the total
	number,	N, of shared trees:

	T(CBT) = N * c

   +	average	no. of entries a router	maintains due to each src based
	tree is	the average no.	of hops, d, from a host	to the RP,
	divided	by the number of routers, r, in	the network:

	T(PIM-SM, shared tree) = N * c + N * n * d/r

   +	average	no. of routing table entries for PIM-SM	with some groups
	setting	up source-based	trees:

	T(PIM, SPT) = N	* [B * n + 1] *	c + b *	n * s

   For more detailed information on these formulae, refer to [10].


References

  [1] MBONE, The Multicast BackbONE; M.	Macedonia and D. Brutzman;
  available from http://www.cs.ucl.ac.uk/mice/mbone_review.html.

  [2] Core Based Trees (CBT) Multicast:	Protocol Specification;	A. J.
  Ballardie, N.	Jain, and S. Reeve;
  ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-cbt-spec-04.txt.

  [3] A. J. Ballardie. Scalable	Multicast Key Distribution
  (ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-mkd-01.{ps,txt}). Work-
  ing draft, 1995.

  [4] DVMRP. Described in "Multicast Routing in	a Datagram Internet-
  work", S. Deering, PhD Thesis, 1990. Available via anonymous ftp from:


Expires	August 9th, 1996				       [Page 21]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


  gregorio.stanford.edu:vmtp/sd-thesis.ps.

  [5] W. Fenner. Internet Group	Management Protocol, version 2 (IGMPv2),
  (draft-idmr-igmp-v2-01.txt). Working draft.

  [6] First IETF Internet Audiocast; S.	Casner,	S. Deering; In proceed-
  ings of ACM Sigcomm'92.

  [7] D. Estrin	et al. USC/ISI,	Work in	progress.
  (http://netweb.usc.edu/pim/).

  [8] J. Moy. Multicast	Routing	Extensions to OSPF. Communications of
  the ACM, 37(8): 61-66, August	1994.

  [9] Y.K. Dalal and R.M. Metcalfe. Reverse path forwarding of	broad-
  cast packets.	Communications of the ACM, 21(12):1040--1048, 1978.

  [10] T. Billhartz, J.	Bibb Cain, E. Farrey-Goudreau, and D. Feig.
  Performance and Resource Cost	Comparisons of Multicast Routing Algo-
  rithms for Distributed Interactive Simulation	Applications; a	report
  prepared for NRL ****, July 1995.

  [11] A. J. Ballardie.	Defining a Level-1/Level-2 Interface in	the
  Presence of Multiple Protocols. Working draft, January 1996.
  ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-hier-intf-00.txt

  [12] D. Wall.	Mechanisms for Broadcast and Selective Broadcast. PhD
  thesis, Stanford University, June 1980. Technical Report #90.

  [13] M. Handley, J. Crowcroft, I. Wakeman. Hierarchical Rendezvous
  Point	proposal, work in progress.
  (http://www.cs.ucl.ac.uk/staff/M.Handley/hpim.ps) and
  (ftp://cs.ucl.ac.uk/darpa/IDMR/IETF-DEC95/hpim-slides.ps).

  [14] A. J. Ballardie.	"A New Approach	to Multicast Communication in a
  Datagram Internetwork", PhD Thesis, 1995. Available via anonymous ftp
  from:	cs.ucl.ac.uk:darpa/IDMR/ballardie-thesis.ps.Z.

  [15] D. J. Hook, J. 0. Calvin, M. K. Newton, D. A. Fuscio. "An
  Approach to DIS Scalability",	Eleventh DIS Workshop, Orlando,	Florida,
  Sept.	26th-30th, 1994, pp 94-141.


Expires	August 9th, 1996				       [Page 22]


INTERNET-DRAFT	       CBT Multicast Architecture	  February 1996


Author's Address:

   Tony	Ballardie,
   Department of Computer Science,
   University College London,
   Gower Street,
   London, WC1E	6BT,
   ENGLAND, U.K.

   Tel:	++44 (0)71 419 3462
   e-mail: A.Ballardie@cs.ucl.ac.uk


Expires	August 9th, 1996				       [Page 23]