Internet Engineering Task Force Subir Biswas INTERNET DRAFT Rauf Izmailov Expires November, 1999 Bala Rajagopalan NEC USA Inc June 25, 1999 A QoS-Aware Routing Framework for PIM-SM Based IP-Multicast Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet- Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Distribution of this memo is unlimited. Copyright Notice Copyright (C) The Internet Society (1999). All Rights Reserved. Abstract This document describes a framework for QoS-aware multicast routing based on enhancements to PIM-SM. This framework has two components: the development of multicast tree computation schemes that account for heterogeneous receiver QoS requirements, and the incorporation of QoS capabilities into PIM-SM for establishing computed paths. The choice of PIM-SM is based on its ability to support receiver-initiated join, both shared and source-specific trees. Two decentralized tree computation schemes, TIQM and NUQM, are considered in this document, both requiring support from the underlying unicast routing scheme to acquire link and nodal resource availability information. Additionally, TIQM requires complete information at each router about the resource allocation in the network for the existing multicast tree. After a brief analysis of these two schemes, the incorporation of QoS awareness in PIM-SM with Biswas et. al. [Page 1] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 NUQM as its underlying route computation mechanism is presented. It is shown that with few extensions to its existing control syntaxes, PIM-SM can be successfully used for QoS integration. Finally, a Tree State Update (TSU) mechanism, which can be used for efficient and scalable dissemination of multicast tree state information, is proposed. It is shown that by using the proposed TSU protocol it is possible to partially alleviate the control overhead of TIQM. Table of Contents 1. Introduction .............................................. 2 2. Multicast Routing and Quality of Service.... .............. 4 2.1 Core Based Tree (CBT)................................... 5 2.2 Dense Mode Protocols: DVMRP and PIM-DM.................. 6 2.3 Multicast OSPF (MOSPF).................................. 6 2.4 Protocol-Independent Multicast - Sparse Mode (PIM-SM)... 7 3. QoS-Aware Multicast Tree Construction...................... 8 3.1 Problem Statement........................................ 8 3.2 Solution Outline......................................... 10 4. Tree Information Based QoS Multicast (TIQM)................ 11 4.1 Path Computation......................................... 11 4.2 Tree Construction and Maintenance........................ 12 4.3 Discussion............................................... 16 5. Nave Unicast Based QoS Multicast (NUQM)................... 17 5.1 Path Computation......................................... 17 5.2 Tree Construction and Maintenance........................ 18 5.3 Discussion............................................... 18 6. Architecture for QoS-Enhanced PIM-SM....................... 19 6.1 Service Model............................................ 20 6.2 Tree Construction and Switching.......................... 21 6.3 QoS-extensions Necessary for PIM-SM...................... 25 7. Further Enhancements by Tree State Update (TSU) Protocol.... 26 8. Summary and Conclusions..................................... 28 References..................................................... 29 Authors' Addresses............................................. 32 Full Copyright Statement ...................................... 32 1. Introduction This document describes a QoS-aware multicast routing framework that implements resource reservation at the IP routing level. This approach aims to alleviate a problem that may be encountered with RSVP, under which routing is decoupled from resource reservation: the inability to satisfy receiver QoS requests even when suitable paths exist in the network. As a background note, few other researchers have already identified and attempted to solve this problem. The works in [5] and [6] propose bandwidth reservation on a bi-directional shared tree that can be easily incorporated within the Core Based Tree (CBT) [7] multicast routing protocol. It can be argued that in the presence of heterogeneous and receiver-specified QoS requirements, source specific trees are more efficient than bi-directional trees for optimal QoS provisioning. It is therefore concluded that source specific tree oriented protocols like MOSPF Biswas et. al. [Page 2] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 [8] and PIM-SM [9] are more suitable than CBT, for QoS enhancements. In [10], a new protocol QoSMIC that uses source specific trees is proposed. While this protocol is functionally feasible, the following problems can be identified. First, every new receiver-join process requires an extensive network-wide search that is carried out by both the receiver and the multicast tree that it intends to join. In addition to being excessively control-intensive, it increases the join-latency for a new receiver. Moreover, the path search-space for a potential new receiver is restricted to paths used by the route advertisement packets from the potential joining routers on the tree to the new receiver. This restriction may give rise to reduced join-success rate in certain scenarios. Also, there are two administrative limitations. The protocol requires a tree manager that is responsible for coordinating what is termed as a multicast tree search procedure. This centralized component of the protocol makes it vulnerable to a single point of failure. Finally, QoSMIC requires message and timer syntaxes, which are significantly different from those of any of the existing IP multicast protocols. This implies that deployment of QoSMIC may require a completely new routing protocol implementation rather that upgrading an existing one for QoS support. This document describes a multicast routing framework that is capable of providing receiver-oriented and heterogeneous Quality of Service to point-to-multipoint applications. Also, it is attempted to maintain backward compatibility in the sense that the new QoS-aware protocols should be able to (a) support best-effort multicast applications and (b) be implemented by extending an existing IP-multicast routing protocols. (Similar work is reported in [28] where a nave unicast routing approach [18] is used for incremental construction of multicast trees with bandwidth reservation. Three tree computation algorithms are proposed in this paper, each assuming different amount of tree-specific information available at the intermediate multicast routers. These algorithms support both heterogeneity and receiver-oriented bandwidth reservation. End-to-end additive QoS constraints like delay and loss, however, are not considered in [28]. The objective in this document is to design incremental tree construction algorithms which consider both bandwidth and additive QoS constraints specified by the receivers.) This document has three distinct contributions. The first one is to design a set of receiver-based algorithms for multicast tree computation with QoS constraints. The second part is to design protocol syntaxes for tree construction and maintenance using the proposed tree computation algorithms. Note that these two components are independent of any specific existing multicast routing protocol and can be used with any QoS-aware routing protocol that uses source-specific multicast trees. The third and final component of this work is to incorporate the first two into a PIM-SM routing framework. In this regard, starting with a QoS service model it is shown how the PIM-SM protocol can be augmented for QoS support. It Biswas et. al. [Page 3] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 is notable that this is done without sacrificing the soft-state feature of PIM-SM. It is also shown how the proposed tree computation algorithms can be integrated within PIM-SM while keeping the latter backward compatible with best-effort services. This extension for PIM-SM is done only for intra-AS scenarios. Extensions for inter-AS are currently under consideration. For QoS-aware tree computation two algorithms are proposed. The first one, termed as Tree Information based QoS Multicast (TIQM), relies on tree-specific link-state information storage at every node in the network. While being extremely successful in discovering QoS-accommodating paths, TIQM suffers from scalability problems due to its reliance on multicast session-specific information storage. This limitation is alleviated in the second algorithm, namely Nave Unicast based QoS Multicast (NUQM), in which no session or tree-specific link-states are used at a receiver for computing its join path. It is shown that any QoS-aware unicast routing algorithm can be used for developing NUQM. This very feature makes NUQM suitable also for inter-domain multicast tree construction. For this, all it requires is a QoS-extended version of inter-domain routing algorithm like BGP [11]. It is anticipated that at low to moderate network loads NUQM can deliver performance that is comparable to TIQM. Detailed performance comparisons between these two tree computation algorithms are furnished in [12]. The rest of this draft is structured as follows. The next section introduces the problem of multicast QoS guarantee and presents a comprehensive analysis of the QoS potential of the existing IP-multicast routing protocols. The problem of constructing QoS-constrained multicast trees is formally presented in Section 3. Sections 4 and 5 describe the proposed tree construction algorithms TIQM and NUQM, respectively. QoS-enhanced PIM-SM design is described in Section 6. In Section 7, a Tree State Update (TSU) protocol, which can be used for tree information dissemination in PIM-SM is described. Finally, in Section 8 a summary is presented. 2. Multicast Routing and Quality of Service The objective of this work is to devise an efficient, yet simple, multicast routing mechanism that integrates both route setup and resource reservation within a single QoS-aware protocol framework. Considering RSVP, the multicast tree for a group is first built using any of the standard IP multicast routing protocols like DVMRP [16] or MOSPF [18], which are QoS unaware. Afterwards, RSVP PATH messages are sent from a sender to the receivers [2] over this tree and then RESERVE messages are sent from receivers to the sender to establish resource reservation. Since RSVP resource reservation is de-coupled from route creation, the reservation is restricted only along the route that was created by the underlying IP multicast routing protocol. This implies that if the multicast routing protocols are not QoS-aware then RSVP alone cannot always provide quality of service guarantees to the multicast Biswas et. al. [Page 4] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 receivers. Therefore, it is necessary to integrate both routing and reservation installation procedures into a single multicast protocol. Existing multicast routing protocols can be classified based on their following properties. The first one is the type of tree a protocol uses for data forwarding. The second one is its underlying unicast routing protocol and the last one is the way a receiver is added to a multicast group. ------------------------------------------------------------------------ DVMRP MOSPF CBT PIM-DM PIM-SM ------------------------------------------------------------------------ Tree Type Source Source Shared Source Both U. cast Routing RIP OSPF Independent Independent Independent Rx. Join Mode Flood& Rx Init Rx. Init Flood& Rx. Init Prune Prune ------------------------------------------------------------------------ Table 1: Classification of IP Multicast routing protocols A classification summary of the existing multicast protocols is shown in Table 1. It is instructive to evaluate the QoS potentials of all five protocols in light of their properties as listed in the table. 2.1 Core Based Tree (CBT) CBT [7] is the only protocol that does not support source-based trees. It was designed so because the primary objectives of the protocol were to maintain scalability while providing efficient link sharing. This is achieved by forwarding data packets from all the sources of a multicast group through a single bi-directional shared tree. In the absence of QoS constraints and bandwidth reservation, a bi-directional tree can minimize the number of links involved with a multicast group. This is achieved through link sharing among packets from multiple senders. Link sharing, however, causes traffic concentration [27], which in turn may degrade the end-to-end performance of a multicast session. It was shown in [29] that the bound on maximum delay of an optimal core-based tree is two times the shortest path delay. Also, simulation results in [27] show that the maximum delays of core based trees, with optimal core placement, are up to 1.4 times those of the shortest path trees. For multicast with resource reservation, bi-directional tree poses additional restrictions. Consider a QoS-enabled scenario where multiple sources of a multicast group are transmitting through a shared bi-directional tree. Each sender has to transmit at a data rate that is the maximum of the individual data rates required by its receivers. In this case, a link on the tree may have to reserve bandwidth for packets from one or more of those senders. In other words, in the presence of bandwidth guarantee (between each sender-receiver pair), even though a Biswas et. al. [Page 5] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 link can be shared, separate bandwidth reservations on the link is necessary for packets from different senders. The problem that arises here is that the bandwidth reservations for all sources are restricted within the links only on the bi-directional tree. The number of links, which can be potentially used, is higher in scenarios where each sender has a source-specific multicast tree. This implies that the amount of bandwidth-constrained multicast traffic (and the number of senders/receivers) supported by a bi-directional CBT tree can potentially be smaller than that supported by multiple source-specific trees. The situation may become worse when one considers additive QoS constraints like end-to-end delay and loss. Another problem with CBT is its core selection process. CBT core selection for a group is not based on the QoS requirements of the receivers of that group. Once a core is selected, the CBT tree gradually grows around it. This process may restrict the tree topology and further reduce the QoS-constrained traffic handling capacity of the bi-directional CBT trees. The protocols in [5] and [6] propose QoS extensions to CBT but do not identify these problems. Based on the preceding reasons, source-specific trees seem better suited for scalable QoS-based IP multicast. 2.2 Dense Mode Protocols: DVMRP and PIM-DM Both DVMRP [16] and PIM-DM [17] use source specific trees. Both of them, however, rely on a flood-and-prune mechanism for joining new receivers to a multicast tree. With flood-and-prune, the data packets start arriving at a potential multicast receiver even before the receiver explicitly joins the group. As a result, the route from the source to a receiver cannot be chosen based on the specific QoS requirements of the receiver. It has to be chosen based on the QoS parameters specified by the source. Another option is to perform best-effort routing. In either case, receiver-oriented QoS cannot be entertained. Based on this observation, it may be concluded that both DVMRP and PIM-DM are not naturally suitable for receiver-oriented QoS. 2.3 Multicast OSPF (MOSPF) Support for both source-specific tree and receiver-initiated join makes MOSPF [8] suitable for QoS-oriented multicast routing. In the following, a brief outline of a possible mechanism for augmenting MOSPF to support receiver-oriented QoS is described. In order to compute a tree, a multicast router needs to know about the complete group membership information for that tree. This information is already available within standard MOSPF framework. In addition, for QoS support, the router also requires to know about the amount of resource reservation for the group at every link on the tree. This information can also be obtained through the MOSPF link-state Biswas et. al. [Page 6] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 dissemination mechanism. When a multicast packet arrives at a router, using this information, the router can uniquely compute the necessary multicast tree and forward the data packet accordingly. When a new receiver wants to join a group, the receiver's topological identity, along with its QoS requirements, is disseminated all over the local domain using MOSPF link-state update protocol. Upon reception of this new information, a multicast router re-computes the relevant multicast tree. Note that the protocol requires every router to use the same tree computation algorithm in order to avoid potential looping problems. While being functionally sufficient for QoS extensions, MOSPF does not scale well because of its reliance on group-specific link-state information dissemination [15], across the network. Note that most of the network routing protocols, including OSPF, floods network-specific information like physical topology and link-state parameters. When necessary, route computation is performed based on this information. Control and storage requirements of these protocols increase only with the network size and not with the number of sessions/connections in the network. MOSPF however departs from this model significantly by flooding and storing multicast group-specific information. The control and the storage complexities go up with both the network size and the number of active sessions in the network. With resource reservation, the situation is aggravated because in addition to group membership changes, any change in group reservation on a link will now generate a link state update message. This may cause scalability concerns for MOSPF with described QoS extensions. 2.4 Protocol-Independent Multicast - Sparse Mode (PIM-SM) PIM-SM [9] is the most recently designed multicast protocol and it attempts to incorporate most of the desirable features of multicast routing. In addition to supporting source-specific trees, the protocol provides a mechanism for uni-directional shared tree construction. The preferred mode for a receiver is to initially join the shared tree for a group and then, if better service from a source is necessary, to join the source specific tree for that particular source. This provides an ideal paradigm for QoS based multicast where a receiver can initially join the shared tree when it does not have any QoS requirements. Later, if and when the receiver decides about its specific QoS parameters, it can join the individual source-specific trees one at a time. Also, note that this paradigm can allow multiple receivers, with heterogeneous QoS requirements, to join a single source-specific tree. Another advantage of PIM-SM is its independence from the underlying unicast routing protocol. This implies that, unlike MOSPF, one is free to choose a unicast protocol that is specifically designed and better suited for routing with QoS constraints. Also, it is possible to use the same unicast protocol for computing both the source-specific and the shared trees of a multicast group. Later in Section 6, it is described how a common QoS-based tree construction algorithm can be implemented for creating both types of trees in PIM-SM. Based on these observations, MOSPF and PIM-SM are identified as Biswas et. al. [Page 7] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 target multicast routing protocols for QoS extensions. But considering the potential scalability problems of MOSPF, PIM-SM is proposed as the preferred multicast protocol for the QoS routing framework. 3. QoS-Aware Multicast Tree Construction In this section, the problem of constructing QoS-aware multicast trees is described. This applies to both source-specific and shared trees in PIM-SM. After outlining a solution, two tree-construction algorithms are proposed in the next section. The first is based on globally disseminated multicast group information and it is functionally similar to the tree computation part of QoS-augmented MOSPF (see Section 2.3). The second algorithm constructs QoS-aware trees without any global distribution of group membership information. It uses the underlying QoS-aware unicast routing algorithm. After the description and analysis of these algorithms, it is shown how they can be incorporated within the PIM-SM framework. _____ | N_0 | S |_____| \ E_(0,1) __\__ / \ _____ | N_1 | | | Source | | |_____| \_____/ _____ E_(1,2) / \ / \ __ /_ \ | | Router / \ \ | | | N_2 | \ \_____/ | | \ \_____/ \ E_(1,4) _____ * \ \ | | Receiver E_(2,5) * \ E_(2,3) \ |_____| * \ \ |_____| ___*_ __\__ __\__ | N_5 | | N_3 | | N_4 | |_____| |_____| |_____| |_____| |_____| |_____| R_3 R_1 R_2 Figure 1: Model for multicast tree with reservation and QoS guarantees 3.1 Problem Statement An incremental tree construction model is assumed in which the receivers join and leave a multicast group asynchronously. The tree construction problem is explained using the example scenario Biswas et. al. [Page 8] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 depicted in Figure 1. A new receiver R_3 intends to join a multicast tree (S, G) with source S and two existing receivers R_1 and R_2. The tuple (S,G) refers to the source-specific tree for source S in group G. Similarly, a shared tree for group G is represented as (*,G). The objective here is to join the new receiver with its desired Quality of Service, while not interrupting the existing QoS that is being delivered to the receivers R1 and R2. The i_th node of the tree is enumerated as N_i and the edge between the nodes N_i and N_j is represented as E_(i,j). Let B_(i,j)(S,G) represent the amount of bandwidth reserved on edge E_(i,j) for the tree (S,G). Similarly, D_(i,j) and L_(i,j) represent the delay and loss bounds which have been advertised for the edge E_(i,j). We also define a parameter A_(i,j) which indicates the amount of unoccupied bandwidth that is available for new traffic across the edge E_(i,j). Note that these notations are instantaneous and directional in nature. For instance, the quantity D_(i,j) represents the current value of the advertised delay bound for link E_(i,j) in the direction N_i to N_j. Now the QoS specifications for a receiver R_i is defined as the tuple {r_i(S,G), d_i(S,G), l_i(S,G)}. The parameter r_i indicates the rate with which the receiver intends to receive IP packets through tree (S,G). Similarly, d_i, and l_i represent the bounds for end-to-end delay and packet-loss-rate, which the receiver can tolerate. Data rate from the source is expressed as B(S,G). Using these notations, the bandwidth constraints of the QoS-aware tree with receivers R_1 and R_2 can be stated as: r_1(S,G) <= B_(2,3)(S,G), r_2(S,G) <= B_(1,4)(S,G), B_(2,3)(S,G) <= B_(1,2)(S,G), Max(B_(1,2)(S,G), B_(1,4)(S,G)) <= B_(0,1)(S,G) and B_(0,1)(S,G) <= B(S,G). Similarly, the delay and the loss constraints can be stated as: d_1(S,G) >= D_(2,3) + D_(1,2) + D_(0,1), l_1(S,G) >= L_(2,3) + L_(1,2) + L_(0,1), d_2(S,G) >= D_(1,4) + D_(0,1) and l_2(S,G) >= L_(1,4) + L_(0,1). Biswas et. al. [Page 9] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 Now the problem is to add the new receiver R_3 to the tree in a way so that the bandwidth, delay and loss constraints are satisfied for both the new and the existing receivers. In this example the new receiver can join the tree (S,G) at any of the nodes N_1 or N_2. This is because in this model it is assumed that a new receiver can join only at an intermediate router and not at the source or at any of the existing receiver nodes. If R_3 joins at node N_2 then in addition to the previous conditions the following should be satisfied. r_3(S,G) <= B_(2,5)(S,G), Max(B_(2,5)(S,G), B_(2,3)(S,G)) <= B_(1,2)(S,G), d_3(S,G) >= D_(2,5) + D_(1,2) + D_(0,1) and l_3(S,G) >= L_(2,5) + L_(1,2) + L_(0,1). This model of tree construction assumes that when a new receiver joins, the existing part of the tree is not changed. By eliminating this restrictive assumption it is possible to construct and maintain trees which are more efficient in terms of network resource utilization [18]. In practice, however, it means that every time a new receiver joins, the tree might have to undergo a reconstruction phase and this may cause significant service disruptions for the existing receivers. Hence, the provisions for reconstruction of the existing parts of a multicast tree are precluded. 3.2 Solution Outline The tree construction problem as described above is different from static Steiner Minimal Tree (SMT) computation [15, 18]. The reasons are as follows. First, in addition to bandwidth, additive QoS constraints are also considered. These are not considered in the standard SMT computation. Also, the standard SMT computation does not take into account heterogeneous receiver requests and incremental multicast trees. In the present case, after satisfying the bandwidth, delay and loss constraints of individual receivers, it is necessary to minimize the bandwidth consumption of a computed tree. It is shown in [19] that computing just a point-to-point route with more than one additive QoS constraint (i.e. delay and loss) is NP-complete. However, algorithms are available to compute such a route with bandwidth and only one additive constraint in polynomial time [3]. Using similar algorithms it is also possible to compute source-specific multicast trees, with bandwidth and only one additive constraint, in polynomial time. However, construction of such a tree with bandwidth, delay and loss constraints is NP-compete. For computational scalability reasons, the constrained tree is computed using a set of heuristics based on Dijkstra's shortest path computation algorithm [20]. Two algorithms, differing in their assumptions about the amount of group-specific information availability, are proposed in this draft. Biswas et. al. [Page 10] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 4. Tree Information Based QoS Multicast (TIQM) In this algorithm, the joining point for a new receiver R, to an existing tree (S,G), is computed based on the network and tree-specific information stored at R. Following is a list of state information which are required to be stored within every multicast-aware router. The general scenario is considered in which a receiver can also act as a multicast router. This is to include the behavior of a PIM-SM Designated Router (DR) that can act both as a receiver and an intermediate multicast router 1. Physical connectivity of the domain (with N nodes). 2. Link-state information A_(i,j), D_(i,j) and L_(i,j) for links E_(i,j), where i,j = 1..N and i != j. 3. Topology and membership of all the multicast trees in the domain. 4. For every tree (S,G), the reservation information B_(i,j)(S,G). This quantity refers to the amount of bandwidth reserved for tree (S,G) along the direction N_i -> N_j on link E_(i,j), where i,j = 1..N and i != j. The information in item (1) can be disseminated using an existing link-state update protocol such as OSPF [13]. The information indicated in item (2) can be propagated in link-state advertisement(LSA) messages. Such additions are already proposed and partially implemented in various QoS-aware unicast protocols like those presented in [21] and [22]. Tree-specific membership information in item (3) can also be flooded by the link-state protocol, as exemplified by MOSPF. Finally, the tree-specific reservation information of item (4) can also be broadcast in LSAs of a protocol like MOSPF. Design of TIQM involves two major steps. The first part is the development of the algorithm used by a joining receiver to determine the path to the appropriate node in the target tree. Once such a path is computed, the second part involves the syntax and protocol specification for routing the join message, and for actually grafting the new receiver to the existing tree. This also includes mechanisms for membership deletion and tree pruning. 4.1 Path Computation As stated in Section 3.1, the problem here is to compute a route for a new receiver R_n to join a multicast tree (S,G). The route computation should be such that the QoS requirements {r_n(S,G), d_n(S,G), l_n(S,G)} of R_n are respected while those of the existing receivers are not violated. It can be shown that computing just a unicast route with more than one additive QoS constraints (i.e. delay and loss) is NP-compete [19]. Extending this for multicast makes it computationally even more expensive. For computational scalability reasons, a polynomial time heuristic solution that can compute a join-route is considered. Biswas et. al. [Page 11] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 The join-route computation algorithm is executed at the new receiver R_n, with its QoS requirement {r_n(S,G), d_n(S,G), l_n(S,G)}. The steps of the heuristic algorithm are as follows: 1. From the network graph, prune all the links E_(i,j) with A_(i,j) + B_(i,j)(S,G) <= r_n(S,G). If the pruned graph contains multiple disjoint sub-graphs, and the nodes R_n and S becomes disconnected then declare a join-rejection and terminate the algorithm. If R_n and S belong to a connected sub-graph then on every link of that sub-graph there should be enough bandwidth to support a join-route from R_n to (S,G). 2. Run Dijkstra's min-cost algorithm [20] on the sub-graph with D_(i,j) as the cost parameter. The result of this would be the minimum-delay paths from Rn to all other nodes in the sub-graph. 3. Choose M of those, which are the min-delay paths from Rn to all M nodes which are on tree (S,G). Extend each of these M paths, from the node on (S,G) tree to the source S, by traversing upwards along the tree. 4. Check the delay (d_n(S,G)) and loss (l_n(S,G)) feasibility for all of those M extended paths. If available, choose a feasible path and terminate the algorithm. When multiple paths are feasible, one can be selected based on a preset policy like minimizing bandwidth consumption. 5. If no feasible path was found at this stage, run Dijkstra's min-cost algorithm with loss L_(i,j) as the cost parameter and repeat steps 2 to 4. 6. If no path is found then declare a join-rejection for receiver Rn. The worst case computational complexity of the algorithm is as follows. Step (1) is O(N^2) complex, where N is the number of nodes in the network. Dijkstra's min-cost path computation in Step (2) has a worst case complexity of O(N^2) [20]. Steps 3 and 4, together, are again O(N^2) complex. Combining these, the worst case complexity turns out to be O(N^2). 4.2 Tree Construction and Maintenance The route computation of TIQM is de-coupled from the actual tree construction and maintenance protocol. TIQM can be used with the QoS-enhanced versions of both MOSPF and PIM-SM. In this section, the signaling syntax for receiver initiated join and tree maintenance is described. A variation of this will be used later, in Section 6, for incorporating TIQM into the proposed QoS-aware PIM-SM framework. Consider the scenario in Figure 2, where a new receiver R_n intends to join a tree (S,G) that has two existing receivers R_1 and R_2. Based on the route computation algorithm described in Section 4.1, Biswas et. al. [Page 12] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 R_n computes a QoS-feasible path {N_5->N_6->N_2->N_1->N_0}. Two signaling messages, namely, Join and Join_Ack are defined. These are used for path setup from receiver R_n to tree (S,G). A Join message is formatted as: Join(R_n, (S,G), {r_n(S,G), d_n(S,G), l_n(S,G) }, ERO), where r_n(S,G) indicates the receiver specified data-rate and {d_n(S,G), l_n(S,G)} represent the remainder of the delay and the loss budgets, initially specified by R_n. To explain it further, when an intermediate node receives a Join message, it interprets d_n(S,G) as the maximum delay that can be incurred by the remainder of the hops to the tree source. The node also modifies the delay budget as d_n(S,G) d_n(S,G) - D_(i,j), before forwarding the Join message. The last entity ERO is the Explicit Route Object, which is {N_5-N_6-N_2-N_1-N_0} in the present example. A Join_Ack message is constructed as: Join_Ack(R_n, (S,G), Success_Flag, Failure_Reason, Failed_Link), in which the Success_Flag indicates if R_n was successfully joined to (S,G). The parameter Failure_Reason encodes the reason for join-rejection and Failed_Link carries the identity of the link on which the join had failed. _____ + | N_0 | S Receiver-Computed + -->|_____|..... Join-path to (S,G) + | \ : + | __\__ : + |____/ \<..: + | N_1 | + ------>| | + | \_____/\ + | / : \ + | __ /_ : \ + |___/ \<.: \ + | N_2 | \ + ------>| | \ + | \_____/\ \ + | * : \ \ + | __*__ : \ \ + |____/ \<.: \ \ + | N_6 | \ \ + ----->| |.... \ \ + | \_____/ : \ \ + | * : \ \ + | ___*_ : __\__ __\__ + |___| N_5 | : | N_3 | | N_4 | + Join |_____|<.....: |_____| |_____| + |_____| Join_Ack |_____| |_____| R_n R_1 R_2 Figure 2: Join procedure for a new receiver Biswas et. al. [Page 13] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 The join procedure has the following steps. 1. Receiver R_n computes a join-path and launches a Join message towards (S,G). 2. Upon reception of the Join message, an intermediate node N_i checks for resources and QoS constraints along the downstream direction (sender to receiver) on the incoming link E_(i,j). The resource check succeeds if r_n(S,G) <= A_(i,j) + B_(i,j)(S,G), d_n(S,G) >= D_(i,j) and l_n(S,G) >= L_(i,j). Given that TIQM has included link E_(i,j) in the ERO, this resource-check should fail only due to transient mismatch between the actual link-states and those stored within the receiver node R_n. In case of a failure, the node sends a Join_Ack message, with Failed_Link as E_(i,j), back to R_n. On its way back to Rn, the Join_ack message clears all the resources that were reserved by the corresponding Join message. 3. If link E_(i,j) is on tree (S,G) then a check is performed to see if the existing reservation B_(i,j)(S,G) for tree (S,G) is sufficient to handle the new receiver with bandwidth requirement r_n(S,G). If B_(i,j)(S,G) <= r_n(S,G), first the available bandwidth record is adjusted as A_(i,j) <- A_(i,j) - (r_n(S,G) - B_(i,j)(S,G)) and then the reservation B_(i,j)(S,G) is increased to r_n(S,G). If E_(i,j) is not on tree (S,G) then a reservation for tree (S,G) is made as B_(i,j)(S,G) <- r_n(S,G) and the available bandwidth record is adjusted as A_(i,j) <- A_(i,j) - B_(i,j)(S,G). 4. After executing steps 2 and 3, if the intermediate node N_i finds that it itself is not the source node then it forwards the Join message towards the source. If the node is not on (S,G) then it uses the ERO object for forwarding. Otherwise, it is forwarded upstream towards S along the tree (S,G). Before forwarding a Join message it's delay and loss objects are modified as d_n(S,G) <- d_n(S,G) - D_(i,j) and l_n(S,G) <- l_n(S,G) - L_(i,j). If N_i is the source, it sends back a Join_Ack, with Success_Flag as TRUE, to the receiver R_n. If the Join message is forwarded up, the next upstream nodes execute steps 2 and 3. Note that this signaling protocol assumes that every successful Join message has to travel all the way up to the source S. This turns out to be a requirement only when the joining receiver specifies at least one additive QoS metrics like delay or loss. This requirement can be avoided as follows. Every intermediate multicast router on a tree stores its distance Biswas et. al. [Page 14] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 (both in terms of delay and loss) from the source along the tree. When a Join message arrives, based on the stored distance information a node can judge if the additive constraints can be satisfied. With such an arrangement, a Join message has to travel up only for the bandwidth constraint but not for the additive ones. Downstream Join-Ack messages can be used for building up these distance information along the routers. In the absence of any additive metric, the Join message propagation stops as soon as an intermediate node N_i finds the condition r_n(S,G) <= B_(i,j)(S,G) to be true. 5. When a Join_Ack message comes back with a failure indication to Rn, it has the option to crank-back [23] the join operation. Cranking-back involves running TIQM with the Failed_Link pruned out of the network, and then re-launching a Join message with the newly computed ERO object. Note that a crank-back will help only when the initial join-rejection was a result of transient errors in link-state information within Rn. A join operation may also fail in the following situation. Since the receivers do not know about the sender's capability a priori, it is possible for a receiver to ask for bandwidth that is more than the sender's data rate. In such a situation the Join message will travel all the way to the source and if the source is not ready to increase its data rate, a Join_Ack message will be returned to the corresponding receiver. The Join_Ak message will carry the rate bound for the sender. If the receiver intends to retry the Join, this time it should choose a realistic bandwidth parameter based on the rate bound, specified by the sender. The usual crank-back would not help in this situation. The join-protocol assumes an explicit routing model in which the new receiver computes the entire join-path and sends that as the ERO component in a Join message. TIQM can also be used by a hop-by-hop join-protocol in which every hop node has to run TIQM and find the next hop link on the join path. This has the advantage of being more dynamic and not requiring a long Join message for incorporating a large information field like ERO. However, the hop-by-hop model may suffer from larger join latency, caused by TIQM computation at every intermediate node on the join-path. Also, a hop-by-hop mechanism is less immune to looping that may result from possible inconsistencies in link-state information in different TIQM computing nodes. TIQM can be implemented as soft-state or hard-state. In hard-state (e.g. [7]), after a receiver joins a tree the routing states in the multicast routers are nailed down permanently. It can be deleted only when the receiver leaves the tree and a prune message is sent explicitly to the tree. With soft-state, a receiver, after joining the tree, sends periodic Join messages for refreshing the routing states in the network. In order to leave the receiver simply stops sending Join messages and the routers eventually time out and delete Biswas et. al. [Page 15] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 the corresponding routing states. More about soft-state protocols can be found in [2]. Every multicast router with a (S,G) routing entry maintains an input-interface (iif) and a number of output-interfaces (oif) associated with the routing entry. A multicast IP packet, that has a source stamp (S,G) and is received through the iif, is copied and forwarded through all the oif-s. Note that the Reverse Path Forwarding (RPF) check [4] is not feasible for forwarding with QoS constraints. This is simply because there is no way to ensure that a router will receive an IP packet, on a QoS-aware multicast tree, through an interface which is used by the router for sending unicast packets to the tree source. So the loop prevention has to be performed at the TIQM route computation level. 4.3 Discussion TIQM provides an algorithm for QoS-aware multicast route computation in the presence of both network-specific and tree-specific link state information. Availability of global information ensures that when enough bandwidth is available to accommodate a new receiver, TIQM does so with very low join-rejection probability. Also, because of its reliance on multicast group-specific information, TIQM has the potential for performing traffic engineering [24] while computing multicast routes. However, the algorithm has the following disadvantages. First, it relies on group and tree-specific link state information, which may be of severe scalability concern [15]. It will be more so when multicast applications become widespread in the Internet. In addition to the network-specific state changes, every join and leave event for all trees in a domain would generate a link state update event. This adds to both control and storage complexity. Performance of the algorithm can be broadly categorized into a user-specific part and a network-specific part. The user-specific part includes mean join-acceptance rate and the packet-level Quality of Service, experienced by the multicast receivers. The network-specific part can be expressed in terms of bandwidth utilization, which translates to the multicast traffic handling capacity of the protocol. More about the performance of TIQM can be found in [12]. Since TIQM relies on maximum amount of possible link and group state information, its performance is considered as a reference for comparison while evaluating the next tree computation algorithm, NUQM. Biswas et. al. [Page 16] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 5. Nave Unicast Based QoS Multicast (NUQM) This algorithm avoids the scalability problems encountered with TIQM. The idea behind NUQM is quite simple and it has two components. In order to join a receiver R_n, QoS-aware unicast route from R_n to the source S (of tree (S,G)) is first computed and this route is then merged with the existing tree (S,G). Since the algorithm needs to compute only unicast routes, it can be implemented using any QoS-aware unicast routing algorithm [21, 22], or an overlay QoS-routing protocol, as proposed in [25]. This unicast protocol-independence makes NUQM an ideal candidate for PIM-SM. Also, note that in NUQM no group or tree-specific link-state information is necessary for route computation. Following is a list of state information which are required to be stored within every multicast-aware router. 1. Physical connectivity of the domain (with N nodes). 2. Link-state information A_(i,j), D_(i,j) and L_(i,j) for links E_(i,j), where i,j = 1..N and i != j. While the information in item (1) can be obtained through a link-state update protocol such as OSPF, the information required under item (2) requires either a QoS-aware version of OSPF [21, 22] or an overlay routing protocol like [25]. Like in TIQM, NUQM also has a route computation and a join-signaling phase. It is the signaling phase during which the computed unicast path is merged with a multicast tree. 5.1 Path Computation The problem here is to compute a unicast route from a new receiver Rn to the source S of the destination tree (S,G). The constraints are to satisfy QoS requirements {r_n(S,G), d_n(S,G), l_n(S,G)} of R_n. As stated earlier, computing such a path with more than one additive QoS constraints is NP-complete. Therefore, a heuristic solution that can compute a route in polynomial time is considered. The route computation steps at Rn are as follows. 1. From the network graph, prune all the links E_(i,j) with A_(i,j) <= r_n(S,G). If the pruned graph contains multiple disjoint sub-graphs and the nodes R_n and S becomes disconnected then declare a join-rejection and terminate the algorithm. If R_n and S belong to a connected sub-graph then on every link of that sub-graph there should be enough bandwidth to support a unicast connection from Rn to S. 2. Run Dijkstra's min-cost algorithm on the sub-graph with delay D_(i,j) as the cost parameter. This step will yield the minimum-delay path from R_n to source S. Biswas et. al. [Page 17] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 3. Check the delay (d_n(S,G)) and loss (l_n(S,G)) feasibility for this min-delay path. If feasible, select this path and terminate the algorithm. 4. If the path was not feasible then run Dijkstra's min-cost algorithm with loss L_(i,j) as the cost parameter and repeat steps 2 and 3. 5. If no path is still found then declare a join-rejection for receiver Rn. It should be noted that the path computation is only to find a unicast route and no multicast group or tree-specific link-state information is used for this unicast route computation. In the worst case, Step (1) is O(N^2) complex where N is the number of nodes in the network. Dijkstra's min-cost path computation in Step (2) also has a worst case complexity of O(N^2). Finally, O(N) is the worst case complexity of Step (3). Combining these, the worst case computational complexity of NUQM turns out to be O(N^2). 5.2 Tree Construction and Maintenance The same Join and Join_Ack message formats, as defined for TIQM in Section 4.2, is used here. The join procedure is explained with an example scenario that is depicted in Figure 2. The new receiver R_n first computes a QoS-constraint unicast route {N_5-N_6-N_2-N_1-N_0} to the source S and then it launches a Join message with the entire unicast route as the ERO in it. On its way towards S, the Join message checks if each link on its path belongs to the tree (S,G). If a link is not part of the tree, as in the case of links E_(6,5) and E_(2,6), the required resources are reserved on them. For links that are part of the tree, as in the case of E_(1,2) and E_(0,1), it is first checked if enough resources are already reserved for the existing tree. If the existing reservation is not sufficient for handling receiver R_n, then the tree reservation is increased on those two links. Once the join message reaches the source S, a successful join for R_n is declared. Note that if sufficient resources are not available at any of these steps, a join-rejection occurs and a Join_Ack with failure indication is sent back to receiver R_n. The join procedure, formally described for TIQM in Section 4.2, canbe used for NUQM without any modification. In addition, the explicit routing, crank-back, pruning, soft-state, packet forwarding and loop prevention issues, which were discussed for TIQM, also apply for NUQM in the same way. 5.3 Discussion NUQM provides multicast routing with heterogeneous and receiver-oriented QoS support. While using the same join and prune protocols, NUQM differs from TIQM in that it computes a QoS-aware Biswas et. al. [Page 18] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 unicast route based only on network link-state information. Unlike TIQM, by not using group and tree-specific information, NUQM keeps the control and storage overheads low. For instance, NUQM can easily scale for inter-domain multicast, provided there exists a QoS-extended inter-domain unicast routing protocol [11]. Considering this, NUQM may be of significant potential for next generation QoS-based multicast protocols. The downside of NUQM is its high receiver join-rejection probability in certain scenarios. Consider a QoS-aware unicast route computation that fails because of insufficient A_(i,j) values. This can be a false rejection because the network may already have reservation B_(i,j)(S,G), which is sufficient for accommodating the new receiver with requirement r_n(S,G). In other words, the algorithm is bandwidth-conservative because it anticipates more bandwidth than what is really needed to accommodate a new receiver. When compared with TIQM, this nave protocol is expected to provide similar performance in light to moderately loaded scenarios. For higher load, however, NUQM will have larger join-rejection rate because of its conservative bandwidth assumption. Under the present incremental tree construction model, even for TIQM, a multicast tree may drift away from optimality (in terms of overall bandwidth consumption) as new receivers join and leave the tree. This effect is expected to shrink the performance gap between TIQM and NUQM at high network load. Optimizing the unicast route computation algorithm in the following way can alleviate the mentioned false join-rejection problem of NUQM. A new receiver should use larger (than Ai,j values) bandwidth availability numbers, A_(i,j) + k_(i,j), while computing the unicast route. The excess amount ki,j can be estimated from various higher level topology and tree information. For instance, topologically (physical) closer a link from the tree source, it is more likely that the link is on the tree. And if it is assumed to be on the tree then one can optimistically infer that certain amount of bandwidth on the link is already reserved for the tree. Based on this assumption, along with a suitable estimation model, the receiver can calculate the quantity k_(i,j) for computing the unicast route. Performance details for both TIQM and NUQM, along with the improvements by this heuristics, are reported in [12]. 6. Architecture for QoS-Enhanced PIM-SM After comparing the existing multicast routing protocols it was concluded in Section 2 that PIM-SM is the preferred candidate for QoS extensions. This is mainly because of its ability to support receiver-initiated join, both shared and source-specific trees and its independence from the underlying unicast routing protocol. In this section, an architecture for QoS-oriented PIM-SM is described. This can be implemented using either TIQM or NUQM as its underlying multicast tree construction algorithm. Biswas et. al. [Page 19] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 In order to keep the description specific, the architecture is presented with NUQM. NUQM and TIQM differ only in terms a of their path computation algorithms, but the join and the prune procedures for both of them are identical. This implies that a common set of QoS-enhanced PIM-SM protocol syntaxes can be used for both NUQM and TIQM. Here only NUQM is considered. While joining a multicast tree, a receiver can opt for TIQM or NUQM based on the amount of tree-specific information available to it. After specifying the NUQM integration details, a set of architectural changes that are necessary for QoS support in PIM-SM are outlined. The exact message, protocol and timer syntaxes are deferred to a future design specification. The present protocol architecture is targeted only towards intra-domain multicast applications; the inter-domain scenarios are left for future consideration. 6.1 Service Model Before embarking into the protocol details let us illustrate the multiparty service model that can be supported by QoS-extended PIM-SM. Consider an example of a technical conference with 16 parallel sessions being multicast over the Internet. Video and audio IP packets from all 16 sessions are sent initially over a single PIM-SM shared tree. We shall refer each machine that generates packets for a session as the source for that multicast session. There are therefore 16 sources for this session. A researcher, who wants to tele-attend the conference, takes the following steps on her machine in the given sequence. 1. She launches her Conference-Viewer application which, using a unicast connection, contacts the conference-server (e.g. www.comsoc.org) in order to retrieve the multicast group address assigned to the conference. Optionally, the application may also receive finer grain details including number of multicast sources, their IP addresses and the current traffic specification of each source. 2. Conference-Viewer now sends a PIM-SM Join message in order to join the researcher's machine to the shared tree. After a successful join, 16 windows are created and the conference session proceedings are displayed individually in those windows. At this stage, since all the traffic is coming through the non-optimal shared tree, the viewing performance of a number of sessions could be less than satisfactory. 3. After looking at the session windows, the researcher may decide on a specific session she may want to view at a specific data-rate. Once she indicates this to the application, it initiates a QoS-aware join to the corresponding source-specific tree of PIM-SM. Once joined, a session window with desired video quality pops up at her workstation terminal. At this stage, the application may or Biswas et. al. [Page 20] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 may not leave the shared tree. When it decides to do so, it sends an explicit prune message towards the shared tree. In this model, it was assumed that the shared tree is QoS-unaware. This is because providing QoS over the shared tree requires the receivers to know about the details of the unicast tunneled routes between the sources and the Rendezvous Point (RP) of the tree. This information may or may not be available to the receivers. However, if a receiver has access to this information, possibly from a session server (the conference server in the above example), then it is possible to incorporate QoS constraints even within the shared tree. 6.2 Tree Construction and Switching The mechanisms for QoS-aware tree construction and switching are explained through an example multicast session. Details of standard PIM-SM will not be described here. Only the bare minimum that is necessary for QoS details will be presented. For a comprehensive description of the standard PIM-SM architecture and protocol details the readers are referred to [27]. (1) |S_1 (1) | S_2 (1) |S_1 (1) | S_2 __|__ __|__ __|__ __|__ | N_1 | | N_2 | | N_1 | | N_2 | |_____| |_____| |_____| |_____| (2)\ /(2) (2)\ /(2) \ RP / \ RP / (2) \_____/ (1) (2) \_____/ (1) / \ / \ Shared | N_0 |(*,G):null->3(0) | N_0 |(*,G): Tree | |<----- (ST) | |null->3(0) (ST) \_____/ |Join # \_____/ # |(3) | # |(3) # | | # | # (1) _|___ | # (1) _|___ # / \----- ## / \ # | N_3 |(*,G):1(0)->3(0) # # | N_3 |(*,G): # | | # # | |1(0)->2(0),3(0) # \_____/<---- # # \_____/<-------------- # /(2) |Join # #/(2) \(3) Join| # / | # /# \ | # __/__ (1) | # (1)__/__# __\__ (1) | # | N_4 | | # | N_4 |# | N_5 | | # |_____|-------- # |_____| # |_____|--------- |_____|(*,G):1(0)->2(0) (*,G): |_____| #|_____|(*,G): | 1(0)->2(0) | | 1(0)->2(0) R_1 |(2) R_1 |(2) R_2 |(2) (part-a) (part-b) Figure 3: Receivers R_1 and R_2 join shared tree (*,G) Biswas et. al. [Page 21] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 In Figure 3, a multicast group G is shown with two sources S_1 and S_2. Router N_0 in G is designated as the Rendezvous Point (RP) for its shared tree (*,G). Note that the nodes N_1, N_2 and N_4 represent the designated routers (DR) [4] for sources and receivers, which are directly connected to them. The DRs can be root, leaf and non-leaf nodes of the multicast trees. In order to keep the protocol description simple, the DRs are referred to as sources and receivers. This way, QoS-aware PIM-SM can be described without involving the IGMP [26] protocol details. Note that this assumption does not restrict the scope of the described protocol. (1) |S_1 (1) | S_2 # N_0 __|__ __|__ # -------------->| N_1 | | N_2 | shared # | /|_____| |_____| tree # |Join /(3) \(2) /(2) ## N_3 | / \ RP / # # | B=2 / (2)\_____/ (1) # # | / / \ # # | / | N_0 |(*,G): # # | / | |null->3(0) # # | (1)_/__ \_____/ # N_4 #N_5 | / \ |(3) ----| N_6 |RP=0:(S1,G): | | | 1(2)->2(2) _|___ (1) N_1 # \_____/ / \RP=1:(S_1,G): #source ^ \(2) | N_3 | 1(0)->3(0) # specific | \ | |(*,G): # QoS tree | \ \_____/ 1(0)->2(0),3(0) # (SSQT) Join \ \ (2)/ ^ \ (3) # B=2 \ \ / | \ # \ B=2 \ (1)/ | \ # N_6 \ \ / P| \ # RP=0: \ \ /____ | __\__ # (S_1,G): \ \ | N_4 |-/ | N_5 |(1) # B=2 3(2)->2(2) \-----\|_____| |_____| # (*,G): (3)|_____| |_____|(*,G): # 1(0)->2(0) | | 1(0)->2(0) # R_1 |(2) R_2 |(2) N_4 # Figure 4: Receiver R_1 joins source specific QoS tree (S_1,G) Receiver Join ------------- When a receiver R_1 intends to join the Shared Tree (ST) it uses NUQM (with RP as the source and without any QoS constraint) to compute the join-path to ST. The first part of the figure shows how the join messages are forwarded towards RP. Note that Join and Prune messages are indicated by the 'J' and 'P' symbols in all the following figures. Although in standard PIM-SM both join and prune syntaxes are combined into a single message, these messages will be referred to separately throughout this document for the sake of clarity. The Biswas et. al. [Page 22] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 interface numbers of the routers are shown within dotted circles. Multicast routing-table entries are shown as: [RP=X, (tree): iif(B) -> oif_j(B_j)], where RP indicates the standard PIM-SM RP-Bit, tree refers to the associated tree with this entry, 'iif' is the input interface, B is the bandwidth reservation at input interface, oif_j refers to one of multiple possible output interfaces and B_j indicates the reservation at output interface oif_j. After the join is complete for R_1, the (*,G) tree topology and multicast routing entries for the tree becomes as shown in Figure 3:a. At this stage, a new receiver R_2 joins the shared tree by first computing an ERO using NUQM and then sending a Join message towards RP. The scenario is shown in Figure 3:b, where the computed ERO contains {N_5, N_3, N_0} and the Join message is forwarded up to the node N_3. Note that for this non QoS-aware case, a Join message does not have to travel all the way to the source; it is terminated by the first intermediate node that is already on ST. Tree Switching -------------- Initiation of a tree switching is considered in Figure 4. The receiver R_1 decides to join the Source Specific QoS Tree (SSQT) for source S_1. In this example, R_1 decides to join with a bandwidth specification of 2 units. It first computes the join-path using NUQM (with r_1(S_1,G) = 2) and then it launches a Join message with the computed ERO {N_4, N_6, N_1}. While reserving the required resources (it is indicated as B=2 in Figure 4) the Join message travels all the way up to the source S_1. After a successful join, the SSQT for S_1 would be as shown in the figure. Note that the receiver R_1 remains a member of both (*,G) and (S_1,G) trees. Through (*,G) it continues receiving IP packets from source S_2. After the join to SSQT is complete, R_1 sends a Prune message along the (*,G) tree with a prune list containing source S_1. This is necessary to make sure that packets from S_1 should no longer be delivered to R_1 along the (*,G) tree. The Prune message does not have to travel beyond node N_3 because N_3 still has to forward packets from S_1 along (*,G) in order to deliver them to receiver R_2. After this Prune procedure is completed, the multicast routing table entries across the network looks like those in Figure 4. Figure 5 shows what happens when receiver R_2 also joins the SSQT for S_1. Using NUQM, R_2 computes a join-path ERO {N_5->N_3->N_6->N_1} and that is with a specified data-rate r_2(S1,G) = 5. Then it launches a Join message which travels all the way up to S1 and installs the new bandwidth reservation state B=5 on all the links on its way. Note that in this example the Join message has to traverse all the way to the root of SSQT because the new reservation (B=5) is more than what was already reserved (B=2) for SSQT after receiver R1 has joined it. With r_2(S_1,G) = 1, for instance, the Biswas et. al. [Page 23] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 Join message propagation could have been terminated at node N6. After the Join operation is completed, SSQT entries [RP=0:(S_1,G):4(5) -> 3(5)] at node N_3 and [RP=0:(S1,G):1(5) -> 2(2),3(5)] at node N_6 are set up. At this stage, R_2 sends a Prune message along the (*,G) tree with a prune list containing source S_1. Upon reception of it, node N_3 deletes the S1-specific part of the shared tree entry [RP=1:(S_1,G):1(0) -> 3(0)]. It also realizes that there is no more downstream node that is interested in the packets from S1 along the shared tree. That is why it forwards the Prune message upstream towards RP. Upon reception of this message, RP inactivates the S1-specific part of the shared tree entry and sends a Register-Stop [27] message towards source S_1. This is for instructing S_1 not to forward its encapsulated packets towards RP any longer. (1) |S_1 (1) | S_2 # N_0 __|__ __|__ # -------------->| N_1 | | N_2 | shared # | (3)/|_____| |_____| tree # |Join / (2)\ /(2) ## N_3 | / \ RP / # # | B=5 / (2)\_____/(1) # # | / / \ # # | / | N_0 |RP=1:(S_1,G): # # | / | | null->null # # | (1)_/__ \_____/(*,G):null->3(0) # N_4 # N_5 ----/ \ (3) | ^ | N_6 | | : Prune | |(3) B=5 (1)_|_:_RP=0:(S_1,G): N_1 # \_____/___________ / \ 4(5)->3(5) # source (2)\ <-----------| N_3 |(*,G): # specific RP=0: \ Join (4)| | 1(0)->2(0),3(0) # QoS tree(SSQT) (S_1,G): \ \_____/<..... # 1(5)->2(2), \ (2)/ ^\(3) :Prune # B=5 3(5) \ / | \ : ############# N_3 \ / | \B=5 : # N_6 # B=2 \ /(1) | \ : # # \ _/___ |J __\__: # # (3)\| N_4 | | | N_5 |(1) # B=2 # B=5 RP=0: |_____| -|_____|RP=0:(S_1,G): # # (S_1,G):3(2)->2(2)|_____| |_____| 1(5)->2(5) # # (*,G) :1(0)->2(0) | | (*,G): # # R_1 |(2) R_2 |(2) 1(0)->2(0) N_4 # N_5 # Figure 5: Receiver R2 joins source specific QoS tree (S1,G) Receiver Departure ------------------ Consider the scenario in which receiver R_2 intends to leave the multicast group. This requires two separate Prune Biswas et. al. [Page 24] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 processes for modifying both (*,G) and (S_1,G) trees. Pruning the shared tree is just as in standard PIM-SM. However, for the SSQT, the Prune message may or may not have to travel up to the source; this will be decided based on the (S_1,G) reservation status at different edges of the tree. In this particular example, a Prune for (S_1,G), from R_2, would travel the route {N_5->N_3->N_6->N_1} so that it can delete the entry for (S_1,G) at node N_3 and adjust the bandwidth reservation (from 5 to 2) for (S_1,G) over link E_(1,6). This concludes an outline of the NUQM based PIM-SM architecture that allows receivers to join multicast sessions with heterogeneous QoS specifications. In the present example, it was assumed that the PIM shared tree was QoS-unaware and the source specific tree was QoS-aware. This is because of the assumption that in the service model a receiver does not know anything about the tunneled unicast route between a source and the Rendezvous Point of the shared tree. However, if this information is available, the proposed architecture does not preclude the possibility of constructing QoS-aware shared trees. Note that the protocol description is restricted only for point-to-point links. This is mainly because a clean model is still unavailable for resource reservation over multi-access links like Ethernet. However, the present architecture is fully backward compatible in the sense that non QoS-aware receivers can also be incorporated across shared media links. 6.3 QoS-extensions Necessary for PIM-SM In this section, the necessary functional and protocol extensions for PIM-SM for incorporating QoS are outlined. Although both NUQM and TIQM can work hop-by-hop or explicitly routed, the latter is preferred for its better loop-prevention capabilities. Also, for explicit routing, only the joining receiver has to compute the route whereas for hop-by-hop routing every intermediate router needs to compute a QoS-aware route. For standard PIM-SM, however, routing is performed in a hop-by-hop mode. For better integration of the QoS-aware routing mechanisms, it is proposed that an explicit-routing option for PIM-SM be adopted. Explicit routing requires a Join message to carry a variable length Explicit Routing Object (ERO). In order to avoid IP-fragmentation issues it is proposed that control messages are sent over TCP. Reliable transfer of the Join/Prune messages will also make the protocol simpler at PIM-SM level. The next important issue would be to decide about an acknowledgement strategy. Since the use of TCP has been proposed, there is no need for a hop-by-hop PIM-SM level acknowledgement for reliability. However, an end-to-end Join_Ack is necessary for confirming a receiver-join and for activating reservation along the newly setup route for the new receiver. Biswas et. al. [Page 25] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 The soft-state nature of the standard PIM-SM is still maintained in the proposed framework. While the event-initiated (receiver join/departure) Join/Prune messages need to travel across multiple hops, the timer-initiated messages need to travel only one hop to maintain the soft routing states within routers. Incorporating the reserved bandwidth numbers also augments the format of multicast routing table entries. This is necessary for Join and Prune operations with data-rate guarantees. Finally, the Reverse Path Forwarding (RPF) check [4] is not feasible for forwarding with QoS constraints. This is simply because there is no way to ensure that a router will receive an IP packet, on a QoS-aware multicast tree, through an interface that is used by the router for sending unicast packets to the tree source. So the loop prevention has to be performed at the NUQM (or TIQM) route computation level. 7. Further Enhancements by Tree State Update (TSU) Protocol In this section propose an architectural enhancement to significantly improve the QoS capabilities of the PIM-SM protocol is discussed. In Section 6, the architecture described for QoS-aware PIM-SM can use either TIQM or NUQM as its underlying multicast tree computation algorithm. The motivation for using TIQM is that it can provide better join-success rate than that for NUQM. However, the downside of TIQM is that it relies on tree-specific link state information, dissemination of which may cause both control and storage scalability problems at the multicast routers. The problem stems from the fact that every multicast router within a domain requires to maintain both topology and resource reservation information about all the multicast trees within the domain. This can pose non-scalable control burden on the underlying link-state update protocols. In addition, a multicast router needs to store information about multicast trees, which may never pass through the router itself. Here, a mechanism is described for partially alleviating the scalability problem can while not compromising the functionalities of TIQM. The scheme exploits the service model of PIM-SM (see Section 6.1), in which a receiver first joins the QoS unaware Shared Tree (ST) and then, if necessary, it switches to the Source-Specific QoS Trees (SSQT), one at a time. The idea is that while a receiver is connected to the ST, it can gather SSQT-specific information through the shared tree. When the receiver wants to switch from the ST to a SSQT, it can run TIQM using the gathered tree-specific information. Now the objective here is to devise an efficient tree information gathering mechanism (through the shared tree) that bypasses the conventional link state update protocols. If such a low-cost information gathering mechanism can be designed then clearly TIQM can be used without its Biswas et. al. [Page 26] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 scalability penalties. Following is the description of a mechanism that proposed for distributing SSQT information through the shared tree. 1. Every receiver that had joined an SSQT periodically sends a Tree State Update (TSU) message to its parent router on the tree. TSU contains both membership and reservation information for the link between the receiver and the parent node. 2. A parent router aggregates TSU information from all its children nodes and constructs a new TSU message with the aggregated information. The router also adds its own topological identity and relevant tree-specific link reservation information to the TSU message. This message is now sent up to the router's parent node and the step is carried out recursively until all the TSU messages, carrying entire SSQT topology and reservation information, reach the root of the tree. Steps 1 and 2, together, ensure that at steady state the root of an SSQT maintains information about the entire tree. Since the receivers sends TSU messages periodically, any change in tree topology and its reservation eventually get reflected in the information base that is maintained by the toot. 3. A multicast source, which is the root of its SSQT, periodically sends its tree information to the Rendezvous Point (RP) of its designated multicast group. This way, it is ensured that in steady state the RP of a group will contain tree information for all the existing SSQTs for that group. Note that the TSU propagation is carried out entirely in soft-state and therefore any change in the SSQTs of a group will eventually be informed to the group RP. 4. Now that the RP maintains SSQT-specific information, the question is how to disseminate this to the multicast receivers. Following are the design options. The first option is that when a receiver wants to join an SSQT, it fetches the tree information for the SSQT from the group RP. Using this information, then it executes TIQM in order to calculate the join-path towards the SSQT. The other option is to arrange for the RP to periodically send the tree information for existing group SSQTs along the shared tree. This way, all the receivers on the shared tree will be informed about the current state of all the group's SSQTs. In this model, when a receiver wants to join an SSQT it does not have to explicitly contact the RP to get the relevant SSQT information. In the first approach, the amount of TSU traffic is less than the second one. However, in this approach when a receiver wants to Biswas et. al. [Page 27] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 join an SSQT it has to tolerate an initial latency for accessing the tree-specific information from the RP. In the second approach this latency problem is solved at the expense of additional TSU control traffic. A distinction must be made between this Tree State Update (TSU) and the standard Link State Update (LSU) that is used by multicast protocols like MOSPF. For a conventional LSU protocol every router within a domain requires to receive and store multicast tree-specific information even if it is not part of any of those trees. Whereas in the TSU, only the nodes, that are on the SSQT and ST, keep tree-specific information for that group. This can offer significant performance improvement in terms of both control and storage complexities of tree information dissemination. The described TSU mechanism has the following shortcomings. First, all tree-specific information dissemination through RP makes the protocol vulnerable to a single point of failure. In addition, for the very same reason the RP may also become a point of control-traffic congestion. The situation may become especially aggravated when a single router takes the role of the RP for multiple PIM-SM multicast groups. To summarize, the described TSU mechanism provides a low-cost and scalable mechanism for disseminating multicast tree information. This mechanism can be used in QoS-aware PIM-SM so that a receiver, when switching from shared tree to a source-specific tree, can use TIQM. TIQM, as noted earlier, provides better join-success rate and efficient network bandwidth utilization. 8. Summary and Conclusions In this draft, an IP-multicast routing framework that is capable of delivering receiver-specified and heterogeneous QoS guarantees was described. After analyzing the QoS enhancement potentials of the existing multicast protocols it was concluded that PIM-SM is the best candidate for QoS extensions. This is mainly because of its support for both source-specific and shared multicast trees and its ability to incorporate receiver-initiated joins. After deciding on PIM-SM, the problem of constructing multicast trees with QoS constraints was defined. Two algorithms, namely TIQM and NUQM, for tree construction were proposed. It was shown that TIQM, while being able to compute near-optimal QoS-constrained trees, might have scalability concerns in certain situations. This was overcome in NUQM by restricting the amount of information that is used during tree computation. It was argued that under low to moderate network loads NUQM can deliver performance that is comparable to TIQM. A QoS-extended PIM-SM framework, with NUQM as its underlying route computation algorithm, was proposed. It was shown that with moderate extensions to the existing syntaxes of PIM-SM, it is possible to introduce QoS support. Numerical performance results are reported in [12]. Biswas et. al. [Page 28] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 The following extensions of this work are currently being carried out: - Simulation and performance works for TIQM and NUQM - Detailed design specification with exact message, timers and protocol syntaxes for the proposed framework - Extension of the protocol from intra-domain to the inter-domain case, and - Incorporation into the MPLS framework. References [1] B. Quinn et al, IP Multicast Applications: Challenges and Solutions, IETF Internet-Draft: draft-ietf-mboned-mcast-apps-00.txt, February 1999. [2] Lixia Zhang et al, RSVP: A New Resource ReServation Protocol, IEEE Network Magazine, September 1993. [3] S. Chen et al, An Overview of Quality of Service Routing for Next-Generation High-Speed Networks: Problems and Solutions, IEEE Network Magazine, November/December 1998. [4] S. Paul, Multicasting on Internet and Its Applications, Book published by Kluwer Academic Publishers, 1998. [5] D. Cavendish et al, On the Construction of Low Cost Multicast Trees with Bandwidth Reservation, Lecture Notes in Computer Science, Springer-Verlag, 1998. [6] J. Hou et al, QoS Extensions to CBT, IETF Internet-Draft: draft-hou-cbt-qos-00.txt, February 1999. [7] A. J. Ballardie et al, Core Based Trees, Proceedings of ACM SIGCOMM, San Francisco, 1993. [8] J. Moy, Multicast Extensions to OSPF, IETF Request for Comments: RFC-1584, March 1994. [9] D. Estrin et al, Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification, IETF Internet-Draft: draft-ietf-idmr-PIM-SM-spec-10.txt, March 1997. Biswas et. al. [Page 29] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 [10] M. Faloutsos et al, QoSMIC: Quality of Service sensitive Multicast Internet protoCol, Proceedings of ACM SIGCOMM, Vancouver, 1998. [11] Y. Rekhter et al, A Border Gateway Protocol 4 (BGP-4), IETF Internet-Draft: draft-ietf-idr-bgp4-08.txt, August 1998. [12] S. K. Biswas et al, Performance of Multicast Route Computation Algorithms for QoS-aware PIM-SM Protocol, NEC CCRL Internal Document, Under Preparation. [13] J. Moy, OSPF Version 2, IETF Request for Comments: RFC-1583, March 1994. [14] S. Bradner et al, Internet Protocol Quality of Service Problem Statement, IETF Internet-Draft: draft-bradner-qos-problem-00.txt, September 1997. [15] L. Wei et al, The Trade-offs of Multicast Trees and Algorithms, Proceedings of ICCCN'94, San Francisco, September 1994. [16] T. Pusateri, Distance Vector Multicast Routing Protocol, IETF Internet-Draft: draft-ietf-idmr-dvmrp-v3-05.txt, October 1997. [17] S. Deering et al, Protocol Independent Multicast Version 2 Dense Mode Specifcation, IETF Internet-Draft: draft-ietf-pim-v2-dm-01.txt, November 1998. [18] M. Doar et al, How Bad is Nave Multicast Routing?, Proceedings of IEEE INFOCOM'93, San Francisco, April 1993. [19] M. R. Garey et al, Computers and Intractibility - A Guide to the Theory of NP-Completeness, Book published by Freeman, California, 1979. [20] E. Dijkstra, A Note on Two Problems in Conexion with Graph, Numerische Mathematik, Vol. 1, 1959. [21] Z. Zhang et al, Quality of Service Extensions to OSPF, IETF Internet-Draft: draft-zhang-qos-ospf-01.txt, September 1997. Biswas et. al. [Page 30] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 [22] W. Wimer, Additional OSPF Extensions for Traffic Engineering and QoS Routing, IETF Internet-Draft: draft-wimer-ospf-traffic-00.txt, September 1997. [23] Private Network-Network Interface Specification version 2.0, btd-pnni-02.00, ATM Forum, September 1997. [24] D. Awduche et al, Requirements for Traffic Engineering Over MPLS, IETF Internet-Draft: draft-awduche-mpls-traffic-eng-00.txt, April 1998. [25] B. Rajagopalan, IGP-Independent Routing Support for Traffic Engineering, IETF Internet-Draft: draft-rajagopalan-igpfree-te-00.txt, February 1999. [26] B. Cain, Internet Group Management Protocol, Version 3, IETF Internet-Draft: draft-ietf-idmr-igmp-v3-01.txt, February 1999. [27] S. Deering et al, The PIM Architecture for Wide-Area Multicast Routing, IEEE/ACM Transactions on Networking, Vol. 4, No. 2, April 1996. [28] B. Rajagopalan et al, Multicast Routing with Resource Reservation, Journal of High Speed Networks, July 1998. [29] D. Wall, Mechanisms for Broadcast and Selective Broadcast, Ph.D. Thesis. Technical Report, no. 190, Stanford University, Stanford, CA, June 1980. Biswas et. al. [Page 31] Internet Draft A QoS-Aware Routing Framework for PIM-SM June 25, 1999 11. Authors' Addresses Subir K. Biswas C&C Research Labs, NEC USA 4 Independence Way, Princeton, NJ, USA Phone : 1 609 951 2995 Fax : 1 609 951 2499 E-mail: skb@ccrl.nj.nec.com Rauf Izmailov C&C Research Labs, NEC USA 4 Independence Way, Princeton, NJ, USA Phone : 1 609 951 2454 Fax : 1 609 951 2499 E-mail: rauf@ccrl.nj.nec.com Bala Rajagopalan C&C Research Labs, NEC USA 4 Independence Way, Princeton, NJ, USA Phone : 1 609 951 2969 Fax : 1 609 951 2499 E-mail: braja@ccrl.nj.nec.com Full Copyright Statement Copyright (C) The Internet Society (1999). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Biswas et. al. [Page 32]