PIM Working Group Ying-Dar. Lin Internet-Draft Nai-Bin. Hsu Expires: April 22, 2002 Dept of Comp. and Info. Sci., National Chiao Tung Uni. Ren-Hung. Hwang Dept of Comp Sci and Info Eng, National Chung Cheng Uni. October 22, 2001 RP Relocation Extension to PIM-SM Multicast Routing draft-ydlin-pim-sm-rp-01 Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on April 22, 2002. Abstract The Protocol Independent Multicast-Sparse Mode (PIM-SM) protocol establishes core-base tree to forward multicast datagrams in a network. In PIM-SM, the core or Rendezvous Point (RP) of a group is determined at each multicast router by hashing a group address, i.e., a class-D IP address, to one of the candidate RPs. The hash function is characterized by its ability to evenly and uniquely choose the core for a group and remains insensitive to the geographic distribution of the group members and the sources. However, it may result in a multicast tree with high cost. This draft presents a relocation mechanism which is extension to PIM- Lin, et. al. Expires April 22, 2002 [Page 1] Internet-Draft RP Relocation Extension October 2001 SM, in which RP could be relocated periodically. When a new RP is found, the original RP informs all members to re-join to the new RP. 1. Introduction Multicast routing protocols can be categorized as source-based tree and shared-tree protocols. A source-based tree protocol builds separate trees for each (source, group) pair, that is, each source has its own tree that reaches the active group members, such as DVMRP [1], PIM-DM [2], and MOSPF [3]. On the other hand, shared-based protocols such as PIM-SM [4] and CBT [5] build distributed trees having a central point (or core) to whom all receivers attached. Typically, a source sends datagrams to the RP and RP forwards them to all members through the RP-based multicast tree. Therefore, a shared-tree router only needs to maintain state information for each group instead of for each (source, group) pair. In PIM-SM, new members wanting to join a group send Join messages to a core, called Rendezvous Point (RP), of the distribution tree. The RP administers the specific multicast group(s), and facilitates the joining/leaving of the group members. The address of the RP is determined at each multicast router by mapping a group to one of the candidate RPs with a hash function. However, the chosen RP may be inappropriate for its group, for example, causes more tree cost. The hash function is characterized by its ability to evenly and uniquely choose a candidate RP to be a core in order to balance the service load of each candidate RP in the network. This draft proposes an RP relocation mechanism which is extension to the PIM-SM multicast routing protocol. The hash function of PIM-SM is initially used to obtain an RP for the multicast group. As the sources come and go, RP relocates its location periodically afterward. An attempt is made to obtain an appropriate core location by using an estimated tree cost function to evaluate the appropriateness of the candidates. When RP relocation is determined, the original RP multicasts NEW_RP message to all members to inform them to re-join to the new RP. Consequently, a new distribution tree is built with the least cost. 2. Issues and Motivations PIM-SM is a commonly used multicast routing protocol that provides efficient communication for multicast groups with sparsely distributed members. The designers observed that several hosts wanting to participate in a multicast conference do not justify having their group's multicast traffic periodically broadcast across the entire network. To eliminate the scaling problem, PIM-SM is designed to limit multicast traffic so that only routers interested Lin, et. al. Expires April 22, 2002 [Page 2] Internet-Draft RP Relocation Extension October 2001 in receiving traffic for a particular group will receive it. By unicast routing, the source router knows how to reach and forward the traffic to the RP. Then, RP distributes the traffic to all the members through the RP-based multicast tree. In order to broadcast the set of candidate RPs to the network nodes, the bootstrap router (BSR) is elected for the domain. BSR originates Bootstrap messages to distribute the set of RPs information, which are distributed hop-by-hop throughout the domain. There is only one RP-set per PIM-SM domain. By using a hash function, say Hash(), each router can uniquely map a group address G to one of the routers in the RP-set. That is, candidate C_k is chosen as RP if C_k yields highest hash value Hash(G, M, C_i), for all C_i belong to the RPset. The hash mark, M, allows a number of consecutive groups to resolve to the same RP. The shared tree, although provides better scalability, does not optimize the delivery path through the network. RP for the group is typically designed without respect to its location. Thus, an RP could be located for away from all group members, resulting in inefficient transmission. Steiner tree [6] is a well-known example of finding a minimum multicast tree. However, the multicast tree T(S, R) in PIM-SM, the set of sources, S, and the set of members, R, are dynamic changed. That is, after sources and/or receivers come and go, the least cost multicast tree is different. Therefore, considering a single group in which no member switches over to the shortest path tree (SPT), there is a sequence of least cost multicast trees, {T0, T1, T2, ... }, where Ti=Tree(S ->RPi) U Tree(RPi -> R). Since the tree Ti is rooted at the RPi the problem of finding the sequence {Ti} is reduced to the problem of finding the sequence {RPi}, where the RP0 is the initial hash RP. Therefore, in this study, assume RP_i is the current RP of G, we propose an RP relocation mechanism which relocates RP when the tree cost reduction, TC(RP_i) - TC(RP_{i+1}), if it is larger than a pre- defined threshold, q. In addition, three control messages are needed to migrate to the new RP and reliably maintain the membership of the group. For convenience of our description, Table 1 summarizes the symbols used in this draft. Lin, et. al. Expires April 22, 2002 [Page 3] Internet-Draft RP Relocation Extension October 2001 TABLE 1. Symbol Definitions --------------------------------------------------------------------- R: set of receiving routers currentRP: current RP of G S: set of source routers newRP: RP be migrated M: hash mask hashRP: initial hashed RP of G m: new member oldRP: previous RP of G tr: RP relocation interval candRP: candidate RP G: multicast group q: cost reduction threshold G.RP: RP addr. of group G TC: cost of multicast tree G.rFlag: relocation flag of G C_i: ith candidate RP, in RPset RPset: set of candidate RPs C_k: min cost candidate RP, in RPset --------------------------------------------------------------------- 3. RP Relocation Algorithm Of priority concern in a distribution tree, the lower the tree costs of hop count and delay implies better routing paths. Tree cost is defined as the sum of the costs of all links of the multicast tree. Our mechanism attempts to reduce the tree cost by relocating the RP for a multicast group. To map an appropriate RP in a group, %this study uses an estimated function can be used. For example, computing by equations in [7] we can obtain a RP with the nearly minimum tree cost from the RPset of a group. Equation (3) calculates the distribution tree cost if the candidate C_i is taken as the RP, TC(C_i), by taking the average of Eq. (1) and (2), i.e., the maximum and minimum bounds on tree cost. These equations use the distance (i.e. cost:E -> R+) for each possible destination as the metric. Notably, the distance information is already available to routers. TCmin(C_i)= min. tree cost (1) TCmax(C_i)= max. tree cost (2) TC(C_i) =[ TCmin(C_i)+ TCmax(C_i) ]/2 (3) The best-case tree is linear if a lower bound on the cost, TCmin, of a tree rooted at some node is obtained. The cost of the tree is simply the maximum distance from RP to any sender. When using hop count (i.e. cost:E -> 1) as the metric, the function can obtain a tighter bound by adding the number of senders that are at an equal distance. This bound is owing to that the distribution tree cannot be completely linear, but must have at least an additional link. Lin, et. al. Expires April 22, 2002 [Page 4] Internet-Draft RP Relocation Extension October 2001 -------------------------------------------- Algorithm RPIM-SM ( G ) set_of_sources S; set_of_members R; member_of_G m; relocation_flag rFlag; relocation_timer tr; tree_cost_function TC; Begin while ( 1 ) do case hashRP: if receive 'Join' from a new member m // hashRP found by Hash function if (G.rFlag==true) then unicasts 'NEW_RP(currentRP)' to m // m send 'Join' to currentRP // m send 'Prune' to hashRP endif endif case currentRP: if ( tr expired and S changed ) then newRP = Select C_k from RPset, where TC(C_k) is minimum if reduction of TC(C_k) > q then multicasts 'NEW_RP(currentRP, newRP)' to {hashRP, sources, members} of G endif endif endwhile End -------------------------------------------- Figure 1: The RP Relocation for the Group G. Opposite to the lower bound of the cost, the upper bound on the cost, TCmax, of a tree rooted at some node is that no links are shared among the paths to each sender. Therefore, the maximum tree cost is the sum of the sender distances. Also, if the number of group senders is greater than the root degree, the bound is tightened by subtracting the difference to calculate the cost for the sharing links. The estimation function is thus defined as the average of the sum of the minimum cost and maximum cost. As shown in the algorithm of Fig.1, a new member m that joins a group sends a Join message to the hashed RP. The hashRP verifies whether the RP of this group has been migrated (i.e. the rFlag is true). If so, hashRP redirects m to the new RP by sending the NEW_RP message. Thus, m sends a Join message and established (*, G) state along the path towards the new RP. In addition, m sends a Prune message to the Lin, et. al. Expires April 22, 2002 [Page 5] Internet-Draft RP Relocation Extension October 2001 hashRP to discard the previous path. The currentRP also confirms whether if the relocation timer tr has expired and the set of sources of the group has been changed. If it has, a new RP with significant cost reduction is computed by the above cost functions TC for each candidate RP. -------------------------------------------- Algorithm ChangeRP(currentRP, newRP, G) set_of_members R; member_of_G m; relocation_flag rFlag; Begin if (newRP == hashRP) then G.rFlag = false /* return back to initial RP */ else G.rFlag = true endif G.oldRP=G.RP G.RP=newRP case newRP: send I_AM_RP message to hashRP case source: re-register to the newRP case member: send 'Join' to newRP send 'Prune' to currentRP end ----------------------------------------------------------- Figure 2: The RP Migration of Group G. TABLE 2. Relocation State Table +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | G | rFlag | G.RP |iif| nbrRPF | G.oldRP |iif | nbrRPF | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | G1| true | RP1 |i_1| v_1 | RP1' |i_1'| v_1' | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | G2| true | RP2 |i_2| v_2 | RP2' |i_2'| v_2' | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | ..... | | | | --- | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Gn| false | RPn |i_n| v_n | --- | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Once the RP with least cost for a group is found, the function ChangeRP is invoked, as shown in Fig.2. The currentRP first checks whether the new RP is the hashRP. If it is, the rFlag is reset to Lin, et. al. Expires April 22, 2002 [Page 6] Internet-Draft RP Relocation Extension October 2001 false since the currentRP returns back to the original hashed RP. To migrate the distribution tree, message NEW_RP is advertised by the currentRP to inform the newRP, hashRP, sources, and all members of the group. For keeping this relocation state, each node with (*, G) state maintains a relocation table to record the state of RP relocation, as well as the addresses of the RP and old RP, as shown in Table 2, G.RP and G.oldRP, respectively. The rFlag indicates whether if RP of this group has been relocated, i.e., not the original hashed RP. When a new RP is obtained, the current RP becomes the oldRP and its address is moved to the G.oldRP; the new RP address is then saved at the G.RP. Notably, the corresponding incoming interface (iif) and the Reverse Path Forwarding (RPF) neighbor (nbrRPF) of the G.newRP/G.oldRP are also saved. State information of the relocating process prevents the chaos caused by RPF checking during the transition between the two consecutive multicast trees. For example, a node with (*, G1) state continues to receive and distribute packets from the old RP (RP1') at the interface i_1', until this node receives packets from the new RP (RP1) at the interface i_1. Furthermore, in order to migrate the distribution tree of the group G1, the current RP (RP1') of G1 multicasts the encoded NEW_RP message towards the newRP, all sources (First-hop routers) and members (Last-hop routers) of group G1. This message redirects all sources to re-register, all members to re-join to the new RP. Note that in case of Gn, since the rFlag is false, this entry records hashRP of Gn at the G.RP field. Figure 3 shows the RP state transition of a specific group G. At the system start up, the RP is in the $Hashing$ state and enters the $Selection$ state as timer tr expires. If the selected candidate (C_k) can significantly reduce the current tree cost, migration of RP tree is invoked and the $Transition$ state is entered, and thereafter transits further to the $Relocation$ state or $Hashing$ state, depending on whether the newly selected RP is the original hashed RP. Returning back to the $Hashing$ state means that the original hashed RP turns out to be the best RP for G again. On the other hand, if the reduction in tree cost does not exceed the threshold, q, the RP enters the $Relocation$ state or $Hashing$ state, depending on the value of rFlag where true means this RP is a relocated, instead of hashed, one. Note that the RPIM-SM verifies the source list, S, and the tree cost of G every tr time unless the system re-startup. Lin, et. al. Expires April 22, 2002 [Page 7] Internet-Draft RP Relocation Extension October 2001 start up +-------+ (tr expired) & (S changed) +-----------> +Hashing+------------------+ | [rFlag=0] +---+---+ (reduction +---+-+-----+ |Relocation|& (S changed) | Selection | +----+-----+ | (reductionq [rFlag=true, [changeRP] currentRP=Ck] Figure 3. State transition diagram of the RP Relocation. 4. Control Messages and Timers 4.1 Receiving Join Messages This message is sent toward the RP or a source periodically, to create or refresh a branch of a multicast distribution tree. 4.1.1 At the hashRP if (G.rFlag == true) then send NEW_RP(G.RP) to receiving router else handles it as in PIM-SM. endif 4.1.2 At the currentRP handles this message as in PIM-SM. Lin, et. al. Expires April 22, 2002 [Page 8] Internet-Draft RP Relocation Extension October 2001 4.2 Receiving Prune Messages This message is sent toward a source by a router when it wish to leave the multicast group. This message is also sent toward the RP when a router switches from the RP tree to the source-rooted shortest path tree. RPIM-SM handles this message as in PIM-SM. 4.3 Receiving NEW_RP Messages This message is used to advertise to the group that who is the new RP from now on. When the computation of tree cost in Fig. 1 is done and the decision of the RP relocation is positive, the current RP sends the message NEW_RP (G, newRP) to the newRP, hashRP, sources, and multicasts it to all members of the group. Then, the migration process is triggered, i.e., the sources re-register to the newRP and members re-join to the newRP. 4.3.1 At the candRP send I_AM_RP message to hashRP G.RP=newRP 4.3.2 At the source router send register message to newRP G.oldRP=G.RP G.RP=newRP continue to send packet to G.oldRP for Td time period 4.3.3 At the receiving router send Join message to newRP G.oldRP=G.RP G.RP=newRP upon receiving packet from G.RP send Prune message to G.oldRP Lin, et. al. Expires April 22, 2002 [Page 9] Internet-Draft RP Relocation Extension October 2001 4.3.4 At the hashRP if (newRP==hashRP) then G.rFlag=false; else G.RP=hashRP; endif 4.3.5 Message Format The format of this message is shown as following: * PIM Ver: 2 * Type: 9 * Reserved=0, ignored on receipt. * Checksum: standard IP checksum. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |PIM Ver| Type | Reserved | Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Encoded-Unicast-Upstream Neighbor Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Encoded-Multicast Group Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Encoded-newRP Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 4.4 Receiving I_AM_RP Messages Using this message, newRP informs hashRP that íºI am the RP of the group at this time.í¿ Without this information at the hashRP, the incoming receivers could not find the current RP to whom the Join message send. 4.4.1 At the hashRP G.RP=newRP send RP_CONFIRM message to G.RP restart Reset Timer Th 4.5 Receiving RP_CONFIRM Messages Upon receiving I_AM_RP message, hashRP updates the corresponding relocation state of G, and acknowledge the newRP with the RP_CONFIRM Lin, et. al. Expires April 22, 2002 [Page 10] Internet-Draft RP Relocation Extension October 2001 message. 4.5.1 At the currentRP restart Expiry Timer Te 4.5.2 Message Format The format of I_AM_RP and RP_CONFIRM messages is shown as following: * PIM Ver: 2 * Type: 10 (I_AM_RP)/11 (RP_CONFIRM) * Reserved=0, ignored on receipt. * Checksum: standard IP checksum. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |PIM Ver| Type | Reserved | Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Encoded-Unicast-Upstream Neighbor Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Encoded-Multicast Group Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Encoded-RP Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 4.6 Relocation Timer Tr Expires at the currentRP if (S changed) then newRP = Select C_k from RPset, where TreeCost(C_k) is minimum if reduction of TreeCost(C_k) > q then send NEW_RP to { newRP, hashRP, S, R } restart Relocation Timer Tr if(currentRP is hashRP) G.rFlag=true; endif endif 4.7 Expiry Timer Te Expires at the currentRP send I_AM_RP message to hashRP Lin, et. al. Expires April 22, 2002 [Page 11] Internet-Draft RP Relocation Extension October 2001 4.8 Reset Timer Th Expires at the hashRP send NEW_RP message to { S, R } restart Relocation Timer Tr G.rFlag=false; 4.9 Duplicate Timer Td Expires at the source router stop sending packet to G.oldRP 5. Summary This draft proposes a dynamic mechanism, RPIM-SM, which is an extension to the PIM-SM multicast routing protocol. The original hashed Rendezvous Point (RP) location may be inappropriate for the group. Therefore, according to an estimated tree cost, the PIM-SM is extended by relocating the RP periodically. In addition, the candidate RP with minimum tree cost is selected to be the new RP of the group. To migrate to the new RP, a additional message, NEW_RP, is needed to notify sources to re-register, and members to re-join the new RP. 6. Security Considerations In addition to PIM-SM, the control messages, NEW_RP, I_AM_RP and RP_CONFIRM may use IPsec to address security concerns. Lin, et. al. Expires April 22, 2002 [Page 12] Internet-Draft RP Relocation Extension October 2001 7. References [1] D. Waitzman, C. Partridge, S.E. Deering, "Distance Vector Multicast Routing Protocol", RFC1075, November 1988. [2] S. Deering, D. Estrin, D. Farinacci, V. Jacobson, A. Helmy, L. Wei, "Protocol Indepenent Multicast Version 2, Dense Mode Specification", Internet Draft, Work in progress, draft-ietf-idmr-pim-dm-05.txt, May 21, 1995. [3] J. Moy, "Multicast Extensions to OSPF", RFC1584, March 1994. [4] D. Estrin, D. Farinacci, A. Helmy, D. Thaler, S. Deering, M. Handley, V. Jacobson, C. Liu, P. Sharma, L. Wei, "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification", RFC2362, June 1998. [5] A. Ballardie, "Core Based Trees (CBT) Multicast Routing Architecture", RFC2201, September. 1997. [6] S.L.Hakimi, "Steiner Problem in Graphs and Its Implications," Networks VOL. 1 (1971) pp. 113-133. [7] David G. Thaler and Chinya V. Ravishankar, "Distributed Center- Location Algorithms", IEEE Journal on Selected Areas in Communications, VOL. 15, No. 3, April 1997. Authors' Addresses Ying-Dar Lin Dept of Comp. and Info. Sci., National Chiao Tung Uni. No. 1001, Ta-Hsueh Road Hsinchu, Taiwan 30050 ROC Phone: +886 3 5731899 EMail: ydlin@cis.nctu.edu.tw URI: http://speed.cis.nctu.edu.tw/~ydlin/ Lin, et. al. Expires April 22, 2002 [Page 13] Internet-Draft RP Relocation Extension October 2001 Nai-Bin Hsu Dept of Comp. and Info. Sci., National Chiao Tung Uni. No. 43, Lane 486, Ming-Hu Road Hsinchu, Taiwan 30050 ROC Phone: +886 3 4245612 EMail: naibin@ms7.hinet.net URI: http://speed.cis.nctu.edu.tw/ Ren-Hung Hwang Dept of Comp Sci and Info Eng, National Chung Cheng Uni. No. 160, SAN-HSING, MING-HSIUNG Chiayi county, Taiwan 621 ROC Phone: +886 5 2720411 EMail: rhhwang@cs.ccu.edu.tw URI: http://exodus.cs.ccu.edu.tw/~rhhwang/ Lin, et. al. Expires April 22, 2002 [Page 14]