PIM Working Group Ying-Dar. Lin Internet-Draft Nai-Bin. Hsu Expires: October 22, 2001 Dept of Comp. and Info. Sci., National Chiao Tung Uni. Ren-Hung. Hwang Dept of Comp Sci & Info Eng, National Chung Cheng Uni. April 23, 2001 RP Relocation Extension to PIM-SM Multicast Routing draft-ydlin-pim-sm-rp-00.txt Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on October 24, 2001. Abstract The Protocol Independent Multicast-Sparse Mode (PIM-SM) protocol establishes core-base tree to forward multicast datagrams in a network. In PIM-SM, the core or Rendezvous Point (RP) of a group is determined at each multicast router by hashing a group address, i.e., a class-D IP address, to one of the candidate RPs. The hash function is characterized by its ability to evenly and uniquely choose the core for a group and remains insensitive to the geographic Lin, et. al. Expires October 24, 2001 [Page 1] Internet-Draft RP Relocation Extension April 2001 distribution of the group members and the sources. However, it may result in a multicast tree with high cost. This draft presents a relocation mechanism which is extension to PIM- SM, in which RP could be relocated periodically. When a new RP is found, the original RP informs all members to re-join to the new RP. 1. Introduction Multicast routing protocols can be categorized as source-based tree and shared-tree protocols. A source-based tree protocol builds separate trees for each (source, group) pair, that is, each source has its own tree that reaches the active group members, such as DVMRP [1], PIM-DM [2], and MOSPF [3]. On the other hand, shared-based protocols such as PIM-SM [4] and CBT [5] build distributed trees having a central point (or core) to whom all receivers attached. Typically, a source sends datagrams to the RP and RP forwards them to all members through the RP-based multicast tree. Therefore, a shared-tree router only needs to maintain state information for each group instead of for each (source, group) pair. In PIM-SM, new members wanting to join a group send Join messages to a core, called Rendezvous Point (RP), of the distribution tree. The RP administers the specific multicast group(s), and facilitates the joining/leaving of the group members. The address of the RP is determined at each multicast router by mapping a group to one of the candidate RPs with a hash function. However, the chosen RP may be inappropriate for its group, for example, causes more tree cost. The hash function is characterized by its ability to evenly and uniquely choose a candidate RP to be a core in order to balance the service load of each candidate RP in the network. This draft proposes an RP relocation mechanism which is extension to the PIM-SM multicast routing protocol. The hash function of PIM-SM is initially used to obtain an RP for the multicast group. As the sources come and go, RP relocates its location periodically afterward. An attempt is made to obtain an appropriate core location by using the estimated tree cost function to evaluate the appropriateness of the candidates. When RP relocation is determined, the original RP multicasts NEW_RP message to all members to inform them to re-join to the new RP. Consequently, a new distribution tree is built with the least cost. 2. Issues and Motivations PIM-SM is a commonly used multicast routing protocol that provides efficient communication for multicast groups with sparsely distributed members. The designers observed that several hosts Lin, et. al. Expires October 24, 2001 [Page 2] Internet-Draft RP Relocation Extension April 2001 wanting to participate in a multicast conference do not justify having their group's multicast traffic periodically broadcast across the entire network. To eliminate the scaling problem, PIM-SM is designed to limit multicast traffic so that only routers interested in receiving traffic for a particular group will receive it. By unicast routing, the source router knows how to reach and forward the traffic to the RP. Then, RP distributes the traffic to all the members through the RP-based multicast tree. In order to broadcast the set of candidate RPs to the network nodes, the bootstrap router (BSR) is elected for the domain. BSR originates Bootstrap messages to distribute the set of RPs information, which are distributed hop-by-hop throughout the domain. There is only one RP-set per PIM-SM domain. By using a hash function, say Hash(), each router can uniquely map a group address G to one of the routers in the RP-set. That is, candidate C_k is chosen as RP if C_k yields highest hash value Hash(G, M, C_i), for all C_i belong to the RPset. The hash mark, M, allows a number of consecutive groups to resolve to the same RP. The shared tree, although provides better scalability, does not optimize the delivery path through the network. RP for the group is typically designed without respect to its location. Thus, an RP could be located for away from all group members, resulting in inefficient transmission. Steiner tree [6] is a well-known example of finding a minimum multicast tree. However, the multicast tree T(S, R) in PIM-SM, the set of sources, S, and the set of members, R, are dynamic changed. That is, after sources and/or receivers come and go, the least cost multicast tree is different. Therefore, considering a single group in which no member switches over to the shortest path tree (SPT), there is a sequence of least cost multicast trees, {T0, T1, T2, ... }, where Ti=Tree(S ->RPi)í…Tree(RPi -> R). Since the tree Ti is rooted at the RPi the problem of finding the sequence {Ti} is reduced to the problem of finding the sequence {RPi}, where the RP0 is the initial hash RP. Therefore, in this study, assume RP_i is the current RP of G, we propose an RP relocation mechanism which relocates RP when the tree cost reduction, TC(RP_i) - TC(RP_{i+1}), if it is larger than a pre- defined threshold, q. In addition, three control messages are needed to migrate to the new RP and reliably maintain the membership of the group. For convenience of our description, Table 1 summarizes the symbols used in this draft. Lin, et. al. Expires October 24, 2001 [Page 3] Internet-Draft RP Relocation Extension April 2001 TABLE 1. Symbol Definitions --------------------------------------------------------------------- V: set of routers hashRP: initial hashed RP of G E: set of links currentRP: current RP of G S: set of sources newRP: RP be migrated G: multicast group Pc, Pe: Probability functions M: hash mask RPset: set of candidate RPs m: new member Ci: ith candidate RP, Ci IRPset tr: RP relocation interval Ck: min cost candidate, Ck IRPset TC: cost of multicast tree k: current # of group members u, v: networks nodes q: cost reduction threshold n: # of nodes in the network dist: distance or hop count nd: # duplicate distance node in S deg: node degree or connectivity --------------------------------------------------------------------- 3. RP Relocation Algorithm Of priority concern in a distribution tree, the lower the tree costs of hop count and delay implies better routing paths. Tree cost is defined as the sum of the costs of all links of the multicast tree. Our mechanism attempts to reduce the tree cost by relocating the RP for a multicast group. To map an appropriate RP in a group, this study uses an estimated function in [7] to obtain a RP with the minimum tree cost from the RPset of a group. Equation (3) calculates the distribution tree cost if the candidate C_i is taken as the RP, TC(C_i), by taking the average of Eq. (1) and (2), i.e., the maximum and minimum bounds on tree cost. These equations use the distance (i.e. $cost:E -> R+) for each possible destination as the metric. Notably, the distance information is already available to routers. TCmin(C_i)= min. tree cost (1) TCmax(C_i)= max. tree cost (2) TC(C_i) =[ TCmin(C_i)+ TCmax(C_i) ]/2 (3) The best-case tree is linear if a lower bound on the cost, TCmin, of a tree rooted at some node is obtained. The cost of the tree is simply the maximum distance from RP to any sender. When using hop count (i.e. cost:E -> 1) as the metric, the function can obtain a tighter bound by adding the number of senders that are at an equal distance. This bound is owing to that the distribution tree cannot be completely linear, but must have at least an additional link. Lin, et. al. Expires October 24, 2001 [Page 4] Internet-Draft RP Relocation Extension April 2001 -------------------------------------------- Algorithm RPIM-SM ( G ) set_of_sources S; set_of_members R; member_of_G m; relocation_flag rFlag; relocation_timer tr; tree_cost_function TC; Begin while ( 1 ) do if ( m wants to join G ) then hashRP = HashFind(G, RPset) m send 'Join' to hashRP hashRP check if (rFlag==true) then hashRP unicasts 'NEW_RP(currentRP)' to m m send 'Join' to currentRP m send 'Prune' to hashRP endif endif currentRP check if ( tr expired and S changed ) then newRP = Select C_k from RPset, where TC(C_k) is minimum if reduction of TC(C_k) > q then ChangeRP(currentRP, newRP, G) endif endif endwhile End -------------------------------------------- Figure 1: The RP Relocation. Opposite to the lower bound of the cost, the upper bound on the cost, TCmax, of a tree rooted at some node is that no links are shared among the paths to each sender. Therefore, the maximum tree cost is the sum of the sender distances. Also, if the number of group senders is greater than the root degree, the bound is tightened by subtracting the difference to calculate the cost for the sharing links. The estimation function is thus defined as the average of the sum of the minimum cost and maximum cost. As shown in the algorithm of Fig. 1, a new member m that joins a group sends a Join message to the hashed RP. The hashRP verifies whether the RP of this group has been migrated (i.e. the rFlag is true). If so, hashRP redirects m to the new RP by sending the NEW_RP message. Thus, m sends a Join message and established (*, G) state along the path towards the new RP. In addition, m sends a Prune message to the hashRP to discard the previous path. The currentRP also confirms whether if the relocation timer tr has expired and the Lin, et. al. Expires October 24, 2001 [Page 5] Internet-Draft RP Relocation Extension April 2001 set of sources of the group has been changed. If it has, a new RP with significant cost reduction is computed by the above cost functions TC for each candidate RP. -------------------------------------------- Algorithm ChangeRP(currentRP, newRP, G) set_of_members R; member_of_G m; relocation_flag rFlag; Begin currentRP check if (newRP == hashRP) then rFlag = false /* return back to initial RP */ else rFlag = true endif currentRP multicasts 'NEW_RP(currentRP, newRP)' to {hashRP, sources, members} of G Upon receiving 'NEW_RP' message, begin newRP send I_AM_RP message to hashRP each source re-register to the newRP member m send 'Join' to newRP and send 'Prune' to currentRP end currentRP = newRP end ----------------------------------------------------------- Figure 2: The RP Migration. TABLE 2. Relocation State Table +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | G | rFlag | newRPAddr |iif| nbrRPF | oldRPAddr |iif | nbrRPF | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | G1| true | RP1 |i_1| v_1 | RP1' |i_1'| v_1' | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | G2| true | RP2 |i_2| v_2 | RP2' |i_2'| v_2' | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | ..... | | | | --- | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Gn| false | RPn |i_n| v_n | --- | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Once the RP with least cost for a group is found, the function ChangeRP is invoked, as shown in Fig. 2. The currentRP first checks whether the new RP is the hashRP. If it is, the rFlag is reset to false since the currentRP returns back to the original hashed RP. To migrate the distribution tree, message NEW_RP is advertised by the Lin, et. al. Expires October 24, 2001 [Page 6] Internet-Draft RP Relocation Extension April 2001 currentRP to inform the newRP, hashRP, sources, and all members of the group. For encoding this message, each node with (*, G) state maintains a relocation table to record the state of RP relocation, as well as the addresses of the new RP and old RP, as shown in Table 2, newRPAddr and oldRPAddr, respectively. The rFlag indicates whether if RP of this group has been relocated, i.e., not the original hashed RP. When a new RP is obtained, the previous newRP becomes the oldRP and its address is moved to the oldRPAddr; the new RP address is then saved at the newRPAddr. Notably, the corresponding incoming interface (iif) and the Reverse Path Forwarding (RPF) neighbor (nbrRPF) of the newRPAddr/oldRPAddr are also saved. State information of the relocating process prevents the chaos caused by RPF checking during the transition between the two consecutive multicast trees. For example, a node with (*, G1) state continues to distribute packets from the old RP (RP1') at the interface i1', until this node receives packets from the new RP (RP1) at the interface i_1. Furthermore, in order to migrate the distribution tree of the group G1, according to Table 2, the current RP (RP1') of G1 extracts the newRPAddr RP_1 from the table and multicasts the encoded NEW_RP message towards the newRP, all sources and members of group G1. This message redirects all sources to re-register, all members to re-join to the new RP. Note that in case of Gn, since the rFlag is false, this entry records hashRP of G_n at the newRPAddr field. Figure 3 shows the RP state transition of a specific group G. At the system start up, the RP is in the $Hashing$ state and enters the $Selection$ state as timer tr expires. If the selected candidate (C_k) can significantly reduce the current tree cost, ChangeRP is invoked and the $Transition$ state is entered, and thereafter transits further to the $Relocation$ state or $Hashing$ state, depending on whether the newly selected RP is the original hashed RP. Returning back to the $Hashing$ state means that the original hashed RP turns out to be the best RP for G again. On the other hand, if the reduction in tree cost does not exceed the threshold, q, the RP enters the $Relocation$ state or $Hashing$ state, depending on the value of rFlag where true means this RP is a relocated, instead of hashed, one. Note that the RPIM-SM verifies the source list, S, and the tree cost of G every tr time unless the system re-startup. Lin, et. al. Expires October 24, 2001 [Page 7] Internet-Draft RP Relocation Extension April 2001 start up +-------+ (tr expired) & (S changed) +-----------> +Hashing+------------------+ | [rFlag=0] +---+---+ (reduction +---+-+-----+ |Relocation|& (S changed) | Selection | +----+-----+ | (reductionq [rFlag=true, [changeRP] currentRP=Ck] Figure 3. State transition diagram of the RP 4. Control Messages 4.1 NEW_RP This message is used to advertise to the group that who is the new RP from now on. When the computation of tree cost in Fig. 1 is done and the decision of the RP relocation is positive, the current RP sends the message NEW_RP (G, newRP) to the newRP, hashRP, sources, and multicasts it to all members of the group. Then, the migration process is triggered, i.e., the sources re-register to the newRP and members re-join to the newRP. The format of this message is shown as following: Lin, et. al. Expires October 24, 2001 [Page 8] Internet-Draft RP Relocation Extension April 2001 * PIM Ver: 2 * Type: 9 * Reserved=0, ignored on receipt. * Checksum: standard IP checksum. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |PIM Ver| Type | Reserved | Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Encoded-Unicast-Upstream Neighbor Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Encoded-Multicast Group Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Encoded-newRP Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 4.2 I_AM_RP and RP_CONFIRM Using this message, newRP informs hashRP that íºI am the RP of the group at this time.í¿ Without this information at the hashRP, the incoming receivers could not find the current RP to whom the Join message send. Upon receiving I_AM_RP message, hashRP updates the corresponding relocation state of G, and acknowledge the newRP with the RP_CONFIRM message. The format of this message is shown as following: * PIM Ver: 2 * Type: 10 (I_AM_RP)/11 (RP_CONFIRM) * Reserved=0, ignored on receipt. * Checksum: standard IP checksum. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |PIM Ver| Type | Reserved | Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Encoded-Unicast-Upstream Neighbor Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Encoded-Multicast Group Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Encoded-RP Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 5. Summary This draft proposes a dynamic mechanism, RPIM-SM, which is an extension to the PIM-SM multicast routing protocol. The original Lin, et. al. Expires October 24, 2001 [Page 9] Internet-Draft RP Relocation Extension April 2001 hashed Rendezvous Point (RP) location may be inappropriate for the group. Therefore, according to the estimated tree cost, the PIM-SM is extended by relocating the RP periodically. In addition, the candidate RP with minimum tree cost is selected to be the new RP of the group. To migrate to the new RP, a additional message, NEW_RP, is needed to notify sources to re-register, and members to re-join the new RP. 6. Security Considerations In addition to PIM-SM, the control messages, NEW_RP, I_AM_RP and RP_CONFIRM may use IPsec to address security concerns. 7. References [1] D. Waitzman, C. Partridge, S.E. Deering, "Distance Vector Multicast Routing Protocol", RFC1075, November 1988. [2] S. Deering, D. Estrin, D. Farinacci, V. Jacobson, A. Helmy, L. Wei, "Protocol Indepenent Multicast Version 2, Dense Mode Specification", Internet Draft, Work in progress, draft-ietf-idmr-pim-dm-05.txt, May 21, 1995. [3] J. Moy, "Multicast Extensions to OSPF", RFC1584, March 1994. [4] D. Estrin, D. Farinacci, A. Helmy, D. Thaler, S. Deering, M. Handley, V. Jacobson, C. Liu, P. Sharma, L. Wei, "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification", RFC2362, June 1998. [5] A. Ballardie, "Core Based Trees (CBT) Multicast Routing Architecture", RFC2201, September. 1997. [6] S.L.Hakimi, "Steiner Problem in Graphs and Its Implications," Networks VOL. 1 (1971) pp. 113íV133. [7] David G. Thaler and Chinya V. Ravishankar, "Distributed Center- Location Algorithms", IEEE Journal on Selected Areas in Communications, VOL. 15, No. 3, April 1997. Lin, et. al. Expires October 24, 2001 [Page 10] Internet-Draft RP Relocation Extension April 2001 Authors' Addresses Nai-Bin Hsu Dept of Comp. and Info. Sci., National Chiao Tung Uni. 43, Lane 486, Min-Hu Road Hsinchu, Taiwan 30050 ROC Phone: +886 3 4245612 EMail: naibin@ms7.hinet.net URI: http://speed.cis.nctu.edu.tw/ Ying-Dar Lin Dept of Comp. and Info. Sci., National Chiao Tung Uni. 1001 Ta Hsueh Road Hsinchu, Taiwan 30050 ROC Phone: +886 3 5731899 EMail: ydlin@cis.nctu.edu.tw URI: http://speed.cis.nctu.edu.tw/~ydlin/ Ren-Hung Hwang Dept of Comp Sci & Info Eng, National Chung Cheng Uni. 160, San-Hsing, Ming-Hsung Chiayi, Taiwan 621 ROC Phone: +886 5 2720411 EMail: rhhwang@cs.ccu.edu.tw URI: http://exodus.cs.ccu.edu.tw/~rhhwang/ Lin, et. al. Expires October 24, 2001 [Page 11]