PIM Working Group                                      J. William Atwood
Internet-Draft                                          Ritesh Mukherjee
Expires: May 21, 2002                     Department of Computer Science
                                                    Concordia University
                                                       November 21, 2001


                   RP Relocation in PIM-SM Multicast
                     draft-atwood-pim-sm-rp-01.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions in Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 21, 2002.

Abstract

   The Protocol Independent Multicast-Sparse Mode (PIM-SM) protocol
   uses a core-based tree to forward multicast datagrams to members of
   a multicast group.  Currently there is no method for relocating the
   core, or Rendezvous Point (RP), for a group.  Administratively
   chosen RPs lead to poor performance as sources and receivers join
   and leave the multicast group.  This draft presents a mechanism for
   dynamically relocating the RP to a position that minimizes the tree
   cost and improves performance.

Atwood/Mukherjee                                                [Page 1]

RP relocation in PIM-SM                                    November 2001

1. Introduction

   Currently, the RP for a multicast group is chosen administratively,
   by selecting a position in the network based on the expected source
   and receiver locations.
   However, given the variety of high-bandwidth applications in use
   today, it is impossible for an administrator to "guess" where the
   sources and receivers are most likely to be [1].  To improve network
   performance, reduce tree cost and reduce delay, the RP location must
   depend on the sources and receivers present in the multicast group
   at any point in time.  As sources and receivers join and leave
   groups, the RP has to be dynamically relocated to maintain network
   efficiency and reduce delay.

   This draft is an alternate proposal to the one made by Lin, et al.
   [2].

2. Motivation and Objectives

   Sparse mode PIM (PIM-SM) [3,4] uses a center-specific tree
   construction.  The distribution center of PIM-SM is called a
   Rendezvous Point (RP).  PIM tree construction revolves around the
   selection of the RP.  All senders for the group must register with
   the RP.  Receivers requesting to join the group set up the path from
   themselves to the RP.  Thus the location of the RP determines the
   efficiency of the multicast tree.  From the user's point of view,
   efficiency means minimizing end-to-end delay.  The network would
   prefer to minimize the tree cost and reduce congestion.

   The shared tree rooted at the RP does not optimize the delivery path
   through the network.  If the RP is located far from the
   participants, packets experience a longer delay and excessive
   resources are consumed.  If an RP is selected for a set of
   participants that later changes drastically, the RP location will
   severely affect the quality of the tree for new participants.
   Therefore the RP should be dynamically located depending upon the
   participants in the multicast group.

   Ideally, we would want the RP to be located in the vicinity of the
   senders to the multicast group.  But there is not enough information
   to proceed with such a calculation.  Another method [2] would be to
   use the RP-set broadcast by the BSR in the bootstrap message.
   But this RP-set is too small and is unlikely to contain the address
   of the best RP for the multicast group.  An algorithm that considers
   all the potential routers would be best suited for calculating the
   best RP.

   In each domain there is an elected BSR (Bootstrap Router) [5], which
   collects candidate RP information and broadcasts it hop-by-hop to
   all the routers.  Routers that can act as RPs periodically unicast
   Candidate-RP-Advertisement messages (C-RP-Advs) to the BSR; these
   contain the group prefixes for which candidacy is advertised.  The
   BSR then includes a set of these Candidate-RPs (the RP-Set), along
   with the corresponding group prefixes, in the Bootstrap messages it
   periodically originates.

3. New RP Calculation

   Assume RPold is the current RP of group G.  We propose an RP
   relocation mechanism that relocates the RP to RPnew.  Note that this
   relocation is done only when it yields a significant improvement in
   tree cost.  The relocation takes effect only after a certain time
   period has elapsed and the group membership has changed.  To
   determine the position of RPnew, RPold uses the topology information
   available to it from the Interior Gateway Protocol (such as OSPF
   (Open Shortest Path First), which has the required information in
   its Link State Database).

   We begin by describing two functions that help decide whether the RP
   should be moved to a new location.  [These functions have been taken
   from [6] with slight modifications.  The paper [6] contains other
   algorithms for calculating the best RP and they may be used instead
   of the one mentioned here.]

   1. The function cost_calculate(root) uses the estimation function
      [6] to calculate the cost of a tree at a given root:

      Let S be the set of sources for the multicast group, u a source
      in S, root the root supplied to the algorithm and d(a,b) the
      distance from a to b.
      First, get a lower bound on the cost of the tree rooted at some
      node.  In this case, the best case tree is linear, that is, all
      group members lie on the path from the root to the farthest
      member.  When distances are given as hop counts, we can get a
      slightly tighter bound.  Specifically, if two group members are
      at an equal distance, the distribution tree cannot be completely
      linear, but must have one additional link.  Thus,

         Est. Cost(min) = max d(root,u)
                          + number of duplicate distance nodes in S

      To get an upper bound on the cost of the tree rooted at some
      node, note that in the worst case tree no links are shared among
      the paths to each member.  Thus the maximum tree cost is the sum
      of the member distances.  If the number of group members (other
      than the root, if it is a member) is greater than the root
      degree, we may tighten the bound by subtracting the difference to
      account for the knowledge of sharing those links.  Thus,

         Est. Cost(max) = sum d(root,u)
                             if |S| <= deg(root)
                          sum d(root,u) - (|S| - deg(root))
                             otherwise

      The final estimated cost is given by,

         Est. Cost = (Est. Cost(min) + Est. Cost(max)) / 2

   2. The function why_move() follows the following algorithm:

      1. Assign a variable RPmove the value RPold.

      2. Call cost_calculate with RPmove as the root.  Store the cost
         in a variable currcost and the visited node in a list
         dontrepeat[].

      3. Call cost_calculate with each of the unvisited neighboring
         nodes of RPmove as the root.  Store the lowest cost in a
         variable lowestcost, the lowest cost node in a variable RPlow
         and add the visited nodes to the list dontrepeat[].

      4. If lowestcost is lower than currcost then assign lowestcost to
         currcost, else go to step 6.

      5. Assign RPmove the value RPlow and go to step 3.

      6. If RPmove is different from RPold and the estimated cost of
         the tree with the RP at RPold exceeds currcost (the estimated
         cost with the RP at RPmove) by more than the threshold, then
         return RPmove, else return NULL.
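   The two functions above can be sketched in Python as follows.  This
   is a minimal illustration under stated assumptions, not part of the
   proposal: the topology is a hypothetical adjacency dictionary
   (standing in for the link state database), distances are hop counts
   found by breadth-first search, "duplicate distance nodes" is read as
   the number of members whose distance repeats that of another member,
   and the names hop_distances and EXAMPLE_GRAPH are invented for the
   example.

```python
from collections import deque

# Hypothetical topology: a simple path A - B - C - D - E.
EXAMPLE_GRAPH = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}


def hop_distances(graph, root):
    """Return hop counts from root to every reachable node (BFS)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def cost_calculate(graph, sources, root):
    """Estimated tree cost at root: mean of the lower and upper bounds."""
    dist = hop_distances(graph, root)
    # Members other than the root itself (assumption: all are reachable).
    d = [dist[u] for u in sources if u != root]
    if not d:
        return 0.0
    # Lower bound: linear tree, plus one link per repeated distance.
    est_min = max(d) + (len(d) - len(set(d)))
    # Upper bound: no shared links, tightened when members exceed degree.
    est_max = sum(d)
    degree = len(graph[root])
    if len(d) > degree:
        est_max -= len(d) - degree
    return (est_min + est_max) / 2


def why_move(graph, sources, rp_old, threshold):
    """Greedy neighbor search; returns the new RP or None (NULL)."""
    old_cost = cost_calculate(graph, sources, rp_old)
    rp_move, curr_cost = rp_old, old_cost
    dont_repeat = {rp_old}
    while True:
        candidates = [n for n in graph[rp_move] if n not in dont_repeat]
        if not candidates:          # nothing left to examine
            break
        dont_repeat.update(candidates)
        costs = {n: cost_calculate(graph, sources, n) for n in candidates}
        rp_low = min(costs, key=costs.get)
        if costs[rp_low] >= curr_cost:   # step 4: no improvement
            break
        rp_move, curr_cost = rp_low, costs[rp_low]   # step 5
    if rp_move != rp_old and old_cost - curr_cost > threshold:
        return rp_move
    return None
```

   With sources D and E and the RP initially at A, the estimated cost
   at A is 5.0 and the search walks along the path to D, so
   why_move(EXAMPLE_GRAPH, ["D", "E"], "A", 1.0) returns "D", while a
   threshold of 10.0 makes it return None.  Keeping the cost at RPold
   in a separate variable avoids recomputing it in step 6.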
   The value of the variable threshold used in the algorithm is a
   predefined value set by the network administrator.

   Thus calling the function why_move() returns NULL if the RP should
   not be relocated, or returns RPmove if the RP should be relocated.
   This leaves us with the problem of dynamically switching from RPold
   to RPnew.

4. RP Relocation

   Figure 1 shows the outline of the relocation algorithm.  When the
   relocation timer has expired and the group membership has changed,
   the present tree cost is calculated, along with the tree cost with
   the RP at the best position found.

   ----------------------------------------------------
   RELOCATE(G)
   set_of_sources S
   begin
     while(1) do
       if (tr expired and S changed)
         RPnew = why_move()
         if (RPnew != NULL)
           reloc_var = 1
           gen_move_join(RPold, RPnew, G)
           gen_move_RP(RPold, RPnew, G)
         else
           reloc_var = 0
         endif
       endif
     endwhile
   end

   tr - relocation timer
   ----------------------------------------------------

                    Figure 1: Relocation

   If RPnew is not NULL then the RP has to be relocated.  To do so,
   RPold sends a MOVE_JOIN message to RPnew and a MOVE_RP message to
   the BSR.  RPnew can ignore all the messages if it is not an
   RP-capable router or is configured not to act as RP for any group.
   If RPnew is RP-capable then it does the following:

   On receiving the MOVE_JOIN message, RPnew assumes the role of RP for
   group G.  When it receives an encapsulated message from RPold, it
   sends the message out on the multicast tree.  When it receives a
   message from a source, it sends the message out on the multicast
   tree and also encapsulates it and sends it to RPold.

   On receiving the MOVE_RP message, if the BSR does not find RPnew in
   the local pool of candidate RPs, it assumes that RPnew is not
   RP-capable and ignores the message.  Otherwise it does the
   following:

   1. To exclude the current members of the RP-set from being the RP
      for the group, the BSR can do one of the following:

      i)  It changes the priority of RPnew for that group to the
          minimum value of priority in that group (i.e. maximum
          priority) and increases the priority values of all other
          candidates for that group by 1; or

      ii) It includes only the address of RPnew in the RP-set being
          broadcast in the bootstrap message and leaves out all the
          remaining candidates.

   2. The BSR then sends out its bootstrap message.

   The routers, on receiving this bootstrap message, join the group at
   RPnew.  While this transition is taking place, RPold continues to
   act as the RP for the group.  The MOVE_JOIN message sent to RPnew
   ensures that RPold receives packets from RPnew during the
   transition.  When RPold receives a message from a source, it
   continues to send it to the group and also encapsulates the packet
   and sends it to RPnew.  On receiving a bootstrap message, RPold
   checks the reloc_var.  If the reloc_var for the group is greater
   than 0 and the RP-set for the group is unchanged, then it increases
   the value of reloc_var by 1.

   Once all the members have pruned off from RPold, the tree to RPold
   is torn down.  RPold then sends a MOVE_OVER message to RPnew and
   resets the reloc_var to zero.  The state at RPold is then deleted
   and the tree is now set up at RPnew.  On receiving the MOVE_OVER
   message, RPnew stops sending encapsulated messages to RPold.  This
   method of RP relocation ensures no loss of messages during the
   transition.

   If RPold gets three bootstrap messages from the BSR with the same
   RP-set, then the value of reloc_var will be four.  If the value of
   reloc_var reaches four, then RPold assumes that RPnew is not
   RP-capable, sets the reloc_var to zero and continues to serve as RP
   for the multicast group.  RPold will also stop sending encapsulated
   packets to RPnew.

5. Control Messages

5.1 MOVE_JOIN

   This message is used to tell RPnew that it should act as the new RP
   for the multicast group G and should send any messages during the
   transition period to RPold.  The IP source address is set to the
   address of RPold and the destination address is set to RPnew.  The
   IP TTL of the packet is the system's normal unicast TTL.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |PIM Ver| Type  |   Reserved    |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Encoded-Multicast-Group-Address                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   PIM Ver
      PIM Version number is 2.

   Type
      9 = Move_Join

   Reserved
      Set to zero on transmission.  Ignored upon receipt.

   Checksum
      The checksum is a standard IP checksum.

   Encoded-Multicast-Group-Address
      The encoded multicast group address for the group G whose RP has
      to be relocated.

5.2 MOVE_RP

   This message is used to tell the Bootstrap router that the RP for
   the group should be shifted from RPold to RPnew.  The IP source
   address is set to the address of RPold and the destination address
   is set to the address of the BSR.  The IP TTL of the packet is the
   system's normal unicast TTL.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |PIM Ver| Type  |   Reserved    |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Encoded-Multicast-Group-Address                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Encoded-RPnew-Address                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   PIM Ver
      PIM Version number is 2.

   Type
      10 = Move_RP

   Reserved
      Set to zero on transmission.  Ignored upon receipt.
   Checksum
      The checksum is a standard IP checksum.

   Encoded-Multicast-Group-Address
      The encoded multicast group address for the group G whose RP has
      to be relocated.

   Encoded-RPnew-Address
      The encoded address of the router to which the RP should be
      relocated.

5.3 MOVE_OVER

   This message is used to tell RPnew that all the members of group G
   have left the multicast tree at RPold, so there is no need to send
   any more messages to RPold.  The IP source address is set to the
   address of RPold and the destination address is set to RPnew.  The
   IP TTL of the packet is the system's normal unicast TTL.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |PIM Ver| Type  |   Reserved    |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Encoded-Multicast-Group-Address                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   PIM Ver
      PIM Version number is 2.

   Type
      11 = Move_Over

   Reserved
      Set to zero on transmission.  Ignored upon receipt.

   Checksum
      The checksum is a standard IP checksum.

   Encoded-Multicast-Group-Address
      The encoded multicast group address for the group G whose RP has
      to be relocated.

6. Summary

   This draft proposes a dynamic mechanism for relocating the RP in the
   PIM-SM multicast routing protocol.  The original RP may not be the
   ideal RP for the group, and relocating it to a more appropriate
   position will help reduce delay and improve network performance.

7. Security Considerations

   All the control messages may use IPsec to address security concerns.

8. References

   [1] Robert Voigt, Robert Barton, Shridhar Shukla, "A Tool for
       Configuring Multicast Data Distribution over Global Networks",
       April 1995

   [2] Ying-Dar Lin, Nai-Bin Hsu, Ren-Hung Hwang, "RP relocation
       extension to PIM-SM multicast routing", Internet Draft, Work in
       progress, draft-ydlin-pim-sm-rp-01.txt

   [3] D. Estrin, D. Farinacci, A. Helmy, D. Thaler, S. Deering, M.
       Handley, V. Jacobson, C. Liu, P. Sharma, L. Wei, "PIM-SM
       Protocol Specification", RFC 2362, June 1998

   [4] Bill Fenner, Mark Handley, Hugh Holbrook, Isidor Kouvelas,
       "PIM-SM, Protocol Specification", Internet Draft, Work in
       progress, draft-ietf-pim-sm-v2-new-03.ps

   [5] Bill Fenner, Mark Handley, David Thaler, "BSR mechanism for
       PIM-SM", Internet Draft, Work in progress,
       draft-ietf-pim-sm-bsr-01.ps

   [6] David G. Thaler, Chinya V. Ravishankar, "Distributed Center
       Location Algorithms", IEEE Journal on Selected Areas in
       Communications, April 1997

Authors' Addresses

   J. William Atwood
   Department of Computer Science
   1455 de Maisonneuve Blvd. West
   Montreal, Quebec, H3G 1M8
   Canada

   Phone: +1 514 848 3046
   Email: bill@cs.concordia.ca
   URL:   http://www.cs.concordia.ca/~faculty/bill/

   Ritesh Mukherjee
   Department of Computer Science
   1455 de Maisonneuve Blvd. West
   Montreal, Quebec, H3G 1M8
   Canada

   Phone: +1 514 989 4530
   Email: mukherj@cs.concordia.ca
   URL:   http://www.cs.concordia.ca/~grad/mukherj
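
Illustrative Note: Encoding the Control Messages

   As an illustration of the message layouts in Section 5, the
   following Python sketch builds the three control messages for an
   IPv4 group.  It is a sketch under assumptions, not a normative
   encoding: the draft does not spell out the encoded address formats,
   so the group and unicast address encodings of RFC 2362 [3] are
   assumed (address family 1 for IPv4, native encoding type 0), the
   checksum is assumed to cover the whole PIM message, and the function
   names are invented for the example.

```python
import socket
import struct

PIM_VERSION = 2
MOVE_JOIN, MOVE_RP, MOVE_OVER = 9, 10, 11   # Type values from Section 5
ADDR_FAMILY_IPV4 = 1                        # IANA address family number


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"                     # pad to a 16-bit boundary
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encode_group(group: str, mask_len: int = 32) -> bytes:
    """Assumed layout: family, encoding type, reserved, mask len, address."""
    return struct.pack("!BBBB4s", ADDR_FAMILY_IPV4, 0, 0, mask_len,
                       socket.inet_aton(group))


def encode_unicast(addr: str) -> bytes:
    """Assumed layout: family, encoding type, address."""
    return struct.pack("!BB4s", ADDR_FAMILY_IPV4, 0, socket.inet_aton(addr))


def build_message(msg_type: int, group: str, rp_new: str = None) -> bytes:
    """Build a MOVE_JOIN, MOVE_RP or MOVE_OVER message as bytes."""
    body = encode_group(group)
    if msg_type == MOVE_RP:
        if rp_new is None:
            raise ValueError("MOVE_RP needs the address of RPnew")
        body += encode_unicast(rp_new)
    ver_type = (PIM_VERSION << 4) | msg_type    # 4-bit version, 4-bit type
    # Compute the checksum with the checksum field set to zero.
    csum = internet_checksum(struct.pack("!BBH", ver_type, 0, 0) + body)
    return struct.pack("!BBH", ver_type, 0, csum) + body
```

   For example, build_message(MOVE_RP, "224.1.2.3", "10.0.0.2")
   produces a message whose first octet carries version 2 and type 10.
   A receiver can validate any of these messages by running
   internet_checksum over the received bytes and checking that the
   result is zero.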