INTERNET-DRAFT Zhaohui (Jeffrey) Zhang Intended Status: Proposed Standard WeeSan Lee Expires: 2012-04-24 Juniper Networks, Inc. October 24, 2011 Avoid Unnecessary Upstream Traffic in BIDIR-PIM draft-zzhang-pim-bidir-rp-prune-graft-00 Abstract In a BIDIR-PIM network, data packets could be forwarded from the source to the RP and then get dropped there because there is no receiver on the other side of the distribution tree. This document specifies a way to prune the unnecessary traffic with minimum additional states and procedures. Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html Copyright and License Notice Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of Zhang Expires 2012-04-24 [Page 1] INTERNET DRAFT BIDIR-PIM RP-Prune-Graft 2011-10-24 publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . 8 2. Protocol Specification . . . . . . . . . . . . . . . . . . . . 8 2.1. RP-Prune state . . . . . . . . . . . . . . . . . . . . . . 8 2.2 RP-Prune/Graft message . . . . . . . . . . . . . . . . . . . 8 2.2.1 Sending of RP-Prune message . . . . . . . . . . . . . . 9 2.2.2 Receiving of RP-Prune message . . . . . . . . . . . . . 9 2.2.3 Sending of RP-Graft message . . . . . . . . . . . . . . 10 2.2.4 Receiving of RP-Graft message . . . . . . . . . . . . . 10 2.3 RP-Prune-Keep/Cancel message . . . . . . . . . . . . . . . . 10 2.3.1 Sending of RP-Prune-Keep message . . . . . . . . . . . . 10 2.3.2 Receiving of RP-Prune-Keep message . . . . . . . . . . . 10 2.3.3 Sending of RP-Prune-Cancel message . . . . . . . . . . . 11 2.3.4 Receiving of RP-Prune-Cancel message . . . . . . . . . . 11 2.4 Virtual RPs . . . . . . . . . . . . . . . . . . . . . . . . 11 2.5 (*,G) RP-Prune state maintenance . . . . . . . . . . . . . . 12 2.6 Packet format . . . . . . . . . . . . . . . . . . . . . . . 13 3 Security Considerations . . . . . . . . . . . . . . . . . . . . 14 4 IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 14 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 14 6 References . . . . . . . . . . . . . . . . . . . . . . . . . . 14 6.1 Normative References . . . . . . . . . . . . . . . . . . . 14 6.2 Informative References . . . . . . . . . . . . . . . . . . 15 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 15 Zhang Expires 2012-04-24 [Page 2] INTERNET DRAFT BIDIR-PIM RP-Prune-Graft 2011-10-24 1 Introduction [RFC4601] defines Protocol Independent Multicast - Sparse mode (PIM- SM), an efficient multicast routing protocol; however, the use of both the shortest-path tree and shared tree make the protocol extremely complex. [RFC5015] defines Bidirectional PIM (BIDIR-PIM), which distributes the packets using only bidirectional shared trees. Comparing to PIM-SM, BIDIR-PIM is a much simpler protocol; however, it is not without drawbacks. One drawback is that the data packets could be forwarded unnecessarily to the RP, Rendezvous Point, and then get dropped there when there is no receiver at all on the shared tree. Particularly, there are two such scenarios: (i) There is no receiver for the group G at all; (ii) The senders and receivers for the group G share the same branch towards the RP. For simplicity, we assume explicit tracking is enabled. Additionally, we use a RPA that belongs to a real router's loopback interface in our examples. The same procedures are applicable to more general situations (see section 2.4) with some changes. Zhang Expires 2012-04-24 [Page 3] INTERNET DRAFT BIDIR-PIM RP-Prune-Graft 2011-10-24 +----+ | RP | +----+ / \ / \ / \ +----+ +----+ | R1 | | R2 | +----+ +----+ / \ / \ / \ / \ +----+ +----+ +----+ |src1|---| R3 | | R4 | +----+ +----+ +----+ / \ / \ / \ / \ / \ / \ +----+ +----+ +----+ +----+ | R5 | | R6 | | R7 | | R8 | +----+ +----+ +-+--+ +----+ | +-+--+ |src2| +----+ Figure 1 Scenario (i) can be illustrated with Figure 1. There is no receiver for group G at all. In this example, the source, src1, starts sending traffic, followed by another source, src2, later. We assume each BIDIR-PIM router except the RP has installed at least a (*, G-prefix) route, which is responsible for forwarding the traffic for groups belongs to the G-prefix from the senders to the RP. On the RP itself, a special (*,G-prefix) route, called Resolve Route in this document, is installed to trigger (throttled) notifications to the control plane when packets match this route. This notification is similar to PIM-SM's way of notification of new flows, but here it serves the purpose of indicating that traffic is being unecessarily received (if there were receivers, the traffic would have matched a more specific (*,G) forwarding route triggered by joins). The procedures work as follows: when RP receives a data packet from src1, the (*,G-prefix) resolve route (vs. a more specific (*,G) route) is hit and the control plane is notified (that traffic is Zhang Expires 2012-04-24 [Page 4] INTERNET DRAFT BIDIR-PIM RP-Prune-Graft 2011-10-24 being received unnecessarily). RP then triggers a (*,G) RP-Prune to ALL_PIM_ROUTERS on the interface on which the packet arrived. Upon receiving the RP-Prune, R1 will install a similar (*,G) resolve route. In a similiar fashion, a subsequent packet from src1 would trigger R1 to send a (*,G) RP-Prune out of the interface that connects to R3, and eventually stop the data packets from src1 traveling all the way to the RP and get dropped there. When src2 starts sending packets toward the RP, the packets arrive at R1. The resolve route on R1 will prompt R1 to trigger a RP-Prune out to the interface connecting to R4, which results a (*,G) resolve route gets installed on R4. Similarly, a subsequent packet from src2 will trigger R4 to send a RP-Prune out of the interface connecting to R7. This effectively stops src2 from sending more traffic towards the RP. Note that the traffic from src2 does not need to get to the RP before it is pruned back. The RP-Prune message is purposedly named with the "RP-" prefix to indicate that it is triggered from the RP. Unlike the original BIDIR-PIM prune message, which travels upstream towards the RP, the RP-Prune message travels downstream. Zhang Expires 2012-04-24 [Page 5] INTERNET DRAFT BIDIR-PIM RP-Prune-Graft 2011-10-24 +----+ | RP | (*,G) +----+ / \ / \ (*,G)/ \ +----+ +----+ | R1 | | R2 | +----+ +----+ / \ / \ / \ / \ (*,G) +----+ +----+ +----+ +-----+ |src1|---| R3 | | R4 |---|rcvr1| +----+ +----+ +----+ +-----+ / \ / \ / \ / \ / \ / \ (*,G) +----+ +----+ +----+ +----+ +-----+ | R5 | | R6 | | R7 | | R8 |---|rcvr2| +----+ +----+ +-+--+ +----+ +-----+ | +-+--+ |src2| +----+ Figure 2 Scenerio (ii) can be illustrated with Figure 2. There are two receivers for group G, off R4 and R8 respectively. As a result, there are (*,G) join states on R8, R4, R1, and RP. Additionally, there are two senders for group G, off R7 and R3 respectively. In this scenerio, the traffic from src1 would go up to R1, and then up to RP. At the same time, R1 would send the traffic down to R4 and R8 where the receivers are. The problem here is that the traffic from src1 should stop at R1 without going all the way to RP and gets dropped there. Similarly, the traffic from src2 should make an U-turn on R4 to R8, without the need to go to R1 and RP for there are no other receivers on the shared tree. To achieve the purpose of pruning the upstream traffic wherever it is appropriate, the RP initiates (*,G) RP-Prune messages for (*,G) join states that have only one downstream neighbor, towards that neighbor. When receiving the RP-Prune message, the downstream router does the Zhang Expires 2012-04-24 [Page 6] INTERNET DRAFT BIDIR-PIM RP-Prune-Graft 2011-10-24 same by checking its corresponding (*,G) join state. If the number of the downstream interface is one and the number of downstream neighbors is one, the router further propagates the RP-Prune message downstream to that only neighbor; otherwise, the router terminates the RP-Prune message. After receiving a RP-Prune message, the downstream router removes the upstream interface from its outgoing interface list so that traffic will not be forwarded upstream. It also creates/updates a (*,G) RP- Prune state to remember the upstream neighbor from which the RP-Prune arrived. Additionally, before sending out a RP-Prune message, it also creates/updates a (*,G) RP-Prune state to remember the interface and/or neighbor the RP-Prune is sent to. In Figure 2, the RP-Prune message will be sent from the RP to R1, from R1 to R4, and terminated at R4 whose (*,G) state has two downstream interfaces: one to rcvr1, another to R8 where rcvr2 is attached. The reason R4 does not propagates the RP-Prune further down to R8 is because if another source off R8 start sending traffic, the traffic needs to travel upwards toward R4 to be forwarded to rcvr1. Now, let's say src1 becomes a receiver as well and sends a (*,G) join toward the RP via R3 and R1. Now the number of downstream interface/neigbhor of the (*,G) state on R1 becomes two. That will trigger a RP-Graft message being sent out of/to all the interfaces/neighbors on which the RP-Prune was sent out(the interface that the new (*,G) join arrived on should not be in that list). The RP-Graft message is to cancel out the RP-Prune message so that the traffic can be grafted back. Notice that R1 will still not send traffic to RP because RP still only has one downstream interface/neighbor so it does not send a RP-Graft to R1. Note that for scenario (i), the RP-Prune is data driven hop-by-hop: RP has the (*,G-prefix) resolve route pre-installed while downstream router will install the (*,G) resolve route when a RP-Prune is received, and the RP-Prune is not propagted on downstream routers but re-triggered by the next packet hitting the newly installed resolve route. For scenario (ii), the RP-Prune is control driven: initiated by RP and propagated by downstream routers when appropriate. It is worth to note that the extra (*,G) RP-Prune states are kept only on the relevant routers - where traffic is being or may be received unnecessarily. The maintenance of the (*,G) RP-Prune states is specified in the protocol specification sections. Zhang Expires 2012-04-24 [Page 7] INTERNET DRAFT BIDIR-PIM RP-Prune-Graft 2011-10-24 1.1 Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 2. Protocol Specification 2.1. RP-Prune state Like other PIM states, (*,G) RP-Prune state is soft-state; it relies on the periodic refresh to keep the state alive. The (*,G) RP-Prune state includes the following information: - Upstream interface and neighbor from which the RP-Prune is received - a timer to time out the RP-Prune state after the upstream stops refreshing the RP-Prune messages (this is for scenario (i)). This is to in case the RP-Graft from the upstream is lost. - List of downstream interfaces. For each of them: - a flag indicating whether an interface-wide RP-Prune was sent An interface-wide RP-Prune is sent in response to a resolve notification, which indicates that unnecessary traffic is being received on an interface. In contrast, a neighbor-specific RP- Prune is sent when the (*,G) join state only has one downstream interface. - a list of downstream neighbors. For each of them: - if a RP-Prune was sent to the neighbor, a timer for its refresh. This is the case where an neighbor-specific RP-Prune was sent when there is only one downstream neighbor for a (*,G) join state. - if a RP-Prune-Keep (see Section 2.3) was received from the neighbor, a timer to timeout the neighbor after it stops sending RP-Prune-Keep. This is the case where an interface-wide RP-Prune was sent. 2.2 RP-Prune/Graft message Zhang Expires 2012-04-24 [Page 8] INTERNET DRAFT BIDIR-PIM RP-Prune-Graft 2011-10-24 RP-Prune/Graft messages are always sent downstream away from the RPA. An interface-wide RP-Prune message is multicast to ALL-PIM-ROUTERS (sent in response to the resolve notification that it is receiving unnecessary traffic on an interface). A neighbor-specific RP-Prune message is unicast to a particular downstream neighbor (it is originated from the RP or propagted by a downstream router when there is only one downstream neighbor for a (*,G) join state). Similarly, if a RP-Graft message is sent out of an interface where an interface-wide RP-Prune was sent out of before, ALL-PIM-ROUTERS address is used; if it is sent because a neighbor-specific RP-Prune was sent before, then the address of that particular neighbor is used. To make sure that a RP-Graft message is received by all appropriate downstream neighbors, it is retransmitted until it has received RP- Prune-Cancel message (see below) from all the downstream neighbors listed in the RP-Prune state. 2.2.1 Sending of RP-Prune message 2.2.2 Receiving of RP-Prune message When a RP-Prune meessage is received, it is first validated with the following crieteria: - It must be from the DF on the RPF interface - If it is addressed to ALL_PIM_ROUTERS then there must not be a (*,G) Join state. - If it is addressed to this router, then there must be a corresponding (*,G) join state. If the validation fails, the message is discarded. Otherwise, a corresponding RP-Prune state is looked up or created, with the upstream interface and neighbor recorded. If it does not have corresponding (*,G) Join state, a (*,G) resolve route is created and the RP-Prune message is terminated. If it does have corresponding (*,G) Join state, the RPF interface is removed from the (*,G) forwarding route's OIF list, and the upstream timer in the RP-Prune state is restarted. Zhang Expires 2012-04-24 [Page 9] INTERNET DRAFT BIDIR-PIM RP-Prune-Graft 2011-10-24 If the (*,G) join state has more than one downstream interface/neighbor, the RP-Prune is terminated. If it only has one downstream interface and only one downstream neighbor on it, the RP- Prune is further propagated to that neighbor. 2.2.3 Sending of RP-Graft message 2.2.4 Receiving of RP-Graft message 2.3 RP-Prune-Keep/Cancel message RP-Prune-Keep/Cancel messages are always sent upstream towards the RPA. For a RP-Prune state triggered in case of a single downstream neighbor for a (*,G) join state, it is maintained by periodical refresh from RP downwards. For a RP-Prune state that is triggered in response to resolve notification for traffic being received unnecessarily, it is maintained from the FHR upstream towards the RP. In scenerio (i), the resolve notification will keep coming on R3 and R7 as long as traffic continues. That will maintain the downstream interface in the RP- Prune state (and the RP-Prune state itself). As a result, R3 and R7 will send RP-Prune-Keep messages upstream periodically, which in turn will maintain the downstream interface in the upstream's RP-Prune state (and the RP-Prune state itself). The process goes on. If the source stop sending traffic, resolve notifications will stop happening on the FHR and it will timeout the corresponding downstream interface in the corresponding RP-Prune state. When all the downstream interfaces time out, the RP-Prune state will time out, and the router sends RP-Prune-Cancel upstream so that the upstream router can immediately remove the correspodning downstream interface. The RP-Prune-Cancel also serves as a acknowledgement for RP-Graft message (see above section 2.2). 2.3.1 Sending of RP-Prune-Keep message 2.3.2 Receiving of RP-Prune-Keep message The router looks up the corresponding (*,G) RP-Prune state when the message is received, and the corresponding downstream interface with Zhang Expires 2012-04-24 [Page 10] INTERNET DRAFT BIDIR-PIM RP-Prune-Graft 2011-10-24 the interface-wide RP-Prune flag set. If either one is not found, an interface-wide RP-Graft message is immediately sent. Most likely, a previous RP-Graft is lost or there has been a topology change (see 2.5). Otherwise, the timer for the downstream interface is restarted. 2.3.3 Sending of RP-Prune-Cancel message 2.3.4 Receiving of RP-Prune-Cancel message 2.4 Virtual RPs In this document, we refer all the routers connected to the RPL, Rendezvous Point Link, as the virtual RPs. The virtual RPs perform the enhancement procedures described earlier, with some small changes. Before the necessary changes are discussed, the following are the current standard behaviors for the routers on the RPL: - (*,G) joins are terminated by the virtual RPs and not forwarded onto the RPL. - traffic is always forwarded to the RPL - the RPL is added to the outgoing interface list of either (*,G) and (*,G-prefix) routes on the virtual RPs. For the RP-Prune/Graft procedure to work, the following changes are necessary: - (*,G) joins/prunes are propagated on the RPL to ALL-PIM-ROUTERS (with Upstream Neighbor Address set to 0), and processed by the virtual RPs. The RPL is added to the outgoing list of a (*,G) route if and only if a (*,G) join has been received on the RPL. The (*,G) join/prunes are never sent further downstream, as in the base BIDIR-PIM case. - The virtual RPs install a (*,G-prefix) resolve route instead of a regular forwarding route that forwards traffic onto the RPL. With the above changes, traffic for group G will be forwarded to the RPL by a virtual RP only when at least one other virtual RP has sent Zhang Expires 2012-04-24 [Page 11] INTERNET DRAFT BIDIR-PIM RP-Prune-Graft 2011-10-24 a (*,G) join to the RPL. With the enhancement procedures described earlier: - A virtual RP's control plane will receive resolve notifications for traffic arriving on non-RPL interfaces and trigger RP-Prunes out of those interfaces, unless the traffic matches specific (*,G) forwarding routes. - with respect to triggering RP-Prunes when there is only one downstream neighbor for a (*,G) join state, other virtual RPs are also counted as downstream neighbors if they are in the (*,G) join states on this virtual RP. Those RP-Prunes would only be triggered away from the RPL, as a result of only having one downstream neighbors on non-RPLs, as illustrated in Figure 3 below. +----+ +----+ | R1 | (*,G) | R2 | (*,G) +----+ +----+ | | | | ============================================= RPL | | | | +----+ +----+ | R3 | (*,G) | R4 | (*,G) +----+ +----+ | | | | (*,G) +----+ +----+ +----+ | R5 | | R6 |------|rcvr| +----+ +----+ +----+ Figure 3 There is a receiver off R6, so there are (*,G) join states on R6 and R4 per normal BIDIR-PIM behavior. With the enhancement procedures, R4 will propogate the joins onto the RPL, so all R1 ~ R4 have the (*,G) state, each with only one downstream neighbor (R1~R3 has R4 as the downstream neighbor and R4 has R6 as the downstream neighbor), but only R4 will trigger RP-Prune towards R6 and R1 ~ R3 will not trigger RP-Prune to the RPL. As soon as another receiver joins on R5, R4 will have two downstream neighbors for the (*,G) join state - one being R3 on the RPL and one being the existing R6 - so it will send RP-Graft to R6. 2.5 (*,G) RP-Prune state maintenance Zhang Expires 2012-04-24 [Page 12] INTERNET DRAFT BIDIR-PIM RP-Prune-Graft 2011-10-24 In a steady environment, the (*,G) RP-Prune states are maintained by the RP-Prune refresh from upstream (for scenario (ii)), or by the RP- Prune-Keep refresh messages from downstream (for scenario (i)). In the former case, if an upstream stops refreshing RP-Prune messages, downstream will time out the correspoding states (normally the upstream should have sent RP-Graft in that case). In the latter case, the (*,G) RP-Prune states are no longer needed if relevant sources stops sending - the removal of unnecessary (*,G) RP-Prune states is achieved by the RP-Prune-Keep/Canel procedures triggered from FHRs (see above section 2.3). With topology/configuration changes, if the upstream interface changes, or the DF on the upstream interface changes, all RP-Prune states with that upstream interface and neighbor are deleted, with the following actions: - if a (*,G) Resolve route was installed (scneario (i)), it is deleted. - if the upstream interface was removed from the (*,G) forwarding route (scenario (ii)), it is re-evaluated to see if needs to be added back (w/o considering RP-Prune state) - RP-Graft messages are sent out of downstream interfaces or to a neighbor where RP-Prune was sent before (with retransmission if necessary) 2.6 Packet format The RP-Prune/Graft, RP-Prune-Keep/Cancel messages use the PIM-SM PIM Join/Prune format, with the following differences: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |PIM Ver| Type |Subtype| Rsvd | Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ........................................................... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type The type is X, with a 4-bit Subtype field carved out of the previous Reserved field (similar to the DF Election messages). The four messages have the following Subtype values: 1 RP-Prune 2 RP-Graft 3 RP-Prune-Keep Zhang Expires 2012-04-24 [Page 13] INTERNET DRAFT BIDIR-PIM RP-Prune-Graft 2011-10-24 4 RP-Prune-Cancel Upstream Neighbor Address For RP-Prune/Graft messages, it is the address of the downstream neighbor, or zero in case of interface-wide RP-Prune/Graft messages. For RP-Prune-Keep/Cancel messages, it is the address of the upstream DF. Holdtime For RP-Prune-Cancel, it is 0. For RP-Graft, it is the amount of time within which the receiver message must respond with a RP-Prune-Cancel as an acknowledgement. For RP-Prune, it is the amount of time that a receiver can keep the RP-Prune state alive. For RP-Prune-Keep, it is the amount of time that a receiver can keep the downstream neighbor alive. Number of Joined Sources, Number of Pruned Sources Always 0. 3 Security Considerations 4 IANA Considerations 5. Acknowledgements Thanks for review comments and suggestions from Kurt Windisch and Rober Kebler. 6 References 6.1 Normative References [KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. Zhang Expires 2012-04-24 [Page 14] INTERNET DRAFT BIDIR-PIM RP-Prune-Graft 2011-10-24 [RFC5015] Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano, "Bidirectional Protocol Independent Multicast (BIDIR- PIM)", RFC 5015, October 2007. [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, "Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol Specification (Revised)", RFC 4601, August 2006. 6.2 Informative References Authors' Addresses Zhaohui (Jeffrey) Zhang Juniper Networks, Inc. 10 Technology Park Drive Westford, MA 01886 EMail: zzhang@juniper.net WeeSan Lee Juniper Networks, Inc. 1194 North Mathilda Avenue Sunnyvale, CA 94089-1206 EMail: weesan@juniper.net Zhang Expires 2012-04-24 [Page 15]