INTERNET DRAFT                                        Venu Hemige
                                                      Alcatel
Internet Engineering Task Force                       Yetik Serbest
Document:                                             SBC
draft-hemige-serbest-l2vpn-vpls-pim-snooping-         Ray Qiu
00.txt                                                Suresh Boddapati
                                                      Alcatel
November 2005
Category: Informational
Expires: May 2006

PIM Snooping over VPLS

Status of this memo

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

IPR Disclosure Acknowledgement

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Abstract

In Virtual Private LAN Service (VPLS), as in IEEE Bridged Networks, the switches simply flood multicast traffic on all ports in the LAN by default. IGMP/MLD Snooping is commonly deployed to ensure multicast traffic is not forwarded on ports without IGMP/MLD receivers. The procedures and recommendations for IGMP/MLD Snooping are defined in [IGMP-SNOOP]. But when any protocol other than IGMP or MLD is used, the common practice is to simply flood multicast traffic to all ports. PIM-SM, PIM-SSM, and PIM-BIDIR are widely deployed routing protocols. PIM Snooping procedures are important to restrict multicast traffic to only the routers interested in receiving such traffic.
While most of the PIM Snooping procedures defined here also apply to IEEE Bridged Networks, VPLS demands certain special procedures due to the split-horizon rules that require the Provider Edge (PE) devices to co-operate. This document describes the procedures and recommendations for PIM-Snooping in VPLS to facilitate replication to only those ports behind which there are interested PIM routers and/or IGMP/MLD hosts.

Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Table of Contents

1. Contributing Authors
2. Introduction
2.1. Definitions
3. Overview of VPLS
4. Multicast Traffic over VPLS
5. Constraining of IP Multicast in a VPLS
5.1. General Rules for PIM Snooping in VPLS
5.2. PIM Snooping for VPLS
5.2.1. PIM Snooping State Summarization Macros
5.2.2. Snooping PIM Packets
5.2.2.1. Join Suppression Issues
5.2.2.2. Forwarding PIM Packets
5.2.3. Discovering PIM Routers
5.2.4. PIM-DM
5.2.4.1. Building PIM-DM Snooping States
5.2.4.1.1 PIM-DM Downstream Per-Port PIM(S,G,N) State Machine
5.2.4.2. Triggering ASSERT election in PIM-DM
5.2.5. PIM-SM and PIM-SSM
5.2.5.1.
Building PIM-SM Snooping States
5.2.5.1.1 PIM-SM Downstream per-port PIM-SM (S,G,N)/(*,G,N) State Machine
5.2.5.1.2 PIM-SM Downstream per-port PIM-SM (S,G,Rpt,N) State Machine
5.2.5.2. Triggering ASSERT Election in PIM-SM
5.2.6. Bidirectional-PIM (BIDIR-PIM)
5.2.6.1. Building BIDIR-PIM Snooping States
5.2.6.1.1 BIDIR-PIM Downstream per-port BIDIR-PIM (*,G,N) State Machine
5.2.7. Multicast Source Directly Connected to the VPLS Instance
5.2.7.1. PIM Join Suppression Issues
5.3. Data Forwarding Rules
6. IANA Considerations
7. Security Considerations
8. Normative References
9. Informative References

1. Contributing Authors

This document was the combined effort of several individuals. The following are the authors, in alphabetical order, who contributed to this document:

Suresh Boddapati
Venu Hemige
Sunil Khandekar
Vach Kompella
Marc Lasserre
Rob Nath
Ray Qiu
Yetik Serbest
Himanshu Shah

2. Introduction

In Virtual Private LAN Service (VPLS), the Provider Edge (PE) devices provide a logical interconnect such that Customer Edge (CE) devices belonging to a specific VPLS instance appear to be connected by a single LAN. The forwarding information base for a particular VPLS instance is populated dynamically by source MAC address learning. This is a straightforward solution to support unicast traffic, with reasonable flooding for unknown unicast traffic.
Since a VPLS provides LAN emulation for IEEE bridges as well as for routers, the unicast and multicast traffic need to follow the same path for layer-2 protocols to work properly. As such, multicast traffic is treated as broadcast traffic and is flooded to every site in the VPLS instance.

VPLS solutions (i.e., [VPLS-LDP] and [VPLS-BGP]) perform replication for multicast traffic at the ingress PE devices. When replicated at the ingress PE, multicast traffic wastes bandwidth when:

1. Multicast traffic is sent to sites with no members,
2. Pseudo wires to different sites go through a shared path, and
3. Multicast traffic is forwarded along a shortest path tree as opposed to the minimum cost spanning tree.

This document addresses the first problem by means of IGMP/MLD and PIM snooping. Problems #2 and #3 are orthogonal to #1 and outside the scope of this document. The different methods to get traffic from a PE to other PEs and the pros and cons of each method are discussed in [RAHUL-VPLS-MCAST].

Using VPLS in conjunction with IGMP/MLD and/or PIM snooping has the following advantages:

- It improves VPLS to support IP multicast efficiently (not necessarily optimally, as there can still be bandwidth waste if traffic from a PE to other PE(s) is not forwarded along a minimum cost spanning tree),
- It prevents sending multicast traffic to sites with no members.

Procedures for IGMP/MLD Snooping are specified in [IGMP-SNOOP]. This document describes the procedures for Protocol Independent Multicast (PIM) snooping over VPLS for efficient distribution of IP multicast traffic. It also describes the rules when both IGMP/MLD and PIM are active in a VPLS instance.

2.1. Definitions

The following definitions and abbreviations are used throughout this document:

- A port is defined as either an attachment circuit (AC) or a Pseudo-Wire (PW).
- When we say a PIM packet is received on a port, it means the packet is snooped ingressing on that port.
- A router port associated with router R (Rport(R)) is defined as the port used to reach the router R.

Abbreviations used in the document:

- S: IP Address of the Multicast Source.
- G: IP Address of the Multicast Group.
- N: Upstream Neighbor field in a Join/Prune/Graft message.
- Rport(X): Port on which neighbor X is learnt
- UP: Rport(N)

3. Overview of VPLS

In case of VPLS, the PE devices provide a logical interconnect such that CE devices belonging to a specific VPLS appear to be connected by a single LAN. End-to-end VPLS consists of a bridge module and a LAN emulation module ([L2VPN-FR]).

In a VPLS, a customer site receives Layer-2 service from the SP. The PE is attached via an access connection to one or more CEs. The PE performs forwarding of user data packets based on information in the Layer-2 header, that is, the MAC destination address. The CE sees a bridge.

The details of the VPLS reference model, which we summarize here, can be found in [L2VPN-FR]. In VPLS, the PE can be viewed as containing a Virtual Switching Instance (VSI) for each L2VPN that it serves. A CE device attaches, possibly through an access network, to a bridge module of a PE. Within the PE, the bridge module attaches, through an Emulated LAN Port, to an Emulated LAN. For each VPLS, there is an Emulated LAN instance. The Emulated LAN consists of VPLS Forwarder modules (one per PE per VPLS service instance) connected by pseudo wires (PW), where the PWs may be traveling through Packet Switched Network (PSN) tunnels over a routed backbone. A VSI is a logical entity that contains a VPLS forwarder module and the part of the bridge module relevant to the VPLS service instance [L2VPN-FR]. Hence, the VSI terminates PWs for interconnection with other VSIs and also terminates attachment circuits (ACs) for accommodating CEs.
A VSI includes the forwarding information base for an L2VPN [L2VPN-FR], which is the set of information regarding how to forward Layer-2 frames received over the AC from the CE to VSIs in other PEs supporting the same L2VPN service (and/or to other ACs), and contains information regarding how to forward Layer-2 frames received from PWs to ACs. Forwarding information bases can be populated dynamically (such as by source MAC address learning) or statically (e.g., by configuration). Each PE device is responsible for proper forwarding of the customer traffic to the appropriate destination(s) based on the forwarding information base of the corresponding VSI.

4. Multicast Traffic over VPLS

In VPLS, if a PE receives a frame from an Attachment Circuit (AC) with no matching entry in the forwarding information base for that particular VPLS instance, it floods the frame to all other PEs (which are part of this VPLS instance) and to directly connected ACs (other than the one that the frame is received from). The flooding of a frame occurs when:

- The destination MAC address has not been learned,
- The destination MAC address is a broadcast address,
- The destination MAC address is a multicast address.

Malicious attacks (e.g., receiving unknown frames constantly) aside, the first situation is handled by VPLS solutions as long as the destination MAC address can be learned. From that point on, the frames will not be flooded. A PE is REQUIRED to have safeguards, such as unknown unicast limiting and MAC table limiting, against malicious unknown unicast attacks.

There is no way around flooding broadcast frames. To prevent runaway broadcast traffic from adversely affecting the VPLS service and the SP network, a PE is REQUIRED to have tools to rate limit the broadcast traffic as well.

Similar to broadcast frames, multicast frames are flooded as well, as a PE cannot know where multicast members reside.
Rate limiting multicast traffic, while possible, should be done carefully since several network control protocols rely on multicast. For one thing, layer-2 and layer-3 protocols utilize multicast for their operation. For instance, Bridge Protocol Data Units (BPDUs) use an IEEE assigned all-bridges multicast MAC address, and OSPF packets are multicast to the all-OSPF-routers multicast MAC address. If the rate-limiting of multicast traffic is not done properly, the customer network will experience instability and poor performance. For another, it is not straightforward to determine the right rate limiting parameters for multicast.

A VPLS solution MUST NOT affect the operation of customer layer-2 protocols (e.g., BPDUs). Additionally, a VPLS solution MUST NOT affect the operation of layer-3 protocols.

In the following section, we describe procedures to constrain the flooding of IP multicast traffic in a VPLS.

5. Constraining of IP Multicast in a VPLS

The effort to improve the efficiency of VPLS for multicast traffic is subject to the following constraints:

- The service is VPLS, i.e., a layer-2 VPN,
- In VPLS, ingress replication is required,
- There is no layer-3 adjacency (e.g., PIM) between a CE and a PE.

Under these circumstances, the most obvious approach is implementation of IGMP/MLD and PIM snooping in VPLS. Other multicast routing protocols such as DVMRP and MOSPF are outside the scope of this document.

Another approach to constrain multicast traffic in a VPLS is to utilize point-multipoint LSPs (e.g., [PMP-RSVP-TE]). In such a case, one has to establish a point-multipoint LSP from a source PE (i.e., the PE to which the source router is connected) to all other PEs participating in the VPLS instance. In this case, if nothing is done, all PEs will receive multicast traffic even if they do not have any members hanging off of them.
One can apply IGMP/MLD and PIM snooping, but this time IGMP/PIM snooping should be done in P routers as well. One can propose a dynamic way of establishing point-multipoint LSPs, for instance by mapping IGMP/PIM messages to RSVP-TE signaling. One should consider the effect of such an approach on the signaling load and on the delay between the time the join request is received and the time the traffic is received (this is important for IPTV applications, for instance). This approach is outside the scope of this document.

In some tightly controlled cases, such as a ring topology of PE routers with no P routers or a tree topology, the efficiency of the replication of IP multicast can be improved. For instance, spoke PWs of a hierarchical VPLS can be daisy-chained together and some replication rules can be devised. These cases are not expected to be common and will not be considered in this document.

In the following sub-sections, we provide some guidelines for the implementation of PIM snooping in VPLS.

Snooping techniques need to be employed on ACs at the downstream PEs. Snooping techniques can also be employed on PWs at the upstream PEs. This may work well for small to medium scale deployments. However, if there are a large number of VPLS instances with a large number of PEs per instance, then the amount of snooping required at the upstream PEs can overwhelm the upstream PEs. In [VPLS-MCAST-LDP] and [VPLS-MCAST-BGP], procedures are defined to exchange multicast membership information between the PEs using LDP or BGP. Using a reliable mechanism like LDP or BGP allows the upstream PEs to eliminate the requirement to snoop on PWs. It also eliminates the need to refresh multicast states on the upstream PEs.

5.1. General Rules for PIM Snooping in VPLS

The following rules for the correct operation of IGMP/PIM snooping MUST be followed.
Rule 1: IGMP and PIM messages forwarded by PEs MUST follow the split-horizon rule for mesh PWs as defined in [VPLS-LDP].

Rule 2: IGMP/PIM snooping states in a PE MUST be per VPLS instance.

Rule 3: Multicast traffic MUST be replicated on a per PW and per AC basis, i.e., even if there is more than one PIM neighbor behind a PW/AC, only one replication MUST be sent to that PW/AC.

5.2. PIM Snooping for VPLS

IGMP/MLD snooping procedures described in [IGMP-SNOOP] provide efficient delivery of IP multicast traffic in a given VPLS service when end stations are connected to the VPLS. However, when VPLS is offered as a WAN service, it is likely that the CE devices are routers and would run PIM between them. To provide efficient IP multicasting in such cases, it is necessary that the PE routers offering the VPLS service do PIM snooping.

PIM is a multicast routing protocol, which runs exclusively between routers. PIM shares many of the common characteristics of a routing protocol, such as discovery messages (e.g., neighbor discovery using Hello messages), topology information (e.g., multicast tree), and error detection and notification (e.g., dead timer and designated router election). On the other hand, PIM does not participate in any kind of exchange of databases, as it uses the unicast routing table to provide reverse path information for building multicast trees. There are a few variants of PIM. In PIM-DM ([PIM-DM]), multicast data is pushed towards the members, similar to a broadcast mechanism. PIM-DM constructs a separate delivery tree for each multicast group. As opposed to PIM-DM, other PIM flavors (PIM-SM [PIM-SM], PIM-SSM [PIM-SSM], and BIDIR-PIM [BIDIR-PIM]) use a pull methodology instead of a push technique.

PIM routers periodically exchange Hello messages to discover and maintain stateful sessions with neighbors. After neighbors are discovered, PIM routers can signal their intentions to join/prune specific multicast groups.
This is accomplished by having downstream routers send an explicit join message (for the sake of generalization, consider Graft messages for PIM-DM as join messages) to the upstream routers. The join/prune message can be group specific (*,G) or group and source specific (S,G).

In PIM snooping, a PE snoops on the PIM message exchange between routers, and builds its multicast states. Based on the multicast states, it forwards IP multicast traffic accordingly to avoid unnecessary flooding.

5.2.1. PIM Snooping State Summarization Macros

The following sets are defined to help build the forwarding state on a PE. Some sets may apply only to a subset of the PIM Protocol variants as noted along with the definition of the sets.

All_Pim_Neighbors = Set of all PIM neighbors in a VPLS instance.

PIM_DR = The PIM Neighbor which is the elected PIM designated router in the VPLS instance.

pim_joins(*,G,N) = { All ports P such that DownstreamJPState(*,G,N,P) is either in Join or Prune Pending state. }
This set applies to PIM-SM and BIDIR-PIM.

pim_joins(*,G) = This set is the union of all pim_joins(*,G,N) for each N in All_Pim_Neighbors.
This set applies only to PIM-SM and BIDIR-PIM.

pim_joins(S,G,N) = { All ports P such that DownstreamJPState(S,G,N,P) is either in Join or Prune Pending state. }
This set applies only to PIM-SM and PIM-SSM.

pim_joins(S,G) = This set is the union of all pim_joins(S,G,N) for each N in All_Pim_Neighbors.
This set applies only to PIM-SM and PIM-SSM.

pim_iifs(*,G) = { All Rport(N) for each N in All_Pim_Neighbors such that pim_joins(*,G,N) is not empty. }

pim_iifs(S,G) = { All Rport(N) for each N in All_Pim_Neighbors such that pim_joins(S,G,N) is not empty. }

pim_prunes(S,G,N) = { All ports P such that DownstreamPState(S,G,N,P) is in Pruned state. }
This set applies only to PIM-DM.
pim_prunes(S,G) = This is the union of all pim_prunes(S,G,N) for each N in All_Pim_Neighbors.
This set applies only to PIM-DM.

pim_prunes(S,G,rpt,N) = { All ports P such that DownstreamJPState(S,G,rpt,N,P) is in Prune or PruneTmp state. }
This set applies only to PIM-SM.

pim_prunes(S,G,rpt) = This set is the union of all pim_prunes(S,G,rpt,N) for each N in All_Pim_Neighbors.
This set applies only to PIM-SM.

For PIM-DM,

pim_oiflist(S,G) = All_Pim_Neighbors (-) pim_prunes(S,G) (+) pim_iifs(S,G)

For PIM-SM and PIM-SSM,

pim_inherited_oiflist(S,G,rpt) = pim_joins(*,G) (-) pim_prunes(S,G,rpt) (+) pim_iifs(*,G)

pim_oiflist(*,G) = pim_joins(*,G) (+) pim_iifs(*,G)

pim_oiflist(S,G) = pim_inherited_oiflist(S,G,rpt) (+) pim_joins(S,G) (+) pim_iifs(S,G)

For PIM-SSM,

pim_oiflist(S,G) = pim_joins(S,G) (+) pim_iifs(S,G)

For PIM-BIDIR,

pim_oiflist(*,G) = DF(RP(G)) (+) pim_joins(*,G)

Where DF(RP(G)) is the AC/PW towards the router that is the designated forwarder for RP(G).

In the above, one should note that pim_iifs is included in pim_oiflist. This is necessary for handling the duplicate traffic issue (i.e., triggering Assert) explained in Section 5.2.5.2. It should also be noted that multicast traffic received from a port in pim_iifs will not be sent back to that port just because it is also in pim_oiflist, because the VPLS split-horizon rule will prevent that.

Note that pim_oiflist(S,G)/pim_oiflist(*,G) are not the complete list of outgoing ports (oiflist). IGMP/MLD also contribute to this list. In addition to the above state summarization macros, we define the following IGMP/MLD state summarization macros, which are important when considering the forwarding behavior when both IGMP/MLD and PIM snooping are active in a VPLS instance.

local_include(*,G) = { All ports P such that IGMP/MLD module or other local membership mechanism has determined that local members on port P desire to receive traffic sent to group G.
}

local_include(S,G) = { All ports P such that IGMP/MLD module or other local membership mechanism has determined that local members on port P desire to receive traffic sent from source S to group G. }

local_exclude(S,G) = { All ports P such that IGMP/MLD module or other local membership mechanism has determined that local members on port P desire to NOT receive traffic sent from source S to group G. }

local_iif(*,G) = { if local_include(*,G) is non-empty Rport(PIM_DR) }
This set contains the port towards the PIM_DR in the VPLS instance if local_include(*,G) is non-empty.

local_iif(S,G) = { if (local_include(*,G) (-) local_exclude(S,G) (+) local_include(S,G)) is non-empty { Rport(PIM_DR) } }

In the following sub-sections, snooping mechanisms for each variety of PIM are specified.

5.2.2. Snooping PIM Packets

A PIM-Snooping PE snoops on the following PIM packets to build its multicast states.

- PIM Hellos
- PIM Join/Prunes
- PIM Graft
- PIM State Refresh

Note that the PIM packets are not modified by the PE, but are only snooped for the purposes of building multicast states.

5.2.2.1. Join Suppression Issues

Since PIM Snooping switches build state by snooping on PIM Join/Prune packets, one rule is that CE PIM routers MUST disable join-suppression in the VPLS instance. If join-suppression is not disabled, then traffic may never flow to some receivers when PIM snooping is enabled in the VPLS instance.

5.2.2.2. Forwarding PIM Packets

PIM Packets that are snooped MUST also be forwarded as is in the VPLS instance. As noted in the previous section, it is assumed that the CE routers do not suppress their PIM Joins when they see Joins from other routers.

5.2.3. Discovering PIM Routers

A PIM Snooping PE MUST snoop on PIM Hellos received on ACs and PWs. PIM Hellos are used by the snooping switch to discover PIM routers and their characteristics.
The PE includes an entry in the PIM Neighbor Database containing the following fields from the PIM Hello it snoops on:

- MAC Address of the Router sending the PIM Hello.
- IP Address and address family of the Router sending the PIM Hello.
- Port (AC / PW) on which the PIM Hello was received.
- Hello-Hold-Time.
- Bi-Dir Capable.
- Tracking Support.
- DR Priority

Please refer to [PIM-SM] for the meaning of the various fields mentioned here.

The corresponding to the Router's IP Address is used to determine where to forward subsequent PIM Joins destined to that IP Address.

When a PIM Hello is received, the PE MUST reset the neighbor-expiry-timer to Hello-Hold-Time. If a PE does not receive a Hello message from a router within Hello-Hold-Time, the PE MUST remove that router from the PIM snooping state. If a PE receives a Hello message from a router with the Hello-Hold-Time value set to zero, the PE MUST remove that router from the PIM snooping state immediately.

From the PIM Neighbor Database, a PE MUST be able to use the procedures defined in [PIM-SM] to determine the Designated Router in the VPLS instance. If Tracking Support is active in the VPLS instance.

5.2.4. PIM-DM

The key characteristic of PIM-DM is flood-and-prune behavior. Shortest path trees are built as a multicast source starts transmitting.

The procedures to discover PIM-DM routers are as explained in section 5.2.3.

5.2.4.1. Building PIM-DM Snooping States

PIM-DM Snooping states are built by snooping on the PIM-DM Join, Prune, Graft and State Refresh messages received on AC/PWs and State-Refresh Messages sent on AC/PWs.
By snooping on these PIM-DM messages, a PE builds the following states per (S,G,N), where S is the address of the multicast source, G is the Group address and N is the upstream neighbor to which Prunes/Grafts are sent by downstream CEs:

Per PIM (S,G,N):
Per Port PIM (S,G,N) Prune State:
- DownstreamPState(S,G,N,P): One of {"NoInfo" (NI), "Pruned" (P), "PrunePending" (PP)}
- Prune Pending Timer (PPT)
- Prune Timer (PT)
- Upstream Port (UP) (valid if the PIM(S,G,N) Prune State is "Pruned").

From the above states, we can derive the macros pim_prunes(S,G,N), pim_prunes(S,G) and pim_iifs(S,G) that are defined in section 5.2.1.

5.2.4.1.1 PIM-DM Downstream Per-Port PIM(S,G,N) State Machine

The downstream per-port PIM(S,G,N) state machine is as defined in section 4.4.2 of [PIM-DM] with a few changes relevant to PIM Snooping. When reading section 4.4.2 of [PIM-DM] for the purposes of PIM-Snooping, please be aware that the downstream states are built per {S, G, N, Downstream-Port} in PIM-Snooping and not per {Downstream-Interface, S, G} as in a PIM-DM router. As noted in section 5.2.4.1, the states (DownstreamPState) and timers (PPT and PT) are per (S,G,N,P).

5.2.4.2. Triggering ASSERT election in PIM-DM

Since PIM-DM is a flood-and-prune protocol, traffic is flooded to all routers unless explicitly pruned. Since PIM-DM routers do not prune on non-RPF interfaces, PEs should typically not receive Prunes on Rport(RPF-neighbor). So the asserting routers should typically be in pim_oiflist(S,G). In most cases, assert election should occur naturally without any special handling since data traffic will be forwarded to the asserting routers.

However, there are some scenarios where a prune might be received on a port which is also an upstream port (UP).
If we prune the port from pim_oiflist(S,G), then it would not be possible for the asserting routers to determine if traffic arrived on their downstream port. This can be fixed by adding pim_iifs(S,G) to pim_oiflist(S,G) so that data traffic flows to the UP ports.

5.2.5. PIM-SM and PIM-SSM

The key characteristic of PIM-SM and PIM-SSM is explicit join behavior. In this model, the multicast traffic is only sent to locations that specifically request it. The root node of a tree is the Rendezvous Point (RP) in the case of a shared tree (PIM-SM only) or the first hop router that is directly connected to the multicast source in the case of a shortest path tree. All the procedures described in this section apply to both PIM-SM and PIM-SSM, except for the fact that there is no (*,G) state in PIM-SSM.

We assume that the PEs have the capability to store (S,G) states for PIM-SM snooping and forward/replicate traffic accordingly. This is not mandatory. An implementation can fall back to (*,G) states if its hardware cannot support (S,G) states. In that case, multicast forwarding will be less efficient.

The procedures to discover PIM-SM routers in a VPLS instance are as described in section 5.2.3.

5.2.5.1. Building PIM-SM Snooping States

PIM-SM and PIM-SSM Snooping states are built by snooping on the PIM-SM Join/Prune messages received on AC/PWs. PIM-SM Join/Prune Messages received by a PE on a port MUST be flooded within the VPLS instance. The snooping procedures described here assume that the CE routers support Explicit Tracking and therefore Join Suppression is disabled. If Join Suppression is not disabled, snooping states may not be built correctly.

The downstream state is built per (S,G,N)/(*,G,N), where S is the address of the multicast source, * represents the shared tree rooted at the RP, G is the Group address and N is the upstream neighbor to which Join/Prune Messages are sent by downstream CEs.
For PIM-SSM, (*,G,N) states are not applicable. The downstream state consists of:

Per PIM (S,G,N)/(*,G,N):
Per Port PIM (S,G,N) Join/Prune State:
- DownstreamJPState: One of { "NoInfo" (NI), "Join" (J), "Prune Pending" (PP) }
- Prune Pending Timer (PPT)
- Join Prune Expiry Timer (ET)
- Upstream Port (UP) (valid if the PIM(S,G,N) Join Prune State is "Join" or "Prune Pending").

From the above states, we can derive the macros pim_joins(S,G,N)/(*,G,N), and pim_iifs(S,G) that are defined in section 5.2.1.

A PE MAY ignore a Join/Prune message for an (S,G) not addressed to its own AC ONLY if it has no ACs in OifList(S,G). Note that no ACs in OifList(S,G) means the PE is neither a source nor a sink for the traffic. It cannot be a source since pim_iifs(S,G) is also part of OifList(S,G). If it is a source, then it needs to know about other sources in order to trigger assert. If it is a sink, then it needs to ensure that there is only one source, so it has to keep track of all sources to trigger assert. If it is neither, it has no need to create state.

5.2.5.1.1 PIM-SM Downstream per-port PIM-SM (S,G,N)/(*,G,N) State Machine

To correctly build PIM-SM snooping states, a PE will have to snoop on Join/Prune messages. The per-port state machine for receiving (*,G)/(S,G) Join/Prune messages is as described in Sections 4.5.2 and 4.5.3 of [PIM-SM], with the exception that the downstream state is per port per upstream neighbor for a given (S,G)/(*,G) as opposed to per interface.

5.2.5.1.2 PIM-SM Downstream per-port PIM-SM (S,G,Rpt,N) State Machine

The per-port state machine for receiving (S,G,Rpt) Join/Prune messages is as described in Section 4.5.4 of [PIM-SM], with the exception that the downstream state is per port per upstream neighbor as opposed to per interface.

5.2.5.2. Triggering ASSERT Election in PIM-SM

In PIM-SM, there are scenarios where multiple routers could be forwarding the same multicast traffic on a LAN.
When this happens, the routers use the PIM Assert Election process, exchanging PIM Assert Messages, to ensure that only the Assert Winner forwards traffic on the LAN. In a typical LAN, the Assert Election is a data driven event and happens only if a router sees traffic on the interface to which it should be forwarding the traffic. Therefore, in the case of VPLS, in order to trigger Assert Election and stop duplicate traffic, it is necessary that two routers that are forwarding duplicate traffic for an (S,G)/(*,G) see each other's traffic.

The set pim_iifs keeps track of all the possible Rports from which traffic could arrive for a given state. Note that VPLS does not have the layer 3 routing information available to the routers in order to determine if the upstream neighbor information in the Join/Prune Message is correct or not. Therefore, it has to keep track of all the upstream routers to which Joins have been sent for a given state. The set pim_iifs is constructed for a given (S,G)/(*,G) as follows:

1) When a Join is received targeted to an Upstream Neighbor N, Rport(N) is added to the pim_iifs set, if it is not already in the set.

2) When a Prune is received targeted to an Upstream Neighbor N, Rport(N) is removed from the pim_iifs set if there is no other upstream neighbor on this port to which a Join for the state was sent.

The pim_iifs set is also a part of the macro pim_oiflist (Section 5.2.1). This ensures that data is forwarded on all Rports where upstream neighbors are present, which in turn allows the routers on those ports to detect duplicate traffic, trigger Assert Procedures and stop the duplicate traffic. Note that the VPLS forwarding rules still apply, i.e., a packet received on a PW MUST NOT be forwarded back on another PW even if the PW is in the pim_oiflist.

Triggering Assert in certain scenarios is important.
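The two rules given above for constructing pim_iifs can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of the specified procedures: the class name IifTracker, its method names, and the neighbor and port names in the usage lines are hypothetical. The sketch also ignores the Prune Pending and Expiry timers of the downstream state machine, so a Prune takes effect immediately.

```python
from collections import defaultdict


class IifTracker:
    """pim_iifs for one (S,G) or (*,G) state on one PE (hypothetical helper)."""

    def __init__(self, rport):
        # rport: PIM neighbor -> port (AC or PW) on which it was learnt.
        self.rport = dict(rport)
        # Upstream neighbor N -> downstream ports with a Join towards N.
        self.joined_ports = defaultdict(set)

    def on_join(self, in_port, upstream):
        # Rule 1: a Join targeted to N makes Rport(N) a member of pim_iifs.
        self.joined_ports[upstream].add(in_port)

    def on_prune(self, in_port, upstream):
        # Rule 2: Rport(N) drops out only when no Join remains towards any
        # upstream neighbor behind that port (timers are ignored here).
        self.joined_ports[upstream].discard(in_port)

    def pim_iifs(self):
        # Rport(N) for every N whose set of joined ports is non-empty.
        return {self.rport[n] for n, ports in self.joined_ports.items() if ports}


# Two upstream neighbors (R1, R2) sit behind the same port PW12.
tracker = IifTracker({"R1": "PW12", "R2": "PW12", "R3": "AC3"})
tracker.on_join("AC1", "R1")
tracker.on_join("AC2", "R2")
tracker.on_prune("AC1", "R1")
after_first_prune = tracker.pim_iifs()   # PW12 stays: R2 still has a Join
tracker.on_prune("AC2", "R2")
after_second_prune = tracker.pim_iifs()  # now empty
```

Deriving the set from per-neighbor Join state, rather than adding and removing ports directly, keeps it consistent with the pim_iifs definition in Section 5.2.1.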
There can be some scenarios where CE routers can receive duplicate multicast traffic. Let us consider the scenario in Figure 1.

                                  +------+ AC3 +------+
                                  | PE2  |-----| CE3  |
                                 /|      |     |      |
                                / +------+     +------+
                               /      |            |
                              /       |            |
                             /PW12    |            |
                            /         |         +-----+
                           /          |PW23     |  S  |
                          /           |         +-----+
                         /            |            |
                        /             |            |
                       /              |            |
+------+     +------+ /           +------+     +------+
| CE1  |     | PE1  |/    PW13    | PE3  |     | CE4  |
|      |-----|      |-------------|      |-----|      |
+------+ AC1 +------+             +------+ AC4 +------+
                |
                |AC2
             +------+
             | CE2  |
             |      |
             +------+

Figure 1 An Example Scenario for Triggering Assert

In the scenario depicted in Figure 1, both CE1 and CE2 have two ECMP routes to reach the source "S". Hence, CE1 may pick CE3 as its next hop ("Upstream Neighbor"), and CE2 may pick CE4 as its next hop. As a result, both CE1 and CE2 will receive duplicate traffic for a moment; then the Assert procedures will kick in and the duplicate traffic will be resolved.

Here is the sequence of events:

1. CE1 sends an (S,G) Join with N=CE3.

2. pim_iifs(S,G)={PW12} on PE1. PE1 floods the Join and, if using LDP (as explained in [VPLS-MCAST-LDP]), sends the Join via LDP on PW12. pim_oiflist(S,G)={AC1, PW12} on PE1.

3. pim_iifs(S,G)={AC3} and pim_oiflist(S,G)={PW12, AC3} on PE2.

The above is all that needs to occur in most cases, where there is no assert.

4. CE2 sends an (S,G) Join with N=CE4.

5. pim_iifs(S,G)={PW12, PW13} on PE1. PE1 floods the Join and, if using LDP, sends a Join via LDP on {PW12, PW13}. pim_oiflist(S,G)={AC1, AC2, PW12, PW13} on PE1.

6. PE2 receives the Join. pim_iifs(S,G)={AC3, PW23} on PE2. pim_oiflist(S,G)={PW12, AC3, PW23} on PE2.

7. PE3 receives the Join too. pim_iifs(S,G)={AC4} on PE3. pim_oiflist(S,G)={PW13, AC4} on PE3.
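The Join processing in steps 1 through 7 can be replayed with a small model. The following is only an illustrative sketch, not part of the procedures: the class SnoopingPE, the helper is_ac(), and the function replay_joins() are hypothetical names, pim_oiflist is approximated as the ports on which Joins were received plus pim_iifs (the normative definition is in Section 5.2.1), and the MAY-ignore rule for Joins not addressed to a local AC is applied only to Joins arriving on a PW, which is one interpretation of that rule.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


def is_ac(port: str) -> bool:
    # Assumption: ports follow the naming of Figure 1 ("ACn" vs. "PWnm").
    return port.startswith("AC")


@dataclass
class SnoopingPE:
    """Hypothetical per-(S,G) snooping state held by one PE."""
    name: str
    rport: Dict[str, str]  # upstream neighbor N -> local port towards N
    joins: Dict[str, Set[str]] = field(default_factory=dict)  # port -> Ns joined

    def pim_iifs(self) -> Set[str]:
        # Rport(N) for every upstream neighbor N that some Join targets.
        return {self.rport[n] for nbrs in self.joins.values() for n in nbrs}

    def pim_oiflist(self) -> Set[str]:
        # Approximation: ports with Join state, plus pim_iifs.
        return set(self.joins) | self.pim_iifs()

    def on_join(self, in_port: str, neighbor: str) -> None:
        has_ac = any(is_ac(p) for p in self.pim_oiflist())
        # A PE MAY ignore a Join not addressed to its own AC only if it
        # has no ACs in its oiflist (applied here to Joins from a PW).
        if not is_ac(in_port) and not is_ac(self.rport[neighbor]) and not has_ac:
            return
        self.joins.setdefault(in_port, set()).add(neighbor)


def replay_joins() -> Dict[str, Set[str]]:
    pe1 = SnoopingPE("PE1", {"CE3": "PW12", "CE4": "PW13"})
    pe2 = SnoopingPE("PE2", {"CE3": "AC3", "CE4": "PW23"})
    pe3 = SnoopingPE("PE3", {"CE3": "PW23", "CE4": "AC4"})
    # CE1 joins via AC1 with N=CE3, then CE2 joins via AC2 with N=CE4.
    for ac, neighbor in [("AC1", "CE3"), ("AC2", "CE4")]:
        pe1.on_join(ac, neighbor)
        pe2.on_join("PW12", neighbor)  # flooded copy arriving from PE1
        pe3.on_join("PW13", neighbor)  # flooded copy arriving from PE1
    return {pe.name: pe.pim_oiflist() for pe in (pe1, pe2, pe3)}


if __name__ == "__main__":
    for name, oifs in replay_joins().items():
        print(name, sorted(oifs))
```

Under these assumptions, PE3 ignores the first Join (N=CE3 is not behind one of its ACs and it has no ACs in its oiflist), while PE2 processes the second Join because AC3 is already in its oiflist.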
So even before multicast traffic starts flowing, the pim_oiflist(S,G) sets on the PEs (i.e., the forwarding state) are:

PE1: {AC1, AC2, PW12, PW13}
PE2: {PW12, AC3, PW23}
PE3: {PW13, AC4}

Because this forwarding state is built when the Joins are processed, no additional action is needed when data traffic is received. PE1 does not need to detect duplicate traffic. Traffic from PE2 will automatically flow to PE3.

When the assert election is complete, if CE3 becomes the assert winner, then:

8. CE2 sends an (S,G) Prune with N=CE4 and an (S,G) Join with N=CE3.

9. The Join/Prune message (both Prune and Join) is flooded (or sent to pim_iifs(S,G)={PW12, PW13} via LDP). As a result of the Prune, pim_iifs(S,G)={PW12} and pim_oiflist(S,G)={AC1, AC2, PW12} on PE1.

10. PE2 receives the Prune directed to CE4. As a result, pim_iifs(S,G)={AC3} and pim_oiflist(S,G)={PW12, AC3}.

11. PE3 receives the Prune too. As a result, pim_iifs(S,G)={} and pim_oiflist(S,G)={}. So, PE3 purges state for that (S,G).

After assert election, the forwarding state should be:

PE1: {AC1, AC2, PW12}
PE2: {PW12, AC3}
PE3: {}

5.2.6. Bidirectional-PIM (BIDIR-PIM)

BIDIR-PIM is a variation of PIM-SM. The main differences between PIM-SM and Bidirectional-PIM are as follows:

- There are no source-based trees, and source-specific multicast is not supported (i.e., no (S,G) states) in BIDIR-PIM.

- Multicast traffic can flow up the shared tree in BIDIR-PIM.

- To avoid forwarding loops, one router on each link is elected as the Designated Forwarder (DF) for each RP in BIDIR-PIM.

The main advantage of BIDIR-PIM is that it scales well for many-to-many applications. However, the lack of source-based trees means that multicast traffic is forced to remain on the shared tree.

The procedures to discover PIM-SM routers in a VPLS instance are as described in Section 5.2.3.

For BIDIR-PIM to work properly, all routers within the domain must know the address of the RP.
There are three methods to discover the RPs:

1. Static configuration,
2. Snooping Auto-RP messages, and
3. Snooping PIMv2 Bootstrap messages.

Auto-RP and Bootstrap messages are multicast and will be flooded in the VPLS instance.

During RP discovery time, PIM routers elect a DF per subnet for each RP. The algorithm to elect the DF is as follows: all PIM neighbors in a subnet advertise their unicast route to the RP, and the router with the best route is elected.

All PEs MUST snoop the DF election messages and determine the DF for each (*,G). The port towards the DF (DF(RP)) MUST be added to the oiflist of every (*,G) whose RP(G) is RP. The DF election state machine is described in Section 3.5 of [BIDIR-PIM].

5.2.6.1. Building BIDIR-PIM Snooping States

BIDIR-PIM snooping for Join and Prune messages is similar to PIM-SM, and the following (some of which repeats the PIM-SM section) applies.

BIDIR-PIM Snooping states are built by snooping on the BIDIR-PIM Join/Prune messages received on ACs/PWs. BIDIR-PIM Join/Prune messages received by a PE on a port MUST be flooded within the VPLS instance.

The snooping procedures described here assume that the CE routers support Explicit Tracking and therefore Join Suppression is disabled. If Join Suppression is not disabled, snooping states may not be built correctly.

The downstream state is built per (*,G,N), where * represents the shared tree, G is the Group address, and N is the upstream neighbor to which Join/Prune messages are sent by downstream CEs.

The downstream state consists of:

Per PIM (*,G,N):
   Per Port PIM (*,G,N) Join/Prune State:
      - DownstreamJPState: One of { "NoInfo" (NI), "Join" (J), "Prune Pending" (PP) }
      - Prune Pending Timer (PPT)
      - Join Prune Expiry Timer (ET)
      - Upstream Port (UP) (valid if the PIM(*,G,N) Join Prune State is "Join" or "Prune Pending").
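As an illustration only, the per-port (*,G,N) state listed above could be held in a structure like the one sketched below. The names PortState, SnoopTable, and the timer representation (seconds remaining, or None when not running) are hypothetical. The helper pim_joins() is an assumption modeled on the joins() macro of [PIM-SM], i.e., the ports whose state is "Join" or "Prune Pending"; the normative macro is the one in Section 5.2.1.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class DownstreamJPState(Enum):
    NO_INFO = "NI"
    JOIN = "J"
    PRUNE_PENDING = "PP"


@dataclass
class PortState:
    """Hypothetical per-port (*,G,N) Join/Prune state."""
    jp_state: DownstreamJPState = DownstreamJPState.NO_INFO
    prune_pending_timer: Optional[float] = None  # PPT; None when not running
    expiry_timer: Optional[float] = None         # ET; None when not running
    upstream_port: Optional[str] = None          # UP

    def upstream_port_valid(self) -> bool:
        # UP is meaningful only in the "Join" or "Prune Pending" states.
        return self.jp_state in (
            DownstreamJPState.JOIN,
            DownstreamJPState.PRUNE_PENDING,
        )


# Snooping table keyed by (group G, upstream neighbor N), then by port.
SnoopTable = Dict[Tuple[str, str], Dict[str, PortState]]


def pim_joins(table: SnoopTable, group: str, neighbor: str) -> Set[str]:
    # Assumed definition: ports in "Join" or "Prune Pending" for (*,G,N).
    ports = table.get((group, neighbor), {})
    return {p for p, st in ports.items() if st.upstream_port_valid()}
```

Keying the table by (G,N) first and by port second mirrors the "per port per upstream neighbor" granularity that distinguishes this state from the per-interface state in the PIM specifications.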
From the above states, we can derive the macros pim_joins(*,G,N) and pim_iifs(*,G) that are defined in Section 5.2.1.

5.2.6.1.1 BIDIR-PIM Downstream per-port BIDIR-PIM (*,G,N) State Machine

To correctly build BIDIR-PIM snooping states, a PE will have to snoop on Join/Prune messages. The per-port state machine for receiving (*,G) Join/Prune messages is as described in Sections 3.4.1 and 3.4.2 of [BIDIR-PIM], with the exception that the downstream state is per port per upstream neighbor for a given (*,G) as opposed to per interface.

5.2.7. Multicast Source Directly Connected to the VPLS Instance

If there is a source in the CE network that connects directly into the VPLS instance, then multicast traffic from that source MUST be sent to all PIM routers on the VPLS instance, in addition to the ports in the outgoing interface list for the corresponding snooping state. If there is already (S,G)/(*,G) snooping state formed on any PE, this will not happen per the current forwarding rules and guidelines: the (S,G)/(*,G) state may not send traffic towards all the routers. So, in order to determine if traffic needs to be flooded to all routers, a PE must be able to determine if the traffic came from a host on that LAN. There are three ways to address this problem:

- The PE would have to do ARP snooping to determine if a source is directly connected.

- Another option is to configure all PEs to indicate that there are CE sources directly connected to the VPLS instance, and to disallow snooping for the groups to which those sources are going to send traffic. This way, traffic from such a source to those groups will always be flooded within the provider network.

- A third option is to require that multicast sources in the CE network always appear behind a router.

5.2.7.1. PIM Join Suppression Issues

For VPLS Multicast to work, the C-routers MUST disable PIM Join suppression.
However, it is our understanding that existing deployments from several vendors do not support the capability to disable PIM Join suppression. If that is so, then VPLS Multicast simply does not work if we multicast the C-Joins to all C-routers. Also, the provider has no control over the configuration on a C-router (to ensure that C-Join Suppression is disabled).

If the downstream PE determines that PIM Join suppression is active in a VPLS instance, then it MUST unicast-forward the C-Joins towards the neighbor named in the RPF-neighbor field of the C-Join. This prevents the C-Join from being seen by other C-routers. Since we recommend that the PE unicast-forward the C-Join/Prune packets, it is important to ensure that the PIM control packets are received in order at the upstream C-router. To achieve this, the same ordering restrictions that apply to broadcast and unknown frames also apply to PIM control packets.

5.3. Data Forwarding Rules

The final list of outgoing ports for a given (S,G) or (*,G) is computed by combining the IGMP/MLD and PIM state summarization macros.

OifList(*,G) = local_include(*,G) (+) pim_oiflist(*,G) (+) local_iif(*,G)

OifList(S,G) = local_include(*,G) (-) local_exclude(S,G) (+)
               local_include(S,G) (+) pim_oiflist(S,G) (+) local_iif(S,G)

The following rules MUST be followed when forwarding multicast traffic in a VPLS:

- Traffic arriving on a port MUST NOT be forwarded back onto the same port.

- Due to VPLS Split-Horizon rules, traffic ingressing on a PW MUST NOT be forwarded to any other PW.

Additional Guidelines:

- If there is no matching FIB entry, then the PE MAY either discard the packet or send it to All_Pim_Neighbors or to a configured set of ports. How this is determined is outside the scope of this document.

6. IANA Considerations

This document does not require any IANA assignments or action.

7. Security Considerations

Security considerations provided in VPLS solution documents (i.e., [VPLS-LDP] and [VPLS-BGP]) apply to this document as well.

8. Normative References

9. Informative References

[VPLS-LDP] Lasserre, M., et al., "Virtual Private LAN Services over MPLS", work in progress.

[VPLS-BGP] Kompella, K., et al., "Virtual Private LAN Service", work in progress.

[L2VPN-FR] Andersson, L., et al., "L2VPN Framework", work in progress.

[PMP-RSVP-TE] Aggarwal, R., et al., "Extensions to RSVP-TE for Point to Multipoint TE LSPs", work in progress.

[RFC1112] Deering, S., "Host Extensions for IP Multicasting", RFC 1112, August 1989.

[RFC2236] Fenner, W., "Internet Group Management Protocol, Version 2", RFC 2236, November 1997.

[RFC3376] Cain, B., et al., "Internet Group Management Protocol, Version 3", RFC 3376, October 2002.

[IGMP-SNOOP] Christensen, M., et al., "Considerations for IGMP and MLD Snooping Switches", work in progress.

[PIM-DM] Deering, S., et al., "Protocol Independent Multicast Version 2 - Dense Mode Specification", RFC 3973, January 2005.

[PIM-SM] Fenner, W., et al., "Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol Specification (Revised)", draft-ietf-pim-sm-v2-new-11.txt, April 2005.

[PIM-SSM] Holbrook, H., et al., "Source-Specific Multicast for IP", work in progress.

[BIDIR-PIM] Handley, M., et al., "Bi-directional Protocol Independent Multicast (BIDIR-PIM)", work in progress.

[VPLS-MCAST-LDP] Qiu, R., Serbest, Y., et al., "Using LDP for VPLS Multicast", draft-qiu-serbest-vpls-mcast-ldp-00.txt, work in progress.

[VPLS-MCAST-BGP] Aggarwal, R., et al., "Propagation of VPLS IP Multicast Group Membership Information", draft-raggarwa-l2vpn-vpls-mcast-ctrl-00.txt, work in progress.

[VPLS-MCAST-TREES] Aggarwal, R., et al., "Multicast in VPLS", draft-raggarwa-l2vpn-vpls-mcast-01.txt, work in progress.

Authors' Addresses

Venu Hemige
Alcatel North America
701 East Middlefield Rd.
Mountain View, CA 94043
Venu.hemige@alcatel.com

Yetik Serbest
SBC Labs
9505 Arboretum Blvd.
Austin, TX 78759
Yetik_serbest@labs.sbc.com

Ray Qiu
Alcatel North America
701 East Middlefield Rd.
Mountain View, CA 94043
Ray.Qiu@alcatel.com

Suresh Boddapati
Alcatel North America
701 East Middlefield Rd.
Mountain View, CA 94043
Suresh.boddapati@alcatel.com

Rob Nath
Riverstone Networks
5200 Great America Parkway
Santa Clara, CA 95054
Rnath@riverstonenet.com

Sunil Khandekar
Alcatel North America
701 East Middlefield Rd.
Mountain View, CA 94043
Sunil.khandekar@alcatel.com

Vach Kompella
Alcatel North America
701 East Middlefield Rd.
Mountain View, CA 94043
Vach.kompella@alcatel.com

Marc Lasserre
Riverstone Networks
Marc@riverstonenet.com

Himanshu Shah
Ciena
hshah@ciena.com

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Full copyright statement

Copyright (C) The Internet Society (2005). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.