Network Working Group                                     Rahul Aggarwal
Internet Draft                                          Juniper Networks
Expiration Date: August 2005
                                                             Yuji Kamite
                                                      NTT Communications

                                                             Luyuan Fang
                                                                    AT&T


                            Multicast in VPLS
                  draft-raggarwa-l2vpn-vpls-mcast-00.txt


Status of this Memo

By submitting this Internet-Draft, we certify that any applicable
patent or IPR claims of which we are aware have been disclosed, and any
of which we become aware will be disclosed, in accordance with RFC
3668.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress.''

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This document describes a solution for overcoming the limitations of
existing VPLS multicast solutions. It describes procedures for VPLS
multicast that utilize multicast trees in the service provider (SP)
network. One such multicast tree can be shared between multiple VPLS
instances. Procedures for propagating multicast control information,
learned from local VPLS sites, to remote VPLS sites are described.
These procedures do not require IGMP/PIM snooping on the SP backbone
links.

Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

1. Contributors

   Rahul Aggarwal
   Yakov Rekhter
   Juniper Networks

   Yuji Kamite
   NTT Communications

   Luyuan Fang
   AT&T

   Chaitanya Kodeboniya
   Juniper Networks

2. Terminology

This document uses terminology described in [VPLS-BGP] and [VPLS-LDP].

3. Introduction

[VPLS-BGP] and [VPLS-LDP] describe a solution for VPLS multicast that
relies on ingress replication. This solution has limitations for
certain VPLS multicast traffic profiles. This document describes
procedures for overcoming the limitations of existing VPLS multicast
solutions. It describes procedures for VPLS multicast that utilize
multicast trees in the service provider (SP) network. One such
multicast tree can be shared between multiple VPLS instances.
Procedures for propagating multicast control information, learned from
local VPLS sites, to remote VPLS sites are described. These procedures
do not require IGMP/PIM snooping on the SP backbone links.

4. Existing Limitations of VPLS Multicast

The VPLS multicast solutions described in [VPLS-BGP] and [VPLS-LDP]
rely on ingress replication. The ingress PE replicates the multicast
packet for each egress PE and sends it to that egress PE using a
unicast tunnel. This is a reasonable model when the bandwidth of the
multicast traffic is low and/or the number of replications performed
on average on each outgoing interface for a particular customer VPLS
multicast packet is small.
If this is not the case, it is desirable to utilize multicast trees in
the SP core to transmit VPLS multicast packets. Note that unicast
packets that are flooded to each of the egress PEs, before the ingress
PE performs learning for those unicast packets, will still use ingress
replication.

By appropriate IGMP or PIM snooping it is possible for the ingress PE
to send the packet only to the egress PEs that have receivers for that
traffic, rather than to all the PEs in the VPLS instance. While
PIM/IGMP snooping avoids the situation where an IP multicast packet is
sent to PEs with no receivers, there is a cost for this optimization.
Namely, a PE has to maintain (S,G) state for all the (S,G)s of all the
VPLSs present on the PE. Furthermore, PIM snooping has to be done not
only on the CE-PE interfaces, but on Pseudo-Wire (PW) interfaces as
well, which in turn introduces a non-negligible overhead on the PE. It
is desirable to reduce this overhead when IGMP/PIM snooping is used.

5. Overview

This document describes procedures for using multicast trees in the SP
network to transport VPLS multicast data packets. RSVP-TE P2MP LSPs
described in [RSVP-P2MP] are an example of such multicast trees. The
use of multicast trees in the SP network can be beneficial when the
bandwidth of the multicast traffic is high or when it is desirable to
optimize the number of copies of a multicast packet transmitted by the
ingress. This comes at the cost of state in the SP core to build
multicast trees and the overhead to maintain this state. This document
places no restrictions on the protocols used to build SP multicast
trees.

Multicast trees used for VPLS can be of two types:

1. Default Trees. A single multicast distribution tree in the SP
backbone is used to carry all the multicast traffic from a specified
set of one or more VPLSs. These multicast distribution trees can be
set up to carry the traffic of a single VPLS, or to carry the traffic
of multiple VPLSs. The ability to carry the traffic of more than one
VPLS on the same tree is termed 'Aggregation'. The tree will include
every PE that is a member of any of the VPLSs that are using the tree.
This enables the SP to place a bound on the amount of multicast
routing state which the P routers must have. It also implies that a PE
may receive multicast traffic for a multicast stream even if it
doesn't have any receivers for that stream.

2. Data Trees. A Data Tree is used by a PE to send multicast traffic
for one or more multicast streams, which belong to the same or
different VPLSs, to a subset of the PEs that belong to those VPLSs.
Each of the PEs in the subset is on the path to a receiver of one or
more multicast streams that are mapped onto the tree. The ability to
use the same tree for multicast streams that belong to different VPLSs
is termed 'Aggregation'. The reason for having Data Trees is to give a
PE the ability to create separate SP multicast trees for high
bandwidth multicast groups. This allows traffic for these multicast
groups to reach only those PE routers that have receivers in these
groups, and avoids flooding other PE routers in the VPLS.

An SP can use both Default Trees and Data Trees, or either of them,
for a given VPLS on a PE, based on local configuration. Default Trees
can be used for both IP and non-IP multicast data traffic, while Data
Trees can be used only for IP multicast data traffic.
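As an informal illustration only (not part of the protocol
specification), the following Python sketch models the state a root PE
might keep for the two tree types and for aggregation; all class and
field names are hypothetical.

    from dataclasses import dataclass, field
    from typing import Dict, Set, Tuple

    # (C-S, C-G): a customer multicast stream, identified by source and group.
    CMulticastStream = Tuple[str, str]

    @dataclass
    class DefaultTree:
        """Carries all multicast traffic of the VPLSs mapped to it.

        The leaves are every PE that is a member of any mapped VPLS, so a
        leaf may receive traffic for streams it has no receivers for.
        """
        tree_id: str                     # e.g. an RSVP-TE P2MP session or a PIM (P-S, P-G)
        vpls_instances: Set[str] = field(default_factory=set)
        leaf_pes: Set[str] = field(default_factory=set)

        def aggregated(self) -> bool:
            # 'Aggregation': more than one VPLS shares the same SP tree.
            return len(self.vpls_instances) > 1

    @dataclass
    class DataTree:
        """Carries only selected (typically high-bandwidth) IP streams.

        Leaves are restricted to PEs on the path to receivers of the
        mapped streams, so other PEs in the VPLS are not flooded.
        """
        tree_id: str
        # Streams mapped to the tree, keyed by the VPLS they belong to.
        streams: Dict[str, Set[CMulticastStream]] = field(default_factory=dict)
        leaf_pes: Set[str] = field(default_factory=set)

    # Example: one Default Tree shared by two VPLSs, and a Data Tree for a
    # single high-bandwidth stream of VPLS "blue" (names are illustrative).
    default_tree = DefaultTree("P2MP-LSP-1", {"blue", "red"}, {"PE1", "PE2", "PE3"})
    data_tree = DataTree("P2MP-LSP-2", {"blue": {("10.1.1.1", "232.1.1.1")}}, {"PE2"})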
In order to establish Default and Data multicast trees, the root of
the tree must be able to discover the VPLS membership of all the PEs
and/or the multicast groups that each PE has receivers in. This
document describes procedures for doing this. For discovering the
multicast group membership, this document describes procedures that do
not rely on IGMP/PIM snooping in the SP backbone. These procedures can
also be used with ingress replication to send traffic for a multicast
stream to only those PEs that are on the path to receivers for that
stream.

Aggregation also requires a mechanism for the egresses of the tree to
demultiplex the multicast traffic received over the tree. This
document describes how upstream label allocation by the root of the
tree can be used to perform this demultiplexing. This document also
describes procedures based on BGP that are used by the root of an
Aggregate Tree to advertise the Default or Data Tree binding and the
demultiplexing information to the leaves of the tree.

This document uses the prefix 'C' to refer to customer control or data
packets and 'P' to refer to provider control or data packets.

6. VPLS Multicast / Broadcast / Unknown Unicast Data Packet Treatment

If the destination MAC address of a VPLS packet received by a PE from
a VPLS site is a multicast address, a multicast tree SHOULD be used to
transport the packet, if possible. Such a tree can be a Default Tree
for the VPLS. It can also be a Data Tree if the VPLS multicast packet
is an IP packet.

If the destination MAC address of a VPLS packet is a broadcast
address, it is flooded. If a Default Tree is already established, the
PE floods over it. If the Default Tree cannot be used for some reason,
the PE MUST flood over multiple unicast PWs, based on [VPLS-BGP] and
[VPLS-LDP].

If the destination MAC address of the packet has not been learned, the
packet is also flooded. Unlike the broadcast case, it should be noted
that once a PE learns the MAC address it might immediately switch to
transport over one particular PW. This implies that flooding unknown
unicast traffic over a Default Tree might lead to packet reordering.
Therefore, unknown unicast traffic SHOULD be flooded over multiple
unicast PWs based on [VPLS-BGP] and [VPLS-LDP], not over multicast
trees.

P-multicast trees are intended to be used only for VPLS C-multicast
data packets, not for control packets used by a customer's layer-2 and
layer-3 control protocols. For instance, Bridge Protocol Data Units
(BPDUs) use an IEEE assigned all-bridges multicast MAC address, and
OSPF uses the OSPF routers multicast MAC address. P-multicast trees
SHOULD NOT be used for transporting these control packets.

7. Propagating Multicast Control Information

PEs participating in VPLS need to learn the (C-S, C-G) information for
two reasons:

1. With ingress replication, this allows a PE to send the IP multicast
packet for a (C-S, C-G) only to the other PEs in the VPLS instance
that have receivers interested in that particular (C-S, C-G). This
eliminates flooding.

2. It allows the construction of Aggregate Data Trees.

There are two components for a PE to learn the (C-S, C-G) information
in a VPLS, as illustrated in the sketch below:

1. Learning the (C-S, C-G) information from the locally homed VSIs.

2. Learning the (C-S, C-G) information from the remote VSIs.
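The following sketch is only an illustration of these two components:
a PE maintaining a per-VSI (C-S, C-G) database populated both from
local IGMP/PIM snooping and from control information received from
remote PEs, and using it to limit ingress replication to interested
PEs. The class, method and address names are hypothetical.

    from collections import defaultdict
    from typing import Dict, Set, Tuple

    CMulticastStream = Tuple[str, str]   # (C-S, C-G)

    class VsiMulticastDatabase:
        """Per-VSI database of (C-S, C-G) entries and the interested receivers."""

        def __init__(self) -> None:
            # Local attachment circuits with receivers, learned via IGMP/PIM snooping.
            self.local_receivers: Dict[CMulticastStream, Set[str]] = defaultdict(set)
            # Remote PEs with receivers, learned via the control plane (section 7.2).
            self.remote_receivers: Dict[CMulticastStream, Set[str]] = defaultdict(set)

        def add_local_join(self, stream: CMulticastStream, ac: str) -> None:
            self.local_receivers[stream].add(ac)

        def add_remote_join(self, stream: CMulticastStream, pe: str) -> None:
            self.remote_receivers[stream].add(pe)

        def remove_remote_join(self, stream: CMulticastStream, pe: str) -> None:
            self.remote_receivers[stream].discard(pe)

        def interested_pes(self, stream: CMulticastStream) -> Set[str]:
            """With ingress replication, replicate a (C-S, C-G) packet only to these PEs."""
            return set(self.remote_receivers.get(stream, set()))

    # Example usage (hypothetical PEs and addresses):
    db = VsiMulticastDatabase()
    db.add_local_join(("10.1.1.1", "232.1.1.1"), ac="ge-0/0/1")
    db.add_remote_join(("10.1.1.1", "232.1.1.1"), pe="PE3")
    assert db.interested_pes(("10.1.1.1", "232.1.1.1")) == {"PE3"}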
7.1. IGMP/PIM Snooping

In order to learn the (C-S, C-G) information from the locally homed
VSIs, a PE needs to implement IGMP/PIM snooping. This is because there
is no PIM adjacency between the locally homed CEs and the PE. IGMP/PIM
snooping has to be used to build the database of C-Joins that are
being sent by the customer for a particular VSI. This also requires a
PE to create an IGMP/PIM instance per VSI for which IGMP/PIM snooping
is used. This instance is analogous to the multicast VRF PIM instance
that is created for MVPNs.

It is conceivable that IGMP/PIM snooping can be used to learn (C-S,
C-G) information from remote VSIs by snooping VPLS traffic received
over the SP backbone. However, IGMP/PIM snooping is computationally
expensive. Furthermore, the periodic nature of PIM Join/Prune messages
implies that snooping PIM messages places an even greater processing
burden on a PE. Hence, to learn (C-S, C-G) information from remote
VSIs, this document proposes the use of reliable protocol machinery to
transport the (C-S, C-G) information over the SP infrastructure. This
is described in the next section.

7.2. C-Multicast Control Information Propagation in the SP

C-Join/Prune messages for a (C-S, C-G), coming from a customer and
snooped by a PE, have to be propagated to the remote PE that can reach
C-S. One way to do this is to forward the C-Join/Prune as a multicast
data packet and let the egress PEs perform IGMP/PIM snooping over the
pseudo-wire. However, PIM is a soft state protocol and periodically
re-transmits C-Join/Prune messages. This places a big burden on a PE
while snooping PIM messages. It is not possible to eliminate this
overhead for snooping messages received over the customer facing
interfaces. However, it is possible to alleviate this overhead over SP
facing interfaces. This is done by converting snooped PIM C-Join/Prune
messages to reliable protocol messages over the SP network. Each PE
maintains the database of IGMP/PIM entries that are snooped and that
are learnt from remote PEs for each VSI.

Unlike MVPNs, there is an additional challenge while propagating
snooped PIM C-Join/Prune messages over the SP network for VPLS. If the
ingress PE wishes to propagate the C-Join/Prune only to the upstream
PE which has reachability to C-S, this upstream PE is not known. This
is because the local PE doesn't have a route to reach C-S. This is
unlike MVPNs, where the route to reach C-S is known from the unicast
VPN routing table. This implies that the C-Join/Prune message has to
be sent to all the PEs in the VPLS. This document proposes two
possible solutions for achieving this, and one of these will
eventually be picked after discussion in the WG.

1. Using PIM

This is similar to the propagation of PIM C-Join/Prune messages for
MVPNs that has been described earlier in the document. PIM neighbor
discovery and maintenance is based on the VPLS membership information
learnt as part of VPLS auto-discovery. VPLS auto-discovery allows a
particular PE to learn which of the other PEs belong to a particular
VPLS instance. Each of these PEs can be treated as a neighbor for PIM
procedures while sending PIM C-Join/Prune messages to other PEs. The
neighbor is considered up as long as the VPLS auto-discovery mechanism
does not withdraw the neighbor's membership in the VPLS instance. The
C-Join/Prune messages are sent to all the PEs in the VPLS using
unicast PIM messages.
The use of unicast PIM implies that there is no Join suppression. PIM
refresh reduction mechanisms, which are currently being worked on in
the PIM WG, MUST be used. To send the C-Join/Prune message to a
particular remote PE, the message is encapsulated in the PW used to
reach that PE, for the VPLS that the C-Join/Prune message belongs to.

2. Using BGP

The use of PIM for propagating VPLS C-Join/Prune information may have
scalability limitations. This is because, even after building PIM
refresh reduction mechanisms, PIM will not have optimized transport
when there is one sender and multiple receivers. BGP provides such
transport as it has route-reflector machinery. One option is therefore
to propagate the C-Join/Prune information using BGP. This is done by
using the BGP mechanisms described in section 13.

8. Multicast Tree Leaf Discovery

8.1. Default Tree Leaf Discovery

VPLS auto-discovery as described in [VPLS-BGP, BGP-AUTO], or another
VPLS auto-discovery mechanism, enables a PE to learn the VPLS
membership of other PEs. This is used by the root of the Default Tree
to learn the egresses of the tree.

8.2. Data Tree Leaf Discovery

This is done using the C-multicast control information propagation
described in the previous section.

9. Demultiplexing Multicast Tree Traffic

Demultiplexing received VPLS traffic requires the receiving PE to
determine the VPLS instance the packet belongs to. The egress PE can
then perform a VPLS lookup to further forward the packet.

9.1. One Multicast Tree - One VPLS Mapping

When a multicast tree is mapped to only one VPLS, determining the tree
on which the packet is received is sufficient to determine the VPLS
instance on which the packet is received. The tree is determined based
on the tree encapsulation. If MPLS encapsulation is used, e.g. RSVP-TE
P2MP LSPs, the outer MPLS label is used to determine the tree.
Penultimate-hop-popping must be disabled on the RSVP-TE P2MP LSP.

9.2. One Multicast Tree - Many VPLS Mapping

As traffic belonging to multiple VPLSs can be carried over the same
tree, there is a need to identify the VPLS the packet belongs to. This
is done by using an inner label that corresponds to the VPLS for which
the packet is intended. The ingress PE uses this label as the inner
label while encapsulating a customer multicast data packet. Each of
the egress PEs must be able to associate this inner label with the
same VPLS and use it to demultiplex the traffic received over the
Aggregate Default Tree or the Aggregate Data Tree. If downstream label
assignment were used, this would require all the egress PEs in the
VPLS to agree on a common label for the VPLS.

We propose a solution that uses upstream label assignment by the
ingress PE. Hence the inner label is allocated by the ingress PE. Each
egress PE has a separate label space for every Aggregate Tree for
which the egress PE is a leaf node. The egress PEs create a forwarding
entry for the inner VPN label, allocated by the ingress PE, in this
label space. Hence when the egress PE receives a packet over an
Aggregate Tree, the tree identifier specifies the label space in which
to perform the inner label lookup. An implementation may create a
logical interface corresponding to an Aggregate Tree.
In that case, the label space in which to look up the inner label is
an interface-based label space, where the interface corresponds to the
tree.

When PIM based IP/GRE trees are used, the root PE source address and
the tree P-group address identify the tree interface. The label space
corresponding to the tree interface is the label space in which to
perform the inner label lookup. A lookup in this label space
identifies the VPLS in which the customer multicast lookup needs to be
done.

If the tree uses MPLS encapsulation, the outer MPLS label and the
incoming interface provide the label space of the label beneath it.
This assumes that penultimate-hop-popping is disabled. An example of
this is RSVP-TE P2MP LSPs. The outer label and incoming interface
effectively identify the tree interface.

The ingress PE informs the egress PEs about the inner label as part of
the tree binding procedures described in section 11.

10. Establishing Multicast Trees

This document does not place any restrictions on the multicast
technology used to set up P-multicast trees. However, specific
procedures are specified only for RSVP-TE P2MP LSPs and for PIM-SM and
PIM-SSM based trees.

A P-multicast tree can be either a source tree or a shared tree. A
source tree is used to carry traffic only for the VPLSs that exist
locally on the root of the tree, i.e. for which the root has local
CEs. A shared tree, on the other hand, can be used to carry traffic
belonging to VPLSs that exist on other PEs as well. For example, an
RP-based PIM-SM Aggregate Tree would be a shared tree.

10.1. RSVP-TE P2MP LSPs

This section describes procedures that are specific to the usage of
RSVP-TE P2MP LSPs for instantiating a tree. The RSVP-TE P2MP LSP can
be either a source tree or a shared tree. Procedures in [RSVP-P2MP]
are used to signal the LSP. The LSP is signaled after the root of the
LSP discovers the leaves. The egress PEs are discovered using the
procedures described in section 8. Aggregation as described in this
document is supported.

10.1.1. P2MP TE LSP - VPLS Mapping

The P2MP TE LSP to VPLS mapping can be learned at the egress PEs using
BGP based advertisements of the P2MP TE LSP - VPLS mapping. These
advertisements require that the root of the tree include the P2MP TE
LSP identifier as the tunnel identifier in the BGP advertisements.
This identifier contains the following information elements:

   - The type of the tunnel, set to RSVP-TE P2MP LSP
   - The RSVP-TE P2MP LSP's SESSION Object
   - The RSVP-TE P2MP LSP's SENDER_TEMPLATE Object

10.1.2. Demultiplexing C-Multicast Data Packets

Demultiplexing the C-multicast data packets at the egress PE requires
that the PE be able to determine the P2MP TE LSP that the packets are
received on. The egress PE needs to determine the P2MP LSP in order to
determine the VPLS that the packet belongs to, as described in section
9. To achieve this, the LSP must be signaled with penultimate-hop-
popping (PHP) off. This is because the egress PE needs to rely on the
MPLS label that it advertises to its upstream neighbor to determine
the P2MP LSP that a C-multicast data packet is received on. Signaling
the P2MP TE LSP with PHP off requires an extension to RSVP-TE which
will be described in a future version of this document.

10.2. Receiver Initiated MPLS Trees

Receiver initiated MPLS trees can also be used. Details of the usage
of these trees will be specified in a later revision.
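To illustrate the per-tree label spaces described in sections 9 and
10.1.2, the following sketch shows one way an egress PE might resolve
a received packet to a VSI: the tree identifier (outer MPLS label plus
incoming interface, or the P-IP header for PIM based IP/GRE trees)
selects a label space, and the upstream-assigned inner label is looked
up within that space. This is an illustrative sketch only; the names
and structures are hypothetical.

    from typing import Dict, Optional, Tuple

    # A tree is identified at the egress either by (incoming interface, outer
    # MPLS label) for MPLS trees, or by (root PE source address, P-group
    # address) for PIM based IP/GRE trees. Either form is an opaque key here.
    TreeId = Tuple[str, str]

    class EgressDemux:
        """Per-tree label spaces for upstream-assigned inner labels."""

        def __init__(self) -> None:
            # One label space per Aggregate Tree the PE is a leaf of:
            # tree -> (inner label -> VSI name).
            self.label_spaces: Dict[TreeId, Dict[int, str]] = {}

        def install_binding(self, tree: TreeId, inner_label: int, vsi: str) -> None:
            # Installed when the tree binding advertisement (section 11) is received.
            self.label_spaces.setdefault(tree, {})[inner_label] = vsi

        def resolve(self, tree: TreeId, inner_label: int) -> Optional[str]:
            # The tree identifier selects the label space; the inner label
            # identifies the VSI in which the C-multicast lookup is done.
            return self.label_spaces.get(tree, {}).get(inner_label)

    # Example: the same inner label value can mean different VPLSs on
    # different trees, because each root assigns its labels independently.
    demux = EgressDemux()
    demux.install_binding(("ge-1/0/0", "outer-label-100"), 16, "vpls-blue")
    demux.install_binding(("ge-1/0/1", "outer-label-200"), 16, "vpls-red")
    assert demux.resolve(("ge-1/0/0", "outer-label-100"), 16) == "vpls-blue"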
10.3. PIM Based Trees

When PIM is used to set up multicast trees in the SP core, an
Aggregate Default Tree is termed the "Aggregate MDT" and an Aggregate
Data Tree is termed the "Aggregate Data MDT". The Aggregate MDT may be
a shared tree, rooted at the RP, or a shortest path tree. An Aggregate
Data MDT is rooted at the PE that is connected to the multicast
traffic source. The root of the Aggregate MDT or the Aggregate Data
MDT has to advertise the P-group address chosen by it for the MDT to
the PEs that are leaves of the MDT. These other PEs can then join this
MDT. The announcement of this address is done as part of the tree
binding procedures described in section 11.

10.4. Encapsulation of the Aggregate Default Tree and Aggregate Data
Tree

An Aggregate Default Tree or an Aggregate Data Tree may use an IP/GRE
encapsulation or an MPLS encapsulation. The protocol type in the
IP/GRE header in the former case, and the protocol type in the data
link header in the latter case, need further specification. This will
be addressed in a later revision.

11. Tree to VPLS / C-Multicast Stream Binding Distribution

Once a PE sets up an Aggregate Default Tree or an Aggregate Data Tree,
it needs to announce the customer multicast groups being mapped to
this tree to other PEs in the network. This procedure is referred to
as Default Tree or Data Tree binding distribution and is performed
using BGP.

For a Default Tree this discovery implies announcing the mapping of
all VPLSs mapped to the Default Tree. The inner label allocated by the
ingress PE for each VPLS is included. The Default Tree identifier is
also included.

For a Data Tree this discovery implies announcing all the specific
(C-S, C-G) entries mapped to this tree along with the Data Tree
identifier. The inner label allocated for each (C-S, C-G) is included.

The egress PE creates a logical interface corresponding to the Default
Tree or the Data Tree identifier. A Default Tree by definition maps to
all the (C-S, C-G) entries belonging to all the VPLSs associated with
the Default Tree. A Data Tree maps to the specific (C-S, C-G) entries
associated with it. When PIM is used to set up SP multicast trees, the
egress PE also joins the P-group address corresponding to the MDT or
the Data MDT. This results in the setup of the PIM SP tree.

12. Switching to Aggregate Data Trees

Data Trees provide a PE the ability to create separate SP multicast
trees for certain (C-S, C-G) entries. The source PE that originates
the Data Tree and the egress PEs have to switch to using the Data Tree
for the (C-S, C-G) entries that are mapped to it. Once a source PE
decides to set up a Data Tree, it announces the mapping of the
(C-S, C-G) entries that are mapped to the tree to the other PEs using
BGP. Depending on the SP multicast technology used, this announcement
may be done before or after setting up the Data Tree.

After the egress PEs receive the announcement, they set up their
forwarding path to receive traffic on the Data Tree if they have one
or more receivers interested in the (C-S, C-G) entries mapped to the
tree. This implies setting up the demultiplexing forwarding entries
based on the inner label as described earlier. The egress PEs may
perform this switch to the Data Tree once the advertisement from the
ingress PE is received, or may wait for a preconfigured timer to do
so.
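As an illustration of the egress behavior just described, the
following sketch shows one way an egress PE might react to a Data Tree
binding advertisement: it installs the forwarding state only if it has
interested receivers, optionally after a configurable switchover
delay. The class names, the timer handling and the API are
hypothetical, not part of this specification.

    import threading
    from dataclasses import dataclass
    from typing import Set, Tuple

    CMulticastStream = Tuple[str, str]   # (C-S, C-G)

    @dataclass
    class DataTreeBinding:
        """Contents of a Data Tree binding advertisement (section 13.1.2), simplified."""
        root_pe: str
        tree_id: str
        vpls_rd: str
        inner_label: int
        streams: Set[CMulticastStream]

    class EgressPe:
        def __init__(self, switchover_delay_s: float = 0.0) -> None:
            self.switchover_delay_s = switchover_delay_s
            self.local_receivers: Set[CMulticastStream] = set()
            self.installed: Set[str] = set()   # tree_ids with forwarding state installed

        def on_data_tree_binding(self, binding: DataTreeBinding) -> None:
            # Only switch if at least one advertised stream has local receivers.
            if not (binding.streams & self.local_receivers):
                return
            if self.switchover_delay_s > 0:
                # Optionally wait for a preconfigured timer before switching.
                threading.Timer(self.switchover_delay_s, self._install, [binding]).start()
            else:
                self._install(binding)

        def _install(self, binding: DataTreeBinding) -> None:
            # Set up the demultiplexing entry (inner label -> VSI) in the label
            # space of this tree, and join the P-group if the tree is PIM based.
            self.installed.add(binding.tree_id)

    # Example usage (hypothetical values):
    pe = EgressPe(switchover_delay_s=0.0)
    pe.local_receivers.add(("10.1.1.1", "232.1.1.1"))
    pe.on_data_tree_binding(DataTreeBinding("PE1", "data-tree-7", "RD-blue", 16,
                                            {("10.1.1.1", "232.1.1.1")}))
    assert "data-tree-7" in pe.installed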
A source PE may use one of two approaches to decide when to start
transmitting data on the Data Tree. In the first approach, once the
source PE sets up the Data Tree, it starts sending multicast packets
for the (C-S, C-G) entries mapped to the tree on both that tree and
the Default Tree. After some preconfigured timer, the PE stops sending
multicast packets on the Default Tree for the (C-S, C-G) entries
mapped to the Data Tree. In the second approach, a certain
preconfigured delay after advertising the (C-S, C-G) entries mapped to
a Data Tree, the source PE begins to send traffic on the Data Tree. At
this point it stops sending traffic on the Default Tree for the
(C-S, C-G) entries that are mapped to the Data Tree; this traffic is
instead transmitted on the Data Tree.

13. BGP Advertisements

The procedures in this document use BGP for Tree - VPLS binding
advertisements, Tree - C-multicast stream binding advertisements, and
for C-multicast control information propagation. This section first
describes the information that needs to be propagated in BGP for
achieving the functional requirements. It then describes a suggested
encoding.

13.1. Information Elements

13.1.1. Default Tree - VPLS Binding Advertisement

The root of an Aggregate Default Tree maps one or more VPLS instances
to the Default Tree. It announces this mapping in BGP. Along with the
VPLS instances that are mapped to the Default Tree, the Default Tree
identifier is also advertised in BGP.

The following information is required in BGP to advertise a VPLS
instance that is mapped to the Default Tree:

1. The address of the router that is the root of the Default Tree.

2. The inner label allocated by the Default Tree root for the VPLS
instance. The usage of this label is described in section 9.

When a PE distributes this information via BGP, it must include the
following:

1. An identifier of the Default Tree.

2. A Route Target Extended Communities attribute. This RT must be an
"Import RT" of each VSI in the VPLS. The BGP distribution procedures
used by [VPLS-BGP] or [BGP-AUTO] will then ensure that the advertised
information gets associated with the right VSIs.

13.1.2. Data Tree - C-Multicast Stream Binding Advertisement

The root of an Aggregate Data Tree maps one or more (C-S, C-G) entries
to the tree. These entries are advertised in BGP along with the Data
Tree identifier to which they are mapped.

The following information is required in BGP to advertise the
(C-S, C-G) entries that are mapped to the Data Tree:

1. The RD configured for the VPLS instance. This is required to
uniquely identify the (C-S, C-G), as the (C-S, C-G) addresses could
overlap between different VPLS instances.

2. The inner label allocated by the Data Tree root for the
(C-S, C-G). The usage of this label is described in section 9.

3. The C-Source address. This address can be a prefix in order to
allow a range of C-Source addresses to be mapped to the Data Tree.

4. The C-Group address. This address can be a range in order to allow
a range of C-Group addresses to be mapped to the Data Tree.

When a PE distributes this information via BGP, it must include the
following (a sketch of these information elements follows this
section):

1. An identifier of the Data Tree.

2. A Route Target Extended Communities attribute. This is used as
described in section 13.1.1.
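Purely as an illustration of the information elements in sections
13.1.1 and 13.1.2 (not of the wire encoding, which section 13.2
sketches separately), the following hypothetical structures gather
what a root PE would need to advertise for a Default Tree binding and
a Data Tree binding.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class TreeIdentifier:
        """Default/Data Tree identifier (section 13.1.4)."""
        shared: bool          # True for a shared Default Tree
        tree_type: str        # e.g. "PIM-SSM", "PIM-SM", or "RSVP-TE P2MP LSP"
        value: str            # a P-(S, G) for PIM trees, or an RSVP-TE session/sender tuple

    @dataclass
    class DefaultTreeBinding:
        """Default Tree - VPLS binding advertisement (section 13.1.1)."""
        root_address: str             # root of the Default Tree
        vpls_inner_label: int         # upstream-assigned label for the VPLS (section 9)
        tree: TreeIdentifier
        route_targets: List[str]      # must include an Import RT of each VSI in the VPLS

    @dataclass
    class DataTreeBinding:
        """Data Tree - C-multicast stream binding advertisement (section 13.1.2)."""
        rd: str                       # RD of the VPLS, disambiguates overlapping (C-S, C-G)
        inner_label: int              # upstream-assigned label for the (C-S, C-G)
        c_source_prefix: str          # C-S, possibly a prefix
        c_group_prefix: str           # C-G, possibly a range
        tree: TreeIdentifier
        route_targets: List[str]

    # Example (hypothetical values):
    default_binding = DefaultTreeBinding(
        root_address="192.0.2.1",
        vpls_inner_label=16,
        tree=TreeIdentifier(shared=False, tree_type="RSVP-TE P2MP LSP",
                            value="session-1/sender-1"),
        route_targets=["target:65000:1"],
    )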
13.1.3. Using BGP for Propagating VPLS C-Joins/Prunes

Section 7.2 describes PIM and BGP as possible options for propagating
VPLS C-Join/Prune information. This section describes the information
elements needed if BGP were to be used to propagate the VPLS
C-Join/Prune information in the SP network.

The following information is required to be advertised in BGP for a
VPLS for C-Join propagation, and withdrawn for C-Prune propagation:

1. The RD configured for the VPLS instance. This is required to
uniquely identify the (C-S, C-G), as the (C-S, C-G) addresses could
overlap between different VPLS instances.

2. The C-Source address. This can be a prefix.

3. The C-Group address. This can be a prefix.

When a PE distributes this information via BGP, it must include the
Route Target Extended Communities attribute. This is used as described
in section 13.1.1.

13.1.4. Default Tree/Data Tree Identifier

Default Tree and Data Tree advertisements carry the Tree identifier.
The following information elements are needed in this identifier:

1. Whether this is a shared Default Tree or not.

2. The type of the tree. For example the tree may use PIM-SM or
PIM-SSM.

3. The identifier of the tree. For trees set up using PIM the
identifier is an (S, G) value.

13.2. Suggested Encoding

This section describes a suggested BGP encoding for carrying the
information elements described above. This encoding needs further
discussion.

A new Subsequent Address Family Identifier (SAFI), called the VPLS
MCAST SAFI, is proposed. The format of the NLRI associated with this
SAFI is as follows:

            +---------------------------------+
            |       Length (2 octets)         |
            +---------------------------------+
            |     MPLS Labels (variable)      |
            +---------------------------------+
            |         RD (8 octets)           |
            +---------------------------------+
            |     Multicast Source Length     |
            +---------------------------------+
            |   Multicast Source (Variable)   |
            +---------------------------------+
            |   Multicast Group (Variable)    |
            +---------------------------------+

For Default Tree discovery, the information elements for the VPLS
instances that are mapped to the Default Tree are encoded in the NLRI.
The RD is set to the configured RD for the VPLS. The Multicast Group
is set to 0. The source address is set to the PE's P-address. This
advertisement also carries a new attribute to identify the Default
Tree. The BGP next-hop address in the NEXT_HOP attribute or the
MP_REACH_NLRI attribute is set to the PE's P-address. This P-address
is the address of the root of the tree.

For Data Tree discovery, the information elements for the (C-S, C-G)
entries that are mapped to the tree are encoded in the NLRI and are
set using the information elements described in section 13.1.2. The
address of the Data Tree root router is carried in the BGP next-hop
address of the MP_REACH_NLRI attribute.

For VPLS C-Join/Prune propagation, the information elements are
encoded in the NLRI. The address of the router originating the
C-Joins/Prunes is carried in the BGP next-hop address of the
MP_REACH_NLRI attribute.

A new optional transitive attribute called the
Multicast_Tree_Attribute is defined to signal the Default Tree or the
Data Tree. The format of this attribute is as follows:

            +---------------------------------+
            |S|  Reserved   |   Tree Type     |
            +---------------------------------+
            |        Tree Identifier          |
            |                .                |
            |                .                |
            +---------------------------------+

The S bit is set if the tree is a shared Default Tree. The Tree Type
identifies the SP multicast technology used to establish the tree.
This determines the semantics of the tree identifier. Currently three
Tree Types are defined:

   1. PIM-SSM Tree
   2. PIM-SM Tree
   3. RSVP-TE P2MP LSP

When the type is set to PIM-SM or PIM-SSM, the tree identifier
contains a PIM address. When the type is set to RSVP-TE P2MP LSP, the
tree identifier contains an RSVP-TE tuple.

Hence the MP_REACH_NLRI attribute identifies the set of VPLS
customers' multicast trees, the Multicast_Tree_Attribute identifies a
particular SP tree (i.e. a Default Tree or Data Tree), and the
advertisement of both in a single BGP Update creates a binding/mapping
between the SP tree (the Default Tree or Data Tree) and the set of
VPLS customers' trees.

14. Aggregation Methodology

In general, the heuristics used to decide which VPLS instances or
(C-S, C-G) entries to aggregate are implementation dependent. It is
also conceivable that offline tools can be used for this purpose. This
section discusses some tradeoffs with respect to aggregation.

The "congruency" of aggregation is defined by the amount of overlap in
the leaves of the client trees that are aggregated on an SP tree. For
Aggregate Default Trees, the congruency depends on the overlap in the
membership of the VPLSs that are aggregated on the Aggregate Default
Tree. If there is complete overlap, aggregation is perfectly
congruent. As the overlap between the VPLSs that are aggregated
reduces, the congruency reduces.

If aggregation is done such that it is not perfectly congruent, a PE
may receive traffic for VPLSs to which it doesn't belong. As the
amount of multicast traffic in these unwanted VPLSs increases,
aggregation becomes less optimal with respect to delivered traffic.
Hence there is a tradeoff between reducing state and delivering
unwanted traffic.

An implementation should provide knobs to control the congruency of
aggregation. This will allow an SP to deploy aggregation depending on
the VPLS membership and traffic profiles in its network. If different
PEs or RPs are setting up Aggregate Default Trees, this will also
allow an SP to engineer the maximum number of unwanted VPLSs that a
particular PE may receive traffic for.

The state/bandwidth optimality trade-off can be further improved by
having a versatile many-to-many association between client trees and
provider trees. Thus a VPLS can be mapped to multiple Aggregate Trees.
The mechanisms for achieving this are for further study. It may also
be possible to use both ingress replication and an Aggregate Tree for
a particular VPLS. Mechanisms for achieving this are also for further
study.

15. Data Forwarding

15.1. MPLS Tree Encapsulation

The following diagram shows the progression of a VPLS IP multicast
packet as it enters and leaves the SP network when MPLS trees are
being used for multiple VPLS instances. RSVP-TE P2MP LSPs are examples
of such trees.
   Packets received        Packets in transit        Packets forwarded
   at ingress PE           in the service            by egress PEs
                           provider network

                           +---------------+
                           |MPLS Tree Label|
                           +---------------+
                           |   VPN Label   |
   ++=============++       ++=============++         ++=============++
   || C-IP Header ||       || C-IP Header ||         || C-IP Header ||
   ++=============++ >>>>> ++=============++ >>>>>   ++=============++
   ||  C-Payload  ||       ||  C-Payload  ||         ||  C-Payload  ||
   ++=============++       ++=============++         ++=============++

The receiver PE does a lookup on the outer MPLS tree label and
determines the MPLS forwarding table in which to look up the inner
MPLS label. This table is specific to the tree label space. The inner
label is unique within the context of the root of the tree (as it is
assigned by the root of the tree, without any coordination with any
other nodes). Thus it is not unique across multiple roots. So, to
unambiguously identify a particular VPLS, one has to know the label
and the context within which that label is unique. The context is
provided by the outer MPLS label.

The outer MPLS label is stripped. The lookup of the resulting MPLS
label determines the VSI in which the receiver PE needs to do the
C-multicast data packet lookup. It then strips the inner MPLS label
and sends the packet to the VSI for multicast data forwarding.

15.2. IP Tree Encapsulation

The following diagram shows the progression of the packet as it enters
and leaves the SP network when Aggregate MDTs or Aggregate Data MDTs
are being used for multiple VPLS instances. MPLS-in-GRE [MPLS-IP]
encapsulation is used to encapsulate the customer multicast packets.

   Packets received        Packets in transit        Packets forwarded
   at ingress PE           in the service            by egress PEs
                           provider network

                           +---------------+
                           |  P-IP Header  |
                           +---------------+
                           |      GRE      |
                           +---------------+
                           |   VPN Label   |
   ++=============++       ++=============++         ++=============++
   || C-IP Header ||       || C-IP Header ||         || C-IP Header ||
   ++=============++ >>>>> ++=============++ >>>>>   ++=============++
   ||  C-Payload  ||       ||  C-Payload  ||         ||  C-Payload  ||
   ++=============++       ++=============++         ++=============++

The P-IP header contains the Aggregate MDT (or Aggregate Data MDT)
P-group address as the destination address and the root PE address as
the source address.

The receiver PE does a lookup on the P-IP header and determines the
MPLS forwarding table in which to look up the inner MPLS label. This
table is specific to the Aggregate MDT (or Aggregate Data MDT) label
space. The inner label is unique within the context of the root of the
MDT (as it is assigned by the root of the MDT, without any
coordination with any other nodes). Thus it is not unique across
multiple roots. So, to unambiguously identify a particular VPLS, one
has to know the label and the context within which that label is
unique. The context is provided by the P-IP header.

The P-IP header and the GRE header are stripped. The lookup of the
resulting MPLS label determines the VSI in which the receiver PE needs
to do the C-multicast data packet lookup. It then strips the inner
MPLS label and sends the packet to the VSI for multicast data
forwarding.

16. Security Considerations

Security considerations discussed in [VPLS-BGP] and [VPLS-LDP] apply
to this document.

17. Acknowledgments

Many thanks to Thomas Morin for his support of this work.
18. Normative References

   [RFC2119]   S. Bradner, "Key words for use in RFCs to Indicate
               Requirement Levels", RFC 2119, March 1997.

   [RFC3107]   Y. Rekhter, E. Rosen, "Carrying Label Information in
               BGP-4", RFC 3107.

   [VPLS-BGP]  K. Kompella, Y. Rekhter, "Virtual Private LAN Service",
               draft-ietf-l2vpn-vpls-bgp-02.txt

   [VPLS-LDP]  M. Lasserre, V. Kompella, "Virtual Private LAN Services
               over MPLS", draft-ietf-l2vpn-vpls-ldp-03.txt

   [MPLS-IP]   T. Worster, Y. Rekhter, E. Rosen, "Encapsulating MPLS in
               IP or Generic Routing Encapsulation (GRE)",
               draft-ietf-mpls-in-ip-or-gre-08.txt

   [BGP-AUTO]  H. Ould-Brahim et al., "Using BGP as an Auto-Discovery
               Mechanism for Layer-3 and Layer-2 VPNs",
               draft-ietf-l3vpn-bgpvpn-auto-04.txt

   [RSVP-P2MP] R. Aggarwal et al., "Extensions to RSVP-TE for Point to
               Multipoint TE LSPs", draft-ietf-mpls-rsvp-te-p2mp-01.txt

19. Informative References

20. Author Information

20.1. Editor Information

   Rahul Aggarwal
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: rahul@juniper.net

20.2. Contributor Information

   Yakov Rekhter
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: yakov@juniper.net

   Yuji Kamite
   NTT Communications Corporation
   Tokyo Opera City Tower
   3-20-2 Nishi Shinjuku, Shinjuku-ku,
   Tokyo 163-1421, Japan
   Email: y.kamite@ntt.com

   Luyuan Fang
   AT&T
   200 Laurel Avenue, Room C2-3B35
   Middletown, NJ 07748
   Phone: 732-420-1921
   Email: luyuanfang@att.com

   Chaitanya Kodeboniya
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: ck@juniper.net

21. Intellectual Property

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be found
in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this specification
can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.

22. Full Copyright Statement

Copyright (C) The Internet Society (2004). This document is subject to
the rights, licenses and restrictions contained in BCP 78 and except
as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

23. Acknowledgement

Funding for the RFC Editor function is currently provided by the
Internet Society.