Network Working Group                                     Rahul Aggarwal
Internet Draft                                          Juniper Networks
Expiration Date: August 2005
                                                             Yuji Kamite
                                                      NTT Communications

                                                             Luyuan Fang
                                                                    AT&T


                            Multicast in VPLS
                  draft-raggarwa-l2vpn-vpls-mcast-00.txt


Status of this Memo

By submitting this Internet-Draft, we certify that any applicable
patent or IPR claims of which we are aware have been disclosed, and any
of which we become aware will be disclosed, in accordance with RFC
3668.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress.''

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This document describes a solution for overcoming the limitations of
existing VPLS multicast solutions. It describes procedures for VPLS
multicast that utilize multicast trees in the service provider (SP)
network. One such multicast tree can be shared between multiple VPLS
instances. Procedures for propagating multicast control information,
learned from local VPLS sites, to remote VPLS sites are described.
These procedures do not require IGMP/PIM snooping on the SP backbone
links.

Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

1. Contributors

   Rahul Aggarwal
   Yakov Rekhter
   Juniper Networks

   Yuji Kamite
   NTT Communications

   Luyuan Fang
   AT&T

   Chaitanya Kodeboniya
   Juniper Networks

2. Terminology

This document uses terminology described in [VPLS-BGP] and [VPLS-LDP].

3. Introduction

[VPLS-BGP] and [VPLS-LDP] describe a solution for VPLS multicast that
relies on ingress replication. This solution has limitations for
certain VPLS multicast traffic profiles. This document describes
procedures for overcoming the limitations of existing VPLS multicast
solutions. It describes procedures for VPLS multicast that utilize
multicast trees in the service provider (SP) network. One such
multicast tree can be shared between multiple VPLS instances.
Procedures for propagating multicast control information, learned from
local VPLS sites, to remote VPLS sites are described. These procedures
do not require IGMP/PIM snooping on the SP backbone links.

4. Existing Limitations of VPLS Multicast

The VPLS multicast solutions described in [VPLS-BGP] and [VPLS-LDP]
rely on ingress replication. The ingress PE replicates the multicast
packet for each egress PE and sends it to that egress PE using a
unicast tunnel. This is a reasonable model when the bandwidth of the
multicast traffic is low and/or the number of replications performed
on average on each outgoing interface for a particular customer VPLS
multicast packet is small.
If this is not the case, it is desirable to utilize multicast trees in
the SP core to transmit VPLS multicast packets. Note that unicast
packets that are flooded to each of the egress PEs, before the ingress
PE performs learning for those unicast packets, will still use ingress
replication.

By appropriate IGMP or PIM snooping it is possible for the ingress PE
to send the packet only to the egress PEs that have receivers for that
traffic, rather than to all the PEs in the VPLS instance. While
PIM/IGMP snooping avoids the situation where an IP multicast packet is
sent to PEs with no receivers, there is a cost for this optimization.
Namely, a PE has to maintain (S,G) state for all the (S,G)s of all the
VPLSs present on the PE. Furthermore, PIM snooping has to be done not
only on the CE-PE interfaces, but on Pseudo-Wire (PW) interfaces as
well, which in turn introduces a non-negligible overhead on the PE. It
is desirable to reduce this overhead when IGMP/PIM snooping is used.

5. Overview

This document describes procedures for using multicast trees in the SP
network to transport VPLS multicast data packets. RSVP-TE P2MP LSPs
described in [RSVP-P2MP] are an example of such multicast trees. The
use of multicast trees in the SP network can be beneficial when the
bandwidth of the multicast traffic is high or when it is desirable to
optimize the number of copies of a multicast packet transmitted by the
ingress. This comes at the cost of state in the SP core to build
multicast trees and the overhead to maintain this state. This document
places no restrictions on the protocols used to build SP multicast
trees.

Multicast trees used for VPLS can be of two types:

1. Default Trees. A single multicast distribution tree in the SP
backbone is used to carry all the multicast traffic from a specified
set of one or more VPLSs. These multicast distribution trees can be
set up to carry the traffic of a single VPLS, or to carry the traffic
of multiple VPLSs. The ability to carry the traffic of more than one
VPLS on the same tree is termed 'Aggregation'. The tree will include
every PE that is a member of any of the VPLSs that are using the tree.
This enables the SP to place a bound on the amount of multicast
routing state which the P routers must have. It also implies that a PE
may receive multicast traffic for a multicast stream even if it
doesn't have any receivers for that stream.

2. Data Trees. A Data Tree is used by a PE to send multicast traffic
for one or more multicast streams, which belong to the same or
different VPLSs, to a subset of the PEs that belong to those VPLSs.
Each of the PEs in the subset is on the path to a receiver of one or
more multicast streams that are mapped onto the tree. The ability to
use the same tree for multicast streams that belong to different VPLSs
is termed 'Aggregation'. The reason for having Data Trees is to give a
PE the ability to create separate SP multicast trees for high
bandwidth multicast groups. This allows traffic for these multicast
groups to reach only those PE routers that have receivers in these
groups, and avoids flooding other PE routers in the VPLS.

An SP can use both Default Trees and Data Trees, or either of them,
for a given VPLS on a PE, based on local configuration. Default Trees
can be used for both IP and non-IP multicast data traffic, while Data
Trees can be used only for IP multicast data traffic.
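As an informal illustration only (not part of the protocol
specification), the following Python sketch models the state a root PE
might keep for the two tree types and for aggregation; all class and
field names are hypothetical.

    from dataclasses import dataclass, field
    from typing import Dict, Set, Tuple

    # (C-S, C-G): a customer multicast stream, identified by source and group.
    CMulticastStream = Tuple[str, str]

    @dataclass
    class DefaultTree:
        """Carries all multicast traffic of the VPLSs mapped to it.

        The leaves are every PE that is a member of any mapped VPLS, so a
        leaf may receive traffic for streams it has no receivers for.
        """
        tree_id: str                     # e.g. an RSVP-TE P2MP session or a PIM (P-S, P-G)
        vpls_instances: Set[str] = field(default_factory=set)
        leaf_pes: Set[str] = field(default_factory=set)

        def aggregated(self) -> bool:
            # 'Aggregation': more than one VPLS shares the same SP tree.
            return len(self.vpls_instances) > 1

    @dataclass
    class DataTree:
        """Carries only selected (typically high-bandwidth) IP streams.

        Leaves are restricted to PEs on the path to receivers of the
        mapped streams, so other PEs in the VPLS are not flooded.
        """
        tree_id: str
        # Streams mapped to the tree, keyed by the VPLS they belong to.
        streams: Dict[str, Set[CMulticastStream]] = field(default_factory=dict)
        leaf_pes: Set[str] = field(default_factory=set)

    # Example: one Default Tree shared by two VPLSs, and a Data Tree for a
    # single high-bandwidth stream of VPLS "blue" (names are illustrative).
    default_tree = DefaultTree("P2MP-LSP-1", {"blue", "red"}, {"PE1", "PE2", "PE3"})
    data_tree = DataTree("P2MP-LSP-2", {"blue": {("10.1.1.1", "232.1.1.1")}}, {"PE2"})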
In order to establish Default and Data multicast trees, the root of
the tree must be able to discover the VPLS membership of all the PEs
and/or the multicast groups that each PE has receivers in. This
document describes procedures for doing this. For discovering the
multicast group membership, this document describes procedures that do
not rely on IGMP/PIM snooping in the SP backbone. These procedures can
also be used with ingress replication to send traffic for a multicast
stream to only those PEs that are on the path to receivers for that
stream.

Aggregation also requires a mechanism for the egresses of the tree to
demultiplex the multicast traffic received over the tree. This
document describes how upstream label allocation by the root of the
tree can be used to perform this demultiplexing. This document also
describes procedures based on BGP that are used by the root of an
Aggregate Tree to advertise the Default or Data Tree binding and the
demultiplexing information to the leaves of the tree.

This document uses the prefix 'C' to refer to customer control or data
packets and 'P' to refer to provider control or data packets.

6. VPLS Multicast / Broadcast / Unknown Unicast Data Packet Treatment

If the destination MAC address of a VPLS packet received by a PE from
a VPLS site is a multicast address, a multicast tree SHOULD be used to
transport the packet, if possible. Such a tree can be a Default Tree
for the VPLS. It can also be a Data Tree if the VPLS multicast packet
is an IP packet.

If the destination MAC address of a VPLS packet is a broadcast
address, it is flooded. If a Default Tree is already established, the
PE floods over it. If the Default Tree cannot be used for some reason,
the PE MUST flood over multiple unicast PWs, based on [VPLS-BGP] and
[VPLS-LDP].

If the destination MAC address of the packet has not been learned, the
packet is also flooded. Unlike the broadcast case, it should be noted
that once a PE learns the MAC address it might immediately switch to
transport over one particular PW. This implies that flooding unknown
unicast traffic over a Default Tree might lead to packet reordering.
Therefore, unknown unicast traffic SHOULD be flooded over multiple
unicast PWs based on [VPLS-BGP] and [VPLS-LDP], not over multicast
trees.

P-multicast trees are intended to be used only for VPLS C-multicast
data packets, not for control packets used by a customer's layer-2 and
layer-3 control protocols. For instance, Bridge Protocol Data Units
(BPDUs) use an IEEE assigned all-bridges multicast MAC address, and
OSPF uses the OSPF routers multicast MAC address. P-multicast trees
SHOULD NOT be used for transporting these control packets.

7. Propagating Multicast Control Information

PEs participating in VPLS need to learn the (C-S, C-G) information for
two reasons:

1. With ingress replication, this allows a PE to send the IP multicast
packet for a (C-S, C-G) only to the other PEs in the VPLS instance
that have receivers interested in that particular (C-S, C-G). This
eliminates flooding.

2. It allows the construction of Aggregate Data Trees.

There are two components for a PE to learn the (C-S, C-G) information
in a VPLS, as illustrated in the sketch below:

1. Learning the (C-S, C-G) information from the locally homed VSIs.

2. Learning the (C-S, C-G) information from the remote VSIs.
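The following sketch is only an illustration of these two components:
a PE maintaining a per-VSI (C-S, C-G) database populated both from
local IGMP/PIM snooping and from control information received from
remote PEs, and using it to limit ingress replication to interested
PEs. The class, method and address names are hypothetical.

    from collections import defaultdict
    from typing import Dict, Set, Tuple

    CMulticastStream = Tuple[str, str]   # (C-S, C-G)

    class VsiMulticastDatabase:
        """Per-VSI database of (C-S, C-G) entries and the interested receivers."""

        def __init__(self) -> None:
            # Local attachment circuits with receivers, learned via IGMP/PIM snooping.
            self.local_receivers: Dict[CMulticastStream, Set[str]] = defaultdict(set)
            # Remote PEs with receivers, learned via the control plane (section 7.2).
            self.remote_receivers: Dict[CMulticastStream, Set[str]] = defaultdict(set)

        def add_local_join(self, stream: CMulticastStream, ac: str) -> None:
            self.local_receivers[stream].add(ac)

        def add_remote_join(self, stream: CMulticastStream, pe: str) -> None:
            self.remote_receivers[stream].add(pe)

        def remove_remote_join(self, stream: CMulticastStream, pe: str) -> None:
            self.remote_receivers[stream].discard(pe)

        def interested_pes(self, stream: CMulticastStream) -> Set[str]:
            """With ingress replication, replicate a (C-S, C-G) packet only to these PEs."""
            return set(self.remote_receivers.get(stream, set()))

    # Example usage (hypothetical PEs and addresses):
    db = VsiMulticastDatabase()
    db.add_local_join(("10.1.1.1", "232.1.1.1"), ac="ge-0/0/1")
    db.add_remote_join(("10.1.1.1", "232.1.1.1"), pe="PE3")
    assert db.interested_pes(("10.1.1.1", "232.1.1.1")) == {"PE3"}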
7.1. IGMP/PIM Snooping

In order to learn the (C-S, C-G) information from the locally homed
VSIs, a PE needs to implement IGMP/PIM snooping. This is because there
is no PIM adjacency between the locally homed CEs and the PE. IGMP/PIM
snooping has to be used to build the database of C-Joins that are
being sent by the customer for a particular VSI. This also requires a
PE to create an IGMP/PIM instance per VSI for which IGMP/PIM snooping
is used. This instance is analogous to the multicast VRF PIM instance
that is created for MVPNs.

It is conceivable that IGMP/PIM snooping can be used to learn (C-S,
C-G) information from remote VSIs by snooping VPLS traffic received
over the SP backbone. However, IGMP/PIM snooping is computationally
expensive. Furthermore, the periodic nature of PIM Join/Prune messages
implies that snooping PIM messages places an even greater processing
burden on a PE. Hence, to learn (C-S, C-G) information from remote
VSIs, this document proposes the use of reliable protocol machinery to
transport the (C-S, C-G) information over the SP infrastructure. This
is described in the next section.

7.2. C-Multicast Control Information Propagation in the SP

C-Join/Prune messages for a (C-S, C-G), coming from a customer and
snooped by a PE, have to be propagated to the remote PE that can reach
C-S. One way to do this is to forward the C-Join/Prune as a multicast
data packet and let the egress PEs perform IGMP/PIM snooping over the
pseudo-wire. However, PIM is a soft state protocol and periodically
re-transmits C-Join/Prune messages. This places a big burden on a PE
while snooping PIM messages. It is not possible to eliminate this
overhead for snooping messages received over the customer facing
interfaces. However, it is possible to alleviate this overhead over SP
facing interfaces. This is done by converting snooped PIM C-Join/Prune
messages to reliable protocol messages over the SP network. Each PE
maintains the database of IGMP/PIM entries that are snooped and that
are learnt from remote PEs for each VSI.

Unlike MVPNs, there is an additional challenge while propagating
snooped PIM C-Join/Prune messages over the SP network for VPLS. If the
ingress PE wishes to propagate the C-Join/Prune only to the upstream
PE which has reachability to C-S, this upstream PE is not known. This
is because the local PE doesn't have a route to reach C-S. This is
unlike MVPNs, where the route to reach C-S is known from the unicast
VPN routing table. This implies that the C-Join/Prune message has to
be sent to all the PEs in the VPLS. This document proposes two
possible solutions for achieving this, and one of these will
eventually be picked after discussion in the WG.

1. Using PIM

This is similar to the propagation of PIM C-Join/Prune messages for
MVPNs that has been described earlier in the document. PIM neighbor
discovery and maintenance is based on the VPLS membership information
learnt as part of VPLS auto-discovery. VPLS auto-discovery allows a
particular PE to learn which of the other PEs belong to a particular
VPLS instance. Each of these PEs can be treated as a neighbor for PIM
procedures while sending PIM C-Join/Prune messages to other PEs. The
neighbor is considered up as long as the VPLS auto-discovery mechanism
does not withdraw the neighbor's membership in the VPLS instance. The
C-Join/Prune messages are sent to all the PEs in the VPLS using
unicast PIM messages.
The use of unicast PIM implies that there is no Join suppression. PIM
refresh reduction mechanisms, which are currently being worked on in
the PIM WG, MUST be used. To send the C-Join/Prune message to a
particular remote PE, the message is encapsulated in the PW used to
reach that PE, for the VPLS that the C-Join/Prune message belongs to.

2. Using BGP

The use of PIM for propagating VPLS C-Join/Prune information may have
scalability limitations. This is because, even after building PIM
refresh reduction mechanisms, PIM will not have optimized transport
when there is one sender and multiple receivers. BGP provides such
transport as it has route-reflector machinery. One option is therefore
to propagate the C-Join/Prune information using BGP. This is done by
using the BGP mechanisms described in section 13.

8. Multicast Tree Leaf Discovery

8.1. Default Tree Leaf Discovery

VPLS auto-discovery as described in [VPLS-BGP, BGP-AUTO], or another
VPLS auto-discovery mechanism, enables a PE to learn the VPLS
membership of other PEs. This is used by the root of the Default Tree
to learn the egresses of the tree.

8.2. Data Tree Leaf Discovery

This is done using the C-multicast control information propagation
described in the previous section.

9. Demultiplexing Multicast Tree Traffic

Demultiplexing received VPLS traffic requires the receiving PE to
determine the VPLS instance the packet belongs to. The egress PE can
then perform a VPLS lookup to further forward the packet.

9.1. One Multicast Tree - One VPLS Mapping

When a multicast tree is mapped to only one VPLS, determining the tree
on which the packet is received is sufficient to determine the VPLS
instance on which the packet is received. The tree is determined based
on the tree encapsulation. If MPLS encapsulation is used, e.g. RSVP-TE
P2MP LSPs, the outer MPLS label is used to determine the tree.
Penultimate-hop-popping must be disabled on the RSVP-TE P2MP LSP.

9.2. One Multicast Tree - Many VPLS Mapping

As traffic belonging to multiple VPLSs can be carried over the same
tree, there is a need to identify the VPLS the packet belongs to. This
is done by using an inner label that corresponds to the VPLS for which
the packet is intended. The ingress PE uses this label as the inner
label while encapsulating a customer multicast data packet. Each of
the egress PEs must be able to associate this inner label with the
same VPLS and use it to demultiplex the traffic received over the
Aggregate Default Tree or the Aggregate Data Tree. If downstream label
assignment were used, this would require all the egress PEs in the
VPLS to agree on a common label for the VPLS.

We propose a solution that uses upstream label assignment by the
ingress PE. Hence the inner label is allocated by the ingress PE. Each
egress PE has a separate label space for every Aggregate Tree for
which the egress PE is a leaf node. The egress PEs create a forwarding
entry for the inner VPN label, allocated by the ingress PE, in this
label space. Hence when the egress PE receives a packet over an
Aggregate Tree, the tree identifier specifies the label space in which
to perform the inner label lookup. An implementation may create a
logical interface corresponding to an Aggregate Tree.
In that case, the label space in which to look up the inner label is
an interface-based label space, where the interface corresponds to the
tree.

When PIM based IP/GRE trees are used, the root PE source address and
the tree P-group address identify the tree interface. The label space
corresponding to the tree interface is the label space in which to
perform the inner label lookup. A lookup in this label space
identifies the VPLS in which the customer multicast lookup needs to be
done.

If the tree uses MPLS encapsulation, the outer MPLS label and the
incoming interface provide the label space of the label beneath it.
This assumes that penultimate-hop-popping is disabled. An example of
this is RSVP-TE P2MP LSPs. The outer label and incoming interface
effectively identify the tree interface.

The ingress PE informs the egress PEs about the inner label as part of
the tree binding procedures described in section 11.

10. Establishing Multicast Trees

This document does not place any restrictions on the multicast
technology used to set up P-multicast trees. However, specific
procedures are specified only for RSVP-TE P2MP LSPs and for PIM-SM and
PIM-SSM based trees.

A P-multicast tree can be either a source tree or a shared tree. A
source tree is used to carry traffic only for the VPLSs that exist
locally on the root of the tree, i.e. for which the root has local
CEs. A shared tree, on the other hand, can be used to carry traffic
belonging to VPLSs that exist on other PEs as well. For example, an
RP-based PIM-SM Aggregate Tree would be a shared tree.

10.1. RSVP-TE P2MP LSPs

This section describes procedures that are specific to the usage of
RSVP-TE P2MP LSPs for instantiating a tree. The RSVP-TE P2MP LSP can
be either a source tree or a shared tree. Procedures in [RSVP-P2MP]
are used to signal the LSP. The LSP is signaled after the root of the
LSP discovers the leaves. The egress PEs are discovered using the
procedures described in section 8. Aggregation as described in this
document is supported.

10.1.1. P2MP TE LSP - VPLS Mapping

The P2MP TE LSP to VPLS mapping can be learned at the egress PEs using
BGP based advertisements of the P2MP TE LSP - VPLS mapping. These
advertisements require that the root of the tree include the P2MP TE
LSP identifier as the tunnel identifier in the BGP advertisements.
This identifier contains the following information elements:

   - The type of the tunnel, set to RSVP-TE P2MP LSP
   - The RSVP-TE P2MP LSP's SESSION Object
   - The RSVP-TE P2MP LSP's SENDER_TEMPLATE Object

10.1.2. Demultiplexing C-Multicast Data Packets

Demultiplexing the C-multicast data packets at the egress PE requires
that the PE be able to determine the P2MP TE LSP that the packets are
received on. The egress PE needs to determine the P2MP LSP in order to
determine the VPLS that the packet belongs to, as described in section
9. To achieve this, the LSP must be signaled with penultimate-hop-
popping (PHP) off. This is because the egress PE needs to rely on the
MPLS label that it advertises to its upstream neighbor to determine
the P2MP LSP that a C-multicast data packet is received on. Signaling
the P2MP TE LSP with PHP off requires an extension to RSVP-TE which
will be described in a future version of this document.

10.2. Receiver Initiated MPLS Trees

Receiver initiated MPLS trees can also be used. Details of the usage
of these trees will be specified in a later revision.
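To illustrate the per-tree label spaces described in sections 9 and
10.1.2, the following sketch shows one way an egress PE might resolve
a received packet to a VSI: the tree identifier (outer MPLS label plus
incoming interface, or the P-IP header for PIM based IP/GRE trees)
selects a label space, and the upstream-assigned inner label is looked
up within that space. This is an illustrative sketch only; the names
and structures are hypothetical.

    from typing import Dict, Optional, Tuple

    # A tree is identified at the egress either by (incoming interface, outer
    # MPLS label) for MPLS trees, or by (root PE source address, P-group
    # address) for PIM based IP/GRE trees. Either form is an opaque key here.
    TreeId = Tuple[str, str]

    class EgressDemux:
        """Per-tree label spaces for upstream-assigned inner labels."""

        def __init__(self) -> None:
            # One label space per Aggregate Tree the PE is a leaf of:
            # tree -> (inner label -> VSI name).
            self.label_spaces: Dict[TreeId, Dict[int, str]] = {}

        def install_binding(self, tree: TreeId, inner_label: int, vsi: str) -> None:
            # Installed when the tree binding advertisement (section 11) is received.
            self.label_spaces.setdefault(tree, {})[inner_label] = vsi

        def resolve(self, tree: TreeId, inner_label: int) -> Optional[str]:
            # The tree identifier selects the label space; the inner label
            # identifies the VSI in which the C-multicast lookup is done.
            return self.label_spaces.get(tree, {}).get(inner_label)

    # Example: the same inner label value can mean different VPLSs on
    # different trees, because each root assigns its labels independently.
    demux = EgressDemux()
    demux.install_binding(("ge-1/0/0", "outer-label-100"), 16, "vpls-blue")
    demux.install_binding(("ge-1/0/1", "outer-label-200"), 16, "vpls-red")
    assert demux.resolve(("ge-1/0/0", "outer-label-100"), 16) == "vpls-blue"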
10.3. PIM Based Trees

When PIM is used to set up multicast trees in the SP core, an
Aggregate Default Tree is termed the "Aggregate MDT" and an Aggregate
Data Tree is termed the "Aggregate Data MDT". The Aggregate MDT may be
a shared tree, rooted at the RP, or a shortest path tree. An Aggregate
Data MDT is rooted at the PE that is connected to the multicast
traffic source. The root of the Aggregate MDT or the Aggregate Data
MDT has to advertise the P-group address chosen by it for the MDT to
the PEs that are leaves of the MDT. These other PEs can then join this
MDT. The announcement of this address is done as part of the tree
binding procedures described in section 11.

10.4. Encapsulation of the Aggregate Default Tree and Aggregate Data
Tree

An Aggregate Default Tree or an Aggregate Data Tree may use an IP/GRE
encapsulation or an MPLS encapsulation. The protocol type in the
IP/GRE header in the former case, and the protocol type in the data
link header in the latter case, need further specification. This will
be addressed in a later revision.

11. Tree to VPLS / C-Multicast Stream Binding Distribution

Once a PE sets up an Aggregate Default Tree or an Aggregate Data Tree,
it needs to announce the customer multicast groups being mapped to
this tree to other PEs in the network. This procedure is referred to
as Default Tree or Data Tree binding distribution and is performed
using BGP.

For a Default Tree this discovery implies announcing the mapping of
all VPLSs mapped to the Default Tree. The inner label allocated by the
ingress PE for each VPLS is included. The Default Tree identifier is
also included.

For a Data Tree this discovery implies announcing all the specific
(C-S, C-G) entries mapped to this tree along with the Data Tree
identifier. The inner label allocated for each (C-S, C-G) is included.

The egress PE creates a logical interface corresponding to the Default
Tree or the Data Tree identifier. A Default Tree by definition maps to
all the (C-S, C-G) entries belonging to all the VPLSs associated with
the Default Tree. A Data Tree maps to the specific (C-S, C-G) entries
associated with it. When PIM is used to set up SP multicast trees, the
egress PE also joins the P-group address corresponding to the MDT or
the Data MDT. This results in the setup of the PIM SP tree.

12. Switching to Aggregate Data Trees

Data Trees provide a PE the ability to create separate SP multicast
trees for certain (C-S, C-G) entries. The source PE that originates
the Data Tree and the egress PEs have to switch to using the Data Tree
for the (C-S, C-G) entries that are mapped to it. Once a source PE
decides to set up a Data Tree, it announces the mapping of the
(C-S, C-G) entries that are mapped to the tree to the other PEs using
BGP. Depending on the SP multicast technology used, this announcement
may be done before or after setting up the Data Tree.

After the egress PEs receive the announcement, they set up their
forwarding path to receive traffic on the Data Tree if they have one
or more receivers interested in the (C-S, C-G) entries mapped to the
tree. This implies setting up the demultiplexing forwarding entries
based on the inner label as described earlier. The egress PEs may
perform this switch to the Data Tree once the advertisement from the
ingress PE is received, or may wait for a preconfigured timer to do
so.
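As an illustration of the egress behavior just described, the
following sketch shows one way an egress PE might react to a Data Tree
binding advertisement: it installs the forwarding state only if it has
interested receivers, optionally after a configurable switchover
delay. The class names, the timer handling and the API are
hypothetical, not part of this specification.

    import threading
    from dataclasses import dataclass
    from typing import Set, Tuple

    CMulticastStream = Tuple[str, str]   # (C-S, C-G)

    @dataclass
    class DataTreeBinding:
        """Contents of a Data Tree binding advertisement (section 13.1.2), simplified."""
        root_pe: str
        tree_id: str
        vpls_rd: str
        inner_label: int
        streams: Set[CMulticastStream]

    class EgressPe:
        def __init__(self, switchover_delay_s: float = 0.0) -> None:
            self.switchover_delay_s = switchover_delay_s
            self.local_receivers: Set[CMulticastStream] = set()
            self.installed: Set[str] = set()   # tree_ids with forwarding state installed

        def on_data_tree_binding(self, binding: DataTreeBinding) -> None:
            # Only switch if at least one advertised stream has local receivers.
            if not (binding.streams & self.local_receivers):
                return
            if self.switchover_delay_s > 0:
                # Optionally wait for a preconfigured timer before switching.
                threading.Timer(self.switchover_delay_s, self._install, [binding]).start()
            else:
                self._install(binding)

        def _install(self, binding: DataTreeBinding) -> None:
            # Set up the demultiplexing entry (inner label -> VSI) in the label
            # space of this tree, and join the P-group if the tree is PIM based.
            self.installed.add(binding.tree_id)

    # Example usage (hypothetical values):
    pe = EgressPe(switchover_delay_s=0.0)
    pe.local_receivers.add(("10.1.1.1", "232.1.1.1"))
    pe.on_data_tree_binding(DataTreeBinding("PE1", "data-tree-7", "RD-blue", 16,
                                            {("10.1.1.1", "232.1.1.1")}))
    assert "data-tree-7" in pe.installed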
A source PE may use one of two approaches to decide when to start
transmitting data on the Data Tree. In the first approach, once the
source PE sets up the Data Tree, it starts sending multicast packets
for the (C-S, C-G) entries mapped to the tree on both that tree and
the Default Tree. After some preconfigured timer, the PE stops sending
multicast packets on the Default Tree for the (C-S, C-G) entries
mapped to the Data Tree. In the second approach, a certain
preconfigured delay after advertising the (C-S, C-G) entries mapped to
a Data Tree, the source PE begins to send traffic on the Data Tree. At
this point it stops sending traffic on the Default Tree for the
(C-S, C-G) entries that are mapped to the Data Tree; this traffic is
instead transmitted on the Data Tree.

13. BGP Advertisements

The procedures in this document use BGP for Tree - VPLS binding
advertisements, Tree - C-multicast stream binding advertisements, and
for C-multicast control information propagation. This section first
describes the information that needs to be propagated in BGP for
achieving the functional requirements. It then describes a suggested
encoding.

13.1. Information Elements

13.1.1. Default Tree - VPLS Binding Advertisement

The root of an Aggregate Default Tree maps one or more VPLS instances
to the Default Tree. It announces this mapping in BGP. Along with the
VPLS instances that are mapped to the Default Tree, the Default Tree
identifier is also advertised in BGP.

The following information is required in BGP to advertise a VPLS
instance that is mapped to the Default Tree:

1. The address of the router that is the root of the Default Tree.

2. The inner label allocated by the Default Tree root for the VPLS
instance. The usage of this label is described in section 9.

When a PE distributes this information via BGP, it must include the
following:

1. An identifier of the Default Tree.

2. A Route Target Extended Communities attribute. This RT must be an
"Import RT" of each VSI in the VPLS. The BGP distribution procedures
used by [VPLS-BGP] or [BGP-AUTO] will then ensure that the advertised
information gets associated with the right VSIs.

13.1.2. Data Tree - C-Multicast Stream Binding Advertisement

The root of an Aggregate Data Tree maps one or more (C-S, C-G) entries
to the tree. These entries are advertised in BGP along with the Data
Tree identifier to which they are mapped.

The following information is required in BGP to advertise the
(C-S, C-G) entries that are mapped to the Data Tree:

1. The RD configured for the VPLS instance. This is required to
uniquely identify the (C-S, C-G), as the (C-S, C-G) addresses could
overlap between different VPLS instances.

2. The inner label allocated by the Data Tree root for the
(C-S, C-G). The usage of this label is described in section 9.

3. The C-Source address. This address can be a prefix in order to
allow a range of C-Source addresses to be mapped to the Data Tree.

4. The C-Group address. This address can be a range in order to allow
a range of C-Group addresses to be mapped to the Data Tree.

When a PE distributes this information via BGP, it must include the
following (a sketch of these information elements follows this
section):

1. An identifier of the Data Tree.

2. A Route Target Extended Communities attribute. This is used as
described in section 13.1.1.
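Purely as an illustration of the information elements in sections
13.1.1 and 13.1.2 (not of the wire encoding, which section 13.2
sketches separately), the following hypothetical structures gather
what a root PE would need to advertise for a Default Tree binding and
a Data Tree binding.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class TreeIdentifier:
        """Default/Data Tree identifier (section 13.1.4)."""
        shared: bool          # True for a shared Default Tree
        tree_type: str        # e.g. "PIM-SSM", "PIM-SM", or "RSVP-TE P2MP LSP"
        value: str            # a P-(S, G) for PIM trees, or an RSVP-TE session/sender tuple

    @dataclass
    class DefaultTreeBinding:
        """Default Tree - VPLS binding advertisement (section 13.1.1)."""
        root_address: str             # root of the Default Tree
        vpls_inner_label: int         # upstream-assigned label for the VPLS (section 9)
        tree: TreeIdentifier
        route_targets: List[str]      # must include an Import RT of each VSI in the VPLS

    @dataclass
    class DataTreeBinding:
        """Data Tree - C-multicast stream binding advertisement (section 13.1.2)."""
        rd: str                       # RD of the VPLS, disambiguates overlapping (C-S, C-G)
        inner_label: int              # upstream-assigned label for the (C-S, C-G)
        c_source_prefix: str          # C-S, possibly a prefix
        c_group_prefix: str           # C-G, possibly a range
        tree: TreeIdentifier
        route_targets: List[str]

    # Example (hypothetical values):
    default_binding = DefaultTreeBinding(
        root_address="192.0.2.1",
        vpls_inner_label=16,
        tree=TreeIdentifier(shared=False, tree_type="RSVP-TE P2MP LSP",
                            value="session-1/sender-1"),
        route_targets=["target:65000:1"],
    )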
13.1.3. Using BGP for Propagating VPLS C-Joins/Prunes

Section 7.2 describes PIM and BGP as possible options for propagating
VPLS C-Join/Prune information. This section describes the information
elements needed if BGP were to be used to propagate the VPLS
C-Join/Prune information in the SP network.

The following information is required to be advertised in BGP for a
VPLS for C-Join propagation, and withdrawn for C-Prune propagation:

1. The RD configured for the VPLS instance. This is required to
uniquely identify the (C-S, C-G), as the (C-S, C-G) addresses could
overlap between different VPLS instances.

2. The C-Source address. This can be a prefix.

3. The C-Group address. This can be a prefix.

When a PE distributes this information via BGP, it must include the
Route Target Extended Communities attribute. This is used as described
in section 13.1.1.

13.1.4. Default Tree/Data Tree Identifier

Default Tree and Data Tree advertisements carry the Tree identifier.
The following information elements are needed in this identifier:

1. Whether this is a shared Default Tree or not.

2. The type of the tree. For example the tree may use PIM-SM or
PIM-SSM.

3. The identifier of the tree. For trees set up using PIM the
identifier is an (S, G) value.

13.2. Suggested Encoding

This section describes a suggested BGP encoding for carrying the
information elements described above. This encoding needs further
discussion.

A new Subsequent Address Family Identifier (SAFI), called the VPLS
MCAST SAFI, is proposed. The format of the NLRI associated with this
SAFI is as follows:

            +---------------------------------+
            |       Length (2 octets)         |
            +---------------------------------+
            |     MPLS Labels (variable)      |
            +---------------------------------+
            |         RD (8 octets)           |
            +---------------------------------+
            |     Multicast Source Length     |
            +---------------------------------+
            |   Multicast Source (Variable)   |
            +---------------------------------+
            |   Multicast Group (Variable)    |
            +---------------------------------+

For Default Tree discovery, the information elements for the VPLS
instances that are mapped to the Default Tree are encoded in the NLRI.
The RD is set to the configured RD for the VPLS. The Multicast Group
is set to 0. The source address is set to the PE's P-address. This
advertisement also carries a new attribute to identify the Default
Tree. The BGP next-hop address in the NEXT_HOP attribute or the
MP_REACH_NLRI attribute is set to the PE's P-address. This P-address
is the address of the root of the tree.

For Data Tree discovery, the information elements for the (C-S, C-G)
entries that are mapped to the tree are encoded in the NLRI and are
set using the information elements described in section 13.1.2. The
address of the Data Tree root router is carried in the BGP next-hop
address of the MP_REACH_NLRI attribute.

For VPLS C-Join/Prune propagation, the information elements are
encoded in the NLRI. The address of the router originating the
C-Joins/Prunes is carried in the BGP next-hop address of the
MP_REACH_NLRI attribute.

A new optional transitive attribute called the
Multicast_Tree_Attribute is defined to signal the Default Tree or the
Data Tree. The format of this attribute is as follows:

            +---------------------------------+
            |S|  Reserved   |   Tree Type     |
            +---------------------------------+
            |        Tree Identifier          |
            |                .                |
            |                .                |
            +---------------------------------+

The S bit is set if the tree is a shared Default Tree. The Tree Type
identifies the SP multicast technology used to establish the tree.
This determines the semantics of the tree identifier. Currently three
Tree Types are defined:

   1. PIM-SSM Tree
   2. PIM-SM Tree
   3. RSVP-TE P2MP LSP

When the type is set to PIM-SM or PIM-SSM, the tree identifier
contains a PIM address. When the type is set to RSVP-TE P2MP LSP, the
tree identifier contains an RSVP-TE tuple.

Hence the MP_REACH_NLRI attribute identifies the set of VPLS
customers' multicast trees, the Multicast_Tree_Attribute identifies a
particular SP tree (i.e. a Default Tree or Data Tree), and the
advertisement of both in a single BGP Update creates a binding/mapping
between the SP tree (the Default Tree or Data Tree) and the set of
VPLS customers' trees.

14. Aggregation Methodology

In general, the heuristics used to decide which VPLS instances or
(C-S, C-G) entries to aggregate are implementation dependent. It is
also conceivable that offline tools can be used for this purpose. This
section discusses some tradeoffs with respect to aggregation.

The "congruency" of aggregation is defined by the amount of overlap in
the leaves of the client trees that are aggregated on an SP tree. For
Aggregate Default Trees, the congruency depends on the overlap in the
membership of the VPLSs that are aggregated on the Aggregate Default
Tree. If there is complete overlap, aggregation is perfectly
congruent. As the overlap between the VPLSs that are aggregated
reduces, the congruency reduces.

If aggregation is done such that it is not perfectly congruent, a PE
may receive traffic for VPLSs to which it doesn't belong. As the
amount of multicast traffic in these unwanted VPLSs increases,
aggregation becomes less optimal with respect to delivered traffic.
Hence there is a tradeoff between reducing state and delivering
unwanted traffic.

An implementation should provide knobs to control the congruency of
aggregation. This will allow an SP to deploy aggregation depending on
the VPLS membership and traffic profiles in its network. If different
PEs or RPs are setting up Aggregate Default Trees, this will also
allow an SP to engineer the maximum number of unwanted VPLSs that a
particular PE may receive traffic for.

The state/bandwidth optimality trade-off can be further improved by
having a versatile many-to-many association between client trees and
provider trees. Thus a VPLS can be mapped to multiple Aggregate Trees.
The mechanisms for achieving this are for further study. It may also
be possible to use both ingress replication and an Aggregate Tree for
a particular VPLS. Mechanisms for achieving this are also for further
study.

15. Data Forwarding

15.1. MPLS Tree Encapsulation

The following diagram shows the progression of a VPLS IP multicast
packet as it enters and leaves the SP network when MPLS trees are
being used for multiple VPLS instances. RSVP-TE P2MP LSPs are examples
of such trees.
   Packets received        Packets in transit        Packets forwarded
   at ingress PE           in the service            by egress PEs
                           provider network

                           +---------------+
                           |MPLS Tree Label|
                           +---------------+
                           |   VPN Label   |
   ++=============++       ++=============++         ++=============++
   || C-IP Header ||       || C-IP Header ||         || C-IP Header ||
   ++=============++ >>>>> ++=============++ >>>>>   ++=============++
   ||  C-Payload  ||       ||  C-Payload  ||         ||  C-Payload  ||
   ++=============++       ++=============++         ++=============++

The receiver PE does a lookup on the outer MPLS tree label and
determines the MPLS forwarding table in which to look up the inner
MPLS label. This table is specific to the tree label space. The inner
label is unique within the context of the root of the tree (as it is
assigned by the root of the tree, without any coordination with any
other nodes). Thus it is not unique across multiple roots. So, to
unambiguously identify a particular VPLS, one has to know the label
and the context within which that label is unique. The context is
provided by the outer MPLS label.

The outer MPLS label is stripped. The lookup of the resulting MPLS
label determines the VSI in which the receiver PE needs to do the
C-multicast data packet lookup. It then strips the inner MPLS label
and sends the packet to the VSI for multicast data forwarding.

15.2. IP Tree Encapsulation

The following diagram shows the progression of the packet as it enters
and leaves the SP network when Aggregate MDTs or Aggregate Data MDTs
are being used for multiple VPLS instances. MPLS-in-GRE [MPLS-IP]
encapsulation is used to encapsulate the customer multicast packets.

   Packets received        Packets in transit        Packets forwarded
   at ingress PE           in the service            by egress PEs
                           provider network

                           +---------------+
                           |  P-IP Header  |
                           +---------------+
                           |      GRE      |
                           +---------------+
                           |   VPN Label   |
   ++=============++       ++=============++         ++=============++
   || C-IP Header ||       || C-IP Header ||         || C-IP Header ||
   ++=============++ >>>>> ++=============++ >>>>>   ++=============++
   ||  C-Payload  ||       ||  C-Payload  ||         ||  C-Payload  ||
   ++=============++       ++=============++         ++=============++

The P-IP header contains the Aggregate MDT (or Aggregate Data MDT)
P-group address as the destination address and the root PE address as
the source address.

The receiver PE does a lookup on the P-IP header and determines the
MPLS forwarding table in which to look up the inner MPLS label. This
table is specific to the Aggregate MDT (or Aggregate Data MDT) label
space. The inner label is unique within the context of the root of the
MDT (as it is assigned by the root of the MDT, without any
coordination with any other nodes). Thus it is not unique across
multiple roots. So, to unambiguously identify a particular VPLS, one
has to know the label and the context within which that label is
unique. The context is provided by the P-IP header.

The P-IP header and the GRE header are stripped. The lookup of the
resulting MPLS label determines the VSI in which the receiver PE needs
to do the C-multicast data packet lookup. It then strips the inner
MPLS label and sends the packet to the VSI for multicast data
forwarding.

16. Security Considerations

Security considerations discussed in [VPLS-BGP] and [VPLS-LDP] apply
to this document.

17. Acknowledgments

Many thanks to Thomas Morin for his support of this work.
18. Normative References

   [RFC2119]   S. Bradner, "Key words for use in RFCs to Indicate
               Requirement Levels", RFC 2119, March 1997.

   [RFC3107]   Y. Rekhter, E. Rosen, "Carrying Label Information in
               BGP-4", RFC 3107.

   [VPLS-BGP]  K. Kompella, Y. Rekhter, "Virtual Private LAN Service",
               draft-ietf-l2vpn-vpls-bgp-02.txt

   [VPLS-LDP]  M. Lasserre, V. Kompella, "Virtual Private LAN Services
               over MPLS", draft-ietf-l2vpn-vpls-ldp-03.txt

   [MPLS-IP]   T. Worster, Y. Rekhter, E. Rosen, "Encapsulating MPLS in
               IP or Generic Routing Encapsulation (GRE)",
               draft-ietf-mpls-in-ip-or-gre-08.txt

   [BGP-AUTO]  H. Ould-Brahim et al., "Using BGP as an Auto-Discovery
               Mechanism for Layer-3 and Layer-2 VPNs",
               draft-ietf-l3vpn-bgpvpn-auto-04.txt

   [RSVP-P2MP] R. Aggarwal et al., "Extensions to RSVP-TE for Point to
               Multipoint TE LSPs", draft-ietf-mpls-rsvp-te-p2mp-01.txt

19. Informative References

20. Author Information

20.1. Editor Information

   Rahul Aggarwal
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: rahul@juniper.net

20.2. Contributor Information

   Yakov Rekhter
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: yakov@juniper.net

   Yuji Kamite
   NTT Communications Corporation
   Tokyo Opera City Tower
   3-20-2 Nishi Shinjuku, Shinjuku-ku,
   Tokyo 163-1421, Japan
   Email: y.kamite@ntt.com

   Luyuan Fang
   AT&T
   200 Laurel Avenue, Room C2-3B35
   Middletown, NJ 07748
   Phone: 732-420-1921
   Email: luyuanfang@att.com

   Chaitanya Kodeboniya
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: ck@juniper.net

21. Intellectual Property

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be found
in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this specification
can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.

22. Full Copyright Statement

Copyright (C) The Internet Society (2004). This document is subject to
the rights, licenses and restrictions contained in BCP 78 and except
as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

23. Acknowledgement

Funding for the RFC Editor function is currently provided by the
Internet Society.