INTERNET DRAFT                                  D. Ooms, J. De Clercq
                                                              Alcatel
                                                       February, 2002
                                                 Expires August, 2002


                    Overview of Multicast in VPNs

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document describes at a high level how the various types of VPN
   services (L3VPN, L2VPN, VPLS) can support multicast and broadcast
   delivery between sites of a VPN. Various approaches and their
   applicability are discussed.

Table of Contents

   1. Introduction
   2. Reference Architecture
   3. Where to replicate?
   3.1. CE
   3.2. PE
   3.3. P routers
   4. SP requirements for each approach
   5. Scaling properties

Ooms, et al.                Expires August 2002                [Page 1]

Internet Draft  draft-ooms-ppvpn-mcast-overview-00.txt    February 2002

   5.1. Link usage in SP network
   5.2. Number of trees in the SP network
   5.3. CE fan-out (access link duplicates)
   6. Some considerations about multicast routing
   7. Security Considerations

1. Introduction

   This document describes at a high level how the various types of VPN
   services (L3VPN [L3VPN], L2VPN [L2VPN], VPLS [VPLS]) can support
   multicast and broadcast delivery between sites of a VPN. For the VPLS
   service, broadcast capabilities are of high importance because of the
   flooding phase of self-learning bridges.

   This document does not propose a single solution; it describes the
   requirements for the Service Provider (SP), the scaling properties
   and the applicability of multiple solutions. The solutions that are
   considered are a superset of the solutions described in [ROSEN].
   Section 6 explains how point-to-multipoint trees can be created
   without the need for IP multicast routing in the SP network.

2. Reference Architecture

   Figure 1 depicts the reference architecture. In this document "VPN"
   means any of the VPN models: L3VPN, L2VPN or VPLS. The following
   assumptions apply:

   - the L2PE devices are only present in the case of VPLS and can
     coincide with the PE devices.
   - a PE can have zero or more downstream L2PEs leading to a certain
     VPN.
   - an L2PE can have zero or more downstream CEs belonging to a certain
     VPN.
   - for a certain VPN, a CE has only one upstream PE/L2PE (this is a
     limitation compared to the reference architecture in the framework
     document).
   - for a certain VPLS, an L2PE uses only one upstream PE ([L2PE]).

                      +---------------+   +----CE3<--sender
                      |               |   |
   CE1---[L2PE1]-PE1--+   P      P    +--PE2---[L2PE2]---CE4
                  |   |               |   |
           CE2----+   |       P       |   +---[L2PE3]---CE5<--sender
                      |               |                       no rcvr
                      +-----+----+----+
                            |    |
           CEB--[L2PE5]----PE4  PE3---[L2PE4]----CE6<--no rcvr
                                 |
                                 +-----CEA

                              Figure 1

   For traffic flowing from CE3 to CE1 we will call CE3 the 'Ingress CE'
   and CE1 the 'Egress CE'. Similarly, PE2 will be named the 'Ingress
   PE' and PE1 the 'Egress PE'.

   The network of Figure 1 also acts as the example network throughout
   this document. We assume that the sites CE1 to CE6 belong to the same
   VPN, let's call it VPNi. In VPNi several multicast groups can exist.
   The number of groups in VPNi is N_Gi. For one of these groups, let's
   call it Gij, the sites at CE3 and CE5 contain one or more senders. In
   the sites connected to CE5 and CE6 there are no receivers for Gij
   (as marked in Figure 1 and assumed in the example values of Table 1).
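
   As an illustration, the example network can be written down as a
   small data structure from which the example values of Table 1 follow.
   This is only a sketch: the names ATTACHMENT, RECEIVER_CES, SENDER_CES
   and derive_parameters are hypothetical, the direct attachment of CE2
   and CE3 to a PE is read from Figure 1, and the receiver set assumes,
   as Figure 1 and Table 1 indicate, that CE5 and CE6 are the sites
   without receivers.

```python
# Hypothetical encoding of the Figure 1 example network for VPNi.
# Each CE maps to (upstream L2PE or None, upstream PE).
ATTACHMENT = {
    "CE1": ("L2PE1", "PE1"),
    "CE2": (None, "PE1"),      # assumption: attached directly to PE1
    "CE3": (None, "PE2"),      # assumption: attached directly to PE2
    "CE4": ("L2PE2", "PE2"),
    "CE5": ("L2PE3", "PE2"),
    "CE6": ("L2PE4", "PE3"),
}
# Assumption: every site except CE5 and CE6 has receivers for Gij.
RECEIVER_CES = {"CE1", "CE2", "CE3", "CE4"}
SENDER_CES = {"CE3", "CE5"}


def derive_parameters(attachment, receiver_ces, sender_ces):
    """Derive counts and receiver fractions for one group Gij."""
    l2pes = {l2pe for l2pe, _ in attachment.values() if l2pe is not None}
    pes = {pe for _, pe in attachment.values()}
    # L2PEs and PEs that have at least one downstream CE with receivers.
    recv_l2pes = {attachment[ce][0] for ce in receiver_ces} - {None}
    recv_pes = {attachment[ce][1] for ce in receiver_ces}
    send_pes = {attachment[ce][1] for ce in sender_ces}
    return {
        "N_CEi": len(attachment),
        "N_L2PEi": len(l2pes),
        "N_PEi": len(pes),
        "N_S_CEij": len(sender_ces),
        "N_S_PEij": len(send_pes),
        "P_R_CEij": len(receiver_ces) / len(attachment),
        "P_R_L2PEij": len(recv_l2pes) / len(l2pes),
        "P_R_PEij": len(recv_pes) / len(pes),
    }


params = derive_parameters(ATTACHMENT, RECEIVER_CES, SENDER_CES)
for name, value in params.items():
    print(name, value)
```

   Under these assumptions the sketch finds receivers behind 4 of the 6
   CEs, 2 of the 4 L2PEs and 2 of the 3 PEs, and a single sending PE.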

   Table 1 shows the parameters that are used in this document.

   +----------+----------------------------+---------------------+
   |parameter | description                |value in example     |
   +----------+----------------------------+---------------------+
   | NETWORK                               |                     |
   +----------+----------------------------+---------------------+
   |L_CE      | average number of SP hops  |L_CE                 |
   |          | between CEs                |                     |
   |L_L2PE    | average number of SP hops  |L_L2PE               |
   |          | between L2PEs              |                     |
   |L_PE      | average number of SP hops  |L_PE>=L_L2PE>=L_CE   |
   |          | between PEs                |                     |
   +----------+----------------------------+---------------------+
   | VPNi                                  |                     |
   +----------+----------------------------+---------------------+
   |N_PEi     | #PEs involved in VPNi      |3                    |
   |N_L2PEi   | #L2PEs involved in VPNi    |4                    |
   |N_CEi     | #CEs of VPNi               |6                    |
   |N_Gi      | #groups in VPNi            |N_Gi                 |
   +----------+----------------------------+---------------------+
   |Group Gij                              |                     |
   +----------+----------------------------+---------------------+
   |N_S_CEij  | #sending CEs in Gij        |2 (CE3, CE5)         |
   |N_S_L2PEij| #sending L2PEs in Gij      |2 (L2PE2, L2PE3)     |
   |N_S_PEij  | #sending PEs in Gij        |1 (PE2)              |
   |P_R_CEij  | % of CEs of VPNi with      |66%                  |
   |          | receivers for group Gij    |(all except CE5-6)   |
   |P_R_L2PEij| % of L2PEs of VPNi with    |50%                  |
   |          | receivers for group Gij    |(all except L2PE3-4) |
   |P_R_PEij  | % of PEs of VPNi with      |66%                  |
   |          | receivers for group Gij    |(all except PE3)     |
   |BWij      | bandwidth of group Gij     |BWij                 |
   +----------+----------------------------+---------------------+

                               Table 1

3. Where to replicate?

   The replication of multicast and broadcast packets can happen in
   several locations of the VPN reference architecture: CE, PE (L2PE) or
   P routers.

3.1. CE

   The CE connected to the source of the multicast packet replicates the
   packet: one copy for every CE interested in the packet. CEs are
   multicast routing neighbors.
   The sites (including the CEs) are multicast-enabled and do further
   replication if there are multiple receivers inside a site. In this
   approach, the SP doesn't need any multicast-aware boxes.

   Since the current L2VPN model provides point-to-point connectivity,
   the replication happens in the CE (or even more upstream in the
   site).

   Replication in the CE can also be obtained in L3VPNs and VPLSs if the
   CEs create tunnels between each other to carry multicast traffic
   (which is an extra task for the network manager of the customer
   network). This results in a multicast overlay network.

   - For an L3VPN this means that the SP is involved in the VPN's
     unicast routing, but not in its multicast routing.
   - For a VPLS it means that the SP is involved in learning unicast
     routes, but not in learning multicast routes by e.g. snooping
     PIM-SM ([PIM-SM]) or IGMP messages ([IGMP-SNOOP]). Basically, the
     VPLS-bridge will not encounter any multicast packet since these are
     encapsulated in unicast. However, this replication in the CE does
     not remove the need for the PE (L2PE) to broadcast packets during
     the flooding phase.

   The disadvantage of replication in the CE is the wasted bandwidth on
   the access link between sending CE and Ingress PE.

3.2. PE

   If the replication is done by PEs, one can distinguish two variants:

   1) Both Ingress and Egress PE replicate: the Ingress PE sends a copy
      to every Egress PE (that has receivers) and the Egress PE further
      replicates to every CE (that has receivers).
   2) Only the Ingress PE replicates, so it has to send a copy to every
      CE (that has receivers).

   For L3VPNs the first variant is described in [ROSEN section 4]. The
   SP network looks like an NBMA network and the PEs are neighbors in
   the customer's multicast routing.
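
   The difference between the two variants can be made concrete by
   counting the copies of one packet that the Ingress PE sends into the
   SP network. The following is a minimal sketch, not a description of
   any implementation: the function name ingress_copies and the mapping
   format are hypothetical, the receiver placement follows the example
   values of Table 1, and copies delivered to CEs attached to the
   Ingress PE itself are assumed not to cross the SP network.

```python
def ingress_copies(receivers_by_pe, ingress_pe, variant):
    """Count the copies of one packet the Ingress PE sends into the SP network.

    receivers_by_pe maps each PE of the VPN to its attached CEs that have
    receivers. CEs behind the Ingress PE itself are served locally and are
    not counted (an assumption of this sketch).
    """
    remote = {
        pe: ces
        for pe, ces in receivers_by_pe.items()
        if pe != ingress_pe and ces
    }
    if variant == 1:
        # One copy per Egress PE with receivers; the Egress PE fans out.
        return len(remote)
    if variant == 2:
        # One copy per Egress CE with receivers; no Egress PE replication.
        return sum(len(ces) for ces in remote.values())
    raise ValueError("variant must be 1 or 2")


# Traffic sent from CE3 (behind PE2) in the example network.
example = {"PE1": ["CE1", "CE2"], "PE2": ["CE4"], "PE3": []}
copies_variant_1 = ingress_copies(example, "PE2", 1)
copies_variant_2 = ingress_copies(example, "PE2", 2)
print(copies_variant_1, copies_variant_2)
```

   In this example the first variant sends a single copy towards PE1,
   which then replicates to CE1 and CE2, whereas the second variant
   sends one copy per receiving CE behind PE1.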

   To outsource the replication of data to the PE, an L2VPN could create
   a dedicated circuit (DLCI, VPI/VCI) between CE and Ingress PE to
   carry all multicast and broadcast traffic. The Ingress PE of an L2VPN
   has an LSP to every Egress CE, so it can obtain the behavior of the
   second variant by broadcasting the data received on the dedicated
   circuit to all LSPs. If the PE of the L2VPN is able to snoop e.g. IP
   multicast routing messages, it could perform a more selective
   replication for IP multicast data. To obtain the behavior of the
   first variant, the Egress PE must be able to easily detect which
   packets possibly need replication in the Egress PE, so an extra label
   per triplet (PEi, PEj, VPNk) is needed (L2VPNs create labels per
   (CEi, CEj, VPNk)).

   As long as a PE in a VPLS hasn't learned the path for a (unicast or
   multicast) destination, it will flood the packet to all other PEs
   (variant 1). If the PE is able to learn multicast paths (by e.g.
   multicast routing snooping) it can perform a more selective
   replication for packets with an Ethernet multicast address.

3.2.1. L2PE

   In a VPLS the PE function might be split over an L2PE and a PE. If
   the replication happens in the L2PE, the L2PE should do the snooping.
   If the PEs are replicating then a dedicated LSP for 'multicast' data
   is needed between L2PE and PE. The latter LSP could also be used for
   unicast packets with non-learned paths.

3.3. P routers

   In the previous sections no new state (tunnel) was created in the P
   routers to carry multicast traffic. The methods in this section
   create new point-to-multipoint LSPs in the SP network to carry
   multicast and broadcast traffic. These new point-to-multipoint LSPs
   can be created for various flow granularities (FECs), i.e. various
   combinations of source and destination addresses. Table 2 gives an
   overview.

                                    (C1)    (C2)
      +---------------------------+-------+-------+
      |               destination |any Gij|certain|
      | source                    |in VPNi|  Gij  |
      +---------------------------+-------+-------+
      | any PE in VPNi            | 3.3.1 | 3.3.4 | (R1)
      | certain PE in VPNi        | 3.3.2 | 3.3.5 | (R2)
      | certain sending PE in VPNi| 3.3.3 | 3.3.6 | (R3)
      +---------------------------+-------+-------+

                               Table 2

3.3.1. Tree per VPN

   Only one tree per VPN is used. The tree connects all PEs that belong
   to a VPN (so it is in fact a broadcast distribution tree). This tree
   can be a bidirectional PIM-SM [BIDIR] tree or a shared PIM-SM tree
   [PIM-SM]. The tree will carry all multicast traffic of the VPN to all
   PEs (also to PEs that belong to the VPN, but don't have downstream
   receivers for the group).

   For L3VPNs this approach is described in [ROSEN section 2].

   Application to VPLSs is straightforward, since it is simply a matter
   of putting all packets that have non-learned paths on this
   distribution tree (both unicast and multicast packets). There is no
   need for the Ingress PE to learn multicast paths because multicast
   packets are always put on this single distribution tree. To enable
   early filtering of non-wanted multicast packets, an Egress PE could
   learn the multicast paths.

   An L2VPN can use such a tree by bringing all data that should be
   multicast or broadcast via a dedicated circuit to the Ingress PE. The
   Egress PE should replicate the data received from the tree to all
   connected CEs. To enable early filtering of non-wanted multicast
   packets, an Egress PE could learn which Egress CEs have receivers.

   The advantage of this approach is that only one tree is created. The
   disadvantage is that bidirectional trees are not part of the base
   specification of PIM-SM and that shared trees do not combine well
   with MPLS ([MPLS-MC], [ROSEN]: a shared tree requires an extra GRE
   encapsulation).

3.3.2. Tree per (PE, VPN)

   In this approach N_PEi source trees are created per VPN. These trees
   can be created by ASM PIM-SM, SSM-only PIM-SM or a simple extension
   to existing MPLS signaling protocols (see section 6).

   Using source trees has the additional advantage that the mapping of
   source trees onto point-to-multipoint LSPs is straightforward
   ([MPLS-MC]).

3.3.3. Tree per (sending PE, VPN)

   To limit the number of source trees one could construct source trees
   only for Ingress PEs that have sources. Whether a PE has a multicast
   source could be communicated via BGP to the peer PEs. BGP itself
   could get this information by configuration or by a mechanism that
   discovers whether there are sources active for any group.

3.3.4. Tree per group

   The approaches in section 3.3.1 to 3.3.3 have the disadvantage that
   data is also delivered to Egress PEs that have no interest in the
   data. In this approach only PEs that have receivers for a group are
   connected to the distribution tree, which is a bidirectional or a
   shared tree.

   For L3VPNs this approach is described in [ROSEN section 3].

   To apply this to L2VPNs or VPLSs the Egress PEs have to learn whether
   they have downstream receivers for a certain group. In case they have
   downstream receivers they should join the distribution tree for this
   group.

   The disadvantage is that a tree for every group of every VPN is
   created in the SP network, which is a scaling issue.

   When there are many groups in a VPN and N_PEi is not too large, one
   could consider creating a tree for every possible combination of PEs
   belonging to VPNi.

3.3.5. Tree per (PE, group)

   See parts of 3.3.4 and 3.3.2.

3.3.6. Tree per (sending PE, group)

   See parts of 3.3.4 and 3.3.3.

3.3.7. Tree for some (sending PE, group)

   To further tune the number of trees one could consider creating trees
   in the SP network only for high-bandwidth flows; this would require
   an additional mechanism to detect the highly active (source, group)
   couples.

4. SP requirements for each approach

   The various approaches pose different requirements to the network of
   the Service Provider. Table 3 gives an overview of which features are
   required for which approach.

      +--------+---+---+---+---+---+---+---+
      |approach| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
      +--------+---+---+---+---+---+---+---+
      | 3.1    |   |   |   |   |   |   |   |
      | 3.2    | X |   |   |   |   |   |   |
      | 3.3.1  | X | X or X|   |   | X |   |
      | 3.3.2  | X |   |X or X or X| X |   |
      | 3.3.3  | X |   |X or X or X| X | X |
      | 3.3.4  | X | X or X|   |   | X |   |
      | 3.3.5  | X |   |X or X or X| X |   |
      | 3.3.6  | X |   |X or X or X| X | X |
      +--------+---+---+---+---+---+---+---+

      (1) multicast routing/snooping on customer itf of PE router
      (2) ASM bidir PIM-SM in SP network
      (3) ASM PIM-SM in SP network (shared tree)
      (4) SSM PIM-SM in SP network
      (5) extension to MPLS signaling
      (6) point-to-multipoint LSPs
      (7) active source detection

                               Table 3

5. Scaling properties

5.1. Link usage in SP network

   Table 4 indicates the link*bandwidth product used in the SP network
   by one multicast group. The Chuang-Sirbu law [CHUANG] states that:

      Lm = (M**0.8) * Lu

   with:
      Lm: the number of links in a multicast tree
      M:  the number of leaves of the tree
      Lu: the (average) number of links in a unicast path
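
   The law can be applied to the example network to obtain the
   per-approach factors of the link*bandwidth product that Table 4
   tabulates. The sketch below is illustrative only: the function names
   multicast_tree_links and link_usage_factors are hypothetical, the
   per-approach expressions are the ones given in Table 4, and the
   average path lengths (L_CE, L_L2PE, L_PE) and BWij are left out so
   that only the multiplying factor is returned.

```python
def multicast_tree_links(leaves, unicast_links):
    """Chuang-Sirbu estimate of the links in a tree: Lm = M**0.8 * Lu."""
    return leaves ** 0.8 * unicast_links


def link_usage_factors(n_ce, n_l2pe, n_pe, p_r_ce, p_r_l2pe, p_r_pe):
    """Return, per approach, the factor that multiplies L_x * BWij.

    Unicast replication (3.1, 3.2, 3.2.1) uses one unicast path per
    receiving device; the tree-based approaches use the Chuang-Sirbu law
    with a unicast path length of 1.
    """
    return {
        "3.1": n_ce * p_r_ce,          # times L_CE * BWij
        "3.2": n_pe * p_r_pe,          # times L_PE * BWij
        "3.2.1": n_l2pe * p_r_l2pe,    # times L_L2PE * BWij
        "3.3.1-3": multicast_tree_links(n_pe, 1),           # times L_PE * BWij
        "3.3.4-6": multicast_tree_links(n_pe * p_r_pe, 1),  # times L_PE * BWij
    }


# Example values from Table 1: 6 CEs, 4 L2PEs, 3 PEs; receivers behind
# 4 of 6 CEs, 2 of 4 L2PEs and 2 of 3 PEs.
factors = link_usage_factors(6, 4, 3, 4 / 6, 0.5, 2 / 3)
for approach, factor in factors.items():
    print(approach, round(factor, 1))
```

   With the example values this gives factors of about 4 and 2 for the
   unicast replication approaches, about 2.4 for a tree reaching all
   three PEs, and about 1.7 for a tree reaching only the two PEs that
   have receivers.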

   +--------+---------------------------------------+-------------+
   |approach| general expression                    | example     |
   +--------+---------------------------------------+-------------+
   | 3.1    | N_CEi*P_R_CEij*L_CE*BWij              |4*L_CE*BWij  |
   | 3.2    | N_PEi*P_R_PEij*L_PE*BWij              |2*L_PE*BWij  |
   | 3.2.1  | N_L2PEi*P_R_L2PEij*L_L2PE*BWij        |2*L_L2PE*BWij|
   | 3.3.1  | (N_PEi**0.8)*L_PE*BWij                |2.4*L_PE*BWij|
   | 3.3.2  |                   "                   |      "      |
   | 3.3.3  |                   "                   |      "      |
   | 3.3.4  | ((N_PEi*P_R_PEij)**0.8)*L_PE*BWij     |1.7*L_PE*BWij|
   | 3.3.5  |                   "                   |      "      |
   | 3.3.6  |                   "                   |      "      |
   +--------+---------------------------------------+-------------+

                               Table 4

   One can determine the link efficiency of sub-optimal approaches like
   3.2 and 3.3.1-3 by comparing them with the optimal (but less
   scalable) approaches 3.3.4-6. The efficiency of 3.2 as a function of
   P_R_PEij is depicted in Figure 2.

                Eff(3.2)
                  ^
                  |
                 1| \
                  |  -
                  |   \
                  |    --
                  |      \
   1.15/N_PEi**0.2|       ---
                  |          \
      1/N_PEi**0.2|           -----
                  +-------------------------->
                          0.5     1      P_R_PEij

                              Figure 2

   The efficiency of 3.3.1-3 as a function of P_R_PEij is depicted in
   Figure 3.

              Eff(3.3.1-3)
                  ^
                  |
                 1|                /
                  |              /
                  |            /
                  |          /
              0.57|        /
                  |      /
                  |    /
                  |  /
                  +----------------------->
                          0.5      1     P_R_PEij

                              Figure 3

   One can also determine the trade-off between 3.2 and 3.3.1-3: when is
   3.2 more efficient than 3.3.1-3 and vice versa. The ratio of the link
   usage is:

      3.2/3.3.1-3 = N_PEi**0.2 * P_R_PEij

   Figure 4 shows the boundary where this ratio equals 1, i.e. where
   P_R_PEij = N_PEi**(-0.2); each region is labeled with the approach
   that uses fewer links there.

               P_R_PEij
                  ^
                  |
                 1| \
                  |  -
                  |   \     3.3.1-3
                  |    --
               0.5|      \
                  | 3.2   ---
                  |          \
                  |           -----
                  +-------------------------->
                    1    32                   N_PEi

                              Figure 4

   As long as N_PEi is small, or for larger N_PEi as long as the
   membership percentage is small, it stays beneficial (in terms of link
   usage) to duplicate in the PE (approach 3.2).

5.2. Number of trees in the SP network

   Table 5 shows the number of trees created in the SP network by the
   N_Gi multicast groups belonging to VPNi.

   +--------+---------------------------------------+-------------+
   |approach| general expression                    | example     |
   +--------+---------------------------------------+-------------+
   | 3.1    | 0                                     | 0           |
   | 3.2    | 0                                     | 0           |
   | 3.2.1  | 0                                     | 0           |
   | 3.3.1  | 1                                     | 1           |
   | 3.3.2  | N_PEi                                 | 3           |
   | 3.3.3  | N_S_PEij                              | 1           |
   | 3.3.4  | N_Gi                                  | N_Gi        |
   | 3.3.5  | N_Gi*N_PEi                            | 3*N_Gi      |
   | 3.3.6  | N_Gi*N_S_PEij                         | N_Gi        |
   +--------+---------------------------------------+-------------+

                               Table 5

   Remember that each VPNi will generate this number of trees in the SP
   network.

5.3. CE fan-out (access link duplicates)

   Approach 3.1 will carry N_CEi*P_R_CEij (- 1) duplicate packets over
   the access link. All other approaches don't generate duplicate
   traffic on the access link.

6. Some considerations about multicast routing

   Approaches 3.3.2-3 and 3.3.5-6 can be achieved by running multicast
   routing in the provider's network and mapping IP multicast trees onto
   point-to-multipoint LSPs, but another solution which doesn't require
   IP multicast in the provider's network is also possible.

   Let's have a look at the tasks performed by a multicast routing
   protocol (such as PIM-SM):

   1. Source discovery mechanism: this allows the receivers to discover
      the sources for a certain group.
   2. Switch-over from shared to source trees.
   3. Hop-by-hop signaling to create a tree.
   4. Creation of tree state in every router.

   Now, for the above mentioned approaches:

   1. The source PEs are already known via BGP.
   2. There are only source trees, so there is no need for a
      switch-over.
   3-4. These are functions that are typically also performed by MPLS.
      Since the SP network already offers (unicast) VPNs, it is already
      MPLS-enabled and thus these functions are basically available.
      Since the Ingress PE knows all Egress PEs and the Egress PE knows
      all Ingress PEs, the label distribution can be Ingress (root) or
      Egress (leaf) initiated ([CHENG], [OOMS]).

   Further, an IP multicast tree is nothing other than a
   point-to-multipoint connection, but with some limitations:

   1. The class D address has two roles:
      - connection identifier
      - the key to the 'connection table', which is the same in every
        hop
      In other connection-oriented protocols (e.g. MPLS) these roles are
      clearly separated:
      - the connection identifier is the FEC
      - the key to the 'connection table' is a label that can be swapped
      This means e.g. that if approach 3.3.2 is used with an MPLS
      extension one could use (PE, VPN) as the FEC (connection
      identifier), but when approach 3.3.2 is used with multicast
      routing an extra step is required to allocate a unique class D
      address.

   2. The IP multicast trees are always constructed along the Reverse
      Shortest Path. Other connection-oriented protocols (like MPLS)
      allow deviation from the shortest path, which enables goodies like
      Traffic Engineering ([CHENG], [OOMS]).

7. Security Considerations

   Security considerations will be addressed in a future revision of
   this document.

References

   [BIDIR] "Bi-directional Protocol Independent Multicast", M. Handley,
      I. Kouvelas, T. Speakman, L. Vicisano, work in progress, Internet
      Draft, June 2001.

   [CHENG] "RSVP-TE: Extensions to RSVP for Multicast LSP Tunnels", D.
      Cheng, work in progress, Internet Draft, October 2001.

   [CHUANG] "Pricing Multicast Communication: a cost-based approach", J.
      Chuang, M. Sirbu, Telecommunications Systems 17:3, 281-297, 2001
      (Kluwer Academic Publishers).

   [IGMP-SNOOP] "IGMP and MLD snooping switches", M. Christensen, F.
      Solensky, work in progress, Internet Draft, January 2002.

   [L2PE] "Decoupled Virtual Private LAN Services", K.
      Kompella et al, work in progress, Internet Draft, November 2001.

   [L2VPN] "Layer 2 VPNs Over Tunnels", K. Kompella et al, work in
      progress, Internet Draft, November 2001.

   [L3VPN] "A Framework for Layer Provider Provisioned Virtual Private
      Networks", R. Callon et al, work in progress, Internet Draft,
      February 2002.

   [MPLS-MC] "Multicast in MPLS networks", D. Ooms et al, work in
      progress, Internet Draft, January 2002.

   [OOMS] "MPLS Multicast Traffic Engineering", D. Ooms et al, work in
      progress, Internet Draft, February 2002.

   [PIM-SM] "Protocol Independent Multicast-Sparse Mode (PIM-SM)", B.
      Fenner et al, work in progress, Internet Draft, November 2001.

   [ROSEN] "Multicast in MPLS/BGP VPNs", E. Rosen et al, work in
      progress, Internet Draft, July 2001.

   [SSM] "Source-Specific Multicast for IP", H. Holbrook, B. Cain, work
      in progress, Internet Draft, November 2001.

   [VPLS] "Virtual Private LAN Service", K. Kompella et al, work in
      progress, Internet Draft, November 2001.

Authors' Addresses

   Dirk Ooms
   Alcatel
   Fr. Wellesplein 1, 2018 Antwerpen, Belgium.
   Phone : 32 3 2404732
   E-mail: Dirk.Ooms@alcatel.be

   Jeremy De Clercq
   Alcatel
   Fr. Wellesplein 1, 2018 Antwerpen, Belgium.
   Phone : 32 3 2404752
   E-mail: Jeremy.De_Clercq@alcatel.be