Network Working Group Kireeti Kompella Internet Draft Manoj Leelanivas Expiration Date: April 2001 Quaizar Vohra Juniper Networks Javier Achirica Ronald Bonica Telefonica Data WorldCom Chris Liljenstolpe Eduard Metz Cable & Wireless KPN Dutch Telecom Chandramouli Sargor Vijay Srinivasan CoSine Communications MPLS-based Layer 2 VPNs draft-kompella-mpls-l2vpn-00.txt 1. Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.'' The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Kompella et al. [Page 1] Internet Draft draft-kompella-mpls-l2vpn-00.txt October 2000 2. Abstract Virtual Private Networks (VPNs) based on Frame Relay or ATM circuits have been around a long time. While these VPNs work well, the costs of maintaining separate networks for Internet traffic and VPNs and the administrative burden of provisioning these VPNs have led Service Providers to look for alternative solutions. In this document, we present a VPN solution where from the customer's point of view, the VPN is based on Layer 2 circuits, but the Service Provider maintains and manages a single MPLS-based network for IP, MPLS IP VPNs, and Layer 2 VPNs. 3. Introduction The first corporate networks were based on dedicated leased lines interconnecting the various offices of the corporation. 
Such networks offered connectivity and little else: they didn't scale well, they were expensive for the service providers (and hence for their customers), and provisioning them was a slow and arduous task.

The first Virtual Private Networks (VPNs) were based on Layer 2 circuits: X.25, Frame Relay and ATM (see [VPN]). Layer 2 VPNs were easier to provision, and virtual circuits allowed the service provider to share a common infrastructure for all the VPNs. These features were passed on to the customers in terms of cost savings.

However, while Layer 2 VPNs were a significant step forward from dedicated lines, they still had their drawbacks. First, they tied the service provider VPN infrastructure to a single medium (e.g., ATM). This became even more of a burden if the Internet infrastructure was to share the same physical links. Second, the Internet infrastructure and the VPN infrastructure, even if they shared the same physical network, needed separate administration and maintenance. Third, while provisioning was much easier than for dedicated lines, it was still complex. This was especially evident in the effort to add a site to an existing VPN.

This document offers a solution that preserves the advantages of a Layer 2 VPN while allowing the Service Provider to maintain and manage a single (MPLS-based) network for IP, MPLS IP VPNs ([IPVPN]) and Layer 2 VPNs, and reducing the provisioning problem significantly. In particular, adding a site to an existing VPN in most cases requires configuring just the Provider Edge router connected to the new site.

The rest of this section discusses the relative merits of MPLS-based Layer 2 and Layer 3 VPNs. Section 4 describes the operation of an MPLS-based Layer 2 VPN. Sections 5 and 6 offer two alternative means of signalling Layer 2 VPNs, one using LDP and the other using BGP.

3.1. Terminology

We assume that the reader is familiar with Multi-Protocol Label Switching (MPLS [MPLS]), the Label Distribution Protocol (LDP [LDP]) and the Border Gateway Protocol version 4 (BGP [BGP]). The terminology we use follows.

A "customer" is a customer of a Service Provider seeking to interconnect the various "sites" (independently connected networks) through the Service Provider's network, while maintaining privacy of communication and address space. The device in a customer site that connects to a Service Provider router is termed the CE (customer edge device); this device may be a router or a switch. The Service Provider router to which a CE connects is termed a PE. A router in the Service Provider's network which doesn't connect directly to any CE is termed P. These definitions follow those given in [IPVPN].

3.2. Advantages of Layer 2 VPNs

We define a Layer 2 VPN as one where a Service Provider provides a layer 2 network to the customer. As far as the customer is concerned, they have (say) Frame Relay circuits connecting the various sites; each CE is configured with a DLCI with which to talk to other CEs. Within the Service Provider's network, though, the layer 2 packets are transported within MPLS Label-Switched Paths (LSPs).

The Service Provider does not participate in the customer's layer 3 network, in particular, in the routing, resulting in several advantages to the SP as a whole and to PE routers in particular.

3.2.1. Separation of Administrative Responsibilities

In a Layer 2 VPN, the Service Provider is responsible for Layer 2 connectivity; the customer is responsible for Layer 3 connectivity, which includes routing. If the customer says that host x in site A cannot reach host y in site B, the Service Provider need only demonstrate that site A is connected to site B. The details of how routes for host y reach host x are the customer's responsibility.
Another very important factor is that once a PE provides Layer 2 connectivity to its connected CE, its job is done. A misbehaving CE can at worst flap its interface. On the other hand, a misbehaving CE in a Layer 3 VPN can flap its routes, leading to instability of the PE router or even the entire SP network. This means that the Service Provider must aggressively damp route flaps from a CE; this is common enough with external BGP peers, but in the case of VPNs, the scale of the problem is much larger; also, the CE-PE routing protocol may not be BGP, and thus not have BGP's flap damping control.

3.2.2. Migrating from Traditional Layer 2 VPNs

Since "traditional" Layer 2 VPNs (i.e., real Frame Relay circuits connecting sites) are indistinguishable from MPLS-based VPNs from the customer's point-of-view, migrating from one to the other raises few issues. With Layer 3 VPNs, special care has to be taken that routes within the traditional VPN are not preferred over the Layer 3 VPN routes (the so-called "backdoor routing" problem, whose solution requires protocol changes that are somewhat ad hoc).

3.2.3. Privacy of Routing

In a Layer 2 VPN, the privacy of customer routing is a natural fallout of the fact that the Service Provider does not participate in routing. The SP routers need not do anything special to keep customer routes separate from other customers or from the Internet; there is no need for per-VPN routing tables, and the additional complexity this imposes on PE routers.

3.2.4. Layer 3 Independence

Since the Service Provider simply provides Layer 2 connectivity, the customer can run any Layer 3 protocols they choose. If the SP were participating in customer routing, it would be vital that the customer and SP both use the same layer 3 protocol(s) and routing protocols.

3.2.5. Multicast Routing

Supporting IP multicast over MPLS-based Layer 3 VPN is as yet undocumented.
In the Layer 2 VPN case, the CE routers run native multicast routing directly. The SP backbone just provides pipes to connect the CE routers; whether the CE routers run IP unicast or IP multicast or some other network protocol is irrelevant to the SP routers.

3.2.6. PE Scaling

In the Layer 2 VPN scheme described below, each PE transmits to every other PE a single small chunk of information about each CE to which it is connected. That means that each PE need only maintain a single chunk of information from each CE in each VPN, and keep a single "route" to every site in every VPN. This means that both the Forwarding Information Base and the Routing Information Base scale well with the number of sites and number of VPNs. Furthermore, the scaling properties are independent of the customer: the only germane quantity is the total number of VPN sites.

This is to be contrasted with Layer 3 VPNs, where each CE in a VPN may have an arbitrary number of routes that need to be carried by the SP. This leads to two issues. First, both the information stored at each PE and the number of routes installed by the PE for a CE in a VPN can be (in principle) unbounded, which means in practice that a PE must restrict itself to installing routes associated with the VPNs that it is currently a member of. Second, a CE can send a large number of routes to its PE, which means that the PE must protect itself against such a condition. Thus, the SP must enforce limits on the number of prefixes accepted from a CE; this in turn requires the PE router to offer such control.

The scaling issues of Layer 3 VPNs come into sharp focus at a BGP route reflector (RR). An RR cannot keep all the advertised routes in every VPN since the number of routes will be too large.
The following solutions/extensions are needed to address this issue:

1) RRs could be partitioned so that each RR services a subset of VPNs so that no single RR has to carry all the routes. This method has the disadvantage that a PE changing its VPN membership could force a change in the RR configuration, and would require carefully constructing RR topologies.

2) An RR could use a preconfigured list of Route-Targets for its inbound route filtering. The RR may also need to install Outbound Route Filters [BGP-ORF] which contain the above list of Route-Targets on each of its peers so that they do not send unnecessary VPN routes. This method also requires significant extensions, and multiple RRs are still needed to service different sets of VPNs.

3.2.7. Ease of Configuration

Configuring traditional Layer 2 VPNs was a burden primarily because of the O(n*n) nature of the task. If there are n CEs in a Frame Relay VPN, say full-mesh connected, n*(n-1)/2 DLCI PVCs must be provisioned across the SP network. At each CE, (n-1) DLCIs must be configured to reach each of the other CEs. Furthermore, when a new CE is added, n new DLCI PVCs must be provisioned; also, each existing CE must be updated with a new DLCI to reach the new CE.

In our proposal, the provisioning of "PVCs" across the SP network is handled by signalling protocols (LDP, RSVP-TE), reducing a large part of the provisioning burden. Furthermore, we assume that DLCIs at the CE edge are relatively cheap; and labels in the SP network are cheap. This allows the SP to "over-provision" VPNs: for example, allocate 50 CEs to a VPN when only 20 are needed. With this over-provisioning, adding a new CE to a VPN requires configuring just the new CE and its associated PE; existing CEs and their PEs need not be re-configured.

3.3. Advantages of Layer 3 VPNs

Layer 3 VPNs ([IPVPN] in particular) offer a good solution when the customer traffic is wholly IP, customer routing is reasonably simple, and the customer sites connect to the SP with a variety of Layer 2 technologies.

3.3.1. Layer 2 Independence

One major restriction in a Layer 2 VPN is that the Layer 2 medium with which the various sites of a single VPN connect to the SP must be uniform. On the other hand, the various sites of a Layer 3 VPN can connect to the SP with any supported media; for example, some sites may connect with Frame Relay circuits, and others with Ethernet.

A corollary to this is that the number of sites that can be in a Layer 2 VPN is determined by the number of Layer 2 circuits that the Layer 2 technology provides. For example, if the Layer 2 technology is Frame Relay with 2-byte DLCIs, a CE can connect to at most about a thousand other CEs in a VPN.

3.3.2. SP Routing as Added Value

Another problem with Layer 2 VPNs is that the CE router in a VPN must be able to deal with having N routing peers, where N is the number of sites in the VPN. This can be alleviated by manipulating the topology of the VPN. For example, a hub-and-spoke VPN architecture means that only one CE router (the hub) needs to deal with N neighbors. However, in a Layer 3 VPN, a CE router need only deal with one neighbor, the PE router. Thus, the SP can offer Layer 3 VPNs as a value-added service to its customers.

Moreover, with Layer 2 VPNs it is up to a customer to build and operate the whole network. With Layer 3 VPNs, a customer is just responsible for building and operating routing within each site, which is likely to be much simpler than building and operating routing for the whole VPN. That, in turn, makes Layer 3 VPNs more suitable for customers who don't have sufficient routing expertise, again allowing the SP to provide added value.

3.3.3. Class-of-Service

Class-of-Service issues have been addressed for Layer 3 VPNs. Since the PE router has visibility into the network layer (IP), the PE router can take on the tasks of CoS classification and routing. Class-of-Service issues for Layer 2 VPNs will be addressed in a future revision.

4. Operation of a Layer 2 VPN

The following simple example of a customer with 4 sites connected to 3 PE routers in a Service Provider network will hopefully illustrate the various aspects of the operation of a Layer 2 VPN. For simplicity, we assume that a full-mesh topology is desired.

In what follows, Frame Relay serves as the Layer 2 medium, and each CE has multiple DLCIs to its PE, each to connect to another CE in the VPN. If the Layer 2 medium were ATM, then each CE would have multiple VPI/VCIs to connect to other CEs. For PPP and Cisco HDLC, each CE would have multiple physical interfaces to connect to other CEs.

4.1. Network Topology

Consider a Service Provider network with edge routers PE0, PE1, and PE2. Assume that PE0 and PE1 are IGP neighbors, and PE2 is more than one hop away from PE0.

Suppose that a customer C has 4 sites S0, S1, S2 and S3 that C wants to connect via the Service Provider's network using Frame Relay. Site S0 has CE0 and CE1 both connected to PE0. Site S1 has CE2 connected to PE0. Site S2 has CE3 connected to PE1 and CE4 connected to PE2. Site S3 has CE5 connected to PE2. (See Figure 1 below.)

Suppose further that C wants to "over-provision" each current site, in expectation that the number of sites will grow to at least 10 in the near future. However, CE4 is only provisioned with 9 DLCIs.

Suppose finally that CE0 and CE2 have DLCIs 100 through 109 free; CE1 and CE3 have DLCIs 200 through 209 free; CE4 has DLCIs 107, 209, 265, 301, 414, 555, 654, 777 and 888 free; and CE5 has DLCIs 417 through 426 free.

                  Figure 1: Example Network Topology

          S0                                                 S3
    ..............                                     ..............
    .            .                                     .            .
    .  +-----+   .                                     .            .
    .  | CE0 |-----------+                             .  +-----+   .
    .  +-----+   .       |                             .  | CE5 |   .
    .            .       |                             .  +--+--+   .
    .  +-----+   .       |                             .      |     .
    .  | CE1 |-------+   |                             .......|......
    .  +-----+   .   |   |                                   /
    .            .   |   |                                  /
    ..............   |   |                                 /
                     |   |    SP Network                  /
                .....|...|.............................../.....
                .    |   |                              /     .
                .  +-+---+-+       +-------+           /      .
                .  |  PE0  |-------|   P   |--        |       .
                .  +-+---+-+       +-------+  \       |       .
                .   /      \                   \  +---+---+   .
                .  |        -----+              --|  PE2  |   .
                .  |             |                +---+---+   .
                .  |         +---+---+               /        .
                .  |         |  PE1  |              /         .
                .  |         +---+---+             /          .
                .  |             \                /           .
                ...|.............|.............../.............
                   |             |              /
                   |             |             /
                   |             |            /
          S1       |             |   S2      /
    .............. |     ........|........../......
    .            . |     .       |         |      .
    .  +-----+   . |     .    +--+--+   +--+--+   .
    .  | CE2 |-----+     .    | CE3 |   | CE4 |   .
    .  +-----+   .       .    +-----+   +-----+   .
    .            .       .                        .
    ..............       ..........................

4.2. Configuration

The following sub-sections detail the configuration that is needed to provision the above VPN. For the purpose of exposition, we assume that the customer will connect to the SP with Frame Relay circuits, and that the customer's IGP of choice is OSPF.

While we focus primarily on the configuration that an SP has to do, we touch upon the configuration requirements of CEs as well. The main point of contact in CE-PE configuration is that both must agree on the DLCIs that will be used on the interface connecting them.

If the PE-CE connection is Frame Relay, it is recommended to run LMI between the PE and CE with the PE as DCE and the CE as DTE. For the case of ATM VCs, OAM cells may be used; for PPP and Cisco HDLC, keepalives may be used.

4.2.1. CE Configuration

Each CE that belongs to a VPN is given a "CE ID". CE IDs must be unique in the context of a VPN. We assume that the CE ID for CE-k is k.

Each CE is also configured with a maximum number of CEs that it can connect to; this is the CE's "range".

Each CE is configured to communicate with its corresponding PE with the set of DLCIs given above; for example, CE0 is configured with DLCIs 100 through 109. OSPF is configured to run over each DLCI.

Each CE also "knows" which DLCI connects it to each other CE. A simple algorithm is to use the CE ID of the other CE as an index into the DLCI list this CE has (with zero-based indexing, i.e., 0 is the first index). For example, CE0 is connected to CE3 through its fourth DLCI, 103; CE4 is connected to CE2 by the third DLCI in its list, namely 265. This is the methodology used in the examples below; the actual methodology used to pick the DLCI to be used is a local matter; the key factor is that CE-k may communicate with CE-m using a different DLCI from the DLCI that CE-m uses to communicate to CE-k, i.e., the SP network effectively acts as a giant Frame Relay switch. This is very important, as it decouples the DLCIs used at each CE site, making for much simpler provisioning.

4.2.2. PE Configuration

Each PE is configured with the VPNs in which it participates. Each VPN has a VPN ID that is unique within the SP network. For each VPN, the PE has a list of CEs that are members of that VPN. For each CE, the PE knows the CE ID, which DLCIs to expect from the CE, and the CE's range.

4.2.3. Adding a New Site

The first step in adding a new site to a VPN is to pick a new CE ID. If all current members of the VPN are over-provisioned, i.e., their range includes the new CE ID, adding the new site is a purely local task. Otherwise, the sites whose ranges do not include the new CE ID and that wish to communicate directly with the new CE must have their ranges increased to incorporate the new CE ID.

The next step is ensuring that the new site has the required connectivity (see below).
This may require tweaking the connectivity mechanism; however, in several common cases, the only configuration needed is local to the PE to which the CE is attached.

The rest of the configuration is a local matter between the new CE and the PE to which it is attached.

It bears repeating that the key to making additions easy is over-provisioning. However, what is being over-provisioned is the number of DLCIs/VCIs that connect the CE to the PE. This is a local matter, and generally is not an issue.

4.3. PE Information Exchange

When a PE is configured with all the needed information for a CE, it first chooses a contiguous set of n labels, where n is the CE's range. Call the smallest label in this set the label-base. The PE then advertises (for this CE): its Router ID, the VPN ID, the CE ID, the CE's range, and the label-base. This is the basic Layer 2 VPN advertisement. This same advertisement is sent to all other PEs. Note that PEs that are not part of the VPN can receive and keep this information, in case at some future point a CE connected to the PE joins the VPN.

If the PE-CE connection goes down, or the CE configuration is removed, the above advertisement is withdrawn.

4.3.1. PE Advertisement Processing

When a PE receives a Layer 2 VPN advertisement, it checks if the VPN ID matches any VPN that it is a member of. If not, the PE just stores the advertisement for future use. Otherwise, suppose the advertisement is from PE A for VPN X, CE m with range Rm and label base Lm. For each CE that the receiving PE B is connected to that is a member of VPN X, PE B does the following.

0) Look up the configuration information associated with the CE. If the encapsulation type for VPN X in the advertisement does not match the configured encapsulation type for VPN X, stop.
1) Say the configured CE ID is k, the range is Rk, and the DLCI list is Dk[].
Also, get the label base PE B allocated for this CE, say Lk.
2) Check if k = m. If so, issue an error: "CE ID k has been allocated to two CEs in VPN X (check CE at PE A)". Stop.
3) Check if k >= Rm, or m >= Rk. If so, issue a warning: "Cannot communicate with CE m (PE A) of VPN X: outside range". Stop.
4) Look in the appropriate table to see which label will get to PE A. This is the "outer" label, Z.
5) The DLCI that CE-k will use to talk to CE-m is Dk[m]. The "inner" label for sending packets to CE-m is (Lm + k). The "inner" label on which to expect packets from CE-m is (Lk + m).
6) Install a "route" such that packets from CE-k with DLCI Dk[m] will be sent with outer label Z, inner label (Lm + k). Also, install a route such that packets received with label (Lk + m) will be mapped to DLCI Dk[m] and be sent to CE-k.
7) Activate DLCI Dk[m] to the CE. This can be done using LMI.

If an advertisement is withdrawn, the appropriate DLCI must be de-activated, and the corresponding routes must be removed from the forwarding table.

4.3.2. Example of PE Advertisement Processing

Consider the example network of Figure 1. Let the VPN connecting S0, S1, S2 and S3 have a VPN ID of 1. Suppose PE2 receives an advertisement from PE0 for VPN 1, CE ID 0 with CE range R0 = 10 and label base L0 = 1000. Since PE2 is connected to CE4, which is also in VPN 1, PE2 does the following:

0) Look up the configuration information associated with CE4. The advertised encapsulation type matches the configured encapsulation type (both are Frame Relay), so proceed.
1) CE4's range R4 is 9, its DLCI list D4[] is [107, 209, 265, 301, 414, 555, 654, 777, 888], and its label base L4 is 4000.
2) CE0 and CE4 have ids 0 and 4 respectively, so the error in step 2 of 4.3.1 does not apply.
3) Since CE4's id is less than R0, and CE0's id is less than R4, the warning in step 3 of 4.3.1 does not apply.
4) Look in the appropriate table on PE2 to see which label will get to PE0. Let the label be 10001.
5) The DLCI that CE4 will use to talk to CE0 is D4[0], i.e., 107. The inner label for sending packets to CE0 is (L0 + 4), i.e., 1004. The inner label on which to expect packets from CE0 is (L4 + 0), i.e., 4000.
6) Install a "route" such that packets from CE4 with DLCI 107 will be sent with outer label 10001, inner label 1004. Also, install a route such that packets received with label 4000 will be mapped to DLCI 107 and be sent to CE4.
7) Activate DLCI 107 to CE4.

Since CE5 is also attached to PE2, PE2 needs to do processing similar to the above for CE5.

Similarly, when PE0 receives an advertisement from PE2 for VPN 1, CE4, with range R4 = 9 and label base L4 = 4000, PE0 processes the advertisement for CE0 (and for CE1, which is also in VPN 1):

0) Look up the configuration information associated with CE0. The advertised encapsulation type matches the configured encapsulation type (both are Frame Relay), so proceed.
1) CE0's range, R0, is 10, its DLCI list D0[] is [100 - 109], and its label base L0 is 1000.
2) CE0 and CE4 have ids 0 and 4 respectively, so the error in step 2 of 4.3.1 does not apply.
3) Since CE4's id is less than R0, and CE0's id is less than R4, the warning in step 3 of 4.3.1 does not apply.
4) Let the outer label to reach PE2 be 9999.
5) The DLCI which CE0 will use to talk to CE4 is D0[4], i.e., 104. The inner label for sending packets to CE4 is (L4 + 0), i.e., 4000. The inner label on which to expect packets from CE4 is (L0 + 4), i.e., 1004.
6) Install a "route" such that packets from CE0 with DLCI 104 will be sent with outer label 9999, inner label 4000. Also, install a route such that packets received with label 1004 will be mapped to DLCI 104 and be sent to CE0.
7) Activate DLCI 104 to CE0.
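
The two computations above can be reproduced with a short sketch of steps 0 through 6 of Section 4.3.1. This is only an illustration under stated assumptions: the names Advertisement, LocalCE, Route and process are hypothetical and do not come from this document, a CE's range is taken to be the length of its DLCI list, the VPN membership check is folded into step 0, and step 7 (activating the DLCI, e.g. via LMI) is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Advertisement:
    """Fields of the basic Layer 2 VPN advertisement used in Section 4.3.1."""
    pe: str            # advertising PE (PE A)
    vpn_id: int        # VPN X
    ce_id: int         # m
    ce_range: int      # Rm
    label_base: int    # Lm
    encaps: str        # encapsulation type


@dataclass
class LocalCE:
    """What the receiving PE (PE B) has configured for one attached CE."""
    vpn_id: int
    ce_id: int         # k
    dlcis: List[int]   # Dk[]; the range Rk is assumed to be len(dlcis)
    label_base: int    # Lk
    encaps: str


@dataclass
class Route:
    dlci: int          # Dk[m]
    outer: int         # Z
    inner_tx: int      # Lm + k, pushed on packets sent towards CE-m
    inner_rx: int      # Lk + m, expected on packets coming from CE-m


def process(adv: Advertisement, ce: LocalCE,
            outer_labels: Dict[str, int]) -> Optional[Route]:
    """Return the route to install for `ce`, or None if processing stops."""
    # Step 0: same VPN and matching encapsulation type, otherwise stop.
    if adv.vpn_id != ce.vpn_id or adv.encaps != ce.encaps:
        return None
    k, m = ce.ce_id, adv.ce_id                      # step 1
    if k == m:                                      # step 2: duplicate CE ID
        raise ValueError(f"CE ID {k} has been allocated to two CEs "
                         f"in VPN {adv.vpn_id} (check CE at {adv.pe})")
    if k >= adv.ce_range or m >= len(ce.dlcis):     # step 3: outside range
        return None
    return Route(dlci=ce.dlcis[m],                  # steps 4 to 6
                 outer=outer_labels[adv.pe],
                 inner_tx=adv.label_base + k,
                 inner_rx=ce.label_base + m)


FR = "frame-relay"
ce4 = LocalCE(1, 4, [107, 209, 265, 301, 414, 555, 654, 777, 888], 4000, FR)
ce0 = LocalCE(1, 0, list(range(100, 110)), 1000, FR)

# PE2 handling PE0's advertisement for CE0, on behalf of CE4.
at_pe2 = process(Advertisement("PE0", 1, 0, 10, 1000, FR), ce4, {"PE0": 10001})
# PE0 handling PE2's advertisement for CE4, on behalf of CE0.
at_pe0 = process(Advertisement("PE2", 1, 4, 9, 4000, FR), ce0, {"PE2": 9999})

print(at_pe2)  # DLCI 107, outer 10001, send with 1004, expect 4000
print(at_pe0)  # DLCI 104, outer 9999, send with 4000, expect 1004
```

Under these assumptions the sketch yields the same DLCIs and labels as the worked example: each PE derives both inner labels locally from the two label bases and the two CE IDs, without any per-pair signalling.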
Note that the inner label of 4000, computed by PE0, for sending packets from CE0 to CE4 is the same as what PE2 computed as the incoming label for receiving packets originated at CE0 and destined to CE4. Similarly, the inner label of 1004, computed by PE0, for receiving packets from CE4 to CE0 is the same as what PE2 computed as the outgoing label for sending packets originated at CE4 and destined to CE0.

4.3.3. Generalizing the VPN Topology

In the above, we assumed for simplicity that the VPN was a full mesh. To allow for more general VPN topologies when using LDP for signalling, we introduce the notion of node colors, and the "spoke" attribute; together, these constitute a node's "connectivity". A node (CE) in a VPN can be colored with one or more colors. Furthermore, a node may be a hub or a spoke. Two nodes are connected iff they share a color in common, and they are not both spokes.

To incorporate connectivity into the processing of advertisements, add step 3' to the above:

3') If CE k and CE m are not connected, stop.

This notion of connectivity does not allow arbitrary topologies to be built; however, it is a compromise between generality and efficiency.

A more general mechanism based on BGP extended communities can also be used; naturally, this mechanism can only be used when signalling VPNs with BGP. See below for details.

4.4. Packet Transport

When a packet arrives at a PE from a CE in a Layer 2 VPN, the layer 2 address of the packet identifies to which other CE the packet is destined. The procedure outlined above installs a route that maps the layer 2 address to an outer label and an inner label. The layer 2 address is stripped from the packet (see below), the labels prepended, the packet encapsulated as an MPLS packet, and sent to the PE to which the destination CE is attached.
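
As a rough illustration of how the two routes installed in step 6 of Section 4.3.1 are used on the data path, the sketch below follows one frame from CE4 to CE0 using the labels of the example in Section 4.3.2. The table layouts and the function names ingress and egress are hypothetical. Labels are kept as plain integers rather than encoded MPLS label stack entries, the payload is assumed to be the frame with its layer 2 address already removed, and the optional sequence number is omitted.

```python
from typing import Dict, List, Tuple

# (attached CE, DLCI) -> (outer label, inner label), held by the ingress PE.
IngressTable = Dict[Tuple[str, int], Tuple[int, int]]
# inner label -> (attached CE, DLCI), held by the egress PE.
EgressTable = Dict[int, Tuple[str, int]]


def ingress(table: IngressTable, ce: str, dlci: int,
            payload: bytes) -> Tuple[List[int], bytes]:
    """Map the layer 2 address to a label stack; the DLCI itself is not carried."""
    outer, inner = table[(ce, dlci)]
    return [outer, inner], payload


def egress(table: EgressTable, inner: int,
           payload: bytes) -> Tuple[str, int, bytes]:
    """Use the inner label to pick the destination CE and its local DLCI."""
    ce, dlci = table[inner]
    return ce, dlci, payload


# Routes from the example: PE2 sends CE4's DLCI 107 with labels 10001/1004,
# and PE0 maps inner label 1004 to DLCI 104 towards CE0.
pe2_ingress: IngressTable = {("CE4", 107): (10001, 1004)}
pe0_egress: EgressTable = {1004: ("CE0", 104)}

labels, body = ingress(pe2_ingress, "CE4", 107, b"payload")
delivered = egress(pe0_egress, labels[-1], body)
print(labels, delivered)
```

The frame leaves CE4 on DLCI 107 and reaches CE0 on DLCI 104, which is the sense in which the SP network behaves like a single large Frame Relay switch.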
When the packet arrives at the destination PE, the inner label is used to determine which CE is the destination CE, and which new layer 2 address to prepend to the packet. The label is stripped, the layer 2 address added, and the fully-formed layer 2 packet is sent to the CE.

The MTU on the Layer 2 access links MUST be chosen such that the size of the L2 frames plus the L2VPN header does not exceed the MTU of the MPLS network. Layer 2 frames that exceed the MPLS MTU after encapsulation MUST be dropped.

4.4.1. Layer 2 Frame Format

For each VPN encapsulation type (see section 5.1.3), we describe below the format of the frame as it is transported in the MPLS LSP. Note that the "outer" label may not always be necessary.

                  Figure 2: MPLS LSP packet format

   +---------------------------------------------------+
   | MPLS  | Outer | Inner | Sequence | Modified Layer |
   | Encap | Label | Label | Number   | 2 Frame        |
   +---------------------------------------------------+

The "Outer Label" is used to transport the packet to the PE that is attached to the destination CE.

The "Inner Label" is used by the destination PE to distinguish which CE to send the packet to, and what layer 2 address to use (if applicable).

The "Sequence Number" is an optional two-octet unsigned number, wrapping back to zero, that is used to ensure in-sequence delivery of L2 frames. The sequence number field is only included if its use is indicated via VPN signalling. A Layer 2 'connection' between two specific CEs is characterized within the MPLS network by the PEs to which the CEs are attached and a specific Inner Label in each direction. For each such Layer 2 connection, the sequence number field is set to zero for the first packet transmitted and incremented by 1 for each subsequent packet sent on the same Layer 2 connection. When an out-of-sequence packet arrives at the receiver, it MAY be buffered for future delivery or discarded.

The modification to the Layer 2 frame depends on the Layer 2 type.
In general, the modification consists simply of removing 0 or more octets from the start of the frame. The following describes the modifications for ATM AAL/5, ATM cells, Frame Relay, PPP, Cisco HDLC and Ethernet VLAN.

For ATM AAL/5 VPNs, the AAL/5 PDU is transported without indication of the VPI/VCI. At the receiving PE, the AAL/5 PDU is fragmented, a cell header with the correct VPI/VCI added to each cell, and the cells sent to the CE.

For ATM Cell VPNs, ATM cells (including the 5 octet header) are transported. At the receiving PE, the cells are sent to the CE.

For Frame Relay VPNs (with two octet DLCIs), the two DLCI octets are stripped, and the rest of the Layer 2 frame transported. At the receiving PE, the new DLCI is added back to the frame, and this is sent to the CE.

For PPP, Cisco HDLC and unswitched Ethernet VLAN VPNs, the Layer 2 frame is transported whole, without any modification. The Layer 2 frame does not include HDLC flags or Ethernet preamble, nor CRCs; we assume that bit/byte stuffing has been undone. At the receiving PE, the frame is sent to the CE.

5. Signalling MPLS-Based Layer 2 VPNs

There are two alternative means of signalling the MPLS-based Layer 2 VPNs described in this document: using LDP ([LDP]) or using BGP version 4 ([BGP]).

In LDP, VPN CE information and its associated label base are carried in a Label Mapping message, distributed in the downstream unsolicited mode described in [LDP]. A new FEC element is defined below to carry all the information corresponding to a VPN CE, except for the label base. The label base is carried in the Label TLV following the FEC TLV. If a FEC element in a FEC TLV encodes Layer 2 VPN information, it MUST be the only FEC element in the FEC TLV. The Layer 2 VPN FEC element is depicted in Figure 3 below.

In BGP, the Multiprotocol Extensions [BGP-MP] are used to carry L2-VPN signalling information.
[BGP-MP] defines the format of two BGP attributes (MP_REACH_NLRI and MP_UNREACH_NLRI) that can be used to announce and withdraw the announcement of reachability information. We introduce a new address family identifier (AFI) for L2-VPN [to be assigned by IANA], a new subsequent address family identifier (SAFI) [to be assigned by IANA], and also a new NLRI Kompella et al. [Page 15] Internet Draft draft-kompella-mpls-l2vpn-00.txt October 2000 Figure 3: L2 VPN FEC Element 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Type | Encaps. Type | Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Control Flags | Reserved (Must Be Zero) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | VPN ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CE ID | CE Range | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CE Connectivity | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Sub-TLVs | . ... . . ... . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ format for carrying the individual L2-VPN CE information. This NLRI will be carried in the above-mentioned BGP attributes. This NLRI MUST be accompanied by one or more extended communities. The extended community type is "Layer 2 VPN" (to be assigned by IANA); and the format is :, where is 4 octets in length, and is two octets. All extended communities accompanying one or more Layer 2 VPN NLRIs MUST have the same . PEs receiving VPN information may filter advertisements based on the extended communities, thus controlling CE-to-CE connectivity. The format of the Layer 2 VPN NLRI is as shown in Figure 4 below. 5.1. Signalled Information 5.1.1. Type (LDP only) The Type is L2-VPN (to be decided by IETF Consensus Action). Kompella et al. 
   Figure 4: BGP NLRI for L2 VPN Information

   +------------------------------------+
   |  Length (2 octets)                 |
   +------------------------------------+
   |  Encaps Type (1 octet)             |
   +------------------------------------+
   |  Control Flags (1 octet)           |
   +------------------------------------+
   |  Label base (3 octets)             |
   +------------------------------------+
   |  Reserved (Must Be Zero) (1 octet) |
   +------------------------------------+
   |  CE ID (2 octets)                  |
   +------------------------------------+
   |  CE Range (2 octets)               |
   +------------------------------------+
   |  Variable TLVs (0 to n octets)     |
   |  ...                               |
   +------------------------------------+

5.1.2. Length

   In LDP, the Length is the entire length of the L2 VPN FEC element,
   including the fixed header and all the sub-TLVs.

   In BGP, the Length field indicates the length in bytes of the L2-VPN
   address prefix.

5.1.3. Encapsulation Type

   Identifies the Layer 2 encapsulation, e.g., ATM, Frame Relay etc.
   The following encapsulation types are defined:

      Value   Encapsulation
        0     Reserved
        1     ATM PDUs (AAL/5)
        2     ATM Cells
        3     Frame Relay
        4     PPP
        5     Cisco-HDLC
        6     Ethernet VLAN (unswitched)
        7     MPLS

5.1.4. Control Flags

   This is a bit vector, defined as in the following Figure.

   Figure 5: Control Flags Bit Vector

       0 1 2 3 4 5 6 7
      +-+-+-+-+-+-+-+-+
      |  Reserved   |S|
      +-+-+-+-+-+-+-+-+

   The following bit is defined; the rest MUST be set to zero.

      Name   Bit   Meaning
       S      0    Sequenced delivery of frames is required

5.1.5. Label base (BGP only)

   The label base which is to be used for determining the inner label
   for forwarding packets to the CE identified by CE ID.  (Note: LDP
   carries the label base in the Label TLV following the FEC TLV.)

5.1.6. VPN ID (LDP only)

   A 32 bit number which uniquely identifies a VPN in a provider's
   domain.

5.1.7. CE ID

   A 16 bit number which uniquely identifies a CE in a VPN.

5.1.8. CE Range

   A 16 bit number which describes the range of CE IDs to which the
   advertised CE is willing to connect.  In particular, a PE receiving
   an L2 VPN TLV MUST NOT use a label greater than or equal to + when
   sending traffic for this VPN to the advertising PE.

5.1.9. CE Connectivity (LDP only)

   A 32 bit number encoding connectivity.  If the leftmost bit is 1,
   the CE is a spoke.  The remaining 31 bits encode the CE colors (bit
   i = 1 means the CE has color i).

5.1.10. Sub-TLVs

   New sub-TLVs can be introduced as needed.

   In LDP, the TLV encoding mechanism described in [LDP] must be used.

   In BGP, TLVs (type takes 1 octet) can be added to extend the
   information carried in the L2 VPN address prefix.

   A TLV (type = 1) will be used for carrying VLAN IDs if the
   encapsulation is VLAN.

5.2. BGP L2 VPN capability

   The BGP Multiprotocol capability extension [BGP-CAP] is used to
   indicate that the BGP speaker wants to negotiate L2 VPN capability
   with its peers.  The capability code is 1, the capability length is
   4, and the AFI and SAFI values will be set to the L2 VPN AFI and L2
   VPN SAFI (discussed in section 5) respectively.

5.3. Advantages of Using BGP

   PE routers in an SP network typically run BGP v4.  This means that
   SPs are familiar with using BGP, and have already configured BGP on
   their PEs, so configuring and using BGP to signal Layer 2 VPNs is
   not much of an additional burden to the SP operators.  This is
   especially the case when the protocol of choice for signalling MPLS
   LSPs across the SP network is RSVP (perhaps for its Traffic
   Engineering properties); in this case, the SP may find using LDP to
   signal Layer 2 VPN information undesirable.

   Another advantage of using BGP is that with BGP it is easier to
   build inter-provider VPNs.  Mechanisms for this will be described in
   a future version.

6. Acknowledgments

   The authors would like to thank Dennis Ferguson, Der-Hwa Gan, Dave
   Katz, Nischal Sheth, John Stewart, and Paul Traina for the
   enlightening discussions that helped shape the ideas presented here,
   and Ross Callon for his valuable comments.

   The idea of using extended communities for more general connectivity
   of a Layer 2 VPN was a contribution by Yakov Rekhter, who also gave
   many useful comments on the text; many thanks to him.

7. Security Considerations

   The security aspects of this solution will be discussed at a later
   time.

8. IANA Considerations

   (To be filled in in a later revision.)

9. References

   [BGP]      Rekhter, Y., and Li, T., "A Border Gateway Protocol 4
              (BGP-4)", RFC 1771, March 1995.

   [BGP-CAP]  Chandra, R., and Scudder, J., "Capabilities Advertisement
              with BGP-4", RFC 2842, May 2000.

   [BGP-MP]   Bates, T., Rekhter, Y., Chandra, R., and Katz, D.,
              "Multiprotocol Extensions for BGP-4", RFC 2858, June
              2000.

   [BGP-ORF]  Chen, E., and Rekhter, Y., "Cooperative Route Filtering
              Capability for BGP-4", March 2000 (work in progress).

   [BGP-RFSH] Chen, E., "Route Refresh Capability for BGP-4",
              draft-ietf-idr-bgp-route-refresh-01.txt, March 2000 (work
              in progress).

   [IPVPN]    Rosen, E., and Rekhter, Y., "BGP/MPLS VPNs", RFC 2547,
              March 1999.

   [LDP]      Andersson, L., Doolan, P., Feldman, N., Fredette, A., and
              Thomas, B., "LDP Specification",
              draft-ietf-mpls-ldp-11.txt, August 2000 (work in
              progress).

   [MPLS]     Callon, R., Doolan, P., Feldman, N., Fredette, A.,
              Swallow, G., and Viswanathan, A., "A Framework for
              Multiprotocol Label Switching",
              draft-ietf-mpls-framework-05.txt, September 1999 (work in
              progress).

   [VPN]      Kosiur, Dave, "Building and Managing Virtual Private
              Networks", Wiley Computer Publishing, 1998.

10. Intellectual Property Considerations

   Juniper Networks may seek patent or other intellectual property
   protection for some or all of the technologies disclosed in this
   document.
   If any standards arising from this document are or become protected
   by one or more patents assigned to Juniper Networks, Juniper intends
   to disclose those patents and license them on reasonable and
   non-discriminatory terms.

11. Full Copyright Statement

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

12. Author Information

   Kireeti Kompella
   Juniper Networks
   1194 N. Mathilda Ave
   Sunnyvale, CA 94089
   kireeti@juniper.net

   Manoj Leelanivas
   Juniper Networks
   1194 N. Mathilda Ave
   Sunnyvale, CA 94089
   manoj@juniper.net

   Quaizar Vohra
   Juniper Networks
   1194 N. Mathilda Ave
   Sunnyvale, CA 94089
   qv@juniper.net

   Javier Achirica
   Telefonica Data
   javier.achirica@telefonica-data.com

   Ronald P. Bonica
   WorldCom
   22001 Loudoun County Pkwy
   Ashburn, Virginia, 20147
   rbonica@mci.net

   Chris Liljenstolpe
   Cable & Wireless
   chris@cw.net

   Eduard Metz
   KPN Royal Dutch Telecom
   St. Paulusstraat 4
   2264 XZ Leidschendam
   The Netherlands
   e.t.metz@kpn.com

   Chandramouli Sargor
   CoSine Communications
   1200 Bridge Parkway
   Redwood City, CA 94065
   csargor@cosinecom.com

   Vijay Srinivasan
   CoSine Communications
   1200 Bridge Parkway
   Redwood City, CA 94065
   vijay@cosinecom.com