Internet Engineering Task Force                          N. Leymann, Ed.
Internet-Draft                                       Deutsche Telekom AG
Intended status: Informational                               B. Decraene
Expires: September 12, 2011                               France Telecom
                                                             C. Filsfils
                                                           Cisco Systems
                                                      M. Konstantynowicz
                                                        Juniper Networks
                                                            D. Steinberg
                                                    Steinberg Consulting
                                                          March 11, 2011

                       Seamless MPLS Architecture
                  draft-leymann-mpls-seamless-mpls-03

Abstract

   This document describes an architecture which can be used to extend MPLS networks to integrate access and aggregation networks into a single MPLS domain ("Seamless MPLS"). The Seamless MPLS approach is based on existing and well-known protocols. It provides a highly flexible and scalable architecture and the possibility to integrate 100,000 nodes. The separation of the service and transport plane is one of the key elements; Seamless MPLS provides end-to-end service-independent transport. Therefore it removes the need for service-specific configurations in network transport nodes (without end-to-end transport MPLS, some additional service nodes/configurations would be required to glue the transport domains together).

   This draft defines a routing architecture using existing standardized protocols. It does not invent any new protocols or define extensions to existing protocols.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

Leymann, et al.        Expires September 12, 2011               [Page 1]
Internet-Draft               Seamless MPLS                    March 2011

   This Internet-Draft will expire on September 12, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
     1.2.  Terminology
   2.  Motivation
     2.1.  Why Seamless MPLS
     2.2.  Use Case #1
       2.2.1.  Description
       2.2.2.  Typical Numbers
     2.3.  Use Case #2
       2.3.1.  Description
       2.3.2.  Typical Numbers
   3.  Requirements
     3.1.  Overall
       3.1.1.  Access
       3.1.2.  Aggregation
       3.1.3.  Core
     3.2.  Multicast
     3.3.  Availability
     3.4.  Scalability
     3.5.  Stability
   4.  Architecture
     4.1.  Overall
     4.2.  Multi-Domain MPLS networks
     4.3.  Hierarchy
     4.4.  Intra-Domain Routing
     4.5.  Inter-Domain Routing
     4.6.  Access
   5.  Deployment Scenarios
     5.1.  Deployment Scenario #1
       5.1.1.  Overview
       5.1.2.  General Network Topology
       5.1.3.  Hierarchy
       5.1.4.  Intra-Area Routing
         5.1.4.1.  Core
         5.1.4.2.  Aggregation
       5.1.5.  Access
         5.1.5.1.  LDP Downstream-on-Demand (DoD)
       5.1.6.  Inter-Area Routing
       5.1.7.  Labeled iBGP next-hop handling
       5.1.8.  Network Availability and Simplicity
         5.1.8.1.  IGP Convergence
         5.1.8.2.  Per-Prefix LFA FRR
         5.1.8.3.  Hierarchical Dataplane and BGP Prefix Independent Convergence
         5.1.8.4.  Local Protection using Anycast BGP
         5.1.8.5.  Assessing loss of connectivity upon any failure
         5.1.8.6.  Network Resiliency and Simplicity
         5.1.8.7.  Conclusion
       5.1.9.  Next-Hop Redundancy
     5.2.  Scalability Analysis
       5.2.1.  Control and Data Plane State for Deployment Scenario #1
         5.2.1.1.  Introduction
         5.2.1.2.  Core Domain
         5.2.1.3.  Aggregation Domain
         5.2.1.4.  Summary
         5.2.1.5.  Numerical application for use case #1
         5.2.1.6.  Numerical application for use case #2
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

   MPLS, as a mature and well-known technology, is widely deployed in today's core and aggregation/metro area networks. Many metro area networks are already based on MPLS, delivering Ethernet services to residential and business customers. Until now those deployments have usually been done in different domains; e.g. core and metro area networks are handled as separate MPLS domains.

   Seamless MPLS extends the core domain and integrates aggregation and access domains into a single MPLS domain ("Seamless MPLS"). This enables a very flexible deployment of an end-to-end service delivery.
   In order to obtain a highly scalable architecture, Seamless MPLS takes into account that typical access devices (DSLAMs, MSANs) lack some advanced MPLS features and may have more scalability limitations. Hence access devices are kept as simple as possible.

   Seamless MPLS is not a new protocol suite but describes an architecture that deploys existing protocols like BGP, LDP and ISIS. Multiple options are possible, and this document aims at defining a single architecture for the main functions in order to ease implementation prioritization and deployments in multi-vendor networks. Yet the architecture should be flexible enough to allow some level of personalization, depending on use cases, existing deployed base and requirements. Currently, this document focuses on end-to-end unicast LSPs.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.  Terminology

   This document uses the following terminology:

   o  Access Node (AN): An access node is a node which processes customer frames or packets at Layer 2 or above. This includes but is not limited to DSLAMs or OLTs (in case of (G)PON deployments). Access nodes have only limited MPLS functionality in order to reduce complexity in the access network.

   o  Aggregation Node (AGN): An aggregation node (AGN) is a node which aggregates several access nodes (ANs).

   o  Area Border Router (ABR): Router between aggregation and core domain.

   o  Deployment Scenario: Describes an implementation of Seamless MPLS which fulfils the requirements derived from one or more use cases.

   o  Seamless MPLS Domain: A set of MPLS equipment which can set up MPLS LSPs between them.
   o  Transport Node (TN): Transport nodes are used to connect access nodes to service nodes, and service nodes to service nodes. Transport nodes ideally have no customer or service state and are therefore decoupled from service creation.

   o  Seamless MPLS (S-MPLS): Used as a generic term to describe an architecture which integrates access, aggregation and core network in a single MPLS domain.

   o  Service Node (SN): A service node is used to create services for customers and is connected to one or more transport nodes. Typical examples include Broadband Network Gateways (BNGs) and video servers.

   o  Transport Pseudo Wire (T-PW): A transport pseudowire provides service-independent transport mechanisms based on pseudowires within the Seamless MPLS architecture.

   o  Use Case: Describes a typical network, including service creation points, in order to describe the requirements, typical numbers etc. which need to be taken into account when applying the Seamless MPLS architecture.

2.  Motivation

   MPLS has been deployed in core and aggregation networks for several years and provides a mature and stable basis for large networks. In addition MPLS is already used in access networks, e.g. for mobile or DSL backhaul. Today MPLS as a technology is being used on two different layers:

   o  the Transport Layer and

   o  the Service Layer (e.g. for MPLS VPNs)

   In both cases the protocols and the encapsulation are identical, but the use of MPLS is different, especially concerning the signalling, the control plane, the provisioning, the scalability and the frequency of updates. On the service layer only service-specific information is exchanged; every service can potentially deploy its own architecture and individual protocols. The services are running on top of the transport layer. Nevertheless those deployments are usually isolated, focused on a single use case and not integrated in an end-to-end manner.

   The motivation of Seamless MPLS is to provide an architecture which supports a wide variety of different services on a single MPLS platform, fully integrating access, aggregation and core network. The architecture can be used for residential services, mobile backhaul and business services, and supports fast reroute, redundancy and load balancing. Seamless MPLS allows service creation points to be deployed virtually anywhere in the network. This gives network and service providers flexibility in service creation. Service creation can be done based on the existing requirements without the need for dedicated service creation areas at fixed locations. With the flexibility of Seamless MPLS, service creation can be done anywhere in the network and easily moved between different locations.

2.1.  Why Seamless MPLS

   Multiple service providers (SPs) plan to deploy networks with 10k to 100k MPLS nodes. This is typically at least one order of magnitude higher than current deployments and may require a new architecture. Multiple options are possible, and it makes sense for the industry (both vendors and SPs) to restrict the options in order to ease the first deployments (e.g. restrict the number of options to implement and/or scale for vendors, reduce interoperability and debugging issues for SPs).

   Many aggregation networks already deploy MPLS but are limited to the use of MPLS per aggregation area. Those MPLS-based aggregation domains are connected to a core network running MPLS as well. Nevertheless most services are not limited to an aggregation domain but run between several aggregation domains, crossing the core network. In the past it was necessary to provide connectivity between the different domains and the core on a per-service level and not based on MPLS (e.g. by deploying native IP routing or Ethernet based technologies between aggregation and core).
   In most cases service-specific configurations on the border nodes between core and aggregation were required. New services led to additional configurations and changes in the provisioning tools (see Figure 1).

   With Seamless MPLS there are no technology boundaries and no topology boundaries for the services. Network (or region) boundaries are for scaling and manageability, and do not affect the service layer, since the Transport Pseudowire that carries packets from the AN to the SN does not care whether it takes two hops or twenty, nor how many region boundaries it needs to cross. The network architecture is about network scaling, network resilience and network manageability; the service architecture is about optimal delivery: service scaling, service resilience (via replicated SNs) and service manageability. The two are decoupled: each can be managed separately and changed independently.

   +--------------+         +--------------+         +--------------+
   |  Aggregation |         |     Core     |         |  Aggregation |
   |   Domain #1  +---------+    Domain    +---------+   Domain #2  |
   |     MPLS     |    ^    |     MPLS     |    ^    |     MPLS     |
   +--------------+    |    +--------------+    |    +--------------+
                       |                        |
                       +--- service specific ---+
                              configuration

               Figure 1: Service Specific Configurations

   One of the main motivations of Seamless MPLS is to get rid of service-specific configurations between the different MPLS islands. Seamless MPLS connects all MPLS domains on the MPLS transport layer, providing a single transport layer for all services, independent of the service itself. The Seamless MPLS architecture therefore decouples the service and transport layers and integrates access, aggregation and core into a single platform. One of the big advantages is that problems on the transport layer only need to be solved once (and the solutions are available to all services).
   With Seamless MPLS it is not necessary to use service-specific configurations on intermediate nodes; all services can be deployed in an end-to-end manner.

2.2.  Use Case #1

2.2.1.  Description

   In most cases at least residential and business services need to be supported by a network. This section describes a Seamless MPLS use case which supports such a scenario. The use case includes point-to-point services for business customers as well as typical service creation for residential customers.

                       +-------------+
                       |   Service   |
                       |   Creation  |
                       | Residential |
                       |  Customers  |
                       +------+------+
                              |
                              |
                              |
          PW1 +-------+   +---+---+
      #########################   |
      #    +--+ AGN11 +---+ AGN21 +  +------+
      #   /   |       |  /|       |\ |      |           +--------+
   +--#-+/    +-------+\/ +-------+ \|      |           | remote |
   | AN |              /\            + CORE +---......--+   AN   |
   +--#-+\    +-------+  \+-------+ /|      |          #######   |
      #   \   |       |   |       |/################### +--------+
      #    +--+ AGN12 +---+ AGN22 +##+------+ P2P Business Service
      ##############################
          PW2 +-------+   +-------+

                 Figure 2: Use Case #1: Service Creation

   Figure 2 shows the different service creation points and the corresponding pseudowires between the access nodes and the service creation points. The use case does not show all PWs (e.g. not the PWs needed to support redundancy) in order to keep the figure simple. Node and link failures are handled by rerouting the PWs (based on standard mechanisms). End customers (either residential or business customers) are connected to the access nodes using a native technology like Ethernet. The access nodes terminate the PW(s) carrying the traffic for the end customers. The link between the access node (AN) and the aggregation node (AGN) is the first MPLS-enabled link.

   Residential Services: The service creation for all residential customers connected to the Access Nodes in an aggregation domain is located on a Service Node connected to the AGN2x.
   The PW (PW1) originates at the AN and terminates at the AGN2x. A second PW is deployed in the case where redundancy is needed on the AN (the figure shows redundancy, but this might not be the case for all ANs in this use case). Additional PWs can be deployed as well in case more than a single service creation point is needed for the residential service (e.g. one service creation point for Internet access and a second service creation point for IPTV services).

   Business Services: For business services the use case shows point-to-point connections between two access nodes. PW2 originates at the AN and terminates on the remote AN, crossing two aggregation areas and the core network. If the access node needs connections to several remote ANs, the corresponding number of PWs will be originated at the AN. Nevertheless, given the number of ports available and the number of business customers on a typical access node, the number of PWs will be relatively small.

              +-------+   +-------+   +------+   +------+
              |       |   |       |   |      |   |      |
           +--+ AGN11 +---+ AGN21 +---+ ABR1 +---+ LSR1 +--> to AGN
          /   |       |  /|       |   |      |   |      |
   +----+/    +-------+\/ +-------+   +------+  /+------+
   | AN |              /\                     \/
   +----+\    +-------+  \+-------+   +------+/\ +------+
          \   |       |   |       |   |      |  \|      |
           +--+ AGN12 +---+ AGN22 +---+ ABR2 +---+ LSR2 +--> to AGN
              |       |   |       |   |      |   |      |
              +-------+   +-------+   +------+   +------+

  static route     ISIS L1 LDP             ISIS L2 LDP

   <-Access-><--Aggregation Domain--><---------Core--------->

                   Figure 3: Use Case #1: Redundancy

   Figure 3 shows the redundancy in the access and aggregation network when deploying a two-stage aggregation network (AGN1x/AGN2x). Nevertheless redundancy is not a MUST in this use case. It is also possible to use non-redundant connections between the ANs and the AGN1 stage and/or between the AGN1 and AGN2 stages. The AGN2x stage is used to aggregate traffic from several AGN1x pairs.
   In this use case an aggregation domain is not limited to the use of a single pair of AGN2x; the deployment of several AGN2 pairs within the domain is also supported. As a design goal for the scalability of the routing and forwarding within the Seamless MPLS architecture, the following numbers are used:

   o  Number of Aggregation Domains: 100

   o  Number of Backbone Nodes: 1,000

   o  Number of Aggregation Nodes: 10,000

   o  Number of Access Nodes: 100,000

   The access nodes (AN) are dual homed to two different aggregation nodes (AGN11 and AGN12) using static routing entries on the AN. The ANs are always source or sink nodes for MPLS traffic but not transit nodes. This allows a light MPLS implementation in order to reduce the complexity in the AN.

   The aggregation network consists of two stages with redundant connections between the stages (AGN11 is connected to AGN21 and AGN22, as is AGN12).

   The gateway between the aggregation and core network is realized using the Area Border Routers (ABR). From the perspective of the MPLS transport layer all systems are clearly identified using the loopback address of the system. An ingress node must be able to establish a service to an arbitrary egress system by using the corresponding MPLS transport label.

2.2.2.  Typical Numbers

   Table 1 shows typical numbers which are expected for Use Case #1 (access node).

   +-------------------+---------------+
   | Parameter         | Typical Value |
   +-------------------+---------------+
   | IGP Control Plane | 2             |
   | IP FIB            | 2             |
   | LDP Control Plane | 200           |
   | LDP FIB           | 200           |
   | BGP Control Plane | 0             |
   | BGP FIB           | 0             |
   +-------------------+---------------+

          Table 1: Use Case #1: Typical Numbers for Access Node

2.3.  Use Case #2

2.3.1.  Description

   In most cases, residential, wholesale and business services need to be supported by the network.

                         +-------------+
                         |   Service   |
                         |  platforms  |
                         |(VoIP, VoD..)|
                         | Residential |
                         |  Customers  |
                         +------+------+
                                |
                                |
           +---+    +-----+  +--+--+   +-----+
           |AN1|----+AGN11+--+AGN21+---+ ABR |
           +---+    +--+--+  +--+--+   +--+--+
                       |        |         |
           +---+    +--+--+     |         |    +----+
           |AN2|----+AGN12+     |         |  --+ PE |
           +---+    +--+--+     |         |    +----+
                       |        |         |
                       .        |         |
                       .        |         |
                       .        |         |
                       |        |         |
   +---+   +---+    +--+--+  +--+--+   +--+--+
   |AN4+---+AN3|----+AGN1x+--+AGN22+---+ ABR |
   +---+   +---+    +-----+  +-----+   +-----+

   <-Access-><--Aggregation Domain--><---------Core--------->

                          Figure 4: Use Case #2

   The above topology (see Figure 4) is subject to evolution, depending on AN types and capacities (in terms of number of customers and/or aggregated bandwidth). For example, the AGN1x connections toward AGN2y currently form a ring but may later evolve into a square or triangle topology; AGN2y nodes may not be present.

   Most access nodes (AN) are single attached to one aggregation node using static routing entries on the AN and AGN. Some ANs are dual attached to two different AGNs using static routes. Some ANs are used as transit by some lower-level ANs. Static routes are expected to be used between those ANs.

   IPv4, IPv6 and MPLS interconnection between the aggregation and core network is realized using the Area Border Routers (ABR). Any ingress node must be able to establish IPv4, IPv6 and MPLS connections to any egress node in the seamless MPLS domain.

   Regarding MPLS connectivity requirements, a full mesh of MPLS LSPs is required between the ANs of an aggregation area, at least for 6PE purposes. Some additional LSPs are needed between ANs and some PEs in the aggregation area or in the core area for access to services, wholesale and enterprise services. In short, a meshing of LSPs is required between the AGNs of the whole seamless MPLS domain.
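
   As a rough illustration of what the full-mesh requirement between the ANs of one aggregation area implies, the following sketch does the arithmetic for the Use Case #2 targets of 40,000 access nodes and 30 aggregation domains. It assumes that the ANs are spread evenly across the domains, which the use case does not state, and it counts one unidirectional LSP per ordered (ingress, egress) pair. The name full_mesh_lsps is illustrative only.

```python
# Sketch only: the even spread of ANs across domains is an assumption,
# not something stated in the use case.
ACCESS_NODES = 40_000        # Use Case #2 target
AGGREGATION_DOMAINS = 30     # Use Case #2 target


def full_mesh_lsps(nodes: int) -> int:
    """Unidirectional LSPs in a full mesh: one per ordered node pair."""
    return nodes * (nodes - 1)


ans_per_domain = ACCESS_NODES // AGGREGATION_DOMAINS  # integer division
peers_per_an = ans_per_domain - 1                     # other ANs each AN must reach
lsps_per_domain = full_mesh_lsps(ans_per_domain)

print(f"ANs per aggregation domain: {ans_per_domain}")
print(f"LSP destinations per AN:    {peers_per_an}")
print(f"Unidirectional LSPs/domain: {lsps_per_domain}")
```

   Under this assumption each AN needs on the order of a thousand LSP destinations, which is the same order of magnitude as the LDP values in Table 2. Note that LDP merges LSPs per destination FEC, so per-node state is driven by the number of destinations rather than by the pair count.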
   Finally, an LSP from any node to any node should be possible.

   From a scalability standpoint, the following numbers are the targets:

   o  Number of Aggregation Domains: 30

   o  Number of Backbone Nodes: 150

   o  Number of Aggregation Nodes: 1,500

   o  Number of Access Nodes: 40,000

2.3.2.  Typical Numbers

   Table 2 shows typical numbers which are expected for Use Case #2 for the purpose of establishing the transport LSPs. They do not take into account the services built on top (e.g. 6PE will require additional IPv6 routes).

   +-------------------+---------------+
   | Parameter         | Typical Value |
   +-------------------+---------------+
   | IGP Control Plane | 2             |
   | IP FIB            | 2             |
   | LDP Control Plane | 1000          |
   | LDP FIB           | 1000          |
   +-------------------+---------------+

          Table 2: Use Case #2: Typical Numbers for Access Node

3.  Requirements

   The following section describes the overall requirements which need to be fulfilled by the Seamless MPLS architecture. Besides the general requirements of the architecture itself there are also certain requirements which are related to the different network nodes.

   o  End to End Transport LSP: MPLS-based services (pseudowire based, L3-VPN or IP) SHALL be provided by the Seamless MPLS based infrastructure between any nodes.

   o  Scalability: The network SHALL be scalable to a minimum of 100,000 nodes.

   o  Fast convergence (sub-second resilience) SHALL be supported. Fast reroute (LFA) SHOULD be supported.

   o  Flexibility: The Seamless MPLS architecture SHALL be applicable to a wide variety of existing MPLS deployments. It SHALL use a flexible approach, deploying building blocks with the possibility to use certain features only if those features are needed (e.g. dual homing ANs or fast reroute mechanisms).

   o  Service independence: Service and transport layer SHALL be decoupled. The architecture SHALL remove the need for service-specific configurations on intermediate nodes.
o Native Multicast support: P2MP MPLS LSPs SHOULD be supported by the Seamless MPLS architecture. o Interoperable end to end OAM mechanisms SHALL be implemented 3.1. Overall 3.1.1. Access In respect of MPLS functionality the access network should be kept as simple as possible. Compared to the aggregation and/or core network within Seamless MPLS a typical access node is less powerful. The control plane and the forwarding should be as simple as possible. To reduce the complexity and the costs of an access node not the full MPLS functionality need to be supported (control and data plane). The use of an IGP should be avoided. Static routing should be sufficient. Required functionality to reach the required scalability should be moved out of the access node. The number of access nodes can be very high. The support of load balancing for layer 2 services should be implemented. 3.1.2. Aggregation The aggregation network aggregates traffic from access nodes. The aggregation Node must have functionalities that enlarge the scalability of the simple access nodes that are connected. The IGP must be link state based. Each aggregation area must be a separated area. All routes that are interarea should use an EGP to keep the IGP small. The aggregation node must have the full scalability concerning control plane and forwarding. The support of load balancing for layer 2 services must be implemented. Leymann, et al. Expires September 12, 2011 [Page 14] Internet-Draft Seamless MPLS March 2011 3.1.3. Core The core connects the aggregation areas. The core network elements must have the full scalability concerning control plane and forwarding. The IGP must be link state based. The core area must not include routes from aggregation areas. All routes that are interarea should use an EGP to keep the IGP small. Each area of the link state based IGP should have less than 2000 routes. The support of load balancing for layer 2 services must be implemented. 3.2. 
Multicast Compared with unicast connectivity Multicast is more dynamic. User generated messages - like joining or leaving multicast groups - are interacting directly with network components in the access and aggregation network (in order to build the corresponding forwarding states). This leads to the need for a highly dynamic handling of messages on access and aggregation nodes. Nevertheless the core network SHOULD be stable and state changes triggered by user generated messages SHOULD be minimized. This rises the need for an hierarchy for the P2MP support in Seamless MPLS hiding the dynamic behaviour of the access and aggregation nodes o mLDP o P2MP RSVP-TE 3.3. Availability All network elements should be high available (99.999% availability). Outage times should be as low as possible. A repair time of 50 milliseconds or less should be guarantied at all nodes and lines in the network that are redundant. Fast convergence features SHOULD be used in all control plane protocols. Local Repair functions SHOULD be used wherever possible. Full redundancy is required at all equipment that is shared in a network element. o Power Supply o Switch Fabric o Routing Processor A change from an active component to a standby component SHOULD happen without effecting customers traffic. The Influence of customer traffic MUST be as low as possible. Leymann, et al. Expires September 12, 2011 [Page 15] Internet-Draft Seamless MPLS March 2011 3.4. Scalability The network must be highly scalable. As a minimum requirement the following scalability figures should be met: o Number of aggregation domains: 100 o Number of backbone nodes: 1.000 o Number of aggregation nodes: 10.000 o Number of access nodes: 100.000 3.5. Stability o The platform should be stable under certain circumstances (e.g. missconfiguration within one area should not cause instability in other areas). o Differentiate between "All Loopbacks and Link addresses should be ping able from every where." Vs. 
"Link addresses are not necessary ping able from everywhere". 4. Architecture 4.1. Overall One of the key questions that emerge when designing an architecture for a seamless MPLS network is how to handle the sheer size of the necessary routing and MPLS label information control plane and forwarding plane state resulting from the stated scalability goals especially with respect to the total number of access nodes. This needs to be done without overwhelming the technical scaling limits of any of the involved nodes in the network (access, aggregation and core) and without introducing too much complexity in the design of the network while at the same time still maintaining good convergence properties to allow for quick MPLS transport and service restoration in case of network failures. 4.2. Multi-Domain MPLS networks The key design paradigm that leads to a sound and scalable solution is the divide and conquer approach, whereby the large problem is decomposed into many smaller problems for which the solution can be found using well-known standard architectures. In the specific case of seamless MPLS the overall MPLS network SHOULD Leymann, et al. Expires September 12, 2011 [Page 16] Internet-Draft Seamless MPLS March 2011 be decomposed into multiple MPLS domains, each well within the scaling limits of well-known architectures and network node implementations. From an organizational and operational point of view it MAY make sense to define the boundaries of such domains along the pre-existing boundaries of aggregation networks and the core network. Examples of how networks can be decomposed include using IGP areas as well as using multiple BGP autonomous systems. 4.3. 
Hierarchy These MPLS domains SHOULD then be then be connected into an MPLS multi-domain network in a hierarchical fashion that enables the seamless exchange of loopback addresses and MPLS label bindings for transport LSPs across the entire MPLS internetwork while at the same time preventing the flooding of unnecessary routing and label binding information into domains or parts of the network that do not need them. Such a hierarchical routing and forwarding concept allows a scalability in different dimensions and allows to hide the complexity and size of the aggregation and access networks. 4.4. Intra-Domain Routing The intra-domain routing within each of the MPLS domains (i.e. aggregation domains and core) SHOULD utilize standard IGP protocols like OSPF or ISIS. By definition, each of these domains is small enough so that there are no relevant scaling limits within each IGP domain, given well-known state-of-the-art IGP design principles and recent router technology. The intra-domain MPLS LSP setup and label distribution SHOULD utilize standard protocols like LDP or RSVP. 4.5. Inter-Domain Routing The inter-domain routing is responsible for establishing connectivity between and across all MPLS domains. The inter-domain routing SHOULD establish a routing and forwarding hierarchy in order to achieve the scaling goals of seamless MPLS. Note that the IP aggregation usually performed between region (IGP areas/AS) in IP routing does not work for MPLS as MPLS is not capable of aggregating FEC (because MPLS forwarding use an exact match lookup, while IP uses longest match). Therefore it is RECOMMENDED to utilize protocols that support indirect next-hops (like BGP with MPLS labels "labled BGP/SAFI4" [RFC3107]). Leymann, et al. Expires September 12, 2011 [Page 17] Internet-Draft Seamless MPLS March 2011 4.6. 
Access

Compared to the aggregation and core parts of the Seamless MPLS network, the access part is special in two respects:

o  The number of nodes in the access is at least one order of magnitude higher than in any other part of the network.

o  Because of the large quantity of access nodes, the cost of these nodes is extremely relevant for the overall cost of the entire network, i.e. access nodes are very cost sensitive.

This makes it desirable to design the architecture such that the AN functionality can be kept as simple as possible. This should always be kept in mind when evaluating different seamless MPLS architectures. The goal is to limit both the number of different protocols needed on the AN and the scale to which each protocol must perform to the absolute minimum.

5.  Deployment Scenarios

This section describes the deployment scenarios based on the use cases and the generic architecture above.

5.1.  Deployment Scenario #1

This section describes the Seamless MPLS implementation of a large European ISP.

5.1.1.  Overview

This deployment scenario describes one way to implement a seamless MPLS architecture. Specific to this implementation is the choice of intra- and inter-domain routing and label distribution protocols, as well as the details of the interworking of these protocols to achieve the overall scalable hierarchical architecture.

5.1.2.  General Network Topology

There are multiple aggregation domains (on the order of up to 100) connected to the core in a star topology, i.e. aggregation domains are never connected among themselves, but only to the core. The core has its own domain.
           +-------+   +-------+   +------+   +------+
           |       |   |       |   |      |   |      |
        +--+ AGN11 +---+ AGN21 +---+ ABR1 +---+ LSR1 +--> to AGN
       /   |       |  /|       |   |      |   |      |
+----+/    +-------+\/ +-------+   +------+  /+------+
| AN |              /\                     \/     |
+----+\    +-------+  \+-------+   +------+/\ +------+
       \   |       |   |       |   |      |  \|      |
        +--+ AGN12 +---+ AGN22 +---+ ABR2 +---+ LSR2 +--> to AGN
           |       |   |       |   |      |   |      |
           +-------+   +-------+   +------+   +------+

 static route        ISIS L1 LDP             ISIS L2 LDP

<-Access-><--Aggregation Domain--><---------Core--------->

                 Figure 5: Deployment Scenario #1

As shown in Figure 5, the access nodes (AN) are connected to the aggregation network via aggregation nodes called AGN1x, either to a single AGN1x or redundantly to two AGN1x. Each AGN1x has redundant uplinks to a pair of second-level aggregation nodes called AGN2x.

Each aggregation domain is connected to the core via exactly two border routers (ABR) on the core side. There can be multiple AGN2 pairs per aggregation domain, but only one ABR pair for each aggregation domain. Each of the AGN2 in an AGN2 pair connects to one of the ABRs in the ABR pair responsible for that aggregation domain.

The ABRs on the core side have redundant connections to a pair of LSR routers. The LSR pair is also connected via a direct link.

The core LSRs are connected to other core LSRs in a partially meshed topology so that there are disjoint, redundant paths from each LSR to each other LSR.

5.1.3.  Hierarchy

As explained before, hierarchy is the key to a scalable seamless MPLS architecture. The hierarchy in this implementation is achieved by forming different MPLS domains for aggregation domains and core, where within each of these domains a fairly common MPLS deployment is used, with ISIS as the intra-domain link-state routing protocol and LDP for MPLS label distribution.

These MPLS domains are mapped to ISIS areas as follows: Aggregation
domains are mapped to ISIS L1 areas. The core is configured as ISIS L2. The border routers connecting aggregation and core are ISIS L1L2 and are referred to as ABRs. From a technical and operational point of view these ABRs are part of the core, although they also belong to the respective aggregation domain purely from a routing protocol point of view.

For the inter-domain routing, BGP with MPLS labels is deployed ("labeled BGP/SAFI4" [RFC3107]).

5.1.4.  Intra-Area Routing

5.1.4.1.  Core

The core uses ISIS L2 to distribute routing information for the loopback addresses of all core nodes. The border routers (ABR) that connect to the aggregation domains are also part of the respective aggregation ISIS L1 area and hence ISIS L1L2.

LDP is used to distribute MPLS label binding information for the loopback addresses of all core nodes.

5.1.4.2.  Aggregation

The aggregation domains use ISIS L1 as the intra-domain routing protocol. All AGN loopback addresses are carried in ISIS.

As in the core, the aggregation also uses LDP to distribute MPLS label bindings for the loopback addresses.

5.1.5.  Access

Access nodes do not have their own domain or IGP area. Instead, they directly connect to the AGN1 nodes in the aggregation domain. To keep access devices as simple as possible, ANs do not participate in ISIS.

Instead, each AN has two static default routes, one pointing to each of the AGN1 nodes it is connected to. Appropriate techniques SHOULD be deployed to make sure that a given default route is invalidated when the link to an AGN1 or that node itself fails. Examples of such techniques include monitoring the physical link state for loss of light/loss of frame, or using Ethernet link OAM or BFD [I-D.ietf-bfd-v4v6-1hop].

The AGN1 MUST have a configured static route to the loopback address of each of the ANs it is connected to, because it cannot learn the AN loopback address in any other way.
These static routes have to be monitored and invalidated if necessary using the same techniques as described above for the static default routes on the AN. The AGN1 redistributes these routes into ISIS for intra-domain reachability of all AN loopback addresses.

LDP is used for MPLS label distribution between AGN1 and AN. In order to keep the AN control plane as lightweight as possible, and to avoid the necessity for the AN to store 100.000 MPLS label bindings for each upstream AGN1 peer, LDP is deployed in downstream-on-demand (DoD) mode, described below.

To allow the label bindings received via LDP DoD to be installed into the LFIB on the AN when the AN has only a default route and no specific host route to the destination loopback address, the LDP Extension for Inter-Area Label Switched Paths [RFC5283] is used.

5.1.5.1.  LDP Downstream-on-Demand (DoD)

LDP downstream-on-demand mode is specified in [RFC5036]. Although it was originally intended to be used with ATM switch hardware, there is nothing from a protocol perspective preventing its use in a regular MPLS frame-based environment. In this mode the upstream LSR explicitly asks the downstream LSR for a label binding for a particular FEC when needed.

The assumption is that a given AN will only have a limited number of services configured to an even more limited number of destinations, or egress LERs. Instead of learning and storing all label bindings for all possible loopback addresses within the entire Seamless MPLS network, the AN will use LDP DoD to request only the label bindings for the FECs corresponding to the loopback addresses of those egress nodes to which it has services configured.

For LDP DoD the AGN1 MUST also ask the AN for label bindings for specific FECs. FECs are necessary for all pseudowire destinations at the AN. Preferably, this pseudowire destination is the LSR-ID of the AN.
Depending on the AN implementation and architecture, multiple pseudowire destination addresses and associated FECs could be needed. This leads to the following requirements:

o  The AGN1 MUST ask the AN for label bindings for all potential pseudowire destination addresses on the AN. Because the AGN (at least in many cases) does not take part in the pseudowire signaling, an independent way of receiving the AN FEC is necessary on the AGN. These potential pseudowire destinations MUST be known on the AGN1, by configuration or otherwise. These are typically the loopback addresses of the AN, to which a static route has been configured anyway on the AGN1, as explained above. In addition to these static routes, the AGN1 SHOULD be configured statically to request MPLS label bindings for these loopback addresses via LDP DoD.

o  Optionally, an automatic mechanism that requests a label binding for the LSR-ID FEC COULD be implemented. A configuration switch that disables this option must be implemented. The label is necessary in either case; its DoD signaling can be initiated by either method (configuration or automatic mechanism).

o  The AN knows by configuration to which destination a pseudowire is set up. The AN is always the endpoint of the pseudowire. Before signalling a pseudowire the AN MUST ask the AGN (via LDP DoD) for a label binding for the corresponding FEC. Because of this, an independent preconfiguration is not necessary on the AN.

o  The following are the triggers for ANs to request a label:

   *  When a control session (targeted LDP) to a target has to be established

   *  When a service label has been received by a control session (e.g. pseudowire label)

5.1.6.  Inter-Area Routing

The inter-domain MPLS connectivity from the aggregation domains to and across the core domain is realized primarily using BGP with MPLS labels ("labeled BGP/SAFI4" [RFC3107]).
A very limited amount of route leaking from ISIS L2 into L1 is also used.

All ABR and PE nodes in the core are part of the labeled iBGP mesh, which can be either full mesh or based on route reflectors. These nodes advertise their respective loopback addresses (which are also carried in ISIS L2) into labeled BGP.

Each ABR node has labeled iBGP sessions with all AGN1 nodes inside the aggregation domain that it connects to the core. Since there are two ABR nodes per aggregation domain, this leads to each AGN1 node having an iBGP session with each of the two ABRs. Note that the use of iBGP implies that the entire seamless MPLS internetwork is just a single AS to which all core and aggregation nodes belong. The AGN1 nodes advertise their own loopback addresses into labeled BGP, in addition to these loopbacks also being in ISIS L1.

Additionally, the AGN1 nodes also redistribute all the statically configured routes to the AN loopback addresses into labeled BGP. Note that, as stated above, the AGN1 MUST ask the AN for label bindings for the AN loopback FECs via LDP DoD in order to have a valid labeled route with a non-null label.

This architecture results in carrying the loopbacks of all nodes except pure P nodes (i.e. of all AN, AGN, ABR and core PE nodes) in labeled BGP, i.e. there will be on the order of 100.000 routes in labeled BGP when approaching the stated scalability goal. Note that this only affects the BGP RIB size and does not necessarily imply that any node needs to actually have active forwarding state (LFIB) of the same order of magnitude. In fact, as will be discussed in the scalability analysis, no single node needs to install all labeled BGP routes into the LFIB; each node only needs a small percentage of the RIB as active forwarding state in the LFIB. From a RIB point of view, BGP is known to scale to hundreds of thousands of routes.

5.1.7.
Labeled iBGP next-hop handling

The ABR nodes run labeled iBGP both to the core mesh and to the AGN1 nodes of their respective aggregation domains. Therefore they operate as iBGP route reflectors, reflecting labeled routes from the aggregation into the core and vice versa.

When reflecting routes from the core into the aggregation domain, the ABR SHOULD NOT change the BGP NEXT-HOP addresses (next-hop-unchanged). This is the usual behaviour for iBGP route reflection. In order to make these routes resolvable to the AGN1 nodes inside the aggregation domain, the ABR MUST leak all other ABR and core PE loopback addresses from ISIS L2 into ISIS L1 of the aggregation domain. Note that the number of leaked addresses is limited, so the overall scalability of the seamless MPLS architecture is not impacted. In the worst case all core loopback addresses COULD be leaked into ISIS L1, but even that would not be a scalability problem.

When reflecting routes from the aggregation into the core, the ABR MUST set the BGP NEXT-HOP to its own loopback address (next-hop-self). This is not the default behaviour for iBGP route reflection and requires special configuration on the ABR. Note that this also implies that the ABR MUST allocate a new local MPLS label for each labeled iBGP FEC that it reflects from the aggregation into the core. This special next-hop handling is essential for the scalability of the overall seamless MPLS architecture, since it creates the required hierarchy and enables the hiding of all aggregation and access addresses behind the ABRs from an IGP point of view. Leaking of aggregation ISIS L1 loopback addresses into ISIS L2 is not necessary and MUST NOT be allowed.

The resulting hierarchical inter-domain MPLS routing structure is similar to the one described in [RFC4364] section 10c, except that one AS with route reflection is used instead of multiple ASes.
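
The two reflection rules of this section can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class and method names (LabeledRoute, AbrReflector), the addresses and the starting label value 1000 are hypothetical and do not come from this document or from any router implementation. It shows that routes reflected towards the aggregation keep their next hop, while routes reflected towards the core get the ABR loopback as next hop together with a newly allocated local label.

```python
from dataclasses import dataclass, replace
from itertools import count


@dataclass(frozen=True)
class LabeledRoute:
    prefix: str    # FEC, e.g. an AN or AGN1 loopback
    next_hop: str  # BGP NEXT-HOP attribute
    label: int     # MPLS label carried with the route


class AbrReflector:
    """Toy model of the ABR next-hop rules (all names are hypothetical)."""

    def __init__(self, loopback: str, first_label: int = 1000):
        self.loopback = loopback
        self._next_label = count(first_label)
        self._local_label = {}  # prefix -> label allocated by this ABR
        self.lfib = {}          # local label -> (outgoing label, next hop)

    def reflect_to_aggregation(self, route: LabeledRoute) -> LabeledRoute:
        # Core -> aggregation: next-hop-unchanged, label untouched.
        return route

    def reflect_to_core(self, route: LabeledRoute) -> LabeledRoute:
        # Aggregation -> core: next-hop-self plus a new local label per FEC.
        if route.prefix not in self._local_label:
            self._local_label[route.prefix] = next(self._next_label)
        local = self._local_label[route.prefix]
        # Label swap entry: local label -> label and next hop learned from AGN1.
        self.lfib[local] = (route.label, route.next_hop)
        return replace(route, next_hop=self.loopback, label=local)


# Example with made-up addresses and labels.
abr1 = AbrReflector(loopback="192.0.2.1")
from_agn1 = LabeledRoute("10.0.1.1/32", next_hop="10.0.0.11", label=200)
to_core = abr1.reflect_to_core(from_agn1)

from_core = LabeledRoute("192.0.2.99/32", next_hop="192.0.2.99", label=17)
to_agn1 = abr1.reflect_to_aggregation(from_core)

print(to_core)
print(to_agn1)
```

In this model the core only ever sees the ABR loopback as next hop for aggregation and access prefixes, which is the hiding effect described above.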
5.1.8.  Network Availability and Simplicity

The seamless MPLS architecture illustrated in deployment case study 1 guarantees a sub-second loss of connectivity upon any link or node failure. Furthermore, in the vast majority of cases, the loss of connectivity is limited to sub-50msec.

These network availability properties are provided without any degradation of scale and simplicity. This is a key achievement of the design.

In the remainder of this section, we first introduce the different network availability technologies and then review their applicability for each possible failure scenario.

5.1.8.1.  IGP Convergence

IGP convergence can be modelled as a linear process with an initial delay and a linear FIB update [ACM01].

The initial delay could conservatively be assumed to be 260msec: 50msec to detect failures with BFD (most failures would be detected faster, for example with loss of light or with faster BFD timers), 50msec to throttle the LSP generation, 150msec to throttle the SPF computation (making sure that all the required LSPs are received even in case of SRLG failures) and 10msec for the shortest-path-first tree computation.

Assuming 250usec per update (conservative), this allows for (1000-260)/0.250 = 2960 prefix updates within a second following the outage. More precisely, this allows for updates of 2960 important IGP prefixes. Important prefixes are automatically classified by the router implementation through a simple heuristic (/32 is more important than non-/32).

The number of important IGP routes (loopbacks) in deployment case study 1 is much smaller than 2960, and hence the sub-second IGP convergence estimate is conservative.

IGP convergence is a simple technology for the operator provided that the router vendor optimizes the default IGP behavior (no need to tune any arcane knob).

5.1.8.2.
Per-Prefix LFA FRR

A per-prefix LFA for a destination D is a precomputed backup IGP nexthop for that destination. This backup IGP nexthop can be link protecting or node protecting [RFC5286].

The analysis of the applicability of Per-Prefix LFA in deployment model 1 of the Seamless MPLS architecture is straightforward thanks to [I-D.filsfils-rtgwg-lfa-applicability].

In deployment model 1, each aggregation network follows either the triangle or the full-mesh topology. Furthermore, the backbone region implements a dual-plane. As a consequence, the failure of any link or node within an aggregation domain is protected by LFA FRR (sub-50msec) for all impacted IGP prefixes, whether intra-area or inter-area. No micro-loop (uloop) may form as a result of these failures [I-D.filsfils-rtgwg-lfa-applicability].

Per-Prefix LFA FRR is generally assessed as a simple technology for the operator [I-D.filsfils-rtgwg-lfa-applicability]. It certainly is in the context of deployment case study 1, as the designer enforced triangle and full-mesh topologies in the aggregation network as well as a dual-plane core network.

5.1.8.3.  Hierarchical Dataplane and BGP Prefix Independent Convergence

In a hierarchical dataplane, the FIB used by the packet processing engine reflects the recursions between routes. For example, a BGP route B recursing on IGP route I whose best path is via interface O is encoded as a FIB entry B pointing to a FIB entry I pointing to a FIB entry O.

Hierarchical FIB [BGPPIC] extends the hierarchical dataplane with the concept of a BGP Path-List. A BGP path-list may be abstracted as a set of primary multipath nhops and a backup nhop. When the primary set is empty, packets destined to the BGP destinations are rerouted via the backup nhop.

With hierarchical FIB and hierarchical dataplane, a FIB entry representing a BGP route points to a FIB entry representing a BGP Path-List.
This entry may either point again to another BGP Path-List entry (BGP over BGP recursion) or, more likely, point to a FIB entry representing an IGP route.

A BGP Path-List may be computed automatically by the router and does not require any operator involvement. Specifically, the automated computation adapts to any routing policy (this is key to understanding the simplicity of hierarchical FIB and the ability to enable it as a default router behavior). There is no constraint at all on the operator design. Any policy is supported (multipath, primary/backup between neighboring domains or via alternate domains).

The BGP backup nhop is computed in advance of any failure (i.e. a second bestpath computation after excluding the primary nhops).

Hierarchical dataplane and hierarchical FIB provide two important routing availability properties. First, upon IGP convergence, recursive BGP routes immediately benefit from the updated IGP paths thanks to the dataplane indirection. This is key, as most of the traffic is destined to BGP routes, not to IGP routes. Second, upon loss of the primary BGP nhop, the dataplane can immediately reroute the packets towards the pre-computed backup nhop. This redirection is said to be prefix independent, as the only entries that need to be modified are the BGP path-lists. These entries are shared across all the BGP prefixes with the same primary and backup next-hops. This scale independence is key. In the context of deployment model 1, while there might be 100k BGP routes, we only expect on the order of 200 BGP path-lists. Assuming 10usec in-place modification per BGP path-list, we see that the router can enable the backup path for 100k BGP destinations in less than 2msec (less than 200 * 10usec).
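
The prefix-independent property can be sketched in a few lines of Python. This is a minimal illustration, not a description of any router implementation: the PathList class, the next-hop names and the round-robin assignment of routes to path-lists are assumptions made for the example. Only the figures of 100k routes, 200 path-lists and 10usec per path-list are taken from the text above.

```python
class PathList:
    """Shared FIB object: primary next hops plus one pre-computed backup."""

    def __init__(self, primaries, backup):
        self.primaries = list(primaries)
        self.backup = backup

    def next_hops(self):
        # Fall back to the backup only when no primary next hop is left.
        return self.primaries if self.primaries else [self.backup]


N_ROUTES = 100_000        # BGP routes (from the text)
N_PATH_LISTS = 200        # BGP path-lists (from the text)
USEC_PER_PATH_LIST = 10   # in-place modification cost (from the text)

# Hypothetical next-hop names; one primary and one backup per path-list.
path_lists = [PathList([f"abr{i}-primary"], f"abr{i}-backup")
              for i in range(N_PATH_LISTS)]

# Every BGP route points to a shared path-list (round-robin is an assumption).
fib = {f"route-{r}": path_lists[r % N_PATH_LISTS] for r in range(N_ROUTES)}


def primary_nhop_lost(nhop):
    """Remove a failed next hop; only path-lists are touched, never routes."""
    touched = 0
    for pl in path_lists:
        if nhop in pl.primaries:
            pl.primaries.remove(nhop)
            touched += 1
    return touched


touched = primary_nhop_lost("abr0-primary")
rerouted = sum(1 for pl in fib.values() if pl.next_hops() == ["abr0-backup"])
worst_case_usec = N_PATH_LISTS * USEC_PER_PATH_LIST

print(touched, rerouted, worst_case_usec)
```

A single path-list modification reroutes every route that shares it, and even if all 200 path-lists had to be modified the total would be 200 * 10usec = 2000usec, which is the 2msec bound quoted above.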
The detection of the loss of the primary BGP nhop (and hence the need to enable the pre-computed backup BGP nhop) can be local (a local link failing between an edge device and a single-hop eBGP peer) or can involve an IGP convergence (a remote border router goes down).

These hierarchical FIB properties benefit any BGP routes: Internet, L3VPN, 3107, IPv4 or IPv6. Future evolution of VPLS will also benefit from such properties [I-D.raggarwa-mac-vpn], [I-D.sajassi-l2vpn-rvpls-bgp].

Hierarchical forwarding and hierarchical FIB are very simple technologies to operate. Their ability to adapt to any topology, any routing policy and any BGP address family allows router vendors to enable this behavior by default.

5.1.8.4.  Local Protection using Anycast BGP

5.1.8.4.1.  Anycast BGP applied to ABR node failure

In this section we describe a mechanism that provides local protection for area border router (ABR) failures. To illustrate this mechanism, consider the example shown in Figure 6.

                            +-------+
                            |       |
                         vl0+ ABR 1 |
                           /|       |
+----------+    +-------+ / +-------+
|          |    |       |/
| PE / LER +-..-+  PLR  |
|          |    |       |\
+----------+    +-------+ \ +-------+
                           \|       |
                         vl0+ ABR 2 |
                            |       |
                            +-------+

+-------+     +-------+     +-------+
| LDP-L +-----+ LDP-L +-----+ LDP-L |
+-------+     +-------+     +-------+
| BGP-L +-------------------+ BGP-L |
+-------+                   +-------+

--------------- traffic ---------------->
<----- routing + label distribution -----

            Figure 6: Routing and Traffic Flow

The core router adjacent to ABR1 and ABR2 acts as a point of local repair (PLR). When the PLR detects the failure of ABR1, the PLR re-routes to ABR2 the traffic that the PLR used to forward to ABR1, with ABR2 providing the subsequent forwarding for this traffic. To accomplish this, ABR1, ABR2, and the PLR employ the following procedures.

ABR1, in addition to its own loopback, is provisioned with another IP address (vl0).
This IP address is used to identify the forwarding state/context on ABR1 that is subject to the local protection mechanism outlined in this section. We refer to this IP address, vl0, as the "context identifier". ABR1 advertises its context identifier in ISIS and LDP. As ABR1 re-advertises to its core peers the BGP routes it receives from its peers in the aggregation domain(s), ABR1 sets the BGP Next Hop on these routes to its context identifier (this creates an association between the forwarding state/context created by these routes and the context identifier).

ABR2, acting as a protector for ABR1, is configured with ABR1's context identifier. ABR2 advertises this context identifier into LDP and ISIS. The LDP advertisement is done with no PHP and a non-null label, and the ISIS advertisement is done with a very high metric. As a result, the PLR would have an LFA route/LSP to this context identifier with ABR2 as the next hop. When the PLR detects ABR1's failure, the LFA procedures on the PLR would result in sending to ABR2 the traffic that the PLR used to forward to ABR1. Moreover, since ABR2 advertises into LDP a non-null label for ABR1's context identifier, this label enables ABR2 to identify such traffic (as we will see further down, the ability to identify such traffic is essential in order for ABR2 to correctly forward this traffic).
          +-----------------+-----------+-----------+
          | FEC 10.0.1.1/32 | Label 200 | NH AGN2-1 |
          +-----------------+-----------+-----------+
          | FEC 10.0.1.2/32 | Label 233 | NH AGN2-1 |  ABR1
          +-----------------+-----------+-----------+
          | FEC 10.0.1.3/32 | Label 313 | NH AGN2-1 |
          +-----------------+-----------+-----------+

                   +------+    +-------+
                   |      |    |       |    +------------------+
                vl0+ ABR1 +----+ AGN21 +----+ AGN11:10.0.1.1/32|
                  /|      |    |       |\  /+------------------+
                 / +------+\  /+-------+ \/
+----+   +-----+/           \/         \ /\ +------------------+
| PE +---+ PLR |            /\          X  X+ AGN12:10.0.1.2/32|
+----+   +-----+\          /  \        / \/ +------------------+
                 \ +------+    +-------+ /\
                  \|      |    |       |/  \+------------------+
                vl0+ ABR2 +----+ AGN22 +----+ AGN13:10.0.1.3/32|
                   |      |    |       |    +------------------+
                   +------+    +-------+

          +----------------------------------------+
          |        native forwarding context       |
          +-----------------+-----------+----------+
          | FEC 10.0.1.1/32 | Label 100 | NH AGN21 |
          +-----------------+-----------+----------+
          | FEC 10.0.1.2/32 | Label 107 | NH AGN21 |  ABR2
          +-----------------+-----------+----------+
          | FEC 10.0.1.3/32 | Label 152 | NH AGN21 |
          +-----------------+-----------+----------+
                   |              |           |
                   V              V           V
          +----------------------------------------+
          |        backup forwarding context       |
          +-----------------+-----------+----------+
          | FEC 10.0.1.1/32 | Label 200 | NH AGN21 |
          +-----------------+-----------+----------+
          | FEC 10.0.1.2/32 | Label 233 | NH AGN21 |  ABR2
          +-----------------+-----------+----------+
          | FEC 10.0.1.3/32 | Label 313 | NH AGN21 |
          +-----------------+-----------+----------+
               (ABR2 acting as backup for ABR1)

                 Figure 7: ABR Failure Scenarios

ABR2, acting as a protector for the forwarding context of ABR1, has
to have the FEC-to-label mapping for the FECs present in that forwarding context, and should use this mapping to create the forwarding state it would use when forwarding the traffic received from the PLR. Figure 7 shows the FEC-to-label mapping on ABR1 and ABR2. Note that the backup forwarding context on ABR2 is a mirror image of the forwarding context on ABR1. This backup forwarding context is populated using the routes that have been re-advertised by ABR1 to its core peers (as ABR2 is a BGP core peer of ABR1). The label that ABR2 advertises into LDP for ABR1's context identifier points to the backup context. This way, ABR2 forwards all the traffic received with this label using not its native forwarding context, but the backup forwarding context.

Note that whether the PLR can rely on the basic LFA to re-route to ABR2 the traffic that the PLR used to forward to ABR1 depends on the LFA coverage. Since the basic LFA does not guarantee 100% coverage in all topologies, relying on basic LFA may not be sufficient, in which case the basic LFA would need to be augmented to provide 100% coverage.

The procedures outlined above provide local protection upon ABR node failure. By virtue of being local protection, the actions required to restore connectivity upon failure detection are fully localized to the router closest to the failure, i.e. the router directly connected to the failed ABR. This makes it possible to deliver a connectivity recovery time of under 50msec in the presence of an ABR failure. These actions do not depend on propagating failure information in ISIS, thus providing a connectivity recovery time that is independent of the ISIS routing convergence time.
In contrast, a combination of hierarchical FIB organization and ISIS routing convergence, being a global protection mechanism, does rely on the ISIS routing convergence time, as the prefix-independent switch-over to the pre-computed backup next hop occurs upon IGP convergence (deletion of the IGP route to the remote ABR), and thus would have a connectivity recovery time of several hundred msec.

5.1.8.4.2.  Extensions to support ABRs connected to different aggregation regions

Note that for the purpose of identifying the forwarding context, ABR1's forwarding state could be partitioned, with each partition being assigned its own IP address (its own context identifier). ABR1 would advertise all these identifiers into ISIS and LDP. This may be useful in the scenario where ABR1 is connected to more than one aggregation domain (more than one L1 area), in which case each context identifier would identify ABR1's forwarding state associated with a single aggregation domain.

One could further refine the above scheme by implementing protector functionality that would allow a single protector to protect multiple forwarding contexts, with each forwarding context being associated with all the forwarding state maintained by a given (protected) ABR. Such functionality could be implemented either on a separate router, or could be co-located with an existing ABR. Details of this are outside the scope of this document.

5.1.8.4.3.  Anycast BGP applied to a L3VPN PE

BGP Anycast is also used to protect against L3VPN PE failures. In general a given VPN site can be multi-homed (connected to several L3VPN PEs). Moreover, multi-homed sites may be non-congruent with each other: different multi-homed sites connected to a given PE may have their other connection(s) to different other PEs.
The BGP Anycast scheme, utilizing the construct of a Protector PE, provides forwarding context protection for multiple egress PEs in the presence of non-congruent multi-homed sites.

The Protector PE function is enhanced from the basic BGP Anycast 1:1 mirroring procedures described for ABR protection by supporting multiple backup forwarding contexts, one per protected egress PE. Each backup forwarding context on the Protector PE is identified by the context identifier of the associated protected egress PE. The Protector PE advertises these context identifiers into the IGP with a large metric and into LDP with no PHP and a non-null label. This results in the PLR of each egress PE having an LFA route/LSP (or a bypass LSP if there is no native LFA coverage for the specific topology) to the associated context identifier with the Protector PE as the next hop. The Protector PE creates a backup forwarding context per protected egress PE based on BGP advertisements from this egress PE and from other egress PEs with the same multi-homed customer networks.

Similarly to the ABR case described earlier, in case of failure of a specific protected egress PE, the PLR will follow the standard LFA procedure (or local protection to the bypass LSP) and forward the affected flows to the Protector PE. Those flows will arrive at the Protector PE on the LSP associated with the context identifier of the failed egress PE, the backup forwarding context will be identified by this LSP, and the flows will be switched to the alternative egress PE(s).

5.1.8.5.  Assessing loss of connectivity upon any failure

We select two typical traffic flows and analyze the loss of connectivity (LoC) upon each possible failure.

Flow F1 starts from an AN1 in a left aggregation region and ends on an AN2 in a right aggregation region. Each AN is dual-homed to two AGNs.

Flow F2 starts from an L3VPN PE1 in the core and ends at an L3VPN PE2 in the core.
Note that due to the symmetric network topology in case study 1, the uni-directional flows F1' and F2', associated with F1 and F2 and forwarded in the reverse direction (AN2 to AN1 right-to-left and PE2 to PE1, respectively), take advantage of the same failure restoration mechanisms as F1 and F2.

5.1.8.5.1.  AN1-AGN link failure or AGN node failure

F1 is impacted, but LoC < 50msec is possible assuming fast BFD detection and a fast-switchover implementation on the AN. F2 is not impacted.

5.1.8.5.2.  Link or node failure within the left aggregation region

F1 is impacted, but LoC < 50msec thanks to LFA FRR. No micro-loop (uloop) will occur during the IGP convergence following the LFA protection.

Note: if LFA is not available (a topology other than that of case study 1) or if LFA is not enabled, then the LoC would be sub-second, as the number of impacted important IGP routes in a seamless architecture is much smaller than 2960.

F2 is not impacted.

5.1.8.5.3.  ABR node failure between left region and the core

F1 is impacted, but LoC < 50msec thanks to LFA FRR. No micro-loop (uloop) will occur during the IGP convergence following the LFA protection.

Note: This case is also called "Local ABR failure", as the ABR which fails is the one connected to the aggregation region at the source of flow F1.

Note: remember that the left region receives the routes to all the remote ABRs and that the labeled BGP routes are reflected from the core to the left region with next-hop unchanged. This ensures that the loss of the (local) ABR between the left region and the core is seen as an IGP route impact and hence can be addressed by LFA.

Note: if LFA is not available (a topology other than that of case study 1) or if LFA is not enabled, then the LoC would be sub-second, as the number of impacted important IGP routes in a seamless architecture is much smaller than 2960.

F2 is not impacted.

5.1.8.5.4.
Link or node failure within the core region

F1 and F2 are impacted, but LoC < 50msec thanks to LFA FRR.

This is specific to the particular core topology used in deployment case study 1. The core topology has been optimized [I-D.filsfils-rtgwg-lfa-applicability] for LFA applicability.

As explained in [I-D.filsfils-rtgwg-lfa-applicability], another alternative to provide < 50msec in this case consists of using an MPLS-TE full-mesh and MPLS-TE FRR. This is required when the designer is not able or does not want to optimize the topology for LFA applicability but wants to achieve < 50msec protection.

Alternatively, simple IGP convergence would ensure a sub-second LoC, as the number of impacted important IGP routes in a seamless architecture is much smaller than 2960.

5.1.8.5.5.  PE2 failure

F1 is not impacted.

F2 is impacted and the LoC is sub-300msec thanks to IGP convergence and hierarchical FIB.

The detection of the primary nhop failure (PE2 down) is performed by a single-area IGP convergence.

In this specific case, the convergence should be much faster than 90% of the IGP/BGP3107 footprint at least).

If the guidelines cannot be met, then the designer will rely on either (1) augmenting native LFA coverage with RSVP, or (2) a full-mesh TE FRR model, or (3) IGP convergence. The first option provides the same sub-50msec protection as LFA, but introduces additional RSVP LSPs. The second option optimizes for sub-50msec protection, but implies a more complex operational model. The third option optimizes for simple operation but only provides 500k and rising) so the target can be handled with current implementations. In addition, AN routes are internal routes whose churn and instability are smaller and more under control than those of external routes.

BGP Route Reflector (RR)

   NLRI : #AN ~ o(n)

   path : 2*#AN ~ o(2n)

ABRs handle both the core and aggregation routes. Their table sizes do not depend on the total number of AN nodes, but only on the number of AN in their aggregation domain.
Expires September 12, 2011 [Page 39] Internet-Draft Seamless MPLS March 2011 ABR: IP FIB : 5*#Core + (5*#AGN + #AN) / #Area ~ o(#AN /#Area) MPLS LFIB : #Core + (#AGN + #AN) / #Area ~ o(#AN / #Area) 5.2.1.3. Aggregation Domain In the aggregation domain, IGP & LDP are not affected by the number of access nodes outside of their domain. They are not affected by the total number of AN nodes: IGP: node : #AGN / #Area ~ o(1) links : 3*#AGN / #Area ~ o(1) IP prefixes : #Core + #Area + (5*#AGN + #AN) / #Area ~ o(#AN *5/ #Area) + + 1 loopback per core node + one aggregate per area + 5 prefixes per AGN in the area + 1 prefix per AN in the area. LDP FEC: Core + (#AGN + #AN) / #Area ~ o(#AN / #Area) + + 1 loopback per core node + 1 loopback per AGN & AN node in the area. AGN FIBs grows with the number of node in the core area, in their aggregation area, plus the number of inter domain LSP required by the AN attached to them. They do not depend on the total number of AN nodes. In the BGP control plane, AGN also needs to handle all the AN routes. AGN: IP FIB : #Core + #Area + (5*#AGN + #AN) / #Area ~ o(#AN *5/ #Area) Leymann, et al. Expires September 12, 2011 [Page 40] Internet-Draft Seamless MPLS March 2011 MPLS LFIB : #Core + (#AGN + #AN) / #Area + 100 ~ o(#AN / #Area) AN FIBs grows with its connectivity requirement. They do not depend on the number of AN, AGN, SN or any others nodes. AN: IP RIB : 1 ~ o(1) MPLS LIB : 1k ~ o(1) IP FIB : 1 ~ o(1) MPLS LFIB : 1k ~ o(1) 5.2.1.4. Summary AN requirements are kept minimal. BGP is not required and the size of their FIB is limited to their own connectivity requirements. In the core area, IGP and LDP are not affected by the node in the aggregation domains. In particular they do not grow with the number of AGN or AN. In the aggregation areas, IGP and LDP are affected by the number of core nodes and the number of AGN and AN in their area. They are not affected by the total number of AGN or AN in the seamless MPLS domain. 
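
   As an illustration, the ABR, AGN and RR formulas given above can be
   evaluated with a short script.  This is only a sketch under stated
   assumptions: the class, field and function names are hypothetical,
   and the constant 100 in the AGN MPLS LFIB formula is interpreted
   here as the number of inter-domain LSPs needed by attached ANs.  The
   results are indicative and are not guaranteed to reproduce every
   figure listed in the numerical applications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Deployment targets; all names here are illustrative."""
    areas: int  # number of aggregation domains (#Area)
    core: int   # number of core/backbone nodes (#Core)
    agn: int    # number of aggregation nodes (#AGN)
    an: int     # number of access nodes (#AN)


def prefixes_per_area(d):
    # (5*#AGN + #AN) / #Area: 5 prefixes per AGN, 1 prefix per AN
    return (5 * d.agn + d.an) / d.areas


def loopbacks_per_area(d):
    # (#AGN + #AN) / #Area: 1 loopback per AGN and AN node
    return (d.agn + d.an) / d.areas


def agn_ip_fib(d):
    # #Core + #Area + (5*#AGN + #AN) / #Area
    return d.core + d.areas + prefixes_per_area(d)


def agn_mpls_lfib(d, inter_domain_lsps=100):
    # #Core + (#AGN + #AN) / #Area + 100
    # Assumption: the constant 100 stands for inter-domain LSPs.
    return d.core + loopbacks_per_area(d) + inter_domain_lsps


def abr_ip_fib(d):
    # 5*#Core + (5*#AGN + #AN) / #Area
    return 5 * d.core + prefixes_per_area(d)


def abr_mpls_lfib(d):
    # #Core + (#AGN + #AN) / #Area
    return d.core + loopbacks_per_area(d)


def rr_bgp(d):
    # Route reflector: (NLRI, paths) = (#AN, 2*#AN)
    return d.an, 2 * d.an


if __name__ == "__main__":
    # Targets of use case #1, written without thousands separators.
    uc1 = Deployment(areas=100, core=1000, agn=10000, an=100000)
    print("AGN IP FIB   :", agn_ip_fib(uc1))
    print("AGN MPLS LFIB:", agn_mpls_lfib(uc1))
    print("ABR MPLS LFIB:", abr_mpls_lfib(uc1))
    print("RR NLRI/paths:", rr_bgp(uc1))
```

   Every function depends on #AGN and #AN only through the per-area
   averages, which is why no FIB grows with the total number of access
   nodes as long as the number of areas grows accordingly.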

   No FIB of any node is required to handle the total number of AGNs or
   ANs in the seamless MPLS domain.  In other words, the number of AGNs
   and ANs in the seamless MPLS domain is not limited, if the number of
   areas can grow accordingly.  The main limitation is the MPLS
   connectivity requirements on the AN, i.e. mainly the number of LSPs
   needed on the AN.  Another limitation may be the number of different
   LSPs needed by the ANs attached to or behind an AGN.  However, given
   foreseen deployments and current AGN capabilities, this is not
   expected to be a limitation.

   In the control plane, BGP will typically handle all AN routes.  This
   is significant, but target deployments are well under current
   equipment capacities.  In addition, if required, additional
   techniques could be used to improve this scalability, based on the
   experience gained with scaling BGP/MPLS VPN (e.g. route partitioning
   between RR planes, route filtering (static or dynamic with ORF or
   route refresh) between AN and on AGN to improve AGN scalability).

5.2.1.5.  Numerical application for use case #1

   As a recap, targets for deployment scenario 1 are:

   o  Number of Aggregation Domains      100

   o  Number of Backbone Nodes         1.000

   o  Number of Aggregation Nodes     10.000

   o  Number of Access Nodes         100.000

   This gives the following scaling numbers for each category of nodes:

   o  AN IP FIB            1

   o  AN MPLS LFIB     1 000

   o  AGN IP FIB       2 600

   o  AGN MPLS LFIB    2 200

   o  ABR IP FIB       7 600

   o  ABR MPLS LFIB    2 100

   o  TN IP FIB        5 000

   o  TN MPLS LFIB     1 000

   o  RR BGP NLRI    100 000

   o  RR BGP paths   200 000

5.2.1.6.  Numerical application for use case #2

   As a recap, targets for deployment scenario 2 are:

   o  Number of Aggregation Domains       30

   o  Number of Backbone Nodes           150

   o  Number of Aggregation Nodes      1.500

   o  Number of Access Nodes          40.000

   This gives the following scaling numbers for each category of nodes:

   o  AN IP FIB            1

   o  AN MPLS LFIB     1 000

   o  AGN IP FIB       1 700

   o  AGN MPLS LFIB    1 800

   o  ABR IP FIB       3 700

   o  ABR MPLS LFIB    1 600

   o  TN IP FIB          750

   o  TN MPLS LFIB       150

   o  RR BGP NLRI     40 000

   o  RR BGP paths    80 000

6.  Acknowledgements

   Many people contributed to this document.  The authors wish to
   thank, in alphabetical order:

   o  Wim Henderickx (Alcatel)

   o  Clarence Filsfils (Cisco Networks)

   o  Thomas Beckhaus, Wilfried Maas, Roger Wenner (Deutsche Telekom)

   o  Kireeti Kompella, Yakov Rekhter (Juniper Networks)

   o  Mark Tinka (Global Transit)

   o  Simon DeLord (Telstra)

7.  IANA Considerations

   This memo includes no request to IANA.

   All drafts are required to have an IANA considerations section (see
   the update of RFC 2434 [I-D.narten-iana-considerations-rfc2434bis]
   for a guide).  If the draft does not require IANA to do anything,
   the section contains an explicit statement that this is the case (as
   above).  If there are no requirements for IANA, the section will be
   removed during conversion into an RFC by the RFC Editor.

8.  Security Considerations

   In a typical MPLS deployment the use of MPLS is limited to a
   relatively small network consisting of core and edge nodes.  Those
   nodes are under full control of the service provider and placed at
   locations where only authorized personnel have access (this also
   includes physical access to the nodes).  With the extension of MPLS
   towards access and aggregation nodes, not all nodes will be "locked
   away" in secure locations.  Small access nodes like DSLAMs will be
   located in street cabinets, potentially offering access to the
   "interested researcher".  Nevertheless, unauthorized access to such
   a device SHOULD NOT pose any security risk to the MPLS
   infrastructure itself.  Seamless MPLS must be stable regarding
   attacks against access and aggregation nodes running MPLS.

   Levels of Security: tbd.

   Access Network: tbd.

   Aggregation Network: tbd.

   Core Network: tbd.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative References

   [ABRFRR]   Rekhter, Y., "Local Protection for LSP tail-end node
              failure, MPLS World Congress 2009".

   [ACM01]    "Achieving sub-second IGP convergence in large IP
              networks, ACM SIGCOMM Computer Communication Review,
              v.35 n.3", July 2005.

   [BGPPIC]   "BGP PIC, Technical Report", November 2007.

   [I-D.filsfils-rtgwg-lfa-applicability]
              Filsfils, C., Francois, P., Shand, M., Decraene, B.,
              Uttaro, J., Leymann, N., and M. Horneffer, "LFA
              applicability in SP networks",
              draft-filsfils-rtgwg-lfa-applicability-00 (work in
              progress), March 2010.

   [I-D.ietf-bfd-v4v6-1hop]
              Katz, D. and D. Ward, "BFD for IPv4 and IPv6 (Single
              Hop)", draft-ietf-bfd-v4v6-1hop-11 (work in progress),
              January 2010.

   [I-D.kothari-henderickx-l2vpn-vpls-multihoming]
              Kothari, B., Kompella, K., Henderickx, W., and F. Balus,
              "BGP based Multi-homing in Virtual Private LAN Service",
              draft-kothari-henderickx-l2vpn-vpls-multihoming-01 (work
              in progress), July 2009.

   [I-D.narten-iana-considerations-rfc2434bis]
              Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs",
              draft-narten-iana-considerations-rfc2434bis-09 (work in
              progress), March 2008.

   [I-D.raggarwa-mac-vpn]
              Aggarwal, R., Isaac, A., Uttaro, J., Henderickx, W., and
              F. Balus, "BGP MPLS Based MAC VPN",
              draft-raggarwa-mac-vpn-01 (work in progress), June 2010.

   [I-D.sajassi-l2vpn-rvpls-bgp]
              Sajassi, A., Patel, K., Mohapatra, P., Filsfils, C., and
              S. Boutros, "Routed VPLS using BGP",
              draft-sajassi-l2vpn-rvpls-bgp-01 (work in progress),
              July 2010.

   [PEFRR]    Le Roux, J., Decraene, B., and Z. Ahmad, "Fast Reroute in
              MPLS L3VPN Networks - Towards CE-to-CE Protection, MPLS
              2006 Conference".

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              June 1999.

   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon,
              "Multiprotocol Label Switching Architecture", RFC 3031,
              January 2001.

   [RFC3107]  Rekhter, Y. and E. Rosen, "Carrying Label Information in
              BGP-4", RFC 3107, May 2001.

   [RFC3209]  Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
              and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
              Tunnels", RFC 3209, December 2001.

   [RFC3353]  Ooms, D., Sales, B., Livens, W., Acharya, A., Griffoul,
              F., and F. Ansari, "Overview of IP Multicast in a
              Multi-Protocol Label Switching (MPLS) Environment",
              RFC 3353, August 2002.

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              July 2003.

   [RFC4090]  Pan, P., Swallow, G., and A. Atlas, "Fast Reroute
              Extensions to RSVP-TE for LSP Tunnels", RFC 4090,
              May 2005.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, February 2006.

   [RFC5036]  Andersson, L., Minei, I., and B. Thomas, "LDP
              Specification", RFC 5036, October 2007.

   [RFC5283]  Decraene, B., Le Roux, JL., and I. Minei, "LDP Extension
              for Inter-Area Label Switched Paths (LSPs)", RFC 5283,
              July 2008.

   [RFC5286]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
              Reroute: Loop-Free Alternates", RFC 5286, September 2008.

   [RFC5332]  Eckert, T., Rosen, E., Aggarwal, R., and Y. Rekhter,
              "MPLS Multicast Encapsulations", RFC 5332, August 2008.

Authors' Addresses

   Nicolai Leymann (editor)
   Deutsche Telekom AG
   Winterfeldtstrasse 21
   Berlin  10781
   DE

   Phone: +49 30 8353-92761
   Email: n.leymann@telekom.de


   Bruno Decraene
   France Telecom
   38-40 rue du General Leclerc
   Issy Moulineaux cedex 9,   92794
   FR

   Email: bruno.decraene@orange-ftgroup.com


   Clarence Filsfils
   Cisco Systems
   Brussels, Belgium

   Email: cfilsfil@cisco.com


   Maciek Konstantynowicz
   Juniper Networks

   Email: maciek@juniper.net


   Dirk Steinberg
   Steinberg Consulting
   Ringstrasse 2
   Buchholz  53567
   DE

   Email: dws@steinbergnet.net