Internet Engineering Task Force                             Y. Shen, Ed.
Internet-Draft                                               R. Aggarwal
Intended status: Standards Track                        Juniper Networks
Expires: August 30, 2012                                   W. Henderickx
                                                          Alcatel-Lucent
                                                       February 27, 2012


                  PW Endpoint Fast Failure Protection
              draft-shen-pwe3-endpoint-fast-protection-01

Abstract

   This document specifies a fast protection mechanism for pseudowires (PWs) against egress attachment circuit (AC) failure, egress PE failure, and switching PE failure. Designed on the basis of multi-homed CEs, PW redundancy, upstream label assignment, and context-specific label switching, the mechanism enables local repair to be performed immediately upon a failure. In particular, the router at the point of local repair (PLR) can redirect PW traffic to a protector via a bypass LSP on the order of tens of milliseconds, achieving fast protection that is comparable to RSVP fast-reroute.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 30, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of

Shen, et al.             Expires August 30, 2012                [Page 1]

Internet-Draft     PW Endpoint Fast Failure Protection     February 2012

   publication of this document.
   Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Specification of Requirements
   3.  Reference Models and Failure Cases
     3.1.  Single-Segment PW
     3.2.  Multi-Segment PW
   4.  Theory of Operation
     4.1.  Protector and Context Identifier
     4.2.  Protection Models
     4.3.  Context Identifier Advertisement by IGP
     4.4.  LSP and Context Identifier Association
     4.5.  PW and Context Identifier Association
     4.6.  Bypass LSP
       4.6.1.  RSVP Signaled Bypass LSP and Backup LSP
       4.6.2.  LDP Signaled Bypass LSP
     4.7.  Forwarding State on Protector
     4.8.  PW Label Distribution from Primary PE to Protector
       4.8.1.  Protection FEC Element Encoding for PWid
       4.8.2.  Protection FEC Element Encoding for Generalized PWid
     4.9.  PW Label Distribution from Backup PE to Protector
     4.10. Revertive Behavior
   5.  IANA Considerations
   6.  Security Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Per [RFC 3985], [RFC 4447], and [RFC 5659], a pseudowire (PW) or PW segment can be thought of as a connection between a pair of forwarders hosted by two PEs, carrying an emulated layer-2 service. In the single-segment PW (SS-PW) case, a forwarder binds a PW to an attachment circuit (AC). In the multi-segment PW (MS-PW) case, a forwarder on a terminating PE (T-PE) binds a PW segment to an AC, while a forwarder on a switching PE (S-PE) binds one PW segment to another PW segment. In each direction, PW packets are carried between PEs over an MPLS tunnel, which is called a transport LSP.

   In order to protect a layer-2 service against network failures, it is necessary to protect every link and node along the entire data path, including the ingress AC, ingress (T-)PE, intermediate LSRs of the transport LSP, S-PEs, egress (T-)PE, and egress AC. To minimize traffic disruption upon a failure, it is also desirable that each of these components be protected by a fast protection mechanism based on local repair.

   Today, fast protection against ingress AC failure and ingress (T-)PE failure is achievable by using multi-homed CEs and redundant PWs. Fast protection against LSR failure is achievable through RSVP fast-reroute [RFC 4090]. However, there is no similar protection against egress AC failure, egress (T-)PE failure, or S-PE failure. In these cases, a global repair mechanism has to be relied on. Global repair mechanisms are normally driven by the ingress CE or ingress (T-)PE and depend on control plane convergence. Therefore, they are relatively slow in reacting to failures and restoring traffic.
   This document specifies a fast protection mechanism for PWs based on the technique of local repair. The mechanism can protect PWs against the following types of failures:

   a.  Egress AC failure.

   b.  Egress node failure: failure of the egress PE of an SS-PW, or failure of the egress T-PE of an MS-PW.

   c.  Switching node failure: failure of an S-PE of an MS-PW.

   The mechanism is relevant to networks with redundant PWs and multi-homed CEs. It is designed on the basis of MPLS upstream label assignment and context-specific label switching [RFC 5331]. Fast protection refers to the ability to perform local repair upon a failure on the order of tens of milliseconds, which is comparable to RSVP fast-reroute [RFC 4090]. This is achieved by establishing local protection at the router adjacent to the failure. Compared with the existing global repair mechanisms, this mechanism can provide faster failure detection and traffic restoration. However, it is intended to complement the global repair mechanisms rather than replace them in any way.

   The mechanism is applicable to LDP signaled PWs.

2.  Specification of Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

3.  Reference Models and Failure Cases

   This document refers to the following topologies to describe PW endpoint failures and protection procedures. These topologies are commonly used in PW redundancy for end-to-end global protection. The fast protection mechanism in this document also uses them for local repair, which allows local repair and global repair to work in tandem to achieve a broader scope of protection for services.

3.1.  Single-Segment PW

          |<-------------- PW1 --------------->|
        - PE1 -------------- P1 ---------------- PE2 -
       /                                              \
      /                                                \
   CE1                                                  CE2
      \                                                /
       \                                              /
        - PE3 -------------- P2 ---------------- PE4 -
          |<-------------- PW2 --------------->|

                           Figure 1

   In Figure 1, the MPLS network consists of PE routers and P routers. It provides an emulation of a layer-2 service between CE1 and CE2. Each CE is multi-homed to two PEs, so there are two divergent paths between the CEs. The first path uses PW1, established between PE1 and PE2, connecting the AC CE1-PE1 and the AC CE2-PE2. The second path uses PW2, established between PE3 and PE4, connecting the AC CE1-PE3 and the AC CE2-PE4. The operational states of all the PWs and ACs are up.

   At any given time, each CE sends traffic via only one AC and receives traffic via only one AC. The two ACs may or may not be the same. The AC used to send traffic is determined by the CE, and its selection MAY rely on an end-to-end OAM mechanism between the CEs. The AC used for the CE to receive traffic is determined by the state of the MPLS network and the protection mechanism in use, as described later in this document.

   From the perspective of traffic towards a given CE, the set of PWs, PEs, and ACs involved can be viewed as serving primary and backup (or active and standby) roles. When the MPLS network is in a steady state, the PW that is intended to carry the traffic is referred to as the primary PW. The PE at the egress of the primary PW is the primary PE. The AC connecting the CE and the primary PE is the primary AC. The other PWs that may be used to carry the traffic upon a network failure are referred to as backup PWs. The PE at the egress of a backup PW is a backup PE. The AC connecting the CE and a backup PE is a backup AC.
   In this document, the following primary and backup roles are assigned for the traffic going from CE1 to CE2:

      Primary PW: PW1
      Primary PE: PE2
      Primary AC: CE2-PE2
      Backup PW:  PW2
      Backup PE:  PE4
      Backup AC:  CE2-PE4

   In this case, an egress AC failure refers to the failure of the primary AC, i.e., the AC CE2-PE2. An egress node failure refers to the failure of the primary PE, i.e., PE2. The backup PE, backup PW, and backup AC may be used to carry the traffic when CE1 and CE2 switch traffic to PW2 during a global repair, or when a local repair takes effect, as described later in this document.

          |<-------------- PW1 --------------->|
               ------------- P1 ---------------- PE2 -
              /                                       \
             /                                         \
   CE1 -- PE1                                           CE2
             \                                         /
              \                                       /
               ------------- P2 ---------------- PE4 -
          |<-------------- PW2 --------------->|

                           Figure 2

   Figure 2 shows another possible scenario, where CE2 remains multi-homed to PE2 and PE4, while CE1 is single-homed to PE1. From the perspective of egress protection for the traffic from CE1 to CE2, this topology is not much different from Figure 1. However, for the traffic in the opposite direction, i.e., from CE2 to CE1, PE1 must be prepared to receive the traffic on either PW1 or PW2, and send it to CE1 over the AC CE1-PE1 in both cases.

3.2.  Multi-Segment PW

          |<--------------- PW1 --------------->|
          |<----- SEG1 ----->|<----- SEG2 ----->|
        - TPE1 -------------- SPE1 --------------- TPE2 -
       /                                                 \
      /                                                   \
   CE1                                                     CE2
      \                                                   /
       \                                                 /
        - TPE3 -------------- SPE2 --------------- TPE4 -
          |<----- SEG3 ----->|<----- SEG4 ----->|
          |<--------------- PW2 --------------->|

                           Figure 3

   Figure 3 shows a topology that is similar to Figure 1, but in an MS-PW environment. PW1 and PW2 are both MS-PWs. PW1 is established between TPE1 and TPE2 and switched at SPE1. PW2 is established between TPE3 and TPE4 and switched at SPE2. CE1 is multi-homed to TPE1 and TPE3. CE2 is multi-homed to TPE2 and TPE4.
   In this document, the following primary and backup roles are assigned for the traffic going from CE1 to CE2:

      Primary PW:   PW1
      Primary T-PE: TPE2
      Primary S-PE: SPE1
      Primary AC:   CE2-TPE2
      Backup PW:    PW2
      Backup T-PE:  TPE4
      Backup S-PE:  SPE2
      Backup AC:    CE2-TPE4

   In this case, an egress AC failure refers to the failure of the primary AC, i.e., the AC CE2-TPE2. An egress node failure refers to the failure of the primary T-PE, i.e., TPE2. In addition, a switching node failure refers to the failure of the primary S-PE, i.e., SPE1. The backup T-PE, backup PW, and backup AC are used in the protection against egress AC failure and egress node failure. The backup S-PE and the backup PW are used in the protection against switching node failure, as described later in this document.

   For consistency with the SS-PW scenario, primary T-PEs and primary S-PEs may simply be referred to as primary PEs in this document where the distinction is not required. Similarly, backup T-PEs and backup S-PEs may be referred to as backup PEs.

4.  Theory of Operation

   The fast protection mechanism in this document provides three types of protection for PWs, corresponding to the three types of failures described in Section 3:

   a.  Egress AC protection

   b.  Egress node protection for (T-)PE

   c.  Switching node protection for S-PE

   The mechanism is only relevant when the target CE is multi-homed to a primary PE and one or more backup PEs, and when there is a backup PW in the network. In switching node protection, it is also assumed that there is a backup S-PE on the backup PW.

   The mechanism relies on local repair performed by routers adjacent to failures.
   Such a router MUST be able to detect failures by using a rapid mechanism, such as physical layer failure detection or Bidirectional Forwarding Detection (BFD) [RFC 5880]. The router MUST also pre-establish an MPLS LSP, called a bypass LSP, in anticipation of failures. This router is referred to as a point of local repair (PLR), as it serves the same role as the PLR in RSVP fast-reroute [RFC 4090].

   o  In egress AC protection, the PLR is the primary PE that hosts the primary AC. Upon a failure of the primary AC, the PLR redirects traffic via the bypass LSP to a backup PE, which in turn sends the traffic to the CE via a backup AC.

   o  In egress node protection, the PLR is the penultimate hop router of the transport LSP of the primary PW. Upon a failure of the primary PE, the PLR redirects traffic via the bypass LSP to a backup PE, which in turn sends the traffic to the CE via a backup AC.

   o  In switching node protection, the PLR is the penultimate hop router of the transport LSP of the current primary PW segment. Upon a failure of the primary S-PE, the PLR redirects traffic via the bypass LSP to a backup S-PE. The backup S-PE then sends the traffic via the next segment of the backup PW to the backup T-PE. The backup T-PE finally sends the traffic to the CE via a backup AC.

   In all the above cases, each backup (S-)PE is said to serve as a "protector" for the primary PW. It is also possible to have a dedicated router serving as a protector. In this case, the protector is not a backup (S-)PE of the primary PW. During a local repair, the PLR still redirects traffic to the protector via a bypass LSP. The protector MUST then send the traffic to a backup (S-)PE via an MPLS LSP. Finally, the backup (S-)PE sends the traffic towards the CE via a backup AC or backup PW segment.

   In any case, when a PLR redirects traffic to a protector during local repair, it MUST keep the PW label intact.
   This simplifies the backup forwarding state that the PLR must install in advance, and reduces the overhead of setting up the protection. It also means that the protector MUST be able to forward the traffic based on a label that is assigned by the primary PE. From the protector's perspective, this label is an upstream-assigned label [RFC 5331]. Hence, the protector MUST look up the label in a context-specific label space.

4.1.  Protector and Context Identifier

   A router that protects a PW against egress endpoint failures and is able to forward traffic based on the PW label assigned by the primary PE is called a protector. This is the router to which a PLR redirects traffic during a local repair. It MUST forward the traffic in such a way that the traffic can eventually reach the target CE. Examples of protectors include a backup (S-)PE and a dedicated router that assumes such a role.

   A given protector MAY protect multiple PWs that are terminated at one or multiple primary PEs. Likewise, the PWs terminated at a given primary PE MAY be protected by multiple protectors, each for a subset of the PWs. In any case, each PW is associated with one and only one pair of {primary PE, protector}.

   For each ordered pair of {primary PE, protector}, an IPv4/v6 address is assigned to identify the two routers and their relationship. This address is referred to as a "context identifier", as it indicates the forwarding context for the protector with regard to the primary PE. Each context identifier MUST be globally unique, or unique within the address space of the network where the primary PE and the protector reside.

4.2.  Protection Models

   There are two protection models based on the location and role of a protector.

   1.  Co-located protector

       In this model, the protector is a backup PE that is directly connected to the target CE via a backup AC, or it is a backup S-PE on a backup PW. That is, the protector is co-located with the backup (S-)PE.

       In egress AC protection and egress node protection, when a protector receives traffic from the PLR, it MUST send the traffic directly to the CE via the backup AC. This is shown in Figure 4, where PE2 is the PLR for egress AC failure, P3 is the PLR for PE2 failure, and PE4 (the backup PE) is the protector.

          |<-------------- PW1 --------------->|
                                        PLR      PLR
        - PE1 -------------- P1 ------- P3 ----- PE2 -
       /                                  \       |   \
      /                                    \      |    \
   CE1                                      \     |     CE2
      \                                      \    |    /
       \                                      \   |   /
        - PE3 -------------- P2 ---------------- PE4 -
                                              protector
          |<-------------- PW2 --------------->|

                           Figure 4

       In switching node protection, when a protector receives traffic from the PLR, it MUST send the traffic via the next segment of the backup PW. The T-PE of the backup PW MUST send the traffic to the CE via a backup AC. This is shown in Figure 5, where P4 is the PLR for SPE1 failure, and SPE2 (the backup S-PE) serves as the protector for SPE1 (the primary S-PE).

          |<--------------- PW1 --------------->|
          |<----- SEG1 ----->|<----- SEG2 ----->|
                    PLR
        - TPE1 ----- P4 ----- SPE1 --------------- TPE2 -
       /               \                                 \
      /                 \                                 \
   CE1                   \                                 CE2
      \                   \                               /
       \                   \                             /
        - TPE3 -------------- SPE2 --------------- TPE4 -
                            protector
          |<----- SEG3 ----->|<----- SEG4 ----->|
          |<--------------- PW2 --------------->|

                           Figure 5

       In this model, the number of context identifiers required by a network is the number of distinct {primary PE, backup PE} pairs. Therefore, the model is suitable for scenarios where the number of backup PEs for any given primary PE is relatively small.

   2.  Centralized protector

       In this model, the protector is a dedicated P router or PE router that protects all the primary PWs of one or multiple primary PEs. In egress AC protection and egress node protection, the protector may or may not be a backup PE with a direct connection to the target CE. In switching node protection, it may or may not be a backup S-PE on a backup PW.

       In egress AC protection and egress node protection, when the protector receives traffic from the PLR, if the protector has a direct connection (i.e., a backup AC) to the CE, it MUST send the traffic to the CE via the backup AC, similar to Figure 4. Otherwise, it MUST send the traffic to a backup PE, which MUST then send the traffic to the CE via a backup AC. This is shown in Figure 6, where the protector receives traffic from P3 or PE2 (the PLR) and sends the traffic to PE4 (the backup PE). The protector may be protecting other PWs as well, which are not shown.

          |<-------------- PW1 --------------->|
                                        PLR      PLR
        - PE1 -------------- P1 ------- P3 ----- PE2 -
       /                                  \      /    \
      /                                    \    /      \
   CE1                                    protector     CE2
      \                                        \       /
       \                                        \     /
        - PE3 -------------- P2 ---------------- PE4 -
          |<-------------- PW2 --------------->|

                           Figure 6

       In switching node protection, when the protector receives traffic from the PLR, if the protector is a backup S-PE on a backup PW, it MUST send the traffic via the next segment of the backup PW, and the T-PE of the backup PW MUST send the traffic to the CE via a backup AC, similar to Figure 5. Otherwise, the protector MUST first send the traffic to a backup S-PE, which MUST then send the traffic via the next segment of the backup PW. Finally, the T-PE of the backup PW MUST send the traffic to the CE via a backup AC. This is shown in Figure 7, where the protector sends traffic to SPE2 (the backup S-PE). The protector may be protecting other PW segments as well, which are not shown.

          |<--------------- PW1 --------------->|
          |<----- SEG1 ----->|<----- SEG2 ----->|
                    PLR
        - TPE1 ----- P4 ----- SPE1 --------------- TPE2 -
       /               \                                 \
      /                 \                                 \
   CE1                   protector                         CE2
      \                        |                          /
       \                       |                         /
        - TPE3 -------------- SPE2 --------------- TPE4 -
          |<----- SEG3 ----->|<----- SEG4 ----->|
          |<--------------- PW2 --------------->|

                           Figure 7

       In this model, each primary PE may need only one protector to protect all of its PWs. Therefore, the number of context identifiers required by a network can be as low as the number of primary PEs.

   A network MAY use either protection model, or a combination of both, depending on requirements.

4.3.  Context Identifier Advertisement by IGP

   The context identifier of a pair of {primary PE, protector} MUST be advertised by IGP and IGP-TE as a virtual node that is connected to both the primary PE and the protector via unnumbered point-to-point links. This virtual node is called a "context node", as shown in Figure 8. This facilitates path computation and selection for the context identifier (Section 4.4).

                primary PE
                    |
                    |   primary PE to context node:
                    |     (metric 1, TE metric 1, bandwidth max)
                    |   context node to primary PE:
                    |     (metric max, TE metric max, bandwidth 0)
                    |
               context node
                    |
                    |   context node to protector:
                    |     (metric max, TE metric max, bandwidth 0)
                    |   protector to context node:
                    |     (metric max, TE metric max, bandwidth max)
                    |
                protector

                           Figure 8

   The advertisement involves the following parts:

   o  The primary PE advertises the context node with two unnumbered links to the primary PE and the protector, respectively. The router ID of the context node is the context identifier. Both unnumbered links are advertised with maximum routable metric, maximum TE metric, and zero bandwidth. Other TE parameters may be advertised for the links based on configuration.
      In the case of IS-IS [ISO10589], the system ID is derived from the context identifier using Binary Coded Decimal (BCD) encoding. The resulting system ID MUST be unique. The LSP (Link State Packet) MUST include an Area Address TLV, and MAY include a Dynamic Hostname TLV. The area addresses MUST be a subset of, or preferably identical to, those advertised by the primary PE at the corresponding level. The hostname MAY be derived from the context identifier and the primary PE's hostname. The Overload bit MUST be set to 1. The Attached and Partition Repair bits MUST be set to 0.

      In the case of OSPF [RFC 2328], the Advertising Router and Link State ID of the router LSA (Link State Advertisement) MUST both be the context identifier. All options bits in the router LSA MUST be set to zero.

   o  The primary PE advertises an unnumbered link to the context node, with metric 1, TE metric 1, and maximum bandwidth. Other TE parameters may be advertised for the link based on configuration.

   o  The protector advertises an unnumbered link to the context node, with maximum routable metric, maximum TE metric, and maximum bandwidth. Other TE parameters may be advertised for the link based on configuration.

4.4.  LSP and Context Identifier Association

   The transport LSP of a primary PW MUST be destined for the context identifier of the {primary PE, protector} of the PW, rather than an address of the primary PE. This MAY be based on configuration or an auto-discovery mechanism. Similarly, a bypass LSP initiated by a PLR towards a protector MUST also be destined for that context identifier, rather than an address of the protector. When the transport LSP is an RSVP signaled LSP and bypass LSP creation is triggered by the RSVP fast-reroute mechanism (Section 4.6.1), the bypass LSP MAY inherit the context identifier as its destination from the transport LSP.
   Otherwise, this MAY be based on configuration as well.

   Since the context identifier is advertised by IGP and IGP-TE as a context node connected to both the primary PE and the protector, the path computation and selection for these LSPs MUST meet the following requirements:

   o  The transport LSP MUST prefer the primary PE to reach the context identifier. The path MUST be terminated at the primary PE.

   o  The bypass LSP MUST avoid the primary PE, leaving the protector as the only viable router to reach the context identifier. The path MUST be terminated at the protector, and MUST NOT traverse the primary PE.

   When these LSPs are RSVP signaled LSPs, these requirements can be satisfied by using the Constrained Shortest Path First (CSPF) algorithm. When the LSPs are LDP signaled LSPs, this MAY require both the primary PE and the protector to advertise the context identifier as an LDP IPv4/v6 FEC.

4.5.  PW and Context Identifier Association

   The ingress PE of a primary PW (or PW segment) MUST associate the PW with the primary (S-)PE, as in normal LDP signaling. The ingress PE MUST also associate the PW with the context identifier of the {primary PE, protector}, and use the context identifier as the destination address to resolve a transport LSP for the PW. As described in Section 4.4, a candidate LSP MUST be destined for the context identifier.

   The association MAY be based on configuration, or the ingress PE MAY learn it from the primary PE. In the latter case, the primary PE MAY advertise the context identifier as a "third party next hop" in an IPv4/v6 Interface_ID TLV [RFC 3471, RFC 3472] in the LDP Label Mapping message.

4.6.  Bypass LSP

   The set of PWs protected by a PLR may be associated with one or multiple pairs of {primary PE, protector}. The PLR MUST establish a bypass LSP to each protector for each distinct context identifier of the protector.
   The destination of the bypass LSP MUST be the context identifier.

   For example, in Figure 4 and Figure 6, a bypass LSP is established from PE2 (the PLR for egress AC failure) to the protector, and another bypass LSP is established from P3 (the PLR for egress node failure) to the protector. The destination of both bypass LSPs is the context identifier of {PE2, protector}. In Figure 5 and Figure 7, a bypass LSP is established from P4 (the PLR for switching node failure) to the protector. Its destination is the context identifier of {SPE1, protector}.

   During a local repair, a PLR MUST redirect traffic to the protector via the bypass LSP with the PW label intact. Each PW packet will carry a label stack of two labels: the inner label is the PW label, and the outer label is the bypass LSP's label. The protector MUST then forward the traffic based on this PW label, i.e., an upstream-assigned label that is assigned by the primary PE.

   In order for the protector to perform this kind of forwarding, the bypass LSP MUST use ultimate hop popping (UHP) [RFC 3031]. That is, the protector MUST assign a non-reserved label to the bypass LSP. This label indicates the forwarding context, i.e., the context-specific label space of the primary PE in which all PW packets received on the bypass LSP MUST be forwarded. The protector MUST install a forwarding entry for this label, with a label pop and a nexthop pointing to the context-specific label space. Thus, all packets with an inner label will be forwarded based on a label lookup in that label space.

   A bypass LSP may be signaled by RSVP or LDP, which may or may not be the same as the signaling protocol of the transport LSPs.

4.6.1.  RSVP Signaled Bypass LSP and Backup LSP

   If a bypass LSP is an RSVP signaled LSP, its path MAY be statically configured or dynamically computed by CSPF (Section 4.4).

   If the transport LSP is LDP signaled, the bypass LSP will be a standalone LSP from the PLR to the protector. Its creation MAY be based on configuration.

   If the transport LSP is RSVP signaled, creation of the bypass LSP depends on the specific protection scenario. In egress AC protection, the PLR is the primary PE. In this case, the bypass LSP is a standalone LSP from the PLR to the protector, and its creation MAY be based on configuration. In egress node protection and switching node protection, the PLR is the penultimate hop router of the transport LSP. In this case, the PLR MUST rely on the RSVP facility-backup fast-reroute mechanism to create the bypass LSP and perform local repair, as described below.

   o  When the primary PE builds an RRO for the Resv message of the transport LSP, it MUST encode the context identifier (i.e., the context node) as an IPv4/v6 address and implicit NULL (3) as the label, before inserting its own address and label. This allows the PLR to view itself as two hops away from the destination, with the primary PE as the nexthop and the context node as the next-nexthop.

   o  The PLR SHOULD start signaling a node-protection bypass LSP based on the "local protection desired" and "node protection desired" bits that are set in the SESSION_ATTRIBUTE object of the Path message of the transport LSP [RFC 2205, RFC 3209, RFC 4090].

   o  After the bypass LSP is established, the PLR MUST set the "local protection available" and "node protection" bits in the RRO of the Resv message of the transport LSP.

   o  In the event of an egress node or switching node failure, the PLR MUST signal a backup LSP [RFC 4090] to the protector via the bypass LSP. The protector MUST terminate the backup LSP as the egress router. After the backup LSP is established, the PLR MUST set the "local protection in use" bit in the RRO of the Resv message of the transport LSP.
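   The RRO handling in the first bullet above can be illustrated with a short Python sketch. It is a simplified, hypothetical model, not an RSVP implementation: the class and function names, the addresses (taken from the 192.0.2.0/24 documentation range), and the label recorded by the primary PE are assumptions made for illustration. As a further assumption, the RRO is modelled as a list whose first element is the hop nearest to the router reading it, which matches the view described above, in which the PLR sees the primary PE as its nexthop and the context node as its next-nexthop.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

IMPLICIT_NULL = 3  # reserved MPLS label value for implicit NULL (RFC 3032)


@dataclass(frozen=True)
class RroHop:
    """Hypothetical RRO entry: an IPv4/v6 address and the label recorded with it."""
    address: str
    label: int


def primary_pe_build_rro(context_id: str, own_address: str,
                         own_label: int) -> List[RroHop]:
    """Build the RRO that the primary PE returns upstream in the Resv message.

    Assumption: index 0 holds the hop nearest to the router reading the RRO.
    """
    rro: List[RroHop] = []
    # First record the context node, with implicit NULL as its label ...
    rro.insert(0, RroHop(context_id, IMPLICIT_NULL))
    # ... then insert the primary PE's own address and label in front of it.
    rro.insert(0, RroHop(own_address, own_label))
    return rro


def plr_view(rro: List[RroHop]) -> Tuple[RroHop, RroHop]:
    """Return (nexthop, next-nexthop) as seen by the penultimate hop PLR."""
    return rro[0], rro[1]


def node_protection_bypass(rro: List[RroHop]) -> Dict[str, str]:
    """A node-protection bypass ends at the next-nexthop and avoids the nexthop."""
    nhop, nnhop = plr_view(rro)
    return {"destination": nnhop.address, "avoid": nhop.address}


if __name__ == "__main__":
    # Hypothetical values: 192.0.2.2 is the primary PE, 192.0.2.100 is the
    # context identifier of {primary PE, protector}, 1001 is an arbitrary label.
    rro = primary_pe_build_rro("192.0.2.100", "192.0.2.2", 1001)
    print(node_protection_bypass(rro))
    # prints {'destination': '192.0.2.100', 'avoid': '192.0.2.2'}
```

   In this model, an unmodified facility-backup PLR simply asks for a bypass toward its next-nexthop that avoids its nexthop. Because that next-nexthop is the context identifier, and the primary PE must be avoided, the protector is left as the only viable router to reach it, as required in Section 4.4.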
   The procedure above imposes only one specific requirement on the primary PE, namely to insert an extra hop in the RRO of the Resv message. The PLR SHOULD behave as in normal RSVP facility-backup fast-reroute. In fact, the procedure is transparent to the PLR, and the PLR does not need to be aware of it in order to participate.

4.6.2.  LDP Signaled Bypass LSP

   If a bypass LSP is an LDP signaled LSP, its FEC MUST be the context identifier advertised by the protector (Section 4.4). The PLR MUST select this FEC to perform local repair.

4.7.  Forwarding State on Protector

   A protector MUST be able to forward traffic based on the PW labels assigned by primary PEs. Hence, it MUST learn the PW labels from the primary PEs, and maintain the labels in a separate context-specific label space [RFC 5331] for each primary PE (Section 4.8).

   In the control plane, each context-specific label space is identified by the context identifier of the associated {primary PE, protector}. When the protector learns a label from a primary PE, it MUST map the label to a context-specific label space via this context identifier. In the forwarding plane, each context-specific label space is indicated by the UHP labels of the associated bypass LSPs.

   In Figure 9, PE4 is a co-located protector that protects PW1 against egress AC failure and egress node failure. It maintains a context-specific label space for PE2, which is identified by the context identifier of {PE2, PE4}. It learns from PE2 the label that PE2 has assigned to PW1, and installs a forwarding entry for the label in the context-specific label space. The nexthop of the forwarding entry indicates a label pop with the outgoing interface pointing to the backup AC CE2-PE4.
           |<-------------- PW1 --------------->|
         - PE1 -------------- P1 ------- P3 ----- PE2 -
        /                               PLR \     PLR  \
       /                                     \          \
    CE1                                       \          CE2
       \                                       \        /
        \                                       \      /
         - PE3 -------------- P2 ---------------- PE4 -
                                               protector
           |<-------------- PW2 --------------->|

                               Figure 9

In Figure 10, SPE2 is a co-located protector that protects PW1 against switching node failure. It maintains a context-specific label space for SPE1, which is identified by the context identifier of {SPE1, SPE2}. It learns from SPE1 the label that SPE1 has assigned to the PW segment SEG1, and installs a forwarding entry in the context-specific label space. The nexthop of the forwarding entry indicates a label swap to the label of the PW segment SEG4, followed by a label push with the label of the transport LSP of SEG4.

           |<--------------- PW1 --------------->|
           |<----- SEG1 ----->|<----- SEG2 ----->|
         - TPE1 ----- P4 ----- SPE1 --------------- TPE2 -
        /            PLR \                                \
       /                  \                                \
    CE1                    \                                CE2
       \                    \                               /
        \                    \                             /
         - TPE3 --------------- SPE2 --------------- TPE4 -
                             protector
           |<----- SEG3 ----->|<----- SEG4 ----->|
           |<--------------- PW2 --------------->|

                               Figure 10

In the centralized protector model, for each primary PW for which the protector is not a backup (S-)PE, the protector MUST also learn the label of a backup PW from a backup (S-)PE (Section 4.9). This is the backup (S-)PE to which the protector will send traffic. The protector MUST use this label as the outgoing label of the forwarding entry for the primary PW label in the context-specific label space.

In Figure 11, the protector is a centralized protector that protects PW1 against egress AC failure and egress node failure. It maintains a context-specific label space for PE2, which is identified by the context identifier of {PE2, protector}. It learns from PE2 the label that PE2 has assigned to PW1, and learns from PE4 the label that PE4 has assigned to PW2.
It installs a forwarding entry for PW1's label in the context-specific label space. The nexthop of the forwarding entry is a label swap to PW2's label, followed by a label push with the label of a transport LSP from the protector to PE4.

           |<-------------- PW1 --------------->|
         - PE1 -------------- P1 ------- P3 ----- PE2 -
        /                              PLR \      PLR  \
       /                                    \           \
    CE1                                  protector       CE2
       \                                        \       /
        \                                        \     /
         - PE3 -------------- P2 ---------------- PE4 -
           |<-------------- PW2 --------------->|

                               Figure 11

In Figure 12, the protector is a centralized protector that protects the PW segment SEG1 of PW1 against switching node failure of SPE1. It maintains a context-specific label space for SPE1, which is identified by the context identifier of {SPE1, protector}. It learns from SPE1 the label that SPE1 has assigned to SEG1, and learns from SPE2 the label that SPE2 has assigned to SEG3. It installs a forwarding entry for SEG1's label in the context-specific label space. The nexthop of the forwarding entry is a label swap to the label of SEG3, followed by a label push with the label of a transport LSP from the protector to SPE2.

           |<--------------- PW1 --------------->|
           |<----- SEG1 ----->|<----- SEG2 ----->|
         - TPE1 ----- P4 ----- SPE1 -------------- TPE2 -
        /            PLR \                               \
       /                  \                               \
    CE1                 protector                          CE2
       \                      \                            /
        \                      \                          /
         - TPE3 --------------- SPE2 -------------- TPE4 -
           |<----- SEG3 ----->|<----- SEG4 ----->|
           |<--------------- PW2 --------------->|

                               Figure 12

4.8.  PW Label Distribution from Primary PE to Protector

A primary PE MUST distribute the label of each primary PW to the protector that protects the PW. From the protector's perspective, this PW label is an upstream-assigned label.

To achieve this, the primary PE MUST establish a targeted LDP session with the protector.
For each primary PW, the primary PE MUST advertise over that session a Protection FEC Element in a Label Mapping message. The Protection FEC Element is a new LDP FEC element, whose encoding is described below. The PW's label is encoded in the message using the Upstream-Assigned Label TLV defined in [LDP-UPSTREAM]. The Protection FEC Element and the PW label together represent the primary PE's forwarding state for the PW. The Label Mapping message MUST also carry an IPv4/IPv6 Interface_ID TLV [LDP-UPSTREAM, RFC 3471] encoded with the context identifier of the {primary PE, protector} pair.

The protector that receives this Label Mapping message MUST install a forwarding entry for the PW label in the context-specific label space identified by the context identifier. As mentioned above, the nexthop of the forwarding entry MUST allow packets to be sent towards the target CE via a backup AC or a backup (S-)PE, depending on the protection model and on whether an SS-PW or MS-PW scenario is involved.

The Protection FEC Element has type 0x83. It is defined as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Type(0x83)   |   Reserved    | Encoding Type |    Length     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                                                               |
   ~                        PW Information                         ~
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                               Figure 13

- Encoding Type
  The format in which the PW Information field is encoded.

- Length
  Length of the PW Information field in octets.

- PW Information
  A variable-length field that specifies a PW.

For Encoding Type, the value 1 is defined for the PWid FEC Element format, and the value 2 is defined for the Generalized PWid FEC Element format [RFC 4447].

4.8.1.  Protection FEC Element Encoding for PWid

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Type(0x83)   |   Reserved    |  Enc Type(1)  |  Length(16)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Ingress PE Address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Egress PE Address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Group ID                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             PW ID                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |C|         PW Type             |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                               Figure 14

- Ingress PE Address
  IP address of the ingress PE of the PW.

- Egress PE Address
  IP address of the egress PE of the PW.

- Group ID
  An arbitrary 32-bit value that represents a group of PWs and that is used to create groups in the PW space.

- PW ID
  A non-zero 32-bit connection ID that, together with the PW Type field, identifies a particular PW.

- Control word bit (C)
  A bit that flags the presence of a control word on this PW. If C = 1, the control word is present; if C = 0, the control word is not present.

- PW Type
  A 15-bit quantity that represents the type of PW.

4.8.2.  Protection FEC Element Encoding for Generalized PWid

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Type(0x83)   |   Reserved    |  Enc Type(2)  |    Length     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Ingress PE Address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Egress PE Address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |C|         PW Type             |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   AGI Type    |    Length     |      Value                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                    AGI Value (contd.)                         ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   AII Type    |    Length     |      Value                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                   SAII Value (contd.)                         ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   AII Type    |    Length     |      Value                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                   TAII Value (contd.)                         ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                               Figure 15

- Ingress PE Address
  IP address of the ingress PE of the PW.

- Egress PE Address
  IP address of the egress PE of the PW.

- Control word bit (C)
  A bit that flags the presence of a control word on this PW. If C = 1, the control word is present; if C = 0, the control word is not present.

- PW Type
  A 15-bit quantity that represents the type of PW.

- AGI Type, Length, Value, AGI Value
  Attachment Group Identifier of the PW.

- SAII Type, Length, Value, SAII Value
  Source Attachment Individual Identifier of the PW.

- TAII Type, Length, Value, TAII Value
  Target Attachment Individual Identifier of the PW.

4.9.  PW Label Distribution from Backup PE to Protector

In the centralized protection model, in addition to learning PW labels from primary PEs (Section 4.8), a protector MUST also learn from backup (S-)PEs the labels of the backup PWs and backup PW segments for which the protector is not a backup (S-)PE.

To achieve this, each backup (S-)PE MUST establish a targeted LDP session with the protector. The backup PE MUST advertise over that session a Protection FEC Element for the backup PW in a Label Mapping message. The content of this Protection FEC Element MUST match the Protection FEC Element that the primary PE advertises to the protector (Section 4.8). The Label Mapping message MUST also include a Generic Label TLV encoded with the backup PW's label. The context identifier SHOULD NOT be encoded in an Interface_ID TLV in this message. The Protection FEC Element and the backup PW's label together represent the backup PE's forwarding state for the backup PW.

The protector that receives this Label Mapping message MUST associate the backup PW with the primary PW, based on the common Protection FEC Element. It MUST distinguish between the message from the primary PE and the message from the backup PE based on the presence or absence of the context identifier in the Interface_ID TLV. It MUST install a forwarding entry for the primary PW's label in the context-specific label space identified by the context identifier. The nexthop of the forwarding entry MUST be a label swap to the backup PW's label, followed by a label push with the label of a transport LSP from the protector to the backup PE.

4.10.  Revertive Behavior

After a PLR locally repairs a primary PW and redirects traffic to a protector, there are two strategies for restoring the traffic to a fully working PW.
o  Global revertive mode

   While the traffic is taking a detour via the protector, if the ingress CE is multi-homed (Figure 1), it MAY switch the traffic to a backup AC that is bound to a backup PW. Alternatively, if the ingress PE hosts a backup PW (Figure 2), it MAY switch the traffic to the backup PW. These procedures are referred to as global repair and are driven by the ingress CE or the ingress PE. Possible triggers of global repair include PW status, OAM, BFD, and control plane convergence.

o  Local revertive mode

   The PLR MAY move traffic back to the primary PW after the failure is resolved. In egress AC protection, upon detecting that the primary AC is restored, the PLR MAY start forwarding traffic via the AC again. Likewise, in egress node protection and switching node protection, upon detecting that the primary PE is restored, the PLR MAY re-signal the primary transport LSP to the primary PE. After the LSP is re-established, the PLR MAY move the traffic back to the LSP. These procedures are referred to as local reversion.

The fast protection mechanism in this document SHOULD always be used in tandem with the global revertive mode. In particular, in the case of egress (S-)PE failure, if the ingress PE or the protector loses communication with the (S-)PE for an extended period of time, the LDP session between them may go down. Consequently, the ingress PE may bring down the primary PW, or the protector may delete the forwarding entry of the primary PW label from the context-specific label space. In either case, the service will be disrupted. In other words, although fast protection can temporarily repair traffic, control plane state may start to time out if the failure persists. Therefore, the global revertive mode SHOULD always be established in advance, so that it can move traffic to a fully working backup PW shortly after the local repair.

The local revertive mode is optional. In circumstances where the failure is caused by resource flapping, local reversion MAY be dampened to limit potential disruptions. Local revertive mode MAY be disabled completely by configuration.

5.  IANA Considerations

IANA maintains a registry of LDP FECs in the "Forwarding Equivalence Class (FEC) Type Name Space" sub-registry of the "Label Distribution Protocol" registry. This document defines a new LDP Protection FEC Element in Section 4.8. IANA has assigned the type value 0x83 to it.

6.  Security Considerations

The security considerations discussed in RFC 5036, RFC 5331, RFC 3209, and RFC 4090 apply to this document.

7.  Acknowledgements

This document leverages work done by Hannes Gredler, Yakov Rekhter, Minto Jeyananth and several others on MPLS edge protection. Thanks to Nischal Sheth, Bhupesh Kothari, and Kevin Wang for their contribution.

8.  References

8.1.  Normative References

[RFC3985]  Bryant, S. and P. Pate, "Pseudo Wire Emulation Edge-to-Edge (PWE3) Architecture", RFC 3985, March 2005.

[RFC5659]  Bocci, M. and S. Bryant, "An Architecture for Multi-Segment Pseudowire Emulation Edge-to-Edge", RFC 5659, October 2009.

[RFC4447]  Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G. Heron, "Pseudowire Setup and Maintenance Using the Label Distribution Protocol (LDP)", RFC 4447, April 2006.

[RFC5331]  Aggarwal, R., Rekhter, Y., and E. Rosen, "MPLS Upstream Label Assignment and Context-Specific Label Space", RFC 5331, August 2008.

[RFC5036]  Andersson, L., Minei, I., and B. Thomas, "LDP Specification", RFC 5036, October 2007.

[RFC2205]  Braden, R., Zhang, L., Berson, S., Herzog, S., and S. Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 Functional Specification", RFC 2205, September 1997.

[RFC3209]  Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V., and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC 3209, December 2001.

[RFC4090]  Pan, P., Swallow, G., and A. Atlas, "Fast Reroute Extensions to RSVP-TE for LSP Tunnels", RFC 4090, May 2005.

[RFC3471]  Berger, L., "Generalized Multi-Protocol Label Switching (GMPLS) Signaling Functional Description", RFC 3471, January 2003.

[RFC3472]  Ashwood-Smith, P. and L. Berger, "Generalized Multi-Protocol Label Switching (GMPLS) Signaling Constraint-based Routed Label Distribution Protocol (CR-LDP) Extensions", RFC 3472, January 2003.

[RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol Label Switching Architecture", RFC 3031, January 2001.

[RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

[RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD)", RFC 5880, June 2010.

[LDP-UPSTREAM]  Aggarwal, R. and J. Le Roux, "MPLS Upstream Label Assignment for LDP", draft-ietf-mpls-ldp-upstream (work in progress), 2011.

[ISO10589]  ISO, "Intermediate System to Intermediate System intra-domain routeing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode network service (ISO 8473)", International Standard 10589:2002, Second Edition, 2002.

8.2.  Informative References

[RFC5920]  Fang, L., "Security Framework for MPLS and GMPLS Networks", RFC 5920, July 2010.

Authors' Addresses

Yimin Shen (editor)
Juniper Networks
10 Technology Park Drive
Westford, MA 01886
USA

Phone: +1 9785890722
Email: yshen@juniper.net

Rahul Aggarwal
Juniper Networks
1194 N Mathilda Avenue
Sunnyvale, CA 94089
USA

Phone: +1 4089362720
Email: rahul@juniper.net

Wim Henderickx
Alcatel-Lucent
Copernicuslaan 50
2018 Antwerp
Belgium

Email: wim.henderickx@alcatel-lucent.be