MPLS-TP 1toN Protection
draft-ezy-mpls-1ton-protection-01.txt

Abstract

As part of the Transport Profile for Multiprotocol Label Switching (MPLS-TP), there is a requirement to support 1:n linear protection for transport paths. This requirement is elaborated in the MPLS-TP Survivability Framework document [SurvivFwk]. The basic protocol for linear protection is specified in the MPLS-TP Linear Protection document [LinProt], but it is limited to 1+1 and 1:1 protection. This document extends that protocol with the additional functionality needed to support a single preconfigured protection path that protects multiple transport paths between two common endpoints.

This document is a product of a joint Internet Engineering Task Force (IETF) / International Telecommunication Union Telecommunication Standardization Sector (ITU-T) effort to include an MPLS Transport Profile within the IETF MPLS and PWE3 architectures to support the capabilities and functionalities of a packet transport network as defined by the ITU-T.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on September 11, 2012.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

1. Introduction
1.1. 1:n Protection architecture
1.2. Locking operation
1.3. Non-Locking
1.4. Path priority
1.5. Preemption
1.6. Contributing authors
2. Conventions used in this document
2.1. Acronyms
2.2. Definitions and Terminology
3. Use cases and scenarios
3.1. Non-locking use case: Per-node label space
3.2. Locking use-case:
3.3. PSC Scenarios
3.3.1. Unidirectional failure cases
3.3.1.1. Non-locking
3.3.1.2. Locking
3.3.2. Bidirectional fault scenarios
3.3.2.1. Non-Locking
3.3.2.2. Locking
3.3.3. Preemption scenarios
3.3.3.1. Unidirectional non-locking
3.3.3.2. Unidirectional locking
3.3.3.3. Bidirectional non-locking
3.3.3.4. Bidirectional locking
4. Changes to PSC
4.1. PSC
4.2. Changes to PSC Payload
4.2.1. Locking (L) flag
4.2.2. Fault path (FPath) field
4.2.3. Data path (Path) field
4.3. Changes to PSC Operation
4.3.1. Basic operation
4.3.2. Two-phased operation
4.3.3. Acknowledge message
4.3.4. Wait for Acknowledge (WFA) timer
4.3.5. Additional PSC State
4.3.5.1. Wait for Acknowledge (WFA) State
5. IANA Considerations
6. Security Considerations
7. Acknowledgements
8. References
8.1. Normative References
8.2. Informative References
Appendix A. PSC state machine tables
Authors' Addresses

1. Introduction

The MPLS Transport Profile (MPLS-TP) Requirements document [TPReq] includes requirements for the survivability tools needed in MPLS-based transport networks. Network survivability is the ability of a network to recover traffic delivery following the failure or degradation of network resources. Requirement 67 lists the various types of 1:n protection architectures required for MPLS-TP. The MPLS-TP Survivability Framework [SurvivFwk] describes recovery elements, types, methods, and topological considerations for survivability in MPLS-TP networks, focusing on mechanisms for recovering MPLS-TP Label Switched Paths (LSPs).

Linear protection in mesh networks (networks with arbitrary interconnectivity between nodes) is described in Section 4.7 of [SurvivFwk]. Linear protection provides rapid and simple protection switching. It is well suited to mesh networks because it can operate between any pair of points within the network, and it can protect against a defect in an intermediate node, a span, a transport path segment, or an end-to-end transport path.

[LinProt] defines a Protection State Coordination (PSC) protocol that supports the different 1+1 and 1:1 architectures described in [SurvivFwk]. The PSC protocol is a single-phased protocol that allows the two endpoints of the protection domain to coordinate the protection switching operation when a switching condition is detected on the transport paths of the protection domain.

This document extends the PSC protocol to allow it to support a protection domain that includes multiple working transport paths that are protected by a single protection transport path. All of the working transport paths and the protection transport path share common end points. The protection transport path is pre-allocated with resources to transport the traffic normally carried by any one of the working transport paths. This is the architecture described in [SurvivFwk] as 1:n protection, and is the generalization of the 1:1 protection architecture already supported by PSC.

1.1. 1:n Protection architecture

Linear protection switching is a fully allocated survivability mechanism. It is fully allocated in the sense that the route and bandwidth of the protection path are reserved for a set of working paths. For 1:n protection, the protection path is allocated to protect any one of n working paths between the two endpoints of the protection domain.

            +-----+                             +-----+
            |     |=============================|     |
            |LER-A|     Working Path #1         |LER-Z|
            |     |                             |     |
            |     |=============================|     |
            |     |     Working Path #2         |     |
            |     |                             |     |
            |     |=============================|     |
            |     |     Working Path #3         |     |
            |     |                             |     |
            |     |      ooo                    |     |
            |     |                             |     |
            |     |=============================|     |
            |     |     Working Path #N         |     |
            |     |                             |     |
            |     |    Protection Path          |     |
            |     |*****************************|     |
            |     |                             |     |
            +-----+                             +-----+

                  |--------Protection Domain--------|

            Figure 1: 1:n protection architecture

Figure 1 shows a protection domain with N working transport paths and a single protection path. In 1:n protection, the protection path may transport the traffic of only a single working path at any particular time. The identity of the working path that is being protected must be communicated between the two endpoints.

Unless otherwise specified, all examples will be based on the network topology in Figure 1, with the working paths referenced as Wi (for 1<=i<=N) and the protection path referenced as P. The end-points of the protection domain will be referred to as LER-A and LER-Z.

The different working paths may be disjoint at the intermediate points between LER-A and LER-Z and may also have different resource requirements. In addition, each working path may be assigned a priority that can be used to decide which working path is protected in case of conflict (see Section 1.5). It is generally advisable to arrange protection groups so as to minimize potential conflicts.

1:n protection in MPLS supports two modes of operation: locking and non-locking. Locking mirrors the behavior of many transport protection mechanisms and is necessary in some cases, but its longer switching time may increase latency (and thus packet loss) compared with the non-locking case. Non-locking 1:n can be used in many MPLS networks and incurs far less packet loss than locking, but it must be used with care, since incorrect use of non-locking can lead to misconnectivity.

1.2. Locking operation

At a high level, the locking mode of 1:n protection proceeds as follows:

1. LER-A detects a unidirectional failure of W1 and stops sending traffic on W1.
2. LER-A transmits a PSC SF message to LER-Z indicating that W1 has failed and that its traffic should be redirected to P. No traffic is sent on P at this point.
3. LER-Z receives the PSC message from LER-A, begins transmitting W1 traffic in P, and sends a PSC message to LER-A indicating that W1 is now being protected by P. LER-A receives the normal data traffic intended for W1 from P; LER-Z receives the W1 data traffic from P and also bridges W1 data traffic into P.
4. LER-A receives the PSC message from LER-Z and begins transporting W1 traffic in P (that is, LER-A bridges W1 into P).

It should be clear from this description that no traffic is sent over P until LER-Z processes the PSC message from LER-A, and that traffic is sent only unidirectionally (Z->A) until LER-A processes the "reply" PSC message from LER-Z. Since message processing time is expected to be dwarfed by the propagation delay between LER-A and LER-Z, there is complete traffic loss between the endpoints for the duration of the one-way propagation delay from LER-A to LER-Z, and bidirectional traffic flow is not fully restored until 1xRTT of the protection path has elapsed.

This operation mode is referred to as "locking" because the sequence of processing the PSC messages includes periods during which the protection path is locked from carrying protected traffic while the two endpoints verify that both are ready to process the W1 traffic received on P. This mode of operation is described in more detail in the scenarios of Section 3.3.

1.3. Non-Locking

In non-locking protection operation mode, LER-A switches data traffic onto P immediately upon failure detection. This minimizes traffic loss, but at the cost of temporary asymmetry of packet flow. At a high level, it looks like this:

1. LER-A detects the failure of W1 and stops sending traffic on W1.
2. LER-A immediately begins to transport W1's data traffic over the protection path P.
3. Simultaneously, LER-A transmits a PSC message to LER-Z indicating that W1 has failed and is currently being protected in P.
4. LER-Z receives the PSC message from LER-A, switches all W1 data traffic to P, and transmits a PSC message to LER-A indicating that W1 is now protected in P.
5. LER-A receives the PSC message from LER-Z and needs to take no further action, as the protection switch has already been completed.

In the non-locking case, packet loss between the endpoints is minimized. Packet loss in the A->Z direction is limited to the failure detection time, which this document assumes to be negligible. Packet loss in the Z->A direction is almost entirely due to the one-way propagation delay of the PSC message from LER-A to LER-Z. Assuming that the A->Z delay equals the Z->A delay, packet loss in the non-locking case is roughly half that of the locking case.
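As a back-of-envelope illustration of these loss figures, the sketch below follows this document's own accounting (Sections 1.2, 1.3, and 3.3.1): message processing and bridge/selector changes are treated as negligible, an optional failure detection time can be supplied, and RTT = 2 x OWD. The function name and the millisecond units are illustrative assumptions, not part of PSC.

```python
from typing import Dict


def loss_window_ms(mode: str, owd_ms: float, detect_ms: float = 0.0) -> Dict[str, float]:
    """Approximate traffic-loss window per direction after LER-A detects a failure.

    Illustrative model only: it follows the document's simplified accounting,
    in which processing time is ignored and RTT = 2 * OWD.
    """
    rtt_ms = 2 * owd_ms
    if mode == "non-locking":
        # A->Z: LER-A bridges into P as soon as it detects the failure.
        # Z->A: lost until LER-Z receives LER-A's PSC message, one OWD later.
        return {"A->Z": detect_ms, "Z->A": detect_ms + owd_ms}
    if mode == "locking":
        # Both directions wait for the request/acknowledge exchange (1 x RTT).
        return {"A->Z": detect_ms + rtt_ms, "Z->A": detect_ms + rtt_ms}
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("non-locking", "locking"):
        print(mode, loss_window_ms(mode, owd_ms=5.0))
```

With a 5 ms OWD, the worse direction loses about 5 ms in non-locking mode versus about 10 ms in locking mode, which is the "roughly half" relationship stated above.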

1.4. Path priority

As the 1:n architecture requires the ability for one working path to preempt the traffic of another in the event of multiple failures (see Section 1.5), there must be an indication of priority between the different working paths so that an implementation can decide whether a new failure should be allowed to preempt a protection switch already in place. This priority is purely a local decision, i.e., determined by configuration at both endpoints of the protection domain. It is also possible to assign the same priority to multiple working paths, thus creating a "first come, first served" preemption policy. This document provides no means to signal the priority of a given working path, nor a means to detect priority mismatches or misconfigurations. Thus, ensuring that the priorities of all working LSPs in a protection domain are configured consistently at both endpoints is a matter for the operator. Any mismatch or misconfiguration will likely result in unexpected protection behavior.
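Because PSC does not carry path priorities, any consistency check has to happen outside the protocol. The sketch below shows one way an operator might compare the priorities configured at the two endpoints; the dictionaries stand in for whatever configuration source is actually used, the function name is hypothetical, and higher numbers mean higher priority in this sketch only.

```python
from typing import Dict, List, Optional, Tuple

# (working path, priority at LER-A, priority at LER-Z)
Mismatch = Tuple[str, Optional[int], Optional[int]]


def priority_mismatches(at_a: Dict[str, int], at_z: Dict[str, int]) -> List[Mismatch]:
    """List every working path whose priority differs between LER-A and LER-Z.

    A path configured at only one endpoint is reported with None on the other side.
    """
    mismatches: List[Mismatch] = []
    for path in sorted(set(at_a) | set(at_z)):
        prio_a, prio_z = at_a.get(path), at_z.get(path)
        if prio_a != prio_z:
            mismatches.append((path, prio_a, prio_z))
    return mismatches


if __name__ == "__main__":
    ler_a = {"W1": 4, "W2": 3, "W3": 2, "W4": 1}
    ler_z = {"W1": 4, "W2": 3, "W3": 1, "W4": 1}  # W3 misconfigured at LER-Z
    print(priority_mismatches(ler_a, ler_z))  # [('W3', 2, 1)]
```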

1.5. Preemption

Preemption occurs, for example, when the protection path is being used to transport traffic and is then required to transport traffic for a working path with higher priority. At this point, the current traffic that is being transported on the protection path needs to be interrupted to allow the transport of the protected traffic.

There are two basic scenarios for preemption of traffic:

1. The protection path is used to transport "extra traffic". While this practice is discouraged by [TPReq], it is not precluded. When the protection domain triggers a protection switch, the extra traffic should be preempted to allow the transport of the protected traffic from the working path that triggered the switching operation. The subsequent treatment of the interrupted service is out of the scope of this document.
2. The protection path is transporting traffic from a working path, and a second working path triggers a switching condition. The second trigger may take precedence either because it is a higher-priority request (e.g., FS after SF) or because the operator has assigned a higher priority to the working path of the second trigger. At this point, the traffic of the lower-priority working path is interrupted, and the higher-priority traffic is transmitted on the protection path. The preempted traffic resumes transmission only when either its working path recovers or the higher-priority traffic relinquishes control of the protection path.
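The two preemption cases above can be condensed into a single decision. The sketch below is hypothetical: the names are invented for illustration, the request ranking is reduced to the three requests needed here (FS > SF > NR, consistent with the FS-after-SF example; [LinProt] defines the full, normative order), and it assumes that the request type is compared before the path priority, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Illustrative ranking of request types. Only "FS outranks SF" is taken from
# this document; [LinProt] defines the full, normative priority order.
REQUEST_RANK: Dict[str, int] = {"NR": 0, "SF": 1, "FS": 2}


@dataclass(frozen=True)
class Request:
    kind: str  # request type, e.g. "SF" or "FS"
    path: str  # working path that raised the request, e.g. "W2"


def should_preempt(current: Optional[Request], new: Request,
                   path_priority: Dict[str, int]) -> bool:
    """Decide whether `new` takes over P from `current` (hypothetical sketch).

    `current` is None when P is idle or carries only extra traffic, which is
    always preempted. Request type is compared first; the operator-configured
    path priority (higher number wins) only breaks ties. Equal path priorities
    mean first come, first served, so the existing switch is kept.
    """
    if current is None:
        return True
    if REQUEST_RANK[new.kind] != REQUEST_RANK[current.kind]:
        return REQUEST_RANK[new.kind] > REQUEST_RANK[current.kind]
    return path_priority[new.path] > path_priority[current.path]


if __name__ == "__main__":
    prio = {"W1": 4, "W2": 3, "W3": 2, "W4": 1}
    print(should_preempt(Request("SF", "W2"), Request("SF", "W1"), prio))  # True
    print(should_preempt(Request("SF", "W1"), Request("SF", "W2"), prio))  # False
    print(should_preempt(Request("SF", "W1"), Request("FS", "W4"), prio))  # True
```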

1.6. Contributing authors

Nurit Sprecher (NSN)

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2.1. Acronyms

This draft uses the following acronyms:


Ack	Acknowledge
DNR	Do not revert
FS	Forced Switch
LER	Label Edge Router
LO	Lockout of protection
MPLS-TP	Transport Profile for MPLS
MS	Manual Switch
NR	No Request
P2P	Point-to-point
P2MP	Point-to-multipoint
PSC	Protection State Coordination Protocol
SD	Signal Degrade
SF	Signal Fail
WFA	Wait for Acknowledge
WTR	Wait-to-Restore

2.2. Definitions and Terminology

The terminology used in this document is based on the terminology defined in [RFC4427] and further adapted for MPLS-TP in [SurvivFwk]. In addition, the term LER is used to refer to an MPLS Network Element, whether it is an LSR, LER, T-PE, or S-PE.

3. Use cases and scenarios

This section presents use cases and scenarios that illustrate the use of PSC for 1:n protection.

3.1. Non-locking use case: Per-node label space

Non-locking protection can be used when the payload that is received from the protection path is unambiguous and can be properly forwarded without the need to explicitly establish selector and bridge configuration at the time of failure. One example where this applies is when the endpoints of the protection domain are using per-platform label space [RFC3031].

In per-node or per-platform label space, the LIB is established on a node such that it can properly switch any labeled packet regardless of input interface.

Consider, as an example, the protection topology shown in Figure 1 with four working paths (W1, W2, W3, and W4) and a single protection path, P, that connect LER-A and LER-Z. Each packet transported from LER-A to LER-Z is labeled by LER-A according to the path over which it is transmitted. The packet then traverses the relevant path, and its label is swapped by the intermediate LSRs until it arrives at LER-Z. LER-Z pops the label for the path used within the protection domain and processes the next label down to determine how to forward the packet payload. The following table gives the label assigned by LER-A and the label expected by LER-Z for each of the transport paths:

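  +------+----------------+-------------------------+
  | Path | Label at LER-A | Label expected at LER-Z |
  +------+----------------+-------------------------+
  | W1   | 100            | 105                     |
  | W2   | 200            | 205                     |
  | W3   | 300            | 305                     |
  | W4   | 400            | 405                     |
  | P    | 500            | 505                     |
  +------+----------------+-------------------------+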

If a pseudowire (PW) carried over one of these transport paths between LER-A and LER-Z has its label allocated from the per-platform label space on both LER-A and LER-Z (e.g., label 888), then a packet for this PW transported over W2 leaves LER-A with the label stack [200|888|..] and arrives at LER-Z with the label stack [205|888|..]. If W2 reported a failure that triggered a protection switch and LER-A redirected packets for this PW to P, they would be transported with the label stack [500|888|..] and received by LER-Z with the label stack [505|888|..]. Since the PW label is drawn from per-node label space, when LER-Z pops the path label it is able to process the PW label regardless of which transport path was used between LER-A and LER-Z.

Since the forwarding behavior is preestablished, there is no need to ensure that LER-A and LER-Z coordinate the bridge/selector functions as part of the protection protocol. This is true for any underlying label assigned from per-node space. The label can be allocated by LDP, MPLS VPNs, PWs, TE tunnels, or any other application. As long as the label is preprogrammed in the receiving node's label space, coordination of the bridge/selection functions is unnecessary.
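To make the per-platform case concrete, the following sketch models only LER-Z's lookup. The labels come from the table and the PW example above; the table names and the action string are hypothetical placeholders, and a real label forwarding table carries far more state.

```python
from typing import Dict, List

# Transport-path labels expected at LER-Z (table in Section 3.1).
PATH_LABELS_AT_Z: Dict[int, str] = {105: "W1", 205: "W2", 305: "W3", 405: "W4", 505: "P"}

# Single per-platform table: one entry per label, whatever the arrival path.
# The action string is a placeholder for illustration.
PLATFORM_TABLE: Dict[int, str] = {888: "deliver to PW 888"}


def forward_at_z(stack: List[int]) -> str:
    """Pop the transport-path label and look up the next label platform-wide."""
    top, *rest = stack
    if top not in PATH_LABELS_AT_Z:
        return "drop: unknown transport-path label"
    if not rest:
        return "drop: no inner label"
    return PLATFORM_TABLE.get(rest[0], "drop: unknown inner label")


if __name__ == "__main__":
    print(forward_at_z([205, 888]))  # arrived over W2
    print(forward_at_z([505, 888]))  # arrived over P after a protection switch
```

Both lookups resolve to the same entry, which is why no bridge/selector coordination is needed in this case.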

3.2. Locking use-case:

Locking protection must be used when the payload that is received on the protection path is ambiguous; that is, the switching behavior for the payload of the protection path must be established at the time of failure. One such example where this applies is when the endpoints of the protection domain are using per-interface label space, where the Working and Protect LSPs are instantiated as interfaces.

In per-interface label space, a node may use the same label value to represent different switching behaviors on different interfaces. For example, label value 100 received on LSP W1 may be treated differently from label value 100 received on LSP W2. Since either W1 or W2 may be protected in P, the receiving endpoint must ensure that the forwarding behavior for label 100 on P is the one belonging to the working path currently being protected. Using the wrong forwarding behavior (e.g., programming P's label space with W1's entry for label 100 when P is protecting W2) is likely to lead to misconnectivity.

Consider, as an example, the protection topology shown in Figure 1 and in Section 3.1: four working paths (W1, W2, W3, and W4) and a single protection path, P, connecting LER-A and LER-Z. The table in Section 3.1 lists the receive labels [105, 205, 305, 405, 505] at LER-Z, and these do not change. What changes is the payload of those labels. Section 3.1 gives the example of a PW drawn from global label space that uses label 888; this label receives the same forwarding behavior no matter which LSP carries it from LER-A to LER-Z.

In per-interface label space, each W-LSP has its own label space. For this example, consider a PW switched over W1 with the outgoing label 900. Thus, the label stack when leaving LER-A is [100|900] and when arriving at LER-Z is [105|900]. There is also a PW defined over W2 which also uses label 900, but with a different forwarding behavior. The per-interface label switching tables on LER-Z look like this:

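  +-----------------+-------+--------------------------------+
  | Input Interface | Label | Switching behavior             |
  +-----------------+-------+--------------------------------+
  | W1              | 900   | Switch to Access Circuit #1    |
  | W2              | 900   | Switch to Access Circuit #2    |
  | W3              | 900   | Switch to Access Circuit #3    |
  | W4              | 900   | Switch to Access Circuit #4    |
  | P               | 900   | none defined (drop, log error) |
  +-----------------+-------+--------------------------------+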

The label space for P is established at the time of failure, using PSC. When there is no failure, there is no switching behavior defined for the P LSP's contents.

When the protection domain determines that W2 has failed and needs to be switched, LER-A and LER-Z coordinate the protection switch using PSC. Part of this coordination is establishing the proper receive behavior on LER-Z, i.e., setting the switching behavior for label 900 on input interface P to "Switch to Access Circuit #2". If W1 subsequently fails and preempts W2, the switching behavior on LER-Z is changed to "Switch to Access Circuit #1".

Clearly, misconnectivity must not occur. This requirement means that a "lock" on P must be established, such that no packets are transmitted on an LSP until both ends agree on the switching behavior for that LSP. The behavior in the locking use cases is explored further in Section 3.3 of this document.
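The per-interface case can be sketched in the same style. This hypothetical model keeps LER-Z's table keyed by (input interface, label), as in the table above, and shows why P's entry must be rewritten, under PSC coordination, every time the protected path changes. The function names are illustrative; how the entry is actually installed is implementation-specific.

```python
from typing import Dict, Tuple

NO_ENTRY = "none defined (drop, log error)"

# Per-interface switching table at LER-Z from Section 3.2:
# (input interface, label) -> switching behavior. P has no entry until PSC programs one.
lfib: Dict[Tuple[str, int], str] = {
    ("W1", 900): "Switch to Access Circuit #1",
    ("W2", 900): "Switch to Access Circuit #2",
    ("W3", 900): "Switch to Access Circuit #3",
    ("W4", 900): "Switch to Access Circuit #4",
}


def program_protection(protected: str) -> None:
    """Replace P's entries with copies of the protected working path's entries.

    Intended to run only after both ends have agreed, via PSC, on which
    working path P is protecting.
    """
    for key in [k for k in lfib if k[0] == "P"]:
        del lfib[key]
    for (iface, label), behavior in list(lfib.items()):
        if iface == protected:
            lfib[("P", label)] = behavior


def lookup(iface: str, label: int) -> str:
    return lfib.get((iface, label), NO_ENTRY)


if __name__ == "__main__":
    print(lookup("P", 900))  # no failure: none defined (drop, log error)
    program_protection("W2")
    print(lookup("P", 900))  # W2 protected: Switch to Access Circuit #2
    program_protection("W1")
    print(lookup("P", 900))  # W1 preempts W2: Switch to Access Circuit #1
```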

3.3. PSC Scenarios

This section discusses the message exchange necessary to perform both the non-locking and locking PSC options for 1:n protection. The examples presented here attempt to cover combinations of failure and preemption, and of unidirectional and bidirectional protection, for the two modes of operation. The set of scenarios is not exhaustive; the scenarios were chosen to highlight the main features of the proposal.

It is not the intent of this document to spell out all the combinations of preemption, directionality and locking behavior which can occur. That is not how one builds a robust protocol. This document spells out a state machine which reacts appropriately in all possible cases, and as part of that walks through some of the failure cases as examples. PSC is, at its heart, a simple protocol. A node is aware of both its local status and the status of the remote node, and transitions to the appropriate state and takes appropriate action based on the combination of these two states. Preemption, which as noted is only relevant in 1:n, does not increase the complexity of the protocol. The examples are detailed, but the behavior is quite simple.

All of these examples assume a protection domain consisting of four working paths [W1, W2, W3, W4] with priority in decreasing order, i.e., W1 > W2 > W3 > W4. There is a single protection path, P. These examples use the notation "B = x" to indicate the working LSP whose contents are bridged into the protection LSP. For example, if W3 has failed and is currently protected, B = 3. If no protection is in place, B = n/a. Each row of the examples ends with the REQ(FPath, Path) message transmitted by each node and that node's B value.

The non-locking cases assume that both LER-A and LER-Z have preestablished per-node label spaces, as in the use case of Section 3.1.

All cases assume that the time required to perform on-box operations such as bridging or selecting is instantaneous. The one-way delay between nodes is abbreviated OWD, and the round trip time is RTT (i.e. RTT = 2 x OWD).

3.3.1. Unidirectional failure cases

The examples in this section provide the message flow between LER-A and LER-Z for the scenario where a unidirectional fault is detected by LER-A on working path W1. The message flow is described as a sequence along a timeline.

3.3.1.1. Non-locking

For a protection domain operating in non-locking mode, the event timeline is as follows:

  +--------------------------------------------------------------------+
  |Time|              Event  Description           |LER-A PSC|LER-Z PSC|
  |    |                                           |  Bridge |  Bridge |
  +----+-------------------------------------------+---------+---------+
  | t0 | Traffic is being transported on W1, P is  | NR(0,0) | NR(0,0) |
  |    | not carrying any traffic.  Both LER-A and | B = n/a | B = n/a |
  |    | LER-Z transmitting PSC NR(0,0) message.   |         |         |
  +----+-------------------------------------------+---------+---------+
  | t1 | LER-A detects SF on W1, bridges W1 into P | SF(1,1) | NR(0,0) |
  |    | and sends SF(1,1). LER-A enters the WFA   |  B = 1  | B = n/a |
  |    | (Wait for Acknowledge) state. LER-A still |         |         |
  |    | selects the traffic from W1. This is of   |         |         |
  |    | little use when LER-A sees SF, but may be |         |         |
  |    | useful when LER-A encounters a partial    |         |         |
  |    | failure such as SD.                       |         |         |
  +----+-------------------------------------------+---------+---------+
  | t2 | LER-Z receives SF(1,1).  LER-Z enters     | SF(1,1) | NR(0,1) |
  |    | PF:W:R state.  LER-Z switches W1 onto P   |  B = 1  |  B = 1  |
  |    | and sends NR(0,1).  At this point traffic |         |         |
  |    | for W1 is protected in both directions.   |         |         |
  +----+-------------------------------------------+---------+---------+
  | t3 | LER-A receives NR(0,1), which it takes as | SF(1,1) | NR(0,1) |
  |    | an Ack from LER-Z. LER-A transitions from |  B = 1  |  B = 1  |
  |    | WFA to PF:W:L state.  Switch is complete. |         |         |
  +----+-------------------------------------------+---------+---------+

Note: Between t1 and t2, LER-A transports the data traffic on P while LER-Z continues transporting it on W1, and there is temporary path asymmetry. After t2, the data traffic is in P in both directions.

In this case, LER-A loses traffic for the OWD time, as it does not receive any traffic from LER-Z on P until LER-Z bridges W1 into P. LER-Z does not lose any traffic due to the immediate bridging on LER-A.

3.3.1.2. Locking

For the same scenario in a protection domain using the locking mode of operation, the time sequence is as follows:

  +--------------------------------------------------------------------+
  |Time|              Event  Description           |LER-A PSC|LER-Z PSC|
  |    |                                           |  Bridge |  Bridge |
  +----+-------------------------------------------+---------+---------+
  | t0 | Traffic is being transported on W1, P is  | NR(0,0) | NR(0,0) |
  |    | not carrying any traffic.  Both LER-A and | B = n/a | B = n/a |
  |    | LER-Z transmitting PSC NR(0,0) message.   |         |         |
  +----+-------------------------------------------+---------+---------+
  | t1 | LER-A detects SF on W1, enters the WFA    | SF(1,0) | NR(0,0) |
  |    | state and sends SF(1,0).  LER-A still     | B = n/a | B = n/a |
  |    | transports and selects the traffic from   |         |         |
  |    | W1. This allows traffic to get through if |         |         |
  |    | the failure is truly unidirectional.      |         |         |
  +----+-------------------------------------------+---------+---------+
  | t2 | LER-Z receives SF(1,0).  LER-Z enters     | SF(1,0) | NR(0,1) |
  |    | PF:W:R state.  LER-Z bridges W1 into P,   | B = n/a |  B = 1  |
  |    | sends NR(0,1), but continues to select    |         |         |
  |    | traffic from W1.                          |         |         |
  +----+-------------------------------------------+---------+---------+
  | t3 | LER-A receives NR(0,1), which it takes as | SF(1,1) | NR(0,1) |
  |    | an Ack from LER-Z.  LER-A completely      |  B = 1  |  B = 1  |
  |    | switches W1 traffic onto P and            |         |         |
  |    | transitions from WFA to PF:W:L state.     |         |         |
  |    | Switch is complete.                       |         |         |
  +----+-------------------------------------------+---------+---------+
  | t4 | LER-Z receives SF(1,1).  LER-Z selects W1 | SF(1,1) | NR(0,1) |
  |    | traffic from P and sends NR(0,1).         |  B = 1  |  B = 1  |
  +----+-------------------------------------------+---------+---------+

Note: At t1, LER-A stops sending traffic to LER-Z; at t3, it resumes. Since the intervals from t1 to t2 and from t2 to t3 each consist mostly of the one-way transmission delay between LER-A and LER-Z, there is a total of 1xRTT traffic loss at both endpoints.
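The two unidirectional tables can be replayed with a small model. It is a hypothetical simplification derived only from the rows above: it tracks just the transmitted REQ(FPath, Path) message, the B value, and the state names used in the tables, and it ignores selectors, the WFA timer, priorities, and preemption. It is not the PSC state machine defined in Section 4 and Appendix A.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Msg:
    """A PSC message in the REQ(FPath, Path) notation of Section 3.3."""
    req: str
    fpath: int
    path: int

    def __str__(self) -> str:
        return f"{self.req}({self.fpath},{self.path})"


@dataclass
class Node:
    """One LER, reduced to what the example tables show."""
    locking: bool
    state: str = "N"                 # N (normal), WFA, PF:W:L or PF:W:R
    bridge: Optional[int] = None     # B value; None stands for "n/a"
    tx: Msg = Msg("NR", 0, 0)        # message currently being transmitted

    def local_sf(self, w: int) -> Msg:
        self.state = "WFA"
        if not self.locking:
            self.bridge = w          # non-locking: bridge W into P immediately
        self.tx = Msg("SF", w, 0 if self.locking else w)
        return self.tx

    def receive(self, m: Msg) -> Msg:
        if self.state == "WFA" and m.path == self.tx.fpath:
            # Far end reports it carries our failed path on P: treat as Ack.
            self.state = "PF:W:L"
            self.bridge = m.path
            self.tx = Msg("SF", m.path, m.path)
        elif self.state in ("N", "PF:W:R") and m.req == "SF":
            # Remote failure only: bridge the requested path and answer with NR.
            self.state = "PF:W:R"
            self.bridge = m.fpath
            self.tx = Msg("NR", 0, m.fpath)
        return self.tx


Row = Tuple[str, Optional[int]]


def unidirectional(locking: bool) -> List[Row]:
    """Replay SF on W1 detected by LER-A only; one (message, B) row per step."""
    a, z = Node(locking), Node(locking)
    rows: List[Row] = [(str(a.local_sf(1)), a.bridge)]    # t1 at LER-A
    rows.append((str(z.receive(a.tx)), z.bridge))          # t2 at LER-Z
    rows.append((str(a.receive(z.tx)), a.bridge))          # t3 at LER-A
    if locking:
        rows.append((str(z.receive(a.tx)), z.bridge))      # t4 at LER-Z
    return rows


if __name__ == "__main__":
    print("non-locking:", unidirectional(False))
    print("locking:    ", unidirectional(True))
```

The same class also reproduces the bidirectional non-locking table of Section 3.3.2.1 if both nodes call local_sf(1) before exchanging messages: each takes the other's SF(1,1) as an Ack and ends in PF:W:L.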

3.3.2. Bidirectional fault scenarios

The examples above focused on unidirectional failures in order to illustrate the basic principles of 1:n protection. However, most failures in carrier networks are bidirectional in nature. Bidirectionality includes not only the failure of both the tx and rx physical path (e.g. a fiber cut) but also a unidirectional failure made bidirectional by mechanisms outside of PSC such as CC-V or LDI.

Both ends of a protection domain may not see a bidirectional failure at the same instant. In the case of a true bidirectional fiber cut, the cut may be physically closer to one end of the domain than the other, and thus the end that is farther away takes longer to notice the failure. This is referred to as "asymmetric notification delay" in this document. Similarly, a unidirectional failure seen by one endpoint that triggers an LDI notification to the far endpoint will not be recognized by the far end until after it has been noticed at the near endpoint.
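The asymmetry can be quantified with simple arithmetic. In the sketch below (an idealization with hypothetical function names), a cut at fraction x of the path measured from LER-A is noticed by LER-A after roughly x * OWD and by LER-Z after (1 - x) * OWD, because each end keeps receiving signal that was already past the cut; in the LDI case, the far end learns of the failure one OWD after the near end detects it. The latency of the detection mechanism itself is ignored.

```python
from typing import Dict


def cut_detection_ms(owd_ms: float, cut_fraction: float) -> Dict[str, float]:
    """Delay until each end stops receiving the far end's signal after a fiber cut.

    cut_fraction is the cut position along the path (0.0 at LER-A, 1.0 at LER-Z).
    Assumes delay is spread uniformly along the path and ignores the detection
    mechanism's own latency.
    """
    if not 0.0 <= cut_fraction <= 1.0:
        raise ValueError("cut_fraction must be between 0 and 1")
    return {"LER-A": cut_fraction * owd_ms, "LER-Z": (1.0 - cut_fraction) * owd_ms}


def ldi_far_end_ms(near_detect_ms: float, owd_ms: float) -> float:
    """Unidirectional failure made bidirectional by LDI: the far end learns one OWD later."""
    return near_detect_ms + owd_ms


if __name__ == "__main__":
    print(cut_detection_ms(owd_ms=10.0, cut_fraction=0.25))  # {'LER-A': 2.5, 'LER-Z': 7.5}
    print(ldi_far_end_ms(near_detect_ms=3.5, owd_ms=10.0))   # 13.5
```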

There are a number of scenarios that constitute bidirectional failure, and the variety of triggers and notification delays makes it impossible to document them all here. The scenario used here is a true bidirectional failure on working path W1 with asymmetric notification delay, as described above. Both the non-locking and locking modes of operation are presented.

It is perhaps important to understand that a node, when reacting to a failure, simply reacts either to its local LSP status (e.g. SF on the underlying fiber) or the status of the remote node (e.g. the remote node sending SF(x,y)). A node neither knows nor cares whether the failure is bidirectional; it simply reacts to inputs to its local state machine. It can easily be observed that there are no special states needed for unidirectional vs. bidirectional error handling.

3.3.2.1. Non-Locking

First we present the scenario when operating in non-locking mode:

  +--------------------------------------------------------------------+
  |Time|              Event  Description           |LER-A PSC|LER-Z PSC|
  |    |                                           |  Bridge |  Bridge |
  +----+-------------------------------------------+---------+---------+
  | t0 | Traffic is being transported on W1, P is  | NR(0,0) | NR(0,0) |
  |    | not carrying any traffic.  Both LER-A and | B = n/a | B = n/a |
  |    | LER-Z transmitting PSC NR(0,0) message.   |         |         |
  +----+-------------------------------------------+---------+---------+
  | t1 | LER-A detects SF on W1, bridges W1 into P | SF(1,1) | NR(0,0) |
  |    | and sends SF(1,1).  LER-A enters into WFA |  B = 1  | B = n/a |
  |    | state and continues to select the traffic |         |         |
  |    | from W1.                                  |         |         |
  +----+-------------------------------------------+---------+---------+
  | t2 | LER-Z detects the SF on W1.  LER-Z enters | SF(1,1) | SF(1,1) |
  |    | WFA state, bridges W1 into P and          |  B = 1  |  B = 1  |
  |    | transmits SF(1,1).  At this point         |         |         |
  |    | traffic for W1 is protected in both       |         |         |
  |    | directions; however, the endpoints are    |         |         |
  |    | still not coordinated.                    |         |         |
  +----+-------------------------------------------+---------+---------+
  | t3 | LER-Z receives the SF(1,1) from LER-A and | SF(1,1) | SF(1,1) |
  |    | considers it an Ack and transits from WFA |  B = 1  |  B = 1  |
  |    | to PF:W:L state                           |         |         |
  +----+-------------------------------------------+---------+---------+
  | t4 | LER-A receives SF(1,1), which it takes as | SF(1,1) | SF(1,1) |
  |    | an Ack from LER-Z and transits from WFA   |  B = 1  |  B = 1  |
  |    | to PF:W:L state.  Switch is complete.     |         |         |
  +----+-------------------------------------------+---------+---------+

It is perhaps instructive to note that the only differences between the unidirectional non-locking and bidirectional non-locking scenarios are the trigger at t2 which causes Z to send SF(1,1) and the state Z finally enters (PF:W:L rather than PF:W:R). All other actions before and after this point are identical between the two cases.

3.3.2.2. Locking

We now follow the scenario for the locking mode of operation:

  +--------------------------------------------------------------------+
  |Time|              Event  Description           |LER-A PSC|LER-Z PSC|
  |    |                                           |  Bridge |  Bridge |
  +----+-------------------------------------------+---------+---------+
  | t0 | Traffic is being transported on W1, P is  | NR(0,0) | NR(0,0) |
  |    | not carrying any traffic.  Both LER-A and | B = n/a | B = n/a |
  |    | LER-Z are sending PSC NR(0,0) messages.   |         |         |
  +----+-------------------------------------------+---------+---------+
  | t1 | LER-A detects SF on W1 and sends SF(1,0). | SF(1,0) | NR(0,0) |
  |    | LER-A enters WFA state and continues to   | B = n/a | B = n/a |
  |    | bridge and select the traffic from W1.    |         |         |
  |    | This allows traffic to get through if the |         |         |
  |    | failure is really unidirectional.         |         |         |
  +----+-------------------------------------------+---------+---------+
  | t2 | LER-Z detects the SF on W1.  LER-Z enters | SF(1,0) | SF(1,0) |
  |    | WFA state and continues to bridge and     | B = n/a | B = n/a |
  |    | select traffic from W1 while transmitting |         |         |
  |    | SF(1,0).                                  |         |         |
  +----+-------------------------------------------+---------+---------+
  | t3 | LER-Z receives the SF(1,0) from LER-A and | SF(1,0) | SF(1,1) |
  |    | bridges traffic from W1 to P remaining in | B = n/a |  B = 1  |
  |    | WFA state now transmitting a SF(1,1)      |         |         |
  +----+-------------------------------------------+---------+---------+
  | t4 | LER-A receives the SF(1,0) from LER-Z and | SF(1,1) | SF(1,1) |
  |    | bridges traffic from W1 to P remaining in |  B = 1  |  B = 1  |
  |    | WFA state now transmitting a SF(1,1)      |         |         |
  +----+-------------------------------------------+---------+---------+
  | t5 | LER-A receives the SF(1,1) from LER-Z and | SF(1,1) | SF(1,1) |
  |    | considers it an Ack and transits from WFA |  B = 1  |  B = 1  |
  |    | to PF:W:L state                           |         |         |
  +----+-------------------------------------------+---------+---------+
  | t6 | LER-Z receives SF(1,1), which it takes as | SF(1,1) | SF(1,1) |
  |    | an Ack from LER-A and transits from WFA   |  B = 1  |  B = 1  |
  |    | to PF:W:L state.  Switch is complete.     |         |         |
  +----+-------------------------------------------+---------+---------+

As with non-locking, the major differences between the unidirectional and bidirectional scenarios of this failure are the alarm that causes LER-Z to take action and the final state that LER-Z enters as a result.

3.3.3. Preemption scenarios

In addition to bidirectional failures, it is also necessary to consider preemption. When protecting n entities, e.g., [W1, W2, W3], it is possible for multiple working LSPs to fail simultaneously. Consider the case where LSP W1 fails and starts to use the protection LSP. After this failure, LSP W2 fails before W1 has been restored. If W2 has a lower relative priority than W1, there is no preemption. However, if W2 has a higher priority than W1, when W2 fails it preempts W1 from the protection LSP. Preemption is not an issue in 1:1 or 1+1 protection, since with only a single working LSP there is nothing to preempt.
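The preemption rule above can be sketched as a single decision function. This is a minimal illustration, not part of PSC itself: the function name `select_protected_path` and the numeric priority map (a larger value meaning a higher priority) are hypothetical, and keeping the current occupant when priorities are equal is an assumption, since the text does not address ties.

```python
from typing import Dict, Optional


def select_protected_path(current: Optional[int], failed: int,
                          priority: Dict[int, int]) -> Optional[int]:
    """Return the working-path index that should use the protection LSP.

    current  -- index of the working path now on the protection LSP,
                or None if the protection LSP is idle
    failed   -- index of the working path that has just failed
    priority -- hypothetical configuration: larger value = higher priority
    """
    if current is None:
        return failed  # protection LSP is idle, so the failed path takes it
    if priority[failed] > priority[current]:
        return failed  # higher-priority failure preempts the current occupant
    return current     # lower (or, by assumption, equal) priority: no preemption


# Hypothetical priorities: W1 outranks W2, which outranks W3.
prio = {1: 3, 2: 2, 3: 1}
print(select_protected_path(1, 2, prio))  # -> 1 (W2 fails, W1 keeps P)
print(select_protected_path(2, 1, prio))  # -> 1 (W1 fails and preempts W2)
```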

There are multiple scenarios of preemption depending on where the failures were detected. In addition to the combinations of failure directionality and preemption, it is also necessary to consider how these combinations behave in both the locking and non-locking modes of operation.

First, consider the two flavors of preemption due to multiple unidirectional failures.

The difference between Locking and Non-Locking is that in Non-Locking mode a node can continue to send traffic on the P-LSP during the preemption process. The P-LSP contents may momentarily disagree (A may send W1 on P while Z sends W2 on P), but in the non-locking case there is no risk of misconnectivity, as explained in the previous discussion. For this reason, the identity of the path from which the endpoints are selecting incoming traffic is irrelevant. In a sense there is no selector; each node is able to properly process arbitrary data on the P-LSP.

However, WFA state is still necessary in order to ensure that the endpoints converge on the identity of the working path whose traffic is being transported on the P-LSP. Failure to converge is a problem that should be flagged to the operator.

The scenarios start after the two endpoints have converged on protecting a unidirectional SF condition that was detected on W2, when a new SF condition is detected on W1 (with higher priority):

3.3.3.1. Unidirectional non-locking

First, consider the event sequence for unidirectional faults in a domain in non-locking mode:

  +--------------------------------------------------------------------+
  |Time|              Event  Description           |LER-A PSC|LER-Z PSC|
  |    |                                           |  Bridge |  Bridge |
  +----+-------------------------------------------+---------+---------+
  | t0 | Traffic from W2 is being transported on P | SF(2,2) | NR(0,2) |
  |    | and both endpoints are coordinated        |  B = 2  |  B = 2  |
  +----+-------------------------------------------+---------+---------+
  | t1 | LER-A detects SF on W1 and sends SF(1,1). | SF(1,1) | NR(0,2) |
  |    | LER-A enters into WFA, blocks the W2      |  B = 1  |  B = 2  |
  |    | traffic and begins transporting W1        |         |         |
  |    | traffic on P (W1 has higher priority).    |         |         |
  +----+-------------------------------------------+---------+---------+
  | t2 | LER-Z receives the SF(1,1) from LER-A and | SF(1,1) | NR(0,1) |
  |    | bridges traffic from W1 to P remaining in |  B = 1  |  B = 1  |
  |    | PF:W:R now transmitting a NR(0,1)         |         |         |
  +----+-------------------------------------------+---------+---------+
  | t3 | LER-A receives the NR(0,1) from LER-Z and | SF(1,1) | NR(0,1) |
  |    | considers it an Ack and transits from WFA |  B = 1  |  B = 1  |
  |    | to PF:W:L state.  Coordination complete   |         |         |
  +----+-------------------------------------------+---------+---------+

As mentioned, in steady state LER-A is sending SF(2,2) and LER-Z is sending NR(0,2). If LER-A detects an SF on W1, W1 must preempt W2 in its use of the protection LSP. What the network subsequently does with W2 is outside the scope of PSC, but likely recovery actions may include rerouting W2, alerting W2's clients as to the unprotected failure status of W2, and so forth.

3.3.3.2. Unidirectional locking

In locking operation mode, when A detects an SF on W1, it needs to alert the far end, LER-Z, that the W2 traffic must be preempted. LER-A does this by indicating an SF on the higher-priority LSP and by emptying the protection LSP. The following table presents the sequence for this scenario; it also includes the working path that each endpoint expects to find on the protection path, shown as "S = n".

  +--------------------------------------------------------------------+
  |Time|              Event  Description           |LER-A PSC|LER-Z PSC|
  |    |                                           |  Bridge |  Bridge |
  |    |                                           |Selector | Selector|
  +----+-------------------------------------------+---------+---------+
  | t0 | Traffic from W2 is being transported on P | SF(2,2) | NR(0,2) |
  |    | and both endpoints are coordinated        |  B = 2  |  B = 2  |
  |    |                                           |  S = 2  |  S = 2  |
  +----+-------------------------------------------+---------+---------+
  | t1 | LER-A detects SF on W1 and sends SF(1,0). | SF(1,0) | NR(0,2) |
  |    | LER-A enters WFA state and blocks all     | B = n/a |  B = 2  |
  |    | traffic on the protection path.           | S = n/a |  S = 2  |
  +----+-------------------------------------------+---------+---------+
  | t2 | LER-Z receives the SF(1,0) from LER-A and | SF(1,0) | NR(0,1) |
  |    | bridges traffic from W1 to P (higher      | B = n/a |  B = 1  |
  |    | priority), and begins transmitting NR(0,1)| S = n/a |  S = 2  |
  |    | At this point W1 traffic is flowing Z->A  |         |         |
  |    | but not A->Z                              |         |         |
  +----+-------------------------------------------+---------+---------+
  | t3 | LER-A receives NR(0,1) from LER-Z and     | SF(1,1) | NR(0,1) |
  |    | considers it an Ack and transits from WFA |  B = 1  |  B = 1  |
  |    | to PF:W:L state and transmits SF(1,1)     |  S = 1  |  S = 2  |
  +----+-------------------------------------------+---------+---------+
  | t4 | LER-Z receives SF(1,1), and begins        | SF(1,1) | NR(0,1) |
  |    | selecting the protected traffic as W1 data|  B = 1  |  B = 1  |
  |    | Switch is complete.                       |  S = 1  |  S = 1  |
  +----+-------------------------------------------+---------+---------+

Traffic loss is asymmetric. Loss A->Z starts at t1 and ends at t4, roughly 1.5xRTT. Loss Z->A starts at t1 and ends at t3, roughly 1xRTT.
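The loss windows follow from simple arithmetic on the table's event times. The sketch below assumes that every PSC message takes half a round-trip time (RTT/2) to cross the domain and that processing time at each LER is negligible; the function name and the 10 ms RTT in the example are hypothetical.

```python
def locking_preemption_loss(rtt: float) -> dict:
    """Loss windows for the unidirectional locking preemption table.

    Assumes each PSC message needs rtt / 2 to reach the far end and that
    the LERs react instantly (both are simplifying assumptions).
    """
    one_way = rtt / 2
    t1 = 0.0            # LER-A detects SF on W1, sends SF(1,0), blocks P
    t2 = t1 + one_way   # LER-Z gets SF(1,0), bridges W1 to P, sends NR(0,1)
    t3 = t2 + one_way   # LER-A gets NR(0,1), bridges/selects W1, sends SF(1,1)
    t4 = t3 + one_way   # LER-Z gets SF(1,1) and selects W1 from P
    return {"A->Z": t4 - t1, "Z->A": t3 - t1}


print(locking_preemption_loss(10.0))  # -> {'A->Z': 15.0, 'Z->A': 10.0}
```

Under these assumptions the A->Z window is three one-way delays (1.5 x RTT) and the Z->A window is two (1 x RTT).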

3.3.3.3. Bidirectional non-locking

We now look, similarly, at the implications of preemption for the basic scenarios of bidirectional faults in multiple working paths. Both operating modes, i.e., non-locking and locking, are presented. The scenarios begin at the point where W2 traffic is being transported on the protection path in a coordinated fashion, when an SF on W1 is detected by both endpoints of the 1:n protection domain. W1 traffic has a higher priority than W2 traffic and, therefore, preempts the currently protected traffic.

The following presents the scenario in non-locking operation:

  +--------------------------------------------------------------------+
  |Time|              Event  Description           |LER-A PSC|LER-Z PSC|
  |    |                                           |  Bridge |  Bridge |
  +----+-------------------------------------------+---------+---------+
  | t0 | Traffic from W2 is being transported on P | SF(2,2) | NR(0,2) |
  |    | and both endpoints are coordinated        |  B = 2  |  B = 2  |
  +----+-------------------------------------------+---------+---------+
  | t1 | LER-A detects SF on W1, bridges W1 into P | SF(1,1) | NR(0,2) |
  |    | and sends SF(1,1).  LER-A enters into WFA |  B = 1  |  B = 2  |
  |    | state and continues to select the         |         |         |
  |    | protected traffic from P that is for W2.  |         |         |
  +----+-------------------------------------------+---------+---------+
  | t2 | LER-Z detects the SF on W1.  LER-Z enters | SF(1,1) | SF(1,1) |
  |    | WFA state, bridges W1 into P and          |  B = 1  |  B = 1  |
  |    | transmits SF(1,1).  At this point         |         |         |
  |    | traffic for W1 is protected in both       |         |         |
  |    | directions; however, the endpoints are    |         |         |
  |    | still not coordinated.                    |         |         |
  +----+-------------------------------------------+---------+---------+
  | t3 | LER-Z receives the SF(1,1) from LER-A and | SF(1,1) | SF(1,1) |
  |    | considers it an Ack and transits from WFA |  B = 1  |  B = 1  |
  |    | to PF:W:L state                           |         |         |
  +----+-------------------------------------------+---------+---------+
  | t4 | LER-A receives SF(1,1), which it takes as | SF(1,1) | SF(1,1) |
  |    | an Ack from LER-Z and transits from WFA   |  B = 1  |  B = 1  |
  |    | to PF:W:L state.  Switch is complete.     |         |         |
  +----+-------------------------------------------+---------+---------+

3.3.3.4. Bidirectional locking

In the locking mode of operation, the protection path, P, must be cleared of all traffic during the transition caused by preemption. The bidirectional case is similar to the scenario for a unidirectional fault, the major difference being the final state of the two endpoints. The sequence of events is as follows:

  +--------------------------------------------------------------------+
  |Time|              Event  Description           |LER-A PSC|LER-Z PSC|
  |    |                                           |  Bridge |  Bridge |
  |    |                                           |Selector | Selector|
  +----+-------------------------------------------+---------+---------+
  | t0 | Traffic from W2 is being transported on P | SF(2,2) | NR(0,2) |
  |    | and both endpoints are coordinated        |  B = 2  |  B = 2  |
  |    |                                           |  S = 2  |  S = 2  |
  +----+-------------------------------------------+---------+---------+
  | t1 | LER-A detects SF on W1 and sends SF(1,0). | SF(1,0) | NR(0,2) |
  |    | LER-A enters WFA state and blocks all     | B = n/a |  B = 2  |
  |    | traffic on the protection path.           | S = n/a |  S = 2  |
  +----+-------------------------------------------+---------+---------+
  | t2 | LER-Z detects the SF on W1.  LER-Z enters | SF(1,0) | SF(1,0) |
  |    | WFA state and blocks all traffic on the   | B = n/a | B = n/a |
  |    | protection path while transmitting SF(1,0)| S = n/a | S = n/a |
  +----+-------------------------------------------+---------+---------+
  | t3 | LER-Z receives the SF(1,0) from LER-A and | SF(1,0) | SF(1,1) |
  |    | bridges traffic from W1 to P (higher      | B = n/a |  B = 1  |
  |    | priority).  At this point W1 traffic is   | S = n/a | S = n/a |
  |    | flowing Z->A but not A->Z.                |         |         |
  +----+-------------------------------------------+---------+---------+
  | t4 | LER-A receives SF(1,1) from LER-Z and     | SF(1,1) | SF(1,1) |
  |    | considers it an Ack and transits from WFA |  B = 1  |  B = 1  |
  |    | to PF:W:L state.                          |  S = 1  | S = n/a |
  +----+-------------------------------------------+---------+---------+
  | t5 | LER-Z receives SF(1,1), and begins        | SF(1,1) | SF(1,1) |
  |    | selecting the protected traffic as W1 data|  B = 1  |  B = 1  |
  |    | Switch is complete.                       |  S = 1  |  S = 1  |
  +----+-------------------------------------------+---------+---------+

4. Changes to PSC

The Protection State Coordination protocol (PSC) is defined in [LinProt]. This includes both the format of the G-ACh based message as well as a description of the operations and the state transition logic of the protocol. The extension to cover 1:n protection includes changes to both aspects of PSC.

The changes to the message structure include both the addition of new information and an extension of the semantics of some of the existing fields of the message. These changes are described in Section 4.2.

The changes to the behavior of the base PSC protocol are described in Section 4.3.

4.1. PSC

Base PSC (as defined in [LinProt]) is a single-phased protocol, i.e., the endpoints perform protection switching without waiting for an acknowledgement from the far-end LER. The protocol messages are transmitted using the G-ACh, and their format is shown in Figure 10.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0 0 0 1|Version|  Reserved     |       PSC-CT = 0x0024         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Ver|Request|PT |R|  Reserved1  |     FPath     |     Path      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         TLV Length            |         Reserved2             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ~                         Optional TLVs                         ~
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

With regard to the G-ACh header, no changes are made by the extensions for 1:n protection, i.e., the channel type field continues to use the PSC-CT value defined in [LinProt]. The fields of the PSC payload that are affected by this document are the Ver field, the Reserved1 field, and the FPath and Path fields.
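As an illustration of the unchanged G-ACh header in Figure 10, the sketch below parses its first four octets and compares the channel type with the PSC-CT value shown in the figure. The function name is hypothetical, and the sketch validates nothing beyond the length and the fixed first nibble.

```python
import struct

PSC_CT = 0x0024  # PSC channel type value as shown in Figure 10


def parse_gach_header(data: bytes) -> dict:
    """Parse the 4-octet G-ACh header that precedes the PSC payload."""
    if len(data) < 4:
        raise ValueError("G-ACh header needs 4 octets")
    first, reserved, channel_type = struct.unpack("!BBH", data[:4])
    if first >> 4 != 0b0001:
        raise ValueError("first nibble of a G-ACh header must be 0001")
    return {
        "version": first & 0x0F,  # low nibble of the first octet
        "reserved": reserved,
        "channel_type": channel_type,
        "is_psc": channel_type == PSC_CT,
    }


hdr = parse_gach_header(bytes([0x10, 0x00, 0x00, 0x24]))
print(hdr["is_psc"], hdr["version"])  # -> True 0
```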

4.2. Changes to PSC Payload

In order to support 1:n protection, one small change to the format of the PSC payload is needed (see Figure 11). In particular, a new flag (L), taken from the Reserved1 space, indicates whether the protection domain is locking or non-locking. In addition, the semantics of the FPath and Path fields are adjusted so that they carry an index into the set of working paths. The details of these changes are supplied in the following subsections.

Due to the significance of these changes, the value of the Ver field (in the PSC payload) for 1:n protection domain MUST be set to 2.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Ver|Request|PT |R|L| Reserved1 |     FPath     |     Path      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         TLV Length            |         Reserved2             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ~                         Optional TLVs                         ~
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
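The bit layout of Figure 11 can be packed and unpacked as sketched below. The class name is hypothetical; the Request code points are treated as opaque integers (their values are defined in [LinProt] and are not repeated here); Reserved1 defaults to zero, and Reserved2 is always sent as zero and ignored on receipt.

```python
import struct
from dataclasses import dataclass

# (field name, width in bits) for the first 32-bit word, most significant first
_FIELDS = (("ver", 2), ("request", 4), ("pt", 2), ("r", 1), ("l", 1),
           ("reserved1", 6), ("fpath", 8), ("path", 8))


@dataclass
class PscPayload1toN:
    ver: int       # MUST be 2 for a 1:n protection domain
    request: int   # Request code point from [LinProt] (opaque integer here)
    pt: int        # protection type
    r: int         # R flag
    l: int         # Locking flag: 1 = locking mode
    fpath: int     # FPath index (0 = protection path, 1-128 = working path)
    path: int      # Path index (0 = no user traffic, 1-128 = working path)
    tlv_length: int = 0
    reserved1: int = 0

    def pack(self) -> bytes:
        word = 0
        for name, width in _FIELDS:
            value = getattr(self, name)
            if not 0 <= value < (1 << width):
                raise ValueError(f"{name}={value} does not fit in {width} bits")
            word = (word << width) | value
        return struct.pack("!IHH", word, self.tlv_length, 0)  # Reserved2 = 0

    @classmethod
    def unpack(cls, data: bytes) -> "PscPayload1toN":
        word, tlv_length, _reserved2 = struct.unpack("!IHH", data[:8])
        values = {}
        shift = 32
        for name, width in _FIELDS:
            shift -= width
            values[name] = (word >> shift) & ((1 << width) - 1)
        return cls(tlv_length=tlv_length, **values)


# request=5 is a placeholder value chosen only for illustration.
msg = PscPayload1toN(ver=2, request=5, pt=2, r=1, l=1, fpath=1, path=0)
print(msg.pack().hex())  # -> 96c0010000000000
```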

4.2.1. Locking (L) flag

The Locking flag is used to indicate that the end-point is configured for Locking mode (see Section 1.2).

If the value is 1, the protection domain is using the locking mode; if it is 0, the protection domain is using the non-locking mode.

The Locking flag must be the same on both ends; if the two endpoints of a protection domain have different L-flag settings, this MUST raise an error to the network operator.

4.2.2. Fault path (FPath) field

The FPath field indicates which path is in a fault condition or is affected by an administrative command. The possible values are:

0: indicates that the anomaly condition is on the protection path
1-128: indicates that the anomaly condition is on a working path whose index is indicated.
129-255: for future extensions or experimental use.

4.2.3. Data path (Path) field

The Path field indicates which data is being transmitted on the protection path. Under normal conditions, the protection path does not need to carry any user data traffic, but may carry extra traffic. If there is a failure/degrade condition on one of the working paths, then that working path's data traffic will be transmitted over the protection path. The following are the possible values:

0: indicates that the protection path is not transporting user data traffic.
1-128: indicates that the protection path is transmitting user traffic replacing the use of the working path indexed.
129-255: for future extensions or experimental use.
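Because FPath and Path share the same 8-bit value ranges but differ in meaning, the sketch below interprets a value for either field. It simply restates the two value lists above; the function name and the returned strings are hypothetical.

```python
def interpret_index(field: str, value: int) -> str:
    """Explain an FPath or Path value of a 1:n PSC message (sketch)."""
    if field not in ("FPath", "Path"):
        raise ValueError("field must be 'FPath' or 'Path'")
    if not 0 <= value <= 255:
        raise ValueError("FPath and Path are 8-bit fields")
    if value >= 129:
        return "reserved for future extensions or experimental use"
    if value == 0:
        if field == "FPath":
            return "anomaly on the protection path"
        return "protection path carries no user data traffic"
    if field == "FPath":
        return f"anomaly on working path {value}"
    return f"protection path carries traffic of working path {value}"


print(interpret_index("FPath", 4))  # -> anomaly on working path 4
print(interpret_index("Path", 0))   # -> protection path carries no user data traffic
```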

4.3. Changes to PSC Operation

In all of the following subsections, assume a protection domain between LER-A and LER-Z, using working paths 1-N and the protection path as shown in Figure 1.

A basic premise of this protection architecture is that both endpoints of the protection domain are configured to associate the indices of the working paths with the proper LSP identifiers. If this condition is not met then the protection scheme will cause inconsistencies in traffic transmission.

4.3.1. Basic operation

Protection of the N working paths is based on the operational principles outlined in [LinProt] and employs the same basic Protection State Coordination (PSC) protocol outlined in that document. However, because of certain basic differences in the architecture of the protection domain, a small set of differences in operation is necessary. The following subsections highlight these differences and explain their effects on the PSC state machine.

4.3.2. Two-phased operation

PSC, as presented in [LinProt], is a single-phased protocol. This means that when an endpoint receives a trigger to perform a protection switch, it switches traffic and then notifies the far end of the switch, without waiting for an acknowledgement. In a 1:n protection domain, however, the endpoint that receives the trigger must first verify that the protection path is available to transmit the protected traffic. This may involve interrupting the traffic that is currently being transmitted on the protection path by both endpoints.

In general, after the LER has detected a trigger for protection switching, e.g., an FS operator command or an SF indication for one of the working paths, the LER SHALL transmit the appropriate PSC message as described in [LinProt], with the following changes:

If the protection domain is currently in either Protecting administrative or Protecting failure state, then the endpoint SHALL verify that the new trigger has a higher priority than the currently protected traffic. If the new trigger has a lower priority then it MUST be ignored.
The PSC message SHALL set the FPath value to the index of the working path that generated the trigger. The Path value SHOULD be set to 0, unless the protection path was previously transporting traffic from another working path (as indicated by the value of the Path field).
If the protection path is currently transporting protected traffic and the protection domain is operating in locking mode, then the endpoint SHALL block all traffic of the protected working path.
The endpoint SHALL transition to WFA state (see below).
Upon reception of the switching PSC message, the far end LER SHALL verify that the received request is of higher priority than the known current traffic on the protection path, and if so SHALL interrupt the current traffic on the protection path, perform the switch to the requested protected traffic, and send a PSC message with the Path field set to the index of the current protected working path.
Upon reception of the PSC message, the initiating LER SHALL verify that the Path field is set to the index of the working path of the highest priority. If the Path field matches the highest priority path the LER SHALL perform the protection switch and transmit the appropriate PSC message, with the FPath field indicating the index of the working path that triggered the protection switch and the Path field set to the index of the working path whose traffic is being transported on the protection path.
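The two-phased exchange described in the last two items can be sketched as a pair of handlers. This is a simplified illustration under stated assumptions: messages are modelled as hypothetical `(Request, FPath, Path)` tuples, the priority map is hypothetical (larger value = higher priority), the far end's reply reuses its own current Request and FPath (NR(0,x) in the scenarios of Section 3.3), and the initiator checks the reply against the path it requested rather than re-evaluating every active trigger.

```python
from typing import Dict, Optional, Tuple

Msg = Tuple[str, int, int]  # (Request, FPath, Path), e.g. ("SF", 1, 0)


def far_end_on_switch_request(req: Msg, carried: int, priority: Dict[int, int],
                              local: Tuple[str, int] = ("NR", 0)) -> Tuple[int, Msg]:
    """Far-end LER: accept the request if it outranks the traffic on P.

    carried -- index of the working path now on the protection path (0 = none)
    local   -- the far end's own Request and FPath values for its reply
    Returns the index now carried on P and the reply message.
    """
    _, fpath, _ = req
    if carried == 0 or priority[fpath] > priority[carried]:
        carried = fpath  # interrupt current traffic on P and switch to fpath
    return carried, (local[0], local[1], carried)


def initiator_on_reply(sent: Msg, reply: Msg) -> Optional[Msg]:
    """Initiating LER: switch only if the reply's Path names the requested path."""
    request, fpath, _ = sent
    if reply[2] == fpath:
        return (request, fpath, fpath)  # perform the switch, e.g. send SF(1,1)
    return None  # not acknowledged: remain in WFA state


# Unidirectional locking preemption: W1 (higher priority) preempts W2.
prio = {1: 2, 2: 1}
carried, reply = far_end_on_switch_request(("SF", 1, 0), 2, prio)
print(carried, reply)                           # -> 1 ('NR', 0, 1)
print(initiator_on_reply(("SF", 1, 0), reply))  # -> ('SF', 1, 1)
```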

4.3.3. Acknowledge message

As stated above, the endpoint that detected a switching trigger MUST wait for an Acknowledge message before performing the protection switch. There are two types of message that will be considered as an Acknowledge message:

A reply message with the Request field reflecting the state of the far end and the Path field set to the index of the working path that triggered the switching condition. For example, if a Forced Switch command for working path W4 is issued at LER-Z, LER-Z will have sent an FS(4,0) message to LER-A. When LER-Z then receives a message such as NR(0,4)Ack, this should be considered acknowledgement of the switching and an indication that the protection path is available to switch the traffic from working path W4.
A remote message with the same Request field and FPath field as those transmitted by the LER in WFA state. For example, if a bidirectional Signal Fail is detected by LER-A on working path W4, LER-A will enter WFA state and transmit an SF(4,0) message. When it receives the SF(4,0) message from LER-Z, which has also detected the SF condition, this should be considered acknowledgement of the switching and an indication that the protection path is available to switch the traffic from working path W4.
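The two Acknowledge types can be expressed as one predicate. In this sketch, messages are hypothetical `(Request, FPath, Path)` tuples and the Acknowledge flag is not modelled; only the field comparisons described in the two items above are checked.

```python
from typing import Tuple

Msg = Tuple[str, int, int]  # (Request, FPath, Path)


def is_acknowledge(sent: Msg, received: Msg) -> bool:
    """Return True if `received` acknowledges the message sent in WFA state."""
    sent_request, sent_fpath, _ = sent
    recv_request, recv_fpath, recv_path = received
    # Type 1: a reply whose Path field names the working path that triggered
    # the switch (FPath 0 would denote the protection path, not a working path).
    if sent_fpath != 0 and recv_path == sent_fpath:
        return True
    # Type 2: a remote message with the same Request and FPath fields,
    # e.g. both endpoints detected the same bidirectional SF.
    return recv_request == sent_request and recv_fpath == sent_fpath


print(is_acknowledge(("FS", 4, 0), ("NR", 0, 4)))  # -> True (first type)
print(is_acknowledge(("SF", 4, 0), ("SF", 4, 0)))  # -> True (second type)
print(is_acknowledge(("SF", 4, 0), ("NR", 0, 2)))  # -> False
```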

4.3.4. Wait for Acknowledge (WFA) timer

The protection system MUST include a timer, called the Wait for Acknowledge (WFA) timer, that SHALL be started when the LER enters WFA state and reset when the Acknowledge message is received. The length of the WFA timer SHOULD be configured to allow protection switching within the normal time constraints. The WFA timer will expire only if no Acknowledge message was received by the LER in WFA state. The WFA Expires local input should have a priority just below that of the WTRExpires signal.
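A minimal sketch of the WFA timer follows, assuming a polled design in which the state machine periodically asks whether the timer has expired. The class name and the injectable clock are hypothetical; the 50 ms duration in the example is only an illustrative value, since the text leaves the length to configuration.

```python
import time
from typing import Callable, Optional


class WfaTimer:
    """Wait for Acknowledge timer (sketch), polled by the PSC state machine."""

    def __init__(self, duration_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.duration_s = duration_s  # operator-configured WFA timer length
        self._clock = clock
        self._deadline: Optional[float] = None

    def start(self) -> None:
        """Call when the LER enters WFA state."""
        self._deadline = self._clock() + self.duration_s

    def reset(self) -> None:
        """Call when the Acknowledge message is received."""
        self._deadline = None

    def expired(self) -> bool:
        """True once the timer has run out; maps to the WFA Expires local input."""
        return self._deadline is not None and self._clock() >= self._deadline


now = [0.0]  # fake clock for the example
timer = WfaTimer(0.05, clock=lambda: now[0])
timer.start()
now[0] = 0.06
print(timer.expired())  # -> True: no Acknowledge within 50 ms
```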

4.3.5. Additional PSC State

As described above and demonstrated in the scenarios in Section 3.3, there is a need, in some scenarios, for the endpoint that reports a trigger for protection switching to delay the actual switchover until an Acknowledge is received from the far-end LER. To facilitate this wait period, it is necessary to define a new PSC state: the Wait for Acknowledge (WFA) state. WFA is used in both the Locking and Non-Locking cases. It is more essential to the Locking mode of operation, as agreement is the mechanism used to establish and release the lock on the protection LSP. However, it is also necessary for the Non-Locking mode, as a persistent disagreement on the contents of the protection LSP indicates an error in the network devices, and WFA is the method used to detect this error.

In the locking mode, WFA comes into play when a failed LSP preempts another LSP. This is highlighted in the scenarios presented in Figures 7 and 9.

When a working path is preempted, the protection domain must transition the contents of the protection path from the preempted working path to the preempting working path. In the locking case, the protection path must temporarily be blocked (that is, nothing is protected) in order to ensure that there is no misconnectivity. In the case where W1 preempts W2, the protection path transitions from carrying W2 traffic to carrying no traffic, before beginning to carry W1 traffic.

The following sub-section will describe the actions to be taken when an LER is in the WFA state.

4.3.5.1. Wait for Acknowledge (WFA) State

An LER will enter the Wait for Acknowledge state before transitioning into a protection state, i.e., either Protecting administrative or Protecting failure state. The LER SHALL remain in this state until it either receives an Acknowledge message or the WFA timer expires. Normally, the Acknowledge message will be a remote PSC input. The following items describe how the LER in WFA state should react to a new local input:

If the LER is in WFA state due to either an FS or MS trigger, a local Clear SHALL cause the LER to go into Normal state and transmit an NR(0,0) PSC message. If the LER is in WFA state due to an SF trigger, the local Clear SHALL be ignored.
A local LO SHALL cause the LER to go into Unavailable state and begin to transmit LO(x, 0) [where x indicates the index of the working path that triggered the WFA state].
A local FS SHALL cause the LER to remain in WFA state and transmit the FS(x, 0) message [where x indicates the index of the protected working path]. If the LER is in WFA state due to an FS from a different working path, then the working path with the higher priority SHALL be the protected working path. If the LER is in WFA state due to any other switching trigger, then the working path identified in this FS will be the protected working path.
A local SF SHALL cause the LER to remain in WFA state. If the LER is in WFA state due to an existing FS trigger, then ignore the local SF and continue to transmit the FS(x, 0) PSC message. If the LER is in WFA state due to an existing SF trigger then transmit the SF(x, 0) PSC message [where x indicates the index of protected working path, i.e. the highest priority working path indicating an SF condition]. If the LER is in WFA state due to any other trigger, then begin transmitting a SF(x, 0) PSC message [where x indicates the index of the working path that is generating the SF condition].
A local ClearSF indication where the working path is the same as the path that triggered the LER into WFA state SHALL cause the LER to go into WTR state (note: 1:N protection is always revertive) and to transmit the WTR(0, 0) message. If the ClearSF indicates a different index from the protected working path or indicates the protection path, then the indication SHALL be ignored.
A local MS operator command SHALL cause the LER to remain in WFA state. If the LER is in WFA state due to an existing MS trigger, then the node continues to transmit MS(x, 0) messages [where x indicates the index of the protected working path, i.e. the highest priority working path indicating the MS condition]. If the LER is in WFA state due to any other trigger, ignore the MS command and continue transmitting the current message.
If the WFA timer expires, i.e. the LER did not receive the Acknowledge message from the far end in a timely manner, then the LER SHALL go to Unavailable state, i.e. it assumes that there is a problem on the protection path (where all PSC traffic is transmitted) and send an error notification to the management system. The LER SHALL continue transmitting the current PSC message with Path field set to 0.
All other local indications SHALL be ignored.
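Part of the list above can be sketched as a transition function. This sketch covers only the Clear, LO, ClearSF and WFA-timer-expiry items plus the final "ignore" rule; local FS, SF and MS inputs are deliberately not modelled and raise an error. The function name, the event strings (including the hypothetical "WFAExpires") and the string form of messages are assumptions.

```python
from typing import Optional, Tuple


def wfa_local_input(trigger: str, x: int, event: str,
                    event_path: Optional[int] = None) -> Tuple[str, Optional[str]]:
    """Subset of the WFA-state reactions to local inputs (sketch).

    trigger    -- "FS", "MS" or "SF": the trigger that put the LER in WFA
    x          -- index of the working path that triggered WFA state
    event      -- the new local input
    event_path -- path index carried by a ClearSF indication (0 = protection)
    Returns (next state, message to transmit); None keeps the current message.
    """
    if event in ("FS", "SF", "MS"):
        raise NotImplementedError("local FS/SF/MS handling is not modelled here")
    if event == "Clear":
        if trigger in ("FS", "MS"):
            return "Normal", "NR(0,0)"
        return "WFA", None  # Clear is ignored when the trigger was SF
    if event == "LO":
        return "Unavailable", f"LO({x},0)"
    if event == "ClearSF":
        if event_path == x:
            return "WTR", "WTR(0,0)"  # 1:n protection is always revertive
        return "WFA", None  # other working path or protection path: ignored
    if event == "WFAExpires":
        # Keep sending the current message, but with Path = 0, and alert the NMS.
        return "Unavailable", None
    return "WFA", None  # all other local indications are ignored


print(wfa_local_input("FS", 4, "Clear"))       # -> ('Normal', 'NR(0,0)')
print(wfa_local_input("SF", 1, "ClearSF", 1))  # -> ('WTR', 'WTR(0,0)')
```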

The following details the reactions of the LER in WFA state to remote messages:

Any remote message with the Acknowledge flag set to 1 and the Path field set to the index of the protected working path SHALL cause the LER to change state. If the trigger was either FS or MS command, the LER enters Protecting administrative state. The LER transmits the appropriate message according to the trigger (i.e. FS(x,x) for FS command and MS(x,x) for the MS command). If the trigger was a SF condition, then the LER enters the Protecting failure state and begins to transmit the appropriate SF(x, x) message. A remote message with the Acknowledge flag set to 1 but where the Path field does not match, according to the description above, SHALL be ignored.
A remote LO message SHALL cause the LER to go into Unavailable state and transmit the appropriate message for the trigger that caused the WFA state.
A remote FS message indicating the same working path as the local FS command that triggered the WFA state SHALL be considered an Acknowledge message, even if the Acknowledge flag is not set. The LER SHALL perform the protection switch, and begin transmitting the FS(x, x) message [where x indicates the index of the protected working path]. If the remote FS message indicates a different index than the one indicated in the local FS and if the remote FS message indicates a lower priority working path than the working path in the local FS trigger then the LER SHALL ignore the remote FS message and remain in WFA state. If the remote FS message indicates an index of higher priority or the LER is in WFA state as a result of a SF or MS trigger, then the LER SHALL perform the protection switch for the protected working path indicated by the remote FS message, and SHALL go to Protecting administrative state and transmit the appropriate message for the local trigger with the Path field set to the index of the remote message and the Acknowledge flag set to 1.
A remote SF message indicating an error on the protection path SHALL cause the LER to go into Unavailable state and transmit the appropriate message for the trigger that caused the WFA state.
A remote SF message indicating an error on the same working path as the local SF condition that triggered the WFA state SHALL be considered an Acknowledge message (even if the Acknowledge flag is not set). The LER SHALL perform the protection switch, go to Protecting failure state, and transmit the SF(x, x) message [where x is the index of the protected working path]. If the remote SF message indicates a different index than the local SF condition, and the local SF condition indicates a higher-priority working path, the LER SHALL ignore the remote SF message and remain in WFA state. If the remote SF message indicates an index of higher priority, or if the LER is in WFA state as a result of an MS trigger, then the LER SHALL perform the protection switch for the protected working path indicated by the remote SF message, SHALL go to Protecting failure state, and SHALL transmit the appropriate message for the local trigger with the Path field set to the index of the remote message and the Acknowledge flag set to 1. If the LER is in WFA state due to a local FS command, it SHALL ignore the remote SF message and remain in WFA state.
A remote MS message indicating the same working path as the local MS command that triggered the WFA state SHALL be considered an Acknowledge message (even if the Acknowledge flag is not set). The LER SHALL perform the protection switch, go to Protecting administrative state, and transmit the MS(x, x) message [where x is the index of the protected working path]. If the remote MS message indicates a different index than the local MS command, and either the local command indicates a higher-priority working path or the LER is in WFA state due to an FS or SF trigger, the LER SHALL ignore the remote MS message and remain in WFA state. If the remote MS message indicates an index of higher priority, then the LER SHALL perform the protection switch for the protected working path indicated by the remote MS message, SHALL go to Protecting administrative state, and SHALL transmit an NR(0, y) message [where y, carried in the Path field, is the index of the remote message] with the Acknowledge flag set to 1.
All other remote messages SHOULD be ignored.
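The priority rules for remote messages received in WFA state are easier to check when written out as a decision procedure. The following Python sketch models only the resulting state and the working path that ends up protected; it does not construct the transmitted messages. It assumes that Path index 0 denotes the protection path (consistent with the SF(0,0) entry for UA:P:L in Appendix A). Because the text above does not define how path priorities are compared, it uses a hypothetical rule in which a lower index means a higher priority. The state labels (WFA, PA, PF, UA) and the names Remote, lower_index_wins and wfa_on_remote are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Assumption: Path index 0 denotes the protection path.
PROTECTION = 0


@dataclass(frozen=True)
class Remote:
    """A received PSC message (hypothetical representation)."""
    req: str           # "LO", "SF", "FS", "MS", "NR", ...
    path: int          # Path field: working-path index, or PROTECTION
    ack: bool = False  # Acknowledge flag


def lower_index_wins(a: int, b: int) -> bool:
    """Hypothetical priority rule: True if path a outranks path b."""
    return a < b


def wfa_on_remote(trigger: str, local: int, msg: Remote,
                  higher: Callable[[int, int], bool] = lower_index_wins,
                  ) -> Tuple[str, Optional[int]]:
    """React to a remote message while in WFA state.

    trigger is the local FS, MS or SF request that caused WFA, and local is
    the working-path index it named. Returns (next_state, switched_path),
    where switched_path is None if no protection switch is performed.
    """
    if msg.ack:
        if msg.path == local:  # explicit acknowledgement
            return ("PA" if trigger in ("FS", "MS") else "PF"), local
        return "WFA", None  # acknowledgement for another path: ignore
    if msg.req == "LO":
        return "UA", None
    if msg.req == "FS":
        if trigger == "FS":
            if msg.path == local:  # implicit acknowledgement
                return "PA", local
            if not higher(msg.path, local):  # remote FS has lower priority
                return "WFA", None
        return "PA", msg.path  # higher-priority FS, or local SF/MS trigger
    if msg.req == "SF":
        if msg.path == PROTECTION:
            return "UA", None
        if trigger == "FS":  # a local FS outranks a remote SF
            return "WFA", None
        if trigger == "SF":
            if msg.path == local:  # implicit acknowledgement
                return "PF", local
            if not higher(msg.path, local):
                return "WFA", None
        return "PF", msg.path  # higher-priority SF, or local MS trigger
    if msg.req == "MS" and trigger == "MS":
        if msg.path == local:  # implicit acknowledgement
            return "PA", local
        if higher(msg.path, local):
            return "PA", msg.path
    return "WFA", None  # all other remote messages are ignored
```

For example, under the assumed priority rule, an LER waiting on a local FS for path 2 ignores a remote FS for path 3 but switches to path 1 on a remote FS for path 1. The priority comparison is passed in as a parameter so that a different, operator-defined ordering can be substituted.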

5. IANA Considerations

This document makes no requests of IANA.

6. Security Considerations

The generic security considerations for the data-plane of MPLS-TP are described in the security framework document [SecureFwk] together with the required mechanisms needed to address them. The security considerations for the generic associated control channel are described in [RFC5586]. The security considerations for protection and recovery aspects of MPLS-TP are addressed in [SurvivFwk].

The protocol extensions described in this document extend the protocol defined in [LinProt] and do not introduce any new security risks.

7. Acknowledgements

The authors would like to thank all members of the teams (the Joint Working Team, the MPLS Interoperability Design Team in IETF and the T-MPLS Ad Hoc Group in ITU-T) involved in the definition and specification of MPLS Transport Profile.

8. References

8.1. Normative References

[RFC2119]	Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[TPReq]	Niven-Jenkins, B., Brungard, D., Betts, M., Sprecher, N. and S. Ueno, "Requirements of an MPLS Transport Profile", RFC 5654, September 2009.
[LinProt]	Bryant, S., Sprecher, N., Osborne, E., Fulignoli, A. and Y. Weingarten, "Multi-protocol Label Switching Transport Profile Linear Protection", RFC 6378, October 2011.

8.2. Informative References

[RFC5586]	Bocci, M., Vigoureux, M. and S. Bryant, "MPLS Generic Associated Channel", RFC 5586, June 2009.
[RFC4427]	Mannie, E. and D. Papadimitriou, "Recovery Terminology for Generalized Multi-Protocol Label Switching", RFC 4427, March 2006.
[RFC3031]	Rosen, E., Viswanathan, A. and R. Callon, "Multiprotocol Label Switching Architecture", RFC 3031, January 2001.
[SurvivFwk]	Sprecher, N., Farrel, A. and H. Shah, "Multi-protocol Label Switching Transport Profile Survivability Framework", RFC 6372, September 2011.
[SecureFwk]	Fang, L., Niven-Jenkins, B., Mansfield, S., Zhang, R., Bitar, N., Daikoku, M. and L. Wang, "MPLS-TP Security Framework", Internet-Draft draft-ietf-mpls-tp-security-framework-02.txt, February 2011.

Appendix A. PSC state machine tables

Note/Disclaimer: This state machine is not currently in sync with the text of the document and will be updated in a future revision.

The full PSC state machine is described in [LinProt], both in textual and tabular form. This appendix highlights the changes to the basic PSC state machine. In the event of a mismatch between these tables and the text either in [LinProt] or in this document, the text is authoritative. Note that this appendix is intended to be a functional description, not an implementation specification.

The tables here use the same format and state descriptions as the Linear Protection document [LinProt], with the addition of the WFA state and WFA Expires, and with the changes in behavior noted here.

Each state corresponds to the transmission of a particular set of Request, FPath and Path bits. The table below lists the message that is generally sent in each particular state. If the message to be sent in a particular state deviates from the table below, it is noted in the footnotes to the state-machine table.

   State   | REQ(FP,P)
   --------+----------
   N       | NR(0,0)
   UA:LO:L | LO(0,0)
   UA:P:L  | SF(0,0)
   UA:LO:R | NR(0,0)
   UA:P:R  | NR(0,0)
   PF:W:L  | SF(1,1)
   PF:W:R  | NR(0,1)
   PA:F:L  | FS(1,1)
   PA:M:L  | MS(1,1)
   PA:F:R  | NR(0,1)
   PA:M:R  | NR(0,1)
   WTR     | WTR(0,1)
   DNR     | DNR(0,1)
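As a sketch of how an implementation might hold this table, the following Python maps each state to its default (Request, FPath, Path) values. The names DEFAULT_TX and default_message are hypothetical. The sketch keeps the table's literal 0/1 values; a 1:n implementation would presumably carry the protected working path index in place of 1, as in the FS(x, x) notation of the WFA state description, but that substitution is not shown here.

```python
# State -> (Request, FPath, Path), copied from the table of default messages.
DEFAULT_TX = {
    "N": ("NR", 0, 0),
    "UA:LO:L": ("LO", 0, 0),
    "UA:P:L": ("SF", 0, 0),
    "UA:LO:R": ("NR", 0, 0),
    "UA:P:R": ("NR", 0, 0),
    "PF:W:L": ("SF", 1, 1),
    "PF:W:R": ("NR", 0, 1),
    "PA:F:L": ("FS", 1, 1),
    "PA:M:L": ("MS", 1, 1),
    "PA:F:R": ("NR", 0, 1),
    "PA:M:R": ("NR", 0, 1),
    "WTR": ("WTR", 0, 1),
    "DNR": ("DNR", 0, 1),
}


def default_message(state: str) -> str:
    """Render the message generally sent in `state`, e.g. "SF(1,1)".

    Footnotes to the state-machine tables may override this default.
    """
    req, fpath, path = DEFAULT_TX[state]
    return f"{req}({fpath},{path})"
```

For instance, default_message("PA:F:L") returns "FS(1,1)".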

The top row in each table is the list of possible inputs. The local inputs are:


NR	No Request
OC	Operator Clear
LO	Lockout of protection
SF-P	Signal Fail on protection path
SF-W	Signal Fail on working path
FS	Forced Switch
SFc	Clear Signal Fail
MS	Manual Switch
WTRExp	WTR Expired

and the remote inputs are:


LO	remote LO message
SF-P	remote SF message indicating protection path
SF-W	remote SF message indicating working path
FS	remote FS message
MS	remote MS message
WTR	remote WTR message
DNR	remote DNR message
NR	remote NR message

Section 4.3.3 refers to some states as 'remote' and some as 'local'. By definition, all states listed in the table of local sources are local states, and all states listed in the table of remote sources are remote states. For example, section 4.3.3.1 says "A local Lockout of protection input SHALL cause the LER to go into local Unavailable State". As the trigger for this state change is a local one, 'local Unavailable State' is by definition displayed in the table of local sources. Similarly, "A remote Lockout of protection message SHALL cause the LER to go into remote Unavailable state" means that the state represented in the Unavailable rows in the table of remote sources is by definition a remote Unavailable state.

Each cell in the table below contains either a state, a footnote, or the letter 'i'. 'i' stands for Ignore, and is an indication to continue with the current behavior. See section 4.3.3. The footnotes are listed below the table.

Part 1: Local input state machine

           | OC  | LO    | SF-P | FS   | SF-W | SFc  | MS   | WTRExp
   --------+-----+-------+------+------+------+------+------+-------
   N       | i   |UA:LO:L|UA:P:L|PA:F:L|PF:W:L| i    |PA:M:L| i
   UA:LO:L | N   | i     | i    | i    | i    | i    | i    | i
   UA:P:L  | i   |UA:LO:L| i    | i    | i    | [5]  | i    | i
   UA:LO:R | i   |UA:LO:L| [1]  | i    | [2]  | [6]  | i    | i
   UA:P:R  | i   |UA:LO:L|UA:P:L| i    | [3]  | [6]  | i    | i
   PF:W:L  | i   |UA:LO:L|UA:P:L|PA:F:L| i    | [7]  | i    | i
   PF:W:R  | i   |UA:LO:L|UA:P:L|PA:F:L|PF:W:L| i    | i    | i
   PA:F:L  | N   |UA:LO:L|UA:P:L| i    | i    | i    | i    | i
   PA:M:L  | N   |UA:LO:L|UA:P:L|PA:F:L|PF:W:L| i    | i    | i
   PA:F:R  | i   |UA:LO:L|UA:P:L|PA:F:L| [4]  | [8]  | i    | i
   PA:M:R  | i   |UA:LO:L|UA:P:L|PA:F:L|PF:W:L| i    |PA:M:L| i
   WTR     | i   |UA:LO:L|UA:P:L|PA:F:L|PF:W:L| i    |PA:M:L| [9]
   DNR     | i   |UA:LO:L|UA:P:L|PA:F:L|PF:W:L| i    |PA:M:L| i
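Because the tables are regular, they can be consumed mechanically. The following Python sketch parses the Part 1 table, copied verbatim, into a nested dictionary and looks up the next state for a local input. A cell of 'i' keeps the current state, and a footnote cell is returned as a number for the caller to resolve. The names parse_table and lookup are hypothetical, and the sketch assumes the '|'-separated layout used here.

```python
from typing import Dict, Optional, Tuple

# Part 1 of the state machine, copied verbatim from the table above.
PART1 = """\
           | OC  | LO    | SF-P | FS   | SF-W | SFc  | MS   | WTRExp
   --------+-----+-------+------+------+------+------+------+-------
   N       | i   |UA:LO:L|UA:P:L|PA:F:L|PF:W:L| i    |PA:M:L| i
   UA:LO:L | N   | i     | i    | i    | i    | i    | i    | i
   UA:P:L  | i   |UA:LO:L| i    | i    | i    | [5]  | i    | i
   UA:LO:R | i   |UA:LO:L| [1]  | i    | [2]  | [6]  | i    | i
   UA:P:R  | i   |UA:LO:L|UA:P:L| i    | [3]  | [6]  | i    | i
   PF:W:L  | i   |UA:LO:L|UA:P:L|PA:F:L| i    | [7]  | i    | i
   PF:W:R  | i   |UA:LO:L|UA:P:L|PA:F:L|PF:W:L| i    | i    | i
   PA:F:L  | N   |UA:LO:L|UA:P:L| i    | i    | i    | i    | i
   PA:M:L  | N   |UA:LO:L|UA:P:L|PA:F:L|PF:W:L| i    | i    | i
   PA:F:R  | i   |UA:LO:L|UA:P:L|PA:F:L| [4]  | [8]  | i    | i
   PA:M:R  | i   |UA:LO:L|UA:P:L|PA:F:L|PF:W:L| i    |PA:M:L| i
   WTR     | i   |UA:LO:L|UA:P:L|PA:F:L|PF:W:L| i    |PA:M:L| [9]
   DNR     | i   |UA:LO:L|UA:P:L|PA:F:L|PF:W:L| i    |PA:M:L| i
"""

Table = Dict[str, Dict[str, str]]


def parse_table(text: str) -> Table:
    """Turn an ASCII state table into {state: {input: cell}}."""
    rows = [ln for ln in text.splitlines()
            if ln.strip() and not ln.strip().startswith("-")]
    inputs = [c.strip() for c in rows[0].split("|")[1:]]
    table: Table = {}
    for row in rows[1:]:
        cells = [c.strip() for c in row.split("|")]
        table[cells[0]] = dict(zip(inputs, cells[1:]))
    return table


def lookup(table: Table, state: str, event: str) -> Tuple[str, Optional[int]]:
    """Return (next_state, footnote) for an input received in `state`.

    For a footnote cell the current state is returned as a placeholder;
    the caller must apply the footnote text to decide what happens.
    """
    cell = table[state][event]
    if cell == "i":           # ignore: continue with the current behavior
        return state, None
    if cell.startswith("["):  # footnote such as "[5]"
        return state, int(cell.strip("[]"))
    return cell, None
```

For example, lookup(parse_table(PART1), "N", "FS") gives ("PA:F:L", None), while an SFc in UA:P:L gives ("UA:P:L", 5). The Part 2 table that follows uses the same layout, so the same parser should accept it; the trailing spaces in some of its rows are removed by strip().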

Part 2: Remote messages state machine

           | LO    | SF-P | FS   | SF-W | MS   | WTR  | DNR  | NR
   --------+-------+------+------+------+------+------+------+------
   N       |UA:LO:R|UA:P:R|PA:F:R|PF:W:R|PA:M:R| i    | i    | i
   UA:LO:L | i     | i    | i    | i    | i    | i    | i    | i
   UA:P:L  | [10]  | i    | i    | i    | i    | i    | i    | i
   UA:LO:R | i     | i    | i    | i    | i    | i    | i    | [16]
   UA:P:R  |UA:LO:R| i    | i    | i    | i    | i    | i    | [16]
   PF:W:L  | [11]  | [12] |PA:F:R| i    | i    | i    | i    | i
   PF:W:R  |UA:LO:R|UA:P:R|PA:F:R| i    | i    | [14] | [15] | N 
   PA:F:L  |UA:LO:R|UA:P:R| i    | i    | i    | i    | i    | i 
   PA:M:L  |UA:LO:R|UA:P:R|PA:F:R| [13] | i    | i    | i    | i 
   PA:F:R  |UA:LO:R|UA:P:R| i    | i    | i    | i    | i    | [17] 
   PA:M:R  |UA:LO:R|UA:P:R|PA:F:R| [13] | i    | i    | i    | N 
   WTR     |UA:LO:R|UA:P:R|PA:F:R|PF:W:R|PA:M:R| i    | i    | [18]
   DNR     |UA:LO:R|UA:P:R|PA:F:R|PF:W:R|PA:M:R| i    | i    | i

The following are the footnotes for the table:

[1] Remain in the current state (UA:LO:R) and transmit SF(0,0)

[2] Remain in the current state (UA:LO:R) and transmit SF(1,0)

[3] Remain in the current state (UA:P:R) and transmit SF(1,0)

[4] Remain in the current state (PA:F:R) and transmit SF(1,1)

[5] If the SF being cleared is SF-P, transition to N. If it is SF-W, ignore the clear.

[6] Remain in the current state (UA:x:R); if the SFc corresponds to a previous SF, begin transmitting NR(0,0).

[7] If the domain is configured for revertive behavior, transition to WTR; otherwise, transition to DNR.

[8] Remain in PA:F:R and transmit NR(0,1)

[9] Remain in WTR, send NR(0,1)

[10] Transition to UA:LO:R and continue sending SF(0,0).

[11] Transition to UA:LO:R and send SF(1,0)

[12] Transition to UA and send SF(1,0)

[13] Transition to PF:W:R and send NR(0,1)

[14] Transition to WTR state and continue to send the current message.

[15] Transition to DNR state and continue to send the current message.

[16] If the local input is SF-P, transition to UA:P:L. If the local input is SF-W, transition to PF:W:L. Otherwise, transition to N state and continue to send the current message.

[17] If the local input is SF-W, transition to PF:W:L. Otherwise, transition to N state and continue to send the current message.

[18] If the receiving LER's WTR timer is running, maintain current state and message. If the WTR timer is stopped, transition to N.
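Several footnotes make the next state depend on information outside the tables: which SF is being cleared ([5]), the revertive configuration ([7]), the currently active local input ([16], [17]), and the WTR timer ([18]). The following Python sketch collects those state decisions in one place. It covers only the resulting state, not the message that continues to be sent. The function and parameter names are hypothetical, and reading "the local input" in [16] and [17] as the currently active local input is this sketch's interpretation.

```python
from typing import Optional


def resolve_footnote(n: int, current: str, *,
                     cleared: Optional[str] = None,
                     revertive: bool = True,
                     wtr_running: bool = False,
                     local_input: Optional[str] = None) -> str:
    """Return the next state for footnotes that change (or may change) state.

    cleared     -- which SF an SFc clears ("SF-P" or "SF-W"), for [5]
    revertive   -- whether the domain is configured as revertive, for [7]
    wtr_running -- whether this LER's WTR timer is running, for [18]
    local_input -- currently active local input, if any, for [16] and [17]
    """
    if n == 5:    # SFc received in UA:P:L
        return "N" if cleared == "SF-P" else current
    if n == 7:    # SFc received in PF:W:L
        return "WTR" if revertive else "DNR"
    if n == 16:   # remote NR received in UA:LO:R or UA:P:R
        if local_input == "SF-P":
            return "UA:P:L"
        if local_input == "SF-W":
            return "PF:W:L"
        return "N"
    if n == 17:   # remote NR received in PA:F:R
        return "PF:W:L" if local_input == "SF-W" else "N"
    if n == 18:   # remote NR received in WTR
        return current if wtr_running else "N"
    raise NotImplementedError(f"footnote [{n}] is not modelled here")
```

Footnotes that only change the transmitted message (for example [1] to [4], [8] and [9]) leave the state unchanged and are deliberately not modelled.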

Authors' Addresses

Eric Osborne Cisco United States EMail: eosborne@cisco.com

Fei Zhang ZTE China EMail: zhang.fei3@zte.com.cn

Yaacov Weingarten Nokia Siemens Networks 34 Hagefen St Karnei Shomron, 44853 Israel EMail: wyaacov@gmail.com