CCAMP Working Group                                               Yi Lin
Internet Draft                                       Huawei Technologies
Intended status: Standards Track                        November 3, 2019
Expires: May 2020

          RSVP-TE Extensions in Support of Proactive Protection
            draft-lin-ccamp-gmpls-proactive-protection-00.txt

Status of this Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

   This Internet-Draft will expire on May 3, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Yi Lin                    Expires May 3, 2020                  [Page 1]
Internet-Draft        GMPLS Proactive Protection          November 2019

Abstract

   This document describes protocol-specific procedures and extensions for Generalized Multi-Protocol Label Switching (GMPLS) Resource ReSerVation Protocol - Traffic Engineering (RSVP-TE) signaling to support Label Switched Path (LSP) Proactive Protection, which creates the protection LSP after a failure is predicted and before it becomes a real failure.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Overview of Predicted Failure and Related Recovery Methods
      3.1. Predicted Failure
      3.2. Proactive Protection
   4. Modified PROTECTION Object Format
   5. Extension to ERROR_SPEC Object
      5.1. New Error Code / Sub-code
      5.2. New TLV in ERROR_SPEC Object
   6. End-to-end Proactive Protection
      6.1. Creation of the Protected LSP
      6.2. Notification of Predicted Failure Event
      6.3. Tearing Down of the Protection LSP
   7. Proactive Segment Protection
      7.1. Creation of the Protected LSP
      7.2. Notification of Predicted Failure Event
      7.3. Tearing Down of the Segment Recovery LSP
      7.4. Priority and Resource Pre-emption
   8. Consideration of Backward Compatibility
   9. Security Considerations
   10. IANA Considerations
   11. References
      11.1. Normative References
      11.2. Informative References
   12. Authors' Addresses

1. Introduction

   [RFC4872] and [RFC4873] describe protocol-specific procedures and extensions for GMPLS RSVP-TE signaling to support end-to-end LSP recovery (including protection and restoration) and segment LSP recovery, respectively.

   Traditional protection solutions (e.g., 1+1 or 1:1 protection) can switch over very quickly after a failure happens, but consume twice the network resources during the whole lifetime of the LSP. Traditional restoration solutions, on the other hand, use network resources much more efficiently, but recover the LSP much more slowly because of the additional signaling time needed to create the restoration LSP.

   In order to reduce the resources used for recovery while keeping the very fast protection switch, one approach is to use failure prediction technologies and to create 1+1 or 1:1 protection only when a potential failure is predicted. This approach is referred to as "Proactive Protection" in this document.

   This document extends the RSVP-TE protocol to support the control of Proactive Protection.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. Overview of Predicted Failure and Related Recovery Methods

3.1. Predicted Failure

   In most cases, there are some indications before a physical failure happens in a network, for example, abnormal fluctuation of the noise of a lightpath, a rising BER (Bit Error Rate) before error correction, or a rising transponder temperature.

   Therefore, by monitoring certain physical parameters and analyzing their trend using, for example, Machine Learning (ML) or other technologies, a node can predict whether a failure will happen in an upcoming period of time.

   Note that a predicted failure is different from a Signal Degrade in that:

   - When Signal Degrade happens to a connection, the connection is still available but the quality of the signal carried by this connection has declined and is lower than the predetermined threshold. For example, the BER of a connection rises and is out of tolerance.

   - When a predicted failure of a connection is inferred, neither failure nor degradation has happened yet, but there is a trend indicating that, after a period of time, a failure will probably happen, which will cause Signal Fail or Signal Degrade.

   The methods to predict failures are outside the scope of this document.

3.2. Proactive Protection

   "Proactive Protection" refers to an LSP protection approach which creates the protection LSP after a failure is predicted and before it becomes a real failure. Both end-to-end protection (defined in [RFC4872]) and segment protection (defined in [RFC4873]) are applicable to Proactive Protection.

   The main procedure of Proactive Protection is shown in Figure 1:

      |-> Predicted failure notification received
      |   |-> Proactive Protection path created
      |   |               |-> Real failure happens
      |   |               | |-> Protection switch finished
      |   |               | |
      |   |               | |     Protection path deleted <-|
      |   |               | |     if no failure happened    |
      |   |               | |                               |
      |   |       t3      | |                         t6    |
   ---+---+--------+======x=+==========================+----+---> t
      t1  t2       |     t4 t5                         |    t7
                   |                                   |
                   |<--Predicted failure time period-->|

              Figure 1: Overview of Proactive Protection

   - t1: The protection source node of an LSP is notified that a failure will probably happen during t3~t6, so it starts to create 1+1 or 1:1 protection of the connection.
     Here the protection source node can be the source node of the LSP (for the end-to-end protection case), or a branch node located between the source node and the predicted failure point of the LSP (for the segment protection case).

   - t2: The 1+1 or 1:1 protecting path is created between the protection source node and the protection destination node. Here the protection destination node can be the destination node of the LSP (for the end-to-end protection case), or a merge node located between the predicted failure point and the destination node of the LSP (for the segment protection case).

   - t4: If a real failure happens as predicted, the 1+1 or 1:1 protection switch is triggered.

   - t5: The protection switch is finished and the service in the connection is recovered.

   - t7: If the predicted failure did not in fact happen, and no further predicted failure notification is received, the protection source node MAY tear down the protecting path after t6, in order to save network resources.

4. Modified PROTECTION Object Format

   This document modifies the PROTECTION object (C-Type=2) by adding two new bits, T and A, in reserved fields, as shown in Figure 2 below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             | Class-Num(37) | C-Type (2)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|P|N|O|T| Res.    | LSP Flags |     Reserved      | Link Flags|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |I|R|A|  Reserved   | Seg.Flags |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 2: The modified PROTECTION object (C-Type=2)

   - T (Triggered End-to-end Proactive Protection): 1 bit. When set (1), it indicates that end-to-end Proactive Protection is required.
     Note that if the T bit is set (1), the LSP Flags SHOULD be one of:

        0x04  1:N Protection with Extra-Traffic
        0x08  1+1 Unidirectional Protection
        0x10  1+1 Bidirectional Protection

   - A (proActive Segment Protection): 1 bit. When set (1), it indicates that Proactive Segment Protection is required.

     Note that if the A bit is set (1), the Seg. Flags SHOULD be one of:

        0x04  1:N Protection with Extra-Traffic
        0x08  1+1 Unidirectional Protection
        0x10  1+1 Bidirectional Protection

   See [RFC4872] and [RFC4873] for the definition of other fields.

5. Extension to ERROR_SPEC Object

5.1. New Error Code / Sub-code

   A new Error Sub-code under Error Code "25 - Notify Error" is defined in this document, which is used to notify the event of a predicted failure:

      Error Code = 25: "Notify Error" (see [RFC3209])

      Error Sub-code = TBA: "Notify Error/LSP Local Predicted Failure"

5.2. New TLV in ERROR_SPEC Object

   When a failure is predicted, the time before which the failure may happen may also be predicted. This time information lets the source node know how long it should wait for the predicted failure to become a real failure, and decide when it is safe to tear down the protection LSP if the predicted failure did not happen.

   A new TLV in the IPv4/IPv6 IF_ID ERROR_SPEC object is defined in this document, which is used to indicate the time before which the predicted failure will probably become a real failure.
   The format of this new TLV is shown in Figure 3 below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = TBA           |          Length = 8           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Time                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure 3: New TLV (type=TBA) in ERROR_SPEC Object

   - Type: TBA

   - Length: 8

   - Time: A relative time measured in seconds, which indicates within how many seconds (from the current time) the predicted failure will probably become a real failure.

6. End-to-end Proactive Protection

6.1. Creation of the Protected LSP

   To create an LSP with recovery type of "End-to-end Proactive Protection", the source node of the LSP generates a Path message with a PROTECTION object included. The T bit in the PROTECTION object MUST be set to 1 (End-to-end Proactive Protection), so that all other nodes along the LSP can start the failure prediction function on related links/nodes.

   Note that the N bit in the PROTECTION object is used to indicate whether the control plane message exchange is only used for notification or for protection-switching purposes after a real failure happens, see [RFC4872]. In other words, the N bit has nothing to do with the notification of a predicted failure before a real failure happens.

   To allow the predicted failure event to be reported to the source node by the Notify message, the NOTIFY REQUEST object MUST also be included in the Path message (see [RFC3473]), where the "Notify Node Address" SHOULD be the address of the source node of the LSP.

6.2. Notification of Predicted Failure Event

   When an intermediate node on an LSP infers that a failure will happen and will affect the LSP, it sends a Notify message to the source node of the LSP to report the predicted failure event.

   A new error code/sub-code "Notify Error/LSP Local Predicted Failure" is used in the ERROR_SPEC object or IF_ID_ERROR_SPEC object in the Notify message. The Notify message MAY also include a TLV (type = TBA) in the IPv4 or IPv6 IF_ID_ERROR_SPEC object, to indicate the time before which the predicted failure will probably become a real failure.

   On receiving the Notify message with error code/sub-code "Notify Error/LSP Local Predicted Failure", the source node of the LSP SHOULD trigger the procedure to create the protection LSP, according to the protection type indicated in the "LSP Flags" field of the PROTECTION object in the Path message for the protected LSP. The procedures for creating the protection LSP and for protection switching after a real failure happens are described in [RFC4872].

6.3. Tearing Down of the Protection LSP

   After the protection LSP is created, the source node MAY start a timer T_wait and wait for the predicted failure to become a real failure. If no real failure happens and no further notification of predicted failure is received before T_wait expires, the source node MAY trigger the procedure to tear down the protection LSP, according to local policy. See [RFC4872] for the process of tearing down a protection LSP.

   Implementations SHOULD allow this policy to be configured to provide a default across all LSPs on a node, but SHOULD also allow it to be configured per LSP.

   Note that T_wait MUST be longer than the time indicated in the TLV (type=TBA) in the ERROR_SPEC object in the Notify message, if the TLV exists. Note also that the value of T_wait is a local matter of the source node, and is outside the scope of this document.

7. Proactive Segment Protection

7.1. Creation of the Protected LSP

   To create an LSP with recovery type of "Proactive Segment Protection", the source node of the LSP generates a Path message, where:

   - A PROTECTION object is included, where the A bit MUST be set to 1 (Proactive Segment Protection), so that all nodes along the protected LSP can start the failure prediction function on related links/nodes if supported. The "Seg. Flags" are used to indicate the protection type of the Proactive Segment Protection.

   - One or more SERO objects MAY be included (i.e., explicit Proactive Segment Protection), indicating the branch node and the merge node of each segment recovery LSP. If no SERO object is included, the dynamic Proactive Segment Protection method is used.

   - A NOTIFY REQUEST object is included, where the "Notify Node Address" SHOULD be the address of the source node of the LSP.

   For explicit Proactive Segment Protection, when a branch node receives a Path message with the A bit set to 1 in the PROTECTION object, the branch node follows [RFC4873] to process the Path message, except that the Path message for the recovery LSP is not generated or sent at this stage. Also, one more NOTIFY REQUEST object SHOULD be added to the Path message of the protected LSP, which carries the address of this branch node.

   For dynamic Proactive Segment Protection, when an intermediate node receives a Path message with the A bit set to 1 in the PROTECTION object, the node determines whether it has the ability to be a branch node, as described in Section 6.2 of [RFC4873]. If so, it follows the same procedure as a branch node does in the case of explicit Proactive Segment Protection, as described above. If not, the node only follows the standard procedure to create the protected LSP.

7.2. Notification of Predicted Failure Event

   When an intermediate node between a pair of branch and merge nodes on an LSP infers that a failure will happen and will affect the LSP, it sends a Notify message to the nearest upstream branch node of the LSP to report the predicted failure event.

   The error code/sub-code "Notify Error/LSP Local Predicted Failure" is used in the ERROR_SPEC object or IF_ID_ERROR_SPEC object in the Notify message. As with End-to-end Proactive Protection, the time before which the predicted failure may occur MAY also be included in the Notify message.

   On receiving the Notify message with error code/sub-code "Notify Error/LSP Local Predicted Failure", the branch node on the protected LSP SHOULD generate a new Path message, and send this new Path message along the recovery LSP between the branch and the merge nodes. The procedures for generating the new Path message and creating the recovery LSP are the same as described in [RFC4873], except that the A bit in the PROTECTION object of this new Path message MUST be set to 1.

7.3. Tearing Down of the Segment Recovery LSP

   After the segment recovery LSP is created, the branch node MAY start a timer T_wait and wait for the predicted failure to become a real failure. If no real failure happens and no further notification of predicted failure is received before T_wait expires, the branch node MAY trigger the procedure to tear down the segment recovery LSP, according to local policy. See [RFC4873] for the process of tearing down a segment recovery LSP.

   Implementations SHOULD allow this policy to be configured to provide a default across all LSPs on a node, but SHOULD also allow it to be configured per LSP.

   Note that T_wait MUST be longer than the time indicated in the TLV (type=TBA) in the ERROR_SPEC object in the Notify message, if the TLV exists.
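
   The relationship between T_wait and the Time value of the TLV can be sketched as follows. This is a minimal, non-normative Python illustration, assuming the layout of Figure 3 (16-bit Type, 16-bit Length of 8, 32-bit Time in seconds, network byte order). The TLV type is still TBA, so PLACEHOLDER_TLV_TYPE is a made-up value; the function names and the default_wait and margin parameters are likewise hypothetical and only stand in for local policy.

```python
import struct
from typing import Optional

# ASSUMPTION: the real type code is "TBA"; 0xFFFF is a placeholder only.
PLACEHOLDER_TLV_TYPE = 0xFFFF
TLV_LENGTH = 8  # 4-octet Type/Length header plus 4-octet Time field


def pack_time_tlv(seconds: int) -> bytes:
    """Serialize the TLV as 16-bit Type, 16-bit Length, 32-bit Time."""
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("Time must fit in 32 bits")
    return struct.pack("!HHI", PLACEHOLDER_TLV_TYPE, TLV_LENGTH, seconds)


def unpack_time_tlv(data: bytes) -> int:
    """Return the Time value in seconds after checking Type and Length."""
    if len(data) != TLV_LENGTH:
        raise ValueError("unexpected TLV size")
    tlv_type, length, seconds = struct.unpack("!HHI", data)
    if tlv_type != PLACEHOLDER_TLV_TYPE or length != TLV_LENGTH:
        raise ValueError("not a predicted-failure time TLV")
    return seconds


def choose_t_wait(predicted_seconds: Optional[int],
                  default_wait: int = 300,
                  margin: int = 30) -> int:
    """Pick T_wait (hypothetical local policy).

    When the TLV is present, the result is strictly longer than the
    signaled Time; otherwise the configured default is used.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if predicted_seconds is None:
        return default_wait
    return max(default_wait, predicted_seconds + margin)
```

   For example, with a signaled Time of 600 seconds and a margin of 30 seconds, choose_t_wait returns 630, so the recovery LSP is kept beyond the predicted failure period before teardown is considered.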

   Note also that the value of T_wait is a local matter of the branch node, and is outside the scope of this document.

7.4. Priority and Resource Pre-emption

   It is possible that, after the recovery LSP is created and before the predicted failure becomes a real failure, another real failure happens on the LSP outside the protected segment. In this case, the source node (or an intermediate node upstream of the real failure) may start a restoration procedure to recover the LSP.

   For the same protected LSP, since recovering from a real failure always has higher priority than protecting against a predicted failure which has not happened yet, the restoration LSP can pre-empt the resources of the segment recovery LSP.

   As shown in Figure 4, assume that node B (branch node) was notified of a predicted failure event between N-4 and M (merge node), and has created the segment recovery LSP along B, N-1, N-2, N-3 and M. If another failure between S (source node) and B happens before the predicted failure becomes a real failure, node S will try to create the restoration LSP. Since resources are limited, the restoration LSP can pre-empt the resources of the segment recovery LSP between N-1 and N-3.

   The nodes along the segment recovery LSP have enough information to determine whether pre-emption is allowed. This is because these nodes know that:

   - The current segment recovery LSP is used for Proactive Segment Protection, through the A bit in the PROTECTION object;

   - The segment recovery LSP and the restoration LSP are protecting the same LSP, through the association relationship.
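
   The two conditions above can be expressed as a simple local check. The following Python fragment is only a sketch under stated assumptions: LspInfo, its fields, and may_preempt are hypothetical names, and association_id stands in for whatever identifier the association relationship provides. It is not a procedure defined by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LspInfo:
    # ASSUMPTION: all fields are illustrative, not protocol-defined names.
    association_id: int      # identifies the protected LSP (via association)
    proactive_segment: bool  # A bit of the PROTECTION object is set
    is_restoration: bool     # LSP was signaled to restore after a real failure


def may_preempt(requesting: LspInfo, holding: LspInfo) -> bool:
    """Return True if `requesting` may take resources held by `holding`.

    Pre-emption is allowed only when a restoration LSP asks for resources
    held by a proactive segment recovery LSP of the same protected LSP.
    """
    return (
        requesting.is_restoration
        and holding.proactive_segment
        and not holding.is_restoration
        and requesting.association_id == holding.association_id
    )


segment_recovery = LspInfo(association_id=7, proactive_segment=True,
                           is_restoration=False)
restoration = LspInfo(association_id=7, proactive_segment=False,
                      is_restoration=True)
unrelated = LspInfo(association_id=8, proactive_segment=False,
                    is_restoration=True)
```

   With these sample values, may_preempt(restoration, segment_recovery) is true, while a restoration LSP belonging to a different protected LSP (unrelated) is refused.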

                      |<------ Pre-emption ------>|
                      |                           |
     ***************************************************************
     *+---+         +---+         +---+         +---+         +---+*
     *|   +---------+N-1+---------+N-2+---------+N-3+---------+   |*
     *+-+-+         +-+-+         +---+         +-+-+         +-+-+*
     *  |             |###########################|             |  *
     *  |             |#                         #|             |  *
     *  |             |#                         #|             |  *
     *+-+-+         +-+-+         +---+         +-+-+         +-+-+*
   ***| S +----X----+ B +---------+N-4+----?----+ M +---------+ D |***
      +---+         +---+         +---+         +---+         +---+
   ===================================================================

   S: Source node     D: Destination node
   B: Branch node     M: Merge node
   X: Real failure    ?: Predicted failure (has not happened yet)
   =====: Protected LSP
   #####: Segment Recovery LSP
   *****: Restoration LSP

           Figure 4: Resource pre-emption by restoration LSP

8. Consideration of Backward Compatibility

   TBD.

   [Editor's note]: A description of interworking with legacy nodes which do not support failure prediction and reporting will be added.

9. Security Considerations

   TBD.

10. IANA Considerations

   IANA assigns values to RSVP protocol parameters. Within the current document, a new Error code/sub-code value is defined:

      Error Code = 25: "Notify Error" (see [RFC3209])

      o "Notify Error/LSP Local Predicted Failure" (TBA)

11. References

11.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

   [RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V., and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC 3209, December 2001.

   [RFC3473] Berger, L., Ed., "Generalized Multi-Protocol Label Switching (GMPLS) Signaling Resource ReserVation Protocol-Traffic Engineering (RSVP-TE) Extensions", RFC 3473, January 2003.

   [RFC4872] Lang, J., Ed., Rekhter, Y., Ed., and D. Papadimitriou, Ed., "RSVP-TE Extensions in Support of End-to-End Generalized Multi-Protocol Label Switching (GMPLS) Recovery", RFC 4872, May 2007.

   [RFC4873] Berger, L., Bryskin, I., Papadimitriou, D., and A. Farrel, "GMPLS Segment Recovery", RFC 4873, May 2007.

   [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

11.2. Informative References

   [RFC4426] Lang, J., Ed., Rajagopalan, B., Ed., and D. Papadimitriou, Ed., "Generalized Multi-Protocol Label Switching (GMPLS) Recovery Functional Specification", RFC 4426, March 2006.

12. Authors' Addresses

   Yi Lin
   Huawei Technologies
   F3 R&D Center, Huawei Industrial Base,
   Bantian, Longgang District, Shenzhen 518129
   P.R.China

   Email: yi.lin@huawei.com