CCAMP Working Group                                    T. Soumiya (Ed.)
Internet Draft                                Fujitsu Laboratories Ltd
Expires: December 2003                                  R. Rabbat (Ed.)
                                                Fujitsu Labs of America
                                                              June 2003


        Extensions to LMP for Flooding-based Fault Notification
              draft-soumiya-lmp-fault-notification-ext-01


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This draft describes extensions to the Link Management Protocol
   (LMP) for use in flooding-based fault notification in optical
   networks. We focus on networks that use a common control plane
   (e.g., Generalized Multi-Protocol Label Switching or GMPLS). These
   extensions implement the Fault Notification Protocol, a
   flooding-based approach to notifying nodes in the network of
   faults. We motivate the use of LMP extensions for flooding, define
   message formats, and explain the message exchanges that occur
   using LMP.

Soumiya (Ed.)            Expires - December 2003               [Page 1]

draft-soumiya-lmp-fault-notification-ext-01                   June 2003

Table of Contents

   1. Overview
      1.1 Terminology
      1.2 Glossary of Terms Used
   2. Flooding-based Fault Notification
   3. Fault Recovery Scenario
   4. Additional LMP Message Formats
      4.1 FaultNotify Message (Msg Type = TBD)
      4.2 FaultNotifyAck Message (Msg Type = TBD)
   5. Additional LMP Object Definitions
      5.1 TTL Class (Class = TBD)
      5.2 FAULT_ID Class (Class = TBD)
   6. Priority-Based Recovery
   7. Implementation Considerations
   8. Security Considerations
   9. Conclusion
   10. Intellectual Property Considerations
   11. References
   12. Acknowledgments
   13. Editors' Addresses
   14. Contributing Authors
   15. Full Copyright Statement

1. Overview

   This draft describes extensions to the Link Management Protocol
   (LMP) to implement the flooding-based Fault Notification Protocol
   in optical networks. We make the case in [2] for a flooding-based
   approach to the notification phase because it has the potential to
   offer speed and flexibility advantages over using RSVP-TE (or
   CR-LDP) signaling for notification. The focus in this draft is to
   extend LMP to include fault notification; we argue that this is a
   good implementation of the protocol because fault management is
   already one of the features of LMP and we can make use of many of
   its protocol objects in implementing our scheme.

   Currently, there are several Internet Drafts related to recovery in
   networks featuring a GMPLS control plane.
   They cover the topics of terminology [3], requirements [4],
   functional specification [5], and mechanisms analysis [6]. The
   requirements for control plane-based recovery were found to fall
   into four main categories:

   o  Meeting timing requirements
   o  Efficient usage of data plane resources
   o  Efficient usage of control plane resources
   o  Supporting flexible design of recovery schemes

   Reference [4] discusses how controlled flooding meets the
   requirements on fault notification mechanisms and has beneficial
   side effects compared with notification via GMPLS signaling.
   Flooding-based notification is also appropriate for shared
   mesh-based (M:N) recovery schemes that are promoted for their
   resource efficiency and flexibility.

   Generic mechanisms for implementing a Fault Notification Protocol
   are proposed in [2]. In this draft, we describe the implementation
   of the protocol and provide the necessary extensions to LMP message
   formats and data object definitions.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [7].

1.2 Glossary of Terms Used

   In addition to the terminology for GMPLS-based recovery that is
   documented in [3], this draft uses the following acronyms:

   o  GMPLS: Generalized Multi-Protocol Label Switching [8]
   o  LMP: Link Management Protocol [9]
   o  LSP: Label Switched Path
   o  NMS: Network Management System
   o  RSVP-TE: Resource Reservation Protocol-Traffic Engineering [10]

2. Flooding-based Fault Notification

   In a flooding-based fault notification approach, we have two
   choices for implementation. One is to use IGP flooding (e.g., OSPF
   or IS-IS), while the other is an implementation specific to fault
   notification.
   In the case of IGP flooding, timing requirements may not be
   satisfied because IGP flooding uses a pacing timer. The latter
   option is therefore more suitable for notifying nodes in a network
   of a fault.

   LMP is one such candidate for implementing flooding-based fault
   notification since it already includes fault management functions
   such as fault localization. We believe that LMP can therefore
   detect a fault very quickly.

   Fault notification using LMP is done by the control plane, which
   controls data plane nodes such as Optical Cross Connects (OXCs).
   LMP flooding is done in the same way as IGP flooding. Where LMP
   extensions are used, notification may be done through the control
   channel, which is set up using the control channel management
   function of LMP.

3. Fault Recovery Scenario

   In this section, a fault recovery scenario is described based on
   shared mesh recovery. Every node maintains adjacency with each of
   its neighbors via at least one LMP control channel. A pre-planned
   recovery path table is configured using extended GMPLS signaling
   messages or through a Network Management System (NMS). The recovery
   path table lists the recovery paths that a node is responsible for,
   together with information on the resources that each recovery path
   uses and the associated fault node/link IDs. For example, an entry
   in the recovery path table may consist of fault node ID, fault link
   ID, input port, input label (i.e., lambda), output port, and output
   label.

   When a failure occurs, the following procedure is carried out:

   1. A downstream node close to the failure detects it. This node is
      called the detecting node. If the path is bi-directional, an
      upstream node also detects it. The detecting node should report
      the detection of the failure (and so becomes the reporting
      node).

   2. The reporting node sends unicast FaultNotify messages (defined
      in section 4.1) to all its immediate neighbor nodes. The node
      keeps sending unicast FaultNotify messages periodically to each
      of its neighbors until it receives FaultNotifyAck messages (as
      defined in section 4.2) from its neighbors, or until a retry
      timer expires. A FaultNotify message contains the node ID of the
      reporting node, the link ID of the failed link, and a sequence
      number. The message may optionally contain a TTL, failed
      wavelength ID, or failed SRLG ID.

   3. A neighbor node receives a FaultNotify message with failure data
      and sequence number.

   4. The receiving node checks that it has not yet received the
      message about this failure from another neighboring node. For
      this purpose, it searches a database indexed on the failure data
      and sequence number. The database stores the failure data and
      sequence numbers from the received messages.

      a. If the receiving node has not yet received the message about
         this failure, it adds the failure data and sequence number
         into the database. The node then sends unicast FaultNotify
         messages to all its neighbors, except the node that sent it
         the message.

      b. If it has already received the message about this failure, go
         to step 6.

   5. The receiving node possibly activates one or more recovery paths
      according to a pre-planned recovery table based on the failure
      data.

   6. The receiving node sends back a FaultNotifyAck message to the
      node that sent the FaultNotify message.

   [Optional]: If the receiving node has activated one or more
   recovery paths, it sends a RecoveryCompleteNotify message to either
   the egress nodes of the recovery LSPs or to the NMS. It continues
   sending RecoveryCompleteNotify messages periodically until it
   receives a RecoveryCompleteNotifyAck message or a retry timer
   expires.
   [Optional]: The node at the egress of the protection path, or the
   NMS, sends back a RecoveryCompleteNotifyAck message to the
   originator of the RecoveryCompleteNotify message.

      [R1]---[R2]---[R3]---[R4]
       \                     /
        [R5]-------------[R6]
       /    \           /    \
      [R7]---[R8]---[R9]---[R10]

      Working LSP1:  [R1->R2->R3->R4]
      Working LSP2:  [R7->R8->R9->R10]
      Recovery LSP1: [R1->R5->R6->R4]
      Recovery LSP2: [R7->R5->R6->R10]

                  Figure 1: Shared Mesh Recovery

   Figure 1 shows an example of shared mesh-based recovery. One
   working path, W-LSP1, runs from R1 to R4 via R2 and R3, and another
   working path, W-LSP2, runs from R7 to R10 via R8 and R9. R1 can
   provide user traffic protection by creating a backup LSP that
   merges with the working LSP at R4. We refer to the LSP R-LSP1
   [R1->R5->R6->R4] as the recovery LSP of W-LSP1. In the same manner,
   we refer to the LSP R-LSP2 [R7->R5->R6->R10] as the recovery LSP of
   W-LSP2.

   In this situation, if it can be assumed that multiple failures do
   not occur at the same time, the resources for recovering the
   working LSPs can be shared. In other words, the link between R5 and
   R6 can be shared between R-LSP1 and R-LSP2. By setting up these
   recovery LSPs, the spare/working capacity ratio in the network can
   be reduced, thereby increasing network efficiency.

   When a failure occurs on the link between R8 and R9 on the working
   LSP W-LSP2, the endpoint nodes of the link (both R8 and R9) detect
   the failure (if the link is bi-directional). The detecting nodes
   then start sending FaultNotify messages. Node R9 sends these
   messages to R6 and R10. When R6 and R10 receive FaultNotify
   messages, they each send back a FaultNotifyAck message to the
   sending node R9. In the same manner, they send FaultNotify messages
   to their immediate neighbors. The FaultNotify message includes
   information regarding the failure, such as the FAULT_ID.
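
   As an informal illustration of steps 2 through 6 of the procedure,
   the following Python sketch floods a single fault report over the
   topology of Figure 1. It is not part of the protocol definition.
   The adjacency table is our reading of Figure 1, and the function
   name, the fault key tuple, and the choice not to send over the
   failed link itself are assumptions made for illustration.
   Retransmission timers, the optional TTL, and recovery path
   activation are omitted.

```python
from collections import deque

# Adjacency as read from Figure 1 (an assumption about the figure layout).
TOPOLOGY = {
    "R1": ["R2", "R5"],
    "R2": ["R1", "R3"],
    "R3": ["R2", "R4"],
    "R4": ["R3", "R6"],
    "R5": ["R1", "R6", "R7", "R8"],
    "R6": ["R4", "R5", "R9", "R10"],
    "R7": ["R5", "R8"],
    "R8": ["R5", "R7", "R9"],
    "R9": ["R6", "R8", "R10"],
    "R10": ["R6", "R9"],
}


def flood_fault(topology, reporter, failed_link, seq):
    """Simulate controlled flooding of one fault report.

    Returns (notified_nodes, notify_count, ack_count). Hypothetical
    helper: every message is assumed to be delivered on the first try.
    """
    fault_key = (reporter, frozenset(failed_link), seq)
    # Per-node database of failures already seen (step 4).
    database = {node: set() for node in topology}
    database[reporter].add(fault_key)

    def usable(a, b):
        # Assumption: nothing is sent across the failed link itself.
        return frozenset((a, b)) != frozenset(failed_link)

    # Step 2: the reporting node notifies all immediate neighbors.
    pending = deque(
        (reporter, nbr) for nbr in topology[reporter] if usable(reporter, nbr)
    )
    notify_count = ack_count = 0
    while pending:
        sender, receiver = pending.popleft()
        notify_count += 1  # step 3: receiver gets a FaultNotify
        if fault_key not in database[receiver]:
            # Step 4a: record the failure and forward to every neighbor
            # except the one the message came from.
            database[receiver].add(fault_key)
            for nbr in topology[receiver]:
                if nbr != sender and usable(receiver, nbr):
                    pending.append((receiver, nbr))
        # Step 6: acknowledge, whether or not the message was a duplicate.
        ack_count += 1
    notified = {node for node, seen in database.items() if fault_key in seen}
    return notified, notify_count, ack_count


if __name__ == "__main__":
    nodes, notifies, acks = flood_fault(TOPOLOGY, "R9", ("R8", "R9"), seq=1)
    print(sorted(nodes), notifies, acks)
```

   In this sketch every node records the failure exactly once, and
   duplicate copies are acknowledged but not forwarded, which is what
   bounds the number of messages.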
   When nodes on the recovery LSPs receive the FaultNotify message,
   they activate the recovery path. In this example, R-LSP2 is
   activated and R7 switches W-LSP2 traffic to be carried by R-LSP2.
   R10, in turn, merges R-LSP2 back into the original route. As a
   result, traffic follows the path [R7->R5->R6->R10].

4. Additional LMP Message Formats

   LMP is a good candidate protocol to use and extend for purposes of
   fault notification. Two messages (FaultNotify and FaultNotifyAck)
   need to be defined. Most of the necessary data objects are already
   defined in LMP [9].

4.1 FaultNotify Message (Msg Type = TBD)

   ::= [] { [] ...}

   or

   ::= [ ...]

4.2 FaultNotifyAck Message (Msg Type = TBD)

   ::=

   The contents of the MESSAGE_ID_ACK object MUST be obtained from the
   FaultNotify message being acknowledged.

5. Additional LMP Object Definitions

   The formats for the Common Header at the beginning of LMP messages
   and the LMP objects used to build the messages are defined in [9].
   That document also defines the MESSAGE_ID, MESSAGE_ID_ACK,
   LOCAL_NODE_ID, and CHANNEL_STATUS data objects used in our extended
   messages. The SRLG_ID data object is defined in [11]. This leaves
   us to define data objects for TTL and FAULT_ID.

5.1 TTL Class (Class = TBD)

   o  C-Type = 1, Time to Live (= Hop Count)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      TTL      |                   Reserved                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   TTL: 8 bits

      This is an unsigned integer to indicate the remaining hop count
      value. A node receiving a FaultNotify message having a TTL of
      zero MUST silently discard the message.

   This object is non-negotiable.
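
   As an informal illustration, the 32-bit contents of the TTL object
   could be packed and parsed as in the Python sketch below. The
   function names are hypothetical. Only the object contents are
   handled, since the Class value is still TBD and the object header
   is defined in [9]. Writing the Reserved bits as zero and ignoring
   them on receipt is an assumption, not something this draft
   specifies.

```python
import struct

# "!B3x": network byte order, one unsigned byte (TTL), three pad bytes
# standing in for the 24 Reserved bits.
_TTL_FORMAT = "!B3x"


def pack_ttl_contents(ttl: int) -> bytes:
    """Build the 4-byte TTL object contents (Reserved bits set to zero)."""
    if not 0 <= ttl <= 0xFF:
        raise ValueError("TTL must fit in 8 bits")
    return struct.pack(_TTL_FORMAT, ttl)


def unpack_ttl_contents(contents: bytes) -> int:
    """Extract the TTL value; Reserved bits are ignored (assumption)."""
    if len(contents) != struct.calcsize(_TTL_FORMAT):
        raise ValueError("TTL object contents must be 4 bytes")
    (ttl,) = struct.unpack(_TTL_FORMAT, contents)
    return ttl


def must_discard(ttl: int) -> bool:
    """A FaultNotify message carrying a TTL of zero is silently discarded."""
    return ttl == 0


if __name__ == "__main__":
    contents = pack_ttl_contents(5)
    print(contents.hex(), unpack_ttl_contents(contents), must_discard(0))
```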
5.2 FAULT_ID Class (Class = TBD)

   o  C-Type = 1, Failure Identifier

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Reserved            |            FaultId            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   FaultId: 16 bits

      This MUST be a node-wide unique unsigned integer. The FaultId
      identifies the sequence of failures. A node increases the value
      when it detects a failure.

   This object is non-negotiable.

6. Priority-Based Recovery

   Fault recovery schemes typically assume single failure events.
   However, multiple failures may occur within a short time interval.
   Protection against all occurrences of such failure scenarios
   requires large amounts of spare capacity. Ideally, the network
   should at least recover some of the working paths in this
   situation.

   For example, consider Figure 2, in which two failures occur at the
   same time. One failure is a link failure between R3 and R4, and the
   other is a link failure between R7 and R8. In this example, R4
   detects a failure and sends a fault notification message to R6. At
   almost the same time, R8 detects a failure and sends a message to
   R5. At R6, a recovery path switches traffic from R6 to R4 because
   the fault notification message is for W-LSP1. On the other hand, at
   R5, a recovery path switches traffic from R5 to R6 because the
   fault notification message is for W-LSP2. As a result, an invalid
   recovery path is set up that follows [R7->R5->R6->R4].

      [R1]---[R2]---[R3]-X-[R4]
       \                     /
        [R5]-------------[R6]
       /    \           /    \
      [R7]-X-[R8]---[R9]---[R10]

      Working LSP1:  [R1->R2->R3->R4]
      Working LSP2:  [R7->R8->R9->R10]
      Recovery LSP1: [R1->R5->R6->R4]
      Recovery LSP2: [R7->R5->R6->R10]

                Figure 2: Multiple failure scenario.

   Priority-based control is an effective solution for recovering
   specific working paths when multiple failures occur.
   In the above example, if the priority of W-LSP1 is higher than that
   of W-LSP2, then the fault notification messages for W-LSP1 are
   preferred. In other words, the system checks the priority of the
   protection path and changes the priority setting. In that case,
   switching traffic from R6 to R4 takes precedence over switching
   traffic from R6 to R10. By adopting priority-based control, the
   invalid recovery path described above can be avoided, and the
   high-priority recovery path is activated. Priority in general
   should be set according to a network operator's policy and/or
   network service.

7. Implementation Considerations

   Since we are proposing LMP extensions, it is important to briefly
   discuss their implementation. First, we note that several different
   implementations of LMP itself are possible, depending on the
   specific system architecture and the design tradeoffs made in
   implementing it. The LMP specification does not require a specific
   protocol implementation, and both centralized (for example, in the
   control plane) and distributed (for example, different LMP stacks
   running on different line cards) implementations are possible. In
   fact, a hybrid implementation, where some LMP functions are
   distributed while others are centralized, is also possible; the
   choice is the system designer's decision.

   In that respect, one should understand the design decision's impact
   on the implementation of the flooding method. For a centralized
   control plane-based LMP implementation, no further changes need to
   be made, besides the implementation of the steps highlighted in
   section 3. For a fully distributed implementation of LMP, some
   amount of message passing has to take place to communicate fault
   information between LMP stacks. One implementation, for example,
   could be to have the control plane execute the steps of section 3.
   An LMP stack, upon receiving any of the messages defined in section
   4, forwards it to the Fault Notification Protocol process in the
   control plane. After the message has been processed in the control
   plane, and if an LMP message has to be generated, either the
   message is generated in the control plane and sent to the relevant
   LMP process, which transmits it to the correct destination, or a
   request is sent to the relevant LMP process to generate and
   transmit the appropriate LMP message.

8. Security Considerations

   Security requirements depend on the level of trust between nodes
   that exchange fault notification messages. In general, security
   requirements may be more relaxed when the nodes in an optical
   network are in the same administrative domain than when they
   communicate with nodes in a different administrative domain.

   When a fault notification mechanism is implemented based on LMP,
   the security mechanisms of LMP can be adopted. All LMP messages
   should be sent over an IPsec channel that is either pre-established
   or set up on an as-needed basis. Note, however, that the Fault
   Notification Protocol itself introduces no new security
   considerations.

9. Conclusion

   This draft describes extensions to the Link Management Protocol
   (LMP) for use in fault notification in optical networks. We
   presented the motivation for this work and the operation of the
   implementation. We also presented some descriptive scenarios
   showing how controlled flooding of fault notifications operates.

10. Intellectual Property Considerations

   This section is taken from Section 10.4 of RFC 2026 [1].
   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances
   of licenses to be made available, or the result of an attempt made
   to obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification
   can be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights, which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.

11. References

   [1]  Bradner, S., "The Internet Standards Process -- Revision 3",
        BCP 9, RFC 2026, October 1996.

   [2]  Rabbat, R., and V. Sharma (Eds.), "Fault Notification Protocol
        for GMPLS-Based Recovery", Internet Draft, work in progress,
        draft-rabbat-fault-notification-protocol-03.txt, June 2003.

   [3]  Mannie, E. and D. Papadimitriou (Eds.), "Recovery (Protection
        and Restoration) Terminology for GMPLS", Internet Draft, work
        in progress,
        draft-ietf-ccamp-gmpls-recovery-terminology-02.txt, February
        2003.

   [4]  Rabbat, R. and T. Soumiya (Eds.), "Optical network failure
        recovery requirements", Internet Draft, work in progress,
        draft-rabbat-optical-recovery-reqs-00.txt, June 2003.

   [5]  Lang, J. P. and B. Rajagopalan (Eds.), "Generalized MPLS
        Recovery Functional Specification", Internet Draft, work in
        progress, draft-ietf-ccamp-gmpls-recovery-functional-01.txt,
        May 2003.

   [6]  Papadimitriou, D. and E. Mannie (Eds.), "Analysis of
        Generalized MPLS-based Recovery Mechanisms (including
        Protection and Restoration)", Internet Draft, work in
        progress, draft-ietf-ccamp-gmpls-recovery-analysis-01.txt, May
        2003.

   [7]  Bradner, S., "Key words for use in RFCs to Indicate
        Requirement Levels", BCP 14, RFC 2119, March 1997.

   [8]  Mannie, E. (Ed.), "Generalized Multi-Protocol Label Switching
        (GMPLS) Architecture", Internet Draft, work in progress,
        draft-ietf-ccamp-gmpls-architecture-07.txt, May 2003.

   [9]  Lang, J. (Ed.), "Link Management Protocol (LMP)", Internet
        Draft, work in progress, draft-ietf-ccamp-lmp-09.txt, June
        2003.

   [10] Berger, L. (Ed.), "Generalized Multi-Protocol Label Switching
        (GMPLS) Signaling - Resource ReserVation Protocol-Traffic
        Engineering (RSVP-TE) Extensions", RFC 3473, January 2003.

   [11] Fredette, A., and J. Lang (Eds.), "Link Management Protocol
        (LMP) for Dense Wavelength Division Multiplexing (DWDM)
        Optical Line Systems", Internet Draft, work in progress,
        draft-ietf-ccamp-lmp-wdm-02.txt, March 2003.

12. Acknowledgments

   The authors would like to thank Ching-Fong Su and Takafumi Chujo of
   Fujitsu Labs of America, and Masafumi Katoh and Akira Chugo of
   Fujitsu Laboratories, Ltd.

13. Editors' Addresses

   Toshio Soumiya                       Richard Rabbat, Ph.D.
   Fujitsu Laboratories Ltd.            Fujitsu Labs of America, Inc.
   1-1, Kamikodanaka 4-Chome            1240 E. Arques Ave, MS 345
   Nakahara-ku, Kawasaki                Sunnyvale, CA 94085
   211-8588, Japan                      United States of America
   Phone: +81-44-754-2765               Phone: +1-408-530-4537
   Email: soumiya.toshio@jp.fujitsu.com Email: rabbat@alum.mit.edu

14. Contributing Authors

   Takeo Hamada                         Shinya Kanoh
   Fujitsu Labs of America, Inc.        Fujitsu Laboratories Ltd.
   1240 E. Arques Ave, MS 345           1-1, Kamikodanaka 4-Chome
   Sunnyvale, CA 94085                  Nakahara-ku, Kawasaki
   United States of America             211-8588, Japan
   Phone: +1-408-530-4516               Phone: +81-44-754-2765
   Email: thamada@fla.fujitsu.com       Email: kanoh@jp.fujitsu.com

   Vishal Sharma
   Metanoia, Inc.
   1600 Villa Street, Unit 352
   Mtn. View, CA 94041
   United States of America
   Phone: +1-650-386-6723
   Email: v.sharma@ieee.org

15. Full Copyright Statement

   "Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."