CCAMP Working Group                                     R. Rabbat (Ed.)
Internet Draft                                  Fujitsu Labs of America
Expires: December 2003                             Toshio Soumiya (Ed.)
                                               Fujitsu Laboratories Ltd
                                                              June 2003

             Optical Network Failure Recovery Requirements
               draft-rabbat-optical-recovery-reqs-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This draft presents requirements for control plane-based recovery in
   optical networks. We focus on data-plane failure recovery in optical
   networks that use a GMPLS-based control plane, but use different
   transport plane technologies. Our goal is to gather and
   systematically lay out these requirements in one document to serve
   as a coherent basis for work on protocol and solution development.
   We begin with a brief overview and consideration of the
   requirements, and then list the requirements.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [2].

Rabbat & Soumiya (Eds.)    Expires - December 2003             [Page 1]
draft-rabbat-optical-recovery-reqs-00.txt                     June 2003

Table of Contents

   1. Introduction
   2. Glossary of Terms Used
   3. Failure Recovery Requirements
      3.1 Overview of Recovery Requirements
      3.2 Shared Mesh-based Recovery
      3.3 Failure Notification Mechanisms
      3.4 Optical Network Failure Recovery Requirements
   4. Security Considerations
   5. Conclusions
   6. Intellectual Property Considerations
   7. References
   8. Acknowledgments
   9. Editors' Address
   10. Authors' Addresses
   Full Copyright Statement

1. Introduction

   This draft describes requirements for control plane-based recovery
   from data plane failures in optical networks. We focus on optical
   networks that use a Generalized Multi-Protocol Label Switching
   (GMPLS)-based [3] control plane and various transport plane
   technologies. Service recovery from failures, using either a
   protection or restoration scheme, is an important feature of these
   networks to ensure high reliability and uninterrupted service.

   Protection and restoration algorithms may be used for local repair
   (around failed spans or nodes) or edge-to-edge recovery of an LSP.
   Shared mesh-based recovery is desirable to reduce spare capacity
   requirements and enable flexible service recovery scenarios.

   While edge-to-edge based recovery has the potential to be more
   efficient, it also entails the potentially lengthy delay incurred in
   notifying all nodes along the recovery path of the failure of a
   remote resource on the working path.
   For some applications, recovery paths must be chosen carefully to
   meet strict recovery time requirements (e.g., in the range of a few
   tens to a few hundred ms).

   Several Internet Drafts in the Sub-IP Area currently relate to
   recovery in GMPLS networks. They cover terminology [4], functional
   specifications [5] and analysis [6] for recovery in GMPLS-based
   networks, and survivability requirements and considerations for
   traffic engineered or hierarchical networks [7,8]. As a set, these
   documents provide their readers with detailed descriptions of the
   concepts and mechanisms used in network recovery. The list of
   requirements for control plane-based recovery has not, however, been
   specifically detailed in any one document; doing so is the objective
   of this draft.

2. Glossary of Terms Used

   The following acronyms are used in this document:

   o LSP: Label Switched Path
   o RSVP-TE: Resource Reservation Protocol-Traffic Eng. [9]

   The terminology for GMPLS-based recovery is documented in [4]. We
   use the following terms from that document:

   o Detecting Entity (Failure Detection).
   o Reporting Entity (Failure Correlation and Notification).
   o Deciding Entity (part of the failure recovery decision process).
   o Recovery Entity (part of the failure recovery activation process).
   o Bridge, which could be a Permanent Bridge, Broadcast Bridge, or
     Selector Bridge.
   o Selector, which could be a "Selective Selector" or a "Merging
     Selector".
   o Recovery phases:
     1. Failure Detection,
     2. Failure Localization and Isolation,
     3. Failure Notification,
     4. Recovery (Protection or Restoration),
     5. Reversion (Normalization)

3. Failure Recovery Requirements

   Even though some requirements for fault recovery have been discussed
   in working groups of the Sub-IP area, several additional aspects
   should be examined and mentioned regarding recovery in optical
   networks.
   In this section, we describe the fault recovery requirements that we
   see. For purposes of completeness, we have attempted to gather
   together requirements that have appeared in an isolated way in other
   drafts.

3.1 Overview of Recovery Requirements

   This subsection summarizes the survivability requirements for
   optical networks. Further details on the requirements are provided
   in the subsequent subsections.

   The following classes (types) of recovery are required for span, LSP
   segment, and LSP recovery:

   o Protection
     - pre-computed route and pre-established (i.e., cross-connected)
       resources
   o Restoration
     - pre-computed route and on-demand establishment of resources
     - on-demand route and on-demand establishment of resources

   A recovery scheme uses either protection or restoration (or both),
   together with failure detection and notification mechanisms.

   Depending on the service specification, the timing bounds for the
   recovery schemes may range from 50 ms (e.g., to repair services
   carrying video or voice data) to less strict bounds of, say, several
   hundred ms (for low-priority data).

   For multi-layered networks, hold-off timers are required to allow
   recovery at lower layers. Escalation to higher layers should be
   possible when necessary. Support for horizontal hierarchy must also
   be included, because large networks are usually segmented [7].

   In general, recovery schemes must operate in a stable and
   cooperative manner to maximize the network's reliability and
   availability. Recovery schemes should also be resource efficient and
   flexible with respect to types of failures, service classes, and the
   network operators' policies that they work with.

   As has been identified by the P&R Design Team [4], a critical
   component in guaranteeing the time constraints for service recovery
   is the Failure Notification phase.
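
   The hold-off and escalation behaviour for multi-layered networks
   described in this subsection can be illustrated with a minimal
   sketch. It is not part of any protocol: the Layer type, the
   recovering_layer function, and all timer values are hypothetical
   and chosen only for illustration. The sketch assumes each layer
   starts its own recovery when its hold-off timer expires, unless a
   lower layer has already repaired the failure by then.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Layer:
    """One network layer (hypothetical model, for illustration only)."""
    name: str                   # label for the layer, e.g. "optical"
    hold_off_ms: int            # wait after failure before this layer acts
    recovery_ms: Optional[int]  # time the layer needs to recover, None if it cannot


def recovering_layer(layers: List[Layer]) -> Tuple[Optional[str], Optional[int]]:
    """Return (layer name, ms after failure) for the layer that restores service.

    Layers are visited in order of hold-off expiry. A layer whose timer
    expires after the failure has already been repaired takes no action.
    """
    cleared_at: Optional[int] = None
    winner: Optional[str] = None
    for layer in sorted(layers, key=lambda l: l.hold_off_ms):
        if cleared_at is not None and cleared_at <= layer.hold_off_ms:
            break  # already repaired before this hold-off timer expired
        if layer.recovery_ms is None:
            continue  # this layer has no recovery capability: escalate
        done = layer.hold_off_ms + layer.recovery_ms
        if cleared_at is None or done < cleared_at:
            cleared_at, winner = done, layer.name
    return winner, cleared_at


# Assumed values: the optical layer acts at once and needs 50 ms, while
# the higher layer holds off for 100 ms and then needs 200 ms.
print(recovering_layer([Layer("optical", 0, 50), Layer("mpls", 100, 200)]))
# Escalation: the optical layer cannot recover, so the higher layer does.
print(recovering_layer([Layer("optical", 0, None), Layer("mpls", 100, 200)]))
```

   Under these assumed values the first call reports recovery by the
   optical layer 50 ms after the failure, and the second reports
   escalation to the higher layer, which restores service at 300 ms.
   The sketch also suggests why a hold-off timer is normally chosen
   to be longer than the lower layer's expected recovery time.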

3.2 Shared Mesh-based Recovery

   Today's Synchronous Optical Network / Synchronous Digital Hierarchy
   (SONET/SDH) networks use recovery techniques based on linear and
   ring topologies. Linear protection may include 1+1 and 1:N
   protection, while ring protection usually involves uni-directional
   path switched ring (UPSR) and bi-directional line switched ring
   (BLSR) protection. Linear 1+1 protection and ring-based protection
   both require 100% redundancy in spare resources for every working
   path. Even with 1:N based link protection, it may be difficult to
   select different routes flexibly. Therefore, shared mesh-based
   recovery appears to be a flexible and efficient option for optical
   network recovery.

   Shared mesh recovery allows for the possibility of sharing recovery
   capacity among multiple working paths. This increases flexibility by
   allowing for more options when routing both working and recovery
   paths. Furthermore, this flexibility facilitates fast recovery
   because the shared mesh provides for a greater number of
   suitable/feasible intermediate nodes for routing the recovery paths.

3.3 Failure Notification Mechanisms

   In general, there are two alternatives for control plane based
   failure notification:

   o Failure notification messages dispatched using GMPLS signaling
   o Controlled flooding of failure notification messages

   The GMPLS signaling protocol, RSVP-TE [9], supports notification
   using a Notify message. Under this scheme, the deciding entity
   pre-arranges to receive the notifications by sending a Notify
   Request object in the Path or Resv messages. The recovery process
   requires two or three phases. The reporting entity first sends
   notification of the failure to the deciding entity. The deciding
   entity then begins a one- or two-phase signaling process on the
   recovery LSP (which requires either signaling down the recovery LSP
   or signaling down and back).
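
   The two- or three-phase process above can be made concrete with a
   small sketch. It is only a rough model under stated assumptions:
   the function names are hypothetical, and the uniform per-hop delay
   is an assumption introduced here, not something RSVP-TE defines.
   The sketch lists the phases and estimates the signaling delay as
   the number of hops traversed multiplied by the per-hop delay.

```python
from typing import List


def recovery_phases(round_trip: bool) -> List[str]:
    """List the phases of notification-based recovery (illustrative)."""
    phases = [
        "notify: reporting entity -> deciding entity",
        "signal down the recovery LSP",
    ]
    if round_trip:
        # Optional third phase: signaling comes back up the recovery LSP.
        phases.append("signal back along the recovery LSP")
    return phases


def signaling_delay_ms(notify_hops: int, recovery_hops: int,
                       per_hop_ms: int, round_trip: bool) -> int:
    """Estimate delay assuming every hop costs per_hop_ms (an assumption).

    notify_hops:   hops from the reporting entity to the deciding entity
    recovery_hops: hops along the recovery LSP
    """
    passes = 2 if round_trip else 1  # down only, or down and back
    return (notify_hops + passes * recovery_hops) * per_hop_ms


# Assumed example: 2 hops for the Notify, 3 hops on the recovery LSP,
# and 5 ms per hop.
print(len(recovery_phases(False)), signaling_delay_ms(2, 3, 5, False))  # 2 25
print(len(recovery_phases(True)), signaling_delay_ms(2, 3, 5, True))    # 3 40
```

   With these assumed numbers the three-phase variant adds one more
   pass over the recovery LSP, which is the cost a recovery scheme
   with strict time bounds has to budget for.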

   The controlled flooding of failure notification messages on the
   control plane is another alternative for failure notification.
   Flooding the notifications in one shot to an appropriate portion of
   the network ensures their timely delivery. This supports recovery
   schemes that require policy or priority-based decisions to be made
   at multiple decision entities distributed within the network, off
   the working path.

                         +---+
                    .....| E |..............
                    :    +---+             :
                    :                      :
         +---+    +---+   \ /   +---+    +---+
      ===| A |====| B |====X====| C |====| D |===
         +---+    +---+   / \   +---+    +---+
           :                               :
           :      +---+         +---+      :
           :......| F |.........| G |......:
                  +---+         +---+

   Figure 2. Multiple (partial) recovery paths protecting against the
   failure of link BC.

   To meet the time constraints for recovery, both the failure
   correlation/aggregation time (that is, the time spent on the
   computations performed at the reporting entity) and the failure
   notification time (the time that elapses before all entities
   involved in the recovery have received a failure notification
   signal) must be minimized. Flooding allows messages to take the
   shortest available paths to these entities.

   Figure 2 above shows a network in which a failure occurs on link
   BC. The working LSPs follow the route ABCD, and two (dotted)
   recovery paths have been reserved, but not activated. Recovery paths
   BED and AFGD are each responsible for recovering a portion of the
   working capacity on link BC. In this case, nodes A, B, D, E, F, and
   G must all receive a notification of the failure and perform
   reconfiguration actions.

3.4 Optical Network Failure Recovery Requirements

   o Requirements on the efficiency of working and recovery bandwidth

   (1) A recovery scheme SHOULD allow efficient use of working LSP
       bandwidth using such measures as route optimization, taking
       into account route dependencies between a working path and its
       recovery path.

   (2) A recovery scheme SHOULD allow efficient use of recovery LSP
       bandwidth using such measures as route optimization, taking
       into account route dependencies between a working path and its
       recovery path.

   (3) A recovery scheme SHOULD, when possible, allow sharing of
       recovery bandwidth among multiple recovery paths to enable
       efficient use of recovery bandwidth.

   o Requirements on recovery actions

   (4) A recovery scheme SHOULD allow suppression of fault notification
       messages, so that spurious fault notification messages and
       recovery action messages are not broadcast within the network,
       ensuring scalability of the fault recovery mechanism.

   (5) A recovery scheme SHOULD ensure reliable transmission of fault
       recovery messages, provided the control plane is connected.

   (6) A recovery scheme SHOULD allow fallback operations of its
       recovery actions. For example, when the system encounters a
       fault class (e.g., multiple simultaneous failures) that was not
       anticipated, the system should execute a best-effort recovery,
       such that as many working paths as possible are restored under
       the circumstances.

   (7) A recovery scheme SHOULD allow the network operator to choose
       whether or not the reversion actions are to be performed.

   (8) A recovery scheme SHOULD support recovery within bounded time
       constraints and MAY be compliant with generally used recovery
       times like 50 ms for SONET/SDH protection.

   (9) A recovery scheme SHOULD allow testing and verification of the
       availability of the recovery path before its actual use. This
       testing may occur when the recovery path is provisioned, or
       after it is provisioned but before actual recovery action
       occurs.

   (10) A recovery scheme SHOULD guarantee that recovery actions
        correctly deliver traffic from working paths to the respective
        recovery paths, such that the recovery actions do not result in
        any unintended connections or unintended diversion of traffic.

   o Requirements on recovery schemes

   (11) A recovery scheme SHOULD provide mechanisms that can be used to
        support generally used recovery schemes such as 1+1, 1:1, 1:N,
        M:N, and unprotected.

   (12) A recovery scheme SHOULD support priority-based recovery of
        failed LSPs. This means that recovery should be ordered
        according to each LSP's recovery priority.

   o Requirements on recovery priority of service classes

   (13) A recovery scheme SHOULD take into consideration the recovery
        priority of LSPs.

   (14) A recovery scheme SHOULD allow support of service classes with
        different recovery time guarantees.

   o Requirements on recovery granularity

   (15) A recovery scheme SHOULD allow recovery of traffic on an
        aggregated basis, for scalability.

   o Requirements on failure notification delivery

   (16) A recovery scheme SHOULD have a failure notification mechanism
        that guarantees prompt and reliable delivery of notification of
        data plane faults to a deciding entity in charge of recovering
        the fault.

4. Security Considerations

   This draft does not introduce any new security issues.

5. Conclusions

   This draft described requirements for control plane-based recovery
   from data plane failures in optical networks. We identified three
   important requirements: enabling flexible recovery schemes,
   facilitating the efficient use of resources, and meeting
   potentially strict recovery times.

6. Intellectual Property Considerations

   This section is taken from Section 10.4 of RFC2026 [1].

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances
   of licenses to be made available, or the result of an attempt made
   to obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification
   can be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights, which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.

7. References

   [1]  Bradner, S., "The Internet Standards Process -- Revision 3",
        BCP 9, RFC 2026, October 1996.

   [2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [3]  Mannie, E. (Ed.), "Generalized Multi-Protocol Label Switching
        (GMPLS) Architecture", Internet Draft, work in progress,
        draft-ietf-ccamp-gmpls-architecture-03.txt, August 2002.

   [4]  Mannie, E. and D. Papadimitriou (Eds.), "Recovery (Protection
        and Restoration) Terminology for GMPLS", Internet Draft, work
        in progress,
        draft-ietf-ccamp-gmpls-recovery-terminology-02.txt, May 2003.

   [5]  Lang, J.P. and B.
        Rajagopalan (Eds.), "Generalized MPLS Recovery Functional
        Specification", Internet Draft, work in progress,
        draft-ietf-ccamp-gmpls-recovery-functional-00.txt, January
        2003.

   [6]  Papadimitriou, D. and E. Mannie (Eds.), "Analysis of
        Generalized MPLS-based Recovery Mechanisms (including
        Protection and Restoration)", Internet Draft, work in progress,
        draft-ietf-ccamp-gmpls-recovery-analysis-01.txt, May 2003.

   [7]  Lai, W.S., and D. McDysan (Eds.), "Network Hierarchy and
        Multilayer Survivability", RFC 3386, November 2002.

   [8]  Owens, K., et al., "Network Survivability Considerations for
        Traffic Engineered IP Networks", Internet Draft, work in
        progress, draft-owens-te-network-survivability-03.txt, May
        2002.

   [9]  Berger, L. (Ed.), "Generalized MPLS Signaling - RSVP-TE
        Extensions", Internet Draft, work in progress,
        draft-ietf-mpls-generalized-rsvp-te-09.txt, September 2002.

8. Acknowledgments

   The authors would like to thank Peter Czezowski and Takafumi Chujo
   of Fujitsu Labs of America, Inc., Norihiko Shinomiya and Akira Chugo
   of Fujitsu Laboratories, Ltd for various inputs, and Vishal Sharma
   of Metanoia and Jonathan Lang of Rincon Networks for review and
   feedback.

9. Editors' Address

   Richard Rabbat
   Fujitsu Labs of America, Inc.
   1240 E. Arques Ave., MS 345
   Sunnyvale, CA 94085
   United States of America
   Phone: +1-408-530-4537
   Email: rabbat@fla.fujitsu.com

   Toshio Soumiya
   Fujitsu Laboratories Ltd.
   1-1, Kamikodanaka 4-Chome
   Nakahara-ku, Kawasaki
   211-8588, Japan
   Phone: +81-44-754-2765
   Email: soumiya.toshio@jp.fujitsu.com

10.
Authors' Addresses

   Kohei Shiomoto
   NTT Network Innovation Laboratories
   Midori-machi 3-9-11, Musashino-shi
   Tokyo, Japan 180-8585
   Phone: +81-422-59-4402
   Email: Shiomoto.Kohei@lab.ntt.co.jp

   Shoichiro Seno
   Mitsubishi Electric Corporation
   5-1-1 Ofuna, Kamakura
   Kanagawa, Japan 247-8501
   Phone: +81-467-41-2430
   Email: senos@isl.melco.co.jp

Full Copyright Statement

   "Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."