CCAMP Working Group                                     R. Rabbat (Ed.) 
   Internet Draft                                  Fujitsu Labs of America 
   Expires: July 2004                                 Toshio Soumiya (Ed.) 
                                                  Fujitsu Laboratories Ltd 
                                                              January 2004 
 
 
          Optical Transport Network Failure Recovery Requirements 
 
                 draft-rabbat-optical-recovery-reqs-01.txt 
 
 
Status of this Memo 
 
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026 [1].  
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that      
   other groups may also distribute working documents as Internet-
   Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time.  It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 
    
   The list of current Internet-Drafts can be accessed at 
        http://www.ietf.org/ietf/1id-abstracts.txt 
   The list of Internet-Draft Shadow Directories can be accessed at 
        http://www.ietf.org/shadow.html. 
 
 
Abstract 
    
   This document focuses on requirements for control-plane based 
   recovery from data-plane failures in optical transport networks that 
   use an IP-based (GMPLS) control plane. It aims to gather and 
   systematically lay out the requirements so that they can serve as a 
   coherent basis for work on solution and protocol enhancements and 
   developments.  
    
 
Conventions used in this document 
    
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this 
   document are to be interpreted as described in RFC-2119 [2]. 
    
    
Rabbat & Soumiya (Eds.)       Expires - July 2004             [Page 1] 
 
           draft-rabbat-optical-recovery-reqs-01.txt      January 2004 
 
 
Table of Contents 
    
   1. Introduction...................................................2 
   2. Glossary of Terms Used.........................................3 
   3. Failure Recovery Requirements..................................3 
   3.1 Overview of Recovery Requirements.............................3 
   3.2 Shared Mesh-based Recovery....................................4 
   3.3 Failure Notification Mechanisms...............................5 
   3.4 Optical Network Failure Recovery Requirements.................6 
   4. Security Considerations........................................8 
   5. Conclusions....................................................8 
   6. Intellectual Property Considerations...........................8 
   7. References.....................................................9 
   8. Acknowledgments...............................................10 
   9. Editors' Address..............................................10 
   10. Authors' Addresses...........................................10 
   Full Copyright Statement.........................................11 
    
    
1. Introduction 
    
   This document describes requirements for control plane-based recovery 
   from data-plane failures in optical networks.  We focus on optical 
   networks that use a Generalized Multi-Protocol Label Switching 
   (GMPLS)-based [3] control plane and various data plane technologies.  
   Service recovery from failures, using either a protection or 
   restoration scheme, is an important feature of these transport 
   networks to ensure high reliability and uninterrupted service.  
   Protection and restoration algorithms may be used either for local 
   repair (around failed spans or nodes) or edge-to-edge recovery of an 
   LSP.  Shared mesh-based recovery is desirable to reduce spare 
   capacity requirements and enable flexible service recovery scenarios. 
          
   While edge-to-edge based recovery has the potential to be more 
   resource-efficient than link-based protection, it also entails the 
   (potentially lengthy) delay incurred in notifying all nodes along the 
   recovery path of the failure of a remote resource on the working 
   path.  For many applications, recovery paths must be chosen carefully 
   to meet strict recovery time requirements (e.g., in the range of a 
   few tens to a few hundred ms).  
    
   Several documents within the CCAMP WG currently relate to recovery in 
   GMPLS networks. They cover terminology and functional specifications 
   [4, 5] and analysis [6] for recovery in GMPLS-based networks, and 
   survivability requirements and considerations for traffic engineered 
   or hierarchical networks [7].  As a set, these documents provide 
   detailed discussions of the concepts and mechanisms used in network 
   recovery.  The requirements for control plane-based recovery in 
   transport networks have not, however, been specifically detailed in 
   any one document. This is the objective of the current document. 
 
 
2. Glossary of Terms Used 
    
   The following acronyms are used in this document: 
    
     o LMP:      Link Management Protocol [8] 
     o LSP:      Label Switched Path  
     o OADM:     Optical Add/Drop Multiplexer 
     o OXC:      Optical Cross-Connect 
     o RSVP-TE:  Resource Reservation Protocol-Traffic Engineering [9] 
 
   The terminology for GMPLS-based recovery is documented in [4]. These 
   terms are borrowed from the generic protection switching document at 
   the ITU-T [10]. We use the following terms from that document: 
 
     o Detecting Entity (Failure Detection). 
     o Reporting Entity (Failure Correlation and Notification). 
     o Deciding Entity (part of the failure recovery decision process). 
     o Recovery Entity (part of the failure recovery activation 
        process). 
     o Bridge, which could be Permanent Bridge, Broadcast Bridge, or 
        Selector Bridge. 
     o Selector, which could be a "Selective Selector" or a "Merging 
        Selector". 
     o Recovery phases: 1. Failure Detection, 2. Failure Localization 
        and Isolation, 3. Failure Notification, 4. Recovery (Protection 
        or Restoration), 5. Reversion (Normalization) 
    
    
3. Failure Recovery Requirements 
    
   Even though some requirements for fault recovery have been discussed 
   in the CCAMP, MPLS, and TE WGs, several additional aspects need to be 
   examined in the context of recovery in optical networks.  In this 
   section, we describe the fault recovery requirements that we have 
   collected based on discussions with several carriers. 
      
3.1 Overview of Recovery Requirements 
    
   This subsection summarizes the survivability requirements for optical 
   networks.  Further detail on the requirements is provided in the 
   subsequent subsections. 
    
   The following classes (types) of recovery are required for span, LSP 
   segment, and LSP recovery: 
    
 
     o Protection 
          - pre-computed route and pre-established (i.e., cross-
             connected) resources 
    
     o Restoration 
          - pre-computed route and on-demand establishment of resources, 
             or 
          - on-demand route and on-demand establishment of resources 
    
   A recovery scheme uses either protection or restoration (or both), 
   together with failure detection and notification mechanisms.  
   Depending on the service specification, the timing bounds for 
   recovery may range from 50 ms (e.g., to repair services carrying 
   PSTN voice) to less strict bounds of, say, several hundred ms (for 
   low-priority data). 
    
   For multi-layered networks, hold-off timers are required to allow 
   recovery at lower layers to proceed before higher layers take action 
   (if needed).  Of course, escalation to higher layers should be 
   possible when necessary.  Support for horizontal hierarchy must also 
   be included, because large networks are usually segmented [7]. 
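    
   The hold-off behavior described above can be sketched as follows.  
   This is a minimal illustration under stated assumptions, not a 
   mechanism defined by this document: the function name, its 
   parameters, and the 100 ms timer value are hypothetical. 
    
```python
from typing import Optional


def higher_layer_must_act(hold_off_ms: float,
                          lower_layer_recovery_ms: Optional[float]) -> bool:
    """Decide whether a higher layer starts its own recovery.

    hold_off_ms: hypothetical hold-off timer, started when the higher
        layer detects the failure.
    lower_layer_recovery_ms: time at which the lower layer restored
        service, measured from the same instant, or None if it never did.
    """
    if lower_layer_recovery_ms is None:
        return True  # the lower layer never recovered: escalate
    # Escalate only if the lower layer was still down at timer expiry.
    return lower_layer_recovery_ms > hold_off_ms


# Lower layer repairs the fault in 40 ms, inside a 100 ms hold-off.
print(higher_layer_must_act(100.0, 40.0))   # False
# Lower layer fails to recover, so the higher layer takes over.
print(higher_layer_must_act(100.0, None))   # True
```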
    
   In general, recovery schemes must operate in a stable and cooperative 
   manner to maximize the network's reliability and availability.  
   Recovery schemes should also be resource efficient and flexible with 
   respect to types of failures, service classes, and the network 
   operator policies that they can support. 
 
   As has been identified in [4], a critical component in guaranteeing 
   the time constraints for service recovery is the Failure Notification 
   phase.  
 
3.2 Shared Mesh-based Recovery 
    
   Today's Synchronous Optical Network / Synchronous Digital Hierarchy 
   (SONET/SDH) networks use recovery techniques based on linear and ring 
   topologies.  Linear protection may include 1+1 and 1:N protection, 
   while ring protection usually involves uni-directional path switched 
   ring (UPSR) and bi-directional line switched ring (BLSR) protection. 
    
   Linear 1+1 protection and ring-based protection both require 100% 
   redundancy in spare resources for every working path.  Even with 1:N 
   based link protection, it may be difficult to select different routes 
   flexibly.  Therefore, shared mesh-based recovery has emerged as a 
   flexible and efficient option for optical network recovery. 
    
   Shared mesh recovery allows recovery capacity to be shared among 
   multiple working paths.  This increases flexibility by allowing more 
   options when routing both the working and the recovery paths.  
   Furthermore, this flexibility allows faster recovery because the 
   shared mesh provides a greater number of suitable/feasible 
   intermediate nodes for routing the recovery paths. 
    
   However, shared mesh recovery may require failure notification and 
   reconfiguration to be performed at multiple nodes along the 
   protection path, as illustrated by the following simple example. 
    
    
                              +---+  
                         .....| E |.............. 
                         :    +---+             : 
                         :                      : 
              +---+    +---+   \ /   +---+    +---+ 
           ===| A |====| B |====X====| C |====| D |=== 
              +---+    +---+   / \   +---+    +---+ 
                :                               : 
                :      +---+         +---+      : 
                :......| F |.........| G |......: 
                       +---+         +---+ 
    
     Figure 1. Multiple (partial) recovery paths protecting against the 
        failure of link BC. 
    
    
   Figure 1 illustrates how, for shared mesh recovery, different network 
   nodes may need to be informed of a network fault/failure. Suppose a 
   failure occurs on link BC.  Here, the working LSPs follow the route 
   ABCD, and recovery paths have been reserved, but not yet activated, 
   along the two dotted routes.  Recovery paths BED and AFGD are each 
   responsible for recovering a portion of the working capacity on link 
   BC.  In this case, nodes A, B, D, E, F, and G must all receive a 
   notification of the failure and perform reconfiguration actions 
   before the backup paths can carry traffic from the working path. 
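    
   The set of nodes named in the example above can be derived 
   mechanically from the recovery routes of Figure 1.  The following is 
   a sketch only: the table layout and the function name are 
   assumptions made for illustration, and real implementations would 
   obtain this information from signaling or routing state. 
    
```python
# Recovery routes reserved for each protected link, taken from Figure 1.
# The dictionary layout is a hypothetical representation.
RECOVERY_ROUTES = {
    frozenset({"B", "C"}): [["B", "E", "D"], ["A", "F", "G", "D"]],
}


def nodes_to_notify(failed_link):
    """Return every node that lies on a recovery route for failed_link.

    These are the nodes that must learn of the failure and reconfigure
    before the backup paths can carry traffic.
    """
    routes = RECOVERY_ROUTES.get(frozenset(failed_link), [])
    return {node for route in routes for node in route}


print(sorted(nodes_to_notify(("B", "C"))))
# ['A', 'B', 'D', 'E', 'F', 'G']
```
    
   Node C does not appear in the result because it is on neither 
   recovery route. 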
    
3.3 Failure Notification Mechanisms 
    
   To effect recovery in a timely fashion, both the failure correlation/ 
   aggregation time (that is, the time spent on the computations 
   performed at the reporting entity) and failure notification time (the 
   time that elapses prior to all entities involved in the recovery 
   receiving a failure notification signal) must be minimized. The 
   failure correlation time is required regardless of the restoration 
   scheme used. 
    
   Since shared-mesh restoration potentially requires the 
   reconfiguration of nodes along the protection path, merely using data 
   plane notification techniques to notify the end points of an LSP of a 
   failure is not sufficient to effect recovery. Rather, there needs to 
   be a means for the control plane to inform nodes on the backup path 
   of a failure/fault in the network (which can be viewed as control-
   plane based failure notification).  
    
   There are, in general, two alternatives for control-plane based 
   failure notification: 
    
      o  Failure notification messages dispatched using GMPLS signaling 
      o  Controlled flooding of failure notification messages 
    
   The GMPLS signaling protocol, RSVP-TE [9], supports notification 
   using a Notify message.  Under this scheme, the deciding entity pre-
   arranges to receive the notifications by sending a Notify Request 
   object in the Path or Resv messages.   
    
   The recovery process therefore requires two or three phases. The 
   reporting entity first sends notification of the failure to the 
   deciding entity. The deciding entity then begins a one- or two-phase 
   signaling process on the recovery LSP (which requires either 
   signaling down the recovery LSP or signaling down and back). 
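    
   A rough delay model for this Notify-driven process is sketched 
   below.  It is an assumption-laden illustration, not a measurement 
   or a normative formula: it counts only message propagation, ignores 
   processing and cross-connect times, and all names and delay values 
   are hypothetical. 
    
```python
def notify_based_recovery_ms(notify_ms, recovery_lsp_hop_ms, round_trip):
    """Estimate elapsed time for Notify-driven recovery, in ms.

    notify_ms: delay for the notification to travel from the reporting
        entity to the deciding entity (first phase).
    recovery_lsp_hop_ms: per-hop signaling delays along the recovery LSP.
    round_trip: False for one-phase signaling (down the recovery LSP),
        True for two-phase signaling (down and back).
    """
    one_way = sum(recovery_lsp_hop_ms)
    signaling = 2 * one_way if round_trip else one_way
    return notify_ms + signaling


hops = [2.0, 2.0, 2.0]  # hypothetical three-hop recovery LSP
print(notify_based_recovery_ms(5.0, hops, round_trip=False))  # 11.0
print(notify_based_recovery_ms(5.0, hops, round_trip=True))   # 17.0
```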
    
   The controlled flooding of failure notification messages in the 
   control plane is another alternative for failure notification.  
   Flooding supports recovery schemes that require reconfiguration, or 
   policy- or priority-based decisions, to be made at multiple deciding 
   entities distributed within the network, off the working path. 
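    
   One possible form of controlled flooding is sketched below, using a 
   hop limit and duplicate suppression.  This document does not specify 
   a flooding procedure, so the technique shown is an assumption.  The 
   sketch also assumes that the control-plane adjacencies mirror the 
   links of Figure 1 and that control channels stay up despite the 
   data-plane failure; all names are hypothetical. 
    
```python
from collections import deque

# Assumed control-plane adjacencies, mirroring the links of Figure 1.
CONTROL_ADJACENCY = {
    "A": ["B", "F"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C", "E", "G"],
    "E": ["B", "D"],
    "F": ["A", "G"],
    "G": ["F", "D"],
}


def flood(origin, adjacency, hop_limit):
    """Flood a notification from origin; return {node: hop count}.

    Each node accepts a given notification only once (duplicate
    suppression) and does not forward it once the hop limit is reached.
    """
    reached = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if reached[node] == hop_limit:
            continue  # hop limit exhausted: do not forward further
        for neighbor in adjacency[node]:
            if neighbor not in reached:
                reached[neighbor] = reached[node] + 1
                queue.append(neighbor)
    return reached


# Node B reports the failure of link BC.
print(sorted(flood("B", CONTROL_ADJACENCY, hop_limit=3)))
# ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(sorted(flood("B", CONTROL_ADJACENCY, hop_limit=2)))
# ['A', 'B', 'C', 'D', 'E', 'F']
```
    
   With a hop limit of 2, node G on recovery path AFGD is not reached, 
   which shows why the scope of flooding has to be chosen with the 
   recovery paths in mind. 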
    
3.4 Optical Network Failure Recovery Requirements 
    
   o Requirements on the efficiency of bandwidth use 
    
     1. A recovery scheme SHOULD allow efficient use of working LSP 
        bandwidth using such measures as route optimization, taking 
        into account route dependencies between a working path and its 
        recovery path. 
    
     2. A recovery scheme SHOULD allow efficient use of recovery LSP 
        bandwidth using such measures as route optimization, taking 
        into account route dependencies between a working path and its 
        recovery path. 
    
     3. A recovery scheme SHOULD, when possible, allow sharing of 
        recovery bandwidth among multiple recovery paths to enable 
        efficient use of recovery bandwidth. 
    
   o Requirements on recovery actions 
    

     4. A recovery scheme SHOULD allow suppression of fault 
        notification messages, so that spurious fault notification 
        messages and recovery action messages are not transmitted 
        within the network, ensuring scalability of the fault recovery 
        mechanism. 
    
     5. A recovery scheme SHOULD ensure reliable transmission of fault 
        notification messages, provided the control plane is connected. 
    
     6. A recovery scheme SHOULD allow the network operator to choose 
        whether or not reversion actions are to be performed. 
    
     7. A recovery scheme SHOULD allow testing and verification of the 
        availability of the recovery path before its actual use.  This 
        testing may occur when the recovery path is provisioned, or 
        after it is provisioned but before actual recovery action 
        occurs. 
    
     8. A recovery scheme SHOULD make sure that recovery actions 
        correctly move traffic from failed paths to their respective 
        recovery paths, such that the recovery actions do not result in 
        long-term misconnections. 
    
   o Requirements on recovery schemes 
    
     9. A recovery scheme SHOULD provide mechanisms that can be used to 
        support generally used recovery schemes such as 1+1, 1:1, 1:N, 
        M:N, and unprotected. 
    
     10. A recovery scheme SHOULD support priority-based recovery of 
        failed LSPs.  This means that recovery should be ordered 
        according to each LSP's recovery priority.  
    
   o Requirements on recovery priority of service classes 
    
     11. A recovery scheme SHOULD take into consideration the recovery 
        priority of LSPs. 
    
     12. A recovery scheme SHOULD allow support of service classes with 
        different recovery time guarantees.  
    
   o Requirements on recovery granularity 
    
     13. A recovery scheme SHOULD allow recovery of traffic on an 
        aggregated basis, for scalability. 
    
   o Requirements on fault notification  
    

     14. A recovery scheme SHOULD have a failure notification mechanism 
        that guarantees prompt and reliable delivery of notification of 
        data plane faults to a deciding entity in charge of recovering 
        from the fault. 
    
     15. A recovery scheme SHOULD support recovery within bounded time 
        constraints and MAY be compliant with generally used recovery 
        times such as 50 ms for SONET/SDH protection. 
    
   o Requirements on graceful degradation and network stability 
    
     16. A recovery scheme SHOULD allow for graceful degradation of 
        performance in the presence of a fault class that was not 
        anticipated.  
    
     17. A recovery scheme SHOULD allow fallback operations of its 
        recovery actions. For example, when the system encounters a 
        fault class that was not anticipated, the system should execute 
        a best-effort recovery, such that as many working paths as 
        possible are restored under the circumstances. 
    
     18. A recovery scheme SHOULD NOT compromise the stability of the 
        network when the network encounters a fault class that was not 
        anticipated (such as multiple, independent, simultaneous 
        failures). 
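    
   As an illustration of requirement 3, the sketch below compares the 
   spare capacity needed on each link when recovery bandwidth is 
   dedicated per LSP with the capacity needed when it is shared.  It 
   assumes that only one link fails at a time, which is a common 
   dimensioning assumption and not a statement of this document.  The 
   LSPs, link names, and function names are hypothetical. 
    
```python
from collections import defaultdict

# Hypothetical LSPs: (bandwidth units, working links, recovery links).
LSPS = [
    (1, {"A-B"}, {"A-E", "E-F", "F-B"}),
    (1, {"A-B"}, {"A-E", "E-F", "F-B"}),
    (1, {"C-D"}, {"C-E", "E-F", "F-D"}),
]


def dedicated_spare(lsps):
    """Spare capacity per link when every LSP has its own reservation."""
    need = defaultdict(int)
    for bandwidth, _working, recovery in lsps:
        for link in recovery:
            need[link] += bandwidth
    return dict(need)


def shared_spare(lsps):
    """Spare capacity per link when reservations are shared.

    Under the single-failure assumption, a link needs only the largest
    amount of recovery bandwidth that any one failure can demand of it.
    """
    per_failure = defaultdict(lambda: defaultdict(int))
    for bandwidth, working, recovery in lsps:
        for failed in working:
            for link in recovery:
                per_failure[link][failed] += bandwidth
    return {link: max(by_failure.values())
            for link, by_failure in per_failure.items()}


print(dedicated_spare(LSPS)["E-F"])  # 3
print(shared_spare(LSPS)["E-F"])     # 2
```
    
   Link E-F carries the recovery paths of all three LSPs, but no single 
   failure activates more than two of them, so two units suffice when 
   the bandwidth is shared. 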
    
    
4. Security Considerations 
    
   This draft does not introduce any new security issues. 
    
    
5. Conclusions 
    
   This draft described requirements for control plane-based recovery 
   from data plane failures in optical transport networks.  We 
   identified some important requirements for enabling flexible recovery 
   schemes, facilitating the efficient use of resources, and meeting the 
   potentially strict recovery times in such networks. 
    
    
6. Intellectual Property Considerations 
    
   This section is taken from Section 10.4 of RFC2026 [1]. 
    
   The IETF takes no position regarding the validity or scope of any 
   intellectual property or other rights that might be claimed to 
   pertain to the implementation or use of the technology described in 
   this document or the extent to which any license under such rights 
   might or might not be available; neither does it represent that it 
   has made any effort to identify any such rights. Information on the 
   IETF's procedures with respect to rights in standards-track and 
   standards-related documentation can be found in BCP-11. Copies of 
   claims of rights made available for publication and any assurances of 
   licenses to be made available, or the result of an attempt made to 
   obtain a general license or permission for the use of such 
   proprietary rights by implementors or users of this specification can 
   be obtained from the IETF Secretariat. 
    
   The IETF invites any interested party to bring to its attention any 
   copyrights, patents or patent applications, or other proprietary 
   rights, which may cover technology that may be required to practice 
   this standard. Please address the information to the IETF Executive 
   Director. 
    
    
7. References
                     
   [1]  Bradner, S., "The Internet Standards Process -- Revision 3", BCP 
        9, RFC 2026, October 1996. 
    
   [2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement 
        Levels", BCP 14, RFC 2119, March 1997. 
    
   [3]  Mannie, E. (Ed.), "Generalized Multi-Protocol Label Switching 
        (GMPLS) Architecture", Internet Draft, work in progress, draft-
        ietf-ccamp-gmpls-architecture-07.txt, May 2003. 
    
   [4]  Mannie, E. and D. Papadimitriou (Eds.), "Recovery (Protection 
        and Restoration) Terminology for GMPLS", Internet Draft, work in 
        progress, draft-ietf-ccamp-gmpls-recovery-terminology-02.txt, 
        May 2003. 
    
   [5]  Lang, J.P. and B. Rajagopalan (Eds.), "Generalized MPLS Recovery 
        Functional Specification", Internet Draft, work in progress, 
        draft-ietf-ccamp-gmpls-recovery-functional-01.txt, September 
        2003. 
    
   [6]  Papadimitriou, D. and E. Mannie (Eds.), "Analysis of Generalized 
        MPLS-based Recovery Mechanisms (including Protection and 
        Restoration)", Internet Draft, work in progress, draft-ietf-
        ccamp-gmpls-recovery-analysis-02.txt, September 2003. 
    
   [7]  Lai, W.S., and D. McDysan (Eds.), "Network Hierarchy and 
        Multilayer Survivability", RFC 3386, November 2002. 
    

   [8]  Lang, J. (Ed.), "Link Management Protocol (LMP)", Internet 
        Draft, work in progress, draft-ietf-ccamp-lmp-10.txt, October 
        2003. 
    
   [9]  Berger, L. (Ed.), "Generalized MPLS Signaling - RSVP-TE 
        Extensions", RFC 3473, January 2003. 
    
   [10] "Generic Protection Switching: Linear Trail and Sub-Network 
        Protection", ITU-T Recommendation G.808.1, November 2003. 
 
 
8. Acknowledgments 
    
   The authors would like to thank Peter Czezowski and Takafumi Chujo of 
   Fujitsu Labs of America, Inc., and Norihiko Shinomiya and Akira Chugo 
   of Fujitsu Laboratories Ltd. for their various inputs, Jonathan Lang 
   for valuable review and feedback, and Adrian Farrel for his feedback. 
    
    
9. Editors' Address 
    
   Richard Rabbat 
   Fujitsu Labs of America, Inc. 
   1240 E. Arques Ave., MS 345 
   Sunnyvale, CA 94085 
   United States of America 
   Phone: +1-408-530-4537 
   Email: rabbat@alum.mit.edu 
    
   Toshio Soumiya 
   Fujitsu Laboratories Ltd. 
   1-1, Kamikodanaka 4-Chome 
   Nakahara-ku, Kawasaki 
   211-8588, Japan 
   Phone: +81-44-754-2765 
   Email: soumiya.toshio@jp.fujitsu.com 
    
    
10. Authors' Addresses 
    
   Kohei Shiomoto 
   NTT Network Innovation Laboratories                 
   Midori-machi 3-9-11, Musashino-shi 
   Tokyo, Japan 180-8585 
   Phone: +81-422-59-4402 
   Email: Shiomoto.Kohei@lab.ntt.co.jp 
    
   Shoichiro Seno 
   Mitsubishi Electric Corporation 
   5-1-1 Ofuna, Kamakura 
   Kanagawa, Japan 247-8501 
   Phone: +81-467-41-2430 
   Email: senos@isl.melco.co.jp 
    
    
Full Copyright Statement 
    
   "Copyright (C) The Internet Society (2003). All Rights Reserved. 
   This document and translations of it may be copied and furnished to  
   others, and derivative works that comment on or otherwise explain it 
   or assist in its implementation may be prepared, copied, published 
   and distributed, in whole or in part, without restriction of any 
   kind, provided that the above copyright notice and this paragraph are 
   included on all such copies and derivative works. However, this 
   document itself may not be modified in any way, such as by removing 
   the copyright notice or references to the Internet Society or other 
   Internet organizations, except as needed for the purpose of 
   developing Internet standards in which case the procedures for 
   copyrights defined in the Internet Standards process must be 
   followed, or as required to translate it into languages other than 
   English. 
    
   The limited permissions granted above are perpetual and will not be 
   revoked by the Internet Society or its successors or assigns. 
    
   This document and the information contained herein is provided on an 
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE." 
    
