CCAMP Working Group                                 T. Soumiya (Editor)
Internet Draft                                     Fujitsu Laboratories
Expires: August 2003                              P. Czezowski (Editor)
                                                Fujitsu Labs of America
 
                                                          February 2003
 
          Extensions to LMP for Flooding-based Fault Notification 
              draft-soumiya-lmp-fault-notification-ext-00.txt 
     
Status of this Memo  
    
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026 [1]. 
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that 
   other groups may also distribute working documents as Internet-
   Drafts.  
    
   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress."  
    
   The list of current Internet-Drafts can be accessed at  
        http://www.ietf.org/ietf/1id-abstracts.txt  
   The list of Internet-Draft Shadow Directories can be accessed at  
        http://www.ietf.org/shadow.html. 
 
 
Abstract
         

   This draft describes extensions to the Link Management Protocol (LMP)
   for use in flooding-based fault notification in pre-OTN networks.
   Pre-OTN networks are transport networks that have a GMPLS-based
   control plane and various transport plane technologies (such as
   Optical Cross-Connects and Optical Add/Drop Multiplexers).  An
   important feature of these networks is timely recovery from failures,
   using either a protection or a restoration scheme.  The recovery
   schemes should also be resource efficient and flexible enough to meet
   operator requirements.  Once a fault has been detected, fault
   notification is one of a series of phases needed to achieve recovery.
   We prefer a flooding-based approach to the notification phase because
   it may offer speed and flexibility advantages over using RSVP-TE
   signaling for notification.  Extending LMP to include fault
   notification is a good fit for the problem because fault management
   is already one of its features and many of its protocol objects can
   be reused.


Soumiya & Czezowski (Eds.)      Expires - August 2003         [Page 1]
         draft-soumiya-lmp-fault-notification-ext-00.txt February 2003

 
Table of Contents 
    
   1. Overview
   1.1 Terminology
   1.2 Glossary of Terms Used
   2. Fault Recovery Scenario
   3. Additional LMP Message Formats
   3.1 FaultNotify Message (Msg Type = TBD)
   3.2 FaultNotifyAck Message (Msg Type = TBD)
   4. Additional LMP Object Definitions
   4.1 TTL Class (Class = TBD)
   4.2 FAULT_ID Class (Class = TBD)
   5. Priority-Based Recovery
   6. Security Considerations
   7. Conclusion
   References
   Acknowledgments
   Editors' Addresses
   Contributing Authors
    
    
1. Overview 
    
   This draft describes extensions to the Link Management Protocol (LMP)
   for use in flooding-based fault notification in pre-OTN networks.
   Pre-OTN networks are transport networks that have a GMPLS-based
   control plane and various transport plane technologies (such as
   Optical Cross-Connects and Optical Add/Drop Multiplexers).  An
   important feature of these networks is timely recovery from failures,
   using either a protection or a restoration scheme.  The recovery
   schemes should also be resource efficient and flexible enough to meet
   operator requirements.  Once a fault has been detected, fault
   notification is one of a series of phases needed to achieve recovery.
   We prefer a flooding-based approach to the notification phase because
   it may offer speed and flexibility advantages over using RSVP-TE
   signaling for notification.  Extending LMP to include fault
   notification is a good fit for the problem because fault management
   is already one of its features and many of its protocol objects can
   be reused.
 
   Currently, there are several Internet Drafts related to recovery in
   networks with a GMPLS control plane.  They cover terminology [2],
   requirements [3], functional specification [4], and analysis of
   recovery mechanisms [5].  The requirements for control-plane-based
   recovery were found to fall into four main categories:
    
      o Meeting timing requirements 
      o Efficient usage of data plane resources 
      o Efficient usage of control plane resources 
      o Supporting flexible design of recovery schemes 
 
   Reference [3] discusses how controlled flooding meets the
   requirements on fault notification mechanisms and how it has
   beneficial side effects compared with notification via GMPLS
   signaling.  Flooding-based notification is also appropriate for
   shared mesh-based recovery schemes, which are promoted for their
   resource efficiency and flexibility.  Generic mechanisms for
   implementing a flooding-based fault notification protocol are
   proposed in [6].
     
   In this draft, we describe the implementation of flooding-based fault
   notification in a recovery scenario, and provide the necessary 
   extensions to LMP message formats and data object definitions. 
 
1.1 Terminology 
    
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [7]. 
 
1.2 Glossary of Terms Used 
    
   In addition to the terminology for GMPLS-based recovery that is 
   documented in [2], this draft uses the following acronyms: 
    
      o  GMPLS:   Generalized Multiprotocol Label Switching [8]  
      o  LMP:     Link Management Protocol [9]  
      o  LSP:     Label Switched Path   
      o  NMS:     Network Management System 
      o  OTN:     Optical Transport Network  
      o  RSVP-TE: Resource Reservation Protocol - Traffic Engineering [10]
      o  SRLG:    Shared Risk Link Group
      o  TTL:     Time To Live
 
 
2. Fault Recovery Scenario 
    
   This section describes a fault recovery scenario based on shared mesh
   recovery.  Every node maintains an adjacency with each of its
   neighbors via at least one LMP control channel.  A pre-planned
   recovery path table is configured using extended GMPLS signaling
   messages or through a Network Management System (NMS).
    
   When a failure occurs, the following procedure is carried out: 
    
      1. A downstream node close to the failure detects it.  This node
         is called the detecting node.  If the path is bidirectional, an
         upstream node close to the failure also detects it.  The
         detecting node should report the failure, and thereby becomes
         the reporting node.
    
      2. The reporting node unicasts FaultNotify messages to all of its
         immediate neighbor nodes.  It continues sending unicast
         FaultNotify messages periodically until it has received
         FaultNotifyAck messages from its neighbors or a retry timer
         has expired.  A FaultNotify message contains the node ID of the
         reporting node, the link ID of the failed link, and a sequence
         number.  The message may optionally contain a TTL, the ID of
         the failed wavelength, or the ID of the failed SRLG.
    
      3. A neighbor node receives the FaultNotify message carrying the
         failure data and sequence number.
    
      4. The receiving node checks whether it has already received a
         message about this failure.  For this purpose, it searches a
         database indexed on the failure data and sequence number; the
         database stores the failure data and sequence numbers from the
         messages received so far.
         4a. If it has not yet received a message about this failure,
             it adds the failure data and sequence number to the
             database.  The node then unicasts FaultNotify messages to
             all of its neighbors except the node that sent the message.
         4b. If it has already received a message about this failure,
             it proceeds directly to step 6.
    
      5. The receiving node may set up one or more protection paths,
         according to its pre-planned protection table and the failure
         data.
    
      6. The receiving node sends a FaultNotifyAck message back to the
         node that sent the FaultNotify message.
    
   [Optional]: If the receiving node has set up one or more protection
   paths, it sends a ProtectionCompleteNotify message either to the
   egress node of the protection path or to the NMS.  It continues
   sending ProtectionCompleteNotify messages periodically until it
   receives a ProtectionCompleteNotifyAck message or a retry timer has
   expired.
    
   [Optional]: The receiving node at the egress of the protection path,
   or the NMS, sends back a ProtectionCompleteNotifyAck message.
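
   The receive-side handling in steps 3 through 6 can be sketched as
   follows.  This is a minimal, non-normative Python sketch: the class
   and field names are hypothetical, the "failure data" is assumed to
   be the reporting node ID plus the failed link ID, and the sequence
   number is assumed to be the FaultId of Section 4.2.  Transmission
   and timers are left out; the handler only returns the copies to
   forward and the acknowledgment to send.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FaultNotify:
    reporting_node: str  # node ID of the reporting node
    failed_link: str     # link ID of the failed link
    seq: int             # sequence number (assumed to be the FaultId)


@dataclass
class Node:
    node_id: str
    neighbors: list
    # Hypothetical pre-planned protection table: failed link ID -> list of
    # protection paths this node takes part in setting up.
    protection_table: dict = field(default_factory=dict)
    seen: set = field(default_factory=set)         # duplicate-detection database
    activated: list = field(default_factory=list)  # protection paths set up

    def on_fault_notify(self, msg, sender):
        """Steps 3-6 for one received FaultNotify.

        Returns (forwards, ack): forwards is a list of (neighbor, message)
        pairs to unicast; ack is the (destination, key) FaultNotifyAck."""
        key = (msg.reporting_node, msg.failed_link, msg.seq)
        forwards = []
        if key not in self.seen:
            # Step 4a: first copy, so record it and flood it to every
            # neighbor except the one it came from.
            self.seen.add(key)
            forwards = [(n, msg) for n in self.neighbors if n != sender]
            # Step 5: set up any pre-planned protection paths.
            self.activated.extend(self.protection_table.get(msg.failed_link, []))
        # Step 4b jumps here; step 6 always acknowledges the sender.
        ack = (sender, key)
        return forwards, ack


# R6 of Figure 1 hears about the R8-R9 failure from R9.
r6 = Node("R6", ["R4", "R5", "R9", "R10"], {"R8-R9": ["R-LSP2"]})
forwards, ack = r6.on_fault_notify(FaultNotify("R9", "R8-R9", 1), sender="R9")
print([n for n, _ in forwards])  # ['R4', 'R5', 'R10']
```

   Because duplicates skip step 5 and go straight to step 6, a node that
   hears about the same failure from several neighbors acknowledges
   every copy but forwards it and sets up protection only once.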
    
    
                         [R1]---[R2]---[R3]---[R4] 
                           \                  / 
                           [R5]-------------[R6] 
                           /  \           /   \ 
                        [R7]---[R8]---[R9]---[R10] 
    
     Working LSP1:  [R1->R2->R3->R4]   Working LSP2:  [R7->R8->R9->R10]
     Recovery LSP1: [R1->R5->R6->R4]   Recovery LSP2: [R7->R5->R6->R10]
    
                      Figure 1: Shared Mesh Recovery 
    
 
   Figure 1 shows an example of shared mesh-based recovery.  One working
   path, W-LSP1, runs from R1 to R4 via R2 and R3, and another working
   path, W-LSP2, runs from R7 to R10 via R8 and R9.  R1 can protect user
   traffic by creating a backup LSP that merges with the working LSP at
   R4.  We refer to R-LSP1 [R1->R5->R6->R4] as the recovery LSP of
   W-LSP1.  In the same manner, we refer to R-LSP2 [R7->R5->R6->R10] as
   the recovery LSP of W-LSP2.
    
   In this situation, if it can be assumed that multiple failures do not
   occur at the same time, the resources used to recover the working
   LSPs can be shared.  In other words, the resources on the link
   between R5 and R6 can be shared by R-LSP1 and R-LSP2.  Setting up the
   recovery LSPs in this way reduces the ratio of spare to working
   capacity in the network.
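
   The capacity saving can be made concrete with a small calculation.
   The sketch below assumes that every recovery LSP in Figure 1 needs
   one unit of capacity on each link it traverses, and that W-LSP1 and
   W-LSP2 are link-disjoint, so a single failure activates at most one
   recovery LSP.  Under those assumptions, dedicated recovery capacity
   needs 6 units while sharing the R5-R6 link needs only 5; protecting
   against both working LSPs failing together would again require 6
   units, which is the situation examined in Section 5.

```python
from collections import Counter

# Recovery LSPs of Figure 1, indexed by the working LSP they protect.
RECOVERY = {
    "W-LSP1": ["R1", "R5", "R6", "R4"],
    "W-LSP2": ["R7", "R5", "R6", "R10"],
}


def links(path):
    """Undirected links along a path, e.g. frozenset({'R5', 'R6'})."""
    return [frozenset(hop) for hop in zip(path, path[1:])]


def dedicated_spare(recovery):
    # Every recovery LSP reserves its own unit on every link it uses.
    return sum(len(links(p)) for p in recovery.values())


def shared_spare(recovery, scenarios):
    """Spare units needed when capacity is shared across failure scenarios.

    Each scenario is a set of working LSPs that fail together; a link
    needs the largest demand that any single scenario places on it."""
    spare = Counter()
    for failed in scenarios:
        demand = Counter(lk for w in failed for lk in links(recovery[w]))
        for lk, units in demand.items():
            spare[lk] = max(spare[lk], units)
    return sum(spare.values())


print(dedicated_spare(RECOVERY))                         # 6
print(shared_spare(RECOVERY, [{"W-LSP1"}, {"W-LSP2"}]))  # 5
print(shared_spare(RECOVERY, [{"W-LSP1", "W-LSP2"}]))    # 6
```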
    
   When a failure occurs on the link between R8 and R9 on W-LSP2, the
   endpoint nodes of the link (R8 and R9) will both detect the failure
   if the link is bidirectional.  The detecting nodes then start sending
   FaultNotify messages in the flooding-based manner.  In the case of
   R9, the messages are sent to R6 and R10.  When a node receives a
   FaultNotify message, it sends a FaultNotifyAck message back to the
   sending node and, in the same manner, floods the message to its
   other immediate neighbors.
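
   The flooding in this example can be simulated on the Figure 1
   topology.  The sketch below is illustrative only: it assumes that no
   control channel remains between R8 and R9 after the failure, that
   every hop takes the same time, and that both R8 and R9 act as
   reporting nodes.  It records the hop at which each node first
   receives a FaultNotify message and counts the FaultNotify messages
   sent (acknowledgments are not counted).

```python
from collections import deque

# Adjacencies of Figure 1 (undirected).
EDGES = [("R1", "R2"), ("R2", "R3"), ("R3", "R4"), ("R1", "R5"),
         ("R4", "R6"), ("R5", "R6"), ("R5", "R7"), ("R5", "R8"),
         ("R6", "R9"), ("R6", "R10"), ("R7", "R8"), ("R8", "R9"),
         ("R9", "R10")]


def flood(edges, failed, sources):
    """Return ({node: hop of first FaultNotify}, FaultNotify messages sent)."""
    adj = {}
    for a, b in edges:
        if {a, b} == set(failed):
            continue  # assumption: nothing is sent over the failed link
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    first = {s: 0 for s in sources}
    queue = deque((s, None) for s in sources)  # (node, neighbor it heard from)
    sent = 0
    while queue:
        node, sender = queue.popleft()
        # Step 4a: forward to every neighbor except the sender.
        for nbr in sorted(adj[node] - {sender}):
            sent += 1
            if nbr not in first:  # first copy: record it, then forward it
                first[nbr] = first[node] + 1
                queue.append((nbr, node))
    return first, sent


first, sent = flood(EDGES, ("R8", "R9"), ["R8", "R9"])
print(max(first.values()), sent)  # 3 16
```

   Under these assumptions every node learns of the failure within
   three hops, and R9's first messages go to R6 and R10, as described
   above.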
    
   The FaultNotify message includes information about the failure, such
   as the FAULT_ID.  When nodes on the recovery LSPs receive the
   FaultNotify message, they activate the pre-calculated recovery path.
   In this example, R-LSP2 is activated, and R7 switches the W-LSP2
   traffic onto R-LSP2.  R10 merges R-LSP2 back into the original route.
   As a result, the traffic takes the path [R7->R5->R6->R10].
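
   The activation step can be sketched as a lookup in a pre-planned
   table.  In the hypothetical Python below, the table simply lists the
   working and recovery LSPs of Figure 1.  Given the failed link, each
   node finds the recovery LSPs it lies on and the cross-connect it
   would make.  This draft does not specify how a node stores this
   table; the layout here is only an assumption.

```python
# Hypothetical pre-planned recovery table for Figure 1.
WORKING = {
    "W-LSP1": ["R1", "R2", "R3", "R4"],
    "W-LSP2": ["R7", "R8", "R9", "R10"],
}
RECOVERY = {
    "W-LSP1": ["R1", "R5", "R6", "R4"],
    "W-LSP2": ["R7", "R5", "R6", "R10"],
}


def affected_lsps(failed_link):
    """Working LSPs that traverse the failed link (in either direction)."""
    a, b = failed_link
    return [w for w, path in WORKING.items()
            if any({x, y} == {a, b} for x, y in zip(path, path[1:]))]


def cross_connects(node, failed_link):
    """(working LSP, from, to) entries that `node` sets up for the failure.

    from=None marks the ingress that switches traffic onto the recovery
    LSP; to=None marks the egress that merges it back."""
    result = []
    for w in affected_lsps(failed_link):
        path = RECOVERY[w]
        if node in path:
            i = path.index(node)
            prev = path[i - 1] if i > 0 else None
            nxt = path[i + 1] if i + 1 < len(path) else None
            result.append((w, prev, nxt))
    return result


for n in ["R7", "R5", "R6", "R10", "R1"]:
    print(n, cross_connects(n, ("R8", "R9")))
```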
    
    
3. Additional LMP Message Formats 
    
   LMP is a good candidate protocol to extend for the purposes of fault 
   notification. Flooding-based fault notification is quite simple, and 
   only two messages (FaultNotify and FaultNotifyAck) need to be 
   defined. Furthermore, most of the necessary data objects are already 
   defined in LMP [9]. 
    
3.1  FaultNotify Message (Msg Type = TBD) 
    
   <FaultNotify Message> ::= <Common Header> <MESSAGE_ID> 
                                [<TTL>] <FAULT_ID> <LOCAL_NODE_ID>  
                                {<LINK_ID> [<CHANNEL_STATUS>] ...} 
   or 
     
   <FaultNotify Message> ::= <Common Header> <MESSAGE_ID> 
                                <TTL> <FAULT_ID> [<SRLG_ID> ...] 
 
3.2  FaultNotifyAck Message (Msg Type = TBD) 
    
   <FaultNotifyAck Message> ::= <Common Header><MESSAGE_ID_ACK> 
    
   The contents of the MESSAGE_ID_ACK object MUST be obtained from the 
   FaultNotify message being acknowledged. 
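
   Because the MESSAGE_ID_ACK carries the value of the MESSAGE_ID being
   acknowledged, a node that has sent FaultNotify messages can match
   acknowledgments against the messages it is still retransmitting
   (step 2 of Section 2).  The sketch below is a hypothetical
   illustration driven by an explicit clock; the retry interval and the
   give-up time are assumed parameters, since this draft does not
   specify timer values.

```python
class RetransmitQueue:
    """FaultNotify messages awaiting a FaultNotifyAck, keyed by
    (neighbor, MESSAGE_ID value).

    retry_interval and give_up_after are hypothetical parameters in
    arbitrary time units; this draft does not define timer values."""

    def __init__(self, retry_interval, give_up_after):
        self.retry_interval = retry_interval
        self.give_up_after = give_up_after
        self.pending = {}  # (neighbor, message_id) -> (first_sent, last_sent)

    def sent(self, neighbor, message_id, now):
        """Record that a FaultNotify with this MESSAGE_ID went to neighbor."""
        self.pending[(neighbor, message_id)] = (now, now)

    def on_ack(self, neighbor, message_id_ack):
        """Match a FaultNotifyAck; MESSAGE_ID_ACK echoes the MESSAGE_ID.
        Returns False for duplicate or unexpected acknowledgments."""
        return self.pending.pop((neighbor, message_id_ack), None) is not None

    def due(self, now):
        """Return the (neighbor, message_id) pairs to resend at time now,
        dropping any whose overall retry timer has expired."""
        resend = []
        for key, (first, last) in list(self.pending.items()):
            if now - first >= self.give_up_after:
                del self.pending[key]  # stop retrying this neighbor
            elif now - last >= self.retry_interval:
                self.pending[key] = (first, now)
                resend.append(key)
        return resend


q = RetransmitQueue(retry_interval=10, give_up_after=100)
q.sent("R6", 7, now=0)
q.sent("R10", 7, now=0)
q.on_ack("R6", 7)  # R6 acknowledged MESSAGE_ID 7
print(q.due(10))   # [('R10', 7)]
```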
 
 
4. Additional LMP Object Definitions 
    
   The formats for the Common Header at the beginning of LMP messages 
   and the LMP objects used to build the messages are defined in [9].  
   That document also defines the MESSAGE_ID, MESSAGE_ID_ACK, 
   LOCAL_NODE_ID, and CHANNEL_STATUS data objects used in our extended 
   messages. The SRLG_ID data object is defined in [11]. Only the TTL 
   and FAULT_ID data objects therefore remain to be defined here. 
      
4.1 TTL Class (Class = TBD) 
    
   o  C-Type = 1, Time to Live (= Hop Count) 
    
   0                   1                   2                   3  
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1  
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  
   |     TTL       |                    Reserved                   |  
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  
    
   TTL:  8 bits 
    
      This unsigned integer indicates the remaining hop count.  A node
      receiving a FaultNotify message with a TTL of zero MUST silently
      discard the message.
    
   This object is non-negotiable.  
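
   As an illustration, the TTL object could be encoded and checked as
   follows.  This sketch assumes the LMP object header of [9] (an N
   bit, a 7-bit C-Type, an 8-bit Class, and a 16-bit Length in bytes
   that includes the header) and uses an arbitrary placeholder for the
   Class value, which is still TBD.  Decrementing the TTL by one before
   forwarding is also an assumption; this draft only specifies the
   discard rule.

```python
import struct

TTL_CLASS = 0xF0  # arbitrary placeholder: the Class value is TBD
TTL_CTYPE = 1     # C-Type = 1, Time to Live (= Hop Count)


def encode_ttl(ttl):
    """Encode a TTL object: 4-byte object header plus 4-byte body."""
    if not 0 <= ttl <= 255:
        raise ValueError("TTL is an 8-bit unsigned integer")
    # N bit = 0 (non-negotiable), so the first byte is just the C-Type.
    header = struct.pack("!BBH", TTL_CTYPE & 0x7F, TTL_CLASS, 8)
    body = struct.pack("!B3x", ttl)  # TTL, then 24 reserved bits set to zero
    return header + body


def decode_ttl(obj):
    n_ctype, cls, length = struct.unpack("!BBH", obj[:4])
    if cls != TTL_CLASS or (n_ctype & 0x7F) != TTL_CTYPE or length != 8:
        raise ValueError("not a TTL object")
    return obj[4]


def next_hop_ttl(ttl):
    """None means: silently discard (TTL of zero).  Otherwise return the
    TTL for forwarded copies (assumed: decrement by one per hop)."""
    if ttl == 0:
        return None
    return ttl - 1


print(encode_ttl(5).hex())  # 01f0000805000000
```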
        
4.2 FAULT_ID Class (Class = TBD) 
    
   o  C-Type = 1, Failure Identifier 
    
   0                   1                   2                   3  
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1  
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  
   |           Reserved            |         FaultId               |  
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  
    
   FaultId:  16 bits 
    
      This MUST be an unsigned integer that is unique within the node.
      The FaultId identifies the failure within the sequence of failures
      detected by the node: a node increments the value each time it
      detects a failure.
    
   This object is non-negotiable. 
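
   A node-wide FaultId counter and the FAULT_ID object encoding might
   look like the sketch below.  As with the TTL object, it assumes the
   LMP object header of [9] and an arbitrary placeholder Class value.
   Wrapping the 16-bit counter modulo 65536 is an assumption: this
   draft does not say what happens after the value 65535.

```python
import struct

FAULT_ID_CLASS = 0xF1  # arbitrary placeholder: the Class value is TBD
FAULT_ID_CTYPE = 1     # C-Type = 1, Failure Identifier


class FaultIdAllocator:
    """Node-wide FaultId counter, incremented on each detected failure."""

    def __init__(self):
        self._last = 0

    def on_failure_detected(self):
        self._last = (self._last + 1) & 0xFFFF  # assumed 16-bit wrap-around
        return self._last


def encode_fault_id(fault_id):
    if not 0 <= fault_id <= 0xFFFF:
        raise ValueError("FaultId is a 16-bit unsigned integer")
    header = struct.pack("!BBH", FAULT_ID_CTYPE, FAULT_ID_CLASS, 8)
    body = struct.pack("!HH", 0, fault_id)  # 16 reserved bits, then FaultId
    return header + body


def decode_fault_id(obj):
    n_ctype, cls, length = struct.unpack("!BBH", obj[:4])
    if cls != FAULT_ID_CLASS or (n_ctype & 0x7F) != FAULT_ID_CTYPE or length != 8:
        raise ValueError("not a FAULT_ID object")
    _reserved, fault_id = struct.unpack("!HH", obj[4:8])
    return fault_id


ids = FaultIdAllocator()
print(encode_fault_id(ids.on_failure_detected()).hex())  # 01f1000800000001
```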
 
 
5. Priority-Based Recovery 
    
   Fault recovery schemes typically assume single failure events.
   However, multiple failures may occur within a short time interval.
   Protecting against all multiple-failure scenarios would require
   exorbitant spare capacity.  Ideally, the network should at least
   recover some of the working paths in this situation.
    
   For example, consider Figure 2, in which two failures occur at the
   same time: one on the link between R3 and R4, and the other on the
   link between R7 and R8.  R4 detects the first failure and floods a
   fault notification message that reaches R6.  R8 detects the second
   failure and sends a message to R5.  At R6, the recovery path is set
   up to switch traffic from R6 to R4, because the fault notification
   message R6 has received is for W-LSP1.  At R5, on the other hand,
   the recovery path is set up to switch traffic from R5 to R6, because
   the notification R5 has received is for W-LSP2.  As a result, an
   invalid recovery path [R7->R5->R6->R4] is set up.
    
 
                         [R1]---[R2]---[R3]-X-[R4] 
                           \                  / 
                           [R5]-------------[R6] 
                           /  \           /   \ 
                        [R7]-X-[R8]---[R9]---[R10] 
    
     Working LSP1:  [R1->R2->R3->R4]   Working LSP2:  [R7->R8->R9->R10] 
     Recovery LSP1: [R1->R5->R6->R4]   Recovery LSP2: [R7->R5->R6->R10] 
    
                   Figure 2: Multiple failure scenario. 
 
   Priority-based control is an effective way to ensure that specific
   working paths are recovered under multiple-failure conditions.  In
   the example above, if W-LSP1 has a higher priority than W-LSP2, the
   fault notification messages for W-LSP1 take precedence.  In other
   words, each node checks the priority of the protection paths and
   configures its resources according to that priority.  In this case,
   R6 sets up the connection from R6 to R4 in preference to the one
   from R6 to R10.  Adopting priority-based control avoids the
   misconfiguration described above, so that the higher-priority
   protection path is set up.  The priorities should be set according
   to the network operator's policy and/or the network service.
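
   The difference between first-come, first-served setup and
   priority-based control at a node such as R6 can be sketched as
   below.  The sketch is hypothetical and models a single shared
   resource; it assumes that a larger number means a higher priority
   and that a request of equal priority does not preempt the current
   holder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    lsp: str        # recovery LSP asking for the shared resource
    priority: int   # assumption: a larger value means a higher priority
    next_hop: str   # outgoing direction of the cross-connect


class SharedResource:
    """One shared resource, e.g. R6's cross-connect for the shared R5-R6
    capacity.  use_priority=False models first-come, first-served."""

    def __init__(self, use_priority=True):
        self.use_priority = use_priority
        self.holder = None

    def request(self, req):
        if self.holder is None:
            self.holder = req
        elif self.use_priority and req.priority > self.holder.priority:
            self.holder = req  # preempt the lower-priority recovery LSP
        return self.holder


high = Request("R-LSP1", priority=2, next_hop="R4")
low = Request("R-LSP2", priority=1, next_hop="R10")
res = SharedResource()
res.request(low)
res.request(high)
print(res.holder.next_hop)  # R4
```

   With use_priority=False, whichever notification arrives first keeps
   the resource, which is how the invalid path in Figure 2 can arise.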
 
 
6. Security Considerations 
    
   Security requirements depend on the level of trust between the nodes
   that exchange fault notification messages.  In general, security
   requirements may be more relaxed when the nodes of a pre-OTN network
   communicate within the same administrative domain than when they
   communicate with nodes in a different administrative domain.
    
   Because the flooding-based fault notification mechanism is
   implemented as an extension to LMP [9], the security mechanisms of
   LMP can be adopted: all LMP messages should be sent over an IPsec
   channel that is either pre-established or set up on demand.
    
   Note, however, that the fault notification extensions themselves
   introduce no new security considerations.
 
    
7. Conclusion 
    
   This draft describes extensions to the Link Management Protocol (LMP)
   for use in flooding-based fault notification in pre-OTN networks.
   While there are currently several Internet Drafts in the Sub-IP Area
   related to service recovery in GMPLS networks, no single document has
   yet specified in detail a fault notification method for
   control-plane-based networks.  We believe that flooding-based fault
   notification is the best way to satisfy fault recovery requirements,
   and we have shown how such notification functions in fault recovery
   scenarios.


References
                     
   [1] Bradner, S., "The Internet Standards Process -- Revision 3", 
       BCP 9, RFC 2026, October 1996. 
    
   [2] Mannie, E., et al., "Recovery (Protection and Restoration) 
      Terminology for GMPLS", Internet Draft, work in progress, draft-
      ietf-ccamp-gmpls-recovery-terminology-01.txt, November 2002. 
    
   [3] Czezowski, P., and T. Soumiya (Eds.), "Optical network failure 
      recovery requirements", Internet Draft, work in progress, draft-
      czezowski-optical-recovery-reqs-01.txt, February 2003. 
    
   [4] Lang, J.P. and B. Rajagopalan (Eds.), "Generalized MPLS Recovery 
      Functional Specification", Internet Draft, work in progress, 
      draft-ietf-ccamp-gmpls-recovery-functional-00.txt, January 2003. 
    
   [5] Papadimitriou, D., et al., "Analysis of Generalized MPLS-based 
      Recovery Mechanisms (including Protection and Restoration)", 
      Internet Draft, work in progress, draft-ietf-ccamp-gmpls-recovery-
      analysis-00.txt, January 2003. 
     
   [6] Rabbat, R., and V. Sharma (Eds.), "Fault Notification Protocol 
      for GMPLS-based Recovery", Internet Draft, work in progress, 
      draft-rabbat-fault-notification-protocol-02.txt, February 2003. 
    
   [7] Bradner, S., "Key words for use in RFCs to Indicate Requirement 
       Levels", BCP 14, RFC 2119, March 1997. 
    
   [8] Mannie, E. (Ed.), "Generalized Multi-Protocol Label Switching 
      (GMPLS) Architecture", Internet Draft, work in progress, draft-
      ietf-ccamp-gmpls-architecture-03.txt, August 2002. 
    
   [9] Lang, J. (Ed.), "Link Management Protocol (LMP)", Internet Draft, 
      work in progress, draft-ietf-ccamp-lmp-07.txt, November 2002. 
    
   [10] Berger, L. (Ed.), "Generalized MPLS Signaling - RSVP-TE 
      Extensions", Internet Draft, work in progress, draft-ietf-mpls-
      generalized-rsvp-te-09.txt, September 2002. 
    
   [11] Fredette, A., and J. Lang (Eds.), "Link Management Protocol 
      (LMP) for DWDM Optical Line Systems", Internet Draft, work in 
      progress, draft-ietf-ccamp-lmp-wdm-01.txt, September 2002. 


Acknowledgments 
    
   The following individuals provided valuable input to this draft: 
   Richard Rabbat, Ching-Fong Su and Takafumi Chujo of Fujitsu Labs of 
   America, and Masafumi Katoh and Akira Chugo of Fujitsu Laboratories, 
   Ltd. 
    
    
Editors' Addresses  
        
   Toshio Soumiya                    Peter Czezowski
   Fujitsu Laboratories Ltd.         Fujitsu Labs of America, Inc.
   1-1, Kamikodanaka 4-Chome         595 Lawrence Expressway
   Nakahara-ku, Kawasaki             Sunnyvale, CA 94085
   211-8588, Japan                   United States of America
   Phone: +81-44-754-2765            Phone: +1-408-530-4516
   Email: soumiya.toshio@jp.fujitsu.com Email: peterc@fla.fujitsu.com


Contributing Authors 
    
   Shinya Kanoh                      Takeo Hamada 
   Fujitsu Laboratories Ltd.         Fujitsu Labs of America, Inc. 
   1-1, Kamikodanaka 4-Chome         595 Lawrence Expressway 
   Nakahara-ku, Kawasaki             Sunnyvale, CA 94085 
   211-8588, Japan                   United States of America 
   Phone: +81-44-754-2765            Phone: +1-408-530-4516 
   Email: kanoh@jp.fujitsu.com       Email: thamada@fla.fujitsu.com 

