Fault Notification Protocol for GMPLS-Based Recovery  November 2002 

CCAMP Working Group                                Richard Rabbat (FLA) 
Internet Draft                                      Ching-Fong Su (FLA) 
Expires: May 2003                              Norihiko Shinomiya (FLL) 
                                               Vishal Sharma (Metanoia) 
                                                   Takafumi Chujo (FLA) 
                                                      Akira Chugo (FLL) 
                                                          November 2002 
 
           Fault Notification Protocol for GMPLS-Based Recovery 
               draft-rabbat-fault-notification-protocol-01.txt
     
Status of this Memo  
    
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026.  
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that 
   other groups may also distribute working documents as Internet-
   Drafts.  
    
   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time. It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress."  
    
   The list of current Internet-Drafts can be accessed at  
        http://www.ietf.org/ietf/1id-abstracts.txt  
   The list of Internet-Draft Shadow Directories can be accessed at  
        http://www.ietf.org/shadow.html.  
    
Abstract  
    
   This draft describes a fault notification protocol to be used in 
   recovery mechanisms in GMPLS-based networks.  This protocol achieves 
   bounded time activation of protection paths in the case of single 
   failures, based on constrained protection/restoration path routing 
   and requirements on the nodes in terms of physical capabilities, 
   mainly the ability to switch Label Switched Paths simultaneously and 
   the control plane delay characteristics.  The draft proposal presents 
   a complete solution to the problem and justifies choices made for the 
   notification method, extensions required to current algorithms and 
   protocols, in addition to a discussion of security and deployment 
   issues. 
 
Table of Contents 
    
   Status of this Memo...............................................1 
   Abstract..........................................................1 

 
Rabbat                    Expires - May 2003                  [Page 1] 
           Fault Notification and Service Recovery Protocol   November 
2002 
 
 
   Table of Contents.................................................1 
   1. Overview.......................................................2 
   2. Terminology....................................................3 
   3. Glossary of Terms Used.........................................3 
   4. Requirements at Protection Path Setup Time.....................4 
   5. Protocol Steps in Failure Notification and Service Recovery....4 
   5.1 T1: Fault Detection Time......................................5 
   5.2 T2: Hold-Off Time.............................................6 
   5.3 T3: Fault Notification and Completion of Recovery Operation...6 
   5.3.1 Delays Incurred by Messages.................................8 
   5.4 T4: Traffic Recovery Time.....................................9 
   6. Finding the Time-Constrained Protection Path...................9 
   6.1 Finding a Sub-graph for the Protection Algorithm..............9 
   6.2 Reversion After Fixing the Failure...........................10 
   7. Security Considerations.......................................10 
   8. Conclusion....................................................10 
   Appendix A. Fault Notification Message Delays on Path............10 
   A.1 Delays Associated with Link Traversal........................11 
   A.2 Delays Incurred at the Nodes.................................11 
   Appendix B. Finding a Sub-graph Subject to Time Constraints......12 
   9. References....................................................14 
   10. Author's Addresses...........................................15 
    
    
1.   Overview 
    
   The issue of recovery (protection and restoration) in optical 
   switching networks is of high importance to ensure high-availability 
   and uninterrupted service.  Several mechanisms for protection in  
   mesh and ring topologies have been devised.  One draft by the 
   protection and restoration design team [1] looks at differences 
   between protection, restoration, path-based, link-based and span-
   based approaches.  Protection and restoration algorithms can be used 
   for local repair (link-based or node-based), span protection and path 
   protection.  This document describes a fault notification protocol 
   designed to ensure bounded recovery times, e.g., 50 ms recovery time, 
   which is comparable to recovery in ring-based SONET/SDH networks. 
      
   Reference [2] describes the terminology used for recovery in the case 
   of protection and restoration.  A link-based protection algorithm can 
   handle faults such as fiber link failures and transponder failures.  
   In the case of a node failure, the control plane uses either node-
   based or path-based recovery.  Span-based and path-based protection
   have the advantage of reducing wavelength redundancy (wavelengths
   that are reserved for possible failures), but the disadvantage of a
   potentially lengthy delay in notifying all nodes along the
   protection path of the failure of a remote
   resource.  In some applications, protection paths need to be chosen 
   carefully to meet some restoration time requirement (e.g., 50ms). 
    
   Several drafts being discussed by the Internet Engineering Task Force
   deal with signaling and methodology that can be used in a framework
   for guaranteeing a bounded protection and recovery time, but none
   provides hard guarantees for GMPLS-based SONET and WDM networks,
   although some have claimed results of fast reroute on MPLS-based IP
   networks.  For example, although several mechanisms and frameworks
   discuss requirements for 50 ms protection switching, they do not
   propose a process that achieves that goal.  This document addresses
   this issue.  It also addresses scalability, an important concern when
   using signaling for fault notification.
    
   This document presents a fault-notification protocol that is both 
   technology and topology agnostic.  It applies to intra-domain 
   protection.  Multi-domain protection is left for further study. 
 
   We assume unidirectional traffic through Label Switched Paths (LSPs) 
   and, where relevant, discuss applicability to bi-directional traffic, 
   which consists of two unidirectional LSPs.  For the purpose of 
   illustration, we also assume a mesh WDM network; applicability to 
   ring topology is automatic. 
    
    
2.   Terminology 
    
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
   document are to be interpreted as described in [3]. 
    
    
3.   Glossary of Terms Used 
    
   FDN: Failure Detection Node, defined in this draft as the node that
   detects the resource failure, either directly or by a signal from
   the optical/transport layer
    
   MEMS: Micro-Electro Mechanical Systems 
    
   PXC: Photonic Cross-Connect, a cross-connect that switches 
   wavelengths transparently, by means of a switching fabric such as 
   MEMS 
    
   AIS: Alarm Indication Signal, a signal at the SONET/SDH transport 
   layer

   BDI: Backward Defect Indication, a signal at the transport layer sent
   upstream
    
    
4.   Requirements at Protection Path Setup Time  
    
   When the protection algorithm calculates the protection path for a
   certain type of protection (local repair or end-to-end) in a GMPLS-
   enabled network, it distributes labels for that path to reserve, and
   possibly select, the link resources (wavelengths, wavebands, etc.)
   used for protection.  These link resources could be used to carry
   preemptible best-effort traffic to increase the utilization of the
   network when the protection path is not activated.  Alternatively,
   the same link resource may be reserved by multiple protection paths
   for different link failures as long as these protection paths do not
   need to be activated simultaneously (e.g., M:N shared protection).  In
   either case, the proper link resources need to be activated upon
   notification of the failure.
    
   When a label for a protection LSP is set up on a certain node A 
   through RSVP-TE or CR-LDP, node A must know what network resource 
   this LSP is protecting.  In the case of RSVP-TE for example, the 
   protection PATH message may notify all nodes on the protection path 
   with that information at path setup as in the case of [4].  This 
   allows node A to bundle labels (as well as its link resources) that 
   protect a particular network resource.  For example, if two labels j 
   and k correspond to two LSPs used to protect working paths from the 
   failure of link (X, Y), then they belong to the bundle L (X, Y).  
   This allows node A to jointly activate/cross-connect both LSPs 
   referenced by labels j and k when it receives notification of the 
   failure of link (X, Y).   
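
   As a minimal sketch of the bundling described above, the Python
   fragment below keeps, at a node, a table that maps each protected
   link to the set of protection labels in its bundle L (X, Y), and
   activates the whole bundle when one notification for that link
   arrives.  The class and method names (ProtectionNode,
   add_protection_label, on_failure) are hypothetical; they are not
   defined by this draft, RSVP-TE or CR-LDP.

```python
from collections import defaultdict


class ProtectionNode:
    """Hypothetical per-node table of protection labels (illustration only)."""

    def __init__(self):
        # Bundle L(X, Y): protected link -> set of protection labels.
        self.bundles = defaultdict(set)
        # Labels whose protection LSPs have been cross-connected.
        self.active = set()

    def add_protection_label(self, protected_link, label):
        # Called at protection path setup time, when the node learns
        # which network resource the protection LSP is protecting.
        self.bundles[protected_link].add(label)

    def on_failure(self, failed_link):
        # A single per-failure notification activates every label
        # in the bundle for the failed link.
        labels = self.bundles.get(failed_link, set())
        self.active |= labels
        return sorted(labels)


node_a = ProtectionNode()
node_a.add_protection_label(("X", "Y"), "j")
node_a.add_protection_label(("X", "Y"), "k")
node_a.add_protection_label(("U", "V"), "m")
activated = node_a.on_failure(("X", "Y"))
print(activated)  # ['j', 'k']
```

   Label m, which protects a different link, stays inactive: one
   notification per failure is enough to select exactly the labels
   that need to be cross-connected.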
    
   This document proposes a method for per-failure fault notification
   (as opposed to per-LSP fault notification), hence such bundled label
   information is essential.  The main difference between "per-failure"
   and "per-LSP" notification is the number of notification mechanisms
   that need to be started.  Per-failure fault notification allows a
   single mechanism to notify all relevant nodes of the fault.  With
   per-LSP notification, on the other hand, as many mechanisms have to
   be engaged as there are failed LSPs (for example, all LSPs that
   failed due to a link failure).
    
    
5.   Protocol Steps in Failure Notification and Service Recovery

   The steps described in this section present a fault notification
   protocol that details the process used in notifying nodes of the
   resource failure and activating the protection lightpaths.  The
   failure sequence is based on the Internet-Draft [5], adapted to WDM
   networks; it is similar to the timing sequence in the ITU-T
   communication entitled G.GPS.  The timing diagram in Figure 1 is
   reproduced from [5] with modifications.  The critical component in
   guaranteeing time constraints on service recovery is the fault
   notification process.  The following sequence of events MUST be
   followed in order to ensure that the recovery process happens within
   a specific amount of time, as is the case in SONET/SDH-based
   networks.
    
    
    --Network Impairment  
    |    --Fault Detected  
    |    |    --Start of Notification   
    |    |    |    --Recovery Operation Complete 
    |    |    |    |    --Path Traffic Restored 
    |    |    |    |    |  
    |    |    |    |    |  
    v    v    v    v    v  
   ------------------------------------------------  
    | T1 | T2 | T3 | T4 |  
    
   Figure 1. Recovery Timing Diagram 
    
    
5.1    T1: Fault Detection Time 
    
   This is the period of time between the network impairment and the 
   detection at the control plane.  We define the Failure Detection Node 
   (FDN) to be the node at which the detecting entity detects the fault.  
   An example of such network impairment is a fiber cut.  Layer 1 at a 
   certain node detects the fault and passes it to the control plane.  
   This document assumes that equipment in the optical network can 
   detect such failures.  This time is not included in the calculation 
   of the recovery time.  In general, when a full-duplex link is cut,
   two nodes detect the fault, one upstream of the fault and the
   other downstream.  In the case of a unidirectional link failure, only
   the downstream node detects the failure directly.  That node then
   sends a transport-layer signal, such as the Backward Defect
   Indication (BDI) defined in ITU-T G.709, to the upstream node, which
   also acts as an FDN.  We assume that the time difference between
   direct detection and inference based on BDI is negligible.  Other
   transport plane technologies MUST offer the same capability to be
   used in this context.  In both cases, therefore, both the upstream
   and downstream nodes learn of the failure.

   To support the failure detection requirement, nodes MUST implement
   per-channel monitoring that will pinpoint the failure and report it
   to the detecting entity.
    
5.2    T2: Hold-Off Time 
    
   This is the period of time that elapses before the node that has 
   detected the fault starts the fault-recovery process.  This time 
   assumes that the fault-recovery process at a given layer (for 
   example, the recovery protocol outlined in this draft) may wait for 
   recovery to occur at another layer (for example, SONET/SDH 
   protection).  In the case of WDM-based protection, this time should 
   be 0 sec since there is no underlying protection layer that will kick 
   in.  
    
   In the case of a GMPLS-enabled IP network over SONET, T2 may be set
   to 50 ms so that the SONET protection scheme can activate before any
   IP (MPLS) layer protection is triggered.  For GMPLS-enabled SONET
   over WDM, the choice is more complicated.  Protection mechanisms such
   as SONET/SDH protection could be used in the same environment in
   conjunction with WDM-based protection by picking either protection
   mechanism or no protection at all.  Allowing redundant protection
   mechanisms for the same lightpath may increase the recovery time.
   The SONET/SDH layer, if it exists, decides whether to request a
   protected or an unprotected lightpath from the WDM layer to connect
   the SONET equipment.
    
5.3    T3: Fault Notification and Completion of Recovery Operation 
    
   T3 is the period between the time when the FDN starts sending out a
   fault notification message and the time when every node on the
   corresponding protection paths, including the ingress nodes, has been
   notified of the failure and has finished reconfiguring itself to
   carry restored traffic.  The ingress node to the protection LSP is
   the same node as
   the FDN in the case of link-based protection.  The ingress node 
   SHOULD be as close to the link failure as possible.  This reduces the 
   recovery time since no messages have to be relayed to a remote or 
   centralized authority to initiate that recovery.  
    
   Some ingress or egress nodes may detect a failure, for example, a 
   Loss of Light (LoL) event.  The fault notification message MUST be 
   initiated by the FDN even if the ingress and egress nodes have 
   detected the error.  This allows the fault notification mechanism to 
   solve for the worst-case scenarios and gives timely notification of 
   all concerned nodes on the protection path(s).  For the purpose of 
   this draft, transport plane signals such as the AIS (Alarm Indication
   Signal) and the BDI will be disregarded by all OXCs except the FDNs.  
   It is to be noted that fault notification occurs at the control plane 
   to minimize layer interaction. 
    
   The FDN MAY use several fault notification methods to notify other 
   nodes of the failure, including GMPLS-based signaling and flooding. 
   In the case of GMPLS-based signaling, there is generally one fault 
   notification message per disrupted Label Switched Path.  Hence, 
   signaling does not scale well with the number of connections; in 
   addition, the message processing delay is less predictable. For 
   details about the notification methods and the choice of flooding for 
   this draft, the reader is encouraged to refer to [6]. This document 
   specifies a notification protocol based on message flooding. 
    
   In the case of flooding, the message sent from the FDN to all other
   nodes on the different protection paths should reach them within the
   specified recovery time Trec minus the reconfiguration time Tcfg
   needed at each node after fault notification.  We define this fault
   notification time as Tntf = Trec - Tcfg.  This is explained in detail
   in Appendix B.
    
   Nodes on a protection path (including the ingress node) are aware 
   that they are protecting against the failure of a particular resource.  
   All nodes notified of the failure will activate the protection path 
   by performing any required hardware reconfiguration (for example, 
   moving mirrors in the case of a MEMS-based switching fabric).  It is 
   important for the reconfiguration to happen in parallel.  If parallel
   reconfiguration is not available, a protection algorithm that
   protects resources on physically disjoint protection LSPs needs to be
   used.  The ingress node starts sending data on the protection path
   at the start time S(I) specified in the next paragraph.  If the
   detecting entity at the ingress or egress node detects, at the data
   plane, a failure of the protection lightpath to be activated, it MUST
   raise an alarm that may be dealt with at the management plane.  The
   management plane will take appropriate remediation action.  Alarm and
   remediation are outside the scope of this draft.
    
   The nodes on protection paths receive the fault notification from the 
   FDN, within a deterministic time.  This time delay is calculated by 
   each node as explained in Appendix A.  In order to avoid complex 
   clock synchronization implementations, an ingress node identified as 
   node I that receives the notification from FDN node J will calculate 
   the start time S(I) at which it switches traffic to the protection 
   path as follows: 
    
      S(I) = time-of-notification(I) - min-delay-between(J, I) + Trec

      where
    
         time-of-notification(I) returns the clock time at node I when
                  the notification is received;
         min-delay-between(J, I) returns the minimum time needed for the
                  notification from J to reach node I.
    
   Note that (time-of-notification (I) - min-delay-between(J, I)) will 
   give the time when failure was detected at J, and Trec is the 
   recovery time requirement.  For simplicity, in this example, we 
   assume that hardware reconfiguration time and fault notification time 
   have been considered during the protection path set up, which will be 
   considered in Section 6. Hence at S(I), every node on the protection 
   paths should have been notified of the failure and finished 
   reconfiguration. 
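
   The start-time rule above, together with the notification budget
   Tntf = Trec - Tcfg, can be sketched in Python as follows.  The
   function names and all numeric values (a 50 ms Trec, a 10 ms Tcfg,
   a 3.5 ms minimum delay and a local clock reading of 1000 ms) are
   assumptions chosen only for illustration.

```python
def notification_budget(t_rec, t_cfg):
    """Tntf = Trec - Tcfg (same time unit for all values)."""
    return t_rec - t_cfg


def start_time(time_of_notification, min_delay_from_fdn, t_rec):
    """S(I) = time-of-notification(I) - min-delay-between(J, I) + Trec.

    time_of_notification is read from node I's own clock, so no
    clock synchronization with the FDN (node J) is needed.
    """
    return time_of_notification - min_delay_from_fdn + t_rec


# Illustrative values only, in milliseconds.
t_ntf = notification_budget(t_rec=50.0, t_cfg=10.0)
s_i = start_time(time_of_notification=1000.0,
                 min_delay_from_fdn=3.5,
                 t_rec=50.0)
print(t_ntf, s_i)  # 40.0 1046.5
```

   Subtracting the minimum delay moves the local reading back to an
   estimate of the detection time at J; adding Trec then gives the
   instant at which node I switches traffic.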
    
   Fault notification is carried out through message flooding as follows.  
   The FDN sends a notification packet to its neighbors on all outgoing 
   links.  The notification packet is a high-priority packet.  The 
   packet contains the unique global identifier of the link at fault.  
   Each node that receives such a packet duplicates it on all of its
   outgoing links to neighboring nodes (except the links to the node it
   came from and to any other node from which it has already received a
   copy), and sends the duplicates to those neighbors.  The node also
   sends an acknowledgement back on the link on which it received the
   message.
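
   The flooding rule can be illustrated with a small Python simulation.
   This is a sketch under several assumptions that the draft does not
   state: each node re-floods only the first copy it receives, copies
   are handled in FIFO order rather than by actual link delay, the
   adjacency table describes the surviving control-channel topology,
   and acknowledgements are omitted.  The function name flood and the
   four-node ring are hypothetical.

```python
from collections import deque


def flood(adjacency, fdn):
    """Simulate flooding of one fault notification from the FDN.

    adjacency: {node: [neighbors]} for the control-channel topology.
    Returns (received_from, messages): for each node, the set of
    neighbors it received a copy from, and the total copies sent.
    """
    received_from = {node: set() for node in adjacency}
    forwarded = {fdn}  # nodes that have already sent their duplicates
    messages = 0
    # Each queue entry is one copy in flight: (sender, receiver).
    in_flight = deque((fdn, n) for n in adjacency[fdn])
    while in_flight:
        sender, receiver = in_flight.popleft()
        messages += 1
        received_from[receiver].add(sender)
        if receiver in forwarded:
            continue  # assumption: later copies are not re-flooded
        forwarded.add(receiver)
        for neighbor in adjacency[receiver]:
            # Skip the node(s) this receiver already got a copy from.
            if neighbor not in received_from[receiver]:
                in_flight.append((receiver, neighbor))
    return received_from, messages


ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
received_from, messages = flood(ring, "A")
print(messages, {n: sorted(s) for n, s in received_from.items()})
```

   In this ring every node other than the FDN receives at least one
   copy, and the number of copies sent depends only on the topology,
   not on the number of LSPs disrupted by the failure.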
    
5.3.1      Delays Incurred by Messages 
    
   The above discussion suggests that in order for the protection 
   algorithm to abide by the Trec ms recovery requirement, it needs to 
   be either: 
    
      1. Aware of timing issues to be able to select a proper path. 
      2. Passed a set of nodes and links that satisfy the timing 
      constraints. 
    
   The protection algorithms found in the literature work by computing a 
   protection path for the working paths that require protection.  All 
   resources identified in the network topology are usually used in 
   calculating the protection paths.  In contrast, a modified protection 
   algorithm should be executed in the case of a strict requirement on 
   the recovery time.  For example, a pruned topology should be 
   considered for protection path computation.  A database of link 
   information should hold the fiber physical length and the capacity of 
   each link (or channel) as well as the notification message processing 
   time.  The total time needed by a notification packet to travel from 
   source to destination can be broken into two types of delay: the time
   needed to traverse each link and the time needed to go through each 
   node.  The different delay calculations are discussed in Appendix A. 
    
5.4    T4: Traffic Recovery Time 
    
   This is the time between the last recovery action and the time that 
   the traffic (if present) is completely recovered.  This interval is 
   intended to account for the time required for traffic to once again 
   arrive at the point in the network that experienced disrupted or 
   degraded service due to the occurrence of the fault, i.e. the egress 
   node. 
    
    
6.   Finding the Time-Constrained Protection Path 
    
   In [6], we outline two methods for dealing with recovery time
   constraints for the protection path and pick the most suitable.  The
   first method consists of choosing a sub-graph G' of the original
   graph G where path calculation will yield a protection path that can
   be used within a recovery time of Trec milliseconds.  The second
   method consists of changing the protection algorithm to make it aware
   of time constraints while computing a protection path.  Because of
   the complexity of the second method, this document recommends the
   first method and explains it in detail below.
    
6.1     Finding a Sub-graph for the Protection Algorithm 
    
   This method relies on the view of the current network topology, 
   including node distances and channel speeds.  We can run a shortest 
   path algorithm such as the Dijkstra or Bellman-Ford algorithm to 
   implement this method.  When computing shortest paths, for every link 
   that is considered, the metric used in calculating a new shortest 
   path will associate the time delay incurred to traverse that link, 
   instead of just computing hop count.  This of course assumes 
   knowledge of the delays incurred while traversing the links and nodes 
   and is equivalent to finding the fastest path.  A delay metric 
   associated with each link allows the Dijkstra algorithm to keep track 
   of total delay incurred on links and nodes.  We assume an upper delay 
   limit of Trec (ms).  The algorithm shown in Appendix B is considered 
   with applicability to both link and path protection.  Note that we 
   use the words link and arc interchangeably to mean directed link. 
    
   This ensures that the paths returned by the shortest path algorithm 
   can be activated within time Trec after the failure is detected.  Of 
   course, it does not guarantee that such a path can be found.  The 
   functioning of the algorithm is to determine the set of nodes and 
   arcs that can be used in calculating the protection path, which will
   guarantee that the protection path will be activated within time Trec.  
   The protection algorithm can then calculate protection paths using 
   sub-graph G' rather than graph G for either link or path protection. 
    
   In the case of link-based protection, the algorithm just needs to 
   define the sub-graph that can be reached by any of the two nodes at 
   the endpoints of the link.  In the case of path-based or span-based 
   protection, the algorithm checks that irrespective of where the fault 
   happens, the fault recovery process can activate the protection 
   path(s) within time Trec.  This method identifies eligible nodes, but 
   does not guarantee that the algorithm will find a path that satisfies 
   the delay constraint, since once the graph is pruned of some nodes 
   that it could use, it has less choice in finding a protection path.  
   The advantage, though, is that the protection algorithm will run over
   a smaller graph G' than the original graph G, leading to a speedup in
   the protection path computation.  A side effect of this algorithm is
   that the protection algorithm will find a path in the "neighborhood"
   of the failed link.
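
   A minimal Python sketch of the pruning for link-based protection
   follows, modeled on the procedures given in Appendix B.  It uses
   Dijkstra's algorithm with cost (I, J) = d proc (I) + d queue max (I)
   + d prop (I, J).  Assumptions not stated in this draft: both
   directions of the failed link are pruned, the source node is itself
   counted in its neighborhood, and the example topology and delay
   values (in ms) are invented for illustration.

```python
import heapq


def node_neighborhood_for_failure(arcs, d_node, src, failed_arc, t_ntf):
    """Nodes reachable from src within t_ntf once failed_arc is pruned.

    arcs:   {(i, j): propagation delay of directed arc (i, j)}
    d_node: {n: d_proc(n) + d_queue_max(n)}
    """
    pruned = {failed_arc, failed_arc[::-1]}  # assumption: both directions fail
    out = {}
    for (i, j), d_prop in arcs.items():
        if (i, j) not in pruned:
            # cost(I, J) = d_proc(I) + d_queue_max(I) + d_prop(I, J)
            out.setdefault(i, []).append((j, d_node[i] + d_prop))
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in out.get(u, []):
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    # Add the delays incurred at the last node, then apply the Tntf bound.
    return {x for x, d in dist.items() if d + d_node[x] < t_ntf}


def prune_for_link_protection(arcs, d_node, i, j, t_ntf):
    """Union of the neighborhoods of both endpoints of link (i, j)."""
    return (node_neighborhood_for_failure(arcs, d_node, i, (i, j), t_ntf)
            | node_neighborhood_for_failure(arcs, d_node, j, (j, i), t_ntf))


links = {("A", "B"): 0.5, ("A", "C"): 0.5, ("B", "D"): 0.5,
         ("C", "D"): 0.5, ("D", "E"): 3.0}
arcs = {}
for (i, j), d in links.items():
    arcs[(i, j)] = d
    arcs[(j, i)] = d
d_node = {n: 0.25 for n in "ABCDE"}
reachable = prune_for_link_protection(arcs, d_node, "A", "B", t_ntf=3.0)
print(sorted(reachable))  # ['A', 'B', 'C', 'D']
```

   Node E lies behind a 3 ms link and cannot be notified within the
   3 ms budget, so it is excluded from the sub-graph handed to the
   protection algorithm.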
    
6.2    Reversion After Fixing the Failure 
    
   Most of the current literature recommends that when the failed link 
   or node is back online, then traffic should be moved back to the 
   original path, as it is more efficient.  This can be achieved by 
   having the LSP stop using the protection path and revert to the 
   previously unavailable working path. 
    
    
7.   Security Considerations  
     
   This draft makes use of several protocols; therefore this draft does 
   not introduce any new security issues besides the ones that arise in 
   the use of these protocols. 
    
    
8.   Conclusion 
    
   This draft discusses a fault notification and service recovery 
   protocol for GMPLS-enabled optical networks.  It presents the steps 
   required in the notification process, leading to lightpath service 
   recovery within specific time bounds.  The protocol proposes an
   algorithm to find the subset of nodes in the network that can be used
   to pick a protection path that can be activated within that time
   bound.
    
    
Appendix A.            Fault Notification Message Delays on Path

   This appendix describes the delays incurred on the path.  Two types
   of delays occur on the path between any two nodes.  They are delays 
   incurred during traversal of the links on that path, and delays that 
   occur at the nodes along the path.  The following presents the 
   computations and expected values for the different delays. 
    
A.1    Delays Associated with Link Traversal 
    
   The time needed to traverse each link is the sum of the transmission 
   time and the link propagation delay: 
    
   1. The transmission time is a value based on link capacity.  The 
      calculation is as follows: D trans = (packet size) / (link 
      speed). 
   2. The link propagation delay is due to the physical length of the 
      link: D prop = length / (light propagation speed on fiber). 
    
   The notification packet size is dependent on its content, and will be 
   provided after information about the packet is presented.  The length 
   of such a packet is usually of the order of a hundred bytes (about 
   10^3 bits).  As an example, for a link speed of 1 Gbps, 
    
   D trans ~= 10^3 / 10^9 = 10^-6 s = 1 micro-second. 
    
   This value therefore can safely be ignored in calculating delays.  On 
   the other hand, the link propagation delay in metropolitan area and 
   long-haul networks affects total delay.  For a distance of 100 km,
   with the speed of light in fiber at about 2/3 of its speed in free
   space (about 200,000 km/s), this delay would be 0.5 ms.
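
   The two link delays can be checked with a few lines of Python.  The
   function names are hypothetical; the input values are the ones used
   in the text above (a packet of about 10^3 bits, a 1 Gbps link, a
   100 km span, and about 200,000 km/s in fiber).

```python
FIBER_SPEED_KM_PER_S = 200_000  # about 2/3 of the speed of light in free space


def transmission_delay_s(packet_bits, link_bps):
    """D trans = (packet size) / (link speed)."""
    return packet_bits / link_bps


def propagation_delay_s(length_km, speed_km_per_s=FIBER_SPEED_KM_PER_S):
    """D prop = length / (light propagation speed on fiber)."""
    return length_km / speed_km_per_s


d_trans = transmission_delay_s(1_000, 1_000_000_000)
d_prop = propagation_delay_s(100)
print(f"D trans = {d_trans * 1e6:.1f} us, D prop = {d_prop * 1e3:.1f} ms")
```

   The propagation delay is about 500 times the transmission time in
   this example, which is why only the former is kept in the delay
   budget.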
    
A.2    Delays Incurred at the Nodes 
    
   At each node, two delays are important: queuing delay and processing 
   time.  The processing time D proc has been identified in the 
   literature as a few tenths of a millisecond in the case of an RSVP 
   object.  This value is smaller in the case of a simpler IP packet 
   requesting the activation of an LSP path. 
    
   The issue of queuing delay is important at all intermediate nodes.
   Fault notification messages should be queued at the front of the
   buffer that holds other control packets in order to avoid queuing
   delays.  (Those messages do not have to contend with data packets,
   since no data are sent in the control channel.)  A queuing
   process such as priority queuing would allow those packets to be 
   admitted at the head of the queue, through the setup of the priority 
   of the packet.  A simple mechanism such as the setup of the priority 
   bits at the IP header, such as the IP precedence bits or DSCP code
   points of the TOS (Type Of Service) byte would be appropriate.  Using 
   priority queuing for fault notification messages will ensure that the 
   queuing delay will be bounded.  In the case of flooding for fault 
   notification, D queue(a) = 0 sec.  If other fault notification 
   messages are in the queue as well, this implies multiple failures, 
   where the time recovery guarantee does not apply.  Otherwise, it may 
   indicate the fact that multiple messages are traveling on different 
   protection paths to notify the same link failure, such as the case 
   when a signaling protocol is used for fault notification.  In the 
   case of per-LSP fault notification just as in the case of using a 
   signaling protocol, the maximum queuing delay at node a is: 
    
   D queue max(a)= (number of protection paths) * (packet size) / (link 
   bandwidth). 
    
   This explains mathematically the choice against using a signaling 
   protocol for fault notification.  Flooding allows that value to be 0 
   sec.  In the absence of priority queuing, the maximum queue delay can 
   be calculated as follows at node a, assuming fair queuing of the FIFO 
   buffers of all control channels and assuming input buffers only: 
    
   D queue max(a)= (number of queues) * (queue size) / (link bandwidth). 
    
   This value is an upper bound, and is dependent on hardware buffer 
   implementations. 
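
   The two upper bounds above can be evaluated as in the following
   Python sketch.  The function names and every input value (1000
   protection paths, 10^3-bit packets, four queues of 10^6 bits each,
   a 1 Gbps control link) are assumptions for illustration, and sizes
   are taken to be in bits.

```python
def max_queue_delay_per_lsp_s(num_protection_paths, packet_bits, link_bps):
    """Per-LSP notification: (number of protection paths) * (packet size)
    / (link bandwidth)."""
    return num_protection_paths * packet_bits / link_bps


def max_queue_delay_no_priority_s(num_queues, queue_bits, link_bps):
    """No priority queuing: (number of queues) * (queue size)
    / (link bandwidth)."""
    return num_queues * queue_bits / link_bps


# Illustrative values only.
per_lsp = max_queue_delay_per_lsp_s(1_000, 1_000, 1_000_000_000)
no_priority = max_queue_delay_no_priority_s(4, 1_000_000, 1_000_000_000)
print(f"{per_lsp * 1e3:.1f} ms, {no_priority * 1e3:.1f} ms")
```

   With these inputs the per-LSP bound is 1 ms and the bound without
   priority queuing is 4 ms at a single node; both accumulate at every
   hop, whereas flooding with priority queuing keeps the queuing term
   at 0 sec.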
    
    
Appendix B.            Finding a Sub-graph Subject to Time Constraints 
    
   For simplicity, we assume the recovery time constraint is Trec 
   seconds and the reconfiguration time at each node once notified of 
   the failure is Tcfg.  Thus, the allowable fault notification time 
   Tntf = Trec - Tcfg. 
     
   Initialization: 
      G (N, A) is the graph containing the set N of nodes n and set A 
      of arcs (I, J) representing the control channel links 
      // Note that the links in this graph could be different from the 
      // links used for the data channels 
    
      For every link (I, J) in A 
         Set cost (I, J) = d proc (I) + d queue max (I) + d prop (I, J) 
      // this sets all link costs to a delay value that includes the 
      // delays incurred at the node I. 
    
      node-neighborhood-for-failure (I, (I, J)) is the set of nodes that
      can be notified by node I within Tntf if link (I, J) fails.  It is
      initially set to {}.
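
   The cost assignment in the initialization step could be sketched in
   Python as follows.  The function name build_link_costs, the
   three-node topology, and the delay values (integers, in
   milliseconds) are hypothetical and serve only to show how each arc
   cost combines the delays incurred at node I with the propagation
   delay of link (I, J).

```python
def build_link_costs(arcs, d_proc, d_queue_max, d_prop):
    """Return {(i, j): cost} with
    cost(i, j) = d_proc(i) + d_queue_max(i) + d_prop(i, j)."""
    return {
        (i, j): d_proc[i] + d_queue_max[i] + d_prop[(i, j)]
        for (i, j) in arcs
    }


# Hypothetical control-channel graph A - B - C, delays in milliseconds.
arcs = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")]
d_proc = {"A": 1, "B": 2, "C": 1}
d_queue_max = {"A": 0, "B": 0, "C": 0}   # 0 with flooding and priority queuing
d_prop = {("A", "B"): 5, ("B", "A"): 5, ("B", "C"): 3, ("C", "B"): 3}

cost = build_link_costs(arcs, d_proc, d_queue_max, d_prop)

# node-neighborhood-for-failure starts out empty for every (node, link).
neighborhood = {}

print(cost)
```

   Note that the cost is directional: arc (A, B) carries the delays of
   node A, while arc (B, A) carries those of node B.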

 
   Procedure Find-node-neighborhood-for-failure (I, (I, J)) 
   Begin 
      Run a shortest path algorithm from node I on graph G with link
      (I, J) pruned
       
      // This results in a set SC that contains, for each node X, 
      // the smallest cost (minimum delay) from node I.
      // min-delay-between (I, X) is defined in section 5.3 as the
      // minimum time (smallest cost) needed for a message to travel 
      // from I to X.
       
      For every entry for node X in SC, 
         min-delay-between (I, X) += d proc (X) + d queue max (X) 
         // add the delays incurred at the last node 
       
      // SC now contains the shortest time from node I to every other
      // node, including the processing and queuing delays at that node,
      // when link (I, J) has failed
      For every entry for node X in SC,
         if min-delay-between (I, X) < Tntf, then
            Add node X to node-neighborhood-for-failure (I, (I, J))
    
      // If the message cannot go from I to X within Tntf sec, then 
      // node X is not reachable within the time constraints
   End 
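
   A minimal Python sketch of the procedure above is given below.  It
   uses Dijkstra's algorithm as the shortest path algorithm, which is
   one possible choice; the draft does not mandate a particular one.
   The function name, the four-node ring, the delay values (integers,
   in milliseconds), and the decision to prune the failed link in both
   directions are assumptions made for the example.

```python
import heapq


def find_node_neighborhood(cost, d_proc, d_queue_max, src, failed_link, t_ntf):
    """Nodes whose total notification delay from src is below t_ntf
    once failed_link has been pruned from the control graph."""
    i, j = failed_link
    # Build adjacency lists, skipping the failed link in both directions
    # (assumption: the control channel is bidirectional).
    adj = {}
    for (u, v), c in cost.items():
        if (u, v) in ((i, j), (j, i)):
            continue
        adj.setdefault(u, []).append((v, c))

    # Dijkstra's algorithm from src; dist plays the role of SC.
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))

    # Add the delays incurred at the last node, then apply the Tntf bound.
    # src is treated like any other entry of SC.
    return {x for x, d in dist.items()
            if d + d_proc[x] + d_queue_max[x] < t_ntf}


# Hypothetical ring A - B - C - D - A; every arc costs 1 + 0 + 4 = 5 ms.
ring = ["A", "B", "C", "D"]
d_proc = {n: 1 for n in ring}
d_queue_max = {n: 0 for n in ring}
cost = {}
for k, u in enumerate(ring):
    v = ring[(k + 1) % len(ring)]
    cost[(u, v)] = d_proc[u] + d_queue_max[u] + 4
    cost[(v, u)] = d_proc[v] + d_queue_max[v] + 4

near_a = find_node_neighborhood(cost, d_proc, d_queue_max,
                                "A", ("A", "B"), t_ntf=12)
print(sorted(near_a))
```

   In this example, node B can only be reached the long way around the
   ring once link (A, B) is pruned, so it falls outside the
   neighborhood of node A for Tntf = 12 ms.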
    
   Procedure Prune-for-link-protection (I, J) 
   Begin 
      Find-node-neighborhood-for-failure (I, (I, J)) 
      Find-node-neighborhood-for-failure (J, (J, I)) 
      Set reachable (I, J) = union of  
               node-neighborhood-for-failure (I, (I, J)) 
               and node-neighborhood-for-failure (J, (J, I)) 
      // This is the set of nodes reachable from either endpoint of 
      // link (I, J) within time Tntf
      Return reachable (I, J) 
   End 
    
   Procedure Prune-for-path-protection (P (I1, I2,... In)) 
   // For span protection, replace "path" with "span" in this procedure
   Begin
      Set N' = N
      // initially, all nodes
      For every link (Ik, Ik+1) of path P
      Begin
         Prune-for-link-protection (Ik, Ik+1)
         Set N' = intersection of N' and reachable (Ik, Ik+1) 
      End
      Return N'
   End
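
   The two pruning procedures reduce to a set union per link and a set
   intersection along the path.  The Python sketch below shows this
   under the assumption that the per-failure neighborhoods are supplied
   by a callable; here a hand-written lookup table stands in for
   Find-node-neighborhood-for-failure.  The function names, the node
   names, and the table contents are hypothetical.

```python
def prune_for_link_protection(neighborhood, i, j):
    """Union of the neighborhoods of both endpoints of link (i, j)."""
    return neighborhood(i, (i, j)) | neighborhood(j, (j, i))


def prune_for_path_protection(neighborhood, all_nodes, path):
    """Intersection, over every link of the path, of reachable(Ik, Ik+1)."""
    remaining = set(all_nodes)          # N' = N, all nodes initially
    for i, j in zip(path, path[1:]):    # consecutive links (Ik, Ik+1)
        remaining &= prune_for_link_protection(neighborhood, i, j)
    return remaining


# Hypothetical precomputed neighborhoods, keyed by (node, failed link).
TABLE = {
    ("A", ("A", "B")): {"A", "D"},
    ("B", ("B", "A")): {"B", "C"},
    ("B", ("B", "C")): {"B", "E"},
    ("C", ("C", "B")): {"C", "D"},
}


def lookup(node, link):
    """Stand-in for Find-node-neighborhood-for-failure."""
    return TABLE[(node, link)]


nodes = {"A", "B", "C", "D", "E"}
reachable_ab = prune_for_link_protection(lookup, "A", "B")
candidates = prune_for_path_protection(lookup, nodes, ["A", "B", "C"])

print(sorted(reachable_ab))
print(sorted(candidates))
```

   Only nodes that can be notified in time for a failure of every link
   on the path remain in the candidate set, which is why the path case
   is more restrictive than the single-link case.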

 
   Running for link (I, J) protection: 
      Prune-for-link-protection (I, J) 
      Set N' = reachable (I, J) 
      Run link protection algorithm considering only nodes in set N' and 
         any data links between them (not necessarily the same links as  
         the control channels) 
    
   Running for Path P (I1, I2,... In) protection: 
      Prune-for-path-protection (P (I1, I2,... In)) 
      Run path protection algorithm considering only nodes in set N' and 
         any data links between them 
    
   Running for Span S (I1, I2,... In) protection: 
      Prune-for-span-protection (S (I1, I2,... In)) 
      Run span protection algorithm considering only nodes in set N' and 
         any data links between them 
    
    
9.   References  
                     
   [1] Papadimitriou, D., et al, "Analysis Grid for GMPLS-based Recovery 
      Mechanisms (including Protection and Restoration)", Internet 
      Draft, work in progress, draft-papadimitriou-ccamp-gmpls-recovery-
      analysis-02.txt, August 2002. 
    
   [2] Mannie, E., et al, "Recovery (Protection and Restoration) 
      Terminology for GMPLS", Internet Draft, work in progress, draft-
      ietf-ccamp-gmpls-recovery-terminology-01.txt, November 2002. 
     
   [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement 
      Levels", BCP 14, RFC 2119, March 1997. 
    
   [4] Li, G., J. Yates, et al, "Experiments in Fast Restoration using 
      GMPLS in Optical/Electronic Mesh Networks", Post-deadline Papers 
      Digest, OFC 2001, Anaheim, CA, March 2001. 
    
   [5] Sharma, V., et al, "Framework for MPLS-based recovery", Internet 
      Draft, draft-ietf-mpls-recovery-frmwrk-08.txt, October 2002, 
      approved as Informational RFC. 
    
   [6] Rabbat, R. et al, "Fault Notification and Service Recovery in WDM 
      Networks", white paper available at: 
      http://perth.mit.edu/~richard/wp-ietf-fault-notification.pdf. 
    

10.    Authors' Addresses  
        
   Richard Rabbat                      Ching-Fong Su 
   Fujitsu Labs of America, Inc.       Fujitsu Labs of America, Inc. 
   595 Lawrence Expressway             595 Lawrence Expressway 
   Sunnyvale, CA 94085                 Sunnyvale, CA 94085 
   United States of America            United States of America 
   Phone: +1-408-530-4537              Phone: +1-408-530-4572 
   Email: rabbat@fla.fujitsu.com       Email: csu@fla.fujitsu.com 
    
   Norihiko Shinomiya                  Vishal Sharma  
   Fujitsu Laboratories Ltd.           Metanoia, Inc.  
   1-1, Kamikodanaka 4-Chome           305 Elan Village Lane, Unit 121  
   Nakahara-ku, Kawasaki               San Jose, CA 95134-2545  
   211-8588, Japan                     United States of America 
   Phone: +81-44-754-2635              Phone: +1-408-955-0910  
   Email: shinomi@jp.fujitsu.com       Email: v.sharma@ieee.org 
    
   Takafumi Chujo                      Akira Chugo 
   Fujitsu Labs of America, Inc.       Fujitsu Laboratories Ltd. 
   595 Lawrence Expressway             1-1, Kamikodanaka 4-Chome 
   Sunnyvale, CA 94085                 Nakahara-ku, Kawasaki 
   United States of America            211-8588, Japan 
   Phone: +1-408-530-4507              Phone: +81-44-754-2629 
   Email: takafumi@fla.fujitsu.com     Email: chugo@flab.fujitsu.co.jp 
    
