CCAMP Working Group                 Richard Rabbat (Fujitsu Labs of America)
Internet Draft                                Vishal Sharma (Metanoia, Inc.)
Expires: August 2004                               Zafar Ali (Cisco Systems)
                                                               February 2004

   Expedited Flooding for Restoration in Shared-Mesh Transport Networks
                  draft-rabbat-expedited-flooding-01.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

Optical transport networks require fast restoration mechanisms with tight time bounds in order to recover from resource failures in a timely manner. These failures may include fiber cuts, transponder failure and node failures. Time-bounded recovery is challenging in mesh-based transport networks with shared protection. This draft discusses some currently available mechanisms and their limitations, and explains the need for an expedited flooding mechanism to accomplish this objective. The draft also highlights some challenges, mitigating factors, and possible solutions in the implementation of an expedited flooding protocol.

Rabbat, R., et al Expires - August 2004 [Page 1] draft-rabbat-expedited-flooding-01.txt February 2004

Table of Contents

1. Introduction
2. Terminology
3. Restoration in Packet Versus Shared Mesh Transport Networks
4. Signaling versus Flooding Notification for Transport Networks
   4.1 Signaling-Based Notification
   4.2 Flooding-Based Notification
   4.3 Comparison Between Signaling- and Flooding-Based Notification
   4.4 Alternative Notification Method
5. Expedited Flooding for Notification
   5.1 Operation upon Fault Repair
   5.2 Impact of Expedited Flooding on Network Operation
   5.3 Graceful Degradation
       5.3.1 Loss of Notification Messages
       5.3.2 Multiple Fiber Cuts
6. Conclusion
7. Intellectual Property Considerations
8. References
9. Authors' Addresses
10. Full Copyright Statement

1. Introduction

With networks evolving towards a packet data layer operating on an optical transport layer controlled by an IP-based control plane, the nature and type of restoration needed at each layer is changing as well. Time-constrained recovery and protocol scalability are crucial in order to reduce service interruption and eliminate duplication of fault notification.

We illustrate how recovery strategies at the data and transport layers have important differences based on their goals. A key mechanism during recovery is fault notification, which conveys information from the node detecting the fault to the nodes responsible for activating the protection path. We consider solutions at the transport layer and present a comparative analysis of a signaling-based approach to notification and flooding-based notification.

Section 3 describes differences between restoration mechanisms in packet data networks and restoration in shared-mesh optical transport networks. Section 4 discusses fault notification using a signaling-based and a flooding-based approach, respectively. We study the scalability and impact on recovery time of both solutions. Section 5 describes the qualities of expedited flooding for notification and highlights its advantages as compared to the flooding mechanisms of unmodified or native link-state protocols. It also describes how expedited flooding can be used in the case of multiple concurrent fiber cuts and supports graceful degradation. Section 6 concludes this draft.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [2].

3. Restoration in Packet Versus Shared Mesh Transport Networks

In this section, we point out certain key differences between restoration in packet and transport networks.

In MPLS packet networks, one can pre-signal and pre-configure a backup LSP for a working LSP. This is because selecting a label for a backup LSP at a node is sufficient to be able to switch traffic for that LSP when that traffic arrives. If resources (buffers and bandwidth) are required for the backup LSP, they can also be reserved in advance (during the LSP signaling phase). As long as there is no failure on the working LSP, these same resources can still be used by low-priority or extra-traffic LSPs.

This holds true even for shared mesh restoration in MPLS networks. In this case, multiple labels are assigned, one for each of the backup LSPs transiting a node (corresponding to the link- or node-disjoint working LSPs that they protect) and using the shared backup path.
But in this case, only one set of resources (buffers, bandwidth) needs to be reserved.

When we consider transport networks, the situation is different. Now, a backup LSP can be pre-signaled but not pre-configured (unless simple 1+1 protection is desired). This is because, once an LSP in a transport network is established (that is, it is cross-connected), the full bandwidth of the LSP is automatically consumed, irrespective of whether traffic actually flows on that LSP. For this reason, when implementing shared restoration schemes in transport networks (or allowing extra traffic between endpoints other than the source-destination of a backup LSP), a backup LSP cannot be cross-connected until after the specific failure for which this LSP was pre-signaled has occurred.

Thus, for transport networks an additional step of reconfiguration is required at all the nodes that lie along the path of a backup LSP corresponding to a working LSP.

This difference is illustrated in Figure 1 and Figure 2 below. Figure 1(a) shows a shared mesh restoration scenario in a packet-based MPLS network. There are two working LSPs W1 and W2, with a single shared backup LSP P1/P2. The label assignments have been made as shown (L1 and L2 for W1 at node B, L1' and L2' for P1 at node F, L3' and L4' for P2 at node F, and so on). When a fault affecting W1 occurs (Figure 1(b)), node A, upon learning of the failure, immediately performs a protection switch and begins transmitting working traffic from W1 down the backup LSP P1 with the label L1'. Node F now simply label-switches the traffic arriving on link A-F with label L1' by placing it on the outgoing link F-C with label L2'. Node F may squelch or drop the low-priority traffic (or any extra traffic; LSPs E1 and E2 respectively) that was being carried when the backup LSP was not active, by simply purging it from its outgoing queues.

[Figure 1 diagram not recoverable from this copy. Panels (a) and (b) show nodes A, B, C, D and F, the working LSPs W1 and W2, the shared backup LSP P1/P2, the extra-traffic LSPs E1 and E2, and the label assignments; in panel (b) the extra traffic is marked "Dropped" and P1 carries the traffic of W1.]

Figure 1. Restoration scenario in a simple MPLS-based packet network

By contrast, Figure 2 illustrates the same situation for an optical transport network, where the LSPs in question are lambda LSPs. For ease of exposition, we assume a single lambda per link. Figure 2(a) shows the same network and LSPs as in the previous case. As shown in Figure 2(b), here the intermediate node F, upon learning of a fault along LSP W1, has to first drop any extra-traffic (or low-priority) LSPs that were using the bandwidth (lambda) reserved for the backup LSP P1/P2. It then reconfigures its cross-connect matrix to connect the incoming lambda on link A-F to the outgoing lambda on link F-C (that is, it changes its configuration from A-F -> F-D and D-F -> F-C to A-F -> F-C).

[Figure 2 diagram not recoverable from this copy. Panels (a), (b) and (c) show the same nodes and LSPs as Figure 1; panel (b) marks the steps "1. Drop" and "2. Recfg." at node F, and panel (c) marks the step "3. Switch" with P1 carrying the traffic of W1.]

Figure 2. Restoration scenario in an optical transport network, for a situation identical to that depicted in Figure 1.

As shown in Figure 2(c), it is only then that node F is able to carry the working traffic from W1 on P1. Thus, there needs to be a way to inform all of the intermediate nodes (here F) of the failure along the working path, so that each node can appropriately reset its cross-connects. This is not required in packet networks, as illustrated in Figure 1.

4. Signaling versus Flooding Notification for Transport Networks

In transport networks, after the fault localization step, there are several options that can be used for transmitting information about the fault. A possible choice is to make use of control plane signaling (sending RSVP-TE Notify messages as per [3]). This is the approach used in [4], where each LSP end-node sends a Notify message to its corresponding end-node and receives an ACK back. Another option is to use flooding, where the detecting node floods the network with information about the fault. We describe both mechanisms, and compare and contrast them in terms of speed and complexity.

4.1 Signaling-Based Notification

Control plane signaling can be used to notify nodes of a failure and recover from that failure [4]. With signaling, recovery from a link failure proceeds briefly as follows.

A node that detects a failure notifies the LSP sources through these steps:

- Detect all LSPs that are affected by a link failure
- Send a failure indication message to the source of each identified LSP
- Intermediate nodes that receive the message forward it on to the LSP source node

When each LSP source node receives the failure indication message, it performs the following:

- It sends a failure acknowledgement message to the detecting node. Intermediate nodes that receive this message forward it on to the detecting node.
- It sends an end-to-end switchover request message to the LSP destination node along the protection path, with information about the LSP that is to be recovered

The LSP destination node sends an end-to-end switchover response message back to the LSP source node along the protection path.

Upon receipt of the response message, the LSP source node starts sending data on the protection path.

In the case of simple 1:1 protection, the amount of messaging in the above scheme can be kept small. This is not the case for shared-mesh restoration, however.
As an example, consider a network with 100 nodes and 200 fiber links, with an average path length of 10 hops; a cut of a fiber that carries 80 wavelengths will lead to the generation of the following number of messages:

- 80 messages (end-to-end failure indication) sent by the detecting node to the LSP source nodes
- 80 messages (end-to-end failure acknowledgement) sent by LSP source nodes, one for each of the previous messages
- 80 messages (end-to-end switchover request) from each LSP source node to each LSP destination node
- 80 messages (end-to-end switchover response) from the LSP destination node to the LSP source node

Failure indication and acknowledgement messages travel an average of five hops, while end-to-end switchover requests and responses travel an average of ten hops. This simple scenario generates 80*5 + 80*5 + 80*10 + 80*10 = 2400 message hops. This calculation does not take into account any acknowledgements needed to ensure reliable transmission.

In general, for mesh networks with shared restoration, the number of messages needed to recover from failures can be very large and may lead to notification storms [5]. In addressing this point, reference [4] recommends making use of link bundling [6] to decrease the messaging need. In that respect, when "the working and protection links are mapped to component links, and the labels are the same on the working and protection links, it may be possible to change the component links without needing to re-signal each individual LSP" [4]. This condition may only be applicable in a few select cases.

Another issue is the length of time it takes to finish the process. Notification time is crucial to the recovery process; thus lengthening that time is detrimental to speedy recovery.

4.2 Flooding-Based Notification

An alternative approach to address the issue of messaging is to use flooding.
Instead of sending per-LSP notification and initiating per-LSP recovery at each LSP source node, the node that detects a failure (e.g. transponder failure or fiber cut) notifies all nodes of the network. Nodes that are concerned with the recovery take the actions required of them, while the others forward the messages on with no extra action other than recording the resource failure in order to maintain an accurate picture of resource availability.

One such implementation of flooding uses a link-state routing protocol such as Open Shortest Path First (OSPF). The usual link-state protocol floods advertisements periodically. In fact, OSPF requires that Link State Advertisements (LSAs) be refreshed every 1800 seconds [7] and otherwise be expired in 3600 seconds. Flooding frequency is crucial to the stability of the network, since increasing it may lead to excessive messaging and a larger number of retransmissions and ACKs. In the case of recovery from link failure in data networks, this may not be a problem, and using OSPF-based flooding could be a good solution that decreases the amount of messaging relative to signaling.

A flooding method for transport networks, however, needs to add another dimension to the flooding efficiency: the speed of notification. Thus, a solution that applies to transport networks needs to be developed. As discussed in Section 3, flooding in transport networks needs to occur much faster than OSPF hold-off timers allow. In addition, OSPF flooding is heavy and carries a variety of maintenance information. Relying on it would mean using a protocol that is by design engineered to be slow and reliable to deliver time-critical fault information. Time-constrained flooding, which we call expedited flooding, has the ability to deliver a lightweight solution to fault notification.

Expedited flooding is used to notify nodes of a fault. When a fault is corrected, the notification of the repair can be sent at a slower pace. This is done in order to minimize the possibility of fluttering, which in itself may lead to network meltdowns [8]. The advantage of expedited flooding is the ability to meet requirements for time constraints.

In any case (whether using OSPF-based flooding or a new expedited flooding mechanism), the number of messages needed is substantially smaller than with signaling. For the example cited in Section 4.1, the number of messages is the number of fibers (200). This is a reduction to less than a tenth of the messaging used in signaling (200 versus 2400).

4.3 Comparison Between Signaling- and Flooding-Based Notification

In this section, we present a theoretical comparative analysis of the messaging needs of signaling and flooding. We compare two metrics: keeping to time bounds and the number of messages generated in the worst-case scenario.

We consider a control plane network graph G = (N, A), where N is the set of nodes and A is the set of control channel links; n = |N| and m = |A|. We also consider the set B of data-plane links. We consider a mesh DWDM (Dense Wavelength Division Multiplexing) network with L wavelengths per link, and look at unidirectional paths for simplicity.

The worst-case scenario for signaling occurs when the protection LSPs do not share an ingress or egress. The worst-case scenario for flooding occurs when the control plane connectedness is very sparse.

In comparing messaging needs, we consider the scenarios of signaling-based notification in Figure 3 and flooding-based notification in Figure 4. In Figure 3, B detects a link fault between B and C and sends a Notify message towards LSP ingress S (arrows 1 and 2).
S sends an acknowledgement back (arrows 3' and 4') and starts an RSVP-TE handshake process with the destination, the LSP egress T (arrows 3 through 16). The notation n' indicates that the message is sent asynchronously at step n. Ingress S sends messages 3 and 3' independently of each other.

In Figure 4, flooding can only follow the same path as signaling. In that case, B sends a notification message to the network that reaches S after steps 1 and 2. The notification message is forwarded on to T through steps 4 through 9. Acknowledgements (n') are sent asynchronously between every node pair. After T has received the message, no further action is required, though the notification message is forwarded to the remaining nodes. Node S knows at what time to start sending data on the activated protection LSP and does so. This example shows that the theoretical messaging steps needed in the case of flooding are fewer than those of signaling in all cases, and thus lead to a shortened recovery time.

If we assume that a control channel associated with the protection LSP has a path length of len(cc), and that the length of the path that the Notify message travels is len(notification), the maximum number of messages (without considering acknowledgements) needed to finish the notification steps for signaling and flooding, respectively, is:

o Maximum messages(signaling) = len(notification) + 2 * len(cc)
o Maximum messages(flooding) = len(notification) + len(cc)

In the case of flooding, the number of messages is on average smaller, while it is fixed in the case of signaling.
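
As a quick check of the two bounds above, the following Python sketch evaluates them for the topology of Figures 3 and 4. The function names and the hop counts (two hops from B to S, seven hops from S through E, F, G, H, I and J to T) are assumptions made for this illustration, read off the figures, and are not part of the draft.

```python
def max_messages_signaling(len_notification: int, len_cc: int) -> int:
    """Notify towards the ingress, then one pass each way along the control channel."""
    return len_notification + 2 * len_cc


def max_messages_flooding(len_notification: int, len_cc: int) -> int:
    """Notification towards the ingress, then a single pass along the control channel."""
    return len_notification + len_cc


# Hop counts assumed from Figures 3 and 4 (illustrative only).
LEN_NOTIFICATION = 2  # B -> A -> S
LEN_CC = 7            # S -> E -> F -> G -> H -> I -> J -> T

signaling = max_messages_signaling(LEN_NOTIFICATION, LEN_CC)
flooding = max_messages_flooding(LEN_NOTIFICATION, LEN_CC)
print(signaling, flooding)
```

Under these assumed hop counts the signaling bound matches the sixteen numbered arrows of Figure 3, and the flooding bound matches the nine steps that the notification needs to reach T in Figure 4.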

[Figure 3 diagram not recoverable from this copy. It shows nodes S, A, B, C, D and T in a top row with a failed link between B and C, nodes E, F, G, H, I and J in a bottom row, and message arrows numbered 1 through 16 together with the acknowledgements 3' and 4'.]

Figure 3. Signaling-Based Fault Notification Messages

[Figure 4 diagram not recoverable from this copy. It shows the same nodes and failed link as Figure 3, with notification arrows numbered 1 through 11 and the corresponding acknowledgements marked n'.]

Figure 4. Worst-Case Scenario for Flooding-Based Fault Notification Messages

We now consider the ability to keep to strict time bounds using signaling and flooding, respectively, via the scenarios shown in Figure 5 and Figure 6. In these scenarios, the working LSPs are (C, D, E, F), (B, C, D, E, F) and (A, B, C, D, E, F). The network uses path protection, so the protection LSPs of the aforementioned working LSPs are (C, G, H, I, F), (B, G, H, I, F) and (A, G, H, I, F) respectively. In both figures, for simplicity, we only show the forward messages, i.e. the notification and the reservation messages, and not the acknowledgement or the RSVP-TE Path messages.

[Figure 5 diagram not recoverable from this copy. It shows nodes A, B, C, D, E and F in a top row with the fiber between D and E cut, nodes G, H and I in a bottom row, and three sets of numbered message arrows labelled i, i* and i".]

Figure 5. Loose Time Bounds in Signaling-Based Fault Notification

In Figure 5, the cut of fiber (D, E) results in the loss of three LSPs and the subsequent notification of the nodes responsible for activating the backup LSPs for each of these working LSPs. Message sets i, i* and i" each recover one of the LSPs. In this scenario, it is possible for messages 2", 3* and 4 to arrive at the control plane of G at the same time, and their arrival order determines the order in which the LSPs are recovered. A situation therefore arises in which the subsequent messages 3", 4* and 5, then 4", 5* and 6, and so on, arrive simultaneously at the control planes of the subsequent nodes and have to be buffered at each of them. This leads to scalability problems when the number of LSPs that need to be recovered grows.

In general, to accommodate the queuing of signaling messages at the nodes, it would be necessary to take into consideration the maximum number of LSPs that may fail at the same time, in our case, L LSPs (the number of wavelengths). Subsequently, at each node, one would have to account for the maximum queuing delays experienced by the signaling messages. This increases the recovery time substantially. When equipment is upgraded or more wavelengths are added, the earlier calculations to account for buffering delay will not hold.
Therefore, the network either has to allow a longer delay or a possible reconfiguration of all nodes, or is not upgradeable. To satisfy a notification time bound, the queuing delay calculations would need to assume an in-band control channel and priority queuing for fault notification.

o Max queuing delay(signaling) = (L-1) * [ len(notification) + 2 * len(cc) ]

[Figure 6 diagram not recoverable from this copy. It shows the same nodes and fiber cut as Figure 5, with a single set of notification arrows numbered 1 through 9.]

Figure 6. Strict Time Bounds for Flooding-Based Fault Notification

With flooding, the messaging is much simpler. Considering the scenario of Figure 6, only one message per control channel is exchanged. In the worst-case scenario where G receives messages from nodes A, B and C simultaneously, after it has processed one message, the messages from the other nodes can be safely discarded after minimal processing. Thus, the messages experience no queuing delays in the case of single faults:

o Max queuing delay(flooding) = 0

4.4 Alternative Notification Method

Ideally, when a fault affecting multiple LSPs occurs, the notification should only target the nodes that are involved in the recovery procedure, including the ingress and egress nodes of the different LSPs and the nodes on the recovery LSPs. This would ensure that nodes that are not affected by the failure do not have to perform any processing. This mechanism could be implemented through multicast addressing.
However, keeping multicast trees constantly updated as the topology changes, and the amount of data needed to maintain them, add undue complexity and make multicast notification not very practical.

5. Expedited Flooding for Notification

Given the reasoning presented in Section 4, a rapid, lightweight flooding mechanism (what we call an "expedited flooding" mechanism) is a promising candidate for achieving time-bounded notification in transport networks.

Such a scheme would allow each node to rapidly forward notification messages (after performing some minimal operations on them), and perform any required processing and attendant reconfiguration in the background. In this manner, the fault notification propagates rapidly through the network, eventually reaching the edge nodes (or the nodes responsible for restoration action), while at the same time allowing the intermediate nodes along the path of the backup LSP to reconfigure themselves.

Operationally, the scheme would work in the following manner. A node that detects a fault sends a notification packet to all its neighbors, containing the identity of the link, node, or interface at fault. Each node, upon receiving a notification packet on an incoming link, performs a local check to determine whether the same fault has already been reported.

- If it has, the node does not need to take any further action on this packet and can discard it.
- If it has not, the node immediately broadcasts the incoming packet on all its remaining outgoing interfaces (after possibly updating a TTL field).

At the same time, the node examines its routing and TE databases to ascertain whether it is on the backup path of one or more working LSPs affected by the fault reported by the notification packet. If it is not, the node has no further work to do.
If it is, the node takes the appropriate action (such as dropping extra traffic or low-priority traffic and reconfiguring its cross-connect) to be able to forward the traffic arriving on the backup LSP corresponding to the affected working LSP(s).

There has been a lot of interest in the networking community in fast restoration and recovery. Several mechanisms have been proposed, such as flooding over one of many parallel links between neighbors [9], [10], processing Hello messages at higher priority within the network and at the node [11], and flooding over spanning trees [12].

Expedited flooding can also take advantage of features like flooding over one of many parallel links between neighbors [9], [10]. Specifically, if two systems are connected by multiple parallel point-to-point links, flooding can be done over only one such link. If the link designated for flooding goes down and at least one other parallel link is still up, a different parallel link is designated for flooding.

5.1 Operation upon Fault Repair

Once a previously detected fault is corrected or repaired, it is important to notify the network nodes of this event. The same process that notifies the nodes of a fault event should also notify them of the fault recovery event. This ensures consistency in network state.

Since information about the recovery of a resource is not time critical, the detecting node can in this case hold off sending a fault recovered flooding message for some appropriate amount of time. For example, if one were using a link-state routing protocol for expedited flooding, the fault recovered message would be sent to the network using regular protocol flooding without bypassing its hold-off mechanisms.

This allows the detecting node to dependably track the state (up or down) of a resource.
In the event that the detecting node observes the resource to be oscillating between the "up" and "down" states, it would know that flapping or a misconfiguration may exist in the network and could suppress the expedited flooding mechanism. It could then either invoke a remedial action at other layers or raise an alarm.

The specific action that nodes take upon receiving a "fault recovered" message is based on policy.

5.2 Impact of Expedited Flooding on Network Operation

It is important to realize that the time-bounded recovery application requires only a lightweight flooding scheme. Specifically, normal flooding for link state advertisements needs to guarantee convergence of the link-state routing protocol. It is, therefore, vital for link state flooding to ensure that link state PDUs (LSAs in the case of OSPF or LSPs in the case of IS-IS) that are originated after the initial network topology database synchronization between neighbors is completed are delivered to all routers within the flooding scope limits (an area or the whole AS, depending on the protocol and the type of the link state PDU). The expedited flooding mechanisms discussed here, on the other hand, are one-shot notifications that are expedited only at the time when a link failure is detected.

As the expedited fault notification message is active only in the event of a failure in the network, the impact of the expedited flooding discussed here on the operation of normal link state flooding is minimal. For the same reason, the average message rate for expedited flooding becomes negligible. Hence, the expedited flooding messages discussed here can bypass the hold-off mechanisms typically used for dampening in usual routing protocols.

The processing overhead for a node to find whether or not it has heard about the fault reported by a received notification message is also quite small.
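
The per-node handling described in Section 5, including the inexpensive duplicate check mentioned above, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a protocol definition: the `Node` class, the `link` and `flood` helpers, the fault identifier string, and the TTL default are all hypothetical names and values chosen for the example, and the cross-connect reconfiguration is reduced to recording the fault.

```python
from collections import deque


class Node:
    """One control-plane node in the sketch (all names are illustrative)."""

    def __init__(self, name, protects=()):
        self.name = name
        self.neighbors = []            # adjacent Node objects
        self.seen = set()              # fault ids already reported to this node
        self.protects = set(protects)  # fault ids for which this node is on a backup path
        self.reconfigured = []         # fault ids this node acted on

    def receive(self, fault_id, ttl, sender=None):
        """Handle one notification; return the (neighbor, ttl) copies to forward."""
        if fault_id in self.seen:      # duplicate: discard after minimal processing
            return []
        self.seen.add(fault_id)
        if fault_id in self.protects:  # stand-in for: drop extra traffic, reconfigure
            self.reconfigured.append(fault_id)
        if ttl <= 1:                   # hypothetical TTL rule: stop forwarding
            return []
        return [(n, ttl - 1) for n in self.neighbors if n is not sender]


def link(a, b):
    """Connect two nodes with a bidirectional control channel."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def flood(detector, fault_id, ttl=16):
    """Flood a fault from the detecting node; return the number of messages sent."""
    pending = deque((n, t, detector) for n, t in detector.receive(fault_id, ttl))
    sent = 0
    while pending:
        node, t, sender = pending.popleft()
        sent += 1
        pending.extend((n, t2, node) for n, t2 in node.receive(fault_id, t, sender))
    return sent
```

In this sketch the duplicate check is a single set lookup, which is one way to keep the per-message cost small; a real implementation would also need acknowledgements and would perform the reconfiguration in the background, as the text describes.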
(A node can determine whether or not it is on the backup path of one or more working LSPs affected by the fault reported by a notification packet by simply looking at the local information on a line card.)

5.3 Graceful Degradation

It is important, from a provider's perspective, to ensure that the network operates stably, and that in the case of unanticipated failures (multiple, near-simultaneous fiber cuts, for example) network performance degrades gracefully, while maintaining stability. In this section, we outline two key issues and their solutions: the loss of notification messages and multiple simultaneous fiber cuts.

5.3.1 Loss of Notification Messages

A notification message could be lost for a variety of reasons, including lack of buffer space in the control plane, packet error, software bug and general misconfiguration.

If a notification message or its acknowledgement is lost, the control plane that sent it does not receive an acknowledgement within a specific period of time. It therefore waits for a period of time before retransmitting the message. This waiting period ensures the stability of the network. After a number of unsuccessful retries, the control channel would be considered DOWN and no further retransmission would be tried. In that event, the node that could not successfully send the notification message would raise an alarm and try to notify the LSP end nodes to stop transmitting data.

By implementing diversity in the network, the operator offers mitigation strategies against such errors and achieves fast notification and recovery. A network operator can use the expected probability of error in notification messages when calculating expected recovery times to be able to degrade gracefully.
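
The retransmission behavior of Section 5.3.1 can be sketched in Python as follows. The draft does not specify the number of retries, the waiting period, or any programming interface, so the function name, its callback parameters, the default of three retries, and the "UP" return value are all assumptions made for this illustration; only the DOWN state and the alarm come from the text.

```python
def notify_with_retries(send, ack_received, max_retries=3,
                        hold_off=lambda: None, raise_alarm=lambda: None):
    """Send a notification, retransmitting until acknowledged or retries run out.

    send         -- callable that transmits the notification once
    ack_received -- callable returning True once an acknowledgement has arrived
    hold_off     -- callable that waits for the (unspecified) retransmission period
    raise_alarm  -- callable invoked when the control channel is declared DOWN
    Returns (channel_state, number_of_transmissions).
    """
    attempts = 0
    while attempts <= max_retries:   # one initial send plus max_retries retries
        send()
        attempts += 1
        hold_off()                   # wait before deciding the message was lost
        if ack_received():
            return "UP", attempts
    raise_alarm()                    # no further retransmission is tried
    return "DOWN", attempts
```

Passing the waiting period in as a callable keeps the sketch independent of any particular timer, so an operator's expected loss probability and retry interval can be plugged in when estimating recovery times.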
5.3.2 Multiple Fiber Cuts

When multiple fiber cuts occur almost simultaneously and the recovery LSPs share a protection resource, notification of that event occurs before the network has had enough time to compute updated backup LSPs. This occurrence is considered to be extremely unlikely: a US-based carrier mentioned that simultaneous multiple fiber cuts account for less than 1% of the total number of fiber faults. The probability of the cuts affecting the same protection resource is also small (generally 1 to 10%), making this a very low probability event. The time-bounded nature of the notification allows nodes to deduce the occurrence of multiple faults and, depending on the timing, enables them either to activate the correct backup LSP(s) or to block the activation of all backup LSPs, thus degrading gracefully.

To solve the multiple fiber cut problem, a multi-pronged tie-breaking strategy such as the following can be adopted:

a. LSP priority: A node considers the priority of the LSPs that need to be recovered and will bump the LSPs with the lower priority. When LSP priority is in use and the priorities differ, this resolves the conflict.

b. If the LSPs have the same priority, the node will select the LSP that originates at the edge node with the smallest or largest node ID.

c. If a node receives subsequent fault notifications after the activation of a given backup LSP, and the tie-breaking rules above dictate a different decision, the node will disable or tear down all backup LSPs and inform the LSPs' respective source nodes via a signaling message.

6. Conclusion

This draft discussed the issue of time-constrained recovery in optical transport networks. We highlighted important issues to demonstrate the appropriateness of using a flooding-based approach to notification, including scalability concerns and time-bounded recovery.
We observed that traditional flooding mechanisms, such as OSPF flooding, would not be appropriate for such critical failure notification if used unmodified. Therefore, we highlighted the need for a fast flooding mechanism and outlined how its operation would have minimal impact on the network.

7. Intellectual Property Considerations

This section is taken from Section 10.4 of RFC2026 [1].

The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights, which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director.

8. References

[1] Bradner, S., "The Internet Standards Process -- Revision 3," BCP 9, IETF RFC 2026, October 1996.

[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels," BCP 14, IETF RFC 2119, March 1997.

[3] Berger, L.
(Editor) et al., "Generalized MPLS Signaling - RSVP-TE Extensions," IETF RFC 3473, January 2003.

[4] Lang, J., et al. (eds.), "RSVP-TE Extensions in support of End-to-End GMPLS-based Recovery," work in progress, September 2003.

[5] Papadimitriou, D. and Mannie, E. (eds.), "Analysis of Generalized Multi-Protocol Label Switching (GMPLS)-based Recovery Mechanisms (including Protection and Restoration)," work in progress, September 2003.

[6] Kompella, K., Rekhter, Y. and Berger, L., "Link Bundling in MPLS Traffic Engineering," work in progress, July 2002.

[7] Moy, J., "OSPF Version 2," IETF RFC 2328, April 1998.

[8] Yu, J., "Scalable Routing Design Principles," IETF RFC 2791, July 2000.

[9] Zinin, A. and Shand, M., "Flooding Optimizations in Link-State Routing Protocols," work in progress.

[10] Moy, J., "Flooding over Parallel Point-to-Point Links," work in progress.

[11] Maunder, A. S. and Choudhury, G., "Explicit Marking and Prioritized Treatment of Specific IGP Packets for Faster IGP Convergence and Improved Network Scalability and Stability," work in progress.

[12] Choudhury, G. L. and Manral, V., "LSA Flooding Optimization Algorithms and Their Simulation Study," draft-choudhury-manral-flooding-simulation-00.txt.

9. Authors' Addresses

Richard Rabbat
Fujitsu Labs of America, Inc.
1240 E. Arques Ave, MS 345
Sunnyvale, CA 94085
United States of America
Phone: +1-408-530-4537
Email: rabbat@fla.fujitsu.com

Vishal Sharma
Metanoia, Inc.
1600 Villa Street, Unit 352
Mtn. View, CA 94041
United States of America
Phone: +1-650-386-6723
Email: v.sharma@ieee.org

Zafar Ali
Cisco Systems Inc.
100 South Main St. #200
Ann Arbor, MI 48104
United States of America
Phone: +1-734-276-2459
Email: zali@cisco.com

10. Full Copyright Statement

"Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."