CCAMP Working Group            Richard Rabbat (Fujitsu Labs. of America)
Internet Draft                 Toshio Soumiya (Fujitsu Labs. Limited)
Expires: July 2004             Shinya Kanoh (Fujitsu Labs. Limited)
Category: Experimental         Vishal Sharma (Metanoia, Inc.)
                               Fabio Ricciato (CoriTel)
                               Roberto Albanese (La Sapienza)
                               February 2004

   Implementation and Performance of Flooding-based Fault Notification
        draft-rabbat-ccamp-perf-flooding-notification-exp-00

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This memo presents the observations and results obtained from two test-beds designed to study and evaluate the performance and feasibility of rapid fault notification via flooding. Flooding-based fault notification [2,3] is an alternative to signaling-only notification approaches; its advantages are scalability and the ability to meet bounded recovery-time constraints where needed. We implemented flooding-based notification at both the transport and packet layers. For optical transport networks, the flooding mechanism (e.g., [3]) was realized using enhancements to the Link Management Protocol (LMP), while for packet/MPLS networks the flooding was realized using enhancements to Open Shortest Path First (OSPF).
We present experiences and performance measurements from these implementations on FreeBSD/Linux platforms, and also present the Rabbat, Ricciato et al Expires - July 2004 [Page 1] draft-rabbat-ccamp-perf-flooding-notification-exp-00 February 2004 protocol enhancements that were made to LMP and OSPF, respectively, to realize the rapid flooding function. Table of Contents 1. Introduction...................................................2 2. Terminology....................................................3 3. Comparison of Signaling versus Flooding-based Notification.....3 4. Fault-Notification in Optical Transport Networks...............4 4.1 Testbed Implementation and Architectural Model................5 4.2 LMP-based Flooding Solution...................................6 4.3 Implementation Experiences....................................6 4.4 Experimental Results..........................................7 5. Fault Notification in Packet (IP/MPLS) Networks................8 5.1 Testbed Implementation and Architectural Model................8 5.2 OSPF-based Flooding Solution..................................9 5.3 Implementation Experiences...................................10 5.4 Experimental Results.........................................11 6. Summary and Conclusions.......................................13 Appendix A. LMP Protocol Modifications for flooding-based Fault Notification.....................................................13 A.1 Fault Recovery Scenario......................................13 A.2 Additional LMP Message Formats...............................14 A.3 Additional LMP Object Definitions............................15 Appendix B. OSPF Protocol Modifications for flooding-based Fault Notification: O-LSA Message format...............................15 7. Intellectual Property Considerations..........................18 8. References....................................................18 9. 
Authors' Addresses............................................20
10. Full Copyright Statement.....................................20

1. Introduction

Fault notification is a key mechanism in the recovery of end-to-end connections and/or path segments in a packet or transport network. The fault notification scheme is responsible for conveying information from the node detecting the fault(s) to the node(s) responsible for taking the restorative action.

The goal of this document is to present results from two test-bed implementations of fault notification, at the transport and packet layers, respectively, and to demonstrate that flooding presents a scalable and efficient means of fault notification.

We implemented flooding-based notification independently in two test-beds. The first, for optical transport networks, was done using enhancements to the Link Management Protocol, while the second, for packet networks, was done using enhancements to the Open Shortest Path First protocol with TE extensions.

The rest of this document is organized as follows. Section 3 presents some key points of differentiation between signaling- and flooding-based solutions to fault notification. In Section 4, we discuss in detail fault notification in optical transport networks, while in Section 5 we discuss in detail notification in IP/MPLS packet networks. In both cases, we present the testbed implementation and architectural model used, the LMP- or OSPF-TE-based flooding solution, and our implementation experiences and experimental results. In Appendices A and B, we present the protocol modifications that were made to LMP and OSPF-TE, respectively, to realize the rapid flooding function, thus making this work self-contained.

2.
Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [4].

3. Comparison of Signaling versus Flooding-based Notification

As mentioned earlier, a key mechanism during fault recovery is the fault notification scheme, which conveys information from the node detecting the fault(s) to the node(s) responsible for taking the restorative action. After a localization period that allows the network to pinpoint the fault, a node can notify others of that fault using one of two techniques: signaling or flooding.

In a signaling-based approach, the detecting node sends, for each failed path affected by the fault, a notification to the protection switching point. That switching point sends an acknowledgement back to the detecting node and then starts an RSVP Path/Resv handshake with the nodes on the protection path. The working traffic is recovered upon completion of the signaling process. The downsides are possible duplication in the recovery operation and scalability concerns when a large number of paths is affected by the fault.

An alternative method is for the detecting node to flood the network with information about the fault. Upon receipt of the fault notification, intelligent nodes recognize when they need to reconfigure their switching fabrics to activate the pertinent protection paths. This enables fast and timely recovery and allows the solution to scale to a large number of paths affected by a fault.

An added advantage of flooding-based notification is the fast dissemination of error information to the whole network.
This allows ingress nodes to update their view of the network and calculate future working and backup paths based on timely information, which increases the probability of success of path setup requests.

4. Fault-Notification in Optical Transport Networks

Since the transport layer forms the foundation of a multi-layer architecture, time-bounded recovery (and thus, time-constrained fault notification) is a key requirement at this layer, as it influences the nature and types of guarantees that the transport layer provides to the carrier's customers. Such time criticality has implications for both path selection and fault notification. At the same time, it is important to facilitate shared mesh protection [5]. We present an example in Figure 1 of the operation in the case of shared mesh recovery to highlight the messaging requirements.

[R1]---[R2]---[R3]---[R4]
    \                /
    [R5]-------------[R6]
    /  \             /  \
[R7]---[R8]------[R9]---[R10]

Working LSP1:  [R1->R2->R3->R4]
Working LSP2:  [R7->R8->R9->R10]
Recovery LSP1: [R1->R5->R6->R4]
Recovery LSP2: [R7->R5->R6->R10]

Figure 1. Shared Mesh Recovery

One working path, W-LSP1, runs from R1 to R4 via R2 and R3, and another working path, W-LSP2, runs from R7 to R10 via R8 and R9. R1 can provide user traffic protection by creating a backup LSP that merges with the working LSP at R4. We refer to the LSP R-LSP1 [R1->R5->R6->R4] as the recovery LSP of W-LSP1. In the same manner, we refer to the LSP R-LSP2 [R7->R5->R6->R10] as the recovery LSP of W-LSP2. In this situation, the resources for recovering the working LSPs can be shared. In other words, the link between R5 and R6 can be shared between recovery LSP1 and recovery LSP2, thereby increasing network efficiency by increasing the backup/working capacity ratio.

When a failure occurs on the link between R8 and R9 on the working LSP W-LSP2, the endpoints of the link (both R8 and R9) will detect the failure (if the link is bi-directional).
Then, the detecting nodes start sending FaultNotify messages. Node R9 sends these messages to R6 and R10. When R6 and R10 receive the FaultNotify messages, they send a FaultNotifyAck message back to the sending node R9 and send FaultNotify messages to their own immediate neighbors.

If a FaultNotify message is lost, the sending node retransmits it to the neighbor, which introduces some retransmission delay. However, if the receiving node is reachable over diverse routes, it can receive another copy of the FaultNotify message from another sending node. As a result, the retransmission delay may be negligible.

In this example, R-LSP2 is activated and R7 switches W-LSP2 traffic to be carried by R-LSP2. R10, in turn, merges R-LSP2 back onto the original route. As a result, traffic will follow the path [R7->R5->R6->R10].

4.1 Testbed Implementation and Architectural Model

   (Schematic: the control plane, with controllers CNT 1 through
   CNT 6 connected in a 2x3 grid, sits above the emulated data plane,
   with nodes Node 1 through Node 6 connected in the same 2x3 grid;
   each node is attached to its controller. Fault Notify (FN)
   messages propagate among the controllers following a fiber cut on
   the link between Node 1 and Node 4.)

Figure 2.
Experimental setup for LMP-based notification

In this study, we considered a Generalized Multi-Protocol Label Switching (GMPLS) domain, with a distributed working and protection calculation algorithm that allows a high degree of sharing of protection resources. All nodes are aware of the complete topology, and opaque Link State Advertisements (LSAs) are used to update them on the resources used when more working/protection Label Switched Paths (LSPs) are set up or torn down. We also assumed that the reconfiguration of multiple ports in the switching fabric occurs serially, leading to slowdowns when several LSPs at the same node have to be activated (channels have to be cross-connected) due to a failure. The objective of the study was to provide a recovery time of less than 50 ms for a fiber cut that affected a large number of working LSPs. The work was a joint study of Fujitsu Laboratories Limited and Fujitsu Labs of America.

4.2 LMP-based Flooding Solution

We developed an implementation for flooding of fault notification by defining extensions to LMP (Link Management Protocol). LMP allows a quick and simple implementation of flooding and can be processed in a timely fashion. The Fault Notification Protocol (FNP) is a point-to-point lightweight protocol that can guarantee time-bounded recovery. FNP notifies all nodes on the network that a fault has occurred; each node inspects the fault and determines which LSPs, if any, need to be recovered.

We implemented this protocol in a 6-node grid topology. Figure 2 shows the testbed with an emulated data plane. The example shows two LSPs, a primary LSP from node 1 to node 4 and a backup LSP that goes through nodes 1, 2, 5 and 4. Since the occurrence of a fault has to be communicated in a bounded time period, the primary and backup paths were selected in a way that satisfies the time constraint.
The backup path computation algorithm therefore does not consider all available network resources when calculating the protection path, but rather a subset of that graph, consisting of nodes that can be reached within the time bound. By dissociating the path computation from the timing constraints, we simplify the path computation problem and allow for multiple levels of recovery speeds (which may correspond to different levels of Quality of Service).

4.3 Implementation Experiences

The test-bed system consists of a data plane and a control plane as described in Section 4.1. A control server is used to manage the control plane. The control server can ask a control node to set up a working path through appropriate calls on the web-based interface on the server. A protection path, which is associated with the working path, is automatically set up according to a pre-planned route.

Each node is a PC-based UNIX system (running FreeBSD). The MPLS forwarding function is used to emulate the data plane of the optical network. To emulate the data plane, we use the MPLS label stack: the outer label indicates LSC (Lambda Switch Capable) and the inner label indicates PSC (Packet Switch Capable). Each node is connected using fiber running Ethernet. The control node on the control plane runs the LMP-based fault notification (see Appendix A) and an RSVP-TE based GMPLS signaling protocol.

We used this testbed system to measure the recovery time. In addition, we also looked at reversion time, assuming that when the fiber cut is fixed, LSPs go back to the initial working paths. Detailed results are shown in the next section.

4.4 Experimental Results

We tried several scenarios that assumed 1, 10 and 20 LSPs are affected by a fiber cut between nodes 1 and 4. LMP messages are flooded over control plane channels by controllers CNT 1 through 6.
This allows nodes 2 and 5, which are involved in the recovery, to activate the recovery path. Simultaneously, nodes 3 and 6 are notified of the fault and no longer use the fiber that was cut when calculating working and protection paths for new path setup requests.

Table 1. Results of experiment of LMP-based flooding notification

+-------------------+-----------------------------------------------+
| Unit: millisecond |                                               |
+-------------------+-----------------------+-----------------------+
|                   |  Fault Recovery Time  |    Reversion Time     |
+-------------------+-----------+-----------+-----------+-----------+
|                   | 10 paths  | 20 paths  | 10 paths  | 20 paths  |
+-------------------+-----------+-----------+-----------+-----------+
| Flooding time     |   0.691   |   0.764   |   0.791   |   0.727   |
+-------------------+-----------+-----------+-----------+-----------+
| SW time of node 1 |  22.991   |  47.115   |  23.987   |  47.239   |
+-------------------+-----------+-----------+-----------+-----------+
| SW time of node 2 |   7.928   |  14.797   |   7.310   |  14.585   |
+-------------------+-----------+-----------+-----------+-----------+
| SW time of node 4 |  19.677   |  37.680   |  19.154   |  37.156   |
+-------------------+-----------+-----------+-----------+-----------+
| SW time of node 5 |   7.583   |  12.113   |   6.770   |  11.463   |
+-------------------+-----------+-----------+-----------+-----------+
| Total SW time     |  22.991   |  47.115   |  23.987   |  47.239   |
+-------------------+-----------+-----------+-----------+-----------+

Recovery times were 1.3 msec for traffic recovery affecting 1 LSP, 22.99 msec in the case of 10 LSPs, and up to 47 msec in the case of 20 LSPs. The increase in recovery time is due to the serial reconfiguration of the switching fabric, while the notification time remained constant since we are using flooding-based notification.
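The dominance of serial fabric reconfiguration over notification time can be captured with a small model. The following Python sketch is ours, not part of the testbed code; the constants are illustrative readings of Table 1 (roughly 0.7 ms of flooding time and about 2.3 ms per serially reconfigured cross-connect at the busiest node):

```python
# Rough model of recovery time at the bottleneck node: a roughly
# constant flooding (notification) delay plus switching-fabric
# reconfiguration. Constants are illustrative, derived from Table 1.

FLOODING_MS = 0.7      # notification time, roughly constant
XC_MS_PER_LSP = 2.3    # ~46 ms / 20 LSPs at the busiest node

def recovery_time_ms(num_lsps: int, parallel: bool = False) -> float:
    """Estimated recovery time at the bottleneck node."""
    if parallel:
        # all cross-connects reconfigured at once
        return FLOODING_MS + XC_MS_PER_LSP
    # serial reconfiguration: time grows linearly with affected LSPs
    return FLOODING_MS + XC_MS_PER_LSP * num_lsps

if __name__ == "__main__":
    for n in (1, 10, 20):
        print(n, round(recovery_time_ms(n), 1))   # 3.0, 23.7, 46.7
```

Under this simple model the 20-LSP case stays just under the 50 ms target, matching the measured behavior, and a parallel-reconfiguration fabric would make recovery time essentially independent of the number of affected LSPs.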
Table 1 shows the complete results for the recovery from the fiber cut between nodes 1 and 4 for 10 and 20 LSPs. It also lists the speed of reversion once the fiber cut has been fixed. In situations where cross-connects can be reconfigured in parallel, recovery times will obviously be smaller.

5. Fault Notification in Packet (IP/MPLS) Networks

When introducing a new capability at the packet layer, one is concerned with i) scalability and ii) ease of implementation in the framework of the deployed protocols. For notification at the packet layer, delay requirements are less stringent (probably several hundreds of milliseconds or more), but scalability is a more serious concern, due to the smaller bandwidth granularity, larger number of LSPs, and more complex topologies. At the same time, any notification scheme for the packet layer should, as far as possible, utilize existing protocols and have minimal impact on the current network architecture. Since in most cases the packet infrastructure is already deployed, minimizing impact will facilitate adoption and deployment by carriers.

With these objectives in mind, we developed and tested a prototype implementation of fault notification through flooding of Opaque Link State Advertisements (O-LSAs) in OSPF-TE. This activity was embedded in the development of an MPLS-TE experimental test-bed in the TANGO project [6], held jointly between CoRiTeL and University La Sapienza, Rome. The objective was to produce a demonstration of various traffic engineering and fault recovery aspects in an MPLS-TE network using a PC/Linux platform. The implementation of an expedited fault notification mechanism through OSPF-TE flooding was one of the more important achievements of that work.

5.1 Testbed Implementation and Architectural Model

The TANGO architectural model assumes a Diffserv/MPLS domain with three QoS classes and on-demand provisioning of end-to-end connections for each class.
Additionally, each connection is associated with one of three protection classes: Unprotected (UP), Single-Fault Protected (SFP) and Dual-Fault Protected (DFP). Since end-to-end path protection is used, each SFP connection is associated with two disjoint LSPs (working + backup), while each DFP connection has three (working + two backups).

The connection requests are processed by the Node Manager (NM) module within the ingress router. The NM computes the end-to-end routes and manages the RSVP-TE signaling sessions for all the involved LSPs. The NM is also in charge of triggering packet switching from the working to the backup LSP upon receiving a failure notification. All LSPs for QoS traffic are explicitly routed by means of the strict-ERO object in the RSVP-TE messages [7]. The route computation is done at the ingress router by a dedicated module, the Route Selection Engine (RSE), and is based on the network load information collected in a local database, the Network State Database (NS/DB), which is continuously updated by the OSPF-TE flooding of Opaque-LSAs [8]. In fact, each network node disseminates the changes in the level of bandwidth reservation on each link. Appropriate flooding-reduction techniques [9] might be implemented to dampen the message overhead.

Whenever a link failure is notified by the OSPF-TE process, the NM module is in charge of triggering the label re-mapping function on the data plane for the impacted LSPs, i.e. switching packets from the working to the backup LSP. Therefore, the NM needs to parse the full set of outgoing LSPs in order to identify those impacted by the failure. This is achieved by a query to a local database, the LSP Database (LSP/DB), which maintains all relevant information about all outgoing LSPs of the specific ingress node.
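The query-and-remap step at the ingress can be sketched as follows. This is a minimal Python sketch under our own naming; the class `LspDb` and its fields are hypothetical stand-ins for the TANGO LSP/DB and NM modules, which are separate processes communicating over a socket:

```python
# Minimal sketch of the LSP/DB lookup and label re-mapping step.
# Class and field names are illustrative, not the TANGO code.

class LspDb:
    def __init__(self):
        # index: link -> set of LSP ids whose working path uses it
        self._by_link = {}
        self._backup = {}   # lsp_id -> id of its backup LSP

    def add_lsp(self, lsp_id, working_links, backup_id):
        for link in working_links:
            self._by_link.setdefault(link, set()).add(lsp_id)
        self._backup[lsp_id] = backup_id

    def impacted_by(self, failed_link):
        """Return the outgoing LSPs whose working path crosses the link."""
        return sorted(self._by_link.get(failed_link, ()))

def remap_on_failure(db, failed_link):
    # For each impacted LSP, switch traffic to its backup LSP
    # (stands in for the label re-mapping on the data plane).
    return {lsp: db._backup[lsp] for lsp in db.impacted_by(failed_link)}

db = LspDb()
db.add_lsp("W1", [("R2", "R3")], "B1")
db.add_lsp("W2", [("R2", "R3"), ("R3", "R4")], "B2")
print(remap_on_failure(db, ("R2", "R3")))   # both LSPs switch to backup
```

Indexing the database by link keeps the lookup cost per failure proportional to the number of impacted LSPs rather than to the total number of outgoing LSPs, which matters for the T1 delay component discussed in Section 5.4.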
Regarding the software modules, our starting point was the RSVP-TE code available from the TEQUILA project [10] and the OSPF-TE implementation available from the ZEBRA project [11]. All the other modules (NM, RSE, NS/DB, LSP/DB, etc.) were developed from scratch. The O-LSA messages had to be enriched to carry the following link attributes, in addition to the bandwidth-related ones:

o Shared-Risk-Link-Group (SRLG) membership, as required to enable the RSE at the ingress router to select SRLG-disjoint routes whenever applicable. This semantic was carried in the Administrative group sub-TLV of the O-LSA.

o No-Risk-Link flag, as required to communicate that the specific link is to be considered NOT prone to failure at the packet layer (e.g., because it is already protected at a lower layer, typically Sonet/SDH). The route computation algorithm at the RSE allows working and backup LSPs for the same demand to share the same No-Risk-Link.

o Fault Notification semantic. More details on the message format are given in Appendix B.

5.2 OSPF-based Flooding Solution

In addition to link-load dissemination, the flooding of OSPF-TE Opaque-LSAs was exploited to quickly distribute fault notifications. We used area-local scope O-LSAs (type 10, see [8]), thus implicitly assuming a single-area domain. This choice was dictated by the fact that this is the only type of O-LSA currently implemented in the zebra OSPF-TE daemon. The "link-failed" notification semantic was carried in the TE-metric sub-TLV (Type-Length-Value), through an appropriate flag. An O-LSA carrying a link-failed notification will be referred to in the following as an "FN-LSA".

A key issue we had to cope with in implementing rapid flooding of the FN O-LSA was the hold-down timers. In order to ensure network stability, OSPF uses two hold-down timers [12].
One is used to defer the generation of a new LSA when a previous LSA for the same link has been transmitted (Min-LS-Interval). The second is used to delay an LSA when a previous LSA referring to the same link has been received (Min-LS-Arrival). These timers would add an unacceptable delay to the recovery, on the order of a few seconds. The solution we adopted was to introduce a timer exception, which forces a hold-down timer to expire immediately upon the generation / reception of an FN-LSA. Remarkably, recognizing whether or not an O-LSA carries the link-failed semantic requires only negligible additional processing, since any node has to process the O-LSA anyway in order to determine which link it refers to. In fact, both timers above are associated with each link.

This exception allows the first FN-LSA after the link failure to be flooded immediately through the network, without suffering any artificial delay due to timers. Very importantly, the timer exception is not applied to subsequent FN-LSAs for the same link, which are delayed according to the normal hold-down timers used for non-FN O-LSAs. In this respect, we ensure that fault notification does not lead to network overload in the case, for example, of link flapping.

5.3 Implementation Experiences

The implementation of the fault notification mechanism in the OSPF-TE code required rather limited effort. A limited amount of code and time was sufficient to implement the message format (described in Appendix B) and to introduce the timer exception described above. This is, we believe, the most important lesson learned from the test-bed implementation. Other observations worth mentioning:

o We experienced virtually no problems in working with the OSPF-TE code, while a lot of problems were found in handling the RSVP-TE agents and related modules (instability, bugs, etc.).
o We had no problems when the failure consisted of a physical interruption of the link, as caused by manually unplugging the link connector or suddenly switching off the node at the other end. On the other hand, when the failure was emulated by stopping the interface via software (e.g., by the "ifdown eth0" command), the fault notification did not work properly. In fact, in this case the Linux OS immediately informs the OSPF process of the new state of the link, which immediately triggers the flooding of a (non-Opaque) LSA before the FN-LSA (Opaque). This was just a problem of the choice of OS.

o We experienced several criticalities that are mainly due to the general-purpose nature of the hardware and software equipment. For example, an incidental overload of the data plane due to a mis-configuration of the generator of the test flows had a heavy impact on the control-plane processing, including the OSPF-TE daemon, and caused pathologically long recovery delays. This kind of problem can easily be overcome by appropriate prioritization of processes and messages on the control plane, and should not apply when using dedicated HW / SW equipment.

5.4 Experimental Results

In this section we report on the recovery delay measured in our test-bed. Recall that the equipment used in the test-bed is commercial low-end PCs running the Linux OS. All links are Fast Ethernet. The trials consisted of loading the network with a certain number of protected demands and then physically unplugging one link in order to emulate a physical cut. The test network was almost unloaded on the data plane: only one LSP at a time was loaded with a test flow of UDP packets in order to measure the total end-to-end recovery delay.
Since the test flow was periodic - one packet every 20 ms - the total number of packets lost during the transient can be taken as a rough estimate of the total recovery delay, within an error of 20 ms. Several trials were repeated with different numbers of impacted LSPs, on a test network consisting of 8 nodes. The results are reported in Table 2. Note that in the experiments the failure was at two-hop distance from the head-end node, so that each FN-LSA had to be processed by at least one intermediate node before reaching the head-end.

Table 2. Total Recovery Delay

       +-----------------------------+-----------------------------+
       |            1 LSP            |           20 LSP            |
       +---------+---------+---------+---------+---------+---------+
       |   min   |   max   |  mean   |   min   |   max   |  mean   |
+------+---------+---------+---------+---------+---------+---------+
| TOS  |   70    |   180   |   120   |   146   |   380   |   220   |
+------+---------+---------+---------+---------+---------+---------+

It can be seen that the total recovery delay is on the order of a few hundred msec, with a remarkable dependence on the number of LSPs. As discussed below, this is due to i) the LSP/DB query and ii) the label re-mapping at the head-end node: both actions involve a processing delay that depends on the number of LSPs. The random dispersion around the mean value was expected, considering that the PCs were running a general-purpose operating system.

Next, we were interested in analyzing the contribution of individual delay components to the total delay budget. Figure 3 describes the temporal evolution of the recovery procedure, with the analysis of each delay component as measured in the TANGO test-bed. The 1st delay component (1-2 ms) accounts for the failure detection at the node local to the failure.
This is done by a small dedicated software module that continuously monitors the state of the link and directly reports changes to the local OSPF-TE daemon. The 2nd delay component (between 40-85 ms, with an average of 60 ms) accounts for the time it takes the local OSPF-TE daemon to prepare and send the first FN-LSA. The 3rd component accounts for the processing of the FN-LSA at intermediate nodes: we measured approximately 10 ms of processing time for each FN-LSA at each hop. Clearly, the total notification delay depends on the network diameter (denoted by n): roughly 10*n ms. Exact values for the T1 and T2 delay components are reported in Table 3 for different numbers of LSPs.

+-Fault
|
|   +-Fault Notified to
|   |  OSPF-TE daemon
|   |
|   |                     +-OSPF Flooding Start at the detecting node
|   |                     |
|   |                     |          +-Fault Notified to the
|   |                     |          |  Node Manager at head-end
|   |                     |          |
|   |                     |          |      +-Switch Procedure Start
|   |                     |          |      |
|   |                     |          |      |       +-Switch
|   |                     |          |      |       |  Procedure
|   |                     |          |      |       |  Accomplished
V   V                     V          V      V       V
|---|---------------------|----------|------|-------|--------->
  2          40-85           10*n       T1     T2      time (ms)

Figure 3. Recovery Delay Components

At the head-end node, the OSPF-TE module immediately informs the Node Manager module, which in turn queries the LSP/DB in order to retrieve the subset of impacted LSPs associated with the specific failure. The LSP/DB query delay is accounted for in the 4th delay component (T1 in Figure 3), which depends on the number of outgoing LSPs that have been interrupted. Remarkably, in this phase the bottleneck is not the table lookup in the database, but rather the communication with the NM, which is implemented via a socket. After the query, the NM triggers the label re-mapping on the data plane for the impacted LSPs. The completion of the label re-mapping heavily depends on the number of records to be updated in the forwarding table, i.e.
on the number of LSPs that have been interrupted by the fault (T2 in Figure 3).

Table 3. Recovery Delay Components

       +-----------------------------+-----------------------------+
       |            1 LSP            |           20 LSP            |
       +---------+---------+---------+---------+---------+---------+
       |   min   |   max   |  mean   |   min   |   max   |  mean   |
+------+---------+---------+---------+---------+---------+---------+
|  T1  |    2    |    5    |    3    |   30    |   35    |   31    |
+------+---------+---------+---------+---------+---------+---------+
|  T2  |   22    |   30    |   28    |   113   |   289   |   135   |
+------+---------+---------+---------+---------+---------+---------+

6. Summary and Conclusions

We described experiments using flooding-based fault notification at two layers: optical transport and data networks. We achieved fast recovery using this scalable notification method.

Appendix A. LMP Protocol Modifications for flooding-based Fault Notification

In this section, we first describe a fault recovery scenario based on shared mesh recovery, and then provide the message formats and object definitions for LMP that we implemented to enable fault notification via LMP.

A.1 Fault Recovery Scenario

Nodes maintain a recovery table that keeps track of the recovery paths that a node is responsible for, together with information on the resources that each recovery path is using, in addition to fault node/link IDs. For example, the recovery path table may consist of the fault node ID, fault link ID, input port, input label (i.e. lambda), output port and output label.

When a failure occurs, the procedure followed is based on [3]. The messages of interest to us are FaultNotify and FaultNotifyAck. Briefly, each node that receives a FaultNotify message acknowledges it with a FaultNotifyAck and generates FaultNotify messages to the neighbors from which it hasn't received the FaultNotify message.
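The per-node flooding rule just described can be sketched as follows. This is an illustrative Python sketch of our own (class, field, and identifier names are hypothetical, not the LMP implementation), showing the acknowledge-and-forward behavior together with a duplicate check against a database of already-seen failures:

```python
# Sketch of the per-node FaultNotify flooding rule: acknowledge the
# sender, suppress duplicates by (fault id, sequence number), and
# forward to the remaining neighbors. Names are illustrative only.

class FloodingNode:
    def __init__(self, node_id, neighbors):
        self.node_id = node_id
        self.neighbors = set(neighbors)
        self.seen = set()        # database of (fault_id, seq) already handled
        self.sent = []           # log of (msg_type, dest, payload)

    def receive_fault_notify(self, sender, fault_id, seq):
        # Always acknowledge, so the sender stops retransmitting.
        self.sent.append(("FaultNotifyAck", sender, (fault_id, seq)))
        if (fault_id, seq) in self.seen:
            return False                      # duplicate: do not re-flood
        self.seen.add((fault_id, seq))
        # A real node would now consult its recovery table and
        # activate any recovery paths matching the failure data.
        for nbr in sorted(self.neighbors - {sender}):
            self.sent.append(("FaultNotify", nbr, (fault_id, seq)))
        return True

n = FloodingNode("R6", neighbors=["R5", "R9", "R10"])
n.receive_fault_notify("R9", fault_id=("R8", "R9"), seq=1)
n.receive_fault_notify("R10", fault_id=("R8", "R9"), seq=1)  # duplicate
# two FaultNotify messages were forwarded (to R5 and R10), none
# for the duplicate; both arrivals were acknowledged
print([m for m in n.sent if m[0] == "FaultNotify"])
```

The duplicate check is what keeps the flood from circulating forever: each (failure, sequence number) pair is forwarded by a node at most once, while acknowledgements are always returned so that retransmission timers at the senders can stop.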
A node keeps sending unicast FaultNotify messages periodically to each of its neighbors until it receives FaultNotifyAck messages (defined in Section A.2) from those neighbors, or until a retry timer expires. A FaultNotify message contains the node ID of the reporting node, the link ID of the failed link, and a sequence number. The message may optionally contain a TTL, a failed wavelength ID, or a failed SRLG ID. Each node that receives a FaultNotify message checks that it has not yet received a message about this failure from another neighboring node. For this purpose, it searches a database indexed on the failure data and sequence number; the database stores the failure data and sequence numbers from the received messages. If the receiving node has not yet received a message about this failure, it adds the failure data and sequence number to the database. The receiving node then activates any recovery paths that its recovery table associates with the failure data.

A.2 Additional LMP Message Formats

Two messages (FaultNotify and FaultNotifyAck) need to be defined. Most of the necessary data objects are already defined in LMP [13].

o FaultNotify Message (Msg Type = TBD)

   <FaultNotify Message> ::= <Common Header> <MESSAGE_ID>
                             <LOCAL_NODE_ID> <FAULT_ID> [<TTL>]
                             {<CHANNEL_STATUS> [<SRLG_ID>] ...}
   or
   <FaultNotify Message> ::= <Common Header> <MESSAGE_ID>
                             <LOCAL_NODE_ID> <FAULT_ID>
                             [<SRLG_ID> ...]

o FaultNotifyAck Message (Msg Type = TBD)

   <FaultNotifyAck Message> ::= <Common Header> <MESSAGE_ID_ACK>

The contents of the MESSAGE_ID_ACK object MUST be obtained from the FaultNotify message being acknowledged.

A.3 Additional LMP Object Definitions

The formats for the Common Header at the beginning of LMP messages and the LMP objects used to build the messages are defined in [13]. That document also defines the MESSAGE_ID, MESSAGE_ID_ACK, LOCAL_NODE_ID, and CHANNEL_STATUS data objects used in our extended messages. The SRLG_ID data object is defined in [14]. This leaves us to define data objects for TTL and FAULT_ID.
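Both of the object bodies defined next are fixed 32-bit words, so their encoding is straightforward. The following sketch is our own illustration: it packs only the object bodies in network byte order and omits the LMP object header (Class, C-Type, Length).

```python
import struct

def pack_ttl_body(ttl):
    """TTL object body: 8-bit TTL followed by 24 reserved (zero) bits."""
    if not 0 <= ttl <= 0xFF:
        raise ValueError("TTL is an 8-bit unsigned integer")
    return struct.pack("!B3x", ttl)          # network byte order

def pack_fault_id_body(fault_id):
    """FAULT_ID object body: 16 reserved (zero) bits, 16-bit FaultId."""
    if not 0 <= fault_id <= 0xFFFF:
        raise ValueError("FaultId is a 16-bit unsigned integer")
    return struct.pack("!2xH", fault_id)

print(pack_ttl_body(16).hex())       # '10000000'
print(pack_fault_id_body(5).hex())   # '00000005'
```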
o TTL Class (Class = TBD)

   C-Type = 1, Time to Live (= Hop Count)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      TTL      |                   Reserved                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   TTL: 8 bits

   This is an unsigned integer indicating the remaining hop count. A
   node receiving a FaultNotify message with a TTL of zero MUST
   silently discard the message. This object is non-negotiable.

o FAULT_ID Class (Class = TBD)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Reserved           |            FaultId            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   FaultId: 16 bits

   This MUST be a node-wide unique unsigned integer. The FaultId
   identifies the sequence of failures; a node increments the value
   each time it detects a failure. This object is non-negotiable.

Appendix B. OSPF Protocol Modifications for flooding-based Fault Notification: O-LSA Message format

The message that we use to advertise a fault notification is the type-10 (area-local scope) Opaque LSA described in [8], and it has the format shown in Figure 4.
Type-10 Opaque LSA

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            LSA age            |    Options    |      10       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             LSA_ID                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Advertising Router                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       LS sequence number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          LS checksum          |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                              Data                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure 4. Message Format for Fault Notification

In this message, the meaning of the various fields is described in [12], with the exception of the LSA_ID field, which has the format described in Figure 5.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       1       |                   Instance                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 5. LSA ID Field

The Instance field is an arbitrary value used to maintain multiple TE LSAs. The LSA payload of the Type-10 Opaque LSA that we implemented consists of two TLV (Type/Length/Value) triplets: a Type-1 TLV, called the Router Address TLV, and a Type-2 TLV, called the Link-TLV, as depicted in Figure 6.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type = 1 - Router-TLV     |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Value                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type = 2 - Link-TLV      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              ...                              |
   |                             Value                             |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        Figure 6. LSA Payload

The Value field of the Link-TLV, in particular, consists of a set of sub-TLVs with the following structure:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Value                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 7. Sub-TLV Structure

In our implementation, the Type-5 and Type-9 sub-TLVs (the TE-metric sub-TLV and the Administrative group sub-TLV, respectively) are adapted to flood information about the status of each link. In particular, we can handle the following information:

o Link-Failed: The TE-metric sub-TLV is used to notify the failure of a link. Upon detecting the failure, the nodes connected to that link send, on all of their interfaces, a Type-10 Opaque LSA that includes a TE-metric sub-TLV with the Value field set to 1. This Opaque LSA is flooded as described in [8], so that the failure notification is broadcast to all network nodes.
o SRLG membership: Information about the SRLGs is distributed via the Administrative group sub-TLV, in the three least significant bytes of the Value field.

o No-Risk-Link flag: Information about the indestructibility of a link is distributed via the Administrative group sub-TLV, in the most significant byte.

7. Intellectual Property Considerations

This section is taken from Section 10.4 of RFC 2026 [1].

The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director.

8. References

[1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

[2] Rabbat, R., Sharma, V. and Z. Ali, "Expedited Flooding for Restoration in Shared-Mesh Transport Networks", Internet Draft, work in progress, draft-rabbat-expedited-flooding-01.txt, January 2004.

[3] Rabbat, R. and V.
Sharma (Eds.), "Fault Notification Protocol for GMPLS-Based Recovery", Internet Draft, work in progress, draft-rabbat-fault-notification-protocol-04.txt, October 2003.

[4] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[5] Rabbat, R. and T. Soumiya (Eds.), "Optical Network Failure Recovery Requirements", Internet Draft, work in progress, draft-rabbat-optical-recovery-reqs-01.txt, January 2004.

[6] TANGO project homepage: http://tango.isti.crn.it

[7] Awduche, D., et al., "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC 3209, December 2001.

[8] Coltun, R., "The OSPF Opaque LSA Option", RFC 2370, July 1998.

[9] Apostolopoulos, G., Guerin, R. and S. K. Tripathi, "Quality of Service Routing: A Performance Perspective", in Proc. SIGCOMM'98, ACM, September 1998.

[10] Tequila project homepage: http://dsmpls.atlantis.ugent.be

[11] Zebra project homepage: http://zebra.org

[12] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

[13] Lang, J. (Ed.), "Link Management Protocol (LMP)", Internet Draft, work in progress, draft-ietf-ccamp-lmp-10.txt, October 2003.

[14] Fredette, A. and J. Lang (Eds.), "Link Management Protocol (LMP) for Dense Wavelength Division Multiplexing (DWDM) Optical Line Systems", Internet Draft, work in progress, draft-ietf-ccamp-lmp-wdm-03.txt, December 2003.

9. Authors' Addresses

Richard Rabbat                         Toshio Soumiya
Fujitsu Labs of America, Inc.          Fujitsu Laboratories Ltd.
1240 E. Arques Ave, MS 345             1-1, Kamikodanaka 4-Chome
Sunnyvale, CA 94085                    Nakahara-ku, Kawasaki
United States of America               211-8588, Japan
Phone: +1-408-530-4537                 Phone: +81-44-754-2765
Email: rabbat@fla.fujitsu.com          Email: soumiya.toshio@jp.fujitsu.com

Shinya Kanoh                           Vishal Sharma
Fujitsu Laboratories Ltd.              Metanoia, Inc.
1-1, Kamikodanaka 4-Chome              1600 Villa Street, Unit 352
Nakahara-ku, Kawasaki                  Mountain View, CA 94041
211-8588, Japan                        United States of America
Phone: +81-44-754-2765                 Phone: +1 408-530-8313
Email: kanoh@jp.fujitsu.com            Email: v.Sharma@ieee.org

Fabio Ricciato                         Roberto Albanese
INFOCOM Dept.                          INFOCOM Dept.
University La Sapienza of Rome         University La Sapienza of Rome
Rome, Italy                            Rome, Italy
Email: fabio@coritel.it                Email: Albanese@coritel.it

10. Full Copyright Statement

Copyright (C) The Internet Society (2004). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.