draft-vasseur-mpls-linknode-failure-00.txt                October 2002

Anna Charny
Jean-Philippe Vasseur
Cisco Systems, Inc.

IETF Internet Draft
Expires: April, 2003
October, 2002

Distinguish a link from a node failure using RSVP Hellos extensions

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

The aim of this draft is to provide a method to distinguish a link from a node failure using RSVP hello extensions. In a network making use of MPLS Traffic Engineering Fast Reroute as specified in [FAST-REROUTE], efficient use can be made of the network links when protecting against link/node failures. As described in [FACILITY], excess capacity used for bypass tunnels can be shared between bypass tunnels providing protection for mutually exclusive failures of different links or nodes. This results in significant bandwidth savings under the single failure assumption. Making use of the single failure assumption implies the need to distinguish a link from a node failure. However, the mechanisms currently available for failure detection do not always make it possible to distinguish a link from a node failure.

1. Terminology

LSR - Label Switch Router

LSP - An MPLS Label Switched Path

PCS - Path Computation Server (may be any kind of LSR (ABR, ...) or a centralized path computation server)

PCC - Path Computation Client (any head-end LSR) requesting a path computation from the Path Computation Server.

Local Repair - Techniques used to repair LSP tunnels quickly when a node or link along the LSP's path fails.

Protected LSP - An LSP is said to be protected at a given hop if it has one or multiple associated backup tunnels originating at that hop.

Detour LSP - An MPLS LSP used to re-route traffic around a failure in one-to-one backup.

Bypass Tunnel - An LSP that is used to protect a set of LSPs passing over a common facility.

Backup Tunnel - The LSP that is used to back up one of the many LSPs in many-to-one backup.

PLR - Point of Local Repair. The head-end of a backup tunnel or a detour LSP.

MP - Merge Point. The LSR where detour or backup tunnels meet the protected LSP. In case of one-to-one backup, this is where multiple detours converge. An MP may also be a PLR.

NHOP Bypass Tunnel - Next-Hop Bypass Tunnel. A backup tunnel which bypasses a single link of the protected LSP.

NNHOP Bypass Tunnel - Next-Next-Hop Bypass Tunnel. A backup tunnel which bypasses a single node of the protected LSP.

Reroutable LSP - Any LSP for which the "Local protection desired" bit is set in the Flags field of the SESSION_ATTRIBUTE object of its Path messages.

CSPF - Constraint-based Shortest Path First.

2. Introduction

In a network making use of MPLS Traffic Engineering Fast Reroute as specified in [FAST-REROUTE], efficient use can be made of the network links when protecting against link/node failures. As described in [FACILITY], excess capacity used for bypass tunnels can be shared between bypass tunnels providing protection for mutually exclusive failures of different links or nodes.
This results in significant bandwidth savings under the single failure assumption. Making use of the single failure assumption implies the need to distinguish a link from a node failure. However, the mechanisms currently available for failure detection do not always make it possible to distinguish a link from a node failure. Typically, a link down event does not tell a PLR whether the link attached to its NHOP or the NHOP itself has failed.

3. Problem statement

Consider the following scenario:

   R6--------R7---------R8
   |         |          |
   |         |          |
   |         R5         |
   |        /  \        |
   |       /    \       |
   R1----R2------R3-----R4

           Figure 1

with:

- a bypass tunnel T1 from R2 to its NNHOP R4 following the R2-R5-R7-R8-R4 path,
- a bypass tunnel T2 from R3 to its NNHOP R1 following the R3-R5-R7-R6-R1 path.

As T1 and T2 protect independent resources (respectively routers R3 and R2), under the assumption of a single failure, they can share the backup bandwidth, in particular on the R5-R7 link.

In case of a failure of the bi-directional link R2-R3, the inability to distinguish a link from a node failure would result in the simultaneous usage of T1 and T2, as both R2 and R3 could start using their NNHOP backup tunnels, each concluding that its neighbor node has failed. This would in effect result in a violation of the single failure assumption, although in practice only a single failure (that of bi-directional link R2-R3) occurred.

[FACILITY] proposes both a centralized and a distributed bypass tunnel path computation model.
In either of these models, the bypass tunnels are placed with zero signaled bandwidth in order to take advantage of bandwidth sharing between independent failures.

In a centralized scenario, the inability to distinguish a link from a node failure could be handled by the centralized backup tunnel path computation by taking into account that T1 and T2 will in fact be used simultaneously, and making sure that any link they both traverse has enough bandwidth to accommodate both of them at the same time. However, this would clearly result in additional algorithm complexity and less efficient bandwidth sharing.

In the distributed backup tunnel path computation scenario, the capability to distinguish a link from a node failure is clearly mandatory: without it, backup tunnels protecting different elements become dependent on each other and cannot be computed independently.

4. Usage of RSVP hello

The RSVP hello extension is defined in RFC3209. RSVP hellos can be used in various contexts:

- to detect a link failure when the layer 2 protocol does not provide:
   - link failure notification (e.g., two routers connected via a GE switch),
   - fast link failure notification,
- to detect a node failure when:
   - the layer 2 protocol does not provide link failure notification,
   - the layer 2 protocol does not provide fast failure notification,
   - the link does not fail.

5. Mechanism to distinguish a link from a node failure using RSVP hellos

The proposal of this draft to distinguish a link from a node failure is to enable the exchange of RSVP hello messages over a path other than the directly connected link.

Note that this alternate path can be the NHOP bypass tunnel used to protect the link or any other NHOP TE LSP following a path diversely routed from the protected link.

At steady state, the PLR maintains an RSVP hello adjacency with its neighbor over the directly connected link.
When a link failure is detected through layer 2, or the RSVP hello adjacency over the directly connected link goes down, the PLR uses RSVP hellos to determine whether its neighbor is reachable via a path other than the failed link. If this is the case, the PLR can conclude that the failure is a link failure. If not, the failure is a node failure. This allows the PLR to take the appropriate rerouting decision and make use of the NHOP or NNHOP bypass tunnel(s) (see section 7).

Note that sending the RSVP hellos over the NHOP bypass tunnel does not require any additional RSVP extension and is in compliance with RFC3209.

6. Mode of operation

In the network depicted in figure 1, suppose that:

- On R2, two bypass tunnels are defined:
   - an NHOP bypass tunnel T1 following the R2-R5-R3 path,
   - a bypass tunnel T2 from R2 to its NNHOP R4 following the R2-R5-R7-R8-R4 path,
- On R3, two bypass tunnels are defined:
   - an NHOP bypass tunnel T'1 following the R3-R5-R2 path,
   - a bypass tunnel T'2 from R3 to its NNHOP R1 following the R3-R5-R7-R6-R1 path,
- RSVP hello messages are exchanged over the directly connected link every hello-interval1. The RSVP hello adjacency is considered down if no RSVP hello is received from a neighbor within miss-ack1 * hello-interval1.
- RSVP hello messages are exchanged over the alternate path every hello-interval2. The RSVP hello adjacency is considered down if no RSVP hello is received from a neighbor within miss-ack2 * hello-interval2.

The proposal can run in two modes:

- "Triggered" mode: once the link failure has been detected or the RSVP hello adjacency over the directly connected link goes down, the PLR triggers the sending of RSVP hellos over the alternate path (NHOP bypass tunnel). Ideally, the frequency of the RSVP hellos sent over the NHOP bypass tunnel should be high to reduce the time required to detect a node failure.
  This does not have any substantial scalability impact, as those RSVP hellos are sent only in this particular situation, not at steady state.

  If the PLR does not get any response within a configurable amount of time (miss-ack2 * hello-interval2), it can conclude that the failure is a node failure. In the following cases:

   - node failure over a layer 2 link providing fast link failure notification (e.g., Packet Over SONET link) also generating a link failure: the PLR will detect the node failure after at most:

      link failure detection time + miss-ack2 * hello-interval2 + Propagation delay on the alternate path

   - node failure over a layer 2 link not providing (fast) link failure notification, or a node failure not generating a link failure: the PLR will detect the node failure after at most:

      miss-ack1 * hello-interval1 + miss-ack2 * hello-interval2 + Propagation delay on the alternate path

- "Dual" mode: in this mode, the PLR maintains at least two RSVP hello adjacencies with every neighbor: one over the directly connected link and one over the NHOP bypass tunnel. In case of a link failure over a layer 2 link providing fast link failure notification (e.g., Packet Over SONET link), the PLR will detect the link failure and will get an RSVP hello reply in at most miss-ack2 * hello-interval2 + Propagation delay on the alternate path. In the following cases:

   - node failure over a layer 2 link providing fast link failure notification (e.g., Packet Over SONET link) also generating a link failure: the PLR will detect the node failure after at most:

      max(link failure detection time, miss-ack2 * hello-interval2) + Propagation delay on the alternate path
   - node failure over a layer 2 link not providing (fast) link failure notification, or a node failure not generating a link failure: the PLR will detect the node failure after at most:

      max(miss-ack1 * hello-interval1, miss-ack2 * hello-interval2) + Propagation delay on the alternate path

Performance comparisons

Which of the two modes is the more efficient depends on several factors:

Ex 1: with a Packet Over SONET link (POS link), a node failure also generating a link failure will be quickly detected by the PLR. In that case, the triggered mode, provided the propagation delay over the alternate path is not very high, is likely to be the most efficient.

Ex 2: in case of a node failure not generating any link failure, the dual mode is likely to be more efficient, as it will require

   max(miss-ack1 * hello-interval1, miss-ack2 * hello-interval2) + Propagation delay on the alternate path

to detect the node failure, compared to

   miss-ack1 * hello-interval1 + miss-ack2 * hello-interval2 + Propagation delay on the alternate path

for the triggered mode.

The "dual" mode might have some scalability impact, as it requires doubling the number of RSVP hello adjacencies on every node.

7. Fast Rerouting decision

Once the link failure has been detected by the PLR or the RSVP hello adjacency over the directly connected link goes down, there is a period of time during which the PLR does not know whether the failure is a link or a node failure. The duration of that period depends on the type of failure and the mode in use ("triggered" versus "dual" mode), as described above.

At that point, there are two possible decisions that the PLR can take:

(1) start using the NHOP bypass tunnel(s) to reroute every protected TE LSP that used to cross the failed link, supposing the failure is a link failure.
    When the PLR knows the type of failure:
      o if this is a link failure, do nothing,
      o if this is a node failure, move the protected TE LSPs not terminating at the NHOP from the NHOP bypass tunnel(s) to the NNHOP bypass tunnel(s).

(2) start using the NHOP bypass tunnel(s) to reroute the protected TE LSPs that terminate at the NHOP and use the NNHOP bypass tunnel(s) to reroute the protected TE LSPs that do not terminate at the NHOP. Once the failure type is determined:
      o if this is a link failure, move the protected TE LSPs not terminating at the NHOP to the NHOP bypass tunnel(s),
      o if this is a node failure, stop rerouting the TE LSPs terminating at the NHOP.

The pros and cons of each approach are quite straightforward:

- (1): in the case of a link-only failure, it guarantees bandwidth protection upon link failure detection. In the case of a node failure (resulting in a link failure as well), it is a bit slower (more traffic disruption) for the protected TE LSPs not terminating at the NHOP, but guarantees bandwidth protection once the node failure is detected.

- (2): it is faster (less traffic disruption) in case of a node failure for the protected TE LSPs not terminating at the NHOP. In the case of a link-only failure, however, it might result in a temporary bandwidth violation between the detection of the link failure and the determination of the failure type (bandwidth protection is guaranteed after the failure has been properly classified).

8. Possible optimization

Optimization 1: in some link failure cases, the PLR can unambiguously identify a link failure. In those cases, the PLR does not need to start an RSVP hello adjacency on the bypass tunnel (triggered mode).

Optimization 2: in case of a node failure, the penultimate LSR on the bypass tunnel path will likely send a Path Error right after the node failure.
In this case, the PLR might receive the Path Error before the RSVP hello adjacency over the bypass tunnel goes down and could immediately conclude that the failure is a node failure.

9. Security Considerations

The practice described in this draft does not raise specific security issues beyond those of existing TE.

10. Acknowledgment

The authors would like to thank Carol Iturralde, Rob Goguen, Elisheva Hochberg and Jay Hosler for their useful and valuable comments.

11. Intellectual Property

The contributor represents that he has disclosed the existence of any proprietary or intellectual property rights in the contribution that are reasonably and personally known to the contributor. The contributor does not represent that he personally knows of all potentially pertinent proprietary and intellectual property rights owned or claimed by the organization he represents (if any) or third parties.

References

[TE-REQ] Awduche et al., "Requirements for Traffic Engineering over MPLS", RFC2702, September 1999.

[OSPF-TE] Katz, Yeung, "Traffic Engineering Extensions to OSPF", draft-katz-yeung-ospf-traffic-05.txt, June 2001.

[ISIS-TE] Smit, Li, "IS-IS extensions for Traffic Engineering", draft-ietf-isis-traffic-03.txt, June 2001.

[RSVP-TE] Awduche et al., "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC3209, December 2001.

[CR-LDP] Jamoussi et al., "Constraint-Based LSP Setup using LDP", draft-ietf-mpls-cr-ldp-05.txt, February 2001.

[METRICS] Fedyk et al., "Multiple Metrics for Traffic Engineering with IS-IS and OSPF", draft-fedyk-isis-ospf-te-metrics-01.txt, November 2000.

[DS-TE] Le Faucheur et al., "Requirements for support of Diff-Serv-aware MPLS Traffic Engineering", draft-ietf-tewg-diff-te-reqts-01.txt, June 2001.

[PATH-COMP] Vasseur et al., "RSVP Path computation request and reply messages", draft-vasseur-mpls-computation-rsvp-03.txt, November 2001.

[FAST-REROUTE] Pan, P. et al., "Fast Reroute Techniques in RSVP-TE", Internet Draft, draft-ietf-mpls-rsvp-lsp-fastreroute-01.txt, October 2002.

[FACILITY] Vasseur, Charny, Le Faucheur and Achirica, "MPLS Traffic Engineering Fast reroute: backup tunnel path computation for bandwidth protection", draft-vasseur-mpls-backup-computation-01.txt, October 2002.

[BP-PLACEMENT] Leroux, Calvignac, "A method for an Optimized Online Placement of MPLS Bypass Tunnels", draft-leroux-mpls-bypass-placement-00.txt, February 2002.

[KINI] Kini et al., "Shared Backup Label Switched Path Restoration", draft-kini-restoration-shared-backup-01.txt, May 2001.

[ISIS-PCSD] Vasseur and Shand, "IS-IS Path Computation Server discovery TLV", draft-vasseur-mpls-isis-pcsd-discovery-00.txt, work in progress.

[OSPF-TE-CAP] Vasseur, Psenak, "OSPF Traffic Engineering capability TLVs", draft-vasseur-mpls-ospf-te-cap-00.txt, work in progress.

Authors' Address:

Anna Charny
Cisco Systems, Inc.
300 Apollo Drive
Chelmsford, MA 01824
USA
Email: acharny@cisco.com

Jean Philippe Vasseur
Cisco Systems, Inc.
300 Apollo Drive
Chelmsford, MA 01824
USA
Email: jpv@cisco.com
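As an illustration of the worst-case node failure detection times given in section 6, the following Python sketch computes the bound for the "triggered" and "dual" modes. It is only a sketch under stated assumptions: the class, function and parameter names (HelloTimers, first_detection, triggered_bound, dual_bound, link_detection_ms, alt_path_delay) are hypothetical and are not defined by this draft, and the timer values are arbitrary examples expressed in milliseconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HelloTimers:
    """Hypothetical container for the section 6 timers (all times in ms)."""
    hello_interval1: int  # hello period on the directly connected link
    miss_ack1: int        # missed hellos tolerated on the directly connected link
    hello_interval2: int  # hello period on the alternate path
    miss_ack2: int        # missed hellos tolerated on the alternate path
    alt_path_delay: int   # propagation delay on the alternate path


def first_detection(t: HelloTimers, link_detection_ms: Optional[int]) -> int:
    """Time for the PLR to notice trouble on the directly connected link.

    link_detection_ms is the layer 2 link failure detection time, or None
    when only the RSVP hello adjacency on that link reveals the failure.
    """
    if link_detection_ms is not None:
        return link_detection_ms
    return t.miss_ack1 * t.hello_interval1


def triggered_bound(t: HelloTimers, link_detection_ms: Optional[int] = None) -> int:
    """Worst-case node failure detection time in "triggered" mode.

    Hellos on the alternate path start only after the first detection,
    so the two delays add up.
    """
    return (first_detection(t, link_detection_ms)
            + t.miss_ack2 * t.hello_interval2
            + t.alt_path_delay)


def dual_bound(t: HelloTimers, link_detection_ms: Optional[int] = None) -> int:
    """Worst-case node failure detection time in "dual" mode.

    Both adjacencies run in parallel, so the slower of the two dominates.
    """
    return (max(first_detection(t, link_detection_ms),
                t.miss_ack2 * t.hello_interval2)
            + t.alt_path_delay)


# Example values only; they are not recommendations from this draft.
timers = HelloTimers(hello_interval1=10, miss_ack1=3,
                     hello_interval2=10, miss_ack2=3, alt_path_delay=5)

# Node failure that layer 2 reports as a link failure within 2 ms.
print(triggered_bound(timers, link_detection_ms=2),
      dual_bound(timers, link_detection_ms=2))
# Node failure that does not generate any link failure.
print(triggered_bound(timers), dual_bound(timers))
```

With these example values, a node failure that does not generate a link failure yields a bound of 65 ms in triggered mode and 35 ms in dual mode, which is in line with the comparison made in Ex 2.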