INTERNET-DRAFT       Bidirectional Forwarding Detection      A. Palanivelan
Category: HISTORIC                                            Cisco Systems
Expires: Dec 2010                                             June 09, 2010

        Bidirectional Forwarding Detection (BFD) with Graceful Restart
                    draft-palanivelan-bfd-v2-gr-05.txt

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on December 10, 2010.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the BSD License.
A.Palanivelan                                                  [Page 1]

Internet Draft         draft-palanivelan-bfd-v2-gr-05.txt         June 2010

Abstract

This document proposes an extension to Bidirectional Forwarding Detection (BFD) to support Graceful Restart (GR), complementing the Graceful Restart support of the underlying protocol. The extension is intended to work consistently irrespective of the BFD mode, the protocol, the type of restart, and, most importantly, the vendor's design and implementation. This document describes in detail the challenges BFD faces in surviving a graceful restart, and a generic solution to them.

Table of Contents

   1  Introduction
   2  Overview
   3  Motivations
      3.1  Planned Restarts with control protocols
      3.2  BFD Co-existing with BB configs
   4  Extensions to BFD
      4.1  Version (Vers)
      4.2  Diagnostic (Diag)
      4.3  My Restart Interval
      4.4  Your Restart Interval
   5  State Machine for BFD with GR Support
   6  Theory of Operation
      6.1  Session Establishment and GR Timer exchange
      6.2  Remote Neighbor Restart and Recovery
   7  Security Considerations
   8  IANA Considerations
   9  References
      9.1  Normative References
      9.2  Informative References
   10 Authors' Addresses

1. Introduction

The Bidirectional Forwarding Detection protocol [BFD] provides a mechanism for liveness detection of arbitrary paths between systems. It is intended to provide low-overhead, short-duration detection of failures in the path between adjacent forwarding engines, including the interfaces, data link(s), and, to the extent possible, the forwarding engines themselves. It operates independently of media, data protocols, and routing protocols. An additional goal is to provide a single mechanism that can be used for liveness detection over any media, at any protocol layer, with a wide range of detection times and overhead, to avoid a proliferation of different methods.

The extensions introduced in this draft help BFD complement the GR capabilities of protocols such as OSPF, and provide consistent behavior for planned and unplanned restarts irrespective of the underlying protocol. The intention of this document is to provide a solution that works for all types of BFD implementations.

2. Overview

The Bidirectional Forwarding Detection [BFD] specification defines a protocol with simple and specific semantics. Its sole purpose is to verify connectivity between a pair of systems, for a particular data protocol across a path (which may be of any technology, length, or OSI layer). The promptness of the detection of a path failure can be controlled by trading off protocol overhead and system load against detection times.

BFD as specified is assumed to work without any GR support of its own. However, deployments show that the different types of implementations in products, and their inherent mechanisms, lead to problems with BFD, especially in surviving a graceful restart. Prioritizing BFD processing would ensure that other CPU-intensive processes do not cause BFD to fail, but this is not always possible, because there may be other higher-priority processing that cannot be deferred. An example is existing subscriber connections, which cannot be given a lower priority.
The extensions introduced in this draft help BFD complement the GR capabilities of protocols such as OSPF, and provide consistent behavior for planned and unplanned restarts of the underlying protocols.

3. Motivations

Although existing drafts discuss BFD interactions with applications that use Graceful Restart, and ways of implementing BFD so that GR succeeds, those drafts themselves carry some exceptions and caveats. This draft discusses the issues in the following scenarios and provides a generic solution that should scale to future applications and work for all types of BFD implementations:

* Planned restart with a control protocol, such as IS-IS, that cannot signal GR.

* BFD co-existing with BB (broadband) configurations.

This document addresses the above issues specifically, and the Graceful Restart mechanism for BFD in general.

3.1 Planned Restarts with control protocols

The existing BFD drafts suggest administratively disabling BFD before the start of GR. However, this works only for planned restarts, not for unplanned ones. It also does not work for a protocol such as IS-IS that cannot signal a planned restart.

For a planned restart where the control protocol can signal before restarting, the existing document(s) recommend that, if a BFD session failure occurs during the restart, the planned restart SHOULD NOT be aborted and the session failure SHOULD NOT result in a topology change being signaled in the control protocol.

Control protocols that cannot signal a planned restart depend on the recently restarted system to signal the Graceful Restart before the control protocol adjacency times out. In most cases, whether the restart is planned or unplanned, the BFD session is likely to time out before the onset of Graceful Restart, and a topology change SHALL be signaled.
This behavior impacts non-stop routing and non-stop forwarding with GR-enabled protocols, and provides an opportunity to review and improve existing BFD implementations.

3.2 BFD Co-existing with BB configs

In real-world deployments with broadband configurations, it is highly likely that BFD sessions do not survive a graceful restart. Consider a PE router that has active DHCP sessions with a large number of clients (say 16k). During a planned restart, it is also likely that some DHCP clients request renewal of their IP addresses from the server (the restarting router) at that time. While the router is restarting, these requests do not reach it. When the requests do reach the router just after it comes up, it treats them at a high priority and responds to them. With thousands of such requests arriving at the restarting router, it spends a major part of its first second of uptime addressing them.

In this scenario, a control protocol such as OSPFv2 with GR enabled [OSPF-GRACE] withstands the restart for the specified restart interval (which is expressed in seconds) and is likely to survive the restart while maintaining its forwarding plane.

In the same scenario, if BFD is enabled for OSPFv2, then during an unplanned restart the BFD neighbor router, which expects BFD control packets at millisecond intervals, is likely to time out the session during the restart. This also brings down the associated OSPFv2 adjacency and results in loss of traffic. The outcome is the same for BFD with a protocol such as IS-IS [IS-IS-GRACE], where the problem is likely to be seen even for a planned restart.

4. Extensions to BFD

This draft introduces a new Diag value to indicate that the neighbor is restarting, and provisions to configure graceful restart timers.
The generic BFD Control packet format shown below adds two fields, "My Restart Interval" and "Your Restart Interval".

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Vers |  Diag   |Sta|P|F|C|A|D|M|  Detect Mult  |    Length     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       My Discriminator                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Your Discriminator                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Desired Min TX Interval                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Required Min RX Interval                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Required Min Echo RX Interval                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      My Restart Interval                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Your Restart Interval                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.1 Version (Vers)

The version of BFD defined by this draft, which supports GR configuration and a Diag value for the neighbor-restarting state, shall have a value of 2.

4.2 Diagnostic (Diag)

A diagnostic code specifying the local system's reason for the last change in session state. This draft introduces a new Diag value, 9, for "Neighbor Restarting". Values are:

   0 -- No Diagnostic
   1 -- Control Detection Time Expired
   2 -- Echo Function Failed
   3 -- Neighbor Signaled Session Down
   4 -- Forwarding Plane Reset
   5 -- Path Down
   6 -- Concatenated Path Down
   7 -- Administratively Down
   8 -- Reverse Concatenated Path Down
   9 -- Neighbor Restarting
   10-31 -- Reserved for future use

This field allows remote systems to determine the reason that the previous session failed.
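The field layout in Section 4 and the Diag values above can be illustrated with a short Python sketch using the standard `struct` module. It is a sketch under stated assumptions, not a reference implementation: it assumes the two restart-interval fields are 32-bit unsigned integers in network byte order that immediately follow Required Min Echo RX Interval, and that the Length field then carries 32 (the draft does not state the Length value). The function and constant names are hypothetical, and the optional Authentication Section of [BFD] is not handled.

```python
import struct

# Assumed wire layout (Section 4 diagram): the [BFD] mandatory section
# followed by two 32-bit restart intervals, all in network byte order.
# The draft does not give a Length value; 32 bytes is an assumption.
GR_PACKET_FORMAT = "!BBBBIIIIIII"
GR_PACKET_LEN = struct.calcsize(GR_PACKET_FORMAT)  # 4 x 1 byte + 7 x 4 bytes = 32

BFD_VERSION_GR = 2            # Section 4.1
DIAG_NEIGHBOR_RESTARTING = 9  # Section 4.2
STATE_UP = 3                  # Sta values in [BFD]: 0 AdminDown, 1 Down, 2 Init, 3 Up


def encode_gr_packet(diag, state, detect_mult, my_disc, your_disc,
                     desired_min_tx, required_min_rx, required_min_echo_rx,
                     my_restart_interval, your_restart_interval, flags=0):
    """Pack a version-2 Control packet; intervals are in microseconds."""
    if not 0 <= diag <= 31:
        raise ValueError("Diag is a 5-bit field")
    if not 0 <= state <= 3:
        raise ValueError("Sta is a 2-bit field")
    vers_diag = (BFD_VERSION_GR << 5) | diag   # Vers: top 3 bits, Diag: low 5
    sta_flags = (state << 6) | (flags & 0x3F)  # Sta: top 2 bits, then P|F|C|A|D|M
    return struct.pack(GR_PACKET_FORMAT, vers_diag, sta_flags, detect_mult,
                       GR_PACKET_LEN, my_disc, your_disc, desired_min_tx,
                       required_min_rx, required_min_echo_rx,
                       my_restart_interval, your_restart_interval)


def decode_gr_packet(data):
    """Unpack the fields of a version-2 Control packet into a dict."""
    if len(data) < GR_PACKET_LEN:
        raise ValueError("packet shorter than the assumed GR layout")
    (vers_diag, sta_flags, detect_mult, length, my_disc, your_disc,
     desired_min_tx, required_min_rx, required_min_echo_rx,
     my_restart, your_restart) = struct.unpack_from(GR_PACKET_FORMAT, data)
    return {
        "version": vers_diag >> 5,
        "diag": vers_diag & 0x1F,
        "state": sta_flags >> 6,
        "flags": sta_flags & 0x3F,
        "detect_mult": detect_mult,
        "length": length,
        "my_discriminator": my_disc,
        "your_discriminator": your_disc,
        "desired_min_tx": desired_min_tx,
        "required_min_rx": required_min_rx,
        "required_min_echo_rx": required_min_echo_rx,
        "my_restart_interval": my_restart,
        "your_restart_interval": your_restart,
    }
```

For example, encoding a packet with Diag 9 and state Up and decoding it again yields `"version": 2` and `"diag": 9`, with both restart intervals carried unchanged after the existing [BFD] fields.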
4.3 My Restart Interval

This is the restart interval, in microseconds, that the transmitting system advertises to the remote system. If the transmitting system restarts, the remote system is expected to keep the BFD session up for this duration. This field shall have a value greater than the detection time. A value of 0 indicates to the remote system that this system has BFD GR disabled.

4.4 Your Restart Interval

The restart interval, in microseconds, received from the corresponding remote system. If the remote system restarts, the transmitting system is expected to keep the BFD session up for this duration. This field shall have a value greater than the detection time.

5. State Machine for BFD with GR Support

The BFD state machine is quite straightforward and is explained in detail in [BFD]. The [BFD] RFC defines the following BFD states: Down, Init, Up, and AdminDown.

Each system communicates its session state in the State (Sta) field of the BFD Control packet, and that received state, in combination with the local session state, drives the state machine. Please refer to [BFD] for the state machine diagram and detailed explanations of the state transitions.

The following provides an overview of the state transitions for BFD with GR support (that is, where "Your Restart Interval" is non-zero and greater than the detection time). This document does not introduce any new state to the BFD state machine.

"Your Restart Interval" shall have a value greater than the detection time. If this value is zero or less than the detection time, the state transitions follow the BFD state machine exactly as defined by [BFD].

The event notation on each transition represents the state of the remote system (as received in the State field of the BFD Control packet) or indicates the expiration of the Detection Timer.
Both "UP" rows below are the Up state of [BFD]; they differ only in the running timer: "Detect Interval" while Diag = 0, and "Your Restart Interval" while Diag = 9.

   From                      Event                          To
   ------------------------  -----------------------------  -------------
   DOWN                      DOWN                           INIT
   DOWN                      INIT                           UP (Diag = 0)
   DOWN                      UP, ADMIN DOWN, TIMER          DOWN
   INIT                      DOWN                           INIT
   INIT                      INIT, UP                       UP (Diag = 0)
   INIT                      ADMIN DOWN, TIMER              DOWN
   UP (Diag = 0)             INIT, UP                       UP (Diag = 0)
   UP (Diag = 0)             {Neighbor Restart}             UP (Diag = 9)
   UP (Diag = 0)             ADMIN DOWN, DOWN, TIMER        DOWN
   UP (Diag = 9)             {Neighbor Restart complete}    UP (Diag = 0)
   UP (Diag = 9)             ADMIN DOWN, DOWN, TIMER        DOWN

Note 1: These transitions hold for BFD with the GR extension described in this document, which implies that "Your Restart Interval" has a value greater than the Detection Time of the established session.

Note 2: Events in braces {} indicate GR-specific events on the remote neighbor (restart and restart complete).

6. Theory of Operation

A system that supports high availability and uses a GR-enabled routing protocol continues to forward traffic during a restart. When BFD is enabled for such a protocol, it is expected to assist this process rather than disrupt it.

With current BFD implementations, BFD sessions do not survive a restart under several conditions. An unplanned restart, or a planned restart with a protocol such as IS-IS that cannot signal the restart, are examples of conditions under which the BFD configuration undermines high availability. Although various vendors have adopted their own methods to make BFD survive restarts, there is no uniform method, and these methods are likely to fail when interoperating with routers from other vendors. This draft proposes a standard way of achieving this objective.
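Before the detailed procedures below, the transitions listed in Section 5 can be sketched as a small transition function. This Python sketch is illustrative only: the `State` and `Event` names and the `gr_active` flag are hypothetical, the local AdminDown state is omitted, and "Neighbor Restart" and "Neighbor Restart complete" are assumed to be raised when a packet from an Up neighbor arrives with Diag 9 or Diag 0 respectively. Following Section 5, the GR events have no effect when GR is not active for the session.

```python
from enum import Enum


class State(Enum):
    DOWN = "Down"
    INIT = "Init"
    UP = "Up"


class Event(Enum):
    # Remote state received in the Sta field, or expiry of the running timer.
    ADMIN_DOWN = "ADMIN DOWN"
    DOWN = "DOWN"
    INIT = "INIT"
    UP = "UP"
    TIMER = "TIMER"
    # GR-specific events on the remote neighbor (hypothetical mapping:
    # Diag 9 received -> NEIGHBOR_RESTART, Diag 0 received -> ..._COMPLETE).
    NEIGHBOR_RESTART = "{Neighbor Restart}"
    NEIGHBOR_RESTART_COMPLETE = "{Neighbor Restart complete}"


def next_state(state, restarting, event, gr_active=True):
    """Return (next_state, restarting) after one event.

    `restarting` models the "UP (Diag = 9)" rows: still the Up state of
    [BFD], but timed with "Your Restart Interval" instead of the
    Detection Time.
    """
    if state is State.DOWN:
        if event is Event.DOWN:
            return State.INIT, False
        if event is Event.INIT:
            return State.UP, False
        return State.DOWN, False  # UP, ADMIN DOWN, TIMER (GR events ignored)
    if state is State.INIT:
        if event in (Event.INIT, Event.UP):
            return State.UP, False
        if event in (Event.ADMIN_DOWN, Event.TIMER):
            return State.DOWN, False
        return State.INIT, False  # DOWN (GR events ignored)
    # state is State.UP
    if event in (Event.ADMIN_DOWN, Event.DOWN, Event.TIMER):
        return State.DOWN, False
    if event is Event.NEIGHBOR_RESTART and gr_active:
        return State.UP, True
    if event is Event.NEIGHBOR_RESTART_COMPLETE:
        return State.UP, False
    return State.UP, restarting  # INIT, UP, or a GR event ignored without GR
```

For example, an Up session that sees Diag 9 moves to `(State.UP, True)`; if its restart timer then expires, the `TIMER` event takes it to `(State.DOWN, False)`, exactly as for an ordinary Detection Time expiry.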
This draft introduces a new Diag value (9, "Neighbor Restarting"), a new version number (2, for GR-capable BFD), and two additional fields in the BFD packet format. This design is expected to let BFD withstand restart scenarios and complement the associated protocol. It is intended to work consistently irrespective of the BFD mode, the protocol, or the type of restart.

6.1. Session Establishment and GR Timer exchange

BFD session establishment follows the procedures described in [BFD]. If the mechanism described in this document is implemented, the BFD Control packets shall carry the following fields with the values given below:

* The Version field (Section 4.1) shall have a value of 2, indicating support for GR.

* The new "My Restart Interval" field (Section 4.3) shall have a non-zero value that is greater than the detection time.

* The new "Your Restart Interval" field (Section 4.4) shall have a non-zero value that is greater than the detection time, once it has been learned from the remote system.

The "My Restart Interval" and "Your Restart Interval" fields are used to exchange GR timer information between the systems. "My Restart Interval" is the time interval, in microseconds, that this system expects its remote system to wait before bringing down its BFD session with this system. "Your Restart Interval" is the time interval, in microseconds, specified by the remote system, that the remote system expects this system to wait before bringing down its BFD session with the remote system.
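The field rules above, together with the mirroring of the remote "My Restart Interval" into the local "Your Restart Interval" (Section 4.4) and the applicability condition in Section 5, can be sketched in Python as follows. The class and function names are hypothetical, all intervals are assumed to be in microseconds, and treating a received packet whose Version is not 2 as "GR not supported" is an interpretation of Section 4.1 rather than a rule stated by this draft.

```python
from dataclasses import dataclass

BFD_VERSION_GR = 2  # Section 4.1


@dataclass
class GrTimers:
    """Restart intervals held per BFD session, in microseconds (hypothetical)."""
    my_restart_interval: int = 0    # advertised by this system; 0 = BFD GR disabled here
    your_restart_interval: int = 0  # learned from the remote; 0 = disabled or not yet known

    def outgoing_fields(self):
        """GR-related fields to place in the next transmitted Control packet."""
        return {
            "version": BFD_VERSION_GR,
            "my_restart_interval": self.my_restart_interval,
            "your_restart_interval": self.your_restart_interval,
        }

    def on_receive(self, version, remote_my_restart_interval):
        """Mirror the remote "My Restart Interval" (Section 4.4).

        Assumption: a packet that is not version 2 carries no GR fields,
        so the remote is treated as having GR disabled.
        """
        if version != BFD_VERSION_GR:
            self.your_restart_interval = 0
        else:
            self.your_restart_interval = remote_my_restart_interval


def gr_applies(your_restart_interval, detection_time):
    """Section 5: GR handling applies only if the interval is non-zero and
    greater than the Detection Time; otherwise plain [BFD] rules apply."""
    return your_restart_interval != 0 and your_restart_interval > detection_time


if __name__ == "__main__":
    # Replay of a three-packet exchange (interval values illustrative only).
    a = GrTimers(my_restart_interval=120_000_000)  # "AAAA"
    b = GrTimers(my_restart_interval=90_000_000)   # "BBBB"
    p1 = a.outgoing_fields()                        # A -> B: Your = 0
    b.on_receive(p1["version"], p1["my_restart_interval"])
    p2 = b.outgoing_fields()                        # B -> A: Your = AAAA
    a.on_receive(p2["version"], p2["my_restart_interval"])
    p3 = a.outgoing_fields()                        # A -> B: Your = BBBB
    print(p1, p2, p3, sep="\n")
```

The replay mirrors the session-establishment exchange in this section: the first packet carries a "Your Restart Interval" of 0 because nothing has been received yet, and each side's value is filled in from the peer's "My Restart Interval" on the next packet.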
Once the packet exchanges are complete and the BFD sessions are up, each BFD session knows both the time interval its remote system will wait during a restart of this system, and the time interval this system has to wait when the remote system restarts. The "My Restart Interval" and "Your Restart Interval" values can be modified after the session is up, just like the other BFD parameters; in that case, the packet exchanges shall synchronize the restart intervals (My and Your) on both sides.

The exchange of GR-specific parameters during BFD session establishment is shown in the diagram below. For clarity, the diagram shows only part of each control packet.

      SystemA                                      SystemB
         |                                            |
         |------------------------------------------->|
         |  {bfd.version = 2                          |
         |   bfd.MyRestartInterval   = AAAA           |
         |   bfd.YourRestartInterval = 0 }            |
         |                                            |
         |<-------------------------------------------|
         |  {bfd.version = 2                          |
         |   bfd.MyRestartInterval   = BBBB           |
         |   bfd.YourRestartInterval = AAAA }         |
         |                                            |
         |------------------------------------------->|
         |  {bfd.version = 2                          |
         |   bfd.MyRestartInterval   = AAAA           |
         |   bfd.YourRestartInterval = BBBB }         |
         |                                            |

In the initial BFD packet exchange, "My Restart Interval" carries the configured value, or 0. "Your Restart Interval" reflects the value received in "My Restart Interval" from the corresponding remote system, or is zero if no value has been received. A value of zero in "Your Restart Interval" means that BFD GR is disabled at the remote end; similarly, a value of zero in "My Restart Interval" means that BFD GR is disabled at the transmitting system.

6.2. Remote Neighbor Restart and Recovery

When BFD neighbors have established their BFD sessions (with their BFD GR timer values exchanged as described above), the following operations take place when the remote neighbor attempts a graceful restart (for example, with a GR-enabled routing protocol such as OSPFv2 or IS-IS tied to BFD).

For clarity, let us revisit the BFD timers and the Detection Time as described in [BFD]. The Detection Time (the period of time without receiving BFD packets after which the session is determined to have failed) is not carried explicitly in the protocol. Rather, it is calculated independently in each direction by the receiving system, based on the negotiated transmit interval and the detection multiplier. This means that a BFD Control packet must be received from the remote neighbor within the Detection Time; if none is received within this time, the timer expiry brings the BFD session state to Down.

In a Graceful Restart scenario, the routing protocol (such as OSPFv2) may be in graceful restart mode with the remote neighbor restarting, while the local system does not receive BFD Control packets within the Detection Time because of other CPU-intensive processing. The mechanism proposed in this document addresses this situation.

When the systems have established their BFD sessions with GR support as described in this document, and the remote neighbor restarts, the remote neighbor shall set the Diag field to 9 (Neighbor Restarting) in the Control packets it sends to its neighbor (the local system).
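The Detection Time calculation recalled above, and the timer selection the local system applies once it sees Diag 9 (described in the paragraphs that follow), can be sketched in Python. The formula is the Asynchronous-mode rule of [BFD]: the remote Detect Mult multiplied by the greater of the local Required Min RX Interval and the remote Desired Min TX Interval; Demand mode is not covered. The function names are hypothetical, intervals are in microseconds, and the numbers in the usage note are illustrative only.

```python
DIAG_NO_DIAGNOSTIC = 0
DIAG_NEIGHBOR_RESTARTING = 9


def detection_time_us(remote_detect_mult, local_required_min_rx_us,
                      remote_desired_min_tx_us):
    """Asynchronous-mode Detection Time as computed by the receiver in [BFD]."""
    agreed_interval = max(local_required_min_rx_us, remote_desired_min_tx_us)
    return remote_detect_mult * agreed_interval


def timeout_us(received_diag, your_restart_interval_us, detection_us):
    """Timer to run after a Control packet from an Up neighbor (Section 6.2).

    Diag 9 with a usable restart interval -> wait "Your Restart Interval".
    Any other Diag (e.g. 0 after recovery) -> the calculated Detection Time.
    """
    if (received_diag == DIAG_NEIGHBOR_RESTARTING
            and your_restart_interval_us > detection_us):
        return your_restart_interval_us
    return detection_us


def session_expired(last_rx_us, now_us, timer_us):
    """True once more than `timer_us` has passed without a Control packet."""
    return now_us - last_rx_us > timer_us
```

With both sides at 50 ms and a Detect Mult of 3, `detection_time_us(3, 50_000, 50_000)` is 150,000 microseconds (150 ms). A one-second processing stall such as the one in Section 3.2 therefore expires the session when the last packet carried Diag 0, but not when it carried Diag 9 and a 120-second "Your Restart Interval" had been exchanged.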
When the local system receives a BFD Control packet with the Diag field set to 9, the local system shall update its timer to the previously exchanged value of "Your Restart Interval". In effect, the local system waits up to "Your Restart Interval", instead of the Detection Time, for the next BFD Control packet. This remains the case as long as the Diag field received from the remote neighbor is 9.

When the restart is complete and the remote neighbor recovers, the remote neighbor shall set the Diag field to 0. On receiving BFD Control packets with the Diag field set to 0, the local system understands that the remote neighbor's restart is complete; it shall revert the timer to the (calculated) Detection Time and expect Control packets from the neighbor within that time.

If the remote neighbor does not recover in time to send a BFD Control packet within the previously communicated "Your Restart Interval", the timer expiry brings the session down. It is important to choose meaningful values for "Your Restart Interval" and "My Restart Interval" that complement the GR timers of the associated protocol.

7. Security Considerations

The security considerations discussed in [BFD] and [BFD-1HOP] apply to this document.

8. IANA Considerations

If this mechanism is implemented, two fields, "My Restart Interval" and "Your Restart Interval", need to be added to the generic BFD Control packet format, as described in Section 4 of this document.
This document also defines Diag value 9, "Neighbor Restarting", in addition to the "BFD Diagnostic Codes" defined by [BFD] and listed in Section 4.2 of this document. If this mechanism is implemented, the "BFD Diagnostic Codes" need to be updated as follows:

   Value   BFD Diagnostic Code Name
   -----   ------------------------------
   0       No Diagnostic
   1       Control Detection Time Expired
   2       Echo Function Failed
   3       Neighbor Signaled Session Down
   4       Forwarding Plane Reset
   5       Path Down
   6       Concatenated Path Down
   7       Administratively Down
   8       Reverse Concatenated Path Down
   9       Neighbor Restarting
   10-31   Unassigned

9. References

9.1. Normative References

[BFD] Katz, D. and D. Ward, "Bidirectional Forwarding Detection", RFC 5880, June 2010.

[BFD-1HOP] Katz, D. and D. Ward, "BFD for IPv4 and IPv6 (Single Hop)", RFC 5881, June 2010.

9.2. Informative References

[IS-IS-GRACE] Shand, M. and L. Ginsberg, "Restart signaling for IS-IS", RFC 5306, October 2008.

[OSPF-GRACE] Moy, J., et al., "Graceful OSPF Restart", RFC 3623, November 2003.

10. Authors' Addresses

Palanivelan A
Cisco Systems,
Bangalore, India.
Email: apvelan@cisco.com