Internet Engineering Task Force                              J. Parker
INTERNET DRAFT                                       Axiowave Networks
Expiration Date: January 2002                          Danny McPherson
                                                        Amber Networks
                                                   Cengiz Alaettinoglu
                                                         Packet Design
                                                         July 20, 2001

                 Short Adjacency Hold Times in IS-IS

1.0 Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2000). All Rights Reserved.

Parker, McPherson, Alaettinoglu Expires January 2002 [Page 1]
Internet Draft - draft-parker-short-isis-hold-times-01.txt July 2001

2.0 Abstract

This document discusses the impacts of using short time-to-live
values, called hold times, for IS-IS adjacencies. We discuss best
current practices to deal with short hold times.

The IS-IS protocol (ISO 10589) is used today to construct routing
tables for IP networks, as described in RFC 1195 [2]. Short hold
times allow more rapid detection of failed adjacencies, leading to
faster convergence.

This document also describes a new option to be used in the Hello
message in the IS-IS Routing protocol, to define sub-second hold
times. The proposed change will interoperate with routers
implementing the existing protocol.

3.0 Table of Contents

    1. Status of this Memo
    2. Abstract
    3. Table of Contents
    4. Overview
    5. Role of Hello Protocol Data Units
    6. Principles
    7. Proposal
    8. Implementation Issues
    9. Security Implications
   10. Acknowledgments
   11. References
   12. Authors' Addresses
   13. Full Copyright Statement

4.0 Overview

Interior Gateway Protocols such as IS-IS are designed to provide
timely information about the best routes in a routing domain. To this
end, the protocol performs the following actions:

(1) The protocol verifies and monitors connectivity between adjacent
    routers, creating adjacencies.

(2) The protocol exchanges local topological information, including
    the adjacencies it believes are current.

(3) The protocol computes the best next hops for a set of routes, and
    submits them to the local forwarding table.

Since the protocol was originally designed and deployed, there have
been large increases in the speed of circuits, large increases in the
speed of CPUs, and the general availability of Real Time Operating
Systems (RTOS). The original default hold times reflect the bandwidth
of the circuits, and the responsiveness of the systems current at the
time. A delay of 20 seconds in updating a best route in today's
networks can mean hundreds of thousands of packets black holed across
a single circuit.

This draft proposes a change that will allow more rapid discovery of
failed adjacencies, which can be used to drive more rapid convergence.
The proposal interoperates with existing implementations, causing no
observable change in behavior. The draft also discusses issues
surrounding the change, including suggestions on the proper
maintenance of adjacencies.

5.0 Role of Hello Protocol Data Units

Routers running the IS-IS protocol send Hello Protocol Data Units
(IIH PDUs) on a periodic basis over active circuits. The IIH PDU
provides two functions.

(1) It allows adjacent routers to negotiate an adjacency.

(2) It allows the protocol to monitor the health of an existing
    adjacency by acting as a keep-alive packet.

One of the unusual aspects of the IS-IS protocol is that an
adjacency's time to live, called the "Holding Time", is asymmetric.
There is no negotiation of the hold time: instead, each side sets the
Holding Time field in the IIH PDUs it sends, as described in section
9.5 of [1]. If router A proposes 10 seconds and router B proposes
100, then router B will be able to detect that A is unavailable
within 10 seconds, while it may take router A up to 100 seconds to
detect that B is unavailable.

6.0 Principles

The following principles of network design may be implemented without
modifying the protocol as seen by peers.

(1) Detect link failures as low in the stack as possible.

(2) Avoid excess creation of new LSPs, as it burdens all routers in
    the domain.

(3) As a corollary, it is important to damp flapping adjacencies.

6.1 Detect Failures Close to Source

Failures are best detected as close to the source as possible. Link
failure might be detected at the physical level. For example, we may
lose SONET framing or Ethernet carrier. Such physical link failure
should be reported to all protocols using the circuit, and may allow
them to react far sooner than high level hellos, however fine the
timer granularity.

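As an illustration only, the following Python sketch models the two
ways an adjacency can be lost: expiry of the Holding Time advertised
by the neighbor (section 5.0), and an immediate report of physical
link failure (this section). The Adjacency class and its method names
are hypothetical and are not taken from ISO 10589 or from any
implementation; the adjacency state machine is reduced to a single
up/down flag, and time is a plain number of seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Adjacency:
    """Local view of one neighbor on one circuit (simplified, hypothetical)."""

    neighbor: str
    up: bool = False
    expires_at: Optional[float] = None  # seconds on an arbitrary clock

    def on_hello(self, now: float, holding_time: float) -> None:
        # holding_time is read from the received hello, so the sender
        # decides how long the receiver waits before giving up on it.
        self.up = True
        self.expires_at = now + holding_time

    def on_tick(self, now: float) -> None:
        # Hold timer expiry: no hello arrived within the advertised time.
        if self.up and self.expires_at is not None and now >= self.expires_at:
            self.up = False
            self.expires_at = None

    def on_link_down(self) -> None:
        # Physical failure report: drop the adjacency without waiting.
        self.up = False
        self.expires_at = None


# Router A advertises 10 seconds, router B advertises 100 seconds.
b_view_of_a = Adjacency("A")
a_view_of_b = Adjacency("B")
b_view_of_a.on_hello(now=0.0, holding_time=10.0)
a_view_of_b.on_hello(now=0.0, holding_time=100.0)

# Hellos stop in both directions; 10 seconds later only B has noticed.
for adj in (b_view_of_a, a_view_of_b):
    adj.on_tick(now=10.0)
print(b_view_of_a.up, a_view_of_b.up)  # prints: False True

# A reported loss of carrier removes the remaining adjacency at once.
a_view_of_b.on_link_down()
print(a_view_of_b.up)  # prints: False
```

In this sketch router A would otherwise keep its adjacency to B for a
further 90 seconds; the link-down report removes it at once, which is
the behavior this section argues for.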

Short hold times can be used to detect adjacency failure rapidly when
the physical layer is healthy. For example, consider two routers
connected via two Ethernet bridges as below.

   +----------+   A   +--------+   B   +--------+   C   +----------+
   | Router 1 | ----- | Bridge | ----- | Bridge | ----- | Router 2 |
   +----------+       +--------+       +--------+       +----------+

Router 1 may be able to detect failure at the physical layer if link
A fails. However, neither router will detect a physical problem if
link B fails.

6.2 Avoid Excess Flooding

It is important to reduce excess generation of new IS-IS Link State
PDUs (LSPs), as each new LSP must be flooded throughout the network,
and may cause an SPF run to be scheduled on each router in a domain.

6.3 Damp Adjacency Flapping

One common cause for excess generation of LSPs is "adjacency flap",
when an adjacency is declared up and then down repeatedly. It is
important to respond quickly when an adjacency goes down, as a broken
circuit may lead to a black hole or a routing loop. Thus we must
generate a new LSP when we lose an adjacency.

To reduce the number of times we change an adjacency's state, the
only choice we have is to increase the time required to bring an
adjacency up. It is best to lose quickly and win slowly.

It may be useful to keep track of the number of times an adjacency
has gone down recently, and use that to pace the time it takes to
bring an adjacency back up.

7.0 Proposal

With the current protocol, the Holding Time is measured in seconds,
so the minimal interval is one second. The hello PDU transmission
interval could be set to 100 milliseconds to obtain failure detection
within a second. There is no way to define a sub-second Holding Time
in the current protocol.

We propose adding a new IS-IS Option type, "Sub-Second Hold Time",
defined as follows.

   Type   = 0xF1 (decimal 241)
   Length = 2
   Value:   Holding Time in milliseconds, as a short in network
            byte order

Any system that supports this mechanism may include this option in
IS-IS Hello packets. Any system that does not understand this option
will ignore it, and use the standard Holding Time defined in sections
9.5, 9.6, and 9.7 of [1]. Any system that is able to process this
option will use the value defined in the received option when
updating the holdingTimer, as described in sections 8.2.4.2.e.2 or
8.4.2.4 of [1].

8.0 Implementation Issues

This section addresses issues that are not visible on the wire, but
have a profound effect on the stability of implementations using
short hold times.

Vendors and ISPs in the past have experimented with short hello
timers, in the single-digit second range. In large networks, resource
starvation may result in delay of packet transmission or receipt when
running time-consuming operations, such as the SPF computation or
route redistribution.

Problems arise from two sources. If either party fails to send hellos
on a timely basis, or is unable to get the packets during a burst of
traffic, the adjacency will be declared down, a new LSP will be
generated and flooded, and the SPF computation will be rerun. If the
physical link is really healthy, the adjacency will be
re-established, new LSPs will be generated, and more SPF computations
will run, which may lead to continual churn.

Short hold times cannot be used without careful thought and
engineering. We recommend the following measures.

8.1 React to Physical Link Failures

Physical link failures should result in the immediate deletion of the
adjacency, and the immediate transmission of a new LSP reflecting the
change.

8.2 Timely Transmission of Hellos

The implementation must generate Hellos in a timely fashion.
One way this may be done is by using a real time operating system
that supports hard limits. Another way is to have an independent
source of the packets, such as a low latency process or logic on the
line card.

8.3 Conservative Deletion of Adjacencies

The implementation should be conservative when deleting an adjacency
whose Hold Time has expired. If there is a backlog of PDUs in the
receive queue, the implementation may wish to see if there is a Hello
PDU queued from a circuit before deleting an adjacency on that
circuit. One way of doing this is to queue the Hello PDUs
independently from the LSP and SNP (Sequence Number Packet) PDUs.

Another possibility is to take an LSP or SNP from a neighbor as
evidence that the neighbor is alive. Since these packets are not
intended for that use, they do not have a Hold Time.

8.4 Damp Adjacency Flapping

Flap damping of some sort must be in place, so that a buggy
implementation does not force network meltdown. Implementations may
decide to take exponentially longer to bring up an adjacency after
each time-out.

Another proposal would be to return to default hold times on any
circuit that has seen an adjacency flap more than N times recently.

8.5 Monitoring

There should be logic to keep track of the number of times we have
timed out an adjacency, to help trace problems to the source.

8.6 Shutdown Hellos

When an implementation administratively shuts down an interface, it
could send a hello message with a Hold Time of 0. This would allow
the neighbor to delete the adjacency quickly. The 2-way check
described in 7.2.4 of [1] will discard any link that is claimed by
only one side.
However, speeding up detection by the far side will increase the
chance that LSPs on both sides are regenerated and received by all
routers in the network at about the same time, requiring only one
additional SPF run, rather than two.

9.0 Security Implications

The sub-second TLV proposed in this document does not raise any new
security concerns, as there is no change in the underlying protocol:
there is simply an increase in the speed with which we may detect
changes.

The use of a hello with a Hold Time of 0 has always presented an
opportunity for a denial of service attack. A third party on a LAN
could capture Hellos from a router, and retransmit them with a Hold
Time of 0. Protection against this may be offered with an MD5
signature. To prevent the rebroadcast of malicious packets,
implementations may choose to disable sending a hello with a Hold
Time of 0 on circuits with MD5 signatures.

10.0 Acknowledgments

Thanks to Vijay Gill, Dave Katz, Radia Perlman, Tony Przygienda, Henk
Smit, and Chris Whyte for helpful discussions of the issues.

11.0 References

[1] ISO, "Intermediate system to Intermediate system routing
    information exchange protocol for use in conjunction with the
    Protocol for providing the Connectionless-mode Network Service
    (ISO 8473)," ISO/IEC 10589:1992.

[2] Callon, R., "OSI IS-IS for IP and Dual Environment," RFC 1195,
    December 1990.

12.0 Authors' Addresses

Jeff Parker
Axiowave Networks
100 Nickerson Road
Marlborough, Mass 01752
USA
email: jparker@axiowave.com

Danny McPherson
Amber Networks
48664 Milmont Drive
Fremont, CA 94538
USA
email: danny@ambernetworks.com

Cengiz Alaettinoglu
Packet Design
66 Willow Place
Menlo Park, CA 94025
USA
email: cengiz@packetdesign.com

13.0 Full Copyright Statement

Copyright (C) The Internet Society (1997). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.