Network Working Group                                            S. Kini
                                                                 Y. Yang
                                                                  B. Gao
Internet Draft                                           J. Jonnalagadda
Document: draft-kini-ospf-gr-enhance-00.txt                Mahi Networks
Expires: June 2004                                          January 2004


                 Enhancements to OSPF Graceful Restart
                     for Heterogeneous Environments


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract

   Reliability is a fundamental concern for the network. As a solution
   to improve network stability, the non-stop forwarding paradigm
   depends on protocol recovery based on graceful restart techniques.
   One of the proposed graceful restart techniques for [OSPF], a widely
   deployed IGP, is described in [OSPF-GR]. This technique has a
   limitation of not being backward compatible, in the sense that if a
   neighbor does not support the helper mode described in [OSPF-GR],
   the graceful-restart procedure will fail (i.e., revert to normal
   restart). For large multi-vendor networks, this scenario is fairly
   common. In this draft, we describe techniques that can achieve OSPF
   graceful restart even if a neighboring router does not support the
   helper-mode of [OSPF-GR].

Kini                        Expires Jun 2004                           1

               Enhancements to OSPF Graceful Restart        January 2004
                   for Heterogeneous Environments


Table of Contents

   Status of this Memo
   Abstract
   Conventions used in this document
   1. INTRODUCTION
   2. TERMINOLOGY & NOTATION
   3. BASIC DEFINITIONS & ASSUMPTIONS
   4. SYNC-ONLY-ADJACENCY APPROACH
      4.1 Link-local-Opaque-LSA handling
      4.2 Implementing a sync-only-adjacency
         4.2.1 Non virtual adjacencies
         4.2.2 Virtual adjacencies
      4.3 Scalability Issues
      4.4 Error condition detection and processing
      4.5 Implementation and deployment issues
         4.5.1 Pros
         4.5.2 Cons
   5. OSPF-GR ENHANCEMENT APPROACH
      5.1 Detecting link state database inconsistency
      5.2 Recovering from link state database inconsistency
   6. CONCLUSION
   7. FUTURE WORK
   8. ACKNOWLEDGEMENTS
   9. Security Considerations
   10. References
   11. Author's Addresses


Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].


1. INTRODUCTION

   Reliability and high availability are fundamental requirements for
   the Internet, private telecom management networks, intranets, etc.
   Several efforts have been made to achieve non-stop forwarding
   through proprietary as well as standards-based mechanisms that
   enable graceful restart of routing and signaling protocols. [OSPF]
   is a widely deployed IGP, not only in the Internet but also in
   private networks. The ongoing IETF standardization effort for OSPF
   graceful restart is reflected in [OSPF-GR].

   Standards-based graceful OSPF restart as described in [OSPF-GR]
   works only in a homogeneous environment, i.e., all neighbors must
   implement the same graceful restart procedures. It is widely
   recognized that upgrading the software of an entire network is not
   feasible for economic and network stability reasons. Also, in
   multi-vendor network deployments, software upgrades with all the
   restart features are rarely available simultaneously. These
   conditions inevitably lead to a heterogeneous environment.

   In a heterogeneous environment, it is desirable for the restart of a
   router with the latest software version to result in non-stop
   forwarding in the network. Such a solution has greater value in
   terms of its longevity in a network deployment. This draft describes
   some techniques to achieve this goal.

   The underlying philosophy of [OSPF-GR] is to use the existing
   adjacency bring-up state machine and link state database flooding
   algorithms of [OSPF], while accomplishing non-stop forwarding. In
   this draft, we adhere to this philosophy.


2. TERMINOLOGY & NOTATION

   This draft describes two distinct techniques to enhance [OSPF-GR] to
   interwork with routers that have only implemented [OSPF]. One of
   them is described in section 4 and is henceforth referred to as
   OSPF-HR. The other technique is described in section 5 and is
   henceforth referred to as OSPF-EGR. Both techniques assume the
   ability to implement a fake-adjacency, which is defined and
   described in section 3.

   The notation used to denote routers with different capabilities is
   as follows:

   . Routers with legacy OSPF implementations are denoted as O, O1, O2,
     O3, etc. These routers implement [OSPF]. They do not support the
     helper-mode of [OSPF-GR].
   . Routers that have implemented [OSPF-GR] are denoted as OG, OG1,
     OG2, OG3, etc.
   . Routers that implement OSPF-HR are denoted as OH, OH1, OH2, OH3,
     etc.
   . Routers that implement OSPF-EGR are denoted as OE, OE1, OE2, OE3,
     etc.


3. BASIC DEFINITIONS & ASSUMPTIONS

   In this section, routers OH and OE can be used interchangeably.

   Throughout this draft, we assume that a router OH (or OE) is able to
   maintain a fake-adjacency. A fake-adjacency is defined as an
   adjacency that a router OH maintains from the time OSPF is disabled
   on OH until the time OSPF is re-enabled. Such an adjacency is
   maintained by sending Hello packets periodically, at intervals of
   HelloInterval for that adjacency. Such a Hello packet is referred to
   as a FakeHello. A FakeHello is the same as the last Hello packet
   sent before OSPF is disabled. Only an adjacency that is in the Full
   state before OSPF is disabled may be maintained across the restart
   as a fake-adjacency. Since the neighboring router's "Inactivity
   Timer" does not expire, the neighbor considers the adjacency state
   to be Full.

   Even though a fake-adjacency can be maintained using FakeHellos for
   as long as necessary (in accordance with [OSPF]), practical
   implementations tear down an adjacency after exceeding an upper
   limit on the number of LSA retransmissions. We assume that OH is
   able to re-enable OSPF before that limit is exceeded by a neighbor.
   Alternatively, if the upper limit on the number of retransmissions
   is configurable, it should be set on all neighbors of OH so that it
   is not reached within the maximum time taken by OH to re-enable OSPF
   (the "grace period" of OH is a good upper bound).

   When OSPF is re-enabled on router OH, all OSPF procedures are
   started by treating a fake-adjacency as if it were in the Full
   state. As a result, the following actions take place:

   1. The "Hello Timer" and "Inactivity Timer" are started for this
      adjacency.
   2. The router-LSA and network-LSA reflect the fake-adjacency as an
      adjacency in the Full state.
   3. The previous adjacency information is restored into OSPF protocol
      data structures.

   This adjacency is now referred to as a recoverable-adjacency. The
   information in a FakeHello is sufficient to reconstruct the previous
   adjacency information in a recoverable-adjacency.

   The different stages of the adjacency are illustrated in Figure 1.
   The events causing the final transition from recoverable-adjacency
   to recovered are described in the sections detailing the specific
   OSPF restart technique (i.e., sections 4 and 5, respectively).
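
   The stages described above can be sketched in Python as a small
   transition table. This is only an illustration under stated
   assumptions, not part of the proposal: the stage names, the event
   labels, and the rule that any other event drops the adjacency are
   choices made for the sketch. The helper min_retransmit_limit is also
   hypothetical; it assumes a neighbor retransmits an unacknowledged
   LSA every RxmtInterval seconds and tears the adjacency down once a
   configured retransmission count is reached.

```python
from enum import Enum, auto


class AdjStage(Enum):
    FULL = auto()         # normal adjacency in Full state
    FAKE = auto()         # OSPF disabled; FakeHellos keep the neighbor at Full
    RECOVERABLE = auto()  # OSPF re-enabled; treated as Full, not yet recovered
    RECOVERED = auto()    # technique-specific recovery event has occurred
    DOWN = auto()         # adjacency not preserved


# (current stage, event) -> next stage. Event labels are hypothetical names
# for the router events on the timeline, not identifiers from the draft.
TRANSITIONS = {
    (AdjStage.FULL, "ospf_disabled"): AdjStage.FAKE,
    (AdjStage.RECOVERED, "ospf_disabled"): AdjStage.FAKE,
    (AdjStage.FAKE, "ospf_reenabled"): AdjStage.RECOVERABLE,
    (AdjStage.RECOVERABLE, "recovery_complete"): AdjStage.RECOVERED,
}


def next_stage(stage, event):
    """Return the stage after `event`.

    Assumption of this sketch: any combination not listed above means the
    adjacency is not carried across the restart, so it ends up DOWN.
    """
    return TRANSITIONS.get((stage, event), AdjStage.DOWN)


def min_retransmit_limit(restart_seconds, rxmt_interval):
    """Smallest retransmission limit that outlasts the restart.

    Assumes the neighbor survives for limit * rxmt_interval seconds, so the
    limit must satisfy limit * rxmt_interval > restart_seconds.
    """
    return restart_seconds // rxmt_interval + 1
```

   For example, with a grace period of 120 seconds used as the bound on
   restart time and an RxmtInterval of 5 seconds, the sketch yields a
   limit of 25 retransmissions.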

                              Timeline
                                 |
   _______ADJACENCY STATES_______|_______ROUTER EVENTS________________
                                 |
      Adjacency in               |
      Full State                 |
                                 |
                                 | - OSPF is disabled
                                 |   (planned/unplanned)
                                 |
      fake-adjacency             |
                                 |
                                 | - OSPF is re-enabled
                                 |
      recoverable-adjacency      |
      Adjacency is treated as    |
      Full but not yet recovered |
                                 |
                                 | - Technique-specific event
                                 |   (see sections 4 and 5)
                                 |
      Adjacency in Full state    |
      (recovered)                |
                                 |
                                 V

   Figure 1 An adjacency in Full state maintained as a fake-adjacency

   The ability to implement a fake-adjacency is a reasonable
   assumption, considering that modern routers have at least one, if
   not several, of these characteristics:

   1. A redundant standby control plane processor.
   2. Line cards that can generate a packet as a substitute for the
      control plane during a control plane restart.
   3. A control plane that can restart before its OSPF neighbor can
      tear down the adjacency.
   4. Availability of non-volatile storage.

   A fake-adjacency maintained with a neighbor O results in O not
   recalculating any paths as OH restarts. Since the fake-adjacency is
   treated as Full, the link state database is not synchronized over
   it. The techniques proposed in this draft achieve link state
   database synchronization in this scenario.


4. SYNC-ONLY-ADJACENCY APPROACH

   In this approach, there is an underlying assumption that LSA
   flooding reduction procedures (e.g., mesh groups) are not deployed
   in the network.

   A router OH should associate each adjacency to a router O with a
   special type of adjacency to the same neighbor in the same area.
   This special type of adjacency is henceforth referred to as a
   sync-only-adjacency. The main purpose of this adjacency is to
   re-synchronize the link state database without affecting
   forwarding. As we will show below, this technique builds upon the
   principle of separating data and control planes as described in
   [GMPLS-ARCH].
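
   One way to picture the association is a table keyed by area and
   neighbor, so that every adjacency to the same neighbor in the same
   area maps to the same sync-only-adjacency. The Python sketch below
   is a minimal illustration under that assumption; the class name,
   the method names, and the use of interface names to identify
   adjacencies are hypothetical and not taken from the draft.

```python
from collections import defaultdict


class SyncOnlyRegistry:
    """Maps each (area_id, neighbor_router_id) to one sync-only-adjacency.

    The key stands in for the sync-only-adjacency itself; the value is the
    set of interfaces whose ordinary adjacencies are associated with it.
    """

    def __init__(self):
        self._members = defaultdict(set)

    def associate(self, area_id, neighbor_id, interface):
        """Associate the adjacency on `interface` and return its key."""
        key = (area_id, neighbor_id)
        self._members[key].add(interface)
        return key

    def members(self, key):
        """Return the interfaces associated with a sync-only-adjacency."""
        return frozenset(self._members.get(key, ()))


registry = SyncOnlyRegistry()
first = registry.associate("0.0.0.0", "10.0.0.1", "if-1")
second = registry.associate("0.0.0.0", "10.0.0.1", "if-2")
other_area = registry.associate("0.0.0.1", "10.0.0.1", "if-3")
```

   In this usage, the two adjacencies in area 0.0.0.0 share one key,
   while the adjacency in area 0.0.0.1 gets a separate one.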

   Multiple adjacencies between two routers and within the same area
   can share a single sync-only-adjacency between them (see Figure 2).
   In other words, a sync-only-adjacency can be associated with
   multiple adjacencies, provided all of them are between the same two
   routers and within the same area.

          ************************
          *                      *
          *                      *
       +--*---+              +---*--+
       |      |--------------|      |
       |  O   |--------------|  OH  |
       |      |--------------|      |
       +------+              +------+

       ------  adjacencies in the same area
       ******  associated sync-only-adjacency (also in the same area)

   Figure 2 A sync-only-adjacency and its associated adjacencies

   The sync-only-adjacency must have the following characteristics:

   1. The link state database must be synchronized over the
      sync-only-adjacency after OSPF is re-enabled on OH.
   2. It should not affect path computation results. Any router in the
      network should arrive at the same result as it would in the
      absence of the mechanism described in this section.
   3. An LSA flooded on an adjacency must also be flooded on its
      associated sync-only-adjacency, provided the sync-only-adjacency
      is in a state in which an LSA can be flooded on it. Link-local
      Opaque LSAs are an exception (see section 4.1).

   A router OH is required to know whether a neighboring router is a
   legacy router, i.e., whether it is incapable of the helper-mode
   described in [OSPF-GR]. This will typically be made known through
   operator configuration.

   When OSPF is disabled, OH maintains each adjacency to a neighbor O,
   except the sync-only-adjacency, as a fake-adjacency. When OSPF is
   re-enabled, OH follows the procedures of [OSPF-GR] with some
   modifications. The details of the modifications are described
   later. In brief, the modifications are:

   1. The sync-only-adjacency is brought up as described in [OSPF-GR]
      (i.e., as a normal adjacency). OH should not advertise this
      adjacency in its router-LSA.
   2. An adjacency that was maintained as a fake-adjacency is now
      treated as a recoverable-adjacency. It is considered recovered
      only after its associated sync-only-adjacency reaches Full
      state.
   3. Adjacencies to routers other than O must be brought up using the
      procedures in [OSPF-GR].

   During the sync-only-adjacency bring-up procedures, all LSAs are
   flooded again from the neighbor. This ensures that the link state
   database is synchronized in the network. In a sense, the neighbor
   is executing in a "helper mode" (as defined in [OSPF-GR]) without
   being aware of it. Figure 3 illustrates the state transitions for
   an adjacency to router O.

                              Timeline
                                 |
   _______ADJACENCY STATES_______|_______ROUTER EVENTS________________
                                 |
      Adjacency in               |
      Full State                 |
                                 |
                                 | - OSPF is disabled
                                 |   (planned/unplanned)
                                 |
      fake-adjacency             |
                                 |
                                 | - OSPF is re-enabled
                                 |
      recoverable-adjacency      |
      Adjacency is treated as    |
      Full but not yet recovered |
                                 |
                                 | - Associated sync-only-adjacency
                                 |   goes to Full state
                                 |
      Adjacency in Full state    |
      (recovered)                |
                                 V

   Figure 3 Adjacency state transition through a restart

   The operations of the restarting router described in section 2 of
   [OSPF-GR] are applicable to OH with a few modifications. The
   procedure described in section 2.2 of [OSPF-GR] (When to exit
   graceful restart) must be modified as follows:

   1. In the first condition, the existing procedure applies only to
      neighboring routers other than O. In addition, every
      sync-only-adjacency to a neighbor O must also be re-established.
   2. In the second condition, the consistency check with the
      pre-restart router-LSA must not be done for an LSA received from
      a neighbor O.
   3. The following condition must be added: a fake-adjacency (or
      recoverable-adjacency) goes down.

   In addition, for the procedure described in section 2.3 of
   [OSPF-GR] (Actions on Exiting Graceful Restart) the following
   action must be added:

   1. If a sync-only-adjacency has not reached Full state after OSPF
      is re-enabled, the (partially established) sync-only-adjacency
      and all associated recoverable-adjacencies must be torn down.
      The sync-only-adjacency need not be re-established (until the
      next restart).

   The operations of a helper neighbor described in section 3 of
   [OSPF-GR] remain unchanged. All helper neighbors must have
   RestartHelperStrictLSAChecking disabled.

4.1 Link-local-Opaque-LSA handling

   Link-local-Opaque-LSAs originated by O will not be refreshed by O
   across the associated recoverable-adjacency when OSPF is re-enabled
   on OH. Hence, when OH receives such an LSA on an adjacency and
   stores it in the link state database, it should also store it in
   non-volatile storage before sending back an acknowledgement. After
   OSPF is re-enabled, these LSAs should be flooded on that link, so
   that the neighbor can refresh them.

   Self-originated link-local-Opaque-LSAs should be refreshed by OH
   when OSPF is re-enabled. Note that the LSID (or Instance) values
   should be reconciled with those used before OSPF was disabled. This
   requires the LSID to be stored in non-volatile storage. LSAs whose
   LSIDs were used before OSPF was disabled but that will not be
   generated after OSPF is re-enabled should be flushed by setting the
   LS Age to MaxAge. Typically, it is expected that there would be no
   need for these LSAs to change between disabling and re-enabling
   OSPF.

4.2 Implementing a sync-only-adjacency

   The sync-only-adjacency should be brought up only if one of the
   adjacencies it is associated with is in the Full state (note that
   the associated adjacency may be a recoverable-adjacency). It should
   be torn down if all its associated adjacencies reach a state less
   than Full.
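
   The bring-up and tear-down rule above can be sketched as a single
   decision function. This is an illustration only: the function name,
   the string state labels, and the returned action names are
   hypothetical, and treating "at least one associated adjacency is
   Full" as sufficient (not merely necessary) for bring-up is an
   assumption of the sketch.

```python
FULL = "Full"


def sync_only_action(sync_is_up, associated_states):
    """Decide what to do with a sync-only-adjacency.

    sync_is_up:        True if the sync-only-adjacency is currently up.
    associated_states: neighbor-state names of the associated adjacencies;
                       a recoverable-adjacency is passed in as "Full".
    """
    any_full = any(state == FULL for state in associated_states)
    if not sync_is_up and any_full:
        return "bring_up"
    if sync_is_up and not any_full:
        # Every associated adjacency is below Full.
        return "tear_down"
    return "no_change"
```

   For instance, sync_only_action(False, ["Full", "ExStart"]) selects
   bring-up, while sync_only_action(True, ["ExStart", "Down"]) selects
   tear-down.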

4.2.1 Non virtual adjacencies

   To implement a sync-only-adjacency for a non virtual adjacency, any
   available IP tunneling technology can be used. Typical examples are
   [GRE], [IPIP], etc. Non-stop forwarding ensures that the OSPF
   packets of the sync-only-adjacency are tunneled correctly when OSPF
   is re-enabled and the sync-only-adjacency is brought up.

   Each of the following entities should be associated with a tunnel
   interface in order to implement its sync-only-adjacency:

   1. A point-to-point interface
   2. A neighbor on a point-to-multipoint interface
   3. A DR/BDR-capable neighbor on an NBMA interface
   4. A DR/BDR-capable neighbor on a broadcast interface

   The sync-only-adjacency must not be a link on a shortest path in
   the network. A metric higher than the highest metric of an
   adjacency to that neighbor must be used. It is recommended that the
   maximum value for a link metric (0xffff) be configured for the
   tunnel interface. If all the other adjacencies to the neighbor have
   a metric strictly less than 0xffff, the tunnel interface is
   guaranteed not to be a link on a shortest path in the network.

   It is important to note that the tunnel need not be routed on an
   out-of-band network. It is likely that the OSPF packets of the
   sync-only-adjacency will be tunneled in-band on the physical
   interface comprising its associated adjacency. This is illustrated
   in Figure 4. The non-stop forwarding on the underlying physical
   interface ensures that OSPF packets of the sync-only-adjacency are
   tunneled correctly when OSPF is re-enabled and the
   sync-only-adjacency is brought up.

       _______              _______
      /       \ __________ /       \
     |   O   _ |_ _ _ _ _ |_   OH   |
     |         |__________|         |
      \_______/            \_______/

      _________
     |         |   Physical interface having
     |_________|   OSPF adjacency between O and OH

     _ _ _ _ _     Sync-only-adjacency associated with the
                   adjacency of the underlying physical interface

   Figure 4 In-band sync-only-adjacency

   Also, note that some hardware may not be amenable to implementing
   tunnels. However, the tunnels for the sync-only-adjacency are not
   used for forwarding. Hence, software implementations of these
   tunnels are adequate.

4.2.2 Virtual adjacencies

   To implement a sync-only-adjacency for a virtual adjacency, define
   another virtual adjacency to the same neighbor using a different IP
   address. This could be the IP address of a loopback interface on
   that neighbor, to ensure highly available reachability. Router OH
   must not advertise the virtual adjacency used as a
   sync-only-adjacency in its router-LSA.

   The OSPF implementation on O may not support defining two virtual
   adjacencies to the same neighbor. Alternatively, a tunneling
   mechanism as described in section 4.2.1 can be used to implement a
   sync-only-adjacency for a virtual adjacency. Two issues arise:

   1. The flooding of AS-external-LSAs over such a sync-only-adjacency
      leads to some extra processing. However, LSA flooding
      reliability is not affected.
   2. For a virtual link to be operational, the underlying path cost
      must be less than or equal to 0xffff. To prevent the tunnel link
      of the sync-only-adjacency from being used in a shortest path,
      router OH must consider the associated virtual link to be
      operational only if the underlying path has a cost strictly less
      than 0xffff. In case of a network design error, the virtual link
      from OH to O may become inoperational because the underlying
      path cost is greater than or equal to 0xffff. Since OH does not
      advertise the sync-only-adjacency in its router-LSA, the failure
      of the two-way-connectivity-check will ensure that the tunnel
      does not lie on a shortest path.

4.3 Scalability Issues

   Maintaining one additional adjacency per neighbor could, in the
   worst case, double the number of adjacencies on router OH. This
   introduces additional processing for:

   1. Running the "Hello Timer" and generating periodic Hello packets
   2. Running the "Inactivity Timer" and processing neighbor Hello
      packets
   3. Timers and packet generation/processing associated with flooding
      LSAs

   To alleviate these issues the following steps should be taken:

   1. The HelloInterval and RouterDeadInterval for the
      sync-only-adjacency can be configured with far less stringent
      parameters, since it is not used to detect a neighbor going
      down.
   2. OH can bring down the sync-only-adjacency once it reaches Full
      state, since it is not useful for LSA flooding after all its
      associated adjacencies are considered recovered.
   3. On a broadcast/NBMA interface the sync-only-adjacency should be
      formed with only the DR and BDR. Of course, a DR or BDR must
      form the sync-only-adjacency with all other routers on the
      broadcast/NBMA network.

4.4 Error condition detection and processing

   If the "Router ID" of the sync-only-adjacency neighbor is different
   from that of its associated adjacency, the sync-only-adjacency
   should be torn down. If the associated adjacency is a
   recoverable-adjacency, then it must be torn down too. An operator
   notification should be generated, as the likely cause is an
   operator configuration error.

4.5 Implementation and deployment issues

4.5.1 Pros

   1. Routers O adjacent to OH only need changes to their
      configuration. The minimum configuration changes consist of:
      i.  A tunnel interface terminating on router OH with a cost of
          0xffff. If the adjacency is virtual, then either another
          virtual adjacency or a tunnel interface can be configured.
      ii. Changing the max-lsa-retransmission counter so that it is
          not reached within the time it takes OH to re-enable OSPF
          (the grace-period is a good upper bound).
   2. No changes to the OSPF flooding algorithm. No changes to the
      OSPF interface or adjacency state machines, except for
      re-starting a fake-adjacency as an adjacency in the Full state.
      This is a very simple change.
   3. Handles unplanned restart well.
   4. This technique can co-exist in a network with [OSPF-GR] by
      implementing the helper-mode of [OSPF-GR].

4.5.2 Cons

   1. A tunnel implementation must be available in router O for this
      mechanism to work.
   2. Two IP addresses are required for each tunnel. In the worst
      case, with n adjacencies in the network, up to 2n IP addresses
      may be required. If the domain does not have many IP addresses
      available, this could become an issue. Considering that NAT and
      private IP address spaces are well understood and deployed
      concepts, this should be a minor issue.
   3. When OH restarts, SPF computation takes place in the entire
      network due to the changed LSAs generated by O. However, note
      that this does not result in any change in a shortest path
      computed by any router in the network.
   4. Associating a sync-only-adjacency with other adjacencies
      requires operator configuration. Especially for a broadcast
      interface, where neighbor configuration is typically not done, a
      configuration per DR/BDR-capable neighbor is now required.
   5. If OH goes down (i.e., without preserving non-stop forwarding)
      when the sync-only-adjacency is in Full state, then path
      calculation will be incorrect until the neighboring router O
      detects that the sync-only-adjacency has gone down. Since the
      sync-only-adjacency is torn down after reaching Full state as
      described in section 4.3, this condition is highly unlikely to
      occur. Typically, a Layer-1 (Loss of Signal) or Layer-2
      indication should trigger this event (at O) with very little
      delay. However, if that is not the case, the "Inactivity Timer"
      expiry at router O will recover from this condition. To reduce
      recovery time in case this condition does occur, the
      HelloInterval and RouterDeadInterval for the sync-only-adjacency
      should not be kept too large. A value of 40s for HelloInterval
      and 160s for RouterDeadInterval is recommended for most
      networks.
   6. In some networks, LSA flooding reduction techniques (e.g., mesh
      groups) are essential to maintain network stability. Depending
      on the specific flooding reduction technique used, configuring
      the network so that the mechanism described in this section
      continues to flood LSAs reliably could be a complicated and
      error-prone task. This technique should be avoided in such
      networks.


5. OSPF-GR ENHANCEMENT APPROACH

   This section describes a technique to implement "heterogeneous OSPF
   restart" without many of the limitations of the technique described
   in section 4. This technique is henceforth referred to as OSPF-EGR.
   The most important limitations that we try to avoid are:

   1. The need for a tunnel mechanism on the neighbors.
   2. SPF instability introduced by running SPF in the entire IGP
      domain.
   3. Protocol processing overhead due to the worst-case doubling of
      the number of adjacencies in the network.

   This technique requires one neighbor in each area that router OE
   participates in to be capable of the helper mode of [OSPF-GR]. As
   two special cases, the neighbor could be another virtual instance
   of OSPF in the same router, or an external router multiple physical
   hops away and connected by a tunnel interface. The concept of a
   sync-only-adjacency is not used in this technique.

   Router OE must be capable of [OSPF-GR] (both "helper" and "restart"
   mode). When OSPF is disabled, router OE maintains a fake-adjacency
   with all neighbors O. When OSPF is re-enabled, the fake-adjacency
   is treated as a recoverable-adjacency. Restart-mode procedures
   described in [OSPF-GR] are applied by OE to neighbors OG. On a
   normal exit of the restart-mode procedures of [OSPF-GR], OE
   considers each recoverable-adjacency as recovered. On an abnormal
   exit of the restart-mode procedures of [OSPF-GR], each
   recoverable-adjacency is torn down by OE.

   The following conditions can lead to link state database
   inconsistency:

   1. Before OSPF was disabled on OE, an LSA update received from O
      may have been acknowledged by OE but not yet flooded to all
      other neighbors O1, O2, ..., OG1, etc. Since router O will not
      flood the LSA again to OE, the network will have an inconsistent
      picture of such an LSA.
   2. Before OSPF was disabled on OE, an LSA being deleted (by MaxAge)
      and flooded by O to OE may have been acknowledged by OE before
      it was flooded reliably to all other neighbors O1, O2, ...,
      OG1, etc. Since O does not flood the MaxAge LSA again, it will
      not be removed from the link state database of the other
      neighbors O1, O2, ..., OG1, etc. Again, the network will have an
      inconsistent picture of such an LSA.
   3. An LSA originated by OE before OSPF is disabled may not be
      generated after OSPF is re-enabled. If such an LSA was flooded
      to O but not flooded to OG, then it will not be removed from the
      link state database of O.

   These conditions need to be detected. A naive implementation may
   decide to abort on detecting any of them. A more robust
   implementation can recover from them. The trade-offs are in
   implementation complexity, the size of non-volatile storage
   required, and the increase in LSA flooding times due to the time
   taken to write to non-volatile storage.

5.1 Detecting link state database inconsistency

   In this technique, a boolean flag is maintained on router OE. This
   flag indicates whether all link state retransmission lists on OE
   are empty, and is henceforth referred to as retx-list-empty. In
   addition, all routers OG have to enable
   RestartHelperStrictLSAChecking. When OSPF is re-enabled, it
   operates as follows depending on the value of retx-list-empty:

   1. If the flag is false (some retransmission list was not empty),
      an inconsistency is possible and OSPF aborts every
      fake-adjacency. It is not possible to recover on detection.
   2. If the flag is true, OE treats every fake-adjacency as a
      recoverable-adjacency, and follows the procedures in [OSPF-GR]
      to bring up the adjacency with the [OSPF-GR] capable neighbor.
      In this process, the entire link state database is synchronized
      on router OE. On exiting graceful restart due to an error (grace
      LSA timeout or inconsistent router LSA), every
      recoverable-adjacency is torn down.

   Since the probability of this flag being false is very low under
   typical network conditions, non-stop forwarding is achieved in most
   cases.

   Link-local-LSA handling is unchanged from section 4.1 for the
   recoverable-adjacency.

5.2 Recovering from link state database inconsistency

   In this section we describe a modification to the technique
   described in section 5.1, so that recovery is possible when OSPF is
   re-enabled and the flag retx-list-empty is false.

   A minor modification to the helper mode procedure of [OSPF-GR] is
   required. When router OG receives a MaxAge LSA, it should not
   delete it from its database if a neighbor is undergoing restart
   (i.e., a grace-LSA is present from the neighbor). Henceforth, OG
   denotes a router that implements [OSPF-GR] with this minor
   modification. Router OG can also disable
   RestartHelperStrictLSAChecking.

   A modification is also required for the flooding algorithm on
   router OE. When router OE adds an LSA to its link state database,
   it first stores the LSA header in non-volatile storage. The LSA
   header is removed from non-volatile storage when:

   . An acknowledgement is received from an adjacency with a neighbor
     OG (or OE), if the LSA is not a MaxAge LSA
   . Acknowledgements are received from all adjacencies, if the LSA is
     a MaxAge LSA

   Since the LSA header is stored in non-volatile storage for a very
   short time, the size of the non-volatile storage required is very
   small.
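
   The store-and-remove rules above can be sketched as follows. This
   is a minimal in-memory stand-in for the non-volatile store, written
   under stated assumptions: the class and method names are
   hypothetical, an LSA is identified by a (type, LSID, advertising
   router) tuple, and the sets of adjacencies and of helper-capable
   neighbors (OG or OE) are supplied by the caller.

```python
class PendingHeaderStore:
    """Tracks LSA headers that must survive a restart of router OE."""

    def __init__(self, adjacencies, helper_capable):
        self.adjacencies = set(adjacencies)        # all adjacencies of OE
        self.helper_capable = set(helper_capable)  # neighbors that are OG or OE
        self._pending = {}                         # lsa_key -> (is_maxage, acked)

    def on_lsa_added(self, lsa_key, is_maxage):
        """Record the header before the LSA goes into the database."""
        self._pending[lsa_key] = (is_maxage, set())

    def on_ack(self, lsa_key, neighbor):
        """Apply the removal rules when `neighbor` acknowledges the LSA."""
        if lsa_key not in self._pending:
            return
        is_maxage, acked = self._pending[lsa_key]
        acked.add(neighbor)
        if is_maxage:
            # MaxAge LSA: every adjacency must have acknowledged it.
            done = self.adjacencies.issubset(acked)
        else:
            # Other LSAs: one acknowledgement from an OG (or OE) suffices.
            done = neighbor in self.helper_capable
        if done:
            del self._pending[lsa_key]

    def pending_headers(self):
        """Return the keys of headers still held in the store."""
        return sorted(self._pending)


store = PendingHeaderStore(adjacencies={"O1", "OG1"}, helper_capable={"OG1"})
router_lsa = (1, "10.0.0.1", "10.0.0.1")
store.on_lsa_added(router_lsa, is_maxage=False)
store.on_ack(router_lsa, "O1")      # legacy neighbor only: header is kept
kept_after_legacy_ack = store.pending_headers()
store.on_ack(router_lsa, "OG1")     # helper-capable neighbor: header removed
```

   In this usage the header stays in the store after the
   acknowledgement from the legacy neighbor O1 and is removed only
   once OG1 acknowledges the LSA.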

   When OSPF is re-enabled, OE floods every LSA header held in
   non-volatile storage as an LSA, with the length and the checksum
   both set to zero. This is equivalent to requesting that O refresh
   the specific LSA.


6. CONCLUSION

   In this draft, we have discussed two techniques to implement OSPF
   restart in a heterogeneous environment. Both techniques are
   specified in sufficient detail to ensure adherence to basic OSPF
   mechanisms. Care has been taken to ensure that major code changes
   are not required and that the assumptions are reasonable given
   current routing system architectures.

   OSPF-HR is a novel idea to implement "heterogeneous OSPF restart".
   One interesting feature of this approach is that it can work on a
   node without any helper-mode neighbors. However, it assumes support
   for a tunneling technique. In addition, there are some concerns
   about scalability.

   OSPF-EGR extends the procedures of [OSPF-GR] to ensure
   interoperability with legacy OSPF implementations. However, it
   assumes that at least one neighbor implements the helper-mode of
   [OSPF-GR] (with a minor modification).

   It is easily observed that OSPF-HR and OSPF-EGR can interoperate
   with each other. The flexibility in network deployments for the
   proposed solutions is illustrated in Figure 5. In this network,
   when OG1 restarts, non-stop forwarding cannot take place (since O1
   will stop sending traffic to OG1). However, when OG, OH or OE
   restarts, non-stop forwarding takes place at all nodes in the
   network.

   [Figure: network topology. Top row: OG ------ OG1 ------ O1.
   Bottom row: O2 ------ OH ------ OE ------ O3. Four diagonal links
   join the top row to the bottom row.]

   Figure 5 Relationship between different solutions

   The discussions on deployment issues illustrate the applicability
   of these techniques to the real world.
   We can conclude that implementing the techniques described in this
   draft enhances [OSPF-GR].


7. FUTURE WORK

   This technique can be extended to other IGPs like [ISIS]. This
   should be a simpler task given the relative simplicity of [ISIS] as
   compared to [OSPF]. Another interesting possibility is to apply the
   sync-only-adjacency approach to the problem of network-wide
   software upgrades.


8. ACKNOWLEDGEMENTS

   Our thanks to Charles Chen and Mahi Networks for supporting this
   work.


9. Security Considerations

   This draft does not introduce any new security issues for the OSPF
   protocol.


10. References

   [OSPF]     Moy, J., "OSPF Version 2", RFC 2328, April 1998.

   [OSPF-GR]  Moy, J., et al., "Graceful OSPF Restart", RFC 3623,
              November 2003.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [GRE]      Farinacci, D., et al., "Generic Routing Encapsulation
              (GRE)", RFC 2784.

   [IPIP]     Perkins, C., "IP Encapsulation within IP", RFC 2003.

   [ISIS]     "Intermediate System to Intermediate System Intra-Domain
              Routeing Exchange Protocol for use in Conjunction with
              the Protocol for Providing the Connectionless-mode
              Network Service (ISO 8473)", ISO DP 10589, February 1990.


11. Author's Addresses

   Sriganesh Kini, Yibin Yang, Biao Gao, Jagannadha Jonnalagadda
   Mahi Networks
   1039 N McDowell Blvd
   Petaluma, CA 94954
   USA
   Phone: 1-707-283-1000
   Email: {skini,yyang,bgao,jjonnalagadda}@mahinetworks.com