Internet Engineering Task Force Sriganesh Kini Internet draft Rohit Dube Expiration Date: April 2000 Bell Labs 12 Oct 1999 Redundant LSA reduction in OSPF draft-kini-dube-ospf-redundant-lsa-reduction-00.txt Status of This Memo This document is an Internet Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts. Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material, or to cite them other than as a ``working draft'' or ``work in progress.'' The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html Abstract In a link state routing protocol like OSPF, the network and CPU bandwidth consumed by the routing protocol is primarily to propagate and process information about the network topology and changes in the topology. The 'flooding procedure' of OSPF propagates the topology change information to all routers in the network. This procedure delivers the information, multiple times to a router (depending on the source of the information and its connectivity) even though it needs to be delivered only once. The resultant wastage of network bandwidth and CPU processing time reduces the scalability of OSPF. In some cases like a fully connected subgraph this redundant information increases by an order of magnitude (from O(n) to O(n^2)). This draft describes a technique to reduce this redundant information for a fully connected subgraph of the network. Acknowledgments We would like to thank Shivi Fotedar, Furquan Ansari and Ajay Patel for discussions which enabled a clear definition of the idea. This draft is largely based on the ISIS mesh groups draft [Ref2]. Comments on this draft are best sent to the OSPF working group mailing list - ospf@discuss.microsoft.com. Kini/Dube [Page 1] Internet Draft April 2000 1 Introduction OSPF [Ref1] is a link state routing protocol designed to run internal to an Autonomous System (AS). The OSPF protocol enables each router to learn the topology of the network. This topological information is used by the router to compute the shortest path to each destination. The shortest path is used to associate each destination with the appropriate next hop. This is the information needed by the forwarding function in a router. To generate the topology of the network, each router generates its local connectivity information i.e. its connection with networks and routers adjacent to it. Each unit of this information is called a link state advertisement (LSA). All the LSAs put together constitute the topology of the network. Any change in the topology (caused by a link/interface going-down/coming-up) is conveyed by generating a new LSA. The OSPF protocol reliably propagates a newly generated LSA to all the routers in the network. Also when a router establishes a new adjacency it synchronizes all LSAs with its adjacent router. The network and CPU bandwidth requirement of the OSPF protocol is primarily to propagate LSAs. A newly generated LSA is propagated throughout the network by the 'flooding procedure'. The flooding procedure ensures reliability in propagating the LSA to all routers in the Autonomous System. Reliability is achieved by acknowledgments and timeout/retransmission of the LSA. To propagate an LSA each router establishes a protocol state with every neighboring router. This state is called an 'adjacency'. The propagation of LSAs take place over an 'adjacency'. Thus the LSA travels one hop at a time. An LSA received over an adjacency is handled by the 'flooding procedure'. It transmits the LSA over other adjacencies as required. On broadcast networks and NBMA networks (operated in the NBMA mode) the number of adjacencies is reduced by electing one router on that network as a designated router (DR) for that network. All routers on the broadcast network or NBMA network establish an adjacency only with the DR of that network. Since the DR is critical for propagating topology information, the protocol operation can be disrupted for some time if the DR goes down and a new DR has to be elected. Hence a backup DR (BDR) is also elected for a broadcast or NBMA network. All routers establish an adjacency with the BDR as well. The DR concept reduces the number of adjacencies by an order (from O(n^2) to O(n)) and a corresponding decrease in protocol traffic and processing requirements. This draft describes a technique to reduce redundant information for a fully connected subgraph of the network. The organization of the sections is as follows. The database synchronization procedure of the OSPF protocol is explained in detail in Section 2. Section 3 examines how and why redundant information is generated by the flooding procedure. Kini/Dube [Page 2] Internet Draft April 2000 Section 4 discusses approaches to reduce this redundant information. Section 5 outlines the implementation details for LSA reduction in a mesh group topology using configuration information. Examples for different configuration is given in section 6. Section 7 concludes the draft. 2 OSPF database synchronization Synchronization of the network topology database in the routers is the primary function of OSPF protocol. Each piece of the topology is described in a link state advertisement (LSA). OSPF defines 5 types of LSAs. LSAs are sent over adjacencies. An adjacency is a protocol state established between neighboring routers. OSPF packets are exchanged over an adjacency to synchronize the topological database. OSPF defines two procedures, one to bring up an adjacency i.e. initial synchronization of the database, and another to maintain the synchronization i.e. when there is a change in the topology of the network due to a node/link going up or coming down. Not every pair of neighbors form an adjacency. Adjacencies are established between routers connected by point-to- point networks, point-to-multipoint networks and virtual links. On broadcast and NBMA networks, one of the routers is elected a Designated Router (DR) and another router is elected a Backup Designated Router (BDR). All routers on that network become adjacent to both the DR and the BDR of that network. OSPF employs checksums and an age field to preserve the integrity of an LSA. Each LSA is transmitted with its check sum. If the checksum doesn't match the LSA is discarded. Since several instances of a LSA (generated at different time instances) can simultaneously exist in the AS, OSPF uses the age field to determine the latest version of the LSA. The topology database update procedures use the age field to ensure that only the latest version of the LSA is retained and older versions are dropped. The initial synchronization procedure exchanges database description (DD) packets. The DD packets contain a summary of all LSAs in the router. Reliability is ensured in DD packets transmission by using sequence numbers and timeout/retransmissions. The routers then request those LSAs which they do not have or have less recent copies of, by putting these in a link state request list. LSAs in the link state request list are sent in a link state request (LSR) packet. The peer responds by sending LSAs in a LSU packet. On receiving a LSA it is removed from the link state request list. Timeout/retransmission of the LSR ensures reliability in synchronizing the LSA. Kini/Dube [Page 3] Internet Draft April 2000 The procedure used to maintain database synchronization on a topology change is the flooding procedure. Reliability is achieved by acknowledging the LSA with a link state acknowledgment (LSack). Multiple LSAs on a retransmission list can be grouped together in a LSU packet. Similarly multiple acknowledgments can be grouped in a link state acknowledgment packet. A link state retransmission list is associated with each adjacent neighbor. LSAs which need to be flooded over an adjacency is placed on its link state retransmission list. These LSAs are grouped and sent in LSU packets. LSAs are removed from the list on receiving an acknowledgment. A LSA received on one adjacency is flooded on all the other adjacencies of the router. On a broadcast/NBMA network the DR sends the LSA to all the routers on that network. On a broadcast network, the LSUs are multicast to save bandwidth. 3 Redundant LSA generation in OSPF The flooding mechanism of OSPF sends an LSA along every adjacency. This results in a router receiving a LSA multiple times. Let us consider a simple topology where this can happen. Consider the topology in Figure 1. ----- | A | / ----- \ / \ / \ / \ ----- ----- | B |----------| C | ----- ----- Figure 1. Simple topology that generates redundant information If router A receives a LSA (say a new LSA) to be flooded. A adds the LSA to the retransmission list for B and C. The LSA is considered reliably sent to B and C after acknowledgments from B and C reach A. Consider the situation as seen from B. On receiving the LSA from A it will add it to the retransmission list of C. B does not know if A has reliably sent the LSA to C. Similarly, C will add the LSA to the retransmission list of B. Both B and C will send an LSU (containing the newly received LSA) to each other. Note, if B (C) receives the LSU before it sends the LSU to C (B), then it considers this as a positive acknowledgment and does not send the LSU itself. Hence at least one of B or C receives the LSA twice. Kini/Dube [Page 4] Internet Draft April 2000 The situation becomes worse as the degree of connectivity increases. Each node receives or transmits a LSA on each of its adjacencies. If OSPF is not to generate redundant information, each router should receive the LSA exactly once. Before a router sends an LSA along an adjacency it should be sure that the peer router has not received the LSA from another adjacency. The number of redundant copies of the LSA generated/received by a router (say X) is one less than the number of adjacent routers which have a path to the origin of the LSA going through X. In an extreme case like a fully connected subgraph, O(n^2) copies of the LSA will be transmitted where only O(n) are enough. 4 Redundant LSA reduction approaches To reduce the number of redundant LSA copies, various approaches can be taken. 4.1 Spanning tree approach Adjacencies can be established such that from the network a spanning tree is constructed. Any topology change information should be conveyed only along the spanning tree. Thus each router would receive the LSA exactly once. This scheme requires a protocol mechanism for all the routers to agree on the spanning tree. Moreover, the spanning tree itself may need to change depending on which part of the network changes. If a link which is not part of the spanning tree changes state, then the LSA can travel over the established spanning tree. However, if any component of the spanning tree itself changes then a protocol mechanism is required to re-establish the spanning tree. This will slow down the flooding process and also add to the complexity of the protocol. 4.2 Configuration information approach Instead of a protocol mechanism, configuration information can be used to reduce the redundant LSA generation. The configuration information should correlate the adjacency over which a LSA is received to the adjacencies over which this LSA should be forwarded such that LSA duplication is reduced. Consider a subset of routers and links which are fully connected. Let us call such a subset of routers and links as an OSPF mesh group. In an OSPF mesh group, a minimum of configuration information can remove LSA duplication. This mesh group can extend over point-to-point links and point- to-multipoint links. A typical case like a point-to-multipoint interface will have all members on the point-to-multipoint interface configured as part of a mesh group. The flooding of LSAs needs to be modified as follows. LSAs generated by the router or received over an adjacency which is not part of the mesh group is flooded as before. However a LSA received over an adjacency which is part of the mesh group is flooded only along those adjacencies which are not a part of the mesh group. Kini/Dube [Page 5] Internet Draft April 2000 The configuration information can also be set on the lines of the Designated Router concept (as in broadcast networks or NBMA networks). All members of the mesh group can be configured such that they flood the LSA to one or more designated routers which in turn flood the LSAs to all members of the mesh group. The adjacencies between members of the mesh group neither of which is not a DR should be configured to block the transmission of LSAs along the adjacency. The main drawback of this scheme is problems caused by incorrect configuration. If the mesh group configuration on one of the router wrongly includes a particular adjacency, then the router at the other end of the adjacency will have a blind spot for all LSAs it can receive from members of the mesh group to which it does not have an adjacency. Another drawback of this mechanism is the possibility of database inconsistencies. Consider a mesh group configured without a DR. If the adjacency between two members of a mesh group goes down, a LSA generated by either of the two will not reach the other via other members of the mesh group. If that adjacency is the only path between the two members of the mesh group then the database of the two members of the group will be inconsistent. Even if a DR is configured and an adjacency goes down between the DR and another member of the mesh group, then the databases can become inconsistent. 5 Implementation Details A short discussion on the implementation for mesh groups is given below. The following information needs to be configured for a point-to- point network, or a neighbor configured on a multi access network which does not have broadcast capability : meshgroupstate and meshgroupnumber. A broadcast interface can be made part of the mesh group only if every router on the broadcast network is a neighbor to every other router on the mesh group. The values for meshgroupstate can be either DISABLED, ENABLED or BLOCKED. If an interface is not a part of any mesh group then its meshgroupstate is set to DISABLED. Values assigned for meshgroupnumber can be any unique value for the meshgroup. The value of meshgroupnumber is used only if the meshgroupstate is not DISABLED. The state BLOCKED is used if the concept of DR is to be used within the mesh group. In this case, adjacencies to the DR should be in state ENABLED whereas adjacencies with others should be in state BLOCKED. In a mesh group, LSAs should not be sent along adjacencies which are in BLOCKED state. Kini/Dube [Page 6] Internet Draft April 2000 The original flooding procedure is as follows, flood_lsa (adjacency in, LSA) { for all adjacencies 'out' if (in != out) add LSA to retransmission list of out send LSUs and wait for LS acknowledgments } The flooding procedure is modified to, flood_lsa (adjacency in, LSA) { for all adjacencies 'out' if ( in != out) if ( out.meshgroupstate == DISABLED ) add LSA to retransmission list of out else { if ( in.meshgroupstate == DISABLED ) if ( out.meshgroupstate != BLOCKED ) add LSA to retransmission list of out else if ( in.meshgroupnumber != out.meshgroupnumber) add LSA to retransmission list of out else if ( out.meshgroupstate != BLOCKED ) add LSA to retransmission list of out } send LSUs and wait for LS acknowledgments } Note : LSAs originated by the router is equivalent to an LSA originated over an interface which has meshgroupstate set to DISABLED. Interfaces not in any meshgroups at all are considered DISABLED. The logic is explained more clearly in the decision matrix shown in Figure 2. D = disabled, E = enabled, B = blocked, X = send LSA (same mesh group) (different mesh group) out -> out -> | D E B | D E B in ----------------- in ----------------- | D | X X | D | X X V | V | E | X X E | X X | | B | X B | X X Figure 2. Flooding logic Kini/Dube [Page 7] Internet Draft April 2000 The implementation of this feature on a commercial router stack took about two man-weeks (including the debug and test cycle). 6 Configuration examples 6.1 Example 1 ----- | D | ----- |(d1) | |(a1) ----- | A | (a3)/ ----- \(a2) / \ / \ (b2)/ \(c2) ----- ----- | B |----------| C | ----- (b1) (c1) ----- Figure 3. Topology with a mesh group Consider the topology in Figure 3 consisting of four nodes A, B, C and D. Here A, B, C belong to a fully connected mesh group. The interfaces of A are named a1, a2, a3; interfaces of B are named b1, b2; interfaces of C are named c1, c2 and interface of D is named d1. Say the mesh group is denoted by 'g'. The configurations at the nodes for the respective interfaces will be. Interface meshgroupstate meshgroupnumber ( X = dont care) a1 DISABLED X a2 ENABLED g a3 ENABLED g b1 ENABLED g b2 ENABLED g c1 ENABLED g c2 ENABLED g d1 DISABLED X Note : For nodes which are not a part of any mesh group, no configuration is needed. In the above example node D need not have any configuration information. 6.2 Example 2 Consider the same topology shown in Figure 3. Say node B is to configured as a DR. The configuration at the nodes for the respective interfaces will be. Kini/Dube [Page 8] Internet Draft April 2000 Interface meshgroupstate meshgroupnumber ( X = dont care) a1 DISABLED X a2 BLOCKED g a3 ENABLED g b1 ENABLED g b2 ENABLED g c1 ENABLED g c2 BLOCKED g 6.3 Example 3 ----- ----- | B |----------| C | ----- (b2) (c1) ----- (b1)\ /(c2) g1 \ / \ (a2)/ (a1)\ -----/ (a5) (f1) ----- | A |--------------| F | / ----- \ ----- g2 /(a4) (a3)\ / \ (d2)/ \(e2) ----- ----- | D |----------| E | ----- (d1) (e1) ----- Figure 4. Topology with two mesh groups Consider the topology shown in Figure 4 consisting of 6 nodes A, B, C, D, E and F. The interfaces are named as shown in the figure. Two mesh groups are configured. The mesh group consisting of A, B and C is identified by g1. The other mesh group consisting of A, D and E is identified by g2. Say, neither of the two mesh groups have a DR configured. The configuration at the nodes for the respective interfaces will be. Interface meshgroupstate meshgroupnumber ( X = dont care) a1 ENABLED g1 a2 ENABLED g1 a3 ENABLED g2 a4 ENABLED g2 a5 DISABLED X b1 ENABLED g1 b2 ENABLED g1 c1 ENABLED g1 c2 ENABLED g1 d1 ENABLED g2 d2 ENABLED g2 e1 ENABLED g2 e2 ENABLED g2 Kini/Dube [Page 9] Internet Draft April 2000 6.4 Example 4 Consider the same topology as shown in Figure 4. Say, the mesh group g1 has node A configured as a DR. The mesh group g2 does not have a DR configured. The configuration at the nodes for the respective interfaces will be. Interface meshgroupstate meshgroupnumber ( X = dont care) a1 ENABLED g1 a2 ENABLED g1 a3 ENABLED g2 a4 ENABLED g2 a5 DISABLED X b1 ENABLED g1 b2 BLOCKED g1 c1 BLOCKED g1 c2 ENABLED g1 d1 ENABLED g2 d2 ENABLED g2 e1 ENABLED g2 e2 ENABLED g2 7 Conclusion This draft discusses how scalability of the OSPF protocol is adversely affected because of redundant information generated by the flooding procedure. A method to limit this by employing configuration information is also presented. This method is not free from incorrect operation due to configuration errors. The problems posed by having a dynamic mechanism to achieve the same result are stated and need to be investigated to improve on this approach. 8 References [Ref1] J. Moy. OSPF version 2. RFC 2328, Internet Engineering Task Force, 1998. [Ref2] R. Balay, D. Katz and J. Parker. ISIS mesh groups. Internet Draft draft-balay-katz-parker-mesh-00.txt. June 1999. 9 Author's Address Sriganesh Kini Bell Labs, Lucent Technologies 101 Crawfords Corner, Holmdel, NJ. 07733 USA email : kini@dnrc.bell-labs.com Rohit Dube Bell Labs, Lucent Technologies 101 Crawfords Corner, Holmdel, NJ. 07733 USA email : rohitd@dnrc.bell-labs.com Kini/Dube [Page 10]