Internet Engineering Task Force                          Sriganesh Kini
Internet draft                                               Rohit Dube
Expiration Date: April 2000                                   Bell Labs
                                                            12 Oct 1999

                     Redundant LSA reduction in OSPF
            draft-kini-dube-ospf-redundant-lsa-reduction-00.txt

Status of This Memo

    This document is an Internet Draft and is in full conformance with
    all provisions of Section 10 of RFC2026.  Internet Drafts are working
    documents of the Internet Engineering Task Force (IETF), its Areas,
    and its Working Groups.  Note that other groups may also distribute
    working documents as Internet Drafts.

    Internet Drafts are draft documents valid for a maximum of six
    months.  Internet Drafts may be updated, replaced, or obsoleted by
    other documents at any time.  It is not appropriate to use Internet
    Drafts as reference material, or to cite them other than as a
    ``working draft'' or ``work in progress.''

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html

Abstract

    In a link state routing protocol like OSPF, the network and CPU
    bandwidth consumed by the routing protocol is primarily to
    propagate and process information about the network topology
    and changes in the topology. The 'flooding procedure' of OSPF
    propagates the topology change information to all routers in the
    network. This procedure delivers the information, multiple times
    to a router (depending on the source of the information and its
    connectivity) even though it needs to be delivered only once.
    The resultant wastage of network bandwidth and CPU processing
    time reduces the scalability of OSPF. In some cases like a fully
    connected subgraph this redundant information increases by an order
    of magnitude (from O(n) to O(n^2)). This draft describes a
    technique to reduce this redundant information for a fully
    connected subgraph of the network.

Acknowledgments

    We would like to thank Shivi Fotedar, Furquan Ansari and Ajay Patel
    for discussions which enabled a clear definition of the idea. This
    draft is largely based on the ISIS mesh groups draft [Ref2].
    Comments on this draft are best sent to the OSPF working group
    mailing list - ospf@discuss.microsoft.com.

Kini/Dube                                                     [Page 1]

Internet Draft                                              April 2000

1 Introduction

    OSPF [Ref1] is a link state routing protocol designed to
    run internal to an Autonomous System (AS). The OSPF protocol
    enables each router to learn the topology of the network. This
    topological information is used by the router to compute the
    shortest path to each destination. The shortest path is used to
    associate each destination with the appropriate next hop. This is
    the information needed by the forwarding function in a router.

    To generate the topology of the network, each router generates its
    local connectivity information i.e. its connection with networks
    and routers adjacent to it. Each unit of this information is called
    a link state advertisement (LSA). All the LSAs put together
    constitute the topology of the network. Any change in the topology
    (caused by a link/interface going-down/coming-up) is conveyed by
    generating a new LSA. The OSPF protocol reliably propagates a newly
    generated LSA to all the routers in the network. Also when a router
    establishes a new adjacency it synchronizes all LSAs with its
    adjacent router. The network and CPU bandwidth requirement of the
    OSPF protocol is primarily to propagate LSAs.

    A newly generated LSA is propagated throughout the network by the
    'flooding procedure'. The flooding procedure ensures reliability in
    propagating the LSA to all routers in the Autonomous System.
    Reliability is achieved by acknowledgments and
    timeout/retransmission of the LSA. To propagate an LSA each router
    establishes a protocol state with every neighboring router. This
    state is called an 'adjacency'. The propagation of LSAs take place
    over an 'adjacency'. Thus the LSA travels one hop at a time. An
    LSA received over an adjacency is handled by the
    'flooding procedure'. It transmits the LSA over other adjacencies
    as required. On broadcast networks and NBMA networks (operated in
    the NBMA mode) the number of adjacencies is reduced by electing one
    router on that network as a designated router (DR) for that
    network. All routers on the broadcast network or NBMA network
    establish an adjacency only with the DR of that network. Since the
    DR is critical for propagating topology information, the protocol
    operation can be disrupted for some time if the DR goes down and a
    new DR has to be elected. Hence a backup DR (BDR) is also elected
    for a broadcast or NBMA network. All routers establish an adjacency
    with the BDR as well. The DR concept reduces the number of
    adjacencies by an order (from O(n^2) to O(n)) and a
    corresponding decrease in protocol traffic and processing
    requirements.

    This draft describes a technique to reduce redundant information
    for a fully connected subgraph of the network. The organization
    of the sections is as follows. The database synchronization
    procedure of the OSPF protocol is explained in detail in
    Section 2. Section 3 examines how and why redundant information is
    generated by the flooding procedure.

Kini/Dube                                                     [Page 2]

Internet Draft                                              April 2000

    Section 4 discusses approaches to reduce this
    redundant information. Section 5 outlines the
    implementation details for LSA reduction in a mesh group topology
    using configuration information. Examples for different configuration
    is given in section 6. Section 7 concludes the draft.

2 OSPF database synchronization

    Synchronization of the network topology database in the routers is
    the primary function of OSPF protocol. Each piece of the topology
    is described in a link state advertisement (LSA). OSPF defines 5
    types of LSAs. LSAs are sent over adjacencies. An adjacency is a
    protocol state established between neighboring routers. OSPF
    packets are exchanged over an adjacency to synchronize the
    topological database. OSPF defines two procedures, one to bring up
    an adjacency i.e. initial synchronization of the database, and
    another to maintain the synchronization i.e. when there is a change
    in the topology of the network due to a node/link going up or
    coming down. Not every pair of neighbors form an adjacency.
    Adjacencies are established between routers connected by point-to-
    point networks, point-to-multipoint networks and virtual links.
    On broadcast and NBMA networks, one of the routers is elected a
    Designated Router (DR) and another router is elected a Backup
    Designated Router (BDR). All routers on that network become
    adjacent to both the DR and the BDR of that network.

    OSPF employs checksums and an age field to preserve the integrity
    of an LSA. Each LSA is transmitted with its check sum. If the
    checksum doesn't match the LSA is discarded. Since several instances
    of a LSA (generated at different time instances) can simultaneously
    exist in the AS, OSPF uses the age field to determine the latest
    version of the LSA. The topology database update procedures use
    the age field to ensure that only the latest version of the LSA is
    retained and older versions are dropped.

    The initial synchronization procedure exchanges database
    description (DD) packets. The DD packets contain a summary of all
    LSAs in the router. Reliability is ensured in DD packets
    transmission by using sequence numbers and timeout/retransmissions.
    The routers then request those LSAs which they do not have or have
    less recent copies of, by putting these in a link state request
    list. LSAs in the link state request list are sent in a link state
    request (LSR) packet. The peer responds by sending LSAs in a LSU
    packet. On receiving a LSA it is removed from the link state
    request list. Timeout/retransmission of the LSR ensures reliability
    in synchronizing the LSA.

Kini/Dube                                                     [Page 3]

Internet Draft                                              April 2000

    The procedure used to maintain database synchronization on a
    topology change is the flooding procedure. Reliability is achieved
    by acknowledging the LSA with a link state acknowledgment (LSack).
    Multiple LSAs on a retransmission list can be grouped together in a
    LSU packet. Similarly multiple acknowledgments can be grouped in a
    link state acknowledgment packet. A link state retransmission list
    is associated with each adjacent neighbor. LSAs which need to be
    flooded over an adjacency is placed on its link state retransmission
    list. These LSAs are grouped and sent in LSU packets. LSAs are
    removed from the list on receiving an acknowledgment. A LSA
    received on one adjacency is flooded on all the other adjacencies
    of the router. On a broadcast/NBMA network the DR sends the LSA to
    all the routers on that network. On a broadcast network, the LSUs
    are multicast to save bandwidth.

3 Redundant LSA generation in OSPF

    The flooding mechanism of OSPF sends an LSA along every adjacency.
    This results in a router receiving a LSA multiple times. Let us
    consider a simple topology where this can happen. Consider the
    topology in Figure 1.

                           -----
                          |  A  |
                         / ----- \
                        /         \
                       /           \
                      /             \
                   -----            -----
                  |  B  |----------|  C  |
                   -----            -----

    Figure 1. Simple topology that generates redundant information

    If router A receives a LSA (say a new LSA) to be flooded. A adds
    the LSA to the retransmission list for B and C. The LSA is
    considered reliably sent to B and C after acknowledgments from B
    and C reach A. Consider the situation as seen from B. On receiving
    the LSA from A it will add it to the retransmission list of C. B
    does not know if A has reliably sent the LSA to C. Similarly, C
    will add the LSA to the retransmission list of B. Both B and C will
    send an LSU (containing the newly received LSA) to each other.
    Note, if B (C) receives the LSU before it sends the LSU to C (B),
    then it considers this as a positive acknowledgment and does not
    send the LSU itself. Hence at least one of B or C receives the LSA
    twice.

Kini/Dube                                                     [Page 4]

Internet Draft                                              April 2000

    The situation becomes worse as the degree of connectivity increases.
    Each node receives or transmits a LSA on each of its adjacencies.
    If OSPF is not to generate redundant information, each router
    should receive the LSA exactly once. Before a router sends an LSA
    along an adjacency it should be sure that the peer router has not
    received the LSA from another adjacency. The number of redundant
    copies of the LSA generated/received by a router (say X) is one
    less than the number of adjacent routers which have a path to the
    origin of the LSA going through X. In an extreme case like a fully
    connected subgraph, O(n^2) copies of the LSA will be transmitted
    where only O(n) are enough.

4 Redundant LSA reduction approaches

    To reduce the number of redundant LSA copies, various approaches
    can be taken.

    4.1 Spanning tree approach

    Adjacencies can be established such that from the network a
    spanning tree is constructed. Any topology change information
    should be conveyed only along the spanning tree. Thus each router
    would receive the LSA exactly once. This scheme requires a protocol
    mechanism for all the routers to agree on the spanning tree.
    Moreover, the spanning tree itself may need to change depending on
    which part of the network changes. If a link which is not part of
    the spanning tree changes state, then the LSA can travel over the
    established spanning tree. However, if any component of the
    spanning tree itself changes then a protocol mechanism is required
    to re-establish the spanning tree. This will slow down the flooding
    process and also add to the complexity of the protocol.

    4.2 Configuration information approach

    Instead of a protocol mechanism, configuration information can be
    used to reduce the redundant LSA generation. The configuration
    information should correlate the adjacency over which a LSA is
    received to the adjacencies over which this LSA should be forwarded
    such that LSA duplication is reduced. Consider a subset of routers
    and links which are fully connected. Let us call such a subset of
    routers and links as an OSPF mesh group. In an OSPF mesh group, a
    minimum of configuration information can remove LSA duplication.
    This mesh group can extend over point-to-point links and point-
    to-multipoint links. A typical case like a point-to-multipoint
    interface will have all members on the point-to-multipoint
    interface configured as part of a mesh group.

    The flooding of LSAs needs to be modified as follows. LSAs
    generated by the router or received over an adjacency which is not
    part of the mesh group is flooded as before. However a LSA received
    over an adjacency which is part of the mesh group is flooded only
    along those adjacencies which are not a part of the mesh group.

Kini/Dube                                                     [Page 5]

Internet Draft                                              April 2000

    The configuration information can also be set on the lines of the
    Designated Router concept (as in broadcast networks or NBMA
    networks). All members of the mesh group can be configured such
    that they flood the LSA to one or more designated routers which in
    turn flood the LSAs to all members of the mesh group. The
    adjacencies between members of the mesh group neither of which is
    not a DR should be configured to block the transmission of LSAs
    along the adjacency.

    The main drawback of this scheme is problems caused by incorrect
    configuration. If the mesh group configuration on one of the router
    wrongly includes a particular adjacency, then the router at the
    other end of the adjacency will have a blind spot for all LSAs it
    can receive from members of the mesh group to which it does not
    have an adjacency. Another drawback of this mechanism is the
    possibility of database inconsistencies. Consider a mesh group
    configured without a DR. If the adjacency between two members of a
    mesh group goes down, a LSA generated by either of the two will not
    reach the other via other members of the mesh group. If that
    adjacency is the only path between the two members of the mesh group
    then the database of the two members of the group will be
    inconsistent. Even if a DR is configured and an adjacency goes down
    between the DR and another member of the mesh group, then the
    databases can become inconsistent.

5 Implementation Details

    A short discussion on the implementation for mesh groups is given
    below.

    The following information needs to be configured for a point-to-
    point network, or a neighbor configured on a multi access network
    which does not have broadcast capability : meshgroupstate and
    meshgroupnumber. A broadcast interface can be made part of the mesh
    group only if every router on the broadcast network is a neighbor
    to every other router on the mesh group. The values for
    meshgroupstate can be either DISABLED, ENABLED or BLOCKED. If an
    interface is not a part of any mesh group then its meshgroupstate
    is set to DISABLED. Values assigned for meshgroupnumber can be any
    unique value for the meshgroup. The value of meshgroupnumber is
    used only if the meshgroupstate is not DISABLED. The state BLOCKED
    is used if the concept of DR is to be used within the mesh group.
    In this case, adjacencies to the DR should be in state ENABLED
    whereas adjacencies with others should be in state BLOCKED. In a
    mesh group, LSAs should not be sent along adjacencies which are in
    BLOCKED state.

Kini/Dube                                                     [Page 6]

Internet Draft                                              April 2000

    The original flooding procedure is as follows,

    flood_lsa (adjacency in, LSA)
    {
      for all adjacencies 'out'
        if (in != out)
          add LSA to retransmission list of out

      send LSUs and wait for LS acknowledgments
    }

    The flooding procedure is modified to,

    flood_lsa (adjacency in, LSA)
    {
      for all adjacencies 'out'
        if ( in != out)
          if ( out.meshgroupstate == DISABLED )
            add LSA to retransmission list of out
          else
          {
             if ( in.meshgroupstate == DISABLED )
               if ( out.meshgroupstate != BLOCKED )
                 add LSA to retransmission list of out
             else
               if ( in.meshgroupnumber != out.meshgroupnumber)
                 add LSA to retransmission list of out
               else
                 if ( out.meshgroupstate != BLOCKED )
                   add LSA to retransmission list of out
          }

       send LSUs and wait for LS acknowledgments
    }

    Note : LSAs originated by the router is equivalent to an LSA originated
    over an interface which has meshgroupstate set to DISABLED. Interfaces
    not in any meshgroups at all are considered DISABLED.

    The logic is explained more clearly in the decision matrix shown
    in Figure 2.

     D = disabled,   E = enabled,   B = blocked,   X = send LSA

          (same mesh group)                    (different mesh group)
       out ->                               out ->
             | D    E    B                        | D    E    B
      in   -----------------               in   -----------------
       |   D | X    X                       |   D | X    X
       V     |                              V     |
           E | X    X                           E | X    X
             |                                    |
           B | X                                B | X    X


                   Figure 2. Flooding logic

Kini/Dube                                                    [Page 7]

Internet Draft                                             April 2000

    The implementation of this feature on a commercial router stack
    took about two man-weeks (including the debug and test cycle).

6 Configuration examples

    6.1 Example 1

                           -----
                          |  D  |
                           -----
                             |(d1)
                             |
                             |(a1)
                           -----
                          |  A  |
                     (a3)/ ----- \(a2)
                        /         \
                       /           \
                  (b2)/             \(c2)
                   -----            -----
                  |  B  |----------|  C  |
                   ----- (b1)  (c1) -----

            Figure 3. Topology with a mesh group

    Consider the topology in Figure 3 consisting of four nodes A, B,
    C and D. Here A, B, C belong to a fully connected mesh group. The
    interfaces of A are named a1, a2, a3; interfaces of B are named b1,
    b2; interfaces of C are named c1, c2 and interface of D is named
    d1. Say the mesh group is denoted by 'g'. The configurations at the
    nodes for the respective interfaces will be.

       Interface   meshgroupstate     meshgroupnumber ( X = dont care)
           a1         DISABLED               X
           a2         ENABLED                g
           a3         ENABLED                g
           b1         ENABLED                g
           b2         ENABLED                g
           c1         ENABLED                g
           c2         ENABLED                g
           d1         DISABLED               X

    Note : For nodes which are not a part of any mesh group, no
    configuration is needed. In the above example node D need not have
    any configuration information.

    6.2 Example 2

    Consider the same topology shown in Figure 3. Say node B is to
    configured as a DR. The configuration at the nodes for the
    respective interfaces will be.

Kini/Dube                                                     [Page 8]

Internet Draft                                              April 2000

       Interface   meshgroupstate     meshgroupnumber ( X = dont care)
           a1         DISABLED               X
           a2         BLOCKED                g
           a3         ENABLED                g
           b1         ENABLED                g
           b2         ENABLED                g
           c1         ENABLED                g
           c2         BLOCKED                g

    6.3 Example 3

                   -----            -----
                  |  B  |----------|  C  |
                   ----- (b2)  (c1) -----
                 (b1)\             /(c2)
          g1          \           /
                       \     (a2)/
                    (a1)\  -----/   (a5)   (f1) -----
                          |  A  |--------------|  F  |
                         / ----- \              -----
          g2            /(a4) (a3)\
                       /           \
                  (d2)/             \(e2)
                   -----            -----
                  |  D  |----------|  E  |
                   ----- (d1)  (e1) -----

             Figure 4. Topology with two mesh groups

    Consider the topology shown in Figure 4 consisting of 6 nodes A, B,
    C, D, E and F. The interfaces are named as shown in the figure. Two
    mesh groups are configured. The mesh group consisting of A, B and C
    is identified by g1. The other mesh group consisting of A, D and E
    is identified by g2. Say, neither of the two mesh groups have a DR
    configured. The configuration at the nodes for the respective
    interfaces will be.

       Interface   meshgroupstate     meshgroupnumber ( X = dont care)
           a1         ENABLED                g1
           a2         ENABLED                g1
           a3         ENABLED                g2
           a4         ENABLED                g2
           a5         DISABLED               X
           b1         ENABLED                g1
           b2         ENABLED                g1
           c1         ENABLED                g1
           c2         ENABLED                g1
           d1         ENABLED                g2
           d2         ENABLED                g2
           e1         ENABLED                g2
           e2         ENABLED                g2
Kini/Dube                                                    [Page 9]

Internet Draft                                             April 2000

    6.4 Example 4

    Consider the same topology as shown in Figure 4. Say, the mesh
    group g1 has node A configured as a DR. The mesh group g2 does
    not have a DR configured. The configuration at the nodes for the
    respective interfaces will be.

       Interface   meshgroupstate     meshgroupnumber ( X = dont care)
           a1         ENABLED                g1
           a2         ENABLED                g1
           a3         ENABLED                g2
           a4         ENABLED                g2
           a5         DISABLED               X
           b1         ENABLED                g1
           b2         BLOCKED                g1
           c1         BLOCKED                g1
           c2         ENABLED                g1
           d1         ENABLED                g2
           d2         ENABLED                g2
           e1         ENABLED                g2
           e2         ENABLED                g2

7 Conclusion

    This draft discusses how scalability of the OSPF protocol is
    adversely affected because of redundant information generated by
    the flooding procedure. A method to limit this by employing
    configuration information is also presented. This method is not
    free from incorrect operation due to configuration errors. The
    problems posed by having a dynamic mechanism to achieve the same
    result are stated and need to be investigated to improve on this
    approach.

8 References

    [Ref1] J. Moy. OSPF version 2. RFC 2328, Internet Engineering Task
           Force, 1998.

    [Ref2] R. Balay, D. Katz and J. Parker. ISIS mesh groups. Internet
           Draft draft-balay-katz-parker-mesh-00.txt. June 1999.

9 Author's Address

    Sriganesh Kini
    Bell Labs, Lucent Technologies
    101 Crawfords Corner, Holmdel, NJ. 07733 USA
    email : kini@dnrc.bell-labs.com

    Rohit Dube
    Bell Labs, Lucent Technologies
    101 Crawfords Corner, Holmdel, NJ. 07733 USA
    email : rohitd@dnrc.bell-labs.com

Kini/Dube                                                    [Page 10]