Network Working Group                                           R. White
Internet Draft                                                 V. Manral
Expiration Date: August 2003                                    R. Adams
File Name: draft-white-network-benchmark-00.txt            February 2003

  Considerations in Benchmarking Routing Protocol Network Convergence
                  draft-white-network-benchmark-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet Drafts are working documents of the Internet Engineering
   Task Force (IETF), its Areas, and its Working Groups. Note that other
   groups may also distribute working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months.  Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document discusses some of the definitions required to specify
   benchmarks of routing protocol convergence within a network, some of
   the possible ways to benchmark routing protocol performance within a
   network, and some of the implications of those benchmarks. The
   definition of convergence is discussed first, followed by the
   polling of network devices. Several tests which are commonly used to
   measure network convergence are then examined.

   This draft does not attempt to define what techniques should be used
   to benchmark network convergence, but only to provide considerations
   that testers should weigh when attempting to measure network
   convergence using various methods.


White, Manral, Adams                                            [Page 1]


INTERNET DRAFT     Considerations in Benchmarking RPs      February 2003


1. Motivation

   As the benchmarking of individual components within a network comes
   under greater scrutiny, and specifications are written to
   standardize ways of measuring the performance of those components
   within given frameworks, the next level of benchmarking has not yet
   been approached: measuring the performance of networks. But what is
   meant by the performance of a network, from the perspective of
   routing protocols? Various tests have been used in the past to
   measure the convergence of a network, and some of them measure
   completely different things than others.

   It's important to attempt to examine the measurement of network
   convergence in a way that exposes these differences, and helps
   vendors, end users, and those in the research community have some
   common ground when discussing network convergence.


2. A Problem of Definitions

   As we examine the issues and concepts surrounding the measurement of
   network performance in terms of convergence, we find that most of the
   basic problems we face surround defining the terms in use. For
   instance, what is convergence, exactly? What is a network? In the
   following sections, we discuss each of these concepts, and attempt to
   address each one.


2.1. Networks

   In its most nominal form, a network is composed of a group of devices
   interconnected in some way, which send data over these
   interconnections for various purposes. But, when we discuss the
   concept of routing protocol convergence within a network, the
   definition needs to be more precise. For instance, since hosts do
   not, generally, participate in routing, should they be considered a
   part of the network when benchmarking the performance of a routing
   protocol?  The obvious answer appears to be a resounding no, but, in
   some possible test types, hosts which do not participate in routing
   play a large part in the test itself.

   When considering tests in which hosts participate as traffic or route
   generators, then, we must consider the impact these hosts have on the
   test results, although we may not consider them a part of the network
   we are measuring the performance of.


2.2. Convergence

   Convergence is probably one of the hardest words in networking to
   define. Just about everyone who has worked on networks for a period
   of time knows what it means, but no one can explain it sufficiently
   for someone who doesn't understand how a network works to grasp it.
   This is because several different meanings are attributed to
   convergence, and which meaning is intended depends on the context in
   which the word is used. Convergence can mean:


   o    The time at which all the routing protocol processes running on
        devices which participate in routing in the network agree on the
        best path to each reachable destination in the network.

   o    The time at which the best path to each reachable destination in
        the network has been loaded into some local table which may then
        be used to forward packets (the routing information base, or
        RIB).

   o    The time at which each router in the network has built the
        tables necessary to actually forward packets through the
        network, so that a packet transmitted from one part of the
        network would actually reach any given reachable destination
        within the network.

   For instance, on a Cisco router, "show ip ospf stats" would allow
   the tester to see the time of the last completed SPF,
   "show ip route" would allow the tester to see what routes are
   installed in the RIB, and "show ip cef" would allow the tester to
   see the forwarding information which has been built from the RIB.
   Each test designed to measure the performance of routing protocols
   within a network must determine which type of convergence is being
   measured, whether that measurement suits the information being
   gathered, and which test will actually measure the desired type of
   convergence.
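
   As a minimal sketch of how the three meanings of convergence listed
   above can yield different numbers, the Python below assumes that
   each router reports three timestamps, in seconds after the
   triggering event: when its routing process finished computing best
   paths, when the RIB was updated, and when the forwarding table was
   updated. The class, field names, router names and values are
   hypothetical and are not taken from any real device.

```python
from dataclasses import dataclass


@dataclass
class RouterTimes:
    """Hypothetical per-router timestamps, in seconds after the event."""
    protocol_done: float    # routing process agrees on best paths
    rib_installed: float    # best paths loaded into the RIB
    fib_installed: float    # forwarding tables built from the RIB


def network_convergence(times):
    """Return the network-wide time for each meaning of convergence.

    The network has converged only when the slowest router has, so
    each figure is the maximum over all routers.
    """
    return {
        "protocol": max(t.protocol_done for t in times.values()),
        "rib": max(t.rib_installed for t in times.values()),
        "fib": max(t.fib_installed for t in times.values()),
    }


# Illustrative values only (assumed, not measured).
sample = {
    "R1": RouterTimes(0.25, 0.5, 0.75),
    "R2": RouterTimes(0.5, 0.75, 1.5),
}
result = network_convergence(sample)
print(result)
```

   With these sample values the same network would be reported as
   converged at three different times, depending on which meaning the
   test adopts.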


3. Polling Devices in a Network

   One common way to measure network convergence is to poll the devices
   in the network, using some command supplied within the routing
   software, to determine when particular events have occurred, or
   particular pieces of information have reached all the routers in the
   network. Polling eliminates the need for the clock of each device
   within the network to be synchronized for the test to have
   meaningful results. However, there are some issues which need to be
   addressed in any test which polls devices for this information; the
   first is the rate at which polling takes place.

   If, in a test, you are attempting to measure some parameter to within
   one second of its occurrence, then you would need to poll at a rate
   much higher than once per second.

    test starts here
     |
     |                   event occurs here
     |                   |
     v                   v
   -+----------+----------+----------+---
    ^          ^          ^          ^
    |          |          |          |
    0 seconds  1 second   2 seconds  3 seconds


   For instance, in this time line, suppose a polling event is set up
   which takes place every second. The test is started just after the
   0 second poll takes place, but the polling process doesn't recognize
   the test as starting until the 1 second poll. An event occurs just
   before the 2 second poll, and the polling process detects this at the
   2 second poll. The polling process would indicate that one second
   has elapsed between the start of the test and the event. In
   reality, closer to two seconds have elapsed.
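
   The time line can be reproduced with a short Python sketch. It
   assumes an idealized poller which polls at exact multiples of the
   polling interval and notices a change at the first poll at or after
   the change; the function names and the sample times are
   illustrative only.

```python
import math


def detected_at(event_time_s, poll_interval_s):
    """Time of the first poll at or after an event (polls at 0, T, 2T...)."""
    return math.ceil(event_time_s / poll_interval_s) * poll_interval_s


def measured_duration(start_s, end_s, poll_interval_s):
    """Duration as seen by the poller, from detected start to detected end."""
    return (detected_at(end_s, poll_interval_s)
            - detected_at(start_s, poll_interval_s))


# Assumed values chosen to mirror the time line: the test starts just
# after the 0 second poll, the event occurs just before the 2 second
# poll, and there is one poll per second.
start, end, interval = 0.05, 1.95, 1.0
print("actual:  ", end - start)
print("measured:", measured_duration(start, end, interval))
```

   Under these assumptions the poller reports one second for an
   interval which actually lasted close to two.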

   The interval of the polling process can be reduced until the
   measurement is felt to be accurate, but it should be no longer than
   half of the desired accuracy. Common practice actually shows that it
   should be about one tenth of the desired accuracy.
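
   As a worked example of this sizing rule, the sketch below reads the
   first figure as an upper bound on the polling interval (an
   interpretation, not something the text states outright) and the
   second as the commonly used value. The function name is
   hypothetical.

```python
def polling_intervals(desired_accuracy_s):
    """Return (longest acceptable, common practice) poll intervals.

    Assumes the "half" rule is an upper bound on the interval and the
    "one tenth" rule is the value normally chosen.
    """
    return desired_accuracy_s / 2, desired_accuracy_s / 10


longest, typical = polling_intervals(1.0)
print(f"poll at least every {longest} s, preferably every {typical} s")
```

   For a desired accuracy of one second, this gives a polling interval
   of no more than half a second, and preferably a tenth of a second.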

   A second consideration when polling for network events is the
   performance of the device running the polling process. If the
   process cannot poll each device at the scheduled interval, or the
   polling is "jittered" (the time between actual polls varies by some
   amount), the accuracy of the tests will be called into question. The
   amount of jitter introduced by the polling device, and the rate at
   which the device can effectively poll, should be measured in some
   way, and this measurement should be taken into account when
   designing tests which rely on polling.
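
   One possible way to take such a measurement is sketched below in
   Python. It assumes the polling device records a timestamp each time
   it polls a given device; the function name, the returned field
   names and the sample timestamps are hypothetical.

```python
def poll_jitter(poll_times_s, scheduled_interval_s):
    """Summarize how far actual poll spacing strays from the schedule.

    poll_times_s: timestamps of successive polls of one device, as
    recorded by the polling station (assumed to be available).
    """
    gaps = [b - a for a, b in zip(poll_times_s, poll_times_s[1:])]
    deviations = [abs(g - scheduled_interval_s) for g in gaps]
    return {
        "mean_interval": sum(gaps) / len(gaps),
        "max_deviation": max(deviations),
    }


# Illustrative timestamps for a poller scheduled once per second.
stats = poll_jitter([0.0, 1.0, 2.25, 3.0, 4.0], 1.0)
print(stats)
```

   Here the mean interval matches the schedule, yet individual polls
   are up to a quarter of a second early or late, which is the figure
   that matters when judging the accuracy of a single measurement.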

   Finally, when polling devices to determine when a network event
   occurs, issues with serialization must be considered. Most devices
   which would be used for polling will not be able to poll several
   devices within the network at once, and will thus serialize the
   polling of devices.

    p1   p3  p5  p7  p9
     | p2| p4| p6| p8| p10
     | | | | | | | | | |
     v v v v v v v v v v
   -+----------+----------+----------+---
    ^          ^          ^          ^
    |          |          |          |
    0 seconds  1 second   2 seconds  3 seconds

   Suppose, for instance, that a single device is polling ten devices in
   the network. If it can poll five devices per second, it will take a
   full two seconds for it to detect any event on all ten devices,
   giving an effective accuracy of about four seconds. The amount of
   time required for a polling device to serialize through all the
   devices it is polling needs to be considered when polling a very
   large number of devices.
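
   The arithmetic in this example can be written as a one-line Python
   helper. The function name is hypothetical, and the sketch assumes
   that every poll takes the same amount of time.

```python
def polling_cycle_s(device_count, polls_per_second):
    """Seconds needed to poll every device once, one after another.

    Assumes a constant polling rate and no parallel polling.
    """
    return device_count / polls_per_second


cycle = polling_cycle_s(10, 5)
print(f"one pass over all devices takes {cycle} s")
```

   For ten devices at five polls per second the cycle is two seconds;
   the effective accuracy of about four seconds quoted above is twice
   this cycle time.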


4. Tests to Measure Routing Protocol Convergence

   In this section, we outline some of the various tests which have
   been used in the past to measure routing protocol convergence within
   a network, and discuss some aspects of these tests.


4.1. Determining When Each Device has Received Information About All
   Reachable Destinations

   In link state protocols, information is flooded throughout the
   network; discovering when each router in the network has received
   this information is an important consideration in network
   convergence. Slower flooding times will, of course, mean slower
   network convergence overall; thus flooding performance directly
   impacts overall routing protocol performance in the network.

   There are three methods which can be used to determine when the
   flooding of information has been completed.


4.1.1. Black Box Polling

   The test may poll each device using the methods specific to the
   protocol to determine information receipt without using output from
   the devices themselves. [OSPF-BENCH] describes such methods for the
   OSPF protocol. The limitations on polling devices described above
   should be taken into consideration when using this method.


4.1.2. White Box Output

   Each device may provide some output which notes when certain pieces
   of information were received by the device, or some event driven
   notification or logging which notes when information is received by
   the device. To use this method, the time clocks of all devices within
   the network must be synchronized. If devices are polled to gather
   this output, the limitations on polling devices described above
   should be taken into consideration.
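
   A minimal Python sketch of this method follows. It assumes each
   device logs the time at which it received the flooded information,
   and that the tester has some estimate of the worst-case
   disagreement between device clocks; the function name, parameter
   names and log values are hypothetical.

```python
def flooding_time(originated_s, received_s, max_clock_offset_s=0.0):
    """Return (flooding time, uncertainty) from logged receipt times.

    originated_s: when the information was first flooded.
    received_s: mapping of device name to the time that device logged
        receipt, each read from the device's own clock.
    max_clock_offset_s: assumed worst-case disagreement between any two
        device clocks; it bounds the error in the result.
    """
    last = max(received_s.values())
    return last - originated_s, max_clock_offset_s


# Illustrative log times, in seconds (assumed, not measured).
logged = {"R1": 10.25, "R2": 10.5, "R3": 11.0}
elapsed, uncertainty = flooding_time(10.0, logged, max_clock_offset_s=0.125)
print(f"flooding took {elapsed} s, +/- {uncertainty} s")
```

   The result can be no more accurate than the clock synchronization,
   which is why the uncertainty is reported alongside the flooding
   time.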


4.1.3. External Packet Monitoring

   External devices which can record packets traversing the network may
   be placed in the network, and the packet flow examined to determine
   when the flooded information has been transmitted to all devices.
   This technique needs to take into account the time clocks of such
   capturing devices (they should be synchronized for effective
   measurement), and the flooding patterns of the routing protocol
   being measured. For instance, if acknowledgements are used within
   the protocol, then the tester needs to determine whether the flooded
   information or these acknowledgements will be used to indicate
   successful flooding of the information. If the acknowledgements are
   used, the test results will include not only flooding time, but also
   the time required to process and acknowledge flooded packets.
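
   The choice between flooded information and acknowledgements can be
   illustrated with the Python sketch below. It assumes the capture
   devices produce records of the form (timestamp, device, kind) with
   synchronized timestamps; this record layout, the function name and
   the sample records are hypothetical.

```python
def flooding_complete_s(captures, use_acks=False):
    """Time at which the last device was seen to receive the update.

    captures: list of (timestamp_s, device, kind) tuples from the
        capture devices, where kind is "update" for the flooded
        information sent to a device or "ack" for that device's
        acknowledgement (an assumed record layout).
    use_acks: if True, a device counts as done when its acknowledgement
        is seen, which adds its processing time to the result.
    """
    wanted = "ack" if use_acks else "update"
    first_seen = {}
    for timestamp_s, device, kind in sorted(captures):
        if kind == wanted and device not in first_seen:
            first_seen[device] = timestamp_s
    return max(first_seen.values())


# Illustrative capture records, in seconds.
records = [
    (0.25, "R2", "update"), (0.5, "R2", "ack"),
    (0.75, "R3", "update"), (1.5, "R3", "ack"),
]
print(flooding_complete_s(records), flooding_complete_s(records, True))
```

   With these sample records, the acknowledgement-based figure is
   twice the update-based figure, the difference being the time the
   slowest device took to process and acknowledge the update.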

   Each of these methods has certain advantages and disadvantages; some
   combination of the three would probably provide the most accurate
   results.


4.2. Determining When Each Device has Finished Finding the Best Path to
   Each Reachable Destination

   This is probably the most difficult measurement to take in a
   network, since there are no known black box ways of determining when
   a device has finished computing the best path to each destination in
   the network. The only possible way of measuring this time is to use
   output from the devices in the network to provide this information.


   It's possible to poll each device periodically, examining output
   provided by the devices, to determine when each device has
   calculated the best path to each destination in the network. This
   method is subject to the limitations described in the section on
   polling devices, above.

   It's also possible to rely on some event driven output of each device
   in the network. For this to yield accurate results, the time clocks
   of all the devices in the network must be closely synchronized.


4.3. Passing Traffic Through the Network to Determine Convergence

   One of the most widely used tests for determining network convergence
   is starting some traffic stream at one end of a network, disrupting
   or completing the network, and determining how long the traffic
   stream is either not delivered, or takes to be delivered. For
   instance:

   Source----R1----R2----Sink

   A traffic stream is generated on Source, and the link between R1 and
   R2 is connected in some way. The time between the connection of this
   link and the arrival of the traffic at the Sink is measured as
   network convergence. This type of test is extremely useful in
   testing the real response of a network to changing conditions. There
   are some considerations which should be examined when using this
   sort of test, or examining the results of this sort of test.
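
   Both forms of this measurement (how long traffic takes to be
   delivered, and how long it is not delivered) can be sketched in
   Python. The sketch assumes the Sink records an arrival timestamp
   for each packet on the same clock used to time the network event;
   the function names and sample times are hypothetical.

```python
def time_to_first_delivery(event_time_s, arrival_times_s):
    """Seconds from the network event to the first packet at the sink.

    Returns None if no packet arrived at or after the event.
    """
    later = [t for t in arrival_times_s if t >= event_time_s]
    return min(later) - event_time_s if later else None


def longest_gap(arrival_times_s):
    """Longest interval during which the sink received no packets."""
    ordered = sorted(arrival_times_s)
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=0.0)


# Illustrative sink arrival times, in seconds (assumed values).
print(time_to_first_delivery(10.0, [12.5, 12.75, 13.0]))
print(longest_gap([0.0, 0.5, 1.0, 4.0, 4.5]))
```

   The first function suits tests in which a link is connected, the
   second suits tests in which an established stream is disrupted.
   The resolution of either figure is limited by the spacing of the
   packets in the traffic stream.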


4.3.1. The Various Elements of Performance Cannot Be Separated

   Using this sort of testing, there is no way to separate the
   performance of a routing protocol from the performance of the
   interaction between the routing protocol and the forwarding engine,
   nor from the performance of the forwarding engine itself. In many
   tests, this is acceptable, since these are all elements of the
   network in total, but if specific elements of routing protocol
   performance are being measured, such tests can be problematic when
   attempting to analyse the results.


4.3.2. The Total Convergence of the Network May Not Be Measured

   If you have the following topology:

   Source-----R1----R2-----Sink
              |      |
              R3    R4
              |      |
              R5----R6

   Suppose a traffic stream is sourced from Source, and then all the
   devices in the network are brought up (R1 through R6). The time from
   the device startup to the traffic stream reaching the Sink is
   measured as network convergence.

   As soon as the path Source, R1, R2, Sink converges, the Sink will
   begin receiving traffic, and the network will be considered to be
   converged by the test. However, without polling the remaining
   routers, R3 through R6, there is no way to know if those routers have
   also converged on the best path to the Sink and the Source. While
   this example may be considered extreme, there are many complex
   topologies where:


   o    The path chosen by the traffic stream may not be the path
        expected.

   o    The path chosen by the traffic stream may switch during network
        convergence, with the stream taking some secondary path at
        first, and successively better paths converging over the life
        of the test.

   o    The path chosen by the traffic stream may switch so quickly
        that no traffic is lost, while the routing protocols still take
        some time to converge.
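
   The gap between what a traffic test reports and the convergence of
   the whole network can be sketched in Python for the six-router
   topology above. The per-router convergence times are invented for
   illustration, and the sketch makes the simplifying assumption that
   traffic is delivered as soon as every router on its path has
   converged.

```python
def traffic_vs_total(converged_s, traffic_path):
    """Compare what a traffic test reports with full convergence.

    converged_s: mapping of router name to the time, in seconds after
        startup, at which that router converged (assumed to be known).
    traffic_path: routers the test traffic actually crosses.
    """
    seen_by_traffic = max(converged_s[r] for r in traffic_path)
    whole_network = max(converged_s.values())
    return seen_by_traffic, whole_network


# Illustrative times for the six routers in the topology above.
times = {"R1": 2.0, "R2": 2.5, "R3": 4.0, "R4": 4.5, "R5": 6.0, "R6": 5.5}
print(traffic_vs_total(times, ["R1", "R2"]))
```

   With these values the traffic test would report convergence at 2.5
   seconds, while the last router does not converge until 6 seconds.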

   Tests which rely on traffic passing through the network to determine
   network convergence times should thoroughly examine the way in which
   the test topology converges, and examine the consistency of that
   convergence, with enough test runs to get a good feel for the range
   of possible results. Examining the same test sequence with slight
   changes in the network topology may help to provide an understanding
   of how the network under test converges, and also may help to
   provide more insight into the factors impacting convergence in the
   test network.

   It's also possible that if the test network does not converge
   completely for some time after the test traffic successfully passes
   through the topology, the continuing convergence could impact the
   results of a second test run, if the test runs are placed too closely
   together. If a first test is run, and a second test is started
   immediately on traffic making it through the test topology, the
   results of the second test may be skewed by convergence which is
   still taking place from the first test run.

   These are important considerations which should be noted when
   examining or performing tests which rely on the presence of a data
   stream within a routing system to measure convergence.


5. References


[OSPF-BENCH]
     Manral, V., "Benchmarking Methodology for Basic OSPF Convergence",
     draft-bmwg-ospfconv-intraarea, May 2002


6. Authors' Addresses

      Russ White
      Cisco Systems, Inc.
      7025 Kit Creek Rd.
      Research Triangle Park, NC 27709

      riw@cisco.com

      Vishwas Manral,
      Netplane Systems,
      189 Prashasan Nagar,
      Road number 72,
      Jubilee Hills,
      Hyderabad.

      vmanral@netplane.com

      Robert Adams
      Cisco Systems, Inc.
      7025 Kit Creek Rd.
      Research Triangle Park, NC 27709

      robeadam@cisco.com

