Network Working Group                                          R. White
Internet Draft                                                V. Manral
Expiration Date: August 2003                                   R. Adams
File Name: draft-white-network-benchmark-00.txt           February 2003

   Considerations in Benchmarking Routing Protocol Network Convergence
                  draft-white-network-benchmark-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This document discusses some of the definitions required to specify benchmarks of routing protocol convergence in a network, some of the possible ways to benchmark routing protocol performance within a network, and some of the implications of those benchmarks. The definition of convergence is discussed first, followed by the polling of network devices. Several tests which are commonly used to measure network convergence are then examined. This draft does not attempt to define which techniques should be used to benchmark network convergence; it only provides considerations that testers should take into account when attempting to measure network convergence using various methods.

White, Manral, Adams                                            [Page 1]
INTERNET DRAFT     Considerations in Benchmarking RPs     February 2003

1. Motivation

As the ability to benchmark components within a network comes under greater scrutiny, and specifications are written to standardize ways to measure the performance of individual components within given frameworks, the next level of benchmarking has not been approached: that of measuring the performance of networks. But what is meant by the performance of a network, from the perspective of routing protocols? Various tests have been used in the past to measure the convergence of a network, some of which actually measure completely different things than others. It is important to examine the measurement of network convergence in a way that exposes these differences, and helps vendors, end users, and those in the research community find some common ground when discussing network convergence.

2. A Problem of Definitions

As we examine the issues and concepts surrounding the measurement of network performance in terms of convergence, we find that most of the basic problems we face involve defining the terms in use. For instance, what is convergence, exactly? What is a network? In the following sections, we discuss each of these concepts in turn.

2.1. Networks

In its simplest form, a network is composed of a group of devices interconnected in some way, which send data over these interconnections for various purposes. But when we discuss the concept of routing protocol convergence within a network, the definition needs to be more precise. For instance, since hosts do not, generally, participate in routing, should they be considered a part of the network when benchmarking the performance of a routing protocol? The obvious answer appears to be a resounding no, but in some possible test types, hosts which do not participate in routing play a large part in the test itself.
When considering tests in which hosts participate as traffic or route generators, then, we must consider the impact these hosts have on the test results, even though we may not consider them a part of the network whose performance we are measuring.

2.2. Convergence

Convergence is probably one of the hardest words in networking to define. Just about everyone who has worked on networks for some time knows what it means, but few can explain it well enough to be understood by someone who doesn't understand how a network works. This is because several different meanings are attributed to convergence, and which meaning is intended depends on the context in which the word is used. Convergence can mean:

o  The time at which all the routing protocol processes running on devices which participate in routing in the network agree on the best path to each reachable destination in the network.

o  The time at which the best path to each reachable destination in the network has been loaded into some local table which may then be used to forward packets (the routing information base, or RIB).

o  The time at which each router in the network has built the tables necessary to actually forward packets through the network, so that a packet transmitted from one part of the network would actually reach any given reachable destination within the network.

For instance, on a Cisco router, show ip ospf stats would allow the tester to see the time of the last completed SPF, show ip route would allow the tester to see what routes are installed in the RIB, and show ip cef would allow the tester to see the forwarding information which has been built from the RIB.
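
As an illustration of how the choice of definition changes the result, the following Python sketch computes a network-wide convergence time under each of the three definitions above. The router names, the stage labels (best_path, rib, fib) and the timestamps are hypothetical values chosen for the example, not output from any real device; the sketch also assumes every timestamp is in milliseconds against a common clock.

```python
from typing import Dict

# Hypothetical per-router timestamps, in milliseconds after a topology
# change, for the three stages described in the text. None of these
# numbers come from a real device.
TIMESTAMPS: Dict[str, Dict[str, int]] = {
    "R1": {"best_path": 120, "rib": 150, "fib": 210},
    "R2": {"best_path": 140, "rib": 160, "fib": 190},
}


def network_convergence_ms(per_router: Dict[str, Dict[str, int]], stage: str) -> int:
    """Return the time at which the slowest router completed the given stage."""
    return max(times[stage] for times in per_router.values())


for stage in ("best_path", "rib", "fib"):
    print(stage, network_convergence_ms(TIMESTAMPS, stage))
```

The same set of routers yields three different convergence times, which is why a test needs to state which stage it is measuring.
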

Each test designed to measure the performance of routing protocols within a network must determine which type of convergence is being measured, whether that measurement is appropriate for the information being gathered, and which test will actually measure the desired type of convergence.

3. Polling Devices in a Network

One common way to measure network convergence is to poll the devices in the network, using some command supplied within the routing software, to determine when particular events have occurred, or particular pieces of information have reached all the routers in the network. Polling eliminates the need for the clocks of the devices within the network to be synchronized for the test to have meaningful results. However, there are some issues which need to be addressed in any test which polls devices for this information; the first is the rate at which polling takes place.

If, in a test, you are attempting to measure some parameter to within one second of its occurrence, then you need to poll at a rate much higher than once per second.

     test starts here
     |
     |                   event occurs here
     |                   |
     v                   v
   -+----------+----------+----------+---
    ^          ^          ^          ^
    |          |          |          |
    0 seconds  1 second   2 seconds  3 seconds

For instance, in this time line, suppose a poll takes place every second. The test is started just after a poll takes place, but the polling process doesn't recognize the test as starting until the 1 second poll. An event occurs just before the 2 second poll, and the polling process detects this at the 2 second poll. The polling process would indicate that one second has elapsed from the start of the test until the event. In reality, closer to two seconds have elapsed.
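
The time line above can be reproduced with a few lines of arithmetic. The following Python sketch assumes polls occur at exact multiples of the polling interval and that an occurrence is only noticed at the first poll at or after it; the function names and the millisecond values are illustrative assumptions, not part of any test described here.

```python
def next_poll_ms(t_ms: int, interval_ms: int) -> int:
    """Time of the first poll at or after t_ms, with polls at 0, interval, 2*interval, ..."""
    return -(-t_ms // interval_ms) * interval_ms  # ceiling division on integers


def observed_duration_ms(start_ms: int, end_ms: int, interval_ms: int) -> int:
    """Duration the polling process would report between the start and the event."""
    return next_poll_ms(end_ms, interval_ms) - next_poll_ms(start_ms, interval_ms)


# The test starts 1 ms after the 0 second poll; the event occurs 1 ms
# before the 2 second poll, so the true duration is 1998 ms.
start_ms, end_ms = 1, 1999
for interval_ms in (1000, 100):
    seen = observed_duration_ms(start_ms, end_ms, interval_ms)
    print(interval_ms, seen, (end_ms - start_ms) - seen)
```

With a one second interval the reported duration is 1000 ms against a true 1998 ms; polling every 100 ms reduces the error to under one polling interval.
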

The interval of the polling process can be reduced until the measurement is felt to be accurate, but it should be no more than half of the desired accuracy. Common practice suggests that it should be about one tenth of the desired accuracy.

A second consideration when polling for network events is the performance of the device running the polling process. If the process cannot poll each device at the scheduled interval, or the polling is "jittered" (the time between each actual poll varies by some amount), the accuracy of the tests will be called into question. The amount of jitter introduced by the polling device, and the rate at which the device can effectively poll, should be measured in some way, and this measurement should be taken into account when designing tests which rely on polling.

Finally, when polling devices to determine when a network event occurs, issues with serialization must be considered. Most devices which would be used for polling will not be able to poll several devices within the network at once, and will thus serialize the polling of devices.

    p1  p3  p5  p7  p9
    | p2| p4| p6| p8| p10
    | | | | | | | | | |
    v v v v v v v v v v
   -+----------+----------+----------+---
    ^          ^          ^          ^
    |          |          |          |
    0 seconds  1 second   2 seconds  3 seconds

Suppose, for instance, that a single device is polling ten devices in the network. If it can poll five devices per second, it will take a full two seconds for it to detect any event on all ten devices, giving an effective accuracy of about four seconds (twice the two second polling cycle, by the rule given above). The amount of time required for a polling device to serialize through all the devices it is polling needs to be considered when polling a very large number of devices.

4. Tests to Measure Routing Protocol Convergence

In this section, we outline some of the various tests which have been used in the past to measure routing protocol convergence within a network, and discuss some aspects of these tests.

4.1. Determining When Each Device has Received Information About All Reachable Destinations

In link state protocols, information is flooded throughout the network; discovering when each router in the network has received this information is an important consideration in network convergence. Slower flooding times will, of course, mean slower network convergence overall; thus flooding performance directly impacts overall routing protocol performance in the network. There are three methods which can be used to determine when the flooding of information has been completed.

4.1.1. Black Box Polling

The test may poll each device using methods specific to the protocol to determine information receipt without using output from the devices themselves. [OSPF-BENCH] describes such methods for the OSPF protocol. The limitations on polling devices described above should be taken into consideration when using this method.

4.1.2. White Box Output

Each device may provide some output which notes when certain pieces of information were received by the device, or some event driven notification or logging which notes when information is received by the device. To use this method, the clocks of all devices within the network must be synchronized. If devices are polled to gather this output, the limitations on polling devices described above should be taken into consideration.

4.1.3. External Packet Monitoring

External devices which can record packets traversing the network may be placed in the network, and the packet flow examined to determine when the flooded information has been transmitted to all devices.
This technique needs to take into account the clocks of such capturing devices (they should be synchronized for effective measurement), and the flooding patterns of the routing protocol being measured. For instance, if acknowledgements are used within the protocol, then the tester needs to determine whether the flooded information or these acknowledgements will be used to indicate successful flooding of the information. If the acknowledgements are used, the test results will include not only flooding time, but also the time required to process and acknowledge flooded packets.

Each of these methods has certain advantages and disadvantages; some combination of the three would probably provide the most accurate results.

4.2. Determining When Each Device has Finished Finding the Best Path to Each Reachable Destination

This is probably the most difficult measurement to take in a network, since there are no known black box ways of determining when a device has finished computing the best path to each destination in the network. The only possible way of measuring this time is to use output from the devices in the network to provide this information.

It's possible to poll each device periodically, examining output provided by the devices, to determine when each device has calculated the best path to each destination in the network. This method is subject to the limitations described in the section on polling devices, above.

It's also possible to rely on some event driven output of each device in the network. For this to yield accurate results, the clocks of all the devices in the network must be closely synchronized.

4.3. Passing Traffic Through the Network to Determine Convergence

One of the most widely used tests for determining network convergence is starting some traffic stream at one end of a network, disrupting or completing the network, and determining how long the traffic stream is either not delivered, or takes to be delivered. For instance:

   Source----R1----R2----Sink

A traffic stream is generated on Source, and the link between R1 and R2 is connected in some way. The time between the connection of this link and the arrival of the traffic at the Sink is measured as network convergence.

This type of test is extremely useful in testing the real response of a network to changing conditions. There are some considerations which should be examined when using this sort of test, or examining the results of this sort of test.

4.3.1. The Various Elements of Performance Cannot Be Separated

Using this sort of testing, there is no way to separate the performance of a routing protocol from the performance of the interaction between the routing protocol and the forwarding engine, nor from the performance of the forwarding engine itself. In many tests, this is acceptable, since these are all elements of the network in total, but if specific elements of routing protocol performance are being measured, such tests can be problematic when attempting to analyse the results.

4.3.2. The Total Convergence of the Network May Not Be Measured

Consider the following topology:

   Source-----R1----R2-----Sink
              |     |
              R3    R4
              |     |
              R5----R6

Suppose a traffic stream is sourced from Source, and then all the devices in the network are brought up (R1 through R6). The time from device startup to the traffic stream reaching the Sink is measured as network convergence.
As soon as the path Source, R1, R2, Sink converges, the Sink will begin receiving traffic, and the network will be considered to be converged by the test. However, without polling the remaining routers, R3 through R6, there is no way to know if those routers have also converged on the best path to the Sink and the Source. While this example may be considered extreme, there are many complex topologies where:

o  The path chosen by the traffic stream may not be the path expected.

o  The path chosen by the traffic stream may switch during network convergence, with the stream taking some secondary path at first, and then successively better paths converging over the life of the test.

o  The path chosen by the traffic stream switches so quickly that no traffic is lost, while the routing protocols still take some time to converge.

Tests which rely on traffic passing through the network to determine network convergence times should thoroughly examine the way in which the test topology converges, and examine the consistency of that convergence, with enough test runs to get a good feel for the range of possible results. Examining the same test sequence with slight changes in the network topology may help to provide an understanding of how the network under test converges, and may also provide more insight into the factors impacting convergence in the test network.

It's also possible that if the test network does not converge completely for some time after the test traffic successfully passes through the topology, the continuing convergence could impact the results of a second test run, if the test runs are placed too closely together. If a first test is run, and a second test is started immediately on traffic making it through the test topology, the results of the second test may be skewed by convergence which is still taking place from the first test run.
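
One way a tester might turn a received traffic stream into a number is to look for the longest run of missing packets. The Python sketch below assumes the Source stamps each packet with a consecutive sequence number and sends at a fixed interval; both assumptions, and the function name, are illustrative and not part of any test described in this document.

```python
from typing import Sequence


def longest_outage_ms(received_seq: Sequence[int], send_interval_ms: int) -> int:
    """Estimate the longest outage from the largest run of missing sequence numbers.

    received_seq holds the sequence numbers seen at the Sink, in arrival
    order; the packets are assumed to have been sent one every
    send_interval_ms with consecutive sequence numbers.
    """
    longest_missing = 0
    for prev, cur in zip(received_seq, received_seq[1:]):
        longest_missing = max(longest_missing, cur - prev - 1)
    return longest_missing * send_interval_ms


# Hypothetical capture: packets 4 through 53 never arrived, sent 10 ms apart.
print(longest_outage_ms([1, 2, 3, 54, 55], 10))

# Hypothetical capture with no loss at all.
print(longest_outage_ms([1, 2, 3, 4, 5], 10))
```

A result of zero only shows that no packets were lost on the path the traffic took; as noted in the list above, the routing protocols may still be converging elsewhere in the topology.
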

These are important considerations which should be noted when examining or performing tests which rely on the presence of a data stream within a routing system to measure convergence.

5. References

[OSPF-BENCH]  Manral, V., "Benchmarking Methodology for Basic OSPF Convergence", draft-bmwg-ospfconv-intraarea, May 2002

6. Authors' Addresses

Russ White
Cisco Systems, Inc.
7025 Kit Creek Rd.
Research Triangle Park, NC 27709
riw@cisco.com

Vishwas Manral
Netplane Systems
189 Prashasan Nagar, Road number 72
Jubilee Hills, Hyderabad
vmanral@netplane.com

Robert Adams
Cisco Systems, Inc.
7025 Kit Creek Rd.
Research Triangle Park, NC 27709
robeadam@cisco.com