Network Working Group                                     Sumit Khurana
INTERNET-DRAFT                                         Shobha Erramilli
Expires January 2001                                       Tony Bogovic
                                                Telcordia Technologies
                                                              July 2000

          Benchmarking Methodology for Devices Implementing
                       Differentiated Services
                 draft-khurana-bmwg-diffservm-00.txt

1. Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft shadow directories can be accessed at http://www.ietf.org/shadow.html

This memo provides information for the Internet community. This memo does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

2. Abstract

This document discusses and defines tests for the performance characterization of network interconnect devices implementing support for Differentiated Services. It explains the metrics to be used for performance evaluation and the methodology for determining them, as well as the format for reporting the results of the tests. It builds upon the test methodology described in RFC 2544.

3. Introduction

Performance evaluation of network equipment is important for network service providers in order to provision networks that satisfy the requirements of their customers. From a vendor's perspective, well-defined benchmarks with clearly stated metrics and performance comparison criteria help enumerate the strengths and weaknesses of their products in the marketplace.

Of late there has been a demand to provide services such as Virtual Private Networks and voice over IP networks, which may require that Internet Service Providers offer guaranteed performance levels to their customers. For example, VPNs may require low end-to-end loss, and voice can be supported only if the end-to-end delay is bounded. Performance evaluation of the networking equipment that comprises a provider's network becomes even more critical in this context, as providers must assure their customers that they can meet the requirements of mission-critical or real-time applications.

The IETF has proposed a Differentiated Services architecture[1] for providing scalable service discrimination in the Internet. Packets are marked differently to create several packet classes. Packets belonging to different classes are given different forwarding treatments in the network. Individual micro-flows at the edge of the network are classified into one of several service classes (such as gold, silver and bronze; the Olympic services) based on the analysis of one or more fields (multi-field (MF) classification) in the IP header, such as source address, destination address, port numbers, protocol ID, etc. The DS field in the IP header is marked with distinct codepoints to identify the different classes.
Routers in the core base the treatment that these packets receive on this marking, by carrying out a behavior aggregate classification which enables the packets to be placed in different queues. Routers then implement a set of buffer management and packet scheduling mechanisms on these queues to permit per-hop service differentiation between traffic classes. A service provisioning policy can be engineered to control the amount of traffic associated with each class of service. A service level agreement (SLA) is assumed to exist between the customer and the ISP. Traffic conforming to the SLA (in-profile) and traffic exceeding the SLA (e.g. packets arriving at a rate higher than the agreed-upon bandwidth) are treated differently. Classifiers, markers, and policers or shapers operate at the edge of the network to enforce the service provisioning policy.

4. Motivation

Conceptually, the functionality that a router provides consists of two distinct blocks: the packet forwarding part and the routing part. The packet forwarding engine determines how packets are queued and scheduled for service. The routing engine determines the next-hop route that the packets take when they are serviced. The packet forwarding functionality changes in routers implementing differentiated services, as compared to routers handling only best effort traffic. The routing engine is not impacted as a result of implementing differentiated services.

The logical components of the forwarding engine are the classifier, meter, marker, dropper, queue and scheduler. Together, these serve to condition the traffic to conform to the service level agreements that are expected of the router implementing the services. For example, all traffic exceeding a certain traffic profile is dropped, whereas all traffic that is in profile gets a configured departure rate, and thus end-to-end delay bounds are provided for the customer's application.

In terms of performance goals, it is important to characterize how well the devices implement the forwarding engine to meet service expectations. Diffserv provides the building blocks for services, but does not define the services themselves. Performance evaluation of a device can help determine which services can be supported or provided using that router. In terms of router implementations, one wants to characterize the efficacy of the scheduling and buffer management algorithms in providing the services, without necessarily delving into the precise details of what these algorithms are. Some specific goals are to:

a) Determine the effectiveness of the device in providing different services to traffic belonging to different classes.

b) Test whether enabling QoS features imposes an excessive performance penalty on the Device Under Test.

c) Test whether lower priority classes starve in the presence of congestion.

d) If one class underutilizes its share, determine how the excess is distributed.

This document defines a set of tests for characterizing the performance of routers implementing differentiated services. The tests make no assumptions about the specific algorithms that the devices use to implement differentiated services. Instead, the test results should be evaluated in terms of the services that are expected from the device. RFC 2544[5] defines a set of tests for benchmarking network interconnect devices considering only one class of traffic.
This document extends the test methodology explained in RFC 2544 to provide per-class distinction in the metrics obtained. In terms of benchmarking device performance, the RFC 2544 tests help characterize the aggregate and best effort forwarding behaviors of the device, whereas this document extends the RFC 2544 tests to characterize the efficiency of the packet forwarding behavior of the device (that is, its scheduling, buffer management, policing and shaping capabilities) in providing service differentiation.

The outline of the rest of this document is as follows: Section 5 provides an overview of the general approach. Section 6 explains the modifications needed to the tests stated in RFC 2544 in order to be able to test differentiated services. Section 7 is a list of recommended tests. RFC 2544, RFC 1242[6] and RFC 2475[1] should be consulted before attempting to use this document.

5. Background

5.1 Device Configuration

The Differentiated Services framework does not define service definitions or the specific mechanisms or algorithms through which per-hop behaviors are implemented. The test procedures and analysis must therefore also be independent of the specific mechanisms that routers implement for scheduling or buffer management. Care must, however, be taken that equivalent features are enabled on the Devices Under Test, so that test results are comparable. This is especially important for benchmarking when routers A and B are being evaluated relative to each other. Since the standard does not mandate truly equivalent features on devices, it is recommended that the features or mechanisms which best satisfy a given service definition be enabled on the routers. If the same algorithms exist on both routers, those should be enabled. If not, configurations that serve to satisfy the service requirement should be enabled.

A service definition can be expressed in terms of the loss rate, latency or throughput expected for given code-points. The expectations could be expressed in several different ways. For example, for a given class:

a) As a fixed ratio relative to another class (e.g. Throughput(X2) = 1.5 * Throughput(X1); configure classification and weights for weighted fair queuing such that this condition is satisfied),

b) As an absolute value (e.g. Throughput(X1) = 100 frames/sec),

c) Relative to other classes (e.g. latency(X2) < latency(X1); configure policing using the leaky bucket mechanism such that out-of-profile traffic for X2 is dropped, and map X2 to the implementation of the EF[3] PHB), or

d) In some other consistent, user-defined manner that is relevant.

The configuration used should be stated along with the test results.

5.2 Test Parameters

The test configuration is as described in RFC 2544; i.e., a tester may be used to generate traffic and to receive traffic after it passes through the device under test. In particular, sections 6 through 25 of RFC 2544 are directly applicable for the test input parameters. For example, the frame size values used should be consistent with the RFC 2544 tests. The additional capability required of the tester is the ability to generate traffic belonging to different classes, i.e. the ability to mark the DSCP field of the generated IP packets differently.
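As an illustration only, the following minimal Python sketch shows one way a software-based tester could mark the DS field of generated UDP test frames. The codepoint values, addresses and port number are hypothetical examples rather than part of this methodology, and the sketch assumes a platform that exposes the IP_TOS socket option.

   import socket

   # Example class selector codepoints referred to elsewhere in this
   # document (binary 111000, 110000 and 000000).
   DSCP_CS7 = 0b111000   # highest class selector codepoint
   DSCP_CS6 = 0b110000
   DSCP_BE  = 0b000000   # best effort

   def send_marked_frame(dst_addr, dst_port, dscp, payload):
       # The DSCP occupies the upper six bits of the former TOS octet,
       # so the value handed to IP_TOS is the codepoint shifted left by 2.
       sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
       sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
       sock.sendto(payload, (dst_addr, dst_port))
       sock.close()

   # Interleave frames of two classes toward a hypothetical DUT address.
   for seq in range(10):
       send_marked_frame("192.0.2.1", 9000, DSCP_CS7, b"cs7-" + str(seq).encode())
       send_marked_frame("192.0.2.1", 9000, DSCP_BE, b"be-" + str(seq).encode())

On the receive side, recovering the DS byte of each arriving frame (for example with a raw socket or a capture library) allows the per-codepoint demultiplexing described next.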
The receiver on the tester should be able to demultiplex packets based on the codepoints in order to determine metrics on a per-codepoint basis. Most of the terminology used is consistent with RFC 1242. Additional terminology for delay jitter is defined in section 5.4.

5.3 Test Result Evaluation

Data obtained as a result of carrying out the listed tests should be evaluated with regard to the environment in which the device would be deployed and based on the most important service differentiation criteria expected of it. For example, does a device configured to provide guaranteed bandwidth to traffic with code-point x also ensure that the latency for traffic with code-point x is less than a certain maximum? Some suggested criteria for evaluating the test results are:

a) Conformance to standards. For example, are packets belonging to an AF[4] code-point mis-ordered?

b) Service expectations. Is the device able to satisfy the service expectations?

c) Metrics obtained for one device relative to another device.

These criteria may be interpreted and/or enhanced in the context of the environment in which the equipment is to be deployed.

5.4 Additional Terminology

The tests described in this document are consistent with the definitions of throughput, latency, frame loss rate and the ability to handle back-to-back frames as described in RFC 1242. However, they help determine these metrics separately for each class being tested. In addition, a metric and corresponding methodology for delay jitter are defined.

5.4.1 Definition of Delay Jitter

Definition:
The instantaneous delay jitter of a frame is defined as the difference between the latency of that frame and the latency of the preceding frame in the stream.

Discussion:
Delay variation is an important consideration for supporting delay sensitive services. For example, when a real-time voice application (e.g. telephony) receives voice packets over an IP network, the packets are played out as they are received. However, if a packet arrives while previous packets are still being played out, the new packet must be buffered until it is time to play it out. Similarly, if a packet arrives too late, there is a gap in the play-out. Delay jitter thus has important implications for what services can be supported and for the buffering required in order to support them. In the context of a device implementing Diffserv, it is important to measure the efficacy of the device in scheduling succeeding packets of a given class such that the instantaneous packet delay variation is bounded.

Measurement Units:
Time, with units fine enough to distinguish two events.

6. Benchmarking Test Methodology

This section describes a set of tests for determining throughput, latency, delay jitter, frame loss rate and the ability to handle back-to-back frames, given a set of codepoints as input. The tests described in section 7 recommend a minimal set of codepoints to exercise using these generic tests. For the sake of uniformity, the tests are enumerated in the same format as used in RFC 2544.

6.1 Throughput Test

Objective:
To determine the throughput provided by the DUT when QoS parameters are enabled, and to determine the relative share of bandwidth for each class being tested.

Procedure:
Input the code-points that are going to be exercised. Input the service expectation, for example, the allocation of relative bandwidth between different classes.
Configure the device under test to meet this service expectation. For each codepoint, send a specific number of frames at a rate equal to or higher than the input link capacity through the DUT and count the frames that are transmitted by the DUT. There will be losses. For the codepoint with the highest expected service (the codepoint with the highest numeric value for the class selector PHB[2]), reduce the offered rate and rerun the test until no frames are lost, i.e. the count of received frames for the class is equal to the count of offered frames for the class. The throughput for the class is the fastest rate at which the count of test frames transmitted by the DUT for the class is equal to the number of test frames sent to it by the test equipment. With the higher expected service codepoints being transmitted at their throughput rate, repeat the test for the next highest priority class to determine its throughput, until all code-points being tested are exhausted. The output metric from the test is the set of (code-point, throughput) pairs. Repeat the test with different frame sizes.

Reporting Format:
The result should be reported as two graphs. On the first graph the x-coordinate should show the frame size and the y-coordinate should show the throughput in frames per second. A line representing the theoretical throughput for each class may be plotted if an expected allocation of bandwidth was specified for the test. A line should be plotted for the actual throughput of each class. The total throughput obtained across all classes (the sum of the throughput of all the classes being tested) should also be plotted. On the second graph the x-coordinate should show the frame size and the y-coordinate should show the percentage of the total throughput obtained across all classes. Lines should be plotted for the actual throughput of each class. Lines may also be plotted for the expected throughput. The first graph shows the utilization when QoS is activated, giving an indication of the performance penalty, if any, when QoS is activated. The second graph shows the relative allocation of throughput for each class and allows comparison with the expected relative values.

6.2 Latency Test

Objective:
Determine the latency experienced by frames in each class.

Procedure:
Specify the code-points and service expectations, if any, to test with. Configure the device to meet the service expectations. Determine the throughput for the code-points, with the device correctly configured, for the listed frame size. Send a stream of frames of a particular size through the DUT at the determined throughput rate to a specific destination for each defined code-point, simultaneously. Tag one packet of each class at the halfway point in the time duration of the test, as defined in the latency test in RFC 2544. Determine the latency for each class as the time difference between the tagged packet being transmitted and received, as in RFC 2544. Repeat the test at least 20 times. The value reported is the average obtained over the 20 trials. The test duration must be at least 120 seconds, with the packet whose latency is to be measured being tagged after 60 seconds. The test should be repeated with the tagged packet being part of the same flow and with the tagged packet destined to a different address.
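As a minimal sketch only, assuming the tester records transmit and receive timestamps for the tagged frame of each class in every trial, the per-class average latency over the trials could be computed as follows in Python; the function and variable names are illustrative and not part of any standard tester interface.

   def average_latency(trials):
       # trials: a list with one entry per trial (at least 20, per the
       # procedure above); each entry maps a codepoint to the
       # (tx_time, rx_time) pair recorded for its tagged frame.
       # Returns a dict mapping codepoint -> average latency in seconds.
       sums = {}
       counts = {}
       for trial in trials:
           for codepoint, (tx_time, rx_time) in trial.items():
               sums[codepoint] = sums.get(codepoint, 0.0) + (rx_time - tx_time)
               counts[codepoint] = counts.get(codepoint, 0) + 1
       return {cp: sums[cp] / counts[cp] for cp in sums}

The per-class averages produced this way are the values entered in the latency table described next.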
Reporting Format:
Latency results should be reported in the format of a table with a row for each frame size, describing the load on each class and the latency observed for each class. The units used for reporting latency should be the maximum resolution that the test equipment can measure. In addition, a graph may be plotted which shows frame size on the x-axis and latency on the y-axis for traffic corresponding to each code-point that is being tested.

6.3 Frame Loss Test

Objective:
To determine the frame loss rate of the DUT over a range of input rates and frame sizes for each class.

Procedure:
Carry out the generic throughput test for the code-points being tested. Offer traffic at the maximum input line rate with the load composed of frames in the ratio determined by the generic throughput test. For example, if code-points x1, x2 and x3 are being tested and the throughput values are in the ratio 1:2:3, offer x1 at 1/6 * line rate, x2 at 2/6 * line rate and x3 at 3/6 * line rate. If the count of received frames is less than the count of offered frames, there are losses. Compute the frame loss ratio for each class. If there are losses, reduce the overall load by 10% for the next trial; i.e. the offered loads for x1, x2 and x3 are now 1/6 * 0.9 * line rate, 2/6 * 0.9 * line rate and 3/6 * 0.9 * line rate. Continue reducing the load by 10% on each subsequent trial until two trials are obtained in which there are no losses. Repeat for different frame sizes. If possible, the load should be decreased at a finer granularity than 10%.

The loss ratio for a class is defined as

   (input_count_for_class - output_count_for_class) / input_count_for_class

Reporting Format:
The result should be reported as a table. The table should have columns for the frame size, the total offered load and the offered load for each class. The offered load should be reported as a fraction of the line rate. There should be columns for the overall loss ratio and the loss ratio for each class. The loss ratio is defined as the fraction of input packets lost, as above.

6.4 Back-to-Back Frames

Objective:
To determine the ability of the DUT to handle back-to-back frames of each class, and to determine whether the classes are forwarded independently.

Procedure:
For the code-points under test, send alternating back-to-back frames with the minimum inter-frame gap. If the count of transmitted frames is equal to the number of frames forwarded, the length of the burst is increased and the test is rerun. The back-to-back value for each class is the longest burst that the DUT will handle without the loss of any frame. The reported values should be averaged over at least 50 trials. The trials should be of at least 2 seconds duration. The test should be repeated with different frame lengths.

Reporting Format:
A table with columns for the burst length of each code-point should be reported. Values should be reported for each frame size tested.

6.5 Delay Jitter

Objective:
To determine delay jitter.

Procedure:
The procedure for determining delay jitter is similar to the procedure for measuring latency explained in section 6.2. Traffic for each class is transmitted at the throughput rate. Instead of one packet being marked for each class in the middle of the test, two packets are marked. The difference in the latencies of the two marked packets of each class determines the delay jitter.
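A minimal sketch, under the same assumption as the latency sketch above (the tester records transmit and receive timestamps, here for the two frames tagged mid-test in each class); the names are illustrative only.

   def delay_jitter(tagged):
       # tagged: dict mapping codepoint -> ((tx1, rx1), (tx2, rx2)),
       # the timestamps of the two frames tagged in the middle of the
       # test for that class.  Returns codepoint -> instantaneous delay
       # jitter in seconds, i.e. the latency of the second tagged frame
       # minus the latency of the first, per the definition in 5.4.1.
       jitter = {}
       for codepoint, ((tx1, rx1), (tx2, rx2)) in tagged.items():
           jitter[codepoint] = (rx2 - tx2) - (rx1 - tx1)
       return jitter

Note that the difference can be negative if the second tagged frame experiences less delay than the first.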
Reporting Format:
Delay jitter results should be reported in the format of a table with a row for each frame size, describing the load on each class and the delay variation observed for each class. The units used for reporting delay jitter should be the maximum resolution that the test equipment can measure. In addition, a graph may be plotted which shows frame size on the x-axis and the delay variation on the y-axis for traffic corresponding to each codepoint that is being tested.

7. Benchmarking Tests

A minimal set of recommended code-points with which to carry out the generic tests is enumerated in this section. These tests are grouped together by the objective of the test. There may be an overlap in the input code-point values if tests in multiple groups are carried out. In this case, the tests need not be repeated; however, the metrics obtained for the input code-points from the previous test group should be reported. Other combinations of code-points beyond the minimal set may be tested, depending on the services that are expected from the device.

7.1 Test for Each Code-point

Objective:
Determine the performance penalty associated with enabling QoS for each code-point. At a minimum, the test should be carried out for the code-point with the highest expected performance and for best effort traffic (000000).

Procedure:
Carry out the throughput test explained in section 6.1 with all of the input traffic stream belonging to a single codepoint. Repeat the test for each codepoint, with and without QoS features enabled on the DUT. Carry out the generic latency and jitter tests at the input rates obtained for each code-point. Repeat the tests for each codepoint, with and without QoS features enabled on the DUT. Carry out the frame loss test for the codepoint being tested. Determine the back-to-back value for the code-point under test.

Reporting Format:
Report the throughput result as the graphs in the generic throughput test. The graphs show the actual and expected throughput values, if any, for the code-point. Graphs should be plotted for both QoS enabled and QoS disabled. If possible, the graphs may be overlaid. Latency, frame loss and back-to-back values should be reported as tables for different frame sizes.

7.2 Test for Two Code-points

Objective:
Determine the relative service differentiation associated with different pairs of code-points. At a minimum the following pairs should be tested:

   111000 and 000000 (Best Effort)
   110000 and 000000 (Best Effort)

This test ensures that best effort traffic is not starved and ensures compliance with the class selector PHB standard, i.e. that 111000 and 110000 (the TOS precedence values historically used for routing traffic) are given preferential treatment[2] over best effort traffic.

Procedure:
For traffic streams corresponding to two user-defined distinct code-points, carry out the generic throughput test. Carry out the generic latency, jitter, frame loss and back-to-back tests.

Reporting Format:
Plot graphs as for the generic throughput test. Results for the latency, jitter, frame loss rate and back-to-back tests are reported as tables, as explained in section 6.

7.3 Test to Study the Effect of Other Classes on a Specific Class

Objective:
Study the effect of traffic from other classes on a given class having code-point x. Determine whether x uses the excess bandwidth available if other classes are inactive. Ensure that class x is not starved.
At a minimum, the test should be carried out with x = 000000.

Procedure:
Input the codepoint of interest, x. Input a list y of other code-points against which the metrics obtained for x are to be compared. If feasible, the test should be carried out with all codepoints of interest as members of the set y. Carry out the generic throughput, frame loss and back-to-back tests with each combination comprising x and elements of y; i.e., if x and y1, y2, y3 are to be tested, carry out the tests for all combinations of x, y1, y2 and y3, with x always enabled and y1, y2, y3 either active or inactive, for a total of 8 test cases.

Reporting Format:
The results should be reported as a table. There should be a column for each codepoint stating whether the code-point was enabled for the test. Another column should state the throughput obtained for the codepoint as a percentage of the total. The frame size should be stated in another column. Delay values should be reported similarly, with the table above containing additional columns for the latency values obtained for each codepoint. The tables for frame loss and back-to-back values are reported as for the generic tests for these metrics.

8. Security Considerations

Security issues are not addressed in this document.

9. Acknowledgements

Thanks are due to Susan Thomson, Raghavan Kalkunte and Tong Zhang for discussions related to the contents of this draft.

10. References

[1] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Weiss, "An Architecture for Differentiated Services", IETF RFC 2475, December 1998.

[2] K. Nichols, S. Blake, F. Baker, D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", IETF RFC 2474, December 1998.

[3] V. Jacobson, K. Nichols, K. Poduri, "An Expedited Forwarding PHB", IETF RFC 2598, June 1999.

[4] J. Heinanen, F. Baker, W. Weiss, J. Wroclawski, "Assured Forwarding PHB Group", IETF RFC 2597, June 1999.

[5] S. Bradner, J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", IETF RFC 2544, March 1999.

[6] S. Bradner, "Benchmarking Terminology for Network Interconnection Devices", IETF RFC 1242, July 1991.

11. Authors' Addresses

Sumit Khurana
445 South Street, MCC 1G233B, Morristown, NJ 07960
Email: sumit@research.telcordia.com

Shobha Erramilli
331 Newman Springs Road, NVC 3X173, Red Bank, NJ 07701
Email: shobha@research.telcordia.com

Tony Bogovic
445 South Street, MCC 1A264B, Morristown, NJ 07960
Email: tjb@research.telcordia.com

Expires: January 2001