Network Working Group                                     Sumit Khurana
INTERNET-DRAFT                                         Shobha Erramilli
Expires January 2001                                       Tony Bogovic
                                                Telcordia Technologies
                                                              July 2000

          Benchmarking Methodology for Devices Implementing
                       Differentiated Services
                 draft-khurana-bmwg-diffservm-00.txt

1. Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft shadow directories can be accessed at http://www.ietf.org/shadow.html

This memo provides information for the Internet community. This memo does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

2. Abstract

This document discusses and defines tests for the performance characterization of network interconnect devices implementing support for Differentiated Services. It explains the metrics to be used for performance evaluation and the methodology for determining them, as well as the format for reporting the results of the tests. It builds upon the test methodology described in RFC 2544.

3. Introduction

Performance evaluation of network equipment is important for network service providers in order to provision networks that satisfy the requirements of their customers. From a vendor's perspective, well-defined benchmarks with clearly stated metrics and performance comparison criteria help enumerate the strengths and weaknesses of their products in the marketplace.

Of late there has been a demand to provide services such as Virtual Private Networks and voice over IP networks, which may require that Internet Service Providers offer guaranteed performance levels to their customers. For example, VPNs may require low end-to-end loss, and voice can be supported only if the end-to-end delay is bounded. Performance evaluation of the networking equipment that comprises a provider's network becomes even more critical in this context, as providers must assure their customers that they can meet the requirements of mission-critical or real-time applications.

The IETF has proposed a Differentiated Services architecture[1] for providing scalable service discrimination in the Internet. Packets are marked differently to create several packet classes. Packets belonging to different classes are given different forwarding treatments in the network. Individual micro-flows at the edge of the network are classified into one of several service classes (such as gold, silver and bronze; the Olympic services) based on the analysis of one or more fields (multi-field (MF) classification) in the IP header, such as source address, destination address, port numbers, protocol ID, etc. The DS field in the IP header is marked with distinct codepoints to identify the different classes.
Routers in the core base the treatment that these packets receive on this marking, by carrying out a behavior aggregate classification which enables the packets to be placed in different queues. Routers then implement a set of buffer management and packet scheduling mechanisms on these queues to permit per-hop service differentiation between traffic classes. A service provisioning policy can be engineered to control the amount of traffic associated with each class of service. A service level agreement (SLA) is assumed to exist between the customer and the ISP. Traffic conforming to the SLA (in-profile) and traffic exceeding the SLA (e.g. packets arriving at a rate higher than the agreed-upon bandwidth) are treated differently. Classifiers, markers, and policers or shapers operate at the edge of the network to enforce the service provisioning policy.

4. Motivation

Conceptually, the functionality that a router provides consists of two distinct blocks: the packet forwarding part and the routing part. The packet forwarding engine determines how packets are queued and scheduled for service. The routing engine determines the next-hop route that the packets take when they are serviced. The packet forwarding functionality changes in routers implementing differentiated services, as compared to routers handling only best effort traffic. The routing engine is not impacted as a result of implementing differentiated services.

The logical components of the forwarding engine are the classifier, meter, marker, dropper, queue and scheduler. Together, these serve to condition the traffic to conform to the service level agreements that are expected of the router implementing the services. For example, all traffic exceeding a certain traffic profile is dropped, whereas all traffic that is in profile gets a configured departure rate, and thus end-to-end delay bounds are provided for the customer's application.

In terms of performance goals, it is important to characterize how well the devices implement the forwarding engine to meet service expectations. Diffserv provides the building blocks for services, but does not define the services themselves. Performance evaluation of a device can help determine which services can be supported or provided using that router. In terms of router implementations, one wants to characterize the efficacy of the scheduling and buffer management algorithms in providing the services, without necessarily delving into the precise details of what these algorithms are. Some specific goals are to:

a) Determine the effectiveness of the device in providing different services to traffic belonging to different classes.

b) Test whether enabling QoS features imposes an excessive performance penalty on the Device Under Test.

c) Test whether lower priority classes starve in the presence of congestion.

d) If one class underutilizes its share, determine how the excess is distributed.

This document defines a set of tests for characterizing the performance of routers implementing differentiated services. The tests make no assumptions about the specific algorithms that the devices use to implement differentiated services. Instead, the test results should be evaluated in terms of the services that are expected from the device. RFC 2544[5] defines a set of tests for benchmarking network interconnect devices considering only one class of traffic.
This document extends the test methodology explained in RFC 2544 to provide per-class distinction in the metrics obtained. In terms of benchmarking device performance, the RFC 2544 tests help characterize the aggregate and best effort forwarding behaviors of the device, whereas this document extends the RFC 2544 tests to characterize the efficiency of the packet forwarding behavior of the device (that is, its scheduling, buffer management, policing and shaping capabilities) in providing service differentiation.

The outline of the rest of this document is as follows: Section 5 provides an overview of the general approach. Section 6 explains the modifications needed to the tests stated in RFC 2544 in order to be able to test differentiated services. Section 7 is a list of recommended tests. RFC 2544, RFC 1242[6] and RFC 2475[1] should be consulted before attempting to use this document.

5. Background

5.1 Device Configuration

The Differentiated Services framework does not define service definitions or the specific mechanisms or algorithms through which per-hop behaviors are implemented. The test procedures and analysis must therefore also be independent of the specific mechanisms that routers implement for scheduling or buffer management. Care must, however, be taken that equivalent features are enabled on the Devices Under Test, so that test results are comparable. This is especially important for benchmarking when routers A and B are being evaluated relative to each other. Since the standard does not mandate truly equivalent features on devices, it is recommended that the features or mechanisms which best satisfy a given service definition be enabled on the routers. If the same algorithms exist on both routers, those should be enabled. If not, configurations that serve to satisfy the service requirement should be enabled.

A service definition can be expressed in terms of the loss rate, latency or throughput expected for given code-points. The expectations could be expressed in several different ways. For example, for a given class:

a) As a fixed ratio relative to another class (e.g. Throughput(X2) = 1.5 * Throughput(X1); configure classification and weights for weighted fair queuing such that this condition is satisfied),

b) As an absolute value (e.g. Throughput(X1) = 100 frames/sec),

c) Relative to other classes (e.g. latency(X2) < latency(X1); configure policing using the leaky bucket mechanism such that out-of-profile traffic for X2 is dropped, and map X2 to the implementation of the EF[3] PHB), or

d) In some other consistent, user-defined manner that is relevant.

The configuration used should be stated along with the test results.

5.2 Test Parameters

The test configuration is as described in RFC 2544; i.e., a tester may be used to generate traffic and to receive traffic after it passes through the device under test. In particular, sections 6 through 25 of RFC 2544 are directly applicable for the test input parameters. For example, the frame size values used should be consistent with the RFC 2544 tests. The additional capability required of the tester is the ability to generate traffic belonging to different classes, i.e. the ability to mark the DSCP field of the generated IP packets differently.
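As an illustration only, the following minimal Python sketch shows one way a software-based tester could mark the DS field of generated UDP test frames. The codepoint values, addresses and port number are hypothetical examples rather than part of this methodology, and the sketch assumes a platform that exposes the IP_TOS socket option.

   import socket

   # Example class selector codepoints referred to elsewhere in this
   # document (binary 111000, 110000 and 000000).
   DSCP_CS7 = 0b111000   # highest class selector codepoint
   DSCP_CS6 = 0b110000
   DSCP_BE  = 0b000000   # best effort

   def send_marked_frame(dst_addr, dst_port, dscp, payload):
       # The DSCP occupies the upper six bits of the former TOS octet,
       # so the value handed to IP_TOS is the codepoint shifted left by 2.
       sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
       sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
       sock.sendto(payload, (dst_addr, dst_port))
       sock.close()

   # Interleave frames of two classes toward a hypothetical DUT address.
   for seq in range(10):
       send_marked_frame("192.0.2.1", 9000, DSCP_CS7, b"cs7-" + str(seq).encode())
       send_marked_frame("192.0.2.1", 9000, DSCP_BE, b"be-" + str(seq).encode())

On the receive side, recovering the DS byte of each arriving frame (for example with a raw socket or a capture library) allows the per-codepoint demultiplexing described next.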
The receiver on the tester should be able to demultiplex packets based on the codepoints in order to determine metrics on a per-codepoint basis. Most of the terminology used is consistent with RFC 1242. Additional terminology for delay jitter is defined in section 5.4.

5.3 Test Result Evaluation

Data obtained as a result of carrying out the listed tests should be evaluated with regard to the environment in which the device would be deployed and based on the most important service differentiation criteria expected of it. For example, does a device configured to provide guaranteed bandwidth to traffic with code-point x also ensure that the latency for traffic with code-point x is less than a certain maximum? Some suggested criteria for evaluating the test results are:

a) Conformance to standards. For example, are packets belonging to an AF[4] code-point mis-ordered?

b) Service expectations. Is the device able to satisfy the service expectations?

c) Metrics obtained for one device relative to another device.

These criteria may be interpreted and/or enhanced in the context of the environment in which the equipment is to be deployed.

5.4 Additional Terminology

The tests described in this document are consistent with the definitions of throughput, latency, frame loss rate and the ability to handle back-to-back frames as described in RFC 1242. However, they help determine these metrics separately for each class being tested. In addition, a metric and corresponding methodology for delay jitter are defined.

5.4.1 Definition of Delay Jitter

Definition:
The instantaneous delay jitter of a frame is defined as the difference between the latency of that frame and the latency of the preceding frame in the stream.

Discussion:
Delay variation is an important consideration for supporting delay sensitive services. For example, when a real-time voice application (e.g. telephony) receives voice packets over an IP network, the packets are played out as they are received. However, if a packet arrives while previous packets are still being played out, the new packet must be buffered until it is time to play it out. Similarly, if a packet arrives too late, there is a gap in the play-out. Delay jitter thus has important implications for what services can be supported and for the buffering required in order to support them. In the context of a device implementing Diffserv, it is important to measure the efficacy of the device in scheduling succeeding packets of a given class such that the instantaneous packet delay variation is bounded.

Measurement Units:
Time, with units fine enough to distinguish two events.

6. Benchmarking Test Methodology

This section describes a set of tests for determining throughput, latency, delay jitter, frame loss rate and the ability to handle back-to-back frames, given a set of codepoints as input. The tests described in section 7 recommend a minimal set of codepoints to exercise using these generic tests. For the sake of uniformity, the tests are enumerated in the same format as used in RFC 2544.

6.1 Throughput Test

Objective:
To determine the throughput provided by the DUT when QoS parameters are enabled, and to determine the relative share of bandwidth for each class being tested.

Procedure:
Input the code-points that are going to be exercised. Input the service expectation, for example, the allocation of relative bandwidth between different classes.
Configure the device under test to meet this service expectation. For each codepoint, send a specific number of frames at a rate equal to or higher than the input link capacity through the DUT and count the frames that are transmitted by the DUT. There will be losses. For the codepoint with the highest expected service (the codepoint with the highest numeric value for the class selector PHB[2]), reduce the offered rate and rerun the test until no frames are lost, i.e. the count of received frames for the class is equal to the count of offered frames for the class. The throughput for the class is the fastest rate at which the count of test frames transmitted by the DUT for the class is equal to the number of test frames sent to it by the test equipment. With the higher expected service codepoints being transmitted at their throughput rate, repeat the test for the next highest priority class to determine its throughput, until all code-points being tested are exhausted. The output metric from the test is the set of (code-point, throughput) pairs. Repeat the test with different frame sizes.

Reporting Format:
The result should be reported as two graphs. On the first graph the x-coordinate should show the frame size and the y-coordinate should show the throughput in frames per second. A line representing the theoretical throughput for each class may be plotted if an expected allocation of bandwidth was specified for the test. A line should be plotted for the actual throughput of each class. The total throughput obtained across all classes (the sum of the throughput of all the classes being tested) should also be plotted. On the second graph the x-coordinate should show the frame size and the y-coordinate should show the percentage of the total throughput obtained across all classes. Lines should be plotted for the actual throughput of each class. Lines may also be plotted for the expected throughput. The first graph shows the utilization when QoS is activated, giving an indication of the performance penalty, if any, when QoS is activated. The second graph shows the relative allocation of throughput for each class and allows comparison with the expected relative values.

6.2 Latency Test

Objective:
Determine the latency experienced by frames in each class.

Procedure:
Specify the code-points and service expectations, if any, to test with. Configure the device to meet the service expectations. Determine the throughput for the code-points, with the device correctly configured, for the listed frame size. Send a stream of frames of a particular size through the DUT at the determined throughput rate to a specific destination for each defined code-point, simultaneously. Tag one packet of each class at the halfway point in the time duration of the test, as defined in the latency test in RFC 2544. Determine the latency for each class as the time difference between the tagged packet being transmitted and received, as in RFC 2544. Repeat the test at least 20 times. The value reported is the average obtained over the 20 trials. The test duration must be at least 120 seconds, with the packet whose latency is to be measured being tagged after 60 seconds. The test should be repeated with the tagged packet being part of the same flow and with the tagged packet destined to a different address.
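As a minimal sketch only, assuming the tester records transmit and receive timestamps for the tagged frame of each class in every trial, the per-class average latency over the trials could be computed as follows in Python; the function and variable names are illustrative and not part of any standard tester interface.

   def average_latency(trials):
       # trials: a list with one entry per trial (at least 20, per the
       # procedure above); each entry maps a codepoint to the
       # (tx_time, rx_time) pair recorded for its tagged frame.
       # Returns a dict mapping codepoint -> average latency in seconds.
       sums = {}
       counts = {}
       for trial in trials:
           for codepoint, (tx_time, rx_time) in trial.items():
               sums[codepoint] = sums.get(codepoint, 0.0) + (rx_time - tx_time)
               counts[codepoint] = counts.get(codepoint, 0) + 1
       return {cp: sums[cp] / counts[cp] for cp in sums}

The per-class averages produced this way are the values entered in the latency table described next.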
Reporting Format:
Latency results should be reported in the format of a table with a row for each frame size, describing the load on each class and the latency observed for each class. The units used for reporting latency should be the maximum resolution that the test equipment can measure. In addition, a graph may be plotted which shows frame size on the x-axis and latency on the y-axis for traffic corresponding to each code-point that is being tested.

6.3 Frame Loss Test

Objective:
To determine the frame loss rate of the DUT over a range of input rates and frame sizes for each class.

Procedure:
Carry out the generic throughput test for the code-points being tested. Offer traffic at the maximum input line rate with the load composed of frames in the ratio determined by the generic throughput test. For example, if code-points x1, x2 and x3 are being tested and the throughput values are in the ratio 1:2:3, offer x1 at 1/6 * line rate, x2 at 2/6 * line rate and x3 at 3/6 * line rate. If the count of received frames is less than the count of offered frames, there are losses. Compute the frame loss ratio for each class. If there are losses, reduce the overall load by 10% for the next trial; i.e. the offered loads for x1, x2 and x3 are now 1/6 * 0.9 * line rate, 2/6 * 0.9 * line rate and 3/6 * 0.9 * line rate. Continue reducing the load by 10% on each subsequent trial until two trials are obtained in which there are no losses. Repeat for different frame sizes. If possible, the load should be decreased at a finer granularity than 10%.

The loss ratio for a class is defined as

   (input_count_for_class - output_count_for_class) / input_count_for_class

Reporting Format:
The result should be reported as a table. The table should have columns for the frame size, the total offered load and the offered load for each class. The offered load should be reported as a fraction of the line rate. There should be columns for the overall loss ratio and the loss ratio for each class. The loss ratio is defined as the fraction of input packets lost, as above.

6.4 Back-to-Back Frames

Objective:
To determine the ability of the DUT to handle back-to-back frames of each class, and to determine whether the classes are forwarded independently.

Procedure:
For the code-points under test, send alternating back-to-back frames with the minimum inter-frame gap. If the count of transmitted frames is equal to the number of frames forwarded, the length of the burst is increased and the test is rerun. The back-to-back value for each class is the longest burst that the DUT will handle without the loss of any frame. The reported values should be averaged over at least 50 trials. The trials should be of at least 2 seconds duration. The test should be repeated with different frame lengths.

Reporting Format:
A table with columns for the burst length of each code-point should be reported. Values should be reported for each frame size tested.

6.5 Delay Jitter

Objective:
To determine delay jitter.

Procedure:
The procedure for determining delay jitter is similar to the procedure for measuring latency explained in section 6.2. Traffic for each class is transmitted at the throughput rate. Instead of one packet being marked for each class in the middle of the test, two packets are marked. The difference in the latencies of the two marked packets of each class determines the delay jitter.
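A minimal sketch, under the same assumption as the latency sketch above (the tester records transmit and receive timestamps, here for the two frames tagged mid-test in each class); the names are illustrative only.

   def delay_jitter(tagged):
       # tagged: dict mapping codepoint -> ((tx1, rx1), (tx2, rx2)),
       # the timestamps of the two frames tagged in the middle of the
       # test for that class.  Returns codepoint -> instantaneous delay
       # jitter in seconds, i.e. the latency of the second tagged frame
       # minus the latency of the first, per the definition in 5.4.1.
       jitter = {}
       for codepoint, ((tx1, rx1), (tx2, rx2)) in tagged.items():
           jitter[codepoint] = (rx2 - tx2) - (rx1 - tx1)
       return jitter

Note that the difference can be negative if the second tagged frame experiences less delay than the first.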
Reporting Format:
Delay jitter results should be reported in the format of a table with a row for each frame size, describing the load on each class and the delay variation observed for each class. The units used for reporting delay jitter should be the maximum resolution that the test equipment can measure. In addition, a graph may be plotted which shows frame size on the x-axis and the delay variation on the y-axis for traffic corresponding to each codepoint that is being tested.

7. Benchmarking Tests

A minimal set of recommended code-points with which to carry out the generic tests is enumerated in this section. These tests are grouped together by the objective of the test. There may be an overlap in the input code-point values if tests in multiple groups are carried out. In this case, the tests need not be repeated; however, the metrics obtained for the input code-points from the previous test group should be reported. Other combinations of code-points beyond the minimal set may be tested, depending on the services that are expected from the device.

7.1 Test for Each Code-point

Objective:
Determine the performance penalty associated with enabling QoS for each code-point. At a minimum, the test should be carried out for the code-point with the highest expected performance and for best effort traffic (000000).

Procedure:
Carry out the throughput test explained in section 6.1 with all of the input traffic stream belonging to a single codepoint. Repeat the test for each codepoint, with and without QoS features enabled on the DUT. Carry out the generic latency and jitter tests at the input rates obtained for each code-point. Repeat the tests for each codepoint, with and without QoS features enabled on the DUT. Carry out the frame loss test for the codepoint being tested. Determine the back-to-back value for the code-point under test.

Reporting Format:
Report the throughput result as the graphs in the generic throughput test. The graphs show the actual and expected throughput values, if any, for the code-point. Graphs should be plotted for both QoS enabled and QoS disabled. If possible, the graphs may be overlaid. Latency, frame loss and back-to-back values should be reported as tables for different frame sizes.

7.2 Test for Two Code-points

Objective:
Determine the relative service differentiation associated with different pairs of code-points. At a minimum the following pairs should be tested:

   111000 and 000000 (Best Effort)
   110000 and 000000 (Best Effort)

This test ensures that best effort traffic is not starved and ensures compliance with the class selector PHB standard, i.e. that 111000 and 110000 (the TOS precedence values historically used for routing traffic) are given preferential treatment[2] over best effort traffic.

Procedure:
For traffic streams corresponding to two user-defined distinct code-points, carry out the generic throughput test. Carry out the generic latency, jitter, frame loss and back-to-back tests.

Reporting Format:
Plot graphs as for the generic throughput test. Results for the latency, jitter, frame loss rate and back-to-back tests are reported as tables, as explained in section 6.

7.3 Test to Study the Effect of Other Classes on a Specific Class

Objective:
Study the effect of traffic from other classes on a given class having code-point x. Determine whether x uses the excess bandwidth available if other classes are inactive. Ensure that class x is not starved.
At a minimum, the test should be carried out with x = 000000.

Procedure:
Input the codepoint of interest, x. Input a list y of other code-points against which the metrics obtained for x are to be compared. If feasible, the test should be carried out with all codepoints of interest as members of the set y. Carry out the generic throughput, frame loss and back-to-back tests with each combination comprising x and elements of y; i.e., if x and y1, y2, y3 are to be tested, carry out the tests for all combinations of x, y1, y2 and y3, with x always enabled and y1, y2, y3 either active or inactive, for a total of 8 test cases.

Reporting Format:
The results should be reported as a table. There should be a column for each codepoint stating whether the code-point was enabled for the test. Another column should state the throughput obtained for the codepoint as a percentage of the total. The frame size should be stated in another column. Delay values should be reported similarly, with the table above containing additional columns for the latency values obtained for each codepoint. The tables for frame loss and back-to-back values are reported as for the generic tests for these metrics.

8. Security Considerations

Security issues are not addressed in this document.

9. Acknowledgements

Thanks are due to Susan Thomson, Raghavan Kalkunte and Tong Zhang for discussions related to the contents of this draft.

10. References

[1] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Weiss, "An Architecture for Differentiated Services", IETF RFC 2475, December 1998.

[2] K. Nichols, S. Blake, F. Baker, D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", IETF RFC 2474, December 1998.

[3] V. Jacobson, K. Nichols, K. Poduri, "An Expedited Forwarding PHB", IETF RFC 2598, June 1999.

[4] J. Heinanen, F. Baker, W. Weiss, J. Wroclawski, "Assured Forwarding PHB Group", IETF RFC 2597, June 1999.

[5] S. Bradner, J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", IETF RFC 2544, March 1999.

[6] S. Bradner, "Benchmarking Terminology for Network Interconnection Devices", IETF RFC 1242, July 1991.

11. Authors' Addresses

Sumit Khurana
445 South Street, MCC 1G233B, Morristown, NJ 07960
Email: sumit@research.telcordia.com

Shobha Erramilli
331 Newman Springs Road, NVC 3X173, Red Bank, NJ 07701
Email: shobha@research.telcordia.com

Tony Bogovic
445 South Street, MCC 1A264B, Morristown, NJ 07960
Email: tjb@research.telcordia.com

Expires: January 2001