Internet Engineering Task Force                   Integrated Services WG
INTERNET-DRAFT                              Shenker/Partridge/Wroclawski
draft-ietf-intserv-control-del-svc-02.txt                  Xerox/BBN/MIT
                                                        14 November 1995
                                                         Expires: ?/?/96


          Specification of Controlled Delay Quality of Service


Status of this Memo


   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   This document is a product of the Integrated Services working group
   of the Internet Engineering Task Force.  Comments are solicited and
   should be addressed to the working group's mailing list at
   int-serv@isi.edu and/or the author(s).

   This draft reflects changes from the IETF meeting in Stockholm.


Abstract


      This memo describes the network element behavior required to
      deliver Controlled Delay service in the Internet.  Controlled
      delay service provides three levels of delay control; network
      elements, when overloaded, are required to control delay by
      denying service requests.  However, there are no quantitative
      assurances about the absolute level of delay provided.  The
      controlled delay service is designed for service-adaptive and


Shenker/Partridge/Wroclawski Expires ?/?/95                     [Page 1]

INTERNET-DRAFT draft-ietf-intserv-control-del-svc-02.txt         ?, 1995


      delay-adaptive applications; i.e., applications that are prepared
      to dynamically adapt to changing packet transmission delays and to
      dynamically change the level of packet delivery delay control they
      request from the network when their current level of service is
      not adequate.  The controlled delay service imposes relatively
      minimal requirements on network components that implement it, and
      is intended to be usable in situations ranging from small
      centrally managed private IP networks to the global Internet.
      This specification follows the service specification template
      described in [1].


Introduction

   This document defines the requirements for network elements that
   support Controlled Delay service.  This memo is one of a series of
   documents that specify the network element behavior required to
   support various qualities of service in IP internetworks.  Services
   described in these documents are useful both in the global Internet
   and private IP networks.

   This document is based on the service specification template given in
   [1]. Please refer to that document for definitions and additional
   information about the specification of qualities of service within
   the IP protocol family.


End-to-End Behavior

   A series of network elements that conform to this document provides
   three levels of delay control end to end.
   This service ensures that the levels of experienced delays and losses
   will be controlled, in that additional service requests will be
   turned away when the element is overloaded.  In particular, the
   bandwidth available to the flow will be, on average, at least as
   great as specified in its service request.  Criteria for determining
   when a resource is overloaded are not specified in this definition,
   but are left to the individual vendor.  This service makes no
   assurances about the absolute levels of delay or jitter the receiving
   application will experience.  However, all three levels of controlled
   delay service will have average delays that are no worse than best
   effort service, and the maximal delays should be significantly better
   than best effort service when there is significant load on the
   network.  Packet losses are rare as long as the offered traffic
   conforms to the specified traffic characterization (see Invocation
   Information).

   This service is subject to admission control.


Motivation

   Controlled delay service is designed for service-adaptive and delay-
   adaptive applications. These applications are sensitive to packet
   delivery delay, but are prepared to adapt to dynamically changing
   delays by varying their playback point.  In addition, they may be
   prepared to change their requested level of service at any time if
   the current level of service received from the network is not
   adequate. This flexibility allows such applications to operate
   successfully and efficiently over a wide range of network conditions.

   Many applications that transmit interactive data, such as audio and
   video conferencing sessions, are well suited to operation with the
   controlled delay service. Applications that desire proven guarantees
   on packet delivery time, such as real-time control and servoing
   systems or playback applications that are intolerant of late-arriving
   packets, are generally not in this category.

   The end-to-end behavior obtained with controlled delay service
   provides a middle ground between the employment of adaptive
   applications in a pure best-effort network and the employment of a
   network that rigidly controls delay.  Strengths of this middle ground
   are that applications can obtain some load control and delivery
   preference for their packets while still benefiting from their
   adaptive behavior; that the service can be usefully deployed in
   large, unstructured internetworks; and that the specification is
   amenable to highly efficient implementation and use of network
   resources.

   Associated with this service are characterization parameters which
   describe the current delays experienced in the three service levels.
   If the characterizations are provided to the endpoints, these will
   provide some hint about the likely end-to-end delays that might
   result from requesting a particular level of service.  This is
   intended to aid applications in choosing the appropriate service
   level.  However, this service is still quite usable without these
   characterizations.


Network Element Data Handling Requirements


   The network element must ensure that the packet loss and delays are
   controlled.  This must be accomplished through active admission
   control.  In particular, overprovisioning is not sufficient to
   deliver controlled delay service; the element must be able to turn
   flows away if accepting them would cause the element to have
   excessive queueing delays.  However, no quantitative specification of
   average, statistical, or maximal delays is required.

   There are three different logical levels of service. A network
   element may internally implement fewer (or more) actual levels of
   service, but must map them into three logical levels at the
   controlled delay service invocation interface.  The levels have
   different degrees of delay control, with level 1 having the most
   tightly controlled delay, and level 3 having the least tightly
   controlled delay.  The different levels do not have to give strictly
   ordered delays for each packet; that is, the network element need not
   ensure that every packet given level 1 service experiences less delay
   than if it were given level 2 service.  The element need only ensure
   that the typical delays are no greater in level 1 than in level 2
   (and similarly for levels 2 and 3).

   All three levels of service should be given better service, i.e. more
   tightly controlled delay, than uncontrolled best effort traffic.  The
   average delays experienced by packets receiving different levels of
   controlled delay service and best-effort service may not differ
   significantly.  However, the tails of the delay distributions, i.e.,
   the maximum packet delays seen, for the levels of controlled delay
   service that are implemented and for best-effort service should be
   significantly different when the network has substantial load.

   The controlled delay service must maintain a very low level of packet
   loss. Although packet losses may occur, any substantial loss
   represents a "failure" of the admission control algorithm.  However,
   vendors may employ admission control algorithms with different levels
   of conservativeness, resulting in very different levels of loss
   (varying, for instance, from 1 in 10^4 to 1 in 10^8).

   The controlled delay service definition does not require any control
   of short-term packet jitter (variation in network element transit
   delay between different packets in the flow) beyond the control
   already exercised on delay. Network element implementors who find it
   advantageous to do so may use resource scheduling algorithms that
   exercise some jitter control.

   Links are not permitted to fragment packets as part of controlled
   delay service.  Packets larger than the MTU of the link must be
   treated as nonconformant, which means that they will be policed
   according to the rules described in the Policing section below.


Invocation Information


   The controlled delay service is invoked by specifying the traffic
   (TSpec) and the desired service (RSpec) to the network element.  A
   service request for an existing flow that has a new TSpec and/or
   RSpec should be treated as a new invocation, in the sense that
   admission control must be reapplied to the flow.  Flows that reduce
   their TSpec and/or their RSpec (i.e., their new TSpec/RSpec is
   strictly smaller than the old TSpec/RSpec according to the ordering
   rules described in the section on Ordering below) should never be
   denied service.

   The TSpec takes the form of a token bucket plus a minimum policed
   unit (m) and a maximum packet size (M).

   The token bucket has a bucket depth, b, and a bucket rate, r.  Both b
   and r must be positive.  The rate, r, is measured in bytes of IP
   datagrams per second, and can range from 1 byte per second to as
   large as 40 terabytes per second (or about what is believed to be the
   maximum theoretical bandwidth of a single strand of fiber).  Clearly,
   particularly for large bandwidths, only the first few digits are
   significant, and so the use of floating point representations
   accurate to at least 0.1% is encouraged.

   The bucket depth, b, is also measured in bytes and can range from 1
   byte to 250 gigabytes.  Again, floating point representations
   accurate to at least 0.1% are encouraged.

   The range of values is intentionally large to allow for future
   bandwidths.  The range is not intended to imply that a network
   element must support the entire range.

   The minimum policed unit, m, is an integer measured in bytes.  All
   IP datagrams smaller than size m will be counted against the token
   bucket as being of size m.  The maximum packet size, M, is the biggest
   packet that will conform to the traffic specification; it is also
   measured in bytes.  The flow must be rejected if the requested
   maximum packet size is larger than the MTU of the link.  Both m and
   M must be positive, and m must be less than or equal to M.
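
   The constraints above can be collected into a single acceptance
   check.  The following is a minimal sketch, not part of this
   specification: the names TSpec, check_tspec, and link_mtu are
   illustrative assumptions, and the range limits are simply the
   values quoted in the text.

```python
from dataclasses import dataclass

# Range limits quoted in the text (assumed here to be inclusive).
MAX_RATE = 40e12    # bytes per second
MAX_DEPTH = 250e9   # bytes


@dataclass(frozen=True)
class TSpec:
    r: float  # token bucket rate, bytes of IP datagrams per second
    b: float  # token bucket depth, bytes
    m: int    # minimum policed unit, bytes
    M: int    # maximum packet size, bytes


def check_tspec(t: TSpec, link_mtu: int) -> list:
    """Return the reasons to reject t; an empty list means acceptable."""
    problems = []
    if not (1 <= t.r <= MAX_RATE):
        problems.append("rate out of range")
    if not (1 <= t.b <= MAX_DEPTH):
        problems.append("depth out of range")
    if t.m <= 0 or t.M <= 0:
        problems.append("m and M must be positive")
    if t.m > t.M:
        problems.append("m exceeds M")
    if t.M > link_mtu:
        problems.append("M exceeds link MTU")
    return problems
```

   An element that supports only part of the range would tighten
   MAX_RATE and MAX_DEPTH to its own limits.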

   The RSpec is a service level.  The service level is specified by one
   of the integers 1, 2, or 3.  Implementations should internally choose
   representations that leave a range of at least 256 service levels
   undefined, for possible extension in the future.

   The TSpec can be represented by two floating point numbers in
   single-precision IEEE floating point format followed by two 32-bit
   integers in network byte order.  The first value is the rate (r), the
   second value is the bucket size (b), the third is the minimum policed
   unit (m), and the fourth is the maximum packet size (M).
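
   As an illustration of this layout, the sketch below packs and
   unpacks a TSpec with Python's struct module.  It assumes that the
   two floating point values are also carried in network byte order,
   which the text states only for the integers; pack_tspec and
   unpack_tspec are hypothetical helper names.

```python
import struct

# "!" selects network (big-endian) byte order; "f" is an IEEE
# single-precision float and "I" is an unsigned 32-bit integer.
TSPEC_FORMAT = "!ffII"


def pack_tspec(r, b, m, M):
    """Encode rate, bucket size, minimum policed unit, max packet size."""
    return struct.pack(TSPEC_FORMAT, r, b, m, M)


def unpack_tspec(data):
    """Decode the 16-byte TSpec produced by pack_tspec."""
    return struct.unpack(TSPEC_FORMAT, data)
```

   The encoded TSpec occupies 16 bytes.  Values that are not exactly
   representable in single precision are rounded on encoding, which is
   consistent with the 0.1% accuracy target given earlier.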


   The RSpec may be represented as an unsigned 16-bit integer carried in
   network byte order.

   For all IEEE floating point values, the sign bit must be zero (all
   values must be positive).  Biased exponents less than 127 (i.e., an
   unbiased exponent below 0) are prohibited.  Biased exponents greater
   than 162 (i.e., an unbiased exponent above 35) are discouraged.
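
   The sign and exponent rules can be checked directly on the encoded
   bits.  This is a sketch under the assumption that the exponent
   values quoted above refer to the 8-bit biased exponent field of the
   single-precision format; float32_fields and check_float are
   illustrative names.

```python
import struct


def float32_fields(x):
    """Return (sign bit, biased exponent) of x encoded as IEEE float32."""
    bits = struct.unpack("!I", struct.pack("!f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    return sign, exponent


def check_float(x):
    """Classify x as 'ok', 'discouraged', or 'prohibited'."""
    sign, exponent = float32_fields(x)
    if sign != 0 or exponent < 127:
        return "prohibited"
    if exponent > 162:
        return "discouraged"
    return "ok"
```

   Under this reading, any value below 1.0 is prohibited and any value
   of 2**36 or more is discouraged.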


Exported Information


   Each controlled delay service module exports at least the following
   information. All of the parameters described below are
   characterization parameters.

   For each level of service, the network element exports three
   measurements of delay (thus making nine quantities in total).  Each
   of these characterization parameters is based on the maximal packet
   transit delay experienced over some set of previous time intervals of
   length T; these delays do not include discarded packets.  The three
   time intervals T are 1 second, 60 seconds, and 3600 seconds.  The
   exported parameters are averages over some set of these previous time
   intervals.

   There is no requirement that these characterization parameters be
   based on exact measurements.  In particular, these delay measurements
   can be based on estimates of packet delays or aggregate measurements
   of queue loading.  This looseness is allowed to avoid placing undue
   burdens on network element designs in which obtaining precise delay
   measurements is difficult.

   These delay parameters have an additive composition rule. For each
   parameter the composition function computes the sum, enabling a setup
   protocol to deliver the cumulative sum along the path to the end
   nodes.

   The delays are measured in units of one microsecond.  An individual
   element can advertise a delay value between 1 and 2**28 (about 268
   seconds, or roughly four and a half minutes) and the total delay
   added across all elements can range as high as 2**32-1.  Should the
   sum of the different elements' delays exceed 2**32-1, the end-to-end
   advertised delay should be 2**32-1.
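
   The additive composition rule with its upper limit amounts to a
   saturating sum.  A minimal sketch follows; compose_delay is a
   hypothetical name, and rejecting an out-of-range per-element value
   with an exception is an assumption made for the example.

```python
MAX_ELEMENT_DELAY = 2**28        # microseconds, per-element limit
MAX_COMPOSED_DELAY = 2**32 - 1   # microseconds, end-to-end limit


def compose_delay(path_total_us, element_delay_us):
    """Add one element's advertised delay to the running path total."""
    if not (1 <= element_delay_us <= MAX_ELEMENT_DELAY):
        raise ValueError("element delay outside the range 1..2**28")
    # Saturate rather than wrap when the sum would exceed 2**32-1.
    return min(path_total_us + element_delay_us, MAX_COMPOSED_DELAY)
```

   A setup protocol would apply this once per hop, starting from zero,
   for each of the nine parameters.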

   Note that while the granularity of measurement is microseconds, a
   conforming element is free to measure delays more loosely.  The
   minimum requirement is that the element estimate its delay accurately
   to the nearest 100 microsecond granularity.  Elements that can
   measure more accurately are, of course, encouraged to do so.

      NOTE: Measuring in milliseconds is not acceptable, because if the
      minimum delay value is a millisecond, a path with several hops
      will lead to a composed delay of at least several milliseconds,
      which is likely to be misleading.

   The characterization parameters may be represented as a sequence of
   nine 32-bit unsigned integers in network byte order.  The first three
   integers are the parameters for T=1, T=60 and T=3600 for level 1, the
   next three integers are for T=1, T=60, T=3600 for level 2, and the
   last three integers are for T=1, T=60, T=3600 for level 3.

   The following values are assigned from the characterization parameter
   namespace.

   The controlled delay service is service_name 1.

   The delay characterization parameters receive parameter_names one
   through nine, in the order given above. That is,

      parameter_name          definition

      1                       Service Level = 1, T = 1
      2                       Service Level = 1, T = 60
      3                       Service Level = 1, T = 3600
      4                       Service Level = 2, T = 1
      5                       Service Level = 2, T = 60
      6                       Service Level = 2, T = 3600
      7                       Service Level = 3, T = 1
      8                       Service Level = 3, T = 60
      9                       Service Level = 3, T = 3600


   The end-to-end composed results are assigned parameter_names N+10,
   where N is the value of the per-hop name given above.
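
   The numbering and the nine-integer representation can be sketched
   as follows.  The function names and the dictionary keyed by
   (level, interval) are assumptions made for illustration only.

```python
import struct

LEVELS = (1, 2, 3)
INTERVALS = (1, 60, 3600)  # seconds


def per_hop_name(level, interval):
    """Per-hop parameter name (1..9) for a service level and interval T."""
    return (level - 1) * 3 + INTERVALS.index(interval) + 1


def composed_name(level, interval):
    """End-to-end composed parameter name, N+10."""
    return per_hop_name(level, interval) + 10


def pack_characterization(delays):
    """Encode delays, a dict mapping (level, interval) to microseconds,
    as nine unsigned 32-bit integers in network byte order."""
    ordered = [delays[(lvl, t)] for lvl in LEVELS for t in INTERVALS]
    return struct.pack("!9I", *ordered)
```

   The packed form is 36 bytes, ordered by level first and by interval
   second, matching the table above.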

   No other exported data is required by this specification.


Policing


   Policing is done at the edge of the network, at all heterogeneous
   source branch points and at all source merge points.  A heterogeneous
   source branch point is a spot where the multicast distribution tree
   from a source branches to multiple distinct paths, and the TSpecs of
   the reservations on the various outgoing links are not all the same.
   Policing need only be done if the TSpec on the outgoing link is "less
   than" (in the sense described in the Ordering section) the TSpec
   reserved on the immediately upstream link.  A source merge point is
   where the multicast distribution trees from two different sources
   (sharing the same reservation) merge.  It is the responsibility of
   the invoker of the service (a setup protocol, local configuration
   tool, or similar mechanism) to identify points where policing is
   required.  Policing is allowed at points other than those mentioned
   above.

   The token bucket parameters require that traffic obey the rule
   that over all time periods, the amount of data sent cannot exceed
   rT+b, where r and b are the token bucket parameters and T is the
   length of the time period.  For the purposes of this accounting,
   links must count packets that are smaller than the minimum policed
   unit to be of size m.  Packets that arrive at an element and cause a
   violation of the rT+b bound are considered nonconformant.
   Policing to conformance with this token bucket is done in two
   different ways. At all policing points, nonconformant packets are
   treated as best-effort datagrams.  [If and when a marking ability
   becomes available, these nonconformant packets should be ``marked''
   as being nonconformant and then treated as best effort packets at all
   subsequent routers.]  Other actions, such as delaying packets until
   they are conformant, are not allowed.
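
   One common way to enforce the rT+b bound is a token counter that
   starts full and refills at rate r.  The sketch below is one such
   realization, not a required algorithm.  The class name, the
   explicit "now" argument, and the choice not to charge tokens for
   nonconformant packets are all assumptions; treating packets larger
   than M as nonconformant follows from the definition of M in the
   Invocation Information section.

```python
class TokenBucketPolicer:
    def __init__(self, r, b, m, M):
        self.r, self.b, self.m, self.M = r, b, m, M
        self.tokens = b   # bucket starts full
        self.last = 0.0   # time of the previous update, in seconds

    def classify(self, now, length):
        """Classify a packet of `length` bytes arriving at time `now`."""
        # Refill at rate r, never beyond the bucket depth b.
        self.tokens = min(self.b, self.tokens + (now - self.last) * self.r)
        self.last = now
        # Packets smaller than m are counted as being of size m.
        size = max(length, self.m)
        if length > self.M or size > self.tokens:
            return "best-effort"   # nonconformant: no delaying, no charge
        self.tokens -= size
        return "controlled-delay"
```

   Note that the policer never queues a packet; it only decides which
   class the packet is forwarded in.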

      NOTE: The prohibition on delaying packets is open to discussion.
      It may be better to permit some delaying of a packet if that delay
      would allow it to pass the policing function.  (In other words, to
      reshape the traffic).  The challenge is to define a viable
      reshaping function.

      Intuitively, a plausible approach is to allow a delay of (roughly)
      up to the maximum queueing delay experienced by completely
      conforming packets before declaring that a packet has failed to
      pass the policing function. The merit of this approach, and the
      precise wording of the specification that describes it, require
      further study.

   A related issue is that at all network elements, packets bigger than
   the MTU of the link must be considered nonconformant and should be
   classified as best effort (and will then either be fragmented or
   dropped according to the element's handling of best effort traffic).
   [Again, if marking is available, these reclassified packets should be
   marked.]


Ordering and Merging

   TSpecs are ordered according to the following rule: TSpec A is a
   substitute ("as good or better than") for TSpec B if (1) both the
   token bucket depth and rate for TSpec A are greater than or equal to
   those of TSpec B, (2) the minimum policed unit m is at least as small
   for TSpec A as it is for TSpec B, and (3) the maximum packet size M
   is at least as large for TSpec A as it is for TSpec B.

   A merged TSpec may be calculated over a set of TSpecs by taking the
   largest token bucket rate, largest bucket size, smallest minimum
   policed unit, and largest maximum packet size across all members of
   the set.  This use of the word "merging" is similar to that in the
   RSVP protocol; a merged TSpec is one that is adequate to describe the
   traffic from any one of a number of flows.

   Service request specifications (RSpecs) are ordered by their
   numerical values (in inverse order); service level 1 is substitutable
   for service levels 2 and 3, and service level 2 is substitutable for
   service level 3.
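
   The ordering and merging rules of this section translate directly
   into comparisons.  The following sketch uses illustrative names
   (TSpec, tspec_substitutes, merge_tspecs, rspec_substitutes) that do
   not appear in this specification.

```python
from typing import NamedTuple


class TSpec(NamedTuple):
    r: float  # token bucket rate
    b: float  # token bucket depth
    m: int    # minimum policed unit
    M: int    # maximum packet size


def tspec_substitutes(a, other):
    """True if TSpec a is "as good or better than" TSpec other."""
    return (a.r >= other.r and a.b >= other.b
            and a.m <= other.m and a.M >= other.M)


def merge_tspecs(tspecs):
    """Smallest TSpec that substitutes for every member of tspecs."""
    tspecs = list(tspecs)
    return TSpec(r=max(t.r for t in tspecs),
                 b=max(t.b for t in tspecs),
                 m=min(t.m for t in tspecs),
                 M=max(t.M for t in tspecs))


def rspec_substitutes(a, other):
    """Service levels order inversely: level 1 substitutes for 2 and 3."""
    return a <= other
```

   The ordering on TSpecs is partial: two TSpecs may each fail to
   substitute for the other, in which case only their merge
   substitutes for both.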


Guidelines for Implementors


   It is expected that the service levels implemented at a particular
   element will offer significantly different levels of delay control.
   There seems little advantage in offering levels that differ only
   slightly in the level of delay control.  So, while a particular
   element may offer fewer than three levels of service, the levels of
   service it does offer should have notably different queueing delays.


      NOTE: An additional service currently being considered is the
      "predictive" service described in [3].  It is expected that an
      element offering both predictive service and controlled delay
      service should not implement both but should use the
      predictive service as a controlled delay service.  This is allowed
      since (1) the required behavior of predictive service meets all of
      the requirements of controlled delay service, (2) the invocations
      are compatible, and (3) the ordering relationships defined in the
      predictive service specification document are such that a given
      level of predictive service is at least as good as the same level
      of controlled delay service. The inter-service mapping with
      predictive service, mentioned above, is omitted from the "Ordering
      and Merging" section of this draft of the controlled delay service
      specification because the exact definition of both services is
      still under discussion. Should the final definitions include an
      inter-service mapping function, the Ordering and Merging sections
      of each document might contain words similar to the following:

      "In addition, the controlled delay service is related to the
      predictive service in the sense that a given level of predictive
      service is considered at least as good as the same level of
      controlled delay service.  See additional comments in the
      guidelines section."

   Network elements are permitted to oversubscribe their traffic, where
   by oversubscribe, we mean that the sum of the token buckets of the
   controlled delay traffic exceeds the maximum throughput or buffer
   space of the router.  However, given the requirement of low loss,
   this oversubscribing should only be done in cases where the element
   is quite sure that actual utilization is far less than the sum of the
   token buckets would suggest.  A more conservative approach is to
   reject new flows, when the addition of their traffic would cause the
   sums of the token buckets to exceed the capacity of the network
   element.

Evaluation Criteria


   Evaluating a network element's implementation of controlled delay
   service is somewhat difficult, since the quality of service depends
   on overall traffic load, the traffic pattern presented and the degree
   of delay control implemented.  In this section we sketch out a
   methodology for testing an element's controlled delay service.

   The idea is that one chooses a particular traffic mix (for instance,
   30 percent level 1, 10 percent level 2, 20 percent level 3 and 40
   percent uncontrolled best-effort traffic) and loads the network
   element with progressively higher amounts of this traffic mix (e.g.,
   40% of capacity, then 50% of capacity, and so on beyond 100% of
   capacity).  For
   each load level, one measures the utilization, mean delays, and the
   packet loss rate for each level of service (including best effort).
   Each test run at a particular load should involve enough traffic that
   it is a reasonable predictor of the performance a long-lived application
   such as a video conference would experience (e.g., an hour or more of
   traffic).

   This memo does not specify particular traffic mixes to test.
   However, we expect in the future that as the nature of real-time
   Internet traffic is better understood, the traffic used in these
   tests will be chosen to reflect the current and future Internet load.


Examples of Implementation


   A possible implementation of controlled delay service would be to
   have a queueing mechanism with three priority levels, with level 1
   packets being highest priority and level 3 packets being lowest
   priority.  Each controlled delay service level would be associated
   with a target queue utilization level, say 20% for level 1, 50% for
   the combination of levels 1 and 2, and 70% for the combination of all
   three levels.  The utilization of the link, by each of the three
   levels, would be measured over some relatively short time period
   (say, 5 seconds, or 10000 MTU packet transmission times).  A new flow
   would be admitted to level 1 if the measured usage of level 1, plus
   the token bucket rate of the new flow, was below the target
   utilization of level 1.  Similarly, a new flow would be admitted to
   level 2 if the measured usage of levels 1 and 2, plus the token
   bucket rate of the new flow, was below the target utilization of
   levels 1 and 2.
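
   The admission test in this example can be sketched as below.  The
   target utilizations are the example figures given above; the
   function name, the dictionary of measured usage, and the extension
   of the same rule to level 3 (which the text does not spell out) are
   assumptions.

```python
# Cumulative target utilization: level 1 alone, levels 1-2, levels 1-3.
TARGETS = {1: 0.20, 2: 0.50, 3: 0.70}


def admit(level, new_rate, measured, link_capacity):
    """Decide whether to admit a new flow at the given service level.

    measured maps each level to its measured usage in bytes per second;
    new_rate is the token bucket rate of the new flow.
    """
    used = sum(measured[lvl] for lvl in range(1, level + 1))
    return used + new_rate < TARGETS[level] * link_capacity
```

   Because the test uses measured usage rather than the sum of
   reserved rates, it admits more flows when existing flows send below
   their token bucket rates.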


Examples of Use


   We give two examples of use, both involving an interactive
   application.

   In the first example, we assume that either the receiving application
   is ignoring characterizations or the network is not delivering the
   characterizations to the end-nodes. We further assume that the
   application's data transmission units are timestamped.  The receiver,
   by inspecting the timestamps, can determine the end-to-end delays.
   If they are excessive, the application asks for a better level of
   service.  If the delays are well below the required level, the
   application can ask for a worse level of service.  A
   protocol useful to applications providing this capability is the
   proposed IETF Real-Time Transport Protocol [2].

   In the second example, we assume that characterization parameters are
   delivered to the receiving application.  The receiver chooses the
   service level whose characterizations for the maximal delays for all
   intervals are under the required level after network latencies are
   considered. If the actual delays during the course of operation are
   worse than expected, the application can ask for a better level of
   service.
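
   The selection in the second example might look like the sketch
   below.  It assumes the receiver holds the composed end-to-end
   parameters per level and prefers the least tightly controlled level
   that still fits its delay budget; that preference, and the names
   choose_level, budget_us, and fixed_latency_us, are assumptions
   rather than requirements of this memo.

```python
def choose_level(composed, budget_us, fixed_latency_us=0):
    """Pick a service level from composed characterization parameters.

    composed maps each level to its (T=1, T=60, T=3600) delays in
    microseconds.  Returns a level, or None if no level fits.
    """
    for level in (3, 2, 1):  # try the least demanding level first
        worst = max(composed[level]) + fixed_latency_us
        if worst <= budget_us:
            return level
    return None
```

   If the delays observed in operation exceed the budget, the
   application would then request the next better level, as described
   above.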


Security Considerations

   Security considerations are not discussed in this memo.

References


   [1] S. Shenker and J. Wroclawski. "Network Element Service
   Specification Template", Internet Draft, June 1995,
   <draft-ietf-intserv-svc-template-01.txt>

   [2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson.  "RTP:
   A Transport Protocol for Real-Time Applications", Internet Draft,
   March 1995, <draft-ietf-avt-svc-rtp-07.txt>

   [3] S. Shenker, C. Partridge, B. Davie, and L. Breslau.
   "Specification of Predictive Quality of Service", Internet Draft, ??
   1995, <draft-ietf-intserv-predictive-svc-01.txt>


Authors' Address:


   Scott Shenker
   Xerox PARC
   3333 Coyote Hill Road
   Palo Alto, CA  94304-1314
   shenker@parc.xerox.com
   415-812-4840
   415-812-4471 (FAX)

   Craig Partridge
   BBN
   2370 Amherst St
   Palo Alto, CA  94306
   craig@bbn.com

   John Wroclawski
   MIT Laboratory for Computer Science
   545 Technology Sq.
   Cambridge, MA  02139
   jtw@lcs.mit.edu
   617-253-7885
   617-253-2673 (FAX)

