Internet Engineering Task Force                   Integrated Services WG
INTERNET-DRAFT                              Shenker/Partridge/Wroclawski
draft-ietf-intserv-control-del-svc-02.txt                  Xerox/BBN/MIT
                                                        14 November 1995
                                                         Expires: ?/?/96

          Specification of Controlled Delay Quality of Service

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

This document is a product of the Integrated Services working group of the Internet Engineering Task Force. Comments are solicited and should be addressed to the working group's mailing list at int-serv@isi.edu and/or the author(s).

This draft reflects changes from the IETF meeting in Stockholm.

Abstract

This memo describes the network element behavior required to deliver Controlled Delay service in the Internet. Controlled delay service provides three levels of delay control; network elements, when overloaded, are required to control delay by denying service requests. However, there are no quantitative assurances about the absolute level of delay provided.
The controlled delay service is designed for service-adaptive and delay-adaptive applications; i.e., applications that are prepared to dynamically adapt to changing packet transmission delays and to dynamically change the level of packet delivery delay control they request from the network when their current level of service is not adequate.

The controlled delay service imposes relatively minimal requirements on network components that implement it, and is intended to be usable in situations ranging from small centrally managed private IP networks to the global Internet.

This specification follows the service specification template described in [1].

Introduction

This document defines the requirements for network elements that support Controlled Delay service. This memo is one of a series of documents that specify the network element behavior required to support various qualities of service in IP internetworks. Services described in these documents are useful both in the global Internet and private IP networks.

This document is based on the service specification template given in [1]. Please refer to that document for definitions and additional information about the specification of qualities of service within the IP protocol family.

End-to-End Behavior

The end-to-end behavior provided by a series of network elements that conform to this document provides three levels of delay control. This service ensures that the levels of experienced delays and losses will be controlled, in that additional service requests will be turned away when the element is overloaded. In particular, the bandwidth available to the flow will be, on average, at least as great as specified in its service request. Criteria for determining when a resource is overloaded are not specified in this definition, but are left to the individual vendor.

This service makes no assurances about the absolute levels of delay or jitter the receiving application will experience. However, all three levels of controlled delay service will have average delays that are no worse than best effort service, and the maximal delays should be significantly better than best effort service when there is significant load on the network. Packet losses are rare as long as the offered traffic conforms to the specified traffic characterization (see Invocation Information).

This service is subject to admission control.

Motivation

Controlled delay service is designed for service-adaptive and delay-adaptive applications. These applications are sensitive to packet delivery delay, but are prepared to adapt to dynamically changing delays by varying their playback point. In addition, they may be prepared to change their requested level of service at any time if the current level of service received from the network is not adequate. This flexibility allows such applications to operate successfully and efficiently over a wide range of network conditions.

Many applications that transmit interactive data, such as audio and video conferencing sessions, are well suited to operation with the controlled delay service. Applications that desire proven guarantees on packet delivery time, such as real-time control and servoing systems or playback applications that are intolerant of late-arriving packets, are generally not in this category.

The end-to-end behavior obtained with controlled delay service provides a middle ground between the employment of adaptive applications in a pure best-effort network and the employment of a network that rigidly controls delay.
Strengths of this middle ground are that applications can obtain some load control and delivery preference for their packets while still benefiting from their adaptive behavior; that the service can be usefully deployed in large, unstructured internetworks; and that the specification is amenable to highly efficient implementation and use of network resources.

Associated with this service are characterization parameters which describe the current delays experienced in the three service levels. If the characterizations are provided to the endpoints, they will provide some hint about the likely end-to-end delays that might result from requesting a particular level of service. This is intended to aid applications in choosing the appropriate service level. However, this service is still quite usable without these characterizations.

Network Element Data Handling Requirements

The network element must ensure that the packet loss and delays are controlled. This must be accomplished through active admission control. In particular, overprovisioning is not sufficient to deliver controlled delay service; the element must be able to turn flows away if accepting them would cause the element to have excessive queueing delays. However, no quantitative specification of average, statistical, or maximal delays is required.

There are three different logical levels of service. A network element may internally implement fewer (or more) actual levels of service, but must map them into three logical levels at the controlled delay service invocation interface. The levels have different degrees of delay control, with level 1 having the most tightly controlled delay, and level 3 having the least tightly controlled delay.

The different levels do not have to give strictly ordered delays for each packet; that is, the network element need not ensure that every packet given level 1 service experiences less delay than if it were given level 2 service. The element need only ensure that the typical delays are no greater in level 1 than in level 2 (and similarly for levels 2 and 3).

All three levels of service should be given better service, i.e., more tightly controlled delay, than uncontrolled best effort traffic. The average delays experienced by packets receiving different levels of controlled delay service and best-effort service may not differ significantly. However, the tails of the delay distributions, i.e., the maximum packet delays seen, for the levels of controlled delay service that are implemented and for best-effort service should be significantly different when the network has substantial load.

The controlled delay service must maintain a very low level of packet loss. Although packet losses may occur, any substantial loss represents a "failure" of the admission control algorithm. However, vendors may employ admission control algorithms with different levels of conservativeness, resulting in very different levels of loss (varying, for instance, from 1 in 10^4 to 1 in 10^8).

The controlled delay service definition does not require any control of short-term packet jitter (variation in network element transit delay between different packets in the flow) beyond the control already exercised on delay. Network element implementors who find it advantageous to do so may use resource scheduling algorithms that exercise some jitter control.

Links are not permitted to fragment packets as part of controlled delay service. Packets larger than the MTU of the link must be treated as nonconformant, which means that they will be policed according to the rules described in the Policing section below.

Invocation Information

The controlled delay service is invoked by specifying the traffic (TSpec) and the desired service (RSpec) to the network element. A service request for an existing flow that has a new TSpec and/or RSpec should be treated as a new invocation, in the sense that admission control must be reapplied to the flow. Flows that reduce their TSpec and/or their RSpec (i.e., their new TSpec/RSpec is strictly smaller than the old TSpec/RSpec according to the ordering rules described in the section on Ordering below) should never be denied service.

The TSpec takes the form of a token bucket plus a minimum policed unit (m) and a maximum packet size (M).

The token bucket has a bucket depth, b, and a bucket rate, r. Both b and r must be positive. The rate, r, is measured in bytes of IP datagrams per second, and can range from 1 byte per second to as large as 40 terabytes per second (or about what is believed to be the maximum theoretical bandwidth of a single strand of fiber). Clearly, particularly for large bandwidths, only the first few digits are significant, so the use of floating point representations accurate to at least 0.1% is encouraged. The bucket depth, b, is also measured in bytes and can range from 1 byte to 250 gigabytes. Again, floating point representations accurate to at least 0.1% are encouraged.

The range of values is intentionally large to allow for future bandwidths. The range is not intended to imply that a network element must support the entire range.

The minimum policed unit, m, is an integer measured in bytes. All IP datagrams less than size m will be counted against the token bucket as being of size m. The maximum packet size, M, is the biggest packet that will conform to the traffic specification; it is also measured in bytes.
The flow must be rejected if the requested maximum packet size is larger than the MTU of the link. Both m and M must be positive, and m must be less than or equal to M.

The RSpec is a service level. The service level is specified by one of the integers 1, 2, or 3. Implementations should internally choose representations that leave a range of at least 256 service levels undefined, for possible extension in the future.

The TSpec can be represented by two floating point numbers in single-precision IEEE floating point format followed by two 32-bit integers in network byte order. The first value is the rate (r), the second value is the bucket size (b), the third is the minimum policed unit (m), and the fourth is the maximum packet size (M).

The RSpec may be represented as an unsigned 16-bit integer carried in network byte order.

For all IEEE floating point values, the sign bit must be zero (all values must be positive). Exponents less than 127 (i.e., 0) are prohibited. Exponents greater than 162 (i.e., positive 35) are discouraged.

Exported Information

Each controlled delay service module exports at least the following information. All of the parameters described below are characterization parameters.

For each level of service, the network element exports three measurements of delay (thus making nine quantities in total). Each of these characterization parameters is based on the maximal packet transit delay experienced over some set of previous time intervals of length T; these delays do not include discarded packets. The three time intervals T are 1 second, 60 seconds, and 3600 seconds. The exported parameters are averages over some set of these previous time intervals.

There is no requirement that these characterization parameters be based on exact measurements.
In particular, these delay measurements can be based on estimates of packet delays or aggregate measurements of queue loading. This looseness is allowed to avoid placing undue burdens on network element designs in which obtaining precise delay measurements is difficult.

These delay parameters have an additive composition rule. For each parameter the composition function computes the sum, enabling a setup protocol to deliver the cumulative sum along the path to the end nodes.

The delays are measured in units of one microsecond. An individual element can advertise a delay value between 1 and 2**28 (about 268 seconds, or roughly four and a half minutes) and the total delay added across all elements can range as high as 2**32-1. Should the sum of the different elements' delays exceed 2**32-1, the end-to-end advertised delay should be 2**32-1.

Note that while the granularity of measurement is microseconds, a conforming element is free to measure delays more loosely. The minimum requirement is that the element estimate its delay accurately to the nearest 100 microseconds. Elements that can measure more accurately are, of course, encouraged to do so.

   NOTE: Measuring in milliseconds is not acceptable, because if the minimum delay value is a millisecond, a path with several hops will lead to a composed delay of at least several milliseconds, which is likely to be misleading.

The characterization parameters may be represented as a sequence of nine 32-bit unsigned integers in network byte order. The first three integers are the parameters for T=1, T=60 and T=3600 for level 1, the next three integers are for T=1, T=60, T=3600 for level 2, and the last three integers are for T=1, T=60, T=3600 for level 3.

The following values are assigned from the characterization parameter namespace.

The controlled delay service is service_name 1.
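
The additive composition rule and the suggested representation of the characterization parameters can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not a normative encoding: the function and constant names (compose_delays, pack_characterization, unpack_characterization, MAX_ELEMENT_DELAY_US, MAX_COMPOSED_DELAY_US) are hypothetical and are not defined by this memo, and delays are assumed to be held as integers in microseconds.

```python
import struct

# Hypothetical constant names; the limits themselves come from the text above.
MAX_ELEMENT_DELAY_US = 2**28       # largest value a single element may advertise
MAX_COMPOSED_DELAY_US = 2**32 - 1  # ceiling on the end-to-end advertised delay


def compose_delays(per_hop_us):
    """Sum per-element delays (microseconds), saturating at 2**32-1."""
    total = 0
    for delay in per_hop_us:
        if not 1 <= delay <= MAX_ELEMENT_DELAY_US:
            raise ValueError(f"per-element delay out of range: {delay}")
        # Clamp after every hop so the running sum never exceeds the ceiling.
        total = min(total + delay, MAX_COMPOSED_DELAY_US)
    return total


def pack_characterization(params):
    """Encode nine delays as 32-bit unsigned integers in network byte order.

    Expected order: level 1 (T=1, 60, 3600), then level 2, then level 3.
    """
    if len(params) != 9:
        raise ValueError("expected nine characterization parameters")
    return struct.pack("!9I", *params)


def unpack_characterization(data):
    """Decode the 36-byte representation back into a list of nine integers."""
    return list(struct.unpack("!9I", data))
```

A setup protocol could call compose_delays once per parameter, passing the values advertised by each element on the path for that service level and interval.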
The delay characterization parameters receive parameter_numbers one through nine, in the order given above. That is,

   parameter_name   definition
         1          Service Level = 1, T = 1
         2          Service Level = 1, T = 60
         3          Service Level = 1, T = 3600
         4          Service Level = 2, T = 1
         5          Service Level = 2, T = 60
         6          Service Level = 2, T = 3600
         7          Service Level = 3, T = 1
         8          Service Level = 3, T = 60
         9          Service Level = 3, T = 3600

The end-to-end composed results are assigned parameter_names N+10, where N is the value of the per-hop name given above.

No other exported data is required by this specification.

Policing

Policing is done at the edge of the network, at all heterogeneous source branch points and at all source merge points. A heterogeneous source branch point is a spot where the multicast distribution tree from a source branches to multiple distinct paths, and the TSpecs of the reservations on the various outgoing links are not all the same. Policing need only be done if the TSpec on the outgoing link is "less than" (in the sense described in the Ordering section) the TSpec reserved on the immediately upstream link. A source merge point is where the multicast distribution trees from two different sources (sharing the same reservation) merge. It is the responsibility of the invoker of the service (a setup protocol, local configuration tool, or similar mechanism) to identify points where policing is required. Policing is allowed at points other than those mentioned above.

The token bucket parameters require that traffic obey the rule that over all time periods, the amount of data sent cannot exceed rT+b, where r and b are the token bucket parameters and T is the length of the time period. For the purposes of this accounting, links must count packets that are smaller than the minimum policed unit as being of size m.
Packets that arrive at an element and cause a violation of the rT+b bound are considered nonconformant.

Policing to conformance with this token bucket is done in two different ways. At all policing points, nonconformant packets are treated as best-effort datagrams. [If and when a marking ability becomes available, these nonconformant packets should be ``marked'' as being non-compliant and then treated as best effort packets at all subsequent routers.] Other actions, such as delaying packets until they are compliant, are not allowed.

   NOTE: The prohibition on delaying packets is open to discussion. It may be better to permit some delaying of a packet if that delay would allow it to pass the policing function (in other words, to reshape the traffic). The challenge is to define a viable reshaping function.

   Intuitively, a plausible approach is to allow a delay of (roughly) up to the maximum queueing delay experienced by completely conforming packets before declaring that a packet has failed to pass the policing function. The merit of this approach, and the precise wording of the specification that describes it, require further study.

A related issue is that at all network elements, packets bigger than the MTU of the link must be considered nonconformant and should be classified as best effort (and will then either be fragmented or dropped according to the element's handling of best effort traffic). [Again, if marking is available, these reclassified packets should be marked.]
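
The policing rules above can be sketched as a token bucket that classifies each arriving packet. This is a minimal illustration, not a required implementation: the class name TokenBucketPolicer, its field names, and the two return labels are hypothetical, and the sketch assumes the bucket starts full and that the caller supplies arrival times in seconds. Nonconformant packets are neither delayed nor charged against the bucket; they are simply handed to best-effort handling.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucketPolicer:
    rate: float      # r, bytes per second
    depth: float     # b, bytes
    min_unit: int    # m, minimum policed unit in bytes
    max_size: int    # M, maximum packet size in bytes
    link_mtu: int    # MTU of the outgoing link in bytes
    tokens: float = field(init=False)
    last_time: float = field(default=0.0, init=False)

    def __post_init__(self):
        # Assumption: the bucket starts full, which permits an initial burst of b.
        self.tokens = self.depth

    def classify(self, size, now):
        """Return "controlled-delay" or "best-effort" for one packet."""
        # Refill for the elapsed time, never beyond the bucket depth.
        elapsed = max(0.0, now - self.last_time)
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last_time = max(self.last_time, now)

        # Packets larger than the link MTU or than M are nonconformant.
        if size > self.link_mtu or size > self.max_size:
            return "best-effort"

        # Packets smaller than m are counted as being of size m.
        charged = max(size, self.min_unit)
        if charged > self.tokens:
            return "best-effort"  # would violate rT+b; not delayed, not charged

        self.tokens -= charged
        return "controlled-delay"
```

For example, with r = 1000, b = 1500 and m = 100, a 1500-byte packet at time 0 empties the bucket, so a 40-byte packet arriving at the same instant is classified as best effort, while the same packet arriving half a second later conforms and is charged 100 bytes.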

Ordering and Merging

TSpecs are ordered according to the following rule: TSpec A is a substitute ("as good or better than") for TSpec B if (1) both the token bucket depth and rate for TSpec A are greater than or equal to those of TSpec B, (2) the minimum policed unit m is at least as small for TSpec A as it is for TSpec B, and (3) the maximum packet size M is at least as large for TSpec A as it is for TSpec B.

A merged TSpec may be calculated over a set of TSpecs by taking the largest token bucket rate, largest bucket size, smallest minimum policed unit, and largest maximum packet size across all members of the set. This use of the word "merging" is similar to that in the RSVP protocol; a merged TSpec is one that is adequate to describe the traffic from any one of a number of flows.

Service request specifications (RSpecs) are ordered by their numerical values (in inverse order); service level 1 is substitutable for service levels 2 and 3, and service level 2 is substitutable for service level 3.

Guidelines for Implementors

It is expected that the service levels implemented at a particular element will offer significantly different levels of delay control. There seems little advantage in offering levels that differ only slightly in the level of delay control. So, while a particular element may offer fewer than three levels of service, the levels of service it does offer should have notably different queueing delays.

   NOTE: An additional service currently being considered is the "predictive" service described in [3]. It is expected that an element offering both predictive service and controlled delay service should not implement both, but should use the predictive service as a controlled delay service.
   This is allowed since (1) the required behavior of predictive service meets all of the requirements of controlled delay service, (2) the invocations are compatible, and (3) the ordering relationships defined in the predictive service specification document are such that a given level of predictive service is at least as good as the same level of controlled delay service.

   The inter-service mapping with predictive service, mentioned above, is omitted from the "Ordering and Merging" section of this draft of the controlled delay service specification because the exact definition of both services is still under discussion. Should the final definitions include an inter-service mapping function, the Ordering and Merging sections of each document might contain words similar to the following:

      "In addition, the controlled delay service is related to the predictive service in the sense that a given level of predictive service is considered at least as good as the same level of controlled delay service. See additional comments in the guidelines section."

Network elements are permitted to oversubscribe their traffic, where by oversubscribe we mean that the sum of the token buckets of the controlled delay traffic exceeds the maximum throughput or buffer space of the router. However, given the requirement of low loss, this oversubscribing should only be done in cases where the element is quite sure that actual utilization is far less than the sum of the token buckets would suggest. A more conservative approach is to reject new flows when the addition of their traffic would cause the sums of the token buckets to exceed the capacity of the network element.
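
The TSpec format from the Invocation Information section, the rules from the Ordering and Merging section, and the conservative admission approach just described can be sketched together. This is an illustrative sketch only: the names TSpec, substitutes_for, merge, rspec_substitutes_for and conservative_admit are hypothetical, and comparing summed rates against a throughput figure and summed bucket depths against a buffer figure is one assumed reading of "the capacity of the network element", not a rule stated in this memo.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class TSpec:
    r: float  # token bucket rate, bytes per second
    b: float  # token bucket depth, bytes
    m: int    # minimum policed unit, bytes
    M: int    # maximum packet size, bytes

    def __post_init__(self):
        # Both ranges start at 1 byte (per second); smaller values are prohibited.
        if self.r < 1 or self.b < 1:
            raise ValueError("r and b must be at least 1")
        if self.m <= 0 or self.M <= 0 or self.m > self.M:
            raise ValueError("m and M must be positive with m <= M")

    def pack(self):
        """Two IEEE single-precision floats, then two 32-bit integers,
        all in network byte order (16 bytes)."""
        return struct.pack("!ffII", self.r, self.b, self.m, self.M)

    def substitutes_for(self, other):
        """True if this TSpec is "as good or better than" the other."""
        return (self.r >= other.r and self.b >= other.b
                and self.m <= other.m and self.M >= other.M)


def merge(tspecs):
    """Largest rate, largest depth, smallest m, largest M across the set."""
    tspecs = list(tspecs)
    return TSpec(r=max(t.r for t in tspecs), b=max(t.b for t in tspecs),
                 m=min(t.m for t in tspecs), M=max(t.M for t in tspecs))


def rspec_substitutes_for(level_a, level_b):
    """Levels are ordered inversely: level 1 substitutes for levels 2 and 3."""
    return level_a <= level_b


def conservative_admit(admitted, new, capacity_rate, buffer_bytes):
    """Admit only if summed rates and depths stay within the assumed limits."""
    total_rate = sum(t.r for t in admitted) + new.r
    total_depth = sum(t.b for t in admitted) + new.b
    return total_rate <= capacity_rate and total_depth <= buffer_bytes
```

Note that a merged TSpec always substitutes for every member of the set it was computed from, which is what makes it adequate to describe the traffic of any one of the merged flows.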

Evaluation Criteria

Evaluating a network element's implementation of controlled delay service is somewhat difficult, since the quality of service depends on overall traffic load, the traffic pattern presented and the degree of delay control implemented. In this section we sketch out a methodology for testing an element's controlled delay service.

The idea is that one chooses a particular traffic mix (for instance, 30 percent level 1, 10 percent level 2, 20 percent level 3 and 40 percent uncontrolled best-effort traffic) and loads the network element with progressively higher amounts of this traffic mix (e.g., 40% of capacity, then 50% of capacity, and so on beyond 100% of capacity). For each load level, one measures the utilization, mean delays, and the packet loss rate for each level of service (including best effort). Each test run at a particular load should involve enough traffic that it is a reasonable predictor of the performance a long-lived application such as a video conference would experience (e.g., an hour or more of traffic).

This memo does not specify particular traffic mixes to test. However, we expect in the future that as the nature of real-time Internet traffic is better understood, the traffic used in these tests will be chosen to reflect the current and future Internet load.

Examples of Implementation

A possible implementation of controlled delay service would be to have a queueing mechanism with three priority levels, with level 1 packets being highest priority and level 3 packets being lowest priority. Each controlled delay service level would be associated with a target queue utilization level, say 20% for level 1, 50% for the combination of levels 1 and 2, and 70% for the combination of all three levels.
The utilization of the link by each of the three levels would be measured over some relatively short time period (say, 5 seconds, or 10000 MTU packet transmission times). A new flow would be admitted to level 1 if the measured usage of level 1, plus the token bucket rate of the new flow, was below the target utilization of level 1. Similarly, a new flow would be admitted to level 2 if the measured usage of levels 1 and 2, plus the token bucket rate of the new flow, was below the target utilization of levels 1 and 2.

Examples of Use

We give two examples of use, both involving an interactive application.

In the first example, we assume that either the receiving application is ignoring characterizations or the network is not delivering the characterizations to the end-nodes. We further assume that the application's data transmission units are timestamped. The receiver, by inspecting the timestamps, can determine the end-to-end delays and react if they are excessive. If they are, the application asks for a better level of service. If the delays are well below the required level, the application can ask for a worse level of service. A protocol useful to applications providing this capability is the proposed IETF Real-Time Transport Protocol [2].

In the second example, we assume that characterization parameters are delivered to the receiving application. The receiver chooses the service level whose characterizations for the maximal delays for all intervals are under the required level after network latencies are considered. If the actual delays during the course of operation are worse than expected, the application can ask for a better level of service.

Security Considerations

Security considerations are not discussed in this memo.

References

[1] S. Shenker and J. Wroclawski. "Network Element Service Specification Template", Internet Draft, June 1995.

[2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. "RTP: A Transport Protocol for Real-Time Applications", Internet Draft, March 1995.

[3] S. Shenker, C. Partridge, B. Davie, and L. Breslau. "Specification of Predictive Quality of Service", Internet Draft, ?? 1995.

Authors' Address:

Scott Shenker
Xerox PARC
3333 Coyote Hill Road
Palo Alto, CA 94304-1314
shenker@parc.xerox.com
415-812-4840
415-812-4471 (FAX)

Craig Partridge
BBN
2370 Amherst St
Palo Alto, CA 94306
craig@bbn.com

John Wroclawski
MIT Laboratory for Computer Science
545 Technology Sq.
Cambridge, MA 02139
jtw@lcs.mit.edu
617-253-7885
617-253-2673 (FAX)