Internet Draft Anna Charny, ed. Cisco Systems Fred Baker Cisco Systems Jon Bennett Riverdelta Networks Kent Benson Tellabs Jean-Yves Le Boudec EPFL Angela Chiu AT&T Labs William Courtney TRW Bruce Davie Cisco Systems Shahram Davari PMC-Sierra Victor Firoiu Nortel Networks Charles Kalmanek AT&T Research K.K. Ramakrishnan TeraOptic Networks Dimitrios Stiliadis Lucent Technologies Expires August, 2001 draft-ietf-diffserv-ef-supplemental-00.txt February 2001 Supplemental Information for the New Definition of the EF PHB Status of this Memo This document is an Internet Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts. Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by Anna Charny, ed. INTERNET-DRAFT [Page 1] Information for the EF PHB Expires: July 1 2001 other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.ietf.org (US East Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim). This document is a product of the Diffserv working group of the Internet Engineering Task Force. This document was written during the process of clarification of RFC2598 [10] that led to the publication of [6]. 
It is published as part of the historical record of the IETF's Differentiated Services working group.

Please address comments to the group's mailing list at diffserv@ietf.org, with a copy to the authors.

Copyright (C) The Internet Society (1999). All Rights Reserved.

Abstract

This document is intended to supplement [6]. Its primary motivation is to provide additional explanation of the revised EF definition and its properties. The document also provides additional implementation examples and gives some guidance for computation of the numerical parameters of the new definition for several well known schedulers and router architectures.

Contents

   1 Introduction
   2 Definition of EF PHB
      2.1 The formal definition
      2.2 The case of an ideal output-buffered device with an EF FIFO at the output
      2.3 The General case
         2.3.1 The colorblind definition
         2.3.2 Packet reordering with the colorblind definition
      2.4 The packet-identity-aware definition
   3 Per Packet delay
      3.1 Single hop delay bound
      3.2 Multi-hop worst case delay
   4 Packet loss
   5 Implementation considerations
      5.1 The output buffered model with aggregate queuing at the output
         5.1.1 Strict Non-preemptive Priority Queue
         5.1.2 WF2Q
         5.1.3 Deficit Round Robin (DRR)
         5.1.4 Start-Time Fair Queuing (SFQ) and Self-Clocked Fair Queuing
      5.2 A Router with Variable Internal Delay and Aggregate Scheduling at the output
   6 Security Considerations
   7 References
   8 Appendix. Difficulties with the RFC 2598 Definition
      8.1 Perfectly-Clocked Forwarding
      8.2 Router Internal Delay
      8.3 Maximum Configurable Rate and Provisioning Efficiency
      8.4 The Non-trivial Nature of the Difficulties
   9 Authors' addresses
   10 Full Copyright

Specification of Requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [3].

1 Introduction

The Expedited Forwarding (EF) Per-Hop Behavior (PHB) was designed to be used to build a low-loss, low-latency, low-jitter, assured bandwidth service. The potential benefits of this service, and therefore the EF PHB, are enormous. Because of the great value of this PHB, it is critical that the forwarding behavior required of and delivered by an EF-compliant node be specific, quantifiable, and unambiguous.
Unfortunately, the definition of EF PHB in the original RFC2598 [10] was not sufficiently precise (see Appendix and [4]). A more precise definition is given in [6]. This document is intended to aid in the understanding of the properties of the new definition and provide supplemental information not included in the text of [6] for the sake of brevity.

The document is outlined as follows. In section 2, we briefly restate the definition for EF PHB of [6]. We then provide some additional discussion of this definition and describe some of its properties. We discuss the issues associated with per-packet delay and loss in sections 3 and 4. In section 5 we discuss the impact of known scheduling architectures on the critical parameters of the new definition. We also discuss the impact of deviation of real devices from the ideal output-buffered model on the magnitude of the critical parameters in the definition.

2 Definition of EF PHB

2.1 The formal definition

An intuitive explanation of the new EF definition is given in [6]. Here we restate the formal definition from [6] verbatim.

A node that supports EF on an interface I at some configured rate R MUST satisfy the following equations:

   d_j <= f_j + E_a                                           (eq_1)

where f_j is defined iteratively by

   f_0 = 0, d_0 = 0

   f_j = max(a_j, min(d_j-1, f_j-1)) + l_j/R, for all j > 0   (eq_2)

In this definition:

- d_j is the time that the last bit of the j-th EF packet to depart actually leaves the node from the interface I.

- f_j is the target departure time for the j-th EF packet to depart from I, the "ideal" time at or before which the last bit of that packet should leave the node.

- a_j is the time that the last bit of the j-th EF packet destined to the output I to arrive actually arrives at the node.

- l_j is the size (bits) of the j-th EF packet to depart from I.
  l_j is measured on the IP datagram (IP header plus payload) and does not include any lower layer (e.g. MAC layer) overhead.

- R is the EF configured rate at output I (in bits/second).

- E_a is the error term for the treatment of the EF aggregate. Note that E_a represents the worst case deviation between actual departure time of an EF packet and ideal departure time of the same packet, i.e. E_a provides an upper bound on (d_j - f_j) for all j.

- d_0 and f_0 do not refer to a real packet departure but are used purely for the purposes of the recursion. The time origin should be chosen such that no EF packets are in the system at time 0.

An EF-compliant node MUST be able to be characterized by the range of possible R values that it can support on each of its interfaces while conforming to these equations, and the value of E_a that can be met on each interface. R may be line rate or less. E_a MAY be specified as a worst-case value for all possible R values or MAY be expressed as a function of R.

Note also that, since a node may have multiple inputs and complex internal scheduling, the j-th packet to arrive may not be the j-th packet to depart. It is in this sense that eq_1 and eq_2 are colorblind with regard to packet identity.

In addition, a node that supports EF on an interface I at some configured rate R MUST satisfy the following equations:

   D_j <= F_j + E_p                                           (eq_3)

where F_j is defined iteratively by

   F_0 = 0, D_0 = 0

   F_j = max(A_j, min(D_j-1, F_j-1)) + L_j/R, for all j > 0   (eq_4)

In this definition:

- D_j is the actual departure time of the individual EF packet that arrived at time A_j, i.e., given a packet which was the j-th EF packet destined for I to arrive at the node via any input, D_j is the time at which the last bit of that individual packet actually leaves the node from the interface I.
- F_j is the target departure time for the individual EF packet which arrived at time A_j.

- A_j is the time that the last bit of the j-th EF packet destined to the output I to arrive actually arrives at the node.

- L_j is the size (bits) of the j-th EF packet to arrive at the node that is destined to output I. L_j is measured on the IP datagram (IP header plus payload) and does not include any lower layer (e.g. MAC layer) overhead.

- R is the EF configured rate at output I (in bits/second).

- E_p is the error term for the treatment of individual EF packets. Note that E_p represents the worst case deviation between actual departure time of an EF packet and ideal departure time of the same packet, i.e. E_p provides an upper bound on (D_j - F_j) for all j.

- D_0 and F_0 do not refer to a real packet departure but are used purely for the purposes of the recursion. The time origin should be chosen such that no EF packets are in the system at time 0.

It is the fact that D_j and F_j refer to departure times for the j-th packet to arrive that makes eq_3 and eq_4 aware of packet identity. This is the critical distinction between the last two equations and the first two.

An EF-compliant node SHOULD be able to be characterized by the range of possible R values that it can support on each of its interfaces while conforming to these equations, and the value of E_p that can be met on each interface. E_p MAY be specified as a worst-case value for all possible R values or MAY be expressed as a function of R. An E_p value of "undefined" MAY be specified.

2.2 The case of an ideal output-buffered device with an EF FIFO at the output

For an ideal output-buffered device with a FIFO for EF packets at the output and no internal delay, the i-th packet to arrive at the device is also the i-th packet to depart from the device.
Therefore, in this ideal model the colorblind and packet-identity-aware characteristics are identical, and E_a = E_p. In this section we therefore omit the subscript and refer to the latency term simply as E.

It can be shown that for such an ideal device the definition of section 2 is stronger than the well-known rate-latency curve [2] in the sense that if a scheduler satisfies the EF definition it also satisfies the rate-latency curve. As a result, all the properties known for the rate-latency curve also apply to the modified EF definition. However, we argue below that the definition of section 2.1 is more suitable to reflect the intent of EF PHB than the rate-latency curve.

It is shown in [2] that the rate-latency curve is equivalent to the following definition:

Definition of Rate Latency Curve (RLC):

   D(j) <= F'(j) + E                                              (eq_5)

where

   F'(0) = 0, F'(j) = max(a(j), F'(j-1)) + L(j)/R for all j > 0   (eq_6)

It can be easily verified that the EF definition of section 2.1 is stronger than RLC by noticing that for all j, F'(j) >= F(j).

It is easy to see that F'(j) in the definition of RLC corresponds to the time at which the j-th departure would have occurred had the EF aggregate been constantly served exactly at its configured rate R. Following the common convention, we refer to F'(j) as the "fluid finish time" of the j-th packet to depart.

The intuitive meaning of the rate-latency curve of RLC is that any packet is served at most time E later than this packet would finish service in the fluid model.

For RLC (and hence for the stronger EF definition) it holds that in any interval (0,t) the EF aggregate gets close to the desired service rate R (as long as there is enough traffic to sustain this rate). The discrepancy between the ideal and the actual service in this interval depends on the latency term E, which in turn depends on the scheduling implementation.
The smaller E, the smaller the difference between the configured rate and the actual rate achieved by the scheduler.

While RLC guarantees the desired rate to the EF aggregate in all intervals (0,t) to within a specified error, it may nevertheless result in large gaps in service. For example, suppose that a large number N of identical EF packets of length L arrive from different interfaces at the EF queue in the absence of any non-EF traffic. Then any work-conserving scheduler will serve all N packets at link speed. When the last packet is sent at time NL/C, where C is the capacity of the output link, F'(N) will be equal to NL/R. That is, the scheduler is running ahead of ideal, since NL/C < NL/R for R < C. Suppose now that at time NL/C a large number of non-EF packets arrive, followed by a single EF packet. Then the scheduler can legitimately delay starting to send the EF packet until time F'(N+1) + E - L/C = (N+1)L/R + E - L/C. This means that the EF aggregate will have no service at all in the interval (NL/C, (N+1)L/R + E - L/C). This interval can be quite large if R is substantially smaller than C. In essence, the EF aggregate can be "punished" by a gap in service for receiving faster service than its configured rate at the beginning.

The new EF definition alleviates this problem by introducing the term min(D(j-1), F(j-1)) in the recursion. Essentially, this means that the fluid finishing time is "reset" if a packet is sent before its "ideal" departure time. As a consequence, for the case where the EF aggregate is served in FIFO order, a packet that arrives at time t at a server satisfying the EF definition will be transmitted no later than time t + Q(t)/R + E, where Q(t) is the EF queue size at time t (including the packet under discussion). This statement is proved in [4].
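The service-gap example above can be checked numerically. The following is a minimal sketch in Python and is not part of [6]; the function name target_times and the numeric values (N = 10, L = 1000 bits, C = 10000 bit/s, R = 1000 bit/s) are assumptions chosen only for illustration. It evaluates the RLC recursion (eq_6) and the EF recursion (eq_2) for N packets that arrive at time 0 and are served at link speed, followed by one more EF packet that arrives at time NL/C.

```python
from fractions import Fraction


def target_times(arrivals, departures, lengths, rate, reset=True):
    """Target departure times for packets 1..n (FIFO, so arrival order
    equals departure order).

    reset=True  uses the EF recursion (eq_2):
        f_j = max(a_j, min(d_(j-1), f_(j-1))) + l_j/R
    reset=False uses the RLC recursion (eq_6):
        F'_j = max(a_j, F'_(j-1)) + l_j/R
    """
    targets = []
    prev_target = Fraction(0)   # f_0 = 0
    prev_depart = Fraction(0)   # d_0 = 0
    for a, d, l in zip(arrivals, departures, lengths):
        base = min(prev_depart, prev_target) if reset else prev_target
        target = max(a, base) + Fraction(l) / rate
        targets.append(target)
        prev_target, prev_depart = target, d
    return targets


# Illustrative values (assumptions, not taken from the draft).
N, L, C, R = 10, 1000, 10_000, 1_000         # packets, bits, bit/s, bit/s

burst_done = Fraction(N * L, C)              # time NL/C: the burst has left
arrivals = [Fraction(0)] * N + [burst_done]  # last EF packet arrives at NL/C
# The burst is served at link speed. The departure time of the last packet
# is a placeholder: it does not enter that packet's own target time.
departures = [Fraction(j * L, C) for j in range(1, N + 1)]
departures.append(burst_done + Fraction(L, C))
lengths = [L] * (N + 1)

rlc_target = target_times(arrivals, departures, lengths, R, reset=False)[-1]
ef_target = target_times(arrivals, departures, lengths, R, reset=True)[-1]

print("RLC target for packet N+1:", rlc_target)   # equals (N+1)L/R
print("EF  target for packet N+1:", ef_target)    # equals NL/C + L/R
```

With these assumed values the RLC target for the last packet is 11 seconds, whereas the EF target is 2 seconds, because the min() term pulls the recursion back to the actual departure time of the burst. Adding E to either target gives the latest permitted departure time under the respective definition.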
2.3 The General case

In a more general case, where either the output scheduler does not serve the EF packets in a FIFO order, or the variable internal delay in the device reorders packets while delivering them to the output (or both), the i-th packet destined to a given output interface to arrive at the device may no longer be the i-th packet to depart from that interface. In that case the packet-identity-aware and the colorblind definitions are no longer identical.

2.3.1 The colorblind definition

The colorblind definition can be viewed as a truly aggregate characteristic of the service provided to EF packets. For an analogy consider a dark reservoir into which all arriving packets are placed. A scheduler is allowed to pick a packet from the reservoir in a random order, without any knowledge of the order of packet arrivals. The colorblind part of the definition measures the accuracy of the output rate provided to the EF aggregate as a whole. The smaller E_a, the more accurate is the assurance that the reservoir is drained at least at the configured rate.

2.3.2 Packet reordering with the colorblind definition

Note that in this reservoir analogy packets of the EF aggregate may be arbitrarily reordered. However, the definition of EF PHB given in [6] explicitly requires that no packet reordering occur within a microflow. This requirement restricts the scheduling implementations, or, in the reservoir analogy, the order of pulling packets out of the reservoir to make sure that packets within a microflow are not reordered, but it still allows reordering at the aggregate level.

Note that reordering within the aggregate, as long as there is no flow-level reordering, does not necessarily reflect a "bad" service. Consider for example a scheduler that arbitrates among 10 different EF "flows" with diverse rates.
A scheduler that is aware of the rate requirements may choose to send a packet of the faster flow before a packet of the slower flow to maintain lower jitter at the flow level. In particular, an ideal "flow"-aware WFQ scheduler will cause reordering within the aggregate, while maintaining packet ordering and small jitter at the flow level.

It is intuitively clear that for such a scheduler, as well as for a simpler FIFO scheduler, the "accuracy" of the service rate is crucial for minimizing "flow"-level jitter. The colorblind definition quantifies this accuracy of the service rate. A small value of E_a is a necessary (although not sufficient) condition for maintaining small per-flow jitter.

However, a small value of E_a does not give any assurances about the absolute value of per-packet delay. In fact, if the input rate exceeds the configured rate, the colorblind definition may result in arbitrarily large delay of a subset of packets. This is the primary motivation for the packet-identity-aware definition.

2.4 The packet-identity-aware definition

The primary goal of the packet-identity-aware characterization of the EF implementation is that, unlike the colorblind characterization, it provides a way to find a per-packet delay bound as a function of input traffic parameters.

While the colorblind definition characterizes the accuracy of the service rate of the entire EF aggregate, the packet-identity-aware part of the definition characterizes the deviation of the device from an ideal server that serves the EF aggregate in FIFO order at least at the configured rate.

The value of E_p in the packet-identity-aware definition is therefore affected by two factors: the accuracy of the aggregate rate service and the degree of packet reordering within the EF aggregate (under the constraint that packets within the same microflow are not reordered).
Therefore, a sub-aggregate aware device that provides an ideal service rate to the aggregate, and also provides an ideal rate service for each of the sub-aggregates, may nevertheless have a very large value of E_p (in this case E_p must be at least equal to the ratio of the maximum packet size to the smallest rate of any sub-aggregate). As a result, a large value of E_p does not necessarily mean that the service provided to the EF aggregate is bad; rather, it may be an indication that the service is good, but non-FIFO. On the other hand, a large value of E_p may also mean that the aggregate service is very inaccurate (bursty), and hence in this case the large value of E_p reflects a poor quality of implementation.

As a result, a large value of E_p does not necessarily provide any guidance on the quality of the EF implementation. However, a small value of E_p does indicate a high quality FIFO implementation.

Since E_p and E_a relate to different aspects of the EF implementation, they should be considered together to determine the quality of the implementation.

3. Per Packet delay

The primary motivation for the packet-identity-aware definition is that it allows the per-packet delay bound to be quantified. This section discusses the issues with computing per-packet delay.

3.1 Single hop delay bound

If the total traffic arriving at an output port I from all inputs is constrained by a leaky bucket with parameters (R, B), where R is the configured rate at I, and B is the bucket depth (burst), then the delay of any packet departing from I is bounded by D_p, given by

   D_p = B/R + E_p                                            (eq_7)

Because the delay bound depends on the configured rate R and the input burstiness B, it is desirable for both of these parameters to be visible to a user of the device. A PDB desiring a particular delay bound may need to limit the range of configured rates and allowed burstiness that it can support in order to deliver such a bound.
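As a rough illustration of how (eq_7) might be used, the sketch below computes the single-hop bound for one operating point and then inverts it to find the largest burst that keeps the bound within a target delay at several configured rates. The function names and all numeric values (R = 10 Mbit/s, B = 30000 bits, E_p = 1 ms, target delay 5 ms) are assumptions made for this example and do not come from [6].

```python
from fractions import Fraction


def delay_bound(burst_bits, rate_bps, e_p):
    """Per-packet delay bound D_p = B/R + E_p of (eq_7), in seconds."""
    return Fraction(burst_bits, rate_bps) + e_p


def max_burst(delay_target, rate_bps, e_p):
    """Largest bucket depth B (bits) for which (eq_7) stays within
    delay_target; zero if E_p alone already exceeds the target."""
    return max(Fraction(0), (delay_target - e_p) * rate_bps)


# Illustrative values (assumptions, not taken from the draft).
R = 10_000_000                 # configured rate, bits/second
B = 30_000                     # leaky bucket depth, bits
E_P = Fraction(1, 1000)        # advertised E_p, seconds
TARGET = Fraction(5, 1000)     # desired per-hop delay bound, seconds

bound = delay_bound(B, R, E_P)

# Acceptable burst for several candidate configured rates.
region = {r: max_burst(TARGET, r, E_P)
          for r in (1_000_000, 5_000_000, 10_000_000)}

print("D_p =", float(bound), "s")
for r, b in region.items():
    print(f"R = {r} bit/s -> B <= {int(b)} bits")
```

Under these assumptions the bound is 4 ms, and the admissible burst grows linearly with the configured rate (4000 bits at 1 Mbit/s, 40000 bits at 10 Mbit/s).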
Equation (eq_7) provides a means for determining an acceptable operating region for the device with a given E_p. It may also be useful to limit the total offered load to a given output to some rate R_1 < R (e.g. to obtain end-to-end delay bounds [5]). It is important to realize that, while R_1 may also be a configurable parameter of the device, the delay bound in (eq_7) does not depend on it. It may be possible to get better bounds explicitly using the bound on the input rate, but the bound (eq_7) does not take advantage of this information.

3.2 Multi-hop worst case delay

Although the PHB defines inherently local behavior, in this section we briefly discuss the issue of per-packet delay as the packet traverses several hops implementing EF PHB. Given a delay bound (eq_7) at a single hop, it is tempting to conclude that the per-packet bound across h hops is simply h times the bound (eq_7). However, this is not necessarily the case, unless B represents the worst case input burstiness across all nodes in the network. Obtaining such a worst case value of B, however, is not trivial. If EF PHB is implemented using aggregate class-based scheduling where all EF packets share a single FIFO, the so-called effect of jitter accumulation may result in an increase in burstiness from hop to hop. In particular, it can be shown that unless severe restrictions on EF utilization are imposed, even if all EF flows are ideally shaped at the ingress, then for any value of delay D it is possible to construct a network where EF utilization on any link is bounded not to exceed a given factor, no flow traverses more than a specified number of hops, but there exists a packet that experiences a delay of more than D [5].
This result implies that the ability to limit the worst case burstiness and the resulting end-to-end delay across several hops may require not only limiting EF utilization on all links, but also constraining the global network topology. Such topology constraints would need to be specified in the definition of any PDB built on top of EF PHB, if such a PDB requires a strict worst case delay bound.

4. Packet loss

Any device with finite buffering may need to drop packets if the input burstiness becomes sufficiently high. To meet the low loss objective of EF, a node may be characterized by the operating region in which loss of EF due to congestion will not occur. This may be specified as a token bucket of rate r <= R and burst size B that can be offered from all inputs to a given output interface without loss.

However, as discussed in the previous section, the phenomenon of jitter accumulation makes it generally difficult to guarantee that the input burstiness never exceeds the specified operating region. A no-loss guarantee across multiple hops may require specification of constraints on network topology which are outside the scope of the inherently local definition of a PHB. Thus, it must be possible to establish whether a device conforms to the EF definition even when some packets are lost.

This can be done by performing an "off-line" test of conformance to equations (eq_1)-(eq_4). After observing a sequence of packets entering and leaving the node, the packets which did not leave are assumed lost and are notionally removed from the input stream. The remaining packets now constitute the arrival stream and the packets which left the node constitute the departure stream. Conformance to the equations can thus be verified by considering only those packets that successfully passed through the node.

Note that in the event that loss does occur, the specification of which packets are lost is beyond the scope of the definition of EF PHB.
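One possible shape for the "off-line" test described above is sketched below in Python. It is an illustration only: the trace layout (dictionaries keyed by a packet identifier), the function name observed_error_terms and the sample trace are hypothetical and are not defined in [6]. The function removes lost packets from the arrival stream and returns the smallest E_a and E_p with which the observed trace is consistent; a device would then be judged against its advertised error terms.

```python
def observed_error_terms(arrivals, departures, rate):
    """Smallest E_a and E_p consistent with one observed trace.

    arrivals:   {packet_id: (arrival_time, length_bits)}  (hypothetical layout)
    departures: {packet_id: departure_time}; lost packets have no entry
    rate:       configured EF rate R in bits/second
    """
    # Notionally remove lost packets, then order survivors by arrival time
    # (ties are broken by packet id, an arbitrary choice of this sketch).
    survivors = sorted(
        (t, pid, bits) for pid, (t, bits) in arrivals.items() if pid in departures
    )

    # Packet-identity-aware part (eq_3, eq_4): index j follows arrival order.
    e_p = float("-inf")
    F = D_prev = 0.0
    for a, pid, bits in survivors:
        F = max(a, min(D_prev, F)) + bits / rate
        D_prev = departures[pid]
        e_p = max(e_p, D_prev - F)

    # Colorblind part (eq_1, eq_2): the j-th arrival time is paired with the
    # j-th departure time and the length of the j-th packet to depart.
    by_departure = sorted((departures[pid], bits) for _, pid, bits in survivors)
    e_a = float("-inf")
    f = d_prev = 0.0
    for (a, _, _), (d, bits) in zip(survivors, by_departure):
        f = max(a, min(d_prev, f)) + bits / rate
        d_prev = d
        e_a = max(e_a, d - f)

    return e_a, e_p


# Hypothetical trace: p3 is lost, and p2 overtakes p1 inside the node.
arrivals = {"p1": (1.0, 1000), "p2": (1.0, 1000),
            "p3": (1.5, 1000), "p4": (2.0, 1000)}
departures = {"p2": 2.0, "p1": 3.5, "p4": 4.0}

e_a, e_p = observed_error_terms(arrivals, departures, rate=1000)
print("observed E_a =", e_a, " observed E_p =", e_p)
```

For this trace the observed E_a is 0.5 s and the observed E_p is 1.5 s. The larger E_p comes from the reordering of p1 and p2, which the colorblind equations do not see.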
However, those packets that were not lost must conform to the equations of the definition of EF PHB in section 2.1.

5. Implementation considerations

A packet passing through a router will experience delay for a number of reasons. Two familiar components of this delay are the time the packet sits in a buffer at an outgoing link waiting for the scheduler to select it and the time it takes to actually transmit the packet on the outgoing line.

There may be other components of a packet's delay through a router, however. A router might have to do some amount of header processing before the packet can be given to the correct output scheduler, for example. In another case a router may have a FIFO buffer (called a transmission queue in [7]) where the packet sits after being selected by the output scheduler but before it is transmitted. In cases such as these, the extra delay a packet may experience can be accounted for by absorbing it into the latency terms E_a and E_p.

Implementing EF on a router with a multi-stage switch fabric requires special attention. A packet may experience additional delays due to the fact that it must compete with other traffic for forwarding resources at multiple contention points in the switch core. The delay an EF packet may experience before it even reaches the output-link scheduler should be included in the latency term. Input-buffered and input/output-buffered routers based on crossbar design may also require modification of their latency terms. Factors such as the speedup factor and the choice of crossbar arbitration algorithms may affect the latency terms substantially.

Delay in the switch core comes from two sources, both of which must be considered. The first part of this delay is the fixed delay a packet experiences regardless of the other traffic.
This component of the delay includes the time it takes for things such as packet segmentation and reassembly in cell based cores, enqueueing and dequeueing at each stage, and transmission between stages. The second part of the switch core delay is variable and depends on the type and amount of other traffic traversing the core. This delay comes about if the stages in the core mix traffic flowing between different input/output port pairs. Thus, EF packets must compete against other traffic for forwarding resources in the core. Some of this competing traffic may even be EF traffic from other aggregates. This introduces extra delay that can also be absorbed by the latency term in the definition.

To capture these considerations, in this section we will consider two simplified implementation examples. The first is an ideal output buffered node where packets entering the device from an input interface are immediately delivered to the output scheduler. In this model the properties of the output scheduler fully define the values of the parameters E_a and E_p. We will consider the case where the output scheduler implements aggregate class-based queueing, so that all EF packets share a single queue. We will discuss the values of E_a and E_p for a variety of class-based schedulers widely considered acceptable for EF implementations.

The second example will consider a router modeled as a black box with a known bound on the variable delay a packet can experience from the time it arrives at an input to the time it is delivered to its destination output. The output scheduler in isolation is assumed to be an aggregate scheduler with a known value of E_a(S)=E_p(S)=E(S). This model provides a reasonable abstraction of a large class of router implementations.

5.1. The output buffered model with aggregate queuing at the output.
As has been mentioned earlier, in this model E_a = E_p, so we shall omit the subscript and refer to both terms as latency E. The remainder of this subsection discusses E for a number of scheduling implementations.

5.1.1 Strict Non-preemptive Priority Queue

A Strict Priority scheduler in which all EF packets share a single FIFO queue which is served at strict non-preemptive priority over other queues satisfies the EF definition with the latency term E = MTU/C, where MTU is the maximum packet size and C is the speed of the output link.

5.1.2 WF2Q

Another scheduler that satisfies the EF definition with a small latency term is WF2Q described in [1]. A class-based WF2Q scheduler, in which all EF traffic shares a single queue with the weight corresponding to the configured rate of the EF aggregate, satisfies the EF definition with the latency term E = MTU/C + MTU/R.

5.1.3. Deficit Round Robin (DRR)

For DRR [12], E can be shown to grow linearly with N*(r_max/r_min)*MTU, where r_min and r_max denote the smallest and the largest rate among the rate assignments of all queues in the scheduler, and N is the number of queues in the scheduler.

5.1.4. Start-Time Fair Queuing (SFQ) and Self-Clocked Fair Queuing (SCFQ)

For SFQ [9] and SCFQ [8] E can be shown to grow linearly with the number of queues in the scheduler.

5.2. A Router with Variable Internal Delay and Aggregate Scheduling at the output.

In this section we consider a router which is modeled as follows. A packet entering the router may experience a variable delay D_v with a known upper bound D. That is, 0 <= D_v <= D.

[...]

... R > C/2, then this interval would be of length greater than MTU/R, and more than MTU EF bits would have to be served during this interval for the router to be compliant. Thus, R must be no greater than C/2.

It can be shown that for schedulers other than PQ, such as various implementations of WFQ, the maximum compliant configured rate may be much smaller than 50%. For example, for SCFQ [8] the maximum configured rate cannot exceed C/N, where N is the number of queues in the scheduler. For WRR, mentioned as compliant in section 2.2 of RFC 2598, this limitation is even more severe. This is because in these schedulers a packet arriving at an empty EF queue may be forced to wait until one packet from each other queue (in the case of SCFQ) or until several packets from each other queue (in the case of WRR) are served before it will finally be forwarded.

While it is frequently assumed that the configured rate of EF traffic will be substantially smaller than the link bandwidth, the requirement that this rate should never exceed 50% of the link bandwidth appears unnecessarily limiting. For example, in a fully connected mesh network, where any flow traverses a single link on its way from source to its destination, there seems to be no compelling reason to limit the amount of EF traffic to 50% (or an even smaller percentage for some schedulers) of the link bandwidth.

Another, perhaps even more striking example is the fact that even a TDM circuit with dedicated slots cannot be configured to forward EF packets at more than 50% of the link speed without violating RFC 2598 (unless the entire link is configured for EF). If the configured rate of EF traffic is greater than 50% (but less than the link speed), there will always exist an interval longer than MTU/R in which less than the configured rate is achieved. For example, suppose the configured rate of the EF aggregate is 2C/3. Then the forwarding pattern of the TDM circuit might be

      E E x E E x E E x ...
        |---|

where only one packet is served in the marked interval of length 2T = 2MTU/C. But at least 4/3 MTU would have to be served during this interval by a router in compliance with the definition in RFC 2598.
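The arithmetic of the TDM example can be reproduced with a short Python sketch. It assumes the reading of RFC 2598 used in this appendix, namely that over any interval at least MTU/R long the node must serve at least R times the interval length in EF bits; the variable names and the choice of units (MTU for bits, T = MTU/C for time, so that C = 1) are assumptions of the sketch.

```python
from fractions import Fraction

# Units: bits are counted in MTUs and time in slots T = MTU/C, so C = 1.
C = Fraction(1)
R = Fraction(2, 3) * C             # configured EF rate 2C/3
pattern = "EEx" * 4                # TDM slots: E = EF packet, x = other traffic

window = 2                         # interval length 2T, in slots
long_enough = window >= C / R      # True when the interval is at least MTU/R

# EF packets (MTUs) served in every slot-aligned window of two slots.
served = [pattern[i:i + window].count("E")
          for i in range(len(pattern) - window + 1)]
worst_served = min(served)         # MTUs actually served in the worst window
required = R * window              # MTUs that would have to be served in 2T

print("interval qualifies:", long_enough)
print("worst-case service:", worst_served, "MTU")
print("required service:  ", required, "MTU")
```

The worst two-slot window contains a single EF packet, while the rate requirement over the same window amounts to 4/3 MTU, which is the shortfall described in the text.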
The fact that even a TDM line cannot be booked over 50% by EF traffic indicates that the restriction is artificial and unnecessary.

8.4 The Non-trivial Nature of the Difficulties

One possibility to correct the problems discussed in the previous sections might be to attempt to clarify the definition of the intervals to which the definition applies, or to average over multiple intervals. However, an attempt to do so meets with considerable analytical and implementation difficulties. For example, attempting to align interval start times with some epochs of the forwarded stream appears to require a certain degree of global clock synchronization and is fraught with the risk of misinterpretation and mistake in practice.

Another approach might be to allow averaging of the rates over some larger time scale. However, it is unclear exactly what finite time scale would suffice in all reasonable cases. Furthermore, this approach would compromise the notion of very short-term time scale guarantees that are the essence of EF PHB.

We also explored a combination of two simple fixes. The first is the addition of the condition that the only intervals subject to the definition are those that fall inside a period during which the EF aggregate is continuously backlogged in the router (i.e., when an EF packet is in the router). The second is the addition of an error (latency) term that could serve as a figure-of-merit in the advertising of EF services.

With these two changes, the candidate definition becomes as follows:

   In any interval of time (t1, t2) in which EF traffic is continuously backlogged, at least R(t2 - t1 - E) bits of EF traffic must be served, where R is the configured rate for the EF aggregate and E is an implementation-specific latency term.
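As a sanity check, the candidate definition can be applied to the TDM forwarding pattern E E x used earlier. The Python sketch below is a minimal illustration under stated assumptions: the EF aggregate is backlogged throughout, intervals are slot-aligned, MTU and C are normalized to 1, and the latency term is taken to be E = MTU/C (one slot). The function names are hypothetical.

```python
from fractions import Fraction


def meets_candidate_definition(served, t1, t2, rate, latency):
    """True if at least R*(t2 - t1 - E) was served in the backlogged interval (t1, t2)."""
    return served >= rate * (t2 - t1 - latency)


def ef_served(pattern, start, length):
    """EF packets served in `length` consecutive slots starting at slot `start`."""
    return sum(
        1
        for i in range(start, start + length)
        if pattern[i % len(pattern)] == "E"
    )


# Assumed normalization: MTU = 1 packet, C = 1 packet per slot.
pattern = "EEx"        # two EF slots followed by one non-EF slot
R = Fraction(2, 3)     # configured EF rate of 2C/3
E = 1                  # assumed latency term MTU/C, in slots


def all_windows_ok(latency, max_len=30):
    """Check every slot-aligned window of 1..max_len slots against the definition."""
    return all(
        meets_candidate_definition(ef_served(pattern, s, n), s, s + n, R, latency)
        for n in range(1, max_len + 1)
        for s in range(len(pattern))
    )


with_latency = all_windows_ok(E)     # candidate definition with E = MTU/C
without_latency = all_windows_ok(0)  # same check with no latency term

print(with_latency, without_latency)
```

With these assumptions every checked window satisfies the bound when E = MTU/C, while dropping the latency term makes the two-slot window that spans the non-EF slot fail.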
The "continuously backlogged" condition eliminates the insufficient-packets-to-forward difficulty, while the addition of the latency term of size MTU/C resolves the perfectly-clocked forwarding example (section 1.2.1) and also removes the limitation on EF configured rate.

However, neither fix (nor the two of them together) resolves the example of section 1.2.2. To see this, recall that in the example of section 1.2.2 the EF aggregate is continuously backlogged, but the service rate of the EF aggregate is consistently smaller than the configured rate, and therefore no finite latency term will suffice to bring the example into conformance.

9. Authors' addresses

Anna Charny, ed.
Cisco Systems
300 Apollo Drive
Chelmsford, MA 01824
acharny@cisco.edu

Fred Baker
Cisco Systems
170 West Tasman Dr.
San Jose, CA 95134
fred@cisco.com

Jon Bennett
RiverDelta Networks
3 Highwood Drive East
Tewksbury, MA 01876
jcrb@riverdelta.com

Kent Benson
Tellabs Research Center
3740 Edison Lake Parkway #101
Mishawaka, IN 46545
Kent.Benson@tellabs.com

Jean-Yves Le Boudec
ICA-EPFL, INN
Ecublens, CH-1015
Lausanne-EPFL, Switzerland
leboudec@epfl.c

Angela Chiu
AT&T Labs
100 Schulz Dr. Rm 4-204
Red Bank, NJ 07701
alchiu@att.com

Bill Courtney
TRW
Bldg. 201/3702
One Space Park
Redondo Beach, CA 90278
bill.courtney@trw.com

Shahram Davari
PMC-Sierra Inc
555 Legget Drive
Suite 834, Tower B
Ottawa, ON K2K 2X3, Canada
shahram_davari@pmc-sierra.com

Bruce Davie
Cisco Systems
300 Apollo Drive
Chelmsford, MA 01824
bsd@cisco.com

Victor Firoiu
Nortel Networks
600 Tech Park
Billerica, MA 01821
vfirou@nortelnetworks.com

Charles Kalmanek
AT&T Labs-Research
180 Park Avenue, Room A113,
Florham Park NJ
crk@research.att.com

K.K. Ramakrishnan
AT&T Labs-Research
Rm. A155, 180 Park Ave,
Florham Park, NJ 07932
kkrama@research.att.com

Dimitrios Stiliadis
Lucent Technologies
1380 Rodick Road
Markham, Ontario, L3R-4G5, Canada
stiliadi@bell-labs.com

10. Full Copyright

Copyright (C) The Internet Society 1999. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.