Network Working Group                                      A. Charny
Internet Draft                                                 Cisco
Document: draft-charny-ef-definition-01.txt                 Nov 2000

                          EF PHB Redefined

   Anna Charny, ed.       Cisco Systems
   Fred Baker             Cisco Systems
   Jon Bennett            Riverdelta Networks
   Kent Benson            Tellabs
   Jean-Yves Le Boudec    EPFL
   Angela Chiu            AT&T Labs
   William Courtney       TRW
   Bruce Davie            Cisco Systems
   Shahram Davari         PMC-Sierra
   Victor Firoiu          Nortel Networks
   Charles Kalmanek       AT&T Research
   K.K. Ramakrishnan      AT&T Research
   Dimitrios Stiliadis    Lucent Technologies

Expires May 2001

Status of this Memo

This document is an Internet Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.ietf.org (US East Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

This document is a product of the Diffserv working group of the Internet Engineering Task Force. Please address comments to the group's mailing list at diffserv@ietf.org, with a copy to the authors.

Copyright (C) The Internet Society (1999). All Rights Reserved.
Abstract

This document proposes text that clarifies the definition of the EF PHB given in RFC 2598. It gives a rigorous definition of the EF PHB which, in our opinion, preserves the spirit of the EF PHB as intended by RFC 2598 while allowing a number of reasonable compliant implementations.

1 Introduction

The Expedited Forwarding (EF) Per-Hop Behavior (PHB) of RFC 2598 was designed to be used to build a low-loss, low-latency, low-jitter, assured bandwidth service. The potential benefits of this service, and therefore of the EF PHB, are enormous. Because of the great value of this PHB, it is critical that the forwarding behavior required of and delivered by an EF-compliant node be specific, quantifiable, and unambiguous.

The underlying intuition behind the EF PHB, as defined in RFC 2598, stems from the fact that delay and jitter are typically small in a lightly loaded network. The EF PHB, as defined in RFC 2598, effectively defines a building block for creating a "virtual unloaded network" for EF traffic. It achieves this goal by requiring that the service rate of the EF aggregate at any link equal or exceed the input rate of EF traffic at that link (under the assumption that the network is appropriately provisioned and that EF traffic is shaped/policed at the network ingress). Conceptually, the configured rate of the EF aggregate can be viewed as the "link speed" in this "virtual network".

Specifying this "link speed" is not by itself sufficient to provide strict delay or jitter guarantees in a general network. Nevertheless, knowing this "link speed", that is, the minimal guaranteed drain rate of EF traffic, is essential for constructing quantifiable end-to-end behavior across a Diffserv domain. Thus, the definition of EF PHB in RFC 2598 is indeed a necessary building block for constructing quantifiable PDBs.
Unfortunately, we believe that the actual definition contained in section 2 of RFC 2598 is not sufficiently precise. As a result, many forwarding behaviors which are intuitively reasonable do not actually comply with the formal definition of RFC 2598. Furthermore, many of the schedulers believed to deliver EF-compliant behavior cannot be used to implement the formal definition of EF, since they result in forwarding treatment which does not comply with the definition of RFC 2598. A detailed discussion of the issues we find with the definition of RFC 2598 is given in Appendix A.

The goal of this draft is to give a precise mathematical definition that describes the notion of ensuring a guaranteed service rate for an EF aggregate at a small timescale, thus presenting a formal framework for constructing an "unloaded virtual network" for EF traffic. Different PDBs may be constructed from this basic building block by imposing various restrictions on the network topology, configuration parameters, scheduling disciplines, etc. These mechanisms are outside the scope of this draft.

2 Definition of EF PHB

2.1 The Formal Definition

2.1.1 Intuition behind the definition

The intent of EF PHB is to provide the EF aggregate with its configured service rate (or better) over as small a timescale as possible. We formalize this notion by introducing what we call a "packet scale rate guarantee".

The intuitive meaning of the packet scale rate guarantee is that as long as there are EF packets in the node, we would like the j-th EF packet of length L(j) to depart no later than L(j)/R seconds after the (j-1)st departed (here R is the configured rate of the aggregate). (L(j)/R is simply the time that it would take to forward the j-th packet at the EF-configured rate R.) Were this always to occur, the EF packets would be forwarded perfectly at the configured rate.
However, real-world schedulers and router architectures introduce various degrees of distortion in the perfect forwarding sequence. Furthermore, packets clearly cannot be forwarded at the configured rate if they arrive more slowly than that rate. The formal definition must account for these issues.

In essence, the packet scale rate guarantee is defined in terms of an upper bound on the deviation of the actual departure time of the j-th packet of the EF aggregate from the "ideal" departure time at configured rate R. The "ideal" departure time is computed iteratively. Essentially, when there are multiple EF packets in the device, the ideal time of the j-th departure is simply the ideal time of the previous departure plus L(j)/R, where L(j) is the length of the j-th packet to depart.

When an EF packet arrives at a device after all the previous packets have already departed, the computation of the ideal departure time is somewhat more complicated. There are two cases to consider. If the previous, (j-1)-th departure occurred after its own ideal departure time, then the new ideal departure time should be L(j)/R plus the larger of the (j-1)-th ideal departure time and the j-th arrival time. This is the case when the EF aggregate is behind its ideal service rate at the time of the (j-1)-th departure. However, if the previous departure occurred before its ideal departure time, which corresponds to the case when the EF aggregate has been served faster than its configured rate by the time of the (j-1)-th departure, then the new ideal departure time is computed as L(j)/R plus the larger of the j-th arrival time and the time of the actual (rather than the ideal) (j-1)-th departure. This is needed to avoid "punishing" the newly arrived EF packet by delaying it longer because some other packets received service faster than the configured rate R in the past. More discussion of this issue can be found in appendices A and E.
2.1.2 The Formal Definition

Formally, we say that a node provides EF service if it forwards packets in compliance with the following definition:

Definition of Packet Scale Rate Guarantee (DEF_1)
-------------------------------------------------

A node offers the EF aggregate a "packet scale rate guarantee R with latency E" at some output interface I if for all j > 0, d(j), the time of departure of the j-th EF packet to depart from the interface I, satisfies the following condition:

   d(j) <= F(j) + E                                             (eq_1)

where F(j) is defined iteratively by

   F(0) = 0, d(0) = 0

   F(j) = max(a(j), min(d(j-1), F(j-1))) + L(j)/R, for all j > 0   (eq_2)

and E is a constant tolerance (or error) term for the node (given in seconds).

In this definition,

   d(j) is the time that the last bit of the j-th EF packet to depart actually leaves the node from the interface I.

   F(j) is the target departure (finishing) time for the j-th EF packet to depart from I, the "ideal" time that the last bit of that packet should leave the node.

   a(j) is the time that the last bit of the j-th EF packet destined to the output I to arrive actually arrives at the node.

   L(j) is the size (in bits) of the j-th EF packet to depart from I.

   R is the EF configured rate at I (in bits/second).

Note that the sequences a(j), d(j) and F(j) relate to packets that leave a given output interface, in this case interface I, but may arrive from any input interface. Every OUTPUT interface I, J, K, etc. has its own sequence of a(j)'s, d(j)'s and F(j)'s, i.e. a_I(j), a_J(j), a_K(j), etc. For clarity we omit the subscript, since it can be inferred.

The choice of indexes does not restrict when in the actual packet stream we start the observation of the arrival and departure of EF packets, except that the observation must start when there are no EF packets in the node for this output interface. (Otherwise, we would not have the values of a(j) for the EF packets already in the node.)
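As an illustration only, the conformance test expressed by (eq_1) and (eq_2) can be sketched in a few lines of Python. The function name check_packet_scale_rate, its argument layout, and the use of exact fractions are assumptions made for this sketch; they are not part of the definition. The sketch takes the observed a(j), d(j) and L(j) sequences for one output interface and reports, for each j, the ideal departure time F(j), the latest permissible departure time F(j) + E, and whether d(j) conforms.

```python
from fractions import Fraction


def check_packet_scale_rate(arrivals, departures, lengths, rate, latency):
    """Evaluate (eq_1) and (eq_2) for one output interface.

    arrivals[k]   is a(j) for j = k + 1 (j-th EF packet to arrive)
    departures[k] is d(j) for j = k + 1 (j-th EF packet to depart)
    lengths[k]    is L(j) for j = k + 1 (size of the j-th packet to depart)
    rate          is the configured rate R
    latency       is the error term E

    Returns a list of (F(j), F(j) + E, conformant) tuples.
    All quantities must use consistent units.
    """
    f_prev = 0  # F(0) = 0
    d_prev = 0  # d(0) = 0
    results = []
    for a_j, d_j, l_j in zip(arrivals, departures, lengths):
        # (eq_2): ideal departure time of the j-th packet to depart
        f_j = max(a_j, min(d_prev, f_prev)) + l_j / rate
        # (eq_1): the actual departure must not exceed F(j) + E
        results.append((f_j, f_j + latency, d_j <= f_j + latency))
        f_prev, d_prev = f_j, d_j
    return results


# Hypothetical usage with the arrival and departure times of the
# conformant example in Section 2.1.3: time is measured in units of
# MTU/C, every packet is MTU-sized (length 1 in these units),
# R = C/2 and E = 4.
example = check_packet_scale_rate(
    arrivals=[0, 1, 3, 5, 6, 6, 9],
    departures=[1, 3, 6, 10, 11, 12, 14],
    lengths=[1] * 7,
    rate=Fraction(1, 2),
    latency=4,
)
# The F(j) values come out as 2, 3, 5, 7, 9, 11, 13 and every
# departure is conformant.
```

Note that the check needs only externally observable quantities (arrival times, departure times and packet sizes), together with the declared R and E.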
Note also that while index j=1 corresponds to the first packet in the observation, index j=0 does not correspond to any packet at all and is used solely to start the recursion.

The latency term E in (eq_1) quantifies the maximum distortion from the ideal service at the configured rate R that a particular device can introduce. As a result, the term E in (eq_1) can be viewed as a "figure of merit" and can be used to compare different implementations of EF PHB.

NOTE: The latency term E may be declared on a per output link basis.

NOTE: Since the declaration of a fixed value of E may for some schedulers restrict the range of the configured rate R, the value of E may be declared as a function of the configured rate R.

Note that nothing in the definition implies that a(j) and d(j) necessarily refer to the same packet. This lack of direct correspondence between a(j) and d(j) is deliberate, and relates to the goal of accommodating a wide range of schedulers and router architectures. Even in the case of a priority FIFO implementation at the output interface, the presence of variable internal delay may result in reordering of the EF packets arriving from different input interfaces, so that the j-th EF packet arriving at a router is not the same packet as the j-th EF packet departing from it. Likewise, the j-th arriving packet may not necessarily be the j-th departing packet in "flow-aware schedulers" which have the ability to differentiate between different sub-flows within the EF aggregate. An example of such a scheduler might be a hierarchical scheduler which serves the EF aggregate as a whole at the highest priority, but uses some WFQ implementation to choose a packet of a particular sub-stream of EF (e.g. a given "virtual wire" circuit) within the EF aggregate.

Further discussion of the interpretation of this definition can be found in Appendix B.

2.1.3. Example usage of the definition.
We now show an example of how the definition can be applied to an abstract router. The figures below describe a sequence of packets arriving to a router, and their departure times. Nothing is known about the internals of the router, and the arrival and departure times represent the only externally observable information. All packets shown in the examples are destined to a single output interface. For the sake of an example, we assume that the output interface in question has a configured rate R=C/2, where C is the output line rate, and that the router declares the error term E=4*(MTU/C) at this interface.

In each figure, time increases as we move to the right. Units of time are MTU/C, the time it takes to forward an MTU-sized packet at the output line rate C. For simplicity, all packets are MTU-sized.

The first figure below shows a sequence of arriving EF packets (labeled A, B, etc., using upper-case letters). The placement of the letter corresponds to the time when the last bit of the packet arrives at the router. Note that there is some degree of burstiness in the input pattern: packets A and B arrive back-to-back, and packets D and E arrive back-to-back as well. Packets E and F arrive simultaneously on different input interfaces.

t --->  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
        A  B     C     D  E        G
                          F

The next figure below shows a forwarding behavior that conforms to the definition proposed in this draft. On each line, the packet letter (A, B, etc.) shows the time that the last bit of the packet is forwarded. I.e., the upper-case letters give the values of d() for the sequence of packets. The terms 'f' and 'f+e' in a row give the ideal departure time, F(), and the latest permissible departure time, F() + E, respectively, for the packet on that row. Thus, F(A) = 2 and F(A) + E = 6, as given on the first row of the body of the table. Similarly, F(B) = 3 and F(B) + E = 7, as given on the second row of the body of the table.
Hence, any uppercase letter which is placed to the left of the time corresponding to f+e on the line corresponds to a conformant departure. Calculations using equations (eq_1) and (eq_2) are given after the figure to show how the values of 'f' and 'f+e' were obtained. Some comments follow the calculations.

t --->  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
           A  f           f+e
                 Bf          f+e
                       f  C        f+e
                             f        D  f+e
                                   f     E     f+e
                                         f  F        f+e
                                               f  G        f+e

   F(0) = d(0) = 0.

   a(A) = 0
   F(A) = max(0, min(0, 0)) + 2 = 2, d(A) must be <= 2 + 4 = 6
   d(A) = 1 <= 6

   a(B) = 1
   F(B) = max(1, min(1, 2)) + 2 = 3, d(B) must be <= 3 + 4 = 7
   d(B) = 3 <= 7

   a(C) = 3
   F(C) = max(3, min(3, 3)) + 2 = 5, d(C) must be <= 5 + 4 = 9
   d(C) = 6 <= 9

   a(D) = 5
   F(D) = max(5, min(6, 5)) + 2 = 7, d(D) must be <= 7 + 4 = 11
   d(D) = 10 <= 11

   a(E) = 6
   F(E) = max(6, min(10, 7)) + 2 = 9, d(E) must be <= 9 + 4 = 13
   d(E) = 11 <= 13

   a(F) = 6
   F(F) = max(6, min(11, 9)) + 2 = 11, d(F) must be <= 11 + 4 = 15
   d(F) = 12 <= 15

   a(G) = 9
   F(G) = max(9, min(12, 11)) + 2 = 13, d(G) must be <= 13 + 4 = 17
   d(G) = 14 <= 17

The key to understanding the calculations is to notice that whenever a packet P is forwarded earlier than its ideal departure time, F(P), the calculation of the next packet's ideal departure time uses P's actual departure time. Whenever a packet P is forwarded later than its ideal departure time, the calculation of the next packet's ideal departure time uses P's ideal departure time. Thus, slippage is not allowed to accumulate when packets are forwarded late, and credit is not built up when packets are forwarded early.

The next figure below shows another forwarding behavior for the same arrival pattern. This behavior does not conform to this draft's proposed definition. Each row lists its symbols in time order.

t --->  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
   A f f+e
   Bf f+e
   C f f+e
   Df f+e
   f E f+e
   f f+e F
   f Gf+e

Here, A and C are forwarded early, while B and D are forwarded at their ideal departure times.
Note, however, that these ideal departure times are earlier than they would have been if A and C had not been forwarded early. (Note also that there is no accumulation of credit for the early departures of A and C. If such an accumulation of credit were permitted, jitter could be increased if a subsequent packet is delayed a long time while the credit is spent.) Thus, the ideal forwarding time for the fourth packet, D, is at time 6, even though the EF-configured rate is one packet every two time intervals. Packet E is forwarded late, although still within tolerance. Packet F's ideal forwarding time is 10, but it is not forwarded until time 15, one time unit later than its latest permissible forwarding time. This very late departure makes the behavior non-conformant. Packet G is forwarded in conformance with the definition, but just barely.

These examples illustrate how conformance to the proposed definition can be verified without any knowledge of the internal router architecture or scheduling implementation. Of course, while this knowledge is not necessary to determine the conformance with a given declared E, the designer of the box must use this knowledge to be able to declare E for the device.

2.2 Per Packet Delay.

It is important to note that just as with the definition of EF PHB in RFC 2598, the packet scale rate guarantee is defined only in the context of the EF aggregate as a whole. In particular, the packet-scale rate guarantee definition is intentionally silent about exactly how various sub-streams of the EF aggregate are scheduled within the EF aggregate. A consequence of this is that the packet scale rate guarantee provided to the EF aggregate does not by itself imply a per-packet delay guarantee. This is analogous to the fact that the mere knowledge of link rates in a real network serving just a single class of traffic does not in itself provide a per-packet delay guarantee.
The per-packet delay guarantee at a hop with a FIFO service in such a network will differ drastically from a per-packet delay guarantee at a hop with a WFQ server. Aside from the knowledge of the properties of scheduling implementations, ensuring per-packet delay at a hop involves the ability to bound burstiness at the ingress of a hop. This is a complex task involving a fair amount of global knowledge such as the network topology, hop count, link utilization, upstream scheduling implementations, etc. As a result, addressing these issues appears appropriate not in the context of a local PHB definition, but rather in the context of a PDB, which is inherently a global concept.

3. Implementation considerations.

The packet scale rate guarantee definition does not mandate a particular underlying queuing or scheduling mechanism. However, for the definition to be meaningful, it is important to make sure that there exist at least some schedulers that strictly satisfy the definition with reasonable latency terms.

It can be shown that the strict priority scheduler in which all EF packets share a single FIFO queue (which is served at strict non-preemptive priority over other queues) satisfies the definition with the latency term E = MTU/C, where MTU is the maximum packet size and C is the speed of the output link.

Another scheduler that satisfies the definition with a small latency term is WF2Q, described in [BZ96a]. A class-based WF2Q scheduler, in which all EF traffic shares a single queue with the weight corresponding to the configured rate of the EF aggregate, can be shown to satisfy the definition with the latency term E = MTU/C+MTU/R.

The proofs that PQ and WF2Q satisfy the packet-scale rate guarantee definition with the above latency terms are given in Appendix C.

The definition also allows a wide range of scheduling algorithms, but different algorithms result in different degrees of deviation from the "ideal" service rate.
The degree of accuracy with which a scheduler can ensure that the EF aggregate receives its configured rate at a small (packet) timescale is expressed by the E term of the scheduler. A list of several well-known schedulers and their corresponding error terms can be found in Appendix D.

5. Security Considerations

This draft makes the PHB definition in [RFC2598] more rigorous, but adds no new functions to it. As a result, it adds no security issues to those described in that specification.

6. Appendices

Appendix A: Issues with the RFC 2598 PHB Definition

There are several potentially serious problems with having a formal EF definition that does not match people's intuitive understanding.

First, the understanding of what it means for a node to be EF-compliant may vary among people. This discrepancy may arise because two people's intuitive understanding of the definition may actually differ somewhat; also, someone learning about EF from the formal definition may develop an understanding of EF at odds with the understanding that most people currently familiar with EF have. These discrepancies in people's understanding of EF may have serious consequences. The resulting confusion may increase the time and cost needed to develop equipment, cause interoperability problems, and create mismatches between expected node and network performance and actual performance.

Second, the lack of a clear conformance definition makes it impossible to test a piece of equipment and declare it "conforming" or "non-conforming."

Third, the lack of a mathematically precise description of a node's behavior makes it impossible to analytically design or evaluate services constructed using the EF PHB or other PHBs that must contend for resources with EF traffic.

Fourth, an incorrect formal definition of EF may lead to erroneous reasoning about the properties of networks implementing EF.
A.1 The RFC 2598 Definition of EF PHB and Its Intuitive Meaning

The definition of the EF PHB as given in [RFC 2598] states:

   It [the EF PHB departure rate] SHOULD average at least the configured rate when measured over any time interval equal to or longer than the time it takes to send an output link MTU sized packet at the configured rate.

The intuitive content of this definition is fairly clear. On all time scales ranging down to very small time scales, the EF aggregate should be given at least its configured share of the output link bandwidth. Among other things, this allows EF to support applications that are delay- and jitter-sensitive. However, intuition alone will not allow vendors to design compliant schedulers capable of advertising their EF configuration to other routers. As we show in the next section, the simplicity of the definition is misleading in the sense that it does not actually capture the intuition correctly under a number of circumstances.

A note is due here on the precise interpretation of the wording of the definition. A potential cause of ambiguity is the fact that the definition contains the word SHOULD, which according to [Bra97] means that in principle an implementation of EF PHB may under some circumstances choose not to be strictly compliant with the specified requirement, in which case any issues with the strict definition may be viewed as irrelevant. However, it seems that in order for the SHOULD to be meaningful, there should exist at least some implementations which are strictly compliant, even if non-compliant implementations may be chosen under some circumstances. Furthermore, the Virtual Wire behavior aggregate [JNP2000] is defined by replacing SHOULD by MUST in the definition of EF PHB in RFC 2598. Therefore, in all cases the exact mathematical properties of the EF definition and the existence of strictly compliant implementations are of substantial interest.
The remainder of this section concentrates on the discussion of these issues in detail.

A.2 Particular Difficulties with the RFC 2598 EF PHB Definition

A literal interpretation of the definition would consider the behaviors given in the next two subsections as non-compliant. The definition also unnecessarily constrains the maximum configurable rate of an EF aggregate.

A.2.1 Perfectly-Clocked Forwarding

Consider the following stream forwarded from a router with EF-configured rate R=C/2, where C is the output line rate. In the illustration, E is an MTU-sized EF packet while x is a non-EF packet or unused capacity, also of size MTU.

   ... E x E x E x E x E x E x...
        |-----|

The interval between the vertical bars is 3*MTU/C, which is greater than MTU/(C/2), and so is subject to the EF PHB definition. During this interval, 3*MTU/2 bits of the EF aggregate should be forwarded, but only MTU bits are forwarded. Therefore, while this forwarding pattern should be considered compliant under any reasonable interpretation of the EF PHB, it actually does not formally comply with the definition of RFC 2598.

Note that this forwarding pattern can occur in any work-conserving scheduler in an ideal output-buffered architecture where EF packets arrive in a perfectly clocked manner according to the above pattern and are forwarded according to exactly the same pattern in the absence of any non-EF traffic.

Trivial as this example may be, it reveals the lack of mathematical precision in the formal definition. The fact that no work-conserving scheduler can formally comply with the definition is unfortunate, and appears to warrant some changes to the definition that would correct this problem.

The underlying reason for the problem described here is quite simple: one can only expect that the EF aggregate is served at the configured rate in some interval where there is enough backlog of EF packets to sustain that rate.
In the example above the packets come in exactly at the rate at which they are served, and so there is no persistent backlog. Certainly, if the input rate is even smaller than the configured rate of the EF aggregate, there will be no backlog as well, and a similar formal difficulty will occur.

A seemingly simple solution to this difficulty might be to require that the EF aggregate is served at its configured rate only when the queue is backlogged. However, as we show in the remainder of this section, this solution does not suffice.

A.2.2 Router Internal Delay

We now argue that the example considered in the previous section is not as trivial as it may seem at first glance.

Consider a router with EF configured rate R = C/2 as in the previous example, but with an internal delay of 3T (where T = MTU/C) between the time that a packet arrives at the router and the time that it is first eligible for forwarding at the output link. Such things as header processing, route look-up, and delay in switching through a multi-layer fabric could cause this delay. Now suppose that EF traffic arrives regularly at a rate of (2/3)R = C/3. The router will perform as shown below.

   EF Packet Number          1    2    3    4    5    6  ...

   Arrival (at router)       0   3T   6T   9T  12T  15T  ...

   Arrival (at scheduler)   3T   6T   9T  12T  15T  18T  ...

   Departure                4T   7T  10T  13T  16T  19T  ...

Again, the output does not satisfy the RFC 2598 definition of EF PHB. As in the previous example, the underlying reason for this problem is that the scheduler cannot forward EF traffic faster than it arrives. However, it can be easily seen that the existence of internal delay causes one packet to be inside the router at all times. An external observer will rightfully conclude that the number of EF packets that arrived to the router is always at least one greater than the number of EF packets that left the router, and therefore the EF aggregate is constantly backlogged.
However, while the EF aggregate is continuously backlogged, the observed output rate is nevertheless strictly less than the configured rate. This example indicates that the simple addition of the condition that the EF aggregate must receive its configured rate only when the EF aggregate is backlogged does not suffice in this case.

Yet, the problem described here is of fundamental importance in practice. Most routers have a certain amount of internal delay. A vendor declaring EF compliance is not expected to simultaneously declare the details of the internals of the router. Therefore, the existence of internal delay may cause a perfectly reasonable EF implementation to display seemingly non-conformant behavior, which is clearly undesirable.

A.2.3 Maximum Configurable Rate and Provisioning Efficiency

It is well understood that with any non-preemptive scheduler, the compliant configurable rate for an EF aggregate cannot exceed C/2 [JNP2000]. This is because an MTU-sized EF packet may arrive to an empty queue at time t just as an MTU-sized non-EF packet begins service. The maximum number of EF bits that could be forwarded during the interval [t, t + 2*MTU/C] is MTU. But if the configured rate R > C/2, then this interval would be of length greater than MTU/R, and more than MTU EF bits would have to be served during this interval for the router to be compliant. Thus, R must be no greater than C/2.

It can be shown that for schedulers other than PQ, such as various implementations of WFQ, the maximum compliant configured rate may be much smaller than 50%. For example, for SCFQ [Gol94] the maximum configured rate cannot exceed C/N, where N is the number of queues in the scheduler. For WRR, mentioned as compliant in section 2.2 of RFC 2598, this limitation is even more severe.
This is because in these schedulers a packet arriving to an empty EF queue may be forced to wait until one packet from each other queue (in the case of SCFQ) or until several packets from each other queue (in the case of WRR) are served before it will finally be forwarded.

While it is frequently assumed that the configured rate of EF traffic will be substantially smaller than the link bandwidth, such a limit appears unnecessarily restrictive. For example, in a fully connected mesh network, where any flow traverses a single link on its way from its source to its destination, there seems to be no compelling reason to limit the amount of EF traffic to 50% (or an even smaller percentage for some schedulers) of the link bandwidth.

Another, perhaps even more striking, example is the fact that even a TDM circuit with dedicated slots cannot be configured to forward EF packets at more than 50% of the link speed without violating RFC 2598 (unless the entire link is configured for EF). If the configured rate of EF traffic is greater than 50% (but less than the link speed), there will always exist an interval longer than MTU/R in which less than the configured rate is achieved. For example, suppose the configured rate of the EF aggregate is 2C/3. Then the forwarding pattern of the TDM circuit might be

   E E x E E x E E x ...
    |---|

where only one packet is served in the marked interval of length 2T = 2MTU/C. But at least 4/3 MTU would have to be served during this interval by a router in compliance with the definition in RFC 2598. The fact that even a TDM line cannot be booked over 50% by EF traffic indicates that the restriction is artificial and unnecessary.

A.3 The Non-trivial Nature of the Difficulties

One possibility to correct the problems discussed in the previous sections might be to clarify the definition of the intervals to which the definition applies, or to average over multiple intervals.
However, an attempt to do so meets with considerable analytical and implementation difficulties. For example, attempting to align interval start times with some epochs of the forwarded stream appears to require a certain degree of global clock synchronization and is fraught with the risk of misinterpretation and mistake in practice.

Another approach might be to allow averaging of the rates over some larger time scale. However, it is unclear exactly what finite time scale would suffice in all reasonable cases. Furthermore, this approach would compromise the notion of very short-term time scale guarantees that are the essence of EF PHB.

We also explored a combination of two simple fixes. The first is the addition of the condition that the only intervals subject to the definition are those that fall inside a period during which the EF aggregate is continuously backlogged in the router (i.e., when an EF packet is in the router). The second is the addition of an error (latency) term that could serve as a figure-of-merit in the advertising of EF services.

With the addition of these two changes the candidate definition becomes as follows:

   In any interval of time (t1, t2) in which EF traffic is continuously backlogged, at least R(t2 - t1 - E) bits of EF traffic must be served, where R is the configured rate for the EF aggregate and E is an implementation-specific latency term.

The "continuously backlogged" condition eliminates the insufficient-packets-to-forward difficulty, while the addition of the latency term of size MTU/C resolves the perfectly-clocked forwarding example (section A.2.1) and also removes the limitation on EF configured rate.

However, neither fix (nor the two of them together) resolves the example of section A.2.2.
To see this, recall that in the example of section A.2.2 the EF aggregate is continuously backlogged, but the service rate of the EF aggregate is consistently smaller than the configured rate, and therefore no finite latency term will suffice to bring the example into conformance. This appears to be a serious problem. Therefore, we believe that such a modification, albeit attractive in its simplicity, falls short of addressing all the problems identified with the definition in RFC 2598.

Appendix B: Further Interpretation of the Packet Scale Rate Guarantee Definition

The intuitive meaning of the packet scale rate guarantee is that as long as there are EF packets in the node, we would like the j-th EF packet to depart L(j)/R seconds after the (j-1)st departed. (L(j)/R is the time that it would take to forward the j-th packet at the EF-configured rate R.) Were this always to occur, the EF packets would be forwarded perfectly. The rest of the definition is a concession to the extreme unlikelihood that perfect forwarding can occur.

Perhaps the simplest way to understand the definition is to dissect it and examine its various pieces.

Consider the term min(d(j-1), F(j-1)). This term exists to ensure that the node is not given "credit" for faster-than-configured service and is not forgiven for slower-than-configured service. Suppose that this term were replaced with d(j-1) or with F(j-1).

Replacing min(d(j-1), F(j-1)) with d(j-1) would permit the node to give the EF aggregate a consistently lower rate of service than the configured rate whenever E > 0. To see this, suppose that we make the replacement, that all packets have size MTU, and that a(j) <= d(j-1). (This last condition means that the node is continuously backlogged with EF packets over the time interval under discussion.)
Then, using the revised definition, we would have

   F(j) = d(j-1) + MTU/R
   d(j) <= F(j) + E = d(j-1) + MTU/R + E

which would imply

   [d(j) - d(j-1)] <= MTU/R + E

This last inequality says that the node would be permitted to send an MTU-sized packet every (MTU/R)+E seconds. If E > 0, this rate would be consistently slower than R, which is clearly not acceptable for EF PHB.

Replacing min(d(j-1), F(j-1)) with F(j-1) would award the node "credit" for faster-than-configured service. It would be possible for the node to accumulate this credit by forwarding several EF packets in a row, each earlier than required. The node could then redeem this credit by delaying the next EF packet until all the credit plus the normal inter-packet interval was consumed. To see this, suppose we make the replacement, that all packets have size MTU, and that a(j) <= F(j-1). (This last condition means that the next EF packet arrives before the previous packet was scheduled to depart.) Then, using this revised definition, we would have

   F'(j) = F'(j-1) + MTU/R

and

   d(j) <= F'(j) + E

Suppose that we have a node with negligible internal delay, that its output line rate is C = 3R, and that it forwards n EF packets back-to-back. We would have

   F'(1) = MTU/R;                      d(1) = MTU/C
   F'(2) = F'(1) + MTU/R = 2MTU/R;     d(2) = 2MTU/C
   ...
   F'(n) = F'(n-1) + MTU/R = nMTU/R;   d(n) = nMTU/C

By the time the n-th EF packet is forwarded, the node has accumulated credit amounting to n(MTU/R - MTU/C). Using the example assumption that C = 3R, the node has accumulated credit equal to (2n/3)MTU/R. The (n+1)th EF packet need not be forwarded until (2n/3 + 1)MTU/R + E seconds (the credit, plus the normal inter-packet interval MTU/R, plus E) have elapsed from the time that the n-th packet was forwarded. Depending upon the actual values of n and R (which may be much less than 1/3 the output line rate), a sizeable amount of jitter between the n-th and (n+1)th EF packets would be produced.
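The two alternative recursions above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, all packets are assumed to be MTU-sized, and exact fractions are used so that the results can be compared directly with the expressions in the text. The gap after a burst is computed as F'(n+1) + E - d(n), that is, the accumulated credit plus one inter-packet interval MTU/R plus E.

```python
from fractions import Fraction

def credit_after_burst(n, mtu, rate, line_rate):
    """Credit accumulated under the F(j-1) variant when n MTU-sized EF
    packets depart back-to-back at the line rate (hypothetical helper)."""
    f_prime = Fraction(0)
    d = Fraction(0)
    for _ in range(n):
        f_prime += Fraction(mtu) / rate       # F'(j) = F'(j-1) + MTU/R
        d += Fraction(mtu) / line_rate        # d(j)  = d(j-1)  + MTU/C
    return f_prime - d                        # equals n(MTU/R - MTU/C)

def max_gap_after_burst(n, mtu, rate, line_rate, e):
    """Longest wait the F(j-1) variant would permit between the n-th and
    (n+1)th departures: F'(n+1) + E - d(n)."""
    return credit_after_burst(n, mtu, rate, line_rate) + Fraction(mtu) / rate + e

def sustained_rate_d_variant(mtu, rate, e):
    """Long-run rate permitted by the d(j-1) variant, where a packet may
    leave every MTU/R + E seconds."""
    return Fraction(mtu) / (Fraction(mtu) / rate + e)

# Example assumption from the text: C = 3R. Units chosen so MTU = 1, R = 1.
print(credit_after_burst(6, 1, 1, 3))                    # (2n/3) MTU/R with n = 6
print(max_gap_after_burst(6, 1, 1, 3, Fraction(1, 3)))
print(sustained_rate_d_variant(1, 1, Fraction(1, 2)))    # strictly below R = 1
```

With n = 6 the credit equals (2n/3)MTU/R = 4MTU/R, and the rate permitted by the d(j-1) variant falls below R as soon as E > 0, matching the two observations above.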
These two alternative definitions illustrate the role of the min(d(j-1), F(j-1)) term - to ensure that the node forwards EF packets at no less than the configured rate over both large and small time scales.

The a(j) term and the maximum operator are included for purely technical reasons. First, their presence says that the node does not have to forward an EF packet that has not yet arrived. Absurd as such a notion may be, without the term and the operator, the definition would formally insist that EF packets continue to be forwarded even when there are none to be forwarded.

If this were the only purpose for including the a(j) term and the maximum operator, it would be much clearer to simply add the condition that the definition applies only when the node has backlogged EF packets. However, there is a second reason why the definition is written as it is - the possibility that the node has non-negligible internal delay between the input and the output. Such things as header processing, route look-up, and delay in switching through a multi-layer fabric could cause this delay.

The set-up of an example to illustrate this role of the a(j) term and the maximum operator is a bit more lengthy than it was for the previous examples. Consider a node with an EF-configured rate of R = C/2. Let T = MTU/C, the time it takes to forward an MTU-sized packet at the output line rate. Suppose that MTU-sized EF packets arrive at the node regularly at a rate of (2/3)R = C/3. Suppose also that the node has an internal delay of 3T. Even if there is no other traffic, the node will perform no better than is shown below.

   EF Packet Number           1    2    3    4    5    6   ...
   Arrival at router (a(j))   0    3T   6T   9T   12T  15T ...
   Arrival (at scheduler)     3T   6T   9T   12T  15T  18T ...
   Departure (d(j))           4T   7T   10T  13T  16T  19T ...

Note that from time 0 onward, EF packets are backlogged in the node.
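The timeline above follows mechanically from the stated parameters. As a minimal Python sketch (the function names are hypothetical, times are expressed in units of T = MTU/C, and it is assumed that packets are spaced at least one transmission time apart so that no queueing occurs at the output), it can be regenerated and the backlog claim checked as follows:

```python
from fractions import Fraction

def internal_delay_timeline(num_packets, spacing, internal_delay, tx_time):
    """Best-case timeline for a node with a fixed internal delay and no
    competing traffic. Assumes spacing >= tx_time, so a packet never waits
    behind its predecessor at the output."""
    rows = []
    for j in range(1, num_packets + 1):
        a = Fraction(spacing) * (j - 1)       # arrival at router, a(j)
        at_scheduler = a + internal_delay     # after the internal delay
        d = at_scheduler + tx_time            # departure, d(j)
        rows.append((j, a, at_scheduler, d))
    return rows

def continuously_backlogged(rows):
    """True if every packet arrives no later than its predecessor departs."""
    return all(rows[k + 1][1] <= rows[k][3] for k in range(len(rows) - 1))

# Parameters of the example: arrivals every 3T, internal delay 3T,
# transmission time T.
rows = internal_delay_timeline(num_packets=6, spacing=3, internal_delay=3, tx_time=1)
for j, a, s, d in rows:
    print(f"packet {j}: a(j)={a}T  at scheduler={s}T  d(j)={d}T")
print("continuously backlogged:", continuously_backlogged(rows))
```

Each packet arrives at 3T(j-1) and departs at 3T(j-1) + 4T, so the next arrival always precedes the previous departure by T, which is why the node is backlogged from time 0 onward.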
If the a(j) term and the maximum operator are removed from the definition, then we would have

   F'(j) = min(d(j-1), F'(j-1)) + MTU/R
   d(j) <= F'(j) + E

Working through the recursions with F' representing the modified target finishing time function and F representing the original definition given in equations (eq_1) and (eq_2), we have

   EF Packet Number           1    2    3    4    5    6   ...
   Arrival at router (a(j))   0    3T   6T   9T   12T  15T ...
   Departure (d(j))           4T   7T   10T  13T  16T  19T ...
   Modified (F'(j))           2T   4T   6T   8T   10T  12T ...
   Original (F(j))            2T   5T   8T   11T  14T  17T ...

The modified F' falls behind the departure times at a constant rate. No fixed tolerance term, E, would be large enough to ensure that the node's behavior was compliant. On the other hand, the original F has every packet being forwarded late, but always late by the same amount, 2T. Setting E >= 2T allows the node to conform to EF PHB. Note that this node cannot possibly perform any better than has been depicted in this example. It cannot begin forwarding EF packets until 3T after they arrive.

As in this example, as link speeds increase, we may well discover that internal delays become multiples of the time it takes to transmit a packet. Thus, it is important that the definition of EF rigorously address acceptable behavior in the presence of internal delay.

This last example leads to a consideration of the role of the tolerance term, E. It happens that E must be greater than 0 for almost every real-world node that would provide EF PHB. We have already seen that we need E > 0 for a node that has internal delay, even if there is no non-EF traffic.

Another easy example where E > 0 is required is a non-preemptive node offering an EF-configured rate R > C/2. Suppose, for example, that R = 0.75C. With such a node, it is always possible that an EF packet will arrive at a node (at time 0) just as that node is beginning to serve a non-EF packet.
Assuming that the EF packet and the non-EF packet are the same size (say, MTU-sized), the EF packet will have to wait at least until the non-EF packet is forwarded before it can begin to be served. That is, the earliest that the EF packet can be forwarded is 2MTU/C. Yet, for the EF packet, F(1) = 0 + MTU/(0.75C) = (4/3)MTU/C. If E were 0, then d(1) <= (4/3)MTU/C. But this is impossible. Thus, without the tolerance term E > 0, the node could not be configured for this EF-configured rate, even if it serves EF using priority queuing with EF as the highest priority.

In Appendix D, we consider the situations of other scheduling disciplines for EF service, such as weighted round-robin, weighted fair queuing, and other commonly-used schedulers. All of these schedulers require an E > 0, even if their internal delay is negligible. Rather than excluding nodes employing these schedulers from ever being able to offer EF service, we included the tolerance term E in the definition of the packet scale rate guarantee. It is possible that nodes can use this term as a figure of merit when advertising their capability to provide EF PHB.

It is also important to note that the tolerance E does not permit a node to persistently forward EF packets at less than the configured rate. By including E in the d(j) <= F(j) + E inequality rather than in the recursion that defines F(j), the worst that can happen is that forwarding is shifted forward in time by at most E. That is, the E term allows a fixed delay for the forwarding of the entire EF stream, but it does not allow the rate of forwarding to be less than the EF-configured rate. Put yet another way, it is a difference, but not a differential.
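The internal-delay and non-preemptive examples of this appendix can be reproduced numerically. The following Python sketch is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, the recursion is taken to be F(j) = max(a(j), min(d(j-1), F(j-1))) + L(j)/R with F(0) = d(0) = 0 (the base-case convention used in the proofs of Appendix C), and times are in units of T with MTU = 1 and C = 1.

```python
from fractions import Fraction

def target_times(arrivals, departures, lengths, rate, use_arrival_term=True):
    """Target finish times F(j). With use_arrival_term=False the a(j) term
    and the maximum operator are dropped, giving the modified F'(j)."""
    targets = []
    prev_f = Fraction(0)                      # F(0) = 0 (assumed convention)
    prev_d = Fraction(0)                      # d(0) = 0 (assumed convention)
    for a, d, length in zip(arrivals, departures, lengths):
        start = min(prev_d, prev_f)
        if use_arrival_term:
            start = max(Fraction(a), start)
        f = start + Fraction(length) / rate
        targets.append(f)
        prev_f, prev_d = f, Fraction(d)
    return targets

def smallest_tolerance(departures, targets):
    """Smallest E with d(j) <= F(j) + E over the packets observed."""
    return max(Fraction(d) - f for d, f in zip(departures, targets))

# Internal-delay example: R = C/2, arrivals every 3T, departures at 4T, 7T, ...
arrivals = [0, 3, 6, 9, 12, 15]
departures = [4, 7, 10, 13, 16, 19]
lengths = [1] * 6
rate = Fraction(1, 2)

original = target_times(arrivals, departures, lengths, rate)
modified = target_times(arrivals, departures, lengths, rate, use_arrival_term=False)
print("F :", [str(f) for f in original])
print("F':", [str(f) for f in modified])
print("lateness vs F :", [str(d - f) for d, f in zip(departures, original)])
print("lateness vs F':", [str(d - f) for d, f in zip(departures, modified)])

# Non-preemptive example: R = 0.75C, one EF packet arriving at time 0 just
# behind an MTU-sized non-EF packet, so it departs no earlier than 2 MTU/C.
np_targets = target_times([0], [2], [1], Fraction(3, 4))
print("E needed (units of MTU/C):", smallest_tolerance([2], np_targets))
```

Against the original F the lateness stays fixed at 2T, so a finite E suffices, whereas against the modified F' it grows by T with every packet. In the non-preemptive case the sketch gives a required tolerance of 2MTU/C - (4/3)MTU/C = (2/3)MTU/C for this single packet.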
Appendix C: Proofs of Satisfiability of the Packet Scale Rate Guarantee Definition for PQ and WF2Q

C.1 Satisfiability of the Packet Scale Rate Guarantee Definition for PQ

In this section, we prove that a priority queuing (PQ) scheduler satisfies the EF redefinition using the latency term E = MTU/C.

Statement C1.
=============

PQ satisfies the redefinition (equations (eq_1) and (eq_2) of section 2.1) with E = MTU/C.

Proof of C1.

Consider any busy period of the EF queue. Let k=1 correspond to the first packet in that busy period and assume that a(1) >= 0. We prove by induction that for all k >= 1 in this busy period

   d(k) <= F(k) + MTU/C                                   (eq_c1_1)

This would immediately imply Statement C1.

Base case. For k=1,

   F(1) = max(a(1), min(d(0), F(0))) + L(1)/R
        >= a(1) + L(1)/R >= a(1) + L(1)/C                 (eq_c1_2)

and

   d(1) <= a(1) + MTU/C + L(1)/C <= F(1) + MTU/C

where the first inequality follows from the fact that the first packet in a PQ may wait at most for one largest packet transmission before its own transmission begins, and the second inequality follows from (eq_c1_2).

Inductive step. Note that since EF has the highest priority, for k > 1 in the busy period of the EF queue

   d(k) = d(k-1) + L(k)/C                                 (eq_c1_3)

Now from the induction hypothesis

   F(k-1) >= d(k-1) - MTU/C

and the definition (eq_2) of section 2.1 of F(k), together with R <= C, gives

   F(k) >= max(a(k), min(d(k-1), d(k-1) - MTU/C)) + L(k)/C
         = max(a(k), d(k-1) - MTU/C) + L(k)/C             (eq_c1_4)

It follows immediately from (eq_c1_4) that

   F(k) >= d(k-1) - MTU/C + L(k)/C

Combining with (eq_c1_3) demonstrates (eq_c1_1) and completes the inductive step.

C.2 Satisfiability of the Packet Scale Rate Guarantee Definition for WF2Q

In this section, we prove that a worst-case fair weighted fair queuing (WF2Q) scheduler satisfies the EF redefinition using the latency term E = MTU/C + MTU/R. The proof begins with a helping theorem that brings us most of the way to the conclusion.

Statement C2.
=============

If a scheduler satisfies the condition

   G(i) - E1 <= d(i) <= G(i) + E2                         (eq_c2_1)

where G(i) is the i-th finishing time of the reference fluid scheduler, then it satisfies the redefinition in section 2.1 with latency term E <= E1 + E2.

Proof of Statement C2.
----------------------

To prove Statement C2 we will prove that for all i >= 0

   F(i) >= G(i) - E1                                      (eq_c2_2)

where F(i) is the set of finish times recursively defined by (eq_2) of section 2.1. If (eq_c2_2) is proven, then from (eq_c2_1) and (eq_c2_2)

   d(i) <= G(i) + E2 <= F(i) + E1 + E2,

which means that the scheduler satisfies the redefinition with the latency term E = E1 + E2.

Proof of (eq_c2_2).
-------------------

First note that in the reference GPS system, packet i starts its service at time max(a(i), G(i-1)) and receives a service rate at least equal to R. Thus

   G(i) <= max(a(i), G(i-1)) + L(i)/R                     (eq_c2_3)

Now the proof of (eq_c2_2) proceeds by induction.

Base case. F(0) = 0, G(0) = 0, so (eq_c2_2) trivially holds for i=0.

Inductive step. Suppose (eq_c2_2) holds for all j = 0, 1, ..., i-1 (i >= 1). We have both F(i-1) >= G(i-1) - E1 and d(i-1) >= G(i-1) - E1, thus

   min(F(i-1), d(i-1)) >= G(i-1) - E1                     (eq_c2_4)

Combining this with equation (eq_2) of section 2.1, we obtain

   F(i) >= G(i-1) - E1 + L(i)/R                           (eq_c2_5)

Again from equation (eq_2) we have

   F(i) >= a(i) + L(i)/R >= a(i) - E1 + L(i)/R            (eq_c2_6)

Combining (eq_c2_5), (eq_c2_6) and (eq_c2_3) gives F(i) >= G(i) - E1, which completes the proof of (eq_c2_2) and Statement C2.

Statement C3.
=============

WF2Q satisfies the redefinition (equations (eq_1) and (eq_2) of section 2.1) with E = MTU/C + MTU/R.

Proof of C3.
------------

It follows from the results of [BZ96a] that the departures in WF2Q satisfy the condition

   max(G(i-1), a(i)) <= d(i) <= G(i) + MTU/C

From equation (eq_c2_3) this implies that

   d(i) >= G(i) - L(i)/R >= G(i) - MTU/R

Therefore (eq_c2_1) in Statement C2 holds with E1 = MTU/R and E2 = MTU/C.
Therefore, by Statement C2, WF2Q satisfies the redefinition with E = MTU/C + MTU/R.

Appendix D: Implementation Considerations - Values of the Latency Term for Various Schedulers

D.1 General queuing and scheduling considerations

The redefinition of EF given in section 2.1 does not mandate a particular underlying queuing structure. While it can be implemented using aggregate queuing, where all packets of the EF aggregate share a single queue, it also allows finer queuing granularity, where EF packets may be assigned to a number of different queues. Likewise, the redefinition allows in principle a wide range of scheduling algorithms, ranging from strict priority scheduling of an aggregate EF queue to hierarchical scheduling with per-flow queuing as described in section D.4 below.

Both the queuing structure and the scheduling algorithm have a significant impact on the delay and jitter which can be provided to the packets of the EF aggregate. It is typically more difficult to provide strict deterministic end-to-end delay and/or jitter guarantees if aggregate queuing is implemented [CLeB2000]. However, implementing and scheduling a large number of queues at high speeds presents a significant engineering challenge, while aggregate scheduling is very attractive due to its simplicity and scalability.

D.2 Aggregate Queuing and Scheduling Accuracy for FIFO Service of the EF Aggregate

It can be shown that if all packets in the EF aggregate share a single FIFO queue served by a scheduler satisfying the rate-latency service curve, then end-to-end delay and jitter guarantees depend on the latency term E of the scheduler [CLeB2000]. The smaller the latency term, the better the delay and jitter bounds that can be provided. In that respect, a strict priority queuing implementation which has a very small latency term is a natural candidate for implementing EF PHB.
Various implementations of Weighted Fair Queuing-like schedulers are also possible candidates for such implementation, but the delay and jitter characteristics of these schedulers differ substantially depending on the accuracy of the implementation.

A widely used way of evaluating the accuracy of rate-based scheduling implementations is to compare the output of the scheduler with the so-called "fluid model" [Par92]. In this framework, a given scheduler S and the reference fluid scheduler are subject to the same arrival patterns. The accuracy of the scheduler S can be determined by how close the time of the i-th departure in the scheduler S is to the corresponding departure time in the fluid scheduler. More precisely, if d(i) is the time of the i-th departure under some scheduler S, and G(i) is the time of the i-th departure in the reference fluid scheduler, then the accuracy of S may be determined by two latency terms E1 and E2 such that for all i

   G(i) - E1 <= d(i) <= G(i) + E2

While the term E2 determines the maximum per-hop delay bound, E1 has an effect on the jitter at the output of the scheduler. For example, as shown in [BZ96a], for WF2Q, E1 = MTU/R, E2 = MTU/C, and for PGPS [Par92] E2 = MTU/C as well, while E1 is linear in the number of queues in the scheduler. It is demonstrated in [BZ96a] that while WF2Q and PGPS have the same delay bounds, PGPS may result in substantially burstier departure patterns.

In general, it can be shown that if a scheduler satisfies the inequality above (condition (eq_c2_1) of Appendix C), then it also satisfies the redefinition with the latency term E <= E1 + E2. The proof of this statement is given in Appendix C. Note that E1 + E2 is not necessarily a tight latency bound, and for a given scheduler a tighter bound may be obtained. That is, the fact that a given scheduler has a large E1 + E2 does not necessarily mean that it has a large E.

D.3 Additional examples of efficient WFQ-Like Scheduling Implementations and their Latency Terms
In this section we briefly discuss some schedulers that can be used to implement the redefined EF PHB with different degrees of accuracy and with different implementation complexity.

D.3.1 Weighted Fair Queuing (WFQ/PGPS)

For WFQ/PGPS ([DKS90], [Par92]), E2 = MTU/C just as for the case of WF2Q. However, it can be shown that E1 can grow linearly with the number of queues in the scheduler (which here and below is denoted by N). The worst case complexity of WFQ is also O(N).

D.3.2 Deficit Round Robin (DRR)

For DRR [SV95], both E1 and E2 can be shown to grow linearly with N*(r_max/r_min)*MTU, where r_min and r_max denote the smallest and the largest rate among the rate assignments of all queues in the scheduler. The implementation complexity of DRR is O(1).

D.3.3 Start-Time Fair Queuing (SFQ) and Self-Clocked Fair Queuing (SCFQ)

For SFQ [GVC96] and SCFQ [Gol94] both E1 and E2 can be shown to grow linearly with N. Implementation complexity of both of these schedulers is O(log N).

D.3.4 WF2Q+

For WF2Q+ [BZ96b], E1 = MTU/R, while E2 can grow linearly with N. The implementation complexity of WF2Q+ is O(log N).

D.4 Hierarchical scheduling implementations

A possible implementation of EF PHB may be based on a hierarchical scheduling framework, such as described in [FJ95]. In this framework, different subsets of EF packets may be assigned to different queues. The semantics of exactly how packets are classified into different EF queues is highly implementation-dependent. For convenience, the subset of EF packets sharing a single queue will be referred to as "EF flows". The EF queues are grouped in a "logical queue", which is scheduled as a single entity along with other non-EF queues or groups of queues by a "top-level" scheduler. It is this top-level scheduler that must satisfy DEF_1. Once the EF aggregate (i.e., the EF "logical queue") is scheduled by this top-level scheduler, an "EF flow-level" scheduler is invoked.
As an example, a hierarchical scheduler with WF2Q at each level of the hierarchy (as described in [BZ96b]) can be used for such a purpose. Alternatively, the EF "logical" queue can be served at strict priority over all non-EF queues, while the EF queue at the "EF flow" level can be served by some other scheduler, such as WF2Q. In principle, a hierarchical scheduling structure allows substantial flexibility in the choice of scheduling mechanisms at each level of the hierarchy.

Per-packet delay guarantees in such a hierarchical scheduling framework strongly depend on the accuracy of schedulers employed at each level of the hierarchy. In general, the more accurate the scheduling implementation at each level, the better the per-packet guarantee that can be provided. It can be shown that for the scheduling hierarchy, the E1 and E2 latency terms of the hierarchical scheduler with respect to a particular "leaf queue" can be obtained by summing the E1 and E2 terms of the schedulers employed at the nodes of the scheduling tree along the branch of the tree from the root to the leaf.

D.5 Effect of internal switching mechanisms

A packet passing through a router will experience delay for a number of reasons. Two familiar components of this delay are the time the packet sits in a buffer at an outgoing link waiting for the scheduler to select it and the time it takes to actually transmit the packet on the outgoing line.

There may be other components of a packet's delay through a router, however. A router might have to do some amount of header processing before the packet can be given to the correct output scheduler, for example. In another case a router may have a FIFO buffer (called a transmission queue in [FC2000]) where the packet sits after being selected by the output scheduler but before it is transmitted.
In cases such as these, the extra delay a packet may experience can be accounted for by absorbing it into the latency term, E, in DEF_1.

Implementing EF on a router with a multi-stage switch fabric requires special attention. A packet may experience additional delays due to the fact that it must compete with other traffic for forwarding resources at multiple contention points in the core. The delay an EF packet may experience before it even reaches the output-link scheduler should be included in the latency term. Input-buffered and input/output-buffered routers may also require modification of their latency terms.

Delay in the switch core comes from two sources, both of which must be considered. The first part of this delay is the fixed delay a packet experiences regardless of the other traffic. This component of the delay includes the time it takes for things such as packet segmentation and reassembly in cell-based cores, enqueueing and dequeueing at each stage, and transmission between stages. The second part of the switch core delay is variable and depends on the type and amount of other traffic traversing the core. This delay comes about if the stages in the core mix traffic flowing between different input/output port pairs. Thus, EF packets must compete against other traffic for forwarding resources in the core. Some of this competing traffic may even be EF traffic from other aggregates. This introduces extra delay that can also be absorbed by the latency term in the definition.

Appendix E: Comparison of the Packet Scale Rate Guarantee with the Rate-Latency Curve

To understand the meaning of the redefinition (equations (eq_1) and (eq_2) in section 2.1) we compare it with the well-known rate-latency curve [LEB98], and argue that the redefinition is stronger than the rate-latency curve in the sense that if a scheduler satisfies the redefinition, it also satisfies the rate-latency curve.
As a result, all the properties known for the rate-latency curve also apply to the redefinition. We also argue why the redefinition is more suitable to reflect the intent of EF PHB than the rate-latency curve.

It is shown in [LEB98] that the rate-latency curve is equivalent to the following definition:

   Definition DEF_2:

      d(j) <= F'(j) + E                                   (eq_3)

   where

      F'(0) = 0,
      F'(j) = max(a(j), F'(j-1)) + L(j)/R  for all j > 0  (eq_4)

It can be easily verified that the redefinition is stronger than DEF_2 by noticing that for all j, F'(j) >= F(j).

It is easy to see that F'(j) in the definition DEF_2 corresponds to the time the j-th departure should have occurred should the EF aggregate be constantly served exactly at its configured rate R. Following the common convention, we refer to F'(j) as the "fluid finish time" of the j-th packet to depart.

The intuitive meaning of the rate-latency curve of DEF_2 is that any packet is served at most time E later than this packet would finish service in the fluid model.

For a rate-latency curve DEF_2 (and hence for the stronger redefinition) it holds that in any interval (0,t) the EF aggregate gets close to the desired service rate R (as long as there is enough traffic to sustain this rate). The discrepancy between the ideal and the actual service in this interval depends on the latency term E, which in turn depends on the scheduling implementation. The smaller E, the smaller the difference between the configured rate and the actual rate achieved by the scheduler.

While DEF_2 guarantees the desired rate to the EF aggregate in all intervals (0,t) to within a specified error, it may nevertheless result in large gaps in service. For example, suppose that (a large number) N of identical EF packets of length L arrived from different interfaces to the EF queue in the absence of any non-EF traffic. Then any work-conserving scheduler will serve all N packets at link speed.
When the last packet is sent at time NL/C, where C is the capacity of the output link, F'(N) will be equal to NL/R. Suppose now that at time NL/C a large number of non-EF packets arrive, followed by a single EF packet. Then the scheduler can legitimately delay starting to send the EF packet until time F'(N+1) + E - L/C = (N+1)L/R + E - L/C. This means that the EF aggregate will have no service at all in the interval (NL/C, (N+1)L/R + E - L/C). This interval can be quite large if R is substantially smaller than C. In essence, the EF aggregate can be "punished" by a gap in service for receiving faster service than its configured rate at the beginning.

The redefinition alleviates this problem by introducing the term min(d(j-1), F(j-1)) in the recursion. Essentially, this means that the fluid finishing time is "reset" if a packet is sent too early. As a consequence of that, for the case where the EF aggregate is served in a FIFO order, suppose a packet arrives at time t to a server satisfying the redefinition. The packet will be transmitted no later than time t + Q(t)/R + E, where Q(t) is the EF queue size at time t (including the packet under discussion). This statement is proved in Appendix C.

7. References

[BZ96a] J.C.R. Bennett and H. Zhang, "WF2Q: Worst-case Fair Weighted Fair Queuing", INFOCOM'96, March 1996.

[BZ96b] J.C.R. Bennett and H. Zhang, "Hierarchical Packet Fair Queuing Algorithms", IEEE/ACM Transactions on Networking, 5(5):675-689, October 1997. Also in Proceedings of SIGCOMM'96, August 1996.

[RFC2475] Black, D., Blake, S., Carlson, M., Davies, E., Wang, Z. and W. Weiss, "An Architecture for Differentiated Services", RFC 2475, December 1998.

[LEB98] J.-Y. Le Boudec, "Application of Network Calculus To Guaranteed Service Networks", IEEE Transactions on Information Theory, (44) 3, May 1998.

[Bra97] Bradner, S., "Key Words for Use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[CLeB2000] A. Charny, J.-Y.
Le Boudec, "Delay Bounds in a Network with Aggregate Scheduling". To appear in Proc. of QoFIS'2000, September 25-26, 2000, Berlin, Germany.

[DKS90] A. Demers, S. Keshav, and S. Shenker, "Analysis and Simulation of a Fair Queuing Algorithm". In Journal of Internetworking Research and Experience, pages 3-26, October 1990. Also in Proceedings of ACM SIGCOMM'89, pp 3-12.

[FC2000] T. Ferrari and P. F. Chimento, "A Measurement-Based Analysis of Expedited Forwarding PHB Mechanisms", Eighth International Workshop on Quality of Service, Pittsburgh, PA, June 2000.

[FJ95] S. Floyd and V. Jacobson, "Link-sharing and Resource Management Models for Packet Networks", IEEE/ACM Transactions on Networking, Vol. 3 no. 4, pp. 365-386, August 1995.

[Gol94] S.J. Golestani, "A Self-clocked Fair Queuing Scheme for Broad-band Applications". In Proceedings of IEEE INFOCOM'94, pages 636-646, Toronto, CA, April 1994.

[GVC96] P. Goyal, H.M. Vin, and H. Chen, "Start-time Fair Queuing: A Scheduling Algorithm for Integrated Services". In Proceedings of the ACM-SIGCOMM 96, pages 157-168, Palo Alto, CA, August 1996.

[RFC2598] V. Jacobson, K. Nichols, K. Poduri, "An Expedited Forwarding PHB", RFC 2598, June 1999.

[RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998.

[JNP2000] V. Jacobson, K. Nichols, K. Poduri, "The 'Virtual Wire' Behavior Aggregate", (draft-ietf-diffserv-ba-vw-00.txt), March 2000.

[Par92] A. Parekh, "A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks". PhD dissertation, Massachusetts Institute of Technology, February 1992.

[SV95] M. Shreedhar and G. Varghese, "Efficient Fair Queueing Using Deficit Round Robin". In Proceedings of SIGCOMM'95, pages 231-243, Boston, MA, September 1995.

[Sto95] I. Stoica and H.
Abdel-Wahab, "Earliest Eligible Virtual Deadline First: A Flexible and Accurate Mechanism for Proportional Share Resource Allocation", Technical Report 95-22, Old Dominion University, November 1995.

8. Authors' addresses

   Anna Charny, ed.
   Cisco Systems
   300 Apollo Drive
   Chelmsford, MA 01824
   acharny@cisco.edu

   Fred Baker
   Cisco Systems
   170 West Tasman Dr.
   San Jose, CA 95134
   fred@cisco.com

   Jon Bennett
   RiverDelta Networks
   3 Highwood Drive East
   Tewksbury, MA 01876
   jcrb@riverdelta.com

   Kent Benson
   Tellabs Research Center
   3740 Edison Lake Parkway #101
   Mishawaka, IN 46545
   Kent.Benson@tellabs.com

   Jean-Yves Le Boudec
   ICA-EPFL, INN
   Ecublens, CH-1015
   Lausanne-EPFL, Switzerland
   leboudec@epfl.c

   Angela Chiu
   AT&T Labs
   100 Schulz Dr. Rm 4-204
   Red Bank, NJ 07701
   alchiu@att.com

   Bill Courtney
   TRW
   Bldg. 201/3702
   One Space Park
   Redondo Beach, CA 90278
   bill.courtney@trw.com

   Shahram Davari
   PMC-Sierra Inc
   555 Legget Drive
   Suite 834, Tower B
   Ottawa, ON K2K 2X3, Canada
   shahram_davari@pmc-sierra.com

   Bruce Davie
   Cisco Systems
   300 Apollo Drive
   Chelmsford, MA 01824
   bsd@cisco.com

   Victor Firoiu
   Nortel Networks
   600 Tech Park
   Billerica, MA 01821
   vfirou@nortelnetworks.com

   Charles Kalmanek
   AT&T Labs-Research
   180 Park Avenue, Room A113,
   Florham Park NJ
   crk@research.att.com

   K.K. Ramakrishnan
   AT&T Labs-Research
   Rm. A155, 180 Park Ave,
   Florham Park, NJ 07932
   kkrama@research.att.com

   Dimitrios Stiliadis
   Lucent Technologies
   1380 Rodick Road
   Markham, Ontario, L3R-4G5, Canada
   stiliadi@bell-labs.com

9. Full Copyright

Copyright (C) The Internet Society 2000. All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.