Network Working Group                                         A. Charny
Internet Draft                                                    Cisco
Document: draft-charny-ef-definition-01.txt                    Nov 2000


                            EF PHB Redefined
  Internet Draft                                      Anna Charny, ed.
                                                         Cisco Systems

                                                            Fred Baker
                                                         Cisco Systems

                                                           Jon Bennett
                                                   Riverdelta Networks

                                                           Kent Benson
                                                               Tellabs

                                                   Jean-Yves Le Boudec
                                                                  EPFL

                                                           Angela Chiu
                                                             AT&T Labs

                                                      William Courtney
                                                                   TRW

                                                           Bruce Davie
                                                         Cisco Systems

                                                        Shahram Davari
                                                            PMC-Sierra

                                                         Victor Firoiu
                                                       Nortel Networks

                                                      Charles Kalmanek
                                                         AT&T Research

                                                     K.K. Ramakrishnan
                                                         AT&T Research

                                                   Dimitrios Stiliadis
                                                   Lucent Technologies


   Expires May 2001
   draft-charny-ef-definition-01.txt                    November 2000


                                   EF PHB Redefined


Charny                         May 2000                              1
                           EF PHB Redefined                   Nov 2000


   Status of this Memo

   This document is an Internet Draft and is in full conformance with
   all provisions of Section 10 of RFC2026. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups.  Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.ietf.org  (US East Coast), nic.nordu.net
   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
   Rim).

   This document is a product of the Diffserv working group of the
   Internet Engineering Task Force.  Please address comments to the
   group's mailing list at diffserv@ietf.org, with a copy to the
   authors.

   Copyright  (C) The Internet Society (1999).   All Rights
   Reserved.


Abstract

   This document proposes text aiming at providing clarification to RFC
   2598. The primary motivation for this draft is to clarify the
   definition
   of EF PHB given in RFC 2598.

   This draft gives a rigorous definition of EF PHB which in our
   opinion preserves the spirit of the EF PHB as intended by RFC 2598
   while allowing a number of reasonable compliant implementations.

1 Introduction

   The Expedited Forwarding (EF) Per-Hop Behavior (PHB) of RFC 2598 was
   designed to be used to build a low-loss, low-latency, low-jitter,
   assured bandwidth service. The potential benefits of this service,
   and therefore the EF PHB, are enormous. Because of the great value
   of this PHB, it is critical that the forwarding behavior required of

Charny                         May 2000                             2
                           EF PHB Redefined                   Nov 2000


   and delivered by an EF-compliant node be specific, quantifiable, and
   unambiguous.

   The underlying intuition behind the EF PHB, as defined in RFC 2598,
   stems from the fact that delay and jitter are typically small in a
   lightly loaded network.  The EF PHB, as defined in RFC 2598,
   effectively defines a building block for creating a "virtual
   unloaded network" for EF traffic.  It achieves this goal by
   requiring that the service rate of the EF aggregate at any link be
   equal to or exceeds the input rate of EF traffic at any link (under
   the assumption that the network is appropriately provisioned and
   that EF traffic is shaped/policed at the network ingress).

   Conceptually, the configured rate of the EF aggregate can be viewed
   as the "link speed" in this "virtual network".  While specifying
   this "link speed" is not by itself sufficient to provide strict
   delay or jitter guarantees in a general network, nevertheless
   knowing this "link speed", or the minimal guaranteed drain rate of
   EF traffic, is essential for the ability to construct quantifiable
   end-to-end behavior across a Diffserv domain.
    
   Thus, the definition of EF PHB in RFC 2598 is indeed a necessary
   building block for constructing quantifiable PDBs. Unfortunately
   however, we believe that the actual definition contained in section
   2 of RFC 2598 is not sufficiently precise.  As a result, many of the
   forwarding behaviors which are intuitively reasonable do not
   actually comply with the formal definition of RFC 2598. Furthermore,
   many of the schedulers believed to deliver EF-compliant behavior
   cannot be used to implement the formal definition of EF since they
   result in forwarding treatment which does not comply with the
   definition of RFC 2598.

   A detailed discussion of the issues we find with the definition of
   RFC 2598 is given in Appendix A.

   The goal of this draft is to give a precise mathematical definition
   that describes the notion of ensuring a guaranteed service rate for
   an EF aggregate at a small timescale, thus presenting a formal
   framework for constructing an "unloaded virtual network" for EF
   traffic. Different PDBs may be constructed from this basic building
   block by imposing various restrictions on the network topology,
   configuration parameters, scheduling disciplines, etc. These
   mechanisms are outside the scope of this draft.

2 Definition of EF PHB

2.1 The Formal Definition

2.1.1 Intuition behind the definition

   The intent of EF PHB is to provide the EF aggregate with its
   configured service rate (or better) over as small a timescale as


Charny                         May 2000                             3
                           EF PHB Redefined                   Nov 2000


   possible. We formalize this notion by introducing what we call a
   "packet scale rate guarantee".

   The intuitive meaning of the packet scale rate guarantee is that as
   long as there are EF packets in the node, we would like the j-th EF
   packet of length L(j) to depart no later than L(j)/R seconds after
   the (j-1)st departed (here R is the configured rate of the
   aggregate). (L(j)/R is simply the time that it would take to forward
   the j-th packet at the EF-configured rate R.) Were this always to
   occur, the EF packets would be forwarded perfectly at the configured
   rate.

   However, real world schedulers and router architectures introduce
   various degrees of distortion in the perfect forwarding sequence.
   Furthermore, it is clear that packets may not possibly be forwarded
   at the configured rate if they arrive slower than at this rate. The
   formal definition must account for these issues.

   In essence, the packet scale rate guarantee is defined in terms of
   an upper bound on the deviation of the actual departure time of the
   j-th packet of EF aggregate from the "ideal" departure time at
   configured rate R.  The "ideal" departure time is computed
   iteratively. Essentially, when there are multiple EF packets in the
   device, the ideal time of the j-th departure is simply the ideal
   time of previous departure plus L(j)/R, where  L(j) is the length of
   the j-th packet to depart. In the case when an EF packet arrives to
   a device when all the previous packets have already departed, the
   computation of the ideal departure time is somewhat more
   complicated.  There are two cases to be considered in this case.  If
   the previous, j-1-th departure occurred after its own ideal
   departure time, then the new ideal departure time should be L(j)/R
   plus the larger of the j-1-th ideal departure time and the j-th
   arrival time. This is the case when the EF aggregate is behind its
   ideal service rate at the time of the j-1-th departure. However, if
   the previous departure occurred before its ideal departure time,
   which corresponds to the case when the EF aggregate has been served
   faster than at its configured rate at by the time of the j-1-th
   departure, then the new departure time is computed as L(j)/R plus
   the larger of the j-th arrival time and the time of the actual
   (rather than the ideal)j-1-th departure. This is needed to avoid
   "punishing" the newly arrived EF packet by delaying it longer due to
   some other packets receiving service faster than at the configured
   rate R in the past.  More discussion of this issue can be found in
   appendices A and E.

2.1.2 The Formal Definition
   Formally, we say that a node provides EF service if it forwards
   packets in compliance with the following definition:

   Definition of Packet Scale Rate Guarantee  (DEF_1)
   -----------------------------------------


Charny                         May 2000                             4
                           EF PHB Redefined                   Nov 2000


   A node offers the EF aggregate a "packet scale rate guarantee R with
   latency E" at some output interface I if for all j > 0, d(j), the
   time of departure of the j-th EF packet to depart from the interface
   I, satisfies the following condition:


      d(j) <= F(j) + E                                           (eq_1)

   where F(j) is defined iteratively by

      F(0)=0, d(0) = 0

      F(j)=max(a(j), min(d(j-1), F(j-1)))+ L(j)/R,  for all j>0  (eq_2)

   and E is a constant tolerance (or error) term for the node (given in
   seconds).

   In this definition,

   d(j) is the time that the last bit of the j-th EF packet to depart
   actually leaves the node from the interface I.

   F(j) is the target departure (finishing) time for the j-th EF packet
   to depart from I, the "ideal" time that the last bit of that packet
   should leave the node.

   a(j) is the time that the last bit of the j-th EF packet destined to
   the output I to arrive actually arrives at the node.

   L(j) is the size (bits) of the j-th EF packet to depart from I.
   R is the EF configured rate at I (in bits/second)


   Note that the sequences a(j), d(j) and F(j) relate to packets that
   leave a given output interface, in this case interface I, but may
   arrive from any input interface. Every OUTPUT interface, I,J,K,etc
   has its own sequence of a(j)'s, d(j)'s and F(j)'s, i.e. a_I(j),
   a_J(j), a_K(j), etc, for clarity we omit the subscript since it can
   be inferred.

   The choice of indexes does not restrict when in the actual packet
   stream we start the observation of the arrival and departure of EF
   packets, except that the observation must start when there are no EF
   packets in the node for this output interface. (Otherwise, we would
   not have the values of a(j) for the EF packets already in the node.)
   Note also that while index j=1 corresponds to the first packet in
   the observation, index j=0 does not correspond to any packet at all
   and is used solely to start the recursion.

   The latency term E in (eq_1) quantifies the maximum distortion from
   the ideal service at the configured rate R that a particular device

Charny                         May 2000                             5
                           EF PHB Redefined                   Nov 2000

   can introduce.  As a result, the term E in (eq_1) can be viewed as a
   "figure of merit" and can be used to compare different
   implementations of EF PHB.

   NOTE: The latency term E may be declared on a per output link basis.

   NOTE: Since the declaration of a fixed value of E may for some
   schedulers restrict the range of the configured rate R, the value of
   E may be declared as a function of the configured rate R.

   Note that nothing in the definition implies that a(j) and d(j)
   necessarily refer to the same packet. This lack of direct
   correspondence between a(j) and d(j) is deliberate, and relates to
   the goal of accommodating a wide range of schedulers and router
   architectures. Even in the case of a priority FIFO implementation at
   the output interface, the presence of variable internal delay may
   result in reordering of the EF packets arriving from different input
   interfaces, causing the j-th EF packet arriving to a router not
   being the same packet as the j-th EF packet departing from the
   router. Likewise, the j-th arriving packet may not necessarily be
   the j-th departing packet in "flow-aware schedulers" which have the
   ability to differentiate between different sub-flows within the EF
   aggregate. An example of such a scheduler might be a hierarchical
   scheduler which serves the EF aggregate as a whole at the highest
   priority, but uses some WFQ implementation to choose a packet of a
   particular sub-stream of EF (e.g. a given "virtual wire" circuit)
   within the EF aggregate. Further discussion of interpretation of
   this definition can be found in Appendix A.

2.1.3. Example usage of the definition.


   We now show an example of how the definition can be applied to an
   abstract router.

   The figures below describe a sequence of packets arriving to a
   router, and their departure times. Nothing is known about the
   internals of the router, and the arrival and departure times
   represent the only externally observable information. All packets
   shown in the examples are destined to a single output interface. For
   the sake of an example, we assume that the output interface in
   question has a configured rate R=C/2, where C is the output line
   rate, and that the router declares the error term E=4*(MTU/C) at
   this interface.

   In each figure, time increases as we move to the right. Units of
   time are MTU/C, the time it takes to forward an MTU-sized packet at
   the output line rate C. For simplicity, all packets are MTU-sized.

   The first figure below shows a sequence of arriving EF packets
   (labeled A, B, etc., using upper-case letters). The placement of the
   letter corresponds to the time when the last bit of the packet
   arrives at the router.  Note that there is some degree of burstiness

Charny                         May 2000                             6
                           EF PHB Redefined                   Nov 2000


   in the input pattern:  packets A and B arrive back-to-back, and
   packets D and E arrive back-to-back as well. Packets E and F arrive
   simultaneously on different input interfaces.

   t --->
   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
   A
       B
               C
                       D
                           E
                           F
                                       G

   The next figure below shows a forwarding behavior that conforms to
   the definition proposed in this draft. On each line, the packet
   letter (A, B, etc.) shows the time that the last bit of the packet
   is forwarded. I.e., the upper-case letters give the values of d()
   for the sequence of packets. The terms 'f' and 'f+e' in a row give
   the ideal departure time, F(), and the latest permissible departure
   time, F() + E, respectively, for the packet on that row. Thus, F(A)
   = 2 and F(A) + E = 6, as given on the first row of the body of the
   table. Similarly, F(B) = 3 and F(B) + E = 7, as given on the second
   row of the body of the table. Hence, any uppercase letter which is
   placed to the left of the time corresponding to f+e on the line
   corresponds to a conformant departure.  Calculations using equations
   (eq_1) and (eq_2) are given after the figure to show how the values
   of 'f' and 'f+e' were obtained. Some comments follow the
   calculations.

   t --->
0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    A   f              f+e
            Bf             f+e
                    f   C          f+e
                            f           D  f+e
                                    f       E      f+e
                                            f   F          f+e
                                                   f   G          f+e


   a(0) = d(0) = 0.

   a(A) = 0
   F(A) = max(0, min(0, 0)) + 2 = 2, d(A) must be <= 2 + 4 = 6
   d(A) = 1 <= 6

   a(B) = 1
   F(B) = max(1, min(1, 2)) + 2 = 3, d(B) must be <= 3 + 4 = 7
   d(B) = 3 <= 7

   a(C) = 3

Charny                         May 2000                             7
                           EF PHB Redefined                   Nov 2000

   F(C) = max(3, min(3, 3)) + 2 = 5, d(C) must be <= 5 + 4 = 9
   d(C) = 6 <= 9

   a(D) = 5
   F(D) = max(5, min(6, 5)) + 2 = 7, d(D) must be <= 7 + 4 = 11
   d(D) = 10 <= 11

   a(E) = 6
   F(E) = max(6, min(10, 7)) + 2 = 9, d(E) must be <= 9 + 4 = 13
   d(E) = 11 <= 13

   a(F) = 6
   F(F) = max(6, min(11, 9)) + 2 = 11, d(F) must be <= 11 + 4 = 15
   d(F) = 12

   a(G) = 9
   F(G) = max(9, min(12, 11)) + 2 = 13, d(G) must be <= 13 + 4 = 17
   d(G) = 14 <= 17

   The key to understanding the calculations is to notice that whenever
   a packet P is forwarded earlier than its ideal departure time,F(P),
   the calculation of the next packet's ideal departure time uses P's
   actual departure time. Whenever a packet P is forwarded later than
   its ideal departure time, the calculation of the next packet's ideal
   departure time uses P's ideal departure time. Thus, slippage is not
   allowed to accumulate when packets are forwarded late, and credit is
   not built up when packets are forwarded early.

   The next figure below shows another forwarding behavior for the same
   arrival pattern. This behavior does not conform to this draft's
   proposed definition.

   t --->
0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    A   f              f+e
            Bf             f+e
                C   f              f+e
                            Df             f+e
                                    f       E      f+e
                                            f              f+e  F
                                                    f              Gf+e

   Here, A and C are forwarded early, while B and D are forwarded at
   their ideal departure times. Note, however, that these ideal
   departure times are earlier than they would have been if A and C had
   not been forwarded early. (Note also, that there is no accumulation
   of credit for the early departures of A and C. If such an
   accumulation of credit were permitted, jitter could be increased if
   a subsequent packet is delayed a long time while the credit is
   spent.) Thus, the ideal forwarding time for the fourth packet, D, is
   at time 6, even though the EF-configured rate is one packet every
   two time intervals. Packet E is forwarded late, although still
   within tolerance. Packet F's ideal forwarding time is 10, but it is

Charny                         May 2000                             8
                           EF PHB Redefined                   Nov 2000

   not forwarded until time 15, one time unit later than its latest
   permissible forwarding time. This very late departure makes the
   behavior non-conformant. Packet G is forwarded in conformance with
   the definition, but just barely.

   These examples illustrate how conformance to the proposed definition
   can be verified without any knowledge of the internal router
   architecture or scheduling implementation. Of course, while this
   knowledge is not necessary to determine the conformance with a given
   declared E, the designer of the box must use this knowledge to be
   able to declare E for the device.

2.2 Per Packet Delay.

   It is important to note that just as with the definition of EF PHB
   in RFC 2598, the packet scale rate guarantee is defined only in the
   context of an entire EF aggregate as a whole. In particular, the
   packet-scale rate guarantee definition is intentionally silent about
   exactly how various sub-streams of the EF aggregate are scheduled
   within the EF aggregate. A consequence of this is that packet scale
   rate guarantee provided to EF aggregate does not by itself imply a
   per-packet delay.

   This is analogous to the fact that the mere knowledge of link rates
   in a real network serving just a single class of traffic does not in
   itself provide per-packet delay guarantee.  The per packet delay
   guarantee at a hop with a FIFO service in such network will differ
   drastically from a per-packet delay guarantee at a hop with a WFQ
   server.

   Aside from the knowledge of the properties of scheduling
   implementations, ensuring per-packet delay at a hop involves the
   ability to bound burstiness at the ingress of a hop. This is a
   complex task involving a fair amount of global knowledge such as the
   network topology, hop count, link utilization, upstream scheduling
   implementations, etc.  As a result, addressing these issues appears
   appropriate not in the context of a local PHB definition, but rather
   in the context of a PDB, which is inherently a global concept.

3. Implementation considerations.

   The packet scale rate guarantee definition does not mandate a
   particular underlying queuing or scheduling mechanism. However, for
   the definition to be meaningful, it is important to make sure that
   there exist at least some schedulers that strictly satisfy the
   definition with reasonable latency terms.

   It can be shown that the strict priority scheduler in which all EF
   packets share a single FIFO queue (which is served at strict non-
   preemptive priority over other queues) satisfies the definition with
   the latency term E = MTU/C where MTU is the maximum packet size and
   C is the speed of output link.


Charny                         May 2000                             9
                           EF PHB Redefined                   Nov 2000

   Another scheduler that satisfies the definition with a small latency
   term is WF2Q described in [BZ96a]. A class-based WF2Q scheduler, in
   which all EF traffic shares a single queue with the weight
   corresponding to the configured rate of the EF aggregate can be
   shown to satisfy the definition with the latency term E =
   MTU/C+MTU/R.

   The proofs that PQ and WF2Q satisfy the packet-scale rate guarantee
   definition with the above latency terms are given in Appendix C.

   The definition also allows a wide range of scheduling algorithms,
   but different algorithms result in different degrees of deviation
   from the "ideal" service rate. The degree of the accuracy with which
   a scheduler can ensure that the EF aggregate receives its configured
   rate at a small (packet) timescale is expressed by the E term of the
   scheduler. A list of several well-known schedulers and their
   corresponding error terms can be found in Appendix D.

5. Security Considerations

   This draft makes the PHB definition in [RFC2598] more rigorous, but
   adds no new functions to it. As a result, it adds no security issues
   to those described in that specification.

6. Appendices

   Appendix A: Issues with the RFC 2598 PHB Definition

   There are several potentially serious problems with having a formal
   EF definition that does not match people's intuitive understanding.
   First, the understanding of what it means for a node to be EF-
   compliant may vary among people. This discrepancy may arise due to
   the fact that two people's intuitive understanding of the definition
   may actually differ somewhat; also, someone learning about EF from
   the formal definition may develop an understanding of EF at odds
   with the understanding that most people currently familiar with EF
   have. These discrepancies in people's understanding of EF may have
   serious consequences. The resulting confusion may increase the time
   and cost needed to develop equipment, cause interoperability
   problems, and create mismatches between expected node and network
   performance and actual performance. Second, the lack of a clear
   conformance definition makes it impossible to test a piece of
   equipment and declare it "conforming" or "non-conforming." Third,
   the lack of a mathematically precise description of a node's
   behavior makes it impossible to analytically design or evaluate
   services constructed using the EF PHB or other PHBs that must
   contend for resources with EF traffic. Fourth, an incorrect formal
   definition of EF may lead to erroneous reasoning about the
   properties of networks implementing EF.

A.1 The RFC 2598 Definition of EF PHB and Its Intuitive Meaning

   The definition of the EF PHB as given in [RFC 2598] states:

Charny                         May 2000                            10
                           EF PHB Redefined                   Nov 2000


     It [the EF PHB departure rate] SHOULD average at least the
   configured rate when measured over any time interval equal to or
   longer than the time it takes to send an output link MTU sized
   packet at the configured rate.

   The intuitive content of this definition is fairly clear. On all
   time scales ranging down to very small time scales, the EF aggregate
   should be given at least its configured share of the output link
   bandwidth. Among other things, this allows EF to support
   applications that are delay- and jitter-sensitive.

   However, intuition alone will not allow vendors to design compliant
   schedulers capable of advertising their EF configuration to other
   routers. As we show in the next section, the simplicity of the
   definition is misleading in the sense that it does not actually
   capture the intuition correctly under a number of circumstances.

   A note is due here on the precise interpretation of the wording of
   the definition. A potential cause of ambiguity is the fact that the
   definition contains the word SHOULD which according to [Bra97] means
   that in principle an implementation of EF PHB may under some
   circumstances choose not to be strictly compliant with the specified
   requirement, in which case any issues with the strict definition may
   be viewed as irrelevant. However, it seems that in order for the
   SHOULD to be meaningful, there should exist at least some
   implementations which are strictly compliant, even if non-compliant
   implementations may be chosen under some circumstances. Furthermore,
   the Virtual Wire behavior aggregate [JNP2000] is defined by
   replacing SHOULD by MUST in the definition of EF PHB in RFC 2598.
   Therefore, in all cases the exact mathematical properties of the EF
   definition and the existence of strictly compliant implementations
   are of substantial interest.  The remainder of this section
   concentrates on the discussion of these issues in detail.

A.2 Particular Difficulties with the RFC 2598 EF PHB Definition

   A literal interpretation of the definition would consider the
   behaviors given in the next two subsections as non-compliant. The
   definition also unnecessarily constrains the maximum configurable
   rate of an EF aggregate.

A.2.1 Perfectly-Clocked Forwarding

   Consider the following stream forwarded from a router with EF-
   configured rate R=C/2, where C is the output line rate. In the
   illustration, E is an MTU-sized EF packet while x is a non-EF packet
   or unused capacity, also of size MTU.

           ... E x E x E x E x E x E x...
                    |-----|


Charny                         May 2000                            11
                           EF PHB Redefined                   Nov 2000

   The interval between the vertical bars is 3*MTU/C, which is greater
   than MTU/(C/2), and so is subject to the EF PHB definition. During
   this interval, 3*MTU/2 bits of the EF aggregate should be forwarded,
   but only MTU bits are forwarded.  Therefore, while this forwarding
   pattern should be considered compliant under any reasonable
   interpretation of the EF PHB, it actually does not formally comply
   with the definition of RFC 2598.

   Note that this forwarding pattern can occur in any work-conserving
   scheduler in an ideal output-buffered architecture where EF packets
   arrive in a perfectly clocked manner according to the above pattern
   and are forwarded according to exactly the same pattern in the
   absence of any non-EF traffic.

   Trivial as this example may be, it reveals the lack of mathematical
   precision in the formal definition. The fact that no work-conserving
   scheduler can formally comply with the definition is unfortunate,
   and appears to warrant some changes to the definition that would
   correct this problem.

   The underlying reason for the problem described here is quite simple
   - one can only expect that the EF aggregate is served at configured
   rate in some interval where there is enough backlog of EF packets to
   sustain that rate. In the example above the packets come in exactly
   at the rate at which they are served, and so there is no persistent
   backlog.  Certainly, if the input rate is even smaller than the
   configured rate of the EF aggregate, there will be no backlog as
   well, and a similar formal difficulty will occur.

   A seemingly simple solution to this difficulty might be to require
   that the EF aggregate is served at its configured rate only when the
   queue is backlogged.  However, as we show in the remainder of this
   section, this solution does not suffice.

A.2.2 Router Internal Delay

   We now argue that the example considered in the previous section is
   not as trivial as it may seem at first glance.

   Consider a router with EF configured rate R = C/2 as in the previous
   example, but with an internal delay of 3T (where T = MTU/C) between
   the time that a packet arrives at the router and the time that it is
   first eligible for forwarding at the output link. Such things as
   header processing, route look-up, and delay in switching through a
   multi-layer fabric could cause this delay. Now suppose that EF
   traffic arrives regularly at a rate of (2/3)R = C/3. The router will
   perform as shown below.

            EF Packet Number         1    2    3    4    5    6 ...
            Arrival (at router)      0   3T   6T   9T  12T  15T ...
            Arrival (at scheduler)  3T   6T   9T  12T  15T  18T ...
            Departure               4T   7T  10T  13T  16T  19T ...


Charny                         May 2000                            12
                           EF PHB Redefined                   Nov 2000

   Again, the output does not satisfy the RFC 2598 definition of EF
   PHB. As in the previous example, the underlying reason for this
   problem is that the scheduler cannot forward EF traffic faster than
   it arrives. However, it can be easily seen that the existence of
   internal delay causes one packet to be inside the router at all
   times. An external observer will rightfully conclude that the number
   of EF packets that arrived to the router is always at least one
   greater than the number of EF packets that left the router, and
   therefore the EF aggregate is constantly backlogged. However, while
   the EF aggregate is continuously backlogged, the observed output
   rate is nevertheless strictly less that the configured rate.

   This example indicates that the simple addition of the condition
   that EF aggregate must receive its configured rate only when the EF
   aggregate is backlogged does not suffice in this case.

   Yet, the problem described here is of fundamental importance in
   practice.  Most routers have a certain amount of internal delay.  A
   vendor declaring EF compliance is not expected to simultaneously
   declare the details of the internals of the router.  Therefore, the
   existence of internal delay may cause a perfectly reasonable EF
   implementation to display seemingly non-conformant behavior, which
   is clearly undesirable.

A.2.3 Maximum Configurable Rate and Provisioning Efficiency

   It is well understood that with any non-preemptive scheduler, the
   compliant configurable rate for an EF aggregate cannot exceed C/2
   [JNP2000]. This is because an MTU-sized EF packet may arrive to an
   empty queue at time t just as an MTU-sized non-EF packet begins
   service. The maximum number of EF bits that could be forwarded
   during the interval [t, t + 2*MTU/C] is MTU. But if configured rate
   R > C/2, then this interval would be of length greater than MTU/R,
   and more than MTU EF bits would have to be served during this
   interval for the router to be compliant. Thus, R must be no greater
   than C/2.

   It can be shown that for schedulers other than PQ, such as various
   implementations of WFQ, the maximum compliant configured rate may be
   much smaller than 50%. For example, for SCFQ [Gol94] the maximum
   configured rate cannot exceed C/N, where N is the number of queues
   in the scheduler. For WRR, mentioned as compliant in section 2.2 of
   RFC 2598, this limitation is even more severe. This is because in
   these schedulers a packet arriving to an empty EF queue may be
   forced to wait until one packet from each other queue (in the case
   of SCFQ) or until several packets from each other queue (in the case
   of WRR) are served before it will finally be forwarded.

   While it is frequently assumed that the configured rate of EF
   traffic will be substantially smaller than the link bandwidth, the
   bandwidth appears unnecessarily limiting.  For example, in a fully
   connected mesh network, where any flow traverses a single link on
   its way from source to its destination there seems no compelling

Charny                         May 2000                            13
                           EF PHB Redefined                   Nov 2000

   reason to limit the amount of EF traffic to 50% (or an even smaller
   percentage for some schedulers) of the link bandwidth.

   Another, perhaps even more striking example is the fact that even a
   TDM circuit with dedicated slots cannot be configured to forward EF
   packets at more than 50% of the link speed without violating RFC
   2598 (unless the entire link is configured for EF). If the
   configured rate of EF traffic is greater than 50% (but less than the
   link speed), there will always exist an interval longer than MTU/R
   in which less than the configured rate is achieved. For example,
   suppose the configured rate of the EF aggregate is 2C/3. Then the
   forwarding pattern of the TDM circuit might be

      E E x E E x E E x ...
         |---|

   where only one packet is served in the marked interval of length 2T
   = 2MTU/C. But at least 4/3 MTU would have to be served during this
   interval by a router in compliance with the definition in RFC 2598.
   The fact that even a TDM line cannot be booked over 50% by EF
   traffic indicates that the restriction is artificial and
   unnecessary.

A.3 The Non-trivial Nature of the Difficulties

   One possibility to correct the problems discussed in the previous
   sections might be to attempt to clarify the definition of the
   intervals to which the definition applied or by averaging over
   multiple intervals. However, an attempt to do so meets with
   considerable analytical and implementation difficulties. For
   example, attempting to align interval start times with some epochs
   of the forwarded stream appears to require a certain degree of
   global clock synchronization and is fraught with the risk of
   misinterpretation and mistake in practice.

   Another approach might be to allow averaging of the rates over some
   larger time scale. However, it is unclear exactly what finite time
   scale would suffice in all reasonable cases. Furthermore, this
   approach would compromise the notion of very short-term time scale
   guarantees that are the essence of EF PHB.

   We also explored a combination of two simple fixes. The first is the
   addition of the condition that the only intervals subject to the
   definition are those that fall inside a period during which the EF
   aggregate is continuously backlogged in the router (i.e., when an EF
   packet is in the router). The second is the addition of an error
   (latency) term that could serve as a figure-of-merit in the
   advertising of EF services.

   With the addition of these two changes the candidate definition
   becomes as follows:


Charny                         May 2000                            14
                           EF PHB Redefined                   Nov 2000

     In any interval of time (t1, t2) in which EF traffic is
   continuously backlogged, at least R(t2 - t1 - E) bits of EF traffic
   must be served, where R is the configured rate for the EF aggregate
   and E is an implementation-specific latency term.

   The "continuously backlogged" condition eliminates the insufficient-
   packets-to-forward difficulty while the addition of the latency term
   of size MTU/C resolves the perfectly-clocked forwarding example
   (section A.2.1), and also removes the limitation on EF configured
   rate.

   However, neither fix (nor the two of them together) resolves the
   example of section A.2.2. To see this, recall that in the example of
   section A.2.2 the EF aggregate is continuously backlogged, but the
   service rate of the EF aggregate is consistently smaller than the
   configured rate, and therefore no finite latency term will suffice
   to bring the example into conformance.  This appears to be a serious
   problem.

   Therefore, we believe that such modification, albeit attractive in
   its simplicity, falls short of addressing all the problems
   identified with the definition of the RFC 2598.

Appendix B: Further Interpretation of the Packet Scale Rate Guarantee
Definition

   The intuitive meaning of the packet scale rate guarantee is that as
   long as there are EF packets in the node, we would like the j-th EF
   packet to depart L(j)/R seconds after the (j-1)st departed. (L(j)/R
   is the time that it would take to forward the j-th packet at the EF-
   configured rate R.) Were this always to occur, the EF packets would
   be forwarded perfectly. The rest of the definition is a concession
   to the extreme unlikelihood that perfect forwarding can occur.

   Perhaps the simplest way to understand the definition is to dissect
   it and examine its various pieces.

   Consider the term min(d(j-1), F(j-1)). This term exists to ensure
   that the node is not given "credit" for faster-than-configured
   service and is not forgiven for slower-than-configured service.
   Suppose that this term was replaced with d(j-1) or with F(j-1).

   Replacing min(d(j-1), F(j-1)) with d(j-1) would permit the node to
   give the EF aggregate a consistently lower rate of service than the
   configured rate whenever E > 0. To see this, suppose that we make
   the replacement, that all packets have size MTU, and that a(j) <=
   d(j-1). (This last condition means that the node is continuously
   backlogged with EF packets over the time interval under discussion.)
   Then, using the revised definition, we would have

      F(j) =  d(j-1) + MTU/R
      d(j) <= F(j) + E = d(j-1) + MTU/R + E


Charny                         May 2000                            15
                           EF PHB Redefined                   Nov 2000

   which would imply

      [d(j) - d(j-1)] <= MTU/R + E

   This last inequality says that the node would be permitted to send
   an MTU-sized packet every (MTU/R)+E seconds. If E > 0, this rate
   would be consistently slower than R and is clearly not acceptable EF
   PHB.

   Replacing min(d(j-1), F(j-1)) with F(j-1) would award the node
   "credit" for faster-than-configured service. It would be possible
   for the node to accumulate this credit by forwarding several EF
   packets in a row, each earlier than required. The node could then
   redeem this credit by delaying the next EF packet until all the
   credit plus the normal inter-packet interval was consumed.  To see
   this, suppose we make the replacement, that all packets have size
   MTU, and that a(j) <= F(j-1). (This last condition means that the
   next EF packet arrives before the previous packet was scheduled to
   depart.) Then, using this revised definition, we would have

      F'(j) = F'(j-1) + MTU/R and d(j) <= F'(j) + E

   Suppose that we have a node with negligible internal delay, that its
   output line rate is C = 3R, and that it forwards n EF packets back-
   to-back. We would have

      F'(1) = MTU/R; d(1) = MTU/C
      F'(2) = F'(1) + MTU/R = 2MTU/R; d(2) = 2MTU/C
         ...
      F'(n) = F'(n-1) + MTU/R = nMTU/R; d(n) = nMTU/C

   By the time the n-th EF packet is forwarded, the node has
   accumulated credit amounting to n(MTU/R - MTU/C). Using the example
   assumption that C = 3R, the node has accumulated credit equal to
   (2n/3)MTU/R. The (n+1)th EF packet need not be forwarded until
   (2n/3)MTU/R + E seconds have elapsed from the time that the n-th
   packet was forwarded. Depending upon the actual values of n and R
   (which may be much less than 1/3 the output line rate), a sizeable
   amount of jitter between the n-th and (n-1)th EF packets would be
   produced.

   These two alternative definitions illustrate the role of the
   min(d(j-1), F(j-1)) term - to ensure that the node forwards EF
   packets at at least the configured rate over both large and small
   time scales.

   The a(j) term and the maximum operator are included for purely
   technical reasons. First, their presence says that the node does not
   have to forward an EF packet that has not yet arrived. Absurd as
   such a notion may be, without the term and the operator, the
   definition would formally insist that EF packets continue to be
   forwarded even when there are none to be forwarded.


Charny                         May 2000                            16
                           EF PHB Redefined                   Nov 2000

   If this were the only purpose for including the a(j) term and the
   maximum operator, it would be much clearer to simply add the
   condition that the definition applies only when the node has
   backlogged EF packets. However, there is a second reason why the
   definition is written as it is - the possibility that the node has
   non-negligible internal delay between the input and the output. Such
   things as header processing, route look-up, and delay in switching
   through a multi-layer fabric could cause this delay.

   The set-up of an example to illustrate this role of the a(j) term
   and the maximum operator is a bit more lengthy than it was for the
   previous examples. Consider a node with an EF-configured rate of R =
   C/2. Let T = MTU/C, the time it takes to forward an MTU-sized packet
   at the output line rate.  Suppose that MTU-sized EF packets arrive
   at the node regularly at a rate of (2/3)R = C/3. Suppose also that
   the node has an internal delay of 3T. Even if there is no other
   traffic, the node will perform no better than is shown below.

      EF Packet Number         1    2    3    4    5    6 ...
      Arrival at router (a(j)) 0   3T   6T   9T  12T  15T ...
      Arrival (at scheduler)  3T   6T   9T  12T  15T  18T ...
      Departure (d(j))        4T   7T  10T  13T  16T  19T ...

   Note that from time 0 onward, EF packets are backlogged in the node.
   If the a(j) term and the maximum operator are removed from the
   definition, then we would have

      F'(j) = min(d(j-1), F'(j-1)) + MTU/R
      d(j) <= F'(j) + E

   Working through the recursions with F' representing the modified
   target finishing time function and F representing the original
   definition given in equations (1) and (2), we have

      EF Packet Number         1    2    3    4    5    6 ...
      Arrival at router (a(j)) 0   3T   6T   9T  12T  15T ...
      Departure (d(j))        4T   7T  10T  13T  16T  19T ...
      Modified (F'(j))        2T   4T   6T   8T  10T  12T ...
      Original (F(j))         2T   5T   8T  11T  14T  17T ...

   The modified F' falls behind the departure times at a constant rate.
   No fixed tolerance term, E, would be large enough to ensure that the
   node's behavior was compliant. On the other hand, the original F has
   every packet being forwarded late, but always late by the same
   amount, 2T. Setting E >= 2T allows the node to conform to EF PHB.

   Note that this node cannot possibly perform any better than has been
   depicted in this example. It cannot begin forwarding EF packets
   until 3T after they arrive. As in this example, as link speeds
   increase, we may well discover that internal delays become multiples
   of the time it takes to transmit a packet. Thus, it is important
   that the definition of EF rigorously address acceptable behavior in
   the presence of internal delay.

Charny                         May 2000                            17
                           EF PHB Redefined                   Nov 2000


   This last example leads to a consideration of the role of the
   tolerance term, E. It happens that E must be greater than 0 for
   almost every real-world node that would provide EF PHB.

   We have already seen that we need E > 0 for a node that has internal
   delay, even if there is no non-EF traffic. Another easy example
   where E > 0 is required, is a non-preemptive node offering an EF-
   configured rate R > C/2. Suppose, for example, that R = 0.75C. With
   such a node, it is always possible that an EF packet will arrive at
   a node (at time 0) just as that node is beginning to serve a non-EF
   packet. Assuming that the EF packet and the non-EF packet are the
   same size (say, MTU-sized), the EF packet will have to wait at least
   until the non-EF packet is forwarded before it can begin to be
   served. That is, the earliest that the EF packet can be forwarded is
   2MTU/C.  Yet, for the EF packet, F(1) = 0 + MTU/(0.75C) =
   (4/3)MTU/C. If E were 0, then d(1) <= (4/3)MTU/C. But, this is
   impossible. Thus, without the tolerance term E > 0, the node could
   not be configured for this EF-configured rate, even if it serves EF
   using priority queuing with EF as the highest priority.

   In Appendix D, we consider the situations of other scheduling
   disciplines for EF service, such as weighted round-robin, weighted
   fair queuing, and other commonly-used schedulers. All of these
   schedulers require an E > 0, even if their internal delay is
   negligible. Rather than excluding nodes employing these schedulers
   from ever being able to offer EF service, we included the tolerance
   term E in the definition of the packet scale rate guarantee. It is
   possible that nodes can use this term as a figure of merit when
   advertising their capability to provide EF PHB.

   It is also important to note that the tolerance E does not permit a
   node to persistently forward EF packets at less than the configured
   rate. By including E in the d(j) <= F(j) + E inequality rather than
   in the recursion that defines F(j), the worst that can happen is
   that forwarding is shifted forward in time by at most E. That is,
   the E term allows a fixed delay for the forwarding of the entire EF
   stream, but it does not allow the rate of forwarding to be less than
   the EF-configured rate. Putting yet another way, it is a difference,
   but not a differential.

Appendix C: Proofs of Satisfiability of the Packet Scale Rate Guarantee
Definition for PQ and WF2Q

C.1 Satisfiability of the Packet Scale Rate Guarantee Definition for PQ

      In this section, we prove that a priority queuing (PQ) scheduler
   satisfies the EF redefinition using the latency term E = MTU/C.

   Statement C1.
   ============


Charny                         May 2000                            18
                           EF PHB Redefined                   Nov 2000

   PQ satisfies the redefinition (equations (eq_1) and (eq_2) of
   section 2.1) with E=MTU/C.

   Proof of C1.

   Consider any busy period of the EF queue. Let k=1 correspond to
   the first packet in that busy period and assume that a(1) >=0.

   We prove by induction that for all k >=1 in this busy period

       d(k) <= F(k)+MTU/C                                (eq_c1_1)

   This would immediately imply Statement C1.

   Base case.
   For k=1,

   F(1) = max(a(1), min(d(0), F(0)) + L(1)/R >= a(1) + L(1)/R >=
          a(1)+L(1)/C                                         (eq_c1_2)

   and
       d(1) <= a(1) + MTU/C + L(1)/C <= F(1) + MTU/C

   where the first inequality follows from the fact that the first
   packet  in a PQ may wait at most for  one  largest packet
   transmission  before its own transmission  begins, and the second
   inequality follows from (eq_c1_2).

   Inductive step.

   Note that since EF has the highest priority, for k > 1 in the busy
   period of the EF queue

        d(k) = d(k-1) + L(k)/C                                (eq_c1_3)

   Now from the induction hypothesis

        F(k-1) >= d(k-1) - MTU/C

   And the definition (eq_2) of section 2.1 of F(k) gives

        F(k) >= max(a(k), min(d(k-1), d(k-1) - MTU/C))+ L(k)/C =
                max(a(k), d(k-1)- MTU/C)+ L(k)/C              (eq_c1_4)

   It follows immediately from (eq_c1_4) that

        F(k) >= d(k-1)- MTU/C + L(k)/C

   Combining with (eq_c1_3) demonstrates (eq_c1_1) and completes the
   inductive step.

C.2 Satsifiability of the Packet Scale Rate Guarantee Definition for
WF2Q

Charny                         May 2000                            19
                           EF PHB Redefined                   Nov 2000


   In this section, we prove that a worst-case fair weighted fair
   queuing (WF2Q) scheduler satisfies the EF redefinition using the
   latency term E = MTU/C + MTU/R. The proof begins with a helping
   theorem that brings us most of
   the way to the conclusion.

   Statement C2.
   ============
   If a scheduler satisfies the condition

   G(i) - E1 <= d(i) <= G(i) + E2                            (eq_c2_1)

   where G(i) is the i-th finishing time of the reference fluid
   scheduler, then it satisfies the redefinition in section 2.1 with
   latency term E <= E1 + E2

   Proof of Statement C2.
   ----------------------
   To prove Statement C2 we will prove that for all i >= 0

   F(i) >= G(i) - E1                                         (eq_c2_2)

   where F(i) is the set of finish times recursively defined by
   (eq_2) of section 2.1.

   If (eq_c2_2) is proven, then from (eq_c2_1) and (eq_c2_2)

   d(i)  <= G(i) + E2 <= F(i) + E1 + E2, which means that the scheduler
   satisfies the redefinition with the latency term E = E1 + E2.

   Proof of (eq_c2_2).
   -----------------
   First note that in the reference GPS system, packet i starts its
   service at time max ( a(i), G(i-1)) and receives  a service rate at
   least equal to R. Thus

       G(i) <= max ( a(i), G(i-1)) + L(i)/R                  (eq_c2_3)

   Now the proof of (eq_c2_2) proceeds by induction.

   Base case

   F(0)=0, G(0) = 0, so (eq_c2_2) trivially holds for i=0.

   Inductive step.

   Suppose (eq_c2_2) holds for all j=0,1...i-1, (i>=1)

   We have both F(i-1) >= G(i-1) - E1 and d(i-1) >= G(i-1) - E1,
   thus

      min (F(i-1), d(i-1))  >= G(i-1) - E1                   (eq_c2_4)

Charny                         May 2000                            20
                           EF PHB Redefined                   Nov 2000


   Combining this with equation (eq_2) of section 2.1, we obtain

      F(i) >= G(i-1) - E1 + L(i)/R                           (eq_c2_5)

   Again from equation (eq_2) we have

      F(i) >= a(i)+ L(i)/R >= a(i) - E1 + L(i)/R             (eq_c2_6)

   Combining  (eq_c2_5), (eq_c2_6) and (eq_c2_3) gives F(i) >= G(i)-E1,
   which  completes  the  proof  of  (eq_c2_2)   and statement C2.

   Statement C3.
   =============
   WF2Q satisfies the redefinition (equations (eq_1) and (eq_2) of
   section 2.1) with E = MTU/C + MTU/R

   Proof of C3.
   ------------
   It follows from the results of [BZ96a] that the departures in WF2Q
   satisfy the condition

        max(G(i-1), a(i))<= d(i) <= G(i) + MTU/C


   From Equation (eq_c2_3) this implies that

       d(i) >= G(i) - L(i)/R >=  G(i) - MTU/R

   Therefore  (eq_c2_1) in Statement C2 holds with E1=MTU/R and
   E2 = MTU/C. Therefore, by Statement C2, WF2Q satisfies the
   redefinition with E=MTU/C + MTU/R.


Appendix D: Implementation Considerations -  Values of the Latency Term
for Various Schedulers

D.1 General queuing and scheduling considerations.

   The redefinition of EF given in section 2.1  does not mandate a
   particular underlying queuing structure.   While it can be
   implemented using aggregate queuing, where all packets of the EF
   aggregate share a single queue, it also allows finer queuing
   granularity, where EF packets may be assigned to a number of
   different queues.

   Likewise, the redefinition allows in principle a wide range of
   schedulingalgorithms, ranging from a strict priority scheduling of
   aggregate EF queue, to hierarchical scheduling with per-flow queuing
   as described in section D.4 below.

   Both the queuing structure and the scheduling algorithm have a
   significant impact on the delay and jitter which can be provided to

Charny                         May 2000                            21
                           EF PHB Redefined                   Nov 2000

   the packets of the EF aggregate. It is typically more difficult to
   provide strict deterministic end-to-end delay and/or jitter
   guarantees if aggregate queuing is implemented [CLeB2000].  However,
   implementing and scheduling a large number of queues at high speeds
   presents a significant engineering challenge, while aggregate
   scheduling is very attractive due to its simplicity and scalability.

D.2 Aggregate Queuing and Scheduling Accuracy for FIFO Service of the
EF Aggregate

   It can be shown that if all packets in the EF aggregate share a
   single FIFO queue served by a scheduler satisfying the rate-latency
   service curve, then end-to-end delay and jitter guarantees depend on
   the latency term E of the scheduler [CLeB2000]. The smaller the
   latency term, the better the delay and jitter bounds that can be
   provided.  In that respect, a strict priority queuing implementation
   which has a very small latency term is a natural candidate for
   implementing EF PHB. Various implementations of Weighted Fair
   Queuing-like schedulers are also possible candidates for such
   implementation, but the delay and jitter characteristics of these
   schedulers differ substantially depending on the accuracy of the
   implementation.

   A widely used way of evaluating the accuracy of rate-based
   scheduling implementations is to compare the output of the scheduler
   with the so-called "fluid model" [Par92].  In this framework, a
   given scheduler S and the reference fluid scheduler are subject to
   the same arrival patterns. The accuracy of the scheduler S can be
   determined by how close the time of the i-th departure in the
   scheduler S is to the corresponding departure time in the fluid
   scheduler.  More precisely, if d(i) is the time of the i-th
   departure under  some scheduler S, and G(i) is the time of the i-th
   departure in  the reference fluid scheduler, then the accuracy of S
   may  be determined by two latency terms E1 and E2 such that for all
   i

                   G(i)-E1 <= d(i)<= G(i) + E2

   While the term E2 determines the maximum per-hop delay bound, E1 has
   an effect on the jitter at the output of the scheduler.   For
   example, as shown in [BZ96a], for WF2Q, E1 = MTU/R, E2= MTU/C, and
   for PGPS [Par92] E2 = MTU/C as well, while E1 is linear in the
   number of queues in the scheduler.  It is demonstrated in [BZ96a]
   that while WF2Q and PGPS have the same delay bounds, PGPS may result
   in substantially burstier departure patterns.

   In general, it can be shown that if a scheduler satisfies DEF_2,
   then it also satisfies the redefinition with the latency term E <=
   E1 + E2.  The proof of this statement is given in Appendix C. Note
   that E1+E2 is not necessarily a tight latency bound, and for a given
   scheduler a tighter bound may be obtained.  That is, the fact that a
   given scheduler has a large E1+E2 does not necessarily mean that is
   has a large E.

Charny                         May 2000                            22
                           EF PHB Redefined                   Nov 2000


   D.3 Additional examples of efficient WFQ-Like Scheduling
   Implementations and their Latency Terms.

   In this section we briefly discuss some schedulers that can be used
   to implement the redefined EF PHB with different degrees of accuracy
   and with different implementation complexity.

D.3.1 Weighted Fair Queuing (WFQ/PGPS)

   For WFQ/PGPS ([DKS90],[Par92]), E2 = MTU/C just as for the case of
   WF2Q. However, it can be shown that E1 can grow linearly with
   the number of queues in the scheduler (which here and below is
   denoted by N). The worst case complexity of WFQ is also O(N).

D.3.2.Deficit Round Robin (DRR)

   For DRR [SV95], both E1 and E2 can be shown to grow linearly with
   N*(r_max/r_min)*MTU, where r_min and r_max denote the smallest and
   the largest rate among the rate assignments of all queues in the
   scheduler. The implementation complexity of DRR is O(1).


D.3.3. Start-Time Fair Queuing (SFQ) and Self-Clocked Fair Queuing
(SCFQ)

   For SFQ [GVC96] and SCFQ [Gol94] both E1 and E2 can be shown to grow
   linearly with N.  Implementation complexity of both of these
   schedulers is O(log N).

D.3.4 WF2Q+

   For WF2Q+ [BZ96b], E1  = MTU/R, while E2 can grow linearly with N.
   The implementation complexity of WF2Q+ is O(log N).

D.4. Hierarchical scheduling implementations

   A possible implementation of EF PHB may be based on a hierarchical
   scheduling framework, such as described in [FJ95].  In this
   framework, different subsets of EF packets may be assigned to
   different queues.  The semantics of exactly how packets are
   classified into different EF queues is highly implementation-
   dependent.    For convenience, the subset of EF packets sharing a
   single queue will be referred to as "EF flows".  The EF queues are
   grouped in a "logical queue", which is scheduled as a single entity
   along with other non-EF queues or groups of queues by a "top-level"
   scheduler. It is this top-level scheduler that must satisfy DEF_1.
   Once the EF aggregate (i.e the EF "logical queue") is scheduled by
   this top-level scheduler, an "EF flow-level" scheduler is invoked.

   As an example, a hierarchical scheduler with WF2Q at each level of
   the hierarchy (as described in [BZ96b]) can be used for such a
   purpose. Alternatively, the EF  "logical" queue can be served at

Charny                         May 2000                            23
                           EF PHB Redefined                   Nov 2000

   strict priority over all non-EF queues, while the EF queue at the
   "EF flow" level can be served by some other scheduler, such as WF2Q.

   In principle, hierarchical scheduling structure allows a substantial
   flexibility in the choice of scheduling mechanisms at each level of
   the hierarchy.  Per-packet delay guarantees in such a hierarchical
   scheduling framework strongly depend on the accuracy of schedulers
   employed at each level of the hierarchy. In general, the more
   accurate the scheduling implementation at each level, the better the
   per-packet guarantee that can be provided.  It can be shown that for
   the scheduling hierarchy, the E1 and E2 latency terms of the
   hierarchical scheduler with respect to a particular "leaf queue"
   can be obtained by summing the  E1  and  E2 terms  of  the
   schedulers employed at the  nodes  of  the scheduling  tree along
   the ascending branch  of  the  tree from the root to the leaf.

D.5.  Effect of internal switching mechanisms

   A packet passing through a router will experience delay for a number
   of reasons.  Two familiar components of this delay are the time the
   packet sits in a buffer at an outgoing link waiting for the
   scheduler to select it and the time it takes to actually transmit
   the packet on the outgoing line.

   There may be other components of a packet's delay through a router,
   however.  A router might have to do some amount of header processing
   before the packet can be given to the correct output scheduler, for
   example.  In another case a router may have a FIFO buffer (called a
   transmission queue in [FC2000]) where the packet sits after being
   selected by the output scheduler but before it is transmitted.  In
   cases such as these, the extra delay a packet may experience can be
   accounted for by absorbing it into the latency term, E, in DEF_1.

   Implementing EF on a router with a multi-stage switch fabric
   requires special attention. A packet may experience additional
   delays due to the fact that it must compete with other traffic for
   forwarding resources at multiple contention points in the core.  The
   delay an EF packet may experience before it even reaches the output-
   link scheduler should be included in the latency term. Input-
   buffered and input/output-buffered routers may also require
   modification of their latency terms.

   Delay in the switch core comes from two sources, both of which must
   be considered. The first part of this delay is the fixed delay a
   packet experiences regardless of the other traffic. This component
   of the delay includes the time it takes for things such as packet
   segmentation and reassembly in cell based cores, enqueueing and
   dequeueing at each stage, and transmission between stages. The
   second part of the switch core delay is variable and depends on the
   type and amount of other traffic traversing the core.  This delay
   comes about if the stages in the core mix traffic flowing between
   different input/output port pairs. Thus, EF packets must compete
   against other traffic for forwarding resources in the core. Some of

Charny                         May 2000                            24
                           EF PHB Redefined                   Nov 2000

   this competing traffic may even be EF traffic from other aggregates.
   This introduces extra delay, that can also be absorbed by the
   latency term in the definition.

   Appendix E: Comparison of the Packet Scale Rate Guarantee with the
   Rate-Latency Curve

   To understand the meaning of the redefinition (equations eq_1 and
   eq_2, in section 2.1) we compare it with a well-known rate-latency
   curve [LEB98], and argue that the redefinition is stronger than the
   rate-latency curve [LEB98] in the sense that if a scheduler
   satisfies the redefinition, it also satisfies the rate-latency
   curve. As a result, all the properties known for the rate-latency
   curve also apply to the redefinition. We also argue why the
   redefinition is more suitable to reflect the intent of EF PHB than
   the rate-latency curve.

   It is shown in [LEB98] that the rate-latency curve is equivalent to
   the following definition:

   Definition DEF_2:

      d(j) <= F'(j) + E                                        (eq_3)

   where

      F'(0)=0,
      F'(j)=max(a(j), F'(j-1))+ L(j)/R   for all j>0           (eq_4)

   It can be easily verified that the redefinition is stronger than
   DEF_2 by noticing that for all j, F'(j) >= F(j).

   It is easy to see that F'(j) in the definition DEF_2 corresponds to
   the time the j-th departure should have occurred should the EF
   aggregate be constantly served exactly at its configured rate R.
   Following the common convention, we refer to F'(j) as the "fluid
   finish time" of the j-th packet to depart.

   The intuitive meaning of the rate-latency curve of DEF_2 is that any
   packet is served at most time E later than this packet would finish
   service in the fluid model.

   For a rate-latency curve DEF_2 (and hence for the stronger
   redefinition) it holds that in any interval (0,t) the EF aggregate
   gets close to the desired service rate R (as long as there is enough
   traffic to sustain this rate). The discrepancy between the ideal and
   the actual service in this interval depends on the latency term E,
   which in turn depends on the scheduling implementation. The smaller
   E, the smaller the difference between the configured rate and the
   actual rate achieved by the scheduler.

   While DEF_2 guarantees the desired rate to the EF aggregate in all
   intervals (0,t) to within a specified error, it may nevertheless

Charny                         May 2000                            25
                           EF PHB Redefined                   Nov 2000

   result in large gaps in service. For example, suppose that (a large
   number) N of identical EF packets of length L arrived from different
   interfaces to the EF queue in the absence of any non-EF traffic.
   Then any work-conserving scheduler will serve all N packets at link
   speed. When the last packet is sent at time NL/C, where C is the
   capacity of output link, F(N) will be equal to NL/R. Suppose now
   that at time NL/C a large number of non-EF packets arrive, followed
   by a single EF packet.  Then the scheduler can legitimately delay
   starting to send the EF packet until time F(N+1)=(N+1)L/R + E - L/C.
   This means that the EF aggregate will have no service at all in the
   interval (NL/C,  (N+1)L/R + E - L/C). This interval can be quite
   large if R is substantially smaller than C. In essence, the EF
   aggregate can be "punished" by a gap in service for receiving faster
   service than its configured rate at the beginning.

   The redefinition alleviates this problem by introducing the term
   min(d(j-1), F(j-1)) in the recursion.  Essentially, this means that
   the fluid finishing time is "reset" if that packet is sent too
   early. As a consequence of that, for the case where the EF aggregate
   is served in a FIFO order, suppose a packet arrives at time t to a
   server satisfying the redefinition. The packet will be transmitted
   no later than time t + Q(t)/R + E, where Q(t) is the EF queue size
   at time t (including the packet under discussion). This statement is
   proved in Appendix C.

7. References

      [BZ96a]      J.C.R. Bennett and H. Zhang, ``WF2Q: Worst-case
      Fair Weighted Fair Queuing'', INFOCOM'96, Mar, 1996

      [BZ96b]     J.C.R. Bennett and H. Zhang, Hierarchical
      Packet Fair Queuing Algorithms. IEEE/ACM Transactions
      on Networking, 5(5):675-689, Oct 1997. Also in
      Proceedings of SIGCOMM'96, Aug, 1996

      [RFC2475]    Black, D., Blake, S., Carlson, M., Davies, E., Wang,
      Z. and W. Weiss, "An Architecture for Differentiated
      Services", RFC 2475, December 1998.

      [LEB98]      J.-Y. Le Boudec, "Application of Network Calculus To
      Guaranteed Service Networks", IEEE Transactions on
      Information theory, (44) 3, May 1998

      [Bra97]      Bradner, S., "Key Words for Use in RFCs to Indicate
      Requirement Levels", BCP 14, RFC 2119, March 1997.

      [CLeB2000]   A. Charny, J.-Y. Le Boudec "Delay Bounds in a
      Network with Aggregate Scheduling". To appear in Proc.
      of QoFIS'2000, September 25-26, 2000, Berlin, Germany.

      [DKS90]      A. Demers, S. Keshav, and S. Shenker, "Analysis
      and Simulation of a Fair Queuing Algorithm". In
      Journal of Internetworking Research and Experience,

Charny                         May 2000                            26
                           EF PHB Redefined                   Nov 2000

      pages 3-26, October 1990. Also in Proceedings of ACM
      SIGCOMM'89, pp 3-12.

      [FC2000]     T. Ferrari and P. F. Chimento, "A Measurement-
      Based Analysis of Expedited Forwarding PHB
      Mechanisms," Eighth International Workshop on Quality
      of Service, Pittsburgh, PA, June 2000,

      [FJ95]       S. Floyd and V. Jacobson, "Link-sharing and Resource
      Management Models for Packet Networks", IEEE/ACM
      Transactions on Networking, Vol. 3 no. 4, pp. 365-
      386,August 1995.

      [Gol94]     S.J. Golestani. "A Self-clocked Fair Queuing
      Scheme for Broad-band Applications". In Proceedings of
      IEEE INFOCOM'94, pages 636-646, Toronto, CA, April
      1994.

      [GVC96]     P. Goyal, H.M. Vin, and H. Chen. "Start-time
      Fair Queuing: A Scheduling Algorithm for Integrated
      Services". In Proceedings of the ACM-SIGCOMM 96, pages
      157-168, Palo Alto, CA, August 1996.

      [RFC2598]     V. Jacobson, K. Nichols, K. Poduri, "An Expedited
      Forwarding PHB", RFC 2598, June 1999

      [RFC2474]    Nichols, K., Blake, S., Baker, F. and D. Black,
      "Definition of the Differentiated Services Field (DS
      Field) in the IPv4 and IPv6 Headers", RFC 2474,
      December 1998.

      [JNP2000]    V. Jacobson, K. Nichols, K. Poduri,
      "The 'Virtual Wire' Behavior Aggregate,"
      (draft-ietf-diffserv-ba-vw-00.txt), March 2000.

      [Par92]     A. Parekh. "A Generalized Processor Sharing
      Approach to Flow Control in Integrated Services
      Networks". PhD dissertation, Massachusetts Institute
      of Technology, February 1992.

      [SV95]       M. Shreedhar and G. Varghese. "Effient Fair Queueing
      Using Deficit Round Robin". In Proceedings of
      SIGCOMM'95, pages 231-243, Boston, MA, September 1995.

      [Sto95]      I. Stoica and H. Abdel-Wahab, "Earliest Eligible
      Virtual Deadline First: A Flexible and Accurate
      Mechanism for Proportional Share Resource Allocation",
      Technical Report 95-22, Old Dominion University,
      November 1995.

8. Authors' addresses

      Anna Charny, ed.

Charny                         May 2000                            27
                           EF PHB Redefined                   Nov 2000

      Cisco Systems
      300 Apollo Drive
      Chelmsford, MA 01824
      acharny@cisco.edu

      Fred Baker
      Cisco Systems
      170 West Tasman Dr.
      San Jose, CA 95134
      fred@cisco.com

      Jon Bennett
      RiverDelta Networks
      3 Highwood Drive East
      Tewksbury, MA 01876
      jcrb@riverdelta.com

      Kent Benson
      Tellabs Research Center
      3740 Edison Lake Parkway #101
      Mishawaka, IN  46545
      Kent.Benson@tellabs.com

      Jean-Yves Le Boudec
      ICA-EPFL, INN
      Ecublens, CH-1015
      Lausanne-EPFL, Switzerland
      leboudec@epfl.c

      Angela Chiu
      AT&T Labs
      100 Schulz Dr. Rm 4-204
      Red Bank, NJ 07701
      alchiu@att.com

      Bill Courtney
      TRW
      Bldg. 201/3702
      One Space Park
      Redondo Beach, CA 90278
      bill.courtney@trw.com

      Shahram Davari
      PMC-Sierra Inc
      555 Legget drive
      Suit 834, Tower B
      Ottawa, ON K2K 2X3, Canada
      shahram_davari@pmc-sierra.com

      Bruce Davie
      Cisco Systems
      300 Apollo Drive
      Chelmsford, MA 01824

Charny                         May 2000                            28
                           EF PHB Redefined                   Nov 2000

      bsd@cisco.com

      Victor Firoiu
      Nortel Networks
      600 Tech Park
      Billerica, MA 01821
      vfirou@nortelnetworks.com

      Charles Kalmanek
      AT&T Labs-Research
      180 Park Avenue, Room A113,
      Florham Park NJ
      crk@research.att.com.

      K.K. Ramakrishnan
      AT&T Labs-Research
      Rm. A155, 180 Park Ave,
      Florham Park, NJ 07932
      kkrama@research.att.com

      Dimitrios Stiliadis
      Lucent Technologies
      1380 Rodick Road
      Markham, Ontario, L3R-4G5, Canada
      stiliadi@bell-labs.com

   9. Full Copyright

   Copyright (C) The Internet Society 2000.  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Charny                         May 2000                            29
                           EF PHB Redefined                   Nov 2000


                                                                      [This Page Intentionally Left Blank ]


Charny                         May 2000                            30