Internet Engineering Task Force                      N. Venkitaraman
INTERNET DRAFT                                         Motorola Labs
Expires January, 2001                                      J. Mysore
                                                       Motorola Labs
                                                          R. Srikant
                                                                UIUC
                                                           R. Barnes
                                                                UIUC
                                                           July 2000

                 Stateless Prioritized Fair Queueing

Status of this Memo

   This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.  Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

   This document proposes a Per Hop Behavior (PHB) in the context of diffserv.  We first propose a labeling procedure for the ingress router that labels packets in a manner that implicitly conveys the packet arrival rate information of each flow to the core routers.  Coupled with a simple forwarding procedure, this provides a fair allocation of bandwidth.  We then propose other labeling procedures that add little computational complexity but provide weighted fair allocation, minimum bandwidth assurances, and support for intra-flow priorities.  By appropriately designing the labeling algorithms, this PHB can be used to optimize the performance of a variety of flows.

1. Introduction

   The Internet is increasingly used for carrying multimedia streams that are sensitive to the end-to-end rate, delay, and drop assurances they receive from the network.  Another characteristic of flows sharing the Internet is that most of them are composed of packets that contribute varying amounts of utility to the flow they belong to.  That being the case, the utility provided by a quantum of resource allocated to a flow depends on the value of the packets that use it.  In short, merely allocating a certain quantity of bandwidth to a flow does not always imply that the flow will be able to make optimal use of it at all times.  This intra-flow heterogeneity in packet utility could, for instance, be caused by the stream employing a hierarchical coding mechanism, as in MPEG, or for reasons intrinsic to the rate adaptation algorithm implemented by the transport layer.  In such cases, dropping the "wrong" packet(s) can significantly affect the qualitative and quantitative extent to which a flow is able to make use of the resources allocated to it.  In general, if we consider optimizing the perceived quality of service as the end goal of an architecture for providing end-to-end QoS, then it is important that we ensure fair allocation of resources and enable simple ways for a flow to make the best use of its fair share.

   To allocate resources fairly among contending flows, core routers have traditionally used either per-flow queueing mechanisms [3] or per-flow dropping mechanisms [6].  These mechanisms, however, require core routers to maintain state for each flow sharing a link, which limits their scalability.
   In addition, the dropping, queueing and forwarding decisions are more complex than with traditional FIFO queueing, which introduces delays in the forwarding path.  More recently, CSFQ [8], Corelite [10] and CJVC [7] have proposed scalable core-stateless architectures for providing per-flow QoS commitments.  Stoica et al. [9] introduced the concept of dynamic packet state (DPS), in which each packet carries in its header, in addition to the PHB codepoint, some PHB-specific state.  The state is initialized by the ingress node.  Core nodes process each incoming packet based on the state carried in the packet's header, and optionally update their internal state and/or the state in the packet's header before forwarding the packet to the next hop.  By using DPS to coordinate the actions of edge (ingress) and core nodes along the path traversed by a flow, distributed algorithms can be designed to approximate the behavior of a broad class of "stateful" networks using networks in which core nodes do not maintain per-flow state.

   This document proposes a PHB based on the concept of dynamic packet state.  Using different labeling procedures at the ingress nodes, this PHB can be tailored to provide services such as fair allocation of bandwidth to flows sharing a network, and support for intra-flow priorities, which ensures that lower-priority packets of a flow are dropped in preference to higher-priority packets.  By appropriately designing the labeling procedure at the ingress node, such a PHB can be tailored to optimize the performance of a variety of flows.  We refer to our approach as Stateless Prioritized Fair Queueing (SPFQ).  In Section 2, we explain the principle behind SPFQ using a simple labeling algorithm for the ingress node.  In Section 3, we present different labeling algorithms that can be used to optimize the perceived quality of service for different classes of flows.

2. Stateless Prioritized Fair Queueing (SPFQ)

   Unlike common PHB codepoints [1,4,5], a PHB codepoint based on DPS has extra state associated with it.  This state is initialized by ingress nodes and carried by packets inside the DS domain.  The semantics of the state are PHB-dependent, and the values taken by the state can be flow, path, or packet dependent.  The state carried by packets can be used by core nodes for a variety of purposes, such as packet scheduling, updating the local node's state, or extending the codepoint space.  We first explain our approach using an idealized fluid model and then present its packetized version.

2.1 Fluid Model Algorithm: Bit Labeling

   We first restate the key observation of CSFQ [8], which we will also use: in a router implementing fair queueing, all flows that are bottlenecked at that router have the same output rate.  We call this the rate threshold, r_t(t).  Let C be the capacity of an output link at a router, and r_i(t) the arrival rate of flow i in bits per second.  Let A(t) denote the total arrival rate of the n flows:

      A(t) = r_1(t) + r_2(t) + ... + r_n(t)

   If A(t) <= C, then no bits are dropped.  But if A(t) > C, then r_t(t) is the unique solution of

      C = min(r_1(t), r_t(t)) + min(r_2(t), r_t(t)) + ... + min(r_n(t), r_t(t))

   In general, if max-min bandwidth allocations are achieved, each flow i receives service at a rate given by min(r_i(t), r_t(t)).  For every flow, in any given second, we consider up to r_t(t) bits of the flow as conforming, and all bits in excess of that as non-conforming.
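   The threshold r_t(t) is simply the max-min fair share of the link.  As an illustration only (the function below is not part of the PHB: its name, and the use of a sorted snapshot of per-flow rates, are assumptions made here for clarity), the following C fragment computes the unique solution of the equation above.  Note that a core router in SPFQ never has the per-flow rates available; Section 2.2 points to [2] and [8] for threshold update rules that use only aggregate measurements.

      /*
       * Illustrative sketch: compute the fluid-model rate threshold r_t,
       * i.e. the unique solution of C = sum_i min(r_i, r_t) when
       * sum_i r_i > C.  Rates are in bits per second.
       */
      #include <math.h>
      #include <stdlib.h>

      static int cmp_rate(const void *a, const void *b)
      {
          double x = *(const double *)a, y = *(const double *)b;
          return (x > y) - (x < y);
      }

      /* r[] holds the arrival rates of the n flows and C the link
       * capacity; r[] is sorted in place. */
      double rate_threshold(double *r, int n, double C)
      {
          double total = 0.0;
          for (int i = 0; i < n; i++)
              total += r[i];
          if (total <= C)
              return HUGE_VAL;        /* link not congested: nothing dropped */

          qsort(r, n, sizeof(double), cmp_rate);
          double remaining = C;
          for (int i = 0; i < n; i++) {
              double share = remaining / (n - i);
              if (r[i] <= share)
                  remaining -= r[i];  /* flow i is not bottlenecked here */
              else
                  return share;       /* remaining flows are capped at r_t */
          }
          return remaining;           /* not reached when total > C */
      }

   For example, with flow rates of 2, 4 and 10 Mb/s sharing a 9 Mb/s link, the function returns r_t = 3.5 Mb/s, and indeed min(2, 3.5) + min(4, 3.5) + min(10, 3.5) = 9.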
   If we mark every bit of a flow as conforming or non-conforming, we can obtain the allocation provided by a fair queueing router by simply having core routers accept all conforming bits of a flow and drop all non-conforming bits.  What we need now is a labeling algorithm at the ingress node that enables the core router to distinguish conforming from non-conforming traffic.  Consider the simple sequential labeling algorithm given below:

      label(bit)
         served += 1
         bit->label = served

   where the value of 'served' is reset to 0 after every second.

   Let us suppose that each flow sends bits at a constant rate.  Then, during any given second, the bits of a flow sending at rate r_i bits per second are labeled sequentially from 1 to r_i, and the counter is reset to 0 at the end of the second.  For any given flow, accepting bits with labels 1 through r_a is therefore equivalent to providing the flow with a rate of r_a bits per second.  So, if all bits carry such a label, the core router can simply identify non-conforming bits as those with label > r_t and drop them.  Consequently, no flow can receive service in excess of r_t.  Furthermore, since all bits with label <= r_t are accepted, flows sending at a rate less than or equal to r_t will not have any of their bits dropped.  As we will see in the next section, a key advantage of such a labeling procedure is that it allows us to convey both rate information and intra-flow utility to the core router using the same field in the packet header.

2.2 Packet Labeling

   To extend the sequential labeling algorithm of the fluid model to a packetized model, we essentially need to take variable packet sizes into account.  Hence, instead of incrementing the counter 'served' by 1 (the size of every packet in a network of single-bit packets), we increment 'served' by the size of the packet.  Given below is pseudo-code for the packetized version of the sequential marking algorithm:

      label(pkt)
         served += pkt->size
         pkt->label = served

   where the value of 'served' is reset to 0 at the end of each fixed-size epoch.  All ingress routers in a DS domain MUST use the same epoch size.  The size of the epoch is a design parameter that reflects a trade-off between mimicking the fluid model accurately and not giving an unfair advantage to flows that arrived most recently in the system.  To understand this trade-off, suppose that the epoch is chosen to be very long and that a new flow arrives in the middle of an epoch.  Then the bits from the new flow would be labeled starting from a value of one and would have a higher priority throughout the rest of the epoch than the bits of flows that have been sending from the beginning of the epoch.  Algorithms for updating the rate threshold (r_t) based on link state, i.e., on parameters such as the queue length and the aggregate accepted rate of packets, have been proposed in [2] and [8].

2.2.1 Forwarding Decision at a Core Router

   The forwarding decision at a core router is made based on the following algorithm:

      enque(pkt)
         if (pkt->label <= r_t)
            Accept(pkt)
         else
            Drop(pkt)

   Note that the core routers do not have to relabel the packets.  This can have significant performance benefits over approaches that require relabeling in the core.
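   To make the division of labor concrete, the following sketch shows the per-packet work at the ingress and at a core router.  It is illustrative only: the structure layouts, the function names, the per-flow lookup assumed at the ingress, and the representation of the label as a 32-bit byte count per epoch are assumptions made here, not part of the PHB definition.

      /* Illustrative sketch of SPFQ ingress labeling and core forwarding. */
      #include <stdint.h>

      struct packet {
          uint32_t size;     /* packet size in bytes */
          uint32_t label;    /* carried in the DPS field of the header */
      };

      struct flow_state {    /* kept only at the ingress router */
          uint32_t served;   /* bytes labeled in the current epoch */
      };

      /* Ingress: label every arriving packet of the flow. */
      void label_packet(struct flow_state *f, struct packet *p)
      {
          f->served += p->size;
          p->label = f->served;
      }

      /* Ingress: called for every flow when the epoch timer expires. */
      void reset_epoch(struct flow_state *f)
      {
          f->served = 0;
      }

      /* Core: forwarding decision.  r_t is the router's current rate
       * threshold, in the same units as the label (bytes per epoch);
       * how r_t is updated is described in [2] and [8].  Returns
       * non-zero to accept the packet and zero to drop it. */
      int accept(const struct packet *p, uint32_t r_t)
      {
          return p->label <= r_t;
      }

   The only per-flow state, the counter 'served', lives at the ingress; a core router needs nothing beyond its current value of r_t.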
3. Services based on SPFQ

   In this section, we present examples of labeling methods that can provide different services with a marginal increase in computational complexity.

3.1 Weighted SPFQ

   The SPFQ algorithm can be readily extended to support flows with different weights.  Let w_i be the weight of flow i.  An allocation is weighted fair if all bottlenecked flows have the same value of r_i/w_i.  The only major change required to achieve weighted fair allocation is in the ingress labeling algorithm, where we use 'served/w_i' as the label instead of 'served'.  This enables per-flow service differentiation without maintaining per-flow state in the core nodes of the network.

3.2 Minimum bandwidth allocation

   From the forwarding algorithm given in Section 2, it is clear that packets marked with smaller labels are dropped only after all packets with larger labels have been dropped.  This suggests that packets marked with the smallest label (0) will not be dropped as long as the aggregate rate of such packets does not exceed the link capacity.  So, for a flow requiring a minimum bandwidth allocation of 'b_min', labeling packets with the smallest label at a rate of 'b_min' guarantees that the flow receives its reserved rate within any reasonable time window (assuming that there is no packet loss due to channel errors).  An admission control mechanism MUST be used to ensure that the aggregate reserved rate does not exceed the capacity of the link.  A distributed admission control mechanism, such as the one proposed in [7], can be used for this purpose.

3.3 Avoiding back-to-back packet drops

   Many real-time applications and certain versions of TCP perform poorly in the presence of back-to-back or bursty packet drops.  The labeling procedure at the ingress router can be exploited to naturally reduce the possibility of bursty drops.

   For ease of presentation, let us first consider a situation where the ingress router stores all packets that arrive during an epoch, then stamps the packets and sends them out, repeating this procedure at the end of each epoch.  Suppose that a flow, say Flow i, sends 10 packets in an epoch and that all packets have the same length.  In the algorithm as described so far, the first packet that arrived in the epoch is stamped 1, the second is stamped 2 (assuming packet lengths are normalized to 1), and so on.  If the threshold r_t at a core router is 5, the first five packets would be accepted and the next five dropped.  To avoid such a bursty loss pattern, we could instead label the first arriving packet 1, the second 6, the third 2, the fourth 7, and so on.  We call this procedure interleaving; a sketch of it is given below.  Using interleaving, with the same forwarding procedure at the core router, alternate packets are dropped instead of consecutive packets.  Of course, a different choice of interleaving the labels could provide some countermeasure against three consecutive losses, four consecutive losses, and so on.  Thus, in principle, the labeling procedure can be used to convey both rate information and intra-flow priorities to the core.
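   The following sketch is illustrative only: the function name, the use of a 0-based per-epoch arrival index, and the unit-length packets of the example above are assumptions made here, and the packet count N for the epoch must be known (or, as discussed next, estimated) in advance.

      /*
       * Two-way interleaved labeling for a flow whose packets are
       * normalized to unit length.  k is the 0-based arrival index of
       * the packet within the current epoch and N the number of the
       * flow's packets expected in the epoch.  For N = 10 the labels
       * produced are 1, 6, 2, 7, 3, 8, 4, 9, 5, 10.
       */
      #include <stdint.h>

      uint32_t interleaved_label(uint32_t k, uint32_t N)
      {
          uint32_t half = (N + 1) / 2;   /* ceil(N/2) */

          if (k >= N)                    /* more arrivals than expected:   */
              return k + 1;              /* fall back to sequential labels */
          if (k % 2 == 0)
              return k / 2 + 1;          /* even arrivals take 1..ceil(N/2) */
          return half + k / 2 + 1;       /* odd arrivals take the upper half */
      }

   With r_t = 5 in the example above, the drops now fall on alternate packets rather than on the last five packets of the epoch.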
   The store-and-stamp procedure described above is difficult to implement, since it requires the edge router to store all packets from all ingress flows during an epoch.  We can obtain an implementable version by maintaining an exponentially-averaged estimate of the rate of each flow at the edge router.  This rate estimate can then be used to estimate the number of packets expected in the next epoch.  Suppose that the estimate for the next epoch is 10 packets, but the actual number of arrivals is larger; then the first 10 packets are interleaved and the remaining packets are not.  If the number of arrivals is less than the estimate, then some packets are assigned larger labels (lower priority) than they should have received.  In a typical epoch, however, the labeling and interleaving procedures significantly reduce the possibility of back-to-back drops.

3.4 Intra-flow priorities for streams encoded in layers

   Consider a flow, Flow i, which is sending packets into the network at rate r_i.  Suppose that the packets of Flow i belong to three priority levels, and let r_i1 be the rate at which Flow i generates packets of level 1.  Rates r_i2 and r_i3 are defined in a similar manner, and r_i1 + r_i2 + r_i3 = r_i.  Let us suppose that intra-flow priority level 1 has higher priority than level 2, which in turn has higher priority than level 3.  As examples, the three levels can be thought of as I, P and B frames in an MPEG video stream, or as layers in a layered multicast session.  Further, suppose that r_i > r_t, where we recall that r_t is the threshold at a core router.  Then a fraction (r_i - r_t)/r_i of Flow i's packets is dropped.  In addition to providing fairness across flows, we would like the network to provide differentiated dropping for the three priority levels of Flow i, so that the fractions of packets dropped at each level are:

      Level 1:  max(0, r_i1 - r_t) / r_i1
      Level 2:  max(0, r_i1 + r_i2 - max(r_t, r_i1)) / r_i2
      Level 3:  max(0, r_i1 + r_i2 + r_i3 - max(r_t, r_i1 + r_i2)) / r_i3

   In other words, if dropping is necessary, Level 3 packets are dropped first, followed by Level 2 packets, and finally Level 1 packets.

   The priority dropping described above can be implemented in SPFQ as follows: the edge router maintains an estimate of r_i1, r_i2 and r_i3.  Thus, if priority dropping is desired with no increase in computational complexity at the core routers, the end hosts must communicate their packet priorities to the edge, for example through a field in the packet header.  This is unavoidable: if priority dropping is required, the priorities of the packets have to be communicated by the end hosts to the network in some manner.  The labeling procedure in an epoch is then as follows (a sketch is given below): without loss of generality, suppose that an epoch is of length 1 second.  We label the Level 1 bits of Flow i from 1 through r_i1, the Level 2 bits from r_i1 + 1 through r_i1 + r_i2, and the Level 3 bits from r_i1 + r_i2 + 1 through r_i1 + r_i2 + r_i3.
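   The following packetized sketch is illustrative only: the structure and function names, the per-level byte counters, and the use of the ingress router's per-level rate estimates (in bytes per epoch) as band offsets are assumptions made here.  Level 0 denotes the highest priority.

      /*
       * Per-level labeling for a flow with three intra-flow priority
       * levels.  est[j] is the ingress router's estimate, in bytes per
       * epoch, of the rate of level j (maintained, for example, by
       * exponential averaging); served[j] counts the bytes of level j
       * labeled so far in the current epoch and is reset to 0 at the
       * end of each epoch, as in Section 2.2.
       */
      #include <stdint.h>

      #define LEVELS 3

      struct layered_flow {
          uint32_t est[LEVELS];      /* estimated bytes per epoch, per level */
          uint32_t served[LEVELS];   /* bytes labeled this epoch, per level  */
      };

      uint32_t label_layered(struct layered_flow *f, uint32_t level,
                             uint32_t pkt_size)
      {
          uint32_t base = 0;

          /* The band for a level starts after the estimated traffic of
           * all higher-priority levels, so higher layers get smaller
           * labels. */
          for (uint32_t j = 0; j < level; j++)
              base += f->est[j];

          f->served[level] += pkt_size;
          return base + f->served[level];
      }

   With the forwarding rule of Section 2.2.1 unchanged, a core router whose threshold falls inside the Level 2 band then drops all Level 3 packets, part of the Level 2 packets, and no Level 1 packets, which is exactly the drop pattern listed above.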
4. Issues

4.1 Recommended Codepoints

   In keeping with the recommendation in [9], it is appropriate to use values drawn from the 16 codepoints reserved for local and experimental use (xxxx11).

4.2 Encoding Flow State

   In PHBs based on DPS, the ingress router inserts flow-specific state when the packet enters the DS domain.  In SPFQ, when a packet from a flow enters the DS domain, the ingress router MUST insert a label that implicitly conveys both the rate of the flow and the relative priority of the packet within the flow.  Additionally, a flow MAY convey the priority of a packet to the ingress router by encoding the information in the packet header.  There are at least three ways to encode state in the packet header: (1) introduce a new IP option and insert the option at the ingress router, (2) introduce a new header between layer 2 and layer 3, and (3) use the existing IP header.  [9] describes the relative merits of each approach.  The key requirement to be met is that all changes performed on packets' headers at ingress nodes or within the DS domain (possibly for the purpose of storing the state) must be undone by egress nodes.

4.3 IP Fragmentation

   Consider a packet of size S bytes carrying a label L.  Suppose the packet is fragmented into n+1 fragments: n fragments of size s bytes followed by a final fragment of size d bytes, so that S = n*s + d.  The labels must then be updated as follows: the i-th of the first n fragments carries the label (L - d) - (n - i)*s, and the last fragment carries the label L.  In this way each fragment carries the cumulative byte count at which it ends, just as if the fragments had been labeled individually at the ingress.
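   As a check on the arithmetic, the following sketch (illustrative only; the function name and the flat output array are assumptions made here) computes the labels of the n+1 fragments:

      /*
       * Compute the labels of the n+1 fragments of a packet of size
       * S = n*s + d bytes that carried label L (Section 4.3).
       * labels[] must have room for n+1 entries.
       */
      #include <stdint.h>

      void fragment_labels(uint32_t L, uint32_t n, uint32_t s, uint32_t d,
                           uint32_t *labels)
      {
          for (uint32_t i = 1; i <= n; i++)
              labels[i - 1] = (L - d) - (n - i) * s;  /* i-th fragment, size s */
          labels[n] = L;                              /* last fragment, size d */
      }

   For example, a 3000-byte packet with label 9000 that is split into two 1400-byte fragments and a 200-byte fragment (n = 2, s = 1400, d = 200) receives the labels 7400, 8800 and 9000: each fragment's label exceeds the previous cumulative count by exactly that fragment's own size.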
5. Conclusion

   In this document, we have presented a scalable architecture for jointly providing fair bandwidth allocation to flows and prioritized dropping for flows with intra-flow priorities.  Key attributes of the presented approach are that it enables the network to make per-flow rate commitments without maintaining per-flow state in the core routers, and to provide support for intra-flow priorities.  The labeling algorithms at the ingress nodes can be tailored to optimize the performance of a variety of flows.  We presented labeling procedures that can be used to provide weighted service differentiation and minimum rate contracts, to reduce the likelihood of back-to-back drops for flows sensitive to bursty losses, and to optimize the perceived quality of layered streams in a network providing fair sharing of the bandwidth.  We believe that PHBs such as SPFQ will be useful in enabling fair resource allocation and in optimizing the perceived quality of service of a variety of flows in a DS domain.

References

   [1]  S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and
        W. Weiss, "An Architecture for Differentiated Services,"
        Internet Draft, draft-ietf-diffserv-arch-02.txt, October 1998.

   [2]  Z. Cao, Z. Wang and E. Zegura, "Rainbow Fair Queueing: Fair
        Bandwidth Sharing Without Per-Flow State," Proc. of INFOCOM
        2000.

   [3]  A. Demers, S. Keshav, and S. Shenker, "Analysis and Simulation
        of a Fair Queueing Algorithm," Journal of Internetworking
        Research and Experience, October 1990.

   [4]  J. Heinanen, F. Baker, W. Weiss, and J. Wroclawski, "Assured
        Forwarding PHB Group," Internet Draft,
        draft-ietf-diffserv-af-04.txt, January 1999.

   [5]  V. Jacobson, K. Poduri and K. Nichols, "An Expedited Forwarding
        PHB," Internet Draft, draft-ietf-diffserv-phb-ef-01.txt,
        November 1999.

   [6]  D. Lin and R. Morris, "Dynamics of Random Early Detection,"
        Proc. of SIGCOMM '97, October 1997.

   [7]  I. Stoica and H. Zhang, "Providing Guaranteed Services Without
        Per Flow Management," Proc. of SIGCOMM '99, September 1999.

   [8]  I. Stoica, S. Shenker and H. Zhang, "Core-Stateless Fair
        Queueing: Achieving Approximately Fair Bandwidth Allocations in
        High Speed Networks," Proc. of SIGCOMM '98, September 1998.

   [9]  I. Stoica et al., "Per Hop Behaviors Based on Dynamic Packet
        States," Internet Draft, February 1999.

   [10] S. Raghupathy, T. Kim, N. Venkitaraman and V. Bharghavan,
        "Achieving Per-flow Weighted Rate Fairness in a Core-Stateless
        Network," Proc. of ICDCS '00, April 2000.

   [11] K. Nichols, S. Blake, F. Baker, and D. L. Black, "Definition of
        the Differentiated Services Field (DS Field) in the IPv4 and
        IPv6 Headers," Internet Draft, draft-ietf-diffserv-header-04.txt,
        October 1998.

Authors' Addresses

   Narayanan Venkitaraman
   Motorola Labs, IL02/2240
   1301 E. Algonquin Rd.
   Schaumburg, IL 60196
   venkitar@rsch.comm.mot.com

   Jayanth Mysore
   Motorola Labs, IL02/2240
   1301 E. Algonquin Rd.
   Schaumburg, IL 60196
   jayanth@rsch.comm.mot.com

   R. Srikant
   Coordinated Science Lab
   Univ. of Illinois
   1308 W Main St.
   Urbana, IL 61801
   rsrikant@uiuc.edu

   Roberto Barnes
   Coordinated Science Lab
   Univ. of Illinois
   1308 W Main St.
   Urbana, IL 61801
   rbarnes@comm.csl.uiuc.edu