Active Queue Management                         Wolfram Lautenschlaeger
and Packet Scheduling (aqm)                              Alcatel-Lucent
Internet Draft                                                Bell Labs
Intended status:
Expires: August 2014      February 13, 2014


            Global Synchronization Protection for Packet Queues
                        draft-lauten-aqm-gsp-00.txt


Abstract

   Transmission capacity sharing TCP flows tend to synchronize among
   each other. This way rate variations of the individual flows, which
   are caused by the congestion control algorithms, do not even out. The
   effect is known as global synchronization. Large queuing buffer
   demand and large latency and jitter are the consequences. Global
   Synchronization Protection (GSP) is an extension of regular tail drop
   packet queuing schemes that prevents global synchronization. For
   large traffic aggregates the de-correlation between the individual
   flow variations reduces buffer demand and packet sojourn time by an
   order of magnitude and more. Even though quite simple, the solution
   has a theoretical rationale and is not heuristic, and it has been
   tested with a Linux implementation.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Lautenschlaeger        Expires August 13, 2014                 [Page 1]

Internet-Draft    Global Synchronization Protection       February 2014


   This Internet-Draft will expire on August 13, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.


Table of Contents

   1. Introduction...................................................2
   2. Conventions used in this document..............................3
   3. Root cause of global synchronization...........................3
   4. Protecting queues of global synchronization....................4
      4.1. Basic algorithm...........................................4
      4.2. Interval adaptation at large flow numbers.................5
      4.3. Interval adaptation at small RTT..........................6
      4.4. Threshold adaptation......................................6
      4.5. Sanity checks and special cases...........................6
   5. Security Considerations........................................6
   6. IANA Considerations............................................7
   7. Conclusions....................................................7
   8. References.....................................................7
      8.1. Normative References......................................7
      8.2. Informative References....................................7
   9. Acknowledgments................................................7

1. Introduction

   The congestion window (CWND) of a particular TCP connection, in
   combination with the round trip time (RTT), limits the transmission
   rate of the flow, which enables adaptation of the sending rate to the
   actual network conditions, [1]. TCP uses a rather coarse congestion
   control feedback by halving the congestion window in response to
   packet loss. To fill a bottleneck link by 100% anyway, a packet
   buffer in front of the link is required. For a single TCP flow a
   buffer in the range of bottleneck capacity multiplied by the round
   trip time is required (bandwidth-delay product rule, BDP), [2]. For
   aggregated traffic of many flows the picture is not so clear.


Lautenschlaeger        Expires August 13, 2014                 [Page 2]

Internet-Draft    Global Synchronization Protection       February 2014


   Conservative estimations tend towards BDP of the whole aggregate,
   i.e. link capacity * RTT. At the other hand, rate reductions due to
   CWND halving are still only in the range of a particular flow rate.
   With the assumption of N sharing flows, this yields ideally a buffer
   size of only (link capacity/N)*RTT. Unfortunately this value cannot
   be reached in practice. It would require a uniform distribution of
   rate reductions by the different flows over time. In opposite, rate
   reductions of bottleneck sharing flows tend to synchronize among each
   other, which is called global synchronization. In worst case, with
   all flows synchronized, the buffer demand is back at BDP of the whole
   traffic, thus confirming the conservative estimation.

   There are cases where global synchronization does not occur, in
   particular large number of flows (N>500), large spread of RTT between
   the different flows, and high frequency of flow renewals. In these
   cases the buffer size can be reduced to BDP/sqrt(N), which lies
   between the conservative and overly optimistic estimations above,[3].
   Nevertheless there are still doubts, whether the absence of global
   synchronization is a general reliable design assumption for high
   capacity links, [4].

   Most Active Queue Management (AQM) algorithms like RED [5] are aiming
   at better control of the queue size, which implies control over
   global synchronization. Global Synchronization Protection (GSP) goes
   the other way round. It suppresses the root cause of global
   synchronization and de-correlates the CWND variations of the
   competing flows, but it does not impact the behavior of a particular
   flow. This way it moves the buffer size demand down from conservative
   BDP of the whole link into the direction towards the ideal BDP of a
   single flow.

2. Conventions used in this document

   In this document, the term "packet drop" is used for congestion
   notification, silently assuming that congestion marking for ECN could
   be equally applied.

   In this document, the term "queue size" is preferably applied in
   number of bytes, however, the algorithm could be also applied to the
   number of packets, or even to the queuing delay (milliseconds).

3. Root cause of global synchronization

   Global synchronization occurs in cases where a number of greedy TCP
   flows with comparably uniform RTT cross a tail drop queue in front of
   a shared transmission link. Tail drop means, a newly arriving packet
   is placed at the end of the queue if buffer space permits. Otherwise


Lautenschlaeger        Expires August 13, 2014                 [Page 3]

Internet-Draft    Global Synchronization Protection       February 2014


   it is dropped. The queue is drained from front of the queue at the
   speed of the link as long as packets are available. Greedy TCP flows
   means, the applications try to send as fast as possible and the flows
   are not limited elsewhere in the network.

   In congestion avoidance state, all senders gradually increase their
   sending rate, in total exceeding the transmission capacity so that
   the queue is filling up. At some point in time, a first packet is
   dropped due to lack of buffer space. Ideally, the TCP flow, where the
   dropped packet belongs to, reduces its sending rate, the queue
   relaxes, and subsequently arriving packets can be placed in the
   buffer. Senders are continuing to increase their sending rates until
   the next drop, and so on.

   Unfortunately, the rate reduction due to the dropped packet takes at
   least one RTT to take effect at the queue entry. During that RTT
   interval all senders continue to gradually increase their sending
   rates, whereas the queue is still full. Further packets need to be
   dropped. It can be shown analytically that for N flows with NewReno
   and delayed ACK the number of drops is in the range of N/2.
   (Experiments confirm this and show an even higher number with CUBIC.)
   The outcome is that even though the rate reduction by one flow would
   suffice, not one but half of the flows are triggered within one RTT
   to reduce their sending rates - we have global synchronization.

4. Protecting queues of global synchronization

4.1. Basic algorithm

   The basic algorithm is as follows: Set a threshold on queue size
   below the actual buffer size. If a new packet arrives and the queue
   size is above the threshold, then drop that packet. After that,
   ignore any further threshold violation for a timeout interval of 1 -
   3 RTT. After expiry of the timeout proceed as above.


Lautenschlaeger        Expires August 13, 2014                 [Page 4]

Internet-Draft    Global Synchronization Protection       February 2014


   Algorithm:

   interval = e.g. 2 * RTT
   threshold = e.g. 1/2 * buffer size
   next_possible_drop = now
   at any packet enqueuing do:
       if queue_size > threshold && now > next_possible_drop:
           drop this packet
           next_possible_drop = now + interval
       else
           enqueue this packet
   end

   The first dropped packet is triggering the rate reduction. During the
   timeout the queue is growing further beyond the threshold until the
   rate reduction takes effect at queue entry. Afterwards the queue size
   should have dropped below the threshold, so that at expiry of the
   timeout the threshold is typically not violated anymore. No explicit
   action occurs at timeout expiry, which makes the parameter rather
   insensitive to the actual traffic characteristics. Even if the
   timeout interval is too short, the algorithm still reduces global
   synchronization.

4.2. Interval adaptation at large flow numbers

   The basic algorithm works well for moderate numbers of flows N, i.e.
   in a range of 2 < N < 20. More precisely, at flow numbers N smaller
   than the average CWND of one of the sharing flows. At larger numbers
   the total rate increase during the timeout interval is larger than
   the subsequent rate reduction by one of the flows. As consequence,
   after timeout expiry the threshold is still violated, the queue is
   growing further and further, and, eventually, reaches the buffer
   limit and enters tail drop operation. The performance is still better
   than plain tail drop and one could rely on the observation that at
   large flow numbers global synchronization disappears, anyway.

   Alternatively the initial timeout interval can be reduced, depending
   on the actual traffic, in a way, where not just once, but twice, or
   even more times per RTT the timeout expires. The adaptation criterion
   is the proportion of time above and below threshold. In regular
   operation according to the basic algorithm, the queue is most of the
   time below the threshold. If, however, the queue is more frequently
   above than below threshold, the interval should be reduced until
   equilibrium is reached. In this condition the queue is oscillating


Lautenschlaeger        Expires August 13, 2014                 [Page 5]

Internet-Draft    Global Synchronization Protection       February 2014


   around the threshold, quite similar to other AQM schemes like e.g.
   PED.

   Algorithm:

   at any packet enqueueing do:
       theta = cumulative time {above - below} threshold
       k = max(1, gamma * theta)
       interval = initial_interval/k
   end

   Gamma is a scaling parameter that controls the sensitivity of
   adaptation.

4.3. Interval adaptation at small RTT

   The RTT is not known exactly but there should be at least a rough
   idea on the range of RTT for setting up the timeout interval. If this
   estimation is much too large, a similar situation occurs like in the
   large flow numbers case. The total rate increase during the timeout
   interval (which turns out to be multiple RTTs) is larger than the
   subsequent rate reduction by one flow. The adaptation rule is the
   same as for large flow numbers.

4.4. Threshold adaptation

   Tbc

4.5. Sanity checks and special cases

   An additional rule can be introduced that prevents large packet
   bursts from immediately triggering the drop: Restart the timeout not
   only after a packet drop but also whenever a packet is arriving at an
   empty queue.

5. Security Considerations

   Global synchronization is a particular problem of many elastic flows
   sharing a bottleneck. GSP is there to prevent this. But it does not
   protect of unresponsive flows. If the congestion notification
   according to section 4.1. randomly hits an unresponsive flow then the
   expected rate reduction within the timeout interval might simply not
   happen, which postpones the notification by one timeout interval. In
   extreme cases, with a large amount of unresponsive traffic, GSP
   behaves like plain tail drop.


Lautenschlaeger        Expires August 13, 2014                 [Page 6]

Internet-Draft    Global Synchronization Protection       February 2014


6. IANA Considerations

   There are no actions for IANA.

7. Conclusions

   tbc

8. References

8.1. Normative References


8.2. Informative References

   [1]   Van Jacobson, Congestion avoidance and control, Proc. SIGCOMM
         '88, 1988

   [2]   M. Mathis, J. Semke, J. Mahdavi, and T. Ott, The Macroscopic
         Behavior of the TCP Congestion Avoidance Algorithm, Comput.
         Commun. Rev., 27.3, 1997, pp. 67-82.

   [3]   G. Appenzeller, I. Keslassy, and N. McKeown, Sizing router
         buffers, Proc. ACM SIGCOMM '04, 2004.

   [4]   G. Vu-Brugier, R. S. Stanojevic, D. J. Leith, R. N. Shorten, A
         critique of recently proposed buffer-sizing strategies, ACM
         SIGCOMM Computer Communication Review, 37.1, 2007

   [5]   S. Floyd, Van Jacobsen, Random Early Detection Gateways for
         Congestion Avoidance, IEEE/ACM Trans. Networking, 1.4, 1993

   [6]   K. Nichols, Van Jacobson, "Controlling Queue Delay", ACM Queue
         - Networks, 2012

   [7]   R. Pan, et al., PIE: A lightweight control scheme to address
         the bufferbloat problem, IETF, draft-pan-tsvwg-pie-00, 2012

9. Acknowledgments

   This document was prepared using 2-Word-v2.0.template.dot.


Lautenschlaeger        Expires August 13, 2014                 [Page 7]

Internet-Draft    Global Synchronization Protection       February 2014


Authors' Addresses

   Wolfram Lautenschlaeger
   Alcatel-Lucent
   Bell Labs
   Lorenzstrasse 10
   70435 Stuttgart
   Germany

   Email: Wolfram.Lautenschlaeger@alcatel-lucent.com


Lautenschlaeger        Expires August 13, 2014                 [Page 8]