Transport Area Working Group                                    G. White
Internet-Draft                                                 CableLabs
Intended status: Informational                          October 22, 2018
Expires: April 25, 2019


 Identifying and Handling Non Queue Building Flows in a Bottleneck Link
                        draft-white-tsvwg-nqb-00

Abstract

   This draft discusses the potential to improve quality of experience
   for broadband internet applications by distinguishing between flows
   that cause queuing latency and flows that don't.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


White                    Expires April 25, 2019                 [Page 1]

Internet-Draft          Non Queue Building Flows            October 2018


Table of Contents

   1.  Introduction
   2.  Non-Queue Building Flows
   3.  Identifying NQB traffic
     3.1.  Endpoint marking
     3.2.  Queuing behavior analysis
   4.  Non Queue Building PHB
   5.  End-to-end Support
   6.  Relationship to L4S
   7.  Comparison to Existing Approaches
   8.  Acknowledgements
   9.  IANA Considerations
   10. Security Considerations
   11. Informative References
   Author's Address

1.  Introduction

   Residential broadband internet services are commonly configured with
   a single bottleneck link (the access network link) upon which the
   service definition is applied.  The service definition, typically an
   upstream/downstream data rate tuple, is implemented as a configured
   pair of rate shapers that are applied to the user's traffic.  In such
   networks, the quality of service that each application receives and,
   as a result, the quality of experience that it generates for the user
   are influenced by the characteristics of the access network link.

   The vast majority of packets that are carried by residential
   broadband access networks are managed by an end-to-end congestion
   control algorithm, such as Reno, Cubic or BBR.  These congestion
   control algorithms attempt to seek the available capacity of the end-
   to-end path (which in the case of residential broadband networks, can
   frequently be the access network link), and in doing so generally
   overshoot the available capacity, causing a queue to build up at the
   bottleneck link.  This queue buildup results in queuing delay that
   the application experiences as variable latency.
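
   As a rough illustration of the scale involved, the delay added by a
   standing queue is the backlog divided by the link rate.  The sketch
   below is not taken from any measurement in this draft; the 20 Mbps
   link rate and the 250,000 byte backlog are assumed values chosen
   only to make the arithmetic concrete.

```python
def queuing_delay_ms(backlog_bytes: int, link_rate_bps: float) -> float:
    """Time needed to drain `backlog_bytes` at `link_rate_bps`, in ms."""
    return backlog_bytes * 8 * 1000.0 / link_rate_bps


# Assumed example: 250,000 bytes queued ahead of a packet on a 20 Mbps link.
delay = queuing_delay_ms(250_000, 20e6)
print(f"{delay:.0f} ms of added latency")
```

   Under these assumptions a packet from any other flow that shares
   the queue waits on the order of 100 ms, regardless of how little
   traffic that flow itself sends.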

   In contrast to congestion-controlled applications, there are a
   variety of relatively low data rate applications that do not
   materially contribute to queuing delay, but are nonetheless
   subjected to it by sharing the same bottleneck link in the access
   network.  Many of these applications may be sensitive to latency or
   latency variation, and thus produce a poor quality of experience in
   such conditions.

   Active Queue Management (AQM) mechanisms (such as PIE [RFC8033],
   DOCSIS-PIE [RFC8034], or CoDel [RFC8289]) can improve the quality of
   experience for latency sensitive applications, but there are
   practical limits to the amount of improvement that can be achieved
   without impacting the throughput of capacity-seeking applications.

   This document considers differentiating between these two classes of
   traffic at bottleneck links so that both classes can deliver
   exceptional quality of experience for their applications, and
   solicits discussion and feedback.

2.  Non-Queue Building Flows

   There are many applications that send traffic at relatively low data
   rates and/or in a fairly smooth and consistent manner such that they
   are highly unlikely to exceed the available capacity of the network
   path between source and sink.  Such applications are ideal candidates
   to be queued separately from the capacity-seeking applications that
   cause queue buildups and latency.

   These Non-Queue-Building (NQB) flows are typically UDP flows that
   send traffic at a relatively low data rate and don't seek the
   capacity of the link (examples: online games, voice chat, DNS
   lookups).  Here the data rate is essentially limited by the
   application itself.  In contrast, Queue-Building (QB) flows include
   traffic that uses a capacity-seeking congestion controller, such as
   traditional TCP (e.g. Reno or Cubic), BBR, or QUIC.

   Many applications fall neatly into one of these two categories,
   but there are also application
   flows that may be in a gray area in between (e.g. they are NQB on
   high-speed links, but QB on slow-speed links).
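
   The gray area can be illustrated with a simple fluid model.  The
   sketch below is only an illustration under stated assumptions: the
   function name, the 5 Mbps application rate, the 12,500 byte burst
   size, and the two link rates are all hypothetical.  The sender emits
   a fixed burst at regular intervals and the link drains the queue at
   a constant rate in between.

```python
def peak_backlog_bytes(send_rate_bps, link_rate_bps, burst_bytes, num_bursts):
    """Largest queue backlog seen when a flow sends `num_bursts` bursts of
    `burst_bytes` at an average rate of `send_rate_bps` over a link that
    drains at `link_rate_bps` (fluid approximation, no other traffic)."""
    interval_s = burst_bytes * 8 / send_rate_bps           # time between bursts
    drain_per_interval = link_rate_bps * interval_s / 8    # bytes sent per interval
    backlog = 0.0
    peak = 0.0
    for _ in range(num_bursts):
        backlog += burst_bytes                              # burst arrives
        peak = max(peak, backlog)
        backlog = max(0.0, backlog - drain_per_interval)    # link drains queue
    return peak


# The same 5 Mbps application flow on a fast and on a slow link.
fast = peak_backlog_bytes(5e6, 100e6, 12_500, 500)
slow = peak_backlog_bytes(5e6, 2e6, 12_500, 500)
print(fast, slow)
```

   On the assumed 100 Mbps link each burst drains before the next one
   arrives, so the backlog never exceeds a single burst.  On the
   assumed 2 Mbps link the backlog grows with every burst, so the same
   flow behaves as a QB flow.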

3.  Identifying NQB traffic

   This memo is intended to seek feedback on mechanisms by which Non-
   Queue Building flows can be identified by the network in an
   application-neutral way.  Two mechanisms in particular seem feasible,
   and could (either alone or in concert) be used to differentiate
   between QB and NQB flows.

   o  Endpoint marking.  This mechanism would have application endpoints
      apply a marking (perhaps utilizing the Diffserv field of the IP
      header) to NQB flows that could then be used by the network to
      differentiate between QB and NQB flows.

   o  Queuing behavior analysis.  This mechanism would utilize real-time
      per-flow traffic statistics to identify whether a flow is sending
      traffic at a rate that exceeds the available capacity of the
      bottleneck link and hence is causing a queue to form.


3.1.  Endpoint marking

   This mechanism would have application endpoints apply a marking
   (perhaps utilizing the Diffserv field of the IP header) to NQB flows
   that could then be used by the network to differentiate between QB
   and NQB flows.  It would be useful for such a marking to be
   universally agreed upon, rather than being locally defined by the
   network operator, such that applications could be written to apply
   the marking without regard to local network policies.
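
   As an illustration of what endpoint marking could look like in an
   application, the sketch below sets the Diffserv field on a UDP
   socket.  It assumes the 0x2A codepoint that this draft suggests as
   one possibility for NQB; no codepoint has been assigned, and the
   helper names are hypothetical.  The IP_TOS socket option is
   platform-dependent, and some operating systems may ignore or
   restrict it.

```python
import socket

NQB_DSCP = 0x2A  # assumption: the candidate codepoint discussed in this draft


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper bits of the 8-bit TOS byte."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2  # the low two bits are the ECN field, left as 00


def open_nqb_udp_socket() -> socket.socket:
    """Create an IPv4 UDP socket whose outgoing packets carry NQB_DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(NQB_DSCP))
    return sock


print(hex(dscp_to_tos(NQB_DSCP)))
```

   Because the marking is applied by the application without regard to
   local policy, a network that does not recognize the codepoint would
   simply treat the packets as it would any other traffic.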

   Some questions that arise when considering endpoint marking are: How
   can an application determine whether it is queue building or not,
   given that the sending application is generally not aware of the
   available capacity of the path to the receiving endpoint?  Even in
   cases where an application is aware of the capacity of the path, how
   can it be sure that the available capacity (considering other flows
   that may be sharing the path) would be sufficient to result in the
   application's traffic not causing a queue to form?  In an unmanaged
   environment, how can networks trust endpoint marking?  Why wouldn't
   all applications mark their packets as NQB?

   As an answer to the last question, it is worth noting that
   the NQB designation and marking would be intended to convey
   verifiable traffic behavior, not needs or wants.  Also, it would be
   important that incentives are aligned correctly, i.e. that there is a
   benefit to the application in marking its packets correctly, and no
   benefit for an application in intentionally mismarking its traffic.
   Thus, a useful property of nodes that support separate queues for NQB
   and QB flows would be that for NQB flows, the NQB queue provides
   better performance (considering latency, loss and throughput) than
   the QB queue; and for QB flows, the QB queue provides better
   performance (considering latency, loss and throughput) than the NQB
   queue.

   Even so, it is possible that due to an implementation error or
   misconfiguration, a QB flow would end up getting mismarked as NQB, or
   vice versa.  In the case of an NQB flow that isn't marked as NQB and
   ends up in the QB queue, it would only impact its own quality of
   service, and so it seems to be of lesser concern.  However, a QB flow
   that is mismarked as NQB, either due to error or due to the fact that
   the application developer can't predict the data rate capabilities of
   the link, would cause queuing delays for all of the other flows
   that are sharing the NQB queue.

   To prevent this situation from harming the performance of the real
   NQB flows, it would likely be valuable to support a "queue
   protection" function that could identify QB flows that are mismarked
   as NQB, and reclassify those flows/packets to the QB queue.  This
   would benefit the reclassified flow by giving it access to a large
   buffer (and thus lower packet loss rate), and would benefit the
   actual NQB flows by preventing harm (increased latency variability)
   to them.  Some open questions around this function include: How could
   such a function be implemented in an objective and verifiable manner?
   What other options might exist to serve this purpose in a dual-queue
   architecture?
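
   One possible shape for such a function is sketched below.  It is a
   hypothetical illustration, not a proposed algorithm: the class name,
   the 1 ms delay threshold, the 3000 byte score limit, and the decay
   rate are all assumptions.  Each flow accumulates a score for bytes
   that arrive while the NQB queue is already delayed, the score drains
   over time, and packets from flows whose score exceeds the limit are
   sent to the QB queue.

```python
from collections import defaultdict


class QueueProtection:
    """Hypothetical latency-based guard for an NQB queue (illustrative only)."""

    def __init__(self, delay_threshold_s=0.001, score_limit_bytes=3000.0,
                 decay_bytes_per_s=125_000.0):
        self.delay_threshold_s = delay_threshold_s
        self.score_limit_bytes = score_limit_bytes
        self.decay_bytes_per_s = decay_bytes_per_s
        self.score = defaultdict(float)   # per-flow "blame", in bytes
        self.last_seen = {}               # per-flow time of previous packet

    def classify(self, flow_id, pkt_bytes, now_s, nqb_queue_delay_s):
        """Return "NQB" or "QB" for a packet that was marked NQB."""
        # Let the flow's score drain for the time since its last packet.
        elapsed = now_s - self.last_seen.get(flow_id, now_s)
        self.score[flow_id] = max(
            0.0, self.score[flow_id] - elapsed * self.decay_bytes_per_s)
        self.last_seen[flow_id] = now_s
        # Blame only packets that arrive while the NQB queue is delayed.
        if nqb_queue_delay_s > self.delay_threshold_s:
            self.score[flow_id] += pkt_bytes
        return "QB" if self.score[flow_id] > self.score_limit_bytes else "NQB"


qp = QueueProtection()
# A bulk sender mismarked as NQB: 1500-byte packets every 1 ms, queue at 2 ms.
bulk = [qp.classify("bulk", 1500, i * 0.001, 0.002) for i in range(5)]
# A sparse sender: 100-byte packets every 20 ms, same queue delay.
sparse = [qp.classify("game", 100, i * 0.020, 0.002) for i in range(5)]
print(bulk, sparse)
```

   With these assumed parameters the mismarked bulk flow is redirected
   from its third packet onward, while the sparse flow stays in the NQB
   queue because its score drains between packets.  Whether a scoring
   rule of this kind is sufficiently objective and verifiable is
   exactly the open question posed above.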

3.2.  Queuing behavior analysis

   Similar to the queue protection function outlined in the previous
   section, it may be feasible to devise a real-time flow analyzer for a
   node that would identify flows that are causing queue buildup, and
   redirect those flows to the QB queue, leaving the remaining flows in
   the NQB queue.
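
   A minimal sketch of such an analyzer follows.  It relies on
   assumptions that are not part of this draft: traffic is observed
   over a fixed measurement window, and a flow is treated as queue
   building if it sent more than an equal share of what the link could
   carry in that window.  The function name and the example numbers
   are hypothetical.

```python
from collections import Counter


def classify_flows(packets, link_rate_bps, window_s):
    """Label each flow seen in one window as "QB" or "NQB".

    `packets` is an iterable of (flow_id, size_bytes) tuples observed
    during a window of `window_s` seconds on a link of `link_rate_bps`.
    """
    bytes_per_flow = Counter()
    for flow_id, size_bytes in packets:
        bytes_per_flow[flow_id] += size_bytes
    if not bytes_per_flow:
        return {}
    # Assumed rule: an equal share of the bytes the link can carry.
    fair_share = link_rate_bps * window_s / 8 / len(bytes_per_flow)
    return {flow: ("QB" if total > fair_share else "NQB")
            for flow, total in bytes_per_flow.items()}


# Assumed 10 Mbps link observed for 100 ms.
observed = [("voip", 200)] * 5 + [("bulk", 1500)] * 80
print(classify_flows(observed, 10e6, 0.1))
```

   Unlike endpoint marking, this approach needs no cooperation from
   the application, but it requires per-flow state in the node and can
   only classify a flow after observing some of its traffic.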

4.  Non Queue Building PHB

   This section uses the Diffserv nomenclature of per-hop behavior (PHB)
   to describe how a network node could provide better quality of
   service for NQB flows without reducing performance of QB flows.

   A node supporting the NQB PHB would provide a separate queue for non-
   queue-building traffic.  This queue would support a latency-based
   queue protection mechanism that is able to identify queue-building
   behavior in flows that are classified into the queue, and to redirect
   flows causing queue buildup to a different queue.

   While there may be some similarities between the characteristics of
   NQB flows and flows marked with the Expedited Forwarding DSCP, the
   NQB PHB would differ from the Expedited Forwarding PHB in several
   important ways.

   o  NQB traffic is not rate limited or rate policed.  Rather, the NQB
      queue would be expected to support a latency-based queue
      protection mechanism that identifies NQB marked flows that are
      beginning to cause latency, and redirects packets from those flows
      to the queue for QB flows.

   o  The node supporting the NQB PHB makes no guarantees on latency or
      data rate for NQB marked flows, but instead aims to provide sub-
      millisecond queuing delays for as many such marked flows as it
      can, and shed load when needed.

   o  EF is commonly used exclusively for voice traffic, for which
      additional functions are applied, such as admission control,
      accounting, prioritized delivery, etc.


   In networks that support the NQB PHB, it may be preferred to also
   include traffic marked EF (101110b) in the NQB queue.  The choice of
   the 0x2A codepoint (101010b) for NQB would conveniently allow a node
   to select these two codepoints using a single mask pattern of
   101x10b.
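
   The mask pattern can be checked with a few lines of code.  In the
   sketch below the constant and function names are illustrative; the
   two codepoints are the ones given in the paragraph above.

```python
EF_DSCP = 0b101110    # Expedited Forwarding
NQB_DSCP = 0b101010   # 0x2A, the candidate NQB codepoint from the text

MASK = 0b111011       # ignore the bit shown as "x" in 101x10b
PATTERN = 0b101010


def selects_nqb_queue(dscp: int) -> bool:
    """True if the 6-bit DSCP matches the pattern 101x10b."""
    return (dscp & MASK) == PATTERN


# Enumerate all 64 codepoints to see which ones the pattern selects.
matches = [d for d in range(64) if selects_nqb_queue(d)]
print([format(d, "06b") for d in matches])
```

   Enumerating all 64 codepoints shows that the pattern selects exactly
   the NQB candidate and EF, and no other DSCP.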

5.  End-to-end Support

   In contrast to the existing standard DSCPs, which are typically only
   enforced within a Diffserv domain (e.g. an AS), this DSCP would be
   intended for end-to-end usage across the Internet.  Some access
   network service providers bleach the Diffserv field on ingress into
   their network, and in some cases apply their own DSCP for internal
   usage.  Access networks that support the NQB PHB would need to permit
   the NQB DSCP to pass through this bleaching operation such that the
   PHB can be provided at the access network link.

6.  Relationship to L4S

   The dual-queue mechanism described in this draft is similar to, and
   is intended to be compatible with, the L4S architecture
   [I-D.ietf-tsvwg-l4s-arch].

7.  Comparison to Existing Approaches

   Traditional QoS mechanisms focus on prioritization in an attempt to
   achieve two goals: reduced latency for "latency-sensitive" traffic,
   and increased bandwidth availability for "important" applications.
   Applications are generally given priority in proportion to some
   combination of latency-sensitivity and importance.

   Downsides to this approach include the difficulties in sorting out
   what priority level each application should get (making the value
   judgement as to latency-sensitivity and importance), associating
   packets to priority levels (lots of classifier state, or trusting
   endpoint markings and the value judgements that they convey), and
   ensuring that high priority traffic doesn't starve lower priority
   traffic (admission control, weighted scheduling, etc. are possible
   solutions).  This approach can work in a managed network, where the
   network operator can control the usage of the QoS mechanisms, but has
   not been adopted end-to-end across the internet.

   Flow queueing approaches (such as fq_codel [RFC8290]), on the other
   hand, achieve latency improvements by associating packets into "flow"
   queues and then prioritizing "sparse flows", i.e. packets that arrive
   to an empty flow queue.  Flow queueing does not attempt to
   differentiate between flows on the basis of value (importance or
   latency-sensitivity); it simply gives preference to sparse flows, and
   tries to guarantee that the non-sparse flows all get an equal share
   of the remaining channel capacity.  As a result, fq mechanisms could
   be considered more appropriate for unmanaged environments and general
   internet traffic.

   Downsides to this approach include loss of low latency performance
   due to hash collisions (where a sparse flow shares a queue with a
   bulk data flow) and complexity in managing a large number of queues.
   In addition, the scheduling (typically DRR) that enforces that each
   non-sparse flow gets an equal fraction of link bandwidth causes
   problems with VPNs and other tunnels, and exhibits poor behavior with
   less-aggressive congestion avoidance algorithms (e.g. LEDBAT) and
   with RMCAT congestion avoidance algorithms.  In effect the network
   element is making a decision as to what
   constitutes a flow, and then forcing all such flows to take equal
   bandwidth at every instant.
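
   The hash collision downside can be quantified with a simple model.
   The sketch below assumes that flows are hashed uniformly and
   independently into the flow queues, and uses 1024 queues as an
   assumed configuration; the function name is illustrative.

```python
def collision_probability(num_bulk_flows: int, num_queues: int = 1024) -> float:
    """Chance that one sparse flow lands in a queue that also holds at
    least one of `num_bulk_flows` bulk flows, under uniform hashing."""
    return 1.0 - (1.0 - 1.0 / num_queues) ** num_bulk_flows


for bulk_flows in (1, 10, 100):
    p = collision_probability(bulk_flows)
    print(f"{bulk_flows:3d} bulk flows -> {p:.2%} chance of sharing a queue")
```

   Under this model the probability is small for a handful of bulk
   flows but grows roughly linearly with their number, and any sparse
   flow that does collide loses its low latency treatment for as long
   as the bulk flow is active.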

   The dual-queue approach is intended to achieve the main benefit of
   fq_codel (latency improvement without value judgements) without
   these downsides.

   The distinction between NQB flows and QB flows is similar to the
   distinction made between "sparse flow queues" and "non-sparse flow
   queues" in fq_codel.  In fq_codel, a flow queue is considered sparse
   if it is drained completely by each packet transmission, and remains
   empty for at least one cycle of the round robin over the active flows
   (this is approximately equivalent to saying that it utilizes less
   than its fair share of capacity).  While this definition is
   convenient to implement in fq_codel, it isn't the only useful
   definition of sparse flows.

8.  Acknowledgements

   TBD

9.  IANA Considerations

   TBD

10.  Security Considerations

   TBD

11.  Informative References

   [I-D.ietf-tsvwg-l4s-arch]
              Briscoe, B., Schepper, K., and M. Bagnulo, "Low Latency,
              Low Loss, Scalable Throughput (L4S) Internet Service:
              Architecture", draft-ietf-tsvwg-l4s-arch-02 (work in
              progress), March 2018.


   [RFC8033]  Pan, R., Natarajan, P., Baker, F., and G. White,
              "Proportional Integral Controller Enhanced (PIE): A
              Lightweight Control Scheme to Address the Bufferbloat
              Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017,
              <https://www.rfc-editor.org/info/rfc8033>.

   [RFC8034]  White, G. and R. Pan, "Active Queue Management (AQM) Based
              on Proportional Integral Controller Enhanced (PIE) for
              Data-Over-Cable Service Interface Specifications (DOCSIS)
              Cable Modems", RFC 8034, DOI 10.17487/RFC8034, February
              2017, <https://www.rfc-editor.org/info/rfc8034>.

   [RFC8289]  Nichols, K., Jacobson, V., McGregor, A., Ed., and J.
              Iyengar, Ed., "Controlled Delay Active Queue Management",
              RFC 8289, DOI 10.17487/RFC8289, January 2018,
              <https://www.rfc-editor.org/info/rfc8289>.

   [RFC8290]  Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys,
              J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler
              and Active Queue Management Algorithm", RFC 8290,
              DOI 10.17487/RFC8290, January 2018,
              <https://www.rfc-editor.org/info/rfc8290>.

Author's Address

   Greg White
   CableLabs
   858 Coal Creek Circle
   Louisville, CO  80027
   US

   Email: g.white@cablelabs.com

