TSVWG                                                            R. Even
Internet-Draft                                                    Huawei
Intended status: Informational                                  R. Huang
Expires: April 25, 2020                    Huawei Technologies Co., Ltd.
                                                        October 23, 2019


                 Data Center Fast Congestion Management
                 draft-even-iccrg-dc-fast-congestion-00

Abstract

   Fast congestion control is discussed in academic papers as well as in
   different standards bodies.  No single proposal provides a solution
   that works for all use cases, which has led to multiple approaches.
   By congestion control we refer to an end-to-end solution and not only
   to the congestion control algorithm on the sender side.  This
   document describes the current state of flow control and congestion
   management for Data Centers and proposes future directions.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions
   3.  Abbreviations
   4.  Alternative Congestion Management mechanisms
     4.1.  Mechanisms based on estimation of network status
     4.2.  Network provides limited information
       4.2.1.  ECN and DCTCP
       4.2.2.  DCQCN
       4.2.3.  SCE - Some Congestion Experienced
       4.2.4.  L4S - Low Latency, Low Loss, Scalable Throughput
     4.3.  Network provides more information
     4.4.  Network provides proactive control
   5.  Summary and Proposal
     5.1.  Reflect the network status more accurately
     5.2.  Notify the reaction point as soon as possible
   6.  Security Considerations
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Fast congestion control is discussed in academic papers as well as in
   different standards bodies.  No single proposal provides a solution
   that works for all use cases, which has led to multiple approaches.
   By congestion control we refer to an end-to-end solution and not only
   to the congestion control algorithm on the sender side.

   The major use case that we are looking at is congestion control for
   Data Centers, a controlled environment [RFC8085].  With the emerging
   Distributed Storage, AI/HPC (High Performance Computing), Machine
   Learning, etc., modern data center applications demand high
   throughput (40 Gb/s and above) with ultra-low latency of less than
   10 microseconds per hop from the network, with low CPU overhead.
   End to end, the latency should be less than 50 microseconds; this
   value is based on DCQCN [DCQCN].  The high link speeds (>40 Gb/s) in
   Data Centers (DC) make network transfers complete faster and in
   fewer RTTs.  Network traffic in a data center is often a mix of
   short and long
   flows, where the short flows require low latencies and the long flows
   require high throughputs.

   On IP-routed data center networks, RDMA is deployed using the RoCEv2
   protocol [RoCEv2] or iWARP [RFC5040].  RoCEv2 is a straightforward
   extension of the RoCE protocol that involves a simple modification
   of the RoCE packet format.  RoCEv2 packets carry an IP header, which
   allows traversal of IP L3 routers, and a UDP header that serves as a
   stateless encapsulation layer for the RDMA transport protocol
   packets over IP.  In Data Centers, RDMA over RoCEv2 expects a
   lossless fabric, and this is achieved using ECN and PFC.  iWARP
   congestion control is based on TCP congestion control (DCTCP
   [RFC8257]).

   A good congestion control for data centers should provide low
   latency, fast convergence, and high link utilization.  Since
   multiple applications with different requirements may run on the DC
   network, it is important to provide fairness between different
   applications that may use different congestion algorithms.  An
   important issue from the user perspective is achieving a short Flow
   Completion Time (FCT).

   This document investigates the current congestion control proposals
   and discusses future data center congestion control directions that
   aim to achieve high performance and collaboration.

2.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Abbreviations

      RCM - RoCEv2 Congestion Management

      PFC - Priority-based Flow Control

      ECN - Explicit Congestion Notification

      DCQCN - Data Center Quantized Congestion Notification

      AI/HPC - Artificial Intelligence/High-Performance computing

      ECMP - Equal-Cost Multipath

      NIC - Network Interface Card

      RED - Random Early Detection

4.  Alternative Congestion Management mechanisms

   This section describes alternative directions based on current work.
   Looking at the alternatives from the network perspective, we can
   classify them as:

   1.  Based on estimation of network status: traditional TCP, TIMELY.

   2.  Network provides limited information: DCQCN (using only ECN),
       SCE, and L4S.

   3.  Network provides more information: HPCC.

   4.  Network provides proactive control: RCP (Rate Control Protocol).

   Note that any research on congestion control that requires network
   participation will be irrelevant if we cannot find a viable
   deployment path in which only part of the network devices support
   the proposed congestion control.

4.1.  Mechanisms based on estimation of network status

   Traditional mechanisms use packet status, e.g., loss or delay, as
   the congestion signal fed back to the sender.  This is based on the
   fact that packets are dropped when a buffer is full and are delayed
   when a queue is building up.  It can be achieved simply by
   interactions between the sender and the receiver, without the
   involvement of the network.  It has worked well on the Internet for
   a very long time, especially for best-effort applications that do
   not have specific performance requirements.

   However, these mechanisms are not optimized for some data center
   applications because the convergence time and throughput are not
   good enough.  This is mainly because the endpoints' estimation of
   the network status is not accurate enough, and these mechanisms lack
   further information to adjust the sender's behavior.

4.2.  Network provides limited information

   In these mechanisms, the network utilizes the ECN field of the IP
   header to provide some hints on the network status.  The following
   sections describe some typical proposals.


4.2.1.  ECN and DCTCP

   The Internet solutions use ECN [RFC3168] for marking the state of
   the queues in the network device; they may use an AQM mechanism
   (FQ-CoDel [RFC8290], PIE [RFC8033]) in the network devices and a
   congestion algorithm (NewReno [RFC5681], CUBIC [RFC8312], or DCTCP
   [RFC8257]) on the sender side to address the congestion in the
   network.  Note that ECN is signaled earlier than packet drop but may
   cause earlier exit from TCP slow start.
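
   As a minimal illustration (not taken from any of the cited
   specifications; thresholds and probability slope are assumed
   values), the following Python sketch shows the RED-style decision
   such an AQM makes per packet: below a minimum queue threshold
   nothing happens, between the thresholds an ECN-capable packet is
   CE-marked with a probability that grows with the queue, and a
   non-ECT packet is dropped instead.

      import random

      MIN_TH = 20    # illustrative minimum queue threshold (packets)
      MAX_TH = 80    # illustrative maximum queue threshold (packets)
      MAX_P = 0.1    # marking probability when reaching MAX_TH

      def red_ecn(avg_qlen, ect):
          """Return 'forward', 'mark' (set CE) or 'drop' for a packet.
          avg_qlen: averaged queue length seen by the AQM.
          ect: True if the packet is ECN-capable (ECT(0)/ECT(1))."""
          if avg_qlen < MIN_TH:
              return "forward"
          if avg_qlen >= MAX_TH:
              return "mark" if ect else "drop"
          # Linear RED ramp between the two thresholds.
          p = MAX_P * (avg_qlen - MIN_TH) / (MAX_TH - MIN_TH)
          if random.random() < p:
              return "mark" if ect else "drop"
          return "forward"

   The sketch makes visible why ECN acts earlier than loss: the mark is
   produced by the same ramp that would otherwise drop the packet.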

   One of the problems for TCP is that ECN is specified for TCP in such
   a way that only one feedback signal can be transmitted per Round-
   Trip Time (RTT).  [I-D.ietf-tcpm-accurate-ecn] specifies an
   alternative feedback scheme that provides more accurate information
   that can be used by DCTCP and L4S.

   Traditional TCP uses the ECN signal to indicate congestion
   experienced instead of packet loss; however, it does not provide
   information about the degree of the congestion.  DCTCP [RFC8257]
   tries to solve this issue.  It estimates the fraction of bytes that
   encounter congestion rather than simply detecting the presence of
   congestion.  DCTCP then scales its sending rate accordingly.  DCTCP
   is widely implemented in current data center environments.
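
   The essence of DCTCP can be stated in a few lines.  The sketch below
   follows the update rules of [RFC8257] (g = 1/16 is a commonly used
   gain; the window floor is simplified here): the sender keeps a
   running estimate Alpha of the fraction of its bytes that were
   CE-marked and cuts the congestion window in proportion to it,
   instead of the fixed halving of classic TCP.

      G = 1.0 / 16            # estimation gain g; 1/16 is common

      class DctcpState:
          def __init__(self, cwnd):
              self.cwnd = cwnd    # congestion window (bytes)
              self.alpha = 1.0    # estimated fraction of marked bytes

      def on_window_end(state, bytes_acked, bytes_marked):
          """Called once per observation window of about one RTT."""
          m = bytes_marked / max(bytes_acked, 1)  # fraction marked
          # Alpha = (1 - g) * Alpha + g * M, per [RFC8257].
          state.alpha = (1 - G) * state.alpha + G * m
          if bytes_marked > 0:
              # cwnd = cwnd * (1 - Alpha/2): a mild cut under mild
              # congestion, approaching the classic halving only when
              # every byte was marked.
              state.cwnd = max(state.cwnd * (1 - state.alpha / 2), 1)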

4.2.2.  DCQCN

   An enhancement to the congestion handling for RoCEv2 is the
   congestion control for large-scale RDMA deployments [DCQCN],
   providing similar functionality to QCN [QCN] and DCTCP [RFC8257].
   It is implemented in some of the RoCEv2 NICs but is not part of the
   RoCEv2 specification.  As such, vendors have their own
   implementations, which makes it difficult to interoperate with each
   other efficiently.

   The DCQCN tests assume that the Congestion Point uses RED-ECN for
   ECN marking and that the RDMA CNP message is used by the
   Notification Point (the receiver) to report ECN Congestion
   Experienced (CE).  DCQCN as presented includes parameters that need
   to be set; the paper provides the parameter values that were used
   during the specific tests with Mellanox NICs.  One of the comments
   about DCQCN is that it is not simple to set the parameters so as to
   get an optimized solution.  This solution is specific to RoCEv2,
   addresses only the congestion control algorithm, and is implemented
   in the NIC.

   DCQCN notification uses a CNP that only reports that at least one
   packet with CE marking was received in the last 50 usec; this is
   similar to TCP reporting.  Other UDP-based transports like RTP and
   QUIC provide information about how many packets marked with CE,
   ECT(0) or ECT(1) were received.
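
   A rough sketch of the reaction-point (sender) logic described in
   [DCQCN] follows; the gain g, the additive-increase step, and the
   timer periods are exactly the kind of tunable parameters noted
   above, and the values here are illustrative.  On each CNP the
   current rate RC is cut in proportion to a congestion estimate alpha;
   between CNPs alpha decays and RC recovers toward the target rate RT.

      G = 1.0 / 256      # illustrative gain; a tunable DCQCN parameter
      R_AI = 40e6        # illustrative additive-increase step (bit/s)

      class DcqcnSender:
          """Reaction-point state, simplified from [DCQCN]."""
          def __init__(self, line_rate):
              self.rc = line_rate    # current sending rate
              self.rt = line_rate    # target rate
              self.alpha = 1.0       # congestion estimate

          def on_cnp(self):
              # A CNP only says that at least one CE-marked packet
              # arrived at the receiver in the last ~50 usec window.
              self.rt = self.rc
              self.rc = self.rc * (1 - self.alpha / 2)
              self.alpha = (1 - G) * self.alpha + G

          def on_alpha_timer(self):
              # No CNP during the period: decay the congestion estimate.
              self.alpha = (1 - G) * self.alpha

          def on_increase_timer(self, additive):
              # Fast recovery halves the gap to RT; in the additive
              # phase RT itself is also raised by a fixed step.
              if additive:
                  self.rt += R_AI
              self.rc = (self.rt + self.rc) / 2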

4.2.3.  SCE - Some Congestion Experienced

   [I-D.morton-taht-tsvwg-sce] proposes using ECT(1) as an early
   notification of congestion on ECT(0)-marked packets, which can be
   used by AQM algorithms and transports as an earlier signal of
   congestion than CE ("Congestion Experienced").

   The ECN specification says that the congestion algorithm should
   treat CE marks the same as dropped packets.  Using ECT(1) to signal
   SCE permits middleboxes implementing AQM to signal incipient
   congestion, below the threshold required to justify setting CE.
   Existing [RFC3168]-compliant receivers MUST transparently ignore
   this new signal with respect to congestion control, and both
   existing and SCE-aware middleboxes MAY convert SCE to CE in the same
   circumstances as for ECT, thus ensuring backwards compatibility with
   ECN [RFC3168] endpoints.
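
   As an illustration (the thresholds are assumed values, not taken
   from the SCE drafts), an SCE-capable AQM can run two levels on the
   queuing delay: a low ramp over which ECT(0) packets are remarked to
   ECT(1) (SCE) with growing probability, and a higher classic
   threshold at which CE is set as in [RFC3168]:

      import random

      SCE_LOW = 0.001    # start of the SCE ramp (s of queuing delay)
      SCE_HIGH = 0.005   # end of the SCE ramp (probability reaches 1)
      CE_TH = 0.020      # classic CE threshold

      def mark(qdelay, codepoint):
          """Return the ECN codepoint to transmit for one packet."""
          if qdelay >= CE_TH:
              return "CE"              # classic congestion signal
          if codepoint == "ECT0" and qdelay > SCE_LOW:
              # Probability ramps from 0 at SCE_LOW to 1 at SCE_HIGH.
              p = min((qdelay - SCE_LOW) / (SCE_HIGH - SCE_LOW), 1.0)
              if random.random() < p:
                  return "ECT1"        # SCE: some congestion experienced
          return codepoint

   The placement of this ramp relative to the CE threshold is exactly
   what the two-flow tests described below vary in order to improve
   fairness against non-SCE flows.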

   This solution uses ECT(1), which was defined in ECN [RFC3168] as a
   one-bit Nonce; that use was obsoleted by RFC 8311, and SCE uses the
   codepoint for the SCE mark.  There may be other documents trying to
   use this bit; for example, L4S uses it to signal L4S support.  The
   SCE markings are done by the AQM algorithm (RED, CoDel) and are sent
   back to the sender by the transport, so there may be a need to add
   support for conveying the SCE marking to the sender (QUIC, for
   example, already has support for reporting the counts of ECT(0) and
   ECT(1) separately).  This solution is simpler than HPCC but provides
   less information.

   [I-D.heist-tsvwg-sce-one-and-two-flow-tests] presents one- and two-
   flow test results for the SCE reference implementation.  These tests
   are not intended to be a comprehensive real-world evaluation of SCE,
   but an illustration of SCE's influence on basic TCP metrics in a
   controlled environment.  The goal of the one-flow tests is to
   analyze the impact of SCE on the TCP throughput and TCP RTT of
   single TCP flows across a range of simulated path bandwidths and
   RTTs.  The tests were run with Reno and DCTCP.  Even though using
   SCE gave better results in general, there was significant under-
   utilization at low bandwidths (<10 Mb/s; <25 Mb/s), a slight
   increase in TCP RTT for DCTCP-SCE at 100 Mbit / 160 ms, and a slight
   increase in TCP RTT for SCE-Reno at high BDPs.  The document does
   not describe the congestion algorithm that was used for DCTCP-SCE or
   Reno-SCE and comments that further work needs to be done to
   understand the reason for this behavior.

   The goal of the two-flow tests is to measure fairness between and
   among SCE and non-SCE TCP flows, through either a single queue or
   with fair queuing.

   The initial results show that SCE-enabled flows back off in the face
   of competition, whereas non-SCE flows fill the queue until a drop or
   CE mark occurs, so fairness is not achieved.  By changing the ramp
   by which SCE is marked, and marking SCE closer to the drop or CE
   point, the fairness is better.

4.2.4.  L4S - Low Latency, Low Loss, Scalable Throughput

   There are three main components to the L4S architecture
   [I-D.ietf-tsvwg-l4s-arch]:

   1.  Network: L4S traffic needs to be isolated from the queuing
       latency of Classic traffic.  However, the two should be able to
       freely share a common pool of capacity.  This is because there is
       no way to predict how many flows at any one time might use each
       service, and capacity in access networks is too scarce to
       partition into two.  The Dual Queue Coupled AQM
       [I-D.ietf-tsvwg-aqm-dualq-coupled] was developed as a minimal-
       complexity solution to this problem.  The two queues appear to be
       separated by a 'semi-permeable' membrane that partitions latency
       but not bandwidth.  Per-flow queuing such as in [RFC8290] could
       be used, but it partitions both latency and bandwidth between
       every end-to-end flow.  So it is rather overkill, which brings
       disadvantages, not least that a large number of queues is needed
       when two are sufficient.

   2.  Protocol: A host needs to distinguish L4S and Classic packets
       with an identifier so that the network can classify them into
       their separate treatments.  [I-D.ietf-tsvwg-ecn-l4s-id] considers
       various alternative identifiers, and concludes that all
       alternatives involve compromises, but the ECT(1) and CE
       codepoints of the ECN field represent a workable solution.

   3.  Host: Scalable congestion controls already exist.  They solve the
       scaling problem with TCP that was first pointed out in [RFC3649].
       The one used most widely (in controlled environments) is Data
       Center TCP (DCTCP [RFC8257]).  Although DCTCP as-is 'works' well
       over the public Internet, most implementations lack certain
       safety features that will be necessary once it is used outside
       controlled environments like data centers.  A similar scalable
       congestion control will also need to be transplanted into
       protocols other than TCP (QUIC, SCTP, RTP/RTCP, RMCAT, etc.).
       Indeed, between the present document being drafted and published,
       the following scalable congestion controls were implemented: TCP
       Prague, QUIC Prague, and an L4S variant of the RMCAT SCReAM
       controller [RFC8298].

   Using the Dual Queue provides better fairness between DCTCP and
   Reno/Cubic.  This is less relevant to Data Centers, where the
   competing streams may use DCQCN and DCTCP.
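
   The core of the Coupled AQM can be stated in two lines.  In the
   sketch below (following the structure described in
   [I-D.ietf-tsvwg-aqm-dualq-coupled]; k = 2 is a commonly suggested
   coupling factor), the base AQM on the Classic queue produces a
   probability p'; Classic traffic is dropped or CE-marked with
   probability p' squared, matching Reno/Cubic dynamics, while L4S
   traffic is CE-marked with the coupled probability k * p', matching
   scalable (DCTCP-like) dynamics:

      K = 2.0   # coupling factor k

      def coupled_probabilities(p_base, p_l4s_native):
          """p_base: output p' of the base AQM on the Classic queue.
          p_l4s_native: probability from the L4S queue's own shallow-
          threshold AQM."""
          p_classic = p_base ** 2              # square law for Classic
          p_coupled = min(K * p_base, 1.0)     # linear law for L4S
          # The L4S queue applies whichever signal is stronger.
          p_l4s = max(p_l4s_native, p_coupled)
          return p_classic, p_l4s

   The squaring is what lets the two congestion-control families share
   capacity roughly equally despite their different responses to marks.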

4.3.  Network provides more information

   HPCC (High Precision Congestion Control) [HPCC] is a new-generation
   congestion control protocol for high-speed cloud networks, aiming to
   achieve ultimate performance and high stability at the same time.
   HPCC was presented at ACM SIGCOMM 2019.

   The key design choice of HPCC is to rely on switches to provide
   fine-grained load information, such as queue size and accumulated
   tx/rx traffic, to compute precise flow rates.  This has two major
   benefits: (i) HPCC can quickly converge to proper flow rates to
   highly utilize bandwidth while avoiding congestion; and (ii) HPCC
   can consistently maintain a close-to-zero queue for low latency.

   HPCC is a sender-driven CC framework.  Each packet a sender sends
   will be acknowledged by the receiver.  During the propagation of the
   packet from the sender to the receiver, each switch along the path
   leverages the INT feature of its switching ASIC to insert some meta-
   data that reports the current load of the packet's egress port,
   including timestamp (ts), queue length (qLen), transmitted bytes
   (txBytes), and the link bandwidth capacity (B).  When the receiver
   gets the packet, it copies all the meta-data recorded by the switches
   to the ACK message it sends back to the sender.  The sender decides
   how to adjust its flow rate each time it receives an ACK with network
   load information.
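
   A simplified version of the HPCC window computation is sketched
   below (the structure follows [HPCC]; eta and W_AI are tuning
   parameters, and the per-RTT reference-window bookkeeping of the
   paper is omitted).  For every hop reported in the ACK, the sender
   estimates the link's utilization from the queue length and the delta
   of transmitted bytes, takes the most utilized link as the
   bottleneck, and corrects its window multiplicatively toward a target
   utilization eta, plus a small additive increase:

      from collections import namedtuple

      # INT metadata of one switch egress port, as described in [HPCC].
      Hop = namedtuple("Hop", "ts qLen txBytes B")

      ETA = 0.95      # target link utilization; an HPCC parameter
      W_AI = 1500     # additive increase (bytes); illustrative

      def link_utilization(hop, prev, t_rtt):
          """hop/prev: metadata of the same port from this ACK and the
          previous one; t_rtt: the base round-trip time."""
          tx_rate = (hop.txBytes - prev.txBytes) / (hop.ts - prev.ts)
          # qLen/(B*T) is the queueing term, txRate/B the bandwidth one.
          return hop.qLen / (hop.B * t_rtt) + tx_rate / hop.B

      def new_window(w_ref, hops, prev_hops, t_rtt):
          u_max = max(link_utilization(h, p, t_rtt)
                      for h, p in zip(hops, prev_hops))
          # Steer the bottleneck toward ETA; W_AI guarantees progress
          # and convergence to fairness.
          return w_ref / (u_max / ETA) + W_AI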

   Current IETF activity on IOAM [I-D.ietf-ippm-ioam-data] provides a
   standard mechanism for inserting metadata by the switches in the
   middle.  IOAM can provide an optional method for the network to feed
   metadata about the congestion status back to the endpoints.  But to
   use IOAM, the following points should be considered:

   1.  Are the current IOAM data fields sufficient for congestion
       control?

   2.  The encapsulation of IOAM in data centers for congestion control.

   3.  The feedback format for sender-driven congestion control.

   The HPCC framework requires each node in the middle to add
   information about its state to the forward-going packet until it
   reaches the receiver, who sends the acknowledgment.  We can think
   of other modes, such as having the nodes in the middle update the
   status information based on their available resources.  This
   solution requires support for INT or IOAM; both protocols need to
   specify the packet format with the INT/IOAM extension.  The HPCC
   document specifies how to implement it for RoCEv2, while for IOAM
   there are drafts in the IPPM WG describing how to implement it for
   different transports and layer 2 packets.

   The conclusion from the trials was that HPCC can be a next-
   generation CC for high-speed networks, achieving ultra-low latency,
   high bandwidth, and stability simultaneously.  HPCC achieves fast
   convergence, small queues, and fairness by leveraging precise load
   information from INT.

   A similar mechanism is defined in Quick-Start for TCP and IP
   [RFC4782].  There is a difference in the starting rate: while HPCC
   starts at maximum line speed, [RFC4782] starts at a rate specified
   in the Quick-Start request message.  Quick-Start is specified for
   TCP; if another transport (UDP) is used, there is a need to specify
   how the receiver sends the Quick-Start response message.

4.4.  Network provides proactive control

   The typical algorithm in this category is RCP (Rate Control
   Protocol) [RCP].  In the basic RCP algorithm, a router maintains a
   single rate, R(t), for every link.  The router "stamps" R(t) on
   every passing packet (unless the packet already carries a slower
   value).  The receiver sends the value back to the sender, thus
   informing it about the slowest (or bottleneck) rate along the path.
   In this way, the sender quickly finds out the rate it should be
   using (without the need for Slow-Start).  The router updates R(t)
   approximately once per round-trip time and strives to emulate
   Processor Sharing among flows.  The biggest plus of RCP is the short
   flow completion times under a wide range of network and traffic
   characteristics.
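
   Both halves of the mechanism fit in a few lines.  In the sketch
   below the stamping is as described above, while the per-interval
   rate update is a simplified form of the controller in [RCP]; the
   gains alpha and beta and the minimum rate are illustrative values:

      ALPHA, BETA = 0.4, 0.2    # stability gains (illustrative)

      def stamp(pkt_rate, r_link):
          """Stamp R(t) on a passing packet unless the packet already
          carries a slower (bottleneck) value."""
          return min(pkt_rate, r_link)

      def update_rate(r, capacity, input_rate, qlen, d0, t):
          """Per-control-interval update of R(t), simplified from
          [RCP].  d0: average RTT of the flows; t: interval length."""
          spare = capacity - input_rate    # unused bandwidth
          drain = qlen / d0                # rate needed to drain queue
          r = r * (1 + (t / d0) * (ALPHA * spare - BETA * drain)
                   / capacity)
          return max(r, 0.01 * capacity)   # keep a small minimum rate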

   The downside of RCP is that it involves the routers in congestion
   control, so it needs help from the infrastructure.  Although the
   computations are simple, they are performed per packet.  Another
   downside is that although the RCP algorithm strives to keep the
   buffer occupancy low most of the time, there are no guarantees that
   buffers will not overflow or that packet loss will be zero.

5.  Summary and Proposal

   Congestion control is all about how to utilize the network resources
   in a better and more reasonable way under different network
   conditions.  Senders are the reaction points that consume network
   resources, and network nodes are the congestion points.  Ideally,
   reaction points
   should react as soon as possible when the network status changes.
   To achieve that, there are two directions:

5.1.  Reflect the network status more accurately

   In order to provide more information than just the ECN CE marking,
   there is a need to standardize a mechanism for the network device to
   provide such information and for the receiver to send more
   information to the sender.  The network device should not insert any
   new fields into the IP packet but should be able to modify the
   values of fields in the packets sent from the data sender.

   The network device will update the metadata in the forward-going
   packet to provide more information than a single CE mark or an SCE-
   like solution.

   The receiver will analyze the metadata and report back to the
   sender.  Differently from the Internet, a data center network can
   benefit more from having more accurate information to achieve better
   congestion control, and this means that the network and the hosts
   must collaborate to achieve it.
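
   To make the discussion concrete, the sketch below shows one
   hypothetical shape such receiver feedback could take (the field set,
   encoding, and bottleneck-selection rule are assumptions for
   illustration, not a proposal): the receiver condenses the per-hop
   metadata carried in the forward packet into one fixed-size record
   describing the bottleneck hop and returns it to the sender.

      import struct

      # Hypothetical feedback record: hop id, queue length (bytes),
      # tx rate (Mb/s), and link capacity (Mb/s) of the bottleneck hop.
      FEEDBACK_FMT = "!IIII"

      def build_feedback(hops):
          """hops: list of (hop_id, qlen, tx_rate, capacity) tuples
          collected from the forward packet (e.g. IOAM trace data)."""
          # Pick the hop with the highest link utilization.
          bottleneck = max(hops, key=lambda h: h[2] / h[3])
          return struct.pack(FEEDBACK_FMT, *bottleneck)

      def parse_feedback(data):
          return struct.unpack(FEEDBACK_FMT, data)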

   Issues to be addressed:

   o  How to add the metadata to the forward stream (IOAM is a valid
      option since we are interested in a single DC domain).  The
      encapsulations for both IPv4 and IPv6 should be considered.

   o  Negotiation of the capabilities of different nodes.

   o  The format of the network information feedback to the sender in
      the case of sender-driven mechanisms.

   o  The semantics of the message (notification or proactive).

   o  Investigation of the extra load on the network device for adding
      the metadata.

5.2.  Notify the reaction point as soon as possible

   In this direction, it is worth investigating whether it is possible
   for the middle nodes to notify the sender directly (like IOAM
   Postcards) about network conditions, but such a method is
   challenging in terms of addressing security issues, and the first
   concern is that it could serve as a tool for a DoS attack.  Other
   ways, for example carrying the information in the reverse traffic,
   would be an alternative as long as reverse traffic exists.

   Issues to be addressed:

   o  How to deal with multiple congestion points?

   o  How to identify support by the sender and receiver for this mode
      and support legacy systems (same as previous mode).

   o  How to authenticate the validity of the data.

   o  Hardware implications.

6.  Security Considerations

   TBD

7.  IANA Considerations

   No IANA action

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

8.2.  Informative References

   [CongestionManagment]
              "Understanding RoCEv2 Congestion Management", December
              2018, <https://community.mellanox.com/s/article/
              understanding-rocev2-congestion-management>.

   [DCQCN]    Zhu, Y., Eran, H., Firestone, D., Guo, C., Lipshteyn, M.,
              Liron, Y., Padhye, J., Raindel, S., Yahia, M. H., and M.
              Zhang, "Congestion Control for Large-Scale RDMA
              Deployments", ACM SIGCOMM Computer Communication Review,
              Vol. 45, pp. 523-536, August 2015,
              <https://conferences.sigcomm.org/sigcomm/2015/pdf/papers/
              p523.pdf>.

   [HPCC]     Li, Y., Miao, R., Liu, H. H., Zhuang, Y., Feng, F., Tang,
              L., Cao, Z., Zhang, M., Kelly, F., Alizadeh, M., and M.
              Yu, "HPCC: High Precision Congestion Control", August
              2019,
              <https://liyuliang001.github.io/publications/hpcc.pdf>.

   [I-D.heist-tsvwg-sce-one-and-two-flow-tests]
              Heist, P., Grimes, R., and J. Morton, "Some Congestion
              Experienced One and Two-Flow Tests", draft-heist-tsvwg-
              sce-one-and-two-flow-tests-00 (work in progress), July
              2019.

   [I-D.herbert-ipv4-eh]
              Herbert, T., "IPv4 Extension Headers and Flow Label",
              draft-herbert-ipv4-eh-01 (work in progress), May 2019.

   [I-D.ietf-ippm-ioam-data]
              Brockners, F., Bhandari, S., Pignataro, C., Gredler, H.,
              Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov,
              P., Chang, R., daniel.bernier@bell.ca, d., and J. Lemon,
              "Data Fields for In-situ OAM", draft-ietf-ippm-ioam-
              data-07 (work in progress), September 2019.

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-23 (work
              in progress), September 2019.

   [I-D.ietf-tcpm-accurate-ecn]
              Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More
              Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate-
              ecn-09 (work in progress), July 2019.

   [I-D.ietf-tsvwg-aqm-dualq-coupled]
              Schepper, K., Briscoe, B., and G. White, "DualQ Coupled
              AQMs for Low Latency, Low Loss and Scalable Throughput
              (L4S)", draft-ietf-tsvwg-aqm-dualq-coupled-10 (work in
              progress), July 2019.

   [I-D.ietf-tsvwg-ecn-l4s-id]
              Schepper, K. and B. Briscoe, "Identifying Modified
              Explicit Congestion Notification (ECN) Semantics for
              Ultra-Low Queuing Delay (L4S)", draft-ietf-tsvwg-ecn-l4s-
              id-07 (work in progress), July 2019.

   [I-D.ietf-tsvwg-l4s-arch]
              Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low
              Latency, Low Loss, Scalable Throughput (L4S) Internet
              Service: Architecture", draft-ietf-tsvwg-l4s-arch-04 (work
              in progress), July 2019.

   [I-D.morton-taht-tsvwg-sce]
              Morton, J. and D. Taht, "The Some Congestion Experienced
              ECN Codepoint", draft-morton-taht-tsvwg-sce-00 (work in
              progress), March 2019.

   [IEEE.802.1QBB_2011]
              IEEE, "IEEE Standard for Local and metropolitan area
              networks--Media Access Control (MAC) Bridges and Virtual
              Bridged Local Area Networks--Amendment 17: Priority-based
              Flow Control", IEEE 802.1Qbb-2011,
              DOI 10.1109/ieeestd.2011.6032693, September 2011,
              <http://ieeexplore.ieee.org/servlet/
              opac?punumber=6032691>.

   [QCN]      Alizadeh, M., Atikoglu, B., Kabbani, A., Lakshmikantha,
              A., Pan, R., Prabhakar, B., and M. Seaman, "Data Center
              Transport Mechanisms: Congestion Control Theory and IEEE
              Standardization", September 2008,
              <https://web.stanford.edu/~balaji/papers/QCN.pdf>.

   [RCP]      Dukkipati, N., "Rate Control Protocol (RCP): Congestion
              Control to Make Flows Complete Quickly", October 2007,
              <http://yuba.stanford.edu/~nanditad/thesis-NanditaD.pdf>.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <https://www.rfc-editor.org/info/rfc3168>.

   [RFC3649]  Floyd, S., "HighSpeed TCP for Large Congestion Windows",
              RFC 3649, DOI 10.17487/RFC3649, December 2003,
              <https://www.rfc-editor.org/info/rfc3649>.

   [RFC4782]  Floyd, S., Allman, M., Jain, A., and P. Sarolahti, "Quick-
              Start for TCP and IP", RFC 4782, DOI 10.17487/RFC4782,
              January 2007, <https://www.rfc-editor.org/info/rfc4782>.

   [RFC5040]  Recio, R., Metzler, B., Culley, P., Hilland, J., and D.
              Garcia, "A Remote Direct Memory Access Protocol
              Specification", RFC 5040, DOI 10.17487/RFC5040, October
              2007, <https://www.rfc-editor.org/info/rfc5040>.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
              <https://www.rfc-editor.org/info/rfc5681>.

   [RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
              and K. Carlberg, "Explicit Congestion Notification (ECN)
              for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August
              2012, <https://www.rfc-editor.org/info/rfc6679>.

   [RFC8033]  Pan, R., Natarajan, P., Baker, F., and G. White,
              "Proportional Integral Controller Enhanced (PIE): A
              Lightweight Control Scheme to Address the Bufferbloat
              Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017,
              <https://www.rfc-editor.org/info/rfc8033>.

   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
              March 2017, <https://www.rfc-editor.org/info/rfc8085>.

   [RFC8257]  Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L.,
              and G. Judd, "Data Center TCP (DCTCP): TCP Congestion
              Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257,
              October 2017, <https://www.rfc-editor.org/info/rfc8257>.

   [RFC8290]  Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys,
              J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler
              and Active Queue Management Algorithm", RFC 8290,
              DOI 10.17487/RFC8290, January 2018,
              <https://www.rfc-editor.org/info/rfc8290>.

   [RFC8298]  Johansson, I. and Z. Sarker, "Self-Clocked Rate Adaptation
              for Multimedia", RFC 8298, DOI 10.17487/RFC8298, December
              2017, <https://www.rfc-editor.org/info/rfc8298>.

   [RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
              R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
              RFC 8312, DOI 10.17487/RFC8312, February 2018,
              <https://www.rfc-editor.org/info/rfc8312>.

   [RoCEv2]   "Infiniband Trade Association. Supplement to InfiniBand
              architecture specification volume 1 release 1.2.2 annex
              A17: RoCEv2 (IP routable RoCE).",
              <https://cw.infinibandta.org/document/dl/7781>.


Authors' Addresses

   Roni Even
   Huawei

   Email: roni.even@huawei.com


   Rachel Huang
   Huawei Technologies Co., Ltd.

   Email: rachel.huang@huawei.com
