TSVWG R. Even Internet-Draft Huawei Intended status: Informational R. Huang Expires: April 25, 2020 Huawei Technologies Co., Ltd. October 23, 2019 Data Center Fast Congestion Management draft-even-iccrg-dc-fast-congestion-00 Abstract Fast congestion control is discussed in academic papers as well as in the different standard bodies. There is no one proposal for providing a solution that will work for all use cases leading to multiple approaches. By congestion control we refer to an end to end solution and not only to the congestion control algorithm on the sender side. This document describes the current state of flow control and congestion for Data Centers and proposes future directions. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on April 25, 2020. Copyright Notice Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect Even & Huang Expires April 25, 2020 [Page 1] Internet-Draft DC Fast Congestion October 2019 to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Conventions . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . 3 4. Alternative Congestion Management mechanisms . . . . . . . . 4 4.1. Mechanisms based on estimation of network status . . . . 4 4.2. Network provides limited information . . . . . . . . . . 4 4.2.1. ECN and DCTCP . . . . . . . . . . . . . . . . . . . . 5 4.2.2. DCQCN . . . . . . . . . . . . . . . . . . . . . . . . 5 4.2.3. SCE - Some Congestion Experienced . . . . . . . . . . 6 4.2.4. L4S - Low Latency, Low Loss, Scalable Throughput . . 7 4.3. Network provides more information . . . . . . . . . . . . 8 4.4. Network provides proactive control . . . . . . . . . . . 9 5. Summary and Proposal . . . . . . . . . . . . . . . . . . . . 9 5.1. Reflect the network status more accurately . . . . . . . 10 5.2. Notify the reaction point as soon as possible. . . . . . 10 6. Security Considerations . . . . . . . . . . . . . . . . . . . 11 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 8.1. Normative References . . . . . . . . . . . . . . . . . . 11 8.2. Informative References . . . . . . . . . . . . . . . . . 11 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 15 1. Introduction Fast congestion control is discussed in academic papers as well as in the different standard bodies. There is no one proposal for providing a solution that will work for all use cases leading to multiple approaches. By congestion control we refer to an end to end solution and not only to the congestion control algorithm on the sender side. The major use case that we are looking at is congestion control for Data Centers, a controlled environment[RFC8085]. With the emerging Distributed Storage, AI/HPC (High Performance Computing), Machine Learning, etc., modern datacenter applications demand high throughput(40Gbps and above) with ultra-low latency of less than 10 microsecond per hop from the network, with low CPU overhead. For the end to end the latency should be less than 50usec, this value is based on DCQCN [DCQCN] The high link speed (>40Gb/s) in Data Centers (DC) are making network transfers complete faster and in fewer RTTs. Network traffic in a data center is often a mix of short and long Even & Huang Expires April 25, 2020 [Page 2] Internet-Draft DC Fast Congestion October 2019 flows, where the short flows require low latencies and the long flows require high throughputs. On IP-routed datacenter networks, RDMA is deployed using RoCEv2 [RoCEv2] protocol or iWARP [RFC5040] RoCEv2 [RoCEv2] is a straightforward extension of the RoCE protocol that involves a simple modification of the RoCE packet format. RoCEv2 packets carry an IP header which allows traversal of IP L3 Routers and a UDP header that serves as a stateless encapsulation layer for the RDMA Transport Protocol Packets over IP. For Data Centers RDMA in ROCEv2 expect a lossless fabric and this is achieved using ECN and PFC. iWARP congestion control is based on TCP congestion control (DCTCP [RFC8257]) A good congestion control for data centers should provide low latency, fast convergence and high link utilization. Since multiple applications with different requirements may run on the DC network it is important to provide fairness between different applications that may use different congestion algorithms. An important issue from the user perspective is to achieve short Flow Completion Time (FCT). This document investigates the current congestion control proposals, and discusses future data center congestion control directions which aims to achieve high performance and collaboration. 2. Conventions The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. 3. Abbreviations RCM - RoCEv2 Congestion Management PFC - Priority-based Flow Control ECN - Explicit Congestion Notification DCQCN - Data Center Quantized Congestion Notification AI/HPC - Artificial Intelligence/High-Performance computing ECMP - Equal-Cost Multipath NIC - Network Interface Card Even & Huang Expires April 25, 2020 [Page 3] Internet-Draft DC Fast Congestion October 2019 RED - Random early detection gateways for congestion avoidance 4. Alternative Congestion Management mechanisms This section will describe alternative directions based on current work. Looking at the alternatives from the network perspective we can classify the alternatives as: 1. Based on estimation of network status: Traditional TCP, Timely. 2. Network provides limited information: DCQCN using only ECN, SCE and L4S 3. Network provides some information: HPCC. 4. Network provides proactive control: RCP (Rate Control Protocol) Note that any research on congestion control that requires network participation will be irrelevant if we cannot find a viable deployment path where only part of the network devices support the proposed congestion control. 4.1. Mechanisms based on estimation of network status Traditional mechanisms uses packet status as the congestion signal and feedback to the sender, e.g. loss or delay, which is based on the facts that packets will drop when a buffer is full and packets will be delayed when a queue is building up. It can simply be achieved by the interactions between the sender and the receiver, without the involvement of network. It works well on the internet for a very long time, especially for best effort applications that do not have specific performance requirements. However, these mechanism are not optimized for some data center application because the convergence time and throughput are not good enough. Mainly because endpoints estimation of network status are not accurate enough, and these mechanisms lack further information to adjust the sender behaviors. 4.2. Network provides limited information In these mechanisms, the network utilize the ECN field of IP header to provide some hints on network status. The following sections describe some typical proposals. Even & Huang Expires April 25, 2020 [Page 4] Internet-Draft DC Fast Congestion October 2019 4.2.1. ECN and DCTCP The Internet solutions use ECN [RFC3168] for marking the state of the queues in the network device, they may use some AQM mechanism (fq_coDel [RFC8290] ], PIE [RFC8033]) in the network devices and a congestion algorithm (New Reno [RFC5681], Cubic [RFC8312] or DCTCP[RFC8257]) on the sender side to address the congestion in the network. Note that ECN is signaled earlier than packet drop but may cause earlier exit from TCP slow start. One of the problem for TCP is that ECN is specified for TCP in such a way that only one feedback signal can be transmitted per Round-Trip Time (RTT). [I-D.ietf-tcpm-accurate-ecn] specifies an alternative feedback scheme that provides more accurate information that can be used by DCTCP and L4S. Traditional TCP uses ECN signal to indicate congestion experienced instead of packet loss, however, it does not provide information about the degree of the congestion. DCTCP [RFC8257] is trying to solve this issue. It estimates the fraction of bytes that encounter congestion rather than simply detecting the congestion presence. DCTCP further scales its sending rates accordingly. DCTCP is widely implemented in current data center environments. 4.2.2. DCQCN An enhancement to the congestion handling for ROCEv2 is the Congestion Control for Large-Scale RDMA Deployments [DCQCN] providing similar functionality to QCN [QCN] and DCTCP [RFC8257], it is implemented in some of the ROCEv2 NICs but is not part of the ROCEv2 specification. As such, vendors have their own implementations which make it difficult to interoperate with each other efficiently. DCQCN tests are assuming that the Congestion Point is using RED-ECN for ECN marking and the RDMA CNP message is used by the Notification Point (the receiver) to report ECN Congestion Experienced (CE). DCQCN as presented includes parameters that should be set. It provides the parameters that were used during the specific tests using Mellanox NICs. One of the comments about DCQCN is that it is not simple to define the parameters in order to get an optimized solution. This solution is specific to ROCEv2 and addresses only the congestion control algorithm and is implemented in the NIC. DCQCN notification is using CNP that only report that at least one packet with CE marking was received in the last 50usec; this is similar to TCP reporting. Other UDP based transports like RTP and QUIC provides information about how many packets marked with CE, ECT(0,1) were received. Even & Huang Expires April 25, 2020 [Page 5] Internet-Draft DC Fast Congestion October 2019 4.2.3. SCE - Some Congestion Experienced [I-D.morton-taht-tsvwg-sce] ECT(1) to be an early notification of congestion on ECT(0) marked packets, which can be used by AQM algorithms and transports as an earlier signal of congestion than CE ("Congestion Experienced"). The ECN specification say that the congestion algorithm should treat CE marks the same as a drop packets. Using ECT(1) to signal SCE permits middleboxes implementing AQM to signal incipient congestion, below the threshold required to justify setting CE. Existing [RFC3168] compliant receivers MUST transparently ignore this new signal with respect to congestion control, and both existing and SCE- aware middleboxes MAY convert SCE to CE in the same circumstances as for ECT, thus ensuring backwards compatibility with ECN [RFC3168] endpoints. This solution is using ECT(1) which was defined in ECN [RFC3168] as a one bit Nonce but this use is obsoleted in RFC8311 and SCE is using it for the SCE mark. There may be other documents trying to use this bit for example L4S use it to signal L4S support. The SCE marking are done by the AQM algorithm (RED, CODEL) and are sent back to the sender by the transport so there may be a need to add support for conveying the SCE marking to the sender (QUIC for example already has support for reporting the count of ECT(0) and ECT(1) separately). This solution is simpler than HPCC but provide less information. [I-D.heist-tsvwg-sce-one-and-two-flow-tests] presents one and two- flow test results for the SCE reference implementation. These tests are not intended to be a comprehensive real-world evaluation of SCE, but an illustration of SCE's influence on basic TCP metrics in a controlled environment. The goal of the one-flow tests is to analyze the impact of SCE on the TCP throughput and TCP RTT of single TCP flows across a range of simulated path bandwidths and RTTs. The tests were with RENO and DCCP. Even though using SCE gave in general better results there were significant under-utilization at low bandwidths ( <10Mb/sec; <25Mb/sec) and a slight increase in TCP RTT for DCTCP-SCE at 100Mbit / 160ms and a slight increase in TCP RTT for SCE RENO at high BDPs. The document does not describe the congestion algorithm that was used for DCTCP-SCE or RENO-SCE and comment that further work need to be done to understand the reason for this behvior. The goal of the two-flow tests is to measure fairness between and among SCE and non-SCE TCP flows, through either a single queue or with fair queuing. Even & Huang Expires April 25, 2020 [Page 6] Internet-Draft DC Fast Congestion October 2019 The initial results show that SCE enabled flows back off in the face of competition, whereas non-SCE flows fill the queue until a drop or CE mark occurs so fairness is not achieved. By changing the ramp by which SCE is marked and marking SCE when closer to drop or CE the fairness is better. 4.2.4. L4S - Low Latency, Low Loss, Scalable Throughput There are three main components to the L4S architecture [I-D.ietf-tsvwg-l4s-arch] 1. Network: L4S traffic needs to be isolated from the queuing latency of Classic traffic. However, the two should be able to freely share a common pool of capacity. This is because there is no way to predict how many flows at any one time might use each service and capacity in access networks is too scarce to partition into two. The Dual Queue Coupled AQM [I-D.ietf-tsvwg-aqm-dualq-coupled] was developed as a minimal complexity solution to this problem. The two queues appear to be separated by a 'semi-permeable' membrane that partitions latency but not bandwidth. Per-flow queuing such as in [RFC8290] could be used but it partitions both latency and bandwidth between every end-to-end flow. So it is rather overkill, which brings disadvantages, not least that large number of queues are needed when two are sufficient. 2. Protocol: A host needs to distinguish L4S and Classic packets with an identifier so that the network can classify them into their separate treatments. [I-D.ietf-tsvwg-ecn-l4s-id] considers various alternative identifiers, and concludes that all alternatives involve compromises, but the ECT(1) and CE codepoints of the ECN field represent a workable solution. 3. Host: Scalable congestion controls already exist. They solve the scaling problem with TCP that was first pointed out in [RFC3649]. The one used most widely (in controlled environments) is Data Center TCP (DCTCP [RFC8257]). Although DCTCP as-is 'works' well over the public Internet, most implementations lack certain safety features that will be necessary once it is used outside controlled environments like data centers. A similar scalable congestion control will also need to be transplanted into protocols other than TCP (QUIC, SCTP, RTP/RTCP, RMCAT, etc.) Indeed, between the present document being drafted and published, the following scalable congestion controls were implemented: TCP Prague, QUIC Prague and an L4S variant of the RMCAT SCReAM controller [RFC8298]. Even & Huang Expires April 25, 2020 [Page 7] Internet-Draft DC Fast Congestion October 2019 Using Dual Queue provides better fairness between DCTCP and Reno/ Cubic . This is less relevant to Data Centers where the competing streams may use DCQN and DCTCP. 4.3. Network provides more information The new-generation high-speed cloud network congestion control protocol HPCC (High Precision Congestion Control) [HPCC], aiming to achieve the ultimate performance and high stability of the high-speed cloud network at the same time. HPCC has been presented at ACM SIGCOMM 2019. The key design choice of HPCC is to rely on switches to provide fine- grained load information, such as queue size and accumulated tx/rx traffic to compute precise flow rates. This has two major benefits: (i) HPCC can quickly converge to proper flow rates to highly utilize bandwidth while avoiding congestion; and (ii) HPCC can consistently maintain a close-to-zero queue for low latency. HPCC is a sender-driven CC framework. Each packet a sender sends will be acknowledged by the receiver. During the propagation of the packet from the sender to the receiver, each switch along the path leverages the INT feature of its switching ASIC to insert some meta- data that reports the current load of the packet's egress port, including timestamp (ts), queue length (qLen), transmitted bytes (txBytes), and the link bandwidth capacity (B). When the receiver gets the packet, it copies all the meta-data recorded by the switches to the ACK message it sends back to the sender. The sender decides how to adjust its flow rate each time it receives an ACK with network load information. Current IETF activity in IOAM [I-D.ietf-ippm-ioam-data] provides a standard mechanism for inserting metadata by the switches in the middle. IOAM can provides an optional method for sending the metadata feedback by the network to the endpoints on congestion status. But to using IOAM, the following points should be considered: 1. Is the current IOAM data fields sufficient for congestion control. 2. The encapsulation of IOAM in data center for congestion control. 3. The feedback format for sender driven congestion control. The HPCC framework requires each node in the middle to add information about its state to the forward going packet until it reaches the receiver who will send the acknowledgment. We can think Even & Huang Expires April 25, 2020 [Page 8] Internet-Draft DC Fast Congestion October 2019 of others modes like having the nodes in the middle updating the status information based on its available resources. This solution requires support for INT or IOAM, both protocols need to specify the packet format with the INT/IOAM extension. The HPCC document specify how to implement it for ROCEv2 while for IOAM there are some drafts in IPPM WG describing how to implement it for different transports and layer 2 packets. The conclusion from the trials done were that HPCC can be a next- generation CC for high-speed networks to achieve ultra-low latency, high bandwidth, and stability simultaneously. HPCC achieves fast convergence, small queues, and fairness by leveraging precise load information from INT. Similar mechanism is defined in Quick Start for TCP and IP[RFC4782]. There is a difference with the starting rate. While HPCC starts at maximum line speed [RFC4782] starts at a rate as specified in the Quick-Start request message. The Quick Start is specified for TCP, if other transport (UDP) is used there is a need to specify how the receiver send the Quick-Start response message. 4.4. Network provides proactive control The typical algorithm in this category is RCP (Rate Control Protocol) [RCP]. In the basic RCP algorithm , a router maintains a single rate, R(t), for every link. The router "stamps" R(t) on every passing packet (unless it already carries a slower value). The receiver sends the value back to the sender, thus informing it about the slowest (or bottleneck) rate along the path. In this way, the sender quickly finds out the rate it should be using (without the need for Slow-Start). The router updates R(t) approximately once per roundtrip time, and strives to emulate Processor Sharing among flows. The biggest plus of RCP is the short flow completion times under a wide range of network and traffic characteristics. The downside of RCP is that RCP involves the routers in congestion control, so it needs help from the infrastructure. Although they are simple, it does have per-packet computations. Another downside is that although the RCP algorithm strives to keep the buffer occupancy low most times, there are no guarantees of buffers not overflowing or of a zero packet loss. 5. Summary and Proposal Congestion control is all about how to utilize the network resource in a better and reasonably way under different network conditions. Senders are the reaction points that consume network resource, and network nodes are the congestion points. Ideally, reaction points Even & Huang Expires April 25, 2020 [Page 9] Internet-Draft DC Fast Congestion October 2019 should react as soon as possible when network statuses change. To achieve that, there are two directions: 5.1. Reflect the network status more accurately In order to provide more information than just ECN CE marking there is a need to standardize a mechanism for the network device to provide such information and for the receiver to send more information to the sender. The network device should not insert any new fields to the IP packet but should be able to modify the value of fields in the packets sent from the data sender. The network device will update the metadata in the forward going packet to provide more information than a single CE mark or SCE like solution. The receiver will analyze the metadata and report back to the sender. Different from the Internet, data center network can benefit more from having more accurate information to achieve better congestion control. And this means network and hosts must collaborate together to achieve it. Issues to be addressed: o How to add the metadata to the forward stream (IOAM is a valid option since we are interested in a single DC domain). The encapsulations for both IPv4 and IPv6 should be considered. o Negotiation of the capabilities of different nodes. o The format of the network information feedback to the sender in the case of sender-driven mechanisms. o The semantics of the message (notification or proactive) o Investigation of the extra load on the network device for adding the metadata. 5.2. Notify the reaction point as soon as possible. In this direction, it is worth to investigate if it's possible for the middle nodes to notify the sender directly (like IOAM Postcards) on network conditions, but such a method is challenging in terms of addressing security issues and the first concern will be that this can serve as a tool for DOS attack. But other ways, for example, carry the information in the reverse traffic would be an alternative as long as reverse traffic exists. Even & Huang Expires April 25, 2020 [Page 10] Internet-Draft DC Fast Congestion October 2019 Issues to be addressed: o How to deal with multiple congestion points? o How to identify support by the sender and receiver for this mode and support legacy systems (same as previous mode). o How to authenticate the validity of the data. o Hardware implications 6. Security Considerations TBD 7. IANA Considerations No IANA action 8. References 8.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, . 8.2. Informative References [CongestionManagment] "Understanding RoCEv2 Congestion Management", 12 2018, . [DCQCN] Zhu, Y., Eran, H., Firestone, D., Guo, C., Lipshteyn, M., Liron, Y., Padhye, J., Raindel, S., Yahia, M. H., and M. Zhang, "Congestion control for large-scale RDMA deployments. In ACM SIGCOMM Computer Communication Review, Vol. 45. ACM, 523-536.", 8 2015, . Even & Huang Expires April 25, 2020 [Page 11] Internet-Draft DC Fast Congestion October 2019 [HPCC] Li, Y., Miao, R., Liur, H. H., Zhuang, Y., Feng, F., Tang, L., Cao, Z., Zhang, M., Kelly, F., Alizadeh, M., and M. Yu, "HPCC: High Precision Congestion Control", 8 2019, . [I-D.heist-tsvwg-sce-one-and-two-flow-tests] Heist, P., Grimes, R., and J. Morton, "Some Congestion Experienced One and Two-Flow Tests", draft-heist-tsvwg- sce-one-and-two-flow-tests-00 (work in progress), July 2019. [I-D.herbert-ipv4-eh] Herbert, T., "IPv4 Extension Headers and Flow Label", draft-herbert-ipv4-eh-01 (work in progress), May 2019. [I-D.ietf-ippm-ioam-data] Brockners, F., Bhandari, S., Pignataro, C., Gredler, H., Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov, P., Chang, R., daniel.bernier@bell.ca, d., and J. Lemon, "Data Fields for In-situ OAM", draft-ietf-ippm-ioam- data-07 (work in progress), September 2019. [I-D.ietf-quic-transport] Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed and Secure Transport", draft-ietf-quic-transport-23 (work in progress), September 2019. [I-D.ietf-tcpm-accurate-ecn] Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate- ecn-09 (work in progress), July 2019. [I-D.ietf-tsvwg-aqm-dualq-coupled] Schepper, K., Briscoe, B., and G. White, "DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput (L4S)", draft-ietf-tsvwg-aqm-dualq-coupled-10 (work in progress), July 2019. [I-D.ietf-tsvwg-ecn-l4s-id] Schepper, K. and B. Briscoe, "Identifying Modified Explicit Congestion Notification (ECN) Semantics for Ultra-Low Queuing Delay (L4S)", draft-ietf-tsvwg-ecn-l4s- id-07 (work in progress), July 2019. Even & Huang Expires April 25, 2020 [Page 12] Internet-Draft DC Fast Congestion October 2019 [I-D.ietf-tsvwg-l4s-arch] Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low Latency, Low Loss, Scalable Throughput (L4S) Internet Service: Architecture", draft-ietf-tsvwg-l4s-arch-04 (work in progress), July 2019. [I-D.morton-taht-tsvwg-sce] Morton, J. and D. Taht, "The Some Congestion Experienced ECN Codepoint", draft-morton-taht-tsvwg-sce-00 (work in progress), March 2019. [IEEE.802.1QBB_2011] IEEE, "IEEE Standard for Local and metropolitan area networks--Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks--Amendment 17: Priority-based Flow Control", IEEE 802.1Qbb-2011, DOI 10.1109/ieeestd.2011.6032693, September 2011, . [QCN] Alizadeh, M., Atikoglu, B., Kabbani, A., Lakshmikantha, A., Pan, R., Prabhakar, B., and M. Seaman, "Data Center Transport Mechanisms:Congestion Control Theory and IEEE Standardization", 9 2008, . [RCP] Dukkipati, N., "RATE CONTROL PROTOCOL (RCP): CONGESTION CONTROL TO MAKE FLOWS COMPLETE QUICKLY", 10 2007, . [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001, . [RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", RFC 3649, DOI 10.17487/RFC3649, December 2003, . [RFC4782] Floyd, S., Allman, M., Jain, A., and P. Sarolahti, "Quick- Start for TCP and IP", RFC 4782, DOI 10.17487/RFC4782, January 2007, . [RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D. Garcia, "A Remote Direct Memory Access Protocol Specification", RFC 5040, DOI 10.17487/RFC5040, October 2007, . Even & Huang Expires April 25, 2020 [Page 13] Internet-Draft DC Fast Congestion October 2019 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, . [RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., and K. Carlberg, "Explicit Congestion Notification (ECN) for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August 2012, . [RFC8033] Pan, R., Natarajan, P., Baker, F., and G. White, "Proportional Integral Controller Enhanced (PIE): A Lightweight Control Scheme to Address the Bufferbloat Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017, . [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, March 2017, . [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., and G. Judd, "Data Center TCP (DCTCP): TCP Congestion Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, October 2017, . [RFC8290] Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler and Active Queue Management Algorithm", RFC 8290, DOI 10.17487/RFC8290, January 2018, . [RFC8298] Johansson, I. and Z. Sarker, "Self-Clocked Rate Adaptation for Multimedia", RFC 8298, DOI 10.17487/RFC8298, December 2017, . [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", RFC 8312, DOI 10.17487/RFC8312, February 2018, . [RoCEv2] "Infiniband Trade Association. Supplement to InfiniBand architecture specification volume 1 release 1.2.2 annex A17: RoCEv2 (IP routable RoCE).", . Even & Huang Expires April 25, 2020 [Page 14] Internet-Draft DC Fast Congestion October 2019 Authors' Addresses Roni Even Huawei Email: roni.even@huawei.com Rachel Huang Huawei Technologies Co., Ltd. Email: rachel.huang@huawei.com Even & Huang Expires April 25, 2020 [Page 15]