Internet Engineering Task Force                             G. Fairhurst
Internet-Draft                                    University of Aberdeen
Intended status: Standards Track                        November 3, 2019
Expires: May 6, 2020

        Guidelines for Internet Congestion Control at Endpoints
                      draft-fairhurst-tsvwg-cc-04

Abstract

This document provides guidance on the design of methods to avoid congestion collapse and to provide congestion control. Recommendations and requirements on this topic are distributed across many documents in the RFC series. This document therefore seeks to gather and consolidate these recommendations and provide overall guidance. It is intended to provide input to the design of new congestion control methods in protocols, such as the IETF Quick UDP Internet Connections (QUIC) transport.

The present document is for discussion and comment by the IETF. If published, it is intended to update or replace the Best Current Practice in BCP 41, which currently includes "Congestion Control Principles" provided in RFC2914.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on May 6, 2020.

Copyright Notice

Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.
Fairhurst                  Expires May 6, 2020                 [Page 1]
Internet-Draft                CC Guidelines               November 2019

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
  1.1. Best Current Practice in the RFC-Series
2. Terminology
3. Principles of Congestion Control
  3.1. A Diversity of Path Characteristics
  3.2. Flow Multiplexing and Congestion
  3.3. Avoiding Congestion Collapse and Flow Starvation
4. Guidelines for Performing Congestion Control
  4.1. Connection Initialization
  4.2. Using Path Capacity
  4.3. Timers and Retransmission
  4.4. Responding to Potential Congestion
  4.5. Using More Capacity
  4.6. Network Signals
  4.7. Protection of Protocol Mechanisms
5. IETF Guidelines on Evaluation of Congestion Control
6. Acknowledgements
7. IANA Considerations
8. Security Considerations
9. References
  9.1. Normative References
  9.2. Informative References
Appendix A. Revision Notes
Author's Address

1. Introduction

The IETF has specified Internet transports (e.g., TCP [I-D.ietf-tcpm-rfc793bis], UDP [RFC0768], UDP-Lite [RFC3828], SCTP [RFC4960], and DCCP [RFC4340]) as well as protocols layered on top of these transports (e.g., RTP [RFC3550], QUIC [I-D.ietf-quic-transport], SCTP/UDP [RFC6951], DCCP/UDP [RFC6773]) and transports that work directly over the IP network layer. These transports are implemented in endpoints (either Internet hosts or routers acting as endpoints), and are designed to detect and react to network congestion. TCP was the first transport to provide this, although the TCP specification in RFC 793 predates the inclusion of congestion control and did not contain any discussion of using or managing a congestion window.

Recommendations and requirements on this topic are distributed across many documents in the RFC series. This document therefore seeks to gather and consolidate these recommendations and provide overall guidelines. It is intended to provide input to the design of congestion control methods that are implemented in protocols.

The focus of the present document is upon unicast point-to-point transports; this includes migration from using one path to another path. Some recommendations [RFC5783] and requirements in this document apply to point-to-multipoint transports (e.g., multicast); however, this topic extends beyond the current document's scope. [RFC2914] provides additional guidance on the use of multicast.

1.1. Best Current Practice in the RFC-Series

Like RFC2119, this document borrows heavily from earlier publications addressing the need for end-to-end congestion control, and this subsection provides an overview of key topics.

[RFC2914] provides a general discussion of the principles of congestion control. Section 3 discusses fairness, stating "The equitable sharing of bandwidth among flows depends on the fact that all flows are running compatible congestion control algorithms". Section 3.1 describes preventing congestion collapse.

Congestion collapse was first reported in the mid-1980s [RFC0896], and at that time was largely due to TCP connections unnecessarily retransmitting packets that were either in transit or had already been received at the receiver. We call the congestion collapse that results from the unnecessary retransmission of packets classical congestion collapse. Classical congestion collapse is a stable condition that can result in throughput that is a small fraction of normal [RFC0896]. Problems with classical congestion collapse have generally been corrected by improvements to timer and congestion control mechanisms, implemented in modern implementations of TCP [Jacobson88]. This classical congestion collapse was a key focus of [RFC2309].

A second form of congestion collapse occurs due to undelivered packets, where Section 5 of [RFC2914] notes:

"Congestion collapse from undelivered packets arises when bandwidth is wasted by delivering packets through the network that are dropped before reaching their ultimate destination. This is probably the largest unresolved danger with respect to congestion collapse in the Internet today. Different scenarios can result in different degrees of congestion collapse, in terms of the fraction of the congested links' bandwidth used for productive work.
The danger of congestion collapse from undelivered packets is due primarily to the increasing deployment of open-loop applications not using end-to-end congestion control. Even more destructive would be best-effort applications that *increase* their sending rate in response to an increased packet drop rate (e.g., automatically using an increased level of FEC (Forward Error Correction))."

Section 3.3 of [RFC2914] notes:

"In addition to the prevention of congestion collapse and concerns about fairness, a third reason for a flow to use end-to-end congestion control can be to optimize its own performance regarding throughput, delay, and loss. In some circumstances, for example in environments with high statistical multiplexing, the delay and loss rate experienced by a flow are largely independent of its own sending rate. However, in environments with lower levels of statistical multiplexing or with per-flow scheduling, the delay and loss rate experienced by a flow is in part a function of the flow's own sending rate. Thus, a flow can use end-to-end congestion control to limit the delay or loss experienced by its own packets. We would note, however, that in an environment like the current best-effort Internet, concerns regarding congestion collapse and fairness with competing flows limit the range of congestion control behaviors available to a flow."

In addition to the prevention of congestion collapse and concerns about fairness, a flow using end-to-end congestion control can optimize its own performance regarding throughput, delay, and loss [RFC2914].

The standardization of congestion control in new transports can avoid a congestion control "arms race" among competing protocols [RFC2914]. That is, avoid designs of transports that could compete for Internet resources in a way that significantly reduces the ability of other flows to use the Internet. The use of standard methods is therefore encouraged.
The popularity of the Internet has led to a proliferation in the number of TCP implementations [RFC2914]. A variety of non-TCP transports have also been deployed. Some transport implementations fail to use standardised congestion avoidance mechanisms correctly because of poor implementation [RFC2525]. However, this is not the only reason for not using standard methods. Some transports have chosen mechanisms that are not presently standardised, or have adopted approaches to their design that differ from present standards. Guidance is needed therefore not only for future standardisation, but to ensure safe and appropriate evolution of transports that have not presently been submitted for standardisation.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

The path between endpoints (sometimes called "Internet Hosts" or source and destination nodes in IPv6) consists of the endpoint protocol stack at the sender and the receiver (which together implement the transport service), and a succession of links and network devices (routers or middleboxes) that provide connectivity across the network. The set of network devices forming the path is not usually fixed, and it should generally be assumed that this set can change over arbitrary lengths of time.

[RFC5783] defines congestion control as "the feedback-based adjustment of the rate at which data is sent into the network. Congestion control is an indispensable set of principles and mechanisms for maintaining the stability of the Internet." [RFC5783] also provides an informational snapshot taken by the IRTF's Internet Congestion Control Research Group (ICCRG) from October 2008.

Other terminology is directly copied from the cited RFCs.

3. Principles of Congestion Control

This section summarises the principles for providing congestion control, and provides the background for Section 4.

3.1. A Diversity of Path Characteristics

Internet transports can reserve capacity at routers or on the links being used, but most uses do not rely upon prior reservation of capacity along the path they use. In the absence of such a reservation, endpoints are unable to determine a safe rate at which to start or continue their transmission. The use of an Internet path therefore requires a combination of end-to-end transport mechanisms to detect and then respond to changes in the capacity that it discovers is available across the network path.

Buffering (an increase in latency) or congestion loss (discard of a packet) arises when the traffic arriving at a link or network exceeds the resources available. Loss can also occur for other reasons, but it is usually not possible for an endpoint to reliably disambiguate the cause of packet loss (e.g., loss could be due to link corruption, receiver overrun, etc. [RFC3819]). A network device that does not support Active Queue Management (AQM) [RFC7567] typically uses a drop-tail policy to drop excess IP packets when its queue(s) become full.

When a transport uses a path to send packets (i.e., a flow), this impacts any other Internet flows (possibly from or to other endpoints) that share the capacity of any common network device or link (i.e., are multiplexed) along the path. As with loss, latency can also be incurred for other reasons [RFC3819] (Quality of Service link scheduling, link radio resource management/bandwidth on demand, transient outages, link retransmission, and connection/resource setup below the IP layer, etc.).

When choosing an appropriate sending rate, packet loss needs to be considered.
Although losses are not always due to congestion, endpoint congestion control needs to conservatively react to loss as a potential signal of reduced available capacity and reduce the sending rate.

Many designs place the responsibility of rate adaptation at the sender (source) endpoint, utilising feedback information provided by the remote endpoint (receiver). Congestion control can also be implemented by determining an appropriate rate limit at the receiver and using this limit to control the maximum transport rate (e.g., using methods such as [RFC5348] and [RFC4828]).

Principles include:

o  A transport design is REQUIRED to be robust to a change in the set of devices forming the network path. A reconfiguration, reset or other event could interrupt this path or trigger a change in the set of network devices forming the path.

o  Transports are REQUIRED to operate safely over the wide range of path characteristics presented by Internet paths.

o  The path characteristics can change over relatively short intervals of time (i.e., characteristics discovered do not necessarily remain valid for multiple Round Trip Times, RTTs). In particular, the transport SHOULD measure and adapt to the characteristics of the path(s) being used.

3.2. Flow Multiplexing and Congestion

It is normal to observe some perturbation in latency and/or loss when flows share a common network bottleneck with other traffic. This impact needs to be considered and Internet flows ought to implement appropriate safeguards to avoid inappropriate impact on other flows that share the resources along a path. Congestion control methods satisfy this requirement and therefore can help avoid congestion collapse.

"This raises the issue of the appropriate granularity of a "flow", where we define a `flow' as the level of granularity appropriate for the application of both fairness and congestion control.
[RFC2309] states: "There are a few `natural' answers: 1) a TCP or UDP connection (source address/port, destination address/port); 2) a source/destination host pair; 3) a given source host or a given destination host. We would guess that the source/destination host pair gives the most appropriate granularity in many circumstances. The granularity of flows for congestion management is, at least in part, a policy question that needs to be addressed in the wider IETF community." [RFC2914]

Internet transports need to react to avoid congestion that impacts other flows sharing a path. The Requirements for Internet Hosts [RFC1122] formally mandates that endpoints perform congestion control. "Because congestion control is critical to the stable operation of the Internet, applications and other protocols that choose to use UDP as an Internet transport must employ mechanisms to prevent congestion collapse and to establish some degree of fairness with concurrent traffic [RFC2914]." Additional mechanisms are, in some cases, needed in the upper layer protocol for an application that sends datagrams (e.g., using UDP) [RFC8085].

Endpoints can send more than one flow. "The specific issue of a browser opening multiple connections to the same destination has been addressed by [RFC2616]. Section 8.1.4 states that 'Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy.'" [RFC2140]. This suggests that there are opportunities for transport connections between the same endpoints (from the same or differing applications) to share some information, including their congestion control state, if they are known to share the same path. [RFC8085] adds "An application that forks multiple worker processes or otherwise uses multiple sockets to generate UDP datagrams SHOULD perform congestion control over the aggregate traffic."
An endpoint can become aware of congestion by various means (including packet loss, see Section 3.1). A signal that indicates congestion on the end-to-end network path needs to result in a congestion control reaction by the transport to reduce the maximum rate permitted by the sending endpoint [RFC8087].

The general recommendation in the UDP Guidelines [RFC8085] is that applications SHOULD leverage existing congestion control techniques, such as those defined for TCP [RFC5681], TCP-Friendly Rate Control (TFRC) [RFC5348], SCTP [RFC4960], and other IETF-defined transports. This is because there are many trade-offs and details that can have a serious impact on the performance of congestion control for the application they support and other traffic that seeks to share the resources along the path over which they communicate.

Network devices can be configured to isolate the queuing of packets for different flows, or aggregates of flows, and thereby assist in reducing the impact of flow multiplexing on other flows. This could include methods seeking to equally distribute resources between sharing flows, but this is explicitly not a requirement for a network device [Flow-Rate-Fairness]. Endpoints cannot rely on the presence and correct configuration of these methods, and therefore even when a path is expected to support such methods, also need to employ methods that work end-to-end.

Experience has shown that successful protocols developed in a specific context or for a particular application tend to also become used in a wider range of contexts. Therefore, IETF specifications by default target deployment on the general Internet, or need to be defined for use only within a controlled environment.

Principles include:

o  Endpoints MUST perform congestion control [RFC1122] and SHOULD leverage existing congestion control techniques [RFC8085].
o  If an application or protocol chooses not to use a congestion-controlled transport protocol, it SHOULD control the rate at which it sends datagrams to a destination host, in order to fulfil the requirements of [RFC2914], as stated in [RFC8085].

o  Transports SHOULD control the aggregate traffic they send on a path [RFC8085]. They ought not to use multiple congestion-controlled flows between the same endpoints to gain a performance advantage.

o  Transports that do not target Internet deployment need to be constrained to only operate in a controlled environment (e.g., see Section 3.6 of [RFC8085]) and provide appropriate mechanisms to prevent traffic accidentally leaving the controlled environment [RFC8084].

o  Although network devices can be configured to reduce the impact of flow multiplexing on other flows, endpoints MUST NOT rely solely on the presence and correct configuration of these methods, except when constrained to operate in a controlled environment.

3.3. Avoiding Congestion Collapse and Flow Starvation

A significant pathology can arise when a poorly designed transport creates congestion. This can result in severe service degradation or "Internet meltdown". This phenomenon was first observed during the early growth phase of the Internet in the mid-1980s [RFC0896] [RFC0970]. It is technically called "Congestion Collapse". [RFC2914] notes that informally, "congestion collapse occurs when an increase in the network load results in a decrease in the useful work done by the network."

Transports need to be specifically designed with measures to avoid starving other flows of capacity (e.g., [RFC7567]). [RFC2309] also discussed the dangers of congestion-unresponsive flows, and states that "all UDP-based streaming applications should incorporate effective congestion avoidance mechanisms."
[RFC7567] and [RFC8085] both reaffirm this, encouraging development of methods to prevent starvation.

Principles include:

o  Transports MUST avoid inducing flow starvation to other flows that share resources along the path they use.

o  Endpoints MUST treat a loss of all feedback (e.g., expiry of a retransmission time out, RTO) as an indication of persistent congestion (i.e., an indication of potential congestion collapse).

o  When an endpoint detects persistent congestion, it MUST reduce the maximum rate (e.g., reduce its congestion window). This normally involves the use of protocol timers to detect a lack of acknowledgment for transmitted data (Section 4.3).

o  Protocol timers (e.g., used for retransmission or to detect persistent congestion) need to be appropriately initialised. A transport SHOULD adapt its protocol timers to follow the measured path Round Trip Time (RTT) (e.g., Section 3.1.1 of [RFC8085]).

o  A transport MUST employ exponential backoff each time persistent congestion is detected [RFC1122], until the path characteristics can again be confirmed.

o  Network devices MAY provide mechanisms to mitigate the impact of congestion collapse by transport flows (e.g., priority forwarding of control information, and starvation detection), and SHOULD mitigate the impact of non-conformant and malicious flows [RFC7567]. These mechanisms complement, but do not replace, the endpoint congestion avoidance mechanisms.

4. Guidelines for Performing Congestion Control

This section provides guidance for designers of a new transport protocol that decide to implement congestion control and its associated mechanisms.

The text draws on language used in the specifications of TCP and other IETF transports. For example, a protocol timer is generally needed to detect persistent congestion, and this document uses the term Retransmission Timeout (RTO) to refer to the operation of this timer.
Similarly, the document refers to a congestion window as the variable that controls the rate of transmission by the congestion controller. The use of these terms does not imply that endpoints need to implement functions in the way that TCP currently does. Each new transport needs to make its own design decisions about how to meet the recommendations and requirements for congestion control.

4.1. Connection Initialization

When a connection or flow to a new destination is established, the endpoints have little information about the characteristics of the network path they will use. This section describes how a flow starts transmission over such a path.

Flow Start: A new flow between two endpoints needs to initialise a congestion controller for the path it will use. It cannot assume that capacity is available at the start of the flow, unless it uses a mechanism to explicitly reserve capacity. In the absence of a capacity signal, a flow MUST therefore start slowly. The TCP slow-start algorithm is the accepted standard for flow startup [RFC5681]. TCP uses the notion of an Initial Window (IW) ([RFC3390], updated by [RFC6928]) to define the initial volume of data that can be sent on a path. This is not the smallest burst, or the smallest window, but it is considered a safe starting point for a path that is not suffering persistent congestion, and is applicable until feedback about the path is received. The initial sending rate (e.g., determined by the IW) needs to be viewed as tentative until the capacity is confirmed to be available.

Initial RTO Interval: When a flow sends the first packet, it typically has no way to know the actual RTT of the path it will use. An initial value needs to be used to initialise the principal retransmission timer, which will be used to detect lack of responsiveness from the remote endpoint. In TCP, this is the starting value of the RTO.
The selection of a safe initial value is a trade-off that has important consequences on the overall Internet stability [RFC6928] [RFC8085]. In the absence of any knowledge about the latency of a path (including the initial value), the RTO MUST be conservatively set to no less than 1 second. Values shorter than 1 second can be problematic (see the appendix of [RFC6298]). (Note: Linux TCP has deployed a smaller initial RTO value). [[Author note: It could be useful to discuss cached values]].

Initial RTO Expiry: If the RTO timer expires while awaiting completion of a connection setup, or handshake (e.g., the three-way handshake in TCP, the ACK of a SYN segment), and the implementation is using an RTO less than 3 seconds, the local endpoint can resend the connection setup. [[Author note: It would be useful to discuss how the timer is managed to protect from multiple handshake failure]]. The RTO MUST then be re-initialized to increase it to 3 seconds when data transmission begins (i.e., after the handshake completes) [RFC6298] [RFC8085]. This conservative increase is necessary to avoid congestion collapse when many flows retransmit across a shared bottleneck with restricted capacity.

Initial Measured RTO: Once an RTT measurement is available (e.g., through reception of an acknowledgement), the timeout value must be adjusted. This adjustment MUST take into account the RTT variance. For the first sample, this variance cannot be determined, and a local endpoint MUST therefore initialise the variance to RTT/2 (see equation 2.2 of [RFC6298] and related text for UDP in section 3.1.1 of [RFC8085]).

Current State: A congestion controller MAY assume that recently used capacity between a pair of endpoints is an indication of future capacity available in the next RTT between the same endpoints. It MUST react (reduce its rate) if this is not (later) confirmed to be true.
[[Author note: do we need to bound this]].

Cached State: A congestion controller that recently used a specific path could use additional state that lets a flow take over the capacity that was previously consumed by another flow (e.g., in the last RTT) which it understands is using the same path and will no longer use the capacity it recently used. In TCP, this mechanism is referred to as TCP Control Block (TCB) sharing [RFC2140] [I-D.ietf-tcpm-2140bis]. The capacity and other information can be used to suggest a faster initial sending rate, but this information MUST be viewed as tentative until the path capacity is confirmed by receiving a confirmation that actual traffic has been sent across the path (i.e., the new flow needs to either use or lose the capacity that has been tentatively offered to it). A sender MUST reduce its rate if this capacity is not confirmed within the current RTO interval.

4.2. Using Path Capacity

This section describes how a sender needs to regulate the maximum volume of data in flight over the interval of the current RTT, and how it manages transmission of the capacity that it perceives is available.

Congestion Management: The capacity available to a flow could be expressed as the number of bytes in flight, the sending rate or a limit on the number of unacknowledged segments. When determining the capacity used, all data sent by a sender needs to be accounted for; this includes any additional overhead or data generated by the transport. A transport performing congestion management will usually optimise performance for its application by avoiding excessive loss or delay, and will maintain a congestion window. In steady-state this congestion window reflects a safe limit to the sending rate that has not resulted in persistent congestion.
A congestion controller for a flow that uses packet Forward Error Correction (FEC) encoding (e.g., [RFC6363]) needs to consider all additional overhead introduced by packet FEC when setting and managing its congestion window.

One common model views the path between two endpoints as a "pipe". New packets enter the pipe at the sending endpoint, older ones leave the pipe at the receiving endpoint. Congestion and other forms of loss result in "leakage" from this pipe. Received data (leaving the network path at the remote endpoint) is usually acknowledged to the congestion controller.

The rate that data leaves the pipe indicates the share of the capacity that has been utilised by the flow. If, on average (over an RTT), the sending rate equals the receiving rate, this indicates the path capacity. This capacity can be safely used again in the next RTT. If the average receiving rate is less than the sending rate, then either the path is queuing packets, the RTT/path has changed, or there is packet loss.

Transient Path: Unless managed by a resource reservation protocol, path capacity information is transient. A sender that does not use capacity has no understanding whether previously used capacity remains available to use, or whether that capacity has disappeared (e.g., a change in the path that causes a flow to experience a smaller bottleneck, or when more traffic emerges that consumes previously available capacity resulting in a new bottleneck). For this reason, a transport that is limited by the volume of data available to send MUST NOT continue to grow its congestion window when the current congestion window is more than twice the volume of data acknowledged in the last RTT.
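
As an illustration of the Transient Path rule above, the following Python sketch checks whether a sender that is limited by the data it has available may keep growing its congestion window. The function and parameter names are hypothetical and are not taken from any specification; byte counting is assumed for simplicity, and a transport could equally count segments.

```python
def may_grow_cwnd(cwnd: int, acked_last_rtt: int, app_limited: bool) -> bool:
    """Return True if the congestion window may continue to grow.

    cwnd and acked_last_rtt are byte counts (an assumption of this sketch).
    """
    # A sender that is not limited by its application is not restricted
    # by this particular rule.
    if not app_limited:
        return True
    # An application-limited sender stops growing once the window is more
    # than twice the volume of data acknowledged in the last RTT.
    return cwnd <= 2 * acked_last_rtt


def next_cwnd(cwnd: int, increase: int, acked_last_rtt: int,
              app_limited: bool) -> int:
    """Apply a proposed increase only when growth is permitted."""
    if may_grow_cwnd(cwnd, acked_last_rtt, app_limited):
        return cwnd + increase
    return cwnd
```

For example, with 10000 bytes acknowledged in the last RTT, an application-limited sender holding a 30000-byte window would leave it unchanged, whereas one holding a 12000-byte window could still apply its increase.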
Validating the congestion window: Standard TCP states that a TCP sender "SHOULD set the congestion window to no more than the Restart Window (R)" before beginning transmission, if the sender has not sent data in an interval that exceeds the current retransmission timeout, i.e., when an application becomes idle [RFC5681]. An experimental specification [RFC7661] permits TCP senders to tentatively maintain a congestion window larger than the path supported in the last RTT when application-limited, provided that they appropriately and rapidly collapse the congestion window when potential congestion is detected. This mechanism is called Congestion Window Validation (CWV).

Burst Mitigation: Even in the absence of congestion, statistical multiplexing of flows can result in transient effects for flows sharing common resources. A sender therefore SHOULD avoid inducing excessive congestion to other flows (collateral damage). While a congestion controller ought to limit sending at the granularity of the current RTT, this can be insufficient to satisfy the goals of preventing starvation and mitigating collateral damage. This requires moderating the burst rate of the sender to avoid significant periods where one or more flows consume all buffer capacity at the path bottleneck, which would otherwise prevent other flows from gaining a reasonable share. Endpoints SHOULD provide mechanisms to regulate the bursts of transmission that the application/protocol sends to the network (section 3.1.6 of [RFC8085]). ACK-Clocking [RFC5681] can help mitigate bursts for protocols that receive continuous feedback of reception (such as TCP). Sender pacing can mitigate this [RFC8085] (see Section 4.6 of [RFC3449]), and has been recommended for TCP in conditions where ACK-Clocking is not effective (e.g., [RFC3742], [RFC7661]). SCTP [RFC4960] defines a maximum burst length (Max.Burst) with a recommended value of 4 segments to limit the SCTP burst size.

4.3. Timers and Retransmission

This section describes mechanisms to detect and provide retransmission, and to protect the network in the absence of timely feedback.

Loss Detection: Loss detection occurs after a sender determines there is no delivery confirmation within an expected period of time (e.g., by observing the time-ordering of the reception of ACKs, as in TCP DupACK), by utilising a timer to detect loss (e.g., a transmission timer with a period less than the RTO [RFC8085] [I-D.ietf-tcpm-rack]), or by a combination of a timer and ordering information to trigger retransmission of data.

Retransmission: Retransmission of lost packets or messages is a common reliability mechanism. When loss is detected, the sender can choose to retransmit the lost data, ignore the loss, or send other data (e.g., [RFC8085] [I-D.ietf-quic-recovery]), depending on the reliability model provided by the transport service. Any transmission consumes network capacity; therefore, retransmissions MUST NOT increase the network load in response to congestion loss (which worsens that congestion) [RFC8085]. Any method that sends additional data following loss is therefore responsible for congestion control of the retransmissions (and any other packets sent, including FEC information) as well as the original traffic.

Measuring the RTT: Once an endpoint has started communicating with its peer, the RTT MUST be adjusted by measuring the actual path RTT. This adjustment MUST include adapting to the measured RTT variance (see equation 2.3 of [RFC6298]).

Maintaining the RTO: The RTO SHOULD be set based on recent RTT observations (including the RTT variance) [RFC8085].

RTO Expiry: Persistent lack of feedback (e.g., detected by an RTO timer, or other means) MUST be treated as an indication of potential congestion collapse.
A failure to receive any specific response within an RTO interval could potentially be a result of an RTT change, a change of path, excessive loss, or even congestion collapse. If there is no response within the RTO interval, TCP collapses the congestion window to one segment [RFC5681]. Other transports MUST similarly respond when they detect loss of feedback.

An endpoint needs to exponentially back off the RTO interval [RFC8085] each time the RTO expires. That is, the RTO interval MUST be set to at least the RTO * 2 [RFC6298] [RFC8085].

Maximum RTO: A maximum value MAY be placed on the RTO interval. This maximum limit to the RTO interval MUST NOT be less than 60 seconds [RFC6298].

[[ Author Note: These recommendations should be re-evaluated in light of the current chartered work in the TCPM WG. ]]

4.4. Responding to Potential Congestion

Internet flows SHOULD implement appropriate safeguards to avoid inappropriate impact on other flows that share the resources along a path. The safety and responsiveness of new proposals need to be evaluated [RFC5166]. In determining an appropriate congestion response, designs could take into consideration the size of the packets that experience congestion [RFC4828].

Congestion Response: An endpoint MUST promptly reduce the rate of transmission when it receives or detects an indication of congestion (e.g., loss) [RFC2914].

TCP Reno established a method that relies on multiplicative-decrease to halve the sending rate when congestion is detected. This response to congestion indications is considered sufficient for safe Internet operation, but other decrease factors have also been published in the RFC Series [RFC8312].

ECN Response: A congestion control design should provide the necessary mechanisms to support Explicit Congestion Notification (ECN) [RFC3168] [RFC6679], as described in section 3.1.7 of [RFC8085].
This can help determine an appropriate congestion window when supported by routers on the path [RFC7567] to enable rapid early indication of incipient congestion.

The early detection of incipient congestion justifies a different reaction to an explicit congestion signal compared to the reaction to detected packet loss [RFC8311] [RFC8087].

Simple feedback of received Congestion Experienced (CE) marks [RFC3168] relies only on an indication that congestion has been experienced within the last RTT. This style of response is appropriate when a flow uses ECT(0). The reaction to reception of this indication was modified in TCP ABE [RFC8511]. Further detail about the received CE-marking can be obtained by using more accurate receiver feedback (e.g., [I-D.ietf-tcpm-accurate-ecn] and extended RTP feedback). The more detailed feedback provides an opportunity for a finer granularity of congestion response.

Current work-in-progress [I-D.ietf-tsvwg-l4s-arch] defines a reaction for packets marked with ECT(1), building on the style of detailed feedback provided by [I-D.ietf-tcpm-accurate-ecn] and a modified marking system [I-D.ietf-tsvwg-aqm-dualq-coupled].

Robustness to Path Change: The detection of congestion and the resulting reduction MUST NOT solely depend upon reception of a signal from the remote endpoint, because congestion indications could themselves be lost under persistent congestion.

The only way to surely confirm that a sending endpoint has successfully communicated with a remote endpoint is to utilise a timer (usually called the RTO, see Section 4.3) to detect a lack of response that could result from a change in the path or the path characteristics.

Congestion controllers that are unable to react within one (or at most a few) RTTs of receiving a congestion indication should observe the guidance in section 3.3 of the UDP Guidelines [RFC8085].
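
The RTO timer referred to above, together with the RTT measurement, exponential backoff, and maximum RTO guidance of Section 4.3, can be sketched as follows. This is a minimal illustration that follows the smoothing rules of [RFC6298] (alpha = 1/8, beta = 1/4, K = 4). The class name, the method names, and the default clock granularity are assumptions made for this example, and the 1 second initial value and lower bound are taken from [RFC6298] rather than from this document.

```python
class RtoEstimator:
    """Illustrative RTO estimator in the style of RFC 6298 (times in seconds)."""

    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the RTT variance
    K = 4           # variance multiplier

    def __init__(self, granularity=0.001, min_rto=1.0, max_rto=60.0):
        if max_rto < 60.0:
            # Any upper bound on the RTO must be at least 60 seconds.
            raise ValueError("max_rto must not be less than 60 seconds")
        self.granularity = granularity  # assumed clock granularity
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0  # initial RTO before any RTT sample exists

    def on_rtt_sample(self, r):
        """Update SRTT, RTTVAR and the RTO from one RTT measurement."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        # Recomputing the RTO also clears any earlier backoff.
        self.rto = min(max(rto, self.min_rto), self.max_rto)

    def on_rto_expiry(self):
        """Exponentially back off the RTO each time the timer expires."""
        self.rto = min(self.rto * 2, self.max_rto)
```

For example, a first sample of 0.5 seconds gives an RTO of 1.5 seconds, and repeated expiries then double the interval until it reaches the configured maximum. A transport would also collapse its congestion window when the timer expires, which this sketch does not model.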
Persistent Congestion: Persistent congestion can result in congestion collapse, which MUST be aggressively avoided [RFC2914].

Endpoints that experience persistent congestion and have already exponentially reduced their congestion window to the restart window (e.g., one packet) MUST further reduce the rate if the RTO timer continues to expire. For example, TFRC [RFC5348] continues to reduce its sending rate under persistent congestion to one packet per RTT, and then exponentially backs off the time between single packet transmissions if the congestion continues to persist [RFC2914].

[RFC8085] provides guidelines for a sender that does not, or is unable to, adapt the congestion window.

4.5. Using More Capacity

In the absence of persistent congestion, an endpoint is permitted to increase its congestion window and hence the sending rate. An increase should only occur when there is additional data available to send across the path (i.e., the sender will utilise the additional capacity in the next RTT).

TCP Reno [RFC5681] combines the Additive-Increase/Multiplicative-Decrease (AIMD) algorithm with a slow start phase that allows a sender to exponentially increase the congestion window each RTT from the initial window to the first detected congestion event. This is designed to allow new flows to rapidly acquire a suitable congestion window. Where the bandwidth-delay product (BDP) is large, it can take many RTT periods to determine a suitable share of the path capacity. Such high-BDP paths benefit from methods that more rapidly increase the congestion window, but in compensation these need to be designed to also react rapidly to any detected congestion (e.g., TCP Cubic [RFC8312]).

Increasing Congestion Window: A sender MUST NOT continue to increase its rate for more than an RTT after a congestion indication is received. The transport SHOULD stop increasing its congestion window as soon as it receives indication of congestion to avoid excessive "overshoot".
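
The window dynamics described in this section can be sketched with a simplified Reno-style controller in the spirit of [RFC5681]. This is a minimal illustration under stated assumptions and not a complete TCP implementation: the class name, the method names, the SMSS value, and the initial window are chosen for this example, and fast recovery is reduced to setting the window directly to the new threshold.

```python
SMSS = 1460  # assumed sender maximum segment size, in bytes


class RenoWindow:
    """Simplified Reno-style congestion window (illustrative only)."""

    def __init__(self, initial_window=10 * SMSS):
        self.cwnd = initial_window       # assumed initial window
        self.ssthresh = float("inf")     # no threshold until congestion is seen

    def on_ack(self, acked_bytes):
        """Grow the window when new data is acknowledged."""
        if self.cwnd < self.ssthresh:
            # Slow start: roughly doubles the window each RTT.
            self.cwnd += min(acked_bytes, SMSS)
        else:
            # Congestion avoidance: about one SMSS of growth per RTT.
            self.cwnd += max(1, SMSS * SMSS // self.cwnd)

    def on_congestion_event(self, flight_size):
        """Multiplicative decrease on a congestion indication (e.g., loss)."""
        self.ssthresh = max(flight_size // 2, 2 * SMSS)
        self.cwnd = self.ssthresh

    def on_rto_expiry(self, flight_size):
        """Collapse the window to one segment when the RTO expires."""
        self.ssthresh = max(flight_size // 2, 2 * SMSS)
        self.cwnd = SMSS
```

Because on_congestion_event() sets the window equal to the threshold, every later acknowledgement takes the linear branch, so exponential growth stops as soon as congestion is indicated. Other decrease factors, such as those published for CUBIC, would replace the halving step.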
While the sender is increasing the congestion window, it will transmit faster than the last known safe rate. Any increase above the last confirmed rate needs to be regarded as tentative, and the sender needs to reduce its rate below the last confirmed safe rate when congestion is experienced (a congestion event).

Congestion: An endpoint MUST utilise a method that assures the sender will keep the rate below the previously confirmed safe rate for multiple RTT periods after an observed congestion event. In TCP, this is performed by using a linear increase from a slow start threshold that is re-initialised when congestion is experienced.

Avoiding Overshoot: Overshoot of the congestion window beyond the point of congestion can significantly impact other flows sharing resources along a path. It is important to note that, as endpoints experience more paths with a large BDP and a wider range of potential path RTT, variability or changes in the path can place significant constraints on appropriate dynamics for increasing the congestion window (see also burst mitigation, Section 4.2).

4.6. Network Signals

An endpoint can utilise signals from the network to help determine how to regulate the traffic it sends.

Network Signals: Mechanisms MUST NOT solely rely on transport messages or specific signalling messages to perform safely (see section 5.2 of [RFC8085] describing use of ICMP messages). They need to be designed so that they safely operate when path characteristics change at any time. Transport mechanisms MUST be robust to potential black-holing of any signals (i.e., need to be robust to loss or modification of packets, noting that this can occur even after successful first use of a signal by a flow, as occurs when the path changes, see Section 3.1).
A mechanism that utilises signals originating in the network (e.g., RSVP, NSIS, Quick-Start, ECN) MUST assume that the set of network devices on the path can change. This motivates the use of soft-state when designing protocols that interact with signals originating from network devices [I-D.irtf-panrg-what-not-to-do] (e.g., ECN). This can include context-sensitive treatment of "soft" signals provided to the endpoint [RFC5164].

4.7. Protection of Protocol Mechanisms

An endpoint needs to provide protection from attacks on the traffic it generates, or attacks that seek to increase the capacity it consumes (impacting other traffic that shares a bottleneck).

Off Path Attack: A design MUST protect from off-path attacks on the protocol [RFC8085] (i.e., attacks by an attacker that is unable to see the contents of packets exchanged across the path). An attack on the congestion control can lead to a Denial of Service (DoS) vulnerability for the flow being controlled and/or other flows that share network resources along the path.

Validation of Signals: Network signalling and control messages (e.g., ICMP [RFC0792]) MUST be validated before they are used, to protect from malicious abuse. This MUST at least include protection from off-path attack [RFC8085].

On Path Attack: A protocol can be designed to protect from on-path attacks, but this requires more complexity and the use of encryption/authentication mechanisms (e.g., IPsec [RFC4301], QUIC [I-D.ietf-quic-transport]).

5. IETF Guidelines on Evaluation of Congestion Control

The IETF has provided guidance [RFC5033] for considering alternate congestion control algorithms.

The IRTF has also described a set of metrics and related trade-offs between metrics that can be used to compare, contrast, and evaluate congestion control techniques [RFC5166]. [RFC5783] provides a snapshot of congestion-control research in 2008.

6. Acknowledgements

This document owes much to the insight offered by Sally Floyd, both at the time of writing of RFC2914 and through her help and review in the many years that followed.

Nicholas Kuhn helped develop the first draft of these guidelines. Tom Jones and Ana Custura reviewed the first version of this draft. The University of Aberdeen received funding to support this work from the European Space Agency.

7. IANA Considerations

This memo includes no request to IANA.

RFC Editor Note: If there are no requirements for IANA, the section will be removed during conversion into an RFC by the RFC Editor.

8. Security Considerations

This document introduces no new security considerations. Each RFC listed in this document discusses the security considerations of the specification it contains. The security considerations for the use of transports are provided in the references section of the cited RFCs. Security guidance for applications using UDP is provided in the UDP Usage Guidelines [RFC8085].

Section 4.7 describes general requirements relating to the design of safe protocols and their protection from on-path and off-path attacks.

Section 4.6 follows current best practice to validate ICMP messages prior to use.

9. References

9.1. Normative References

[RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, DOI 10.17487/RFC1122, October 1989.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, RFC 2914, DOI 10.17487/RFC2914, September 2000.

[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001.

[RFC3390] Allman, M., Floyd, S., and C.
Partridge, "Increasing TCP's Initial Window", RFC 3390, DOI 10.17487/RFC3390, October 2002.

[RFC3742] Floyd, S., "Limited Slow-Start for TCP with Large Congestion Windows", RFC 3742, DOI 10.17487/RFC3742, March 2004.

[RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification", RFC 5348, DOI 10.17487/RFC5348, September 2008.

[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

[RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, "Computing TCP's Retransmission Timer", RFC 6298, DOI 10.17487/RFC6298, June 2011.

[RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, "Increasing TCP's Initial Window", RFC 6928, DOI 10.17487/RFC6928, April 2013.

[RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF Recommendations Regarding Active Queue Management", BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015.

[RFC7661] Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating TCP to Support Rate-Limited Traffic", RFC 7661, DOI 10.17487/RFC7661, October 2015.

[RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, March 2017.

9.2. Informative References

[Flow-Rate-Fairness] Briscoe, B., "Flow Rate Fairness: Dismantling a Religion", ACM Computer Communication Review 37(2):63-74, April 2007.

[I-D.ietf-quic-recovery] Iyengar, J. and I. Swett, "QUIC Loss Detection and Congestion Control", draft-ietf-quic-recovery-23 (work in progress), September 2019.

[I-D.ietf-quic-transport] Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed and Secure Transport", draft-ietf-quic-transport-23 (work in progress), September 2019.

[I-D.ietf-tcpm-2140bis] Touch, J., Welzl, M., and S.
Islam, "TCP Control Block Interdependence", draft-ietf-tcpm-2140bis-00 (work in progress), April 2019.

[I-D.ietf-tcpm-accurate-ecn] Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate-ecn-09 (work in progress), July 2019.

[I-D.ietf-tcpm-rack] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK: a time-based fast loss detection algorithm for TCP", draft-ietf-tcpm-rack-06 (work in progress), November 2019.

[I-D.ietf-tcpm-rfc793bis] Eddy, W., "Transmission Control Protocol Specification", draft-ietf-tcpm-rfc793bis-14 (work in progress), July 2019.

[I-D.ietf-tsvwg-aqm-dualq-coupled] Schepper, K., Briscoe, B., and G. White, "DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput (L4S)", draft-ietf-tsvwg-aqm-dualq-coupled-10 (work in progress), July 2019.

[I-D.ietf-tsvwg-l4s-arch] Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low Latency, Low Loss, Scalable Throughput (L4S) Internet Service: Architecture", draft-ietf-tsvwg-l4s-arch-04 (work in progress), July 2019.

[I-D.irtf-panrg-what-not-to-do] Dawkins, S., "Path Aware Networking: Obstacles to Deployment (A Bestiary of Roads Not Taken)", draft-irtf-panrg-what-not-to-do-03 (work in progress), May 2019.

[RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI 10.17487/RFC0768, August 1980.

[RFC0792] Postel, J., "Internet Control Message Protocol", STD 5, RFC 792, DOI 10.17487/RFC0792, September 1981.

[RFC0896] Nagle, J., "Congestion Control in IP/TCP Internetworks", RFC 896, DOI 10.17487/RFC0896, January 1984.

[RFC0970] Nagle, J., "On Packet Switches With Infinite Storage", RFC 970, DOI 10.17487/RFC0970, December 1985.

[RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140, DOI 10.17487/RFC2140, April 1997.
[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J., and L. Zhang, "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998.

[RFC2525] Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner, J., Heavens, I., Lahey, K., Semke, J., and B. Volz, "Known TCP Implementation Problems", RFC 2525, DOI 10.17487/RFC2525, March 1999.

[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, DOI 10.17487/RFC2616, June 1999.

[RFC3449] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. Sooriyabandara, "TCP Performance Implications of Network Path Asymmetry", BCP 69, RFC 3449, DOI 10.17487/RFC3449, December 2002.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July 2003.

[RFC3819] Karn, P., Ed., Bormann, C., Fairhurst, G., Grossman, D., Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. Wood, "Advice for Internet Subnetwork Designers", BCP 89, RFC 3819, DOI 10.17487/RFC3819, July 2004.

[RFC3828] Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., Ed., and G. Fairhurst, Ed., "The Lightweight User Datagram Protocol (UDP-Lite)", RFC 3828, DOI 10.17487/RFC3828, July 2004.

[RFC4301] Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, DOI 10.17487/RFC4301, December 2005.

[RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram Congestion Control Protocol (DCCP)", RFC 4340, DOI 10.17487/RFC4340, March 2006.

[RFC4828] Floyd, S. and E.
Kohler, "TCP Friendly Rate Control (TFRC): The Small-Packet (SP) Variant", RFC 4828, DOI 10.17487/RFC4828, April 2007.

[RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", RFC 4960, DOI 10.17487/RFC4960, September 2007.

[RFC5033] Floyd, S. and M. Allman, "Specifying New Congestion Control Algorithms", BCP 133, RFC 5033, DOI 10.17487/RFC5033, August 2007.

[RFC5164] Melia, T., Ed., "Mobility Services Transport: Problem Statement", RFC 5164, DOI 10.17487/RFC5164, March 2008.

[RFC5166] Floyd, S., Ed., "Metrics for the Evaluation of Congestion Control Mechanisms", RFC 5166, DOI 10.17487/RFC5166, March 2008.

[RFC5783] Welzl, M. and W. Eddy, "Congestion Control in the RFC Series", RFC 5783, DOI 10.17487/RFC5783, February 2010.

[RFC6363] Watson, M., Begen, A., and V. Roca, "Forward Error Correction (FEC) Framework", RFC 6363, DOI 10.17487/RFC6363, October 2011.

[RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., and K. Carlberg, "Explicit Congestion Notification (ECN) for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August 2012.

[RFC6773] Phelan, T., Fairhurst, G., and C. Perkins, "DCCP-UDP: A Datagram Congestion Control Protocol UDP Encapsulation for NAT Traversal", RFC 6773, DOI 10.17487/RFC6773, November 2012.

[RFC6951] Tuexen, M. and R. Stewart, "UDP Encapsulation of Stream Control Transmission Protocol (SCTP) Packets for End-Host to End-Host Communication", RFC 6951, DOI 10.17487/RFC6951, May 2013.

[RFC8084] Fairhurst, G., "Network Transport Circuit Breakers", BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017.

[RFC8087] Fairhurst, G. and M. Welzl, "The Benefits of Using Explicit Congestion Notification (ECN)", RFC 8087, DOI 10.17487/RFC8087, March 2017.

[RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion Notification (ECN) Experimentation", RFC 8311, DOI 10.17487/RFC8311, January 2018.
[RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", RFC 8312, DOI 10.17487/RFC8312, February 2018.

[RFC8511] Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst, "TCP Alternative Backoff with ECN (ABE)", RFC 8511, DOI 10.17487/RFC8511, December 2018.

Appendix A. Revision Notes

Note to RFC-Editor: please remove this entire section prior to publication.

Individual draft -00:

o Comments and corrections are welcome directly to the authors or via the IETF TSVWG working group mailing list.

Individual draft -01:

o This update is proposed for initial WG comments.

o If there is interest in progressing this document, the next version will include more complete referencing to cited material.

Individual draft -02:

o Correction of typos.

Individual draft -03:

o Added section 1.1 with text on current BCP status with additional alignment and updates to RFC2914 on Congestion Control Principles (after question from M. Scharf).

o Edits to consolidate starvation text.

o Added text on multicast, noting that this is currently out of scope.

o Revised sender-based CC text after comment from C. Perkins (Sections 3.1, 3.3, and other places).

o Added more about FEC after comment from C. Perkins.

o Added an explicit reference to RFC 5783 and updated this text (after question from M. Scharf).

o To avoid doubt, added a para about "Each new transport needs to make its own design decisions about how to meet the recommendations and requirements for congestion control."

o Updated references.

Individual draft -04:

o Correction of nits. Further clarifications.

o This draft does not attempt to address further alignment with draft-ietf-tcpm-rto-consider. This will form part of a future revision.
Author's Address

Godred Fairhurst
University of Aberdeen
School of Engineering
Fraser Noble Building
Aberdeen AB24 3U
UK

Email: gorry@erg.abdn.ac.uk