Internet Engineering Task Force                              Mark Allman
INTERNET DRAFT                                              NASA GRC/BBN
File: draft-allman-tcp-abc-00.txt                             July, 2000
Expires: January, 2001


          TCP Congestion Control with Appropriate Byte Counting


Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This document proposes a small modification to the way TCP increases its congestion window. Rather than the traditional method of increasing the congestion window by a constant amount for each arriving acknowledgment, we suggest basing the increase on the number of previously unacknowledged bytes each ACK covers. This change improves the performance of TCP and closes a security hole that TCP receivers can use to induce the sender into increasing the sending rate too rapidly.

1 Introduction

This document proposes a modified algorithm for increasing TCP's congestion window (cwnd) that improves performance and security. Rather than increasing TCP's congestion window based on the number of acknowledgments (ACKs) that arrive at the data sender, the congestion window is increased based on the number of bytes acknowledged by the arriving ACKs. The algorithm improves performance by mitigating the impact of delayed ACKs on the growth of cwnd.
At the same time, the algorithm provides more appropriate cwnd growth in response to ACKs that cover only small amounts of data (less than a full segment size). More appropriate cwnd growth can both improve performance and prevent inappropriate cwnd growth in response to a misbehaving receiver.

Much of the language in this document is taken from [RFC2581].

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

This document is organized as follows. Section 2 outlines the modified algorithm for increasing TCP's congestion window. Section 3 discusses the advantages of using the modified algorithm. Section 4 discusses the disadvantages of the approach outlined in this document. Section 5 outlines some of the fairness issues that must be considered for the modified algorithm. Section 6 discusses security considerations, and Section 7 presents our conclusions.

2 A Modified Algorithm for Increasing the Congestion Window

As originally outlined in [Jac88] and specified in [RFC2581], TCP uses two algorithms for increasing the congestion window (cwnd). During steady state, TCP uses the Congestion Avoidance algorithm to linearly increase the value of cwnd. At the beginning of a transfer, after a retransmission timeout, or (in some implementations) after a long idle period, TCP uses the Slow Start algorithm to increase cwnd exponentially. According to RFC 2581, slow start bases the cwnd increase on the number of incoming acknowledgments. During congestion avoidance, RFC 2581 allows more latitude in increasing cwnd, but implementations have traditionally based the increase on the number of arriving ACKs.
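For comparison with the modifications that follow, the traditional ACK-counting increase can be sketched as below. This is an illustrative sketch, not code from RFC 2581: the 1460-byte SMSS and the function names are assumptions, and the congestion avoidance step uses the integer form of the SMSS*SMSS/cwnd approximation of a 1/cwnd-segment increase per ACK. The helper models one RTT of congestion avoidance to show how delayed ACKs slow cwnd growth.

```python
SMSS = 1460  # assumption: illustrative sender maximum segment size, in bytes


def ack_counting_increase(cwnd: int, ssthresh: int, smss: int = SMSS) -> int:
    """Return cwnd after one ACK under traditional ACK counting.

    The increase ignores how many bytes the ACK covers: one full-sized
    segment in slow start, about SMSS*SMSS/cwnd bytes (1/cwnd segments)
    in congestion avoidance.
    """
    if cwnd < ssthresh:
        return cwnd + smss                      # slow start
    return cwnd + max(1, smss * smss // cwnd)   # congestion avoidance


def congestion_avoidance_rtt(cwnd: int, segments_per_ack: int, smss: int = SMSS) -> int:
    """Return cwnd after the ACKs for one full window have arrived.

    segments_per_ack=1 models a receiver that ACKs every segment;
    segments_per_ack=2 models a delayed-ACK receiver (hypothetical helper).
    """
    acks = (cwnd // smss) // segments_per_ack
    for _ in range(acks):
        # ssthresh=0 keeps the sender in congestion avoidance.
        cwnd = ack_counting_increase(cwnd, ssthresh=0, smss=smss)
    return cwnd
```

With a 10-segment window, a receiver that ACKs every segment lets cwnd open by just under one segment in one RTT, while a delayed-ACK receiver lets it open by only about half a segment.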
In the following two subsections, we detail modifications to these algorithms so that cwnd is increased based on the number of bytes acknowledged by each arriving ACK, rather than on the number of ACKs that arrive. We call these changes ``Appropriate Byte Counting'' (ABC) [All99].

2.1 Congestion Avoidance

RFC 2581 specifies that cwnd should be increased by 1 segment per round-trip time (RTT) during the congestion avoidance phase of a transfer. Traditionally, TCPs have approximated this increase by increasing cwnd by 1/cwnd for each arriving ACK. This algorithm opens cwnd by roughly 1 segment per RTT if the receiver ACKs each incoming segment and no ACK loss occurs. However, if the receiver implements delayed ACKs [RFC1122], the receiver returns roughly half as many ACKs, which causes the sender to open cwnd more conservatively (by approximately 1 segment every second RTT).

The approach that we suggest is to store the number of bytes that have been ACKed in a bytes_acked variable in the TCP control block. When bytes_acked becomes greater than or equal to the value of the congestion window, bytes_acked is reduced by the value of cwnd. Next, cwnd is incremented by a full-sized segment. The algorithm suggested above is specifically allowed by RFC 2581 during congestion avoidance because it opens the window by at most 1 segment per RTT.

2.2 Slow Start

RFC 2581 states that the sender increments the congestion window by at most 1*SMSS bytes for each arriving acknowledgment during slow start. We propose that a TCP sender SHOULD increase cwnd by the number of previously unacknowledged bytes ACKed by each incoming acknowledgment, provided the increase is not more than L bytes. Choosing the limit on the increase, L, is discussed in the next subsection.
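The congestion avoidance and slow start rules above can be sketched as follows. This is a minimal, non-authoritative sketch: the class name AbcSender, the 1460-byte SMSS, and the choice that ACKs covering no new data leave cwnd unchanged are illustrative assumptions, and loss detection, retransmission and ssthresh updates are deliberately omitted.

```python
from dataclasses import dataclass

SMSS = 1460  # assumption: illustrative sender maximum segment size, in bytes


@dataclass
class AbcSender:
    """Hypothetical sender state modeling only ABC cwnd growth (Sections 2.1-2.2)."""
    cwnd: int               # congestion window, in bytes
    ssthresh: int           # slow start threshold, in bytes
    limit: int = 2 * SMSS   # L: largest cwnd increase allowed per ACK in slow start
    smss: int = SMSS
    bytes_acked: int = 0    # byte counter used during congestion avoidance

    def on_ack(self, newly_acked: int) -> None:
        """Grow cwnd for an ACK covering `newly_acked` previously unacknowledged bytes."""
        if newly_acked <= 0:
            return  # assumption: ACKs that cover no new data do not grow cwnd
        if self.cwnd < self.ssthresh:
            # Slow start: increase by the bytes ACKed, but by no more than L.
            self.cwnd += min(newly_acked, self.limit)
        else:
            # Congestion avoidance: once a full window of bytes has been ACKed,
            # reduce bytes_acked by cwnd, then open cwnd by one full-sized segment.
            self.bytes_acked += newly_acked
            if self.bytes_acked >= self.cwnd:
                self.bytes_acked -= self.cwnd
                self.cwnd += self.smss
```

For example, with cwnd at 10 segments in congestion avoidance, five delayed ACKs of 2*SMSS bytes each open cwnd by exactly one segment, and in slow start with limit=2*SMSS two such ACKs double a 4-segment window. With limit=SMSS, ten ACKs that each cover 146 bytes (the ACK splitting pattern discussed in Section 3.3) grow cwnd by only 1460 bytes in total, whereas ACK counting would grow it by ten segments; the same limit caps the increase for a post-RTO ACK covering three segments at 1*SMSS bytes.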
When the number of previously unacknowledged bytes ACKed is less than 1*SMSS bytes, or L is less than 1*SMSS bytes, this proposal is no more aggressive (and possibly less aggressive) than allowed by RFC 2581. However, increasing cwnd by more than 1*SMSS bytes in response to a single ACK is more aggressive than allowed by RFC 2581. We believe the more aggressive version of the slow start algorithm still falls under the ``conservation of packets'' principle outlined in [Jac88] and is safe for experimentation in shared networks, provided an appropriate limit is applied (see the next subsection).

2.3 Choosing the Limit

The limit, L, chosen for the cwnd increase during slow start controls the aggressiveness of the algorithm. Choosing L=1*SMSS bytes provides behavior that is no more aggressive than allowed by RFC 2581. However, ABC with L=1*SMSS bytes is more conservative in a number of key ways (as discussed in the next section). Therefore, even though TCP stacks will see little performance benefit with L=1*SMSS bytes, we believe ABC SHOULD be used.

A very large L could potentially lead to large line-rate bursts of traffic in the face of a large amount of ACK loss, or when the receiver sends ``stretch ACKs'' (ACKs for more than the two full-sized segments allowed by the delayed ACK algorithm) [Pax97]. This document suggests that TCP implementations SHOULD use L=2*SMSS bytes to balance between being conservative (L=1*SMSS bytes) and potentially being very aggressive. In addition, L=2*SMSS bytes exactly balances the negative impact of the delayed ACK algorithm (as discussed in more detail in Section 3.2). Note that when L=2*SMSS bytes, cwnd growth is roughly the same as when the standard algorithms are used in conjunction with a receiver that transmits an ACK for each incoming segment.

The exception to the above suggestion is during a slow start phase that follows a retransmission timeout (RTO).
In this situation, a TCP MUST use L=1*SMSS bytes, as specified in RFC 2581, since ACKs for large amounts of previously unacknowledged data are common during this phase of a transfer. These ACKs do not necessarily indicate how much data has left the network in the last RTT, and therefore ABC cannot accurately determine how much to increase cwnd. As an example, say segment N is dropped by the network and segments N+1 and N+2 arrive successfully at the receiver. The sender will receive only two duplicate ACKs and therefore must rely on the retransmission timer (RTO) to detect the loss. When the RTO expires, segment N is retransmitted. The ACK sent in response to the retransmission will acknowledge all data up to and including segment N+2. However, this ACK does not indicate that three segments have left the network in the last RTT, but rather that only a single segment left the network. Therefore, the appropriate cwnd increment is at most 1*SMSS bytes.

3 Advantages

This section outlines several advantages of using the ABC algorithm to increase cwnd, rather than the standard ACK counting algorithm given in [RFC2581].

3.1 More Appropriate Congestion Window Increase

The ABC algorithm outlined in Section 2 increases TCP's cwnd in proportion to the amount of data actually sent into the network. ACK counting, on the other hand, increments cwnd by a constant upon the arrival of each ACK. For instance, consider a telnet connection in which ACKs generally cover only a few bytes of data, but cwnd is increased by 1*SMSS bytes for each ACK received. When a large amount of data needs to be transmitted (e.g., displaying a large file), the data is sent in one large burst because cwnd has grown by 1*SMSS bytes per ACK rather than in proportion to the capacity actually used. Such a line-rate burst of data can potentially cause a large amount of segment loss.

Congestion Window Validation (CWV) [RFC2861] also helps with the above problem.
CWV limits the amount of unused cwnd a TCP connection can accumulate. ABC can be used in conjunction with CWV to obtain an accurate measure of the network path.

3.2 Mitigate the Impact of Delayed ACKs and Lost ACKs

Delayed ACKs [RFC1122,RFC2581] allow a TCP receiver to refrain from sending an ACK for each incoming segment. However, a receiver SHOULD send an ACK for every second full-sized segment that arrives. Furthermore, a receiver MUST NOT withhold an ACK for more than 500 ms. By reducing the number of ACKs sent to the data originator, the receiver slows the growth of the congestion window under an ACK counting system. Using ABC with L=2*SMSS bytes can roughly negate the negative impact imposed by delayed ACKs by allowing cwnd to be increased for ACKs that were withheld by the receiver. This allows the congestion window to grow in a manner similar to the case when the receiver ACKs each incoming segment, but without adding extra traffic to the network. Simulation studies have shown increased throughput when a TCP sender uses ABC compared to the standard ACK counting algorithm [All99], especially for short transfers that never leave the initial slow start period.

Note that delayed ACKs should not be an issue during slow start-based loss recovery, as RFC 2581 recommends that receivers not delay ACKs that cover out-of-order segments. Therefore, as discussed above, ABC with L > 1*SMSS bytes is inappropriate for such slow start-based loss recovery and MUST NOT be used.

3.3 Prevents Attacks from Misbehaving Receivers

[SCWA99] outlines several methods for a receiver to induce a TCP sender into violating congestion control and transmitting data at a potentially inappropriate rate. One of the outlined attacks is ``ACK Splitting''. This scheme involves the receiver sending multiple ACKs for each incoming data segment, each ACKing only a small portion of the original TCP data segment.
Since TCP senders have traditionally used ACK counting to increase cwnd, ACK splitting causes inappropriately rapid cwnd growth and, in turn, a potentially inappropriate sending rate. A TCP sender that uses ABC can prevent this attack from being used to undermine standard congestion control because the cwnd increase is based on the number of bytes ACKed, rather than the number of ACKs received. To prevent misbehaving receivers from inducing inappropriate sender behavior, we suggest that TCP implementations use ABC, even with L=1*SMSS bytes (i.e., not allowing ABC to provide more aggressive cwnd growth than allowed by RFC 2581).

4 Disadvantages

The main disadvantages of using ABC with L=2*SMSS bytes are an increase in the burstiness of TCP and a small increase in the overall loss rate. [All98] discusses the two ways that ABC increases the burstiness of the TCP sender.

First, the ``micro burstiness'' of the connection is increased. In other words, the number of segments sent in response to each incoming ACK is increased by at most 1 segment when using ABC with L=2*SMSS bytes in conjunction with a receiver that is sending delayed ACKs. During slow start, this translates into an increase from sending 2 back-to-back segments to sending 3 back-to-back segments in response to an ACK for a single segment, or an increase from 3 segments to 4 segments when receiving a delayed ACK for two outstanding segments. Note that ACK loss can cause larger bursts. However, ABC only increases the burst size by at most 1*SMSS bytes per ACK received when compared to the standard behavior. This slight increase in burstiness should only cause problems for devices that have very small buffers.

In addition, ABC increases the ``macro burstiness'' of the TCP sender in response to delayed ACKs. Rather than increasing cwnd by roughly 1.5 times per RTT, ABC roughly doubles the congestion window every RTT. However, doubling cwnd every RTT fits within the spirit of slow start, as originally outlined in [Jac88].
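The delayed-ACK burst sizes and per-RTT growth described above can be checked with a small segment-level model. This is a hedged sketch, not the simulations of [All98] or [All99]: it assumes a sender limited only by cwnd, no data loss, a receiver that ACKs every second segment, and counts everything in whole segments; the function names are hypothetical.

```python
from typing import Optional


def cwnd_growth(segments_acked: int, limit_segments: Optional[int]) -> int:
    """cwnd increase, in segments, for one slow-start ACK.

    limit_segments=None models ACK counting (+1 segment per ACK regardless
    of how much the ACK covers); otherwise ABC with L = limit_segments*SMSS.
    """
    if limit_segments is None:
        return 1
    return min(segments_acked, limit_segments)


def burst_after_ack(segments_acked: int, limit_segments: Optional[int]) -> int:
    """Segments sent back-to-back after one slow-start ACK, assuming the
    sender was cwnd-limited: the segments that left the network plus the
    cwnd growth."""
    return segments_acked + cwnd_growth(segments_acked, limit_segments)


def slow_start_rtt(cwnd_segments: int, limit_segments: Optional[int]) -> int:
    """cwnd, in segments, after one RTT of slow start with delayed ACKs.

    Assumes an even cwnd so that every ACK covers exactly two segments.
    """
    for _ in range(cwnd_segments // 2):
        cwnd_segments += cwnd_growth(2, limit_segments)
    return cwnd_segments
```

With a delayed ACK for two segments, ACK counting releases 3 segments and ABC with L=2*SMSS releases 4. If an intervening ACK is lost so that the next ACK covers four segments, the bursts become 5 and 6, so the difference remains one segment per ACK. Over one RTT, an 8-segment window grows to 12 segments under ACK counting and to 16 segments under ABC, i.e., the 1.5x versus 2x growth described above.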
With the increased burstiness comes a modest increase in the loss rate for a TCP connection employing ABC (see the next section for a short discussion of the fairness of ABC to non-ABC flows). The additional loss is directly attributable to the increased aggressiveness of ABC. During slow start, cwnd is increased more rapidly, and therefore when loss occurs cwnd is larger and more drops are likely. Similarly, a congestion avoidance cycle takes roughly half as long when using ABC and delayed ACKs compared to an ACK counting implementation. In other words, a TCP sender reaches the capacity of the network path, drops a packet, and reduces the congestion window by half roughly twice as often when using ABC. However, as discussed above, in spite of the additional loss an ABC TCP sender generally obtains better overall performance than a non-ABC TCP.

5 Fairness Considerations

[All99] presents several simulations conducted to measure the impact of ABC on competing traffic (both ABC and non-ABC). The experiments show that while ABC increases the drop rate for the connection using ABC, competing traffic is not greatly affected. The experiments also show that standard TCP and ABC both obtain roughly the same throughput regardless of the variant of the competing traffic. The simulations also reaffirm that ABC outperforms non-ABC TCP in an environment with varying types of TCP connections.

6 Security Considerations

As discussed in Section 3.3, ABC protects a TCP from a misbehaving receiver that induces the sender into transmitting at an inappropriate rate with an ``ACK splitting'' attack. This, in turn, protects the network from an overly aggressive sender.

7 Conclusions

We RECOMMEND that all TCP stacks be modified to use ABC with L=1*SMSS bytes. Furthermore, simulations of ABC with L=2*SMSS bytes show a promising performance improvement, and we encourage researchers to experiment with it in the Internet.
Acknowledgments

This draft has benefited from discussions with and encouragement from Sally Floyd.

References

[All98] Mark Allman. TCP Byte Counting Refinements. ACM Computer Communication Review, 29(3), July 1999.

[All99] Mark Allman. TCP Byte Counting Refinements. ACM Computer Communication Review, 29(3), July 1999.

[Jac88] Van Jacobson. Congestion Avoidance and Control. ACM SIGCOMM, 1988.

[Pax97] Vern Paxson. Automated Packet Trace Analysis of TCP Implementations. ACM SIGCOMM, September 1997.

[RFC1122] B. Braden, ed. Requirements for Internet Hosts -- Communication Layers. RFC 1122, October 1989.

[RFC2119] S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. BCP 14, RFC 2119, March 1997.

[RFC2581] Mark Allman, Vern Paxson, W. Richard Stevens. TCP Congestion Control. RFC 2581, April 1999.

[RFC2861] Mark Handley, Jitendra Padhye, Sally Floyd. TCP Congestion Window Validation. RFC 2861, June 2000.

[SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, Tom Anderson. TCP Congestion Control with a Misbehaving Receiver. ACM Computer Communication Review, 29(5), October 1999.

Author's Address

Mark Allman
NASA Glenn Research Center/BBN Technologies
Lewis Field
21000 Brookpark Rd. MS 54-2
Cleveland, OH 44135
Phone: 216-433-6586
Fax: 216-433-8705
mallman@grc.nasa.gov
http://roland.grc.nasa.gov/~mallman