Internet Engineering Task Force                      Janardhan Iyengar
INTERNET DRAFT                                  University of Delaware
draft-iyengar-burst-mitigation-01.txt                      Mark Allman
Expires: July, 2006                                          ICIR/ICSI
                                                         Ethan Blanton
                                                     Purdue University
                                                         January, 2006

       TCP Burst Mitigation Through Congestion Window Limiting
                draft-iyengar-burst-mitigation-01.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   This document describes Congestion Window Limiting (CWL), a method
   for mitigating micro-bursts in TCP by limiting the congestion window
   during interruptions in TCP's acknowledgment clock.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The reader is expected to be familiar with terminology from
   [RFC2581].

Iyengar, Allman, Blanton                                      [Page 1]
draft-iyengar-burst-mitigation-01.txt                     January 2006

1.
Introduction

   TCP dynamics and application sending patterns can cause a TCP sender
   to inject bursts into the network with potentially harmful effects
   for both the network and the sender. Bursting can stress network
   queues, causing loss in the bursting connection as well as in other
   flows sharing the stressed queues. Bursting can also cause scaling
   on short timescales [JD03] and increase queueing delays in routers.
   This document draws from previously proposed burst mitigation
   techniques and presents one possible technique to reduce some of
   TCP's burstiness.

   In this document, we are concerned with one type of bursting, which
   we call "micro-bursts". Micro-bursts are generated by a TCP in
   response to changes in the cumulative acknowledgment point. Each TCP
   segment carrying a cumulative acknowledgment (ACK) that slides the
   sender's transmission window allows previously unsent data segments
   to be transmitted (when application data is available). These
   segments are ideally transmitted at the line rate of the sender's
   network (assuming the host's CPU can produce packets fast enough).
   We refer to such bursts of segments sent in response to receipt of a
   single ACK as "micro-bursts".

   TCP exhibits other bursting behaviors as well, which we collectively
   term "macro-bursts" since they tend to occur over longer timescales
   than micro-bursts. Macro-bursts can be caused by several TCP and/or
   network phenomena, such as slow start [RFC2581] and ACK compression
   [ZSC91]. Although macro-bursts and their mitigation have also been
   the topic of much research ([AB05] briefly discusses this research),
   we limit ourselves to micro-burst mitigation in this document.

   Several situations can cause micro-bursting:

   * Although TCP's cumulative ACK mechanism is robust to loss, ACK
     loss causes a TCP sender's transmission window to slide by a
     greater amount and less frequently, potentially triggering large
     micro-bursts in the process.
   * An application can send data in a bursty fashion, causing TCP to
     transmit micro-bursts.
   * Reordered ACKs cause an ACK stream that appears similar to an ACK
     stream with loss, causing similar micro-bursting.
   * In some cases, when a TCP sender exits fast recovery, a large
     number of segments are transmitted at line rate [FF96]. This
     dynamic occurs when the sender cannot transmit enough new segments
     during the recovery phase (e.g., due to ACK loss) and therefore
     stores "permission to send" until a cumulative ACK arrives. This
     phenomenon is discussed in [FF96], where the "MaxBurst" mechanism
     is introduced to contain the consequent burst (see discussion in
     section 3).

   These and other causes of bursting are described in more detail in
   [JD03,AB05].

   In this document, we present one possible method for mitigating TCP
   micro-bursts called Congestion Window Limiting (CWL), which is based
   on work in [HTH01] and originally outlined in [AB05]. Alternate
   schemes have been proposed to mitigate the impact of micro-bursts,
   as discussed in section 3.

   We note that the question of whether or not micro-bursts need
   mitigation remains open. [JD03] suggests that TCP's bursting may
   need mitigation from the perspective of the network, while [BA05]
   suggests that micro-bursts often do not cause loss within the
   bursting connection. By specifying a particular mitigation
   technique, this document intends to draw community attention to the
   issue of micro-bursts, and attempts to generate discussion and
   further exploration and experimentation in the area.

2. Congestion Window Limiting (CWL)

   CWL introduces a new parameter called "BLimit", which represents the
   largest acceptable micro-burst a TCP should transmit. Each time an
   ACK is received that slides the transmission window, the congestion
   window (cwnd) modification (increase or decrease) procedures
   outlined in [RFC2581] MUST be applied.
   When using CWL, the following steps MUST be executed before any data
   is sent in response to the received ACK:

   (1) If cwnd > (FlightSize + BLimit), TCP will likely send a
       micro-burst and steps (2) and (3) MUST be used; otherwise, skip
       (2) and (3) and transmit data as usual. If this condition
       holds, the only case where a micro-burst will not occur is when
       not enough application data is available to transmit.

   (2) If ssthresh < cwnd then ssthresh MUST be set to cwnd.

   (3) Set cwnd = (FlightSize + BLimit).

   After these steps, available application data should be transmitted
   as allowed by the cwnd and the receiver's advertised window.

   CWL controls bursts by reducing cwnd when the ACK clock is lost or
   interrupted to the point where the cumulative ACK will trigger a
   burst of segments in excess of BLimit. History information
   maintained in ssthresh allows the connection to exponentially
   increase the cwnd (via slow start) back to the size before the
   reduction.

   BLimit SHOULD be chosen such that bursts are no larger than those
   allowed by [RFC3390]. From [RFC3390], we therefore choose:

       BLimit = min (4*MSS, max (2*MSS, 4380 bytes))               (1)

   If useful, BLimit MAY be smaller than allowed by equation (1).

3. Related Work

   CWL makes TCP congestion control more conservative and is therefore
   implicitly allowed by [RFC2581].

   Congestion Window Validation (CWV) [RFC2861] attempts to protect the
   network from a sender's incorrect or stale view of the available
   capacity along the path. [RFC2861] recommends (i) not increasing the
   cwnd when it is not fully used by an application-limited sender, and
   (ii) decaying the cwnd after a sufficiently long idle period to
   avoid use of an unvalidated cwnd. [RFC2861] suggests reducing the
   cwnd of an application-limited sender by half for each idle RTO
   interval.
   While CWV can prevent micro-bursts in some situations, this is
   accidental and not part of the problem CWV is trying to solve. CWL,
   on the other hand, aims at preventing micro-bursts by reducing the
   cwnd when appropriate, and in doing so, protects the network from an
   application-limited sender with stale cwnd information. CWL also
   prevents a cwnd from increasing during application-limited periods
   by limiting it to (FlightSize + BLimit). Note that CWL is more
   aggressive in reducing cwnd than [RFC2861].

   Several techniques have been proposed in the past for controlling
   micro-bursts, as follows:

   * As noted above, [FF96] introduces the "MaxBurst" mechanism.
     MaxBurst is an additional constraint that limits the number of
     data segments that can be transmitted in response to any given
     ACK. CWL provides a single control for the amount of data a TCP
     connection can transmit into the network at any given point. This
     is arguably a clean approach to controlling the load imposed on
     the network. On the other hand, by introducing a second control,
     MaxBurst provides for separation of concerns. In other words,
     limiting the sizes of micro-bursts is, in some sense, a different
     task than limiting the overall transmission rate to control
     network congestion; therefore, using two different controls may
     make sense. A drawback of MaxBurst is that the two transmission
     controllers may interact poorly, causing undesirable side
     effects. When BLimit == MaxBurst, CWL and MaxBurst perform
     similarly [AB05].
   * [HTH01] introduces an algorithm called "Use it or Lose it" (UI/LI)
     which modifies the cwnd to reflect the actual outstanding number
     of bytes, thereby controlling bursts in response to an ACK. UI/LI
     is used in SCTP [RFC2960,SA+05] and provides the basis for CWL.
     CWL extends UI/LI by modifying ssthresh and enabling a sender to
     slow start up to the last known safe cwnd (step (2) in the
     algorithm above).
     In the absence of explicitly setting ssthresh as part of the burst
     mitigation process, the UI/LI algorithm is non-deterministic in
     its use of slow start after reducing cwnd. [AB05] illustrates
     cases where slow start is used and cases where it is not used,
     simply depending on the state of the connection before UI/LI
     reduces the cwnd.
   * Rate-Based Pacing [VH97] imposes a limitation on the rate of
     sending, and prevents bursts by pacing data into the network until
     the ACK clock is established. Although this solution can be very
     effective in burst mitigation in some cases, it requires a new
     timer and parameters for pacing out the data segments. Further, as
     shown in [AB05], there are cases where there is no natural "lull"
     in the connection into which segments can be nicely paced.
     Therefore, the exact application of pacing requires more research.

4. Discussion

   We emphasize that the question of whether or not micro-bursts need
   mitigation remains open. While this document provides the
   specification for one mitigation technique based on current
   knowledge, continued research on bursts and alternative mitigation
   mechanisms is strongly encouraged.

   Finally, we note that some TCP stacks may already implement some
   form of micro-burst mitigation, although the mechanisms used may not
   be well understood and have not been through IETF community review.
   This document presents an initial step towards encouraging better
   understood and community reviewed micro-burst mitigation mechanisms.

5. Security Considerations

   This document calls for reducing the congestion window during loss
   of TCP's ACK clock. An attacker can therefore reduce throughput of a
   TCP connection by causing ACK loss or reordering of data or ACKs.

6. IANA Considerations

   None.

Acknowledgments

   Discussions with Sally Floyd have shaped some of the thinking that
   is contained in this document.

Normative References

   [RFC2119] S. Bradner.
             Key words for use in RFCs to Indicate Requirement Levels,
             March 1997. BCP 14, RFC 2119.

   [RFC2581] M. Allman, V. Paxson, W. Stevens. TCP Congestion Control,
             April 1999. RFC 2581.

Informative References

   [RFC2861] M. Handley, J. Padhye, S. Floyd. TCP Congestion Window
             Validation, June 2000. RFC 2861.

   [AB05]    M. Allman, E. Blanton. Notes on Burst Mitigation for
             Transport Protocols. ACM Computer Communication Review,
             35(2), April 2005.

   [BA05]    E. Blanton, M. Allman. On the Impact of Bursting on TCP
             Performance. Proceedings of the Workshop for Passive and
             Active Measurement, March 2005.

   [FF96]    K. Fall, S. Floyd. Simulation-based Comparisons of Tahoe,
             Reno, and SACK TCP. Computer Communication Review, 26(3),
             July 1996.

   [HTH01]   A. Hughes, J. Touch, J. Heidemann. Issues in TCP
             Slow-Start Restart After Idle. Internet draft, December
             2001 (expired). URL:
             http://www.isi.edu/touch/pubs/draft-hughes-restart-00.txt.

   [JD03]    H. Jiang, C. Dovrolis. Source-Level IP Packet Bursts:
             Causes and Effects. In ACM SIGCOMM/Usenix Internet
             Measurement Conference, October 2003.

   [SA+05]   R. Stewart, I. Arias-Rodriguez, K. Poon, A. Caro,
             M. Tuexen. SCTP Specification Errata and Issues. Internet
             draft, October 2005 (work in progress).

   [VH97]    V. Visweswaraiah, J. Heidemann. Improving Restart of Idle
             TCP Connections. Technical Report 97-661, University of
             Southern California, November 1997.

   [ZSC91]   L. Zhang, S. Shenker, D. Clark. Observations on the
             Dynamics of a Congestion Control Algorithm: The Effects
             of Two-Way Traffic. ACM SIGCOMM, September 1991.
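
Appendix: Illustrative Sketch of CWL

   The CWL steps and equation (1) from section 2 can be sketched in
   code. The following Python fragment is a minimal, non-normative
   sketch under stated assumptions and is not part of the
   specification. The names blimit, CwlState, and apply_cwl are
   hypothetical, all quantities are assumed to be in bytes, and the
   sketch assumes the cwnd adjustment of [RFC2581] has already been
   applied for the received ACK before apply_cwl is called.

```python
from dataclasses import dataclass


def blimit(mss: int) -> int:
    """Burst limit per equation (1): min(4*MSS, max(2*MSS, 4380 bytes))."""
    return min(4 * mss, max(2 * mss, 4380))


@dataclass
class CwlState:
    """Hypothetical per-connection state; every field is in bytes."""
    cwnd: int
    ssthresh: int
    flight_size: int


def apply_cwl(state: CwlState, burst_limit: int) -> bool:
    """Apply CWL steps (1)-(3) in place; return True if cwnd was limited.

    Assumption: the [RFC2581] cwnd update for this ACK was already done.
    """
    limit = state.flight_size + burst_limit
    # Step (1): if cwnd does not exceed FlightSize + BLimit, no
    # micro-burst larger than BLimit is possible; change nothing.
    if state.cwnd <= limit:
        return False
    # Step (2): record the pre-reduction cwnd in ssthresh so that slow
    # start can later grow cwnd back to its previous size.
    if state.ssthresh < state.cwnd:
        state.ssthresh = state.cwnd
    # Step (3): clamp cwnd so at most burst_limit new bytes can be sent.
    state.cwnd = limit
    return True


if __name__ == "__main__":
    mss = 1460
    s = CwlState(cwnd=20 * mss, ssthresh=10 * mss, flight_size=4 * mss)
    limited = apply_cwl(s, blimit(mss))
    print(limited, s.cwnd, s.ssthresh)
```

   With an MSS of 1460 bytes, equation (1) gives a BLimit of 4380
   bytes. In the example, a sender with 4 segments (5840 bytes) in
   flight and a cwnd of 20 segments (29200 bytes) has its cwnd reduced
   to 10220 bytes, while ssthresh is raised from 14600 to 29200 bytes
   so that slow start can restore the earlier window.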
Authors' Addresses

   Janardhan Iyengar
   Protocol Engineering Lab, CIS Department
   University of Delaware
   103 Smith Hall
   Newark, DE 19716
   Email: iyengar@cis.udel.edu
   URL: http://www.cis.udel.edu/~iyengar/

   Mark Allman
   ICSI Center for Internet Research
   1947 Center Street, Suite 600
   Berkeley, CA 94704-1198
   Phone: (440) 235-1792
   Email: mallman@icir.org
   URL: http://www.icir.org/mallman/

   Ethan Blanton
   Purdue University Computer Sciences
   250 North University Street
   West Lafayette, IN 47907
   Email: eblanton@cs.purdue.edu
   URL: http://www.cs.purdue.edu/homes/eblanton/

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.
Disclaimer of Validity

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.