TCPM Working Group                                              J. Touch
Internet Draft                                                   USC/ISI
Intended status: Standards Track                           July 16, 2012
Expires: January 2013

                  Automating the Initial Window in TCP
                  draft-touch-tcpm-automatic-iw-03.txt

Status of this Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 16, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Touch, (TBD)            Expires January 16, 2013                [Page 1]

Internet-Draft    Automating the Initial Window in TCP         July 2012

Abstract

   The Initial Window (IW) provides the starting point for TCP's feedback-based congestion control algorithm.
   Its value has grown over time to improve performance and to reflect the increased capability of Internet devices. This document describes a mechanism to adjust the IW over long timescales, so that future changes can be deployed more safely and reexamination of this value can potentially be avoided in the future.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Design Considerations
   4. Proposed IW Algorithm
   5. Discussion
   6. Observations
   7. Security Considerations
   8. IANA Considerations
   9. Conclusions
   10. References
      10.1. Normative References
      10.2. Informative References
   11. Acknowledgments

1. Introduction

   TCP's congestion control algorithm uses an initial window value (IW), both as a starting point for new connections and after one RTO or more [RFC2581][RFC2861]. This value has evolved over time: it was originally one maximum segment size (MSS), and was later increased to the lesser of four MSS or 4,380 bytes [RFC3390][RFC5681]. For typical Internet connections with a maximum transmission unit (MTU) of 1500 bytes, this permits three segments of 1,460 bytes each.

   The IW value was first implied in the original TCP congestion control description, and documented as a standard in 1997 [RFC2001][Ja88]. The value was last updated in 1998 experimentally, and moved to the standards track in 2002 [RFC2414][RFC3390].
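   The IW arithmetic described above can be checked with a short sketch. It assumes the full formula from RFC 3390, min(4*MSS, max(2*MSS, 4380 bytes)), of which "the lesser of four MSS or 4,380 bytes" is the common case. The function names are hypothetical and are used only for illustration.

```python
def rfc3390_initial_window(mss: int) -> int:
    """Upper bound on the initial window, in bytes, for a given MSS.

    Assumes the RFC 3390 formula: min(4*MSS, max(2*MSS, 4380 bytes)).
    """
    return min(4 * mss, max(2 * mss, 4380))


def initial_window_segments(mss: int) -> int:
    """Number of full-sized segments that fit in the initial window."""
    return rfc3390_initial_window(mss) // mss


if __name__ == "__main__":
    # A 1500-byte MTU typically leaves an MSS of 1,460 bytes.
    for mss in (536, 1460, 4000):
        print(mss, rfc3390_initial_window(mss), initial_window_segments(mss))
```

   For an MSS of 1,460 bytes the 4,380-byte cap applies, giving three segments; for small segments the limit of four MSS applies instead, and for large segments the window falls back to two MSS.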
   There have been recent proposals to update the IW based on further increases in host and router capabilities and network capacity, some focusing on specific values (e.g., IW=10), and others prescribing a schedule for increases over time (e.g., IW=6 for 2011, increasing by 1-2 MSS per year).

   This document proposes that TCP can objectively measure when an IW is too large, and that such feedback should be used over long timescales to adjust the IW automatically. The result should be safer to deploy and might avoid the need to repeatedly revisit IW size over time.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [RFC2119].

   In this document, these words will appear with that interpretation only when in ALL CAPS. Lower case uses of these words are not to be interpreted as carrying RFC-2119 significance.

   In this document, the characters ">>" preceding an indented line(s) indicate a compliance requirement statement using the key words listed above. This convention aids reviewers in quickly identifying or finding the explicit compliance requirements of this RFC.

3. Design Considerations

   TCP's IW value has existed statically for over two decades, so any solution to adjusting the IW dynamically should have similarly stable, non-invasive effects on the performance and complexity of TCP. In order to be fair, the IW should be similar for most machines on the public Internet. Finally, a desirable goal is to develop a self-correcting algorithm, so that IW values that cause network problems can be avoided.
   To that end, we propose the following list of design goals:

   o  Have little to no impact on TCP in the absence of loss, i.e., it should not increase the complexity of default packet processing in the normal case.

   o  Adapt to network feedback over long timescales, avoiding values that persistently cause network problems.

   o  Decrease the IW in the presence of sustained loss of IW segments, as determined over a number of different connections.

   o  Increase the IW in the absence of sustained loss of IW segments, as determined over a number of different connections.

   o  Operate conservatively, i.e., tend towards leaving the IW the same in the absence of sufficient information, and give greater consideration to IW segment loss than IW segment success.

   We expect that, without other context, a good IW algorithm will converge to a single value, but this is not required. An endpoint with additional context or information, or deployed in a constrained environment, can always use a different value. Specifically, information from previous connections, or sets of connections with a similar path, can already be used as context for such decisions [RFC2140].

   However, if a given IW value persistently causes packet loss during the initial burst of packets, it is clearly inappropriate and could be inducing unnecessary loss in other competing connections. This might happen for sites behind very slow boxes with small buffers, which may or may not be the first hop.

4. Proposed IW Algorithm

   Below is a simple description of the proposed IW algorithm. It relies on the following parameters:

   o  MinIW = 3 MSS or 4,380 bytes (as per [RFC3390])

   o  MaxIW = 10 MSS

   o  MulDecr = 0.5

   o  AddIncr = 2 MSS

   o  Threshold = 0.05

   We assume that the minimum IW (MinIW) should be as currently specified [RFC3390].
   The maximum IW can be set to a fixed value [Ch10], or set based on a schedule if trusted time references are available [Al10]; here we prefer a fixed value. We also propose to use an AIMD algorithm, with increases and decreases as noted.

   Although these parameters are somewhat arbitrary, their initial values are not important except that the algorithm is AIMD and the MaxIW should not exceed that recommended for other systems on the Internet. Current proposals, including default current operation, are degenerate cases of the algorithm below for given parameters - notably MulDecr = 1.0 and AddIncr = 0 MSS, thus disabling the automatic part of the algorithm.

   The proposed algorithm is as follows:

   0. On boot:

         IW = MaxIW; # assume this is in bytes, and an even number of MSS

   1. Upon starting a new connection:

         CWND = IW;
         conncount++;
         IWnotchecked = 1; # true

   2. During a connection's SYN-ACK processing, if the SYN-ACK includes ECN, treat as if the IW is too large:

         if (IWnotchecked && (synackecn == 1)) {
            losscount++;
            IWnotchecked = 0; # never check again
         }

   3. During a connection, if retransmission occurs, check the seqno of the outgoing packet (in bytes) to see if the resent segment fixes an IW loss:

         if (Retransmitting && IWnotchecked && ((seqno - ISN) < IW)) {
            losscount++;
            IWnotchecked = 0; # never do this entire "if" again
         } else {
            IWnotchecked = 0; # you're beyond the IW so stop checking
         }

   4. Once every 1000 connections, as a separate process (i.e., not as part of processing a given connection):

         if (conncount > 1000) {
            if (losscount/conncount > Threshold) {
               # the number of connections with errors is too high
               IW = IW * MulDecr;
            } else {
               IW = IW + AddIncr;
            }
         }

   We recognize that this algorithm can yield a false positive when the sequence number wraps around.
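   The wraparound case can be illustrated with a minimal sketch. It assumes the step 3 check is computed as the byte offset of the retransmitted segment from the ISN, using 32-bit modular arithmetic, and it contrasts that with an offset computed from sequence numbers tracked internally as 64-bit values. All function names are hypothetical.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap modulo 2^32


def in_initial_window_32(isn: int, seqno: int, iw_bytes: int) -> bool:
    """IW check on 32-bit wire sequence numbers (modular offset from the ISN)."""
    return (seqno - isn) % SEQ_MOD < iw_bytes


def in_initial_window_64(isn64: int, seqno64: int, iw_bytes: int) -> bool:
    """IW check on extended sequence numbers that never wrap in practice."""
    return 0 <= seqno64 - isn64 < iw_bytes


def to_wire(seqno64: int) -> int:
    """Reduce an extended sequence number to its 32-bit wire value."""
    return seqno64 % SEQ_MOD


if __name__ == "__main__":
    iw = 10 * 1460
    isn = 4_000_000_000
    early = isn + 1000            # retransmission inside the initial window
    late = isn + SEQ_MOD + 1000   # retransmission 4 GiB later in the stream
    print(in_initial_window_32(isn, to_wire(early), iw),
          in_initial_window_64(isn, early, iw))
    print(in_initial_window_32(isn, to_wire(late), iw),
          in_initial_window_64(isn, late, iw))
```

   For the late retransmission, the 32-bit check reports an IW loss even though the resent data lies 4 GiB beyond the initial window, while the 64-bit check does not. That mismatch is the false positive.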
   This can be avoided using either PAWS [RFC1323] context or 64-bit internal sequence numbers (as in TCP-AO [RFC5925]). Alternatively, false positives can be allowed since they are expected to be infrequent and thus will not affect the overall statistics of the algorithm.

   The following additional constraints are imposed:

   >> The automatic IW algorithm MUST initialize to MaxIW, in the absence of other context information.

   If there are too few connections to make a decision, or if there is otherwise insufficient information to increase the IW, then the MaxIW defaults to the current recommended value.

   >> An implementation MAY allow the MaxIW to grow beyond the currently recommended Internet default, but not more than 2 segments per calendar year.

   If an endpoint has a persistent history of successfully transmitting IW segments without loss, then it is allowed to probe the Internet to determine if larger IW values have similar success. This probing is limited and requires a trusted time source, otherwise the MaxIW remains constant.

   >> An implementation MUST adjust the IW based on loss statistics at least once every 1000 connections.

   An endpoint needs to be sufficiently reactive to IW loss.

   >> An implementation MUST decrease the IW by at least one MSS when indicated during an evaluation interval.

   An endpoint that detects loss needs to decrease its IW by at least one MSS, otherwise it is not participating in an automatic reactive algorithm.

   >> An implementation MUST increase by no more than 2 MSS per evaluation interval.

   An endpoint that does not experience IW loss needs to probe the network incrementally.

   >> An implementation SHOULD use an IW that is an integer multiple of 2 MSS.

   The IW should remain a multiple of 2 MSS segments, to enable efficient ACK compression without incurring unnecessary timeouts.

   >> An implementation MUST decrease the IW if more than 95% of connections have IW losses.

   Again, this is to ensure an implementation is sufficiently reactive.

   >> An implementation MAY group IW values and statistics within subsets of connections. Such grouping MAY use any information about connections to form groups except loss statistics.

   There are some TCP connections which might not be counted at all, such as those to/from loopback addresses, or those within the same subnet as that of a local interface (for which congestion control is sometimes disabled anyway). This may also include connections that terminate before the IW is full, i.e., as a separate check at the time of the connection closing.

   The period over which the IW is updated is intended to be a long timescale, e.g., a month or so, or 1,000 connections, whichever is longer. An implementation might check the IW once a month, and simply not update the IW or clear the connection counts in months where the number of connections is too small.

5. Discussion

   There are numerous parameters to the above algorithm that are compliant with the given requirements; this is intended to allow variation in configuration and implementation while ensuring that all such algorithms are reactive and safe.

   This algorithm continues to assume segments because that is the basis of most TCP implementations. It might be useful to consider revising the specifications to allow byte-based congestion given sufficient experience.

   The algorithm checks for IW losses only during the first IW after a connection start; it does not check for IW losses elsewhere that the IW is used, e.g., during slow-start restarts.

   >> An implementation MAY detect IW losses during slow-start restarts in addition to losses during the first IW of a connection.
   In this case, the implementation MUST count each restart as a "connection" for the purposes of connection counts and periodic rechecking of the IW value.

   False positives can occur during some kinds of segment reordering, e.g., that might trigger spurious retransmissions even without a true segment loss. These are not expected to be sufficiently common to dominate the algorithm and its conclusions.

   This mechanism does require additional per-connection state which is currently common in some implementations, and is useful for other reasons (e.g., the ISN is used in TCP-AO [RFC5925]). The mechanism also benefits from persistent state kept across reboots, as would be other state sharing mechanisms (e.g., TCP Control Block Sharing [RFC2140]). The mechanism is inspired by RFC 2140's use of information across connections.

   The receive window (RWIN) is not involved in this calculation. The size of RWIN is determined by receiver resources, and provides space to accommodate segment reordering. It is not involved with congestion control, which is the focus of this document and its management of the IW.

6. Observations

   The IW may not converge to a single, global value. It also may not converge at all, but rather may oscillate by a few MSS as it repeatedly probes the Internet for larger IWs and fails. Both properties are consistent with TCP behavior during each individual connection.

   This mechanism assumes that losses during the IW are due to IW size. Persistent errors that drop packets for other reasons (e.g., OS bugs) can cause false positives. Again, this is consistent with TCP's basic assumption that loss is caused by congestion and requires backoff. This algorithm treats the IW of new connections as a long-timescale backoff system.

7. Security Considerations

   This algorithm presents an opportunity for an intelligent attack to reduce the IW of a given system, by repeatedly dropping packets during the IW only. An intermediary that can drop packets in a controlled manner can already impact the performance of a connection, and can reduce the congestion window of an ongoing connection in ways that impact performance more than just dropping during the IW.

8. IANA Considerations

   This document has no IANA considerations. This section should be removed prior to publication.

9. Conclusions

10. References

10.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3390] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's Initial Window", RFC 3390 (Standards Track), Oct. 2002.

   [RFC5681] Allman, M., Paxson, V., Blanton, E., "TCP Congestion Control", RFC 5681 (Standards Track), Sep. 2009.

10.2. Informative References

   [Al10]    Allman, M., "Initial Congestion Window Specification", (work in progress), draft-allman-tcpm-bump-initcwnd-00, Nov. 2010.

   [Ch10]    Chu, J., Dukkipati, N., Cheng, Y., Mathis, M., "Increasing TCP's Initial Window", (work in progress), draft-ietf-tcpm-initcwnd-04, Jun. 2012.

   [Ja88]    Jacobson, V., M. Karels, "Congestion Avoidance and Control", Proc. Sigcomm 1988.

   [RFC1323] Jacobson, V., Braden, R., Borman, D., "TCP Extensions for High Performance", RFC 1323, May 1992.

   [RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms", RFC 2001 (Standards Track), Jan. 1997.

   [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140 / STD 7 (Informational), Apr. 1997.

   [RFC2414] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's Initial Window", RFC 2414 (Experimental), Sept. 1998.
   [RFC2581] Allman, M., Paxson, V., Stevens, W., "TCP Congestion Control", RFC 2581 (Standards Track), Apr. 1999.

   [RFC2861] Handley, M., Padhye, J., Floyd, S., "TCP Congestion Window Validation", RFC 2861 (Experimental), June 2000.

   [RFC5925] Touch, J., A. Mankin, R. Bonica, "The TCP Authentication Option", RFC 5925 (Standards Track), June 2010.

11. Acknowledgments

   Mark Allman and Aki Nyrjinen contributed to the development of this algorithm. Members of the TCPM mailing list also participated in providing useful feedback.

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Joe Touch
   USC/ISI
   4676 Admiralty Way
   Marina del Rey, CA 90292-6695
   U.S.A.

   Phone: +1 (310) 448-9151
   Email: touch@isi.edu
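   As a closing illustration, the periodic AIMD evaluation of Section 4 (step 4) might be sketched as follows. This is not a definitive implementation. It makes several assumptions beyond the draft's pseudocode: an MSS of 1,460 bytes, MaxIW read as 10 MSS, evaluation once at least 1000 connections have been counted, clamping of the result to [MinIW, MaxIW], rounding down to a whole number of MSS, and clearing of the counters after each evaluation. The class and function names are hypothetical.

```python
from dataclasses import dataclass

MSS = 1460            # assumed segment size in bytes (not fixed by the draft)
MIN_IW = 3 * MSS      # MinIW
MAX_IW = 10 * MSS     # MaxIW, assuming "10" means 10 MSS
MUL_DECR = 0.5        # MulDecr
ADD_INCR = 2 * MSS    # AddIncr
THRESHOLD = 0.05      # Threshold
EVAL_CONNS = 1000     # connections per evaluation interval


@dataclass
class IwState:
    iw: int = MAX_IW      # step 0: start at MaxIW
    conncount: int = 0
    losscount: int = 0


def record_connection(state: IwState, iw_loss: bool) -> None:
    """Count one connection and whether it saw a loss within the IW."""
    state.conncount += 1
    if iw_loss:
        state.losscount += 1


def evaluate(state: IwState) -> bool:
    """Apply one AIMD step if enough connections were seen; return True if so."""
    if state.conncount < EVAL_CONNS:
        return False  # too little information: leave the IW unchanged
    if state.losscount / state.conncount > THRESHOLD:
        new_iw = int(state.iw * MUL_DECR)   # multiplicative decrease
    else:
        new_iw = state.iw + ADD_INCR        # additive increase
    new_iw -= new_iw % MSS                  # assumption: keep whole segments
    state.iw = max(MIN_IW, min(MAX_IW, new_iw))  # assumption: clamp to bounds
    state.conncount = 0                     # assumption: start a new interval
    state.losscount = 0
    return True
```

   With these values, an interval in which 10% of connections lose an IW segment halves the IW from 10 MSS to 5 MSS, and a following loss-free interval raises it to 7 MSS. The sketch does not enforce the recommendation that the IW be an integer multiple of 2 MSS.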