TRIGTRAN (BoF)                                                S. Dawkins
Internet-Draft                                                 MCSR Labs
Expires: November 20, 2003                                  May 22, 2003

               End-to-end, Implicit "Link-Up" Notification
                  draft-dawkins-trigtran-linkup-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on November 20, 2003.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

The Performance Implications of Link Characteristics [PILC] working group is recommending an end-to-end implicit notification when an access link outage ends. This document codifies the "Link Up Notification" for TCP.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Dawkins                 Expires November 20, 2003                 [Page 1]

Internet-Draft          "Link-Up" Notifications                   May 2003

1. Introduction

The Transmission Control Protocol (TCP) [RFC793] uses a retransmission timer to ensure data delivery in the absence of any feedback from a remote data receiver, and prescribes an "exponential backoff" for this timer in cases where retransmissions are also unacknowledged.
This timer can grow to a very large value (it is often capped at 64 seconds, but even this limit is not required by standards-track specifications). Exponential backoff is necessary to prevent sustained congestion when loss is caused by congestion, but it can make for an unnecessarily unpleasant user experience when loss is caused by link outages in a wireless environment.

The Performance Implications of Link Characteristics [PILC] working group is recommending an end-to-end implicit notification when an access link outage ends [LINK, section 8.2]. The goal is to allow sending transports to retransmit in a timely fashion without modifying the exponential backoff mechanism. This notification was well supported at the TRIGTRAN BoF at IETF 56 [TRIGTRAN56].

This note describes a method of "short-circuiting" a "backed-off" retransmission timer when a host detects that a local interface has become operational again. The TCP using that interface sends a "Link Up Notification" (or "LUN") to its peer, telling the peer's sending TCP that another retransmission attempt may be appropriate.

2. Problem Statement

The Transmission Control Protocol (TCP) [RFC793] uses a retransmission timer to ensure data delivery in the absence of any feedback from a remote data receiver. The timeout value for this timer, called the retransmission timeout (RTO), is calculated using the algorithm specified in [RFC2988]; the expiry of the timer is also called an "RTO" in the rest of this document. When an RTO occurs, the sender retransmits the earliest unacknowledged segment. If this retransmitted segment is also unacknowledged, the sender waits twice as long before attempting another retransmission, and the wait doubles again with each further unacknowledged retransmission.

The initial value of RTO is 3 seconds. During normal operation, subsequent values track a smoothed average of the measured round-trip time (RTT) plus four times a smoothed estimate of the RTT variation, with a lower bound of 1 second.
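To make the cost of backoff concrete, the doubling described above can be sketched as follows. This is a minimal, hypothetical illustration rather than part of the proposal: the function names are invented, times are in seconds measured from the original transmission, and the optional 64-second cap is applied unless `max_rto` is set to infinity.

```python
def retransmission_times(initial_rto: float, attempts: int,
                         max_rto: float = 64.0) -> list:
    """Times (seconds after the original transmission) at which each
    retransmission is sent when every attempt goes unacknowledged.

    Hypothetical helper: doubles the RTO after each timeout, as in
    RFC 2988 section 5.5, bounded by the optional max_rto cap.
    """
    times = []
    now, rto = 0.0, initial_rto
    for _ in range(attempts):
        now += rto                   # timer expires: retransmit
        times.append(now)
        rto = min(rto * 2, max_rto)  # back off before the next attempt
    return times


def silence_after_link_up(link_up_at: float, times: list) -> float:
    """Seconds the sender stays silent after the link returns, i.e. until
    the next scheduled retransmission (the delay a LUN tries to remove)."""
    for t in times:
        if t >= link_up_at:
            return t - link_up_at
    raise ValueError("link returned after the last retransmission attempt")


if __name__ == "__main__":
    schedule = retransmission_times(1.0, 8)       # 64-second cap applied
    print(schedule)
    print(silence_after_link_up(64.0, schedule))  # link back at t = 64 s
```

With a 1-second RTO, eight unacknowledged retransmissions fall at 1, 3, 7, 15, 31, 63, 127 and 191 seconds. If the access link comes back at 64 seconds, the sender remains silent for another 63 seconds even though no congestion was present.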
When a segment is lost and cannot be recovered by other means (such as Fast Retransmit), the RTO that triggers the first retransmission attempt will be as short as is "reasonable": because the RTO is calculated from the measured RTT, the timer expires only when there is a reasonable expectation that no acknowledgement for the outstanding data will still arrive. This might be characterized as "as soon as possible, but no sooner".

All is well if the retransmitted segment is acknowledged. If it is not, the TCP waits twice as long before retransmitting again, and continues to double the RTO interval each time a retransmission goes unacknowledged. This behavior is conservative, ensuring that sending TCPs "back off" in the presence of path congestion.

This desirable property comes at a price: RTO values quickly grow to tens of seconds between retransmission attempts, a painfully slow interval if a human being is "in the loop". BSD-derived TCPs eventually "cap" the RTO at 64 seconds, but this cap is not required [RFC2988]; conformant TCPs are allowed to keep increasing the RTO to several minutes between retransmission attempts.

If an RTO has happened because of path congestion, long and growing periods of RTO-based "silence" are necessary so that the sending TCP does not sustain, or even worsen, path congestion at a time when it is receiving no feedback from the receiver.

If an RTO has happened because of an access link failure (an all-too-common situation when the access link is wireless), and the access link then becomes available again, the unexpired portion of the RTO period is not needed to prevent sustained congestion, because no congestion was occurring.
However, today's sending TCPs cannot know that this is the case. They have no indication that an RTO was caused by an access link failure, and must make the conservative assumption that packets are being lost due to congestion.

In practice, a "human in the loop" who is faced with minutes of inactivity will often abandon the operation and "try again", for instance by pressing the "stop" and "reload" buttons in a web browser. These operations often reset or abandon existing TCP connections, causing TCPs to discard learned path characteristics and adding packets to the network (SYN/SYN-ACK exchanges for new connections, etc.). It is desirable to prevent this where possible.

2.1 A Historical Note: "Kicking" TCP

In [LINK], the IETF PILC working group recommends retransmitting packets on an interface that has returned to operational status. [LINK] documents informal practice, but additional details are required for standards-track TCPs.

"Kicking TCP" takes its name from Phil Karn's posting to the PILC mailing list [LINKNOTE], proposing that routers driving subnetworks subject to lengthy outages "try to hold onto the last IP packet of each flow when a link goes down and forward it to its destination when the link comes back up".

Ideally, a "Link Up Notification" (or "LUN") would be carried in an ICMP message, but in today's Internet a TCP packet belonging to an existing connection is more likely to reach its destination across border gateways, firewalls, and NATs. "Kicking TCP" takes advantage of this: the LUN is simply a packet that has already been transmitted on an existing connection.

This document takes "Kicking TCP" as a starting point and extends it by adding sender-side behavior for apparently duplicated packets received on an RTOed TCP connection.
2.2 Applicability Statement

Hosts supporting TCP-based applications over subnetwork interfaces subject to multi-second outages MAY perform the actions described in Section 3. These actions are most beneficial for "human-in-the-loop" applications, but are acceptable for any TCP-based application.

All hosts supporting TCP-based applications SHOULD perform the actions described in Section 4.

3. When a Local Interface Returns to "UP"

If a host contains a local interface that is subject to frequent and lengthy outages, the host's subnetwork implementation MAY retain a copy of "the last" packet transmitted on each TCP connection. When the subnetwork implementation detects that the local interface has returned to "UP" status, it MAY retransmit the last packet stored for each TCP connection.

3.1 Layering Violation Tradeoffs

This proposal casually assumes that subnetwork implementations can track TCP connections between two end hosts. This is a layering violation. If an implementation finds it more convenient to provide "local link up" indications to its own TCP, LUN functionality can be implemented in the TCP/IP stack instead.

Not all subnetwork implementations are able to distinguish between TCP connections. In this case, the subnetwork may choose to store one packet per destination host.

TCP source and destination port numbers are not visible to the subnetwork when the host is using the IPsec Encapsulating Security Payload [ESP], because this cryptographic privacy mechanism encrypts the TCP header. In these cases, the subnetwork may also choose to store one packet per destination host.

If a host is storing one packet per destination host, the stored packet should be the most recently transmitted one, to maximize the probability that a LUN will restart an active TCP connection.
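The per-connection storage and replay described in Sections 3 and 3.1 might look like the following sketch. It is a hypothetical model, not a real subnetwork API: packets are opaque byte strings, flows are keyed by (source, source port, destination, destination port) when ports are visible and by destination address otherwise (for example under ESP), and the `send` callback stands in for whatever transmit routine the subnetwork provides. Eviction of closed connections and the pacing limits of Section 3.2 are left out.

```python
from typing import Callable, Dict, Hashable, Optional


class LunCache:
    """Keeps a copy of the last packet sent on each flow so it can be
    replayed as a Link Up Notification (LUN) when the interface is UP again.

    Hypothetical model: packets are opaque bytes; the caller supplies the
    addresses and, when visible, the TCP ports of each outgoing packet.
    """

    def __init__(self, per_destination: bool = False) -> None:
        # per_destination=True models a subnetwork that cannot tell TCP
        # connections apart and keeps one packet per destination host.
        self.per_destination = per_destination
        self._last: Dict[Hashable, bytes] = {}

    def _key(self, src: str, sport: Optional[int],
             dst: str, dport: Optional[int]) -> Hashable:
        # Ports are unavailable (None) when, e.g., ESP encrypts the TCP header.
        if self.per_destination or sport is None or dport is None:
            return dst
        return (src, sport, dst, dport)

    def on_transmit(self, src: str, sport: Optional[int],
                    dst: str, dport: Optional[int], packet: bytes) -> None:
        # Overwrite any older copy: only the most recent packet is kept.
        self._last[self._key(src, sport, dst, dport)] = packet

    def on_link_up(self, send: Callable[[bytes], None]) -> int:
        """Retransmit one stored packet per flow; return the number sent."""
        for packet in self._last.values():
            send(packet)
        return len(self._last)
```

Because each key holds only the latest copy, the per-destination fallback replays whichever connection to that host was most recently active, which is what Section 3.1 recommends.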
3.2 Stopping the Babbling

LUNs are intended as an end-to-end implicit notification to a peer TCP, not as a reliable signal. If a LUN is itself lost due to a new link outage, no additional LUNs will be sent unless the local interface "cycles" again.

Some subnetwork technologies can cycle between operational and non-operational status very rapidly. To prevent "LUN storms", hosts MUST wait at least one second (the minimum RTO value) after an interface becomes operational before sending a LUN. Modified hosts MUST NOT send LUNs more frequently than once every three seconds. This restriction matches the initial RTO value (3 seconds) for a new TCP connection.

4. When an RTOed TCP Sender Receives a LUN

The LUN described in Section 3 will carry an acknowledgement number if the TCP connection has advanced to the ESTABLISHED state. There are several possibilities (using [RFC793]-style notation):

   1. SND.NXT < SEG.ACK: the receiver has retransmitted an acknowledgement for data that has not been sent yet. [RFC793] specifies that such a segment be answered with an ACK and then dropped.

   2. SND.UNA < SEG.ACK <= SND.NXT: the receiver has retransmitted a "new" ACK that the sender has not seen. The TCP processes this segment normally: it removes the acknowledged segments from the retransmission queue and performs slow start (since the connection is already in RTO).

   3. SEG.ACK <= SND.UNA: the receiver has retransmitted a "duplicate" ACK that the sender has seen previously. Normally, this segment would be ignored (as having been duplicated or reordered by the IP network).

This memo adds the following TCP mechanism: for a connection in RETRANSMISSION-WAIT (that is, a connection that has experienced an RTO and is waiting for its backed-off retransmission timer to expire), the sending TCP SHOULD perform slow start on receiving a duplicate ACK (case 3), retransmitting the earliest unacknowledged segment without waiting for the retransmission timer to expire.

OPEN ISSUE: should we tighten the criteria for a "duplicated ACK", so that we only trigger on a LUN for the "most recent" ACK transmitted? (Perhaps SND.UNA - SEG.ACK <= PMTU? Is this doable in most TCP implementations?)
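The pacing rules of Section 3.2 and the sender-side cases above can be sketched together as follows. This is an illustrative model under stated assumptions, not a TCP implementation: the class and function names are hypothetical, times are in seconds, the three-second limit is assumed to apply per interface to each batch of LUNs sent on link-up (the draft does not say whether it is per interface, per host, or per connection), sequence numbers are compared modulo 2^32 as TCP requires, and "slow-start" stands for retransmitting the earliest unacknowledged segment immediately.

```python
from enum import Enum
from typing import Optional

SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def seq_lt(a: int, b: int) -> bool:
    """True if sequence number a precedes b (modular comparison)."""
    return 0 < (b - a) % SEQ_MOD < 2 ** 31


class LunPacer:
    """Section 3.2 limits, assumed here to apply per interface."""
    SETTLE = 1.0   # wait at least 1 s after the interface comes up
    MIN_GAP = 3.0  # at most one LUN batch every 3 s

    def __init__(self) -> None:
        self.up_since: Optional[float] = None
        self.last_lun: Optional[float] = None

    def link_up(self, now: float) -> None:
        self.up_since = now

    def link_down(self) -> None:
        self.up_since = None  # last_lun is kept, so flapping cannot reset it

    def allow_lun(self, now: float) -> bool:
        """Return True (and record the send) if LUNs may be sent now."""
        if self.up_since is None or now - self.up_since < self.SETTLE:
            return False
        if self.last_lun is not None and now - self.last_lun < self.MIN_GAP:
            return False
        self.last_lun = now
        return True


class AckCase(Enum):
    UNSENT = 1     # case 1: SND.NXT < SEG.ACK
    NEW = 2        # case 2: SND.UNA < SEG.ACK <= SND.NXT
    DUPLICATE = 3  # case 3: SEG.ACK <= SND.UNA


def classify_ack(seg_ack: int, snd_una: int, snd_nxt: int) -> AckCase:
    if seq_lt(snd_nxt, seg_ack):
        return AckCase.UNSENT
    if seq_lt(snd_una, seg_ack):
        return AckCase.NEW
    return AckCase.DUPLICATE


def sender_action(case: AckCase, in_retransmission_wait: bool) -> str:
    """What the sending TCP does with an arriving ACK under this memo."""
    if case is AckCase.UNSENT:
        return "drop"      # RFC 793: send an ACK, then drop the segment
    if case is AckCase.NEW:
        return "process"   # normal ACK processing (slow start if in RTO)
    # Case 3: ignored today; this memo adds slow start in RETRANSMISSION-WAIT.
    return "slow-start" if in_retransmission_wait else "ignore"
```

For example, with SND.UNA = 1000 and SND.NXT = 2000, an ACK of 1000 is a duplicate (case 3) and leads to slow start only if the connection is in RETRANSMISSION-WAIT, while an ACK of 1500 is a new ACK (case 2) and is processed normally.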
5. Security Considerations

This memo describes a (small) change in TCP behavior. The procedures defined in this memo will cause sending hosts to retransmit one packet per RTOed connection before their RTO timers would have expired (at which point the sending host would have retransmitted one packet per connection anyway).

The procedures defined in this memo may cause a TCP to "give up" on an RTOed connection more rapidly than it would have previously. For instance, modified BSD-derived TCPs may still attempt retransmission 12 times and then abandon the connection, even if LUNs cause those retransmissions to take place before an RTO timer would have expired.

This memo assumes that fully backed-off TCP connections for interactive applications will often be abandoned anyway, resulting in additional traffic (SYN/SYN-ACKs, etc.), so these considerations may be outweighed by the traffic avoided in these situations.

6. IANA Considerations

There are no IANA considerations for this document.

Author's Address

Spencer Dawkins
MCSR Labs
1547 Rivercrest Blvd.
Allen, TX 75002
US

Phone: +1-972-727-9834
EMail: spencer@mcsr-labs.org

Appendix A. References

[ESP]: "IP Encapsulating Security Payload (ESP)", S. Kent, R. Atkinson, November 1998 [ftp://ftp.rfc-editor.org/in-notes/rfc2406.txt]

[LINK]: "Advice for Internet Subnetwork Designers", Phil Karn (editor), February 2003 [draft-ietf-pilc-link-design-13.txt, work in progress]

[LINKNOTE]: "Kicking TCP", posting to the PILC mailing list by Phil Karn, March 7, 2000 [http://pilc.grc.nasa.gov/list/archive/0691.html]

[PILC]: "Performance Implications of Link Characteristics", IETF Working Group [http://www.ietf.org/html.charters/pilc-charter.html]

[RFC793]: "Transmission Control Protocol", J. Postel, September 1981 [ftp://ftp.rfc-editor.org/in-notes/rfc793.txt]

[RFC2119]: "Key words for use in RFCs to Indicate Requirement Levels", S. Bradner, March 1997 [ftp://ftp.rfc-editor.org/in-notes/rfc2119.txt]

[RFC2988]: "Computing TCP's Retransmission Timer", V. Paxson, M. Allman, November 2000 [ftp://ftp.rfc-editor.org/in-notes/rfc2988.txt]

[TRIGTRAN56]: "Triggers for Transport (TRIGTRAN) BoF minutes", March 2003 [http://www.ietf.org/proceedings/03mar/minutes/trigtran.htm]

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director.

Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assignees.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.