TCPM Working Group                                            S. Schuetz
Internet-Draft                                                       NEC
Intended status: Standards Track                               L. Eggert
Expires: September 6, 2007                                         Nokia
                                                                 W. Eddy
                                                                 Verizon
                                                                Y. Swami
                                                                   K. Le
                                                                   Nokia
                                                           March 5, 2007


      TCP Response to Lower-Layer Connectivity-Change Indications
                     draft-schuetz-tcpm-tcp-rlci-01

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

This document may not be modified, and derivative works of it may not be created, except to publish it as an RFC and to translate it into languages other than English.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on September 6, 2007.

Copyright Notice

Copyright (C) The IETF Trust (2007).

Schuetz, et al.         Expires September 6, 2007               [Page 1]

Internet-Draft  TCP Response to Connectivity Indications      March 2007

Abstract

When the path characteristics between two hosts change abruptly, TCP can experience significant delays before it resumes efficient transmission, or it can behave unfairly toward competing traffic. This document describes TCP extensions that improve transmission behavior in response to advisory, lower-layer connectivity-change indications.
The proposed TCP extensions modify the local behavior of TCP and introduce a new TCP option to signal locally received connectivity-change indications to remote peers. Performance gains result from more efficient transmission behavior, and the extensions are no more aggressive than a freshly-started connection.

Table of Contents

   1.  Introduction
   2.  Motivation and Overview
   3.  Connectivity-Change Indications
   4.  TCP Response to Connectivity-Change Indications
     4.1.  Connectivity-Change Indication TCP Option
     4.2.  Generation and Processing of Connectivity-Change
           Indication TCP Options
     4.3.  Re-Probing Path Characteristics
     4.4.  Speculative Retransmission
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgments
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Editorial Comments
   Appendix A.  Background: Classification of Connectivity
                Disruptions
     A.1.  Short Connectivity Disruptions
     A.2.  Long Connectivity Disruptions
   Appendix B.  Document Revision History
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

The Transmission Control Protocol (TCP) [RFC0793] generally assumes that the end-to-end path between two hosts has characteristics that are relatively stable over the lifetime of a connection. Although TCP's congestion control algorithms [RFC2581] can adapt to changes in path characteristics after several round-trip times, they fail to support efficient operation in the few round-trip times immediately after a significant path change. This is due to the granularity of TCP's sampling mechanisms. Significant changes to path connectivity include loss or reestablishment of connectivity, and drastic, abrupt changes in round-trip time (RTT) or available bandwidth. Connectivity changes that occur on such short time-scales are becoming more common due to host mobility and intermittent network attachment.

This document describes a set of complementary TCP extensions that improve behavior when transmitting over paths whose characteristics can change on short time-scales. TCP implementations that support these extensions respond to generic, link-technology-independent, per-connection "path characteristics have changed" (or, for short, "connectivity-change") indications from lower layers. A connectivity-change indication signals that the characteristics of the end-to-end path between the local node and its peer have changed in some undefined way. The proposed TCP response mechanisms act on this information conservatively, and the specific response depends on the state of the connection.

It is important to note that this addition of response mechanisms to lower-layer information follows an established precedent. TCP and other transport protocols already react to information and signals from lower layers; the proposed connectivity-change indications thus extend an established interface between layers in the protocol stack. TCP measures the end-to-end path to implicitly derive network-layer information.
TCP also directly reacts to network-layer signals delivered via ICMP, for example, "Port Unreachable" or the now-deprecated "Source Quench" [RFC1122]. Explicit Congestion Notification (ECN) [RFC3168] and Quick-Start [I-D.ietf-tsvwg-quickstart] are other sources of network-layer information for which TCP response mechanisms have been defined. Connectivity-change indications are yet another source of lower-layer information that TCP can use to improve its operation.

A second important point is that the TCP response mechanisms to connectivity-change indications are purely optional efficiency improvements. In the absence of connectivity-change indications, a TCP that implements these changes behaves identically to an unmodified TCP. When lower layers provide connectivity-change indications that trigger the response mechanisms, these mechanisms enhance TCP operation based on the explicit lower-layer information that is signaled. They do not increase the aggressiveness of TCP.

Note that the IAB has recently described architectural issues of "link indications" [I-D.iab-link-indications]. The authors feel that this term is not quite accurate in this environment, because transport mechanisms should remain link-technology-agnostic. However, transport protocols have always acted on network-layer information and signals, such as measured path characteristics or ICMP-signaled conditions. Because of the growing proliferation of shim layers between the traditional network and transport layers, this document uses the term "lower-layer indication" to remain independent of specific network or shim layers. Note that it is currently an open question whether additional lower-layer indications can provide further information to transport protocols.
Also, this document only describes response mechanisms for TCP, although other transport protocols may benefit from similar mechanisms for reacting to connectivity-change indications.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2.  Motivation and Overview

Several proposed network-layer extensions support host mobility, including Mobile IPv4 [RFC3344], Mobile IPv6 [RFC3775] and HIP [I-D.ietf-hip-mm]. Typically, they shield transport-layer protocols from mobility events and enable them to sustain established connections across those events. However, the path characteristics that established connections experience after a mobility event may have changed drastically and on short time-scales. Congestion control, RTT and path-MTU state gathered over the old path before the move generally has no meaning for the new path. Because TCP uses this stale information when resuming transmission over the new path, it can be either too aggressive or highly inefficient. Similar conditions can arise when multihomed hosts fail over using the shim6 protocol. Appendix A provides background on the types of scenarios for which the mechanisms described in this document are designed.

TCP already forces a slow-start restart in some cases where the network state becomes unknown, such as after an idle period or heavy losses. The first part of the response specified in this document involves a similar return to the initial slow-start state in response to connectivity-change indications that are received while a connection is transmitting in steady-state. Note that this behavior is more conservative than the standard TCP response or lack of response.
Performance gains with the proposed mechanisms come either from avoiding overload of the new path, which typically incurs an RTO, or from using slow-start to quickly discover new capacity far above the previous steady-state operating point.

A second response component improves TCP operation in the presence of temporary connectivity disruptions. These disruptions can occur independently of mobility events and may, for example, be due to insufficient wireless access coverage or nomadic computer use. Connectivity disruptions can severely decrease TCP performance. The main reason for this decrease is TCP's retransmission behavior after a connectivity disruption [SCHUETZ]. TCP makes periodic retransmission attempts at exponentially increasing intervals, which can unnecessarily delay retransmission after connectivity returns. In the extreme case, TCP connections can even abort if the disruption is longer than the TCP "user timeout." (Connection aborts are out of scope for this document but can be prevented by the TCP User Timeout Option [I-D.ietf-tcpm-tcp-uto].)

This second response component executes when a connection receives a connectivity-change indication while it is stalled in exponential back-off. It improves TCP retransmission behavior after connectivity is restored through an immediate speculative retransmission attempt [footnote-1]. Like the first response component, the second one increases TCP performance through more intelligent transmission behavior that uses periods of connectivity more efficiently. Compared to the startup of a new connection, it does not cause significant amounts of additional traffic, and it does not change TCP's congestion control algorithms.

Finally, this draft specifies a third response component, which is a new TCP option that notifies the connection's remote peer of a connectivity-change event detected locally.
This is useful because connectivity-change indications typically require appropriate responses at both ends of a connection, but may only be received or detected by one end. The other parts of the response to a connectivity-change indication are independent of the indication's source (locally notified or remotely signaled) and depend only on the specific indication and the state of the connection for which it was received.

3.  Connectivity-Change Indications

The focus of this document is on specifying TCP response mechanisms to lower-layer "path characteristics have changed" indications. This section briefly describes how different network- and shim-layer mechanisms underneath the transport layer may provide these "connectivity-change" indications to TCP. This section is included for clarification only; details on the sources of connectivity-change indications are out of scope for this document.

When lower layers detect a connectivity-change event, they generate corresponding connectivity-change indications.
Lower-layer events that could trigger such an indication include (but are not limited to):

   o  the IP address of the local outbound interface used for a given connection has changed, e.g., due to DHCP [RFC2131] or IPv6 router advertisements [RFC2460]

   o  link-layer connectivity of the local outbound interface used for a given connection has changed, e.g., a link-layer "link up" event [I-D.ietf-dna-link-information]

   o  the local outbound interface used for a given connection has changed due to routing changes or link-layer connectivity changes at other interfaces (including tunnel establishment or teardown, e.g., in response to IKE events [RFC4306])

   o  a Mobile IP binding update has completed [RFC3775]

   o  a HIP readdressing update has completed [I-D.ietf-hip-mm]

   o  a path-change signal from the network has arrived (possible in theory, but dependent on network capabilities)

   o  other notifications as defined by the IETF's Detecting Network Attachment (DNA) working group have occurred [I-D.ietf-dna-link-information]

Note that the list above only describes some potential sources of connectivity-change events. Other sources exist, but the details of when to generate such events are out of scope for this document, which focuses on the TCP response mechanisms when such events are received.

4.  TCP Response to Connectivity-Change Indications

A TCP connection can receive a connectivity-change indication (CCI) either from its local stack ("local CCI") or through a new "connectivity-change indication TCP option" from its peer ("remote CCI"). Section 4.1 specifies this new TCP option. In either case, upon reception of a CCI, the TCP response mechanisms defined in this document either re-probe path characteristics or perform a speculative retransmission, depending on whether the connection is currently transmitting in steady-state or stalled in exponential back-off, respectively.
A connection is "stalled in exponential back-off" if at least one segment was retransmitted due to an RTO expiration but has not yet been ACK'ed.

The remainder of this section first defines the format of the new CCI option in Section 4.1 and specifies how CCI options are generated and processed in Section 4.2. It then describes the two TCP response mechanisms triggered by receiving CCIs, re-probing path characteristics and speculative retransmission, in Section 4.3 and Section 4.4, respectively.

To implement the RLCI mechanism defined in this document, TCP implementations MUST maintain five new state variables per TCP connection [footnote-2]:

   LOCAL_CCI_COUNT
      Counts, modulo 8, the number of local CCIs received for a connection. Starting from 7, it is decremented on each local CCI and wraps around from 0 to 7.

   REMOTE_CCI_COUNT
      Holds a copy of the last CCI counter value advertised by the peer through a CCI TCP option. It is initialized to 7 and is updated in response to remote CCIs according to the rules defined in Section 4.2.

   LOCAL_CCI_ACTIVE
      Boolean flag; true if the local TCP stack is currently executing a response mechanism after having received a local CCI, and false otherwise.

   REMOTE_CCI_ACTIVE
      Boolean flag; true if the local TCP stack is currently executing a response mechanism after having received a remote CCI, and false otherwise.

   REMOTE_CCI_SNDNXT
      Retains a copy of SND.NXT [RFC0793] at the time the most recent remote CCI was received.

4.1.  Connectivity-Change Indication TCP Option

Connectivity-change indications (CCIs) are generally asymmetric, i.e., they may occur at or be detected by one end but not the other. The basic idea behind the CCI TCP option is to signal the occurrence of local CCIs to the other end, in order to allow it to respond appropriately.
                       1                   2
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
  +---------------+---------------+---+-----+-----+
  |   Kind = X    |  Length = 3   |RES| CNT | ECNT|
  +---------------+---------------+---+-----+-----+

      Figure 1: Format of the connectivity-change indication TCP option.

Figure 1 shows the format of the CCI TCP option. It contains these fields:

   Kind (8 bits)
      The TCP option number X [RFC0793] allocated by IANA upon publication of this document (see Section 6).

   Length (8 bits)
      Length of the TCP option in octets [RFC0793]; its value MUST be 3.

   RES (2 bits)
      Reserved bits. The sender SHOULD set these to zero, and the receiver MUST ignore them.

   CNT (3 bits)
      Current value of LOCAL_CCI_COUNT at the end sending the option.

   ECNT (3 bits)
      Echoed value of CNT, i.e., the value of CNT in the last CCI option received from the other end.

The counter (CNT) in the CCI TCP option represents, modulo 8, the number of local connectivity-change indications that the sending side has received.

At the beginning of a connection, LOCAL_CCI_ACTIVE and REMOTE_CCI_ACTIVE MUST be set to false, LOCAL_CCI_COUNT and REMOTE_CCI_COUNT MUST be set to 7, and REMOTE_CCI_SNDNXT MUST be set to 0.

A host opening a connection includes a CCI option in its SYN segment with the initial LOCAL_CCI_COUNT of 7 to advertise support for the option. A host receiving a SYN MUST NOT include a CCI option in its SYN-ACK unless it has received a CCI option in the corresponding SYN. A host MUST NOT process any subsequent CCI options unless one was included in both the SYN and the SYN-ACK. After the SYN exchange, a host SHOULD send a CCI option only after receiving a new local connectivity-change indication, or in response to receiving a new CCI option from the other end. Section 4.2 describes the processing rules in detail.
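As a hedged illustration of Figure 1, the following Python sketch packs and parses the 3-octet CCI option. Because IANA has not assigned the option number X, the sketch uses CCI_OPTION_KIND = 253 purely as a hypothetical placeholder. It also assumes the usual convention that bit 0 in the figure is the most significant bit, so RES occupies the two high-order bits of the third octet, followed by CNT and ECNT. The names encode_cci, decode_cci and CciOption are illustrative only.

```python
from typing import NamedTuple

# HYPOTHETICAL placeholder: the real Kind value "X" is assigned by IANA.
CCI_OPTION_KIND = 253
CCI_OPTION_LENGTH = 3  # Length field value fixed by Section 4.1


class CciOption(NamedTuple):
    cnt: int   # sender's LOCAL_CCI_COUNT (3 bits)
    ecnt: int  # sender's REMOTE_CCI_COUNT, echoed back (3 bits)


def encode_cci(cnt: int, ecnt: int) -> bytes:
    """Build the 3-octet CCI option; RES is sent as zero."""
    if not (0 <= cnt <= 7 and 0 <= ecnt <= 7):
        raise ValueError("CNT and ECNT are 3-bit fields")
    third_octet = (cnt << 3) | ecnt  # RES(2) = 0 | CNT(3) | ECNT(3)
    return bytes([CCI_OPTION_KIND, CCI_OPTION_LENGTH, third_octet])


def decode_cci(option: bytes) -> CciOption:
    """Parse a CCI option; the RES bits are ignored, as the receiver MUST."""
    if len(option) != CCI_OPTION_LENGTH or option[0] != CCI_OPTION_KIND:
        raise ValueError("not a CCI option")
    if option[1] != CCI_OPTION_LENGTH:
        raise ValueError("CCI option Length MUST be 3")
    third_octet = option[2]
    return CciOption(cnt=(third_octet >> 3) & 0x7, ecnt=third_octet & 0x7)


# A SYN advertising support carries the initial counters, CNT = ECNT = 7.
syn_option = encode_cci(7, 7)
```

Because the receiver masks each field, a peer that sets the reserved bits still decodes correctly, which matches the "receiver MUST ignore them" rule.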
A host MUST include a CCI option in all outgoing segments whenever LOCAL_CCI_ACTIVE is true or REMOTE_CCI_ACTIVE is true (or both). A host MUST NOT include a CCI option in any segment whenever both LOCAL_CCI_ACTIVE and REMOTE_CCI_ACTIVE are false, i.e., when the host is not processing any connectivity-change indications. When sending any CCI option, CNT MUST be set to the current LOCAL_CCI_COUNT and ECNT MUST be set to the current REMOTE_CCI_COUNT.

4.2.  Generation and Processing of Connectivity-Change Indication TCP Options

Processing of a connectivity-change indication can be separated into two parts:

   1.  Processing in "initiator" mode, i.e., when a host receives a local CCI and forwards it to the other end through a CCI TCP option.

   2.  Processing in "responder" mode, i.e., when a host receives a remote CCI in a CCI TCP option from the other end.

Section 4.2.1 and Section 4.2.2 describe the state machines at an initiator and a responder, respectively. Note that a single host can be both initiator and responder at the same time, if a local CCI and a remote CCI happen to occur at the same time.

The following events, conditions and actions are used in the definition of the two state machines:

Events:

   E_LOCAL_CCI
      Local end received a local CCI.

   E_REMOTE_CCI
      Local end received information about a remote CCI, i.e., received a TCP segment that includes a CCI TCP option.

   E_NONE
      Local end received a TCP segment that does not include a CCI TCP option.

Conditions:

   C_NEW_REMOTE_CCI
      Received CCI option signals a new remote CCI, i.e., CNT != REMOTE_CCI_COUNT.

   C_ECHOED_LOCAL_CCI
      Received CCI option echoes the local CCI counter, i.e., ECNT == LOCAL_CCI_COUNT.

   C_LOCAL_PROGRESS
      Local end made progress since receiving the last remote CCI, i.e., ACK > REMOTE_CCI_SNDNXT (compared using sequence number arithmetic [RFC0793]).

Actions:

   A_DECREMENT_LOCAL
      Decrement LOCAL_CCI_COUNT, i.e., LOCAL_CCI_COUNT = LOCAL_CCI_COUNT - 1.
      LOCAL_CCI_COUNT wraps around from 0 to 7.

   A_FORCE_SEND
      Force transmission of a segment that MUST include a CCI option. The segment can be an outstanding retransmission, a new data segment or a pure ACK.

   A_UPDATE_REMOTE_COUNT
      Update the remote CCI counter according to the received CCI option, i.e., set REMOTE_CCI_COUNT = CNT.

   A_UPDATE_SNDNXT
      Store the sequence number of the next data segment to be sent, i.e., set REMOTE_CCI_SNDNXT = SND.NXT.

4.2.1.  Initiator Mode Processing

This section describes the initiator mode processing of a TCP host implementing RLCI. In initiator mode, a host needs to signal the last received local CCI to its peer until the peer echoes reception of that CCI. Figure 2 shows the corresponding state machine.

At the beginning of a connection, i.e., before the first local CCI is received, LOCAL_CCI_ACTIVE is false. This remains the case until the local end receives a local CCI (E_LOCAL_CCI). When that happens, it decrements LOCAL_CCI_COUNT (A_DECREMENT_LOCAL), forces a segment to be sent to the peer (A_FORCE_SEND), and sets LOCAL_CCI_ACTIVE to true. Note that this also implies that all subsequent outgoing segments MUST contain a CCI TCP option until LOCAL_CCI_ACTIVE is false (and possibly until REMOTE_CCI_ACTIVE is false, in case it became true during the local CCI processing).

              E_LOCAL_CCI =>
              A_DECREMENT_LOCAL
              A_FORCE_SEND
         +-------------------------+        +-----+
         |                         |        |     |
         |                         V        V     |
  +------------------+      +------------------+  |
  | LOCAL_CCI_ACTIVE |      | LOCAL_CCI_ACTIVE |  |
  |     == false     |      |     == true      |  |
  +------------------+      +------------------+  |
      ^        ^               |      |      |    |
      |        |               |      |      |    |
      |        +---------------+      |      +----+
      |             E_NONE            |  E_LOCAL_CCI =>
      |                               |  A_DECREMENT_LOCAL
      +-------------------------------+  A_FORCE_SEND
         E_REMOTE_CCI && C_ECHOED_LOCAL_CCI

               Figure 2: State machine for initiator processing.
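The initiator behavior of this section, including the self-loop and the echo transition detailed in the paragraph that follows Figure 2, can be sketched in Python as below. This is a minimal model, not a stack implementation: InitiatorState, on_local_cci and initiator_on_segment are hypothetical names, and A_FORCE_SEND is only counted rather than transmitted. Figure 2 also draws an E_NONE edge back to the "false" state that the prose does not describe; this sketch follows the prose and keeps LOCAL_CCI_ACTIVE true on E_NONE, which is an interpretive assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InitiatorState:
    local_cci_count: int = 7        # LOCAL_CCI_COUNT, initialized to 7
    local_cci_active: bool = False  # LOCAL_CCI_ACTIVE, initialized to false
    forced_sends: int = 0           # HYPOTHETICAL stand-in for A_FORCE_SEND


def on_local_cci(s: InitiatorState) -> None:
    """E_LOCAL_CCI, handled identically in both initiator states."""
    s.local_cci_count = (s.local_cci_count - 1) % 8  # A_DECREMENT_LOCAL: 0 wraps to 7
    s.forced_sends += 1                               # A_FORCE_SEND (only counted here)
    s.local_cci_active = True


def initiator_on_segment(s: InitiatorState, ecnt: Optional[int]) -> None:
    """Process a received segment; ecnt is the option's ECNT, or None (E_NONE)."""
    if s.local_cci_active and ecnt is not None and ecnt == s.local_cci_count:
        s.local_cci_active = False  # E_REMOTE_CCI && C_ECHOED_LOCAL_CCI
    # Following the prose, E_NONE and stale echoes leave the state unchanged.


s = InitiatorState()
on_local_cci(s)              # counter 7 -> 6, initiator mode becomes active
initiator_on_segment(s, 7)   # echo of the old counter: still active
initiator_on_segment(s, 6)   # peer echoed the new counter: inactive again
```

Python's modulo operator returns a non-negative result, so (0 - 1) % 8 yields 7 and directly implements the wrap-around rule of A_DECREMENT_LOCAL.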
When receiving a local CCI (E_LOCAL_CCI) while LOCAL_CCI_ACTIVE is true, a host remains in this state but performs the actions A_DECREMENT_LOCAL and A_FORCE_SEND. LOCAL_CCI_ACTIVE remains true until the host receives a segment carrying the CCI TCP option (E_REMOTE_CCI) that echoes the current LOCAL_CCI_COUNT in the ECNT field of the option (C_ECHOED_LOCAL_CCI). In this case, LOCAL_CCI_ACTIVE becomes false.

4.2.2.  Responder Mode Processing

This section describes the responder mode processing of CCIs for a TCP host implementing the CCI TCP option. In responder mode, a host echoes the last received remote CCI to its peer until it can be sure that the peer correctly received the echo. Figure 3 shows the corresponding state machine.

At the beginning of a connection, REMOTE_CCI_ACTIVE is false, i.e., the local host is not processing any remote CCIs. When it receives a TCP segment with a CCI TCP option (E_REMOTE_CCI) signaling a new remote CCI (C_NEW_REMOTE_CCI), it updates REMOTE_CCI_COUNT with the value of the CNT field in the received option (A_UPDATE_REMOTE_COUNT), stores the sequence number of the next data segment to be sent in REMOTE_CCI_SNDNXT (A_UPDATE_SNDNXT), and sets REMOTE_CCI_ACTIVE to true. Note that this also implies that all subsequent outgoing segments MUST contain a CCI TCP option until REMOTE_CCI_ACTIVE is false (and possibly until LOCAL_CCI_ACTIVE is false, in case it became true during the remote CCI processing).
              E_REMOTE_CCI && C_NEW_REMOTE_CCI == true =>
              A_UPDATE_REMOTE_COUNT
              A_UPDATE_SNDNXT
         +--------------------------+         +-----+
         |                          |         |     |
         |                          V         V     |
  +-------------------+      +-------------------+  |
  | REMOTE_CCI_ACTIVE |      | REMOTE_CCI_ACTIVE |  |
  |      == false     |      |      == true      |  |
  +-------------------+      +-------------------+  |
      ^        ^                |      |       |    |
      |        |                |      |       |    |
      |        +----------------+      |       +----+
      |             E_NONE             |  E_REMOTE_CCI &&
      |                                |  C_NEW_REMOTE_CCI == true =>
      +--------------------------------+  A_UPDATE_REMOTE_COUNT
         E_REMOTE_CCI &&                  A_UPDATE_SNDNXT
         C_NEW_REMOTE_CCI == false && C_LOCAL_PROGRESS

               Figure 3: State machine for responder processing.

When a host in which REMOTE_CCI_ACTIVE is true receives a CCI TCP option (E_REMOTE_CCI) that signals a new remote CCI (C_NEW_REMOTE_CCI), it updates REMOTE_CCI_COUNT with the value of the CNT field in the received option (A_UPDATE_REMOTE_COUNT), stores the sequence number of the next data segment to be sent in REMOTE_CCI_SNDNXT (A_UPDATE_SNDNXT), and leaves REMOTE_CCI_ACTIVE set to true.

A host sets REMOTE_CCI_ACTIVE to false in only one of the following two cases. First, if it receives a TCP segment that does not include a CCI TCP option (E_NONE), because, per Section 4.1, this signals that both LOCAL_CCI_ACTIVE and REMOTE_CCI_ACTIVE are false at the other end, from which it can conclude that the other end has completed processing of the CCI. Second, if it receives a CCI TCP option (E_REMOTE_CCI) that does not signal a new remote CCI (C_NEW_REMOTE_CCI == false) and the connection has made progress since the last remote CCI (C_LOCAL_PROGRESS). In this case, data sent after the last remote CCI has already been ACK'ed, i.e., the peer must have received the echoed ECNT value in at least one of the segments sent since the last remote CCI, because a full round trip of the CCI option has completed. Therefore, the local host can terminate responder mode processing.
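The responder transitions of Figure 3 can be sketched in Python as follows. The sketch is illustrative only: ResponderState, responder_on_segment and seq_gt are hypothetical names, and the caller is assumed to supply the CNT field (or None for E_NONE), the segment's acknowledgment number, and the current SND.NXT. It also assumes that "ACK > REMOTE_CCI_SNDNXT" is evaluated with modulo-2^32 sequence number comparison, as TCP sequence numbers wrap.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_MOD = 2 ** 32


def seq_gt(a: int, b: int) -> bool:
    """Modulo-2^32 sequence comparison: True if a lies after b."""
    return 0 < (a - b) % SEQ_MOD < 2 ** 31


@dataclass
class ResponderState:
    remote_cci_count: int = 7        # REMOTE_CCI_COUNT
    remote_cci_active: bool = False  # REMOTE_CCI_ACTIVE
    remote_cci_sndnxt: int = 0       # REMOTE_CCI_SNDNXT


def responder_on_segment(s: ResponderState, cnt: Optional[int],
                         ack: int, snd_nxt: int) -> None:
    """Apply one received segment to the responder state machine."""
    if cnt is None:                                    # E_NONE
        s.remote_cci_active = False
    elif cnt != s.remote_cci_count:                    # E_REMOTE_CCI && C_NEW_REMOTE_CCI
        s.remote_cci_count = cnt                       # A_UPDATE_REMOTE_COUNT
        s.remote_cci_sndnxt = snd_nxt                  # A_UPDATE_SNDNXT
        s.remote_cci_active = True
    elif s.remote_cci_active and seq_gt(ack, s.remote_cci_sndnxt):
        s.remote_cci_active = False                    # C_LOCAL_PROGRESS


r = ResponderState()
responder_on_segment(r, cnt=6, ack=1000, snd_nxt=5000)  # new remote CCI
responder_on_segment(r, cnt=6, ack=5000, snd_nxt=5200)  # no progress past 5000 yet
responder_on_segment(r, cnt=6, ack=5100, snd_nxt=5200)  # progress: stop echoing
```

The strict comparison in the C_LOCAL_PROGRESS branch matters: an ACK equal to the stored SND.NXT only covers data sent before the remote CCI, so it says nothing about whether an echo reached the peer.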
Note: The second transition is required for the case when both hosts are in responder mode at the same time. Without it, neither host would stop including CCI TCP options in its segments, because REMOTE_CCI_ACTIVE is true on both sides, so neither would ever observe E_NONE. This can happen, e.g., when both hosts receive local CCIs at (nearly) the same time and signal them to each other using CCI TCP options.

4.3.  Re-Probing Path Characteristics

When a TCP connection receives a connectivity-change indication and is not currently stalled in exponential back-off, it MUST re-probe the path characteristics, to avoid causing congestion by transmitting based on stale path state. In principle, this is similar to the initial slow-start: The sender MUST NOT transmit more than the default initial window (INIT_WINDOW) of data after a CCI is received, and it MUST reset the congestion control state (CWND and SS_THRESH), the round-trip time measurement (RTTM) state, and the RTO timer as if this were a new connection [RFC2581][RFC2988]. If Path MTU Discovery (PMTUD) is active, PMTUD state MUST also be reset [RFC1191][RFC1981][I-D.ietf-pmtud-method].

One difference from an initial slow-start is that after a CCI, the connection may have segments in flight towards the destination along the previous path. ACKs for such data, which was sent before the CCI was received, are "stale ACKs". Therefore, after a CCI, congestion control MUST ignore any stale ACKs received and MUST update the congestion window solely based on ACKs for data that was sent after the CCI was received. Each ACK that is received while the host is processing any CCI SHOULD be treated as a stale ACK. In practice, this is a decent heuristic for disambiguating stale and fresh ACKs: all ACKs received while either LOCAL_CCI_ACTIVE or REMOTE_CCI_ACTIVE is true are considered stale.
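The re-probing reset and the stale-ACK heuristic just described, together with the stalled-connection branch that Section 4.4 specifies, can be sketched in Python as below. The Sender class and its fields are a hypothetical subset of TCP sender state. The sketch assumes INIT_WINDOW = 2 * SMSS, the RFC 2581 initial-window limit, uses 2^30 to stand in for RFC 2581's "arbitrarily high" initial ssthresh, and uses RFC 2988's initial RTO of 3 seconds; retransmit_now is a hypothetical flag standing in for an RTO-style retransmission, and PMTUD state is not modelled.

```python
from dataclasses import dataclass
from typing import Optional

INITIAL_RTO = 3.0           # RFC 2988 initial RTO, in seconds
INITIAL_SSTHRESH = 2 ** 30  # stands in for RFC 2581's "arbitrarily high" value


@dataclass
class Sender:
    """HYPOTHETICAL subset of TCP sender state touched by Sections 4.3/4.4."""
    mss: int
    cwnd: int
    ssthresh: int
    srtt: Optional[float]
    rto: float
    stalled: bool = False            # an RTO-retransmitted segment is un-ACK'ed
    local_cci_active: bool = False   # maintained by the Section 4.2 state machines
    remote_cci_active: bool = False
    retransmit_now: bool = False     # HYPOTHETICAL trigger for an RTO-style retransmission


def on_cci(c: Sender) -> None:
    """React to a local or remote CCI according to the connection state."""
    if c.stalled:
        c.retransmit_now = True      # Section 4.4: as if the RTO had expired
        return
    c.cwnd = 2 * c.mss               # INIT_WINDOW, assumed to be 2 * SMSS
    c.ssthresh = INITIAL_SSTHRESH
    c.srtt = None                    # discard RTT measurements from the old path
    c.rto = INITIAL_RTO


def on_ack(c: Sender, newly_acked: int) -> bool:
    """Return False for a stale ACK (no cwnd change, no new data sent)."""
    if c.local_cci_active or c.remote_cci_active:
        return False                 # heuristic: every ACK is stale during CCI processing
    if c.cwnd < c.ssthresh:
        c.cwnd += min(newly_acked, c.mss)          # slow start
    else:
        c.cwnd += max(1, c.mss * c.mss // c.cwnd)  # congestion avoidance
    return True


c = Sender(mss=1460, cwnd=50 * 1460, ssthresh=40 * 1460, srtt=0.2, rto=0.4)
c.local_cci_active = True   # set by the initiator state machine
on_cci(c)                   # cwnd drops to 2920 and RTT state is forgotten
stale = on_ack(c, 1460)     # ignored while CCI processing is in progress
c.local_cci_active = False  # the peer has echoed the CCI
fresh = on_ack(c, 1460)     # slow start resumes from the initial window
```

Treating every ACK as stale while either flag is set is the simple heuristic from the preceding paragraph; its limits are discussed next.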
This heuristic works as long as there is little large-scale reordering, because the packet that moves the local state machine back into an inactive state will generally be received after all stale packets. In some scenarios this assumption may not hold, but it seems reasonable for the vast majority of scenarios, in which the stale path is cleared of packets in less time than one or two RTTs on the new path.

For each stale ACK received, a host MUST NOT adjust the congestion window and MUST NOT send any new data into the network. This SHOULD continue until both LOCAL_CCI_ACTIVE and REMOTE_CCI_ACTIVE are false or there is a timeout. When that occurs, the sender should consider any un-ACK'ed segments below the highest received ACK as lost and discount them from the segments in flight. The sender MUST use slow-start based loss recovery for these segments.

4.4.  Speculative Retransmission

The basic idea behind speculative retransmission is to allow TCP to resume stalled connections as soon as it receives an indication that connectivity to previously unreachable peers may have returned.

When a TCP connection receives a connectivity-change indication, either from the local stack or in a CCI TCP option from the peer, and is currently stalled in exponential back-off, it MUST immediately initiate the standard retransmission procedure, just as if the RTO for the connection had expired.

5.  Security Considerations

The only foreseen security considerations with the techniques presented in this document result from either an attacker's ability to spoof valid TCP segments with options that seemingly indicate connectivity changes, or an attacker's ability to generate bogus connectivity-change indications locally. An attacker might produce a stream of such false indications that could keep a connection in slow-start at the initial window.
One possible defense against this type of attack is to rate-limit the response to connectivity-change indications (whether local or remote). This attack is also probably less serious than other attacks that such an empowered adversary could perform, like resetting the connection or injecting data. A similar effect could be achieved without the new option by forging duplicate ACKs that would keep a sender in loss recovery. If the IP addresses, port numbers, and sequence numbers of a connection are guessable, then the connection should use an approved means of protection against spoofed segments (such as IPsec) [I-D.ietf-tcpm-tcp-antispoof].

6.  IANA Considerations

This section is to be interpreted according to [RFC2434].

This document does not define any new namespaces. It uses an 8-bit TCP option number maintained by IANA at http://www.iana.org/assignments/tcp-parameters. IANA is requested to assign a new TCP option number upon publication of this document.

7.  Acknowledgments

This draft combines and obsoletes [I-D.swami-tcp-lmdr] and [I-D.eggert-tcpm-tcp-retransmit-now]. The authors would like to thank Mark Allman, Marcus Brunner, Shashikant Maheshwari, Kacheong Poon, Juergen Quittek, Stefan Schmid and Joe Touch for their comments and suggestions on the two previous drafts.

Simon Schuetz is partly funded by Ambient Networks, a research project supported by the European Commission under its Sixth Framework Program.

Wesley Eddy's work on this document was performed at NASA's Glenn Research Center, in support of the NASA Space Communications Architecture Working Group (SCAWG) and the FAA/Eurocontrol Future Communications Study (FCS).

8.  References

8.1.  Normative References

[I-D.ietf-pmtud-method]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU Discovery", draft-ietf-pmtud-method-11 (work in progress), December 2006.
[RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, November 1990.

[RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery for IP version 6", RFC 1981, August 1996.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 2434, October 1998.

[RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion Control", RFC 2581, April 1999.

[RFC2988]  Paxson, V. and M. Allman, "Computing TCP's Retransmission Timer", RFC 2988, November 2000.

8.2.  Informative References

[DUKE]  Duke, M., Henderson, T., and J. Meegan, "Experience with 'Link-UP Notification' Over a Mobile Satellite Link", ACM Computer Communication Review, Vol. 34, No. 3, July 2004.

[EDDY]  Eddy, W. and Y. Swami, "Adapting End-host Congestion Control for Mobility", NASA Glenn Research Center Technical Report, CR-2005-213838, July 2005.

[I-D.dawkins-trigtran-linkup]  Dawkins, S., "End-to-end, Implicit 'Link-Up' Notification", draft-dawkins-trigtran-linkup-01 (work in progress), October 2003.

[I-D.eggert-tcpm-tcp-retransmit-now]  Eggert, L., "TCP Extensions for Immediate Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02 (work in progress), June 2005.

[I-D.iab-link-indications]  Aboba, B., "Architectural Implications of Link Indications", draft-iab-link-indications-10 (work in progress), March 2007.

[I-D.ietf-dna-link-information]  Yegin, A., "Link-layer Event Notifications for Detecting Network Attachments", draft-ietf-dna-link-information-06 (work in progress), February 2007.
[I-D.ietf-hip-mm]  Nikander, P., "End-Host Mobility and Multihoming with the Host Identity Protocol", draft-ietf-hip-mm-04 (work in progress), June 2006.

[I-D.ietf-tcpimpl-restart]  Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP Slow-Start Restart After Idle", draft-ietf-tcpimpl-restart-00 (work in progress), March 1998.

[I-D.ietf-tcpm-tcp-antispoof]  Touch, J., "Defending TCP Against Spoofing Attacks", draft-ietf-tcpm-tcp-antispoof-06 (work in progress), February 2007.

[I-D.ietf-tcpm-tcp-uto]  Eggert, L. and F. Gont, "TCP User Timeout Option", draft-ietf-tcpm-tcp-uto-04 (work in progress), October 2006.

[I-D.ietf-tsvwg-quickstart]  Floyd, S., "Quick-Start for TCP and IP", draft-ietf-tsvwg-quickstart-07 (work in progress), October 2006.

[I-D.swami-tcp-lmdr]  Swami, Y., "Lightweight Mobility Detection and Response (LMDR) Algorithm for TCP", draft-swami-tcp-lmdr-07 (work in progress), March 2006.

[KOODLI]  Koodli, R. and C. Perkins, "Fast Handovers and Context Transfers in Mobile Networks", ACM Computer Communication Review, Vol. 31, No. 5, October 2001.

[OTT]  Ott, J. and D. Kutscher, "Drive-thru Internet: IEEE 802.11b for Automobile Users", Proc. Infocom 2004, March 2004.

[RFC1122]  Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.

[RFC2131]  Droms, R., "Dynamic Host Configuration Protocol", RFC 2131, March 1997.

[RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, December 1998.

[RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.

[RFC3344]  Perkins, C., "IP Mobility Support for IPv4", RFC 3344, August 2002.

[RFC3775]  Johnson, D., Perkins, C., and J. Arkko, "Mobility Support in IPv6", RFC 3775, June 2004.

[RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D., Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. Wood, "Advice for Internet Subnetwork Designers", BCP 89, RFC 3819, July 2004.

[RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", RFC 4306, December 2005.

[SCHUETZ]  Schuetz, S., Eggert, L., Schmid, S., and M. Brunner, "Protocol Enhancements for Intermittently Connected Hosts", ACM Computer Communication Review, Vol. 35, No. 3, July 2005.

[SCOTT]  Scott, J. and G. Mapp, "Link layer-based TCP optimisation for disconnecting networks", ACM Computer Communication Review, Vol. 33, No. 5, October 2003.

Editorial Comments

[footnote-1]  The authors have heard the idea of triggering retransmissions based on connectivity events on directly-connected links attributed to Phil Karn (the "kick" operation in the KA9Q TCP stack). A thread from the PILC mailing list in 2000 discusses some thoughts on this (http://www.isi.edu/pilc/list/archive/0691.html).

[footnote-2]  Although this specification introduces five new per-connection state variables, a preliminary implementation of an earlier revision of this mechanism [I-D.swami-tcp-lmdr] required only around a hundred lines of kernel code.

Appendix A.  Background: Classification of Connectivity Disruptions

Connectivity disruptions can occur in many different situations. They can be due to wireless interference, movement out of a wireless coverage area, switching between access networks, or simply unplugging an Ethernet cable. Depending on the situation in which they occur, connectivity disruptions have different implications and must be handled appropriately. This section attempts to classify different types of connectivity disruptions and discusses their implications and impact on TCP.
Two main properties of connectivity disruptions affect how TCP reacts to them: their duration and whether the path characteristics have significantly changed after they end. This document distinguishes between "short" and "long" disruptions and between "changed" and "unchanged" path characteristics. Note that these two categories are orthogonal to each other, i.e., four types of connectivity disruptions exist.

Connectivity disruptions are "short" for a given TCP connection if connectivity returns before the RTO fires for the first time, i.e., while TCP is still in steady state. In this case, standard TCP recovers lost data segments through Fast Retransmit and lost ACKs through successfully delivered later ACKs. Appendix A.1 briefly describes this case.

Connectivity disruptions are "long" for a given TCP connection if the RTO fires at least once before connectivity returns, i.e., when TCP is in exponential back-off. In this case, TCP can be inefficient in its retransmission scheme, as described in Appendix A.2.

Whether or not path characteristics change when connectivity returns is a second important factor for TCP's retransmission scheme. Standard TCP implicitly assumes that path characteristics remain unchanged across short disruptions, because it performs Fast Retransmit using the path parameters collected before the disruption. For long disruptions, standard TCP is more conservative and performs slow-start, re-probing the path characteristics from scratch. However, even this standard behavior can be inefficient because of when the slow-start is initiated (see Appendix A.2). These implicit assumptions can cause standard TCP to misbehave or perform inefficiently in some scenarios. Figure 4 illustrates the standard TCP behavior.
           +-----------------------+-----------------------+
 Short     | Fast Retransmit using | Fast Retransmit using |
 Duration  | currently collected   | currently collected   |
 < RTO     | path characteristics  | path characteristics  |
           +-----------------------+-----------------------+
 Long      |                       |                       |
 Duration  |      Slow-start       |      Slow-start       |
 >= RTO    |                       |                       |
           +-----------------------+-----------------------+
               Unchanged Path          Changed Path
               Characteristics         Characteristics

                   Figure 4: Standard TCP behavior.

A.1.  Short Connectivity Disruptions

One common cause of short connectivity disruptions that result in a change of the end-to-end path characteristics is transparent network-layer mobility, via protocols such as Mobile IP, NEMO, or HIP. These protocols generally hide mobility events from the transport layer, but cannot mask the resulting changes to the end-to-end path over which established TCP connections transmit.

Consider the Mobile IP scenario shown in Figure 5. At time T, a mobile node MN attaches to access network Net-1, which is connected to the Internet through access router AR-1, and has a care-of address in Net-1. It establishes a TCP connection to the correspondent node CN. While MN is attached to AR-1, packets between CN and this care-of address follow PATH-1 (via Cloud-1 and AR-1). Assume that at some time T+1, MN moves and attaches to Net-2, which is reachable through AR-2, with a new care-of address in Net-2. While MN is attached to AR-2, all packets between CN and the new care-of address follow PATH-2 (via Cloud-2 and AR-2).

   [Diagram not recoverable: CN reaches MN via Cloud-1 and AR-1 (PATH-1, Net-1) at time T and via Cloud-2 and AR-2 (PATH-2, Net-2) at time T+1; PATH-3 leads from AR-1 toward AR-2.]

                      Figure 5: Mobility example.
During a transient disconnected period, MN may have disconnected from Net-1 and not yet attached to Net-2. Consequently, AR-1 may not be able to deliver packets to MN, which can result in a burst of packet losses. Several approaches for "fast" or "seamless" handovers exist that add machinery to the ARs to buffer packets originally sent to Net-1 and redirect them towards Net-2, rather than dropping them (e.g., [KOODLI]).

As long as MN remains in Net-1, standard congestion control algorithms [RFC2581] are sufficient. However, once MN moves from Net-1 to Net-2, two different scenarios are possible, depending on the network topology:

o  In the first scenario, with standard Mobile IPv4, all packets destined to MN's old care-of address are dropped by AR-1 once MN has moved. Since the latency involved in establishing a new tunnel to the home agent (HA) is on the order of the RTT (2*RTT in the case of Mobile IPv6), roughly an entire window's worth of data and ACKs will be dropped by AR-1. Because of this burst loss, CN and MN are likely to incur expensive retransmission timeouts.

o  In the second scenario, with a fast handover mechanism in place, losses are masked through buffering and tunneling between routers AR-1 and AR-2. The exact sequence of buffering and forwarding between the ARs is not guaranteed to be consistent with the available bandwidth of PATH-3 or to conform to TCP's clocking expectations. This can cause TCP's behavior over PATH-2 to be based on the unrelated properties of PATH-1 and PATH-3. After MN attaches to Net-2, reception of stale ACKs (for data sent on PATH-1) will cause MN to incorrectly inflate its congestion window. These stale ACKs do not provide any indication of the congestion along PATH-2. CN's congestion window becomes similarly inflated by ACKs that MN sends for data segments redirected over PATH-3.
If the congestion windows from PATH-1 are already too big for PATH-2, this can overload Net-2 or PATH-2, causing packet loss and timeouts. On the other hand, if the available bandwidth along PATH-2 is greater than along PATH-1 and the sender is in congestion avoidance, it may need many RTTs before it utilizes the available path capacity. This is because a stale SS_THRESH keeps the sender in congestion avoidance, where its congestion window grows only slowly (by roughly one segment per RTT [RFC2581]). (See [EDDY] for details.)

A.2.  Long Connectivity Disruptions

For long disruptions, standard TCP performs slow-start after connectivity returns, because the retransmission timeout (RTO) has expired. This conservative strategy avoids overloading the new path. However, TCP's exponential back-off retransmission strategy can schedule these slow-starts in a way that decreases performance.

When a long connectivity disruption occurs along the path between a host and its peer while the host is transmitting data, the host stops receiving ACKs. After the RTO expires, the host attempts to retransmit the first unacknowledged segment. TCP implementations that follow the recommended RTO management proposed in [RFC2988] double the RTO after each retransmission attempt until it exceeds 60 seconds. This scheme causes a host to attempt to retransmit across established connections roughly once a minute, and more frequently during the first minute or two of the connectivity disruption, while the RTO is still being backed off.

When the long connectivity disruption ends, standard TCP implementations still wait until the RTO expires before attempting retransmission. Figure 6 illustrates this behavior. Depending on when connectivity becomes available again, this can waste up to a minute of connectivity for TCPs that implement the recommended RTO management described in [RFC2988].
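To make this cost concrete, the following Python sketch models the back-off schedule and the idle time that Figure 6 labels as wasted connection time. It is illustrative only and not part of this specification: the function names are hypothetical, and the model assumes that the first timeout fires initial_rto seconds after the disruption begins, that each later timeout doubles the RTO up to a cap of max_rto seconds (60 seconds per the [RFC2988] discussion above; 120 seconds approximates the Linux default maximum RTO), and that a retransmission succeeds only if connectivity has already returned.

```python
# Hypothetical model of RFC 2988-style exponential RTO back-off during a
# long connectivity disruption; an illustration, not part of this document's
# mechanism. Time 0 is when the disruption starts.


def retransmit_times(initial_rto, max_rto, horizon):
    """Return the times of retransmission attempts, up to and including the
    first attempt at or after `horizon` seconds.

    Assumption: the first timeout fires after `initial_rto` seconds and the
    RTO doubles after every attempt, capped at `max_rto` seconds.
    """
    if initial_rto <= 0 or max_rto < initial_rto:
        raise ValueError("need 0 < initial_rto <= max_rto")
    times = []
    t, rto = 0.0, initial_rto
    while True:
        t += rto
        times.append(t)
        if t >= horizon:
            return times
        rto = min(2 * rto, max_rto)  # back off the timer, up to the cap


def wasted_time(initial_rto, max_rto, reconnect_at):
    """Idle time between connectivity returning at `reconnect_at` and the
    next scheduled retransmission attempt. An ideal TCP (Figure 7) would
    retransmit at `reconnect_at` and waste nothing."""
    next_attempt = retransmit_times(initial_rto, max_rto, reconnect_at)[-1]
    return next_attempt - reconnect_at


if __name__ == "__main__":
    for cap in (60, 120):
        for back in (100, 124, 128):
            print(f"max RTO {cap:3d} s, connectivity back at {back} s: "
                  f"{wasted_time(1, cap, back):.0f} s wasted")
```

For example, with an initial RTO of 1 second and a 60-second cap, the attempts fire at 1, 3, 7, 15, 31, 63, 123, and 183 seconds, so connectivity that returns at 124 seconds sits unused for 59 seconds. With a 120-second cap, the attempt after 63 seconds fires at 127 seconds and the next one at 247 seconds, so connectivity that returns at 128 seconds sits unused for 119 seconds.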
For TCP implementations that do not implement [RFC2988], even longer connectivity periods may be wasted. For example, Linux uses 120 seconds as the maximum RTO by default.

   [Plot not recoverable: sequence number over time; X = successfully transmitted segment, O = lost segment. Segments sent after connectivity is gone are lost; after connectivity is back, nothing is sent until TCP's next scheduled retransmit, and that interval is marked "Wasted connection time".]

     Figure 6: Standard TCP behavior in the presence of disrupted
                             connectivity.

This retransmission behavior is not efficient, especially in scenarios where connectivity periods are short and connectivity disruptions are frequent [OTT]. Experiments show that TCP performance across a path with frequent disruptions is significantly worse than across a similar path without disruptions [SCHUETZ].

In the ideal case, TCP would attempt a retransmission as soon as connectivity to its peer was re-established. Figure 7 illustrates the ideal behavior.

   [Plot not recoverable: sequence number over time; X = successfully transmitted segment, O = lost segment. TCP retransmits immediately when connectivity is back instead of at the next scheduled retransmit; that interval is marked "Efficiency improvement".]

      Figure 7: Ideal TCP behavior in the presence of disrupted
                             connectivity.

The ideal behavior is difficult to achieve for arbitrary connectivity disruptions. One obviously problematic approach would use higher-frequency retransmission attempts to enable earlier detection of whether connectivity has returned. This can generate significant amounts of extra traffic.
Other proposals attempt to trigger faster retransmissions by retransmitting buffered or newly-crafted segments from inside the network [SCOTT][I-D.dawkins-trigtran-linkup][DUKE][RFC3819].

Note that scenarios exist where path characteristics remain unchanged after long connectivity disruptions. In this case, even an intelligently scheduled slow-start is inefficient, because TCP could safely resume transmitting at the old rate instead of slow-starting. Although originally developed to avoid line-rate bursts, techniques for the well-known "slow-start after idle" case [I-D.ietf-tcpimpl-restart] may be useful to further improve performance after a disruption ends in such a scenario. This document does not currently describe this additional optimization, and it remains an open question how an end host could validate that path characteristics are unchanged after a long connectivity disruption.

Appendix B.  Document Revision History

+----------+--------------------------------------------------------+
| Revision | Comments                                               |
+----------+--------------------------------------------------------+
| 00       | Initial version. This document is a merge of and       |
|          | obsoletes [I-D.eggert-tcpm-tcp-retransmit-now] and     |
|          | [I-D.swami-tcp-lmdr].                                  |
| 01       | Major revision of the description of the               |
|          | connectivity-change indication TCP option and its      |
|          | processing in Section 4. Other formatting changes to   |
|          | the document include moving some background material   |
|          | to the appendix.                                       |
+----------+--------------------------------------------------------+

Authors' Addresses

Simon Schuetz
NEC Network Laboratories
Kurfuerstenanlage 36
Heidelberg  69115
Germany

Phone: +49 6221 4342 165
Fax:   +49 6221 4342 155
Email: simon.schuetz@netlab.nec.de
URI:   http://www.netlab.nec.de/


Lars Eggert
Nokia Research Center
P.O. Box 407
Nokia Group  00045
Finland

Phone: +358 50 48 24461
Email: lars.eggert@nokia.com
URI:   http://research.nokia.com/people/lars_eggert/


Wesley M. Eddy
Verizon Federal Network Systems
NASA Glenn Research Center
21000 Brookpark Road, MS 54-5
Cleveland, OH  44135
USA

Email: weddy@grc.nasa.gov


Yogesh Prem Swami
Nokia Research Center, Dallas
955 Page Mill Road
Palo Alto, California  94304
USA

Phone: +1 972 374 0669
Email: yogesh.swami@nokia.com


Khiem Le
Nokia Research Center, Dallas
6000 Connection Drive
Irving, TX  75603
USA

Phone: +1 972 342 3502
Email: khiem.le@nokia.com


Full Copyright Statement

Copyright (C) The IETF Trust (2007).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights.
Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).