TCPM Working Group                      S. Schuetz
Internet-Draft                          NEC
Intended status: Experimental           N. Koutsianas
Expires: August 25, 2008
                                        L. Eggert
                                        Nokia
                                        W. Eddy
                                        Verizon
                                        Y. Swami
                                        Nokia
                                        K. Le
                                        NSN
                                        February 22, 2008

      TCP Response to Lower-Layer Connectivity-Change Indications
                     draft-schuetz-tcpm-tcp-rlci-03

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. This document may not be modified, and derivative works of it may not be created, except to publish it as an RFC and to translate it into languages other than English.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on August 25, 2008.

Copyright Notice

Copyright (C) The IETF Trust (2008).

Schuetz, et al.          Expires August 25, 2008          [Page 1]
Internet-Draft  TCP Response to Connectivity Indications  February 2008

Abstract

When the path characteristics between two hosts change abruptly, TCP can experience significant delays before resuming transmission in an efficient manner or TCP can behave unfairly to competing traffic. This document describes TCP extensions that improve transmission behavior in response to advisory, lower-layer connectivity-change indications.
The proposed TCP extensions modify the local behavior of TCP and introduce a new TCP option to signal locally received connectivity-change indications to remote peers. Performance gains result from more efficient transmission behavior, and the extensions are no more aggressive than a newly started connection.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Motivation and Overview
   4.  Connectivity-Change Indications
   5.  TCP Response to Connectivity-Change Indications (CCIs)
     5.1.  Connectivity-Change Indication (CCI) TCP Option
     5.2.  Generation and Processing of Connectivity-Change Indication TCP Options
     5.3.  Re-Probing Path Characteristics
     5.4.  Speculative Retransmission
   6.  Discussion
     6.1.  Triggered Segment Transmission during Steady-State
     6.2.  Impact of Packet Loss
     6.3.  Use of Limited Transmit with RLCI
     6.4.  Simultaneous Processing of Connectivity-Change Indications
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgments
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Editorial Comments
   Appendix A.  Background: Classification of Connectivity Disruptions
     A.1.  Short Connectivity Disruptions
     A.2.  Long Connectivity Disruptions
   Appendix B.  Document Revision History
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

The Transmission Control Protocol (TCP) [RFC0793] generally assumes that the end-to-end path between two hosts has characteristics that are relatively stable over the lifetime of a connection. Although TCP's congestion control algorithms [RFC2581] can adapt to changes in the path characteristics after several round-trip times, they fail to support efficient operation in the few round-trip times immediately after a significant path change. This is due to the granularity of TCP's sampling mechanisms.

Significant changes to path connectivity include loss or reestablishment of connectivity, and drastic, abrupt changes in round-trip time (RTT) or available bandwidth. Connectivity changes that occur on such short time-scales are becoming more common, due to host mobility or intermittent network attachment.

This document describes a set of complementary TCP extensions that improve behavior when transmitting over paths whose characteristics can change on short time-scales. TCP implementations that support these extensions respond to generic, link-technology-independent, per-connection connectivity-change indications received from lower layers. A connectivity-change indication signals that the characteristics of the end-to-end path between the local node and its peer have changed in some undefined way. The response mechanisms proposed for TCP act on this information in a conservative fashion. The specific response depends on the current state of a connection when a connectivity-change indication is received.

It is important to note that adding response mechanisms for lower-layer information follows an established precedent. TCP and other transport protocols already react to information and signals from lower layers; the proposed connectivity-change indications thus extend an established interface between layers in the protocol stack. TCP measures the end-to-end path to implicitly derive network-layer information. TCP also directly reacts to network-layer signals delivered via ICMP, for example, "Port Unreachable" or the now-deprecated "Source Quench" [RFC1122]. Explicit Congestion Notification (ECN) [RFC3168] and Quick-Start [RFC4782] are other sources of network-layer information for which response mechanisms for TCP have been defined. Connectivity-change indications are yet another source of lower-layer information that TCP can use to improve its operation.

A second important point to note is that the TCP response mechanisms to connectivity-change indications are purely optional efficiency improvements. In the absence of connectivity-change indications, a TCP that implements these changes behaves identically to an unmodified TCP. When lower layers provide connectivity-change indications that trigger the response mechanisms, they enhance TCP operation based on the explicit lower-layer information that is signaled. These response mechanisms do not increase the aggressiveness of TCP.

Note that the IAB has recently described architectural issues of "link indications" [RFC4907]. The authors feel that this term is not quite accurate in this environment, because transport mechanisms should remain link-technology-agnostic. However, transport protocols have always acted on network-layer information and signals, such as measured path characteristics or ICMP-signaled conditions.
Because of the growing proliferation of shim layers between the traditional network and transport layers, this document uses the term "lower-layer indication" to remain independent of specific network or shim layers.

Note that it is currently an open question as to whether additional lower-layer indications can provide further information to transport protocols. Also, this document only describes response mechanisms for TCP, although other transport protocols may benefit from similar response mechanisms to react to connectivity-change indications.

2.  Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

The following abbreviations are used throughout the document:

   +------+---------------------------------------------------------+
   | CCI  | Connectivity-Change Indication                          |
   | RLCI | Response to Lower-layer Connectivity-change Indications |
   +------+---------------------------------------------------------+

                       Table 1: Abbreviations

3.  Motivation and Overview

Several proposed network-layer extensions support host mobility, including Mobile IPv4 [RFC3344], Mobile IPv6 [RFC3775] and HIP [I-D.ietf-hip-mm]. Typically, they shield transport-layer protocols from mobility events and enable them to sustain established connections across mobility events. However, the path characteristics that established connections experience after a mobility event may have changed drastically and on short time-scales.

Congestion control, RTT and path-MTU state gathered over an old path before the move generally have no meaning for the new path. Because TCP uses stale information when resuming transmission over the new path, it can be either too aggressive or highly inefficient.
Similar conditions may be found when fail-overs occur for multihomed hosts through the shim6 protocol. Some background on the types of scenarios that the technology described in this document is designed to work within is found in Appendix A.

TCP already forces a slow-start restart in some cases where the network state becomes unknown, such as after an idle period or heavy losses. A first part of the response specified in this document involves a similar return to initial slow-start state in response to connectivity-change indications that are received while a connection is transmitting in steady-state. Note that this behavior is more conservative than the standard TCP response or lack of response. Some performance gains with the proposed mechanisms are due to either avoiding overloading the new path, which typically incurs an RTO, or using slow-start to quickly detect new capacity far above the point where steady-state had previously been near.

A second response component improves TCP operation in the presence of temporary connectivity disruptions. These disruptions can occur independently of mobility events and, for example, may be due to insufficient wireless access coverage or nomadic computer use.

Connectivity disruptions can severely decrease TCP performance. The main reason for this decrease is TCP's retransmission behavior after a connectivity disruption [SCHUETZ]. TCP uses periodic retransmission attempts in exponentially increasing intervals, which can unnecessarily delay retransmissions after connectivity returns. In the extreme case, TCP connections can even abort, if the disruption is longer than the TCP "user timeout". (Connection aborts are out of scope for this document but can be prevented by the TCP User Timeout Option [I-D.ietf-tcpm-tcp-uto].)

This second response action executes when receiving a connectivity-change indication while a connection is stalled in exponential back-off.
It improves TCP retransmission behavior after connectivity is restored through an immediate speculative retransmission attempt [footnote-1]. Similar to the first response component, the second one also increases TCP performance through a more intelligent transmission behavior that uses periods of connectivity more efficiently. In comparison to startup of a new connection, it does not cause significant amounts of additional traffic and it does not change TCP's congestion control algorithms.

Finally, this draft specifies a third response component, which is a new TCP option that notifies the connection's remote peer of a connectivity-change event detected locally. This is useful because connectivity-change indications typically require appropriate responses at both ends of a connection, but may only be received or detected by one end. The other parts of the response to a connectivity-change indication are independent of the indication's source (locally notified or remotely signaled) and depend only on the specific indication and the state of the connection for which it was received.

4.  Connectivity-Change Indications

The focus of this document is on specifying TCP response mechanisms to lower-layer connectivity-change indications. This section briefly describes how different network- and shim-layer mechanisms underneath the transport layer may provide these connectivity-change indications to TCP. This section is included for clarification only; details on connectivity indication sources are out of scope of this document.

When lower layers detect a connectivity-change event, they generate corresponding connectivity-change indications.
Lower-layer events that could trigger such an indication include (but are not limited to):

   o  the IP address of the local outbound interface used for a given connection has changed, e.g., due to DHCP [RFC2131] or IPv6 router advertisements [RFC2460];

   o  link-layer connectivity of the local outbound interface used for a given connection has changed, e.g., link-layer "link up" event [RFC4957];

   o  the local outbound interface used for a given connection has changed, due to routing changes or link-layer connectivity changes at other interfaces (including tunnel establishment or teardown, e.g., in response to IKE events [RFC4306]);

   o  a Mobile IP binding update has completed [RFC3775];

   o  a HIP readdressing update has completed [I-D.ietf-hip-mm];

   o  a path-change signal from the network has arrived (possible in theory, depends on network capabilities);

   o  other notifications as defined by the IETF's Detecting Network Attachment (DNA) working group have occurred [RFC4957].

Note that the list above only describes some potential sources for connectivity-change events. Other sources exist, but the details on when to generate such events are out of the scope of this document, which focuses on the TCP response mechanisms when such events are received.

5.  TCP Response to Connectivity-Change Indications (CCIs)

A TCP connection can receive a connectivity-change indication (CCI) either from its local stack ("local CCI") or through a new "connectivity-change indication TCP option" from its peer ("remote CCI"). Section 5.1 specifies this new TCP option. In either case, upon reception of a CCI, the TCP RLCI (Response to Lower-layer Connectivity-change Indications) mechanisms defined in this document immediately re-probe path characteristics.
They do this either by performing a speculative retransmission or by sending a single segment of new data or a pure ACK, depending on whether the connection is currently stalled in exponential back-off or transmitting in steady-state, respectively. A connection is "stalled in exponential back-off" if at least one segment was retransmitted due to an RTO expiration but has not been ACK'ed yet.

The remainder of this section first defines the format of the new CCI TCP option in Section 5.1 and its processing in Section 5.2. After that, the two TCP response mechanisms triggered by receiving CCIs - re-probing path characteristics and speculative retransmission - are described in Section 5.3 and Section 5.4.

The TCP RLCI mechanisms defined in this document depend on the TCP Timestamps option (TSopt) [RFC1323]. Consequently, it is REQUIRED that an end host that wishes to use the RLCI mechanisms for a TCP connection negotiate the use of TCP Timestamps options with its peer. If this negotiation fails, a host MUST NOT use the RLCI mechanisms for a connection. TCP Timestamps options are needed by the RLCI mechanisms during the following operations:

   o  To re-probe the path characteristics after a connectivity-change indication. A host uses the TS Echo Reply (TSecr) field of a TCP Timestamps option to distinguish whether incoming ACKs are for segments that have been transmitted before or after the CCI.

   o  To identify a new remote CCI. A host uses the TS Value (TSval) field of an incoming TCP Timestamps option to distinguish a new remote CCI from the delayed reception of an old one. As a result, the last remote CCI is defined as the one received with the highest TS Value.

Section 5.2 and Section 5.3 give more details about how the RLCI mechanisms use TCP Timestamps options.

An implementation of the RLCI mechanisms defined in this document maintains nine new state variables per TCP connection. [footnote-2]

   LOCAL_CCI
      A 1-bit counter with an initial value of 0. It is used to detect the existence of a new local CCI. Its value changes every time a new local CCI received from the local stack starts being processed.

   REMOTE_CCI
      Holds a copy of the last CCI value advertised by the peer through a CCI TCP option. This is a 1-bit counter that is initialized to 0 and updated in response to remote CCIs according to the rules defined in Section 5.2.

   LOCAL_CCI_STATUS
      Holds the status of the processing of local CCIs. It can have three possible values: LOCAL_CCI_IDLE (0), LOCAL_CCI_NEW (1), LOCAL_CCI_ECHO_ACK (2). The initial value is LOCAL_CCI_IDLE.

   REMOTE_CCI_STATUS
      Holds the status of the processing of the last remote CCI advertised by the peer through a CCI TCP option. It can have two possible values: REMOTE_CCI_IDLE (0), REMOTE_CCI_ECHO (1). The initial value is REMOTE_CCI_IDLE.

   LAST_CCI_TIME
      Holds the local time when the last CCI (either local or remote) was received. It is updated every time either LOCAL_CCI or REMOTE_CCI is modified.

   REMOTE_CCI_PEER_TIME
      Used to distinguish new remote CCIs from retransmissions of past ones. It holds the TS Value (TSval) of the Timestamps option of the segment advertising the last remote CCI. It is initialized when receiving the first segment from the peer and it is updated every time REMOTE_CCI is modified.

   LOCAL_CCI_PEER_ECHO_TIME
      Used to distinguish the echo of a new local CCI from delayed retransmissions of echoes of older local CCIs. It holds the TS Value (TSval) of the Timestamps option of the segment that echoed the last local CCI. It is initialized when receiving the first segment from the peer and it is updated every time LOCAL_CCI_STATUS changes from LOCAL_CCI_NEW to LOCAL_CCI_ECHO_ACK.

   CCI_SNDMAX
      Retains the highest sequence number transmitted when the most recent CCI (either local or remote) was received.

   CCI_CONTROLLED_CWND
      A Boolean variable that sets an additional condition controlling the increment of TCP's congestion window (CWND). It has an initial value of false and is updated according to the rules defined in Section 5.2.

5.1.  Connectivity-Change Indication (CCI) TCP Option

Connectivity-change indications (CCIs) are generally asymmetric, i.e., they may occur or be detected by one end but not the other. The basic idea behind the CCI option is to signal the occurrence of local CCIs to the other end, so that the other end can respond appropriately as well. Note that this assumes that paths will generally be symmetric, meaning that a CCI received by one end for its path to the other end will imply that the characteristics of the reverse path have changed, too.

                        1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +---------------+---------------+-----+-+-+---+-+
   |               |               |  R  | | |   |E|
   |   Kind = X    |  Length = 3   |  E  |C|E| C |C|
   |               |               |  S  | |C| S |S|
   +---------------+---------------+-----+-+-+---+-+

   Figure 1: Format of the connectivity-change indication TCP option.

Figure 1 shows the format of the CCI option. It contains these fields:

   Kind (8 bits)
      The TCP option number X [RFC0793] allocated by IANA upon publication of this document (see Section 8).

   Length (8 bits)
      Length of the TCP option in octets [RFC0793]; its value MUST be 3.

   RES (3 bits)
      Reserved bits. The sender SHOULD set these to zero and the receiver MUST ignore them.

   C (1 bit)
      Current value of LOCAL_CCI of the end sending the option.

   EC (1 bit)
      Echoed value of C, i.e., the current value of REMOTE_CCI of the end sending the option.
   CS (2 bits)
      Current value of LOCAL_CCI_STATUS of the end sending the option.

   ECS (1 bit)
      Current value of REMOTE_CCI_STATUS of the end sending the option.

The CCI option contains two single-bit fields (C and EC) used to distinguish new CCIs from delayed retransmissions of past ones. It also contains flags that represent the processing status of each CCI. These flags are used for a 3-way handshake ensuring that both parties have been informed of a new CCI.

At the beginning of a connection, LOCAL_CCI and REMOTE_CCI MUST be set to 0. LOCAL_CCI_STATUS and REMOTE_CCI_STATUS MUST be set to LOCAL_CCI_IDLE and REMOTE_CCI_IDLE, respectively. A host actively opening a connection and wishing to use the CCI option for that connection MUST include a CCI option in its SYN segment with C := 0, CS := LOCAL_CCI_IDLE, EC := 0 and ECS := REMOTE_CCI_IDLE in order to advertise support for the TCP CCI option.

A host receiving a SYN segment MUST NOT include a CCI option in its SYN-ACK or any subsequent segment, unless it has received a CCI option in the corresponding SYN. If a host has received a CCI option in the SYN segment, it MUST echo that CCI option in its SYN-ACK segment, i.e., it MUST set C := 0, CS := LOCAL_CCI_IDLE, EC := 0 and ECS := REMOTE_CCI_IDLE.

A host MUST NOT process any following CCI options unless one was included in both the SYN and SYN-ACK and both peers have enabled TCP Timestamps for the connection. Section 5.2.1 and Section 5.2.2 describe the processing rules in detail.

A host MUST send a CCI option in all outgoing segments whenever LOCAL_CCI_STATUS is not LOCAL_CCI_IDLE or REMOTE_CCI_STATUS is not REMOTE_CCI_IDLE (or both). A host MUST NOT send a CCI option when LOCAL_CCI_STATUS is LOCAL_CCI_IDLE and REMOTE_CCI_STATUS is REMOTE_CCI_IDLE, i.e., when the host is not currently processing any CCI. The only exceptions to that rule are SYN and SYN-ACK segments.
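
The option layout and the sending rule above can be illustrated with a minimal sketch. It is not part of the specification: the option kind is a placeholder because the draft leaves Kind = X to IANA, the bit positions assume that Figure 1 is drawn most-significant bit first, and the names `CciOption`, `encode_cci_option`, `decode_cci_option` and `must_send_cci_option` are hypothetical.

```python
from dataclasses import dataclass

CCI_KIND = 253      # placeholder only: the draft leaves Kind = X to IANA
CCI_LENGTH = 3      # option length in octets, fixed by the draft

LOCAL_CCI_IDLE, LOCAL_CCI_NEW, LOCAL_CCI_ECHO_ACK = 0, 1, 2
REMOTE_CCI_IDLE, REMOTE_CCI_ECHO = 0, 1


@dataclass(frozen=True)
class CciOption:
    c: int    # LOCAL_CCI of the sender (1 bit)
    ec: int   # REMOTE_CCI of the sender (1 bit)
    cs: int   # LOCAL_CCI_STATUS of the sender (2 bits)
    ecs: int  # REMOTE_CCI_STATUS of the sender (1 bit)


def encode_cci_option(opt):
    """Pack the option; the three reserved bits are sent as zero."""
    # Assumed bit order within the third octet: RES(3) C EC CS(2) ECS.
    flags = ((opt.c & 1) << 4) | ((opt.ec & 1) << 3) \
        | ((opt.cs & 3) << 1) | (opt.ecs & 1)
    return bytes([CCI_KIND, CCI_LENGTH, flags])


def decode_cci_option(raw):
    """Unpack the option; the reserved bits are ignored on receipt."""
    if len(raw) != CCI_LENGTH or raw[0] != CCI_KIND or raw[1] != CCI_LENGTH:
        raise ValueError("not a CCI option")
    flags = raw[2]
    return CciOption(c=(flags >> 4) & 1, ec=(flags >> 3) & 1,
                     cs=(flags >> 1) & 3, ecs=flags & 1)


def must_send_cci_option(local_status, remote_status, negotiating_syn=False):
    """Sending rule: include the option while any CCI is being processed.

    SYN and SYN-ACK segments that negotiate the option are the exception.
    """
    return (negotiating_syn
            or local_status != LOCAL_CCI_IDLE
            or remote_status != REMOTE_CCI_IDLE)
```

A receiver built this way tolerates non-zero reserved bits, which matches the requirement that the receiver MUST ignore them.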

Whenever sending any CCI option, C MUST be set to the current LOCAL_CCI, EC MUST be set to the current REMOTE_CCI, CS MUST be set to LOCAL_CCI_STATUS and ECS MUST be set to REMOTE_CCI_STATUS.

5.2.  Generation and Processing of Connectivity-Change Indication TCP Options

Processing of a connectivity-change indication can be separated into two parts:

   1.  Processing in "initiator" mode, i.e., when a host receives a local CCI and (reliably) forwards it to the other end through a CCI option.

   2.  Processing in "responder" mode, i.e., when a host receives a remote CCI in a CCI option from the other end.

Section 5.2.1 and Section 5.2.2 describe the state machines at an initiator and a responder, respectively. Note that a single host can be both - initiator and responder - at the same time. This can happen if a local CCI occurs while processing for a remote CCI is ongoing, or vice versa.

The following events, conditions and actions are used in the definition of the two state machines:

Events:

   E_LOCAL_CCI
      Local end received a local CCI.

   E_REMOTE_CCI
      Local end received information about a remote CCI, i.e., received a TCP segment that includes a CCI option.

   E_SEGMENT_SENT
      Local end sent a TCP segment that includes the CCI option.

Conditions:

   C_NEW_REMOTE_CCI
      A received CCI option signals a new remote CCI, i.e., C != REMOTE_CCI, CS == LOCAL_CCI_NEW and the TSval of the Timestamps option of the received segment is greater than the current REMOTE_CCI_PEER_TIME (TSval > REMOTE_CCI_PEER_TIME).

   C_ECHOED_LOCAL_CCI
      A received CCI option echoes the last local CCI, i.e., EC == LOCAL_CCI, ECS == REMOTE_CCI_ECHO and the TSval of the Timestamps option of the received segment is greater than the current LOCAL_CCI_PEER_ECHO_TIME (TSval > LOCAL_CCI_PEER_ECHO_TIME).

   C_ECHOED_REMOTE_CCI
      A received CCI option acknowledges that the peer has received the echo of its last local CCI, i.e., C == REMOTE_CCI, CS == LOCAL_CCI_ECHO_ACK and the TSval of the Timestamps option of the received segment is greater than the current REMOTE_CCI_PEER_TIME (TSval > REMOTE_CCI_PEER_TIME).

Actions:

   A_TGL_LOCAL_CCI
      Toggle LOCAL_CCI.

   A_TGL_REMOTE_CCI
      Toggle REMOTE_CCI.

   A_REPROBE_PATH
      TCP discards all congestion control information gathered on the current path, initializes it to the defaults and re-probes path characteristics based only on the segments transmitted after this event, as described in Section 5.3. In other words, CCI_CONTROLLED_CWND := true, LAST_CCI_TIME := current local time, CCI_SNDMAX := highest sequence number transmitted so far and the congestion control state (CWND and SS_THRESH), round-trip time measurement (RTTM) state and RTO timer are reset to the initial values for a new connection. Additionally, if the connection is stalled in exponential back-off, TCP MUST act as if the RTO had expired and start the speculative retransmission procedure described in Section 5.4.

   A_FORCE_SEND
      Force transmission of a segment that MUST include a CCI option, in order to inform the peer about the local CCI. If the connection is stalled in exponential back-off, this is taken care of by the speculative retransmission procedure described in Section 5.4. If the connection is in steady-state and there is new data to be sent, TCP MUST immediately send a single segment of new data including a CCI option. If there is no new data to be sent, TCP MUST immediately send a pure ACK including a CCI option.

   A_UPD_CCI_PEER_TIME
      Set REMOTE_CCI_PEER_TIME to the TSval value of the TCP Timestamps option of the received segment.

   A_UPD_CCI_PEER_E_TIME
      Set LOCAL_CCI_PEER_ECHO_TIME to the TSval value of the TCP Timestamps option of the received segment.
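
The three conditions defined above can be read as simple predicates over the per-connection state and the fields of a received CCI option. The following is a minimal sketch under stated assumptions: the container types `CciState` and `ReceivedCci` and the function names are hypothetical, and the timestamp comparison is a plain integer comparison as written in the definitions, so 32-bit TSval wraparound is deliberately not handled.

```python
from dataclasses import dataclass

LOCAL_CCI_NEW, LOCAL_CCI_ECHO_ACK = 1, 2
REMOTE_CCI_ECHO = 1


@dataclass
class CciState:
    """Subset of the per-connection RLCI state used by the conditions."""
    local_cci: int = 0
    remote_cci: int = 0
    remote_cci_peer_time: int = 0       # REMOTE_CCI_PEER_TIME
    local_cci_peer_echo_time: int = 0   # LOCAL_CCI_PEER_ECHO_TIME


@dataclass(frozen=True)
class ReceivedCci:
    """Fields of a received CCI option plus the TSval of the same segment."""
    c: int
    ec: int
    cs: int
    ecs: int
    tsval: int


def is_new_remote_cci(st, seg):
    # C_NEW_REMOTE_CCI
    return (seg.c != st.remote_cci and seg.cs == LOCAL_CCI_NEW
            and seg.tsval > st.remote_cci_peer_time)


def is_echoed_local_cci(st, seg):
    # C_ECHOED_LOCAL_CCI
    return (seg.ec == st.local_cci and seg.ecs == REMOTE_CCI_ECHO
            and seg.tsval > st.local_cci_peer_echo_time)


def is_echoed_remote_cci(st, seg):
    # C_ECHOED_REMOTE_CCI
    return (seg.c == st.remote_cci and seg.cs == LOCAL_CCI_ECHO_ACK
            and seg.tsval > st.remote_cci_peer_time)
```

Because every predicate requires a strictly newer TSval, a delayed copy of an option that was already processed evaluates to false and leaves the state machines untouched.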

5.2.1.  Initiator Mode Processing

This section describes the initiator mode processing of a TCP host implementing RLCI. In initiator mode, a host signals the occurrence of a local CCI to its peer, until the peer echoes reception of that CCI. After receiving the echo, the host needs to acknowledge the echo reception, resulting in a 3-way handshake. Figure 2 shows the corresponding state machine.

At the beginning of a connection, i.e., before the first local CCI occurs, LOCAL_CCI is 0 and LOCAL_CCI_STATUS is LOCAL_CCI_IDLE. This remains the case until TCP receives a local CCI (E_LOCAL_CCI). When that happens, TCP toggles LOCAL_CCI (A_TGL_LOCAL_CCI), sets LOCAL_CCI_STATUS := LOCAL_CCI_NEW, starts re-probing the new path (A_REPROBE_PATH) and forces a segment to be sent to the peer (A_FORCE_SEND). Note that all subsequently transmitted segments MUST contain a CCI option until LOCAL_CCI_STATUS becomes LOCAL_CCI_IDLE.

After the host receives the echo of the local CCI (C_ECHOED_LOCAL_CCI), it updates LOCAL_CCI_PEER_ECHO_TIME (A_UPD_CCI_PEER_E_TIME) and sets LOCAL_CCI_STATUS := LOCAL_CCI_ECHO_ACK. The initiator remains in this state until it can send a segment with the CCI option (E_SEGMENT_SENT) that acknowledges reception of the CCI echo. At that time, it sets LOCAL_CCI_STATUS := LOCAL_CCI_IDLE.

The transition from LOCAL_CCI_IDLE to LOCAL_CCI_ECHO_ACK occurs if a segment acknowledging the reception of a CCI echo is lost, and the initiator retransmits the echo acknowledgment.

When a local CCI occurs (E_LOCAL_CCI) while LOCAL_CCI_STATUS != LOCAL_CCI_IDLE, the host MUST ignore it and MUST NOT alter LOCAL_CCI, because it is already processing another local CCI.
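
The initiator transitions described in this section can be sketched as a small state machine. This is an illustration under stated assumptions, not the authors' implementation: the class name `Initiator` is hypothetical, the actions A_REPROBE_PATH and A_FORCE_SEND are only recorded as strings rather than executed, TSval wraparound is ignored, and the re-entry from LOCAL_CCI_IDLE into LOCAL_CCI_ECHO_ACK is assumed not to update LOCAL_CCI_PEER_ECHO_TIME because the text names no action for that transition.

```python
from dataclasses import dataclass, field

LOCAL_CCI_IDLE, LOCAL_CCI_NEW, LOCAL_CCI_ECHO_ACK = 0, 1, 2
REMOTE_CCI_ECHO = 1


@dataclass
class Initiator:
    local_cci: int = 0
    status: int = LOCAL_CCI_IDLE
    peer_echo_time: int = 0                      # LOCAL_CCI_PEER_ECHO_TIME
    actions: list = field(default_factory=list)  # recorded, not executed

    def on_local_cci(self):
        """E_LOCAL_CCI: ignored while another local CCI is in progress."""
        if self.status != LOCAL_CCI_IDLE:
            return
        self.local_cci ^= 1                      # A_TGL_LOCAL_CCI
        self.status = LOCAL_CCI_NEW
        self.actions += ["A_REPROBE_PATH", "A_FORCE_SEND"]

    def on_remote_cci(self, ec, ecs, tsval):
        """E_REMOTE_CCI: react only if C_ECHOED_LOCAL_CCI holds."""
        echoed = (ec == self.local_cci and ecs == REMOTE_CCI_ECHO
                  and tsval > self.peer_echo_time)
        if not echoed:
            return
        if self.status == LOCAL_CCI_NEW:
            self.peer_echo_time = tsval          # A_UPD_CCI_PEER_E_TIME
            self.status = LOCAL_CCI_ECHO_ACK
        elif self.status == LOCAL_CCI_IDLE:
            # The echo acknowledgment was lost: acknowledge again.
            self.status = LOCAL_CCI_ECHO_ACK

    def on_segment_sent(self):
        """E_SEGMENT_SENT: a segment carrying the CCI option left the host."""
        if self.status == LOCAL_CCI_ECHO_ACK:
            self.status = LOCAL_CCI_IDLE
```

A typical exchange calls `on_local_cci()`, then `on_remote_cci()` when the echo arrives, then `on_segment_sent()` once the acknowledging segment has been transmitted, which returns the initiator to LOCAL_CCI_IDLE.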

         E_LOCAL_CCI =>
         A_TGL_LOCAL_CCI        E_REMOTE_CCI
         A_REPROBE_PATH         C_ECHOED_LOCAL_CCI=>
         A_FORCE_SEND           A_UPD_CCI_PEER_E_TIME
            +----------------+     +----------------+
            |                |     |                |
            |                |     |                |
            |                |     |                |
            |                V     |                V
   +----------------+     +----------------+     +----------------+
   |                |     |                |     |                |
   |LOCAL_CCI_STATUS|     |LOCAL_CCI_STATUS|     |LOCAL_CCI_STATUS|
   |       ==       |     |       ==       |     |       ==       |
   |LOCAL_CCI_IDLE  |     |LOCAL_CCI_NEW   |     |LOCAL_CCI_ECHO_ |
   |                |     |                |     |ACK             |
   +----------------+     +----------------+     +----------------+
           ^  |                                   ^  |
           |  |                                   |  |
           |  +-----------------------------------+  |
           |           E_REMOTE_CCI                  |
           |           C_ECHOED_LOCAL_CCI            |
           |                                         |
           +-----------------------------------------+
                       E_SEGMENT_SENT

   Figure 2: State machine for initiator processing.

5.2.2.  Responder Mode Processing

This section describes the responder mode processing of CCIs for a TCP host implementing the CCI option. In responder mode, a host echoes the last received remote CCI to its peer, until it can be sure that the peer correctly received the echo. Figure 3 shows the corresponding state machine.

At the beginning of a connection, REMOTE_CCI is 0 and REMOTE_CCI_STATUS is REMOTE_CCI_IDLE, i.e., the local host is not processing any remote CCIs. When TCP receives a segment with a CCI option (E_REMOTE_CCI) signaling a new remote CCI (C_NEW_REMOTE_CCI), it toggles REMOTE_CCI (A_TGL_REMOTE_CCI), changes REMOTE_CCI_STATUS to REMOTE_CCI_ECHO, updates REMOTE_CCI_PEER_TIME according to TSval (A_UPD_CCI_PEER_TIME), starts re-probing the new path (A_REPROBE_PATH) and forces a segment to be sent to the peer (A_FORCE_SEND). Note that all subsequently transmitted segments MUST contain a CCI option until REMOTE_CCI_STATUS is again REMOTE_CCI_IDLE. This transition occurs when the peer acknowledges the reception of the CCI echo (C_ECHOED_REMOTE_CCI).
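
The responder processing can be sketched in the same style. Again this is only an illustration under assumptions: the class name `Responder` is hypothetical, A_REPROBE_PATH and A_FORCE_SEND are recorded as strings instead of being executed, TSval wraparound is ignored, and a new remote CCI that arrives while the responder is still echoing is handled exactly like one that arrives in REMOTE_CCI_IDLE, as the responder state machine of this section requires.

```python
from dataclasses import dataclass, field

LOCAL_CCI_NEW, LOCAL_CCI_ECHO_ACK = 1, 2
REMOTE_CCI_IDLE, REMOTE_CCI_ECHO = 0, 1


@dataclass
class Responder:
    remote_cci: int = 0
    status: int = REMOTE_CCI_IDLE
    peer_time: int = 0                           # REMOTE_CCI_PEER_TIME
    actions: list = field(default_factory=list)  # recorded, not executed

    def on_remote_cci(self, c, cs, tsval):
        """E_REMOTE_CCI: a segment carrying a CCI option was received."""
        newer = tsval > self.peer_time
        if c != self.remote_cci and cs == LOCAL_CCI_NEW and newer:
            # C_NEW_REMOTE_CCI, from either IDLE or ECHO.
            self.remote_cci ^= 1                 # A_TGL_REMOTE_CCI
            self.peer_time = tsval               # A_UPD_CCI_PEER_TIME
            self.status = REMOTE_CCI_ECHO
            self.actions += ["A_REPROBE_PATH", "A_FORCE_SEND"]
        elif (self.status == REMOTE_CCI_ECHO and c == self.remote_cci
              and cs == LOCAL_CCI_ECHO_ACK and newer):
            # C_ECHOED_REMOTE_CCI: the peer has seen our echo.
            self.status = REMOTE_CCI_IDLE
```

Retransmitted copies of the same CCI carry an unchanged C bit, so they fall through both branches and the responder simply keeps echoing.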

     E_REMOTE_CCI                E_REMOTE_CCI
     C_NEW_REMOTE_CCI =>         C_NEW_REMOTE_CCI =>
     A_TGL_REMOTE_CCI            A_TGL_REMOTE_CCI
     A_UPD_CCI_PEER_TIME         A_UPD_CCI_PEER_TIME
     A_REPROBE_PATH              A_REPROBE_PATH
     A_FORCE_SEND                A_FORCE_SEND
            +-----------------+     +-------------+
            |                 |     |             |
            |                 V     |             |
   +-----------------+     +-----------------+    |
   |REMOTE_CCI_STATUS|     |REMOTE_CCI_STATUS|    |
   |       ==        |     |       ==        |    |
   |REMOTE_CCI_IDLE  |     |REMOTE_CCI_ECHO  |    |
   +-----------------+     +-----------------+    |
            ^                 |     ^             |
            |                 |     |             |
            +-----------------+     +-------------+
               E_REMOTE_CCI
               C_ECHOED_REMOTE_CCI

   Figure 3: State machine for responder processing.

If TCP receives a new remote CCI while REMOTE_CCI_STATUS == REMOTE_CCI_ECHO, this indicates that the acknowledgment of a previous CCI echo may have been lost and that a new CCI has occurred at the peer. In this case, TCP MUST perform the same actions as if REMOTE_CCI_STATUS == REMOTE_CCI_IDLE.

5.3.  Re-Probing Path Characteristics

When a TCP connection receives a new CCI, it MUST re-probe path characteristics in order to avoid causing congestion by transmitting based on stale path state information. In principle, this is similar to the initial slow-start: The sender MUST NOT transmit more than the default initial window (INIT_WINDOW) of data after a new CCI is received and it MUST reset the congestion control state (CWND and SS_THRESH), round-trip time measurement (RTTM) state and RTO timer, as if this were a new connection [RFC2581][RFC2988].

If Path MTU Discovery (PMTUD) is in use, the PMTUD state MUST also be reset [RFC1191][RFC1981][RFC4821].

One difference to an initial slow-start is that after a CCI, the connection may have segments in flight towards the destination along a previous path. Therefore, after a CCI, TCP MUST ignore any ACKs received for data that was sent before the CCI and it MUST update the congestion window solely based on ACKs for data that was sent after the CCI occurred.
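
The reset performed by A_REPROBE_PATH and the ACK filtering that this section specifies can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the `Conn` type and both function names are hypothetical, the constants are illustrative defaults (an initial window of two segments, an arbitrarily high initial SS_THRESH and a 3-second initial RTO, in the spirit of [RFC2581] and [RFC2988]), PMTUD state is left out, sequence and timestamp wraparound are ignored, and the ACK that reaches CCI_SNDMAX is itself assumed to be processed normally.

```python
from dataclasses import dataclass
from typing import Optional

SMSS = 1460                 # assumed sender maximum segment size, in bytes
INIT_WINDOW = 2 * SMSS      # illustrative initial window
INIT_SSTHRESH = 65535       # illustrative "arbitrarily high" SS_THRESH
INIT_RTO = 3.0              # illustrative initial RTO, in seconds


@dataclass
class Conn:
    cwnd: int = INIT_WINDOW
    ssthresh: int = INIT_SSTHRESH
    srtt: Optional[float] = None    # RTTM state; None means "no sample yet"
    rttvar: Optional[float] = None
    rto: float = INIT_RTO
    snd_max: int = 0                # highest sequence number transmitted
    last_cci_time: int = 0          # LAST_CCI_TIME, in local timestamp ticks
    cci_sndmax: int = 0             # CCI_SNDMAX
    cci_controlled_cwnd: bool = False


def reprobe_path(conn, now_ts):
    """A_REPROBE_PATH: forget the old path and restart as a new connection."""
    conn.last_cci_time = now_ts
    conn.cci_sndmax = conn.snd_max
    conn.cci_controlled_cwnd = True
    conn.cwnd = INIT_WINDOW
    conn.ssthresh = INIT_SSTHRESH
    conn.srtt = conn.rttvar = None
    conn.rto = INIT_RTO


def ack_may_update_cwnd(conn, ack_no, tsecr):
    """Return True if this ACK is allowed to update the congestion window."""
    if not conn.cci_controlled_cwnd:
        return True
    if ack_no >= conn.cci_sndmax:
        # Everything sent before the CCI is covered: resume normal processing.
        conn.cci_controlled_cwnd = False
        return True
    # Only ACKs that echo a timestamp taken at or after the CCI count.
    return tsecr >= conn.last_cci_time
```

In a real stack the caller would still use a filtered ACK to advance the send window and free acknowledged data; the sketch only answers whether the congestion window may grow.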
The mechanism for distinguishing ACKs for data sent after a CCI occurred from ACKs for data sent before it uses the TCP Timestamps option [RFC1323]. When a host receives a new CCI (either local or remote), LAST_CCI_TIME MUST be set to the current local time, CCI_SNDMAX MUST be set to the highest sequence number transmitted so far and CCI_CONTROLLED_CWND MUST be set to true. While CCI_CONTROLLED_CWND == true, TCP MUST update the congestion window based only on inbound ACKs that contain a TS Echo Reply (TSecr) value greater than or equal to LAST_CCI_TIME. Any inbound ACK with a TSecr value less than LAST_CCI_TIME MUST NOT cause an update to the congestion window, even if it advances the window.

If CCI_CONTROLLED_CWND is true and the host receives an ACK that acknowledges a sequence number greater than or equal to CCI_SNDMAX, CCI_CONTROLLED_CWND MUST be set to false and the congestion control algorithm MUST begin to process all ACKs normally, without checking their Timestamps options.

5.4.  Speculative Retransmission

The basic idea behind speculative retransmission is to allow TCP to resume stalled connections as soon as it receives an indication that connectivity to previously unreachable peers may have returned.

When a TCP connection receives a new CCI (either from the local stack or in a CCI TCP option from the peer) and is currently stalled in exponential back-off, it MUST immediately initiate the standard retransmission procedure, just as if the RTO for the connection had expired.

6.  Discussion

This section discusses some design choices of the RLCI mechanisms that can affect TCP performance under certain circumstances.

6.1.  Triggered Segment Transmission during Steady-State

A TCP stack that implements RLCI mechanisms and receives a local CCI immediately sends a TCP segment (A_FORCE_SEND) in order to inform the other end of the CCI and resets all path information (A_REPROBE_PATH). When TCP is stalled in exponential back-off, this is taken care of by the speculative retransmission procedure that is triggered by the CCI. On the other hand, when TCP is in steady-state, it sends a new segment (A_FORCE_SEND) if there is any new data queued for transmission. As usual, the number of unacknowledged segments is limited by CWND. However, CWND has just been reset to its initial value. As a result, the forced transmission may send a segment that lies outside the current congestion window. Although this behavior may appear to be aggressive, it is in fact as conservative as a newly starting connection, because only a single unacknowledged segment is sent along the path after the CCI.

6.2.  Impact of Packet Loss

If a connection is in exponential back-off when a CCI occurs, TCP considers all unacknowledged segments to be lost and the speculative retransmission procedure starts immediately. On the other hand, if the connection is in steady-state when a CCI occurs, TCP considers all unacknowledged segments to still be in flight and continues sending new data. Depending on what caused the CCI, four scenarios are possible that differ in what happens to segments and ACKs in flight:

1.  All (or at least the vast majority of) segments and ACKs in flight reach their respective destinations, i.e., there are no losses. In this case, TCP acts as if a new connection had started and re-probes the new path.

2.  Some of the ACKs in flight from the receiver to the sender are lost. In this case, TCP behaves exactly as above, because a cumulative ACK for the new segment sent along the path after the CCI acknowledges all the previous unacknowledged segments.

3.  Some of the data segments in flight from the sender to the receiver are lost. In this case, the new data segment transmitted after the CCI causes a duplicate ACK. As this duplicate ACK does not cause TCP to send another data segment, the connection stalls and an RTO occurs. After the RTO, the standard retransmission procedure takes place with SS_THRESH equal to INITIAL_WINDOW/2 (i.e., the minimum allowed). This disables slow-start and severely degrades performance. A possible solution is to execute the speculative retransmission procedure after receiving a CCI even if the connection is in steady-state.

4.  Some of the data segments and some of the ACKs that are in flight are lost. This case is similar to the previous one.

In all these cases, it is also possible that the round-trip time changes significantly after the CCI, reordering data segments and ACKs that are still in flight with ones sent after the CCI. These reorderings appear to TCP as losses, and may result in the connection experiencing one of the above cases even if there was no actual packet loss.

6.3.  Use of Limited Transmit with RLCI

As described in the previous section, when a connection is in steady-state, a connectivity-change indication (CCI) resets all path information of TCP and causes one new data segment to be sent. In case of significant data segment loss before a CCI, the new data segment transmitted after the CCI causes a duplicate ACK. As this duplicate ACK does not trigger TCP to send another data segment, the connection stalls and an RTO occurs.

Limited Transmit [RFC3042] can be used in case of packet loss in order to cause the transmission of three duplicate ACKs and trigger the fast retransmission procedure.
Because Limited Transmit must not cause the amount of outstanding data to exceed the congestion window plus two segments, it cannot always be used after a CCI, when CWND has just been re-initialized. If the connection has more outstanding data than INITIAL_WINDOW plus two segments before a CCI, resetting CWND to the initial value after the CCI leaves an amount of outstanding data greater than the new CWND plus two segments, which disables Limited Transmit.

A modified Limited Transmit algorithm can be used in combination with RLCI:

If CCI_CONTROLLED_CWND is true:  The Limited Transmit algorithm as described in [RFC3042] should be followed, but without checking the amount of outstanding data, i.e., if a TCP sender has previously unsent data queued for transmission it should transmit new data upon the arrival of the first two consecutive duplicate ACKs when the receiver's advertised window allows this transmission.

If CCI_CONTROLLED_CWND is false:  The Limited Transmit algorithm as described in [RFC3042] should be followed unmodified.

When the fast retransmission procedure is triggered by the modified Limited Transmit after a CCI, SS_THRESH is set to INITIAL_WINDOW/2 (i.e., the minimum allowed), because CWND before fast retransmission was equal to INITIAL_WINDOW. As a result, slow-start is disabled, which decreases TCP performance. A minor modification can keep SS_THRESH unmodified in this case, i.e., if CCI_CONTROLLED_CWND == true and CWND == INITIAL_WINDOW, keep SS_THRESH unmodified (at its initial value) upon the reception of the third duplicate ACK that triggers the fast retransmission procedure.

6.4.  Simultaneous Processing of Connectivity-Change Indications

As mentioned in Section 5.2.1, if a local CCI occurs (E_LOCAL_CCI) while LOCAL_CCI_STATUS != LOCAL_CCI_IDLE, the host MUST ignore it, because it is already processing another local CCI. As a result, each end can process only one local CCI at a time. Consequently, as every remote CCI at one end is triggered by a local CCI at the other end, each end can also process only one remote CCI at a time.

On the other hand, if both hosts receive connectivity-change indications from their local stacks (local CCIs) at almost the same time, local and remote CCIs may be processed simultaneously at both ends. In that case, path re-probing is triggered twice at each end within an interval that can be shorter than one RTT. As this does not improve TCP performance, it can be avoided by triggering the A_REPROBE_PATH action only if CCI_CONTROLLED_CWND == false.

7.  Security Considerations

The only foreseen security considerations with the techniques presented in this document result from either an attacker's ability to spoof valid TCP segments with CCI options that seemingly indicate connectivity changes, or an attacker's ability to generate bogus CCIs locally.

An attacker might produce a stream of such false indicators that could keep a connection in slow-start at the initial window. One possible defense against this type of attack is to rate-limit the response to CCIs (whether local or remote). This attack is also probably less serious than other attacks such an empowered adversary could perform, like resetting the connection or injecting data. A similar effect could be achieved without the new CCI option by forging duplicate ACKs that would keep a sender in loss recovery.

If both sets of IP addresses, port numbers, and sequence numbers are guessable for a connection, then the connection should employ other measures [RFC4953] for protection against spoofed segments.

8.  IANA Considerations

This section is to be interpreted according to [I-D.narten-iana-considerations-rfc2434bis].
This document does not define any new namespaces. It requests that IANA allocate a new 8-bit TCP option number for the CCI option from the registry maintained at http://www.iana.org/assignments/tcp-parameters.

9.  Acknowledgments

This draft combines and obsoletes [I-D.swami-tcp-lmdr] and [I-D.eggert-tcpm-tcp-retransmit-now].

The authors would like to thank Mark Allman, Marcus Brunner, Alfred Hoenes, Shashikant Maheshwari, Kacheong Poon, Juergen Quittek, Stefan Schmid and Joe Touch for their comments and suggestions on this draft as well as the two original drafts.

Simon Schuetz and Lars Eggert are partly funded by the Trilogy project, a research project supported by the European Commission under its Seventh Framework Program.

Wesley Eddy's work on this document was performed at NASA's Glenn Research Center, while in support of the NASA Space Communications Architecture Working Group (SCAWG), and the FAA/Eurocontrol Future Communications Study (FCS).

10.  References

10.1.  Normative References

[I-D.narten-iana-considerations-rfc2434bis]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", draft-narten-iana-considerations-rfc2434bis-08 (work in progress), October 2007.

[RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, November 1990.

[RFC1323]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions for High Performance", RFC 1323, May 1992.

[RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery for IP version 6", RFC 1981, August 1996.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion Control", RFC 2581, April 1999.

[RFC2988]  Paxson, V. and M. Allman, "Computing TCP's Retransmission Timer", RFC 2988, November 2000.

[RFC3042]  Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing TCP's Loss Recovery Using Limited Transmit", RFC 3042, January 2001.

[RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU Discovery", RFC 4821, March 2007.

10.2.  Informative References

[DUKE]  Duke, M., Henderson, T., and J. Meegan, "Experience with 'Link-UP Notification' Over a Mobile Satellite Link", ACM Computer Communication Review, Vol. 34, No. 3, July 2004.

[EDDY]  Eddy, W. and Y. Swami, "Adapting End-host Congestion Control for Mobility", NASA Glenn Research Center Technical Report, CR-2005-213838, July 2005.

[I-D.dawkins-trigtran-linkup]  Dawkins, S., "End-to-end, Implicit 'Link-Up' Notification", draft-dawkins-trigtran-linkup-01 (work in progress), October 2003.

[I-D.eggert-tcpm-tcp-retransmit-now]  Eggert, L., "TCP Extensions for Immediate Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02 (work in progress), June 2005.

[I-D.ietf-hip-mm]  Henderson, T., "End-Host Mobility and Multihoming with the Host Identity Protocol", draft-ietf-hip-mm-05 (work in progress), March 2007.

[I-D.ietf-tcpimpl-restart]  Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP Slow-Start Restart After Idle", draft-ietf-tcpimpl-restart-00 (work in progress), March 1998.

[I-D.ietf-tcpm-tcp-uto]  Eggert, L. and F. Gont, "TCP User Timeout Option", draft-ietf-tcpm-tcp-uto-08 (work in progress), November 2007.

[I-D.swami-tcp-lmdr]  Swami, Y., "Lightweight Mobility Detection and Response (LMDR) Algorithm for TCP", draft-swami-tcp-lmdr-07 (work in progress), March 2006.

[KOODLI]  Koodli, R. and C. Perkins, "Fast Handovers and Context Transfers in Mobile Networks", ACM Computer Communication Review, Vol. 31, No. 5, October 2001.

[OTT]  Ott, J. and D. Kutscher, "OTT Internet: IEEE 802.11b for Automobile Users", Proc. Infocom 2004, March 2004.

[RFC1122]  Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.

[RFC2131]  Droms, R., "Dynamic Host Configuration Protocol", RFC 2131, March 1997.

[RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, December 1998.

[RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.

[RFC3344]  Perkins, C., "IP Mobility Support for IPv4", RFC 3344, August 2002.

[RFC3775]  Johnson, D., Perkins, C., and J. Arkko, "Mobility Support in IPv6", RFC 3775, June 2004.

[RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D., Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. Wood, "Advice for Internet Subnetwork Designers", BCP 89, RFC 3819, July 2004.

[RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", RFC 4306, December 2005.

[RFC4782]  Floyd, S., Allman, M., Jain, A., and P. Sarolahti, "Quick-Start for TCP and IP", RFC 4782, January 2007.

[RFC4907]  Aboba, B., "Architectural Implications of Link Indications", RFC 4907, June 2007.

[RFC4953]  Touch, J., "Defending TCP Against Spoofing Attacks", RFC 4953, July 2007.

[RFC4957]  Krishnan, S., Montavont, N., Njedjou, E., Veerepalli, S., and A. Yegin, "Link-Layer Event Notifications for Detecting Network Attachments", RFC 4957, August 2007.

[SCHUETZ]  Schuetz, S., Eggert, L., Schmid, S., and M. Brunner, "Protocol Enhancements for Intermittently Connected Hosts", ACM Computer Communication Review, Vol. 35, No. 3, July 2005.

[SCOTT]  Scott, J. and G. Mapp, "Link layer-based TCP optimization for disconnecting networks", ACM Computer Communication Review, Vol. 33, No. 5, October 2003.
Editorial Comments

[footnote-1]  The authors have heard the idea of triggering retransmits based on connectivity events of directly-connected links being attributed to Phil Karn ("kick" operation in the KA9Q TCP stack). A thread from the PILC mailing list in 2000 discusses some thoughts on this (http://www.isi.edu/pilc/list/archive/0691.html).

[footnote-2]  Although this specification introduces eight new per-connection state variables, a preliminary implementation of an earlier revision of this mechanism [I-D.swami-tcp-lmdr] only required around a hundred lines of kernel code.

Appendix A.  Background: Classification of Connectivity Disruptions

Connectivity disruptions can occur in many different situations. They can be due to wireless interference, movement out of a wireless coverage area, switching between access networks, or simply due to unplugging an Ethernet cable. Depending on the situation in which they occur, the implications of connectivity disruptions are different and must be handled appropriately. This section attempts to classify different types of connectivity disruptions and discusses their implications and impact on TCP.

Two main properties of connectivity disruptions affect how TCP reacts to them: their duration and whether the path characteristics have significantly changed after they end. This document distinguishes between "short" and "long" disruptions and "changed" and "unchanged" path characteristics. Note that these two categories are orthogonal to each other, i.e., four types of connectivity disruptions exist.

Connectivity disruptions are "short" for a given TCP connection, if connectivity returns before the RTO fires for the first time, i.e., when TCP is still in steady-state. In this case, standard TCP recovers lost data segments through Fast Retransmit and lost ACKs through successfully delivered later ACKs. Appendix A.1 briefly describes this case.

Connectivity disruptions are "long" for a given TCP connection, if the RTO fires at least once before connectivity returns, i.e., when TCP is in exponential back-off. In this case, TCP can be inefficient in its retransmission scheme, as described in Appendix A.2.

Whether or not path characteristics change when connectivity returns is a second important factor for TCP's retransmission scheme. Standard TCP implicitly assumes that path characteristics remain unchanged across short disruptions by performing Fast Retransmit using the path parameters collected before the disruption. For long disruptions, standard TCP is more conservative and performs slow-start, re-probing the path characteristics from scratch. However, this behavior can be inefficient because of when slow-start is initiated. These implicit assumptions can cause standard TCP to misbehave or perform inefficiently in some scenarios. Figure 4 illustrates the standard TCP behavior.

            +-----------------------+-----------------------+
   Short    | Fast Retransmit using | Fast Retransmit using |
   Duration | currently collected   | currently collected   |
   < RTO    | path characteristics  | path characteristics  |
            +-----------------------+-----------------------+
   Long     |                       |                       |
   Duration |      Slow-start       |      Slow-start       |
   >= RTO   |                       |                       |
            +-----------------------+-----------------------+
                 Unchanged Path           Changed Path
                 Characteristics          Characteristics

                   Figure 4: Standard TCP behavior.

A.1.  Short Connectivity Disruptions

One common cause of short connectivity disruptions that result in a change of the end-to-end path characteristics is transparent network layer mobility, via protocols such as Mobile IP, NEMO, or HIP. These protocols generally hide mobility events from the transport layer, but cannot mask the resulting changes to the end-to-end path that established TCP connections transmit over.
Consider a Mobile IP scenario as shown in Figure 5. At time T, a mobile node MN attaches to access network Net-1, connected to the Internet through access router AR-1 and has the care-of address . It establishes a TCP connection to the correspondent node CN. While MN is attached to AR-1, packets between CN and follow PATH-1 (via Cloud-1 and AR-1). Assume that at some time T+1, MN moves and then attaches to Net-2, which is reachable through AR-2 with the care-of address . While MN is attached to AR-2, all packets between CN and follow PATH-2 (through Cloud-2 and AR-2).

              <---------PATH-1---------->
                 /---------\   +------+
                 |         |   |      |   Net-1
             +---+ Cloud-1 +---+ AR-1 +-----> MN (time=T)
             |   |         |   |      |       |
             |   \----+----/   +---+--+       |
             |        |            |          |
   CN <------+        |   PATH-3   |          |
             |        |            |          |
             |   /----V----\   +------+       V
             |   |         |   |      |
             +---+ Cloud-2 +---+ AR-2 +-----> MN (time=T+1)
                 |         |   |      |   Net-2
                 \---------/   +------+
              <--------PATH-2----------->

                      Figure 5: Mobility example.

During a transient disconnected period, MN may have disconnected from Net-1 and not yet attached to Net-2. Consequently, AR-1 may not be able to deliver packets to MN. This could result in a burst of packet losses. Several approaches for "fast" or "seamless" handovers exist that involve adding machinery to the ARs to buffer and redirect packets originally sent to Net-1 towards Net-2, rather than dropping them (e.g., [KOODLI]).

As long as MN remains in Net-1, standard congestion control algorithms [RFC2581] are sufficient. However, once MN moves from Net-1 to Net-2, two different scenarios are possible depending on network topology:

o  In the first scenario, with standard Mobile IPv4, all packets destined to are dropped by AR-1 once MN has moved. Since the latency involved in establishing a new tunnel to the HA is on the order of the RTT (2*RTT in case of Mobile IPv6), roughly an entire window's worth of data and ACKs will be dropped by AR-1. Because of this burst loss, CN and MN are likely to incur expensive retransmission timeouts.

o  In the second scenario, with a fast handover mechanism in place, losses are masked through buffering and tunneling between routers AR-1 and AR-2. The exact sequence of buffering and forwarding between the ARs is not guaranteed to occur in a manner consistent with the available bandwidth of PATH-3 or conformant to TCP's clocking expectations. This can cause TCP's behavior over PATH-2 to be based on the unrelated properties of PATH-1 and PATH-3.

After attaching to Net-2, reception of stale ACKs (for data sent on PATH-1) will cause MN to incorrectly inflate its congestion window. These stale ACKs do not provide any indication of the congestion along PATH-2. CN's congestion window becomes similarly inflated by ACKs that MN sends for data segments redirected over PATH-3. If the congestion windows from PATH-1 are already too big for PATH-2, this can overload Net-2 or PATH-2, causing packet loss and timeouts.

On the other hand, if the available bandwidth along PATH-2 is greater than along PATH-1, and if the sender is in congestion avoidance, it may need many RTTs before utilizing the available path capacity. This is due to the relatively slow bandwidth increase during congestion avoidance caused by a stale SS_THRESH. (See [EDDY] for details.)

A.2.  Long Connectivity Disruptions

For long disruptions, standard TCP performs slow-start after connectivity returns, because the retransmission timeout (RTO) has expired. This conservative strategy avoids overloading the new path. However, TCP's general exponential back-off retransmission strategy can time these slow-starts such that performance decreases.

When a long connectivity disruption occurs along the path between a host and its peer while the host is transmitting data, it stops receiving ACKs. After the RTO expires, the host attempts to retransmit the first unacknowledged segment. TCP implementations that follow the recommended RTO management proposed in [RFC2988] double the RTO after each retransmission attempt until it exceeds 60 seconds. This scheme causes a host to attempt to retransmit across established connections roughly once a minute (and more frequently during the first minute or two of the connectivity disruption, while the RTO is still being backed off).

When the long connectivity disruption ends, standard TCP implementations still wait until the RTO expires before attempting retransmission. Figure 6 illustrates this behavior. Depending on when connectivity becomes available again, this can waste up to a minute of connectivity for TCPs that implement the recommended RTO management described in [RFC2988]. For TCP implementations that do not implement [RFC2988], even longer connectivity periods may be wasted. For example, Linux uses 120 seconds as the maximum RTO by default.

   Sequence
   number          X = Successfully transmitted segment
   ^               O = Lost segment
   |     :                     :              :  X
   |     :                     :              :X
   |     OO O  O    O        O :              X
   |    X:                     :              :
   |   X :                     :<------------>:
   |  X  :                     :    Wasted    :
   | X   :                     :  connection  :
   |X    :                     :     time     :
   +-----:---------------------:--------------:-------->
         :                     :              :    Time
   Connectivity          Connectivity        TCP
       gone                  back        retransmit

   Figure 6: Standard TCP behavior in the presence of disrupted connectivity.

This retransmission behavior is not efficient, especially in scenarios where connectivity periods are short and connectivity disruptions are frequent [OTT]. Experiments show that TCP performance across a path with frequent disruptions is significantly worse than across a similar path without disruptions [SCHUETZ].

In the ideal case, TCP would attempt a retransmission as soon as connectivity to its peer was re-established. Figure 7 illustrates the ideal behavior.
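To make the cost of exponential back-off concrete, the following Python sketch computes when the retransmission timer fires during a disruption and how long a connection keeps waiting after connectivity has returned. It is a worked example under stated assumptions, not part of the mechanism: the function names are hypothetical, the initial RTO of 1 second is an illustrative value, and the RTO is capped at exactly 60 seconds (120 seconds is shown as the alternative maximum mentioned above). Time zero is the moment the first unacknowledged segment was sent.

```python
from typing import List


def retransmit_times(initial_rto: float, max_rto: float,
                     horizon: float) -> List[float]:
    """Times (in seconds) at which the RTO fires, up to `horizon`."""
    times = []
    rto = initial_rto
    t = rto
    while t <= horizon:
        times.append(t)
        rto = min(rto * 2, max_rto)  # exponential back-off with a cap
        t += rto
    return times


def wasted_time(connectivity_back: float, initial_rto: float = 1.0,
                max_rto: float = 60.0) -> float:
    """Seconds between connectivity returning and the next RTO expiry."""
    t = 0.0
    rto = initial_rto
    while True:
        t += rto
        if t >= connectivity_back:
            return t - connectivity_back
        rto = min(rto * 2, max_rto)


# Retransmissions for a 1 s initial RTO capped at 60 s.
print(retransmit_times(1.0, 60.0, 130.0))
# Connectivity returns 64 s into the disruption, just after a retransmission.
print(wasted_time(64.0))
# Same disruption with a 120 s maximum RTO.
print(wasted_time(64.0, max_rto=120.0))
```

With these assumed values the timer fires at 1, 3, 7, 15, 31, 63 and 123 seconds, so connectivity that returns at 64 seconds goes unused for 59 seconds; with a 120-second cap the next attempt only happens at 127 seconds. A retransmission triggered at the moment connectivity returns would reduce this wait to zero.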
   Sequence
   number          X = Successfully transmitted segment
   ^               O = Lost segment
   |     :                     :  X           :
   |     :                     :X             :
   |     OO O  O    O        O X              :
   |    X:                     :              :
   |   X :                     :<------------>:
   |  X  :                     :  Efficiency  :
   | X   :                     : improvement  :
   |X    :                     :              :
   +-----:---------------------:--------------:-------->
         :                     :              :    Time
   Connectivity          Connectivity       Next
       gone            back := immediate  scheduled
                        TCP retransmit   retransmit

   Figure 7: Ideal TCP behavior in the presence of disrupted connectivity.

The ideal behavior is difficult to achieve for arbitrary connectivity disruptions. One obviously problematic approach would use higher-frequency retransmission attempts to enable earlier detection of whether connectivity has returned. This can generate significant amounts of extra traffic. Other proposals attempt to trigger faster retransmissions by retransmitting buffered or newly-crafted segments from inside the network [SCOTT][I-D.dawkins-trigtran-linkup][DUKE][RFC3819].

Note that scenarios exist where path characteristics remain unchanged after long connectivity disruptions. In this case, even an intelligently scheduled slow-start is inefficient, because TCP could safely resume transmitting at the old rate instead of slow-starting. Although originally developed to avoid line-rate bursts, techniques for the well-known "slow-start after idle" case [I-D.ietf-tcpimpl-restart] may be useful to further improve performance after a disruption ends in such a scenario. This document does not currently describe this additional optimization, and an open question remains on how unchanged path characteristics after long connectivity disruptions could be validated by an end host.

Appendix B.  Document Revision History

   +----------+--------------------------------------------------------+
   | Revision | Comments                                               |
   +----------+--------------------------------------------------------+
   | 03       | Mainly editorial and textual changes according to      |
   |          | feedback received since last version.                  |
   | 02       | Major modification to the RLCI mechanism for           |
   |          | implementing a 3-way handshake that ensures that both  |
   |          | peers are informed about a connectivity-change         |
   |          | indication. CCI option format, RLCI variables          |
   |          | maintained by the TCP peers and the related state      |
   |          | machines are affected by that modification.            |
   | 01       | Major revision of the description of the               |
   |          | connectivity-change indication TCP option and its      |
   |          | processing in Section 5. Other formatting changes to   |
   |          | the document include moving some background material   |
   |          | to the appendix.                                       |
   | 00       | Initial version. This document is a merge of and       |
   |          | obsoletes [I-D.eggert-tcpm-tcp-retransmit-now] and     |
   |          | [I-D.swami-tcp-lmdr].                                  |
   +----------+--------------------------------------------------------+

Authors' Addresses

   Simon Schuetz
   NEC Laboratories Europe
   Kurfuerstenanlage 36
   Heidelberg  69115
   Germany

   Phone: +49 6221 4342 165
   Email: simon.schuetz@nw.neclab.eu
   URI:   http://www.nw.neclab.eu

   Nikolaos Koutsianas
   Nokia Research Center

   Email: nkout@mobile.ntua.gr

   Lars Eggert
   Nokia Research Center
   P.O. Box 407
   Nokia Group  00045
   Finland

   Phone: +358 50 48 24461
   Email: lars.eggert@nokia.com
   URI:   http://research.nokia.com/people/lars_eggert/

   Wesley M. Eddy
   Verizon Federal Network Systems
   NASA Glenn Research Center
   21000 Brookpark Road, MS 54-5
   Cleveland, OH  44135
   USA

   Email: weddy@grc.nasa.gov

   Yogesh Prem Swami
   Nokia Research Center, Dallas
   955 Page Mill Road
   Palo Alto, California  94304
   USA

   Phone: +1 972 374 0669
   Email: yogesh.swami@nokia.com

   Khiem Le
   Nokia Siemens Networks
   6000 Connection Drive
   Irving, TX  75039
   USA

   Phone: +1 972 342 3502
   Email: khiem.le@nsn.com

Full Copyright Statement

Copyright (C) The IETF Trust (2008).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).