Network Working Group                                         S. Schuetz
Internet-Draft                                                 L. Eggert
Expires: December 2, 2006                                            NEC
                                                                 W. Eddy
                                                                 Verizon
                                                                Y. Swami
                                                                   K. Le
                                                                   Nokia
                                                            May 31, 2006


      TCP Response to Lower-Layer Connectivity-Change Indications
                     draft-schuetz-tcpm-tcp-rlci-00

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. This document may not be modified, and derivative works of it may not be created, except to publish it as an RFC and to translate it into languages other than English.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on December 2, 2006.

Copyright Notice

Copyright (C) The Internet Society (2006).

Abstract

Schuetz, et al.         Expires December 2, 2006                [Page 1]
Internet-Draft  TCP Response to Connectivity Indications        May 2006

When connectivity characteristics between two hosts change abruptly, TCP can experience significant delays before resuming transmission in an efficient manner or TCP can behave unfairly to competing traffic. This document describes TCP extensions that improve transmission behavior in response to advisory, lower-layer connectivity-change indications.
The proposed TCP extensions modify the local behavior of TCP and introduce a new TCP option to signal local connectivity-change indications to remote peers. Performance gains result from a more efficient transmission behavior and are not due to an increased aggressiveness.

Table of Contents

1.  Introduction
2.  Motivation and Overview
3.  Background: Classification of Connectivity Disruptions
  3.1.  Short Connectivity Disruptions
  3.2.  Long Connectivity Disruptions
4.  Connectivity-Change Indications
5.  TCP Response to Connectivity-Change Indications
  5.1.  Connectivity-Change Indication TCP Option
  5.2.  Re-Probing Path Characteristics
  5.3.  Speculative Retransmission
6.  Security Considerations
7.  Conclusion
8.  IANA Considerations
9.  Acknowledgments
10. References
  10.1.  Normative References
  10.2.  Informative References
Editorial Comments
Appendix A.  Document Revision History
Authors' Addresses
Intellectual Property and Copyright Statements

1.  Introduction

Several current components of Transmission Control Protocol (TCP) [RFC0793] assume that end-to-end paths between hosts are relatively stable over the lifetime of a connection. Although the TCP congestion control algorithms [RFC2581] adapt to changes in path connectivity characteristics between two hosts over time, they cannot adapt well if significant changes occur on time-scales of a few round-trip times or less. This is due to the granularity of TCP's sampling mechanisms.

Significant changes to path connectivity include loss or reestablishment of connectivity, and drastic, abrupt changes to the round-trip time (RTT) or available bandwidth. Connectivity changes that occur on short time-scales are becoming more common, due to host mobility or intermittent network attachment.

This document describes a set of complementary TCP extensions that improve behavior when path characteristics change on short time-scales. TCP implementations that support the proposed extensions respond to receiving generic, technology-independent, per-connection "path characteristics have changed" (or short: "connectivity-change") indications from lower layers. A connectivity-change indication signals that the connectivity characteristics of the end-to-end path between the local node and its peer have changed in an undefined way. The response mechanisms proposed for TCP act on this information in a conservative fashion. The specific response depends on the state of a connection.

It is important to note that TCP and other transport protocols already react to information and signals from lower layers; the proposed connectivity-change indications thus extend an established interface between layers in the protocol stack. TCP measures the end-to-end path to implicitly derive network-layer information. TCP also directly reacts to network-layer signals delivered via ICMP, for example, "Port Unreachable" or the now-deprecated "Source Quench" [RFC1122].
Explicit Congestion Notification (ECN) [RFC3168] and Quick-Start [I-D.ietf-tsvwg-quickstart] are other sources of network-layer information for which response mechanisms for TCP have been proposed. Connectivity-change indications are yet another source of lower-layer information that TCP can use to improve its operation.

A second important point to note is that the proposed TCP response mechanisms to connectivity-change indications are purely optional efficiency improvements. In the absence of connectivity-change indications, a TCP that implements the proposed changes behaves identically to an unmodified TCP. When lower layers provide connectivity-change indications that trigger the proposed enhancements, they enhance TCP operation based on the explicit lower-layer information that is signaled. The proposed response mechanisms do not increase the aggressiveness of TCP.

Note that the IAB has recently described architectural issues of "link indications" [I-D.iab-link-indications]. The authors feel that this term is not quite accurate in this environment, because transport mechanisms should remain link-technology-agnostic. However, transport protocols have always acted on network-layer information and signals, such as measured path characteristics or ICMP-signaled conditions. Because of the growing proliferation of shim layers between the traditional network and transport layers, this document uses the term "lower-layer indication" to remain independent of specific network or shim layers.

Note that it is currently an open question whether additional lower-layer indications can provide further information to transport protocols. Also, this document focuses on response mechanisms for TCP only, although other transport protocols may benefit from similar response mechanisms that react to these indications.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2.  Motivation and Overview

Several proposed network layer extensions support host mobility, including Mobile IPv4 [RFC3344], Mobile IPv6 [RFC3775] and HIP [I-D.ietf-hip-mm]. Typically, they shield transport-layer protocols from mobility events and enable them to sustain established connections across mobility events. However, the path characteristics that established connections experience after a mobility event may have changed drastically and on short time-scales. Congestion control, RTT and path-MTU state gathered over an old path before the move generally has no meaning when transmitting along a new path.

TCP already forces a slow-start restart in some cases where the network state becomes unknown, such as after an idle period or heavy losses. One mechanism proposed in this document introduces a similar slow-start restart in response to connectivity-change indications that are received while a connection is in steady-state. Note that this behavior is more conservative than the standard TCP response; any performance gains with the proposed mechanisms result from not overloading the new path.

A second proposed extension improves TCP operation in the presence of temporary connectivity disruptions. These disruptions can occur independently of mobility events and may, for example, be due to insufficient wireless access coverage or nomadic computer use. Connectivity disruptions can severely decrease TCP performance.
The main reason for this decrease is TCP's retransmission behavior after a connectivity disruption [SCHUETZ-CCR], i.e., periodic retransmission attempts in exponentially increasing intervals, which can unnecessarily delay retransmissions after connectivity returns. In the extreme case, TCP connections can even abort, if the disruption is longer than the TCP "user timeout." (Connection aborts are out of scope for this document but can be prevented by the TCP User Timeout Option [I-D.ietf-tcpm-tcp-uto].)

The proposed response mechanism is also executed when receiving a connectivity-change indication, but is chosen when a connection is stalled in exponential back-off. It improves TCP retransmission behavior after connectivity is restored through an immediate speculative retransmission attempt [anchor3]. Similar to the first extension, this modification increases TCP performance through a more intelligent transmission behavior that uses periods of connectivity more efficiently. It does not cause significant amounts of additional traffic and does not change TCP's congestion control algorithms.

Finally, this draft proposes a third mechanism, which is a new TCP option that signals connectivity-change indications received or detected by a host to its remote peers in open TCP connections. This is useful, because connectivity indications typically require appropriate responses at both peers, but may only be received or detected by one peer. Response to a connectivity-change indication is independent of its source (locally notified or remotely signaled) and depends only on the specific indication and the state of the connection for which it was received.

3.  Background: Classification of Connectivity Disruptions

Connectivity disruptions occur in many different situations. They can be due to wireless interference, movement out of a wireless coverage area, switching between access networks, or simply due to unplugging an Ethernet cable.
Depending on the situation in which they occur, the implications of connectivity disruptions are different and must be handled appropriately. This section attempts to classify different types of connectivity disruptions and discusses their implications and impact on TCP.

Two main properties of connectivity disruptions affect how TCP reacts to them: their duration and whether the path characteristics have significantly changed after they end. This document distinguishes between "short" and "long" disruptions and "changed" and "unchanged" path characteristics. Note that these two categories are orthogonal to each other, i.e., all four combinations exist.

Connectivity disruptions are "short" for a given TCP connection, if connectivity returns before the RTO fires for the first time. In this case, standard TCP recovers lost data segments through Fast Retransmit and lost ACKs through successfully delivered later ACKs. Section 3.1 briefly describes this case.

Connectivity disruptions are "long" for a given TCP connection, if the RTO fires at least once before connectivity returns. In this case, TCP can be inefficient in its retransmission scheme, as described in Section 3.2.

Whether or not path characteristics change when connectivity returns is a second important factor for TCP's retransmission scheme. Standard TCP implicitly assumes that path characteristics remain unchanged for short disruptions by performing Fast Retransmit based on path parameters collected before the disruption. For long disruptions, standard TCP is more conservative and performs slow-start, re-probing the path characteristics from scratch. However, these implicit assumptions can cause standard TCP to misbehave or perform inefficiently in some scenarios. Figure 1 illustrates the standard TCP behavior.

            +----------------------+----------------------+
   Short    | Fast Retransmit      | Fast Retransmit      |
   Duration | using collected path | using collected path |
   < RTO    | characteristics      | characteristics      |
            +----------------------+----------------------+
   Long     |                      |                      |
   Duration | Slow-start           | Slow-start           |
   >= RTO   |                      |                      |
            +----------------------+----------------------+
              Unchanged Path         Changed Path
              Characteristics        Characteristics

Figure 1: Standard TCP behavior.

3.1.  Short Connectivity Disruptions

One common cause of short connectivity disruptions that result in a change of the end-to-end path characteristics is transparent network layer mobility, via protocols such as Mobile IP, NEMO, or HIP. Although changes in the point of network attachment happen unbeknownst to the transport layer, these events may change many aspects of the path on which established TCP connections base their behavior.

Consider a Mobile IP scenario as shown in Figure 2. At time T, a mobile node MN is attached to access network Net-1, connected to the Internet through access router AR-1, and has a care-of address in that network. A TCP connection is established between MN and a corresponding node CN. While MN is attached to AR-1, packets between CN and this care-of address are routed using PATH-1 (via Cloud-1 and AR-1). Assume that at some time T+1, MN moves and then attaches to Net-2, which is reachable through AR-2, with a new care-of address. While MN is attached to AR-2, all packets between CN and the new care-of address are routed using PATH-2 (through Cloud-2 and AR-2).

                <---------PATH-1---------->

                /---------\   +------+
                |         |   |      |    Net-1
            +---+ Cloud-1 +---+ AR-1 +-----> MN (time=T)
            |   |         |   |      |
            |   \----+----/   +---+--+
            |        |            |             |
   CN <-----+        |     PATH-3 |             |
            |        |            |             |
            |   /----V----\   +-------+         V
            |   |         |   |       |
            +---+ Cloud-2 +---+ AR-2  +-----> MN (time=T+1)
                |         |   |       |   Net-2
                \---------/   +-------+

                <--------PATH-2----------->

Figure 2: Mobility example.

During a transitional disconnected period, MN may be disconnected from Net-1 and not yet attached to Net-2. Consequently, AR-1 may not be able to deliver packets to MN. This could result in a burst of packet losses. There are several suggested means of supporting "fast" or "seamless" handovers, which involve adding machinery to the ARs to buffer and redirect packets originally sent to Net-1 towards Net-2, rather than dropping them (e.g., [KOODLI]).

As long as MN remains in Net-1, standard congestion control algorithms [RFC2581] are sufficient. But once it moves from Net-1 to Net-2, two different scenarios are possible depending on network topology:

o  In the first scenario, with standard Mobile IPv4, all packets destined to the old care-of address are dropped by AR-1 once the mobile node has moved. Since the latency involved in establishing a new tunnel to the HA is on the order of the RTT (2*RTT in case of Mobile IPv6), roughly an entire window's worth of data and ACKs will be dropped by AR-1. Because of this burst loss, the CN and MN are likely to incur expensive retransmission timeouts.

o  In the second scenario, with a fast handover mechanism in place, losses are suppressed through buffering and tunneling between routers AR-1 and AR-2. The buffering and forwarding between the ARs is not guaranteed to match the available bandwidth of PATH-3, nor to conform to TCP's clocking expectations. This can cause TCP's behavior over PATH-2 to be based on the unrelated properties of PATH-1 and PATH-3.

After attaching to Net-2, reception of stale ACKs (for data sent on PATH-1) will cause MN to incorrectly inflate its congestion window. These stale ACKs do not provide any indication of the congestion along PATH-2 and should consequently be ignored.
CN's congestion window becomes similarly inflated by ACKs that MN sends for data segments redirected over PATH-3.

If the congestion windows from PATH-1 are already too big for PATH-2, this can overload Net-2 or PATH-2, causing packet loss and timeouts. On the other hand, if the available bandwidth along PATH-2 is greater than along PATH-1, and if the sender is in congestion avoidance, it will potentially need many RTTs before reaching a reasonable throughput. This is due to the relatively slow bandwidth increase during congestion avoidance caused by a stale SS_THRESH. (See [ES05] for details.)

3.2.  Long Connectivity Disruptions

For long disruptions, standard TCP performs slow-start after connectivity returns, because the retransmission timeout (RTO) has expired. This is a conservative strategy that avoids overloading the new path. However, TCP's general exponential back-off retransmission strategy can time these slow-starts such that performance decreases.

When a long connectivity disruption occurs along the path between a host and its peer while the host is transmitting data, it stops receiving ACKs. After the RTO expires, the host attempts to retransmit the first unacknowledged segment. TCP implementations that follow the recommended RTO management proposed in [RFC2988] double the RTO after each retransmission attempt until it exceeds 60 seconds. This scheme causes a host to attempt to retransmit across established connections roughly once a minute (and more frequently during the first minute or two of the connectivity disruption, while the RTO is still being backed off).

When the long connectivity disruption ends, standard TCP implementations still wait until the RTO expires before attempting retransmission. Figure 3 illustrates this behavior.
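
To make the cost of this back-off concrete, the following Python sketch lists the times at which RTO-driven retransmissions would fire during a disruption and how long a sender then waits after connectivity returns. It is an illustration only: the function names, the 3-second starting RTO and the choice to clamp the RTO at exactly 60 seconds are assumptions and are not taken from this document.

```python
def retransmit_times(initial_rto=3.0, max_rto=60.0, horizon=600.0):
    """Return the times (seconds after the lost segment was first sent)
    at which RTO-driven retransmission attempts fire, up to `horizon`.

    Assumption: the RTO doubles after every attempt and is clamped at
    `max_rto`; real stacks may use other starting values and maxima.
    """
    times = []
    rto = initial_rto
    now = 0.0
    while True:
        now += rto                      # wait for the current RTO to expire
        if now > horizon:
            break
        times.append(now)               # retransmission attempt
        rto = min(rto * 2, max_rto)     # exponential back-off
    return times


def wasted_time(connectivity_back, times):
    """Time between connectivity returning and the next scheduled attempt."""
    for t in times:
        if t >= connectivity_back:
            return t - connectivity_back
    raise ValueError("horizon too short for this disruption")


schedule = retransmit_times(horizon=160.0)
print(schedule)
print(wasted_time(100.0, schedule))
```

Under these assumptions, a disruption that ends 100 seconds after the first loss leaves the connection idle until the attempt scheduled at 153 seconds, which is the "wasted connection time" that the speculative retransmission in Section 5.3 aims to avoid.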

Depending on when connectivity becomes available again, this can waste up to a minute of connection time for TCPs that implement the recommended RTO management described in [RFC2988]. For TCP implementations that do not implement [RFC2988], even longer connection times may be lost. For example, Linux uses 120 seconds as the maximum RTO by default.

   Sequence number    X = Successfully transmitted segment
   ^                  O = Lost segment
   |     :                     :              : X
   |     :                     :              :X
   |     OO O  O    O       O  :              X
   |    X:                     :              :
   |   X :                     :<------------>:
   |  X  :                     :    Wasted    :
   | X   :                     :  connection  :
   |X    :                     :     time     :
   +-----:---------------------:--------------:-------->
         :                     :              :     Time
   Connectivity          Connectivity        TCP
       gone                  back        retransmit

Figure 3: Standard TCP behavior in the presence of disrupted connectivity.

This retransmission behavior is not efficient, especially in scenarios where connected periods are short and connectivity disruptions are frequent [DRIVE-THRU]. Experiments show that TCP performance across a path with frequent disruptions is significantly worse, compared to a similar path without disruptions [SCHUETZ-CCR].

In the ideal case, TCP would attempt a retransmission as soon as connectivity to its peer was re-established. Figure 4 illustrates the ideal behavior.

   Sequence number    X = Successfully transmitted segment
   ^                  O = Lost segment
   |     :                     : X            :
   |     :                     :X             :
   |     OO O  O    O       O  X              :
   |    X:                     :              :
   |   X :                     :<------------>:
   |  X  :                     :  Efficiency  :
   | X   :                     : improvement  :
   |X    :                     :              :
   +-----:---------------------:--------------:-------->
         :                     :              :     Time
   Connectivity          Connectivity       Next
       gone            back = immediate scheduled TCP
                          retransmit     retransmit

Figure 4: Ideal TCP behavior in the presence of disrupted connectivity.

The ideal behavior is difficult to achieve for arbitrary connectivity disruptions. One obviously problematic approach would use higher-frequency retransmission attempts to enable earlier detection of whether connectivity has returned.
This can generate significant amounts of extra traffic. Other proposals attempt to trigger faster retransmissions by retransmitting buffered or newly-crafted segments from inside the network [SCOTT][I-D.dawkins-trigtran-linkup][DUKEHEND][RFC3819].

Note that scenarios exist where path characteristics remain unchanged after long connectivity disruptions. In this case, even an intelligently scheduled slow-start is inefficient, because TCP could safely resume transmitting at the old rate instead of slow-starting. Although originally developed to avoid line-rate bursts, techniques for the well-known "slow-start after idle" case [I-D.ietf-tcpimpl-restart] may be useful to further improve performance after a disruption ends. This document does not currently describe this additional optimization.

4.  Connectivity-Change Indications

The focus of this document is on specifying TCP response mechanisms to lower-layer "path characteristics have changed" indications. This section briefly describes how different network- and shim-layer mechanisms underneath the transport layer can provide these "connectivity-change" indications to TCP. This description is included for clarification only; the details of providing connectivity indications are out of scope of this document.

Connectivity-change indications may be generated after lower layers detect a connectivity-change event, for example, because:

o  the IP address of the outbound interface of a connection has changed, e.g., due to DHCP [RFC2131] or IPv6 router advertisements [RFC2460]

o  link-layer connectivity at the outbound interface of a connection has changed, e.g., link-layer "link up" event

o  the outbound interface of a connection has changed, due to routing changes or link-layer connectivity changes at other interfaces (including tunnel establishments or teardowns, e.g., in response to IKE events [RFC4306])

o  a MobileIP binding update has completed [RFC3775]

o  a HIP readdressing update has completed [I-D.ietf-hip-mm]

o  a path-change signal from the network has arrived (possible in theory, depends on network capabilities)

o  other notifications as defined by the IETF's Detecting Network Attachment (DNA) working group [I-D.ietf-dna-link-information]

5.  TCP Response to Connectivity-Change Indications

A TCP connection can receive connectivity-change indications either from its local stack or through a new "connectivity-change TCP option" from its peer, as described in Section 5.1. In either case, TCP implementations that implement the proposed changes re-probe path characteristics or perform a speculative retransmission, depending on whether the connection is currently stalled in exponential back-off or not. A connection is "stalled in exponential back-off" if there is at least one unrecovered RTO, i.e., a segment has already been retransmitted due to an RTO but has not yet been ACKed.

TCP implementations that implement the proposed changes MUST maintain three new variables per connection: MY_CCI_COUNT, REMOTE_CCI_COUNT and CCI_STATE. The variables MY_CCI_COUNT and REMOTE_CCI_COUNT count locally and remotely received connectivity-change indications, respectively. The variable CCI_STATE stores the current state of the connectivity-change indication processing.
CCI_STATE can have one of the following values:

o  CCI_IDLE: The host is currently not processing any connectivity-change indications.

o  CCI_INITIATOR: The host is currently processing a connectivity-change indication received from the local stack and propagated the indication to its peer through a connectivity-change TCP option.

o  CCI_RESPONDER: The host is currently processing a connectivity-change indication received from its peer via a connectivity-change TCP option.

In the following, this document first introduces the operation of the new connectivity-change TCP option in Section 5.1, and afterwards describes the two mechanisms to improve TCP performance in response to connectivity-change events - namely re-probing path characteristics and speculative retransmission - in Section 5.2 and Section 5.3.

5.1.  Connectivity-Change Indication TCP Option

Connectivity-change indications are generally asymmetric, i.e., they may occur on one peer host but not the other. The basic idea behind the connectivity-change TCP option is to signal connectivity-change indications that the local stack has received to the peer, in order to allow it to respond appropriately. Figure 5 shows the option.

However, if there is strong evidence that a connectivity-change indication received from the local stack is symmetric, i.e., it occurs on both communicating peers, the host MAY decide not to signal the connectivity-change indication to the remote peer. In this case, the signaling overhead can be avoided, because the remote peer will already react to the connectivity-change indication that it receives from its local stack. For instance, when a HIP identifier becomes rebound to a new locator, both local and remote peers can be simultaneously notified about the connectivity-change by their local stacks, when the HIP UPDATE procedure completes [I-D.ietf-hip-mm].

                                     1     1      2      2
   0                8                6     8      1      4
   +----------------+----------------+-----+------+------+
   |      KIND      |     LENGTH     | RES | CNTR | ECNT |
   +----------------+----------------+-----+------+------+

Figure 5: Format of the connectivity-change indication TCP option.

KIND: (8 Bits) TCP Option Type. Value set to 25 for experimental purposes.

LENGTH: (8 Bits) TCP Option Length. Value = 3.

RES: (2 Bits) Reserved bits. Sender SHOULD set the value to zero. Receiver MUST ignore this field.

CNTR: (3 Bits) The local connectivity-change indication counter value of the host sending this option. This value is decremented once for every connectivity-change indication that the local stack delivers to the connection.

ECNT: (3 Bits) The echoed value of CNTR. On reception of a connectivity-change indication TCP option, a host copies the received CNTR value to the ECNT field of its response.

The connectivity-change TCP option contains a counter (CNTR) that represents the number of times each side has received connectivity-change indications from its local stack. At the beginning of a connection, both endpoints use this option in the SYN and SYN-ACK segments, with an initial counter value of 7, to advertise support for the option. A host MUST NOT place this option in a SYN-ACK unless it was present on the received SYN. After the SYN exchange, hosts SHOULD NOT send this option until there is a connectivity-change indication. After connection setup, the option is only generated when a connection receives a connectivity-change indication from its local stack, or in response to a received connectivity-change TCP option from the peer. A host MUST NOT send the option during a connection unless it was advertised by both sides during the SYN handshake.
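
As an illustration of the layout in Figure 5, the following Python sketch packs and unpacks the three option bytes. It is a sketch under stated assumptions, not a reference encoder: the function and constant names are hypothetical, the fields are assumed to be laid out most-significant-bit first as drawn in the figure, and the wrap-around from 0 back to 7 when decrementing the 3-bit counter is an assumption, since the text above only gives the initial value of 7.

```python
CCI_KIND = 25      # experimental option kind stated in the option description
CCI_LENGTH = 3     # total option length in bytes


def encode_cci_option(cntr: int, ecnt: int) -> bytes:
    """Build the option: KIND, LENGTH, then RES(2) | CNTR(3) | ECNT(3).

    The reserved bits are always sent as zero.
    """
    if not (0 <= cntr <= 7 and 0 <= ecnt <= 7):
        raise ValueError("CNTR and ECNT are 3-bit fields (0..7)")
    return bytes([CCI_KIND, CCI_LENGTH, (cntr << 3) | ecnt])


def decode_cci_option(raw: bytes):
    """Return (CNTR, ECNT); the two reserved bits are ignored on receipt."""
    if len(raw) != CCI_LENGTH or raw[0] != CCI_KIND or raw[1] != CCI_LENGTH:
        raise ValueError("not a connectivity-change option")
    flags = raw[2]
    return (flags >> 3) & 0x7, flags & 0x7


def next_counter(count: int) -> int:
    """Decrement a 3-bit counter; wrapping 0 -> 7 is an assumption."""
    return (count - 1) % 8


# Option as it would appear in a SYN: both counters at their initial value 7.
syn_option = encode_cci_option(7, 7)
print(syn_option.hex())
print(decode_cci_option(syn_option))
```

Masking the third byte on receipt, rather than rejecting non-zero reserved bits, follows the rule that a receiver ignores the RES field.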

When a host receives a connectivity-change TCP option, it SHOULD respond to it as described in Section 5.2 and Section 5.3 only if CNTR != REMOTE_CCI_COUNT, i.e., the peer signals a new instance of a connectivity-change that it has not previously signaled. The host SHOULD NOT respond to the reception of a connectivity-change TCP option if CNTR = REMOTE_CCI_COUNT, because the option duplicates a previous connectivity-change indication.

At the beginning of a connection, CCI_STATE MUST be set to CCI_IDLE. The option SHOULD be included in all outgoing ACKs or segments if CCI_STATE != CCI_IDLE and SHOULD NOT be included in any outgoing ACK or segment if CCI_STATE = CCI_IDLE. When sending the connectivity-change TCP option, CNTR MUST be set to the current MY_CCI_COUNT and ECNT MUST be set to the current REMOTE_CCI_COUNT.

When a connection receives a connectivity-change indication from its local stack and decides to signal the local indication to the remote peer, it decrements its MY_CCI_COUNT, sets CCI_STATE to CCI_INITIATOR and consequently sends a connectivity-change TCP option in every subsequent ACK or data segment until CCI_STATE = CCI_IDLE. It resets CCI_STATE from CCI_INITIATOR to CCI_IDLE when it sees its current MY_CCI_COUNT value echoed back as ECNT in a connectivity-change TCP option received from its peer.

NOTE: As discussed before, a host may under certain circumstances decide not to signal a local connectivity-change indication to the remote peer. In this case, MY_CCI_COUNT and CCI_STATE MUST NOT be altered.

When a host receives a connectivity-change TCP option from its peer, it compares the received CNTR and the local REMOTE_CCI_COUNT. If they match, no further action is required. Otherwise, it MUST update REMOTE_CCI_COUNT to CNTR.
It also MUST update CCI_STATE to CCI_RESPONDER, unless both of the following conditions hold:

o  CCI_STATE is CCI_INITIATOR, and

o  it has the higher initial sequence number of the two communicating hosts.

CCI_STATE is reset from CCI_RESPONDER to CCI_IDLE when a host receives an ACK or segment from its peer that does not contain the connectivity-change TCP option.

NOTE: The transition from CCI_INITIATOR to CCI_RESPONDER is only allowed if the host has the lower initial sequence number. This is to prevent an infinite signaling loop where both hosts are in the CCI_RESPONDER state. Without this rule, if the two peers simultaneously receive connectivity-change indications from their local stacks and send out connectivity-change TCP options, both peers would set CCI_STATE to CCI_RESPONDER and include the option in all subsequent ACKs and segments. Neither peer would then reset CCI_STATE from CCI_RESPONDER to CCI_IDLE, as this transition is only performed when a host receives an ACK or segment that does not contain the connectivity-change TCP option.

5.2.  Re-Probing Path Characteristics

When a TCP connection receives a connectivity-change indication and is not currently stalled, it MUST re-probe the path characteristics to prevent causing congestion along the potentially new path and to quickly probe the path's available capacity. In principle, this works similarly to the initial slow-start: The sender MUST NOT transmit more than the default initial window of data along the new path, in order to avoid over-congesting it, and the slow-start threshold (SS_THRESH) SHOULD be set to the initial value as with a new connection to allow for rapid probing of available capacity. In addition, it MUST reset round-trip time measurement (RTTM) and the RTO timer. In case Path MTU Discovery (PMTUD) is activated, PMTUD state SHOULD also be reset [RFC1191][RFC1981].
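
The option-processing rules of Section 5.1 and the reset described above can be read together as a small state machine. The following Python sketch is one possible reading, not a normative implementation: the class, method and constant names are hypothetical, the concrete values of INIT_WINDOW, INIT_SSTHRESH and INIT_RTO are assumed, the counter wrap from 0 to 7 is assumed, and initial sequence numbers are compared as plain integers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CciState(Enum):
    IDLE = "CCI_IDLE"
    INITIATOR = "CCI_INITIATOR"
    RESPONDER = "CCI_RESPONDER"


INIT_WINDOW = 3 * 1460   # assumed initial window in bytes
INIT_SSTHRESH = 65535    # assumed initial SS_THRESH
INIT_RTO = 3.0           # assumed initial RTO in seconds


@dataclass
class Connection:
    isn: int                       # our initial sequence number
    peer_isn: int                  # the peer's initial sequence number
    my_cci_count: int = 7          # MY_CCI_COUNT
    remote_cci_count: int = 7      # REMOTE_CCI_COUNT
    cci_state: CciState = CciState.IDLE
    stalled: bool = False          # True while in exponential back-off
    cwnd: int = INIT_WINDOW
    ssthresh: int = INIT_SSTHRESH
    srtt: Optional[float] = None
    rto: float = INIT_RTO
    speculative_retransmits: int = 0

    def respond(self) -> None:
        """React to a new indication, whether local or signaled by the peer."""
        if self.stalled:
            self.speculative_retransmits += 1      # Section 5.3
        else:                                      # Section 5.2: re-probe
            self.cwnd, self.ssthresh = INIT_WINDOW, INIT_SSTHRESH
            self.srtt, self.rto = None, INIT_RTO

    def on_local_indication(self, signal_peer: bool = True) -> None:
        if signal_peer:            # otherwise counter and state stay untouched
            self.my_cci_count = (self.my_cci_count - 1) % 8
            self.cci_state = CciState.INITIATOR
        self.respond()

    def on_option(self, cntr: int, ecnt: int) -> None:
        if self.cci_state is CciState.INITIATOR and ecnt == self.my_cci_count:
            self.cci_state = CciState.IDLE         # peer echoed our counter
        if cntr == self.remote_cci_count:
            return                                 # duplicate signal: ignore
        self.remote_cci_count = cntr
        keep_initiator = (self.cci_state is CciState.INITIATOR
                          and self.isn > self.peer_isn)
        if not keep_initiator:
            self.cci_state = CciState.RESPONDER
        self.respond()

    def on_segment_without_option(self) -> None:
        if self.cci_state is CciState.RESPONDER:
            self.cci_state = CciState.IDLE

    def option_to_send(self):
        """Return (CNTR, ECNT) to attach, or None while CCI_STATE is CCI_IDLE."""
        if self.cci_state is CciState.IDLE:
            return None
        return self.my_cci_count, self.remote_cci_count
```

In this reading, when both hosts receive local indications at the same time, only the host with the lower initial sequence number moves to CCI_RESPONDER; the other remains CCI_INITIATOR until its counter is echoed, so the exchange terminates.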
One difference from slow-start is that, after a connectivity-change indication, the connection may have segments in flight towards the destination along a previous path. Therefore, after a connectivity-change indication, congestion control MUST ignore any stale ACKs and MUST update the congestion window solely based on ACKs for data sent on the new path.

In detail, when a connectivity-change indication is received, the sender MAY send INIT_WINDOW worth of data along the changed path and MUST reset the congestion control state, RTTM state, and RTO timer as if this were a new connection [RFC2581][RFC2988]. Each ACK that is received while CCI_STATE is not CCI_IDLE SHOULD be treated as a stale ACK. For each stale ACK received, a host MUST NOT adjust the congestion window and MUST NOT send any new data into the network. This behavior SHOULD continue until CCI_STATE is CCI_IDLE again or there is a timeout. Once CCI_STATE is set to CCI_IDLE, the sender should consider any un-ACK'ed segments below the highest received ACK as lost and discount them from the segments in flight. The sender MUST use slow-start based loss recovery for these segments.

5.3.  Speculative Retransmission

The basic idea behind speculative retransmission is to allow TCP to resume stalled connections as soon as it receives an indication that connectivity to previously unreachable peers may have returned.

When a TCP connection receives a connectivity-change indication - either from the local stack or in a connectivity-change TCP option from the peer - and is currently stalled, it MUST immediately initiate the standard retransmission procedure, just as if the RTO for the connection had expired.

In addition, conforming TCP implementations SHOULD send at least one segment to the peer. This segment MUST contain the connectivity-change TCP option to notify the peer and may either be a queued data retransmission or, if the connection has no data awaiting retransmission, a pure ACK.

6.
Security Considerations

The only foreseen security considerations with the techniques presented in this document result from either an attacker's ability to spoof valid TCP segments with options that seemingly indicate connectivity changes, or an attacker's ability to generate bogus connectivity-change indications locally. An attacker might produce a stream of such false indicators that could keep a connection in slow-start at the initial window. One possible defense against this type of attack is to rate-limit the response to connectivity indicators (whether local or remote). Such an attack is also probably less serious than other attacks that an adversary with these capabilities could perform, such as resetting the connection or injecting data. A similar effect could be achieved without the new option by forging duplicate ACKs that would keep a sender in loss recovery.

If both sets of IP addresses, port numbers, and sequence numbers are guessable for a connection, then the connection should use an approved means (such as IPsec) [I-D.ietf-tcpm-tcp-antispoof] for protection against spoofed segments.

7.  Conclusion

When connectivity characteristics between two hosts change abruptly, TCP can experience significant delays before resuming transmission in an efficient manner or TCP can behave unfairly to competing traffic. This document describes TCP extensions that improve transmission behavior in response to advisory, lower-layer connectivity-change indications. The proposed TCP extensions modify the local behavior of TCP and introduce a new TCP option to signal local connectivity-change indications to remote peers.

8.  IANA Considerations

This section is to be interpreted according to [RFC2434]. This document does not define any new namespaces. It uses an 8-bit TCP option number maintained by IANA at http://www.iana.org/assignments/tcp-parameters.

9.
Acknowledgments

This draft combines and obsoletes [I-D.swami-tcp-lmdr] and [I-D.eggert-tcpm-tcp-retransmit-now].

The authors would like to thank Mark Allman, Marcus Brunner, Shashikant Maheshwari, Kacheong Poon, Juergen Quittek, Stefan Schmid and Joe Touch for their comments and suggestions on the two previous drafts.

Lars Eggert and Simon Schuetz are partly funded by Ambient Networks, a research project supported by the European Commission under its Sixth Framework Program. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Ambient Networks project or the European Commission.

Wesley Eddy's work on this document was performed at NASA's Glenn Research Center, while in support of the NASA Space Communications Architecture Working Group (SCAWG), and the FAA/Eurocontrol Future Communications Study (FCS).

10.  References

10.1.  Normative References

[RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 2434, October 1998.

[RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion Control", RFC 2581, April 1999.

[RFC2988]  Paxson, V. and M. Allman, "Computing TCP's Retransmission Timer", RFC 2988, November 2000.

10.2.  Informative References

[DRIVE-THRU]  Ott, J. and D. Kutscher, "Drive-Thru Internet: IEEE 802.11b for Automobile Users", Proc. Infocom 2004, March 2004.

[DUKEHEND]  Duke, M., Henderson, T., and J. Meegan, "Experience with 'Link-UP Notification' Over a Mobile Satellite Link", ACM Computer Communication Review, Vol. 34, No. 3, July 2004.

[ES05]  Eddy, W. and Y. Swami, "Adapting End-host Congestion
Control for Mobility", NASA Glenn Research Center Technical Report, CR-2005-213838, July 2005.

[I-D.dawkins-trigtran-linkup]  Dawkins, S., "End-to-end, Implicit 'Link-Up' Notification", draft-dawkins-trigtran-linkup-01 (work in progress), October 2003.

[I-D.eggert-tcpm-tcp-retransmit-now]  Eggert, L., "TCP Extensions for Immediate Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02 (work in progress), June 2005.

[I-D.iab-link-indications]  Aboba, B., "Architectural Implications of Link Indications", draft-iab-link-indications-04 (work in progress), December 2005.

[I-D.ietf-dna-link-information]  Yegin, A., "Link-layer Event Notifications for Detecting Network Attachments", draft-ietf-dna-link-information-03 (work in progress), October 2005.

[I-D.ietf-hip-mm]  Nikander, P., "End-Host Mobility and Multihoming with the Host Identity Protocol", draft-ietf-hip-mm-03 (work in progress), March 2006.

[I-D.ietf-tcpimpl-restart]  Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP Slow-Start Restart After Idle", draft-ietf-tcpimpl-restart-00 (work in progress), March 1998.

[I-D.ietf-tcpm-tcp-antispoof]  Touch, J., "Defending TCP Against Spoofing Attacks", draft-ietf-tcpm-tcp-antispoof-03 (work in progress), February 2006.

[I-D.ietf-tcpm-tcp-uto]  Eggert, L. and F. Gont, "TCP User Timeout Option", draft-ietf-tcpm-tcp-uto-02 (work in progress), October 2005.

[I-D.ietf-tsvwg-quickstart]  Floyd, S., "Quick-Start for TCP and IP", draft-ietf-tsvwg-quickstart-02 (work in progress), March 2006.

[I-D.swami-tcp-lmdr]  Swami, Y., "Lightweight Mobility Detection and Response (LMDR) Algorithm for TCP", draft-swami-tcp-lmdr-07 (work in progress), March 2006.

[KOODLI]  Koodli, R. and C.
Perkins, "Fast Handovers and Context Transfers in Mobile Networks", ACM Computer Communication Review, Vol. 31, No. 5, October 2001.

[RFC1122]  Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.

[RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, November 1990.

[RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery for IP version 6", RFC 1981, August 1996.

[RFC2131]  Droms, R., "Dynamic Host Configuration Protocol", RFC 2131, March 1997.

[RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, December 1998.

[RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.

[RFC3344]  Perkins, C., "IP Mobility Support for IPv4", RFC 3344, August 2002.

[RFC3775]  Johnson, D., Perkins, C., and J. Arkko, "Mobility Support in IPv6", RFC 3775, June 2004.

[RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D., Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. Wood, "Advice for Internet Subnetwork Designers", BCP 89, RFC 3819, July 2004.

[RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", RFC 4306, December 2005.

[SCHUETZ-CCR]  Schuetz, S., Eggert, L., Schmid, S., and M. Brunner, "Protocol Enhancements for Intermittently Connected Hosts", ACM Computer Communication Review, Vol. 35, No. 3, July 2005.

[SCOTT]  Scott, J. and G. Mapp, "Link layer-based TCP optimisation for disconnecting networks", ACM Computer Communication Review, Vol. 33, No. 5, October 2003.

Editorial Comments

[anchor3]  LE: The authors have seen the idea of triggering retransmits based on connectivity events of directly-connected links attributed to Phil Karn ("kick" operation in the KA9Q TCP stack). Pointers to a citable reference are highly appreciated!

Appendix A.
Document Revision History

+----------+--------------------------------------------------------+
| Revision | Comments                                               |
+----------+--------------------------------------------------------+
| 00       | Initial version. This document is a merge of and       |
|          | obsoletes [I-D.eggert-tcpm-tcp-retransmit-now] and     |
|          | [I-D.swami-tcp-lmdr].                                  |
+----------+--------------------------------------------------------+

Authors' Addresses

Simon Schuetz
NEC Network Laboratories
Kurfuerstenanlage 36
Heidelberg 69115
Germany

Phone: +49 6221 4342 165
Fax: +49 6221 4342 155
Email: simon.schuetz@netlab.nec.de
URI: http://www.netlab.nec.de/

Lars Eggert
NEC Network Laboratories
Kurfuerstenanlage 36
Heidelberg 69115
Germany

Phone: +49 6221 4342 143
Fax: +49 6221 4342 155
Email: lars.eggert@netlab.nec.de
URI: http://www.netlab.nec.de/

Wesley M. Eddy
Verizon Federal Network Systems
NASA Glenn Research Center
21000 Brookpark Road, MS 54-5
Cleveland, OH 44135
USA

Email: weddy@grc.nasa.gov

Yogesh Prem Swami
Nokia Research Center, Dallas
6000 Connection Drive
Irving, TX 75603
USA

Phone: +1 972 374 0669
Email: yogesh.swami@nokia.com

Khiem Le
Nokia Research Center, Dallas
6000 Connection Drive
Irving, TX 75603
USA

Phone: +1 972 894 4882
Email: khiem.le@nokia.com
Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2006). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.