Network Working Group Reiner Ludwig INTERNET-DRAFT Michael Meyer Expires: August 2002 Ericsson Research February, 2002 The Eifel Detection Algorithm for TCP Status of this memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or cite them other than as "work in progress". The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/lid-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html Abstract The Eifel detection algorithm allows the TCP sender to detect a posteriori whether it has entered loss recovery unnecessarily. It already determines this in the beginning of loss recovery when the first new ACK arrives after the timeout-based or fast retransmit has been sent. The algorithm requires that the TCP Timestamps option defined in RFC1323 is enabled for a connection. The idea is to use the timestamps echoed in returning ACKs to eliminate the retransmission ambiguity in TCP. The Eifel detection algorithm provides a basis for future TCP enhancements. This includes response algorithms to back out of loss recovery by restoring a TCP sender's congestion control state. Ludwig & Meyer [Page 1] INTERNET-DRAFT TCP - Eifel Detection February, 2002 Terminology The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this document, are to be interpreted as described in [RFC2119]. We use the term 'new ACK' to refer to an ACK that acknowledges previously unacknowledged data. We use the term 'duplicate ACK' as defined in [WS95]. Furthermore, we refer to the first-time transmission of a data segment as the 'original transmit'. 1. Introduction The retransmission ambiguity problem [KP87] is the TCP sender's inability to distinguish whether the first new ACK that arrives after a retransmit was sent in response to the original transmit or the retransmit. This problem occurs after a timeout-based retransmit and after a fast retransmit. The Eifel detection algorithm uses the TCP Timestamps option defined in RFC1323 to eliminate the retransmission ambiguity. It thereby allows the TCP sender to detect a posteriori whether it has entered loss recovery unnecessarily. This added capability of the TCP sender is useful in environments where TCP's loss recovery and congestion control algorithms may often get falsely triggered. This can be caused by packet reordering, packet duplication, the loss of a flight of ACKs, or a sudden delay increase in the data or the ACK path that results in a spurious timeout. For example, such sudden delay increases can often occur in wide-area wireless access networks due to handovers, resource preemption, or because the mobile transmitter traverses through a radio coverage hole (e.g., see [Gu01]). Based on the Eifel detection algorithm, the TCP sender may then choose to implement dedicated response algorithms. One goal of such a response algorithm would be to alleviate the consequences of a falsely triggered loss recovery. This may include restoring the TCP sender's congestion control state, and/or avoiding the often unnecessary go-back-N retransmits that typically occur after a spurious timeout. Another goal would be to adapt protocol parameters such as the duplicate acknowledgement threshold [RFC2581], and/or the RTT estimators [RFC2988]. This is to reduce the risk of falsely triggering TCP's loss recovery again as the connection progresses. However, such response algorithms are outside the scope of this document. Note: The original proposal, the "Eifel algorithm" [LK00], comprises both a detection and a response algorithm. This document only defines the detection part. A key feature of the Eifel detection algorithm is that it already detects upon the first new ACK that arrives during loss recovery whether a fast retransmit or a timeout was spurious. This is crucial Ludwig & Meyer [Page 2] INTERNET-DRAFT TCP - Eifel Detection February, 2002 to be able to avoid the mentioned go-back-N retransmits. Another feature is that the Eifel detection algorithm is fairly robust against the loss of ACKs. 2. Events that Falsely Trigger TCP Loss Recovery The following events falsely trigger a TCP sender's loss recovery and congestion control algorithms. This causes a so-called spurious retransmit, and an unnecessary reduction of the TCP sender's congestion window and slow start threshold [RFC2581]. - Spurious timeout - Packet reordering - Packet duplication The common understanding of a spurious timeout is a timeout that would not have occurred had the sender "waited longer". This may be caused by increased delay that suddenly occurs in the data and/or the ACK path. That in turn might cause an ACK to arrive too late, i.e., only after the TCP sender's retransmission timer has expired. Note that such a late ACK could either be a new ACK that would acknowledge the oldest outstanding segment, or it could be the third duplicate ACK that would trigger a fast retransmit of the oldest outstanding segment [RFC2581]. For the purpose of specifying the algorithm in Section 3, we define the former case as SPUR_TO_NEWACK, and the latter as SPUR_TO_DUPACK. However, in this document we stretch the definition of a spurious timeout. We also speak of a spurious timeout when the timeout occurred because an entire flight of ACKs was lost while in fact the original transmit of the oldest outstanding segment had arrived at the TCP receiver. This case is also considered to fall under the definition of SPUR_TO_NEWACK. Even though such a timeout is certainly unavoidable, the triggering of loss recovery was spurious since that original transmit was not lost. Hence, we use the term 'spurious timeout' in the sense that entering the associated loss recovery was unnecessary. Yet, we exclude from the definition of a spurious timeout the case where the retransmit arrives at the TCP receiver before the corresponding original transmit. Packet reordering in the network may occur because IP [RFC791] does not guarantee in-order delivery of packets. Additionally, a TCP receiver generates a duplicate ACK for each segment that arrives out- of-order. This results in a spurious fast retransmit if three or more data segments arrive out-of-order at the TCP receiver, and at least three of the resulting duplicate ACKs arrive at the TCP sender. We define such a case as SPUR_FR. Packet duplication may occur because a receiving IP does not (cannot) remove packets that have been duplicated in the network. A TCP Ludwig & Meyer [Page 3] INTERNET-DRAFT TCP - Eifel Detection February, 2002 receiver in turn also generates a duplicate ACK for each duplicate segment. As with packet reordering, this results in a spurious fast retransmit if duplication of data segments or ACKs results in three or more duplicate ACKs to arrive at the TCP sender. This case is also considered to fall under the definition of SPUR_FR. The negative impact on TCP performance caused by packet reordering and packet duplication is commonly the same: a single spurious retransmit (the fast retransmit), and the unnecessary halving of the TCP sender's congestion window as a result of the subsequent fast recovery phase [RFC2581]. The negative impact on TCP performance caused by a spurious timeout is more severe. First, the timeout event itself causes a single spurious retransmit, and unnecessarily forces the TCP sender into slow start [RFC2581]. Then, as the connection progresses, a chain reaction gets triggered that further decreases TCP's performance. Since the timeout was spurious, at least some ACKs for original transmits will typically arrive at the TCP sender before the ACK for the retransmit arrives. (This is unless severe packet reordering coincided with the spurious timeout in such a way that the ACK for the retransmit is the first new ACK to arrive at the TCP sender.) Those ACKs for original transmits will then trigger an implicit go- back-N loss recovery at the TCP sender. Assuming that none of the outstanding segments and none of the corresponding ACKs were lost, all outstanding segments will get retransmitted unnecessarily. Moreover, in some TCPs this may then cause a spurious fast retransmit. This is because the spurious go-back-N retransmits will arrive as duplicates at the TCP receiver, which in turn triggers a series of duplicate ACKs. Note that this last spurious fast retransmit could be avoided with the careful variant of 'bug fix' [RFC2582]. More detailed explanations including TCP trace plots that visualize the effects of spurious timeouts and packet reordering can be found in [LK00]. 3. The Eifel Detection Algorithm 3.1 The Idea The goal of the Eifel detection algorithm is to allow the TCP sender to detect a posteriori whether it has entered loss recovery unnecessarily. Furthermore, the TCP sender should be able to make this decision upon the first new ACK that arrives after the timeout- based or fast retransmit has been sent. This in turn requires extra information in every ACK by which the TCP sender can unambiguously distinguish whether that first new ACK was sent in response to the original transmit or the retransmit. Such extra information is provided by the TCP Timestamps option [RFC1323]. Generally speaking, Ludwig & Meyer [Page 4] INTERNET-DRAFT TCP - Eifel Detection February, 2002 timestamps are monotonously increasing "serial numbers" added into every segment that are then echoed within the corresponding ACKs. This is exploited by the Eifel detection algorithm in the following way. Given that timestamps are enabled for a connection the TCP sender always stores the timestamp of the retransmit sent in the beginning of loss recovery, i.e., the timestamp of the timeout-based or the fast retransmit. If the timestamp of the first new ACK, arriving after the mentioned retransmit was sent, is smaller then the stored timestamp of that retransmit, then that ACK must have been sent in response to an original transmit. Hence, the TCP sender must have entered loss recovery unnecessarily. The fact that the Eifel detection algorithm decides upon this first new ACK is crucial to allow future response algorithms to avoid the spurious go-back-N retransmits that typically occur after a spurious timeout. Also, if loss recovery was entered unnecessarily, a window worth of ACKs are outstanding that all carry a timestamp that is smaller than the stored timestamp of the retransmit. The arrival of any one of those ACKs suffices the Eifel detection algorithm to work. Hence, the solution is fairly robust against ACK losses. Even the ACK sent in response to the retransmit, i.e., the one that carries the stored timestamp, may get lost. Note: Also the DSACK option [RFC2883] can be used to detect a posteriori whether the TCP sender has entered loss recovery unnecessarily. However, the first ACK carrying a DSACK option usually arrives at the TCP sender only after loss recovery has already terminated. Thus, the DSACK option cannot be used to eliminate the retransmission ambiguity. Consequently, it cannot be used to avoid the mentioned spurious go-back-N retransmits. Moreover, a DSACK-based detection algorithm is less robust against ACK losses. 3.2 The Algorithm Given that the TCP Timestamps option [RFC1323] is enabled for a connection, a TCP sender MAY use the Eifel detection algorithm as defined in this subsection. If the Eifel detection algorithm is used, the following steps MUST be taken by the TCP sender only upon initiation of loss recovery, i.e., when either the timeout-based retransmit or fast retransmit is sent. Note: The Eifel detection algorithm should not be (re-)initiated after loss recovery has already started. In particular, it may not be (re-)initiated upon subsequent timeouts for the same segment, and not upon retransmitting segments other than the oldest outstanding segment. Ludwig & Meyer [Page 5] INTERNET-DRAFT TCP - Eifel Detection February, 2002 (1) Set a "SpuriousRecovery" variable to FALSE. (2) Set a "RetransmitTS" variable to the value of the Timestamp Value field of the Timestamps option included in the retransmit. A TCP sender must ensure that RetransmitTS does not get overwritten as loss recovery progresses, e.g., in case of a second timeout and subsequent second retransmit of the same octet. (3) Wait for the arrival of a valid ACK. If a valid ACK arrives, then proceed to step (4). (4) If a first or second duplicate ACK has arrived, then return to step (3), else proceed to step (5). (5) If a third duplicate ACK has arrived and the loss recovery had been initiated with a timeout-based retransmit, then set SpuriousRecovery to SPUR_TO_DUPACK and proceed to step (DONE), else proceed to step (6). (6) If a new ACK has arrived and the Timestamp Echo Reply field of the ACK's Timestamps option is smaller than RetransmitTS, then proceed to step (7), else proceed to step (DONE). (7) If the loss recovery had been initiated with a timeout- based retransmit then set SpuriousRecovery to SPUR_TO_NEWACK, else set SpuriousRecovery to SPUR_FR. Proceed to step (DONE). (DONE) No further processing. The comparison operator "smaller than" in step (6) is conservative. In theory, when the "timestamp clock" is slow or the network is fast, RetransmitTS could at most be equal to the timestamp echoed by an ACK sent in response to an original transmit. In that case, it is assumed that the loss recovery was not falsely triggered. 3.3 Non-RFC1323-compliant TCP Receivers Mask Certain Spurious Timeouts Not all TCP implementations strictly follow RFC1323. There are differences concerning which timestamp a TCP receiver echoes when a duplicate segment arrives. The relevant case to consider in this context is a spurious timeout that occurred because an entire flight of ACKs got lost. (Recall the definition of a spurious timeout in Section 2.) The question is which timestamp the TCP receiver echoes in response to the spurious retransmit that typically arrives as a duplicate segment. RFC1323-compliant TCP receivers (e.g., LINUX kernel 2.4) will echo the timestamp of the last segment that arrived Ludwig & Meyer [Page 6] INTERNET-DRAFT TCP - Eifel Detection February, 2002 in-sequence, while some non-RFC1323-compliant TCP receivers will echo the timestamp of the retransmit. The Eifel detection algorithm will only detect such spurious timeouts in case the TCP receiver is RFC1323-compliant in this respect. 3.4 Protecting Against Misbehaving TCP Receivers A TCP receiver can easily make a genuine retransmit appear to the TCP sender as a spurious retransmit by forging echoed timestamps. This may pose a security concern. Fortunately, there is a way to modify the Eifel detection algorithm in a way that makes it robust against lying TCP receivers. The idea is to use timestamps as a "segment's secret" that the TCP receiver only gets to know if it receives the segment. Conversely, a TCP receiver will not know the timestamp of a segment that was lost. Hence, to "prove" that it received the original transmit of a segment that the TCP sender retransmitted, the TCP receiver would need to return the timestamp of that original transmit. The Eifel detection algorithm could then be modified to only decide that loss recovery had been unnecessarily entered if the first new ACK echoes the timestamp of the original transmit. Hence, implementers may choose to implement the algorithm with the following modifications. Step (2) is replaced with step (2'): (2') Set a "RetransmitTS" variable to the value of the Timestamp Value field of the Timestamps option that was included in the original transmit corresponding to the retransmit. Note: This step requires that the TCP sender stores the timestamps of all outstanding original transmits. Step (4) is replaced with step (4'): (4') If a first or second duplicate ACK has arrived, then proceed to step (DONE), else proceed to step (6). Step (6) is replaced with step (6'): (6') If a new ACK has arrived and the Timestamp Echo Reply field of the ACK's Timestamps option is equal to RetransmitTS, then proceed to step (7), else proceed to step (DONE). These modifications come at a cost. First, spurious timeouts of type SPUR_TO_DUPACK cannot be detected any longer. Also, the modified Ludwig & Meyer [Page 7] INTERNET-DRAFT TCP - Eifel Detection February, 2002 algorithm is fairly sensitive against ACK losses since it relies on the arrival of the new ACK that corresponds to the original transmit. Note: The first new ACK that arrives after loss recovery had been unnecessarily entered, typically echoes the timestamp of the original transmit. This assumes that the ACK corresponding to the original transmit was not lost, and the TCP receiver does not forge timestamps but complies with RFC1323. In case of a spurious fast retransmit, this is implied by the rules for generating ACKs for data segments that fill in all or part of a gap in the sequence space (see section 4.2 of [RFC2581]) and by the rules for echoing timestamps in that case (see rule (C) in section 3.4 of [RFC1323]). In case of a spurious timeout, it is likely that the delay that has caused the spurious timeout has also caused the TCP receiver's delayed ACK timer [RFC1122] to expire before the original transmit arrives. Also, in this case the rules for generating ACKs and the rules for echoing timestamps (see rule (A) in section 3.4 of [RFC1323]) ensure that the original transmit's timestamp is echoed. A remaining problem is that a TCP receiver might guess a lost segment's timestamp from observing the timestamps of segments that arrived earlier. This could be avoided by having the TCP sender add a "random counter" to the timestamp of every segment it sends, and then increment that counter by a random value, e.g., between 1 and 100. This ensures that timestamps remain monotonously increasing while making it difficult for a TCP receiver to guess the timestamp of a lost segment. 4. Security Considerations There do not seem to be any security considerations associated with the Eifel detection algorithm. This is because the Eifel detection algorithm does not alter existing protocol state at the TCP sender nor at the receiver. That is, no value of a state variable other than one introduced by the algorithm itself (SpuriousRecovery, and RetransmitTS) is changed. Moreover, a variant of the Eifel detection algorithm has been proposed in Section 3.4 that makes it robust against lying TCP receivers. Acknowledgments Many thanks to Keith Sklower, Randy Katz, Stephan Baucke, Sally Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Andrei Gurtov, Pasi Sarolahti, and Alexey Kuznetsov for useful discussions that contributed to this work. Ludwig & Meyer [Page 8] INTERNET-DRAFT TCP - Eifel Detection February, 2002 References [RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control, RFC 2581, April 1999. [RFC2119] S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, RFC 2119, March 1997. [RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High Performance, RFC 1323, May 1992. [RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's Fast Recovery Algorithm, RFC 2582, April 1999. [RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow, An Extension to the Selective Acknowledgement (SACK) Option for TCP, RFC 2883, July 2000. [Gu01] A. Gurtov, Effect of Delays on TCP Performance, In Proceedings of IFIP Personal Wireless Communications, August 2001. [KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates in Reliable Transport Protocols, In Proceedings of ACM SIGCOMM 87. [LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP Robust Against Spurious Retransmissions, ACM Computer Communication Review, Vol. 30, No. 1, January 2000, available at http://www.acm.org/sigcomm/ccr/archive/2000/ jan00/ccr-200001-ludwig.html (easier studied when viewed/printed in color). [RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer, RFC 2988, November 2000. [RFC791] J. Postel, Internet Protocol, RFC 791, September 1981. [RFC793] J. Postel, Transmission Control Protocol, RFC793, September 1981. [WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2 (The Implementation), Addison Wesley, January 1995. Ludwig & Meyer [Page 9] INTERNET-DRAFT TCP - Eifel Detection February, 2002 Author's Address Reiner Ludwig Ericsson Research (EED) Ericsson Allee 1 52134 Herzogenrath, Germany Phone: +49 2407 575 719 Fax: +49 2407 575 400 Email: Reiner.Ludwig@eed.ericsson.se Michael Meyer Ericsson Research (EED) Ericsson Allee 1 52134 Herzogenrath, Germany Phone: +49 2407 575 654 Fax: +49 2407 575 400 Email: Michael.Meyer@eed.ericsson.se This Internet-Draft expires in August 2002. Ludwig & Meyer [Page 10]