Network Working Group Reiner Ludwig INTERNET-DRAFT Ericsson Research Expires: May 2002 November, 2001 The Eifel Algorithm for TCP Status of this memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or cite them other than as "work in progress". The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/lid-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html Abstract A solution to eliminate the retransmission ambiguity in TCP, is to mark ACKs with a special retransmit-marker. The marker would need to be present in those ACKs, and only those ACKs, that the TCP receiver sends in response to retransmits; both genuine and spurious retransmits. Based on such a retransmit-marker, the Eifel algorithm allows the TCP sender to detect a posteriori that a fast retransmit or a timeout was spurious. Three alternative retransmit-markers are defined in this document, and the Eifel algorithm may be based on either one of them: the TCP RXT flag, the TCP Timestamps option, and the TCP SACK option. The Eifel algorithm provides a basis for future TCP enhancements such as response schemes that may change a TCP sender's protocol state to improve end-to-end performance. R. Ludwig [Page 1] INTERNET-DRAFT The Eifel Algorithm for TCP November, 2001 Terminology The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this document, are to be interpreted as described in [RFC2119]. We use the term 'new ACK' to refer to an ACK that acknowledges outstanding data. We use the term 'duplicate' ACK as defined in [WS95]. Furthermore, we refer to the first-time transmission of a data segment as the 'original transmit', and 'HighData' is the highest sequence number transmitted at a given point. 1. Introduction The retransmission ambiguity problem [KP87] is the TCP sender's inability to distinguish whether the first new ACK that arrives after a retransmit was sent in response to the original transmit or the retransmit. A solution to eliminate the retransmission ambiguity is to mark ACKs with a special "retransmit-marker". The marker would need to be present in those ACKs, and only those ACKs, that the TCP receiver sends in response to retransmits; both genuine and spurious retransmits. Based on such a retransmit-marker, the Eifel algorithm allows the TCP sender to detect a posteriori that a fast retransmit or a timeout was spurious. This added capability of the TCP sender is useful in environments where TCP's loss recovery and congestion control algorithms may often get falsely triggered. Packet reordering, packet duplication, or a sudden delay increase in the data or the ACK path that results in a spurious timeout can cause this. For example, such sudden delay increases can often occur in wide-area wireless access networks due to handovers, resource preemption, or because the mobile transmitter traverses through a radio coverage hole (e.g., see [Gu01]). Based on the Eifel algorithm, the TCP sender may then choose to implement dedicated response schemes. One goal of such a scheme would be to alleviate the consequences of a falsely triggered loss recovery phase. For example, an important feature of the scheme proposed in [LG01] is that it avoids the spurious go-back-N retransmits that typically occur after a spurious timeout. Another goal would be to "upgrade" the congestion control parameters, the congestion window and slow start threshold [RFC2581]. This is to compensate for the unnecessary reduction of their values when the loss recovery was falsely triggered. Yet, another goal would be to adapt other protocol parameters, e.g., the duplicate acknowledgement threshold [RFC2581], and the RTT estimators [RFC2988]. This is to reduce the risk of falsely triggering TCP's loss recovery again in the future. However, such response schemes are outside the scope of this document. They are addressed in different documents (e.g., see [LG01] and [BA01a]). R. Ludwig [Page 2] INTERNET-DRAFT The Eifel Algorithm for TCP November, 2001 A key feature of the Eifel algorithm is that it already detects upon the first new ACK that arrives during a loss recovery phase that a fast retransmit or a timeout was spurious. This is crucial to be able to avoid the mentioned spurious go-back-N retransmits. Another feature is that the Eifel algorithm is fairly robust against ACK losses. Especially, the loss of the single ACK carrying the retransmit-marker does not affect the functioning of the Eifel algorithm. 2. Spurious Retransmits The following events falsely trigger a TCP sender's loss recovery and congestion control algorithms. This causes a so-called spurious retransmit, and an unnecessary reduction of the TCP sender's congestion window. - Spurious timeouts - Packet reordering - Packet duplication A spurious timeout is a timeout that would not have occurred had the sender "waited longer". It may be caused by increased delay that suddenly occurs in the data or the ACK path. This in turn might cause an ACK to arrive too late, i.e., only after the TCP sender's retransmission timer has expired. This would then trigger a spurious retransmit. Note that the mentioned ACK could either be a new ACK that would acknowledge the oldest outstanding segment, or it could be the third duplicate ACK that would have triggered a fast retransmit of the oldest outstanding segment [RFC2581]. Note: In the latter case the sender should suppress the fast retransmit that the third duplicate ACK would usually trigger. In general, a TCP sender should ignore all duplicate ACKs that arrive after a timeout [GL01]. Packet reordering in the network may occur because IP [RFC791] is a connection-less protocol. Thus, IP does not guarantee an in-order delivery of packets. This results in a spurious fast retransmit if three or more data segments arrive out-of-order at the TCP receiver, and at least three of the resulting duplicate ACKs arrive at the TCP sender. This is because a TCP receiver generates a duplicate ACK for each segment that arrives out-of-order, and three consecutive duplicate ACKs trigger the TCP sender's fast retransmit algorithm. Packet duplication in the network may also occur because IP is connection-less. That is, the receiving IP does not (cannot) remove duplicate packets. A TCP receiver in turn also generates a duplicate ACK for each duplicate segment. As with packet reordering, this results in a spurious fast retransmit if duplication of data segments R. Ludwig [Page 3] INTERNET-DRAFT The Eifel Algorithm for TCP November, 2001 or ACKs results in three or more duplicate ACKs to arrive at the TCP sender. The negative impact on TCP performance caused by packet reordering and packet duplication is commonly the same: a single spurious retransmit (the fast retransmit), and the unnecessary halving of the TCP sender's congestion window as a result of the subsequent fast recovery phase [RFC2581]. The negative impact on TCP performance caused by a spurious timeout is more severe. First, the timeout event itself causes a single spurious retransmit, and unnecessarily forces the TCP sender into slow start [RFC2581]. Then, as the connection progresses, a chain reaction gets triggered that further decreases TCP's performance. Since the timeout was spurious, at least some ACKs for original transmits will typically arrive at the TCP sender before the ACK for the retransmit arrives. (This is unless severe packet reordering coincided with the spurious timeout in such a way that the ACK for the retransmit is the first new ACK to arrive at the TCP sender.) Those ACKs for original transmits will then trigger an implicit go- back-N loss recovery at the TCP sender. Assuming that none of the outstanding segments and none of the corresponding ACKs were lost, all outstanding segments will get retransmitted unnecessarily. Moreover, this will then typically cause a spurious fast retransmit. This is because the spurious go-back-N retransmits will arrive as duplicates at the TCP receiver, which in turn triggers a series of duplicate ACKs. Note that this last spurious fast retransmit could be avoided with the careful variant of 'bug fix' [RFC2582]. More detailed explanations including TCP trace plots that visualize the effects of packet reordering and spurious timeouts can be found in [LK00]. 3. The Eifel Algorithm 3.1 The Idea As mentioned before, a solution to eliminate the retransmission ambiguity in TCP, is to mark ACKs with a special retransmit-marker. The marker would need to be present in those ACKs, and only those ACKs, that the TCP receiver sends in response to retransmits; both genuine and spurious retransmits. Based on such a retransmit-marker, the Eifel algorithm allows the TCP sender to detect a posteriori that a fast retransmit or a timeout was spurious. [Note: The Eifel algorithm as defined here is a pure detection mechanism. The original proposal of the Eifel algorithm [LK00] included the TCP sender's response to a detected spurious retransmit, and a marking scheme that allows a TCP sender to distinguish an ACK for an original transmit from that of a R. Ludwig [Page 4] INTERNET-DRAFT The Eifel Algorithm for TCP November, 2001 retransmit. Such response and marking schemes are addressed in different documents [RFC2883], [BA01a], [BA01b], [LM01], [LG01].] The key idea of the Eifel algorithm is to let the TCP sender take the absence of a retransmit-marker in the first new ACK that arrives after a retransmit as sufficient evidence that the loss recovery phase was falsely triggered. Being able to make this decision upon the first new ACK is crucial for response schemes such as [LG01] to avoid the spurious go-back-N retransmits that typically occur after a spurious timeout. However, making this decision upon the first new ACK is not absolutely reliable. A counter-example can be constructed where this approach fails: In case the original transmit was in fact dropped in the network, a new ACK that arrives after a retransmit would also not carry a retransmit-marker if (1) the retransmit arrived at the TCP receiver in sequence, i.e., if it had jumped ahead of all data segments that were outstanding when the retransmit was sent, and if (2) the ACK for the retransmit carrying the retransmit-marker got lost. In that case the mentioned new ACK would correspond to one of the data segments that were outstanding when the retransmit was sent. Note: This example holds independent of whether the loss recovery phase was triggered by the arrival of the third duplicate ACK or by a timeout. Note also: in case the SACK option is used as the retransmit-marker (see Section 3.2 and 3.4), condition (2) is not required. Nevertheless, this counter-example is regarded as being a rather pathological case. In addition, it seems to be the only conceivable counter-example. Therefore, it is regarded as sufficiently safe to take the absence of a retransmit-marker in the first new ACK that arrives after a retransmit as the signal that the TCP sender falsely entered the loss recovery phase. This approach makes the Eifel algorithm fairly robust against ACK losses. This is because a number of ACKs that do not carry the retransmit-marker are commonly in flight during a falsely triggered loss recovery phase. The arrival of at least one of those ACKs would be sufficient for the Eifel algorithm to make a decision. Also, the loss of the single ACK carrying the retransmit-marker does not affect the functioning of the Eifel algorithm. Three alternative retransmit-markers are defined in this document, and the Eifel algorithm may be based on either one of them: (1) the TCP RXT flag [LM01], R. Ludwig [Page 5] INTERNET-DRAFT The Eifel Algorithm for TCP November, 2001 (2) the TCP Timestamps option [RFC1323], and (3) the TCP SACK option [RFC2018]. The exact use of those markers is specified in the following subsections. The RXT flag is the most efficient retransmit-marker since it does not enlarge TCP segments or ACKs by an option. Unlike for the RXT flag and the Timestamps option, the SACK option has the drawback that it can only detect whether a fast retransmit was spurious. It cannot be used to detect spurious timeouts as explained in Section 3.4. Note: The DSACK option [RFC2883] is not an appropriate retransmit- marker that would allow to eliminate the retransmission ambiguity. The reason is that the first new ACK that arrives after a retransmit will commonly not carry a DSACK option. That is, the DACK option commonly arrives one or more ACKs later than the first new ACK for a retransmit. This is independent of whether the retransmit was genuine or spurious. Given that both the TCP sender and receiver support the RXT flag, or the Timestamps option, or the SACK option, a TCP sender MAY implement the Eifel algorithm as defined in the following subsections. 3.2 The Algorithm The following steps MUST be taken by the TCP sender when the third duplicate ACK arrives or a timeout occurs, i.e., when a loss recovery phase starts: (1) Set a "SpuriousRecovery" variable to 'false'. If this variable is true at step (7) below, the loss recovery phase had been falsely triggered. (2) Set a "RecoveryPoint" variable to HighData. When the TCP sender receives the first new ACK for this data octet the loss recovery phase is terminated. (3-TS) This step only applies when the Timestamps option is used as the retransmit-marker: Set a "RetransmitTS" variable to the value of the Timestamp Value field of the Timestamps option included in the first retransmit. A TCP sender MUST ensure that RetransmitTS does not get overwritten as the loss recovery phase progresses, e.g., in case of a second timeout and subsequent second retransmit of the same octet. The following steps MUST be taken by the TCP sender when the first new ACK arrives after the loss recovery phase started: R. Ludwig [Page 6] INTERNET-DRAFT The Eifel Algorithm for TCP November, 2001 (4) If the ACK's sequence number is greater than the value of RecoveryPoint , proceed to step (7) below. This condition ensures that only those ACKs are considered that correspond to segments that were outstanding at the time the retransmit was sent. (5-SACK) This step only applies when the SACK option is used as the retransmit-marker: If the ACK's sequence number is equal to the value of RecoveryPoint, proceed to step (7) below. This step is motivated in Section 3.4. (6-RXT) This step only applies when the RXT flag is used as the retransmit-marker: If the ACK does not have the RXT flag set (binary 1), set SpuriousRecovery to 'true' and proceed to step (7) below. (6-TS) This step only applies when the Timestamps option is used as the retransmit-marker: If the value of the Timestamp Echo Reply field of the ACK's Timestamps option is smaller than RetransmitTS, set SpuriousRecovery to 'true' and proceed to step (7) below. This step is commented in Section 3.3. (6-SACK) This step only applies when the SACK option is used as the retransmit-marker, and only if the loss recovery phase had been triggered by the arrival of the third duplicate ACK: If the ACK does not carry a SACK option, set SpuriousRecovery to 'true' and proceed to step (7) below. This step is motivated in Section 3.4. (7) Done. No further processing. 3.3 Comments to the Timestamp-based Detection The comparison operator "smaller than" in step (6-TS) is conservative. In theory, when the "timestamp clock" is slow or the network is fast, RetransmitTS could at most be equal to the timestamp echoed by an ACK sent in response to an original transmit. In that case, it is assumed that the loss recovery was not falsely triggered. 3.4 Comments to the SACK-based Detection The SACK option is not a retransmit-marker in the sense that it is present only in those ACKs that the TCP receiver sends in response to retransmits (see Section 3.1). This is why the SACK option cannot be used to detect that a timeout was spurious. For this, we only need to consider the case where an entire flight of segments was lost versus the case of a spurious timeout. Assuming that packet reordering did not concur, the first new ACK arriving after the retransmit would not R. Ludwig [Page 7] INTERNET-DRAFT The Eifel Algorithm for TCP November, 2001 carry a SACK option in either case. Thus, the retransmission ambiguity problem would still exist. Nevertheless, the SACK option can be used to detect that a fast retransmit was spurious. Let's assume a SACK-enabled TCP receiver. If only a single segment was in fact lost, then the first new ACK that arrives after the fast retransmit will not carry a SACK option, and will acknowledge RecoveryPoint. If multiple segments were in fact lost, then the first new ACK that arrives after the fast retransmit will carry a SACK option, and will acknowledge below RecoveryPoint, i.e., the ACK will be partial. Thus, partial ACKs that do not carry a SACK option are impossible independent of how many segments were lost. Conversely, the first new ACK after the fast retransmit that is a partial ACK and that does not carry a SACK option, signals that the loss recovery phase was falsely triggered. This motivates steps (5- SACK) and (6-SACK) in Section 3.2. Note: The SACK option cannot be used to detect that a fast retransmit was spurious when the entire flight of segments jumps ahead of the first segment (the oldest outstanding segment) of that flight. This is because the resulting new ACK would not carry a SACK option but would acknowledge RecoveryPoint. Thus, step (5-SACK) in Section 3.2 would become effective. 4. Security Considerations There do not seem to be any security considerations associated with the Eifel algorithm itself. This is because the Eifel algorithm is only a detection scheme that is not tied to any specific action that might alter existing protocol state at the TCP sender or receiver. That is, no value of a state variable other than one introduced by the algorithm itself (SpuriousRecovery, RecoverPoint, and RetransmitTS) is changed. However, security considerations might exist for response schemes that use the Eifel algorithm as a basis. In particular, it needs to be considered that the TCP receiver might by lying about the retransmit-marker. Acknowledgments Many thanks to Keith Sklower, Randy Katz, Michael Meyer, Stephan Baucke, Sally Floyd, Vern Paxson, Mark Allman, Ethan Blanton, and Andrei Gurtov for very useful discussions that contributed to this work. References [RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control, R. Ludwig [Page 8] INTERNET-DRAFT The Eifel Algorithm for TCP November, 2001 RFC 2581, April 1999. [BA01a] E. Blanton, M. Allman, Adjusting the Duplicate ACK Threshold to Avoid Spurious Retransmits, work in progress, July 2001. [BA01b] E. Blanton, M. Allman, Using TCP DSACKs and SCTP Duplicate TSNs to Detect Spurious Retransmissions, work in progress, August 2001. [RFC1122] R. Braden, Requirements for Internet Hosts - Communication Layers, RFC 1122, October 1989. [RFC2119] S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, RFC 2119, March 1997. [RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High Performance, RFC 1323, May 1992. [RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective Acknowledgement Options, RFC 2018, October 1996. [RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's Fast Recovery Algorithm, RFC 2582, April 1999. [RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow, An Extension to the Selective Acknowledgement (SACK) Option for TCP, RFC 2883, July 2000. [Gu01] A. Gurtov, Effect of Delays on TCP Performance, In Proceedings of IFIP Personal Wireless Communications, August 2001. [GL01] A. Gurtov, R. Ludwig, Making TCP Robust Against Delay Spikes, work in progress, November 2001. [KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates in Reliable Transport Protocols, In Proceedings of ACM SIGCOMM 87. [LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP Robust Against Spurious Retransmissions, ACM Computer Communication Review, Vol. 30, No. 1, January 2000, available at http://www.acm.org/sigcomm/ccr/archive/2000/ jan00/ccr-200001-ludwig.html (easier studied when viewed/printed in color). [LM01] R. Ludwig, M. Meyer, The TCP Retransmit (RXT) Flag, work in progress, November 2001. [LG01] R. Ludwig, A. Gurtov, Responding to Spurious Timeouts in R. Ludwig [Page 9] INTERNET-DRAFT The Eifel Algorithm for TCP November, 2001 TCP, work in progress, November 2001. [RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer, RFC 2988, November 2000. [RFC791] J. Postel, Internet Protocol, RFC 791, September 1981. [RFC793] J. Postel, Transmission Control Protocol, RFC793, September 1981. [WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2 (The Implementation), Addison Wesley, January 1995. Author's Address Reiner Ludwig Ericsson Research (EED) Ericsson Allee 1 52134 Herzogenrath, Germany Phone: +49 2407 575 719 Fax: +49 2407 575 400 Email: Reiner.Ludwig@Ericsson.com This Internet-Draft expires in May 2002. R. Ludwig [Page 10]