Network Working Group                                      Reiner Ludwig
INTERNET-DRAFT                                             Michael Meyer
Expires: August 2002                                   Ericsson Research
                                                          February, 2002


                 The Eifel Detection Algorithm for TCP
                <draft-ietf-tsvwg-tcp-eifel-alg-03.txt>


Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/lid-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Abstract

   The Eifel detection algorithm allows the TCP sender to detect a
   posteriori whether it has entered loss recovery unnecessarily. It
   already determines this in the beginning of loss recovery when the
   first new ACK arrives after the timeout-based or fast retransmit has
   been sent. The algorithm requires that the TCP Timestamps option
   defined in RFC1323 is enabled for a connection. The idea is to use
   the timestamps echoed in returning ACKs to eliminate the
   retransmission ambiguity in TCP. The Eifel detection algorithm
   provides a basis for future TCP enhancements. This includes response
   algorithms to back out of loss recovery by restoring a TCP sender's
   congestion control state.


Ludwig & Meyer                                                  [Page 1]


INTERNET-DRAFT           TCP - Eifel Detection           February, 2002


Terminology

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [RFC2119].

   We use the term 'new ACK' to refer to an ACK that acknowledges
   previously unacknowledged data. We use the term 'duplicate ACK' as
   defined in [WS95]. Furthermore, we refer to the first-time
   transmission of a data segment as the 'original transmit'.


1. Introduction

   The retransmission ambiguity problem [KP87] is the TCP sender's
   inability to distinguish whether the first new ACK that arrives after
   a retransmit was sent in response to the original transmit or the
   retransmit. This problem occurs after a timeout-based retransmit and
   after a fast retransmit. The Eifel detection algorithm uses the TCP
   Timestamps option defined in RFC1323 to eliminate the retransmission
   ambiguity. It thereby allows the TCP sender to detect a posteriori
   whether it has entered loss recovery unnecessarily.

   This added capability of the TCP sender is useful in environments
   where TCP's loss recovery and congestion control algorithms may often
   get falsely triggered. This can be caused by packet reordering,
   packet duplication, the loss of a flight of ACKs, or a sudden delay
   increase in the data or the ACK path that results in a spurious
   timeout. For example, such sudden delay increases can often occur in
   wide-area wireless access networks due to handovers, resource
   preemption, or because the mobile transmitter traverses through a
   radio coverage hole (e.g., see [Gu01]).

   Based on the Eifel detection algorithm, the TCP sender may then
   choose to implement dedicated response algorithms. One goal of such a
   response algorithm would be to alleviate the consequences of a
   falsely triggered loss recovery. This may include restoring the TCP
   sender's congestion control state, and/or avoiding the often
   unnecessary go-back-N retransmits that typically occur after a
   spurious timeout. Another goal would be to adapt protocol parameters
   such as the duplicate acknowledgement threshold [RFC2581], and/or the
   RTT estimators [RFC2988]. This is to reduce the risk of falsely
   triggering TCP's loss recovery again as the connection progresses.
   However, such response algorithms are outside the scope of this
   document. Note: The original proposal, the "Eifel algorithm" [LK00],
   comprises both a detection and a response algorithm. This document
   only defines the detection part.

   A key feature of the Eifel detection algorithm is that it already
   detects upon the first new ACK that arrives during loss recovery
   whether a fast retransmit or a timeout was spurious. This is crucial


Ludwig & Meyer                                                  [Page 2]


INTERNET-DRAFT           TCP - Eifel Detection           February, 2002


   to be able to avoid the mentioned go-back-N retransmits. Another
   feature is that the Eifel detection algorithm is fairly robust
   against the loss of ACKs.


2. Events that Falsely Trigger TCP Loss Recovery

   The following events falsely trigger a TCP sender's loss recovery and
   congestion control algorithms. This causes a so-called spurious
   retransmit, and an unnecessary reduction of the TCP sender's
   congestion window and slow start threshold [RFC2581].

     - Spurious timeout

     - Packet reordering

     - Packet duplication

   The common understanding of a spurious timeout is a timeout that
   would not have occurred had the sender "waited longer". This may be
   caused by increased delay that suddenly occurs in the data and/or the
   ACK path. That in turn might cause an ACK to arrive too late, i.e.,
   only after the TCP sender's retransmission timer has expired. Note
   that such a late ACK could either be a new ACK that would acknowledge
   the oldest outstanding segment, or it could be the third duplicate
   ACK that would trigger a fast retransmit of the oldest outstanding
   segment [RFC2581]. For the purpose of specifying the algorithm in
   Section 3, we define the former case as SPUR_TO_NEWACK, and the
   latter as SPUR_TO_DUPACK. However, in this document we stretch the
   definition of a spurious timeout. We also speak of a spurious timeout
   when the timeout occurred because an entire flight of ACKs was lost
   while in fact the original transmit of the oldest outstanding segment
   had arrived at the TCP receiver. This case is also considered to fall
   under the definition of SPUR_TO_NEWACK. Even though such a timeout is
   certainly unavoidable, the triggering of loss recovery was spurious
   since that original transmit was not lost. Hence, we use the term
   'spurious timeout' in the sense that entering the associated loss
   recovery was unnecessary. Yet, we exclude from the definition of a
   spurious timeout the case where the retransmit arrives at the TCP
   receiver before the corresponding original transmit.

   Packet reordering in the network may occur because IP [RFC791] does
   not guarantee in-order delivery of packets. Additionally, a TCP
   receiver generates a duplicate ACK for each segment that arrives out-
   of-order. This results in a spurious fast retransmit if three or more
   data segments arrive out-of-order at the TCP receiver, and at least
   three of the resulting duplicate ACKs arrive at the TCP sender. We
   define such a case as SPUR_FR.

   Packet duplication may occur because a receiving IP does not (cannot)
   remove packets that have been duplicated in the network. A TCP


Ludwig & Meyer                                                  [Page 3]


INTERNET-DRAFT           TCP - Eifel Detection           February, 2002


   receiver in turn also generates a duplicate ACK for each duplicate
   segment. As with packet reordering, this results in a spurious fast
   retransmit if duplication of data segments or ACKs results in three
   or more duplicate ACKs to arrive at the TCP sender. This case is also
   considered to fall under the definition of SPUR_FR.

   The negative impact on TCP performance caused by packet reordering
   and packet duplication is commonly the same: a single spurious
   retransmit (the fast retransmit), and the unnecessary halving of the
   TCP sender's congestion window as a result of the subsequent fast
   recovery phase [RFC2581].

   The negative impact on TCP performance caused by a spurious timeout
   is more severe. First, the timeout event itself causes a single
   spurious retransmit, and unnecessarily forces the TCP sender into
   slow start [RFC2581]. Then, as the connection progresses, a chain
   reaction gets triggered that further decreases TCP's performance.
   Since the timeout was spurious, at least some ACKs for original
   transmits will typically arrive at the TCP sender before the ACK for
   the retransmit arrives. (This is unless severe packet reordering
   coincided with the spurious timeout in such a way that the ACK for
   the retransmit is the first new ACK to arrive at the TCP sender.)
   Those ACKs for original transmits will then trigger an implicit go-
   back-N loss recovery at the TCP sender. Assuming that none of the
   outstanding segments and none of the corresponding ACKs were lost,
   all outstanding segments will get retransmitted unnecessarily.
   Moreover, in some TCPs this may then cause a spurious fast
   retransmit. This is because the spurious go-back-N retransmits will
   arrive as duplicates at the TCP receiver, which in turn triggers a
   series of duplicate ACKs. Note that this last spurious fast
   retransmit could be avoided with the careful variant of 'bug fix'
   [RFC2582].

   More detailed explanations including TCP trace plots that visualize
   the effects of spurious timeouts and packet reordering can be found
   in [LK00].


3. The Eifel Detection Algorithm

3.1 The Idea

   The goal of the Eifel detection algorithm is to allow the TCP sender
   to detect a posteriori whether it has entered loss recovery
   unnecessarily. Furthermore, the TCP sender should be able to make
   this decision upon the first new ACK that arrives after the timeout-
   based or fast retransmit has been sent. This in turn requires extra
   information in every ACK by which the TCP sender can unambiguously
   distinguish whether that first new ACK was sent in response to the
   original transmit or the retransmit. Such extra information is
   provided by the TCP Timestamps option [RFC1323]. Generally speaking,


Ludwig & Meyer                                                  [Page 4]


INTERNET-DRAFT           TCP - Eifel Detection           February, 2002


   timestamps are monotonously increasing "serial numbers" added into
   every segment that are then echoed within the corresponding ACKs.
   This is exploited by the Eifel detection algorithm in the following
   way.

   Given that timestamps are enabled for a connection the TCP sender
   always stores the timestamp of the retransmit sent in the beginning
   of loss recovery, i.e., the timestamp of the timeout-based or the
   fast retransmit. If the timestamp of the first new ACK, arriving
   after the mentioned retransmit was sent, is smaller then the stored
   timestamp of that retransmit, then that ACK must have been sent in
   response to an original transmit. Hence, the TCP sender must have
   entered loss recovery unnecessarily.

   The fact that the Eifel detection algorithm decides upon this first
   new ACK is crucial to allow future response algorithms to avoid the
   spurious go-back-N retransmits that typically occur after a spurious
   timeout. Also, if loss recovery was entered unnecessarily, a window
   worth of ACKs are outstanding that all carry a timestamp that is
   smaller than the stored timestamp of the retransmit. The arrival of
   any one of those ACKs suffices the Eifel detection algorithm to work.
   Hence, the solution is fairly robust against ACK losses. Even the ACK
   sent in response to the retransmit, i.e., the one that carries the
   stored timestamp, may get lost.

   Note: Also the DSACK option [RFC2883] can be used to detect a
   posteriori whether the TCP sender has entered loss recovery
   unnecessarily. However, the first ACK carrying a DSACK option usually
   arrives at the TCP sender only after loss recovery has already
   terminated. Thus, the DSACK option cannot be used to eliminate the
   retransmission ambiguity. Consequently, it cannot be used to avoid
   the mentioned spurious go-back-N retransmits. Moreover, a DSACK-based
   detection algorithm is less robust against ACK losses.


3.2 The Algorithm

   Given that the TCP Timestamps option [RFC1323] is enabled for a
   connection, a TCP sender MAY use the Eifel detection algorithm as
   defined in this subsection.

   If the Eifel detection algorithm is used, the following steps MUST be
   taken by the TCP sender only upon initiation of loss recovery, i.e.,
   when either the timeout-based retransmit or fast retransmit is sent.
   Note: The Eifel detection algorithm should not be (re-)initiated
   after loss recovery has already started. In particular, it may not be
   (re-)initiated upon subsequent timeouts for the same segment, and not
   upon retransmitting segments other than the oldest outstanding
   segment.


Ludwig & Meyer                                                  [Page 5]


INTERNET-DRAFT           TCP - Eifel Detection           February, 2002


      (1)     Set a "SpuriousRecovery" variable to FALSE.

      (2)     Set a "RetransmitTS" variable to the value of the
               Timestamp Value field of the Timestamps option included
               in the retransmit. A TCP sender must ensure that
               RetransmitTS does not get overwritten as loss recovery
               progresses, e.g., in case of a second timeout and
               subsequent second retransmit of the same octet.

      (3)     Wait for the arrival of a valid ACK. If a valid ACK
               arrives, then proceed to step (4).

      (4)     If a first or second duplicate ACK has arrived, then
               return to step (3), else proceed to step (5).

      (5)     If a third duplicate ACK has arrived and the loss
               recovery had been initiated with a timeout-based
               retransmit, then set SpuriousRecovery to SPUR_TO_DUPACK
               and proceed to step (DONE), else proceed to step (6).

      (6)     If a new ACK has arrived and the Timestamp Echo Reply
               field of the ACK's Timestamps option is smaller than
               RetransmitTS, then proceed to step (7), else proceed to
               step (DONE).

      (7)     If the loss recovery had been initiated with a timeout-
               based retransmit then set SpuriousRecovery to
               SPUR_TO_NEWACK, else set SpuriousRecovery to SPUR_FR.
               Proceed to step (DONE).

      (DONE)  No further processing.

   The comparison operator "smaller than" in step (6) is conservative.
   In theory, when the "timestamp clock" is slow or the network is fast,
   RetransmitTS could at most be equal to the timestamp echoed by an ACK
   sent in response to an original transmit. In that case, it is assumed
   that the loss recovery was not falsely triggered.


3.3 Non-RFC1323-compliant TCP Receivers Mask Certain Spurious Timeouts

   Not all TCP implementations strictly follow RFC1323. There are
   differences concerning which timestamp a TCP receiver echoes when a
   duplicate segment arrives. The relevant case to consider in this
   context is a spurious timeout that occurred because an entire flight
   of ACKs got lost. (Recall the definition of a spurious timeout in
   Section 2.) The question is which timestamp the TCP receiver echoes
   in response to the spurious retransmit that typically arrives as a
   duplicate segment. RFC1323-compliant TCP receivers (e.g., LINUX
   kernel 2.4) will echo the timestamp of the last segment that arrived


Ludwig & Meyer                                                  [Page 6]


INTERNET-DRAFT           TCP - Eifel Detection           February, 2002


   in-sequence, while some non-RFC1323-compliant TCP receivers will echo
   the timestamp of the retransmit.

   The Eifel detection algorithm will only detect such spurious timeouts
   in case the TCP receiver is RFC1323-compliant in this respect.


3.4 Protecting Against Misbehaving TCP Receivers

   A TCP receiver can easily make a genuine retransmit appear to the TCP
   sender as a spurious retransmit by forging echoed timestamps. This
   may pose a security concern.

   Fortunately, there is a way to modify the Eifel detection algorithm
   in a way that makes it robust against lying TCP receivers. The idea
   is to use timestamps as a "segment's secret" that the TCP receiver
   only gets to know if it receives the segment. Conversely, a TCP
   receiver will not know the timestamp of a segment that was lost.
   Hence, to "prove" that it received the original transmit of a segment
   that the TCP sender retransmitted, the TCP receiver would need to
   return the timestamp of that original transmit. The Eifel detection
   algorithm could then be modified to only decide that loss recovery
   had been unnecessarily entered if the first new ACK echoes the
   timestamp of the original transmit.

   Hence, implementers may choose to implement the algorithm with the
   following modifications.

   Step (2) is replaced with step (2'):

      (2')    Set a "RetransmitTS" variable to the value of the
               Timestamp Value field of the Timestamps option that was
               included in the original transmit corresponding to the
               retransmit. Note: This step requires that the TCP sender
               stores the timestamps of all outstanding original
               transmits.

   Step (4) is replaced with step (4'):

      (4')    If a first or second duplicate ACK has arrived, then
               proceed to step (DONE), else proceed to step (6).

   Step (6) is replaced with step (6'):

      (6')    If a new ACK has arrived and the Timestamp Echo Reply
               field of the ACK's Timestamps option is equal to
               RetransmitTS, then proceed to step (7), else proceed to
               step (DONE).

   These modifications come at a cost. First, spurious timeouts of type
   SPUR_TO_DUPACK cannot be detected any longer. Also, the modified


Ludwig & Meyer                                                  [Page 7]


INTERNET-DRAFT           TCP - Eifel Detection           February, 2002


   algorithm is fairly sensitive against ACK losses since it relies on
   the arrival of the new ACK that corresponds to the original transmit.

   Note: The first new ACK that arrives after loss recovery had been
   unnecessarily entered, typically echoes the timestamp of the original
   transmit. This assumes that the ACK corresponding to the original
   transmit was not lost, and the TCP receiver does not forge timestamps
   but complies with RFC1323. In case of a spurious fast retransmit,
   this is implied by the rules for generating ACKs for data segments
   that fill in all or part of a gap in the sequence space (see section
   4.2 of [RFC2581]) and by the rules for echoing timestamps in that
   case (see rule (C) in section 3.4 of [RFC1323]). In case of a
   spurious timeout, it is likely that the delay that has caused the
   spurious timeout has also caused the TCP receiver's delayed ACK timer
   [RFC1122] to expire before the original transmit arrives. Also, in
   this case the rules for generating ACKs and the rules for echoing
   timestamps (see rule (A) in section 3.4 of [RFC1323]) ensure that the
   original transmit's timestamp is echoed.

   A remaining problem is that a TCP receiver might guess a lost
   segment's timestamp from observing the timestamps of segments that
   arrived earlier. This could be avoided by having the TCP sender add a
   "random counter" to the timestamp of every segment it sends, and then
   increment that counter by a random value, e.g., between 1 and 100.
   This ensures that timestamps remain monotonously increasing while
   making it difficult for a TCP receiver to guess the timestamp of a
   lost segment.


4. Security Considerations

   There do not seem to be any security considerations associated with
   the Eifel detection algorithm. This is because the Eifel detection
   algorithm does not alter existing protocol state at the TCP sender
   nor at the receiver. That is, no value of a state variable other than
   one introduced by the algorithm itself (SpuriousRecovery, and
   RetransmitTS) is changed.

   Moreover, a variant of the Eifel detection algorithm has been
   proposed in Section 3.4 that makes it robust against lying TCP
   receivers.


Acknowledgments

   Many thanks to Keith Sklower, Randy Katz, Stephan Baucke, Sally
   Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Andrei Gurtov, Pasi
   Sarolahti, and Alexey Kuznetsov for useful discussions that
   contributed to this work.


Ludwig & Meyer                                                  [Page 8]


INTERNET-DRAFT           TCP - Eifel Detection           February, 2002


References

   [RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control,
             RFC 2581, April 1999.

   [RFC2119] S. Bradner, Key words for use in RFCs to Indicate
             Requirement Levels, RFC 2119, March 1997.

   [RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High
             Performance, RFC 1323, May 1992.

   [RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's
             Fast Recovery Algorithm, RFC 2582, April 1999.

   [RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow,
             An Extension to the Selective Acknowledgement (SACK) Option
             for TCP, RFC 2883, July 2000.

   [Gu01]    A. Gurtov, Effect of Delays on TCP Performance, In
             Proceedings of IFIP Personal Wireless Communications,
             August 2001.

   [KP87]    P. Karn, C. Partridge, Improving Round-Trip Time Estimates
             in Reliable Transport Protocols, In Proceedings of ACM
             SIGCOMM 87.

   [LK00]    R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP
             Robust Against Spurious Retransmissions, ACM Computer
             Communication Review, Vol. 30, No. 1, January 2000,
             available at http://www.acm.org/sigcomm/ccr/archive/2000/
             jan00/ccr-200001-ludwig.html (easier studied when
             viewed/printed in color).

   [RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer,
             RFC 2988, November 2000.

   [RFC791]  J. Postel, Internet Protocol, RFC 791, September 1981.

   [RFC793]  J. Postel, Transmission Control Protocol, RFC793, September
             1981.

   [WS95]    G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2
             (The Implementation), Addison Wesley, January 1995.


Ludwig & Meyer                                                  [Page 9]


INTERNET-DRAFT           TCP - Eifel Detection           February, 2002


Author's Address

     Reiner Ludwig
     Ericsson Research (EED)
     Ericsson Allee 1
     52134 Herzogenrath, Germany
     Phone: +49 2407 575 719
     Fax:   +49 2407 575 400
     Email: Reiner.Ludwig@eed.ericsson.se

     Michael Meyer
     Ericsson Research (EED)
     Ericsson Allee 1
     52134 Herzogenrath, Germany
     Phone: +49 2407 575 654
     Fax:   +49 2407 575 400
     Email: Michael.Meyer@eed.ericsson.se


This Internet-Draft expires in August 2002.


Ludwig & Meyer                                                 [Page 10]