Internet Engineering Task Force                             P. Sarolahti
INTERNET DRAFT                                     Nokia Research Center
File: draft-sarolahti-tsvwg-tcp-frto-00.txt                      M. Kojo
                                                  University of Helsinki
                                                           June 24, 2002
                                                 Expires: December, 2002


                F-RTO: A TCP RTO Recovery Algorithm for
                  Avoiding Unnecessary Retransmissions


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of [RFC2026].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract

   Spurious retransmission timeouts (RTOs) cause suboptimal TCP
   performance, because they often result in unnecessary retransmission
   of the last window of data. This document describes the "Forward RTO
   Recovery" (F-RTO) algorithm for recovering from the TCP RTOs
   efficiently even when the RTO was spurious. F-RTO is a TCP sender
   only algorithm that does not require any TCP options to operate.
   After retransmitting the first unacknowledged segment triggered by an
   RTO, the F-RTO algorithm at a TCP sender monitors the incoming
   acknowledgements to determine whether the timeout was spurious and to
   decide whether to send new segments or retransmit unacknowledged
   segments. The algorithm effectively avoids additional unnecessary
   retransmissions and thereby improves TCP performance in case of a


Expires: December 2002                                          [Page 1]

draft-sarolahti-tsvwg-tcp-frto-00.txt                          June 2002


   spurious timeout.


1.  Introduction

   The TCP protocol [Pos81] has two methods for triggering
   retransmissions.  Primarily, the TCP sender relies on incoming
   duplicate ACKs, which indicate that the receiver is missing some of
   the data. After a required amount of successive duplicate ACKs have
   arrived at the sender, it retransmits the first unacknowledged
   segment [APS99]. Secondarily, the TCP sender maintains a
   retransmission timer which triggers retransmission of segments, if
   they have not been acknowledged within the retransmission timer
   expiration period. When the retransmission timer expires, the
   congestion window is initialized to one segment and unacknowledged
   segments are retransmitted using the slow-start algorithm. The
   retransmission timer is adjusted dynamically based on the measured
   round-trip times [PA00].

   It has been pointed out that the retransmission timer can expire
   spuriously and trigger unnecessary retransmissions when no segments
   have been lost [GL02]. After a spurious RTO the acknowledgements of
   original segments arrive at the sender, usually triggering
   unnecessary retransmissions of whole window of segments.

   There are a number of potential reasons for spurious RTOs. First,
   some mobile networking technologies involve sudden delay peaks on
   transmission because of actions taken during a hand-off. Second,
   arrival of competing traffic of higher priority on a low-bandwidth
   link, or some other change in available bandwidth involves a sudden
   increase of round-trip time which may trigger a spurious
   retransmission timeout. A persistently reliable link layer can also
   cause a sudden delay when several data frames are lost for some
   reason. This document does not distinguish the different causes of
   such a delay, but discusses the spurious RTO caused by delay in
   general.

   This document describes an alternative algorithm called "Forward RTO-
   Recovery" (F-RTO) to be used for retransmitting segments after an
   RTO. The purpose of the F-RTO algorithm is to avoid most of the
   unnecessary retransmissions following a spurious timeout and thereby
   improve TCP performance. When the RTO is not spurious, the F-RTO
   algorithm reverts back to the conventional RTO recovery algorithm and
   should have similar performance. F-RTO does not require any TCP
   options in its operation, and it can be implemented by modifying only
   the TCP sender. This is different from alternative algorithms (Eifel
   [LK00] and D-SACK-based algorithms) that have been suggested for the
   same purpose.  The Eifel algorithm uses TCP timestamps for detecting


Expires: December 2002                                          [Page 2]

draft-sarolahti-tsvwg-tcp-frto-00.txt                          June 2002


   a spurious timeout and the D-SACK-based algorithms require that the
   SACK option with D-SACK extension [FMMP00] is in use.

   When a RTO occurs, the F-RTO sender retransmits the first
   unacknowledged segment normally. If the next two acknowledgements
   advance the window, the F-RTO sender continues sending new data and
   exits the recovery.  However, if either of the next two
   acknowledgements is a duplicate ACK, the RTO was likely to be caused
   because of missing data, hence the F-RTO sender retransmits the
   unacknowledged segments in slow start similarly to the traditional
   algorithm.

   The F-RTO algorithm only attempts to avoid unnecessary
   retransmissions after an RTO. Eifel and D-SACK can also be used in
   avoiding unnecessary retransmissions in other events, for example due
   to packet reordering. In addition, F-RTO is not intended to be used
   for deciding whether to revert the congestion control parameters
   after a spurious timeout. However, F-RTO can be used jointly with
   either Eifel or D-SACK algorithms. In such a case the TCP sender may
   detect unnecessary retransmissions also due to other reasons than a
   spurious RTO and it can more safely revert the congestion control
   parameters when a spurious RTO is detected.


2.  F-RTO Algorithm

   The F-RTO algorithm affects the TCP sender behavior only after a
   retransmission timeout. Otherwise the TCP behavior is similar to the
   conventional TCP. When the retransmission timer expires, the F-RTO
   algorithm takes the following steps at the TCP sender.

   1) When RTO expires, retransmit first unacknowledged segment.

      The TCP sender must adjust ssthresh to FlightSize / 2, according
      to the TCP congestion control specifications. However, congestion
      window (cwnd) is not yet adjusted, but the TCP sender waits for
      the next incoming acknowledgements before deciding on the
      following actions. Leaving cwnd unadjusted in this stage does not
      cause TCP sender to inject any additional segments.

   2) When the first acknowledgement after the RTO arrives at the
      sender, the sender chooses the following actions depending on
      whether the ACK advances the window or whether it is a duplicate
      ACK.

      a) If the acknowledgement is a duplicate ACK, revert to the
         conventional recovery and do not enter step 3 of this
         algorithm.


Expires: December 2002                                          [Page 3]

draft-sarolahti-tsvwg-tcp-frto-00.txt                          June 2002


         The sender must set cwnd to 1 * MSS. This duplicate ACK is
         triggered by a segment that was sent before the retransmission
         triggered by the RTO.  This is possible, for example, if the
         RTO is triggered during fast recovery and the forward
         transmissions have triggered duplicate ACKs. A common reason
         for entering this branch is that RTO triggered retransmission
         of a segment that was already retransmitted earlier by fast
         retransmit.

      b) If the acknowledgement advances the window, transmit two new
         segments.

         At this point the TCP sender should set cwnd <- ssthresh, thus
         halving the transmission rate. Sending two new segments at this
         point is equally aggressive to the conventional RTO recovery
         algorithm, which would have increased its cwnd to 2 * MSS when
         the first ACK arrives after RTO.

   3) When the second acknowledgement after the RTO arrives at the
      sender, either continue transmitting new data, or start
      retransmitting the unacknowledged segments.

      a) If the acknowledgement is a duplicate ACK, set congestion
         window to three segments, continue with the slow start
         algorithm retransmitting unacknowledged segments.

         The duplicate ACK indicates that at least one segment other
         than the segment which triggered RTO is lost in the last window
         of data. There is no sufficient evidence that any of the
         segments was delayed. Therefore, the sender proceeds with
         retransmissions similarly to the conventional RTO recovery
         algorithm, with the send_high variable stored when the
         retransmission timer expired to avoid unnecessary fast
         retransmits.

      b) If the acknowledgement advances the window, continue
         transmitting new data following the congestion avoidance
         algorithm.

         Because the TCP sender has retransmitted only one segment after
         the RTO, this acknowledgement indicates that an originally
         transmitted segment has arrived at the receiver. This is
         regarded as a strong indication of a spurious RTO. However,
         since the TCP sender cannot surely know at this point whether
         the segment that triggered the RTO was actually lost, adjusting
         the congestion control parameters after the RTO is the correct
         action.  From this point on, the TCP sender continues as in the
         normal congestion avoidance.


Expires: December 2002                                          [Page 4]

draft-sarolahti-tsvwg-tcp-frto-00.txt                          June 2002


         If this algorithm branch is taken, the TCP sender ignores the
         send_high variable that indicates the highest sequence number
         transmitted so far [FH99]. The send_high variable was proposed
         as a "bugfix" for avoiding unnecessary multiple fast
         retransmits when RTO expires during fast recovery with NewReno
         TCP. As the sender has not retransmitted other segments but the
         one that triggered RTO, the problem addressed by the bugfix
         cannot occur. Therefore, if there are duplicate ACKs arriving
         at the sender after the RTO, they are likely to indicate a
         packet loss, hence fast retransmit should be used to allow
         efficient recovery.  Alternatively, if there are not enough
         duplicate ACKs arriving at the sender after a packet loss, the
         retransmission timer expires another time and the sender enters
         step 1 of this algorithm.

   When algorithm branch (3b) is taken, the sender does not reduce the
   congestion window to one segment, but halves it to the level of
   ssthresh. Because the sender does not enter slow start, it increases
   congestion window only once in a round-trip time after RTO, and
   therefore is slightly more conservative than the conventional
   recovery algorithm. In fact, if the segment that triggered RTO was
   not lost, the correct behavior would have been to not decrease the
   congestion window at all. However, by using the D-SACK or TCP
   timestamp options the sender can more reliably detect whether the
   retransmission was unnecessary and revert the last adjustments on the
   congestion control parameters in such a case. Section 4 outlines the
   possible methods that can be used with F-RTO in reverting the
   congestion control state.

   Branch (3b) can also be taken when a retransmission triggered by
   delay is lost. This kind of loss cannot be detected at the sender.
   Therefore, we consider that reducing the congestion window to half of
   its previous size is an adequate action at this point, because a
   similar action is taken when TCP sender enters fast recovery.


3.  Scenarios

   We now discuss different scenarios where RTOs typically may occur and
   discuss how the F-RTO algorithm would perform in those scenarios. The
   interesting scenarios are a sudden delay triggering RTO, loss of a
   retransmitted packet during fast recovery, link outage losing several
   packets, and packet reordering. A performance evaluation with a more
   thorough analysis on a real implementation of F-RTO is given in
   [SKR02].


Expires: December 2002                                          [Page 5]

draft-sarolahti-tsvwg-tcp-frto-00.txt                          June 2002


3.1.  Sudden delay

   An unexpectedly long delay can trigger RTO, should it occur on a
   single packet blocking the following packets, or appear as increased
   RTTs for several successive packets. The example below illustrates
   the sequence of packets and acknowledgements seen by the TCP sender
   that follows the F-RTO algorithm, when a sudden delay occurs
   triggering RTO. For simplicity, delayed acknowledgements are not used
   in the example.

         ...                (cwnd = 6, ssthresh < 6, FlightSize = 5)
         1.  SEND(10) ->
         2.  ACK(6) <-
         3.  SEND(11) ->
         4.  <delay + RTO>  (ssthresh <- 3)
         5.  SEND(6) ->
         6.  ACK(7) <-
         7.  SEND(12) ->
         8.  SEND(13) ->
         9.  ACK(8) <-      (cwnd <- 3, FlightSize = 6)
         10. ACK(9) <-      (cwnd = 3,  FlightSize = 5)
         11. ACK(10) <-     (cwnd = 3,  FlightSize = 4)
         12. ACK(11) <-     (cwnd = 4,  FlightSize = 3)
         13. SEND(14) ->
         ...

   When a sudden delay long enough to trigger RTO occurs at step 4, the
   TCP sender retransmits the first unacknowledged segment (step 5).
   Because the next ACK advances the window, the TCP sender continues by
   sending two new data segments (steps 7, 8) and adjusts cwnd to 3 MSS.
   Because the second acknowledgement arriving after the RTO also
   advances the window the TCP sender exits the recovery and continues
   with the congestion avoidance. From this point on the retransmissions
   are invoked either by fast retransmit or when triggered by the
   retransmission timer. Because the TCP sender reduces cwnd when
   receiving the first ACK after RTO and sends the two new data segments
   at steps 7 and 8, it has to wait until the FlightSize is reduced to
   the level of congestion window before it can continue transmitting
   again at step 13.


3.2.  Loss of a retransmission

   If a retransmitted segment is lost, the only way to retransmit it
   again is to wait for the RTO to trigger the retransmission. Once the
   segment is successfully received, the receiver usually acknowledges
   several segments cumulatively. The example below shows a scenario
   where retransmission (of segment 6) is lost, as well as a later


Expires: December 2002                                          [Page 6]

draft-sarolahti-tsvwg-tcp-frto-00.txt                          June 2002


   segment (segment 9) in the same window. The limited transmit [ABF01]
   or SACK TCP [MMFR96] enhancements are not in use in this example.

         ...                (cwnd = 6, ssthresh < 6, FlightSize = 5)
         1.  SEND(10) ->
         2.  ACK(6) <-
         3.  SEND(11) ->
         4.  ACK(6) <-
         5.  ACK(6) <-
         6.  ACK(6) <-
         7.  SEND(6) ->     (cwnd <- 6, ssthresh <- 3, FlightSize = 6)
         8.  ACK(6) <-
         9.  <RTO>          (ssthresh <- 3)
         10. SEND(6) ->
         11. ACK(9) <-
         12. SEND(12) ->
         13. SEND(13) ->
         14. ACK(9) <-      (cwnd <- 3)
         15. SEND(9) ->
         16. SEND(10) ->
         17. SEND(11) ->
         18. ACK(11) <-
         ...

   In the example above, segment 6 is lost and the sender retransmits it
   after three duplicate ACKs in step 7. However, the retransmission is
   also lost, and the sender has to wait for the RTO to expire before
   retransmitting it again. Because the first ACK following the RTO
   advances the window (step 11), the sender transmits two new segments.
   The second ACK in step 14 does not advance the window, and the sender
   enters the slow start, sets cwnd to 3 * MSS, and retransmits the next
   three unacknowledged segments, as per the F-RTO algorithm description
   given in Section 2. After this the receiver acknowledges all segments
   transmitted prior to entering recovery and the sender can continue
   transmitting new data in congestion avoidance.


3.3.  Link outage

   A performance study shows that F-RTO performs similarly to the
   regular recovery when consecutive packets are lost both up- and
   downstream as a result of link outage [SKR02]. Because one of the two
   next ACKs after RTO does not advance the window in case of data loss,
   the TCP sender start retransmitting unacknowledged segments in slow
   start.

   The abovementioned performance study recovered a problem if the
   decision to skip retransmissions after RTO is based on the


Expires: December 2002                                          [Page 7]

draft-sarolahti-tsvwg-tcp-frto-00.txt                          June 2002


   information from the TCP timestamps only (i.e. by following the Eifel
   algorithm). When both data segments and acknowledgements are dropped
   in the same window, the TCP sender may end up in a wrong conclusion:
   in some occasions it continues transmitting new segments according to
   timestamp feedback, even though several data segments have been
   dropped. A simple example illustrating the problem is given below. In
   Section 4 we outline how this phenomenon can be avoided by using a
   retransmission logic of the F-RTO algorithm in conjunction with the
   timestamp information.

         ...                (cwnd = 6, ssthresh = 4, FlightSize = 5)
         1.  SEND(10) ->    (ts 110>
         2.  ACK(6) <-      <tsecr 105>
         3.  SEND(11) ->    <ts 111>
         4.  <data segment 7 is lost, ACKs for 6-11 are lost>
         5.  <RTO>          (cwnd <- 1, ssthresh <- 3)
         6.  SEND(6) ->     <ts 116>
         7.  ACK(7) <-      <tsecr 106> (undo: cwnd <- 6, ssthresh <- 4)
         8.  SEND(12) ->    <ts 117>
         9.  ACK(7) <-      <tsecr 106>
         10. <RTO>
         11. SEND(7) ->     <ts 120>
         12. ACK(8) ->      <tsecr 120>
         ...

   The Eifel sender fails, because the duplicate ACKs caused by lost
   data segment 7 are required to echo the timestamp of the segment that
   was last to update the window [BBJ92]. Because the duplicate ACKs
   were dropped, the first ACK after the retransmission in step 6
   appears as a new acknowledgement with a timestamp earlier than the
   timestamp of the retransmission. The sender reverts the adjustments
   on the congestion window and continues transmitting new data, which
   is a wrong action to take. This can lead to another RTO which could
   have been avoided with the regular TCP sender that retransmits the
   unacknowledged segments after the RTO. The F-RTO sender can avoid the
   problem presented above, because it starts retransmitting in slow
   start after getting the second acknowledgement not advancing the
   window at step 9.


3.4.  Packet reordering

   Since F-RTO modifies the TCP sender behavior only after a
   retransmission timeout, we limit the discussion on the effects of
   packet reordering in F-RTO behavior to the cases where packet
   reordering occurs immediately after RTO. We consider the
   retransmission timeout due to packet reordering to be very rare case,
   since reordering often triggers fast retransmit due to duplicate ACKs


Expires: December 2002                                          [Page 8]

draft-sarolahti-tsvwg-tcp-frto-00.txt                          June 2002


   caused by out-of-order segments. Should packet reordering occur after
   a RTO, duplicate ACKs arrive to the sender, taking the F-RTO
   algorithm to retransmit in slow start as a regular RTO recovery would
   do. Although this might not be the correct action, it is similar to
   the behavior of the regular TCP, making F-RTO a safe modification
   also in the presence of reordering.


4.  Reverting the Congestion Control State

   The related work on concerning improving the TCP performance on
   spurious retransmission timeouts suggests that the congestion control
   state would be reverted to the situation preceding the spurious
   timeout. The justification for this is that since there are no
   packets actually lost, the congestion window should not be reduced
   either. The F-RTO algorithm itself does not provide capabilities for
   making the decision to revert the congestion control state, but it
   can be combined with TCP enhancements such as Eifel [LK00] or D-SACK
   [FMMP00] that can be used for reverting the congestion window. These
   enhancements, however, require the TCP timestamps or selective
   acknowledgements to be present.

   Reverting the congestion control state is not encouraged by the
   authors at the present, but it remains a research issue to study the
   performance impacts of such action. Although by increasing the
   congestion window and slow start threshold the available network
   bandwidth can be utilized more rapidly, reverting the congestion
   control state may cause additional congestion losses on a bottleneck
   link. However, the algorithms for reverting the congestion control
   state when SACK or TCP timestamps are available, are outlined briefly
   below.

   When TCP timestamps are enabled, the TCP sender should follow the
   retransmission algorithm given in Section 2. However, the TCP sender
   MAY revert the slow start threshold and cwnd to a value preceding the
   RTO, if the echoed TCP timestamp in the first and second
   acknowledgement following the RTO are earlier than the timestamp of
   the retransmission, and if the two next acknowledgements advance the
   window.

   When selective acknowledgements are available with the D-SACK
   enhancement, the TCP sender can employ the D-SACK enhancement for
   deciding when to revert the congestion control state. Again, the TCP
   sender should follow the retransmission algorithm given in Section 2.
   When RTO expires, the TCP sender stores the highest segment
   transmitted (send_high) and initializes the counter for
   retransmissions made due to RTO. The TCP sender counts the number of
   D-SACK blocks indicating false retransmissions received with the


Expires: December 2002                                          [Page 9]

draft-sarolahti-tsvwg-tcp-frto-00.txt                          June 2002


   acknowledgements up to send_high. If all retransmissions were
   indicated as duplicate segments with D-SACK blocks and the next two
   acknowledgements following the RTO advanced the window, the slow
   start threshold and cwnd MAY be reverted to the values preceding the
   RTO.

   It can be noticed that by using Eifel the TCP sender can revert the
   congestion control state immediately after the two acknowledgements
   following the RTO, whereas using D-SACK the TCP sender must wait for
   the whole window of data transmitted before RTO to arrive before it
   can make decision on undoing the congestion control state. In
   addition, when reverting the congestion control state, the sender
   should avoid injecting burst of segments into the network with some
   burst avoidance algorithm.


5.  Security Considerations

   No additional security threats on TCP due to the F-RTO algorithm are
   known.


References


   [ABF01]   M. Allman, H. Balakrishnan, and S. Floyd. Enhancing TCP's
             Loss Recovery Using Limited Transmit. RFC 3042, January
             2001.

   [APS99]   M. Allman, V. Paxson, and W. Stevens. TCP Congestion Con-
             trol. RFC 2581, April 1999.

   [BBJ92]   D. Borman, R. Braden, and V. Jacobson. TCP Extensions for
             High Performance. RFC 1323, May 1992.

   [FH99]    S. Floyd and T. Henderson. The NewReno Modification to
             TCP's Fast Recovery Algorithm. RFC 2582, April 1999.

   [FMMP00]  S. Floyd, J. Mahdavi, M. Mathis, and M. Podolsky. An Exten-
             sion to the Selective Acknowledgement (SACK) Option to TCP.
             RFC 2883, July 2000.

   [GL02]    A. Gurtov and R. Ludwig. Evaluating the Eifel Algorithm for
             TCP in a GPRS Network. In Proc. of European Wireless, Flo-
             rence, Italy, February 2002

   [LK00]    R. Ludwig and R.H. Katz. The Eifel Algorithm: Making TCP
             Robust Against Spurious Retransmissions. ACM Computer


Expires: December 2002                                         [Page 10]

draft-sarolahti-tsvwg-tcp-frto-00.txt                          June 2002


             Communication Review, 30(1), January 2000.

   [MMFR96]  M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow. TCP Selec-
             tive Acknowledgement Options. RFC 2018, October 1996.

   [PA00]    V. Paxson and M. Allman. Computing TCP's Retransmission
             Timer. RFC 2988, November 2000.

   [Pos81]   J. Postel. Transmission Control Protocol. RFC 793, Septem-
             ber 1981.

   [SKR02]   P. Sarolahti, M. Kojo, and K. Raatikainen. F-RTO: A New
             Recovery Algorithm for TCP Retransmission Timeouts. Univer-
             sity of Helsinki, Dept. of Computer Science. Series of Pub-
             lications C, No. C-2002-07. February 2002.


Authors' Addresses

   Pasi Sarolahti
   Nokia Research Center
   P.O. Box 407
   FIN-00045 NOKIA GROUP
   Finland

   Phone: +358 50 4876607
   EMail: pasi.sarolahti@nokia.com


   Markku Kojo
   University of Helsinki
   Department of Computer Science
   P.O. Box 26
   FIN-00014 UNIVERSITY OF HELSINKI
   Finland

   Phone: +358 9 1914 4179
   EMail: markku.kojo@cs.helsinki.fi


Expires: December 2002                                         [Page 11]