Internet Engineering Task Force                            A. Zimmermann
Internet-Draft                                              A. Hannemann
Intended status: Experimental                     RWTH Aachen University
Expires: February 27, 2010                               August 26, 2009


         Make TCP more Robust to Long Connectivity Disruptions
                      draft-zimmermann-tcp-lcd-02

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on February 27, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   Disruptions in end-to-end path connectivity which last longer than
   one retransmission timeout cause suboptimal TCP performance.  The
   reason for the performance degradation is that TCP interprets segment



Zimmermann & Hannemann  Expires February 27, 2010               [Page 1]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


   loss induced by connectivity disruptions as a sign of congestion,
   resulting in repeated backoffs of the retransmission timer.  This
   leads in turn to a deferred detection of the re-establishment of the
   connection since TCP waits until the next retransmission timeout
   occurs before attempting the retransmission.

   This document describes how standard ICMP messages can be exploited
   to disambiguate true congestion loss from non-congestion loss caused
   by long connectivity disruptions.  Moreover, a revert strategy of the
   retransmission timer is specified that enables a more prompt
   detection of whether the connectivity to a previously disconnected
   peer node has been restored or not.  The specified algorithm is a TCP
   sender-only modification that effectively improves TCP performance in
   presence of connectivity disruptions.


Table of Contents

   1.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   3.  Connectivity Disruption Indication . . . . . . . . . . . . . .  5
   4.  Connectivity Disruption Reaction . . . . . . . . . . . . . . .  6
     4.1.  Basic Idea . . . . . . . . . . . . . . . . . . . . . . . .  6
     4.2.  The Algorithm  . . . . . . . . . . . . . . . . . . . . . .  7
     4.3.  Discussion . . . . . . . . . . . . . . . . . . . . . . . .  9
     4.4.  Protecting Against Misbehaving Routers (the Safe
           Variant) . . . . . . . . . . . . . . . . . . . . . . . . . 11
   5.  Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 11
   6.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 13
   7.  Security Considerations  . . . . . . . . . . . . . . . . . . . 13
   8.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 13
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 13
     9.1.  Normative References . . . . . . . . . . . . . . . . . . . 13
     9.2.  Informative References . . . . . . . . . . . . . . . . . . 14
   Appendix A.  TODO list . . . . . . . . . . . . . . . . . . . . . . 16
   Appendix B.  Changes from previous versions of the draft . . . . . 16
     B.1.  Changes from draft-zimmermann-tcp-lcd-01 . . . . . . . . . 16
     B.2.  Changes from draft-zimmermann-tcp-lcd-00 . . . . . . . . . 16
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 17












Zimmermann & Hannemann  Expires February 27, 2010               [Page 2]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   As defined in [RFC0793], the term "acceptable acknowledgment (ACK)"
   refers to a TCP segment that acknowledges previously unacknowledged
   data.  The Transmission Control Protocol (TCP) sender state variable
   "SND.UNA" and the current segment variable "SEG.SEQ" are used as
   defined in [RFC0793].  SND.UNA holds the segment sequence number of
   earliest segment that has not been acknowledged by the TCP receiver
   (the oldest outstanding segment).  SEG.SEQ is the segment sequence
   number of a given segment.

   We use both the term "retransmission timer" and the term
   "retransmission timeout (RTO)" as defined in [RFC2988].


2.  Introduction

   Connectivity disruptions can occur in many different situations.  The
   frequency of the connectivity disruptions depends thereby on the
   property of the end-to-end path between the communicating hosts.
   While connectivity disruptions can occur in traditional wired
   networks too, e.g., simply due to an unplugged network cable, the
   likelihood of occurrence is significantly higher in wireless (multi-
   hop) networks.  Especially, end-host mobility, network topology
   changes and wireless interferences are crucial factors.  In the case
   of the Transmission Control Protocol (TCP) [RFC0793], the performance
   of the connection can exhibit a significant reduction compared to a
   permanently connected path [SESB05].  This is because TCP, which was
   originally designed to operate in fixed and wired networks, generally
   assumes that the end-to-end path connectivity is relatively stable
   over the connection's lifetime.

   According to Schuetz et. al.  [I-D.schuetz-tcpm-tcp-rlci]
   connectivity disruptions can be classified into two groups: "short"
   and "long" connectivity disruptions.  A connectivity disruption is
   short if connectivity returns before the retransmission timer fires
   for the first time.  In this case, TCP recovers lost data segments
   through Fast Retransmit and lost acknowledgments (ACK) through
   successfully delivered later ACKs.  Connectivity disruptions are
   declared as "long" for a given TCP connection, if the retransmission
   timer fires at least once before connectivity returns.  Whether or
   not path characteristics like the round trip time (RTT) or the
   available bandwidth have changed when the connectivity returns after
   a disruption is another important aspect for TCP's retransmission



Zimmermann & Hannemann  Expires February 27, 2010               [Page 3]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


   scheme [I-D.schuetz-tcpm-tcp-rlci].

   This document will focus on TCP's behavior in face of long
   connectivity disruptions in the time "before" connectivity is
   restored.  In particular this memo does not describe any additional
   modification to detect if the path characteristics remain unchanged
   in order to improve TCP's behavior "after" connectivity is restored.
   Therefore, TCP's congestion control mechanisms
   [I-D.ietf-tcpm-rfc2581bis] will be unchanged.

   When a long connectivity disruption occurs on a TCP connection, the
   TCP sender stops receiving acknowledgments.  After the retransmission
   timer expires, the TCP sender enters the timeout-based loss recovery
   and declares the oldest outstanding segment (SND.UNA) as lost.  Since
   TCP tightly couples reliability and congestion control, the
   retransmission of SND.UNA is triggered together with the reduction of
   sending rate, which is based on the assumption that loss is
   indication of congestion [I-D.ietf-tcpm-rfc2581bis].  As long as the
   connectivity disruption persists, TCP will repeat the procedure until
   the oldest outstanding segment is successfully acknowledged, or the
   connection times out.  TCP implementations that follow the
   recommended retransmission timeout (RTO) management of RFC 2988
   [RFC2988] double the RTO after each retransmission attempt.  However,
   the RTO growth may be bounded by an upper limit, the maximum RTO,
   which is at least 60s, but may be longer: Linux for example uses
   120s.  If the connectivity is restored between two retransmission
   attempts, TCP still has to wait until the retransmission timer
   expires before resuming transmission, since it simply does not have
   any means to know when the connectivity is re-established.
   Therefore, depending on when connectivity becomes available again,
   this can waste up to maximum RTO of possible transmission time.

   This retransmission behavior is not efficient, especially in
   scenarios or networks like wireless (multi-hop) networks where
   connectivity disruptions are frequent.  In the ideal case, TCP would
   attempt a retransmission as soon as connectivity to its peer is re-
   established.  This document describes how the standard Internet
   Control Message Protocol (ICMP) can be exploited to identify non-
   congestion loss caused by connectivity disruptions.  An revert
   strategy of the retransmission timer is specified that enables, due
   to higher-frequency retransmissions, a prompt detection of whether
   connectivity to a previously disconnected peer node has been
   restored.  The specified scheme is a TCP sender-only modification,
   i.e., neither intermediate routers nor the TCP receiver have to be
   modified.  Furthermore, in the case the network allows, i.e., no
   congestion is present, the proposed algorithm approaches the ideal
   behavior.




Zimmermann & Hannemann  Expires February 27, 2010               [Page 4]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


3.  Connectivity Disruption Indication

   As long as the queue of an intermediate router experiencing a link
   outage is deep enough, i.e., it can buffer all incoming packets, a
   connectivity disruption will only cause variation in delay which is
   handled well by contemporary TCP implementations with the help of
   Eifel [RFC3522] or forward RTO (F-RTO) [I-D.ietf-tcpm-rfc4138bis].
   However, if the link outage lasts too long, the router experiencing
   the link outage is forced to drop packets and finally to discard the
   according route.  Means to detect such link outages comprise reacting
   on failed address resolution protocol (ARP) [RFC0826] queries,
   unsuccessful link sensing, and the like.  However, this is solely in
   the responsibility of the respective router.

      Note: The focus of this memo is on introducing a method how ICMP
      messages may be exploited to improve TCP's performance; how
      different physical and link layer mechanisms underneath the
      network layer may trigger ICMP destination unreachable messages
      are out of scope of this memo.

   The removal of the route usually goes along with a notification to
   the corresponding TCP sender about the dropped packets via ICMP
   destination unreachable messages of code 0 (net unreachable) or code
   1 (host unreachable) [RFC1812].  Therefore, since ICMP destination
   unreachable messages of these codes provide evidence that packets
   were dropped due to a link outage, they can be used by a TCP as an
   indication for a connectivity disruption.

   Note that there are also other ICMP destination unreachable messages
   with different codes.  Some of them are candidates for connectivity
   disruption indications too, but need further investigation.  For
   example ICMP destination unreachable messages with code 5 (source
   route failed), code 11 (net unreachable for TOS), or code 12 (host
   unreachable for TOS) [RFC1812].  On the other side codes that flag
   hard errors are of no use for the proposed scheme, since TCP should
   abort the connection when those are received [RFC1122].  In the
   following, the term "ICMP unreachable message" is used as synonym for
   ICMP destination unreachable messages of code 0 or code 1.

   The accurate interpretation of ICMP unreachable messages as an
   connectivity disruption indication is complicated by the following
   two peculiarities of ICMP messages.  Firstly, they do not necessarily
   operate on the same timescale as the packets, i.e., in the given case
   TCP segments, which elicited them.  When a router drops a packet due
   to a missing route it will not necessarily send an ICMP unreachable
   message immediately, but rather queues it for later delivery.
   Secondly, ICMP messages are subject to rate limiting, e.g., when a
   router drops a whole window of data due to a link outage, it will



Zimmermann & Hannemann  Expires February 27, 2010               [Page 5]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


   hardly send as many ICMP unreachable messages as it dropped TCP
   segments.  Depending on the load of the router it may even send no
   ICMP unreachable messages at all.  Both peculiarities originate from
   [RFC1812].

   Fortunately, according to [RFC0792] ICMP unreachable messages are
   obliged to contain in their body the Internet Protocol (IP) header
   [RFC0791] of the datagram eliciting the ICMP unreachable messages
   plus the first 64 bits of the payload of that datagram.  Hence, in
   case of TCP both port numbers and the sequence number are included.
   This allows the originating TCP to identify the connection which an
   ICMP unreachable message is reporting an error about.  Moreover, it
   allows the originating TCP to identify which segment of the
   respective connection triggered the ICMP unreachable message,
   provided that there are not several segments in flight with the same
   sequence number.  This may very well be the case when TCP is
   recovering lost segments (see Section 4.3).

   A connectivity disruption indication in form of an ICMP unreachable
   message associated with a presumably lost TCP segment provides strong
   evidence that the segment was not dropped due to congestion but
   instead was successful delivered to the temporary end-point of the
   employed path, i.e., the reporting router.  It therefore did not
   witness any congestion at least on that very part of the path which
   was traveled by both, the TCP segment eliciting the ICMP unreachable
   message as well as the ICMP unreachable message itself.


4.  Connectivity Disruption Reaction

   In Section 4.1 the basic idea of the algorithm is given.  The
   complete algorithm is specified in Section 4.2.  In Section 4.3 the
   algorithm is discussed in detail.

4.1.  Basic Idea

   The goal of the algorithm is the prompt detection when the
   connectivity to a previously disconnected peer node has been restored
   after a long connectivity disruption while retaining appropriate
   behavior in case of congestion.  The proposed algorithm exploits
   standard ICMP unreachable messages to increase the TCP's
   retransmission frequency during timeout-based loss recovery by
   undoing one retransmission timer backoff whenever an ICMP unreachable
   message reports on a presumably lost retransmission.

   This approach has the advantage of appropriately reducing the probing
   rate in case of congestion.  If either the (re-)transmission itself,
   or the corresponding ICMP message is dropped the conventional backoff



Zimmermann & Hannemann  Expires February 27, 2010               [Page 6]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


   is performed and not undone, effectively halving the probing rate.

4.2.  The Algorithm

   A TCP sender using RFC 2988 [RFC2988] to compute TCP's retransmission
   timer MAY employ the following scheme to avoid over-conservative
   backoffs of the retransmission timer in case of long connectivity
   disruptions.  If a TCP sender does implement the scheme, the
   following steps MUST be taken, but only upon initiation of a timeout-
   based loss recovery, i.e., upon the first timeout of the oldest
   outstanding segment (SND.UNA).  The algorithm MUST NOT be re-
   initiated after a timeout-based loss recovery has already been
   started but not completed.  In particular, it must not be re-
   initiated upon subsequent timeouts for the same segment.

   A TCP sender that does not employ RFC 2988 [RFC2988] to compute TCP's
   retransmission timer SHOULD NOT use the scheme.  We envision that the
   scheme could be easily adapted to other algorithms than RFC 2988.
   However, we leave this as future work.

   The scheme specified in this document uses the "Backoff_cnt"
   variable, whose initial value is zero.  The variable is used to count
   the number of performed retransmission timer backoffs during one
   timeout-based loss recovery.  Moreover, the "RTO_base" variable is
   used to recover the previous RTO in case the retransmission timer
   backoff was unnecessary.  The variable is initialized with the RTO
   upon initiation of timeout-based loss recovery.

   (1)  Before the variable RTO gets updated when timeout-based loss
        recovery is initiated, set the variable "Backoff_cnt" and the
        variable "RTO_base" as follows:

           Backoff_cnt := 0;
           RTO_base := RTO.

        Proceed to step (R).

   (R)  This is a placeholder for the behavior that a standard TCP must
        execute at this point in case the retransmission timer is
        expired.  In particular if RFC 2988 [RFC2988] is used, steps
        (5.4) - (5.6) of that algorithm go here.  Proceed to step (2).

   (2)  If the retransmission timer was backed off in the previous step
        (R), then increment the variable "Backoff_cnt" by one to account
        for the new backoff

           Backoff_cnt := Backoff_cnt + 1.




Zimmermann & Hannemann  Expires February 27, 2010               [Page 7]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


   (3)  Wait either

           for the expiration of the retransmission timer.  When the
           retransmission timer expires, proceed to step (R);

           or for the arrival of an acceptable ACK.  When an acceptable
           ACK arrives, proceed to step (A);

           or for the arrival of an ICMP unreachable message.  When the
           ICMP unreachable message ICMP_DU arrives, proceed to step
           (4).

   (4)  If "Backoff_cnt > 0", i.e., an undoing of the last
        retransmission timer backoff is allowed, then

           proceed to step (5);

        else

           proceed to step (3).

   (5)  Extract the TCP segment header included in the ICMP destination
        unreachable message ICMP_DU

           SEG := Extract(ICMP_DU).

   (6)  If "SEG.SEQ == SND.UNA", i.e., the ICMP unreachable ICMP_DU
        message reports on the oldest outstanding segment, then undo the
        last retransmission timer backoff

           Backoff_cnt := Backoff_cnt - 1;
           RTO := RTO_base * 2^(Backoff_cnt).

   (7)  If the retransmission timer expires due to the undoing in the
        previous step (6), then

           proceed to step (R);

        else

           proceed to step (3).

   (A)  This is a placeholder for the standard TCP behavior that must be
        executed at this point in the case an acceptable ACK has
        arrived.  No further processing.

   When a TCP in steady-state detects a segment loss using the
   retransmission timer it enters the timeout-based loss recovery and



Zimmermann & Hannemann  Expires February 27, 2010               [Page 8]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


   initiates the algorithm (step 1).  It adjusts the slow start
   threshold (ssthresh), sets the congestion window (CWND) to one
   segment, back offs the retransmission timer and retransmits the first
   unacknowledged segment (step R) [I-D.ietf-tcpm-rfc2581bis] [RFC2988].

   In case the retransmission timer expires again (step 3a) a TCP will
   repeat the retransmission of the first unacknowledged segment and
   back off the retransmission timer once more (step R).  If a maximum
   value is placed on the RTO (rule 2.5 in [RFC2988]) and that maximum
   value is already reached the TCP will not backoff the retransmission
   timer in this step and thus "Backoff_cnt" MUST NOT be incremented.
   However, the "last step" to reach this maximum RTO is still
   considered as a backoff in the scope of this algorithm and
   "Backoff_cnt" MUST be incremented, even if the RTO is not strictly
   doubled.

   If the first received packet after the retransmission(s) is an
   acceptable ACK (step 3b), a TCP will proceed as normal, i.e., slow
   start the connection and terminate the algorithm (step A).  Later
   ICMP unreachable messages from the just terminated timeout-based loss
   recovery are of no use and therefore ignored since the ACK clock is
   already restarting due to the successful retransmission.

   On the other side if the first received packet after the
   retransmission(s) is an ICMP unreachable message (step 3c), a TCP
   SHOULD if allowed (step 4) undo one backoff for each ICMP unreachable
   message reporting an error on a retransmission.  To decide if an ICMP
   unreachable message reports on a retransmission, the sequence number
   therein is exploited (step 5, step 6).  The undo is done by re-
   calculating the RTO with the previously reduced "Backoff_cnt".  This
   calculation explicitly matches the exponential backoff specified in
   [RFC2988] (rule 5.5).

   Upon receipt of an ICMP unreachable message which legitimately undoes
   one backoff there is the possibility that this new started
   retransmission timer has expired already (step 7).  Then, a TCP
   SHOULD retransmit immediately, i.e., an ICMP message clocked
   retransmission.  In case the new started retransmission timer has not
   expired yet, TCP MUST wait accordingly.

4.3.  Discussion

   It is important to note that the proposed algorithm only reacts to
   connectivity disruption indications in form of ICMP destination
   unreachable messages during the phase of RTO induced loss recovery.
   That is, TCP's behavior is not altered when no ICMP unreachable
   messages are received, or the retransmission timer of the TCP sender
   did not yet expire since the last successfully received ACK.  Thereby



Zimmermann & Hannemann  Expires February 27, 2010               [Page 9]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


   the algorithm is by definition only triggered in the case of long
   connectivity disruptions.

   Only such ICMP unreachable messages which are reporting on the
   sequence number of the retransmission (SND.UNA) are evaluated by the
   proposed algorithm.  All other ICMP unreachable messages are ignored.
   If an ICMP unreachable message arrives for a retransmission it
   provides evidence that neither the retransmission nor the
   corresponding ICMP unreachable message itself did experience any
   congestion.  In other words, it has been proved that the
   retransmission was not lost due to congestion, but due to a
   connectivity disruption instead.

   One could argue, that if an ICMP unreachable message arrives for an
   RTO induced retransmission, the RTO should be reset, and the next
   retransmission sent out immediately similar to what is done when an
   ACK arrives after an RTO induced recovery phase.  This would allow
   for a much higher probing frequency based on the round trip time of
   the router where the connectivity is disrupted.  However, we consider
   our proposed scheme a good trade off between conservative behavior
   and a fast detection of connectivity re-establishment.

   Of course there is an ambiguity on which (re-)transmission an ICMP
   unreachable message reports.  However, for our purposes it is not
   considered to be problem, because the assumption that such an ICMP
   message provides evidence that one link loss was wrongly considered
   as a congestion loss, still holds.  There is also the option to make
   use of the timestamps option to obtain a more strict mapping between
   segments and ICMP messages (see Section 4.3).

   Besides the ambiguity if the first unacknowledged sequence number
   refers to the original transmission or to any of the retransmissions,
   there is another source of ambiguity about the sequence numbers
   contained in the ICMP unreachable messages.  For high bandwidth paths
   like modern gigabit links the sequence space may wrap rather quickly,
   thereby allowing the possibility that a late ICMP unreachable message
   reporting on an old error may coincidentally fit as input in the
   scheme explained above.  As a result, the scheme would wrongly undo
   one backoff.  Chances for this to happen are minuscule, since a
   particular ICMP message would need to contain the exact sequence
   number of SND.UNA, while at the same TCP is coincidentally in
   timeout-based loss recovery.  Moreover, as the scheme is tailored
   most conservatively no threat to the network from this issues may
   arise.

   Finally, the scheme explicitly does not call for a differentiation of
   ICMP unreachable messages originating from different routers, as the
   evidence of no congestion still holds even if the reporting router



Zimmermann & Hannemann  Expires February 27, 2010              [Page 10]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


   changed.

   Another exploitation of ICMP unreachable messages in the context of
   TCP congestion control might seem appropriate in case the ICMP
   unreachable message is received while TCP is in steady-state and the
   message refers to a segment from within the current window of data.
   As the RTT up to the router which generates the ICMP unreachable
   message is likely to be substantially shorter than the overall RTT to
   the destination, the ICMP unreachable message may very well reach the
   originating TCP while it is transmitting the current window of data.
   In case the remaining window is large, it might seem appropriate to
   refrain from transmitting the remaining window as there is timely
   evidence that it will only trigger further ICMP unreachable messages
   at the very router.  Although this might seem appropriate from a
   wastage perspective, it may be counterproductive from a security
   perspective since ICMP message are easy to spoof, thereby allowing an
   easy attack to the TCP by simply forging such ICMP messages.

   An additional consideration is the following: in the presence of
   multi-path routing even the receipt of a legitimate ICMP unreachable
   message cannot be exploited accurately because there is the option
   that only one of the multiple paths to the destination is suffering
   from a connectivity disruption which causes ICMP unreachable messages
   to be sent.  Then however, there is the possibility that the path
   along which the connectivity disruption occurred contributed
   considerably to the overall bandwidth, such that a congestion
   response is very well reasonable.  However, this is not necessarily
   the case.  Therefore, a TCP has no means except for its inherent
   congestion control to decide on this matter.  All in all, it seems
   that for a connection in steady-state, i.e., not in RTO induced
   recovery, reacting on ICMP unreachable messages in regard to
   congestion control is not appropriate.  For the case of RTO-based
   retransmissions, however, there is a reasonable congestion response,
   which is skipping further backoffs of the retransmission timer
   because there is no congestion indication - as described above.

4.4.  Protecting Against Misbehaving Routers (the Safe Variant)

   Given that the TCP Timestamps option [I-D.ietf-tcpm-1323bis] is
   enabled for a connection, a TCP sender MAY use the following
   algorithm to protect against misbehaving routers.


5.  Related Work

   In literature there are several methods that address TCP's problems
   in the presence of connectivity disruptions.  Some of them try to
   improve TCP's performance by modifying lower layers.  For example



Zimmermann & Hannemann  Expires February 27, 2010              [Page 11]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


   [SM03] introduces a "smart link layer" that buffers one segment for
   each ongoing connection and replaying these segments on connectivity
   re-establishment.  This approach has a serious drawback: previously
   stateless intermediate routers have to be modified in order to
   inspect TCP headers, to track the end-to-end connection and to
   provide additional buffer space.  These lead all in all to an
   additional need of memory and processing power.

   On the other hand stateless link layer schemes, like proposed in
   [RFC3819], which unconditionally buffer some small number of packets
   may have another problem: if a packet is buffered longer than the
   maximum segment lifetime (MSL) of 2 min [RFC0793], i.e., the
   disconnection lasts longer than MSL, TCP's assumption that such
   segments will never be received will no longer be true, violating
   TCP's semantics [I-D.eggert-tcpm-tcp-retransmit-now].

   Other approaches like TCP-F [CRVP01] or the Explicit Link Failure
   Notification (ELFN) [HV02] inform the TCP sender about a disrupted
   path by special messages generated from intermediate routers.  In
   case of a link failure they stop sending segments and freeze TCP's
   retransmission timers.  TCP-F stays in this state and remains silent
   until either a "route establishment notification" is received or an
   internal timer expires.  In contrast, ELFN periodically probes the
   network to detect connectivity re-establishment.  Both proposals rely
   on changes to intermediate routers, whereas the scheme proposed in
   this document is a sender-only modification.  Moreover, ELFN also
   does not consider congestion and may impose serious additional load
   on the network, depending on the probe interval.

   The authors of ATCP [LS01] propose enhancements to identify different
   types of packet loss by introducing a layer between TCP and IP.  They
   utilize ICMP destination unreachable messages to set TCP's receiver
   advertised window to zero and thus forcing the TCP sender to perform
   zero window probing with a exponential backoff.  ICMP destination
   unreachable messages, which arrive during this probing period, are
   ignored.  This approach is nearly orthogonal to this document, which
   exploits ICMP messages to undo a retransmission timer backoff when
   TCP is already probing.  In principle both mechanisms could be
   combined, however, due to security considerations it does not seem
   appropriate to adopt ATCP's reaction as discussed in Section 4.3.

   Schuetz et al. describe in [I-D.schuetz-tcpm-tcp-rlci] a set of TCP
   extensions that improve TCP's behavior when transmitting over paths
   whose characteristics can change on short time-scales.  Their
   proposed extensions modify the local behavior of TCP and introduce a
   new TCP option to signal locally received connectivity-change
   indications (CCIs) to remote peers.  Upon reception of a CCI, they
   re-probe the path characteristics either by performing a speculative



Zimmermann & Hannemann  Expires February 27, 2010              [Page 12]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


   retransmission or by sending a single segment of new data, depending
   on whether the connection is currently stalled in exponential backoff
   or transmitting in steady-state, respectively.  The authors focus on
   specifying TCP response mechanisms, nevertheless underlying layers
   would have to be modified to explicitly send CCIs to make these
   immediate responses possible.


6.  IANA Considerations

   This memo includes no request to IANA.


7.  Security Considerations

   The proposed algorithm is considered to be secure.  For example an
   attacker cannot make a TCP modified with proposed scheme flood the
   network just by sending forged ICMP unreachable messages to attempt
   to maliciously shorten the retransmission timer.  An attacker would
   need to guess the correct sequence number of the current
   retransmission, which seems very unlikely.  Even in case of an
   omniscient attacker, the impact on network load would be low, since
   the retransmission frequency is limited by the RTO which was computed
   before TCP has entered the timeout-based loss recovery.  (The highest
   probing frequency is expected to be even lower than once per minimum
   RTO, that is 1s as specified by [RFC2988].)


8.  Acknowledgments

   We would like to thank Timothy Shepard and Joe Touch for feedback on
   earlier versions of this draft.  We also thank Michael Faber, Daniel
   Schaffrath, and Damian Lukowski for implementing and testing the
   algorithm in Linux.  Special thanks go to Ilpo Jarvinen, who gave
   valuable feedback regarding the Linux implementation.

   This document was written with the xml2rfc tool described in
   [RFC2629].


9.  References

9.1.  Normative References

   [I-D.ietf-tcpm-1323bis]
              Borman, D., Braden, R., and V. Jacobson, "TCP Extensions
              for High Performance", draft-ietf-tcpm-1323bis-01 (work in
              progress), March 2009.



Zimmermann & Hannemann  Expires February 27, 2010              [Page 13]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


   [I-D.ietf-tcpm-rfc2581bis]
              Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", draft-ietf-tcpm-rfc2581bis-07 (work in
              progress), July 2009.

   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
              RFC 792, September 1981.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers",
              RFC 1812, June 1995.

   [RFC2988]  Paxson, V. and M. Allman, "Computing TCP's Retransmission
              Timer", RFC 2988, November 2000.

   [RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control
              Message Protocol (ICMPv6) for the Internet Protocol
              Version 6 (IPv6) Specification", RFC 4443, March 2006.

9.2.  Informative References

   [CRVP01]   Chandran, K., Raghunathan, S., Venkatesan, S., and R.
              Prakash, "A feedback-based scheme for improving TCP
              performance in ad hoc wireless networks", IEEE Personal
              Communications vol. 8, no. 1, pp. 34-39, February 2001.

   [HV02]     Holland, G. and N. Vaidya, "Analysis of TCP performance
              over mobile ad hoc networks", Wireless Networks vol. 8,
              no. 2-3, pp. 275-288, March 2002.

   [I-D.eggert-tcpm-tcp-retransmit-now]
              Eggert, L., "TCP Extensions for Immediate
              Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02
              (work in progress), June 2005.

   [I-D.ietf-tcpm-rfc4138bis]
              Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
              "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
              Spurious  Retransmission Timeouts with TCP",
              draft-ietf-tcpm-rfc4138bis-04 (work in progress),
              October 2008.

   [I-D.schuetz-tcpm-tcp-rlci]
              Schuetz, S., Koutsianas, N., Eggert, L., Eddy, W., Swami,
              Y., and K. Le, "TCP Response to Lower-Layer Connectivity-
              Change Indications", draft-schuetz-tcpm-tcp-rlci-03 (work



Zimmermann & Hannemann  Expires February 27, 2010              [Page 14]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


              in progress), February 2008.

   [LS01]     Liu, J. and S. Singh, "ATCP: TCP for mobile ad hoc
              networks", IEEE Journal on Selected Areas in
              Communications vol. 19, no. 7, pp. 1300-1315, 2001 July.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC0826]  Plummer, D., "Ethernet Address Resolution Protocol: Or
              converting network protocol addresses to 48.bit Ethernet
              address for transmission on Ethernet hardware", STD 37,
              RFC 826, November 1982.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              June 1999.

   [RFC3522]  Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
              for TCP", RFC 3522, April 2003.

   [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
              RFC 3819, July 2004.

   [RFC4884]  Bonica, R., Gan, D., Tappan, D., and C. Pignataro,
              "Extended ICMP to Support Multi-Part Messages", RFC 4884,
              April 2007.

   [SESB05]   Schuetz, S., Eggert, L., Schmid, S., and M. Brunner,
              "Protocol enhancements for intermittently connected
              hosts", SIGCOMM Computer Communication Review vol. 35, no.
              3, pp. 5-18, December 2005.

   [SM03]     Scott, J. and G. Mapp, "Link layer-based TCP optimisation
              for disconnecting networks", SIGCOMM Computer
              Communication Review vol. 33, no. 5, pp. 31-42,
              October 2003.







Zimmermann & Hannemann  Expires February 27, 2010              [Page 15]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


Appendix A.  TODO list

   o  Extend the Security Sections 4.4 and 7.

   o  Extend discussion in Section 4.3

      *  ICMPv6.  See [RFC4443] and [RFC4884].

      *  Explicit Congestion Notification (ECN).

      *  More about congestion in general.

   o  Mention the possible side-effect on TCP implementations that
      measure the thresholds R1 and R2 (Section 4.2.3.5 of [RFC1122]) as
      a count of retransmissions instead of time units.

   o  Discuss the influence of packet duplication on the algorithm
      (Thanks to Ilpo).


Appendix B.  Changes from previous versions of the draft

B.1.  Changes from draft-zimmermann-tcp-lcd-01

   o  The algorithm in Section 4.2 was slightly changed.  Instead of
      reverting the RTO by halving it, it is recalculated with help of
      the "Backoff_cnt" variable.  This fixes an issue that occurred
      when the retransmission timer was backed off but bounded by a
      maximum value.  The algorithm in the previous version of the
      draft, would have "reverted" to half of that maximum value,
      instead of using the value, before the RTO was doubled (and then
      bounded).

   o  Miscellaneous editorial changes.

   o  Extended the TODO list (Appendix A).

B.2.  Changes from draft-zimmermann-tcp-lcd-00

   o  Miscellaneous editorial changes in Section 1, 2 and 3.

   o  The document was restructured in Section 1, 2 and 3 for easier
      reading.  The motivation for the algorithm is changed according
      TCP's problem to disambiguate congestion from non-congestion loss.

   o  Added Section 4.1.





Zimmermann & Hannemann  Expires February 27, 2010              [Page 16]

Internet-Draft        Make TCP more Robust to LCDs           August 2009


   o  The algorithm in Section 4.2 was restructured and simplified:

      *  The special case of the first received ICMP destination
         unreachable message after an RTO was removed.

      *  The "Backoff_cnt" variable was introduced so it is no longer
         possible to perform more reverts than backoffs.

   o  The discussion in Section 4.3 was improved and expanded according
      to the algorithm changes.

   o  Added Section 4.4.


Authors' Addresses

   Alexander Zimmermann
   RWTH Aachen University
   Ahornstrasse 55
   Aachen,   52074
   Germany

   Phone: +49 241 80 21422
   Email: zimmermann@cs.rwth-aachen.de


   Arnd Hannemann
   RWTH Aachen University
   Ahornstrasse 55
   Aachen,   52074
   Germany

   Phone: +49 241 80 21423
   Email: hannemann@nets.rwth-aachen.de

















Zimmermann & Hannemann  Expires February 27, 2010              [Page 17]