Network Working Group                                         K. Nielsen
Internet-Draft                                                  Ericsson
Intended status: Experimental                          November 10, 2014
Expires: May 14, 2015


                  SCTP Tail Loss Recovery Enhancements
                  draft-nielsen-tsvwg-sctp-tlr-01.txt

Abstract

   Loss Recovery by means of T3-Retransmission has a significant
   detrimental impact on the delays experienced through an SCTP
   association.  The throughput achievable over an SCTP association is
   also negatively impacted by the occurrence of T3-Retransmissions.
   Loss Recovery by Fast Retransmission is in most situations superior
   to T3-Retransmission from a latency and a throughput perspective.
   The present SCTP Fast Recovery algorithms as specified by [RFC4960]
   are not able to recover losses adequately or in a timely manner in
   certain situations, and thus resort to loss recovery by lengthy
   T3-Retransmissions or by non-timely activation of Fast Recovery.
   This document proposes a number of enhancements to the SCTP Loss
   Recovery algorithms that aim to amend some of these deficiencies,
   with a particular focus on Loss Recovery for drops in Traffic Tails.
   The enhancements supplement the existing algorithms of [RFC4960] with
   proactive probing and timer driven activation of the Fast
   Retransmission algorithm, together with a number of enhancements of
   the Fast Retransmission algorithm itself.  The enhancements are
   proposed as supplements to the Loss Recovery algorithms of [RFC4960]
   and as such they do not deprecate or replace any of the mechanisms
   defined by [RFC4960].

   The solution proposed draws on prior art in the area of SCTP and TCP
   Loss Recovery improvements.  The mechanisms proposed include the
   adaptation to SCTP Fast Retransmission of certain improvements
   specified for TCP Fast Retransmission by [RFC6675], and the proposal
   embeds SCTP Early Retransmit [RFC5827] in a delayed variant.  The
   proposal draws heavily on the ideas put forward for TCP by
   [DUKKIPATI01] for proactive probing and timer driven entering of Fast
   Recovery.  The proposal embeds certain aspects of [HURTIG] where
   applicable.  The procedures proposed are sender-side only and do not
   impact the SCTP receiver.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.




Nielsen                   Expires May 14, 2015                  [Page 1]


Internet-Draft                  SCTP TLR                   November 2014


   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 14, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  SCTP TLR Function . . . . . . . . . . . . . . . . . . . .   4
     1.2.  TCP applicability . . . . . . . . . . . . . . . . . . . .   6
     1.3.  Packet Re-ordering  . . . . . . . . . . . . . . . . . . .   6
     1.4.  Congestion Control  . . . . . . . . . . . . . . . . . . .   6
   2.  Conventions and Terminology . . . . . . . . . . . . . . . . .   7
   3.  Description of Algorithms . . . . . . . . . . . . . . . . . .   7
     3.1.  SCTP Scoreboard and Mis Indication Counting Enhancements    7
       3.1.1.  Highest TSN Newly Acknowledged Extension  . . . . . .   7
     3.2.  RFC6675 nextseg() Tail Loss Enhancements for SCTP FR  . .   9
     3.3.  SCTP-TLR Description  . . . . . . . . . . . . . . . . . .  10
       3.3.1.  Principles  . . . . . . . . . . . . . . . . . . . . .  11
       3.3.2.  SCTP - TLR Statemachine . . . . . . . . . . . . . . .  12
       3.3.3.  TLPP Transmission Rules . . . . . . . . . . . . . . .  15
       3.3.4.  TLPP Recovered Losses . . . . . . . . . . . . . . . .  15
     3.4.  SCTP MH Considerations  . . . . . . . . . . . . . . . . .  16
   4.  Evaluation of function  . . . . . . . . . . . . . . . . . . .  16
   5.  Socket API Considerations . . . . . . . . . . . . . . . . . .  16
   6.  Security Considerations . . . . . . . . . . . . . . . . . . .  17
   7.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  17
   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  17
   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  17
     9.1.  Normative References  . . . . . . . . . . . . . . . . . .  17
     9.2.  Informative References  . . . . . . . . . . . . . . . . .  17
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  18

1.  Introduction

   Loss Recovery by means of T3-Retransmission has a significant impact
   on the delays experienced through, as well as on the throughput
   achievable over, an SCTP association.  Loss Recovery by Fast
   Retransmission (FR) is in most situations superior to
   T3-Retransmission from both a latency and a throughput perspective.

   The present SCTP Fast Retransmission algorithm, as specified by
   [RFC4960], is driven solely by a TSN reaching a dupthresh number of
   mis indication counts stemming from returned SACKs.  As such it is
   not able to recover losses adequately or in a timely manner in
   traffic tails, where a sufficient number of such SACKs may not be
   generated, and it therefore resorts to loss recovery by
   T3-Retransmissions or by "non-timely" activation of Fast Recovery.

   By drops in traffic tails we refer not only to "pure" tail drops,
   i.e., the drop of all packets at the end of the communication on an
   SCTP association from a certain point onwards, but more generally and
   specifically to the following situations:

   1.  Pure tail drops of the last SCTP packets of an SCTP association,
       or more generally drops of packets at the end of an SCTP
       association which are not followed by more than a dupthresh
       number of packets which are not dropped.  Drops of either type
       we will generally refer to as Tail Drops.

   2.  Tail Drops among packets sent at the end of bursts spaced by
       pauses of time equal to or greater than the T3-timeout
       (approximately).  It is noted that such bursts (pauses in between
       bursts) may result from application limitations, from congestion
       control limitations or from receiver side limitations.

   3.  Drops among packets sent so sparsely that each dropped packet
       constitutes a tail drop, in that a dupthresh number of packets
       would not be sent (would not be available for sending) prior to
       expiry of the T3-timeout.

   It shall be noted that while the above traffic drop criteria describe
   drops among the forward data packets only, drops among forward data
   packets combined with drops of the returned SACKs may together
   result in an insufficient number of SACKs being returned to the
   traffic sender for the Fast Retransmission algorithm to be activated
   prior to the T3-timeout occurring.  The tail traffic situations for
   which SCTP FR is not able to recover the losses are thus in general
   broader than the exact situations listed above.  The improvements
   proposed include an enhancement of SCTP to deduce the mis indication
   counts from an enhanced SACK scoreboard, thus removing some of the
   vulnerability of the present SCTP mis indication counting to loss of
   SACKs.

   It is noted that the Early Retransmit algorithm, [RFC5827], addresses
   activation of Fast Recovery for a particular subset of the above tail
   drop situations.  The solution proposed embeds (as a special case)
   the Early Retransmit algorithm in the delayed variant experimented
   with for TCP in [DUKKIPATI02], in which Early Retransmission is only
   activated provided a certain time has elapsed since the lowest
   outstanding TSN was transmitted.  The delay adds robustness towards
   spurious retransmissions caused by "mild" packet re-ordering as
   documented for TCP in [DUKKIPATI02].

1.1.  SCTP TLR Function

   The function proposed for enhancement of the SCTP Loss Recovery
   operation for Traffic Tail Losses is divided in two parts:

   o  Enhancements of the SCTP Fast Retransmission (SCTP FR) algorithm
      by means of the introduction of SCTP FR equivalents of the
      following Tail Loss Recovery improving functions inspired by or
      specified by [RFC6675] for TCP.

      *  Counting mis indications for a missing (non-SACK'ed) TSN based
         on augmented SACK scoreboard information in which the mis
         indications will be based on the number of SACK'ed SCTP packets
         carrying data chunks of higher TSNs.  The mechanism is
         specified both in terms of packets, the book-keeping of which
         requires new logic, and in terms of a less implementation
         demanding byte based variant following the Islost() approach
         of [RFC6675].  We shall refer to this as Extended Mis
         Indication Counting.

      *  The "last resort" retransmission operations, Nextseg 3) and
         Nextseg 4), of [RFC6675], supporting conditional proactive
         fast retransmissions of missing TSNs within the Fast Recovery
         Exit Point but not yet classified as lost.

   o  A new SCTP Tail Loss Recovery state machine with proactive timer
      driven activation of (the enhanced) Fast Recovery operation
      whenever network responsiveness (SACKs of packets) has been proven
      within a certain time, shorter than the T3 timeout, from the
      transmittal of the lowest outstanding TSN.  The SCTP TLR mechanism
      implements a new timer, the Tail Loss Probe timer (PTO), and it
      works in part by:

      *  forcing entering of Fast Recovery when network responsiveness
         has been proven and the PTO timer has expired, but additional
         traffic sent (SACKs of additional traffic sent) has not served
         to activate Fast Recovery based on the (extended) mis
         indication counting.

      *  probing, by transmittal of a TLR probe packet, for network
         responsiveness, when no other information is available at
         expiry of the PTO timer (no SACKs have been received for any
         packets in the traffic tail).

      *  allowing for T3-retransmission Loss Recovery only when the
         network remains unresponsive (no SACK received for any traffic
         in the tail nor for the probe packet).

   It is noted that depending on the exact situation (e.g., drop
   pattern, congestion window and amount of data in flight)
   T3-retransmission procedures need not be inferior to Fast
   Retransmission procedures.  Rather, in some situations
   T3-retransmission will indeed be superior, as T3-retransmission
   allows for ramp up of the congestion window during the recovery
   process and as it, by its nature of declaring all outstanding data as
   lost, never risks being blocked by congestion window limitations.
   The changes proposed in this document focus on improving the Loss
   Recovery operation of SCTP by enforcing timely activation of
   (improved) Fast Retransmission algorithms.  With the purpose of
   reducing the latency of the TCP and SCTP Loss Recovery operation,
   [HURTIG] has taken the alternative approach of accelerating the
   activation of T3-retransmission processes when Fast Recovery is not
   able to kick in to recover the loss.  [HURTIG] only addresses a
   subset of the Tail Loss scenarios in scope of the work presented
   here.  The ideas of [HURTIG] for accurate RTO restart are drawn on in
   the solution proposed here for accurate restart of the new tail loss
   probe timer (PTO-timer) as well as for accurate setting of the
   T3-timer under certain conditions, thus harvesting some of the same
   latency optimizations as [HURTIG].

   OPEN ISSUE: It is to be determined if [HURTIG], or plain
   T3-retransmission of [RFC4960], are preferable to the solution
   proposed here in certain situations.  Speculated situations include
   situations where the Fast Retransmission algorithm (when activated
   via the new proactive approach) is blocked by congestion control
   (CC) limitations.  If the issue is significant, the remedy may be to
   look for special purpose amendments, such as amending the CC
   operation during SCTP FR or redesigning the solution to promote
   proactive T3-retransmission operation rather than Fast Retransmission
   in certain situations.  Yet another remedy may be to generally look
   to improve the CC operation of SCTP.

   The SCTP TLR procedures proposed apply as add-on supplements to any
   SCTP implementation based on [RFC4960].  The procedures are sender-
   side only and do not impact the SCTP receiver.

1.2.  TCP applicability

   SCTP Loss Recovery operation in its core is based on the design of
   Loss Recovery for TCP with SACK enabled.  The enhancements of SCTP
   Tail Loss Recovery proposed here are readily applicable to TCP.

   It is noted that while the SCTP TLR algorithms and the SCTP TLR state
   machine defined here are inspired by the timer driven tail loss probe
   approach specified in [DUKKIPATI01] for TCP, the solution defined
   here differs in the approach taken.  The approach here is a clean
   slate approach defining a new comprehensive SCTP TLR state machine on
   top of (in addition to) the existing Fast Recovery and T3-Recovery
   states, covering all tail loss patterns, whereas the approach of
   [DUKKIPATI01] relies on a number of experimental mechanisms
   ([DUKKIPATI02], [MATHIS], [RFC5827]) defined for TCP in the IETF or
   in research, with ad hoc extensions to support selected tail loss
   patterns by addition of the tail loss probe mechanism and the
   activation of those mechanisms driven by it.

1.3.  Packet Re-ordering

   The solution proposed is an enhancement of the existing mis
   indication counting based Fast Recovery operation of SCTP, [RFC4960],
   and as such the solution inherits the fundamental vulnerability to
   packet re-ordering that the SCTP Fast Recovery algorithm of [RFC4960]
   embeds.

   The solution does not increase the vulnerability of Loss Recovery to
   packet-reordering as demonstrated by (to be filled in).

1.4.  Congestion Control

   It shall be noted that, by its very nature of prompting activation of
   Fast Recovery instead of T3-Recovery, the benefit of the solution
   proposed versus the existing solution of [RFC4960] will depend on the
   CC operation not only during the recovery process but also after exit
   of the recovery process.  In this context it is noted that the prior
   approach taken for TCP, [DUKKIPATI01], has been documented for a TCP
   implementation running CUBIC, whereas SCTP runs a CC algorithm more
   similar to TCP Reno CC as defined by [RFC5681].

   The solution at present is defined within the constraints of the
   existing Congestion Control principles of SCTP as defined by
   [RFC4960].  It is anticipated that Congestion Control improvements
   are desirable for SCTP in general as well as for the functions
   defined here in particular.

2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Description of Algorithms

3.1.  SCTP Scoreboard and Mis Indication Counting Enhancements

3.1.1.  Highest TSN Newly Acknowledged Extension

   Entering of Fast Recovery in SCTP, as specified by [RFC4960], is
   driven by mis indication counts.  When a TSN has received dupthresh=3
   mis indication counts, the TSN is declared lost and will be eligible
   for fast retransmission via the Fast Recovery procedure.

   Mis indication counts are in RFC4960 SCTP driven entirely by receipt
   of SACKs in accordance with the Highest TSN Newly Acknowledged
   algorithm (section 7.2.4 of [RFC4960]):

      Highest TSN Newly Acknowledged (HTNA): For each incoming SACK,
      miss indications are incremented only for missing TSNs prior to
      the highest TSN newly acknowledged in the SACK.  A newly
      acknowledged DATA chunk is one not previously acknowledged in a
      SACK.
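
   As a reading aid, the HTNA rule quoted above can be sketched as
   follows.  This is a minimal Python sketch and not text from
   [RFC4960]: the function name, the dictionary based scoreboard and
   the omission of TSN wrap-around handling are assumptions made for
   illustration only.

```python
DUPTHRESH = 3  # dupthresh value assumed for Fast Retransmit


def apply_htna(miss_counts, outstanding, newly_acked):
    """Apply the HTNA rule for one incoming SACK (illustrative only).

    miss_counts: dict TSN -> current mis indication count (mutated)
    outstanding: set of TSNs still unacknowledged after this SACK
    newly_acked: set of TSNs acknowledged for the first time by this SACK
    Returns the sorted TSNs whose count has reached DUPTHRESH.
    TSN wrap-around is ignored in this sketch.
    """
    if not newly_acked:
        return []
    htna = max(newly_acked)  # highest TSN newly acknowledged
    for tsn in outstanding:
        if tsn < htna:
            # One increment per SACK, however many TSNs it newly acks.
            miss_counts[tsn] = miss_counts.get(tsn, 0) + 1
    return sorted(t for t in outstanding if miss_counts.get(t, 0) >= DUPTHRESH)
```

   With three separate SACKs newly acknowledging TSNs 11, 12 and 13,
   TSN 10 reaches dupthresh on the third SACK.  If the first two SACKs
   are lost and a single SACK newly acknowledges all three TSNs, TSN 10
   receives only one mis indication under this rule.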

   An evident issue with the HTNA algorithm is that it is vulnerable to
   loss of SACKs.  In many situations loss of SACKs will result only in
   a slightly delayed entering of Fast Recovery for a dropped TSN, but
   generally, by relying on the HTNA algorithm only, loss of SACKs will
   further broaden the traffic tail situations where Fast Recovery
   either is not activated in a timely manner or is not activated at all
   due to the receipt of an insufficient number of SACKs.

   In order to make SCTP Fast Recovery more robust towards drops of
   SACKs, we describe the following extension of the HTNA algorithm to
   be supported by an SCTP implementation:

      Newly Acked Packets ahead-of-line (NAPahol): For each incoming
      SACK, miss indications are incremented only for missing TSNs prior
      to the highest TSN newly acknowledged in the SACK.  A newly
      acknowledged DATA chunk is one not previously acknowledged in a
      SACK.  For each missing TSN thus potentially eligible for
      additional mis indication counts, the number of mis indications to
      be given shall follow the number of newly acknowledged packets
      ahead of line of the packet of the missing TSN.

   The solution is robust towards split SACKs.  The solution requires
   the SCTP implementation to keep track of the relationship between
   chunks and packets.  One solution is for the SCTP implementation to
   maintain a monotonically incrementing packet sequence number to map
   chunks to packets and, for each outstanding chunk, to keep state of
   the packet id that the chunk was sent in as well as (incrementally
   updated) the packet ids of up to dupthresh-1 (=2) packets ahead of
   line for which chunks have been SACKed.
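
   One possible reading of the NAPahol book-keeping described above is
   sketched below in Python.  The class and method names (Scoreboard,
   send_packet, on_sack, mis_count) are hypothetical, TSN wrap-around
   is ignored, and the sketch keeps up to dupthresh packet ids per
   missing TSN rather than dupthresh-1 purely to keep the code short.

```python
DUPTHRESH = 3


class Scoreboard:
    """Illustrative NAPahol scoreboard; not a normative data structure."""

    def __init__(self):
        self.packet_of = {}   # TSN -> id of the packet that carried it
        self.acked = set()    # TSNs acknowledged so far
        self.ahead = {}       # missing TSN -> ids of SACKed packets ahead of line
        self.next_packet_id = 0

    def send_packet(self, tsns):
        pid = self.next_packet_id  # monotonically incrementing packet id
        self.next_packet_id += 1
        for tsn in tsns:
            self.packet_of[tsn] = pid
            self.ahead[tsn] = set()
        return pid

    def mis_count(self, tsn):
        return len(self.ahead.get(tsn, ()))

    def on_sack(self, acked_tsns):
        """Process one SACK; return the TSNs that have reached DUPTHRESH."""
        newly = {t for t in acked_tsns
                 if t in self.packet_of and t not in self.acked}
        if not newly:
            return []
        self.acked |= newly
        for tsn in newly:
            self.ahead.pop(tsn, None)  # no longer missing
        htna = max(newly)
        newly_pids = {self.packet_of[t] for t in newly}
        lost = []
        for missing in sorted(self.ahead):
            if missing > htna:
                continue  # only TSNs prior to the HTNA are eligible
            own = self.packet_of[missing]
            pids = self.ahead[missing]
            for pid in sorted(newly_pids):
                # Count each later packet once, capped at DUPTHRESH.
                if pid > own and len(pids) < DUPTHRESH:
                    pids.add(pid)
            if len(pids) >= DUPTHRESH:
                lost.append(missing)
        return lost
```

   Because a packet id is recorded at most once per missing TSN, a
   packet whose chunks are acknowledged across two SACKs still
   contributes a single mis indication, and three chunks bundled in one
   packet count as one packet ahead of line.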

   As an alternative to the above accurate packet counting, an SCTP
   implementation MAY instead support the following byte counting based
   extension of the RFC4960 HTNA algorithm:

      Highest Bytes Newly Acknowledged (HBNA): For each incoming SACK,
      miss indications are incremented only for missing TSNs prior to
      the highest TSN newly acknowledged in the SACK.  A newly
      acknowledged DATA chunk is one not previously acknowledged in a
      SACK.  For each missing TSN thus eligible for additional mis
      indication counts, the number of mis indications to be given shall
      follow the number of newly acknowledged bytes in the SACK ahead of
      line of the missing TSN in the following manner:

         Add-mis-indication-count(TSN) =
            mod_PMTU(Newly acked bytes ahead of line(TSN)) + 1

   For both solutions (NAPahol, HBNA) it is noted that an SCTP
   implementation only needs to keep count of the mis indications up to
   the dupthresh=3 threshold level.  Equally, an implementation need not
   track the exact number of packets ahead of line or the exact number
   of bytes ahead of line of a certain missing TSN once this number
   surpasses the dupthresh=3 threshold.
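
   The HBNA increment and the dupthresh cap can be illustrated with the
   short Python sketch below.  It assumes that mod_PMTU() is to be read
   as integer division by the PMTU, which is an interpretation and not
   something this document states; the function names are hypothetical.

```python
DUPTHRESH = 3


def hbna_increment(newly_acked_bytes_ahead, pmtu):
    """Additional mis indications for one missing TSN from one SACK.

    Assumption: mod_PMTU(x) is read as floor(x / PMTU).
    Returns 0 when the SACK newly acknowledges nothing ahead of line.
    """
    if newly_acked_bytes_ahead <= 0:
        return 0
    return newly_acked_bytes_ahead // pmtu + 1


def update_mis_count(current, newly_acked_bytes_ahead, pmtu):
    """Accumulate the increment, tracking no further than DUPTHRESH."""
    return min(DUPTHRESH, current + hbna_increment(newly_acked_bytes_ahead, pmtu))
```

   Under this reading and with a PMTU of 1500 bytes, 1000 newly
   acknowledged bytes ahead of line give one mis indication and 3000
   bytes give three, which is enough to declare the TSN lost from a
   single SACK.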

   The byte based approach follows the approach taken for TCP,
   Islost(), in [RFC6675].  It is noted, however, that due to the
   message based approach of SCTP, a byte based approach generally will
   be less accurate as a measure of the number of packets received
   ahead of line than it is for byte stream based TCP.

   OPEN ISSUE: Check alignment with the algorithms defined in [HURTIG].
   If relevant, align.

3.2.  RFC6675 nextseg() Tail Loss Enhancements for SCTP FR

   The Fast Recovery algorithm for TCP as specified in [RFC6675]
   differs in some respects from the fast retransmission algorithm
   specified for SCTP by [RFC4960].  Of particular significance for
   recovery of losses in traffic tail scenarios is the fact that the
   [RFC6675] algorithm, once Fast Recovery has been activated, takes two
   "last resort" retransmission measures, step 3) and step 4) of
   Nextseg() of [RFC6675], that facilitate the recovery of losses in
   situations where too few SACKs would be generated to complete the
   Fast Recovery process without resorting to T3-timeout.  For SCTP
   Fast Recovery we formulate the equivalent measures as follows:

   Last Resort Retransmission:  If the following conditions are met:

      *  there are no outstanding TSNs eligible for fast retransmission
         due to dupthresh or more mis indications

      *  there is no new data available for transmission

      then an outstanding TSN less than or equal to the Fast Recovery
      Exit Point, for which there exist SACKs of chunks ahead of line
      of the TSN, may be retransmitted provided the CWND allows.  The
      bytes of a TSN which is retransmitted in this manner are not
      subtracted from the flight size prior to this action being taken
      nor as a result of this action.  If the mis indication count of
      the TSN subsequently reaches the dupthresh value, the bytes of the
      TSN shall be subtracted from the flight size.  Once the TSN is
      acknowledged, the remaining contribution of this TSN to the flight
      size (whether it is counted once or twice at this point in time)
      is subtracted.  A TSN which is retransmitted in this manner will
      be marked as ineligible for a subsequent fast retransmit.
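
   The eligibility test for Last Resort Retransmission can be sketched
   as follows.  This Python sketch is illustrative only: the Chunk
   record, the function name, the choice of the lowest eligible TSN and
   the CWND check (flight size plus chunk size not exceeding CWND) are
   assumptions, since the text above only requires that "the CWND
   allows".

```python
from dataclasses import dataclass

DUPTHRESH = 3


@dataclass
class Chunk:
    tsn: int
    size: int
    acked: bool = False
    mis_count: int = 0
    retransmitted: bool = False  # already retransmitted in this recovery


def last_resort_candidate(chunks, exit_point, new_data_available,
                          cwnd, flight_size):
    """Return the chunk to retransmit as a last resort, or None."""
    if new_data_available:
        return None
    outstanding = [c for c in chunks if not c.acked]
    # Regular fast retransmission takes precedence when any TSN is lost.
    if any(c.mis_count >= DUPTHRESH and not c.retransmitted
           for c in outstanding):
        return None
    highest_sacked = max((c.tsn for c in chunks if c.acked), default=None)
    if highest_sacked is None:
        return None
    eligible = [c for c in outstanding
                if c.tsn <= exit_point       # within the Exit Point
                and c.tsn < highest_sacked   # SACKed chunks ahead of line
                and not c.retransmitted]
    if not eligible:
        return None
    candidate = min(eligible, key=lambda c: c.tsn)  # assumption: lowest first
    # Assumed CWND rule: the retransmission adds to the flight size.
    return candidate if flight_size + candidate.size <= cwnd else None
```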

   Rescue:  If all of the following conditions are met:

      *  there are no outstanding TSNs eligible for fast retransmission
         due to dupthresh or more mis indications

      *  there is no new data available for transmission

      *  there are no outstanding TSNs eligible for Last Resort
         retransmission

      *  the cumack has progressed since this entering of Fast Recovery

      and there exist non-SACKed, non fast retransmitted TSNs within
      the Fast Recovery Exit Point, then for this entry of Fast
      Recovery, provided the CWND allows, we allow for fast
      retransmission of one packet of consecutive outstanding non fast
      retransmitted TSNs up to PMTU size, the highest TSN of which MUST
      be the highest outstanding TSN within the Fast Recovery Exit
      Point.  The bytes of a TSN which is retransmitted in this manner
      are not subtracted from the flight size prior to this action
      being taken nor as a result of this action.  If the mis indication
      count of the TSN subsequently reaches the dupthresh value, the
      bytes of the TSN shall be subtracted from the flight size.  Once
      the TSN is acknowledged, the remaining contribution of this TSN to
      the flight size (whether it is counted once or twice at this point
      in time) is subtracted.  A TSN which is retransmitted in this
      manner will be marked as ineligible for a subsequent fast
      retransmit.

   An implementation of the Rescue operation may be accomplished by
   maintaining a RescueRTX parameter as described for TCP in [RFC6675].
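
   A possible way of assembling the single Rescue packet is sketched
   below in Python.  The names are hypothetical, chunk and packet
   header overhead is ignored, and "consecutive" is interpreted as
   adjacent in the list of outstanding TSNs; these are assumptions and
   not requirements of this document.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    tsn: int
    size: int
    acked: bool = False
    retransmitted: bool = False


def rescue_allowed(fr_eligible, new_data, last_resort_eligible,
                   cumack_advanced, rescue_used):
    """All Rescue preconditions, at most once per Fast Recovery entry."""
    return (cumack_advanced and not
            (fr_eligible or new_data or last_resort_eligible or rescue_used))


def build_rescue_packet(chunks, exit_point, pmtu):
    """Return the TSNs of the Rescue packet, or [] if none qualifies."""
    pending = sorted((c for c in chunks
                      if not c.acked and c.tsn <= exit_point),
                     key=lambda c: c.tsn, reverse=True)
    # The packet must end at the highest outstanding TSN, which must
    # itself not have been retransmitted already.
    if not pending or pending[0].retransmitted:
        return []
    picked, used = [], 0
    for c in pending:  # walk downwards from the highest outstanding TSN
        if c.retransmitted or used + c.size > pmtu:
            break
        picked.append(c.tsn)
        used += c.size
    return sorted(picked)
```

   The rescue_used flag plays the role of the RescueRTX state: it is
   assumed to be set when a Rescue packet is sent and cleared when Fast
   Recovery is entered anew.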

   DISCUSSION: In addition to the HTNA algorithm, [RFC4960] demands
   additional mis indication counting to be performed during Fast
   Recovery according to the following prescription (section 7.2.4 of
   [RFC4960]):

   (#)  If an endpoint is in Fast Recovery and a SACK arrives that
      advances the Cumulative TSN Ack Point, the miss indications are
      incremented for all TSNs reported missing in the SACK.

   It is noted that under special circumstances (#) makes SCTP Fast
   Recovery complete in situations where TCP Fast Recovery would only
   complete by virtue of measure 3) or 4) of [RFC6675], and as such
   these measures are more critically demanded for TCP Fast Recovery
   operation than for SCTP Fast Recovery operation.  However, as
   documented by (to be filled in), the Last Resort Retransmission
   operation and the Rescue operation significantly improve the Loss
   Recovery operation for SCTP as well, both in the latency of the
   individual loss recovery operation and in the ability of the
   operation to complete without resort to T3-timeout.  Consequently
   this document prescribes that Enhanced SCTP Tail Loss Recovery
   implement these procedures.

   As the algorithm extension is limited by the existing congestion
   control algorithm of SCTP, these extensions of SCTP Fast Recovery do
   not compromise the TCP fairness of the SCTP Fast Recovery operation.

3.3.  SCTP-TLR Description

3.3.1.  Principles

   The Tail Loss Recovery function for SCTP is based on the following
   principles:

   o  Maintain a Tail Loss Probe Timer (PTO) which, except when SCTP is
      in Fast Recovery or in T3-recovery, is running on the lowest
      outstanding TSN.  The PTO timer value used will depend on the
      situation:

         By default the following timer value is used:

              PTO1:  PTO=MIN(RTO, 1.5*SRTT+MAX(RTTVAR, DELAY_ACK))

         Whereas the following value is used:

              PTO2:  PTO=MIN(RTO, 1.5*SRTT+RTTVAR)

         when it is known that subsequent SACKs not acknowledging the
         TSN for which the PTO is running will be (or will have been)
         returned immediately.  For more details see Section 3.3.2.

         By design the probe timer is kept lower than or equal to the
         RTO, thereby postponing a potentially unnecessary and damaging
         RTO, as well as generally larger than an anticipated RTT,
         thereby preventing it from expiring prematurely.  I.e., the
         timer only expires at a time where one would have expected to
         have received a SACK were there no problems.

   o  PTO timer driven transmittal of a Tail Loss Probe Packet: When
      data is outstanding, the PTO timer expires on the lowest
      outstanding TSN and no SACKs of any chunks with a higher TSN have
      arrived, a probe packet, denoted a Tail Loss Probe Packet (TLPP),
      is sent to probe for network responsiveness (i.e., for a SACK of
      the TLPP) in order to potentially drive proactive entering of Fast
      Recovery.

      *  In this situation the PTO timer on the lowest outstanding TSN
         is cancelled and reset as a T3-timer with value
         MAX(PTO, RTO-PTO).

      *  The TLPP sent is chosen as the lowest unsent TSN if such exists
         and is available for transmittal or, if no such TSN is
         available, the presently outstanding packet with the highest
         TSN.  This is done in order to interface as well as possible
         with standard Fast Recovery, i.e., to create a loss pattern
         that corresponds as closely as possible to how the Fast
         Recovery algorithm retransmits lost packets.

   o  PTO timer driven entering of Fast Recovery: Fast Recovery is
      forced when network responsiveness is proven (a SACK of data sent
      later than the lowest outstanding TSN is available) and (at
      least) PTO time has elapsed since transmittal of the lowest
      outstanding TSN.
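
   The timer values and the decision taken at PTO expiry, as described
   by the principles above, can be sketched in Python as follows.  The
   function names and return values are hypothetical, and all times are
   assumed to be expressed in the same unit (milliseconds in the
   example).

```python
def pto1(rto, srtt, rttvar, delay_ack):
    """PTO1: default probe timeout."""
    return min(rto, 1.5 * srtt + max(rttvar, delay_ack))


def pto2(rto, srtt, rttvar):
    """PTO2: used when SACKs are known to be returned immediately."""
    return min(rto, 1.5 * srtt + rttvar)


def t3_after_probe(pto, rto):
    """T3 value set on the lowest outstanding TSN when a TLPP is sent."""
    return max(pto, rto - pto)


def on_pto_expiry(sack_seen_above_lowest):
    """Action at PTO expiry on the lowest outstanding TSN."""
    # A SACK of later sent data proves network responsiveness.
    return "ENTER_FAST_RECOVERY" if sack_seen_above_lowest else "SEND_TLPP"


def choose_tlpp(lowest_unsent_tsn, highest_outstanding_tsn):
    """Prefer new data for the probe; otherwise resend the highest TSN."""
    if lowest_unsent_tsn is not None:
        return lowest_unsent_tsn
    return highest_outstanding_tsn
```

   As an example, with RTO=1000, SRTT=100, RTTVAR=20 and DELAY_ACK=200
   (all in milliseconds), PTO1 evaluates to 350 and PTO2 to 170.  If a
   TLPP is sent at expiry of a PTO of 350, the T3 timer is set to
   MAX(350, 650) = 650, so that the T3 timer expires one RTO after the
   transmittal of the lowest outstanding TSN.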

3.3.2.  SCTP - TLR Statemachine

   In addition to the Fast Recovery state and the T3-Recovery state the
   SCTP Tail Loss Recovery function defines 3 states: The SCTP TLR OPEN
   state, the SCTP TLR PROBE WAIT state and the SCTP TLR DELAY WAIT
   state.  At any given time the SCTP transmission logic will be in one
   of these 5 states.

   Figure 1 illustrates the states and the state transitions.

   (to be inserted)



          Figure 1, Enhanced Loss Recovery State Machine Diagram

   In the following we describe the states and the actions taken.

3.3.2.1.  SCTP TLR OPEN STATE

   In this state SCTP is performing neither Fast Recovery nor
   T3-recovery.  This is the state entered when SCTP sends the first
   data after idle.  In this state SCTP has outstanding data, a PTO
   timer is running on the lowest outstanding TSN and the SACK
   scoreboard has no gaps, i.e., the highest SACK'ed TSN is cumulatively
   acked.

   The PTO set on a new lowest outstanding TSN in this state will follow
   [PTO1] when fewer than 2 packets are outstanding at the time when the
   timer is set and follow [PTO2] when 2 or more packets are outstanding
   when the PTO timer is set.

   In this state the following may happen:

   o  A SACK acknowledging a TSN higher than the lowest outstanding TSN
      may arrive, thus proving network responsiveness while still not
      acknowledging the lowest outstanding TSN.  This indicates that
      either packets are being re-ordered or the lowest outstanding TSN
      has been lost.  The state then transitions to the SCTP TLR DELAY
      WAIT state, from which SCTP TLR driven Fast Recovery is entered
      if the PTO timer expires before the lowest outstanding TSN has
      been acknowledged.

   o  The PTO timer on the lowest outstanding TSN may expire, in which
      case SCTP sends a TLPP, resets the PTO timer on the lowest
      outstanding TSN to a T3 timer of value Max(PTO, RTO-PTO) and
      transitions to the SCTP TLR PROBE WAIT state.  There it awaits
      either expiry of the T3 timer on the lowest outstanding TSN (the
      network is persistently unresponsive) or proof of network
      responsiveness.  Such proof may lead to SCTP TLR driven Fast
      Recovery, unless it comes in the form of a cumulative
      acknowledgement of the lowest outstanding TSN.

3.3.2.2.  SCTP TLR PROBE WAIT STATE

   In this state the lowest outstanding TSN has remained unSACK'ed for
   more than PTO time and no indication of network responsiveness has
   been received (no SACK of higher outstanding TSNs), which has
   resulted in the transmittal of a TLPP to probe for network
   responsiveness.

   The Max(PTO, RTO-PTO) T3 value set on the lowest outstanding TSN when
   sending the TLPP and entering this state shall be Max(PTO1,
   (RTO-PTO)_previous), where (RTO-PTO)_previous is the value of RTO-PTO
   at the time the PTO timer was previously set on the lowest
   outstanding TSN.
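
   As a worked illustration of the T3 rule above, the following Python
   sketch computes the value armed when the TLPP is sent.  The function
   and parameter names are hypothetical and not defined by this
   document, and PTO1 is simply passed in as a number of milliseconds.

```python
def probe_wait_t3(pto1_ms: int, rto_prev_ms: int, pto_prev_ms: int) -> int:
    """Return the T3 value (ms) armed when the TLPP is sent.

    rto_prev_ms and pto_prev_ms are the RTO and PTO that were in effect
    when the PTO timer was previously set on the lowest outstanding TSN.
    """
    return max(pto1_ms, rto_prev_ms - pto_prev_ms)


# Hypothetical numbers, chosen only for illustration.
print(probe_wait_t3(pto1_ms=200, rto_prev_ms=1000, pto_prev_ms=300))  # 700
print(probe_wait_t3(pto1_ms=200, rto_prev_ms=400, pto_prev_ms=300))   # 200
```

   In the second call RTO-PTO is smaller than PTO1, so PTO1 acts as
   the floor for the T3 value.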

   In this state the following may happen:

   o  A SACK cumulatively acknowledging all holes including the lowest
      outstanding TSN brings the state machine back to the SCTP TLR
      OPEN state, and the PTO timer is restarted on the new lowest
      outstanding TSN.

   o  A SACK arrives for a higher outstanding TSN with the lowest
      outstanding TSN remaining unSACK'ed.  This results in declaration
      of the lowest outstanding TSN as lost and makes SCTP enter Fast
      Recovery with the exit point set to the highest outstanding TSN
      as normal.

   o  A SACK arrives that acknowledges the lowest outstanding TSN, and
      the PTO timer is reset on the new lowest outstanding TSN, but the
      SACK also acknowledges data of higher TSN than the new lowest
      outstanding TSN.  In this case there is indication that either
      packet re-ordering has occurred or the new lowest outstanding TSN
      has been lost.  The state then transitions to the SCTP TLR DELAY
      WAIT state, from which SCTP TLR driven Fast Recovery is entered
      if the PTO timer expires before the new lowest outstanding TSN
      has been acknowledged.

3.3.2.3.  SCTP TLR DELAY WAIT STATE

   In this state proof of network responsiveness has been received (in
   the form of a SACK of a higher TSN than the lowest outstanding TSN)
   and the PTO timer on the lowest outstanding TSN is running for
   potential entering of SCTP TLR driven Fast Recovery.

   The PTO set on a new lowest outstanding TSN in this state will be
   [PTO2].

   In this state the following may happen:

   o  The PTO timer on the lowest outstanding TSN expires.  This
      results in declaration of the lowest outstanding TSN as lost and
      makes SCTP enter Fast Recovery with the exit point set to the
      highest outstanding TSN as normal.

   o  A SACK cumulatively acknowledging all holes including the lowest
      outstanding TSN brings the state machine back to the SCTP TLR
      OPEN state, and the PTO timer is restarted on the new lowest
      outstanding TSN.

   o  A SACK arrives that acknowledges the lowest outstanding TSN, and
      the PTO timer is reset on the new lowest outstanding TSN, but the
      SACK also acknowledges data of higher TSN than the new lowest
      outstanding TSN.  In this case there is indication that either
      packet re-ordering has occurred or the new lowest outstanding TSN
      has been lost.  The state remains SCTP TLR DELAY WAIT, from which
      SCTP TLR driven Fast Recovery is entered if the PTO timer expires
      before the new lowest outstanding TSN has been acknowledged.

   o  A SACK arrives that does not acknowledge the lowest outstanding
      TSN.  In this situation no changes are made to the running PTO
      timer and the state remains SCTP TLR DELAY WAIT, from which SCTP
      TLR driven Fast Recovery is entered if the PTO timer expires
      before the lowest outstanding TSN has been acknowledged.
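
   The transitions described in Sections 3.3.2.1 to 3.3.2.3 can be
   summarised as a lookup table.  The Python sketch below is one
   possible reading of the text, not a normative definition: the enum
   and event names are hypothetical, and the transition from probe
   wait to T3-Recovery on T3 expiry is an assumption taken from normal
   [RFC4960] behaviour rather than stated explicitly above.

```python
from enum import Enum, auto


class State(Enum):
    OPEN = auto()
    PROBE_WAIT = auto()
    DELAY_WAIT = auto()
    FAST_RECOVERY = auto()
    T3_RECOVERY = auto()


class Event(Enum):
    # SACK cumulatively acknowledges all holes, lowest outstanding TSN included.
    SACK_ALL_ACKED = auto()
    # SACK of a higher TSN; the lowest outstanding TSN stays unacknowledged.
    SACK_HIGHER_ONLY = auto()
    # SACK acknowledges the lowest outstanding TSN and also data above the
    # new lowest outstanding TSN, so a gap remains.
    SACK_LOWEST_WITH_GAP = auto()
    PTO_EXPIRED = auto()
    T3_EXPIRED = auto()


TRANSITIONS = {
    (State.OPEN, Event.SACK_HIGHER_ONLY): State.DELAY_WAIT,
    (State.OPEN, Event.PTO_EXPIRED): State.PROBE_WAIT,  # TLPP sent, T3 armed
    (State.PROBE_WAIT, Event.SACK_ALL_ACKED): State.OPEN,
    (State.PROBE_WAIT, Event.SACK_HIGHER_ONLY): State.FAST_RECOVERY,
    (State.PROBE_WAIT, Event.SACK_LOWEST_WITH_GAP): State.DELAY_WAIT,
    # Assumption of this sketch: T3 expiry leads to T3-Recovery.
    (State.PROBE_WAIT, Event.T3_EXPIRED): State.T3_RECOVERY,
    (State.DELAY_WAIT, Event.PTO_EXPIRED): State.FAST_RECOVERY,
    (State.DELAY_WAIT, Event.SACK_ALL_ACKED): State.OPEN,
    (State.DELAY_WAIT, Event.SACK_LOWEST_WITH_GAP): State.DELAY_WAIT,
    (State.DELAY_WAIT, Event.SACK_HIGHER_ONLY): State.DELAY_WAIT,
}


def next_state(state: State, event: Event) -> State:
    """Look up the next state; pairs the text does not describe raise."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition described for {state}, {event}") from None
```

   Timer handling (restarting the PTO timer, arming T3) is left out of
   the table on purpose; only the state changes are modelled.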

3.3.2.4.  Exit of Loss Recovery

   After exit of Fast Recovery or T3-Recovery, if data is outstanding, a
   PTO timer is started on the lowest outstanding TSN and the state
   transitions to either the SCTP TLR OPEN state or the SCTP TLR DELAY
   WAIT state depending on the status of the SACK scoreboard (i.e.,
   whether gaps exist or not).  The PTO timer is set according to the
   rules described above.
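
   A minimal Python sketch of this exit rule, together with the PTO
   selection rules of the open and delay wait states, is given below.
   It assumes that a scoreboard with gaps maps to the delay wait state
   and one without gaps to the open state, which follows from the
   state definitions but is not spelled out here.  All names are
   hypothetical, and [PTO1] and [PTO2] are represented only by labels.

```python
from typing import Optional


def state_after_recovery(data_outstanding: bool,
                         scoreboard_has_gaps: bool) -> Optional[str]:
    """Pick the TLR state entered on exit of Fast Recovery or T3-Recovery."""
    if not data_outstanding:
        # No data outstanding: no PTO timer is started.  The text does not
        # name a state for this case, so the sketch returns None.
        return None
    return "DELAY_WAIT" if scoreboard_has_gaps else "OPEN"


def pto_rule(state: str, packets_outstanding: int) -> str:
    """Select which PTO formula applies when the PTO timer is set."""
    if state == "DELAY_WAIT":
        return "PTO2"
    # OPEN state: PTO1 for fewer than 2 outstanding packets, else PTO2.
    return "PTO1" if packets_outstanding < 2 else "PTO2"
```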

3.3.3.  TLPP Transmission Rules

   The transmission of a Tail Loss Probe Packet (TLPP), done when
   entering the SCTP TLR PROBE WAIT state, is governed by the following
   rules:

   o  A TLPP of new data is always preferred if new data is available.

   o  A TLPP of new data is a full-sized packet.

   o  A TLPP of retransmission data is one TSN chunk.  A TLPP of
      retransmission data counts twice towards the flightsize until
      acknowledged.

   The motivation for sending a TLPP of retransmission data in the form
   of one chunk only is that detecting losses masked by the TLPP (see
   Section 3.3.4) is simpler when only one TSN has been used as a
   probe.

   TLPP Transmission conditions:

   o  If no TLPP is outstanding, a probe is sent regardless of CWND.

   o  If a TLPP is outstanding, a probe is sent only if flightsize <
      CWND + 1 PMTU; otherwise no TLPP is sent.

   o  If no new data exists, a probe of retransmission data is sent
      depending on whether a TLPP is already outstanding, as follows:

      *  If no TLPP is outstanding, send a TLPP consisting of the
         highest outstanding TSN.

      *  If a TLPP is outstanding, it may be resent if and only if the
         probe is the highest outstanding TSN.  Otherwise no TLPP is
         sent.

   The above rules are defined to support detection of TLPP recovered
   losses by the algorithm described in Section 3.3.4.
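
   The transmission conditions can be sketched as a single decision
   function.  The Python below is illustrative only; the function
   name, its parameters and the string results are hypothetical.  It
   also assumes that the CWND condition is checked before the choice
   between new and retransmission data, an ordering the rules above do
   not state explicitly.

```python
def choose_tlpp(new_data_available: bool,
                tlpp_outstanding: bool,
                probe_is_highest_tsn: bool,
                flightsize: int,
                cwnd: int,
                pmtu: int) -> str:
    """Return "new", "retransmit-highest" or "none".

    probe_is_highest_tsn is only meaningful when a TLPP is outstanding:
    it says whether that probe is the highest outstanding TSN.
    """
    # With a TLPP already outstanding, a further probe is bounded by CWND.
    if tlpp_outstanding and not flightsize < cwnd + pmtu:
        return "none"
    # New data is always preferred and is sent as a full-sized packet.
    if new_data_available:
        return "new"
    # No new data: probe with the highest outstanding TSN (one chunk).
    if not tlpp_outstanding:
        return "retransmit-highest"
    # A probe is outstanding: it may only be resent if it is the highest
    # outstanding TSN.
    return "retransmit-highest" if probe_is_highest_tsn else "none"
```

   For instance, with hypothetical values flightsize = 10000, CWND =
   4000 and PMTU = 1500, a first probe is still sent because no TLPP
   is outstanding, whereas a second probe would be suppressed.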

3.3.4.  TLPP Recovered Losses

   If a single SCTP packet is lost, there is a risk that the TLPP
   itself repairs the loss if that particular lost packet is used as
   the probe.  This masking problem is only present if the TLPP is based
   on retransmission data (i.e., not if the TLPP is based on new data).
   The TLPP might mask the loss and thus interfere with the congestion
   control principle that requires CWND halving when a loss is
   detected.

   At present the solution in this document operates with the algorithm
   defined for this purpose in [DUKKIPATI01], with a slight adjustment
   for SCTP to rely on the D-SACK (duplicate TSN received) information
   available from the SCTP SACK.  The solution operates with a
   conceptual TLPP Retransmission Episode, as follows:

   o  Once a TLPP consisting of retransmission data is sent, a TLPP
      Retransmission Episode is started.  The TLPP Retransmission
      Episode is over when an incoming SACK cumulatively acknowledges a
      sequence number higher than the sequence number of the TLPP with
      retransmission data.

   o  CWND halving is done at the termination of a TLPP Retransmission
      Episode if at that point the number of times the TLPP TSN has
      been received, according to the D-SACK information communicated,
      is lower than the number of times the TLPP TSN has been sent.

   o  A TLPP Retransmission Episode is abruptly terminated if Fast
      Recovery or T3-Recovery is entered.
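
   The episode bookkeeping above can be sketched as follows.  This
   Python is a hedged illustration rather than the implemented
   algorithm: the class and method names are hypothetical, the send
   count is assumed to include the original transmission of the TLPP
   TSN, the receive count is assumed to be one plus the number of
   duplicate reports seen for that TSN, and TSN wrap-around is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TlppEpisode:
    probe_tsn: int
    times_sent: int        # all transmissions of probe_tsn (assumption:
                           # the original transmission is included)
    dup_reports: int = 0   # duplicate-TSN reports for probe_tsn in SACKs
    active: bool = True

    def on_probe_resent(self) -> None:
        self.times_sent += 1

    def on_recovery_entered(self) -> None:
        # Fast Recovery or T3-Recovery terminates the episode abruptly.
        self.active = False

    def on_sack(self, cum_tsn_ack: int, duplicate_tsns: List[int]) -> bool:
        """Process a SACK; return True if CWND should be halved now."""
        if not self.active:
            return False
        self.dup_reports += duplicate_tsns.count(self.probe_tsn)
        if cum_tsn_ack <= self.probe_tsn:
            return False   # episode continues
        self.active = False
        times_received = 1 + self.dup_reports
        return times_received < self.times_sent
```

   With one original transmission and one probe (times_sent = 2), a
   SACK that ends the episode without any duplicate report implies
   that one copy was lost, so CWND is halved; a duplicate report for
   the probe TSN shows that both copies arrived and no halving is
   done.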

   OPEN ISSUE: The above solution is vulnerable to spurious CWND halving
   when a TLPP is re-ordered relative to a subsequently sent new data
   chunk.  A possible solution, contemplated for SCTP for a number of
   reasons, is to extend SCTP to distinguish retransmitted chunks from
   original chunks.

3.4.  SCTP MH Considerations

   The functions defined have been implemented for SCTP MH.  MH aspects
   to be filled in.

4.  Evaluation of function

   Experiments in progress.  Details to be filled in.

5.  Socket API Considerations

   This section will describe how the socket API defined in [RFC6458] is
   extended to provide a way for the application to control the
   retransmission algorithms in operation in the SCTP layer.

   Socket options for control of the features are yet to be defined.

   Please note that this section is informational only.

6.  Security Considerations

   There are no new security considerations introduced by the functions
   defined in this document.

7.  Acknowledgements

   The author acknowledges Henrik Jensen for his very significant
   contribution to the definition of, the implementation of and the
   experiments with the function.

   The work draws heavily on prior art for TCP, [DUKKIPATI01] in
   particular.  The contributors of that work should be credited for
   many of the ideas put forward here for SCTP.

8.  IANA Considerations

   This document does not create any new registries or modify the rules
   for any existing registries managed by IANA.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol", RFC
              4960, September 2007.

   [RFC5062]  Stewart, R., Tuexen, M., and G. Camarillo, "Security
              Attacks Found Against the Stream Control Transmission
              Protocol (SCTP) and Current Countermeasures", RFC 5062,
              September 2007.

   [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
              and Y. Nishida, "A Conservative Loss Recovery Algorithm
              Based on Selective Acknowledgment (SACK) for TCP", RFC
              6675, August 2012.

9.2.  Informative References

   [DUKKIPATI01]
              Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis,
              "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of
              Tail Losses", Work in Progress (expired), February 2013.

   [DUKKIPATI02]
              Dukkipati, N., Mathis, M., Cheng, Y., and M. Ghobadi,
              "Proportional Rate Reduction for TCP", Proceedings of the
              11th ACM SIGCOMM Conference on Internet Measurement,
              November 2011.

   [HURTIG]   Hurtig, P. et al., "TCP and SCTP RTO Restart",
              draft-ietf-tcpm-rtorestart-03, Work in Progress, July
              2014.

   [MATHIS]   Mathis, M., "FACK", ACM SIGCOMM Computer Communication
              Review 26(4), October 1996.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC5827]  Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and
              P. Hurtig, "Early Retransmit for TCP and Stream Control
              Transmission Protocol (SCTP)", RFC 5827, May 2010.

   [RFC6458]  Stewart, R., Tuexen, M., Poon, K., Lei, P., and V.
              Yasevich, "Sockets API Extensions for the Stream Control
              Transmission Protocol (SCTP)", RFC 6458, December 2011.

Author's Address

   Karen E. E. Nielsen
   Ericsson
   Kistavaegen 25
   Stockholm  164 80
   Sweden

   Email: karen.nielsen@tieto.com