TSVWG                                                              Y. Li
Internet-Draft                                                   X. Zhou
Intended status: Informational                                    Huawei
Expires: November 25, 2019                                  May 24, 2019


 LOOPS (Localized Optimizations of Path Segments) Problem Statement and
                             Opportunities
             draft-li-tsvwg-loops-problem-opportunities-02

Abstract

   In various network deployments, end to end paths are partitioned into
   multiple segments.  In some cloud based WAN connections, multiple
   overlay tunnels in series are used to achieve better path selection
   and lower latency.  In satellite communication, the end to end path
   is split into two terrestrial segments and a satellite segment.
   Packet losses can be caused either by random events or by congestion
   in various deployments.

   Traditional end-to-end transport layers respond to packet loss
   slowly, especially in long-haul networks: they either wait for some
   signal from the receiver to indicate a loss and then retransmit from
   the sender, or rely on the sender's timeout, which is often quite
   long.  Packet loss not caused by congestion may make the TCP sender
   reduce its sending rate unnecessarily.  With end-to-end encryption
   moving under the transport (QUIC), traditional PEP (performance
   enhancing proxy) techniques such as TCP splitting are no longer
   applicable.

   LOOPS (Local Optimizations on Path Segments) aims to provide non end-
   to-end, locally based in-network recovery to achieve better data
   delivery by making packet loss recovery faster and by avoiding the
   senders over-reducing their sending rate.  In an overlay network
   scenario, LOOPS can be performed over the existing, or purposely
   created, overlay tunnel based path segments.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any


Li & Zhou               Expires November 25, 2019               [Page 1]

Internet-Draft        LOOPS Problem & opportunities             May 2019


   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 25, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   4
   2.  Cloud-Internet Overlay Network  . . . . . . . . . . . . . . .   5
     2.1.  Tail Loss or Loss in Short Flows  . . . . . . . . . . . .   7
     2.2.  Packet Loss in Real Time Media Streams  . . . . . . . . .   8
     2.3.  Packet Loss and Congestion Control in Bulk Data Transfer    8
     2.4.  Multipathing  . . . . . . . . . . . . . . . . . . . . . .   9
   3.  Satellite Communication . . . . . . . . . . . . . . . . . . .   9
   4.  Features and Impacts to be Considered for LOOPS . . . . . . .  11
     4.1.  Local Recovery and End-to-end Retransmission  . . . . . .  12
       4.1.1.  OE to OE Measurement, Recovery and Multipathing . . .  13
     4.2.  Congestion Control Interaction  . . . . . . . . . . . . .  14
     4.3.  Overlay Protocol Extensions . . . . . . . . . . . . . . .  16
     4.4.  Summary . . . . . . . . . . . . . . . . . . . . . . . . .  16
   5.  Security Considerations . . . . . . . . . . . . . . . . . . .  17
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  17
   7.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  17
   8.  Informative References  . . . . . . . . . . . . . . . . . . .  17
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  19

1.  Introduction

   Overlay tunnels are widely deployed for various networks, including
   long haul WAN interconnection, enterprise wireless access networks,
   etc.  The end to end connection is partitioned into multiple path
   segments using overlay tunnels.  This serves a number of purposes,
   for instance, selecting a better path over the WAN or delivering the
   packets over heterogeneous networks, such as enterprise access and
   core networks.

   A reliable transport layer normally employs some end-to-end
   retransmission mechanisms which also address congestion control
   [RFC0793] [RFC5681].  The sender either waits for the receiver to
   send some signals on a packet loss or sets some form of timeout for
   retransmission.  For unreliable transport layer protocols such as RTP
   [RFC3550], optional and limited usage of end-to-end retransmission is
   employed to recover from packet loss [RFC4585] [RFC4588].

   End-to-end retransmission to recover lost packets is slow especially
   when the network is long haul.  When a path is partitioned into
   multiple path segments that are realized as overlay tunnels, LOOPS
   (Local Optimizations on Path Segments) tries to provide local segment
   based in-network recovery to achieve better data delivery by making
   packet loss recovery faster and by avoiding the senders over-reducing
   their sending rate.  In an overlay network scenario, LOOPS can be
   performed over the existing, or purposely created, overlay tunnel
   based path segments.

   Some link types (satellite, microwave) may exhibit unusually high
   loss rate in special conditions (e.g., fades due to heavy rain).  The
   traditional TCP sender interprets loss as congestion and over-reduces
   the sending rate, degrading the throughput.  LOOPS is also applicable
   to such scenarios to improve throughput.

   Section 2 presents some of the issues and opportunities found in
   Cloud-Internet overlay networks that require higher performance and
   more reliable packet transmission in best effort networks.  Section 3
   discusses applications of LOOPS in satellite communication.
   Section 4 describes the corresponding solution features and their
   impact on existing network technologies.


                                                      ON=overlay node
                                                      UN=underlay node

   +---------+                                               +---------+
   |   App   | <---------------- end-to-end ---------------> |   App   |
   +---------+                                               +---------+
   |Transport| <---------------- end-to-end ---------------> |Transport|
   +---------+                                               +---------+
   |         |                                               |         |
   |         |        +--+  path  +--+  path segment2  +--+  |         |
   |         |        |  |<-seg1->|  |<--------------> |  |  |         |
   | Network |  +--+  |ON|  +--+  |ON|  +--+   +----+  |ON|  | Network |
   |         |--|UN|--|  |--|UN|--|  |--|UN|---| UN |--|  |--|         |
   +---------+  +--+  +--+  +--+  +--+  +--+   +----+  +--+  +---------+
     End Host                                                  End Host
                       <--------------------------------->
                        LOOPS domain: path segment enables
                        optimizations for better local transport

             Figure 1: LOOPS in Overlay Network Usage Scenario

1.1.  Terminology

   LOOPS:  Local Optimizations on Path Segments.  LOOPS includes the
      local in-network (i.e. non end-to-end) recovery function, for
      instance, loss detection and measurements.

   LOOPS Node:  Node supporting LOOPS functions.

   Overlay Node (ON):  Node having overlay functions (like overlay
      protocol encapsulation/decapsulation, header modification, TLV
      inspection) and LOOPS functions in LOOPS overlay network usage
      scenario.  Both OR and OE are Overlay Nodes.

   Overlay Tunnel:  A tunnel with designated ingress and egress nodes
      using some network overlay protocol as encapsulation, optionally
      with a specific traffic type.

   Overlay Path:  A channel within the overlay tunnel, where the traffic
      transmitted on the channel needs to pass through zero or more
      designated intermediate overlay nodes.  There may be more than one
      overlay path within an overlay tunnel when the different sets of
      designated intermediate overlay nodes are specified.  An overlay
      path may contain multiple path segments.  When an overlay tunnel
      contains only one overlay path without any intermediate overlay
      node specified, overlay path and overlay tunnel are used
      interchangeably.


   Overlay Edge (OE):  Edge node of an overlay tunnel.

   Overlay Relay (OR):  Intermediate overlay node on an overlay path.
      An overlay path need not contain any OR.

   Path segment:  Part of an overlay path between two neighbor overlay
      nodes.  It is used interchangeably with overlay segment in this
      document when the context emphasizes its overlay-encapsulated
      nature.  An overlay path may contain multiple path segments.  When
      an overlay path contains only one path segment, i.e. the segment
      is between two OEs, the path segment is equivalent to the overlay
      path.  It is also called segment for simplicity in this document.

   Overlay segment:  Refers to path segment.

   Underlay Node (UN):  Nodes not participating in the overlay network
      function.

2.  Cloud-Internet Overlay Network

   The Internet is a huge network of networks.  The interconnections of
   end devices using this global network are normally provided by ISPs
   (Internet Service Providers).  The network created by the
   composition of the ISP networks is considered the traditional
   Internet.  CSPs (Cloud Service Providers) are connecting their data
   centers using the Internet or via self-constructed networks/links.
   This expands the Internet's infrastructure and, together with the
   original ISPs' infrastructure, forms the Internet underlay.

   NFV (network function virtualization) further makes it easier to
   dynamically provision a new virtual node as a work load in a cloud
   for CPU/storage intensive functions.  With the aid of various
   mechanisms such as kernel bypassing and Virtual IO, forwarding based
   on virtual nodes is becoming more and more effective.  The
   interconnections among the purposely positioned virtual nodes and/or
   the existing nodes with virtualization functions potentially form an
   overlay of the Internet.  It is called the Cloud-Internet Overlay
   Network (CION) in this document.

   CION makes use of overlay technologies to direct traffic through a
   specific overlay path regardless of the underlying physical topology,
   in order to achieve better service delivery.  It purposely creates or
   selects overlay nodes (ON) from providers.  When the number of
   overlay nodes is sufficiently large, continuously measuring the delay
   of path segments and using it as a metric for path selection gives a
   high chance that a better path could
   be found [DOI_10.1109_ICDCS.2016.49] [DOI_10.1145_3038912.3052560].
   [DOI_10.1145_3038912.3052560] further shows all cloud providers
   experience random loss episodes and random loss accounts for more
   than 35% of total loss.

   Figure 2 shows an example of an overlay path over large geographic
   distances.  The path between two OEs (Overlay Edges) is an overlay
   path.  OEs are ON1 & ON4 in Figure 2.  Part of the path between ONs
   is a path segment.  Figure 2 shows the overlay path with 3 segments,
   i.e. ON1-ON2-ON3-ON4.  An ON is usually a virtual node, though it
   does not have to be.  An overlay path transmits packets in some form
   of network overlay protocol encapsulation.  An ON has computing and
   memory resources that can be used for functions such as packet loss
   detection, network measurement and feedback, and packet recovery.

                  _____________
                 /  domain 1   \
                /               \
            ___/                 -------------\
           /                                   \
    PoP1 ->--ON1                                \
          |   |                            ON4------>-- PoP2
          |   |   ON2                     ___|__/
           \__|_ |->|         _____      /   |
              | \|__|__      /     \    /    |
              |  |  |  \____/       \__/     |
             \|/ |  |        _____           |
              |  |  |    ___/     \          |
              |  | \|/  /          \_____    |
              |  |  |  /         domain 2 \ /|\
              |  |  | |       ON3         |  |
              |  |  |  \      |->|        |  |
              |  |  |   \_____|__|_______/   |
              | /|\ |         | \|/          |
              |  |  |         |  |           |
              |  |  |        /|\ |           |
       +--------------------------------------------------+
       |      |  |  |         |  |           |   Internet |
       |      o--o  o---o->---o  o---o->--o--o   underlay |
       +--------------------------------------------------+

              Figure 2: Cloud-Internet Overlay Network (CION)

   We ran tests on 37 overlay nodes from multiple cloud providers
   globally.  Each pair of overlay nodes is used as sender and
   receiver.  When the traffic is not intentionally directed to go
   through any intermediate virtual nodes, we call the path that the
   traffic takes the _default path_ in the test.  When any of the
   virtual nodes is intentionally used as an intermediate node to
   forward the traffic, the path that the traffic takes is an _overlay
   path_ in the test.  The preliminary experiments showed that the delay
   of an overlay path is shorter than that of the default path in 69% of
   cases at the 99th percentile, with an improvement of 17.5% at the
   99th percentile, when Ping packets are sent every second for a week.

   Lower delay does not necessarily mean higher throughput.  Different
   path segments may have different packet loss rates.  Loss rate is
   another major factor impacting TCP throughput.  Based on some
   customer requirements, we set the target loss rate to be less than 1%
   at both the 99th and the 99.9th percentile.  The loss was measured
   between any two overlay nodes, i.e. any potential path segment.  Two
   thousand Ping packets were sent every 20 seconds between two overlay
   nodes for 55 hours.  This preliminary experiment showed that the
   target loss rate was satisfied in 44.27% and 29.51% of cases at the
   99th and 99.9th percentiles, respectively.
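
   One way to read the satisfaction figures above is as the share of
   node pairs whose loss rate at a given percentile stays below the
   target.  The following Python sketch illustrates that reading only;
   the nearest-rank percentile, the per-pair sample layout and the names
   percentile() and satisfaction_ratio() are assumptions made for
   illustration and are not taken from the experiment.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of
    all samples less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def satisfaction_ratio(loss_by_pair, pct, target=0.01):
    """Fraction of node pairs whose pct-th percentile loss rate is
    below the target (assumed interpretation of "satisfaction")."""
    satisfied = sum(
        1 for rates in loss_by_pair.values() if percentile(rates, pct) < target
    )
    return satisfied / len(loss_by_pair)


# Hypothetical data: one loss-rate sample per probing interval, per pair.
example = {
    ("ON1", "ON2"): [0.0] * 99 + [0.05],
    ("ON1", "ON3"): [0.0] * 98 + [0.05, 0.05],
}
ratio_p99 = satisfaction_ratio(example, 99)
ratio_p999 = satisfaction_ratio(example, 99.9)
```

   In this made-up data set only the first pair meets the target at the
   99th percentile, and neither pair meets it at the 99.9th percentile.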

   Hence packet loss in an overlay segment is a key issue to be solved
   in CION.  In long-haul networks, the end-to-end retransmission of a
   lost packet can result in an extra round trip time.  Such extra time
   is not acceptable in some cases.  As CION naturally consists of
   multiple overlay segments, LOOPS leverages this to perform local
   optimizations on a single hop between two overlay nodes.  ("Local"
   here is a concept relative to end-to-end; it does not mean such
   optimization is limited to LAN networks.)

   The following subsections present different scenarios using multiple
   segment based overlay paths with a common need of local in-network
   loss recovery in best effort networks.

2.1.  Tail Loss or Loss in Short Flows

   When the lost segments are at the end of a transaction, TCP's fast
   retransmit algorithm does not work as there are no ACKs to trigger
   it.  When a sender does not receive an ACK for a given segment within
   a certain amount of time called retransmission timeout (RTO), it re-
   sends the segment [RFC6298].  RTO can be as long as several seconds.
   Hence the recovery of lost segments triggered by RTO is lengthy.
   [I-D.dukkipati-tcpm-tcp-loss-probe] indicates that large RTOs make a
   significant contribution to the long tail on the latency statistics
   of short flows like web pages.
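
   The RTO computation of [RFC6298] can be sketched as follows, to show
   why recovering a tail loss by timeout is costly: the computed value
   is rounded up to a minimum of one second and is doubled on every
   timeout.  The class and parameter names and the 1 ms clock
   granularity are illustrative assumptions; real stacks may use
   different bounds.

```python
class RtoEstimator:
    """Minimal sketch of the RFC 6298 retransmission timer computation."""

    ALPHA = 1 / 8   # smoothing gain for SRTT
    BETA = 1 / 4    # smoothing gain for RTTVAR
    K = 4

    def __init__(self, granularity=0.001, min_rto=1.0, max_rto=60.0):
        self.granularity = granularity  # assumed clock granularity G (seconds)
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0                  # initial RTO before any RTT sample

    def on_rtt_sample(self, r):
        if self.srtt is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)

    def on_timeout(self):
        # Exponential backoff after the timer expires.
        self.rto = min(self.rto * 2, self.max_rto)


est = RtoEstimator()
est.on_rtt_sample(0.2)   # a 200 ms path still yields a 1 s RTO
```

   With a 200 ms RTT sample the raw value would be 0.6 s, but the
   one-second floor applies, so a timeout-driven recovery costs several
   RTTs even on a fairly short path.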

   A short flow often completes in one or two RTTs.  Even when the loss
   is not a tail loss, it can add another RTT because of end-to-end
   retransmission (not enough packets are in flight to trigger fast
   retransmit).  In long-haul networks, this can result in extra time
   of tens or even hundreds of milliseconds.


   An overlay segment transmits the aggregated flows from ON to ON.  As
   short flows are aggregated, the probability of tail loss over this
   specific overlay segment decreases compared to an individual flow.
   The overlay segment is much shorter than the end-to-end path in a
   Cloud-Internet overlay network, hence loss recovery over an overlay
   segment is faster.

2.2.  Packet Loss in Real Time Media Streams

   The Real-time Transport Protocol (RTP) is widely used in interactive
   audio and video.  Packet loss degrades the quality of the received
   media.  When the latency tolerance of the application is sufficiently
   large, the RTP sender may use RTCP NACK feedback from the receiver
   [RFC4585] to trigger the retransmission of the lost packets before
   the playout time is reached at the receiver.

   In a Cloud-Internet overlay network, the end-to-end path latency can
   be hundreds of milliseconds.  End-to-end feedback based
   retransmission may not be very useful when applications cannot
   tolerate one more RTT of this length.  Loss recovery over an overlay
   segment can then be used for the scenarios where RTCP NACK triggered
   retransmission is not appropriate.

2.3.  Packet Loss and Congestion Control in Bulk Data Transfer

   TCP congestion control algorithms such as Reno and CUBIC basically
   interpret packet loss as congestion experienced somewhere in the
   path.  When a loss is detected, the congestion window will be
   decreased at the sender to make the sending slower.  It has been
   observed that packet loss is not an accurate way to detect congestion
   in the current Internet [I-D.cardwell-iccrg-bbr-congestion-control].
   In long-haul links, when the loss is caused by a non-persistent burst
   that is extremely short and fairly random, the sender's reduction of
   its sending rate cannot respond in time to the instantaneous path
   situation or mitigate such bursts.  On the contrary, reducing the
   window size at the sender unnecessarily or too aggressively harms
   the throughput of the application's long-lasting traffic, such as
   bulk data transfer.

   Because the overlay nodes are distributed over the path and have
   computing capability, they are in a better position than the end
   hosts to deduce the underlying links' instantaneous situation from
   measuring the delay, loss or other metrics over the segment.  The
   shorter round trip time over a path segment allows more accurate and
   immediate measurements of the maximum recent bandwidth available,
   the minimum recent latency, or the trend of change.  ONs can further
   decide whether a sending rate reduction at the sender is necessary
   when a loss happens.  Section 4.2 discusses this in more detail.
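
   Tracking a "maximum recent bandwidth" or "minimum recent latency"
   over a segment can be done with a sliding-window filter.  The sketch
   below is one possible way to do so and is not something this document
   specifies; the class name WindowedFilter, the helper names and the
   window length are assumptions chosen for illustration.

```python
from collections import deque


class WindowedFilter:
    """Tracks the best (max or min) sample seen within a sliding time window."""

    def __init__(self, window_s, better):
        self.window_s = window_s
        self.better = better      # better(a, b): True if a is at least as good as b
        self.samples = deque()    # (time_s, value); the current best is at the left

    def update(self, now_s, value):
        # Drop older samples that the new one makes irrelevant.
        while self.samples and self.better(value, self.samples[-1][1]):
            self.samples.pop()
        self.samples.append((now_s, value))
        # Expire samples that have fallen out of the window.
        while self.samples[0][0] < now_s - self.window_s:
            self.samples.popleft()
        return self.samples[0][1]


def max_filter(window_s):
    """Filter suitable for the maximum recent bandwidth."""
    return WindowedFilter(window_s, lambda a, b: a >= b)


def min_filter(window_s):
    """Filter suitable for the minimum recent latency."""
    return WindowedFilter(window_s, lambda a, b: a <= b)
```

   An overlay node could feed each per-segment RTT sample into a
   min_filter and each delivery-rate sample into a max_filter; because
   the segment RTT is short, the window can be short as well.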


2.4.  Multipathing

   As an overlay path may suffer from an impairment of the underlying
   network, two or more overlay paths between the same set of ingress
   and egress overlay nodes can be combined for reliability purposes.
   During a transient time when a network impairment is detected,
   sending replicated traffic over two paths can improve reliability.

   When two or more disjoint overlay paths are available as shown in
   Figure 3 from ON1 to ON2, different sets of traffic may use different
   overlay paths.  For instance, one path is for low latency and the
   other is for higher bandwidth, or they can simply be used for load
   balancing to achieve better bandwidth utilization.

   Two disjoint paths can usually be found by measuring the segments
   and identifying those with very low mathematical correlation in
   latency change.  When the number of overlay nodes is large, it is
   easy to find disjoint or partially disjoint segments.
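
   A minimal sketch of such a correlation test is given below.  It
   computes the Pearson correlation of the sample-to-sample latency
   changes of two segments and treats a low absolute value as a sign
   that the segments are likely disjoint.  The function names and the
   0.2 threshold are assumptions for illustration; this document does
   not define the measurement procedure.

```python
import math


def latency_changes(samples):
    """Differences between consecutive latency samples."""
    return [b - a for a, b in zip(samples, samples[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def likely_disjoint(latency_a, latency_b, threshold=0.2):
    """True if the latency changes of the two segments are weakly correlated.
    The threshold is an arbitrary illustrative value."""
    r = pearson(latency_changes(latency_a), latency_changes(latency_b))
    return abs(r) < threshold


# Hypothetical latency samples (ms) taken at the same instants.
seg_a = [10, 12, 11, 15, 14, 18]
seg_b = [20, 22, 21, 25, 24, 28]        # moves exactly like seg_a
seg_c = [30, 31, 32, 33, 31.5, 30]      # moves independently of seg_a
```

   Here seg_a and seg_b change in lockstep, which suggests a shared
   underlay, while seg_a and seg_c do not.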

   Different overlay paths may have varying characteristics.  The
   overlay tunnel should allow the overlay path to handle the packet
   loss depending on its own path measurements.

                       ON-A
             +----------o------------------+
             |                             |
             |                             |
      A -----o ON1                      ON2o----- B
             |                             |
             +-----------------------o-----+
                                   ON-B

                     Figure 3: Multiple Overlay Paths

3.  Satellite Communication

   Traditionally, satellite communications deploy PEP (performance
   enhancing proxy) nodes around the satellite link to enhance end-to-
   end performance.  TCP splitting is a common approach employed by such
   PEPs, where the TCP connection is split into three: the segment
   before the satellite hop, the satellite section (uplink, downlink),
   and the segment behind the satellite hop.  This requires heavy
   interactions with the end-to-end transport protocols, usually without
   the explicit consent of the end hosts.  Unfortunately, this is
   indistinguishable from a man-in-the-middle attack on TCP.  With end-
   to-end encryption moving under the transport (QUIC), this approach is
   no longer useful.


   Geosynchronous Earth Orbit (GEO) satellites have a one-way delay (up
   to the satellite and back) on the order of 250 milliseconds.  This
   does not include queueing, coding and other delays in the satellite
   ground equipment.  The Round Trip Time for a TCP or QUIC connection
   going over a satellite hop in both directions, in the best case, will
   be on the order of 600 milliseconds.  And, it may be considerably
   longer.  RTTs on this order of magnitude have significant performance
   implications.
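
   As a rough check of these figures, the propagation delay alone can
   be derived from the GEO altitude (about 35,786 km) and the speed of
   light.  The sketch below assumes that both ground stations sit
   directly beneath the satellite, so it gives a lower bound; the
   function name is an illustrative choice.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
GEO_ALTITUDE_KM = 35_786.0


def min_hop_delay_ms(altitude_km=GEO_ALTITUDE_KM):
    """Propagation delay ground -> satellite -> ground, in milliseconds,
    assuming ground stations directly below the satellite (lower bound)."""
    return 2 * altitude_km / SPEED_OF_LIGHT_KM_S * 1000


hop_ms = min_hop_delay_ms()   # roughly 239 ms for one traversal of the hop
rtt_ms = 2 * hop_ms           # roughly 477 ms when both directions use the hop
```

   Real ground stations are farther from the satellite than the
   altitude alone, and queueing, coding and processing add to this,
   which is consistent with the figures of about 250 and 600
   milliseconds given above.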

   Packet loss recovery is an area where splitting the TCP connection
   into different parts helps.  Packets lost on the terrestrial links
   can be recovered at terrestrial latencies.  Packets lost on the
   satellite link can be recovered more quickly by a satellite-optimized
   protocol between the PEPs and/or by link layer FEC than they could be
   end to end.  Again, encryption makes TCP splitting no longer
   applicable.  Enhanced error recovery at the satellite link layer
   helps for the loss on the satellite link but doesn't help for the
   terrestrial links.  Even when the terrestrial segments are short, any
   loss must be recovered across the satellite link delay.  And, there
   are cases when a satellite ground station connects to the general
   Internet with a potentially larger terrestrial segment (e.g., to a
   correspondent host in another country).  Faster recovery over such
   long terrestrial segments is desirable.

   Another aspect of recovery is that terrestrial loss is highly likely
   to be congestion related but satellite loss is more likely to be
   transmission errors due to link conditions.  A transport endpoint
   slowing down because of mis-interpreting these errors as congestion
   losses unnecessarily reduces performance.  But, at the end points,
   the difference between the two is not easily distinguished.  To
   elaborate more on the loss recovery for satellite communications,
   while the error rate on the satellite paths is generally very low
   most of the time, it might get higher during special link conditions
   (e.g., fades due to heavy rain).  The satellite hop itself does know
   which losses are due to link conditions as opposed to congestion, but
   it has no mechanism to signal this difference to the end hosts.

   The protocol layers under QUIC need to try to minimize non-congestion
   packet drops.  Specific link layers may have techniques such as
   satellite FEC to recover from loss.  Where the capabilities of such
   techniques may be exceeded (e.g., rain fade), LOOPS-like approaches
   can be considered.

   There are two high level classes of solutions for making encrypted
   transport traffic like QUIC work well over satellite:

   o  Hooks in the protocol which can adapt to large BDPs where both the
      bandwidth and the latency are large.  This would require end to
      end enhancement.


   o  Capabilities (such as LOOPS) under the protocol to improve
      performance over specific segments of the path, in particular by
      separating the terrestrial losses from the satellite losses:
      fixing terrestrial loss quickly, and keeping throughput high over
      the satellite segment by not causing the end hosts to over-reduce
      their sending window in case of non-congestion loss.

   This document focuses on the latter.

4.  Features and Impacts to be Considered for LOOPS

   LOOPS (Localized Optimizations of Path Segments) aims to leverage the
   virtual nodes in a selected path to improve the transport performance
   "locally" instead of end-to-end, as those nodes have partitioned the
   path into multiple segments.  With technologies like NFV (network
   function virtualization) and virtual IO, it is easier to add
   functions to virtual nodes, and even the forwarding on those virtual
   nodes is getting more efficient.  Some overlay protocols such as
   VXLAN [RFC7348], GENEVE [I-D.ietf-nvo3-geneve], LISP [RFC6830] or
   CAPWAP [RFC5415] are assumed to be employed in the network.  In the
   overlay network usage scenario, LOOPS can extend a specific overlay
   protocol header to perform local measurement and local recovery
   functions, like the example shown in Figure 4.

    +------------+------------+-----------------+---------+---------+
    |Outer IP hdr|Overlay hdr |LOOPS information|Inner hdr|payload  |
    +------------+------------+-----------------+---------+---------+


                 Figure 4: LOOPS Extension Header Example

   LOOPS uses a packet number space independent from that of the
   transport layer.  Acknowledgments should be generated from the ON
   receiver to the ON sender for packet loss detection and local
   measurement.  To reduce overhead, negative ACK over each path segment
   is a good choice here.  A Timestamp echo mechanism, analogous to
   TCP's Timestamp option, should be employed in band in the LOOPS
   extension to measure the local RTT and its variation for an overlay
   segment.  Local in-network recovery is performed.  The measurement
   over the segment is expected to give a hint on whether the loss of a
   locally recovered packet was caused by congestion.  Such a hint could
   be further fed back to the end host sender, for example by using ECN
   Congestion Experienced (CE) markings.  It tells the end host sender
   whether a congestion window adjustment is necessary.  LOOPS normally
   works on an overlay segment which aggregates the same type of
   traffic, for instance TCP traffic or, at a finer granularity, TCP
   throughput sensitive traffic.  LOOPS does not look into the inner
   packet.  Elements to be considered in LOOPS are discussed briefly
   here.
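
   The following sketch illustrates how a segment-local packet number,
   a timestamp echo and negative acknowledgments could fit together.
   It is not a wire format proposal: the 12-byte layout, the field
   widths, the millisecond clock and all names are hypothetical, and
   packet number wraparound is deliberately ignored.

```python
import struct

# Hypothetical LOOPS information block (not defined by this document):
# 32-bit packet number, 32-bit timestamp (ms), 32-bit echoed timestamp (ms).
LOOPS_INFO = struct.Struct("!III")


def pack_info(packet_number, ts_ms, echo_ms):
    return LOOPS_INFO.pack(packet_number, ts_ms & 0xFFFFFFFF, echo_ms & 0xFFFFFFFF)


def unpack_info(data):
    return LOOPS_INFO.unpack_from(data)


def rtt_sample_ms(now_ms, echoed_ms):
    """Local RTT from an echoed timestamp, tolerant of 32-bit clock wrap."""
    return (now_ms - echoed_ms) & 0xFFFFFFFF


class SegmentReceiver:
    """Detects gaps in the segment-local packet number space."""

    def __init__(self):
        self.next_expected = 0
        self.missing = set()

    def on_packet(self, packet_number):
        """Return the packet numbers newly found missing, i.e. the ones a
        negative ACK to the upstream overlay node would report."""
        newly_missing = []
        if packet_number >= self.next_expected:
            newly_missing = list(range(self.next_expected, packet_number))
            self.missing.update(newly_missing)
            self.next_expected = packet_number + 1
        else:
            # A local retransmission or late arrival fills a gap.
            self.missing.discard(packet_number)
        return newly_missing


receiver = SegmentReceiver()
nacks = [receiver.on_packet(pn) for pn in (0, 1, 4, 2)]
```

   After packets 0, 1, 4 and 2 arrive in that order, the receiver has
   reported 2 and 3 as missing once and still waits for 3.  Because the
   packet numbers are local to the segment, none of this requires
   looking into the inner packet.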


4.1.  Local Recovery and End-to-end Retransmission

   There are basically two ways to perform local recovery:
   retransmission and FEC (forward error correction).  They may be used
   together in some cases.  Such approaches between two overlay nodes
   recover the lost packet over a relatively shorter distance and thus
   with shorter latency.  Therefore local recovery is always faster
   than end-to-end recovery.

   At the same time, most transport layer protocols have their own
   end-to-end retransmission to recover the lost packet.  Ideally,
   end-to-end retransmission at the sender would not be triggered if
   the local recovery was successful.

   End-to-end retransmission is normally triggered by a NACK as in RTCP
   or multiple duplicate ACKs as in TCP.

   When FEC is used for local recovery, it may come with a buffer to
   make sure the recovered packets are subsequently delivered in order.
   Therefore the receiver side is unlikely to see out-of-order packets
   and then send a NACK or multiple duplicate ACKs.  The side effect of
   unnecessarily triggering end-to-end retransmission is minimal.  When
   FEC is used, if redundancy and block size are determined, the extra
   latency required to recover lost packets is also bounded.  The RTT
   variation caused by it is then predictable.  In some extreme cases,
   such as a large number of packet losses caused by a persistent burst,
   FEC may not be able to recover them.  End-to-end retransmission then
   works as a last resort.  In summary, when FEC is used for local
   recovery, the impact on end-to-end retransmission is limited.
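
   To make the bounded-redundancy argument concrete, the sketch below
   uses the simplest possible block code, a single XOR parity packet
   per block.  This document does not select an FEC scheme; the scheme,
   the equal packet length requirement and the function names are
   assumptions for illustration.  One loss per block is recoverable,
   and a burst of two or more losses is left to end-to-end
   retransmission.

```python
def xor_parity(packets):
    """Return the byte-wise XOR of equally sized packets."""
    if not packets:
        raise ValueError("empty block")
    size = len(packets[0])
    if any(len(p) != size for p in packets):
        raise ValueError("packets must have equal length in this sketch")
    parity = bytearray(size)
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover(received, parity):
    """received: the block with lost packets replaced by None.
    Returns the rebuilt packet if exactly one is missing, else None."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        return None
    present = [p for p in received if p is not None]
    # XOR of the parity with all surviving packets yields the lost one.
    return xor_parity(present + [parity])


block = [b"aaaa", b"bbbb", b"cccc"]
parity = xor_parity(block)
rebuilt = recover([b"aaaa", None, b"cccc"], parity)
burst = recover([None, None, b"cccc"], parity)
```

   With a block of k packets the egress node waits for at most k + 1
   packets before it can rebuild a single loss, which is what bounds
   the extra latency.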

   When retransmission is used, more care is required.

   For packet loss in RTP streaming, local retransmission can recover
   those packets which would otherwise not be retransmitted end-to-end
   due to the long RTT.  It would be ideal if the retransmitted packet
   reached the receiver before it sends back information that the
   sender would interpret as a NACK for the lost packet.  Therefore,
   when the segment(s) on which retransmission takes place are a small
   portion of the whole end-to-end path, the retransmission will
   significantly improve the quality at the receiver.  When the sender
   also retransmits the packet based on a NACK received, the receiver
   will receive duplicate packets and should ignore the duplicates.

   For packet loss in TCP flows, TCP Reno and CUBIC use duplicate ACKs
   as a loss signal to trigger fast retransmit.  There are different
   ways to avoid the sender's end-to-end retransmission being triggered
   prematurely:


   o  The egress overlay node can buffer out-of-order packets for a
      while, giving a packet that is being retransmitted somewhere on
      the overlay path a limited time to reach it.  The retransmitted
      packet and the packets buffered behind it may increase the RTT
      variation at the sender.  When the retransmission latency is a
      small portion of the RTT or the loss is rare, such RTT variation
      is smoothed out without much impact.  Alternatively, the sender
      can exclude such packets from its RTT measurement: the locally
      recovered packets are specially marked, the marking is echoed
      back to the end-host sender, and the sender does not use those
      packets for RTT measurement.

      Buffer management is nontrivial in this case.  It has to be
      determined how many out-of-order packets the egress overlay node
      can buffer before it gives up waiting for a successful local
      retransmission.  Because the lost packet is not always recovered
      locally, the sender may invoke end-to-end fast retransmit later
      than it would in classic TCP.

   o  If the LOOPS network does not buffer the out-of-order packets
      caused by packet loss, the TCP sender can use time-based loss
      detection such as RACK [I-D.ietf-tcpm-rack] to avoid invoking
      fast retransmit too early.  RACK uses the notion of time instead
      of the conventional DUPACK threshold to detect losses.  RACK
      needs to be tuned to fit local retransmission better.  If there
      are n similar segments on the path, a segment retransmission
      adds at least RTT/n to the reordering window on average when the
      packet is lost only once over the whole overlay path.  This
      approach is preferred over the one described in the previous
      bullet.  On the other hand, if time-based loss detection is not
      supported at the sender, end-to-end retransmission is invoked as
      usual, which wastes some bandwidth.
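
   The two options above can be sketched as follows.  The class is a
   minimal model of an egress reordering buffer with a hold timer and
   a size limit, and the helper function restates the RTT/n estimate
   for tuning a time-based reordering window.  All names, the hold
   time, and the buffer limit are hypothetical and are not part of
   LOOPS or RACK.

```python
class EgressReorderBuffer:
    """Holds out-of-order packets so a local retransmission can catch up."""

    def __init__(self, hold_time_s, max_buffered, first_seq=0):
        self.hold_time_s = hold_time_s    # how long to wait for a gap
        self.max_buffered = max_buffered  # how many packets may wait
        self.next_seq = first_seq         # next sequence number to forward
        self.pending = {}                 # seq -> arrival time

    def _drain(self):
        """Forward every buffered packet that is now in order."""
        released = []
        while self.next_seq in self.pending:
            del self.pending[self.next_seq]
            released.append(self.next_seq)
            self.next_seq += 1
        return released

    def on_packet(self, seq, now):
        """Accept an original or locally retransmitted packet."""
        if seq < self.next_seq or seq in self.pending:
            return []                     # duplicate or too late
        self.pending[seq] = now
        return self._drain() + self.on_timer(now)

    def on_timer(self, now):
        """Give up on a gap when the hold time or buffer limit is hit."""
        released = []
        while self.pending:
            oldest = min(self.pending.values())
            expired = now - oldest >= self.hold_time_s
            overfull = len(self.pending) > self.max_buffered
            if not (expired or overfull):
                break
            # Skip the missing packets; the end-to-end sender has to
            # recover them.
            self.next_seq = min(self.pending)
            released += self._drain()
        return released


def min_extra_reordering_window(rtt_s, n_segments):
    """RTT/n estimate for one local retransmission on one of n segments."""
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    return rtt_s / n_segments
```

   In this model a packet that is recovered locally within the hold
   time is forwarded in order together with the packets waiting behind
   it, while a packet that arrives after the buffer has given up is
   discarded as too late.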

4.1.1.  OE to OE Measurement, Recovery and Multipathing

   When local recovery takes place between two neighboring ONs, it is
   called per-hop recovery.  It can run between overlay relays or
   between an overlay relay and an overlay edge.  Another type of local
   recovery, OE to OE recovery, is performed between overlay edge
   nodes.  When the segments of an overlay path have similar
   characteristics and/or only the OEs have the required processing
   capability, OE to OE local recovery can be used instead of per-hop
   recovery.

   If there is more than one overlay path in an overlay tunnel,
   multipathing splits and recombines the traffic.  Measurements such as
   round trip time and loss rate between OEs have to be specific to each
   path.  The ingress OE can use the measurement feedback to determine
   the FEC parameter settings for each path.  FEC can also be
   configured to work over the combined path.  The egress OE must be
   able to remove duplicate packets when the overlay path is switched
   during an impairment.
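
   One possible way for an ingress OE to turn per-path loss feedback
   into per-path FEC settings is sketched below.  The sizing rule
   (expected losses per block times a headroom factor) and all names
   are assumptions chosen for illustration; the draft does not
   prescribe how the parameters are derived.

```python
import math


def repair_packets_for_path(block_size, loss_rate, headroom=1.5):
    """Pick the number of repair packets per block for one overlay path.

    headroom is a hypothetical safety factor on the expected number of
    losses per block.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    if loss_rate == 0.0:
        return 0                          # no loss measured, no redundancy
    expected_losses = block_size * loss_rate
    return max(1, math.ceil(expected_losses * headroom))


def fec_settings_per_path(block_size, loss_rate_by_path):
    """Map each path identifier to its own repair packet count."""
    return {
        path: repair_packets_for_path(block_size, loss_rate)
        for path, loss_rate in loss_rate_by_path.items()
    }


# Example feedback for two paths of the same overlay tunnel.
settings = fec_settings_per_path(20, {"path-a": 0.0, "path-b": 0.05})
```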

   OE to OE measurement can help each segment determine its share of
   the edge-to-edge delay.  An ON can use this to decide whether to
   turn on per-hop recovery and how to fine-tune its parameter
   settings.  When the segment delay ratio is small, segment
   retransmission is more effective.
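
   A minimal sketch of this use of the OE to OE measurement follows.
   The threshold of 0.25 and the function names are assumptions for
   illustration; the draft does not define a specific value.

```python
def segment_delay_ratio(segment_rtt_s, edge_to_edge_rtt_s):
    """Share of the edge-to-edge RTT that is spent on one segment."""
    if segment_rtt_s < 0 or edge_to_edge_rtt_s <= 0:
        raise ValueError("RTT values must be positive")
    return segment_rtt_s / edge_to_edge_rtt_s


def per_hop_retransmission_worthwhile(segment_rtt_s, edge_to_edge_rtt_s,
                                      max_ratio=0.25):
    """Enable per-hop retransmission only on comparatively short segments.

    max_ratio is a hypothetical tuning knob, not a value from the draft.
    """
    ratio = segment_delay_ratio(segment_rtt_s, edge_to_edge_rtt_s)
    return ratio <= max_ratio
```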

4.2.  Congestion Control Interaction

   When a TCP-like transport layer protocol is used, local recovery in
   LOOPS has to interact with the upper layer transport congestion
   control.  Classic TCP adjusts the congestion window when a loss is
   detected and fast retransmit is invoked.

   The local recovery mechanism breaks the assumption in classic TCP
   that a detected packet loss is a necessary and sufficient condition
   for triggering congestion control at the sender.  A loss that is
   locally recovered can be caused by non-persistent congestion such as
   a microburst, or by random loss; ideally, neither would make the
   sender invoke its congestion control mechanism.  However, it can
   also be caused by real persistent congestion, which should make the
   sender reduce its sending rate.  In either case, the sender does not
   see the locally recovered packet as a loss.

   When local recovery takes effect, we consider the following two
   cases.  First, the classic TCP sender does not see enough duplicate
   ACKs to trigger fast retransmit.  This can be the result of in-order
   delivery of packets, including locally recovered ones, to the
   receiver, as mentioned in the previous subsection.  In this case the
   classic TCP sender does not reduce its congestion window, as no loss
   is detected.  Second, if time-based loss detection such as RACK is
   used, the congestion window is not reduced as long as the ACK for
   the locally recovered packet reaches the sender before the
   reordering window expires.

   This behavior brings the desired throughput improvement when the
   recovered packet was lost due to non-persistent congestion.  It
   addresses the throughput problem mentioned in Section 2.3 and
   Section 3.  However, it also brings the risk that the sender does
   not detect real persistent congestion in time and overshoots.
   Eventually, severe congestion that a local recovery mechanism cannot
   recover from may occur.  In addition, the flow may be unfriendly to
   other flows (possibly pushing them out) that run over the same
   underlying bottleneck links.

   There is a spectrum of approaches.  At one end, each locally
   recovered packet can be treated exactly like a loss by setting its
   CE (Congestion Experienced) codepoint, which invokes congestion
   control at the sender and guarantees the same fair sharing as
   classic TCP.  Explicit Congestion Notification (ECN) can be used
   here because a CE mark is required to be treated as equivalent to a
   packet drop [RFC3168].  Congestion control at the sender then works
   as usual and no throughput improvement is achieved (although the
   benefit of faster recovery remains).  At the other end, the ON can
   perform its own congestion measurement over the segment, for
   instance of the local RTT and its variation trend, and determine
   whether a packet loss was caused by congestion or by other factors.
   It can then decide whether to set the CE marking, or even what
   fraction of packets to mark, so that the sender adjusts its sending
   rate more accurately.
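
   The measurement-based end of the spectrum could look roughly like
   the following sketch, in which an ON compares a smoothed segment
   RTT with the minimum RTT it has seen and treats a loss as
   congestion-related only when the RTT is inflated.  The heuristic,
   the smoothing gain, and the inflation factor are assumptions, not a
   mechanism specified by LOOPS.

```python
class SegmentCongestionEstimator:
    """Guess whether losses on a segment coincide with queue build-up."""

    def __init__(self, gain=0.125, inflation_factor=1.5):
        self.gain = gain                          # EWMA gain (assumed)
        self.inflation_factor = inflation_factor  # threshold (assumed)
        self.min_rtt_s = None
        self.smoothed_rtt_s = None

    def on_rtt_sample(self, rtt_s):
        """Update the minimum and the smoothed segment RTT."""
        if self.min_rtt_s is None:
            self.min_rtt_s = rtt_s
            self.smoothed_rtt_s = rtt_s
            return
        self.min_rtt_s = min(self.min_rtt_s, rtt_s)
        self.smoothed_rtt_s += self.gain * (rtt_s - self.smoothed_rtt_s)

    def should_mark_ce(self):
        """Return True if a locally recovered packet should carry CE."""
        if self.smoothed_rtt_s is None:
            # No measurement yet: fall back to treating loss as congestion.
            return True
        threshold_s = self.inflation_factor * self.min_rtt_s
        return self.smoothed_rtt_s > threshold_s


# Example: a stable 20 ms segment, then RTT samples inflated to 60 ms.
estimator = SegmentCongestionEstimator()
for rtt in [0.020] * 5:
    estimator.on_rtt_sample(rtt)
mark_when_stable = estimator.should_mark_ce()
for rtt in [0.060] * 10:
    estimator.on_rtt_sample(rtt)
mark_when_inflated = estimator.should_mark_ce()
```

   With a stable RTT, a locally recovered loss is treated as random and
   not marked; once the smoothed RTT clearly exceeds the minimum, the
   loss is signaled to the sender through CE.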

   In some cases the sender detects the loss even though local
   recovery is in operation.  For example, when the reordering window
   in RACK is not optimally adapted, the sender may trigger congestion
   control together with the end-to-end retransmission.  If spurious
   retransmission detection based on DSACK [RFC3708] is used, such an
   end-to-end retransmission is found to be unnecessary when the
   locally recovered packet reaches the receiver successfully, and the
   congestion control changes are then undone at the sender.  This
   results in pros and cons similar to those described earlier.  The
   advantage is that unnecessary window reduction is prevented and
   throughput improves when the loss is caused by non-persistent
   congestion or random loss.  The disadvantage is that a mechanism
   such as ECN or one of its variants has to be used carefully to make
   sure that congestion control is still invoked in case of persistent
   congestion.

   An approach in which the losses on a path segment are not
   immediately made known to the end-to-end congestion control can be
   combined with a "circuit breaker" style congestion control on the
   path segment.  When the overlay flow's use of the path segment
   starts to become unfair, the path segment sends congestion signals
   up to the end-to-end congestion control.  This must be carefully
   tuned to avoid unwanted oscillation.
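
   A circuit breaker of this kind might be sketched as below.  Here
   "unfair" is approximated by the fraction of packets that needed
   local recovery in a measurement interval, and two thresholds
   provide hysteresis against oscillation.  Both thresholds and all
   names are hypothetical.

```python
class SegmentCircuitBreaker:
    """Decides when segment losses are exposed to the end-to-end sender."""

    def __init__(self, trip_rate=0.05, reset_rate=0.01):
        if reset_rate > trip_rate:
            raise ValueError("reset_rate must not exceed trip_rate")
        self.trip_rate = trip_rate    # start signaling at this rate
        self.reset_rate = reset_rate  # stop signaling below this rate
        self.tripped = False

    def update(self, packets_sent, packets_recovered_locally):
        """Feed one measurement interval.

        Returns True while congestion signals should be passed up to
        the end-to-end congestion control.
        """
        if packets_sent <= 0:
            return self.tripped       # nothing measured, keep the state
        rate = packets_recovered_locally / packets_sent
        if not self.tripped and rate >= self.trip_rate:
            self.tripped = True
        elif self.tripped and rate <= self.reset_rate:
            self.tripped = False
        return self.tripped
```

   Because the breaker only resets once the local recovery rate falls
   below the lower threshold, a rate that hovers between the two
   thresholds does not make the signal flap.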

   In summary, local recovery can improve Flow Completion Time (FCT) by
   eliminating tail loss in small flows.  Because it turns a loss event
   into an out-of-order event for the TCP sender in most cases, there
   are implications for throughput if the TCP sender uses loss-based
   congestion control.  We suggest enabling both ECN and spurious
   retransmission detection when local recovery is in use.  This gives
   the desired throughput behavior: the congestion window is reduced
   when the loss is caused by congestion, and the sender's sending rate
   is kept otherwise.  We do not suggest using spurious retransmission
   detection alone together with local recovery, as it may cause the
   TCP sender to falsely undo a window reduction when congestion
   occurs.  If only ECN is enabled, or neither ECN nor spurious
   retransmission detection is enabled, the throughput with local
   recovery in use is not much different from that of traditional TCP.

4.3.  Overlay Protocol Extensions

   The overlay usually has no control over how packets are routed in the
   underlying network between two overlay nodes, but it can control, for
   example, the sequence of overlay nodes a message traverses before
   reaching its destination.  LOOPS assumes the overlay protocol can
   deliver packets along such a designated sequence.  Most forms of
   overlay networking use some sort of "encapsulation".  The whole path
   can be formed by stitching together multiple short overlay paths,
   as with VXLAN [RFC7348] or GENEVE [I-D.ietf-nvo3-geneve], or it can
   be a single overlay path with a specified sequence of intermediate
   overlay nodes, as in SRv6 [I-D.ietf-6man-segment-routing-header].
   Either way, LOOPS information needs to be embedded in those
   protocols to support data plane measurement and feedback.
   Retransmission or FEC based loss recovery can operate either per ON
   hop or OE to OE.

   LOOPS alone places no setup requirement on the control plane.  Some
   overlay protocols, e.g., CAPWAP [RFC5415], have a session setup
   phase, which can be used to exchange information such as dynamic
   FEC parameters.

4.4.  Summary

   LOOPS is expected to extend existing overlay protocols in the data
   plane.  Path selection is assumed to be a feature provided by the
   overlay protocols via SDN or other approaches and is not part of
   LOOPS.  LOOPS is a set of functions to be implemented on ONs in a
   long-haul overlay network.  LOOPS includes the following features.

   1.  Local recovery.  Retransmission, FEC, or a hybrid of both can be
       used as the local recovery method.  The recovery mechanism is
       in-network: it is performed by two network nodes that have
       computing and memory resources.

   2.  Local congestion measurement.  The sender ON measures the local
       segment RTT, loss, and/or throughput to obtain the overlay
       segment status promptly.

   3.  Signal to end-to-end congestion control.  A strategy for whether
       to set the ECN CE marking, or simply drop the packet, to signal
       the loss event to the end-host sender and help it adjust its
       sending rate.


5.  Security Considerations

   LOOPS does not look at the traffic payload, so an encrypted payload
   does not affect the functionality of LOOPS.  The use of LOOPS
   introduces some issues that impact security.  An ON with LOOPS
   functionality represents a point in the network where traffic can
   potentially be manipulated.  A denial-of-service attack can be
   launched from an ON.  A rogue ON might be able to spoof packets as
   if they came from a legitimate ON.  It may also modify the ECN CE
   marking in packets to influence the sender's rate.  To protect
   against such attacks, the overlay protocol itself should have
   built-in security protection, which LOOPS inherits.  The operator
   should use an authentication mechanism to make sure that ONs are
   valid and not compromised.

6.  IANA Considerations

   No IANA action is required.

7.  Acknowledgements

   Thanks to the etosat mailing list for the discussion of the SatCom
   and LOOPS use case.

8.  Informative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981,
              <https://www.rfc-editor.org/info/rfc793>.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <https://www.rfc-editor.org/info/rfc3168>.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003, <https://www.rfc-editor.org/info/rfc3550>.

   [RFC3708]  Blanton, E. and M. Allman, "Using TCP Duplicate Selective
              Acknowledgement (DSACKs) and Stream Control Transmission
              Protocol (SCTP) Duplicate Transmission Sequence Numbers
              (TSNs) to Detect Spurious Retransmissions", RFC 3708,
              DOI 10.17487/RFC3708, February 2004,
              <https://www.rfc-editor.org/info/rfc3708>.


   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
              DOI 10.17487/RFC4585, July 2006,
              <https://www.rfc-editor.org/info/rfc4585>.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              DOI 10.17487/RFC4588, July 2006,
              <https://www.rfc-editor.org/info/rfc4588>.

   [RFC5415]  Calhoun, P., Ed., Montemurro, M., Ed., and D. Stanley,
              Ed., "Control And Provisioning of Wireless Access Points
              (CAPWAP) Protocol Specification", RFC 5415,
              DOI 10.17487/RFC5415, March 2009,
              <https://www.rfc-editor.org/info/rfc5415>.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
              <https://www.rfc-editor.org/info/rfc5681>.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298,
              DOI 10.17487/RFC6298, June 2011,
              <https://www.rfc-editor.org/info/rfc6298>.

   [RFC6830]  Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The
              Locator/ID Separation Protocol (LISP)", RFC 6830,
              DOI 10.17487/RFC6830, January 2013,
              <https://www.rfc-editor.org/info/rfc6830>.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014,
              <https://www.rfc-editor.org/info/rfc7348>.

   [I-D.dukkipati-tcpm-tcp-loss-probe]
              Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis,
              "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of
              Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work
              in progress), February 2013.

   [I-D.ietf-nvo3-geneve]
              Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic
              Network Virtualization Encapsulation", draft-ietf-
              nvo3-geneve-13 (work in progress), March 2019.


   [I-D.ietf-tcpm-rack]
              Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK:
              a time-based fast loss detection algorithm for TCP",
              draft-ietf-tcpm-rack-05 (work in progress), April 2019.

   [I-D.ietf-6man-segment-routing-header]
              Filsfils, C., Dukes, D., Previdi, S., Leddy, J.,
              Matsushima, S., and D. Voyer, "IPv6 Segment
              Routing Header (SRH)", draft-ietf-6man-segment-routing-
              header-19 (work in progress), May 2019.

   [I-D.cardwell-iccrg-bbr-congestion-control]
              Cardwell, N., Cheng, Y., Yeganeh, S., and V. Jacobson,
              "BBR Congestion Control", draft-cardwell-iccrg-bbr-
              congestion-control-00 (work in progress), July 2017.

   [DOI_10.1109_ICDCS.2016.49]
              Cai, C., Le, F., Sun, X., Xie, G., Jamjoom, H., and R.
              Campbell, "CRONets: Cloud-Routed Overlay Networks", 2016
              IEEE 36th International Conference on Distributed
              Computing Systems (ICDCS), DOI 10.1109/icdcs.2016.49, June
              2016.

   [DOI_10.1145_3038912.3052560]
              Haq, O., Raja, M., and F. Dogar, "Measuring and Improving
              the Reliability of Wide-Area Cloud Paths", Proceedings of
              the 26th International Conference on World Wide Web -
              WWW '17, DOI 10.1145/3038912.3052560, 2017.

Authors' Addresses

   Yizhou Li
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Phone: +86-25-56624584
   Email: liyizhou@huawei.com


   Xingwang Zhou
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Email: zhouxingwang@huawei.com

