Internet Engineering Task Force                                A. Agache
Internet-Draft                                                 C. Raiciu
Intended status: Experimental        University Politehnica of Bucharest
Expires: January 21, 2016                                  July 20, 2015


                       TCP Sendbuffer Advertising
                     draft-agache-tcpm-sndbufadv-00

Abstract

   Network operators have difficulty understanding the end-to-end
   performance of TCP connections through their networks.  By observing
   packets at different vantage points on their path and maintaining
   per-flow state, network operators can detect packet losses and
   retransmissions and estimate RTTs, among other metrics.  A key piece
   of information needed by networks is whether a connection is limited
   by the network or by the application.  This information is very
   difficult to infer accurately from passive measurements.

   We propose to advertise sendbuffer occupancy in TCP: each segment
   will carry the amount of backlogged data present in the sender's
   buffer.  This information allows networks to distinguish between
   application-limited, network-limited and flow-control-limited flows,
   creating new avenues for network optimization.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 21, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.




Agache & Raiciu         Expires January 21, 2016                [Page 1]

Internet-Draft         TCP Sendbuffer Advertising              July 2015


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Requirements Language
   2.  Introduction
   3.  TCP Sendbuffer Structure
   4.  Negotiating sendbuffer advertising
   5.  Encoding sendbuffer information
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Authors' Addresses

1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Introduction

   Aggregate link statistics, such as packet and loss counts, are easily
   available in modern networks, but they convey a fairly limited
   picture of network performance.  In many cases, the network needs
   information about individual flows' demand for bandwidth to make
   appropriate resource allocation decisions.

   One example is a mobile phone streaming audio or video over a WiFi
   connection.  The default strategy is to stay on WiFi whenever it is
   available, even though performance may be poor enough to seriously
   impair user experience.  If the mobile network knew that the
   multimedia stream needs more bandwidth, it could bring up the
   cellular connection and migrate traffic to it by using mobile client
   offloading software relying on Multipath TCP [NSDI-12] or Mobile IP
   [RFC5944].

   Another example is in datacenters with Clos topologies (such as the
   popular FatTree topology [FatTree]), where elephant flows are
   randomly placed on paths with flow-level Equal Cost Multipath
   Routing; when two or more elephant flows are placed on the same link,
   performance degrades despite spare capacity elsewhere in the
   network.  The network can reroute such flows by using tunnels or
   programmable switches (e.g., OpenFlow), but the one thing missing is
   information about which flows could utilize more capacity if given a
   better path.

   Determining whether a TCP connection is network-limited is difficult
   with passive monitoring.  The network needs to keep per-flow state,
   to estimate the sender's congestion window and to accurately monitor
   flight-size.  When flight-size is smaller than both the congestion
   window and the receive window, the connection is limited by the
   application and does not need more capacity.
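
   As a rough illustration of the passive approach, the sketch below
   classifies a flow from per-flow estimates a monitor would have to
   maintain.  The function and parameter names are hypothetical.  Only
   the application-limited test comes from the text above; attributing
   the remaining cases to whichever window is smaller is an assumption,
   as is the availability of a usable congestion window estimate.

```python
def classify_flow(flight_size: int, est_cwnd: int, rcv_wnd: int) -> str:
    """Classify a flow from passively estimated state (all values in bytes).

    est_cwnd is the monitor's *estimate* of the sender's congestion window;
    obtaining it accurately is the hard part and is not modelled here.
    """
    if flight_size < min(est_cwnd, rcv_wnd):
        # Neither window is full: the sender has nothing more to send.
        return "application-limited"
    if rcv_wnd <= est_cwnd:
        # Assumption: the receive window is the binding constraint
        # (ties are attributed to flow control).
        return "flow-control-limited"
    # Assumption: the congestion window is the binding constraint.
    return "network-limited"
```

   For example, a flow with 1000 bytes in flight, an estimated
   congestion window of 5000 bytes and a receive window of 8000 bytes
   would be reported as application-limited.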

   We propose that each TCP segment should also encode the amount of
   backlogged data in the TCP sendbuffer.  This information enables
   network boxes and receivers to easily identify connections that need
   more capacity.  Our goal is to have this extension "always on", and
   it is therefore very important to reduce its overhead.  Next, we
   discuss how to compute and report the amount of backlogged data.  We
   follow with a discussion of signaling options for conveying
   sendbuffer information.

3.  TCP Sendbuffer Structure


                        1          2
               ---|----------|----------|--->
               SND.UNA    SND.NXT   WRITE.SEQ


           1 - sequence numbers of unacknowledged, in flight data
           2 - sequence numbers of backlogged data.

                    Anatomy of the TCP Sendbuffer

   The figure above shows the anatomy of the TCP sendbuffer.  SND.UNA
   represents the oldest sequence number sent but not yet acknowledged.
   At the other end there is WRITE.SEQ, the tail sequence number of data
   held in the sendbuffer.  Somewhere in-between we have SND.NXT, the
   sequence number of the next byte to be sent.  From SND.NXT to
   WRITE.SEQ we have backlogged data, written by the application but not
   yet transmitted.

   SND.NXT is constrained by both the receive window and the congestion
   window as follows:

           SND.NXT <= SND.UNA + min(SND.WND, SND.CWND)

   As long as the receive window is not a bottleneck, and in the absence
   of hardware issues or software bugs, having SND.NXT smaller than
   WRITE.SEQ indicates that the congestion window is not large enough,
   so the connection is network limited at that point in time.

   The easiest way to implement sendbuffer advertising is to simply
   copy the amount of backlogged data (WRITE.SEQ - SND.NXT) into the
   segment when it leaves the TCP stack.  However, this will result in
   a non-zero sendbuffer advertisement when the connection is
   application-limited but the application writes bursts of a few
   packets.  These packets will be sent out immediately on the wire,
   yet the first packets in the burst will report that the application
   is backlogged, when in fact it is not.

   To correctly implement sendbuffer advertisement, the sender MUST
   advertise the amount of backlogged data according to the formula
   below:

           SEG.SNDBUF = WRITE.SEQ - SND.UNA - min(SND.WND, SND.CWND),
                         if WRITE.SEQ > SND.UNA + min(SND.WND, SND.CWND)

           SEG.SNDBUF = 0, otherwise

   This formula ensures that if an application write fits in the current
   receive and congestion windows, all the resulting segments will
   advertise zero backlogged data.
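
   The formula can be sketched as follows, alongside the naive
   WRITE.SEQ - SND.NXT computation for comparison.  The function names
   are hypothetical, and the use of modulo-2^32 differences to handle
   sequence number wraparound is an assumption of this sketch rather
   than something the formula itself spells out.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap modulo 2^32


def seq_diff(a: int, b: int) -> int:
    """Return (a - b) modulo 2^32; meaningful when a is at or ahead of b."""
    return (a - b) % SEQ_MOD


def naive_backlog(write_seq: int, snd_nxt: int) -> int:
    """Naive advertisement: everything written but not yet transmitted."""
    return seq_diff(write_seq, snd_nxt)


def seg_sndbuf(write_seq: int, snd_una: int, snd_wnd: int, snd_cwnd: int) -> int:
    """SEG.SNDBUF: buffered data that does not fit in the usable window."""
    buffered = seq_diff(write_seq, snd_una)  # WRITE.SEQ - SND.UNA
    window = min(snd_wnd, snd_cwnd)          # min(SND.WND, SND.CWND)
    return max(0, buffered - window)         # zero when the write fits
```

   Consider a 3000-byte application write starting at SND.UNA = 1000
   with a 10000-byte congestion window.  After the first 1000-byte
   segment has been handed to the network, SND.NXT = 2000 and the naive
   computation reports 2000 bytes of backlog, whereas seg_sndbuf()
   reports zero because the whole write fits in the window.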

4.  Negotiating sendbuffer advertising

   The standard way to extend TCP is to negotiate the extension during
   the three-way handshake.  The TCP option space, however, is already
   very crowded in the SYN exchange.  Until solutions that extend the
   TCP option space are standardized, negotiation in the SYN exchange
   is, in our view, not a feasible option for sendbuffer advertising.

   Fortunately, sendbuffer advertising is a sender-side only
   modification to TCP, and the information it makes available can be
   used by anyone that understands it, be it the network or the
   receiver.  This implies that we can simply bypass the three-way
   handshake as long as the actual encoding of the sendbuffer
   information in TCP segments does not have negative effects on legacy
   routers, middleboxes and TCP receivers.  We discuss encoding in the
   next section.

   TCP sendbuffer advertising will therefore be a simple sender-only
   enhancement to the TCP stack that can be enabled by using system-wide
   configuration (e.g. sysctl in Linux).

5.  Encoding sendbuffer information

   In this section we discuss two encoding alternatives for sendbuffer
   information: in a new TCP option, or in the acknowledgement and
   receive-window fields of data segments.

   The first solution is to simply encode sendbuffer information in a
   new TCP option on every segment carrying data in a TCP connection,
   without negotiating this extension in the three-way handshake.  This
   adds only 6 bytes of overhead to each TCP segment.  This option is
   feasible only when there is sufficient space in the TCP option field
   of the corresponding data segment.
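
   A minimal sketch of such an option is given below.  It assumes a
   layout of one kind byte, one length byte and a 32-bit backlog value
   in network byte order, which is one way to arrive at 6 bytes.  No
   option kind has been assigned; the value 253 is a placeholder taken
   from the experimental range, and all names are hypothetical.
   Padding of the option list to a 4-byte boundary is omitted.

```python
import struct

SNDBUF_OPT_KIND = 253    # placeholder: experimental kind, not an assigned value
SNDBUF_OPT_LEN = 6       # kind (1) + length (1) + 32-bit backlog (4)
MAX_TCP_OPTIONS = 40     # the TCP header leaves at most 40 bytes for options


def build_sndbuf_option(backlog: int) -> bytes:
    """Encode the backlog (saturated at 2^32 - 1) as a 6-byte option."""
    value = min(backlog, 0xFFFFFFFF)
    return struct.pack("!BBI", SNDBUF_OPT_KIND, SNDBUF_OPT_LEN, value)


def append_if_room(options: bytes, backlog: int) -> bytes:
    """Append the option only when the option space can hold it."""
    opt = build_sndbuf_option(backlog)
    if len(options) + len(opt) > MAX_TCP_OPTIONS:
        return options  # not enough room: send the segment without it
    return options + opt


def parse_sndbuf_option(options: bytes):
    """Walk the option list as an observer; return the backlog or None."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:            # End of Option List
            break
        if kind == 1:            # No-Operation, single byte
            i += 1
            continue
        if i + 1 >= len(options):
            break                # truncated option
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                # malformed option
        if kind == SNDBUF_OPT_KIND and length == SNDBUF_OPT_LEN:
            return struct.unpack("!I", options[i + 2:i + 6])[0]
        i += length
    return None
```

   An observer that does not recognise the kind simply skips over the
   option using its length byte, which is the behaviour this encoding
   relies on.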

   Avoiding the option negotiation works well in datacenters, where it
   can be ensured out-of-band that all machines either understand
   sendbuffer advertising or are unaffected by segments carrying new
   options.  In the Internet, before advertising sendbuffer
   information in new TCP options we need to ensure that: a) existing
   TCP stacks are robust to unknown options, simply ignoring them, and
   b) middleboxes do not drop segments carrying unknown options.
   Existing studies [IMC-11] indicate that the vast majority of network
   paths either pass unknown options or strip the options while letting
   the segments through.  Only a very small fraction of paths drop
   segments with unknown options.  To cope with such cases, the
   implementation MUST NOT include sendbuffer information on
   retransmitted packets, to ensure that the connection makes progress
   even in the presence of such middleboxes.

   Our second solution is based on the observation that while TCP itself
   is bidirectional, most connections in practice transfer data
   unidirectionally most of the time.  The endpoints can be either data
   senders or receivers at different moments, but they rarely act as
   both at the same time.  When traffic is unidirectional, the sender
   sends the same value for the acknowledgement number and receive
   window field over and over again.

   We propose to reuse one or both of these fields to advertise
   sendbuffer information instead when traffic is unidirectional.  To
   detect unidirectional traffic, the sender will maintain a state
   variable called SND.NUM_SEG that is initially set to zero, and is
   zeroed whenever a segment with a valid ACK field is sent out.
   SND.NUM_SEG will be incremented whenever a segment is received.  A
   sendbuffer advertisement SHOULD be encoded in outgoing segments only
   when SND.NUM_SEG = 0.
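
   The bookkeeping described above can be sketched as a small state
   holder.  The class and method names are hypothetical, and the policy
   of sending a regular ACK as soon as SND.NUM_SEG is non-zero is one
   possible reading of the rule, not the only one.

```python
class SndNumSegTracker:
    """Tracks SND.NUM_SEG to decide how the next outgoing segment is built."""

    def __init__(self) -> None:
        self.num_seg = 0  # SND.NUM_SEG, initially zero

    def on_segment_received(self) -> None:
        # Every received segment increments the counter.
        self.num_seg += 1

    def next_segment_mode(self) -> str:
        """Return "advertise" or "ack" for the next outgoing segment."""
        if self.num_seg == 0:
            # Nothing received since the last valid ACK was sent, so the
            # ACK field would only repeat itself and can be reused.
            return "advertise"
        # Something arrived: send a valid ACK and zero the counter.
        self.num_seg = 0
        return "ack"
```

   With this policy, the first segment sent after any incoming segment
   carries a valid acknowledgement, and later segments carry sendbuffer
   information until the next segment arrives.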

   Sendbuffer advertising will encode the proper value in the ACK field
   and will not set the ACK flag.  This ensures that the receiver and
   other on-path hosts will ignore the field altogether.  We still need,
   however, to inform parties interested in sendbuffer information that
   they can use the value of the ACK field.

   In datacenters, we can simply define one of the reserved TCP flags as
   the sendbuffer advertisement flag.  When this flag is set, the
   sendbuffer value is encoded in the ACK field.  The sendbuffer
   advertisement flag and the ACK flag MUST NOT be set simultaneously.

   In the Internet, redefining the meaning of one of the reserved flags
   will simply not work through existing middleboxes; additionally,
   certain middleboxes may zero the ACK field when the ACK flag is not
   set.  In this context, we propose to use the receive window field in
   segments carrying sendbuffer information to encode a checksum of this
   information.  Interested parties will: a) scan for data segments with
   the ACK flag not set, b) compute a one's complement checksum of the
   ACK field and check it against the receive window field.  In case of
   a match, the sendbuffer information can be used.  To understand the
   feasibility of this encoding, however, tests must be conducted to
   check the behaviour of middleboxes when the ACK flag is not set.
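
   The two observer steps can be sketched as follows.  The exact
   checksum is not specified above; this sketch assumes the 32-bit ACK
   field is split into two 16-bit words that are summed with end-around
   carry and then complemented, in the style of the Internet checksum.
   All function names are hypothetical.

```python
def ones_complement_checksum16(value32: int) -> int:
    """16-bit one's complement checksum of a 32-bit field (assumed scheme)."""
    total = ((value32 >> 16) & 0xFFFF) + (value32 & 0xFFFF)
    total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def encode_fields(backlog: int):
    """Sender side: return (ACK field, receive window field) values."""
    ack_field = backlog & 0xFFFFFFFF
    return ack_field, ones_complement_checksum16(ack_field)


def extract_sndbuf(ack_flag_set: bool, ack_field: int,
                   window_field: int, payload_len: int):
    """Observer side: return the advertised backlog, or None if absent."""
    if ack_flag_set or payload_len == 0:
        return None  # step a): only data segments without the ACK flag
    if ones_complement_checksum16(ack_field) != window_field:
        return None  # step b): mismatch, e.g. a middlebox zeroed the field
    return ack_field
```

   If a middlebox zeroes the ACK field but leaves the window field
   untouched, the checksum no longer matches and the observer discards
   the value instead of misreading it as a zero backlog.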

6.  References

6.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

6.2.  Informative References

   [FatTree]  Al-Fares, M., Loukissas, A., and A. Vahdat, "A scalable,
              commodity data center network architecture", 2008,
              <http://doi.acm.org/10.1145/1402958.1402967>.

   [IMC-11]   Honda, M., Nishida, Y., Raiciu, C., Greenhalgh, A.,
              Handley, M., and H. Tokuda, "Is it still possible to
              extend TCP?", 2011,
              <http://doi.acm.org/10.1145/2068816.2068834>.

   [NSDI-12]  Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M.,
              Duchene, F., Bonaventure, O., and M. Handley, "How hard
              can it be? Designing and implementing a deployable
              multipath TCP", 2012,
              <http://dl.acm.org/citation.cfm?id=2228298.2228338>.

   [RFC5944]  Perkins, C., "IP Mobility Support for IPv4, Revised",
              RFC 5944, November 2010.

Authors' Addresses

   Alexandru Agache
   University Politehnica of Bucharest
   Splaiul Independentei 313
   Bucharest
   Romania

   Email: alexandru.agache@cs.pub.ro


   Costin Raiciu
   University Politehnica of Bucharest
   Splaiul Independentei 313
   Bucharest
   Romania

   Email: costin.raiciu@cs.pub.ro