TAPS Working Group                                             P. Tiesel
Internet-Draft                                               T. Enghardt
Intended status: Informational                                 TU Berlin
Expires: January 3, 2019                                   July 02, 2018


A Socket Intents Prototype for the BSD Socket API - Experiences, Lessons
                       Learned and Considerations
             draft-tiesel-taps-socketintents-bsdsockets-02

Abstract

   This document describes a prototype implementation of Socket Intents
   [I-D.tiesel-taps-socketintents] for the BSD Socket API as an
   illustrative example of how Socket Intents could be implemented.  It
   describes the experiences gained with the prototype and the lessons
   learned from trying to extend the BSD Socket API.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 3, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of




Tiesel & Enghardt        Expires January 3, 2019                [Page 1]

Internet-Draft       Socket Intents for BSD Sockets            July 2018


   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Prototype Architecture
   3.  Multiple Access Manager
     3.1.  Policy
     3.2.  Path characteristics data collectors
   4.  Socket Intents Representation
   5.  The Socket Intents API Variants
     5.1.  Classic API / muacc_context
       5.1.1.  muacc_getaddrinfo()
       5.1.2.  muacc_socket()
       5.1.3.  muacc_setsockopt()
       5.1.4.  muacc_connect()
       5.1.5.  muacc_close()
     5.2.  Classic API / getaddrinfo
     5.3.  Socketconnect API
   6.  API Implementation Experiences & Lessons Learned
     6.1.  The Missing Link to Name Resolution
     6.2.  File Descriptors Considered Harmful
     6.3.  Asynchronous API Anarchy
     6.4.  Here Be Dragons hiding in Shadow Structures
   7.  Conclusion
   8.  Acknowledgments
   9.  References
     9.1.  Informative References
     9.2.  URIs
   Appendix A.  API Usage Examples
     A.1.  Usage Example of the Classic / muacc_context API
     A.2.  Usage Example of the Classic / getaddrinfo API
     A.3.  Usage Example of the Socketconnect API
   Appendix B.  Changes
     B.1.  Since -01
     B.2.  Since -00
   Authors' Addresses

1.  Introduction

   With the proliferation of devices that have multiple paths to the
   Internet and an increasing number of available transport protocols,
   the number of transport options for serving a communication unit
   explodes.  Requiring each application to implement its own heuristic
   or strategy for choosing from this overwhelming set of transport
   options puts a huge burden on the application developer.  Thus, the
   decisions regarding these transport options should be supported and,
   if requested by the application, automated within the transport
   layer.

   Socket Intents [I-D.tiesel-taps-socketintents] allow an application
   to express what it knows, assumes, expects or wants to prioritize
   regarding its own network communication.  This information can then
   be used by the OS to perform destination selection, path selection
   and transport protocol stack instance selection.

   Our Socket Intents prototype for the BSD Socket API is a first
   attempt to automate transport option selection within the OS.  It is
   primarily targeted at path and destination address selection and
   tries to be as close as possible to the semantics of the BSD Socket
   API.  The prototype mostly excludes the problem of transport protocol
   stack instance selection, which is more closely discussed in
   [I-D.tiesel-taps-communitgrany].

   We implemented the prototype as a wrapper for the BSD Socket API that
   communicates to a central Multiple Access Manager that makes the
   actual decisions and can optimize across applications.  The whole
   implementation comprises about 15k lines of C code.  The code is
   available on GitHub [1] under a BSD License.

   This document describes our Socket Intents prototype for the BSD
   Socket API.  It details important aspects of the implementation and
   the API variants we developed over time based on lessons learned.
   Finally, it summarizes these lessons and points out why the BSD
   Socket API is not particularly well suited to integrate automated
   transport protocol stack instance selection.  Furthermore, it
   describes the limitations for destination address and path selection
   within the BSD Socket API.

2.  Prototype Architecture

   The Socket Intents prototype consists of the following components,
   also shown in Figure 1:

   o  The Socket Intents API, a BSD Socket API wrapper for applications
      to use, including a representation of the actual Socket Intents.

   o  The Socket Intents Library, which implements the Socket Intents
      API.  It sends requests to the Multiple Access Manager, e.g.,
      before establishing a connection, and gets back a response
      regarding which interface to use.

   o  The Multiple Access Manager (MAM), a daemon which gets informed
      about all application requests and has knowledge of the available
      network interfaces.

   o  The Policy, a dynamically loaded library hosted by the MAM.  It
      chooses which of the available interfaces to use based on the
      available knowledge about them and the Socket Intents.

   o  Data collectors that reside inside the MAM and that provide
      information like bandwidth usage, smoothed RTT estimate and RSSI
      for wireless links to the policy.

    +------------------------+
    |      Application       |
    |                        |                   +-------------------+
    +-{ Socket Intents API }-+  (MAM Request)    |  Multiple Access  |
    |                        | ----------------> |      Manager      |
    |     Socket Intents     |  (MAM Response)   | +---------------+ |
    |        Library         | <---------------- | |    Policy     | |
    +------------------------+                   | +---------------+ |
    |      BSD Sockets       |                   | |Data Collectors| |
    +------------------------+                   +-+---------------+-+

           Figure 1: Components of the Socket Intents Prototype

3.  Multiple Access Manager

   The Multiple Access Manager (MAM) is the central transport option
   selection instance on a host.  It is realized as a daemon that runs
   in userspace and receives requests from each application that uses
   the Socket Intents Library.

   The MAM hosts the Policy, which is the actual decision-making
   component, e.g., deciding which source address and therefore which
   source interface to use.  Upon events, such as an application
   requesting to resolve a name or to connect a socket (see Section 5
   for details), the Socket Intents Library issues a MAM request and the
   MAM invokes a callback in the policy (see Section 3.1 for details).
   The policy can either communicate its decision right away or defer
   it, e.g., when it has to wait for the results of name resolution.
   The results and decisions are communicated back to the Socket Intents
   Library through the MAM response, where they are applied to the
   actual socket, see also Figure 1.

   To support the policy, the MAM maintains a list of IP prefixes that
   are configured on the local interfaces and available for outgoing
   communications.  As destination address selection and path selection
   are highly dependent on each other, the MAM integrates DNS resolution
   and maintains separate resolver configurations per prefix (see
   [ANRW17-MH] for further discussion on multiple PvDs and DNS
   resolution).  Furthermore, the MAM includes data collectors which
   periodically gather statistics on the available paths, see
   Section 3.2 for details.

3.1.  Policy

   In the Socket Intents prototype, the Policy implements the decision
   logic for selecting among available transport options.  In our
   current implementation, only one policy can be active at a given
   time.  We implement different interchangeable policies as dynamically
   loaded libraries, which are hosted by the Multiple Access Manager
   (MAM), see Figure 1.  When launching the MAM, the user has to choose
   a policy and supply a policy configuration, which can contain
   additional information to configure the policy.

   Examples of policy configuration include:

   o  A list of IP prefixes configured on local interfaces to consider
      as source for the communication

   o  Name server(s) to use for each of the IP prefixes

   o  Preferences to instrument the policy, e.g., default prefix to use

   The policy is initialized with this configuration and then waits for
   the callback of an incoming MAM request.

   Upon a callback, the policy can use information from the MAM request,
   such as Socket Intents, and information available within the MAM,
   such as recently measured path characteristics (see Section 3.2), to
   make decisions.

   Policy decisions can include:

   o  The source address(es) used for name resolution

   o  How to order the results of name resolution (i.e., preferring
      certain IP addresses over others)

   o  Picking an IP protocol version

   o  Picking a transport protocol (Note that in our current
      implementation, we are constrained by the Socket API, so our
      policy cannot override the transport protocol chosen by an
      application.)

   o  Setting socket options (e.g., disabling the Nagle algorithm for
      TCP)

   o  Choosing a source address for the outgoing communication

   o  Reusing a socket from a given socket set (only for the API variant
      described in Section 5.3)

   Note that in our current implementation, the policy is a piece of
   code which can in principle execute arbitrary instructions.  We
   assume this is acceptable for an experimental platform but would
   prefer an abstract description like a domain-specific language for a
   production system.
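
   To illustrate the kind of decision logic a policy callback might
   contain, the following minimal sketch picks a source prefix based on
   a traffic category and on statistics collected per prefix.  All names
   in it (sketch_category, sketch_prefix, sketch_pick_prefix) are
   hypothetical and do not correspond to the prototype's actual policy
   interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified traffic categories for this sketch. */
enum sketch_category { SKETCH_QUERY, SKETCH_BULK };

/* Hypothetical per-prefix record holding collected statistics. */
struct sketch_prefix {
    const char *name;        /* label of the prefix/interface */
    uint32_t min_srtt_us;    /* minimum SRTT seen, microseconds */
    uint64_t max_rate_bps;   /* maximum observed rate, bits/s */
};

/* Return the index of the preferred prefix, or -1 if the list is empty.
 * Short queries prefer the lowest latency, bulk transfers the highest
 * observed rate. */
int sketch_pick_prefix(const struct sketch_prefix *p, size_t n,
                       enum sketch_category cat)
{
    if (n == 0)
        return -1;

    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (cat == SKETCH_QUERY) {
            if (p[i].min_srtt_us < p[best].min_srtt_us)
                best = i;
        } else {
            if (p[i].max_rate_bps > p[best].max_rate_bps)
                best = i;
        }
    }
    return (int)best;
}
```

   A real policy would also have to handle missing or stale
   measurements; this sketch assumes every prefix has valid statistics.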

3.2.  Path characteristics data collectors

   The data collectors are implemented as a component of the MAM, within
   a callback that is executed periodically, e.g., every 100 ms.  When
   this callback is invoked, the MAM passively gathers statistics about
   the current usage and properties of the available local interfaces
   and stores them in per-interface or per-network prefix data
   structures.

   Measured properties include:

   o  Minimum Smoothed Round Trip Time (SRTT) of current TCP connections
      using a network prefix, as an estimate for last-mile latency

   o  Median SRTT of current TCP connections using a network prefix, as
      an alternate estimate for last-mile latency

   o  Median of Round Trip Time variations within connections

   o  Median variation of Smoothed Round Trip Times across connections

   o  Median of percentage of segments deemed lost of all transmitted
      segments of current TCP connections, as an estimate of upstream
      packet loss

   o  Maximum transmitted and received bytes per second over an
      interface within the last 5 minutes, as an estimate for maximum
      available bandwidth

   o  On 802.11 interfaces, the Received Signal Strength Indicator
      (RSSI) of the last received frame on that interface, as an
      estimate for reception strength

   o  On 802.11 interfaces, the modulation rate of the last received and
      the last transmitted unicast data frame on that interface, as an
      estimate for the available data transmission rate on the first hop

   o  On 802.11 interfaces, the latest Channel Utilization as parsed
      from a Beacon frame, as an estimate of congestion on the wireless
      medium

   See [ANRW18-Metrics] for more discussion of the gathered metrics.

   When a policy callback is invoked, the policy can use the latest
   measured properties to guide its decisions, see Section 3.1.

   Note that we do not perform active measurements from within the MAM
   to avoid overhead.
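
   As an illustration of how per-prefix statistics such as the minimum
   and median SRTT listed above could be derived from a set of samples,
   consider the following minimal sketch.  The function names (srtt_min,
   srtt_median) and the microsecond unit are assumptions made for this
   example and are not taken from the prototype.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort() on uint32_t values. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Minimum SRTT over n samples (hypothetical helper); 0 if n == 0. */
uint32_t srtt_min(const uint32_t *srtt_us, size_t n)
{
    if (n == 0)
        return 0;

    uint32_t min = srtt_us[0];
    for (size_t i = 1; i < n; i++)
        if (srtt_us[i] < min)
            min = srtt_us[i];
    return min;
}

/* Median SRTT over n samples (hypothetical helper).  Sorts a copy so
 * the caller's array stays untouched.  For an even n, the mean of the
 * two middle values is returned.  Returns 0 if n == 0 or if the copy
 * cannot be allocated. */
uint32_t srtt_median(const uint32_t *srtt_us, size_t n)
{
    if (n == 0)
        return 0;

    uint32_t *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return 0;
    memcpy(tmp, srtt_us, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_u32);

    uint32_t med;
    if (n % 2 == 1)
        med = tmp[n / 2];
    else
        med = (uint32_t)(((uint64_t)tmp[n / 2 - 1] + tmp[n / 2]) / 2);

    free(tmp);
    return med;
}
```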

4.  Socket Intents Representation

   As described in [I-D.tiesel-taps-socketintents], Socket Intents are
   pieces of information about upcoming traffic.  An application can
   share the information that it has available through the Socket
   Intents API.

   In our implementation, Socket Intents are represented as socket
   options for get/setsockopt on their own socket option level
   (SOL_INTENTS).

   For some of the API variants, we had to introduce socket option
   lists, i.e., data structures that can hold multiple socket options
   and therefore multiple Socket Intents.

   Which of these representations is actually used depends on the API
   variant, see Section 5.
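
   A socket option list of the kind described above could look roughly
   like the following linked list.  This is only a sketch: the struct
   name sketch_sockopt, the helper functions, and the numeric values of
   SKETCH_SOL_INTENTS and SKETCH_INTENT_CATEGORY are placeholders chosen
   for this example, not the definitions used by the prototype.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Placeholder values; the prototype's real constants may differ. */
#define SKETCH_SOL_INTENTS     300
#define SKETCH_INTENT_CATEGORY 1

/* One entry of a socket option list (hypothetical layout). */
struct sketch_sockopt {
    int level;                    /* e.g. SOL_SOCKET or the intents level */
    int optname;                  /* option name within that level */
    void *optval;                 /* private copy of the option value */
    socklen_t optlen;             /* length of the value in bytes */
    struct sketch_sockopt *next;  /* next entry, or NULL */
};

/* Prepend a copy of one option to the list.
 * Returns 0 on success, -1 on allocation failure. */
int sketch_optlist_add(struct sketch_sockopt **head, int level,
                       int optname, const void *optval, socklen_t optlen)
{
    struct sketch_sockopt *o = malloc(sizeof *o);
    if (o == NULL)
        return -1;

    o->optval = malloc(optlen > 0 ? optlen : 1);
    if (o->optval == NULL) {
        free(o);
        return -1;
    }
    if (optlen > 0)
        memcpy(o->optval, optval, optlen);

    o->level = level;
    o->optname = optname;
    o->optlen = optlen;
    o->next = *head;
    *head = o;
    return 0;
}

/* Count the entries in the list. */
size_t sketch_optlist_len(const struct sketch_sockopt *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Release all entries and their value copies. */
void sketch_optlist_free(struct sketch_sockopt *head)
{
    while (head != NULL) {
        struct sketch_sockopt *next = head->next;
        free(head->optval);
        free(head);
        head = next;
    }
}
```

   Such a list can carry regular socket options and Socket Intents side
   by side, which is what allows it to be passed around before any
   socket exists.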

5.  The Socket Intents API Variants

   The Socket Intents API is a wrapper around the BSD Socket API.  It
   sends requests to the Multiple Access Manager (MAM) at certain
   events, e.g., before a connection is established, and applies the
   suggestions that it gets from the MAM, e.g., to bind to a certain
   local interface or to set a certain socket option.

   There exist three variants of this API that try to fit different
   concepts:

   o  The Classic API with muacc_context, see Section 5.1, attempts to
      stay as close as possible to the call sequence of BSD Sockets.

   o  The second variant of the classic API does all transport option
      selection in "getaddrinfo", see Section 5.2.  This variant tries
      to simplify the implementation without deviating too much from the
      usage of BSD Sockets.  It minimizes the changes to the BSD Socket
      API, but adds additional overhead to the application.

   o  The "socketconnect" API, see Section 5.3, tries to automate as
      much functionality as possible and adds support for automating
      connection caching.  It replaces the usual sequence of BSD Socket
      API calls with a single call.

5.1.  Classic API / muacc_context

   In the first variant, we add a parameter called "muacc_context" to
   the BSD Socket API calls and to getaddrinfo.  This parameter holds
   properties provided by the socket calls and retains them across
   function calls to enable automation of the connection properties by
   our Socket Intents Prototype.  The shadow data structures behind the
   "muacc_context" parameter are initialized by the API wrapper at the
   time of the first call (which we assume to be muacc_getaddrinfo most
   of the time) with most of their fields empty.  Then, within each call
   to our modified Socket API, they are filled with data.

   Properties include:

   o  Socket file descriptor

   o  API calls that were already performed on this context

   o  domain, type, and protocol of the socket

   o  remote hostname

   o  remote address

   o  hints for resolving the remote address

   o  local address to bind to that the application requested

   o  local address to bind to that the MAM suggested

   o  current socket options that were set

   o  socket options suggested by MAM

5.1.1.  muacc_getaddrinfo()

   This function resolves a host name or service to an addrinfo data
   structure, usually containing an IP address and port.  Internally,
   the Socket Intents prototype sends a "getaddrinfo" request to the
   MAM, which should do the name resolution.  It can, e.g., resolve the
   name over multiple available interfaces at the same time, and then
   order the results according to a policy decision, or only return
   results obtained over a specific interface.

   SIGNATURE:

   int muacc_getaddrinfo(muacc_context_t *ctx, const char *hostname,
   const char *servname, const struct addrinfo *hints, struct addrinfo
   **res)

   ARGUMENTS:

   ctx:  Context that can contain properties of this socket/connection
      and retains them across function calls.  This function is mostly
      called with an empty context, which is then filled within the
      function.

   hostname:  Remote host name to be resolved

   servname:  Remote service to be resolved

   hints:  Hints for resolving the name

   res:  Data structure for result of name resolution

   RETURN VALUE:

   Returns 0 on success, or an error code as provided by getaddrinfo().

5.1.2.  muacc_socket()

   This function creates a socket file descriptor just like the regular
   socket call.

   SIGNATURE:

   int muacc_socket(muacc_context_t *ctx, int domain, int type, int
   protocol)

   ARGUMENTS:

   ctx:  Context that can contain properties of this socket/connection
      and retains them across function calls.  This function is mostly
      called after muacc_getaddrinfo(), since domain, type, and protocol
      can depend on the type of resolved address.

   domain:  Domain of the socket

   type:  Type of the socket

   protocol:  Protocol of the socket

   RETURN VALUE:

   Returns a file descriptor of the new socket on success, or -1 on
   failure.

5.1.3.  muacc_setsockopt()

   This call sets socket options (including Socket Intents).  For Socket
   Intents, this function can be called with a valid "muacc_context" and
   an invalid file descriptor (-1) to provide additional hints to
   "muacc_getaddrinfo()".

   SIGNATURE:

   int muacc_setsockopt(muacc_context_t *ctx, int socket, int level, int
   option_name, const void *option_value, socklen_t option_len)

   ARGUMENTS:

   ctx:  Context that can contain properties of this socket/connection
      and retains them across function calls.  This function is mostly
      called to set Intents as socket options within the context.

   socket:  Socket file descriptor

   level:  Level of the socket option to set

   option_name:  Name of the socket option to set

   option_value:  Value of the socket option to set

   option_len:  Length of the socket option to set

   RETURN VALUE:

   Returns 0 on success, or -1 on failure.

5.1.4.  muacc_connect()

   Like the regular connect call, but also binds to the source address
   selected by the Socket Intents Policy and applies socket options
   suggested by the Socket Intents Policy.

   SIGNATURE:

   int muacc_connect(muacc_context_t *ctx, int socket, const struct
   sockaddr *address, socklen_t address_len)

   ARGUMENTS:

   ctx:  Context that can contain properties of this socket/connection
      and retains them across function calls.  This function is mostly
      called after all Socket Intents for this connection have been set
      via muacc_setsockopt().

   socket:  Socket file descriptor

   address:  Remote address to connect to

   address_len:  Length of the remote address

   RETURN VALUE:

   Returns 0 on success, or -1 on failure.

5.1.5.  muacc_close()

   Like the regular close call, but also cleans up the state held in the
   shadow structures behind "muacc_context".

   SIGNATURE:

   int muacc_close(muacc_context_t *ctx, int socket)

   ARGUMENTS:

   ctx:  Context that can contain properties of this socket/connection
      and retains them across function calls.  This function
      deinitializes and releases the context.

   socket:  Socket file descriptor

   RETURN VALUE:

   Returns 0 on success, or -1 on failure.

5.2.  Classic API / getaddrinfo

   In this variant, Socket Intents are passed directly to
   "getaddrinfo()" as part of the "hints" parameter.  The name
   resolution is done by the MAM, which makes all decisions and stores
   them in the "result" data structure as a list of options ordered by
   preference.  Subsequently, applications can use this information for
   calls to the unmodified BSD Socket API or other APIs.  We provide
   helpers to apply all socket options from the "result" data structure.

   All relevant information is stored in our extended addrinfo struct
   (see Figure 2).

   SIGNATURE:

   int muacc_ai_getaddrinfo(const char * hostname, const char * service,
   const struct muacc_addrinfo * hints, struct muacc_addrinfo ** result)

   ARGUMENTS:

   hostname:  Remote host name to be resolved

   service:  Remote service to be resolved

   hints:  Hints for resolving the name.  Contents include family,
      socket type, protocol, socket options (including Socket Intents
      for this socket/connection), local address to bind to.

   result:  Data structure for result of name resolution

   RETURN VALUE:

   Returns 0 on success, or an error code as provided by getaddrinfo().

   /** Extended version of the standard library's struct addrinfo
    *
    * This is used both as hint and as result of the
    * muacc_ai_getaddrinfo function. This structure
    * differs from struct addrinfo only in the three members
    * ai_sockopts, ai_bindaddrlen and ai_bindaddr.
    */
   struct muacc_addrinfo {
       int ai_flags;
       int ai_family;
       int ai_socktype;
       int ai_protocol;

       /** Not included in struct addrinfo. Purpose:
         * 1. If the structure is given to muacc_ai_getaddrinfo
         *    as hints, you set socket intents that influence MAM's
         *    source and destination as well as transport protocol
         *    selection
         * 2. The socket options recommended by the MAM are returned
         *    through this attribute.
         */
       struct socketopt *ai_sockopts;

       int ai_addrlen;
       struct sockaddr *ai_addr;
       char *ai_canonname;

       /** Not included in struct addrinfo.
         * Length of ai_bindaddr.
         */
       int ai_bindaddrlen;
       /** Not included in struct addrinfo.
         * Contains the address, which the MAM recommends us to bind to.
         */
       struct sockaddr *ai_bindaddr;

       struct muacc_addrinfo *ai_next;
   };

             Figure 2: Definition of the muacc_addrinfo struct

   Appendix A.2 shows an example usage of the classic API with most
   functionality in getaddrinfo.

5.3.  Socketconnect API

   In this API variant, we move the functionality of resolving a
   hostname and connecting to the resulting address into one function
   called "socketconnect()".  This API makes it possible to use
   socketconnect not only to set up each connection, but also to
   multiplex messages across multiple existing sockets.

   This function returns a file descriptor of a connected socket for the
   application to use.  This socket can either be a newly created one or
   a socket that existed previously and is now being reused.
   Furthermore, a socket can belong to a socket set of sockets with
   common destination and service.  These sockets may, e.g., be bound to
   different local addresses, but are treated as interchangeable by the
   API implementation.  So if the application passes a socket file
   descriptor to this function, it may get back a different file
   descriptor to a socket from the same set, e.g., to use the connection
   over a different local interface for its following communication.

   SIGNATURE:

   int socketconnect(int *socket, const char *host, size_t hostlen,
   const char *serv, size_t servlen, struct socketopt *sockopts, int
   domain, int type, int proto)

   ARGUMENTS:

   socket:  Existing socket file descriptor as a representative of a
      socket set, "-1" to create a new socket, or "0" to automatically
      try to find a suitable socket set

   host:  Remote hostname to be resolved

   hostlen:  Length of remote hostname

   serv:  Remote service or port

   servlen:  Length of remote service

   sockopts:  List of socket options, including Socket Intents

   domain:  Domain of the socket

   type:  Type of the socket

   proto:  Protocol of the socket

   RETURN VALUE:

   Returns 0 on success if socket is from an existing socket set, 1 on
   success if socket was newly created, or -1 on failure.

   Appendix A.3 shows an example usage of the Socketconnect API.

6.  API Implementation Experiences & Lessons Learned

   While designing and implementing the different parts of the system as
   described in this document, we faced several challenges.  In the
   Multiple Access Manager, discovering the currently available paths
   and statistics about their performance turned out to be quite complex
   and had to be implemented in a partially platform-dependent way.
   However, the most challenging parts were the Socket Intents API and
   Library, on which we focus in the following sections.

6.1.  The Missing Link to Name Resolution

   Transport option selection is most useful if crucial information,
   such as Socket Intents or other socket options, is available as early
   as possible, i.e., for name resolution.  The primary problem here is
   the order of the function calls that are involved in name resolution,
   destination selection, protocol selection, and path selection, and
   how they are linked.

   In the classic BSD Socket API, most functions either take a socket
   file descriptor as argument or return it, and thus link different
   function calls to the same flow.  However, "getaddrinfo()" is not
   linked to a socket file descriptor, and it is typically called before
   the socket is created.  At this point, it is not yet possible to set
   a socket option, because the socket does not exist yet.

   Consequently, across BSD Socket API calls, several choices are being
   made before it is possible to set a Socket Intent: A call to
   "getaddrinfo()" returns a linked list of "addrinfo" structs, where
   each entry contains an "ai_family" (IP version), the pair of
   "ai_socktype" and "ai_protocol" (transport protocol), and a
   "sockaddr" struct containing an IP address and port to connect to.
   Then a socket of the given family, type, and protocol is created.
   Only after this has been done, socket options can be set on the
   socket, but at this point destination, IP version, and transport
   protocol are already fixed.  Before calling "connect()", only the
   path to be used (i.e., the local address to bind to) can still be
   chosen, but the available paths and which one to prefer may be
   constrained by the choice of destination.
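
   The call sequence described above can be sketched with the unmodified
   BSD Socket API as follows.  The function name sketch_classic_connect
   is hypothetical, and SO_KEEPALIVE merely stands in for any socket
   option; the comments mark the point at which each choice becomes
   fixed.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve host/serv and connect to the first address that works.
 * Returns a connected socket file descriptor, or -1 on failure. */
int sketch_classic_connect(const char *host, const char *serv)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    /* Step 1: name resolution.  No socket exists yet, so no socket
     * option (and hence no Socket Intent) can influence this step. */
    if (getaddrinfo(host, serv, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        /* Step 2: IP version and transport protocol are fixed here. */
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;

        /* Step 3: only now can options be set on the socket. */
        int on = 1;
        (void)setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);

        /* Step 4: connect.  A bind() call before this line would be the
         * only remaining way to select a path. */
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;

        close(fd);
        fd = -1;
    }

    freeaddrinfo(res);
    return fd;
}
```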

   The three variants described in Section 5 work around this problem in
   different ways:

   o  The approach in Section 5.2 places the whole automation of
      transport option selection into the "getaddrinfo()" function.  The
      results are returned in an extended "addrinfo" struct and have to
      be applied manually by the application, including binding to a
      source address representing the selected path and applying all
      socket options provided in a list, for each connection attempt.

   o  The approach in Section 5.1 adds a context to all socket- and name
      resolution-related API calls.

   o  The approach in Section 5.3 puts all functionality into one call.

   All of these approaches add the missing link between name resolution
   and the other parts of the API, but add a lot of state keeping either
   to the API, which the application developer has to manage, or to the
   Socket Intents library.

6.2.  File Descriptors Considered Harmful

   When using BSD sockets, file descriptors are the abstraction for
   network flows.  Depending on the transport protocol used, their
   semantics change and these file descriptors represent streams
   (SOCK_STREAM), associations (SOCK_DGRAM) or network interfaces
   (SOCK_RAW).  This does not provide a unified API, but is merely an
   artifact of squeezing networking into the "Everything is a file" UNIX
   philosophy.

   File descriptors are not a good abstraction for automated protocol
   stack instance selection, as applications have to adapt to changed
   semantics, e.g., whether message boundaries are preserved, depending
   on the transport protocol chosen.
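
   The difference in message boundary semantics can be observed with a
   few lines of code.  The sketch below is an illustration only: it uses
   a local "socketpair()" of family AF_UNIX as a stand-in for TCP and UDP
   flows, and the function name "first_read_size()" is made up for this
   example.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends two 3-byte messages over a local socket pair of the given type
 * and returns how many bytes the first read() on the other end
 * delivers, or -1 on error. */
ssize_t first_read_size(int socktype)
{
    int sv[2];
    char buf[64];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, socktype, 0, sv) != 0)
        return -1;
    /* Both messages are queued before the receiver reads anything. */
    if (write(sv[0], "abc", 3) == 3 && write(sv[0], "def", 3) == 3)
        n = read(sv[1], buf, sizeof(buf));
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

   With SOCK_DGRAM, the first read returns exactly one message (3
   bytes).  With SOCK_STREAM, the boundary is not preserved and the read
   typically returns both messages at once (6 bytes), so the same
   application code behaves differently depending on the socket type.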

   File descriptors are not a good abstraction for destination instance
   selection and path selection either.  Once a socket has been created,
   its protocol stack instance is fixed, so selecting a path by binding
   to a local address and connecting to a destination instance is now
   only possible using this protocol stack instance.  If such a
   connection attempt fails, it is possible to retry using another path
   and destination, but changing the protocol stack instance requires
   creating a new socket with a different file descriptor.

   For further discussion of the problems that file descriptors cause
   for asynchronous I/O, see the end of Section 6.3.

6.3.  Asynchronous API Anarchy

   Network I/O is asynchronous, but asynchronous I/O within the POSIX
   filesystem API is hard to use.  There are at least three different
   asynchronous I/O APIs on each operating system.

   To implement asynchronous I/O for our Socket Intents prototype, we
   wrapped one of the asynchronous I/O APIs that is available on most
   platforms: "select()".  To make Socket Intents accessible to more
   applications and on more platforms, a production-grade system would
   need to wrap all asynchronous I/O APIs and implement most of the
   socket creation logic, path selection and connection logic within
   these wrappers.  However, mixing asynchronous I/O and multithreading
   may lead to unintuitive behavior, e.g., calling our prototype's
   select() from different threads could lead to anything from deadlocks
   to busy waiting.

   Another issue is that we use Unix domain sockets to communicate
   between our Multiple Access Manager and the Socket Intents API
   library called by the application, so we need to make sure that the
   application does not block on communication with the Multiple Access
   Manager.

   The problems with using file descriptors also get worse.  If a
   Socket API call should return immediately, it needs to provide the
   application with a reference to a flow that has not yet been fully
   set up, i.e., a reference to a "future" socket.  An implementation of
   such an asynchronous API has to return an unconnected socket file
   descriptor, on which the application then calls, e.g., "select()",
   and starts using it once it becomes readable and writable.  If the
   destination, path and transport protocol have not been chosen yet at
   this point, the file descriptor returned by the implementation might
   not yet have the final family and transport protocol.  When the
   implementation later creates the final socket of the right type, it
   can re-bind it to the number of the originally returned file
   descriptor using "dup2()".  This procedure can easily lead to
   time-of-check to time-of-use confusion.  To make things even worse,
   the application can copy the "future" file descriptor using "dup()",
   which is rarely useful for sockets, but in combination with file
   descriptors used as "futures" it leads to unexpected behavior.
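
   The re-binding step and its interaction with copied descriptors can
   be sketched as follows.  This is an illustration under assumptions
   and not the prototype's code: the helper names "socket_type()" and
   "resolve_future()" are invented here, and any socket could serve as
   the placeholder.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the type (e.g., SOCK_STREAM) of a socket, or -1 on error. */
int socket_type(int fd)
{
    int type = -1;
    socklen_t len = sizeof(type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) != 0)
        return -1;
    return type;
}

/* Creates the final socket and installs it under the descriptor number
 * that was handed out earlier as a "future".  The placeholder socket is
 * closed implicitly by dup2().  Returns 0 on success, -1 on failure. */
int resolve_future(int future_fd, int family, int socktype, int protocol)
{
    int final_fd = socket(family, socktype, protocol);

    if (final_fd < 0)
        return -1;
    if (dup2(final_fd, future_fd) < 0) {
        close(final_fd);
        return -1;
    }
    close(final_fd);    /* future_fd now refers to the final socket */
    return 0;
}
```

   If the application called "dup()" on the descriptor before
   "resolve_future()" ran, the copy keeps referring to the placeholder
   socket: "socket_type()" then reports different types for the two
   descriptor numbers, although the application regards them as the same
   flow.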

6.4.  Here Be Dragons hiding in Shadow Structures

   The API variants described in Section 5.1 and Section 5.3 need to
   keep a lot of state in shadow structures, as this state cannot
   otherwise be passed between the Socket API calls.  It needs to be
   cleaned up when the last copy of the file descriptor is closed or the
   last socket held for reuse has timed out.  In addition, access to
   these shadow structures has to be thread-safe.

   Implementing both cleanup and thread safety has turned out to be
   extremely error-prone, and the system library exhibits a large amount
   of unspecified behavior and platform-dependent extensions.  These
   issues guarantee that an implementation of transport option selection
   that nicely integrates with BSD Sockets will come with lots of
   limitations and will not be portable across POSIX-compliant operating
   systems.
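
   To make the two requirements concrete, the following sketch shows a
   reference-counted, mutex-protected shadow table indexed by file
   descriptor number.  It is a deliberately simplified illustration and
   not the prototype's data structure: all names ("shadow_attach()",
   "shadow_share()", "shadow_release()") and the fixed table size are
   assumptions.

```c
#include <pthread.h>
#include <stdlib.h>

#define SHADOW_MAX_FD 1024          /* assumption: small fixed table */

struct shadow {
    int refcount;                   /* descriptors sharing this state */
    void *state;                    /* heap-allocated per-flow state */
};

static struct shadow *shadow_table[SHADOW_MAX_FD];
static pthread_mutex_t shadow_lock = PTHREAD_MUTEX_INITIALIZER;

static int fd_in_range(int fd) { return fd >= 0 && fd < SHADOW_MAX_FD; }

/* Attach state to a new descriptor; the table takes ownership of it. */
int shadow_attach(int fd, void *state)
{
    struct shadow *s;
    int rc = -1;

    if (!fd_in_range(fd) || (s = malloc(sizeof(*s))) == NULL)
        return -1;
    s->refcount = 1;
    s->state = state;
    pthread_mutex_lock(&shadow_lock);
    if (shadow_table[fd] == NULL) {
        shadow_table[fd] = s;
        rc = 0;
    }
    pthread_mutex_unlock(&shadow_lock);
    if (rc != 0)
        free(s);                    /* slot was already in use */
    return rc;
}

/* Record that new_fd is a copy of old_fd, e.g., after dup(). */
int shadow_share(int old_fd, int new_fd)
{
    int rc = -1;

    if (!fd_in_range(old_fd) || !fd_in_range(new_fd))
        return -1;
    pthread_mutex_lock(&shadow_lock);
    if (shadow_table[old_fd] != NULL && shadow_table[new_fd] == NULL) {
        shadow_table[new_fd] = shadow_table[old_fd];
        shadow_table[new_fd]->refcount++;
        rc = 0;
    }
    pthread_mutex_unlock(&shadow_lock);
    return rc;
}

/* Drop one descriptor.  Returns 1 if this was the last copy and the
 * state was freed, 0 if other copies remain, -1 if fd has no state. */
int shadow_release(int fd)
{
    struct shadow *s;
    int rc = -1;

    if (!fd_in_range(fd))
        return -1;
    pthread_mutex_lock(&shadow_lock);
    s = shadow_table[fd];
    if (s != NULL) {
        shadow_table[fd] = NULL;
        rc = (--s->refcount == 0);
    }
    pthread_mutex_unlock(&shadow_lock);
    if (rc == 1) {
        free(s->state);
        free(s);
    }
    return rc;
}
```

   Even this small table only stays consistent if every path that
   creates, copies, or closes a descriptor goes through the library,
   which an application using plain "dup()" or "close()" does not
   guarantee.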

7.  Conclusion

   Adding transport option selection to BSD Sockets is hard, as the API
   calls are not designed to defer making and applying choices until
   all information needed for transport option selection is available.

   If transport option selection is limited to the granularity that
   BSD Sockets typically provide today (TCP connections and UDP
   associations), the API variant described in Section 5.2 seems to be a
   good compromise, even if it forces the application to try all
   candidates itself (either sequentially or partially in parallel).
   This option is easily deployable, but does not include automation of
   techniques like connection caching or HTTP pipelining.

   The most versatile API variant, described in Section 5.3, implements
   connection caching on the transport layer.  This comes at the cost of
   heavily modifying existing applications.  Where feasible, given the
   unnecessary complexity of the file I/O integration of BSD sockets, it
   seems easier to move to a totally different system like
   [I-D.trammell-taps-post-sockets].

8.  Acknowledgments

   The API variant described in Section 5.2 was originally drafted and
   implemented by Tobias Kaiser <mail@tb-kaiser.de> [2] as part of his
   BA thesis.

   This work has been supported by Leibniz Prize project funds of DFG -
   German Research Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ
   FE 570/4-1).

9.  References

9.1.  Informative References

   [ANRW17-MH]
              Tiesel, P., May, B., and A. Feldmann, "Multi-Homed on a
              Single Link", Proceedings of the 2016 Applied Networking
              Research Workshop - ANRW 16,
              DOI 10.1145/2959424.2959434, 2016.

   [ANRW18-Metrics]
              "Metrics for access network selection (ANRW 2018)", n.d.

   [I-D.tiesel-taps-communitgrany]
              Tiesel, P. and T. Enghardt, "Communication Units
              Granularity Considerations for Multi-Path Aware Transport
              Selection", draft-tiesel-taps-communitgrany-02 (work in
              progress), May 2018.

   [I-D.tiesel-taps-socketintents]
              Tiesel, P., Enghardt, T., and A. Feldmann, "Socket
              Intents", draft-tiesel-taps-socketintents-01 (work in
              progress), October 2017.

   [I-D.trammell-taps-post-sockets]
              Trammell, B., Perkins, C., Pauly, T., Kuehlewind, M., and
              C. Wood, "Post Sockets, An Abstract Programming Interface
              for the Transport Layer", draft-trammell-taps-post-
              sockets-03 (work in progress), October 2017.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997, <https://www.rfc-
              editor.org/info/rfc2119>.

   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
              "TCP Extensions for Multipath Operation with Multiple
              Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013,
              <https://www.rfc-editor.org/info/rfc6824>.

   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
              <https://www.rfc-editor.org/info/rfc7413>.

   [RFC7556]  Anipko, D., Ed., "Multiple Provisioning Domain
              Architecture", RFC 7556, DOI 10.17487/RFC7556, June 2015,
              <https://www.rfc-editor.org/info/rfc7556>.

9.2.  URIs

   [1] https://github.com/fg-inet/socket-intents/

   [2] mailto:mail@tb-kaiser.de

Appendix A.  API Usage Examples

A.1.  Usage Example of the Classic / muacc_context API

   In this example, a client application sets up a connection to a
   remote host and sends data to it.  It specifies two Socket Intents on
   this connection: the Category "Bulk Transfer" and a File Size of
   1 MiB (1048576 bytes).

#define LENGTH_OF_DATA 1048576

// Create and initialize a context to retain information across function
// calls
muacc_context_t ctx;
muacc_init_context(&ctx);

int socket = -1;

struct addrinfo *result = NULL;

// Initialize a buffer of data to send later.
char buf[LENGTH_OF_DATA];
memset(&buf, 0, LENGTH_OF_DATA);

// Set Socket Intents for this connection. Note that the "socket" is
// still invalid, but it does not yet need to exist at this time. The
// Socket Intents prototype just sets the Intent within the
// muacc_context data structure.

enum intent_category category = INTENT_BULKTRANSFER;
muacc_setsockopt(&ctx, socket, SOL_INTENTS,
    INTENT_CATEGORY, &category, sizeof(enum intent_category));

int filesize = LENGTH_OF_DATA;
muacc_setsockopt(&ctx, socket, SOL_INTENTS,
    INTENT_FILESIZE, &filesize, sizeof(int));


// Resolve a host name. This involves a request to the MAM, which can
// automatically choose a suitable local interface or other parameters
// for the DNS request and set other parameters, such as preferred
// address family or transport protocol.
muacc_getaddrinfo(&ctx, "example.org", NULL, NULL, &result);

// Create the socket with the address family, type, and protocol
// obtained by getaddrinfo.
socket = muacc_socket(&ctx, result->ai_family, result->ai_socktype,
    result->ai_protocol);

// Connect the socket to the remote endpoint as determined by
// getaddrinfo.  This involves another request to MAM, which may at this
// point, e.g., choose to bind the socket to a local IP address before
// connecting it.
muacc_connect(&ctx, socket, result->ai_addr, result->ai_addrlen);

// Send data to the remote host over the socket.
write(socket, &buf, LENGTH_OF_DATA);

// Close the socket. This de-initializes any data that was stored within
// the muacc_context.
muacc_close(&ctx, socket);
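
   The example above issues a single "write()" for the whole buffer and
   does not check its return value.  As "write()" may accept fewer bytes
   than requested, an application would usually loop until all data has
   been handed to the socket.  The helper below is a sketch of such a
   loop using only POSIX calls; its name "write_all()" is made up for
   this example and it is not part of the Socket Intents API.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes exactly len bytes from data to fd, retrying after short
 * writes and interruptions by signals.  Returns len on success or -1
 * on error (errno is set by write()). */
ssize_t write_all(int fd, const void *data, size_t len)
{
    const char *p = data;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, try again */
            return -1;
        }
        p += n;                 /* skip what has been accepted */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

   In the example above, the call "write(socket, &buf, LENGTH_OF_DATA)"
   could then be replaced by "write_all(socket, buf, LENGTH_OF_DATA)".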


A.2.  Usage Example of the Classic / getaddrinfo API

   As in Appendix A.1, the application sets the Intents "Category" and
   "File Size".

#define LENGTH_OF_DATA 1048576

// Define Intents to be set later
enum intent_category category = INTENT_BULKTRANSFER;
int filesize = LENGTH_OF_DATA;

struct socketopt intents = { .level = SOL_INTENTS,
    .optname = INTENT_CATEGORY, .optval = &category, .next = NULL};
struct socketopt filesize_intent = { .level = SOL_INTENTS,
    .optname = INTENT_FILESIZE, .optval = &filesize, .next = NULL};

intents.next = &filesize_intent;

// Initialize a buffer of data to send later.
char buf[LENGTH_OF_DATA];
memset(&buf, 0, LENGTH_OF_DATA);

struct muacc_addrinfo intent_hints = { .ai_flags = 0,
    .ai_family = AF_INET, .ai_socktype = SOCK_STREAM, .ai_protocol = 0,
    .ai_sockopts = &intents, .ai_addr = NULL, .ai_addrlen = 0,
    .ai_bindaddr = NULL, .ai_bindaddrlen = 0, .ai_next = NULL };

struct muacc_addrinfo *result = NULL;

muacc_ai_getaddrinfo("example.org", NULL, &intent_hints,
    &result);

// Create and connect the socket, using the information obtained through
// getaddrinfo
int fd;
fd = socket(result->ai_family, result->ai_socktype,
    result->ai_protocol);
muacc_ai_simple_connect(fd, result);

// Send data to the remote host over the socket, then close it.
write(fd, &buf, LENGTH_OF_DATA);
close(fd);

muacc_ai_freeaddrinfo(result);


A.3.  Usage Example of the Socketconnect API

   As in Appendix A.1, the application sets the Intents "Category" and
   "File Size".  As we provide "-1" as the socket, we do not reuse an
   existing connection.

   #define LENGTH_OF_DATA 1048576

   // Define Intents to be set later
   enum intent_category category = INTENT_BULKTRANSFER;
   int filesize = LENGTH_OF_DATA;

   struct socketopt intents = { .level = SOL_INTENTS,
       .optname = INTENT_CATEGORY, .optval = &category, .next = NULL};
   struct socketopt filesize_intent = { .level = SOL_INTENTS,
       .optname = INTENT_FILESIZE, .optval = &filesize, .next = NULL};

   intents.next = &filesize_intent;

   // Initialize a buffer of data to send later.
   char buf[LENGTH_OF_DATA];
   memset(&buf, 0, LENGTH_OF_DATA);

   int socket = -1;

   // Get a socket that is connected to the given host and service,
   // with the given Intents
   socketconnect(&socket, "example.org", 11, "80", 2, &intents, AF_INET,
       SOCK_STREAM, 0);

   // Send data to the remote host over the socket.
   write(socket, &buf, LENGTH_OF_DATA);

   // Close the socket and tear down the data structure kept for it
   // in the library
   socketclose(socket);

Appendix B.  Changes

B.1.  Since -01

   o  Updated list of gathered path characteristics

   o  Reordered start of Policy section to make it clearer

B.2.  Since -00

   o  Fixed Author's affiliations and funding

   o  Fixed acknowledgments

Authors' Addresses

   Philipp S. Tiesel
   TU Berlin
   Marchstr. 23
   Berlin
   Germany

   Email: philipp@inet.tu-berlin.de


   Theresa Enghardt
   TU Berlin
   Marchstr. 23
   Berlin
   Germany

   Email: theresa@inet.tu-berlin.de
