Network Working Group J. Hildebrand Internet-Draft Cisco Systems Intended status: Informational B. Trammell Expires: August 16, 2015 ETH Zurich February 12, 2015 Substrate Protocol for User Datagrams (SPUD) Prototype draft-hildebrand-spud-prototype-01 Abstract SPUD is a prototype for grouping UDP packets together in a "tube", also allowing network devices on the path between endpoints to participate explicitly in the tube outside the end-to-end context. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on August 16, 2015. Copyright Notice Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Hildebrand & Trammell Expires August 16, 2015 [Page 1] Internet-Draft I-D February 2015 1. Introduction The goal of SPUD (Substrate Protocol for User Datagrams) is to provide a mechanism for grouping UDP packets together into a "tube" with a defined beginning and end in time. Devices on the network path between the endpoints speaking SPUD may communicate explicitly with the endpoints outside the context of the end-to-end conversation. The SPUD protocol is a prototype, intended to promote further discussion of potential use cases within the framework of a concrete approach. To move forward, ideas explored in this protocol might be implemented inside another protocol such as DTLS. 1.1. Terminology In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119 [RFC2119]. 2. Requirements o Deploy on existing Internet o No kernel modifications required o Only widely-available APIs required o No root permissions required for endpoint applications o New choices for congestion, retransmit, etc. available in transport protocols inside SPUD o Single firewall-traversal mechanism, multiple transport semantics o Low overhead * Determine SPUD is in use (very fast) * Associate packets with a tube (relatively fast) o Policy per-tube o Multiple interfaces for each endpoint Hildebrand & Trammell Expires August 16, 2015 [Page 2] Internet-Draft I-D February 2015 3. Lifetime of a tube A tube is a grouping of packets between two endpoints on the network. Tubes are started by the "initiator" expressing an interest in comminicating with the "responder". A tube may be closed by either endpoint. A tube may be in one of the following states: unknown no information is currently known about the tube. All tubes implicitly start in the unknown state. opening the initiator has requested a tube that the responder has not yet acknowledged. running the tube is set up and will allow data to flow resuming an out-of-sequence SPUD packet has been received for this tube. Policy will need to be developed describing how (or if) this state can be exploited for quicker tube resumption by higher- level protocols. This leads to the following state transitions (see Section 4.2 for details on the commands that cause transitions): +--------------------+ +-----+ | | |close| | v | v | +-----open--- +-------+ <--close----+ | | |unknown| | | | +------> +-------+ --ack,-+ | | | | data | | | | close | | | v | v | | +-------+ -------data-------> +--------+ | +---|opening| |resuming|---+ | | +-------+ <------open-------- +--------+ | | | ^ | | ^ | | | | | | | | | +open-+ +-ack--> +-------+ <--ack-+ +-data+ | |running| +-------close------- +-------+ ^ | | | open,ack,data +----+ Figure 1: State transitions Hildebrand & Trammell Expires August 16, 2015 [Page 3] Internet-Draft I-D February 2015 4. Packet layout SPUD packets are sent inside UDP packets, with the SPUD header directly after the UDP header. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | magic = 0xd80000d8 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |cmd|a|p| tube ID | +-+-+-+-+ + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CBOR Map | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 2: SPUD packets The fields in the packet are: o 32-bit constant magic number (see Section 4.1) o 2 bits of command (see Section 4.2) o 1 bit marking this packet as an application declaration (adec) o 1 bit marking this packet as a path declaration (pdec) o 60 bits defining the id of this tube o Data. If any of the command, adec, or pdec bits are set, the data is CBOR. 4.1. Detecting usage The first 32 bits of every SPUD packet is the constant bit pattern d80000d8 (hex), or 1101 1000 0000 0000 1101 1000 (binary). This pattern was selected to be invalid UTF-8, UTF-16 (both big- and little-endian), and UTF-32 (both big- and little-endian). The intent is to ensure that text-based non-SPUD protocols would not use this pattern by mistake. A survey of other protocols will be done to see if this pattern occurs often in existing traffic. The intent of this magic number is not to provide conclusive evidence that SPUD is being used in this packet, but instead to allow a very fast (i.e., trivially implementable in hardware) way to decide that SPUD is not in use on packets that do not include the magic number. Hildebrand & Trammell Expires August 16, 2015 [Page 4] Internet-Draft I-D February 2015 4.2. Commands The next 2 bits of a SPUD packet encode a command: Data (00) Normal data in a running tube Open (01) A request to begin a tube Close (10) A request to end a tube Ack (11) An acknowledgement to an open request 4.3. Declaration bits The adec bit is set when the application is making a declaration to the path. The pdec bit is set when the path is making a declaration to the application. 4.4. Additional information The information after the SPUD header is a CBOR [RFC7049] map (major type 5). Each key in the map may be an integer (major type 0 or 1) or a text string (major type 3). Integer keys are reserved for standardized protocols, with a registry defining their meaning. This convention can save several bytes per packet, since small integers only take a single byte in the CBOR encoding, and a single-character string takes at least two bytes (more when useful-length strings are used). The only integer keys reserved by this version of the document are: 0 (anything) Application Data. Any CBOR data type, used as application-specific data. Often this will be a byte string (major type 2), particularly for protocols that encrypt data. The overhead for always using CBOR is therefore effectively three or more bytes 0xA1 (map with one element), 0x00 (integer 0 as the key), and 0x41 (byte string containing one byte). [EDITOR'S NOTE: It may be that the simplicity and extensisbility of this approach is worth the three bytes of overhead.] 5. Initiating a tube To begin a tube, the initiator sends a SPUD packet with the "open" command (bits 01). Future versions of this specification may contain CBOR requesting proof of implementation from the receiving endpoint. Hildebrand & Trammell Expires August 16, 2015 [Page 5] Internet-Draft I-D February 2015 6. Acknowledging tube creation To acknowledge the creation of a tube, the responder sends a SPUD packet with the "ack" command (bits 11). The current thought is that the security provided by the TCP three-way handshake would be left to transport protocols inside of SPUD. Further exploration of this prototype will help decide how much of this handshake needs to be made visible to path elements that _only_ process SPUD. 7. Closing a tube To close a tube, either side sends a packet with the "close" command (bits 10). Whenever a path element sees a close packet for a tube, it MAY drop all stored state for that tube. Further exploration of this prototype will determine when close packets are sent, what CBOR they contain, and how they interact with transport protocols inside of SPUD. What is likely at this time is that SPUD close packets MAY contain error information in the following CBOR keys (and associated values): "error" (map, major type 5) a map from text string (major type 3) to text string. The keys are [RFC5646] language tags, and the values are strings that can be presented to a user that understands that language. The key "*" can be used as the default. "url" (text string, major type 3) a URL identifying some information about the path or its relationship with the tube. The URL represents some path condition, and retrieval of content at the URL should include a human-readable description. 8. Path declarations SPUD can be used for path declarations: information delivered to the endpoints from devices along the path. Path declarations can be thought of as enhanced ICMP for transports using SPUD, allowing information about the condition or state of the path or the tube to be communicated directly to a sender. Path declarations may be sent in either direction (toward the initiator or responder) at any time. The scope of a path declaration is the tube (identified by tube ID) to which it is associated. Devices along the path cannot make declarations to endpoints without a tube to associate them with. Path declarations are sent to one endpoint in a SPUD conversation by the path device sending SPUD packets with the source IP address and UDP port from the other endpoint in the conversation. These "spoofed" packets are required to allow existing network elements that pass traffic for a given Hildebrand & Trammell Expires August 16, 2015 [Page 6] Internet-Draft I-D February 2015 5-tuple to continue to work. To ensure that the context for these declarations is correct, path declaration packets MUST have the pdec bit set. Path declarations MUST use the "data" command (bits 00). Path declarations do not imply specific required actions on the part of receivers. Any path declaration MAY be ignored by a receiving application. When using a path declaration as input to an algorithm, the application will make decisions about the trustworthiness of the declaration before using the data in the declaration. The data associated with a path declaration may always have the following keys (and associated values), regardless of what other information is included: "ipaddr" (byte string, major type 2) the IPv4 address or IPv6 address of the sender, as a string of 4 or 16 bytes in network order. This is necessary as the source IP address of the packet is spoofed "cookie" (byte string, major type 2) data that identifies the sending path element unambiguously "url" (text string, major type 3) a URL identifying some information about the path or its relationship with the tube. The URL represents some path condition, and retrieval of content at the URL should include a human-readable description. "warning" (map, major type 5) a map from text string (major type 3) to text string. The keys are [RFC5646] language tags, and the values are strings that can be presented to a user that understands that language. The key "*" can be used as the default. The SPUD mechanism is defined to be completely extensible in terms of the types of path declarations that can be made. However, in order for this mechanism to be of use, endpoints and devices along the path must share a relatively limited vocabulary of path declarations. The following subsections briefly explore declarations we believe may be useful, and which will be further developed on the background of concrete use cases to be defined as part of the SPUD effort. Terms in this vocabulary considered universally useful may be added to the SPUD path declaration map keys, which in this case would then be defined as an IANA registry. Hildebrand & Trammell Expires August 16, 2015 [Page 7] Internet-Draft I-D February 2015 8.1. ICMP ICMP [RFC4443] (e.g.) messages are sometimes blocked by path elements attempting to provide security. Even when they are delivered to the host, many ICMP messages are not made available to applications through portable socket interfaces. As such, a path element might decide to copy the ICMP message into a path declaration, using the following key/value pairs: "icmp" (byte string, major type 2) the full ICMP payload. This is intended to allow ICMP messages (which may be blocked by the path, or not made available to the receiving application) to be bound to a tube. Note that sending a path declaration ICMP message is not a substitute for sending a required ICMP or ICMPv6 message. "icmp-type" (unsigned, major type 0) the ICMP type "icmp-code" (unsigned, major type 0) the ICMP code Other information from particular ICMP codes may be parsed out into key/value pairs. 8.2. Address translation SPUD-aware path elements that perform Network Address Translation MUST send a path declaration describing the translation that was done, using the following key/value pairs: "translated-external-address" (byte string, major type 2) The translated external IPv4 address or IPv6 address for this endpoint, as a string of 4 or 16 bytes in network order "translated-external-port" (unsigned, major type 0) The translated external UDP port number for this endpoint "internal-address" (byte string, major type 2) The pre-translation (internal) IPv4 address or IPv6 address for this endpoint, as a string of 4 or 16 bytes in network order "internal-port" (unsigned, major type 0) The pre-translation (internal) UDP port number for this endpoint The internal addresses are useful when multiple address translations take place on the same path. Hildebrand & Trammell Expires August 16, 2015 [Page 8] Internet-Draft I-D February 2015 8.3. Tube lifetime SPUD-aware path elements that are maintaining state MAY drop state using inactivity timers, however if they use a timer they MUST send a path declaration in both directions with the length of that timer, using the following key/value pairs: "inactivity-timer" (unsigned, major type 0) The length of the inactivity timer (in microseconds). A value of 0 means no timeout is being enforced by this path element, which might be useful if the timeout changes over the lifetime of a tube. 8.4. Explicit congestion notification Similar to ICMP, getting explicit access to ECN [RFC3168] information in applications can be difficult. As such, a path element might decide to generate a path declaration using the following key/value pairs: "ecn" (True, major type 7) congestion has been detected [EDITOR'S NOTE: we will track current proposals to improve ECN resolution here. DCTCP uses higher marking rate and lower response rate to get high resolution marking; we have ints, which are more powerful, if we can find an algorithm simple enough for path elements to use.] 8.5. Path element identity Path elements can describe themselves using the following key/value pairs: "description" (text string, major type 3) the name of the software, hardware, product, etc. that generated the declaration "version" (text string, major type 3) the version of the software, hardware, product, etc. that generated the declaration "caps" (byte string, major type 2) a hash of the capabilities of the software, hardware, product, etc. that generated the declaration [TO BE DESCRIBED] "ttl" (unisigned integer, major type 0) IP time to live / IPv6 Hop Limit of associated device [EDITOR'S NOTE: more detail is required on how this is calculated] Hildebrand & Trammell Expires August 16, 2015 [Page 9] Internet-Draft I-D February 2015 8.6. Maximum Datagram Size A path element may tell the endpoint the maximum size of a datagram it is willing or able to forward for a tube, to augment various path MTU discovery mechanisms. This declaration uses the following key/ value pairs: "mtu" (unsigned, major type 0) the maximum transmission unit (in bytes) 8.7. Rate Limit A path element may tell the endpoint the maximum data rate (in octets or packets) that it is willing or able to forward for a tube. As all path declarations are advisory, the device along the path must not rely on the endpoint to set its sending rate at or below the declared rate limit, and reduction of rate is not a guarantee to the endpoint of zero queueing delay. This mechanism is intended for "gross" rate limitation, i.e. to declare that the output interface is connected to a limited or congested link, not as a substitute for loss-based or explicit congestion notification on the RTT timescale. This declaration uses the following key/value pairs: "max-byte-rate" (unsigned, major type 0) the maximum bandwidth (in bytes per second) "max-packet-rate" (unsigned, major type 0) the maximum bandwidth (in packets per second) 8.8. Latency Advisory A path element may tell the endpoint the latency attributable to traversing that path element. This mechanism is intended for "gross" latency advisories, for instance to declare the output interface is connected to a satellite or [RFC1149] link. This declaration uses the following key/value pairs: "latency" (unsigned, major type 0) the latency (in microseconds) 8.9. Prohibition Report A path element which refuses to forward a packet may declare why the packet was not forwarded, similar to the various Destination Unreachable codes of ICMP. [EDITOR'S NOTE: Further thought will be given to how these reports interact with the ICMP support from Section 8.1.] Hildebrand & Trammell Expires August 16, 2015 [Page 10] Internet-Draft I-D February 2015 9. Declaration reflection In some cases, a device along the path may wish to send a path declaration but may not be able to send packets ont he reverse path. It may ask the endpoint in the forward direction to reflect a SPUD packet back along the reverse path in this case. [EDITOR'S NOTE: Bob Briscoe raised this issue during the SEMI workshop, which has largely to do with tunnels. It is not clear to the authors yet how a point along the path would know that it must reflect a declaration, but this approach is included for completeness.] A reflected declaration is a SPUD packet with both the pdec and adec flags set, and contains the same content as a path declaration would. However the packet has the same source address and port and destination address and port as the SPUD packet which triggered it. When a SPUD endpoint receives a declaration reflection, it SHOULD reflect it: swapping the source and destination addresses IP addresses and ports. The reflecting endpoint MUST unset the adec bit, sending the packet it as if it were a path declaration. [EDITOR's NOTE: this facility will need careful security analysis before it makes it into any final specification.] 10. Application declarations Applications may also use the SPUD mechanism to describe the traffic in the tube to the application on the other side, and/or to any point along the path. As with path declarations, the scope of an application declaration is the tube (identified by tube ID) to which it is associated. An application declaration is a SPUD packet with the adec flag set, and contains an application declaration formatted in CBOR in its payload. As with path declarations, an application declaration is a CBOR map, which may always have the following keys: o cookie (byte string, major type 2): an identifier for this application declaration, used to address a particular path element Unless the cookie matches one sent by the path element for this tube, every device along the path MUST forward application declarations on towards the destination endpoint. Hildebrand & Trammell Expires August 16, 2015 [Page 11] Internet-Draft I-D February 2015 The definition of an application declaration vocabulary is left as future work; we note only at this point that the mechanism supports such declarations. 11. CBOR Profile Moving forward, we will likely specify a subset of CBOR that can be used in SPUD, including the avoidance of floating point numbers, indefinite-length arrays, and indefinite-length maps. This will allow a significantly less complicated CBOR implementation to be used, which would be particularly nice on constrained devices. 12. Security Considerations This gives endpoints the ability to expose information about conversations to elements on path. As such, there are going to be very strict security requirements about what can be exposed, how it can be exposed, etc. This prototype DOES NOT tackle these issues yet. The goal is to ensure that this layer is better than TCP from a security perspective. The prototype is clearly not yet to that point. 13. IANA Considerations If this protocol progresses beyond prototype in some way, a registry will be needed for well-known CBOR map keys. 14. Acknowledgements Thanks to Ted Hardie for suggesting the change from "Session" to "Substrate" in the title, and to Joel Halpern for suggesting the change from "session" to "tube" in the protocol description. 15. References 15.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001. Hildebrand & Trammell Expires August 16, 2015 [Page 12] Internet-Draft I-D February 2015 [RFC4443] Conta, A., Deering, S., and M. Gupta, "Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification", RFC 4443, March 2006. [RFC5646] Phillips, A. and M. Davis, "Tags for Identifying Languages", BCP 47, RFC 5646, September 2009. [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", RFC 7049, October 2013. 15.2. Informative References [RFC1149] Waitzman, D., "Standard for the transmission of IP datagrams on avian carriers", RFC 1149, April 1990. Authors' Addresses Joe Hildebrand Cisco Systems Email: jhildebr@cisco.com Brian Trammell ETH Zurich Email: ietf@trammell.ch Hildebrand & Trammell Expires August 16, 2015 [Page 13]