Internet Engineering Task Force                                M. Scharf
Internet-Draft                                  Alcatel-Lucent Bell Labs
Intended status: Experimental                               July 1, 2010
Expires: January 2, 2011

                 Multi-Connection TCP (MCTCP) Transport
                      draft-scharf-mptcp-mctcp-00

Abstract

   Multipath transport over potentially different paths can be realized by several coupled Transmission Control Protocol (TCP) connections. Multi-Connection TCP (MCTCP) transport aggregates multiple TCP connections between potentially different addresses into a single session that can be accessed by an application like a single TCP connection. MCTCP encodes control information, as far as possible, in the payload of the TCP connections and therefore requires only minor changes in the TCP implementations, and it is transparent in the single-path case. MCTCP is therefore proposed as a simple, modular, and extensible mechanism for multipath transport.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 2, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Scharf                   Expires January 2, 2011                [Page 1]

Internet-Draft            Multi-Connection TCP                 July 2010

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Design Considerations
     3.1.  Objectives
     3.2.  Operation Summary
     3.3.  Differences to Other Multipath Transport Solutions
   4.  TCP Extensions by MCTCP
     4.1.  Setup of the Initial Connection
     4.2.  Setup of Coupled Connection
     4.3.  Usage of Coupled Connections
     4.4.  Operation Mode Switch
   5.  MCTCP Session Protocol Messages
     5.1.  Data Segmentation and Encoding
     5.2.  Address Advertisement
     5.3.  Connection Management
   6.  MCTCP Session Policies and Algorithms
     6.1.  Message Scheduling
     6.2.  Congestion and Flow Control
   7.  Interfaces
     7.1.  Interface between MCTCP and TCP
     7.2.  Interface to Applications
   8.  Interaction with Middleboxes
     8.1.  Middleboxes that Manipulate TCP Options
     8.2.  Middleboxes that Change Content
     8.3.  Middleboxes that Translate Addresses/Ports
     8.4.  Middleboxes that Want to Control MCTCP Traffic
   9.  Open Issues
   10. Security Considerations
   11. IANA Considerations
   12. Conclusion
   13. Acknowledgments
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Appendix A.  Possible Future MCTCP Extension

1.  Introduction

   The objective of Multipath TCP is to enable transport over multiple paths that behaves like a regular TCP connection [1]. The motivation for using multiple paths, as well as design considerations, are discussed in [5]. One key question concerning the Multipath TCP protocol design is how to transport the control information, which is required for the setup and the teardown of different sub-flows, as well as for the segmentation and reassembly of the byte stream in the sender and receiver, respectively. One possibility is to encode this signaling information in several new TCP options [6].

   This document describes Multi-Connection TCP (MCTCP) transport. MCTCP is an alternative solution that transports both application and control data with its own framing mechanism in the payload of parallel TCP connections, but only if multipath transport is really needed. MCTCP is simpler and more modular while providing almost the same service as a Multipath TCP protocol with option signaling.

   To applications, MCTCP offers the same reliable, in-order, byte-stream transport as TCP.
   It is designed to be backward-compatible with both applications and the network layer. Applications can use MCTCP exactly like a single TCP connection, as described in [9].

   As long as multiple paths are not used, an MCTCP transfer is identical to a standard TCP transfer, except for a new TCP option in SYN segments that detects MCTCP support in the remote end. Once multi-connection transfer is enabled, data chunks are sent over several TCP connections with a new type-length-value (TLV) framing format. This framing also permits the exchange of arbitrary amounts of control information between the endpoints of the MCTCP session.

   The multiple TCP connections operate independently, but the MCTCP session coordinates the congestion control states. MCTCP can therefore use a coupled congestion control (e.g., [8]) that does not harm other network users.

2.  Terminology

   This document uses a terminology that differs slightly from [6]:

   Path:  A sequence of links between a sender and a receiver, defined in this context by a source and destination address pair.

   Initial connection:  The first TCP connection between the two endpoints of the MCTCP session.

   Coupled connection:  A coupled connection is a follow-up TCP connection that is part of the session. It roughly corresponds to a "subflow" in [6].

   Session:  A collection of the initial connection and, if in use, one or more coupled TCP connections. The applications at the two endpoints of the session can communicate as if there were only a single TCP connection. For an application, there is a one-to-one mapping between a session and the socket. If a session includes only the initial connection, it is almost identical to a standard TCP connection, except for a new TCP option in the SYN segments.
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [3].

3.  Design Considerations

   This section gives a high-level, non-normative overview of MCTCP's design and its usage.

3.1.  Objectives

   With multipath transport, applications should be able to use the aggregated bandwidth of several paths without dealing with the details of data transport, path management, scheduling, and congestion control. This can improve both performance and resilience compared to the current data transport that is mostly limited to a single path.

   Yet, a multipath transport solution that requires multiple addresses on at least one side will only be useful under certain constraints: First, it requires endsystems with more than one interface. One example is mobile devices with several radio interfaces, which are increasingly common. But even in that case it can make sense to use one interface only, for instance in order to save battery energy. Second, due to the signaling overhead and the effort of negotiation, a multipath transport mechanism is mainly useful for long bulk data file transfers. In the Internet, this use case only represents a small subset of TCP's usage scenarios.

   Given this rather specific use case, this document argues that a multipath transport mechanism should neither require complex modifications of the TCP stack, nor fundamentally change the TCP data transmission as seen by middleboxes on the path, at least as long as only a single path is in use. Obviously, once multipath transport is enabled, any middlebox performing deep packet inspection may get confused, as it will only see that part of the byte stream that is transported over the corresponding path. As a consequence, one can use a different framing format in that case.
   Furthermore, rapid deployment of a multipath solution would also significantly benefit from the possibility to implement it in user space, as far as possible.

   Multi-Connection TCP (MCTCP) transport is designed to be a simple, modular, extensible, and non-disruptive multipath transport mechanism. Key design objectives are:

   o  Backward-compatibility: MCTCP is designed to be entirely backward-compatible with a single TCP connection and falls back to standard TCP if it is not supported by both endsystems, or if the setup of additional coupled connections fails.

   o  Few TCP options only: MCTCP only requires new, short TCP options in SYN segments, at least for the basic operation. As a result, middleboxes that strip, duplicate, or modify TCP options, drop such packets, or reassemble the byte stream cannot affect the integrity of the data transport.

   o  Identical byte stream: MCTCP's byte stream is identical to a TCP connection until multipath usage gets negotiated, except for the new TCP option in the SYN. As a fallback, it is in principle even possible to seamlessly continue the transport of the whole application data over the initial TCP connection if multipath transport fails (e.g., due to middleboxes).

   o  Simplicity: MCTCP tries to minimize the changes required inside existing network stacks. Except for a few straightforward additions, a coupled TCP connection is set up, maintained, and closed like a standard TCP connection. The major functions of MCTCP can be implemented in user space.

   o  Same API: MCTCP can provide the same API to applications as the existing TCP.

   o  Multi-address assumption: MCTCP assumes that one or both endpoints of an MCTCP session are multihomed and multiaddressed.

   These objectives are achieved by defining two different operation modes of MCTCP, the single-connection and the multi-connection mode.

3.2.  Operation Summary

   In single-connection mode, an MCTCP session is equivalent to a single TCP connection.
   The required minimum of control information is exchanged by TCP options. When multipath transfer is to be enabled, MCTCP switches to the multi-connection mode, in which it opens additional, coupled TCP connections from or to possibly different addresses of the same endsystems. Initial and coupled connections are linked by two tokens in each session endpoint, which are exchanged during the setup of the initial connection. Each coupled TCP connection can transport control information and data chunks in messages that are encoded in a type-length-value (TLV) framing format.

   In multi-connection mode, the MCTCP transport on one of the coupled TCP connections is similar to the Transport Layer Security (TLS) protocol [4], except that data is not encrypted but partitioned over different connections. TLS can be used on top of MCTCP without requiring any adaptation.

   In summary, in single-connection mode MCTCP is transparent, while in multi-connection mode it acts as a shim layer between several coupled TCP connections and the upper protocol layers, with a payload encoding similar to TLS. This document does not specify that the MCTCP session can fall back to single-connection mode once multi-connection mode is successfully activated on both MCTCP session endpoints, but this feature could be added as a means to further increase MCTCP's robustness when facing problems with certain types of middleboxes.

                  +-------------------------------+
                  |          Application          |
                  +-------------------------------+
                                 ^^^^
                                 ||||  Byte stream (e.g., socket interface)
                                 VVVV
                  +-------------------------------+
                  |      MCTCP session layer      |
                  +-------------------------------+
                    ^^   ^                 ^   ^^
           Chunked  ||   :  Connection &   :   ||  Chunked
           data     ||   :  cong. control  :   ||  data
                    VV   V                 V   VV
                  +---------------+---------------+
                  | TCP connection| TCP connection|
                  +-------------------------------+
                  |       IP      |       IP      |
                  +-------------------------------+

                 Figure 1: MCTCP in the protocol stack

   Figure 1 shows the position of MCTCP in the protocol stack, as a shim layer between (coupled) TCP connections and upper-layer protocols or applications. For MCTCP's connection management and the coupled congestion control, the MCTCP session layer requires an additional interface to each TCP connection, as well as some simple changes in the TCP stack, e.g., to set the new TCP option in SYN segments. Both modifications are straightforward and only affect a small subset of TCP's functions.

   The MCTCP session layer can be implemented in kernel space as an extension of the socket interface processing. Alternatively, the connection management, data segmentation/reassembly, and congestion control coupling can be realized in user space, in combination with some small modifications of TCP. As an example, MCTCP could be implemented as an extension of the library that offers the socket interface to applications. In both cases the MCTCP session layer can be completely transparent to applications, i.e., they can continue to use the existing socket interface to TCP [9].

   In the following, a high-level summary of normal operation of MCTCP is provided, for the scenario shown in Figure 2:

   o  To a non-MCTCP-aware application, MCTCP will be transparent and indistinguishable from normal TCP. All MCTCP operation is handled by the MCTCP implementation, although extended APIs could provide additional control and influence [9]. An application begins by opening a TCP socket in the normal way.

   o  An MCTCP session begins in single-connection mode with a single TCP connection ("initial connection"). This is illustrated in Figure 2 between Addresses A1 and B1 on Hosts A and B, respectively.
   o  MCTCP uses a "Multipath Capable" TCP option in the SYN segments to determine whether both endsystems support MCTCP. If the option is not echoed in the SYN/ACK, the connection initiator knows that the destination is not MCTCP-capable. If the SYN segment has to be retransmitted, the connection initiator will not set the "Multipath Capable" TCP option again, in order to circumvent problems with middleboxes that cannot deal with unknown TCP options. In that case, multipath transport cannot be used to that destination.

   o  MCTCP does not exchange much signaling information in single-connection mode, as this would require further TCP options outside SYN segments. The only exception is the non-mandatory "Mode" TCP option, which can be set by one endpoint in order to signal to the other endpoint that it shall switch to multi-connection mode by establishing a coupled connection to the same destination IP address, over which additional information can then be exchanged. If this TCP option is removed on the path, MCTCP may not be able to enable multipath transport in some usage scenarios (e.g., behind NAPTs), but the single-connection transport will continue without being impacted.

   o  If additional addresses are available, and if they are to be used, MCTCP switches to the multi-connection mode.

   o  When entering multi-connection mode, the MCTCP session endpoints establish one or more coupled TCP connections. The first coupled connection should use the same IP source and destination address as the initial connection, in order to establish a control channel over which more information can be exchanged. Each coupled connection is added to the MCTCP session.

   o  MCTCP identifies multiple paths by the presence of multiple addresses at endpoints, and it can establish coupled connections between combinations of these multiple addresses.
      In the example shown in Figure 2, coupled connections are set up between A1 and B1, and between A2 and B1.

   o  The discovery and setup of additional coupled TCP connections will be achieved through a path management method described later in this document.

   o  The coupled connections use TLV-encoded messages and can thus transport both control messages and data chunks. The data chunks include a session-level sequence number to allow the in-order reassembly of the data chunks from multiple coupled connections at the receiver.

            Host A                               Host B
   ------------------------             ------------------------
   Address A1    Address A2             Address B1    Address B2
   ----------    ----------             ----------    ----------
       |             |                      |             |
       |     "Initial connection" setup     |             |  ^
       |--------------SYN+MPCAP------------>|             |  |
       |(incl. Multipath Capable TCP option)|             |  | Single-
       |             |                      |             |  | conn.
       |<----------SYN/ACK+MPCAP------------|             |  | mode
       |             |                      |             |  |
       |#####Byte stream data transfer######|             |  V
       |             |                      |             |
       ~             ~                      ~             ~
       |             |                      |             |
       |    "Coupled connections" setup     |             |
       |--------------SYN+JOIN------------->|             |
       |<-----------SYN/ACK+JOIN------------|             |  ^
       |             |                      |             |  |
       |             |------SYN+JOIN------->|             |  | Multi-
       |             |<----SYN/ACK+JOIN-----|             |  | conn.
       |             |                      |             |  | mode
       |##########TLV data transfer#########|             |  |
       |             |                      |             |  |
       |             |##TLV data transfer###|             |  V
       |             |                      |             |

                     Figure 2: MCTCP usage scenario

   For simplicity, MCTCP does not send further data over the initial connection after it has triggered the transition to multi-connection mode. As a consequence, the initial connection will be unused in multi-connection mode. This document requires that the connection be kept open as long as other coupled connections exist. This design choice is motivated later in this document.

3.3.  Differences to Other Multipath Transport Solutions

   MCTCP follows the design principles outlined in [5], but it differs from the protocol design described in [6], which uses TCP options to transport all control information.
   In the following, the key advantages of MCTCP are summarized:

   o  MCTCP does not rely on frequently sent TCP options, in particular not on options that may have to be present in many packets. In the simplest case, it only requires two new types of TCP options which are set in SYN segments only. The required options are short and do not consume much of the TCP option space, which is already scarce in SYNs. It should also be noted that the selective acknowledgment (SACK) option [2] is currently the only major TCP option that is sporadically set after connection setup. Yet, SACK options are only present after packet losses or reordering events, which are rare, and they are often set in segments without payload. Sporadically adding other new TCP options to all kinds of segments may increase the complexity of the TCP sender, since the MSS must be adapted correspondingly. As a consequence, MCTCP may also be simpler to realize in combination with TCP segmentation offload on network cards.

   o  MCTCP's operation is much more robust in combination with middleboxes that strip, duplicate, or modify TCP options and/or drop packets with unknown TCP options. The worst case is that multipath transport will not be enabled on a path with such middleboxes, but the data stream's integrity will not be affected. MCTCP can also be extended to be rather robust when middleboxes rewrite content, as it could use a checksum to safely detect content modifications in one or several connections, or even define retransmission schemes that could transfer such content over an alternative connection. Such enhanced mechanisms are only possible with a protocol on top of TCP.

   o  MCTCP offers a simple mechanism by which a middlebox can prevent the transport of any multi-connection traffic: It can simply drop SYN segments with the "Join" TCP option.
      In that case, unless routing changes, paths through that middlebox will not be used in multi-connection mode. If that middlebox is on the path of the initial connection, it will always see the whole, unmodified byte stream. This middlebox-friendly design is an advantage of the distinction between initial and coupled connections.

   o  The TCP option space is limited to 40 bytes. In multi-connection mode, MCTCP can exchange any amount of information between the endsystems. As such, it is more extensible and flexible. For instance, without length limitation, one can easily exchange a list of several IPv6 candidate addresses in the payload of a single TCP segment. It would also be possible to announce lists of candidate port numbers or even to exchange address information in the form of a Uniform Resource Identifier (URI) or any other referral object structure.

   o  The design is modular, as the operation of a single TCP connection is almost independent of the multipath transport, except for the necessary coupling of congestion control. For instance, there is no need to modify the SACK scoreboard implementation and the flow control heuristics in existing TCP implementations.

   o  MCTCP has a reasonable deployment roadmap. Most functions of MCTCP can be realized in user space with only a small patch of the TCP implementation. The required extensions inside the network stack are simple, straightforward, and non-disruptive. This means that MCTCP can initially be deployed mostly as a user-space solution, without lacking any features. As a second step, once the protocol is widely supported in the Internet, it could become an integral part of the network stack.

   o  The transport of control information in the payload is reliable and congestion-controlled. TLV-encoded messaging is straightforward and well-known, e.g., from TLS.
      MCTCP does not incorporate a mandatory acknowledgement mechanism and therefore does not require additional data transport in the reverse direction.

   o  MCTCP can be extended in the future, for instance to use a stronger protection for the coupling of connections, possibly even by exchange of cryptographic keys, if needed. A list of possible future extensions is provided in the appendix.

   MCTCP shares a number of properties with [6]. It can use a coupled congestion control in a similar way, and it is able to enable multipath transport under the same constraints.

   Still, it must be noted that there are a number of potential drawbacks of MCTCP's design as well:

   o  MCTCP is designed for the use case of a bulk data transfer that starts as a single-path transfer that is later "upgraded" in order to use multiple interfaces. This is the most obvious use case of multipath transfer, as transporting smaller amounts of data over multiple paths would result in a significant overhead. In contrast, MCTCP is less efficient if the multipath transfer is to be used right from the beginning of a transfer, due to the backward-compatible design of MCTCP's single-connection mode that results in a very limited control. If this use case was important, an MCTCP variant with payload encoding in the initial connection could be developed, too. Its design is straightforward, but left for further study, as it would only be of use in certain scenarios.

   o  MCTCP opens an additional TCP connection when switching to multi-connection mode, and it does not continue using the initial connection. The connection setup of the coupled connections results in a small delay, i.e., the path may not be completely utilized during a short time. An obvious optimization would be to transfer the congestion control state from the initial connection to the first coupled connection, in order to avoid the TCP Slow-Start there.
      Both connections should use the same path. It must be noted that not using the initial connection after the switch-over to the multi-connection mode is the simplest solution. The "handover" process and the resulting delay could be minimized by further optimization, but this is left for further study.

   o  MCTCP session endpoints do not exchange address information before entering the multi-connection mode, even if this would be possible with additional TCP options [6]. Both endsystems can initiate a change of operation mode, and address information can be exchanged by the MCTCP session protocol once this is successful. If the "Mode" TCP option is supported, an endpoint can even trigger the setup of a coupled connection by the other endpoint, e.g., if that host is located behind a NAPT. Yet, while being in single-connection mode, MCTCP provides no means to learn other addresses. As a consequence, endsystems may try to enter the multi-connection mode in vain, if they assume that their peer is multi-homed. If that peer is not multi-homed, it can either agree to switch to multi-connection mode, or deny that (by not responding with a "Join" option). In the former case, an additional TCP connection is needlessly established between both peers, and in the latter case data transfer could briefly slow down until MCTCP falls back to single-connection mode. For long-lived connections that benefit most from multi-connection mode, both cases hardly cause much harm.

   o  Given that MCTCP transports control information in the payload, it is more complex for middleboxes to parse and potentially modify MCTCP's control information. In order to do so, a middlebox has to perform deep packet inspection and reassemble the messages of the coupled TCP connection(s). This may prevent certain operations and optimizations by middleboxes. However, it should be noted that middleboxes cannot affect the payload in other related protocols such as TLS either, i.e., MCTCP is somewhat similar to TLS in that sense. Of course, middleboxes can still perform certain forms of traffic engineering for an individual coupled connection, such as randomizing initial sequence numbers or modifying the advertised receive window (which may, of course, do harm to any end-to-end connection). A middlebox that wants to prevent MCTCP usage can simply and safely drop packets with the TCP "Join" option and will then not be traversed by any multi-connection traffic.

   o  If MCTCP detects that one coupled connection stalls, it can retransmit the data enqueued in that socket buffer over another connection, which can reduce the delivery time and prevent head-of-line blocking. However, MCTCP is not designed to retransmit a lost segment immediately over another coupled connection, given that this would require complex changes of the SACK scoreboard implementation in each coupled connection. As a result, if congestion occurs on a subset of the coupled connections, the end-to-end delivery delay of MCTCP may be larger than that of a protocol that is tightly integrated into a protocol stack. In general, such an architecture could assign data more flexibly and more dynamically to the different interfaces. Yet, a reasonable MCTCP session layer scheduling can reduce the risk of head-of-line blocking by simply avoiding long send socket buffer queues.

   o  MCTCP as defined in this document does not provide some signaling mechanisms of [6], such as the "DATA FIN". While it is obviously possible to add these mechanisms as well, it will result in a more complex protocol design and is therefore not addressed in this version of the protocol specification.

4.  TCP Extensions by MCTCP

   This section describes the modifications to TCP that are required by MCTCP. MCTCP only defines additional TCP options. Several TCP options and mechanisms are similar to [6], but differ in details.
   Later, Section 7.1 describes which information inside the TCP stack an MCTCP session must have access to.

4.1.  Setup of the Initial Connection

   The initial connection of an MCTCP session is set up like a TCP connection with a three-way handshake. A connection initiator that wants to announce its MCTCP capability sets the "Multipath Capable" TCP option in the SYN, as shown in Figure 3. This option only declares that its sender is capable of using MCTCP, even if it will not be enabled for that session. It includes a field that presents a locally-unique token identifying this connection. The two tokens will be used when adding additional coupled connections to verify that the endpoint is identical.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------------------------------+
   | Kind=OPT_MPCAP|   Length=6    |         Sender Token          :
   +---------------+---------------+-------------------------------+
   :     Sender Token (contd.)     |
   +-------------------------------+

                   Figure 3: Multipath Capable option

   This option MUST only be present in packets with the SYN flag set. It is only used in the initial TCP connection, in order to identify the MCTCP session; all following (coupled) connections will use another, similar option to join the MCTCP session.

   If a SYN contains a "Multipath Capable" option but the SYN/ACK does not, it is assumed that the responder is not multipath capable and thus the MCTCP session MUST fall back to standard TCP. If a SYN does not contain a "Multipath Capable" option, the SYN/ACK MUST NOT contain one in response.

   There are two tokens in an MCTCP session, one per endsystem. The token is generated by the sender and has local meaning only. It MUST be unique for the sender. The token MUST be difficult for an attacker to guess, and thus it SHOULD be generated randomly.
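   The option layout in Figure 3 and the token requirements above can be illustrated with a minimal Python sketch. It is not part of the specification: the option kind value is a placeholder (this document does not assign a number to OPT_MPCAP), and the helper names new_token, encode_mpcap, and decode_mpcap are hypothetical. The sketch assumes a 32-bit token, as implied by Length=6.

```python
import secrets
import struct

# Placeholder kind value: the draft does not assign a number to OPT_MPCAP.
OPT_MPCAP = 253
OPT_LEN = 6  # kind (1 byte) + length (1 byte) + token (4 bytes), per Figure 3


def new_token(in_use):
    """Draw a random 32-bit token that no other local session uses yet."""
    while True:
        token = secrets.randbits(32)
        if token not in in_use:
            in_use.add(token)
            return token


def encode_mpcap(token):
    """Serialize the option in network byte order."""
    return struct.pack("!BBI", OPT_MPCAP, OPT_LEN, token)


def decode_mpcap(option):
    """Return the sender token, or None if the bytes are not a valid option."""
    if len(option) != OPT_LEN:
        return None
    kind, length, token = struct.unpack("!BBI", option)
    if kind != OPT_MPCAP or length != OPT_LEN:
        return None
    return token
```

   Drawing the token from a cryptographically strong source addresses the "difficult to guess" requirement, and the in_use set stands in for whatever bookkeeping an implementation uses to keep tokens locally unique.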
   If the SYN packets are unacknowledged, it is up to a local policy to decide how to respond. A sender SHOULD fall back to standard TCP (i.e., without the "Multipath Capable" option) after a maximum number of attempts, in order to work around middleboxes that may drop packets with unknown options. The number of attempts that are made will be up to local policy.

   Once the connection initiator has sent a SYN without the "Multipath Capable" option, it MUST fall back to regular TCP behavior, even if it subsequently receives a SYN/ACK that contains a "Multipath Capable" option. This might happen if the "Multipath Capable" SYN and subsequent non-MP-capable SYN are reordered. This is to ensure that the two endpoints end up in an interoperable state, no matter in what order the SYNs arrive at the passive opener.

4.2.  Setup of Coupled Connection

   An MCTCP session can open additional, coupled TCP connections. These coupled TCP connections all run the MCTCP session protocol with TLV encoding, as specified below. The endsystems can also use the coupled connection to exchange knowledge about their own address(es) - in particular the first one. Using this knowledge, an endpoint can initiate further coupled connections over currently unused pairs of addresses. Either endpoint that is part of an MCTCP session SHOULD be able to initiate the creation of a new coupled connection.

   A new coupled connection is started as a normal TCP three-way handshake. The "Join" TCP option (Figure 4) is used to identify the session that the new connection should join. The token used is the locally unique token of the destination for the connection, as received by the "Multipath Capable" option in the SYN/ACK exchange of the initial connection.
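   The SYN retry rule for the initial connection in Section 4.1 can be illustrated with a small Python sketch. It is only one possible reading of the text: the class name InitialConnector, its methods, and the default of two attempts are hypothetical, since the number of attempts is left to local policy.

```python
class InitialConnector:
    """Hypothetical initiator state for SYN retries on the initial connection."""

    def __init__(self, max_mpcap_attempts=2):
        # Local policy; the value 2 is an assumption, not taken from the draft.
        self.max_mpcap_attempts = max_mpcap_attempts
        self.attempts = 0
        self.sent_plain_syn = False

    def next_syn_has_mpcap(self):
        """Call once per SYN (re)transmission; True if it carries the option."""
        self.attempts += 1
        if self.attempts > self.max_mpcap_attempts:
            # From here on the initiator is committed to regular TCP.
            self.sent_plain_syn = True
        return not self.sent_plain_syn

    def mctcp_enabled(self, syn_ack_has_mpcap):
        """MCTCP is usable only if the option was echoed and no plain SYN
        was ever sent, even if a late SYN/ACK still carries the option."""
        return syn_ack_has_mpcap and not self.sent_plain_syn
```

   Recording that a plain SYN was sent, rather than only counting attempts, is what makes the reordering case safe: a delayed SYN/ACK with the option is then ignored for MCTCP purposes.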
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------------------------------+
| Kind=OPT_JOIN |   Length=6    |        Receiver Token         :
+---------------+---------------+-------------------------------+
:    Receiver Token (contd.)    |
+-------------------------------+

Figure 4: Multipath Join option

This option MUST only be present when the SYN flag is set.

The recipient of a "Join" option with a token that is valid for an existing MCTCP session must decide whether to allow an additional coupled connection or to deny it. If the coupled connection is to be established, the recipient of the SYN responds with a SYN/ACK that also contains a "Join" option, carrying the initiator's token. Otherwise, if the recipient decides to deny the setup of a coupled connection, it MUST reply with a TCP RST. If the token is unknown at the recipient, the recipient MUST also respond with a TCP RST, in the same way as when an unknown TCP port is used. Similarly, if the initiator of a coupled connection receives a SYN/ACK with an invalid token or a SYN/ACK without the "Join" option, it must send a TCP RST.

In all these cases, the setup procedure of that coupled connection MUST be abandoned. As a result, the endpoints MUST return to single-connection mode if it is the first coupled connection. If there are already other coupled connections, the endpoint SHOULD NOT use that address pair for multipath transport.

The verification of the tokens in both endpoints of the MCTCP session ensures that the endpoints of a coupled connection are identical to the endpoints of the initial connection. Middleboxes that drop packets with SYN options, or strip the options, can also be detected in that way.

A local policy SHOULD ensure that an endpoint stops re-sending SYNs with the "Join" option if it receives a TCP RST or if it does not receive corresponding SYN/ACKs.
In general, an endpoint SHOULD NOT try to open further coupled connections if previous attempts to the same destination address failed. An endpoint SHOULD also refrain from attempts to switch to multi-connection mode if this has repeatedly failed before; this SHOULD be governed by a local policy.

         Host A                               Host B
------------------------             ------------------------
Address A1    Address A2             Address B1    Address B2
----------    ----------             ----------    ----------
    |             |                      |             |
    |---------SYN+MPCAP (Token A)------->|             |       ^
    |<-----SYN/ACK+MPCAP (Token B)-------|             |       | Single-
    |             |                      |             |       | conn.
    |########Initial connection##########|             |       | mode
    |             |                      |             |       V
    ~             ~                      ~             ~
    |             |                      |             |
    |---------SYN+JOIN (Token B)-------->|             |
    |<------SYN/ACK+JOIN (Token A)-------|             |       ^
    |             |                      |             |       |
    |<=====E.g., MCTCP Add. Address======|             |       | Multi-
    |             |                      |             |       | conn.
    |             |----------SYN+JOIN (Token B)------->|       | mode
    |             |<-------SYN/ACK+JOIN (Token A)------|       |
    |             |                      |             |       |
    |######First coupled connection######|             |       |
    |             |                      |             |       |
    |             |#####Second coupled connection######|       V
    |             |                      |             |

Figure 5: Example use of MCTCP tokens

Figure 5 illustrates the usage of the two MCTCP tokens. An endpoint can decide to switch to multi-connection mode at any time, as long as the initial connection is established. In multi-connection mode, an endpoint can add further coupled connections at any time.

4.3.  Usage of Coupled Connections

The setup of the first coupled connection MUST use the same source and destination IP addresses as the initial connection and SHOULD use the same destination port. This implies that the first coupled connection SHOULD be actively opened by the initiator of the initial connection. This constraint ensures that the first coupled connection indeed uses valid addresses and that it uses the same path as the initial connection. It also facilitates user-space implementation and network address port translation (NAPT) traversal.
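The token check of Section 4.2, combined with the address constraint for the first coupled connection, can be sketched as follows. This is an illustration only: the `Session` record, the `allow` policy flag and the returned tuples are hypothetical names, and answering a violation of the address constraint with a RST is an assumption, since this document does not specify that case.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    local_token: int      # token this endpoint announced
    peer_token: int       # token announced by the other endpoint
    initial_pair: tuple   # (peer address, local address) of the initial connection
    coupled: list = field(default_factory=list)
    allow: bool = True    # local policy: accept further coupled connections


def handle_join_syn(sessions, token, peer_addr, local_addr):
    """Decide how a passive opener answers a SYN carrying a "Join" option.

    `sessions` maps local tokens to Session records. The result is
    ("RST", None) or ("SYN/ACK", token_to_echo).
    """
    session = sessions.get(token)
    if session is None:
        return ("RST", None)   # unknown token
    if not session.allow:
        return ("RST", None)   # denied by local policy
    pair = (peer_addr, local_addr)
    if not session.coupled and pair != session.initial_pair:
        # Assumption: reject a first coupled connection on another address pair.
        return ("RST", None)
    # Simplification: record the connection before the handshake completes.
    session.coupled.append(pair)
    return ("SYN/ACK", session.peer_token)
```

With the roles of Figure 5, Host B would look up Token B and echo Token A in its SYN/ACK.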
The first coupled connection has a special role because it enables the exchange of addresses or other information, which can be useful to set up additional coupled connections.

The token supplied in the initial connection's SYN exchange is used for the demultiplexing of coupled connections, i.e., to associate a new coupled connection with an existing MCTCP session. This means that the port numbers in the SYN of a coupled connection need not be used for demultiplexing. Still, an active opener of a new coupled connection SHOULD use a destination port number that is already in use by the passive opener, as long as the 5-tuple is unique for each host. This choice is intended to maximize the probability that the SYN is permitted by a firewall or network address port translation (NAPT) at the recipient, and to avoid confusing any network monitoring software. Once a coupled connection is established, packets are demultiplexed using the 5-tuple, as in traditional TCP.

Control information can be sent over any established coupled connection, and it always affects the MCTCP session as a whole. As control information and data chunks are transported over the same pipe and may experience queueing in the send buffer, it is reasonable to send important control information immediately after the establishment of a new coupled connection (as shown in Figure 5 for an "MCTCP Additional Address" message). A scheduler in the MCTCP session layer decides which MCTCP messages are sent over which coupled connection.

4.4.  Operation Mode Switch

An MCTCP session endpoint MUST change its operation mode from single-connection to multi-connection mode once the first coupled connection is successfully set up.

Either endpoint of an MCTCP session can request the other endpoint to switch to multi-connection mode by means of a "Mode" TCP option, which is depicted in Figure 6.
This may be useful if only the other endpoint can establish coupled TCP connections, e.g., if it is located behind a middlebox performing network address port translation (NAPT).

                     1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+---------------+---------------+
| Kind=OPT_MODE |   Length=2    |
+---------------+---------------+

Figure 6: Mode option

This TCP option MAY be set in segments of the initial connection. Its implementation is RECOMMENDED. It MAY be set in segments with or without payload once the initial connection is established, as long as the MCTCP session is not in multi-connection mode. The option is also allowed in SYN/ACK segments, but not in pure SYN segments. If it is set in the SYN/ACK, it asks the connection initiator to enter multi-connection mode immediately. When receiving a "Mode" TCP option, an MCTCP endpoint MAY send a SYN with the "Join" TCP option to the destination address and port of the initial connection, and switch to multi-connection mode. It is also allowed to silently ignore that notification and to continue in single-connection mode.

An endsystem MUST refrain from resending "Mode" TCP options frequently if the MCTCP session cannot successfully negotiate the multi-connection mode, in order to avoid needless effort.

5.  MCTCP Session Protocol Messages

All coupled TCP connections run the MCTCP session protocol, which transports both data chunks and control messages in the format that is defined in this section.

5.1.  Data Segmentation and Encoding

In multi-connection mode, MCTCP segments data into chunks and transports them as TLV-encoded messages over one or more coupled TCP connections. The framing format of these chunks is shown in Figure 7.
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+-------------------------------+---------------+
| Type=MSG_CHUNK|     Total message length      |   Reserved    |
+---------------+-------------------------------+---------------+
|               Session sequence number (64 bit)                :
+---------------------------------------------------------------+
:               Session sequence number (contd.)                |
+---------------------------------------------------------------+
|                                                               |
~                     Data chunk (variable)                     ~
|                                                               |
+---------------------------------------------------------------+

Figure 7: MCTCP Data Chunk message

MCTCP uses a global sequence number during a session. The value 0 refers to the first byte that is sent over the initial connection. An MCTCP receiver reassembles the byte stream according to that sequence number and delivers the data in order to the upper protocol layer or application.

The sequence number in the first Data Chunk message sent over coupled TCP connections SHOULD be that of the first byte that the MCTCP implementation has not already enqueued on the initial connection. In that case, there is no overlap between data transported over the initial connection and data transported over the coupled connections, which simplifies the reassembly. An MCTCP sender MAY also resend data that has already been written to the initial connection if a coupled connection can use a faster path, but it MUST NOT resend data that has already been acknowledged on the initial connection by the receiver.

A sender SHOULD NOT write further data to the initial connection after it has sent its first Data Chunk message to a coupled connection, in order to simplify the reconstruction of the byte stream in the receiver. The initial connection transports the upper layer protocol's byte stream without any gaps, i.e., the global session sequence number implicitly increases continuously even after multi-connection mode is entered. As a consequence, apart perhaps from redundancy, there is little reason to continue sending the application byte stream over the initial connection. A receiver SHOULD close the MCTCP session if it detects an inconsistency between the byte stream received over the initial connection and the data chunks on the coupled connections.

In certain cases, byte counters in the sender and receiver could get desynchronized if a middlebox transparently changes the length of the content sent over the initial connection. As discussed in Section 8, this violation of TCP's end-to-end semantics can be detected in the receiver, e.g., if there is a gap between the first byte received from the coupled connections and the last byte received from the initial connection. In this case, this document defines that the MCTCP session SHOULD be closed. Further work is needed to define whether MCTCP should have a method to resynchronize the sequence numbers at sender and receiver in such cases, or whether there should be a fallback to single-connection mode, which would not suffer from ambiguity about sequence numbers.

The maximum allowed size of an MCTCP message is 65535 octets. As the Data Chunk header in Figure 7 occupies 12 octets, the maximum data chunk size is 65535 - 12 = 65523 octets. The minimum allowed data chunk size is 1 octet. The segmentation of the application byte stream into data chunks and their assignment to coupled TCP connections is decided by a local algorithm in the MCTCP sender, which may take into account path characteristics such as MSS, congestion control state, and other relevant information (e.g., the page size in case of a kernel implementation). An efficient segmentation algorithm should avoid sending small data chunks, in order to reduce the header overhead in both the MCTCP and the TCP layer.

MCTCP does not provide session layer acknowledgements.
It is an allowed behavior for an MCTCP instance to free the memory after handing data over to a connection. In that case, if a coupled TCP connection fails or if it is closed, it may be impossible to complete the transfer on other coupled connections. Alternatively, an MCTCP instance can cache sent data for a certain time. In particular, an MCTCP sender can duplicate or retransmit data chunks over other coupled connections, even with overlapping sequence numbers. This strategy is more efficient if the retransmission is sent over a coupled connection that does not have a long-standing sending queue. The MCTCP sender can infer the connection state from the sequence numbers and congestion control state of the individual connections.

If a receiver observes a corrupted MCTCP message, e.g., one with an invalid TLV format, it SHOULD close the corresponding coupled connection by sending a TCP FIN. A future version of the MCTCP session protocol could define notification messages and retransmission requests over another coupled connection, or even a request for a retransmission in a different encoding format.

5.2.  Address Advertisement

As motivated in [5], path management refers to the exchange of information about additional paths between endpoints. MCTCP requires multiple addresses at endpoints to be able to use multiple paths that are possibly at least partly disjoint. In multi-connection mode, MCTCP can explicitly signal additional addresses of one endpoint to the other endpoint, which allows the latter to initiate new connections. The MCTCP session can therefore also deal with addresses that change.

The "Add Address" MCTCP message announces additional addresses on which an endpoint can be reached (Figure 8 and Figure 9). Multiple messages can be sent in succession in order to advertise several addresses. This message can be sent at any time over any coupled connection.
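To make the TLV framing concrete, the sketch below encodes the Data Chunk message of Section 5.1 and the IPv4 variant of the "Add Address" message, and walks over a received buffer message by message. The numeric type codes are placeholders (this document does not assign values) and the function names are assumptions. The total message length is taken to include the 4-octet type/length/reserved header, which is what the fixed lengths 8 and 20 of the address messages imply.

```python
import ipaddress
import struct

MSG_CHUNK = 1   # placeholder type codes; no values are assigned
MSG_AADD4 = 2

HDR = struct.Struct("!BHB")         # type, total message length, reserved
CHUNK_HDR = struct.Struct("!BHBQ")  # same header plus 64-bit sequence number
MAX_MSG = 65535
MAX_CHUNK = MAX_MSG - CHUNK_HDR.size  # 65523 octets


def pack_chunk(seq, data):
    """Encode one Data Chunk message starting at session sequence number seq."""
    if not 1 <= len(data) <= MAX_CHUNK:
        raise ValueError("data chunk must be 1..65523 octets")
    return CHUNK_HDR.pack(MSG_CHUNK, CHUNK_HDR.size + len(data), 0, seq) + data


def pack_add_ipv4(addr):
    """Encode an Additional IPv4 Address message (total length 8)."""
    return HDR.pack(MSG_AADD4, 8, 0) + ipaddress.IPv4Address(addr).packed


def segment(stream, first_seq, chunk_size):
    """Split a byte string into Data Chunk messages with consecutive sequence numbers."""
    return [pack_chunk(first_seq + off, stream[off:off + chunk_size])
            for off in range(0, len(stream), chunk_size)]


def parse(buf):
    """Yield (type, body) for each complete message in buf."""
    pos = 0
    while pos + HDR.size <= len(buf):
        mtype, total, _reserved = HDR.unpack_from(buf, pos)
        if total < HDR.size or pos + total > len(buf):
            break  # malformed or incomplete message
        yield mtype, buf[pos + HDR.size:pos + total]
        pos += total
```

A real sender would choose `chunk_size` from path properties such as the MSS; here it is simply a parameter.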
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+-------------------------------+---------------+
| Type=MSG_AADD4|   Total message length = 8    |   Reserved    |
+---------------+-------------------------------+---------------+
|                     IPv4 address (32 bit)                     |
+---------------------------------------------------------------+

Figure 8: MCTCP Additional IPv4 Address message

In Figure 8, the "Additional Address" message is shown for IPv4. The reserved bits could be used to express priorities or policies (e.g., "use now").

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+-------------------------------+---------------+
| Type=MSG_AADD6|   Total message length = 20   |   Reserved    |
+---------------+-------------------------------+---------------+
|                                                               |
~                    IPv6 address (128 bit)                     ~
|                                                               |
+---------------------------------------------------------------+

Figure 9: MCTCP Additional IPv6 Address message

Furthermore, there are MCTCP messages to remove candidate addresses, which are shown in Figure 10 and Figure 11. If an address is removed, an endpoint SHOULD NOT try to open further coupled connections to that address. Already established coupled connections are not affected by these messages and must be explicitly closed separately.
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+-------------------------------+---------------+
| Type=MSG_RADD4|   Total message length = 8    |   Reserved    |
+---------------+-------------------------------+---------------+
|                     IPv4 address (32 bit)                     |
+---------------------------------------------------------------+

Figure 10: MCTCP Remove IPv4 Address message

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+-------------------------------+---------------+
| Type=MSG_RADD6|   Total message length = 20   |   Reserved    |
+---------------+-------------------------------+---------------+
|                                                               |
~                    IPv6 address (128 bit)                     ~
|                                                               |
+---------------------------------------------------------------+

Figure 11: MCTCP Remove IPv6 Address message

5.3.  Connection Management

Each coupled TCP connection is maintained individually. A FIN only closes that individual connection. If an application closes the socket, the MCTCP shim layer MUST close the initial connection and all existing coupled connections. Apart from that, the MCTCP layer may always close (or even re-open) coupled connections, governed by the local path management policies. In multi-connection mode, the MCTCP session is only closed once all coupled connections are closed. Coupled connections can be kept in the half-open state, but the MCTCP connection management SHOULD avoid this. It would be possible to specify an MCTCP message for explicitly closing the MCTCP session, or several coupled connections, but this is left for further study.

MCTCP SHOULD keep the initial connection established while in multi-connection mode, even if it is no longer used for data transport. This makes it possible to expose valid addresses and port numbers to the application [9]. Keep-alives MAY be sent. The initial connection is closed by the MCTCP layer when all coupled connections are closed.
If the initial connection is closed, the whole MCTCP session SHOULD be closed, too. Further studies are needed to understand whether the initial connection could safely be closed earlier.

6.  MCTCP Session Policies and Algorithms

This document does not mandate specific policies on how to use and share resources on the coupled connections. Still, this section addresses some important issues that an MCTCP implementation must take into account.

6.1.  Message Scheduling

Data and control messages can be assigned to any coupled TCP connection and are then sent over that connection. Messages may be duplicated or retransmitted for redundancy reasons. The receiver MUST process the messages of one coupled TCP connection in the order of arrival. In-order message processing across several coupled connections of one MCTCP session is not ensured.

6.2.  Congestion and Flow Control

The MCTCP protocol has neither its own congestion control nor its own flow control. Instead, it relies on the algorithms in the individual TCP connections. In the following, the operation is explained in more detail for the multi-connection mode. In single-connection mode, there is no difference compared to a normal TCP connection.

Concerning flow control, the operation is straightforward: If the MCTCP receiver runs out of buffer space, it stops reading data from one or more coupled TCP connections. Depending on TCP's flow control and the available receive buffer, the flow control on one or more connections may throttle data transport until the MCTCP layer can process data again. The MCTCP layer SHOULD at least be able to queue one full-sized MCTCP message (i.e., 65535 octets) for each established coupled TCP connection. In order to avoid stalls of the data transfer, an endsystem SHOULD NOT actively or passively open coupled TCP connections when it is short on memory.
Similarly, coupled connections SHOULD NOT be established if an application explicitly sets small send or receive buffer sizes [9].

The coupled connections have different congestion windows. To achieve resource pooling, it is necessary to couple the congestion windows in use on each connection, in order to push most traffic to uncongested links and avoid unfairness. One algorithm that aims at achieving this objective is presented in [8]. MCTCP is able to use this or other coupled congestion control algorithms.

In addition, an MCTCP sender may have local policies to decide how much traffic to send over the available connections. It could also obtain path cost metrics from the receivers. The latter could be realized by a new MCTCP message defining connection priorities, which is left for further study.

7.  Interfaces

This section describes MCTCP's interfaces from a functional point of view. Their realization is implementation-specific.

7.1.  Interface between MCTCP and TCP

MCTCP must be able to control a small set of features inside a TCP stack and therefore requires a corresponding interface:

o  The MCTCP layer must be able to set a "Multipath Capable" or "Join" TCP option in SYN segments. It must also be notified if those options are set in an incoming SYN segment, it must be able to access the tokens, and it must be able to influence how to respond depending on the token value (i.e., either by a SYN/ACK or a RST).

o  The MCTCP layer may set the "Mode" TCP option on the established initial connection, in any segment other than pure SYNs, and it should be notified if that option is received.

o  The MCTCP layer must be able to affect the congestion window on each coupled connection. Depending on the algorithm, it may be sufficient just to periodically set certain parameters of the congestion control, such as the additive increase factor.
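Purely as an illustration of the control hooks listed above, they could be written down as an abstract interface. All names and signatures below are hypothetical; the draft leaves the realization to the implementation, and a kernel implementation would look quite different.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TcpControl(Protocol):
    """Hypothetical control hooks an MCTCP layer needs from a TCP stack."""

    def set_syn_option(self, kind: str, token: int) -> None:
        """Attach a "Multipath Capable" or "Join" option to an outgoing SYN."""

    def on_syn_option(self, kind: str, token: int) -> str:
        """Called for an incoming SYN option; returns "SYN/ACK" or "RST"."""

    def set_mode_option(self) -> None:
        """Request the "Mode" option on a segment other than a pure SYN."""

    def set_increase_factor(self, conn_id: int, factor: float) -> None:
        """Tune the additive-increase factor of one coupled connection."""
```

Any stack wrapper that offers these four methods satisfies the protocol, so a user-space library and a kernel shim could share the same MCTCP session logic.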
For efficient operation, MCTCP may also have to read certain information from each coupled TCP connection, such as:

o  The current amount of acknowledged and unacknowledged data on that connection, or the corresponding pointers to the byte stream.

o  The receive window advertised by the other endpoint on that connection.

o  The estimated round-trip time.

o  The maximum transmission unit (MTU) of the path, or TCP's maximum segment size (MSS). Note that the MSS is not a constant value if TCP options are added to data segments.

Many operating systems already provide information about a subset of these parameters through a kernel/user-space interface.

7.2.  Interface to Applications

MCTCP provides reliable, in-order, byte-stream transport to applications and thus can be used by legacy applications like a standard TCP connection [9].

When MCTCP is realized inside the network stack, it is a new function block between the TCP instance and the socket interface, which is transparent to applications.

Alternatively, MCTCP can be implemented in large parts by a user-space library that accesses an extended network stack through the socket interface, which may have to be enhanced to provide some additional control functions as explained in the previous section. Applications could then still use the standard APIs of that library and would not be affected at all. Such a user-space implementation in combination with a simple patch of the network stack could facilitate the initial deployment of MCTCP.

8.  Interaction with Middleboxes

There are various types of middleboxes in the Internet. Some of them only parse a TCP stream (e.g., deep packet inspection), while others change TCP header fields on the fly, and some may even rewrite the TCP payload. MCTCP is designed to be compatible with most types of middleboxes, but as middlebox behavior is not well specified, some open issues may remain.

8.1.  Middleboxes that Manipulate TCP Options

One class of middleboxes may strip, duplicate, or modify TCP options and/or drop packets with unknown TCP options, and this may even depend on whether the SYN flag is set or not. If a middlebox removes MCTCP's TCP options in SYN segments, multipath transport will not be enabled at all (if that middlebox is on the path of the initial connection), or not over that path (if the middlebox is on the path of a potential coupled connection towards another address). Still, data transfer over the initial connection or other coupled connection(s) can continue without being significantly affected.

Other TCP options that could be used by MCTCP are non-mandatory, i.e., the data integrity is not affected when these options are stripped or duplicated.

8.2.  Middleboxes that Change Content

Other middleboxes may rewrite the content of the TCP payload and possibly also its length (e.g., by rewriting URIs). MCTCP, like other multipath transport solutions, requires a session-level sequence number space for the in-order reassembly of the application data. If a middlebox changes the content and/or length on the initial connection or on coupled connections, it may be impossible to correctly reassemble the byte stream at the receiver.

MCTCP will in many cases be able to detect changes of content over coupled connections, as it loses track of the TLV framing on that connection. It could even use a session layer checksum to detect such content modifications, plus an additional error recovery and retransmission scheme that would transfer such content over an alternative connection.

If a middlebox changes the length of the byte stream on the initial connection, the sequence numbers at sender and receiver will not be synchronized when entering multi-connection mode, and there could be, e.g., a gap.
MCTCP keeps the initial connection open even in multi-connection mode. Therefore, in case of problems, it could use the initial connection to resend the whole byte stream as it would have been transferred in a single TCP connection, including potential middlebox modifications. But this requires that the corresponding data is still buffered at the sender and not yet delivered at the receiver. Such possible extensions of MCTCP could be added in a future version of this document.

8.3.  Middleboxes that Translate Addresses/Ports

NAPT middleboxes that are unaware of MCTCP create two problems: First, as hosts have local addresses only, and the global addresses are not necessarily known to a host behind the NAPT, it may not be possible to advertise addresses to the other endpoint. Second, it may be impossible for one endpoint to open a coupled TCP connection to an endpoint sitting behind a NAPT middlebox.

In order to address the latter issue, MCTCP defines the Mode option. With that option, one endpoint can ask the other endpoint to enter multi-connection mode. As shown in Figure 12, sending this TCP option is useful if one endpoint has multiple public IP addresses, but cannot announce them over the initial connection. If the host behind the NAPT middlebox receives the option and establishes a coupled connection, this connection can be used to convey the information about the other public address, and a coupled connection to that address can then be established, too.

         Host A             NAPT              Host B
------------------------     //      ------------------------
Address A1    Address A2     //      Address B1    Address B2
(private)     (private)      //      (public)      (public)
----------    ----------     //      ----------    ----------
    |             |          //          |             |
    |---------SYN+MPCAP------//--------->|             |       ^
    |<-----SYN/ACK+MPCAP-----//----------|             |       | Single-
    |             |          //          |             |       | conn.
    |###Initial connection###//##########|             |       | mode
    |             |          //          |             |       V
    ~             ~          ~~          ~             ~
    |             |          //          |             |
    |<--------Mode option----//----------|             |
    |             |          //          |             |
    |---------SYN+JOIN-------//--------->|             |
    |<------SYN/ACK+JOIN-----//----------|             |       ^
    |             |          //          |             |       |
    |#1st coupled connection#//##########|             |       |
    |             |          //          |             |       |
    |<=MCTCP Add. Address B2=//==========|             |       | Multi-
    |             |          //          |             |       | conn.
    |---------SYN+JOIN-------//----------------------->|       | mode
    |<------SYN/ACK+JOIN-----//------------------------|       |
    |             |          //          |             |       |
    |#2nd coupled connection#//########################|       V
    |             |          //          |             |

Figure 12: Example use of the Mode option

8.4.  Middleboxes that Want to Control MCTCP Traffic

Given that MCTCP transports control information in the payload, it is more complex for middleboxes to parse and potentially modify MCTCP's control information. In order to do so, a middlebox must perform deep packet inspection and has to parse the MCTCP session messages in the TCP connection. This may prevent certain operations and optimizations by middleboxes. However, it should be noted that middleboxes cannot affect the payload in TLS either, i.e., MCTCP is somewhat similar to TLS in that sense.

As a remedy, it could be possible to define a TCP option that contains an offset field with a pointer to the first byte of an MCTCP control message, so that a middlebox can find control messages without parsing the whole byte stream of a coupled TCP connection. Yet, such an option would be subject to all limitations of sporadically added TCP options.

A middlebox that wants to prevent MCTCP usage can drop SYN segments containing the "Join" TCP option without causing any significant harm. If that middlebox is on the path of the initial connection, MCTCP will continue using the backward-compatible initial TCP connection only. If the middlebox is on the path towards another address, i.e., if the multi-connection mode is already entered, MCTCP will not establish an additional coupled connection. In both cases, no TLV-encoded content will pass that middlebox. Instead of dropping SYN segments with the "Join" TCP option, a middlebox could also strip the "Join" option, as the setup of a coupled connection will then fail. This method would avoid timeouts and further retransmission attempts by the sender.

Alternatively, a middlebox could remove the "Multipath Capable" TCP option from SYN segments. Then, MCTCP will be identical to a standard TCP connection and never try to switch to multi-connection mode. However, it is not recommended to drop SYN segments containing the "Multipath Capable" TCP option as a means to prevent MCTCP, since this needlessly results in a longer connection setup time, and since just dropping segments with the "Join" option would be sufficient.

9.  Open Issues

o  Avoiding inconsistencies when both endpoints switch to multi-connection mode in parallel.

o  MCTCP does not support out-of-band TCP signaling transport (urgent flag).

10.  Security Considerations

A generic threat analysis for the addition of multipath capabilities to TCP is presented in [7]. MCTCP is designed along the assumptions of that document, with some enhancements.

In general, MCTCP is subject to similar security threats as [6], but due to its extensibility, additional protection mechanisms could be incorporated in a future version. For instance, MCTCP can employ more secure mechanisms to protect the coupling of TCP connections, even cryptographic keys as in TLS.

MCTCP uses a 32-bit token only, in order to save TCP option space in SYN segments. This is reasonable, as this token is only required to authenticate the initiator of the first coupled connection, which must use the same IP source and destination addresses as the initial connection, i.e., off-path attacks are not possible.
Coupled connections that are added subsequently could use a more secure protection scheme at the MCTCP session layer, either longer 64-bit tokens or even cryptographic methods, which could be exchanged by corresponding MCTCP control messages (not specified in this version of the document).

This section will be extended in a later version of this document.

11.  IANA Considerations

This document will make a request to IANA to allocate new values for TCP option identifiers:

o  OPT_MPCAP ("Multipath Capable" option)

o  OPT_JOIN ("Join" option in order to add a coupled connection to the MCTCP session)

o  OPT_MODE ("Mode" option that requests a change from single-connection to multi-connection operation mode)

This document also defines several types of MCTCP messages:

o  MSG_CHUNK ("MCTCP Data Chunk")

o  MSG_AADD4 ("MCTCP Additional IPv4 Address")

o  MSG_AADD6 ("MCTCP Additional IPv6 Address")

o  MSG_RADD4 ("MCTCP Remove IPv4 Address")

o  MSG_RADD6 ("MCTCP Remove IPv6 Address")

12.  Conclusion

Multi-connection TCP transport is a simple, modular, and extensible solution to enable reliable transfer over multiple paths. This specification defines the protocol on top of the TCP byte stream, the few required extensions of TCP, and the light-weight interface between MCTCP and each TCP connection. In summary, MCTCP is a reasonable and incrementally deployable alternative to a signaling mechanism that uses TCP options only.

13.  Acknowledgments

Michael Scharf is supported by the German-Lab project (http://www.german-lab.de/) funded by the German Federal Ministry of Education and Research (BMBF).

14.  References

14.1.  Normative References

[1]  Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[2]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996.
   [3]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [4]  Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, August 2008.

14.2.  Informative References

   [5]  Ford, A., Raiciu, C., Barre, S., and J. Iyengar, "Architectural Guidelines for Multipath TCP Development", draft-ietf-mptcp-architecture-00 (work in progress), March 2010.

   [6]  Ford, A., Raiciu, C., and M. Handley, "TCP Extensions for Multipath Operation with Multiple Addresses", draft-ford-mptcp-multiaddressed-03 (work in progress), March 2010.

   [7]  Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path TCP", draft-ietf-mptcp-threat-02 (work in progress), March 2010.

   [8]  Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath-Aware Congestion Control", draft-raiciu-mptcp-congestion-01 (work in progress), March 2010.

   [9]  Scharf, M. and A. Ford, "MPTCP Application Interface Considerations", draft-scharf-mptcp-api-01 (work in progress), March 2010.

Appendix A.  Possible Future MCTCP Extension

   This memo describes the baseline specification of MCTCP and the required minimum set of functions. A future version of this specification may add several other features to MCTCP, such as:

   o  Exchange of longer tokens (e.g., 64-bit) for connection coupling, using MCTCP control messages.

   o  Signaling messages to exchange policy information concerning the usage of the coupled TCP connections.

   o  A signaling message that advertises combinations of addresses and port numbers, e.g., to deal with corresponding policies on one endpoint.

   o  A signaling message that advertises additional addresses in another format, e.g., as URI.

   o  MCTCP session level acknowledgements. As the individual coupled TCP connections already provide reliable transport, session error recovery only needs to deal with connection failure or middlebox problems.
      A simple acknowledgement mechanism may be sufficient, e.g., a NACK-based design.

   o  A checksum in the data chunk messages.

   o  Signaling messages to negotiate different payload encoding formats, e.g., MIME-like encoding.

   o  MCTCP control messages that manage coupled connections, such as a method to explicitly ask for closing several connections at MCTCP layer, similar to a "DATA FIN".

   o  A simple MCTCP session flow control mechanism, complementing TCP's flow control.

   o  A method to return to single-connection mode in the unlikely case that the multi-connection mode results in corrupted data transfer, due to data stream modifications by middleboxes.

   o  A negotiation of whether to keep the initial connection established in multi-connection mode, given that it could either be closed or reused like a coupled connection.

   o  A variant of this protocol that uses TLV-encoded message transport right from the beginning.

   o  A method to discover and negotiate features between the two MCTCP session endpoints, e.g., by Hello messages similar to TLS.

   Further studies are needed to determine whether some of these functions should be added to MCTCP. If so, their implementation may partly be optional and negotiated between the session endpoints. The baseline MCTCP design should be kept as simple as possible.

Author's Address

   Michael Scharf
   Alcatel-Lucent Bell Labs
   Lorenzstrasse 10
   70435 Stuttgart
   Germany

   EMail: michael.scharf@alcatel-lucent.com