MPTCP Working Group                                   Kesava. Krupakaran
Internet-Draft                                  Aravind Prasad Sridharan
Intended Status: Informational                 Shathish Muthu Venkatesan
Expires: October 9, 2015                                            DELL
                                                           April 7, 2015


           Optimized Multipath TCP subflows using Traceflow
              draft-aravind-mptcp-optimized-subflows-00

Abstract

   This document proposes a solution for optimized usage of MPTCP and
   its subflows.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
   2.  Issues existing currently in MPTCP
   3.  Proposed Solution
     3.1.  Overall Protocol Operation
       3.1.1.  Transmitting and receiving the Traceflow frames
       3.1.2.  Interpreting the response frames
       3.1.3.  Generating topology between the source and the
               destination
       3.1.4.  Detecting the number of subflows to be used
       3.1.5.  Modifying the flow parameters
     3.2.  Modifications to Traceflow protocol
       3.2.1.  Flow Discovery Request packet
       3.2.2.  Flow Discovery Request TLV
       3.2.3.  Flow Discovery Response TLV
   4.  Advantages of proposed solution
   5.  Security Considerations
   6.  IANA Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   8.  Authors' Addresses

1.  Introduction

   Multipath TCP is a modified version of regular TCP that implements
   a multipath transport service, enabling a transport connection to
   operate across multiple paths simultaneously and transparently to
   the application.

1.1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

2.  Issues existing currently in MPTCP

   Path management in MPTCP is a separate function from packet
   scheduling, the subflow interface, and congestion control.
   Currently, no efficient mechanism exists to identify the exact set
   of paths available between a source host and a destination host.
   In the absence of a quantitative approach to identifying these
   paths, the decision on the number of subflows to use in MPTCP is
   made arbitrarily.

   The problem with this approach is that the number of subflows
   chosen could be either greater than the number of paths that
   actually exist in the network (excessive overhead on MPTCP) or
   less than it (under-utilization of the available paths).  Also, if
   the multiple paths converge onto the same single path for most of
   the hops, the use of multiple subflows may not be efficient; it
   could increase computation costs and become counterproductive.

   One well-known mechanism is to provide different address pairs at
   the source and destination hosts, if they are multi-homed, so that
   MPTCP can initiate the corresponding number of subflows.  This
   approach suffers from the same problems discussed above, and the
   number of subflows may not map efficiently onto the actual number
   of paths existing in the network between the hosts.

3.  Proposed Solution

   The basic idea is to determine the exact set of paths existing
   between the source and destination hosts and to decide the number
   of subflows based on it.

   We propose to use a lighter, modified version of the Traceflow
   protocol ([I-D.janapath-intarea-traceflow]).  Traceflow helps to
   determine the path taken by a flow through a network and also
   captures, at each hop of the network, all relevant information
   that pertains to the flow.  The modified version of the Traceflow
   protocol is explained in full in Section 3.2.
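
   As an informal illustration of this idea (not part of the proposal
   itself), the following Python sketch derives a subflow count from
   a set of discovered paths.  It applies the 50% device-overlap rule
   that Section 3.1.4 describes.  The function names, the use of the
   shorter path as the denominator of the overlap ratio, the greedy
   selection order, and the sample device names are all assumptions
   made for the example.

```python
from typing import List, Sequence


def overlap_ratio(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of the shorter path's devices that also lie on the other path.

    Assumption: the draft does not say which path the 50% is measured
    against, so the shorter of the two is used here.
    """
    set_a, set_b = set(a), set(b)
    shorter = min(len(set_a), len(set_b))
    if shorter == 0:
        return 0.0
    return len(set_a & set_b) / shorter


def reasonably_unique_paths(paths: Sequence[Sequence[str]],
                            threshold: float = 0.5) -> List[List[str]]:
    """Greedily keep a path only if it overlaps no kept path by more than threshold."""
    kept: List[List[str]] = []
    for path in paths:
        if all(overlap_ratio(path, other) <= threshold for other in kept):
            kept.append(list(path))
    return kept


# Hypothetical intermediate devices discovered between source and destination.
paths = [
    ["R1", "R2", "R3", "R4"],
    ["R1", "R2", "R3", "R5"],  # shares 3 of 4 devices with the first path
    ["R6", "R7", "R8", "R4"],  # shares 1 of 4 devices with the first path
]

unique = reasonably_unique_paths(paths)
subflow_count = len(unique)  # number of subflows to open
```

   With this sample topology the second path is discarded because it
   shares three of its four devices with the first, so two subflows
   would be used.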
3.1.  Overall Protocol Operation

   For clarity, the protocol operation is split into the following
   steps:

   1.  Transmitting and receiving the Traceflow frames

   2.  Interpreting the response frames

   3.  Generating topology between the source and the destination

   4.  Detecting the number of subflows to be used

   5.  Modifying the flow parameters

   If the source has multiple network interfaces towards the
   destination, all of the above steps are carried out on each
   network interface individually.

3.1.1.  Transmitting and receiving the Traceflow frames

   The source first sends a Traceflow detection frame with a special
   MAC address.  This frame encapsulates the flow for which the path
   has to be detected.  Upon receiving this frame, each intermediate
   switch/router reads the flow parameters and determines all
   possible egress ports (LAG or ECMP), the LAG/ECMP hash logic used
   to choose among those egress ports, and the actual egress port
   that will be chosen for the given flow.  It sends these details
   back to the sender in the response frame described in Section
   3.2.3.

3.1.2.  Interpreting the response frames

   When an intermediate device receives the Traceflow request frame,
   it propagates the frame on all possible egress interfaces.  If a
   LAG exists towards the next-hop router/switch, multiple copies of
   the Traceflow request frame are sent to the same device.  The
   receiving device then replies to each copy of the request frame,
   so multiple replies are sent to the source.  Therefore, receiving
   multiple replies from one intermediate device implies that
   multiple paths share a common network device.  The same applies
   when ECMP exists along the network path.

   On the other hand, if the source receives only one reply frame
   from each intermediate device, it can infer that only one unique
   path exists from source to destination.

3.1.3.  Generating topology between the source and the destination

   Once the source has received the replies from all of the
   intermediate devices and the destination, it computes the topology
   between the source and the destination.  The result of this
   computation is the set of possible paths (each similar to a linked
   list of connected nodes) existing between the source and the
   destination, together with the number of common routers that
   multiple paths share.

3.1.4.  Detecting the number of subflows to be used

   The above steps give the set of paths that exist between the
   source and the destination.  The source then computes the overlap
   of each path with every other path in the set.  If more than 50%
   of the devices on two paths overlap, only one of those two paths
   is kept.  Once overlapping and redundant paths have been removed,
   what remains is the set of reasonably unique paths from source to
   destination.  The MPTCP session uses as many subflows as there are
   reasonably unique paths.

3.1.5.  Modifying the flow parameters

   Even after initiating the required number of subflows, there is no
   guarantee that each subflow will take the unique path identified
   for it in the step above.  This is where the ECMP/LAG hash logic
   used in the intermediate devices comes in.  The objective of this
   step is to tweak the subflow parameters, using the hash logic
   information received from the intermediate devices, so that each
   subflow passes through the unique path determined in step 4.

3.2.  Modifications to Traceflow protocol

   We propose the following changes to the Traceflow protocol:

   1)  Since Traceflow packets are encapsulated in UDP, Layer-2
       switches may not be able to identify and process them.  Hence,
       we propose to use a special MAC address for Traceflow packets
       instead of UDP encapsulation.

   2)  Use of a slightly modified version of the Flow Discovery
       Request packet and of the Request and Response TLVs.  The
       other TLVs specified in the Traceflow protocol may not be
       required.

   The following packet is sufficient for the protocol usage:

   1)  Flow Discovery Request packet

   The following TLVs are used for carrying flow-related
   information:

   1)  Flow Discovery Request TLV

   2)  Flow Discovery Response TLV

3.2.1.  Flow Discovery Request packet

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |   Hopcount    |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |   Reserved    |           Query ID            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            TLVs...                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Version:  The version number of the protocol.  This document
      defines protocol version 1.

   Hopcount:  Keeps track of the number of transit nodes that have
      processed the Flow Discovery Request packet.  This field is
      decremented at each device that processes the Flow Discovery
      Request packet.  It also helps in determining whether there
      were any legacy devices along the way that do not support the
      Traceflow protocol.

   Length:  Length of the packet.

   Type:

      1  Flow Discovery Request

      2  Response for the Flow Discovery Request

   Query ID:  A unique identifier generated by the originator that
      allows it to correlate the responses from the transit nodes
      with the Flow Discovery Request packet it generated.

3.2.2.  Flow Discovery Request TLV

   This TLV is included in the Flow Discovery Request packet and
   identifies the traffic flow that the originator device is
   interested in probing.  This is a mandatory TLV.
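
   The fixed header of Section 3.2.1 can be illustrated with a short
   Python sketch.  The field widths (8-bit Version, Hopcount, Type,
   and Reserved; 16-bit Length and Query ID), network byte order, a
   Length that counts the header plus TLVs in octets, the initial
   Hopcount of 255, and all function names are assumptions made for
   the example.

```python
import struct

# Assumed layout: Version(8) Hopcount(8) Length(16) Type(8) Reserved(8) Query ID(16)
HEADER = struct.Struct("!BBHBBH")

TYPE_REQUEST = 1   # Flow Discovery Request
TYPE_RESPONSE = 2  # Response for the Flow Discovery Request


def build_request(query_id: int, tlvs: bytes = b"", hopcount: int = 255) -> bytes:
    """Build a version 1 Flow Discovery Request carrying the given TLV bytes."""
    length = HEADER.size + len(tlvs)  # assumption: Length covers header + TLVs
    return HEADER.pack(1, hopcount, length, TYPE_REQUEST, 0, query_id) + tlvs


def parse_header(packet: bytes) -> dict:
    """Decode the fixed header fields; the Reserved octet is ignored."""
    version, hopcount, length, ptype, _reserved, query_id = HEADER.unpack_from(packet)
    return {"version": version, "hopcount": hopcount, "length": length,
            "type": ptype, "query_id": query_id}


def decrement_hopcount(packet: bytes) -> bytes:
    """What a transit node would do to Hopcount before forwarding the request."""
    hopcount = parse_header(packet)["hopcount"]
    if hopcount == 0:
        raise ValueError("hopcount exhausted")
    # Hopcount is the second octet in the assumed layout.
    return packet[:1] + bytes([hopcount - 1]) + packet[2:]


request = build_request(query_id=0x1234)
forwarded = decrement_hopcount(request)
```

   The originator can compare the Hopcount carried in each response
   with the value it sent to estimate how many Traceflow-capable
   devices processed the request.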

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Flow Information                |    padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type:  The type of the TLV.  In this case, the value is 1, meaning
      Flow Descriptor TLV.

   Code:  Identifies the sub-type of the TLV.  In this case, this
      field is not defined.  It SHOULD be set to 0.

   Length:  The length of the TLV.

   Flow Information:  Specifies the flow.  For example, in the case
      of a TCP/IP network, this is the source IP address, destination
      IP address, source port, and destination port.

   Padding:  May be necessary to ensure that the packet ends on a
      word boundary.

3.2.3.  Flow Discovery Response TLV

   This TLV is used by the devices processing the Flow Discovery
   Request packet to provide the information requested by the
   originator device.  This is a mandatory TLV.  It should be
   included in the response sent to the device that originated the
   Flow Discovery Request packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Data ...                    |    padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type:  2  Response for the Flow Discovery Request

   Code:

      1  Interface: the ifIndex that will be chosen for the given
         flow

      2  LAG/ECMP details: the list of ifIndexes of the channel
         members, if the destination port is a LAG/ECMP

      3  Hash algorithm: the hash algorithm used and the parameters
         used for the hash computation

   For the hash algorithm code, the data part describes the
   parameters used for hashing and the algorithm used.  The hashing
   parameters and the algorithm are carried as Sub-TLVs in the data
   part.

   Sub-TLV code:

      1  Hashing parameter

      2  Hash algorithm

   The Sub-TLV value part is 1 byte long and contains the code value
   of the corresponding field.  The following tables show a sample
   set of hashing parameters and algorithms with their codes.

      Code   Hashing Parameter
      ----   -----------------------
        1    Source MAC Address
        2    Destination MAC Address
        3    Source IP Address
        4    Destination IP Address
        5    TCP Source Port
        6    TCP Destination Port
        7    UDP Source Port
        8    UDP Destination Port
        9    VLAN ID

      Code   Algorithm Name
      ----   -----------------------
        1    RTAG-7
        2    Vendor-specific

4.  Advantages of proposed solution

   1.  Optimized use of the paths actually available in the network:
       helps to avoid possible network congestion by preventing
       multiple flows from following a single path in the network.

   2.  Reduced MPTCP transport overhead: helps to avoid forming
       unnecessary flows, thereby reducing the processing load.

5.  Security Considerations

   This document does not introduce any new security concerns to the
   specifications referenced in this document.

6.  IANA Considerations

   No IANA actions required.

7.  References

7.1.  Normative References

   [RFC793]   Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

7.2.  Informative References

   [RFC2992]  Hopps, C., "Analysis of an Equal-Cost Multi-Path
              Algorithm", RFC 2992, November 2000.

   [RFC6356]  Raiciu, C., Handley, M., and D. Wischik, "Coupled
              Congestion Control for Multipath Transport Protocols",
              RFC 6356, October 2011.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
              RFC 4960, September 2007.

   [I-D.zinjuvadia-traceflow]
              Zinjuvadia, V., Manur, R., Krishnamurthy, S., and
              Viswanathan, "TraceFlow Extended",
              draft-zinjuvadia-traceflow-02, Feb 2009.

   [I-D.janapath-intarea-traceflow]
              Janardhanan, N., Balaji, V., Rich, G., and Peter, H.,
              "Traceflow", draft-janapath-intarea-traceflow-00,
              Jan 2012.

8.  Authors' Addresses

   Kesava Vijaya Krupakaran
   India

   Phone: +91 9894847772
   Email: Kesav.j@gmail.com


   Shathish Muthu Venkatesan
   DELL
   Olympia Technology Park
   Guindy, Chennai 600032
   India

   Phone: +91 44 4220 1619
   Email: Shathish_Venkatesan@Dell.com


   Aravind Prasad Sridharan
   DELL
   Olympia Technology Park
   Guindy, Chennai 600032
   India

   Phone: +91 44 4220 8658
   Email: aravind_sridharan@dell.com