ALTO WG                                                          X. Wang
Internet-Draft                                                   S. Dong
Intended status: Informational                         Tongji University
Expires: January 21, 2016                                        G. Chen
                                                     Huawei Technologies
                                                           July 20, 2015


     Design and Implementation of Large Data Transfer Coordinator
                draft-wang-alto-large-data-framework-01.txt

Abstract

   The Application-Layer Traffic Optimization (ALTO) protocol provides
   network information with the goal of improving both application
   performance and network resource utilization.  As data transfers
   become larger (e.g., due to big data analysis), as more data
   transfers run concurrently, each with its own service requirements,
   and as more network capabilities emerge (e.g., SDN allowing a data
   transfer to request specific routes or QoS), managing large data
   transfers has become an increasingly challenging problem.  This
   document introduces the Data Transfer Coordinator (DTC), a
   centralized data transfer scheduling framework that provides the
   Scheduling Hub Service (SHS) to coordinate and schedule large data
   transfers.  DTC considers all three components: data transfer
   requirements, (ALTO) network information, and SDN control
   capabilities.  This document specifies not only the basic framework
   of DTC but also a key component, the service API through which SHS
   clients specify data transfers and their relations.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 21, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Terminology and Notation
   4.  Data Transfer Coordinator Framework
     4.1.  Architecture
     4.2.  Job Collector
     4.3.  ALTO Client
       4.3.1.  PASSIVE and ACTIVE Mode
     4.4.  Task Scheduler
       4.4.1.  Priority Model
     4.5.  DTN Controller
   5.  Scheduling Hub Service
     5.1.  Application Compute-Transfer Structure
     5.2.  Abstract Computation
     5.3.  DataTransferTask and SyncTask
     5.4.  Service API
   6.  Example
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgments
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses
1.  Introduction

   There is a substantial need to manage large data transfers.  Given
   limited network resources such as bandwidth, inappropriate handling
   of large data transfers can reduce performance significantly.  Large
   transfers cause network congestion more easily than low-volume
   traffic does.  A congested network suffers a higher rate of packet
   loss, which in turn triggers retransmissions that can cripple
   already heavily loaded networks.  Managing large data transfers is
   therefore necessary not only for high network resource utilization
   but also for good user experience.

   Scheduling data flows requires network information such as the
   available bandwidth between two transfer nodes.  ALTO [RFC7285]
   defines cost maps, which provide costs between PIDs, and the
   Endpoint Cost Service, which provides costs between endpoints.  By
   utilizing this network information, an application can determine how
   to allocate bandwidth to each data flow.  However, to achieve such
   scheduling, a centralized coordinator is needed that is aware of the
   requirements of every data flow.  Moreover, to obtain the customized
   requirements of each data transfer, a general interface is needed
   that captures the correlation among data flows in addition to the
   requirements of individual flows.

   This document introduces a centralized framework, the Data Transfer
   Coordinator (DTC), which provides the Scheduling Hub Service (SHS)
   to applications.  SHS implements common functionalities for data
   transfers and provides cross-application coordination for achieving
   better network-wide utility.  SHS also provides a general API with
   which applications express data transfer relations using two basic
   structures, DataTransferTask and SyncTask.

   This document is organized as follows: Section 3 defines the
   terminology and notation used in this document.  Section 4 describes
   the DTC framework for scheduling large data transfers.  Section 5
   describes SHS and its service API.  Section 6 gives a MapReduce
   example of specifying relations between data transfers.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Terminology and Notation

   This document uses the following additional terms: DTC, SHS, Job,
   and Task.

   o  DTC: Data Transfer Coordinator.  A centralized framework that
      includes the Job Collector, Task Scheduler, ALTO Client, and DTN
      Controller to provide a data transfer scheduling service to
      applications.  See the detailed description in Section 4.

   o  SHS: Scheduling Hub Service.  The data transfer scheduling
      service, which considers both network information and data
      transfer requests.  Data transfer requests are captured by two
      basic structures, DataTransferTask and SyncTask.  See the
      detailed description in Section 5.
   o  Job: A data transfer job registered by an application.  A job
      includes the tasks, submitted by one application, that indicate
      data transfers and their relations.  See the detailed description
      in Section 5.

   o  Task: Either a DataTransferTask or a SyncTask, which specify data
      transfer information and data transfer relations, respectively.
      See the detailed description in Section 5.

4.  Data Transfer Coordinator Framework

4.1.  Architecture

   This section describes the design of the four components of the DTC
   framework: 1. Job Collector; 2. ALTO Client; 3. Task Scheduler;
   4. Data Transfer Node (DTN) Controller.

   Among these four modules, the task scheduler is the core of the
   framework.  The job collector provides the interface through which
   users submit data transfer requests, which are passed to the task
   scheduler for further processing.  The task scheduler makes
   scheduling decisions based on the network information obtained by
   the ALTO client as well as the requirements of each data transfer
   derived from the tasks.  After computing the bandwidth allocation
   for each task, the task scheduler sends transfer commands to the DTN
   controller to start data transmission.  Figure 1 shows the whole
   process.

                            .-----------.
                            |   Users   |
                            '-----------'
                                  | submit jobs
    .- - - - - - - - - - - - - - -|- - - - - - - - - - - - - - -.
    |                       .-----------.                       |
    |  DTC                  |    Job    |                       |
    |                       | Collector |                       |
    |                       '-----------'                       |
    |                             | pass user defined tasks     |
    |                             | to Task Scheduler           |
    |  .-----------.        .-----------.        .---------.    |
    |  |    DTN    |--------|   Task    |--------|  ALTO   |    |
    |  | Controller|  send  | Scheduler |  get   | Client  |    |
    |  '-----------'transfer'-----------'network '---------'    |
    |               commands             state                  |
    ' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - '

                                Figure 1

   The benefits of DTC include:

   o  1.  It can achieve better network resource (bandwidth) allocation
      since it manages all data transfer requirements in a centralized
      framework.

   o  2.  It takes customized data transfer requirements into
      consideration by introducing DataTransferTask and SyncTask to
      capture the correlation among data flows.

   o  3.  It is modular, supporting different scheduling algorithm
      implementations.

4.2.  Job Collector

   The job collector is responsible for managing data transfer requests
   from users and passing them to the task scheduler for further
   processing.  Because the requests are dynamic, the API of the job
   collector allows dynamic insertion and deletion of data transfers.
   The data transfer description and the user-facing API are described
   in Section 5.4 (Service API).

4.3.  ALTO Client

   The ALTO client is responsible for obtaining network state for the
   task scheduler.  Although different scheduling algorithms may
   request different ALTO services, the cost map and the Endpoint Cost
   Service appear to be the most useful services for scheduling tasks.

4.3.1.  PASSIVE and ACTIVE Mode

   The ALTO client should support two modes, PASSIVE and ACTIVE,
   according to the way it perceives network state changes.  In PASSIVE
   mode, the ALTO client queries the ALTO server periodically to get
   the latest network state.  If the network state changes after one
   query, the ALTO client will not be aware of the change until the
   next query.  In ACTIVE mode, the ALTO client queries the ALTO server
   only once, to get the initial network state.  If the network state
   changes after that, the ALTO client is notified by the ALTO server,
   so it does not have to query the ALTO server again.  Note that
   ACTIVE mode can only be supported by an ALTO server that implements
   ALTO SSE.
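   The following Python sketch illustrates the difference between the
   two modes.  It is a minimal illustration, not part of the framework:
   the cost map URL, the update-stream URL, and the polling interval
   are assumptions, and the ACTIVE-mode loop parses only the "data"
   lines of an event stream.

   # Minimal sketch of the two ALTO client modes; URLs and the
   # polling interval are illustrative assumptions.
   import json
   import time
   import requests  # widely used third-party HTTP library

   COST_MAP_URL = "http://alto.example.com/costmap/routingcost"
   UPDATE_STREAM_URL = "http://alto.example.com/updates"  # hypothetical

   def passive_poll(on_network_state, interval=60):
       # PASSIVE: re-fetch the cost map periodically; changes between
       # two polls stay invisible until the next query.
       while True:
           resp = requests.get(
               COST_MAP_URL,
               headers={"Accept": "application/alto-costmap+json"})
           resp.raise_for_status()
           on_network_state(resp.json())
           time.sleep(interval)

   def active_subscribe(on_network_state):
       # ACTIVE: keep one long-lived connection open and let the ALTO
       # server push updates (requires a server implementing ALTO SSE).
       with requests.get(UPDATE_STREAM_URL, stream=True,
                         headers={"Accept": "text/event-stream"}) as resp:
           for line in resp.iter_lines():
               if line.startswith(b"data:"):
                   on_network_state(json.loads(line[len(b"data:"):]))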
4.4.  Task Scheduler

   The duty of the task scheduler is to assign tasks from the job
   collector to proper data transfer nodes (DTNs), splitting a file
   into several partial files for different DTNs if necessary, and to
   notify the DTN controller to initiate the transfers.  We do not
   discuss specific algorithms in this document, but we assume that the
   algorithms used by the scheduler take the network state provided by
   the ALTO client into consideration.  Different schedulers may follow
   different principles: some schedulers aim to maximize the number of
   finished tasks, while others try to transfer as much data as
   possible.

4.4.1.  Priority Model

   In this section, we propose a scheduling model based on priority.
   In this model, every task is assigned a predefined priority value,
   e.g., LOW, MEDIUM, or HIGH, to indicate how important it is.  The
   principle of this model is that tasks with higher priority have the
   privilege to occupy more resources, such as available bandwidth.  If
   the priority is not set, the task must be assigned a default one.

   Things become trickier when the user does not specify a priority but
   an expected finish time instead.  In this model, however, that case
   is handled by transforming the expected finish time into a priority
   with the following steps (a sketch of this loop follows below):

   o  1.  Assign the lowest priority to the task and schedule the task.

   o  2.  Calculate the task's estimated finish time.  If the estimated
      finish time is later than the user-specified finish time,
      increase the task's priority by one level and reschedule the
      task; otherwise, the scheduling procedure completes.

   o  3.  Repeat step 2 until either the scheduling procedure completes
      or the task has been assigned the highest priority.  If the task
      still cannot finish in time, it is kept at the highest priority
      and transfers as much data as possible.

   The specific algorithm used to adjust the resources according to the
   priority is not described in this document.
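   The escalation loop above can be summarized by the following Python
   sketch.  The priority levels and the scheduler interface (a
   schedule() call returning a plan with an estimated finish time) are
   assumptions made for illustration; as noted, the concrete scheduling
   algorithm is out of scope for this document.

   # Sketch of the expected-finish-time-to-priority transformation;
   # the scheduler interface is a hypothetical placeholder.
   PRIORITIES = ["LOW", "MEDIUM", "HIGH"]  # lowest to highest

   def schedule_with_deadline(task, expected_finish, scheduler):
       for priority in PRIORITIES:                       # step 1: lowest first
           plan = scheduler.schedule(task, priority)     # hypothetical call
           if plan.estimated_finish <= expected_finish:  # step 2: early enough?
               return plan
       # Step 3: even the highest priority cannot meet the deadline;
       # keep HIGH and transfer as much data as possible.
       return scheduler.schedule(task, PRIORITIES[-1])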
4.5.  DTN Controller

   The DTN controller is responsible for only the following two
   functions:

   o  1.  Receive and process instructions from the task scheduler,
      e.g., starting a new transfer, aborting a running transfer, and
      adjusting transfer parameters such as the transfer rate or the
      number of connections.

   o  2.  Monitor transfer status and report status changes to the task
      scheduler.  If a transfer fails or finishes, the DTN controller
      notifies the task scheduler of the details for further
      scheduling.

   If the task scheduler is viewed as a manager, then DTN controllers
   are workers, each focusing on its own job without caring about
   anything else.  DTN controllers are not able to communicate with
   each other, which means a DTN controller does not have a global
   view.  Since the DTN controller has to utilize DTNs to transfer
   data, it should be deployed either on a server able to access the
   DTNs or on the DTNs themselves.

5.  Scheduling Hub Service

   Introducing a systematic description of data transfers for SHS is
   challenging.  Although it is easy to describe each individual data
   transfer, such a simple description method is not sufficient for a
   centralized data transfer coordinator, because it is not capable of
   representing relations, e.g., dependencies, between different data
   transfers.

   To solve this problem, this section first introduces the concept of
   the Application Compute-Transfer Structure (ACTS), which captures
   the computation logic of an application.  ACTS includes two basic
   components: data computation and data transfer.  We observe that
   many data processing applications are composed of several data
   computations and several data transfers by which the data
   computations are linked into a complete data processing pipeline.
   For example, a MapReduce job includes mappers and reducers as data
   computation components, and data transfers act as connections
   between mappers and reducers.  However, SHS does not need the exact
   computation at the data computation nodes, only enough knowledge to
   reflect the dependencies between data transfers.  Hence, we provide
   applications with the ability to abstract computation for expressing
   dependency and coordination between data transfers.  By abstracting
   data computation, an application can define the relations between
   the data transfers to/from one data computation node, or a cluster
   of nodes, to express coarse-grained dependencies.  Finally, to map
   these concepts to the design, the SHS service API includes two task
   types, DataTransferTask and SyncTask, which define the basic data
   transfer information and the relations between data transfers,
   respectively.

5.1.  Application Compute-Transfer Structure

   For many applications, the whole data processing is divided into
   several small data computations depending on the different roles of
   servers; e.g., a MapReduce job is divided into two types of tasks,
   mapper and reducer, based on the roles of the servers.  All partial
   data computations are linked by data transfers, which transmit the
   result of a computation from one place to another.  Through the
   joint collaboration of all the small data computations, the
   application achieves its specific data processing.  We therefore use
   the Application Compute-Transfer Structure (ACTS), which includes
   data computation and data transfer, to convey the computation
   pattern of an application.  The mapping from the computation logic
   of an application to ACTS should be obvious, since ACTS includes
   only data computation and data transfer.

   Using ACTS, the computation logic of an application can be defined
   as several data computations and several data transfers that link
   the data computations, i.e., a Directed Acyclic Graph (DAG) in which
   each node is a data computation and each link is a data transfer.

5.2.  Abstract Computation

   SHS does not need to know the exact computation of each data
   computation node in ACTS.  But to schedule data transfers submitted
   by different applications, SHS needs information about the relations
   between data transfers, such as dependency and coordination.  The
   relations between data transfers are defined at the data computation
   nodes: to achieve a collaboration of multiple data computations,
   each data computation must rely on the results of others.  The
   dependencies among data computations define the relations among data
   transfers that SHS needs.  Hence, to express the relations of data
   transfers for better scheduling, an application should abstract its
   data computations.

   In this document, we define several attributes (dependency type,
   throughput matching, pipelining or blocking, and deadline) that can
   be used for abstract computation.  Dependency type takes one of two
   values, all and one, to specify when to start the output data
   transfers at a data computation node.  All indicates that the output
   data transfers cannot start until all input data transfers (at the
   same data computation node) finish; one indicates that as soon as
   one input data transfer finishes, the node can start its output data
   transfers instead of waiting for the other input data transfers.
   Throughput matching defines the throughput relation between input
   data transfers and output data transfers; e.g., an application may
   need higher throughput for the output data transfers than for the
   input ones.  Pipelining or blocking indicates whether the output
   data transfers must wait for the input data transfers to finish.
   Deadline specifies the deadline for all dependent data transfers.
5.3.  DataTransferTask and SyncTask

   In this section, we define two types of tasks that map the concepts
   above to the design of the service API.  A DataTransferTask defines
   the basic information of a data transfer, while a SyncTask defines
   the relations between data transfers, i.e., the abstract
   computation.  The schema for the DataTransferTask (dtt) and SyncTask
   representations is as follows:

     object {
       ResourcePath src;
       ResourcePath dst;
       JSONNumber   dataSize;
       JSONNumber   offset;
       [JSONString  deadline;]
     } DataTransferTask;

     object {
       JSONString   dependencies<1..*>;
       Attributes   attributes<1..*>;
     } SyncTask;

     object {
       JSONString   ss_id;
       JSONString   path;
     } ResourcePath;

     object-map {
       JSONString -> JSONString;
     } Attributes;

   with fields:

   o  src: This field specifies the source of the data transfer.

   o  dst: This field specifies the destination of the data transfer.

   o  ResourcePath: This type identifies a unique resource across
      multiple storage systems.  Since a storage system can be
      connected to multiple data transfer nodes, it is no longer
      accurate to identify a resource by server host and file path.  To
      solve this problem, DTC assigns every connected storage system a
      unique id.  Users can then combine ss_id, the unique storage
      system id, with path, which indicates the location of the file in
      the corresponding storage system, to identify a unique resource.

   o  dataSize: This field specifies the size of the data to transport.

   o  offset: This field specifies the offset of the data.  This gives
      the application the flexibility to split the data and transport
      the pieces separately.

   o  dependencies: This field specifies the dependencies of the
      SyncTask.  Mapped to ACTS, the dependencies of a SyncTask are the
      input data transfers of a data computation node.

   o  attributes: This field specifies the attributes of the SyncTask.
      Attributes is a key-value map in which each key is an attribute
      name and each value is the attribute value.  An attribute can be,
      e.g., the dependency type or throughput matching, as described in
      Section 5.2.

5.4.  Service API

   Normally, users register transfer jobs that include all of the
   corresponding DataTransferTasks and SyncTasks.  While a transfer job
   is running, the user should be able to add tasks to or remove tasks
   from the job dynamically.  To enable these features, a job collector
   should provide the following five functions to users (a usage sketch
   follows the list):

   o  register(): This function creates a new transfer job.  It must
      return a job id that the user can use to identify the created
      job.  If the creation fails, it must throw an error.

   o  unregister(job_id): This function aborts a running transfer job.
      It accepts a job_id parameter and must abort all tasks belonging
      to the job.  The function's return value should indicate whether
      the abort action succeeded.  If the job does not exist, it must
      throw an error.

   o  createTaskDesc(type, [args]): This function creates a task
      description satisfying the structures defined above.  The type
      argument specifies the type of task, DataTransferTask or
      SyncTask.  The args list specifies the content of the task: for a
      DataTransferTask, it includes src, dst, dataSize, offset, and
      deadline; for a SyncTask, it includes dependencies and
      attributes.  This function returns the specified task for further
      operations.

   o  addTask(job_id, task): This function adds a new task to an
      existing job.  It accepts a job_id and a task as parameters.  It
      must return a task id that the user can use to identify the added
      task.  If the addition fails, it must throw an error.

   o  removeTask(job_id, task_id): This function removes a task from an
      existing job.  It accepts a job_id and a task_id, which together
      identify the unique task to be removed.  The function's return
      value should indicate whether the remove action succeeded.
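   The following Python sketch shows client-side usage of these five
   functions.  The JobCollector client class, its endpoint URL, and the
   keyword-argument spelling of createTaskDesc() are assumptions for
   illustration; this document defines only the function semantics, not
   a concrete binding.

   # Hypothetical client-side sketch of the job collector API.
   from dtc_client import JobCollector   # hypothetical client library

   jc = JobCollector("http://dtc.example.com")   # hypothetical endpoint

   job_id = jc.register()                        # create a new transfer job

   # Describe one data transfer; src/dst follow the ResourcePath schema.
   dtt = jc.createTaskDesc(
       "data-transfer-task",
       src={"ss_id": "ss_0", "path": "/bigdata/mapreduce/map0.data"},
       dst={"ss_id": "ss_1", "path": "/bigdata/mapreduce/reduce0.data"},
       dataSize=100,
       offset=0)
   task_id = jc.addTask(job_id, dtt)             # e.g., returns "task_01"

   # Declare that dependent output transfers may start only after all
   # listed transfers finish.
   sync = jc.createTaskDesc(
       "sync-task",
       dependencies=[task_id],
       attributes={"dependency_type": "all"})
   jc.addTask(job_id, sync)

   jc.removeTask(job_id, task_id)                # drop a single task
   jc.unregister(job_id)                         # abort the whole job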
6.  Example

   Suppose a MapReduce job has 10 mappers and 5 reducers, and each
   mapper transfers data to each reducer, so there are 50 data
   transfers in all.  The application wants to express the requirement
   that the finishing time of all transfers, not of any individual
   transfer, be minimized.  Here we give JSON examples of what should
   be sent to the job collector to add a DataTransferTask and a
   SyncTask to an existing transfer job.  After the application adds a
   DataTransferTask to the transfer job, it receives a task_id that
   identifies the task (task_01, ..., task_50).  It then uses those
   task_ids to add a SyncTask.

   {
     "job-id": "job_00",
     "task": {
       "type": "data-transfer-task",
       "src": "http://192.168.0.0/bigdata/mapreduce/map0.data",
       "dst": "http://192.168.1.0/bigdata/mapreduce/reduce0.data",
       "data-size": "100",
       "offset": "0"
     }
   }

   {
     "job-id": "job_00",
     "task": {
       "type": "sync-task",
       "dependencies": [ "task_01", "task_02", ..., "task_50" ],
       "dependency_type": "all"
     }
   }

7.  Security Considerations

   A security analysis of this framework has not yet been conducted.

8.  IANA Considerations

   This document does not yet specify any IANA considerations.

9.  Acknowledgments

   The authors thank Yicheng Qian for helpful discussions.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

10.2.  Informative References

   [RFC7285]  Alimi, R., Ed., Penno, R., Ed., Yang, Y., Ed., Kiesel,
              S., Previdi, S., Roome, W., Shalunov, S., and R. Woundy,
              "Application-Layer Traffic Optimization (ALTO) Protocol",
              RFC 7285, DOI 10.17487/RFC7285, September 2014,
              <http://www.rfc-editor.org/info/rfc7285>.

Authors' Addresses

   Xin Wang
   Tongji University
   4800 Cao'an Road, Jiading District
   Shanghai
   China

   Email: xinwang2014@hotmail.com


   Shu Dong
   Tongji University
   4800 Cao'an Road, Jiading District
   Shanghai
   China

   Email: dongs2011@gmail.com


   Guohai Chen
   Huawei Technologies
   101 Software Avenue, Yuhua District
   Nanjing
   China

   Email: chenguohai@huawei.com