
Internet Engineering Task Force                             C. Yang, Ed.
Internet-Draft                                  Y.Liu&Y.Wang&SY.Pan, Ed.
Intended status: Standards Track    South China University of Technology
Expires: November 28, 2020                                       C. Chen
                                                                  Inspur
                                                                 G. Chen
                                                                    GSTA
                                                                  Y. Wei
                                                                  Huawei
                                                            May 27, 2020


                   A Massive Data Migration Framework
             draft-yangcan-ietf-data-migration-standards-04

Abstract

   This document describes a standardized framework for implementing
   massive data migration between traditional databases and big-data
   platforms on the cloud via the Internet, especially for an instance
   of the Hadoop data architecture.  The main goal of the framework is
   to provide concise and friendly interfaces so that users can more
   easily and quickly migrate massive data from a relational database
   to a distributed platform under a variety of requirements, in order
   to make full use of distributed storage resources and distributed
   computing capability to solve the bottleneck problems of both
   storage and computing performance in traditional enterprise-level
   applications.  This document covers the fundamental architecture,
   data element specifications, operations, and interfaces related to
   massive data migration.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 28, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions and Terminology
   3.  Specific Framework Implementation Standards
     3.1.  System Architecture Diagram
     3.2.  Source and Target of Migration
       3.2.1.  The Data Sources of Migration
       3.2.2.  The Connection Testing of Relational Data Sources
       3.2.3.  The Target Storage Container of Data Migration
       3.2.4.  Specifying Target Cloud Platform
       3.2.5.  Data Migration to Third-Party Web Applications
     3.3.  Type of Migrated Database
     3.4.  Scale of Migrated Table
       3.4.1.  Full Table Migration
       3.4.2.  Single Table Migration
       3.4.3.  Multi-Table Migration
     3.5.  Split-by
       3.5.1.  Single Column
       3.5.2.  Multiple Columns
       3.5.3.  Non-linear Segmentation
     3.6.  Conditional Query Migration
     3.7.  Dynamic Detection of Data Redundancy
     3.8.  Data Migration with Compression
     3.9.  Updating Mode of Data Migration
       3.9.1.  Appending Migration
       3.9.2.  Overwriting the Import
     3.10. The Encryption and Decryption of Data Migration
     3.11. Incremental Migration
     3.12. Real-Time Synchronization Migration
     3.13. The Direct Mode of Data Migration
     3.14. The Storage Format of Data Files
     3.15. The Number of Map Tasks
     3.16. The Selection of Table Elements to Be Migrated
     3.17. Visualization of Migration
       3.17.1.  Dataset Visualization
       3.17.2.  Visualization of Data Migration Progress
     3.18. Smart Analysis of Migration
     3.19. Task Scheduling
     3.20. The Alarm of Task Error
     3.21. Data Export From Cloud to RDBMS
       3.21.1.  Data Export Diagram
       3.21.2.  Full Export
       3.21.3.  Partial Export
     3.22. The Merger of Data
     3.23. Column Separator
     3.24. Record Line Separator
     3.25. The Mode of Payment
     3.26. Web Shell for Migration
       3.26.1.  Linux Web Shell
       3.26.2.  HBase Shell
       3.26.3.  Hive Shell
       3.26.4.  Hadoop Shell
       3.26.5.  Spark Shell
       3.26.6.  Spark Shell Programming Language
   4.  Security Considerations
   5.  IANA Considerations
   6.  References
     6.1.  Normative References
     6.2.  URL References
   Authors' Addresses

1.  Introduction

   With the widespread popularization of cloud computing and big data
   technology, the scale of data is increasing rapidly, and distributed
   computing requirements are more significant than before.  For a long
   time, a majority of companies have used relational databases to
   store and manage their data, and a great amount of structured data
   still exists and accumulates in legacy systems as business develops.
   With the daily growth of data size, the storage bottleneck and the
   performance degradation in data analysis and processing have become
   serious problems that need to be solved in global enterprise-level
   applications.  A distributed platform refers to a software platform
   that builds data storage, data analysis, and computation on a
   cluster of multiple hosts; its core architecture involves
   distributed storage and distributed computing.  In terms of storage,
   it is theoretically possible to expand capacity indefinitely, and
   storage can be dynamically expanded horizontally as data grows.  In
   terms of computing, key computing frameworks such as MapReduce can
   be used to perform parallel computing on large-scale datasets to
   improve the efficiency of massive data processing.  Therefore, when
   the data size exceeds the storage capacity of a single system or the
   computation exceeds the computing capacity of a stand-alone system,
   massive data can be migrated to a distributed platform.  The
   resource sharing and collaborative computing capabilities provided
   by a distributed platform can well solve large-scale data processing
   problems.  This document focuses on putting forward a standard for
   implementing a big data migration framework through web access via
   the Internet, and on helping users more easily and quickly migrate
   massive data from a traditional relational database to a cloud
   platform under multiple requirements.  Using the distributed storage
   and distributed computing technologies highlighted by the cloud
   platform, the framework solves the storage bottleneck and the
   problem of low data analysis and processing performance of
   relational databases.  Because it is accessed through the web, the
   framework supports an open working state and promotes global
   applications of data migration.

   Note: It is also permissible to implement this framework in a
   non-web environment.

2.  Definitions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   The following definitions are for terms used in the context of this
   document.

   o  "DMOW": Its full name is "Data Migration on Web",it means data
      migration based on web.

   o  "Cloud Computing": Cloud computing is a pay-per-use model that
      provides available, convenient, on-demand network access, the user
      enters a configurable computing resource sharing pool (resources
      include network, server, storage, application software, services),
      these resources can be provided quickly, with little
      administrative effort or little interaction with service
      providers.

   o  "Big Data": A collection of data that cannot be captured, managed,
      and processed using conventional software tools within a certain
      time frame.  That is a massive, high growth rate and diversified
      information assets that require new processing modes to have
      stronger decision-making power, insight and process optimization
      capabilities.

   o  "Data Migration": The data migration described in this document is
      aimed at the data transfer process between a relational database
      and a cloud platform.

   o  "Data Storage": Data is recorded in a format on the computer's
      internal or external storage media.

   o  "Data Cleansing": It is a process of re-examining and verifying
      data.  The purpose is to remove duplicate information, correct
      existing errors, and provide data consistency.

   o  "Extraction-Transformation-Loading(ETL)": The processing of user
      database or data warehouse.  That is, data is extracted from
      various data sources, converted into data that meets the needs of
      the business, and finally loaded into the database.

   o  "Distributed Platform": A software platform that builds data
      storage, data analysis, and calculations on a cluster of multiple
      hosts.

   o  "Distributed File System": The physical storage resources managed
      by the file system are not directly connected to the local node.
      Instead, they are distributed on a group of machine nodes
      connected by a high-speed internal network.  These machine nodes
      together form a cluster.

   o  "Distributed Computing": A computer science discipline studies how
      to divide a problem that requires a very large amount of computing
      power into many small parts and it is coordinated by many
      independent computers to get the final result.

   o  "Apache Hadoop": An open source distributed system infrastructure
      that can be used to develop distributed programs for large data
      operations and storage.

   o  "Apache HBase": An open source, non-relational, distributed
      database.  Used with the Hadoop framework.

   o  "Apache Hive": It is a data warehouse infrastructure built on
      Hadoop.  It can be used for data extraction-transformation-
      loading(ETL), it is a mechanism that can store, query, and analyze
      large-scale data stored in Hadoop.

   o  "HDFS": A Hadoop distributed file system designed to run on
      general-purpose hardware.

   o  "MapReduce": A programming model for parallel computing of large-
      scale data sets (greater than 1 TB).

   o  "Spark": It is a fast and versatile computing engine designed for
      large-scale data processing.

   o  "MongoDB":It is a database based on distributed file storage
      designed to provide scalable, high-performance data storage
      solutions for web applications.

   o  "GHTs":the sender to send the granular header information index
      table of the information granules.

   o  "GDBs": the sender to send the granular information database of
      the information granules.

   o  "GHTr": the receiver to receive the granular header information
      index table of the information granules.

   o  "GDBr":the receiver to receive the granular information database
      of the information granules.

3.  Specific Framework Implementation Standards

   The main goal of this data migration framework is to help companies
   migrate their massive data stored in relational databases to cloud
   platforms through web access.  We propose a series of rules and
   constraints on the implementation of the framework, by which the
   users can conduct massive data migration with a multi-demand
   perspective.

   Note: The cloud platforms mentioned in this document refer to the
   Hadoop platform by default.  All standards on the operations and the
   environment of the framework assume web-based access by default.

3.1.  System Architecture Diagram

   Figure 1 shows the working diagram of the framework.


       +---------+         +----------------+
       |         |   (1)   |    WebServer   |
       | Browser |-------->|                |---------------------
       |         |         |  +-----------+ |                    |
       +---------+         |  |   DMOW    | |                    |
                           |  +-----------+ |                    |
                           +----------------+                    |
                                                                 |(2)
                                                                 |
                                                                 |
           +-------------+          +-----------------------+    |
           |             |   (3)    |                       |    |
           | Data Source |--------> |     Cloud Platform    |    |
           |             |          |  +-----------------+  |<----
           +-------------+          |  | Migration Engine|  |
                                    |  +-----------------+  |
                                    +-----------------------+



                      Figure 1: Reference Architecture

   The workflow of the framework is as follows:

      Step (1) in the figure means that users submit a requisition for
      data migration to DMOW through a browser (the requisition
      includes data source information, target cloud platform
      information, and related migration parameter settings);

      Step (2) in the figure means that DMOW submits the user's data
      migration request information to the cloud platform's migration
      engine;

      Step (3) in the figure means that the migration engine performs
      data migration tasks based on the migration requests it receives,
      migrating data from the relational database to the cloud
      platform.

3.2.  Source and Target of Migration

3.2.1.  The Data Sources of Migration

   This framework MUST support data migration between relational
   databases and cloud platforms over the web, and MUST meet the
   following requirements:

   1.  The framework supports connecting to data sources in relational
       databases.  The relational database MUST be at least one of the
       following:

       *  SQLSERVER

       *  MYSQL

       *  ORACLE

   2.  This framework MUST support the dynamic perception of data
       information in relational databases under a normal connection;
       in other words:

        *  It MUST support dynamic awareness of all tables in a
           relational database;

        *  It MUST support dynamic awareness of all columns
           corresponding to all tables in a relational database, as
           illustrated below.
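
   As a non-normative illustration, a migration engine built on Apache
   Sqoop [sqoop] can enumerate the tables of a MySQL data source with
   the sqoop list-tables tool; the connection string, user name, and
   database name below are hypothetical:

      sqoop list-tables \
        --connect jdbc:mysql://db.example.com/corp \
        --username dbuser -P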

3.2.2.  The Connection Testing of Relational Data Sources

   Before conducting data migration, the framework MUST support testing
   the connection to the data sources to be migrated, and then decide
   whether to migrate, as sketched below.
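
   For example (non-normative), with Apache Sqoop [sqoop] a lightweight
   connection test can be performed by evaluating a trivial query
   against the hypothetical data source shown below; a failure here
   indicates that the migration should not proceed:

      sqoop eval \
        --connect jdbc:mysql://db.example.com/corp \
        --username dbuser -P \
        --query "SELECT 1"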

3.2.3.  The Target Storage Container of Data Migration

   This framework MUST allow users to migrate large amounts of data
   from a relational database to at least two of the following types of
   target storage containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.2.4.  Specifying Target Cloud Platform

   This framework MUST allow an authorized user to specify the target
   cloud platform to which the data will be migrated.

3.2.5.  Data Migration to Third-Party Web Applications

   This framework SHALL support the migration of large amounts of data
   from relational databases to one or multiple data containers for
   third-party Web applications.  The target storage containers of the
   third-party Web application systems can be:

   o  MONGODB

   o  MYSQL

   o  SQLSERVER

   o  ORACLE

3.3.  Type of Migrated Database

   This framework needs to meet the following requirements:

   o  It MAY support migrating the entire relational database to the
      cloud platform;

   o  It MAY support homogeneous migration (for example, migration from
      ORACLE to ORACLE);

   o  It MAY support heterogeneous migrations between different
      databases (for example, migration from ORACLE to SQLServer);

   o  It SHALL support the migration to the MONGODB database;

   o  It is OPTIONAL to support, if the migration process is
      interrupted, automatically restarting the migration process and
      continuing the migration from where it left off.  Additionally,
      the framework needs to be able to inform the user of such an
      abnormal interruption in the following manner:

      *  It MUST support popping up an alert box on the screen of the
         user;

      *  It SHALL support notifying users by email;

      *  It is OPTIONAL to notify users by an instant messenger such as
         WeChat or QQ;

3.4.  Scale of Migrated Table

3.4.1.  Full Table Migration

   This framework MUST support the migration of all tables in a
   relational database to at least two types of target storage
   containers:
   o  HDFS

   o  HBASE

   o  HIVE

3.4.2.  Single Table Migration

   This framework MUST allow users to specify a single table in a
   relational database and migrate it to at least two types of target
   storage containers (a sketch follows this list):

   o  HDFS

   o  HBASE

   o  HIVE
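
   As a non-normative sketch, such a single-table migration could be
   expressed with Apache Sqoop [sqoop]; the connection string, table,
   and paths are hypothetical, and the HIVE and HBASE variants differ
   only in the target options shown in the comments:

      # Import one table into HDFS
      sqoop import --connect jdbc:mysql://db.example.com/corp \
        --username dbuser -P --table emp \
        --target-dir /data/emp

      # For HIVE, replace --target-dir with:
      #   --hive-import --hive-table emp
      # For HBASE, replace it with:
      #   --hbase-table emp --column-family info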

3.4.3.  Multi-Table Migration

   This framework MUST allow users to specify multiple tables in a
   relational database and migrate them to at least two types of target
   storage containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.5.  Split-by

   This framework needs to meet the following requirements on
   split-by.

3.5.1.  Single Column

   1.  The framework MUST allow the user to specify a single column of
       the data table (usually the table's primary key), then slice the
       data in the table into multiple parallel tasks based on this
       column, and migrate the sliced data to one or more of the
       following target data containers respectively:

       *  HDFS

       *  HBASE

       *  HIVE
          The specification of the data table column can be based on the
          following methods:

          +  Users can specify freely;

          +  Users can specify linearly;

          +  Users can select an appropriate column for the segmentation
             based on the information entropy of the selected column
             data;

   2.  The framework SHALL allow the user to query the boundaries of
       the specified split-by column, then slice the data into multiple
       parallel tasks and migrate the data to one or more of the
       following target data containers (a sketch follows this list):

       *  HDFS

       *  HBASE

       *  HIVE
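
   The following non-normative sketch shows how such a sliced, parallel
   import could be expressed with Apache Sqoop [sqoop]; the table,
   column, and connection parameters are hypothetical.  The
   --boundary-query option supplies the queried boundaries of the
   split-by column:

      sqoop import --connect jdbc:mysql://db.example.com/corp \
        --username dbuser -P --table emp \
        --split-by id --num-mappers 8 \
        --boundary-query "SELECT MIN(id), MAX(id) FROM emp" \
        --target-dir /data/emp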

3.5.2.  Multiple Columns

   This framework MAY allow the user to specify multiple columns in the
   data table to slice the data linearly into multiple parallel tasks
   and then migrate the data to one or more of the following target data
   containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.5.3.  Non-linear Segmentation

   It is OPTIONAL for this framework to support non-linear intelligent
   segmentation of data on one or more columns, migrating the data to
   one or more of the following target data containers:

      The non-linear intelligent segmentations refer to:

      *  Adaptive segmentation based on the distribution (density) of
         the values of numerical columns;

      *  Adaptive segmentation based on the distribution of entropy of
         subsegments of a column;

      *  Adaptive segmentation based on a neural network predictor;

      The target data container includes:

      *  HDFS

      *  HBASE

      *  HIVE

3.6.  Conditional Query Migration

   This framework SHALL allow users to specify query conditions, query
   out the corresponding data records, and migrate them, as illustrated
   below.
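
   As a non-normative sketch with Apache Sqoop [sqoop], a --where
   clause (or a free-form --query containing the mandatory $CONDITIONS
   token) restricts the migration to the records matching the query
   condition; all names below are hypothetical:

      sqoop import --connect jdbc:mysql://db.example.com/corp \
        --username dbuser -P --table emp \
        --where "hire_date >= '2019-01-01'" \
        --target-dir /data/emp_2019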

3.7.  Dynamic Detection of Data Redundancy

   It is OPTIONAL for the framework to allow users to add data
   redundancy labels and label communication mechanisms, so that it can
   detect redundant data dynamically during data migration to achieve
   non-redundant migration.

   The specific requirements are as follows:

   o  The framework SHALL be able to perform deep granulation
      processing on the piece of data content to be sent, meaning that
      the content segment to be sent is further divided into
      smaller-sized data sub-blocks.

   o  The framework SHALL be able to perform feature calculation and
      form a grain head for each of the decomposed granules; the
      granular header information includes, but is not limited to, the
      grain feature amount, grain data fingerprint, unique grain ID
      number, granule generation time, source address, and destination
      address.

   o  The framework SHALL be able to inspect the granular header
      information to determine the transmission status of each
      decomposed information granule: if the information granule to be
      sent is already present at the receiving end, the content of the
      granule is not sent; otherwise, the granule is sent out.

   o  The framework SHALL be able to set a cache at the sending port to
      hold the granular header information index table (GHTs) and the
      granular information database (GDBs) for the information
      granules; the receiver SHALL be able to set a cache to hold the
      granular header information index table (GHTr) and the granular
      information database (GDBr) for the information granules that
      have been successfully received.

   o  After all the fragments of the data have been transferred, the
      framework SHALL be able to reassemble all the fragments and store
      the data on the receiving disk.

   o  The framework SHALL be able to set a granular encoder at the
      sending port, which is responsible for encoding and compressing
      the information granule content generated by the granular
      resolver.  The encoder generates a coded version of the
      corresponding information granule and calculates the grain-head
      content of the compressed information granule, then performs
      transmission in the manner of sending the grain head, detecting
      redundant granules, synthesizing the granules, and accurately
      detecting redundant granules.

   o  The framework SHALL be able to set a granular decoder at the
      receiving port, which is responsible for decoding the encoded,
      compressed granular content at the receiving port and merging it
      with the grain synthesizer, whether the content comes from the
      sending-port cache or the receiving-port cache.

3.8.  Data Migration with Compression

   During the data migration process, the data is not compressed by
   default.  This framework MUST support at least one of the following
   data compression encoding formats, allowing the user to compress the
   data while migrating it (a sketch follows this list):

   o  GZIP

   o  BZIP2
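
   As a non-normative illustration with Apache Sqoop [sqoop],
   compression is enabled per job by naming a Hadoop codec class; the
   BZip2 codec shown here ships with Hadoop, while the connection
   details are hypothetical:

      sqoop import --connect jdbc:mysql://db.example.com/corp \
        --username dbuser -P --table emp --compress \
        --compression-codec org.apache.hadoop.io.compress.BZip2Codec \
        --target-dir /data/emp_bz2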

3.9.  Updating Mode of Data Migration

3.9.1.  Appending Migration

   This framework SHALL support the migration of appending data to
   existing datasets in HDFS.

3.9.2.  Overwriting the Import

   When importing data into HIVE, the framework SHALL support
   overwriting the original dataset when saving, as sketched below.
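
   A non-normative sketch with Apache Sqoop [sqoop]: --append adds new
   files to an existing HDFS dataset, while --hive-overwrite replaces
   the data of an existing Hive table; all names are hypothetical:

      # Appending migration into HDFS
      sqoop import --connect jdbc:mysql://db.example.com/corp \
        --username dbuser -P --table emp \
        --target-dir /data/emp --append

      # Overwriting import into HIVE
      sqoop import --connect jdbc:mysql://db.example.com/corp \
        --username dbuser -P --table emp \
        --hive-import --hive-table emp --hive-overwrite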

3.10.  The Encryption and Decryption of Data Migration

   This framework needs to meet the following requirements:

   o  It MAY support data encryption at the source, in which case the
      received data should be decrypted and stored on the target
      platform;

   o  It MUST support authentication when accessing the data migration
      source data;

   o  It SHALL support the verification of identity and permission when
      accessing the target platform of data migration;

   o  During the process of data migration, it SHOULD support data
      consistency;

   o  During the process of data migration, it MUST support data
      integrity;

3.11.  Incremental Migration

   The framework SHOULD support incremental migration of table records
   in a relational database, and it MUST allow the user to specify a
   field value of the table as "last_value" in order to characterize
   the row-record increment.  The framework SHOULD then migrate those
   records in the table whose field value is greater than the specified
   "last_value", and afterwards update the "last_value", as sketched
   below.

3.12.  Real-Time Synchronization Migration

   The framework SHALL support real-time synchronous migration of
   updated data and incremental data from a relational database to one
   or more of the following target data containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.13.  The Direct Mode of Data Migration

   This framework MUST support data migration in direct mode, which can
   increase the data migration rate, as sketched below.

   Note: This mode is supported only for MYSQL and POSTGRESQL.
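
   For illustration (non-normative), the --direct flag of Apache Sqoop
   [sqoop] switches from JDBC to the database's native dump utility
   (e.g., mysqldump for MYSQL), which matches the restriction in the
   note above; the connection details are hypothetical:

      sqoop import --connect jdbc:mysql://db.example.com/corp \
        --username dbuser -P --table emp --direct \
        --target-dir /data/emp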

3.14.  The Storage Format of Data Files

   This framework MUST allow saving the migrated data in at least one
   of the following data file formats (a sketch follows this list):

   o  SEQUENCE

   o  TEXTFILE

   o  AVRO
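
   As a non-normative illustration, in Apache Sqoop [sqoop] these
   formats correspond to the mutually exclusive options below; exactly
   one of them is given per import job:

      sqoop import --connect jdbc:mysql://db.example.com/corp \
        --username dbuser -P --table emp \
        --target-dir /data/emp \
        --as-sequencefile   # or --as-textfile, --as-avrodatafile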

3.15.  The Number of Map Tasks

   This framework MUST allow the user to specify the number of map
   tasks, starting a corresponding number of map tasks to migrate large
   amounts of data in parallel.

3.16.  The Selection of Table Elements to Be Migrated

   o  The specification of columns

          This framework MUST support the user specifying the data of
          one or multiple columns in a table to be migrated.

   o  The specification of rows

          This framework SHOULD support the user specifying the range
          of rows in a table to be migrated.

   o  The combination of the specification of columns and rows

          This framework MAY optionally support the user specifying the
          range of both rows and columns in a table to be migrated, as
          sketched below.
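
   A non-normative sketch with Apache Sqoop [sqoop]: --columns selects
   the columns to be migrated, and a --where clause bounds the rows;
   the column names and range below are hypothetical:

      sqoop import --connect jdbc:mysql://db.example.com/corp \
        --username dbuser -P --table emp \
        --columns "id,name,salary" \
        --where "id BETWEEN 1000 AND 1999" \
        --target-dir /data/emp_subset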

3.17.  Visualization of Migration

3.17.1.  Dataset Visualization

   After the framework has migrated the data in the relational
   database, it MUST support the visualization of the dataset in the
   cloud platform.

3.17.2.  Visualization of Data Migration Progress

   The framework SHOULD support dynamically showing the progress to
   users in graphical form while migrating.

3.18.  Smart Analysis of Migration

   The framework MAY provide automated migration proposals to facilitate
   the user's estimation of migration workload and costs.

3.19.  Task Scheduling

   The framework SHALL support the user setting various migration
   parameters (such as the number of map tasks, the storage format of
   data files, the type of data compression, and so on) as well as the
   task execution time, and then scheduling the off-line/online
   migration tasks accordingly.

3.20.  The Alarm of Task Error

   When a task fails, the framework MUST at least support notifying
   stakeholders in a predefined way.

3.21.  Data Export From Cloud to RDBMS

3.21.1.  Data Export Diagram

   Figure 2 shows the framework's working diagram of exporting data.


       +---------+         +----------------+
       |         |   (1)   |    WebServer   |
       | Browser |-------->|                |---------------------
       |         |         |  +-----------+ |                    |
       +---------+         |  |   DMOW    | |                    |
                           |  +-----------+ |                    |
                           +----------------+                    |
                                                                 |(2)
                                                                 |
                                                                 |
           +-------------+          +-----------------------+    |
           |             |   (3)    |                       |    |
           | Data Source |<-------- |     Cloud Platform    |    |
           |             |          |  +-----------------+  |<----
           +-------------+          |  | Migration Engine|  |
                                    |  +-----------------+  |
                                    +-----------------------+



                        Figure 2: Reference Diagram

   The workflow of exporting data through the framework is as follows:

      Step (1) in the figure means that users submit a requisition for
      data migration to DMOW through a browser (the requisition
      includes cloud platform information, the information of the
      target relational database, and related migration parameter
      settings);

      Step (2) in the figure means that DMOW submits the user's data
      migration request information to the cloud platform's migration
      engine;

      Step (3) in the figure means that the migration engine performs
      data migration tasks based on the migration requests it receives,
      migrating data from the cloud platform to the relational
      database.

3.21.2.  Full Export

   The framework MUST at least support exporting data from HDFS to one
   of the following relational databases (a sketch follows these
   lists):

   o  SQLSERVER

   o  MYSQL

   o  ORACLE

   The framework SHALL support exporting data from HBASE to one of the
   following relational databases:

   o  SQLSERVER

   o  MYSQL

   o  ORACLE

   The framework SHALL support exporting data from HIVE to one of the
   following relational databases:

   o  SQLSERVER

   o  MYSQL

   o  ORACLE
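
   As a non-normative illustration, sqoop export [sqoop] reads an HDFS
   directory and writes its records into an existing relational table;
   the table, path, and separator below are hypothetical:

      sqoop export --connect jdbc:mysql://db.example.com/corp \
        --username dbuser -P --table emp_copy \
        --export-dir /data/emp \
        --input-fields-terminated-by ','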

3.21.3.  Partial Export

   The framework SHALL allow the user to specify a data range of keys
   on the cloud platform and export the elements in the specified range
   to a relational database.  It SHALL also support exporting into a
   subset of columns.

3.22.  The Merger of Data

   The framework SHALL support merging data in different directories in
   HDFS and storing the result in a specified directory, as sketched
   below.
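
   A non-normative sketch using Sqoop's merge tool [sqoop], which
   flattens a newer HDFS dataset onto an older one by key; the record
   class jar is produced beforehand by sqoop codegen, and all names
   are hypothetical:

      sqoop merge --new-data /data/emp_new --onto /data/emp_old \
        --target-dir /data/emp_merged \
        --jar-file emp.jar --class-name emp --merge-key id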

3.23.  Column Separator

   The framework MUST allow the user to specify the separator between
   fields in the migration process.

3.24.  Record Line Separator

   The framework MUST allow the user to specify the separator between
   record lines in the migrated output, as sketched below.
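
   As a non-normative illustration, in Apache Sqoop [sqoop] the column
   separator of Section 3.23 and the record line separator of this
   section map to the following options; the tab and newline characters
   shown are hypothetical choices:

      sqoop import --connect jdbc:mysql://db.example.com/corp \
        --username dbuser -P --table emp \
        --fields-terminated-by '\t' \
        --lines-terminated-by '\n' \
        --target-dir /data/emp_tsv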

3.25.  The Mode of Payment

   1.  One-way payment mode

        *  In the framework, by default, users SHALL pay for
           downloading data from the cloud platform, while uploading
           data from a relational database to the cloud platform is
           free;

        *  Alternatively, users SHALL pay for uploading data from a
           relational database to the cloud platform, while downloading
           data from the cloud is free;

   2.  Two-way payment mode

          In this mode, users SHALL pay a fee for data migration in
          both directions between the relational database and the cloud
          platform;

3.26.  Web Shell for Migration

   The framework provides the following shells with a character
   interface, operated through web access.

3.26.1.  Linux Web Shell

   The framework SHALL support a Linux shell through web access, which
   allows users to execute basic Linux commands for the configuration
   management of the migrated data on the web.

3.26.2.  HBase Shell

   The framework SHALL support an HBase shell through web access, which
   allows users to perform basic operations such as adding, deleting,
   and modifying on the data migrated to HBase through the web shell.

3.26.3.  Hive Shell

   The framework SHALL support a Hive shell through web access, which
   allows users to perform basic operations such as adding, deleting,
   and modifying on the data migrated to Hive through the web shell.

3.26.4.  Hadoop Shell

   The framework SHALL support the Hadoop shell through web access so
   that users can perform basic Hadoop command operations through the
   web shell.

3.26.5.  Spark Shell

   The framework SHALL support spark shell through web access and
   provide an interactive way to analyze and process the data in the
   cloud platform.

3.26.6.  Spark Shell Programming Language

   In the Spark web shell, the framework SHALL support at least one of
   the following programming languages:

   o  Scala

   o  Java

   o  Python

4.  Security Considerations

   The framework SHOULD support securing the data migration process.
   During the data migration process, it should support encrypting the
   data before transmission and then decrypting it for storage at the
   target after the transfer is complete.  At the same time, it must
   support authentication when getting the data migration source data,
   and it shall support the verification of identity and permission
   when accessing the target platform of data migration.

5.  IANA Considerations

   This memo includes no request to IANA.

6.  References

6.1.  Normative References

   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
              3", BCP 9, RFC 2026, DOI 10.17487/RFC2026, October 1996,
              <https://www.rfc-editor.org/info/rfc2026>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC2578]  McCloghrie, K., Ed., Perkins, D., Ed., and J.
              Schoenwaelder, Ed., "Structure of Management Information
              Version 2 (SMIv2)", STD 58, RFC 2578,
              DOI 10.17487/RFC2578, April 1999,
              <https://www.rfc-editor.org/info/rfc2578>.

6.2.  URL References

   [hadoop]   The Apache Software Foundation,
              "http://hadoop.apache.org/".

   [hbase]    The Apache Software Foundation,
              "http://hbase.apache.org/".

   [hive]     The Apache Software Foundation, "http://hive.apache.org/".

   [idguidelines]
              IETF Internet Drafts editor,
              "http://www.ietf.org/ietf/1id-guidelines.txt".

   [idnits]   IETF Internet Drafts editor,
              "http://www.ietf.org/ID-Checklist.html".

   [ietf]     IETF Tools Team, "http://tools.ietf.org".

   [ops]      the IETF OPS Area, "http://www.ops.ietf.org".

   [spark]    The Apache Software Foundation,
              "http://spark.apache.org/".

   [sqoop]    The Apache Software Foundation,
              "http://sqoop.apache.org/".

   [xml2rfc]  XML2RFC tools and documentation,
              "http://xml.resource.org".

Authors' Addresses

   Can Yang (editor)
   South China University of Technology
   382 Zhonghuan Road East
   Guangzhou Higher Education Mega Centre
   Guangzhou, Panyu District
   P.R.China

   Phone: +86 18602029601
   Email: cscyang@scut.edu.cn


   Yu Liu, Ying Wang & ShiYing Pan (editors)
   South China University of Technology
   382 Zhonghuan Road East
   Guangzhou Higher Education Mega Centre
   Guangzhou, Panyu District
   P.R.China

   Email: 201820132798@scut.edu.cn

   Cong Chen
   Inspur
   163 Pingyun Road
   Guangzhou, Tianhe District
   P.R.China

   Email: chen_cong@inspur.com


   Ge Chen
   GSTA
   No. 109 Zhongshan Road West, Guangdong Telecom Technology Building
   Guangzhou, Tianhe District
   P.R.China

   Email: cheng@gsta.com


   Yukai Wei
   Huawei
   Putian Huawei base
   Shenzhen, Longgang District
   P.R.China

   Email: weiyukai@huawei.com
