Network Working Group                                             Y. Cui
Internet-Draft                                                    Z. Lai
Intended status: Informational                                    L. Sun
Expires: May 5, 2016                                 Tsinghua University
                                                        November 2, 2015


                Internet Storage Sync: Problem Statement
                        draft-cui-iss-problem-03

Abstract

   Internet storage services have become increasingly popular.  They
   attract a huge number of users and produce a significant share of
   Internet traffic.  Most existing Internet storage services use
   proprietary sync protocols with differing capabilities to
   synchronize data.  However, an Internet storage service that relies
   on its own proprietary sync protocol has intrinsic limitations in
   service usability and network performance.  This document outlines
   the problems caused by proprietary sync protocols and by missing key
   capabilities.  It also shows the demand for a standard sync protocol
   that achieves better usability and sync performance.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 5, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents



Cui, et al.                Expires May 5, 2016                  [Page 1]

Internet-Draft                iss Problems                 November 2015


   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology and Concepts
   3.  Architecture of Internet Storage Service
   4.  Problems
     4.1.  Complicated Support for APIs
     4.2.  Unavailable Cross-service Sync
     4.3.  Multiple Similar Clients
     4.4.  Protocol Capability Configurations and Implementations
       4.4.1.  Chunking and Deduplication
       4.4.2.  Chunking and Delta-encoding
       4.4.3.  Bundling
     4.5.  Sync Protocols in Mobile and Wireless Environments
     4.6.  Unsatisfactory Concurrent Work Ability
   5.  Advantages of Standard Sync Protocol
   6.  Understanding of Sync Protocol
   7.  Related Work in IETF
   8.  Security Considerations (TBD)
   9.  Acknowledgements
   10. Informative References
   Authors' Addresses

1.  Introduction

   Internet storage services provide a convenient way for users to
   synchronize local files or folders with remote servers.  In recent
   years, Internet storage services have gained tremendous popularity
   and accounted for a large amount of Internet traffic.  This high
   public interest also pushes various providers to enter the Internet
   storage market.  Services like Dropbox, Google Drive, OneDrive and
   Box are becoming pervasive in people's routines.  Dropbox, typically
   considered one of the leading providers, announced in June 2015 that
   it had more than 400 million registered users [users], and this
   number is expected to keep growing.  Internet storage services
   enable users to access, operate on and share their data from
   anywhere, on any device, at any time and with any connectivity.
   Internet storage services also provide powerful APIs which allow
   third-party applications to offload the burden of data storage and
   management to the server.  By aggregating users' files or application
   data in the server, Internet storage services are becoming the "data
   entrance" for personal users.

   The sync protocol is the key design consideration of Internet
   storage services.  It can be equipped with several capabilities to
   optimize storage usage and speed up data transmission.  Existing
   Internet storage services employ proprietary sync protocols to store
   user data on, and retrieve it from, remote servers.  However, using
   proprietary sync protocols with different capabilities in different
   Internet storage services imposes intrinsic limitations on service
   usability and network performance.

   Multi-service usability: Users may use multiple Internet storage
   services to benefit from their different performance and
   functionality.  In addition, because an Internet storage service has
   full access to user data, that data is at risk when the service is
   attacked or when authorities require the provider to disclose it.
   Some enterprise users may want to use their own network-based
   storage service.  Furthermore, it is complicated for developers to
   integrate their applications with several Internet storage services
   through different APIs.  Proprietary protocols also prevent a user
   of one Internet storage service from synchronizing data with users
   of another service.  Moreover, to use multiple services a user may
   have to install a series of client applications with similar
   functionality, which wastes local resources and degrades the user
   experience.

   Missing or misused capabilities: Previous work shows that existing
   Internet storage services have different capability configurations
   and implementations.  These capabilities are closely related to each
   other and help to synchronize user data efficiently.  However, most
   storage services are found to lack key capabilities or to configure
   them unreasonably, which may result in unexpected sync failures and
   sync inefficiency.  How to design and implement capabilities
   reasonably in the sync protocol has become a critical problem for
   providers.

   To address the problems mentioned above, an open and standard sync
   protocol is required.  In addition, this standard sync protocol is
   expected to support the useful capabilities to avoid unexpected sync
   failures and improve network performance.

   This document outlines the problems that arise in existing Internet
   storage services with various proprietary sync protocols.  Section 2
   lists the terminology and related concepts of Internet storage
   services.  Section 3 introduces the architecture of existing Internet
   storage services.  Section 4 describes the main problems and issues
   that need to be considered.  Section 5 explains the advantages of
   using an open and standard sync protocol.  Section 6 gives a
   high-level understanding of the sync protocol.  Section 7 identifies
   the differences between ISS and related work in the IETF (i.e.
   WebDAV).

2.  Terminology and Concepts

   Data synchronization (sync): A primary technique for Internet storage
   services.  It enables the client to automatically propagate local
   file changes to the remote servers through network communications.

   Client: An application that is installed on the user side (possibly
   on multiple terminals).  It enables users to access and use an
   Internet storage service.

   Control server: The entity responsible for authenticating users,
   managing metadata and notifying clients of changes.  It stores the
   authentication and metadata information of users.

   Data storage server: The entity that stores the synchronized files of
   users.

   Control data: The control information exchanged with the control
   server to carry out the data sync process.  Typical control data
   includes metadata (e.g. hashes of chunks) and authentication
   information.

   Content data: The original data of the local file, often in the form
   of small chunks.

   Sync protocol: A communication protocol between the client and remote
   servers to achieve data sync.  It consists of a control flow and a
   data flow.  Existing sync protocols are typically built on HTTP or
   HTTPS.

   o  Control flow: This flow is for client and control server to
      exchange control data.

   o  Data flow: This flow is for transmitting content data between
      client and data storage servers.

   Sync efficiency: A performance metric that indicates how fast the
   changes can be synchronized to the Internet with the lowest traffic
   overhead.

   Useful capabilities to improve sync efficiency:

   o  Chunking: Split a large file into small chunks.

   o  Bundling: Transmit multiple small chunks as a single big chunk.

   o  Deduplication: Avoid retransmission of existing content on the
      Internet.

   o  Delta-encoding: Only synchronize modified data.

   o  Compression: Compress data before transmission.

3.  Architecture of Internet Storage Service

   Most Internet storage services are composed of three major
   components: client, control server and data storage server.  The
   overall architecture is shown in Figure 1.


                           * * * * * * * *
              * * * * * * *               * * * * * * *
            *                                 INTERNET  *
            *  +------------+        +------------+     *
         ------|   Control  |        | +------------+    *
        |  *   |   server   |        | |Data storage|========
        |   *  +------------+        + |   servers  |   *    |
        |   *                          +------------+   *    |
        |     * * * * * * *                * * * * * * *     |
   Control Flow            * * * * * * * *               Data Flow
        |                                                    |
        |                                                    |
        |                     +--------+                     |
         ---------------------| Client |=====================
                              +--------+

                               Figure 1


   The sync protocol enables these three components to communicate with
   each other.  The control server is responsible for storing all the
   control data, including authentication information and metadata.
   Whenever changes are made to synchronized files, the control server
   notifies the clients.  The other type of data, content data, is
   stored in the form of chunks on the data storage servers, which have
   no knowledge of the sources, users or relationships among chunks.  As
   a result, a complete user file is split into small chunks, and those
   chunks may be stored on several different data storage servers.  The
   two types of servers are separate logical entities and are usually
   deployed in different locations.  Every time the client synchronizes
   a local file to the Internet, it needs to exchange control data and
   content data with different types of servers in different flows.
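
   As an illustration only, the separation of control flow and data
   flow described above can be sketched in Python.  This is a minimal
   in-memory sketch, not any provider's implementation: the class and
   function names (ControlServer, DataStorageServer, upload, download),
   the 4-byte chunk size and the use of SHA-256 as the chunk identifier
   are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


class ControlServer:
    """Holds metadata only: which chunk hashes make up each file."""

    def __init__(self):
        self.file_index = {}  # file name -> ordered list of chunk hashes

    def commit(self, name, chunk_hashes):
        self.file_index[name] = list(chunk_hashes)


class DataStorageServer:
    """Holds opaque chunks keyed by hash; knows nothing about files."""

    def __init__(self):
        self.chunks = {}  # chunk hash -> chunk bytes

    def put(self, chunk_hash, data):
        self.chunks[chunk_hash] = data

    def get(self, chunk_hash):
        return self.chunks[chunk_hash]


def upload(name, content, control, storage):
    """Split content into chunks and use a separate flow for each kind."""
    hashes = []
    for i in range(0, len(content), CHUNK_SIZE):
        chunk = content[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        storage.put(digest, chunk)  # data flow: content data
        hashes.append(digest)
    control.commit(name, hashes)    # control flow: metadata


def download(name, control, storage):
    """Look up the chunk list (control flow), then fetch the chunks."""
    return b"".join(storage.get(h) for h in control.file_index[name])


control, storage = ControlServer(), DataStorageServer()
upload("notes.txt", b"hello world!", control, storage)
print(download("notes.txt", control, storage))
```

   In this sketch the data storage server only ever sees hash-addressed
   chunks, while the mapping from a file to its chunks is kept on the
   control server.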




4.  Problems

   Existing popular Internet storage services, including Dropbox,
   OneDrive and Google Drive, use their own proprietary sync protocols
   to achieve data sync.  The use of different proprietary protocols is
   generally considered not beneficial to the development of Internet
   services.  This section describes current problems for Internet
   storage services caused by their sync protocols.  We summarize six
   specific problems from three aspects: service usability, protocol
   capabilities and concurrent work ability.  As discussed in
   Section 1, users prefer to use multiple storage services for reasons
   of performance, reliability and security.  Usability across multiple
   services is still lacking to some extent because of the proprietary
   format of sync protocols.  Section 4.1, Section 4.2 and Section 4.3
   describe the problems concerning usability.  Moreover, previous work
   and measurements have revealed that most sync protocols lack key
   capabilities or configure them poorly, which significantly degrades
   network performance, especially in mobile and wireless environments.
   Section 4.4 and Section 4.5 illustrate the problems of current
   protocol capabilities.  In addition, the unsatisfactory concurrent
   work ability is described in Section 4.6.

4.1.  Complicated Support for APIs

   Popular Internet storage services provide APIs that expose the
   content management features of the client software to third-party
   applications.  In practice, these APIs take care of synchronizing
   data with Internet storage servers in a familiar, file-system-like
   way.  Behind the scenes, the API synchronizes changes to the server
   and automatically notifies the client when changes are made on other
   devices.  These APIs can also include more advanced features, e.g.
   revision or restoration of files, to make the client work better.
   Different providers offer developers different APIs, and these APIs
   differ in style and features in order to support different platforms
   (e.g. Windows and Android).

   Third-party applications prefer to integrate multiple Internet
   storage services to achieve better performance, reliability and
   security.  However, developers who want to use multiple storage
   services need to learn the APIs of every service provider in order
   to design and implement their own clients.  Although there are
   already some successful third-party clients that support multiple
   services (e.g. ExpanDrive [ExpanDrive], IFTTT [IFTTT]), it is not
   easy for developers to learn and apply so many different APIs when
   developing and maintaining their third-party clients.

4.2.  Unavailable Cross-service Sync

   Synchronization is one of the most important functions provided by
   Internet storage services.  With this function, files on the
   Internet can easily be shared and manipulated by different people
   and groups.  Anyone who is permitted to read and download a file is
   able to modify it and upload new versions to the Internet.

   However, this function works well only inside a single service.
   Users of the same Internet storage service can easily share (i.e.
   download) files and perform coordinated operations on them.  Sync
   among different Internet storage services, in contrast, is not
   available.  For example, if a Dropbox user currently wants to work
   on a shared file with a Google Drive user, the only option is to
   send the other user an open HTTP link to the file.  After clicking
   on that link, the Google Drive user can read and download the file
   through HTTP but cannot modify and update it.  This is because the
   file is stored on Dropbox servers, and a Google Drive client cannot
   download or upload the file through Dropbox's sync protocol, of
   which it has no knowledge.  The use of different proprietary sync
   protocols by different services results in this unavailability.

4.3.  Multiple Similar Clients

   The emergence of more and more Internet storage services provides
   users with a wide range of choices for storing their local files
   remotely.  As with other Internet applications, users are not
   restricted to using only one of those services.  In fact, they tend
   to have accounts with several Internet storage services and use them
   simultaneously.  One important reason is that users are always
   pursuing better functionality.  For example, Dropbox is better at
   file processing, OneDrive offers better interoperability and
   compatibility with Microsoft Office, while Google Drive handles mail
   attachments better.  To enable all the desired functions and
   features, a simple way is to register for and use all the desired
   Internet storage services.  Furthermore, people may simply need
   multiple Internet storage services for larger storage space and
   higher reliability.

   However, using different Internet storage services means that users
   have to install multiple similar client applications.  Since almost
   all commercial Internet storage services have their own proprietary
   sync protocols and corresponding client applications, installing and
   running multiple similar client applications degrades the user
   experience and also increases the complexity of synchronizing files
   with different providers' servers on the Internet.  For instance,
   users usually have to repeat the same operations in order to upload
   the same file to their different service accounts.

4.4.  Protocol Capability Configurations and Implementations

   Data sync is not a simple remote file transfer process; it can
   implement several capabilities to optimize data storage usage and
   speed up data transmission.  There are five well-known capabilities
   that Internet storage services can employ to improve sync efficiency
   and reliability: chunking, bundling, deduplication, delta-encoding
   and compression.  All these capabilities aim to synchronize user
   data efficiently over the Internet.

   However, the investigation in [Benchmarking] shows that different
   Internet storage services have different capability configurations
   and implementations, and most existing Internet storage services do
   not implement all five capabilities in their sync protocols.  The
   lack of such capabilities does affect sync efficiency.  Table 1,
   from [QuickSync], shows the capability implementations of four
   popular Internet storage services (Dropbox, Google Drive, OneDrive
   and Seafile) on Windows.


 +----------------+-------------+-------------+-------------+-------------+
 |  Capabilities  |   Dropbox   | GoogleDrive |   OneDrive  |   Seafile   |
 |                |             |             |             |             |
 +----------------+-------------+-------------+-------------+-------------+
 |    Chunking    |     4MB     |     8MB     |   Variable  |   Variable  |
 +----------------+-------------+-------------+-------------+-------------+
 |    Bundling    |     Yes     |      No     |      No     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
 |  Deduplication |     Yes     |      No     |      No     |     Yes     |
 +----------------+-------------+-------------+-------------+-------------+
 | Delta-encoding |     Yes     |      No     |      No     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
 |   Compression  |     Yes     |     Yes     |      No     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
                                   Table 1

   Measurements and studies in [QuickSync] also reveal that these key
   capabilities significantly affect sync performance.  Most of them
   should be implemented and well configured to achieve efficient data
   sync.  The remainder of this subsection lists the problems caused by
   missing or unreasonably configured capabilities.

4.4.1.  Chunking and Deduplication

   Chunking is the most widely implemented capability.  It simplifies
   transmission recovery when the sync of a large file is interrupted.
   Different implementations of chunking use different chunking schemes
   (i.e. dynamic chunking or static chunking) and chunk sizes.
   Chunking is closely related to deduplication, since deduplication is
   performed at chunk granularity.  Typically, a smaller chunk size and
   a dynamic chunking scheme (e.g. Content Defined Chunking) are better
   at detecting and eliminating redundancy.  However, the ability to
   detect more redundancy does not always translate into better sync
   efficiency, since it introduces more computation overhead (i.e.
   finding more redundancy needs more CPU time).  An aggressive dynamic
   chunking scheme performs better in a high-delay (i.e. high RTT)
   environment, while a fixed-size scheme performs well in good network
   conditions.  A trade-off between computation time and transmission
   time needs to be considered to achieve effective chunking.  A better
   chunking strategy may be network-aware, which means that the sync
   should be able to employ an appropriate chunking strategy according
   to its current network conditions.
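
   The difference between static and dynamic chunking for
   deduplication can be illustrated with a small Python sketch.  It is
   a toy under stated assumptions, not the scheme of any service in
   Table 1: the function names, the 64-byte chunk size and the boundary
   test (a rolling byte sum over a 16-byte window) are hypothetical, and
   real Content Defined Chunking implementations use stronger rolling
   hashes and enforce minimum and maximum chunk sizes.

```python
import hashlib
import random


def fixed_size_chunks(data, size=64):
    """Static chunking: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data, window=16, divisor=64):
    """Dynamic chunking: cut wherever the sum of the last `window`
    bytes is divisible by `divisor` (a toy boundary test)."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        if i >= window - 1 and rolling % divisor == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last cut
    return chunks


def reused_chunks(old_chunks, new_chunks):
    """Count new chunks that deduplication would not need to upload."""
    known = {hashlib.sha256(c).digest() for c in old_chunks}
    return sum(hashlib.sha256(c).digest() in known for c in new_chunks)


rng = random.Random(0)  # fixed seed so the run is repeatable
old = bytes(rng.randrange(256) for _ in range(4096))
new = b"X" + old        # one byte inserted at the head of the file

for name, chunker in [("fixed", fixed_size_chunks),
                      ("content-defined", content_defined_chunks)]:
    before, after = chunker(old), chunker(new)
    print(name, "reused", reused_chunks(before, after), "of", len(after))
```

   Because every cut in the content-defined variant depends only on the
   bytes inside the window, an insertion at the head can disturb at
   most the first chunk, whereas it shifts every fixed-size chunk.  The
   price is the extra per-byte computation mentioned above.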

4.4.2.  Chunking and Delta-encoding

   Delta-encoding is a technique that finds the portions in which two
   files differ and thereby achieves incremental sync.  However, not
   all Internet storage services implement delta-encoding.  One
   possible reason is that most delta-encoding algorithms work at file
   granularity, whereas Internet storage services often split files
   into chunks in order to save storage space and reduce cost.  Naively
   piecing all chunks together to reconstruct the whole file for
   incremental sync would waste massive intra-cluster bandwidth.
   Therefore, some Internet storage services, e.g. Dropbox, implement
   delta-encoding at chunk granularity.  Delta-encoding is then
   performed between two chunks, one from the original and one from the
   modified version, that have the same offset from the beginning of
   the file.  If a service uses fixed-size chunking, some types of
   modification, e.g. inserting new data at the head of a file, may
   leave the two chunks used for delta-encoding with very little
   similarity.  In this circumstance, delta-encoding is unable to
   reveal the delta between the original and the modified file, so the
   incremental sync fails.  To solve the problem, we need to design an
   improved delta-encoding algorithm with appropriate chunking that
   keeps incremental sync available in various scenarios.
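
   The failure case described above can be reproduced with a short
   Python sketch.  It uses the standard library's difflib as a stand-in
   for a real delta-encoding algorithm, which is an assumption of this
   example; the helper names (make_delta, apply_delta, literal_bytes,
   chunked_delta_cost) and the 8-byte chunk size are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode `new` as copy-from-old ranges plus literal new bytes."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": ship the new bytes
            ops.append(("data", new[j1:j2]))
        # "delete" emits nothing: the old range is simply not copied
    return ops


def apply_delta(old, ops):
    """Rebuild the new version from the old version and the delta."""
    out = bytearray()
    for op in ops:
        out += old[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)


def literal_bytes(ops):
    """Bytes that must actually be transmitted for this delta."""
    return sum(len(op[1]) for op in ops if op[0] == "data")


def chunked_delta_cost(old, new, size):
    """Literal bytes when delta-encoding chunks paired by file offset."""
    return sum(
        literal_bytes(make_delta(old[off:off + size], new[off:off + size]))
        for off in range(0, len(new), size)
    )


old = bytes(range(64))   # 64 distinct byte values
new = b"\xff" * 8 + old  # 8 new bytes inserted at the head

whole = make_delta(old, new)
print("whole-file delta ships", literal_bytes(whole), "bytes")
print("per-chunk delta ships", chunked_delta_cost(old, new, 8), "bytes")
```

   With this input the whole-file delta only needs to carry the 8
   inserted bytes, while the offset-aligned per-chunk delta finds
   nothing in common between any pair of chunks and ships all 72 bytes
   of the new version.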

4.4.3.  Bundling

   Small files are more likely to be modified and synchronized
   frequently.  For example, people usually collaborate on a number of
   small files (e.g. a project's source code typically consists of
   multiple small files).  In a high-delay environment, synchronizing a
   large number of small files is not efficient.  One reason is that
   most existing Internet storage services employ a sequential
   acknowledgement mechanism.  Under this mechanism, the next chunk
   cannot be transmitted until the acknowledgement of the previous
   chunk has been received.  The sequential acknowledgement mechanism
   wastes the limited bandwidth, since the TCP connection is idle for a
   long time.  Bundling small files together and employing a delayed
   acknowledgement mechanism makes full use of the limited bandwidth,
   so that the overall sync time and traffic overhead can be
   significantly decreased.
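
   The effect of sequential acknowledgement on sync time can be
   estimated with a simple model, sketched here in Python.  The model
   and all numbers are assumptions chosen for illustration: it charges
   one round trip per acknowledgement and ignores TCP slow start,
   protocol headers and server processing time.

```python
def sequential_sync_time(sizes, rtt, bandwidth):
    """Each file waits for the previous acknowledgement: one RTT each."""
    return sum(size / bandwidth + rtt for size in sizes)


def bundled_sync_time(sizes, rtt, bandwidth):
    """All files travel as one bundle with a single acknowledgement."""
    return sum(sizes) / bandwidth + rtt


sizes = [10_000] * 100  # 100 small files of 10 kB each (assumed)
rtt = 0.2               # round-trip time in seconds (assumed)
bandwidth = 1_000_000   # bytes per second (assumed)

print(sequential_sync_time(sizes, rtt, bandwidth))
print(bundled_sync_time(sizes, rtt, bandwidth))
```

   Under these assumed numbers the sequential model takes about 21
   seconds, of which 20 seconds are spent waiting for acknowledgements,
   while the bundled model takes about 1.2 seconds.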

4.5.  Sync Protocols in Mobile and Wireless Environments

   The increasing number of mobile terminals introduces the requirement
   of synchronizing data on any device, via any connectivity, at any
   time and anywhere.  A change made to the data on the desktop needs
   to be automatically transferred to the user's mobile phone or other
   mobile devices.  Based on the measurements in
   [Look_at_Mobile_Cloud], the problem of missing capabilities is more
   severe for mobile Internet storage services.  The root causes are
   twofold:

   First, mobile devices have limited storage and computation ability.
   It is hard to implement all five useful capabilities discussed
   previously on a mobile client, since implementing those capabilities
   brings extra overhead (Table 2 shows the capability implementations
   on Android).  The measurement results in [Look_at_Mobile_Cloud] show
   that no existing mobile Internet storage service implements all five
   key capabilities and that only very few of the capabilities can be
   found on a mobile Internet storage client.  That explains why most
   Internet storage services waste limited bandwidth, produce a large
   amount of useless traffic and suffer long sync times in the mobile
   environment.  How to implement all the desired capabilities with
   lower storage and computation requirements is a critical problem
   that needs to be addressed.

 +----------------+-------------+-------------+-------------+-------------+
 |  Capabilities  |   Dropbox   | GoogleDrive |   OneDrive  |   Seafile   |
 |                |             |             |             |             |
 +----------------+-------------+-------------+-------------+-------------+
 |    Chunking    |     4MB     |    260KB    |     1MB     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
 |    Bundling    |      No     |      No     |      No     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
 |  Deduplication |     Yes     |      No     |      No     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
 | Delta-encoding |      No     |      No     |      No     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
 |   Compression  |      No     |      No     |      No     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
                                   Table 2


   Second, sync protocols cannot handle network disruptions caused by
   unstable network connections well.  For example, some services fail
   to resume sync if the data transmission is interrupted, or incur too
   much additional recovery overhead when an exception happens.  A
   well-designed sync protocol that guarantees reliability and
   efficiency in mobile or wireless networks is expected.
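
   One way to limit recovery overhead is to resume at chunk granularity
   after a disruption.  The following Python sketch shows the idea
   under simplifying assumptions: FlakyServer is a hypothetical
   in-memory stand-in for a data storage server on a link that drops
   once, and checking server.received stands in for asking the control
   server which chunks have already been acknowledged.

```python
class FlakyServer:
    """In-memory storage server whose link drops once during upload."""

    def __init__(self, puts_before_drop):
        self.received = {}  # chunk index -> chunk bytes
        self.puts_before_drop = puts_before_drop

    def put(self, index, data):
        if self.puts_before_drop == 0:
            self.puts_before_drop = None  # the link drops only once
            raise ConnectionError("link dropped")
        if self.puts_before_drop is not None:
            self.puts_before_drop -= 1
        self.received[index] = data


def sync(chunks, server, max_attempts=3):
    """Upload chunks, resuming after a drop instead of starting over.

    Returns the number of chunk transmissions that were stored."""
    sent = 0
    for _ in range(max_attempts):
        try:
            for index, data in enumerate(chunks):
                if index in server.received:  # already acknowledged
                    continue
                server.put(index, data)
                sent += 1
            return sent
        except ConnectionError:
            continue  # retry; chunks stored so far are not resent
    raise RuntimeError("sync did not complete")


chunks = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
server = FlakyServer(puts_before_drop=2)
print(sync(chunks, server))
```

   In this run the link drops while the third chunk is being sent; the
   retry skips the two chunks already stored, so each chunk is stored
   exactly once.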

4.6.  Unsatisfactory Concurrent Work Ability

   With the popularity of Internet storage services, collaborative work
   is becoming an important feature of such services.  This feature is
   especially convenient for a team or an organization, since
   participants can easily retrieve and edit the target file on the
   Internet.  Currently, the collaborative work ability is still
   unsatisfactory in that some common and frequent operations may lead
   to redundant file versions.  More specifically, parallel updates
   from different end users may result in a version conflict.  If two
   or more users are editing the same file concurrently, it is hard to
   update the file correctly.  To ensure that every participant's
   modification is considered, the typical way is to lock the file and
   let the other participants create different versions of the same
   file.  To obtain a final version, participants have to negotiate
   with each other about their modifications (versions) and merge them
   manually.  This hurts work efficiency, since people have to spend a
   lot of time and effort on managing redundant versions and merging a
   final version.
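
   How parallel updates turn into redundant versions can be shown with
   a minimal Python sketch.  It assumes a simple version counter per
   file and a server that accepts an update only when it was made
   against the current version; the class and field names are
   hypothetical and do not describe any particular service.

```python
class VersionedFile:
    """Server-side file that detects updates made on a stale version."""

    def __init__(self, content):
        self.version = 1
        self.content = content
        self.conflicts = []  # rejected updates kept as separate copies

    def update(self, base_version, content, author):
        """Accept the update only if it is based on the current version."""
        if base_version != self.version:
            # The author edited an outdated copy: keep it as an extra
            # version that somebody has to merge by hand later.
            self.conflicts.append((author, content))
            return False
        self.content = content
        self.version += 1
        return True


shared = VersionedFile("draft")
base = shared.version  # Alice and Bob both download version 1

print(shared.update(base, "draft + Alice's edit", "alice"))
print(shared.update(base, "draft + Bob's edit", "bob"))
print(shared.conflicts)
```

   Alice's update is accepted and advances the version, so Bob's
   update, still based on version 1, is rejected and kept as a
   conflicting copy that has to be merged manually.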

   A desirable concurrent work ability is the following: when different
   people are working on the same file, each client automatically
   creates an exclusive version for its user locally.  After the users
   have finished and uploaded their versions, the server automatically
   merges the different versions into a final version without any human
   involvement.  An even better solution is what [GoogleDocs] does,
   namely true real-time editing: multiple people can edit the same
   file and see each other's cursors and operations in real time.  Such
   an ability does improve collaborative work but is really challenging
   to support when designing a protocol.

5.  Advantages of Standard Sync Protocol

   An open and standard sync protocol between client and server can
   effectively address some of the problems mentioned above.  The sync
   protocol consists of two types of flows: a control flow and a data
   flow.  The control flow runs between the client and the control
   server.  It is intended for user authentication, metadata management
   and active notification of data changes.  The data flow runs between
   the client and the data storage servers and only carries actual file
   data (in the form of numerous chunks).  Together, the control flow
   and the data flow make up the whole data sync.  Based on the
   analysis of the problems above, the key capabilities could be
   supported as optional features of the sync protocol, and the
   protocol would preferably be network-aware.  The rest of this
   section lists the advantages of employing an open and standard sync
   protocol.
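
   The separation of control flow and data flow can be sketched as
   follows.  This is only an illustration under stated assumptions:
   the names FileMetadata, split_flows and reassemble are hypothetical,
   and fixed-size chunking with SHA-256 chunk identifiers is chosen for
   simplicity, not prescribed by this document.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # bytes; unrealistically small so the example is readable


@dataclass
class FileMetadata:
    """What the control flow would carry: a description, no file content."""
    name: str
    size: int
    chunk_ids: List[str]  # ordered SHA-256 digests of the chunks


def split_flows(name: str, data: bytes) -> Tuple[FileMetadata, Dict[str, bytes]]:
    """Return (metadata for the control flow, chunks for the data flow)."""
    chunks: Dict[str, bytes] = {}
    chunk_ids: List[str] = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        cid = hashlib.sha256(chunk).hexdigest()
        chunk_ids.append(cid)
        chunks[cid] = chunk  # identical chunks are kept only once
    return FileMetadata(name, len(data), chunk_ids), chunks


def reassemble(meta: FileMetadata, chunks: Dict[str, bytes]) -> bytes:
    """Rebuild the file from its metadata and the stored chunks."""
    return b"".join(chunks[cid] for cid in meta.chunk_ids)


meta, chunks = split_flows("notes.txt", b"abcdabcdxyz")
restored = reassemble(meta, chunks)
```

   In this sketch the metadata alone is enough for a control server to
   reason about a file, while the chunk store alone holds the content.
   Neither is sufficient by itself to complete a sync.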

   First, with a standard sync protocol, a third-party client that
   supports multiple Internet storage services is easy to implement,
   since the APIs provided by different providers would be unnecessary
   or at least simplified.  This would encourage more people and
   organizations to develop their own clients (in some cases users
   could even implement a client themselves).  As a result, users would
   no longer need multiple clients for multiple services, and their
   user experience would improve.  Furthermore, competition in the
   (third-party) client market would increase, which benefits users:
   they could choose their clients flexibly, and frequent client
   updates would bring them more functions and a better user
   experience.

   Another advantage of a standard sync protocol is that sync among
   different services becomes possible.  If two different services both
   employ the standard sync protocol, their users could synchronize
   files with each other using that protocol (rather than falling back
   to basic HTTP download).  In this way, users of different services
   could share files and perform coordinated operations on their local
   files.

   Using a standard sync protocol also makes it easier to improve
   Internet storage services.  Compared with the existing proprietary
   protocols, a standard sync protocol is fully open and designed by
   many contributors.  People are welcome to revise and improve the
   standard protocol.  We believe that both users and providers will
   benefit considerably from such a standard sync protocol.

6.  Understanding of Sync Protocol

    Client                 Control Server           Data Storage Server
       |                          |                          |
       |---metadata, auth info--->|                          |
       |<-------start sync--------|                          |
       |     sync preparation     |                          |
       |                          |                          |
       |--------------------store/retrieve------------------>|
       |<--------------------ok/content----------------------|
       |                         ...                         |
       |--------------------store/retrieve------------------>|
       |<--------------------ok/content----------------------|
       |                   data transmission                 |
       |                          |                          |
       |---metadata, ver info---->|                          |
       |<-----conclude sync-------|                          |
       |        sync finish       |                          |
       |                          |                          |

                               Figure 2


   Figure 2 shows a preliminary, high-level view of the sync protocol.
   The whole sync process can be divided into three stages: sync
   preparation, data transmission and sync finish.  In the first stage,
   the client exchanges its metadata and authentication information
   with the control server to initiate a sync process.  During this
   stage, capabilities such as network-aware chunking and deduplication
   should be performed.  In the second stage, data transmission, the
   client stores chunks on, or retrieves chunks from, the data storage
   servers.  To speed up the data sync and make it more reliable,
   capabilities such as bundling and delta-encoding can be employed.
   In the final stage, sync finish, the client sends its metadata again
   so that the control server can check it and conclude the sync
   process.  Version information is also exchanged for version control.
   From this view it follows that the control flow and the data flow
   are closely related and cannot work without each other.
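
   The three stages can be simulated in memory as a rough sketch.  All
   class and function names below (ControlServer, DataStorageServer,
   start_sync, conclude_sync, sync_upload) are hypothetical, as are the
   shared-token authentication and the fixed-size chunking.  The sketch
   only covers the upload direction and deduplication; bundling and
   delta-encoding are left out.

```python
import hashlib
from typing import Dict, List, Set


def chunk_id(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


class DataStorageServer:
    """Data flow endpoint: stores chunks by identifier."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def store(self, cid: str, chunk: bytes) -> str:
        self.chunks[cid] = chunk
        return "ok"


class ControlServer:
    """Control flow endpoint: authentication, metadata and versions."""

    def __init__(self, storage: DataStorageServer, token: str) -> None:
        self.storage = storage
        self.token = token
        self.versions: Dict[str, List[List[str]]] = {}

    def start_sync(self, token: str, manifest: List[str]) -> Set[str]:
        """Stage 1: authenticate, then deduplicate against stored chunks."""
        if token != self.token:
            raise PermissionError("authentication failed")
        return {cid for cid in manifest if cid not in self.storage.chunks}

    def conclude_sync(self, name: str, manifest: List[str]) -> int:
        """Stage 3: check that every chunk arrived, then record a version."""
        if any(cid not in self.storage.chunks for cid in manifest):
            raise RuntimeError("sync incomplete")
        self.versions.setdefault(name, []).append(manifest)
        return len(self.versions[name])


def sync_upload(name: str, data: bytes, token: str, control: ControlServer,
                storage: DataStorageServer, chunk_size: int = 4) -> int:
    """Run one upload sync and return the number of chunks actually sent."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    manifest = [chunk_id(c) for c in chunks]
    needed = control.start_sync(token, manifest)      # sync preparation
    sent = 0
    for cid, chunk in zip(manifest, chunks):          # data transmission
        if cid in needed:
            storage.store(cid, chunk)
            needed.discard(cid)                       # send each chunk once
            sent += 1
    control.conclude_sync(name, manifest)             # sync finish
    return sent


storage = DataStorageServer()
control = ControlServer(storage, token="secret")
first = sync_upload("a.txt", b"aaaabbbbcccc", "secret", control, storage)
second = sync_upload("a.txt", b"aaaabbbbdddd", "secret", control, storage)
```

   In the second sync only the changed chunk crosses the data flow,
   because the control server already knows the other two chunks.  The
   sync cannot be concluded unless the data storage server holds every
   chunk listed in the metadata, which reflects the dependency between
   the two flows.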

7.  Related Work in IETF

   WebDAV ([RFC4918]) provides an alternative way to exchange local data
   with remote web servers.  It can be regarded as a previous IETF
   effort on file collections, authoring and versioning over HTTP.
   WebDAV extends HTTP to enable users to collaboratively edit and
   manage files on remote servers.  It mainly focuses on distributed
   work (authoring and versioning of web content), while ISS will focus
   on data sync.  A potential major difference between data sync and
   distributed authoring/versioning is the frequency of data
   transmission.  In data sync, the client automatically exchanges data
   with remote servers whenever there are changes.  In practice, every
   'save' operation on a file causes the client to trigger a data sync
   process.  Such frequent data transmission generates a large amount
   of network traffic, which introduces challenges to the design of
   sync protocols.  A possible solution is to make use of the
   well-known service capabilities and to make the protocol
   network-aware to some extent.  The ISS protocol suite could build on
   WebDAV or on basic HTTP.

8.  Security Considerations (TBD)

   TBD

9.  Acknowledgements

   The authors would like to thank Barry Leiba, Mark Nottingham, Julian
   Reschke, Marc Blanchet, Mike Bishop, Haibin Song, Philip Hallam
   Baker, Michiel de Jong and Ted Lemon for their valuable comments and
   contributions to this work.

10.  Informative References

   [Batched]  Li, Z., Wilson, C., Jiang, Z., Liu, Y., Zhao, B., Jin, C.,
              Zhang, Z., and Y. Dai, "Efficient Batched Synchronization
              in Dropbox-Like Cloud Storage Services", Middleware,
              2013.

   [Benchmarking]
              Drago, I., Bocchi, E., Mellia, M., Slatman, H., and A.
              Pras, "Benchmarking Personal Cloud Storage", IMC, 2013.

   [ExpanDrive]
              "ExpanDrive", <http://www.expandrive.com/>.

   [GoogleDocs]
              "Google Docs",
              <http://www.google.com/intl/en/docs/about/>.

   [IFTTT]    "IFTTT", <https://ifttt.com/>.

   [Inside_Dropbox]
              Drago, I., Mellia, M., Munafo, M., Sperotto, A., Sadre,
              R., and A. Pras, "Inside Dropbox: Understanding Personal
              Cloud Storage Services", IMC, 2012.

   [Look_at_Mobile_Cloud]
              Cui, Y., Lai, Z., and N. Dai, "A First Look at Mobile
              Cloud Storage Services: Architecture, Experimentation and
              Challenge", IEEE Network, 2015.

   [QuickSync]
              Cui, Y., Lai, Z., Wang, X., Dai, N., and C. Miao,
              "QuickSync: Improving Synchronization Efficiency for
              Mobile Cloud Storage Services", MOBICOM, 2015.

   [RFC4918]  Dusseault, L., Ed., "HTTP Extensions for Web Distributed
              Authoring and Versioning (WebDAV)", RFC 4918,
              DOI 10.17487/RFC4918, June 2007,
              <http://www.rfc-editor.org/info/rfc4918>.

   [rsync]    "rsync", <https://rsync.samba.org/>.

   [Towards]  Li, Z., Jin, C., Xu, T., Wilson, C., Liu, Y., Cheng, L.,
              Liu, Y., Dai, Y., and Z. Zhang, "Towards Network-level
              Efficiency for Cloud Storage Services", IMC, 2014.

   [users]    "400 million strong",
              <https://blogs.dropbox.com/dropbox/2015/06/400-million-users/>.

Authors' Addresses

   Yong Cui
   Tsinghua University
   Beijing  100084
   P.R.China

   Phone: +86-10-6260-3059
   Email: yong@csnet1.cs.tsinghua.edu.cn


   Zeqi Lai
   Tsinghua University
   Beijing  100084
   P.R.China

   Phone: +86-10-6278-5822
   Email: uestclzq@gmail.com


   Linhui Sun
   Tsinghua University
   Beijing  100084
   P.R.China

   Phone: +86-10-6278-5822
   Email: lh.sunlinh@gmail.com