Network Working Group                                             T. Zhu
Internet-Draft                                                   F. Wang
Intended status: Informational                                   D. Feng
Expires: March 31, 2016                                           Q. Shi
                                                                  Y. Xie
                           Huazhong University of Science and Technology
                                                      September 28, 2015


 Congestion-Aware and Robust MultiCast TCP in Software-Defined Networks
                         draft-tingwei-mctcp-00

Abstract

   Reliable group communication is required in distributed applications
   such as distributed file systems (HDFS, GFS and Ceph), where the
   group is defined by the sender and the number of group members is
   small (e.g., three).  However, existing standards for reliable
   multicast transport are receiver-initiated and suffer from
   inefficiency in either host-side protocols or multicast routing.

   This draft proposes a sender-initiated, efficient, congestion-aware
   and robust reliable multicast solution for Software-Defined Networks
   (SDN), called MCTCP (MultiCast TCP).  The main idea behind MCTCP is
   to manage the multicast groups in a centralized manner and to
   reactively schedule multicast flows onto active, lightly utilized
   links.  MCTCP is implemented by extending TCP as the host-side
   protocol and by managing multicast groups in the SDN controller.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 31, 2016.






Zhu, et al.              Expires March 31, 2016                 [Page 1]

Internet-Draft                MultiCast TCP               September 2015


Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Motivation
   4.  An Example Application
   5.  MCTCP Architecture
     5.1.  Host Side Protocol
       5.1.1.  Session Establishment
       5.1.2.  Data Transmission
       5.1.3.  Session Close
       5.1.4.  Packet Format
         5.1.4.1.  Control Packet
         5.1.4.2.  Data Packet
       5.1.5.  Programming APIs
     5.2.  Multicast Group Manager
       5.2.1.  Session Manager
       5.2.2.  Link Monitor
       5.2.3.  Routing Manager
   6.  Security Considerations
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative references
   Authors' Addresses

1.  Introduction

   Traditional reliable multicast schemes, such as PGM [RFC3208] and
   NORM [RFC5740], are mainly designed for very large multicast groups.
   They are not suitable for sender-defined, small-group multicast
   scenarios, mainly for the following reasons.

   1.  Traditional reliable multicast schemes are receiver-initiated
       application-layer protocols (based on UDP), which suffer from
       high software overhead on end hosts and do not match the
       sender-initiated mode.

   2.  Traditional IP multicast routing algorithms, such as PIM-SM
       [RFC4601], are not designed to build optimal routing trees.  They
       are not aware of link congestion, and are thus apt to cause
       significant performance degradation in bursty and unpredictable
       traffic environments.

   3.  Traditional multicast group management protocols, such as IGMP
       [RFC3376] and MLD [RFC2710], are not aware of link failures.  Any
       failure in a multicast spanning tree can suspend transmission and
       lead to significant performance loss or business interruption.

   The emergence of SDN (Software-Defined Networking) brings new ideas
   for solving the routing efficiency issues of reliable multicast in
   data centers.  A centralized control plane, called the SDN
   controller, provides global visibility of the network, rather than
   the localized switch-level visibility of traditional IP networks.
   Therefore, multicast routing algorithms can leverage topology
   information and link utilization to build optimal or near-optimal
   routing trees, and can be robust against link congestion and
   failures.

   This memo proposes an SDN-based, sender-initiated, efficient,
   congestion-aware and robust reliable multicast scheme, called MCTCP,
   which is mainly designed for small groups.  The main idea behind
   MCTCP is to manage the multicast groups in a centralized manner and
   to reactively schedule multicast flows onto active, lightly utilized
   links, which makes multicast routing efficient and robust.  To
   eliminate the high overhead on end hosts and to achieve reliability,
   MCTCP extends TCP, a transport-layer protocol, as the host-side
   protocol.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   Software-Defined Networking (SDN): Defined in RFC 7426 [RFC7426].

   Sender: A sender is a node that can initiate a multicast
   communication and send data to several other nodes.

   Receiver: A receiver only waits for a connection from a sender and
   receives data.  It does not need to subscribe to a multicast group in
   advance; it only keeps listening for connections.

   Multicast Session: A multicast session contains a sender and several
   receivers.

   MST: Multicast Spanning Tree.

   HSP: Host Side Protocol.

   MGM: Multicast Group Manager.

3.  Motivation

   MCTCP is designed for sender-defined, small-group scenarios, which
   are very common in distributed systems such as distributed file
   systems.  MCTCP takes full advantage of SDN technology and provides
   a framework for other intelligent or user-defined functions.

   Compared to traditional reliable multicast schemes, MCTCP has the
   following advantages:

   1.  Easier programming in upper-level applications.  MCTCP provides
       common socket APIs, so that a programmer can use MCTCP easily.

   2.  More efficient.  MCTCP processes packets more efficiently in the
       host-side protocol (at the transport layer), and forwards packets
       via more efficient multicast spanning trees.

   3.  More robust.  MCTCP is aware of link failures, so the loss caused
       by a link failure decreases greatly.

   4.  More secure.  MCTCP is inherently secure, as the sender keeps the
       information of all receivers.  Moreover, the centralized
       admission control in the MGM helps achieve security.

   5.  No assistance required from network devices.  Unlike the
       network-equipment scheme Xcast [RFC5058], which supports a very
       large number of small multicast sessions by explicitly encoding
       the list of destinations in the data packets, MCTCP can be
       deployed on common SDN-enabled network devices and needs no
       further assistance from them.

4.  An Example Application

   The HDFS data replication process is a typical one-to-many data
   transmission, during which the client gets the list of DNs (Data
   Nodes) from a NN (Name Node), and then delivers the data chunks to
   them.  We assume the replication factor is three.



   +------+ Packets    +---+           +---+           +---+
   |      |----------->|   |---------->|   |---------->|   |
   |Client|            |DN0|           |DN1|           |DN2|
   |      |<-----------|   |<----------|   |<----------|   |
   +------+  ACKs      +---+           +---+           +---+

                 (a) Pipeline-based data replication

                   Packets(Multicast)
                  ------------->-------------->
               -->             --              --
   +------+  --   --   +---+      --   +---+      --   +---+
   |      |--       -->|   |        -->|   |        -->|   |
   |Client|            |DN0|           |DN1|           |DN2|
   |      |<--      ---|   |        ---|   |        ---|   |
   +------+   -   --   +---+      --   +---+      --   +---+
               <--             --              --
                  ------------<----------<---
                  ACKs

                  (b) Multicast-based data replication

   Illustration of Pipeline-based and Multicast-based data replication.

                                 Figure 1

   As shown in Figure 1(a), the original HDFS employs a pipeline-based
   replication method.  The data transmission unit is a packet, which is
   usually 64 KB.  For each packet, the client first transfers it to
   DN0; DN0 then stores it and passes it to DN1; finally, DN1 stores it
   and transfers it to DN2.  After DN2 receives the packet, it returns
   an acknowledgment to DN1; DN1 then returns an acknowledgment to DN0;
   finally, DN0 returns an acknowledgment to the client.  Therefore, the
   whole process can be regarded as a six-stage pipeline.  Let O-HDFS
   denote the original HDFS.  O-HDFS has 2*n stages when configured with
   n replicas, resulting in a long delay in packet transmission.  In
   addition, O-HDFS delivers data in unicast, which injects a large
   number of duplicated packets into the network and reduces the
   overall transmission performance.

   Let M-HDFS denote the multicast-based HDFS, which uses MCTCP for
   data replication.  As shown in Figure 1(b), the client divides the
   data into packets, and then delivers them to the three data nodes
   DN0, DN1 and DN2 in multicast.  For each packet, the client
   transfers it to DN0, DN1 and DN2 simultaneously using MCTCP, and
   then all the data nodes return acknowledgments to the client
   directly.  Therefore, M-HDFS's data replication procedure can be
   regarded as a two-stage pipeline.  Compared with O-HDFS, M-HDFS has
   fewer stages (two instead of six), resulting in lower latency.
   Meanwhile, M-HDFS delivers data in multicast, so redundant packets in
   the network are greatly reduced.

5.  MCTCP Architecture

   MCTCP consists of two modules, the HSP (Host-Side Protocol) and the
   MGM (Multicast Group Manager).  The HSP is an extension of TCP that
   leverages TCP's three-way handshake, cumulative acknowledgment,
   retransmission and congestion control mechanisms to achieve reliable
   multipoint data delivery.  The MGM, located in the SDN controller, is
   responsible for calculating, adjusting and maintaining the MST of
   each multicast session.  It keeps monitoring the network status
   (e.g., link congestion and link failures) and does its best to avoid
   network congestion and to be robust against link failures.

   +--------------------------------------------------------------+
   |                  Multicast Group Manager                     |
   |                                                              |
   | +---------------+    +---------------+     +---------------+ |
   | |Session Manager|    |Routing Manager|     |  Link Monitor | |
   | +---------------+    +---------------+     +---------------+ |
   |                                             SDN-Controller   |
   +--------------------------------------------------------------+
            ^                    ||                      ^
           /|\                   ||                     /|\
            |Establish/          || MST       Link status|
            |Close               ||                      |
            |                   \||/                     |
            |                    \/                      |
   +--------------------------------------------------------------+
   |                            +--+                              |
   |                       -----|S2|-----                         |
   |     Switch       -----     +--+     -----                    |
   |     +--+    -----                        -----       +--+    |
   |     |S1|----           Network Devices        -------|S4|    |
   |     +--+------            +--+                -------+--+    |
   |       |       ------      |S3|        --------         |     |
   |       |             ------+--+--------                 |     |
   +-------|------------------------------------------------|-----+
   |     +---+                                            +---+   |
   |     |H1 | Host        Host Side Protocol             | H2|   |
   |     +---+                                            +---+   |
   +--------------------------------------------------------------+

                        The architecture of MCTCP.

                                 Figure 2

   The sender explicitly establishes a connection with multiple
   receivers before data transmission.  First, the sender asks the MGM
   to calculate the MST.  Second, the MGM calculates and installs the
   MST.  Third, the sender performs a three-way handshake with the
   receivers and then begins data transmission.  Fourth, the MGM adjusts
   the MST whenever link congestion or a link failure is detected.
   Fifth, the sender notifies the MGM after data transmission finishes.
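
   The five steps above can be read as a small state machine at the
   sender.  The following C sketch is illustrative only: the state and
   event names (ST_*, EV_*) and the function mctcp_step() are
   hypothetical and are not defined by this memo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sender-side states mirroring the five steps. */
enum mctcp_state {
    ST_IDLE,          /* no session yet */
    ST_MST_REQUESTED, /* step 1: MST requested from the MGM */
    ST_HANDSHAKE,     /* steps 2-3: MST installed, handshake running */
    ST_ESTABLISHED,   /* data transmission in progress */
    ST_CLOSED         /* step 5: MGM notified, session finished */
};

/* Hypothetical events that drive the session forward. */
enum mctcp_event {
    EV_REQUEST_MST,   /* sender asks the MGM for an MST */
    EV_MST_INSTALLED, /* MGM has calculated and installed the MST */
    EV_ALL_ACKED,     /* handshake completed with every receiver */
    EV_MST_ADJUSTED,  /* step 4: MGM adjusted the MST */
    EV_CLOSE          /* sender finished and notified the MGM */
};

/* Applies ev to *s.  Returns true and updates *s when the event is
 * legal in the current state, false (state unchanged) otherwise. */
bool mctcp_step(enum mctcp_state *s, enum mctcp_event ev)
{
    assert(s != NULL);
    switch (*s) {
    case ST_IDLE:
        if (ev == EV_REQUEST_MST) { *s = ST_MST_REQUESTED; return true; }
        break;
    case ST_MST_REQUESTED:
        if (ev == EV_MST_INSTALLED) { *s = ST_HANDSHAKE; return true; }
        break;
    case ST_HANDSHAKE:
        if (ev == EV_ALL_ACKED) { *s = ST_ESTABLISHED; return true; }
        break;
    case ST_ESTABLISHED:
        if (ev == EV_MST_ADJUSTED) return true; /* session state kept */
        if (ev == EV_CLOSE) { *s = ST_CLOSED; return true; }
        break;
    case ST_CLOSED:
        break;
    }
    return false;
}
```

   In this reading, an MST adjustment does not change the sender's
   session state, and closing is only possible once the session is
   established.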

5.1.  Host Side Protocol

5.1.1.  Session Establishment

   The sender asks the MGM to calculate the MST when establishing a new
   session.  Since the receivers do not know the multicast address in
   advance, the first handshake message must be sent to their unicast
   addresses.  The multicast address is carried in the SYN packet.
   After receiving the SYN packet, the receivers learn the multicast
   address and join the group (they simply add the multicast address to
   their list of interested addresses, without sending IGMP messages),
   so that they can receive the multicast messages.


                 Sender     MGM           Receivers

   --------------|         +-|-+         |    |    |
                 |-------->| | |         |    |    |
   Initialization|         | | |         |    |    |
                 |<------- |*| |         |    |    |
                 |         +-|-+         |    |    |
   --------------|           |   SYN     |    |    |
                 |-----------|---------->|    |    |
                 |-----------|-----------|--->|    |
                 |-----------|-----------|----|--->|
                 |           | SYN+ACK   |    |    |
                 |<----------|-----------|    |    |
     Three-way   |<----------|-----------|----|    |
     Handshake   |<----------|-----------|----|----|
                 |           |           |    |    |
                 |           |   ACK     |    |    |
                 |---->*-----|---------->|    |    |
                 |       *---|-----------|--->|    |
                 |         *-|-----------|----|--->|
   --------------|           |           |    |    |
                 |      Data | Packets   |    |    |
                 |---->*-----|---------->|    |    |
     Data        |       *---|-----------|--->|    |
     Transmission|         *-|-----------|----|--->|
                 |           |  ACK      |    |    |
                 |<----------|-----------|    |    |
                 |<----------|-----------|----|    |
                 |<----------|-----------|----|----|
                 |           |           |    |    |

                 (a) Out-band scheme

                 Sender     MGM           Receivers

   --------------|         +-|-+         |    |    |
                 |-------->| | |  SYN    |    |    |
                 |         | |*|-------->|    |    |
                 |         | |*|---------|--->|    |
                 |         | |*|---------|----|--->|
                 |         +-|-+         |    |    |
                 |           |  SYN+ACK  |    |    |
     Three-way   |<----------|-----------|    |    |
     Handshake   |<----------|-----------|----|    |
                 |<----------|-----------|----|----|
                 |           |           |    |    |
                 |           |   ACK     |    |    |
                 |---->*-----|---------->|    |    |
                 |       *---|-----------|--->|    |
                 |         *-|-----------|----|--->|
   --------------|           |           |    |    |
                 |      Data | Packets   |    |    |
                 |---->*-----|---------->|    |    |
                 |       *---|-----------|--->|    |
                 |         *-|-----------|----|--->|
     Data        |           | ACK       |    |    |
     Transmission|<----------|-----------|    |    |
                 |<----------|-----------|----|    |
                 |<----------|-----------|----|----|
                 |           |           |    |    |
                 |           |           |    |    |
                 (b) In-band scheme

   The Procedure of Session Establishment and Data transmission.  There
                           are three receivers.

                                 Figure 3

   There are two alternative schemes, the out-band scheme and the
   in-band scheme.  In the out-band scheme, the sender sends a request
   to the MGM before the three-way handshake.  After calculating the
   MST, the MGM notifies the sender to start the three-way handshake.
   In the in-band scheme, the SYN packet itself serves as the MST
   calculation request and is redirected to the MGM.  After receiving
   the SYN packet and calculating the MST, the MGM dispatches the SYN
   packet to all the receivers in unicast.  Figure 3 illustrates the
   procedure of connection establishment and data transmission.

   The out-band scheme incurs the time overhead of an extra RTT to the
   SDN controller.  Hence, this scheme is suitable for bulk data
   transfers, in which the overhead of session establishment is
   negligible.  The in-band scheme has no extra time overhead, but puts
   more pressure on the SDN controller.  This scheme is more suitable
   for scenarios with very small groups and delay-sensitive traffic.

5.1.2.  Data Transmission

   When a session is established, data transmission begins.

   Packet Acknowledgment.  The sender maintains a sliding window and
   processes the acknowledgments from the receivers.  The send window
   advances at the pace of the slowest receiver.  As MCTCP is mainly
   designed for small-group scenarios, the ACK-implosion problem of
   traditional reliable multicast with large groups is negligible.
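
   One way to read "decided by the slowest receiver" is that the left
   edge of the send window moves to the smallest cumulative
   acknowledgment among all receivers.  The C sketch below illustrates
   this under stated assumptions: the function name
   mctcp_common_ack() is hypothetical, and every entry of acked[] is
   assumed to lie at or ahead of snd_una by less than 2^32 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the new left edge of the send window: the smallest
 * cumulative ACK over all n receivers.  Distances are computed
 * relative to snd_una in modulo-2^32 arithmetic, so sequence number
 * wraparound is handled.  With n == 0 the edge does not move. */
uint32_t mctcp_common_ack(uint32_t snd_una, const uint32_t *acked,
                          size_t n)
{
    uint32_t min_adv = UINT32_MAX;

    assert(acked != NULL || n == 0);
    if (n == 0)
        return snd_una;
    for (size_t i = 0; i < n; i++) {
        uint32_t adv = acked[i] - snd_una; /* bytes ahead of snd_una */
        if (adv < min_adv)
            min_adv = adv;
    }
    return snd_una + min_adv;
}
```

   With three receivers that have acknowledged up to 1500, 1200 and
   2000 and a current left edge of 1000, the window advances to 1200.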

   Packet Retransmission.  The sender retransmits in multicast when the
   retransmission timer expires or a packet loss is detected.  Since the
   efficient and robust multicast forwarding achieved by the MGM can
   greatly reduce packet loss, retransmissions in MCTCP are expected to
   be infrequent.

   Congestion Control.  Many congestion control algorithms can be used
   in MCTCP, such as TFMCC [RFC4654] and pgmcc [PGMCC].

   Node Failure.  A receiver is considered failed if the sender does
   not receive any acknowledgment from it within a threshold time.  The
   failed receiver, which may have crashed or suffered a network
   failure, should be removed from the multicast session so that
   transmission to the remaining receivers can continue.  The
   upper-level application is then responsible for fault recovery.
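
   The failure rule can be sketched as a periodic scan over the
   receivers of a session.  The structure mctcp_receiver, the function
   mctcp_prune_failed() and the use of millisecond timestamps are
   assumptions made for illustration; this memo does not fix a value
   for the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-receiver bookkeeping kept by the sender. */
struct mctcp_receiver {
    uint64_t last_ack_ms; /* time the last ACK arrived */
    bool     active;      /* false once removed from the session */
};

/* Marks every active receiver whose last ACK is older than
 * threshold_ms as failed and returns how many were removed.
 * Assumes now_ms is not earlier than any last_ack_ms. */
size_t mctcp_prune_failed(struct mctcp_receiver *r, size_t n,
                          uint64_t now_ms, uint64_t threshold_ms)
{
    size_t removed = 0;

    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (r[i].active && now_ms - r[i].last_ack_ms > threshold_ms) {
            r[i].active = false;
            removed++;
        }
    }
    return removed;
}
```

   The caller would report the removed receivers to the upper-level
   application, which then handles recovery.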

5.1.3.  Session Close

   After data transmission is completed, the sender actively closes the
   multicast session and then notifies the MGM.

5.1.4.  Packet Format

   There are two kinds of packets in MCTCP, control packets and data
   packets.  Control packets are used to maintain the session state.
   Data packets are the regular packets of a session.

5.1.4.1.  Control Packet

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-------------+---------------+-------------------------------+
   | Op=Establish|    Number     |          RESERVED             |
   +-------------+---------------+-------------------------------+
   |                   Multicast Address                         |
   +-------------------------------------------------------------+
   |                   Receiver Address1                         |
   +-------------------------------------------------------------+
   |                            ...                              |
   +-------------------------------------------------------------+
   |                   Receiver AddressN                         |
   +-------------------------------------------------------------+
                       (a) Establish Session Packet

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-------------+-----------------------------------------------+
   | Op=Close    |                 RESERVED                      |
   +-------------+-----------------------------------------------+
   |                   Multicast Address                         |
   +-------------------------------------------------------------+
                       (b) Close Session Packet


                              Control packet

                                 Figure 4

   There are at least two control packets:

   o  SessionEstablish packet: If a sender wants to start a multicast
      session, it MUST assign a multicast address and a set of
      receivers, and then send them to the MGM for MST calculation.  As
      shown in Figure 4, the packet MUST contain the multicast address,
      the number of receivers and the address of each receiver.

   o  SessionClose packet: When a multicast session is closed, the
      sender MUST notify the MGM.
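
   The layouts in Figure 4 can be serialized as sketched below.  Several
   details are assumptions, since this memo does not state them: the Op
   and Number fields are taken to be one byte each, RESERVED fills the
   rest of the first 32-bit word, addresses are 4-byte IPv4 addresses in
   network byte order, and the opcode values and function names are
   hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MCTCP_OP_ESTABLISH 1 /* assumed value, not assigned here */
#define MCTCP_OP_CLOSE     2 /* assumed value, not assigned here */

/* Writes a SessionEstablish packet into buf and returns its length
 * in bytes, or 0 if n is zero or buf is too small. */
size_t mctcp_build_establish(uint8_t *buf, size_t cap,
                             const uint8_t mcaddr[4],
                             const uint8_t receivers[][4], uint8_t n)
{
    size_t len = 8 + 4 * (size_t)n;

    assert(buf != NULL && mcaddr != NULL);
    if (n == 0 || cap < len)
        return 0;
    buf[0] = MCTCP_OP_ESTABLISH;
    buf[1] = n;                   /* Number of receivers */
    buf[2] = 0;                   /* RESERVED */
    buf[3] = 0;
    memcpy(buf + 4, mcaddr, 4);   /* Multicast Address */
    for (size_t i = 0; i < n; i++)
        memcpy(buf + 8 + 4 * i, receivers[i], 4);
    return len;
}

/* Writes a SessionClose packet into buf and returns its length
 * (8 bytes), or 0 if buf is too small. */
size_t mctcp_build_close(uint8_t *buf, size_t cap,
                         const uint8_t mcaddr[4])
{
    assert(buf != NULL && mcaddr != NULL);
    if (cap < 8)
        return 0;
    buf[0] = MCTCP_OP_CLOSE;
    buf[1] = buf[2] = buf[3] = 0; /* RESERVED */
    memcpy(buf + 4, mcaddr, 4);   /* Multicast Address */
    return 8;
}
```

   Under these assumptions, a SessionEstablish packet for three
   receivers is 20 bytes long and a SessionClose packet is 8 bytes long.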

5.1.4.2.  Data Packet

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-------------+---------------+---------------+----------------+
   |    Type     |    Length     |   Sub_type    |                |
   +-------------+---------------+---------------+                +
   |                         Info                                 |
   +-------------+---------------+---------------+----------------+

                    Options Field in MCTCP Data Packets

                                 Figure 5

   MCTCP is a new transport-layer protocol.  To reduce implementation
   complexity and ensure compatibility, the packet format of MCTCP is
   the same as that of TCP, and the MCTCP-specific features are
   implemented as TCP options.  The MCTCP options field is depicted in
   Figure 5, where 'Type' is the option type, defined as TCPOPT_MCTCP;
   'Sub_type' is the MCTCP sub-option, including OPTION_MCTCP_XID,
   OPTION_MCTCP_MCADDR, OPTION_MCTCP_SENDER, etc.; and 'Info' is the
   content of the corresponding sub-type.

   o  OPTION_MCTCP_XID, used to identify a unique group, with a length
      of 7 bytes, of which 4 bytes are the XID.  The XID is generated by
      the sender and delivered to the receivers during connection
      establishment to identify the multicast session.  It also serves
      as the shared initial sequence number of all receivers.

   o  OPTION_MCTCP_MCADDR, used to deliver the current multicast
      address, with a length of 8 bytes, of which 1 byte is the
      receiver ID and 4 bytes are the multicast address.  This option is
      used in the SYN packet to deliver the multicast address to the
      receivers, and in all packets that the receivers send to the
      sender to identify which group the packets belong to.  The
      receiver ID identifies which receiver the packet comes from, and
      is set to NULL in the SYN packet.

   o  OPTION_MCTCP_SENDER, used to indicate whether the packet was sent
      by the sender, with a length of 3 bytes.  At the sender, the
      five-tuple <multicast address, source port, destination address,
      destination port, protocol> is used to identify a session instead
      of <source address, source port, destination address, destination
      port, protocol>.  The sender therefore processes a received packet
      differently from the receivers, and a host has to know whether a
      packet comes from a sender or a receiver before processing it.
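
   The three sub-options can be encoded as sketched below, following the
   Type, Length, Sub_type, Info layout of Figure 5 and the lengths
   given above.  The numeric values of TCPOPT_MCTCP and of the
   sub-types are placeholders, since this memo does not assign them.
   The big-endian encoding of the XID, the placement of the receiver ID
   before the multicast address and the function names are likewise
   assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TCPOPT_MCTCP        253 /* placeholder option kind (assumed) */
#define OPTION_MCTCP_XID      1 /* sub-type values are assumed */
#define OPTION_MCTCP_MCADDR   2
#define OPTION_MCTCP_SENDER   3

/* Each encoder writes one option at p and returns the number of
 * bytes written, or 0 if fewer than that many bytes are available. */

size_t mctcp_opt_xid(uint8_t *p, size_t cap, uint32_t xid)
{
    assert(p != NULL);
    if (cap < 7)
        return 0;
    p[0] = TCPOPT_MCTCP;
    p[1] = 7;                    /* Length */
    p[2] = OPTION_MCTCP_XID;
    p[3] = (uint8_t)(xid >> 24); /* XID, most significant byte first */
    p[4] = (uint8_t)(xid >> 16);
    p[5] = (uint8_t)(xid >> 8);
    p[6] = (uint8_t)xid;
    return 7;
}

size_t mctcp_opt_mcaddr(uint8_t *p, size_t cap, uint8_t receiver_id,
                        const uint8_t mcaddr[4])
{
    assert(p != NULL && mcaddr != NULL);
    if (cap < 8)
        return 0;
    p[0] = TCPOPT_MCTCP;
    p[1] = 8;                    /* Length */
    p[2] = OPTION_MCTCP_MCADDR;
    p[3] = receiver_id;          /* 0 (NULL) in the SYN packet */
    memcpy(p + 4, mcaddr, 4);    /* multicast address */
    return 8;
}

size_t mctcp_opt_sender(uint8_t *p, size_t cap)
{
    assert(p != NULL);
    if (cap < 3)
        return 0;
    p[0] = TCPOPT_MCTCP;
    p[1] = 3;                    /* Length */
    p[2] = OPTION_MCTCP_SENDER;  /* no Info bytes */
    return 3;
}
```

   A SYN packet that carries all three options uses 18 bytes of option
   space, which fits within the 40 bytes that a TCP header allows for
   options.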

5.1.5.  Programming APIs

   The HSP uses the common socket APIs for programming.  When
   programming with MCTCP, the receivers call the listen() system call
   to listen for connections, just as with TCP.  At the sender, the user
   can specify a multicast address for the multicast session; otherwise
   a random multicast address is allocated automatically by the HSP.
   The sender should then call setsockopt() to specify the address list
   of the receivers before calling connect(), as shown below.

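   #include <stdint.h>
   #include <netinet/in.h>
   #include <sys/socket.h>

   #define PEER_NUM 3
   /* One receiver endpoint: port and IPv4 address. */
   struct sockaddr_mc {
     uint16_t sin_port;
     struct in_addr sin_addr;
   };
   struct sockaddr_mc mc_addr[PEER_NUM];
   /* Register the receiver list with the HSP before connect(). */
   setsockopt(fd, IPPROTO_MCTCP, MCTCP_ADDR, mc_addr, sizeof(mc_addr));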

5.2.  Multicast Group Manager

   MCTCP uses a logically centralized approach to manage multicast
   groups.  The MGM, located in the SDN controller, manages the
   multicast sessions and MSTs.  By keeping a global view of the network
   topology and monitoring the link status in real time, the MGM can
   adjust the MSTs in case of link congestion or failures.
   Specifically, the MGM consists of three sub-modules, the session
   manager, the link monitor and the routing manager, as shown in
   Figure 2.

5.2.1.  Session Manager

   The session manager is responsible for maintaining the states of all
   groups.  When establishing or closing a multicast session, the sender
   informs the session manager.  Hence, the session manager can keep
   track of all the active multicast sessions.  When a multicast session
   is closed, its MST is not cleared immediately but only marked
   inactive, so that a later session with the same sender and receivers
   can reuse the MST.  The session manager periodically cleans up the
   inactive MSTs.
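
   The mark-inactive-and-reuse behavior can be sketched with a small
   fixed-size table.  Everything named here (mst_entry, sm_establish(),
   sm_close(), sm_cleanup(), the table sizes and the time-to-live used
   for cleanup) is hypothetical, and receiver lists are assumed to be
   given in a canonical (sorted) order so that they can be compared
   directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SESSIONS 16
#define MAX_RECV      8

struct mst_entry {
    bool     used;            /* slot holds an MST */
    bool     active;          /* a session currently uses the MST */
    uint32_t sender;
    uint32_t recv[MAX_RECV];  /* receivers, assumed sorted */
    size_t   nrecv;
    uint64_t closed_ms;       /* time the MST was marked inactive */
};

/* Returns the slot used by the new session, or -1 if the table is
 * full.  *reused tells whether an inactive MST was reused. */
int sm_establish(struct mst_entry *t, uint32_t sender,
                 const uint32_t *recv, size_t n, bool *reused)
{
    int free_slot = -1;

    assert(t != NULL && recv != NULL && reused != NULL);
    if (n > MAX_RECV)
        return -1;
    for (int i = 0; i < MAX_SESSIONS; i++) {
        if (!t[i].used) {
            if (free_slot < 0)
                free_slot = i;
        } else if (!t[i].active && t[i].sender == sender &&
                   t[i].nrecv == n &&
                   memcmp(t[i].recv, recv, n * sizeof *recv) == 0) {
            t[i].active = true;   /* same group: reuse the MST */
            *reused = true;
            return i;
        }
    }
    if (free_slot < 0)
        return -1;
    t[free_slot] = (struct mst_entry){ .used = true, .active = true,
                                       .sender = sender, .nrecv = n };
    memcpy(t[free_slot].recv, recv, n * sizeof *recv);
    *reused = false;              /* a new MST must be calculated */
    return free_slot;
}

/* Session close: keep the MST but mark it inactive. */
void sm_close(struct mst_entry *t, int slot, uint64_t now_ms)
{
    assert(t != NULL && slot >= 0 && slot < MAX_SESSIONS);
    t[slot].active = false;
    t[slot].closed_ms = now_ms;
}

/* Periodic cleanup: drop MSTs inactive for longer than ttl_ms. */
size_t sm_cleanup(struct mst_entry *t, uint64_t now_ms, uint64_t ttl_ms)
{
    size_t dropped = 0;

    for (int i = 0; i < MAX_SESSIONS; i++)
        if (t[i].used && !t[i].active &&
            now_ms - t[i].closed_ms > ttl_ms) {
            t[i].used = false;
            dropped++;
        }
    return dropped;
}
```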

5.2.2.  Link Monitor

   The link monitor is responsible for monitoring the status of network
   links and periodically estimating the weight of each link.  This can
   be achieved with sFlow, NetFlow or the "port-status" interface of
   the OpenFlow [OpenFlow] protocol.
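
   This memo does not define how a link weight is computed.  As one
   possible reading, the C sketch below takes the weight to be the
   link utilization derived from two successive samples of a port's
   transmitted-byte counter, and flags a link as overloaded when the
   weight exceeds a preset threshold.  The function names and the
   choice of utilization as the weight are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns the utilization of a link in the range [0, 1], computed
 * from two samples of its transmitted-byte counter taken interval_s
 * seconds apart.  capacity_bps is the link capacity in bits per
 * second.  Invalid inputs yield a weight of 0. */
double link_weight(uint64_t bytes_prev, uint64_t bytes_now,
                   double interval_s, double capacity_bps)
{
    double bits, w;

    if (interval_s <= 0.0 || capacity_bps <= 0.0)
        return 0.0;
    bits = (double)(bytes_now - bytes_prev) * 8.0;
    w = bits / (interval_s * capacity_bps);
    return w > 1.0 ? 1.0 : w;
}

/* A link is treated as overloaded when its weight exceeds the
 * preset threshold. */
bool link_overloaded(double weight, double threshold)
{
    assert(threshold >= 0.0);
    return weight > threshold;
}
```

   For example, 125,000,000 bytes sent in one second over a 10 Gbit/s
   link give a weight of about 0.1.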

5.2.3.  Routing Manager

   The routing manager is responsible for calculating and adjusting
   MSTs.  When a new multicast session is established, the routing
   manager calculates the minimum-cost MST based on the current link
   utilization.  When a link becomes overloaded or fails, adjustment of
   all MSTs over that link is triggered.  The routing manager is
   divided into two parts, routing calculation and routing adjustment.
   The MST should be calculated quickly during session establishment.
   In the case of link congestion, the MST should be adjusted on a
   best-effort basis.  When a link fails, all the relevant MSTs should
   be updated quickly.

   o  Routing calculation.  The members of a group are assigned by the
      sender, and no dynamic join or leave is allowed in MCTCP once the
      session begins.  Many static multicast routing algorithms can
      therefore be used, for example the minimum-cost path heuristic
      (MPH) [MPH].  The MPH algorithm takes as input a set of
      sender/receiver nodes and the all-pairs shortest paths, which are
      calculated by the Floyd-Warshall algorithm, and outputs a
      minimum-cost MST.

   o  Routing adjustment.  When the link monitor detects link
      overloading, i.e., the link weight is larger than a preset
      threshold, routing adjustment is triggered.  In routing
      adjustment, the relevant MSTs should be recalculated.
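
   The routing calculation can be sketched as follows.  The code runs
   Floyd-Warshall on a link cost matrix and then applies one common
   formulation of MPH: starting from the sender, it repeatedly attaches
   the receiver that is cheapest to reach from any node already in the
   tree.  It is a sketch under assumptions, not the MGM's
   implementation: the names, the fixed four-switch size and the integer
   link costs are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NSW      4        /* number of switches (assumed) */
#define COST_INF 1000000  /* "no link" marker */

int dist[NSW][NSW];       /* all-pairs shortest path cost */
int next_hop[NSW][NSW];   /* next node on the path i -> j, or -1 */

/* Floyd-Warshall; cost[i][j] == COST_INF means there is no link. */
void floyd_warshall(int cost[NSW][NSW])
{
    for (int i = 0; i < NSW; i++)
        for (int j = 0; j < NSW; j++) {
            dist[i][j] = (i == j) ? 0 : cost[i][j];
            next_hop[i][j] =
                (i == j || cost[i][j] < COST_INF) ? j : -1;
        }
    for (int k = 0; k < NSW; k++)
        for (int i = 0; i < NSW; i++)
            for (int j = 0; j < NSW; j++)
                if (dist[i][k] + dist[k][j] < dist[i][j]) {
                    dist[i][j] = dist[i][k] + dist[k][j];
                    next_hop[i][j] = next_hop[i][k];
                }
}

/* MPH: sets tree_edge[u][v] for every directed link u -> v of the
 * tree and returns the summed cost of the attached paths, or -1 if
 * a receiver is unreachable.  Requires floyd_warshall() first. */
int mph(int sender, const int *receivers, int nrecv,
        bool tree_edge[NSW][NSW])
{
    bool in_tree[NSW] = { false };
    bool done[NSW] = { false };  /* indexed by receiver position */
    int total = 0;

    assert(nrecv <= NSW);
    in_tree[sender] = true;
    for (int round = 0; round < nrecv; round++) {
        int best = COST_INF, from = -1, pick = -1;

        /* Find the unattached receiver closest to the tree. */
        for (int r = 0; r < nrecv; r++) {
            if (done[r])
                continue;
            for (int u = 0; u < NSW; u++)
                if (in_tree[u] && dist[u][receivers[r]] < best) {
                    best = dist[u][receivers[r]];
                    from = u;
                    pick = r;
                }
        }
        if (pick < 0)
            return -1;
        /* Walk the shortest path and add its links to the tree. */
        for (int u = from; u != receivers[pick]; ) {
            int v = next_hop[u][receivers[pick]];
            tree_edge[u][v] = true;
            in_tree[v] = true;
            u = v;
        }
        done[pick] = true;
        total += best;
    }
    return total;
}
```

   As an illustration, take four switches S1 to S4 (indices 0 to 3)
   with links S1-S2, S1-S3 and S3-S4 of cost 1 and link S2-S4 of cost
   2.  With S1 as the sender's switch and S2, S3 and S4 as the
   receivers' switches, the sketch selects the links S1->S2, S1->S3 and
   S3->S4, with a total cost of 3.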

   +---+      +---+         +---+      +---+         +---+      +---+
   |H1 |      |H2 |         |H1 |      |H2 |         |H1 |      |H2 |
   +---+      +---+         +---+      +---+         +---+      +---+
     |          |             |          |             |          |
    +--+       +--+          +--+       +--+          +--+       +--+
    |S1|------>|S2|          |S1|------>|S2|          |S1|------>|S2|
    +--+       +--+          +--+       +--+          +--+       +--+
     |          |             |          |             |          |
     |          |    =====>   |          |    =====>   X          |
     V          |             V          V             |          V
    +--+       +--+          +--+       +--+          +--+       +--+
    |S3|------>|S4|          |S3|-------|S4|          |S3|<------|S4|
    +--+       +--+          +--+       +--+          +--+       +--+
     |          |             |          |             |          |
   +---+      +---+         +---+      +---+         +---+      +---+
   |H3 |      |H5 |         |H3 |      |H5 |         |H3 |      |H5 |
   +---+      +---+         +---+      +---+         +---+      +---+
   |H4 |      |H6 |         |H4 |      |H6 |         |H4 |      |H6 |
   +---+      +---+         +---+      +---+         +---+      +---+
         (a)                      (b)                       (c)

   An example of MST adjustment.  (a) Multicast group G1:H1->{H2,H3,H5}
   is established.  (b) H4 starts sending data to H6 over TCP.  (c) The
                     link between S1 and S3 goes down.

                                 Figure 6

   For example, consider a simple network topology which consists of
   four switches, as shown in Figure 6.

   At time T0, there is one group G1:H1->{H2,H3,H5}, and the current MST
   is MST1:{S1->S2, S1->S3, S3->S4}.

   At time T1, H4 starts sending data to H6, generating heavy TCP
   traffic on link S3->S4 that contends with the traffic of group G1 on
   the same link.  Once the link monitor detects the congestion, the
   routing manager adjusts the MST of G1, and the new MST is
   MST2:{S1->S2, S1->S3, S2->S4}.

   At time T2, link S1->S3 fails.  The routing manager then adjusts the
   MST of G1 to MST3:{S1->S2, S2->S4, S4->S3}.
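
   The three steps of this example can be replayed with the Python
   sketch below.  To keep it short, a shortest-path tree (Dijkstra)
   stands in for the MPH calculation, the link cost is assumed to be
   1 + utilization, and the overload threshold is assumed to be 0.8.
   The utilization values, including the slightly busier S2->S4 link at
   T0, are invented for illustration, as are all function names.

```python
import heapq

THRESHOLD = 0.8  # assumed overload threshold on link utilization


def both(a, b):
    """Both directions of a bidirectional switch-to-switch link."""
    return [(a, b), (b, a)]


def needs_adjustment(tree, util, failed):
    """True if a link of the tree has failed or exceeds the threshold."""
    return any(link in failed or util.get(link, 0.0) > THRESHOLD
               for link in tree)


def build_tree(links, util, failed, sender, receivers):
    """Shortest-path tree from the sender, pruned to the receivers.

    Assumed link cost: 1 + utilization. Failed links are skipped.
    This is a stand-in for the MPH calculation, not MPH itself.
    """
    parent, done = {}, set()
    heap = [(0.0, sender, None)]
    while heap:
        cost, node, via = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        parent[node] = via
        for (u, v) in links:
            if u == node and v not in done and (u, v) not in failed:
                step = 1.0 + util.get((u, v), 0.0)
                heapq.heappush(heap, (cost + step, v, u))
    tree = set()
    for r in receivers:
        if r not in parent:
            raise ValueError(f"receiver {r} is unreachable")
        node = r
        while parent[node] is not None:
            tree.add((parent[node], node))
            node = parent[node]
    return tree


LINKS = (both("S1", "S2") + both("S1", "S3")
         + both("S2", "S4") + both("S3", "S4"))
SENDER = "S1"                   # switch of H1
RECEIVERS = ["S2", "S3", "S4"]  # switches of H2, H3 and H5

# T0: light background load; S2->S4 assumed slightly busier than S3->S4.
util = {link: 0.1 for link in LINKS}
util[("S2", "S4")] = 0.2
failed = set()
mst1 = build_tree(LINKS, util, failed, SENDER, RECEIVERS)

# T1: traffic from H4 to H6 pushes S3->S4 past the threshold.
util[("S3", "S4")] = 0.9
mst2 = mst1
if needs_adjustment(mst1, util, failed):
    mst2 = build_tree(LINKS, util, failed, SENDER, RECEIVERS)

# T2: the link between S1 and S3 fails in both directions.
failed.update(both("S1", "S3"))
mst3 = mst2
if needs_adjustment(mst2, util, failed):
    mst3 = build_tree(LINKS, util, failed, SENDER, RECEIVERS)
```

   With these assumed utilizations, mst1, mst2 and mst3 contain the same
   links as MST1, MST2 and MST3 above.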

6.  Security Considerations

   MCTCP is more secure than traditional reliable multicast schemes,
   mainly for the following two reasons.

   First, MCTCP is a sender-defined scheme: all the receivers are
   specified by the sender.  Therefore, eavesdroppers cannot freely
   join or leave a multicast session, which makes it hard to steal data
   from a multicast session.

   Second, all the multicast sessions are under the control of the MGM,
   so it is easy to enable admission control and policy enforcement.
   For example, the MGM can require authentication of each sender and
   receiver, so that it is hard for a malicious sender to start a
   multicast session.  Some forms of denial-of-service attack that use
   multicast for amplification can thereby be prevented.
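
   As a rough illustration of such admission control, the Python sketch
   below shows a check the MGM might run on a sender-initiated session
   request.  SessionRequest, admit, the set of authenticated hosts and
   the group-size limit are all hypothetical; this document does not
   specify an admission policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionRequest:
    """A sender-initiated request to open a session (hypothetical type)."""
    sender: str
    receivers: tuple


def admit(request, authenticated_hosts, max_group_size=8):
    """Return (admitted, reason) for a session request.

    Assumed policy: the sender and every receiver must be authenticated,
    and the group size is capped to limit traffic amplification.
    """
    if request.sender not in authenticated_hosts:
        return False, "sender not authenticated"
    unknown = [r for r in request.receivers if r not in authenticated_hosts]
    if unknown:
        return False, "receiver not authenticated: " + ", ".join(unknown)
    if not 0 < len(request.receivers) <= max_group_size:
        return False, "group size out of range"
    return True, "ok"
```

   Because the receivers are listed in the request itself, the whole
   check can run before any forwarding state is installed.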

7.  IANA Considerations

   TBD

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

   [RFC2710]  Deering, S., Fenner, W., and B. Haberman, "Multicast
              Listener Discovery (MLD) for IPv6", RFC 2710,
              DOI 10.17487/RFC2710, October 1999,
              <http://www.rfc-editor.org/info/rfc2710>.

   [RFC3208]  Speakman, T., Crowcroft, J., Gemmell, J., Farinacci, D.,
              Lin, S., Leshchiner, D., Luby, M., Montgomery, T., Rizzo,
              L., Tweedly, A., Bhaskar, N., Edmonstone, R.,
              Sumanasekera, R., and L. Vicisano, "PGM Reliable Transport
              Protocol Specification", RFC 3208, DOI 10.17487/RFC3208,
              December 2001, <http://www.rfc-editor.org/info/rfc3208>.

   [RFC3376]  Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A.
              Thyagarajan, "Internet Group Management Protocol, Version
              3", RFC 3376, DOI 10.17487/RFC3376, October 2002,
              <http://www.rfc-editor.org/info/rfc3376>.

   [RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
              "Protocol Independent Multicast - Sparse Mode (PIM-SM):
              Protocol Specification (Revised)", RFC 4601,
              DOI 10.17487/RFC4601, August 2006,
              <http://www.rfc-editor.org/info/rfc4601>.

   [RFC4654]  Widmer, J. and M. Handley, "TCP-Friendly Multicast
              Congestion Control (TFMCC): Protocol Specification",
              RFC 4654, DOI 10.17487/RFC4654, August 2006,
              <http://www.rfc-editor.org/info/rfc4654>.

   [RFC5058]  Boivie, R., Feldman, N., Imai, Y., Livens, W., and D.
              Ooms, "Explicit Multicast (Xcast) Concepts and Options",
              RFC 5058, DOI 10.17487/RFC5058, November 2007,
              <http://www.rfc-editor.org/info/rfc5058>.

   [RFC5740]  Adamson, B., Bormann, C., Handley, M., and J. Macker,
              "NACK-Oriented Reliable Multicast (NORM) Transport
              Protocol", RFC 5740, DOI 10.17487/RFC5740, November 2009,
              <http://www.rfc-editor.org/info/rfc5740>.

   [RFC7426]  Haleplidis, E., Ed., Pentikousis, K., Ed., Denazis, S.,
              Hadi Salim, J., Meyer, D., and O. Koufopavlou, "Software-
              Defined Networking (SDN): Layers and Architecture
              Terminology", RFC 7426, DOI 10.17487/RFC7426, January
              2015, <http://www.rfc-editor.org/info/rfc7426>.

8.2.  Informative References

   [MPH]      Takahashi, H. and A. Matsuyama, "An approximate solution
              for the steiner problem in graphs", Math. Japonica,
              vol. 24, no. 6, pp. 575-577, April 1980.

   [OpenFlow]
              McKeown, N., Anderson, T., Balakrishnan, H., Parulkar, G.,
              and L. Peterson, "OpenFlow: Enabling Innovation in Campus
              Networks", ACM SIGCOMM, vol. 38, no. 2, pp. 69-74, April
              2008.

   [PGMCC]    Rizzo, L., "Pgmcc: A TCP-friendly Single-rate Multicast
              Congestion Control Scheme", ACM SIGCOMM, pp. 17-28,
              October 2000, <http://doi.acm.org/10.1145/347057.347390>.

Authors' Addresses

   Tingwei Zhu
   Huazhong University of Science and Technology
   WuHan  430074
   P.R.China

   Email: twzh@hust.edu.cn


   Fang Wang
   Huazhong University of Science and Technology
   WuHan  430074
   P.R.China

   Email: wangfang@mail.hust.edu.cn


   Dan Feng
   Huazhong University of Science and Technology
   WuHan  430074
   P.R.China

   Email: dfeng@hust.edu.cn


   Qingyu Shi
   Huazhong University of Science and Technology
   WuHan  430074
   P.R.China

   Email: qingyushi@hust.edu.cn


   Yanwen Xie
   Huazhong University of Science and Technology
   WuHan  430074
   P.R.China

   Email: ywxie@hust.edu.cn
