Internet Draft                                                J. Macker
Expiration: May 5, 1997                                             NRL
draft-macker-mdp-framework-00.txt                               W. Dang
                                                            U of Hawaii
                                                        5 November 1996


          The Multicast Dissemination Protocol (MDP) Framework

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

This document outlines a simple protocol framework for reliable multicast dissemination of data files. The general framework was originally developed and used by the Image Multicaster (IMM) application within the Internet MBone for dissemination of satellite imagery. This document describes the potential for more general use of the protocol framework, its operational modes, some performance issues, and the basic application data units (ADUs) presently used. This is not intended to be a detailed protocol specification document, but rather a broad description of the basic architectural approach. Further detailed description of the protocol implementation may be provided in future documents.

Introduction

The Image Multicaster (IMM) application was originally designed and implemented in 1993 as a reliable multicast tool for disseminating compressed imagery files to a group of multicast receivers.
It was operated and tested within the MBone for periodic reliable bulk transfer of satellite imagery files using UDP/IP multicast transport as defined in RFC 1112 [2]. Besides imagery dissemination support, the IMM reliable multicast framework is useful for multicast bulk file transfer of more general data types. In order to further clarify the general application of this framework, we will refer to it as the Multicast Dissemination Protocol (MDP) within this document. This informational document provides an operational overview of the MDP framework and some discussion of the application data unit (ADU) types used in the present implementation.

Macker, Dang                                                   [Page 1]

Internet Draft     Multicast Dissemination Protocol       November 1996

Motivation

Generic IP multicast builds upon the connectionless, best-effort service provided by UDP or raw IP and provides no guaranteed reliable or ordered delivery of data to end applications. Rather than being a reliability disadvantage, this feature can be an advantage to the rich set of application classes that can best determine their own definition of reliability and protocol operation by incorporating application layer considerations [5]. More extensive discussions of reliable multicasting design and application issues can be found in [4,7].

While there are a number of application classes with unique requirements motivating a variety of reliable multicasting design approaches, there is a general need for non-real-time bulk file transfer support. This is the specific category of problem to which MDP provides a candidate solution.

Description of IMM/MDP Approach

MDP is a protocol framework that implements reliable multicasting for bulk files using application layer framing concepts [5]. The MDP framework consists of both receiver (client) and sender (server) software modules.
Multicast receivers wishing to subscribe to a multicast file dissemination service require the client module, while dissemination sources require the server module functionality. Presently distributed versions of the IMM software package provide both client and server functionality.

At present, MDP performs its protocol functionality using a single multicast group without additional unicast connections. MDP uses selective negative acknowledgment (NACK) to request repairs of missing data. It uses NACK suppression methods and event timers to minimize congestion and retransmission requests within the network. This is an important scalability feature for operating such a framework within a WAN infrastructure with a potentially large receiver set. This feature and additional message aggregation functionality help reduce the likelihood of a network message implosion effect.

History of Network Usage

As mentioned, the IMM/MDP approach has been used within the MBone for periodic dissemination of satellite imagery to subscribed clients worldwide. With a simple connection to the MBone, IMM allowed users from around the world to continuously view the latest weather images from satellites covering most of the earth. The initial project lasted for several months, transmitting hourly satellite images derived from the GMS-4, GOES-7, METEOSAT, and GOES-8 satellites. Rapid dissemination of time-sensitive information without the need for a worldwide hierarchical distribution system was a key benefit of multicasting over the MBone.

Protocol Framework and Operation

The following section discusses the overall functional operation and design of the present MDP protocol framework. A number of protocol variables and operational modes are described, but it is not the intent of this document to provide a functional-specification level of detail.
Figure 1 is a high level description of the operation of the MDP framework. Basic UDP/IP multicast provides the transport channel for the service and the MDP reliability process provides reliable delivery of transmitted file data to multicast clients.

                                                       -----------
                |--MDP Reliability Process--|          | Archive |
                v                           v          -----------
---------   ----------                -------------         ^
| file/ |   |  MDP   | -------------> |    MDP    |         |
| bulk  | ->| Source | <------------- | Receivers |---------|
| data  |   |(server)|   Multicast    | (clients) |         |
---------   ----------  Dissemination -------------         v
                          Channel                    --------------
                                                     |    Post    |
                                                     | Processor  |
                                                     | e.g., image|
                                                     |   viewer   |
                                                     --------------

               Figure 1: High Level MDP Operation

The MDP server fragments file data into a series of data units based upon the MDP maximum data unit (MDU) length setting. This data is provided to clients over a multicast dissemination channel using UDP/IP multicast transport. Server and client MDP processes use an initial transmission cycle and a successive series of recovery cycles to ensure reliable delivery of file data among the group of multicast receivers. The transmission cycle is the period during which data is first transmitted, and a recovery cycle is the period during which data repair requests are serviced and selectively requested data repair packets are retransmitted. An operational description of the transmission and recovery cycles is presented later in this document.

Server Features

The MDP server is designed to transmit a file of any type or an entire directory structure to a multicast receiver group. The user can set the server to reliably transmit all files within a hierarchy either once only or continuously. If set to transmit continuously, the server can multicast files in a round-robin scheduled fashion or optionally transmit updated files in the directory after one complete initial pass through the directory.
The server checks for updated files by examining the file timestamp and comparing it to the server's last-file-sent timestamp. In effect, this optional feature allows the underlying file update frequency to more directly determine the required transmission frequency.

The present MDP server design is based upon a simple source rate control mechanism. The user has the ability to control the transfer rate as well as an additional transmission frequency wait period. The transfer rate pertains to the actual transmission rate of individual packets from the multicast source. The server uniformly distributes packet transmission times based upon this setting and the MDU value. The transmission frequency wait period defines the amount of time to wait before starting the next transmission. A zero setting for this wait period allows for continuous file transfers, whereas a setting of 3600 seconds results in the next file being transmitted one hour after completion of the previous file transmission. If the server has not finished transmitting a complete file before the wait period expires, it will still attempt to complete the full file transmission procedure. Therefore a zero setting results in back-to-back full file transmissions.

In the special case of continuous file transmissions, the server has the ability to stack multiple simultaneous file transmissions up to a predefined maximum limit (e.g., 8 files). This can result in more effective usage of bandwidth during the recovery cycle period when the server is waiting for clients to make requests for missing data.

For applications desiring positive acknowledgment of files received by clients, the server provides another optional operational mode which requests the clients to provide a positive acknowledgment upon final receipt. Again, the client responses are designed to be delayed by a random time drawn from a uniform window to smooth out the flow of packet data proceeding back to the server.
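The source rate control described in this section can be illustrated with a short sketch. It is not taken from the IMM implementation: the function names are hypothetical, and it assumes that the transfer rate is expressed in bits per second and counts only the MDU payload, ignoring ADU and UDP/IP header overhead.

```python
def packet_interval(rate_bps: float, mdu_bytes: int) -> float:
    """Seconds between packet transmissions for a given source rate.

    Assumption: the rate covers MDU payload bytes only (no header overhead).
    """
    if rate_bps <= 0 or mdu_bytes <= 0:
        raise ValueError("rate and MDU length must be positive")
    return mdu_bytes * 8 / rate_bps


def send_schedule(file_size: int, rate_bps: float, mdu_bytes: int):
    """Return (file_offset, send_time) pairs for the initial transmission cycle.

    The file is fragmented into MDU-sized units and the transmission times
    are spread uniformly, one packet per interval.
    """
    interval = packet_interval(rate_bps, mdu_bytes)
    offsets = range(0, file_size, mdu_bytes)
    return [(offset, index * interval) for index, offset in enumerate(offsets)]


if __name__ == "__main__":
    # A 2500 byte file, 1000 byte MDU, 64 kbit/s source rate.
    for offset, send_time in send_schedule(2500, 64_000, 1000):
        print(f"offset {offset:5d} sent at t = {send_time:.3f} s")
```

Under these assumptions a 1000 byte MDU at 64 kbit/s yields one packet every 0.125 seconds, so the rate and MDU settings together fix the pacing of the whole transmission cycle.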
Client Features

The MDP client application has been designed to offer flexibility in processing each newly received file. As shown in Figure 1, receiving nodes have the individual option of allowing the newly received file to be either archived or post-processed through a user-selected executable (e.g., JPEG viewer, VRML viewer, spreadsheet application, etc.).

In the present design, the client is also capable of tracking several file transmissions simultaneously from a single server as well as from multiple servers. Transmission from multiple independent sources allows for collaborative distribution of files among large audiences. The client will properly reassemble file transmissions arriving from multiple independent servers sending on the same multicast address.

Within the present design, clients are free to drop in or out of a group session. However, upon initiation of a receiver, a group membership packet is sent by the client to cordially announce its participation in receiving multicast data. At the present time, it is only used by other clients to track membership.

MDP ADU Packet Types

To provide some background terminology to the reader prior to a detailed discussion of protocol operation, we present the ADU packet types and a brief description of how they are used by MDP. The MDP framework uses five main types of ADU packets to transfer information. They are:

(1) Identification   Used by the server to convey file information
(2) Data             Used by the server to transmit file data
(3) Missing Data     Used by clients to report missing data to the server
(4) Command          Used by the server to query or trigger client responses
(5) Statistic        Used by clients to report summary statistics

Identification (ID) ADU Packet

The ID ADU packet provides file information to the clients. It is used to advertise upcoming transmissions and the wait period.
The ID ADU contains the following type of information.

* protocol version
* file identification number (source unique)
* file size (bytes)
* delay interval between transmissions
* file name
* flags

The file identification number is assigned by the source and is unique to that source. It serves as a reference handle that allows clients to make specific requests concerning a particular file. The ID packet is transmitted at the end of each file transmission and also at the end of its recovery cycle period. In this way, the ID packet helps to synchronize the negative acknowledgment recovery cycle period between all of the clients.

Data ADU Packet

The Data ADU packet is used by the server to transmit the actual data content of a file. Transmission is performed by splitting each file into fixed-length segments whose length is set by the MDP maximum data unit (MDU). These segments are then multicast as the payload within a Data ADU packet.

The Data ADU contains the following type of information.

* protocol version
* file identification information
* offset position (bytes)
* flags

The offset position is the file offset in bytes from the beginning of the file. It allows the client to reconstruct the original file by inserting the data segment into a duplicate file at the same offset. Data packets are multicast sequentially during initial file transmission. During a recovery cycle, data packet retransmissions are selectively triggered by the missing data repair request packets from clients described below.

Missing Data Repair Request ADU Packet

The Missing Data Repair Request ADU packets are client requests for retransmission of specific data packets for a particular file from a particular server.

The Missing Data Repair Request ADU contains the following type of information.
* protocol version
* file identification information
* server IP address
* flags
* list of missing offset positions

Client scalability across large groups is another important feature of MDP. The key to this capacity is that clients primarily use NACKs, or missing data repair requests, for feedback to the server. NACK-based reliability can sharply reduce client requests to the server. Clients request only the data packets that they detect as missing from gaps in the received data. In addition, clients attempt to reduce the likelihood of duplicate repair request transmissions by listening for requests from other clients that duplicate their own. See the recovery cycle section for more discussion of this feature.

Command ADU Packet

The Command ADU contains the following type of information.

* protocol version
* file identification information
* flags
* cycle duration
* list of positively acknowledged clients

Command ADU packets are server requests for quick, time-delayed responses from clients. Based on the flag settings, the server can optionally request positive acknowledgment (PACK) of specific files or request responses only if data was not received (NACK). Clients respond accordingly with the associated NACK and PACK responses; however, a client will not send a duplicate NACK if another client has been heard sending the same request. The command packet also provides the mechanism for a return handshake to clients indicating that their positive acknowledgment was received. This is done by including in the Command packet a variable-length list of the client IP addresses from which the server has received a positive acknowledgment. See the recovery cycle and the statistics report packet for more information.

Statistics Report ADU Packet

The Statistics Report ADU packet provides general statistical summary information from a client that has been receiving data from a server.
The Statistics Report ADU contains the following type of information.

* protocol version
* file identification information
* server IP address
* flags
* complete files received
* incomplete files
* total packets received
* total number of retransmission requests

The file identification information in this packet can be used by clients as a PACK indicator for complete reception of a particular file.

Transmission Cycle Overview

The MDP server transmission cycle is described as follows. The MDP server enters the initial transmission cycle and multicasts an ID packet specifying the name of the file, its size, and the delay between transmissions. The server then begins multicasting the contents of the file using data packets uniformly distributed in time based upon the present transmission rate value. Each data packet contains a file identifier and a file offset pointer to uniquely identify each packet for client reassembly. Upon completion, the server enters a series of recovery cycles to retransmit missing data packets reported by clients. The server repeats the recovery cycle process until it receives no more requests from clients or until a timeout expires. A summary state diagram of the server transmission cycle is shown in Figure 2.

{send ID}->{send data}-> {recovery cycle} -> {no requests/timeout}
                       ^      |    ^                   |
                       |      v    |                   v
                       | {retransmit data}       {send cmd seq
                       |                          nack request}
                       |                               |
                       |                               v
                       |<---{client resp}<-{yes}<-{any response?}
                                                       |
                                                       v
                                                     {no}
                                                       |
                                                       v
                                                   {finish}

                  Figure 2: Server State Diagram

Recovery Cycle Overview

The following steps detail what happens in the MDP recovery cycle. During the recovery cycle, clients make repair requests by providing an aggregate list of missing packets to the server. This list of requested packets is transmitted within the multicast group.
Since the clients delay their requests by a random time within a backoff window, the probability of clients sensing duplicate repair requests within the multicast group is increased.

All packets sent by the server during the recovery cycle contain an EOF flag setting and a recovery cycle flag which marks a transition to a new recovery cycle. To mark the cycle, the server first multicasts an ID packet, then uses a heartbeat timer setting (e.g., 2 seconds) to trigger successive command packet transmissions for resynchronizing clients. When the client detects entry into a new recovery cycle, a random time-delayed missing data repair request packet response is triggered. Each client is allowed only one random time-delayed request for missing packets. The client request for missing packets should not repeat any missing packet requests previously heard from any other clients during that recovery cycle.

The server will immediately retransmit the missing packets reported while continuing to listen for additional client repair requests. Upon completing retransmission, the server begins a new recovery cycle and sends another ID packet and set of command packets. If no client requests are heard during the recovery cycle, the server will time out depending on the present frequency setting. The purpose of a server-controlled recovery cycle period is to shorten the duration of the cycle period and to increase the turnover frequency of recovery cycles. A higher recovery cycle turnover frequency results in faster file transfer to all clients. An overview of the server recovery cycle state diagram is shown in Figure 3.
{send ID/toggle header flag}->{wait period}-->{no requests/timeout}
     ^                              |                   |
     |                              V                   v
     |                       {request heard} <--|  {send cmd}
     |                              |           |       |
     |                              V           |       |
     |<---------------------{retransmit data}   |       |
                                                |       V
                                                ---{timed out}
                                                        |
                                                        V
                                                      {End}

           Figure 3: Server Recovery Cycle State Diagram

In the present design, each client tracks delivery of the file in block segments. If the file is larger than the block segment size, the client dynamically allocates memory to track the additional data segment. If a client determines it is missing packets at the end of a block segment and is allowed by the server to make requests for missing packets (auto request header flag setting), the client enters into a recovery cycle phase as defined above. Upon completion of the recovery cycle, the server resumes transmitting the file where it left off. If the client determines the server has transitioned to a new data segment (as defined above), it will reenter the recovery cycle phase to request any packets that are still missing. This mode of operation allows data repair cycles to occur at defined intervals during the initial data transmission rather than requiring multiple passes after one complete transmission of the file. The purpose of this feature is to keep the server from advancing too far ahead of clients requiring repair packets.

When servicing a missing data repair request, the server will automatically multicast all data packets requested. Upon fulfilling all requests, the server will send another ID packet and toggle the recovery cycle flag. The recovery cycle flag indicates to all clients the beginning of a new recovery cycle. The recovery cycle time duration is determined by a timer value for the heartbeat interval (e.g., 2 seconds). The server will continue the recovery cycle process until it completes one cycle without any client repair requests being received.
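The client side of the recovery cycle described above can be sketched as follows. This is a minimal illustration rather than the IMM code: the function names are hypothetical, received data is tracked as a plain set of byte offsets instead of the block segment structure, and the backoff window length is an assumed parameter.

```python
import random


def missing_offsets(file_size, mdu_bytes, received):
    """Offsets (in bytes) of the data units not yet received, ascending."""
    return [off for off in range(0, file_size, mdu_bytes) if off not in received]


def build_repair_request(missing, heard_from_others):
    """Aggregate one repair request, dropping offsets that another client
    has already requested during this recovery cycle (NACK suppression)."""
    heard = set(heard_from_others)
    return [off for off in missing if off not in heard]


def response_delay(window_seconds, rng=random):
    """Random delay, uniform over the backoff window, applied before the
    client multicasts its single repair request for the cycle."""
    return rng.uniform(0.0, window_seconds)


if __name__ == "__main__":
    received = {0, 2000, 4000}            # offsets seen so far
    missing = missing_offsets(5000, 1000, received)
    delay = response_delay(2.0)           # assumed 2 second window
    # While waiting out the delay, the client hears a request for 3000.
    request = build_repair_request(missing, heard_from_others=[3000])
    print(f"after {delay:.2f} s request offsets {request}")
```

If every missing offset has already been requested by another client, the resulting list is empty and the client stays silent for that cycle, which is the behavior that limits feedback implosion.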
If the server has completed the file transmission at the point where a recovery cycle passes without repair requests, it will send a periodic sequence of command packets with the NACK flag set. Upon hearing this command packet, clients that have not received a complete file are designed to send a short time-delayed response to the server to keep it in the recovery cycle. Once again, a client will not respond if it has previously heard a similar response from another client. The server only needs to hear one response before starting back into the recovery cycle. An overview of the client recovery cycle state diagram is shown in Figure 4.

{Packet contains EOF} -> {Toggled Recovery Flag?} ->{Initiate time
    ^                                                delay response}
    |                                                      |
    |                                                      v
    |                                               {Listen for other
    |                                                client requests}
    |                                                      |
    |                                                      v
    |< -{Send non-duplicate}<-{Incomplete file}<--     {Timeout}
    |     {repair request}                                 |
    |                                                      |
    |                                                      |
    |                                                      v
    |                                               {file completed}
    |                                                      |
    |                                                      v
    |<---{send stat report}<---- {no} <--------- {server heard PACK?}
                                                           |
                                                         {yes}
                                                           |
                                                           v
                                                    {End of cycle}

           Figure 4: Client Recovery Cycle State Diagram

An optional operational mode is available in which the server can request positive acknowledgment of complete file reception from clients; in this mode the server will set the PACK flag in all outgoing packets. Upon file completion, clients will multicast a random time-delayed stat packet. For each command packet received from the server, each client will continue to send a stat packet (at a heartbeat interval) until it has timed out or has received a command packet from the server acknowledging receipt of the client's stat message. This optional mode of operation is not recommended when large group membership is anticipated, due to the corresponding increase in multicast message traffic. It can, however, provide list-based assurance of delivery to particular members when desired.

To improve achievable throughput, the server may initiate another file transfer while it is in the recovery cycle of a previous one.
The transmission rate remains constant since the new file transfer process and existing recovery cycle are performed asynchronously. The server always responds first to client requests and then resumes the new file transfer. As detailed above in the server recovery cycle, anytime the server's packet transmission changes to a new data segment being tracked by the client, the client will enter into a recovery cycle to request any missing packets in the old segment.

Future Work and Design Issues

While the present design has been through limited MBone testing and has been shown to work effectively, there remain a number of design issues which the authors envision will continue to evolve. One of these issues is future approaches to flow control and congestion avoidance within a multicast group environment. We feel this is a general problem and not unique to MDP. While effective reactive flow control in a multicast environment remains a complex technical design issue, there are some basic flow control features in the present design that can be activated. In continuous transmit mode, the server can optionally adjust the data transfer rate by monitoring feedback of total packet retransmission requests from clients as compared to total packets sent for each file. The authors are aware that future design modifications are likely to occur here since many important issues remain unresolved concerning reliable multicast reactive flow control for WAN environments. We are exploring modifications to the protocol framework in this area.

Nonetheless, in many instances, fixed source rate control can work effectively (e.g., in combination with a resource reservation protocol) and avoids the difficulty of managing a large group session around a single client or a small population of faulty or poorly performing clients operating below a desired group throughput threshold.
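As a rough illustration of the optional rate adjustment mentioned above, the sketch below compares the retransmission requests reported for a file with the packets sent for it and scales the source rate accordingly. This document does not specify the adjustment rule; the thresholds, scaling factors, rate limits, and the function name are all assumptions chosen for the example.

```python
def adjust_rate(rate_bps, packets_sent, repair_requests,
                high=0.10, low=0.01, decrease=0.5, increase=1.1,
                min_rate=8_000.0, max_rate=512_000.0):
    """Return a new source rate for the next file transmission.

    Hypothetical policy: halve the rate when more than `high` of the
    packets sent had to be requested again, raise it by 10% when fewer
    than `low` were, and otherwise leave it unchanged. The result is
    clamped to the configured lower and upper rate limits.
    """
    if packets_sent <= 0:
        return rate_bps  # no feedback basis yet; keep the current rate
    ratio = repair_requests / packets_sent
    if ratio > high:
        rate_bps *= decrease
    elif ratio < low:
        rate_bps *= increase
    return max(min_rate, min(max_rate, rate_bps))


if __name__ == "__main__":
    rate = 64_000.0
    # (packets sent, retransmission requests) observed for successive files
    for sent, requested in [(1000, 200), (1000, 50), (1000, 0)]:
        rate = adjust_rate(rate, sent, requested)
        print(f"sent={sent} requested={requested} -> rate {rate:.0f} bit/s")
```

A multiplicative decrease with a gentler increase is only one possible choice; a fixed rate within the same limits corresponds to never calling the adjustment at all.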
As is presently done with many other MBone applications (e.g., compressed video), we recommend that MDP/IMM users pay close attention to the initial rate settings of their servers. To prevent accidental poor practice, reasonable lower and upper rate limit settings and default values are used within the implementation release.

In summary, MDP can provide some rate adaptation based upon the size of the NACK list experienced within the recovery cycle. NACK aggregation and duplicate request suppression at the receiving clients keep reliable control loop traffic to a minimum during bulk data transfer. This simple approach can be quite effective for a number of non-real-time bulk file transfer applications.

There is a potential advantage in applying this protocol framework in combination with a reservation protocol (e.g., RSVP [6]) and future integrated services capabilities. The source rate control setting can reflect the bandwidth reserved, and protocol timers can be better tuned to operate within average or upper bound delay expectations. While this is not required for protocol correctness, hybrid operation can support better performance.

In addition, for high error rate and asymmetric network channels, the adaptation of MDP to a hybrid reliable multicast dissemination scheme using both forward error correction and retransmission is presently under design and consideration.

Suggested Usage

As mentioned, the present MDP framework is seen as useful for reliable bulk file transfer over generic IP multicast services. It is not the intention of the authors to suggest it is suitable for supporting all envisioned multicast reliability requirements, but rather that it provides a simple framework for multicast file dissemination applications with a degree of concern for network traffic implosion.
As previously described, IMM has been successfully demonstrated within the MBone for bulk data dissemination applications, including weather satellite compressed imagery updates servicing a large group of clients.

In addition, this framework approach has some design features which make it attractive for bulk transfer in asymmetric network applications. The multipass repair cycles allow receiver group members to better aggregate and minimize duplicate repair requests with looser timing estimation and windowing requirements than approaches designed for smaller messaging and real-time interaction. A source-only repair approach may also make technical sense in asymmetric networks. Asymmetric architectures supporting multicast delivery are likely to make up an important portion of the future Internet structure (e.g., DBS/cable/PSTN hybrids), and efficient, reliable bulk data transfer will be an important capability for servicing large groups of subscribed clients.

Security Considerations

No discussion of security considerations has been provided here. The authors recognize there is future work to be done here. For the reader's information, limited operational testing of MDP using IPSec extensions for IPv4 has been accomplished to date.

References

[1] W. Dang. "Reliable File Transfer in the Multicast Domain". Technical Report. August 1993.

[2] S. Deering. "Host Extensions for IP Multicasting". Internet RFC 1112, August 1989.

[3] J. Chang and N. Maxemchuk. "Reliable Broadcast Protocols". ACM Transactions on

[4] S. Floyd, V. Jacobson, S. McCanne, C. Liu, and L. Zhang. "A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing". In Proc. ACM SIGCOMM, August 1995.

[5] D. Clark and D. Tennenhouse. "Architectural Considerations for a New Generation of Protocols". In Proc. ACM SIGCOMM, pages 201--208, September 1990.

[6] L. Zhang, S. Deering, D. Estrin, S. Shenker, and D.
Zappala. "RSVP: A New Resource ReSerVation Protocol". IEEE Network Magazine, pages 8--18, September 1993.

[7] J. Macker, M. Corson, and E. Klinker. "Reliable Multicast Data Delivery for Military Internetworking". IEEE MILCOM 96 Proceedings, pages 399--403, October 1996.

Authors' Addresses

Joe Macker
Naval Research Laboratory
Information Technology Division
Washington, DC 20375
Phone: +1 (202) 767-2001
Email: macker@itd.nrl.navy.mil

Winston Dang
University of Hawaii
Rm. 304a
2565 The Mall
Keller Hall
Honolulu, Hawaii 96822
Phone: +1 (808) 956 3490
Email: wkd@hawaii.edu