INTERNET DRAFT              EXPIRES OCT 1998              INTERNET DRAFT

Network Working Group                                       J. Beauchamp
                                                      Raytheon E-Systems
                                                           December 1997

                  The Coherent File Transport Protocol

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Introduction:

The Coherent File Transport Protocol (CFTP) is an adaptation and extension of the Coherent File Distribution Protocol described in RFC 1235 [1]. The adaptations and extensions are designed to optimize the movement of information over a highly asymmetric satellite broadcast channel with a low-bandwidth, potentially long-delay return path to the server source.

This protocol is designed to exploit the broadcast capabilities of a high-bandwidth satellite data service (tens of Mb/s) and to provide reliable delivery to a large number of users. The protocol is designed to operate in a noisy environment (bit error rate of 10^-6) and makes few assumptions about the receiving locations with respect to their availability or operating environment. Recipients are expected to have a return path to the broadcast source to positively acknowledge receipt of a unit of information (referred to here as a file).
The delay of the return path from recipients to source is expected to be variable and potentially long. The protocol allows for instances in which the return path may be totally absent (see Protocol Extensions).

The protocol differs from a traditional client-server model in that the transmission source is free to broadcast either unanticipated or unsolicited information to one or more recipients. Each active recipient must examine each received file and packet and decide if the file or packet is of interest. In order to preserve this differentiation, this memo refers to recipients and sources rather than clients and servers.

In broad application, CFTP is anticipated to be most useful as a protocol to tunnel other protocols reliably over a satellite broadcast network. However, the protocol has advantages in other broadcast environments in that it requires very little setup to initiate a transfer and the acknowledgment mechanism consumes a minimum amount of bandwidth for each recipient. Simple modifications to the protocol as described here allow a unique combination of services to users who can respond through acknowledgments and to users who cannot or will not respond to broadcast data but who wish to receive information.

Overview:

As in RFC 1235, our implementation uses UDP [2] as the transport protocol, but this is not a requirement of the protocol. Any broadcast datagram protocol could be employed. The satellite broadcast implementation assumes that there is only a single, unidirectional link (the satellite channel) between the source and recipient and that acknowledgment information from the recipient to the source is carried on a separate and independent medium.

In the general case, we assume that a file (which is any form of delimited traffic) to be transmitted is sent to the satellite source by an outside agent by any convenient protocol.
We also assume that this file contains information about the intended recipients so that the file can be properly addressed to the destination. For example, a multicast group IP address [3] could be used by the source and this multicast group IP address would then become the address used by CFTP.

While it is outside of the scope of the protocol discussed here, there is an issue of delivery acknowledgment between the external originating agent and the CFTP recipient. CFTP is a "best effort" delivery protocol in that the protocol will reliably deliver a file to any and all recipients who are listening to the satellite broadcast at the time the file is transmitted and respond with acknowledgments. However, the action of the protocol is not driven by delivery to specific recipients who may or may not be listening at the moment of transmission. Although CFTP does not maintain a list of active recipients, the protocol is capable of identifying all recipients who have successfully received a specific file. This information can be sent to the originating external agent.

The protocol begins with the receipt of a file from an external agent. CFTP assigns a unique number to this file in the form of a 32-bit value referred to as the "ticket number". This 32-bit value is the binding entity that allows both the recipient and source to refer to the same file. This ticket number is combined with file size, file name, block size (information content size of each packet associated with the file), and user information to form a packet that is referred to as a "ticket".

The ticket packet is queued for transmission. Our prototype used simple FIFO queuing within priority classes but any other queuing discipline can be applied. When the ticket packet is taken from the queue, it is broadcast over the channel. If the channel is noisy, this ticket packet broadcast can be repeated one or more times to improve the probability of receipt of the ticket by all intended recipients.
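
The FIFO-within-priority-class queuing mentioned above can be sketched as follows. This is a minimal illustration, not the prototype's code: the class name TicketQueue, the rule that a smaller priority number is served first, and the sample ticket numbers are all assumptions made for the example.

```python
import heapq
import itertools


class TicketQueue:
    """FIFO ordering within priority classes.

    Assumption: a smaller priority number is more urgent. The memo does
    not say which direction the prototype used.
    """

    def __init__(self):
        self._heap = []
        # Monotonic arrival counter: breaks ties so that equal-priority
        # tickets leave the queue in the order they were queued.
        self._arrival = itertools.count()

    def put(self, priority, ticket_number):
        heapq.heappush(self._heap, (priority, next(self._arrival), ticket_number))

    def get(self):
        _priority, _arrival, ticket_number = heapq.heappop(self._heap)
        return ticket_number

    def __len__(self):
        return len(self._heap)


queue = TicketQueue()
queue.put(2, 1001)
queue.put(1, 1002)
queue.put(2, 1003)
queue.put(1, 1004)
order = [queue.get() for _ in range(len(queue))]
# Priority class 1 drains first; arrival order is kept inside each class.
```

Any other discipline could replace the heap without touching the rest of the protocol, since recipients never see the priority value.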
As soon as the ticket packet transmission is complete, it is immediately followed by the broadcast of the file in the form of packets that contain the ticket number, the packet sequence number (reset to 0 at the beginning of each new file), and the packet data. When the file transmission is complete, the source is free to select the next queued item (either a new ticket packet and file or a retransmission of previously NAKed packets) and begin broadcasting this new item.

Each recipient listens for the ticket on a preestablished UDP port and decides if the ticket is of interest. If a specific recipient decides to receive the file associated with the ticket, it immediately allocates space to receive the file and marks all packets as not received. As packets are correctly received, they are placed in their proper location as determined by the packet sequence number and marked as received. When the last packet of a transmission is received, or the recipient notes a change in ticket numbers in the received stream, or a receive file timeout expires, the recipient composes a list of the sequence numbers of all packets not received. This list of packet sequence numbers becomes a selective NAK for this recipient and is transmitted back to a preestablished port at the CFTP source using any convenient protocol; our prototype implementation uses standard TCP for this return. The recipient includes the ticket identifier so that the source can unambiguously identify the file. A receipt containing a ticket number with no packets is taken by the source as a positive acknowledgment of receipt of the file.

At the source, these NAK messages are collected and a list of all packets not received by one or more recipients is created. When the source decides that the last of these NAK messages has been received, it queues the ticket and list of packets to resend.
When this ticket arrives at the top of the queue, the packets contained in the consolidated list of NAKed packets are sent using the standard data transmission packet described below; the ticket is not resent. As each recipient completely receives the file, it simply ignores any other traffic associated with the file. This means that different recipients can complete reception of a file at different times depending upon their local environment.

Two additional conditions are worth discussion in this overview: first, the effect of a selective NAK that arrives after the next transmission has been scheduled, and second, the arrival of a selective NAK for a ticket that has been closed. Other than the potential loss of channel efficiency, a late-arriving NAK causes few problems. The protocol notes the packets received in error and associates these packets with the next scheduled transmission. The problem, of course, is that the late NAK may include packets that are already scheduled for transmission, and repeating them simply wastes bandwidth. In the case that the source has closed a ticket, the source simply drops the unexpected request for information; the recipient must time out the transfer and dispose of the partially received file.

Protocol Specification:

Initiation (not strictly a part of CFTP):

An external agent presents a delimited item of traffic (file) to a well-known port of the CFTP service agent. For our prototype implementation, this service agent terminates the protocol with the external agent. At this point, the external agent can only know that the file was delivered to the CFTP source and that the CFTP source will make a best-effort delivery to the CFTP recipient(s); the CFTP source will deliver reliably to those recipients who respond.

With the file in hand, the CFTP source builds a ticket as shown in figure 1. The ticket number appears as the first data item.
The checksum is used to test the integrity of all information in the packet past the filler. For this initial transmission, the type field is filled with 'F', indicating the first transmission of this file and that the ticket should be broadcast. At the source, we have included a priority field that is used to determine the transmit queue ordering. Our prototype implementation uses this format as an internal control structure and queues tickets (using this internal control structure) first-in-first-out within a priority class. The value of block size is the amount of file data that will be included in each UDP packet transmitted. With a 16-bit value, we can accommodate UDP packets up to 64 Kbytes. A total of 255 bytes is allocated to the file name. The value of file name is a null-terminated ASCII string. The remainder of the packet is allocated to user data. We anticipate that this user data area will incorporate metadata that will be used by recipients to decide if this file is to be received.

First Transmission:

At the moment the ticket is selected for transmission by the source, the priority field is replaced by the total block count for this file and the type field is set to 'T' (fig. 2). This allows recipients to recognize that this is a ticket packet, to create their received list, and to allocate space to receive the file that will follow immediately.
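
The transmitted ticket described above (and drawn in figure 2) might be packed and parsed along the following lines. This is a sketch under stated assumptions, not the prototype's code: the 8-bit type and filler widths and the network byte order are inferred from the figure layout, CRC-32 merely stands in for the checksum (the memo does not name an algorithm), and the function names are hypothetical.

```python
import struct
import zlib

# Assumed wire layout, big-endian: 32-bit ticket, 32-bit checksum,
# 8-bit type, 8-bit filler, 16-bit user data length,
# 16-bit block count, 16-bit block size.
HEADER = struct.Struct("!IIcBHHH")
CHECKED_FROM = 10  # the checksum covers everything past the filler


def block_count_for(file_size, blksize):
    """Number of data packets needed; the last one may be short."""
    return -(-file_size // blksize)


def build_ticket(ticket, block_count, blksize, filename, user_data=b""):
    name = filename.encode("ascii") + b"\x00"
    if len(name) > 255:
        raise ValueError("file name exceeds 255 octets including terminator")
    tail = struct.pack("!HHH", len(user_data), block_count, blksize)
    tail += name + user_data
    checksum = zlib.crc32(tail)  # placeholder algorithm (assumption)
    return struct.pack("!IIcB", ticket, checksum, b"T", 0) + tail


def parse_ticket(packet):
    (ticket, checksum, ptype, _filler,
     user_len, block_count, blksize) = HEADER.unpack_from(packet)
    if zlib.crc32(packet[CHECKED_FROM:]) != checksum:
        raise ValueError("checksum mismatch")
    name, _sep, rest = packet[HEADER.size:].partition(b"\x00")
    return {
        "ticket": ticket,
        "type": ptype,
        "block_count": block_count,
        "blksize": blksize,
        "filename": name.decode("ascii"),
        "user_data": rest[:user_len],
    }


packet = build_ticket(7, block_count_for(1_000_000, 2500), 2500,
                      "report.txt", b"meta")
info = parse_ticket(packet)
```

A recipient would inspect info["user_data"] to decide whether the file is of interest before allocating space for info["block_count"] blocks.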
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           "ticket"                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           "chksum"                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  type = 'F'   |    filler     |       user data length        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           priority            |            blksize            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/                                                               /
\          Filename, null-terminated, up to 255 octets          \
/                                                               /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/                                                               /
\          user data area up to (blksize - 255) octets          \
/                                                               /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Fig. 1: Source Ticket Packet (internal).

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           "ticket"                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           "chksum"                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  type = 'T'   |    filler     |       user data length        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          block count          |            blksize            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/                                                               /
\          Filename, null-terminated, up to 255 octets          \
/                                                               /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/                                                               /
\          user data area up to (blksize - 255) octets          \
/                                                               /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Fig. 2: Source Ticket Packet transmitted.

While it is not a requirement of the protocol, our prototype implementation broadcasts the ticket multiple times. This rebroadcast increases the likelihood that all recipients will hear the ticket. In general, we cannot know the status of all potential recipients nor their channel performance.
Should the recipient see more than one of the multiply broadcast copies of the ticket, the recipient simply ignores the duplicates.

As soon as the ticket is transmitted, the source transmits all of the packets for the file. The CFTP data packet is shown in fig. 3. The last packet of a file may not occupy a complete blksize and it is up to the recipient to note that this last packet is short. The type field is set to 'B' and, except for the last packet, the EOT field is set to 0. The EOT field is not absolutely essential for the first transmission; the recipient could easily compare the current block number with the block count contained in the ticket. However, the EOT field is important for the partial transmissions to be discussed later, since it identifies the last data to be transmitted under this ticket at this time, and the last block to be transmitted may not be the last block in the file if that block was not requested by any recipient.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           "ticket"                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           "chksum"                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  type = 'B'   |      EOT      |         block number          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/                                                               /
\                 Filedata, up to blksiz octets                 \
/                                                               /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Fig. 3: Data Packet.

Recipient Protocol:

A state diagram for the recipient operations is shown in figure 4. Once a recipient hears a ticket and decides to receive the file, the recipient adds the ticket identifier to the list of active tickets and listens for broadcast packets on the CFTP/UDP port that have a ticket number that matches one of the active tickets being received.
If the data packet contains a block that has not been previously received, the recipient adds the data into the received file and marks the block as being received.

                     +-----------+
                     | Recipient |
                     |   start   |
                     |           |
                     |  receive  |
                     |  ticket   |
                     +-----------+
                           |
           received packet | receive packet
               .-----------------------.
               |                       |
               V                       V
          +---------+             +---------+
          | INCMPLT |             |         |
          |         |  timeout 1  | receive | <---.
        .-|  send   |<------------|         |     | received packet
        | | PARREQ  |     or      | message | ----'
        | |         |non received |         |
        | +---------+   packets   +---------+
        |    ^   |                     |
        |    '---'                     |finished
        | timeout 2                    |
        |                              |
        | timeout 3 or                 |
        | retry limit                  V
        |   +-----------+         +---------+
        .-->|   ABORT   |         |   END   |
            +-----------+         +---------+

          Fig. 4: Recipient State Transition Diagram
                (figure adapted from RFC 1235)

If the packet EOT flag is non-zero, the recipient scans the list of received blocks associated with the ticket and notes all of the blocks that it has not received. This list of non-received blocks is composed into a partial request message packet and sent to the source; the format for this message packet is shown in figure 5. As noted in the overview, the recipient can use any convenient protocol to return this packet to the source; however, we anticipate the use of a protocol such as transaction TCP [4]. The source address is assumed to be well known or is extracted from the received IP packet, and the recipient directs this message to the CFTP port at the source address.

In our prototype implementation, if there are more unreceived blocks than can fit within a single message packet, the recipient holds these additional message packets until the next transmission of the file. This is not a requirement of the protocol; the recipient is free to send multiple NAK packets through the return path by whatever mechanisms are permitted in the return protocol.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           "ticket"                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           "chksum"                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            blkcnt             |           block #1            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           block #2            |           block #3            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           block #4            |           block #5            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/                                                               /
\              block numbers, up to blksiz octets               \
/                                                               /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Fig. 5: Partial Request Message.

If the recipient finds no blocks to request from the source, it marks the file as received and removes the ticket from its list of active tickets. Note that the recipient sends a return message indicating there are no additional packets required. This is interpreted by the source as a positive acknowledgment of receipt of the file. Since the recipient removes the ticket from its list of active tickets, any packets received for this ticket will be simply dropped.

If a recipient does not receive the last packet of a message then the recipient must transition to the INCMPLT state. In a situation in which there is a continuous flow of traffic, the recipient will notice the change in ticket number without having received the last packet of the previous transmission. In a lightly loaded channel, some significant period of time could elapse before another ticket arrives and thus we use timeout 1 to force the transition to the INCMPLT state.

It is possible that a partial request message could be lost in the return path or that the recipient has missed a retransmission of a small number of packets from the source. To handle this condition, the recipient periodically retransmits the partial request to the source as set by timeout 2.
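
The recipient-side bookkeeping and the partial request message of figure 5 can be sketched as follows. This is an illustration only: the class and function names are hypothetical, the fields are assumed to be big-endian, and CRC-32 over the block count and block numbers stands in for the unspecified checksum.

```python
import struct
import zlib


class Reception:
    """Tracks which blocks of one ticket have arrived."""

    def __init__(self, ticket, block_count, blksize):
        self.ticket = ticket
        self.blksize = blksize
        # None marks a block as not yet received.
        self.blocks = [None] * block_count

    def on_data(self, block_number, data):
        # Duplicate copies of an already received block are ignored.
        if self.blocks[block_number] is None:
            self.blocks[block_number] = data

    def missing(self):
        return [n for n, data in enumerate(self.blocks) if data is None]

    def partial_request(self):
        """Selective NAK; an empty block list is a positive acknowledgment."""
        wanted = self.missing()
        body = struct.pack("!%dH" % (1 + len(wanted)), len(wanted), *wanted)
        # Checksum algorithm and coverage are assumptions.
        return struct.pack("!II", self.ticket, zlib.crc32(body)) + body


def parse_partial_request(message):
    ticket, checksum = struct.unpack_from("!II", message)
    body = message[8:]
    if zlib.crc32(body) != checksum:
        raise ValueError("checksum mismatch")
    (count,) = struct.unpack_from("!H", body)
    blocks = struct.unpack("!%dH" % count, body[2:2 + 2 * count])
    return ticket, list(blocks)


reception = Reception(ticket=9, block_count=5, blksize=4)
for number in (0, 1, 3):
    reception.on_data(number, b"data")
request = reception.partial_request()  # asks for blocks 2 and 4
```

The sketch sends the whole missing list in one message; a real recipient would have to split or defer the list when it exceeds the message size, as the prototype does.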
We use timeout 3 to abort partially received messages. If the recipient has not heard a reply from the source after a long time, it assumes that both the message and ticket have been lost, deletes all partially received blocks from the message, and removes the ticket from the list of active tickets.

Finally, it is possible for a recipient to be receiving small portions of a message with each retransmission. Even though reception progress is being made by an individual recipient, the network resources required for this individual may be excessive. We control this through a retry limit. The retry limit is associated with the recipient rather than the source in order to be able to offer more persistent delivery to recipients who are special.

Source Protocol:

A state diagram of the source protocol is shown in figure 6. Initially, the source is idle; it receives a file from an external agent and builds a ticket. The ticket is transmitted, followed by the transmission of the entire file. When the message is completely transmitted, the source places the ticket in the wait-for-ack state. While the ticket is in this state, the source receives the partial requests from recipients and builds a list of the blocks to retransmit. In general, the source cannot assume that it knows the total number of recipients that might respond and therefore removes the ticket from this state after a time-out.

Upon exiting this state, the source notes the total number of blocks requiring retransmission. If this total number of blocks is zero, the source assumes that all interested recipients have correctly received the file and removes the ticket and file from the active list. If the number of blocks requiring retransmission is not zero, the source queues the (internal) ticket for retransmission along with the list of blocks to be retransmitted. When the retransmission of the list of blocks is complete, the source again waits for responses from the recipients.
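
The consolidation step performed when the wait-for-ack time-out expires can be sketched as below. The function names are hypothetical, and the use of 1 as the non-zero EOT value is an assumption; the memo only requires that EOT be 0 on every packet except the last one sent.

```python
def consolidate(nak_lists):
    """Union of the block numbers requested by any recipient, ascending."""
    wanted = set()
    for blocks in nak_lists:
        wanted.update(blocks)
    return sorted(wanted)


def retransmission_plan(nak_lists):
    """Return (block_number, eot) pairs for the next partial transmission.

    EOT is set only on the last block actually sent, which need not be
    the last block of the file. An empty plan means the ticket is done.
    """
    blocks = consolidate(nak_lists)
    last = len(blocks) - 1
    return [(number, 1 if index == last else 0)
            for index, number in enumerate(blocks)]


# Three recipients answered: one NAKs blocks 3 and 7, one acknowledges
# (empty list), one NAKs blocks 7 and 12. Block 7 is resent only once.
plan = retransmission_plan([[3, 7], [], [7, 12]])
```

Because each block is broadcast once no matter how many recipients asked for it, the cost of a retransmission round depends on the number of distinct lost blocks rather than on the size of the recipient population.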
+--------+        +---------+      +----------+          +---+
| Source | extrn  |  send   |      | wait for |    no    | d |
|  idle  |------->| ticket  |----->|   ack    |--------->| o |
+--------+ request|   and   |      | packets  | requests | n |
                  |  file   |    .>|          |          | e |
                  +---------+    | +----------+          +---+
                                 |      | time-out
                                 |      V
                                 | +----------+
                                 | |   send   |
                                 '-| requested|
                                   |  blocks  |
                                   +----------+

           Fig. 6: Source State Transition Diagram

It is possible that there is no recipient that decides to receive a particular file. This is detected by CFTP by noting that there are no requests for retransmission after the wait-for-ack time-out. This approach simplifies the protocol state machine: the source transitions to the done state in any instance in which there are no requests for packets from a message.

Tunable Parameters:

Packet size:

In a satellite broadcast application, there is considerable flexibility in setting packet sizes. Shorter packets are individually less likely to be corrupted by channel noise but reduce user data bandwidth through protocol overhead. Shorter packets also potentially increase the amount of recipient bandwidth needed to report non-received packets. Longer packets are individually more likely to be corrupted by channel noise and can significantly reduce user data bandwidth if the channel is noisy. Our analysis suggests that a packet length between 2,000 and 2,500 octets is a reasonable value in channels with a bit error rate as bad as 1x10^-7.

Time-outs:

Setting appropriate time-out values in the protocol is somewhat implementation-dependent and certainly dependent upon the anticipated loading within the channel. Our prototype implementation has emphasized speed of service and thus has set several time-out values associated with acknowledgment time-outs to minimum values. The most critical of these is in the source, where the source is waiting for acknowledgment. We picked a value of 750 msec, with this time-out beginning when the last packet of a file has been sent.
This value allows 250 msec for transport over a satellite and 500 msec for acknowledgment returns.

ACK Implosion:

As with any reliable multicast or broadcast protocol, CFTP is subject to a potentially large number of acknowledgments in a large recipient population. This can be controlled by incorporating intermediate ACK collection processors that forward a smaller number of acknowledgments to the source. Use of such ACK concentrators involves directing recipients to address retransmission requests to a concentrator. The source is not concerned about the address of the device providing the retransmission request.

Protocol Extensions:

The loose relationship between a CFTP source and CFTP recipient creates opportunities for some unconventional operations. First, we define a probabilistic delivery extension to the protocol in which we offer a "good chance" delivery to a recipient who cannot or will not respond. Second, by incorporating metadata that describes either the intended audience of a file or the content of the file (or both), we can have a file delivery model that includes content as well as address.

Probabilistic Delivery:

Probabilistic delivery uses one or more retransmissions to increase the likelihood that a recipient correctly receives a file given that we have no feedback from the recipient. For example, if we have a file consisting of 400 packets of 2,500 octets (a 1 Mbyte file), the likelihood of this message being received correctly in a channel with a Gaussian bit error rate (BER) of 1x10^-7 is about 44%. If each packet is transmitted twice, the likelihood of correctly receiving this message rises to about 99.8%, and with 3 transmissions this probability is greater than 99.99%.

CFTP is modified to operate in this mode by simple changes in the source. In the internal data structures maintained by the source, a field is added that indicates the number of transmissions of each packet during the initial broadcast of the file.
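
The delivery probabilities quoted above can be checked with a short calculation. The sketch assumes independent bit errors and that a packet is lost only if every one of its copies is corrupted; the function name is hypothetical. Counting only the 2,500 payload octets gives roughly 45% for a single transmission, so the 44% figure presumably also counts some header overhead, which is an assumption on our part.

```python
def file_success_probability(packets, octets_per_packet, ber, transmissions=1):
    """Probability that every packet of a file arrives intact at least once.

    Assumes independent bit errors at rate `ber` and independent copies.
    """
    bits = octets_per_packet * 8
    packet_ok = (1.0 - ber) ** bits
    # A packet is lost only when all of its transmissions are corrupted.
    packet_lost = (1.0 - packet_ok) ** transmissions
    return (1.0 - packet_lost) ** packets


single = file_success_probability(400, 2500, 1e-7, transmissions=1)
double = file_success_probability(400, 2500, 1e-7, transmissions=2)
triple = file_success_probability(400, 2500, 1e-7, transmissions=3)

for label, value in (("1x", single), ("2x", double), ("3x", triple)):
    print(label, "%.4f%%" % (100.0 * value))
```

The steep gain from the second copy is what makes the extension attractive: doubling the initial broadcast cost moves a file that would fail more often than not to one that almost always arrives without any return path.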
Recipients receive these multiple transmissions and simply accept the first correctly received packet; the normal protocol operation at the recipient drops duplicates. After this initial broadcast, the source continues the protocol as described above. Any retransmissions caused by recipient requests are only broadcast once.

Content Delivery:

Investigations into metadata are relatively new but are very promising as a tool to summarize a document. Standards for metadata are only beginning to evolve and any attempt to specify a specific method here is likely to be incompatible with the future. We have reserved an area inside the ticket to include metadata but, in this instance, the metadata field is included as a "vendor-specific area" as done in bootp. Our model for metadata within the ticket follows an entity-attribute model; we are following the activities of the X3L8 committee as they develop standards for metadata.

References:

[1] Ioannidis, J. and Maguire, G., "The Coherent File Distribution Protocol", RFC 1235, June 1991.

[2] Postel, J., "User Datagram Protocol", STD 6, RFC 768, August 1980.

[3] Deering, S., "Host Extensions for IP Multicasting", RFC 1112, August 1989.

[4] Jacobson, V., Braden, R., and Borman, D., "TCP Extensions for High Performance", RFC 1323, May 1992.

Security Considerations:

Security issues are not discussed in this document.

Author's Address

Jere Beauchamp
Raytheon E-Systems
P.O. Box 12248
St. Petersburg, FL 33733

EMail: jnba@eci-esyst.com
Phone: (813) 302-2397