Internet Engineering Task Force
Audio-Video Transport WG
INTERNET-DRAFT
H. Schulzrinne/S. Casner
AT&T/ISI
May 6, 1993
Expires: 10/01/93

A Transport Protocol for Real-Time Applications

Status of this Memo

This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months.
Internet Drafts may be updated, replaced, or obsoleted by other
documents at any time. It is not appropriate to use Internet Drafts as
reference material or to cite them other than as a ``working draft''
or ``work in progress.''

Please check the I-D abstract listing contained in each Internet Draft
directory to learn the current status of this or any other Internet
Draft.

Distribution of this document is unlimited.

Contents

1  Introduction
2  Real-time Data Transfer Protocol -- RTP
   2.1  Definitions
   2.2  RTP Header Fields
3  Reverse Control
4  Real Time Control Protocol --- RTCP
   4.1  Forward Control Options
5  Security Considerations
6  RTP over network and transport protocols
   6.1  Defaults
        6.1.1  Framing
        6.1.2  RTA option
   6.2  UDP
   6.3  TCP
   6.4  ST-II
A  Implementation Notes
B  Addresses of Authors

Abstract

This draft describes a protocol called RTP suitable for the network
transport of real-time data, such as audio, video or simulation data.
The data transport is enhanced by a control protocol designed to
provide minimal control and identification functionality. A reverse
control protocol provides mechanisms for monitoring quality of service
and other content-specific requests. This protocol is intended for
experimental use.

This specification is a product of the Audio-Video Transport working
group within the Internet Engineering Task Force. Comments are
solicited and should be addressed to the working group's mailing list
at rem-conf@es.net and/or the authors.

1  Introduction

This draft concisely specifies a real-time transport protocol. A
discussion of the design decisions can be found in the current version
of the companion Internet draft draft-ietf-avt-issues.txt.

The transport protocol provides end-to-end delivery services for one
or more flows of data with real-time characteristics, for example,
interactive audio and video. It does not guarantee delivery or prevent
out-of-order delivery, nor does it assume that the underlying network
is reliable and delivers packets in sequence. [Note that the sequence
numbers included in RTP allow the end system to reconstruct the
sender's packet sequence, but sequence numbers may also be used to
determine the proper location of a packet, for example in video
decoding, without necessarily decoding packets in sequence.]

RTP is designed to run on top of a variety of network and transport
protocols, for example, IP, ST-II or UDP. [For most applications, RTP
offers insufficient demultiplexing to run directly on IP.] RTP
transfers data in a single direction, possibly to multiple
destinations if supported by the underlying network.
A mechanism for indicating a return path for control data is provided.

While RTP is primarily designed to satisfy the needs of
multi-participant multimedia conferences, it is not limited to that
particular application. Storage of continuous data, interactive
distributed simulation, and control and measurement applications may
also find RTP applicable.

Profiles are used to instantiate certain header fields and options for
particular sets of applications. A profile for audio and video data
may be found in the companion Internet draft
draft-ietf-avt-profile.txt.

This document defines two packet formats and protocols:

o  the real-time transport protocol (RTP) for exchanging data with
   real-time properties.

o  the real-time control protocol (RTCP) for conveying information
   about the sites in an ongoing association. RTCP information may be
   ignored without affecting the ability to correctly receive
   information. RTCP is used for loosely controlled conferences, i.e.,
   where there is no explicit admission control and set-up. Its
   functionality may be subsumed by a conference control protocol
   (which is beyond the scope of this document).

Control fields (options) for RTP and RTCP share the same structure and
numbering space and are carried within the same packet. Options may
appear in any order, unless specifically restricted by the option
description. [The position of some security options may have
significance.] Each option consists of the final bit, the option type
designation, a one-octet length field denoting the total number of
32-bit long words comprising the option (including final bit, type and
length), and finally any option-specific data. The last option before
the packet data portion has the 'F' (final) bit set to one; for all
other options this field has a value of zero.
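
The option structure described above can be walked with a short loop. The following Python sketch is illustrative only and is not part of the protocol definition. It assumes that the final bit and a 7-bit option type share the first octet, which is inferred from the option diagrams in this document rather than stated in the text; the name parse_options is hypothetical.

```python
from typing import List, Tuple


def parse_options(buf: bytes) -> Tuple[List[Tuple[int, bytes]], int]:
    """Walk a chain of options.

    Returns ([(option_type, option_data), ...], octets_consumed).
    Assumed layout per option: 1 final bit, 7-bit type, 1-octet length
    counted in 32-bit words (header included), then option data.
    """
    options = []
    offset = 0
    while True:
        if offset + 2 > len(buf):
            raise ValueError("truncated option header")
        final = buf[offset] >> 7          # 'F' bit: one on the last option
        opt_type = buf[offset] & 0x7F     # assumed 7-bit option type
        words = buf[offset + 1]           # total length in 32-bit words
        if words == 0:
            raise ValueError("option length of zero")
        end = offset + 4 * words
        if end > len(buf):
            raise ValueError("option runs past end of buffer")
        options.append((opt_type, buf[offset + 2:end]))
        offset = end
        if final:
            return options, offset
```

The returned offset marks where the packet data portion begins. Options of unknown type can simply be skipped by the caller, since the length field alone is enough to step over them.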
Fields within the fixed header and within options are aligned to the
natural length of the field, i.e., 16-bit words are aligned on even
addresses, 32-bit long words are aligned at addresses divisible by
four, etc. Octets designated as padding have the value zero.

Options unknown to the RTP implementation or the application are to be
ignored. Options with option types having values from 64 to 127
inclusive are to be used for private extensions.

Fields designated as MBZ ('must be zero') must have a value of binary
zero and are to be ignored by the receiver.

All integer fields are carried in network byte order, that is, most
significant byte (octet) first. The transmission order is described in
detail in [1], Appendix A. Unless otherwise noted, constants are in
decimal (base 10).

Textual information is encoded according to the UTF-2 encoding of the
ISO standard 10646 (Annex F) [2,3]. US-ASCII is a subset of this
encoding and requires no additional encoding. The presence of
multi-byte encodings is indicated by setting the most significant bit
to a value of one. A byte with a binary value of zero may be used as a
string terminator for padding purposes.

2  Real-time Data Transfer Protocol -- RTP

2.1  Definitions

A content source is the actual source of the data carried, for
example, the user and host that originally generated the audio data.

A synchronization source is the combination of one or more content
sources with its own timing.

A network source is the network-level origin of the RTP protocol data
units (RPDUs) as seen by the end system.

An end system generates the content to be used in RTP packets and
delivers the content of received RTP packets to the user application.
An end system is a synchronization source.

An (RTP-level) bridge receives RTP packets from one or more sources,
combines them in some manner and then forwards a new RTP packet.
A bridge may change the encoding. A bridge always changes the timing
relationship, introducing a new time scale. Bridges are
synchronization sources, with each of the sources whose packets were
combined into an outgoing RTP packet as the content sources for that
outgoing packet. Audio bridges and media converters are examples of
bridges.

Example: assume SMITH@FOO and JONES@BAR are using a bridge to
translate their audio from one encoding to another. The bridge mixes
audio packets from Smith and Jones together and forwards the mixed
packets. If, say, Smith was talking, she is indicated as the content
source of the outgoing packet, allowing the receiver to properly
display the current speaker rather than just the bridge that mixed the
audio. For an end system receiving RTP packets from that bridge, the
bridge is the synchronization source and Smith the content source.

The RTP-level bridges described in this document are unrelated to the
data link-layer bridges found in local area networks. If there is a
possibility of confusion, the term 'RTP-level bridge' should be used.
[The name 'bridge' follows common telecommunication usage.]

An (RTP-level) translator does not alter the timing of packets.
Examples of its use include encoding conversion without mixing or
retiming, conversion from multicast to unicast, and application-level
filters in firewalls. A translator is neither a synchronization nor a
content source.

A synchronization unit consists of one or more packets that, as a
group, share a common fixed delay between generation and playout of
each part of the group, or can only be scheduled as a whole. The delay
may change at the beginning of such a synchronization unit. The most
common synchronization units are talkspurts for voice and frames for
video transmission.

2.2  RTP Header Fields

The header fields have the following meaning:

protocol version: 2 bits
Defines the protocol version. The version number of the protocol
defined in this draft is one.

flow: 6 bits
The value of the field is the flow identifier, one of the items used
by the receiver for demultiplexing. A synchronization source is
identified by the receiver as the unique combination of network source
address, flow value, and the synchronization source option, if
present.

option present bit (P): 1 bit
This flag has a value of one if the fixed RTP header is followed by
one or more options.

end-of-synchronization-unit (S): 1 bit
This flag has a value of one in the last packet of a synchronization
unit, a value of zero otherwise.

format: 6 bits
The 'format' field forms an index into a table defined through a
conference announcement protocol (to be specified), RTCP messages, a
conference server or some other out-of-band means. If no mapping has
been defined in this manner, a standard mapping is specified by the
companion profile document, RFC TBD. RFC 1340, Assigned Numbers, or
its successor, is to be used.

sequence number: 16 bits
The sequence number counts RTP protocol data units (packets). The
sequence number increments by one for each packet sent. [The sequence
number may be used by the receiver to detect packet loss, to restore
packet sequence and to identify packets to the application.]

timestamp: 32 bits
The timestamp reflects the wallclock time when the RPDU was generated.
The timestamp consists of the middle 32 bits of a 64-bit NTP
timestamp, as defined in RFC 1305 [4]. Note that several consecutive
packets may have equal timestamps. The timestamp of the first
packet(s) within a synchronization unit is expected to closely reflect
the actual sampling instant, measured by the local system clock.
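
As an illustration of the field widths listed above and of the "middle 32 bits" rule for the timestamp, the following Python sketch derives a timestamp from a Unix clock reading and packs the 8-octet fixed header in the field order of Figure 1. It is a sketch under stated assumptions, not a reference implementation: the function names are hypothetical, the clock reading is assumed to be non-negative seconds since 1970, and the conversion relies on the 2208988800-second offset between the NTP and Unix epochs.

```python
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800


def rtp_timestamp(unix_time: float) -> int:
    """Middle 32 bits of a 64-bit NTP timestamp:
    low 16 bits of the seconds, high 16 bits of the fraction."""
    whole = int(unix_time)
    ntp_seconds = whole + NTP_UNIX_OFFSET
    fraction = int((unix_time - whole) * 65536)   # units of 1/65536 second
    return ((ntp_seconds & 0xFFFF) << 16) | (fraction & 0xFFFF)


def pack_fixed_header(flow: int, has_options: bool, end_of_sync_unit: bool,
                      fmt: int, seq: int, timestamp: int,
                      version: int = 1) -> bytes:
    """Pack Ver(2) flow(6) P(1) S(1) format(6) sequence(16) timestamp(32)."""
    byte0 = ((version & 0x3) << 6) | (flow & 0x3F)
    byte1 = (int(has_options) << 7) | (int(end_of_sync_unit) << 6) | (fmt & 0x3F)
    # "!" selects network byte order, as required for all integer fields.
    return struct.pack("!BBHI", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF)
```

Because only 16 bits of seconds are carried, the timestamp wraps roughly every 18 hours; a receiver comparing timestamps has to allow for that wrap.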
It is not expected that the timestamp of the beginning of every
synchronization unit is based on a local synchronized system clock.
However, the local clock should be used frequently enough so that
clock drift between synchronized system clock and sampling clock can
be compensated for gradually. The local system clock should be
controlled by a time synchronization protocol such as NTP if such a
service is available. Within one synchronization unit, it may be
appropriate to compute timestamps based on the logical timing
relationships between the packets. For audio samples, for example, the
nominal sampling interval may be used.

If the clock quality field of the CDES option does not indicate
otherwise, it is assumed that the timestamp at the beginning of a
synchronization unit is derived from a synchronized system clock.
However, it is allowable to operate without synchronized time on those
systems where it is not available, unless a profile or session
protocol requires otherwise.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |Ver|   flow    |P|S|  format   |       sequence number         |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |      timestamp (seconds)      |     timestamp (fraction)      |
 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
 | options ...                                                   |
 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

                    Figure 1: RTP header format

The packet header is followed by options, if any, and the media data.
Optional fields are summarized below. Unless otherwise noted, each
option may appear only once per packet. Each packet may contain any
number of options.

CSRC 0
Content source identifiers. The content source option is inserted only
by bridges and identifies all sources that contributed to the packet.
For example, for audio packets, all sources are listed that were mixed
together to create this packet, allowing correct talker indication at
the receiver. Each CSRC option may contain one or more content source
identifiers, each 16 bits long. The identifier values must be unique
for all content sources received through a particular synchronization
source (bridge) on a particular conference (destination address and
port); the value of binary zero is reserved and may not be used. If
the number of content sources is even, the two octets needed to pad
the list to a multiple of four octets are set to zero. There should
only be a single CSRC option within a packet. If no CSRC option is
present, the content source is assumed to have a value of zero. CSRC
options are not modified by RTP-level translators.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|    CSRC     |    length     | content source identifier ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

SSRC 1
Synchronization source identifier. The SSRC option is only inserted by
RTP-level translators; the translator must assign a unique identifier
for each synchronization source from which it receives packets for a
particular conference (destination address and port). The value zero
is reserved and must not be used. If no SSRC option is present, the
network source is assumed to indicate the synchronization source.
There must be no more than one SSRC identifier per packet; thus, a
translator must remap the SSRC identifier of an incoming packet into a
new, locally unique SSRC identifier.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|    SSRC     |  length = 1   |          identifier           |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

BOP 2
(beginning of playout unit) 16-bit sequence number designating the
first packet within the current playout unit.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|     BOP     |  length = 1   |        sequence number        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3  Reverse Control

This section describes a means for the receiver of RTP protocol data
to signal back to the sender or a third party. Reverse control packets
are sent to the destination specified by the sender of the data using
the RNA and RTA options. Use of reverse control packets is optional.

Reverse control packets have the format shown below. The packet is
preceded by a 32-bit packet length field if and only if the underlying
transport layer does not support framing. The packet length field
contains the number of octets within the packet, not including the
packet length field itself. The flow index is that of the flow to
which this reverse control is a response.

Reverse control packets are only sent to the synchronization source.
It is the responsibility of the RTP-level bridge to convey information
back to the content sources, if necessary.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |       0       |       0       |       0       |  flow index   |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | reverse-control options (variable length) ...                 |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The following options may be used within reverse control packets:

QOS 64
Quality of service measurement. The option contains the number of
packets received (16 bits), the number of packets expected (16 bits),
the minimum delay, the maximum delay and the average delay. The delay
measures are encoded as 16/16 NTP timestamps, that is, 16 bits encode
the number of seconds and 16 bits the fraction of a second. [The
timestamp format is identical to the one used in the fixed RTP
header.]

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|     QOS     |  length = 5   |              MBZ              |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |       packets received        |     sequence number range     |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |    minimum delay (seconds)    |   minimum delay (fraction)    |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |    maximum delay (seconds)    |   maximum delay (fraction)    |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |    average delay (seconds)    |   average delay (fraction)    |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

RAD 65
Reverse application data. The data contained in the option is directly
passed to the application, without interpretation by RTP.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|     RAD     |    length     |   reverse application data    |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                              ...                              |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4  Real Time Control Protocol --- RTCP

The real-time control protocol (RTCP) conveys minimal out-of-band
advisory information during a conference.
It provides support for loosely controlled conferences, i.e., where
participants enter and leave without admission control and parameter
negotiation. The services provided by RTCP enhance RTP, but an end
system does not have to implement RTCP features to participate in
conferences(1). RTCP does not aim to provide the services of a
conference control protocol and does not provide some of the services
desirable for two-party conversations. If a conference control
protocol is in use, the services of RTCP should not be required.
(Note: as of the writing of this document, a conference or session
control protocol has not been specified within the Internet.)

(1) There is one exception to that rule: if an application sends FMT
options, the receiver has to decode these in order to properly
interpret the RTP payload.

Unless otherwise noted, control information is carried periodically as
options within RPDUs. In the absence of media data, packets containing
only RTCP options are sent periodically to the same multicast group as
data packets, using the same time-to-live value. Note that RTCP
options could be sent in separate packets even when there is data to
send; however, the RTCP packets would consume sequence numbers and
make detection of lost data at the receiver more difficult. The period
should be varied randomly to avoid synchronization of all sources and
its mean should increase with the number of participants in the
conference to limit the overall network load. The length of the period
determines, for example, how long a receiver joining a conference has
to wait in the worst case until it can identify the source. An initial
period varying randomly between 3 and 10 seconds is recommended. A
receiver may remove a site that has not been heard from for a given
time-out period from its list of active sites; the time-out period may
depend on the number of sites or the observed average interarrival
time of RTCP messages. Note that not every periodic message has to
contain all RTCP options; for example, the MAIL part within the SDES
option might only be sent every few messages.

The item types are defined below:

4.1  Forward Control Options

The following options are sent in the same direction as the data
stream.

FMT 32
Format description.

format: 6 bits
The 'format' field designates the index value from the 'format' fixed
header field, with values ranging from 0 to 63.

Clock quality: 8 bits
Provides an indication as to the sender-perceived quality of the
timestamps in the RTP header. The octet is interpreted as a quantity
indicating the maximum dispersion to a root time server measured in
fractions of a second and expressed as a power of two. If a source is
known to be synchronized to standard time, but with an unknown
dispersion, or the dispersion is greater than TBD, the value TBD is
used. If the clock is based on the nominal sample rate of the source,
a value of TBD is used. The clock quality indication can be used to
judge how the delay measurements reported by the QOS option can be
interpreted (as absolute delay or only as delay variation). It is also
useful for determining to what extent several sources with different
clocks can be synchronized.

Format-dependent data: variable
Format-dependent data may or may not appear in a FMT option. It is
passed to the next layer and not interpreted by RTP.

A FMT mapping changes the interpretation of a given 'content' value
starting at the packet containing the FMT option. The new
interpretation applies only to packets from the synchronization source
of this packet.
A sender should refrain from changing the content type and flow index
of a mapping defined by external means such as a conference registry,
conference announcement protocol or otherwise agreed-upon mapping.
Dynamic changes to these values may result in misinterpretation of RTP
payload if the packet(s) containing the FMT option are lost.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|     FMT     |    length     |0|0|  format   | clock quality |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | format-dependent data ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

SDES 33
This option provides a mapping between a numeric source identifier and
one or more identifying attributes. [Several attributes were combined
into one option to avoid multiple mappings from identifiers to the
receiver site data structure.] For those applications where the size
of a multipart SDES option would be a concern, multiple SDES options
may be formed with subsets of the parts to be sent in separate
packets.

An end system always uses an identifier value of zero. A bridge uses
the content source identifiers used in CSRC options to identify
contributors, and a value of zero to identify itself. Translators do
not modify or insert SDES options. The end system performs the same
mapping it uses to identify the content sources (that is, the
combination of network source, synchronization source and the source
number within this SDES option) to identify a particular source.

Currently, the following items are defined. Each has a structure
similar to that of RTCP and RTP options, that is, a type field
followed by a length field (measured in multiples of four octets). No
final bit is needed since the overall length is known.
The class identifier of the informational items within the SDES option
is identical to the CLASS value in the resource record (RR) in the
Domain Name Service protocol (DNS) [RFC 1034, RFC 1035] [5,6] and may
be found in the current version of the Assigned Numbers RFC issued by
the Internet Assigned Numbers Authority. Additional values that are
reserved are used for SDES-specific identifiers.

name   class     description
USER   0         user and host identifier, e.g.,
                 ``doe@sleepy.megacorp.com'' or
                 ``sleepy.megacorp.com''
MAIL   3         user's electronic mail address, e.g.,
                 ``John.Doe@megacorp.com''
TEXT   65535     text describing the source, e.g.,
                 ``John Doe, Bit Recycler, Megacorp''
ADDR   1         IPv4 address of source
       2-65534   other address formats

Class value 4 is currently assigned to historical network address
types (Hesiod) and thus safe for private SDES use.

Items are padded with zero to the next multiple of four octets.

The USER item must have the format ``user@host'' or ``host'', where
``host'' is the fully qualified domain name of the host from which the
real-time data originates, formatted according to the rules specified
in RFC 1035. The latter form may be used if a user name is not
available, for example on single-user systems. The user name should be
in a form that a program such as ``finger'' or ``talk'' could use,
i.e., it typically is the login name rather than the ``real life''
name. Note that the host name is not necessarily identical to the
electronic mail address of the participant. The latter is provided
through the MAIL option. The USER item is intended to be parsed by an
application program.
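
Since the USER item is meant to be parsed by a program, the two permitted forms can be separated as in the following Python sketch. It is illustrative only: the function name is hypothetical, trailing zero octets are treated as the padding described above, and the text is decoded as UTF-8 on the assumption that this matches the encoding named earlier in this document (US-ASCII input decodes identically).

```python
from typing import Optional, Tuple


def parse_user_item(raw: bytes) -> Tuple[Optional[str], str]:
    """Split the text of a USER item into (user, host).

    The user part is None for the bare ``host'' form.
    """
    text = raw.rstrip(b"\x00").decode("utf-8")   # drop zero padding
    user, sep, host = text.rpartition("@")
    if not host:
        raise ValueError("USER item has no host part")
    if sep and not user:
        raise ValueError("USER item has '@' but no user name")
    return (user if sep else None), host
```

The sketch does not check that the host part is a well-formed fully qualified domain name; an application that needs that guarantee would have to validate it separately.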

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|    SDES     |    length     |       source identifier       |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |           class = 0           |    length     | user and ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | domain name of source ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |        class = 0xFFFF         |    length     | text ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | ... describing the source ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |           class = 1           |    length     |       0       |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                         IPv4 address                          |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

RNA 36
The RNA (reverse network address) indicates the network address to be
used for sending reverse control data for the given content type. The
address type field contains the address class, using the DNS-based
namespace described for the SDES option above. If a host has several
network addresses (for example, for different network protocols), the
RNA option is to be repeated as often as needed. The receiver then
chooses the address appropriate for its needs.

The 'interval' field contains the number of seconds between QOS
packets, expressed as the exponent of a power of two. For example, a
value of 3 means that the source would like to receive
quality-of-service reports every 2 ** 3 = 8 seconds. To avoid
synchronization between receivers, a receiver should space QOS reports
randomly between one half and twice the interval requested.
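
The spacing rule for QOS reports can be sketched as follows in Python. This is an illustration under stated assumptions rather than a prescribed algorithm: the draft does not name a distribution, so a uniform draw between one half and twice the requested interval is assumed here, the function name is hypothetical, and the value 255 is treated as "send no QOS packets" as stated for the 'interval' field in this section.

```python
import random
from typing import Optional


def next_qos_delay(interval_exponent: int,
                   rng: Optional[random.Random] = None) -> Optional[float]:
    """Seconds to wait before the next QOS report, or None if the
    sender asked for no reports (interval field of 255)."""
    if interval_exponent == 255:
        return None
    source = rng if rng is not None else random
    requested = 2.0 ** interval_exponent          # e.g. 3 -> 8 seconds
    # Assumed: uniform spacing between half and twice the request.
    return source.uniform(requested / 2, 2 * requested)
```

Drawing a fresh delay before every report, rather than once per session, keeps receivers that started together from staying in step.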
The interval is advisory only and an application may choose to send
QOS reports at a different frequency. [This caveat is necessary as
keeping track of a different interval for each source may be unduly
burdensome.] A profile may specify a different algorithm. A value in
the 'interval' field of 255 decimal implies that no QOS packets should
be sent.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|     RNA     |    length     |    format     |   interval    |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |         address class         |       0       |       0       |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | network-address ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|     RNA     |  length = 3   |    format     |   interval    |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |       address class = 1       |       0       |       0       |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                         IPv4 address                          |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

RTA 37
The RTA (reverse transport address) indicates the transport selector
(e.g., port number) to be used for sending reverse control data. The
transport protocol field determines the interpretation of the
following octets, using the IP Protocol Numbers defined in the current
edition of the Assigned Numbers RFC. The figure shows the use of the
RTA option for the ST-II, TCP and UDP protocols. [The port numbers are
placed so that the second 32-bit word can be interpreted as the port
number, with the most-significant bits as zero.]

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|     RTA     |    length     |    format     | transport pro.|
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | transport-address (port number) ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|     RTA     |  length >= 2  |    format     | protocol = 5  |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |       0       |       0       |       0       |   SAP bytes   |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | padding : ST-II service access point ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|     RTA     |  length = 2   |    format     | protocol = 6  |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |       0       |       0       |        TCP port number        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|     RTA     |  length = 2   |    format     | protocol = 17 |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |       0       |       0       |        UDP port number        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

BYE 35
The BYE option indicates that a particular site is no longer active. A
bridge sends BYE options with a (non-zero) content source value. An
identifier value of zero indicates that the source indicated by the
synchronization source (SSRC) option and network address is no longer
active. If a bridge shuts down, it should first send BYE options for
all content sources it handles, followed by a BYE option with an
identifier value of zero. Each RTCP message can contain one or more
BYE messages. [Multiple identifiers in a single BYE option are not
allowed to avoid ambiguities between the special value of zero and any
necessary padding.]
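
The shutdown order for a bridge can be illustrated with the following Python sketch, which emits one BYE option per content source and then a BYE with identifier zero. It is a sketch under assumptions, not a prescribed procedure: the function names are hypothetical, the one-word layout (final bit, option type 35, length of 1, 16-bit identifier) follows the BYE diagram in this section, and placing all BYE options in a single packet with the final bit on the last one is an illustrative choice.

```python
import struct
from typing import Iterable

BYE = 35  # option type given for BYE in this document


def bye_option(identifier: int, final: bool = False) -> bytes:
    """One BYE option: F bit + 7-bit type, length of 1 word, 16-bit id."""
    first_octet = (0x80 if final else 0x00) | BYE
    return struct.pack("!BBH", first_octet, 1, identifier)


def bridge_shutdown_options(content_sources: Iterable[int]) -> bytes:
    """BYE for every (non-zero) content source, then BYE for the bridge."""
    identifiers = [c for c in content_sources if c != 0] + [0]
    last = len(identifiers) - 1
    return b"".join(bye_option(ident, final=(n == last))
                    for n, ident in enumerate(identifiers))
```

Each identifier gets its own option, in line with the rule that a single BYE option may not carry several identifiers.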

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|     BYE     |  length = 1   |   content source identifier   |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5  Security Considerations

RTP suffers from the same security deficiencies as the underlying
protocols, for example, the ability of an impostor to fake source or
destination network addresses. The usage of network addresses for
identification within the protocol (SDES option) allows impersonating
another site. Impersonation and denial-of-service attacks can be made
more difficult by providing digital signatures for all or parts of a
message.

IP multicast provides no direct means for a sender to know all the
receivers of the data sent. RTP options make it easy for all
participants in a conference to identify themselves; if deemed
important for a particular application, it is the responsibility of
the application writer to make listening without identification
difficult. It should be noted, however, that within an internet,
privacy of the payload can generally only be assured by encryption.

The TBD RTP options described in Section 2 allow the provision of the
following security services within this layer: TBD.

6  RTP over network and transport protocols

This section describes issues specific to carrying RTP packets over
particular network and transport protocols. Unless otherwise noted,
the mechanisms apply to both the forward (data) and reverse control
directions.

6.1  Defaults

The following rules apply unless superseded by protocol-specific
subsections in this section.
6.1.1 Framing

If RTP protocol data units (RPDUs), in both forward and reverse
directions, are carried over underlying protocols that provide the
abstraction of a continuous bit stream rather than messages, each RPDU
is prefixed by a 32-bit framing field containing the length of the RPDU
measured in octets, not including the framing field itself. If an RPDU
traverses a path over a mixture of octet-stream and message-oriented
protocols, each RTP-level bridge between these protocols is responsible
for adding and removing the framing field.

A profile may specify that framing is to be used even for protocols
that do provide framing, in order to allow carrying several RTP packets
in one underlying protocol data unit. [Carrying several RTP packets in
one network or transport packet reduces header overhead and may ease
synchronization between different streams.]

6.1.2 RTA option

Port numbers (or equivalent) are by default two octets long.

6.2 UDP

The format of the RTA option is shown below.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|     RTA     |  length = 2   |    format     | protocol = 17 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       0       |       0       |        UDP port number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

6.3 TCP

The format of the RTA option is shown below.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|     RTA     |  length = 2   |    format     | protocol = 6  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       0       |       0       |        TCP port number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

6.4 ST-II

The next protocol field (``NextPCol'', Section 4.2.2.10 in RFC-1190) is
used to distinguish two encapsulations of RTP over ST-II.
The first uses NextPCol value TBD and directly places the RTP packet
into the ST-II data area. The second uses NextPCol value TBD; in this
case, the RTP header is preceded by the 32-bit header shown below. The
byte count determines the number of bytes in the RTP header and payload
to be checksummed. The 16-bit checksum uses the TCP and UDP checksum
algorithm.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| count of bytes to be checked  |           check sum           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                       ... RTP header ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The format of the RTA option is shown below.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|     RTA     |  length = 2   |    format     | protocol = 5  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       0       |       0       | ST-II service access pt (SAP) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

A Implementation Notes

This section describes one possible implementation of the part of the
receiver that maps incoming RTP packets to sources.

The receiver maintains a table of all sources, content and
synchronization sources alike. Synchronization sources are stored with
a content source value of zero. When an RTP packet arrives, the
receiver determines its network source and port (from information
returned by the operating system), synchronization source (SSRC option)
and content source(s) (CSRC option).

To locate the table entry containing timing information, mapping from
content descriptor to actual encoding, etc., the receiver sets the
content source to zero and locates a table entry based on the triple
(network address and port, synchronization source identifier, 0).
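The table described above can be sketched in Python as a dictionary keyed on the triple. This is only one possible shape, not a prescribed implementation. The class and field names are hypothetical, the per-source state is a placeholder, and the network address is assumed to be a (host, port) tuple such as the one a Berkeley socket recvfrom() call returns.

```python
from dataclasses import dataclass


@dataclass
class SourceState:
    """Placeholder for per-source state (timing, encoding map, etc.)."""
    packets_seen: int = 0


class SourceTable:
    """Sources keyed on (network address and port, sync source, content source)."""

    def __init__(self):
        self._entries = {}

    def __len__(self):
        return len(self._entries)

    def lookup(self, net_addr, ssrc, csrc=0):
        """Return the entry for the triple, creating it on first use.

        A content source value of zero selects the synchronization
        source entry, as described in the text.
        """
        key = (net_addr, ssrc, csrc)
        if key not in self._entries:
            self._entries[key] = SourceState()
        return self._entries[key]

    def on_packet(self, net_addr, ssrc):
        """Update and return the synchronization source entry for a packet."""
        entry = self.lookup(net_addr, ssrc, 0)
        entry.packets_seen += 1
        return entry
```

For example, `SourceTable().on_packet(("192.0.2.1", 5004), 3)` creates and returns the entry stored under the triple (("192.0.2.1", 5004), 3, 0).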
The receiver identifies the contributors to the packet (for example,
the speaker who is heard in the packet) through the list of content
sources carried in the CSRC option. To locate the table entry, it
matches on the triple (network address and port, synchronization source
identifier, content source).

Note that since network addresses are only generated locally at the
receiver, the receiver can choose whatever format seems most
appropriate for matching. For example, a Berkeley Unix-based system may
use struct sockaddr data types if it expects network sources with
non-IP addresses.

Acknowledgments

This draft is based on discussion within the IETF audio-video transport
working group chaired by Stephen Casner. The current protocol has its
origins in the Network Voice Protocol and the Packet Video Protocol
(Danny Cohen and Randy Cole) and the protocol implemented by the 'vat'
application (Van Jacobson and Steve McCanne).

B Addresses of Authors

Stephen Casner
USC/Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292-6695
telephone: +1 310 822 1511 (extension 153)
electronic mail: casner@isi.edu

Henning Schulzrinne
AT&T Bell Laboratories
MH 2A244
600 Mountain Avenue
Murray Hill, NJ 07974
telephone: +1 908 582 2262
electronic mail: hgs@research.att.com

References

[1] J. Postel, ``Internet protocol,'' Network Working Group Request for
    Comments RFC 791, Information Sciences Institute, Sept. 1981.

[2] International Standards Organization, ``ISO/IEC DIS 10646-1:1993
    information technology -- universal multiple-octet coded character
    set (UCS) -- part I: Architecture and basic multilingual plane,''
    1993.

[3] The Unicode Consortium, The Unicode Standard. New York, New York:
    Addison-Wesley, 1991.

[4] D. L. Mills, ``Network time protocol (version 3) -- specification,
    implementation and analysis,'' Network Working Group Request for
    Comments RFC 1305, University of Delaware, Mar. 1992.

[5] P. Mockapetris, ``Domain names -- concepts and facilities,''
    Network Working Group Request for Comments RFC 1034, ISI, Nov.
    1987.

[6] P. Mockapetris, ``Domain names -- implementation and
    specification,'' Network Working Group Request for Comments RFC
    1035, ISI, Nov. 1987.