Internet Engineering Task Force      Audio-Video Transport Working Group
INTERNET-DRAFT                                            H. Schulzrinne
                                                  AT&T Bell Laboratories
                                                       December 15, 1992
                                                         Expires: 5/1/93


   Sample Profile for the Use of RTP for Audio and Video Conferences
                          with Minimal Control


Status of this Memo

This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute working
documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months.
Internet Drafts may be updated, replaced, or obsoleted by other
documents at any time. It is not appropriate to use Internet Drafts as
reference material or to cite them other than as a "working draft" or
"work in progress."

Please check the I-D abstract listing contained in each Internet Draft
directory to learn the current status of this or any other Internet
Draft.

Distribution of this document is unlimited.

Abstract

This note describes a profile for the use of the real-time transport
protocol (RTP) within audio and video multiparticipant conferences with
minimal control. It provides interpretations of generic fields within
the RTP specification suitable for audio and video conferences. In
particular, this document defines a set of default mappings from
content index to encodings.

1 Introduction

This profile defines aspects of RTP left unspecified in the RTP protocol
definition. It is intended for use within audio and video conferences
with minimal session control. In particular, no support for the
negotiation of parameters or admission control is provided. Other
profiles may make different choices for the items specified here.

The profile specifies the use of RTP over unicast and multicast UDP as
well as ST-II. For unicast UDP and ST-II, references to multicast
addresses are to be ignored.
Use of this profile is indicated by the use of a well-known port
number.

2 Multiplexing and Demultiplexing

Packets sharing the same multicast group address, the same destination
port number and the same flow value belong to the same conference.
Within a conference, a packet is mapped to a site (state) through its
synchronization address and network source port.

3 CDESC

The content field within the CDESC option describes the media encoding
used. The four octets of the encoding name contain one of the encodings
defined by the Internet Assigned Numbers Authority (IANA) or an
encoding agreed upon by mutual consent of all conference participants.
The names are defined in Figures 1 and 2 and encoded in US-ASCII. Case
is significant. If the name is shorter than four characters, it is
padded with one or more space characters (ASCII 32 decimal).

The encodings are identified as follows:

Bolt:  refers to the proprietary Bolter video coding algorithm.

dvc:   refers to the BBN video coding algorithm.

DVI:   refers to the Intel DVI/ADPCM audio encoding, specified in the
       "Recommended Practices for Enhancing Digital Audio
       Compatibility in Multimedia Systems", published by the
       Interactive Multimedia Association (IMA), Annapolis, MD.

1016:  refers to Federal Standard 1016, which uses code-excited linear
       prediction.

G721:  refers to the ADPCM encoding defined by CCITT Recommendation
       G.721 at a rate of 32 kb/s.

G723:  refers to the ADPCM encoding defined by CCITT Recommendation
       G.723 at a rate of 24 kb/s.

G722:  is defined in CCITT Recommendation G.722 and denotes a
       subband-coded ADPCM algorithm with an audio bandwidth of 7 kHz.

GSM:   denotes the European GSM 06.10 provisional standard for
       full-rate speech transcoding, prI-ETS 300 036, based on residual
       pulse excitation with long term prediction (RPE/LTP).

H261:  refers to CCITT Recommendation H.261, which defines a video
       codec based on discrete-cosine transforms.

nv:    refers to the Xerox PARC video coding algorithm.

PCMU:  is a subset of CCITT Recommendation G.711, referring to a mu-law
       companded PCM encoding.

PCMA:  is a subset of CCITT Recommendation G.711, referring to an A-law
       companded PCM encoding.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   CFDESC    |    length     |0|0|  content  |      MBZ      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      return port number       | clock quality |      MBZ      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       name of encoding                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   channels    |              sampling rate (Hz)               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
              ... encoding-specific parameters ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 1: CDESC for Audio

For audio encodings, the encoding name is followed by a channel count
field and a sampling rate field, the latter measured in samples per
second.(1) A channel count of zero is considered invalid. For video
encodings, a one-octet numeric version identifier further describes the
encoding.

------------------------------
1. Support for fractional samples per second was considered excessive,
   as the typical crystal accuracy of 100 ppm translates into about
   one Hz or more of sampling rate inaccuracy.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   CFDESC    |    length     |0|0|  content  |      MBZ      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      return port number       | clock quality |      MBZ      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       name of encoding                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    version    |         encoding-specific parameters          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
              ... encoding-specific parameters ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 2: CDESC for Video

4 Standard Encodings

Unless specified otherwise with a CDESC option, the mapping between the
content field in an RTP packet and encodings, sampling rates and
channel counts is specified by Tables 1 and 2. Values of 31 and below
cannot be redefined by CDESC options. In other words, only values of 32
and above are valid in the content field within a CDESC option. The
receiver is expected to discard RTP packets containing media data with
unknown content field values. Sites are expected to keep the mapping
between content and encoding constant, so that lost packets containing
CDESC options do not lead the receiver to misinterpret media data.

index   encoding   sampling rate   channels
        name       (kHz)
___________________________________________
0       PCMU       8               1
1       1016       8               1
2       G721       8               1
3       GSM        8               1
4       G723       8               1
5       DVI        8               1
6       L16        16              1
7       L16        44.1            2
___________________________________________

        Table 1: Default Audio Encodings

number   name
_____________
31       H261
30       Bolt
29       dvc
28       nv
_____________

Table 2: Default Video Encodings

5 Port Assignments and Miscellaneous

UDP port [TBD] is to be used as the destination for multicast real-time
data carried by RTP.
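
Once a packet has arrived on the destination port, the rules of
Sections 2, 3 and 4 determine how a receiver classifies it. The Python
sketch below is illustrative only and is not part of the profile. All
class and function names (Encoding, Site, Receiver, pad_name,
apply_cdesc, lookup, site_for) are hypothetical. It assumes that a
receiver silently ignores a CDESC option whose content value is 31 or
below, a behavior the text above does not spell out, and it represents
video entries with no sampling rate or channel count.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Encoding(NamedTuple):
    name: str                 # encoding name as listed in Section 3
    rate_hz: Optional[int]    # sampling rate in Hz; None for video
    channels: Optional[int]   # channel count; None for video


# Defaults transcribed from Tables 1 and 2 (kHz converted to Hz).
DEFAULTS: Dict[int, Encoding] = {
    0: Encoding("PCMU", 8000, 1),
    1: Encoding("1016", 8000, 1),
    2: Encoding("G721", 8000, 1),
    3: Encoding("GSM", 8000, 1),
    4: Encoding("G723", 8000, 1),
    5: Encoding("DVI", 8000, 1),
    6: Encoding("L16", 16000, 1),
    7: Encoding("L16", 44100, 2),
    28: Encoding("nv", None, None),
    29: Encoding("dvc", None, None),
    30: Encoding("Bolt", None, None),
    31: Encoding("H261", None, None),
}


def pad_name(name: str) -> bytes:
    """Encode a name as four US-ASCII octets, padded with spaces."""
    raw = name.encode("ascii")
    if not 1 <= len(raw) <= 4:
        raise ValueError("encoding name must be 1 to 4 characters")
    return raw.ljust(4, b" ")


class Site:
    """Per-site state: content values bound by CDESC options."""

    def __init__(self) -> None:
        self.bound: Dict[int, Encoding] = {}

    def apply_cdesc(self, content: int, enc: Encoding) -> bool:
        # Values of 31 and below cannot be redefined. Ignoring the
        # option in that case is an assumption of this sketch.
        if content < 32:
            return False
        self.bound[content] = enc
        return True

    def lookup(self, content: int) -> Optional[Encoding]:
        # None means the content value is unknown, so the caller is
        # expected to discard the packet.
        if content in self.bound:
            return self.bound[content]
        return DEFAULTS.get(content)


ConferenceKey = Tuple[str, int, int]   # (group address, dest port, flow)
SiteKey = Tuple[str, int]              # (sync address, source port)


class Receiver:
    def __init__(self) -> None:
        self.conferences: Dict[ConferenceKey, Dict[SiteKey, Site]] = {}

    def site_for(self, group: str, dst_port: int, flow: int,
                 sync_addr: str, src_port: int) -> Site:
        # Section 2: conference first, then site within the conference.
        sites = self.conferences.setdefault((group, dst_port, flow), {})
        return sites.setdefault((sync_addr, src_port), Site())
```

A receiver built this way would call site_for() for each arriving
packet and then lookup() on the packet's content value, dropping the
packet when the result is None. Keeping CDESC bindings per site rather
than per conference reflects the expectation that each site keeps its
own mapping constant.
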
Unicast connections may use the same UDP port as multicast data or a
set of mutually agreed-upon port numbers. ST-II connections use port
3456.

The framing field is to be used only when RTP protocol data units are
carried over a network or transport protocol that does not provide
framing (e.g., TCP).

6 Address of Author

Henning Schulzrinne
AT&T Bell Laboratories
MH 2A244
600 Mountain Avenue
Murray Hill, NJ 07974
telephone: 908 582-2262
electronic mail: hgs@research.att.com