INTERNET-DRAFT                                          18 November 1998

                                                           Colin Perkins
                                               University College London


                RTP Payload format for Interleaved Media
                   draft-ietf-avt-interleaving-00.txt


Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress.''

To view the entire list of current Internet-Drafts, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe),
ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim),
ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Comments are solicited and should be addressed to the author and/or the
AVT working group's mailing list at rem-conf@es.net.

Abstract

This memo defines an interleaving scheme for RTP streams. This scheme
is derived from the RTP payload format for redundant audio data [3] and
hence is targeted primarily at streamed audio, although it may be of
use in other scenarios.

1 Introduction

The need for loss resilient transport of media streams within RTP has
been recognised for a number of years, and a number of channel coding
schemes capable of providing such transport have been proposed [3-5].
These channel coding schemes have, to date, focused on the addition of
FEC data to media streams; however, FEC schemes are not the only form
of error resilience which may be employed [2].
This memo focuses on a transport mechanism for interleaved media,
providing an alternative which is of use when bandwidth efficiency is
required and latency is not an issue.

2 Discussion

When the codec frame size is smaller than the packet size, and
end-to-end delay is unimportant, interleaving is a useful technique for
reducing the effects of packet loss. Frames are resequenced before
transmission, so that originally adjacent frames are separated by a
guaranteed distance in the transmitted stream and returned to their
original order at the receiver. Interleaving disperses the effect of
packet losses. If, for example, units are 20ms in length and packets
80ms (i.e., 4 units per packet), then the first packet could contain
units 1, 5, 9, 13; the second packet would contain units 2, 6, 10, 14;
and so on.

It can be seen that the loss of a single packet from an interleaved
stream results in multiple small gaps in the reconstructed stream, as
opposed to the single large gap which would occur in a non-interleaved
stream. Note that the size of each gap depends on the degree of
interleaving used, and can be made arbitrarily small at the expense of
additional latency. In many cases it is easier to reconstruct or repair
a stream with such loss patterns, although this is clearly media and
codec dependent.

The obvious disadvantage of interleaving is that it increases latency.
This limits the use of this technique for interactive applications,
although it performs well for non-interactive use. The major advantage
of interleaving is that it provides increased error resilience yet does
not increase the bandwidth requirements of a stream.

The interleaving process takes a series of frames produced by a media
codec, and reorders them before packetisation. An example is
illustrated in figure 1.
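
As a rough illustration of the reordering described above, the following
Python sketch reproduces the 16-unit, 4-units-per-packet example. It is
not part of the payload format, which leaves the interleaving function
to the encoder; the names interleave and deinterleave, and the choice of
a simple block interleaver, are assumptions made for this example only.

```python
def interleave(frames, units_per_packet):
    """Split one block of frames into packets so that frames which were
    adjacent in the original stream end up in different packets."""
    if units_per_packet <= 0 or len(frames) % units_per_packet != 0:
        raise ValueError("block length must be a multiple of units_per_packet")
    n_packets = len(frames) // units_per_packet
    # Packet p takes every n_packets-th frame, starting at frame p.
    return [list(frames[p::n_packets]) for p in range(n_packets)]


def deinterleave(packets):
    """Restore the original order when every packet of the block arrived."""
    return [frame for group in zip(*packets) for frame in group]


frames = list(range(1, 17))          # units 1..16, 20ms each
packets = interleave(frames, 4)      # 4 units (80ms) per packet
print(packets)
# [[1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15], [4, 8, 12, 16]]

# Losing the second packet leaves four isolated single-unit gaps.
survivors = [f for i, p in enumerate(packets) if i != 1 for f in p]
print(sorted(set(frames) - set(survivors)))
# [2, 6, 10, 14]
```

With this block interleaver the sender must buffer all 16 units before
the first packet can be sent, which is the latency cost noted above.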
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16|        Initial
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+           |
                                                            |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+           V
| 1| 5| 9|13| 2| 6|10|14| 3| 7|11|15| 4| 8|12|16|        Reorder
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+           |
                                                            |
+--+--+--+--+ +--+--+--+--+ +--+--+--+--+ +--+--+--+--+     V
| 1| 5| 9|13| | 2| 6|10|14| | 3| 7|11|15| | 4| 8|12|16|  Packetise
+--+--+--+--+ +--+--+--+--+ +--+--+--+--+ +--+--+--+--+

                 Figure 1: The interleaving process

In order to reconstruct an interleaved stream, the receiver must
determine the original order of the frames which arrive. This
information can be communicated explicitly, by timestamping each frame,
or implicitly, by informing the receiver of the interleaving function
by non-RTP means.

It is more bandwidth efficient to transport this information
implicitly, since this allows frames to be packed into RTP packets with
no additional headers.

The use of explicit timestamps on each frame allows the decoder to be
unaware of the interleaving function being used, and allows for a
common decoder for both redundant and interleaved media (see section
5). Use of a common payload format also allows the codec to change
transparently, since the payload type of each frame is conveyed.

It is our belief that the benefits of a common decoder model outweigh
the bandwidth overhead incurred, hence this document defines a payload
format with explicit timestamps on each frame.

3 Payload format definition

The payload format for redundant audio data [3] provides an efficient
means by which multiple frames of audio data may be combined within a
single packet. Whilst that payload format was defined to allow
transport of media-specific FEC data, it is also possible to use it to
convey interleaved data.

Interleaved frames are packed into an RTP packet using the same payload
format as redundant frames.
Each frame is sent once only, with the timestamp offset fields in the
payload header used to indicate the ordering of interleaved frames.
Frames MUST be packed into packets such that the frame with the
earliest timestamp takes the place of the primary encoding, with the
other frames taking the place of the redundant encodings. This is
because the timestamp offset field in the payload header is unsigned
and gives the delay relative to the primary encoding.

The interleaving function to be used is a function of the encoder only
and is not defined here. The decoder does not need to be aware of the
interleaving function.

The assignment of an RTP payload type for this new packet format is not
necessary; rather, the payload type for redundant audio data is used.
It is expected that the RTP profile for a particular class of
applications will assign a payload type for that encoding or, if that
is not done, a payload type in the dynamic range shall be chosen.

4 Example Packet

Assume the interleaving function illustrated in figure 1, using the GSM
codec with 20ms frames. The format of the packets would be as
illustrated in figure 2.
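
The header values for this example can be worked out with a short
Python sketch. It assumes the block header layout of the redundant
audio payload format [3] (a 1-bit flag, 7-bit block payload type,
14-bit timestamp offset and 10-bit block length, with a 1-octet header
for the primary encoding), an 8000Hz GSM clock so that a 20ms frame
spans 160 timestamp units, and that data blocks follow in the same
order as their headers. The function name build_payload and the
constants are hypothetical, and timestamp wrap-around is ignored.

```python
import struct

GSM_PT = 3              # static RTP payload type for GSM
GSM_FRAME_BYTES = 33    # one 20ms GSM frame
SAMPLES_PER_FRAME = 160 # 20ms at 8000Hz (assumed clock rate)


def build_payload(frames, block_pt=GSM_PT):
    """frames: list of (timestamp, data) pairs for one packet.

    Returns (rtp_timestamp, payload). The frame with the earliest
    timestamp takes the place of the primary encoding; every other
    frame carries its unsigned delay relative to that frame.
    """
    ordered = sorted(frames, key=lambda f: f[0], reverse=True)
    primary_ts, primary_data = ordered[-1]
    headers, body = b"", b""
    for ts, data in ordered[:-1]:
        offset = ts - primary_ts
        if offset >= 1 << 14 or len(data) >= 1 << 10:
            raise ValueError("offset or block length does not fit its field")
        # F=1 | block PT (7 bits) | offset (14 bits) | length (10 bits)
        word = (1 << 31) | (block_pt << 24) | (offset << 10) | len(data)
        headers += struct.pack("!I", word)
        body += data
    headers += struct.pack("!B", block_pt)   # F=0 | block PT, primary block
    return primary_ts, headers + body + primary_data


# First packet of figure 1: units 1, 5, 9 and 13 (zero-based 0, 4, 8, 12).
frames = [(i * SAMPLES_PER_FRAME, bytes(GSM_FRAME_BYTES)) for i in (0, 4, 8, 12)]
rtp_ts, payload = build_payload(frames)
offsets = [(struct.unpack("!I", payload[i:i + 4])[0] >> 10) & 0x3FFF
           for i in (0, 4, 8)]
print(rtp_ts, offsets, len(payload))
# 0 [1920, 1280, 640] 145
```

The 145 octets are three 4-octet block headers, one 1-octet primary
header and four 33-octet GSM frames.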
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC=0  |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  timestamp of initial frame                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1| block PT=3  |  timestamp offset (=1920) | block length (=33)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1| block PT=3  |  timestamp offset (=1280) | block length (=33)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1| block PT=3  |  timestamp offset (=640)  | block length (=33)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| block PT=3  |                                               |
+-+-+-+-+-+-+-+-+                                               +
|                                                               |
/              4 frames of GSM encoded data follow              /
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 2: Example interleaved packet

5 Interaction with redundant audio

Whilst the payload format defined in this memo is not the most
efficient possible in terms of bandwidth usage for an interleaved
stream, the reuse of the payload format for redundant audio data
provides a number of advantages which we now describe.

A decoder which can separate frames of data from interleaved/redundant
media streams and order them according to both timestamp and quality,
and which selects the frame with the highest quality for a particular
time interval, should be able to decode both interleaved and redundant
media streams with no change. This allows for dual usage: if
low-latency transmission is desired, and some bandwidth overhead is
acceptable, then the sender should choose redundant transmission. If
latency is not an issue, interleaving should be chosen.
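
The common decoder model described here might be sketched in Python as
follows. This is only an illustration: the function name select_frames
and the numeric quality rank attached to each frame are assumptions,
since this memo does not define how a decoder ranks encodings, and
timestamp wrap-around is ignored.

```python
def select_frames(received):
    """received: iterable of (timestamp, quality, data) tuples, in any
    arrival order. Keeps the highest-quality frame seen for each
    timestamp and returns (timestamp, data) pairs in playout order."""
    best = {}
    for ts, quality, data in received:
        if ts not in best or quality > best[ts][0]:
            best[ts] = (quality, data)
    return [(ts, best[ts][1]) for ts in sorted(best)]


# Interleaved stream: each frame arrives once, out of order.
interleaved = [(0, 1, "f1"), (640, 1, "f5"), (160, 1, "f2"), (800, 1, "f6")]
print(select_frames(interleaved))
# [(0, 'f1'), (160, 'f2'), (640, 'f5'), (800, 'f6')]

# Redundant stream: the frame at 160 arrives as a low-quality copy
# and as a primary encoding; the higher-ranked copy is kept.
redundant = [(160, 1, "f2-low"), (0, 2, "f1"), (160, 2, "f2")]
print(select_frames(redundant))
# [(0, 'f1'), (160, 'f2')]
```

Because selection depends only on timestamp and quality, the same
routine serves both stream types without knowing the interleaving
function.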
The decoder can render either stream with no change, resulting in a
system suitable for both interactive and non-interactive scenarios.

6 Security consideration

There are no additional security considerations beyond those noted for
RTP [7], the RTP profile for audio/video conferences [6] and the RTP
payload format for redundant audio [3].

7 Acknowledgements

The author wishes to thank Orion Hodson for his helpful comments.

8 Author's addresses

Colin Perkins
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
UK

Email: c.perkins@cs.ucl.c.uk

References

[1] S. Bradner. Key words for use in RFCs to indicate requirement
    levels. IETF Network Working Group, March 1997. RFC 2119.

[2] C. S. Perkins and O. Hodson. Options for repair of streaming
    media. IETF Audio/Video Transport Working Group, June 1998.
    RFC 2354.

[3] C. S. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley,
    J.-C. Bolot, A. Vega-Garcia, and S. Fosse-Parisis. RTP Payload for
    redundant audio data. IETF Audio/Video Transport Working Group,
    1997. RFC 2198.

[4] J. Rosenberg and H. Schulzrinne. An RTP payload format for generic
    forward error correction. IETF Audio/Video Transport Working
    Group, November 1998. draft-ietf-avt-fec-04.txt.

[5] J. Rosenberg and H. Schulzrinne. An RTP payload format for Reed
    Solomon codes. IETF Audio/Video Transport Working Group, November
    1998. draft-ietf-avt-reedsolomon-00.txt.

[6] H. Schulzrinne. RTP profile for audio and video conferences with
    minimal control. IETF Audio/Video Transport Working Group, January
    1996. RFC 1890.

[7] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A
    transport protocol for real-time applications. IETF Audio/Video
    Transport Working Group, January 1996. RFC 1889.