INTERNET-DRAFT                              18 November 1998


                                               Colin Perkins
                                   University College London


           RTP Payload format for Interleaved Media
              draft-ietf-avt-interleaving-00.txt


Status of this Memo


This document is an Internet-Draft.  Internet-Drafts are working documents
of the Internet Engineering Task Force (IETF), its areas, and its working
groups.  Note that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and
may be updated, replaced, or obsoleted by other documents at any time.  It
is inappropriate to use Internet-Drafts as reference material or to cite
them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe),
ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim),
ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Comments are solicited and should be addressed to the author and/or the
AVT working group's mailing list at rem-conf@es.net.


                         Abstract

    This memo defines an interleaving scheme for RTP streams. This scheme
    is derived from the RTP payload format for redundant audio data [3] and
    hence is targeted primarily at streamed audio, although it may be of
    use in other scenarios.


1  Introduction

The need for loss-resilient transport of media streams within RTP
has been recognised for a number of years, and several channel
coding schemes capable of providing such transport have been proposed
[3--5].  To date, these channel coding schemes have focused on the
addition of FEC data to media streams.  However, FEC schemes are not
the only form of error resilience which may be employed [2].  This
memo focuses on a transport mechanism for interleaved media, providing
an alternative which is of use when bandwidth efficiency is required
and latency is not an issue.


2  Discussion

When the codec frame size is smaller than the packet size, and end-to-end
delay is unimportant, interleaving is a useful technique for reducing the
effects of packet loss.  Frames are resequenced before transmission, so
that originally adjacent frames are separated by a guaranteed distance in
the transmitted stream and returned to their original order at the
receiver.  Interleaving disperses the effect of packet losses.  If, for
example, frames are 20ms in length and packets 80ms (i.e., 4 frames per
packet), then the first packet could contain frames 1, 5, 9, 13; the second
packet would contain frames 2, 6, 10, 14; and so on.  It can be seen that
the loss of a single packet from an interleaved stream results in multiple
small gaps in the reconstructed stream, as opposed to the single large gap
which would occur in a non-interleaved stream.  Note that the size of the
gap is dependent on the degree of interleaving used, and can be made
arbitrarily small at the expense of additional latency.  In many cases it
is easier to reconstruct or repair a stream with such loss patterns,
although this is clearly media and codec dependent.
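
The difference in loss patterns can be illustrated with a short Python
sketch.  The helper names (packetise, longest_gap) and the 16 frame block
are assumptions chosen to match the example above; they are not part of
this memo.

```python
def packetise(num_frames, frames_per_packet, interleave):
    """Return a list of packets, each a list of 1-based frame numbers."""
    num_packets = num_frames // frames_per_packet
    if interleave:
        # Packet p carries frames p, p + N, p + 2N, ... (N = num_packets).
        return [list(range(p, num_frames + 1, num_packets))
                for p in range(1, num_packets + 1)]
    # Without interleaving each packet carries consecutive frames.
    return [list(range(p * frames_per_packet + 1,
                       (p + 1) * frames_per_packet + 1))
            for p in range(num_packets)]


def longest_gap(lost):
    """Length of the longest run of consecutive frame numbers in lost."""
    best = run = 0
    previous = None
    for frame in sorted(lost):
        if previous is not None and frame == previous + 1:
            run += 1
        else:
            run = 1
        best = max(best, run)
        previous = frame
    return best


# Drop the second packet from each stream and compare the damage.
plain = packetise(16, 4, interleave=False)
mixed = packetise(16, 4, interleave=True)
print(plain[1], longest_gap(plain[1]))  # one gap of 4 frames (80ms)
print(mixed[1], longest_gap(mixed[1]))  # four gaps of 1 frame (20ms)
```

In both cases four frames are lost; only their distribution in the
reconstructed stream differs.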

The obvious disadvantage of interleaving is that it increases latency.
This limits the use of this technique for interactive applications,
although it performs well for non-interactive use.  The major advantage
of interleaving is that it provides increased error resilience yet
does not increase the bandwidth requirements of a stream.
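
As a rough illustration of the latency cost, the following Python sketch
estimates how long a sender must wait before the first packet of a block
is complete.  It assumes a block interleaver like the one in the 20ms,
4 frames per packet example above (packet p carries frames p, p+N, p+2N,
and so on); the function name and parameters are hypothetical, and
receiver-side buffering is not modelled.

```python
def first_packet_delay_ms(frame_ms, frames_per_packet, packets_per_block,
                          interleave):
    """Time from the start of a block until its first packet is complete."""
    if interleave:
        # The first packet holds frames 1, 1 + N, ..., 1 + (k - 1) * N, so
        # it cannot be sent until the last of those has been produced.
        last_frame = 1 + (frames_per_packet - 1) * packets_per_block
    else:
        # The first packet holds frames 1 .. k.
        last_frame = frames_per_packet
    return last_frame * frame_ms


plain = first_packet_delay_ms(20, 4, 4, interleave=False)
mixed = first_packet_delay_ms(20, 4, 4, interleave=True)
print(plain, mixed, mixed - plain)
```

Under these assumptions the first interleaved packet has to wait for
frame 13, rather than frame 4, before it can be sent.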

The interleaving process takes a series of frames produced by a media
codec, and reorders them before packetisation.  An example is illustrated
in figure 1.

   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16|     Initial
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+       |
                                                           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+       V
   | 1| 5| 9|13| 2| 6|10|14| 3| 7|11|15| 4| 8|12|16|     Reorder
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+       |
                                                           |
+--+--+--+--+ +--+--+--+--+ +--+--+--+--+ +--+--+--+--+    V
| 1| 5| 9|13| | 2| 6|10|14| | 3| 7|11|15| | 4| 8|12|16| Packetise
+--+--+--+--+ +--+--+--+--+ +--+--+--+--+ +--+--+--+--+


             Figure 1:  The interleaving process
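
The reordering in figure 1 corresponds to a simple block interleaver.
The following Python sketch shows one way to express it; this memo does
not mandate any particular interleaving function, and the names
interleave and packetise are illustrative only.

```python
def interleave(frames, frames_per_packet):
    """Reorder frames by reading a row-major matrix column by column."""
    if len(frames) % frames_per_packet != 0:
        raise ValueError("block must hold a whole number of packets")
    stride = len(frames) // frames_per_packet   # number of packets
    return [frames[start + step * stride]
            for start in range(stride)
            for step in range(frames_per_packet)]


def packetise(frames, frames_per_packet):
    """Split an (already reordered) frame list into packets."""
    return [frames[i:i + frames_per_packet]
            for i in range(0, len(frames), frames_per_packet)]


initial = list(range(1, 17))
reordered = interleave(initial, 4)
packets = packetise(reordered, 4)
print(reordered)
print(packets)
```

With 16 frames and 4 frames per packet, this reproduces the "Reorder"
and "Packetise" rows of figure 1.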

In order to reconstruct an interleaved stream, the receiver must
determine the original order of the frames it receives.  This information
can be communicated explicitly, by timestamping each frame, or implicitly,
by informing the receiver of the interleaving function by non-RTP
means.

It is more bandwidth efficient to convey this information implicitly,
since this allows frames to be packed into RTP packets with no additional
headers.

The use of explicit timestamps on each frame allows the decoder to be
unaware of the interleaving function being used, and allows for a common
decoder for both redundant and interleaved media (see section 5).  Use of a
common payload format also allows the codec to change transparently,
since the payload type of each frame is conveyed.

It is our belief that the benefits of a common decoder model outweigh the
bandwidth overhead incurred; hence this document defines a payload format
with explicit timestamps on each frame.
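
The size of this overhead can be estimated from the header layout of
the payload format for redundant audio data [3], in which every block
except the last carries a 4 octet header and the last block carries a
1 octet header.  The Python sketch below applies this to four 33 octet
GSM frames per packet; the function name is an assumption, and the
fixed RTP header is not counted since it is present in either case.

```python
def header_overhead(frames_per_packet, frame_octets):
    """Return (header octets, fraction of media octets) for one packet."""
    # Every frame but the last needs a 4 octet header (F bit, block PT,
    # timestamp offset, block length); the last needs a 1 octet header.
    header = 4 * (frames_per_packet - 1) + 1
    media = frames_per_packet * frame_octets
    return header, header / media


octets, fraction = header_overhead(4, 33)
print(octets, fraction)
```

For this case the explicit headers add 13 octets to 132 octets of media
data, or roughly one tenth.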


3  Payload format definition

The payload format for redundant audio data [3] provides an efficient
means by which multiple frames of audio data may be combined within
a single packet.  Whilst that payload format was defined to allow
transport of media-specific FEC data, it is also possible to use
it to convey interleaved data.

Interleaved frames are packed into an RTP packet using the same payload
format as redundant frames.  Each frame is sent once only, with the
timestamp offset fields in the payload header used to indicate the
ordering of interleaved frames.

Frames MUST be packed into packets such that the frame with the earliest
timestamp takes the place of the primary encoding, with the other frames
taking the place of the redundant encodings.  This is because the timestamp
offset field in the payload header is unsigned and gives the delay relative
to the primary encoding.
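
A minimal Python sketch of this packing rule follows.  It takes the
text above at face value: the earliest frame acts as the primary
encoding and every other frame is described by its unsigned offset from
that frame.  The field widths (1 bit F, 7 bit block PT, 14 bit
timestamp offset, 10 bit block length) are those of [3]; build_payload
and the tuple layout are hypothetical.

```python
import struct


def build_payload(frames):
    """Pack (timestamp, payload_type, data) frames into one payload.

    Returns (rtp_timestamp, payload_bytes), where rtp_timestamp is the
    timestamp of the earliest frame.
    """
    ordered = sorted(frames, key=lambda f: f[0])
    primary, others = ordered[0], ordered[1:]
    headers, bodies = [], []
    # Emit the largest offset first, so that the primary comes last.
    for timestamp, pt, data in reversed(others):
        offset = timestamp - primary[0]
        if offset >= 1 << 14 or len(data) >= 1 << 10:
            raise ValueError("offset or length does not fit the header")
        # F=1 | 7 bit block PT | 14 bit offset | 10 bit block length
        word = (1 << 31) | ((pt & 0x7F) << 24) | (offset << 10) | len(data)
        headers.append(struct.pack("!I", word))
        bodies.append(data)
    # Final header: F=0 followed by the block PT only.
    headers.append(struct.pack("!B", primary[1] & 0x7F))
    bodies.append(primary[2])
    return primary[0], b"".join(headers + bodies)


# Four zero-filled 33 octet frames, 640 timestamp units apart.
frames = [(t, 3, bytes(33)) for t in (1280, 0, 1920, 640)]
rtp_timestamp, payload = build_payload(frames)
print(rtp_timestamp, len(payload), payload[:13].hex())
```

The check on the offset reflects the 14 bit field width, which bounds
how far apart the frames in one packet can be.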

The interleaving function to be used is a function of the encoder
only and is not defined here.  The decoder does not need to be aware
of the interleaving function.

The assignment of an RTP payload type for this new packet format is not
necessary; rather, the payload type for redundant audio data is used.  It
is expected that the RTP profile for a particular class of applications
will assign a payload type for that encoding or, if that is not done, a
payload type in the dynamic range shall be chosen.


4  Example packet

Assume the interleaving function illustrated in figure 1, using the
GSM codec with 20ms frames.  The format of the packets would be as
illustrated in figure 2.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC=0  |M|      PT     |           sequence number     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  timestamp  of initial frame                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1| block PT=3  |  timestamp offset (=1920) | block length (=33)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1| block PT=3  |  timestamp offset (=1280) | block length (=33)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1| block PT=3  |  timestamp offset (=640)  | block length (=33)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| block PT=3  |                                               |
+-+-+-+-+-+-+-+-+                                               +
|                                                               |
/                 4 frames of GSM encoded data follow           /
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


            Figure 2:  Example interleaved packet
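
A receiver can recover the block headers of figure 2 with a few shifts
and masks.  The Python sketch below parses the header layout defined
in [3]; parse_block_headers and header are hypothetical helpers, and
the sample payload is constructed by hand to mirror figure 2 with
zero-filled frame data.

```python
import struct


def parse_block_headers(payload):
    """Return ([(block_pt, timestamp_offset, block_length), ...], start).

    The final block has no offset or length fields; they are reported
    as 0 and None, and its data runs to the end of the payload.  start
    is the index of the first octet of frame data.
    """
    blocks, pos = [], 0
    while True:
        if payload[pos] & 0x80:                 # F=1: 4 octet header
            (word,) = struct.unpack_from("!I", payload, pos)
            blocks.append(((word >> 24) & 0x7F,     # block PT
                           (word >> 10) & 0x3FFF,   # timestamp offset
                           word & 0x3FF))           # block length
            pos += 4
        else:                                   # F=0: final header
            blocks.append((payload[pos] & 0x7F, 0, None))
            return blocks, pos + 1


def header(pt, offset, length):
    """Build one 4 octet block header (used only to make sample data)."""
    word = (1 << 31) | (pt << 24) | (offset << 10) | length
    return struct.pack("!I", word)


sample = (header(3, 1920, 33) + header(3, 1280, 33) + header(3, 640, 33)
          + bytes([3]) + bytes(4 * 33))
blocks, start = parse_block_headers(sample)
print(blocks, start)
```

A production parser would also need to guard against truncated or
malformed payloads, which this sketch does not attempt.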


5  Interaction with redundant audio

Whilst the payload format defined in this memo is not the most efficient
possible in terms of bandwidth usage for an interleaved stream, the
reuse of the payload format for redundant audio data provides a number
of advantages which we now describe.

A decoder which can separate frames of data from interleaved/redundant
media streams, order them according to both timestamp and quality,
and select the frame with the highest quality for a particular
time interval should be able to decode both interleaved and redundant
media streams with no change.
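
One way to picture such a decoder is as a buffer keyed by timestamp.
The Python sketch below is an assumption about how this might be
structured, not a description of any existing implementation.  Frames
are (timestamp, quality, data) tuples, where quality is a hypothetical
integer ranking (higher is better) assigned by the receiver, for
example ranking a primary encoding above a redundant one.  Timestamp
wrap-around is ignored.

```python
def select_frames(frames):
    """Keep the best-quality frame per timestamp, in playout order."""
    best = {}
    for timestamp, quality, data in frames:
        current = best.get(timestamp)
        if current is None or quality > current[0]:
            best[timestamp] = (quality, data)
    return [(ts, best[ts][1]) for ts in sorted(best)]


# Interleaved stream: every timestamp appears once, out of order.
interleaved = [(0, 1, "f1"), (640, 1, "f5"),
               (160, 1, "f2"), (800, 1, "f6")]
# Redundant stream: timestamp 160 arrives twice with different quality.
redundant = [(160, 0, "f2-low"), (320, 1, "f3"),
             (160, 1, "f2"), (480, 1, "f4")]

print(select_frames(interleaved))
print(select_frames(redundant))
```

The same function handles both inputs: for the interleaved stream it
only reorders, and for the redundant stream it also discards the lower
quality duplicate.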

This allows for dual usage:  if low-latency transmission is desired,
and some bandwidth overhead is acceptable, then the sender should
choose redundant transmission.  If latency is not an issue, interleaving
should be chosen.  The decoder can render either stream with no change,
resulting in a system suitable for both interactive and non-interactive
scenarios.


6  Security considerations

There are no additional security considerations beyond those noted
for RTP [7], the RTP profile for audio/video conferences [6] and
the RTP payload format for redundant audio [3].


7  Acknowledgements


The author wishes to thank Orion Hodson for his helpful comments.


8  Author's address

Colin Perkins
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
UK

Email:  c.perkins@cs.ucl.ac.uk


References

[1] S. Bradner. Key words for use in RFCs to indicate requirement levels.
    IETF Network Working Group, March 1997.  RFC2119.

[2] C. S. Perkins and O. Hodson. Options for repair of streaming media.
    IETF Audio/Video Transport Working Group, June 1998. RFC2354.

[3] C. S. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley,
    J.-C. Bolot, A. Vega-Garcia, and S. Fosse-Parisis. RTP Payload for
    redundant audio data. IETF Audio/Video Transport Working Group,
    September 1997. RFC2198.

[4] J. Rosenberg and H. Schulzrinne. An RTP payload format for generic
    forward error correction. IETF Audio/Video Transport Working Group,
    November 1998. draft-ietf-avt-fec-04.txt.

[5] J. Rosenberg and H. Schulzrinne. An RTP payload format for Reed Solomon
    codes. IETF Audio/Video Transport Working Group, November 1998.
    draft-ietf-avt-reedsolomon-00.txt.


[6] H. Schulzrinne. RTP profile for audio and video conferences with
    minimal control. IETF Audio/Video Transport Working Group, January
    1996.  RFC1890.

[7] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson.  RTP: A
    transport protocol for real-time applications. IETF Audio/Video
    Transport Working Group, January 1996. RFC1889.

