Internet Engineering Task Force                    Avaro-France Telecom
Internet Draft                                               Basso-AT&T
                                                   Casner-Packet Design
                                                          Civanlar-AT&T
                                                        Gentric-Philips
                                                         Herpel-Thomson
                                                            Lim-mp4cast
                                                            Perkins-ISI
Document: draft-gentric-avt-rtp-mpeg4-00.txt              November 2000
                                                       Expires May 2000


                 RTP Payload Format for MPEG-4 Streams


                          Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts. Internet-
Drafts are draft documents valid for a maximum of six months and may be
updated, replaced, or obsoleted by other documents at any time. It is
inappropriate to use Internet- Drafts as reference material or to cite
them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.


                                Abstract

This document describes a payload format for transporting MPEG-4
encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for
the coding of natural and synthetic audio-visual data. Several services
provided by RTP are beneficial for MPEG-4 encoded data transport over
the Internet. Additionally, the use of RTP makes it possible to
synchronize MPEG-4 data with other real-time data types.

This specification is a product of the Audio/Video Transport working
group within the Internet Engineering Task Force and ISO/IEC MPEG-4 ad
hoc group on MPEG-4 over Internet. Comments are solicited and should be
addressed to the working group's mailing list at rem-conf@es.net and/or
the authors.


Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins               1

             RTP Payload Format for MPEG-4 Streams       November 2000


1. Introduction

MPEG-4 is a recent standard from ISO/IEC for the coding of natural and
synthetic audio-visual data in the form of audiovisual objects that are
arranged into an audiovisual scene by means of a scene description
[1][2][3][4]. This draft specifies an RTP [5] payload format for
transporting MPEG-4 encoded data streams.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [6].

The benefits of using RTP for MPEG-4 data stream transport include:

i. Ability to synchronize MPEG-4 streams with other RTP payloads

ii. Monitoring MPEG-4 delivery performance through RTCP

iii. Combining MPEG-4 and other real-time data streams received from
multiple end-systems into a set of consolidated streams through RTP
mixers

iv. Converting data types, etc. through the use of RTP translators.

1.1 Overview of MPEG-4 End-System Architecture

Fig. 1 below shows the general layered architecture of MPEG-4
terminals. The Compression Layer processes individual audio-visual
media streams. The MPEG-4 compression schemes are defined in the
ISO/IEC specifications 14496-2 [2] and 14496-3 [3]. The compression
schemes in MPEG-4 achieve efficient encoding over a bandwidth ranging
from several Kbps to many Mbps. The audio-visual content compressed by
this layer is organized into Elementary Streams (ESs). The MPEG-4
standard specifies MPEG-4 compliant streams. Within the constraint of
this compliance the compression layer is unaware of a specific delivery
technology, but it can be made to react to the characteristics of a
particular delivery layer such as the path-MTU or loss characteristics.
Also, some compressors can be designed to be delivery specific for
implementation efficiency.  In such cases the compressor may work in a
non-optimal fashion with delivery technologies that are different than
the one it is specifically designed to operate with.

The hierarchical relations, location and properties of ESs in a
presentation are described by a dynamic set of Object Descriptors
(ODs). Each OD groups one or more ES Descriptors referring to a single
content item (audio-visual object). Hence, multiple alternative or
hierarchical representations of each content item are possible.

ODs are themselves conveyed through one or more ESs. A complete set of
ODs can be seen as an MPEG-4 resource or session description at a
stream level. The resource description may itself be hierarchical, i.e.
an ES conveying an OD may describe other ESs conveying other ODs.


Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins               2

             RTP Payload Format for MPEG-4 Streams       November 2000


The session description is accompanied by a dynamic scene description,
Binary Format for Scene (BIFS), again conveyed through one or more ESs.
At this level, content is identified in terms of audio-visual objects.
The spatio-temporal location of each object is defined by BIFS. The
audio-visual content of those objects that are synthetic and static are
described by BIFS also. Natural and animated synthetic objects may
refer to an OD that points to one or more ESs that carry the coded
representation of the object or its animation data.

By conveying the session (or resource) description as well as the scene
(or content composition) description through their own ESs, it is made
possible to change portions of the content composition and the number
and properties of media streams that carry the audio-visual content
separately and dynamically at well known instants in time.

One or more initial Scene Description streams and the corresponding OD
stream has to be pointed to by an initial object descriptor (IOD). The
IOD needs to be made available to the receivers through some out-of-
band means that are not defined in this document.

A homogeneous encapsulation of ESs carrying media or control (ODs,
BIFS) data is defined by the Sync Layer (SL) that primarily provides
the synchronization between streams. The Compression Layer organizes
the ESs in Access Units (AU), the smallest elements that can be
attributed individual timestamps. Integer or fractional AUs are then
encapsulated in SL packets.  All consecutive data from one stream is
called an SL-packetized stream at this layer. The interface between the
compression layer and the SL is called the Elementary Stream Interface
(ESI). The ESI is informative.

The Delivery Layer in MPEG-4 consists of the Delivery Multimedia
Integration Framework defined in ISO/IEC 14496-6 [4]. This layer is
media unaware but delivery technology aware. It provides transparent
access to and delivery of content irrespective of the technologies
used.  The interface between the SL and DMIF is called the DMIF
Application Interface (DAI). It offers content location independent
procedures for establishing MPEG-4 sessions and access to transport
channels. The specification of this payload format is considered as a
part of the MPEG-4 Delivery Layer.

 media aware        +-----------------------------------------+
 delivery unaware   |           COMPRESSION LAYER             |
 14496-2 Visual     |streams from as low as Kbps to multi-Mbps|
 14496-3 Audio      +-----------------------------------------+
                                                           Elementary
                                                           Stream
 ==========================================================Interface
                                                                (ESI)
                   +-------------------------------------------+
 media and         |              SYNC LAYER                   |
 delivery unaware  | manages elementary streams, their synch-  |
 14496-1 Systems   | ronization and hierarchical relations     |
                   +-------------------------------------------+

Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins               3

             RTP Payload Format for MPEG-4 Streams       November 2000


                                                           DMIF
                                                           Application
===========================================================Interface
                                                                (DAI)
                   +-------------------------------------------+
 delivery aware    |               DELIVERY LAYER              |
 media  unaware    |provides transparent access to and delivery|
 14496-6 DMIF      | of content irrespective of delivery       |
                   |                technology                 |
                   +-------------------------------------------+

                 Figure 1: General MPEG-4 terminal architecture

1.2 MPEG-4 Elementary Stream Data Packetization

The ESs from the encoders are fed into the SL with indications of AU
boundaries, random access points, desired composition time and the
current time.

The Sync Layer fragments the ESs into SL packets, each containing a
header that encodes information conveyed through the ESI. If the AU is
larger than a SL packet, subsequent packets containing remaining parts
of the AU are generated with subset headers until the complete AU is
packetized.

The syntax of the Sync Layer is not fixed and can be adapted to the
needs of the stream to be transported. This includes the possibility to
select the presence or absence of individual syntax elements as well as
configuration of their length in bits. The configuration for each
individual stream is conveyed in a SLConfigDescriptor, which is an
integral part of the ES Descriptor for this stream.

2. Analysis of the alternatives for carrying MPEG-4 over IP

2.1 MPEG-4 over UDP

Considering that the MPEG-4 SL defines several transport related
functions such as timing, sequence numbering, etc., this seems to be
the most straightforward alternative for carrying MPEG-4 data over IP.
One group of problems with this approach, however, stems from the
monolithic architecture of MPEG-4. No other multimedia data stream
(including those carried with RTP) can be synchronized with MPEG-4 data
carried directly over UDP. Furthermore, the dynamic scene and session
control concepts can't be extended to non-MPEG-4 data.

Even if the coordination with non-MPEG-4 data is overlooked, carrying
MPEG-4 data over UDP has the following additional shortcomings:

i. Mechanisms need to be defined to protect sensitive parts of MPEG-4
data. Some of these (like FEC) are already defined for RTP.

ii. There is no defined technique for synchronizing MPEG-4 streams from
different servers in the variable delay environment of the Internet.

Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins               4

             RTP Payload Format for MPEG-4 Streams       November 2000


iii. MPEG-4 streams originating from two servers may collide (their
sources may become unresolvable at the destination) in a multicast
session.

iv. An MPEG-4 back channel needs to be defined for quality feedback
similar to that provided by RTCP.

v. RTP mixers and translators can't be used.

The back-channel problem may be alleviated by developing a reception
reporting protocol like RTCP. Such an effort may benefit from RTCP
design knowledge, but needs extensions.

2.2 RTP header followed by full MPEG-4 headers

This alternative may be implemented by using the send time or the
composition time coming from the reference clock as the RTP timestamp.
This way no new feedback protocol needs to be defined for MPEG-4's back
channel, but RTCP may not be sufficient for MPEG-4's feedback
requirements that are still in the definition stage. Additionally, due
to the duplication of header information, such as the sequence numbers
and time stamps, this alternative causes unnecessary increases in the
overhead. Scene description or dynamic session control can't be
extended to non-MPEG-4 streams also.

2.3 MPEG-4 ESs over RTP with individual payload types

This is the most suitable alternative for coordination with the
existing Internet multimedia transport techniques and does not use
MPEG-4 systems at all. Complete implementation of it requires
definition of potentially many payload types, as already proposed for
audio and video payloads [7], and might lead to constructing new
session and scene description mechanisms. Considering the size of the
work involved which essentially reconstructs MPEG-4 systems, this may
only be a long term alternative if no other solution can be found.

2.4 RTP header followed by a reduced SL header

The inefficiency of the approach described in 2.2 can be fixed by using
a reduced SL header that does not carry duplicate information following
the RTP header.

2.5 Recommendation

Based on the above analysis, the best compromise is to map the MPEG-4
SL packets onto RTP packets, such that the common pieces of the headers
reside in the RTP header that is followed by an optional reduced SL
header providing the MPEG-4 specific information. The details of this
payload format are described in the next section.

3. Payload Format


Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins               5

             RTP Payload Format for MPEG-4 Streams       November 2000


The RTP Payload consists of an integer number of (modified) SL packets.

Senders SHOULD use the single SL packet per RTP packet mode. This is
the RECOMMENDED usage because it allows leveraging optimally on the RTP
infrastructure.

Senders MAY put several SL packets in a RTP packet for applications
where delay is not critical running on networks that do not support RTP
header compression schemes such as CRTP or TCRTP.

Putting several SL packets in such a RTP packet SHALL exclusively be
used when each SL packets contains a complete Access Unit. I.e. this
scheme is intended for optional better efficiency when transporting
streams with small Access Unit sizes (much smaller than MTU).

RTP Packets SHOULD be sent in the decoding (MPEG-4 decodingTimeStamp)
order.

When transmitting multiple SL packets per RTP packet, SL packets MUST
be in decoding (MPEG-4 decodingTimeStamp) order inside the RTP packet.

The size of the SL packets -or the sum of the sizes in case of multiple
SL packets- SHOULD be adjusted such that the resulting RTP packet is
not larger than the path-MTU. To handle larger packets, this payload
format relies on lower layers for fragmentation, which may not be
desirable.

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         | RTP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:            contributing source (CSRC) identifiers             :
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|SL Packet Header (variable # of bytes)         |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               | RTP
|                                                               |
|       SL Packet Payload (byte aligned)                        |
Payload
|                                                               |
|               +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|               |SL Packet Header (variable # of bytes)         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
|                                                               |
|       SL Packet Payload (byte aligned)                        |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |

Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins               6

             RTP Payload Format for MPEG-4 Streams       November 2000


+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                  Figure 2 - An RTP packet for MPEG-4


3.1. SL Packet header structure

As explained above this payload uses modified SL packet headers
inside RTP packets. However it MUST be possible for receivers to
regenerate (even if only logically) a valid SL packetized stream
(see section 4).

There are 2 cases of usage for this payload.

Either a source of MPEG-4 content such as an encoder or a MPEG-4 file
format reader is used or an SL packetized stream source is used.

In the first case SL Packets are constructed specifically for
encapsulation in this RTP payload, logically however this is equivalent
to generating a SL packetized stream and then apply the modifications.
In this case the (logical) SL packetized stream MUST be generated so
that it is always possible to reconstruct exactly the original SL
packetized stream. This document specifies how to do that as well as
recommended practice for best performance.

In the second case all fields of the original SL Packet headers MUST
remain untouched with a number of exceptions corresponding to the above
motivated modifications. In this case it may happen for some SL
packetized streams that the SL packetized stream reconstructed by the
receiver is valid but not strictly identical to the original one,
unless the original SLConfigDescriptor has been transmitted "out of
band", which is out of the scope of this specification.

We will detail now the modifications of the SL packet header that MUST
be applied to a SL packetized stream before encapsulation in this RTP
payload format.

3.1.1 General rules

When generating SL packetized stream specifically for this format all
other fields in the SL packet headers that the RTP header does not
duplicate (including the decodingTimeStamp) is OPTIONAL.

If the resulting, smaller, SL packet header consumes a non-integer
number of bytes, zero padding bits MUST be inserted at the end of the
SL header to byte-align the SL packet payload. Similarily the SL packet
payload MUST be byte-aligned using zero padding bits.

3.1.2. Time Stamps transformation

The first SL packet of the RTP packet payload includes an SL packet
header without compositionTimeStamp fields since this is transported by
the RTP time stamp. Furthemore other MPEG-4 Time Stamps are encoded as
offsets (see below).

Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins               7

             RTP Payload Format for MPEG-4 Streams       November 2000


3.1.3. Indication of size

For efficiency SL packets do not carry their own size. This is not an
issue for RTP packets that contain a single SL Packet.

However, when multiple SL packets are put in a RTP packet the size of
each SL packet MUST be computable by the receiver. There are several
ways to do that, all require some MPEG-4 system knowledge. For that
reason only MPEG-4 SL aware devices may re-packetize at the SL packet
levels RTP streams with multiple SL packets per RTP packet defined by
this document.

a. For streams that have a constant Access Unit length (for example
some object types of MPEG-4 audio such as CELP) this length information
can be extracted using the MPEG-4 ObjectDescriptor framework and
therefore SHOULD NOT be transported anywhere in the stream.

b. In all other cases of multiple SL packets per RTP packet and
especially for streams that have variable Access Unit length:
b.1. the SLConfigDescriptor "AU_Length" field MUST indicate the length
in bits of the "accessUnitLength" SL packet header field.
b.2. and all SL packets MUST indicate the Access Unit length in bytes
using the "accessUnitLength" field of the SL packet header.

Since the receiver cannot know the original value of SLConfigDescriptor
(unless it is transported out of band, by some other means) this is a
case when the reconstructed SL packetized stream may not be identical
to the original one. Specifically the reconstructed SL stream is then
perfectly valid but has a slightly higher effective bit rate.

3.1.4. Indication of Access Unit boundaries

Since the conjunction of usage of the M bit (see below) and the rule
that multiple SL packets in a RTP packet MUST only consist of complete
Access Units transports the Access Unit boundaries information, the
presence of accessUnitStartFlag and accessUnitEndFlag in the SL Packet
header is redundant.

However accessUnitStartFlag is almost always required with this format:
- When a RTP packet transports a single SL packet, accessUnitStartFlag
is required to indicate a number of fields such as decodingTimeStamp
and randomAccessPointFlag.
- When a RTP packet transports multiple SL packets, accessUnitStartFlag
is required to indicate accessUnitLength.

Therefore it is RECOMMENDED that in SL packetized streams used for
transport with this format, the accessUnitEndFlag in SL Packet headers
SHOULD NOT be present.

3.1.5 SL packet duplication


Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins               8

             RTP Payload Format for MPEG-4 Streams       November 2000


Pre existing SL packetized streams may already contain duplicated SL
packets. Since such streams also already have packetSequenceNumber a
receiver will be able to reconstruct the correct order.

Since duplicated SL packets in SL packetized streams must be adjacent,
it is RECOMMENDED, when efficient, that RTP packets boundaries be
placed between duplicated SL packets. I.e. the presence of duplicated
SL packets in a SL packetized stream must be considered as a hint for
RTP packetization regarding this format.

3.1.6 Interleaving.

This format makes SL packet interleaving schemes possible by using the
optional packetSequenceNumber field.

However it raises issues about the subsequently required receiver
buffer size and some points remain to be solved about which
interleaving schemes should be allowed.

For that reason this SHOULD NOT be used until these issues are
clarified and documented.

3.2 RTP Header Fields Usage:

Payload Type (PT): The assignment of an RTP payload type for this new
packet format is outside the scope of this document, and will not be
specified here. It is expected that the RTP profile for a particular
class of applications will assign a payload type for this encoding, or
if that is not done then a payload type in the dynamic range shall be
chosen.

Marker (M) bit: Set to one to mark the last fragment (or only fragment)
of an AU. In case of multiple SL Packets per RTP packet, since each of
these SL Packets MUST carry a complete Access Unit the M bit MUST be
set to one.


Extension (X) bit: Defined by the RTP profile used.

Sequence Number: The RTP sequence number should be generated by the
sender with a constant random offset and does not have to be correlated
to any (optional) MPEG-4 SL sequence numbers. It is incremented by one
for each generated RTP packet with a single exception:

Duplicated RTP packets MUST have identical RTP sequence numbers.

As per rule of SL packetized streams structure "identical" refers to
all SL packet fields with the exception of the objectClockReference
field, which, if present, must be a valid update.


Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins               9

             RTP Payload Format for MPEG-4 Streams       November 2000


Timestamp: Set to the value in the compositionTimeStamp field of the
first SL packet, if present. If compositionTimeStamp has less than 32
bits length, the MSBs of timestamp MUST be set to zero.

Although it is available from the SL configuration data, the resolution
of the timestamp may need to be conveyed explicitly through some out-
of-band means to be used by network elements which are not MPEG-4
aware.

If compositionTimeStamp has more than 32 bits length, this payload
format cannot be used.

In all cases, the sender SHALL always make sure that RTP time stamps
are identical only for RTP packets transporting fragments of the same
Access Unit.

In case compositionTimeStamp is not present in the current SL packet,
but has been present in a previous SL packet, there are two cases:
either the reason is that this is the same Access Unit that has been
fragmented or these SL Packets are transporting different Access Units.

Senders may have various ways to resolve this issue, for example this
can be detected using accessUnitStartFlag and/or accessUnitEndFlag. If
this choice cannot be resolved (for example because the input SL Packet
stream does contain the necessary information) then the second case
MUST be assumed.

In the first case the same timestamp value MUST be taken as RTP
timestamp.

In the second case as well as when compositionTimeStamp is never
present in SL packets, the sender MUST calculate an appropriate
compositionTimeStamp that will be mapped to the RTP timestamp. It is
out of the scope of this document to describe how this can be done
since it may require overall application knowledge or in some cases
parsing of the SL Packet payload.

Since this operation may be complicated it is RECOMMENDED that when
pre-existing SL packetized streams are transmitted only streams that
have compositionTimeStamps for all Access Units SHOULD be used with
this payload format.

Furthermore since reconstruction by the receiver of decodingTimeStamps
is not possible with this payload format if they are not transmitted,
in the case of streams for which compositionTimeStamp and
decodingTimeStamp may be different:
- When SL Packets are constructed specifically for encapsulation in
this payload format decodingTimeStamps MUST be present for all Access
Units, for which it is not equal to compositionTimeStamp.
- When a pre-existing Sl packetized stream is transmitted it is
RECOMMENDED that only streams that have decodingTimeStamps for all
Access Units for which it is not equal to compositionTimeStamp SHOULD
be used.

Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              10

             RTP Payload Format for MPEG-4 Streams       November 2000


The compositionTimeStamp, if present, MUST be removed from the first SL
packet header by bit-shifting the subsequent header elements towards
the beginning of the SL packet header. When unpacking the RTP packet
this process can be reversed with the knowledge of the
SLConfigDescriptor and by evaluating the compositionTimeStampFlag.

When using Multiple SL Packets per RTP packet, the
compositionTimeStamp, if present, of all SL packets after the first one
MUST remain in the SL header. However these compositionTimeStamp MUST
be changed to encode increments computed by substracting the
compositionTimeStamp of the first SL Packet since a RTP network
component may change the RTP time stamps for resynchronization
purposes.

If decodingTimeStamps are present the same offset technique relative to
the RTP header timestamp MUST be used for all SL Packet headers, for
the same reason.

Since this offset computation may lead to negative values a complement
to 2 convention MUST be used.

Since these offsets typically require much less bits the sender MAY
change the SLConfigDescriptor timeStampLength field accordingly.

It is RECOMMENDED that for streams that have constant AU duration and
do not require decodingTimeStamps (for example audio) all SL Packet
headers in the RTP packets SHOULD NOT contain any time stamps since the
compositionTimeStamp can be reconstructed from the RTP time stamp and,
when necessary, the packetSequenceNumber. In that case the durationFlag
of SLConfigDescriptor SHOULD be set to 1 and the accessUnitDuration of
SLConfigDescriptor SHOULD be used to indicate the constant Access Unit
duration.

Timestamps are recommended to start at a random value for security
reasons [5, Section 5.1]. It is RECOMMENDED that this random offset
SHOULD be the same for all streams of the same MPEG-4 media session
since this helps a receiver reconstruct accurate time stamps and
therefore perform accurate synchronization and reconstruction of the SL
packetized streams.

SSRC: set as described in RFC1889 [5]. A mapping between the ES
identifiers (ESIDs) and SSRCs should be provided through out-of-band
means.

CC and CSRC fields are used as described in RFC 1889 [5].

RTCP SHOULD be used as defined in RFC 1889 [5].

RTP timestamps in RTCP SR packets: according to the RTP timing model,
the RTP timestamp that is carried into an RTCP SR packet is the same as
the compositionTimeStamp that would be applied to an RTP packet for
data that was sampled at the instant the SR packet is being generated

Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              11

             RTP Payload Format for MPEG-4 Streams       November 2000


and sent. The RTP timestamp value is calculated from the NTP timestamp
for the current time, which also goes in the RTCP SR packet. To perform
that calculation, an implementation needs to periodically establish a
correspondence between the CTS value of a data packet and the NTP time
at which that data was sampled.

4. SL packetized stream reconstruction

The MPEG-4 over IP framework [9] requires that the way a receiver can
reconstruct a valid SL packetized stream shall be documented, this is
the purpose of this section.

Since this format directly transports SL packets this reconstruction is
trivial with the following rules (Rx below).

R1: The SL packet header and SLConfigDescriptor SHALL remain exactly
the same as received with the following exceptions:

R1.1: All time stamps, if present, are restored from the offsets
relative to the RTP timestamp. If needed the SLConfigDescriptor shall
be modified to indicate the correct timeStampLength, the default value
being 32 bits.

R1.2: AU_length is set to zero in SLConfigDescriptor and the
accessUnitLength field is removed for all SL packets. (Unless the
original sender-side SLConfigDescriptor has been transmitted out of
band and indicates a non zero accessUnitLength).

4.1 Signaling losses

The usage of the RTP sequence number and RTP timestamp as specified
above makes sure that at the RTP level losses can always be detected by
the receiver.

Then there are two obvious ways of signaling losses in a SL packetized
stream:
- Using the packetSequenceNumber field.
- Using the compositionTimeStamp field.

However the original (sender side) SL packetized stream may have been
configured without any of the above methods for signaling losses.

In any case the receiver MUST provide a way to signal losses in the
reconstructed SL packetized stream.

R2: The receiver SHALL use one of the following options:

R2.1: The receiver SHOULD use the packetSequenceNumber field.
In order to do so the packetSeqNumLength field of SLConfigDescriptor
MUST be set to a suitable non zero length, the default value being 16
bits, and a suitable packetSequenceNumber MUST be used for each SL
packet.


Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              12

             RTP Payload Format for MPEG-4 Streams       November 2000


R2.2: The receiver MAY in addition or alternatively use the
compositionTimeStamp field. In order to do so the
useAccessUnitStartFlag and useTimeStampsFlag of SLConfigDescriptor
SHALL be set to 1, then for for each SL packet accessUnitStartFlag and
compositionTimeStampFlag SHALL be set to 1 and a suitable
compositionTimeStamp SHALL be computed.

This computation is trivial for RTP packets containing a single SL
packet.

For RTP packets containing multiple SL packets, the computation of the
compositionTimeStamp is trivial if the accessUnitDuration field of
SLConfigDescriptor is present. If not present it is out of the scope of
this document to describe how this can be done since it may require
overall application knowledge or in some cases parsing of the SL Packet
payload. For that reason R2.1 is the RECOMMENDED method.

Note that with this payload format the decodingTimeStamp field cannot
be reconstructed in the general case if it was not transported (as an
offset). However implementations MAY estimate suitable values.

5. Multiplexing

Since a typical MPEG-4 session may involve a large number of objects,
that may be as many as a few hundred, transporting each ES as an
individual RTP session may not always be practical. Allocating and
controlling hundreds of destination addresses for each MPEG-4 session
may pose insurmountable session administration problems.  The
input/output processing overhead at the end-points will be extremely
high also. Additionally, low delay transmission of low bitrate data
streams, e.g. facial animation parameters, results in extremely high
header overheads.

To solve these problems, MPEG-4 data transport requires a multiplexing
scheme that allows selective bundling of several ESs. This is beyond
the scope of the payload format defined here. MPEG-4's Flexmux
multiplexing scheme may be used for this purpose by defining an
additional RTP payload format for "multiplexed MPEG-4 streams." On the
other hand, considering that many other payload types may have similar
needs, a better approach may be to develop a generic RTP multiplexing
scheme usable for MPEG-4 data. The multiplexing scheme reported in [8]
may be a candidate for this approach.

For MPEG-4 applications, the multiplexing technique needs to address
the following requirements:

i. The ESs multiplexed in one stream can change frequently during a
session. Consequently, the coding type, individual packet size and
temporal relationships between the multiplexed data units must be
handled dynamically.


Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              13

             RTP Payload Format for MPEG-4 Streams       November 2000


ii. The multiplexing scheme should have a mechanism to determine the ES
identifier (ES_ID) for each of the multiplexed packets. ES_ID is not a
part of the SL header.

iii. In general, an SL packet does not contain information about its
size. The multiplexing scheme should be able to delineate the
multiplexed packets whose lengths may vary from a few bytes to close to
the path-MTU.

6. Security Considerations

RTP packets using the payload format defined in this specification are
subject to the security considerations discussed in the RTP
specification [5]. This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be
performed on the compressed data so there is no conflict between the
two operations. The packet processing complexity of this payload type
does not exhibit any significant non-uniformity in the receiver side to
cause a denial-of-service threat.

However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) to overload the receiver/decoder's buffers which
might compromise the functionality of the receiver or even crash it.
This is especially true for end-to-end systems like MPEG where the
buffer models are precisely defined.

MPEG-4 Systems supports stream types including commands that are
executed on the terminal like OD commands, BIFS commands, etc. and
programmatic content like MPEG-J (Java(TM) Byte Code) and ECMASCRIPT.
It is possible to use one or more of the above in a manner non-
compliant to MPEG to crash or temporarily make the receiver
unavailable.

Authentication mechanisms can be used to validate of the sender and the
data to prevent security problems due to non-compliant malignant MPEG-4
streams.

A security model is defined in MPEG-4 Systems streams carrying MPEG-J
access units which comprises Java(TM) classes and objects. MPEG-J
defines a set of Java APIs and a secure execution model.  MPEG-J
content can call this set of APIs and Java(TM) methods from a set of
Java packages supported in the receiver within the defined security
model. According to this security model, downloaded byte code is
forbidden to load libraries, define native methods, start programs,
read or write files, or read system properties.

Receivers can implement intelligent filters to validate the buffer
requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
ECMAScript) commands in the streams. However, this can increase the
complexity significantly.

7. Types, names and SDP

Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              14

             RTP Payload Format for MPEG-4 Streams       November 2000


The encoding name associated to this RTP payload format is "mpeg4-sl".

The media type may be any of:
- "application"
- "video"
- "audio"

"application" SHOULD be used for MPEG-4 system streams (ISO/IEC 14496-
1).

"video" SHOULD be used for MPEG-4 video streams (ISO/IEC 14496-2).

"audio" SHOULD be used for MPEG-4 audio streams (ISO/IEC 14496-3).

7.1. SDP file example

In the following is an example of SDP syntax for the description of a
session containing one MPEG-4 audio stream, one MPEG-4 video and one
MPEG-4 system stream, transported using this format.

o= ....
I= ....
c=IN IP4 123.234.71.112
m=video 1034 RTP/AVT 97
a=rtpmap:97 mpeg4-sl
m=audio 810  RTP/AVT 98
a=rtpmpa:98 mpeg4-sl
m=application 1234  RTP/AVT 99
a=rtpmap:99 mpeg4-sl

8. Examples of usage of this payload format

8.1 MPEG-4 Video

Let us consider the case of a 30 frames per second MPEG-4 video stream
which bit rate is high enough that Access Units have to be split in
several SL packets (typically above 300 kb/s).

Let us assume also that the video codec generates in that case Video
Packets suitable to fit in one SL packet i.e that the video codec is
MTU aware and the MTU is 1500 bytes. We assume furthermore that this
stream contains B frames and therefore that decodingTimeStamps are
required.

8.1.1 Typical SLConfigDescriptor for video streams

In this example the SLConfigDescriptor is:

class SLConfigDescriptor extends BaseDescriptor : bit(8)
tag=SLConfigDescrTag {
  bit(8) predefined;
  if (predefined==0) {

Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              15

             RTP Payload Format for MPEG-4 Streams       November 2000


    bit(1) useAccessUnitStartFlag; = 1
    bit(1) useAccessUnitEndFlag; = 0
    bit(1) useRandomAccessPointFlag; = 1
    bit(1) hasRandomAccessUnitsOnlyFlag; = 0
    bit(1) usePaddingFlag; = 0
    bit(1) useTimeStampsFlag; = 1
    bit(1) useIdleFlag; = 0
    bit(1) durationFlag; = 0
    bit(32) timeStampResolution; = 30
    bit(32) OCRResolution; = 0
    bit(8) timeStampLength;     // must be <= 64  = 5
    bit(8) OCRLength;           // must be <= 64 = 0
    bit(8) AU_Length;           // must be <= 32 = 0
    bit(8) instantBitrateLength; = 0
    bit(4) degradationPriorityLength; = 0
    bit(5) AU_seqNumLength; // must be <= 16 = 0
    bit(5) packetSeqNumLength; // must be <= 16 = 0
    bit(2) reserved=0b11;
  }
  if (durationFlag) {
    bit(32) timeScale; // NOT USED
    bit(16) accessUnitDuration;  // NOT USED
    bit(16) compositionUnitDuration;  // NOT USED
  }
  if (!useTimeStampsFlag) {
    bit(timeStampLength) startDecodingTimeStamp; = 0
    bit(timeStampLength) startCompositionTimeStamp; = 0
  }
}

Note the useRandomAccessPointFlag is set so that the
randomAccessPointFlag can indicate that the corresponding SL packet
contains a GOV and the first Video Packet of an Intra coded frame.

8.1.2 Typical SL packet header structure for video streams

With this configuration we can extrapolate the following SL packet
header structure:


aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
  if (SL.useAccessUnitStartFlag) bit(1) accessUnitStartFlag; // 1 bit
  if (SL.useAccessUnitEndFlag) bit(1) accessUnitEndFlag; // NOT USED
  if (SL.OCRLength>0) bit(1) OCRflag; // NOT USED
  if (SL.useIdleFlag) bit(1) idleFlag; // NOT USED
  if (SL.usePaddingFlag) bit(1) paddingFlag; // NOT USED
  if (paddingFlag) bit(3) paddingBits; // NOT USED
  if (!idleFlag && (!paddingFlag || paddingBits!=0)) {
    if (SL.packetSeqNumLength>0) bit(SL.packetSeqNumLength)
packetSequenceNumber; // NOT USED
    if (SL.degradationPriorityLength>0) bit(1) DegPrioflag; // NOT USED
    if (DegPrioflag) bit(SL.degradationPriorityLength)
degradationPriority; // NOT USED

Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              16

             RTP Payload Format for MPEG-4 Streams       November 2000


    if (OCRflag) bit(SL.OCRLength) objectClockReference; // NOT USED
    if (accessUnitStartFlag) {
      if (SL.useRandomAccessPointFlag) bit(1) randomAccessPointFlag; //
1 bit
      if (SL.AU_seqNumLength >0) bit(SL.AU_seqNumLength)
AU_sequenceNumber; // NOT USED
      if (SL.useTimeStampsFlag) {
        bit(1) decodingTimeStampFlag; // 1 bit
        bit(1) compositionTimeStampFlag; // 1 bit
      }
      if (SL.instantBitrateLength>0) bit(1) instantBitrateFlag; // NOT
USED
      if (decodingTimeStampFlag) bit(SL.timeStampLength)
decodingTimeStamp; // 5 bits, if present
      if (compositionTimeStampFlag) bit(SL.timeStampLength)
compositionTimeStamp; // NOT USED
      if (SL.AU_Length > 0) bit(SL.AU_Length) accessUnitLength; // NOT
USED
      if (instantBitrateFlag) bit(SL.instantBitrateLength)
instantBitrate; // NOT USED
    }
  }
}

Therefore we finally have the following header

aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
  bit(1) accessUnitStartFlag; // 1 bit = 0
}

or

aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
  bit(1) accessUnitStartFlag; // 1 bit = 1
  bit(1) randomAccessPointFlag; // 1 bit
  bit(1) decodingTimeStampFlag; // 1 bit
  bit(1) compositionTimeStampFlag; // 1 bit = 0
  bit(SL.timeStampLength) decodingTimeStamp; // 5 bits, if present
}


Note the compositionTimeStamp is not present since it would be
redundant with the RTP time stamp, therefore the value of
compositionTimeStampFlag is set to zero.


Which leads to the following cases:
- For SL packets of non-first fragment of frames (that do not have to
carry neither CTS nor DTS) the SL packet header is 1 bit (aligned to 1
byte).


Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              17

             RTP Payload Format for MPEG-4 Streams       November 2000


- For SL packets of start of frames that have CTS equal to DTS (that do
not have to carry neither CTS nor DTS) the SL packet header is 4 bits
(aligned to 1 byte).

- For SL packets of start of frames that have CTS not equal to DTS, the
SL packet header is 9 bits (aligned to 2 bytes).

Note that for streams that do not have B frames the SL packet header is
always 1 byte.

Also note that for very high bit rates the majority of RTP packets will
transport SL packets with only 1 significant bit in the header i.e.
accessUnitStartFlag set to 0.

8.1.3 Overhead estimation

In the worst case we have a RTP overhead of 40 + 2 bytes for 1400 bytes
of payload i.e. 3 % overhead.

8.1.4 Non SL aware receivers

Note that for streams without B frames a non SL-aware but MPEG-4 video
aware receiver would just have to skip the first byte after the RTP
header since the SL packet header is then always one byte.

For a MPEG-4 video aware receivers detection of video resynch markers
is another possibility. Note then that non-emulation of these markers
in the SL packet headers is not guaranteed but ambiguities should be
easy to lift.

8.2 MPEG-4 Audio

8.2.1 MPEG-4 Audio for low delay applications

For low delay applications we can assume a single SL packet per RTP
packet.

8.2.1.1. Typical SLConfigDescriptor for MPEG-4 Audio low delay
applications

In that case signaling the AU_length is not needed. Also since CTS=DTS
signaling of MPEG-4 time stamps is not needed either.

We also assume here an audio Object Type for which all Access Units are
Random Access Points, which is signaled using the
hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor.

In this example the SLConfigDescriptor can be:

class SLConfigDescriptor extends BaseDescriptor : bit(8)
tag=SLConfigDescrTag {
  bit(8) predefined;
  if (predefined==0) {

Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              18

             RTP Payload Format for MPEG-4 Streams       November 2000


    bit(1) useAccessUnitStartFlag; = 0
    bit(1) useAccessUnitEndFlag; = 0
    bit(1) useRandomAccessPointFlag; = 0
    bit(1) hasRandomAccessUnitsOnlyFlag; = 1
    bit(1) usePaddingFlag; = 0
    bit(1) useTimeStampsFlag; = 0
    bit(1) useIdleFlag; = 0
    bit(1) durationFlag; = 0
    bit(32) timeStampResolution; = 0
    bit(32) OCRResolution; = 0
    bit(8) timeStampLength;     // must be <= 64  = 0
    bit(8) OCRLength;           // must be <= 64 = 0
    bit(8) AU_Length;           // must be <= 32 = 0
    bit(8) instantBitrateLength; = 0
    bit(4) degradationPriorityLength; = 0
    bit(5) AU_seqNumLength; // must be <= 16 = 0
    bit(5) packetSeqNumLength; // must be <= 16 = 0
    bit(2) reserved=0b11;
  }
  if (durationFlag) {
    bit(32) timeScale; // NOT USED
    bit(16) accessUnitDuration;  // NOT USED
    bit(16) compositionUnitDuration;  // NOT USED
  }
  if (!useTimeStampsFlag) {
    bit(timeStampLength) startDecodingTimeStamp; = 0
    bit(timeStampLength) startCompositionTimeStamp; = 0
  }
}

8.2.1.2. Typical SL packet header for MPEG-4 Audio low delay
applications

With this configuration we can extrapolate the following SL packet
header structure:

aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
  if (SL.useAccessUnitStartFlag) bit(1) accessUnitStartFlag; // NOT
USED
  if (SL.useAccessUnitEndFlag) bit(1) accessUnitEndFlag; // NOT USED
  if (SL.OCRLength>0) bit(1) OCRflag; // NOT USED
  if (SL.useIdleFlag) bit(1) idleFlag; // NOT USED
  if (SL.usePaddingFlag) bit(1) paddingFlag; // NOT USED
  if (paddingFlag) bit(3) paddingBits; // NOT USED
  if (!idleFlag && (!paddingFlag || paddingBits!=0)) {
    if (SL.packetSeqNumLength>0) bit(SL.packetSeqNumLength)
packetSequenceNumber; // NOT USED
    if (SL.degradationPriorityLength>0) bit(1) DegPrioflag; // NOT USED
    if (DegPrioflag) bit(SL.degradationPriorityLength)
degradationPriority; // NOT USED
    if (OCRflag) bit(SL.OCRLength) objectClockReference; // NOT USED
    if (accessUnitStartFlag) {


Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              19

             RTP Payload Format for MPEG-4 Streams       November 2000


      if (SL.useRandomAccessPointFlag) bit(1) randomAccessPointFlag; //
NOT USED
      if (SL.AU_seqNumLength >0) bit(SL.AU_seqNumLength)
AU_sequenceNumber; // NOT USED
      if (SL.useTimeStampsFlag) {
        bit(1) decodingTimeStampFlag; // NOT USED
        bit(1) compositionTimeStampFlag; // NOT USED
       }
      if (SL.instantBitrateLength>0) bit(1) instantBitrateFlag; // NOT
USED
      if (decodingTimeStampFlag) bit(SL.timeStampLength)
decodingTimeStamp; // NOT USED
      if (compositionTimeStampFlag) bit(SL.timeStampLength)
compositionTimeStamp; // NOT USED
      if (SL.AU_Length > 0) bit(SL.AU_Length) 
        accessUnitLength; // NOT USED
      if (instantBitrateFlag) bit(SL.instantBitrateLength)
instantBitrate; // NOT USED
    }
  }
}

Therefore we finally have an empty (so called "null") SL header.

8.2.1.3. Overhead estimation for MPEG-4 Audio low delay applications

Depending on the actual MPEG-4 audio Object Type used the RTP overhead
(IP+UDP+RTP headers) can be very large since the SL packet payload can
be a few bytes or less.

8.2.2 MPEG-4 Audio for media delivery applications

For media delivery applications we can use multiple SL packets per RTP
packet since in that case delay is not a major concern.

8.2.2.1. Typical SLConfigDescriptor for MPEG-4 Audio media delivery
applications

In that case signaling the AU_length is required. Also since CTS=DTS
signaling of MPEG-4 time stamps is not needed.

In this example a typical SLConfigDescriptor is:

class SLConfigDescriptor extends BaseDescriptor : bit(8)
tag=SLConfigDescrTag {
  bit(8) predefined;
  if (predefined==0) {
    bit(1) useAccessUnitStartFlag; = 1
    bit(1) useAccessUnitEndFlag; = 0
    bit(1) useRandomAccessPointFlag; = 0
    bit(1) hasRandomAccessUnitsOnlyFlag; = 1
    bit(1) usePaddingFlag; = 0
    bit(1) useTimeStampsFlag; = 0

Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              20

             RTP Payload Format for MPEG-4 Streams       November 2000


    bit(1) useIdleFlag; = 0
    bit(1) durationFlag; = 0
    bit(32) timeStampResolution; = 0
    bit(32) OCRResolution; = 0
    bit(8) timeStampLength;     // must be <= 64  = 0
    bit(8) OCRLength;           // must be <= 64 = 0
    bit(8) AU_Length;           // must be <= 32 = 7
    bit(8) instantBitrateLength; = 0
    bit(4) degradationPriorityLength; = 0
    bit(5) AU_seqNumLength; // must be <= 16 = 0
    bit(5) packetSeqNumLength; // must be <= 16 = 0
    bit(2) reserved=0b11;
  }
  if (durationFlag) {
    bit(32) timeScale; // NOT USED
    bit(16) accessUnitDuration;  // NOT USED
    bit(16) compositionUnitDuration;  // NOT USED
  }
  if (!useTimeStampsFlag) {
    bit(timeStampLength) startDecodingTimeStamp; = 0
    bit(timeStampLength) startCompositionTimeStamp; = 0
  }
}


8.2.2.2. Typical SL packet header for MPEG-4 Audio media delivery
applications

With this configuration we can extrapolate the following SL packet
header structure:

aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
  if (SL.useAccessUnitStartFlag) bit(1) accessUnitStartFlag; // = 1
  if (SL.useAccessUnitEndFlag) bit(1) accessUnitEndFlag; // NOT USED
  if (SL.OCRLength>0) bit(1) OCRflag; // NOT USED
  if (SL.useIdleFlag) bit(1) idleFlag; // NOT USED
  if (SL.usePaddingFlag) bit(1) paddingFlag; // NOT USED
  if (paddingFlag) bit(3) paddingBits; // NOT USED
  if (!idleFlag && (!paddingFlag || paddingBits!=0)) {
    if (SL.packetSeqNumLength>0) bit(SL.packetSeqNumLength)
packetSequenceNumber; // NOT USED
    if (SL.degradationPriorityLength>0) bit(1) DegPrioflag; // NOT USED
    if (DegPrioflag) bit(SL.degradationPriorityLength)
degradationPriority; // NOT USED
    if (OCRflag) bit(SL.OCRLength) objectClockReference; // NOT USED
    if (accessUnitStartFlag) {
      if (SL.useRandomAccessPointFlag) bit(1) randomAccessPointFlag; //
NOT USED
      if (SL.AU_seqNumLength >0) bit(SL.AU_seqNumLength)
AU_sequenceNumber; // NOT USED
      if (SL.useTimeStampsFlag) {
        bit(1) decodingTimeStampFlag; // NOT USED
        bit(1) compositionTimeStampFlag; // NOT USED

Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              21

             RTP Payload Format for MPEG-4 Streams       November 2000


      }
      if (SL.instantBitrateLength>0) bit(1) instantBitrateFlag; // NOT
USED
      if (decodingTimeStampFlag) bit(SL.timeStampLength)
decodingTimeStamp; // NOT USED
      if (compositionTimeStampFlag) bit(SL.timeStampLength)
compositionTimeStamp; // NOT USED
      if (SL.AU_Length > 0) bit(SL.AU_Length) accessUnitLength; // 7
bits
      if (instantBitrateFlag) bit(SL.instantBitrateLength)
instantBitrate; // NOT USED
    }
  }
}


The result is:

aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
  bit(1) accessUnitStartFlag; // 1 bit
  bit(SL.AU_Length) accessUnitLength; // 7 bits
}

Note that for very high bit rate MPEG-4 audio the Access Unit length
may become more than 127 bytes, in that case AU_Length can be set to 15
bits in SLConfigDescriptor, since the SL packet header length would
jump to 2 bytes anyway because of the byte alignment.

8.2.2.3. Overhead estimation for MPEG-4 Audio media delivery
applications

The resulting overhead can be computed as follows:

At bit rate (BR) we compute the average Access Unit size (AvS) in bytes
using the Access Unit duration (AuDur) in milliseconds as:
AvS = (int)(BR/8*AuDur/1000)

For example 8 kb/s CELP with AuDur=20 ms, which leads to AvS=20 bytes.
In the same context as before we can assume 70 Access Units per RTP
packets, therefore the overhead is 40 bytes for RTP+UDP+IP plus 70
bytes of SL headers i.e. the overhead is 8 %.

For high bit rate audio the number of SL packets per RTP packet will
decrease, leading to better overhead figures.

9. References

[1] ISO/IEC 14496-1:2000 MPEG-4 Systems October 2000

[2] ISO/IEC 14496-2:1999/Amd.1:2000(E) MPEG-4 Visual January 2000

[3] ISO/IEC 14496-3:1999/FDAM 1:20000 MPEG-4 Audio January 2000


Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              22

             RTP Payload Format for MPEG-4 Streams       November 2000


[4] ISO/IEC 14496-6 FDIS Delivery Multimedia Integration Framework,
November 1998.

[5] Schulzrinne, Casner, Frederick, Jacobson RTP: A Transport Protocol
for Real Time Applications  RFC 1889, Internet Engineering Task Force,
January 1996.

[6] S. Bradner, Key words for use in RFCs to Indicate Requirement
Levels, RFC 2119, March 1997.

[7] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
payload format for MPEG-4 Audio/Visual streams, work in progress,
draft-ietf-avt-rtp-mpeg4-es-05.txt, September 2000.

[8] B. Thompson, T. Koren, D. Wing, Tunneling multiplexed Compressed
RTP ("TCRTP"), work in progress, draft-ietf-avt-tcrtp-01.txt, July
2000.

[9] D. Singer, Y Lim, A Framework for the delivery of MPEG-4 over IP-
based Protocols, work in progress, draft-singer-mpeg4-ip-01.txt,October
2000.

10. Authors' Addresses

Olivier Avaro
France Telecom
e-mail: olivier.avaro@francetelecom.fr

Andrea Basso
AT&T Labs - Research
100 Schultz Drive
Red Bank, NJ 07701
USA
e-mail: basso@research.att.com

Stephen L. Casner
Packet Design, Inc.
66 Willow Place
Menlo Park, CA 94025
USA
casner@acm.org

M. Reha Civanlar
AT&T Labs - Research
100 Schultz Drive
Red Bank, NJ 07701
USA
e-mail: civanlar@research.att.com

Philippe Gentric
Philips Digital Networks
22 Avenue Descartes
94453 Limeil-Brevannes CEDEX

Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              23

             RTP Payload Format for MPEG-4 Streams       November 2000


France
e-mail: philippe.gentric@philips.com

Carsten Herpel
THOMSON multimedia
Karl-Wiechert-Allee 74
30625 Hannover
Germany
e-mail: herpelc@thmulti.com

Young-kwon Lim
mp4cast (MPEG-4 Internet Broadcasting Solution Consortium)
1001-1 Daechi-Dong Gangnam-Gu
Seoul, 305-333,
Korea
e-mail : young@techway.co.kr


Colin Perkins
USC Information Sciences Institute
4350 N. Fairfax Drive #620
Arlington, VA 22203
USA
e-mail: csp@isi.edu


Avaro/Basso/Casner/Civanlar/Gentric/Herpel/Lim/Perkins              24