Network Working Group                                         S. Wenger
Internet-Draft                                               Y.-K. Wang
Intended status: Standards Track                                  Nokia
Expires: January 09, 2008                                    T. Schierl
                                                         Fraunhofer HHI
                                                          July 09, 2007


                   RTP Payload Format for SVC Video
                     draft-ietf-avt-rtp-svc-02.txt


Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 09, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).


Internet-Draft        RTP Payload Format for SVC Video       July 2007

Abstract

   This memo describes an RTP payload format for the scalable extension
   of the ITU-T Recommendation H.264 video codec, which is technically
   identical to ISO/IEC International Standard 14496-10.  The RTP
   payload format allows for packetization of one or more Network
   Abstraction Layer (NAL) units, produced by the video encoder, in
   each RTP payload.  The payload format has wide applicability,
   ranging from low bit-rate conversational use and Internet video
   streaming to high bit-rate entertainment-quality video.


Wenger, Wang, Schierl     Expires January 09, 2008          [page 2]


Table of Contents

   RTP Payload Format for SVC Video...................................1
   1.   Introduction..................................................5
   1.1.    SVC -- the scalable extension of H.264/AVC.................5
   2.   Conventions...................................................5
   3.   The SVC Codec.................................................6
   3.1.    Overview...................................................6
   3.2.    Parameter Set Concept......................................7
   3.3.    Network Abstraction Layer Unit Header......................8
   4.   Scope........................................................11
   5.   Definitions and Abbreviations................................12
   5.1.    Definitions...............................................12
   5.1.1.  Definitions per SVC specification.........................12
   5.1.2.  Definitions local to this memo............................13
   5.2.    Abbreviations.............................................14
   6.   RTP Payload Format...........................................14
   6.1.    Design Principles.........................................14
   6.2.    RTP Header Usage..........................................15
   6.3.    Common Structure of the RTP Payload Format................15
   6.4.    NAL Unit Header Usage.....................................15
   6.5.    Packetization Modes.......................................16
   6.6.    Decoding Order Number (DON)...............................17
   6.7.    Aggregation Packets.......................................17
   6.8.    Fragmentation Units (FUs).................................18
   6.9.    Payload Content Scalability Information (PACSI) NAL Unit..18
   7.   Packetization Rules..........................................22
   8.   De-Packetization Process (Informative).......................23
   9.   Payload Format Parameters....................................23
   9.1.    MIME Registration.........................................24
   9.2.    SDP Parameters............................................26
   9.2.1.  Mapping of MIME Parameters to SDP.........................26
   9.2.2.  Usage with the SDP Offer/Answer Model.....................26
   9.2.3.  Usage with Session multiplexing...........................26
   9.2.4.  Usage in Declarative Session Descriptions.................27
   9.3.    Examples..................................................27
   9.4.    Parameter Set Considerations..............................27
   10.  Security Considerations......................................27
   11.  Congestion Control...........................................27
   12.  IANA Consideration...........................................28
   13.  Informative Appendix: Application Examples...................28
   13.1.   Introduction..............................................28
   13.2.   Layered Multicast.........................................29
   13.3.   Streaming of an SVC scalable stream.......................30
   13.4.   Multicast to MANE, SVC scalable stream to endpoint........30
   13.5.   Scenarios currently not considered for complexity reasons.32
   13.6.   Scenarios currently not considered for being unaligned with
   IP philosophy.....................................................33
   13.7.   SSRC Multiplexing.........................................34
   14.  References...................................................35
   14.1.   Normative References......................................35
   14.2.   Informative References....................................36
   15.  Author's Addresses...........................................36
   16.  Copyright Statement..........................................37
   17.  Disclaimer of Validity.......................................37
   18.  Intellectual Property Statement..............................37
   19.  Acknowledgement..............................................38
   20.  RFC Editor Considerations....................................38
   21.  Open Issues..................................................38
   22.  Changes Log..................................................38



1. Introduction

1.1. SVC -- the scalable extension of H.264/AVC

   This memo specifies an RTP [RFC3550] payload format for a
   forthcoming new mode of the H.264/AVC video coding standard, known
   as Scalable Video Coding (SVC).  Formally, SVC takes the form of
   Amendment 3 to ISO/IEC 14496 Part 10 [MPEG4-10], and ITU-T Rec.
   H.264 [H.264].

   The current specification of SVC is available in [SVC].  In this
   memo, SVC is used as an acronym for the mentioned scalable extension
   of H.264/AVC as defined in the new Annex G of ISO/IEC 14496 Part 10
   and ITU-T Rec. H.264.  In that sense, SVC is a superset of H.264/AVC.

   SVC covers the whole application range of H.264/AVC.  This range is
   considerable, extending from low bit-rate Internet streaming
   applications to HDTV broadcast and Digital Cinema with nearly
   lossless coding, which require dozens or hundreds of Mbit/s.

   This memo tries to follow a backward compatible enhancement
   philosophy similar to what the video coding standardization
   committees implement, by keeping as close an alignment to the
   H.264/AVC payload format [RFC3984] as possible.  It documents the
   enhancements relevant from an RTP transport viewpoint, defines
   signaling support for SVC, and deprecates the single NAL unit
   packetization mode of RFC 3984.

2. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

   This specification uses the notion of setting and clearing a bit
   when bit fields are handled.  Setting a bit is the same as assigning
   that bit the value of 1 (On).  Clearing a bit is the same as
   assigning that bit the value of 0 (Off).


3. The SVC Codec

3.1. Overview

   SVC provides scalable video bitstreams.  In SVC, a scalable video
   bitstream contains a base layer conforming to the profiles of H.264
   as defined in Annex A of [H.264], and one or more enhancement
   layers, denoted as Layers.  A Layer may be the base Layer or enhance
   the temporal resolution (i.e. the frame rate), the spatial
   resolution, or the quality of the video content, relative to the
   quality represented without the Layer.  Note that the definition of
   Layer in this memo encompasses temporal, spatial and fidelity
   enhancements.

   Each RTP session can carry NAL units belonging to one or more
   Layers.  The NAL unit headers include information associating a
   given NAL unit to a Layer.  Therefore, extracting individual Layers
   from an RTP session containing more than one Layer is a lightweight
   operation, involving only fixed length bit fields in the header, as
   documented in this memo and in [SVC].

   Multiple RTP sessions, regardless of whether they carry a single
   Layer or multiple Layers as discussed above, can meaningfully be
   used to transport the whole scalable bitstream, or Operation Points
   thereof.  An Operation Point consists of only those Layers necessary
   to reconstruct a given quality (in temporal, spatial and fidelity
   dimensions).

   The concept of video coding layer (VCL) and network abstraction
   layer (NAL) is inherited from H.264.  The VCL contains the signal
   processing functionality of the codec; mechanisms such as transform,
   quantization, motion-compensated prediction, loop filtering and
   inter-layer prediction.  A coded picture in H.264 consists of one or
   more slices.  Within one access unit, a coded picture representing
   an Operation Point consists of all the coded slices required for
   decoding up to a particular Layer at the time instance corresponding
   to the access unit.  The Network Abstraction Layer (NAL)
   encapsulates each slice generated by the VCL into one or more
   Network Abstraction Layer Units (NAL units).  Please consult RFC
   3984 for a more in-depth discussion of the NAL unit concept.  SVC
   specifies the decoding order of the NAL units.

   "Layer" in the terms "Video Coding Layer" and "Network Abstraction
   Layer" refers to a conceptual distinction, and is closely related to
   syntax layers (block, macroblock, slice, ... layers).  "Layer" here
   describes a syntax level of the bitstream in contrast to a part of
   the layered bitstream, which may be discarded.  It should not be
   confused with base and enhancement Layers.

   The concept of temporal scalability is not newly introduced by SVC,
   as profiles conforming to Annex A of [H.264] already support it.  In
   [H.264], sub-sequences have been introduced in order to allow
   optional use of temporal layers.  SVC extends this approach by
   advertising the temporal scalability information within the NAL unit
   header, or prefix NAL units, as discussed in section 3.3 of this
   memo and in [SVC].

   The concept of scaling the visual content quality in the granularity
   of complete enhancement Layers, i.e. through omitting the transport
   and decoding of entire Layers, is denoted as spatial scalability or
   Signal-to-Noise Ratio (SNR) scalability; the latter is also known as
   Coarse-Grained Scalability (CGS).  This is what is commonly
   understood as scalability in the IETF community.  In addition, SVC
   offers another type of SNR scalability, Medium-Grained Scalability
   (MGS).  MGS involves selectively omitting the reconstruction of NAL
   units belonging to the MGS layer.  The selection of the NAL units to
   omit can be based on fixed length fields in the NAL unit header.


3.2. Parameter Set Concept

   The parameter set concept is inherited from [H.264].  Please refer
   to section 1.2 of RFC 3984 for more details.

   In SVC, pictures from different layers, defined as layer
   representations in [SVC] (Note: A layer representation in [SVC] is
   identified by a single value of dependency_id and a single value of
   quality_id), may use the same sequence or picture parameter set, but
   may also use different sequence or picture parameter sets.  If
   different sequence parameter sets are used, then, at any time
   instant during the decoding process, there may be one active
   sequence parameter set (for the layer representation with the
   highest dependency_id) and one or more active layer sequence
   parameter set(s) (for lower layer representations).  Any specific
   active sequence parameter set and active layer sequence parameter
   set remains unchanged throughout a coded video sequence in the Layer
   in which the active sequence parameter set is referred to.  The
   active picture parameter set remains unchanged within a coded
   picture.

3.3. Network Abstraction Layer Unit Header

   An SVC NAL unit (type 20) consists of a header of four octets and
   the payload byte string.  It encapsulates VCL data as defined in
   Annex G of [SVC].  A special type of an SVC NAL unit is the prefix
   NAL unit (type 14) that includes descriptive information of the
   following NAL unit.

   SVC extends the NAL unit header defined for NAL units conforming to
   profiles defined in Annex A of [H.264] by three additional octets.
   The header indicates the type of the NAL unit, the (potential)
   presence of bit errors or syntax violations in the NAL unit payload,
   information regarding the relative importance of the NAL unit for
   the decoding process, the layer decoding dependency information, and
   other fields as discussed below.  This RTP payload specification is
   designed to be unaware of the octet string in the NAL unit payload.

   The NAL unit header co-serves as the payload header of this RTP
   payload format.  The payload of a NAL unit follows immediately.

   The syntax and semantics of the NAL unit header are formally
   specified in [SVC], but the essential properties of the NAL unit
   header are summarized below.

   The first byte of the NAL unit header has the following format (the
   bit fields are the same as defined for NAL units conforming to
   profiles defined in Annex A of [H.264] and [RFC3984], while the
   semantics have changed slightly, in a backward compatible way):

         +---------------+
         |0|1|2|3|4|5|6|7|
         +-+-+-+-+-+-+-+-+
         |F|NRI|  Type   |
         +---------------+

   F: 1 bit
   forbidden_zero_bit.  H.264 declares a value of 1 as a syntax
   violation.

   NRI: 2 bits
   nal_ref_idc.  A value of 00 indicates that the content of the NAL
   unit is not used to reconstruct reference pictures for future
   prediction.  Such NAL units can be discarded without risking the
   integrity of the reference pictures in the same Layer.  Values
   greater than 00 indicate that the decoding of the NAL unit is
   required to maintain the integrity of reference pictures, or that
   the NAL unit contains parameter sets.

   Type: 5 bits
   nal_unit_type.  This component specifies the NAL unit payload type
   as defined in table 7-1 of [SVC], and later within this memo.  For a
   reference of all currently defined NAL unit types and their
   semantics, please refer to section 7.4.1 in [SVC].
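
   As an informal illustration of the first octet layout described
   above, the following Python sketch splits one octet into F, NRI and
   Type.  The names NalFirstOctet and parse_first_octet are
   hypothetical and are not defined by this memo or by [SVC].

```python
from typing import NamedTuple


class NalFirstOctet(NamedTuple):
    """Fields of the first NAL unit header octet (names are illustrative)."""
    f: int         # forbidden_zero_bit, 1 bit
    nri: int       # nal_ref_idc, 2 bits
    nal_type: int  # nal_unit_type, 5 bits


def parse_first_octet(octet: int) -> NalFirstOctet:
    """Split the first header octet into F (bit 0), NRI (bits 1-2), Type (bits 3-7)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return NalFirstOctet(
        f=(octet >> 7) & 0x1,    # most significant bit
        nri=(octet >> 5) & 0x3,  # next two bits
        nal_type=octet & 0x1F,   # low five bits
    )
```

   For example, the octet 0x74 yields F equal to 0, NRI equal to 3 and
   Type equal to 20.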

   Previously, NAL unit types 14, 15 and 20 have been reserved for
   future extensions.  SVC is using these three NAL unit types.  NAL
   unit type 14 is used for the prefix NAL unit, NAL unit type 15 is
   used for SVC sequence parameter sets and NAL unit type 20 is used
   for coded slice in scalable extension (see section 7.4.1 in [SVC]).
   NAL unit types 14 and 20 indicate the presence of three additional
   octets in the NAL unit header, as shown below.

            +---------------+---------------+---------------+
            |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |R|I|   PRID    |N| DID |  QID  | TID |U|D|O| RR|
            +---------------+---------------+---------------+

   R: 1 bit
   reserved_one_bit.  Reserved bit for future extension.  R MUST be
   equal to one.

   I: 1 bit
   idr_flag.  This component specifies whether the layer picture is an
   instantaneous decoding refresh (IDR) layer picture (when equal to 1)
   or not (when equal to 0).

   PRID: 6 bits
   priority_id.  This component specifies a priority identifier for the NAL
   unit.  A lower value of PRID indicates a higher priority.

   N: 1 bit
   no_inter_layer_pred_flag.  This flag specifies, when present in a
   coded slice NAL unit, whether inter-layer prediction may be used for
   decoding the coded slice (when equal to 0) or not (when equal to 1).

   DID: 3 bits
   dependency_id.  This component denotes the inter-layer coding
   dependency hierarchy.  At any access unit, a layer picture with a
   smaller dependency_id may be used for inter-layer prediction for
   coding of a layer picture with a greater dependency_id, while a
   layer picture with a greater dependency_id shall not be used for
   inter-layer prediction for coding of a layer picture with a smaller
   dependency_id.

   QID: 4 bits
   quality_id.  This component designates the quality level hierarchy
   of a MGS layer picture.  At any access unit and with identical
   dependency_id value, a layer picture with quality_id equal to ql
   uses a layer picture with quality_id equal to ql-1 for inter-layer
   prediction.

   TID: 3 bits
   temporal_id.  This component indicates the temporal layer (or frame
   rate) hierarchy.  Informally put, a layer consisting of pictures
   with a smaller temporal_id has a lower frame rate.  A given temporal
   layer typically depends on the lower temporal layers (i.e. the
   temporal layers with a smaller temporal_id) but never depends on any
   higher temporal layer.

   U: 1 bit
   use_ref_base_pic_flag.  A value of 1 indicates that only reference
   base pictures are used during the inter prediction process.  A value
   of 0 indicates that the reference base pictures are not used during
   the inter prediction process.

   D: 1 bit
   discardable_flag.  A value of 1 indicates that the current NAL unit
   is not used for decoding NAL units of the current access unit and
   all subsequent access units that have a greater value of
   dependency_id than the current NAL unit.  Such NAL units can be
   discarded without risking the integrity of higher layers with
   greater dependency_id.  discardable_flag equal to 0 indicates that
   the decoding of the NAL unit is required to maintain the integrity
   of higher layers with greater dependency_id.

   O: 1 bit
   output_flag.  Affects the decoded picture output process as defined
   in Annex C of [SVC].

   RR: 2 bits
   reserved_three_2bits.  Reserved bits for future extension.  RR MUST
   be equal to three.
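
   The three additional header octets described above can be unpacked
   with fixed shifts and masks.  The following Python sketch is
   illustrative only; the names SvcHeaderExtension and
   parse_svc_extension are hypothetical, and the sketch does not check
   that R equals 1 or that RR equals 3.

```python
from typing import NamedTuple


class SvcHeaderExtension(NamedTuple):
    """Fields of the three extension octets (names are illustrative)."""
    r: int     # reserved_one_bit
    i: int     # idr_flag
    prid: int  # priority_id, 6 bits
    n: int     # no_inter_layer_pred_flag
    did: int   # dependency_id, 3 bits
    qid: int   # quality_id, 4 bits
    tid: int   # temporal_id, 3 bits
    u: int     # use_ref_base_pic_flag
    d: int     # discardable_flag
    o: int     # output_flag
    rr: int    # reserved_three_2bits


def parse_svc_extension(ext: bytes) -> SvcHeaderExtension:
    """Unpack the three octets that follow the first NAL unit header octet."""
    if len(ext) != 3:
        raise ValueError("the SVC header extension is exactly three octets")
    b0, b1, b2 = ext
    return SvcHeaderExtension(
        r=b0 >> 7, i=(b0 >> 6) & 0x1, prid=b0 & 0x3F,
        n=b1 >> 7, did=(b1 >> 4) & 0x7, qid=b1 & 0x0F,
        tid=b2 >> 5, u=(b2 >> 4) & 0x1, d=(b2 >> 3) & 0x1,
        o=(b2 >> 2) & 0x1, rr=b2 & 0x3,
    )
```

   A MANE would typically apply this to octets 2 through 4 of a NAL
   unit whose Type is 14 or 20.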

   This memo introduces the same additional NAL unit types as RFC 3984,
   which are presented in section 6.3.  The NAL unit types defined in
   this memo are marked as unspecified in [SVC].  Moreover, this
   specification extends the semantics of F, NRI, I, PRID, DID, QID,
   TID, U, and D as described in section 6.4.

4. Scope

   This payload specification can only be used to carry the "naked" NAL
   unit stream over RTP, and not the byte stream format according to
   Annex B of [SVC].  The likely applications of this specification
   are in IP-based multimedia communications, including conversational
   multimedia, video telephony or video conferencing, Internet
   streaming, and TV over IP.

   This specification allows a given RTP session to carry NAL units
   belonging to
     o the base Layer only (specified in detail in [RFC3984]), or
     o one or more enhancement Layers, or
     o the base Layer and one or more enhancement Layers.


5. Definitions and Abbreviations

5.1. Definitions

5.1.1.    Definitions per SVC specification

   This document uses the definitions of [SVC].  The following terms,
   defined in [SVC], are summed up for convenience:

   scalable bitstream:  A bitstream with the property that one or more
   bitstream subsets that are not identical to the scalable bitstream
   form another bitstream that conforms to the SVC specification.

   prefix NAL unit:  A NAL unit with nal_unit_type equal to 14 that
   immediately precedes a NAL unit with nal_unit_type equal to 1, 5,
   or 12.  The NAL unit that succeeds the prefix NAL unit is also
   referred to as the associated NAL unit.  The prefix NAL unit
   contains data associated with the associated NAL unit, which are
   considered to be part of the associated NAL unit.

   access unit:  A set of NAL units pertaining to a certain temporal
   location.  An access unit includes the coded slices of all the
   scalable layers at that temporal location and possibly other
   associated data, e.g. SEI messages and parameter sets.

   coded video sequence:  A sequence of access units that consists, in
   decoding order, of an instantaneous decoding refresh (IDR) access
   unit followed by zero or more non-IDR access units including all
   subsequent access units up to but not including any subsequent IDR
   access unit.


   IDR access unit:  An access unit in which the layer picture with the
   maximum present value of dependency_id is an IDR picture.

   IDR picture:  A coded picture in which all slices with the maximum
   present value of dependency_id within the access unit are I or EI
   slices that causes the decoding process to mark all reference
   pictures as "unused for reference" immediately after decoding the
   IDR picture.  After the decoding of an IDR picture all following
   coded pictures in decoding order can be decoded without inter
   prediction from any picture decoded prior to the IDR picture.  The
   first picture of each coded video sequence is an IDR picture.

5.1.2.    Definitions local to this memo

   Layer:  A Layer may be the base Layer or an enhancement Layer that
   enhances the temporal resolution (i.e. the frame rate), the spatial
   resolution, or the quality of the video content, relative to the
   quality represented without the Layer.

   base Layer:  The base Layer typically represents the minimal
   spatial resolution, the minimal fidelity, and the minimal frame rate
   of an SVC bitstream.  In other words, the base Layer consists of all
   the VCL NAL units with dependency_id, quality_id and temporal_id
   equal to 0 and the associated non-VCL NAL units.  The bitstream
   containing the base Layer and the temporal enhancement Layers with
   dependency_id and quality_id both equal to 0, which is referred to
   as the full base Layer, must only contain NAL units conforming to
   profiles defined in Annex A of [H.264].  The base Layer is
   independently decodable without the requirement of using any other
   Layer of the SVC bitstream.  In an SVC context, each slice NAL unit
   in the base Layer is associated with a prefix NAL unit, which has a
   four-byte NAL unit header containing all the syntax elements
   described in section 3.3.  Note that this definition is different
   from the definition of "base layer" in Annex G of [SVC].

   enhancement Layer:  An SVC enhancement Layer is identified by
   temporal_id, dependency_id, and quality_id as defined in Annex G of
   [SVC] and summarized in section 3.3.


   Operation Point:  An Operation Point of an SVC bitstream represents a
   certain level of temporal, spatial and quality scalability.  An
   Operation Point contains only those NAL units required for restoring
   a valid bitstream (conforming to profiles defined in Annex A or
   Annex G of [SVC]) to represent a certain quality.  The Operation
   Point is described by the maximum present value of dependency_id,
   and, within that maximum present value of dependency_id, by the
   maximum quality_id and temporal_id.

   RTP packet stream: A sequence of RTP packets with increasing
   sequence numbers, identical PT and SSRC, carried in one RTP session.
   Within the scope of this memo, one RTP packet stream is utilized to
   transport an integer number of SVC Layers.

   Session multiplexing:  The scalable SVC bitstream is distributed
   onto different RTP sessions, whereby each RTP session carries a
   single RTP packet stream.  Each RTP session requires separate
   signaling and has a separate Timestamp, Sequence Number, and SSRC
   space.  Dependency between sessions MUST be signaled according to
   [I-D.schierl-mmusic-layered-codec] and this memo.
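
   The Operation Point definition given earlier in this section can be
   sketched informally as a filter over (dependency_id, quality_id,
   temporal_id) triples.  The Python sketch below rests on a
   simplifying assumption that is not stated in this memo: every NAL
   unit with a dependency_id below the maximum is kept regardless of
   its quality_id.  A real extractor would also have to honor the
   dependency rules of [SVC].  The names in_operation_point and
   extract are hypothetical.

```python
from typing import Iterable, List, Tuple

# (dependency_id, quality_id, temporal_id) of a NAL unit or Operation Point
LayerId = Tuple[int, int, int]


def in_operation_point(layer: LayerId, op: LayerId) -> bool:
    """Return True if a NAL unit with identifiers `layer` is kept for `op`.

    Assumption: lower dependency layers are kept with all of their
    quality levels; only the top dependency layer is cut at max quality_id.
    """
    did, qid, tid = layer
    max_did, max_qid, max_tid = op
    if tid > max_tid:
        return False  # above the target frame rate
    if did > max_did:
        return False  # above the target dependency layer
    if did == max_did and qid > max_qid:
        return False  # above the target quality within the top layer
    return True


def extract(layers: Iterable[LayerId], op: LayerId) -> List[LayerId]:
    """Keep only the identifiers that belong to the Operation Point `op`."""
    return [layer for layer in layers if in_operation_point(layer, op)]
```

   For an Operation Point of (1, 0, 1), a NAL unit with identifiers
   (0, 3, 1) is kept under this assumption, while (1, 1, 0) and
   (0, 0, 2) are dropped.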

5.2. Abbreviations

   In addition to the abbreviations defined in [RFC3984], the following
   ones are defined.

   CGS:       Coarse-Grained Scalability
   MGS:       Medium-Grained Scalability

6. RTP Payload Format

6.1. Design Principles

   The following design principles have been observed:

   o Backward compatibility with [RFC3984] wherever possible.

   o As the SVC full base Layer is H.264/AVC compatible, we assume the
     full base Layer or any subset (when transmitted in its own
     session) to be encapsulated using [RFC3984].  Requiring this has
     the desirable side effect that it can be used by [RFC3984] legacy
     devices.

   o MANEs are signaling aware and rely on signaling information.
     MANEs have state.

   o MANEs can terminate RTP sessions, and create different RTP
     sessions with perhaps modified content.  This form of a MANE acts
     as an RTP mixer.

   o MANEs can also act as RTP translators.  Perhaps the most likely
     use case is media-aware stream thinning.  By using the payload
     header information identifying Layers within an RTP session,
     MANEs are able to remove packets from the RTP session while
     otherwise keeping the session intact.  This implies rewriting
     the RTP headers of the outgoing packet stream and rewriting of
     RTCP Receiver Reports.

6.2. RTP Header Usage

   Please see section 5.1 of [RFC3984].  The following applies in
   addition.

6.3. Common Structure of the RTP Payload Format

   Please see section 5.2 of [RFC3984].

6.4. NAL Unit Header Usage

   The structure and semantics of the NAL unit header were introduced
   in section 3.3.  This section specifies the semantics of F, NRI, I,
   PRID, DID, QID, TID, U, and D according to this specification.

   The semantics of F specified in section 5.3 of [RFC3984] also
   apply herein.

   For NRI, for the bitstream containing NAL units conforming to
   profiles defined in Annex A of [H.264] and transported using
   [RFC3984], the semantics specified in section 5.3 of [RFC3984] are
   applicable, i.e., NRI also indicates the relative importance of NAL
   units.  In SVC context, only the semantics specified in Annex G of
   [SVC] are applicable, i.e., NRI does not indicate the relative
   importance of NAL units.

   For I, in addition to the semantics specified in Annex G of [SVC],
   according to this memo, MANEs MAY use this information to protect
   NAL units with I equal to 1 better than NAL units with I equal to 0.
   MANEs MAY also utilize information of NAL units with I equal to 1 to
   decide when to forward more packets for an RTP session.

   For PRID, the semantics specified in Annex G of [SVC] apply.
   Note that MANEs implementing unequal error protection may use this
   information to protect NAL units with smaller PRID values better
   than those with larger PRID values, for example by including only
   the more important NAL units in a FEC protection mechanism.  The
   importance for the decoding process decreases as the PRID value
   increases.

   For DID, QID, TID, in addition to the semantics specified in Annex G
   of [SVC], according to this memo, values of DID, QID, or TID
   indicate the relative importance in their respective dimension.  A
   lower value of DID, QID, or TID indicates a higher importance if the
   other two components are identical.  MANEs MAY use this information
   to protect more important NAL units better than less important NAL
   units.

   For U, in addition to the semantics specified in Annex G of [SVC],
   according to this memo, MANEs MAY use this information to protect
   NAL units with U equal to 1 better than NAL units with U equal to 0.

   For D, in addition to the semantics specified in Annex G of [SVC],
   according to this memo, MANEs MAY use this information to determine
   whether a given NAL unit is required for successfully decoding a
   certain Operation Point of the SVC bitstream, hence to decide
   whether to forward the NAL unit.
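
   As an informal illustration of how a MANE might use DID and D when
   thinning a stream, the following Python sketch decides whether to
   forward a NAL unit toward a receiver that decodes up to a given
   dependency_id.  It is a minimal sketch, not a normative procedure:
   it ignores QID, TID and PRID, and the names should_forward and
   target_did are hypothetical.

```python
def should_forward(nal_did: int, discardable: int, target_did: int) -> bool:
    """Decide whether a NAL unit is needed to decode up to `target_did`.

    nal_did:     DID of the NAL unit
    discardable: D flag of the NAL unit (1 means not used by higher DIDs)
    target_did:  highest dependency_id the receiver wants to decode
    """
    if nal_did > target_did:
        # The unit belongs to a layer above the requested one.
        return False
    if nal_did < target_did and discardable == 1:
        # The unit is not needed by layers with a greater dependency_id.
        return False
    # Either the unit is in the target layer itself, or a higher layer
    # may depend on it.
    return True
```

   A NAL unit with D equal to 1 is still forwarded when its DID equals
   the target, because D only describes its use by layers with a
   greater dependency_id.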

6.5. Packetization Modes

   Please see section 5.4 of [RFC3984].  The single NAL unit
   packetization mode SHALL NOT be used.


     Informative note: The non-interleaved mode allows an application
     to encapsulate a single NAL unit in a single RTP packet.
     Historically, the single NAL unit mode has been included in
     [RFC3984] only for compatibility with ITU-T Rec. H.241 Annex A
     [H.241].  There is no point in carrying this historic ballast
     towards a new application space such as the one provided with SVC.
     More technically speaking, the implementation complexity increase
     for providing the additional mechanisms of the non-interleaved
     mode (namely STAPs) is so minor, and the benefits are so great,
     that we require STAP implementation.

6.6. Decoding Order Number (DON)

   Please see section 5.5 of [RFC3984].  The following applies in
   addition.

   When different layers of an SVC bitstream are transported in more
   than one RTP session, the interleaved packetization mode MUST be
   used, and the DON values of all the NAL units MUST indicate the
   correct NAL unit decoding order over all the RTP sessions.

   When more than one RTP session is used to convey an Operation Point
   of an SVC bitstream, each session MUST signal an identical value for
   the MIME parameters sprop-interleaving-depth, sprop-max-don-diff,
   sprop-deint-buf-req, and sprop-init-buf-time.  Further, these values
   must be valid for the reception capabilities over all sessions.  A
   receiver MUST signal the same MIME parameter deint-buf-cap for all
   sessions used.  [Ed.Note(YkW): I think we need more thinking on the
   value of the parameters. For example, requiring the parameters be
   the same for all the RTP streams and clients might be overkill for
   receivers of only lower layers.]
   [Edt. Note (StW): In RFC3984, the aforementioned codepoints are
   optional.  It appears that for SVC, when used in conjunction with
   session mux, they are mandatory.  I don't know how to express this
   in the MIME registration; we'll cross that bridge once we are
   getting to it.]

6.7. Aggregation Packets


   Please see section 5.7 of [RFC3984].

6.8. Fragmentation Units (FUs)

   Please see section 5.8 of [RFC3984].

6.9. Payload Content Scalability Information (PACSI) NAL Unit

   A new NAL unit type is specified in this memo, and referred to as
   payload content scalability information (PACSI) NAL unit.  The PACSI
   NAL unit, if present, MUST be the first NAL unit in an aggregation
   packet, and it MUST NOT be present in other types of packets.  The
   PACSI NAL unit indicates scalability and other characteristics that
   are common for all the remaining NAL units in the payload, thus
   making it easier for MANEs to decide whether to
   forward/process/discard the aggregation packet.  Furthermore, a
   PACSI NAL unit MAY contain zero or more SEI NAL units.  Senders MAY
   create PACSI NAL units and receivers MAY ignore them, or use them as
   hints to enable efficient aggregation packet processing.  Note that
   the NAL unit type for the PACSI NAL unit is selected among those
   values that are unspecified in [SVC] and [RFC3984].

   When the first aggregation unit of an aggregation packet contains a
   PACSI NAL unit, there MUST be at least one additional aggregation
   unit present in the same packet.  The RTP header fields are set
   according to the remaining NAL units in the aggregation packet.

   When a PACSI NAL unit is included in a multi-time aggregation packet
   (MTAP), the decoding order number (DON) for the PACSI NAL unit MUST
   be set to indicate either 1) the PACSI NAL unit is the first NAL
   unit in decoding order among the NAL units in the aggregation packet
   or 2) the PACSI NAL unit has an identical DON to the first NAL unit
   in decoding order among the remaining NAL units in the aggregation
   packet.

   The structure of a PACSI NAL unit is as follows.  The first four
   octets are exactly the same as the four-byte SVC NAL unit header as
   discussed in section 3.3.  They are followed by two additional
   octets (the A to RES bits and the TL0PICIDX field, as shown below)
   and zero or more SEI NAL units, each preceded by a 16-bit unsigned
   size information (in network byte order) that indicates the size of
   the following NAL unit in bytes (excluding these two octets, but
   including the NAL unit type octet of the NAL unit).  Following is an
   example of a PACSI NAL unit containing two SEI NAL units.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |F|NRI|  Type   |R|I|   PRID    |N| DID |  QID  | TID |U|D|O| RR|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |A|T|P|C|S|E|RES|   TL0PICIDX   |        NAL unit size 1        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                          SEI NAL unit 1                       |
      |                                                               |
      |                         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         |        NAL unit size 2        |     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+     |
      |                                                               |
      |            SEI NAL unit 2                                     |
      |                                           +-+-+-+-+-+-+-+-+-+-+
      |                                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
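
   As a minimal sketch (not part of the normative text of this memo),
   the layout shown in the diagram above could be parsed as follows.
   The sketch follows the diagram, i.e. it assumes that the four-octet
   header is followed by one octet holding the A to RES bits and one
   octet holding TL0PICIDX.  The names Pacsi and parse_pacsi are
   hypothetical and chosen for illustration only.

```python
import struct
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pacsi:
    header: Dict[str, int]      # fields of the four-octet NAL unit header
    flags: Dict[str, int]       # A, T, P, C, S, E bits and the RES field
    tl0picidx: int
    sei_nal_units: List[bytes]  # SEI NAL units carried inside the PACSI


def parse_pacsi(buf: bytes) -> Pacsi:
    """Split a PACSI NAL unit into its fields (layout per the diagram)."""
    if len(buf) < 6:
        raise ValueError("PACSI NAL unit must be at least 6 octets long")
    # 32-bit header word, flag octet, TL0PICIDX octet (network byte order)
    word, flag_octet, tl0picidx = struct.unpack_from("!IBB", buf, 0)
    header = {
        "F": (word >> 31) & 0x1,
        "NRI": (word >> 29) & 0x3,
        "Type": (word >> 24) & 0x1F,
        "R": (word >> 23) & 0x1,
        "I": (word >> 22) & 0x1,
        "PRID": (word >> 16) & 0x3F,
        "N": (word >> 15) & 0x1,
        "DID": (word >> 12) & 0x7,
        "QID": (word >> 8) & 0xF,
        "TID": (word >> 5) & 0x7,
        "U": (word >> 4) & 0x1,
        "D": (word >> 3) & 0x1,
        "O": (word >> 2) & 0x1,
        "RR": word & 0x3,
    }
    flags = {
        "A": (flag_octet >> 7) & 0x1,
        "T": (flag_octet >> 6) & 0x1,
        "P": (flag_octet >> 5) & 0x1,
        "C": (flag_octet >> 4) & 0x1,
        "S": (flag_octet >> 3) & 0x1,
        "E": (flag_octet >> 2) & 0x1,
        "RES": flag_octet & 0x3,
    }
    sei: List[bytes] = []
    pos = 6
    while pos < len(buf):
        if pos + 2 > len(buf):
            raise ValueError("truncated NAL unit size field")
        # The 16-bit size excludes the two size octets themselves
        (size,) = struct.unpack_from("!H", buf, pos)
        pos += 2
        if pos + size > len(buf):
            raise ValueError("SEI NAL unit exceeds PACSI length")
        sei.append(buf[pos:pos + size])
        pos += size
    return Pacsi(header, flags, tl0picidx, sei)
```

   A receiver that chooses to ignore PACSI NAL units does not need any
   of this; the sketch is only relevant to MANEs or receivers that use
   the PACSI NAL unit as a hint.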


   The values of the fields in PACSI NAL unit MUST be set as follows.

   o The F bit MUST be set to 1 if the F bit in at least one remaining
     NAL unit in the payload is equal to 1.  Otherwise, the F bit MUST
     be set to 0.

   o The NRI field MUST be set to the highest value of NRI field among
     all the remaining NAL units in the payload.

   o The Type field MUST be set to 30.

   o The R bit MUST be set to 1.

   o The I bit MUST be set to 1 if the I bit of at least one of the
     remaining NAL units in the payload is equal to 1.  Otherwise, the
     I bit MUST be set to 0.


   o The PRID field MUST be set to the lowest value of the PRID values
     of all the remaining NAL units in the payload.

   o The N bit MUST be set to 1 if the N bit of all the remaining NAL
     units in the payload is equal to 1.  Otherwise, the N bit MUST be
     set to 0.

   o The DID field MUST be set to the lowest value of the DID values
     of all the remaining NAL units in the payload.

   o The QID field MUST be set to the lowest value of the QID values
     of all the remaining NAL units with the lowest value of DID in the
     payload.

   o The TID field MUST be set to the lowest value of the TID values
     of all the remaining NAL units with the lowest value of DID in the
     payload.

   o The U bit MUST be set to 1 if the U bit of at least one of the
     remaining NAL units in the payload is equal to 1.  Otherwise, the
     U bit MUST be set to 0.

   o The D bit MUST be set to 0 if the D value of all the remaining NAL
     units in the payload is equal to 0.  Otherwise, the D bit MUST be
     set to 1.

   o The O bit MUST be set to 1 if the O bit of at least one of the
     remaining NAL units in the payload is equal to 1.  Otherwise, the
     O bit MUST be set to 0.

   o The RR field MUST be set to be equal to 3.

   o The A bit MUST be set to 1 if all the layer pictures containing
     the target NAL units are anchor pictures.  Otherwise, the A bit
     MUST be set to 0.  The target NAL units are such NAL units
     contained in the aggregation packet, but not included in the PACSI
     NAL unit, that are within the access unit to which the first NAL
     unit following the PACSI NAL unit in the aggregation packet
     belongs.  An anchor picture is such a layer picture that, if
     decoding of the layer starts from the layer picture, all the
     following layer pictures of the layer, in output order, can be
     correctly decoded.

      Informative note: An anchor picture is a random access point to
      the layer the anchor picture belongs to.  However, some layer
      pictures succeeding an anchor picture in decoding order but
      preceding the anchor picture in output order may refer to earlier
      layer pictures hence may not be correctly decoded, if random
      access is performed at the anchor picture.

   o The T bit MUST be set to 1 if all the layer pictures containing
     the target NAL units (as defined above) are temporal scalable
     layer switching points.  Otherwise, the T bit MUST be set to 0.
     For a temporal scalable layer switching point, all the layer
     pictures with the same value of temporal_id at and after the
     switching point in decoding order do not refer to any layer
     picture with the same value of temporal_id preceding the switching
     point in decoding order.

   o The P bit MUST be set to 1 if all the layer pictures containing
     the target NAL units (as defined above) are redundant pictures.
     Otherwise, the P bit MUST be set to 0.

   o The C bit MUST be set to 1 if the layer picture that has the
     greatest value of dependency_id among all the layer pictures
     containing the target NAL units (as defined above) is an intra
     picture, i.e., the layer picture does not refer to any earlier
     layer picture in decoding order in the same layer.  Otherwise, the
     C bit MUST be set to 0.

   o The S bit MUST be set to 1, if the first VCL NAL unit of the layer
     picture containing the first target NAL unit (as defined above) in
     decoding order is present in the payload.  Otherwise, the S bit
     MUST be set to 0.

   o The E bit MUST be set to 1, if the last VCL NAL unit of the layer
     picture containing the first target NAL unit (as defined above) in
     decoding order is present in the payload.  Otherwise, the E bit
     MUST be set to 0.

   o The RES field MUST be set to 0.

   o The TL0PICIDX field specifies either an identifier for the layer
     picture containing the first target NAL unit (as defined above)
     when TID of the layer picture is equal to 0, or the identifier of
     the most recent layer picture of TID equal to 0 in decoding order,
     when TID of the layer picture containing the first target NAL unit
     is greater than 0.  If the bitstream contains no earlier access
     unit than the access unit containing the target NAL units in
     decoding order with TID equal to 0, TL0PICIDX MAY have any value.
     Otherwise, let prevTL0FrameIdx be equal to the field TL0PICIDX of
     the most recent access unit relative to the access unit containing
     the target NAL units in decoding order with TID equal to 0.  If
     TID is equal to 0, the field TL0PICIDX MUST be equal to (
     prevTL0FrameIdx + 1 ) % 256.  Otherwise (TID is greater than 0),
     TL0PICIDX MUST be equal to prevTL0FrameIdx.
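
   The rules above for the fields copied or derived from the remaining
   NAL units, and for TL0PICIDX, can be sketched as follows.  This is
   an illustration only: representing each NAL unit header as a
   dictionary, and the function names pacsi_header_fields and
   next_tl0picidx, are assumptions made for this example.  The A, T,
   P, C, S, and E bits are not covered, as they depend on picture
   properties that are not visible in the NAL unit header.

```python
from typing import Dict, List, Optional


def pacsi_header_fields(nalus: List[Dict[str, int]]) -> Dict[str, int]:
    """Derive the PACSI header fields from the remaining NAL units."""
    if not nalus:
        raise ValueError("a PACSI NAL unit needs at least one other NAL unit")
    min_did = min(n["DID"] for n in nalus)
    # QID and TID are taken only from the NAL units with the lowest DID
    lowest = [n for n in nalus if n["DID"] == min_did]
    return {
        "F": int(any(n["F"] == 1 for n in nalus)),
        "NRI": max(n["NRI"] for n in nalus),
        "Type": 30,
        "R": 1,
        "I": int(any(n["I"] == 1 for n in nalus)),
        "PRID": min(n["PRID"] for n in nalus),
        "N": int(all(n["N"] == 1 for n in nalus)),
        "DID": min_did,
        "QID": min(n["QID"] for n in lowest),
        "TID": min(n["TID"] for n in lowest),
        "U": int(any(n["U"] == 1 for n in nalus)),
        "D": int(any(n["D"] != 0 for n in nalus)),
        "O": int(any(n["O"] == 1 for n in nalus)),
        "RR": 3,
    }


def next_tl0picidx(prev_tl0: Optional[int], tid: int, initial: int = 0) -> int:
    """Return TL0PICIDX for the current access unit.

    prev_tl0 is the TL0PICIDX of the most recent access unit with TID
    equal to 0, or None if no such access unit exists yet.  In that
    case any value is allowed; 'initial' is an arbitrary choice.
    """
    if prev_tl0 is None:
        return initial
    return (prev_tl0 + 1) % 256 if tid == 0 else prev_tl0
```

   For example, with a previous value of 255, an access unit with TID
   equal to 0 wraps TL0PICIDX around to 0, while an access unit with
   TID greater than 0 keeps the value 255.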

   SEI NAL units included in the PACSI NAL unit, if any, MUST contain a
   subset of the SEI messages associated with the access unit of the
   first NAL unit following the PACSI NAL unit within the aggregation
   packet.

      Informative note: Senders may repeat in the PACSI NAL unit those
      SEI NAL units whose presence in more than one packet is
      essential for packet loss robustness.  Receivers may use the
      repeated SEI messages in place of missing SEI messages.

   An SEI message SHOULD NOT be included in a PACSI NAL unit and
   included in one of the NAL units contained in the same packet at the
   same time.

7. Packetization Rules

   Please see section 6 of [RFC3984].  The following rules apply in
   addition.

   The single NAL unit mode SHALL NOT be used.  (See also section 6.5
   for the motivation).


   When a prefix NAL unit is encapsulated for transmission, it SHOULD
   be aggregated to the same transmission packet as the associated NAL
   unit following the prefix NAL unit in decoding order.

      Informative note: When either the prefix NAL unit or the
      associated NAL unit containing an H.264/AVC coded slice is lost,
      the remaining one would be hardly useful in SVC context.

   When Layers of an SVC bitstream are transported in more than one RTP
   session, the interleaved packetization mode MUST be used.

8. De-Packetization Process (Informative)

   Please see section 7 of [RFC3984].  The following rules apply in
   addition.

   To re-assemble a conforming NAL unit stream that has been conveyed
   in more than one RTP session, DON SHOULD be utilized to re-sequence
   NAL units stemming from the different RTP sessions.
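
   A minimal sketch of such DON-based re-sequencing is given below.
   The function don_diff follows the definition in section 5.5 of
   [RFC3984].  The function resequence, the representation of each
   session as a list of (DON, NAL unit) pairs, and the assumption that
   all DON values in one batch lie within less than 32768 of the first
   one are choices made for this illustration only; a real receiver
   would operate on its de-interleaving buffer instead.

```python
from typing import List, Sequence, Tuple


def don_diff(m: int, n: int) -> int:
    """Signed distance from DON m to DON n, per RFC 3984 section 5.5."""
    if m == n:
        return 0
    if m < n:
        return n - m if n - m < 32768 else -(m + 65536 - n)
    return -(m - n) if m - n < 32768 else 65536 - m + n


def resequence(sessions: Sequence[Sequence[Tuple[int, bytes]]]) -> List[bytes]:
    """Merge NAL units from several RTP sessions into decoding order."""
    merged = [item for session in sessions for item in session]
    if not merged:
        return []
    reference = merged[0][0]
    # Sort by wrap-around-aware distance from the reference DON.  The
    # sort is stable, so NAL units with equal DON keep arrival order.
    merged.sort(key=lambda item: don_diff(reference, item[0]))
    return [nalu for _, nalu in merged]
```

   With two sessions carrying DON values (65534, 0) and (65535, 1),
   the sketch yields the order 65534, 65535, 0, 1, i.e. the 16-bit
   wrap-around is handled by don_diff.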

9. Payload Format Parameters

   [Edt. note: this section 9 and its subsections will be updated
   according to the changes listed below, a little later in the
   process.  For now, we just list the necessary adjustments, so as not
   to bury any new information in the RFC 3984 text.]

   Section 8 of [RFC3984] applies with the following modification.

   The sentence

   "The parameters are specified here as part of the MIME subtype
   registration for the ITU-T H.264 | ISO/IEC 14496-10 codec."

   is replaced with

   "The parameters are specified here as part of the MIME subtype
   registration for the SVC codec."


9.1. MIME Registration

          Editor's note: this needs to be updated by copy-pasting the
          RFC 3984 MIME registration into this document, so as to make
          it self-contained.  Will be done later in the process.

   The MIME subtype for the SVC codec is allocated from the IETF tree.

   The receiver MUST ignore any unspecified parameter.

   Media Type name:     video

   Media subtype name:  H.264-SVC

   Required parameters: none

   OPTIONAL parameters:

   The optional MIME parameters specified in [RFC3984] apply, with the
   following constraints (to be edited in at the appropriate time):

   sprop-interleaving-depth:
   In case of using Session multiplexing, the same sprop-interleaving-
   depth value MUST be signaled for all sessions and MUST be valid over
   all sessions of the multiplex.

   sprop-max-don-diff:
   In case of using Session multiplexing, the same sprop-max-don-diff
   value MUST be signaled for all sessions and MUST be valid over all
   sessions of the multiplex.

   sprop-deint-buf-req:
   In case of using Session multiplexing, the same sprop-deint-buf-req
   value MUST be signaled for all sessions and MUST be valid over all
   sessions of the multiplex.

   sprop-init-buf-time:
   In case of using Session multiplexing, the same sprop-init-buf-time
   value MUST be signaled for all sessions and MUST be valid over all
   sessions of the multiplex.

   deint-buf-cap:
   In case of using Session multiplexing, the same deint-buf-cap value
   MUST be signaled by the receiver for all sessions and MUST be valid
   over all sessions of the multiplex.

   In addition the following optional MIME parameters apply:

   sprop-scalability-info:
   This parameter MAY be used to convey the NAL unit containing the
   scalability information SEI message as specified in Annex G of
   [SVC].  The parameter MUST NOT be used to indicate codec capability
   in any capability exchange procedure.  The value of the parameter is
   the base64 representation of the NAL unit containing the scalability
   information SEI message.

   sprop-layer-ids:
   This parameter MAY be used to signal the layer identification
   value(s), expressed by the value of DID, QID, and TID of the SVC NAL
   unit header, for one or more Layer(s) conveyed in one RTP session.
   A layer identification is a three character value base64 coded.  If
   more than one Layer is transmitted within one RTP session, the layer
   identification value of each Layer MUST be itemized in order of
   decreasing importance, and MUST be comma-separated.

      Encoding considerations:
                           This type is only defined for transfer
                           via RTP (RFC 3550).

      Security considerations:
                           See section 10 of RFC XXXX.

      Public specification:
                           Please refer to section 15 of RFC XXXX.

      Additional information:
                           None

      File extensions:     none
      Macintosh file type code: none
      Object identifier or OID: none
      Person & email address to contact for further information:
      Intended usage:      COMMON
      Author:
      Change controller:
                           IETF Audio/Video Transport working group
                           delegated from the IESG.

9.2. SDP Parameters

9.2.1.    Mapping of MIME Parameters to SDP

   The MIME media type video/SVC string is mapped to fields in the
   Session Description Protocol (SDP) as follows:

   *  The media name in the "m=" line of SDP MUST be video.

   *  The encoding name in the "a=rtpmap" line of SDP MUST be SVC (the
      MIME subtype).

   *  The clock rate in the "a=rtpmap" line MUST be 90000.

   *  The OPTIONAL parameters "profile-level-id", "max-mbps", "max-fs",
      "max-cpb", "max-dpb", "max-br", "redundant-pic-cap",
      "sprop-parameter-sets", "parameter-add", "packetization-mode",
      "sprop-interleaving-depth", "deint-buf-cap",
      "sprop-deint-buf-req", "sprop-init-buf-time",
      "sprop-max-don-diff", "max-rcmd-nalu-size", "sprop-layer-ids",
      and "sprop-scalability-info", when present, MUST be included in
      the "a=fmtp" line of SDP.  These parameters are expressed as a
      MIME media type string, in the form of a semicolon-separated
      list of parameter=value pairs.
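
   The mapping above can be sketched as a small helper that emits the
   three SDP lines for one media stream.  This is an illustration
   only: the function name svc_sdp_media, the port, the dynamic
   payload type, the RTP/AVP profile, and the parameter values are
   hypothetical, and the encoding name SVC is taken from the mapping
   rules in this section.

```python
from typing import Dict, Union


def svc_sdp_media(port: int, payload_type: int,
                  params: Dict[str, Union[int, str]]) -> str:
    """Build the m=, a=rtpmap and a=fmtp lines for one SVC stream."""
    lines = [
        f"m=video {port} RTP/AVP {payload_type}",   # media name is video
        f"a=rtpmap:{payload_type} SVC/90000",       # clock rate is 90000
    ]
    if params:
        # Semicolon-separated list of parameter=value pairs
        fmtp = "; ".join(f"{name}={value}" for name, value in params.items())
        lines.append(f"a=fmtp:{payload_type} {fmtp}")
    return "\r\n".join(lines) + "\r\n"


# Hypothetical values, chosen only to show the resulting syntax
print(svc_sdp_media(49170, 98, {
    "packetization-mode": 2,
    "sprop-interleaving-depth": 4,
}))
```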

9.2.2.    Usage with the SDP Offer/Answer Model

   TBD.

9.2.3.    Usage with Session multiplexing


   If Session multiplexing is used, the rules on signaling media
   decoding dependency in SDP as defined in
   [I-D.schierl-mmusic-layered-codec] apply.

9.2.4.    Usage in Declarative Session Descriptions

   TBD.

9.3. Examples

   TBD.

9.4. Parameter Set Considerations

   Please see section 10 of [RFC3984].

10.  Security Considerations

   Please see section 11 of [RFC3984].

11.  Congestion Control

   Within any given RTP session carrying payload according to this
   specification, the provisions of section 12 of [RFC3984] apply.
   Reducing the session bandwidth is possible by one or more of the
   following means, listed in an order that, in most cases, will assure
   the least negative impact to the user experience:

   a) within the highest Layer identified by the DID field, use the
      TID and/or QID fields in the NAL unit header to drop NAL units
      with lower importance for the decoding process or human
      perception.
   b) drop all NAL units belonging to the highest enhancement Layer as
      identified by the highest DID value.
   c) drop NAL units according to their importance for the decoding
      process, as indicated by the fields in the NAL unit header of the
      NAL units or in the prefix NAL units.
   d) drop NAL units or entire packets without regard to the
      aforementioned rules (media-unaware stream thinning).  This
      results in the reception of a non-compliant bitstream and, most
      likely, in very annoying artifacts.

          Informative note: The discussion above is centered on NAL
          units and not on packets, primarily because that is the level
          where senders can meaningfully manipulate the scalable
          bitstream.  The mapping of NAL units to RTP packets is fairly
          flexible when using aggregation packets.  Depending on the
          nature of the congestion control algorithm, the "dimension"
          of congestion measurement (packet count or bitrate) and
          reaction to it (reducing packet count or bitrate or both) can
          be adjusted accordingly.
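
   Means a) and b) above can be sketched as a simple thinning loop.
   This is an illustration under stated assumptions, not a
   recommended algorithm: the function name thin, the dictionary
   representation of NAL units, the byte budget, and the choice to
   compare TID before QID are all hypothetical.  Repeating step a)
   until the highest Layer is empty has the same effect as step b).

```python
from typing import Dict, List


def thin(nalus: List[Dict[str, int]], budget: int) -> List[Dict[str, int]]:
    """Drop NAL units until their total size fits the byte budget."""
    kept = list(nalus)
    while kept and sum(n["size"] for n in kept) > budget:
        # Work only within the highest Layer, as identified by DID
        top_did = max(n["DID"] for n in kept)
        top_layer = [n for n in kept if n["DID"] == top_did]
        # Higher TID/QID values mean lower importance; TID is compared
        # first here, which is a policy choice of this sketch
        victim = max(top_layer, key=lambda n: (n["TID"], n["QID"]))
        kept.remove(victim)
    return kept
```

   Note that the loop would eventually also drop NAL units of the
   lowest Layer if the budget cannot be met otherwise; a sender may
   prefer to stop earlier.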

   All aforementioned means are available to the RTP sender, regardless
   of whether that sender is located in the sending endpoint or in a
   mixer-based MANE.

   When a translator-based MANE is employed, then the MANE MAY
   manipulate the session only on the MANE's outgoing path, so that the
   sensed end-to-end congestion falls within the permissible envelope.
   Like all translators, the MANE in this case needs to rewrite RTCP RRs
   to reflect the manipulations it has performed on the session.

12.  IANA Consideration

   [Edt. Note: A new MIME type should be registered with IANA.]

13.  Informative Appendix: Application Examples

13.1.     Introduction

   Scalable video coding is a concept that has been around at least
   since MPEG-2 [MPEG2], which goes back as early as 1993.
   Nevertheless, it has never gained wide acceptance; perhaps partly
   because applications didn't materialize in the form envisioned
   during standardization.

   MPEG and JVT, respectively, performed a requirement analysis before
   the SVC project was launched.  Dozens of scenarios have been
   studied.  While some of the scenarios appear not to follow the most
   basic design principles of the Internet -- and are therefore not
   appropriate for IETF standardization -- others are clearly in the
   scope of IETF work.  Of these, this draft chooses the following
   subset for immediate consideration.  Note that we do not reference
   the MPEG and JVT documents directly; partly, because at least the
   MPEG documents have a limited lifespan and are not publicly
   available, and partly because the language used in these documents
   is inappropriately video centric and imprecise, when it comes to
   protocol matters.

   With these remarks, we now introduce three main application
   scenarios that we consider relevant, and that are implementable
   with this specification.

13.2.     Layered Multicast

   This well-understood form of the use of layered coding
   [McCanne/Vetterli] implies that all layers are individually conveyed
   in their own RTP packet streams, each carried in its own RTP session
   using the IP (multicast) address and port number as the single
   demultiplexing point.  Receivers "tune" into the layers by
   subscribing to the IP multicast, normally by using IGMP [IGMP].

   Layered Multicast has the great advantage of simplicity and easy
   implementation.  However, it also has the great disadvantage of
   utilizing many different transport addresses.  While we consider
   this not to be a major problem for a professionally maintained
   content server, receiving client endpoints need to open many ports
   to IP multicast addresses in their firewalls.  This is a practical
   problem from a firewall/NAT viewpoint.  Furthermore, even today IP
   multicast is not as widely deployed as many wish.

   We consider layered multicast an important application scenario for
   three reasons.  First, it is well understood and the implementation
   constraints are well known.  There may well be large scale IP
   networks outside the immediate Internet context that may wish to
   employ layered multicast in the future.  One possible example could
   be a combination of content creation and core-network distribution
   for the various mobile TV services, e.g. those being developed by
   3GPP (MBMS) [MBMS] and DVB (DVB-H) [DVB-H].

13.3.     Streaming of an SVC scalable stream

   In this scenario, a streaming server has a repository of stored SVC
   coded layers for a given content.  At the time of streaming, and
   according to the capabilities, connectivity, and congestion
   situation of the client(s), the streaming server generates and
   serves a scalable stream.  Both unicast and multicast serving are
   possible.  At the same time, the streaming server may use the same
   repository of stored layers to compose different streams (with a
   different set of layers) intended for other audiences.

   As every endpoint receives only a single SVC RTP session, the number
   of firewall pinholes can be optimized to one.

   The main difference between this scenario and straightforward
   simulcasting lies in the architecture and the requirements of the
   streaming server, and is therefore out of the scope of IETF
   standardization.  However, compelling arguments can be made why such
   a streaming server design makes sense.  One possible argument is
   related to storage space and channel bandwidth.  Another is
   bandwidth adaptivity without transcoding -- a considerable advantage
   in a congestion controlled network.  When the streaming server
   learns about congestion, it can reduce sending bitrate by choosing
   fewer layers or utilizing FGS, when composing the layered stream;
   see section 11.  SVC is designed to gracefully support both
   bandwidth rampdown and bandwidth rampup with a considerable dynamic
   range.  This payload format is designed to allow for bandwidth
   flexibility in the mentioned sense, both for CGS and FGS layers.
   While, in theory, a transcoding step could achieve a similar dynamic
   range, the computational demands are impractically high and video
   quality is typically lowered -- therefore, few (if any) streaming
   servers implement full transcoding.

13.4.     Multicast to MANE, SVC scalable stream to endpoint

   This scenario is a bit more complex, and designed to optimize the
   network traffic in a core network, while still requiring only a
   single pinhole in the endpoint's firewall.  One of its key
   applications is the mobile TV market.

   Consider a large private IP network, e.g. the core network of 3GPP.
   Streaming servers within this core network can be assumed to be
   professionally maintained.  We assume that these servers can have
   many ports open to the network and that layered multicast is a real
   option.  Therefore, we assume that the streaming server multicasts
   SVC scalable layers, instead of simulcasting different
   representations of the same content at different bit rates.

   Also consider many endpoints of different classes.  Some of these
   endpoints may not have the processing power or the display size to
   meaningfully decode all layers; others may have these capabilities.
   Users of some endpoints may not wish to pay for high quality and are
   happy with a base service, which may be cheaper or even free.  Other
   users are willing to pay for high quality.  Finally, some connected
   users may have a bandwidth problem in that they can't receive the
   bandwidth they would want to receive -- be it through congestion,
   connectivity, change of service quality, or for whatever other
   reasons.  However, all these users have in common that they don't
   want to be exposed too much, and therefore the number of firewall
   pinholes needs to be small.

   This situation can be handled best by introducing middleboxes close
   to the edge of the core network, which receive the layered multicast
   streams and compose the single SVC scalable bit stream according to
   the needs of the endpoint connected.  These middleboxes are called
   MANEs throughout this specification.  In practice, we envision the
   MANE to be part of (or at least physically and topologically close
   to) the base station of a mobile network, where all the signaling
   and media traffic necessarily are multiplexed on the same physical
   link.  This is why we do not worry too much about decomposition
   aspects of the MANE as such.

   MANEs necessarily need to be fairly complex devices.  They certainly
   need to understand the signaling in order to, for example, associate
   the PT octet in the RTP header with the SVC payload type.

   A MANE may terminate the multicast layered RTP sessions incoming
   from the core network side, and create new RTP sessions (perhaps
   even multicast sessions) to the endpoints connected to them.  In RTP
   terminology, these types of MANEs are RTP mixers.  This implies, per
   RFC 3550, a very loose relationship between the incoming and
   outgoing RTP sessions.  In particular, there is no direct
   relationship between the incoming and outgoing RTP sequence numbers,
   RTP timestamps, payload types used, etc.

   Mixer-based MANEs are conceptually easy to implement and can offer
   powerful features, primarily because they necessarily can "see" the
   payload (including the RTP payload headers), utilize the wealth of
   layering information available therein, and manipulate it.

   While a mixer-based MANE operation in its most trivial form
   (combining multiple RTP packet streams into a single one) can be
   implemented comparatively simply -- reordering the incoming packets
   according to the DON and sending them in the appropriate order --
   more complex forms can also be envisioned.  For example, a
   mixer-type MANE can optimize the outgoing RTP stream for the MTU
   size of the outgoing path by utilizing the aggregation and
   fragmentation mechanisms of this memo.

   A MANE can also act as a translator.  In this case, we envision its
   functionality to be stream thinning, so as to adhere to congestion
   control principles as discussed in section 11.  While the
   implementation of the forward (media) channel of such a MANE
   appears to be comparatively simple, the need to rewrite RTCP RRs
   makes even such a MANE a complex device.

   While the implementation complexity of either case of a MANE, as
   discussed above, is fairly high, the computational demands are
   comparatively low.  In particular, SVC and/or this specification
   contain means to easily generate the correct inter-layer decoding
   order of NAL units.  It is also simple to identify the fine
   granularity scalable bits in a given NAL unit.  No serious
   bit-oriented processing is required and no significant state
   information (beyond that of the signaling and perhaps the SVC
   sequence parameter sets) needs to be kept.

13.5.     Scenarios currently not considered for complexity reasons

   -- vacat --

13.6.     Scenarios currently not considered for being unaligned with
          IP philosophy

   Remarks have been made that the current draft does not take into
   consideration at least one application scenario which some JVT folks
   consider important.  In particular, their idea is to make the RTP
   payload format (or the media stream itself) self-contained enough
   that a stateless, non-signaling-aware device can "thin" an RTP
   session to meet the bandwidth demands of the endpoint.  They call
   this device a "Router" or "Gateway", and sometimes a MANE.
   Obviously, it's not a Router or Gateway in the IETF sense.  To
   distinguish it from a MANE as defined in RFC 3984 and in this
   specification, let's call it an MDfH (Magic Device from Heaven).

   To simplify discussions, let's assume point-to-point traffic only.
   The endpoint has a signaling relationship with the streaming server,
   but it is known that the MDfH is somewhere in the media path (e.g.
   because the physical network topology ensures this).  It has been
   requested, at least implicitly through MPEG's and JVT's requirements
   document, that the MDfH should be capable of intercepting the SVC
   scalable bit stream, modifying it by dropping packets or parts
   thereof, and forwarding the resulting packet stream to the receiving
   endpoint.  It has been requested that this payload specification
   contain protocol elements facilitating such an operation, and the
   argument has been made that the NRI field of RFC 3984 serves exactly
   the same purpose.

   The authors of this I-D do not consider the scenario above to be
   aligned with the most basic design philosophies the IETF follows,
   and therefore have not addressed the comments made (except through
   this section).  In particular, we see the following problems with
   the MDfH approach:

   - At the very minimum, the MDfH would need to know which RTP
     streams are carrying SVC.  We don't see how this could be
     accomplished except by using a static payload type.  None of the
     IETF defined RTP profiles envision static payload types for SVC,
     and even the de-facto profiles developed by some application
     standard organizations (3GPP for example) do not use this
     outdated concept.  Therefore, the MDfH necessarily needs to be at
     least "listening" to the signaling.
   - If the RTP packet payload were encrypted, it would be impossible
     to interpret the payload header and/or the first bytes of the
     media stream.  We understand that there are crypto schemes under
     discussion that encrypt only the last n bytes of an RTP payload,
     but we are more than unsure that this is fully in line with the
     IETF's security vision.

   Even if the two problems above were overcome through
   standardization outside of the IETF, we would still foresee serious
   design flaws:

   - An MDfH can't simply dump RTP packets it doesn't want to forward.
     It either needs to act as a full RTP Translator (implying that it
     rewrites RTCP RRs and such), or it needs to patch the RTP
     sequence numbers to fulfill the RTP specification.  Not doing
     either would, for the receiver, look like the gaps in the
     sequence numbers occurred due to unintentional erasures, which
     has interesting effects on congestion control (if implemented),
     will break pretty much every meta-payload ever developed, and so
     on.  (Many more points could be made here).
   - An MDfH also can't "prune" FGS packets.  Again, doing so would
     not be compatible with meta-payloads, and would mess up RTCP RRs
     and congestion control (if the congestion control is based on
     octet count and not on packet count; there are discussions
     related to the former at least in the context of TFRC).
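
   The sequence number problem named in the first bullet can be made
   concrete with a small sketch.  It is illustrative only and not part
   of this specification: the function names and the representation of
   a packet as a (sequence number, keep) pair are assumptions of this
   example, and the gap counter assumes in-order arrival without
   duplicates.  It shows that a receiver cannot distinguish intentional
   dropping from loss unless the forwarding device renumbers the
   packets it forwards.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide and wrap around


def thin_without_renumbering(packets):
    """Forward only packets marked keep=True, leaving numbers as sent."""
    return [seq for seq, keep in packets if keep]


def thin_with_renumbering(packets):
    """Forward kept packets with contiguous outgoing sequence numbers.

    The first forwarded packet keeps its original number (an arbitrary
    choice for this sketch); later ones are numbered consecutively.
    """
    out = []
    next_seq = None
    for seq, keep in packets:
        if not keep:
            continue
        if next_seq is None:
            next_seq = seq
        out.append(next_seq)
        next_seq = (next_seq + 1) % SEQ_MOD
    return out


def apparent_losses(received):
    """Count packets a receiver would assume lost from sequence gaps.

    Assumes in-order arrival and no duplicates (simplification).
    """
    lost = 0
    for prev, cur in zip(received, received[1:]):
        lost += (cur - prev - 1) % SEQ_MOD
    return lost


# Five packets around the 16-bit wrap; the device drops two of them.
packets = [(65534, True), (65535, False), (0, True), (1, False), (2, True)]
plain = thin_without_renumbering(packets)
patched = thin_with_renumbering(packets)
```

   In this example the receiver of the unpatched stream sees two
   apparent losses, while the renumbered stream shows none.  Note that
   renumbering addresses only the sequence number gaps; the RTCP RR
   issues described above remain.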

   In summary, based on our current knowledge we are not willing to
   specify protocol mechanisms that support an operation point that has
   so little in common with classic RTP use.

13.7.     SSRC Multiplexing

   The authors have contemplated the idea of introducing SSRC
   multiplexing, i.e. allowing multiple RTP packet streams containing
   layers to be sent in the same RTP session, differentiated by SSRC
   values.  Our intention was to minimize the number of firewall
   pinholes in an endpoint to one, by using MANEs to aggregate multiple
   outgoing sessions stemming from a server into a single session (with
   SSRC multiplexed packet streams).  We were hoping that this would be
   feasible even with encrypted packets in an SRTP context.

   While an implementation along these lines indeed appears to be
   feasible for the forward media path, the RTCP RR rewrite cannot be
   implemented in the way necessary for this scheme to work.  This
   relates to the need to authenticate the RTCP RRs as per SRTP
   [RFC3711].  While the RTCP RR itself does not need to be rewritten
   by the scheme we envisioned, its transport addresses need to be
   manipulated.  This, in turn, is incompatible with the mandatory
   authentication of RTCP RRs.  As a result, there would be a
   requirement that a MANE needs to be in the RTCP security context of
   the sessions, which was not envisioned in our use case.

   As the envisioned use case cannot be implemented, we refrained from
   adding the considerable document complexity needed to support SSRC
   multiplexing herein.

14.  References

14.1.     Normative References

[RFC3550]   Schulzrinne, H., Casner, S., Frederick, R., and V.
            Jacobson, "RTP: A Transport Protocol for Real-Time
            Applications", STD 64, RFC 3550, July 2003.
[MPEG4-10]  ISO/IEC International Standard 14496-10:2005.
[H.264]     ITU-T Recommendation H.264, "Advanced video coding for
            generic audiovisual services", Version 4, July 2005.
[I-D.schierl-mmusic-layered-codec]
            Schierl, T. and S. Wenger, "Signaling media decoding
            dependency in Session Description Protocol (SDP)",
            draft-schierl-mmusic-layered-codec-04 (work in progress),
            June 2007.
[SVC]       Joint Video Team, "Joint Draft 11 of SVC Amendment",
            available from http://ftp3.itu.ch/av-arch/jvt-site
            /2007_06_Geneva/JVT-X201.zip, Geneva, Switzerland, June
            2007.
[RFC3984]   Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund,
            M., and D. Singer, "RTP Payload Format for H.264 Video",
            RFC 3984, February 2005.

[RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
            Requirement Levels", BCP 14, RFC 2119, March 1997.

14.2.     Informative References

[DVB-H]     DVB - Digital Video Broadcasting (DVB); DVB-H
            Implementation Guidelines, ETSI TR 102 377, 2005.
[H.241]     ITU-T Rec. H.241, "Extended video procedures and control
            signals for H.300-series terminals", May 2006.
[IGMP]      Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A.
            Thyagarajan, "Internet Group Management Protocol,
            Version 3", RFC 3376, October 2002.
[McCanne/Vetterli]
            McCanne, S., Jacobson, V., and M. Vetterli,
            "Receiver-driven layered multicast", in Proc. of ACM
            SIGCOMM'96, pages 117--130, Stanford, CA, August 1996.
[MBMS]      3GPP - Technical Specification Group Services and System
            Aspects; Multimedia Broadcast/Multicast Service (MBMS);
            Protocols and codecs (Release 6), December 2005.
[MPEG2]     ISO/IEC International Standard 13818-2:1993.
[RFC3711]   Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
            Norrman, "The Secure Real-time Transport Protocol
            (SRTP)", RFC 3711, March 2004.

15.  Authors' Addresses

   Stephan Wenger                 Phone: +1-650-862-7368
   Nokia                          Email: stewe@stewe.org
   955 Page Mill Road
   Palo Alto, CA 94304
   USA

   Ye-Kui Wang                    Phone: +358-50-486-7004
   Nokia Research Center          Email: ye-kui.wang@nokia.com
   P.O. Box 100
   FIN-33721 Tampere
   Finland

   Thomas Schierl                 Phone: +49-30-31002-227
   Fraunhofer HHI                 Email: schierl@hhi.fhg.de
   Einsteinufer 37
   D-10587 Berlin
   Germany

16.  Copyright Statement

   Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

17.  Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

18.  Intellectual Property Statement

   Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

19.  Acknowledgement

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).
   Further, the author Thomas Schierl of Fraunhofer HHI is sponsored
   by the European Commission under the contract number
   FP6-IST-0028097, project ASTRALS.

20.  RFC Editor Considerations

   none

21.  Open Issues

   1. Packetization rules need work.
   2. Alignment with the SVC specification (ongoing)


22.  Changes Log

Version 00

- 29.08.2005, YkW: Initial version
- 29.09.2005, Miska: Reviewed and commented throughout the document
- 05.10.2006, StW: Editorial changes throughout the document, and
formatted the document in RFC payload format style

From -00 to -01

- 04.02.2006, StW: Added details to scope

- 04.02.2006, StW: Added short subsection 6.1 "Design Principles"
- 04.02.2006, StW: Added section 15, "Application Examples"
- 06.02 - 03.03.2006, YkW: Various modifications throughout the
document
- 13.02.2006 - 03.03.2006, ThS: Added definitions and additional
information to section 3.3, 5.1, 7 and 8, parameters in section 9.1 and
added section 14 for NAL unit re-ordering for layered multicast.
Further modifications throughout the document

From -01 to -02

- 06.03.2006, StW: Editorial improvements
- 26.05.2006, YkW: Updated NAL unit header syntax and semantics
according to the latest draft SVC spec
- 20.06.2006, Miska/YkW: Added section 6.10 "Payload Content
Scalability Information (PACSI) NAL Unit"
- 20.06.2006, YkW: Updated the NAL unit reordering process for layered
multicast (removed the old section 14 "Informative Appendix: NAL Unit
Re-ordering for Layered Multicast" and added the new section 13 "NAL
Unit Reordering for Layered Multicast")

From -02 to -03

- 05.09.2006, YkW: Updated the NAL unit header syntax, definitions,
etc., according to the foreseen July JVT output.  Updated possible MANE
adaptation operations according to SPID, TL, DID and QL.  Clarified the
removal of single NAL unit packetization mode.  Added the support of
SSRC multiplexing in layered multicast.
- 08.09.2006, StW: Editorial changes throughout the document
- 08.09.2006, YkW: Added the packetization rule for suffix NAL unit.
- 19.09.2006, YkW: Moved/updated SSRC multiplexing support to section
6.2 "RTP header usage". Moved/updated the cross layer DON constraint
to Section 6.6 "Decoding order number". Moved/updated the
packetization rule when an SVC bitstream is transported over more than
one RTP session to Section 7 "Packetization rules". Removed Section
13 "Support of layered multicast".
- 16.10, TS: Added detailed four-byte NAL unit header description.
Changed "AVC" to "H.264" conforming to 3984. Modifications throughout
the document. Extended description of 3rd byte of PACSI NAL unit.
Corrected terms RTP session and RTP packet stream in case of SSRC
multiplexing. Added terms in definition section on RTP multiplexing.
Constraints on optional MIME parameters of 3984 for cross-layer DON
(DON section and MIME parameters). Copied parts of SI paper regarding
mixer, translator and SSRC mux with SRTP to section application
examples. Added section on SDP usage with Session and SSRC
multiplexing. Added points in Design principles on translator/mixer and
RTP multiplexing. Added additional funding information in
Acknowledgement section. Corrected reference for SVC and added
reference for generic signaling.
- 17.10, StW: Fixed many editorials, clarified MANE, mixer, translator
and RTP packet stream throughout doc (hopefully consistently)
- 18.10, removed comments, clarified B-Bit, changed definition of base
layer (does not need to be of the lowest temporal resolution)
From -03 to draft-ietf-avt-rtp-svc-00

   - 23.11.06, StW: Editorials throughout the memo
   - 23.11.06, StW: removed all occurrences of the security
     discussions, as they are incorrect.  When using SRTP, the RTCP is
     authenticated, implying that a translator cannot rewrite RTCP
     RRs, implying that RRs would be incorrect as soon as the session
     is modified (i.e. packets are being removed), implying that SSRC-
     mux does not work in multicast.
   - 23.11.06, StW: rewrote congestion control
   - 23.11.06, StW: removed application scenario related to SRTP, as
     this does not work (see above)
   - 23.11.06, StW: added informative reference to H.241
   - 27/29.11.06, YkW: editorial changes throughout the document
   - 27/29.11.06, YkW: alignment with the SVC specification
   - 19.12.06, TS:
     TS: [SVC] is now the complete Joint Draft of H.264
     TS: Removed SSRC Multiplexing
     TS: Changed use cases for MANE as a translator
     TS: Editorials throughout the document, alignment with SVC spec.
   - 20-28.12.06, StW/TS/YkW: editorial changes throughout the
     document

From draft-ietf-avt-rtp-svc-00 to draft-ietf-avt-rtp-svc-01
   - 23.02.07, YkW/Miska Hannuksela: Added enhancements to PACSI NAL
     unit

   - 01.03.07, Jonathan Lennox/YkW: Added recommendatory packetization
     rules for SEI messages and non-VCL NAL units
   - 05.03.07, Thomas Wiegand/YkW: Added the fields of picture start,
     picture end, and Tl0PicIdx to PACSI NAL unit
   - 05.03.07, TS: Draft conforms to new I-D style

From draft-ietf-avt-rtp-svc-01 to draft-ietf-avt-rtp-svc-02
     25-June-2007: TS
     Clarified definitions Layer, Operation Points,
     Removed FGS
     Aligned with JVT-W201 spec
     Use of DON in de-packetization
     Congestion control
     25-June-2007: YkW
     Edit throughout the spec, aligned with JVT-X201 SVC spec
     09-July-2007: TS
     Further modifications and alignments with JVT-X201.

