Audio/Video Payload WG                                     T. Schierl
Internet Draft                                         Fraunhofer HHI
Intended status: Standards track                            S. Wenger
Expires: August 2012                                            Vidyo
                                                           Y.-K. Wang
                                                             Qualcomm
                                                     M. M. Hannuksela
                                                                Nokia
                                                    February 27, 2012


            RTP Payload Format for High Efficiency Video Coding
                   draft-schierl-payload-rtp-h265-00.txt


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 27, 2012.

Copyright and License Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.


Schierl, et al         Expires August 27, 2012                [Page 1]

Internet-Draft       RTP Payload Format for HEVC          February 2012


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.


Abstract

   This memo describes an RTP payload format for High Efficiency Video
   Coding (HEVC) [HEVC], which is currently being developed by the
   Joint Collaborative Team on Video Coding (JCT-VC).  The RTP payload
   format allows for packetization of one or more Network Abstraction
   Layer (NAL) units in each RTP packet payload, as well as
   fragmentation of a NAL unit into multiple RTP packets.  Furthermore,
   it supports transmission of an HEVC stream over a single as well as
   multiple RTP flows.  The payload format has wide applicability in
   videoconferencing, Internet video streaming, and high bit-rate
   entertainment-quality video, among others.


Table of Contents

   Status of this Memo ............................................. 1
   Abstract ........................................................ 3
   Table of Contents ............................................... 3
   1 . Introduction ................................................ 5
      1.1 . The HEVC Codec.......................................... 5
         1.1.1 Overview ............................................ 5
         1.1.2 Parallel Processing Support ......................... 6
         1.1.3 Parameter Sets  ..................................... 9
         1.1.4  NAL Unit Header .................................... 9
      1.2 . Overview of the Payload Format ........................ 11
   2 . Conventions ................................................ 12
   3 . Definitions and Abbreviations .............................. 12
      3.1 Definitions ............................................. 12
         3.1.1 Definitions from the HEVC Specification ............ 12
         3.1.2 Definitions Specific to This Memo .................. 13
      3.2 Abbreviations ........................................... 14
   4 . RTP Payload Format ......................................... 14
      4.1 RTP Header Usage......................................... 14
      4.2 NAL Unit Header Usage ................................... 16
      4.3 Payload Structures ...................................... 16
      4.4 Transmission Modes ...................................... 17
      4.5 Packetization Modes ..................................... 17
      4.6 Decoding Order .......................................... 18
      4.7 Aggregation Packets ..................................... 20
         4.7.1 Single Time Aggregation Packet (STAP) .............. 22
      4.8 Fragmentation Units (FUs) ............................... 24
   5 . Packetization Rules ........................................ 28
      5.1 Common Packetization Rules .............................. 28
      5.2 Non-Interleaved mode .................................... 29
      5.3 Interleaved mode......................................... 29
   6 . De-Packetization Process  .................................. 29
      6.1 Non-Interleaved Mode .................................... 30
      6.2 Interleaved Mode......................................... 30
         6.2.1 Size of the De-interleaving Buffer ................. 30
         6.2.2 De-interleaving Process ............................ 31
      6.3 Additional De-Packetization Guidelines .................. 33
   7 . Payload Format Parameters  ................................. 33
      7.1 Media Type Registration  ................................ 34
      7.2 SDP Parameters .......................................... 39
         7.2.1 Mapping of Payload Type Parameters to SDP .......... 39
         7.2.2 Usage with the SDP Offer/Answer Model .............. 39
         7.2.3 Usage with SDP Offer/Answer Model .................. 40
         7.2.4 Usage in Declarative Session Descriptions .......... 40
         7.2.5 Signaling of Parallel Processing ................... 40
      7.3 Examples ................................................ 41
      7.4 Parameter Set Considerations ............................ 41
   8 . Security Considerations  ................................... 41
   9 . Congestion Control ......................................... 41
   10 . IANA Consideration......................................... 41
   11 . Informative Appendix: Application Examples ................ 41
      11.1 Introduction ........................................... 41
      11.2 Streaming .............................................. 41
      11.3 Videoconferencing (Unicast to MANE, Unicast to Endpoints)41
      11.4 Mobile TV (Multicast to MANE, Unicast to Endpoint) ..... 41
   12 . Acknowledgements .......................................... 41
   13 . References ................................................ 42
      13.1 Normative References ................................... 42
      13.2 Informative References ................................. 42
   14 . Authors' Addresses......................................... 43


1. Introduction

1.1. The HEVC Codec

1.1.1 Overview

   High Efficiency Video Coding [HEVC] is a forthcoming video coding
   standard under development by the Joint Collaborative Team on Video
   Coding (JCT-VC) formed by the ITU-T and ISO/IEC. It is reported to
   provide significant coding efficiency gains over H.264 [H.264].
   The standard will be found under ISO/IEC as ISO/IEC 23008-2,
   informally as MPEG H Part 2. ITU-T may decide soon on the final
   recommendation number.

   H.264 and HEVC share a similar hybrid video codec design.
   Conceptually, both technologies include a video coding layer (VCL)
   and a network abstraction layer (NAL).

   The VCL of HEVC includes a prediction stage that involves motion
   compensation and spatial intra-prediction, integer transforms
   applied to prediction residuals, and an entropy coding stage that
   uses arithmetic coding. As in H.264, in-loop deblocking filtering
   is applied to the reconstructed picture.

   An important difference of HEVC compared to H.264 is the coding
   structure within a picture. In HEVC each picture is divided into
   treeblocks of up to 64x64 luma samples.  Treeblocks can be
   recursively split into smaller Coding Units (CUs) using a generic
   quad-tree segmentation structure. CUs can be further split into
   Prediction Units (PUs) used for intra- and inter-prediction and
   Transform Units (TUs) defined for transform and quantization.  HEVC
   includes integer transforms for a number of TU sizes.  HEVC also
   includes two new in-loop filters that may be applied after the
   deblocking filtering: Sample Adaptive Offset (SAO) and Adaptive Loop
   Filter (ALF).

   For random access, HEVC introduces, in addition to Instantaneous
   Decoder Refresh (IDR) pictures, a Clean Random Access (CRA) picture,
   which is similar to what has conventionally been called an open
   Group-of-Pictures (GOP) intra picture.  Whereas in H.264 such a
   picture may be signalled using a recovery point Supplemental
   Enhancement Information (SEI) message, in HEVC a distinct NAL unit
   type is used to indicate a CRA picture.  Furthermore, HEVC specifies
   that a conforming bitstream may start with a CRA picture, whereas in
   H.264 a conforming bitstream must start with an IDR picture.

   Temporal layer access (TLA) pictures were introduced in HEVC to
   indicate temporal layer switching points.

   Predictively coded pictures can include uni-predicted and bi-
   predicted slices.  The flexibility in creating picture coding
   structures is roughly comparable to H.264.

   The VCL generates and consumes syntax structures designed to be
   adaptable to MTU sizes commonly found in IP networks, irrespective
   of the size of a coded picture.  Picture segmentation is achieved
   through slices.  A concept of "fine granularity slices" (FGS) is
   included that allows slice boundaries to be created within a
   treeblock.

   The Network Abstraction Layer (NAL) is responsible for information
   required for the decoding process of more than one slice, which is
   collected in parameter sets.  Information not strictly required for
   the decoding process, but potentially helpful in decoding systems,
   can be conveyed in data structures such as Supplemental Enhancement
   Information (SEI) messages, access unit delimiters, and so on.

   All the aforementioned MTU-sized (or smaller) data structures are
   available in the form of Network Abstraction Layer (NAL) units.

   The single distinguishing difference between HEVC and H.264 with
   respect to the RTP payload format design is the availability of VCL-
   based coding tools that are specifically designed to enable
   processing on high-level parallel architectures.  These tools are
   described below in sufficient detail to provide motivation for the
   parallel processing signaling support that is described in section
   7.2.5.

1.1.2 Parallel Processing Support

   The reportedly significantly higher computational demand of HEVC
   over H.264, in conjunction with the ever increasing video resolution
   (both spatially and temporally) required by the market, led to the
   adoption of VCL coding tools specifically targeted to allow for
   parallelization on the sub-picture level.  That is, parallelization
   occurs, at the minimum, at the granularity of an integer number of
   treeblocks. The targets for this type of high-level parallelization
   are multicore CPUs and DSPs as well as multiprocessor systems.  In a
   system design, to be useful, these tools require signaling support,
   which is provided in section 7.2.5 of this memo.  This section
   provides a brief overview of the tools available in [HEVC].  This
   section is expected to be updated frequently as the HEVC draft
   evolves.

   For parallelization, four picture partition strategies are
   available.

   Regular slices are segments of the bitstream that can be
   reconstructed independently from other regular slices within the
   same picture (though there may still be interdependencies through
   loop filtering operations).  Regular slices are the only tool that
   can be used for parallelization that is also available, in virtually
   identical form, in H.264.  Parallelization based on regular slices
   does not require much inter-processor or inter-core communication
   (except for inter-processor or inter-core data sharing for motion
   compensation when decoding a predictively coded picture, which is
   typically much heavier than the inter-processor or inter-core data
   sharing needed for in-picture prediction), as slices are designed
   to be independently decodable.  However, for the same reason,
   regular slices can require some coding overhead.  Further, regular
   slices (in contrast to some of the other tools mentioned below)
   also serve as the key mechanism for bitstream partitioning to match
   MTU size requirements, because regular slices are independent
   within a picture and each regular slice is encapsulated in its own
   NAL unit.  In many cases, the goal of parallelization and the goal
   of MTU size matching can place contradicting demands on the slice
   layout in a picture.  The realization of this situation led to the
   development of the more advanced tools mentioned below.  This
   payload format does not contain any specific mechanisms aiding
   parallelization through regular slices.

   Entropy slices, like regular slices, break entropy decoding
   dependencies but allow prediction (and filtering) to cross slice
   boundaries.  As such, they can be used as a lightweight mechanism to
   parallelize entropy decoding without affecting other decoding
   steps.  They are lightweight because, although each entropy slice
   is encapsulated in its own NAL unit, it has a much shorter slice
   header: most of the slice header syntax elements are not present
   and must be inherited from the preceding full slice header.  Due to
   the allowance of in-picture prediction between neighboring entropy
   slices within a picture, the required inter-processor/inter-core
   communication to enable in-picture prediction can be substantial.
   For the same reason, entropy slices cannot be used for MTU size
   matching.  Entropy slices appear to be useful only for system
   architectures that execute the entropy decoding process on a
   multicore/multi-CPU architecture but execute the remaining decoding
   functionality on dedicated signal processing hardware.  At the time
   of writing, entropy slices are not included in any profile defined
   in draft HEVC.  No support for entropy slices is included in this
   memo.

   In Wavefront Parallel Processing (WPP), the picture is partitioned
   into rows of treeblocks.  Entropy decoding and prediction are
   allowed to use data from treeblocks in other partitions.  Parallel
   processing is possible through parallel decoding of rows of
   treeblocks, where the start of the decoding of a row is delayed by
   two treeblocks, so as to ensure that data related to a treeblock
   above and to the right of the subject treeblock is available before
   the subject treeblock is decoded.  Using this staggered start
   (which appears like a wavefront when represented graphically),
   parallelization is possible with up to as many processors/cores as
   the picture contains treeblock rows.  At the time of writing, the
   draft HEVC includes a mechanism to organize the coded bits of
   different treeblock rows to be friendly to a particular number of
   parallel processors/cores.  For example, it is possible that coded
   bits of even-numbered treeblock rows (treeblock rows 0, 2, 4, ...)
   all come before coded bits of odd-numbered treeblock rows (treeblock
   rows 1, 3, 5, ...), such that the bitstream is friendly to two
   parallel processors/cores, even though decoding of an earlier-coming
   treeblock row (e.g., treeblock row 2) refers to a later-coming
   treeblock row (e.g., treeblock row 1).  As with entropy slices, due
   to the allowance of in-picture prediction between neighboring
   treeblock rows within a picture, the required inter-processor/
   inter-core communication to enable in-picture prediction can be
   substantial.  Wavefront parallel processing partitioning does not
   result in more NAL units than when it is not applied; thus,
   wavefront parallel processing cannot be used for MTU size matching.
   At the time of
   writing, wavefront parallel processing is not included in any
   profile of draft HEVC.  This memo does not specify support for it.
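
   The staggered start described above can be illustrated with a small
   scheduling sketch.  It is not part of [HEVC] or of this payload
   format and rests on simplifying assumptions: every treeblock takes
   one time step to decode, one core is assigned to each treeblock
   row, and a treeblock may start once its left neighbor and the
   treeblock above and to the right of it are finished.  The function
   names are hypothetical.

```python
def wpp_schedule(rows, cols):
    """Earliest start step of every treeblock under the idealized model.

    Assumptions (not taken from the HEVC draft): one time step per
    treeblock, one core per treeblock row.
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = 0
            if c > 0:
                # Left neighbor in the same row must be finished.
                ready = max(ready, start[r][c - 1] + 1)
            if r > 0:
                # Treeblock above and to the right must be finished;
                # clamp at the last column of the picture.
                above_right = min(c + 1, cols - 1)
                ready = max(ready, start[r - 1][above_right] + 1)
            start[r][c] = ready
    return start


def wpp_total_steps(rows, cols):
    """Number of time steps until the last treeblock is decoded."""
    return wpp_schedule(rows, cols)[-1][-1] + 1


if __name__ == "__main__":
    for row in wpp_schedule(3, 4):
        print(row)
    print(wpp_total_steps(3, 4), "steps instead of", 3 * 4)
```

   Under these assumptions, each row starts two steps after the row
   above it, so a picture of 3 rows by 4 columns finishes in 8 steps
   rather than the 12 steps of purely sequential decoding.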

   Tiles define horizontal and vertical boundaries that partition a
   picture into tile columns and rows.  The scan order of treeblocks is
   changed to be local within a tile (in the order of a treeblock
   raster scan of a tile), before decoding the top-left treeblock of
   the next tile in the order of tile raster scan of a picture.
   Similar to regular slices, tiles break in-picture prediction
   dependencies (including entropy decoding dependencies).  However,
   they do not need to be included in individual NAL units (the same
   as wavefront parallel processing in this regard); hence, tiles
   cannot be used for MTU size matching.  Each tile can be processed by
   one processor/core, and the inter-processor/inter-core communication
   required for in-picture prediction between processing units decoding
   neighboring tiles is limited to conveying the shared slice header,
   in cases where a slice spans more than one tile, and to sharing of
   reconstructed samples and metadata related to loop filtering.  As
   such, tiles are less demanding in terms of memory bandwidth than
   wavefront parallel processing due to the in-picture independence
   between two neighboring partitions.  Tiles are included in the
   (single) existing profile of
   [HEVC] and the support in the context of this memo will be specified
   in section 7 of this memo.
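
   The change in treeblock scan order caused by tiles can be sketched
   as follows.  The example is illustrative only: the function name
   and its parameters (tile column widths and tile row heights, both
   counted in treeblocks) are assumptions made for this sketch and are
   not syntax elements of [HEVC].

```python
def tile_scan_order(col_widths, row_heights):
    """List picture raster-scan treeblock addresses in decoding order.

    Tiles are visited in tile raster scan of the picture; inside each
    tile, treeblocks are visited in raster scan of that tile.
    """
    pic_width = sum(col_widths)
    order = []
    y0 = 0
    for height in row_heights:          # tile rows, top to bottom
        x0 = 0
        for width in col_widths:        # tile columns, left to right
            for y in range(y0, y0 + height):
                for x in range(x0, x0 + width):
                    order.append(y * pic_width + x)
            x0 += width
        y0 += height
    return order


if __name__ == "__main__":
    # A picture 4 treeblocks wide and 2 high, split into two tile columns.
    print(tile_scan_order([2, 2], [2]))
```

   With a single tile, the result is the ordinary raster scan of the
   picture; with two tile columns, the left tile is traversed
   completely before the first treeblock of the right tile.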

   The interaction between regular slices and tiles is simplified by
   constraints of the HEVC draft.  Specifically, for each slice and
   tile, either or both of the following conditions must be fulfilled:
   1) all coded blocks in a slice belong to the same tile; 2) all coded
   blocks in a tile belong to the same slice.

1.1.3 Parameter Sets

   The parameter set concept is borrowed from [H.264].  In addition to
   Sequence Parameter Sets (SPS), carrying data valid for the whole
   video sequence, and Picture Parameter Sets (PPS), carrying
   information valid on a picture-by-picture basis, the new Adaptation
   Parameter Set (APS) carries picture-adaptive information that is
   also valid on a picture-by-picture basis but is expected to change
   (typically much) more frequently than the information in the PPS.

1.1.4 NAL Unit Header

   HEVC maintains the NAL unit concept of H.264 with modifications.
   HEVC uses a two-byte NAL unit header.  Table 1 lists the allocation
   of NAL unit types for VCL NAL units and non-VCL NAL units.


                   Table 1.  NAL unit types in HEVC

      Type    NAL Unit Name                        NAL unit type class
      ----------------------------------------------------------------
       0      Unspecified                                non-VCL
       1      Coded slice of a non-IDR, non-CRA          VCL
                          and non-TLA picture
       2      Reserved                                   -
       3      Coded slice of a TLA picture               VCL
       4      Coded slice of a CRA picture               VCL
       5      Coded slice of an IDR picture              VCL
       6      Supplemental enhancement information (SEI) non-VCL
       7      Sequence parameter set                     non-VCL
       8      Picture parameter set                      non-VCL
       9      Access unit delimiter                      non-VCL
      10..11  Reserved                                   -
      12      Filler data                                non-VCL
      13      Reserved                                   -
      14      Adaptation parameter set                   non-VCL
      15..23  Reserved                                   -
      24..63  unspecified                                non-VCL
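
   A receiver or MANE that needs to distinguish VCL from non-VCL NAL
   units could encode Table 1 as a simple lookup.  The following
   Python sketch is illustrative only; the function name is
   hypothetical, and the value ranges simply mirror Table 1 as it
   stands in this draft.

```python
def nal_unit_type_class(nal_unit_type):
    """Return 'VCL', 'non-VCL', or 'reserved' according to Table 1."""
    if not 0 <= nal_unit_type <= 63:
        raise ValueError("nal_unit_type is a 6-bit value (0..63)")
    if nal_unit_type in (1, 3, 4, 5):
        # Coded slices of non-IDR/non-CRA/non-TLA, TLA, CRA, IDR pictures.
        return "VCL"
    if nal_unit_type in (2, 10, 11, 13) or 15 <= nal_unit_type <= 23:
        return "reserved"
    # 0, 6..9, 12, 14, and 24..63 are listed as non-VCL in Table 1.
    return "non-VCL"


if __name__ == "__main__":
    for t in (1, 5, 7, 14, 20, 40):
        print(t, nal_unit_type_class(t))
```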

   The syntax and semantics of the NAL unit header are specified in
   [HEVC], but the essential properties of the NAL unit header are
   summarized below for convenience.

   The first byte of the NAL unit header has the following format:

         +---------------+
         |0|1|2|3|4|5|6|7|
         +-+-+-+-+-+-+-+-+
         |F|N|   Type    |
         +---------------+

   The semantics of the components of the NAL unit header octets, as
   specified in [HEVC], are described briefly below.  In addition to
   the name and size of each field, the corresponding syntax element
   name in [HEVC] is also provided.

   F: 1 bit
      forbidden_zero_bit.  HEVC declares a value of 1 as a syntax
      violation.  Note: the bit is wasted for compatibility with MPEG-2
      transport systems.

   N: 1 bit
      nal_ref_flag.  A value of 0 indicates that the content of the NAL
      unit is not used to reconstruct reference pictures for future
      prediction.  Such NAL units can be discarded without potentially
      damaging the integrity of the reference pictures.  A value of 1
      indicates that the decoding of the NAL unit is required to
      maintain the integrity of reference pictures or that the NAL unit
      contains a parameter set.

   Type: 6 bits
      nal_unit_type.  This component specifies the NAL unit type as
      defined in Table 7-1 of [HEVC], and in Table 1 in this memo.  For
      a reference of all currently defined NAL unit types and their
      semantics, please refer to Section 7.4.1 in [HEVC].

      In NAL units specified by HEVC, the second octet in the NAL unit
      header is shown below.

            +---------------+
            |0|1|2|3|4|5|6|7|
            +-+-+-+-+-+-+-+-+
            | TID |    R    |
            +---------------+


   TID: 3 bits
      temporal_id.  This component indicates the temporal identifier of
      the NAL unit in the coded sequence.  For IDR pictures or CRA
      pictures the value is 0.  For TLA pictures the value of
      temporal_id must be greater than 0.

   R: 5 bits
      reserved_one_5bits.  Reserved bits for future extension (such as
      scalability and three-dimensional video extensions).  R MUST be
      equal to "00001" (in binary form).  Decoders must ignore (i.e.,
      remove from the bitstream and discard) NAL units with values of
      reserved_one_5bits not equal to "00001".

   This memo extends the semantics of F, N, and TID, as described in
   Section 4.2.
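
   The two header octets described above can be unpacked with a few
   shifts and masks.  The following Python sketch is illustrative
   only and assumes the layout shown in the two figures, with bit 0
   being the most significant bit of each octet.  The names NalHeader
   and parse_nal_header are hypothetical; the field names follow the
   syntax element names quoted above.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int   # F, 1 bit
    nal_ref_flag: int         # N, 1 bit
    nal_unit_type: int        # Type, 6 bits
    temporal_id: int          # TID, 3 bits
    reserved_one_5bits: int   # R, 5 bits


def parse_nal_header(data: bytes) -> NalHeader:
    """Split the two-octet NAL unit header into its fields."""
    if len(data) < 2:
        raise ValueError("NAL unit header needs two octets")
    b0, b1 = data[0], data[1]
    return NalHeader(
        forbidden_zero_bit=b0 >> 7,
        nal_ref_flag=(b0 >> 6) & 0x01,
        nal_unit_type=b0 & 0x3F,
        temporal_id=b1 >> 5,
        reserved_one_5bits=b1 & 0x1F,
    )


if __name__ == "__main__":
    # N=1, Type=5 (coded slice of an IDR picture), TID=0, R=00001.
    print(parse_nal_header(bytes([0x45, 0x01])))
```

   A caller would typically check that forbidden_zero_bit is 0 and
   that reserved_one_5bits equals 1 before handing the NAL unit to a
   decoder.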

1.2. Overview of the Payload Format

   This payload format defines the following processes required for
   transport of HEVC coded data over RTP [RFC3550]:

   o Usage of RTP header with this payload format

   o Packetization of HEVC coded NAL units into RTP packets

   o Transmission of HEVC NAL units of the same bitstream within a
      single RTP session or within multiple RTP sessions

   o Payload format parameters to be used within the Session
      Description Protocol (SDP) [RFC4566].

2. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

   This specification uses the notion of setting and clearing a bit
   when bit fields are handled.  Setting a bit is the same as assigning
   that bit the value of 1 (On).  Clearing a bit is the same as
   assigning that bit the value of 0 (Off).

3. Definitions and Abbreviations

3.1 Definitions

   This document uses the terms and definitions of [HEVC].  Section
   3.1.1 lists relevant definitions copied from [HEVC] for convenience.
   Section 3.1.2 gives definitions specific to this memo.

3.1.1 Definitions from the HEVC Specification

      access unit: A set of NAL units that are consecutive in decoding
      order and contain exactly one coded picture. In addition to the
      coded slice NAL units of the coded picture, the access unit may
      also contain other NAL units not containing slices of the coded
      picture.  The decoding of an access unit always results in a
      decoded picture.

      coded video sequence: A sequence of access units that consists,
      in decoding order, of an IDR access unit followed by zero or more
      non-IDR access units including all subsequent access units up to
      but not including any subsequent IDR access unit.

      CRA access unit: An access unit in which the coded picture is a
      CRA picture.

      CRA picture: A coded picture containing only I slices and for
      which each slice has nal_unit_type equal to 4; all coded pictures
      that follow the Clean Random Access (CRA) picture both in
      decoding order and output order shall not use inter prediction
      from any picture that precedes the CRA picture either in decoding
      order or output order; and any picture that precedes the CRA
      picture in decoding order also precedes the CRA picture in output
      order.

      IDR access unit: An access unit in which the coded picture is an
      IDR picture.

      IDR picture: A coded picture for which the variable IdrPicFlag is
      equal to 1. An IDR picture causes the decoding process to mark
      all reference pictures as "unused for reference". All coded
      pictures that follow an IDR picture in decoding order can be
      decoded without inter prediction from any picture that precedes
      the IDR picture in decoding order.  The first picture of each
      coded video sequence in decoding order is an IDR picture.

      Random Access: The act of starting the decoding process for a
      bitstream at a point other than the beginning of the stream.

      Tile: An integer number of treeblocks co-occurring in one column
      and one row (each of which comprising one or more columns or rows
      of treeblocks), ordered consecutively in treeblock raster scan of
      the tile.  The division of each picture into tiles is a
      partitioning. Tiles in a picture are ordered consecutively in
      tile raster scan of the picture.  Although a slice contains
      treeblocks that are consecutive in treeblock raster scan of a
      tile, these treeblocks are not necessarily consecutive in
      treeblock raster scan of the picture.

3.1.2 Definitions Specific to This Memo

      media aware network element (MANE): A network element, such as a
      middlebox or application layer gateway that is capable of parsing
      certain aspects of the RTP payload headers or the RTP payload and
      reacting to their contents.

         Informative note: The concept of a MANE goes beyond normal
         routers or gateways in that a MANE has to be aware of the
         signaling (e.g., to learn about the payload type mappings of
         the media streams), and in that it has to be trusted when
         working with SRTP.  The advantage of using MANEs is that they
         allow packets to be dropped according to the needs of the
         media coding.  For example, if a MANE has to drop packets due
         to congestion on a certain link, it can identify and remove
         those packets whose elimination produces the least adverse
         effect on the user experience.  After dropping packets, MANEs
         must rewrite RTCP packets to match the changes to the RTP
         packet stream as specified in Section 7 of [RFC3550].

      NAL unit decoding order: A NAL unit order that conforms to the
      constraints on NAL unit order given in Section 7.4.1.2.3 in
      [HEVC].

      NALU-time: The value that the RTP timestamp would have if the NAL
      unit were transported in its own RTP packet.

      RTP packet stream: A sequence of RTP packets with increasing
      sequence numbers (except for wrap-around), identical PT and
      identical SSRC (Synchronization Source), carried in one RTP
      session.  Within the scope of this memo, one RTP packet stream is
      utilized to transport one or more layers.

      transmission order: The order of packets in ascending RTP
      sequence number order (in modulo arithmetic).  Within an
      aggregation packet, the NAL unit transmission order is the same
      as the order of appearance of NAL units in the packet.
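
   Ordering by RTP sequence number "in modulo arithmetic" can be
   sketched as a comparison that tolerates wrap-around of the 16-bit
   counter.  The sketch below is illustrative only.  It assumes the
   common convention that a sequence number precedes another if it is
   less than half the number space (32768) behind it; this memo does
   not mandate that convention, and the function name is hypothetical.

```python
def seq_precedes(a, b):
    """True if 16-bit RTP sequence number a comes before b.

    Uses modulo-65536 arithmetic with a half-range window, which is an
    assumption of this sketch rather than a rule of the payload format.
    """
    return a != b and ((b - a) & 0xFFFF) < 0x8000


if __name__ == "__main__":
    print(seq_precedes(10, 20))      # plain ascending order
    print(seq_precedes(65535, 0))    # order across the wrap-around
    print(seq_precedes(0, 65535))
```

   The comparison is only meaningful for packets that are close to
   each other in the stream, which is the case for the reordering
   depths normally seen within one RTP session.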

3.2 Abbreviations

   TBD

4. RTP Payload Format

4.1 RTP Header Usage

   The format of the RTP header is specified in [RFC3550] and reprinted
   in Figure 1 for convenience.  This payload format uses the fields of
   the header in a manner consistent with that specification.

   The RTP payload (and the settings for some RTP header bits) for
   aggregation packets and fragmentation units are specified in
   Sections 4.7 and 4.8, respectively.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 1 RTP header according to [RFC3550]
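
   Informative sketch: the fixed header shown in Figure 1 can be
   unpacked with a few lines of Python.  This is an illustration only
   and not part of the payload format.  The names RtpHeader and
   parse_rtp_header are hypothetical, and the sketch assumes that
   header extensions (X bit) and padding (P bit) are handled elsewhere.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-octet fixed RTP header and the CSRC list."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Network byte order: two octets, 16-bit SN, 32-bit TS, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    if len(packet) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    csrcs = struct.unpack("!%dI" % cc, packet[12:12 + 4 * cc])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )
```

   The payload type value seen by such a parser is whatever was
   assigned dynamically or by the profile; this memo does not fix it.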


   For this payload format, the RTP header fields are set as follows:

   Marker bit (M): 1 bit

      Set for the very last packet of the access unit indicated by the
      RTP timestamp, in line with the normal use of the M bit in video
      formats, to allow efficient playout buffer handling.  For
      aggregation packets (STAP), the marker bit in the RTP header MUST
      be set to the value that the marker bit of the last NAL unit of
      the aggregation packet would have if it were transported in its
      own RTP packet.  Decoders MAY use this bit as an early indication
      of the last packet of an access unit but MUST NOT rely on this
      property.

         Informative note: Only one M bit is associated with an
         aggregation packet carrying multiple NAL units.  Thus, if a
         gateway has re-packetized an aggregation packet into several
         packets, it cannot reliably set the M bit of those packets.

   Payload type (PT): 7 bits

      The assignment of an RTP payload type for this new packet format
      is outside the scope of this document and will not be specified
      here.  The assignment of a payload type has to be performed
      either through the profile used or in a dynamic way.


   Sequence number (SN): 16 bits

      Set and used in accordance with RFC 3550.  In some packetization
      modes (list TBD), the sequence number is used to determine
      decoding order for the NALUs.

   Timestamp: 32 bits

      The RTP timestamp is set to the sampling timestamp of the
      content. A 90 kHz clock rate MUST be used.

      If the NAL unit has no timing properties of its own (e.g.,
      parameter set and SEI NAL units), the RTP timestamp is set to the
      RTP timestamp of the coded picture of the access unit in which
      the NAL unit is included, according to Section 7.4.1.2.3 of
      [HEVC].

      Receivers SHOULD ignore any picture timing SEI messages included
      in access units that have only one display timestamp.  Instead,
      receivers SHOULD use the RTP timestamp for synchronizing the
      display process.  If one access unit has more than one display
      timestamp carried in a picture timing SEI message, then the
      information in the SEI message SHOULD be treated as relative to
      the RTP timestamp, with the earliest event occurring at the time
      given by the RTP timestamp and subsequent events later, as given
      by the difference in picture time values carried in the picture
      timing SEI message.  Let tSEI1, tSEI2, ..., tSEIn be the display
      timestamps carried in the SEI message of an access unit, where
      tSEI1 is the earliest of all such timestamps.  Let tmadjst() be a
      function that adjusts a value from the SEI message's time scale
      to a 90 kHz time scale.  Let TS be the RTP timestamp.  Then, the
      display time for the event associated with tSEI1 is TS.  The
      display time for the event associated with tSEIx, where x is in
      [2..n], is TS + tmadjst(tSEIx - tSEI1).
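
   Informative sketch: the display-time rule above can be written out
   as follows.  The sketch assumes that the SEI timestamps are integer
   tick counts and that their time scale is known in Hz; the parameter
   name sei_time_scale_hz and the function display_times are
   hypothetical, and rounding down in tmadjst() is a choice made here,
   not something this memo specifies.

```python
RTP_CLOCK_HZ = 90_000  # clock rate required by this payload format


def tmadjst(sei_ticks: int, sei_time_scale_hz: int) -> int:
    """Convert a tick difference from the SEI time scale to 90 kHz."""
    return (sei_ticks * RTP_CLOCK_HZ) // sei_time_scale_hz


def display_times(ts: int, sei_times: list, sei_time_scale_hz: int) -> list:
    """Return one display time (in RTP ticks) per SEI timestamp.

    The earliest SEI timestamp maps to the RTP timestamp TS; every
    other one is offset by its distance from the earliest.  Results
    are reduced modulo 2**32 because the RTP timestamp is 32 bits.
    """
    earliest = min(sei_times)
    return [
        (ts + tmadjst(t - earliest, sei_time_scale_hz)) % 2**32
        for t in sei_times
    ]
```

   For example, with TS = 1000 and two SEI timestamps 500 and 1500 on
   a 30 kHz time scale, the second event is 1000 SEI ticks later,
   which is 3000 ticks at 90 kHz, giving display times 1000 and 4000.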

4.2 NAL Unit Header Usage

   The structure and semantics of the NAL unit header according to the
   HEVC specification [HEVC] were introduced in Section 1.1.4.  This
   section specifies the extended semantics of the NAL unit header
   fields.

4.3 Payload Structures

   The NAL unit structure is central to HEVC [HEVC]: all HEVC coded
   bits for representing a video signal are encapsulated in NAL units.
   Therefore, each RTP packet payload is structured as a NAL unit that
   contains one NAL unit or a part of one NAL unit specified in HEVC,
   or that aggregates one or more NAL units specified in HEVC.

4.4 Transmission Modes

   This memo enables transmission of an HEVC bitstream over a single
   RTP session or multiple RTP sessions.

   TBD: SSRC Muxing for video conf. + TV broadcast/multicast.

4.5 Packetization Modes

   This memo specifies the following packetization modes:

   o Non-interleaved mode

   o Interleaved mode

   In the non-interleaved mode, NAL units are transmitted in NAL unit
   decoding order. The interleaved mode allows transmission of NAL
   units out of NAL unit decoding order.

   The packetization mode in use MAY be signaled by the value of the
   OPTIONAL packetization-mode media type parameter.  The packetization
   mode in use governs which NAL unit types are allowed in RTP
   payloads.  Table 2 summarizes the allowed packet payload types for
   each packetization mode.  Packetization modes are explained in more
   detail in section 5.

     Table 2.  Summary of allowed NAL unit types for each packetization
             mode (yes = allowed, no = disallowed, ig = ignore)

      Payload Packet      Non-Interleaved    Interleaved
      Type    Type              Mode             Mode
      -------------------------------------------------
      0      reserved           ig               ig
      1-23   NAL unit          yes               no
      24     STAP-A            yes               no
      25     STAP-B             no              yes
      26     FU-A              yes              yes
      27     FU-B               no              yes
      28-63  reserved           ig               ig

   Some NAL unit or payload type values (indicated as reserved in
   Table 2) are reserved for future extensions.  NAL units of those
   types SHOULD NOT be sent by a sender (directly as packet payloads,
   as aggregation units in aggregation packets, or as fragmented units
   in FU packets) and MUST be ignored by a receiver.

   As an example of how to read Table 2, the payload types 1-23, with
   the associated packet type "NAL unit", are allowed in
   "Non-Interleaved Mode" but disallowed in "Interleaved Mode".
   However, NAL units of NAL unit types 1-23 can be used in
   "Interleaved Mode" as aggregation units in STAP-B packets as well as
   fragmented units in FU-A and FU-B packets.  Similarly, NAL units of
   NAL unit types 1-23 can also be used in the "Non-Interleaved Mode"
   as aggregation units in STAP-A packets or fragmented units in FU-A
   packets, in addition to being directly used as packet payloads.
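
   Informative sketch: Table 2 can be held as a small lookup
   structure.  The names Mode, TABLE_2, and lookup are hypothetical,
   and the enumeration values are arbitrary; they are not the values
   of the packetization-mode media type parameter.  "Payload type"
   here means the type field of the payload's NAL unit type octet, not
   the RTP PT field.

```python
from enum import Enum


class Mode(Enum):
    NON_INTERLEAVED = 0
    INTERLEAVED = 1


# Rows of Table 2: (first type, last type, packet type name,
# non-interleaved entry, interleaved entry).
TABLE_2 = [
    (0, 0, "reserved", "ig", "ig"),
    (1, 23, "NAL unit", "yes", "no"),
    (24, 24, "STAP-A", "yes", "no"),
    (25, 25, "STAP-B", "no", "yes"),
    (26, 26, "FU-A", "yes", "yes"),
    (27, 27, "FU-B", "no", "yes"),
    (28, 63, "reserved", "ig", "ig"),
]


def lookup(payload_type: int, mode: Mode) -> tuple:
    """Return (packet type name, Table 2 entry) for a payload type."""
    for first, last, name, non_interleaved, interleaved in TABLE_2:
        if first <= payload_type <= last:
            if mode is Mode.NON_INTERLEAVED:
                return name, non_interleaved
            return name, interleaved
    raise ValueError("payload type must be in the range 0..63")
```

   A receiver following the rules above would drop packets whose entry
   is "ig" and treat an entry of "no" as a violation of the mode in
   use.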

4.6 Decoding Order

   In the interleaved packetization mode, the transmission order of NAL
   units is allowed to differ from the decoding order of the NAL units.
   Decoding order number (DON) is a field in the payload structure or a
   derived variable that indicates the NAL unit decoding order.
   Rationale and examples of use cases for transmission out of decoding
   order and for the use of DON are given in section 13.

   The coupling of transmission and decoding order is controlled by the
   OPTIONAL sprop-interleaving-depth media type parameter as follows.
   When the value of the OPTIONAL sprop-interleaving-depth media type
   parameter is equal to 0 (explicitly or per default), the
   transmission order of NAL units MUST conform to the NAL unit
   decoding order.  When the value of the OPTIONAL
   sprop-interleaving-depth media type parameter is greater than 0, the
   order of NAL units generated by de-packetizing STAP-Bs and FUs in
   two consecutive packets is NOT REQUIRED to be the NAL unit decoding
   order.

   The RTP payload structures for STAP-A and FU-A do not include DON.
   The STAP-B and FU-B structures include DON.

      Informative note: When an FU-A occurs in interleaved mode, it
      always follows an FU-B, which sets its DON.

      Informative note: If a transmitter wants to encapsulate a single
      NAL unit per packet and transmit packets out of their decoding
      order, the STAP-B packet type can be used.

   In the non-interleaved packetization mode, the transmission order of
   NAL units in single NAL unit packets, STAP-As, and FU-As MUST be the
   same as their NAL unit decoding order.  The NAL units within an STAP
   MUST appear in the NAL unit decoding order.  Thus, the decoding
   order is first provided through the implicit order within an STAP,
   and second provided through the RTP sequence number for the order
   between STAPs, FUs, and single NAL unit packets.

   Signaling of the value of DON for NAL units carried in an STAP-B and
   in a series of fragmentation units starting with an FU-B is
   specified in sections 4.7.1 and 4.8, respectively.  The DON value of
   the first NAL unit in transmission order MAY be set to any value.
   Values of DON are in the range of 0 to 65535, inclusive.  After
   reaching the maximum value, the value of DON wraps around to 0.

   The decoding order of two NAL units contained in any STAP-B or
   series of fragmentation units starting with an FU-B is determined as
   follows.  Let DON(i) be the decoding order number of the NAL unit
   having index i in the transmission order.  Function don_diff(m,n) is
   specified as follows:

         If DON(m) == DON(n), don_diff(m,n) = 0

         If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
         don_diff(m,n) = DON(n) - DON(m)

         If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
         don_diff(m,n) = 65536 - DON(m) + DON(n)

         If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
         don_diff(m,n) = - (DON(m) + 65536 - DON(n))

         If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
         don_diff(m,n) = - (DON(m) - DON(n))

   A positive value of don_diff(m,n) indicates that the NAL unit having
   transmission order index n follows, in decoding order, the NAL unit
   having transmission order index m.  When don_diff(m,n) is equal to
   0, then the NAL unit decoding order of the two NAL units can be in
   either order.  A negative value of don_diff(m,n) indicates that the
   NAL unit having transmission order index n precedes, in decoding
   order, the NAL unit having transmission order index m.
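
   Informative sketch: the five cases of don_diff() translate directly
   into code.  For brevity, the function below takes the two DON values
   themselves (DON(m) and DON(n)) rather than the transmission order
   indices m and n; that calling convention is an assumption of this
   sketch.

```python
def don_diff(don_m: int, don_n: int) -> int:
    """Signed decoding-order distance from DON(m) to DON(n).

    Both arguments are 16-bit DON values (0..65535).  A positive
    result means the NAL unit with DON(n) follows the one with
    DON(m) in decoding order; a negative result means it precedes.
    """
    if don_m == don_n:
        return 0
    if don_m < don_n and don_n - don_m < 32768:
        return don_n - don_m
    if don_m > don_n and don_m - don_n >= 32768:
        return 65536 - don_m + don_n  # DON(n) is ahead, past the wrap
    if don_m < don_n and don_n - don_m >= 32768:
        return -(don_m + 65536 - don_n)  # DON(n) is behind, past the wrap
    return -(don_m - don_n)  # don_m > don_n, no wrap-around
```

   For example, don_diff(65535, 1) is 2 because DON wraps around to 0
   after 65535, while don_diff(1, 65535) is -2.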

   Values of the DON field MUST be such that the decoding order
   determined by the values of DON, as specified above, conforms to the
   NAL unit decoding order.  If the order of two NAL units in NAL unit
   decoding order is switched and the new order does not conform to the
   NAL unit decoding order, the NAL units MUST NOT have the same value
   of DON.  If the order of two consecutive NAL units in the NAL unit
   stream is switched and the new order still conforms to the NAL unit
   decoding order, the NAL units MAY have the same value of DON.
   Consequently, NAL units having the same value of DON can be decoded
   in any order, and two NAL units having a different value of DON
   should be passed to the decoder in the order specified above.  When
   two consecutive NAL units in the NAL unit decoding order have a
   different value of DON, the value of DON for the second NAL unit in
   decoding order SHOULD be the value of DON for the first, incremented
   by one.

   An example of the de-packetization process to recover the NAL unit
   decoding order is given in section 7.

      Informative note: Receivers should not expect that the absolute
      difference of values of DON for two consecutive NAL units in the
      NAL unit decoding order will be equal to one, even in error-free
      transmission.  An increment by one is not required, as at the
      time of associating values of DON to NAL units, it may not be
      known whether all NAL units are delivered to the receiver.  For
      example, a gateway may not forward coded slice NAL units of non-
      reference pictures or SEI NAL units when there is a shortage of
      bit rate in the network to which the packets are forwarded.  In
      another example, a live broadcast is interrupted by pre-encoded
      content, such as commercials, from time to time.  The first intra
      picture of a pre-encoded clip is transmitted in advance to ensure
      that it is readily available in the receiver.  When transmitting
      the first intra picture, the originator does not exactly know how
      many NAL units will be encoded before the first intra picture of
      the pre-encoded clip follows in decoding order.  Thus, the values
      of DON for the NAL units of the first intra picture of the pre-
      encoded clip have to be estimated when they are transmitted, and
      gaps in values of DON may occur.

4.7 Aggregation Packets

   Aggregation packets are the NAL unit aggregation scheme of this
   payload specification.  The scheme is introduced to reflect the
   dramatically different MTU sizes of two key target networks:
   wireline IP networks (with an MTU size that is often limited by the
   Ethernet MTU size; roughly 1500 bytes), and IP or non-IP (e.g., ITU-
   T H.324/M) based wireless communication systems with preferred
   transmission unit sizes of 254 bytes or less.  To prevent media
   transcoding between the two worlds, and to avoid undesirable
   packetization overhead, a NAL unit aggregation scheme is introduced.

   This specification defines one type of aggregation packet:

   o Single-time aggregation packet (STAP): aggregates NAL units with
      identical NALU-time.  Two types of STAPs are defined, one without
      DON (STAP-A) and another including DON (STAP-B).

   Each NAL unit to be carried in an aggregation packet is encapsulated
   in an aggregation unit.  The structure of the RTP payload format for
   aggregation packets is presented in Figure 2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|  Type   |                                               |
   +-+-+-+-+-+-+-+-+                                               |
   |                                                               |
   |             one or more aggregation units                     |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 2 RTP payload format for aggregation packets

   STAPs have the following packetization rules:  The type field of
   the NAL unit type octet MUST be set to the appropriate value for
   the STAP type in use (STAP-A or STAP-B), as indicated in Table 2.
   The F bit MUST be cleared if all F bits of the aggregated NAL units
   are zero; otherwise, it MUST be set.  The value of NRI MUST be the
   maximum NRI value of all the NAL units carried in the aggregation
   packet.

   The marker bit in the RTP header is set to the value that the marker
   bit of the last NAL unit of the aggregation packet would have if it
   were transported in its own RTP packet.

   The payload of an aggregation packet consists of one or more
   aggregation units.  See section 4.7.1 for the single-time
   aggregation unit.  An aggregation packet can carry as many
   aggregation units as necessary; however, the total amount of data in
   an aggregation packet MUST fit into an IP packet, and the size
   SHOULD be chosen so that the resulting IP packet is smaller than the
   MTU size.  An aggregation packet MUST NOT contain fragmentation
   units specified in section 4.8.  Aggregation packets MUST NOT be
   nested; i.e., an aggregation packet MUST NOT contain another
   aggregation packet.


4.7.1 Single Time Aggregation Packet (STAP)

   A single-time aggregation packet (STAP) SHOULD be used whenever NAL
   units that all share the same NALU-time are aggregated.  The payload
   of an STAP-A consists of at least one single-time aggregation unit,
   as presented in Figure 3.  The payload of an STAP-B consists of a
   16-bit unsigned decoding order number (DON) (in network byte order)
   followed by at least one single-time aggregation unit, as presented
   in Figure 4.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   :                                               |
   +-+-+-+-+-+-+-+-+                                               |
   |                                                               |
   |                single-time aggregation units                  |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 3 Payload format for STAP-A

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   :  decoding order number (DON)  |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                                                               |
   |                single-time aggregation units                  |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 4 Payload format for STAP-B

   The DON field specifies the value of DON for the first NAL unit in
   an STAP-B in transmission order.  For each successive NAL unit in
   appearance order in an STAP-B, the value of DON is equal to (the
   value of DON of the previous NAL unit in the STAP-B + 1) % 65536, in
   which '%' stands for the modulo operation.
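
   Informative sketch: a receiver can derive the DON of every NAL unit
   in an STAP-B from the single DON field and the position of the NAL
   unit in the packet.  The function name stap_b_dons is hypothetical.

```python
def stap_b_dons(don_field: int, nal_unit_count: int) -> list:
    """Return the DON of each NAL unit in an STAP-B, in appearance order.

    The first NAL unit takes the value of the DON field; each
    following NAL unit takes the previous DON plus one, modulo 65536.
    """
    return [(don_field + i) % 65536 for i in range(nal_unit_count)]
```

   For example, an STAP-B with a DON field of 65534 that carries four
   NAL units yields the DON values 65534, 65535, 0, and 1.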


   A single-time aggregation unit consists of 16-bit unsigned size
   information (in network byte order) that indicates the size of the
   following NAL unit in bytes (excluding these two octets, but
   including the NAL unit type octet of the NAL unit), followed by the
   NAL unit itself, including its NAL unit type byte.  A single-time
   aggregation unit is byte aligned within the RTP payload, but it may
   not be aligned on a 32-bit word boundary.  Figure 5 presents the
   structure of the single-time aggregation unit.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   :        NAL unit size          |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                                                               |
   |                           NAL unit                            |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 5 Structure for single-time aggregation unit (STAU)
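
   Informative sketch: single-time aggregation units are a simple
   length-prefixed encoding.  The functions below are hypothetical
   helpers that operate only on the sequence of aggregation units;
   they assume the caller has already removed the STAP NAL unit type
   octet, the DON field (for STAP-B), and any RTP padding.

```python
import struct


def pack_aggregation_units(nal_units: list) -> bytes:
    """Serialize NAL units as 16-bit size (network order) + NAL unit."""
    out = bytearray()
    for nal in nal_units:
        if not 0 < len(nal) <= 0xFFFF:
            raise ValueError("NAL unit size must fit in 16 bits")
        out += struct.pack("!H", len(nal))
        out += nal
    return bytes(out)


def parse_aggregation_units(data: bytes) -> list:
    """Split a run of single-time aggregation units into NAL units."""
    units = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", data, pos)
        pos += 2
        if pos + size > len(data):
            raise ValueError("NAL unit size exceeds remaining payload")
        units.append(data[pos:pos + size])
        pos += size
    return units
```

   Because the units are only byte aligned, the parser walks the
   payload octet by octet and makes no assumption about 32-bit
   alignment.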

   Figure 6 presents an example of an RTP packet that contains an STAP-
   A.  The STAP-A contains two single-time aggregation units, labeled
   as 1 and 2 in the figure.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |STAP-A NAL HDR |         NALU 1 Size           | NALU 1 HDR    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | NALU 1 HDR    |         NALU 1 Data                           |
   +-+-+-+-+-+-+-+-+                                               |
   :                                                               :
   |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               | NALU 2 Size                   | NALU 2 HDR    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | NALU 2 HDR    |         NALU 2 Data                           |
   +-+-+-+-+-+-+-+-+                                               :
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


    Figure 6 An example of an RTP packet including an STAP-A containing
                     two single-time aggregation units

   Figure 7 presents an example of an RTP packet that contains an STAP-
   B.  The STAP-B contains two single-time aggregation units, labeled
   as 1 and 2 in the figure.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |STAP-B NAL HDR | DON                           | NALU 1 Size   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | NALU 1 Size   | NALU 1 HDR                    | NALU 1 Data   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   :                                                               :
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               | NALU 2 Size                   | NALU 2 HDR    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | NALU 2 HDR    |        NALU 2 Data                            |
   +-+-+-+-+-+-+-+-+                                               :
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure 7 An example of an RTP packet including an STAP-B containing
                     two single-time aggregation units


4.8 Fragmentation Units (FUs)

   This payload type allows fragmenting a NAL unit into several RTP
   packets.  Doing so on the application layer instead of relying on
   lower-layer fragmentation (e.g., by IP) is useful in the following
   cases:

   o The payload format is capable of transporting, over an IPv4
      network, NAL units bigger than 64 kbytes.  Such NAL units may be
      present in pre-recorded video, particularly in High Definition
      formats (there is a limit on the number of slices per picture,
      which results in a limit on the number of NAL units per picture,
      which may result in big NAL units).

   o The fragmentation mechanism allows fragmenting a single NAL unit
      and applying generic forward error correction.


   Fragmentation is defined only for a single NAL unit and not for any
   aggregation packets.  A fragment of a NAL unit consists of an
   integer number of consecutive octets of that NAL unit.  Each octet
   of the NAL unit MUST be part of exactly one fragment of that NAL
   unit.  Fragments of the same NAL unit MUST be sent in consecutive
   order with ascending RTP sequence numbers (with no other RTP packets
   within the same RTP packet stream being sent between the first and
   last fragment).  Similarly, a NAL unit MUST be reassembled in RTP
   sequence number order.

   When a NAL unit is fragmented and conveyed within fragmentation
   units (FUs), it is referred to as a fragmented NAL unit.  STAPs MUST
   NOT be fragmented.  FUs MUST NOT be nested; i.e., an FU MUST NOT
   contain another FU.

   The RTP timestamp of an RTP packet carrying an FU is set to the
   NALU-time of the fragmented NAL unit.

   Figure 8 presents the RTP payload format for FU-A.  An FU-A consists
   of a fragmentation unit indicator of one octet, a fragmentation unit
   header of one octet, and a fragmentation unit payload.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | FU indicator  |   FU header   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                                               |
   |                         FU payload                            |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 8   RTP payload format for FU-A

   Figure 9 presents the RTP payload format for FU-Bs.  An FU-B
   consists of a fragmentation unit indicator of one octet, a
   fragmentation unit header of one octet, a decoding order number
   (DON) (in network byte order), and a fragmentation unit payload.  In
   other words, the structure of FU-B is the same as the structure of
   FU-A, except for the additional DON field.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | FU indicator  |   FU header   |               DON             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
   |                                                               |
   |                         FU payload                            |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 9   RTP payload format for FU-B

   NAL unit type FU-B MUST be used in the interleaved packetization
   mode for the first fragmentation unit of a fragmented NAL unit.  NAL
   unit type FU-B MUST NOT be used in any other case.  In other words,
   in the interleaved packetization mode, each NALU that is fragmented
   has an FU-B as the first fragment, followed by one or more FU-A
   fragments.


   The FU indicator octet has the following format:

      +---------------+
      |0|1|2|3|4|5|6|7|
      +-+-+-+-+-+-+-+-+
      |F|N|    Type   |
      +---------------+

   A value equal to 26 in the Type field of the FU indicator octet
   identifies an FU-A packet and a value of 27 identifies an FU-B
   packet.  The use of the F bit is described in section 5.  The value
   of the N field MUST be set according to the value of the N field in
   the fragmented NAL unit.

   The FU header has the following format:

      +---------------+
      |0|1|2|3|4|5|6|7|
      +-+-+-+-+-+-+-+-+
      |S|E|    Type   |
      +---------------+

   S: 1 bit
      When set to one, the Start bit indicates the start of a
      fragmented NAL unit.  When the following FU payload is not the
      start of a fragmented NAL unit payload, the Start bit is set to
      zero.

   E: 1 bit
      When set to one, the End bit indicates the end of a fragmented
      NAL unit, i.e., the last byte of the payload is also the last
      byte of the fragmented NAL unit.  When the following FU payload
      is not the last fragment of a fragmented NAL unit, the End bit is
      set to zero.

   Type: 6 bits
      The NAL unit payload type as defined in Table 7-1 of [HEVC].
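
   Informative sketch: the two octets described above can be decoded
   with shifts and masks.  The sketch assumes that bit 0 in the
   diagrams is the most significant bit of the octet, as is usual in
   IETF packet diagrams; the function name parse_fu_octets and the
   dictionary keys are hypothetical.

```python
FU_A = 26  # Type value identifying an FU-A packet
FU_B = 27  # Type value identifying an FU-B packet


def parse_fu_octets(indicator: int, fu_header: int) -> dict:
    """Decode the first octet (F, N, Type) and the FU header (S, E, Type)."""
    fu_type = indicator & 0x3F
    if fu_type not in (FU_A, FU_B):
        raise ValueError("not a fragmentation unit")
    return {
        "f": indicator >> 7,             # F bit
        "n": (indicator >> 6) & 1,       # N field of the fragmented NAL unit
        "fu_type": fu_type,              # 26 (FU-A) or 27 (FU-B)
        "start": bool(fu_header & 0x80),     # S bit
        "end": bool(fu_header & 0x40),       # E bit
        "nal_type": fu_header & 0x3F,    # type of the fragmented NAL unit
    }
```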

   The value of DON in FU-Bs is selected as described in section 4.6.

      Informative note: The DON field in FU-Bs allows gateways to
      fragment NAL units to FU-Bs without organizing the incoming NAL
      units to the NAL unit decoding order.

   A fragmented NAL unit MUST NOT be transmitted in one FU; i.e., the
   Start bit and End bit MUST NOT both be set to one in the same FU
   header.

   The FU payload consists of fragments of the payload of the
   fragmented NAL unit so that if the fragmentation unit payloads of
   consecutive FUs are sequentially concatenated, the payload of the
   fragmented NAL unit can be reconstructed.  The NAL unit type octet
   of the fragmented NAL unit is not included as such in the
   fragmentation unit payload, but rather the information of the NAL
   unit type octet of the fragmented NAL unit is conveyed in F and N
   fields of the FU indicator octet of the fragmentation unit and in
   the type field of the FU header.  An FU payload MAY have any number
   of octets and MAY be empty.
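
   Informative sketch: the following shows one way to split a NAL unit
   into FU-A payloads and to rebuild it.  It rests on several
   assumptions that this memo does not confirm: the first octet of the
   NAL unit is treated as the NAL unit type octet with the layout
   F (1 bit), N (1 bit), Type (6 bits) shown for the first FU octet
   above; everything after that octet is treated as NAL unit payload;
   and FU-B, DON, and RTP header handling are left out.  The function
   names are hypothetical.

```python
FU_A = 26  # Type value identifying an FU-A packet


def fragment_fu_a(nal_unit: bytes, max_fragment: int) -> list:
    """Split one NAL unit into at least two FU-A RTP payloads."""
    if len(nal_unit) < 1 or max_fragment < 1:
        raise ValueError("need a non-empty NAL unit and a positive size")
    type_octet, body = nal_unit[0], nal_unit[1:]
    indicator = (type_octet & 0xC0) | FU_A  # keep F and N, set Type to 26
    nal_type = type_octet & 0x3F
    chunks = [body[i:i + max_fragment]
              for i in range(0, len(body), max_fragment)]
    # S and E must not both be set in one FU, so always emit at least
    # two FUs; an FU payload is allowed to be empty.
    while len(chunks) < 2:
        chunks.append(b"")
    fus = []
    for i, chunk in enumerate(chunks):
        start = 0x80 if i == 0 else 0
        end = 0x40 if i == len(chunks) - 1 else 0
        fus.append(bytes([indicator, start | end | nal_type]) + chunk)
    return fus


def reassemble_fu_a(fus: list) -> bytes:
    """Rebuild the NAL unit from its FU-A payloads, given in order."""
    first = fus[0]
    # F and N come from the first octet, Type from the FU header.
    type_octet = (first[0] & 0xC0) | (first[1] & 0x3F)
    return bytes([type_octet]) + b"".join(fu[2:] for fu in fus)
```

   The type octet is never copied into an FU payload; it is carried
   only in the two leading octets of each FU and reconstructed on
   reassembly.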

   If a fragmentation unit is lost, the receiver SHOULD discard all
   following fragmentation units in transmission order corresponding to
   the same fragmented NAL unit.

   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
   fragments of a NAL unit to an (incomplete) NAL unit, even if
   fragment n of that NAL unit is not received.  In this case, the
   forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
   syntax violation.
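
   Informative sketch: marking an incomplete NAL unit amounts to
   setting one bit.  The sketch assumes that forbidden_zero_bit is the
   most significant bit of the first octet of the NAL unit, matching
   the position of the F bit in the diagrams above; the function name
   mark_incomplete is hypothetical.

```python
def mark_incomplete(nal_unit: bytes) -> bytes:
    """Return a copy of the NAL unit with forbidden_zero_bit set to one."""
    if not nal_unit:
        raise ValueError("empty NAL unit")
    return bytes([nal_unit[0] | 0x80]) + nal_unit[1:]
```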


5. Packetization Rules

   The packetization modes are introduced in section 4.5.  The
   packetization rules common to more than one of the packetization
   modes are specified in section 5.1.  The packetization rules for the
   non-interleaved mode are specified in section 5.2, and the
   packetization rules for the interleaved mode are specified in
   section 5.3.


5.1 Common Packetization Rules

   All senders MUST enforce the following packetization rules
   regardless of the packetization mode in use:

   o VCL NAL units belonging to the same coded picture (and thus
      sharing the same RTP timestamp value) SHOULD be sent in their
      original decoding order to minimize the delay.  Note that the
      decoding order is the order of the NAL units in the bitstream.

   o Parameter sets are handled in accordance with the rules and
      recommendations given in section 7.4.

   o MANEs MUST NOT duplicate any NAL unit except for sequence or
      picture parameter set NAL units, as neither this memo nor the
      HEVC specification provides means to identify duplicated NAL
      units.  Sequence and picture parameter set NAL units MAY be
      duplicated to make their correct reception more probable, but any
      such duplication MUST NOT affect the contents of any active
      sequence or picture parameter set.  Duplication SHOULD be
      performed on the application layer and not by duplicating RTP
      packets (with identical sequence numbers).

   Senders using the non-interleaved mode and the interleaved mode MUST
   enforce the following packetization rule:

   o MANEs MAY convert single NAL unit packets into one aggregation
      packet, convert an aggregation packet into several single NAL
      unit packets, or mix both concepts, in an RTP translator.  The
      RTP translator SHOULD take into account at least the following
      parameters: path MTU size, unequal protection mechanisms (e.g.,
      through packet-based FEC according to [RFC5109], especially for
      sequence and picture parameter set NAL units and coded slice data
      partition A NAL units), bearable latency of the system, and
      buffering capabilities of the receiver.


         Informative note: An RTP translator is required to handle RTCP
         as per [RFC3550].


5.2 Non-Interleaved mode

   This mode MUST be supported.  This mode is in use when the value of
   the OPTIONAL packetization-mode media type parameter is equal to 1.
   It is primarily intended for low-delay applications.  Only single
   NAL unit packets, STAPs, and FUs MAY be used in this mode.  The
   transmission order of NAL units MUST comply with the NAL unit
   decoding order.

5.3 Interleaved mode

   This mode is in use when the value of the OPTIONAL packetization-
   mode media type parameter is equal to 2.  Some receivers MAY support
   this mode.  STAP-Bs, FU-As, and FU-Bs MAY be used.  STAP-As and
   single NAL unit packets MUST NOT be used.  The transmission order of
   packets and NAL units is constrained as specified in section 4.6.


6. De-Packetization Process

   The de-packetization process is implementation dependent.
   Therefore, the following description should be seen as an example of
   a suitable implementation.  Other schemes may be used as well, as
   long as the output for the same input is the same as that of the
   process described below.  The output is the same when the number of
   NAL units and their order are both identical.  Optimizations
   relative to the described algorithms are likely possible.  Section
   6.1 presents the de-packetization process for the non-interleaved
   packetization mode and section 6.2 presents the de-packetization
   process for the interleaved packetization mode.

   All normal RTP mechanisms related to buffer management apply.  In
   particular, duplicated or outdated RTP packets (as indicated by the
   RTP sequence number and the RTP timestamp) are removed.  To
   determine the exact time for decoding, factors such as a possible
   intentional delay to allow for proper inter-stream synchronization
   must be factored in.


6.1 Non-Interleaved Mode

   The receiver includes a receiver buffer to compensate for
   transmission delay jitter.  The receiver stores incoming packets in
   reception order into the receiver buffer.  Packets are de-packetized
   in RTP sequence number order.  If a de-packetized packet is a single
   NAL unit packet, the NAL unit contained in the packet is passed
   directly to the decoder.  If a de-packetized packet is an STAP-A,
   the NAL units contained in the packet are passed to the decoder in
   the order in which they are encapsulated in the packet.  For all the
   FU-A packets containing fragments of a single NAL unit, the de-
   packetized fragments are concatenated in their sending order to
   recover the NAL unit, which is then passed to the decoder.
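
      Informative note: A minimal Python sketch of this process is
      given below as an example of a suitable implementation, not as
      the only one.  The Packet type, its fields, and the kind labels
      are hypothetical.  The sketch assumes that packets have already
      been parsed, that RTP sequence numbers do not wrap within the
      buffer, that no packet is lost, and that the first FU-A fragment
      already carries the reconstructed NAL unit type octet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Packet:
    seq: int                      # RTP sequence number (wrap-around not handled)
    kind: str                     # hypothetical labels: "single", "stap-a", "fu-a"
    nal_units: List[bytes] = field(default_factory=list)  # single / STAP-A content
    fragment: bytes = b""         # FU-A: fragment bytes
    last_fragment: bool = False   # FU-A: this fragment ends the NAL unit


def depacketize(receiver_buffer: List[Packet]) -> List[bytes]:
    """Return NAL units in the order they would be passed to the decoder."""
    to_decoder: List[bytes] = []
    pending = bytearray()
    # Packets are stored in reception order but processed in sequence order.
    for pkt in sorted(receiver_buffer, key=lambda p: p.seq):
        if pkt.kind in ("single", "stap-a"):
            # NAL units keep the order in which they were encapsulated.
            to_decoder.extend(pkt.nal_units)
        elif pkt.kind == "fu-a":
            # Fragments are concatenated in sending order.
            pending += pkt.fragment
            if pkt.last_fragment:
                to_decoder.append(bytes(pending))
                pending.clear()
    return to_decoder
```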

6.2 Interleaved Mode

   The general concept behind these de-packetization rules is to
   reorder NAL units from transmission order to the NAL unit decoding
   order.

   The receiver includes a receiver buffer, which is used to compensate
   for transmission delay jitter and to reorder NAL units from
   transmission order to the NAL unit decoding order.  In this section,
   the receiver operation is described under the assumption that there
   is no transmission delay jitter.  To distinguish it from a
   practical receiver buffer that is also used for compensation of
   transmission delay jitter, the receiver buffer is hereafter called
   the de-interleaving buffer in this section.  Receivers SHOULD also
   prepare for transmission delay jitter; i.e., either reserve separate
   buffers for transmission delay jitter buffering and de-interleaving
   buffering or use a receiver buffer for both transmission delay
   jitter and de-interleaving.  Moreover, receivers SHOULD take
   transmission delay jitter into account in the buffering operation;
   e.g., by additional initial buffering before starting decoding and
   playback.

   This section is organized as follows: subsection 6.2.1 presents how
   to calculate the size of the de-interleaving buffer.  Subsection
   6.2.2 specifies how the receiver reorders received NAL units into
   the NAL unit decoding order.

6.2.1 Size of the De-interleaving Buffer

   When the SDP Offer/Answer model or any other capability exchange
   procedure is used in session setup, the properties of the received
   stream SHOULD be such that the receiver capabilities are not
   exceeded.  In the SDP Offer/Answer model, the receiver can indicate
   its capabilities to allocate a de-interleaving buffer with the
   deint-buf-cap media type parameter.  The sender indicates the
   requirement for the de-interleaving buffer size with the sprop-
   deint-buf-req media type parameter.  It is therefore RECOMMENDED to
   set the de-interleaving buffer size, in terms of number of bytes,
   equal to or greater than the value of sprop-deint-buf-req media type
   parameter.  See section 7.1 for further information on deint-buf-cap
   and sprop-deint-buf-req media type parameters and section 7.2 for
   further information on their use in the SDP Offer/Answer model.

   When a declarative session description is used in session setup, the
   sprop-deint-buf-req media type parameter signals the requirement for
   the de-interleaving buffer size.  It is therefore RECOMMENDED to set
   the de-interleaving buffer size, in terms of number of bytes, equal
   to or greater than the value of sprop-deint-buf-req media type
   parameter.

6.2.2 De-interleaving Process

   There are two buffering states in the receiver: initial buffering
   and buffering while playing.  Initial buffering occurs when the RTP
   session is initialized.  After initial buffering, decoding and
   playback are started, and the buffering-while-playing mode is used.

   Regardless of the buffering state, the receiver stores incoming NAL
   units, in reception order, in the de-interleaving buffer as follows.
   NAL units of aggregation packets are stored in the de-interleaving
   buffer individually.  The value of DON is calculated and stored for
   each NAL unit.

   The receiver operation is described below with the help of the
   following functions and constants:

   o Function AbsDON is specified in section 7.1.

   o Function don_diff is specified in section 4.6.

   o Constant N is the value of the OPTIONAL sprop-interleaving-depth
      media type parameter (see section 7.1) incremented by 1.

   Initial buffering lasts until one of the following conditions is
   fulfilled:

   o There are N or more VCL NAL units in the de-interleaving buffer.


   o If sprop-max-don-diff is present, don_diff(m,n) is greater than
      the value of sprop-max-don-diff, in which n corresponds to the
      NAL unit having the greatest value of AbsDON among the received
      NAL units and m corresponds to the NAL unit having the smallest
      value of AbsDON among the received NAL units.

   o Initial buffering has lasted for the duration equal to or greater
      than the value of the OPTIONAL sprop-init-buf-time media type
      parameter.

   The NAL units to be removed from the de-interleaving buffer are
   determined as follows:

   o If the de-interleaving buffer contains at least N VCL NAL units,
      NAL units are removed from the de-interleaving buffer and passed
      to the decoder in the order specified below until the buffer
      contains N-1 VCL NAL units.

   o If sprop-max-don-diff is present, all NAL units m for which
      don_diff(m,n) is greater than sprop-max-don-diff are removed from
      the de-interleaving buffer and passed to the decoder in the order
      specified below.  Herein, n corresponds to the NAL unit having
      the greatest value of AbsDON among the NAL units in the de-
      interleaving buffer.

   The order in which NAL units are passed to the decoder is specified
   as follows:

   o Let PDON be a variable that is initialized to 0 at the beginning
      of the RTP session.

   o For each NAL unit associated with a value of DON, a DON distance
      is calculated as follows.  If the value of DON of the NAL unit is
      larger than the value of PDON, the DON distance is equal to DON -
      PDON.  Otherwise, the DON distance is equal to 65535 - PDON + DON
      + 1.

   o NAL units are delivered to the decoder in ascending order of DON
      distance.  If several NAL units share the same value of DON
      distance, they can be passed to the decoder in any order.

   o When a desired number of NAL units have been passed to the
      decoder, the value of PDON is set to the value of DON for the
      last NAL unit passed to the decoder.
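
      Informative note: The DON distance rule above can be sketched in
      Python as follows.  The function names are hypothetical, and the
      sketch covers only the ordering step, not the decision of how
      many NAL units to remove from the de-interleaving buffer.

```python
from typing import List


def don_distance(don: int, pdon: int) -> int:
    """DON distance of a NAL unit relative to PDON, with 16-bit wrap-around."""
    if don > pdon:
        return don - pdon
    return 65535 - pdon + don + 1


def delivery_order(dons: List[int], pdon: int) -> List[int]:
    """Sort DON values into the order in which NAL units reach the decoder.

    Ties keep their input order, which is allowed because NAL units with
    equal DON distance can be passed to the decoder in any order.
    """
    return sorted(dons, key=lambda don: don_distance(don, pdon))
```

      After the desired number of NAL units has been delivered, the
      caller would set PDON to the DON of the last delivered NAL unit
      before the next call.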


6.3 Additional De-Packetization Guidelines

   The following additional de-packetization rules may be used to
   implement an operational HEVC de-packetizer:

   o Intelligent RTP receivers (e.g., in gateways) may identify lost
      FUs.  If a lost FU is found, a gateway may decide not to send the
      following FUs of the same fragmented NAL unit, as their
      information is meaningless for HEVC decoders.  In this way a MANE
      can reduce network load by discarding useless packets without
      parsing a complex bitstream.

   o Intelligent receivers having to discard packets or NALUs should
      first discard all packets/NALUs in which the value of the NRI
      field of the NAL unit type octet is equal to 0.  This will
      minimize the impact on user experience and keep the reference
      pictures intact.  If more packets have to be discarded, then
      packets with an NRI value equal to zero may be discarded before
      packets with a higher NRI value.  However, discarding any
      packets with an NRI not equal to zero very likely leads to
      decoder drift and SHOULD be avoided.


7. Payload Format Parameters

   This section specifies the parameters that MAY be used to select
   optional features of the payload format and certain features of the
   bitstream.  The parameters are specified here as part of the media
   type registration for the HEVC codec.  A mapping of the parameters
   into the Session Description Protocol (SDP) [RFC4566] is also
   provided for applications that use SDP.  Equivalent parameters could
   be defined elsewhere for use with control protocols that do not use
   SDP.

   Some parameters provide a receiver with the properties of the stream
   that will be sent.  The names of all these parameters start with
   "sprop" for stream properties.  Some of these "sprop" parameters are
   limited by other payload or codec configuration parameters.  For
   example, the sprop-parameter-sets parameter is constrained by the
   profile-level-id parameter.  The media sender selects all "sprop"
   parameters rather than the receiver.  This uncommon characteristic
   of the "sprop" parameters may be incompatible with some signaling
   protocol concepts, in which case the use of these parameters SHOULD
   be avoided.


7.1 Media Type Registration

   The media subtype for the HEVC codec is allocated from the IETF
   tree.

   The receiver MUST ignore any unspecified parameter.

   Media Type name:     video

   Media subtype name:  H265

   Required parameters: none

   OPTIONAL parameters:

      In the following definitions of parameters, "the stream" or "the
      NAL unit stream" refers to all NAL units conveyed in the current
      RTP session in SST, and all NAL units conveyed in the current RTP
      session and all NAL units conveyed in other RTP sessions that the
      current RTP session depends on in MST.

      profile-level-id:
         TBD

      sprop-parameter-sets:
         TBD

      max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br:
         TBD

      max-mbps:
         TBD

      max-smbps:
         TBD

      max-fs:
         TBD

      max-cpb:
         TBD

      max-dpb:
         TBD


      max-br:
         TBD

      redundant-pic-cap:
         TBD

      sprop-level-parameter-sets:
         TBD

      use-level-src-parameter-sets:
         TBD

      packetization-mode:
         This parameter signals the properties of an RTP payload type
         or the capabilities of a receiver implementation.  Only a
         single configuration point can be indicated; thus, when
         capabilities to support more than one packetization-mode are
         declared, multiple configuration points (RTP payload types)
         must be used.

         When the value of packetization-mode is equal to 1, the non-
         interleaved mode, as defined in section 5.2, MUST be used.
         When the value of packetization-mode is equal to 2, the
         interleaved mode, as defined in section 5.3, MUST be used.
         The value of packetization-mode MUST be an integer in the
         range of 1 to 2, inclusive.

      sprop-interleaving-depth:
         This parameter MUST NOT be present when packetization-mode is
         not present or the value of packetization-mode is equal to 1.
         This parameter MUST be present when the value of
         packetization-mode is equal to 2.

         This parameter signals the properties of an RTP packet stream.
         It specifies the maximum number of VCL NAL units that precede
         any VCL NAL unit in the RTP packet stream in transmission
         order and follow the VCL NAL unit in decoding order.
         Consequently, it is guaranteed that receivers can reconstruct
         NAL unit decoding order when the buffer size for NAL unit
         decoding order recovery is at least the value of sprop-
         interleaving-depth + 1 in terms of VCL NAL units.

         The value of sprop-interleaving-depth MUST be an integer in
         the range of 0 to 32767, inclusive.
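
             Informative note: As an illustration, a sender could
             derive this value with a Python sketch like the one
             below.  The function name is hypothetical.  The sketch
             assumes that every listed NAL unit is a VCL NAL unit and
             that each entry is a non-wrapping decoding order position
             given in transmission order.

```python
from typing import Sequence


def interleaving_depth(decoding_positions: Sequence[int]) -> int:
    """Maximum number of VCL NAL units that precede a VCL NAL unit in
    transmission order and follow it in decoding order."""
    depth = 0
    for index, position in enumerate(decoding_positions):
        # Count units sent earlier that are decoded later than this one.
        count = sum(
            1 for earlier in decoding_positions[:index] if earlier > position
        )
        depth = max(depth, count)
    return depth
```

             The quadratic scan keeps the sketch short; the constant N
             used in section 6.2.2 is this value incremented by 1.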

      sprop-deint-buf-req:
         This parameter MUST NOT be present when packetization-mode is
         not present or the value of packetization-mode is not equal to
         2.  It MUST be present when the value of packetization-mode is
         equal to 2.

         sprop-deint-buf-req signals the required size of the de-
         interleaving buffer for the RTP packet stream.  The value of
         the parameter MUST be greater than or equal to the maximum
         buffer occupancy (in units of bytes) required in such a de-
         interleaving buffer that is specified in section 6.2.  It is
         guaranteed that receivers can perform the de-interleaving of
         interleaved NAL units into NAL unit decoding order, when the
         de-interleaving buffer size is at least the value of sprop-
         deint-buf-req in terms of bytes.

         The value of sprop-deint-buf-req MUST be an integer in the
         range of 0 to 4294967295, inclusive.

             Informative note: sprop-deint-buf-req indicates the
             required size of the de-interleaving buffer only.  When
             network jitter can occur, an appropriately sized jitter
             buffer has to be provisioned as well.

      deint-buf-cap:
         This parameter signals the capabilities of a receiver
         implementation and indicates the amount of de-interleaving
         buffer space in units of bytes that the receiver has available
         for reconstructing the NAL unit decoding order.  A receiver is
         able to handle any stream for which the value of the sprop-
         deint-buf-req parameter is smaller than or equal to this
         parameter.

         If the parameter is not present, then a value of 0 MUST be
         used for deint-buf-cap.  The value of deint-buf-cap MUST be an
         integer in the range of 0 to 4294967295, inclusive.

             Informative note: deint-buf-cap indicates the maximum
             possible size of the de-interleaving buffer of the receiver
             only.  When network jitter can occur, an appropriately
             sized jitter buffer has to be provisioned as well.

      sprop-init-buf-time:
         This parameter MAY be used to signal the properties of an RTP
         packet stream.  The parameter MUST NOT be present if the
         value of packetization-mode is equal to 1.

         The parameter signals the initial buffering time that a
         receiver MUST wait before starting decoding to recover the NAL
         unit decoding order from the transmission order.  The
         parameter is the maximum value of (decoding time of the NAL
         unit - transmission time of a NAL unit), assuming reliable and
         instantaneous transmission, the same timeline for transmission
         and decoding, and that decoding starts when the first packet
         arrives.

         An example of specifying the value of sprop-init-buf-time
         follows.  A NAL unit stream is sent in the following
         interleaved order, in which the value corresponds to the
         decoding time and the transmission order is from left to
         right:

             0  2  1  3  5  4  6  8  7 ...

         Assuming a steady transmission rate of NAL units, the
         transmission times are:

             0  1  2  3  4  5  6  7  8 ...

         Subtracting the decoding time from the transmission time
         column-wise results in the following series:

             0 -1  1  0 -1  1  0 -1  1 ...

         Thus, in terms of intervals of NAL unit transmission times,
         the value of sprop-init-buf-time in this example is 1.  The
         parameter is coded as a non-negative base-10 integer
         representation in clock ticks of a 90-kHz clock.  If the
         parameter is not present, then no initial buffering time value
         is defined.  Otherwise the value of sprop-init-buf-time MUST
         be an integer in the range of 0 to 4294967295, inclusive.

         In addition to the signaled sprop-init-buf-time, receivers
         SHOULD take into account the transmission delay jitter
         buffering, including buffering for the delay jitter caused by
         mixers, translators, gateways, proxies, traffic-shapers, and
         other network elements.
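
             Informative note: The worked example for this parameter
             can be reproduced with the Python sketch below.  The
             function names and the nal_units_per_second argument are
             hypothetical.  The sketch assumes a steady transmission
             rate, a non-empty list of decoding times expressed in NAL
             unit intervals, and a rate that divides 90000 evenly.

```python
from typing import Sequence


def init_buf_intervals(decoding_times: Sequence[int]) -> int:
    """Largest (transmission time - decoding time) in NAL unit intervals.

    The k-th NAL unit in transmission order is sent at time k.
    """
    diffs = [tx_time - dec_time for tx_time, dec_time in enumerate(decoding_times)]
    return max(0, max(diffs))


def to_90khz_ticks(intervals: int, nal_units_per_second: int) -> int:
    """Convert NAL unit intervals to clock ticks of a 90-kHz clock."""
    return intervals * 90000 // nal_units_per_second
```

             For the interleaved order shown in the example, the first
             function yields one interval, which the second function
             converts to the signaled tick count for a given rate.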

      sprop-max-don-diff:
         This parameter MAY be used to signal the properties of an RTP
         packet stream.  It MUST NOT be used to signal transmitter or
         receiver or codec capabilities.  The parameter MUST NOT be
         present if the value of packetization-mode is equal to 1.
         sprop-max-don-diff is an integer in the range of 0 to 32767,
         inclusive.  If sprop-max-don-diff is not present, the value of
         the parameter is unspecified.  sprop-max-don-diff is
         calculated as follows:

             sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)},
             for any i and any j>i,

         where i and j indicate the index of the NAL unit in the
         transmission order and AbsDON denotes a decoding order number
         of the NAL unit that does not wrap around to 0 after 65535.
         In other words, AbsDON is calculated as follows: Let m and n
         be consecutive NAL units in transmission order.  For the very
         first NAL unit in transmission order (whose index is 0),
         AbsDON(0) = DON(0).  For other NAL units, AbsDON is calculated
         as follows:

             If DON(m) == DON(n), AbsDON(n) = AbsDON(m)

             If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
               AbsDON(n) = AbsDON(m) + DON(n) - DON(m)

             If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
               AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n)

             If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
               AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n))

             If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
               AbsDON(n) = AbsDON(m) - (DON(m) - DON(n))

         where DON(i) is the decoding order number of the NAL unit
         having index i in the transmission order.  The decoding order
         number is specified in section 4.6.

             Informative note: Receivers may use sprop-max-don-diff to
             trigger which NAL units in the receiver buffer can be
             passed to the decoder.
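
             Informative note: The AbsDON rules above translate
             directly into the following Python sketch.  The function
             names are hypothetical.  Clamping the result of
             max_don_diff to 0 is an assumption made here so that a
             stream without interleaving stays within the allowed
             value range.

```python
from typing import List, Sequence


def abs_dons(dons: Sequence[int]) -> List[int]:
    """AbsDON for each NAL unit; dons lists DON values in transmission order."""
    result: List[int] = []
    for index, don_n in enumerate(dons):
        if index == 0:
            result.append(don_n)  # AbsDON(0) = DON(0)
            continue
        don_m, abs_m = dons[index - 1], result[-1]
        if don_m == don_n:
            abs_n = abs_m
        elif don_m < don_n and don_n - don_m < 32768:
            abs_n = abs_m + don_n - don_m
        elif don_m > don_n and don_m - don_n >= 32768:
            abs_n = abs_m + 65536 - don_m + don_n      # forward wrap-around
        elif don_m < don_n and don_n - don_m >= 32768:
            abs_n = abs_m - (don_m + 65536 - don_n)    # backward wrap-around
        else:  # don_m > don_n and don_m - don_n < 32768
            abs_n = abs_m - (don_m - don_n)
        result.append(abs_n)
    return result


def max_don_diff(dons: Sequence[int]) -> int:
    """max{AbsDON(i) - AbsDON(j)} for any i and any j > i, clamped to 0."""
    values = abs_dons(dons)
    largest = max(
        (values[i] - values[j]
         for i in range(len(values))
         for j in range(i + 1, len(values))),
        default=0,
    )
    return max(0, largest)
```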

      max-rcmd-nalu-size:
         TBD

      sar-understood:
         TBD

      sar-supported:
         TBD


      Encoding considerations:
         This type is only defined for transfer via RTP (RFC 3550).

      Security considerations:
         See Section 8 of RFC XXXX.

      Public specification:
         Please refer to Section 13 of RFC XXXX.

      Additional information:
         None

      File extensions:     none

      Macintosh file type code: none

      Object identifier or OID: none

      Person & email address to contact for further information:

        Thomas Schierl, ts@thomas-schierl.de

      Intended usage:      COMMON

      Author:

        Thomas Schierl, ts@thomas-schierl.de

      Change controller:
         IETF Audio/Video Transport Payloads working group delegated
         from the IESG.

7.2 SDP Parameters

7.2.1 Mapping of Payload Type Parameters to SDP

   TBD

7.2.2 Usage with the SDP Offer/Answer Model

   The media type video/H265 string is mapped to fields in the Session
   Description Protocol (SDP) [RFC4566] as follows:

   o The media name in the "m=" line of SDP MUST be video.

   o The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the
      media subtype).


   o The clock rate in the "a=rtpmap" line MUST be 90000.

   o The OPTIONAL parameters "profile-level-id" and
      "packetization-mode", when present, MUST be included in the
      "a=fmtp" line of SDP.  These parameters are expressed as a media
      type string, in the form of a semicolon separated list of
      parameter=value pairs.

   o The OPTIONAL parameters "sprop-parameter-sets" and "sprop-level-
      parameter-sets", when present, MUST be included in the "a=fmtp"
      line of SDP or conveyed using the "fmtp" source attribute as
      specified in section 6.3 of [RFC5576].  For a particular media
      format (i.e., RTP payload type), a "sprop-parameter-sets" or
      "sprop-level-parameter-sets" parameter MUST NOT be both included
      in the "a=fmtp" line of SDP and conveyed using the "fmtp" source
      attribute.  When included in the "a=fmtp" line of SDP, these
      parameters are expressed as a media type string, in the form of a
      semicolon separated list of parameter=value pairs.  When conveyed
      using the "fmtp" source attribute, these parameters are only
      associated with the given source and payload type as parts of the
      "fmtp" source attribute.

         Informative note: Conveyance of "sprop-parameter-sets" and
         "sprop-level-parameter-sets" using the "fmtp" source attribute
         allows for out-of-band transport of parameter sets in
         topologies like Topo-Video-switch-MCU [TBD].

   An example of media representation in SDP is as follows:

   m=video 49170 RTP/AVP 98
   a=rtpmap:98 H265/90000
   a=fmtp:98 profile-level-id=UVWXYZ;
             packetization-mode=1;
             sprop-parameter-sets=<parameter sets data>
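
      Informative note: The mapping rules above could be applied with
      a Python sketch such as the following.  The function name is
      hypothetical, the value UVWXYZ is the placeholder from the
      example above, and joining the pairs with a bare semicolon on a
      single line is an assumption about the desired layout.

```python
from typing import List, Mapping


def build_media_lines(
    port: int, payload_type: int, params: Mapping[str, str]
) -> List[str]:
    """Build the m=, a=rtpmap, and a=fmtp lines for a video/H265 stream."""
    lines = [
        f"m=video {port} RTP/AVP {payload_type}",
        # The encoding name is the media subtype; the clock rate is 90000.
        f"a=rtpmap:{payload_type} H265/90000",
    ]
    if params:
        # Parameters are a semicolon separated list of parameter=value pairs.
        pairs = ";".join(f"{name}={value}" for name, value in params.items())
        lines.append(f"a=fmtp:{payload_type} {pairs}")
    return lines
```

      Returning a list of lines leaves the choice of line terminator
      to the caller that assembles the complete session description.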

7.2.3 Usage with SDP Offer/Answer Model

   TBD

7.2.4 Usage in Declarative Session Descriptions

   TBD

7.2.5 Signaling of Parallel Processing

   TBD


7.3 Examples

   TBD

7.4 Parameter Set Considerations

   TBD

8. Security Considerations

   TBD

9. Congestion Control

   TBD

10. IANA Considerations

   A new media type, as specified in Section 7.1 of this memo, should
   be registered with IANA.

11. Informative Appendix: Application Examples

11.1 Introduction

   TBD

11.2 Streaming

   TBD

11.3 Videoconferencing (Unicast to MANE, Unicast to Endpoints)

   TBD

11.4 Mobile TV (Multicast to MANE, Unicast to Endpoint)

   TBD

12. Acknowledgements

   TBD

   This document was prepared using 2-Word-v2.0.template.dot.


13. References

13.1 Normative References

   [HEVC]   JCT-VC, "High-Efficiency Video Coding (HEVC) text
             specification Working Draft 6", JCTVC-H1003, February
             2012.

   [H.264]  ITU-T Recommendation H.264, "Advanced video coding for
             generic audiovisual services", March 2010.

   [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
             Payload Format for H.264 Video", RFC 6184, May 2011.

   [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A.
             Eleftheriadis, "RTP Payload Format for Scalable Video
             Coding", RFC 6190, May 2011.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
             With Session Description Protocol (SDP)", RFC 3264, June
             2002.

   [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
             Encodings", RFC 4648, October 2006.

   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
             Jacobson, "RTP: A Transport Protocol for Real-Time
             Applications", STD 64, RFC 3550, July 2003.

   [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
             Description Protocol", RFC 4566, July 2006.

   [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific
             Media Attributes in the Session Description Protocol", RFC
             5576, June 2009.

13.2 Informative References

   [RFC5109] Li, A., "RTP Payload Format for Generic Forward Error
             Correction", RFC 5109, December 2007.


14. Authors' Addresses

   Thomas Schierl
   Fraunhofer HHI
   Einsteinufer 37
   D-10587 Berlin
   Germany
   Phone: +49-30-31002-227
   EMail: ts@thomas-schierl.de

   Stephan Wenger
   Vidyo, Inc.
   433 Hackensack Ave., 7th floor
   Hackensack, N.J. 07601
   USA
   Phone: +1-415-713-5473
   EMail: stewe@stewe.org

   Ye-Kui Wang
   Qualcomm Incorporated
   5775 Morehouse Drive
   San Diego, CA 92121
   USA
   Phone: +1-858-651-8345
   EMail: yekuiw@qualcomm.com

   Miska M. Hannuksela
   Nokia Corporation
   P.O. Box 1000
   33721 Tampere
   Finland
   Phone: +358-7180-08000
   EMail: miska.hannuksela@nokia.com

