Network Working Group                                    Johan Sjoberg
 INTERNET-DRAFT                                       Magnus Westerlund
 Expires: June 2005                                            Ericsson
                                                          Ari Lakaniemi
                                                                  Nokia
                                                      December 17, 2004


            Real-Time Transport Protocol (RTP) Payload Format for
                 Extended AMR Wideband (AMR-WB+) Audio Codec
                    <draft-ietf-avt-rtp-amrwbplus-04.txt>


 Status of this memo

    By submitting this Internet-Draft, each author represents that
    any applicable patent or other IPR claims of which he or she is
    aware have been or will be disclosed, and any of which he or she
    becomes aware will be disclosed, in accordance with Section 6 of
    RFC 3668.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups.  Note that
    other groups may also distribute working documents as Internet-
    Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/1id-abstracts.txt

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html

    This document is a submission of the IETF AVT WG.  Comments should
    be directed to the AVT WG mailing list, avt@ietf.org.


 Abstract

    This document specifies a real-time transport protocol (RTP) payload
    format to be used for Extended AMR Wideband (AMR-WB+) encoded audio
    signals.  The AMR-WB+ codec is an audio extension of the AMR-WB
    codec providing additional frame types designed to give higher
    quality of music and speech than the original frame types.  A media
    type registration is included for AMR-WB+.


 Sjoberg, et. al.                                              [Page 1]

 INTERNET-DRAFT       RTP payload format for AMR-WB+      Dec 17, 2004


 TABLE OF CONTENTS

 1. Definitions.....................................................3
    1.1. Glossary...................................................3
    1.2. Terminology................................................3
 2. Introduction....................................................3
 3. Background on AMR-WB+ and Design Principles.....................4
    3.1. The AMR-WB+ Audio Codec....................................5
    3.2. Multi-rate Encoding and Rate Adaptation....................7
    3.3. Voice Activity Detection and Discontinuous Transmission....8
    3.4. Support for Multi-Channel Session..........................8
    3.5. Unequal Bit-error Detection and Protection.................8
    3.6. Robustness against Packet Loss.............................9
       3.6.1. Use of Forward Error Correction (FEC).................9
       3.6.2. Use of Frame Interleaving............................10
    3.7. AMR-WB+ Audio over IP scenarios...........................11
 4. RTP Payload Format for AMR-WB+.................................12
    4.1. RTP Header Usage..........................................13
    4.2. Payload Structure.........................................13
    4.3. Payload definitions.......................................14
       4.3.1. The Payload Table of Contents........................14
       4.3.2. Audio Data...........................................20
       4.3.3. Methods for Forming the Payload......................20
       4.3.4. Payload Examples.....................................20
    4.4. Interleaving Considerations...............................23
    4.5. Implementation Considerations.............................23
       4.5.1. ISF recovery when frames are lost....................24
 5. Congestion Control.............................................26
 6. Security Considerations........................................26
    6.1. Confidentiality...........................................27
    6.2. Authentication and Integrity..............................27
    6.3. Decoding Validation.......................................27
 7. Payload Format Parameters......................................27
    7.1. Media Type Registration...................................28
    7.2. Mapping Media Type Parameters into SDP....................29
       7.2.1. Offer-Answer Model Considerations....................30
       7.2.2. Examples.............................................31
 8. IANA Considerations............................................32
 9. Contributors...................................................32
 10. Acknowledgements..............................................32
 11. References....................................................32
    11.1. Normative references.....................................32
    11.2. Informative References...................................33
 12. Authors' Addresses............................................34
 13. IPR Notice....................................................34
 14. Copyright Notice..............................................35
 15. Changes.......................................................35


 1. Definitions

 1.1. Glossary

    3GPP    - the Third Generation Partnership Project
    AMR     - Adaptive Multi-Rate Codec
    AMR-WB  - Adaptive Multi-Rate Wideband Codec
    AMR-WB+ - Extended Adaptive Multi-Rate Wideband Codec
    CMR     - Codec Mode Request
    CN      - Comfort Noise
    DTX     - Discontinuous Transmission
    FEC     - Forward Error Correction
    FT      - Frame Type
    ISF     - Internal Sampling Frequency
    SCR     - Source Controlled Rate Operation
    SID     - Silence Indicator (the frames containing only CN
              parameters)
    TFI     - Transport Frame Index
    TS      - Timestamp
    VAD     - Voice Activity Detection
    UED     - Unequal Error Detection
    UEP     - Unequal Error Protection


 1.2. Terminology

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
    this document are to be interpreted as described in RFC 2119 [2].


 2. Introduction

    This document specifies the payload format for packetization of
    Extended Adaptive Multi-Rate Wideband (AMR-WB+) [1] encoded audio
    signals into the Real-time Transport Protocol (RTP) [3].  The
    payload format supports transmission of mono or stereo audio,
    aggregating multiple frames per payload, and mechanisms enhancing
    robustness against packet loss.

    The AMR-WB+ codec is an extension of the Adaptive Multi-Rate
    Wideband (AMR-WB) codec.  The new features include extended audio
    bandwidth to enable high quality for music as well, native support
    for stereophonic audio, and the possibility to operate on different
    internal sampling frequencies (ISFs).  The primary usage scenario
    for AMR-WB+ is transport over IP; therefore, the interworking with
    other transport networks that AMR-WB requires is not needed.

    AMR-WB+ is expected to be used mainly in streaming applications,
    where an octet-aligned payload format offers the substantial benefit
    of making the packetization process on a streaming server as
    efficient as possible.  Therefore, the bandwidth efficient mode as
    defined for AMR-WB in [7] is not specified for AMR-WB+; the
    bandwidth saved by that mode would in any case be very small, since
    all extension frame types are already octet aligned at the encoder
    output.

    The stereo encoding capability makes the support for multi-channel
    transport at RTP payload format level, as specified for AMR-WB,
    obsolete and therefore this feature is not included for the AMR-WB+
    RTP payload format.  Due to all these changes, and the different
    scope of the AMR-WB+ codec, this document defines a new RTP payload
    format that differs significantly from the ones for AMR and AMR-WB
    [7].

    There is no file format for AMR-WB+ defined within this
    specification.  Instead the 3GPP defined ISO based 3GP file format
    [14] supports AMR-WB+, and provides all functionality required from
    a file format.  This format also supports storage of AMR and
    AMR-WB, plus other multi-media formats allowing for synchronized
    playback.

    The rest of the document is organized in the following way.
    Background on AMR-WB+ and design principles can be found in Section
    3.  The payload format itself is specified in Section 4 and follows
    the principles used in [3] and [9].  In Section 7, a media type
    registration is provided.


 3. Background on AMR-WB+ and Design Principles

    The Extended Adaptive Multi-Rate Wideband (AMR-WB+) [1] audio codec
    is designed for compression of speech and audio signals achieving
    low bit-rate with good quality.  The codec is specified by 3GPP, and
    primary target applications within 3GPP are packet-switched
    streaming service (PSS) [13] and multimedia messaging service (MMS).
    However, due to its flexibility and robustness, AMR-WB+ is very well
    suited for streaming services in highly varying transport
    environments, e.g. the Internet.

    Some of the options of the payload format remain constant throughout
    a session, and therefore can be controlled/negotiated at the session
    set-up.  These options and variables are described in general terms
    at appropriate points in the text of this specification as
    parameters to be established through out-of-band means.  In Section
    7, all of the parameters are specified in the form of media type
    registration for the AMR-WB+ encoding.  The method used to signal
    these parameters at session setup or to arrange prior agreement of
    the participants is beyond the scope of this document; however,
    Section 7.2 provides a mapping of the parameters into the Session
    Description Protocol (SDP) [6] for those applications that use SDP.


 3.1. The AMR-WB+ Audio Codec

    The AMR-WB+ audio codec was originally developed by 3GPP to be used
    for streaming and messaging services in GSM and 3G cellular systems.
    AMR-WB+ is designed as an audio extension to the AMR-WB speech
    codec.  The extension adds new functionality to the codec in order
    to provide high audio quality for a large range of signals including
    music.  Stereophonic operation has also been added where a new high-
    efficiency hybrid stereo coding algorithm enables stereo operation
    at bit-rates as low as 6.2 kbit/s in total.

    The AMR-WB+ audio codec includes the nine frame types specified for
    AMR-WB, extended with new bit-rates ranging from 5.2 to 48 kbit/s.
    Whereas the AMR-WB frame types employ a 16000 Hz sampling frequency
    and operate only on monophonic signals, the extension can operate
    at a number of internal sampling frequencies, ISFs, both in mono and
    stereo, see Table 24 in [1].  However, the output sampling frequency
    of the decoder is limited to 8, 16, 24, 32 or 48 kHz.

    An overview of the AMR-WB+ encoding operations is as follows.  The
    encoder receives the audio sampled at for example 48 kHz.  The
    encoding process starts with pre-processing and resampling to the
    Internal Sampling Frequency (ISF) used.  The encoding is performed
    on equal sized super-frames, each corresponding to 2048 samples per
    channel at the ISF.  The codec performs a number of encoding
    decisions for each super-frame choosing between different encoding
    algorithms and block lengths giving fidelity-optimized encoding
    adapted to the signal characteristics of the source.  The stereo
    encoding (if used) is performed separately from the monophonic core
    encoding, thus enabling the selection of different combinations of
    core and stereo encoding rates.  The resulting encoded audio is
    produced in 4 equally long transport frames, each individually
    usable by the decoder and each corresponding to 512 samples at the
    ISF.

    The codec supports 13 different ISFs, ranging from 12.8 up to 38.4
    kHz as described by table 24 in [1].  This allows a trade-off
    between audio bandwidth and the bit-rate required.  As encoding is
    performed on 2048 samples at the ISF, the duration of a super-frame
    and the effective bit-rate of the used frame type vary.  The ISF
    of 25600 Hz has a super-frame duration of 80 ms and is also the
    'nominal' value used to describe the encoding bit-rates.  Using this
    normalization, the ISF selection results in bit-rate variations from
    1/2 up to 3/2 of the nominal bit-rate.
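
    The relation between ISF, super-frame duration and bit-rate scaling
    described above can be sketched as follows.  This is an
    illustrative sketch only: the function names are hypothetical, and
    the scaling factor is assumed here to be the ratio of the ISF to the
    nominal 25600 Hz, which is consistent with the 1/2 to 3/2 range
    stated above; the authoritative factors are those in table 24 of
    [1].

```python
from fractions import Fraction

NOMINAL_ISF_HZ = 25600      # ISF used to state the nominal bit-rates
SUPERFRAME_SAMPLES = 2048   # samples per channel in one super-frame


def superframe_duration_ms(isf_hz):
    """Duration of one super-frame, in milliseconds, at the given ISF."""
    return Fraction(SUPERFRAME_SAMPLES * 1000, isf_hz)


def bitrate_factor(isf_hz):
    """Assumed scaling of the nominal bit-rate at the given ISF."""
    return Fraction(isf_hz, NOMINAL_ISF_HZ)


print(superframe_duration_ms(25600))                 # 80
print(bitrate_factor(12800), bitrate_factor(38400))  # 1/2 3/2
```

    At the lowest ISF of 12800 Hz the same calculation gives a
    super-frame of 160 ms, i.e. twice the nominal duration at half the
    nominal bit-rate.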

    For each of the 4 transport frames of a super-frame to be
    individually decodable, the position within the super-frame must be
    known.

    The encoding for the extension modes is performed as one monophonic
    core encoding and one stereo encoding.  The core encoding is
    performed by splitting the monophonic signal into a lower and a
    higher frequency band.  The lower band is encoded using either
    algebraic code excited linear prediction (ACELP) or transform coded
    excitation (TCX), which is selected once per transport frame with
    certain allowed combinations within the super-frame.  The higher
    band is encoded using a low-rate parametric bandwidth extension
    approach.  The stereo signal is encoded using a similar frequency
    band decomposition as that for the mono signal, however here the
    signal is divided into three bands that are individually
    parameterized using different techniques.

    The total bit-rate produced by the extension is the result of the
    combination of the encoder's core rate, stereo rate and ISF.  The
    extension supports 8 different core encoding rates producing bit-
    rates between 10.4 and 24.0 kbit/s, see table 22 of [1].  There are
    16 stereo encoding rates generating bit-rates between 2.0 and 8.0
    kbit/s, see table 23 of [1].  The frame type encodes the AMR-WB
    modes, 4 fixed extension rates (see below), 24 combinations of core
    and stereo rates for stereo signals, and the 8 core rates for mono
    signals as listed in table 25 in [1].  As a result, AMR-WB+ supports
    encodings between 10.4 and 32 kbit/s using an ISF of 25600 Hz.
    Further freedom in produced bit-rates and quality is
    available by using different ISFs.  The selection of an ISF will
    change the available audio bandwidth of the reconstructed signal,
    and at the same time change the total bit-rate.  The bit-rate for a
    given combination of frame type and ISF is determined by multiplying
    the frame type's bit-rate with the used ISF's bit-rate factor (see
    table 24 of [1]).

    The extension also has 4 frame types, which have fixed core bit-
    rates, stereo bit-rates and ISFs, see frame types 10-13 in Table 21
    in [1].  These four pre-defined frame types have a fixed input
    sampling frequency to the encoder set either at 16 or 24 kHz.  These
    frame types share the property with the AMR-WB modes that each
    transport frame only represents 20 ms of audio signal, however they
    are also part of 80 ms super-frames.  Thus frame types 0-13 (AMR-WB
    and fixed extension rates) as listed in table 21 of [1] do not
    require explicit ISF indication.  The other frame types 14-47
    require the ISF employed to be indicated.

    The fact that the extension has 32 different frame types that can be
    combined with 13 ISFs allows for a great flexibility in bit-rate and
    selection of desired quality.  For example there exist a number of
    combinations that will produce the same codec bit-rate.  One
    possible way of producing a 32 kbit/s audio stream is to utilize
    frame type 41, i.e. 25.6 kbit/s, and the ISF of 32 kHz (5/4 *
    (19.2+6.4) = 32 kbit/s), and another way is to use frame type 47 and
    the ISF of 25.6 kHz (1 * (24 + 8) = 32 kbit/s).  Which combination
    to use depends on the content being encoded.  In the above example
    the first case provides wider audio bandwidth, while the second one
    spends the same number of bits on somewhat narrower audio bandwidth.
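
    The two combinations in the example above can be checked with a
    short calculation.  The sketch below is illustrative only; the
    function name and its parameters are hypothetical, and the ISF
    bit-rate factor is assumed to be the ratio of the ISF to 25600 Hz,
    which matches the 5/4 and 1 factors used in the example.  Rates are
    given in bit/s so that the arithmetic stays exact.

```python
from fractions import Fraction


def total_bitrate(core_bps, stereo_bps, isf_hz, nominal_isf_hz=25600):
    """Total bit-rate in bit/s for a core rate, stereo rate and ISF."""
    return Fraction(isf_hz, nominal_isf_hz) * (core_bps + stereo_bps)


# Frame type 41 (19.2 + 6.4 kbit/s) at an ISF of 32 kHz.
wide = total_bitrate(19200, 6400, 32000)
# Frame type 47 (24 + 8 kbit/s) at an ISF of 25.6 kHz.
narrow = total_bitrate(24000, 8000, 25600)

print(wide, narrow)  # 32000 32000
```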


    The duration of one AMR-WB+ audio transport frame can vary and
    depends on the ISF.  Since a transport frame always corresponds to
    512 samples at the used ISF, its duration is limited to the range
    13.33 to 40 ms.  The RTP TS clock rate of 72000 Hz results in
    AMR-WB+ transport frame lengths from 960 to 2880 ticks, depending on
    the selected ISF.  If the internal sampling rate is set to 25600 Hz,
    the transport frame duration is equal to 20 ms and the super-frame
    duration is equal to 80 ms.

         Index   ISF   Duration(ms) Duration(TS Ticks)
         -----------------------------------------------
           0     N/A      20             1440
           1    12800     40             2880
           2    14400     35.56          2560
           3    16000     32             2304
           4    17067     30             2160
           5    19200     26.67          1920
           6    21333     24             1728
           7    24000     21.33          1536
           8    25600     20             1440
           9    28800     17.78          1280
          10    32000     16             1152
          11    34133     15             1080
          12    36000     14.22          1024
          13    38400     13.33           960

         Table 1: RTP Timestamp Ticks for each ISF
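
    The tick values in Table 1 follow from the 512 samples per
    transport frame and the 72000 Hz RTP clock.  The sketch below
    recomputes them; the names are hypothetical.  It assumes that the
    ISF values 17067, 21333 and 34133 are rounded figures, so the result
    is rounded to the nearest integer tick.

```python
RTP_CLOCK_HZ = 72000
TRANSPORT_FRAME_SAMPLES = 512

# ISF in Hz -> duration in RTP timestamp ticks, copied from Table 1.
TABLE_1 = {
    12800: 2880, 14400: 2560, 16000: 2304, 17067: 2160, 19200: 1920,
    21333: 1728, 24000: 1536, 25600: 1440, 28800: 1280, 32000: 1152,
    34133: 1080, 36000: 1024, 38400: 960,
}


def frame_ticks(isf_hz):
    """Transport frame duration in RTP ticks, rounded to an integer."""
    return round(RTP_CLOCK_HZ * TRANSPORT_FRAME_SAMPLES / isf_hz)


def frame_duration_ms(isf_hz):
    """Transport frame duration in milliseconds, derived from the ticks."""
    return frame_ticks(isf_hz) * 1000 / RTP_CLOCK_HZ


for isf, ticks in TABLE_1.items():
    assert frame_ticks(isf) == ticks
print(frame_ticks(25600), frame_duration_ms(25600))  # 1440 20.0
```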


    The encoder is able to change the used ISF and encoding frame type
    (both mono and stereo) during an encoding session.  For the
    extension frame types with index 10-13 and 16-47 the ISF and frame
    type changes are constrained to occur at super-frame boundaries,
    i.e. within a super-frame the ISF is constant.  Such a limitation
    does not apply for frame types with index 0-9, i.e. the original
    AMR-WB frame types.

    In conclusion, there are some features that need special
    consideration from a transport point of view.  Firstly, the fact
    that the frame duration depends on the ISF sets requirements on the
    RTP timestamping.  Secondly, each frame of encoded audio must
    maintain information about its frame type, ISF and position in the
    super-frame.


 3.2. Multi-rate Encoding and Rate Adaptation

    The multi-rate encoding capability of AMR-WB+ is designed for
    preserving high audio quality under a wide range of bandwidth
    requirements and transmission conditions.


    AMR-WB+ enables seamless switching between frame types using the
    same number of audio channels and the same ISF.  Every AMR-WB+ codec
    implementation is required to support all the respective audio
    coding frame types defined by the codec and must be able to handle
    switching between any two frame types.  Switching between frame
    types employing a different number of audio channels or a different
    ISF is possible, but may not be completely seamless.  Therefore it
    is recommended to perform such switching infrequently and if
    possible during periods where the input is silent.


 3.3. Voice Activity Detection and Discontinuous Transmission

    AMR-WB+ supports the same algorithms for voice activity detection
    (VAD) and generation of comfort noise (CN) parameters during silence
    periods as used by the AMR-WB codec.  However, these can only be
    used in conjunction with the AMR-WB frame types (FT=0-8).  As with
    the AMR-WB codec, this option reduces the number of transmitted bits
    and packets during silence periods to a minimum.  The
    operation of sending CN parameters at regular intervals during
    silence periods is usually called discontinuous transmission (DTX)
    or source controlled rate (SCR) operation.  The AMR-WB+ frames
    containing CN parameters are called Silence Indicator (SID) frames.
    See more details about VAD and DTX functionality in [4] and [5].


 3.4. Support for Multi-Channel Session

    Some of the AMR-WB+ frame types support encoding of stereophonic
    audio.  Because of this native support for two-channel stereophonic
    signals, it does not seem necessary to support multi-channel
    transport with separate codecs as done in the AMR-WB RTP payload
    format [7].  The codec has the capability of stereo to mono
    downmixing as part of the decoding process.  Thus, a receiver that
    is only capable of playing out monophonic audio can still decode and
    play signals originally encoded and transmitted as stereo.  However,
    to avoid
    spending bit-rate on stereo encoding that will not be utilized, a
    mechanism for signaling a session with mono only is defined.


 3.5. Unequal Bit-error Detection and Protection

    The audio bits encoded in each AMR-WB frame are sorted according to
    their different perceptual sensitivity to bit errors.  This property
    can be exploited e.g. in cellular systems to achieve better voice
    quality by using unequal error protection and detection (UEP and
    UED) mechanisms.  However, the bits of the extension frame types of
    the AMR-WB+ codec do not have a consistent sensitivity property and
    are not sorted in sensitivity order.  Thus, UEP or UED cannot be
    utilized with the extension frame types.  If there is a need to use
    UEP or UED for AMR-WB frame types, please use the RTP payload format
    for the AMR-WB defined in RFC 3267 [7].


 3.6. Robustness against Packet Loss

    The payload format supports two mechanisms to improve robustness
    against packet loss: simple forward error correction (FEC) and frame
    interleaving.


 3.6.1. Use of Forward Error Correction (FEC)

    The simple scheme of repetition of previously sent data is one way
    of achieving FEC.  Another possible scheme which can be more
    bandwidth efficient is to use payload external FEC, e.g. RFC 2733
    [11], which generates extra packets containing repair data.  For the
    AMR-WB+ extension frame types, it is possible to send redundant
    copies of an input frame encoded using the same frame type and ISF.
    We describe such a scheme next.

    The basic idea is to send previously transmitted frame(s) together
    with the new one(s).  This is done by using a sliding window to
    group the audio frames to be sent in each payload.  Figure 1 below
    shows an example.

    --+--------+--------+--------+--------+--------+--------+--------+--
      | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
    --+--------+--------+--------+--------+--------+--------+--------+--

      <---- p(n-1) ---->
               <----- p(n) ----->
                        <---- p(n+1) ---->
                                 <---- p(n+2) ---->
                                          <---- p(n+3) ---->
                                                   <---- p(n+4) ---->

    Figure 1: An example of redundant transmission.

    In this example each frame is retransmitted once in the following
    RTP payload packet.  Here, f(n-2)...f(n+4) denotes a sequence of
    audio frames and p(n-1)...p(n+4) a sequence of payload packets.
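
    The sliding window of Figure 1 can be sketched as follows.  This is
    a minimal illustration, not part of the payload format: the function
    name and the "copies" parameter are hypothetical, and frames are
    represented by plain labels rather than encoded audio.

```python
def redundant_payloads(frames, copies=1):
    """Group frames so each payload repeats the `copies` previous frames."""
    payloads = []
    for i in range(len(frames)):
        start = max(0, i - copies)          # window start, clipped at 0
        payloads.append(frames[start:i + 1])
    return payloads


frames = ["f0", "f1", "f2", "f3"]
payloads = redundant_payloads(frames)
print(payloads)
# [['f0'], ['f0', 'f1'], ['f1', 'f2'], ['f2', 'f3']]

# Losing one payload in the middle still leaves every frame available.
received = payloads[:2] + payloads[3:]
assert {f for p in received for f in p} == set(frames)
```

    With one redundant copy, each payload roughly doubles in size, which
    is the cost the sender has to weigh against the loss rate.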

    The use of this approach does not require signaling at the session
    setup.  In other words, the audio sender can choose to use this
    scheme without consulting the receiver.  This is because a packet
    containing redundant frames will not look different from a packet
    with only new frames.  For a certain timestamp, the receiver may
    receive multiple copies of a frame containing encoded audio data or
    frames indicated as NO_DATA.


    This redundancy scheme provides the same functionality as the one
    described in RFC 2198 "RTP Payload for Redundant Audio Data" [12].
    In most cases the mechanism described above is more efficient and
    simpler than requiring both endpoints to support RFC 2198 in
    addition to the AMR-WB+ RTP payload format.  However, there is one
    scenario in which the use of RFC 2198 is needed: if one desires to
    use some other codec than AMR-WB+ for the redundant encoding, the
    AMR-WB+ payload format is not able to carry it.

    The sender is responsible for selecting an appropriate amount of
    redundancy based on feedback about the channel conditions, e.g. in
    RTCP receiver reports.  The sender is also responsible for avoiding
    congestion, which may be exacerbated by redundancy (see Section 5
    for more details).


 3.6.2. Use of Frame Interleaving

    To decrease protocol overhead, the payload design allows several
    audio frames to be encapsulated into a single RTP packet.  One of
    the drawbacks of such an approach is that a packet loss then means
    the loss of several consecutive audio frames, which usually causes
    clearly audible distortion in the reconstructed audio.  Interleaving
    of frames can improve the audio quality in such cases by
    distributing the consecutive losses into a series of single frame
    losses, which are easier to cover by an error concealment algorithm.
    However, interleaving and bundling several frames per payload will
    also increase end-to-end delay and set higher buffering
    requirements, and are therefore not appropriate for all usage
    scenarios or devices.  Nevertheless, streaming applications will
    most likely be able to exploit interleaving to improve audio quality
    in lossy transmission conditions.

    Note that this payload design supports the use of frame interleaving
    as an option.  The usage of this feature needs to be negotiated or
    at least signaled in the session set-up.

    The interleaving supported by this format is rather flexible.  For
    example, a continuous pattern can be defined, as the example below
    shows.

    --+--------+--------+--------+--------+--------+--------+--------+--
      | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
    --+--------+--------+--------+--------+--------+--------+--------+--

               [ P(n)   ]
      [ P(n+1) ]                 [ P(n+1) ]
                        [ P(n+2) ]                 [ P(n+2) ]
                                          [ P(n+3) ]                 [P(
                                                            [ P(n+4) ]


    Figure 2: An example of interleaving pattern that has constant
    delay.

    In Figure 2 the consecutive frames, denoted f(n-2) to f(n+4), are
    aggregated into packets P(n) to P(n+4), two in each packet with
    interleaving.  This approach provides a pattern that allows for
    constant delay in both the interleaving and deinterleaving
    processes.  The deinterleaving buffer in this example needs to have
    room for at least 3 frames, including the one that is ready to be
    consumed.  Storage space for 3 frames is needed, for example, when
    f(n) is the next frame to be decoded and played: frame f(n) was
    received in packet P(n+2), which also carries frame f(n+3), and
    frame f(n+1), received in packet P(n+1), is already in the
    deinterleaving buffer.  Note also that in this example the buffer
    occupancy varies: when frame f(n+1) is the next one to be decoded,
    there are only two frames (f(n+1) and f(n+3)) in the buffer.
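
    The buffer occupancy described for Figure 2 can be reproduced with a
    small simulation.  The sketch below is illustrative only.  It
    assumes, as read from the figure, that packet P(n+k) carries frames
    f(n+2k-4) and f(n+2k-1), sets n = 0, assumes packets arrive in order
    without loss, and ignores frames with negative indices at start-up.
    All names are hypothetical.

```python
def packet_frames(k):
    """Indices of the frames carried by packet P(k), taking n = 0."""
    return [i for i in (2 * k - 4, 2 * k - 1) if i >= 0]


def simulate(num_packets):
    """Deinterleave packets P(0)..P(num_packets - 1) arriving in order.

    Returns the decoded frame order and the buffer occupancy observed
    just before each frame is consumed.
    """
    buffer = set()
    decoded, occupancy = [], []
    for k in range(num_packets):
        buffer.update(packet_frames(k))
        if k < 2:
            continue  # playout starts once f(0) has arrived in P(2)
        # Two frames are played out per packet interval.
        for frame in (2 * k - 4, 2 * k - 3):
            occupancy.append(len(buffer))
            buffer.remove(frame)
            decoded.append(frame)
    return decoded, occupancy


decoded, occupancy = simulate(6)
print(decoded)    # [0, 1, 2, 3, 4, 5, 6, 7]
print(occupancy)  # [3, 2, 3, 2, 3, 2, 3, 2]
```

    The occupancy alternates between 3 and 2 frames, which matches the
    two cases discussed above.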


 3.7. AMR-WB+ Audio over IP scenarios

    Since the primary target application for the AMR-WB+ codec is packet
    switched streaming, the most relevant usage scenario for this
    payload format is IP end-to-end between a server and a terminal, as
    shown in Figure 3.

              +----------+                          +----------+
              |          |    IP/UDP/RTP/AMR-WB+    |          |
              |  SERVER  |<------------------------>| TERMINAL |
              |          |                          |          |
              +----------+                          +----------+

               Figure 3: Server to terminal IP scenario


 4. RTP Payload Format for AMR-WB+

    Despite belonging to the same family of codecs, the payload format
    for AMR-WB+ is different from the AMR and AMR-WB payload formats
    [7].  The main emphasis in the payload design has been to minimize
    the overhead in typical use cases, while still providing full
    flexibility with slightly higher overhead.  This is made possible by
    defining some frame specific parameters to cover all frames in the
    payload instead of defining them for each frame separately.

    The payload format has two modes, the basic mode and the interleaved
    mode.  The main structural difference between the two modes is the
    extension of the table of contents entries with frame displacement
    fields in the interleaved mode.  The basic mode supports aggregation
    of multiple consecutive frames in a payload.  The interleaved mode
    supports aggregation of multiple frames that are non-consecutive in
    time.  In both modes it is possible to have frames encoded at
    different frame types in the same payload, but the ISF must remain
    constant throughout the payload.  However, frequent switching of the
    ISF is not expected, and the codec is restricted to switch ISF only
    on super-frame boundaries.  Thus, the payload format allows ISF
    switching only between payloads.

    The payload format is designed around the property that AMR-WB+
    frames carried in a payload are consecutive in time and share the
    same frame duration in between any ISF change.  This enables the
    receiver to derive the timestamp for an individual frame within a
    payload based either on the order of frames in the payload (basic
    mode) or on the compact displacement fields (interleaved mode).  The
    frame timestamps are used to regenerate the correct order of frames
    after reception, identify duplicates, and detect lost frames that
    require concealment.
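
    For the basic mode, the timestamp derivation can be sketched as
    below.  This is a simplified illustration under stated assumptions,
    not the normative procedure, which is given in Section 4.3.1: it
    assumes that all frames in the payload are consecutive and share one
    frame duration in RTP ticks, and that timestamps wrap modulo 2^32 as
    RTP timestamps do.  The function and parameter names are
    hypothetical.

```python
RTP_TS_MODULUS = 2 ** 32  # RTP timestamps are 32-bit values


def basic_mode_timestamps(packet_ts, num_frames, ticks_per_frame):
    """Timestamps of consecutive frames, starting at the packet timestamp."""
    return [
        (packet_ts + i * ticks_per_frame) % RTP_TS_MODULUS
        for i in range(num_frames)
    ]


# Three frames at an ISF of 25600 Hz (1440 ticks per frame).
print(basic_mode_timestamps(1000, 3, 1440))  # [1000, 2440, 3880]

# The same calculation across a timestamp wrap-around.
print(basic_mode_timestamps(2 ** 32 - 1000, 2, 1440))  # [4294966296, 440]
```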

    The interleaving scheme of this payload format is significantly more
    flexible than the one specified in RFC 3267.  The AMR and AMR-WB
    payload format is only capable of using periodic patterns with
    frames taken from an interleaving group at fixed intervals, whereas
    this interleaving scheme allows for any patterns as long as the
    difference in decoding order between any two adjacent frames in the
    interleaved payload is not more than 256 frames.  Note that even at
    the highest ISF this allows interleaving depth up to 3.41 seconds.
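
    As a rough check of the figure quoted above, the following Python
    sketch recomputes the interleaving depth.  It assumes a transport
    frame of 512 samples at the internal sampling frequency (see
    section 4.3.3) and takes 38.4 kHz, the ISF used in the example of
    section 4.3.5.3, as the highest ISF.  All names are illustrative
    and not part of this specification.

```python
SAMPLES_PER_FRAME = 512   # samples per transport frame at the ISF
MAX_DISTANCE_FRAMES = 256  # maximum decoding-order distance, in frames


def interleaving_depth_seconds(isf_hz: int) -> float:
    """Time spanned by MAX_DISTANCE_FRAMES frames at the given ISF."""
    return MAX_DISTANCE_FRAMES * SAMPLES_PER_FRAME / isf_hz


# Assumed highest ISF of 38400 Hz; prints a value of about 3.41 seconds.
print(round(interleaving_depth_seconds(38400), 2))
```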

    To allow for error resiliency through redundant transmission, the
    periods covered by multiple packets MAY overlap in time.  A receiver
    MUST be prepared to receive any audio frame multiple times.  All
    copies of a frame sent multiple times MUST use the same frame type
    and ISF and have the same RTP timestamp, or be a NO_DATA frame
    (FT=15).

    The payload consists of octet aligned elements (header, ToC and
    audio frames), and only the audio frames for AMR-WB frame types (0-
    9) require any padding to make them an integral number of octets
    long.  If additional padding is required to bring the payload length
    to a larger multiple of octets or for some other purpose, then the P
    bit in the RTP header MAY be set and padding appended as specified
    in [3].


 4.1. RTP Header Usage

    The format of the RTP header is specified in [3].  This payload
    format uses the fields of the header in a manner consistent with
    that specification.

    The RTP timestamp corresponds to the sampling instant of the first
    sample encoded for the first frame in the packet.  The timestamp
    clock frequency SHALL be 72000 Hz.  This frequency makes the frame
    duration an integer number of RTP timestamp ticks for all ISFs in
    use, and also gives reasonable conversion factors to the audio
    sampling frequencies in use.  See section 4.3.2.3 for how to derive
    the RTP timestamp for any audio frame beyond the first one.
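
    The integer-tick property can be checked with a short Python
    sketch.  It assumes a transport frame of 512 samples at the
    internal sampling frequency (see section 4.3.3) and covers only
    the three ISFs that appear in the examples of this document; the
    complete list is given in Table 24 of [1].  The function name is
    hypothetical.

```python
RTP_CLOCK_HZ = 72000      # timestamp clock rate required by this section
SAMPLES_PER_FRAME = 512   # samples per transport frame at the ISF


def frame_duration_ticks(isf_hz: int) -> int:
    """Return the frame duration in RTP ticks; raise if it is not integral."""
    ticks, remainder = divmod(SAMPLES_PER_FRAME * RTP_CLOCK_HZ, isf_hz)
    if remainder:
        raise ValueError(f"{isf_hz} Hz does not give an integer tick count")
    return ticks


# ISFs taken from the examples in this document only.
for isf_hz in (25600, 32000, 38400):
    print(isf_hz, frame_duration_ticks(isf_hz))
```

    For these three ISFs the sketch yields 1440, 1152, and 960 ticks,
    matching the values used in sections 4.3.2.3 and 4.3.5.3.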

    The RTP header marker bit (M) SHALL be set to 1 if the first frame
    carried in the packet is the first audio frame in a talkspurt.  For
    all other packets the marker bit SHALL be set to zero (M=0).

    The assignment of an RTP payload type for this new packet format is
    outside the scope of this document, and will not be specified here.
    It is expected that the RTP profile under which this payload format
    is being used will assign a payload type for this encoding or
    specify that the payload type is to be bound dynamically.

    The media type parameter "channels" is used to indicate the maximum
    number of channels allowed to be used for a given payload type.  A
    payload type where channels=1 (mono) SHALL only carry mono content,
    while a payload type for which channels=2 has been declared MAY
    carry both mono and stereo content.


 4.2. Payload Structure

    The complete payload consists of a payload header, a payload table
    of contents, and the audio data representing one or more audio
    frames.  The following diagram shows the general payload format
    layout:

    +----------------+-------------------+----------------
    | payload header | table of contents | audio data ...
    +----------------+-------------------+----------------

    Payloads containing more than one audio frame are called compound
    payloads.

    The following sections describe the variations taken by the payload
    format depending on whether the AMR-WB+ session is set up to use the
    basic mode or interleaved mode.


 4.3. Payload Definitions

 4.3.1. The Payload Header

    The payload header carries data that is common for all frames in the
    payload.  The structure of the payload header is described below.

     0 1 2 3 4 5 6 7
    +-+-+-+-+-+-+-+-+
    |   ISF   |TFI|L|
    +-+-+-+-+-+-+-+-+

    ISF (5 bits): Indicates the Internal Sampling Frequency employed for
       all frames in this payload.  The index value corresponds to
       internal sampling frequency as specified in Table 24 in [1].
       This field SHALL be set to 0 for Frame Type values 0-13.

    TFI (2 bits): Transport Frame Index from 0 (first) to 3 (last)
       indicating the position of the first transport frame of this
       payload in the AMR-WB+ super-frame structure.  This field SHALL
       be set to 0 for Frame Type values 0-9, and SHALL be ignored by
       the receiver.

    L (1 bit): Long displacement field flag for payloads in interleaved
       mode.  If set to 0, four-bit displacement fields are used to
       indicate interleaving offset; if set to 1, displacement fields of
       eight bits are used (see section 4.3.2.2).  For payloads in the
       basic mode this bit SHALL be set to 0 and SHALL be ignored by the
       receiver.

    Note that a change of ISF during a session always requires
    separate packets for frames employing different ISF values.
    Furthermore, in the interleaved mode the ISF switching also requires
    termination of the previous interleaving pattern and restarting a
    new one for the new ISF.
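
    As an illustration of the bit layout above, the following Python
    sketch packs and unpacks the payload header octet.  It assumes
    that bit 0 in the diagram is the most significant bit of the
    octet, as is customary in RTP payload format diagrams.  The type
    and function names are illustrative and not part of this
    specification.

```python
from typing import NamedTuple


class PayloadHeader(NamedTuple):
    isf: int         # 5-bit internal sampling frequency index
    tfi: int         # 2-bit transport frame index (0..3)
    long_dis: bool   # L flag: True selects eight-bit displacement fields


def build_payload_header(isf: int, tfi: int, long_dis: bool = False) -> int:
    """Pack ISF, TFI and L into one octet (ISF in the five top bits)."""
    if not 0 <= isf <= 31:
        raise ValueError("ISF index must fit in 5 bits")
    if not 0 <= tfi <= 3:
        raise ValueError("TFI must fit in 2 bits")
    return (isf << 3) | (tfi << 1) | int(long_dis)


def parse_payload_header(octet: int) -> PayloadHeader:
    """Split one payload header octet into its three fields."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("payload header is a single octet")
    return PayloadHeader(octet >> 3, (octet >> 1) & 0x3, bool(octet & 0x1))


# Header of the interleaved example in section 4.3.5.3: ISF=13, TFI=0, L=1.
print(hex(build_payload_header(13, 0, True)))
```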


 4.3.2. The Payload Table of Contents

    The table of contents (ToC) consists of a list of ToC entries where
    each entry corresponds to a group of audio frames carried in the
    payload, i.e.

    +----------------+----------------+- ... -+----------------+
    |  ToC entry #1  |  ToC entry #2  |          ToC entry #N  |
    +----------------+----------------+- ... -+----------------+

    When multiple groups of frames are present in a payload, the ToC
    entries SHALL be placed in the packet in order of their creation
    time.


 4.3.2.1. ToC Entry in the Basic Mode

    A ToC entry of a payload in the basic mode takes the following
    format:

     0                   1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F| Frame Type  |    #frames    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    F (1 bit): If set to 1, indicates that this ToC entry is followed by
       another ToC entry; if set to 0, indicates that this ToC entry is
       the last one in the ToC.

    Frame Type (FT) (7 bits): Indicates the audio codec frame type used
       for the group of frames corresponding to this ToC entry.  FT
       indicates the combination of AMR-WB+ core and stereo rate, one of
       the special AMR-WB+ frame types, the AMR-WB rate, or comfort
       noise, as specified by Table 25 in [1].

    #frames (8 bits): This field indicates the number of frames in the
       group corresponding to this ToC entry.  The number of frames is
       the value of this field plus one, i.e. in the range 1-256.
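
    A minimal Python sketch of a basic-mode ToC parser is given below.
    It assumes that bit 0 in the diagram is the most significant bit
    and applies the "value plus one" rule exactly as stated in the
    #frames definition above.  The function name and the returned
    tuple layout are hypothetical.

```python
from typing import List, Tuple


def parse_basic_toc(data: bytes, offset: int = 0) -> Tuple[List[Tuple[int, int]], int]:
    """Parse basic-mode ToC entries starting at data[offset].

    Returns ([(frame_type, frame_count), ...], offset_after_toc).
    """
    entries = []
    while True:
        if offset + 2 > len(data):
            raise ValueError("truncated ToC entry")
        follow = data[offset] >> 7           # F bit: another entry follows
        frame_type = data[offset] & 0x7F     # 7-bit FT
        frame_count = data[offset + 1] + 1   # field value plus one (1-256)
        entries.append((frame_type, frame_count))
        offset += 2
        if not follow:
            return entries, offset


# Two entries: one frame of FT=33 followed by two frames of FT=35.
print(parse_basic_toc(bytes([0x80 | 33, 0, 35, 1])))
```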


 4.3.2.2. ToC Entry in the Interleaved Mode

    A ToC entry of a payload in the interleaved mode takes the following
    format if the L-bit in the payload header is set to 0:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F| Frame Type  |    #frames    |  DIS1 |  ...  |  DISi |  ...  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ...  |  ...  |  DISn |  padd |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    F (1 bit): See definition in 4.3.2.1.

    Frame Type (FT) (7 bits): See definition in 4.3.2.1.

    #frames (8 bits): See definition in 4.3.2.1.

    DIS1...DISn (4 bits): A list of n (n=#frames) displacement fields
       indicating the displacement of the i:th (i=1..n) audio frame
       relative to the preceding audio frame in the payload as number of
       frames.  The four-bit displacement values may be between 0 and 15
       indicating the number of audio frames in decoding order between
       the (i-1):th and the i:th frame in the payload.  Note that for
       the first ToC entry of the payload the value of DIS1 has no
       meaning, since this frame's location in the decoding order is
       uniquely defined by the RTP timestamp and TFI in the payload
       header.  For the first ToC entry of a payload the DIS1 SHALL be
       set to zero, and the receiver SHALL ignore the value.  Note also
       that for subsequent ToC entries DIS1 indicates the number of
       frames between the last frame of the previous group and the first
       frame of this group.

    Padd (4 bits): Four padding bits SHALL be included at the end of the
       ToC entry if there is an odd number of frames in the group
       corresponding to this entry.  These bits SHALL be set to zero and
       SHALL be ignored by the receiver.  If a group containing an even
       number of frames is associated with this ToC entry, these padding
       bits SHALL NOT be included in the payload.

    A ToC entry of a payload in the interleaved mode takes the following
    format if the L-bit in the payload header is set to 1:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F| Frame Type  |    #frames    |      DIS1     |      ...      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |      ...      |     DISn      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    F (1 bit): See definition in 4.3.2.1.

    Frame Type (FT) (7 bits): See definition in 4.3.2.1.

    #frames (8 bits): See definition in 4.3.2.1.

    DIS1...DISn (8 bits): A list of n (n=#frames) displacement fields
       indicating the displacement of the i:th (i=1..n) audio frame
       relative to the preceding audio frame in the payload as number of
       frames.  The eight-bit displacement values may be between 0 and
       255 indicating the number of audio frames in decoding order
       between the (i-1):th and the i:th frame in the payload.  Note
       that for the first ToC entry of the payload the value of DIS1 has
       no meaning, since this frame's location in the decoding order is
       uniquely defined by the RTP timestamp and TFI in the payload
       header.  For the first ToC entry of a payload the DIS1 SHALL be
       set to zero, and the receiver SHALL ignore the value.  Note also
       that for subsequent ToC entries DIS1 indicates the displacement
       between the last frame of the previous group and the first frame
       of this group.
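
    The two interleaved-mode layouts can be parsed along the following
    lines.  This Python sketch assumes most-significant-bit-first
    ordering (so that DIS1 occupies the upper four bits of its octet
    when L=0) and uses the "value plus one" rule of section 4.3.2.1
    for #frames.  The function name and return layout are
    hypothetical.

```python
from typing import List, Tuple


def parse_interleaved_toc_entry(
    data: bytes, offset: int, long_dis: bool
) -> Tuple[bool, int, List[int], int]:
    """Parse one interleaved-mode ToC entry at data[offset].

    Returns (follow, frame_type, displacements, next_offset).
    """
    if offset + 2 > len(data):
        raise ValueError("truncated ToC entry")
    follow = bool(data[offset] >> 7)
    frame_type = data[offset] & 0x7F
    count = data[offset + 1] + 1
    offset += 2
    # Eight-bit fields use one octet each; four-bit fields are packed two
    # per octet, with a padding nibble when the frame count is odd.
    size = count if long_dis else (count + 1) // 2
    if offset + size > len(data):
        raise ValueError("truncated displacement list")
    raw = data[offset:offset + size]
    if long_dis:
        displacements = list(raw)
    else:
        displacements = []
        for octet in raw:
            displacements.append(octet >> 4)
            displacements.append(octet & 0x0F)
        displacements = displacements[:count]   # drop the padding nibble
    return follow, frame_type, displacements, offset + size


# One entry with eight-bit fields: FT=47, four frames, DIS = 0, 18, 15, 10.
print(parse_interleaved_toc_entry(bytes([47, 3, 0, 18, 15, 10]), 0, True))
```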


 4.3.2.3. RTP Timestamp Derivation

    The RTP timestamp value for a frame is the timestamp value of the
    first audio sample encoded in the frame.  The timestamp value for a
    frame is derived differently depending on whether the payload is in
    basic or interleaved mode.  In both cases the first frame in a
    compound packet has an RTP timestamp equal to the one received in
    the RTP header.  In the basic mode, the RTP timestamp for any
    subsequent frame is derived by adding together the frame durations
    (see Table 1) of all the preceding frames in the payload and adding
    the sum to the RTP header timestamp value.  For example, if the RTP
    header timestamp value is 12345, the payload carries four frames,
    and the frame duration is 16 ms (ISF = 32 kHz) corresponding to 1152
    timestamp ticks, the RTP timestamp of the fourth frame in the
    payload is 12345 + 3 * 1152 = 15801.
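
    The basic-mode rule can be sketched in Python as follows.  The
    function name is hypothetical, and the reduction modulo 2^32
    reflects the 32-bit RTP timestamp field of [3] rather than
    anything stated in this section.

```python
from typing import List


def basic_mode_timestamps(rtp_timestamp: int, durations: List[int]) -> List[int]:
    """Return one timestamp per frame in a basic-mode payload.

    durations[i] is the duration, in 72 kHz ticks, of the i-th frame.
    """
    timestamps = []
    current = rtp_timestamp
    for duration in durations:
        timestamps.append(current % 2**32)   # RTP timestamps are 32 bits wide
        current += duration
    return timestamps


# Four frames of 1152 ticks each, RTP header timestamp 12345.
print(basic_mode_timestamps(12345, [1152] * 4))
# [12345, 13497, 14649, 15801]
```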

    In interleaved mode the RTP timestamp for each frame in the payload
    is derived by combining the RTP header timestamp and the sum of the
    time offsets of all preceding frames in this payload.  The frame
    timestamps are computed based on displacement fields and the frame
    duration derived from the ISF value.  Note that the displacement in
    time between frame i-1 and frame i is (DISi + 1) * frame duration,
    because the duration of the (i-1):th frame must also be taken into
    account.  The following example derives the RTP timestamps for the
    frames in an interleaved mode payload having the following header
    and ToC information:

    RTP header timestamp: 12345
    ISF = 32 kHz
    Frame 1 displacement field: DIS1 = 0
    Frame 2 displacement field: DIS2 = 6
    Frame 3 displacement field: DIS3 = 4
    Frame 4 displacement field: DIS4 = 7

    The ISF of 32 kHz implies frame duration of 16 ms, which means 1152
    ticks in 72 kHz timestamp rate.  The timestamp of the first frame in
    the payload is the RTP timestamp, i.e. TS1 = RTP TS.  Note that the
    displacement field value for this frame must be ignored.  For the
    second frame in the payload the timestamp can be calculated as TS2 =
    TS1 + (DIS2 + 1) * 1152 = 20409.  For the third frame the timestamp
    is TS3 = TS2 + (DIS3 + 1) * 1152 = 26169.  Finally, for the fourth
    frame of the payload we have TS4 = TS3 + (DIS4 + 1) * 1152 = 35385.
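
    The same derivation, written as a Python sketch.  The function
    name is hypothetical; the displacement list is assumed to contain
    the fields of all ToC entries in payload order, with the first
    value ignored as required above, and timestamps are reduced modulo
    2^32 to match the 32-bit RTP timestamp field of [3].

```python
from typing import List


def interleaved_timestamps(
    rtp_timestamp: int, displacements: List[int], frame_ticks: int
) -> List[int]:
    """Return one timestamp per frame in an interleaved-mode payload."""
    if not displacements:
        raise ValueError("a payload carries at least one frame")
    timestamps = [rtp_timestamp % 2**32]
    for dis in displacements[1:]:   # DIS1 of the first ToC entry is ignored
        step = (dis + 1) * frame_ticks   # skipped frames plus the previous one
        timestamps.append((timestamps[-1] + step) % 2**32)
    return timestamps


# Values from the example above: ISF = 32 kHz, i.e. 1152 ticks per frame.
print(interleaved_timestamps(12345, [0, 6, 4, 7], 1152))
# [12345, 20409, 26169, 35385]
```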


 4.3.2.4. Frame Type Considerations

    The value of Frame Type is defined in Table 25 in [1].  FT=14
    (AUDIO_LOST) is used to indicate frames that are lost.  NO_DATA
    (FT=15) frame could mean either that there is no data produced by
    the audio encoder for that frame or that no data for that frame is
    transmitted in the current payload (i.e., valid data for that frame
    could be sent either in an earlier or later packet).  The duration
    for these non-included frames is dependent on the internal sampling
    frequency indicated by the ISF field.

    For frame types with index 0-13 the ISF field SHALL be set to 0 and
    has no meaning.  The frame duration for these frame types is fixed
    at 20 ms, i.e. 1440 ticks at 72 kHz.  For payloads containing only
    frame types with index 0-9 the TFI field SHALL be set to 0 and has
    no meaning.


 4.3.2.5. Other TOC Considerations

    If a ToC entry with an undefined FT value is received, the whole
    packet SHOULD be discarded.  This is to avoid the loss of data
    synchronization in the depacketization process, which can result in
    a severe degradation in audio quality.

    Note that packets containing only NO_DATA frames SHOULD NOT be
    transmitted.  Also, NO_DATA frames at the end of a frame sequence to
    be carried in a payload SHOULD NOT be included in the transmitted
    packet.  The AMR-WB+ SCR/DTX is identical with AMR-WB SCR/DTX
    described in [5] and can only be used in combination with the AMR-WB
    frame types (0-8).

    When multiple groups of frames are present, their ToC entries will
    be placed in the ToC in order of their creation time regardless of
    the payload mode.  In basic mode the frames will be consecutive in
    time, while in interleaved mode the frames may not only be
    non-consecutive in time but may even have varying inter-frame
    distances.

 4.3.2.6. ToC Examples

    The following figure shows an example of a ToC for three audio
    frames in basic mode.  Note that in this case all audio frames are
    encoded using the same frame type, i.e. there is only one ToC entry.

     0                   1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0| Frame Type1 |  #frames = 3  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


    The following figure shows an example of a ToC of three entries in
    basic mode.  Note that also in this case the payload carries three
    frames, but three ToC entries are needed since all frames of the
    payload are encoded using different frame types.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |1| Frame Type1 |  #frames = 1  |1| Frame Type2 |  #frames = 1  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0| Frame Type3 |  #frames = 1  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


    The following figure shows an example of a ToC of two entries in
    interleaved mode using four-bit displacement fields.  The payload
    includes two groups of frames, the first one including a single
    frame, and the other one consisting of two frames.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |1| Frame Type1 |  #frames = 1  |  DIS1 |  padd |0| Frame Type2 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  #frames = 2  |  DIS1 |  DIS2 |


 Sjoberg, et. al.            Standards Track                 [Page 19]

 INTERNET-DRAFT       RTP payload format for AMR-WB+      Dec 17, 2004


    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


 4.3.3. Audio Data

    Audio data of a payload contains one or more audio frames or comfort
    noise frames, as described in the ToC of the payload.

       Note, for ToC entries with FT=14 or 15, there will be no
       corresponding audio frame present in the audio data.

    Each audio frame for an extension frame type represents an AMR-WB+
    transport frame corresponding to the encoding of 512 samples of
    audio sampled with the internal sampling frequency specified by the
    ISF indicator.  As an exception, frame types with index 10-13 are
    only capable of using a single internal sampling frequency (25600
    Hz).  The encoding rates (combination of core bit-rate and stereo
    bit-rate) are indicated in the frame type field of the corresponding
    ToC entry.  The octet length of the audio frame is implicitly
    defined by the frame type field and is given in tables 21 and 25 of
    [1].  The order and numbering notation of the bits are as specified
    in [1].  As specified there, the bits of the AMR-WB audio frames
    (frame type values in range 0...8) have been rearranged in order of
    decreasing sensitivity.  For the AMR-WB+ extension frame types and
    comfort noise frames, the bits are in the order produced by the
    encoder.  The last octet of each audio frame MUST be padded with
    zeroes at the end if not all bits in the octet are used.  In other
    words, each audio frame MUST be octet-aligned.  However, all
    extension frame types (10-13, 16-47) specified in [1] lead to octet-
    aligned frames.


 4.3.4. Methods for Forming the Payload

    The payload begins with the payload header, followed by the table of
    contents consisting of a list of ToC entries.

    The audio data follows the table of contents.  All of the octets
    comprising an audio frame are appended to the payload as a unit.
    The audio frames are packed in timestamp order within each group of
    frames (per ToC entry).  Each group of frames is packed in the same
    order as their corresponding ToC entries are arranged in the ToC,
    with the exception that for a ToC entry with FT=14 or FT=15 there
    will be no data octets present for that group of frames.
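
    The packing order described above can be sketched in Python for
    the basic mode.  This is an illustration under stated assumptions,
    not a reference packetizer: the function name and the "groups"
    argument are hypothetical, the #frames field is written as the
    frame count minus one (per section 4.3.2.1), and frames are
    assumed to be octet-aligned already.

```python
from typing import List, Sequence, Tuple

NO_AUDIO_FRAME_TYPES = {14, 15}   # AUDIO_LOST and NO_DATA carry no octets


def build_basic_payload(
    isf: int, tfi: int, groups: Sequence[Tuple[int, List[bytes]]]
) -> bytes:
    """Assemble payload header, ToC and audio data for a basic-mode payload.

    groups is a list of (frame_type, frames) in creation-time order.  For
    FT 14 and 15 the list entries only contribute to the frame count.
    """
    if not groups:
        raise ValueError("at least one group of frames is required")
    if not (0 <= isf <= 31 and 0 <= tfi <= 3):
        raise ValueError("ISF or TFI out of range")
    payload = bytearray([(isf << 3) | (tfi << 1)])   # L bit is 0 in basic mode
    for index, (frame_type, frames) in enumerate(groups):
        if not 0 <= frame_type <= 127 or not 1 <= len(frames) <= 256:
            raise ValueError("invalid frame type or group size")
        follow = 0x80 if index < len(groups) - 1 else 0x00
        payload.append(follow | frame_type)
        payload.append(len(frames) - 1)
    for frame_type, frames in groups:
        if frame_type in NO_AUDIO_FRAME_TYPES:
            continue                                  # no data octets present
        for frame in frames:
            payload += frame
    return bytes(payload)


# Three 35-octet frames of FT=26 with ISF=8 and TFI=2 (dummy frame contents).
print(len(build_basic_payload(8, 2, [(26, [bytes(35)] * 3)])))
```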


 4.3.5. Payload Examples

 4.3.5.1. Example 1, Basic Mode Payload Carrying Multiple Frames Encoded
    Using the Same Frame Type

    The following diagram shows a payload that carries three AMR-WB+
    frames encoded using 14 kbit/s frame type (FT=26) with a frame
    length of 280 bits (35 bytes).  The internal sampling frequency in
    this example is 25.6 kHz (ISF = 8).  The TFI for the first frame is
    2, indicating that the first transport frame in this payload is the
    third in a super-frame.  Since this payload is in the basic mode the
    subsequent frames of the payload are consecutive frames in decoding
    order, i.e. the fourth transport frame of the current super-frame
    and the first transport frame of the next super-frame.  Note that
    because the frames are all encoded using the same frame type, only
    one ToC entry is required.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ISF = 8 | 2 |0|0|  FT = 26    |  #frames = 3  |   f1(0...7)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...           | f1(272...279) |   f2(0...7)   |               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | f2(272...279) |   f3(0...7)   | ...                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                                           | f3(272...279) |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


 4.3.5.2. Example 2, Basic Mode Payload Carrying Multiple Frames Encoded
    Using Different Frame Types

    The following diagram shows a payload that carries three AMR-WB+
    frames; the first frame is encoded using 18.4 kbit/s frame type
    (FT=33) with a frame length of 368 bits (46 bytes), and the two
    subsequent frames are encoded using 20 kbit/s frame type (FT=35)
    having frame length of 400 bits (50 bytes).  The internal sampling
    frequency in this example is 32 kHz (ISF = 10), implying the overall
    bit-rates of 23 kbit/s for the first frame of the payload, and 25
    kbit/s for the subsequent frames.  The TFI for the first frame is 3,
    indicating that the first transport frame in this payload is the
    fourth in a super-frame.  Since this is a payload in the basic mode
    the subsequent frames of the payload are consecutive frames in
    decoding order, i.e. the first and second transport frames of the
    next super-frame.  Note that since the payload carries two
    different frame types, there are two ToC entries.


     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ISF=10 | 3 |0|1|  FT = 33    |  #frames = 1  |0|  FT = 35    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  #frames = 2  |   f1(0...7)   | ...                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                           | f1(360...367) |   f2(0...7)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | f2(392...399) |   f3(0...7)   | ...                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                           | f3(392...399) |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


 4.3.5.3. Example 3, Payload in Interleaved Mode

    This example shows a payload in interleaved mode carrying four
    frames encoded using 32 kbit/s frame type (FT=47) with frame length
    of 640 bits (80 bytes).  The internal sampling frequency is 38.4 kHz
    (ISF = 13) implying bit-rate of 48 kbit/s for all frames in the
    payload.  The TFI for the first frame is 0, i.e. it is the first
    transport frame of a super-frame.  The displacement fields for the
    subsequent frames are DIS2=18, DIS3=15, and DIS4=10, which implies
    that the subsequent frames have the TFIs of 3, 3, and 2,
    respectively.  The long displacement field flag L in the payload
    header is set to 1, which means that the displacement fields in the
    ToC entry use eight bits.  Note that since all frames of this
    payload are encoded using the same frame type, only a single ToC
    entry is needed.  Furthermore, the displacement field for the
    first frame corresponding to the first ToC entry (DIS1=0) must be
    ignored since its timestamp and TFI are defined by the RTP timestamp
    and the TFI found in the payload header.

    The RTP timestamp values of the frames in this example are:
    Frame1: TS1 = RTP Timestamp
    Frame2: TS2 = TS1 + 19 * 960
    Frame3: TS3 = TS2 + 16 * 960
    Frame4: TS4 = TS3 + 11 * 960


     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ISF=13 | 0 |1|0|  FT = 47    |  #frames = 4  |   DIS1 = 0    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |   DIS2 = 18   |   DIS3 = 15   |   DIS4 = 10   |   f1(0...7)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                           | f1(632...639) |   f2(0...7)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                           | f2(632...639) |   f3(0...7)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                           | f3(632...639) |   f4(0...7)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    : ...                                                           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                           | f4(632...639) |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
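
    The TFIs and timestamp offsets stated for this example can be
    re-derived with the Python sketch below.  It assumes that the TFI
    advances by one per transport frame in decoding order and wraps
    after four frames (one super-frame); the function names are
    hypothetical.

```python
from typing import List

FRAME_TICKS = 960   # 512 samples at 38.4 kHz expressed in 72 kHz ticks


def frame_tfis(first_tfi: int, displacements: List[int]) -> List[int]:
    """TFI of each frame, given the header TFI and the DIS fields."""
    tfis = [first_tfi]
    for dis in displacements[1:]:          # DIS1 is ignored
        tfis.append((tfis[-1] + dis + 1) % 4)
    return tfis


def timestamp_offsets(displacements: List[int], frame_ticks: int) -> List[int]:
    """Offset of each frame from the RTP header timestamp, in ticks."""
    offsets = [0]
    for dis in displacements[1:]:
        offsets.append(offsets[-1] + (dis + 1) * frame_ticks)
    return offsets


dis_fields = [0, 18, 15, 10]
print(frame_tfis(0, dis_fields))                      # [0, 3, 3, 2]
print(timestamp_offsets(dis_fields, FRAME_TICKS))     # [0, 18240, 33600, 44160]
```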


 4.4. Interleaving Considerations

    The flexible interleaving scheme requires some further usage
    considerations.  As presented in the example in Section 3.6.2, an
    interleaving pattern requires a certain size of the deinterleaving
    buffer.  This required buffer space, expressed as a number of frame
    slots, is indicated using the "interleaving" media parameter.  The
    number of frame slots needed can be converted into an actual memory
    requirement by considering the largest (in bytes) combination of
    AMR-WB+'s core and stereo rates.

    However, the information about the frame buffer size is not always
    sufficient to determine when it is appropriate to start consuming
    frames from the interleaving buffer.  There are two cases in which
    additional information is needed: switching of the ISF and changes
    of the interleaving pattern.  For this reason the "int-delay" media
    type parameter is defined.  It allows a sender to indicate the
    minimal media time that needs to be present in the buffer before the
    decoder can start consuming frames from the buffer.


 4.5. Implementation Considerations

    An application implementing this payload format MUST understand all
    the payload parameters in the out-of-band signaling used.  For
    example, if an application uses SDP, all the SDP and MIME parameters
    in this document MUST be understood.  This requirement ensures that
    an implementation can always decide whether or not it is capable of
    communicating.

    Both the basic and the interleaved mode SHALL be implemented.  The
    implementation burden of both is rather small and requiring both
    ensures interoperability.  As the AMR-WB+ codec contains full
    functionality of the AMR-WB codec, anyone supporting the AMR-WB+
    codec and this payload format is RECOMMENDED to also implement the
    payload format in RFC 3267 [7] for the AMR-WB frame types.  This
    will significantly help interoperability with devices that only
    support AMR-WB, in applications and scenarios where this is
    possible.  Otherwise an AMR-WB+ end-point that is in fact capable of
    everything except the RTP payload format for AMR-WB will not be able
    to communicate.

    When doing error concealment, certain precautions are needed due to
    the possibility of ISF switching.  The main difficulty is that when
    a packet is lost, the information needed to perform error
    concealment correctly (the ISF, the number of frames, and the RTP
    timestamp of the missing packet) is lost with it.  This may lead to
    error concealment being performed using an incorrect frame length,
    which in the worst case can make some of the frames received in
    subsequent payloads unusable.  More information and an example
    algorithm for solving this problem are available in section 4.5.1
    below.


 4.5.1. ISF recovery in case of packet loss

    In case of packet loss a proper error concealment has to be
    initiated in the AMR-WB+ decoder to replace the frames carried in
    the lost packet.  A loss concealment algorithm requires a codec
    framing that matches the timestamps of the correctly received
    frames.  Hence, it is necessary to recover the timestamps of the
    lost frames.  A difficulty with this may arise due to the fact that
    the codec frame length that is associated with the ISF may have
    changed during the frame loss.

    The task of recovering the timestamps of lost frames is illustrated
    by an example case where two frames with timestamps t0 and t1 have
    been received properly, the first being the last frame before the
    loss and the second being the first frame after the loss period.
    The ISF values for these frames are isf0 and isf1, respectively.
    The associated frame lengths (in timestamp ticks) are given as L0
    and L1, respectively.  Three frames with timestamps x1 - x3 have
    been lost.  The example further assumes that the ISF changes once
    from isf0 to isf1 during the frame loss, as shown in the figure
    below.


    What is generally not known in the decoder and what is required for
    recovery of the timestamps is:
    * the ISFs associated with the lost frames
    * how many frames have been lost


      |<---L0--->|<---L0--->|<-L1->|<-L1->|<-L1->|

      |   Rxd    |   lost   | lost | lost |  Rxd |
    --+----------+----------+------+------+------+--

      t0         x1         x2     x3     t1

    The following example algorithm may be used to recover the
    timestamps and ISFs of lost frames.

    As in the above example, it is assumed that two frames have been
    received properly with timestamps t0 and t1, ISF values isf0 and
    isf1, and associated frame lengths L0 and L1, respectively.
    Furthermore, the TFIs of the two received frames are denoted by tfi0
    and tfi1, respectively.

    Example Algorithm:

    Start:                              # check for frame loss
    If (t0 + L0) == t1 Then goto End    # no frame loss

    Step 1:                             # check case with no ISF change
    If (isf0 != isf1) Then goto Step 2  # At least one ISF change
    If (isFractional((t1 - t0)/L0)) Then goto Step 3
                                        # More than 1 ISF change

    Return recovered timestamps as
    x(n) = t0 + n*L0 and associated ISF equal to isf0,
    for 0 < n < (t1 - t0)/L0
    goto End

    Step 2:
    Loop initialization: n := 4 - (tfi0 mod 4)
    While n <= (t1 - t0)/L0
      Evaluate m := (t1 - t0 - n*L0)/L1
      If (isInteger(m) AND ((tfi0+n+m) mod 4 == tfi1)) Then goto found
      n := n+4
    endloop
    goto Step 3                         # More than 1 ISF change

    found:
    Return recovered timestamps and ISFs as
    x(i) = t0 + i*L0 and associated ISF equal to isf0,
    for 0 < i < n
    x(i) = t0 + n*L0 + (i-n)*L1 and associated ISF equal to isf1,
    for n <= i < n+m


    goto End

    Step 3:
    More than 1 ISF change has occurred.  Since ISF changes can be
    assumed to be infrequent, such a situation occurs only if long
    sequences of frames are lost.  In that case it is probably not
    useful to try to recover the timestamps of the lost frames.  Rather,
    the AMR-WB+ decoder should be reset and decoding should be resumed
    starting with the frame with timestamp t1.

    End:

    The above algorithm still does not solve the case where the receiver
    buffer depth is shallower than the loss burst.  In that case, the
    concealment must be done without any knowledge of future frames and
    may result in loss of frame boundary alignment.  If that occurs, it
    may be necessary to reset and restart the codec to perform
    resynchronization.
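
    As a non-normative illustration, the example algorithm of this
    section can be sketched in Python as shown below.  The sketch
    assumes that frame lengths are integer numbers of timestamp ticks
    and that the timestamps have already been unwrapped; the function
    name recover_lost_frames and all numeric values are hypothetical.
    It reports only the frames strictly between t0 and t1, assigns isf1
    from the first frame of the new superframe onward (as in the figure
    above), returns an empty list when no frame was lost, and returns
    None for the Step 3 case.

```python
def recover_lost_frames(t0, L0, isf0, tfi0, t1, L1, isf1, tfi1):
    """Recover (timestamp, ISF) pairs for frames lost between t0 and t1.

    Returns [] if nothing was lost and None if more than one ISF change
    must be assumed (Step 3: reset the decoder and resume at t1).
    """
    gap = t1 - t0
    if gap == L0:                       # Start: no frame loss
        return []

    if isf0 == isf1:                    # Step 1: no ISF change visible
        if gap % L0 != 0:               # fractional number of frames
            return None                 # Step 3
        return [(t0 + i * L0, isf0) for i in range(1, gap // L0)]

    # Step 2: n frames of length L0 (counting the frame at t0) up to a
    # superframe boundary, followed by m frames of length L1.
    n = 4 - tfi0 % 4
    while n * L0 <= gap:
        rest = gap - n * L0
        if rest % L1 == 0:
            m = rest // L1
            if (tfi0 + n + m) % 4 == tfi1:
                old = [(t0 + i * L0, isf0) for i in range(1, n)]
                new = [(t0 + n * L0 + j * L1, isf1) for j in range(m)]
                return old + new
        n += 4
    return None                         # Step 3
```

    For the loss pattern in the figure, with the hypothetical values
    t0 = 0, L0 = 1440, L1 = 960, t1 = 4800 and tfi0 = tfi1 = 2, the
    sketch finds n = 2 and m = 2 and returns the timestamps 1440 (with
    isf0), 2880 and 3840 (both with isf1).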


 5. Congestion Control

    The general congestion control considerations for transporting RTP
    data apply to AMR-WB+ audio over RTP as well, see RTP [3] and any
    applicable RTP profile like AVP [9].  However, the multi-rate
    capability of AMR-WB+ audio coding provides a mechanism for
    controlling congestion, since the bandwidth demand can be adjusted
    by selecting a different coding frame type or lower internal
    sampling rate.

    Another parameter that may impact the bandwidth demand for AMR-WB+
    is the number of frames that are encapsulated in each RTP payload.
    Packing more frames in each RTP payload can reduce the number of
    packets sent and hence the overhead from IP/UDP/RTP headers, at the
    expense of increased delay and reduced error robustness against
    packet losses.
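
    The trade-off described above can be made concrete with a small
    calculation, sketched here in Python.  The sketch assumes IPv4
    without options (20 bytes), UDP (8 bytes) and an RTP header without
    CSRC entries or extensions (12 bytes).  The frame size of 60 bytes,
    the frame duration of 20 ms and the function name header_overhead
    are hypothetical, and the AMR-WB+ payload header and table of
    contents are ignored for simplicity.

```python
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP, no options/CSRC/extensions

def header_overhead(frames_per_packet, frame_bytes=60, frame_ms=20):
    """Return (packets per second, header share of the packet size)."""
    packets_per_second = 1000 / (frames_per_packet * frame_ms)
    payload_bytes = frames_per_packet * frame_bytes
    share = HEADER_BYTES / (HEADER_BYTES + payload_bytes)
    return packets_per_second, share

# Compare one, two and four frames per RTP payload.
for k in (1, 2, 4):
    pps, share = header_overhead(k)
    print(f"{k} frame(s)/packet: {pps:.1f} packets/s, {share:.0%} headers")
```

    With these hypothetical sizes, the header share of each packet
    falls from 40% at one frame per packet to 25% at two and about 14%
    at four, while the packet rate drops from 50 to 12.5 packets per
    second.  Each additional frame per packet also adds one frame
    duration of packetization delay.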

    If forward error correction (FEC) is used to combat packet loss, the
    amount of redundancy added by FEC will need to be regulated so that
    the use of FEC itself does not cause a congestion problem.


 6. Security Considerations

    RTP packets using the payload format defined in this specification
    are subject to the general security considerations discussed in RTP
    [3] and any applicable profile such as AVP [9] or SAVP [10].  As
    this format transports encoded audio, the main security issues
    include confidentiality, integrity protection, and data origin
    authentication of the audio itself.  The payload format itself does
    not have any built-in security mechanisms.  Any suitable external
    mechanisms, such as SRTP [10], MAY be used.

    Neither this payload format nor the AMR-WB+ decoder exhibits any
    significant non-uniformity in the receiver-side computational
    complexity for packet processing; thus, they are unlikely to pose a
    denial-of-service threat due to the receipt of pathological data.


 6.1. Confidentiality

    To achieve confidentiality of the encoded AMR-WB+ audio, all audio
    data bits will need to be encrypted.  There is less need to encrypt
    the payload header or the table of contents because 1) they only
    carry information about the frame type, and 2) this information
    could be useful to a third party, e.g., for quality monitoring.

    As long as the AMR-WB+ payload is only packed and unpacked at either
    end, encryption can be performed after packet encapsulation.


 6.2. Authentication and Integrity

    To authenticate the sender of the audio and provide integrity
    protection, an external mechanism has to be used.  It is RECOMMENDED
    that such a mechanism protect at least the complete RTP payload and
    header.

    Data tampering by a man-in-the-middle attacker could replace audio
    content and also result in erroneous depacketization/decoding that
    could lower the audio quality.


 6.3. Decoding Validation

    When processing a received payload packet, if the receiver finds
    that the payload length calculated from the session information and
    the values found in the payload header fields does not match the
    size of the received packet, the receiver SHOULD discard the
    packet.  This is because decoding a packet with errors in the
    fields that indicate the number of frames or the frame type, which
    are used to determine the data lengths of the individual frames,
    could severely degrade the audio quality.


 7. Payload Format Parameters

    This section defines the parameters that may be used to select
    features of the AMR-WB+ payload format.  The parameters are defined
    here as part of the media type registration for the AMR-WB+ audio
    codec.  A mapping of the parameters into the Session Description
    Protocol (SDP) [6] is also provided for those applications that use
    SDP.  Equivalent parameters could be defined elsewhere for use with
    control protocols that do not use MIME or SDP.

    The data format and parameters are only specified for real-time
    transport in RTP.


 7.1. Media Type Registration

    The media type for the Extended Adaptive Multi-Rate Wideband (AMR-
    WB+) codec is allocated from the IETF tree since AMR-WB+ is expected
    to be a widely used audio codec in general streaming applications.

    Note, any unspecified parameter MUST be ignored by the receiver.

    Media Type name:     audio

    Media subtype name:  AMR-WB+

    Required parameters:

    None

    Optional parameters:

    channels:       The maximum number of audio channels present in the
                    audio frames.  Permissible values are 1 (mono) or 2
                    (stereo).  If no parameter is present, the maximum
                    number of channels is 2 (stereo).

    interleaving:   Indicates that frame level interleaving mode SHALL
                    be used for the payload.  The parameter specifies
                    the number of frame slots required in a
                    deinterleaving buffer (including the frame that is
                    ready to be consumed).  Its value is equal to one
                    plus the maximum number of frames that precede any
                    frame in transmission order and follow the frame in
                    RTP timestamp order.  If this parameter is not
                    present, interleaving SHALL NOT be used.

    int-delay:      The minimal media time delay in RTP timestamp ticks
                    that is needed in the deinterleaving buffer, i.e.
                    the difference in RTP timestamp between the earliest
                    and latest audio frame present in the deinterleaving
                    buffer, to ensure correct decoding.

    ptime:          see RFC 2327 [6].

    maxptime:       see Section 8 in RFC 3267 [7].


    Restriction on Usage:
                 This type is only defined for transfer via RTP (STD
                 64).

    Encoding considerations:

    Security considerations:
                 See Section 6 of RFC XXXX.

    Interoperability considerations:
                 To maintain interoperability with AMR-WB capable end-
                 points, in cases where negotiation is possible and the
                 AMR-WB+ end-point supporting this format also supports
                 RFC 3267 for AMR-WB transport, an AMR-WB+ end-point
                 SHOULD declare itself also as AMR-WB capable (i.e.
                 supporting also "audio/AMR-WB" as specified in RFC
                 3267).

                 As the AMR-WB+ decoder is capable of performing stereo
                 to mono conversions, all receivers of AMR-WB+ should be
                 able to receive both stereo and mono, even if the
                 receiver is only capable of playing out mono signals.

    Public specification:
                 RFC XXXX
                 3GPP TS 26.290, see reference [1] of RFC XXXX

    Additional information:
                 This MIME type is not applicable for file storage.
                 Instead file storage of AMR-WB+ encoded audio is
                 specified within the 3GPP defined ISO based multimedia
                 file format defined in 3GPP TS 26.244, see reference
                 [14] of RFC XXXX.  This file format has the MIME types
                 "audio/3GPP" or "video/3GPP" as defined by RFC 3839
                 [15].

    Person & email address to contact for further information:
                 johan.sjoberg@ericsson.com
                 ari.lakaniemi@nokia.com

    Intended usage: COMMON.
                 It is expected that many IP based streaming
                 applications will use this type.

    Change controller:
                 IETF Audio/Video Transport working group delegated from
                 the IESG.
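
    The definition of the "interleaving" parameter given above can be
    illustrated with a short Python sketch.  The input is the list of
    RTP timestamps of the frames in transmission order, assumed not to
    wrap; the function name interleaving_value and the timestamp values
    are hypothetical.  For each frame, the sketch counts the frames
    that were sent earlier but carry a later timestamp, and returns one
    plus the largest such count.

```python
def interleaving_value(timestamps_in_tx_order):
    """One plus the maximum number of frames that precede a frame in
    transmission order and follow it in RTP timestamp order."""
    worst = 0
    for pos, ts in enumerate(timestamps_in_tx_order):
        # Frames sent before this one that must be played after it.
        ahead = sum(1 for earlier in timestamps_in_tx_order[:pos]
                    if earlier > ts)
        worst = max(worst, ahead)
    return worst + 1
```

    For example, six frames of a hypothetical length of 1440 ticks sent
    in the order 0, 2880, 5760, 1440, 4320, 7200 give a value of 3,
    since the frame with timestamp 1440 is preceded by two frames with
    later timestamps.  Frames sent in timestamp order give a value of
    1.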


 7.2. Mapping Media Type Parameters into SDP


    The information carried in the media type specification has a
    specific mapping to fields in the Session Description Protocol (SDP)
    [6], which is commonly used to describe RTP sessions.  When SDP is
    used to specify sessions employing the AMR-WB+ codec, the mapping is
    as follows:

    -  The media type ("audio") goes in SDP "m=" as the media name.

    -  The media subtype (payload format name) goes in SDP "a=rtpmap" as
       the encoding name.  The RTP clock rate in "a=rtpmap" SHALL be
       72000 for AMR-WB+, and the encoding parameter number of channels
       MUST either be explicitly set to 1 or 2, or be omitted, implying
       the default value of 2.

    -  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
       "a=maxptime" attributes, respectively.

    -  Any remaining parameters go in the SDP "a=fmtp" attribute by
       copying them directly from the MIME media type string as a
       semicolon-separated list of parameter=value pairs.
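
    The mapping rules above can be sketched in Python as follows.  This
    is an illustration under assumptions, not a complete SDP generator:
    the function name amr_wb_plus_sdp is hypothetical, the RTP/AVP
    profile is assumed, and the "channels" parameter is carried as the
    encoding parameter of "a=rtpmap" and omitted when not given.

```python
def amr_wb_plus_sdp(port, payload_type, params):
    """Build SDP media-level lines for AMR-WB+ from media type parameters."""
    params = dict(params)                  # do not modify the caller's dict
    channels = params.pop("channels", None)
    ptime = params.pop("ptime", None)
    maxptime = params.pop("maxptime", None)

    # Encoding name and the fixed 72000 Hz clock rate go in a=rtpmap.
    rtpmap = f"a=rtpmap:{payload_type} AMR-WB+/72000"
    if channels is not None:
        rtpmap += f"/{channels}"
    lines = [f"m=audio {port} RTP/AVP {payload_type}", rtpmap]

    # Any remaining parameters go in a=fmtp as parameter=value pairs.
    if params:
        fmtp = "; ".join(f"{name}={value}" for name, value in params.items())
        lines.append(f"a=fmtp:{payload_type} {fmtp}")

    # ptime and maxptime have their own SDP attributes.
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```

    Called with port 49120, payload type 99 and the parameters
    channels=2, interleaving=30, int-delay=86400 and maxptime=100, the
    sketch produces an "m=" line followed by "a=rtpmap", "a=fmtp" and
    "a=maxptime" lines.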


 7.2.1. Offer-Answer Model Considerations

    To achieve good interoperability for the AMR-WB+ RTP payload format
    when it is negotiated in SDP using the Offer/Answer model [8], the
    following considerations apply.

    For negotiated offer/answer usage, the parameters SHALL be
    interpreted as follows:

    -  The "interleaving" parameter is symmetric, so the answerer must
       also include it in the answer to an offered payload type
       containing the parameter.  However, the buffer space value is
       declarative in unicast usage.  For multicast usage, the same
       value in the response is required to accept the payload type.
       For streams declared as sendrecv or recvonly, the receiver
       accepts payloads using the interleaved mode of the payload
       format, and the value declares the amount of buffer space the
       receiver has available for the sender to utilize.  For sendonly
       streams, the parameter indicates the desired configuration and
       amount of buffer space.  An answerer is RECOMMENDED to respond
       using the offered value, if capable of using it.

    -  The "int-delay" parameter is declarative.  For streams declared
       as sendrecv or recvonly, the value indicates the maximum initial
       delay the receiver will accept in the deinterleaving buffer.  For
       sendonly streams, the value is the amount of media time the
       sender desires to use.  The value SHOULD be copied into any
       response.


    -  The "channels" parameter is declarative.  For "sendonly" streams,
       it indicates the desired channel usage: stereo and mono, or mono
       only.  For "recvonly" and "sendrecv" streams, the parameter
       indicates what the receiver accepts.  As any receiver is capable
       of receiving stereo frame types and performing local mixing
       within the AMR-WB+ decoder, there is normally only one reason to
       restrict to mono only: to avoid spending bit-rate on data that
       are not utilized if the front-end is only capable of mono.

    -  The "ptime" parameter works as indicated by the offer/answer
       model [8]; "maxptime" SHALL be used in the same way.

    -  To maintain interoperability with AMR-WB in cases where
       negotiation is possible, an AMR-WB+ capable end-point which also
       implements the AMR-WB payload format [7] is RECOMMENDED to also
       declare itself capable of AMR-WB as it is a subset of the AMR-WB+
       codec.

    In declarative usage, like SDP in RTSP [16] or SAP [17], the
    parameters SHALL be interpreted as follows:

    -  The "interleaving" parameter, if present, configures the payload
       format in that mode, and the value indicates the number of frames
       that the deinterleaving buffer is required to support to be able
       to handle this session correctly.

    -  The "int-delay" parameter indicates the initial buffering delay
       required to receive this stream correctly.

    -  The "channels" parameter indicates if the content being
       transmitted can contain either both stereo and mono rates, or
       only mono.

    -  All other parameters indicate values that are being used by the
       sending entity.


 7.2.2. Examples

    An example SDP session description utilizing AMR-WB+ mono and
    stereo encoding follows.

     m=audio 49120 RTP/AVP 99
     a=rtpmap:99 AMR-WB+/72000/2
     a=fmtp:99 interleaving=30; int-delay=86400
     a=maxptime:100

    Note that the payload format (encoding) names are commonly shown in
    upper case.  Media subtypes are commonly shown in lower case.  These
    names are case-insensitive in both places.  Similarly, parameter
    names are case-insensitive both in MIME types and in the default
    mapping to the SDP a=fmtp attribute.
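
    As an illustration only, the "a=fmtp" line of the example above
    could be parsed as sketched below in Python.  The function name
    parse_fmtp is hypothetical.  Parameter names are converted to lower
    case because they are case-insensitive, and the "int-delay" value
    is converted to seconds using the 72000 Hz RTP clock rate.

```python
def parse_fmtp(line):
    """Parse an SDP a=fmtp line into (payload type, {name: value})."""
    head, _, rest = line.partition(" ")
    payload_type = int(head.split(":", 1)[1])
    params = {}
    for item in rest.split(";"):
        if item.strip():
            name, _, value = item.partition("=")
            # Parameter names are case-insensitive; normalize them.
            params[name.strip().lower()] = value.strip()
    return payload_type, params

pt, params = parse_fmtp("a=fmtp:99 interleaving=30; int-delay=86400")
delay_seconds = int(params["int-delay"]) / 72000   # ticks -> seconds
```

    For the example session, the int-delay of 86400 ticks corresponds
    to 1.2 seconds of media time in the deinterleaving buffer.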


 8. IANA Considerations

    IANA is requested to register one new MIME subtype (audio/amr-wb+),
    see Section 7.


 9. Contributors

    Daniel Enstrom contributed by writing the codec introduction
    section.  Stefan Bruhn contributed by writing the ISF recovery
    algorithm.

 10. Acknowledgements

    The authors would like to thank Redwan Salami and Stefan Bruhn for
    their significant contributions made throughout the writing and
    reviewing of this document.  Anisse Taleb and Ingemar Johansson
    contributed by implementing the payload format, and thus helped
    locate some flaws.  We would also like to acknowledge Qiaobing
    Xie, coauthor of RFC 3267, on which this document is based.


 11. References

 11.1. Normative references

    [1]  3GPP TS 26.290 "Audio codec processing functions; Extended AMR
         Wideband codec; Transcoding functions", version 6.0.0 (2004-
         09), 3rd Generation Partnership Project (3GPP).
    [2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, Internet Engineering Task Force,
         March 1997.
    [3]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
         "RTP: A Transport Protocol for Real-Time Applications", STD 64,
         RFC 3550, Internet Engineering Task Force, July 2003.
    [4]  3GPP TS 26.192 "AMR Wideband speech codec; Comfort Noise
         aspects", version 5.0.0 (2001-03), 3rd Generation Partnership
         Project (3GPP).
    [5]  3GPP TS 26.193 "AMR Wideband speech codec; Source Controlled
         Rate operation", version 5.0.0 (2001-03), 3rd Generation
         Partnership Project (3GPP).
    [6]  Handley, M. and V. Jacobson, "SDP: Session Description
         Protocol", RFC 2327, Internet Engineering Task Force, April
         1998.
    [7]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, "Real-
         Time Transport Protocol (RTP) Payload Format and File Storage
         Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-
         Rate Wideband (AMR-WB) Audio Codecs", RFC 3267, Internet
         Engineering Task Force, June 2002.
    [8]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
         the Session Description Protocol (SDP)", RFC 3264, Internet
         Engineering Task Force, June 2002.


 11.2. Informative References

    [9]  Schulzrinne, H., "RTP Profile for Audio and Video Conferences
         with Minimal Control", STD 65, RFC 3551, Internet Engineering
         Task Force, July 2003.
    [10] Baugher, et al., "The Secure Real Time Transport Protocol",
         RFC 3711, Internet Engineering Task Force, March 2004.
    [11] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
         Generic Forward Error Correction", RFC 2733, Internet
         Engineering Task Force, December 1999.
    [12] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley,
         M., Bolot, J., Vega-Garcia, A. and S. Fosse-Parisis, "RTP
         Payload for Redundant Audio Data", RFC 2198, Internet
         Engineering Task Force, September 1997.
    [13] 3GPP TS 26.233 "Packet Switched Streaming service", version
         5.0.0 (2001-03), 3rd Generation Partnership Project (3GPP).
    [14] 3GPP TS 26.244 "Transparent end-to-end packet switched
         streaming service (PSS); 3GPP file format (3GP)", version 6.1.0
         (2004-09), 3rd Generation Partnership Project (3GPP).
    [15] Singer, D. and R. Castagno, "MIME Type Registrations for 3rd
         Generation Partnership Project (3GPP) Multimedia files", RFC
         3839, Internet Engineering Task Force, July 2004.
    [16] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
         Protocol (RTSP)", RFC 2326, Internet Engineering Task Force,
         April 1998.
    [17] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
         Protocol", RFC 2974, Internet Engineering Task Force, June
         2001.

    Any 3GPP document can be downloaded from the 3GPP webserver,
    "http://www.3gpp.org/", see specifications.


 12. Authors' Addresses

    Johan Sjoberg
    Ericsson Research
    Ericsson AB
    SE-164 80 Stockholm, SWEDEN

    Phone:   +46 8 7190000
    EMail: Johan.Sjoberg@ericsson.com


    Magnus Westerlund
    Ericsson Research
    Ericsson AB
    SE-164 80 Stockholm, SWEDEN

    Phone:   +46 8 7190000
    EMail: Magnus.Westerlund@ericsson.com


    Ari Lakaniemi
    Nokia Research Center
    P.O.Box 407
    FIN-00045 Nokia Group, FINLAND

    Phone:   +358-71-8008000
    EMail: ari.lakaniemi@nokia.com


 13. IPR Notice

    The IETF takes no position regarding the validity or scope of any
    Intellectual Property Rights or other rights that might be claimed
    to pertain to the implementation or use of the technology described
    in this document or the extent to which any license under such
    rights might or might not be available; nor does it represent that
    it has made any independent effort to identify any such rights.
    Information on the procedures with respect to rights in RFC
    documents can be found in BCP 78 and BCP 79.

    Copies of IPR disclosures made to the IETF Secretariat and any
    assurances of licenses to be made available, or the result of an
    attempt made to obtain a general license or permission for the use
    of such proprietary rights by implementers or users of this
    specification can be obtained from the IETF on-line IPR repository
    at http://www.ietf.org/ipr.

    The IETF invites any interested party to bring to its attention any
    copyrights, patents or patent applications, or other proprietary
    rights that may cover technology that may be required to implement
    this standard.  Please address the information to the IETF at ietf-
    ipr@ietf.org.


 14. Copyright Notice

    Copyright (C) The Internet Society (2004).  This document is subject
    to the rights, licenses and restrictions contained in BCP 78, and
    except as set forth therein, the authors retain all their rights.

    This document and the information contained herein are provided on
    an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
    INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
    THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

    This Internet-Draft expires in June 2005.


 RFC Editor Considerations

    The RFC editor is requested to replace all occurrences of XXXX with
    the RFC number this document receives.

    The RFC editor is also requested to remove the next section
    "Changes".


 Changes

    Changes in draft-ietf-avt-rtp-amrwbplus-04.txt compared to draft-
    ietf-avt-rtp-amrwbplus-03.txt:

    - Editorial changes improving language.


    Changes in draft-ietf-avt-rtp-amrwbplus-03.txt compared to draft-
    ietf-avt-rtp-amrwbplus-02.txt:

    - Totally changed the payload format layout to reduce overhead
       (Section 4).
    - Updated the Offer/Answer definition for the interleaving
       parameter.
    - Rewritten the codec introduction to better explain the codec
       (Section 3.1).
    - Updated the security consideration on authentication.
    - Numerous editorial changes.

