Internet DRAFT - draft-xie-avt-et-rtp-amr


Internet Engineering Task Force                                  Q. Xie
Audio Video Transport WG                                       S. Gupta
INTERNET-DRAFT                                                 Motorola
                                                          November 2000
Expires in six months

                Error Tolerant RTP Payload Format for AMR

Status of this memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at

The list of Internet-Draft Shadow Directories can be accessed at


This document defines the RTP payload format for *error tolerant*
delivery of Adaptive Multi-Rate (AMR) speech frames over an RTP
session.

The AMR codec's flexibility in bandwidth requirements and its
tolerance of bit errors in speech data are beneficial not only for
"over-the-air" wireless links but also for voice-over-IP applications
in the Internet. The design is focused on how best to facilitate these
features of the AMR codec in an IP environment.

1. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC2119 [1].

2. Introduction

This document defines the RTP payload format for *error tolerant*
delivery of Adaptive Multi-Rate (AMR) [2] speech frames over an RTP
session [3].

The AMR speech codec [2] represents a new generation of coding
algorithms that are built to work over error-prone transport
channels. This type of codec has built-in mechanisms that make it
tolerant to a certain degree of bit errors introduced by the transport
channel. In other words, it is designed to restore the original speech
(with some degradation) even when the coded speech data received is
damaged.

However, in most cases, error-free data delivery is the norm in IP
networks, and whenever bit errors are detected in the data being
transported, the data is discarded. Usually, the transport protocol
(e.g., UDP or TCP) performs the bit error checking and drops bad
packets. To take full advantage of the error tolerant design of the AMR
codec, special consideration should be given to the transport layer as
well as to the AMR frame design. This is discussed in detail in
Section 3.

Another important feature of the AMR codec is that the data rate of
the coded speech generated by the encoder can be dynamically adjusted
according to, for example, the available path bandwidth. The AMR codec
allows this adjustment to be made on a frame-by-frame basis. This
feature may help achieve better utilization of network resources.

3. Design Considerations

The flexibility in bandwidth requirements and the tolerance to bit
errors of the AMR codec are not only beneficial for "over-the-air"
wireless links, but are also very desirable for general voice-over-IP
applications. The design is thus focused on how best to facilitate
these features of the AMR codec in an IP environment.

The protocol stack for delivering AMR speech data over IP is shown in
Figure 1.

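                     +--------------+
                     | AMR Payload  |
                     +--------------+
                     |     RTP      |
                     +--------------+
                     |     UDP*     |
                     +--------------+
                     |     IP       |
                     +--------------+
                     |  Link Layer  |
                     +--------------+
                  * or UDP-Lite, U-SCTP, etc.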

     Figure 1. Protocol stack for AMR data delivery over IP.

As we will discuss in detail in the following sections, the use of
error tolerant transport and link layers is desirable in order to take
full advantage of the bit error tolerant design of the AMR codec.
However, this is NOT a requirement for transferring AMR encoded speech
over IP. In other words, the AMR payload format defined in this
document can also work effectively with conventional UDP over any IP
network.

3.1. Error Tolerant AMR Data Transport over IP Network

An IP network can be readily set up for bit-error tolerant unreliable
data transfer by, for example, using a data error tolerant link layer
and turning off the UDP checksum, or using a partial checksum
transport such as UDP-Lite [4] or U-SCTP [5].

Here the data error tolerant link layer can be a standard link layer
technology which tolerates bit errors (such as SLIP and some wireless
link layers) or any proprietary error tolerant link layers.

The use of a partial checksum unreliable transport protocol such as
UDP-Lite or U-SCTP can be a good solution at the transport layer.
These new unreliable transport protocols allow a portion of the
carried data payload to be excluded from their checksum calculation;
hence, bit errors occurring in that portion of the payload will not
cause the packet to be discarded by the transport, while the rest of
the payload remains under the protection of the checksum.
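To illustrate the partial checksum idea, the following Python sketch
computes a 16-bit one's complement Internet checksum (the RFC 1071
algorithm used by UDP) over only the first `coverage` octets of a
datagram, in the spirit of UDP-Lite. The function names are
hypothetical, and the sketch omits the IP pseudo-header and the actual
UDP-Lite header layout, so it is an illustration rather than an
implementation of [4].

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def partial_checksum(datagram: bytes, coverage: int) -> int:
    """Checksum only the first `coverage` octets (UDP-Lite style).

    Octets past `coverage` are not checked, so bit errors there
    cannot cause the datagram to be rejected.
    """
    if not 0 <= coverage <= len(datagram):
        raise ValueError("coverage must lie within the datagram")
    return internet_checksum(datagram[:coverage])


if __name__ == "__main__":
    header, speech = b"\x12\x34\x56", b"\xaa\xbb"
    good = partial_checksum(header + speech, coverage=len(header))
    damaged = partial_checksum(header + b"\x00\xbb", coverage=len(header))
    print(good == damaged)  # True: the speech octets are outside coverage
```

Flipping any bit beyond `coverage` leaves the result unchanged, which
is the property that lets damaged speech bits through while the
header bits stay protected.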

Note, simply turning off the checksum in conventional UDP is not as
good a solution, since part of the data payload (such as the RTP
header and the AMR payload header information) still needs data
integrity protection from the transport layer. We will discuss this in
more detail below.

3.2. Unequal Bit Error Sensitivity of RTP Payload for AMR

When an RTP payload carrying AMR data is passed to the transport
layer, it contains the following three types of data bits:

                  +--------------+
                  | header bits  |
                  +==============+ ---
                  | Class A bits |   \
                  |     and      |     AMR coded speech data
                  |  other bits  |   /
                  +--------------+ ---

           Figure 2. Data bits in RTP payload for AMR.

In Figure 2, the "header bits" include both AMR speech frame control
bits/headers (such as the frame type, mode request, etc.) and the RTP
protocol headers.

The "Class A bits" represent the most error sensitive bits in the
encoded speech data, while the "other bits" (also called Class B and C
bits in AMR terminology [6]) are those less sensitive to errors in the
encoded speech data.

3.3. Error Handling Requirements for Different Data Types

When delivering the RTP payload for AMR, in order to take the best
advantage of the error-tolerant design of the AMR codec, the above
three different types of bits in the datagram require different
bit error handling procedures.

  A) The "header bits" must be delivered error-free, and if any
     bit error is detected in the "header bits" portion of the
     datagram, the whole datagram must be discarded. This is because
     any error in the header bits will invalidate the integrity of the
     whole datagram.

  B) The "Class A bits" should be checked for errors, and if any
     bit error is detected in the "Class A bits" portion of the
     payload, the AMR frame to which the erroneous Class A bits
     belong MUST be marked as bad. This is because if errors are found
     in the Class A bits of an AMR frame, the AMR decoder must be
     informed so that it will not use the Class A bits of that AMR
     frame when decoding the speech. 

     However, the payload should not be dropped since the "other
     bits" portion of the payload is less sensitive to bit errors and
     is still usable by the AMR decoder. 

  C) Error checking on "other bits" should not be performed.

In order to meet the above error handling goals, the Class A
bits cannot be checksummed by the transport layer. Otherwise, any
error in the Class A bits of an AMR frame will cause the transport
layer to drop the whole packet. This can become disastrous if
multiple AMR frames are bundled in the same packet.

Instead, one should use a transport layer partial checksum (as provided by
UDP-Lite and U-SCTP) to cover only the "header bits" shown in Figure 2,
while inside the AMR frame header, an 8-bit CRC covering the Class A
bits of the frame should be included. 

This CRC is generated by the speech encoder at the time when the AMR
frame is formed and will be verified by the AMR/RTP receiver (not the
transport protocol) before the AMR frame is passed to the decoder. If
the CRC verification fails, the AMR/RTP receiver will raise the bad
frame indicator when passing the AMR frame to the decoder.
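The CRC step can be sketched in Python as a generic bitwise CRC-8
computed over an arbitrary number of Class A bits (which need not be a
multiple of 8). The generator polynomial `POLY` (x^8 + x^2 + x + 1,
i.e. 0x07) and the zero initial value are placeholder assumptions
chosen for illustration only; the polynomial actually used for the AMR
Codec CRC is the one specified in [6] and should be substituted in a
real implementation. The function names are hypothetical.

```python
from typing import Sequence

# Hypothetical placeholder generator polynomial (x^8 + x^2 + x + 1).
# Substitute the Codec CRC polynomial specified in [6].
POLY = 0x07


def crc8_bits(bits: Sequence[int], poly: int = POLY, init: int = 0x00) -> int:
    """Bitwise, MSB-first CRC-8 over a sequence of 0/1 values."""
    crc = init
    for bit in bits:
        feedback = ((crc >> 7) & 1) ^ (bit & 1)  # top bit XOR input bit
        crc = (crc << 1) & 0xFF
        if feedback:
            crc ^= poly
    return crc


def class_a_ok(class_a_bits: Sequence[int], received_crc: int) -> bool:
    """Receiver-side check: True if the Class A bits match the CRC."""
    return crc8_bits(class_a_bits) == received_crc
```

The sender would place `crc8_bits(class_a_bits)` in the Codec CRC
field when forming the frame; the receiver calls `class_a_ok` and
raises the bad frame indicator when it returns False, while still
passing the frame on.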

In summary, we will have:

  1) the transport layer partial checksum covering the "header bits" -
     if a checksum error is found, the whole datagram is discarded.

  2) the AMR frame CRC covering the "Class A bits" - if an error is
     found, a bad frame flag is raised but the data is still
     delivered.

  3) no checking on the "other bits".

3.4. Bundling Delivery of Multiple AMR Frames

When bundling multiple AMR frames in one RTP payload packet (a
so-called compound payload), a table of contents (TOC) structure should be used
to list all the header information of the included AMR frames. This
TOC block must be checksum protected at the transport layer.

No expensive bit reordering is necessary on the coded speech data bits
from the included AMR frames. The speech data bits of each included
AMR frame are simply concatenated to form the speech data block of the
compound payload, as shown in the following figure:

         +=====================+    -------
         |     RTP Header      |      ^
         +=====================+      |
         | RTP Payload Header  |   Error intolerant
         +=====================+  (protected by transport checksum)
         |  AMR Frame Table    |      |
         |  of Contents (TOC)  |      v
         +=====================+    -------
         |                     |      ^
         | Speech bits (Frm #1)|      |
         |         +-----------|      |
         |         |           |      |
         |---------+           |     Error tolerant
         |                     |      |
         | Speech bits (Frm #2)|      |
         |     +---------------|      |
         |     |               |      |
         |-----+               .      |
         .                     .      |
         . Speech bits (Frm #k)|      |
         |            +========+      |
         |            |               v
         +============+             -------

       Figure 3. Form for bundling multiple AMR frames.

Note, when conventional UDP is used as the transport, the whole
payload, including the RTP header, RTP payload header, TOC, and speech
data block, will be protected by the UDP checksum.

4. Error Tolerant Payload Format Specification

In this section, we detail the format of the error tolerant RTP
payload for AMR. 

4.1. RTP Payload Header (PH) for AMR

Each RTP payload for AMR MUST start with the following 6-bit
payload header:

    0 1 2 3 4 5
   +-+-+-+-+-+-+
   | NF  | MR  |
   +-+-+-+-+-+-+

   NF (Number of Frames) - unsigned int (3 bits):

     specifies the number of AMR frames carried within this RTP
     payload packet. The maximum number of AMR frames that can be
     carried in a single payload packet is thus 7.

     NF = '000' indicates no AMR frame is present in the payload. This
     can be used to send a stand-alone Mode Request.

   MR (Mode Request) - unsigned int (3 bits):

     indicates the next AMR rate mode the receiver of this payload
     packet should adopt. MR values are defined the same as Frame
     Type Indices 0-7, as shown in Table 1 below.
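A minimal Python sketch of packing and unpacking the 6-bit payload
header, assuming NF occupies the three most significant of the six
bits as drawn above. The function names are hypothetical; the header
is handled as a plain integer because on its own it is not octet
aligned.

```python
from typing import Tuple


def pack_payload_header(nf: int, mr: int) -> int:
    """Return the 6-bit PH: NF in the top 3 bits, MR in the low 3 bits."""
    if not 0 <= nf <= 7:
        raise ValueError("NF must fit in 3 bits (0-7 frames)")
    if not 0 <= mr <= 7:
        raise ValueError("MR must be a Frame Type Index 0-7")
    return (nf << 3) | mr


def unpack_payload_header(ph: int) -> Tuple[int, int]:
    """Split a 6-bit payload header into (NF, MR)."""
    if not 0 <= ph < 64:
        raise ValueError("payload header is 6 bits")
    return ph >> 3, ph & 0x7
```

For example, `pack_payload_header(0, 4)` yields a payload header with
no frames that only requests the 7.4 kbit/s mode, i.e. a stand-alone
Mode Request.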

4.2. AMR Frame Header (FH)

Immediately following the payload header (as defined in Section 4.1),
an AMR frame header MUST be present for each of the AMR frames
indicated by the NF field in the payload packet.

An AMR frame header occupies either 7 or 15 bits, depending on whether
a Codec CRC field is present. Its definition is as follows: 

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   FT  |A|Q|C|   Codec CRC   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   FT (Frame Type) - unsigned int (4 bits):

     specifies the frame type index of the corresponding AMR frame, as
     defined in Table 1a in [6]. The following table gives a summary:

                                     Class A   total speech
         Index     Mode                bits       bits        
           0       AMR 4.75            42         95         
           1       AMR 5.15            49        103         
           2       AMR 5.9             55        118         
           3       AMR 6.7 (PDC-EFR)   58        134         
           4       AMR 7.4 (IS-641)    61        148         
           5       AMR 7.95            75        159         
           6       AMR 10.2            65        204         
           7       AMR 12.2 (GSM-EFR)  81        244         
           8       AMR CNG             39         39         
           9       GSM EFR CNG         43         43         
          10       IS-641 CNG          38         38         
          11       PDC-EFR CNG         37         37         
          12 - 14  For future use       -          -         
          15       No transmission      0          0         
       Table 1: AMR speech frame types and sizes (from [6]).

   A (Class A Bits Only Indicator) - 1 bit:

      if set to 1, indicates that only Class A bits are present in the
      corresponding speech data portion of this frame. In other
      words, the less sensitive Class B and C speech bits are omitted from
      transmission in this frame. This could be useful to conserve
      bandwidth in certain Forward Error Correction (FEC) schemes.

      Using the FT field and the A flag together, the AMR receiver
      can determine the exact number of speech bits carried in this
      frame (see Table 1).

   Q (Frame Quality Indicator) - 1 bit:

      corresponds to the Frame Quality Indicator (FQI) defined in
      Table 1b of [6]. If 0, it indicates that the AMR frame has been
      found to be corrupted (i.e., a bad frame). If 1, it indicates
      that the frame is of good quality.

   C (Codec CRC Indicator) - 1 bit:

      if set to 1, indicates the presence of an optional 8 bit Codec
      CRC field in this frame header.

   Codec CRC - binary encoded (8 bits):

      This is an optional field which is only present when the C bit
      is set to 1. 

      This corresponds to the Codec CRC defined in Section 4.1.4 of
      [6]. This CRC is used for error detection on the Class A bits of
      the corresponding AMR frame.
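Because the FT field and the A flag together determine how many speech
bits a frame carries, a receiver can derive that count with a simple
lookup. The sketch below transcribes the Class A and total bit counts
from Table 1; the dictionary and function names are hypothetical.

```python
# (Class A bits, total speech bits) per Frame Type Index, from Table 1.
FRAME_BITS = {
    0: (42, 95), 1: (49, 103), 2: (55, 118), 3: (58, 134),
    4: (61, 148), 5: (75, 159), 6: (65, 204), 7: (81, 244),
    8: (39, 39), 9: (43, 43), 10: (38, 38), 11: (37, 37),
    15: (0, 0),  # No transmission
}


def speech_bit_count(ft: int, a_flag: int) -> int:
    """Speech bits carried by a frame, given its FT field and A flag."""
    if ft not in FRAME_BITS:
        raise ValueError(f"FT={ft} is reserved for future use")
    class_a, total = FRAME_BITS[ft]
    # With A=1 only the Class A bits are present in the speech data.
    return class_a if a_flag else total
```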

In cases where an error-intolerant transport (e.g., conventional UDP)
is used, the Codec CRC protection may become unnecessary, since the
transport checksum then already covers the whole payload, including
the Class A bits.

Note, an AMR/RTP receiver MUST be prepared to receive an AMR coded
speech frame with or without the presence of the Codec CRC field. But
it is optional for the AMR/RTP sender to include a Codec CRC in an
outbound AMR frame.
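The sketch below encodes a single frame header as a string of '0'/'1'
characters, which makes the 7-bit versus 15-bit length visible; the C
bit is derived from whether a Codec CRC value is supplied. The
bit-string representation and the function name are illustrative
choices, not part of this specification.

```python
from typing import Optional


def encode_frame_header(ft: int, a: int, q: int,
                        crc: Optional[int] = None) -> str:
    """Return FT|A|Q|C[|Codec CRC] as a 7- or 15-character bit string."""
    if not 0 <= ft <= 15:
        raise ValueError("FT is a 4-bit field")
    c = 0 if crc is None else 1  # C=1 only when a Codec CRC follows
    bits = f"{ft:04b}{a & 1}{q & 1}{c}"
    if crc is not None:
        if not 0 <= crc <= 0xFF:
            raise ValueError("Codec CRC is an 8-bit field")
        bits += f"{crc:08b}"
    return bits
```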

When multiple AMR frames are present in the RTP payload, the AMR frame
headers from the included AMR frames are simply placed one after the
other into the payload, immediately following the payload header
bits. The frame headers together thus form a Table of Contents (TOC)
of all the AMR frames included.

The RTP payload header (PH) bits and the frame TOC together form the
header block. The header block MUST be zero-padded to the next octet
boundary.
The following diagram shows a payload header block indicating a single
AMR frame with no CRC:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |NF=1 | MR  | FT    |A|Q|0|x x x|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     octet 1   |     octet 2   |

 x - zero-padded bits 

Here is another example showing a header block indicating two AMR
frames with no CRC:

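    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |NF=2 | MR  | FT #1 |A|Q|0| FT #2 |A|Q|0|x x x x|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     octet 1   |     octet 2   |     octet 3   |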

 x - zero-padded bits 

Note, this zero-padding on the header block has no interaction with
the RTP P bit (defined in [3]). 
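Assembling the header block can be sketched as follows: the payload
header and the frame headers are concatenated bit by bit and then
zero-padded to the next octet boundary. This is a hypothetical helper
written for illustration; it reproduces the two octets 0x39 0xD0 for
the single-frame layout of Section 5.1 (NF=1, MR=6, FT=7, A=0, Q=1, no
CRC).

```python
from typing import List, Optional, Tuple

# One TOC entry: (FT, A, Q, Codec CRC or None when C=0).
FrameHeader = Tuple[int, int, int, Optional[int]]


def build_header_block(mr: int, toc: List[FrameHeader]) -> bytes:
    """Concatenate PH and frame headers, then zero-pad to an octet."""
    if len(toc) > 7:
        raise ValueError("NF is 3 bits: at most 7 frames per payload")
    bits = f"{len(toc):03b}{mr:03b}"             # payload header: NF, MR
    for ft, a, q, crc in toc:
        bits += f"{ft:04b}{a}{q}{0 if crc is None else 1}"
        if crc is not None:
            bits += f"{crc:08b}"                  # optional Codec CRC
    bits += "0" * (-len(bits) % 8)                # zero-pad to next octet
    return int(bits, 2).to_bytes(len(bits) // 8, "big")
```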

4.3. AMR Frame Speech Data

Following the header block, the speech bits from each frame are placed
one after the other into the payload in the same order as their frame 
headers are arranged in the TOC. All the speech bits together form
the speech data block.

Similar to the zero-padding on the header block described above, the
speech data block MUST be zero-padded to the next octet boundary. This
zero-padding on the speech data block has no interaction with the RTP
P bit (defined in [3]). 
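The speech data block can be built the same way. In this Python sketch
(a hypothetical helper; each frame's speech bits are given as a
'0'/'1' string purely for readability), the frames are concatenated in
TOC order and the result is zero-padded. For a single 244-bit
12.2 kbit/s frame this yields 31 octets, matching Section 5.1.

```python
from typing import List


def build_speech_block(frames: List[str]) -> bytes:
    """Concatenate per-frame speech bits in TOC order and zero-pad."""
    bits = "".join(frames)          # no bit reordering between frames
    bits += "0" * (-len(bits) % 8)  # pad to the next octet boundary
    if not bits:
        return b""
    return int(bits, 2).to_bytes(len(bits) // 8, "big")
```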

4.4. Error Protection and Detection

Based on our discussion in Section 3.3, the header block, as defined
in Section 4.2, MUST be protected by the transport checksum. If this
portion of the payload fails the checksum examination, the whole
payload packet will be silently dropped at the transport layer.

The speech data block, as defined in Section 4.3, should not be
covered by the transport layer checksum when an error tolerant
transport is used (e.g., UDP-Lite, U-SCTP).
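When a partial checksum transport is used, the sender has to choose a
checksum coverage that spans exactly the error intolerant part of the
datagram. The arithmetic can be sketched as below, assuming (as in the
published UDP-Lite specification; [4] should be checked for the exact
semantics) that the coverage counts octets from the start of the
8-octet UDP-Lite header, and that the RTP header is the 12-octet fixed
header plus 4 octets per CSRC with no header extension. The function
names are hypothetical.

```python
from typing import List


def header_block_octets(crc_flags: List[bool]) -> int:
    """Octets in PH + TOC after zero-padding.

    Each flag says whether that frame header carries a Codec CRC.
    """
    bits = 6 + sum(15 if has_crc else 7 for has_crc in crc_flags)
    return (bits + 7) // 8


def udplite_coverage(crc_flags: List[bool], csrc_count: int = 0) -> int:
    """Assumed coverage: UDP-Lite header + RTP header + header block."""
    udplite_header = 8                # assumption: counted in coverage
    rtp_header = 12 + 4 * csrc_count  # assumption: no header extension
    return udplite_header + rtp_header + header_block_octets(crc_flags)
```

Under these assumptions the single-frame payload of Section 5.1 needs a
coverage of 22 octets.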

At the RTP receiver, after the payload packet is delivered from the
transport layer (and is unbundled into individual AMR frames in the
case of receiving a compound RTP payload), a preprocessor or
adaptation layer should verify the Codec CRC (if present) of an AMR
frame over the received Class A bits in the frame. If the CRC
verification fails, before passing the AMR frame to the speech
decoder, the preprocessor should set the Q bit of the frame to 0,
indicating that the Class A bits of this frame are unusable.
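The preprocessing step can be sketched in Python as follows. The
`AmrFrame` record, the `preprocess` function, and the `crc_fn`
parameter (which stands in for the Codec CRC computation of [6]) are
hypothetical. The sketch assumes the frames have already been
unbundled and their Class A bits separated using the bit counts of
Table 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AmrFrame:
    ft: int
    a: int
    q: int
    crc: Optional[int]        # None when the C bit was 0
    class_a_bits: List[int]
    other_bits: List[int]


def preprocess(frames: List[AmrFrame],
               crc_fn: Callable[[List[int]], int]) -> List[AmrFrame]:
    """Clear Q on frames whose Codec CRC fails; never drop a frame."""
    for frame in frames:
        if frame.crc is not None and crc_fn(frame.class_a_bits) != frame.crc:
            frame.q = 0  # Class A bits unusable; other bits still passed on
    return frames
```

Frames without a Codec CRC are passed through unchanged, since the
receiver must accept frames with or without that field.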

This preprocessing function can be an internal function of the AMR
decoder. This is implementation specific.

5. RTP Payload for AMR Examples

5.1. Payload with a Single AMR Frame

This example shows an RTP payload packet carrying a single good
quality full AMR frame of 12.2 kbits/s rate (FT=7) with no CRC.

    0                   1           
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 
   +-+-+-+-+-+-+                       -------
   |NF=1 | MR=6|        <-PH               ^
   +-+-+-+-+-+-+-+                     header block
   | FT=7  |0|1|0|      <-FH 1         (total 2 octets)
   +-+-+-+-+-+-+-+                         |
   |0 0 0|              <-padding          v
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+   -------
   |                               |       ^
   :  frm #1 (244 spch bits)       :   speech data block 
   |                               |   (total 31 octets)
   |       +-+-+-+-+-+-+-+-+-+-+-+-+       |
   |       |0 0 0 0|                       V
   +-+-+-+-+-+-+-+-+                   -------

In this example, the AMR receiver of this packet is also being asked
to use 10.2 kbits/s rate (MR=6) for speech encoding when sending in
the opposite direction.

5.2. Payload with multiple AMR Frames

This example shows three AMR frames of different types bundled into
one RTP payload:

    0                   1           
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 
   +-+-+-+-+-+-+                           -------
   |NF=3 | MR=4|        <-PH                  ^
   +-+-+-+-+-+-+-+                            |
   | FT=7  |0|1|0|      <-FH1              header block
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         (total 5 octets)
   | FT=2  |0|1|1|   Codec CRC2  |  <-FH2     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+            |
   | FT=4  |1|0|0|      <-FH3                 |
   +-+-+-+-+-+-+-+                            |
   |0 0 0 0 0|          <-padding             v
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+       -------
   |                               |          ^
   :  frm #1 (244 spch bits)       :          |
   |                               |          |
   |       |-----------------------|          |
   |-+-+-+-+                       |   speech data block
   |  frm #2 (118 spch bits)       :   (total 53 octets)
   :                               |          |
   |                               |          |
   |   |---------------------------|          |
   |-+-+                           |          |
   |  frm #3 (61 Class A bits)     |          |
   |             +-+-+-+-+-+-+-+-+-+          |
   |             |0|                          V
   +-+-+-+-+-+-+-+-+                       -------


In this example:

  1) the MR field (=4) in the payload header instructs the AMR
  codec peer to use 7.40 kbits/s encoding rate. 

  2) the second AMR frame is transmitted with 8 bit Codec CRC.

  3) the third AMR frame has the Q bit set to '0' in its frame header,
  indicating that the Class A bits of that AMR frame are corrupted and
  should not be used by the decoder when reconstructing the
  speech. Also, the A flag is set to '1' for this frame, indicating
  that the corresponding speech data of this frame only contains Class
  A bits. 
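On the receiving side, the header block must be parsed before the
speech bits can be located. The following hypothetical Python parser
walks the payload header and the TOC, honoring the C bit of each frame
header. Applied to the header block of this example with an assumed
Codec CRC2 value of 0xA5 (octets 0x71 0xD1 0x3A 0x54 0x80), it
recovers the three frame headers and a header block length of 5
octets.

```python
from typing import List, Optional, Tuple

FrameHeader = Tuple[int, int, int, Optional[int]]  # (FT, A, Q, CRC or None)


def parse_header_block(data: bytes) -> Tuple[int, List[FrameHeader], int]:
    """Parse PH + TOC from the start of an RTP payload for AMR.

    Returns (MR, frame headers in TOC order, header block octets).
    """
    if not data:
        raise ValueError("empty payload")
    bits = "".join(f"{byte:08b}" for byte in data)
    nf, mr = int(bits[0:3], 2), int(bits[3:6], 2)
    pos, toc = 6, []
    for _ in range(nf):
        if pos + 7 > len(bits):
            raise ValueError("truncated frame header")
        ft = int(bits[pos:pos + 4], 2)
        a, q, c = (int(b) for b in bits[pos + 4:pos + 7])
        pos += 7
        crc = None
        if c:  # a Codec CRC follows this frame header
            if pos + 8 > len(bits):
                raise ValueError("truncated Codec CRC")
            crc = int(bits[pos:pos + 8], 2)
            pos += 8
        toc.append((ft, a, q, crc))
    return mr, toc, (pos + 7) // 8  # length includes the zero padding
```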

6. Security Considerations


7. Acknowledgment

The authors wish to thank Peter Lei and Colin Perkins for their
valuable comments and suggestions.

8.  References

[1] IETF RFC2119, "Key words for use in RFCs to Indicate Requirement
    Levels".

[2] 3G TS 26.071 (V3.0.1), "AMR Speech Codec: General Description".

[3] IETF RFC1889, "RTP: A Transport Protocol for Real-Time
    Applications".

[4] IETF Internet Draft <draft-larzon-udplite-02.txt>, "The UDP Lite
    Protocol", work in progress.

[5] IETF Internet Draft <draft-xie-usctp-sigtran-01.txt>, "SCTP
    Unreliable Data Mode Extension", work in progress.

[6] 3G TS 26.101 (V3.0.0), "AMR Speech Codec: Frame Structure".

[7] Postel, J. (ed.), "User Datagram Protocol", RFC 768, August 1980.

9.  Authors' addresses

Qiaobing Xie                            Tel: +1-847-632-3028
Motorola, Inc.                          EMail:
1501 W. Shure Drive, #2309

Sanjay Gupta                            Tel: +1-847-435-0306
Motorola, Inc.                          EMail:
1501 W. Shure Drive, #3205

            Expires in six months from November 2000