Internet Engineering Task Force                                  Q. Xie
Audio Video Transport WG                                       S. Gupta
INTERNET-DRAFT                                                 Motorola
                                                           October 2000
Expires in six months


                Error Tolerant RTP Payload Format for AMR
                  <draft-xie-avt-et-rtp-amr-00.txt>


Status of this memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html


Abstract

This document defines the RTP payload format for *error tolerant*
delivery of Adaptive Multi-Rate (AMR) speech frames over an RTP
session.

The flexible bandwidth requirements and the bit error tolerance of
the AMR codec are not only beneficial for "over-the-air" wireless
links, but are also very desirable for any Voice-over-IP application.
The design focuses on how best to support these features of the AMR
codec in an IP environment.


1. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC2119 [1].


2. Introduction

This document defines the RTP payload format for *error tolerant*
delivery of Adaptive Multi-Rate (AMR) [2] speech frames over an RTP
session [3].

The AMR speech codec [2] represents a new generation of coding
algorithms built to work over error-prone transport channels. This
type of codec has built-in mechanisms that make it tolerant to a
certain degree of bit errors introduced by the transport channel. In
other words, it is designed to restore the original speech (with some
degradation) even when the coded speech data is received with some bit
errors.

However, error-free data transport is the norm in IP networks:
whenever bit errors are detected in the data being transported, the
data is discarded. Usually, the transport protocol (e.g., UDP or TCP)
performs the bit error checking and drops bad packets. To take full
advantage of the error tolerant design of the AMR codec, special
consideration must be given to the transport layer as well as to the
AMR frame design. This is discussed in detail in Section 3.

Another important feature of the AMR codec is that the data rate of
the coded speech generated by the encoder can be dynamically adjusted
according to, for example, the available path bandwidth. This
adjustment can be made on a frame by frame basis. For instance, during
a live VoIP session, if the path over which the voice traffic is being
carried starts to experience congestion, the decoder, after observing
increased packet loss, may instruct the encoder to switch to a lower
coding rate so as to reduce the amount of traffic and avoid further
packet loss.


3. Design Considerations

The flexible bandwidth requirements and the bit error tolerance of
the AMR codec are not only beneficial for "over-the-air" wireless
links, but are also very desirable for any Voice-over-IP application.

The design is thus focused on how best to support these features of
the AMR codec in an IP environment.


3.1. Error Tolerant Transport

The protocol stack for delivering AMR speech data over IP is shown in
Figure 1.

                     +--------------+
                     | AMR Payload  |
                     +--------------+
                     |     RTP      |
                     +==============+
                     |  UDP-Lite*   |
                     +--------------+
                     |     IP       |
                     +--------------+
                  * or U-SCTP, etc.

     Figure 1. Protocol stack for AMR data delivery over IP.

To allow datagrams with certain bit errors to be delivered over an IP 
network, we need an error tolerant transport layer. 

  Note, simply turning off the checksum protection in traditional UDP
  is not a solution since a part of the datagram (such as the RTP
  header information) still needs data integrity protection from the
  transport layer. 

This requires the use of a transport protocol capable of partial
checksums, such as UDP-Lite [4] or Unreliable SCTP [5]. These new
unreliable datagram transport protocols allow a portion of the carried
datagram to be excluded from the checksum calculation. Bit errors
occurring in that portion will therefore not cause the datagram to be
discarded, while the rest of the datagram remains under checksum
protection.
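
As an illustration of the partial checksum idea, the following Python
sketch computes a 16-bit one's complement checksum (in the style of
the Internet checksum) over only the first octets of a datagram. It is
a simplified model and not the UDP-Lite or U-SCTP algorithm: no
pseudo-header or transport header is included, and the names
partial_checksum and coverage are hypothetical.

```python
def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement checksum over `data`."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold carry back in
    return ~total & 0xFFFF


def partial_checksum(datagram: bytes, coverage: int) -> int:
    """Checksum over only the first `coverage` octets (hypothetical)."""
    return ones_complement_sum16(datagram[:coverage])


# Assumed layout: 8 protected octets followed by error tolerant data.
datagram = bytearray(b"HDRHDRHD" + b"speech-bits-here")
coverage = 8
before = partial_checksum(bytes(datagram), coverage)

datagram[12] ^= 0x01     # bit error in the uncovered part
tolerant_ok = partial_checksum(bytes(datagram), coverage) == before

datagram[2] ^= 0x01      # bit error in the covered part
protected_ok = partial_checksum(bytes(datagram), coverage) == before
```

In this sketch the error in the uncovered part leaves the checksum
unchanged, so the datagram would still be delivered, while the error
in the covered part changes the checksum and would cause a discard.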


3.2. Unequal Bit Error Sensitivity of RTP Payload for AMR

When an RTP payload carrying AMR data is passed to the transport
layer, it will contain the following three types of data bits:

                  +--------------+
                  | header bits  |
                  +==============+ ---
                  | Class A bits |   \
                  |     and      |     AMR coded speech data
                  |  other bits  |   /
                  +--------------+ ---

           Figure 2. Data bits in RTP payload for AMR.

In Figure 2, the "header bits" include both AMR speech frame control
bits/headers (such as the frame type, mode request, etc.) and the RTP
protocol headers.

The "Class A bits" represent the most error sensitive coded speech data
bits, while the "other bits" (also called Class B and C bits in AMR
terminology [6]) are those less sensitive to errors in the coded
speech data. 


3.3. Error Handling Requirements for Different Data Types

When delivering the RTP payload for AMR, in order to take the best
advantage of the error-tolerant design of the AMR codec, the above
three different types of bits in the datagram require different
bit error handling procedures.

  A) The "header bits" MUST be delivered error-free. If any bit error
     is detected in the "header bits" portion of the datagram, the
     whole datagram MUST be discarded, because any error in the header
     bits invalidates the integrity of the whole datagram.

  B) The "Class A bits" MUST be checked for errors. If any bit error
     is detected in the "Class A bits" portion of the datagram, the
     AMR frame to which the erroneous Class A bits belong MUST be
     marked as bad, so that the AMR decoder is informed and will not
     use the Class A bits of that AMR frame when decoding the speech.

     However, the datagram MUST NOT be dropped since the "other bits"
     portion of the datagram is less sensitive to bit errors and is
     still usable to the AMR decoder. 

  C) Error checking on "other bits" shall not be performed.

In order to meet the above error handling requirements, the Class A
bits cannot be covered by the transport layer checksum. Otherwise, any
error in the Class A bits of an AMR frame would cause the transport
layer to drop the whole datagram. This would be especially costly when
multiple AMR frames are bundled in the same datagram, since a single
bit error would discard all of them.

Instead, we shall use a transport layer partial checksum (as provided
by UDP-Lite and U-SCTP) to cover only the "header bits" shown in
Figure 2, while an 8-bit CRC covering the Class A bits of the frame
shall be included inside the AMR frame header.

This CRC is generated by the speech encoder at the time the AMR frame
is formed and will be verified by the RTP receiver (not the transport
protocol) before the AMR frame is passed to the decoder. If the CRC
verification fails, the RTP receiver will raise the bad frame
indicator when passing the AMR frame to the decoder.

In summary, we will have:

  1) A transport layer partial checksum covers the "header bits". If
     a checksum error is found, the whole datagram is discarded.

  2) An AMR frame CRC covers the "Class A bits". If an error is found,
     the bad frame flag is raised but the data is still delivered.

  3) The "other bits" are not checked.
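
The three rules above can be summarized as a small decision function.
The Python sketch below is only an illustration. The Action names and
the handle_frame function are hypothetical, and the two boolean inputs
are assumed to come from the transport layer and from the RTP receiver
respectively.

```python
from enum import Enum


class Action(Enum):
    DISCARD_DATAGRAM = "discard the whole datagram"
    DELIVER_BAD_FRAME = "deliver the frame with the bad frame flag"
    DELIVER_GOOD_FRAME = "deliver the frame as good"


def handle_frame(header_checksum_ok: bool, class_a_crc_ok: bool) -> Action:
    """Apply rules 1) to 3); the "other bits" are never inspected."""
    if not header_checksum_ok:
        return Action.DISCARD_DATAGRAM       # rule 1
    if not class_a_crc_ok:
        return Action.DELIVER_BAD_FRAME      # rule 2
    return Action.DELIVER_GOOD_FRAME
```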


3.4. Bundling Delivery of Multiple AMR Frames

When bundling multiple AMR frames in one RTP payload packet (a
so-called compound payload), a table of contents (TOC) structure shall
be used to list the header information of all the included AMR frames.
This TOC block MUST be checksum protected at the transport layer.

No expensive bit reordering is necessary on the coded speech data bits
from the included AMR frames. The speech data bits of each included
AMR frame shall be simply bit-stuffed to the next byte boundary and
cascaded to form the speech data block of the compound payload, as
shown in the following figure:

         +=====================+    -------
         |     RTP Header      |      ^
         +=====================+      |
         | RTP Payload Header  |   Error intolerant
         +=====================+  (protected by transport checksum)
         |  AMR Frame Table    |      |
         |  of Contents (TOC)  |      v
         +=====================+    -------
         |                     |      ^
         | Speech bits (Frm #1)|      |
         |         +-----------|      |
         |         |           |      |
         |---------+           |     Error tolerant
         |                     |      |
         | Speech bits (Frm #2)|      |
         |     +---------------|      |
         |     |               |      |
         |-----+               .      |
         .                     .      |
         . Speech bits (Frm #k)|      |
         |            +========+      |
         |            |               v
         +============+             -------

       Figure 3. Form for bundling multiple AMR frames.


4. Error Tolerant Payload Format Specification

In this section, we detail the format of the error tolerant RTP
payload for AMR. 


4.1. RTP Payload Header for AMR

Each RTP payload for AMR MUST start with the following one (1) octet
payload header:

    0               
    0 1 2 3 4 5 6 7 
   +-+-+-+-+-+-+-+-+
   |    NF   | MR  |
   +-+-+-+-+-+-+-+-+


   NF (Number of Frames) - unsigned int (5 bits):

     specifies the number of AMR frames carried within this RTP
     payload packet. Since NF is a 5-bit field, the maximum number of
     AMR frames that can be carried in a single payload packet is 31.

   MR (Mode Request) - unsigned int (3 bits):

     indicates the next AMR rate mode the receiver of this payload
     packet should adopt. MR takes the same values as Frame Type
     Index 0-7, as shown in Table 1 below.
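
The payload header octet can be packed and unpacked as in the
following Python sketch. It assumes the usual convention that bit 0 in
the diagram is the most significant bit of the octet. The function
names are hypothetical.

```python
from typing import Tuple


def pack_payload_header(nf: int, mr: int) -> bytes:
    """Build the one-octet payload header: NF (5 bits), MR (3 bits)."""
    if not 0 <= nf <= 0x1F:
        raise ValueError("NF must fit in 5 bits")
    if not 0 <= mr <= 0x07:
        raise ValueError("MR must fit in 3 bits")
    return bytes([(nf << 3) | mr])


def unpack_payload_header(octet: int) -> Tuple[int, int]:
    """Return (NF, MR) from the payload header octet."""
    return octet >> 3, octet & 0x07


header = pack_payload_header(nf=1, mr=6)     # one frame, request mode 6
nf, mr = unpack_payload_header(header[0])
```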


4.2. AMR Frame Headers

Immediately following the payload header octet (as defined in Section 4.1) 
a two (2) octet AMR frame header MUST be present for each AMR frame
carried in the payload packet. 

The AMR frame header is defined as follows:

    0                   1           
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   FT  |Q| rsvd|   Codec CRC   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   FT (Frame Type) - unsigned int (4 bits):

     specifies the frame type index of the corresponding AMR frame, as
     defined in Table 1a in [6]. The following table gives a summary:

                                     speech    size in octets
         Index     Mode               bits      (w/ padding)
         ---------------------------------------------------
           0       AMR 4.75             95          12
           1       AMR 5.15            103          13
           2       AMR 5.9             118          15
           3       AMR 6.7 (PDC-EFR)   134          17
           4       AMR 7.4 (IS-641)    148          19
           5       AMR 7.95            159          20
           6       AMR 10.2            204          26
           7       AMR 12.2 (GSM-EFR)  244          31
           8       AMR CNG              39           5 
           9       GSM EFR CNG          43           6
          10       IS-641 CNG           38           5
          11       PDC-EFR CNG          37           5
          12 - 14  For future use        -           -
          15       No transmission       0           0
      
       Table 1: AMR speech frame types and sizes (from [6]).


   Q (Frame Quality Indicator) - 1 bit:

      corresponds to the Frame Quality Indicator (FQI) defined in
      Table 1b of [6]. A value of 0 indicates that the AMR frame has
      been found corrupted (i.e., a bad frame). A value of 1 indicates
      that the frame is of good quality.

   Codec CRC - binary encoded (8 bits):

      corresponds to the Codec CRC defined in 4.1.4 of [6]. This CRC
      is used for error detection for the Class A bits of the
      corresponding AMR frame. 

   Bits 5-7 are reserved for future use. They MUST be set to 0 by the
   sender and ignored by the receiver.

When multiple AMR frames are present in the RTP payload, the AMR frame
headers from the included AMR frames are simply placed one after the
other into the payload, immediately following the payload header
octet. The frame headers together thus form a Table of Contents (TOC)
of all the AMR frames included.
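
A minimal Python sketch of the two-octet frame header and of the TOC
built from it is given below. It assumes that bit 0 in the diagram is
the most significant bit of the first octet. The function names and
the CRC values used in the usage lines are placeholders.

```python
from typing import Iterable, Tuple


def pack_frame_header(ft: int, q: int, crc: int) -> bytes:
    """FT (4 bits), Q (1 bit), 3 reserved zero bits, Codec CRC (8 bits)."""
    if not 0 <= ft <= 0x0F or q not in (0, 1) or not 0 <= crc <= 0xFF:
        raise ValueError("field out of range")
    return bytes([(ft << 4) | (q << 3), crc])


def unpack_frame_header(two_octets: bytes) -> Tuple[int, int, int]:
    """Return (FT, Q, CRC); the reserved bits are ignored on reception."""
    first, crc = two_octets[0], two_octets[1]
    return first >> 4, (first >> 3) & 0x01, crc


def build_toc(headers: Iterable[Tuple[int, int, int]]) -> bytes:
    """Concatenate the frame headers, one after the other."""
    return b"".join(pack_frame_header(ft, q, crc) for ft, q, crc in headers)


# Placeholder CRC values, chosen only for illustration.
toc = build_toc([(7, 1, 0x11), (2, 0, 0x22)])
```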


4.3. AMR Frame Speech Data

Following the TOC, the speech bits from each frame will be bit-stuffed
with zeros to the next byte boundary (see Table 1 above for the
resultant sizes in octets for different frame types) and placed one
after the other in the payload in the same order as their frame
headers are arranged in the TOC.
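
The octet sizes in Table 1 follow from rounding the number of speech
bits up to the next byte boundary. The Python sketch below reproduces
that arithmetic and the zero bit-stuffing. The bit ordering inside
each octet (first speech bit in the most significant position) is an
assumption of this sketch, and the function names are hypothetical.

```python
# Speech bits per frame type index, copied from Table 1.
SPEECH_BITS = {0: 95, 1: 103, 2: 118, 3: 134, 4: 148, 5: 159,
               6: 204, 7: 244, 8: 39, 9: 43, 10: 38, 11: 37, 15: 0}


def padded_octets(ft: int) -> int:
    """Size in octets after padding to the next byte boundary."""
    return (SPEECH_BITS[ft] + 7) // 8


def pad_speech_bits(bits: str) -> bytes:
    """Bit-stuff a string of '0'/'1' characters with trailing zeros."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


frame = pad_speech_bits("1" * SPEECH_BITS[2])    # AMR 5.9, 118 bits
```

For the 118-bit frame above, two zero bits are appended, giving the
15 octets listed in Table 1.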


4.4. Error Protection and Detection

Based on the discussion in Section 3.3, all header information,
including the RTP Payload Header as defined in Section 4.1 and the AMR
Frame Headers (i.e., the TOC) as defined in Section 4.2, MUST be
protected by the transport checksum. In other words, if this portion
of the payload fails the checksum examination, the whole payload
packet will be silently dropped at the transport layer.

All the AMR speech data bits, as defined in Section 4.3, MUST NOT be
covered by the transport layer checksum, since bit errors in this
portion of the payload packet should be tolerated at the transport
layer.
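
The number of octets that fall under the transport checksum can be
derived from NF. The Python sketch below assumes a 12-octet RTP header
(the fixed header of [3] with no CSRC entries and no header
extension). Whether the transport header's own octets are counted in
the coverage value depends on the transport protocol in use and is not
modelled here. The function name is hypothetical.

```python
RTP_FIXED_HEADER_LEN = 12    # assumption: no CSRC list, no extension


def protected_length(nf: int, rtp_header_len: int = RTP_FIXED_HEADER_LEN) -> int:
    """Octets to protect: RTP header + payload header + 2 per TOC entry."""
    if not 0 <= nf <= 0x1F:
        raise ValueError("NF must fit in 5 bits")
    return rtp_header_len + 1 + 2 * nf


single = protected_length(1)      # payload with a single AMR frame
compound = protected_length(3)    # compound payload with three frames
```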

At the RTP receiver, after the payload packet is delivered from the
transport layer (and is unbundled into individual AMR frames in the
case of a compound RTP payload), a preprocessor or adaptation layer
shall verify the Codec CRC of each AMR frame over the received Class A
bits of the corresponding frame. If the CRC verification fails, the
preprocessor shall set the Q bit of the frame to 0 before passing the
AMR frame to the decoder, indicating that the Class A bits of this
frame are unusable.

This preprocessing function can be an internal function of the AMR
decoder. This is implementation specific.
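
The preprocessing step could look like the following Python sketch.
The generator polynomial, the initial register value, and the number
and order of the Class A bits are defined in [6] and are NOT
reproduced here: the polynomial 0x07 used in the usage lines is a
placeholder chosen only for illustration, and the function names are
hypothetical.

```python
from typing import Sequence


def crc8_bits(bits: Sequence[int], poly: int) -> int:
    """Bitwise CRC-8 over 0/1 values, first bit first, zero initial value.

    `poly` holds the low 8 bits of the generator polynomial.
    """
    reg = 0
    for bit in bits:
        top = (reg >> 7) & 1
        reg = (reg << 1) & 0xFF
        if top ^ bit:
            reg ^= poly
    return reg


def preprocess_q(q: int, received_crc: int,
                 class_a_bits: Sequence[int], poly: int) -> int:
    """Return the Q bit to pass to the decoder for one AMR frame."""
    if crc8_bits(class_a_bits, poly) != received_crc:
        return 0                 # Class A bits unusable: mark frame bad
    return q                     # keep the Q bit set by the sender


PLACEHOLDER_POLY = 0x07          # illustration only, see [6] for the real one
class_a = [1, 0, 1, 1, 0, 0, 1, 0] * 5
sent_crc = crc8_bits(class_a, PLACEHOLDER_POLY)

corrupted = list(class_a)
corrupted[3] ^= 1                # single bit error in the Class A bits
```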


5. RTP Payload for AMR Examples


5.1. Payload with a Single AMR Frame

This example shows an RTP payload packet carrying a single good
quality AMR frame of 12.2 kbits/s rate (FT=7).

   +-+-+-+-+-+-+-+-+
   |  NF=1   | MR=6|            <- payload header 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | FT=7  |1| rsvd|   Codec CRC   |   <- frame header
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | data byte #1  |            <- encoded speech bits 0 to 7
   +-+-+-+-+-+-+-+-+	  
   | data byte #2  |            <- encoded speech bits 8 to 15
   +-+-+-+-+-+-+-+-+	  
   /     ...       /	  
   \               \	  
   +-+-+-+-+-+-+-+-+	  
   | data byte #30 |            <- encoded speech bits 232 to 239
   +-+-+-+-+-+-+-+-+	  
   | data byte #31 |            <- encoded speech bits 240 to 243
   +-+-+-+-+-+-+-+-+               (padded with 4 zeros) 

In this example, the eventual AMR receiver of this packet is also
being asked to use the 10.2 kbits/s rate (MR=6) for speech encoding
when sending in the opposite direction.
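
The octets of this example payload can be assembled as in the Python
sketch below. The Codec CRC value and the speech data are placeholders
(all zeros), and bit 0 of each diagram is assumed to be the most
significant bit.

```python
nf, mr = 1, 6                    # one frame, mode request 10.2 kbits/s
ft, q = 7, 1                     # AMR 12.2, good quality
codec_crc = 0x00                 # placeholder, not a computed CRC
speech = bytes(31)               # placeholder: 244 bits plus 4 zero bits

payload_header = bytes([(nf << 3) | mr])
frame_header = bytes([(ft << 4) | (q << 3), codec_crc])
payload = payload_header + frame_header + speech
```

The resulting payload is 34 octets long: 1 octet of payload header,
2 octets of frame header, and 31 octets of speech data.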


5.2. Payload with multiple AMR Frames

This example shows three AMR frames of different types bundled into
one RTP payload:

   +-+-+-+-+-+-+-+-+
   |  NF=3   | MR=4|            <- payload header 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | FT=7  |1| rsvd|   Codec CRC1  |   <- frame header 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | FT=2  |0| rsvd|   Codec CRC2  |   <- frame header 2
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | FT=4  |1| rsvd|   Codec CRC3  |   <- frame header 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Spch Data #1 |
   .  (31 octets)  .            <- 244 bits encoded speech for	  
   |     ...       |	           frame 1 (w/ 4 last bits 0-padded)
   +-+-+-+-+-+-+-+-+	  
   |  Spch Data #2 |
   .  (15 octets)  .            <- 118 bits encoded speech for
   |     ...       |               frame 2 (w/ 2 last bits 0-padded)
   +-+-+-+-+-+-+-+-+	  
   |  Spch Data #3 |
   .  (19 octets)  .            <- 148 bits encoded speech for	  
   |     ...       |	           frame 3 (w/ 4 last bits 0-padded)
   +-+-+-+-+-+-+-+-+	  

Moreover, the MR field (=4) in the payload header instructs the AMR
codec peer to use 7.40 kbits/s encoding rate. Also, the second frame
has Q bit set to '0' in its frame header, indicating that the Class A
bits of that AMR frame are corrupted and should not be used by the
decoder for decoding the speech.
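
A receiver could unbundle such a compound payload as in the following
Python sketch. The sizes are taken from Table 1, bit 0 of each diagram
is assumed to be the most significant bit, and the CRC values and
speech data in the usage lines are placeholders. The function name
parse_payload is hypothetical.

```python
from typing import List, Tuple

# Padded frame sizes in octets per frame type index, from Table 1.
OCTETS = {0: 12, 1: 13, 2: 15, 3: 17, 4: 19, 5: 20, 6: 26, 7: 31,
          8: 5, 9: 6, 10: 5, 11: 5, 15: 0}

Frame = Tuple[int, int, int, bytes]          # (FT, Q, CRC, speech data)


def parse_payload(payload: bytes) -> Tuple[int, List[Frame]]:
    """Split an RTP payload for AMR into MR and its bundled frames."""
    if not payload:
        raise ValueError("empty payload")
    nf, mr = payload[0] >> 3, payload[0] & 0x07
    offset = 1 + 2 * nf                      # speech data follows the TOC
    if len(payload) < offset:
        raise ValueError("truncated TOC")
    frames: List[Frame] = []
    for i in range(nf):
        first, crc = payload[1 + 2 * i], payload[2 + 2 * i]
        ft, q = first >> 4, (first >> 3) & 0x01
        if ft not in OCTETS:
            raise ValueError("reserved frame type")
        size = OCTETS[ft]
        if offset + size > len(payload):
            raise ValueError("truncated speech data")
        frames.append((ft, q, crc, payload[offset:offset + size]))
        offset += size
    return mr, frames


# The three-frame example above, with placeholder CRCs and speech data.
example = (bytes([0x1C, 0x78, 0x11, 0x20, 0x22, 0x48, 0x33])
           + bytes(31) + bytes(15) + bytes(19))
mr, frames = parse_payload(example)
```

Frames whose Q bit is 0, such as the second frame here, are still
returned so that the decoder can use their remaining bits.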


6.  References

[1] IETF RFC2119, "Key words for use in RFCs to Indicate Requirement
    Levels".

[2] 3G TS 26.071 (V3.0.1), "AMR Speech Codec: General Description".

[3] IETF RFC1889, "RTP: A Transport Protocol for Real-Time
    Applications".

[4] IETF Internet Draft <draft-larzon-udplite-02.txt>, "The UDP Lite
    Protocol".

[5] IETF Internet Draft <draft-xie-usctp-sigtran-01.txt>, "SCTP
    Unreliable Data Mode Extension".

[6] 3G TS 26.101 (V3.0.1), "AMR Speech Codec: Frame Structure".


7.  Authors' addresses

Qiaobing Xie                            Tel: +1-847-632-3028
Motorola, Inc.                          EMail: qxie1@email.mot.com
1501 W. Shure Drive, #2309	    

Sanjay Gupta                            Tel: +1-847-435-0306
Motorola, Inc.                          EMail: QA4496@email.mot.com
1501 W. Shure Drive, #3205	    


            Expires in six months from October 2000.