Internet Engineering Task Force                                   Q. Xie
Audio Video Transport WG                                        S. Gupta
INTERNET-DRAFT                                                  Motorola
                                                            October 2000
Expires in six months


               Error Tolerant RTP Payload Format for AMR


Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   This document defines the RTP payload format for *error tolerant*
   delivery of Adaptive Multi-Rate (AMR) speech frames over an RTP
   session. The flexibility in bandwidth requirements and the tolerance
   to bit errors of the AMR codec are not only beneficial for
   "over-the-air" wireless links, but are also very desirable for any
   Voice-over-IP application. The design is focused on how best to
   exploit these features of the AMR codec in an IP environment.

1. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119 [1].

2. Introduction

   This document defines the RTP payload format for *error tolerant*
   delivery of Adaptive Multi-Rate (AMR) [2] speech frames over an RTP
   session [3].

   The AMR speech codec [2] represents a new generation of coding
   algorithms that are built to work over error-prone transport
   channels.
   This type of codec has built-in mechanisms that make it tolerant to
   a certain degree of bit errors introduced by the transport channel.
   In other words, it is designed to restore the original speech (with
   some degradation) even when the coded speech data is received with
   some bit errors.

   However, error-free data transport is the norm in IP networks:
   whenever bit errors are detected in the data being transported, the
   data is discarded. Usually, the transport protocol (e.g., UDP or
   TCP) performs the bit error checking and drops the bad packets. To
   take full advantage of the error tolerant design of the AMR codec,
   special consideration must be given to the transport layer as well
   as to the AMR frame design. This is discussed in detail in
   Section 3.

   Another important feature of the AMR codec is that the data rate of
   the coded speech generated by the encoder can be adjusted
   dynamically according to, for example, the available path bandwidth.
   This adjustment can be made on a frame-by-frame basis. For instance,
   during a live VoIP session, if the path carrying the voice traffic
   starts to experience congestion, the decoder, after observing
   increased packet loss, may instruct the encoder to switch to a lower
   coding rate so as to reduce the amount of traffic and avoid further
   packet loss.

3. Design Considerations

   The flexibility in bandwidth requirements and the tolerance to bit
   errors of the AMR codec are not only beneficial for "over-the-air"
   wireless links, but are also very desirable for any Voice-over-IP
   application. The design is thus focused on how best to exploit these
   features of the AMR codec in an IP environment.

   Note that the use of an error tolerant transport, as discussed in
   the following sections, is desirable for achieving unequal error
   detection (UED) but is not a requirement. Conventional UDP [7] can
   be used in place of the error tolerant transport.
   However, doing so will prevent one from taking full advantage of the
   bit error tolerant design of the AMR codec.

3.1. Error Tolerant Transport

   The protocol stack for delivering AMR speech data over IP is shown
   in Figure 1.

               +--------------+
               | AMR Payload  |
               +--------------+
               |     RTP      |
               +==============+
               |  UDP-Lite*   |
               +--------------+
               |      IP      |
               +--------------+

               * or U-SCTP, etc.

     Figure 1. Protocol stack for AMR data delivery over IP.

   To allow datagrams with certain bit errors to be delivered over an
   IP network, we need an error tolerant transport layer.

   Note that simply turning off the checksum protection in traditional
   UDP is not a good solution, since a part of the datagram (such as
   the RTP header information) still needs data integrity protection
   from the transport layer.

   This requires the use of a transport protocol capable of partial
   checksums, such as UDP-Lite [4] or Unreliable SCTP [5]. These new
   unreliable datagram transport protocols allow a portion of the
   carried datagram to be excluded from the checksum calculation. Bit
   errors occurring in that portion will therefore not cause the
   datagram to be discarded, while the rest of the datagram remains
   under checksum protection.

3.2. Unequal Bit Error Sensitivity of RTP Payload for AMR

   When an RTP payload carrying AMR data is passed to the transport
   layer, it contains the following three types of data bits:

               +--------------+
               | header bits  |
               +==============+ ---
               | Class A bits |  \
               |     and      |   AMR coded speech data
               |  other bits  |  /
               +--------------+ ---

     Figure 2. Data bits in RTP payload for AMR.

   In Figure 2, the "header bits" include both the AMR speech frame
   control bits/headers (such as the frame type, mode request, etc.)
   and the RTP protocol headers. The "Class A bits" are the most error
   sensitive coded speech data bits, while the "other bits" (also
   called Class B and C bits in AMR terminology [6]) are the coded
   speech data bits that are less sensitive to errors.

3.3. Error Handling Requirements for Different Data Types

   When delivering the RTP payload for AMR, the three types of bits
   described above require different bit error handling procedures in
   order to take full advantage of the error tolerant design of the AMR
   codec.

   A) The "header bits" must be delivered error-free. If any bit error
      is detected in the "header bits" portion of a datagram, the whole
      datagram must be discarded, because any error in the header bits
      invalidates the integrity of the whole datagram.

   B) The "Class A bits" should be checked for errors. If any bit error
      is detected in the "Class A bits" portion of a datagram, the AMR
      frame to which the erroneous Class A bits belong MUST be marked
      as bad. The AMR decoder must be informed of such errors so that
      it will not use the Class A bits of that AMR frame when decoding
      the speech. However, the datagram should not be dropped, since
      the "other bits" portion of the datagram is less sensitive to bit
      errors and is still usable by the AMR decoder.

   C) Error checking on the "other bits" should not be performed.

   In order to meet the above error handling requirements, the Class A
   bits cannot be covered by the transport layer checksum. Otherwise,
   any error in the Class A bits of an AMR frame would cause the
   transport layer to drop the whole datagram. This is particularly
   harmful when multiple AMR frames are bundled in the same datagram,
   since a single bit error then costs every bundled frame.

   Instead, one should use the transport layer partial checksum (as
   provided by UDP-Lite and U-SCTP) to cover only the "header bits"
   shown in Figure 2, and include inside the AMR frame header an 8-bit
   CRC covering the Class A bits of the frame. This CRC is generated by
   the speech encoder when the AMR frame is formed and is verified by
   the AMR/RTP receiver (not the transport protocol) before the AMR
   frame is passed to the decoder.
   If the CRC verification fails, the AMR/RTP receiver will raise the
   bad frame indicator when passing the AMR frame to the decoder.

   In summary, we will have:

   1) The transport layer partial checksum covers the "header bits".
      If a checksum error is found, the whole datagram is discarded.

   2) The AMR frame CRC covers the "Class A bits". If an error is
      found, the bad frame flag is raised but the data is still
      delivered.

   3) The "other bits" are not checked.

3.4. Bundling Delivery of Multiple AMR Frames

   When bundling multiple AMR frames in one RTP payload packet (a
   so-called compound payload), a table of contents (TOC) structure
   should be used to list the header information of all the included
   AMR frames. This TOC block must be checksum protected at the
   transport layer.

   No expensive bit reordering is necessary on the coded speech data
   bits of the included AMR frames. The speech data bits of each
   included AMR frame are simply concatenated to form the speech data
   block of the compound payload, as shown in the following figure:

      +=====================+ -------
      |     RTP Header      |    ^
      +=====================+    |
      | RTP Payload Header  | Error intolerant
      +=====================+ (protected by transport checksum)
      |   AMR Frame Table   |    |
      |  of Contents (TOC)  |    v
      +=====================+ -------
      |                     |    ^
      | Speech bits (Frm #1)|    |
      |         +-----------|    |
      |         |           |    |
      |---------+           | Error tolerant
      |                     |    |
      | Speech bits (Frm #2)|    |
      |     +---------------|    |
      |     |               |    |
      |-----+               .    |
      .                     .    |
      . Speech bits (Frm #k)|    |
      |            +========+    |
      |            |             v
      +============+          -------

     Figure 3. Form for bundling multiple AMR frames.

   Note that when conventional UDP is used as the transport, the whole
   payload, including the RTP header, RTP payload header, TOC, and
   speech data block, is protected by the UDP checksum.

4. Error Tolerant Payload Format Specification

   In this section, we detail the format of the error tolerant RTP
   payload for AMR.

4.1. RTP Payload Header (PH) for AMR

   Each RTP payload for AMR MUST start with the following 6-bit
   payload header:

       0 1 2 3 4 5
      +-+-+-+-+-+-+
      | NF  | MR  |
      +-+-+-+-+-+-+

   NF (Number of Frames) - unsigned int (3 bits): specifies the number
      of AMR frames carried within this RTP payload packet. The maximum
      number of AMR frames that can be carried in a single payload
      packet is thus 7.

      NF = '000' indicates that no AMR frame is present in the payload.
      This can be used to send a stand-alone Mode Request.

   MR (Mode Request) - unsigned int (3 bits): indicates the next AMR
      rate mode the receiver of this payload packet should adopt. The
      value of MR uses the same encoding as Frame Type indices 0-7,
      as shown in Table 1 below.

4.2. AMR Frame Header (FH)

   Immediately following the payload header (as defined in Section
   4.1), an AMR frame header MUST be present for each of the AMR frames
   indicated by the NF field in the payload packet.

   An AMR frame header occupies either 7 or 15 bits, depending on
   whether a Codec CRC field is present. Its definition is as follows:

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
      +-+-+-+-+-+-+-+.+.+.+.+.+.+.+.+
      |  FT   |A|Q|C|   Codec CRC   |
      +-+-+-+-+-+-+-+.+.+.+.+.+.+.+.+

   FT (Frame Type) - unsigned int (4 bits): specifies the frame type
      index of the corresponding AMR frame, as defined in Table 1a in
      [6]. The following table gives a summary:

                                        Class A   total speech
      Index   Mode                      bits      bits
      --------------------------------------------------------
        0     AMR 4.75                   42         95
        1     AMR 5.15                   49        103
        2     AMR 5.9                    55        118
        3     AMR 6.7 (PDC-EFR)          58        134
        4     AMR 7.4 (IS-641)           61        148
        5     AMR 7.95                   75        159
        6     AMR 10.2                   65        204
        7     AMR 12.2 (GSM-EFR)         81        244
        8     AMR CNG                    39         39
        9     GSM EFR CNG                43         43
       10     IS-641 CNG                 38         38
       11     PDC-EFR CNG                37         37
      12-14   For future use              -          -
       15     No transmission             0          0

      Table 1: AMR speech frame types and sizes (from [6]).

   A (Class A Bits Only Indicator) - 1 bit: if set to 1, indicates that
      only Class A bits are present in the corresponding speech data
      portion of this frame.
      In other words, the less sensitive Class B and C speech bits are
      omitted from transmission in this frame. This could be useful for
      conserving bandwidth in certain Forward Error Correction (FEC)
      schemes.

      Using the FT field and the A flag together, the AMR receiver can
      determine the exact number of speech bits carried in this frame
      (see Table 1).

   Q (Frame Quality Indicator) - 1 bit: corresponds to the Frame
      Quality Indicator (FQI) defined in Table 1b of [6]. If set to 0,
      indicates that the AMR frame has been found corrupted (i.e., it
      is a bad frame). If set to 1, indicates that the frame is of good
      quality.

   C (Codec CRC Indicator) - 1 bit: if set to 1, indicates the presence
      of an optional 8-bit Codec CRC field in this frame header.

   Codec CRC - binary encoded (8 bits): an optional field which is
      only present when the C bit is set to 1. It corresponds to the
      Codec CRC defined in Section 4.1.4 of [6]. This CRC is used for
      error detection on the Class A bits of the corresponding AMR
      frame. In cases where an error-intolerant transport (e.g.,
      conventional UDP) is used, the Codec CRC protection may become
      unnecessary.

   Note that an AMR/RTP receiver MUST be prepared to receive an AMR
   coded speech frame with or without the Codec CRC field, whereas it
   is optional for the AMR/RTP sender to include a Codec CRC in an
   outbound AMR frame.

   When multiple AMR frames are present in the RTP payload, the frame
   headers of the included AMR frames are simply placed one after the
   other into the payload, immediately following the payload header
   bits. The frame headers together thus form a Table of Contents (TOC)
   of all the AMR frames included.

   The RTP payload header (PH) bits and the frame TOC together form the
   header block. The header block MUST be zero-padded to the next octet
   boundary.
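
   The header block layout described above (PH, then one FH per frame,
   then zero-padding) can be illustrated with a minimal Python sketch.
   It is not part of this specification. The names FrameHeader and
   pack_header_block are hypothetical, and the sketch assumes that bit
   0 in the field diagrams is the most significant bit of the first
   octet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FrameHeader:
    """One AMR frame header (FH); a hypothetical helper type."""
    ft: int                    # Frame Type index, 4 bits
    a: bool                    # Class A Bits Only Indicator
    q: bool                    # Frame Quality Indicator (True = good)
    crc: Optional[int] = None  # 8-bit Codec CRC, None means C = 0


def pack_header_block(mr: int, frames: List[FrameHeader]) -> bytes:
    """Pack PH and TOC, zero-padded to the next octet boundary.

    Assumption: bits are written most significant bit first.
    """
    if len(frames) > 7:
        raise ValueError("NF is a 3-bit field: at most 7 frames")

    bits = 0   # bit string accumulated as an integer
    nbits = 0  # number of bits written so far

    def put(value: int, width: int) -> None:
        nonlocal bits, nbits
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        bits = (bits << width) | value
        nbits += width

    put(len(frames), 3)                  # NF
    put(mr, 3)                           # MR
    for fh in frames:
        put(fh.ft, 4)                    # FT
        put(int(fh.a), 1)                # A
        put(int(fh.q), 1)                # Q
        put(int(fh.crc is not None), 1)  # C
        if fh.crc is not None:
            put(fh.crc, 8)               # Codec CRC (only when C = 1)

    pad = -nbits % 8                     # zero bits needed to fill octet
    bits <<= pad
    return bits.to_bytes((nbits + pad) // 8, "big")
```

   Under these assumptions, a header block with NF=1, MR=6 and one
   frame header with FT=7, A=0, Q=1, C=0 (the single-frame case used in
   Section 5.1) packs into the two octets 0x39 0xD0, and a stand-alone
   Mode Request (NF=0) occupies a single octet.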

   The following diagram shows a payload header block indicating a
   single AMR frame with no CRC:

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |NF=1 | MR  |  FT   |A|Q|0|x x x|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    octet 1    |    octet 2    |

      x - zero-padded bits

   Here is another example showing a header block indicating two AMR
   frames with no CRC:

       0                   1                   2
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |NF=2 | MR  | FT #1 |A|Q|0| FT #2 |A|Q|0|x x x x|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    octet 1    |    octet 2    |    octet 3    |

      x - zero-padded bits

   Note that this zero-padding of the header block has no interaction
   with the RTP P bit (defined in [3]).

4.3. AMR Frame Speech Data

   Following the header block, the speech bits from each frame are
   placed one after the other into the payload, in the same order as
   their frame headers are arranged in the TOC. All the speech bits
   together form the speech data block.

   As with the header block, the speech data block MUST be zero-padded
   to the next octet boundary. This zero-padding of the speech data
   block has no interaction with the RTP P bit (defined in [3]).

4.4. Error Protection and Detection

   Based on the discussion in Section 3.3, the header block, as defined
   in Section 4.2, MUST be protected by the transport checksum. If this
   portion of the payload fails the checksum examination, the whole
   payload packet will be silently dropped at the transport layer.

   The speech data block, as defined in Section 4.3, should not be
   covered by the transport layer checksum when an error tolerant
   transport is used (e.g., UDP-Lite, U-SCTP).

   At the RTP receiver, after the payload packet is delivered from the
   transport layer (and is unbundled into individual AMR frames in the
   case of a compound RTP payload), a preprocessor or adaptation layer
   should verify the Codec CRC (if present) of each AMR frame over the
   received Class A bits in the frame.
   If the CRC verification fails, the preprocessor should set the Q bit
   of the frame to 0 before passing the AMR frame to the speech
   decoder, indicating that the Class A bits of this frame are
   unusable.

   This preprocessing function can be an internal function of the AMR
   decoder. This is implementation specific.

5. RTP Payload for AMR Examples

5.1. Payload with a Single AMR Frame

   This example shows an RTP payload packet carrying a single good
   quality full AMR frame at the 12.2 kbit/s rate (FT=7) with no CRC.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+                                    -------
   |NF=1 | MR=6|                     <-PH              ^
   +-+-+-+-+-+-+-+                                header block
   | FT=7  |0|1|0|                   <-FH 1    (total 2 octets)
   +-+-+-+-+-+-+-+                                     |
   |0 0 0|                           <-padding         v
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+                -------
   |                               |                   ^
   :    frm #1 (244 spch bits)     :           speech data block
   |                               |           (total 31 octets)
   |       +-+-+-+-+-+-+-+-+-+-+-+-+                   |
   |       |0 0 0 0|                                   V
   +-+-+-+-+-+-+-+-+                                -------

   In this example, the AMR receiver of this packet is also being asked
   to use the 10.2 kbit/s rate (MR=6) for speech encoding when sending
   in the opposite direction.

5.2. Payload with Multiple AMR Frames

   This example shows three AMR frames of different types bundled into
   one RTP payload:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+                                    -------
   |NF=3 | MR=4|                     <-PH              ^
   +-+-+-+-+-+-+-+                                     |
   | FT=7  |0|1|0|                   <-FH1        header block
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+             (total 5 octets)
   | FT=2  |0|1|1|  Codec CRC2   |   <-FH2             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                     |
   | FT=4  |1|0|0|                   <-FH3             |
   +-+-+-+-+-+-+-+                                     |
   |0 0 0 0 0|                       <-padding         v
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+                -------
   |                               |                   ^
   :    frm #1 (244 spch bits)     :                   |
   |                               |                   |
   |       |-----------------------|                   |
   |-+-+-+-+                       |           speech data block
   |    frm #2 (118 spch bits)     :           (total 53 octets)
   :                               |                   |
   |                               |                   |
   |   |---------------------------|                   |
   |-+-+                           |                   |
   |   frm #3 (61 Class A bits)    |                   |
   |             +-+-+-+-+-+-+-+-+-+                   |
   |             |0|                                   V
   +-+-+-+-+-+-+-+-+                                -------

   Notes:

   1) The MR field (=4) in the payload header instructs the AMR codec
      peer to use the 7.40 kbit/s encoding rate. The header block
      holds 6 + 7 + 15 + 7 = 35 bits, which zero-padding extends to
      40 bits.

   2) The second AMR frame is transmitted with an 8-bit Codec CRC.

   3) The third AMR frame has the Q bit set to '0' in its frame header,
      indicating that the Class A bits of that AMR frame are corrupted
      and should not be used by the decoder when reconstructing the
      speech. Also, the A flag is set to '1' for this frame, indicating
      that the corresponding speech data of this frame contains only
      Class A bits.

6. References

   [1] IETF RFC2119, "Key words for use in RFCs to Indicate Requirement
       Levels".

   [2] 3G TS 26.071 (V3.0.1), "AMR Speech Codec: General Description".

   [3] IETF RFC1889, "RTP: A Transport Protocol for Real-Time
       Applications".

   [4] IETF Internet Draft, "The UDP Lite Protocol", work in progress.

   [5] IETF Internet Draft, "SCTP Unreliable Data Mode Extension", work
       in progress.

   [6] 3G TS 26.101 (V3.0.0), "AMR Speech Codec: Frame Structure".

   [7] Postel, J. (ed.), "User Datagram Protocol", RFC 768, August
       1980.

7. Authors' addresses

   Qiaobing Xie                     Tel: +1-847-632-3028
   Motorola, Inc.                   EMail: qxie1@email.mot.com
   1501 W. Shure Drive, #2309

   Sanjay Gupta                     Tel: +1-847-435-0306
   Motorola, Inc.                   EMail: QA4496@email.mot.com
   1501 W. Shure Drive, #3205


Expires in six months from October 2000.