Internet Engineering Task Force                  Johan Sjoberg, Ericsson
Audio Video Transport WG                          Erik Ekudden, Ericsson
INTERNET-DRAFT                                Morgan Lindqvist, Ericsson
March 10, 2000                               Magnus Westerlund, Ericsson
Expires: September 10, 2000                                       Sweden


                      RTP payload format for AMR


Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/lid-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This document is an individual submission to the IETF. Comments
   should be directed to the authors.

Abstract

   This document describes a proposed real-time transport protocol
   (RTP) [8] payload format for AMR encoded speech signals [1]. The AMR
   payload format can be used with minimal overhead, sending one speech
   frame per RTP packet, or in an extended format. The extended payload
   format provides means to send redundant data for speech frames sent
   in earlier RTP packets and to send multiple speech frames in one RTP
   packet. The payload format handles the current AMR mode set with 8
   narrow-band modes and is prepared for future AMR extensions (e.g.
   wide-band modes). Mode adaptation and source controlled rate
   operation (SCR) are supported by the AMR payload format.

Sjoberg                                                         [Page 1]

INTERNET-DRAFT          RTP Payload Format for AMR        March 10, 2000

1. Introduction

   The adaptive multi-rate (AMR) speech codec was developed by the
   European Telecommunications Standards Institute (ETSI). The AMR
   codec is standardized for GSM, and has also been chosen by 3GPP as
   the mandatory codec for third generation systems. It is currently
   under standardization for TDMA. The AMR codec will therefore be
   widely used in cellular systems. The AMR codec was developed to
   preserve high speech quality under a wide range of transmission
   conditions.

   The AMR codec is a multi-mode codec with 8 narrow-band modes with
   bit rates between 4.75 and 12.2 kbps. The sampling frequency is 8000
   Hz and processing is done on 20 ms frames, i.e. 160 samples per
   frame. The AMR modes are closely related to each other and use the
   same coding framework. Three of the AMR modes have already been
   adopted as standards of their own: the 6.7 kbps mode as PDC-EFR [7],
   the 7.4 kbps mode as the IS-641 codec in TDMA [6], and the 12.2 kbps
   mode as GSM-EFR [5].

   AMR implementations must support all 8 speech coding modes, and mode
   switching can occur to any mode at any time. The mode information
   must therefore be transmitted together with the speech encoded bits
   to indicate the mode. It is possible for the decoder to signal to
   the encoder the mode it prefers to receive, for example because of
   transmission bandwidth or quality.

   The AMR codec is designed with a voice activity detector (VAD) and
   generation of comfort noise (CN) parameters during silence periods.
   Hence, the AMR codec can reduce the number of transmitted bits and
   packets during silence periods to a minimum. Sending CN parameters
   at regular intervals during silence periods is usually called
   discontinuous transmission (DTX) or source controlled rate (SCR)
   operation.

   The three codec standards that are part of AMR [5][6][7] also have
   SCR/CN functionality specified. To enable interoperability with
   terminals supporting these standards, AMR can optionally be extended
   to support these CN schemes as well.

   AMR wide-band modes with 16000 Hz sampling frequency are under
   standardization.

   Due to its flexibility and robustness, AMR is also suitable for
   purposes other than circuit switched cellular systems. Other
   suitable applications are real-time services over packet switched
   networks, e.g. over RTP. To be optimized for transmission over
   networks with high packet loss rates, extra redundancy is built into
   the RTP payload format for AMR.

   The speech encoded bits have different perceptual sensitivity to bit
   errors. Cellular systems exploit this by using unequal error
   protection and detection (UEP and UED). This mechanism concentrates
   the correction and detection of corrupted bits on the perceptually
   most sensitive bits. A frame is only regarded as lost or damaged if
   errors are detected in the most sensitive bits. UED can also be
   employed on RTP if UDP lite is used as transport layer protocol (UDP
   lite [10] is work in progress). The payload then has to be ordered
   in sensitivity order. The AMR encoded bits are defined in
   sensitivity order in [2]. The difference in sensitivity can also be
   exploited by omitting the least sensitive bits when redundant frames
   are sent. The special problems with IP real-time traffic over
   cellular access networks are further discussed in [9].

   Other AMR scenarios are possible, e.g. a circuit switched GSM
   terminal at one end, a gateway to IP, and an IP terminal at the
   other end. To improve quality, frames damaged on the GSM radio link
   should also be transmitted to the decoder in the IP network. To make
   this possible, frame quality information has to be transmitted over
   the IP network. The quality bit is also needed for the AMR RTP
   payload format to interwork with, for example, the ATM AAL2 AMR
   profile.

2. Requirements

   The AMR payload format for RTP was designed to meet the following
   requirements:

   o  Different levels of robustness must be supported, from no
      redundant data to extreme robustness capable of handling very
      high packet loss rates with no or small speech quality
      degradation.

   o  Fast, frame-wise AMR mode adaptation must be supported. This
      means that it must be possible to send Codec Mode Requests back
      from the receiving side to the transmitting side with information
      on the preferred mode. Slower AMR mode adaptation may also be
      accomplished with external signaling.

   o  Source controlled rate operation (SCR) and comfort noise
      parameter (CN) transmission defined in AMR must be supported.

3. Payload Format Specification

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119 [3].

   The AMR payload format is designed to be flexible, ranging from very
   low overhead (minimal) to an extended format with room for future
   AMR extensions, e.g. wide-band modes, and the possibility to send
   extra redundancy information and several speech frames in one
   packet.

   The payload format consists of a payload header and one or more
   payload frames. Neither the payload header nor the payload frames
   are octet aligned on their own, but the full payload is. The full
   payload SHALL finally be ordered in descending bit error sensitivity
   order to be prepared for unequal error protection or unequal error
   detection schemes, e.g. UDP lite. The AMR encoded bit streams are
   defined in sensitivity order in Annex B of [2]; the original order
   as delivered from the speech encoder is defined in [1]. The last
   octet of an AMR payload packet is padded with zeroes at the end if
   not all bits are needed.
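
   As an illustration of the padding rule, the following C sketch
   derives the "octets in minimal form" column of Table 1 for frame
   types 0 to 11. It assumes that a minimal-form payload consists of
   the 3 header bits (Q, L, R with R=0), the F bit, the 5-bit FT field
   and the speech bits, as described in Sections 3.1 and 3.2. The names
   amr_frame_bits and amr_minimal_payload_octets are hypothetical and
   not part of this specification.

```c
#include <assert.h>

/* Speech/CN bits per frame type 0..11, copied from Table 1. */
static const int amr_frame_bits[12] = {
    95, 103, 118, 134, 148, 159, 204, 244, /* AMR 4.75 .. 12.2 */
    39, 43, 38, 37                         /* CN frame types 8 .. 11 */
};

enum {
    AMR_HDR_BITS = 3, /* Q, L, R (no CMR field, R=0) */
    AMR_F_BITS   = 1, /* "further frames follow" flag */
    AMR_FT_BITS  = 5  /* frame type indicator */
};

/*
 * Octets needed for a payload carrying one full frame of type ft and
 * no Codec Mode Request. The bit count is rounded up to a whole
 * octet, which corresponds to zero padding in the last octet.
 * Hypothetical helper; only frame types 0..11 are handled.
 */
int amr_minimal_payload_octets(int ft)
{
    assert(ft >= 0 && ft < 12);
    int bits = AMR_HDR_BITS + AMR_F_BITS + AMR_FT_BITS + amr_frame_bits[ft];
    return (bits + 7) / 8;
}
```

   For the 5.9 kbps mode, amr_minimal_payload_octets(2) counts
   3 + 1 + 5 + 118 = 127 bits and rounds up to 16 octets, the value
   listed for index 2 in Table 1.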

                                   speech    octets in
   Index      Mode                 bits      minimal form
   ------------------------------------------------------
     0        AMR 4.75              95        13
     1        AMR 5.15             103        14
     2        AMR 5.9              118        16
     3        AMR 6.7              134        18
     4        AMR 7.4              148        20
     5        AMR 7.95             159        21
     6        AMR 10.2             204        27
     7        AMR 12.2             244        32
     8        AMR CN                39         6
     9        GSM EFR CN            43         7
    10        IS-641 CN             38         6
    11        PDC-EFR CN            37         6
    12 - 14   For future use         -         -
    15        No transmission        0         1
    16 - 31   For future use         -         -

   Table 1: AMR frame types. Minimal form is one frame per payload and
   no Codec Mode Request.

   The bit order of frame types 0 - 11 is given in [2].

   Frame type 15, no transmission, is needed to indicate frames that
   were not transmitted or were lost, e.g. when multiple frames are
   sent in each payload and comfort noise starts. For a payload with 8
   frames in which AMR mode 7 is used and CN starts in the fifth frame,
   the frame type sequence could look like: {7,7,7,7,8,15,15,8}. The
   AMR SCR is described in [4]. Another reason for the no transmission
   frame type is a possible need to send an urgent Codec Mode Request
   in a silence period with comfort noise.

   The AMR payload format supports robust transmission, multiple frames
   in one payload packet, and the use of fast codec mode adaptation.

   The robust behavior is accomplished by retransmission of previously
   transmitted frames together with the current frame or frames. The
   redundant frames can be transmitted in their entirety or only
   partly. If only a part of the redundant frame is transmitted, the
   least sensitive bits are omitted. A partly transmitted redundant
   frame SHALL fill the number of used octets for that frame. The bits
   in the payload are sorted in descending sensitivity order to support
   UED, as in UDP lite.

   When bits in redundant frames are not transmitted, the missing bits
   MUST be reconstructed on the receiver side. It is RECOMMENDED to
   generate the missing bits randomly or with another quality
   preserving method.
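
   One way to follow this recommendation is sketched here in C. The
   sketch assumes that the receiver holds a frame as one bit per array
   element in sensitivity order, and it uses the C library rand() only
   as a placeholder generator; neither choice is mandated by this
   document. The name amr_fill_missing_bits is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Fill the bits of a partly received redundant frame that were not
 * transmitted. bits[] holds one bit (0 or 1) per element in
 * sensitivity order; the first `received` elements came from the
 * payload and are left untouched, the remaining elements up to
 * `total` are set to random values.
 * Assumption: rand() is good enough for illustration; a real decoder
 * may prefer another generator or another quality preserving method.
 */
void amr_fill_missing_bits(unsigned char *bits, size_t received, size_t total)
{
    assert(bits != NULL && received <= total);
    for (size_t i = received; i < total; i++) {
        bits[i] = (unsigned char)(rand() & 1);
    }
}
```

   For a redundant 6.7 kbps frame (134 bits) of which 12 octets, i.e.
   96 bits, were received, the call would be
   amr_fill_missing_bits(bits, 96, 134).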

   Using a fixed pattern SHOULD be avoided for speech quality reasons.

3.1. The payload header

   The payload header has dynamic length, 3 or 8 bits. The bits in the
   header are specified as follows:

   Q (1 bit): Payload quality bit. If not set, the payload is severely
      damaged and the receiver should set RX_TYPE (see [4]) to
      SPEECH_BAD or SID_BAD, depending on the frame type (FT).

   L (1 bit): Indicates the existence of LEN fields in the payload
      frames.

   R (1 bit): Indicates whether the Codec Mode Request (CMR) is sent
      or not.

   CMR (5 bits): OPTIONAL field, present depending on the R bit.
      Requested codec mode for the other communication direction. The
      interpretation is the same as for the FT field, see Table 1.

    0
    0 1 2
   +-+-+-+
   |Q|L|R|
   +-+-+-+

   Figure 1: AMR payload header, R=0

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |Q|L|R|   CMR   |
   +-+-+-+-+-+-+-+-+

   Figure 2: AMR payload header, R=1

3.2. AMR payload frame

   An AMR payload frame represents one encoded speech frame. Each
   payload frame includes several specified fields as follows:

   F (1 bit): Indicates whether this frame is followed by further
      frames. F=1: further frames follow. F=0: last frame.

   FT (5 bits): Frame type indicator, indicating the AMR speech coding
      mode or comfort noise (CN) mode. The mapping of existing AMR
      modes is given in Table 1. If FT=15 (No transmission), no LEN or
      AMR encoded bits follow.

   LEN (7 bits): OPTIONAL field, present if the payload header bit L is
      set (L=1). LEN specifies the number of octets in the AMR encoded
      bits field in this frame. If LEN indicates more bits than the
      frame type in the FT field implies, the number of bits implied by
      FT is the valid number of AMR encoded bits. If LEN indicates
      fewer bits than the frame type in the FT field implies, LEN gives
      the number of encoded bits. If a frame is transmitted only
      partially, the least sensitive bits at the end of the frame are
      omitted.
      This use is intended for partial redundant data.

   AMR encoded bits: This is the speech codec encoded data field. The
      length of this field is defined either implicitly by the AMR mode
      in the FT field, or by the LEN field.

   The last payload frame SHALL always contain a full AMR frame, i.e.
   no LEN field is needed.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   FT    |     LEN     |                                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+                                     +
   |                                                               |
   +                                                               +
   /                       AMR encoded bits                        /
   +                                                 +-+-+-+-+-+-+-+
   |                                                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3: Payload frame format, F=1 and L=1

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   FT    |                                                   |
   +-+-+-+-+-+-+                                                   +
   |                                                               |
   +                                                               +
   /                       AMR encoded bits                        /
   +                                             +-+-+-+-+-+-+-+-+-+
   |                                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4: Payload frame format, F=0 or L=0

3.3. Payload block sorting

   The bits in each frame are ordered in sensitivity order, i.e. a bit
   error in a more sensitive bit is subjectively more annoying than one
   in a less sensitive bit. To be able to protect the most sensitive
   bits in a payload packet with a forward error detection code, e.g. a
   CRC outside RTP, the full RTP payload MUST be sorted in sensitivity
   order. The protection MAY then cover an appropriate number of octets
   from the beginning of the payload. How many octets depends on the
   channel and application. This can for example be accomplished by UDP
   lite [10] (work in progress).

   To maintain sensitivity ordering inside the AMR payload when more
   than one speech frame is transmitted in one packet, reordering of
   the data is needed. The reordering to maintain the sensitivity
   ordered AMR payload SHALL be performed on bit level. The AMR payload
   header SHALL still be placed unchanged at the beginning of the
   payload.
   Thereafter, the payload frames are sorted with one bit alternating
   from each payload frame.

   +-------------+
   | h(0)-h(H-1) |
   +--------------------------+
   | f(0,0) ... f(0,F(0)-1)   |
   +------------------------------+
   | f(1,0) ... f(1,F(1)-1)       |
   +------------------------------+
   | f(2,0) ... f(2,F(2)-1) |
   +------------------------+
   \                        \
   +---------------------------------+
   | f(N-1,0) ... f(N-1,F(N-1)-1)    |
   +---------------------------------+

   Figure 5: The payload header and N payload frames before sorting.

   The sorting algorithm can be described in C-style code using the
   following notation:

   b(m)   - bit m of the final RTP payload
   f(n,m) - bit m in payload frame n
   F(n)   - number of bits in payload frame n, defined by FT or by LEN
   h(m)   - bit m of the payload header
   H      - number of payload header bits, 3 or 8 bits
   N      - number of payload frames in the payload
   S      - number of unused bits

   Payload frames f(n,m) are ordered in consecutive order, where frame
   n=1 precedes frame n=2.

   The sorting algorithm is defined in C-style as:

      /* Copy the payload header unchanged. */
      for (i = 0; i < H; i++){
         b(i) = h(i);
      }

      /* Interleave the frames, one bit from each frame in turn. */
      max = max(F(0),..,F(N-1));
      k = H;
      for (i = 0; i < max; i++){
         for (j = 0; j < N; j++){
            if (i < F(j)){
               b(k++) = f(j,i);
            }
         }
      }

      /* Pad the last octet with zeroes if it is not full. */
      S = 8 - k%8;
      if (S < 8){
         for (i = 0; i < S; i++){
            b(k++) = 0;
         }
      }

4. RTP header usage

   The RTP header marker bit (M) is used to mark (M=1) the packets
   containing the first speech frame after CN. In all other packets the
   marker bit is set to 0 (M=0).

   The timestamp corresponds to the sampling time of the first sample
   encoded for the first encoded speech frame in the packet. The
   timestamp unit is samples. With 20 ms frames and a sampling
   frequency of 8 kHz, one AMR speech frame corresponds to 160 speech
   samples, i.e. the timestamp is increased by 160 for each consecutive
   frame. All frames in a packet MUST be successive 20 ms frames.

5. Examples

5.1. Simple example

   In the simple example one full frame is sent in each RTP packet
   (L=0), no Codec Mode Request (CMR) is sent (R=0), and the payload
   was not damaged at the IP origin (Q=1). In this example one frame
   encoded with the 5.9 kbps mode (FT=2) is transmitted. The speech
   encoded bits are put into f(0) to f(117) in descending sensitivity
   order according to [2].

       |                            Bit no.                            |
   Oct.|   0       1       2       3       4       5       6       7   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     0 |  Q=1  |  L=0  |  R=0  |  F=0  |   0   |   0   |   0   |   1   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     1 |   0   | f(0)  | f(1)  | f(2)  |  ...  |  ...  |  ...  |  ...  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
    15 |  ...  |  ...  |  ...  |  ...  | f(115)| f(116)| f(117)|   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 6: One frame per packet example.

5.2. Example with partial redundancy

   In this example the 6.7 kbps mode (FT=3) is sent with one redundant
   frame, also FT=3. Only a part of the redundant frame is sent, in
   this example 12 octets (L=1, LEN=12). A mode request is sent (R=1),
   requesting the 10.2 kbps mode for the other link (CMR=6). The
   redundant frame (12 octets) is r(0) to r(95) and the current frame
   (134 bits) is f(0) to f(133).

       |                            Bit no.                            |
   Oct.|   0       1       2       3       4       5       6       7   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     0 |  Q=1  |  L=1  |  R=1  |   0   |   0   |   1   |   1   |   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     1 |  F=1  |  F=0  |   0   |   0   |   0   |   0   |   0   |   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     2 |   1   |   1   |   1   |   1   |   0   | f(0)  |   0   | f(1)  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     3 |   0   | f(2)  |   1   | f(3)  |   1   | f(4)  |   0   | f(5)  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     4 |   0   | f(6)  | r(0)  | f(7)  | r(1)  | f(8)  | r(2)  |  ...  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
    27 |  ...  |  ...  |  ...  |  ...  | r(93) | f(100)| r(94) | f(101)|
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
    28 | r(95) | f(102)| f(103)| f(104)|  ...  |  ...  |  ...  |  ...  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
    31 |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  | f(131)| f(132)|
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
    32 | f(133)|   0   |   0   |   0   |   0   |   0   |   0   |   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 7: Example with partial redundancy.

6. References

   [1]  GSM 06.90, "Adaptive Multi-Rate (AMR) speech transcoding".

   [2]  3G TS 26.101, "AMR Speech Codec Frame Structure".

   [3]  RFC 2119, "Key words for use in RFCs to Indicate Requirement
        Levels".

   [4]  3G TS 26.093, "AMR Speech Codec; Source Controlled Rate
        operation".

   [5]  GSM 06.60, "Enhanced Full Rate (EFR) speech transcoding".

   [6]  TIA/EIA IS-641-A, "TDMA Cellular/PCS - Radio interface,
        Enhanced Full-Rate Voice Codec".

   [7]  ARIB, RCR STD-27H, Section 5.4, "ACELP Speech CODEC".

   [8]  IETF RFC1889, "RTP: A Transport Protocol for Real-Time
        Applications".

   [9]  IETF draft-westberg-realtime-cellular-01.txt, "Realtime Traffic
        over Cellular Access Networks".

   [10] IETF draft-larzon-udplite-02.txt, "The UDP Lite Protocol".

7. Authors' addresses

   Johan Sjoberg
   Ericsson Research
   E-mail: Johan.Sjoberg@ericsson.com

   Erik Ekudden
   Ericsson Research
   E-mail: Erik.Ekudden@ericsson.com

   Morgan Lindqvist
   Ericsson Research
   E-mail: Morgan.Lindqvist@ericsson.com

   Magnus Westerlund
   Ericsson Research
   E-mail: Magnus.Westerlund@era.ericsson.se

   This Internet-Draft expires September 10, 2000.