Internet Engineering Task Force            Tim Fingscheidt, Siemens AG
Audio Video Transport WG                   Bernhard Wimmer, Siemens AG
INTERNET-DRAFT                                                 Germany
July 14, 2000
Expires: January 14, 2001


                      RTP Payload Format for AMR


Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or cite them other than as "work in
   progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/lid-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This document is an individual submission to the IETF. Comments
   should be directed to the authors.

Abstract

   This document proposes a real-time transport protocol (RTP) [1]
   payload format for AMR speech encoded [2] signals. It supports all
   8 modes of the AMR speech codec and is also prepared for future
   extensions, such as AMR wideband. Mode adaptation and discontinuous
   transmission (DTX) are supported as well.

   The proposed payload format allows large flexibility with a minimum
   of bitrate overhead. One or multiple speech frames can be
   transmitted in a single packet. Redundant transmission of
   previously transmitted frames (or parts thereof) is possible, as is
   parity code transmission. With one speech frame per packet, the
   additional parity code transmission allows reconstruction of N
   previous lost speech frames when N consecutive correct packets are
   buffered in the receiver. This gives very high robustness, while
   the receiver buffer size can be chosen according to the
   application.

   For implementation of this draft, please also consider the
   requirements of [12].

Fingscheidt & Wimmer                                          [Page 1]

INTERNET-DRAFT         RTP Payload Format for AMR        July 14, 2000

1. Conventions used

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC2119 [11].

2. Introduction

   The European Telecommunications Standards Institute (ETSI) as well
   as the Third Generation Partnership Project (3GPP) standardized the
   adaptive multi-rate (AMR) speech codec. In third generation systems
   the AMR codec will be mandatory. Three of the AMR modes correspond
   to earlier standards: the 6.7 kbps mode (PDC-EFR [3]), the 7.4 kbps
   mode (IS-641 codec in TDMA [4]), and the 12.2 kbps mode (GSM-EFR
   [5]).

   The AMR codec comprises 8 modes with different bit rates ranging
   from 4.75 to 12.2 kbps. In systems with a fixed gross bit rate,
   such as GSM, this allows assigning different amounts of error
   protection in order to preserve high speech quality over a wide
   range of channel qualities. The sampling frequency is 8 kHz, and
   speech is processed in 20 ms frames.

   The AMR modes are closely related to each other and use the same
   coding framework. AMR implementations must support all 8 speech
   coding modes, and mode switching can occur to any mode at any
   speech frame boundary. The mode information must therefore be
   transmitted together with the speech encoded bits to indicate the
   mode. Furthermore, the decoder may give an indication to the
   encoder of what mode it prefers to receive. This is called a codec
   mode request (CMR) and is useful to adjust the ratio of speech
   coder bits to error protection bits in order to ensure a certain
   speech quality.

   Along with the AMR codec, voice activity detection (VAD) and
   comfort noise generation (CNG) have been standardized. This allows
   a reduction of the number of transmitted bits in silence periods.
   The three earlier codec standards [3-5], however, have different
   DTX/VAD/CNG schemes if they are not used in the AMR framework. For
   interoperability reasons the proposed payload format also supports
   these CNG formats.

   To address transmission over networks with high packet loss rates,
   extra redundancy is built into the RTP payload format for AMR. This
   is done in a very flexible manner by the optional transmission of
   parity bit blocks generated from previously transmitted AMR encoded
   frames. Depending on how many previous frames are covered by this
   parity bit computation, a certain number of consecutive past lost
   frames can be reconstructed at the receiver. Since this may require
   buffering, the AMR payload format allows a flexible tradeoff
   between robustness, bit rate, and receiver delay.

   The speech encoded bits have different perceptual sensitivity to
   bit errors. Accordingly, unequal error protection (UEP) is employed
   in cellular systems. A frame is considered lost or damaged if
   errors are detected in the most sensitive bits. Unequal error
   detection (UED) can also be employed on RTP if, e.g., UDP lite is
   used as transport layer protocol (UDP lite [6] is work in
   progress). The payload then has to be ordered in sensitivity order.
   The sensitivity order for the AMR encoded bits is defined in [7].
   The different sensitivity can also be exploited by a parity check
   covering only the most sensitive bits, as is proposed as an option
   for the AMR payload format.

   To improve quality in circuit-switched GSM networks connected to IP
   networks, frames disturbed on the wireless GSM link should also be
   transmitted to the decoder in the IP network. Consequently, such
   frames must be accompanied by frame quality information in the IP
   network.

   This proposal of an RTP payload format for AMR is the third in a
   series of Internet-Drafts (works in progress) related to this
   topic.
   In [8] the transmission of multiple speech frames in a single RTP
   packet is supported. The advantage of [9] as compared to [8] is
   mainly the possibility to transmit redundant speech frames (or
   parts thereof). The present proposal incorporates the abilities of
   [8,9] and adds an option for reconstruction of a larger number of
   past lost frames. For clarity and simpler comparison, the remainder
   of this document follows the structure and the notation of [9] as
   far as possible.

3. Requirements

   The AMR payload format for RTP was designed to meet the following
   requirements:

   o  Different levels of robustness must be supported:
      -  no redundancy at all
      -  past frames (partly) repeated
      -  parity bits generated over several past frames to yield
         extreme robustness capable of handling very high packet loss
         rates with no or small speech quality degradation.

   o  Fast, frame-wise AMR mode adaptation must be supported. This
      means that it must be possible to send codec mode requests
      (CMRs) back from the receiving side to the transmitting side
      with information on the preferred mode. Slower AMR mode
      adaptation may also be accomplished with external signaling.

   o  Discontinuous transmission (DTX) and comfort noise generation
      (CNG) as specified in AMR must be supported.

4. RTP Payload Format Specification

   This RTP payload format is designed to be flexible, ranging from
   very low overhead (minimal) to an extended format with room for
   future AMR extensions, e.g. wide band modes, and the possibility to
   send extra redundancy information and several speech frames in one
   RTP payload packet.

   Each RTP payload consists of an
      -  RTP payload header followed by the
      -  RTP payload data.

   The RTP payload data is generated by the interleaving of one or
   several RTP payload frames, see section 4.4. An RTP payload frame
   may be generated from
      -  AMR frames or
      -  redundancy frames.

   An RTP payload frame need not be octet-aligned; the RTP payload as
   a whole, however, shall be octet-aligned. If the last octet of an
   RTP payload covers unused bits, these bits shall be set to zero.

4.1. The RTP Payload Header

   The payload header has dynamic length, 3 or 8 bits. The bits in the
   header are specified as follows:

   Q (1 bit): The payload quality bit indicates, if not set, that the
      payload is severely damaged and the receiver should set the
      RX_TYPE, see [10], to SPEECH_BAD or SID_BAD depending on the
      frame type (FT).

   I (1 bit): If I=1, the LEN/DEPTH indicator bit (L) exists in each
      RTP payload frame. If I=0, the LEN/DEPTH indicator bit does not
      exist.

   R (1 bit): Indicates if the codec mode request (CMR) is sent or
      not.

   CMR (5 bits): OPTIONAL field, present only if R=1. Requested codec
      mode for the other communication direction. The interpretation
      is equal to the FT field, see Table 1.

    0
    0 1 2
   +-+-+-+
   |Q|I|R|
   +-+-+-+

   Figure 1: RTP payload header, R=0

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |Q|I|R|   CMR   |
   +-+-+-+-+-+-+-+-+

   Figure 2: RTP payload header, R=1

4.2. RTP Payload AMR Frame

   The RTP payload AMR frame is designed to carry AMR encoded speech
   data and consists of an
      -  AMR frame header that is followed by the
      -  AMR frame payload.

   The AMR frame need not be octet-aligned.

4.2.1. AMR Frame Header Format

   Each AMR frame header includes several specified fields as follows:

   F (1 bit): Indicates if this frame is followed by further frames.
      F=1 further frames follow, F=0 last frame.

   L (1 bit): (OPTIONAL) If the RTP payload header bit I=1 this field
      exists. If I=0 this field does not exist. If set to L=1 the AMR
      frame header includes the LEN field. If L=0 no LEN field exists
      in this AMR frame header.

   FT (5 bits): Frame type indicator, indicating the AMR speech coding
      mode or comfort noise (CN) mode. The mapping of existing AMR
      modes is given in Table 1.
      This implies that the number of bits of the AMR frame payload
      can be derived from Table 1. If FT=15 (No transmission), L for
      both AMR and redundancy frames SHOULD be set to 0.

   LEN (7 bits): OPTIONAL field, exists if the AMR header bit L is
      set, L=1. LEN specifies the number of octets in the current AMR
      frame payload. The following situations may occur and shall be
      treated as follows:
      -  If LEN*8 <= number of speech bits indicated by FT, as shown
         in Table 1, the number of bits of the AMR frame payload shall
         be derived from 8*LEN and not from the FT field. This implies
         that the encoded AMR data was shortened to 8*LEN bits.
      -  Otherwise the LEN field SHOULD be ignored.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|L|   FT    |     LEN     |                                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+                                   +
   |                                                               |
   +                                                               +
   /                       AMR frame payload                       /
   /                                                               /
   +                                                 +-+-+-+-+-+-+-+
   |                                                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3: AMR frame format, I=1 and L=1

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|L|   FT    |                                                 |
   +-+-+-+-+-+-+-+                                                 +
   |                                                               |
   +                                                               +
   /                       AMR frame payload                       /
   /                                                               /
   +                                                 +-+-+-+-+-+-+-+
   |                                                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4: AMR frame format, I=1 and L=0

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   FT    |                                                   |
   +-+-+-+-+-+-+                                                   +
   |                                                               |
   +                                                               +
   /                       AMR frame payload                       /
   +                                             +-+-+-+-+-+-+-+-+-+
   |                                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 5: AMR frame format, I=0

4.2.2. AMR Frame Payload Format

   The AMR speech encoder produces AMR speech frames, as defined by
   [2]. The currently defined AMR speech frame types can be found in
   Table 1.

                                   speech
      Index      Mode              bits
      ----------------------------------
        0        AMR 4.75            95
        1        AMR 5.15           103
        2        AMR 5.9            118
        3        AMR 6.7            134
        4        AMR 7.4            148
        5        AMR 7.95           159
        6        AMR 10.2           204
        7        AMR 12.2           244
        8        AMR CNG             39
        9        GSM EFR CNG         43
       10        IS-641 CNG          38
       11        PDC-EFR CNG         37
       12 - 14   For future use       -
       15        No transmission      0
       16 - 31   For future use       -

   Table 1: AMR speech frame types (taken from [9])

   The bit order of frame types 0 - 11 is given in [7].

   Frame type 15, no transmission, is needed to indicate frames that
   were not transmitted or were lost, e.g. when multiple frames are
   sent in each payload and comfort noise starts. For a payload with 8
   frames in AMR mode 7 where CNG starts in the fifth frame, the frame
   type sequence could look like: {7,7,7,7,8,15,15,8}. The AMR DTX
   (also called "source controlled rate operation", SCR) is described
   in [10]. Another reason for the no transmission frame type is a
   possible need to send an urgent codec mode request in a silence
   period with comfort noise.

   Before the AMR encoded speech frames are copied to the AMR frame
   payload, the speech bits shall be ordered by descending bit-error
   sensitivity. This re-ordering process is defined in [7]. After this
   re-ordering process the AMR encoded speech frame is copied to the
   AMR frame payload according to the particular setting of the AMR
   frame header, e.g. copying of the first 8*LEN bits, see section
   4.2.1.

4.3. RTP Payload - Redundancy Frame

   The RTP payload redundancy frame is designed to carry redundancy
   data for error-correction of lost AMR frames. The redundancy frame
   consists of a
      -  redundancy frame header that is followed by the
      -  redundancy frame payload.

   The redundancy frame need not be octet-aligned.

4.3.1. Redundancy Frame Header Format

   Each redundancy frame header includes several specified fields as
   follows:

   F (1 bit): Indicates if this frame is followed by further frames.
      F=1 further frames follow, F=0 last frame.

   L (1 bit): (OPTIONAL) If the RTP payload header bit I=1 this field
      exists. If I=0 this field does not exist. If set to L=1 the
      redundancy frame header includes the R_LEN and DEPTH fields. If
      L=0 no R_LEN and DEPTH fields exist in this redundancy frame
      header.

   R_FT (5 bits): This field indicates the FT fields of the past DEPTH
      AMR frame headers by the following coding rule.

         R_FT(n) = FT(n-1) EXOR ... EXOR FT(n-DEPTH(n))       (Eq. 1)

      where n is set to the current AMR frame number.
      FT(n) is defined as the AMR frame header field FT of frame n.
      R_FT(n) denotes the redundancy frame header field R_FT of
      frame n.
      EXOR is defined as the bit-wise exclusive OR operation.
      DEPTH(n) denotes the redundancy frame header field DEPTH of
      frame n.

   R_LEN (7 bits): OPTIONAL field, exists if the redundancy header bit
      L is set, L=1. R_LEN specifies the number of octets in the
      current redundancy frame payload. Depending on R_LEN several
      different operational modes are used, which are described in
      section 4.3.2. R_LEN may be changed from redundancy frame to
      redundancy frame. If L=0 and/or I=0, R_LEN(n) is set to FT(n),
      where n denotes the current AMR frame number.

   DEPTH (4 bits): OPTIONAL field, exists if the redundancy header bit
      L is set, L=1. DEPTH specifies the number of previous AMR frame
      payload packets that are used for the generation of the
      redundancy frame payload. The detailed description can be found
      in section 4.3.2. DEPTH = 0 is currently unused and may be used
      for future extension. If L=0 and/or I=0 then DEPTH is set to the
      default value 15.
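
   As an illustration of Eq. 1, the following is a minimal C sketch,
   not part of the format definition. The function names and the array
   layout (index 0 holds FT(n-1), index 1 holds FT(n-2), and so on)
   are assumptions made for this example only. Because EXOR is its own
   inverse, the same operation also yields the FT of a single missing
   frame when the other DEPTH-1 FT values are known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Eq. 1: R_FT(n) = FT(n-1) EXOR ... EXOR FT(n-DEPTH(n)).
 * Assumed layout (hypothetical): ft_prev[0] = FT(n-1),
 * ft_prev[1] = FT(n-2), ..., ft_prev[depth-1] = FT(n-DEPTH). */
uint8_t amr_compute_r_ft(const uint8_t *ft_prev, size_t depth)
{
    uint8_t r_ft = 0;

    /* DEPTH is a 4-bit field and DEPTH = 0 is currently unused. */
    assert(depth >= 1 && depth <= 15);

    for (size_t i = 0; i < depth; i++) {
        r_ft ^= (uint8_t)(ft_prev[i] & 0x1F);   /* FT is 5 bits wide */
    }
    return r_ft;
}

/* Recover the FT of one lost frame from R_FT and the FT values of
 * the remaining (known) frames inside the DEPTH window. */
uint8_t amr_recover_ft(uint8_t r_ft, const uint8_t *ft_known, size_t known)
{
    uint8_t ft = (uint8_t)(r_ft & 0x1F);

    for (size_t i = 0; i < known; i++) {
        ft ^= (uint8_t)(ft_known[i] & 0x1F);
    }
    return ft;
}
```

   With three previous frames that all use FT=3, the sketch returns
   R_FT=3, which agrees with the R_FT value used in the example of
   section 6.2.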

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|L|  R_FT   |    R_LEN    | DEPTH |                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                           +
   |                                                               |
   +                                                               +
   /                    redundancy frame payload                   /
   /                                                               /
   +                                                 +-+-+-+-+-+-+-+
   |                                                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 6: Redundancy frame format, I=1 and L=1

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|L|  R_FT   |                                                 |
   +-+-+-+-+-+-+-+                                                 +
   |                                                               |
   +                                                               +
   /                    redundancy frame payload                   /
   /                                                               /
   +                                                 +-+-+-+-+-+-+-+
   |                                                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 7: Redundancy frame format, I=1 and L=0

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|  R_FT   |                                                   |
   +-+-+-+-+-+-+                                                   +
   |                                                               |
   +                                                               +
   /                    redundancy frame payload                   /
   +                                             +-+-+-+-+-+-+-+-+-+
   |                                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 8: Redundancy frame format, I=0

4.3.2. Redundancy Frame Payload Format

   The generation of the redundancy payload is based on parity bit
   calculation over one or several previous AMR frame payload packets.
   This number of AMR frames is determined by the redundancy frame
   header field DEPTH. The general rules for generating the parity
   bits can be found in section 4.3.3.

   The value of R_LEN can in principle be changed during transmission.
   Assume R_LEN changes from R_LEN1 to R_LEN2, with DEPTH being
   constant. In that case, for a number of DEPTH AMR frame packets
   only min(R_LEN1,R_LEN2) AMR frame payload bits can be
   reconstructed. Although adaptation of R_LEN for redundancy frames
   works seamlessly, it is RECOMMENDED not to perform such an
   adaptation on a frame-by-frame basis.

   The value of DEPTH can also be adapted during transmission. Assume
   DEPTH changes from DEPTH1 to DEPTH2.
   It is RECOMMENDED to choose a maximum value of DEPTH dependent on
   the application (e.g. streaming services: large DEPTH, VoIP: low
   DEPTH) and to adapt it only on a long term basis, since
   reconstruction capabilities are reduced in transition regions for a
   number of min(DEPTH1,DEPTH2) AMR frames.

4.3.3. Encoding Rules for the Parity Bits

   This section describes the encoding rules for the parity bits.

   Notation:

   n        : number of the current AMR frame; n is increased for each
              sent AMR frame packet. n also denotes the current
              redundancy frame number.
   o        : number of an AMR frame that covers fewer AMR frame
              payload bits than required by the current redundancy
              frame header field, i.e. R_LEN(n) > LEN(o).
   g(n,m)   : bit m in the AMR frame payload of frame n
   p(n,m)   : bit m in the redundancy frame payload of frame n
   EXOR     : exclusive OR operation (written XOR in Figure 9)
   R_LEN(n) : denotes the R_LEN field of the redundancy frame header
              of frame n

   The parity bits SHALL be calculated by the following equation:

      p(n,m) = g(n-1,m) EXOR ... EXOR g(n-DEPTH+1, m)
                        EXOR g(n-DEPTH, m)                    (Eq. 2)

      for m = 0 ... R_LEN(n)-1;

   Eq. 2 requires that all LEN(n-i) with i = 1, ... , DEPTH of the AMR
   frames are at least as large as R_LEN(n). In the event that this is
   not the case, the missing AMR frame payload bits SHALL be virtually
   generated by the following rule.

      if (o == n-DEPTH)
         g(o, LEN(o)+i) = 0,
               for i = 0 ... (R_LEN(n)-LEN(o)-1);
      else if (R_LEN(n)-LEN(o) <= LEN(o-1))
         g(o, LEN(o)+i) = g(o-1, i),
               for i = 0 ... (R_LEN(n)-LEN(o)-1);
      else {
         g(o, LEN(o)+i) = g(o-1, i),
               for i = 0 ... (LEN(o-1)-1);
         g(o, LEN(o)+LEN(o-1)+i) = 0,
               for i = 0 ... (R_LEN(n)-LEN(o)-LEN(o-1)-1);
      }

   This rule implies that virtual data SHALL be copied from the most
   sensitive bits of the AMR frame payload preceding AMR frame o.
   However, if the previous AMR frame number (o-1) is outside the
   window defined by the DEPTH parameter of the current redundancy
   frame, the virtual data is set to 0.
   If the AMR frame payload (o-1) contains fewer bits than required to
   supply all virtual bits of AMR frame payload (o), then first all
   AMR frame payload bits of (o-1) SHALL be taken and then the missing
   virtual bits of AMR frame payload (o) SHALL be set to 0.

   Example:

   In the example of Figure 9, the AMR frame payload of frame n-2 does
   not contain enough bits. Therefore the most sensitive bits of AMR
   frame payload (n-3) are virtually appended to AMR frame payload
   (n-2) until the desired length is reached.

   time:  n-3                n-2                n-1           n

      +----------+       +-----------+       +----------+   +--------+
      |          |- XOR -| g(n-2,m), |- XOR -|          | = |        |
      | g(n-3,m) |- XOR -| fill with |- XOR -| g(n-1,m) | = | p(n,m) |
      |          |- XOR -| g(n-3,m)  |- XOR -|          | = |        |
      +----------+       +-----------+       +----------+   +--------+

   Figure 9: Example of parity bit generation for p(n,m) with DEPTH=3
             and the number of AMR frame payload bits in frame n-2
             being smaller than 8*R_LEN(n).

4.3.4. Decoding of Redundancy Frame Payload

   Decoding of these parity codes is intended to work as follows.
   Imagine one frame of AMR encoded bits and one parity bit block per
   frame. Every value of DEPTH >= 1 allows the reconstruction of a
   single lost frame among the last DEPTH frames. DEPTH = 2 allows the
   reconstruction of two consecutive lost frames, once two good frames
   are received. In general, a number of DEPTH buffered packets allows
   for the reconstruction of a number of DEPTH lost frames preceding
   them. The set of equations given by the XOR operations is solved
   first for the last (!) lost frame (unknowns), using the DEPTH
   buffered frames as knowns. Then the equations are solved for the
   last but one lost frame, taking into account the already
   reconstructed bits of the last lost frame. And so forth.
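
   The parity generation of Eq. 2 and the simplest decoding case (one
   lost frame inside the DEPTH window) can be sketched in C as shown
   below. This is an illustrative sketch under stated assumptions, not
   a reference implementation: the function names are hypothetical,
   bits are stored one per array element for readability, and every
   frame is assumed to hold at least nbits bits, so the virtual-bit
   fill rule of section 4.3.3 is not exercised. Recovery of several
   consecutive lost frames, which needs the parity blocks of several
   later packets, is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Eq. 2: p(n,m) = g(n-1,m) EXOR ... EXOR g(n-DEPTH,m).
 * Assumed layout (hypothetical): prev[k] points to the payload bits
 * of frame n-1-k, one bit (0 or 1) per array element. */
void amr_parity_encode(const uint8_t *const *prev, size_t depth,
                       size_t nbits, uint8_t *parity)
{
    assert(depth >= 1);

    for (size_t m = 0; m < nbits; m++) {
        uint8_t p = 0;
        for (size_t k = 0; k < depth; k++) {
            p ^= (uint8_t)(prev[k][m] & 1u);
        }
        parity[m] = p;
    }
}

/* Single-loss recovery: the lost frame equals the parity block
 * EXORed with the depth-1 frames that did arrive. prev[lost] is
 * never read, so it may be NULL. */
void amr_parity_recover(const uint8_t *const *prev, size_t depth,
                        size_t lost, size_t nbits,
                        const uint8_t *parity, uint8_t *out)
{
    assert(depth >= 1 && lost < depth);

    for (size_t m = 0; m < nbits; m++) {
        uint8_t g = (uint8_t)(parity[m] & 1u);
        for (size_t k = 0; k < depth; k++) {
            if (k != lost) {
                g ^= (uint8_t)(prev[k][m] & 1u);
            }
        }
        out[m] = g;
    }
}
```

   A receiver using this sketch would call amr_parity_recover() with
   the parity block carried in packet n and the frames it still holds
   in its buffer; the buffer depth is what ties robustness to delay.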

   Here the strength of using parity codes instead of frame repetition
   becomes obvious: especially for streaming applications, a large
   value of DEPTH allows the reconstruction of error bursts of up to
   DEPTH consecutive frames.

4.3.5. Implications for DTX and the choice of DEPTH

   For delay reasons it is not advisable to store a large number
   (DEPTH) of CNG frames in the receiver buffer before previous lost
   CNG AMR frames or AMR frame payload packets containing speech data
   can be reconstructed. Thus the following rules SHALL apply:

   o  Starting with the second AMR frame containing one or several CNG
      frames, DEPTH SHALL be set to at most 1 for all consecutive
      redundancy frames containing CNG AMR frames.

   o  In the first and the second AMR frame containing no CNG after a
      speech pause, DEPTH SHALL be set to at most 1.

   These rules allow optimal recovery of lost AMR frames in DTX
   operation, while keeping delay at a minimum.

4.4. Payload Block Sorting

   In general a bit error in a more sensitive bit is subjectively more
   annoying than in a less sensitive bit. To be able to protect the
   most sensitive bits in the AMR and redundancy frames with a forward
   error detection code, e.g. a CRC outside RTP, the full RTP payload
   data MUST be sorted in sensitivity order. The protection MAY then
   cover an appropriate number of octets from the beginning of the AMR
   and/or redundancy frames. How many octets depends on the channel
   and application. This can for example be accomplished by UDP lite
   [6] (work in progress).

   To maintain sensitivity ordering inside the AMR payload when more
   than one speech frame is transmitted in one packet, reordering of
   the data is needed. The reordering to maintain the sensitivity
   ordered AMR payload SHALL be performed on bit level. The AMR
   payload header SHALL still be placed unchanged at the beginning of
   the payload. Thereafter, the payload frames are sorted with one bit
   alternating from each payload frame.
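
   The bit-alternating sort described in this section can be sketched
   as runnable C as follows. This is a hedged illustration only: the
   function names are hypothetical, bits are held one per array
   element, each frame array is assumed to contain the frame header
   bits followed by the frame payload bits (as the example in section
   6.2 suggests), and packing with the first bit in the most
   significant position of each octet is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interleave N frames bit by bit behind the H payload header bits.
 * h: header bits, f[j]: bits of frame j, F[j]: number of bits in
 * frame j. The output b must have room for H + sum(F) + 7 elements.
 * Returns the number of bits written, padded with zeros to a
 * multiple of 8. */
size_t amr_sort_payload(const uint8_t *h, size_t H,
                        const uint8_t *const *f, const size_t *F,
                        size_t N, uint8_t *b)
{
    size_t k = 0;
    size_t max = 0;

    for (size_t i = 0; i < H; i++) {
        b[k++] = (uint8_t)(h[i] & 1u);      /* header stays in front */
    }
    for (size_t j = 0; j < N; j++) {
        if (F[j] > max) {
            max = F[j];
        }
    }
    for (size_t i = 0; i < max; i++) {      /* bit position in frame */
        for (size_t j = 0; j < N; j++) {    /* one bit from each frame */
            if (i < F[j]) {
                b[k++] = (uint8_t)(f[j][i] & 1u);
            }
        }
    }
    while (k % 8 != 0) {                    /* unused bits are zero */
        b[k++] = 0;
    }
    return k;
}

/* Pack one-bit-per-element data into octets, first bit first
 * (assumed to be the most significant bit of each octet). */
void amr_pack_octets(const uint8_t *b, size_t nbits, uint8_t *octets)
{
    assert(nbits % 8 == 0);

    for (size_t i = 0; i < nbits / 8; i++) {
        uint8_t v = 0;
        for (size_t j = 0; j < 8; j++) {
            v = (uint8_t)((v << 1) | (b[8 * i + j] & 1u));
        }
        octets[i] = v;
    }
}
```

   Keeping one bit per array element mirrors the b(), f() and h()
   notation of this section; a production packer would more likely
   write directly into an octet buffer.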

   +-------------+
   | h(0)-h(H-1) |
   +------------------------+
   | f(0,0) ... f(0,F(0)-1) |
   +----------------------------+
   | f(1,0) ... f(1,F(1)-1)     |
   +----------------------------+
   | f(2,0) ... f(2,F(2)-1) |
   +------------------------+
   \                         \
   +-------------------------------+
   | f(N-1,0) ... f(N-1,F(N-1)-1)  |
   +-------------------------------+

   Figure 10: The payload header and N AMR/redundancy frames before
              sorting.

   The sorting algorithm is described below in C-style notation, using
   the following symbols:

   b(m)   : bit m of the final RTP payload
   f(n,m) : bit m in AMR/redundancy frame payload of frame n
   F(n)   : number of bits in AMR/redundancy frame n, defined by FT or
            by LEN/R_LEN
   h(m)   : bit m of the RTP payload header
   H      : number of RTP payload header bits, 3 or 8 bits
   N      : number of AMR/redundancy frames in the RTP payload
   S      : number of unused bits

   Payload frames f(n,m) are ordered consecutively, i.e. frame n=1
   precedes frame n=2.

   The sorting algorithm is defined in C-style as:

      for (i = 0; i < H; i++)
         b(i) = h(i);

      max = max(F(0),..,F(N-1));
      k = H;
      for (i = 0; i < max; i++){
         for (j = 0; j < N; j++){
            if (i < F(j)){
               b(k++) = f(j,i);
            }
         }
      }

      S = 8 - k%8;
      if (S < 8){
         for (i = 0; i < S; i++)
            b(k++) = 0;
      }

5. RTP header usage

   The RTP header marker bit (M) is used to mark (M=1) the packets
   containing the first speech frame after CN. In all other packets
   the marker bit is set to 0 (M=0).

   The timestamp corresponds to the sampling time of the first sample
   encoded for the first encoded speech frame in the AMR frame. The
   timestamp unit is in samples: one AMR speech frame covers 20 ms at
   a sampling frequency of 8 kHz, which corresponds to 160 encoded
   speech samples per frame, i.e. the timestamp is increased by 160
   for each consecutive AMR speech frame.

   Due to DTX functionality each RTP packet SHALL contain the
   appropriate timestamp of the first AMR frame covered by the RTP
   payload.
   Each AMR frame containing CNG data, or the first AMR frame
   containing speech data after CNG, SHALL start with a new RTP
   packet. This is required to achieve the correct timing information.

   Please consider also [12] for setting of particular parameters.

6. Examples

6.1. Simple example

   In the simple example we send one full frame in each RTP packet
   (I=0), no codec mode request CMR is sent (R=0), and the payload was
   not damaged at IP origin (Q=1). In this example we transmit one
   frame encoded with the 5.9 kbps mode (FT=2). The speech encoded
   bits are put into f(0) to f(117) in descending sensitivity order
   according to [7]. The payload covers 3 + 1 + 5 + 118 = 127 bits,
   i.e. 16 octets (octets 0 to 15) including one unused bit.

       |                            Bit no.                            |
   Oct.|   0       1       2       3       4       5       6       7   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     0 |  Q=1  |  I=0  |  R=0  |  F=0  |   0   |   0   |   0   |   1   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     1 |   0   | f(0)  | f(1)  | f(2)  |  ...  |  ...  |  ...  |  ...  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
    15 |  ...  |  ...  |  ...  |  ...  | f(115)| f(116)| f(117)|   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 11: One frame per packet example.

6.2. Example with parity bits

   In this example an AMR frame with 6.7 kbps mode (FT=3) is sent with
   one redundancy frame packet.

   -  The RTP payload header is set to Q=1, I=1, R=1 and CMR = 6. A
      mode request is sent (R=1), requesting the 10.2 kbps mode for
      the other link (CMR=6).

   -  The AMR frame header uses F=1, L=0 (this implies NO LEN field)
      and FT = 3. The AMR frame header is followed by the AMR frame
      payload, denoted by f(0) to f(133).

   -  The redundancy frame header is set to
      -  F = 0 (no following frames),
      -  L = 1 (R_LEN and DEPTH exist),
      -  R_FT = 3 (the 3 previous AMR frame header fields FT were 3),
      -  R_LEN = 2 (number of redundancy frame payload bits
         = 2*8 = 16),
      -  DEPTH = 3 (the 3 previous AMR frame payload packets are taken
         for redundancy frame payload calculation).
      The redundancy frame payload covers 16 bits and is denoted by
      r(0) to r(15).

   The payload covers 8 + 7 + 18 + 134 + 16 = 183 bits, i.e. 23 octets
   (octets 0 to 22) including one unused bit.

       |                            Bit no.                            |
   Oct.|   0       1       2       3       4       5       6       7   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     0 |  Q=1  |  I=1  |  R=1  |   0   |   0   |   1   |   1   |   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     1 |  F=1  |  F=0  |  L=0  |  L=1  |   0   |   0   |   0   |   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     2 |   0   |   0   |   1   |   1   |   1   |   1   | f(0)  |   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     3 | f(1)  |   0   | f(2)  |   0   | f(3)  |   0   | f(4)  |   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     4 | f(5)  |   1   | f(6)  |   0   | f(7)  |   0   | f(8)  |   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     5 | f(9)  |   1   | f(10) |   1   | f(11) | r(0)  | f(12) | r(1)  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     6 | f(13) | r(2)  | f(14) | r(3)  |  ...  |  ...  |  ...  |  ...  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
    .. |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
     9 |  ...  |  ...  |  ...  | r(15) | f(27) | f(28) | f(29) | f(30) |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
    .. |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+
    22 |  ...  |  ...  |  ...  | f(130)| f(131)| f(132)| f(133)|   0   |
   ----+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 12: Example with 1 AMR frame and 1 redundancy frame

7. References

   [1]  IETF RFC1889, "RTP: A Transport Protocol for Real-Time
        Applications"

   [2]  GSM 06.90, "Adaptive Multi-Rate (AMR) speech transcoding"

   [3]  ARIB, RCR STD-27H, Section 5.4, "ACELP Speech CODEC"

   [4]  TIA/EIA IS-641-A, "TDMA Cellular/PCS - Radio interface,
        Enhanced Full-Rate Voice Codec"

   [5]  GSM 06.60, "Enhanced Full Rate (EFR) speech transcoding"

   [6]  IETF draft-larzon-udplite-02.txt, "The UDP Lite Protocol"

   [7]  3G TS 26.101, "AMR Speech Codec Frame Structure"

   [8]  IETF draft-lakaniemi-avt-rtp-amr-00.txt, "RTP Payload Format
        for AMR"

   [9]  IETF draft-sjoberg-avt-rtp-amr-00.txt, "RTP payload format for
        AMR"

   [10] 3G TS 26.093, "AMR Speech Codec; Source Controlled Rate
        Operation"

   [11] RFC 2119, "Key words for use in RFCs to Indicate Requirement
        Levels"

   [12] IETF draft-wimmer-amr-01.txt, "MIME Type Registration for AMR
        Speech Codec"

8. Authors' addresses

   Tim Fingscheidt
   Siemens AG, ICP CD
   Grillparzerstrasse 10-18
   D - 81675 Munich
   Germany

   Phone:  ++49 89 722 57658
   Fax:    ++49 89 722 46489
   E-mail: Tim.Fingscheidt@mch.siemens.de

   Bernhard Wimmer (contact person)
   Siemens AG, ICP CD
   Grillparzerstrasse 10-18
   D - 81675 Munich
   Germany

   Phone:  ++49 89 722 23247
   Fax:    ++49 89 722 46489
   E-mail: Bernhard.Wimmer@mch.siemens.de

   This Internet-Draft expires January 14, 2001.

Full Copyright Statement

   Copyright (C) The Internet Society (date). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES; EXPRESS OR
   IMPLIED; INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.