Audio-Video Transport Working Group                          Tom Hiller
INTERNET-DRAFT                                          Peter J. McCann
Document:                                             Michael D. Turner
                                                          Ajay Rajkumar
                                                    Lucent Technologies
                                                          December 2000


                     RTP Payload Format for EVRC


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [Bradner96].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

1. Abstract

   This document describes how to carry Enhanced Variable Rate Codec
   (EVRC) encoded speech in RTP packets.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC-2119
   [Bradner97].

3. Introduction

   The Telecommunications Industry Association (TIA) [TIA-IS127] as
   well as the 3rd Generation Partnership Project 2 (3GPP2)
   [3GPP2-EVRC] have standardized the Enhanced Variable Rate Codec
   (EVRC). The EVRC incorporates voice-activity detection that allows
   the speech coder to select the appropriate number of bits to encode
   each frame thus causing silence or background noise to be coded
   with the smallest number of bits. This automatically results in a
   reduction in the number of transmitted bits during periods of
   silence or background noise.

Hiller et al.          Standards Track - Expires 06/01
   The EVRC was originally designed for use with the IS-95 CDMA air
   interface [TIA-IS95]. The EVRC uses 3 of the 4 primary traffic
   packet types permitted by IS-95 Multiplex Option 1: rate 1 (171
   bits/packet), rate 1/2 (80 bits/packet), and rate 1/8 (16
   bits/packet). The sampling frequency is always 8 kHz, and speech
   data are always processed in 20 millisecond frames. The 3 frame
   types therefore have different bit rates ranging from 0.8 kbits/s
   to 8.55 kbits/s. Since the bit rate is driven by the voice activity
   and the rate can change on any speech frame boundary, the rate of
   encoding must be transmitted along with the speech information bits
   in each packet.

   Existing IS-95 implementations of EVRC always transmit at least a
   1/8 rate frame every 20 ms, even during silence periods, to allow
   monitoring of radio signal strength, power adjustment, and handoffs
   as necessary. These activities help maintain voice quality in a
   wireless environment, but they also imply that there is no
   mechanism in place that would allow an IS-95 peer to detect silence
   without actually analyzing the EVRC payload itself, that is, at
   least partially implementing the EVRC vocoder. Also, in many
   implementations the power control is integrated with the vocoding
   in such a way that the transmitter may request that the vocoder
   switch to one of the lower frame rates in an effort to squeeze some
   higher priority traffic into the limited channel bandwidth.

   This draft specifies an RTP payload that will support EVRC encoded
   speech data. To support IS-95 peers that do not contain DSP
   hardware, but which may nevertheless be endpoints of such an RTP
   stream, we do not require that the 'M' bit be set to indicate the
   start of a talk spurt. Also, to support the rate adjustment
   strategy discussed above, we provide bits for in-band signaling to
   the remote RTP endpoint to adjust the rate in the reverse
   direction.

4. RTP Payload Format for EVRC

   In this section we describe the usage of the fixed RTP header and
   then give the actual payload format specification.

4.1 Fixed RTP Header Usage

   We make no changes to the existing fixed RTP header.

   The RTP header marker bit (M) MAY be used to mark (M=1) the RTP
   packets containing the first speech frame after silence; otherwise
   the marker bit is set to 0 (M=0). Note that some implementations
   (especially IS-95 air interfaces without vocoder DSP hardware) may
   not be able to recognize silence, so implementations should not
   rely on the M bit for this purpose.

   The timestamp reflects the sampling instant of the first octet in
   the RTP data packet. The timestamp SHALL be increased by 160 for
   each consecutive 20 ms sampling interval. If there are N frames in
   a given RTP packet, this means that the next RTP packet will have a
   timestamp that is increased by at least N*160 ticks.

   The sequence number MUST be incremented by one for each RTP packet
   generated.

4.2 RTP Payload

   We require the following 3-bit header to appear at the start of the
   RTP payload:

       0 1 2
      +-+-+-+
      |R|CMR|
      +-+-+-+

      Figure 1: Fixed EVRC Payload Header

   Setting the 'R' bit indicates that this packet is requesting a
   codec rate change for the reverse direction.

   The 'CMR' field is two bits (always present) that indicate the
   requested mode. It should be set to one of the following values:

      CMR Value    Meaning
      ---------    -------
      00           Please switch to a maximum of rate 1/8 (16 bit)
                   encoding.
      01           Reserved for future use.
      10           Please switch to a maximum of rate 1/2 (80 bit)
                   encoding.
      11           Return to unconstrained rate.

4.3 EVRC codec frame

   The fixed header is followed by one or more EVRC frames, each
   representing 20 milliseconds of encoded audio.
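   The timestamp rule of Section 4.1 and the 3-bit payload header of
   Section 4.2 can be sketched as follows. This is a minimal
   illustration, not part of the format: the function and constant
   names are hypothetical, and it assumes that bit 0 of Figure 1 (the
   'R' bit) is the most significant of the three bits. The 32-bit
   wraparound reflects the width of the RTP timestamp field.

```python
from typing import Tuple

# Illustrative sketch only; all names here are hypothetical.

TICKS_PER_FRAME = 160  # 8 kHz sampling clock * 20 ms per frame

# CMR values from the table in Section 4.2 (01 is reserved).
CMR_MAX_EIGHTH = 0b00
CMR_MAX_HALF = 0b10
CMR_UNCONSTRAINED = 0b11


def pack_payload_header(r: bool, cmr: int) -> int:
    """Return the 3-bit header as an integer (assumes R is the top bit)."""
    if cmr not in (CMR_MAX_EIGHTH, CMR_MAX_HALF, CMR_UNCONSTRAINED):
        raise ValueError("CMR value is reserved or out of range")
    return (int(r) << 2) | cmr


def unpack_payload_header(bits: int) -> Tuple[bool, int]:
    """Split a 3-bit header value back into (R, CMR)."""
    return bool((bits >> 2) & 1), bits & 0b11


def next_timestamp(timestamp: int, frames_in_packet: int) -> int:
    """Smallest timestamp the next packet may carry, with 32-bit wrap."""
    return (timestamp + frames_in_packet * TICKS_PER_FRAME) % (1 << 32)
```

   For example, a packet that carries two frames and starts at
   timestamp 1000 is followed by a packet whose timestamp is at least
   1320.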
   If there is more than one frame in a given RTP payload, they must
   represent contiguous 20 millisecond samples.

   An EVRC payload frame represents one encoded speech frame. The
   layout of fields is shown in Figure 2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|Q| FT  |     0, 16, 80, or 171 EVRC Encoded Bits...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /                                                               /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ...      0-pad to octet boundary|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 2: EVRC codec frame format

   The fields should be set as follows:

   F (1 bit): Indicates whether this frame is followed by additional
      frames in the same RTP payload. F=1: further frames follow.
      F=0: last frame.

   Q (1 bit): Payload quality bit. If not set (Q=0), the payload is
      severely damaged.

   FT (3 bits): Frame type indicator; indicates the EVRC speech coding
      mode. It should be set to one of the following values:

      FT Value     Meaning
      ---------    -------
      000          Rate 1 (171 bit) frame.
      001          Rate 1/2 (80 bit) frame.
      010          Reserved for future use.
      011          Rate 1/8 (16 bit) frame.
      100          Blank frame (0 bits follow).
      101          Reserved for future use.
      110          Erasure (0 bits follow).
      111          Reserved for future use.

      Note that errored frames are indicated with the use of the Q
      bit, above.

   EVRC encoded bits: This is the speech codec encoded data field.
      Padding bits, if necessary to achieve an integral number of
      bytes, are at the end of the encoded speech data. The number of
      padding bits is between 0 and 7.

5. Fragmentation

   Due to delay constraints an RTP packet will usually carry only one
   vocoder frame. The vocoder frames are short, on the order of two to
   a couple dozen bytes, so fragmentation of the vocoder frame is not
   an issue.
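   The frame sizes implied by Figure 2 and the FT table can be worked
   out with a short sketch. It rests on an assumption the text does
   not settle: each frame (its 5 header bits plus the encoded bits) is
   zero-padded to an octet boundary on its own, and the 3-bit payload
   header of Figure 1 is left out of the count. All names are
   hypothetical.

```python
# Illustrative sketch only; all names here are hypothetical.

FRAME_HEADER_BITS = 5  # F (1) + Q (1) + FT (3), per Figure 2

# FT value -> number of EVRC encoded bits, from the FT table.
FT_DATA_BITS = {
    0b000: 171,  # rate 1
    0b001: 80,   # rate 1/2
    0b011: 16,   # rate 1/8
    0b100: 0,    # blank frame
    0b110: 0,    # erasure
}


def frame_octets(ft: int) -> int:
    """Octets used by one frame, assuming per-frame octet padding."""
    if ft not in FT_DATA_BITS:
        raise ValueError("reserved FT value")
    total_bits = FRAME_HEADER_BITS + FT_DATA_BITS[ft]
    return (total_bits + 7) // 8  # round up to the octet boundary


def padding_bits(ft: int) -> int:
    """Zero bits appended to reach the octet boundary (0 to 7)."""
    return frame_octets(ft) * 8 - (FRAME_HEADER_BITS + FT_DATA_BITS[ft])


def parse_frame_header(first_octet: int) -> tuple:
    """Split the leading 5 bits of a frame's first octet into (F, Q, FT)."""
    f = (first_octet >> 7) & 1
    q = (first_octet >> 6) & 1
    ft = (first_octet >> 3) & 0b111
    return bool(f), bool(q), ft
```

   Under that assumption a rate 1 frame occupies 22 octets with no
   padding, a rate 1/2 frame 11 octets, a rate 1/8 frame 3 octets, and
   a blank or erasure frame a single octet.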
   Consequently, there is no need to design a fragmentation and
   reassembly mechanism to handle MTU issues as required in RFC 2736
   when RTP packets may be fragmented.

6. The EVRC MIME Type Registration

   The MIME-name for the EVRC codec is allocated from the IETF tree
   since EVRC is expected to be a widely used codec for voice-over-IP
   applications.

   Media Type Name:     audio

   Media Subtype Name:  EVRC

   Required Parameters: none

   Optional parameters for RTP mode:

      ptime:      Defined as usual for RTP audio.

      mode-set:   Requested EVRC codec rates. Should be a
                  comma-separated list of values acceptable as FT
                  fields in EVRC payloads.

      maxframes:  Maximum number of EVRC speech frames in one RTP
                  packet. The receiver may set this parameter in order
                  to limit buffering requirements or delay.

   Optional parameters for storage mode: none

   Encoding considerations for RTP mode: see section 4 of this
   document.

   Encoding considerations for storage mode:

      The EVRC speech frames are packed into consecutive compound EVRC
      payloads, see section 4. The compound EVRC payloads must be
      stored in sequential order. This implies that the first octet
      after payload n must be the first octet of payload (n+1).
      Furthermore, missing frames and frames not received during
      non-speech periods must be encapsulated into a compound EVRC
      payload as blank frames or erasures (frame type 4 or 6 from
      Section 4).

      Each receiving entity that accepts this MIME type must be able
      to decode all EVRC coding modes.

   Security considerations: see section 7 "Security".

   Public specification: this document.

   Additional information for storage mode:

      Magic number:                   none
      File extensions:                evc, EVC
      Macintosh file type code:       none
      Object identifier or OID:       none

   Person & email address to contact for further information:

      mccap@lucent.com
      tom.hiller@lucent.com

   Intended usage: COMMON. It is expected that many VoIP applications
   (as well as mobile applications) will use this type.
   Author/Change controller:

      mccap@lucent.com
      tom.hiller@lucent.com

6.1 Mapping to SDP Parameters

   Please note that this section applies to the RTP mode only.

   Parameters are mapped to SDP [Handley98] as usual. Example usage in
   SDP:

      m=audio 49120 RTP/AVP 97
      a=rtpmap:97 EVRC/8000
      a=fmtp:97 mode-set=0,1,3,4,6; maxframes=2

7. Security Considerations

   For confidentiality and integrity, EVRC packets may be protected
   via IP Security or by end-to-end payload encryption and
   authentication, which is outside the scope of this draft. Note that
   efficient transmission over a wireless link may be made impossible
   if end-to-end IP Security is used, because the frame type bits will
   be invisible. End-to-end payload protection is then a more
   attractive option, and it should cover only the vocoded data, not
   the mode request, F, Q, or FT bits.

8. References

   [Bradner96]  Bradner, S., "The Internet Standards Process --
                Revision 3", BCP 9, RFC 2026, October 1996.

   [Bradner97]  Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14, RFC 2119, March 1997.

   [Handley98]  Handley, M. and V. Jacobson, "SDP: Session Description
                Protocol", RFC 2327, April 1998.

   [TIA-IS127]  TIA/EIA/IS-127, "Enhanced Variable Rate Codec, Speech
                Service Option 3 for Wideband Spread Spectrum Digital
                Systems".

   [3GPP2-EVRC] C.S0014-0, "Enhanced Variable Rate Codec (EVRC)".

   [TIA-IS95]   TIA/EIA/IS-95-B, "Mobile Station - Base Station
                Compatibility Standard for Wideband Spread Spectrum
                Cellular Systems".

9. Authors' Addresses

   Tom Hiller
   Lucent Technologies
   Room 2F-218
   263 Shuman Drive
   Naperville, IL USA 60137
   Phone: +1 630 979 7673
   Email: tom.hiller@lucent.com

   Peter J. McCann
   Lucent Technologies
   Room 2Z-305
   263 Shuman Drive
   Naperville, IL USA 60137
   Phone: +1 630 713 9359
   Email: mccap@lucent.com

   Michael D. Turner
   Lucent Technologies
   Room 2A-203
   67 Whippany Rd
   Whippany, NJ USA 07981
   Phone: +1 973 386 3579
   Email: mdturner@lucent.com

   Ajay Rajkumar
   Lucent Technologies
   Room 1A-235
   67 Whippany Rd
   Whippany, NJ USA 07981
   Phone: +1 973 386 5249
   Email: ajayrajkumar@lucent.com

Acknowledgements

   Much of this document was modeled on the in-progress draft for the
   AMR payload format.

Full Copyright Statement

   Copyright (C) The Internet Society (date). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.