Audio-Video Transport Working Group                          Tom Hiller 
INTERNET-DRAFT                                          Peter J. McCann 
Document: <draft-mccann-avt-rtp-evrc-00.txt>          Michael D. Turner 
                                                          Ajay Rajkumar 
                                                    Lucent Technologies 
                                                          December 2000 
 
 
                      RTP Payload Format for EVRC 
 
 
Status of this Memo 
    
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC 2026 [Bradner96].
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups. Note that 
   other groups may also distribute working documents as Internet-
   Drafts. Internet-Drafts are draft documents valid for a maximum of 
   six months and may be updated, replaced, or obsoleted by other 
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in 
   progress."  
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt  
    
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html. 
 
    
1. Abstract 
    
   This document describes how to carry Enhanced Variable Rate Codec 
   (EVRC) encoded speech in RTP packets. 
    
    
2. Conventions used in this document 
 
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in 
   this document are to be interpreted as described in RFC 2119
   [Bradner97]. 
    
    
3. Introduction 
    
   The Telecommunications Industry Association (TIA) [TIA-IS127] as 
   well as the 3rd Generation Partnership Project 2 (3GPP2) [3GPP2-
   EVRC] have standardized the Enhanced Variable Rate Codec (EVRC).  
   The EVRC incorporates voice-activity detection that allows the 

  
Hiller et al.      Standards Track - Expires 06/01                  1 
                   RTP Payload Format for EVRC       December, 2000 
 
 
   speech coder to select the appropriate number of bits to encode each
   frame, so that silence or background noise is coded with the
   smallest number of bits.  This automatically reduces the number of
   transmitted bits during periods of silence or background noise.
    
   The EVRC was originally designed for use with the IS-95 CDMA air 
   interface [TIA-IS95].  The EVRC uses 3 of the 4 primary traffic 
   packet types permitted by IS-95 Multiplex Option 1: rate 1 (171 
   bits/packet), rate 1/2 (80 bits/packet), and rate 1/8 (16 
   bits/packet).  The sampling frequency is always 8 kHz, and speech
   data are always processed in 20 millisecond frames.  The 3 frame
   types therefore have different bit rates, ranging from 0.8 kbit/s
   (rate 1/8) to 8.55 kbit/s (rate 1).  Since the bit rate is driven
   by voice activity and can change on any speech frame boundary, the
   encoding rate must be transmitted along with the speech information
   bits in each packet.
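
   The bit rates quoted above follow from the packet sizes and the
   20 millisecond frame duration.  A minimal Python sketch of that
   arithmetic is given here; the names FRAME_BITS and bit_rate_kbps
   are illustrative and are not taken from the EVRC specifications.

```python
# Illustrative only: these names do not come from the EVRC specifications.
FRAME_DURATION_MS = 20

# Bits per packet for the three packet types that EVRC uses.
FRAME_BITS = {"rate 1": 171, "rate 1/2": 80, "rate 1/8": 16}


def bit_rate_kbps(bits_per_frame):
    """Bits per millisecond is numerically equal to kbit/s."""
    return bits_per_frame / FRAME_DURATION_MS


for name, bits in FRAME_BITS.items():
    print(f"{name}: {bit_rate_kbps(bits):.2f} kbit/s")
```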
    
   Existing IS-95 implementations of EVRC always transmit at least a
   rate 1/8 frame every 20 ms, even during silence periods, so that
   the radio signal strength can be monitored, power adjusted, and
   handoffs performed as necessary.  These activities help maintain
   voice quality in a wireless environment, but they also imply that
   an IS-95 peer has no way to detect silence without analyzing the
   EVRC payload itself, that is, without at least partially
   implementing the EVRC vocoder.  Also, in many implementations the
   power control is integrated with the vocoding in such a way that
   the transmitter may request that the vocoder switch to one of the
   lower frame rates in an effort to squeeze some higher priority
   traffic into the limited channel bandwidth.
    
   This draft specifies an RTP payload format that supports EVRC
   encoded speech data.  To support IS-95 peers that do not contain
   DSP hardware, but which may nevertheless be endpoints of such an
   RTP stream, we do not require that the 'M' bit be set to indicate
   the start of a talk spurt.  Also, to support the rate adjustment
   strategy discussed above, we provide bits for in-band signaling
   that ask the remote RTP endpoint to adjust the rate in the reverse
   direction.
  
    
4. RTP Payload Format for EVRC 
    
   In this section we describe the usage of the fixed RTP header and 
   then give the actual payload format specification. 
 
 
4.1 Fixed RTP Header Usage 
    
   We make no changes to the existing fixed RTP header.  The RTP header 
   marker bit (M) MAY be used to mark (M=1) the RTP packets containing 
   the first speech frame after silence; otherwise the marker bit is 
   set to 0 (M=0).  Note that some implementations (especially IS-95
   air interfaces without vocoder DSP hardware) may not be able to
   recognize silence, so implementations should not rely on the M bit 
   for this purpose.  
    
   The timestamp reflects the sampling instant of the first octet in
   the RTP data packet.  Because the sampling frequency is 8 kHz, each
   20 ms frame spans 160 samples, so the timestamp SHALL be increased
   by 160 for each consecutive 20 ms sampling interval.  If there are
   N frames in a given RTP packet, the next RTP packet will have a
   timestamp that is increased by at least N*160 ticks.
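
   The timestamp rule can be sketched in Python as follows.  The
   function name next_timestamp is hypothetical, and the modulo
   operation assumes the usual 32-bit RTP timestamp field.

```python
SAMPLE_RATE_HZ = 8000    # EVRC sampling frequency
FRAME_DURATION_MS = 20   # duration of one EVRC frame
TICKS_PER_FRAME = SAMPLE_RATE_HZ * FRAME_DURATION_MS // 1000  # 160


def next_timestamp(timestamp, frames_in_packet):
    """Smallest timestamp allowed for the packet after one with N frames.

    Assumption: timestamps are 32-bit unsigned values, so the result
    wraps around at 2**32.
    """
    return (timestamp + frames_in_packet * TICKS_PER_FRAME) % (1 << 32)
```

   A sender that skips packets during silence would advance the
   timestamp by more than this minimum.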
    
   The sequence number MUST be incremented by one for each RTP packet 
   generated. 
    
    
4.2 RTP Payload 
    
   We require the following 3-bit header to appear at the start of the 
   RTP payload: 
    
                0 1 2  
               +-+-+-+ 
               |R|CMR| 
               +-+-+-+ 
    
        Figure 1: Fixed EVRC Payload Header 
    
   Setting the 'R' bit indicates that this packet is requesting a codec 
   rate change for the reverse direction.  The 'CMR' field is two bits 
   (always present) that indicate the requested mode.  It should be set 
   to one of the following values: 
    
          CMR Value     Meaning 
          ---------     ------- 
             00         Please switch to a maximum of rate 1/8 (16 bit) 
                        encoding. 
             01         Reserved for future use. 
             10         Please switch to a maximum of rate 1/2 (80 bit) 
                        encoding. 
             11         Return to unconstrained rate. 
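
   A minimal Python sketch of reading and writing this header
   follows.  It assumes that bit 0 of Figure 1 is the most
   significant bit of the first payload octet, as is conventional in
   such diagrams; the function names are hypothetical.  The remaining
   five bits of the octet are left untouched.

```python
# CMR values from the table above (01 is reserved).
CMR_MEANING = {
    0b00: "maximum of rate 1/8",
    0b01: "reserved",
    0b10: "maximum of rate 1/2",
    0b11: "unconstrained rate",
}


def parse_payload_header(first_octet):
    """Return (R, CMR) taken from the top three bits of the first octet."""
    r = (first_octet >> 7) & 0x1
    cmr = (first_octet >> 5) & 0x3
    return bool(r), cmr


def build_payload_header(r, cmr):
    """Return an octet with R and CMR in the top three bits, rest zero."""
    if cmr not in CMR_MEANING:
        raise ValueError("CMR must be a 2-bit value")
    return (int(bool(r)) << 7) | (cmr << 5)
```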
 
    
4.3 EVRC Codec Frame
    
   The fixed header is followed by one or more EVRC frames, each
   representing 20 milliseconds of encoded audio.  If there is more
   than one frame in a given RTP payload, the frames must represent
   consecutive 20 millisecond intervals.
    
   An EVRC payload frame represents one encoded speech frame.  The 
   layout of fields is shown in Figure 2.   
    
       0                   1                   2                   3 
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |F|Q| FT  |    0, 16, 80, or 171 EVRC Encoded Bits...  
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      /                                                               / 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
        ...    0-pad to octet boundary| 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
                  Figure 2: EVRC codec frame format 
    
    
   The fields should be set as follows: 
    
   F (1 bit): Indicates whether this frame is followed by additional
              frames in the same RTP payload.  F=1: further frames
              follow; F=0: last frame.
       
   Q (1 bit): The payload quality bit indicates, if not set, that the 
              payload is severely damaged. 
    
   FT (3 bits): Frame type indicator, indicates the EVRC speech coding 
                mode.  It should be set to one of the following values: 
    
          FT Value      Meaning 
          ---------     ------- 
            000         Rate 1 (171 bit) frame. 
            001         Rate 1/2 (80 bit) frame. 
            010         Reserved for future use. 
            011         Rate 1/8 (16 bit) frame. 
            100         Blank frame (0 bits follow). 
            101         Reserved for future use. 
            110         Erasure (0 bits follow). 
            111         Reserved for future use. 
    
   Note that frames containing errors are indicated with the Q bit,
   described above.
    
   EVRC encoded bits: This is the speech codec encoded data field.
                     Padding bits, if needed to reach an integral
                     number of octets, are placed at the end of the
                     encoded speech data.  The number of padding bits
                     is between 0 and 7.
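
   The frame header fields and the FT table can be sketched in Python
   as follows.  The sketch assumes the 5-bit header is handed over as
   an integer with F as its most significant bit; the names
   FT_DATA_BITS, parse_frame_header, and padding_bits are
   hypothetical.

```python
# FT value -> number of EVRC encoded bits that follow the frame header.
FT_DATA_BITS = {
    0b000: 171,  # Rate 1
    0b001: 80,   # Rate 1/2
    0b011: 16,   # Rate 1/8
    0b100: 0,    # Blank frame
    0b110: 0,    # Erasure
}


def parse_frame_header(header5):
    """Split a 5-bit frame header into (F, Q, FT, data_bits)."""
    f = (header5 >> 4) & 0x1
    q = (header5 >> 3) & 0x1
    ft = header5 & 0x7
    if ft not in FT_DATA_BITS:
        raise ValueError(f"reserved FT value {ft:03b}")
    return bool(f), bool(q), ft, FT_DATA_BITS[ft]


def padding_bits(total_bits):
    """Zero bits needed to reach the next octet boundary (0 to 7)."""
    return -total_bits % 8
```

   For instance, a rate 1 frame together with its 5-bit header
   occupies 176 bits, which is already a whole number of octets.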
 
 
5. Fragmentation 
    
   Due to delay constraints, an RTP packet will usually carry only one
   vocoder frame.  Vocoder frames are short: the encoded speech
   occupies between 2 octets (rate 1/8, 16 bits) and 22 octets (rate
   1, 171 bits), so fragmentation of a vocoder frame is not an issue.
   Consequently, there is no need to design a fragmentation and
   reassembly mechanism to handle MTU issues, as RFC 2736 requires
   when RTP packets may be fragmented.
    
    
6. The EVRC MIME Type Registration 
    
   The MIME-name for the EVRC codec is allocated from the IETF tree 
   since EVRC is expected to be a widely used codec for voice-over-IP 
   applications. 
    
   Media Type Name:     audio 
    
   Media Subtype Name:  EVRC 
    
   Required Parameters: none 
    
   Optional parameters for RTP mode: 
    
     ptime:    Defined as usual for RTP audio. 
    
     mode-set: Requested EVRC codec rates.  Should be a comma-separated 
               list of values acceptable as FT fields in EVRC payloads. 
    
     maxframes: Maximum number of EVRC speech frames in one RTP packet.  
               The receiver may set this parameter in order to limit 
               buffering requirements or delay. 
    
   Optional parameters for storage mode: none 
    
   Encoding considerations for RTP mode: see section 4 of this 
                                         document.  
    
   Encoding considerations for storage mode: The EVRC speech frames are
   packed into consecutive compound EVRC payloads; see section 4.  The
   compound EVRC payloads must be stored in sequential order.  This
   implies that the first octet after payload n must be the first octet
   of payload (n+1).  Furthermore, missing frames and frames not
   received during non-speech periods must be encapsulated into a
   compound EVRC payload as blank frames or erasures (frame type 4 or 6
   from Section 4).  Each receiving entity that accepts this MIME type
   must be able to decode all EVRC coding modes.
  
   Security considerations: see section 7, "Security Considerations".
    
   Public specification: this document. 
    
   Additional information for storage mode: 
      Magic number: none 
      File extensions: evc, EVC 
      Macintosh file type code: none 
      Object identifier or OID: none 
    
   Person & email address to contact for further information: 
      mccap@lucent.com 
      tom.hiller@lucent.com 
     
   Intended usage: COMMON. It is expected that many VoIP applications 
   (as well as mobile applications) will use this type. 
    
   Author/Change controller: 
      mccap@lucent.com 
      tom.hiller@lucent.com 
    
    
6.1 Mapping to SDP Parameters 
    
   Please note that this section applies to the RTP mode only.

   Parameters are mapped to SDP [Handley98] as usual.
   Example usage in SDP:
     m=audio 49120 RTP/AVP 97
     a=rtpmap:97 EVRC/8000
     a=fmtp:97 mode-set=0,1,3,4,6; maxframes=2
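
   A receiver might extract the optional parameters from the fmtp
   attribute along the following lines.  This is a minimal sketch in
   Python: the function name parse_evrc_fmtp is hypothetical, and the
   code assumes the semicolon-separated name=value layout shown in
   the example.

```python
def parse_evrc_fmtp(line):
    """Parse 'a=fmtp:<pt> name=value; name=value' into (pt, params)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute")
    pt_text, _, param_text = line[len(prefix):].partition(" ")
    params = {}
    for item in param_text.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        params[name.strip()] = value.strip()
    # mode-set is a comma-separated list of FT values.
    if "mode-set" in params:
        params["mode-set"] = [int(v) for v in params["mode-set"].split(",")]
    # maxframes is a single integer.
    if "maxframes" in params:
        params["maxframes"] = int(params["maxframes"])
    return int(pt_text), params
```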
    
    
7. Security Considerations 
 
   For confidentiality and integrity, EVRC packets may be protected via
   IP Security or by end-to-end payload encryption and authentication,
   both of which are outside the scope of this draft.  Note that
   efficient transmission over a wireless link may be made impossible
   if end-to-end IP Security is used, because the frame type bits will
   be invisible.  End-to-end payload protection is then a more
   attractive option, and it should cover only the vocoded data, not
   the mode request, F, Q, or FT bits.
    
    
8. References 
    
   [Bradner96]  Bradner, S., "The Internet Standards Process -- 
                Revision 3", BCP 9, RFC 2026, October 1996. 
    

   [Bradner97]  Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14, RFC 2119, March 1997.

   [Handley98]  Handley, M. and V. Jacobson, "SDP: Session Description
                Protocol", RFC 2327, April 1998.

   [TIA-IS127]  TIA/EIA/IS-127, "Enhanced Variable Rate Codec, Speech
                Service Option 3 for Wideband Spread Spectrum Digital
                Systems".

   [3GPP2-EVRC] C.S0014-0, "Enhanced Variable Rate Codec (EVRC)".

   [TIA-IS95]   TIA/EIA/IS-95-B, "Mobile Station - Base Station
                Compatibility Standard for Wideband Spread Spectrum
                Cellular Systems".
           
    
9. Authors' Addresses
    
   Tom Hiller 
   Lucent Technologies 
   Room 2F-218 
   263 Shuman Drive 
   Naperville, IL  USA 60137 
   Phone: +1 630 979 7673 
   Email: tom.hiller@lucent.com 
    
   Peter J. McCann 
   Lucent Technologies 
   Room 2Z-305 
   263 Shuman Drive 
   Naperville, IL  USA 60137 
   Phone: +1 630 713 9359 
   Email: mccap@lucent.com 
    
   Michael D. Turner 
   Lucent Technologies 
   Room 2A-203 
   67 Whippany Rd 
   Whippany, NJ USA 07981 
   Phone: +1 973 386 3579 
   Email: mdturner@lucent.com 
 
   Ajay Rajkumar 
   Lucent Technologies 
   Room 1A-235 
   67 Whippany Rd 
   Whippany, NJ USA 07981 
   Phone: +1 973 386 5249 
   Email: ajayrajkumar@lucent.com 
    
  
Acknowledgements 
 
   Much of this document was modeled on the in-progress draft for the 
   AMR payload format. 
    
    
Full Copyright Statement

   Copyright (C) The Internet Society (2000). All Rights Reserved.
   This document and translations of it may be copied and furnished to 
   others, and derivative works that comment on or otherwise explain it 
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any 
   kind, provided that the above copyright notice and this paragraph 
   are included on all such copies and derivative works. However, this 
   document itself may not be modified in any way, such as by removing 
   the copyright notice or references to the Internet Society or other 
   Internet organizations, except as needed for the purpose of 
   developing Internet standards in which case the procedures for 
   copyrights defined in the Internet Standards process must be 
   followed, or as required to translate it into languages other than 
   English. 
    
   The limited permissions granted above are perpetual and will not be 
   revoked by the Internet Society or its successors or assigns. 
    
   This document and the information contained herein is provided on an 
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 

