Audio/Video Transport Group                        B. Wimmer, B. Hammer
Internet Draft                                               Siemens AG
Document:
Category: Informational                                   July 14, 2000
Expires: January 14, 2001


              MIME Type Registration of AMR Speech Codec


Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

1. Abstract

This document defines the MIME name audio/AMR, together with the
encoding format definitions and parameters of the GSM Adaptive Multi
Rate (AMR) speech codec, for

   - real-time transport over RTP
   - storage purposes.

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [2].

3. Introduction

The GSM Adaptive Multi Rate (AMR) codec [3] is a speech codec developed
by the European Telecommunications Standards Institute (ETSI). The AMR
codec is the mandatory speech codec for GSM and for third generation
systems by 3GPP. Although the AMR codec was primarily designed for
wireless systems, it also gives excellent speech quality for wireline
systems.

Wimmer & Hammer                                                [Page 1]

INTERNET DRAFT  MIME Type Registration of AMR Speech Codec July 14, 2000

AMR provides eight different speech coding modes. The bit-rates range
from 4.75 to 12.2 kbit/s, see Table 1.
The sampling rate is 8000 Hz and processing is performed on 20 ms
frames.

   Coding mode   bit-rate       note
   --------------------------------------------
   0             4.75 kbit/s
   1             5.15 kbit/s
   2             5.9  kbit/s
   3             6.7  kbit/s    PDC-EFR [4]
   4             7.4  kbit/s    IS-641 [5]
   5             7.95 kbit/s
   6             10.2 kbit/s
   7             12.2 kbit/s    GSM EFR [6]

   Table 1: AMR speech coding modes

As noted in Table 1, several rates are equivalent to other standards.

Standard-compliant implementations [3] must support all eight coding
modes. In addition, a mode change may occur at any time. Furthermore,
Discontinuous Transmission / comfort noise (DTX/CN) functionality [7]
may be used. Besides [7], specific DTX/CN schemes are defined for the
coding modes 3, 4 and 7, see [4][5][6].

4. Definition of Modes

This section defines the default and mandatory settings for both
modes: real-time transport over RTP, and storage.

4.1. Real-time Transport over RTP Mode

This mode is designed for applications using the Real-time Transport
Protocol (RTP), including the AMR RTP profile, e.g. Voice over IP
applications.

An AMR decoder may want to receive only a certain AMR mode or a subset
of the AMR modes. In end-to-end transmission, parts of the chain may
limit the number of modes in the active codec set, e.g. the GSM radio
link can only use a subset of at most four modes. Therefore, a
specific set of AMR modes can be requested in the capability
description, and it is mandatory for the AMR encoder to abide by this
request. If no mode set is requested, the AMR encoder can freely
decide which AMR mode to use.

Although in principle the AMR codec can perform a mode change at any
time between any two modes, mode changes can be restricted. The
decoder can define the minimum number of frames between mode changes
and can limit mode changes to neighboring modes only.
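
The restrictions described above can be illustrated with a small
check on a per-frame sequence of coding modes. This is only a sketch
and not part of the registration: the function name and its arguments
are hypothetical, and the period check assumes one possible reading of
the restriction, namely that once the first mode change has fixed the
phase, later changes must follow at distances that are multiples of N.

```python
from typing import Optional, Sequence, Set


def obeys_mode_restrictions(
    modes: Sequence[int],
    mode_set: Optional[Set[int]] = None,
    change_period: Optional[int] = None,
    neighbor_only: bool = False,
) -> bool:
    """Check a per-frame list of AMR coding modes (0..7, see Table 1).

    Hypothetical helper: mode_set is the requested subset of modes,
    change_period is the number N of frames between permitted mode
    changes, neighbor_only restricts changes to adjacent modes.
    """
    if change_period is not None and change_period < 1:
        raise ValueError("change_period must be a positive integer")

    last_change = None  # frame index of the most recent mode change
    for i, mode in enumerate(modes):
        if not 0 <= mode <= 7:
            return False  # not one of the eight speech coding modes
        if mode_set is not None and mode not in mode_set:
            return False  # encoder ignored the requested mode set
        if i == 0 or mode == modes[i - 1]:
            continue  # no mode change at this frame
        if neighbor_only and abs(mode - modes[i - 1]) != 1:
            return False  # jumped over a neighboring mode
        if change_period is not None and last_change is not None:
            # Assumption: the first change sets the (arbitrary) phase.
            if (i - last_change) % change_period != 0:
                return False
        last_change = i
    return True
```

For example, the sequence 7, 7, 6, 6, 5, 5 passes with a period of 2
and the neighbor restriction, whereas 7, 7, 6, 5 fails the period
check because the second change follows the first after one frame.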

In addition to the AMR DTX/CN scheme, the three codec standards that
are part of AMR also have their own DTX/CN schemes ([4][5][6]). To
enable interoperability with terminals supporting these standards, AMR
can optionally be extended to support these CN schemes as well. The CN
capabilities are signaled in the capability description. If no CN
capabilities are reported, it is assumed that AMR CN is supported. If
CN capabilities are reported, all supported CN types (including AMR
CN) must be signaled.

It is also possible to limit the number of frames, both AMR and
redundancy frames, encapsulated into one RTP packet. This is an
optional feature; if no parameter is given in the capability
description, the transmitter can encapsulate any number of
AMR/redundancy frames into one RTP packet. If both parameters
'redundancy' and 'maxframes' are signaled in the capability
description, then 'maxframes' counts each pair of one AMR frame and
one redundancy frame as one unit, e.g. if maxframes = 4 and redundancy
is present, the RTP payload covers up to four (AMR, redundancy) frame
pairs. If only the parameter 'maxframes' is signaled, no redundancy
frames are used and the maximum number of AMR frames is given by the
'maxframes' parameter.

The 'redundancy' option allows redundancy frames to be transmitted in
order to help the receiver recover from lost AMR frames in difficult
transmission conditions. This is defined in section 4.3 of [10]. If
the 'redundancy' capability is signaled, an RTP payload always
contains at least one pair of AMR and redundancy frames, whereby the
redundancy frame follows the AMR frame. If the 'maxframes' parameter
is also signaled, the RTP payload covers up to maxframes such pairs.

4.2. Storage Mode

This mode targets applications that use stored AMR speech data, e.g.
AMR speech data attached to an e-mail or an AMR speech file download.
The AMR frames of this mode SHALL be coded according to the following
rules, which are based on [10]:

   - the AMR speech frames are coded as defined in sections 4.2.1 and
     4.2.2 of [10].
   - each AMR frame header SHALL contain the L-bit field.

In this mode the AMR speech data is coded without knowledge of the
receiver capabilities. Therefore the AMR speech coder SHALL assume
that only the following AMR modes are supported by the receiver:

   - all eight speech coding modes (index 0-7 of Table 1 of [10])
   - AMR DTX/CN mode [7] (index 8 of Table 1 of [10]); all other
     DTX/CN schemes are not mandatory.
   - no-transmission mode (index 15 of Table 1 of [10])

This implies that each decoder that uses this Storage Mode SHALL
support at least these three items.

During speech pauses, AMR frames containing the no-transmission mode
indication SHALL be generated.

In Storage Mode, the Payload Block Sorting Algorithm SHALL NOT be
applied.

5. MIME Registration

The MIME name for the AMR codec is allocated from the IETF tree, since
AMR is expected to be a widely used speech codec in wireless and
wireline Internet messaging and email applications. This MIME type
shall be used for stored AMR data only, e.g. an attachment in an email
message. It may also be used by the 3G Multimedia Messaging Service
(MMS) [8].

Media Type name: audio

Media subtype name: AMR

Required parameters: none

Optional parameters:

   Real-time Transport over RTP:

   ptime:       Definition as usual in RTP audio.

   mode:        AMR mode. Possible values are: 0,...,7 (see Table 1a
                [9]).

   mode-change-period:
                Defines a number N which restricts mode changes such
                that they are only allowed on multiples of N; the
                initial state of the phase is arbitrary. If this
                parameter is not present, a mode change can happen at
                any time.

   mode-change-neighbor:
                If present, mode changes SHALL be made to neighboring
                modes only. If not present, a change between any two
                modes is allowed.
   amr-cn:      If present, GSM AMR DTX/CN is supported. Note that if
                no CN capabilities are reported, AMR DTX/CN is assumed
                to be supported, i.e. this parameter is only sent
                together with one of the following CN parameters.

   pdc-efr-cn:  If present, PDC-EFR DTX/CN is supported, otherwise not
                supported.

   is-641-cn:   If present, IS-641 DTX/CN is supported, otherwise not
                supported.

   gsm-efr-cn:  If present, GSM EFR DTX/CN is supported, otherwise not
                supported.

   maxframes:   Maximum number of AMR frames in one RTP payload packet.
                The receiver may set this parameter in order to limit
                the buffering requirements or delay.

   redundancy:  If present, transmission of redundancy frames is
                supported, otherwise not supported.

   Storage Mode: none

Encoding considerations: Please refer to section 4.

Security considerations: none

Interoperability considerations: none

Public specification: Please refer to section 6, "References".

Additional information:

   Magic number: none
   File extensions: amr, AMR
   Macintosh file type code: none
   Object identifier or OID: none

Personal information:

   Bernhard Wimmer, Siemens AG
   Bernard Hammer, Siemens AG
   E-Mail: bernhard.wimmer@mch.siemens.de,
           bernard.hammer@icn.siemens.de

Intended usage: COMMON. It is expected that many Internet applications
(as well as mobile applications) will use this type.

Author/Change controller: bernhard.wimmer@mch.siemens.de

6. References

[1]  Bradner, S., "The Internet Standards Process -- Revision 3", BCP
     9, RFC 2026, October 1996.

[2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
     Levels", BCP 14, RFC 2119, March 1997.

[3]  GSM 06.90: Adaptive Multi-Rate (AMR) Speech Transcoding

[4]  RCR STD-27H, Personal Digital Cellular Telecommunication System
     RCR Standard

[5]  TIA/EIA-136-Rev.A, part 410 - TDMA Cellular/PCS - Radio Interface,
     Enhanced Full Rate Voice Codec (ACELP). Formerly IS-641.
     TIA published standard, 1998.

[6]  GSM 06.60: Enhanced Full Rate (EFR) Speech Transcoding

[7]  GSM 06.92: Comfort noise aspects for Adaptive Multi-Rate (AMR)
     speech traffic channels

[8]  3G TS 23.140, "Multimedia Messaging Service (MMS)", Functional
     Description

[9]  3G TS 26.101 (V.3.0.0) - Mandatory Speech Codec Speech Processing
     Functions, AMR Speech Codec Frame Structure

[10] IETF draft-fingscheid-avr-rtp-amr-00.txt, "RTP Payload Format for
     AMR"

7. Contact Person

Bernhard Wimmer
Siemens AG
Grillparzer Street 12A
81675 Muenchen
Germany
Phone: +49-89-722-23247
Email: bernhard.wimmer@mch.siemens.de

Bernard Hammer
Siemens AG
Hofmann Street 51
81000 Muenchen
Germany
Phone: -
Email: bernard.hammer@icn.siemens.de

Full Copyright Statement

Copyright (C) The Internet Society (date). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of developing
Internet standards in which case the procedures for copyrights defined
in the Internet Standards process must be followed, or as required to
translate it into languages other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO ANY WARRANTY THAT THE USE OF INFORMATION HEREIN WILL
NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY
OR FITNESS FOR A PARTICULAR PURPOSE.