INTERNET-DRAFT                         Laile L. Di Silvestro (Microsoft)
Expires in 6 months                           Greg Baribault (Microsoft)
                                                   Microsoft Corporation
                                                           June 20, 1999


         Waveform Audio File Format MIME Sub-type Registration


Status of this memo:

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are valid for a maximum of six months and may be
   updated, replaced, or obsoleted by other documents at any time. It
   is inappropriate to use Internet Drafts as reference material or to
   cite them other than as a "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   This draft is being discussed by the Electronic Messaging
   Association VPIM work group. To subscribe to the mailing list, send
   a message to EMA Listserv Requests [listserv@listmail.ema.org] with
   the line "subscribe VPIM-L" in the body of the message.

Di Silvestro, Baribault       Expires 12/20/99                  [Page 1]

Internet Draft                   audio/wav                        4/1/99

Abstract

   This document describes the registration of the MIME sub-type
   audio/wav for Waveform Audio File Format. This audio file format is
   based on RIFF and is defined by Microsoft in the Platform SDK.

1. Introduction

   This document describes the registration of the MIME sub-type
   audio/wav for the encapsulation of toll-quality audio in the
   Waveform Audio File Format.
   This audio file format is based on the Resource Interchange File
   Format (RIFF), and is defined by Microsoft in the Platform SDK.

   The MIME subtype "wav" is being defined primarily for use in
   multimedia and voice messaging standards. The Voice Profile for
   Internet Messaging, version 3 [VPIM3] working draft specifies that
   all VPIM version 3 compliant implementations MAY generate audio/wav
   bodyparts and MUST receive audio/wav bodyparts. The VPIM version 3
   specification further states that all compliant implementations
   MUST support receipt of wav-encapsulated 32KADPCM (g.726 ADPCM),
   BASIC (g.711 mu-law), and MS-GSM (Microsoft g.610 GSM) encoded
   audio.

   Because the Waveform Audio File Format is not well-defined and has
   not undergone a process of standardization, this document briefly
   defines the format that will be supported by VPIM version 3. For
   more detailed information, refer to the specification.

   This document does not obsolete the informational draft RFC 2361
   [WAVE], which describes audio/vnd.wav. Whereas RFC 2361 describes a
   mechanism for indicating a codec registered in the wav or avi
   vendor tree registries, this document proposes a standard for
   specifying wav-encapsulated audio content in a MIME stream.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [REQ].

2. WAV Definition

   Waveform Audio File Format is a file format for storing audio data
   in data chunks according to the Resource Interchange File Format
   (RIFF). Although the Waveform format is described in detail in
   xxxxxxx, lack of standardization and a proliferation of
   interpretations and enhancements make the format difficult to
   implement and support in an interoperable fashion. This document
   seeks to rectify the situation by defining the Waveform Audio File
   Format features that MUST be implemented and supported for
   conformance with the proposed VPIM version 3 standard.
2.1 Data Organization

   Data MUST be stored in 8-bit bytes in little-endian order.
   Multi-byte values MUST be stored with the low-order bytes first,
   and the bits left-justified:

   (lsb = least-significant bit, msb = most-significant bit)

              7  6  5  4  3  2  1  0
            +-----------------------+
    char:   | msb               lsb |
            +-----------------------+

              7  6  5  4  3  2  1  0  15 14 13 12 11 10  9  8
            +-----------------------+-----------------------+
    short:  | msb     byte 0        |       byte 1      lsb |
            +-----------------------+-----------------------+

2.2 File Format

   The Waveform Audio File Format follows the Resource Interchange
   File Format (RIFF) standard in which all data is organized into
   'chunks' and 'sub-chunks.' Each chunk MUST comprise a 4-byte chunk
   ID, a 4-byte length field specifying the size of the data, and the
   chunk data.

   To be compliant with this proposed standard, wav-formatted audio
   data MUST include the following chunks:

      RIFF header chunk:   ID = 'RIFF'
      Format chunk:        ID = 'fmt '
      Sound data chunk:    ID = 'data'
      Fact chunk:          ID = 'fact'

   The chunks MAY appear in any order except that the Format chunk
   MUST be placed before the Sound data chunk (but not necessarily
   contiguous to the Sound data chunk).

   Any additional chunks MUST be expected and MAY be ignored.

2.2.1 The RIFF Header Chunk

   The RIFF header corresponds to the outermost chunk. In an audio/wav
   file, it MUST adhere to the following format:

   OFFSET   LENGTH    VALUE    DESCRIPTION
   0        4 bytes   'RIFF'   The file format ID.
   4        4 bytes            Length of the file minus (-) 8 bytes.
   8        4 bytes   'WAVE'   The data format ID.

2.2.2 The Format Chunk

   The Format chunk specifies the characteristics of the audio data
   necessary to decompress it and play it. Each audio/wav file MUST
   include one and only one Format chunk. This chunk MUST include the
   following fields:

   OFFSET   LENGTH    VALUE    DESCRIPTION
   12       4 bytes   'fmt '   The chunk ID.
   16       4 bytes   32       Length of the chunk excluding the 8
                               bytes for the ID and length.
   20       4 bytes            The codec ID.
   24       4 bytes            The number of channels.
   28       8 bytes            Samples per second.
   36       8 bytes            Average bytes per second.
   44       4 bytes            Block alignment.
   48       4 bytes            Bits per sample.

   Codec ID: The codec ID indicates what codec was used to compress
   the audio data. Three codecs are supported by the proposed VPIM
   version 3 standard, and one of them SHOULD be specified in the
   Codec ID field. The Codec ID field MAY indicate a codec other than
   the three listed below only in situations where it is certain that
   the recipient has the corresponding capabilities.

      CODEC              ID
      g.711 mu-law       0x0007
      g.610 MS-GSM       0x0031
      g.726 32kADPCM     0x0064

   Number of Channels: To preserve network bandwidth and minimize
   memory requirements, the Format chunk SHOULD specify and the Data
   chunk SHOULD provide only one channel (mono) unless it is certain
   that the recipient supports multi-channel playback.

      CHANNELS           VALUE
      one (mono)         1

   Samples per Second: This field indicates the rate at which the
   audio is to be played (once uncompressed), expressed in sample
   frames per second. The following table specifies the samples per
   second that correspond to each VPIM version 3 codec:

      CODEC              RATE (samples per second)
      g.711 mu-law       8000
      g.610 MS-GSM       8000
      g.726 32kADPCM     8000

   Average Bytes per Second: This field specifies the number of bytes
   that play per second. It provides an indication of the buffer size
   needed to store the audio in order to avoid latency. It SHOULD be
   calculated according to the following formula:

      samples/second * block alignment
      (rounded up to nearest whole number).

      CODEC              RATE (average bytes per second)
      g.711 mu-law       8000
      g.610 MS-GSM       1625
      g.726 32kADPCM     4000

   Block Alignment: This field indicates the size of a sample frame in
   bytes.
   It SHOULD be calculated according to the following formula:

      number of channels * (bits per sample / 8)

      CODEC              SIZE
      g.711 mu-law       1
      g.610 MS-GSM       65
      g.726 32kADPCM     2   Since there are 4 bits per sample, the
                             frames will not align on one byte. It is
                             customary to add silence bits (0xF) to
                             the end of the sample to make the frame
                             end on a byte boundary.

   Bits per Sample: This field specifies the bit resolution of a
   sample point.

      CODEC              BITS (bits per sample)
      g.711 mu-law       8
      g.610 MS-GSM       0   data immediately followed by: 0x40 0x01
      g.726 32kADPCM     4

2.2.3 The Data Chunk

   The Data chunk contains the compressed audio data. This chunk MUST
   be preceded (though not immediately) by the Format chunk. The Data
   chunk MUST adhere to the following format:

   OFFSET   LENGTH    VALUE    DESCRIPTION
   52       4 bytes   'data'   The chunk ID.
   56       4 bytes            Length of the data (chunk size minus
                               (-) 8 bytes).
   60                          The compressed audio.

2.2.4 The Fact Chunk

   All audio/wav files MUST include a Fact chunk as they contain
   compressed data. The Fact chunk MUST contain one field indicating
   the size (in sample points) of the audio data after decompression.
   The Fact chunk MUST adhere to the following format:

   OFFSET   LENGTH    VALUE    DESCRIPTION
            4 bytes   'fact'   The chunk ID.
            4 bytes   8        Chunk size minus (-) 8 bytes.
            8 bytes            Sample length.

3. MIME Definition

3.1 audio/wav

   [Specification] describes a file format for the encapsulation of
   raw and compressed audio data. This Waveform Audio File Format
   (WAVE) is based on the Resource Interchange File Format
   specification developed by Microsoft and IBM in 1991. The WAVE
   format organizes audio data and the information needed to
   decompress and play it in chunks.

   The MIME sub-type audio/wav is defined to hold binary audio data
   encoded in 32 kbit/s ADPCM (g.726), mu-law (g.711), or MS-GSM
   (g.610), and encapsulated in the WAVE format. The content transfer
   encoding is typically either binary or base64.
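   As an informal illustration of the chunk organization described in
   section 2.2, the following Python sketch walks the chunks of a
   WAVE body and checks the chunk requirements stated there (the
   'fmt ', 'data', and 'fact' chunks are present, and 'fmt ' precedes
   'data'). It is not part of the proposed standard. The function
   names are hypothetical, the chunk payloads in the demonstration are
   zeroed placeholders rather than real audio, and the pad byte after
   odd-length chunks is an assumption taken from general RIFF practice
   rather than from this document.

```python
import struct

REQUIRED_CHUNKS = (b"fmt ", b"data", b"fact")


def make_chunk(chunk_id, payload):
    """Build one chunk: 4-byte ID, 4-byte little-endian length, data."""
    # Assumption (general RIFF practice, not stated in this draft):
    # an odd-length payload is followed by one pad byte.
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


def make_wave(chunks):
    """Wrap (id, payload) pairs in a RIFF header chunk of type 'WAVE'."""
    body = b"WAVE" + b"".join(make_chunk(cid, p) for cid, p in chunks)
    # The RIFF length field holds the length of the file minus 8 bytes.
    return b"RIFF" + struct.pack("<I", len(body)) + body


def walk_chunks(blob):
    """Yield (chunk_id, payload) for each chunk after the RIFF header."""
    if len(blob) < 12 or blob[0:4] != b"RIFF" or blob[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    (riff_len,) = struct.unpack_from("<I", blob, 4)
    end = min(len(blob), riff_len + 8)
    pos = 12
    while pos + 8 <= end:
        chunk_id = blob[pos:pos + 4]
        (size,) = struct.unpack_from("<I", blob, pos + 4)
        yield chunk_id, blob[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # skip the assumed pad byte


def check_required_chunks(blob):
    """Return the chunk IDs in file order, or raise ValueError."""
    ids = [cid for cid, _ in walk_chunks(blob)]
    missing = [cid for cid in REQUIRED_CHUNKS if cid not in ids]
    if missing:
        raise ValueError("missing required chunks: %r" % missing)
    if ids.index(b"fmt ") > ids.index(b"data"):
        raise ValueError("Format chunk must precede Sound data chunk")
    return ids  # additional chunks are expected and simply ignored


# Placeholder payloads only; sizes follow the tables in section 2.2.
demo = make_wave([
    (b"fmt ", bytes(32)),
    (b"fact", bytes(8)),
    (b"LIST", b"x"),          # an additional chunk a receiver may ignore
    (b"data", b"\xff\xff\xff"),
])
print(check_required_chunks(demo))
```

   The walker relies only on the generic chunk layout (ID, length,
   data), so it is unaffected by the contents of individual chunks and
   tolerates additional chunks, as section 2.2 requires of receivers.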
3.2 VPIM Usage

   The audio/wav sub-type is a component of the proposed VPIM version
   3 specification [VPIM3]. In this context, the Content-Description
   header is used to succinctly describe the contents of the audio
   body. All VPIM Version 3 systems MUST be capable of receiving audio
   encapsulated in a WAVE file format. Sending systems MAY choose to
   send raw audio data or encapsulate it in the WAVE file format. All
   audio data MUST be compressed in one of the VPIM v3 codecs and
   encapsulated according to the guidelines provided in section 2 of
   this document.

   Refer to the VPIM Specification for proper usage.

3.3 Relation to RFC 2361

   RFC 2361, "WAVE and AVI Codec Registries," is an informational
   draft describing IANA namespaces for codecs registered in
   Microsoft's WAVE and AVI registries. Such codecs may be described
   in the following format:

      audio/vnd.wave; codec = [codec ID]

   This format is not suited to the description of a wave file as
   defined in this document, as it does not indicate the format
   standard that audio/wav must adhere to for interoperability between
   messaging systems. On desktop-oriented messaging systems, audio/wav
   (rather than audio/vnd.wave) is the de facto standard.

4. IANA Registration

   To: ietf-types@iana.org
   Subject: Registration of MIME media type audio/wav

   MIME media type name: audio

   MIME subtype name: wav

   Required parameters: none

   Optional parameters: codec = [codec id]

   Encoding considerations: Binary or Base-64 generally preferred

   Security considerations: There are no known security risks with the
   sending or playing of audio data. Wav-encapsulated audio data is
   typically interpreted only by a codec supported by a wav audio
   player. Unintended information introduced into the data stream will
   result in noise.
   Interoperability considerations:

   Published specification: None

   Applications which use this media type: Multimedia and voice
   messaging applications

   Additional information:

      Magic number(s): ?
      File extension(s): .wav
      Macintosh File Type Code(s): WAVE

   Person & email address to contact for further information:

      Laile L. Di Silvestro
      lailed@microsoft.com

      Greg Baribault
      gregbari@microsoft.com

   Intended usage: COMMON

   Author/Change controller:

      Laile L. Di Silvestro
      Greg Baribault

5. Authors' Addresses

   Laile L. Di Silvestro
   Microsoft Corporation
   One Microsoft Way
   Redmond, WA 98052
   lailed@microsoft.com

   Greg Baribault
   Microsoft Corporation
   One Microsoft Way
   Redmond, WA 98052
   gregbari@microsoft.com

6. References

   [G726]  CCITT Recommendation G.726 (1990), General Aspects of
           Digital Transmission Systems, Terminal Equipment - 40, 32,
           24, 16 kbit/s Adaptive Differential Pulse Code Modulation
           (ADPCM).

   [MIME4] Freed, N., Klensin, J., and J. Postel, "Multipurpose
           Internet Mail Extensions (MIME) Part Four: Registration
           Procedures", RFC 2048, November 1996.

   [VPIM1] Vaudreuil, G., "Voice Profile for Internet Mail", RFC 1911,
           February 1996.

   [VPIM2] Vaudreuil, G., and G. Parsons, "Voice Profile for Internet
           Mail - version 2", RFC 2421, September 1998.

   [VPIM3] Vaudreuil, Greg, "Voice Profile for Internet Mail, Version
           2", Work In Progress, , December 1998.

   [REQ]   Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

   [WAVE]  Fleischman, E., "WAVE and AVI Codec Registries", RFC 2361,
           June 1998.

7. Full Copyright Statement

   Copyright (C) The Internet Society (1999). All Rights Reserved.
   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   Microsoft hereby grants to the IETF, a perpetual, nonexclusive,
   non-sublicensable, non assignable, royalty-free, world-wide right
   and license under any Microsoft copyrights in this contribution to
   copy, publish and distribute the contribution, as well as a right
   and license of the same scope to any derivative works prepared by
   the IETF and based on, or incorporating all or part of the
   contribution.

   Microsoft further agrees that, upon adoption of this contribution
   as an Internet Standard, Microsoft will grant to any party a
   royalty-free license on other reasonable and non-discriminatory
   terms under applicable Microsoft intellectual property rights to
   implement and use the technology proposed in this contribution for
   the purpose of supporting the Internet Standard.

   Microsoft expressly reserves all other rights it may have in the
   material and subject matter of this contribution.
   Microsoft expressly disclaims any and all warranties regarding this
   contribution including any warranty that (a) this contribution does
   not violate the rights of others, (b) the owners, if any, of other
   rights in this contribution have been informed of the rights and
   permissions granted to IETF herein, and (c) any required
   authorizations from such owners have been obtained.

   This document and the information contained herein is provided on
   an "AS IS" basis and MICROSOFT DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

   IN NO EVENT WILL MICROSOFT BE LIABLE TO ANY OTHER PARTY INCLUDING
   THE IETF AND ITS MEMBERS FOR THE COST OF PROCURING SUBSTITUTE GOODS
   OR SERVICES, LOST PROFITS, LOSS OF USE, LOSS OF DATA, OR ANY
   INCIDENTAL, CONSEQUENTIAL, INDIRECT, OR SPECIAL DAMAGES WHETHER
   UNDER CONTRACT, TORT, WARRANTY, OR OTHERWISE, ARISING IN ANY WAY
   OUT OF THIS OR ANY OTHER AGREEMENT RELATING TO THIS DOCUMENT,
   WHETHER OR NOT SUCH PARTY HAD ADVANCE NOTICE OF THE POSSIBILITY OF
   SUCH DAMAGES.