INTERNET-DRAFT                                        Katsushi Kobayashi
draft-ietf-avt-dv-audio-00.txt         Communication Research Laboratory
                                                          Akimichi Ogawa
                                                         Keio University
                                                          Stephen Casner
                                                           Cisco Systems
                                                         Carsten Bormann
                                                 Universitaet Bremen TZI
                                                        October 22, 1999
                                                      Expires April 2000


          RTP Payload Format for 12-, 20- and 24-bit DV Audio

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups.  Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

1. Abstract

This document specifies the packetization scheme for encapsulating
12-bit nonlinear, 20-bit linear and 24-bit linear audio data streams
into a payload of the Real-time Transport Protocol (RTP).

This Internet-Draft is a revision of the draft named
"draft-kobayashi-dv-audio12-00.txt".  The title was changed because
this draft incorporates 20- and 24-bit audio modes in addition to
12-bit.

2. Introduction

This document describes the sampling of audio data in 12-bit
nonlinear, 20-bit linear and 24-bit linear form, and specifies the
encapsulation of the audio data into the Real-time Transport Protocol
(RTP), version 2 [1,2].  The audio formats are used in DAT and DV
video devices [3,4].  The packetization scheme for audio data in
16-bit linear encoding (L16) is already specified [2,5].
The packetization scheme specified in this document closely follows
the L16 format, so this document specifies only the differences from
L16.  The reader is advised to consult RFC1890 along with this
specification.

This document also specifies an out-of-band method to indicate whether
analog preemphasis has been applied to the audio data.

2.1 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [6].

3. The need for the RTP encapsulation for 12-, 20- and 24-bit audio

Many high quality digital audio and visual systems, such as DAT and
DV, adopt sample-based audio encoding.  Various audio formats are
defined to suit different situations.  To transport the audio data
with RTP, an RTP encapsulation needs to be defined for each specific
format.  So far, only the 16-bit linear audio encapsulation has been
defined, as L16.  Other encoding formats have already appeared, such
as the 12-bit nonlinear, 20-bit linear and 24-bit linear formats used
in the DAT and DV video world.  This specification defines the RTP
payload encapsulation format needed to use these encodings in the RTP
environment.

The format of 12-bit nonlinear audio defined in IEC61119 is the same
as 16-bit linear audio except for the packing of each sampled data
element [3].  An element of 12-bit nonlinear audio data can be
obtained from the corresponding 16-bit linear one.  It would be easy
to convert 12-bit nonlinear audio into 16-bit linear form at the RTP
sender and transmit it using the L16 audio format already defined.
However, 16-bit samples occupy 33% more data than 12-bit samples
(16/12 = 1.33), so this would waste network bandwidth on bits that
carry no additional information.

4. 12-bit nonlinear audio encapsulation

The 12-bit nonlinear audio format in DAT and DV, called LP (Long Play)
audio, is specified in IEC61119 [3].
Each sample of 12-bit nonlinear audio is derived from a single sample
of 16-bit linear audio.  The details of the conversion from 16 to 12
bits are shown in Table 1.

The 12-bit samples are packed contiguously into payload octets
starting with the most significant bit.  When there is an odd number
of samples in the payload, the four LSBs of the last octet are unused.
Parameters other than quantization, e.g., sampling frequency and audio
channel assignment, are the same as in L16.

When conveying encoding information in an SDP [7] session description,
the 12-bit nonlinear audio payload format specified here is given the
encoding name "DAT12".  Thus, the media format representation might
be:

    m=audio 49230 RTP/AVP 97 98
    a=rtpmap:97 DAT12/32000/2
    a=rtpmap:98 L16/48000/2

   16 bits linear (X)                       12 bits nonlinear (Y)
   --------------------------------------------------------------
    32,767 (7FFFh)   Y = INT(X/64) + (600h)          2,047 (7FFh)
    16,384 (4000h)                                   1,792 (700h)
   --------------------------------------------------------------
    16,383 (3FFFh)   Y = INT(X/32) + (500h)          1,791 (6FFh)
     8,192 (2000h)                                   1,536 (600h)
   --------------------------------------------------------------
     8,191 (1FFFh)   Y = INT(X/16) + (400h)          1,535 (5FFh)
     4,096 (1000h)                                   1,280 (500h)
   --------------------------------------------------------------
     4,095 (0FFFh)   Y = INT(X/8) + (300h)           1,279 (4FFh)
     2,048 (0800h)                                   1,024 (400h)
   --------------------------------------------------------------
     2,047 (07FFh)   Y = INT(X/4) + (200h)           1,023 (3FFh)
     1,024 (0400h)                                     768 (300h)
   --------------------------------------------------------------
     1,023 (03FFh)   Y = INT(X/2) + (100h)             767 (2FFh)
       512 (0200h)                                     512 (200h)
   --------------------------------------------------------------
       511 (01FFh)   Y = X                             511 (1FFh)
         0 (0000h)                                       0 (000h)
   --------------------------------------------------------------
        -1 (FFFFh)   Y = X                              -1 (FFFh)
      -512 (FE00h)                                    -512 (E00h)
   --------------------------------------------------------------
      -513 (FDFFh)   Y = INT((X + 1)/2) - (101h)      -513 (DFFh)
    -1,024 (FC00h)                                    -768 (D00h)
   --------------------------------------------------------------
    -1,025 (FBFFh)   Y = INT((X + 1)/4) - (201h)      -769 (CFFh)
    -2,048 (F800h)                                  -1,024 (C00h)
   --------------------------------------------------------------
    -2,049 (F7FFh)   Y = INT((X + 1)/8) - (301h)    -1,025 (BFFh)
    -4,096 (F000h)                                  -1,280 (B00h)
   --------------------------------------------------------------
    -4,097 (EFFFh)   Y = INT((X + 1)/16) - (401h)   -1,281 (AFFh)
    -8,192 (E000h)                                  -1,536 (A00h)
   --------------------------------------------------------------
    -8,193 (DFFFh)   Y = INT((X + 1)/32) - (501h)   -1,537 (9FFh)
   -16,384 (C000h)                                  -1,792 (900h)
   --------------------------------------------------------------
   -16,385 (BFFFh)   Y = INT((X + 1)/64) - (601h)   -1,793 (8FFh)
   -32,768 (8000h)                                  -2,048 (800h)
   --------------------------------------------------------------

          Table 1. Conversion from 16 bits to 12 bits [3]

5. 20- and 24-bit linear audio encapsulation

The 20- and 24-bit linear audio encodings are simply an extension of
the L16 linear audio encoding [2].  The 20- or 24-bit uncompressed
audio data samples are represented as signed values in two's
complement notation.  The samples are packed contiguously into payload
octets starting with the most significant bit.  For the 20-bit
encoding, when there is an odd number of samples in the payload, the
four LSBs of the last octet are unused.

When conveying encoding information in an SDP session description,
the 20- and 24-bit linear audio payload formats specified here are
given the encoding names "L20" and "L24", respectively.  The SDP audio
media description might be shown as:

    m=audio 49230 RTP/AVP 99 100
    a=rtpmap:99 L20/48000/2
    a=rtpmap:100 L24/48000

6. Audio data with preemphasis

In order to improve the high-frequency characteristics in audio,
analog preemphasis is often applied to the signal before quantization.
If analog preemphasis was applied before the payload data was sampled,
the time constant parameter of the preemphasis may be conveyed in SDP
with a format-specific parameter (a=fmtp line) in
microsecond/microsecond units.  For backward compatibility, if
preemphasis has not been applied, the emphasis parameter MUST NOT be
included in the SDP record.  An example SDP record showing preemphasis
applied only to payload type 99 might be as follows:

    m=audio 49230 RTP/AVP 99 100
    a=rtpmap:99 L20/48000/2
    a=fmtp:99 emphasis:50/15
    a=rtpmap:100 L24/48000

This preemphasis attribute could also be used with L16 audio.

7. MIME registration

This document defines new RTP payload names and associated MIME
types: DAT12, L20 and L24.  The registration forms for these MIME
types are shown below.

7.1 DAT12 registration form

MIME media type name: audio
MIME subtype name: DAT12
Required parameters:
    rate: number of samples per second -- Permissible values for
    rate are 8000, 11025, 16000, 22050, 24000, 32000, 44100, and
    48000 samples per second.
Optional parameters:
    channels: how many audio streams are interleaved; defaults to 1;
    stereo would be 2, etc.  Interleaving takes place between
    individual 12-bit samples.
    emphasis: the time constant value in microsecond/microsecond
    units if analog preemphasis is applied.  Defaults to none.
Encoding considerations: DAT12 audio can be transmitted with RTP as
    specified in "draft-ietf-avt-dv-audio-00".
Security considerations: None
Interoperability considerations: None
Published specification: IEC61119 Standard;
    draft-ietf-avt-dv-audio-01
Applications which use this media type: Audio communication.
Additional information: None
Magic number(s): None
File extension(s): None
Macintosh File Type Code(s): None
Person & email address to contact for further information:
    Katsushi Kobayashi
    e-mail: ikob@koganei.wide.ad.jp
Intended usage: COMMON
Author/Change controller:
    Katsushi Kobayashi
    e-mail: ikob@koganei.wide.ad.jp

7.2 L20 registration form

MIME media type name: audio
MIME subtype name: L20
Required parameters:
    rate: number of samples per second -- Permissible values for
    rate are 8000, 11025, 16000, 22050, 24000, 32000, 44100, and
    48000 samples per second.
Optional parameters:
    channels: how many audio streams are interleaved; defaults to 1;
    stereo would be 2, etc.  Interleaving takes place between
    individual 20-bit samples.
    emphasis: the time constant value in microsecond/microsecond
    units if analog preemphasis is applied.  Defaults to none.
Encoding considerations: L20 audio can be transmitted with RTP as
    specified in "draft-ietf-avt-dv-audio-00".
Security considerations: None
Interoperability considerations: None
Published specification: draft-ietf-avt-dv-audio-01
Applications which use this media type: Audio communication.
Additional information: None
Magic number(s): None
File extension(s): None
Macintosh File Type Code(s): None
Person & email address to contact for further information:
    Katsushi Kobayashi
    e-mail: ikob@koganei.wide.ad.jp
Intended usage: COMMON
Author/Change controller:
    Katsushi Kobayashi
    e-mail: ikob@koganei.wide.ad.jp

7.3 L24 registration form

MIME media type name: audio
MIME subtype name: L24
Required parameters:
    rate: number of samples per second -- Permissible values for
    rate are 8000, 11025, 16000, 22050, 24000, 32000, 44100, and
    48000 samples per second.
Optional parameters:
    channels: how many audio streams are interleaved; defaults to 1;
    stereo would be 2, etc.
    Interleaving takes place between individual 24-bit samples.
    emphasis: the time constant value in microsecond/microsecond
    units if analog preemphasis is applied.  Defaults to none.
Encoding considerations: L24 audio can be transmitted with RTP as
    specified in "draft-ietf-avt-dv-audio-00".
Security considerations: None
Interoperability considerations: None
Published specification: draft-ietf-avt-dv-audio-01
Applications which use this media type: Audio communication.
Additional information: None
Magic number(s): None
File extension(s): None
Macintosh File Type Code(s): None
Person & email address to contact for further information:
    Katsushi Kobayashi
    e-mail: ikob@koganei.wide.ad.jp
Intended usage: COMMON
Author/Change controller:
    Katsushi Kobayashi
    e-mail: ikob@koganei.wide.ad.jp

8. Security Considerations

RTP packets using the payload format defined in this specification are
subject to the security considerations discussed in the RTP
specification [1], and any appropriate RTP profile.  This implies that
confidentiality of the media streams is achieved by encryption.
Because the data compression used along with this payload format is
applied end-to-end, encryption may be performed after compression, so
there is no conflict between the two operations.

A potential denial-of-service threat exists for data encodings using
compression techniques that have non-uniform receiver-end
computational load.  The attacker can inject pathological datagrams
into the stream which are complex to decode and cause the receiver to
be overloaded.  However, this encoding does not exhibit any
significant non-uniformity.

As with any IP-based protocol, in some circumstances a receiver may be
overloaded simply by the receipt of too many packets, either desired
or undesired.
Network-layer authentication may be used to discard packets from
undesired sources, but the processing cost of the authentication
itself may be too high.  In a multicast environment, pruning of
specific sources may be implemented in future versions of IGMP [8] and
in multicast routing protocols to allow a receiver to select which
sources are allowed to reach it.

9. Full Copyright Statement

Copyright (C) The Internet Society (1999).  All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works.

However, this document itself may not be modified in any way, such as
by removing the copyright notice or references to the Internet Society
or other Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be followed,
or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN
WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

10. Authors' Addresses

Katsushi Kobayashi
Communication Research Laboratory
4-2-1 Nukii-kita machi, Koganei
Tokyo 184-8795 JAPAN
EMail: ikob@koganei.wide.ad.jp

Akimichi Ogawa
Keio University
5322 Endo, Fujisawa
Kanagawa 252 JAPAN
EMail: akimichi@sfc.wide.ad.jp

Stephen L. Casner
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134-1706 United States
EMail: casner@cisco.com

Carsten Bormann
Universitaet Bremen, FB3 TZI
Postfach 330440
D-28334 Bremen, GERMANY
Phone: +49.421.218-7024
Fax: +49.421.218-7000
EMail: cabo@tzi.org

11. Bibliography

[1] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
    "RTP: A Transport Protocol for Real-Time Applications", RFC 1889,
    January 1996.

[2] Schulzrinne, H., "RTP Profile for Audio and Video Conferences
    with Minimal Control", RFC 1890, January 1996.

[3] IEC 61119, "Digital audio tape cassette system (DAT)", November
    1992.

[4] IEC 61834, "Helical-scan digital video cassette recording system
    using 6,35 mm magnetic tape for consumer use (525-60, 625-50,
    1125-60 and 1250-50 systems)", August 1998.

[5] Salsman, J., "The Audio/L16 MIME content type", RFC 2586, May
    1999.

[6] Bradner, S., "Key words for use in RFCs to Indicate Requirement
    Levels", RFC 2119, March 1997.

[7] Handley, M. and V. Jacobson, "SDP: Session Description Protocol",
    RFC 2327, April 1998.

[8] Deering, S., "Host Extensions for IP Multicasting", STD 5,
    RFC 1112, August 1989.
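
As an informal illustration of Table 1 and of the packing rules in
Sections 4 and 5, the conversion and the octet packing might be
sketched in Python as follows.  This is a sketch under stated
assumptions, not part of the payload format definition: the function
names linear16_to_dat12 and pack_samples are hypothetical, INT() is
assumed to truncate toward zero (which reproduces the endpoint values
listed in Table 1), and the unused trailing bits are assumed to be
written as zero, since the text only says they are unused.

```python
def linear16_to_dat12(x: int) -> int:
    """Map a signed 16-bit linear sample to a signed 12-bit value (Table 1)."""
    if not -32768 <= x <= 32767:
        raise ValueError("sample is outside the 16-bit range")
    if -512 <= x <= 511:
        return x                      # the two "Y = X" rows
    # Fold negative inputs onto 512..32767; -(x + 1) mirrors the table's
    # INT((X + 1)/n) term, with INT() assumed to truncate toward zero.
    mag = x if x >= 0 else -(x + 1)
    seg = mag.bit_length() - 9        # 1..6, i.e. divisors 2..64
    y = (mag >> seg) + (seg << 8)     # INT(mag / 2**seg) + seg * 100h
    return y if x >= 0 else -y - 1


def pack_samples(samples, bits: int) -> bytes:
    """Pack signed samples of the given width contiguously, MSB first."""
    mask = (1 << bits) - 1
    acc = 0
    for s in samples:
        acc = (acc << bits) | (s & mask)   # two's complement, `bits` wide
    total = bits * len(samples)
    pad = -total % 8                       # 4 for an odd count of 12- or 20-bit samples
    # Assumption: the unused trailing bits are written as zero.
    return (acc << pad).to_bytes((total + pad) // 8, "big")
```

With these helpers, a DAT12 payload body could be built from 16-bit
samples as pack_samples([linear16_to_dat12(x) for x in pcm], 12),
where pcm is a hypothetical list of 16-bit sample values, while
pack_samples(samples, 20) and pack_samples(samples, 24) would cover
the L20 and L24 cases.  Channel interleaving and the RTP header are
left out of the sketch.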