INTERNET-DRAFT                                        Katsushi Kobayashi
draft-ietf-avt-dv-audio-00.txt         Communication Research Laboratory
                                                          Akimichi Ogawa
                                                         Keio University
                                                          Stephen Casner
                                                           Cisco Systems
                                                         Carsten Bormann
                                                 Universitaet Bremen TZI
                                                        October 22, 1999
                                                      Expires April 2000

          RTP Payload Format for 12-, 20- and 24-bit DV Audio

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

1. Abstract

   This document specifies the packetization scheme for encapsulating
   12-bit nonlinear, 20-bit linear and 24-bit linear audio data streams
   into a payload of the Real-time Transport Protocol (RTP).  This
   Internet-Draft is a revision of "draft-kobayashi-dv-audio12-00.txt".
   The title has changed because the draft now covers the 20- and
   24-bit audio modes in addition to the 12-bit mode.

2. Introduction

   This document describes audio data sampled as 12-bit nonlinear,
   20-bit linear and 24-bit linear values, and specifies the
   encapsulation of that audio data into the Real-time Transport Protocol


Kobayashi, et al           Expires April 2000                 [Page 1]


Internet Draft                                          October 22, 1999


   (RTP), version 2 [1,2].  The audio formats are used in DAT and DV
   video devices [3,4].  The packetization scheme for audio data in
   16-bit linear encoding (L16) is already specified [2,5].  The
   packetization scheme specified in this document closely follows
   that of L16, so this document specifies only the differences from
   L16.  The reader is advised to consult RFC1890 along with this
   specification.  This document also specifies an out-of-band method
   to indicate whether analog preemphasis has been applied to the
   audio data.

   2.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [6].

3. The need for RTP encapsulation of 12-, 20- and 24-bit audio

   Many high quality digital audio and visual systems, such as DAT and
   DV, adopt sample-based audio encoding.  Various audio formats are
   defined to suit different uses.  To transport the audio data with
   RTP, an RTP encapsulation needs to be defined for each specific
   format.  So far, only the 16-bit linear audio encapsulation has been
   defined, as L16.  Other encoding formats are already in use in the
   DAT and DV video world, namely 12-bit nonlinear, 20-bit linear and
   24-bit linear.  This specification defines the RTP payload
   encapsulation formats needed to use these encodings in the RTP
   environment.

   The format of 12-bit nonlinear audio defined in IEC61119 is the same
   as 16-bit linear audio except for the packing of each sampled data
   element [3].  An element of 12-bit nonlinear audio data can be
   obtained from the corresponding 16-bit linear one.  It would be easy
   to convert 12-bit nonlinear audio into 16-bit linear form at the RTP
   sender and transmit it using the L16 audio format already defined.
   However, 16-bit samples occupy 33% more data than 12-bit samples,
   so this would waste network bandwidth on bits that carry no
   additional information.
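
   The 33% figure can be checked with a short calculation.  The sketch
   below is illustrative only: the function name is hypothetical, and
   the 32 kHz stereo stream is simply the DAT12/32000/2 case used as
   the SDP example in Section 4.  RTP, UDP and IP header overhead is
   ignored.

```python
def payload_bit_rate(sample_rate_hz, channels, bits_per_sample):
    """Raw audio payload rate in bits per second (packet headers excluded)."""
    return sample_rate_hz * channels * bits_per_sample


# Assumed example stream: 32 kHz, two channels.
rate_12 = payload_bit_rate(32000, 2, 12)   # sent as 12-bit nonlinear
rate_16 = payload_bit_rate(32000, 2, 16)   # same audio expanded to L16

# Relative increase caused by sending 16 bits per sample instead of 12.
overhead = (rate_16 - rate_12) / rate_12

print(rate_12, rate_16, round(overhead * 100, 1))  # prints: 768000 1024000 33.3
```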

4. 12-bit nonlinear audio encapsulation

   The 12-bit nonlinear audio format in DAT and DV, called LP (Long
   Play) audio, is specified in IEC61119 [3]. Each sample of 12-bit
   nonlinear audio is derived from a single sample of 16-bit linear
   audio. The conversion detail between 16 and 12 bits is shown in Table
   1. The 12-bit samples are packed contiguously into payload octets
   starting with the most significant bit.  When there is an odd number
   of samples in the payload, the four LSBs of the last octet are
   unused.  Parameters other than quantization, e.g., sampling frequency
   and audio channel assignment, are the same as in L16.
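
   The packing rule above might be sketched as follows.  This is a
   minimal illustration, not a normative implementation: the function
   name is hypothetical, the input is assumed to be 12-bit code words
   already in channel-interleaved order, and the unused four LSBs are
   set to zero, which is an assumption since this document only says
   they are unused.

```python
def pack_12bit(samples):
    """Pack 12-bit code words (0..0xFFF) into octets, MSB first."""
    out = bytearray()
    acc = 0      # bit accumulator holding not-yet-emitted bits
    nbits = 0    # number of valid bits currently in acc
    for s in samples:
        if not 0 <= s <= 0xFFF:
            raise ValueError("sample does not fit in 12 bits")
        acc = (acc << 12) | s
        nbits += 12
        # Emit every complete octet, most significant bits first.
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1   # keep only the leftover bits
    if nbits:
        # Odd sample count: four bits remain. They fill the high nibble
        # of the last octet; the low nibble is zero (assumed padding).
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)
```

   With this sketch, two samples 0xABC and 0xDEF occupy exactly three
   octets (AB CD EF), while a single sample 0xABC occupies two octets
   (AB C0).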

   When conveying encoding information in an SDP [7] session
   description, the 12-bit nonlinear audio payload format specified here
   is given the encoding name "DAT12". Thus, the media format
   representation might be:

      m=audio 49230 RTP/AVP 97 98
      a=rtpmap:97 DAT12/32000/2
      a=rtpmap:98 L16/48000/2


    16 bits linear (X)                          12 bits nonlinear (Y)
   ------------------------------------------------------------
     32,767 (7FFFh) Y = INT(X/64) + (600h)        2,047 (7FFh)
     16,384 (4000h)                               1,792 (700h)
   ------------------------------------------------------------
     16,383 (3FFFh) Y = INT(X/32) + (500h)        1,791 (6FFh)
      8,192 (2000h)                               1,536 (600h)
   ------------------------------------------------------------
      8,191 (1FFFh) Y = INT(X/16) + (400h)        1,535 (5FFh)
      4,096 (1000h)                               1,280 (500h)
   ------------------------------------------------------------
      4,095 (0FFFh) Y = INT(X/8) + (300h)         1,279 (4FFh)
      2,048 (0800h)                               1,024 (400h)
   ------------------------------------------------------------
      2,047 (07FFh) Y = INT(X/4) + (200h)         1,023 (3FFh)
      1,024 (0400h)                                 768 (300h)
   ------------------------------------------------------------
      1,023 (03FFh) Y = INT(X/2) + (100h)           767 (2FFh)
        512 (0200h)                                 512 (200h)
   ------------------------------------------------------------
        511 (01FFh) Y = X                           511 (1FFh)
          0 (0000h)                                   0 (000h)
   ------------------------------------------------------------
         -1 (FFFFh) Y = X                            -1 (FFFh)
       -512 (FE00h)                                -512 (E00h)
   ------------------------------------------------------------
       -513 (FDFFh) Y = INT((X + 1)/2) - (101h)    -513 (DFFh)
     -1,024 (FC00h)                                -768 (D00h)
   ------------------------------------------------------------
     -1,025 (FBFFh) Y = INT((X + 1)/4) - (201h)    -769 (CFFh)
     -2,048 (F800h)                              -1,024 (C00h)
   ------------------------------------------------------------
     -2,049 (F7FFh) Y = INT((X + 1)/8) - (301h)  -1,025 (BFFh)
     -4,096 (F000h)                              -1,280 (B00h)
   ------------------------------------------------------------
     -4,097 (EFFFh) Y = INT((X + 1)/16) - (401h) -1,281 (AFFh)
     -8,192 (E000h)                              -1,536 (A00h)
   ------------------------------------------------------------
     -8,193 (DFFFh) Y = INT((X + 1)/32) - (501h) -1,537 (9FFh)
    -16,384 (C000h)                              -1,792 (900h)
   ------------------------------------------------------------
    -16,385 (BFFFh) Y = INT((X + 1)/64) - (601h) -1,793 (8FFh)
    -32,768 (8000h)                              -2,048 (800h)
   ------------------------------------------------------------
    Table 1. Conversion from 16 bits to 12 bits [3]
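
   The conversion in Table 1 might be sketched as follows.  The
   function names are hypothetical.  The sketch assumes that INT()
   truncates toward zero, which is inferred from the end points listed
   in the table rather than stated in it.

```python
# (lower bound of the positive segment, right shift); the offset added
# in Table 1 is shift * 100h.
_SEGMENTS = ((16384, 6), (8192, 5), (4096, 4), (2048, 3), (1024, 2), (512, 1))


def linear16_to_dat12(x):
    """Map a signed 16-bit linear sample X to the signed 12-bit value Y."""
    if not -32768 <= x <= 32767:
        raise ValueError("sample does not fit in 16 bits")
    if x >= 0:
        for lower, shift in _SEGMENTS:
            if x >= lower:
                # Y = INT(X / 2^shift) + shift * 100h
                return (x >> shift) + (shift << 8)
        return x  # 0..511: Y = X
    for lower, shift in _SEGMENTS:
        if x <= -lower - 1:
            # Y = INT((X + 1) / 2^shift) - (shift * 100h + 1).
            # Shift the magnitude so the division truncates toward zero.
            return -((-(x + 1)) >> shift) - ((shift << 8) + 1)
    return x  # -512..-1: Y = X


def to_12bit_code(y):
    """Two's complement 12-bit code word for a value in -2048..2047."""
    return y & 0xFFF
```

   For example, under these assumptions linear16_to_dat12(-513)
   returns -513, whose 12-bit code word is DFFh, matching the table.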

5. 20- and 24-bit linear audio encapsulation

   The 20- and 24-bit linear audio encodings are simply an extension of
   the L16 linear audio encoding [2].  The 20- or 24-bit uncompressed
   audio data samples are represented as signed values in two's
   complement notation.  The samples are packed contiguously into
   payload octets starting with the most significant bit.  For the
   20-bit encoding, when there is an odd number of samples in the
   payload, the four LSBs of the last octet are unused.  When conveying
   encoding information in an SDP session description, the 20- and
   24-bit linear audio payload formats specified here are given the
   encoding names "L20" and "L24", respectively.  The SDP audio media
   description might be:

      m=audio 49230 RTP/AVP 99 100
      a=rtpmap:99 L20/48000/2
      a=rtpmap:100 L24/48000
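
   On the receiving side, the samples described above might be
   recovered as in the following sketch.  The function name is
   hypothetical.  The sketch assumes that the payload holds only whole
   samples plus, for the 20-bit encoding, at most the four unused LSBs,
   so the sample count is derived from the payload length.

```python
def unpack_linear(payload, bits):
    """Return the signed samples of width `bits` (20 or 24) in `payload`."""
    if bits not in (20, 24):
        raise ValueError("only 20- and 24-bit samples are handled here")
    total_bits = len(payload) * 8
    count = total_bits // bits           # trailing unused bits are dropped
    value = int.from_bytes(payload, "big")
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    samples = []
    for i in range(count):
        # Samples are laid out MSB first, so sample 0 sits in the top bits.
        shift = total_bits - (i + 1) * bits
        raw = (value >> shift) & mask
        # Two's complement: a set sign bit means the value is negative.
        samples.append(raw - (1 << bits) if raw & sign_bit else raw)
    return samples
```

   For instance, the five octets 12 34 5F FF FF decode as the two
   20-bit samples 12345h and -1.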

6. Audio data with preemphasis

   In order to improve the high-frequency characteristics of audio,
   analog preemphasis is often applied to the signal before
   quantization.  If analog preemphasis was applied before the payload
   data was sampled, the time constant parameter of the preemphasis may
   be conveyed in SDP as a format-specific parameter on an a=fmtp line,
   in microsecond/microsecond units.  For backward compatibility, if
   preemphasis has not been applied, the emphasis parameter MUST NOT be
   included in the SDP record.  An example SDP record showing
   preemphasis applied only to payload type 99 might be as follows:

      m=audio 49230 RTP/AVP 99 100
      a=rtpmap:99 L20/48000/2
      a=fmtp:99 emphasis:50/15
      a=rtpmap:100 L24/48000

   This preemphasis attribute could also be used with L16 audio.
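
   A receiver might extract the emphasis parameter from an a=fmtp line
   as in the sketch below.  The function name is hypothetical, and the
   sketch assumes the exact "emphasis:50/15" form shown in the example
   above, reading the two numbers as time constants in microseconds.
   An absent parameter means that no preemphasis was applied.

```python
def parse_emphasis(fmtp_line):
    """Return (payload_type, (t1_us, t2_us)), or None if not an emphasis line."""
    prefix = "a=fmtp:"
    if not fmtp_line.startswith(prefix):
        return None
    # "99 emphasis:50/15" -> payload type, then the format parameters.
    pt_text, _, params = fmtp_line[len(prefix):].partition(" ")
    name, _, value = params.strip().partition(":")
    if name != "emphasis":
        return None
    first, _, second = value.partition("/")
    return int(pt_text), (int(first), int(second))
```

   Applied to the line "a=fmtp:99 emphasis:50/15", this sketch returns
   (99, (50, 15)); applied to an a=rtpmap line it returns None.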


7. MIME registration

   This document defines three new RTP payload names and associated
   MIME types: DAT12, L20 and L24.  The registration forms for these
   MIME types are shown below:

   7.1 DAT12 registration form

   MIME media type name: audio

     MIME subtype name: DAT12

     Required parameters:
        rate: number of samples per second -- Permissible values for
          rate are 8000, 11025, 16000, 22050, 24000, 32000, 44100, and
          48000 samples per second.

     Optional parameters:
        channels: how many audio streams are interleaved.  Defaults
          to 1; stereo would be 2, etc.  Interleaving takes place
          between individual 12-bit samples.

        emphasis: the time constant value in microsecond/microsecond
          units if analog preemphasis is applied.  Defaults to none.

     Encoding considerations: DAT12 audio can be transmitted with
        RTP as specified in "draft-ietf-avt-dv-audio-00".

     Security considerations: None

     Interoperability considerations: NONE

     Published specification: IEC61119 Standard.
                              draft-ietf-avt-dv-audio-00

     Applications which use this media type:
                              Audio communication.

     Additional information: None

       Magic number(s): None
       File extension(s): None
       Macintosh File Type Code(s): None

     Person & email address to contact for further information:
       Katsushi Kobayashi
       e-mail: ikob@koganei.wide.ad.jp

     Intended usage: COMMON

     Author/Change controller:
       Katsushi Kobayashi
       e-mail: ikob@koganei.wide.ad.jp

   7.2 L20 registration form

   MIME media type name: audio

     MIME subtype name: L20

     Required parameters:
        rate: number of samples per second -- Permissible values for
          rate are 8000, 11025, 16000, 22050, 24000, 32000, 44100, and
          48000 samples per second.

     Optional parameters:
        channels: how many audio streams are interleaved.  Defaults
          to 1; stereo would be 2, etc.  Interleaving takes place
          between individual 20-bit samples.

        emphasis: the time constant value in microsecond/microsecond
          units if analog preemphasis is applied.  Defaults to none.

     Encoding considerations: L20 audio can be transmitted with
       RTP as specified in "draft-ietf-avt-dv-audio-00".

     Security considerations: None

     Interoperability considerations: NONE

     Published specification: draft-ietf-avt-dv-audio-00

     Applications which use this media type:
                              Audio communication.

     Additional information: None

       Magic number(s): None
       File extension(s): None
       Macintosh File Type Code(s): None

     Person & email address to contact for further information:
       Katsushi Kobayashi
       e-mail: ikob@koganei.wide.ad.jp

     Intended usage: COMMON

     Author/Change controller:
       Katsushi Kobayashi
       e-mail: ikob@koganei.wide.ad.jp

   7.3 L24 registration form

   MIME media type name: audio

     MIME subtype name: L24

     Required parameters:
        rate: number of samples per second -- Permissible values for
          rate are 8000, 11025, 16000, 22050, 24000, 32000, 44100, and
          48000 samples per second.

     Optional parameters:
        channels: how many audio streams are interleaved.  Defaults
          to 1; stereo would be 2, etc.  Interleaving takes place
          between individual 24-bit samples.

        emphasis: the time constant value in microsecond/microsecond
          units if analog preemphasis is applied.  Defaults to none.

     Encoding considerations: L24 audio can be transmitted with
        RTP as specified in "draft-ietf-avt-dv-audio-00".

     Security considerations: None

     Interoperability considerations: NONE

     Published specification: draft-ietf-avt-dv-audio-00

     Applications which use this media type:
                              Audio communication.

     Additional information: None

       Magic number(s): None
       File extension(s): None
       Macintosh File Type Code(s): None

     Person & email address to contact for further information:
       Katsushi Kobayashi
       e-mail: ikob@koganei.wide.ad.jp

     Intended usage: COMMON

     Author/Change controller:
       Katsushi Kobayashi
       e-mail: ikob@koganei.wide.ad.jp

8. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [1], and any appropriate RTP profile.  This implies
   that confidentiality of the media streams is achieved by encryption.
   Because the data compression used along with this payload format is
   applied end-to-end, encryption may be performed after compression
   so there is no conflict between the two operations.

   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load.  The attacker can inject pathological datagrams
   into the stream which are complex to decode and cause the receiver to
   be overloaded.  However, this encoding does not exhibit any
   significant non-uniformity.

   As with any IP-based protocol, in some circumstances a receiver may
   be overloaded simply by the receipt of too many packets, either
   desired or undesired.  Network-layer authentication may be used to
   discard packets from undesired sources, but the processing cost of
   the authentication itself may be too high.  In a multicast
   environment, pruning of specific sources may be implemented in future
   versions of IGMP [8] and in multicast routing protocols to allow a
   receiver to select which sources are allowed to reach it.

9. Full Copyright Statement

   Copyright (C) The Internet Society (1999). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.

   However, this document itself may not be modified in any way, such as
   by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the procedures
   for copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

10. Authors' Addresses

   Katsushi Kobayashi, Communication Research Laboratory, 4-2-1 Nukii-
   kita machi, Koganei Tokyo 184-8795 JAPAN EMail:
   ikob@koganei.wide.ad.jp

   Akimichi Ogawa, Keio University, 5322 Endo, Fujisawa Kanagawa 252
   JAPAN EMail:  akimichi@sfc.wide.ad.jp

   Stephen L. Casner, Cisco Systems, Inc.,  170 West Tasman Drive San
   Jose, CA 95134-1706 United States EMail: casner@cisco.com

   Carsten Bormann, Universitaet Bremen, FB3 TZI Postfach 330440 D-28334
   Bremen, GERMANY Phone: +49.421.218-7024 Fax: +49.421.218-7000 EMail:
   cabo@tzi.org

11. Bibliography


   [1] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
       "RTP: A Transport Protocol for Real-Time Applications",
       RFC 1889, January 1996.

   [2] Schulzrinne, H., "RTP Profile for Audio and Video Conferences
       with Minimal Control", RFC 1890, January 1996.

   [3] IEC 61119, "Digital audio tape cassette system (DAT)", November
       1992.

   [4] IEC 61834, "Helical-scan digital video cassette recording system
       using 6,35 mm magnetic tape for consumer use (525-60, 625-50,
       1125-60 and 1250-50 systems)", August 1998.

   [5] Salsman, J., "The Audio/L16 MIME content type", RFC 2586, May
       1999.

   [6] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", RFC 2119, March 1997.

   [7] Handley, M. and V. Jacobson, "SDP: Session Description
       Protocol", RFC 2327, April 1998.

   [8] Deering, S., "Host Extensions for IP Multicasting", STD 5,
       RFC 1112, August 1989.

