Audio/Video Transport                                         M. Romaine
Internet-Draft                                               M. Hatanaka
Expires: January 6, 2005                                    J. Matsumoto
                                                                    SONY
                                                            July 8, 2004


                  RTP Payload Format for ATRAC Family
                 draft-hatanaka-avt-rtp-atrac-family-02

Status of this Memo

   By submitting this Internet-Draft, I certify that any applicable
   patent or other IPR claims of which I am aware have been disclosed,
   or will be disclosed, and any of which I become aware will be 
   disclosed, in accordance with RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 6, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2004).  All Rights Reserved.

Abstract

   This document describes an RTP payload format for efficient and
   flexible transporting of audio data encoded with the Adaptive
   TRansform Audio Codec (ATRAC) family of codecs.  Recent enhancements
   to the ATRAC family of codecs support high quality audio coding with
   multiple channels.  The RTP payload format as presented in this
   document includes support for data fragmentation and elementary
   redundancy measures.




Romaine, et al.         Expires January 6, 2005                 [Page 1]

Internet-Draft    RTP Payload Format for ATRAC Family          July 2004


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1   ATRAC Details  . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Conventions  . . . . . . . . . . . . . . . . . . . . . . . . .  4
   3.  Payload Format . . . . . . . . . . . . . . . . . . . . . . . .  5
     3.1   RTP Header . . . . . . . . . . . . . . . . . . . . . . . .  5
     3.2   Payload Header . . . . . . . . . . . . . . . . . . . . . .  5
     3.3   Payload Data . . . . . . . . . . . . . . . . . . . . . . .  6
   4.  Frame Packetization  . . . . . . . . . . . . . . . . . . . . .  8
     4.1   Example Fragmented ATRAC Frame . . . . . . . . . . . . . .  8
   5.  Payload Format Parameters  . . . . . . . . . . . . . . . . . . 10
     5.1   ATRAC3 MIME Registration . . . . . . . . . . . . . . . . . 10
     5.2   ATRAC-X MIME Registration  . . . . . . . . . . . . . . . . 11
     5.3   Channel Mapping Configuration Table  . . . . . . . . . . . 13
     5.4   Mapping MIME Parameters into SDP . . . . . . . . . . . . . 13
       5.4.1   For MIME subtype ATRAC3  . . . . . . . . . . . . . . . 14
       5.4.2   For MIME subtype ATRAC-X . . . . . . . . . . . . . . . 14
     5.5   Offer-Answer Model Considerations  . . . . . . . . . . . . 14
       5.5.1   For MIME subtype ATRAC3  . . . . . . . . . . . . . . . 14
       5.5.2   For MIME subtype ATRAC-X . . . . . . . . . . . . . . . 14
     5.6   Example SDP Session Descriptions . . . . . . . . . . . . . 15
     5.7   Example Offer-Answer Exchange  . . . . . . . . . . . . . . 15
   6.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 16
   7.  Security Considerations  . . . . . . . . . . . . . . . . . . . 17
     7.1   Confidentiality  . . . . . . . . . . . . . . . . . . . . . 17
     7.2   Authentication . . . . . . . . . . . . . . . . . . . . . . 17
     7.3   Decoding Validation  . . . . . . . . . . . . . . . . . . . 17
   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 18
     8.1   Normative References . . . . . . . . . . . . . . . . . . . 18
     8.2   Informative References . . . . . . . . . . . . . . . . . . 18
       Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 18
       Intellectual Property and Copyright Statements . . . . . . . . 20


1.  Introduction

   The ATRAC family of perceptual audio codecs is designed to address
   numerous needs for high-quality, low bit-rate audio transfer.  ATRAC
   technology can be found in many consumer and professional products
   and applications, including MD players, voice recorders, mobile
   phones, and CD players.  The need for real-time streaming of audio
   data has grown, and this document details our efforts in increasing
   the product and application space for the ATRAC family of codecs.

   Recent advances in ATRAC technology allow for multiple channels of
   audio to be encoded in customizable groupings.  This should allow for
   future expansions in scaled streaming.  However, to provide the
   greatest flexibility in streaming any member of the ATRAC family,
   this payload format does not distinguish between the codecs at the
   packet level.

   This simplified payload format contains only the basic information
   needed to disassemble a packet of ATRAC audio in order to decode it.
   Timestamps are in sample units, with audio data currently encoded
   into frames of 1024 or 2048 samples depending on the ATRAC version.
   There is also basic support for fragmentation and redundancy, as
   ATRAC frames may exceed an MTU size of 1500 octets.

   Although streaming of multi-channel audio is supported depending on
   the ATRAC version used, all encoded audio for a given time period is
   contained within a single frame.  Therefore, there is neither
   interleaving nor splitting of audio data on a per-channel basis to
   be concerned with.

1.1  ATRAC Details

   Early versions of the ATRAC codec handled only two channels of audio
   at 44.1kHz sampling frequency, with typical bit-rates between 66kbps
   and 132kbps.  The latest version allows for up to 8 channels of audio
   at 96kHz sampling frequency.  The feasible bit-rate range has also
   expanded, allowing from 8kbps to 1400kbps.

   Depending on the version of ATRAC used, the sample-frame size is
   either 1024 or 2048.  Actual bit-rates are determined by specifying a
   fixed encoded frame-size.  In other words, instead of requesting a
   stereo 44.1kHz stream at, say, 64kbps, one would tell the encoder to
   create encoded frame-sizes of 364 bytes.
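
   The relationship between encoded frame size and bit-rate described
   above can be sketched as follows.  This is an illustration only and
   not part of the payload format: the function name is hypothetical,
   and pairing a 1024-sample frame with a 192-byte frame size is an
   assumption (it reproduces the 66kbps figure quoted for that frame
   size in Section 5.1).

```python
def bitrate_bps(frame_bytes: int, sample_rate_hz: int,
                samples_per_frame: int) -> float:
    """Bit-rate implied by a fixed encoded frame size (hypothetical helper)."""
    # One encoded frame of frame_bytes octets is produced for every
    # samples_per_frame input samples, so the frame rate is
    # sample_rate_hz / samples_per_frame frames per second.
    return frame_bytes * 8 * sample_rate_hz / samples_per_frame
```

   Under these assumptions, bitrate_bps(192, 44100, 1024) evaluates to
   66150.0, that is, roughly 66kbps.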


2.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [4].


3.  Payload Format

3.1  RTP Header

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |V=2|P|X|  CC   |M|     PT      |       sequence number         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                          timestamp                            |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |            synchronization source (SSRC) identifier           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |           contributing source (CSRC) identifiers              |
     |                             .....                             |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Marker (M): 1 bit
   Set to zero as silence suppression is currently not used.

   Payload Type (PT): 7 bits
   The payload type can either be dynamically allocated at the
   application level, or an RTP profile for a class of applications is
   expected to assign the payload type for this format.  A dynamic
   allocation SHOULD designate this format as ATRAC-Family.

   Sequence number: 16 bits
   This field is as defined in RFC 3550 [1].

   Timestamp: 32 bits
   A timestamp representing the sampling time of the first sample of the
   first ATRAC frame in the RTP packet.  The clock frequency MUST be set
   to the sample rate of the encoded audio data, and is conveyed
   out-of-band.
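
   As an illustration of the fields listed above, the fixed part of the
   RTP header can be read as sketched below.  The sketch assumes the
   whole packet is available as a byte string and does not process CSRC
   entries or header extensions; the function name is hypothetical.

```python
import struct


def parse_rtp_fixed_header(packet: bytes) -> dict:
    """Decode the 12-octet fixed RTP header (hypothetical helper)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-octet fixed header")
    # Network byte order: two single octets, a 16-bit sequence number,
    # a 32-bit timestamp, and a 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": (b0 >> 5) & 0x1,
        "extension": (b0 >> 4) & 0x1,
        "csrc_count": b0 & 0x0F,
        "marker": b1 >> 7,            # zero for this payload format
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,       # in units of audio samples
        "ssrc": ssrc,
    }
```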

3.2  Payload Header

   The ATRAC family payload header is a scant two octets.  This should
   make processing very simple.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |C|FrgNo| Rsrvd |NFrames| FrOff |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
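
   The 16-bit layout above can be packed and unpacked as sketched
   below.  The sketch assumes network byte order with bit 0 as the most
   significant bit, as is conventional for diagrams of this kind; the
   class and function names are hypothetical.  Following the field
   descriptions in this section, the NFrames field is stored as the
   frame count minus one, and the reserved bits are written as zero.

```python
import struct
from typing import NamedTuple


class PayloadHeader(NamedTuple):
    continuous: int    # C bit
    fragment_no: int   # FrgNo, 3 bits
    frame_count: int   # actual number of frames (NFrames field + 1)
    frame_offset: int  # FrOff, 4 bits


def pack_payload_header(h: PayloadHeader) -> bytes:
    """Encode the two-octet payload header; reserved bits are zero."""
    if h.continuous not in (0, 1):
        raise ValueError("C must be 0 or 1")
    if not 0 <= h.fragment_no <= 7:
        raise ValueError("FrgNo must fit in 3 bits")
    if not 1 <= h.frame_count <= 16:
        raise ValueError("frame count must be between 1 and 16")
    if not 0 <= h.frame_offset <= 15:
        raise ValueError("FrOff must fit in 4 bits")
    value = ((h.continuous << 15) | (h.fragment_no << 12)
             | ((h.frame_count - 1) << 4) | h.frame_offset)
    return struct.pack("!H", value)


def unpack_payload_header(data: bytes) -> PayloadHeader:
    """Decode the first two octets of the RTP payload."""
    (value,) = struct.unpack("!H", data[:2])
    return PayloadHeader(
        continuous=value >> 15,
        fragment_no=(value >> 12) & 0x7,
        frame_count=((value >> 4) & 0xF) + 1,
        frame_offset=value & 0xF,
    )
```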

   Continuous flag (C): 1 bit
   Set to one if the fragmented frame carried in this packet continues
   in the following packet, and to zero otherwise (see Section 4).

   Fragment Number (FrgNo): 3 bits
   In the event of data fragmentation, this value is 1 for the first
   packet, and increases sequentially for the remaining fragmented data
   packets.

   Number of Frames (NFrames): 4 bits
   The number of frames in this packet.  This allows for a maximum of 16
   ATRAC-encoded audio frames per packet, with 0 indicating one frame.
   Note that only the first frame is allowed to be fragmented.
   Additionally, this field MUST NOT be anything other than 0 for
   subsequent packets containing the fragmented frame.

   Frame Offset (FrOff): 4 bits
   The purpose of frame offsets is to provide a basic mechanism for the
   transmission of redundant data.  Redundant frames are sent
   sequentially before any new frames in the same packet.  The timestamp
   also reflects the playback time of the first frame in a packet, even
   if the first frame is a redundant frame.  Frame-size lengths are
   determined during SDP negotiations (one of either 1024 or 2048
   samples), and are fixed for a given session.  A "maxRedundantFrames"
   parameter is also sent during SDP negotiations; this allows for the
   necessary buffer size to be calculated in advance.

   As an example of using Frame Offsets, refer to Figure 1, which
   considers a situation when FrOff is 2.  If a packet has 4 frames of
   audio, with each frame representing 1024 samples of audio, then we
   can calculate that playback begins with 2 frames (2048 samples) of
   redundant data, and can allocate buffer space as necessary.  (The
   only other necessary variable is sampling frequency, which MUST have
   been established during SDP negotiations).  This field SHOULD NOT be
   used in packets containing fragmented data.

    |-Fr1-|-Fr2-|-Fr3-|-Fr4-|                         Nth Packet,   TS=1
                |-Fr3-|-Fr4-|-Fr5-|-Fr6-|             N+1th Packet, TS=3
                            |-Fr5-|-Fr6-|-Fr7-|-Fr8-| N+2th Packet, TS=5

                 Figure 1: Redundant frames with FrOff = 2
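
   The receiver-side handling of the frame offset can be sketched as
   follows.  This is a minimal illustration under the assumption that
   the packet has already been split into a list of frames; the
   function names are hypothetical.

```python
def split_redundant(frames: list, frame_offset: int) -> tuple:
    """Separate the leading redundant frames from the new frames."""
    if not 0 <= frame_offset <= len(frames):
        raise ValueError("FrOff exceeds the number of frames in the packet")
    return frames[:frame_offset], frames[frame_offset:]


def first_new_timestamp(timestamp: int, frame_offset: int,
                        samples_per_frame: int) -> int:
    """RTP timestamp (in samples) of the first non-redundant frame."""
    # The packet timestamp refers to the first frame in the packet,
    # even when that frame is redundant; RTP timestamps wrap at 2**32.
    return (timestamp + frame_offset * samples_per_frame) & 0xFFFFFFFF
```

   For a packet carrying Fr3 to Fr6 with FrOff set to 2, the first
   function returns Fr3 and Fr4 as redundant and Fr5 and Fr6 as new.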


3.3  Payload Data

   ATRAC payload data begins with 2 octets that carry the byte-length
   of the encoded audio data (12 bits) followed by 4 reserved bits.
   After that, the actual audio data follows.

      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      Block Length     | Rsrvd |         ATRAC data...         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Block length: 12 bits
   The byte length of encoded audio data until the end of the current
   packet.  This is so that in the case of fragmentation, if only a
   subsequent packet is received, decoding can still occur.  12 bits
   allow for a maximum block length of 4095 bytes.  In the event a data
   block is larger than 4095 bytes but would still fit within MTU
   limits, fragmentation MUST occur.
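
   A sketch of encoding and decoding the two octets that precede the
   audio data is given below.  It assumes network byte order, with the
   block length in the upper 12 bits and the reserved bits written as
   zero; the function names are hypothetical.  Note that a 12-bit field
   can hold values from 0 through 4095.

```python
import struct

MAX_BLOCK_LENGTH = 0xFFF  # largest value representable in 12 bits


def pack_block_header(block_length: int) -> bytes:
    """Encode the block length followed by four zero reserved bits."""
    if not 0 <= block_length <= MAX_BLOCK_LENGTH:
        raise ValueError("block length does not fit in 12 bits")
    return struct.pack("!H", block_length << 4)


def unpack_block_length(data: bytes) -> int:
    """Read the block length from the first two octets of data."""
    (value,) = struct.unpack("!H", data[:2])
    return value >> 4  # drop the four reserved bits
```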


4.  Frame Packetization

   Each RTP packet contains either an integer number of ATRAC encoded
   audio frames, with a maximum of 16, or one ATRAC frame fragment.

   As many complete ATRAC frames as can fit in a single path-MTU SHOULD
   be placed in an RTP packet, with the aforementioned maximum of 16.
   However, if an ATRAC frame will not fit into an RTP packet, it MUST
   be fragmented.
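
   The aggregation rule above can be sketched as follows.  Several
   points here are assumptions rather than statements of this document:
   IPv4 and UDP headers of 20 and 8 octets, an RTP header of 12 octets
   without CSRC entries, a single 2-octet block length field per packet,
   and a fixed encoded frame size.  The function name is hypothetical.

```python
MAX_FRAMES_PER_PACKET = 16
MAX_BLOCK_LENGTH = 4095  # largest value of a 12-bit field

# IPv4 + UDP + RTP fixed header + payload header + block length field
DEFAULT_OVERHEAD = 20 + 8 + 12 + 2 + 2


def frames_per_packet(path_mtu: int, frame_length: int,
                      overhead: int = DEFAULT_OVERHEAD) -> int:
    """Number of complete frames to place in one RTP packet.

    A result of 0 means that not even one frame fits, so the frame
    has to be fragmented.
    """
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    budget = min(path_mtu - overhead, MAX_BLOCK_LENGTH)
    return max(0, min(MAX_FRAMES_PER_PACKET, budget // frame_length))
```

   With a path MTU of 1500 octets and 384-byte frames, this sketch
   yields 3 frames per packet.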

   The start of a fragmented frame gets placed in its own RTP packet,
   its Continuous bit (C) set to one, and its Fragment Number (FrgNo)
   set to one.  As the frame must be the only one in the packet, the
   Number of Frames field is zero.  Subsequent packets are to contain
   the remaining fragmented frame data, with the Fragment Number
   increasing sequentially and the Continuous bit (C) consistently set
   to one.  As subsequent packets do not contain any new frames, the
   Number of Frames field SHOULD be ignored.  The last packet of
   fragmented data MUST have the Continuous bit (C) set to zero.
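
   The sender-side procedure above can be sketched as follows.  The
   sketch produces RTP payloads only (payload header, block length
   field, and audio data) and leaves the RTP header to the caller.  It
   assumes network byte order and zeroed reserved bits; the function
   name and the max_chunk parameter are hypothetical.

```python
import struct


def fragment_frame(frame: bytes, max_chunk: int) -> list:
    """Split one oversized ATRAC frame into RTP payloads."""
    if not 1 <= max_chunk <= 4095:
        raise ValueError("chunk size must fit in the 12-bit block length")
    chunks = [frame[i:i + max_chunk] for i in range(0, len(frame), max_chunk)]
    # FrgNo is a 3-bit field that starts at 1, so at most 7 fragments.
    if not 2 <= len(chunks) <= 7:
        raise ValueError("frame needs between 2 and 7 fragments")
    payloads = []
    for number, chunk in enumerate(chunks, start=1):
        c_bit = 0 if number == len(chunks) else 1  # zero only on the last
        header = (c_bit << 15) | (number << 12)    # NFrames = 0, FrOff = 0
        payloads.append(struct.pack("!HH", header, len(chunk) << 4) + chunk)
    return payloads
```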

   In the event of fragmentation, the basic redundancy measures SHOULD
   NOT be used.

4.1  Example Fragmented ATRAC Frame

   An example of a fragmented ATRAC frame is presented below.  The
   encoded audio data frame is split over three RTP packets.  For
   brevity, the RTP packet header details have been excluded.

     Packet 1:
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |1|  1  | Rsrvd |   0   |   0   |      block length     | Rsrvd |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                          ATRAC data...                        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Packet 2:
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |1|  2  | Rsrvd |   0   |   0   |      block length     | Rsrvd |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                     ...more ATRAC data...                     |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Packet 3:
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |0|  3  | Rsrvd |   0   |   0   |      block length     | Rsrvd |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                ...the last of the ATRAC data                  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The following points highlight important characteristics of the
   example above:
   o  the transition from one to zero of the Continuous bit (C)
   o  a sequential increase in the Fragment Number
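
   A receiver could use these two characteristics to reassemble a
   fragmented frame, as sketched below.  The sketch assumes the RTP
   payloads are supplied in sequence-number order and that each one
   starts with the payload header and block length field; the function
   name is hypothetical.

```python
import struct
from typing import Optional


def reassemble(payloads: list) -> Optional[bytes]:
    """Return the reassembled frame, or None if a fragment is missing."""
    data = bytearray()
    expected = 1  # the first fragment carries Fragment Number 1
    for payload in payloads:
        header, block = struct.unpack("!HH", payload[:4])
        c_bit = header >> 15
        frag_no = (header >> 12) & 0x7
        if frag_no != expected:
            return None  # gap in the Fragment Number sequence
        length = block >> 4
        data += payload[4:4 + length]
        if c_bit == 0:
            return bytes(data)  # C went from one to zero: last fragment
        expected += 1
    return None  # the final fragment never arrived
```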


5.  Payload Format Parameters

   Certain parameters will need to be defined before ATRAC family
   encoded content can be streamed.  Other optional parameters may also
   be defined to take advantage of specific features relevant to certain
   ATRAC versions.  Parameters for ATRAC3 and ATRAC-X are defined here
   as part of the MIME subtype registration process.  A mapping of these
   parameters into the Session Description Protocol (SDP) (RFC 2327) [2]
   is also provided for applications that utilize SDP.

   The data format and parameters are specified for real-time transport
   in RTP.

5.1  ATRAC3 MIME Registration

   The MIME subtype for the Adaptive TRansform Audio Codec version 3
   (ATRAC3) is allocated from the Vendor tree since this codec is
   intended to be used with commercial products, and use of any ATRAC
   family codec requires a license from Sony Corporation, the vendor.

   Note, any unspecified parameter MUST be ignored by the receiver.

   Media Type name:  audio

   Media subtype name:  vnd.sony.atrac3

   Required parameters:

   frameLength:  Indicates the size in bytes of an encoded audio frame.
   In essence, this value determines the bit-rate of the encoded audio.
   Permissible values are 192 (66kbps), 304 (105kbps), and 384
   (132kbps).

   Optional parameters:

   maxRedundantFrames:  The maximum number of redundant frames that may
   be sent during a session in any given packet under the redundant
   framing mechanism detailed in the draft.

   maxptime: The maximum amount of media which can be encapsulated in a
   payload packet, expressed as time in milliseconds.  The time is
   calculated as the sum of the time the media present in the packet
   represents.  The time SHOULD be a multiple of the frame size.  If
   this parameter is not present, the sender MAY encapsulate a maximum
   of 16 encoded frames into one RTP packet.

   ptime:    see RFC 2327 [2]

   Encoding considerations: This type is defined for transfer via RTP
   (RFC 3550) [1].

   Security considerations: Audio data is believed to offer no security
   risks.

   Public specifications: Please refer to section 7 of this draft.

   Macintosh file type code: none
   Object identifier or OID: none

   Person & email address to contact for further information:
   Mitsuyuki Hatanaka
   hatanaka@av.crl.sony.co.jp

   Intended usage: LIMITED USE
   Only licensees of ATRAC technology may use this type.

   Author/Change controller:
   hatanaka@av.crl.sony.co.jp

5.2  ATRAC-X MIME Registration

   The MIME subtype for the Adaptive TRansform Audio Codec version X
   (ATRAC-X) is allocated from the Vendor tree since this codec is
   intended to be used with commercial products, and use of any ATRAC
   family codec requires a license from Sony Corporation, the vendor.

   Note, any unspecified parameter MUST be ignored by the receiver.

   Media Type name:  audio

   Media subtype name:  vnd.sony.atrac-x

   Required parameters:

   sampleRate:  Represents the sampling frequency in Hz of the original
   audio data.  Permissible values are 32000, 44100, 48000, 88200,
   96000.

   frameLength:  Indicates the size in bytes of an encoded audio frame.
   In essence, this value determines the bitrate of the encoded audio.
   Permissible values range from 8 to 8192.

   channelID:  Indicates the number of channels and channel layout
   according to the table in Section 5.3.  Note that this layout is
   different from that proposed in RFC 3551 [3].  However, as channelID
   = 0 defines an ambiguous channel layout, the channel mapping defined
   in Section 4.1 of [3] could be used.  Permissible values are 0, 1, 2,
   3, 4, 5, 6, 7.

   Optional parameters:

   maxRedundantFrames:  The maximum number of redundant frames that may
   be sent during a session in any given packet under the redundant
   framing mechanism detailed in the draft.  If this parameter is not
   used, a default of "16" SHOULD be assumed.

   delayMode:  Indicates a desire to use low-delay features, in which
   case the decoder will process received data accordingly based on this
   value.  Permissible values are 2 and 4.

   encryptionMode:  Indicates whether the audio frames have been
   encrypted using OpenMG ("OpenMG") or a third party method ("Other").
   If "Other", the specific mode MUST be determined at the application
   level.  Permissible values are "OpenMG" and "Other".

   maxptime: The maximum amount of media which can be encapsulated in a
   payload packet, expressed as time in milliseconds.  The time is
   calculated as the sum of the time the media present in the packet
   represents.  The time SHOULD be a multiple of the frame size.  If
   this parameter is not present, the sender MAY encapsulate a maximum
   of 16 encoded frames into one RTP packet.

   ptime:    see RFC 2327 [2]

   Encoding considerations: This type is defined for transfer via RTP
   (RFC 3550) [1].

   Security considerations:
   Audio data is believed to offer no security risks.

   Public specifications:
   Please refer to section 7 of this draft.

   Macintosh file type code: none
   Object identifier or OID: none

   Person & email address to contact for further information:
   Mitsuyuki Hatanaka
   hatanaka@av.crl.sony.co.jp

   Intended usage: LIMITED USE
   Only licensees of ATRAC technology may use this type.

   Author/Change controller:
   hatanaka@av.crl.sony.co.jp

5.3  Channel Mapping Configuration Table

               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               | channelID | Number of |  Default Speaker    |
               |           | Channels  |      Mapping        |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     0     |  max 64   |     undefined       |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     1     |     1     | front: center       |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     2     |     2     | front: left, right  |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     3     |     3     | front: left, right  |
               |           |           | front: center       |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     4     |     4     | front: left, right  |
               |           |           | front: center       |
               |           |           | rear: surround      |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     5     |    5+1    | front: left, right  |
               |           |           | front: center       |
               |           |           | rear: left, right   |
               |           |           | LFE                 |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     6     |    6+1    | front: left, right  |
               |           |           | front: center       |
               |           |           | rear: left, right   |
               |           |           | rear: center        |
               |           |           | LFE                 |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |     7     |    7+1    | front: left, right  |
               |           |           | front: center       |
               |           |           | rear: left, right   |
               |           |           | side: left, right   |
               |           |           | LFE                 |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
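
   For implementations, the table above can be held as a simple lookup
   structure, sketched below.  The speaker order inside each entry
   follows the order of the rows in the table and is not meant to
   imply any ordering within the encoded data.  Reading "5+1" as six
   channels in total is an assumption, although it agrees with the
   channel count used alongside channelID 5 in the examples of Section
   5.6.  All names are hypothetical.

```python
from typing import Optional

FRONT = ["front left", "front right", "front center"]

# channelID -> (total number of channels, default speaker mapping)
CHANNEL_MAP = {
    1: (1, ["front center"]),
    2: (2, ["front left", "front right"]),
    3: (3, FRONT),
    4: (4, FRONT + ["rear surround"]),
    5: (6, FRONT + ["rear left", "rear right", "LFE"]),
    6: (7, FRONT + ["rear left", "rear right", "rear center", "LFE"]),
    7: (8, FRONT + ["rear left", "rear right",
                    "side left", "side right", "LFE"]),
}


def channel_count(channel_id: int) -> Optional[int]:
    """Total channel count, or None for channelID 0 (layout undefined)."""
    if channel_id == 0:
        return None
    if channel_id not in CHANNEL_MAP:
        raise ValueError("channelID must be between 0 and 7")
    return CHANNEL_MAP[channel_id][0]
```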


5.4  Mapping MIME Parameters into SDP

   The information carried in the MIME media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [2], which is commonly used to describe RTP sessions.  When SDP is
   used to specify sessions employing the ATRAC family of codecs, the
   following mapping rules according to the ATRAC codec apply:

5.4.1  For MIME subtype ATRAC3
   o  The MIME type ("audio") goes in SDP "m=" as the media name
   o  The MIME subtype (payload format name) goes in SDP "a=rtpmap" as
      the encoding name.
   o  The "frameLength" parameter goes in SDP "a=fmtp".  This parameter
      MUST be present.  "maxRedundantFrames" may follow, but if no value
      is transmitted, the receiver SHOULD assume a default value of
      "16".
   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
      "a=maxptime" attributes, respectively.

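   As an illustration of how a sender might apply "a=maxptime", the
   number of frames allowed in one packet can be derived as sketched
   below.  The function name is hypothetical, and the samples-per-frame
   value has to be known from the codec version in use.

```python
def max_frames_for_maxptime(maxptime_ms: int, sample_rate_hz: int,
                            samples_per_frame: int,
                            hard_limit: int = 16) -> int:
    """Largest number of whole frames whose duration fits in maxptime."""
    # Integer arithmetic avoids rounding errors: the comparison is
    # frames * samples_per_frame / sample_rate_hz <= maxptime_ms / 1000.
    frames = (maxptime_ms * sample_rate_hz) // (1000 * samples_per_frame)
    return min(hard_limit, frames)
```

   For instance, with a maxptime of 100 ms at 44100 Hz and 1024 samples
   per frame, four frames fit in a packet.
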
5.4.2  For MIME subtype ATRAC-X
   o  The MIME type ("audio") goes in SDP "m=" as the media name
   o  The MIME subtype (payload format name) goes in SDP "a=rtpmap" as
      the encoding name.  This should be followed by the "sampleRate"
      (as the RTP clock rate), and then the total number of channels.
   o  Any remaining parameters go in the SDP "a=fmtp" attribute by
      copying them directly from the MIME media type string as a
      semicolon separated list of parameter=value pairs.  The
      "frameLength" parameter must be the first entry on this line.  It
      is recommended that the "channelID" parameter be the next entry.
      If "maxRedundantFrames" is absent, the receiver MUST assume a
      default value of "16".
   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
      "a=maxptime" attributes, respectively.

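   The ATRAC-X mapping rules can be sketched as a small helper that
   emits the corresponding SDP lines.  The function name and its
   arguments are hypothetical, and the encoding name "ATRAC-X" is taken
   from the examples in Section 5.6.

```python
def atrac_x_sdp(payload_type: int, port: int, sample_rate: int,
                channels: int, frame_length: int, channel_id: int,
                **optional) -> list:
    """Build the m=, a=rtpmap and a=fmtp lines for an ATRAC-X stream."""
    # frameLength comes first, channelID second, then any optional
    # parameters in the order they were supplied.
    params = [f"frameLength={frame_length}", f"channelID={channel_id}"]
    params += [f"{name}={value}" for name, value in optional.items()]
    return [
        f"m=audio {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} ATRAC-X/{sample_rate}/{channels}",
        f"a=fmtp:{payload_type} " + "; ".join(params),
    ]
```
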
5.5  Offer-Answer Model Considerations

   Some options for encoding and decoding ATRAC audio data will require
   either or both the sender and receiver to comply with certain
   specifications.  In order to establish an interoperable transmission
   framework, an Offer-Answer negotiation in SDP should observe the
   following considerations:

5.5.1  For MIME subtype ATRAC3
   o  Downgraded subsets of "frameLength" are possible.  However, for
      best performance, we suggest the Answerer respond with the highest
      possible values offered.

5.5.2  For MIME subtype ATRAC-X
   o  When creating an offer with considerably high requirements (such
      as 8 channels at 96kHz), it is RECOMMENDED that the offerer also
      propose a configuration with lower requirements, such as a stereo
      only option.  Although multiple alternative configurations may be
      offered, care should be taken to not offer too many payload types.
   o  Downgraded subsets of "sampleRate", "frameLength", and "channelID"
      are possible.  However, for best performance, we suggest the
      Answerer respond with the highest possible values offered.
   o  The "maxRedundantFrames" is a suggested minimum.  The Answerer MAY
      use a higher value, but MUST NOT use a lower value.
   o  The optional parameters "delayMode" and "encryptionMode" are
      non-negotiable.  Thus, if the Answerer cannot comply with the
      offered value, the session must be deemed inoperable.
   o  The parameters "maxptime" and "ptime" should not, in most cases,
      affect the interoperability.  However, the parameter settings can
      affect application performance.
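
   Two of the ATRAC-X rules above lend themselves to a mechanical
   check, sketched below.  The sketch assumes that the offer and answer
   parameters have already been parsed into dictionaries, that an
   absent "maxRedundantFrames" defaults to 16, and that an answer which
   omits an offered non-negotiable parameter counts as a mismatch.  The
   function name is hypothetical.

```python
NON_NEGOTIABLE = ("delayMode", "encryptionMode")
DEFAULT_MAX_REDUNDANT_FRAMES = 16


def answer_is_acceptable(offer: dict, answer: dict) -> bool:
    """Check an ATRAC-X answer against the offer it responds to."""
    # delayMode and encryptionMode must be echoed unchanged.
    for name in NON_NEGOTIABLE:
        if offer.get(name) != answer.get(name):
            return False
    # maxRedundantFrames in the offer is a suggested minimum.
    offered = int(offer.get("maxRedundantFrames",
                            DEFAULT_MAX_REDUNDANT_FRAMES))
    answered = int(answer.get("maxRedundantFrames",
                              DEFAULT_MAX_REDUNDANT_FRAMES))
    return answered >= offered
```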

5.6  Example SDP Session Descriptions

   Example usage of ATRAC-X with stereo at 44100Hz:

   m=audio 49120 RTP/AVP 99
   a=rtpmap:99 ATRAC-X/44100/2
   a=fmtp:99 frameLength=312; channelID=2; delayMode=2
   a=maxptime:20

   Example usage of ATRAC-X with 5.1 setup at 48000Hz:

   m=audio 49120 RTP/AVP 99
   a=rtpmap:99 ATRAC-X/48000/6
   a=fmtp:99 frameLength=1156; channelID=5
   a=maxptime:30
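
   A receiver could read the "a=fmtp" lines shown above as sketched
   below.  This is a minimal parser for the parameter syntax used in
   these examples only; the function name is hypothetical and values
   are returned as strings.

```python
def parse_fmtp(line: str) -> tuple:
    """Split an a=fmtp line into (payload type, parameter dictionary)."""
    prefix, _, body = line.strip().partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    payload_type = int(prefix[len("a=fmtp:"):])
    params = {}
    for item in body.split(";"):
        item = item.strip()
        if item:
            name, _, value = item.partition("=")
            params[name.strip()] = value.strip()
    if "frameLength" not in params:
        raise ValueError("frameLength is a required parameter")
    return payload_type, params
```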

5.7  Example Offer-Answer Exchange

   An example Offer/Answer exchange (the offer lists payload types 98
   and 99; the answer selects 99).

   Alice's Offer:

   m=audio 49170 RTP/AVP 98 99
   a=rtpmap:98 ATRAC-X/44100/6
   a=fmtp:98 frameLength=1156; channelID=5
   a=rtpmap:99 ATRAC-X/44100/6
   a=fmtp:99 frameLength=386; channelID=5

   Bob's Answer:

   m=audio 49170 RTP/AVP 99
   a=rtpmap:99 ATRAC-X/44100/2
   a=fmtp:99 frameLength=386; channelID=2


6.  IANA Considerations

   New MIME subtypes for ATRAC3 and ATRAC-X are currently being
   registered (see Section 5).


7.  Security Considerations

   Certain security precautions may be desired to protect copyrighted
   material.  The payload format as described in this document is
   subject to the security considerations defined in RFC 3550 [1].  This
   payload format, however, does not implement any security mechanisms
   of its own.  External means, such as SRTP [7], MAY be used since the
   audio compression scheme follows an end-to-end model.

   Since the data transported is audio that is already encoded, the main
   security issues are confidentiality, integrity, and authentication of
   the actual audio.

7.1  Confidentiality

   To ensure confidentiality of ATRAC encoded audio, the audio frames
   will have to be encrypted.  Encryption of the payload header,
   however, is not as necessary, and in fact may not be preferable if
   the information could be useful to some third party application.

   Because the audio compression scheme follows an end-to-end model,
   encryption may be performed after packet encapsulation.  As
   multi-channel transmissions are contained in single encoded audio
   frames, there is no concern for encryption affecting interleaving
   data.

7.2  Authentication

   Transmitted data may be tampered with or altered through malicious
   attempts, such as man-in-the-middle attacks.  Such attacks may result
   in depacketization and/or decoding errors that could severely degrade
   audio quality.

   As this payload format does not include its own means for sender
   authentication and integrity protection, an external mechanism must
   be used.  It is RECOMMENDED, however, that the chosen mechanism
   protect more than just the audio data bits.  For example, to protect
   against a man-in-the-middle attack, the payload header and RTP header
   SHOULD be protected.

7.3  Decoding Validation

   Verification of the received encoded audio packets should be
   performed so as to ensure a minimal level of audio quality.  As a
   minimal check, if the receiver calculates a packet size differing
   from the payload length based on data in the payload header fields,
   the receiver SHOULD discard the packet.
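
   Such a check can be sketched as follows.  The sketch assumes that
   the RTP header and any RTP padding have already been removed, and
   that the payload consists of the two-octet payload header, one
   two-octet block length field, and the audio data.  The function name
   is hypothetical.

```python
import struct


def payload_is_consistent(payload: bytes) -> bool:
    """Return False if the packet should be discarded."""
    if len(payload) < 4:
        return False  # too short to hold both header fields
    (block,) = struct.unpack("!H", payload[2:4])
    block_length = block >> 4  # upper 12 bits; lower 4 are reserved
    # The declared length must match the audio data actually present.
    return block_length == len(payload) - 4
```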


8.  References

8.1  Normative References

   [1]  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", RFC
        3550, July 2003.

   [2]  Handley, M. and V. Jacobson, "SDP: Session Description
        Protocol", RFC 2327, April 1998.

   [3]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
        Conferences with Minimal Control", RFC 3551, July 2003.

   [4]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

8.2  Informative References

   [5]  Kerr, P., "RTP Payload Format for Vorbis Encoded Audio", October
        2003.

   [6]  Sjoberg, J., "Real-Time Transport Protocol (RTP) Payload Format
        and File Storage Format for the Adaptive Multi-Rate (AMR) and
        Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", RFC 3267,
        June 2002.

   [7]  Baugher, M., Carrara, E., McGrew, D., Naslund, M. and K.
        Norrman, "The Secure Real Time Transport Protocol", July 2003.

   [8]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
        the Session Description Protocol (SDP)", RFC 3264, June 2002.


Authors' Addresses

   Matthew Romaine
   Sony Corporation, Japan
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo  141-0001
   Japan

   EMail: Matthew.Romaine@jp.sony.com


   Mitsuyuki Hatanaka
   Sony Corporation, Japan
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo  141-0001
   Japan

   EMail: hatanaka@av.crl.sony.co.jp


   Jun Matsumoto
   Sony Corporation, Japan
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo  141-0001
   Japan

   EMail: jun@av.crl.sony.co.jp


Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Copyright Statement

   Copyright (C) The Internet Society (2004).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.