Internet Engineering Task Force                        Mitsuyuki Hatanaka
Internet Draft                                            Matthew Romaine
January 2004                                             Sony Corporation
Expires: June 2004


                  RTP payload format for ATRAC family


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

Copyright Notice

   Copyright (C) The Internet Society (2004).  All Rights Reserved.

Abstract

   This document describes an RTP payload format for the efficient and
   flexible transport of audio data encoded with the ATRAC family of
   codecs.  Recent enhancements to the ATRAC family of codecs support
   high-quality audio coding with multiple channels.  The RTP payload
   format presented in this document includes support for data
   fragmentation and elementary redundancy measures.


Hatanaka                    Expires June 2004                  [Page 1]

Internet Draft                                              January 2004


1 Introduction

   The ATRAC family of perceptual audio codecs is designed to address
   numerous needs for high-quality, low-bit-rate audio transfer.  ATRAC
   technology can be found in many consumer and professional products
   and applications, including MD players, voice recorders, mobile
   phones, and CD players.

   The need for real-time streaming of audio data has grown, and this
   document details our efforts to expand the product and application
   space for the ATRAC family of codecs.  Recent advances in ATRAC
   technology allow multiple channels of audio to be encoded in
   customizable groupings, which should allow for future expansion into
   scaled streaming.

   To provide the greatest flexibility in streaming any one of the ATRAC
   family member codecs, however, this payload format does not
   distinguish between the codecs at the packet level.  This simplified
   payload format contains only the basic information needed to
   disassemble a packet of ATRAC audio in order to decode it.
   Timestamps are in sample units, with audio data encoded into frames
   of 512, 1024, or 2048 samples.  There is basic support for
   fragmentation and redundancy, so ATRAC frames MAY exceed an MTU size
   of 1500 octets.

2 Payload Format

2.1 RTP Header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            contributing source (CSRC) identifiers             |
   |                             .....                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Extension (X): 1 bit
      Set to zero, as header extensions are not currently supported.

   Marker (M): 1 bit
      Set to zero, as no profile-specific use of the marker bit is
      currently defined.

   Payload Type (PT): 7 bits
      The payload type can either be dynamically allocated at the
      application level or assigned to this format by an RTP profile
      for a class of applications.  A dynamic allocation should
      designate this format as ATRAC-Family.

   Sequence number: 16 bits
      This field is as defined in [1].

   Timestamp: 32 bits
      The sampling instant of the first sample of the first ATRAC frame
      in the RTP packet.  The clock frequency MUST be set to the sample
      rate of the encoded audio data, which is conveyed out-of-band.

   V/P/CC identifiers:
      These three fields (2 bits, 1 bit, and 4 bits, respectively) are
      as defined in [1].

   SSRC/CSRC identifiers:
      These fields are each 32 bits wide, with one SSRC field and at
      most 15 CSRC fields (the limit of the 4-bit CC count), and are as
      defined in [1].

2.2 Payload Header

   The ATRAC family payload header is only two octets long, which keeps
   processing simple.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |C|FrgNo| Rsrvd |NFrames| FrOff |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Continuous flag (C): 1 bit
      Set to one if the fragmented frame continues in a subsequent
      packet.  It is set to zero in the last packet of a fragmented
      frame and in packets that carry no fragment.

   Fragment Number (FrgNo): 3 bits
      When a frame is fragmented, this value is 1 in the first packet
      and increases by one in each subsequent packet that carries the
      rest of the frame.

   Number of Frames (NFrames): 4 bits
      The number of frames in this packet.  This allows for a maximum
      of 16 ATRAC-encoded audio frames per packet, with 0 indicating
      one frame.  Only the first frame in a packet may be fragmented,
      and this field must be 0 in the subsequent packets that carry the
      rest of a fragmented frame.

   Frame Offset (FrOff): 4 bits
      Frame offsets provide a basic mechanism for transmitting redundant
      data.  The frame offset is converted to timestamp units and, with
      the current packet's timestamp as the reference, gives the
      playback time of the first frame in the packet.  This field
      SHOULD NOT be used in packets containing fragmented data.

2.3 Payload Data

   ATRAC payload data begins with 4 bits representing the frame ID.
   The following 12 bits represent the byte length of the encoded audio
   data, and the audio data itself follows.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |FrameID|     Block Length      |        ATRAC data...          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Frame ID: 4 bits
      A sequential identifier marking the start of each frame in this
      packet, starting from 0 in each new non-fragmented packet.

   Block Length: 12 bits
      The byte length of the encoded audio data up to the end of the
      current packet.  In the case of fragmentation, this allows
      decoding to proceed even if only a subsequent packet is received.
      A 12-bit field allows a maximum block length of 4095 bytes; if
      the data is larger, fragmentation must occur even when the data
      would still fit within the MTU.

3 Frame Packetization

   Each RTP packet contains either an integer number of ATRAC encoded
   audio frames, up to a maximum of 16, or one ATRAC frame fragment.  As
   many complete ATRAC frames as fit within a single path MTU should be
   placed in an RTP packet, up to the maximum of 16.  However, if an
   ATRAC frame does not fit into an RTP packet, it must be fragmented.

   The start of a fragmented frame is placed in its own RTP packet, with
   its Continuous bit (C) set to one and its Fragment Number (FrgNo) set
   to one.  As the frame must be the only one in the packet, the Number
   of Frames field is also one.  Subsequent packets contain the
   remaining fragmented frame data, with the Fragment Number increasing
   sequentially and the Continuous bit (C) set to one.  As subsequent
   packets do not contain any new frames, their Number of Frames field
   is zero.  The last packet of fragmented data must have the Continuous
   bit (C) set to zero.

   In the event of fragmentation, the basic redundancy measures SHOULD
   NOT be used.
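
   The following Python sketch illustrates the fragmentation rules above.
   It is illustrative only and is not part of this specification.  The
   function names and the max_data parameter are hypothetical.  max_data
   is the number of ATRAC data bytes allowed per packet, which the
   caller derives from the path MTU.  The sketch also assumes that the
   reserved bits are sent as zero and that the receiver sees payloads in
   RTP sequence-number order.  Header values follow Section 3: the first
   fragment carries FrgNo 1 and Number of Frames 1, and later fragments
   carry Number of Frames 0.  Frame Offset stays zero because redundancy
   is not used with fragmentation, and only the last fragment clears C.
   As in the example in Section 3.1, each fragment carries its own Frame
   ID and Block Length.

```python
import struct
from dataclasses import dataclass
from typing import List

# Hypothetical helper names: an illustrative sketch, not a reference implementation.
MAX_BLOCK_LENGTH = 0x0FFF  # largest value of the 12-bit Block Length field (4095)
MAX_FRAGMENTS = 7          # FrgNo is 3 bits wide and starts at 1


@dataclass
class PayloadHeader:
    c: int        # Continuous flag (1 bit)
    frg_no: int   # Fragment Number (3 bits)
    nframes: int  # Number of Frames field, raw 4-bit value
    fr_off: int   # Frame Offset (4 bits)


def pack_payload_header(h: PayloadHeader) -> bytes:
    # C(1) FrgNo(3) Rsrvd(4) NFrames(4) FrOff(4), MSB first; Rsrvd assumed zero.
    word = ((h.c & 0x1) << 15 | (h.frg_no & 0x7) << 12
            | (h.nframes & 0xF) << 4 | (h.fr_off & 0xF))
    return struct.pack("!H", word)


def unpack_payload_header(buf: bytes) -> PayloadHeader:
    (word,) = struct.unpack_from("!H", buf, 0)
    return PayloadHeader(c=word >> 15, frg_no=(word >> 12) & 0x7,
                         nframes=(word >> 4) & 0xF, fr_off=word & 0xF)


def pack_frame_header(frame_id: int, block_length: int) -> bytes:
    # FrameID(4) Block Length(12), MSB first.
    if not 0 <= block_length <= MAX_BLOCK_LENGTH:
        raise ValueError("block length does not fit in 12 bits")
    return struct.pack("!H", ((frame_id & 0xF) << 12) | block_length)


def fragment_frame(frame: bytes, max_data: int, frame_id: int = 0) -> List[bytes]:
    """Split one ATRAC frame into RTP payloads (RTP header not included).

    max_data (hypothetical) is the number of ATRAC data bytes allowed per
    packet, which the caller derives from the path MTU.
    """
    if not 0 < max_data <= MAX_BLOCK_LENGTH:
        raise ValueError("max_data must be between 1 and 4095")
    if len(frame) <= max_data:
        raise ValueError("frame fits in one packet; no fragmentation needed")
    chunks = [frame[i:i + max_data] for i in range(0, len(frame), max_data)]
    if len(chunks) > MAX_FRAGMENTS:
        raise ValueError("frame needs more fragments than FrgNo can number")
    payloads = []
    for n, chunk in enumerate(chunks, start=1):
        header = PayloadHeader(
            c=0 if n == len(chunks) else 1,  # C is cleared only on the last fragment
            frg_no=n,                        # 1, 2, 3, ...
            nframes=1 if n == 1 else 0,      # only the first packet starts a frame
            fr_off=0,                        # no redundancy with fragmentation
        )
        payloads.append(pack_payload_header(header)
                        + pack_frame_header(frame_id, len(chunk)) + chunk)
    return payloads


def reassemble(payloads: List[bytes]) -> bytes:
    """Rejoin fragments, assuming they arrive in RTP sequence-number order."""
    data = bytearray()
    for expected, payload in enumerate(payloads, start=1):
        h = unpack_payload_header(payload)
        if h.frg_no != expected:
            raise ValueError("fragment missing or out of order")
        (fh,) = struct.unpack_from("!H", payload, 2)
        block_length = fh & MAX_BLOCK_LENGTH  # low 12 bits; high 4 are FrameID
        data += payload[4:4 + block_length]
        if h.c == 0:  # last fragment reached
            return bytes(data)
    raise ValueError("last fragment (C=0) not received")
```

   For example, calling fragment_frame on a 2560-byte frame with
   max_data=1000 returns three payloads.  Their C bits are 1, 1, 0 and
   their Fragment Numbers are 1, 2, 3.  reassemble rejects the set if a
   fragment is missing or the last fragment never arrives.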

3.1 Example Fragmented ATRAC Frame

   An example of a fragmented ATRAC frame is presented below.  The
   encoded audio data frame is split over three RTP packets.  For
   brevity, the RTP header details have been omitted.

   Packet 1:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|  1  | Rsrvd |   1   |   0   |   1   |     block length      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        ATRAC data...                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Packet 2:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|  2  | Rsrvd |   0   |   0   |   1   |     block length      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     ...more ATRAC data...                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Packet 3:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|  3  | Rsrvd |   0   |   0   |   1   |     block length      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 ...the last of the ATRAC data                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The following points highlight important characteristics of the
   example above:

   -) the transition from one to zero of the Continuous bit (C)
   -) a sequential increase in the Fragment Number
   -) except for the first packet, a zero for the Number of Frames field

4 SDP Usage

   Certain attributes need to be established before content can be
   streamed.  For these purposes, the Session Description Protocol [3]
   should be used.  Specifics of implementation are outside the scope of
   this document.  However, the following attributes may prove useful.

      m=audio PortNumber RTP/AVP PT_ATRAC
      a=rtpmap:PT_ATRAC (X-)ATRAC/44100/2
      a=fmtp:PT_ATRAC Profile [profile] FrameSampleSize [fsSize] BitRate [bps] ChannelConfigIndex [cci]

5 Security Considerations

   Certain security precautions may be desired to protect copyrighted
   material.  The payload format described in this document is subject
   to the security considerations defined in [1], which address the
   protection and confidentiality of the streamed content through
   encryption.  Encryption may be applied to the encoded ATRAC data,
   since the compression scheme follows an end-to-end model.

6 References

   [1]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications",
        RFC 1889, January 1996.

   [2]  "RTP Payload Format for Vorbis Encoded Audio", Work in Progress,
        draft-kerr-avt-rtp-vorbis-00.txt.

   [3]  Handley, M. and V. Jacobson, "SDP: Session Description
        Protocol", RFC 2327, April 1998.

7 Full Copyright Statement

   Copyright (C) The Internet Society (2004).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

8 Authors' Addresses

   Mitsuyuki Hatanaka
   Sony Corporation
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo, 141-0001 Japan
   Email: hatanaka@av.crl.sony.co.jp

   Matthew Romaine
   Sony Corporation
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo, 141-0001 Japan
   Email: Matthew.Romaine@jp.sony.com

   Jun Matsumoto
   Sony Corporation
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo, 141-0001 Japan
   Email: jun@av.crl.sony.co.jp