Internet Engineering Task Force                       Mitsuyuki Hatanaka
Internet Draft                                          Sony Corporation
Document: draft-hatanaka-avt-rtp-atracx-00.txt              October 2002
Expires: April 28, 2003


                     RTP Payload Format for ATRAC-X


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document describes an RTP payload format for the efficient and
   flexible transport of ATRAC-X encoded audio data. ATRAC-X is a
   high-quality audio coding technology that supports multiple
   channels. The RTP payload format presented in this document includes
   support for metadata, data fragmentation, and continuous decoding
   even during packet loss.

1. Introduction

   ATRAC-X is a state-of-the-art perceptual audio coding technology and
   the successor of ATRAC and ATRAC3. ATRAC technology has been used in
   MD, NetMD, and Memory Stick Audio products. Improvements over
   previous versions of ATRAC include:

   - Higher sound quality at lower bit rates
   - A wide range of bit rates, from 8 kbps to 1.4 Mbps
   - Support for multichannel coding
   - A flexible format for future extensions
   - Suitability for streaming, including scalability and fixed frame
     lengths

   The modularity and portability of ATRAC-X allow it to be used in a
   wide range of applications and platforms.
Hatanaka                                                      [Page 1]

INTERNET-DRAFT     draft-hatanaka-avt-rtp-atracx-00.txt     October 2002


1.1 Overview of ATRAC-X

   ATRAC-X can deliver audio with anywhere from one (monaural) to 7.1
   channels, at bit rates from 8 kbps to 1.4 Mbps. Sampling rates of
   32 kHz, 44.1 kHz, and 48 kHz are currently supported, and support
   for higher rates of up to 96 kHz is planned. Because ATRAC-X has
   adopted a flexible format, future extensions can include
   better-than-CD quality and increased bandwidth.

   Like other perceptual audio coding algorithms, ATRAC-X is based on
   time/frequency mappings. However, it incorporates new techniques
   that enable more precise signal analysis and spectrum quantization,
   as well as efficient frequency scaling for QoS.

1.2 Overview of ATRAC-X streaming on RTP

   The basic building block for ATRAC-X streaming on RTP is the ATRAC-X
   "segment". Each segment contains the current ATRAC-X encoded audio
   data and metadata, as well as any necessary redundant data. ATRAC-X
   segments also incorporate a fragmentation mechanism so that packets
   need not exceed the MTU. The contents of each segment are discussed
   in the following sections.

   Multiple ATRAC-X streams can be transmitted simultaneously in a
   single RTP session by sending multiple segments in each ATRAC-X
   "slot", our term for an arbitrary interval of time in which the
   received audio data resides. Each segment is assigned a StreamID
   identifying the stream to which it belongs. Figure 1 visualizes
   this concept.

      +-----0--------1--------2--------3----> StreamID
      |  +-----+  +-----+  +-----+  +-----+
    0 |  |  N  |  |  N  |  |  N  |  |  N  |  ..
      |  +-----+  +-----+  +-----+  +-----+
      |  +-----+  +-----+  +-----+  +-----+
    1 |  | N+1 |  | N+1 |  | N+1 |  | N+1 |  ..
      |  +-----+  +-----+  +-----+  +-----+
      |  +-----+  +-----+  +-----+  +-----+        +-----+
    2 |  | N+2 |  | N+2 |  | N+2 |  | N+2 |  ..    |  n  | = ATRAC-X Segment
      |  +-----+  +-----+  +-----+  +-----+        +-----+   with sequence n
      |     :        :        :        :
      V
  time ("slot")

               Figure 1: ATRAC-X RTP Streaming Concept

   As a result, ATRAC-X bit streams can be packetized arbitrarily
   (within reason) along both the time and StreamID axes, as
   illustrated in Figure 1. This allows for flexible streaming of
   ATRAC-X over RTP.

2. Payload visualizations

2.1 ATRAC-X Payload Format

   The complete structure of an ATRAC-X RTP payload is shown in
   Figure 2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|FRSEQNO      |C|FragNo |StreamID |Priority |NF(=2) |RSV|
   |RNF(=2)|RNMD(=1) |NMD(=1)  |RSV|             MDID              |
   |       MDLEN       |    RSV    |                               |
   |                                                               |
   |                         META-DATA(1)                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                     ATRAC-X Main Data(1)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                     ATRAC-X Main Data(2)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Rd MDID            |     Rd MDLEN      |    RSV    |
   |                                                               |
   |                      Redundant META-DATA                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                   Redundant ATRAC-X Data(1)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                   Redundant ATRAC-X Data(2)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 2: ATRAC-X RTP Payload Format

2.2 ATRAC-X Specific Data

   The structure specific to ATRAC-X is shown in Figure 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|FRSEQNO      |C|FragNo |StreamID |Priority |NF     |RSV|
   |RNF    |RNMD     |NMD      |RSV|         LENGTH          | RSV |
   |                                                               |
   |                    ATRAC-X 1st Frame Data                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                    ATRAC-X 2nd Frame Data                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                              ...                              |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                    ATRAC-X N-th Frame Data                    |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 3: ATRAC-X Main Data Format

3. Payload Header Description

   - Version: Version number (4 bits)

     Version denotes the ATRAC-X payload header version number.
     If the receiver supports the version in the payload header, it
     parses the transmitted packets and reconstructs the data;
     otherwise, the receiver may discard the packets. Receivers may
     support more than one version of this protocol.

   - FRSEQNO: Frame Sequence Number (7 bits)

     FRSEQNO denotes the frame sequence number, which runs from 0 to
     127 and then wraps around. Thus, the frame following sequence
     number 127 is numbered 0.

   - StreamID: Bit Stream ID (5 bits)

     StreamID identifies each individual ATRAC-X bit stream. One
     ATRAC-X RTP session can carry up to 32 ATRAC-X streams at once.

   - Priority: Priority identifier (5 bits)

     This identifier denotes the relative priority of each ATRAC-X
     segment within its ATRAC-X slot. A value of 0 represents the
     highest priority. Priority values are not absolute; they are
     relative to each other within one ATRAC-X slot (see below).
     Priority values need not be unique, so it is up to the receiver to
     decide which stream or streams to process.

    ____________   ____________   ____________   ____________
   | ATRAC-X    | | ATRAC-X    | | ATRAC-X    | | ATRAC-X    |
   | 8kbps      | | 128kbps    | | 8kbps      | | 128kbps    |
   | FRSEQNO:N  | | FRSEQNO:N  | | FRSEQNO:N+1| | FRSEQNO:N+1|
   | StreamID:0 | | StreamID:1 | | StreamID:0 | | StreamID:1 |
   | Priority:1 | | Priority:0 | | Priority:1 | | Priority:0 |
   | NF:1       | | NF:1       | | NF:1       | | NF:1       |
   |<---------->| |<---------->| |<---------->| |<---------->|
      ATRAC-X        ATRAC-X        ATRAC-X        ATRAC-X
     Segment(1)     Segment(2)     Segment(1)     Segment(2)
   |<-------------------------->|<-------------------------->|
        (N)-th ATRAC-X Slot         (N+1)-th ATRAC-X Slot

       Figure 4: Transmission of two individual ATRAC-X bit streams
                       in one ATRAC-X RTP payload

   Figure 4 shows an example of packetization for two individual
   ATRAC-X bit streams. We define the "(N)-th ATRAC-X slot" as the set
   of ATRAC-X segments that have the identical frame sequence number N.
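   As an illustration of the slot definition and the relative Priority
   field, the following Python sketch groups already-parsed segments
   into slots by FRSEQNO and then applies one possible receiver policy:
   decode the segment with the numerically lowest Priority value in each
   slot. The `Segment` record, the function names, and the tie-break on
   the lowest StreamID are hypothetical; this document leaves stream
   selection to the receiver.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical parsed view of one ATRAC-X segment header."""
    frseqno: int    # 7-bit frame sequence number, 0..127
    stream_id: int  # 5-bit StreamID, 0..31
    priority: int   # 5-bit Priority, 0 = highest
    nf: int         # number of audio frames (NF)


def group_into_slots(segments):
    """All segments sharing one FRSEQNO form one ATRAC-X slot.

    Assumes the receive window spans fewer than 128 slots, so FRSEQNO
    wraparound cannot merge two different slots.
    """
    slots = defaultdict(list)
    for seg in segments:
        slots[seg.frseqno].append(seg)
    return dict(slots)


def select_segment(slot):
    """Pick the segment with the lowest Priority value in one slot.

    Priority values are relative and need not be unique, so ties are
    broken by the lowest StreamID (an illustrative choice only).
    """
    return min(slot, key=lambda seg: (seg.priority, seg.stream_id))


def next_frseqno(frseqno):
    """Advance the 7-bit FRSEQNO, wrapping from 127 back to 0."""
    return (frseqno + 1) % 128


if __name__ == "__main__":
    # The two streams of Figure 4, for two consecutive slots (N = 5).
    segments = [
        Segment(frseqno=5, stream_id=0, priority=1, nf=1),  # 8 kbps
        Segment(frseqno=5, stream_id=1, priority=0, nf=1),  # 128 kbps
        Segment(frseqno=6, stream_id=0, priority=1, nf=1),
        Segment(frseqno=6, stream_id=1, priority=0, nf=1),
    ]
    slots = group_into_slots(segments)
    for n in sorted(slots):
        print(n, select_segment(slots[n]).stream_id)  # prints "5 1", then "6 1"
```

   With the values of Figure 4, this policy picks the 128 kbps stream
   (StreamID 1) in every slot because its Priority is 0. A receiver
   with limited resources could equally choose the 8 kbps stream; the
   payload format does not mandate either choice.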
   In the example of Figure 4, each ATRAC-X slot is composed of two
   ATRAC-X segments. One segment contains an 8 kbps bit stream and the
   other contains a 128 kbps bit stream. Note that the stream encoded
   at the higher bit rate has the higher priority, and that the
   StreamID assignment is consistent across ATRAC-X slots. The former
   need not always be true, but the latter must be.

   This example highlights the following characteristics:

   (1) Two individual ATRAC-X streams are transmitted.
   (2) The StreamID of the first bit stream (8 kbps) is "0", and the
       StreamID of the second bit stream (128 kbps) is "1".
   (3) The higher priority is assigned to the higher bit rate stream.
   (4) The number of ATRAC-X audio frames in each ATRAC-X segment is
       "1".

   - NF: Number of ATRAC-X audio Frames (4 bits)

     NF denotes the number of ATRAC-X audio frames in one ATRAC-X
     segment; up to 15 audio frames can be included in one ATRAC-X
     segment. When only metadata is transmitted, NF must be set to 0.

   - LENGTH: LENGTH of ATRAC-X data (13 bits)

     LENGTH contains the size in bytes of each ATRAC-X frame in an
     ATRAC-X segment.

   Remarks: Fundamental information and attributes associated with the
   streamed audio content, such as the sampling frequency and the word
   length of each PCM sample, are not carried in the ATRAC-X RTP
   payload. This information must be transmitted and negotiated before
   a streaming session is opened between transmitter and receiver,
   which gives receivers enough time to initialize the necessary
   hardware. However, it is still possible to change these attributes
   during streaming by using metadata.

4. Metadata

   The ATRAC-X RTP payload supports the inclusion of metadata. Metadata
   can be used to control the playback of ATRAC-X data as it is
   streamed in real time, or simply as supplemental information.
   Example uses include downmix parameters, speaker configuration
   settings, and effects such as panning and fading. The receiver may
   handle all or part of the metadata segments, each of which is
   classified by a unique ID. The following information must be
   defined in the ATRAC-X RTP payload header when metadata is present.

   - NMD: Number of Metadata Frames (5 bits)

     The number of metadata frames included in the RTP packet.

   - MDID: MetaData ID (16 bits)

     A unique ID that indicates the metadata type associated with this
     frame. There are two types of ID. The first type is globally
     pre-defined for specific metadata types, while the second type is
     session specific: it is generated and negotiated dynamically
     between transmitter and receiver prior to the streaming session.
     The two types are distinguished by the MSB of the identifier. If
     the MSB is 0, the identifier is pre-defined; otherwise, it is
     session specific. Thus, 32767 kinds of metadata are available for
     each type of identifier. Currently, all globally pre-defined
     identifiers are reserved and prohibited. The negotiation method
     between transmitter and receiver is outside the scope of this
     document.

   - MDLEN: MetaData LENgth (10 bits)

     The size in bytes of the metadata corresponding to the above
     metadata ID.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             MDID              |       MDLEN       |    RSV    |
   |                                                               |
   |                         META-DATA (N)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             MDID              |       MDLEN       |    RSV    |
   |                                                               |
   |                        META-DATA (N+1)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 5: Metadata segment

5. Redundant data for robustness

   Redundant data can be included in the ATRAC-X RTP payload in order
   to recover from errors due to packet loss.
   ATRAC-X audio frames from previous ATRAC-X slots are re-sent as
   redundant audio data. Metadata can also be re-sent as redundant
   data. Including redundant data in the payload is not mandatory. The
   following information must be defined in the ATRAC-X RTP payload
   header:

   - RNF: Number of redundant ATRAC-X audio frames (4 bits)

     The number of redundant ATRAC-X audio frames.

   - RNMD: Number of redundant ATRAC-X metadata frames (5 bits)

     The number of redundant ATRAC-X metadata frames.

    0 1 2 3 4 5 6 7 8
   +-+-+-+-+-+-+-+-+-+
   |RNF    |RNMD     |
   +-+-+-+-+-+-+-+-+-+

            Figure 6: Control bit field for redundant data

   The following four figures show hypothetical ATRAC-X packets at the
   previous and current time periods when redundant data is sent.

   Structure of the RTP payload at the previous period:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version| FRSEQNO(=M) |C|FragNo |StreamID |Priority |NF(=3) |RSV|
   |RNF(=3)|RNMD(=0) |NMD(=0)  |RSV|         LENGTH          | RSV |
   |                                                               |
   |                 ATRAC-X Frame Data (frame N)                  |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                ATRAC-X Frame Data (frame N+1)                 |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                ATRAC-X Frame Data (frame N+2)                 |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |           ATRAC-X Redundant Frame Data (frame N-3)            |
   |                       (for FRSEQNO M-1)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |           ATRAC-X Redundant Frame Data (frame N-2)            |
   |                       (for FRSEQNO M-1)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |           ATRAC-X Redundant Frame Data (frame N-1)            |
   |                       (for FRSEQNO M-1)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 7: An example with redundant data in a previous packet
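   In Figure 7, the packet with FRSEQNO M carries not only its own
   frames but also redundant copies of the frames sent with FRSEQNO
   M-1. The Python sketch below shows how a receiver might use this to
   fill in a lost packet. It assumes packets have already been
   reassembled and parsed into a hypothetical `Packet` record, that the
   redundant data always covers exactly the previous slot, and that
   metadata is handled separately. A real receiver would also need a
   jitter buffer so that it can wait for the following packet before
   giving up on a lost one.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    """Hypothetical parsed view of one reassembled ATRAC-X segment."""
    frseqno: int                                   # 7-bit FRSEQNO
    frames: list                                   # NF primary audio frames
    redundant: list = field(default_factory=list)  # RNF frames of FRSEQNO - 1


def recover_frames(received, first, count):
    """Return the audio frames of `count` consecutive slots from `first`.

    `received` maps FRSEQNO -> Packet for the packets that arrived. A
    missing slot is filled from the redundant frames carried by the next
    slot's packet; a slot that cannot be recovered yields None so that
    the decoder can apply its own concealment.
    """
    output = []
    for i in range(count):
        seq = (first + i) % 128
        pkt = received.get(seq)
        if pkt is not None:
            output.append(pkt.frames)
            continue
        following = received.get((seq + 1) % 128)
        if following is not None and following.redundant:
            output.append(following.redundant)  # recovered copy
        else:
            output.append(None)  # unrecoverable in this sketch
    return output


if __name__ == "__main__":
    # Scenario of Figure 7 with M = 10: packet M-1 (FRSEQNO 9) was lost.
    pkt_m = Packet(10, [b"N", b"N+1", b"N+2"], [b"N-3", b"N-2", b"N-1"])
    print(recover_frames({10: pkt_m}, 9, 2))
```

   Here frames N-3 to N-1 of the lost packet are taken from the
   redundant part of packet M, so playback continues without a gap at
   the cost of one slot of extra latency.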
   Structure of the RTP payload at the current period:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|FRSEQNO(=M+1)|C|FragNo |StreamID |Priority |NF(=3) |RSV|
   |RNF(=3)|RNMD(=0) |NMD(=0)  |RSV|         LENGTH          | RSV |
   |                                                               |
   |                ATRAC-X Frame Data (frame N+3)                 |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                ATRAC-X Frame Data (frame N+4)                 |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                ATRAC-X Frame Data (frame N+5)                 |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |            ATRAC-X Redundant Frame Data (frame N)             |
   |                        (for FRSEQNO M)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |           ATRAC-X Redundant Frame Data (frame N+1)            |
   |                        (for FRSEQNO M)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |           ATRAC-X Redundant Frame Data (frame N+2)            |
   |                        (for FRSEQNO M)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 8: An example with redundant data in the current packet

   Structure of the RTP payload at the previous period, with metadata:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version| FRSEQNO(=M) |C|FragNo |StreamID |Priority |NF(=3) |RSV|
   |RNF(=3)|RNMD(=1) |NMD(=1)  |RSV|             MDID              |
   |       MDLEN       |    RSV    |                               |
   |                                                               |
   |                   META-DATA (for FRSEQNO M)                   |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                 ATRAC-X Frame Data (frame N)                  |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                ATRAC-X Frame Data (frame N+1)                 |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                ATRAC-X Frame Data (frame N+2)                 |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             MDID              |       MDLEN       |    RSV    |
   |                                                               |
   |             Redundant META-DATA (for FRSEQNO M-1)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |           ATRAC-X Redundant Frame Data (frame N-3)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |           ATRAC-X Redundant Frame Data (frame N-2)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |           ATRAC-X Redundant Frame Data (frame N-1)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 9: An example with redundant data and additional metadata

   Structure of the RTP payload at the current period, with metadata:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|FRSEQNO(=M+1)|C|FragNo |StreamID |Priority |NF(=3) |RSV|
   |RNF(=3)|RNMD(=1) |NMD(=1)  |RSV|             MDID              |
   |       MDLEN       |    RSV    |                               |
   |                                                               |
   |                  META-DATA (for FRSEQNO M+1)                  |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                ATRAC-X Frame Data (frame N+3)                 |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                ATRAC-X Frame Data (frame N+4)                 |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |                ATRAC-X Frame Data (frame N+5)                 |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             MDID              |       MDLEN       |    RSV    |
   |                                                               |
   |              Redundant META-DATA (for FRSEQNO M)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |            ATRAC-X Redundant Frame Data (frame N)             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |           ATRAC-X Redundant Frame Data (frame N+1)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         LENGTH          | RSV |                               |
   |                                                               |
   |           ATRAC-X Redundant Frame Data (frame N+2)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 10: An example of redundant data with additional metadata

6. Fragmentation

   If ATRAC-X frame data, metadata, and/or redundant data are too large
   to fit into one RTP packet, an ATRAC-X segment can be fragmented
   into sub-segments that are transmitted in separate packets.

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |C|FragNo |RSV  |
   +-+-+-+-+-+-+-+-+

             Figure 11: Control bit field for fragmentation

   - C: Continuous flag (1 bit)

     A value of 1 indicates that the data in the current packet
     continues in following packets; a value of 0 indicates that the
     data is complete in the current packet.

   - FragNo: Fragmentation Number (4 bits)

     The sequence number of each packet within the fragmentation. Up to
     15 fragments are supported. Metadata can exist only in the first
     fragmented packet (FragNo = 0), to avoid conflicts in
     fragmentation.

    ____________   ____________   ____________   ____________
   | ATRAC-X    | | ATRAC-X    | | ATRAC-X    | | ATRAC-X    |
   | 8kbps      | | 64kbps     | | 240kbps    | | 240kbps    |
   |            | |            | |            | | Fragmented |
   | FRSEQNO:N  | | FRSEQNO:N  | | FRSEQNO:N  | | FRSEQNO:N  |
   | StreamID:0 | | StreamID:1 | | StreamID:2 | | StreamID:2 |
   | Priority:0 | | Priority:1 | | Priority:0 | | Priority:0 |
   | NF:1       | | NF:1       | | NF:1       | | NF:1       |
   | C:0        | | C:0        | | C:1        | | C:0        |
   | FragNo:0   | | FragNo:0   | | FragNo:0   | | FragNo:1   |
   |<---------->| |<---------->| |<------------------------->|
      ATRAC-X        ATRAC-X                ATRAC-X
     Segment(1)     Segment(2)             Segment(3)
   |<------------------------------------------------------->|
                        N-th ATRAC-X Slot

     Figure 12: An example of fragmentation of the 240 kbps ATRAC-X
                Segment(3)

7. RTP Standard Header

   This section describes the usage of the "Time Stamp" and "Marker
   Bit" fields in the standard RTP header. The timestamp field contains
   the time at which the packet is sent to the network, in units of
   milliseconds.
   The initial value of the timestamp is arbitrary, but a random value
   is preferable. The marker bit is set to 1 for the last packet in
   each ATRAC-X slot, and to 0 otherwise.

8. Glossary

   (1) ATRAC-X Audio Frame: The smallest unit of ATRAC-X data. This is
       equivalent to 2048 PCM samples (as defined in the ATRAC-X
       specification).

   (2) ATRAC-X Segment: A unit of ATRAC-X data that is sent inside an
       RTP packet. A segment consists of any combination of audio
       frames, metadata frames, redundant metadata frames, and
       redundant audio frames.

   (3) ATRAC-X Slot: A unit of time to which all audio frames of an
       ATRAC-X segment belong. For example, in Figure 4, two segments
       make up the Nth ATRAC-X slot. Even though these two segments
       come from encodings at different bit rates, the decoded audio
       samples from each segment play for the same amount of time.

9. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [1]. This implies that confidentiality of the media
   streams is achieved by encryption. Because the data compression used
   with this payload format is applied end-to-end, encryption may be
   performed on the compressed data, so there is no conflict between
   the two operations.

10. References

   [1] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
       "RTP: A Transport Protocol for Real-Time Applications", RFC 1889,
       January 1996.

11. Authors' Addresses

   Mitsuyuki Hatanaka
   Sony Corporation
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo, Japan
   EMail: hatanaka@av.crl.sony.co.jp

   Jun Matsumoto
   Sony Corporation
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo, Japan
   EMail: jun@av.crl.sony.co.jp

   Matthew Romaine
   Sony Corporation
   6-7-35 Kitashinagawa
   Shinagawa-ku
   Tokyo, Japan
   EMail: Matthew.Romaine@jp.sony.com