Internet Draft                                         J. Rey/Matsushita
draft-rey-avt-3gpp-timed-text-00.txt                Y. Matsui/Matsushita
                                                       D. Ido/Matsushita
                                                    Y. Notoya/Matsushita
Expires: December 2003                                         June 2003


                RTP Payload Format for 3GPP Timed Text

Status of this document

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   This document specifies an RTP payload format for the transmission
   of 3GPP timed text. Timed text is defined as a part of the 3GP file
   format. It is currently used for downloading timed text contents
   with or without audio/video contents. In the following sections the
   problems of streaming timed text are addressed and a means for
   streaming 3GPP timed text over RTP is specified.

Table of Contents

   1. Introduction
   2. Terminology
   3. RTP Payload Format for 3GPP Timed Text
   4. Error Resilient Transport
   5. Congestion control
   6.
      MIME Type Registration
   7. SDP usage
   8. Examples of RTP packet structure
   9. IANA Considerations
   10. Security considerations
   11. References
   Authors' Addresses
   IPR Notices
   Full Copyright Statement

1. Introduction

   The purpose of this draft is to provide a means to stream 3GPP
   timed text using RTP.

   The 3GPP (3rd Generation Partnership Project) timed text format is
   specified in Annex D8.a of [1]. It is a time-lined text format
   defined in the 3GP file format specification, Annex D of [1]. The
   3GP file format itself follows the ISO (International Organization
   for Standardization) base media file format recommendation [2].

   The 3GP timed text file format was developed by 3GPP as a text
   format for 3GPP Transparent End-to-end Packet-switched Streaming
   Services (PSS) [1]. The scope of the 3GPP PSS includes downloading
   and streaming of multimedia content over 3G packet-switched
   networks. The PSS adopts multimedia codecs (such as MPEG-4 Visual,
   AMR, MPEG-4 AAC, and JPEG) and protocols like SMIL [3] for
   presentation layouts or RTP for streaming.

   The current usage of the 3GPP timed text file format is limited to
   downloading (with or without audio contents) due to the lack of an
   appropriate RTP payload format.

1.1 Features supported by 3GPP Timed Text

   Plain text is a static medium without timing information. It also
   lacks information about, for example, the colour of each word.
   With the 3GPP timed text format, on the other hand, it is possible
   to specify timing information along with a variety of text
   attributes, such as font, colour, scrolling, karaoke or hyperlinks,
   to name but a few. The timing information allows clients to display
   one piece of text after another, synchronised with audio and video,
   as in subtitles. The text attributes enable the receiver to, for
   example, render text in the colours and fonts specified by the
   creator. The hyperlink attribute in the 3GPP timed text allows users
   to jump to a URL with related content when the text is selected.

1.2 Basics of the 3GP File Format

   Each 3GP file consists of "Boxes". Each box starts with a header
   that indicates both the size and the type of the box. A 3GP file
   contains the File Type Box (ftyp), the Movie Box (moov), and the
   Media Data Box (mdat). The Movie Box and the Media Data Box, serving
   as containers, include their own boxes for each medium. Similarly,
   each box type may include a number of boxes; see the ISO base media
   file format [2] for a complete list of possibilities. In the
   following, only those boxes that are relevant to this payload format
   are mentioned.

   The File Type Box identifies the type and properties of a 3GP file.
   Its contents comprise the major brand, the minor version and the
   compatible brands. These are communicated via out-of-band means,
   such as SDP, when streamed with RTP. For the 3GPP timed text file
   format, the set of compatible brands MUST include "3gp5".

   The Movie Box contains one or more Track Boxes (trak), which include
   information about each track. A Track Box contains the Track Header
   Box (tkhd) and the Media Information Box (minf).
   The latter includes the Sample Table Box (stbl), which itself
   contains the Sample Description Box (stsd), the Decoding Time to
   Sample Box (stts), the Sample Size Box (stsz) and the Sample to
   Chunk Box (stsc).

   The Track Header Box specifies the characteristics of a single
   track, where a track is, in this case, the text streamed during a
   session. Exactly one Track Header Box is needed for a track. It
   contains information about the track, such as the spatial layout
   (width and height), the video transformation matrix and the layer
   number. Since these pieces of information are essential and static,
   i.e. constant for the duration of the session, they MUST be sent
   prior to the transmission of any text samples. See the ISO base
   media file format [2] for details about the definition of the
   conveyed information.

   When using scene description in SMIL [3], it is possible to specify
   the layer and the position of the text track. In this case, the
   transmission of the Track Header Box (tkhd) is OPTIONAL. Otherwise,
   the Track Header Box MUST be sent prior to the start of the text
   streaming.

   The Sample Table Box (stbl) contains all the time and data indexing
   of the media samples in a track. Using the tables here, it is
   possible to locate samples in time, determine their type, and
   determine their size, container, and offset into that container.

   This payload format uses information from the following boxes of
   the Sample Table Box (stbl): the Sample Description Box (stsd), the
   Decoding Time to Sample Box (stts), the Sample Size Box (stsz) and
   the Sample to Chunk Box (stsc). The following mappings are carried
   in each RTP packet: the Decoding Time to Sample Box (stts) is mapped
   to the field SDUR (Text Sample Duration), the Sample Size Box (stsz)
   is mapped to the field SLEN (Text Sample Length) and the Sample to
   Chunk Box (stsc) is mapped to the field SIDX (Text Sample Entry
   Index).
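
   As an illustration only, the mapping above can be sketched in
   Python. This is a non-normative sketch under stated assumptions: the
   names TextSampleInfo, expand_stts and per_sample_fields are
   hypothetical, the sample description index of each sample is assumed
   to have been resolved from the Sample to Chunk Box beforehand, and
   the media timescale is assumed to equal the RTP timestamp clock
   rate. The only file format detail relied upon is that the Decoding
   Time to Sample Box stores run-length (sample_count, sample_delta)
   entries.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextSampleInfo:
    """Per-sample values that would be carried in the text header."""
    slen: int  # sample size in bytes, taken from stsz
    sidx: int  # sample description index, resolved from stsc (assumed given)
    sdur: int  # sample duration, taken from stts (timescale == RTP clock assumed)


def expand_stts(entries: List[Tuple[int, int]]) -> List[int]:
    """Expand run-length (sample_count, sample_delta) pairs into one
    duration per sample."""
    durations: List[int] = []
    for sample_count, sample_delta in entries:
        durations.extend([sample_delta] * sample_count)
    return durations


def per_sample_fields(
    stts_entries: List[Tuple[int, int]],
    sample_sizes: List[int],
    description_indices: List[int],
) -> List[TextSampleInfo]:
    """Combine the three tables into one (SLEN, SIDX, SDUR) record per sample."""
    durations = expand_stts(stts_entries)
    if not (len(durations) == len(sample_sizes) == len(description_indices)):
        raise ValueError("sample tables disagree on the number of samples")
    return [
        TextSampleInfo(slen=size, sidx=index, sdur=duration)
        for size, index, duration in zip(sample_sizes, description_indices, durations)
    ]
```

   For example, per_sample_fields([(2, 3000), (1, 1500)], [40, 52, 17],
   [1, 1, 2]) yields three records, the first two sharing a duration of
   3000 and sample description 1.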
   The Sample to Chunk Box (stsc) associates each text sample with its
   corresponding sample description entry in the Sample Description Box
   (stsd, see below). Since the sample description may vary during the
   session, the association SIDX must be sent together with the text
   samples using this payload format.

   The Sample Description Box (stsd) provides information on the basic
   characteristics of text samples, for example the font size or the
   background colour. Since these pieces of information are commonly
   shared by many text samples during the session, they are sent by
   out-of-band means. A complete list of text characteristics can be
   found in [1].

   Finally, the Media Data Box contains the media data itself. In 3GPP
   timed text tracks this box contains text samples; the audio and
   video equivalents are audio and video frames, respectively. A text
   sample consists of the text length, the text string, and one or
   several Modifier Boxes. The text length is the size of the text in
   bytes. The text string is the plain text to render. A Modifier Box
   carries additional rendering information for the text, such as
   colour, font, etc.

   In general, text samples do not exceed the maximum transmission unit
   (MTU) of a particular network, but in some cases, as explained later
   in this document, text samples may become large and might need to be
   fragmented. This document defines a method to convey both fragmented
   and non-fragmented text samples in an error-resilient way.

1.3 Requirements

   In this section a set of requirements is listed, together with a
   justification for each of them. An RTP payload format for 3GPP timed
   text SHALL:

   1. Keep the 3GP text sample structure. The text sample consists of
      the text length, the text string, and one or several Modifier
      Boxes, as defined in [1]. This is important to foster
      interoperability of 3GP and RTP transmission formats.

   2.
      Transmit the text sample size, sample duration and sample
      description index in-band. In the 3GP format this information is
      included in the header part. In RTP it is important to transmit
      it in-band because this information might change from sample to
      sample.

   3. Transmit the information contained in the Sample Description Box
      (stsd) out-of-band. The reason is that a sample description is
      usually referenced by many different text samples. To save
      overhead it is sensible to transmit these pieces of information
      once at the initialisation phase and update them upon demand, if
      needed.

   4. Enable the multiplexing of text samples in one single RTP packet.
      In a mobile communication environment a typical text sample size
      is around 100 bytes. Thus, multiplexing several text samples
      makes the transport over RTP more efficient.

   5. Enable the fragmentation of a text sample into several RTP
      packets in order to cover a wide range of applications and
      network environments.

   6. Enable the use of resilient transport mechanisms, such as
      repetition, piggy-backing, retransmissions and FEC, to list a
      few.

   7. Provide a means to minimize the payload header overhead. In 3GP
      files the text sample size, text sample duration and text sample
      description index are coded as 32-bit fields. Typically, timed
      text applications do not use the full 32 bits, so the payload
      format SHALL enable the configuration of the field lengths to
      reduce overhead.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [4].

3.
RTP Payload Format for 3GPP Timed Text

   The format of an RTP packet containing a 3GPP timed text packet is
   shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                          RTP payload                          +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Marker bit (M): the marker bit must be set to 1 if the RTP packet
   includes the last part of a text sample; otherwise it is set to 0.

   Timestamp: the timestamp indicates the sampling instant of the timed
   text sample contained in the RTP packet. The initial value is
   randomly determined. If the RTP packet includes more than one text
   sample, the timestamp indicates the sampling instant of the first
   text sample in the RTP packet. The default value of the timestamp
   clock rate is 1000 Hz. Other values may be specified by out-of-band
   means.

   Payload Type (PT): the payload type is set dynamically and sent by
   out-of-band means.

   The usage of the remaining RTP header fields follows the rules of
   RTP [5] and the profile in use.

   This payload format defines two payload headers: the text header,
   THDR, and the fragment header, FHDR. The use of these payload
   headers depends on the contents of the payload.

   This payload format is used to convey both fragmented and
   non-fragmented text samples. When an RTP packet contains one or more
   (non-fragmented) text samples, only THDR is used for each text
   sample. When an RTP packet contains one text sample fragment, FHDR
   is always present and precedes the THDR, if present. This reduces
   the overhead.
   Note that only one text sample fragment is allowed when an RTP
   packet includes the fragment header, FHDR.

   The RTP sender implementing this payload format sends fragmented and
   non-fragmented text samples using two different, dynamically mapped
   payload types, i.e. payload type multiplexing. For this purpose, a
   new SDP parameter, "fragment", is specified in this document, see
   Section 7. The receiver recognises a fragmented text sample by the
   payload type value. Note that this does not conflict with Section
   5.2 of RTP [5] because it is the same media that is being
   transmitted.

   The following drawings illustrate the different RTP payload
   compositions.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                            THDR #1                            +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                        text sample #1                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                            THDR #2                            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                        text sample #2                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                              ...                              |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 1.1 RTP payload structure when it contains one or more text
   samples.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        FHDR (variable)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        THDR (variable)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                     text sample fragment                      +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 1.2 RTP payload structure containing a text sample fragment.

3.1 Text Header

   The Text Header, THDR, is used to convey both (whole) text samples
   and text sample fragments.
   It gives basic characteristics of each text sample. The THDR
   consists of three fields, SLEN, SIDX and SDUR, in this order:

   - SLEN (8, 16, 24 or 32 bits) "Text Sample Length": indicates the
     size of the text sample in bytes, which corresponds to the entry
     value in the "stsz" for that sample.

   - SIDX (8, 16, 24 or 32 bits) "Text Sample Entry Index": indicates
     the reference index for the text sample, which corresponds to the
     index field in the "stsc" for that sample.

   - SDUR (8, 16, 24 or 32 bits) "Text Sample Duration": indicates the
     sample duration in timestamp units of the text sample, which
     corresponds to the entry value in the "stts" for that sample.

   The lengths of these fields are set by out-of-band means, see
   Section 7.

   The composition of the THDR depends on whether the text sample is
   fragmented or not. In particular:

   - SLEN MUST NOT be present if the RTP packet contains a text sample
     fragment. If the RTP packet carries one or several non-fragmented
     samples, SLEN MUST be present for every text sample.

   - SIDX and SDUR MUST always be present when there is text in the
     fragment, i.e. T=1.

   Some examples follow:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             SLEN                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             SIDX                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             SDUR                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                  (a)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             SLEN              |     SIDX      |     SDUR      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                  (b)

   Figure 3.2 Examples of THDR

   Figure 3.2 (a) shows the THDR when the "fragment" parameter is set
   to 0. All fields are configured with a size of 32 bits.
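
   The THDR layouts of Figure 3.2 can be sketched in Python as follows.
   This is a non-normative sketch under stated assumptions: the fields
   are serialised in network byte order, which this document does not
   state explicitly; the function names pack_thdr and unpack_thdr and
   their field-length arguments are hypothetical stand-ins for the
   values configured by out-of-band means; and the fragment header is
   not modelled. Passing slen=None models a text sample fragment, for
   which SLEN is absent.

```python
from typing import Optional, Tuple

VALID_BITS = (8, 16, 24, 32)  # field lengths allowed for SLEN, SIDX and SDUR


def _check_bits(bits: int) -> int:
    """Validate a configured field length and return its size in bytes."""
    if bits not in VALID_BITS:
        raise ValueError(f"field length must be one of {VALID_BITS}, got {bits}")
    return bits // 8


def pack_thdr(slen: Optional[int], sidx: int, sdur: int,
              slen_bits: int, sidx_bits: int, sdur_bits: int) -> bytes:
    """Serialise SLEN (unless None), SIDX and SDUR, in this order.

    Big-endian byte order is an assumption of this sketch. int.to_bytes
    raises OverflowError if a value does not fit its configured length.
    """
    out = b""
    if slen is not None:
        out += slen.to_bytes(_check_bits(slen_bits), "big")
    out += sidx.to_bytes(_check_bits(sidx_bits), "big")
    out += sdur.to_bytes(_check_bits(sdur_bits), "big")
    return out


def unpack_thdr(data: bytes, slen_bits: int, sidx_bits: int, sdur_bits: int,
                has_slen: bool = True) -> Tuple[Optional[int], int, int, int]:
    """Parse a THDR; returns (slen, sidx, sdur, bytes_consumed)."""
    widths = ([slen_bits] if has_slen else []) + [sidx_bits, sdur_bits]
    values = []
    offset = 0
    for bits in widths:
        size = _check_bits(bits)
        if offset + size > len(data):
            raise ValueError("truncated THDR")
        values.append(int.from_bytes(data[offset:offset + size], "big"))
        offset += size
    if has_slen:
        slen, sidx, sdur = values
    else:
        slen = None
        sidx, sdur = values
    return slen, sidx, sdur, offset
```

   With field lengths of 16, 8 and 8 bits the header occupies 4 bytes
   per text sample, compared with 12 bytes when all three fields are
   configured as 32 bits.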
   In Figure 3.2 (b), SLEN is configured as 16 bits and the other
   fields are configured as 8 bits (this is configurable by out-of-band
   means, see Section 7 for details).

3.2 Fragment Header

4. Error Resilient Transport

   3GPP Timed Text operates at low bit rates. For this reason the use
   of payload redundancy as per RFC 2198 [6] or FEC as per RFC 2733 [7]
   is RECOMMENDED. In addition, retransmission MAY be used by
   piggy-backing lost packets onto subsequently sent packets, according
   to RFC 2198. RFC 2354 [8] discusses a series of options for
   repairing streaming media.

5. Congestion control

6. MIME Type Registration

7. SDP usage

8. Examples of RTP packet structure

9. IANA Considerations

10. Security considerations

11. References

   1  3GPP, "Transparent end-to-end packet switched streaming service
      (PSS); Protocols and codecs (Release 5)", TS 26.234 v 5.3.0,
      December 2002.

   2  ISO/IEC 14496-1:2001/AMD5, "Information technology - Coding of
      audio-visual objects - Part 1: Systems, ISO Base Media File
      Format", 2003.

   3  W3C, "Synchronised Multimedia Integration Language (SMIL 2.0)",
      August, 2001.

   4  S. Bradner, "Key words for use in RFCs to indicate requirement
      levels," BCP 14, RFC 2119, IETF, March 1997.

   5  H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson, "RTP: A
      Transport Protocol for Real-Time Applications",
      draft-ietf-avt-rtp-new-11.txt, Work in Progress, November 2001.

   6  C. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley, J.C.
      Bolot, A. Vega-Garcia, S. Fosse-Parisis, "RTP Payload for
      Redundant Audio Data", RFC 2198, September 1997.

   7  J. Rosenberg, H. Schulzrinne, "An RTP Payload Format for Generic
      Forward Error Correction", RFC 2733, December 1999.

   8  C. Perkins, O. Hodson, "Options for Repair of Streaming Media",
      RFC 2354, June 1998.

Authors' Addresses

   Jose Rey                       rey@panasonic.de
   Panasonic European Laboratories GmbH
   Monzastr. 4c
   D-63225 Langen, Germany
   Phone: +49-6103-766-134
   Fax:   +49-6103-766-166

   Yoshinori Matsui               matsui.yoshinori@jp.panasonic.com
   Matsushita Electric Industrial Co., LTD.
   1006 Kadoma
   Kadoma-shi, Osaka, Japan
   Phone: +81 6 6900 9689
   Fax:   +81 6 6900 9699

   Daiji Ido                      ido.daiji@jp.panasonic.com
   Panasonic Mobile Communications Co., Ltd.
   5-3, Hikarinooka, Yokosuka-shi,
   Kanagawa, 239-0847, Japan
   Phone: +81 46 840 5416
   Fax:   +81 46 840 5183

   Youji Notoya                   notoya.youji@jp.panasonic.com
   Matsushita Electric Industrial Co., LTD.
   1006 Kadoma
   Kadoma-shi, Osaka, Japan
   Phone: +81 6 6900 9689
   Fax:   +81 6 6900 9699

IPR Notices

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP 11 [BCP11].
   Copies of claims of rights made available for publication and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.

Full Copyright Statement

   "Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."