Internet Draft                                                  J. Rey 
   draft-ietf-avt-rtp-3gpp-timed-text-06.txt                    Y. Matsui 
                                                               Matsushita 
   Expires: March 10, 2005                             September 10, 2004 
    
    
                  RTP Payload Format for 3GPP Timed Text 
                                      
   Status of this Memo 
    
   By submitting this Internet-Draft, we certify that any applicable 
   patent or other IPR claims of which we are aware have been disclosed, 
   and any of which we become aware will be disclosed, in accordance 
   with RFC 3668 (BCP 79). 
    
   By submitting this Internet-Draft, we accept the provisions of 
   Section 3 of RFC 3667 (BCP 78). 
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups. Note that other 
   groups may also distribute working documents as Internet-Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time. It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/1id-abstracts.txt 
    
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html 
    
    
   Abstract 
    
   This document specifies an RTP payload format for the transmission of 
   3GPP (3rd Generation Partnership Project) timed text.  3GPP timed 
   text is a time-lined decorated text media format with defined storage 
   in a 3GP file.  Timed Text can be synchronized with audio/video 
   contents.  As of today, 3GP files containing timed text contents can 
   only be downloaded via HTTP.  There is no available mechanism for 
   streaming 3GPP timed text contents, either out of 3GP files or 
   directly from live content.  In the following sections the problems 
   of streaming timed text are addressed and a payload format for 
   streaming 3GPP timed text over RTP is specified.  


                 IETF draft - Expires March 10, 2005         [Page 1] 
Table of Contents 
    
   1. Terminology.....................................................3 
   2. Introduction....................................................5 
   2.1. General Overview of the 3GPP Timed Text format...............5 
   2.2. Requirements for a payload format for 3GPP timed text........7 
   2.3. General Remarks..............................................8 
    2.3.1. Character Counting........................................8 
    2.3.2. On the length indication in the units.....................8 
    2.3.3. Fragmentation of Timed Text Samples.......................8 
    2.3.4. On aggregate payloads....................................10 
    2.3.5. Reassembling text samples at the receiver................12 
    2.3.6. Live streaming vs. Streaming from a 3GP file.............14 
   3. RTP Payload Format for 3GPP Timed Text.........................14 
   3.1. Payload Header Definitions..................................17 
    3.1.1. Common Payload Header Fields.............................18 
    3.1.2. TYPE 1 Header............................................20 
    3.1.3. TYPE 2 Header............................................22 
    3.1.4. TYPE 3 Header............................................23 
    3.1.5. TYPE 4 Header............................................24 
    3.1.6. TYPE 5 Header............................................25 
   4. Resilient Transport............................................25 
   5. Congestion control.............................................26 
   6. Scene Description..............................................26 
   6.1. Text rendering position and composition.....................26 
   6.2. SMIL usage..................................................27 
   7. MIME Type usage Registration...................................27 
   7.1. 3GPP Timed Text MIME Registration...........................27 
   8. SDP usage......................................................30 
   8.1. Mapping to SDP..............................................30 
   8.2. Parameter Usage in the SDP Offer/Answer Model...............31 
    8.2.1. Unicast Usage............................................31 
    8.2.2. Multicast Usage..........................................33 
   8.3. Offer/Answer Examples.......................................34 
   8.4. Parameter Usage outside of Offer/Answer.....................36 
   9. IANA Considerations............................................36 
   10. Security considerations.......................................36 
   11. References....................................................37 
   11.1. Normative References.......................................37 
   11.2. Informative References.....................................37 
   12. Annexes.......................................................38 
   12.1. Dynamic SIDX wrap-around mechanism.........................38 
   12.2. Basics of the 3GP File Structure...........................39 
   12.3. Usage of 3GP file information for transport in RTP.........40 
   13. Acknowledgements..............................................42 
   14. Author's Addresses............................................42 
   15. IPR Notices...................................................42 

     
   Internet Draft Payload Format for 3GPP Timed Text September 10, 2004 
    
    
   16. Full Copyright Statement......................................43 
     
   [Note to the RFC Editor:  
    - please delete the Change Log section upon publication of this 
      document as RFC, 
    - please replace "RFCXXXX" with the RFC designation of this document 
      when published, 
    - please substitute "draft-ietf-..." references with the 
      corresponding RFC number if available at the time of publication] 
    
   Change Log 
    
   Since version -05: 
    - sendonly example in section 8: corrected offer. 
    
    - formatting issues with the O/A examples: wraparound of long lines 
    
    - MIME registration:  
     - added cross-reference to section 3 for choosing other than 
       default timestamp rate values. 
     - changed MIME type registration from 'video' to 'text' as agreed 
       on AVT ML. 
     - MIME registration template includes "Restrictions on usage:" 
       field, new in draft-freed-media-type-reg-01.txt.  
     - removed the "File extension" from MIME registration template 
       since the payload format does not define a file storing format. 
    
    - editorial nits. 
    
    
1. Terminology 
    
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
   document are to be interpreted as described in RFC 2119 [5]. 
    
   Furthermore, the following terms are used and have specific meaning 
   within the context of this document: 
    
   text sample or whole text sample or text access unit: 
    
        In the 3GPP Timed Text media format [1] this refers to a unit of 
        timed text data as contained in the source file.  Its equivalent 
        in audio/video would be a frame.  A text sample contains text 
        strings followed by zero or more modifier boxes.   
         
        In MPEG the equivalent of a text sample is referred to as a 
        'text access unit'. 
         

   Rey & Matsui                                               [Page 3] 
    
    
        In this document, as in the 3GPP Timed Text media format [1], a 
        text sample comprises a text string (see the definition below) 
        followed by zero or more modifier boxes.   
    
   fragment or text sample fragment:  
    
        a fraction of a text sample.  A fragment may contain either text 
        strings or modifier (decoration) contents, but not both at the 
        same time. 
    
    
   sample contents:  
    
        general term to identify timed text data transported when using 
        this payload format.   
         
    
   text strings: 
    
        text strings is the term used to denote the actual text 
        characters encoded either as UTF-8 [18] or UTF-16 [19].  This 
        text string MUST NOT contain any byte order mark (BOM).  This 
        differs from the definition in the 3GPP Timed Text media format 
        [1], where UTF-16 text strings must include it.  In this payload 
        format the BOM is not needed as explained in Section 3.1.1.   
    
    
   decoration/modifiers:  
    
        the terms "decoration" and "modifiers" are used interchangeably 
        throughout the document to denote the contents of the text 
        sample that modify the default text formatting.  Modifiers may, 
        for example, specify different font size for a particular 
        sequence of characters or define karaoke timing for the sample. 
         
    
   sample description: 
    
        this term is used to denote information which is potentially 
        shared by more than one text sample.  In a 3GP file a sample 
        description is stored in a place where it can be shared.  It 
        contains setup and default information such as scrolling 
        direction, text box position, delay value, default font, 
        background color, etc.   
         
         
   units or transport units: 
    
        the payload headers specified in this document encapsulate text 
        samples, fragments thereof and sample descriptions by prepending 
        a specific payload header and so building what is here called a 
        (transport) unit. 
         
     
    
    
   aggregation / aggregate packet 
    
        An aggregate RTP packet consists of several units. 
         
 
   track / stream 
    
        3GP files contain audio/video and text tracks.  This document 
        enables streaming of text tracks using RTP.  Therefore, both terms 
        are used interchangeably in this document in the context of 3GP 
        files. 
         
         
   Media Header Box / Track Header Box / ... 
    
        the 3GP file format makes use of these structures defined in the 
        ISO Base File Format [2].  When referring to these in this 
        document, initials are capitalized for clarity. 
    
    
2. Introduction 
    
   3GPP timed text is a media format for time-lined decorated text 
   specified in [1].  3GPP Timed text contents may be stored in 3GP 
   files or may be generated in real time.  The 3GP file format itself 
   is based on the ISO Base Media File Format recommendation [2].  
   Section 12.2 gives some insight into the 3GP file structure.   
    
   The purpose of this draft is to provide a means to stream 3GPP timed 
   text contents using RTP.  This includes the streaming of timed text 
   being read out of a 3GP file as well as the streaming of timed text 
   generated in real time, a.k.a. live streaming. 
    
2.1. General Overview of the 3GPP Timed Text format 
    
   The 3GPP timed text format was developed for use in the services 
   specified in the 3GPP Transparent End-to-end Packet-switched 
   Streaming Services (3GPP PSS) [16].  Besides plain text, the 3GPP 
   timed text format allows the display of decorated text, e.g. for 
   karaoke applications, scrolling text for newscasts or hyperlinked 
   text.  Furthermore, these contents may or may not be synchronized 
   with other media, like audio or video.   
    
   The scope of the 3GPP PSS includes both downloading and streaming of 
   multimedia content over 3G packet-switched networks.  However, due to 
   the lack of an appropriate RTP payload format, the current usage of 
   the 3GPP timed text file format is limited to downloading via HTTP. 
    
   The 3GPP PSS adopts multimedia codecs (such as MPEG-4 Visual, AMR, 
   MPEG-4 AAC, and JPEG) and protocols like SMIL [9] for presentation 
   layouts or RTP [3] for streaming.  In general, a multimedia 
   presentation might consist of several audio/video/text streams (or 
   tracks in ISO file format jargon).  Different streams may have 
   different formats.  The media may be spatially synchronized either 
   using the information within the streams or a scene description 
   language like SMIL.   
    
   An example of this would be a media session with three different 
   media streams: 1 audio, 1 video and 1 timed text that reproduces a 
   music video with karaoke subtitles.  For each stream some information 
   is needed that defines, among other things, the region where the 
   medium is displayed, what it looks like and how it is synchronized.  
   In karaoke, for example, the song lyrics may be displayed below the 
   music video and the words are highlighted synchronized with the music 
   track. 
    
   Four distinct functional components can be identified in the 
   3GPP timed text media format: 
    
        o initial spatial layout information related to the text track: 
          these are the height and width of the text region where text 
          is displayed, the position of the text region in the display 
          and the layer or proximity of the text to the user.  These 
          pieces of information are contained in the Track Header Box.  
          Sections 6.1 and 12 provide further details. 
         
        o default settings for formatting and positioning of text: 
          style (font, size, colour,...), background colour, horizontal 
          and vertical justification, line width, scrolling, etcetera.  
          Sample descriptions contain such settings.   
         
        o the actual text: encoded characters using either UTF-8 [18] 
          or UTF-16 [19] encoding and, 
         
        o the decoration inside the modifier boxes: if some characters 
          have a different style, a delay, blinking, etc., this 
          needs to be indicated by appending the modifier boxes to the 
          text strings.  Modifier boxes are only present in the text 
          samples if they are actually needed.  Otherwise, the default 
          settings in the corresponding sample description apply.  At 
          the time of writing this payload format the following 
          decorations or modifiers are specified in the 3GPP timed text 
          media format specification [1]: 
         
            - text highlight, 
            - highlight color, 
            - blinking text, 
            - karaoke feature, 
            - hyperlink, 
            - text delay, 
            - text style, 
            - positioning of the text box and, 
            - text wrap indication. 
      

    
    
   Section 12.3 specifies where to find these values in the 3GP file and 
   how these are mapped to the payload format.  For live streaming, 
   appropriate values using the same formats and units shall be used. 
    
   For further details on the 3GPP Timed Text media format, refer to 
   [1].   
    
    
2.2. Requirements for a payload format for 3GPP timed text 
    
   In this section a set of requirements is listed.  A justification for 
   each of them is also given.  An RTP Payload Format for 3GPP timed 
   text SHALL: 
    
        1.  Keep the 3GP text sample structure.  This requirement means 
   that it SHALL be possible for an RTP receiver using this payload 
   format to rebuild the text samples from the received RTP packets.  
    
        2.  Transmit the text sample size, sample duration and sample 
   description index in-band.  In RTP it is important to transmit these 
   values in-band because they might change from sample to sample.   
    
        3.  Enable the transmission of the sample descriptions both by 
   out-of-band and in-band means.  Typically, a set of default 
   formatting settings is transmitted out-of-band (reliably) once at the 
   initialization phase.  If sample descriptions are needed in the 
   course of a session, these may also be sent out-of-band or in-band.  
   In-band transmission, although unreliable, may be more appropriate 
   for large sample descriptions or if these are sent frequently.  It is 
   also useful in cases where an out-of-band channel may not be 
   available and for live streaming, where contents are not known a 
   priori.  In order to cover this wide range of scenarios, the payload 
   format SHALL enable both in-band and out-of-band transmission of 
   sample descriptions.   
    
        4.  Enable the aggregation of units into an RTP packet.  In a 
   mobile communication environment a typical text sample size is around 
   100-200 bytes.  If the available bit rate and the packet size allow 
   it, units SHOULD be aggregated into one RTP packet.  This makes the 
   transport more efficient. 
    
        5.  Enable the fragmentation of a text sample into several RTP 
   packets in order to cover a wide range of applications and network 
   environments.  In general, fragmentation should be a rare event given 
   the low bit rates and text sample sizes.  However, the 3GPP Timed 
   Text media format does allow for larger text samples.  The payload 
   format SHALL take this into account and provide a means for coping 
   with fragmentation and reassembly.   
    
        6.  Enable the use of resilient transport mechanisms, such as 
   repetition, retransmission [11] and FEC [7].  These mechanisms may be 
   used to protect the information.  RFC 2354 [8] discusses available 
   mechanisms for stream repair. 
     
    
    
2.3. General Remarks 
    
   Before going into the details of the payload headers, some general 
   observations are made in this section.   
    
   In order to understand the next sections a rough functional 
   description of the units is needed: 
    
      o a TYPE 1 unit contains one complete text sample, 
      o a TYPE 2 unit contains a complete text string or a fragment 
        thereof, 
      o a TYPE 3 unit contains the complete modifiers or only the 
        first fragment thereof, 
      o a TYPE 4 unit contains one modifier fragment other than the  
        first and, 
      o a TYPE 5 unit contains one sample description. 
    
2.3.1. Character Counting 
    
   This payload format does not enable an RTP receiver to find out the 
   exact number of text characters lost. 
    
   The fragment size included in the payload headers does not help in 
   finding the number of lost characters, because the UTF-8/UTF-16 
   [18][19] encodings used yield a variable number of bytes per 
   character.   
    
        Informative note: for finding out the exact number of lost 
        characters, an additional field in TYPE 2 units and character 
        counting upon fragmentation would be required.  This overhead 
        does not seem reasonable given the expected low rate of 
        fragmentation. 
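
   Informative illustration (not part of the payload format): the 
   short Python sketch below shows why a byte count alone cannot be 
   converted into a character count.  The helper name and the sample 
   strings are assumptions chosen only for this illustration. 

```python
# Illustrative only: strings with the same number of characters can
# occupy different numbers of bytes, depending on the characters used.
def encoded_sizes(text):
    """Return (characters, UTF-8 bytes, UTF-16 bytes without BOM)."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-be")  # explicit byte order, so no BOM is emitted
    return len(text), len(utf8), len(utf16)

ascii_sizes = encoded_sizes("abc")            # three ASCII characters
mixed_sizes = encoded_sizes("a\u00e9\u20ac")  # 'a', e-acute, euro sign
```

   Both strings contain three characters, yet their UTF-8 encodings 
   differ in length, so a receiver that only knows the size of a lost 
   fragment cannot tell how many characters it carried. 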
    
2.3.2. On the length indication in the units  
 
   Usually, RTP applications use the information on packet size from UDP 
   or lower layers to find out the length of the RTP payload.  While 
   this information can still be used, this payload format includes an 
   explicit length indication for each unit in the payload as a fixed 
   field in the payload headers.   
    
   This design choice allows easy interoperability with the RTP Payload 
   Format for Transport of MPEG-4 Elementary Streams, RFC 3640 [12], 
   which does require an explicit length indication for each unit (see 
   AU-header in RFC 3640).   
    
2.3.3. Fragmentation of Timed Text Samples 
    
   This section describes why text samples may have to be fragmented and 
   discusses some of the possible approaches to do it.  A solution is 
   proposed together with rules and recommendations for fragmenting and 
   transporting text samples using this payload format. 
     
    
    
   3GPP Timed Text applications are expected to operate at low bit rates.  
   This fact added to the small size of timed text samples (typically 
   one or two hundred bytes) makes fragmentation of text samples a rare 
   event.  Samples should usually fit into the MTU size of the used 
   network path. 
    
   Nevertheless, some text strings (e.g. ending roll in a movie) and 
   some modifier boxes (e.g. for hyperlinks, for karaoke or for styles) 
   might become large.  This may also apply to future modifier boxes.  
   In such cases, the first option to consider is whether it is possible 
   to adjust the encoding (e.g. the size of sample) in such a way that 
   fragmentation is avoided.  If so, this is preferred to fragmentation 
   and SHOULD be done.   
    
   Otherwise, if this is not possible or other constraints prevent it, 
   fragmentation MAY be used and the basic guidelines given in this 
   document MUST be followed:   
    
   o It is RECOMMENDED that text samples are fragmented as seldom as 
     possible, i.e. the least possible number of fragments is created 
     out of a text sample.   
 
   o If there is some bitrate and space in the payload available, 
     sample descriptions (if at hand) SHOULD be aggregated.  Sample 
     descriptions (TYPE 5 units) MAY be placed anywhere in an aggregate 
     payload, since the sample description index (SIDX) is used to 
     associate them to their text samples (explained further in this 
     document).   
    
   o Text strings MUST be split at character boundaries.  Otherwise, it is 
     not possible to display the text contents of a fragment if a 
     previous fragment was lost.  As a consequence, text string 
     fragmentation requires knowledge of the UTF-8/UTF-16 encoding 
     formats to determine character boundaries. 
    
   o Unlike text strings, the modifier boxes are NOT REQUIRED to be split 
     at meaningful boundaries.  However, it is RECOMMENDED to do so 
     whenever possible.  This decreases the effects of packet loss.  
     This payload format does not ensure that partially received 
     modifiers can be applied to text strings.  If only part of the 
     modifiers is received, it is an application issue how to deal with 
     these, i.e. whether to use them or not.   
    
        Informative note: ensuring that partially received modifiers can 
        be applied to text strings in all cases (for all modifier types 
        and for all fragment loss constellations) would place additional 
        requirements on the payload format.  In particular this would 
        require that: a) senders understand the semantics of the 
        modifier boxes and b) specific fragment headers for each of the 
        modifier boxes are defined, in addition to the payload formats 
        defined below.  Given the low probability of fragmentation and 
        the desire to keep the requirements low, it does not seem 
        reasonable to include such additional headers. 
     
    
    
   o Modifier and text string fragments SHOULD be protected against 
     packet losses, i.e. using FEC [7], retransmission [11], repetition 
     (Section 4) or an equivalent technique.  This minimizes the 
     effects of packet loss. 
    
   o An additional requirement when fragmenting text samples is that 
     the start of the modifiers MUST be indicated using the payload 
     header defined for that purpose, i.e. a TYPE 3 unit MUST be used 
     (see Section 3.1.4).  Otherwise, if packets are lost, a client may 
     be unable to identify where the modifiers start and the text ends 
     or whether either text strings or modifiers were received 
     completely or not.  Note that the text string length included in 
     the text samples can be used to easily find the modifier start 
     byte. 
    
   o Finally, sample descriptions SHALL NOT be fragmented, because they 
     contain important information that may affect several text 
     samples.   
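
   Informative illustration: the rule that text strings are split at 
   character boundaries can be sketched as follows for UTF-8.  This is 
   a minimal sketch, not a normative algorithm; the function name 
   split_utf8 and the max_len parameter are assumptions made here, and 
   UTF-16 (where surrogate pairs must be kept together) is not 
   covered. 

```python
def split_utf8(data: bytes, max_len: int) -> list:
    """Split valid UTF-8 bytes into fragments of at most max_len bytes,
    cutting only at character boundaries (hypothetical helper)."""
    if max_len < 4:
        raise ValueError("max_len must hold the longest UTF-8 character (4 bytes)")
    data.decode("utf-8")  # raises UnicodeDecodeError if the input is not valid UTF-8
    fragments = []
    start = 0
    while start < len(data):
        end = min(start + max_len, len(data))
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx.  Back up
        # until 'end' points at the first byte of a character or at the end.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        fragments.append(data[start:end])
        start = end
    return fragments
```

   Because every fragment produced this way decodes on its own, the 
   text of a received fragment can still be displayed when a previous 
   fragment was lost. 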
    
2.3.4. On aggregate payloads 
    
   An RTP payload using this payload format MUST contain, at least, one 
   unit of any type (TYPEs 1-5).  Units SHOULD be aggregated to avoid 
   overhead, whenever possible.  The aggregate payloads MUST comply with 
   one of the following configurations: 
    
   1. zero or more whole text samples (TYPE 1 units) and zero or more 
     sample descriptions (TYPE 5) or,   
   2. zero or one modifier fragment (either TYPE 3 or TYPE 4) and zero 
     or more sample descriptions or, 
   3. zero or one text string fragment (TYPE 2) and zero or one TYPE 3 
     unit and zero or more sample descriptions.  Moreover, if a TYPE 2 
     unit and a TYPE 3 unit are present, then they MUST belong to the 
     same text sample. 
    
   Different aggregates than the ones listed above SHALL NOT be used.   
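
   Informative illustration: the three configurations above can be 
   checked with a short routine such as the following sketch.  The 
   function name is_valid_aggregate and the representation of a 
   payload as a list of unit type numbers are assumptions made for 
   this example.  The sketch only counts unit types; it does not 
   verify that a TYPE 2 unit and a TYPE 3 unit belong to the same text 
   sample, which requires inspecting the payload headers. 

```python
from collections import Counter

def is_valid_aggregate(unit_types):
    """Check a payload's unit types (ints 1..5, in payload order)
    against the three allowed aggregate configurations."""
    if not unit_types or any(t not in (1, 2, 3, 4, 5) for t in unit_types):
        return False  # at least one unit of TYPE 1-5 is required
    n = Counter(unit_types)
    has_fragments = n[2] or n[3] or n[4]
    # Configuration 1: whole text samples and sample descriptions only.
    if not has_fragments:
        return True
    if n[1]:
        return False  # whole samples never share a payload with fragments
    # Configuration 2: a single modifier fragment (TYPE 3 or TYPE 4).
    if n[2] == 0:
        return n[3] + n[4] == 1
    # Configuration 3: one text string fragment, optionally one TYPE 3 unit.
    return n[2] == 1 and n[3] <= 1 and n[4] == 0
```

   For example, a payload with unit types [1, 5, 1, 1] matches the 
   first configuration, whereas [1, 2] is rejected because it mixes a 
   whole text sample with a fragment. 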
    
   Some observations: 
    
   o TYPE 5 units MAY be placed anywhere in the aggregate and they 
     SHALL NOT be regarded for calculating the timestamp of the 
     subsequent units.  This is because they usually do not belong to 
     any text sample in particular, but may apply to several.  For 
     timestamp calculations, TYPE 5 units MUST simply be ignored, i.e. 
     by jumping to the next unit.   
    
   o as per rule 3 above, a payload MAY contain fragments of one (and 
     only one) text sample.  If this is the case, then exactly one TYPE 
     2 unit followed by one TYPE 3 unit are allowed in the same 
     payload.  This is in line with RFC 3640 [12], Section 2.4, which 
     explicitly disallows combining fragments of different (text) 
     samples in the same RTP payload.  Note that, in this special case, 
     no timestamp calculation is needed.   
    
   Some examples of aggregate payloads are illustrated in Figure 1 below 
   (although not drawn to scale, PAYLOAD 1, 2 and 3 have the same 
   length):   
    
                                
      TS1    N/A   TS2     TS3  
    +------+-----+------+-----+  
    |TYPE1 |TYPE5|TYPE1 |TYPE1| 
    +------+-----+------+-----+ 
     sdur1   N/A  sdur2  sdur3      
      
                                   TS4    N/A 
                                 +-----+-------+ 
                                 |TYPE1| TYPE 5|                   a) 
                                 +-----+-------+ 
                                  sdur4   N/A 
                                                
                                        TS4         TS4    TS4 
                                 +--------------+ +--------------+ 
                                 |    TYPE2     | |TYPE2 |TYPE 3 | b) 
                                 +--------------+ +--------------+ 
                                       sdur4       sdur4   sdur4 
    
    
                                        TS4             TS4 
                                 +--------------+ +--------------+ 
                                 | TYPE2| TYPE 3| |     TYPE4    | c) 
                                 +--------------+ +--------------+ 
                                   sdur4  sdur4        sdur4 
    
    
    |----------PAYLOAD 1------|  |--PAYLOAD 2---| |--PAYLOAD 3---| 
             rtpts1                  rtpts2          rtpts3 
         
                   Figure 1. Example aggregate payloads. 
    
   In Figure 1 four text samples (TS1-4) are sent using three different 
   RTP packets.  For illustrative purposes, three different 
   possibilities for the last text sample, TS4, are depicted: a), b) and 
   c).   
    
      TS1    N/A   TS2     TS3        TS4            TS4    TS4 
    +------+-----+------+-----+  +--------------+ +--------------+  
    |TYPE1 |TYPE5|TYPE1 |TYPE1|  |    TYPE2     | |TYPE2 |TYPE 3 |  
    +------+-----+------+-----+  +--------------+ +--------------+ 
     sdur1   N/A  sdur2  sdur3         sdur4       sdur4   sdur4  
      
     (#1)    (#2) (#3)   (#4)           (#5)        (#6)    (#7)     
    
    |----------PAYLOAD 1------|  |--PAYLOAD 2---| |--PAYLOAD 3---| 
             rtpts1                  rtpts2          rtpts3 
    
   Using a particular configuration from Figure 1, we illustrate 
   how the timestamp for each unit is found.  Assuming TSx means Text 
   Sample x, rtptsy represents the standard RTP timestamp for PAYLOAD y 
   and sdurz the duration of unit z, the timestamp for unit #z (ts#z) 
   can be found as the sum of rtptsy plus the cumulative sum of the 
   durations of preceding units in that payload (except in the case of 
   PAYLOAD 3 as per rule 3 above).  Thus, we have: 
    
          1. for the units in the first aggregate payload, PAYLOAD 1:  
           
                        ts(#1)= rtpts1,  
                        ts(#2)= N/A 
                        ts(#3)= rtpts1 + sdur1, 
                        ts(#4)= rtpts1 + sdur1 + sdur2,  
                         
           Note that no sdur value is assigned to TYPE 5  
           units, and they are not taken into account in the timestamp 
           calculation. 
                         
          2. for PAYLOAD 2: 
                
                        ts(#5)= rtpts2, 
                         
                         
          3. for PAYLOAD 3:  
           
                        ts(#6)= ts(#7)= rtpts2= rtpts3 
    
          In this case rtpts3 must be equal to rtpts2 in order to be 
          able to assign this fragment to the correct sample (TS4). 
          Additionally, note that units 6 and 7 have the same timestamp 
          since they belong to the same text sample (also TS4). 
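
   Informative illustration: the calculation above can be sketched as 
   follows.  The function name unit_timestamps and the representation 
   of a payload as a list of (unit type, sdur) pairs are assumptions 
   made for this example.  The result is reduced modulo 2^32 because 
   RTP timestamps are 32-bit values. 

```python
def unit_timestamps(rtp_ts, units):
    """units: list of (unit_type, sdur) in payload order; sdur is None
    for TYPE 5 units.  Returns one timestamp per unit (None for TYPE 5)."""
    types = [t for t, _ in units]
    if 2 in types or 3 in types or 4 in types:
        # Fragments of a single text sample all share the RTP timestamp.
        return [None if t == 5 else rtp_ts for t in types]
    timestamps = []
    elapsed = 0  # cumulative duration of the preceding TYPE 1 units
    for unit_type, sdur in units:
        if unit_type == 5:
            timestamps.append(None)  # sample descriptions are skipped
            continue
        timestamps.append((rtp_ts + elapsed) % 2**32)
        elapsed += sdur
    return timestamps

# PAYLOAD 1 of the example, with assumed values rtpts1=1000,
# sdur1=10, sdur2=20 and sdur3=30.
payload1 = unit_timestamps(1000, [(1, 10), (5, None), (1, 20), (1, 30)])
```

   With these assumed values, payload1 holds rtpts1 for unit #1, no 
   timestamp for unit #2, rtpts1 + sdur1 for unit #3 and rtpts1 + 
   sdur1 + sdur2 for unit #4. 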
 
2.3.5. Reassembling text samples at the receiver 
 
   The payload headers defined in this document allow reassembling 
   fragmented text samples.  For this purpose the standard RTP timestamp, 
   the duration indication (SDUR) and the total and subtotal counter 
   fields (TOTAL and THIS) of the payload headers are used.   
    
   The process for collecting the different fragments (units) of a text 
   sample is as follows: 
    
     1. Search for units having the same timestamp value, i.e. belonging 
        to the same text sample. 
      
     2. Check within this set whether any of the units from the text 
        sample is missing.  This is done using the TOTAL and THIS 
        fields; the TOTAL field indicates how many fragments were 
        created out of the text sample and the THIS field indicates the 
        position of this fragment in the text sample.  As a result of this 
        operation two outcomes are possible: 
      
          a. No fragment is missing.  Then the THIS field SHALL be used 
             to order the fragments and reassemble the text sample 
             before forwarding it to the decoding application.  Special 
             care SHALL be taken when reassembling the text string as 
             indicated in bullet 4 below. 
           
          b. One or more fragments are missing: check whether the
             missing fragment belongs to the text string or to the
             modifiers.  TYPE 2 units identify text string fragments;
             TYPE 3 and TYPE 4 units identify modifier fragments:
           
              i. if the missing fragment or fragments belong to the
                  text string and the modifiers were received complete,
                  then the received text characters MAY, at least, be
                  displayed as plain text.  Some modifiers MAY only be
                  applied as long as it is possible to identify the
                  character numbers, e.g. if only the last text string
                  fragment is lost.  This is the case for modifiers
                  defining specific font styles ('styl'), highlighted
                  characters ('hlit'), the karaoke feature ('krok') and
                  blinking characters ('blnk').  Other modifiers such as
                  'dlay' or 'tbox' can be applied without knowledge of
                  the character numbers.  It is an application issue
                  to decide whether to apply the modifiers or not.
                 
             ii. if the missing fragment belongs to the modifiers and
                  the text strings were received complete, then the
                  incomplete modifiers MAY be used.  The text string
                  SHOULD at least be displayed as plain text.  As
                  mentioned in Section 2.3.3, modifiers MAY be split
                  without observing meaningful boundaries.  Hence, it
                  may not always be possible to make use of partially
                  received modifiers.  Again, to avoid this case, it is
                  RECOMMENDED that modifiers be split at meaningful
                  boundaries.
                 
            iii. a third possibility is that it is not possible to
                  discern whether modifiers or text strings were
                  received complete.  E.g. if the TYPE 3 unit of a
                  sample plus the following or preceding packet is lost,
                  there is no way for the RTP receiver to know whether
                  both lost packets belong to the modifiers or whether
                  part of the text string was lost as well.  FEC [7],
                  retransmission [11] or other protection mechanisms as
                  per Section 4 are RECOMMENDED to avoid this situation.
                 
             iv. finally, if it is certain that neither text strings nor
                  modifiers were received complete, then the text 
                  strings and the modifiers MAY be rendered partially or 
                  MAY be discarded.  This is an application choice.   
     
     3. Sample descriptions can be directly associated with the 
        reassembled text samples, via the sample description index 
        (SIDX). 
      
     4. Reassembling of text strings: since the text strings transported 
        in RTP packets MUST NOT include any byte order mark (BOM), the 
        receiver MUST prepend it to the reassembled string (if needed) 
        before handing it to the timed text decoder.  This is needed
        for UTF-16 encoded strings (i.e. the "U" bit is set to 1).  The
        value of the BOM is 0xFEFF (see [1]) and it is used by the 3GPP 
        timed text decoder to recognize the UTF encoding. 
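
   The steps above can be sketched as follows.  This is a minimal
   illustration, not a complete receiver: the Fragment class and the
   function names are hypothetical, THIS is assumed to run from 1 to
   TOTAL, and the BOM is assumed to be serialized in big-endian byte
   order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

BOM_UTF16 = b"\xfe\xff"  # 0xFEFF serialized big-endian (an assumption)


@dataclass
class Fragment:
    timestamp: int  # RTP timestamp assigned to the unit
    unit_type: int  # 2 = text string fragment, 3 or 4 = modifier fragment
    total: int      # TOTAL field
    this: int       # THIS field, assumed to run from 1 to TOTAL
    utf16: bool     # U bit (only meaningful in TYPE 2 units)
    data: bytes     # unit contents following the payload header


def group_by_timestamp(units: List[Fragment]) -> Dict[int, List[Fragment]]:
    """Step 1: units with equal timestamps belong to the same sample."""
    groups: Dict[int, List[Fragment]] = {}
    for unit in units:
        groups.setdefault(unit.timestamp, []).append(unit)
    return groups


def reassemble(group: List[Fragment]) -> Optional[Tuple[bytes, bytes]]:
    """Steps 2 and 4: return (text string, modifiers), or None on loss."""
    by_position = {unit.this: unit for unit in group}  # drops duplicates
    total = group[0].total
    if set(by_position) != set(range(1, total + 1)):
        return None                     # case 2b: a fragment is missing
    ordered = [by_position[i] for i in range(1, total + 1)]
    text = b"".join(u.data for u in ordered if u.unit_type == 2)
    modifiers = b"".join(u.data for u in ordered if u.unit_type in (3, 4))
    if text and any(u.utf16 for u in ordered if u.unit_type == 2):
        text = BOM_UTF16 + text         # restore the BOM for the decoder
    return text, modifiers
```

   The sketch only covers outcome (a).  When reassemble() returns None,
   an application would still have to choose among the options listed
   under (b), e.g. rendering the received characters as plain text.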
      
2.3.6. Live streaming vs. Streaming from a 3GP file 
    
   This section addresses the differences between streaming live content 
   and streaming text tracks from a 3GP file. 
    
   For the purpose of this document, the term live streaming refers to 
   those scenarios where the timed text stream is sent from a live 
   encoder.  Upon reception the content may or may not be stored in a 
   3GP file.   
    
   At the sender, timed text content SHALL be encapsulated in RTP 
   packets following the guidelines given in this document.  At the 
   receiving side, a buffer is typically used to cancel the network 
   delay and delay jitter.  If receiver and sender support packet loss
   resilience mechanisms (e.g. retransmission [11], packet FEC [7] or
   other) it may also be possible to recover from packet losses.  Note
   that how sender and receiver actually manage and dimension their
   buffers is an implementation design choice.
    
   Section 12.3 specifies how the 3GP file parameters are mapped to the 
   fields of the payload header.  For live streaming, appropriate values 
   complying with the format and units described in [1] shall be used.  
   Where needed, clarifications on appropriate values are given in this 
   document. 
    
    
3. RTP Payload Format for 3GPP Timed Text  
    
   The format of an RTP packet containing 3GPP timed text is shown 
   below: 
    
       0                   1                   2                   3 
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |V=2|P|X| CC    |M|    PT       |        sequence number        | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |                           timestamp                           | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |           synchronization source (SSRC) identifier            | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
     
   Rey & Matsui                                              [Page 14] 
   Internet Draft Payload Format for 3GPP Timed Text September 10, 2004 
    
    
      |                                                               | 
      +                      RTP payload                              | 
      |                                                               | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
   Marker bit (M): the marker bit SHALL be set to 1 if the RTP packet 
   includes one or more whole text samples or the last fragment of a 
   text sample; otherwise, it SHALL be set to zero (0).
    
   Timestamp: the timestamp MUST indicate the sampling instant of the 
   earliest (or only) unit contained in the RTP packet.  The initial 
   value SHOULD be randomly determined, as specified in RTP [3].  In an 
   aggregate payload, units MUST be placed in play-out order, i.e. 
   earliest first in the payload.  If TYPE 1 units are aggregated, the 
   timestamp of the subsequent units MUST be obtained by adding the 
   timed text sample duration of previous samples to the RTP timestamp 
   value.  Refer to the details on the timestamp calculation for units 
   in an aggregate payload in Section 2.3.4. 
    
   Note that TYPE 5 units do not make use of the timestamp and therefore 
   the above does not apply for this particular case.   
    
   The timestamp clockrate of the samples in each text track is the 
   value of the "timescale" parameter in the Media Header Box for that 
   text track.  Note that each track in a 3GP file MAY have its own 
   clockrate as specified in the Media Header Box.   
    
   For live streaming an appropriate timestamp clockrate SHALL be used.  
   A default value of 1000 Hz is RECOMMENDED.  This value should provide 
   enough timing resolution for expressing the duration of text samples, 
   for synchronizing text with other media and for performing RTCP 
   measurements such as the interarrival delay jitter or the RTCP Packet 
   Receipt Times Report Block (Section 4.3 of RFC 3611 [20]).  This
   complies with RTP [3], Section 5.1:
    
        "The resolution of the clock MUST be sufficient for the desired 
        synchronization accuracy and for measuring packet arrival jitter 
        (one tick per video frame is typically not sufficient)"  
    
   Other timestamp clockrates MAY be used.  However, clockrates that
   are too low may render the RTCP measurements useless or may not
   provide enough synchronization accuracy.  Clockrate values for
   which this is the case SHALL NOT be used.
    
   Timestamp clockrates MUST be signaled by out-of-band means at session 
   setup, e.g. using the "rate" attribute in SDP.  See Section 8 for 
   details.   
    
   Unlike for other media such as audio or video, no default sample
   size or sampling rate is defined for timed text.  For 3GP files, the
   length of the text samples is calculated beforehand and included in
   the track itself; for live encoding, the real-time encoder SHALL
   choose an appropriate size for each text sample.  In general, the
   amount of text 'captured' in a sample depends on the text source and 
   the particular application (see examples below).  Samples may, e.g., 
   be tailored to match the packet MTU as closely as possible or to
   provide a given redundancy for the available bit rate.  The encoding 
   application MUST also take into account the delay constraints of the 
   real-time session and assess whether FEC, retransmission or other 
   similar techniques are reasonable options for stream repair.   
    
   The following examples shall illustrate how a real-time encoder may 
   choose its settings to adapt to the scenario constraints.   
    
         Imagine a newscast scenario, where the spoken news is
         transcribed and synchronized with the image and voice of the
         reporter.  We assume that the news speaker talks at an average
         speed of 5 words per second with an average word length of 5
         characters plus one space per word, i.e. 30 characters per
         second.  We assume an available IP MTU of 576 bytes and an
         available bitrate of 576*8 bits per second = 4.6 kbps.  We
         assume each character can be encoded using 2 bytes in UTF-16.
         Several constraints may come into play in this scenario, for
         example: available IP MTU, available bandwidth, allowable delay
         and required redundancy.  If the target were to minimize the
         packet overhead, a text sample covering 8 seconds of text would
         be closest to the IP MTU: IP/UDP/RTP/TYPE1 Header + (8s text
         sample) = 20+8+12+8+(~6 chars/word * 5 words/s * 8s * 2
         bytes/char) = 528 bytes < 576 bytes.  For other scenarios, a
         delay of 8 seconds may be too much and just one packet per
         sample too little redundancy.  If lower delay and higher
         redundancy are required, one choice could be that the encoder
         'collects' text every second; this yields text samples (TYPE 1
         units) of 68 bytes, TYPE 1 header included.  Taking a smaller
         delay of 3s, three contiguous text samples could be aggregated
         in one RTP payload: the current and last two text samples.
         This amounts to a total IP packet size of 20+8+12+3*(8+60) =
         244 bytes.  Now, with the same available bitrate of 4.6 kbps,
         these 244-byte packets can be sent redundantly up to two times
         per second, without exceeding the available bandwidth:
         
        RTP payload (1,2,3),(1,2,3) (2,3,4),(2,3,4) (3,4,5),(3,4,5) ... 
        Time:       <-----1s------> <-----1s------> <-----1s------> ... 
         
        This means that each text sample is sent at least six times, 
        which should provide enough redundancy.  Although not as 
        bandwidth efficient (488*8 < 528*8 < 576*8 bps) as the previous 
        packetization, this option increases the stream redundancy while 
        still meeting the delay and bandwidth constraints.   
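
         The arithmetic of this example can be re-checked with a short
         script.  It is only a sketch: the constant and function names
         are hypothetical, and the TYPE 1 header is counted as 8 bytes
         exactly as in the figures quoted in the example.

```python
IP, UDP, RTP = 20, 8, 12          # header sizes in bytes, as in the text
TYPE1_HDR = 8                     # TYPE 1 header size used in the text
CHARS_PER_SEC = 6 * 5             # (5 letters + 1 space) * 5 words/s
BYTES_PER_CHAR = 2                # UTF-16, no surrogate pairs assumed
MTU = 576
BITRATE = MTU * 8                 # 4608 bit/s, i.e. about 4.6 kbps


def sample_bytes(seconds: int) -> int:
    """Size of one TYPE 1 unit holding `seconds` of transcribed speech."""
    return TYPE1_HDR + CHARS_PER_SEC * seconds * BYTES_PER_CHAR


def packet_bytes(samples: int, seconds_per_sample: int) -> int:
    """IP packet size for `samples` aggregated TYPE 1 units."""
    return IP + UDP + RTP + samples * sample_bytes(seconds_per_sample)


single = packet_bytes(1, 8)        # one 8-second sample per packet
aggregate = packet_bytes(3, 1)     # three 1-second samples per packet
redundant_bps = aggregate * 2 * 8  # each aggregate sent twice per second

print(single, aggregate, redundant_bps, BITRATE)
```

         The script prints 528 244 3904 4608, i.e. sending each
         244-byte aggregate twice per second stays below the assumed
         4608 bit/s.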
         
        Another example would be a user sending timed text from a type-
        in area in the display.  In this case, the text sample is 
        created as soon as the user clicks the 'send' button.  Depending 
        on the packet length, fragmentation may be needed. 
         

        In a video conferencing application, text is synchronized with 
        audio and video.  Thus, the text samples shall be displayed long 
        enough to be read by a human, shall fit in the video screen and 
         shall 'capture' the audio contents rendered during the time the
         corresponding video and audio are rendered.
         
    
   Payload Type (PT): the payload type is set dynamically and sent by 
   out-of-band means. 
    
   The usage of the remaining RTP header fields follows the rules of RTP 
   [3] and the profile in use. 
    
3.1. Payload Header Definitions 
    
   An RTP packet using the payload headers defined in this document has 
   the following format: 
    
       0                   1                   2                   3 
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |V=2|P|X| CC    |M|    PT       |        sequence number        | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |                           timestamp                           | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |           synchronization source (SSRC) identifier            | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |U|   R   | TYPE|             LEN               |               : 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-              : 
      :           (variable header fields depending on TYPE)          :
      :                                                               : 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |                                                               | 
      :                     SAMPLE CONTENTS                           : 
      :                                                               : 
      :                                                               : 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
                        Figure 2 RTP Packet Format. 
                         
                         
   The payload headers specified in this document consist of a set of 
   common fields followed by specific fields for each header type and 
   sample contents.  See Figure 3. 
    
   In this manner, the structure of the payload headers resembles that 
   of the 'access units' (AU) in RFC 3640 [12].  This similarity is 
   intentional to improve interoperability.  The 'AU header' of that 
   document finds an equivalent in the common header fields: U, R, TYPE 
   and LEN.  Similarly, the specific fields plus the sample contents 
   would be equivalent to the 'AU data section' of RFC 3640.  This is 
   illustrated in the figure below.   
    
                   =~'AU header'                    | =~'AU data' 
     
    0                   1                   2                   3 
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0  
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-....+ 
    |U|   R   |TYPE |             LEN               |   (variable)    | 
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-....+ 
     
                        Figure 3 Unit Format. 
    
   An aggregate RTP payload containing two complete text samples and a 
   payload with one text sample fragment would schematically look like 
   this: 
    
                                        +----------------------+ 
                                        |                      | 
                                        |   RTP Header         | 
                                        |                      | 
                               --------_+----------------------+ 
                               |        |                      | 
                            _  |        |    TYPE 1 Header     | 
                               |        ........................ 
                        UNIT 1 -        |                      | 
                               |        |    Text Sample       | 
                               |  _     |                      | 
                               |-------\........................ 
                                -------/|                      | 
                               |        |    TYPE 1 Header     | 
                               |        ........................ 
                        UNIT 2 -        |                      | 
                               |        |    Text Sample       | 
                               |        |                      | 
                               |  _     |                      | 
                               --------------------------------+ 
    
                                        +----------------------+ 
                                        |                      | 
                                        |   RTP Header         | 
                                        |                      | 
                               --------_+----------------------+ 
                               |        |TYPE 2(or 3 or 4)Hdr  | 
                               |        ........................ 
                        UNIT 3 -        |                      | 
                               |        | Text Sample Fragment | 
                               |_       |                      | 
                               |     _  |                      | 
                               ---------+----------------------+ 
                     Figure 4 Example RTP packets. 
 
3.1.1. Common Payload Header Fields 
    
   The fields common to all payload headers have the following format: 

     
            0                   1                   2        
            0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3  
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
           |U|   R   |TYPE |             LEN               | 
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
                     Figure 5 Common payload header fields. 
                         
                         
   Where: 
    
   o U (1 bit) "UTF Transformation flag": indicates whether the text 
     characters are encoded using UTF-8 (U=0) or UTF-16 (U=1).  This is 
     used to inform RTP receivers whether UTF-8 or UTF-16 was used to 
     encode the text string.  Since this bit is used, no byte order 
     mark (BOM) is needed inside the text string itself.  For the 
     payload formats defined in this document, the U bit is only used 
     in TYPE 1 and TYPE 2 headers.  Senders MUST set the U bit to zero 
     in TYPE 3, TYPE 4 and TYPE 5 headers.  Receivers MUST ignore the U 
     bit in TYPE 3, TYPE 4 and TYPE 5 headers.   
    
   o R (4 bits) "Reserved bits": for future extensions.  This field 
     MUST be set to zero (0x0) and MUST be ignored by receivers. 
    
   o TYPE (3 bits) "Type Field": this field specifies which specific 
     header fields follow.  The following TYPE values are defined: 
    
        - TYPE 1, for a whole text sample 
        - TYPE 2, for a text string fragment (without modifiers) 
        - TYPE 3, for a whole modifier box or the first fragment of a 
          modifier box  
        - TYPE 4, for a modifier fragment other than first. 
        - TYPE 5, for a sample description.  One header per sample 
          description.  
        - TYPE 0, 6 and 7 are reserved. 
         
   o LEN (16 bits) "Length Field": indicates the size (in bytes) of
     this header field and of all the fields following it, i.e. the
     LEN field itself, the remaining TYPE-specific header fields and
     the unit payload (text strings and modifiers, if any).  The first
     byte of the unit (U, R and TYPE) is not counted.
    
     In order to calculate the LEN value, the text sample length in the 
     Sample Size Box of 3GP files is used (see Section 12.3).   
      
     For live streaming, both sample length and the LEN value for the 
     current fragment MUST be calculated during the sampling process or 
     during fragmentation.   
    
     In general, LEN may take the following values: 
    
      - TYPE = 1, LEN >= 8,  
      - TYPE = 2, LEN > 9,
      - TYPE = 3, LEN > 6,
      - TYPE = 4, LEN > 6 and,  
      - TYPE = 5, LEN > 3.  
      
     The following subsections specify the payload header for each
     value of TYPE.
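
   As an illustration of the common fields, the following sketch
   decodes the first three bytes of a unit.  The class and function
   names are hypothetical.  The minimum LEN values are derived from
   the header definitions in this section (e.g. LEN > 9 for TYPE 2
   means at least 10).

```python
import struct
from typing import NamedTuple

# Smallest LEN value allowed for each defined TYPE.
MIN_LEN = {1: 8, 2: 10, 3: 7, 4: 7, 5: 4}


class CommonHeader(NamedTuple):
    utf16: bool     # U bit: False = UTF-8, True = UTF-16
    reserved: int   # R bits, ignored by receivers
    unit_type: int  # TYPE
    length: int     # LEN


def parse_common_header(buf: bytes) -> CommonHeader:
    """Decode the first three bytes of a unit (U, R, TYPE, LEN)."""
    if len(buf) < 3:
        raise ValueError("need at least 3 bytes")
    first, length = struct.unpack("!BH", buf[:3])   # network byte order
    unit_type = first & 0x07
    if unit_type not in MIN_LEN:
        raise ValueError(f"reserved TYPE value {unit_type}")
    if length < MIN_LEN[unit_type]:
        raise ValueError(f"LEN {length} too small for TYPE {unit_type}")
    # The U bit is only meaningful in TYPE 1 and TYPE 2 headers.
    utf16 = bool(first >> 7) and unit_type in (1, 2)
    return CommonHeader(utf16, (first >> 3) & 0x0F, unit_type, length)
```

   For example, the bytes 0x81 0x00 0x08 decode to U=1, R=0, TYPE=1 and
   LEN=8, i.e. an empty UTF-16 text sample.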
    
3.1.2. TYPE 1 Header  
    
       0                   1                   2                   3 
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |U|   R   |TYPE |       LEN  (always >=8)       |    SIDX       | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |                      SDUR                     |     TLEN      | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |      TLEN     | 
      +-+-+-+-+-+-+-+-+ 
    
   This header type is used to transport whole text samples.  If several 
   text samples are sent in an RTP packet, every sample is preceded by 
   its own header (see Figure 4). 
    
   Note that empty text samples are also considered whole text samples,
   although they do not contain sample contents.  Empty text samples may 
   be used to clear the display or to put an end to samples of unknown 
   duration, for example.  Units without sample contents SHALL have a 
   LEN field value of 8 (0x0008).   
    
   The fields above have the following meaning: 
    
   o U, R and TYPE as defined above. 
    
   o SIDX (8 bits) "Text Sample Entry Index": this is an index used to 
     identify the sample descriptions.   
    
     The SIDX field is used to find the sample description 
     corresponding to the unit's payload.  There are two types of SIDX 
     values: static and dynamic. 
    
     Static SIDX values are used to identify sample descriptions that 
     MUST be sent out-of-band and MUST remain active during the whole 
     session.  The transport of sample descriptions out-of-band is a 
     MANDATORY feature.  A static SIDX value is unequivocally linked to 
     one particular sample description during the whole session.
     Carrying many sample descriptions out-of-band SHOULD be avoided,
     since these may become large and, ultimately, transport is not
     the goal of the out-of-band channel.  Thus, this feature is
     RECOMMENDED for transporting those sample descriptions that 
     provide a set of minimum default format settings.  Static SIDX 
     values MUST fall in the (inclusive) interval [129,254].   
      
     Dynamic SIDX values are used for sample descriptions sent in-band.  
     Sample descriptions MAY be sent in-band for several reasons:
     because they are generated in real time, for transport resiliency 
     or both.  A dynamic SIDX value is unequivocally linked to one 
     particular sample description during the period in which this is 
     active in the session and it SHALL NOT be modified during that 
     period.  This period MAY be smaller than or equal to the session 
     duration.  This period is not known a priori.  A maximum of 64
     active dynamic SIDX values is allowed at any moment.  Dynamic
     SIDX values MUST fall in the inclusive interval [0,127].  This
     should be enough for both recorded content and live streaming
     applications.  Nevertheless, a wrap-around mechanism is provided
     in Section 12.1 to handle sessions where more than 64 SIDX values
     might be needed.
     Servers MAY make use of dynamic sample descriptions.  Clients MUST 
     be able to receive and interpret dynamic sample descriptions.   
      
     Finally, SIDX values 128 and 255 are reserved for future use. 
    
   o SDUR (24 bits) "Text Sample Duration": indicates the sample 
     duration in RTP timestamp units of the text sample.  For this 
     field, a length of 3 bytes is preferred to 2 bytes.  This is 
     because, for a typical clockrate of 1000 Hz, 16 bits would allow 
     for a maximum duration of just 65 seconds, which might be too 
     short for some streams.   
      
     Apart from defining the time period during which the text is 
     displayed, the duration field is also used to find the timestamp 
     of any subsequent units within the RTP packet. 
        
     Text samples generally have a known duration at the time of
     transmission.  However, in some cases such as live streaming, the
     length of time for which a text piece shall be presented might not
     be known a priori.  For this case, the value SDUR=0 (0x000000) is
     reserved to signal unknown duration.  However, upon storing a text
     sample with SDUR=0 in a 3GP file, the SDUR value MUST be changed
     to the effective duration of the text sample, which MUST always be
     greater than zero (note that the ISO file format [2] explicitly
     forbids a sample duration of zero).  The effective duration MUST
     be calculated as the timestamp difference between the current 
     sample (with unknown duration) and the next text sample that is 
     displayed. 
      
     The next example illustrates how units of unknown duration MUST be
     presented.  If no subsequent text sample is available, it is an
     implementation issue what should be displayed.  E.g. a server
     could send an empty sample to clear the text box.
      
        Let us revisit a previous example: imagine now you are in an
        airport watching the latest news report while you wait for your 
        plane.  Airports are loud, so the news report is transcribed in 
        the lower area of the screen.  This area displays two lines of 
        text: the headlines and the words spoken by the news speaker.  
        As usual, the headlines are shown for a longer time than the 
        rest.  This time is, in principle, unknown to the stream server,
        which is streaming live.  A headline is just replaced when the 
        next headline is received.   
      
     Note that samples of unknown duration SHALL NOT use features, such 
     as scrolling or karaoke, which require knowledge of the duration 
     of the sample up front.  Furthermore, only sample descriptions 
     (TYPE 5) MAY follow units of unknown duration in the same 
     aggregate payload.  Otherwise, it would not be possible to 
     calculate the timestamp.   
    
     For text stored in 3GP files, see Section 12.3 for details on how 
     to extract the duration value.  For live streaming, live encoders 
     SHALL assign appropriate values and units according to [1] and 
     later releases. 
    
   o TLEN (16 bits), "Text String Length", is a byte-count of the text 
     string.  The text string length is needed by the decoder to know 
     where the modifiers in the payload start.  TLEN is not present in
     text string fragments (TYPE 2) since it can be derived from the
     LEN values of the fragments.
      
     Please refer to Section 12.3 about how to obtain the text string 
     length from a 3GP file. 
      
   o Finally, the sample contents following the TLEN field consist of
     a string of characters encoded using either UTF-8 or UTF-16, 
     followed by zero or more modifiers.  
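
   The TYPE 1 layout and the SIDX and SDUR rules above can be
   illustrated with the following sketch.  All function names are
   hypothetical, UTF-16 strings are assumed to be big-endian, and the
   modifiers are treated as an opaque byte string.

```python
import struct
from typing import Tuple


def classify_sidx(sidx: int) -> str:
    """Map a SIDX value to 'dynamic', 'static' or 'reserved'."""
    if not 0 <= sidx <= 255:
        raise ValueError("SIDX is an 8-bit field")
    if sidx <= 127:
        return "dynamic"
    if sidx in (128, 255):
        return "reserved"
    return "static"                      # 129..254


def build_type1_unit(text: str, modifiers: bytes, sidx: int,
                     sdur: int, utf16: bool = False) -> bytes:
    """Serialize one whole text sample as a TYPE 1 unit (no BOM sent)."""
    if classify_sidx(sidx) == "reserved":
        raise ValueError("reserved SIDX value")
    if not 0 <= sdur < 2**24:
        raise ValueError("SDUR must fit in 24 bits")
    string = text.encode("utf-16-be" if utf16 else "utf-8")
    length = 8 + len(string) + len(modifiers)   # LEN of an empty unit is 8
    if length > 0xFFFF:
        raise ValueError("sample too large for the 16-bit LEN field")
    first = (int(utf16) << 7) | 1               # U bit, R = 0, TYPE = 1
    header = struct.pack("!BHB", first, length, sidx)
    header += sdur.to_bytes(3, "big") + struct.pack("!H", len(string))
    return header + string + modifiers


def parse_type1_unit(unit: bytes) -> Tuple[bool, int, int, str, bytes]:
    """Return (U, SIDX, SDUR, text string, modifiers) of a TYPE 1 unit."""
    first, length, sidx = struct.unpack("!BHB", unit[:4])
    sdur = int.from_bytes(unit[4:7], "big")
    (tlen,) = struct.unpack("!H", unit[7:9])
    end = 1 + length                            # first byte is not in LEN
    utf16 = bool(first >> 7)
    text = unit[9:9 + tlen].decode("utf-16-be" if utf16 else "utf-8")
    return utf16, sidx, sdur, text, unit[9 + tlen:end]


def effective_duration(ts_sample: int, ts_next: int) -> int:
    """Duration to store in a 3GP file for a sample sent with SDUR=0."""
    duration = (ts_next - ts_sample) % 2**32
    if duration == 0:
        raise ValueError("a stored sample duration must be greater than zero")
    return duration
```

   For instance, build_type1_unit("", b"", sidx=129, sdur=0) yields a
   9-byte unit whose LEN field is 8, i.e. an empty text sample of
   unknown duration that refers to a static sample description.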
    
3.1.3. TYPE 2 Header  
    
       0                   1                   2                   3 
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |U|   R   |TYPE |          LEN (always >9)      | TOTAL | THIS  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |                    SDUR                       |    SIDX       | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |               SLEN            | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
   This header type is used to transport either a whole text string or a 
   fragment of it.  TYPE 2 units SHALL NOT contain modifiers.  In 
   detail: 
    
   o The U, R, TYPE, SIDX, and SDUR fields have a similar interpretation
     as above.  The U, SIDX and SDUR fields are meaningful since 
     partial text strings can also be displayed. 
    
   o The LEN field (16 bits) has the same meaning as above.  For this 
     header, it MUST always be greater than nine (0x0009). 
    
     As mentioned in Section 2.3.3 text strings MUST be split at 
     character boundaries to allow the display of text fragments.
     Hence, a text fragment MUST contain at least one character in
     either UTF-8 or UTF-16.  In practice, this is just a formality,
     since following the fragmentation guidelines should yield much
     larger fragments.
      
     Note also that TYPE 2 units do not contain an explicit text
     string length (like TLEN in TYPE 1).  If all TYPE 2 units of a
     sample are received, then the original string length can be
     derived from the LEN values of the fragments.
    
   o The SLEN field (16 bits) indicates the size (in bytes) of the 
     original (whole) text sample to which this fragment belongs.  This 
     length comprises the text string plus any modifier boxes present 
     (and does not include the byte order mark and the text string 
     length as mentioned in the terminology section).   
    
     For stored content, see Section 12.3 for details on how to find 
     the SLEN value in a 3GP file.  For live content, the SLEN MUST be 
     obtained during the sampling process.   
    
     Clients MAY use SLEN to reserve buffer space for the remaining
     fragments of a text sample.
    
   o The fields TOTAL (4 bits) and THIS (4 bits) indicate the total
     number of fragments into which the original text sample (i.e. text
     string and its modifiers) has been split and the position of the
     current fragment in that sequence, respectively.  The usual "byte
     offset" field is not used here for two reasons: a) it would take
     one more byte and b) it does not provide any information on the
     character offset.  UTF-8/UTF-16 text strings have, in general, a
     variable character length ranging from 1 to 6 bytes.  Therefore,
     the TOTAL/THIS solution is preferred.  It could also be argued
     that the LEN and SLEN fields could be used for this purpose, but
     while they would provide information about the completeness of the
     text sample, they do not specify the order of the fragments.
    
   o Finally, the sample contents following the SLEN field consist of 
     a fragment of the UTF-8/UTF-16 character string.  Again, no 
     modifiers follow. 
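
   The character-boundary rule and the ordering role of TOTAL/THIS can 
   be illustrated with a minimal Python sketch.  The function names, 
   the max_bytes budget and the (total, this, data) tuple are 
   assumptions made for this example and are not defined by this 
   payload format.  Only UTF-8 is handled, and THIS is assumed to count 
   from 1 up to TOTAL. 

```python
def split_utf8(text_bytes, max_bytes):
    """Split UTF-8 bytes into chunks of at most max_bytes each,
    never cutting inside a multi-byte character."""
    if max_bytes < 6:
        # The text above allows characters of up to 6 bytes.
        raise ValueError("max_bytes must hold at least one character")
    chunks = []
    start = 0
    while start < len(text_bytes):
        end = min(start + max_bytes, len(text_bytes))
        # Back up while 'end' points at a continuation byte (10xxxxxx).
        while end < len(text_bytes) and (text_bytes[end] & 0xC0) == 0x80:
            end -= 1
        if end == start:
            raise ValueError("input is not valid UTF-8")
        chunks.append(text_bytes[start:end])
        start = end
    return chunks


def reassemble(fragments):
    """fragments: iterable of (total, this, data) tuples (hypothetical
    representation).  Returns the joined bytes, or None if a fragment
    is still missing."""
    by_position = {}
    total = None
    for frag_total, this, data in fragments:
        if total is None:
            total = frag_total
        elif frag_total != total:
            raise ValueError("inconsistent TOTAL values")
        by_position[this] = data
    if total is None or set(by_position) != set(range(1, total + 1)):
        return None
    # THIS gives the order; arrival order does not matter.
    return b"".join(by_position[i] for i in range(1, total + 1))
```

   Because reassemble() orders the fragments by THIS, they may arrive 
   in any order.  LEN and SLEN alone would only reveal whether the 
   sample is complete. 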
    
3.1.4. TYPE 3 Header  
    
       0                   1                   2                   3 
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |U|   R   |TYPE |        LEN( always >6)        |TOTAL  |  THIS | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |                      SDUR                     | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
   This header type is used to transport either the entire modifier 
   contents present in a text sample or just the first fragment of 
   them.  This depends on whether the modifier boxes fit in the current 
   RTP payload.   
    
   If a text sample containing modifiers is fragmented, this header 
   MUST be used to transport the first fragment or, if possible, the 
   complete modifiers. 
    
   In detail: 
    
   o The U, R, TOTAL/THIS and LEN fields are used as above.  The LEN 
     field MUST be greater than six (0x0006). 
    
   o The TOTAL/THIS field has the same meaning as for TYPE 2.  The THIS 
     field counts the units (TYPE 2, TYPE 3, TYPE 4) used for 
     fragmenting a text sample.  Therefore, the last (trailing) 
     modifier fragment is transported in a unit in which TOTAL=THIS.  
     In this case, TOTAL=THIS MUST be greater than one, because TOTAL 
     indicates the total number of fragments of the text sample, which 
     is, logically, always larger than one.   
      
     Otherwise, if TOTAL is different from THIS in a TYPE 3 unit, this 
     unit just contains the first fragment of the modifiers. 
      
   o The SDUR has the same definition as above.  Since the fragments 
     are always transported in their own RTP packets, this field is 
     only needed to know how long this fragment is valid.  This may, 
     e.g., be used to determine how long it should be kept in the 
     display buffer.   
    
   Note that the SLEN and SIDX fields are not present.  This is because: 
   a) these fragments do not contain text strings and b) these types of 
   fragments are applied over text string fragments, which already 
   contain this information.   
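
   As an illustration of the bit layout drawn above, the following 
   Python sketch unpacks the seven header bytes of a TYPE 3 unit and 
   applies the two checks stated in the text (LEN greater than six, 
   TOTAL greater than one).  The function name and the returned 
   dictionary are assumptions made for this example.  Network byte 
   order is assumed, as is usual for RTP. 

```python
import struct


def parse_type3_header(buf):
    """Unpack U, R, TYPE, LEN, TOTAL, THIS and SDUR from the first
    seven bytes of a unit (hypothetical helper, not a normative parser)."""
    if len(buf) < 7:
        raise ValueError("need at least 7 header bytes")
    # 1 byte U/R/TYPE, 2 bytes LEN, 1 byte TOTAL/THIS, network order.
    first, length, total_this = struct.unpack("!BHB", buf[:4])
    header = {
        "U": first >> 7,             # 1 bit
        "R": (first >> 3) & 0x0F,    # 4 bits
        "TYPE": first & 0x07,        # 3 bits
        "LEN": length,               # 16 bits
        "TOTAL": total_this >> 4,    # 4 bits
        "THIS": total_this & 0x0F,   # 4 bits
        "SDUR": int.from_bytes(buf[4:7], "big"),  # 24 bits
    }
    if header["LEN"] <= 6:
        raise ValueError("LEN MUST be greater than six")
    if header["TOTAL"] <= 1:
        raise ValueError("TOTAL MUST be greater than one")
    return header
```

   A receiver could compare THIS with TOTAL in the returned dictionary 
   to tell whether the unit carries the trailing modifier fragment. 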
    
3.1.5. TYPE 4 Header  
    
       0                   1                   2                   3 
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |U|   R   |TYPE |        LEN( always >6)        |TOTAL  |  THIS | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |                      SDUR                     | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
   This header type is prepended to modifier fragments, other than the 
   first one.   
    
   The U, R, TOTAL/THIS and LEN fields are used as above.  The LEN field 
   MUST be greater than six (0x0006). 
    
   Regarding the SDUR field and the absence of the SLEN and SIDX fields, 
   the same reasoning as for TYPE 3 applies. 
    
     
3.1.6. TYPE 5 Header 
    
       0                   1                   2                   3 
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |U|   R   |TYPE |      LEN( always >3)          |   SIDX        | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
   This header type is used to transport (dynamic) sample descriptions.  
   The LEN field MUST be greater than three (0x0003).  Every sample 
   description MUST have its own TYPE 5 header. 
    
   The SIDX value is in the (inclusive) interval [0,127], as this unit 
   contains a dynamic sample description. 
    
   Senders MAY make use of TYPE 5 units.  All receivers MUST implement 
   support for TYPE 5 units, since this adds minimal complexity and may 
   increase the robustness of the streaming session.   
    
   Finally, if sample descriptions for a given SIDX value are not 
   available at the receiver, it is a matter of implementation whether 
   the text sample contents are displayed.  For example, an application 
   MAY provide a static default sample description to be used for these 
   cases.  This is, however, an implementation issue and out of the 
   scope of this document. 
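
   The following Python sketch shows one way a receiver might handle 
   TYPE 5 units: it parses the four header bytes, checks the LEN and 
   SIDX constraints stated above, and stores the description under its 
   SIDX.  The function names, the dictionary used as a store and the 
   optional default description are assumptions made for this example.  
   The caller is assumed to pass exactly one unit, so that everything 
   after the header is the sample description. 

```python
import struct


def store_type5_unit(unit, table):
    """Parse one TYPE 5 unit and record its sample description in
    'table' (a plain dict keyed by SIDX; hypothetical store)."""
    if len(unit) < 4:
        raise ValueError("need at least 4 header bytes")
    first, length, sidx = struct.unpack("!BHB", unit[:4])
    if length <= 3:
        raise ValueError("LEN MUST be greater than three")
    if sidx > 127:
        raise ValueError("dynamic SIDX must be in [0,127]")
    table[sidx] = unit[4:]
    return sidx


def lookup_description(table, sidx, default=None):
    """Return the stored description, or an application-chosen default
    when none has been received for this SIDX."""
    return table.get(sidx, default)
```

   Whether a default description is supplied at all remains an 
   implementation choice, as noted above. 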
    
    
4. Resilient Transport 
    
   Apart from the basic fragmentation guidelines described in the 
   section above, the simplest option for packet loss resilient 
   transport is repetition.   
    
   A server MAY decide to use repetition as a measure for packet loss 
   resilience.  To this end, a server MAY resend complete RTP packet 
   payloads or just parts of them, i.e. single units.   
    
   As for the case of complete payloads, single repeated units MUST 
   match exactly the same units sent in the first transmission, i.e. if 
   fragmentation is needed it SHALL be performed only once for each text 
   sample.  Only then can a receiver use the already received and the 
   repeated units to reconstruct the original text samples.  Since the 
   RTP timestamp is used to group together the fragments of a sample, 
   care must be taken to preserve the timing of units when constructing 
   new RTP packets.  
    
        E.g. if a text sample was originally sent as a single non-
        fragmented text sample (one TYPE 1 unit), a repetition of that 
        sample MUST be sent also as a single non-fragmented text sample 
        in one unit.  Likewise, if the original text sample was 
        fragmented and spread over several RTP packets, say a total of 3 
        units, then the repeated fragments SHALL also have the same byte 
        boundaries and use the same unit headers and bytes per fragment.   
     
   With repetition, repeated units resolve to the same timestamp as 
   their originals.  Where redundant units are available, the receiver 
   SHOULD use those units received in the RTP packet with the highest 
   sequence number.   
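
   The receiver rule above can be sketched in Python as follows.  The 
   tuple layout (sequence number, timestamp, THIS, payload) and the use 
   of (timestamp, THIS) as the key identifying a unit are assumptions 
   made for this example.  Wrap-around of the 16-bit RTP sequence 
   number is deliberately ignored to keep the sketch short. 

```python
def select_units(received):
    """received: iterable of (seq, timestamp, this, payload) tuples.
    For every unit, keep the copy that arrived in the packet with the
    highest sequence number.  Returns {(timestamp, this): (seq, payload)}."""
    best = {}
    for seq, timestamp, this, payload in received:
        key = (timestamp, this)
        # A repeated unit resolves to the same timestamp as its original,
        # so original and repetition share one key.
        if key not in best or seq > best[key][0]:
            best[key] = (seq, payload)
    return best
```

   Since repeated units must match the originals exactly, either copy 
   carries the same bytes.  The sketch only fixes which one is used. 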
    
   Regarding the RTP header fields: 
    
   o if the whole RTP payload is repeated, all payload-specific fields 
     in the RTP header (the M, TS and PT fields) MUST keep their 
     original values except the sequence number that MUST be 
     incremented to comply with RTP.   
    
   o in packets containing single repeated units, the general rules in 
     Section 3 for assigning values to the RTP header fields apply.  
     Particularly relevant here is to keep the value of the RTP 
     timestamp to preserve the timing of the units. 
    
   Apart from repetition, other mechanisms such as FEC [7], 
   retransmission [11] or similar techniques SHOULD be used to cope with 
   packet losses.   
    
    
5. Congestion control 
    
   Congestion control for RTP SHALL be implemented in accordance with 
   RTP [3], and the applicable RTP profile, e.g. RTP/AVP [17].  The RTP 
   profile under which this payload format is used defines an 
   appropriate congestion control mechanism in different environments.  
   Following the rules under the profile, an RTP application can 
   determine its acceptable bitrate and packet rate in order to be fair 
   to other TCP or RTP flows. 
    
    
6. Scene Description 
    
6.1. Text rendering position and composition 
    
   In order to set up a timed text session, regardless of the stream 
   being stored in a 3GP file or streamed live, some initial layout 
   information is needed by the communicating peers.  Some parameters 
   are used to provide the client with information on the stream 
   properties.  These are the "width" and "height" of the text area, its 
   position and its "layer" or proximity to the user.  At the same time, 
   the server needs to know the client's capabilities, i.e. the maximum 
   allowable values for the text track height and width: "max-h" and 
   "max-w".   
    
   These pieces of information MUST be conveyed in a reliable form 
   prior to the start of the session, e.g. during session 
   announcement or in an O/A exchange.  An example of a reliable 
   transport may be the out-of-band channel used for SDP.  Sections 7 
   and 8 provide details on the usage in SDP descriptions. 
    
   For stored content, the layout values expressing stream properties 
   MUST be obtained from the Track Header Box.  See Section 12.3 for 
   details on finding these values in a 3GP file.   
    
   For live streaming, appropriate values SHALL be used. 
    
6.2. SMIL usage 
    
   Note that the attributes contained in the Track Header Boxes of a 3GP 
   file only specify the spatial relationship of the tracks within the 
   given 3GP file.  If several media streams are sent, they require 
   spatial synchronization.  For this purpose, SMIL [9] SHOULD be used.   
    
   SMIL assigns regions in the display to each of those files and places 
   the tracks within those regions.  The original track header 
   information is used for each track within its region.  Therefore, 
   even if SMIL scene description is used, the track header information 
   pieces SHOULD be sent anyway as they represent the intrinsic media 
   properties.   
    
   See [1] and the 3GPP SMIL Language Profile in [16] for details. 
    
    
7. MIME Type usage Registration 
    
7.1. 3GPP Timed Text MIME Registration 
         
   The MIME subtype for the 3GPP Timed Text codec is allocated from the 
   IETF tree.  The MIME top-level type under which this payload format 
   is registered is 'text'.   
    
   The receiver MUST ignore any unspecified parameter. 
    
   MIME Type: text 
    
   MIME subtype: 3gpp-tt 
    
   Required parameters 
    
        rate:  
                the RTP timestamp clockrate is equal to the clockrate of 
                the media.  If RTP packets are generated out of a 3GP 
                file, the clockrate of the text media MUST be copied 
                from the 3GP file, i.e. the clockrate is the value of 
                "timescale" parameter in the Media Header Box describing 
                that text track.  Other tracks (audio/video/text) in the 
                3GP file may have their own clockrates as indicated in 
                their corresponding Media Header Box.  For live 
                encoding, a clockrate of 1000 Hz is RECOMMENDED but 
                other values MAY be used.  If a different timestamp rate 
                value is used, this MUST be chosen carefully as per 
                Section 3 in RFCXXXX.  
                 
        sver: 
                The parameter "sver" contains a list of supported 
                backwards-compatible versions of the timed text format 
                specification (3GPP TS 26.245) that the sender is 
                willing to receive (and which are the same as those it 
                would be willing to send).  The first value is the one 
                preferred for receiving (or for sending).  The first 
                value MAY be followed by a comma-separated list of 
                versions that SHOULD be used as alternatives.  The 
                order is meaningful: the first entry is the most 
                preferred and the last the least preferred.  Each entry 
                has the format 
                Zi(xi*256+yi), where "Zi" is the number of the Release, 
                "xi" and "yi" are taken from the 3GPP specification 
                version, i.e. vZi.xi.yi.  For example, for 3GPP TS 
                26.245 v6.0.0, Zi(xi*256+yi)=6(0), the version value is 
                "60".  (Note that "60" is the concatenation of the 
                values Zi=6 and (xi*256+yi)=0 and not their product.) 
    
        width: 
                This parameter indicates the width in pixels of the text 
                track or area of the text being sent.  This is a 16 bit 
                integer.   
    
        height: 
                This parameter indicates the height in pixels of the 
                text track being sent.  This is a 16 bit integer. 
                 
        max-w: 
                This parameter indicates display capabilities.  This is 
                the maximum "width" value that the sender of this 
                parameter supports.  This is a 16 bit unsigned integer. 
    
        max-h: 
                This parameter indicates display capabilities.  This is 
                the maximum "height" value that the sender of this 
                parameter supports.  This is a 16 bit unsigned integer. 
                 
        tx: 
                This parameter indicates the horizontal translation 
                offset in pixels of the text track with respect to the 
                origin of the video track.  This is a 16 bit integer. 
    
        ty: 
                This parameter indicates the vertical translation offset 
                in pixels of the text track with respect to the origin 
                of the video track.  This is a 16 bit integer. 
    
        layer:  
                This parameter indicates the proximity of the text track 
                to the viewer.  More negative values mean closer to the 
                viewer.  This parameter has no units.  This is a 16 bit 
                integer. 
    
   Optional parameters: 
    
    
        tx3g:  
                This parameter MUST be used for conveying sample 
                descriptions out-of-band.  It contains a comma-separated 
                list of base64-encoded entries.  The entries of this 
                list MAY appear in any order and the list MAY be 
                empty.  The absence of this parameter is 
                equivalent to an empty list of sample descriptions.  
                Each entry is the result of running base64 encoding over 
                the concatenation of the SIDX and the sample description 
                for that SIDX, in this order.  The format of a sample 
                description entry can be found in 3GPP TS 26.245 Release 
                6 and later releases.  All servers and clients MUST 
                understand this parameter and MUST be capable of using 
                the sample description(s) contained in it.  Please refer 
                to RFC 3548 [6] for details on the base64 encoding. 
         
   Encoding considerations:  
    
        RTP payloads complying with this payload format contain binary 
        data.   
         
        Note that this type is incompatible with the use of text media 
        types in other protocols, e.g. text/html.  This is because, in 
        order to extract and decode any of the timed text media, it is 
        necessary to understand the (binary) payload headers defined in 
        RFCXXXX. 
         
   Restrictions on usage: 
    
        This type is only defined for transfer via RTP. 
         
   Security considerations:  
    
        Please refer to Section 10 of RFCXXXX. 
    
   Interoperability considerations:  
    
        The 3GPP Timed Text media format for which this payload format 
        is defined is specified in Release 6 of 3GPP TS 26.245 
        "Transparent end-to-end packet switched streaming service (PSS); 
        Timed Text Format (Release 6)".  The 3GPP file format (3GP) and 
        the SMIL language profile used can be found in Release 5 of 3GPP 
        TS 26.234 and in the corresponding specifications for later 
        Releases.  Note also that 3GPP may in future Releases specify 
        extensions or updates to the media format in a backwards-
        compatible way, e.g. new modifier boxes or extensions to the 
        sample descriptions.  The payload format defined in RFCXXXX 
        allows for such extensions.  For future 3GPP Releases of the 
        Timed Text Format, the parameter "sver" is used to identify the 
        exact specification used. 
    
   Published specification: RFC XXXX  
    
   Applications which use this media type:  
         
        Multimedia streaming applications. 
    
   Additional information:  
         
        The 3GPP Timed Text media format is specified in 3GPP TS 26.245 
        "Transparent end-to-end packet switched streaming service (PSS); 
        Timed Text Format (Release 6)".  This document and future 
        extensions to the 3GPP Timed Text format are publicly available 
        at http://www.3gpp.org.  
         
        Magic number(s): None. 
         
        File extension(s): None. 
         
        Macintosh File Type Code(s): None. 
         
   Person & email address to contact for further information: 
         
        Jose Rey, rey@panasonic.de 
        Yoshinori Matsui, matsui.yoshinori@jp.panasonic.com 
        Audio/Video Transport Working Group. 
         
   Intended usage: COMMON 
         
   Author/Change controller:  
         
        Jose Rey 
        Yoshinori Matsui 
        IETF AVT WG 
    
    
8. SDP usage 
    
8.1. Mapping to SDP 
    
   The information carried in the MIME media type specification has a 
   specific mapping to fields in SDP [4].  If SDP is used to specify 
   sessions using this payload format, the mapping is done as follows: 
    
   o The MIME type ("text") goes in the SDP "m=" as the media name.   
    
       m=text <port number> RTP/<RTP profile> <dynamic payload type> 
    

   o The MIME subtype ("3gpp-tt") and the timestamp clockrate "rate" 
     (1000 Hz or other) go in SDP "a=rtpmap" line as the encoding name 
     and rate, respectively: 
    
       a=rtpmap:<payload type> 3gpp-tt/1000 
    
   o The MANDATORY fmtp parameters "sver", "width", "height", "max-h", 
     "max-w", "tx", "ty" and "layer" go in the SDP "a=fmtp" attribute 
     by copying them directly from the MIME media type string as a 
     semicolon separated list of parameter=value pairs. 
    
   o The OPTIONAL parameter "tx3g" goes in the SDP "a=fmtp" attribute 
     by copying it directly from the MIME media type string as a 
     semicolon separated list of values: 
    
       a=fmtp:<dynamic payload type> <parameter name>=<value>[, 
       <parameter name>=<value>] 
    
   o Any unknown parameter SHALL be ignored. 
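
   The mapping above can be illustrated with a short Python sketch 
   that builds the three SDP lines and one "tx3g" entry.  The function 
   names, the port number, the RTP/AVP profile and the placeholder 
   sample description bytes are assumptions made for this example.  
   The SIDX is assumed to occupy one byte, matching the 8-bit SIDX 
   field of the unit headers. 

```python
import base64


def tx3g_entry(sidx, sample_description):
    """base64 over the concatenation of SIDX (assumed to be one byte)
    and the sample description bytes, in this order."""
    return base64.b64encode(bytes([sidx]) + sample_description).decode("ascii")


def build_sdp(port, payload_type, rate, params, tx3g=()):
    """Return the m=, a=rtpmap and a=fmtp lines for one text stream.
    'params' maps fmtp parameter names to values, in the wanted order."""
    fmtp = "; ".join(f"{name}={value}" for name, value in params.items())
    if tx3g:
        # tx3g carries a comma-separated list of base64 entries.
        fmtp += "; tx3g=" + ",".join(tx3g)
    return [
        f"m=text {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} 3gpp-tt/{rate}",
        f"a=fmtp:{payload_type} {fmtp}",
    ]


# Hypothetical values; the two description bytes are placeholders only.
entry = tx3g_entry(129, b"\x00\x01")
lines = build_sdp(
    49170, 98, 1000,
    {"tx": 100, "ty": 100, "layer": 0, "height": 80, "width": 100,
     "max-h": 120, "max-w": 160, "sver": "6256,60"},
    [entry],
)
```

   A real sample description is considerably longer than the two 
   placeholder bytes used here. 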
    
8.2. Parameter Usage in the SDP Offer/Answer Model 
    
   In this section the meaning of the SDP parameters defined in this 
   document within the Offer/Answer (O/A) [13] context is explained. 
    
   In unicast, sender and receiver typically negotiate the streams, i.e. 
   which codecs and parameter values are used in the session.  This is 
   also possible in multicast to a lesser extent. 
    
   As stated in the O/A model, some "fmtp" (payload-format-specific) 
   parameters have a clear meaning and shall be included in the answer 
   as present in the offer.  Other parameters may need to be set among 
   parties, because it is not clear that offerer and answerer shall use 
   the same values.  A third type of parameters may take different 
   values for offerers and answerers.  Additionally, the meaning of the 
   parameter MAY vary depending on the direction in which it is used.  
   In the following sections, a "<directionality> offer" means an offer 
   that contains a stream set to <directionality>.  <directionality> 
   may take the values sendrecv, sendonly and recvonly.  Similar 
   considerations apply for answers.  E.g. an answer to a sendonly 
   offer is a recvonly answer. 
    
   These considerations apply to both 3GP file streaming and live 
   streaming scenarios. 
    
8.2.1. Unicast Usage 
    
   The following types of parameters are used in this payload format: 
    
     1. Declarative parameters: offerer and answerer declare the values 
        they will use for the incoming (sendrecv/recvonly) or outgoing 
        (sendonly) stream.  They MAY use different values.   
           
     
          a. "tx", "ty" and "layer": these are parameters describing 
             where the text track is placed.  Depending on the 
             directionality: 
              i. MUST appear in all sendrecv offers and answers and in 
                  all recvonly offers (thus applying to the incoming 
                  stream). 
             ii. MUST appear in all recvonly answers.  In this case, 
                  the answerer MAY accept the proposed values for the 
                  incoming stream or respond with different ones.  The 
                  offerer MUST use the returned values.   
            iii. MAY appear in sendonly offers and MUST appear in 
                  sendonly answers.  In offers they specify the values 
                  that the offerer proposes for sending (useful for 
                  visually composed streams, see below).  In answers 
                  these values MUST be copied from the corresponding 
                  recvonly offer upon accepting the stream (thus 
                  acknowledging the use of such).  
           
     2. Parameters describing the display capabilities, which MUST be 
        included in all offers and answers where "tx" and "ty" refer to 
        the incoming stream, thus excluding sendonly offers and answers 
        (see examples below). "max-h" and "max-w" indicate the maximum 
        dimensions of the text track (text display area) for the given 
        "tx" and "ty" values. 
 
     3. Parameters describing the sent stream properties, i.e. the 
        sender of the stream decides upon the values of these: 
      
          a. "width" and "height", specify the text track dimensions.  
             They SHALL ALWAYS be present in sendrecv and sendonly 
             offers and answers.  For recvonly answers, the answerer 
             MUST include the offered parameter values (if any) verbatim 
             in the answer upon accepting the stream.   
           
          b. "tx3g", static sample descriptions, this is an OPTIONAL 
             parameter.  It MAY only be present in sendrecv and sendonly 
             offers and answers. 
      
     4. Negotiable parameters, which MUST be agreed on.  This is the 
        case of "sver".  This parameter MUST be present in every offer 
        and answer.  The answerer SHALL choose one supported value from 
        the offerer's list or else it MUST remove the stream or reject 
        the session.  
      
     5. Symmetric parameters: "rate", timestamp clockrate, belongs to 
        this class.  It must be echoed verbatim in the answer.  
        Otherwise the stream MUST be removed or the session rejected. 
    
   Other observations regarding parameter usage: 
    
     o Translation and transparency values: in sendonly offers "tx", 
        "ty" and "layer" indicate proposed values.  This is useful for 
        visually composed sessions where the different streams occupy 
        different parts of the display, e.g., a video stream and the 
        captions.  These are just suggested values because it is the 
        peer rendering the text that ultimately decides where to place 
        the text track.   
      
     o Text track (area) dimensions, "height" and "width": in the case 
        of sendonly (sendrecv) offers, an answerer accepting the offer 
        MUST be prepared to render (and send) the stream with exactly 
        the same values.  If any of these conditions are not met, the 
        stream MUST be removed or the session rejected. 
 
     o Display capabilities, "max-h" and "max-w": an answerer sending a 
        stream SHALL ensure that the "height" and "width" values in the 
        answer are compatible with the offerer's signalled capabilities.   
 
     o Version handling via "sver": the idea is that offerer and 
        answerer communicate using the same version.  This is achieved 
        by letting the answerer choose from a list of supported 
        versions, "sver".  For recvonly streams, the first value in the 
        list is the preferred version to receive.  Consequently, for 
        sendonly (and sendrecv) streams the first value is the one 
        preferred for sending (and receiving).  The answerer MUST choose 
        one value and return it in the answer.  Upon receiving the 
        answer, the offerer SHALL be prepared to send (sendonly and 
        sendrecv) and receive (recvonly and sendrecv) a stream using 
        that version.  If none of the versions in the list is supported 
        the stream MUST be removed or the session rejected. 
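
   As an illustration of version handling, the following Python sketch 
   forms a "sver" entry from a specification version vZ.x.y, following 
   the Zi(xi*256+yi) format of Section 7.1, and lets an answerer pick 
   one value from the offered list.  The function names are 
   assumptions made for this example.  Choosing the first offered 
   value that the answerer supports is one possible policy: the text 
   only requires that a supported value be chosen. 

```python
def sver_entry(release, x, y):
    """Concatenate (not multiply) the Release number Z and x*256+y."""
    return f"{release}{x * 256 + y}"


def choose_version(offered, supported):
    """offered: the comma-separated "sver" value from the offer, most
    preferred first.  supported: set of entries the answerer supports.
    Returns the chosen entry, or None if the stream must be removed
    or the session rejected."""
    for entry in offered.split(","):
        entry = entry.strip()
        if entry in supported:
            return entry
    return None
```

   For example, v6.0.0 yields "60" and v6.1.0 yields "6256", the two 
   values that appear in the examples of Section 8.3. 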
    
8.2.2. Multicast Usage 
    
   In multicast the parameter usage is similar to the unicast case, 
   except in the following cases: 
    
   o the parameters "tx", "ty" and "layer" in multicast offers only 
     have meaning for sendrecv and recvonly streams.  In order for all 
     clients to have the same view of the session, they MUST be used 
     symmetrically. 
    
   o for "height", "width" and the OPTIONAL "tx3g" (for sendrecv and 
     sendonly), multicast offers specify which values of these 
     parameters the participants MUST use for sending.  Thus, if the 
     stream is accepted, the answerer MUST also include them verbatim 
     in the answer.   
    
   o The capability parameters, "max-h" and "max-w", SHALL NOT be used 
     in multicast.  If the offered text track should change in size, a 
     new offer SHALL be used instead. 
    
   o Regarding version handling: 
 
     In the case of multicast offers, an answerer MAY accept a 
     multicast offer as long as one of the versions listed in the 
     "sver" is supported.  Therefore, if the stream is accepted, the 
     answerer MUST choose its preferred version but, unlike in unicast, 
     the offerer SHALL NOT change the offered stream to this chosen 
     version because there may be other session participants that do 
     support the newer extensions.  Consequently, different session 
     participants may end up using different backwards-compatible media 
     format versions.  It is RECOMMENDED that the multicast offer 
     contains a limited number of versions, in order for all 
     participants to have the same view of the session.  This is a 
     responsibility of the session creator.  If none of the offered 
     versions is supported, the stream SHALL be removed or the session 
     rejected.   
    
8.3. Offer/Answer Examples 
    
   In these unicast O/A examples the long lines are wrapped around.  
   Static sample descriptions are shortened for clarity. 
    
   Sendrecv offer: 
    
   O -> A 
    
   m=text <port> RTP/AVP 98 
   a=rtpmap:98 3gpp-tt/1000 
   a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100; max-h=120; 
   max-w=160; sver=6256,60; tx3g=81... 
   a=sendrecv 
    
   A -> O 
    
   m=text <port> RTP/AVP 98.. 
   a=rtpmap:98 3gpp-tt/1000 
   a=fmtp:98 tx=100; ty=95; layer=0; height=90; width=100; max-h=100; 
   max-w=160; sver=60; tx3g=82... 
   a=sendrecv 
    
   In this example the offerer tells the answerer where it will place 
   the received stream and the maximum height and width allowable for 
   the stream that it will receive.  It also tells the answerer the 
   dimensions of the text track for the stream sent and which sample 
   description it shall use.  It offers two versions, 6256 and 60.  The 
   answerer responds with an equivalent set of parameters for the 
   stream it receives.  In this case the answerer's "max-h" and "max-w" 
   are compatible with the offerer's "height" and "width".  Otherwise, 
   the answerer would have to remove this stream and the offerer would 
   have to issue a new offer taking the answerer's capabilities into 
   account.  Note also that the answerer's text box dimensions fit 
   within the maximum values signalled in the offer.  Finally, the 
   answerer chooses to use version 60 of the timed text format. 
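
   The compatibility checks described for this sendrecv exchange can 
   be sketched in Python as follows.  The function names are 
   assumptions made for this example, and the shortened "tx3g" values 
   are left out because their full contents are not shown above. 

```python
def parse_fmtp(value):
    """Split the semicolon-separated parameter=value pairs of an
    a=fmtp line (without the 'a=fmtp:<pt> ' prefix) into a dict."""
    params = {}
    for item in value.split(";"):
        item = item.strip()
        if item:
            name, _, val = item.partition("=")
            params[name.strip()] = val.strip()
    return params


def fits(stream, capabilities):
    """True if the sender's "height"/"width" do not exceed the
    receiver's "max-h"/"max-w"."""
    return (int(stream["height"]) <= int(capabilities["max-h"])
            and int(stream["width"]) <= int(capabilities["max-w"]))


offer = parse_fmtp("tx=100; ty=100; layer=0; height=80; width=100; "
                   "max-h=120; max-w=160; sver=6256,60")
answer = parse_fmtp("tx=100; ty=95; layer=0; height=90; width=100; "
                    "max-h=100; max-w=160; sver=60")

# The offerer's stream fits the answerer's display and vice versa.
offer_ok = fits(offer, answer)
answer_ok = fits(answer, offer)
# The answered version must be one of those offered.
version_ok = answer["sver"] in offer["sver"].split(",")
```

   With the values of this example all three checks succeed. 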
    
    
   For recvonly: 
    
     
   Offerer -> Answerer 
    
   m=text <port> RTP/AVP 98 
   a=rtpmap:98 3gpp-tt/1000 
   a=fmtp:98 tx=100; ty=100; layer=0; max-h=120; max-w=160; sver=6256,60 
   a=recvonly 
    
   A -> O 
    
   m=text <port> RTP/AVP 98.. 
   a=rtpmap:98 3gpp-tt/1000 
   a=fmtp:98 tx=100; ty=100; layer=0; height=90; width=100; sver=60; 
   tx3g=82... 
   a=sendonly 
    
   In this case, the offer is different from the previous case: it does 
   not include the stream properties: "height", "width" and "tx3g".  The 
   answerer copies the "tx", "ty" and "layer" values, thus acknowledging 
   these.  "max-h" and "max-w" are not present in the answer because the 
   "tx" and "ty" (and "layer") in this special case do not apply to the 
   received, but to the sent stream.  Also, if offerer and answerer had 
   very different display sizes, it would not be possible to express 
   the answerer's capabilities.  In the example above and for an 
   answerer with a 50x50 display, the translation values are already out 
   of range. 
    
    
   For sendonly: 
    
   O -> A 
    
   m=text <port> RTP/AVP 98 
   a=rtpmap:98 3gpp-tt/1000 
   a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100; 
   sver=6256,60; tx3g=81... 
   a=sendonly 
    
    
   A -> O 
    
   m=text <port> RTP/AVP 98.. 
   a=rtpmap:98 3gpp-tt/1000 
   a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100; max-h=100; 
   max-w=160; sver=60 
   a=recvonly 
    
   Note that "max-h" and "max-w" are not present in the offer.  Also, 
   with this answer, the answerer would accept the offer as is (thus 
   echoing "tx", "ty", "height", "width" and "layer") and additionally 
   inform the offerer about its capabilities: "max-h" and "max-w". 
    
   Another possible answer for this case would be: 
    
     
   A -> O 
    
   m=text <port> RTP/AVP 98.. 
   a=rtpmap:98 3gpp-tt/1000 
   a=fmtp:98 tx=120; ty=105; layer=0; max-h=95; max-w=150; sver=60 
   a=recvonly 
    
   In this case the answerer does not accept the values offered.  The 
   offerer MUST use these values or else remove the stream. 
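
   The comparison between offer and answer in the sendonly examples 
   above can be sketched in a few lines of Python.  This is only an 
   illustration: the function names "parse_fmtp" and "echoes_offer" are 
   hypothetical, each "a=fmtp" line is assumed to have been unfolded 
   onto a single line, and the "tx3g" parameter is left out because its 
   value is abbreviated in the examples. 

```python
def parse_fmtp(line):
    """Split one unfolded 'a=fmtp:' line into (payload type, parameters)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    fmt, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:                      # tolerate a trailing ';'
            continue
        name, _, value = item.partition("=")
        params[name.strip()] = value.strip()
    return int(fmt), params


def echoes_offer(offer, answer,
                 names=("tx", "ty", "layer", "height", "width")):
    """True if every listed parameter found in the offer is repeated
    unchanged in the answer (hypothetical acceptance check)."""
    return all(answer.get(n) == offer[n] for n in names if n in offer)


# Values taken from the sendonly offer and its two possible answers.
_, OFFER = parse_fmtp(
    "a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100; sver=6256,60")
_, ACCEPTING = parse_fmtp(
    "a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100; "
    "max-h=100; max-w=160; sver=60")
_, CHANGED = parse_fmtp(
    "a=fmtp:98 tx=120; ty=105; layer=0; max-h=95; max-w=150; sver=60")

print(echoes_offer(OFFER, ACCEPTING))   # first answer echoes the offer
print(echoes_offer(OFFER, CHANGED))     # second answer proposes new values
```

   Parameter values are kept as strings so that list-valued parameters 
   such as "sver" pass through untouched. 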
    
    
8.4. Parameter Usage outside of Offer/Answer 
 
   SDP may also be employed outside of the Offer/Answer context, for 
   instance for multimedia sessions that are announced through the 
   Session Announcement Protocol (SAP) [14], or streamed through the 
   Real Time Streaming Protocol (RTSP) [15].   
    
   In this case, the receiver of a session description MUST support the 
   parameters and given values for the streams or else it MUST reject 
   the session.  It is the responsibility of the sender (or creator) of 
   the session descriptions to define the session parameters so that the 
   probability of unsuccessful session setup is minimized.  How this is 
   done is out of the scope of this document. 
    
    
9. IANA Considerations 
    
   IANA is requested to register the MIME subtype name "3gpp-tt" for the 
   media type "text" as specified in Section 7 of this document.   
    
    
10. Security Considerations 
    
   RTP packets using the payload format defined in this specification 
   are subject to the security considerations discussed in the RTP 
   specification [3] and any applicable RTP profile, e.g. AVP [17]. 
    
   In particular, an attacker may invalidate the current set of valid 
   sample descriptions at the client by repeating a packet with an old 
   sample description, i.e. a replay attack.  As a result, the text 
   would be displayed corrupted, if displayed at all.  Another form of 
   attack may consist of sending redundant fragments whose boundaries 
   do not match the exact boundaries of the originals.  This may cause 
   a decoder to crash. 
    
   These types of attack may easily be avoided by using source 
   authentication and integrity protection.   
    
   Additionally, peers in a timed text session may desire to retain 
   privacy in their communication, i.e. confidentiality.   
    

   This payload format does not provide any mechanisms for achieving 
   these.  Confidentiality, integrity protection and authentication have 
   to be provided by a mechanism external to this payload format, e.g. 
   SRTP [10]. 
    
    
11. References 
    
11.1. Normative References 
    
   [1]  Transparent end-to-end packet switched streaming service (PSS); 
     Timed Text Format (Release 6), TS 26.245 v 0.1.6, Working Draft, 
     July 2003. 
    
   [2]  ISO/IEC 14496-1:2001/AMD5, "Information technology - Coding of 
     audio-visual objects - Part 1: Systems, ISO Base Media File 
     Format", 2003. 
    
   [3]  H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson, "RTP: A 
     Transport Protocol for Real-Time Applications", STD 64, RFC 3550, 
     July 2003. 
    
   [4]  M. Handley, V. Jacobson, "SDP: Session Description Protocol", 
     RFC 2327, April 1998. 
    
   [5]  S. Bradner, "Key words for use in RFCs to indicate requirement 
     levels", BCP 14, RFC 2119, IETF, March 1997. 
    
   [6]  S. Josefsson (Ed.), "The Base16, Base32, and Base64 Data 
     Encodings", RFC 3548, July 2003. 
    
11.2. Informative References 
    
   [7]  J. Rosenberg, H. Schulzrinne, "An RTP Payload Format for Generic 
     Forward Error Correction", RFC 2733, December 1999. 

   [8]  C. Perkins, O. Hodson, "Options for Repair of Streaming Media", 
     RFC 2354, June 1998. 

   [9]  W3C, "Synchronised Multimedia Integration Language (SMIL 2.0)", 
     August, 2001. 

   [10] M. Baugher, D. A. McGrew, D. Oran, R. Blom, E. Carrara, M. 
     Naslund, K. Norrman, "The Secure Real-Time Transport Protocol", 
     RFC 3711, March 2004. 

   [11] J. Rey et al., "RTP Retransmission Payload Format", draft-ietf-
     avt-rtp-retransmission-10.txt, work in progress, January 2004. 

   [12] Van der Meer et al., "RTP Payload Format for Transport of MPEG-4 
     Elementary Streams", RFC 3640, November 2003. 


   [13] J. Rosenberg, H. Schulzrinne, "An Offer/Answer Model with the 
     Session Description Protocol (SDP)", RFC 3264, June 2002. 

   [14] M. Handley, et al., "Session Announcement Protocol", RFC 2974, 
     October 2000.  

   [15] H. Schulzrinne, et al., "Real Time Streaming Protocol (RTSP)", 
     RFC 2326, April 1998. 
    
   [16] Transparent end-to-end packet switched streaming service (PSS); 
     Protocols and codecs (Release 5), TS 26.234 v 5.6.0, Working 
     Draft, September 2003. 
    
   [17] H. Schulzrinne, S. Casner, "RTP Profile for Audio and Video 
     Conferences with Minimal Control", STD 65, RFC 3551, July 2003. 
    
   [18] F. Yergeau, "UTF-8, a transformation format of Unicode and ISO 
     10646", RFC 2044, October 1996. 
    
   [19] P. Hoffman, F. Yergeau, "UTF-16, an encoding of ISO 10646", RFC 
     2781, February 2000.  
 
   [20] Friedman, et al., "RTP Control Protocol Extended Reports (RTCP 
     XR)", RFC 3611, November 2003. 
 
   [21] Ott, et al., "Extended RTP Profile for RTCP-based Feedback 
     (RTP/AVPF)", draft-ietf-avt-rtcp-feedback-09.txt, work in 
     progress, July 2004. 
    
    
12. Annexes  
    
12.1. Dynamic SIDX wrap-around mechanism 
    
   The use of dynamic sample descriptions by senders is OPTIONAL.  
   However, if used, senders MUST implement this mechanism.  Receivers 
   MUST always implement it. 
    
   As mentioned in Section 3.1.2, dynamic SIDX values remain active 
   either during the entire duration of the session (if used just once) 
   or in different intervals of it (if used more than once).  Although 64 
   sample descriptions should cover the needs of most timed text 
   applications, a wrap-around mechanism to handle the exception is 
   described here.  In the following, SIDX value means dynamic SIDX 
   value. 
    
   There is a sliding window of 64 active SIDX values.  Values within 
   the window are active, all others are considered inactive.  An SIDX 
   value becomes "active" if at least one sample description identified 
   by that SIDX has been received.  Since sample descriptions MAY be 
   sent redundantly, it is possible that a client receives a given SIDX 
   several times.  However, the receiver SHALL ignore redundant sample 
   descriptions and it MUST use the already cached copy.  The guard 
   range of inactive values ensures that the correct association 
   SIDX <-> sample description is always used. 
    
   The following algorithm is used to maintain the dynamic SIDX values: 
    
     Let X be the SIDX of the last received sample description.  Let Y  
     be a value within the allowed range for dynamic SIDX: [0,127], and  
     different from X. 
    
        1. Initialize all dynamic SIDX values as inactive.  For stored 
          content, read the sample description index in the Sample to 
          Chunk box ("stsc") for that sample.  For live streaming, the 
          first value MAY be zero or any other value in the interval 
          above.  The initial value is SIDX=X.  Go to step 2. 
        2. First in-band sample description with SIDX=X is received. Go 
          to step 3. 
        3. Set all SIDX=Y inactive if inside the interval [(X+1) 
          modulo 128, (X+64) modulo 128].  Otherwise, set SIDX=Y as 
          active.  Go to step 4. 
        4. Wait for next sample description.  Upon reception of a sample 
          description with SIDX=X do:  
             a. If X is currently active, then wait for next SIDX (do 
               nothing).   
             b. Else go to step 3. 
      
   Example: 
    
        If X=4, any SIDX in the interval [5,68] is inactive.  Active 
        SIDX values are in the complementary interval [69,127] plus 
        [0,4].  Once the client is initialized, the interval of active 
        SIDX values MUST change whenever a sample description with an 
        inactive SIDX value is received.  E.g., if the client receives 
        SIDX=6, then the active interval is now different: [0,6] plus 
        [71,127].  However, if the received SIDX is in the current 
        active interval, no change SHALL be applied.  This means that at 
        any instant a maximum of 64 SIDX values are valid, whereas the 
        total of values used might be over 64.   
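
   The window maintenance described by the algorithm and the example 
   above can be sketched as follows.  This is a minimal illustration 
   under stated assumptions, not a reference implementation: the class 
   and method names are hypothetical, and caching of the sample 
   descriptions themselves is left out. 

```python
SIDX_RANGE = 128   # dynamic SIDX values lie in [0, 127]
GUARD = 64         # size of the inactive interval that follows X


class SidxWindow:
    """Tracks which dynamic SIDX values are active (hypothetical name)."""

    def __init__(self):
        self.x = None          # step 1: nothing received, all inactive

    def inactive_values(self):
        """Values in [(X+1) mod 128, (X+64) mod 128]; all if X is unset."""
        if self.x is None:
            return set(range(SIDX_RANGE))
        return {(self.x + k) % SIDX_RANGE for k in range(1, GUARD + 1)}

    def is_active(self, sidx):
        return sidx not in self.inactive_values()

    def on_sample_description(self, sidx):
        """Apply steps 2 to 4.  Returns True if the window moved."""
        if not 0 <= sidx < SIDX_RANGE:
            raise ValueError("not a dynamic SIDX value")
        if self.is_active(sidx):
            return False       # step 4a: already active, do nothing
        self.x = sidx          # step 3: re-centre the window on X
        return True


window = SidxWindow()
window.on_sample_description(4)      # inactive interval becomes [5, 68]
window.on_sample_description(6)      # 6 was inactive: window moves
print(sorted(window.inactive_values())[0],
      sorted(window.inactive_values())[-1])
```

   With X=6 the inactive interval is [7,70], so the active values are 
   [0,6] plus [71,127], as in the example. 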
    
12.2. Basics of the 3GP File Structure 
    
   This section provides a coarse overview of the 3GP file structure. 
    
   Each 3GP file consists of "Boxes".  Boxes start with a header, which 
   indicates both the size and the type of the box.  In general, a 3GP 
   file contains the File Type Box (ftyp), the Movie Box (moov), and the 
   Media Data Box (mdat).  The Movie Box and the Media Data Box, serving 
   as containers, include their own boxes for each medium.  Similarly, 
   each box type may include a number of boxes.  See the ISO Base Media 
   File Format [2] for a complete list of possibilities. 
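
   As an illustration of the box structure, the following Python sketch 
   walks a sequence of boxes.  It assumes the header layout of the ISO 
   Base Media File Format [2]: a 32-bit big-endian size followed by a 
   four-character type, where a size of 1 announces a 64-bit size after 
   the type and a size of 0 means the box extends to the end of the 
   data.  The helper "make_box" and the sample bytes are hypothetical 
   and do not form a valid 3GP file. 

```python
import struct


def iter_boxes(data):
    """Yield (type, payload) for each box stored back to back in data."""
    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError("truncated box header")
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:                     # 64-bit size follows the type
            if offset + 16 > len(data):
                raise ValueError("truncated box header")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:                   # box runs to the end of data
            size = len(data) - offset
        if size < header_len or offset + size > len(data):
            raise ValueError("malformed box size")
        yield box_type.decode("ascii"), data[offset + header_len:offset + size]
        offset += size


def make_box(box_type, payload=b""):
    """Hypothetical helper: build a box with a 32-bit size field."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Toy data only: empty or placeholder payloads, not a playable file.
TRAK = make_box(b"trak", make_box(b"tkhd"))
SAMPLE = make_box(b"ftyp") + make_box(b"moov", TRAK) + make_box(b"mdat", b"text")

for name, payload in iter_boxes(SAMPLE):
    print(name, len(payload))
```

   Container boxes such as "moov" can be walked by calling the same 
   function on their payload. 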
    
   In the following, only those boxes that are useful for the purposes 
   of this payload format are mentioned. 
    
     
   The File Type Box identifies the type and properties of a 3GP file.  
   The File Type Box contents comprise the major brand, the minor 
   version and the compatible brands.  When streamed with RTP, these are 
   communicated via out-of-band means, such as SDP.   
    
   The Movie Box (moov) contains one or more Track Boxes (trak) which 
   include information about each track.  A Track Box contains, among 
   others, the Track Header Box (tkhd), the Media Header Box (mdhd) and 
   the Media Information Box (minf).   
    
   The Track Header Box specifies the characteristics of a single track, 
   where a track is, in this case, the streamed text during a session.  
   Exactly one Track Header Box is present for a track.  It contains 
   information about the track, such as the spatial layout (width and 
   height), the video transformation matrix and the layer number.  Since 
   these pieces of information are essential and static, i.e. constant 
   for the duration of the session, they MUST be sent prior to the 
   transmission of any text samples.  See the ISO base media file format 
   [2] for details about the definition of the conveyed information. 
    
   The Media Header Box contains the timescale or number of time units 
   that pass in one second, i.e. cycles per second or Hertz.  The Media 
   Information Box includes the Sample Table Box (stbl) which itself 
   contains the Sample Description Box (stsd), the Decoding Time to 
   Sample Box (stts), the Sample Size Box (stsz) and the Sample to Chunk 
   Box (stsc).  Sample descriptions for each text sample are encoded as 
   "tx3g" sample entries in the Sample Description Box (stsd).  
    
   The Sample Table Box (stbl) contains all the time and data indexing 
   of the media samples in a track.  Using the tables here, it is 
   possible to locate samples in time, determine their type, and 
   determine their size, container, and offset into that container.   
    
   Finally, the Media Data Box contains the media data itself.  In timed 
   text tracks this box contains text samples.  The equivalents in audio 
   and video tracks are audio and video frames, respectively.  The text 
   sample consists of the text length, the text string, and one or 
   several Modifier Boxes.  The text length is the size of the text in 
   bytes.  The text string is the plain text to render.  The Modifier 
   Boxes carry additional rendering information for the text, such as 
   colour, font, etc. 
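
   The layout of a stored text sample can be illustrated with a short 
   Python sketch.  It assumes that the text length is a 16-bit 
   big-endian field (consistent with the 2 units subtracted in Section 
   12.3) and that UTF-16 text is recognised by a leading byte order 
   mark; the function name is hypothetical and the Modifier Boxes are 
   returned unparsed. 

```python
import struct


def parse_text_sample(sample):
    """Split a stored text sample into (text, modifier box bytes)."""
    if len(sample) < 2:
        raise ValueError("sample too short for the text length field")
    (text_len,) = struct.unpack_from(">H", sample, 0)
    raw = sample[2:2 + text_len]
    if len(raw) != text_len:
        raise ValueError("text length exceeds the sample size")
    if raw[:2] in (b"\xfe\xff", b"\xff\xfe"):
        text = raw.decode("utf-16")       # the codec consumes the BOM
    else:
        text = raw.decode("utf-8")
    return text, sample[2 + text_len:]


utf8_sample = struct.pack(">H", 5) + b"Hello"
utf16_text = b"\xfe\xff" + "Hi".encode("utf-16-be")
utf16_sample = struct.pack(">H", len(utf16_text)) + utf16_text

print(parse_text_sample(utf8_sample))
print(parse_text_sample(utf16_sample))
```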
    
12.3. Usage of 3GP file information for transport in RTP 
    
   For the purpose of streaming timed text contents, some values in the 
   boxes contained in a 3GP file are mapped to fields of this payload 
   header.  This section explains where to find and how to use those 
   values. 
    
   From the Track Header Box (tkhd): 
    
        o tx, ty: these values are the third but last and second but 
          last values, respectively, in the transformation matrix of 
          the Track Header Box.  These values are fixed-point 16.16 
          values, restricted to be integers (the lower 16 bits of each 
          value shall be zero).  Therefore, only the upper 16 bits are 
          used. 
         
        o width, height: they have the same name in the box and in the 
          payload header.  As above, only the upper 16 bits are used; 
          the rest is zero. 
         
        o layer: all 16 bits are used.  
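
   Extracting the integer part of such a 16.16 value amounts to a shift 
   by 16 bits.  The following Python sketch is an illustration only: 
   the function name is hypothetical, the input is assumed to be the 
   raw 32-bit field read as an unsigned integer, and any signed 
   interpretation of the result is left to the caller. 

```python
def integer_part_16_16(raw):
    """Return the upper 16 bits of a 32-bit 16.16 fixed-point value."""
    if not 0 <= raw <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit value")
    if raw & 0xFFFF:
        # The fractional (lower) 16 bits are required to be zero.
        raise ValueError("fractional part must be zero")
    return raw >> 16


tx = integer_part_16_16(0x00640000)      # 0x0064 in the upper 16 bits
width = integer_part_16_16(160 << 16)
print(tx, width)
```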
    
   From the Sample Table Box (stbl) the following information is carried 
   in each RTP packet using this payload format:  
    
        o the Sample Description Box (stsd): this stsd box provides 
          information on the basic characteristics of text samples.  
          Each entry is a sample entry box of type "tx3g".  An example 
          of the information contained in a sample entry could be the 
          font size or the background color.  These pieces of 
          information are commonly used by many text samples during the 
          session.  Each sample entry "tx3g" is transported either in-
          band or out-of-band. 
         
        o the Decoding Time to Sample Box (stts): the 24 least 
          significant bits of the "sample_delta" are mapped to the 
          field SDUR (Text Sample Duration). 
         
        o the Sample Size Box (stsz): the 16 least significant bits of 
          the "sample_size" or "entry_size" (depending on whether the 
          sample size is fixed or variable) indicate the length (in 
          bytes) of the text string plus any modifier boxes that may be 
          in that text sample.  In 3GP files this value includes the 
          text string length and, if present, the byte order mark 
          (BOM).  For the purposes of this payload format, text samples 
          include neither text string length nor BOM.  For obtaining 
          the text sample length as defined in this payload format, it 
          MUST be accordingly adjusted to exclude those, i.e. 2 units 
          SHALL be subtracted for UTF-8 encoded samples and 4 units for 
          UTF-16 encoded samples.  The resulting value MUST be directly 
          mapped to the SLEN field defined in the TYPE 2 header.  At 
          the receiver, the reverse operation MUST be done to re-
          assemble the text string and restore the sample length as 
          stored in the 3GP file.   
         
                Note also that the text string length found in each of 
                the text samples of the text track may not always be 
                mapped directly to the TLEN.  This value SHALL be 
                reduced by 2 units to exclude the BOM in UTF-16 strings. 
         
        o the Sample to Chunk Box (stsc): the value of the 
          "sample_description_index" for that sample in the Sample to 
          Chunk Box is mapped to the field SIDX (Text Sample Entry 
          Index).  The Sample to Chunk Box (stsc) associates the text 
          sample and its corresponding sample description entry in the 
          Sample Description Box (stsd, see above).  Since the sample 
          description may vary during the session, the association 
          SIDX is sent together with the text samples using this 
          payload format. 
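
   The length and duration mappings listed above can be summarised in 
   a short Python sketch.  It is an illustration under stated 
   assumptions: the function names and the "utf16" flag are 
   hypothetical, and the caller is assumed to know the encoding of the 
   text string. 

```python
def map_sample_fields(sample_delta, sample_size, text_length, utf16):
    """Derive SDUR, SLEN and TLEN from values stored in a 3GP file."""
    sdur = sample_delta & 0xFFFFFF            # 24 least significant bits
    removed = 4 if utf16 else 2               # length field (+ BOM)
    slen = (sample_size & 0xFFFF) - removed   # 16 least significant bits
    tlen = text_length - (2 if utf16 else 0)  # drop the BOM for UTF-16
    if slen < 0 or tlen < 0:
        raise ValueError("sample too small for the stated encoding")
    return {"SDUR": sdur, "SLEN": slen, "TLEN": tlen}


def restore_sample_size(slen, utf16):
    """Reverse operation at the receiver: stored sample size from SLEN."""
    return slen + (4 if utf16 else 2)


# UTF-8 sample: 2-byte length field + 5 bytes of text, no modifiers.
print(map_sample_fields(2000, 7, 5, utf16=False))
# UTF-16 sample: 2-byte length field + 2-byte BOM + 4 bytes of text.
print(map_sample_fields(2000, 8, 6, utf16=True))
```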
    
    
13. Acknowledgements 
    
   The authors would like to thank Dave Singer, Jan van der Meer, Magnus 
   Westerlund and Colin Perkins for their comments and suggestions to 
   this document. 
    
    
14. Authors' Addresses 
    
   Jose Rey                                     rey@panasonic.de 
   Panasonic European Laboratories GmbH          
   Monzastr. 4c                                  
   D-63225 Langen, Germany 
   Phone: +49-6103-766-134 
   Fax:   +49-6103-766-166 
    
   Yoshinori Matsui             matsui.yoshinori@jp.panasonic.com 
   Matsushita Electric Industrial Co., LTD. 
   1006 Kadoma 
   Kadoma-shi, Osaka, Japan 
   Phone: +81 6 6900 9689 
   Fax:   +81 6 6900 9699 
    
    
15. IPR Notices 
    
   The IETF takes no position regarding the validity or scope of any 
   Intellectual Property Rights or other rights that might be claimed to 
   pertain to the implementation or use of the technology described in 
   this document or the extent to which any license under such rights 
   might or might not be available; nor does it represent that it has 
   made any independent effort to identify any such rights.  Information 
   on the procedures with respect to rights in RFC documents can be 
   found in BCP 78 and BCP 79. 
    
   Copies of IPR disclosures made to the IETF Secretariat and any 
   assurances of licenses to be made available, or the result of an 
   attempt made to obtain a general license or permission for the use of 
   such proprietary rights by implementers or users of this 
   specification can be obtained from the IETF on-line IPR repository at 
   http://www.ietf.org/ipr. 
    
   The IETF invites any interested party to bring to its attention any 
   copyrights, patents or patent applications, or other proprietary 
   rights that may cover technology that may be required to implement 
   this standard.  Please address the information to the IETF at ietf-
   ipr@ietf.org. 
    
    
16. Full Copyright Statement 
    
   Copyright (C) The Internet Society (2004).  This document is subject 
   to the rights, licenses and restrictions contained in BCP 78, and 
   except as set forth therein, the authors retain all their rights. 
    
   This document and the information contained herein are provided on an 
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
    
