Network Working Group S. Pfeiffer Internet-Draft C. Parker Expires: September 20, 2005 A. Pang CSIRO March 19, 2005 The Annodex exchange format for time-continuous bitstreams, Version 3.0 draft-pfeiffer-annodex-02 Status of this Memo This document is an Internet-Draft and is subject to all provisions of Section 3 of RFC 3667. By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she become aware will be disclosed, in accordance with RFC 3668. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on September 20, 2005. Copyright Notice Copyright (C) The Internet Society (2005). Abstract This specification defines "Annodex", an exchange format for annotated and indexed time-continuous bitstreams. Annodex provides a bitstream format for exchanging multitrack interleaved time-continuous bitstreams and textual meta information attached to Pfeiffer, et al. Expires September 20, 2005 [Page 1] Internet-Draft ANNODEX March 2005 temporal fragments of the binary bitstreams. The meta information is given in the Continuous Media Markup Language (CMML). Annodex enables integration of time-continuous bitstreams into the browsing and searching functionality of the World Wide Web. 
The specification is not encumbered by patents. The Annodex format is protected by a trade mark to prevent the use of the term "Annodex" for any related but non-conformant and therefore non-interoperable technology. Conformant technology is encouraged to use the term "Annodex" when referring to the exchange format.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Table of Contents

1. Introduction
   1.1 Motivation
   1.2 Overview
2. Features of Annodex
3. Authoring exchange format
4. The Ogg skeleton logical bitstream
   4.1 The format of the skeleton ident header
   4.2 The format of the skeleton secondary headers
   4.3 Media mapping of skeleton into Ogg
5. Handling time in Annodex format bitstream
   5.1 Conceptual overview
   5.2 Mapping a granule position to a time position
   5.3 Addressing/seeking into the bitstream
   5.4 Remultiplexing a bitstream
6. MIME media type applications
   6.1 MIME media type registration for 'application/annodex'
       6.1.1 URI addressing into Annodex bitstreams
       6.1.2 HTTP 'Accept' header field interpretation
   6.2 MIME media type registration for 'video/annodex'
   6.3 MIME media type registration for 'audio/annodex'
7. Security considerations
8. ChangeLog
9. References
Authors' Addresses
A. Definitions of terms and abbreviations
B. Glossary of acronyms
C. Acknowledgments
Intellectual Property and Copyright Statements

1. Introduction

1.1 Motivation

When searching the World Wide Web, time-continuous data such as audio and video files are currently treated as "dark matter" outside the existing infrastructure of the World Wide Web: It is not possible to look inside such files, search for their content through common text-based search engines, or directly hyperlink to points of interest inside them. The file can generally only be consumed in its entirety. In addition, such files are "dead ends" in that by consuming their content the hyperlinking functionality of the Web is left behind.

Text documents were enabled for the Web through definition of a markup language (HTML [1]) for text documents to enable description of the structure of a document, and thus allow for the separation of content from presentation. This specification takes the same approach for time-continuous documents. The markup language for time-continuous documents is called CMML, short for Continuous Media Markup Language [2]. It describes the structure of time-continuous documents and allows for a clean separation of content from presentation.

To turn text documents into a Web resource that can be exchanged between different applications, HTML markup is added. Such an exchange format where CMML is merged with the time-continuous document(s) it describes is also necessary to turn the time-continuous document(s) into a Web resource and provide a standard exchange format between applications.
This format is called "Annodex" for annotated and indexed documents and is defined here.

1.2 Overview

Annodex uses a container format that allows transport and storage of interleaved time-synchronous bitstreams. In a clean layering approach, as is familiar from Internet protocols, the functionality of the container format and of CMML is explicitly separated. Each layer solves a specific problem without being dependent on layers that are further up in functionality.

The container format of Annodex is the Ogg encapsulation format version 0 [3]. Annodex is an Ogg bitstream containing a "skeleton" and a CMML logical bitstream, in addition to other temporally interleaved data bitstreams. Ogg skeleton is a logical bitstream that describes all the other logical bitstreams contained in the Ogg physical bitstream (see section 4). Its purpose is to remove codec-specific information requirements from the multiplexing/demultiplexing process.

Only an Annodex bitstream that contains a CMML bitstream can be regarded as a Web resource and as part of the Web, because it can be searched and browsed. An Ogg bitstream without a CMML bitstream is not an Annodex bitstream, but only an Ogg bitstream with a "skeleton" logical bitstream. Such a bitstream is still valuable as a multitrack media format that can be addressed through temporal hyperlinks [4]. However, it is not a first-class citizen on the Web because Web search engines cannot crawl and index it.

The file extension of Annodex files is ".anx". This document also applies for registration of the MIME type "application/annodex" for Annodex format bitstreams. In the meantime, "text/x-annodex" will be used. Further MIME types that this document applies for are "video/annodex" for Annodex format (possibly multitrack) video and "audio/annodex" for Annodex format (possibly multitrack) audio.
Please note that this document assumes that the reader understands the Ogg encapsulation format version 0 [3]. Knowledge of the network protocols HTTP [5] and RTP/RTSP [6], as well as of the extension of URIs to address temporal offsets into Web resources [4], is also a prerequisite to understanding this document. To find out more about the use of Annodex for creating searchable and surfable Web resources, refer to the specification of the Continuous Media Markup Language (CMML Version 2.0) [2].

2. Features of Annodex

Annodex contains interleaved bitstreams of time-related data. It is designed to be used both as a persistent file format and as a streaming format to exchange temporally addressable bitstreams. It enables encapsulation of any type of time-continuous bitstream as long as it is streamable and is based on a regular data sampling rate (called granulerate). For variable sampling rate bitstreams, a least common multiple of the sampling rates used must be known. Using this container format, Annodex is designed to accommodate any current or future compression format for time-continuous bitstreams.

The container format that Annodex is based on is designed to allow several tracks of temporally synchronous time-continuous data. Each track represents codec data for one type of time-continuous data stream.

Here is an example Annodex bitstream with data bitstreams D1-D3 (for example, a video track and two audio tracks) and an annotation track A1 (a CMML bitstream).

   ___________________________________________________________________
D1 |      |     |      |     |      |      |     |      |     |      |
   ___________________________________________________________________
D2 |          |          |          |          |          |          |
   ___________________________________________________________________
D3 |  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |
   ___________________________________________________________________
A1 |      clip 1      |    --    |      clip 2      |     clip 3     |
   ___________________________________________________________________

The time axis
t  |----------------------------------------------------------------->

Bitstreams of time-continuous data are regarded as a sequence of data packets that each have a timestamp representing the time at which the packet data ends. Each packet contains all the data required to cover the interval since the previous packet. If a packet does not cover the full interval, it MUST cover the end part of the interval.

Bitstreams that represent data that is to be presented in one single time instant are called time-instantaneous bitstreams. Their timestamp represents the time at which the packet's data starts and ends. The CMML track A1 above is one such bitstream. Its clips represent time-instantaneous data that is displayed at the given timestamp. The subsequent data packet replaces the information of the previous one. To insert a gap in a data bitstream (as in A1 above), a data packet MUST be inserted which explicitly annuls the data.

Data bitstreams generally contain the following information:

o  setup information for a codec
o  content data

The setup information is inserted at the start of a data bitstream before any content data.

Distribution of Annodex format bitstreams is performed using a network protocol such as HTTP [5] or RTP/RTSP [6]. The basic process is the following: The client dispatches a download or streaming request to the server with a certain URI.
The server resolves the URI and starts delivering Annodex format bitstreams, taking into account potential URI addressed offsets. Currently the distribution with HTTP is clear and discussed in this document, while the details of a distribution via RTP/RTSP are not yet examined and thus unspecified - in particular an RTP payload needs to be defined for Annodex.

The following figure explains the protocol stack:

 ________   _________   _________   __________   \
| CMML   | | Video   | | Audio   | | ...      |   |
 ________   _________   _________   __________    |
|                  skeleton                   |    > Annodex
 _____________________________________________    |
|                     Ogg                     |   |
 _____________________________________________   /
|      HTTP       |           RTSP            |
|                 |    _______________________|
|                 |   |          RTP          |
 _____________________________________________
|         TCP         |          UDP          |
 _____________________________________________
|                     IP                      |
 _____________________________________________

The Annodex format has been designed to accommodate both reliable and unreliable transport. In case of packet loss due to an unreliable transport, data may get lost; this may or may not be important to the application and thus may need to be addressed. All data, including CMML data, is treated with the same importance. For instantaneous data tracks the loss of one packet implies that the next packet will restore the proper state. We envisage, however, that a client may require the current state information, so there should be a protocol request for re-sending the current state. This will be delivered by the server by inserting another copy of the instantaneous data into the Annodex bitstream. For example, clips within an annotation bitstream can be repeated in the Annodex bitstream by having the same "track" attribute and the same page_sequence_number as the previous "clip" element. This handling of unreliable transport relates mostly to the use of Annodex over RTP/RTSP and UDP and needs further elaboration.
In short, the features specific to Annodex bitstreams are the ability to:

o  index clips of Annodex content for retrieval, e.g. with a Web search engine.
o  crawl Webs of Annodex and other Web resources, e.g. during an indexing operation of a Web search engine.
o  directly address and retrieve temporal intervals inside the Annodex bitstream without a need to decode logical bitstreams aside from skeleton.
o  directly address and retrieve named clips inside the Annodex bitstream without a need to decode any more than the skeleton and CMML logical bitstreams.
o  extract, cache, and reuse temporal intervals or named clips while retaining the annotation and index information.
o  browse through Webs of Annodex and other Web resources in an integrated manner, making time-continuous content a first-class citizen on the World Wide Web.

3. Authoring exchange format

CMML [2] is defined for the authoring of Annodex bitstream information. CMML's "stream" tag has been designed to author the skeleton bitstream and describe the data bitstreams to be interleaved into an Ogg bitstream. All other tags of a CMML file provide for authoring of the CMML bitstream. Use of a CMML bitstream without skeleton is strongly discouraged, as the time referencing and clip recomposition functionality of Annodexing is then lost.

An Annodex physical bitstream has the following mandatory order of Ogg pages:

1. skeleton bos page.
2. CMML bos page.
3. bos pages of the other logical bitstreams.
4. secondary header pages of all logical bitstreams, including fisbone.
5. skeleton eos page.
6. data and eos pages of logical bitstreams, excluding skeleton, multiplexed in a time-synchronous fashion.

Such an Annodex bitstream is identified by the CMML bitstream's magic number which can be found at Byte position 104 for this version of the "skeleton" specification.
This position follows from the size of the skeleton bos page, which is fixed because both the skeleton ident header and the Ogg page encapsulation header are of fixed size. The Ogg page header has 28 Bytes (including a one Byte segment table, as this page always has less than 255 Bytes of packet content), and the skeleton ident header has 48 Bytes (see further down). The Byte position thus amounts to 28+48+28 = 104. The CMML bos page MUST thus also have less than 255 Bytes of packet content, which is a sensible restriction.

The CMML media mapping is defined in the CMML [2] specification. However, for identification of an Annodex bitstream, the bos page of the CMML logical bitstream needs to be identifiable. This is provided through the first 12 Bytes of the CMML ident packet, which contain the magic numbers and the version information. Other fields exist and are described in the CMML [2] specification.

1. Identifier: an 8 Byte field that identifies this file to be of a CMML logical input bitstream. It contains the magic numbers:

      0x43 'C'
      0x4d 'M'
      0x4d 'M'
      0x4c 'L'
      0x00 '\0'
      0x00 '\0'
      0x00 '\0'
      0x00 '\0'

2. Version major: 2 Byte unsigned integer signifying the major version number of the CMML format bitstream.

3. Version minor: 2 Byte unsigned integer signifying the minor version number of the CMML format bitstream.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier 'CMML\0\0\0\0'                                     | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version major                 | Version minor                 | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...

4.
The Ogg skeleton logical bitstream

The purpose of Ogg skeleton is to provide codec-specific knowledge that allows parsing, demultiplexing and remultiplexing of Ogg bitstreams without having to decode.

While the Ogg encapsulation format by itself is capable of interleaving an unlimited number of time-continuous bitstreams, it is not possible to identify the type of bitstreams (e.g. audio or video) and their encoding format (e.g. Vorbis or Speex or Theora) without decoding at least the bos page of the logical bitstreams. Also, further general media type information such as the image dimensions of a frame in a video bitstream or the language of a speech bitstream may be provided in skeleton. Another limitation of Ogg is that each logical bitstream defines its own mapping of granule_position to time, which is therefore also given in the skeleton.

This situation is not acceptable for Annodex, because an Annodex server must be able to return media format information for an Annodex resource without having to understand the codecs involved. It must also be able to return temporal subparts of an Annodex resource without needing to decode. An addition to the Ogg format is thus necessary, which describes all the logical bitstreams included in the Ogg stream. This is defined via a logical bitstream called the "skeleton". For Annodex bitstreams, use of a skeleton bitstream is mandatory.

This section specifies the content of the "skeleton" logical bitstream and how it is mapped into Ogg. Knowledge of the Ogg bitstream format as specified in the Ogg RFC [3] is presumed. Please also refer to that document for descriptions of the terms used in this document.

The skeleton bitstream has the ability to generically describe Ogg bitstreams that consist of one or more time-continuous data bitstreams and one or more time-instantaneous data bitstreams concurrently interleaved (in Ogg terms: multiplexed).
It does not describe sequentially multiplexed Ogg bitstreams, but rather expects that a sequentially multiplexed bitstream has its own skeleton logical bitstream.

The skeleton logical bitstream provides the following functionality on top of Ogg:

o  allows for the identification of the codec format and the content type of encapsulated logical bitstreams without the need to decode that bitstream's headers or data.
o  allows for extraction of a temporal interval of the Ogg physical bitstream while retaining the original start time offset of that interval.
o  allows for attachment of a real-world wall-clock time and a date to the Ogg physical bitstream, thus e.g. retaining creation date/time or first broadcast date/time.
o  allows for temporal offset operations into an Ogg physical bitstream without a need to decode any data.
o  allows generally for handling of content without a need to decode it, such as is necessary in a caching Web proxy.
o  allows for attachment of message header fields given as name-value pairs that contain some sort of protocol messages about the logical bitstream, e.g. the screen size for a video bitstream or the number of channels for an audio bitstream.

For authoring of the skeleton bitstream information, CMML [2] can be used. CMML's "stream" tag has been designed with that purpose in mind. However, it is not mandatory to use CMML for authoring of skeleton information - that information may well originate from a different source and be written directly into the skeleton bitstream. See the CMML Internet-Draft for more details.

4.1 The format of the skeleton ident header

The skeleton logical bitstream starts with an ident header containing information for the complete Ogg physical bitstream. The ident header has the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier 'fishead\0'                                        | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version major                 | Version minor                 | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Presentationtime numerator                                    | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Presentationtime denominator                                  | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Basetime numerator                                            | 28-31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Basetime denominator                                          | 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 40-43
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| UTC                                                           | 44-47
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 48-51
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 52-55
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 56-59
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 60-63
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Fields with more than one Byte length are encoded LSB (least significant Byte) first.

The fields in the skeleton ident header have the following meaning:

1. Identifier: an 8 Byte field that identifies this bitstream as a skeleton. It contains the magic numbers:

      0x66 'f'
      0x69 'i'
      0x73 's'
      0x68 'h'
      0x65 'e'
      0x61 'a'
      0x64 'd'
      0x00 '\0'

2. Version major: 2 Byte unsigned integer signifying the major version number of the skeleton bitstream. This document specifies the major version 3.

3. Version minor: 2 Byte unsigned integer signifying the minor version number of the skeleton bitstream. This document specifies the minor version 0.

4. Presentationtime numerator & denominator: 8 Byte signed integer each. Together they represent the time at which to start presenting the Ogg physical bitstream, given as a rational number. The denominator represents the temporal resolution at which the presentationtime is given. For example, a numerator of 5 and a denominator of 1000 result in a presentationtime of 0.005 sec. This enables a very high temporal resolution without having to store floating point numbers. In a newly created physical bitstream presentationtime and basetime are the same. When remultiplexing a subpart of the stream, this number MUST be adapted to the requested start time offset of the newly created stream.

5. Basetime numerator & denominator: 8 Byte signed integer each. Together they represent the basetime of the Ogg physical bitstream, given as a rational number like the presentationtime. This number is fixed once the physical bitstream is created and provides a mapping to time for the beginning of the physical bitstream when it starts with a granule position of 0.

6. UTC: a 20 Byte string containing a UTC time in the form of YYYYMMDDTHHMMSS.sssZ. It associates a calendar date and a wall-clock time with the basetime. It is a sequence of 20 NUL Bytes if not in use, making this ident packet and thus the bos page of the skeleton bitstream constant length.

Please note: The possible temporal resolution of the presentation- and basetime is on the order of 2^-64. For example, the time formats in use for media that are described in this document range from 1/24 to 1/60 for the different SMPTE formats.
This resolution is sufficient for all of them. It is also expected to accommodate any future needs of time resolution for any other time format and time-continuously sampled data.

4.2 The format of the skeleton secondary headers

The skeleton secondary headers are a sequence of packets that each contain information about one of the other time-continuous or time-instantaneous logical bitstreams contained within the Ogg physical bitstream. A skeleton secondary header packet has the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier 'fisbone\0'                                        | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Offset to message header fields                               | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Serial number                                                 | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Number of header packets                                      | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granulerate numerator                                         | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granulerate denominator                                       | 28-31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Startgranule                                                  | 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 40-43
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Preroll                                                       | 44-47
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granuleshift  | Padding/future use                            | 48-51
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message header fields ...                                     | 52-
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Fields with more than one Byte length are encoded LSB (least significant Byte) first.

The fields in a skeleton secondary header packet have the following meaning:

1. Identifier: an 8 Byte field that identifies this packet as a skeleton secondary header for identifying other logical bitstreams. It contains the magic numbers:

      0x66 'f'
      0x69 'i'
      0x73 's'
      0x62 'b'
      0x6f 'o'
      0x6e 'n'
      0x65 'e'
      0x00 '\0'

2. Offset to message header fields: 4 Byte unsigned integer that contains the number of Bytes used in this packet before the message header fields. For the version of the skeleton bitstream described in this document this number is fixed to 44. This field accommodates future changes to the skeleton bitstream, allowing the message header fields to be parsed even if more fields get inserted before them.

3. Serial number: 4 Byte unsigned integer containing the bitstream_serial_number of the Ogg logical bitstream described by this skeleton secondary header packet, thus connecting the packet to that logical bitstream.

4. Number of header packets: a 4 Byte unsigned integer that contains the number of header packets of that particular logical bitstream, consisting of the bos page and the secondary header pages.

5. Granulerate numerator & denominator: 8 Byte signed integer each. Together they represent the temporal resolution of the logical bitstream in Hz, given as a rational number in the same way as the basetime attribute above.

6. Startgranule: 8 Byte signed integer that represents the granule number with which this logical bitstream starts, which is originally 0, but will be a positive offset when only a subpart of the stream is requested.

7. Preroll: 4 Byte unsigned integer that contains the number of packets to pre-roll in order to decode a current packet correctly. This is for example the case with Ogg Vorbis, which requires a pre-roll of 2 packets.

8. Granuleshift: a 1 Byte unsigned integer describing whether to partition the granule_position into two for that logical bitstream, and how many of the lower bits to use for the partitioning. The upper bits then still signify a time-continuous granule position for a directly decodable and presentable data granule. The lower bits allow for specification of a finer resolution such that for example predicted frames of a video can be addressed as well, though not decoded without tracing back to the last fully decodable data granule. This is e.g. the case with Ogg Theora.

9. Padding/future use: 3 Bytes of padding data that may be used for future requirements and are mandated to be zero in this revision.

10. Message header fields: header fields, following the generic Internet Message Format defined in RFC 2822 [7]. Each header field consists of a name followed by a colon (":") and the field value. Field names are case-insensitive. The field value MAY be preceded by any amount of LWS, though a single SP is preferred. Header fields can be extended over multiple lines by preceding each extra line with at least one SP or HT.

There is one mandatory Message header field for all of the logical bitstreams: the "Content-type" header field. For an application that is parsing the Annodex bitstream, this field contains the MIME type and the character encoding of the data in the logical bitstream. E.g. for the annotation bitstream, this field will contain the value "Content-type: text/x-cmml; UTF-8" if the character set used in the CMML bitstream is UTF-8. E.g. for a bitstream containing Ogg Vorbis data the value is "Content-type: audio/x-vorbis". The Content-type message header field MUST come first among all of the Message header fields such that it can be found at a fixed location in the skeleton fisbone packet.
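
The two header layouts described in sections 4.1 and 4.2 can be illustrated with a short parsing sketch. It is not taken from any Annodex implementation: the names FISHEAD, FISBONE, parse_fishead and parse_fisbone are hypothetical, and the code assumes that the "Offset to message header fields" is counted from the start of the offset field itself (Byte 8), which is the reading under which the fixed value 44 agrees with the diagram's Byte 52. It further assumes that header lines are separated by CRLF as in RFC 2822.

```python
import struct
from fractions import Fraction

# Both layouts are little-endian ("LSB first"), as stated in the text.
FISHEAD = struct.Struct("<8sHHqqqq20s")   # 64 Bytes: skeleton ident header
FISBONE = struct.Struct("<8sIIIqqqIB3s")  # 52 Bytes: fixed part of a secondary header


def _rational(num, den):
    """Return num/den as a Fraction, or None if the denominator is 0."""
    return Fraction(num, den) if den else None


def parse_fishead(packet: bytes) -> dict:
    """Parse a skeleton ident header packet (hypothetical helper)."""
    (ident, major, minor, pres_num, pres_den,
     base_num, base_den, utc) = FISHEAD.unpack_from(packet)
    if ident != b"fishead\x00":
        raise ValueError("not a skeleton ident header")
    return {
        "version": (major, minor),
        "presentationtime": _rational(pres_num, pres_den),
        "basetime": _rational(base_num, base_den),
        # 20 NUL Bytes mean that no UTC time is attached.
        "utc": utc.decode("ascii") if utc.strip(b"\x00") else None,
    }


def parse_fisbone(packet: bytes) -> dict:
    """Parse a skeleton secondary header packet (hypothetical helper)."""
    (ident, offset, serialno, num_headers, rate_num, rate_den,
     startgranule, preroll, granuleshift, _padding) = FISBONE.unpack_from(packet)
    if ident != b"fisbone\x00":
        raise ValueError("not a skeleton secondary header")
    # ASSUMPTION: the offset is relative to Byte 8 (start of the offset field).
    text = packet[8 + offset:].decode("utf-8")
    fields, last = {}, None
    for line in text.splitlines():
        if not line:
            continue
        if line[0] in " \t" and last is not None:
            # Continuation line: extends the previous field value.
            fields[last] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line, skipped in this sketch
        last = name.strip().lower()  # field names are case-insensitive
        fields[last] = value.strip()
    return {
        "serialno": serialno,
        "num_headers": num_headers,
        "granulerate": _rational(rate_num, rate_den),
        "startgranule": startgranule,
        "preroll": preroll,
        "granuleshift": granuleshift,
        "fields": fields,
    }
```

A caller would typically look up fields["content-type"] in the result of parse_fisbone to decide how to handle the logical bitstream identified by serialno, without decoding any codec data.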
As per RFC 2277 [8], message header fields are considered protocol data, i.e. it is not expected to have human readable text in there, and they MUST be entirely encoded in UTF-8. In addition, the mandatory header fields MUST be encoded in US-ASCII and it is recommended to also use US-ASCII code points as much as possible for the optional header fields. User defined optional message header fields MUST follow the naming standard given in RFC2822. 4.3 Media mapping of skeleton into Ogg The media mapping for skeleton into Ogg is as follows: o The skeleton ident (fishead) header is mapped into the skeleton bos page. o The secondary header pages of a skeleton logical bitstream consist of the fisbone header packets that each describe one particular logical data bitstream within the Ogg physical bitstream. o There are no content pages or data packets. As the skeleton eos page is included before the first data page of any logical bitstream, there actually cannot be any content data packets. o The skeleton eos page contains one packet of length zero. When using a skeleton logical bitstream in Ogg, a further restriction on the order in which Ogg pages appear is introduced to allow for easier identification: 1. The skeleton bos page is the very first bos page. This allows its differentiation from other Ogg bitstreams that don't contain a skeleton logical bitstream. 2. The bos pages of the other logical bitstreams come next as is a requirement of the Ogg bitstream format. Pfeiffer, et al. Expires September 20, 2005 [Page 16] Internet-Draft ANNODEX March 2005 3. The secondary header pages of all the logical bitstreams in the Ogg physical bitstream come next, as is also a requirement of Ogg. The skeleton secondary header pages are also included here. 4. Before any data pages of any of the logical bitstreams appear in the Ogg physical bitstream, the skeleton eos page MUST end the skeleton logical bitstream. This is necessary to end the control section of the bitstream. 
If an Ogg stream parser reaches the skeleton eos page, it knows that it has received all the bos and secondary header pages and can start setting up its decoding or parsing environment.

5. Handling time in Annodex format bitstream

With time-continuous data like Annodex, one needs to handle data at four different levels:

o at the Bytes level, upon seeking.

o at the packets level, upon encapsulating.

o at the granules level, upon recomposing.

o at the time level, upon displaying and addressing.

This section explains how they all fit together.

5.1 Conceptual overview

Annodex bitstreams inherently represent one timeline only, where the different logical bitstreams can be thought of as content tracks on that timeline. All of these tracks relate to the same timeline which starts at a certain time point and ends when the last bitstream ends.

An example bitstream can be seen in the following figure. It consists of an Annodex bitstream that contains 4 media bitstreams and one CMML bitstream. The picture is a conceptual representation of the time intervals covered by the different logical bitstreams and the Ogg pages used to encapsulate the data. In the flat representation these are multiplexed such that the data packets of each of these bitstreams occur at the correct time.

                             t_url
                               |
t_0                            v                                   t_n
|------------------------------------------------------------------->|

----------------------------------------------------------------------
|clip1  |   clip 2      |/clip 3///////////////|      clip 4         |
----------------------------------------------------------------------
CMML bitstream

----------------------------------------------
|  |  |  |  |  |  |  |  |  |  |//|  |  |  |  |
----------------------------------------------
audio bitstream 1

         -------------------------------------------------------------
         |     |     |     |/////|     |     |     |     |     |     |
         -------------------------------------------------------------
         video bitstream 1

                  ----------------------------------------------------
                  |  |  |  |  |//|  |  |  |  |  |  |  |  |  |  |  |  |
                  ----------------------------------------------------
                  audio bitstream 2

                      -------------------------------
                      |     |/////|     |     |     |
                      -------------------------------
                      video bitstream 2

The time point at which an Annodex bitstream starts (t_0 in the above diagram) is called the "basetime" and represents the time in seconds associated with the granule position of 0 on all logical bitstreams. Typically, a newly created Annodex file starts all its logical bitstreams at granule position 0, and a typical extract of an Annodex bitstream, such as the one starting at t_url in the image above, starts each of its logical bitstreams at a different granule position. These granule positions are stored in the "startgranule" field of the skeleton secondary header packets.

The "basetime" of an Annodex bitstream may be 0, but it can also be any positive time. For example, in professional video production, the first frame of video of a program normally refers to a SMPTE basetime of 01:00:00:00, not 00:00:00:00 (see also the temporal URI addressing [4] specification).
Associating such a practice with a digital video resource requires a way to store that basetime with the resource and to interpret it correctly when addressing offsets such as t_uri. Annodex provides such a mapping through the basetime field in the skeleton ident header.

Also associated with the basetime is a calendar date and wall-clock time (a "UTC base") which represent a real-world time giving some meaningful calendar date association to the content such as the creation time or the first presentation time. The UTC base is specified in the UTC field of the skeleton ident header.

5.2 Mapping a granule position to a time position

Each one of the encapsulated data bitstreams and the CMML bitstream has its own temporal resolution at which it provides data to cover the given timeline. This temporal resolution is usually given through the sampling rate of the particular bitstream. For example, a raw audio bitstream at CD quality is sampled with a sampling rate of 44100 Hz. A video bitstream may be sampled with a frame rate of 25 frames per second. This temporal resolution is called the "granulerate". A granule is a data element that is based on a regular data rate specific to the content type, such as the frame rate for video or the sampling rate for audio. It even exists for bitstreams that are not sampled at a regular rate - then it is the highest resolution of any of the used sampling rates. The granulerate is specified in the skeleton secondary header packets for each logical bitstream.

Each one of the bitstreams inserts data into the Ogg bitstream through packets which have an associated temporal duration based on the encoder packaging. Packets are packaged into Ogg pages, which have a granule position associated with them.
Not taking the special case of a granuleshift into account, the granule position specifies the number of granules that have been encapsulated since the implicit start of the original bitstream up to and including the given Ogg page. The granule position together with the granulerate and granuleshift information of the skeleton secondary header packets for the particular logical bitstream are used for the calculation of the time position for which a data packet of the logical bitstream completes data. A granule position of -1 indicates a special case and MUST NOT be used for calculation of a mapping to time.

In principle, the granule position of an Ogg page divided by the granulerate of this page's logical bitstream provides the time position that is reached in that bitstream after decoding all data packets finished on this page. However, the granule_position field in an Ogg page allows for a more fine-grained description of the temporal position. The following image explains the composition of the granule_position field in an Ogg page:

                granule_position
------------------------------------------------
|        keyindex          |     keyoffset     |
------------------------------------------------

The granuleshift field of the skeleton secondary header packets describes how many of the granule_position's 64 bits are being used for the keyoffset.

The keyoffset part of the granule_position is commonly used when the logical bitstream consists of packets that can only be fully decoded when referring back to a previous packet. For example, video streams often consist of inter and intra coded frames, where the intra frames are fully decodable and the inter frames are intermediate frames that require backtracking to the last intra frame for accurate decoding. Another example is a logical bitstream that is mapped as instantaneous information (i.e.
its granule position represents both the start time and the end time of the packet data), but actually has a duration associated with it, which is provided through a subsequent packet. CMML is such an example.

The keyindex part of the granule_position is then used to provide the temporal position of the reference packet and the keyoffset part provides a counter for the data in between. The calculation of the temporal position of an Ogg page in Annodex is thus specified through the following algorithm:

   t_page = basetime + ((keyindex + keyoffset) / granulerate)

The basetime provides the time offset used at the beginning of the logical bitstream for the first data packet and thus MUST be added for a correct calculation of the temporal position.

As an example, consider an audio bitstream that has a granulerate of 44100 (i.e. 44100 samples per 1 sec), a granuleshift of 0, and starts at 4 sec. When reaching a granule_position of 88200, this maps to a time position of 6 seconds:

   t_page = 4 + ((88200 + 0) / 44100) = 6

This signifies that 2 seconds of the audio bitstream have been covered once this page's packets are decoded, which maps to a time position of 6 seconds because of the basetime.

As another example, consider a video bitstream that has a granulerate of 25 (i.e. 25 frames per 1 second), a granuleshift of 3 (because it encodes - say - 7 partial frames between each fully encoded frame), and starts at 0 sec. When reaching a granule_position of 501, i.e. a keyindex of 62 and a keyoffset of 5, this maps to a time position of 2.68 seconds:

   t_page = 0 + ((62 + 5) / 25) = 2.68 sec

The granulerate of a time-instantaneous bitstream such as the CMML bitstream can be chosen arbitrarily by the bitstream multiplexer. By default, a granulerate of 1000 is used, which is the resolution of npt.
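The calculation above can be sketched in Python. This is an illustration, not reference code: the function names split_granule_position and page_time are hypothetical, and the sketch assumes that the keyoffset occupies the lowest granuleshift bits of the granule_position and the keyindex the remaining upper bits, as described for the granuleshift field. Fractions are used so that rates such as 30/1.001 can be kept exact.

```python
from fractions import Fraction


def split_granule_position(granule_position: int, granuleshift: int):
    """Split a granule_position into (keyindex, keyoffset)."""
    if granule_position < 0:
        # -1 is the special case that must not be mapped to a time.
        raise ValueError("granule_position cannot be mapped to a time")
    keyindex = granule_position >> granuleshift
    keyoffset = granule_position & ((1 << granuleshift) - 1)
    return keyindex, keyoffset


def page_time(granule_position, granulerate, granuleshift=0, basetime=0):
    """Return t_page = basetime + (keyindex + keyoffset) / granulerate."""
    keyindex, keyoffset = split_granule_position(granule_position, granuleshift)
    return Fraction(basetime) + Fraction(keyindex + keyoffset) / Fraction(granulerate)


if __name__ == "__main__":
    # Audio: granulerate 44100, granuleshift 0, basetime 4 seconds.
    print(float(page_time(88200, 44100, granuleshift=0, basetime=4)))
    # Video: granulerate 25, granuleshift 3, basetime 0 seconds.
    print(split_granule_position(501, 3))
    print(float(page_time(501, 25, granuleshift=3, basetime=0)))
```

With a granuleshift of 3, a granule_position of 501 splits into a keyindex of 62 and a keyoffset of 5 (62 * 8 + 5 = 501), which gives 67/25 = 2.68 seconds at a granulerate of 25.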
The resolution of all the time schemes is given as:

o npt: 1000 (milliseconds)

o smpte-24: 24 (24 fps)

o smpte-24-drop: 24/1.001 = 23.976 (approx. as per SMPTE)

o smpte-25: 25

o smpte-30: 30

o smpte-30-drop: 30/1.001 = 29.970 (approx. as per SMPTE)

o smpte-50: 50

o smpte-60: 60

o smpte-60-drop: 60/1.001 = 59.940 (approx. as per SMPTE)

The granule position of the page finishing data of a time-instantaneous bitstream packet MUST signify the start time of that packet. For example, a CMML bitstream with a granulerate of 1000, a basetime of 0, and a clip that lasts from npt=12.020 till npt=15.0 will get a granule_position of 12020. In contrast, the granule_position of the page finishing data of e.g. an audio bitstream with granulerate 44100, basetime 0 and containing data from npt=12.020 to npt=15.0 will be 661500.

A note about field overflows: an overflow of the granule position field can destroy the temporal integrity of the Annodex physical bitstream. In this case, a multiplexer MUST end the Annodex physical bitstream and restart a new one resetting the counter to 0 and adjusting the basetime appropriately. This is also called sequential multiplexing in Ogg. The same measure MUST be taken in case of an overflow of the page_sequence_number on one of the logical bitstreams.

5.3 Addressing/seeking into the bitstream

Addressing into an Annodex bitstream is possible with the temporal URI addressing [4] scheme. Time is specified as a temporal offset from the "beginning" of the stream, making use of the basetime field. Time offsets can also be specified as calendar dates and times. The UTC base is then used as a basis for offsetting.

The basetime makes it possible to correctly map a temporal offset point such as a temporal URI to a Byte position in the stream.
In the above figure take t_uri=npt:14.0 as the temporal offset addressed on a stream with t_0=npt:5.0 as the basetime - this requires a stream offsetting of only 9 sec to the appropriate granule position in each of the bitstreams, marked in the figure through patterned pages.

The seeking action is performed on the interleaved bitstream, in which the data packets occur in a temporally consecutive order based on the time at which their data ends. These times are represented in the granule positions of the Ogg pages, which are only allowed to monotonically increase within one logical bitstream. This implies that when having found an Ogg page with a granule position that maps to a given seek time (i.e. covers the time or ends at it), the seek has found the right location. This applies over all logical bitstreams. In the above example, this means that the Byte position of the first occurring page of the patterned pages has been found.

There is a complication to the seeking: some logical bitstreams have backwards dependencies in their data packets and these have to be taken into account for seeking. For example, a logical bitstream may require several of its previous packets to allow a correct and complete decoding of the actual packet that occurs at the seek time. This is the case for Theora, which requires going back to the previous keyframe when decoding from a time offset. It is also the case for Vorbis, which requires the previous 2 packets for accurate setup of the frequency transform - Speex needs approximately 2 packets for similar reasons. Even instantaneous bitstreams such as CMML may require going back to a previous packet to recover the last state information - the currently active clip in the case of CMML.

Therefore, once seeking has located the correct Byte position that refers to the given temporal offset, it MUST seek back.
For logical bitstreams that have a non-zero "granuleshift" in the skeleton, it MUST seek back to the Ogg page that has a "keyindex" granule position. For logical bitstreams that have a non-zero "preroll" in the skeleton, it MUST seek back that many packets. The earliest Byte position that satisfies all these requirements is the correct seek position.

A player that presents from an offset MUST take into account that the bitstream may contain some packets that are only there to allow accurate decoding of the seek time. When the backwards dependencies are resolved for a specific logical bitstream, several non-relevant Ogg pages of other logical bitstreams may also end up in the intervening data. These have to be skipped by a player. The time that a player MUST start presenting from is given in the "presentationtime" in the skeleton ident header.

5.4 Remultiplexing a bitstream

When a subpart of an Annodex bitstream is requested, such as through a temporal URI query request from a Web server, the bitstream MUST be recomposed and a remultiplexed bitstream served out. There are several aims for performing the remultiplexing with as little effort and therefore as little delay as possible:

o no decoding of the logical bitstreams is performed.

o no changes to the pages, in particular to the granule positions, are made.

o changes occur only to the control section.

The fields of the skeleton track allow achievement of all these aims. Remultiplexing is essentially achieved by seeking to the position as described above and then including from each logical bitstream only the relevant Ogg pages into the new stream. Changes to fields in the bitstream are restricted to the control section:

o the "presentationtime" MUST be adjusted to the requested start time.

o the "startgranule" for each logical bitstream MUST be adjusted to the granule position at which each logical bitstream starts.
This is not the first granule position of the Ogg pages included into the bitstream, but rather the last one that did not get included, as it represents the start time of the bitstream.

Everything else, and in particular the Ogg pages, stay the same. This is important also to allow caching of such files as is required for Web proxies and described in temporal URI addressing [4].

6. MIME media type applications

6.1 MIME media type registration for 'application/annodex'

This section contains the registration information for the "application/annodex" media type. While this media type is not approved by the IANA, "application/x-annodex" may be used.

To: ietf-types@iana.org
Subject: Registration of MIME media type application/annodex

MIME media type name: application

MIME subtype name: annodex

Required parameters: none

Optional parameters: none

Encoding Considerations: Annodex is an exchange format for any type of encoded time-continuously sampled data stream. The authoring software MUST provide for the encoders, providing the MIME type (and potentially the charset for text-based formats) in the "Content-type" Message header field of each bitstream. The client software can select an appropriate decoder based on this information.

Security considerations: see next section.

Interoperability considerations: the Annodex bitstream format is a free specification that is independent of any media encoding format. It is designed to provide interoperability with the existing World Wide Web.

Additional information:

Magic numbers: "OggS" identifies an Ogg page at Byte position 0, "fishead\0" identifies a skeleton logical bitstream at Byte position 28. In the second Ogg page at Byte position 28 the magic number "CMML\0\0\0\0" can be found, identifying this as an Annodex bitstream.
File extension: .anx

Macintosh File Type Code: "ANDX"

Intended usage: COMMON

6.1.1 URI addressing into Annodex bitstreams

As Annodex bitstreams are time-continuous Web resources, hyperlinking into Annodex bitstreams via URIs is possible with the temporal URI query and fragment specification [4]. For the query case, an Annodex server must support the "X-Accept-TimeURI" HTTP header field (see the temporal URI query specification [4] for more details). The "X-Accept-Range-Redirect" and "X-Range-Redirect" HTTP header fields MAY also be supported by an Annodex server and user agent.

As Annodex bitstreams contain CMML logical bitstreams, URI addressing of clips via their name given in the "id" tag is also supported. The same mechanisms as specified in the CMML specification [2] apply to Annodex analogously. In particular, the id addressing is also regarded as an alias for a time offset and an Annodex conformant server that supports Annodex temporal URI addressing MUST also support named URI addressing (see the CMML specification [2] for more details).

Examples for valid URI addresses:

o http://example.com/sample.anx?t=npt:4 --- relates to an Annodex bitstream composed by the server from sample.anx by starting it at an offset of 4 seconds.

o http://example.com/sample.anx?id=dolphin --- relates to the clip whose id attribute value is "dolphin" and all further clips after that.

o http://example.com/sample.anx?id="dolphin/" --- relates only to the clip whose id attribute value is "dolphin".

o http://example.com/sample.anx?id="intro/goldfish" --- relates to all the clips from the "intro" clip to the "goldfish" clip.

o http://example.com/sample.anx#t=npt:4 --- start using the Annodex bitstream from a 4 second offset.

o http://example.com/sample.anx#dolphin --- use the clip with id="dolphin" only.
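The example URIs above can be told apart with a few lines of Python. This sketch only classifies the addressing form; it does not interpret the time or id values, whose syntax is defined in the temporal URI [4] and CMML [2] specifications. The function name classify_annodex_uri and the shape of its return value are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def classify_annodex_uri(uri: str):
    """Return (part, kind, value) for an Annodex URI.

    part  -- "query", "fragment" or "none"
    kind  -- "time", "id" or None
    value -- the raw time or id specification, or None
    """
    parts = urlsplit(uri)
    query = parse_qs(parts.query)
    if "t" in query:
        return ("query", "time", query["t"][0])
    if "id" in query:
        # The examples allow the id value to be wrapped in double quotes.
        return ("query", "id", query["id"][0].strip('"'))
    if parts.fragment.startswith("t="):
        return ("fragment", "time", parts.fragment[2:])
    if parts.fragment:
        return ("fragment", "id", parts.fragment)
    return ("none", None, None)


if __name__ == "__main__":
    for uri in (
        "http://example.com/sample.anx?t=npt:4",
        'http://example.com/sample.anx?id="intro/goldfish"',
        "http://example.com/sample.anx#t=npt:4",
        "http://example.com/sample.anx#dolphin",
    ):
        print(classify_annodex_uri(uri))
```

Separating the query and fragment cases mirrors the distinction drawn in the examples: the query form asks the server to compose a bitstream, while the fragment form concerns how the retrieved bitstream is used.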
6.1.2 HTTP 'Accept' header field interpretation

The Annodex file and the CMML file that can be extracted from it are very tightly related to each other: the CMML file contains all annotation and indexing information including basetime and UTC time about the Annodex file. Therefore, receiving the CMML file instead of the Annodex file is like receiving all information about the bitstreams in the Annodex file except for the data bitstreams themselves.

This situation can be taken advantage of with the "Accept" header of HTTP. When an Annodex file is requested from an HTTP server and the acceptable content types given in the "Accept" message header field contain "text/x-cmml" with a higher priority than "application/x-annodex", then the HTTP server SHOULD return the CMML file instead of the requested Annodex file itself. As is standard, the HTTP response will contain a "Content-type" field indicating what content was actually returned. A Web crawler of a search engine, e.g., can thus avoid extra network load and retrieve more easily parsable information. It SHOULD set the "Accept" HTTP header to "Accept: text/x-cmml" for every requested Annodex URI. For example:

   Accept: text/x-cmml; q=1, application/x-annodex; q=0.5

6.2 MIME media type registration for 'video/annodex'

This section contains the registration information for the "video/annodex" media type. While this media type is not approved by the IANA, "video/x-annodex" may be used.

To: ietf-types@iana.org
Subject: Registration of MIME media type "video/annodex"

MIME media type name: video

MIME subtype name: annodex

Required parameters: none

Optional parameters: none

Encoding Considerations: Annodex video is a subclass of Annodex data where there is at least one video track encapsulated together with the skeleton and CMML tracks, and a potentially unlimited number of other audio and video tracks.
Security considerations: as in "application/annodex" MIME application.

Interoperability considerations: as in "application/annodex" MIME application.

Additional information:

Magic numbers: as in "application/annodex" MIME application.

File extension: .axv

Macintosh File Type Code: "ANXV"

Intended usage: COMMON

URI addressing and HTTP header field use of "application/annodex" type content apply analogously to "video/annodex".

6.3 MIME media type registration for 'audio/annodex'

This section contains the registration information for the "audio/annodex" media type. While this media type is not approved by the IANA, "audio/x-annodex" may be used.

To: ietf-types@iana.org
Subject: Registration of MIME media type "audio/annodex"

MIME media type name: audio

MIME subtype name: annodex

Required parameters: none

Optional parameters: none

Encoding Considerations: Annodex audio is a subclass of Annodex data where there is at least one audio track encapsulated together with the skeleton and CMML tracks, and a potentially unlimited number of other audio tracks.

Security considerations: as in "application/annodex" MIME application.

Interoperability considerations: as in "application/annodex" MIME application.

Additional information:

Magic numbers: as in "application/annodex" MIME application.

File extension: .axa

Macintosh File Type Code: "ANXA"

Intended usage: COMMON

URI addressing and HTTP header field use of "application/annodex" type content apply analogously to "audio/annodex".

7. Security considerations

Annodex format bitstreams contain several multiplexed binary media and one XML annotation bitstream. There is no generic encryption or signing mechanism provided for the complete bitstream or any one of its parts.
As the format of the encapsulated media bitstreams is not prescribed and is identified through the "Content-type" Message header field in that bitstream's skeleton secondary header packet, it is possible to encrypt or sign that media bitstream and then mark it accordingly with a MIME type that signifies the encryption. It is up to the applications that use this bitstream to provide an appropriate codec to handle such bitstreams.

As Annodex format bitstreams contain binary media bitstreams, it is possible to include executable content in them. This can be an issue with applications that decode these bitstreams, especially when they are used in a network scenario. Such applications MUST ensure correct handling of manipulated bitstreams, of buffer overflow and the like.

8. ChangeLog

draft-pfeiffer-annodex-01:

o Annodex version 2.0: changes because of renamings of CMML tags and changes to the temporal and named URI addressing.

draft-pfeiffer-annodex-02:

o Annodex version 3.0: The changes pertain to the bitstream format to allow for a stronger decoupling of Annodex and CMML. The Annodex format is now using the Ogg format with a "skeleton" and a "CMML" logical bitstream. This change has reinforced a layered approach that fits better with existing practice in Internet protocols, where each layer solves a specific problem without being dependent on other layers further up.

9. References

[1] World Wide Web Consortium, "HTML 4.01 Specification", W3C HTML, December 1999.

[2] Pfeiffer, S., Parker, C. and A. Pang, "The Continuous Media Markup Language (CMML), Version 2.0 (work in progress)", I-D draft-pfeiffer-cmml-02.txt, March 2005.

[3] Pfeiffer, S., "The Ogg encapsulation format version 0", RFC 3533, May 2003.

[4] Pfeiffer, S., Parker, C. and A.
Pang, "Specifying time intervals in URI queries and fragments of time-based Web resources (work in progress)", I-D draft-pfeiffer-temporal-fragments-03.txt, March 2005.

[5] Fielding, R., Gettys, J., Mogul, J., Nielsen, H., Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[6] Schulzrinne, H., Rao, A. and R. Lanphier, "Real Time Streaming Protocol (RTSP)", RFC 2326, April 1998.

[7] Resnick, P., "Internet Message Format", RFC 2822, April 2001.

[8] Alvestrand, H., "IETF Policy on Character Sets and Languages", RFC 2277, January 1998.

[9] Bradner, S., "Key words for use in RFCs to Indicate Requirements Levels", RFC 2119, BCP 14, March 1997.

[10] World Wide Web Consortium, "Extensible Markup Language (XML) 1.0", W3C XML, October 2000.

[11] World Wide Web Consortium, "XHTML(TM) 1.0 The Extensible Hyper Text Markup Language", W3C XHTML, January 2000.

[12] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax", RFC 3986, January 2005.

[13] Alvestrand, H., "Tags for the Identification of Languages", RFC 1766, March 1995.

[14] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.

[15] Whitehead, E. and M. Murata, "XML Media Types", RFC 2376, July 1998.

[16] The Society of Motion Picture and Television Engineers, "SMPTE STANDARD for Television, Audio and Film - Time and Control Code", ANSI 12M-1999, September 1999.

[17] ISO, TC154., "Data elements and interchange formats -- Information interchange -- Representation of dates and times", ISO 8601, 2000.

Authors' Addresses

Silvia Pfeiffer
Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
PO Box 76
Epping, NSW 1710
Australia
Phone: +61 2 9372 4180
Email: Silvia.Pfeiffer@csiro.au
URI: http://www.ict.csiro.au/

Conrad D. Parker
Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
PO Box 76
Epping, NSW 1710
Australia
Phone: +61 2 9372 4222
Email: Conrad.Parker@csiro.au
URI: http://www.ict.csiro.au/

Andre T. Pang
Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
PO Box 76
Epping, NSW 1710
Australia
Phone: +61 2 9372 4222
Email: Andre.Pang@csiro.au
URI: http://www.ict.csiro.au/

Appendix A. Definitions of terms and abbreviations

Time-continuously sampled data: any sequence of binary data that represents an analog-time signal sampled in discrete time steps. In contrast to actual discrete-time signals as known from signal processing, time-continuously sampled data may also come in compressed form, such that a block of numbers represents an interval of time.

Time-instantaneous bitstream: a time-continuously sampled data stream where the components provide information for a specific time-instant.

Time-continuous bitstream: a time-continuously sampled data stream where the components provide ongoing information as time goes by.

Clip: a temporal section of a time-continuous data stream.

Annotation: a free-text, unstructured description of a clip.

Metadata: a name-value pair that provides a structured, database-like description of the content.

Hyperlink: a Uniform Resource Identifier (URI).

Meta information: collection of information about a data stream, which may include annotations, hyperlinks, and metadata.

Fragment: a subpart of a media document covering some temporal interval.

Mark-up: XML tags and their content used to describe a media document.

Annodex bitstream: encapsulated time-continuous bitstream with head and clip elements.

Annotating: the task of giving textual descriptions to fragments of media documents.
Indexing: the task of identifying index points for media documents or fragments thereof.

Hyperlinking: the task of linking from one Web resource to another. If a link has an offset into the resource, this is sometimes called deep hyperlinking.

head element: CMML data containing information on an Annodexed media file.

media packet: a block of digital data that represents a temporal subpart of a stream of continuous media. Media packets of one continuous media file do not overlap in time.

bitstream: a sequence of time-continuous data.

Appendix B. Glossary of acronyms

CMML: Continuous Media Markup Language.

DTD: Document Type Definition.

XML: eXtensible Markup Language.

CMWeb: Continuous Media Web.

Web: World Wide Web.

URI: Uniform Resource Identifier.

Appendix C. Acknowledgments

The authors gratefully acknowledge the contributions of Rob Collins, Zentaro Kavanagh, Andrew Nesbit and Simon Lai in developing this specification.

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2005). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.