Network Working Group S. Pfeiffer
Internet-Draft C. Parker
Expires: September 20, 2005 A. Pang
CSIRO
March 19, 2005
The Annodex exchange format for time-continuous bitstreams, Version
3.0
draft-pfeiffer-annodex-02
Status of this Memo
This document is an Internet-Draft and is subject to all provisions
of Section 3 of RFC 3667. By submitting this Internet-Draft, each
author represents that any applicable patent or other IPR claims of
which he or she is aware have been or will be disclosed, and any of
which he or she become aware will be disclosed, in accordance with
RFC 3668.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 20, 2005.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
This specification defines "Annodex", an exchange format for
annotated and indexed time-continuous bitstreams. Annodex provides a
bitstream format for exchanging multitrack interleaved
time-continuous bitstreams and textual meta information attached to
Pfeiffer, et al. Expires September 20, 2005 [Page 1]
Internet-Draft ANNODEX March 2005
temporal fragments of the binary bitstreams. The meta information is
given in the Continuous Media Markup Language (CMML). Annodex
enables integration of time-continuous bitstreams into the browsing
and searching functionality of the World Wide Web.
The specification is not encumbered by patents. The Annodex format
is protected by a trade mark to prevent the use of the term "Annodex"
for any related but non-conformant and therefore non-interoperable
technology. Conformant technology is encouraged to use the term
"Annodex" when referring to the exchange format.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Features of Annodex . . . . . . . . . . . . . . . . . . . . . 5
3. Authoring exchange format . . . . . . . . . . . . . . . . . . 8
4. The Ogg skeleton logical bitstream . . . . . . . . . . . . . . 10
4.1 The format of the skeleton ident header . . . . . . . . . 11
4.2 The format of the skeleton secondary headers . . . . . . . 13
4.3 Media mapping of skeleton into Ogg . . . . . . . . . . . . 16
5. Handling time in Annodex format bitstream . . . . . . . . . . 18
5.1 Conceptual overview . . . . . . . . . . . . . . . . . . . 18
5.2 Mapping a granule position to a time position . . . . . . 19
5.3 Addressing/seeking into the bitstream . . . . . . . . . . 22
5.4 Remultiplexing a bitstream . . . . . . . . . . . . . . . . 23
6. MIME media type applications . . . . . . . . . . . . . . . . . 24
6.1 MIME media type registration for 'application/annodex' . . 24
6.1.1 URI addressing into Annodex bitstreams . . . . . . . . 24
6.1.2 HTTP 'Accept' header field interpretation . . . . . . 25
6.2 MIME media type registration for 'video/annodex' . . . . . 26
6.3 MIME media type registration for 'audio/annodex' . . . . . 26
7. Security considerations . . . . . . . . . . . . . . . . . . . 28
8. ChangeLog . . . . . . . . . . . . . . . . . . . . . . . . . . 29
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 30
A. Definitions of terms and abbreviations . . . . . . . . . . . . 32
B. Glossary of acronyms . . . . . . . . . . . . . . . . . . . . . 33
C. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 34
Intellectual Property and Copyright Statements . . . . . . . . 35
1. Introduction
1.1 Motivation
When searching the World Wide Web, time-continuous data such as audio
and video files are currently treated as "dark matter" outside the
existing infrastructure of the World Wide Web: It is not possible to
look inside such files, search for their content through common
text-based search engines, or directly hyperlink to points of
interest inside them. The file can generally only be consumed in its
entirety. In addition, such files are "dead ends" in that by
consuming their content the hyperlinking functionality of the Web is
left behind.
Text documents were enabled for the Web through definition of a
markup language (HTML [1]) for text documents to enable description
of the structure of a document, and thus allow for the separation of
content from presentation. This specification takes the same
approach for time-continuous documents. The markup language for
time-continuous documents is called CMML, short for Continuous Media
Markup Language [2]. It describes the structure of time-continuous
documents and allows for a clean separation of content from
presentation.
To turn text documents into a Web resource that can be exchanged
between different applications, HTML markup is added. Such an
exchange format where CMML is merged with the time-continuous
document(s) it describes is also necessary to turn the
time-continuous document(s) into a Web resource and provide a
standard exchange format between applications. This format is called
"Annodex" for annotated and indexed documents and is defined here.
1.2 Overview
Annodex uses a container format that allows transport and storage
of interleaved time-synchronous bitstreams. In a clean layering
approach, as is familiar from Internet protocols, the functionality of
the container format and of CMML is explicitly separated. Each layer
solves a specific problem without being dependent on layers that are
further up in functionality. The container format of Annodex is the
Ogg encapsulation format version 0 [3]. Annodex is an Ogg bitstream
containing a "skeleton" and a CMML logical bitstream, in addition to
other temporally interleaved data bitstreams. Ogg skeleton is a
logical bitstream that describes all the other logical bitstreams
contained in the Ogg physical bitstream (see section 4). Its purpose
is to remove codec-specific information requirements from the
multiplexing/demultiplexing process.
Only an Annodex bitstream that contains a CMML bitstream can be
regarded as a Web resource and as part of the Web, because it can be
searched and browsed. An Ogg bitstream without a CMML bitstream is
not an Annodex bitstream, but only an Ogg bitstream with a "skeleton"
logical bitstream. Such a bitstream is still valuable as a multitrack
media format that can be addressed through temporal hyperlinks [4].
However, it is not a first-class citizen on the Web because Web search
engines cannot index and crawl it.
The file extension of Annodex files is ".anx". This document also
applies for registration of the MIME type "application/annodex" for
Annodex format bitstreams. In the meantime, "text/x-annodex" will be
used. Further MIME types that this document applies for are
"video/annodex" for Annodex format (possibly multitrack) video and
"audio/annodex" for Annodex format (possibly multitrack) audio.
Please note that this document assumes that the reader understands
the Ogg encapsulation format version 0 [3]. Also, knowledge of the
network protocols HTTP [5] and RTP/RTSP [6] as well as the extension
of URIs to address temporal offsets into Web resources [4] is a
prerequisite to understanding this document. To find out more about
the use of Annodex for creating searchable and surfable Web
resources, refer to the specification of the Continuous Media Markup
Language (CMML Version 2.0) [2].
2. Features of Annodex
Annodex contains interleaved bitstreams of time-related data. It is
designed to be used both as a persistent file format and as a
streaming format to exchange temporally addressable bitstreams. It
enables encapsulation of any type of time-continuous bitstream as
long as it is streamable and is based on a regular data sampling rate
(called granulerate). For variable sampling rate bitstreams, a least
common multiple of the used sampling rates must be known. Using this
container format, Annodex is designed to accommodate any current or
future compression format for time-continuous bitstreams.
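As an illustration of the variable sampling rate case, the following
minimal sketch computes a least common multiple of a set of integer
sampling rates. The function name and the example rates are
assumptions chosen for illustration and are not part of this
specification.

```python
from functools import reduce
from math import gcd


def common_granulerate(rates):
    """Return the least common multiple of integer sampling rates in Hz."""
    return reduce(lambda a, b: a * b // gcd(a, b), rates)


# Two hypothetical audio sampling rates used within one bitstream.
print(common_granulerate([44100, 48000]))  # prints 7056000
```

Every sample instant of either rate falls on a tick of the resulting
rate, which is why a common multiple can serve as the granulerate.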
The container format that Annodex is based on is designed to allow
several tracks of temporally synchronous time-continuous data. Each
track represents codec data for one type of time-continuous data
stream. Here is an example Annodex bitstream with data bitstreams
D1-D3 (for example, a video track and two audio tracks) and an
annotation track A1 (a CMML bitstream).
    __________________________________________________________________
D1  |     |     |     |     |     |     |     |     |     |     |
    __________________________________________________________________
D2  |         |         |         |         |         |         |
    __________________________________________________________________
D3  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
    __________________________________________________________________
A1  |      clip 1      |   --    |    clip 2     |    clip 3    |
    __________________________________________________________________
    The time axis t
    |----------------------------------------------------------------->
Bitstreams of time-continuous data are regarded as a sequence
of data packets that each have a timestamp representing the time at
which the packet data ends. Each packet contains all the data
required to cover the interval since the previous packet. If a packet
does not cover the full interval, it MUST cover the end part of the
interval.
Bitstreams that represent data that is to be presented in one single
time instant are called time-instantaneous bitstreams. Their
timestamp represents the time at which the packet's data starts and
ends. The CMML track A1 above is one such bitstream. Its clips
represent time-instantaneous data that is displayed at the given
timestamp. The subsequent data packet replaces the information of
the previous one. To insert a gap in a data bitstream (as in A1
above), a data packet MUST be inserted which explicitly annuls the
data.
Data bitstreams generally contain the following information:
o setup information for a codec
o content data
The setup information is inserted at the start of a data bitstream
before any content data.
Distribution of Annodex format bitstreams is performed using a
network protocol such as HTTP [5] or RTP/RTSP [6]. The basic process
is the following: The client dispatches a download or streaming
request to the server with a certain URI. The server resolves the
URI and starts delivering Annodex format bitstreams, taking into
account potential URI addressed offsets. Currently, distribution
over HTTP is well defined and discussed in this document, while the
details of distribution via RTP/RTSP have not yet been examined and
are thus unspecified. In particular, an RTP payload format needs to
be defined for Annodex.
The following figure explains the protocol stack:
________   _________   _________   __________
                                              \
| CMML |   | Video |   | Audio |   |  ...   |  |
________   _________   _________   __________  |
                                               |
|                 skeleton                  |  > Annodex
_____________________________________________  |
                                               |
|                    Ogg                    |  |
_____________________________________________ /
|        HTTP         |        RTSP         |
|                    _______________________|
|
|                     |         RTP         |
_____________________________________________
|         TCP         |         UDP         |
_____________________________________________
|                    IP                     |
_____________________________________________
The Annodex format has been designed to accommodate both reliable and
unreliable transport. In case of packet loss due to an unreliable
transport, data may get lost; this may be important to the
application or not and thus may need to be addressed. All data,
including CMML data, is treated with the same importance. For
instantaneous data tracks the loss of one packet implies that the
next packet will restore the proper state. We envisage, however,
that a client may require the current state information, so there
should be a protocol request for re-sending the current state. This
will be delivered by the server by inserting another copy of the
instantaneous data into the Annodex bitstream. For example, clips
within an annotation bitstream can be repeated in the Annodex
bitstream by having the same "track" attribute and the same
page_sequence_number as the previous "clip" element. This handling
of unreliable transport relates mostly to the use of Annodex over
RTP/RTSP and UDP and needs further elaboration.
In short, the Annodex bitstream specific features are:
o index clips of Annodex content for retrieval, e.g. with a Web
search engine.
o crawl Webs of Annodex and other Web resources, e.g. during an
indexing operation of a Web search engine.
o directly address and retrieve temporal intervals inside the
Annodex bitstream without a need to decode logical bitstreams
aside from skeleton.
o directly address and retrieve named clips inside the Annodex
bitstream without a need to decode any more than the skeleton and
CMML logical bitstreams.
o extract, cache, and reuse temporal intervals or named clips while
retaining the annotation and index information.
o browse through Webs of Annodex and other Web resources in an
integrated manner, making time-continuous content a first-class
citizen on the World Wide Web.
3. Authoring exchange format
For authoring of Annodex bitstream information, the CMML [2] is
defined. CMML's "stream" tag has been designed to author the
skeleton bitstream and describe the data bitstreams to be interleaved
into an Ogg bitstream. All other tags of a CMML file provide for
authoring of the CMML bitstream. Use of a CMML bitstream without
skeleton is strongly discouraged as the time referencing and clip
recomposition functionality of Annodexing will get lost.
An Annodex physical bitstream has the following mandatory order of
Ogg pages:
1. skeleton bos page.
2. CMML bos page.
3. bos pages of the other logical bitstreams.
4. secondary header pages of all logical bitstreams, including
fisbone.
5. skeleton eos page.
6. data and eos pages of logical bitstreams, excluding skeleton,
multiplexed in a time-synchronous fashion.
Such an Annodex bitstream is identified by the CMML bitstream's magic
number, which can be found at Byte position 120 for this version of
the "skeleton" specification. This position follows from the size of
the skeleton bos page, which is fixed because both the skeleton ident
header and the Ogg page encapsulation header are of fixed size. The
Ogg page header has 28 Bytes (including a one Byte segment table, as
this page always has less than 255 Bytes of packet content), and the
skeleton ident header has 64 Bytes (see section 4.1). The CMML bos
page header again has 28 Bytes, so the Byte position amounts to
28+64+28 = 120. The CMML bos page MUST thus also have less than 255
Bytes of packet content, which is a sensible restriction.
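The identification step can be sketched without hard-coding a Byte
position, by reading the segment table of the first Ogg page to find
where the second page begins. The sketch below assumes the Ogg page
header layout of the Ogg RFC [3] (27 fixed Bytes followed by the
segment table); the function names are hypothetical and error
handling is reduced to a minimum.

```python
OGG_FIXED_HEADER = 27  # Bytes of an Ogg page header before the segment table


def page_body(data, offset):
    """Return (body_offset, body_length) of the Ogg page starting at offset."""
    if data[offset:offset + 4] != b"OggS":
        raise ValueError("no Ogg capture pattern at Byte %d" % offset)
    n_segments = data[offset + 26]
    table_start = offset + OGG_FIXED_HEADER
    table = data[table_start:table_start + n_segments]
    # The body length is the sum of the lacing values in the segment table.
    return table_start + n_segments, sum(table)


def looks_like_annodex(data):
    """True if a skeleton bos page is directly followed by a CMML bos page."""
    try:
        body, length = page_body(data, 0)
        if data[body:body + 8] != b"fishead\0":
            return False
        cmml_body, _ = page_body(data, body + length)
    except (ValueError, IndexError):
        return False
    return data[cmml_body:cmml_body + 8] == b"CMML\0\0\0\0"
```

Because both bos pages carry less than 255 Bytes of packet content,
each has a one Byte segment table, so the offset that this sketch
derives for the CMML magic number is the fixed position described
above.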
The CMML media mapping is defined in the CMML [2] specification.
However, for identification of an Annodex bitstream, the bos page of
the CMML logical bitstream needs to be identifiable. This is
provided through the first 12 Bytes of the CMML ident packet, which
contain the magic numbers and the version information. Other
fields exist and are described in the CMML [2] specification.
1. Identifier: an 8 Byte field that identifies this bitstream as a
CMML logical bitstream. It contains the magic numbers:
0x43 'C'
0x4d 'M'
0x4d 'M'
0x4c 'L'
0x00 '\0'
0x00 '\0'
0x00 '\0'
0x00 '\0'
2. Version major: 2 Byte unsigned integer signifying the major
version number of the CMML format bitstream.
3. Version minor: 2 Byte unsigned integer signifying the minor
version number of the CMML format bitstream.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Identifier 'CMML\0\0\0\0'                   | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Version major         |         Version minor         | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...
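A minimal sketch of reading these first 12 Bytes follows. It assumes
that the two version fields are encoded LSB first, as this document
states for the skeleton headers; the CMML [2] specification is
authoritative on that point. The function name is hypothetical.

```python
import struct


def parse_cmml_ident_prefix(packet):
    """Return (version_major, version_minor) from a CMML ident packet."""
    # Assumption: 2 Byte fields are little-endian, as in the skeleton headers.
    identifier, major, minor = struct.unpack_from("<8sHH", packet, 0)
    if identifier != b"CMML\0\0\0\0":
        raise ValueError("not a CMML ident packet")
    return major, minor
```

The remaining fields of the ident packet are left untouched, since
they are defined in the CMML specification rather than here.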
4. The Ogg skeleton logical bitstream
The purpose of Ogg skeleton is to provide codec-specific knowledge
that allows parsing, demultiplexing and remultiplexing of Ogg
bitstreams without having to decode.
While the Ogg encapsulation format by itself is capable of
interleaving an unlimited number of time-continuous bitstreams, it is
not possible to identify the type of bitstreams (e.g. audio or
video) and their encoding format (e.g. Vorbis or Speex or Theora)
without decoding at least the bos page of the logical bitstreams.
Also, further general media type information such as the image
dimensions of a frame in a video bitstream or the language of a
speech bitstream may be provided in skeleton. Another limitation of
Ogg is that each logical bitstream defines its own mapping of
granule_position to time, which is therefore also given in the
skeleton.
This situation is not acceptable for Annodex, because an Annodex
server must be able to return media format information for an Annodex
resource without having to understand the codecs involved. And it
must be able to return temporal subparts of an Annodex resource
without needing to decode.
An addition to the Ogg format is thus necessary, which describes all
the logical bitstreams included in the Ogg stream. This is defined
via a logical bitstream called the "skeleton". For Annodex
bitstreams, use of a skeleton bitstream is mandatory. This section
specifies the content of the "skeleton" logical bitstream and how it
is mapped into Ogg. Knowledge of the Ogg bitstream format as
specified in the Ogg RFC [3] is presumed. Please also refer to that
document for descriptions of the terms used in this document.
The skeleton bitstream has the ability to generically describe Ogg
bitstreams that consist of one or more time-continuous data bitstreams
and one or more time-instantaneous data bitstreams concurrently
interleaved (in Ogg terms: multiplexed). It does not describe
sequentially multiplexed Ogg bitstreams, but rather expects that a
sequentially multiplexed bitstream has its own skeleton logical
bitstream.
The skeleton logical bitstream provides the following functionality
on top of Ogg:
o allows for the identification of the codec format and the content
type of encapsulated logical bitstreams without the need to decode
that bitstream's headers or data.
o allows for extraction of a temporal interval of the Ogg physical
bitstream while retaining the original start time offset of that
interval.
o allows for attachment of a real-world wall-clock time and a date
to the Ogg physical bitstream, thus e.g. retaining creation
date/time or first broadcast date/time.
o allows for temporal offset operations into an Ogg physical
bitstream without a need to decode any data.
o allows generally for handling of content without a need to decode
it, such as is necessary in a caching Web proxy.
o allows for attachment of message header fields given as name-value
pairs that contain some sort of protocol messages about the
logical bitstream, e.g. the screen size for a video bitstream or
the number of channels for an audio bitstream.
For authoring of the skeleton bitstream information the CMML [2] can
be used. CMML's "stream" tag has been designed with that purpose in
mind. However, it is not mandatory to use CMML for authoring of
skeleton information - that information may well originate from a
different source and be written directly into the skeleton bitstream.
See the CMML Internet-Draft for more details.
4.1 The format of the skeleton ident header
The skeleton logical bitstream starts with an ident header containing
information for the complete Ogg physical bitstream. The ident
header has the following format:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Identifier 'fishead\0'                     | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Version major         |         Version minor         | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Presentationtime numerator                   | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Presentationtime denominator                  | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Basetime numerator                       | 28-31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Basetime denominator                      | 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 40-43
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              UTC                              | 44-47
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 48-51
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 52-55
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 56-59
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 60-63
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fields with more than one Byte length are encoded LSB (least
significant Byte) first.
The fields in the skeleton ident header have the following meaning:
1. Identifier: an 8 Byte field that identifies this bitstream as a
skeleton. It contains the magic numbers:
0x66 'f'
0x69 'i'
0x73 's'
0x68 'h'
0x65 'e'
0x61 'a'
0x64 'd'
0x00 '\0'
2. Version major: 2 Byte unsigned integer signifying the major
version number of the skeleton bitstream. This document
specifies the major version 3.
3. Version minor: 2 Byte unsigned integer signifying the minor
version number of the skeleton bitstream. This document
specifies the minor version 0.
4. Presentationtime numerator & denominator: 8 Byte signed integer
each. Together they represent the time at which to start
presenting the Ogg physical bitstream, given as a rational number.
The denominator represents the temporal resolution at which the
presentationtime is given. For example, a numerator of 5 and a
denominator of 1000 result in a presentationtime of 0.005 sec.
This enables a very high temporal resolution without having to
store floating point numbers. In a
newly created physical bitstream presentationtime and basetime
are the same. When remultiplexing a subpart of the stream, this
number MUST be adapted to the requested start time offset of the
newly created stream.
5. Basetime numerator & denominator: 8 Byte signed integer each.
Together they represent the basetime of the Ogg physical bitstream,
given as a rational number like the presentationtime. This
number is fixed once the physical bitstream is created and
provides a mapping to time for the beginning of the physical
bitstream when it starts with a granule position of 0.
6. UTC: a 20 Byte string containing a UTC time in the form of
YYYYMMDDTHHMMSS.sssZ. It associates a calendar date and a
wall-clock time with the basetime. It is a sequence of 20 NUL
Bytes if not in use, making this ident packet and thus the bos
page of the skeleton bitstream constant length.
Please note: The possible temporal resolution of the presentation-
and basetime is on the order of 2^-64. For example, the time formats
in use for media that are described in this document range from 1/24
to 1/60 for the different SMPTE formats. This resolution is enough
for any one of these. It is also expected to accommodate any future
needs of time resolution for any other time format and
time-continuously sampled data.
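The ident header layout above can be sketched as a fixed 64 Byte
structure read LSB first. The following is a minimal, non-normative
sketch; the function name and the shape of the returned dictionary
are assumptions made for illustration.

```python
import struct
from datetime import datetime, timezone
from fractions import Fraction

# Identifier, version major/minor, presentationtime and basetime as
# numerator/denominator pairs, UTC string: 64 Bytes in total.
FISHEAD = struct.Struct("<8sHHqqqq20s")


def parse_fishead(packet):
    """Decode a skeleton ident header into rational times and a UTC datetime."""
    (identifier, major, minor, pres_num, pres_den,
     base_num, base_den, utc) = FISHEAD.unpack_from(packet)
    if identifier != b"fishead\0":
        raise ValueError("not a skeleton ident header")
    if utc == bytes(20):
        wallclock = None  # 20 NUL Bytes: UTC field not in use
    else:
        wallclock = datetime.strptime(
            utc.decode("ascii"), "%Y%m%dT%H%M%S.%fZ"
        ).replace(tzinfo=timezone.utc)
    return {
        "version": (major, minor),
        # A zero denominator is treated as "unknown" in this sketch.
        "presentationtime": Fraction(pres_num, pres_den) if pres_den else None,
        "basetime": Fraction(base_num, base_den) if base_den else None,
        "utc": wallclock,
    }
```

Keeping the times as exact fractions mirrors the intent of the
numerator and denominator fields, which is to avoid floating point
rounding until a value is actually displayed.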
4.2 The format of the skeleton secondary headers
The skeleton secondary headers are a sequence of packets that each
contain information about one of the time-continuous or
time-instantaneous other logical bitstreams contained within the Ogg
physical bitstream. A skeleton secondary header packet has the
following format:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Identifier 'fisbone\0'                     | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Offset to message header fields                | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Serial number                         | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Number of header packets                    | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Granulerate numerator                     | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Granulerate denominator                    | 28-31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Startgranule                          | 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 40-43
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Preroll                            | 44-47
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granuleshift  |              Padding/future use               | 48-51
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Message header fields ...                   | 52-
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fields with more than one Byte length are encoded LSB (least
significant Byte) first.
The fields in a skeleton secondary header packet have the following
meaning:
1. Identifier: an 8 Byte field that identifies this packet as a
skeleton secondary header for identifying other logical
bitstreams. It contains the magic numbers:
0x66 'f'
0x69 'i'
0x73 's'
0x62 'b'
0x6f 'o'
0x6e 'n'
0x65 'e'
0x00 '\0'
2. Offset to message header fields: 4 Byte unsigned integer that
contains the number of Bytes used in this packet before the
message header fields. For the version of the skeleton
bitstream described in this document this number is fixed to 44.
This field accommodates future changes to the skeleton bitstream by
allowing message header fields to be parsed even if more fields are
inserted before them.
3. Serial number: 4 Byte unsigned integer containing the
bitstream_serial_number of the Ogg logical bitstream described
by this skeleton secondary header packet and thus connecting it
to the logical bitstream.
4. Number of header packets: a 4 Byte unsigned integer that contains
the number of header packets of that particular logical
bitstream consisting of the bos page and the secondary header
pages.
5. Granulerate numerator & denominator: 8 Byte signed integer each.
Together they represent the temporal resolution of the logical
bitstream in Hz, given as a rational number in the same way as the
basetime attribute above.
6. Startgranule: 8 Byte signed integer that represents the granule
number with which this logical bitstream starts, which is
originally 0, but will be a positive offset when only a subpart
of the stream is requested.
7. Preroll: 4 Byte unsigned integer that contains the number of
packets to pre-roll in order to decode a current packet
correctly. This is for example the case with Ogg Vorbis, which
requires a pre-roll of 2 packets.
8. Granuleshift: a 1 Byte unsigned integer describing whether to
partition the granule_position into two for that logical
bitstream, and how many of the lower bits to use for the
partitioning. The upper bits then still signify a
time-continuous granule position for a directly decodable and
presentable data granule. The lower bits allow for
specification of a finer resolution such that for example
predicted frames of a video can be addressed as well, though not
decoded without tracing back to the last fully decodable data
granule. This is e.g. the case with Ogg Theora.
9. Padding/future use: 3 Bytes padding data that may be used for
future requirements and are mandated to zero in this revision.
10. Message header fields: header fields, following the generic
Internet Message Format defined in RFC 2822 [7]. Each header
field consists of a name followed by a colon (":") and the field
value. Field names are case-insensitive. The field value MAY
be preceded by any amount of LWS, though a single SP is
preferred. Header fields can be extended over multiple lines by
preceding each extra line with at least one SP or HT.
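The fixed part of a secondary header packet and the granuleshift
partitioning can be sketched as follows. This is a minimal sketch
under one labelled assumption: the offset value is counted from the
start of the offset field itself (Byte 8), which reconciles the fixed
value 44 with message header fields starting at Byte 52 in the
diagram above. All function and key names are hypothetical.

```python
import struct
from fractions import Fraction

# Identifier, offset, serial number, number of header packets,
# granulerate numerator/denominator, startgranule, preroll,
# granuleshift, padding: 52 Bytes in total, LSB first.
FISBONE = struct.Struct("<8sIIIqqqIB3s")


def parse_fisbone(packet):
    """Decode the fixed fields of a skeleton secondary header packet."""
    (identifier, offset, serialno, n_headers, rate_num, rate_den,
     startgranule, preroll, granuleshift, _padding) = FISBONE.unpack_from(packet)
    if identifier != b"fisbone\0":
        raise ValueError("not a skeleton secondary header")
    # Assumption: the offset is relative to the offset field at Byte 8.
    headers_start = 8 + offset
    return {
        "serialno": serialno,
        "header_packets": n_headers,
        "granulerate": Fraction(rate_num, rate_den) if rate_den else None,
        "startgranule": startgranule,
        "preroll": preroll,
        "granuleshift": granuleshift,
        "message_headers": bytes(packet[headers_start:]),
    }


def split_granule(granule_position, granuleshift):
    """Split a granule_position into its upper and lower parts."""
    upper = granule_position >> granuleshift
    lower = granule_position & ((1 << granuleshift) - 1)
    return upper, lower
```

With a granuleshift of 0 the lower part is always 0, so the same code
path serves logical bitstreams that do not partition their granule
position.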
There is one mandatory Message header field for all of the logical
bitstreams: the "Content-type" header field. For an application that
is parsing the Annodex bitstream, this field contains the MIME type
and the character encoding of the data in the logical bitstream.
E.g. for the annotation bitstream, this field will contain the value
"Content-type: text/x-cmml; UTF-8" if the character set used in the
CMML bitstream is UTF-8. E.g. for a bitstream containing Ogg Vorbis
data the value is "Content-type: audio/x-vorbis". The Content-type
message header field MUST come first for all of the Message header
fields such that it can be found at a fixed location in the skeleton
fisbone packet.
As per RFC 2277 [8], message header fields are considered protocol
data, i.e. it is not expected to have human readable text in there,
and they MUST be entirely encoded in UTF-8. In addition, the
mandatory header fields MUST be encoded in US-ASCII and it is
recommended to also use US-ASCII code points as much as possible for
the optional header fields.
User defined optional message header fields MUST follow the naming
standard given in RFC 2822 [7].
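The handling of message header fields described above can be
sketched as follows: decode as UTF-8, unfold continuation lines that
begin with SP or HT, compare field names case-insensitively, and
require "Content-type" as the first field. The function name and the
choice to return a list of pairs are assumptions for illustration;
full RFC 2822 parsing has more corner cases than this sketch covers.

```python
def parse_message_headers(raw):
    """Return a list of (lowercased name, value) pairs from raw header Bytes."""
    text = raw.decode("utf-8")
    fields = []
    for line in text.replace("\r\n", "\n").split("\n"):
        if not line.strip():
            continue  # skip empty lines, e.g. after the final line break
        if line[0] in " \t" and fields:
            # Continuation line: append to the value of the previous field.
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
            continue
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError("malformed header field: %r" % line)
        fields.append((name.strip().lower(), value.strip()))
    if not fields or fields[0][0] != "content-type":
        raise ValueError("Content-type must be the first message header field")
    return fields
```

Lowercasing the names on input is one simple way to honour the
case-insensitivity of field names while keeping the field order
intact.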
4.3 Media mapping of skeleton into Ogg
The media mapping for skeleton into Ogg is as follows:
o The skeleton ident (fishead) header is mapped into the skeleton
bos page.
o The secondary header pages of a skeleton logical bitstream consist
of the fisbone header packets that each describe one particular
logical data bitstream within the Ogg physical bitstream.
o There are no content pages or data packets. As the skeleton eos
page is included before the first data page of any logical
bitstream, there actually cannot be any content data packets.
o The skeleton eos page contains one packet of length zero.
When using a skeleton logical bitstream in Ogg, a further restriction
on the order in which Ogg pages appear is introduced to allow for
easier identification:
1. The skeleton bos page is the very first bos page. This allows
its differentiation from other Ogg bitstreams that don't contain
a skeleton logical bitstream.
2. The bos pages of the other logical bitstreams come next as is a
requirement of the Ogg bitstream format.
3. The secondary header pages of all the logical bitstreams in the
Ogg physical bitstream come next, as is also a requirement of
Ogg. The skeleton secondary header pages are also included here.
4. Before any data pages of any of the logical bitstreams appear in
the Ogg physical bitstream, the skeleton eos page MUST end the
skeleton logical bitstream. This is necessary to end the control
section of the bitstream. If an Ogg stream parser reaches the
skeleton eos page, it knows that it has received all the bos and
secondary header pages and can start setting up its decoding or
parsing environment.
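The page ordering rules above can be expressed as a small state
check. The following Python sketch assumes that each Ogg page has
already been reduced to a (stream identifier, kind) pair, where kind
is one of "bos", "header", "eos" or "data"; this representation and
the function name are hypothetical and chosen only for illustration.

```python
def check_page_order(pages, skeleton):
    """pages: list of (stream_id, kind), kind in {"bos", "header", "eos", "data"}.

    Returns True if the sequence obeys the four ordering rules above.
    """
    if not pages or pages[0] != (skeleton, "bos"):
        return False                      # rule 1: skeleton bos comes first
    phase = "bos"                         # bos -> header -> data
    skeleton_ended = False
    for stream, kind in pages[1:]:
        if kind == "bos":
            if phase != "bos":
                return False              # rule 2: all bos pages are grouped
        elif kind == "header":
            if phase == "data" or skeleton_ended:
                return False              # rule 3: headers precede skeleton eos
            phase = "header"
        elif kind == "eos" and stream == skeleton:
            if phase == "data":
                return False              # rule 4: eos precedes any data page
            skeleton_ended = True
        else:                             # data page or eos of a data stream
            if not skeleton_ended:
                return False              # rule 4: control section not closed
            phase = "data"
    return skeleton_ended
```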
5. Handling time in Annodex format bitstreams
With time-continuous data like Annodex, one needs to handle data at
four different levels:
o at the Bytes level, upon seeking.
o at the packets level, upon encapsulating.
o at the granules level, upon recomposing.
o at the time level, upon displaying and addressing.
This section explains how they all fit together.
5.1 Conceptual overview
Annodex bitstreams inherently represent one timeline only, where the
different logical bitstreams can be thought of as content tracks on
that timeline. All of these tracks relate to the same timeline which
starts at a certain time point and ends when the last bitstream ends.
An example bitstream can be seen in the following figure. It
consists of an Annodex bitstream that contains 4 media bitstreams and
one CMML bitstream. The picture is a conceptual representation of
the time intervals covered by the different logical bitstreams and
the Ogg pages used to encapsulate the data. In the flat
representation these are multiplexed such that the data packets of
each of these bitstreams occur at the correct time.
t_url
|
t_0 v t_n
|------------------------------------------------------------------->|
----------------------------------------------------------------------
|clip1 | clip 2 |/clip 3///////////////| clip 4 |
----------------------------------------------------------------------
CMML bitstream
----------------------------------------------
| | | | | | | | | | |//| | | | |
----------------------------------------------
audio bitstream 1
-------------------------------------------------------------
| | | |/////| | | | | | |
-------------------------------------------------------------
video bitstream 1
----------------------------------------------------
| | | | |//| | | | | | | | | | | | |
----------------------------------------------------
audio bitstream 2
-------------------------------
| |/////| | | |
-------------------------------
video bitstream 2
The time point at which an Annodex bitstream starts (t_0 in the above
diagram) is called the "basetime" and represents the time in seconds
associated with the granule position of 0 on all logical bitstreams.
Typically, a newly created Annodex file starts all its logical
bitstreams at granule position 0, and a typical extract of an Annodex
bitstream, such as the one starting at t_url in the image above,
starts each of its logical bitstreams at a different granule
position. These granule positions are stored in the "startgranule"
field of the skeleton secondary header packets.
The "basetime" of an Annodex bitstream may be 0, but it can also be
any positive time. For example, in professional video production,
the first frame of video of a program normally refers to a SMPTE
basetime of 01:00:00:00, not 00:00:00:00 (see also the temporal URI
addressing [4] specification). Associating such a practice with a
digital video resource requires a way to store that basetime with the
resource and to interpret it correctly when addressing offsets such
as t_url. Annodex provides such a mapping through the basetime field
in the skeleton ident header.
Also associated with the basetime is a calendar date and wall-clock
time (a "UTC base") which represent a real-world time giving some
meaningful calendar date association to the content such as the
creation time or the first presentation time. The UTC base is
specified in the UTC field of the skeleton ident header.
5.2 Mapping a granule position to a time position
Each one of the encapsulated data bitstreams and the CMML bitstream
has its own temporal resolution at which it provides data to
cover the given timeline. This temporal resolution is usually given
through the sampling rate of the particular bitstream. For example,
a raw audio bitstream at CD quality is sampled with a sampling rate
of 44100 Hz. A video bitstream may be sampled with a frame rate of
25 frames per second.
This temporal resolution is called the "granulerate". A granule is a
data element that is based on a regular data rate specific to the
content type, such as the frame rate for video or the sampling rate
for audio. It even exists for bitstreams that are not sampled at a
regular rate - then it is the highest resolution of any of the used
sampling rates. The granulerate is specified in the skeleton
secondary header packets for each logical bitstream.
Each one of the bitstreams inserts data into the Ogg bitstream through
packets which have an associated temporal duration based on the
encoder packaging. Packets are packaged into Ogg pages, which have a
granule position associated with them. Not taking the special case
of a granuleshift into account, the granule position specifies the
number of granules that have been encapsulated from the implicit
start of the original bitstream up to and including the given Ogg
page.
The granule position together with the granulerate and granuleshift
information of the skeleton secondary header packets for the
particular logical bitstream are used for the calculation of the time
position for which a data packet of the logical bitstream completes
data. A granule position of -1 indicates a special case and MUST NOT
be used for calculation of a mapping to time.
In principle, the granule position of an Ogg page divided by the
granulerate of this page's logical bitstream provides the time
position that is reached in that bitstream after decoding all data
packets finished on this page. However, the granule_position field
in an Ogg page allows for a more fine-grained description of the
temporal position. The following image explains the composition of
the granule_position field in an Ogg page:
granule_position
------------------------------------------------
| keyindex | keyoffset |
------------------------------------------------
The granuleshift field of the skeleton secondary header packets
describes how many of the granule_position's 64 bits are being used
for the keyoffset. The keyoffset part of the granule_position is
commonly used when the logical bitstream consists of packets that can
only be fully decoded when referring back to a previous packet. For
example, video streams often consist of inter and intra coded frames,
where the intra frames are fully decodable and the inter frames are
intermediate frames that require backtracking to the last intra frame
for accurate decoding. Another example is a logical bitstream that
is mapped as instantaneous information (i.e. its granule position
represents both the start time and the end time of the packet data),
but actually has a duration associated with it, which is provided
through a subsequent packet. CMML is such an example. The keyindex part of
the granule_position is then used to provide the temporal position of
the reference packet and the keyoffset part provides a counter for
the data in between.
The calculation of the temporal position of an Ogg page in Annodex is
thus specified through the following algorithm:
t_page = basetime + ((keyindex + keyoffset) / granulerate)
The basetime provides the time offset used at the beginning of the
logical bitstream for the first data packet and thus MUST be added
for a correct calculation of the temporal position.
As an example, consider an audio bitstream that has a granulerate of
44100 (i.e. 44100 samples per second), a granuleshift of 0, and
starts at 4 sec. A granule_position of 88200 then maps to a time
position of 6 seconds:
t_page = 4 + ((88200 + 0) / 44100) = 6
This signifies that, after decoding this page's packets, 2 seconds of
the audio bitstream have been decoded, which maps to a time position
of 6 seconds because of the basetime of 4 seconds.
As another example, consider a video bitstream that has a granulerate
of 25 (i.e. 25 frames per second), a granuleshift of 3 (because it
encodes, say, 7 partial frames between each fully encoded frame),
and starts at 0 sec. When reaching a granule_position of 501, i.e.
a keyindex of 62 and a keyoffset of 5, this maps to a fully decodable
time position of 2.68 seconds:
t_page = 0 + ((62 + 5) / 25) = 2.68 sec
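The calculation can be sketched in Python as follows. The sketch
assumes that the keyoffset occupies the low-order "granuleshift" bits
of the granule_position and the keyindex the remaining high-order
bits, as suggested by the field diagram above; the function names are
illustrative only. Exact fractions are used to avoid rounding
effects.

```python
from fractions import Fraction

def split_granule_position(granule_position, granuleshift):
    """Split a granule_position into (keyindex, keyoffset)."""
    keyindex = granule_position >> granuleshift
    keyoffset = granule_position & ((1 << granuleshift) - 1)
    return keyindex, keyoffset

def page_time(granule_position, granulerate, granuleshift=0, basetime=0):
    """Time in seconds reached at the end of a page, as an exact Fraction."""
    if granule_position == -1:
        # -1 is a special case and MUST NOT be mapped to a time.
        raise ValueError("granule position -1 has no time mapping")
    keyindex, keyoffset = split_granule_position(granule_position, granuleshift)
    return basetime + Fraction(keyindex + keyoffset) / Fraction(granulerate)

# Audio example: granulerate 44100, granuleshift 0, basetime 4 sec.
audio_time = page_time(88200, 44100, 0, 4)

# Video example: keyindex 62 and keyoffset 5 packed with a granuleshift of 3.
video_time = page_time((62 << 3) | 5, 25, 3, 0)
```

Here audio_time evaluates to 6 and video_time to 67/25, i.e. 2.68
seconds.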
The granulerate of a time-instantaneous bitstream such as the CMML
bitstream can be chosen arbitrarily by the bitstream multiplexer.
Per default, a granulerate of 1000 is used, which is the resolution
of npt. The resolution of all the time schemes is given as:
o npt: 1000 (milliseconds)
o smpte-24: 24 (24 fps)
o smpte-24-drop: 24/1.001 = 23.976 (approx. as per SMPTE)
o smpte-25: 25
o smpte-30: 30
o smpte-30-drop: 30/1.001 = 29.970 (approx. as per SMPTE)
o smpte-50: 50
o smpte-60: 60
o smpte-60-drop: 60/1.001 = 59.940 (approx. as per SMPTE)
The granule position of the page finishing data of a
time-instantaneous bitstream packet MUST signify the start time of
that packet. For example, a CMML bitstream with a granulerate of
1000, a basetime of 0, and a clip that lasts from npt=12.020 till
npt=15.0 will get a granule_position of 12020. In contrast, the
granule_position of the page finishing data of e.g. an audio
bitstream with granulerate 44100, basetime 0 and containing data from
npt=12.020 to npt=15.0 will be 661500.
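The difference between the two cases can be sketched in Python. The
table restates the resolutions listed above as exact fractions; the
function name and its parameters are hypothetical. Times are passed
as decimal strings so that they convert to fractions without
floating-point error.

```python
from fractions import Fraction

# Granulerates of the time schemes listed above, kept exact as fractions.
RESOLUTION = {
    "npt": Fraction(1000),
    "smpte-24": Fraction(24),
    "smpte-24-drop": Fraction(24000, 1001),
    "smpte-25": Fraction(25),
    "smpte-30": Fraction(30),
    "smpte-30-drop": Fraction(30000, 1001),
    "smpte-50": Fraction(50),
    "smpte-60": Fraction(60),
    "smpte-60-drop": Fraction(60000, 1001),
}

def granule_for_packet(start, end, granulerate, basetime=0, instantaneous=False):
    """Granule position of the page that finishes a packet spanning [start, end].

    A time-instantaneous bitstream is stamped with the start time of the
    packet, any other bitstream with the time at which its data ends.
    """
    t = start if instantaneous else end
    return int((Fraction(t) - Fraction(basetime)) * granulerate)

cmml_granule = granule_for_packet("12.020", "15.0", RESOLUTION["npt"],
                                  instantaneous=True)
audio_granule = granule_for_packet("12.020", "15.0", 44100)
```

With these inputs cmml_granule is 12020 and audio_granule is 661500,
matching the values given in the text.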
A note about field overflows: an overflow of the granule position
field can destroy the temporal integrity of the Annodex physical
bitstream. In this case, a multiplexer MUST end the Annodex physical
bitstream and restart a new one resetting the counter to 0 and
adjusting the basetime appropriately. This is also called sequential
multiplexing in Ogg. The same measure MUST be taken in case of an
overflow of the page_sequence_number on one of the logical
bitstreams.
5.3 Addressing/seeking into the bitstream
Addressing into an Annodex bitstream is possible with the temporal
URI addressing [4] scheme. Time is specified as a temporal offset
from the "beginning" of the stream, making use of the basetime field.
Time offsets can also be specified as calendar dates and times. The
UTC base is then used as a basis for offsetting.
The basetime makes it possible to correctly map a temporal offset
point such as a temporal URI to a Byte position in the stream. In the
above figure, take t_url=npt:14.0 as the temporal offset addressed on
a stream with t_0=npt:5.0 as the basetime. This requires a stream
offset of only 9 sec to reach the appropriate granule position in
each of the bitstreams, marked in the figure through patterned pages.
The seeking action is performed on the interleaved bitstream, in
which the data packets occur in a temporally consecutive order based
on the time at which their data ends. These times are represented in
the granule positions of the Ogg pages, which are only allowed to
monotonically increase within one logical bitstream. This implies
that when having found an Ogg page with a granule position that maps
to a given seek time (i.e. covers the time or ends at it), the seek
has found the right location. This applies over all logical
bitstreams. In the above example, this means that the Byte position
of the first occurring page of the patterned pages has been found.
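Because granule positions only increase within one logical
bitstream, the page covering a seek time can be located by
bisection. The following Python sketch works on an in-memory list of
the granule positions of one logical bitstream with a granuleshift of
0; a real implementation would bisect over Byte positions of the
physical bitstream instead. The function name and the list
representation are assumptions made for illustration.

```python
from bisect import bisect_left
from fractions import Fraction

def find_seek_page(page_granules, granulerate, basetime, seek_time):
    """Index of the first page whose end time reaches seek_time.

    page_granules: granule positions of one logical bitstream's pages, in
    stream order (non-decreasing). Returns len(page_granules) if the seek
    time lies beyond the last page.
    """
    # Subtract the basetime to obtain the offset into the stream, then
    # convert that offset into granules.
    target = (Fraction(seek_time) - Fraction(basetime)) * granulerate
    return bisect_left(page_granules, target)

# Pages that each hold 2 seconds of audio at 44100 Hz; basetime npt:5.0.
pages = [88200, 176400, 264600, 352800, 441000, 529200]
index = find_seek_page(pages, 44100, 5, 14)
```

For a seek time of npt:14.0 the offset is 9 seconds, so index refers
to the page ending at granule position 441000, which covers the
offsets from 8 to 10 seconds.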
There is a complication to the seeking: some logical bitstreams have
backwards dependencies in their data packets and these have to be
taken into account for seeking. For example, a logical bitstream may
require several of its previous packets to allow a correct and
complete decoding of the actual packet that occurs at the seektime.
This is the case for Theora, which requires going back to the previous
keyframe when decoding from a time offset. It is also the case for
Vorbis, which requires the previous 2 packets for accurate setup of
the frequency transform; Speex needs approximately 2 packets for
similar reasons. Even instantaneous bitstreams such as CMML may
require going back to a previous packet to recover the last state
information, i.e. the currently active clip in the case of CMML.
Therefore, once seeking has located the correct Byte position that
refers to the given temporal offset, the seeker MUST seek back. For logical
bitstreams that have a non-zero "granuleshift" in the skeleton, it
MUST seek back to the Ogg page that has a "keyindex" granule
position. For logical bitstreams that have a non-zero "preroll" in
the skeleton, it MUST seek back that many packets. The earliest Byte
position that satisfies all these requirements is the correct seek
position.
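The seek-back rules can be sketched in Python as follows. The sketch
makes several simplifying assumptions that are not stated in this
specification: each Ogg page carries exactly one packet, a page with
a "keyindex" granule position is taken to be one whose keyoffset bits
are all zero, and every logical bitstream is described by a
dictionary with the hypothetical keys used below.

```python
def seek_back_index(granules, hit, granuleshift=0, preroll=0):
    """Index to start decoding from, given the page index found by seeking.

    granules: granule_position of each page of one logical bitstream.
    """
    start = hit
    if granuleshift:
        mask = (1 << granuleshift) - 1
        # Walk back to a page whose keyoffset is zero, i.e. a keyindex page.
        while start > 0 and granules[start] & mask:
            start -= 1
    if preroll:
        # Go back the required number of packets, but not before the start.
        start = max(0, start - preroll)
    return start

def seek_position(streams):
    """Earliest Byte offset satisfying every stream's backward dependencies."""
    return min(
        s["offsets"][seek_back_index(s["granules"], s["hit"],
                                     s["granuleshift"], s["preroll"])]
        for s in streams
    )

video = {"offsets": [0, 100, 200, 300, 400], "granules": [8, 9, 16, 17, 18],
         "hit": 4, "granuleshift": 3, "preroll": 0}
audio = {"offsets": [50, 150, 250, 350, 450],
         "granules": [1024, 2048, 3072, 4096, 5120],
         "hit": 4, "granuleshift": 0, "preroll": 2}
position = seek_position([video, audio])
```

In this constructed example the video bitstream has to restart at
Byte offset 200 and the audio bitstream at 250, so position is 200.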
A player that presents from an offset MUST take into account that the
bitstream may contain some packets that are only there to allow
accurate decoding at the seek time. When the backwards dependencies
were resolved for a specific logical bitstream, several non-relevant
Ogg pages may also have ended up in between. These have to be
skipped by a player. The time that a player MUST start presenting
from is given in the "presentationtime" field of the skeleton ident
header.
5.4 Remultiplexing a bitstream
When a subpart of an Annodex bitstream is requested, such as through
a temporal URI query request from a Web server, the bitstream MUST be
recomposed and a remultiplexed bitstream served out. There are
several aims for performing the remultiplexing with as little effort
and therefore as little delay as possible:
o no decoding of the logical bitstreams is performed.
o no changes to the pages, in particular to the granule positions
are made.
o changes occur only to the control section.
The fields of the skeleton track allow achievement of all these aims.
Remultiplexing is essentially achieved by seeking to the position as
described above and then including from each logical bitstream only
the relevant Ogg pages into the new stream. Changes to fields in the
bitstream are restricted to the control section:
o the "presentationtime" MUST be adjusted to the requested start
time
o the "startgranule" for each logical bitstream MUST be adjusted to
the granule position at which each logical bitstream starts. This
is not the first granule position of the Ogg pages included into
the bitstream, but rather the last one that did not get included,
as it represents the start time of the bitstream.
Everything else, and in particular the Ogg pages, stays the same.
This is important also to allow caching of such files as is required
for Web proxies and described in temporal URI addressing [4].
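The adjustment of the control section can be sketched in Python. The
sketch assumes that the source bitstream starts all its logical
bitstreams at granule position 0 and that the index of the first page
copied from each logical bitstream is already known from seeking; the
function name and the dictionary layout are hypothetical.

```python
def remux_control_fields(streams, first_included, requested_start):
    """Compute the control-section fields of a remultiplexed bitstream.

    streams: maps a stream name to the granule positions of its pages in
    the source bitstream. first_included: maps a stream name to the index
    of the first page copied into the new bitstream.
    """
    startgranule = {}
    for name, granules in streams.items():
        i = first_included[name]
        # Granule position of the last page that was NOT included;
        # 0 if the stream is copied from its very first page.
        startgranule[name] = granules[i - 1] if i > 0 else 0
    return {"presentationtime": requested_start, "startgranule": startgranule}

fields = remux_control_fields(
    {"audio": [88200, 176400, 264600], "cmml": [0, 12020]},
    {"audio": 2, "cmml": 1},
    4.5,
)
```

The Ogg pages themselves are copied unchanged; only the values
returned here would be written into the new skeleton headers.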
6. MIME media type applications
6.1 MIME media type registration for 'application/annodex'
This section contains the registration information for the
"application/annodex" media type. While this media type is not
approved by the IANA, "application/x-annodex" may be used.
To: ietf-types@iana.org
Subject: Registration of MIME media type application/annodex
MIME media type name: application
MIME subtype name: annodex
Required parameters: none
Optional parameters: none
Encoding Considerations: Annodex is an exchange format for any type
of encoded time-continuously sampled data stream. The authoring
software MUST provide for the encoders, providing the MIME type (and
potentially the charset for text-based formats) in the "Content-type"
Message header field of each bitstream. The client software can
select an appropriate decoder based on this information.
Security considerations: see next section.
Interoperability considerations: the Annodex bitstream format is a
free specification that is independent of any media encoding format.
It is designed to provide interoperability with the existing World
Wide Web.
Additional information:
Magic numbers: "OggS" identifies an Ogg page at Byte position 0,
"fishead\0" identifies a skeleton logical bitstream at Byte
position 28. In the second Ogg page at Byte position 28 the magic
number "CMML\0\0\0\0" can be found, identifying this as an Annodex
bitstream.
File extension: .anx
Macintosh File Type Code: "ANDX"
Intended usage: COMMON
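The magic numbers given above can be checked with a few lines of
Python. The sketch relies on the Ogg page layout of RFC 3533 [3], in
which a page header consists of 27 Bytes followed by a segment table
whose entries add up to the length of the page body. The function
name is illustrative, and the input is assumed to be a prefix of the
file that is long enough to contain the first two pages.

```python
def looks_like_annodex(data):
    """Check the magic numbers of an Annodex bitstream in a bytes prefix."""
    # First page: "OggS" at Byte 0 and "fishead\0" at Byte 28.
    if data[0:4] != b"OggS" or data[28:36] != b"fishead\0":
        return False
    # Length of the first page: 27 header Bytes, the segment table,
    # and the body whose length is the sum of the segment table entries.
    nsegs = data[26]
    second = 27 + nsegs + sum(data[27:27 + nsegs])
    # Second page: "OggS" again and "CMML\0\0\0\0" at its Byte 28.
    return (data[second:second + 4] == b"OggS"
            and data[second + 28:second + 36] == b"CMML\0\0\0\0")
```

The check only inspects the magic numbers; it does not verify page
checksums or the remaining header fields.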
6.1.1 URI addressing into Annodex bitstreams
As Annodex bitstreams are time-continuous Web resources, hyperlinking
into Annodex bitstreams via URIs is possible with the temporal URI
query and fragment specification [4]. For the query case, an Annodex
server must support the "X-Accept-TimeURI" http header field (see
the temporal URI query specification [4] for more details). The
"X-Accept-Range-Redirect" and "X-Range-Redirect" http header fields
MAY also be supported by an Annodex server and user agent.
As Annodex bitstreams contain CMML logical bitstreams, URI addressing
of clips via their name given in the "id" tag is also supported. The
same mechanisms as specified in the CMML specification [2] apply to
Annodex analogously. In particular, the id addressing is also
regarded as an alias for a time offset and an Annodex conformant
server that supports Annodex temporal URI addressing MUST also
support named URI addressing (see the CMML specification [2] for more
details).
Examples for valid URI addresses:
o http://example.com/sample.anx?t=npt:4 --- relates to an
Annodex bitstream composed by the server from sample.anx by
starting it at an offset of 4 seconds.
o http://example.com/sample.anx?id=dolphin --- relates to the clip
whose id attribute value is "dolphin" and all further clips after
that.
o http://example.com/sample.anx?id="dolphin/" --- relates only to
the clip whose id attribute value is "dolphin".
o http://example.com/sample.anx?id="intro/goldfish" --- relates to
all the clips from the "intro" clip to the "goldfish" clip.
o http://example.com/sample.anx#t=npt:4 --- start using the Annodex
bitstream from a 4 second offset.
o http://example.com/sample.anx#dolphin --- use the clip with
id="dolphin" only.
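A server or user agent has to distinguish these addressing forms
before it can act on them. The following Python sketch uses the
standard urllib.parse module to classify the example URIs; the
function name and the returned tuples are hypothetical, and the
sketch does not interpret the time or clip values any further.

```python
from urllib.parse import urlsplit, parse_qs

def classify_annodex_uri(uri):
    """Return (kind, part, value) for the addressing forms shown above."""
    parts = urlsplit(uri)
    if parts.query:
        query = parse_qs(parts.query)
        if "t" in query:
            return ("time", "query", query["t"][0])
        if "id" in query:
            # Clip names may be quoted in the query; drop the quotes.
            return ("clip", "query", query["id"][0].strip('"'))
    if parts.fragment:
        if parts.fragment.startswith("t="):
            return ("time", "fragment", parts.fragment[2:])
        return ("clip", "fragment", parts.fragment)
    return ("whole", None, None)

result = classify_annodex_uri("http://example.com/sample.anx?t=npt:4")
```

For the first example URI, result is ("time", "query", "npt:4"). A
query is resolved by the server, whereas a fragment is resolved by
the user agent.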
6.1.2 HTTP 'Accept' header field interpretation
The Annodex file and the CMML file that can be extracted from it are very
tightly related to each other: the CMML file contains all annotation
and indexing information including basetime and UTC time about the
Annodex file. Therefore, receiving the CMML file instead of the
Annodex file is like receiving all information about the bitstreams
in the Annodex file except for the data bitstreams themselves.
This situation can be taken advantage of with the "Accept" header of
HTTP. When an Annodex file is requested from an HTTP server and the
acceptable content types given in the "Accept" message header field
contain "text/x-cmml" with a higher priority than
"application/x-annodex", then the HTTP server SHOULD return the CMML
file instead of the requested Annodex file itself. As is standard,
the HTTP response will contain a "Content-type" field indicating what
content was actually returned. A Web crawler of a search engine,
e.g., can thus avoid extra network load and retrieve more easily
parsable information. It SHOULD set the "Accept" HTTP header to
"Accept: text/x-cmml" for every requested Annodex URI. For example:
Accept: text/x-cmml; q=1, application/x-annodex; q=0.5
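On the server side, the decision described above amounts to
comparing two quality values. The following Python sketch parses an
"Accept" header value by hand; it ignores wildcard media ranges such
as "*/*" and treats a media type that is not listed as having a
quality of 0. The function name is an assumption made for
illustration.

```python
def prefers_cmml(accept_header):
    """True if text/x-cmml has a higher q-value than application/x-annodex."""
    quality = {}
    for item in accept_header.split(","):
        fields = [f.strip() for f in item.split(";")]
        q = 1.0                            # HTTP default when no q is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0                # treat a malformed q as unacceptable
        quality[fields[0].lower()] = q
    cmml = quality.get("text/x-cmml", 0.0)
    annodex = quality.get("application/x-annodex", 0.0)
    return cmml > annodex

serve_cmml = prefers_cmml("text/x-cmml; q=1, application/x-annodex; q=0.5")
```

For the example header, serve_cmml is True, so the server would
return the CMML file.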
6.2 MIME media type registration for 'video/annodex'
This section contains the registration information for the
"video/annodex" media type. While this media type is not approved by
the IANA, "video/x-annodex" may be used.
To: ietf-types@iana.org
Subject: Registration of MIME media type "video/annodex"
MIME media type name: video
MIME subtype name: annodex
Required parameters: none
Optional parameters: none
Encoding Considerations: Annodex video is a subclass of Annodex data
where there is at least one video track encapsulated together with the
skeleton and CMML tracks, and a potentially unlimited number of other
audio and video tracks.
Security considerations: as in "application/annodex" MIME
application.
Interoperability considerations: as in "application/annodex" MIME
application.
Additional information:
Magic numbers: as in "application/annodex" MIME application.
File extension: .axv
Macintosh File Type Code: "ANXV"
Intended usage: COMMON
URI addressing and HTTP header field use of "application/annodex"
type content apply analogously to "video/annodex".
6.3 MIME media type registration for 'audio/annodex'
This section contains the registration information for the
"audio/annodex" media type. While this media type is not approved by
the IANA, "audio/x-annodex" may be used.
To: ietf-types@iana.org
Subject: Registration of MIME media type "audio/annodex"
MIME media type name: audio
MIME subtype name: annodex
Required parameters: none
Optional parameters: none
Encoding Considerations: Annodex audio is a subclass of Annodex data
where there is at least one audio track encapsulated together with the
skeleton and CMML tracks, and a potentially unlimited number of other
audio tracks.
Security considerations: as in "application/annodex" MIME
application.
Interoperability considerations: as in "application/annodex" MIME
application.
Additional information:
Magic numbers: as in "application/annodex" MIME application.
File extension: .axa
Macintosh File Type Code: "ANXA"
Intended usage: COMMON
URI addressing and HTTP header field use of "application/annodex"
type content apply analogously to "audio/annodex".
7. Security considerations
Annodex format bitstreams contain several multiplexed binary media
and one XML annotation bitstream. There is no generic encryption or
signing mechanism provided for the complete bitstream or any one of
its parts. As the format of the encapsulated media bitstreams is not
prescribed and is identified through the "Content-type" Message
header field in that bitstream's skeleton secondary header packet, it
is possible to encrypt or sign that media bitstream and then mark it
accordingly with a MIME type that signifies the encryption. It is up
to the applications that use this bitstream to provide an appropriate
codec to handle such bitstreams.
As Annodex format bitstreams contain binary media bitstreams, it is
possible to include executable content in them. This can be an issue
with applications that decode these bitstreams, especially when they
are used in a network scenario. Such applications MUST ensure
correct handling of manipulated bitstreams, of buffer overflow and
the like.
8. ChangeLog
draft-pfeiffer-annodex-01:
o Annodex version 2.0: changes because of renamings of CMML tags and
changes to the temporal and named URI addressing.
draft-pfeiffer-annodex-02:
o Annodex version 3.0: The changes pertain to the bitstream format
to allow for a stronger decoupling of Annodex and CMML. The
Annodex format is now using the Ogg format with a "skeleton" and a
"CMML" logical bitstream. This change has reinforced a layered
approach that fits better with existing practice in Internet
protocols, where each layer solves a specific problem without
being dependent on other layers further up.
9. References
[1] World Wide Web Consortium, "HTML 4.01 Specification", W3C HTML,
December 1999.
[2] Pfeiffer, S., Parker, C. and A. Pang, "The Continuous Media
Markup Language (CMML), Version 2.0 (work in progress)",
I-D draft-pfeiffer-cmml-02.txt, March 2005.
[3] Pfeiffer, S., "The Ogg encapsulation format version 0",
RFC 3533, May 2003.
[4] Pfeiffer, S., Parker, C. and A. Pang, "Specifying time
intervals in URI queries and fragments of time-based Web
resources (work in progress)",
I-D draft-pfeiffer-temporal-fragments-03.txt, March 2005.
[5] Fielding, R., Gettys, J., Mogul, J., Nielsen, H., Masinter, L.,
Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol --
HTTP/1.1", RFC 2616, June 1999.
[6] Schulzrinne, H., Rao, A. and R. Lanphier, "Real Time Streaming
Protocol (RTSP)", RFC 2326, April 1998.
[7] Resnick, P., "Internet Message Format", RFC 2822, April 2001.
[8] Alvestrand, H., "IETF Policy on Character Sets and Languages",
RFC 2277, January 1998.
[9] Bradner, S., "Key words for use in RFCs to Indicate
Requirements Levels", RFC 2119, BCP 14, March 1997.
[10] World Wide Web Consortium, "Extensible Markup Language (XML)
1.0", W3C XML, October 2000.
[11] World Wide Web Consortium, "XHTML(TM) 1.0 The Extensible Hyper
Text Markup Language", W3C XHTML, January 2000.
[12] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
Resource Identifiers (URI): Generic Syntax", RFC 3986, January
2005.
[13] Alvestrand, H., "Tags for the Identification of Languages",
RFC 1766, March 1995.
[14] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046, November
1996.
[15] Whitehead, E. and M. Murata, "XML Media Types", RFC 2376, July
1998.
[16] The Society of Motion Picture and Television Engineers, "SMPTE
STANDARD for Television, Audio and Film - Time and Control
Code", ANSI 12M-1999, September 1999.
[17] ISO, TC154., "Data elements and interchange formats --
Information interchange -- Representation of dates and times",
ISO 8601, 2000.
Authors' Addresses
Silvia Pfeiffer
Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
PO Box 76
Epping, NSW 1710
Australia
Phone: +61 2 9372 4180
Email: Silvia.Pfeiffer@csiro.au
URI: http://www.ict.csiro.au/
Conrad D. Parker
Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
PO Box 76
Epping, NSW 1710
Australia
Phone: +61 2 9372 4222
Email: Conrad.Parker@csiro.au
URI: http://www.ict.csiro.au/
Andre T. Pang
Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
PO Box 76
Epping, NSW 1710
Australia
Phone: +61 2 9372 4222
Email: Andre.Pang@csiro.au
URI: http://www.ict.csiro.au/
Appendix A. Definitions of terms and abbreviations
Time-continuously sampled data: any sequence of binary data that
represents an analog-time signal sampled in discrete time steps.
In contrast to actual discrete-time signals as known from signal
processing, time-continuously sampled data may also come in
compressed form, such that a block of numbers represents an
interval of time.
Time-instantaneous bitstream: a time-continuously sampled data stream
where the components provide information for a specific
time-instant.
Time-continuous bitstream: a time-continuously sampled data stream
where the components provide ongoing information as time goes by.
Clip: a temporal section of a time-continuous data stream.
Annotation: a free-text, unstructured description of a clip.
Metadata: a name-value pair that provides a structured, database-like
description of the content.
Hyperlink: a Uniform Resource Identifier (URI).
Meta information: collection of information about a data stream,
which may include annotations, hyperlinks, and metadata.
Fragment: a subpart of a media document covering some temporal
interval.
Mark-up: XML tags and their content used to describe a media
document.
Annodex bitstream: encapsulated time-continuous bitstream with head
and clip elements.
Annotating: the task of giving textual descriptions to fragments of
media documents.
Indexing: the task of identifying index points for media documents or
fragments thereof.
Hyperlinking: the task of linking from one Web resource to another.
If a link has an offset into the resource, this is sometimes
called deep hyperlinking.
head element: CMML data containing information on an Annodexed media
file.
media packet: a block of digital data that represents a temporal
subpart of a stream of continuous media. Media packets of one
continuous media file do not overlap in time.
bitstream: a sequence of time-continuous data.
Appendix B. Glossary of acronyms
CMML: Continuous Media Markup Language.
DTD: Document Type Definition.
XML: eXtensible Markup Language.
CMWeb: Continuous Media Web.
Web: World Wide Web.
URI: Uniform Resource Identifier.
Appendix C. Acknowledgments
The authors gratefully acknowledge the contributions of Rob Collins,
Zentaro Kavanagh, Andrew Nesbit and Simon Lai in developing this
specification.
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.