INTERNET-DRAFT                                    Eric Fleischman
draft-fleischman-asf-rtp-record-00                Anders Klemets
                                                  Microsoft Corporation
                                                  November 14, 1997
                                                  Expires: May 14, 1998

          Recording MBone Sessions to ASF Files

Status of This Memo

This document is an Internet-Draft.  Internet-Drafts are working 
documents of the Internet Engineering Task Force (IETF), its areas, and 
its working groups.  Note that other groups may also distribute working 
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months 
and may be updated, replaced, or obsoleted by other documents at any 
time.  It is inappropriate to use Internet-Drafts as reference material 
or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the 
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow 
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), 
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or 
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Abstract

This document specifies two approaches by which multimedia data (e.g., 
MBone conferences), transmitted using the Real-time Transport Protocol 
(RTP), may be recorded to Advanced Streaming Format (ASF) files. The 
first approach requires minimal buffering at the recording station but 
results in recordings that preserve the received content exactly, 
including out-of-order packets, network ``jitter'', etc. The second 
approach requires buffering at the recording station but results in 
enhanced recordings (i.e., higher percentage of correctly ordered 
packets, elimination of a percentage of received jitter, potential 
recovery of a percentage of lost packets). Both approaches record all 
received RTP content and the relevant subset of RTCP information. This 
recording occurs transparently to the MBone conference or RTP session, 
and does not involve any alterations to normal RTP, RTCP, or ASF use.




E. Fleischman and A. Klemets                                    [Page 1]


Internet Draft   draft-fleischman-asf-rtp-record-00   November 14, 1997

1. Introduction

The MBone is the part of the Internet that supports IP multicast, and 
thus permits efficient many-to-many communication. It is used 
extensively for multimedia conferencing. Such conferences usually have 
the property that tight coordination of conference membership is not 
necessary; to receive a conference, a user at an MBone site only has to 
know the conference's multicast group address and the UDP ports for the 
conference data streams. The specific MBone conferences addressed by 
this document are those which use the Real-time Transport Protocol (RTP, 
see [1]). In addition, the mechanisms described within this document 
also support unicast RTP uses.

This document describes two methods for recording multimedia data that 
is transmitted using RTP into Advanced Streaming Format (ASF; see [2]) 
files. Both methods are independent of the network protocol used to 
transmit RTP packets and support the recording of both unicast and 
multicast sessions. Data 
thus recorded may subsequently be played back by recreating the original 
RTP packets and transmitting them using either unicast or multicast 
techniques. A recording can also be played back locally, using a 
suitable playback tool. Playback can be controlled using RTSP [4] or 
other comparable stream control mechanisms.

RTP is a protocol for carrying arbitrary real-time data.  Each RTP 
packet contains a sequence number and timestamp, which can be used by a 
receiver to detect losses and present the data at the right time.  RTP 
uses a control protocol, RTCP, which can be used to synchronize 
different real-time streams.  For synchronization to be possible, the 
streams must be transmitted such that each stream has a distinct RTP 
synchronization source (SSRC) identifier.  RTP is most commonly used 
over UDP.  However, it may be used with any transport protocol that 
detects bit errors, and that conveys the length of an RTP packet.  RTP 
does not specify a mechanism for the reliable transfer of data.  The 
protocol also does not address the encapsulation of specific media 
types, but instead defers it to various profile specifications.  

ASF is an extensible file format for recording optionally synchronized 
multimedia streams.  The format is not tied to any particular media type 
or compression scheme. Similarly, the file format was designed to be 
operating system and data communications protocol independent.

2. ASF Overview


The Advanced Streaming Format is defined in [2].  An ASF file consists 
of three top-level objects: the Header Object, the Data Object, and, 
optionally, the Index Object. 

The Header Object provides global information about the file as a whole 
as well as specific information about the multimedia data stored within 
the Data Object. This latter content provides the information necessary 
to correctly interpret each of the media streams found within the Data 
Object. The Header Object is a container for other objects that provide 
the following specific functions:

* File Properties Object -- describes the global file attributes.
* Stream Properties Object -- defines a media stream, its 
  characteristics, and the information needed to decode that stream.
* Content Description Object -- contains all bibliographic information, 
  which may be either general for the file as a whole or stream 
  specific.
* Component Download Object -- provides information on playback 
  components.
* Stream Group Object -- logically groups media streams together into 
  specific rendering contexts.
* Scaleable Object -- defines scalability relationships among 
  (scaleable) media streams containing bands.
* Prioritization Object -- defines the relative prioritization between 
  media streams.
* Mutual Exclusion Object -- defines exclusion relationships between 
  media streams (e.g., language selection).
* Inter-Media Dependency Object -- defines dependency relationships 
  among mixed media streams.
* Rating Object -- provides the W3C PICS ([5], [6]) rating of the file.
* Index Parameters Object -- supplies the information necessary to 
  regenerate the index of an ASF file.
* Language List Object -- supplies Language Identifier information that 
  is used by several other ASF objects.

The Data Object contains all the data for each of the recorded media 
streams. This data is stored in the form of ASF Data Units. In the 
general case, ASF Data Units are designed to be directly insertable into 
the payloads of data communications transport protocols in order to be 
streamed across the network.  Each ASF Data Unit is of variable length, 
and contains data for only one media stream. Data Units are sorted 
within the Data Object based on the time at which they should be 
delivered (send time). Due to the way Data Units are sorted, consecutive 
Data Units may contain data from different media streams. 

ASF media streams logically (in the general case) consist of 
sub-elements that are referred to as objects. What constitutes an object 
in a given media stream depends entirely on the media stream (e.g., a 
specific image within an image media stream, or a frame within a 
non-scalable video stream). 

The Index Object contains a time-based index into the multimedia data of 
an ASF file. The time interval that each index entry represents is set 
at authoring time and stored in the Index Object. Since it is not 
required to index into every media stream in a file, a list of the media 
streams that are indexed follows the time interval value. Each index 
entry consists of one data unit offset per media stream being indexed. 
This information allows stream-specific index operations to occur.

A minimal ASF implementation consists of a Header Object containing 
solely a File Properties Object, one Stream Properties Object, and one 
Language List Object, as well as a Data Object containing only a single 
ASF Data Unit. 

3. Recording MBone Sessions

The process of recording MBone sessions may be viewed as consisting of 
four steps, the last two of which are optional:

  Step 1 -- Create the ASF Header Object, which will provide the 
         context for correctly interpreting the data that may subsequently 
         be recorded.

  Step 2 -- Record one or more RTP streams into the ASF Data Object.

  Step 3 -- Optionally post-process the ASF Header Object to ensure 
         that it is as complete and as efficiently stored as possible.

  Step 4 -- Optionally create an ASF Index Object.

3.1. Preparing ASF Header Information

The ASF Header Object contains various other objects that contain 
information about the media streams in the Data Object. It is often 
desirable to create an ASF Header Object before the transmission that is 
to be recorded has begun.  This would be appropriate if information is 
already available that describes the RTP sources that are to be 
recorded.  Such information might be obtained through SDP [7], RTSP [4], 
or some other non-RTP means. It is also possible to add information to 
the ASF Header Object as new information is learned during the recording 
of the RTP traffic. 

ASF requires that an instance of the Stream Properties Object (SPO) must 
be defined to describe each media stream recorded within the Data 
Object. A media stream generally corresponds to an RTP source in an RTP 
session. RTP sources, in turn, are identified by the value of the SSRC 
field in the RTP header. An RTP session is identified by the IP address 
and port number to which the data is sent. On the MBone, most 
applications send audio and video on separate RTP sessions, and thus 
audio and video would be recorded as two separate media streams. 
However, all RTP packets that belong to a media stream are expected to 
have identical RTP Payload Type fields. If an RTP source changes the 
value it is using for the RTP Payload Type field "mid-session", then 
RTP packets with the new 
(i.e., different) Payload Type fields should be stored as a different 
media stream within ASF with its own unique SPO. It is recommended that 
the relationship between streams that compose the traffic from a single 
RTP source be associated by grouping them via the ASF Header Object's 
Stream Group Object.

While the session announcement will generally provide enough information 
to construct an initial File Properties Object (FPO) and some of the 
necessary SPOs before the session begins, loosely controlled (MBone) 
conferences can permit additional participants to join the conference. 
Therefore, provision should be made to anticipate the possibility of 
additional speakers joining the session. A recommended way to satisfy 
this provision is to reserve space within the ASF Header Object via the 
ASF Placeholder Object (see Appendix A) where additional ASF objects may 
be written (e.g., additional SPOs) as the MBone session dynamically 
progresses.  

Static RTP Payload Types may be handled in one of two ways:
1. Static RTP Payload Types should be translated into the equivalent 
   ASF standard media type (see Section 8 of [2]) using the equivalent 
   ASF codec (e.g., see Reference [10]), if known.
2. Alternatively, they can be recorded as RTP Media Types as defined in 
   Appendix B.

Dynamic RTP Payload Types may be handled in one of three ways:
1. The dynamic RTP payload type should be translated into the equivalent 
   ASF standard media type (see Section 8 of [2]) using the equivalent 
   ASF codec, if known. This means that the recorder will need to 
   identify the actual codec used by that dynamic RTP Payload Type 
   instance based upon the available information. The identity of this 
   codec will then need to be expressed as a specific ASF UUID 
   identifier (e.g., see Reference [10]) within the SPO's Codec ID 
   field. 
2. Alternatively, the recorder can translate the dynamic RTP payload 
   type to the appropriate static RTP payload type, if any, and record 
   it as an RTP Media Type as defined in Appendix B.
3. Alternatively, the recorder can record it as a dynamic RTP payload 
   type as defined in Appendix B.

RTP payload types that cannot be handled by any of the above approaches 
should be ignored (i.e., that media stream cannot be recorded).

Note that if the RTP payload is translated into the equivalent ASF 
standard media type, an inverse transformation will need to be applied 
by a playback device, if the recording is retransmitted as RTP packets. 

3.2. Two Recording Approaches

The capabilities of local systems vary. For this reason, this document 
suggests that limited capability systems seek to record data via the 
Packet Capture Mode, which is described in section 3.2.1. More capable 
systems are recommended to use the Record Structure Mode, described in 
section 3.2.2.

3.2.1. Packet Capture Mode (Limited Buffering)

The Packet Capture Mode recording alternative seeks to write RTP data as 
it is received to the ASF Data Object on the disk. The clock of the 
recording computer is used to determine the ASF Data Unit's Send Time 
value. The Send Time value is calculated by subtracting the multimedia 
session's start time (as recorded by the recording computer) from the 
recording computer's current time and converting the result into 
millisecond units.
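
The Send Time calculation above can be sketched as follows. This is an 
illustrative sketch only: the function name and the use of floating-point 
seconds for the recording computer's clock are assumptions, not part of 
ASF or RTP.

```python
def send_time_ms(session_start: float, now: float) -> int:
    """Return the ASF Send Time in milliseconds.

    Both arguments are readings of the recording computer's clock, in
    seconds (an assumed representation). The session start is the moment
    the recorder noted the beginning of the multimedia session.
    """
    if now < session_start:
        raise ValueError("current time precedes the session start time")
    # Subtract the session start time, then convert seconds to milliseconds.
    return int(round((now - session_start) * 1000))
```

A packet received 2.5 seconds after the session start would thus be 
stored with a Send Time of 2500.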

The RTP timestamp is directly written as the ASF Data Unit's 
Presentation Time value, again making the necessary conversions to 
account for the fact that the initial RTP timestamp value is random 
while the initial ASF Send Time and Presentation Time values are zero. 
The granularity of the Presentation Time units (i.e., the Presentation 
Time Numerator and Presentation Time Denominator fields within the SPO) 
should be set to the clock granularity for that RTP source. ASF's 
default presentation time granularity (i.e., a millisecond) should 
initially be used for those cases in which the actual clock granularity 
is not known.
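
The conversion from a random initial RTP timestamp to a zero-based 
Presentation Time might look like the sketch below. The function name is 
hypothetical, and the sketch assumes that the result is expressed in RTP 
clock ticks (the unit described by the SPO's Presentation Time Numerator 
and Denominator) and that the recording spans less than one full cycle 
of the 32-bit RTP timestamp.

```python
RTP_TIMESTAMP_MODULUS = 1 << 32  # RTP timestamps are 32-bit values


def presentation_time(rtp_timestamp: int, initial_rtp_timestamp: int) -> int:
    """Return a zero-based Presentation Time in RTP clock ticks.

    The modulo operation handles the case where the RTP timestamp has
    wrapped around past 2^32 since the first recorded packet.
    """
    return (rtp_timestamp - initial_rtp_timestamp) % RTP_TIMESTAMP_MODULUS
```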

The value of the Presentation Time Flags within the SPO for this media 
stream shall thus be configured to be "11" (i.e., Full Data Unit 
Presentation Time).

RTCP Sender Reports (SR) for the RTP source being recorded can be used 
to calculate the clock granularity of the source. This is useful if the 
clock granularity is otherwise unknown. It is also possible to use 
Sender Reports to detect skews between the clock granularity used by the 
source, and the granularity that is given by the RTP Payload Type 
specification or profile. If such a skew is detected, the Rational Time 
Values (i.e., Presentation Time Numerator and Presentation Time 
Denominator fields) of the SPO should be altered accordingly.
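
One possible way to derive the clock granularity from two Sender Reports 
is sketched below. Each report is represented as a hypothetical tuple of 
the SR's NTP timestamp (most and least significant 32-bit words) and its 
RTP timestamp; the function name and tuple layout are assumptions made 
for illustration.

```python
def estimate_clock_rate(first_sr, second_sr) -> float:
    """Estimate the RTP clock rate (ticks per second) from two SRs.

    Each argument is a tuple (ntp_msw, ntp_lsw, rtp_timestamp) taken
    from an RTCP Sender Report of the same RTP source.
    """
    msw_a, lsw_a, rtp_a = first_sr
    msw_b, lsw_b, rtp_b = second_sr
    # Combine the two NTP words into one 64-bit fixed-point value and
    # take the difference as integers to avoid floating-point loss.
    ntp_delta = ((msw_b << 32) | lsw_b) - ((msw_a << 32) | lsw_a)
    if ntp_delta <= 0:
        raise ValueError("second report must be later than the first")
    elapsed_seconds = ntp_delta / 2**32
    # The RTP timestamp is 32 bits wide and may have wrapped around.
    ticks = (rtp_b - rtp_a) % 2**32
    return ticks / elapsed_seconds
```

Comparing the estimate with the nominal rate given by the payload type 
specification indicates whether a skew is present.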

This approach has the advantage of being simple and direct to implement. 
It has the following disadvantages: 
* Jitter is preserved, and repeated re-recording of the same content 
  in this manner may exacerbate the jitter with each subsequent 
  recording.
* Out-of-order packets remain out of order.
 
3.2.2. Record Structure Mode (Buffering)

The Record Structure Mode requires that packets be buffered for a finite 
amount of time (e.g., 5 seconds) before being written to disk. Packets 
within the buffer should be correctly ordered. Packet holes occurring 
within the buffer interval should be filled by retransmitted packets (if 
any). 
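
A minimal sketch of such a buffer is shown below. The class and method 
names are hypothetical. The sketch assumes that the caller has already 
extended the 16-bit RTP sequence number into a non-wrapping integer, and 
it does not discard duplicates or handle packets that arrive after their 
successors have been written.

```python
import heapq


class ReorderBuffer:
    """Holds packets for a fixed interval and releases them in order."""

    def __init__(self, hold_seconds=5.0):
        self.hold_seconds = hold_seconds
        self._heap = []  # entries: (extended_seq, arrival_time, payload)

    def push(self, extended_seq, arrival_time, payload):
        # The heap keeps the lowest sequence number at the front, so a
        # late or retransmitted packet slots into its proper position.
        heapq.heappush(self._heap, (extended_seq, arrival_time, payload))

    def pop_ready(self, now):
        """Return (extended_seq, payload) pairs that may be written."""
        ready = []
        # The lowest-numbered packet is released once it has been held
        # for the full interval; higher-numbered packets wait behind it.
        while self._heap and now - self._heap[0][1] >= self.hold_seconds:
            extended_seq, _arrival, payload = heapq.heappop(self._heap)
            ready.append((extended_seq, payload))
        return ready
```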

Within this approach, the value of the RTP Timestamp field is used to 
compute the send time. Since the RTP timestamp starts at a random value, 
while the ASF Send Time and Presentation Time start at zero, a 
conversion into appropriate ASF Send Time values must be made. The send 
time is stored with a 1-millisecond granularity. The appropriate RTP 
Payload Type specification or profile gives the granularity of the RTP 
Timestamp. RTCP Sender Reports (SR) may be used to calculate the 
granularity of the RTP Timestamp if it is otherwise unknown. Sender 
Reports can also be used to detect skews between the RTP Timestamp 
granularity and the granularity specified in the RTP Payload Type 
specification or profile. If such a skew is detected, the send time 
values for currently buffered packets of that media type have to be 
altered (retaining their millisecond granularities) to correctly reflect 
the skew.
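
The conversion from an RTP Timestamp to a millisecond Send Time could be 
sketched as follows. The function name is hypothetical; the clock rate 
argument is either the nominal rate from the payload type specification 
or a rate measured from Sender Reports when a skew has been detected. 
The sketch assumes the recording spans less than one cycle of the 32-bit 
RTP timestamp.

```python
def rtp_to_send_time_ms(rtp_timestamp, initial_rtp_timestamp, clock_rate_hz):
    """Convert an RTP Timestamp into a zero-based Send Time in ms."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    # Remove the random initial offset, allowing for 32-bit wrap-around.
    ticks = (rtp_timestamp - initial_rtp_timestamp) % (1 << 32)
    # Scale from RTP clock ticks to the 1-millisecond Send Time units.
    return int(round(ticks * 1000 / clock_rate_hz))
```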

The following values should be recorded within the Stream Properties 
Object for the media streams recorded by this approach: The clock 
frequency of the RTP payload type should be appropriately recorded into 
the Presentation Time Numerator and Presentation Time Denominator 
fields. The Presentation Time Flag value should have the value of "01" 
and the Presentation Time Delta field should have a value of zero. This 
means that both the ASF send time and presentation time have the same 
value and that subsequent RTP retransmissions of this data will contain 
only one timestamp (i.e., RTP's timestamp). 

This approach has the advantage of correcting some of the received 
jitter, correctly sorting some of the out-of-order packets, and 
potentially filling in some lost packets (assuming a retransmission 
scheme is used). The disadvantage of this approach is that it is more 
complex to implement. This is particularly the case if the RTP payload 
type's clock frequency is not known ahead of time and has to be 
subsequently learned via RTCP transmissions. In addition, it requires 
additional buffering on the recording computer.

3.3. Recording MBone Sessions

The following translations from RTP packet fields to ASF data fields are 
identical for both recording approaches.

3.3.1. RTP Mixers and Translators

The combined streams resulting from Mixers and Translators need to be 
demultiplexed back into their original component streams when being 
recorded into ASF, if possible. If this is not possible, then copies of 
the RTP packet containing data that is attributed to multiple sources 
need to be stored into each of these sources' media streams (i.e., ASF 
Data Units). In either case, these streams may be optionally re-mixed 
when they are subsequently replayed from the ASF files depending upon 
local implementation considerations.

3.3.2. RTP Packet Information

The RTP Header's Payload Type field combined with the SSRC is used to 
determine the ASF Stream Number value for that media stream. This Stream 
Number value identifies which SPO instance should be used to define this 
media stream. This value is recorded into the Stream Number field of the 
ASF Data Unit.
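
The mapping from the (SSRC, Payload Type) pair to an ASF Stream Number 
might be maintained as in the sketch below. The class name is 
hypothetical, and the choice to hand out consecutive numbers starting at 
1 is an assumption made for illustration.

```python
class StreamNumberTable:
    """Assigns one ASF Stream Number per (SSRC, Payload Type) pair."""

    def __init__(self, first_number=1):
        self._next_number = first_number
        self._numbers = {}

    def stream_number(self, ssrc, payload_type):
        key = (ssrc, payload_type)
        if key not in self._numbers:
            # A new source, or a known source that changed its Payload
            # Type mid-session, gets a new stream (and needs a new SPO).
            self._numbers[key] = self._next_number
            self._next_number += 1
        return self._numbers[key]
```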

The Version field in the RTP header is not recorded into the ASF file 
unless it is a version other than 2. If the Version field in the RTP 
header is other than 2, the RTP version number should be recorded into 
the ASF Header Object's Content Description Object (CDO; see Section 5.4 
of [2]) using a value of 73 for the Field Type field.

The Padding bit, and the Padding field that is present if the bit is 
set, are not recorded. If an RTP packet with the Padding bit set is 
received, the padding field should be removed from the RTP payload. 
Padding may be regenerated when retransmitting the recording, if 
necessary.
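
Padding removal can be sketched as follows. In RTP, the last octet of 
the padding holds the number of padding octets, including itself. The 
function name is hypothetical, and clearing the Padding bit in the 
returned packet is a choice made for this sketch. The check against the 
12-octet fixed header is a simplification that ignores any CSRC list or 
header extension.

```python
RTP_FIXED_HEADER_LEN = 12
PADDING_BIT = 0x20  # the P bit in the first octet of the RTP header


def strip_padding(packet: bytes) -> bytes:
    """Return the RTP packet with any trailing padding removed."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet is shorter than the fixed RTP header")
    if not packet[0] & PADDING_BIT:
        return packet  # nothing to remove
    # The final octet counts the padding octets, itself included.
    pad_len = packet[-1]
    if pad_len == 0 or pad_len > len(packet) - RTP_FIXED_HEADER_LEN:
        raise ValueError("invalid padding length")
    first_octet = packet[0] & ~PADDING_BIT & 0xFF
    return bytes([first_octet]) + packet[1:len(packet) - pad_len]
```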

SSRC information should be written into the CDO as an aid for 
remembering the association between an SSRC and a Media Stream. This 
will also permit the original SSRC to be optionally recreated 
once the recorded data is retransmitted. The 32-bit SSRC value will need 
to be converted into a string when it is stored into the Value field of 
the CDO. When storing the SSRC as a Unicode string, the SSRC is treated 
as an unsigned 32-bit integer, and it must be converted to the local 
byte order (i.e., host byte order). The value of the Field Type field is 
70.

Because the initial RTP timestamp value is a random value, the initial 
RTP timestamp value should also be recorded into the CDO. This will 
permit the original timestamp sequence to be optionally recreated once 
the recorded data is retransmitted. The 32-bit timestamp value will need 
to be converted into a Unicode string when it is recorded into the Value 
field of the CDO. The value of the CDO's Field Type field is 71.

The initial RTP Sequence Number value should be recorded into the CDO. 
This will permit the original sequence numbers to be optionally 
recreated once the recorded data is retransmitted. The 16-bit Sequence 
Number value will need to be converted into a Unicode string when it is 
stored within the Value field of the CDO. When storing the Sequence 
Number as a string, the Sequence Number is treated as an unsigned 16-bit 
integer, and it must be converted to the local byte order (i.e., host 
byte order). The value of the Field Type field is 72.
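
The CDO entries described above (Field Types 70 through 73) could be 
derived from the first received RTP header as sketched below. The 
function name is hypothetical. The use of a decimal representation for 
the string values is an assumption, since this document only states that 
the values are converted into Unicode strings.

```python
import struct

FIELD_TYPE_SSRC = 70
FIELD_TYPE_INITIAL_TIMESTAMP = 71
FIELD_TYPE_INITIAL_SEQUENCE = 72
FIELD_TYPE_RTP_VERSION = 73


def cdo_entries_from_first_header(header: bytes):
    """Return (Field Type, string value) pairs for the CDO."""
    # "!" unpacks from network byte order into host integers.
    first_octet, _m_pt, sequence, timestamp, ssrc = struct.unpack(
        "!BBHII", header[:12])
    version = first_octet >> 6
    entries = [
        (FIELD_TYPE_SSRC, str(ssrc)),  # decimal form is an assumption
        (FIELD_TYPE_INITIAL_TIMESTAMP, str(timestamp)),
        (FIELD_TYPE_INITIAL_SEQUENCE, str(sequence)),
    ]
    if version != 2:
        # The RTP version is only recorded when it is not 2.
        entries.append((FIELD_TYPE_RTP_VERSION, str(version)))
    return entries
```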

It should be noted that ASF's concept of Object Number differs from 
RTP's concept of Sequence Number although they are both used to identify 
out-of-order and missing information. [Note: earlier versions of the ASF 
spec used the term "ObjectID" instead of "Object Number".] The former 
identifies specific media stream "objects" as a part of a fragmentation 
and grouping schema. What an object happens to be in a given media 
stream is entirely media stream dependent (e.g., it is a specific image 
within an image media stream, a frame within a (non-scalable) video 
stream, etc.).  Since object fragmentation occurs within a specific RTP 
Payload Type instance and RTP headers do not indicate this type of 
information, an identical translation of the original Object Number 
semantics would require a decoding of the media stream. The value of 
incurring this type of overhead is highly questionable, especially when 
the ultimate goal of identifying missing or out-of-order information is 
common between the two approaches. Therefore, the RTP sequence number 
should be directly mapped into the Object Number field of the ASF 
Data Unit. Since the 16-bit Sequence Number starts at a random value 
while the 8-bit Object Number starts at zero, the mapping between the 
Sequence Number and Object Number needs to reflect this difference 
(e.g., Current-Sequence-Number value minus Original-Sequence-Number 
value = Object Number) and account for the fact that Object Numbers 
"wrap around" to zero every 2^8 packets and Sequence Numbers "wrap 
around" when their value reaches 2^16.
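
The mapping described above can be sketched with modular arithmetic. The 
function name is hypothetical.

```python
def object_number(current_sequence: int, original_sequence: int) -> int:
    """Map a 16-bit RTP Sequence Number to an 8-bit ASF Object Number."""
    # Offset from the first recorded packet, allowing for the Sequence
    # Number wrapping around at 2^16.
    offset = (current_sequence - original_sequence) % (1 << 16)
    # Object Numbers wrap around to zero every 2^8 packets.
    return offset % (1 << 8)
```

For example, with an original Sequence Number of 65533, the packet with 
Sequence Number 3 is the seventh packet of the stream and receives 
Object Number 6.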

If the contributing sources listed in the CSRC fields of the RTP header 
are demultiplexed into their original component streams when being 
recorded, then the CSRC fields are not recorded. If, however, this is 
not possible, then the CSRC information should be written into the ASF 
Data Unit's extension field as described below.

If the RTP payload has been converted into an "equivalent ASF standard 
media type" (see Section 3.1), then the RTP Extension Object described 
by the next paragraph is optional. However, if the RTP Media Type 
described in Appendix B has been used to record the data, then the RTP 
Extension Object is required if either the RTP Header's M-bit or the 
RTP Header's eXtension (X) bit is ever set within that stream, or if 
CSRC information ever needs to be recorded within that media stream. 
The RTP Extension Object permits exact copies of the original RTP 
packets to be regenerated, if desired.

The RTP Extension Object is an instance of the Extension Object that is 
described within Section 5.3.1 of [2]. Extension Objects are associated 
with a specific media stream's SPO and indicate the semantics and format 
of specific data (i.e., in this case RTP Packet Header data) that is 
stored on a per packet basis within the Extension Data field of the ASF 
Data Unit (see Section 6.1 of [2]). The RTP Extension Object is defined 
as follows:
* The value of the Extension Data Size field is 0xFFFF.
* The UUID value of the Extension System field is 
  {96800c63-4c94-11d1-837b-0080c7a37f95}. 

These definitions indicate that this recording shall follow the 
"variable length" extension data encoding format (i.e., an 8-bit length 
field followed by the extension data) within the Extension Data field of 
the ASF Data Unit.

In the case of the RTP Extension Object, the Extension Data field of the 
ASF Data Unit has the following syntax:

Field Name:           Size:         Description:
Extension Length      8 bits        Size in bytes of the Extension Data 
                                    and Flag fields (i.e., 
                                    sizeof(Extension Data) + 1)
Flag                  8 bits
     X-bit            1 bit (LSB)   RTP Header's X-bit value
     CSRC Count       4 bits        RTP Header's CC value
     M-bit            1 bit         RTP Header's M-bit value
     Reserved         2 bits (MSB)
Extension Data        variable      RTP Header CSRC list, if any, 
                                    followed by the RTP Header Extension 
                                    data, if any

The "variable length" encoding means that if either the X bit is set or 
the CSRC Count has a non-zero value, then the Extension length, flag, 
and RTP header extension data are written into the Extension Data field 
of the ASF Data Unit. If both the X bit is cleared and the CSRC Count 
has a zero value, then only the extension length and flag fields are 
written to the Extension Data field of the ASF Data Unit. If both the X-
bit is set and the CSRC Count field has a non-zero value, then the CSRC 
list of the RTP Header appears first immediately followed by the RTP 
Header Extension data within the Extension Data field. These fields are 
arranged in big-endian order (also known as network byte order).
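
The encoding above might be implemented along the lines of the following 
sketch. The function name is hypothetical. The bit positions within the 
Flag field (X-bit in bit 0, CSRC Count in bits 1 to 4, M-bit in bit 5) 
are an assumption based on reading the field list from the least 
significant bit upward.

```python
import struct


def encode_rtp_extension_data(marker, csrc_list, header_extension=b""):
    """Build the Extension Data field of an ASF Data Unit.

    header_extension holds the raw RTP header extension octets, or is
    empty when the RTP Header's X-bit was not set.
    """
    if len(csrc_list) > 15:
        raise ValueError("the CSRC Count field is only 4 bits wide")
    x_bit = 1 if header_extension else 0
    m_bit = 1 if marker else 0
    # Assumed layout: X = bit 0, CC = bits 1-4, M = bit 5, rest reserved.
    flag = x_bit | (len(csrc_list) << 1) | (m_bit << 5)
    # CSRC list first, then the header extension, in network byte order.
    data = b"".join(struct.pack("!I", csrc) for csrc in csrc_list)
    data += header_extension
    length = len(data) + 1  # Extension Data plus the Flag octet
    if length > 255:
        raise ValueError("data does not fit the 8-bit Extension Length")
    return bytes([length, flag]) + data
```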

3.3.3. RTCP Packet Information

RR and BYE packets are not recorded into ASF files. Clock skew 
information obtained from SR packets is used for the timestamp 
calculations described in Sections 3.2.1 and 3.2.2. Other information 
contained in SR packets, except for APP and SDES information, is not 
recorded.

SDES information is stored in the ASF Header Object's Content 
Description Object (CDO). Appropriate SDES items (i.e., "CNAME", "NAME", 
"EMAIL", "PHONE", "LOC", "TOOL", "NOTE", and "PRIV") shall be written 
into the CDO as described by Appendix C.  Synchronization relationships 
between media streams containing the same CNAME value should be retained 
by associating them through ASF's Inter-Media Dependency Object (Section 
5.12 of [2]).

APP information should be handled in one of two ways. 
1. If the recorder understands (through out-of-band mechanisms outside 
   of the scope of both ASF and RTP) that the APP information contains 
   script commands or invocations, which correspond to either the ASF 
   Header Object's Script Command Object (see section 5.5 of [2]) or to 
   a Command Media stream type (see section 8.7 of [2]), then the 
   recorder can convert the APP information into the appropriate ASF 
   constructs.
2. If the recorder does not understand the APP information then that 
   information should be appropriately recorded "as is" into the ASF 
   Header Object's Script Command Object. 

If the values of the SDES fields from a particular RTP source change 
during the recording, it is recommended that the CDO contain the initial 
value for the SDES field. Subsequent values of the SDES fields should 
then be recorded as a separate media stream, via the mechanisms 
described in Appendix D.

3.4. Optional Post-Processing of the ASF Header

Whenever live recordings are made, the Live Bit must be set in ASF's 
File Properties Object. This signifies that certain fields in the ASF 
File Properties Object and the Stream Properties Object(s) are invalid 
and should be ignored. In addition, these same files are likely to also 
contain the ASF Placeholder Object (see Appendix A). It is highly 
recommended, but not required, that post-processing be done to ASF files 
to clear the Live Bit, remove the ASF Placeholder Object, and to write 
valid data into the fields which are invalid when the Live Bit is set.

3.5. Optional Creation of the ASF Index Object

ASF uses the Index Parameters Object in the ASF Header to identify the 
parameters and media streams whose data will be indexed. This object is 
described in Section 5.14 of [2]. If the Index Parameters Object does 
not yet exist for this file, then it needs to be constructed before the 
Index Object is built. Using the information contained within the Index 
Parameters Object, the Index Object is constructed as defined in Section 
7 of [2]. 

3.6. Playback of the Recorded RTP Data

Recorded media streams are stored into the ASF Data Object as ASF Data 
Units (see Section 6.1 of [2]). Each ASF Data Unit contains a "header 
field" together with the media data which is being stored. The payload 
of each RTP packet comprises the media data stored within the ASF Data 
Unit. The RTP header itself is not stored but its content is mapped into 
the SPO, CDO, and the header field of the ASF Data Unit.

The ASF file contains sufficient information to play back the recorded 
data, either locally or via a remote playback device. When RTP packets 
are recorded into the ASF file using the RTP Media Type (see Appendix 
B), sufficient information exists to regenerate RTP packets with the 
same SSRC and sequence numbers as the original packets, if desired. 
Additionally, it is possible to regenerate RTCP SDES and APP packets 
with the same content as those sent by the original RTP source. This 
permits recorded data to be retransmitted into an existing MBone 
conference, for example, in such a manner that it may appear that the 
data originates from the original RTP source.

This specification does not define a required feature set for playback 
devices. For example, even though it is possible to retransmit the 
recorded data using RTP, playback devices are not required to do so.


Appendix A. ASF Placeholder Object Definition

"Loosely controlled" sessions permit participants to enter and leave 
without membership control or parameter negotiation. Since one cannot 
always predict how many participants will speak, nor what media types 
they will use, a mechanism is needed to reserve space within the Header 
Object so that new header objects (e.g., Stream Properties Objects) may 
be readily added to the header when needed without requiring the header 
to be re-written.

The purpose of the ASF Placeholder Object is to fulfill this 
placeholder function. New header objects are written into the space 
reserved by the ASF Placeholder Object, and the ASF Placeholder Object 
then reduces the amount of space it reserves by the size of the new 
object(s).

ASF Placeholder Objects are ignored (skipped over) when ASF Header 
information is conveyed to remote nodes. Even so, it is recommended that 
they be removed by post-processing (see Section 3.4) to produce more 
compact files.

The ASF Placeholder Object is defined as follows:

Field Name:    Size:             Value:
Object ID      128 bits          This field contains the following UUID 
                                 value: 
                                 {D6E22A0F-35DA-11d1-9034-00A0C90349BE}
Object Size    64 bits           The size of this object in bytes (i.e., 
                                 the size of the Reserved field in bytes 
                                 + 24)
Reserved       (Object Size      Reserved space
               - 24) * 8 bits
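
The following Python sketch shows one way the placeholder mechanism 
could work: it builds an ASF Placeholder Object and shrinks it when a 
new header object is added, so that the total header length stays the 
same. Several points are assumptions rather than statements of this 
specification: the Object ID is serialized in the little-endian GUID 
layout (Python's UUID.bytes_le), the Object Size is little-endian, the 
new object is written at the start of the reserved region, and the 
Reserved bytes are zero-filled. The function names are hypothetical.

```python
import struct
import uuid

PLACEHOLDER_ID = uuid.UUID("D6E22A0F-35DA-11d1-9034-00A0C90349BE")
HEADER_LEN = 24  # 16-byte Object ID + 8-byte Object Size


def build_placeholder(reserved_bytes):
    """Return a Placeholder Object reserving `reserved_bytes` bytes."""
    if reserved_bytes < 0:
        raise ValueError("reserved size cannot be negative")
    object_size = HEADER_LEN + reserved_bytes
    # Assumption: GUID in little-endian layout, size as little-endian UINT64.
    return (PLACEHOLDER_ID.bytes_le
            + struct.pack("<Q", object_size)
            + bytes(reserved_bytes))


def insert_object(placeholder, new_object):
    """Replace `placeholder` with `new_object` plus a smaller placeholder.

    The returned bytes have the same length as the original placeholder,
    so nothing after it in the header has to move.
    """
    (object_size,) = struct.unpack_from("<Q", placeholder, 16)
    remaining = object_size - len(new_object)
    if remaining == 0:
        return new_object  # reserved space is used up exactly
    if remaining < HEADER_LEN:
        raise ValueError("not enough reserved space for the new object")
    return new_object + build_placeholder(remaining - HEADER_LEN)


placeholder = build_placeholder(100)   # 124 bytes in total
dummy_object = bytes(40)               # stand-in for a real header object
updated = insert_object(placeholder, dummy_object)
```

A leftover region of between 1 and 23 bytes cannot hold a valid 
placeholder. The sketch rejects that case; a recorder might instead 
rewrite the header.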


Appendix B. RTP Media Type

ASF has defined standard media types for Audio, Video, Image, Timecode, 
Text, MIDI, Command, and Media-Objects (Hotspots) in Section 8 of [2]. 
Implementations that support these types of media streams are expected 
to implement them in the manner defined within the ASF standard. MBone 
content that is stored within ASF is therefore expected to be mapped 
into the standard ASF media stream formats whenever possible.

However, it will not always be possible to conform to this 
requirement. Possible reasons include the following:
* The recorder may not be aware of which media type is associated with 
  an RTP Payload Type (i.e., whether the RTP Payload Type is referring 
  to Audio, Video, or some other media type).
* The recorder may not know which ASF-defined codec corresponds to the 
  codec assumed by the RTP Payload Type and therefore it would be 
  unable to complete the mapping into a standard ASF media type.
* The RTP Payload Type may indicate an interleaved data stream (e.g., 
  video and audio combined into a single stream). No standard ASF media 
  type has yet been defined for such interleaved data.
* The RTP Payload Type may indicate a media type which is not among the 
  standard ASF Media Types.
For these reasons and others, a provision must exist to record MBone 
data as a distinct RTP Media Type. This appendix defines the format of 
the RTP Media Type.

The RTP Media Type is defined within the Stream Properties Object (SPO) 
by placing the UUID value {96800c65-4c94-11d1-837b-0080c7a37f95} into 
the Stream Type field. The following information is then stored in the 
Type-Specific Data field of the SPO:

Field Name:       Field Type:  Size (bits):  Description:

Payload Type         UINT      8              The Payload Type value 
                                              indicated by the RTP header.
Profile Size         UINT      16             Size in bytes of the Profile 
                                              field.
Profile              UINT8     ?              ASCII string identifying the 
                                              Profile which has defined the 
                                              Payload Type. (E.g., "AVP" 
                                              for the profile defined by 
                                              [3] and [9].) An empty string 
                                              is used if the profile is not 
                                              known.
Announcement ID Size UINT      16             Size in bytes of the 
                                              Announcement ID field.
Announcement ID      UINT8     ?              MIME Type of the session 
                                              announcement mechanism used. 
                                              (E.g., "application/x-sdp" 
                                              for SDP [7] announcements.)
Announcement Size    UINT      16             Size in bytes of the 
                                              Announcement field.
Announcement         UINT8     ?              ASCII string containing the 
                                              definition for this media 
                                              stream. (E.g., for SDP [7] 
                                              announcements, this would 
                                              contain the entire rtpmap 
                                              entry for this media stream.)

All ASCII strings in the RTP Media Type are terminated by a NUL 
character. The fields above should be stored in little-endian byte 
order (i.e., the byte order used in the ASF Header Object). 
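
The layout above can be sketched in Python as follows. The sketch 
assumes that each Size field counts the terminating NUL character of 
its string, and that an unspecified field is written with a size of 
zero and no bytes at all; the text above does not settle either point. 
The function name pack_rtp_type_specific and the sample rtpmap entry 
are illustrative only.

```python
import struct


def _ascii_z(text):
    """Encode an ASCII string followed by its NUL terminator."""
    return text.encode("ascii") + b"\x00"


def pack_rtp_type_specific(payload_type, profile="",
                           announcement_id=None, announcement=None):
    """Serialize the Type-Specific Data of the RTP Media Type.

    Assumptions (not stated by the specification): each Size field
    includes the NUL terminator, and a value of None is written as
    size zero with no string bytes.
    """
    if not 0 <= payload_type <= 0x7F:
        raise ValueError("RTP payload types are 7-bit values")
    out = struct.pack("<B", payload_type)
    for value in (profile, announcement_id, announcement):
        field = b"" if value is None else _ascii_z(value)
        out += struct.pack("<H", len(field)) + field  # little-endian size
    return out


# Static payload type 0 of the "AVP" profile: no announcement fields.
static_data = pack_rtp_type_specific(0, "AVP")

# Dynamic payload type described by a (made-up) SDP rtpmap entry.
dynamic_data = pack_rtp_type_specific(
    96, "AVP", "application/x-sdp", "a=rtpmap:96 L16/44100/2")
```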

The final four fields (i.e., Announcement ID Size, Announcement ID, 
Announcement Size, and Announcement) convey information about a dynamic 
RTP Payload Type. This information might have been available to the 
recording device through non-RTP means. Examples of possible sources of 
such information include session descriptions, such as SDP [7], and 
presentation descriptions [4]. However, if a static RTP Payload Type is 
being specified, both the Announcement ID Size and the Announcement Size 
fields may be zero, indicating that the Announcement ID and Announcement 
fields have not been specified.
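
A reader of the Type-Specific Data might handle the optional fields as 
in the Python sketch below. It treats zero-valued Size fields as "not 
specified", as described above, and assumes that a non-zero Size 
includes the NUL terminator. The function name and the returned 
dictionary keys are hypothetical.

```python
import struct


def unpack_rtp_type_specific(data):
    """Parse the Type-Specific Data of the RTP Media Type.

    A field whose Size is zero is reported as None (not specified).
    """
    if len(data) < 1:
        raise ValueError("missing Payload Type")
    payload_type = data[0]
    offset = 1
    values = []
    for _ in range(3):  # Profile, Announcement ID, Announcement
        (size,) = struct.unpack_from("<H", data, offset)
        offset += 2
        raw = data[offset:offset + size]
        if len(raw) != size:
            raise ValueError("field is truncated")
        offset += size
        # Drop the NUL terminator(s) before decoding the ASCII text.
        values.append(raw.rstrip(b"\x00").decode("ascii") if size else None)
    profile, announcement_id, announcement = values
    return {
        "payload_type": payload_type,
        "profile": profile,
        "announcement_id": announcement_id,
        "announcement": announcement,
        # Both announcement fields absent: nothing was recorded about
        # a dynamic payload type.
        "has_announcement": announcement_id is not None
                            and announcement is not None,
    }


info = unpack_rtp_type_specific(b"\x00\x04\x00AVP\x00\x00\x00\x00\x00")
```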

The rest of the SPO should be specified as indicated in Section 3.2 
above.

The received RTP data of this media stream is stored into the ASF Data 
Object as described in Section 3.2 and Section 3.3 above.


Appendix C. Recording SDES Information

Section 5.4 of [2] describes the syntax and semantics of the Content 
Description Object (CDO) within the ASF Header Object. This object 
consists of an array of Description Records, each containing four 
logical entries:
1. Field Type - a value which identifies the semantics of the entry. 
   Each SDES item may be recorded to the CDO using the following 
   pre-defined Field Type (unsigned integer) values:

                   SDES entry:             Field Type Value:
                   CNAME                   61
                   NAME                    62
                   EMAIL                   63
                   PHONE                   64
                   LOC                     65
                   TOOL                    66
                   NOTE                    67
                   PRIV                    68

2. Stream Number - identifies the media stream to which this CDO entry 
   refers.
3. Name - the name of the entry. This field is redundant with the Field 
   Type value and is therefore frequently not used. However, 
   applications may optionally use this field for language 
   "localization" reasons (e.g., to translate the entry into a specific 
   target language). 
4. Value - the information conveyed by the specific SDES item (e.g., 
   the user and domain name in a CNAME item).
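
The mapping above can be expressed as a small lookup, sketched here in 
Python. Each Field Type value equals the RTCP SDES item type of [1] 
(CNAME=1 through PRIV=8) plus 60. The DescriptionRecord tuple holds the 
four logical entries only; it is a hypothetical in-memory form and does 
not reflect the on-disk CDO layout defined in [2].

```python
from typing import NamedTuple

# Field Type values defined in the table above.
SDES_FIELD_TYPE = {
    "CNAME": 61, "NAME": 62, "EMAIL": 63, "PHONE": 64,
    "LOC": 65, "TOOL": 66, "NOTE": 67, "PRIV": 68,
}

# RTCP SDES item type codes from RFC 1889, shown for comparison.
RTCP_SDES_ITEM = {
    "CNAME": 1, "NAME": 2, "EMAIL": 3, "PHONE": 4,
    "LOC": 5, "TOOL": 6, "NOTE": 7, "PRIV": 8,
}


class DescriptionRecord(NamedTuple):
    """The four logical entries of a CDO Description Record."""
    field_type: int
    stream_number: int
    name: str
    value: str


def sdes_to_record(item, stream_number, value, name=""):
    """Map one SDES item onto a Description Record.

    `name` is left empty by default because it is redundant with the
    Field Type value.
    """
    return DescriptionRecord(SDES_FIELD_TYPE[item], stream_number,
                             name, value)


record = sdes_to_record("CNAME", 1, "user@host.example")
```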


Appendix D. SDES Media Streams

Section 3.3.3 states that the first occurrence of a specific SDES item 
(i.e., a specific SDES item type associated with a specific RTP source 
identifier; e.g., a CNAME value for a specific SSRC) should be recorded 
into the Content Description Object (CDO). The Stream Number field 
within the CDO should refer to the media stream associated with the RTP 
source identifier (i.e., the SSRC/CSRC field of Section 6.4 of [1]) of 
that SDES packet chunk. The CDO has provisions for storing only one 
instance of each SDES type (e.g., only one instance of a CNAME) for any 
given media stream. Therefore, subsequent instances of the same SDES 
type for that media stream need to be recorded as a distinct "media 
stream" if that information is to be preserved. This appendix defines 
how to create such an SDES media stream.
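
The rule above amounts to a simple routing decision, sketched here in 
Python. The recorder remembers which (source identifier, SDES type) 
pairs it has already seen. The first occurrence goes to the CDO and 
later ones go to the SDES media stream of that source. The function 
name and the return labels are hypothetical. Whether a recorder should 
skip later instances whose value is unchanged is not addressed here.

```python
def route_sdes(seen, ssrc, sdes_type):
    """Decide where an incoming SDES item should be recorded.

    `seen` is a set of (ssrc, sdes_type) pairs already written to the
    CDO; it is updated in place.
    """
    key = (ssrc, sdes_type)
    if key in seen:
        return "sdes_stream"  # CDO slot is taken: use the SDES stream
    seen.add(key)
    return "cdo"              # first instance for this source and type


seen = set()
first = route_sdes(seen, 0x1234, "CNAME")
second = route_sdes(seen, 0x1234, "CNAME")
other_source = route_sdes(seen, 0x5678, "CNAME")
```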

An SDES media stream consists of SDES information written into the ASF 
Data Object via the mechanisms described in Section 3.2. Each SDES media 
stream records SDES information from only one RTP source identifier. A 
Stream Properties Object (SPO) is constructed for each SDES media 
stream. Each SDES media stream should also be associated with (i.e., 
synchronized with) the media stream containing the RTP data of that same 
RTP source identifier via the ASF Header Object's Inter-Media Dependency 
Object. 

The SPO for an SDES media stream should be constructed as follows:
* The UUID of the SDES Media Stream is 
  {96800c62-4c94-11d1-837b-0080c7a37f95}. This value should be written 
  into the Stream Type field of the SPO to identify SDES Media Streams. 
* The value of the Type-Specific Data Length field within the SPO is 
  zero (i.e., no Type-Specific Data). 

The format of an SDES Media Stream consists of one or more instances 
(per ASF Data Unit) of the following structure:

Field Name:    Field Type:  Size (bits):    Description:
Type Array Size   UINT      16              Size in bytes of the Type 
                                            Array
Value Array Size  UINT      16              Size in bytes of the Value 
                                            Array
Type Array        UINT8     ?               UTF-2 string [8] identifying 
                                            the specific SDES type 
                                            instance (e.g., "CNAME")
Value Array       UINT8     ?               UTF-2 string [8] containing 
                                            the SDES value (e.g., "user 
                                            and domain name" for a CNAME)
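
The structure above could be written and read as in the following 
Python sketch. It makes three assumptions that this appendix does not 
state: the two Size fields are little-endian, the strings are not NUL 
terminated, and UTF-2 can be handled with Python's "utf-8" codec (UTF-2 
being an earlier name for that encoding). The function names are 
hypothetical.

```python
import struct


def pack_sdes_entry(sdes_type, value):
    """Serialize one SDES Media Stream structure."""
    type_bytes = sdes_type.encode("utf-8")
    value_bytes = value.encode("utf-8")
    if len(type_bytes) > 0xFFFF or len(value_bytes) > 0xFFFF:
        raise ValueError("array does not fit a 16-bit size field")
    # Assumption: both 16-bit size fields are little-endian.
    return (struct.pack("<HH", len(type_bytes), len(value_bytes))
            + type_bytes + value_bytes)


def unpack_sdes_entries(data):
    """Parse every structure held in one ASF Data Unit payload."""
    entries = []
    offset = 0
    while offset < len(data):
        type_size, value_size = struct.unpack_from("<HH", data, offset)
        offset += 4
        type_bytes = data[offset:offset + type_size]
        offset += type_size
        value_bytes = data[offset:offset + value_size]
        offset += value_size
        if len(type_bytes) != type_size or len(value_bytes) != value_size:
            raise ValueError("SDES entry is truncated")
        entries.append((type_bytes.decode("utf-8"),
                        value_bytes.decode("utf-8")))
    return entries


unit = (pack_sdes_entry("CNAME", "user@host.example")
        + pack_sdes_entry("NAME", "A. User"))
parsed = unpack_sdes_entries(unit)
```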






Authors' Addresses
   Eric Fleischman 
   E-mail: ericfl@microsoft.com
   and
   Anders Klemets
   E-mail: anderskl@microsoft.com
   Microsoft Corporation
   1 Microsoft Way
   Redmond, WA 98052-8300
   USA

References:
1 H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A 
  Transport Protocol for Real-Time Applications", IETF RFC 1889, 
  January 1996.
2 Microsoft Corporation, "Advanced Streaming Format (ASF) 
  Specification", http://www.microsoft.com/asf/specs.htm, September 
  1997.
3 H. Schulzrinne, "RTP Profile for Audio and Video Conferences with 
  Minimal Control", IETF RFC 1890, January 1996.
4 H. Schulzrinne, A. Rao, and R. Lanphier, "Real Time Streaming 
  Protocol (RTSP)", work in progress.
5 J. Miller, P. Resnick, and D. Singer, "Rating Services and Rating 
  Systems (and Their Machine Readable Descriptions)", World Wide Web 
  Consortium, http://www.w3.org/PICS/services.html, May 5, 1996.
6 T. Krauskopf, J. Miller, P. Resnick, and G. W. Treese, "Label Syntax 
  and Communication Protocols", World Wide Web Consortium, 
  http://www.w3.org/PICS/labels.html, May 5, 1996.
7 M. Handley and V. Jacobson, "SDP: Session Description Protocol", work 
  in progress.
8 International Organization for Standardization, "ISO/IEC DIS 
  10646-1:1993 Information technology - Universal multiple-octet coded 
  character set (UCS) - Part 1: Architecture and basic multilingual 
  plane", 1993.
9 "RTP Payload types (PT) for standard audio and video encodings", 
  ftp://ftp.isi.edu/in-notes/iana/assignments/rtp-av-payload-types
10 "ASF Codec GUIDs", http://www.microsoft.com/asf/guids.htm




