INTERNET-DRAFT                                              John Lazzaro
March 1, 2003                                             John Wawrzynek
Expires: September 1, 2003                                   UC Berkeley


              The MIDI Wire Protocol Packetization (MWPP)

                 <draft-ietf-avt-mwpp-midi-rtp-06.txt>


Status of this Memo

This document is an Internet-Draft and is subject to all provisions of
Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups.  Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

                                Abstract

     The MIDI Wire Protocol Packetization (MWPP) is a general-purpose
     RTP packetization for the MIDI command language. MWPP is suitable
     for use in both interactive applications (such as the remote
     operation of musical instruments) and content-delivery applications
     (such as MIDI file streaming). MWPP is suitable for use over
     unicast and multicast UDP, and defines tools that support the
     graceful recovery from packet loss. MWPP may also be used over
     reliable transport such as TCP. The SDP parameters defined for MWPP
     support the customization of stream behavior (including the MIDI
     rendering method) during session setup. MWPP is compatible with the
     MPEG-4 generic RTP payload format, to support the MPEG 4 Audio
     object types for General MIDI, DLS2, and Structured Audio.







Lazzaro/Wawrzynek                                               [Page 1]

INTERNET-DRAFT                                              1 March 2003


0. Change Log for <draft-ietf-avt-mwpp-midi-rtp-06.txt>

This revision implements potential solutions to many of the remaining
MWPP open issues. Comments are welcome as always. Each resolved open
issue is listed below, along with a pointer to changes in the document
that execute the fix.

  1. The MIDI Command Section header has a new flag bit, the P
     (phantom) bit (Section 3). If P is 1, the status octet of
     the first MIDI channel command in the MIDI list is a
     (P)hantom octet that does not appear in the MIDI source data.
     The P bit signals that the sender has undone running status
     coding on a source command, in order to ensure that the first
     channel command in the MIDI list has a status octet. With the
     extra information coded by the P bit, receivers can reduce jitter
     when transcoding MWPP streams that originate and terminate on a
     MIDI 1.0 cable. Note that text in Appendix C.2.1 and C.2.2 has
     been changed to reflect the P flag bit, and that the variable
     length LEN field now has possible sizes of 5 bits and 13 bits.
     Thanks to Jim Wright.

  2. The MIDI command field of the final element of the MIDI list
     of the MIDI Command Section may now be empty (Section 3,
     modifications to Figure 3, and a new paragraph in the text
     that follows Figure 3). This change facilitates interactive
     systems that use the timestamp of the last MIDI command in the
     MIDI list as a proxy for the sending time of the packet. In
     practice, interactive senders may sense an incoming MIDI command
     from a source, wait a short period of time to see if a second command
     follows immediately, and if not send off the packet with a single
     command. By adding a second (null) MIDI command to the MIDI
     list, senders are able to code an accurate sending time
     proxy in the MIDI list in this case.

  3. The ordering and uniqueness of channel journals in the recovery
     journal is now normatively specified (Section 5). Receivers
     are now required to use LENGTH fields to parse over journal
     structures (Appendix A.1). This restriction is for forward
     compatibility with future versions of the recovery journal
     that may use the R (reserved) flag bits to extend the format.

  4. In Section 2.2, sender rules have been added for RTP streams
     over reliable transport (such as TCP). If the stream is set up
     to not use journalling, senders have the responsibility to send
     a "perfect" stream (no lost or out-of-order RTP packets). If a
     reliable transport RTP stream is configured to use the recovery
     journal system, the sender may transmit a stream with late and
     lost packets. These rules support retransmitting UDP MWPP streams
     over TCP.

  5. Section 4 has been rewritten, to clearly state the normative
     responsibilities of senders, receivers, and session description
     creators, regarding the recovery journal system. The rewrite
     adds new responsibilities regarding receivers that join or leave
     a stream. To support these changes, Appendix C.1 has also been
     rewritten, and many small changes have been made in the normative
     text in Appendices A and B. Note that some text that used to
     appear in Section 4 has been moved to Appendix C.1.

  6. A paragraph has been added to Appendix C.4.1, to add sender
     and receiver responsibilities for RPN/NRPN transactions over
     identity relationship streams. Thanks to Martijn Sipkema.

  7. Section C.1.1 now includes instructions for using the SDP j_sec
     parameter with RTSP.

  8. Appendix B.3 now notes that MIDI Tick is a non-standard use
     of the undefined 0xF9 MIDI byte. In the next I-D release,
     MIDI Tick will most probably be removed from the Chapter Q
     bitfield that Appendix B.3 describes. Thanks to Jim Wright.

In addition to open issue resolution, this revision also makes the
following editorial changes.

  o MWPP retains SDP support for specifying journalling formats
    other than the recovery journal. However, the editorial
    motivation for alternative formats has changed. The original
    focus was as a future mechanism for implementing FEC and
    packet retransmission. In the new memo, we refer the reader
    to generic RTP services for retransmission and FEC. The
    remaining rationale for alternative journalling formats is
    to provide a way for protocol designers to experiment with
    new MIDI resiliency techniques in the context of an existing
    payload.

  o In the introduction, we add network music performance to
    content streaming and LAN musical instrument control as
    motivating applications for MWPP. We also emphasize that
    quality-of-service lies outside the scope of RTP.

  o In Section 1.1, added "local datagram networks that are
    known to be reliable" as an acceptable transport for MWPP
    without the recovery journal.

  o In Section 2, the description of why applications choose
    a particular transport (unicast UDP, multicast UDP, or
    TCP) has been rewritten to be orthogonal along four
    axes (reliability, latency, topology, and availability).

  o In Section 2.1, the discussion of RTP timestamp increment
    policy has been rewritten, to clarify the fundamental
    differences between interactive and content-streaming
    approaches to RTP timestamp increment policy. Pointers
    to non-normative algorithms in [22] are also included.

  o Minor change to Section 3, to emphasize that senders should
    feel free to use running status coding in the MIDI list to
    improve bandwidth efficiency, as receivers must be able to
    decode running status. Partial rewrite of Appendix A.4
    (recovery journal Chapter N, for MIDI NoteOn and NoteOff),
    to clarify the LEN = 127 escape mechanism, and to improve
    notation. Thanks to Dominique Fober.

  o In Section 6, fixed text on session description coding of
    transport information.

  o The paragraph defining semantics of S bits in Appendix A.1
    was rewritten, to emphasize the normative nature of the
    definition. The paragraph defining the semantics of the B
    bit in Chapter N in Appendix A.4 is also rewritten.


                            Table of Contents


1. Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . .   7
     1.1 MWPP RTP Overview . . . . . . . . . . . . . . . . . . . . .   8
     1.2 Overview of SDP Parameters for MWPP . . . . . . . . . . . .   9
2. MWPP Packet Format  . . . . . . . . . . . . . . . . . . . . . . .   9
     2.1 RTP Header  . . . . . . . . . . . . . . . . . . . . . . . .  11
     2.2 MWPP Payload  . . . . . . . . . . . . . . . . . . . . . . .  12
3. MIDI Command Section  . . . . . . . . . . . . . . . . . . . . . .  14
4. The Recovery Journal System . . . . . . . . . . . . . . . . . . .  20
5. Recovery Journal Format . . . . . . . . . . . . . . . . . . . . .  22
6. MWPP and the Session Description Protocol . . . . . . . . . . . .  25
     6.1 Session Descriptions for Native MWPP Streams  . . . . . . .  26
     6.2 Session Description for mpeg4-generic MWPP Streams  . . . .  28
     6.3 MWPP SDP Parameters . . . . . . . . . . . . . . . . . . . .  30
7. Security Considerations . . . . . . . . . . . . . . . . . . . . .  31
8. Congestion Control  . . . . . . . . . . . . . . . . . . . . . . .  32
9. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . . .  32
Appendix A. The Recovery Journal Channel Chapters  . . . . . . . . .  34
     Appendix A.1. Recovery Journal Definitions  . . . . . . . . . .  34
     Appendix A.2. Chapter P: MIDI Program Change  . . . . . . . . .  36
     Appendix A.3. Chapter W: MIDI Pitch Wheel . . . . . . . . . . .  37
     Appendix A.4. Chapter N: MIDI NoteOff and NoteOn  . . . . . . .  37
     Appendix A.5. Chapter A: MIDI Poly Aftertouch . . . . . . . . .  40
     Appendix A.6. Chapter T: MIDI Channel Aftertouch  . . . . . . .  41
     Appendix A.7. Chapter C: MIDI Control Change  . . . . . . . . .  41
     Appendix A.8. Chapter M: MIDI Parameter System  . . . . . . . .  45
Appendix B. The Recovery Journal System Chapters . . . . . . . . . .  49
     Appendix B.1. System Chapter D: Reset, etc.   . . . . . . . . .  49
     Appendix B.2. System Chapter V: Active Sense Command  . . . . .  50
     Appendix B.3. System Chapter Q: Sequencer State Commands  . . .  50
     Appendix B.4. System Chapter E: MIDI Time Code  . . . . . . . .  53
          B.4.1  Informative Description of Chapter E  . . . . . . .  54
          B.4.2  Normative Definition of Chapter E . . . . . . . . .  54
     Appendix B.5. System Chapter X: System Exclusive  . . . . . . .  56
Appendix C. Session Description Protocol (SDP) Definitions . . . . .  61
     Appendix C.1. The Journalling System  . . . . . . . . . . . . .  62
          C.1.1. The j_sec Parameter . . . . . . . . . . . . . . . .  63
          C.1.2. The j_update Parameter  . . . . . . . . . . . . . .  64
               C.1.2.1 The anchored sending policy . . . . . . . . .  64
               C.1.2.2 The closed-loop sending policy  . . . . . . .  65
               C.1.2.3 The open-loop sending policy  . . . . . . . .  68
          C.1.3. Chapter Inclusion Parameters  . . . . . . . . . . .  69
     Appendix C.2. Command Execution Semantics . . . . . . . . . . .  72
          C.2.1 Description of the async method  . . . . . . . . . .  73
          C.2.2 Description of the buffer method . . . . . . . . . .  74
     Appendix C.3. Media Time  . . . . . . . . . . . . . . . . . . .  75
     Appendix C.4. Multiple Streams  . . . . . . . . . . . . . . . .  76
          C.4.1 The midiport parameter . . . . . . . . . . . . . . .  76
          C.4.2 The zerosync parameter . . . . . . . . . . . . . . .  78
     Appendix C.5. MIDI Rendering  . . . . . . . . . . . . . . . . .  81
          C.5.1 The sasc Method  . . . . . . . . . . . . . . . . . .  82
     Appendix C.6. ABNF Specifications for MWPP Parameters . . . . .  84
     Appendix C.7. IANA Considerations . . . . . . . . . . . . . . .  88
          Appendix C.7.1 mwpp MIME Registration  . . . . . . . . . .  88
          Appendix C.7.2 mpeg4-generic MIME Registration . . . . . .  90
          Appendix C.7.3 sasc MIME Registration  . . . . . . . . . .  93
Appendix D. A MIDI Overview for Networking Specialists . . . . . . .  96
Appendix E. Author Addresses . . . . . . . . . . . . . . . . . . . .  98
Appendix F. References . . . . . . . . . . . . . . . . . . . . . . .  99


1. Introduction

The Internet Engineering Task Force (IETF) has developed a set of
focused tools for multimedia networking ([2] [9] [10] [12]). These tools
can be combined in different ways to support a variety of real-time
applications over Internet Protocol (IP) networks.

For example, to support IP telephony, applications might use the Session
Initiation Protocol (SIP, [10]) to set up phone calls. Call setup might
include negotiations (using the SIP offer/answer protocol [11]) to agree
on a common audio codec.  These negotiations would use the Session
Description Protocol (SDP, [9]) to describe candidate codecs.  After a
call is set up, audio data would flow between the participants using the
Real-time Transport Protocol (RTP, [2]) under the Audio/Visual Profile
(RTP/AVP, [3]). The tools used in this telephony example (SIP, SDP,
RTP/AVP) might be combined in a different way to support a content
streaming application, perhaps in conjunction with other tools (such as
the Real Time Streaming Protocol (RTSP, [12])).

The Musical Instrument Digital Interface (MIDI, [1]), a standard for
musical instrument control, is widely used in applications that are
roughly analogous to the example applications described above.  On stage
and in the recording studio, MIDI is used for the interactive remote
control of musical instruments, an application similar in spirit to
telephony. On web pages, Standard MIDI Files [1] rendered using the
General MIDI standard [1] provide a low-bandwidth substitute for audio
streaming, suitable for simple "background music" uses.

This memo is motivated by a simple premise: if MIDI performances could
be sent as RTP streams that are managed by IETF session tools, a
hybridization of the MIDI and IETF application domains may occur.

For example, interoperable MIDI networking may foster the development of
network music performance [6] applications, in which a group of
musicians, located at different physical locations, interact over a
network to perform as they would if located in the same room. As another
example, the audio streaming community may begin to use gestural codes
(such as MIDI) for normative low-bitrate audio, perhaps using the sound
synthesis standards described in [5] or [18]. As another example,
manufacturers of professional audio equipment and electronic musical
instruments may consider adopting the IETF multimedia stack (IP, SIP,
RTP) as the networking layer for a MIDI control plane.

To provide a foundation for these new applications, this memo extends
two of the IETF tools (RTP and SDP) to support the MIDI standard. The
memo extends RTP by adding a new packetization, the MIDI Wire Protocol
Packetization (MWPP), to the Audio/Visual Profile. The memo extends SDP
by defining a set of SDP parameters to support the configuration and
negotiation of MIDI endpoint behaviors using SIP, RTSP, and other IETF
session setup tools.

Some applications may require MIDI media delivery at a certain service
quality level (latency, jitter, packet loss, etc). RTP itself does not
provide service guarantees. However, applications may use lower-layer
network protocols to configure the quality of the transport services
that RTP uses. These protocols may act to reserve network resources for
RTP flows [24], or may simply direct RTP traffic onto a dedicated "media
network" in a local installation. Note that RTP and MWPP do provide
tools that applications may use to achieve the best possible real-time
performance at a given service level.

The scope of this memo is limited in several respects. This memo
normatively defines the syntax and semantics of MWPP, an RTP
packetization for MIDI. However, this memo does not define algorithms
for sending and receiving MWPP RTP packets. An ancillary IETF document
[22] provides informative guidance on MWPP algorithms. Supplemental
information may be found in related conference publications [6] [8] and
reference software [7].

The scope of this memo is also limited in that it defines MIDI
extensions for RTP and SDP, but it does not define frameworks for using
RTP, SDP and other IETF tools in any specific MIDI application domain.
Other documents, from the IETF or from other organizations, may define
frameworks that incorporate MWPP, but this memo does not.

1.1 MWPP RTP Overview

The first part of this memo (Sections 2-5, Appendices A and B) defines
the MIDI Wire Protocol Packetization (MWPP), a MIDI RTP [2]
packetization for the Audio/Visual Profile [3].

The MIDI standard [1] defines a command set that describes sound as a
series of events (NoteOn command to start a musical note event, NoteOff
command to end a note, etc). Commands execute on one of the 16 voice
channels (a voice channel is usually devoted to a single instrument
timbre) or on the special systems channel. The command syntax explicitly
codes the execution channel. See Appendix D for a more detailed
introduction to MIDI.

MWPP maps a single MIDI command stream (16 voice channels + systems)
onto an RTP stream.  Section 2 of this memo introduces the modular
design of MWPP packetization. The simplest form of MWPP uses the MIDI
command section (described in Section 3) as a complete self-framed RTP
payload. This lightweight version of MWPP is suitable for use over
reliable transport such as TCP, or over local datagram networks that are
known to be reliable.

MWPP is also suitable for use over unreliable transport such as unicast
and multicast UDP. The term unreliable transport means that packets may
be lost in transit or delivered out-of-order. MWPP provides feed-forward
resiliency by inserting a journal section (such as the recovery journal,
described in Sections 4 and 5 and Appendices A and B) into each RTP
packet. The journal codes the recent history of the stream. Receivers
use the journal to gracefully recover from packet loss and out-of-order
packet delivery.

MWPP supports the two command execution timing methods defined in the
MIDI standard: the implicit "time-of-arrival" code used in the MIDI wire
protocol (a networking standard for the remote operation of musical
instruments over short asynchronous serial lines), and the explicit
timestamps of Standard MIDI Files (a file format for representing
complete musical performances).

1.2 Overview of SDP Parameters for MWPP

The second part of this memo (Section 6 and Appendices C.1-5) defines
Session Description Protocol (SDP, [9]) parameters for MWPP.  These
parameters may be used to customize (and perhaps negotiate [11]) the
configuration of an MWPP session, by using SDP in conjunction with
session setup tools like SIP [10] or RTSP [12].

For example, the extensible SDP parameter "render" configures the method
of rendering the MIDI command stream into audio output (Appendix C.5).
Other SDP parameters provide tools for structuring multiple MWPP streams
(Appendix C.4), setting the resiliency configuration (Appendix C.1), and
customizing the MWPP timestamp semantics (Appendix C.2).

Section 6 describes the SDP syntax for binding an MWPP stream to a MIME
type. MWPP supports two MIME types: the general-purpose mwpp MIME type,
and the mpeg4-generic MIME type [4]. The mpeg4-generic MIME type
supports MIDI rendering using the MPEG 4 Audio synthesis tools (General
MIDI [1], DLS2 [18], and Structured Audio [5]).

In this memo, the phrase "native MWPP stream" refers to an MWPP stream
that uses the mwpp MIME type. The phrase "mpeg4-generic MWPP stream"
refers to an MWPP stream that uses the mpeg4-generic MIME type.


2. MWPP Packet Format

In this section, we introduce the format of MWPP RTP packets. The
description includes some background information on RTP/AVP, for the
benefit of MIDI implementors new to IETF tools. Likewise, Appendix D
provides a MIDI overview, for the benefit of networking specialists new
to musical applications. However, implementors should consult the
normative documents for RTP/AVP [2,3] and MIDI [1] for authoritative
descriptions of these standards.

An RTP media stream is a sequence of logical packets that share a common
format. Each RTP packet consists of two parts: the RTP header and a
payload. Figure 1 shows this format for MWPP RTP packets (vertical space
delineates the header from the payload).


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| V |P|X|  CC   |M|     PT      |        Sequence number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             SSRC                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     MIDI command section ...                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Journal section ...                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 1 -- MWPP packet format


We describe RTP packets as "logical" packets to highlight the fact that
RTP itself is not a network-layer protocol. Instead, RTP packets are
mapped onto network protocols (such as unicast UDP, multicast UDP, or
TCP) by an application [13]. An application chooses a particular network
protocol for an MWPP stream based on several factors:

  o  Reliability. MWPP is designed for use over unreliable UDP
     transport. Receivers may use the recovery journal system to
     gracefully recover from packet loss. For higher fidelity,
     the recovery journal system may be combined with generic RTP
     resiliency tools (Appendix A.3 of [22]). Applications that
     require error-free transport of MIDI data may use MWPP over
     reliable TCP transport, or may pair a UDP MWPP stream with
     an RTP retransmission stream (Appendix A.3 of [22]).

  o  Latency. Low latency is a desirable property for interactive
     applications. The latencies of UDP and TCP are comparable over
     networks with very low loss. For higher loss rates, UDP has a
     latency advantage: TCP packet retransmission adds a round-trip
     time to packet latency, and head-of-line blocking further
     increases latency.

  o  Topology. Some MWPP applications are a good match to the
     one-to-many architecture of UDP multicast transport (one piano
     keyboard controlling several synthesizers over a LAN, one
     streaming server broadcasting to many receivers over a WAN,
     etc). Although one-to-many topologies may be simulated using
     unicast links (see Appendix B.1 of [22]), simple approaches
     to multicast simulation do not scale well to large sessions.

  o  Availability. Low-cost embedded environments may not support
     TCP, forcing a UDP solution. The IETF multimedia toolkit is
     designed to scale downward to low-cost environments that do
     not support TCP.

Next, we describe the RTP header and payload, in separate sections.

2.1 RTP Header

The RTP header begins with an octet of fields (V, P, X, and CC) to
support specialized RTP uses (see [2] and [3] for details). For the bulk
of RTP applications, V is set to 2, and the P, X, and CC fields are set
to 0. These default values yield an RTP stream with a fixed header size
of 12 octets. If network bandwidth is at a premium, header compression
[14] may be used to reduce overhead.

The second RTP header octet holds the M and PT fields. The 1-bit M field
is set to 1 for all MWPP packets.  The 7-bit PT field encodes the
payload format type. The PT field value for a stream is set during
session configuration by the SDP rtpmap line (Sections 6.1 and 6.2).

The other RTP header fields code the 16-bit sequence number and 32-bit
timestamp for the packet, and the 32-bit sender identification number
(SSRC) for the stream. These unsigned integer values are coded in the
IETF network byte order (big-endian). We discuss the timestamp and
sequence fields below, and refer the reader to [2] for information on
the SSRC field.
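
As an illustration of the header layout described above, the following
Python sketch packs and parses the fixed 12-octet header. It is a
sketch under stated assumptions, not a normative algorithm: the
function names pack_rtp_header and unpack_rtp_header are hypothetical,
the P, X, and CC fields are fixed at 0, and the payload type value 96
in the usage note is only an example of a value that an rtpmap line
might assign.

```python
import struct

RTP_VERSION = 2  # V field value used by the bulk of RTP applications


def pack_rtp_header(pt, seq, timestamp, ssrc, marker=1):
    """Pack a 12-octet RTP header with P = X = CC = 0.

    Hypothetical helper: marker defaults to 1 because the text sets
    the M field to 1 for all MWPP packets.
    """
    if not 0 <= pt <= 0x7F:
        raise ValueError("PT is a 7-bit field")
    octet0 = RTP_VERSION << 6           # V=2, P=0, X=0, CC=0
    octet1 = ((marker & 1) << 7) | pt   # 1-bit M, then 7-bit PT
    # "!" selects network byte order (big-endian) with no padding.
    return struct.pack("!BBHII", octet0, octet1,
                       seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Split the first 12 octets of a packet into named header fields."""
    octet0, octet1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "v": octet0 >> 6,
        "p": (octet0 >> 5) & 1,
        "x": (octet0 >> 4) & 1,
        "cc": octet0 & 0x0F,
        "m": octet1 >> 7,
        "pt": octet1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

For example, pack_rtp_header(96, 0x1234, 88200, 0xDEADBEEF) yields 12
octets whose first octet is 0x80 (V = 2, all other bits 0). A header
with a non-zero CC field is longer than 12 octets, a case this sketch
does not handle.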

The sequence number is initialized to a randomly chosen value, and is
incremented by one (modulo 2^16) for each packet sent in the stream.  A
related quantity, the 32-bit extended packet sequence number, may be
computed by tracking rollovers of the 16-bit sequence number.  Note that
different receivers in the same session may compute different extended
packet sequence numbers, depending on when the receiver joined the
session.
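
One possible way to track rollovers is sketched below in Python. The
class name ExtendedSeqTracker and the half-range rule used to separate
new packets from late ones are assumptions of this sketch, not
requirements of this memo. The sketch also ignores wraparound of the
extended value itself, and may return a negative number for a late
packet that predates the first packet the receiver saw.

```python
class ExtendedSeqTracker:
    """Hypothetical receiver-side helper that extends 16-bit sequence
    numbers by counting rollovers."""

    def __init__(self):
        self.max_ext = None  # highest extended sequence number seen

    def update(self, seq):
        """Return the extended sequence number for a received packet."""
        if self.max_ext is None:
            # First packet seen: the extended count starts here, so
            # receivers that join at different times will disagree.
            self.max_ext = seq
            return seq
        # Distance from the highest packet seen, modulo 2^16.
        delta = (seq - self.max_ext) & 0xFFFF
        if delta >= 0x8000:
            # Assumption: more than half the range ahead is treated as
            # a late or duplicate packet, i.e. a negative distance.
            delta -= 0x10000
        ext = self.max_ext + delta
        if delta > 0:
            self.max_ext = ext
        return ext
```

A receiver that sees 16-bit sequence numbers 65534, 65535, 0 in that
order computes 65534, 65535, 65536, whereas a receiver that joins at
the packet numbered 0 computes 0 for that same packet.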

The RTP timestamp sets the base timestamp value for the packet. The MWPP
payload codes MIDI command timestamps relative to this base timestamp
value (Section 3). The sampling instant of the RTP packet (used in [2]
to calculate stream statistics) is the command timestamp of the first
MIDI command in the MIDI command section. If an RTP packet has an empty
MIDI command section, the RTP timestamp of the packet codes the sampling
instant for the packet.

The RTP timestamp units are set during session configuration by the SDP
rtpmap parameter srate (Sections 6.1 and 6.2). For example, if
configuration sets srate to a value of 44100 Hz, two MWPP packets whose
base timestamp values differ by 2 seconds have RTP timestamp fields that
differ by 88200. By default (Appendix C.4.2) the timestamp field is
initialized to a randomly chosen value.
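
The arithmetic in the example above can be sketched in Python as
follows. The function names are hypothetical, and rounding to the
nearest tick is an assumption of this sketch.

```python
def seconds_to_ticks(seconds, srate):
    """Convert a duration in seconds to RTP timestamp units.

    srate is the value set by the rtpmap line, in Hz. Rounding to the
    nearest tick is an assumption of this sketch.
    """
    return round(seconds * srate)


def advance_timestamp(base, seconds, srate):
    """Return the 32-bit RTP timestamp that lies `seconds` after `base`.

    The mask models wraparound of the 32-bit timestamp field.
    """
    return (base + seconds_to_ticks(seconds, srate)) & 0xFFFFFFFF
```

With srate set to 44100, seconds_to_ticks(2, 44100) is 88200, which
matches the difference given in the text. Because the initial
timestamp is random, advance_timestamp may wrap: starting from
0xFFFFFFFF, a 2 second advance gives 88199.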

MWPP RTP timestamps do not necessarily increment at a fixed rate,
because MWPP packets are not necessarily sent at a fixed rate. The
timestamps for two sequential RTP packets may be identical, or the
second packet may have a timestamp arbitrarily larger than the first
packet (modulo 2^32). Section 3 places additional restrictions on the
RTP timestamps for two sequential RTP packets.

The degree of regularity of MWPP packet transmission reflects the
underlying dynamics of the application. Interactive applications may
vary the packet sending rate to track the gestural rate of a human
performer, whereas content-streaming applications may send MWPP packets
at a fixed rate.

MWPP defines the length of media time a packet encodes as the RTP
timestamp difference (modulo 2^32) between the packet's successor and
the packet itself. By default, the media time for a packet may be
arbitrarily long. However, a maximum media time for MWPP packets in a
stream may be set during session configuration, via the SDP parameter
maxptime (Appendix C.3).
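
A minimal Python sketch of this definition follows. The function names
are hypothetical. The sketch only computes the media time; it does not
model the maxptime parameter, whose semantics are defined in Appendix
C.3.

```python
def media_time_ticks(ts, next_ts):
    """Media time of a packet, in RTP timestamp units.

    ts is the RTP timestamp of the packet and next_ts is the RTP
    timestamp of its successor. The mask computes the difference
    modulo 2^32, so the result is correct across a timestamp wrap.
    """
    return (next_ts - ts) & 0xFFFFFFFF


def media_time_seconds(ts, next_ts, srate):
    """Media time of a packet in seconds, for a clock of srate Hz."""
    return media_time_ticks(ts, next_ts) / srate
```

Two sequential packets with identical timestamps give a media time of
0. A packet stamped 0xFFFFFF00 whose successor is stamped 0x00000100
has a media time of 0x200 ticks, even though the field has wrapped.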

2.2 MWPP Payload

The MWPP payload (Figure 1) MUST begin with the MIDI command section.
The MIDI command section codes a (possibly empty) list of timestamped
MIDI commands, and provides the essential service of MWPP. The payload
may also contain a journal section. The journal section provides
resiliency by coding the recent history of the stream.

Section 3 defines the format for the MIDI command section. Sections 4-5
and Appendices A and B define the recovery journal, the default format
for the journal section. Here, we describe how these payload sections
operate in an MWPP stream.

The journalling method for an MWPP stream is set at the start of a
session and may not be changed thereafter. A stream may be set to use
the recovery journal, to use an alternative journal format (none are
defined in this memo), or to not use a journal.

The default journalling method of a stream is inferred from its
transport type. Streams that use unreliable transport (such as UDP)
default to using the recovery journal. Streams that use reliable
transport (such as TCP) default to not using a journal. Appendix C.1.1
defines session configuration tools for overriding these defaults.

If an MWPP stream uses the recovery journal, every payload in the stream
MUST include a journal section. If an MWPP stream does not use
journalling, a journal section MUST NOT appear in a stream payload. If
an MWPP stream uses an alternative journal format, the specification for
the journal format defines an inclusion policy.
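
The default and inclusion rules above can be summarized in a short
Python sketch. All names here (Journal, default_journal,
journal_section_ok) are hypothetical, and the transport strings "udp"
and "tcp" stand in for whatever transport identification an
implementation uses. Overrides set during session configuration
(Appendix C.1.1) are not modeled.

```python
from enum import Enum


class Journal(Enum):
    """Journalling methods a stream may be set to use (names assumed)."""
    NONE = "none"
    RECOVERY = "recovery"
    ALTERNATIVE = "alternative"


def default_journal(transport):
    """Infer the default journalling method from the transport type."""
    t = transport.lower()
    if t == "udp":    # unreliable transport
        return Journal.RECOVERY
    if t == "tcp":    # reliable transport
        return Journal.NONE
    raise ValueError("unknown transport: " + transport)


def journal_section_ok(method, has_journal):
    """Check one payload against the journal inclusion rule.

    Returns True or False for the two methods covered by the rules in
    the text, and None for an alternative format, whose own
    specification defines the inclusion policy.
    """
    if method is Journal.RECOVERY:
        return has_journal        # journal section MUST be present
    if method is Journal.NONE:
        return not has_journal    # journal section MUST NOT appear
    return None
```

For instance, journal_section_ok(default_journal("udp"), False) is
False, because a UDP stream defaults to the recovery journal and every
payload in such a stream carries a journal section.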

If an MWPP stream sent over reliable transport does not use journalling,
the sender MUST transmit an RTP packet stream with consecutive sequence
numbers (modulo 2^16). If an MWPP stream sent over reliable transport
uses the recovery journal, the sender MAY transmit an RTP stream with
missing or out-of-order packets.
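
A test for the consecutive sequence number requirement might look like
the Python sketch below. The function name is hypothetical, and the
check is offered as a debugging aid under the assumption that the
caller has the 16-bit sequence numbers in transmission order.

```python
def is_consecutive(seqs):
    """Return True if each 16-bit sequence number follows its
    predecessor by exactly one, modulo 2^16.

    seqs is a list of sequence numbers in transmission order. An
    empty or single-element list is trivially consecutive.
    """
    return all(((b - a) & 0xFFFF) == 1 for a, b in zip(seqs, seqs[1:]))
```

The list [65534, 65535, 0, 1] passes, since the step from 65535 to 0
is a rollover. The lists [1, 2, 4] (a missing packet) and [1, 3, 2]
(out-of-order packets) both fail.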

The recovery journal codes the minimal information needed for a graceful
recovery (ending stuck notes, updating channel volumes, etc) from a
packet loss episode. The minimal approach is a good fit to low-latency
interactive applications. Content-streaming applications may combine the
recovery journal system with generic RTP resiliency tools, to improve
the fidelity of the rendered MIDI performance (see Appendix A.3 of
[22]).

Some MWPP applications are archival in nature. Archival applications
require error-free transport of MIDI data, but are unconcerned with
transmission latency. An example archival application would be a program
that records MIDI streams to disk.

Latency-tolerant archival applications typically map MWPP streams onto
TCP transport. Archival applications with a real-time component (such as
a disk recorder with real-time monitoring) have several transport
options. If network packet loss is very low, the TCP stream may be
suitable for both real-time monitoring and archiving.  Alternatively, a
session may use two streams for MIDI data, one sent over UDP for
monitoring, and one sent over TCP for archiving (see Figures E.3 and E.4
of Appendix E.1 of [22]). A third approach is to pair a UDP MWPP stream
with a generic RTP retransmission stream configured for reliability (see
Appendix A.3 of [22]).

The payload of an MWPP stream encodes data for a single MIDI command
namespace (16 voice channels + systems). Applications may use several
MWPP streams in a session. For example, an application may use 2 MWPP
streams to send 32 MIDI voice channels. As a second example, an
application may split a single MIDI namespace between a UDP MWPP stream
and a TCP MWPP stream, to separate real-time data and archival bulk
data. Session configuration tools for multiple MWPP streams are defined
in Appendix C.4, and Appendix E of [22] shows detailed examples of
multi-stream sessions.

The definition of an MWPP stream may specify how the receiver renders
the MIDI data into audio (or sometimes, into control actions such as the
rewind of a tape deck or the dimming of stage lights). Appendix C.5
defines session configuration tools to set the MIDI rendering model for
an MWPP stream. These tools support standards-based models (such as the
General MIDI [1], DLS2 [18], and Structured Audio [5] profiles of MPEG 4
Audio [5]), and may be extended to support proprietary MIDI renderers.

The theoretical size of the MIDI command section ranges from 1 to 8193
octets; the theoretical size of a recovery journal ranges from 3 to
17394 octets. If an MWPP stream is sent over UDP transport, the Maximum
Transmission Unit (MTU) of the underlying network limits the practical
size of these payload sections (for example, an Ethernet MTU is 1500
octets). The session configuration tools defined in Appendix C.4 may be
used to split a dense MIDI namespace into several UDP MWPP streams, so
that the MWPP payload fits comfortably into an MTU.


3. MIDI Command Section

Figure 2 shows the format of the MIDI command section.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|B|Z|P| LEN .. |          MIDI list ...                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 2 -- MIDI command section


The MIDI command section begins with a variable-length header.  The
header field LEN codes the length (in units of octets) of the MIDI list
that follows the header.

If the header flag B is 0, the header is one octet long, and LEN is a
5-bit field, supporting a maximum MIDI list length of 31 octets. If B is
1, the header is two octets long, and LEN is a 13-bit field, supporting
a maximum MIDI list length of 8191 octets.
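
As a non-normative illustration, the Python sketch below parses this
header. The function name parse_command_header is hypothetical. The
sketch assumes that bit 0 in Figure 2 is the most significant bit of the
first octet, and that a 13-bit LEN is stored most significant bits
first.

```python
def parse_command_header(payload: bytes):
    """Return (B, Z, P, LEN, header_size) for a MIDI command section."""
    if not payload:
        raise ValueError("empty payload")
    first = payload[0]
    b = (first >> 7) & 1   # B: selects the one- or two-octet header
    z = (first >> 6) & 1   # Z: Delta Time 0 is present in the MIDI list
    p = (first >> 5) & 1   # P: phantom status octet flag
    if b == 0:
        return b, z, p, first & 0x1F, 1           # 5-bit LEN
    if len(payload) < 2:
        raise ValueError("truncated two-octet header")
    length = ((first & 0x1F) << 8) | payload[1]   # 13-bit LEN
    return b, z, p, length, 2
```

The returned header_size tells the caller where the MIDI list starts.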

A LEN value of 0 is legal, and codes an empty MIDI list. If LEN is
nonzero, the MIDI list has the structure shown in Figure 3.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Delta Time 0 (if Z = 1)   |     MIDI Command 0 ...         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Delta Time 1          |     MIDI Command 1 ...         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Delta Time 2          |     MIDI Command 2 ...         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            .....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Delta Time N          |  MIDI Command N (may be empty) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 3 -- MIDI list structure


If the header flag Z is 1, the MIDI list begins with a complete MIDI
command (MIDI Command 0) preceded by a delta time (Delta Time 0). If Z
is 0, the Delta Time 0 field is not present in the MIDI list, and MIDI
Command 0 has an implicit delta time of 0.  The MIDI list structure may
also optionally encode a list of N additional complete MIDI commands.
Each additional command is preceded by a delta time.

The MWPP delta time syntax is a modified form of the MIDI File delta
time syntax [1]. MWPP delta times use 1-4 octet fields to encode 32-bit
unsigned integers. Figure 4 shows the encoded and decoded forms of delta
times. Note that delta time values may be legally encoded in multiple
formats; for example, there are four legal ways to encode the zero delta
time (0x00, 0x8000, 0x808000, 0x80808000).


  One-Octet Delta Time:

     Encoded form: 0ddddddd
     Decoded form: 00000000 00000000 00000000 0ddddddd

  Two-Octet Delta Time:

     Encoded form: 1ccccccc 0ddddddd
     Decoded form: 00000000 00000000 00cccccc cddddddd

  Three-Octet Delta Time:

     Encoded form: 1bbbbbbb 1ccccccc 0ddddddd
     Decoded form: 00000000 000bbbbb bbcccccc cddddddd

  Four-Octet Delta Time:

     Encoded form: 1aaaaaaa 1bbbbbbb 1ccccccc 0ddddddd
     Decoded form: 0000aaaa aaabbbbb bbcccccc cddddddd

              Figure 4 -- Decoding delta time formats
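
The formats in Figure 4 can be decoded and encoded as in the following
non-normative Python sketch. The function names are hypothetical. The
encoder always picks the shortest legal form, which is one possible
choice among the multiple legal encodings.

```python
def decode_delta_time(data: bytes, pos: int = 0):
    """Decode one delta time starting at data[pos].

    Returns (value, next_pos). At most four octets are consumed.
    """
    value = 0
    for count in range(4):
        octet = data[pos + count]
        value = (value << 7) | (octet & 0x7F)   # append 7 payload bits
        if (octet & 0x80) == 0:                 # top bit clear: last octet
            return value, pos + count + 1
    raise ValueError("delta time longer than four octets")


def encode_delta_time(value: int) -> bytes:
    """Encode value using the shortest of the four legal forms."""
    if not 0 <= value < (1 << 28):
        raise ValueError("delta time needs more than 28 bits")
    octets = [value & 0x7F]                     # final octet, top bit clear
    value >>= 7
    while value:
        octets.append(0x80 | (value & 0x7F))    # earlier octets, top bit set
        value >>= 7
    return bytes(reversed(octets))
```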


MWPP uses delta times to encode a timestamp for each MIDI command. The
timestamp for MIDI Command K is the summation (modulo 2^32) of the RTP
timestamp and decoded delta times 0 through K. This cumulative coding
technique, borrowed from MIDI File delta time coding, is efficient
because it reduces the number of multi-octet delta times.
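
The cumulative summation can be sketched as follows. This is a
non-normative Python illustration; the function name
command_timestamps is hypothetical, and delta_times is assumed to hold
the decoded delta times 0 through N (with 0 supplied for Delta Time 0
when Z is 0).

```python
def command_timestamps(rtp_timestamp: int, delta_times):
    """Return the timestamp of each command, modulo 2^32."""
    stamps = []
    current = rtp_timestamp
    for delta in delta_times:
        current = (current + delta) & 0xFFFFFFFF   # wrap at 2^32
        stamps.append(current)
    return stamps
```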

All command timestamps in a packet MUST be less than or equal to the RTP
timestamp of the next packet in the MWPP stream (modulo 2^32).

By default, a command timestamp indicates the execution time for the
command. The difference between two timestamps indicates the time delay
between the execution of the commands. This difference may be zero,
coding simultaneous execution. MIDI sources that use explicit command
timestamps, such as the MIDI file format, are simple to transcode into
MWPP streams using these default semantics.

MIDI command sources that use implicit command timing, such as the MIDI
wire protocol, must be annotated with timestamps as part of the MWPP
transcoding process. The hardware and systems environment for an
application may dictate a particular approach to timestamps that may
not be a good fit for the default MWPP timestamp semantics. To address
this issue, the semantics of command timestamps may be customized during
session configuration, as described in Appendix C.2.

The command timestamp for MIDI Command N (the final command in the MIDI
list) indicates the last moment in time coded by the MIDI Command
Section. Low-latency interactive systems MAY use this timestamp as a
proxy for the sending time of the packet. To facilitate this use, the
MIDI Command field associated with the Delta Time N in the MIDI list MAY
be empty.

As a rule, each non-empty MIDI Command field in the MIDI list contains a
complete MIDI command, in the binary command format defined in the MIDI
standard [1]. In the remainder of this section, we describe exceptions
to this rule.

The first MIDI channel command in the MIDI list MUST include a status
octet. Running status coding, as defined in [1], may be used for all
subsequent MIDI channel commands in the MIDI list. If the P (phantom)
header flag is set to 1, the status octet of the first MIDI channel
command in the MIDI list does not appear in the source data stream, and
is coded in the MIDI list only to satisfy the normative sentence at the
start of this paragraph.

As in [1], System Common and System Exclusive messages (0xF0 ... 0xF7)
cancel running status state, but System RealTime messages (0xF8 ...
0xFF) do not affect running status state. As receivers MUST be able to
decode running status, sender implementors should feel free to use
running status to improve bandwidth efficiency. However, senders SHOULD
NOT introduce timing jitter into an existing MIDI command stream through
an inappropriate use of running status coding.
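
As a non-normative illustration, a receiver might track running status
state as in the Python sketch below. The function name
next_running_status and the use of None for "no running status" are
assumptions made for this example.

```python
def next_running_status(running, status):
    """Return the running status state after a status octet is parsed.

    running is the current state: a channel status octet, or None.
    """
    if 0x80 <= status <= 0xEF:
        return status        # channel command: becomes the running status
    if 0xF0 <= status <= 0xF7:
        return None          # System Common or Exclusive: cancels it
    if 0xF8 <= status <= 0xFF:
        return running       # System RealTime: leaves it unchanged
    raise ValueError("not a status octet")
```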

In the MIDI wire protocol [1], a System RealTime command may be embedded
inside of another "host" MIDI command.  This syntactic construction is
not supported in MWPP: a MIDI Command field in the MIDI list codes
exactly one complete MIDI command.

To encode an embedded System RealTime command, senders MUST extract the
command from its host, and code it in the MIDI list as a separate
command. The host command and System RealTime command SHOULD appear in
the same MIDI list. The delta time of the System RealTime command SHOULD
result in a command timestamp that encodes the System RealTime command
placement in its original embedded position.

Two methods are provided for encoding MIDI System Exclusive (SysEx)
commands in the MIDI list. A SysEx command may be encoded in a MIDI
Command field verbatim: an 0xF0 octet, followed by an arbitrary number
of data octets, followed by an 0xF7 octet.

Alternatively, a SysEx command may be encoded as multiple segments.  The
command is divided into two or more SysEx command segments; each segment
is encoded in its own MIDI Command field in the MIDI list.

MWPP supports segmentation in order to encode SysEx commands that encode
information in the temporal pattern of data octets. By encoding these
commands as a series of segments, each data octet is associated with a
delta time. Segmentation may also be useful in coding very large SysEx
commands across several RTP packets.

To segment a SysEx command, first partition its data octet list into two
or more sublists; each sublist must contain at least one data octet.  To
complete the segmentation, add status octets to the head and tail of
each sublist, as detailed in Figure 5. Figure 6 shows example
segmentations of a SysEx command.


    -----------------------------------------------------------
   | Sublist Position |  Head Status Octet | Tail Status Octet |
   |-----------------------------------------------------------|
   |    first         |       0xF0         |       0xF0        |
   |-----------------------------------------------------------|
   |    middle        |       0xF7         |       0xF0        |
   |-----------------------------------------------------------|
   |    last          |       0xF7         |       0xF7        |
    -----------------------------------------------------------

           Figure 5 -- Command Segmentation Status Octets


  Original SysEx command:

     0xF0 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

  A two-segment segmentation:

     0xF0 0x01 0x02 0x03 0x04 0xF0

     0xF7 0x05 0x06 0x07 0x08 0xF7

  A different two-segment segmentation:

     0xF0 0x01 0xF0

     0xF7 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

  A three-segment segmentation:

     0xF0 0x01 0x02 0xF0

     0xF7 0x03 0x04 0xF0

     0xF7 0x05 0x06 0x07 0x08 0xF7

  The segmentation with the largest number of segments:

     0xF0 0x01 0xF0

     0xF7 0x02 0xF0

     0xF7 0x03 0xF0

     0xF7 0x04 0xF0

     0xF7 0x05 0xF0

     0xF7 0x06 0xF0

     0xF7 0x07 0xF0

     0xF7 0x08 0xF7


                   Figure 6 -- Example segmentations


The relative ordering of SysEx command segments in a MIDI list must
match the relative ordering of the sublists in the original SysEx
command. Only System RealTime MIDI commands may appear between SysEx
command segments. If the command segments of a SysEx command are placed
in the MIDI lists of two or more RTP packets, the segment ordering rules
apply to the concatenation of all affected MIDI lists.
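
The segmentation procedure of Figure 5 can be sketched in Python as
follows. This is a non-normative illustration; the function name
segment_sysex and its sizes argument (the number of data octets placed
in each sublist) are hypothetical. The segments are returned in the
order in which they must appear in the MIDI list.

```python
def segment_sysex(command: bytes, sizes):
    """Split a verbatim SysEx command into SysEx command segments."""
    if len(command) < 2 or command[0] != 0xF0 or command[-1] != 0xF7:
        raise ValueError("not a verbatim SysEx command")
    data = command[1:-1]
    # Two or more sublists, each holding at least one data octet.
    if len(sizes) < 2 or min(sizes) < 1 or sum(sizes) != len(data):
        raise ValueError("sizes must partition the data octets")
    segments = []
    start = 0
    for index, size in enumerate(sizes):
        head = 0xF0 if index == 0 else 0xF7                # Figure 5, head
        tail = 0xF7 if index == len(sizes) - 1 else 0xF0   # Figure 5, tail
        sublist = data[start:start + size]
        segments.append(bytes([head]) + sublist + bytes([tail]))
        start += size
    return segments
```

Calling segment_sysex on the original command of Figure 6 with sizes
[4, 4], [1, 7], [2, 2, 4], and eight sublists of one octet yields the
four segmentations shown in that figure.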

The MIDI wire protocol [1] permits a "dropped 0xF7" construction for
SysEx commands; in this coding method, the 0xF7 octet is dropped from
the end of the SysEx command, and the status octet of the next MIDI
command acts both to terminate the SysEx command and start the next
command. To encode this construction in MWPP, follow these steps:

  o  Determine the appropriate delta times for the SysEx command and
     the command that follows the SysEx command.

  o  Insert the "dropped" 0xF7 octet at the end of the SysEx command,
     to form the standard SysEx syntax.

  o  Code both commands into the MIDI list using the rules above.

  o  Replace the 0xF7 octet that terminates the verbatim SysEx
     encoding or the last segment of the segmented SysEx encoding
     with a 0xF5 command. This substitution informs the receiver
     of the original dropped 0xF7 coding.
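
For the verbatim encoding, the second and fourth steps above might be
sketched in Python as follows. This is a non-normative illustration;
the function name encode_dropped_f7 is hypothetical, and the input is
assumed to be the SysEx command as it appeared on the wire (0xF0
followed by data octets, with any embedded System RealTime commands
already extracted).

```python
def encode_dropped_f7(wire_sysex: bytes) -> bytes:
    """Code a SysEx command whose wire form lacked the final 0xF7."""
    if not wire_sysex or wire_sysex[0] != 0xF0:
        raise ValueError("SysEx commands start with 0xF0")
    if any(octet & 0x80 for octet in wire_sysex[1:]):
        raise ValueError("only data octets may follow 0xF0")
    standard = wire_sysex + b"\xF7"   # restore the dropped 0xF7
    return standard[:-1] + b"\xF5"    # then mark the drop with 0xF5
```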


4. The Recovery Journal System

The recovery journal is the default MWPP resiliency tool for unreliable
transport. In this section, we normatively define the roles that senders
and receivers play in the recovery journal system.  Readers unfamiliar
with MIDI command semantics may wish to review Appendix D before reading
this section.

MIDI is a fragile code. A single lost command in a MIDI command stream
may produce an artifact in the rendered performance. We normatively
classify rendering artifacts into two categories:

   o Transient artifacts. Transient artifacts produce immediate
     but short-term glitches in the performance. For example, a lost
     NoteOn (0x9) command produces a transient artifact: one note
     fails to play, but the artifact does not extend beyond the end
     of that note.

   o Indefinite artifacts. Indefinite artifacts produce long-lasting
     errors in the rendered performance. For example, a lost NoteOff
     (0x8) command may produce an indefinite artifact: the note that
     should have been ended by the lost NoteOff command may sustain
     indefinitely. As a second example, the loss of a Control Change
     (0xB) command for the channel volume (controller 7) may also
     produce an indefinite artifact: after the loss, all notes on the
     channel may play too softly or too loudly.

The purpose of the recovery journal system is to satisfy the recovery
journal mandate: the MIDI performance rendered from an MWPP stream sent
over unreliable transport must not contain indefinite artifacts.

The recovery journal system does not use packet retransmission to
satisfy this mandate. Instead, each MWPP packet includes a special
section, called the recovery journal.

The recovery journal codes the history of the MWPP stream, back to an
earlier packet called the checkpoint packet. The range of coverage for
the journal is called the checkpoint history. The recovery journal codes
the information necessary to recover from the loss of an arbitrary
number of packets in the checkpoint history. See Appendix A.1 for
normative definitions for these terms.

When a receiver detects a packet loss, it compares its own knowledge
about the history of the stream with the history information coded in
the recovery journal of the packet that ends the loss event. By noting
the differences in these two versions of the past, a receiver is able to
transform all indefinite artifacts in the rendered performance into
transient artifacts, by executing MIDI commands to repair the stream.

We now state the normative role for senders in the recovery journal
system.

Senders prepare a recovery journal for every MWPP RTP packet in the
stream. In doing so, senders choose the checkpoint packet identity for
the journal. Senders make this choice by applying a sending policy.
Appendix C.1.2 normatively defines three sending policies: closed-loop,
open-loop, and anchor.

By default, senders MUST use the closed-loop sending policy. If the
session description overrides this default policy, by using the SDP
parameter j_update defined in Appendix C.1.2, senders MUST use the
specified policy.

After choosing the checkpoint packet identity for an MWPP packet, the
sender creates the recovery journal. By default, this journal MUST
conform to the normative semantics in Section 5 and Appendices A and B
in this memo. In Appendix C.1.3, we define SDP parameters that modify
the normative semantics for recovery journals. If the session
description uses these SDP parameters, the journal created by the sender
MUST conform to the modified semantics.

Next, we state the normative role for receivers in the recovery journal
system.

A receiver MUST detect each RTP sequence number break in an MWPP stream.
If the sequence number break is due to a packet loss event (as defined
in [2]), the receiver MUST repair all indefinite artifacts in the
rendered MIDI performance caused by the packet loss event. If the
sequence number break is due to an out-of-order packet (as defined in
[2]), the receiver MUST NOT take actions that introduce indefinite
artifacts (ignoring the out-of-order packet is a safe option).

Receivers take special precautions when entering or exiting a session.
A receiver MUST process the first received packet in an MWPP stream as
if it were a packet that ends a loss event. Upon exiting a session, a
receiver MUST ensure that the rendered MIDI performance does not end
with indefinite artifacts.

Receivers are under no obligation to perform indefinite artifact repairs
at the moment a packet arrives. A receiver that uses a playout buffer
may choose to wait until the moment of rendering before processing the
recovery journal, as the "lost" packet may be a late packet that arrives
in time to use.

Next, we state the normative role for the creator of the session
description in the recovery journal system. Depending on the
application, the sender, the receivers, and other parties may take part
in creating or approving the session description.

A session description that specifies the default closed-loop sending
policy and the default recovery journal semantics satisfies the recovery
journal mandate. However, these default behaviors may not be appropriate
for all sessions. If the creators of a session description use the SDP
parameters in Appendix C.1 to override these defaults, the creators MUST
ensure that the parameters define a system that satisfies the recovery
journal mandate.

Finally, we note that this memo does not specify sender or receiver
recovery journal algorithms. Implementations are free to use any
algorithm that conforms to the requirements in this section. The non-
normative [22] discusses sender and receiver algorithm design.


5. Recovery Journal Format

This section introduces the structure of the recovery journal, and
defines the bitfields of recovery journal headers. Appendices A and B
complete the bitfield definition of the recovery journal. The recovery
journal has a three-level structure:

  o Top-level header.

  o Channel and system journal headers. Encodes recovery
    information for a single MIDI channel (channel journal)
    and for all MIDI Systems commands (system journal).

  o Chapters. Describes recovery information for a single MIDI
    command type.

Figure 7 shows the top-level structure of the recovery journal.  A
recovery journal consists of a 3-octet header, optionally followed by a
system journal and a list of channel journals.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|A|Y|R|TOTCHAN|    Checkpoint Packet Seqnum   |     ...       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   ... System journal ...      |  Channel journals ...         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 7 -- Top-level recovery journal format


If the Y bit is set to 1, a system journal follows the recovery journal
header. If the A bit is set to 1, the recovery journal ends with a list
of (TOTCHAN + 1) channel journals. If A and Y are both zero, the
recovery journal only contains the 3-octet header, and is considered to
be an "empty" journal.
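
As a non-normative illustration, the 3-octet header of Figure 7 might be
parsed as in the Python sketch below. The function name
parse_journal_header and the dictionary keys are hypothetical. The
sketch assumes that bit 0 in Figure 7 is the most significant bit of the
first octet, and that the Checkpoint Packet Seqnum field is in network
byte order.

```python
def parse_journal_header(journal: bytes) -> dict:
    """Parse the top-level recovery journal header."""
    if len(journal) < 3:
        raise ValueError("recovery journal header is 3 octets")
    first = journal[0]
    a = (first >> 6) & 1
    y = (first >> 5) & 1
    totchan = first & 0x0F
    return {
        "S": (first >> 7) & 1,
        "A": a,
        "Y": y,
        "R": (first >> 4) & 1,
        "TOTCHAN": totchan,
        "checkpoint_seqnum": (journal[1] << 8) | journal[2],
        # A = 1 means (TOTCHAN + 1) channel journals are present.
        "channel_journals": totchan + 1 if a else 0,
        # A = 0 and Y = 0 means the journal is only the header.
        "empty": a == 0 and y == 0,
    }
```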

A MIDI channel may be represented by (at most) one channel journal in a
recovery journal. Channel journals appear in the recovery journal in
ascending channel-number order.

The S (single-packet loss) bit appears in most recovery journal
structures. It helps receivers efficiently parse the recovery journal in
the common case of the loss of a single packet.  Appendix A.1 defines S
bit semantics.

The R bit is reserved. The semantics for all R fields are uniform
throughout the recovery journal, and are defined in Appendix A.1.

The 16-bit Checkpoint Packet Seqnum field codes the sequence number of
the checkpoint packet for this journal. The choice of the checkpoint
packet sets the depth of the checkpoint history for the journal (defined
in Appendix A.1).

Receivers may use the Checkpoint Packet Seqnum field of the packet that
ends a loss event to verify that the journal checkpoint history covers
the entire loss event. The checkpoint history covers the loss event if
the Checkpoint Packet Seqnum field is less than or equal to the highest
RTP sequence number previously received on the stream (modulo 2^16).
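
One way to perform this comparison is sketched below in Python. This is
a non-normative illustration; the function name
checkpoint_covers_loss is hypothetical. This memo does not specify how
"less than or equal" is evaluated modulo 2^16; the sketch assumes a
half-range window, so that a checkpoint up to 32767 packets behind the
highest received sequence number counts as "less than or equal".

```python
def checkpoint_covers_loss(checkpoint_seqnum: int,
                           highest_received: int) -> bool:
    """True if the checkpoint history covers the loss event."""
    # Distance from the checkpoint forward to the highest received
    # sequence number, computed modulo 2^16.
    distance = (highest_received - checkpoint_seqnum) & 0xFFFF
    return distance < 0x8000   # assumed half-range window
```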


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S| CHAN  |R|      LENGTH       |P|W|N|A|T|C|M|R|  Chapters ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 8 -- Channel journal format


Figure 8 shows the structure of a channel journal: a 3-octet header,
followed by a list of leaf elements called channel chapters. A channel
journal encodes information about MIDI commands on the MIDI channel
coded by the 4-bit CHAN header field.

The 10-bit LENGTH field codes the length of the channel journal. The
semantics for LENGTH fields are uniform throughout the recovery journal,
and are defined in Appendix A.1.

The third octet of the channel journal header is the Table of Contents
(TOC) of the channel journal. The TOC is a set of bits that encode the
presence of a chapter in the journal. Each chapter contains information
about a certain class of MIDI channel command:

   o  Chapter P: MIDI Program Change (0xC)
   o  Chapter W: MIDI Pitch Wheel (0xE)
   o  Chapter N: MIDI NoteOff (0x8), NoteOn (0x9)
   o  Chapter A: MIDI Poly Aftertouch (0xA)
   o  Chapter T: MIDI Channel Aftertouch (0xD)
   o  Chapter C: MIDI Control Change (0xB)
   o  Chapter M: MIDI Parameter System (part of 0xB)
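
As a non-normative illustration, the channel journal header of Figure 8
might be parsed as in the Python sketch below. The function name
parse_channel_journal_header is hypothetical. The sketch assumes that
bit 0 in Figure 8 is the most significant bit of the first octet.

```python
TOC_ORDER = "PWNATCM"   # chapter bits in the TOC octet, left to right


def parse_channel_journal_header(data: bytes) -> dict:
    """Parse the 3-octet channel journal header."""
    if len(data) < 3:
        raise ValueError("channel journal header is 3 octets")
    first, second, toc = data[0], data[1], data[2]
    chapters = [name for bit, name in enumerate(TOC_ORDER)
                if toc & (0x80 >> bit)]
    return {
        "S": (first >> 7) & 1,
        "CHAN": (first >> 3) & 0x0F,
        "R": (first >> 2) & 1,
        "LENGTH": ((first & 0x03) << 8) | second,   # 10-bit field
        "chapters": chapters,                        # in TOC order
        "toc_R": toc & 1,                            # reserved TOC bit
    }
```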

Chapters appear in a list following the header, in order of their
appearance in the TOC. Appendices A.2-8 describe the bitfield format for
each chapter, and define the conditions under which a chapter type MUST
appear in the recovery journal. If any chapter types are required for a
channel, an associated channel journal MUST appear in the recovery
journal.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|D|V|Q|E|X|      LENGTH       |  System chapters ...          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 9 -- System journal format


Figure 9 shows the structure of the system journal: a 2-octet header,
followed by a list of system chapters.  System chapters code information
about a specific class of MIDI Systems command:

   o  Chapter D: Song Select (0xF3), Tune Request (0xF6), Reset (0xFF)
   o  Chapter V: Active Sense (0xFE)
   o  Chapter Q: Sequencer State (0xF2, 0xF8, 0xF9, 0xFA, 0xFB, 0xFC)
   o  Chapter E: MTC Tape Position (0xF1, 0xF0 0x7F 0xcc 0x01 0x01)
   o  Chapter X: System Exclusive (all other 0xF0)

If header bits D, V, Q, or E are set to 1, one chapter for each chapter
type whose associated bit is set appears in a list following the header.
The chapter ordering follows the ordering of chapter header bits in the
header bitfield. If header bit X is set to 1, one or more Chapter X
bitfields appear at the end of the chapter list.

Appendix B describes the bitfield format for the system chapters, and
defines the conditions under which a chapter type MUST appear in the
recovery journal. If any system chapter type is required to appear in
the recovery journal, the system journal MUST appear in the recovery
journal.
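
As a non-normative illustration, the system journal header of Figure 9
might be parsed as in the Python sketch below. The function name
parse_system_journal_header is hypothetical. The sketch assumes that
bit 0 in Figure 9 is the most significant bit of the first octet.

```python
SYSTEM_CHAPTER_ORDER = "DVQEX"   # header bits that follow the S bit


def parse_system_journal_header(data: bytes) -> dict:
    """Parse the 2-octet system journal header."""
    if len(data) < 2:
        raise ValueError("system journal header is 2 octets")
    first, second = data[0], data[1]
    chapters = [name for bit, name in enumerate(SYSTEM_CHAPTER_ORDER)
                if first & (0x40 >> bit)]
    return {
        "S": (first >> 7) & 1,
        "chapters": chapters,                       # in header bit order
        "LENGTH": ((first & 0x03) << 8) | second,   # 10-bit field
    }
```

The chapters list gives the chapter types in the order in which they
appear after the header; when "X" is listed, one or more Chapter X
bitfields end the list.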


6. MWPP and the Session Description Protocol

RTP is a standard for the transport of media streams, but RTP does not
perform session management for the streams it carries. Instead, RTP is
designed to work together with tools that perform session management,
such as the Session Initiation Protocol (SIP, [10]) and the Real Time
Streaming Protocol (RTSP, [12]). RTP interacts with session management
tools via another standard, the Session Description Protocol (SDP, [9]).
SDP is a textual format for specifying session descriptions.

A session description is an ordered list of declarative statements (or
"lines"). A session description includes one or more media stream
descriptions. A stream description maps an RTP stream to a network
transport (for example, unicast UDP at a certain IP number and port
number), and defines the numeric value of the PT field in the RTP header
for the stream. A stream description also maps each RTP stream to a
media encoding (such as MWPP), and may carry configuration parameters
for the media encoding.

Session management tools like SIP and RTSP coordinate the exchange of
complete session descriptions between session participants.  The
exchange protocol may be unilateral in nature: a sender proposes a
session description, which a receiver must accept in order to join the
session. Alternatively, some exchange protocols, like the SIP
offer/answer model [11], specify negotiation methods, in which the
proposal and acceptance/rejection of session descriptions are components
of the negotiation process.

In the sections that follow, we show how to construct session
descriptions that include MWPP stream descriptions. Section 6.1 defines
the stream description syntax for native MWPP streams.  Section 6.2
defines the stream description syntax for mpeg4-generic MWPP streams. In
Section 6.3, we introduce the SDP parameter extensions for MWPP; these
extensions are described in detail in Appendix C.

6.1 Session Descriptions for Native MWPP Streams

In this section, we show the session description syntax for sessions
that use native MWPP streams (i.e. MWPP streams layered directly onto
RTP). For simplicity, we focus on unicast UDP transport. See Appendix
B.2 of [22] for information on multicast UDP transport, and see Appendix
C of [22] for information on reliable TCP and TLS transport.

A unicast session description specifies how to send media streams to a
party in the session. Thus, for a session with two participants, two
session descriptions are needed to completely specify the session (one
for each party). In [22], we show pairs of session descriptions that
describe two-party sessions. For simplicity, the examples below show
unpaired session descriptions.

A session description begins with lines to describe the session
characteristics that are common to all streams (session name, start and
end time, etc). These common lines do not relate to MWPP, and so we do
not discuss them here; instead, we refer the reader to [9]. All session
description examples in this memo use the same set of common lines,
shown below:

v=0
o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
s=Example
t=0 0

One or more media stream descriptions follow the common lines of a
session description. The minimal SDP stream description consists of
three lines: a media (m=) line, a connection data line (c=), and an
rtpmap attribute line (a=rtpmap). The media line binds the UDP port that
receives the RTP stream to the RTP payload type. The media line has the
syntax:

m=audio <port number> RTP/AVP <payload type>

The connection line specifies the IP network address that receives the
RTP stream, and has the syntax:

c=IN IP4 <IP number>

The rtpmap line maps the payload type to the MIME type for the stream,
and has the syntax:

a=rtpmap: <payload type> <mime-type>/<srate>[/<audio-channels>]

The <mime-type> for native MWPP streams is mwpp. The rtpmap line also
sets the sample rate and the number of audio channels.  For many MWPP
applications, the <audio-channels> field is irrelevant or redundant; we
include it here for compatibility reasons. Note that the square brackets
around <audio-channels> indicate it is an optional field; the default
value for <audio-channels> is 1 (mono).

We now show an example session description that includes one minimal
MWPP stream description:

v=0
o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
s=Example
t=0 0
m=audio 5004 RTP/AVP 96
c=IN IP4 192.0.2.94
a=rtpmap: 96 mwpp/44100

In this example, each MWPP packet in the stream has an RTP header PT
field value of 96, and the sample rate for the RTP header timestamp
field is 44100 Hz (Section 2.1 describes the RTP header fields).

The receiver accepts a unicast RTP stream at IP4 address 192.0.2.94 on
UDP port 5004. If the Real Time Control Protocol (RTCP, [2]) is in use,
the receiver accepts a second UDP stream on port 5005. The low-bandwidth
RTCP stream may serve two roles. RTCP carries session management
information about the RTP stream sent on port 5004. In addition, RTCP
carries reception quality information about the paired RTP stream that
the receiver may be transmitting to the sender.
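
As a non-normative illustration, the three lines of a minimal stream
description might be read as in the Python sketch below. The function
name parse_mwpp_stream and the dictionary keys are hypothetical. The
sketch handles a single stream description, does no validation, and
derives the RTCP port as the RTP port plus one, as in the example above.

```python
def parse_mwpp_stream(sdp_text: str) -> dict:
    """Extract the m=, c=, and a=rtpmap fields of one stream."""
    info = {}
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("m="):
            media, port, proto, payload_type = line[2:].split()[:4]
            info["media"] = media
            info["port"] = int(port)
            info["rtcp_port"] = int(port) + 1   # e.g. 5004 and 5005
            info["proto"] = proto
            info["payload_type"] = int(payload_type)
        elif line.startswith("c="):
            info["address"] = line[2:].split()[2]
        elif line.startswith("a=rtpmap:"):
            payload_type, encoding = line[len("a=rtpmap:"):].split()
            fields = encoding.split("/")
            info["rtpmap_payload_type"] = int(payload_type)
            info["mime_type"] = fields[0]
            info["srate"] = int(fields[1])
            # <audio-channels> is optional and defaults to 1 (mono).
            info["audio_channels"] = int(fields[2]) if len(fields) > 2 else 1
    return info
```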

We describe this stream description as minimal, because it does not
customize the stream. Without such customization, a native MWPP stream
has these default characteristics:

  1. If the stream uses unreliable transport (unicast UDP, multicast
     UDP, ...) the recovery journal system is in use, and the RTP
     payload contains both the MIDI command section and the journal
     section. If the stream uses reliable transport (TCP, TLS, ...),
     the stream does not use journalling, and the payload contains
     only the MIDI command section. See Section 2.2 for details.

  2. If the stream uses the recovery journal system, the recovery
     journal system uses the default sending policy and the default
     journal semantics, as defined in Section 4 of this memo.

  3. In the MIDI command section of the payload, the command
     timestamps are interpreted as the command execution time, using
     the default semantics described in Section 3.

  4. An RTP packet does not have a defined maximum media time, and
     so the timestamp difference between adjacent packets in the
     stream may be arbitrarily large. See Section 2.1 for details.

  5. If more than one minimal MWPP stream appears in a session,
     the MIDI namespaces for these streams are independent: channel
     1 in the first stream does not reference the same MIDI channel
     as channel 1 in the second stream. In addition, the RTP timestamp
     fields for the streams do not necessarily share the same
     random offset value (see Section 2.1), and thus synchronization
     of the streams must use the generic RTP tools defined in [2].

  6. A MIDI rendering method for the stream is not specified.


6.2 Session Description for mpeg4-generic MWPP Streams

In this section, we show the session description syntax for sessions
that use mpeg4-generic MWPP streams (i.e. streams that layer MWPP
packets onto the mpeg4-generic RTP payload [4]). These streams support
MIDI rendering using the MPEG 4 Audio synthetic codecs:

  o General MIDI (Object Profile ID 14). This profile renders
    the MIDI stream using the General MIDI standard [1].

  o Wavetable Synthesis (Object Profile ID 13). This profile renders
    the MIDI stream using the DLS2 standard [18]. The session
    description includes the RIFF file to initialize the wavetable
    synthesis engine.

  o Main Synthetic (Object Profile ID 12). This profile renders
    the MIDI stream using Structured Audio [5], an algorithmic
    synthesis system based on the programming language SAOL. The
    session description includes the SAOL program and associated
    data.

Minimal mpeg4-generic MWPP stream descriptions use the same media line,
connection line, and rtpmap line format as native MWPP stream
descriptions (Section 6.1). The only syntactic difference occurs in the
<mime-type> field (mpeg4-generic replaces mwpp).

However, a minimal mpeg4-generic MWPP stream description also sets the
value of several mpeg4-generic SDP parameters, using fmtp lines.  Two of
these parameters (mode and streamtype) must be set to specific constant
values to create a legal mpeg4-generic MWPP stream. We show the proper
initialization for these parameters in the fmtp line below:

a=fmtp:<payload number> streamtype=5; mode=mwpp;

A third required parameter, profile-level-id, takes on the value 74 for
Main Synthetic (Object Profile ID 12), 75 for Wavetable Synthesis
(Object Profile ID 13), and 76 for General MIDI (Object Profile ID 14).

A fourth required parameter, config, is set to a double-quoted
hexadecimal string representation of the AudioSpecificConfig() binary
data block. Note that the format for AudioSpecificConfig() is shown in
[16]. For the Main Synthetic or Wavetable Synthesis profiles,
AudioSpecificConfig() codes the system initialization data (DLS2
samples, SAOL programs, etc.). The config parameter may also be set to
the empty string, which acts as an escape code (see Appendix C.5.1).

We now show an example session description that uses a minimal
mpeg4-generic MWPP stream to drive General MIDI (Object Profile ID 14):

v=0
o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
s=Example
t=0 0
m=audio 5004 RTP/AVP 96
c=IN IP4 192.0.2.94
a=rtpmap:96 mpeg4-generic/44100
a=fmtp:96 streamtype=5; mode=mwpp; config="e4"; profile-level-id=76;

Each packet in the stream has an RTP header PT field value of 96, and
the sample rate for the RTP header timestamp field is 44100 Hz. See the
native MWPP stream example in Section 6.1 for a discussion of network
transport issues.

The profile-level-id value of 76 tells the receiver to render the MIDI
stream using the General MIDI object type. The config value is a
hexadecimal string encoding of the short AudioSpecificConfig() used by
General MIDI.
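
A receiver might check the required parameters along the following
lines. This Python sketch is illustrative only: the helper names
parse_fmtp and check_minimal_mpeg4_generic are hypothetical, and the
parser handles only the simple semicolon-separated form shown in the
example above.

```python
# profile-level-id values listed in this section, mapped to object types.
PROFILE_LEVEL_IDS = {
    74: "Main Synthetic",
    75: "Wavetable Synthesis",
    76: "General MIDI",
}


def parse_fmtp(line):
    """Split an fmtp line into (payload number, parameter dictionary)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp line")
    payload, _, rest = line[len(prefix):].strip().partition(" ")
    params = {}
    for item in rest.split(";"):
        if item.strip():
            key, _, value = item.strip().partition("=")
            params[key] = value.strip('"')  # drop the double quotes
    return int(payload), params


def check_minimal_mpeg4_generic(params):
    """Return the object type name; raise if a required value is wrong."""
    if params.get("streamtype") != "5" or params.get("mode") != "mwpp":
        raise ValueError("streamtype must be 5 and mode must be mwpp")
    if "config" not in params:
        raise ValueError("config is a required parameter")
    return PROFILE_LEVEL_IDS[int(params["profile-level-id"])]


payload, params = parse_fmtp(
    'a=fmtp:96 streamtype=5; mode=mwpp; config="e4"; profile-level-id=76;')
object_type = check_minimal_mpeg4_generic(params)
```

For the example session description, object_type is "General MIDI" and
params["config"] holds the hexadecimal string e4.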

We describe this stream description as minimal, because it defines the
SDP parameters that are required for mpeg4-generic operation, but does
not customize the stream via additional SDP parameters.

In Section 6.1, we describe the behavior of a minimal native MWPP
stream, as a numbered list of characteristics.  Characteristics 1-4 on
that list also describe the minimal mpeg4-generic MWPP stream, but
characteristics 5 and 6 require restatements, as listed below:

  5. If more than one minimal mpeg4-generic MWPP stream appears in
     a session, each stream denotes an independent instance of the
     synthesizer of the object type coded in the profile-level-id
     parameter. In addition, the RTP timestamp fields for the streams
     do not necessarily share the same random offset value (see
     Section 2.1), and thus synchronization of the streams must use
     the generic RTP tools defined in [2].

  6. The minimal mpeg4-generic MWPP stream encodes the
     AudioSpecificConfig() as an inline double-quoted hexadecimal
     string. This encoding limits the size of the
     AudioSpecificConfig() in some situations. Specifically, if
     the session management
     tool distributes a session description in a single datagram
     (such as SIP [10] over UDP transport), the size of the
     AudioSpecificConfig() string is limited by the Maximum
     Transmission Unit (MTU) of the underlying network (for
     Ethernet, the MTU is 1500 octets).


6.3 MWPP SDP Parameters

This section introduces optional MWPP session description parameters, to
add features to the minimal streams described in Sections 6.1 and 6.2.
In this section, we briefly discuss the purpose of each parameter, and
reference the Appendix C sub-section that contains the complete
parameter description.

Session descriptions use fmtp lines to set parameter values in a stream
description [9]. The syntax for an fmtp line is:

a=fmtp:<payload number> <param1>=<value1>; <param2>=<value2>; ...

The MWPP optional parameters provide several distinct sets of services:

  o  Journal customization. The j_sec and j_update parameters
     configure the use of the journal section in the MWPP payload.
     The ch_default, ch_unused, ch_never, and ch_anchor parameters
     configure the semantics of the chapter types that appear in
     the recovery journal. These parameters are described in Appendix
     C.1, and override the default stream behaviors 1 and 2 listed
     in Section 6.1 and referenced in Section 6.2.

  o  MIDI command timestamp semantics. The tsmode, octpos,
     mperiod, and linerate parameters customize the semantics
     of the timestamps that label commands in the MIDI command
     section. These parameters let MWPP accurately encode the
     implicit time coding of the MIDI wire protocol. These
     parameters are described in Appendix C.2, and override
     default stream behavior 3 listed in Section 6.1 and
     referenced in Section 6.2.

  o  Media time limits. The standard SDP parameter maxptime
     sets the maximum media time of an MWPP RTP packet, and
     as a consequence imposes a minimum sending rate for MWPP.
     This feature benefits algorithms performing clock-skew
     compensation, network latency estimation, and packet loss
     recovery. This parameter is described in Appendix C.3, and
     overrides default stream behavior 4 listed in Section 6.1
     and referenced in Section 6.2.

  o  Multiple streams. The midiport SDP parameter supports mapping
     multiple MWPP streams to the same MIDI namespace (for
     native MWPP streams) or to the same instance of an MPEG 4
     object type (for the mpeg4-generic MWPP streams). The zerosync
     SDP parameter provides an alternative way to synchronize
     multiple MWPP streams. These parameters are described in
     Appendix C.4, and override default stream behavior 5 in
     Sections 6.1 and 6.2.

  o  MIDI rendering. An extensible set of SDP parameters supports
     the specification of the MWPP rendering method, for both
     native MWPP streams and mpeg4-generic MWPP streams. These
     parameters are described in Appendix C.5 and override default
     stream behavior 6 in Sections 6.1 and 6.2.



7. Security Considerations

Cryptographic authentication of incoming RTP and RTCP packets is highly
recommended when using MWPP. Without such protections, attackers could
forge MIDI commands into an ongoing stream, potentially damaging
speakers and eardrums. An attacker could also craft RTP and RTCP packets
to exploit known bugs in the client, and take effective control of a
client machine.

The session management tool should also use cryptographic authentication
on all session descriptions, as spoofed AudioSpecificConfig() data
blocks are another point of entry for attackers.

The zerosync SDP parameter (described in Appendix C.4.2) impairs a
security feature of RTP. In standard RTP, the RTP timestamp is
initialized to a randomly chosen value, to reduce the predictability of
RTP header values. If the zerosync SDP parameter is used with a non-zero
value in a stream description, and a plain-text session description is
snooped, an attacker knows the randomly chosen RTP timestamp offset for
the stream.

If the zerosync SDP parameter is used with a zero value for several
stream descriptions in a session, all of these streams use the same
randomly chosen RTP offset, and so an attacker may find this offset
value easier to determine.

The sasc rendering value for the SDP render parameter (defined in
Appendix C.5.1) supports the inclusion of AudioSpecificConfig() data by
reference, using the url parameter. If this url is spoofed, an attacker
could change the session configuration in an arbitrary way, and thus
mount an attack on the MPEG 4 client.


8. Congestion Control

MWPP has congestion control issues that are unique for an RTP audio
packetization. In certain applications such as network musical
performance [6], the packet rate is linked to the gestural rate of a
human performer.

MWPP implementations SHOULD sense the MIDI wire protocol stream for
command patterns that result in excessive packet rates, and filter these
streams as part of MWPP to reduce the packet rate. [22] offers
implementation guidance on this issue.


9. Acknowledgements

We thank the networking, media compression, and computer music community
members who have commented or contributed to the MWPP standardization
effort, including Steve Casner, Robin Davies, Joanne Dow, Dominique
Fober, Adrian Freed, Philippe Gentric, Chris Grigg, Michel Jullian, Phil
Kerr, Young-Kwon Lim, Jan van der Meer, Colin Perkins, Charlie Richmond,
Herbie Robinson, Larry Rowe, Dave Singer, Martijn Sipkema, David Wessel,
Matt Wright, Jim Wright, and Giorgio Zoia.


Appendix A. The Recovery Journal Channel Chapters


Appendix A.1. Recovery Journal Definitions

In this Appendix, we define the terminology and the coding idioms that
are used in the recovery journal bitfield descriptions in Section 5
(journal header structure), Appendices A.2-8 (channel journal chapters)
and Appendices B.1-5 (system journal chapters).

These descriptions assume that the recovery journal resides in the
journal section of an RTP packet with sequence number I ("packet I") and
that the Checkpoint Packet Seqnum field in the top-level recovery
journal header refers to a packet with sequence number C. Sequence
number algorithms defined for the recovery journal system use modulo
2^16 arithmetic.
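
As a small illustration of the modulo 2^16 arithmetic, the Python
sketch below computes how far packet I lies beyond the checkpoint
packet C, allowing for sequence number wraparound. The function name
checkpoint_distance is hypothetical and is used only for this example.

```python
SEQ_MOD = 1 << 16  # sequence number arithmetic is modulo 2^16


def checkpoint_distance(checkpoint_seq, packet_seq):
    """Number of packets from C up to, but not including, packet I."""
    return (packet_seq - checkpoint_seq) % SEQ_MOD


# The checkpoint is 65534 and the current packet is 2: the sequence
# number has wrapped, and packets 65534, 65535, 0, and 1 lie between.
distance = checkpoint_distance(65534, 2)
```

A distance of zero corresponds to the case C == I.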

Several bitfield coding idioms appear throughout the recovery journal
system, with consistent semantics. Most recovery journal elements begin
with an "S" (Single-packet loss) bit. S bits are designed to help
receivers efficiently parse through the recovery journal hierarchy in
the common case of the loss of a single packet.

By default, an S bit MUST be set to 1. If a recovery journal element in
packet I encodes data about a MIDI command stored in the MIDI command
section of packet I - 1, the S bit MUST be set to 0. If a recovery
journal element has its S bit set to 0, all higher-level recovery
journal elements that contain it MUST also have S bits that are set to
0, including the top-level recovery journal header (Figure 7 in Section
5).
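
The S bit rules above can be sketched as a recursive walk over the
journal hierarchy. The Python fragment below is a simplified model,
not an MWPP data structure: the JournalElement class and its
codes_previous_packet flag are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JournalElement:
    name: str
    # True if this element codes data about a MIDI command stored in
    # the MIDI command section of packet I - 1.
    codes_previous_packet: bool = False
    children: List["JournalElement"] = field(default_factory=list)
    s_bit: int = 1


def assign_s_bits(element):
    """Set s_bit on element and its descendants; return element's S bit."""
    child_bits = [assign_s_bits(child) for child in element.children]
    # An element's S bit is 0 if it codes data from packet I - 1, or
    # if any element it contains has an S bit of 0.
    cleared = element.codes_previous_packet or 0 in child_bits
    element.s_bit = 0 if cleared else 1
    return element.s_bit


chapter_w = JournalElement("Chapter W", codes_previous_packet=True)
chapter_p = JournalElement("Chapter P")
channel = JournalElement("channel journal", children=[chapter_p, chapter_w])
header = JournalElement("journal header", children=[channel])
assign_s_bits(header)
```

In this example, Chapter W, the channel journal, and the top-level
header end up with S bits of 0, while Chapter P keeps an S bit of 1.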

Other coding idioms that appear with consistent semantics throughout the
recovery journal system are described below.

  o R flag bit. R flag bits are reserved for future use by MWPP.
    Senders MUST set R bits to 0; receivers MUST ignore R bit values.

  o LENGTH field. A field named LENGTH (as distinct from LEN)
    codes the number of octets in the structure that contains it,
    including the header it resides in and all hierarchical levels
    below it. If a structure contains a LENGTH field, a receiver
    MUST use the LENGTH field value to advance past the structure
    during parsing, rather than use knowledge about the internal
    format of the structure. This restriction supports forward
    compatibility with revised versions of the journal format.

We now define normative terms used to describe recovery journal
semantics.

  o Checkpoint history. The checkpoint history of a recovery journal
    is the concatenation of the MIDI command sections of packets C
    through I - 1. The last MIDI command in the MIDI command section
    for packet I - 1 is considered the most recent command; the first
    MIDI command in the MIDI command section for packet C is
    the oldest command. A checkpoint history with no MIDI commands
    is considered to be empty. The checkpoint history never contains
    the MIDI command section of packet I (the packet containing
    the recovery journal), so if C == I, the checkpoint history is
    empty by definition.

  o Session history. The session history of a recovery journal is
    the concatenation of MIDI command sections from the first
    packet of the session up to packet I - 1. The definitions of
    MIDI command recency and history emptiness are the same as in
    the checkpoint history. The session history never contains the
    MIDI command section of packet I, and so the session history of
    the first packet in the session is empty by definition.

  o Finished/unfinished commands. If all octets of a MIDI command
    appear in the session history, the command is defined to be
    finished. If some but not all octets of a MIDI command appear
    in the session history, the command is defined to be unfinished.
    Unfinished commands occur if segments of a SysEx command appear
    in several RTP packets. For example, if a SysEx command is coded
    as 3 segments, with segment 1 in packet K, segment 2 in packet
    K + 1, and segment 3 in packet K + 2, the session histories for
    packets K + 1 and K + 2 contain unfinished versions of the command.

  o Active commands (default). For most types of MIDI commands,
    an active MIDI command is defined to be a MIDI command that does
    not appear before one of the following MIDI commands in the session
    history:  System Reset (0xFF), General MIDI System Enable
    (0xF0 0x7E 0xcc 0x09 0x01 0xF7), General MIDI System Disable
    (0xF0 0x7E 0xcc 0x09 0x00 0xF7). A few types of MIDI commands
    use a modified meaning of active (see below).

  o Active commands (NoteOn, NoteOff, Poly Aftertouch). For MIDI NoteOn,
    NoteOff, and Poly Aftertouch commands, an active MIDI command is
    defined to be a MIDI command that does not appear before one of the
    following MIDI commands in the session history: System Reset (0xFF),
    General MIDI System Enable (0xF0 0x7E 0xcc 0x09 0x01 0xF7), General
    MIDI System Disable (0xF0 0x7E 0xcc 0x09 0x00 0xF7), MIDI Control
    Change number 123 (All Notes Off) or 120 (All Sound Off).

  o Active commands (MIDI Control Change). For MIDI Control Change
    commands, an active MIDI command is defined to be a MIDI command
    that does not appear before one of the following MIDI commands in
    the session history: System Reset (0xFF), General MIDI System Enable
    (0xF0 0x7E 0xcc 0x09 0x01 0xF7), General MIDI System Disable
    (0xF0 0x7E 0xcc 0x09 0x00 0xF7), MIDI Control Change number 121
    (Reset All Controllers).
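
The default definition of an active command can be sketched in Python
as follows. This is an illustration under stated assumptions: commands
are modeled as tuples of octets, the function names are hypothetical,
and the 0xcc octet in the General MIDI messages is assumed to be a
placeholder that may take any value.

```python
SYSTEM_RESET = (0xFF,)


def is_gm_system_message(command):
    """True for General MIDI System Enable or Disable (any 0xcc octet)."""
    return (len(command) == 6
            and command[:2] == (0xF0, 0x7E)
            and command[3] == 0x09
            and command[4] in (0x00, 0x01)
            and command[5] == 0xF7)


def is_default_reset(command):
    return command == SYSTEM_RESET or is_gm_system_message(command)


def active_indices(history, is_reset=is_default_reset):
    """Indices of commands that have no reset command later in history."""
    first_active = 0
    for index, command in enumerate(history):
        if is_reset(command):
            # Commands before this reset are inactive; the reset itself
            # does not appear before a later reset (so far).
            first_active = index
    return list(range(first_active, len(history)))


history = [(0xC0, 5), (0xFF,), (0xE0, 0, 64), (0xC0, 7)]
active = active_indices(history)
```

Here the Program Change at index 0 precedes a System Reset and is
therefore inactive. The modified definitions for NoteOn, NoteOff, Poly
Aftertouch, and Control Change commands could be modeled by passing a
different is_reset predicate that also matches the additional Control
Change numbers.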

The chapter definitions in Appendices A.2-8 and B.1-5 reflect the
default recovery journal behavior of MWPP. The ch_default, ch_unused,
ch_never, and ch_anchor SDP parameters modulate these definitions, as
described in Appendix C.1.2.

The chapter definitions specify if data MUST be present in the journal.
Senders MAY also include non-required data in the journal.  This
optional data MUST comply with the normative chapter definition. For
example, if a chapter definition states that a field codes data from the
most recent active command in the session history, the sender may not
code inactive commands or older commands in this field.

Finally, we note that channel journals only encode information about
MIDI commands appearing on the MIDI channel the journal protects. All
references to MIDI commands in Appendices A.2-8 should be read as "MIDI
commands appearing on this channel."


Appendix A.2. Chapter P: MIDI Program Change

A channel journal MUST contain Chapter P if an active Program Change
(0xC) command appears in the checkpoint history.  Figure A.2.1 shows the
format for Chapter P.


         0                   1                   2
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |S|   PROGRAM   |C| BANK-COARSE |F| BANK-FINE   |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure A.2.1 -- Chapter P Format


The chapter has a fixed size of 24 bits.  The PROGRAM field indicates
the program value of the most recent active Program Change command in
the session history.

By default, bits 8-23 of Chapter P are set to 0.  However, if an active
Control Change (0xB) command for controller 0 (Bank Select Coarse)
appears before this Program Change command in the session history, the C
bit is set to 1, and the BANK-COARSE field is set to the 7-bit data
value for the most recent Control Change command for controller 0. The F
bit and BANK-FINE field code the Control Change command for controller
32 (Bank Select Fine) in an identical manner.
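
The Chapter P layout can be illustrated with a short Python sketch.
The function names and the use of None to mean "no bank value coded"
are conventions chosen for this example, not part of MWPP.

```python
def encode_chapter_p(s_bit, program, bank_coarse=None, bank_fine=None):
    """Pack Chapter P into 3 octets; None means no bank value is coded."""
    def octet(flag, value):
        return (flag << 7) | (value & 0x7F)

    return bytes([
        octet(s_bit, program),
        octet(0, 0) if bank_coarse is None else octet(1, bank_coarse),
        octet(0, 0) if bank_fine is None else octet(1, bank_fine),
    ])


def decode_chapter_p(data):
    """Return (S, PROGRAM, BANK-COARSE or None, BANK-FINE or None)."""
    first, coarse, fine = data[0], data[1], data[2]
    return (
        first >> 7,
        first & 0x7F,
        (coarse & 0x7F) if coarse & 0x80 else None,  # C bit set?
        (fine & 0x7F) if fine & 0x80 else None,      # F bit set?
    )


chapter = encode_chapter_p(1, 5, bank_coarse=2)
fields = decode_chapter_p(chapter)
```

In this example, the chapter codes program 5 with a Bank Select Coarse
value of 2 and no Bank Select Fine value, so bits 16-23 remain 0.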


Appendix A.3. Chapter W: MIDI Pitch Wheel

A channel journal MUST contain Chapter W if an active MIDI Pitch Wheel
(0xE) command appears in the checkpoint history.  Figure A.3.1 shows the
format for Chapter W.


                 0                   1
                 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |S|     FIRST   |R|    SECOND   |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure A.3.1 -- Chapter W Format


The chapter has a fixed size of 16 bits.  The FIRST and SECOND fields
are the 7-bit values of the first and second data octets of the most
recent active Pitch Wheel command in the session history.
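
A minimal Python sketch of the Chapter W coding follows. The function
names are hypothetical. The 14-bit wheel value is computed on the
assumption, taken from the MIDI specification rather than from this
memo, that the first data octet of a Pitch Wheel command holds the
least significant 7 bits.

```python
def encode_chapter_w(s_bit, first, second):
    """Pack Chapter W into 2 octets; the reserved R bit is sent as 0."""
    return bytes([(s_bit << 7) | (first & 0x7F), second & 0x7F])


def decode_chapter_w(data):
    """Return (S, FIRST, SECOND, 14-bit wheel value)."""
    s_bit = data[0] >> 7
    first = data[0] & 0x7F
    second = data[1] & 0x7F  # the mask discards the R bit, which is ignored
    return s_bit, first, second, (second << 7) | first


chapter = encode_chapter_w(1, 0, 64)
fields = decode_chapter_w(chapter)
```

The example codes a centered pitch wheel (FIRST = 0, SECOND = 64).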


Appendix A.4. Chapter N: MIDI NoteOff and NoteOn

In this Appendix, we consider NoteOn commands with zero velocity to be
NoteOff commands.

A channel journal MUST contain Chapter N if an active MIDI NoteOn (0x9)
or NoteOff (0x8) command appears in the checkpoint history. Figure A.4.1
shows the format for Chapter N.


   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |B|     LEN     |  LOW  | HIGH  |S|   NOTENUM   |Y|  VELOCITY   |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |S|   NOTENUM   |Y|  VELOCITY   | ....                          |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |   BITFIELD    |   BITFIELD    |     ....      |   BITFIELD    |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure A.4.1 -- Chapter N Format


Chapter N codes the most recent active NoteOn or NoteOff reference to a
MIDI note number in the session history.  Chapter N consists of a
2-octet header, followed by at least one of the following data
structures:

   o A list of note logs to code NoteOn commands.
   o A NoteOff bitfield structure to code NoteOff commands.

The note log list MUST contain an entry for all note numbers whose most
recent checkpoint history appearance is in an active NoteOn command. The
NoteOff bitfield structure MUST contain a set bit for all note numbers
whose most recent checkpoint history appearance is in an active NoteOff
command. A note number is never coded in both structures.

The header for Chapter N, reproduced in Figure A.4.2, codes the size of
the note list and bitfield structures.


                 0                   1
                 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |B|     LEN     |  LOW  | HIGH  |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure A.4.2 -- Chapter N Header


The LEN field, a 7-bit integer value, codes the number of 2-octet note
logs in the note list. Zero is a valid value for LEN, and codes an empty
note list. The LEN value of 127 has a special meaning, described later
in this section.

The 4-bit LOW and HIGH fields code the number of NoteOff bitfield octets
that follow the note log list. LOW and HIGH are unsigned integer values.
If LOW <= HIGH, there are (HIGH - LOW + 1) NoteOff bitfield octets in
the chapter.

The value pairs (LOW = 15, HIGH = 0) and (LOW = 15, HIGH = 1) code an
empty NoteOff bitfield structure (no NoteOff bitfield octets).  Other
(LOW > HIGH) value pairs MUST NOT appear in the Chapter N header.

The LEN value of 127 serves double duty, coding a note list length of
127 or 128 note logs, depending on the values of LOW and HIGH. This
coding technique supports the unlikely, but legal, condition of 128
concurrent NoteOn commands, one for each note number.

If LEN = 127, LOW = 15, and HIGH = 0, the note list holds 128 note logs,
and the NoteOff bitfield structure is empty. For all other LOW and HIGH
values, LEN = 127 codes a note list of 127 note logs. In this case, the
chapter has (HIGH - LOW + 1) NoteOff bitfield octets (if LOW <= HIGH) or
has an empty NoteOff bitfield structure (if LOW = 15 and HIGH = 1).

By default, the B bit MUST be set to 1. However, if the MIDI command
section of packet I - 1 includes a NoteOff command for the channel, the
B bit MUST be set to 0. If the B bit is set to 0, the higher-level
recovery journal elements that contain Chapter N MUST also have S bits
that are set to 0, including the top-level recovery journal header.
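
The header rules above can be summarized in a short Python sketch.
The function name decode_chapter_n_header is hypothetical; the sketch
follows the LEN, LOW, and HIGH rules as stated in this Appendix.

```python
def decode_chapter_n_header(data):
    """Return (B, note log count, NoteOff bitfield octet count)."""
    b_bit = data[0] >> 7
    length = data[0] & 0x7F
    low = data[1] >> 4
    high = data[1] & 0x0F
    if low <= high:
        bitfield_octets = high - low + 1
    elif low == 15 and high in (0, 1):
        bitfield_octets = 0  # empty NoteOff bitfield structure
    else:
        raise ValueError("illegal LOW/HIGH pair in Chapter N header")
    # LEN = 127 with LOW = 15 and HIGH = 0 codes a list of 128 logs.
    note_logs = 128 if (length, low, high) == (127, 15, 0) else length
    return b_bit, note_logs, bitfield_octets


b_bit, note_logs, bitfield_octets = decode_chapter_n_header(b"\x82\x45")
chapter_size = 2 + 2 * note_logs + bitfield_octets
```

Because the header is 2 octets and each note log is 2 octets, the
total chapter size in octets follows directly from the decoded counts,
as the last line shows.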

We now describe the 2-octet note log structure, reproduced in Figure
A.4.3.


                 0                   1
                 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |S|   NOTENUM   |Y|  VELOCITY   |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure A.4.3 -- Chapter N Note Log


The 7-bit NOTENUM field codes the note number for the log; a note number
may not be represented by multiple note logs in the note list.  The
7-bit VELOCITY field codes the velocity value for the most recent active
NoteOn command for the note number in the session history.  VELOCITY is
never zero; NoteOn commands with zero velocity are coded as NoteOff
commands in the NoteOff bitfield structure.

The note log does not code the execution time of the NoteOn command.
However, the Y bit codes a hint from the sender about the NoteOn
execution time. This hint takes the form of a recommendation to play (Y
= 1) or skip (Y = 0) a recovered NoteOn command from this log.  More
specifically, Y is set to 1 if the NoteOn command coded by the note log
is considered to be simultaneous with the RTP timestamp of the packet
that contains the note log. The metric used to judge simultaneity is
implementation dependent.

We now describe the NoteOff bitfield structure.  A NoteOff bitfield
octet codes NoteOff information for eight consecutive MIDI note numbers,
with the MSB representing the lowest note number. The MSB of the first
bitfield octet codes the note number 8*LOW; the MSB of the last bitfield
octet codes the note number 8*HIGH.

A set bit codes a NoteOff command for the note number; Chapter N does
not code NoteOff velocity data.  In the most efficient coding for the
NoteOff bitfield structure, the first and last octets of the structure
contain at least one set bit.
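
The bitfield mapping can be sketched in Python as shown below. The
function names are hypothetical, and the encoder assumes a non-empty
set of note numbers; it produces the most efficient coding described
above, in which the first and last octets each contain a set bit.

```python
def encode_noteoff_bitfield(notes):
    """Return (LOW, HIGH, octets) for a non-empty set of note numbers."""
    low, high = min(notes) // 8, max(notes) // 8
    octets = bytearray(high - low + 1)
    for note in notes:
        # The MSB of each octet represents its lowest note number.
        octets[note // 8 - low] |= 0x80 >> (note % 8)
    return low, high, bytes(octets)


def decode_noteoff_bitfield(low, octets):
    """Return the note numbers whose bits are set in the bitfield."""
    notes = []
    for offset, octet in enumerate(octets):
        base = 8 * (low + offset)  # note number coded by this octet's MSB
        for bit in range(8):
            if octet & (0x80 >> bit):
                notes.append(base + bit)
    return notes


low, high, octets = encode_noteoff_bitfield({60, 64})
notes = decode_noteoff_bitfield(low, octets)
```

In this example, NoteOff commands for note numbers 60 and 64 are coded
with LOW = 7 and HIGH = 8, using two bitfield octets.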


Appendix A.5. Chapter A: MIDI Poly Aftertouch

A channel journal MUST contain Chapter A if an active Poly Aftertouch
(0xA) command appears in the checkpoint history.  Figure A.5.1 shows the
format for Chapter A.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|    LEN      |S|   NOTENUM   |R|  PRESSURE   |S|   NOTENUM   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|  PRESSURE   |  ....                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure A.5.1 -- Chapter A Format


The chapter consists of a 1-octet header, followed by a variable length
list of 2-octet note logs. A note log MUST appear for a note number if
an active Poly Aftertouch command for the note number appears in the
checkpoint history.  A note number may not be represented by multiple
note logs in the note list.

The 7-bit LEN field codes the number of note logs in the list, minus
one. Figure A.5.2 reproduces the note log structure of Chapter A.


                 0                   1
                 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |S|   NOTENUM   |R|  PRESSURE   |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure A.5.2 -- Chapter A Note Log


The 7-bit PRESSURE field codes the pressure value of the most recent
active Poly Aftertouch command in the session history. The MIDI note
number for this command is coded in the 7-bit NOTENUM field.
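
As an illustration, the following Python sketch walks the Chapter A
note list. The function name decode_chapter_a is hypothetical; note
that the list length is the LEN value plus one.

```python
def decode_chapter_a(data):
    """Return (S, [(S, NOTENUM, PRESSURE), ...]) for a Chapter A buffer."""
    s_bit = data[0] >> 7
    count = (data[0] & 0x7F) + 1  # LEN codes the list length, minus one
    logs = []
    for index in range(count):
        first, second = data[1 + 2 * index], data[2 + 2 * index]
        # The top bit of the second octet is the reserved R bit, which
        # receivers ignore.
        logs.append((first >> 7, first & 0x7F, second & 0x7F))
    return s_bit, logs


s_bit, logs = decode_chapter_a(bytes([0x81, 0xBC, 0x40, 0x3E, 0x7F]))
```

The example buffer has LEN = 1 and therefore holds two note logs, for
note numbers 60 and 62.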


Appendix A.6. Chapter T: MIDI Channel Aftertouch

A channel journal MUST contain Chapter T if an active MIDI Channel
Aftertouch (0xD) command appears in the checkpoint history.  Figure
A.6.1 shows the format for Chapter T.


                        0
                        0 1 2 3 4 5 6 7
                       +-+-+-+-+-+-+-+-+
                       |S|   PRESSURE  |
                       +-+-+-+-+-+-+-+-+

                Figure A.6.1 -- Chapter T Format


The chapter has a fixed size of 8 bits. The 7-bit PRESSURE field holds
the pressure value of the most recent active Channel Aftertouch command
in the session history.


Appendix A.7. Chapter C: MIDI Control Change

A channel journal MUST contain Chapter C if an active Control Change
(0xB) command appears in the checkpoint history (excepting controller
numbers 0, 6, 32, 38, 96, 97, 98, 99, 100, and 101). In certain cases
(defined later in this Appendix) this rule also applies to the excepted
controller numbers. Figure A.7.1 shows the format for Chapter C.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|     LEN     |S|   NUMBER    |A|  VALUE/ALT  |S|   NUMBER    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |A| VALUE/ALT   |  ....                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure A.7.1 -- Chapter C Format


The chapter consists of a 1-octet header, followed by a variable length
list of 2-octet controller logs.  The list MUST contain an entry for a
controller number if an active Control Change command for the number
appears in the checkpoint history (excepting numbers 0, 6, 32, 38, 96,
97, 98, 99, 100, 101, 124, 125, 126, and 127). In certain cases (defined
later in this Appendix) this rule also applies to the excepted
controller numbers.

The 7-bit LEN field codes the number of controller logs in the list,
minus one.  A controller number may not appear in multiple controller
logs in the list. Figure A.7.2 reproduces the controller log structure
of Chapter C.


                 0                   1
                 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |S|    NUMBER   |A|  VALUE/ALT  |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure A.7.2 -- Chapter C Controller Log


The 7-bit NUMBER field identifies the controller number. The 7-bit
VALUE/ALT field codes recovery information for the most recent active
Control Change command for this number in the session history.

Chapter C provides three tools for coding recovery information for a
command in the VALUE/ALT field: the value tool, the toggle tool, and the
count tool. Implementations may choose among the tools to code a Control
Change command.

In the value tool, the 7-bit VALUE field codes the control value of the
most recent active Control Change command for this controller number in
the session history. This tool works best for controllers that code a
continuous quantity, such as number 1 (Modulation Wheel). If the value
tool is chosen, the A bit is set to 0.

The A bit is set to 1 to code the toggle or count tool. These tools work
best for controllers that code discrete actions.  Figure A.7.3 shows the
controller log for these tools.


                0                   1
                0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |S|    NUMBER   |1|T|    ALT    |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure A.7.3 -- Controller Log for ALT tools


The T flag is set to 1 to code the toggle tool; T is set to 0 to code
the count tool. Both methods use the 6-bit ALT field as an unsigned
integer.
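
The two log layouts in Figures A.7.2 and A.7.3 can be decoded with a few
mask operations. The Python sketch below is informative only; the
function name and the dictionary keys are illustrative choices that do
not appear in this memo, and the sketch assumes that bit 0 in the
figures is the most significant bit of each octet.

```python
def parse_controller_log(octets: bytes) -> dict:
    """Decode one 2-octet Chapter C controller log (hypothetical helper)."""
    if len(octets) != 2:
        raise ValueError("a controller log is exactly 2 octets")
    first, second = octets
    log = {"S": first >> 7, "number": first & 0x7F}
    if second & 0x80 == 0:
        # A = 0: value tool, 7-bit VALUE field
        log.update(tool="value", value=second & 0x7F)
    elif second & 0x40:
        # A = 1, T = 1: toggle tool, 6-bit ALT field
        log.update(tool="toggle", alt=second & 0x3F)
    else:
        # A = 1, T = 0: count tool, 6-bit ALT field
        log.update(tool="count", alt=second & 0x3F)
    return log
```

For example, the octets 0x01 0x40 decode as a value-tool log for
controller 1 with control value 64.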

The toggle tool works best for controllers that act as on/off switches,
such as 64 (Hold Pedal). These controllers code the "off" state with
control values 0-63 and the "on" state with 64-127. The ALT field codes
the total number of toggles (off->on and on->off) due to Control Change
commands in the session history, including toggle events caused by MIDI
Control Change number 121 (All Controllers Off).

Toggle counting is performed modulo 64. The toggle count is reset at the
start of a session, and whenever a System Reset (0xFF), General MIDI
System Enable (0xF0 0x7E 0xcc 0x09 0x01 0xF7), or General MIDI System
Disable (0xF0 0x7E 0xcc 0x09 0x00 0xF7) appears in the session history.
When these reset events occur, the toggle count for a controller is set
to 0 (for controllers whose default value is 0-63) or 1 (for controllers
whose default value is 64-127).
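
A sender might maintain the toggle count as sketched below. The sketch
is informative and rests on two assumptions that are ours, not the
memo's: a toggle is counted only when a Control Change value moves the
controller between the "off" range (0-63) and the "on" range (64-127),
and the effects of controller 121 and of mid-session resets are left
out for brevity. The function and parameter names are hypothetical.

```python
def toggle_count(control_values, default_value=0):
    """Return the 6-bit ALT toggle count after a run of control values."""
    on = default_value >= 64
    # Reset value: 0 for a default in 0-63, 1 for a default in 64-127.
    count = 1 if on else 0
    for value in control_values:
        now_on = value >= 64
        if now_on != on:                 # off->on or on->off
            count = (count + 1) % 64     # counting is performed modulo 64
            on = now_on
    return count
```

With a default value of 0, the control values 127, 0, 127 yield a count
of 3, and 64 alternating values wrap the count back to 0.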

The Hold Pedal controller illustrates the benefit of the toggle tool
over the value tool for switch controllers. As often used in piano
applications, the "on" state of the Hold Pedal lets notes resonate,
while the "off" state immediately damps notes to silence. The loss of
the "off" command in an "on->off->on" sequence results in ringing notes
that should have been damped silent.  The toggle tool lets receivers
detect this lost "off" command but the value tool does not.
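
The Hold Pedal scenario can be made concrete with a short informative
sketch. It assumes a receiver that keeps its own toggle count for the
commands it actually received and compares that count with the ALT
value in the journal; this comparison strategy and all names below are
illustrative assumptions, not requirements of this memo.

```python
def count_toggles(values):
    """Toggle count (modulo 64) for a switch controller that starts 'off'."""
    on, count = False, 0
    for value in values:
        if (value >= 64) != on:
            on, count = not on, (count + 1) % 64
    return count

sent = [127, 0, 127]       # on -> off -> on, as transmitted
received = [127, 127]      # the "off" command was lost

# Value tool: both sides agree on the final value, so the loss is hidden.
value_tool_detects_loss = sent[-1] != received[-1]

# Toggle tool: the journal count differs from the receiver's own count.
missed = (count_toggles(sent) - count_toggles(received)) % 64
toggle_tool_detects_loss = missed != 0
```

Here the value comparison reports no difference, while the toggle
counts differ by 2, which tells the receiver that an "off" and an "on"
transition were missed.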

The count tool is similar to the toggle tool, but is optimized for
controllers whose value octet is ignored, such as 123 (All Notes Off).
For the count tool, the ALT field codes the total number of Control
Change commands in the session history. Command counting is performed
modulo 64.

The command count is set to 0 at the start of the session, and is reset
to 0 whenever a System Reset (0xFF), General MIDI System Enable (0xF0
0x7E 0xcc 0x09 0x01 0xF7), or General MIDI System Disable (0xF0 0x7E
0xcc 0x09 0x00 0xF7) appears in the session history.

We now describe normative coding rules for the controller numbers that
are excepted from the general rules presented in the beginning of this
Appendix. For each excepted controller number, we define the conditions
under which a controller log MUST appear in Chapter C for that number.
By extension, these conditions imply that Chapter C MUST appear in the
recovery journal.

If active Control Change commands for controller numbers 0 (Bank Select
Coarse) or 32 (Bank Select Fine) appear in the checkpoint history, the
most recent commands for these numbers MUST appear as entries in the
controller list if the data values for these commands are not coded in
the BANK-COARSE (0) or BANK-FINE (32) fields of Chapter P (Appendix A.2)
of the channel journal. This rule avoids redundant coding in Chapters C
and P.

Several controller number pairs are defined to be mutually exclusive.
Controller numbers 124 (Omni Off) and 125 (Omni On) form a mutually
exclusive pair, as do controller numbers 126 (Mono) and 127 (Poly).

If active Control Change commands for one or both members of a mutually
exclusive pair appear in the session history, one controller log MAY
appear in the controller list to code the pair. If active Control Change
commands for one or both members of a mutually exclusive pair appear in
the checkpoint history, one controller log MUST appear in the controller
list to code the pair. In both cases, the controller log that appears in
the controller list MUST code the controller number of the most recent
Control Change command of the pair in the session history.

Appendix A.8 defines Chapter M, the MIDI Parameter chapter, to provide
resiliency for the MIDI registered/non-registered parameter system.
Here, we define the Chapter C rules for coding Control Change commands
related to the registered/non-registered parameter system. These Chapter
C rules serve to minimize redundancy with Chapter M.

Control Change commands for controller numbers 6 and 38 (Data Slider)
and 96 and 97 (Data Button) may be used as part of the parameter system,
or may be used as general-purpose controllers. Control Change commands
for controller numbers 6, 38, 96, or 97 that appear in the session
history, and that are used in the parameter system, MUST NOT appear as
entries in the controller list.

However, if active Control Change commands for controller numbers 6, 38,
96, or 97 appear in the checkpoint history, and these commands are used
as general-purpose controllers, the most recent general-purpose command
instance for these numbers MUST appear as entries in the controller
list.

Likewise, if active Control Change commands for controller numbers 6,
38, 96, or 97 appear in the session history, and these commands are used
as general-purpose controllers, the most recent general-purpose command
instance for these numbers MAY appear as entries in the controller list.

A parameter system transaction begins with paired Control Change
commands for numbers 98 and 99 (Non-Registered Parameter LSB and MSB) or
100 and 101 (Registered Parameter LSB and MSB). Chapter M codes these
paired Control Change commands. The Chapter C rule below acts to code
"unpaired" commands for these controller numbers that appear in the
checkpoint history if a (98, 99) or (100, 101) pair is split across the
MIDI command sections of two MWPP packets.

If the most recent active Control Change command for controller 98, 99,
100, or 101 in the session history is part of a (98, 99) or (100, 101)
command pair that begins a parameter system transaction, the command
MUST NOT appear in the controller list.

However, if the most recent active Control Change command for controller
98, 99, 100, or 101 in the checkpoint history does not form part of a
(98, 99) or (100, 101) command pair, an entry MUST appear in the
controller list. Likewise, if the most recent active Control Change
command for controller 98, 99, 100, or 101 in the session history does
not form part of a (98, 99) or (100, 101) command pair, an entry MAY
appear in the controller list.


Appendix A.8. Chapter M: MIDI Parameter System

A channel journal MUST contain Chapter M if an active Control Change
command that forms part of an initiated parameter system transaction (as
defined below) appears in the checkpoint history.

We begin by defining the terms "parameter system", "parameter system
transaction", and "initiated parameter system transaction" as used in
this Appendix.

  o  Parameter system. This phrase refers to a MIDI feature that
     provides two sets of 16,384 parameters to augment the
     Control Change controller number space. The Registered Parameter
     Names (RPN) system and the Non-Registered Parameter Names
     (NRPN) system each provide 16,384 parameters.

  o  Parameter system transaction. The values of RPNs and NRPNs are
     changed by a series of Control Change commands that form a
     transaction. A transaction begins with two Control Change
     commands to set the parameter number (controller numbers
     98 and 99 for NRPNs, controller numbers 100 and 101 for RPNs).
     The transaction continues with an arbitrary number of
     Data Entry (controller numbers 6 and 38) and Data Button
     (controller numbers 96 and 97) Control Change commands to
     set the parameter value. The transaction ends with a second
     pair of (98, 99) or (100, 101) Control Change commands. These
     terminal commands are considered a part of the transaction.
     In addition, the terminal commands may start a second
     parameter system transaction; in this case, these commands
     belong to two transactions.

  o  Initiated parameter system transaction. An initiated parameter
     system transaction is a transaction whose (98, 99) or (100, 101)
     initial active Control Change command pair appears in the session
     history. Under certain conditions, unpaired active Control Change
     commands for controller numbers 98, 99, 100, or 101 are coded in
     Chapter C, as described in Appendix A.7.

Figure A.8.1 shows the variable-length format of Chapter M.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|P|N|R|R|R|      LENGTH       |  Transaction log list ...     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure A.8.1 -- Top-level Chapter M Format


Chapter M consists of a 2-octet header, followed by a list of transaction
log entries. The 10-bit LENGTH field codes the length of Chapter M, and
conforms to semantics described in Appendix A.1.
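
As an informative illustration, the 2-octet header of Figure A.8.1 can
be unpacked as shown below. The function name and dictionary keys are
hypothetical, the reserved R bits are simply skipped, and the sketch
assumes that bit 0 in the figure is the most significant bit of the
first octet.

```python
def parse_chapter_m_header(octets: bytes) -> dict:
    """Unpack S, P, N, and the 10-bit LENGTH field (hypothetical helper)."""
    if len(octets) < 2:
        raise ValueError("the Chapter M header is 2 octets long")
    word = (octets[0] << 8) | octets[1]
    return {
        "S": (word >> 15) & 1,
        "P": (word >> 14) & 1,
        "N": (word >> 13) & 1,
        "length": word & 0x3FF,      # low 10 bits; R bits are ignored
    }
```

For example, the octets 0xE0 0x0A decode as S = P = N = 1 with a LENGTH
value of 10.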

If an active Control Change command that forms part of an initiated
parameter system transaction appears in the checkpoint history, a log
entry for the transaction MUST appear in the transaction list.

The relative order of transaction list entries MUST reflect the relative
position of parameter transactions in the session history: the first log
entry codes the most recent parameter transaction in the history, the
second log entry codes a transaction that appears before the first
parameter transaction in the history, etc.

The P header bit is set to 1 if an active Control Change command pair to
terminate the first RPN transaction in the log list does not appear in
the session history. The N header bit has the same role for the first
NRPN transaction in the log list.

Figure A.8.2 shows the structure of a transaction log.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|T|       PARAM-NUMBER        |     KEY       |  DATA   ...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   ...         |      KEY      |   DATA ...                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure A.8.2 -- Transaction Log Structure


The transaction log consists of a 2-octet header, followed by a
compressed enumeration of the Control Change commands for controller
numbers 6, 38, 96, and 97 for this transaction in the session history.
The presence of Control Change commands to terminate the transaction log
is coded implicitly by the P and N header bits of the top-level chapter
format (Figure A.8.1).

A transaction log header codes the parameter identity. If T is set to 1,
the log codes an NRPN parameter; if T is set to 0, the log codes an RPN
parameter. The 14-bit PARAM-NUMBER header field codes the parameter
number.

The KEY and DATA fields that follow the log header encode the compressed
enumeration of the Control Change commands for numbers 6, 38, 96, and
97. The ordering of this enumeration matches the ordering of commands in
the transaction: the first transaction command appears as the first
command in the enumeration, the second transaction command appears as
the second command in the enumeration, etc.

KEY and DATA fields always appear in pairs in the transaction log; at
least one KEY-DATA pair MUST appear in a transaction log, even if no
Control Change commands need to be coded. The KEY field has a fixed
1-octet size, and acts as a directory for the KEY-DATA pair; the DATA
field has a variable size of 0-3 octets. Figure A.8.3 shows the format
of the KEY octet.


                        0
                        0 1 2 3 4 5 6 7
                       +-+-+-+-+-+-+-+-+
                       |S|M|IN1|IN2|IN3|
                       +-+-+-+-+-+-+-+-+

                   Figure A.8.3 -- Key Octet


The two-bit fields IN1, IN2, and IN3 code the appearance and meaning of
the first, second, and third DATA octets that may follow the KEY octet.
The IN fields code the following information:

  o  IN_k = 00. The DATA octet for this position is not present. The
     permitted placements of the 00 value are: IN1 = IN2 = IN3 = 00
     (no DATA octets follow the KEY octet), IN2 = IN3 = 00 (one DATA
     octet follows the KEY octet), IN3 = 00 (two DATA octets follow the
     KEY octet).

  o  IN_k = 01. Indicates an active Control Change command for
     controller number 6 (Data Entry Slider Coarse); the DATA
     octet codes the third octet of the Control Change command.

  o  IN_k = 02. Indicates an active Control Change command for
     controller number 38 (Data Entry Slider Fine); the DATA
     octet codes the third octet of the Control Change command.

  o  IN_k = 03. Indicates one or more active Control Change commands
     for controller number 96 (Data Button Increment) and/or 97
     (Data Button Decrement), without an intervening Control Change
     command 6 or 38. The DATA octet codes the cumulative effect of the
     Data Button commands, as a two's complement 8-bit value:
     controller 96 commands increment the value by 1, controller
     97 commands decrement the value by 1.

The M flag is 1 if another KEY octet follows the DATA octet(s). If M is
0, another transaction log may follow the DATA octet(s), or the DATA
octet(s) may mark the end of Chapter M, depending on the LENGTH field of
the top-level Chapter M header shown in Figure A.8.1.
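
The transaction log layout described above (a 2-octet header followed
by KEY-DATA pairs chained by the M flag) can be walked as in the
informative sketch below. The function name, the tuple labels "cc6",
"cc38", and "buttons", and the return convention are hypothetical. The
sketch assumes bit 0 in the figures is the most significant bit, reads
IN_k values 01, 02, and 03 as the integers 1, 2, and 3, and ignores the
S bit of each KEY octet.

```python
def parse_transaction_log(buf: bytes, pos: int = 0):
    """Decode one transaction log; return (log, position after the log)."""
    word = (buf[pos] << 8) | buf[pos + 1]
    log = {
        "S": word >> 15,
        "kind": "NRPN" if (word >> 14) & 1 else "RPN",   # T flag
        "param_number": word & 0x3FFF,                   # 14-bit field
        "commands": [],
    }
    pos += 2
    more = True
    while more:
        key = buf[pos]
        pos += 1
        more = bool(key & 0x40)              # M flag: another KEY follows
        for shift in (4, 2, 0):              # IN1, IN2, IN3
            code = (key >> shift) & 0x3
            if code == 0:
                break                        # no further DATA octets
            data = buf[pos]
            pos += 1
            if code == 1:
                log["commands"].append(("cc6", data))
            elif code == 2:
                log["commands"].append(("cc38", data))
            else:
                # two's complement 8-bit sum of Data Button commands
                delta = data - 256 if data >= 128 else data
                log["commands"].append(("buttons", delta))
    return log, pos
```

The returned position lets a caller continue with the next transaction
log until the LENGTH value of the chapter header is exhausted.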

In comparison with other recovery journal chapters, Chapter M is
inefficient: each transaction for a parameter number in the checkpoint
history is listed in the transaction list, and each Control Change
command for a transaction is enumerated in a transaction log. This
design decision trades off recovery journal size for design simplicity.
In practice, parameter system commands rarely appear in MIDI streams,
and this design decision does not have a significant impact on MWPP
bandwidth requirements.


Appendix B. The Recovery Journal System Chapters


Appendix B.1. System Chapter D: Reset, Song Select, Tune Request

The system journal MUST contain Chapter D if an active MIDI Reset
(0xFF), MIDI Tune Request (0xF6), or MIDI Song Select (0xF3) command
appears in the checkpoint history.  Note that General MIDI reset
commands are coded in Chapter X (Appendix B.5), not in Chapter D.
Figure B.1.1 shows the variable-length format for Chapter D.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|E|T|G|R|R|R|R|  Command logs ...                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure B.1.1 -- System Chapter D Format


The chapter consists of a 1-octet header, followed by one or more
command logs. Header flag bits indicate the presence of command logs for
the Reset (E = 1), Tune Request (T = 1), and Song Select (G = 1)
commands. Command logs appear in a list following the header, in the
order that their flag bits appear in the header.

Figure B.1.2 shows the 1-octet command log format for the Reset and Tune
Request commands.


                         0
                         0 1 2 3 4 5 6 7
                        +-+-+-+-+-+-+-+-+
                        |S|    COUNT    |
                        +-+-+-+-+-+-+-+-+

       Figure B.1.2 -- Command Log for Reset and Tune Request


Chapter D MUST contain the Reset command log if an active Reset command
appears in the checkpoint history. The 7-bit COUNT field codes the total
number of Reset commands (modulo 128) present in the session history.

Chapter D MUST contain the Tune Request command log if an active Tune
Request command appears in the checkpoint history. The 7-bit COUNT field
codes the total number of Tune Request commands (modulo 128) present in
the session history.

Figure B.1.3 shows the 1-octet command log format for the Song Select
command.


                         0
                         0 1 2 3 4 5 6 7
                        +-+-+-+-+-+-+-+-+
                        |S|    VALUE    |
                        +-+-+-+-+-+-+-+-+

           Figure B.1.3 -- Song Select Command Log Format


Chapter D MUST contain the Song Select command log if an active Song
Select command appears in the checkpoint history. The 7-bit VALUE field
codes the song number of the most recent active Song Select command in
the session history.
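
The Chapter D layout can be summarized by the informative sketch below,
which reads the header flags and then consumes one 1-octet command log
per flag that is set, in header order. The function name and dictionary
keys are hypothetical, the S bit of each command log is masked off, and
bit 0 in the figures is assumed to be the most significant bit.

```python
def parse_chapter_d(buf: bytes) -> dict:
    """Decode System Chapter D (hypothetical helper)."""
    header = buf[0]
    chapter = {"S": header >> 7}
    pos = 1
    # E, T, and G occupy header bits 1-3; logs follow in the same order.
    flags = (
        ("reset_count", 0x40),           # E: Reset COUNT
        ("tune_request_count", 0x20),    # T: Tune Request COUNT
        ("song_select", 0x10),           # G: Song Select VALUE
    )
    for name, mask in flags:
        if header & mask:
            chapter[name] = buf[pos] & 0x7F   # 7-bit COUNT or VALUE
            pos += 1
    chapter["size"] = pos                # octets consumed by the chapter
    return chapter
```

For example, the octets 0x50 0x03 0x0C describe a chapter that holds a
Reset count of 3 and a Song Select value of 12.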


Appendix B.2. System Chapter V: Active Sense Command

The system journal MUST contain Chapter V if an active MIDI Active Sense
(0xFE) command appears in the checkpoint history.  Figure B.2.1 shows
the format for Chapter V.


                         0
                         0 1 2 3 4 5 6 7
                        +-+-+-+-+-+-+-+-+
                        |S|    COUNT    |
                        +-+-+-+-+-+-+-+-+

               Figure B.2.1 -- System Chapter V Format


The 7-bit COUNT field codes the total number of Active Sense commands
(modulo 128) present in the session history.


Appendix B.3. System Chapter Q: Sequencer State Commands

This Appendix describes Chapter Q, the system chapter for the MIDI
sequencer commands.

The system journal MUST contain Chapter Q if an active MIDI Song
Position Pointer (0xF2), MIDI Clock (0xF8), MIDI Tick (0xF9), MIDI Start
(0xFA), MIDI Continue (0xFB) or MIDI Stop (0xFC) command appears in the
checkpoint history. MIDI Tick, a non-standard usage of 0xF9 that does
not comply with [1], acts as a seconds-based alternative to MIDI Clock.
[Editor's Note: MIDI Tick will be removed from Chapter Q in the next I-D
release]. Figure B.3.1 shows the variable-length format for Chapter Q.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|N|D|C|T|Q|TOP|          CLOCK                |   TICKS       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      ...                       |             QNOTE            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ...          |
+-+-+-+-+-+-+-+-+

               Figure B.3.1 -- System Chapter Q Format


Unlike most chapters, Chapter Q does not provide resiliency by coding
log entries for individual MIDI commands. Instead, Chapter Q captures
the cumulative effect of all sequencer commands in the session history,
by encoding the most recent sequencer system state. This coding strategy
yields an efficient chapter design: the minimal Chapter Q configuration
fits in 3 octets.

In a temporal sense, the fields of Chapter Q reflect system state up to
(but not including) the moment encoded by the RTP timestamp of the
packet in which it resides (packet I, as defined in Appendix A.1).  In
normal operation, a receiver examines Chapter Q after a packet loss
episode, in order to re-synchronize its open-loop estimation of the
sequencer state. Chapter Q state information includes the position of
the sequencer pointer (coded by the CLOCK and/or TICKS field), the
presence of the downbeat (the D bit) and the on/off state of the
sequencer (the N bit).

In addition, Chapter Q may optionally code an estimate of the current
tempo in the QNOTE field. QNOTE helps loss recovery in two
ways. If the sequencer is running, a tempo estimate may help a receiver
re-synchronize faster. If the sequencer is stopped, QNOTE tracks tempo
changes in the MIDI Clock or MIDI Tick stream; this information helps
receivers smoothly react if a Start or Continue command appears soon
after a packet loss episode.

We now state the normative definition of the Chapter Q bitfields.
Chapter Q consists of a 1-octet header followed by several optional
fields, in the order shown in Figure B.3.1.  Three header bits (C, T,
and Q) indicate the presence of fields following the header.  Two header
bits (N and D) encode aspects of the sequencer system state directly.

Header flag bits C, T, and Q signal the presence of the 16-bit CLOCK
field (C set to 1), the 24-bit TICKS field (T set to 1) and the 24-bit
QNOTE field (Q set to 1).

The N header bit encodes the relative occurrence of the Start, Continue
and Stop commands in the session history.  If an active Start or
Continue command appears most recently, N is set to 1.  If an active
Stop appears most recently, or if no active instances of these commands
appear in the session history, N is set to 0.

The D header bit encodes the presence of the downbeat.  If N is set to
1, D is set to 1 if at least one Clock or Tick command follows the most
recent Start or Continue command in the session history. If this
condition does not hold, or if N is 0, then D is set to 0.
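
The N and D definitions above amount to a small state machine over the
sequencer commands in the session history. The informative sketch
below assumes the history is supplied as a list of status octets in
order of appearance and that every command in the list is active; the
function name is hypothetical.

```python
def sequencer_bits(commands):
    """Return (N, D) for a session history given as status octets."""
    n = d = 0
    for status in commands:
        if status in (0xFA, 0xFB):             # Start or Continue
            n, d = 1, 0                        # running, no downbeat yet
        elif status == 0xFC:                   # Stop
            n, d = 0, 0
        elif status in (0xF8, 0xF9) and n:     # Clock or Tick while running
            d = 1
    return n, d
```

A history of Start followed by Clock gives N = 1 and D = 1; appending a
Continue command returns D to 0 until the next Clock or Tick arrives.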

If N is set to 0 (coding a stopped sequence), or if N is set to 1 and D
is set to 0 (coding a sequence on the verge of beginning), Chapter Q
MUST encode the starting song position of the sequence. The C and T
header flags, the optional CLOCK (if C is set to 1) and TICKS (if T is
set to 1) fields, and the TOP header field, act to code the starting
song position, via the methods described below.

   o If C = 0 and T = 0, the starting song position is at the
     beginning of the song.

   o If C = 1 and T = 0, the 2-bit TOP header field and the 16-bit
     CLOCK field are combined to form the 18-bit unsigned quantity
     65536*TOP + CLOCK. This value encodes the starting song
     position, in units of clocks (24 clocks per quarter note).
     Use this method if the MIDI source uses Clock commands as
     timing pulses.

   o If C = 0 and T = 1, the 24-bit TICKS field codes the starting
     song position, in units of milliseconds. Use this method
     if the MIDI source uses Tick commands as timing pulses
     (10 ms per Tick). The song position MUST be encoded using
     sub-Tick (i.e. sub-10ms) resolution.

   o If C = 1 and T = 1, the starting song position is the sum of
     the positions encoded by the CLOCK, TOP and TICKS fields, as
     described above. Use this method if the MIDI stream
     uses Tick commands as timing pulses and also uses the
     clock-based Song Position Pointer commands to reposition
     the sequence.
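
The four cases above can be expressed compactly, as in the informative
sketch below. The function and parameter names are hypothetical. The
sketch returns the clock-based and millisecond-based components
separately instead of adding them, because combining the two units
requires a tempo, which is outside the scope of this example.

```python
def song_position(c, t, top=0, clock=0, ticks=0):
    """Return (clocks, milliseconds) for the position coded in Chapter Q."""
    # C = 1: 18-bit unsigned quantity built from TOP and CLOCK
    clocks = 65536 * top + clock if c else 0
    # T = 1: 24-bit TICKS field, in units of milliseconds
    millis = ticks if t else 0
    return clocks, millis

def clocks_to_quarter_notes(clocks):
    """Convert MIDI clocks to quarter notes (24 clocks per quarter note)."""
    return clocks / 24
```

With C = 1, T = 0, TOP = 1, and CLOCK = 24, the position is 65560
clocks from the start of the song.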

If the N and D header bits are both set to 1, the sequence is playing,
and Chapter Q MUST encode the current song position in the sequence.
The current song position is coded using the same fields and methods as
the starting song position (see above). If the TICKS field is used to
code the current song position, the field value counts time up to the
moment encoded by the RTP timestamp of packet I.

Chapter Q MAY encode an estimate of the current tempo, by setting the Q
header bit to 1, and placing the estimated tempo value in the 24-bit
QNOTE field. The QNOTE field has units of microseconds per quarter note.
This memo does not define a normative algorithm for tempo estimation for
the QNOTE field.  Note that Q may be set to 1 even if N is set to 0,
providing a method for coding current tempo while the sequence is
stopped.
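
Because QNOTE is expressed in microseconds per quarter note, it
converts to the familiar beats-per-minute figure by dividing 60,000,000
by the field value. The informative sketch below shows the conversion
in both directions; the function names are hypothetical, and rounding
to the nearest microsecond is our own choice.

```python
def qnote_to_bpm(qnote: int) -> float:
    """Convert a 24-bit QNOTE value to quarter notes per minute."""
    if not 0 < qnote < (1 << 24):
        raise ValueError("QNOTE must be a non-zero 24-bit value")
    return 60_000_000 / qnote

def bpm_to_qnote(bpm: float) -> int:
    """Convert quarter notes per minute to a QNOTE value."""
    return round(60_000_000 / bpm)
```

A QNOTE value of 500000 therefore corresponds to 120 quarter notes per
minute.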


Appendix B.4. System Chapter E: MIDI Time Code Tape Position

This Appendix describes Chapter E, the system chapter for the MIDI Time
Code (MTC) commands.

The system journal MUST contain Chapter E if an active MIDI System
Common Quarter Frame command (0xF1) or an active finished System
Exclusive (Universal Real Time) MTC Full Frame command (F0 7F cc 01 01
hr mn sc fr F7) appears in the checkpoint history.

Unfinished MTC Full Frame commands are coded in Chapter X, as described
in Appendix B.5. See Appendix A.1 for definitions of finished and
unfinished MIDI commands.

Figure B.4.1 shows the variable-length format for Chapter E.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|Q|C|P|D|POINT|                COMPLETE                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 PARTIAL                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


               Figure B.4.1 -- System Chapter E Format


This Appendix contains two sub-sections. B.4.1 is an informative
description of the Chapter E design; B.4.2 is the normative definition
of the Chapter E bitfield semantics.

B.4.1  Informative Description of Chapter E

The MIDI standard uses MTC to tag a particular moment in the MIDI stream
with a SMPTE timestamp (a frame-based timestamp standard for video and
film). In a typical application, a receiver uses these SMPTE timestamps
to synchronize the playback of a video tape deck with the MIDI stream.

MTC provides two methods for sending a SMPTE timestamp. The simple
method, the Full Frame command, encodes the entire timestamp in a
10-octet System Exclusive command. Alternatively, the timestamp value
may be transmitted incrementally, via 8 one-octet Quarter Frame commands
sent at regular intervals over two video frames.

Chapter E encodes SMPTE recovery information derived from MTC commands
that appear in the session history. In normal operation, a receiver
examines Chapter E after a packet loss episode, in order to re-
synchronize its open-loop estimation of the current SMPTE time.

Chapter E may hold two SMPTE timestamps. The 24-bit COMPLETE field,
present if the C bit is set, codes the most recent complete MTC
timestamp that appears in the session history. This timestamp may be
coded by one finished Full Frame command or 8 Quarter Frame commands. If
the COMPLETE field codes data from Quarter Frame commands, the COMPLETE
field value is two frames ahead of the timestamp encoded in the Quarter
Frame commands, to compensate for the transmission delay of the
incremental Quarter Frame code.

Chapter E may also contain a 24-bit PARTIAL field, that codes the
timestamp data fragments coded by an incomplete Quarter Frame sequence.
The P bit signals the presence of the PARTIAL field. The D, Q, and POINT
fields hold ancillary data that is essential for decoding the meaning of
the PARTIAL field.

B.4.2  Normative Definition of Chapter E

Chapter E holds information about the most recent MIDI Time Code (MTC)
tape position coded in the session history. Chapter E consists of a
1-octet header followed by two optional fields (COMPLETE and PARTIAL) in
the order shown in Figure B.4.1. The 24-bit COMPLETE field is present if
header bit C is set to 1; the 24-bit PARTIAL field is present if header
bit P is set to 1.
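
As an informative illustration, the header octet of Figure B.4.1 can be
unpacked as shown below, which also derives the total chapter size from
the C and P bits. The function name and dictionary keys are
hypothetical, and bit 0 in the figure is assumed to be the most
significant bit.

```python
def parse_chapter_e_header(header: int) -> dict:
    """Unpack the 1-octet Chapter E header (hypothetical helper)."""
    fields = {
        "S": (header >> 7) & 1,
        "Q": (header >> 6) & 1,
        "C": (header >> 5) & 1,
        "P": (header >> 4) & 1,
        "D": (header >> 3) & 1,
        "POINT": header & 0x7,
    }
    # 1 header octet, plus 3 octets each for COMPLETE and PARTIAL
    fields["size"] = 1 + 3 * fields["C"] + 3 * fields["P"]
    return fields
```

A header octet of 0x73 signals that both COMPLETE and PARTIAL are
present, giving a chapter size of 7 octets.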

MTC tape position updates in the session history may occur atomically,
via a finished Full Frame command, or incrementally, via a series of
Quarter Frame commands spaced over the time period of two video frames.
The Q header bit codes whether a Quarter Frame command (Q set to 1) or a
finished Full Frame command (Q set to 0) appears most recently in the
session history.

At any moment in time, the session history may hold a sequence of zero
or more complete MTC frame values. A partially complete MTC frame value
(coded by an incomplete sequence of Quarter Frame commands) may also
appear in the session history (after the most recent complete MTC frame
value, if one exists).

If the session history holds a complete MTC frame, and if the Quarter
Frame command or finished Full Frame command that completes this frame
encoding appears in the checkpoint history, Chapter E MUST include the
24-bit COMPLETE field to encode the frame value. The C header bit is set
to 1 to signal the presence of the COMPLETE field.

If a partially complete MTC frame value appears in the session history
(after the most recent complete MTC frame value, if one exists), if this
partially complete frame value is not malformed (i.e. the high nibble
sequence of Quarter Frame commands starts at 0 and increments
contiguously to an intermediate value, or else starts at 7 and
decrements contiguously to an intermediate value), and if at least one
Quarter Frame command coding this partial value appears in the
checkpoint history, Chapter E MUST include the 24-bit PARTIAL field to
encode the frame value in progress. The P header bit is set to 1 to
signal the presence of the PARTIAL field.

Note that the PARTIAL field never codes a frame value coded in a Full
Frame command; unfinished Full Frame commands are coded in Chapter X, as
described in Appendix B.5.

The D header flag bit signals the direction the tape is moving.  D is
set to 0 for forward or no movement; D is set to 1 for reverse movement.
If Q is set to 1, the relative motion of the upper nibble of the Quarter
Frame data value determines D. If Q is set to 0, the relative tape
motion from its last position determines D.

The D bit serves two roles in Chapter E. If a PARTIAL field is present
in Chapter E, the D bit serves a syntactic role: its state value is
required to parse the contents of PARTIAL (as explained below). In
addition, the tape direction information coded in the D bit serves an
advisory role for receivers performing tape re-synchronization after a
packet loss episode.

The 3-bit POINT field holds information about the incremental Quarter
Frame encoding in the session history. If Q is set to 1, POINT codes the
upper nibble of the most recent Quarter Frame data value in the session
history. If the PARTIAL field is present in Chapter E, the POINT field
serves a syntactic role: its state value is required to parse the
contents of PARTIAL (as explained below).  If Q is set to 0, POINT is
reserved for future use; senders MUST set POINT to 0x0, and receivers
must ignore its value.

Figure B.4.2 shows the common format for the COMPLETE and PARTIAL
fields.


          0                   1                   2
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |TYP|  HOURS  |  MINUTES  | SECONDS   | FRAMES  |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure B.4.2 -- COMPLETE and PARTIAL format


The 5-bit HOURS, 6-bit MINUTES, 6-bit SECONDS, and 5-bit FRAMES fields
encode the SMPTE values encoded in Full Frame and Quarter Frame
commands.  The bit allocations are sufficient to encode legal SMPTE
values; note that for some fields, the associated MIDI commands use
larger encodings. The 2-bit TYP field encodes the SMPTE frame type,
using the same encoding as the Quarter Frame and Full Frame commands.
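
As a non-normative illustration, the Python sketch below packs and
unpacks the 24-bit layout of Figure B.4.2. It assumes network bit order
(bit 0 of the figure is the most-significant bit of the first octet).
The function names pack_smpte and unpack_smpte are hypothetical and are
not defined by this memo.

```python
# Non-normative sketch of the Figure B.4.2 layout.
# Assumption: bit 0 in the figure is the MSB of the first octet.

FIELD_BITS = (("typ", 2), ("hours", 5), ("minutes", 6),
              ("seconds", 6), ("frames", 5))


def pack_smpte(typ, hours, minutes, seconds, frames):
    """Pack the five subfields into a 3-octet field."""
    values = (typ, hours, minutes, seconds, frames)
    word = 0
    for (name, bits), value in zip(FIELD_BITS, values):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
        word = (word << bits) | value
    return word.to_bytes(3, "big")


def unpack_smpte(field):
    """Split a 3-octet COMPLETE or PARTIAL field into subfields."""
    if len(field) != 3:
        raise ValueError("field must be exactly 3 octets")
    word = int.from_bytes(field, "big")
    return {
        "typ": (word >> 22) & 0x03,
        "hours": (word >> 17) & 0x1F,
        "minutes": (word >> 11) & 0x3F,
        "seconds": (word >> 5) & 0x3F,
        "frames": word & 0x1F,
    }
```

The sketch treats COMPLETE and PARTIAL identically; deciding which
PARTIAL bits are valid is a separate step that depends on the D and
POINT header values.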

If used in the COMPLETE field, the TYP, HOURS, MINUTES, SECONDS, and
FRAMES fields hold the most recent complete frame value, encoded by a
finished Full Frame command or a series of 8 Quarter Frame commands in
the session history. If the COMPLETE field codes data from Quarter Frame
commands, the COMPLETE field value is two frames larger than the
timestamp encoded in the Quarter Frame commands, to compensate for the
transmission delay of the incremental Quarter Frame code.

If used in the PARTIAL field, the TYP, HOURS, MINUTES, SECONDS, and
FRAMES fields do not all contain valid values.  Recall that the PARTIAL
field encodes a partially complete SMPTE value encoded by a series of
Quarter Frame commands in the session history. The bits in the PARTIAL
field that correspond to data values in these Quarter Frame commands
hold valid values; all other PARTIAL bits are set to 0.  The valid
PARTIAL bits directly reflect the data values encoded in the Quarter
Frame commands in the session history; this PARTIAL field encoding MUST
NOT include a compensatory offset for transmission delay.

The D and POINT header values signal the valid bits in the PARTIAL
field.  If D is set to 0, PARTIAL field bits corresponding to Quarter
Frame commands with High Nibble values (0, 1, ... POINT) are valid.  If
D is set to 1, PARTIAL field bits corresponding to Quarter Frame
commands with High Nibble values (7, 6, ... POINT) are valid.
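
The rule above can be sketched in Python as follows (non-normative).
The function returns the Quarter Frame High Nibble values whose data is
valid in PARTIAL, in the order the commands would have been sent. The
name valid_high_nibbles is hypothetical.

```python
# Non-normative sketch: which Quarter Frame High Nibble values have
# valid data in PARTIAL, given the D and POINT header values.

def valid_high_nibbles(d, point):
    """Return the High Nibble values signalled as valid."""
    if d not in (0, 1):
        raise ValueError("D is a single bit")
    if not 0 <= point <= 7:
        raise ValueError("POINT is a 3-bit field")
    if d == 0:
        # Forward (or no) movement: 0, 1, ... POINT
        return list(range(0, point + 1))
    # Reverse movement: 7, 6, ... POINT
    return list(range(7, point - 1, -1))
```

For example, D = 0 with POINT = 3 marks the data from High Nibble
values 0 through 3 as valid, while D = 1 with POINT = 5 marks the data
from High Nibble values 7, 6, and 5 as valid.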


Appendix B.5. System Chapter X: System Exclusive

This Appendix describes Chapter X, the system journal chapter for the
MIDI System Exclusive command (opcode 0xF0, abbreviation SysEx).

The system journal MUST contain at least one Chapter X entry if an
active SysEx command (excluding a finished MTC Full Frame command)
appears in the checkpoint history. A SysEx command is said to "appear"
in the checkpoint history if the history contains a verbatim encoding of
the SysEx command, or if the history contains at least one segment of
the segmental encoding of the SysEx command.

Note that finished MTC Full Frame commands are coded in Chapter E, as
described in Appendix B.4. Unfinished MTC Full Frame commands, however,
are coded in Chapter X. See Appendix A.1 for definitions of finished and
unfinished commands.

The Chapter X encoding is optimized for the short SysEx commands that
signal real-time events. Chapter X is not intended for use with the
longer SysEx commands used in bulk data transport, because the recovery
journal system is very inefficient if the journal size is large.  A MIDI
session that combines real-time and bulk-data functions SHOULD be sent
over two MWPP streams: a bulk-data stream sent over reliable transport,
and a real-time unreliable stream for shorter commands. The midiport SDP
parameter (Appendix C.4) supports split-stream operation.

Note that the structure of the system journal (Figure 9 in Section 5)
permits multiple entries for Chapter X. Each Chapter X entry codes
information about exactly one active SysEx command. The relative
ordering of Chapter X entries MUST reflect the relative position of
commands in the session history: the first Chapter X entry codes the
most recent active SysEx command in the history, the second Chapter X
entry codes a SysEx command that appears somewhere before the first
coded SysEx command in the history, the third Chapter X entry codes a
SysEx command that appears somewhere before the second coded SysEx
command in the history, etc.

A Chapter X entry for a SysEx command encodes all information about the
command that appears in the session history (as distinct from the
checkpoint history). This distinction is relevant for the coding of
SysEx commands whose segments appear across multiple packets. In this
case, the Chapter X entry MUST include the starting segments for the
SysEx command, even if these segments no longer appear in the checkpoint
history.

Chapter X provides two tools for encoding multiple SysEx commands of the
same type. Each command of a certain type may be encoded in a separate
Chapter X entry (the list tool) or only the most recent command of a
certain type may be encoded (the recency tool). Each active SysEx
command that appears in the checkpoint history MUST be associated with a
Chapter X entry via the list or recency tool (excluding finished MTC
Full Frame commands).

However, if the recency tool is in use, some active SysEx commands that
appear in the checkpoint history may not actually be coded in a Chapter
X entry. If the recency tool is in use for a command type, older
commands of that type in the checkpoint history are considered to be
associated with the Chapter X entry that codes the most recent command
of that type, even though data from the older commands do not appear in
the Chapter X entry.

For each SysEx command type, an implementation may choose either the
list tool or the recency tool. Simple implementations may use the list
tool for all command types; sophisticated implementations may reduce
bandwidth by using the recency tool for some command types.
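
The non-normative Python sketch below shows one way a sender might pick
the commands that receive their own Chapter X entry. It assumes the
sender keeps the active SysEx commands in history order (oldest first),
each tagged with a type key, and keeps a set of the type keys for which
it uses the recency tool. The name select_entries and both data
structures are hypothetical.

```python
# Non-normative sketch: choosing which active SysEx commands are
# coded in their own Chapter X entry.
# Assumptions: `history` is a list of (type_key, command) pairs,
# oldest first; `recency_types` holds the type keys that use the
# recency tool (all other types use the list tool).

def select_entries(history, recency_types):
    """Return the commands to code, most recent command first."""
    coded = []
    seen_recency_types = set()
    for type_key, command in reversed(history):
        if type_key in recency_types:
            if type_key in seen_recency_types:
                # An older command of a recency-tool type: it is
                # associated with the newer entry, but not coded.
                continue
            seen_recency_types.add(type_key)
        coded.append(command)
    return coded
```

The returned order matches the entry ordering rule stated earlier in
this Appendix: the first entry codes the most recent active SysEx
command in the history.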

Figure B.5.1 shows the variable length format for System Chapter X.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|IDC|L|T| LEN |  DATA ...                                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure B.5.1 -- System Chapter X Format


Chapter X consists of a 1-octet header, followed by an arbitrary-length
DATA field. The DATA field encodes a modified version of the data octets
of the SysEx command, as described below. The leading 0xF0 and trailing
0xF7 SysEx octets never appear in the DATA field.
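
As a non-normative illustration, the Python sketch below splits the
1-octet header of Figure B.5.1 into its fields. It assumes network bit
order (bit 0 of the figure is the most-significant bit of the octet).
The name parse_chapter_x_header is hypothetical.

```python
# Non-normative sketch of the Figure B.5.1 header octet.
# Assumption: bit 0 in the figure is the MSB of the octet.

def parse_chapter_x_header(octet):
    """Split the Chapter X header octet into its five fields."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("header must be a single octet")
    return {
        "s": (octet >> 7) & 0x1,    # bit 0
        "idc": (octet >> 5) & 0x3,  # bits 1-2
        "l": (octet >> 4) & 0x1,    # bit 3
        "t": (octet >> 3) & 0x1,    # bit 4
        "len": octet & 0x7,         # bits 5-7
    }
```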

If the Manufacturer ID value of the SysEx command (coded in the first
data octet of the MIDI command) is 0x00, 0x7E, or 0x7F, the DATA field
begins with the second data octet of the SysEx command; for all other
Manufacturer ID values, the DATA field begins with the first data octet
of the SysEx command. The 2-bit IDC header field codes the 0x00, 0x7E,
and 0x7F ID values, using the method shown in Figure B.5.2.


-----------------------------------------------------------------------
| IDC | Manufacturer ID                | First DATA octet is:         |
|-----|--------------------------------|------------------------------|
| 0x0 | 0x7E (Universal Non-Real-Time) | 2nd SysEx data octet         |
|-----|--------------------------------|------------------------------|
| 0x1 | 0x7F (Universal Real-Time)     | 2nd SysEx data octet         |
|-----|--------------------------------|------------------------------|
| 0x2 | 0x00 (Extension Escape Code)   | 2nd SysEx data octet         |
|-----|--------------------------------|------------------------------|
| 0x3 | in the range 0x01--0x7D        | 1st SysEx data octet         |
-----------------------------------------------------------------------

                Figure B.5.2 -- IDC Header Field Encoding
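
The mapping in Figure B.5.2 can be sketched in Python as follows
(non-normative). The input is assumed to be the SysEx data octets with
the leading 0xF0 and the trailing end-of-exclusive octet already
removed. The names IDC_BY_ID and idc_and_data are hypothetical.

```python
# Non-normative sketch of the IDC mapping in Figure B.5.2.
# Assumption: `sysex_data` holds only the SysEx data octets (the
# framing status octets have already been stripped).

IDC_BY_ID = {0x7E: 0x0, 0x7F: 0x1, 0x00: 0x2}


def idc_and_data(sysex_data):
    """Return (IDC, octets to be coded in the DATA field)."""
    if len(sysex_data) == 0:
        raise ValueError("SysEx command has no data octets")
    manufacturer_id = sysex_data[0]
    if manufacturer_id in IDC_BY_ID:
        # The ID is implied by IDC; DATA starts at the 2nd octet.
        return IDC_BY_ID[manufacturer_id], bytes(sysex_data[1:])
    # Any other ID is carried in DATA; DATA starts at the 1st octet.
    return 0x3, bytes(sysex_data)
```

A receiver inverts the mapping: for IDC values 0x0 through 0x2 it
re-inserts the implied ID octet ahead of the DATA octets.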


The 3-bit LEN header field codes the exact length of short, complete
SysEx commands, and signals alternative coding techniques for longer
commands and truncated commands.

The LEN values 0x0 through 0x5 indicate that the length of the DATA
field is 1-6 octets. For these LEN values, the DATA field encodes a
complete SysEx command, as a verbatim copy of the SysEx data octets
(possibly skipping the first octet, as detailed in Figure B.5.2).

The LEN value 0x6 indicates that the DATA field contains 7 or more
octets. The DATA field encodes a complete SysEx command, as a verbatim
copy of the data octets of the SysEx command (possibly skipping the
first octet, as detailed in Figure B.5.2), with one exception: bit 7
(the most-significant bit) of the final data octet is set to one. This
set bit implicitly codes the length of the DATA field (MIDI data octets,
by definition, clear bit 7).

The LEN value 0x7 indicates that the DATA field encodes a truncated
SysEx command. This coding option is only to be used for SysEx commands
encoded using the segmented method, for the case where not all segments
appear in the session history.

If LEN is 0x7, the DATA field encodes the data octets of the SysEx
command segments that appear in the session history. The DATA field
holds a verbatim copy of the data octets of the coded portion of the
SysEx command, with two exceptions: the first octet may be skipped (as
detailed in Figure B.5.2) and bit 7 (the most-significant bit) of the
final coded data octet is set to one (to provide an implicit field
length, as in the case where LEN is 0x6).
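
The non-normative Python sketch below illustrates the three LEN coding
cases. The input octets are assumed to have already had the first octet
skipped where Figure B.5.2 requires it. The sketch rejects an empty
DATA field, a case the text above does not address. The names
encode_data and decode_data are hypothetical.

```python
# Non-normative sketch of the LEN / DATA coding rules.
# Assumption: `octets` is already adjusted per Figure B.5.2.

def encode_data(octets, truncated=False):
    """Return (LEN, DATA) for a complete or truncated command."""
    if len(octets) == 0:
        raise ValueError("sketch does not handle an empty DATA field")
    if any(o & 0x80 for o in octets):
        raise ValueError("MIDI data octets must clear bit 7")
    if not truncated and len(octets) <= 6:
        return len(octets) - 1, bytes(octets)  # LEN 0x0 .. 0x5
    data = bytearray(octets)
    data[-1] |= 0x80  # set bit 7 to mark the final DATA octet
    return (0x7 if truncated else 0x6), bytes(data)


def decode_data(length_code, buf):
    """Return (octets, octets consumed, truncated flag)."""
    if 0x0 <= length_code <= 0x5:
        count = length_code + 1
        if len(buf) < count:
            raise ValueError("buffer shorter than coded length")
        return bytes(buf[:count]), count, False
    if length_code not in (0x6, 0x7):
        raise ValueError("LEN is a 3-bit field")
    for index, octet in enumerate(buf):
        if octet & 0x80:
            data = bytearray(buf[:index + 1])
            data[-1] &= 0x7F  # restore the original data octet
            return bytes(data), index + 1, length_code == 0x7
    raise ValueError("no DATA octet with bit 7 set")
```

Because the end marker is carried in the data itself, a parser can find
the end of a LEN 0x6 or LEN 0x7 entry without a separate length field.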

The L and T header flags describe the coding tool used for the Chapter X
bitfield. If L is set to 1 (the list tool), all SysEx commands of this
type have an associated Chapter X bitfield in the system journal.  If L
is set to 0 (the recency tool), only the most recent SysEx command of
this type has an associated Chapter X bitfield in the system journal.

The T flag defines the meaning of the word "type" in the previous
paragraph. The T flag has different semantics for MIDI Universal SysEx
commands (Manufacturer ID 0x7E and 0x7F) and for generic SysEx commands
(all other Manufacturer ID values).

We first define the T flag for Universal SysEx commands. The first four
data octets of Universal commands have defined semantics in the MIDI
standard; we symbolically represent these four octets as: ID cc SubID
SubID1. If T is set to 0, all Universal commands with the same ID, cc,
SubID, and SubID1 values are considered the same type. If T is set to 1,
all Universal commands with the same ID, cc, and SubID values are
considered the same type.

For generic SysEx commands (all Manufacturer ID values except 0x7E and
0x7F), we define the T flag as follows. The first data octet of a
generic SysEx command is the Manufacturer ID; the remaining data octets
may have an arbitrary organization, but often have a set of octets
coding device and sub-command, followed by data octets for the command.

If T is set to 0, all generic SysEx commands with the same ID value are
considered to be of the same type. If T is set to 1, the SysEx command
is assumed to have a device/sub-command/data organization, and all
generic SysEx commands with the same ID value, device, and sub-command
values are considered to be of the same type. If the SysEx command has a
multi-level sub-command structure, these semantics require identical
sub-command values at all levels.
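
As a non-normative illustration, the Python sketch below derives a
"type key" from the data octets of a SysEx command, so that two
commands are of the same type exactly when their keys are equal. For
generic commands with T set to 1, the number of leading octets that
code the ID, device, and sub-command is vendor-specific; the sketch
takes it as a caller-supplied parameter (generic_prefix_len, a
hypothetical name). The sketch also treats the ID as a single octet,
and so does not handle the 0x00 Extension Escape Code specially.

```python
# Non-normative sketch: derive a comparison key for the "type" of a
# SysEx command under the T flag.
# Assumptions: `sysex_data` holds the data octets starting at the
# ID octet; the ID is treated as one octet; `generic_prefix_len` is
# application knowledge that this memo does not define.

UNIVERSAL_IDS = (0x7E, 0x7F)


def type_key(sysex_data, t_flag, generic_prefix_len=None):
    """Return the leading octets that define the command type."""
    if len(sysex_data) == 0:
        raise ValueError("SysEx command has no data octets")
    if sysex_data[0] in UNIVERSAL_IDS:
        # T = 0: ID cc SubID SubID1;  T = 1: ID cc SubID
        count = 4 if t_flag == 0 else 3
    elif t_flag == 0:
        count = 1  # the ID value alone
    else:
        if generic_prefix_len is None:
            raise ValueError("generic T=1 needs the prefix length")
        count = generic_prefix_len
    if len(sysex_data) < count:
        raise ValueError("command too short for this type rule")
    return bytes(sysex_data[:count])
```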


Appendix C. Session Description Protocol (SDP) Definitions

In this Appendix, we define the Session Description Protocol (SDP)
parameters for MWPP. These parameters may be used to customize (and
perhaps negotiate [11]) the configuration of an MWPP session, by using
SDP in conjunction with session setup tools like SIP [10] or RTSP [12].

Figure C.1 lists the parameters described in Sections 1-5 of this
Appendix. With the exception of the standard SDP parameter maxptime
(defined in [9]), these parameters are defined in this memo for use with
MWPP.  Appendix C.6 formally defines the syntax for these parameters,
using ABNF [23].

MWPP uses parameters in three contexts, as formally defined in the IANA
considerations (Appendix C.7).  Session descriptions for native (Section
6.1) or mpeg4-generic (Section 6.2) MWPP streams may use parameters in
fmtp lines. In addition, a few MWPP parameters may be used to customize
the audio/sasc MIME encodings of Structured Audio initialization data
(Appendix C.5). The right-most columns of Figure C.1 show which
parameters may be used in each MIME context.


   ----------------------------------------------------------------
  |  Parameter |   Type   | Appendix | mwpp | mpeg4-generic | sasc |
  |----------------------------------------------------------------|
  | j_sec      |  custom  |   C.1    |  x   |       x       |      |
  |----------------------------------------------------------------|
  | j_update   |  custom  |   C.1    |  x   |       x       |      |
  |----------------------------------------------------------------|
  | ch_default |  custom  |   C.1    |  x   |       x       |      |
  |----------------------------------------------------------------|
  | ch_unused  |  custom  |   C.1    |  x   |       x       |      |
  |----------------------------------------------------------------|
  | ch_never   |  custom  |   C.1    |  x   |       x       |      |
  |----------------------------------------------------------------|
  | ch_anchor  |  custom  |   C.1    |  x   |       x       |      |
  |----------------------------------------------------------------|
  | tsmode     |  custom  |   C.2    |  x   |       x       |      |
  |----------------------------------------------------------------|
  | linerate   |  custom  |   C.2    |  x   |       x       |      |
  |----------------------------------------------------------------|
  | octpos     |  custom  |   C.2    |  x   |       x       |      |
  |----------------------------------------------------------------|
  | mperiod    |  custom  |   C.2    |  x   |       x       |      |
  |----------------------------------------------------------------|
  | maxptime   | standard |   C.3    |  x   |       x       |      |
  |----------------------------------------------------------------|
  | midiport   |  custom  |   C.4    |  x   |       x       |      |
  |----------------------------------------------------------------|
  | zerosync   |  custom  |   C.4    |  x   |       x       |      |
  |----------------------------------------------------------------|
  | render     |  custom  |   C.5    |  x   |       x       |      |
  |----------------------------------------------------------------|
  | url        |  custom  |   C.5    |      |       x       |      |
  |----------------------------------------------------------------|
  | inline     |  custom  |   C.5    |      |       x       |      |
  |----------------------------------------------------------------|
  | compr      |  custom  |   C.5    |      |       x       |  x   |
  |----------------------------------------------------------------|
  | cid        |  custom  |   C.5    |      |       x       |  x   |
   ----------------------------------------------------------------

                 Figure C.1 -- Table of MWPP Parameters




Appendix C.1. SDP Definitions: The Journalling System

In this Appendix, we define the session description parameters that
configure stream journalling and the recovery journal system.

Appendix C.1.1 defines the j_sec parameter, which sets the journalling
method for the stream.

Appendix C.1.2 defines the j_update parameter, which sets the recovery
journal sending policy for the stream. This section also normatively
defines the sending policies of the recovery journal system.

Appendix C.1.3 defines several parameters that modify the recovery
journal semantics. These parameters change the default recovery journal
semantics as defined in Section 5 and Appendices A and B.

Most of the normative text in this Appendix defines the behavior of a
recovery journal sender (Appendix C.1.2.2 and C.1.2.3 are exceptions,
and also define receiver behaviors). However, the sender definitions
have an indirect effect on receivers, as the requirements imposed on
senders affect how receivers carry out the normative duties defined in
Section 5.


C.1.1. The j_sec Parameter

Section 2.2 of this memo defines the default journalling method for an
MWPP stream. Streams that use unreliable transport (such as UDP) default
to using the recovery journal. Streams that use reliable transport (such
as TCP) default to not using a journal.

The SDP parameter j_sec may be used to override this default. This memo
defines two symbolic values for j_sec: "none", to indicate that all
stream payloads MUST NOT contain a journal section, and "recj", to
indicate that all stream payloads MUST contain a journal section that
uses the recovery journal format.

For example, the j_sec parameter might be set to "none" for a UDP MWPP
stream that travels between two hosts on a local network that is known
to provide reliable datagram delivery.

The stream description below configures a UDP stream that does not use
the recovery journal:

m=audio 5004 RTP/AVP 96
c=IN IP4 192.0.2.94
a=rtpmap: 96 mwpp/44100
a=fmtp: 96 j_sec=none;

Other IETF standards-track documents may define alternative formats for
the journal section. These documents MUST define new symbolic values for
the j_sec parameter to signal the use of the alternative journal format.
If a session description uses a j_sec value unknown to the recipient,
the recipient MUST NOT accept the description.
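
The non-normative Python sketch below shows how a recipient might
resolve the journalling method from the transport type and an optional
j_sec value. It knows only the two values defined in this memo and
signals rejection of the description by raising an exception. The
names KNOWN_J_SEC and uses_recovery_journal are hypothetical.

```python
# Non-normative sketch: resolving the journalling method.
# Assumption: rejection of a session description is modelled by
# raising ValueError.

KNOWN_J_SEC = ("none", "recj")


def uses_recovery_journal(transport_is_reliable, j_sec=None):
    """Return True if stream payloads carry a recovery journal."""
    if j_sec is None:
        # Default: journal for unreliable transport (such as UDP),
        # no journal for reliable transport (such as TCP).
        return not transport_is_reliable
    if j_sec not in KNOWN_J_SEC:
        raise ValueError("unknown j_sec value: reject description")
    return j_sec == "recj"
```

Under these assumptions, the example stream description above (UDP
transport with j_sec=none) resolves to a stream without a journal
section.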

Special j_sec issues arise when MWPP sessions are managed by the Real
Time Streaming Protocol (RTSP, [12]). In many streaming applications,
the session description in the response to the DESCRIBE method does not
code the transport details (such as UDP or TCP) for the session.
Instead, server and client negotiate transport details using the SETUP
method.

In this scenario, the use of the j_sec parameter may be ill-advised, as
the server does not yet know the transport reliability for the session.
In this case, the session description SHOULD configure the journalling
system using the parameters defined in the remainder of Appendix C.1,
but SHOULD NOT use j_sec to set the journalling status. Recall that if
j_sec does not appear in the session description, the default method for
choosing the journalling method is in effect (no journal for reliable
transport, recovery journal for unreliable transport).

However, an exception to this guidance occurs in situations where the
server knows journalling is always required (such as a pre-recorded MWPP
stream that contains packet loss events) or never required (such as UDP
transport over a local network that is known to be reliable).  In this
case, the session description returned by the DESCRIBE method SHOULD use
the j_sec parameter.


C.1.2. The j_update Parameter

In Section 4, we use the term "sending policy" to describe the method a
sender uses to choose the checkpoint packet identity for each recovery
journal in an MWPP stream. In the sub-sections that follow, we
normatively define three sending policies: anchored, closed-loop, and
open-loop.

As stated in Section 4, the default sending policy for an MWPP stream is
the closed-loop policy. The SDP parameter j_update may be used to
override this default.

We define three symbolic values for j_update: "anchor", to indicate that
the stream uses the anchored sending policy, "open-loop", to indicate
that the stream uses the open-loop sending policy, and "closed-loop", to
indicate that the stream uses the closed-loop sending policy. See
Appendix C.1.3 for example session descriptions that use the j_update
parameter.

Other IETF standards-track documents may define additional sending
policies for the recovery journal system. These documents MUST define
new symbolic values for the j_update parameter to signal the use of the
new policy. If a session description uses a j_update value unknown to
the recipient, the recipient MUST NOT accept the description.


C.1.2.1. The anchored sending policy

The anchored policy is the simplest sending policy. We normatively
define it as follows: in the anchored policy, the sender uses the first
packet in the stream as the checkpoint packet for all packets in the
stream.

In the anchored policy, the checkpoint history always covers the entire
stream. In this way, the anchored policy satisfies the recovery journal
mandate (Section 4).

Note that the anchored policy does not require the use of the Real Time
Control Protocol (RTCP, [2]) or other feedback from receiver to sender.
Senders do not need to take special actions to ensure that received
streams start up free of artifacts, as the recovery journal always
covers the entire history of the stream. Receivers are relieved of the
responsibility of tracking the changing identity of the checkpoint
packet, because the checkpoint packet never changes.

The main drawback of the anchored policy is bandwidth efficiency.
Because the checkpoint history covers the entire stream, the size of the
recovery journals produced by this policy usually exceeds the journal
size of alternative policies. For single-channel MIDI data streams, the
bandwidth overhead of the anchored policy is often acceptable (see
Appendix A.4 of [6]). For dense streams where this overhead is
unacceptable, the closed-loop or open-loop policies are more
appropriate.


C.1.2.2. The closed-loop sending policy

The closed-loop policy is the default policy of the recovery journal
system. For each packet in the stream, the policy lets senders choose
the smallest possible checkpoint history that satisfies the recovery
journal mandate. As smaller checkpoint histories generally yield smaller
recovery journals, the closed-loop policy reduces the bandwidth of an
MWPP stream, relative to the anchored policy.

The closed-loop policy relies on feedback from receiver to sender. The
policy assumes that a receiver periodically informs the sender of the
highest RTP packet sequence number it has seen so far in the stream,
coded in the 32-bit extension format defined in [2].

In sessions that use RTCP, receivers transmit this information in the
Extended Highest Sequence Number Received (EHSNR) field of Receiver
Report (RR) packets. However, sessions MAY use any method of feedback to
implement the closed-loop policy. The sender may safely use receiver
sequence number reports to guide checkpoint history management, because
Section 4 requires receivers to repair indefinite artifacts whenever a
packet loss event occurs.

We now normatively define the closed-loop policy. At the moment a sender
prepares an RTP packet for transmission, we assume that the sender is
aware of R >= 0 receivers for the stream. Senders may become aware of a
receiver via RTCP traffic from the receiver, via RTP packets from a
paired stream sent by the receiver to the sender, via messages from a
session management tool, or by other means. As receivers join and leave
a session, the value of R changes.

Each known receiver k (1 <= k <= R) is associated with a 32-bit extended
packet sequence number M(k), where the extension reflects the sequence
number rollover count of the sender. If the sender has received at least
one feedback report from receiver k, M(k) is the most recent report of
the highest RTP packet sequence number seen by the receiver, normalized
to reflect the rollover count of the sender.
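
This memo does not prescribe a normalization procedure. The
non-normative Python sketch below shows one possible approach: it keeps
the low 16 bits of the reported value and selects the sender rollover
count that yields the largest extended sequence number not exceeding
the latest packet the sender has transmitted. It assumes that a report
never lags the sender by 65536 packets or more. The name
normalize_report is hypothetical.

```python
# Non-normative sketch: normalizing a reported extended sequence
# number to the sender's rollover count.
# Assumptions: the receiver's rollover count may differ from the
# sender's, and the report is less than 2**16 packets behind the
# sender's latest transmitted packet.

def normalize_report(reported_ext, sender_latest_ext):
    """Return M(k) expressed with the sender's rollover count."""
    low16 = reported_ext & 0xFFFF
    candidate = (sender_latest_ext & ~0xFFFF) | low16
    if candidate > sender_latest_ext:
        # The report refers to the sender's previous cycle.
        candidate -= 0x10000
    if candidate < 0:
        raise ValueError("report is ahead of the sender's stream")
    return candidate
```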

If the sender has not received a feedback report from the receiver, M(k)
is the extended sequence number of the last packet the sender
transmitted before it became aware of the receiver. If the sender became
aware of this receiver before it sent the first packet in the stream,
M(k) is the extended sequence number of the first packet in the stream.

Given this definition of M(), we now state the closed-loop policy. When
preparing a new packet for transmission, a sender MUST choose a
checkpoint packet with extended sequence number N, such that M(k) >= (N
- 1) for all k, 1 <= k <= R, where R >= 1. The policy does not restrict
sender behavior in the R == 0 (no known receivers) case.
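
The inequality above bounds N from above: the largest permitted N is
one more than the smallest M(k). The non-normative Python sketch below
computes that bound and checks a candidate checkpoint against it. The
names latest_allowed_checkpoint and checkpoint_is_allowed are
hypothetical.

```python
# Non-normative sketch of the closed-loop constraint
#   M(k) >= (N - 1)  for all k, 1 <= k <= R.
# Assumption: `m_values` holds one extended sequence number M(k)
# per known receiver.

def latest_allowed_checkpoint(m_values):
    """Return the largest permitted N, or None if R == 0."""
    m_list = list(m_values)
    if not m_list:
        return None  # the policy does not restrict the R == 0 case
    return min(m_list) + 1


def checkpoint_is_allowed(n, m_values):
    """Return True if checkpoint N satisfies the constraint."""
    return all(m_k >= n - 1 for m_k in m_values)
```

For example, with M values of 100, 97, and 105, the latest permitted
checkpoint packet has extended sequence number 98. A sender that wants
the shortest checkpoint history would pick the largest permitted N
that does not exceed the sequence number of the packet it is preparing.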

Under the closed-loop policy as defined above, a sender may transmit
packets whose checkpoint history is shorter than the session history (as
defined in Appendix A.1). In this event, a new receiver that joins the
stream may experience indefinite artifacts.

For example, if a Control Change (0xB) command for the channel volume
(controller 7) was sent early in a stream, and later a new receiver
joins the session, the closed-loop policy may permit all packets sent to
the new receiver to use a checkpoint history that does not include the
channel volume Control Change command. As a result, the new receiver
experiences an indefinite artifact, and plays all notes on a channel too
loudly or too softly.

To address this issue, the closed-loop policy states that whenever a
sender becomes aware of a new receiver, the sender MUST determine if the
receiver would be subject to indefinite artifacts under the closed-loop
policy. If so, the sender MUST ensure that the receiver starts the
session free of indefinite artifacts. In satisfying this requirement,
senders MAY infer the initial MIDI state of the receiver from the
session description. For example, the native MWPP stream defined in
Section 6.1 has the initial MIDI state defined in [1].

In some types of sessions, a receiver may have access to stream packets
before the sender is aware of the receiver. In this case, the
restrictions the closed-loop policy places on the sender may not protect
the receiver from indefinite artifacts.

To address this issue, the closed-loop policy states that if a receiver
participates in a session where it may have access to a stream before
the sender is aware of the receiver, the receiver MUST take actions to
ensure that its rendered MIDI performance does not contain indefinite
artifacts. The receiver MUST NOT discontinue these protective actions
until it is certain that the sender is aware of its presence.

The final set of normative closed-loop policy requirements concerns how
senders drop receivers from a stream. As defined earlier in this
section, the closed-loop policy states that a sender MUST choose a
checkpoint packet with extended sequence number N, such that M(k) >= (N
- 1) for all k, 1 <= k <= R, where R >= 1. If the sender has received at
least one feedback report from receiver k, M(k) is the most recent
report of the highest RTP packet sequence number seen by the receiver,
normalized to reflect the rollover count of the sender.

If this receiver k stops sending feedback to the sender, the M(k) value
used by the sender reflects the last feedback report from the receiver.
As time progresses without feedback from receiver k, this fixed M(k)
value forces the sender to increase the size of the checkpoint history,
and thus increases the bandwidth of the stream.

At some point, the sender may be forced to take action in order to limit
the bandwidth of the stream. The closed-loop policy states that if this
situation occurs, and if the nature of the session permits a sender to
stop transmitting packets to the offending receiver, the sender MUST
stop transmitting packets to this receiver. In other words, it is not
permissible for a sender to no longer use M(k) in computing the
checkpoint packet identity but still send the stream to receiver k, if
it is possible for the sender to actively cut off receiver k from the
stream.

In certain types of sessions, it may not be possible for a sender to
actively stop sending packets to a particular receiver. The closed-loop
policy states that if receivers participate in a session where senders
are unable to stop sending packets to a particular receiver of the
stream, the receiver MUST monitor the RTP stream, and any other sources
of information, to determine if the sender is no longer using the M(k)
feedback from the receiver to choose each checkpoint packet. If the
receiver detects this condition, it MUST leave the session, and close
down the rendered MIDI performance in a manner that is free of
indefinite artifacts.

We have now completed the normative definition of the closed-loop
policy. Note that the policy definition places requirements on senders
and receivers, but does not define a role for the creators of the
session descriptions. However, as we discuss in Appendix C.1.3 and in
[22], the complexity of sender and receiver implementations may be
reduced through the careful use of the SDP parameters that modify the
normative recovery journal semantics.

Finally, we note that the closed-loop policy is suitable for use in
RTP/RTCP sessions that use multicast transport. However, aspects of the
closed-loop policy do not scale well to sessions with large numbers of
participants. The sender state scales linearly with the number of
receivers, as the sender needs to track the identity and M(k) value for
each receiver k. The average recovery journal size is not independent of
the number of receivers, as the RTCP reporting interval backoff slows
down the rate of a full update of M(k) values.  The backoff algorithm
may also increase the amount of ancillary state used by implementations
of the normative sender and receiver behaviors defined in Section 4.
Appendix B of the non-normative [22] describes multicast issues in
detail.


C.1.2.3. The open-loop sending policy

The open-loop policy is suitable for sessions that are not able to
implement the receiver-to-sender feedback required by the closed-loop
policy, and are also not able to use the anchored policy because of
bandwidth constraints.

The open-loop policy does not place constraints on how a sender chooses
the checkpoint packet for each packet in the stream. In the absence of
such constraints, a receiver may find that the recovery journal in the
packet that ends a loss event has a checkpoint history that does not
cover the entire loss event. We refer to loss events of this type as
uncovered loss events.
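
A receiver might detect an uncovered loss event as in the
non-normative Python sketch below. It assumes that the checkpoint
history begins with the checkpoint packet itself, so a loss event that
starts at the packet after the last one received is covered only if
the checkpoint packet is no later than that packet. This mirrors the
closed-loop inequality M(k) >= (N - 1). The name loss_is_uncovered is
hypothetical.

```python
# Non-normative sketch: detecting an uncovered loss event.
# Assumptions: both arguments are extended sequence numbers, and the
# checkpoint history starts at the checkpoint packet itself.

def loss_is_uncovered(last_received, checkpoint):
    """Return True if the journal does not span the whole loss."""
    first_lost = last_received + 1
    return checkpoint > first_lost
```

For example, if the last packet received before the loss had extended
sequence number 100, a journal whose checkpoint packet is 101 (or any
earlier packet) covers the loss, while a checkpoint packet of 102 does
not.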

To ensure that uncovered loss events do not compromise the recovery
journal mandate, the open-loop policy assigns specific recovery tasks to
senders, receivers, and the creators of session descriptions. The
underlying premise of the open-loop policy is that the indefinite
artifacts produced during uncovered loss events fall into two classes.

One class of artifacts are recoverable indefinite artifacts. Receivers
are able to repair recoverable artifacts that occur during an uncovered
loss event without intervention from the sender, at the potential cost
of unpleasant transient artifacts.

For example, after an uncovered loss event, receivers are able to repair
indefinite artifacts due to NoteOff (0x8) commands that may have
occurred during the loss event, by executing NoteOff commands for all
active NoteOn commands. This action causes a transient artifact (a
sudden silent period in the performance), but ensures that no stuck
notes sound indefinitely. We refer to MIDI commands that are amenable to
repair in this fashion as recoverable MIDI commands.
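
The NoteOff repair described above can be sketched as follows. This is
a minimal illustration in Python, not part of the MWPP definition: the
names NoteTracker and repair_uncovered_loss are hypothetical, and the
release velocity of 64 is an assumption. The sketch ignores NoteOn
commands with velocity 0, which a real receiver would also have to
treat as NoteOff commands.

```python
from typing import Dict, List, Set, Tuple


class NoteTracker:
    """Tracks sounding notes so they can be silenced after an uncovered loss."""

    def __init__(self) -> None:
        # channel (0-15) -> note numbers with an outstanding NoteOn
        self.active: Dict[int, Set[int]] = {ch: set() for ch in range(16)}

    def note_on(self, channel: int, note: int) -> None:
        self.active[channel].add(note)

    def note_off(self, channel: int, note: int) -> None:
        self.active[channel].discard(note)

    def repair_uncovered_loss(self) -> List[Tuple[int, int, int]]:
        """Return (status, note, velocity) NoteOff commands for all active notes."""
        commands = []
        for channel in range(16):
            for note in sorted(self.active[channel]):
                # 0x80 | channel is the NoteOff status octet for the channel.
                # The velocity value 64 is an assumption of this sketch.
                commands.append((0x80 | channel, note, 64))
            self.active[channel].clear()
        return commands
```

Executing the returned commands produces the transient silence noted
above, but leaves no note sounding indefinitely.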

A second class consists of unrecoverable indefinite artifacts.  If
this class of artifact occurs during an uncovered loss event, the
receiver is not able to repair the stream.

For example, after an uncovered loss event, receivers are not able to
repair indefinite artifacts due to Control Change (0xB) channel volume
(controller 7) commands that have occurred during the loss event. A
repair is impossible because the receiver has no way of determining the
data value of a lost channel volume command. We refer to MIDI commands
that are fragile in this way as unrecoverable MIDI commands.

The open-loop policy does not define a partition of the MIDI command
set into recoverable and unrecoverable commands. Instead, it assumes
that the creators of the session descriptions are able to come to
agreement on a suitable recoverable/unrecoverable MIDI command partition
for an application.

Given these definitions, we now state the normative requirements for the
open-loop policy.

In the open-loop policy, the creators of the session description MUST
use the ch_unused or ch_anchor SDP parameters to protect all
unrecoverable MIDI command types from indefinite artifacts. We
normatively define these parameters in Appendix C.1.3.

In a general sense, the ch_anchor parameter changes the recovery journal
semantics to use the anchored checkpoint policy (Appendix C.1.2.1) for a
command, and the ch_unused parameter acts to exclude a command type from
the MWPP stream. These options act to shield command types from
artifacts during an uncovered loss event.

In the open-loop policy, receivers MUST examine the Checkpoint Packet
Seqnum field of the recovery journal header after every loss event, to
check if the loss event is an uncovered loss event. Section 5 shows how
to perform this check. If an uncovered loss event has occurred, a
receiver MUST perform indefinite artifact recovery for all MIDI command
types that are not shielded by ch_anchor and ch_unused parameter
assignments in the session description.

The open-loop policy does not place specific constraints on the sender.
However, the open-loop policy works best if the sender manages the size
of the checkpoint history to ensure that uncovered losses occur
infrequently, by taking into account the delay and loss characteristics
of the network.  Also, as each checkpoint packet change incurs the risk
of an uncovered loss, senders should only move the checkpoint if it
reduces the size of the journal.


C.1.3. Recovery Journal Chapter Inclusion Parameters

Section 5 and Appendices A and B define the default semantics of the
recovery journal. In this Appendix, we define SDP parameters that modify
these semantics.

The recovery journal chapter definitions (Appendices A and B) specify
under what conditions a chapter must appear in the recovery journal.  In
most cases, the normative text states that if a certain MIDI command
type appears in the checkpoint history, a certain chapter type must
appear in the recovery journal to protect the command.

In this section, we describe the SDP chapter inclusion parameters.
These parameters modify the conditions under which a chapter appears in
the journal.

Each parameter represents a type of chapter inclusion semantics. An
assignment to a parameter declares which chapters (or chapter subsets)
obey the inclusion semantics. We describe the assignment syntax for
these parameters later in this section.

Below, we normatively define the semantics of the chapter inclusion
parameters. For clarity, we define the action of parameters on complete
chapters. If a parameter is assigned a subset of a chapter, the
definition applies only to the chapter subset.

  o  ch_unused. Chapters assigned to the ch_unused parameter MUST
     NOT appear in the recovery journal. In addition, all MIDI
     command types encoded by these chapters MUST NOT appear in
     the MIDI command sections of the packets in the stream.

  o  ch_never. Chapters assigned to the ch_never parameter MUST
     NOT appear in the recovery journal. However, unlike ch_unused,
     the MIDI command types encoded by these chapters MAY appear in
     the MIDI command sections of the packets in the stream.

  o  ch_anchor. Chapters assigned to the ch_anchor parameter obey a
     modified version of the chapter definition that appears in
     Appendices A or B. In the modified definition, all references
     to the checkpoint history are replaced with references to the
     session history, and all references to the checkpoint packet
     are replaced with references to the first packet sent in the
     stream. Senders MUST use this modified definition.

  o  ch_default. Chapters assigned to the ch_default parameter
     MUST follow the default semantics for the chapter (as defined
     in Appendices A and B).

Parameter assignments obey the following syntax:

  <parameter> = [channel list]<chapter list>[field list];

The chapter list is mandatory; the channel and field lists are optional.
Multiple assignments to these parameters have a cumulative effect, and
are applied in the order of parameter appearance. The ABNF in Appendix
C.6 provides a normative and concise statement of the syntax we define
below for channel, chapter, and field lists.

The chapter list specifies the channel and system chapters for which
this parameter applies, using a concatenated list of one or more upper-
case letters (ACDEMNPQTVW) corresponding to the chapter types.

The optional channel list specifies the channel journals for which this
parameter applies; if no channel list is provided, the parameter applies
to all channel journals. The channel list takes the form of a comma-
separated list of channel numbers (0 through 15) and dash-separated
channel number ranges (e.g. 0-5, 8-12). The channel list is
irrelevant for system chapters.

The optional field list is only relevant for Chapters C, N, and A.  For
Chapter C (coding the MIDI Control Change command), the field list codes
the controller numbers for which the parameter applies.  For Chapters N
and A, the field list codes the note numbers for which the parameter
applies. The syntax for field lists follows the syntax for channel
lists. If no field list is provided, the parameter applies to all
controller or note numbers.
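
As an informal illustration of the assignment syntax described above,
the Python sketch below splits a parameter value into its channel
list, chapter list, and field list. It is not a substitute for the
normative ABNF in Appendix C.6. The function names are hypothetical,
and the sketch assumes that the value contains no embedded whitespace
and does not range-check channel or field numbers.

```python
import re
from typing import List, Optional, Tuple

# [channel list]<chapter list>[field list], e.g. "4,11-13N" or "C7,64"
_ASSIGNMENT = re.compile(r"^([0-9,-]*)([A-Z]+)([0-9,-]*)$")


def parse_number_list(text: str) -> Optional[List[int]]:
    """Expand "4,11-13" to [4, 11, 12, 13]; an absent list yields None."""
    if not text:
        return None
    numbers: List[int] = []
    for item in text.split(","):
        if "-" in item:
            low, high = item.split("-")
            numbers.extend(range(int(low), int(high) + 1))
        else:
            numbers.append(int(item))
    return numbers


def parse_assignment(
    value: str,
) -> Tuple[Optional[List[int]], str, Optional[List[int]]]:
    """Return (channels, chapters, fields) for one parameter value."""
    match = _ASSIGNMENT.match(value.strip().rstrip(";"))
    if match is None:
        raise ValueError("malformed assignment: %r" % value)
    channels, chapters, fields = match.groups()
    return parse_number_list(channels), chapters, parse_number_list(fields)
```

A value of None for the channel or field list stands for "all channel
journals" or "all controller or note numbers", matching the defaults
described above.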

We have now concluded the normative definition of the chapter inclusion
parameters. These parameters are essential to the use of the open-loop
policy (Appendix C.1.2.3), and may also be used to simplify multicast
implementations of the closed-loop policy (Appendix C.1.2.2).

The parameters also serve to signal the types of MIDI commands that are
not in use in a session. This information lets parties tune their MWPP
algorithms to the requirements of the session. It also lets an
incomplete MWPP implementation, that only provides recovery journal
services for a subset of the MIDI command set, avoid sessions that use
the unsupported MIDI commands.

The example session description below illustrates the use of the SDP
chapter inclusion parameters:

v=0
o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
s=Example
t=0 0
m=audio 5004 RTP/AVP 96
c=IN IP4 192.0.2.94
a=rtpmap: 96 mwpp/44100
a=fmtp: 96 j_update=open-loop; ch_unused=WATCMDVQEX;
a=fmtp: 96 ch_anchor=P; ch_anchor=C7,64;
a=fmtp: 96 ch_never=4,11-13N;

The j_update parameter codes that the stream uses the open-loop policy.
Most chapters are assigned to ch_unused, a typical MIDI usage pattern of
a low-bandwidth stream.

To guard against indefinite artifacts, the MIDI Program Change command
and several MIDI Control Change controller numbers are assigned to
ch_anchor. Note that the ordering of the ch_anchor Chapter C assignment
after the ch_unused assignment acts to override the ch_unused assignment
for the listed controller numbers (7 and 64).

Chapter N for several MIDI channels is assigned to ch_never; in
practice, this assignment pattern would reflect knowledge about a
resilient rendering method in use for certain channels. In this example,
Chapter N for MIDI channels other than 4, 11, 12, and 13 may appear in
the recovery journal, per the default behavior.
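
The cumulative, ordered effect of the assignments in this example can
be sketched in Python as shown below. The sketch is illustrative only.
The names ASSIGNMENTS and inclusion_semantics are hypothetical, the
assignments are written with the parameter names defined in this
section, and the sketch assumes that a chapter with no matching
assignment follows the default semantics. It does not model the rule
that channel lists are irrelevant for system chapters.

```python
from typing import List, Optional, Sequence, Tuple

# (parameter, channel list or None, chapter letters, field list or None)
Assignment = Tuple[str, Optional[Sequence[int]], str, Optional[Sequence[int]]]

ASSIGNMENTS: List[Assignment] = [
    ("ch_unused", None, "WATCMDVQEX", None),
    ("ch_anchor", None, "P", None),
    ("ch_anchor", None, "C", [7, 64]),
    ("ch_never", [4, 11, 12, 13], "N", None),
]


def inclusion_semantics(
    assignments: Sequence[Assignment],
    channel: int,
    chapter: str,
    field: Optional[int] = None,
) -> str:
    """Return the parameter that governs a chapter (or chapter subset)."""
    result = "ch_default"  # assumption: unassigned chapters use the default
    for parameter, channels, chapters, fields in assignments:
        if chapter not in chapters:
            continue
        if channels is not None and channel not in channels:
            continue
        if fields is not None and field not in fields:
            continue
        result = parameter  # a later matching assignment overrides
    return result
```

For controller 7 on any channel, the Chapter C assignment to ch_anchor
is applied after the assignment to ch_unused, so the function returns
"ch_anchor"; for controller 1 it returns "ch_unused".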


Appendix C.2. SDP Definitions: Command Execution Semantics

As defined in Section 3, the MIDI command section of the MWPP payload
consists of a list of MIDI commands, each with an associated command
timestamp. By default, a command timestamp indicates the execution time
for the command. If two commands have identical timestamps, the commands
execute simultaneously.

This default timestamp behavior is not a good fit for the MIDI wire
protocol [1]. The MIDI wire protocol, a networking standard for the
remote control of musical instruments over serial lines, does not send
timestamps over the wire. Instead, MIDI commands are placed on the wire
at the moment of occurrence, and receivers infer the timestamp from the
moment of reception. In this memo, we refer to this coding technique as
an "implicit" or a "time-of-arrival" code.

As these names suggest, it is not possible to code two simultaneous MIDI
commands over the MIDI wire protocol, because two commands cannot be
simultaneously sent over a serial line. If two musical events occur at
the same moment in time, a wire protocol sender arbitrarily sends one
MIDI command first, followed by the second MIDI command. The wire
protocol receiver sees a sequence of MIDI commands offset in time, but
cannot tell if the MIDI command offsets are serialization artifacts or
genuine event timing offsets played by the musician.

This Appendix defines alternative semantics of MIDI command timestamps,
for use in transcoding time-of-arrival MIDI data streams into MWPP
packets. The optional SDP parameter tsmode codes the choice of timestamp
semantics. The tsmode parameter takes on one of three symbolic values:
comex, async, or buffer.

The comex value indicates the default "command execution timestamp"
semantics defined in Section 3. The async and buffer values code two
different methods for coding MIDI wire protocol data, which we describe
in sub-sections C.2.1 and C.2.2 below.

The async and buffer methods are based on a simple idea: each method
describes a sampling algorithm to sense data octets on a MIDI wire. The
async and buffer methods use several SDP parameters to describe the
physical properties of the sampling algorithm, in order to describe a
wide range of plausible hardware and operating system environments.

One such SDP parameter is linerate. The linerate parameter codes the
timespan of one octet on the serial line. The linerate parameter has
units of nanoseconds, and takes on integral values. For the MIDI wire
protocol as defined in [1], linerate is 320,000 nanoseconds. Implicit
MIDI data sent over other physical layers (such as IEEE-1394) might
require a different linerate value. If linerate is not specified, it is
considered to be undefined.
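
The linerate value quoted above follows from the serial framing of the
MIDI wire protocol: each octet occupies 10 bit times (one start bit,
eight data bits, one stop bit) at 31250 bits per second. The Python
sketch below reproduces the arithmetic; the function name linerate_ns
is hypothetical, and other physical layers would need their own
framing assumptions.

```python
def linerate_ns(bits_per_second: int, bits_per_octet: int = 10) -> int:
    """Timespan of one octet on the serial line, in nanoseconds."""
    return bits_per_octet * 1_000_000_000 // bits_per_second


# MIDI wire protocol: 31250 bits per second, 10 bit times per octet.
MIDI_LINERATE = linerate_ns(31250)  # 320000
```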

We now describe the async and buffer methods in detail.

C.2.1 Description of the async method

The async method assumes an asynchronous sampling of the MIDI serial
line. At the moment a complete octet is received, it is labelled with an
accurate wall-clock time value, whose units match the units of the RTP
header timestamp field.

The SDP parameter octpos defines how MWPP command timestamps are derived
from these octet timestamps. If octpos has the symbolic value first, a
MIDI command timestamp codes the time value for the first octet of the
MIDI command. If octpos has the symbolic value last, a MIDI command
timestamp codes the time value for the last octet of the MIDI command.
If an octpos parameter does not appear in the session description, the
MIDI command timestamp value may reflect any octet of the MIDI command.

Note that the octpos value refers to the first or last octet of the MIDI
command as it appears on the MIDI wire, not the MIDI command as it
appears in the MWPP packet. This distinction is important for cases
where the MWPP command representation includes extra octets that do not
appear on the MIDI wire. For example, if a MIDI command appears on the
wire using running status coding, and this command becomes the first
command in the MIDI command section of an MWPP packet, the MWPP
representation begins with a status octet that did not appear in the
original MIDI source on the wire, and the P header bit of the MIDI
command section is set to 1.

In the case of segmented SysEx commands (see Section 3), the octpos
rules apply to the octets of the SysEx command segment as they appear on
the MIDI wire.

We now show a session description example for the async method.
Consider an MWPP sender that is transcoding a MIDI wire protocol command
stream into an MWPP UDP RTP stream. The sender runs on a computing
platform that time stamps every incoming octet on the MIDI cable serial
line, and the sender chooses to use the timestamp of the first octet of
each command as the MIDI command timestamp. This stream description
accurately describes the transcoding:

m=audio 5004 RTP/AVP 96
c=IN IP4 192.0.2.94
a=rtpmap: 96 mwpp/44100
a=fmtp: 96 tsmode=async;linerate=320000;octpos=first;
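
The octpos rule for the async method can be sketched in Python as
shown below. The function name command_timestamp is hypothetical. The
list octet_times holds the time values of the octets of one command as
they appeared on the MIDI wire, in RTP timestamp units. When octpos is
absent, any octet is permitted; the choice of the first octet in that
case is an assumption of this sketch.

```python
from typing import Optional, Sequence


def command_timestamp(octet_times: Sequence[int], octpos: Optional[str]) -> int:
    """Derive a command timestamp from the per-octet wire timestamps."""
    if not octet_times:
        raise ValueError("a command has at least one octet on the wire")
    if octpos == "first":
        return octet_times[0]
    if octpos == "last":
        return octet_times[-1]
    if octpos is None:
        # Any octet is allowed when octpos is absent; the first octet
        # is an arbitrary choice made by this sketch.
        return octet_times[0]
    raise ValueError("unknown octpos value: %r" % octpos)
```

For a three-octet command whose octets arrive at times 1000, 1014, and
1028 (roughly one octet time apart at 44100 Hz), octpos=first yields
1000 and octpos=last yields 1028.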

C.2.2 Description of the buffer method

The buffer method uses a synchronous sampling of the MIDI wire data. In
this model, each arriving octet on the MIDI wire is placed in a buffer,
without adding a timestamp.

At periodic intervals, the MWPP sender examines the buffer. The sender
removes complete MIDI commands from the buffer, and places those
commands into the MIDI command section of an MWPP packet. The command
timestamp reflects the actual moment of buffer examination, expressed in
the units of the RTP timestamp field. Note that in this coding scheme,
several commands may have the same command timestamp.

The SDP parameter mperiod defines the nominal periodic sampling interval
for the buffer tsmode. The mperiod parameter takes on positive integral
values, and has units of the RTP timestamp field.

The SDP parameter octpos (described in C.2.1 for the async method) is
also defined for the buffer method, but takes on different semantics.
These semantics address the choice of the command timestamp for MIDI
commands whose octets appear on the MIDI wire across several sampling
periods.

If octpos takes on the symbolic value first, the command timestamp
reflects the arrival period of the first octet of the command on the
wire. If octpos takes on the symbolic value last, the command timestamp
reflects the arrival period of the last octet of the command on the
wire. Note that if octpos takes on the symbolic value first, and the P
header bit of the MIDI Command Section is set to 1, the command
timestamp for the first channel command in the MIDI List reflects the
arrival period of the first data octet of the command, not the (phantom)
status octet of the command.

If an octpos parameter does not appear in the session description, MIDI
commands whose octets appear across several sampling periods may take on
the timestamp value associated with any arrival period of an octet in
the command. In the case of segmented SysEx commands (see Section 3),
the octpos rules apply to the octets of the SysEx command segment as
they appear on the MIDI wire.

We now show a session description example for the buffer method.
Consider an MWPP sender that is transcoding a MIDI wire protocol command
stream into an MWPP UDP RTP stream.  The sender runs on a computing
platform that places MIDI serial line data into a buffer upon receipt,
without timestamps.

The sender polls the buffer 1000 times a second, extracts all complete
commands from the buffer, and places them in the MIDI command section of
an MWPP packet. All of the MIDI command timestamps in this packet are
identical, and reflect the actual clock value at the sampling instant,
in RTP timestamp units. This stream description accurately describes the
transcoding:

m=audio 5004 RTP/AVP 96
c=IN IP4 192.0.2.94
a=rtpmap: 96 mwpp/44100
a=fmtp: 96 tsmode=buffer;linerate=320000;octpos=last;mperiod=44;

Note that mperiod takes on an integral value, and has the units of the
RTP timestamp field. In this example, the mperiod value of 44 is derived
by dividing the rtpmap srate (44100 Hz) by the 1000 Hz buffer sampling
rate, and rounding to the nearest integer.  The MIDI command timestamps
might not advance by exact multiples of 44, as the actual buffer
sampling period might not precisely match the nominal sampling period.
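
The mperiod derivation above can be written as a short Python sketch.
The function name nominal_mperiod is hypothetical. The integer
arithmetic rounds to the nearest integer (halves round up, which is a
choice made by this sketch), and the result is kept positive, as
mperiod takes on positive integral values.

```python
def nominal_mperiod(srate_hz: int, polls_per_second: int) -> int:
    """Nominal buffer sampling interval, in RTP timestamp units."""
    rounded = (2 * srate_hz + polls_per_second) // (2 * polls_per_second)
    return max(1, rounded)


# 44100 Hz timestamp clock, buffer polled 1000 times a second.
MPERIOD = nominal_mperiod(44100, 1000)  # 44
```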

Appendix C.3. SDP Definitions: Media Time

In Section 2.1, we define the media time of an MWPP RTP packet as the
RTP timestamp difference (modulo 2^32) between the packet's successor
and the packet itself.

By default, the media time for a packet may be arbitrarily long.  For
example, consider an MWPP stream that codes the real-time behavior of a
musician playing a piano keyboard. If the musician does not play a note
for several seconds, there is no reason to send a new packet, and so the
media time of the last packet sent may grow without bound.

However, for some applications, it is desirable to set a maximum media
time for an MWPP packet, that is independent of the source rate of MIDI
event data. This constraint acts to set a minimum packet sending rate,
which may simplify algorithms performing clock-skew compensation,
network latency estimation, and packet loss recovery.

Applications may use the SDP maxptime parameter (defined in [9]) for
this purpose. The maxptime parameter specifies the maximum amount of
media time an MWPP packet encodes, in units of milliseconds. For
example, this stream description sets a maximum media time of 0.5
seconds, and thus a minimum packet rate of 2 Hz:

m=audio 5004 RTP/AVP 96
c=IN IP4 192.0.2.94
a=rtpmap: 96 mwpp/44100
a=fmtp: 96 maxptime=500;
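
The relationship between media time, maxptime, and the minimum packet
rate can be sketched in Python as shown below. The function names are
hypothetical; the media time computation follows the modulo 2^32
definition quoted at the start of this Appendix.

```python
def media_time(timestamp: int, successor_timestamp: int) -> int:
    """Media time of a packet, in RTP timestamp units (modulo 2^32)."""
    return (successor_timestamp - timestamp) % 2**32


def max_media_time_units(maxptime_ms: int, srate_hz: int) -> int:
    """The maxptime bound converted to RTP timestamp units."""
    return maxptime_ms * srate_hz // 1000


def min_packet_rate_hz(maxptime_ms: int) -> float:
    """Minimum packet sending rate implied by maxptime."""
    return 1000 / maxptime_ms
```

With maxptime=500 and a 44100 Hz timestamp clock, the bound is 22050
timestamp units, and the minimum packet rate is 2 Hz.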

Appendix C.4. SDP Definitions: Multiple Streams

Several MWPP streams may appear in a session description. By default,
the MIDI name space (16 voice channels + systems) for each MWPP stream
is unique, and the rendering for each MWPP stream proceeds
independently. The audio outputs of the streams are presented
simultaneously, using the standard synchronization and audio mixing
conventions for RTP.

In this Appendix, we define two SDP parameters for use in sessions with
several MWPP streams. These parameters (midiport and zerosync) add three
features to MWPP:

  1. Several MWPP streams may target the same MIDI name space.

  2. Several MWPP streams may be bundled to form a larger MIDI
     name space, that a single rendering system may treat as
     an ordered entity.

  3. Receivers may be informed of the synchronized behavior of the
     RTP timestamp fields of several MWPP streams, to simplify the
     time-locked rendering of multi-stream MWPP systems.

In Sections C.4.1 and C.4.2, we normatively define the midiport and
zerosync parameters. In Section C.4.3, we show a series of examples,
that illustrate the feature set described above.

C.4.1 The midiport parameter

The midiport SDP parameter codes an arbitrary identification number for
the MIDI name space (16 voice channels + systems) of an MWPP stream. The
midiport parameter may take on integer values between 0 and 4294967295.

If several MWPP streams in a session share the same midiport value, the
streams target the same MIDI name space. We refer to this relationship
as the identity relationship.

If several MWPP streams in a session have contiguous midiport values
(i.e. i, i+1, ... i+k), the name spaces of the MWPP streams form an
ordered entity. In this case, the streams in the entity are said to
share an ordered relationship.

Note that streams may participate in both an identity and an ordered
relationship, if MWPP streams in an identity relationship have a
midiport value that forms part of an ordered relationship. If the
midiport values of two MWPP streams are not part of an ordered or
identity relationship, the two streams are independent, and have
independent MIDI name spaces.
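
As an informal illustration, the Python sketch below classifies the
streams of a session by their midiport values. The function names
identity_groups and ordered_entities are hypothetical, and the sketch
assumes that every stream in the session carries a midiport parameter.

```python
from collections import defaultdict
from typing import Dict, List


def identity_groups(midiports: List[int]) -> Dict[int, List[int]]:
    """Map each shared midiport value to the indices of its streams."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for index, port in enumerate(midiports):
        groups[port].append(index)
    # Only two or more streams with the same value share an identity.
    return {port: idx for port, idx in groups.items() if len(idx) > 1}


def ordered_entities(midiports: List[int]) -> List[List[int]]:
    """Return the runs of contiguous midiport values (i, i+1, ... i+k)."""
    runs: List[List[int]] = []
    current: List[int] = []
    for port in sorted(set(midiports)):
        if current and port == current[-1] + 1:
            current.append(port)
        else:
            if len(current) > 1:
                runs.append(current)
            current = [port]
    if len(current) > 1:
        runs.append(current)
    return runs
```

For midiport values 5, 6, 6, and 9, the second and third streams share
an identity relationship, the values 5 and 6 form an ordered entity,
and the stream with midiport 9 is independent.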

MWPP streams in an ordered or identity relationship MUST be all native
MWPP streams or all mpeg4-generic MWPP streams. Thus, we refer to
relationships as being native relationships or mpeg4-generic
relationships.

All streams in an ordered or identity mpeg4-generic relationship
generate audio using the same instance of the synthesis engine, and thus
the following restrictions apply:

  1. All streams in an identity or ordered relationship must have
     the same profile-level-id (74 for Main Synthetic, 75 for
     Wavetable Synthesis, 76 for General MIDI).

  2. Ordered relationships MUST NOT be used with Wavetable Synthesis
     or General MIDI object types, because these systems are only
     defined for 16 MIDI voice channels. Ordered relationships MAY
     be used with the Main Synthetic object type, and follow the
     MIDI semantics defined in 5.14.3.2.2. of [5].

  3. At most one of the streams in an identity or ordered
     relationship may have a config parameter value other than
     the empty string. In this case, the non-empty config value
     configures the stream. Alternatively, the config parameter
     for all streams may be set to the empty string. In this case,
     exactly one stream in the relationship MUST define the
     configuration using the tools described in Section C.5.

For MWPP streams in an ordered or identity native relationship, at most
one stream may specify a MIDI renderer (using the tools described in
C.5). Each MIDI rendering type may define its own semantics with regard
to identity and ordered relationships.

In an identity relationship, the sender partitions a MIDI name space (16
voice channels + systems) into several MWPP streams. Receivers may
process these streams independently, or may merge the streams to
reconstitute the original MIDI command stream. We now specify receiver
and sender responsibilities to ensure the robust transmission of
identity relationships.

Receivers that merge identity relationship streams into a single MIDI
command stream MUST maintain the structural integrity of the MIDI
commands coded in each MWPP stream during the merging process, in the
same way that software that merges traditional MIDI wire protocol flows
is responsible for creating a merged command flow compatible with [1].

Senders MUST partition the name space so that the rendered MIDI
performance does not contain indefinite artifacts (as defined in Section
4). This responsibility holds even if all streams are sent over reliable
transport, as imperfect synchronization of reliable streams may yield
indefinite artifacts. For example, stuck notes may occur if a single-
channel MIDI performance is split over two TCP streams, if NoteOn
commands are sent on the first stream and NoteOff commands are sent on
the second stream.

Senders MUST NOT split a Registered Parameter Number (RPN) or Non-
Registered Parameter Number (NRPN) transaction appearing on a MIDI
channel across multiple identity relationship streams. Receivers MUST
assume that the RPN/NRPN transactions that appear on different identity
relationship streams are independent, and MUST preserve transactional
integrity during the MIDI merge.

A simple way to safely partition voice channel commands is to place all
MIDI commands for a particular voice channel into the same MWPP stream.
Safe partitions of systems commands may be more complex for streams that
extensively use System Exclusive commands. In [22], we discuss identity
partitioning issues in detail.
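
A receiver-side merge that respects these rules can be sketched in
Python as shown below. This is one possible approach, not a method
defined by this memo. The name merge_identity_streams is hypothetical.
Each unit is a complete MIDI command (or a complete RPN/NRPN
transaction) paired with a timestamp, so the merge never interleaves
octets from different units. The sketch assumes that each input stream
is already ordered by timestamp, that the timestamps of all streams
share one timebase, and that timestamp wraparound does not occur.

```python
import heapq
from typing import Iterable, List, Tuple

# (timestamp, octets of one complete command or RPN/NRPN transaction)
Unit = Tuple[int, bytes]


def merge_identity_streams(*streams: Iterable[Unit]) -> List[Unit]:
    """Merge time-ordered streams; each unit is kept intact."""
    return list(heapq.merge(*streams, key=lambda unit: unit[0]))
```

Units with equal timestamps keep the order of the input streams, so
the relative order of commands within a single stream is preserved.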

C.4.2 The zerosync parameter

The RTP timestamp value of the first packet in a stream is not set to
zero. Instead, the RTP standard [2] mandates that the RTP timestamp is
initialized to a randomly chosen value, to guard against plaintext
attacks on encrypted streams. As a consequence, a receiver cannot
directly use RTP timestamps to play back two RTP streams in sync, even
if the sender is generating synchronized timestamps for the streams.

Note that the Real Time Control Protocol (RTCP), a low-bandwidth
feedback channel that is paired with each RTP stream, includes a
synchronization feature. Certain types of RTCP packets code the current
time in two forms: the format of the RTP timestamp, and the 64-bit
Network Time Protocol (NTP) format.  A receiver may examine the NTP
timestamps of several RTCP streams, and use this information to compute
the ongoing temporal relationship between the RTP streams associated
with the RTCP streams.

For many MWPP applications, this RTCP-based method is a good way to
synchronize streams. In some applications, however, this method is not
optimal, because of the synchronization time delay at the start of the
session.

The SDP parameter zerosync provides an alternative mechanism for MWPP
stream synchronization. The zerosync parameter codes the RTP timestamp
offsets for each stream, so that streams that are generated in a
synchronized fashion may be played back in sync without using RTCP
feedback. The use of the zerosync parameter weakens the security of RTP,
as discussed in Section 7 of this memo.

The zerosync parameter supports two different ways to normalize RTP
timestamp fields. One mechanism is in effect if the zerosync parameter
takes on integer values in the range 1 to 4294967295. A second mechanism
is in effect if the zerosync parameter takes on the special value 0.

We first describe the synchronization behavior for non-zero values of
zerosync. This synchronization mechanism is designed for use with a set
of MWPP streams that form an ordered or identity relationship.  For a
relationship to use this mechanism, all streams in the relationship MUST
include a zerosync parameter set to a non-zero value, and the srate
rtpmap parameter (see Section 6.1) of all streams in the relationship
MUST have the same value.

Given these conditions, the normalized RTP timestamp for a packet in a
stream is computed by subtracting (modulo 2^32) the stream zerosync
parameter value from the original RTP timestamp of the packet.
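
The normalization above amounts to a single modular subtraction, as
the Python sketch below shows. The function name normalized_timestamp
is hypothetical.

```python
def normalized_timestamp(rtp_timestamp: int, zerosync: int) -> int:
    """Subtract the stream zerosync value from an RTP timestamp, mod 2^32."""
    return (rtp_timestamp - zerosync) % 2**32
```

For a stream with zerosync=1726, a packet whose RTP timestamp is 1726
normalizes to 0. For a stream with zerosync=726, a packet whose RTP
timestamp has wrapped around to 100 normalizes to 4294966670.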

Next, we describe the synchronization behavior for zero-valued zerosync
parameters. All streams in a session with zerosync = 0 are generated
from a single RTP timebase. In other words, these streams simply ignore
the RTP requirement for random timestamp offsets.  All streams whose
zerosync values are set to 0 MUST have the same srate rtpmap parameter
value.

Note that a stream description may contain, at most, one zerosync
parameter assignment. A stream may participate in a non-zero-valued
zerosync behavior or a zero-valued zerosync behavior, but not both.

C.4.3 Multi-stream examples using midiport and zerosync.

This section shows several session description examples that use the
midiport and zerosync parameters.

Our first session description example shows two mpeg4-generic MWPP
streams that drive the same General MIDI decoder.

v=0
o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
s=Example
t=0 0
m=audio 5004 RTP/AVP 61
c=IN IP4 192.0.2.94
a=rtpmap: 61 mpeg4-generic/44100
a=fmtp: 61 streamtype=5; mode=mwpp; config="e4"; profile-level-id=76;
a=fmtp: 61 midiport=12;zerosync=1726
m=audio 5006 RTP/AVP 62
c=IN IP4 192.0.2.94
a=rtpmap: 62 mpeg4-generic/44100
a=fmtp: 62 streamtype=5; mode=mwpp; config=""; profile-level-id=76;
a=fmtp: 62 midiport=12;zerosync=726

The two UDP streams in the session use different UDP ports (5004/5006)
that map to different RTP header PT field values (61 and 62). The
profile-level-id codes General MIDI. Note that only one config parameter
is set to a non-empty string. The midiport values indicate the streams
share an identity relationship; the presence of zerosync parameters with
non-zero values establishes the synchronization mechanism.

A variant on this example, whose session description is not shown, is to
have two streams in an identity relationship driving the same MIDI
renderer, each with a different transport type. One stream would use
UDP, and would be dedicated to real-time messages. A second stream would
use TCP, and would be dedicated to sending reliable bulk SysEx dumps.

In the next example, two mpeg4-generic MWPP streams form an ordered
relationship to drive a Structured Audio decoder with 32 MIDI voice
channels.

v=0
o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
s=Example
t=0 0
m=audio 5004 RTP/AVP 61
c=IN IP4 192.0.2.94
a=rtpmap: 61 mpeg4-generic/44100
a=fmtp: 61 streamtype=5; mode=mwpp; config=""; profile-level-id=74;
a=fmtp: 61 midiport=5;zerosync=0;
m=audio 5006 RTP/AVP 62
c=IN IP4 192.0.2.94
a=rtpmap: 62 mpeg4-generic/44100
a=fmtp: 62 streamtype=5; mode=mwpp; config=""; profile-level-id=74;
a=fmtp: 62 midiport=6;zerosync=0;
a=fmtp: 62 render=sasc; url="http://second.example.com/cardinal.sasc";
a=fmtp: 62 cid="azsldkaslkdjqpwojdkmsldkfpe";

The sequential midiport pattern for the two streams establishes the
ordered relationship; the profile-level-id values of 74 indicate Main
Synthetic (i.e. Structured Audio). The midiport=5 stream maps to
Structured Audio extended channels range 0-15, the midiport=6 stream
maps to Structured Audio extended channels range 16-31.  Note the use of
the zero-valued zerosync option.

Finally, note that both config strings are empty. The configuration
information for the Structured Audio decoder is specified in the final
two fmtp lines of the second media stream description. In Appendix C.5,
we describe the coding used in these lines in detail.


Appendix C.5. SDP Definitions: MIDI Rendering

A MIDI command stream codes a series of high-level events, such as the
onset and termination of musical notes. A receiver turns this event
stream into audio (or, in some applications, into control actions such
as the dimming of stage lights) by applying a MIDI rendering algorithm.

By default, native MWPP streams do not specify a rendering algorithm.
This default behavior assumes that the rendering algorithm is sent in-
band, via MIDI System Exclusive commands. The minimal native MWPP stream
description in Section 6.1 exhibits this default behavior.

In contrast, the default rendering algorithm for mpeg4-generic MWPP
streams is the MPEG 4 synthesis algorithm coded in the SDP config
parameter. The minimal mpeg4-generic MWPP stream description in Section
6.2 exhibits this default behavior.

In this Appendix, we define the SDP parameter "render" to override these
default rendering methods. Uses of the render parameter must obey the
restrictions defined in Appendix C.4.1.

This document defines two symbolic values for render: "default" and
"sasc". Other standards-track IETF documents may define additional
values for render. Receivers MUST NOT participate in sessions if the
session description sets the SDP render parameter to a value that is not
known by the receiver.

We anticipate that the standards-track IETF documents that extend the
render parameter will define registration hierarchies for rendering
algorithms, whose management will be independent of the IETF AVT Working
Group. Candidate hierarchies include the Manufacturer ID registration
tree used in MIDI System Exclusive commands [1], and hierarchies based
on the DNS registration tree.

If the SDP parameter render takes on the value "default", the stream
uses the default rendering method, as defined in Section 6.1 (for native
MWPP streams) or Section 6.2 (for mpeg4-generic MWPP streams).

We describe the use of the sasc value for the render parameter in the
following sub-section.

C.5.1 The sasc Method

The sasc method supports the flexible transport of the MPEG 4 Audio
AudioSpecificConfig() binary data block. This structure may contain the
configuration data for the General MIDI [1], DLS2 [18], or Structured
Audio [5] synthesis methods, as specified in [5].

Only an mpeg4-generic MWPP stream description may use the sasc method.
To signal the use of sasc, the config parameter for the mpeg4-generic
stream MUST be set to the empty string, AND the SDP render parameter
MUST be set to the symbolic value sasc.

Two AudioSpecificConfig() transport parameters are defined by the sasc
method:

  o  The SDP parameter url may be assigned a string that contains
     a Uniform Resource Locator (URL) to the AudioSpecificConfig()
     data.

  o  The SDP parameter inline may be assigned a string that contains
     a Base64 encoding of a representation of AudioSpecificConfig().

Exactly one url parameter assignment or exactly one inline parameter
assignment MUST appear in a stream description that uses the sasc
method. The url and inline parameters MUST NOT both appear in the same
stream description.
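
The constraints on sasc stream descriptions stated so far can be
checked mechanically. The Python sketch below is one possible check,
not a normative procedure. The function name check_sasc_params and the
params dictionary (fmtp parameter names mapped to unquoted values) are
hypothetical. A dictionary cannot represent a parameter that is
assigned twice, so the sketch does not detect duplicate url or inline
assignments.

```python
def check_sasc_params(params):
    """Return a list of rule violations for one stream description.

    params is a hypothetical dict of fmtp parameter names to values,
    with surrounding double quotes already removed.
    """
    problems = []
    if params.get("render") != "sasc":
        return problems  # the sasc rules do not apply to this stream
    if params.get("mode") != "mwpp":
        problems.append("sasc requires an mpeg4-generic MWPP stream")
    if params.get("config") != "":
        problems.append("config must be the empty string")
    has_url = "url" in params
    has_inline = "inline" in params
    if has_url and has_inline:
        problems.append("url and inline must not both appear")
    elif not (has_url or has_inline):
        problems.append("one of url or inline must appear")
    return problems
```

An empty result means that the sketch found no violation; it does not
show that the stream description is valid in every other respect.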

The sasc method is based on MIME [17]. We consider sasc to be a MIME
subtype for the audio media type. The SDP parameters we define in the
remainder of this sub-section may also act as MIME parameters for the
audio/sasc MIME type. If the url parameter is used in a stream
description, the coded URL SHOULD return a MIME document of type
audio/sasc.

We define the following SDP/MIME parameters for use with the sasc
method:

  o compr. The compr parameter indicates which lossless compression
    algorithm is in use to reduce the size of AudioSpecificConfig().
    Compression occurs before any content transfer encoding (such as
    the Base64 encoding for the inline parameter).

    This memo defines two legal values for compr: none (for no
    compression) and gzip (for the gzip compression algorithm as
    defined in [19]). The default value for compr is gzip.

    The compr parameter is an extensible parameter; other IETF
    documents may define new compression methods. Receivers MUST
    NOT participate in a session if the session description sets
    the compr parameter to a value that is not known by the receiver.

  o cid. The cid parameter is assigned a string value that
    encodes a globally unique identifier for the content encoded
    in the AudioSpecificConfig().

    The cid value supports cache management: if a receiver notices
    it has previously used an AudioSpecificConfig(), it can avoid
    redundant transmission or decoding.

    If an AudioSpecificConfig() is coded in a MIME document, the
    Content-ID header [17] value MUST match the cid value in the
    stream description. Using the cid parameter in a MIME document
    is legal but redundant, because Content-ID also codes the string.
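
Because compression occurs before content transfer encoding, a
receiver undoes the Base64 coding first and the compression second.
The Python sketch below shows this ordering for the inline parameter,
using only the standard base64 and gzip modules. The function names
are hypothetical, and the sketch supports only the two compr values
defined in this memo.

```python
import base64
import gzip


def encode_inline(config_octets, compr="gzip"):
    """Compress first, then apply the Base64 transfer encoding."""
    if compr == "gzip":
        config_octets = gzip.compress(config_octets)
    elif compr != "none":
        raise ValueError("unknown compr value: " + compr)
    return base64.b64encode(config_octets).decode("ascii")


def decode_inline(inline_value, compr="gzip"):
    """Undo the Base64 coding first, then the compression."""
    raw = base64.b64decode(inline_value, validate=True)
    if compr == "none":
        return raw
    if compr == "gzip":
        return gzip.decompress(raw)
    # Mirrors the rule that receivers reject unknown compr values.
    raise ValueError("unknown compr value: " + compr)
```

The default argument value follows the default for compr stated above.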

If these parameters are in use for a stream, SDP fmtp lines that assign
values to these parameters MUST appear in the session description. In
addition, if the stream description uses the url parameter to encode a
MIME document, the MIME version of these parameters SHOULD appear in the
MIME document, unless the parameter definition indicates otherwise.

We now show stream description examples for the sasc method. The stream
description below uses the inline SDP parameter to code the
AudioSpecificConfig() block for an mpeg4-generic General MIDI stream.
This stream has the same characteristics as the example shown in Section
6.2.

m=audio 5004 RTP/AVP 61
c=IN IP4 192.0.2.94
a=rtpmap: 61 mpeg4-generic/44100
a=fmtp: 61 streamtype=5; mode=mwpp; config=""; profile-level-id=76;
a=fmtp: 61 render=sasc; inline="e4"; compr=none;

Note the empty string value for config, and the presence of the render
parameter. We use a General MIDI stream in this example for didactic
purposes; in practice, the sasc method would not be used for a General
MIDI stream, because the configuration string is trivially short.

The stream description below uses the url SDP parameter to code the
AudioSpecificConfig() block for the same General MIDI stream:

m=audio 5004 RTP/AVP 61
c=IN IP4 192.0.2.94
a=rtpmap: 61 mpeg4-generic/44100
a=fmtp: 61 streamtype=5; mode=mwpp; config=""; profile-level-id=76;
a=fmtp: 61 render=sasc; url="http://first.example.net/oski.sasc";
a=fmtp: 61 cid="xjflsoeiurvpa09itnvlduihgnvet98pa3w9utnuighbuk";

In this example, the MIME-encoded document oski.sasc, of MIME type
audio/sasc, contains the AudioSpecificConfig(). The default gzip
compression is used on the AudioSpecificConfig(), and the cid value
matches the Content-ID value of oski.sasc.
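
The examples in this Appendix spread the parameters for one payload
type over several fmtp lines. The Python sketch below shows one way a
receiver might gather these assignments into a single table per
payload type. It is an illustration under our own assumptions (the
function names are hypothetical, and the parsing is far more lenient
than a real SDP parser), not a complete implementation.

```python
def split_assignments(text):
    """Split on ';' characters that lie outside double quotes."""
    items, current, quoted = [], [], False
    for ch in text:
        if ch == '"':
            quoted = not quoted
        if ch == ";" and not quoted:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))
    return [item.strip() for item in items if item.strip()]


def parse_fmtp_lines(lines):
    """Collect fmtp parameters, keyed by payload type string."""
    streams = {}
    for line in lines:
        if not line.startswith("a=fmtp:"):
            continue  # ignore rtpmap and all other SDP lines
        body = line[len("a=fmtp:"):].strip()
        payload, _, rest = body.partition(" ")
        params = streams.setdefault(payload, {})
        for item in split_assignments(rest):
            name, _, value = item.partition("=")
            params[name.strip()] = value.strip().strip('"')
    return streams
```

Applied to the url example above, the table for payload type 61 holds
an empty config value, render set to sasc, and the url and cid
strings without their surrounding quotes.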


Appendix C.6. ABNF Specifications for MWPP Parameters

In this Appendix, we formally define the syntax for the MWPP parameters,
using ABNF [23]. MWPP parameters appear in the fmtp lines of session
descriptions for native or mpeg4-generic MWPP streams.  An fmtp line may
be defined in the following way:

;
; SDP fmtp line definition
;

fmtp = "a=fmtp:" token 1*(param-assign ";") CRLF

where <token> codes the RTP payload type. At the end of this Appendix,
we define <param-assign> as a set of 17 incremental rules, one for each
custom parameter listed in Figure C.1 of the Appendix C preamble.

The mpeg4-generic RTP packetization [4] defines a mode parameter that
signals the type of MPEG stream in use. We extend the mode parameter to
signal an MWPP mpeg4-generic stream, using the ABNF rule below:

;
; mpeg4-generic mode parameter extension
;

mode              /= "mwpp"
                  ; as described in Section 6.2 of this memo

Two of the parameters listed in Figure C.1 ("compr" and "cid") may
appear in the Content-Type field of audio/sasc MIME documents.  The
<parameter> rule for MIME headers defined in Appendix A of [17] is
compatible with the definitions of "compr" and "cid" in the MWPP
parameter ABNF listed below.

;
; top-level definition for all MWPP parameters
;

param-assign  = "j_sec"      "=" ("none" / "recj" / (*ietf-extension))
                              ; described in Appendix C.1
                              ; for audio/mwpp and audio/mpeg4-generic

param-assign /= "j_update"   "=" ("anchor" / "closed-loop" / "open-loop")
                              ; described in Appendix C.1
                              ; for audio/mwpp and audio/mpeg4-generic

param-assign /= "ch_default" "=" ([channel-list] chapter-list [field-list])
                              ; described in Appendix C.1
                              ; for audio/mwpp and audio/mpeg4-generic

param-assign /= "ch_unused"  "=" ([channel-list] chapter-list [field-list])
                              ; described in Appendix C.1
                              ; for audio/mwpp and audio/mpeg4-generic

param-assign /= "ch_never"   "=" ([channel-list] chapter-list [field-list])
                              ; described in Appendix C.1
                              ; for audio/mwpp and audio/mpeg4-generic

param-assign /= "ch_anchor"  "=" ([channel-list] chapter-list [field-list])
                              ; described in Appendix C.1
                              ; for audio/mwpp and audio/mpeg4-generic

param-assign /= "tsmode"     "=" ("comex" / "async" / "buffer")
                              ; described in Appendix C.2
                              ; for audio/mwpp and audio/mpeg4-generic

param-assign /= "linerate"   "=" nonzero-four-octet
                              ; described in Appendix C.2
                              ; for audio/mwpp and audio/mpeg4-generic

param-assign /= "octpos"     "=" ("first" / "last")
                              ; described in Appendix C.2
                              ; for audio/mwpp and audio/mpeg4-generic

param-assign /= "mperiod"    "=" nonzero-four-octet
                              ; described in Appendix C.2
                              ; for audio/mwpp and audio/mpeg4-generic

param-assign /= "midiport"   "=" four-octet
                              ; described in Appendix C.4
                              ; for audio/mwpp and audio/mpeg4-generic

param-assign /= "zerosync"   "=" four-octet
                              ; described in Appendix C.4
                              ; for audio/mwpp and audio/mpeg4-generic

param-assign /= "render"     "=" ("default" / "sasc" / (*ietf-extension))
                              ; described in Appendix C.5
                              ; for audio/mwpp and audio/mpeg4-generic

param-assign /= "url"        "=" double-quote uri-element double-quote
                              ; described in Appendix C.5
                              ; for audio/mpeg4-generic

param-assign /= "inline"     "=" double-quote base-64-block double-quote
                              ; described in Appendix C.5
                              ; for audio/mpeg4-generic

param-assign /= "compr"      "=" ("none" / "gzip" / (*ietf-extension))
                              ; described in Appendix C.5
                              ; for audio/mpeg4-generic and audio/sasc

param-assign /= "cid"        "=" double-quote cid-block double-quote
                              ; described in Appendix C.5
                              ; for audio/mpeg4-generic and audio/sasc


;
; list definitions for the ch_ chapter-list
;

chapter-list       = chapter-part1 chapter-part2 chapter-part3

chapter-part1      = 0*1"P" 0*1"W" 0*1"N" 0*1"A"

chapter-part2      = 0*1"T" 0*1"C" 0*1"M" 0*1"D"

chapter-part3      = 0*1"V" 0*1"Q" 0*1"E" 0*1"X"


;
; list definitions for the ch_ channel-list
;

channel-list       = midi-chan-element *("," midi-chan-element)

midi-chan-element  = midi-chan / midi-chan-range

midi-chan-range    = midi-chan "-" midi-chan

                   ; decimal value of left midi-chan
                   ; MUST be strictly less than decimal
                   ; value of right midi-chan

midi-chan          = %d0-15


;
; list definitions for the ch_ field-list
;

field-list         = midi-field-element *("," midi-field-element)

midi-field-element = midi-field / midi-field-range

midi-field-range   = midi-field "-" midi-field
                   ;
                   ; decimal value of left midi-field
                   ; MUST be strictly less than decimal
                   ; value of right midi-field

midi-field         = %d0-127


;
; generic rules
;

ietf-extension     = token
                   ;
                   ; token as defined in reference [9].
                   ; ietf-extension may only be defined in
                   ; standards-track RFCs, but we expect
                   ; those RFCs to define namespaces that
                   ; do not require IETF action.

four-octet         = %d0-4294967295
                   ; unsigned encoding of 32 bits

nonzero-four-octet = %d1-4294967295
                   ; unsigned encoding of 32 bits, excluding zero

uri-element        = uri
                   ; as defined in reference [9].

base-64-block      = base64
                   ; as defined in reference [9].

double-quote       = %x22
                   ; the double-quote (") character

cid-block          = msg-id
                   ; as discussed in Section 7 of
                   ; reference [17]

;
; End of ABNF for MWPP parameters.
;
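
As an informal companion to the grammar above, the Python sketch
below checks two of the simpler rules: <chapter-list> and the
<four-octet> family. It is not an ABNF parser, and the function names
are hypothetical. The sketch reads <four-octet> as a decimal string
whose value fits in an unsigned 32-bit integer, which is our
interpretation of the rule's comment.

```python
import re

# chapter-list: each chapter letter is optional, in a fixed order.
CHAPTER_LIST = re.compile(r"P?W?N?A?T?C?M?D?V?Q?E?X?")
DECIMAL = re.compile(r"[0-9]+")


def is_chapter_list(text):
    """True if text matches the chapter-list rule as written.

    Note that the rule, read literally, also admits the empty string.
    """
    return CHAPTER_LIST.fullmatch(text) is not None


def is_four_octet(text, allow_zero=True):
    """True if text is a decimal value that fits in 32 unsigned bits.

    Set allow_zero to False to check nonzero-four-octet instead.
    """
    if DECIMAL.fullmatch(text) is None:
        return False
    value = int(text)
    lowest = 0 if allow_zero else 1
    return lowest <= value <= 2**32 - 1
```

For example, is_chapter_list("PWNA") holds, while is_chapter_list("NP")
does not, because the letters appear out of order.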



Appendix C.7. IANA Considerations

In this Appendix, we register the audio/mwpp and audio/sasc MIME types,
and we extend the audio/mpeg4-generic MIME type for use with MWPP. The
audio/mwpp and audio/sasc registrations are in the IETF tree, as we
expect MWPP to be widely used in MIDI and MPEG applications. The
mpeg4-generic extensions are in compliance with the extension guidelines
in [4].


Appendix C.7.1 mwpp MIME Registration

This section registers mwpp as a MIME subtype for the audio type.


MIME media type name:

    audio


MIME subtype name:

    mwpp


Required parameters:

    rate: The RTP timestamp clock rate, as specified in the rtpmap
          line. See Sections 2.1 and 6.1 of this memo for usage details.


Optional parameters:

    Standard SDP parameters:

       maxptime:   See Appendix C.3 for usage details.

    Non-extensible parameters:

       j_sec:      See Appendix C.1 for usage details.
       j_update:   See Appendix C.1 for usage details.
       ch_default: See Appendix C.1 for usage details.
       ch_unused:  See Appendix C.1 for usage details.
       ch_never:   See Appendix C.1 for usage details.
       ch_anchor:  See Appendix C.1 for usage details.
       tsmode:     See Appendix C.2 for usage details.
       linerate:   See Appendix C.2 for usage details.
       octpos:     See Appendix C.2 for usage details.
       mperiod:    See Appendix C.2 for usage details.
       midiport:   See Appendix C.4 for usage details.
       zerosync:   See Appendix C.4 for usage details.

    Extensible parameter:

       render: See Appendix C.5 for usage details. The render
               parameter may only be extended via a standards
               track IETF document. We anticipate only a few
               such extensions; however, these extensions will
               serve to define methods for using existing
               registries (such as the MIDI Manufacturer Code
               registry [1]), so that implementors may define
               new rendering schemes without IETF involvement.


Encoding considerations:

    This type is only defined for real-time transfers of MIDI
    streams via RTP transport. Note that an industry standard
    already exists for stored MIDI files [1].


Security considerations:

    See Section 7 of this memo.


Interoperability considerations:

    None.


Published specification:

   This memo and [1] serve as the normative specification. In
   addition, references [6], [8], and [22] provide non-normative
   implementation guidance.


Applications which use this media type:

   Audio content-creation hardware, such as MIDI controller piano
   keyboards and MIDI audio synthesizers. Audio content-creation
   software, such as music sequencers, digital audio workstations,
   and soft synthesizers. In addition, content distribution
   servers and terminals may use this media type for low bit-rate
   music coding.


Additional information:

    None.


Person & email address to contact for further information:

    John Lazzaro <lazzaro@cs.berkeley.edu>


Intended usage:

    COMMON. The goal is to replace the asynchronous serial line
    MIDI networking described in [1] with RTP. If this goal is
    achieved, thousands of embedded devices will use this media
    type.


Author/Change controller:

    John Lazzaro <lazzaro@cs.berkeley.edu>



Appendix C.7.2 mpeg4-generic MWPP extensions MIME Registration

The mpeg4-generic MIME type [4] permits extensions to support new modes.
The registration below defines mode mwpp for use with mpeg4-generic.
These extensions support the MPEG Audio codecs [5] that use MIDI as a
control language.



MIME media type name:

    audio


MIME subtype name:

    mpeg4-generic


Required parameter extensions:

    We extend the mpeg4-generic required parameter mode by
    adding the parameter=value assignment:

    mode=mwpp

    to the list of legal mode values defined in [4]. See
    Section 6.2 for usage details.

    rate: In mode mwpp, rate is a required parameter. Rate
    specifies the RTP timestamp clock rate on the rtpmap line.
    See Sections 2.1 and 6.2 for usage details.


Optional parameter extensions:


    Standard SDP parameters:

       maxptime:   See Appendix C.3 for usage details.

    Non-extensible parameters:

       j_sec:      See Appendix C.1 for usage details.
       j_update:   See Appendix C.1 for usage details.
       ch_default: See Appendix C.1 for usage details.
       ch_unused:  See Appendix C.1 for usage details.
       ch_never:   See Appendix C.1 for usage details.
       ch_anchor:  See Appendix C.1 for usage details.
       tsmode:     See Appendix C.2 for usage details.
       linerate:   See Appendix C.2 for usage details.
       octpos:     See Appendix C.2 for usage details.
       mperiod:    See Appendix C.2 for usage details.
       midiport:   See Appendix C.4 for usage details.
       zerosync:   See Appendix C.4 for usage details.
       url:        See Appendix C.5.1 for usage details.
       inline:     See Appendix C.5.1 for usage details.
       cid:        See Appendix C.5.1 for usage details.

    Extensible parameters:

       render: See Appendix C.5 for usage details. The render
               parameter may only be extended via a standards
               track IETF document. Extensions of render in
               the context of mpeg4-generic would be rare; we
               define render as extensible to match the render
               parameter defined for audio/mwpp in Appendix C.7.1.

       compr:  See Appendix C.5.1 for usage details. The compr
               parameter may only be extended via a standards
               track IETF document. As compr specifies a
               compression method for a binary data block, we
               expect extensions of compr would be rare.


Encoding considerations:

    This type is only defined for real-time transfers of
    audio/mpeg4-generic streams with mode=mwpp.


Security considerations:

    See Section 7 of this memo.


Interoperability considerations:

    The RTP packet formats for audio/mwpp and audio/mpeg4-generic
    mode=mwpp are identical. The two packetizations differ in
    purpose: audio/mpeg4-generic mode=mwpp is for MIDI transport
    for MPEG synthetic codecs, audio/mwpp is for MIDI transport for
    all other applications. Software may interoperate with both
    audio/mwpp and audio/mpeg4-generic mode=mwpp simply by
    supporting the differing parameter sets for each MIME type.
    See Section 6 for details.


Published specification:

   This memo, [1], and [5] are the normative references. In
   addition, references [6], [8], and [22] provide non-normative
   implementation guidance.


Applications which use this media type:

   MPEG 4 servers and terminals that support [5].


Additional information:

    None.


Person & email address to contact for further information:

    John Lazzaro <lazzaro@cs.berkeley.edu>


Intended usage:

    COMMON. The codecs in [5] have the potential to be for
    electronic musical instruments what Postscript is for
    printers -- the common language to express rendering.
    If [5] is successful in this goal, audio/mpeg4-generic
    mode=mwpp will be the RTP transport for playing these
    electronic musical instruments.


Author/Change controller:

    John Lazzaro <lazzaro@cs.berkeley.edu>


Appendix C.7.3 sasc MIME Registration

This section registers sasc as a MIME subtype for the audio type.



MIME media type name:

    audio


MIME subtype name:

    sasc


Required parameters:

    none


Optional parameters:

    Non-extensible parameter:

       cid:    See Appendix C.5.1 for usage details.

    Extensible parameter:

       compr:  See Appendix C.5.1 for usage details. The compr
               parameter may only be extended via a standards
               track IETF document. As compr specifies a
               compression method for a binary data block, we
               expect extensions of compr would be rare.


Encoding considerations:

    This type is only defined for stored-file transfer. In the
    MIME registration extension for audio/mpeg4-generic mode=mwpp
    (Appendix C.7.2), we define an optional parameter url. The
    stored-file data coded by url has the MIME type audio/sasc.
    The most common transports for audio/sasc are HTTP and SMTP.


Security considerations:

    See Section 7 of this memo.


Interoperability considerations:

    None.


Published specification:

    The binary data coded in an audio/sasc document is normatively
    defined as the StructuredAudioSpecificConfig object in section
    5.5.2 of [5]. Methods for coding this data into a MIME document
    are normatively defined in Appendix C.5.1 of this memo.


Applications which use this media type:

    Applications that use RTP streams of type audio/mpeg4-generic
    mode=mwpp, and which wish to specify initialization data of
    non-trivial size in the session description.


Additional information:

    None.


Person & email address to contact for further information:

    John Lazzaro <lazzaro@cs.berkeley.edu>


Intended usage:

    COMMON. [5] defines three synthetic codecs for MPEG 4: the
            General MIDI codec originally defined in [1], the
            DLS2 codec originally defined in [18], and the
            Structured Audio codec. The latter two codecs have
            initialization data blocks too large for direct
            inclusion into SDP session descriptions sent over
            UDP. If audio/mpeg4-generic mode=mwpp becomes a
            popular MIME type for use with DLS2 or Structured
            Audio, audio/sasc will also become a popular MIME
            type.


Author/Change controller:

    John Lazzaro <lazzaro@cs.berkeley.edu>


Appendix D. A MIDI Overview for Networking Specialists

This Appendix presents an overview of the MIDI standard, for the benefit
of networking specialists new to musical applications. MWPP implementors
should consult [1] for a normative description of MIDI.

Musicians make music by performing a controlled sequence of physical
movements. For example, a pianist plays by coordinating a series of key
presses, key releases, and pedal actions. MIDI represents a musical
performance by encoding these physical gestures as a sequence of MIDI
commands. This high-level musical representation is compact but fragile:
one lost command may be catastrophic to the performance.

MIDI commands have much in common with the machine instructions of a
microprocessor. MIDI commands are defined as binary elements.  Bitfields
within a MIDI command have a regular structure and a specialized
purpose. For example, the upper nibble of the first command octet (the
opcode field) codes the command type. MIDI commands may consist of an
arbitrary number of complete octets, but most MIDI commands are 1, 2, or
3 octets in length.


     -------------------------------------------------------------
    |              Name              |      Bitfield Pattern      |
    |-------------------------------------------------------------|
    | NoteOff (end a note)           | 1000cccc 0nnnnnnn 0vvvvvvv |
    |-------------------------------------------------------------|
    | NoteOn (start a note)          | 1001cccc 0nnnnnnn 0vvvvvvv |
    |-------------------------------------------------------------|
    | PTouch (Polyphonic Aftertouch) | 1010cccc 0nnnnnnn 0aaaaaaa |
    |-------------------------------------------------------------|
    | CControl (Controller Change)   | 1011cccc 0xxxxxxx 0yyyyyyy |
    |-------------------------------------------------------------|
    | PChange (Program Change)       | 1100cccc 0ppppppp          |
    |-------------------------------------------------------------|
    | CTouch (Channel Aftertouch)    | 1101cccc 0aaaaaaa          |
    |-------------------------------------------------------------|
    | PWheel (Pitch Wheel)           | 1110cccc 0xxxxxxx 0yyyyyyy |
    |-------------------------------------------------------------|
    | System (sub-opcode is xxxx)    | 1111xxxx ...               |
     -------------------------------------------------------------

                     Figure D.1 -- MIDI Command Chart


Figure D.1 shows the MIDI command family. There are two major classes of
commands: voice commands (opcode field values in the range 0x8 through
0xE) and system commands (opcode field value 0xF). Voice commands code
the musical gestures for each timbre in a composition.  System commands
perform housekeeping functions, such as System Reset (the one-octet
command 0xFF).

Each voice command executes on one of 16 MIDI channels, as coded by its
4-bit channel field (field cccc in Figure D.1). In most applications,
notes for different timbres are assigned to different channels. To
support applications that require more than 16 channels, MIDI systems
use several MIDI command streams in parallel, to yield 32, 48, or 64
MIDI channels.

As an example of a voice command, consider a NoteOn command (opcode
0x9), with binary encoding 1001cccc 0nnnnnnn 0vvvvvvv. This command
signals the start of a musical note on MIDI channel cccc. The note has a
pitch coded by the note number nnnnnnn, and an onset amplitude coded by
note velocity vvvvvvv.
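
The bitfield layout of a voice command can be unpacked with two bit
operations on the first octet. The Python sketch below illustrates
this for the NoteOn example; the function name decode_voice_command
is hypothetical, and the sketch handles only complete voice commands
(no system commands).

```python
def decode_voice_command(octets):
    """Split a voice command into (opcode, channel, data octets)."""
    status = octets[0]
    if not 0x80 <= status <= 0xEF:
        raise ValueError("first octet is not a voice command status")
    opcode = status >> 4        # upper nibble: command type
    channel = status & 0x0F     # lower nibble: field cccc
    data = list(octets[1:])
    if any(value > 0x7F for value in data):
        raise ValueError("data octets must have a leading 0 bit")
    return opcode, channel, data
```

For the octets 0x93 0x3C 0x64, the sketch returns opcode 0x9 (NoteOn),
channel 3, note number 60, and velocity 100.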

Other voice commands signal the end of notes (NoteOff, opcode 0x8), map
a specific timbre to a MIDI channel (PChange, opcode 0xC), or set the
value of parameters that modulate the timbral quality (all other voice
commands). The exact meaning of most voice channel commands depends on
the rendering algorithms the MIDI receiver uses to generate sound. In
most applications, a MIDI sender has a model (in some sense) of the
rendering method used by the receiver.

An examination of the opcode bitfields in Figure D.1 reveals a special
structure: the leading bit of the first octet is set to 1, and the
leading bit of all subsequent octets is set to 0.  This structure
supports a data compression system, called running status [1], that
significantly reduces the size of the MIDI command stream.

In running status coding, the first octet of a MIDI command may be
dropped if it is identical to the first octet of the previous MIDI
command. This rule, in combination with a convention to consider NoteOn
commands with a null third octet as NoteOff commands, supports the
coding of note sequences using two octets per command.
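
To make the running status rule concrete, the Python sketch below
restores the dropped first octets in a stream of voice commands. It
is a simplified illustration under our own assumptions: the function
names are hypothetical, the data lengths are read from Figure D.1,
and system commands are rejected rather than handled.

```python
# Number of data octets that follow the first octet, by opcode
# (taken from the bitfield patterns in Figure D.1).
DATA_OCTETS = {0x8: 2, 0x9: 2, 0xA: 2, 0xB: 2, 0xC: 1, 0xD: 1, 0xE: 2}


def expand_running_status(stream):
    """Return a list of complete voice commands from a coded stream."""
    commands = []
    status = None
    i = 0
    while i < len(stream):
        if stream[i] & 0x80:         # leading bit 1: a new first octet
            status = stream[i]
            i += 1
        if status is None or (status >> 4) not in DATA_OCTETS:
            raise ValueError("no voice command status in effect")
        count = DATA_OCTETS[status >> 4]
        data = stream[i:i + count]
        if len(data) != count or any(d & 0x80 for d in data):
            raise ValueError("truncated or malformed command")
        commands.append(bytes([status]) + bytes(data))
        i += count
    return commands


def is_note_off(command):
    """NoteOff, or NoteOn with a null third octet (the convention)."""
    opcode = command[0] >> 4
    return opcode == 0x8 or (opcode == 0x9 and command[2] == 0)
```

With this coding, two notes started and ended on one channel occupy
9 octets on the wire (one first octet plus four two-octet commands),
instead of the 12 octets that four complete commands would need.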

Finally, note that the bitfield formats in Figure D.1 do not encode the
execution time for a command. Timing information is not a part of the
MIDI command syntax itself; different applications of the MIDI command
language use different methods to encode timing.

For example, the MIDI Wire Protocol [1], a networking standard for the
remote control of musical instruments over short asynchronous serial
lines, does not place timestamps on the wire. Instead, the protocol uses
an implicit "time of arrival" code: receivers execute MIDI commands at
the moment they appear on the wire. In contrast, Standard MIDI Files
[1], a file format for representing complete musical performances, adds
a timestamp field to each MIDI command, using a delta-time code that is
tuned to the statistics of musical performance.
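
As we understand the Standard MIDI File format in [1], its delta-time
code is a variable-length quantity: seven bits of the value per octet,
most significant group first, with the leading bit set on every octet
except the last. Short delays, which are the common case, thus cost a
single octet. The Python sketch below illustrates the encoding; the
function name is hypothetical, and implementors should consult [1]
for the normative definition.

```python
def encode_delta_time(ticks):
    """Encode a delta-time as a variable-length quantity."""
    if not 0 <= ticks <= 0x0FFFFFFF:
        raise ValueError("delta-time out of range for four octets")
    groups = [ticks & 0x7F]          # final octet: leading bit clear
    ticks >>= 7
    while ticks:
        groups.append((ticks & 0x7F) | 0x80)  # continuation octets
        ticks >>= 7
    return bytes(reversed(groups))
```

A delta-time below 128 ticks encodes as one octet, while a delta-time
of 128 ticks encodes as the two octets 0x81 0x00.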


Appendix E. Author Addresses

John Lazzaro (corresponding author)
UC Berkeley
CS Division
315 Soda Hall
Berkeley CA 94720-1776
Email: lazzaro@cs.berkeley.edu

John Wawrzynek
UC Berkeley
CS Division
631 Soda Hall
Berkeley CA 94720-1776
Email: johnw@cs.berkeley.edu


Appendix F. References

[1] MIDI Manufacturers Association. The complete MIDI 1.0
detailed specification, 1996. http://www.midi.org

[2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson.
RTP: A transport protocol for real-time applications. Work
in progress, draft-ietf-avt-rtp-new-11.txt.

[3] H. Schulzrinne and S. Casner. RTP Profile for Audio and Video
Conferences with Minimal Control. Work in progress,
draft-ietf-avt-profile-new-12.txt.

[4] Internet Engineering Task Force. Transport of MPEG-4 Elementary
Streams.  Work in progress, draft-ietf-avt-mpeg4-simple-04.txt.

[5] International Standards Organization. ISO 14496 MPEG-4,
Part 3 (Audio) Subpart 5 (Structured Audio) 1999.

[6] John Lazzaro and John Wawrzynek. A Case for Network
Musical Performance. The 11th International Workshop on Network
and Operating Systems Support for Digital Audio and Video
(NOSSDAV 2001) June 25-26, 2001, Port Jefferson, New York.
http://www.cs.berkeley.edu/~lazzaro/sa/pubs/pdf/nossdav01.pdf

[7] Sfront source code release, includes a Linux networking
client that implements the MIDI RTP packetization.
http://www.cs.berkeley.edu/~lazzaro/sa/

[8] Dominique Fober, Yann Orlarey, Stephane Letz. Real Time Musical
Events Streaming over Internet. Proceedings of the International
Conference on WEB Delivering of Music 2001, pages 147-154.
http://www.grame.fr/~fober/RTESP-Wedel.pdf

[9] M. Handley, V. Jacobson and C. Perkins. SDP: Session Description
Protocol. Work in progress, draft-ietf-mmusic-sdp-new-10.txt.

[10] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston,
J. Peterson, R. Sparks, M. Handley, and E. Schooler. SIP: Session
Initiation Protocol. Internet Engineering Task Force, RFC 3261.

[11] J. Rosenberg and H. Schulzrinne. An Offer/Answer Model with
SDP. Internet Engineering Task Force, RFC 3264.

[12] H. Schulzrinne, A. Rao, and R. Lanphier. Real Time Streaming
Protocol (RTSP). Work in progress,
draft-ietf-mmusic-rfc2326bis-00.txt.

[13] D. D. Clark and D. L. Tennenhouse, "Architectural considerations
for a new generation of protocols," in SIGCOMM Symposium on
Communications Architectures and Protocols, (Philadelphia,
Pennsylvania), pp. 200-208, IEEE, Sept. 1990.  Computer
Communications Review, Vol. 20(4), Sept. 1990.

[14] C. Bormann et al. Robust Header Compression (ROHC). Internet
Engineering Task Force, RFC 3095. Also see related work at
http://www.ietf.org/html.charters/rohc-charter.html.

[15] D. Yon. Connection-Oriented Media Transport in SDP.  Work in
progress, draft-ietf-mmusic-sdp-comedia-03.txt.

[16] International Standards Organization. ISO 14496 MPEG-4, Part 3
(Audio) Subpart 1 (Main Document) 1999.

[17] N. Freed and N. Borenstein. MIME Part 1: Format of Internet
Message Bodies. RFC 2045, November 1996.

[18] MIDI Manufacturers Association. The MIDI Downloadable Sounds
Specification, v98.2. Available for purchase at http://www.midi.org.

[19] P. Deutsch. GZIP file format specification version 4.3.  Internet
Engineering Task Force, RFC 1952.

[20] C. Perkins et al. RTP Payload for Redundant Audio Data. Internet
Engineering Task Force, RFC 2198.

[21] J. Rosenberg and H. Schulzrinne. An RTP Payload Format for
Generic Forward Error Correction.  Internet Engineering Task Force,
RFC 2733.

[22] John Lazzaro and John Wawrzynek. An Implementation Guide to the
MIDI Wire Protocol Packetization (MWPP). An informative IETF I-D (in
preparation).

[23] D. Crocker and P. Overell. Augmented BNF for Syntax
Specifications: ABNF. Internet Engineering Task Force, RFC 2234.

[24] R. Braden, L. Zhang, S. Berson, S. Herzog, and S. Jamin. Resource
ReSerVation Protocol (RSVP) -- Version 1 Functional Specification.
Internet Engineering Task Force, RFC 2205, September 1997.








