INTERNET-DRAFT                                              John Lazzaro
February 19, 2002                                         John Wawrzynek
Expires: August 19, 2002                                     UC Berkeley

The MIDI Wire Protocol Packetization (MWPP)

Status of this Memo

This document is an Internet-Draft and is subject to all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Abstract

This memo describes the MIDI Wire Protocol Packetization (MWPP). MWPP is a resilient RTP packetization for the MIDI wire protocol. MWPP defines a multicast-compatible recovery journal format, to support the graceful recovery from lost packets during a MIDI session. MWPP is compatible with the MPEG-4 generic RTP payload format, to support MPEG 4 Audio codecs that accept MIDI control input.

Lazzaro/Wawrzynek                                               [Page 1]

INTERNET-DRAFT                                          19 February 2002

0. Changes Log

0.1 Changelog for

o The MIDI command section now encodes all legal MIDI commands, including all MIDI Systems commands.

o The MIDI command section header has new features to support the efficient streaming of pre-generated MIDI performances.

o A new SDP parameter (pwe) indicates that a stream is suitable for use in pseudo-wire emulation.

o In response to Dominique Fober's AVT postings, new text was added to explain the rationale for the G bit, a bug was fixed in the Chapter N definition, Chapter C was rewritten to include controller numbers and to explain the COUNT coding rationale, and the reference to his paper was corrected.

0.2 Changelog for

o Normative algorithm specifications for the sending and receiving of MWPP packets have been deleted from the memo. Normative text is confined to the packetization format (in Sections 2-4 and Appendices A.1-6), a new policy for resilient sending (Section 5), and SDP issues (Section 6 for standard RTP, Section 7 for MPEG 4 generic RTP).

o The memo now casts MWPP as a general-purpose transport for the MIDI wire protocol; text in the former document about network musical performance specialization has been deleted.

o MWPP no longer uses the MPEG 4 Structured Audio standard as a normative reference. The only MPEG issue left in the document concerns MWPP's dual role as both a standalone RTP packetization and an MPEG-4 generic RTP packetization.

o The MIDI command payload of a packet now specifies the event time of each MIDI command in the payload.

o The marker bit in the RTP header is now always set to 1. This modification lets us define a single MWPP payload format that is compatible with both standalone RTP and MPEG-4 generic RTP transport.

o In the recovery journal header, we replace the redundant K flag bit with a new "G" (guaranteed) flag bit. The G flag bit codes that the sender is following the sending policy defined in Section 5; this sending policy provides the "graceful recovery upon receipt of the first packet following a loss" guarantee which motivates the recovery journal concept.

o The "mpeg-generic" SDP typo was also fixed, and is now "mpeg4-generic."

o Sender and receiver proxy discussions have been deleted.

o New name reflects MWPP's AVT WG item status.

1. Introduction

The MIDI standard [1] defines a real-time networking standard for the interconnection of electronic musical devices and general-purpose computers. The standard defines the MIDI command set, the MIDI wire protocol for the command set, and a physical layer to carry the wire protocol (short coaxial "MIDI cables"). This memo concerns the transport of the MIDI wire protocol on alternative network layers, using the Real-time Transport Protocol (RTP).

This memo describes the MIDI Wire Protocol Packetization (MWPP), a resilient RTP [2] payload format for the MIDI wire protocol. MWPP is defined as a stand-alone RTP payload. However, MWPP is also suitable for use in conjunction with the MPEG-4 generic RTP payload format [3] [4], to support MPEG codecs that accept MIDI control input [5]. MWPP normatively specifies a payload format, but does not specify algorithms for sending and receiving MWPP packets.

MWPP is capable of the pseudo-wire emulation of the MIDI physical layer as defined in [1]. However, MWPP is also capable of coding MIDI streams which cannot be represented on the MIDI physical layer; for example, the MIDI physical layer cannot code two MIDI commands that execute at precisely the same time, but MWPP can.

MWPP is designed for use over unreliable datagram transport such as unicast and multicast UDP: one design goal is the graceful recovery from lost packets, without using packet retransmission. MWPP also supports reliable transport such as TCP. MWPP is self-framing, to simplify TCP transport.

Sending the MIDI wire protocol over unreliable transport is not trivial. The MIDI standard defines a set of commands that reflect the gestures musicians make in playing their instruments ("NoteOn" command to start a new note, "NoteOff" command to end the note, etc).
Gestural commands make MIDI data streams very compact, but also very fragile: a single lost "NoteOff" command could result in a sound that sustains indefinitely.

MWPP does not use packet retransmission to provide resiliency. Instead, each MWPP packet includes a special section (the "recovery journal") that codes the recent history of the stream. The recovery journal protects against the loss of RTP packets sent since an earlier "checkpoint" RTP packet. As described in this memo, the recovery journal protects MIDI Channel commands, but not MIDI Systems commands (editor's note: partial MIDI Systems resiliency will be added before the Final Call version of MWPP).

The remainder of this memo defines the MWPP payload format, and specifies Session Description Protocol configuration for both RTP and MPEG-4 generic transport.

This memo describes a format, not an algorithm or an application. Readers unfamiliar with the application domain should first read [6], a paper that describes an experimental system [7] that uses an RTP packetization similar to MWPP. In addition, [8] describes another experimental system for MIDI transport, whose algorithms are compatible with MWPP.

2. MWPP Packet Format

Figure 1 shows the format of an MWPP packet, suitable for both RTP transport and MPEG 4 generic RTP transport. An MWPP packet consists of three sections: the RTP header, the MIDI command section, and the recovery journal. In Figure 1, vertical space delineates the RTP header and the payload.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |        Sequence number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             SSRC                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             CSRCs                             |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   MIDI command section ...                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Recovery journal ...                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 1 -- MWPP packet format

An MWPP packet begins with an RTP header. The marker bit is always set to 1, for compatibility with the MPEG 4 generic payload format. The RTP sequence number increments by one (modulo 65536) for each packet sent. As is standard in RTP, the sequence number is initialized to a randomly chosen value. MWPP does not use header extensions.

The RTP timestamp sets a base timestamp value for the packet. The event times coded in the MIDI command section are specified relative to this base timestamp value. If the MIDI command section carries no events, the timestamp indicates the instant the RTP packet was encoded.

The RTP timestamp has the units of the SDP rtpmap parameter srate (see Section 6). For example, if srate has a value of 44100 (Hz), two MWPP packets whose base timestamp values differ by 2 seconds have RTP timestamps that differ by 88200.

MWPP RTP timestamps do not necessarily increment at a fixed rate. The timestamps for two sequential RTP packets may be identical, or the second packet may have a timestamp arbitrarily larger than the first packet (modulo 2^32). As is standard in RTP, the timestamp field is initialized to a randomly chosen value.

MWPP does not provide tools to multiplex several 16-channel MIDI cable streams onto a single MWPP payload. Instead, implementors should use the multiplexing tools provided by RTP: each MIDI cable stream should map to a separate RTP stream, identified by a distinct SSRC value.

The MWPP payload always begins with the variable-length MIDI command section, described in detail in Section 3.
If a stream is configured for resilient coding, the MIDI command section of every packet is followed by the variable-length recovery journal, described in detail in Section 4. If a stream is not configured for resiliency, the recovery journal never appears in the MWPP payload. The SDP rtpmap parameter rj (see Section 6) configures an MWPP stream for resilient coding.

3. MIDI Command Section

Figure 2 shows the format of the variable-length MIDI command section.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |B|Z|  LEN ...  |  MIDI list ...                                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 2 -- MIDI command section

The MIDI command section begins with a variable-length header.

The header field LEN codes the length (in units of octets) of the MIDI list that follows the header. If the header flag B is 0, the header is one octet long, and LEN is a 6-bit field, supporting a maximum MIDI list length of 63 octets. If B is 1, the header is two octets long, and LEN is a 14-bit field, supporting a maximum MIDI list length of 16383 octets.

A LEN value of 0 is legal, and codes an empty MIDI list. If the MIDI list is empty, the RTP timestamp indicates the instant the RTP packet was encoded. If LEN is nonzero, the MIDI list has the structure shown in Figure 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Delta Time 0 (if Z = 1)    |  MIDI Command 0 ...           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Delta Time 1                 |  MIDI Command 1 ...           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Delta Time 2                 |  MIDI Command 2 ...           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             .....                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Delta Time N                 |  MIDI Command N ...           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 3 -- MIDI list structure

If the header flag Z is 1, the MIDI list begins with a complete MIDI command (MIDI Command 0) preceded by a delta time (Delta Time 0). If Z is 0, the Delta Time 0 field is not present in the MIDI list, and MIDI Command 0 has an implicit delta time of 0. The execution time of MIDI Command 0 is the summation (modulo 2^32) of the RTP timestamp and its delta time.

The MIDI list structure may also optionally encode a list of N additional complete MIDI commands. Each additional command is preceded by a delta time. The execution time for MIDI Command K is the summation (modulo 2^32) of the RTP timestamp, delta time 0 (if Z = 1), and delta times 1 through K.

MWPP delta times are a modified form of MIDI File delta times [1]. MWPP delta times use 1-4 octet fields to encode 32-bit unsigned integers. Figure 4 shows the encoded and decoded forms of delta times.

   One-Octet Delta Time:

      Encoded form: 0ddddddd
      Decoded form: 00000000 00000000 00000000 0ddddddd

   Two-Octet Delta Time:

      Encoded form: 1ccccccc 0ddddddd
      Decoded form: 00000000 00000000 00cccccc cddddddd

   Three-Octet Delta Time:

      Encoded form: 1bbbbbbb 1ccccccc 0ddddddd
      Decoded form: 00000000 000bbbbb bbcccccc cddddddd

   Four-Octet Delta Time:

      Encoded form: 1aaaaaaa 1bbbbbbb 1ccccccc 0ddddddd
      Decoded form: 0000aaaa aaabbbbb bbcccccc cddddddd

              Figure 4 -- Decoding delta time formats

The first MIDI channel command in the MIDI list MUST include a status byte; running status coding, as defined in [1], may be used for all subsequent MIDI channel commands in the MIDI list. As in [1], System Common messages (F0-F7) cancel running status state, but System Realtime messages (F8-FF) do not affect running status state.
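
As an informal illustration of the delta time coding in Figure 4 and of the execution time rule above, the following Python sketch decodes and encodes delta times and accumulates execution times modulo 2^32. It is not part of the normative format; the function names (decode_delta_time, encode_delta_time, execution_times) are hypothetical, and the sketch assumes the caller has already located the start of a delta time field in the MIDI list.

```python
def decode_delta_time(buf, pos=0):
    """Decode one delta time starting at buf[pos].

    Returns (value, next_pos). Each octet carries 7 value bits, most
    significant group first; a set high bit means another octet follows.
    Figure 4 allows at most four octets.
    """
    value = 0
    for i in range(4):
        octet = buf[pos + i]
        value = (value << 7) | (octet & 0x7F)
        if octet & 0x80 == 0:
            return value, pos + i + 1
    raise ValueError("delta time longer than four octets")


def encode_delta_time(value):
    """Encode a delta time in the shortest of the four forms in Figure 4."""
    if not 0 <= value < (1 << 28):
        raise ValueError("value does not fit in a four-octet delta time")
    groups = [value & 0x7F]            # final octet has its high bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))


def execution_times(rtp_timestamp, delta_times):
    """Running sum of delta times on top of the RTP timestamp, modulo 2^32.

    delta_times holds the delta time of each command in list order; use 0
    for MIDI Command 0 when Z = 0.
    """
    times, t = [], rtp_timestamp
    for delta in delta_times:
        t = (t + delta) & 0xFFFFFFFF
        times.append(t)
    return times
```

For example, encode_delta_time(128) yields the two-octet form 0x81 0x00, and decoding those octets returns 128 together with the offset of the MIDI command that follows.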

In the MIDI wire protocol, a System Realtime command (F8-FF) may be embedded inside of another "host" MIDI command. This syntactic construction is not supported in MWPP: a MIDI Command field in the MIDI list codes exactly one complete MIDI command. To encode an embedded System Realtime command, extract the command from its host, and code it in the MIDI list as a separate command.

Two methods are provided for encoding MIDI System Exclusive (SysEx) commands in the MIDI list. A SysEx command may be encoded in a MIDI Command field verbatim: an F0 octet, followed by an arbitrary number of data octets, followed by an F7 octet.

Alternatively, a SysEx command may be encoded as multiple segments. The command is divided into two or more SysEx command segments; each segment is encoded in its own MIDI Command field in the MIDI list.

To segment a SysEx command, first partition its data octet list into two or more sublists; each sublist must contain at least one data octet. To complete the segmentation, add status bytes to the head and tail of each sublist, as detailed in Figure 5. Figure 6 shows example segmentations of a MIDI SysEx command.

    -----------------------------------------------------------
   | Sublist Position  | Head Status Octet | Tail Status Octet |
   |-----------------------------------------------------------|
   |       first       |       0xF0        |       0xF0        |
   |-----------------------------------------------------------|
   |      middle       |       0xF7        |       0xF7        |
   |-----------------------------------------------------------|
   |       last        |       0xF7        |       0xF0        |
    -----------------------------------------------------------

          Figure 5 -- Command Segmentation Status Octets

   Original SysEx command:

      0xF0 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

   A two-segment segmentation:

      0xF0 0x01 0x02 0x03 0x04 0xF0
      0xF7 0x05 0x06 0x07 0x08 0xF0

   A different two-segment segmentation:

      0xF0 0x01 0xF0
      0xF7 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF0

   A three-segment segmentation:

      0xF0 0x01 0x02 0xF0
      0xF7 0x03 0x04 0xF7
      0xF7 0x05 0x06 0x07 0x08 0xF0

   The segmentation with the largest number of segments:

      0xF0 0x01 0xF0
      0xF7 0x02 0xF7
      0xF7 0x03 0xF7
      0xF7 0x04 0xF7
      0xF7 0x05 0xF7
      0xF7 0x06 0xF7
      0xF7 0x07 0xF7
      0xF7 0x08 0xF0

                Figure 6 -- Example segmentations

The relative ordering of SysEx command segments in a MIDI list must match the relative ordering of the sublists in the original SysEx command. Other complete MIDI commands may appear between SysEx command segments, including verbatim MIDI SysEx commands. However, SysEx command segments derived from one SysEx command may not appear between the command segments of another SysEx command.

If the total size of all segments is too large to fit in a single RTP packet, segments may be placed in the MIDI lists of two or more RTP packets. In this case, the SysEx segment ordering rules apply to the concatenation of all affected MIDI lists.
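
The segmentation procedure of Figures 5 and 6 can be sketched informally in Python as shown below. This is an illustration only, not a normative algorithm: the function name segment_sysex and its cut_points parameter (the offsets into the data octet list at which a new sublist starts) are hypothetical, and the head and tail status octets follow Figure 5 exactly as given in this memo.

```python
def segment_sysex(command, cut_points):
    """Split a verbatim SysEx command into segments per Figure 5.

    command    -- bytes: 0xF0, data octets, 0xF7
    cut_points -- increasing offsets into the data octet list; each
                  offset starts a new sublist
    Returns a list of bytes objects, one per segment, in order.
    """
    if len(command) < 3 or command[0] != 0xF0 or command[-1] != 0xF7:
        raise ValueError("not a verbatim SysEx command with data octets")
    data = command[1:-1]
    bounds = [0] + list(cut_points) + [len(data)]
    # Two or more sublists, each holding at least one data octet.
    if len(bounds) < 3 or any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("need two or more non-empty sublists")
    sublists = [data[a:b] for a, b in zip(bounds, bounds[1:])]
    last = len(sublists) - 1
    segments = []
    for i, sublist in enumerate(sublists):
        head = 0xF0 if i == 0 else 0xF7          # first: F0, others: F7
        tail = 0xF7 if 0 < i < last else 0xF0    # middle: F7, first/last: F0
        segments.append(bytes([head]) + bytes(sublist) + bytes([tail]))
    return segments
```

Called on the original command of Figure 6 with cut_points of [2, 4], the sketch returns the three-segment segmentation shown in that figure.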

The MIDI list format supports pseudo-wire emulation applications, where an MWPP stream is encoded so that it can be accurately re-multiplexed onto the MIDI physical layer [1] after transport. However, the MIDI list format also permits the encoding of MWPP streams that are impossible to accurately regenerate on a MIDI cable.

The SDP rtpmap parameter pwe (see Section 6) indicates that an MWPP stream may be accurately multiplexed onto the physical MIDI layer. The definition of the MIDI list restrictions required for accurate pseudo-wire emulation is outside the scope of this memo.

4. The Recovery Journal

This section introduces the structure of the recovery journal, and defines the bitfields of recovery journal headers. Appendices to this memo complete the bitfield definition of the recovery journal.

A recovery journal codes information about the MIDI command section of all previous packets in an MWPP stream, back to and including an earlier packet called the checkpoint packet. We identify the checkpoint packet by its sequence number. Note that the recovery journal for a packet does not contain information about the MIDI command section of its own packet.

The recovery journal has a three-level structure:

o Top-level header. Encodes recovery journal structure.

o Channel journal header. Encodes recovery information for a single MIDI channel.

o Chapters. Describes recovery information for a single MIDI command type.

Figure 7 shows the top-level structure of the recovery journal. A recovery journal consists of a 3-octet header, followed by a list of channel journals. Channel journals encode recovery information for a single MIDI channel.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|A|G|R|TOTCHAN|   Checkpoint Packet Seqnum    | Channels ...  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 7 -- Top-level recovery journal format

If the A bit is set in the recovery journal header, the recovery journal is "empty", and contains no channel journals. If the A bit is clear, the channel journal list contains (TOTCHAN + 1) channel journals.

The recovery journal header includes an S bit. S bits appear on structures throughout the recovery journal format, with uniform semantics: if the S bit is set to 1, the structure does not encode information about the MIDI command section of the previous packet in the stream. S bits support efficient recovery journal parsing in the common case of a single packet loss. A set S bit on the recovery journal header indicates the previous packet contained an empty MIDI command section.

The 16-bit Checkpoint Packet Seqnum field codes the sequence number of the checkpoint packet for this journal. The G ("guaranteed") bit specifies the method used to update the checkpoint packet; we describe the G bit in detail in Section 5.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S| CHAN  |R|      LENGTH       |P|W|N|A|T|C|R|R| Chapters ...  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 8 -- Channel journal format

Figure 8 shows the structure of a channel journal: a 3-octet header, followed by a list of leaf elements called chapters. A channel journal encodes information about MIDI commands on the MIDI channel coded by the 4-bit CHAN header field. The 10-bit LENGTH field codes the number of octets in the channel journal, including the header.

The third octet of the channel journal header is the Table of Contents (TOC) of the channel journal. The TOC is a set of bits to encode the presence of a chapter in the journal.
Each chapter contains information about a certain class of MIDI command:

o Chapter P: MIDI Program Change (0xC)

o Chapter W: MIDI Pitch Wheel (0xE)

o Chapter N: MIDI NoteOff (0x8), NoteOn (0x9)

o Chapter A: MIDI Poly Aftertouch (0xA)

o Chapter T: MIDI Channel Aftertouch (0xD)

o Chapter C: MIDI Control Change (0xB)

Chapters appear in a list following the header, in order of their appearance in the TOC. The Appendices of this memo describe the bitfield format for each chapter.

5. Checkpoint Packet Policy

(Editor's note: MWPP as defined in this version of the memo provides resiliency for MIDI Channel commands only; upcoming iterations will add partial MIDI Systems resiliency. Thus, the recovery assertions made in this section, at present, only apply to MIDI Channel commands.)

In this section, we describe a normative policy that MWPP senders may use to update the checkpoint packet identity during an MWPP session. The use of the policy is optional, and is signalled via the G bit in the top-level recovery journal header.

MWPP defines the G bit because the strategy a receiver uses to recover from packet loss depends on the checkpoint update policy. If the normative policy is in force, the receiver is always able to "gracefully" recover from the loss of an arbitrary number of packets, upon the receipt of the first packet following the loss [6].

If the policy is not in force, the receiver usually can apply the same recovery techniques used when the policy holds -- but not always. The G bit warns the receiver to run additional algorithms, to check for the rare cases where "less graceful" recovery techniques are needed.

We now normatively define the checkpoint policy, and the usage of the G bit to signal the policy. In this description, we specify the identity of the checkpoint packet by the extended sequence number of the packet as maintained by the sender. We assume that senders can compensate for sequence number rollover in the implementation of the policy.
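
One possible way to compensate for sequence number rollover is sketched informally below in Python. This memo does not specify such an algorithm; the function names extend_seq and checkpoint_field are hypothetical, and the sketch assumes that the 16-bit value being extended lies within 32768 packets of a reference extended sequence number that the sender already holds (for example, the extended sequence number of the most recent packet it sent).

```python
def extend_seq(seq16, reference_ext):
    """Map a 16-bit sequence number onto the sender's extended numbering.

    Returns the extended sequence number closest to reference_ext whose
    low 16 bits equal seq16 (assumption: the true value lies within
    32768 of reference_ext).
    """
    # Signed 16-bit distance from the reference to seq16.
    delta = ((seq16 - reference_ext + 0x8000) & 0xFFFF) - 0x8000
    return reference_ext + delta


def checkpoint_field(checkpoint_ext):
    """16-bit value for a journal header's Checkpoint Packet Seqnum field."""
    return checkpoint_ext & 0xFFFF
```

With a reference of 65540, the 16-bit value 65530 extends to 65530 (a packet sent shortly before rollover), while with a reference of 65530 the 16-bit value 2 extends to 65538 (a packet sent shortly after rollover).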

In order to implement the policy, senders must not advance the checkpoint packet to extended sequence number N until they have direct knowledge that all known receivers have received an MWPP RTP packet with extended sequence number M >= (N - 1). Senders may deduce this knowledge by examining the "last extended sequence number received" fields of the standard RTCP packets from each receiver, or may use other direct feedback mechanisms.

Senders may find that a receiver is not providing feedback for an extended period of time, and that the recovery journal size has grown unacceptably large as a result. To maintain the policy, the only acceptable action in this case is to drop the offending receiver from the session; a time-out mechanism may not be used in lieu of direct feedback to advance the checkpoint packet.

Note that the policy is in effect for "known receivers." If MWPP is sent over true multicast, the receiver may be processing MWPP packets before the sender is aware of its existence. Receiver implementors SHOULD be aware of this start-up phenomenon, and adjust their recovery procedures accordingly.

Senders that implement this policy SHOULD set the G bit on the top-level recovery journal header (Figure 7) to 1; senders that do not implement this policy MUST set the G bit to 0. If a sender starts a session with the policy in effect, and then later abandons the policy, it MUST set the G bit on all recovery journals sent after abandonment to 0, for the remainder of the session. Receivers SHOULD monitor the G bit and adjust their recovery procedures based on its state.

6. Session Description Protocol for RTP Transport

This section describes Session Description Protocol (SDP) [9] definitions for MWPP transport directly over RTP. Section 7 describes the SDP definitions for MWPP transport over the MPEG-4 generic RTP payload format.

The MIME name for this packetization is mwpp.
The SDP rtpmap attribute is declared as

   a=rtpmap: <payload type> mwpp/<srate>/<rj>/<pwe>

The integer parameter srate codes the sampling rate used for the RTP timestamp field, and has the units of Hz.

The binary parameter rj codes the presence (1) or absence (0) of the recovery journal section in MWPP packets.

The binary parameter pwe codes the suitability (1) or unsuitability (0) of the MWPP stream for pseudo-wire emulation applications, as described in Section 3.

For example, the following lines bind the packetization to dynamic payload number 96, and specify an srate of 44100 Hz and the presence of a recovery journal in each RTP packet:

   m=audio 5004 RTP/AVP 96
   c=IN IP4 171.64.92.160
   a=rtpmap: 96 mwpp/44100/1/0

Note that the packetization does not directly support multiple 16-channel MIDI Input sources. Different UDP ports should be used in this case, each devoted to a single source:

   m=audio 5004 RTP/AVP 96
   c=IN IP4 171.64.92.160
   a=rtpmap: 96 mwpp/44100/1/0
   m=audio 5006 RTP/AVP 97
   c=IN IP4 171.64.92.160
   a=rtpmap: 97 mwpp/44100/1/0

7. Session Description Protocol for MPEG-4 generic transport

This section describes Session Description Protocol (SDP) definitions for the MPEG-4 generic RTP payload format [3] [4]. Note that MWPP as defined in this memo creates valid MPEG-4 generic RTP packets; only SDP customization is necessary.

The MIME name for this packetization is mpeg4-generic. The SDP rtpmap attribute is declared as

   a=rtpmap: <payload type> mpeg4-generic/<srate>/<rj>/<pwe>

The definitions of srate, rj, and pwe are identical to the descriptions in Section 6.

The SDP fmtp command configures mpeg4-generic for MWPP transport, as shown below:

   a=fmtp: <payload type> streamtype=5; profile-level-id=15; mode=mwpp;

To signal SingleSL mode, we omit the ConstantSize and SizeLength format parameters from the fmtp command. If the MPEG 4 audio codec requires configuration data to be sent via SDP, AudioSpecificConfig() may be added.

8. Security Considerations

Cryptographic authentication of incoming RTP and RTCP packets is highly recommended when using MWPP. Without such protections, attackers could forge MIDI commands into an ongoing stream, potentially damaging speakers and eardrums. An attacker could also craft RTP and RTCP packets to exploit known bugs in the client, and take effective control of a client machine.

9. Congestion Control

MWPP has congestion control issues that are unique for an RTP audio packetization. In certain applications such as network musical performance [6], the packet rate is linked to the gestural rate of a human performer. MWPP implementations SHOULD monitor the MIDI wire protocol stream for command patterns that result in excessive packet rates, and filter these streams as part of MWPP to reduce the packet rate.

Appendix A.1. Chapter P: MIDI Program Change

A channel journal contains Chapter P if a MIDI Program Change command on this channel is present in the MIDI command section of an earlier packet, back to and including the checkpoint packet.

Figure A.1.1 shows the format for Chapter P.

    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|   PROGRAM   |C| BANK-COARSE |F|  BANK-FINE  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure A.1.1 -- Chapter P Format

The chapter has a fixed size of 24 bits. The PROGRAM field indicates the program value of the most recent MIDI Program Change command sent on this channel. The S bit is set to 1 if this most recent Program Change command did not appear in the previous packet in the stream (i.e. packet N-1, if the recovery journal is a part of packet N).

If a MIDI Control Change command for the Bank Select Coarse controller was sent before this Program Change command, the C bit is set to 1, and the BANK-COARSE field is the Bank Select Coarse controller value that was sent.
The F bit and BANK-FINE field code the Bank Select Fine value in the same manner. The BANK-COARSE and BANK-FINE fields may reflect Control Change commands sent before the checkpoint packet.

Appendix A.2. Chapter W: MIDI Pitch Wheel

A channel journal contains Chapter W if a MIDI Pitch Wheel command on this channel is present in the MIDI command section of an earlier packet, back to and including the checkpoint packet.

Figure A.2.1 shows the format for Chapter W.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|    FIRST    |R|   SECOND    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure A.2.1 -- Chapter W Format

The chapter has a fixed size of 16 bits. The FIRST and SECOND fields are the 7-bit values of the first and second data bytes of the most recent Pitch Wheel command sent on this channel. The S bit is set to 1 if this most recent Pitch Wheel command did not appear in the previous packet in the stream. The R bit is reserved and set to 0.

Appendix A.3. Chapter N: MIDI NoteOff and NoteOn

A channel journal contains Chapter N if a MIDI Note On or a Note Off command on this channel is present in the MIDI command section of an earlier packet, back to and including the checkpoint packet. In the description that follows, we consider MIDI Note On commands with zero velocity to be MIDI Note Off commands.

Figure A.3.1 shows the format for Chapter N.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |B|   LENGTH    |  LOW  | HIGH  |S|   NOTENUM   |Y|  VELOCITY   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|   NOTENUM   |Y|  VELOCITY   |             ....              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   BITFIELD    |   BITFIELD    |     ....      |   BITFIELD    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure A.3.1 -- Chapter N Format

Chapter N codes information about Note On and Note Off commands by coding information about the MIDI note numbers referenced by these commands. The chapter consists of a 2-octet header, and at least one of the following data structures:

o A variable-length note list, coding Note On information.

o A variable-length bitfield, coding Note Off information.

Information about a specific MIDI note number may appear in the note list (if the note number last appears in a Note On command) or the bitfield (if the note number last appears in a Note Off command) but never both.

The header for Chapter N, reproduced in Figure A.3.2, codes the size of the note list and bitfield structures.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |B|   LENGTH    |  LOW  | HIGH  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure A.3.2 -- Chapter N Header

The 7-bit LENGTH field codes the number of 2-octet note logs in the note list. Zero is a valid value for LENGTH, and codes the empty note list.

The 4-bit fields LOW and HIGH determine the number of bitfield bytes that follow the note logs. A bitfield byte codes NoteOff information for eight consecutive MIDI note numbers, with the MSB representing the lowest note number. The MSB of the first bitfield byte codes the note number 8*LOW; the MSB of the last bitfield byte codes the note number 8*HIGH.

A 1 in a bit position codes that a Note Off command happened more recently than a Note On command for this note number, and that this Note Off command occurred in the MIDI command section of an earlier packet, back to and including the checkpoint packet. Note that because Chapter N codes the presence of a Note Off command using a single bit, the Note Off velocity value is not recorded.
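
The mapping from MIDI note numbers to bitfield bits described above can be sketched informally in Python as follows. The sketch is illustrative only: the function names note_off_coded and notes_off are hypothetical, and the code assumes the caller has already extracted the bitfield bytes and the LOW value from the chapter.

```python
def note_off_coded(bitfield, low, note):
    """Return True if the bitfield flags a Note Off for this note number.

    bitfield -- the bitfield bytes of Chapter N, first byte first
    low      -- the LOW header field; the MSB of the first bitfield
                byte codes note number 8*low
    note     -- MIDI note number (0-127)
    """
    offset = note - 8 * low
    if offset < 0 or offset >= 8 * len(bitfield):
        return False                      # note lies outside the coded range
    index, bit = divmod(offset, 8)
    # The MSB represents the lowest note number of each byte.
    return bool(bitfield[index] & (0x80 >> bit))


def notes_off(bitfield, low):
    """List every note number flagged in the bitfield, in ascending order."""
    first = 8 * low
    return [first + i for i in range(8 * len(bitfield))
            if note_off_coded(bitfield, low, first + i)]
```

For instance, with LOW set to 7 and bitfield bytes 0x08 0x80, the flagged note numbers are 60 (the fifth bit of the byte that starts at note 56) and 64 (the MSB of the next byte).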

If LOW is less than or equal to HIGH, there are (HIGH - LOW + 1) bitfield octets in the chapter. An empty bitfield structure is coded by setting LOW to 15 and HIGH to 0.

The B bit is set to 1 if the MIDI command section of the previous packet did not include a Note Off command for this channel.

The note list structure consists of LENGTH 2-octet note logs. The note log structure is reproduced below.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|   NOTENUM   |Y|  VELOCITY   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure A.3.3 -- Chapter N Note Log

A note log will exist for a note number (coded by the 7-bit NOTENUM field) if a Note On command happened more recently than a Note Off command for this note number, and if this Note On command occurred in the MIDI command section of an earlier packet, back to and including the checkpoint packet. A note number may not be represented by multiple note logs in the note list.

The 7-bit VELOCITY field codes the velocity value for this most-recent NoteOn command, and is never zero: Note On commands with zero velocity are treated as Note Off commands, and coded in the bitfield structure. The S bit is set to 1 if the Note On command coded by the note log is not in the MIDI command section of the previous packet.

The note log does not contain the execution time of the Note On command it codes, for efficiency reasons. In lieu of a timestamp, the Y bit codes information about the execution time of the Note On command coded by the note log. The Y bit is set to 1 if the most recent event coded in the MIDI command section of the packet containing the recovery journal is considered to be simultaneous with the Note On command coded by the note log. If the MIDI command section of the packet contains no events, Y is set to 1 if a hypothetical MIDI command occurring at the RTP timestamp time would be considered simultaneous.
The definition of simultaneity is implementation dependent.

Appendix A.4. Chapter A: MIDI Poly Aftertouch

A channel journal contains Chapter A if a MIDI Poly Aftertouch command
on this channel is present in the MIDI command section of an earlier
packet, back to and including the checkpoint packet. Poly Aftertouch
commands contained in packets previous to the checkpoint packet are
never coded in Chapter A.

Figure A.4.1 shows the variable-length format for Chapter A.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|   LENGTH    |F|   NOTENUM   |R|  PRESSURE   |F|   NOTENUM   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|  PRESSURE   |  ....                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure A.4.1 -- Chapter A format

The chapter consists of a 1-octet header, followed by a variable
length list of 2-octet note logs. A note log appears for a note number
if a Poly Aftertouch command is present for the note number in the
MIDI command section of an earlier packet, back to and including the
checkpoint packet. A note number may not be represented by multiple
note logs in the list.

The 7-bit LENGTH field codes the number of note logs in the list,
minus one. The expression (1 + 2*(LENGTH + 1)) yields the number of
octets in Chapter A. The S bit in the header is set to 1 if the MIDI
command section of the previous packet does not contain a Poly
Aftertouch command on this channel.

Figure A.4.2 reproduces the note log structure of Chapter A.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   NOTENUM   |R|  PRESSURE   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure A.4.2 -- Chapter A Note Log

The 7-bit PRESSURE field codes the pressure value of the most recent
Poly Aftertouch command for the MIDI note number coded by the 7-bit
NOTENUM field.
The F bit is 1 if this most recent Poly Aftertouch command did not
appear in the previous packet. The R bit is reserved, and is set to 0.

Appendix A.5. Chapter T: MIDI Channel Aftertouch

A channel journal contains Chapter T if a MIDI Channel Aftertouch
command on this channel is present in the MIDI command section of an
earlier packet, back to and including the checkpoint packet.

Figure A.5.1 shows the format for Chapter T.

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |S|  PRESSURE   |
   +-+-+-+-+-+-+-+-+

   Figure A.5.1 -- Chapter T Format

The chapter has a fixed size of 8 bits. The 7-bit PRESSURE field holds
the pressure value of the most recent Channel Aftertouch command sent
on this channel. The S bit is set to 1 if this most recent Channel
Aftertouch command for this channel did not appear in the previous
packet in the stream.

Appendix A.6. Chapter C: MIDI Control Change

A channel journal contains Chapter C if a MIDI Control Change command
on this channel is present in the MIDI command section of an earlier
packet, back to and including the checkpoint packet. Control Change
commands contained in packets previous to the checkpoint packet are
never coded in Chapter C.

Figure A.6.1 shows the variable-length format for Chapter C.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|   LENGTH    |F| CONTROLLER  |R| VALUE/COUNT |F| CONTROLLER  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R| VALUE/COUNT |  ....                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure A.6.1 -- Chapter C format

The chapter consists of a 1-octet header, followed by a variable
length list of 2-octet controller logs. A controller log appears for a
controller number if a Control Change command is present for the
controller number in the MIDI command section of an earlier packet,
back to and including the checkpoint packet.
A controller number may not be represented by multiple controller logs
in the list.

The 7-bit LENGTH field codes the number of controller logs in the
list, minus one. The expression (1 + 2*(LENGTH + 1)) yields the number
of octets in Chapter C. The S bit in the header is set to 1 if the
MIDI command section of the previous packet does not contain a MIDI
Control Change command on this channel.

Figure A.6.2 reproduces the controller log structure of Chapter C.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F| CONTROLLER  |R| VALUE/COUNT |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure A.6.2 -- Chapter C Controller Log

The 7-bit CONTROLLER field identifies the controller number. For most
controller numbers, the 7-bit VALUE/COUNT field codes the control
value of the most recent Control Change command for this controller
number. The F bit is 1 if this most recent Control Change command did
not appear in the previous packet. The R bit is reserved, and is set
to 0.

This value-coding scheme for the VALUE/COUNT field provides useful
recovery information for controllers that modulate a continuous
variable, such as controller 1 (Modulation Wheel). However, other
controllers act as toggle switches, and better recovery methods can be
implemented if the VALUE/COUNT field codes the number of toggle
transitions. Exact semantics for each toggle controller type are
defined below (editor's note: this list is incomplete, and will be
completed before the Final Call).

For controller number 66 (Sostenuto Pedal on/off), the VALUE/COUNT
field has the value 0 if the most recent Sostenuto command codes a
pedal release. However, if the most recent Sostenuto command codes a
pedal depression, the VALUE/COUNT field codes the total number of
Sostenuto depression commands present in the MIDI command section of
all packets over the lifetime of the stream, including this most
recent Sostenuto command.
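
As an illustration of the Chapter C layout and of the count coding for
toggle controllers, the following Python fragment is a minimal,
non-normative sketch. The names parse_chapter_c and toggle_count_code
are hypothetical and do not appear in this memo. The wrap-around
behaviour in toggle_count_code is one reading of the rule given in the
text that follows this sketch (counts above 127 wrap, and the value 0
is skipped); it assumes the coded values cycle through 1..127.

```python
from typing import List, Tuple


def parse_chapter_c(buf: bytes) -> Tuple[int, List[Tuple[int, int, int]], int]:
    """Return (S bit, [(F bit, controller, value_or_count), ...], octets used)."""
    if not buf:
        raise ValueError("Chapter C header needs 1 octet")
    s_bit = buf[0] >> 7
    n_logs = (buf[0] & 0x7F) + 1        # LENGTH codes the log count minus one
    size = 1 + 2 * n_logs               # (1 + 2*(LENGTH + 1)) octets
    if len(buf) < size:
        raise ValueError("truncated Chapter C")

    logs = []
    for i in range(n_logs):
        first, second = buf[1 + 2 * i], buf[2 + 2 * i]
        # The reserved R bit (MSB of the second octet) is masked off.
        logs.append((first >> 7, first & 0x7F, second & 0x7F))
    return s_bit, logs, size


def toggle_count_code(count: int) -> int:
    """Map a lifetime command count (>= 1) onto 1..127, never yielding 0.

    Assumption: "modulo arithmetic ... but the value 0 is skipped" is
    read as a cycle 1, 2, ..., 127, 1, 2, ...
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1) % 127 + 1
```

Under these assumptions, the octets 0x01 0x01 0x40 0xC2 0x03 decode as
S = 0 with two controller logs: controller 1 with value 64 (F = 0) and
controller 66 with a depression count of 3 (F = 1). A receiver that
remembers the last count it saw for controller 66 can compare it with
the journal value to detect pedal depressions lost in transit.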
If the count exceeds 127, modulo arithmetic is used, but the value 0
is skipped.

For controller numbers 120 (All Sound Off) and 123 (All Notes Off),
the VALUE/COUNT field codes the total number of commands for the
controller number present in the MIDI command sections of all packets
over the lifetime of the stream, including this most recent command.
If the count exceeds 127, modulo arithmetic is used, but the value 0
is skipped.

Appendix B. Author Addresses

   John Lazzaro (corresponding author)
   UC Berkeley
   CS Division
   315 Soda Hall
   Berkeley CA 94720-1776
   Email: lazzaro@cs.berkeley.edu

   John Wawrzynek
   UC Berkeley
   CS Division
   631 Soda Hall
   Berkeley CA 94720-1776
   Email: johnw@cs.berkeley.edu

Appendix C. References

[1] MIDI Manufacturers Association. The complete MIDI 1.0 detailed
    specification, 1996. http://www.midi.org

[2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson.
    RFC 1889: RTP: A transport protocol for real-time applications,
    1996.

[3] Internet Engineering Task Force. RTP Payload Format for MPEG-4
    Streams. Work in progress, draft-ietf-avt-mpeg4-multisl-02.txt.

[4] Internet Engineering Task Force. Use of "RFC-generic" for MPEG-4
    Elementary Streams with no SL layer. Work in progress,
    draft-ietf-avt-mpeg4-simple-00.txt.

[5] International Standards Organization. ISO 14496 MPEG-4, Part 3
    (Audio) Subpart 5 (Structured Audio) 1999.

[6] John Lazzaro and John Wawrzynek. A Case for Network Musical
    Performance. The 11th International Workshop on Network and
    Operating Systems Support for Digital Audio and Video (NOSSDAV
    2001) June 25-26, 2001, Port Jefferson, New York.
    http://www.cs.berkeley.edu/~lazzaro/sa/pubs/pdf/nossdav01.pdf

[7] Sfront source code release, includes a Linux networking client
    that implements the MIDI RTP packetization.
    http://www.cs.berkeley.edu/~lazzaro/sa/

[8] Dominique Fober, Yann Orlarey, Stephane Letz. Real Time Musical
    Events Streaming over Internet. Proceedings of the International
    Conference on WEB Delivering of Music 2001, pages 147-154.
    http://www.grame.fr/~fober/RTESP-Wedel.pdf

[9] M. Handley and V. Jacobson. RFC 2327: SDP: Session Description
    Protocol. 1998.

Appendix D. Expiration Notice

This document expires August 19, 2002.