Internet Draft                                        Stinson S. Mathai
draft-mathai-avt-smv-00.txt                         Lucent Technologies
August 14, 2001
Expires: February 14, 2002

An RTP Payload Format for SMV Speech

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as work in progress.

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

ABSTRACT

This document describes the RTP payload format for Selectable Multirate Vocoder (SMV) Speech. The payload format supports several packet formats for different application scenarios. A bundled/interleaved format is included to reduce the effect of packet loss on speech quality. A non-bundled format is also supported for conversational applications.

Table of Contents

1. Introduction
2. Background
3. RTP/SMV Packet Format
   3.1 Type 1 RTP/SMV Packet Format
   3.2 Type 2 RTP/SMV Packet Format
   3.3 Detection Between the Type 1 and Type 2 Packets
4. Packet Table of Content Entries and CODEC Data Frame Format
   4.1 Packet Table of Content entries
   4.2 The Codec Data Frame
      4.2.1 Rate 1 Frame Layout
      4.2.2 Rate 1/2 Frame Layout
      4.2.3 Rate 1/4 Frame Layout
      4.2.4 Rate 1/8 Frame Layout
5. Bundling Codec Data Frames in Type 1 Packets
6. Interleaving Codec Data Frames in Type 1 Packets
   6.1 Finding Interleave Group Boundaries
   6.2 Reconstructing Interleaved Speech
   6.3 Receiving Invalid Interleaving Values
   6.4 Additional Receiver Responsibilities
7. Handling Lost RTP Packets
8. Implementation Issues
   8.1 Interleaving Length
   8.2 Signaling of Reduce Rate
9. IANA Considerations
   9.1 Storage Mode
   9.2 SMV MIME Registration
10. Mapping to SDP Parameters
11. Security Considerations
12. Acknowledgements
13. References
14. Author's Address

1. Introduction

This document describes how compressed SMV speech as produced by the SMV CODEC [1] may be formatted for use as an RTP payload type. Methods are provided to packetize the codec data frames into RTP packets, in bundled/interleaved and zero-header formats.
The sender may choose, among the available formats, the one best suited to the application scenario, based on network conditions, bandwidth restrictions, delay requirements and packet-loss tolerance.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [3].

2. Background

The Electronic Industries Association (EIA) and the Telecommunications Industry Association (TIA), in association with the 3GPP2 standards organization, define a speech compression algorithm for use in cdma2000 applications, called SMV. In the TR45 group this standard is called PN 4575.

The SMV CODEC [1] compresses each 20 milliseconds of 8000 Hz, 16-bit sampled input speech into one of four different size output frames: Rate 1 (171 bits), Rate 1/2 (80 bits), Rate 1/4 (40 bits), or Rate 1/8 (16 bits). The CODEC chooses the output frame rate based on analysis of the input speech and the current operating mode (either normal or one of several reduced rates).

The SMV uses the same rate set as EVRC and 8K QCELP, and will require the use of a new voice service option. The SMV is designed to run in three modes: Modes 0, 1 and 2. Mode 0 is designed to provide higher quality than EVRC and QCELP13K at the same Average Data Rate (ADR) as EVRC. Mode 1 is designed to have the same voice quality as EVRC at a lower ADR, approximately 70% of EVRC's ADR. Finally, Mode 2 is an economy mode with voice quality better than G.723.1 at 6.3 kbps, and as good as EVRC in some conditions. The ADR for Mode 2 is about 57% of EVRC's ADR.

3. RTP/SMV Packet Format

The RTP timestamp is in units of 1/8000 of a second. The RTP payload data for the SMV CODEC takes one of the following two types.

3.1 Type 1 RTP/SMV Packet Format

This format is intended for the situation where the sender and the receiver use interleaving and/or bundling to send one or more codec frames per packet. The RTP packet for this format is as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        RTP Header [2]                         |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   | RR| LLL | NNN |                                               |
   +-+-+-+-+-+-+-+-+     one or more ToC entries     +-------------+
   |                                                 |             |
   +-------------------------------------------------+             |
   |                                                               |
   |                 one or more codec data frames                 |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The RTP header has the expected values as described in [2]. The M bit should be set as specified in the applicable RTP profile, for example, RFC 1890. Note that RFC 1890 specifies that if the sender does not suppress silence (i.e., sends a frame on every 20 millisecond interval) the M bit will always be zero. When multiple codec data frames are present in a single RTP packet, the timestamp is, as always, that of the oldest data represented in the RTP packet.

The assignment of an RTP payload type for this new packet format is outside the scope of this document, and will not be specified here. It is expected that the RTP profile for a particular class of applications will assign a payload type for this encoding, or if that is not done then a payload type in the dynamic range shall be chosen.

The first octet of the payload is the interleaving byte. Its fields have the following meaning:

Reserved (RR): 2 bits
   MUST be set to zero by sender, SHOULD be ignored by receiver.

Interleave (LLL): 3 bits
   MUST have a value between 0 and 7 inclusive. A value of 0 indicates that interleaving is not used.

Interleave Index (NNN): 3 bits
   MUST have a value less than or equal to the value of LLL. Values of NNN greater than the value of LLL are invalid.

The Table of Content (ToC) field contains the indexes for the codec data frame(s) in the packet. There is one entry for each codec data frame.

More than one codec data frame MAY be included in a single RTP packet by a sender. The data frames may be included in one of two manners: bundled or interleaved. Bundling of the codec data frames is described in detail in Section 5, and interleaving in Section 6.

3.2 Type 2 RTP/SMV Packet Format

The Type 2 RTP/SMV Packet Format is designed for maximum efficiency in transmission of the SMV codec data. Only one codec data frame is sent with each RTP packet, and there is no interleaving byte or ToC field in front of the codec data. Because each packet of this type carries exactly one codec data frame, the receiver determines the SMV codec rate of the frame from its length.

The RTP header for Type 2 RTP/SMV Packet Format is the same as described in Section 3.1 for Type 1 RTP/SMV Packet Format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        RTP Header [2]                         |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   +           ONLY one codec data frame           +-+-+-+-+-+-+-+-+
   |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3 Detection Between the Type 1 and Type 2 Packets

All receivers MUST be able to process both types of packets. The sender MAY choose to use one or both types of packets.

The packets of the two types can be distinguished by checking the payload type field in the RTP header. The association of payload type number with the packet type is done out-of-band, for example by SDP during the setup of a session.

4. Packet Table of Content Entries and CODEC Data Frame Format

4.1 Packet Table of Content entries

For each of the codec data frames in Type 1 packets, there is a Table of Content (ToC) entry associated with it.
The ToC entry indicates whether rate reduction is desired, whether more entries follow the current one, and the rate of the corresponding codec frame. Type 2 packets do NOT have the ToC field, since there is always only one codec data frame in each Type 2 packet.

Each ToC entry is one octet in size. The format of the octet is indicated below:

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |F|D| frm type  |
   +-+-+-+-+-+-+-+-+

Further Entry Indication (F): 1 bit
   Indicates whether more ToC entries follow the current one or the current one is the last in the ToC field. F = 1 indicates that more ToC entries follow. F = 0 indicates that the current entry is the last one in the ToC.

Reduce Rate (D): 1 bit
   Setting the 'D' bit indicates that this packet is requesting a reduced codec rate for the reverse direction. When the 'D' bit is not set the packet is requesting that the codec resume normal operation. In the case of packet loss the codec should continue to operate in the mode indicated by the last packet received. Receivers are not required to respond to the Reduce Rate signal (see the discussion in Section 8.2).

Frame Type: 6 bits
   The frame type values and the size of the associated codec data frame are given in the table below:

   Value   Rate      Total CODEC data frame size (in octets)
   ---------------------------------------------------------
     0     Blank      0
     1     1/8        2
     2     1/4        5
     3     1/2       10
     4     1         22
    14     Erasure    0   (SHOULD NOT be transmitted by sender)

All values not listed in the above table MUST be considered reserved. Receipt of a ToC entry with a reserved value in Frame Type MUST be considered invalid data.
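
As an illustration only, the Type 1 payload layout of Sections 3.1 and 4.1 could be parsed along the following lines. This is a minimal sketch, not part of the payload format: the function name parse_type1_payload and the table name FRAME_SIZES are hypothetical, and raising ValueError is merely one possible way for a receiver to flag data that this document calls invalid.

```python
# Frame sizes in octets, copied from the Frame Type table in Section 4.1.
FRAME_SIZES = {0: 0, 1: 2, 2: 5, 3: 10, 4: 22, 14: 0}


def parse_type1_payload(payload: bytes):
    """Split a Type 1 payload into (LLL, NNN, frames).

    Each element of frames is (reduce_rate, frame_type, frame_bytes).
    Hypothetical helper: the name and return shape are assumptions.
    """
    if not payload:
        raise ValueError("empty payload")

    # Interleaving byte: | RR | LLL | NNN |, bit 0 is the most significant.
    first = payload[0]
    lll = (first >> 3) & 0x07   # Interleave
    nnn = first & 0x07          # Interleave Index (RR bits are ignored)
    if nnn > lll:
        # Section 6.3: such a packet is treated as lost by the receiver.
        raise ValueError("NNN greater than LLL is invalid")

    # Walk the ToC entries until one with F = 0 is found.
    toc = []
    pos = 1
    while True:
        if pos >= len(payload):
            raise ValueError("ToC runs past the end of the payload")
        entry = payload[pos]
        pos += 1
        more = bool(entry & 0x80)         # F bit
        reduce_rate = bool(entry & 0x40)  # D bit
        frame_type = entry & 0x3F         # 6-bit Frame Type
        if frame_type not in FRAME_SIZES:
            raise ValueError("reserved Frame Type %d" % frame_type)
        toc.append((reduce_rate, frame_type))
        if not more:
            break

    # The codec data frames follow the ToC in the same order.
    frames = []
    for reduce_rate, frame_type in toc:
        size = FRAME_SIZES[frame_type]
        data = payload[pos:pos + size]
        if len(data) != size:
            raise ValueError("truncated codec data frame")
        frames.append((reduce_rate, frame_type, data))
        pos += size
    return lll, nnn, frames
```

The number of bundled frames is never transmitted explicitly, so the sketch counts ToC entries first and only then slices the codec data, which is the order Section 5 implies.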

4.2 The Codec Data Frame

The output of the SMV CODEC must be converted into CODEC data frames for inclusion in the RTP payload as follows:

The bits as numbered in the standard [1], from the lowest to the highest, are packed into octets. The lowest numbered bit (bit 1 for Rate 1, Rate 1/2, Rate 1/4 and Rate 1/8) is placed in the most significant bit (Internet bit 0) of octet 1 of the CODEC data frame, the second lowest bit is placed in the second most significant bit of the first octet, the third lowest in the third most significant bit of the first octet, and so on. This continues until all of the bits have been placed in the CODEC data frame.

The remaining unused bits of the last octet of the CODEC data frame MUST be set to zero (note that this is only applicable to Rate 1 frames, as the others fit completely into a whole number of octets).

4.2.1 Rate 1 frame layout

The diagrams in this section give a detailed layout of a Rate 1 frame after it is converted into a CODEC data frame.

The codec data frame for a Rate 1 frame is 22 bytes long. Bits 1 through 171 from the standard Rate 1 frame are placed as indicated, with bits marked with "Z" being set to zero. The Rate 1/2, Rate 1/4 and Rate 1/8 standard frames are converted similarly but do not require zero padding because they align on octet boundaries.
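
As a rough illustration of the packing rule in Section 4.2, the following sketch packs a list of codec bits into octets, most significant bit first, and leaves any unused bits of the last octet at zero. The function name pack_codec_frame and the representation of the codec output as a Python list of 0/1 values are assumptions made for this example only.

```python
def pack_codec_frame(bits):
    """Pack codec bits into a CODEC data frame.

    bits[0] is the lowest numbered bit of the standard frame (bit 1).
    It lands in the most significant bit of the first octet. Unused
    bits of the last octet stay zero (the "Z" bits of a Rate 1 frame).
    Hypothetical helper, not defined by this document.
    """
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)
```

For a Rate 1 frame, 171 bits occupy 21 full octets plus 3 bits of the 22nd octet, which leaves the 5 zero padding bits described above.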

Rate 1 CODEC data frame (bytes 0 - 3)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
   |0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|3|
   |1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Rate 1 CODEC data frame (bytes 18 - 21)

    1           1                   1                   1
    4           5                   6                   7
    4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1| | | | | |
   |4|4|4|4|4|5|5|5|5|5|5|5|5|5|5|6|6|6|6|6|6|6|6|6|6|7|7|Z|Z|Z|Z|Z|
   |5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1| | | | | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.2.2 Rate 1/2 frame layout

The diagrams in this section give a detailed layout of a Rate 1/2 frame after it is converted into a CODEC data frame.

The codec data frame for a Rate 1/2 frame is 10 bytes long. Bits 1 through 80 from the standard Rate 1/2 frame are placed as indicated. The Rate 1/2 standard frames do not require zero padding because they align on octet boundaries.

Rate 1/2 CODEC data frame (bytes 0 - 3)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
   |0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|3|
   |1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Rate 1/2 CODEC data frame (bytes 6 - 9)

    4   5                   6                   7
    8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
   |4|5|5|5|5|5|5|5|5|5|5|6|6|6|6|6|6|6|6|6|6|7|7|7|7|7|7|7|7|7|7|8|
   |9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.2.3 Rate 1/4 frame layout

The diagrams in this section give a detailed layout of a Rate 1/4 frame after it is converted into a CODEC data frame.

The codec data frame for a Rate 1/4 frame is 5 bytes long. Bits 1 through 40 from the standard Rate 1/4 frame are placed as indicated. The Rate 1/4 standard frames do not require zero padding because they align on octet boundaries.

Rate 1/4 CODEC data frame (bytes 0 - 3)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
   |0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|3|
   |1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Rate 1/4 CODEC data frame (byte 4)

    3
    2 3 4 5 6 7 8 9
   +-+-+-+-+-+-+-+-+
   |0|0|0|0|0|0|0|0|
   |3|3|3|3|3|3|3|4|
   |3|4|5|6|7|8|9|0|
   +-+-+-+-+-+-+-+-+

4.2.4 Rate 1/8 frame layout

The diagram in this section gives a detailed layout of a Rate 1/8 frame after it is converted into a CODEC data frame.

The codec data frame for a Rate 1/8 frame is 2 bytes long. Bits 1 through 16 from the standard Rate 1/8 frame are placed as indicated. The Rate 1/8 standard frames do not require zero padding because they align on octet boundaries.

Rate 1/8 CODEC data frame (bytes 0 - 1)

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
   |0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|
   |1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5. Bundling Codec Data Frames in Type 1 Packets

As indicated in Section 3.1, more than one codec data frame MAY be included in a single RTP packet by a sender. Bundling codec data frames means that multiple data frames are included consecutively in a packet without interleaving. The bundling of codec data frames is signaled by setting the LLL value in the Interleaving Byte to 0.

Senders MAY support bundling. All receivers MUST support bundling. Receivers MAY signal the maximum number of codec data frames they can handle in a single RTP packet.
Furthermore, senders have the following additional restrictions:

o  MUST NOT bundle more codec data frames in a single RTP packet than signaled by maxptime in Section 9.

o  SHOULD NOT bundle more codec data frames in a single RTP packet than will fit in the MTU of the RTP transport protocol. For the purpose of computing the maximum bundling value, all CODEC data frames should be assumed to have the Rate 1 size.

Since no count is transmitted as part of the RTP payload and the codec data frames have differing lengths, the only way to determine how many codec data frames are present in the RTP packet is to examine the ToC field of the RTP packet until the entry with the F bit set to 0 is reached.

6. Interleaving Codec Data Frames in Type 1 Packets

Senders MAY support interleaving. All receivers MUST support interleaving. Receivers MAY signal the maximum number of codec data frames they can handle in a single RTP packet. Interleaving of codec data frames is signaled by setting the LLL value in the Interleaving Byte to a value between 1 and 7 inclusive.

Given a time-ordered sequence of output frames from the SMV CODEC numbered 0..n, a bundling value B, and an interleave value L where n = B * (L+1) - 1, the output frames are placed into RTP packets as follows (the values of the fields LLL and NNN are indicated for each RTP packet):

First RTP Packet in Interleave group: LLL=L, NNN=0
   Frame 0, Frame L+1, Frame 2(L+1), Frame 3(L+1), ... for a total of B frames

Second RTP Packet in Interleave group: LLL=L, NNN=1
   Frame 1, Frame 1+L+1, Frame 1+2(L+1), Frame 1+3(L+1), ... for a total of B frames

This continues to the last RTP packet in the interleave group:

(L+1)th RTP Packet in Interleave group: LLL=L, NNN=L
   Frame L, Frame L+L+1, Frame L+2(L+1), Frame L+3(L+1), ... for a total of B frames

Senders MUST transmit in timestamp-increasing order.
Furthermore, within each interleave group, the RTP packets making up the interleave group MUST be transmitted in value-increasing order of the NNN field. While this does not guarantee reduced end-to-end delay on the receiving end, when packets are delivered in order by the underlying transport, delay will be reduced to the minimum possible.

Additionally, senders have the following restrictions:

o  Once a session has begun with a given maximum interleaving value, set by maxinterleave in Section 9, MUST NOT increase the interleaving value beyond the maximum that was signaled.

o  MAY change the interleaving value only between interleave groups.

6.1 Finding Interleave Group Boundaries

Given an RTP packet with sequence number S, interleave value (field LLL) L, and interleave index value (field NNN) N, the interleave group consists of RTP packets with sequence numbers from S-N to S-N+L inclusive. In other words, the interleave group always consists of L+1 RTP packets with sequential sequence numbers. The bundling value for all RTP packets in an interleave group MUST be the same.

The receiver determines the expected bundling value for all RTP packets in an interleave group by the number of CODEC data frames bundled in the first RTP packet of the interleave group received. Note that this may not be the first RTP packet of the interleave group sent if packets are delivered out of order by the underlying transport.

On receipt of an RTP packet in an interleave group with other than the expected bundling value, the receiver MAY discard CODEC data frames off the end of the RTP packet or add erasure CODEC data frames to the end of the packet in order to manufacture a substitute packet with the expected bundling value. The receiver MAY instead choose to discard the whole interleave group and play silence.
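
The sender-side frame placement of Section 6 and the group-boundary rule above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names interleave and group_sequence_numbers and the dictionary layout of a packet are hypothetical, and the modulo 65536 arithmetic reflects the 16-bit RTP sequence number of [2] rather than anything stated in this document.

```python
def interleave(frames, L, B):
    """Distribute B * (L+1) time-ordered frames over L+1 packets.

    Packet n (NNN = n) carries frames n, n+(L+1), n+2(L+1), ...
    for a total of B frames. Hypothetical helper for illustration.
    """
    if not 0 <= L <= 7:
        raise ValueError("LLL must be between 0 and 7")
    if len(frames) != B * (L + 1):
        raise ValueError("expected exactly B * (L+1) frames")
    return [
        {"LLL": L, "NNN": n, "frames": frames[n::L + 1]}
        for n in range(L + 1)
    ]


def group_sequence_numbers(S, L, N):
    """Return the sequence numbers S-N .. S-N+L of the interleave group.

    Arithmetic is modulo 2**16 because RTP sequence numbers wrap
    (an assumption taken from the RTP specification).
    """
    if not 0 <= N <= L <= 7:
        raise ValueError("invalid LLL or NNN value")
    first = (S - N) % 65536
    return [(first + k) % 65536 for k in range(L + 1)]
```

For example, with L = 2 and B = 2, frames 0..5 are split into packets carrying frames (0, 3), (1, 4) and (2, 5), so the loss of any single packet never removes two consecutive frames.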

6.2 Reconstructing Interleaved Speech

Given an RTP sequence number ordered set of RTP packets in an interleave group numbered 0..L, where L is the interleave value and B is the bundling value, and CODEC data frames within each RTP packet that are numbered in order from first to last with the numbers 0..B-1, the original, time-ordered sequence of output frames from the CODEC may be reconstructed as follows:

First L+1 frames:
   Frame 0 from packet 0 of interleave group
   Frame 0 from packet 1 of interleave group
   And so on up to...
   Frame 0 from packet L of interleave group

Second L+1 frames:
   Frame 1 from packet 0 of interleave group
   Frame 1 from packet 1 of interleave group
   And so on up to...
   Frame 1 from packet L of interleave group

And so on up to...

Bth L+1 frames:
   Frame B-1 from packet 0 of interleave group
   Frame B-1 from packet 1 of interleave group
   And so on up to...
   Frame B-1 from packet L of interleave group

6.3 Receiving Invalid Interleaving Values

On receipt of an RTP packet with an invalid value of the LLL or NNN field, the RTP packet MUST be treated as lost by the receiver for the purpose of generating erasure frames as described in Section 7.

6.4 Additional Receiver Responsibilities

Assume that the receiver has begun playing frames from an interleave group. The time has come to play frame x from packet n of the interleave group. Further assume that packet n of the interleave group has not been received. As described in Section 7, an erasure frame will be sent to the SMV CODEC.

Now, assume that packet n of the interleave group arrives before frame x+1 of that packet is needed. Receivers SHOULD use frame x+1 of the newly received packet n rather than substituting an erasure frame.
In other words, just because packet n was not available the first time it was needed to reconstruct the interleaved speech, the receiver SHOULD NOT assume it is not available when it is subsequently needed for interleaved speech reconstruction.

7. Handling Lost RTP Packets

The SMV CODEC supports the notion of erasure frames. These are frames that for whatever reason are not available. When reconstructing interleaved speech or playing back non-interleaved speech, erasure frames MUST be fed to the SMV CODEC for all of the missing packets.

Receivers MUST use the timestamp clock to determine how many CODEC data frames are missing. Each CODEC data frame advances the timestamp clock EXACTLY 160 counts (20 milliseconds at 8000 counts per second).

Since the bundling/interleaving value may vary, the timestamp clock is the only reliable way to calculate exactly how many CODEC data frames are missing when a packet is dropped.

Specifically, when reconstructing interleaved speech, a missing RTP packet in the interleave group should be treated as containing B erasure CODEC data frames, where B is the bundling value for that interleave group.

8. Implementation Issues

8.1 Interleaving Length

The SMV CODEC interpolates the missing speech content when given an erasure frame. However, the best quality is perceived by the listener when erasure frames are not consecutive. This makes interleaving desirable as it increases speech quality when packet loss may occur.

On the other hand, interleaving can greatly increase the end-to-end delay. Where an interactive session is desired, the non-interleaved RTP payload type is recommended.

When end-to-end delay is not a concern, an interleaving value (field LLL) of 4 or 5 is recommended, subject to MTU limitations.

The parameters maxptime and maxinterleave, signaled at the initial setup of the session, guarantee that the receiver can allocate a well-known amount of buffer space at the beginning of the session that will be sufficient for all future reception in that session.
Less buffer space may be required at some point in the future if the sender decreases the bundling value or interleaving value, but never more buffer space. This prevents the possibility of the receiver needing to allocate more buffer space (with the possible result that none is available).

8.2 Signaling of Reduce Rate

The Reduce Rate signal requests a reduction of the codec rate in the reverse direction. It is not required that all implementations react to the Reduce Rate signal. If an implementation does react to the Reduce Rate signal, it MUST be able to process/react to the D bit in Type 1 packets. The Reduce Rate signal should only be used in one-to-one sessions. In multiparty sessions, all received Reduce Rate signals MUST be discarded. In addition, the Reduce Rate signal may also be sent through non-RTP means, which is out of the scope of this specification.

9. IANA Considerations

One new MIME sub-type as described in this section is to be registered.

The MIME name for the SMV codec is allocated from the IETF tree since SMV is expected to be a widely used codec for voice-over-IP applications.

The RTP mode has been described in the previous sections.

9.1 Storage Mode

The storage mode is used for storing speech frames, e.g. as a file or e-mail attachment.

The file begins with a magic number to identify that it is an SMV file. The magic number for SMV corresponds to the ASCII character string "#!SMV\n", i.e., 0x2321534d560a.

The speech codec frames are stored in consecutive order, with a ToC entry byte prefixed to each codec data frame.

Speech frames lost in transmission and non-received frames MUST be stored as erasure frames (frame type 14, see definition in Section 4.1) to keep synchronization with the original media.

9.2 SMV MIME Registration

Media Type Name: audio

Media Subtype Name: SMV

Required Parameters:

   ptype: The type of the RTP/SMV packets. The valid values are 1 or 2.

Optional parameters for RTP mode:

   ptime: Defined as usual for RTP audio.

   maxptime: The maximum amount of media which can be encapsulated in each packet, expressed as time in milliseconds. The time shall be calculated as the sum of the time the media present in the packet represents. The time SHOULD be a multiple of the frame size. If not signaled, the default maxptime value is 200 milliseconds.

   maxinterleave: Maximum number for interleaving value. The interleaving values used in the entire session should not exceed this maximum value. If not signaled, the default maxinterleave value is 5.

Optional parameters for storage mode: none

Encoding considerations for RTP mode: see Section 5 and Section 6 of RFC xxxx.

Encoding considerations for storage mode: The SMV speech frames are packed into consecutive compound SMV payloads, see Section 5 and Section 6 of RFC xxxx. The compound SMV payloads must be stored in sequential order. Furthermore, missing frames and non-received frames during non-speech periods must be encapsulated into a compound SMV payload as blank frames or erasures. Each receiving entity that accepts this MIME type must be able to decode all SMV coding modes.

Security considerations: see Section 11 "Security Considerations" of RFC xxxx.

Public specification: RFC xxxx.

Additional information for storage mode:

   Magic number: #!SMV\n
   File extensions: evc, EVC
   Macintosh file type code: none
   Object identifier or OID: none

Intended usage: COMMON. It is expected that many VoIP applications (as well as mobile applications) will use this type.

Person & email address to contact for further information: Smathai@lucent.com

Author/Change controller:
   Smathai@lucent.com
   IETF Audio/Video transport working group

10. Mapping to SDP Parameters

Please note that this section applies to the RTP mode only.

Parameters are mapped to SDP [5] as usual.

Example usage in SDP:

   m=audio 49120 RTP/AVP 97
   a=rtpmap:97 SMV/8000
   a=fmtp:97 ptype=1; maxptime=4

11. Security Considerations

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [2], and any appropriate profile (for example [4]). This implies that confidentiality of the media streams is achieved by encryption. Because the data compression used with this payload format is applied end-to-end, encryption may be performed after compression so there is no conflict between the two operations.

A potential denial-of-service threat exists for data encoding using compression techniques that have non-uniform receiver-end computational load. The attacker can inject pathological datagrams into the stream which are complex to decode and cause the receiver to be overloaded. However, this encoding does not exhibit any significant non-uniformity.

As with any IP-based protocol, in some circumstances, a receiver may be overloaded simply by the receipt of too many packets, either desired or undesired. Network-layer authentication may be used to discard packets from undesired sources, but the processing cost of the authentication itself may be too high. In a multicast environment, pruning of specific sources may be implemented in future versions of IGMP [6] and in multicast routing protocols to allow a receiver to select which sources are allowed to reach it.

Interleaving MAY affect encryption. Depending on the encryption scheme used, there MAY be restrictions on, for example, the time when keys can be changed.

12. Acknowledgements

The author would like to thank the editor and authors of draft-ietf-avt-evrc-06.txt, since the text in this draft closely follows draft-ietf-avt-evrc-06.txt.

13. References

[1] PN-4575, "Selectable Mode Vocoder Service Option for Wideband Spread Spectrum Communication Systems".

[2] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, January 1996.

[3] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[4] Schulzrinne, H., "RTP Profile for Audio and Video Conferences with Minimal Control", RFC 1890, January 1996.

[5] Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

[6] Deering, S., "Host Extensions for IP Multicasting", STD 5, RFC 1112, August 1989.

14. Author's Address

Stinson S. Mathai
Lucent Technologies
Room 1E-550
263 Shuman Blvd
Naperville, IL 60566
USA

Phone: +1 630 713 5190
Email: smathai@lucent.com