Audio/Video Transport WG Ari Lakaniemi Internet Draft Ye-Kui Wang Intended status: Standards track Nokia Expires: March 2009 September 28, 2008 RTP payload format for G.718 speech/audio draft-lakaniemi-avt-rtp-evbr-03.txt Status of this Memo By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html This Internet-Draft will expire on March 28, 2009. Copyright Notice Copyright (C) The IETF Trust (2008). Abstract This document specifies the Real-Time Transport Protocol (RTP) payload format for the Embedded Variable Bit-Rate (EV-VBR) speech/audio codec, specified in ITU-T G.718. A media type registration for this RTP payload format is also included. Lakaniemi, Wang Expires March 28, 2009 [Page 1] Internet-Draft RTP payload for G.718 speech/audio September 2008 Conventions used in this document The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. Table of Contents 1. Introduction...................................................3 2. 
Background.....................................................3 2.1. The EV-VBR codec..........................................3 2.2. Benefits of layered design................................5 2.3. Transmitting layered data.................................5 2.4. Scaling scenarios & rate control..........................6 3. EV-VBR RTP payload format......................................7 3.1. Payload Structure.........................................7 3.1.1. Payload Header.......................................7 3.1.2. EV-VBR transport blocks..............................8 3.2. Handling the Encoded data................................11 3.3. EV-VBR scaling...........................................13 3.4. CRC verification.........................................14 3.5. EV-VBR session...........................................14 3.6. Cross-stream/cross-layer timing synchronization..........14 3.7. RTP Header usage.........................................15 4. Payload Format Parameters.....................................15 4.1. Media Type Registration..................................15 4.2. Mapping to SDP Parameters................................17 4.3. Offer/answer considerations..............................18 4.4. Declarative usage of SDP.................................18 4.5. SDP examples.............................................19 5. Security Considerations.......................................20 6. Congestion control............................................21 7. IANA Considerations...........................................22 APPENDIX A: Payload examples.....................................23 A.1. Simple payload examples..................................23 A.1.1. All the layers in the same payload..................23 A.1.2. Layers in separate RTP streams......................24 A.2. Advanced examples........................................25 A.2.1. Different update rate for subset of layers..........25 A.2.2. 
Redundant frames with limited set of layers.........26 8. References....................................................28 8.1. Normative References.....................................28 8.2. Informative References...................................29 Author's Addresses...............................................29 Intellectual Property Statement..................................30 Disclaimer of Validity...........................................30 Lakaniemi, Wang Expires March 28, 2009 [Page 2] Internet-Draft RTP payload for G.718 speech/audio September 2008 Copyright Statement..............................................30 Acknowledgment...................................................30 9. Open Issues...................................................31 10. Changes Log..................................................32 1. Introduction The International Telecommunication Union (ITU-T) Recommendation G.718 [G.718] specifies the Embedded Variable Bit Rate (EV-VBR) speech/audio codec. This document specifies the Real-time Transport Protocol (RTP) [RFC3550] payload format for this codec. 2. Background 2.1. The EV-VBR codec EV-VBR is an embedded variable rate speech codec having a layered design. The bitstream of the EV-VBR core codec consists of a core layer, denoted as L1, and four enhancement layers, denoted as L2-L5. The bit-rates of the EV-VBR core codec range from 8 kbit/s (core layer only) to 32 kbit/s (with all layers up to L5). Furthermore, the EV-VBR codec supports also discontinuous transmission (DTX) and comfort noise generation (CNG) by sending Silence Descriptor (SID) frames during periods of non-active input signal, resulting in a reduced bit-rate. The sampling frequency of the core codec is 16 kHz and the codec operates on 20 ms frames. The EV-VBR codec is also capable of narrowband operation with audio input and/or output at 8 kHz sampling frequency. 
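The figures above (20 ms frames, 16 kHz or 8 kHz sampling, 8 to 32 kbit/s) translate into per-frame sizes as in the following rough sketch. The function names are hypothetical and are not defined by G.718 or by this document; the octet counts cover only the encoded data of one frame and ignore any payload header.

```python
FRAME_MS = 20  # frame duration stated in section 2.1


def samples_per_frame(sample_rate_hz: int, frame_ms: int = FRAME_MS) -> int:
    """Number of audio samples covered by one codec frame."""
    return sample_rate_hz * frame_ms // 1000


def octets_per_frame(bitrate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Encoded data per frame at the given bit-rate, rounded up to whole octets."""
    bits = bitrate_bps * frame_ms // 1000
    return (bits + 7) // 8


wideband_samples = samples_per_frame(16_000)    # 16 kHz core codec operation
narrowband_samples = samples_per_frame(8_000)   # 8 kHz narrowband operation
core_only_octets = octets_per_frame(8_000)      # core layer L1 only
all_layers_octets = octets_per_frame(32_000)    # all layers up to L5
```

With these values, one frame covers 320 samples at 16 kHz (160 at 8 kHz), and the encoded data per frame ranges from 20 octets for the core layer alone to 80 octets with all layers up to L5.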
While transmitting/receiving the core layer L1 is enough for successful decoding of the audio content, each of the enhancement layers Ln (n being 2 to 5, inclusive) provides an improvement to reconstructed audio quality. Thus, the core layer ensures the basic communication while the enhancement layers can be used to improve the perceptual quality. Furthermore, enhancement layers are dependent on all the lower layers in the sense that successful decoding of layer Ln also requires all the layers Lm with mn MUST also be discarded.

3.5. EV-VBR session

An EV-VBR session consists of one or several RTP sessions carrying encoded EV-VBR data according to the payload format specified in section 3.2.

3.6. Cross-stream/cross-layer timing synchronization

In case an EV-VBR session consists of multiple RTP sessions, the RTP packets transmitted on separate RTP sessions need to be synchronized in order to enable reconstruction of the frames at the receiving end. Since each of the RTP sessions uses its own random initial value for the RTP timestamp, there is also a random offset between the RTP timestamp values of the packets carrying the EDUs belonging to the same encoded frame in different RTP sessions. The receiver SHOULD use the traditional RTCP based mechanism to synchronize streams by using the RTP and NTP timestamps of the RTCP Sender Reports (SR) it receives.

Author's note: The above approach for cross-session synchronization is not possible until the first RTCP SRs are received in all sessions. This implies that decoding only a subset of layers may be possible until RTCP SRs in all sessions have been received. This may impose higher end-to-end delay or higher bandwidth for RTCP data, and the approach may not work perfectly for some multicast topologies. There is a study ongoing by some AVT members. Once there is an acceptable solution, the draft documenting that solution may be referenced herein.
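The RTCP-based mapping described in section 3.6 can be sketched as follows. This is only an illustration under stated assumptions, not part of the payload format: the `SenderReport` class, the `to_wallclock` function and all numeric values are hypothetical, NTP time is simplified to floating-point seconds, and the default clock rate is the 32 kHz timestamp clock given in section 3.7.

```python
from dataclasses import dataclass

RTP_TS_MODULO = 1 << 32  # RTP timestamps are 32-bit and wrap around


@dataclass
class SenderReport:
    """Timestamp pair from the latest RTCP SR of one RTP session (hypothetical type)."""
    ntp_seconds: float  # NTP timestamp of the SR, simplified to float seconds
    rtp_timestamp: int  # RTP timestamp corresponding to the same instant


def to_wallclock(rtp_ts: int, sr: SenderReport, clock_rate: int = 32000) -> float:
    """Map an RTP timestamp to sender wall-clock time using that session's SR."""
    diff = (rtp_ts - sr.rtp_timestamp) % RTP_TS_MODULO
    # Treat the difference as signed so packets sent before the SR map correctly.
    if diff >= RTP_TS_MODULO // 2:
        diff -= RTP_TS_MODULO
    return sr.ntp_seconds + diff / clock_rate


# Two RTP sessions of one EV-VBR session, each with its own random timestamp offset.
sr_base = SenderReport(ntp_seconds=1000.0, rtp_timestamp=5_000)
sr_enh = SenderReport(ntp_seconds=1000.5, rtp_timestamp=4_000_000_000)

# One 20 ms frame is 640 ticks at 32 kHz; take the frame three frames after sr_base.
t_base = to_wallclock(5_000 + 3 * 640, sr_base)
# The same frame in the enhancement session, whose SR was sent 0.5 s (16000 ticks) later.
t_enh = to_wallclock((4_000_000_000 - 16_000 + 3 * 640) % RTP_TS_MODULO, sr_enh)

# EDUs whose mapped times agree to within half a frame belong to the same frame.
same_frame = abs(t_base - t_enh) < 0.010
```

Until an SR has been received for a given session, no such mapping exists for it, which is the limitation raised in the author's note above.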
3.7. RTP Header usage

This section specifies the usage of some fields of the RTP header (specified in section 5 of [RFC3550]) with the EV-VBR RTP payload format.

In case the EV-VBR session consists of multiple RTP sessions, the RTP sessions are further separated by using different payload type (PT) values for each of the RTP streams. In case all layers are carried within a single RTP session, only one PT is needed. Note that the assignment of the PT number(s) for this payload format is outside the scope of this document. It is expected that the RTP profile under which this payload format is used will either assign PT number(s) for this encoding or specify that the PT number(s) are to be dynamically assigned.

The RTP timestamp corresponds to the sampling instant of the first encoded sample of the earliest frame in the payload. The timestamp clock frequency is 32 kHz.

The marker bit (M) of each of the RTP streams of the session SHALL be set to value 1 if the payload carries an EDU belonging to the first frame after an inactive period, i.e. an EDU from the first frame of a talkspurt. For all other packets the marker bit is set to value 0.

4. Payload Format Parameters

This section defines the parameters that may be used to configure optional features in the EV-VBR RTP transmission.

The parameters are defined here as part of the media subtype registration for the EV-VBR codec. Mapping of the parameters into the Session Description Protocol (SDP) [RFC4566] is also provided for those applications that use SDP. In control protocols that do not use MIME or SDP, the media type parameters must be mapped to the appropriate format used with that control protocol.

4.1. Media Type Registration

This registration is done using the template defined in RFC 4288 [RFC4288] and following RFC 4855 [RFC4855].
Type name: audio

Subtype name: EV-VBR

Required parameters: none

Optional parameters:

   mode: This parameter MAY be used to indicate whether the mode with layer L1 being present or the AMR-WB compatible mode (with layer L1' being present) is in use. If this parameter is not present or the value of this parameter is equal to 0, the mode with layer L1 being present is in use. Otherwise, the AMR-WB compatible mode is in use. When this parameter is present, the value MUST be either 0 or 1.

      Author's note: When the upcoming stereo and SWB options are present, the semantics of this parameter may change.

   layers: The numbers of the layers (in range from 1 to 5, denoting layers from L1 to L5, respectively) transmitted in this session, expressed as a comma-separated list of layer numbers. If the parameter is present, at least layer L1 or L1' MUST be included in the list of layers in one of the RTP sessions included in the EV-VBR session. If the parameter is not present, all layers up to layer L5 MAY be used in the session.

      Author's note: Why not use semantics similarly as L-ID?

   ptime: The recommended length of time (in milliseconds) represented by the media in a packet. See Section 6 of [RFC4566].

   maxptime: The maximum length of time (in milliseconds) that can be encapsulated in a packet. See Section 6 of [RFC4566].

Author's note: Some further study is needed to see if separate parameters for sending and receiving capabilities/preferences are needed -- especially for upcoming stereo and SWB options.

Author's note: The support for upcoming SWB and stereo options needs to be taken into account. Basically we can either 1) extend the parameter "layers" to cover also this aspect, or 2) define separate parameter(s) for these new options when more details on the stereo/SWB support are available.
Encoding considerations: Lakaniemi, Wang Expires March 28, 2009 [Page 16] Internet-Draft RTP payload for G.718 speech/audio September 2008 This media type is framed and contains binary data; see Section 4.8 of [RFC4288]. Security considerations: See Section 6 of RFC xxxx Interoperability considerations: none Published specification: RFC xxxx Applications which use this media type: For example Voice over IP, audio and video conferencing, audio streaming and voice messaging. Additional information: none Person & email address to contact for further information: Ari Lakaniemi, ari.lakaniemi@nokia.com Intended usage: COMMON Restrictions on usage: This media type depends on RTP framing, and hence is only defined for transfer via RTP [RFC3550] Author: Ari Lakaniemi, ari.lakaniemi@nokia.com Change controller: IETF Audio/Video Transport working group delegated from the IESG 4.2. Mapping to SDP Parameters The information carried in the media type specification has a specific mapping to fields of the SDP [RFC4566], which is commonly used to describe RTP sessions. When SDP is used to specify sessions employing the EV-VBR codec, the mapping is as follows: o The media type ("audio") goes in SDP "m=" as the media name. Lakaniemi, Wang Expires March 28, 2009 [Page 17] Internet-Draft RTP payload for G.718 speech/audio September 2008 o The media subtype ("EV-VBR") goes in SDP "a=rtpmap" as the encoding name. The RTP clock rate in "a=rtpmap" MUST be 32000 for EV-VBR. Author's note: The current choice for the RTP clock rate is a 'placeholder'. The clock rate needs to be set according to SWB sampling rate, which is still T.B.D. Since the core codec employs 16000 Hz sampling rate, an integer multiple of 16000 Hz seems to be a preferable choice. o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and "a=maxptime" attributes, respectively. 
o Any remaining parameters go in the SDP "a=fmtp" attribute by copying them directly from the media type string as a semicolon separated list of parameter=value pairs.

4.3. Offer/answer considerations

The following considerations apply when using the SDP offer/answer [RFC3264] mechanism to negotiate the EV-VBR transport.

The parameter "layers" MAY be used to indicate, for each RTP session belonging to the current EV-VBR session, the layer configuration that the end-point making the offer is ready to transmit and wishes to receive.

o In case the EV-VBR session consists of a single RTP session, it is RECOMMENDED not to impose any layer restrictions for the session but to use the rate control functionality to set possible restrictions on usage of the higher or highest layers. If the offer includes a layer configuration parameter, the answer MAY use a different configuration, but the highest layer in the answer MUST NOT be higher than the highest layer of the offered configuration.

   Author's note: Support for answer modifying the layer configuration is FFS.

o In case the EV-VBR session consists of multiple RTP sessions, the answer MUST use the layer configurations provided in the offer for the sessions it accepts.

4.4. Declarative usage of SDP

In declarative usage, such as SDP in RTSP [RFC2326] or SAP [RFC2974], the parameter "layers" SHALL be interpreted to provide a set of layers that the sender may use in the session.

4.5. SDP examples

Some example SDP session descriptions utilizing EV-VBR encodings are provided below.

The first example illustrates the simple case where an EV-VBR session employing a single RTP session and the AVPF profile is offered, and the answer accepts the offer without any changes.
Offer:

   m=audio 49120 RTP/AVPF 97
   a=rtpmap:97 EV-VBR/32000/1

Answer:

   m=audio 49120 RTP/AVPF 97
   a=rtpmap:97 EV-VBR/32000/1

The second example shows a slightly more complex case where an EV-VBR session using a single RTP session and the AVPF profile is offered with a restriction to send/receive only layers L1 and L2. The answer indicates that the other end-point is happy to receive (and send) layers up to L5.

Offer:

   m=audio 49120 RTP/AVPF 97
   a=rtpmap:97 EV-VBR/32000/1
   a=fmtp:97 layers=1,2

Answer:

   m=audio 49120 RTP/AVPF 97
   a=rtpmap:97 EV-VBR/32000/1
   a=fmtp:97 layers=1,2,3,4,5

The third example shows an EV-VBR session using multiple RTP sessions with the AVPF profile. The answerer wishes to use only layers up to L3.

Offer:

   m=audio 49120 RTP/AVPF 97
   a=rtpmap:97 EV-VBR/32000/1
   a=fmtp:97 layers=1,2
   a=mid:1
   m=audio 49122 RTP/AVPF 98
   a=rtpmap:98 EV-VBR/32000/1
   a=fmtp:98 layers=3
   a=mid:2
   a=depend:lay 1
   m=audio 49124 RTP/AVPF 99
   a=rtpmap:99 EV-VBR/32000/1
   a=fmtp:99 layers=4,5
   a=mid:3
   a=depend:lay 1 2

Answer:

   m=audio 49120 RTP/AVPF 97
   a=rtpmap:97 EV-VBR/32000/1
   a=fmtp:97 layers=1,2
   a=mid:1
   m=audio 49120 RTP/AVPF 98
   a=rtpmap:98 EV-VBR/32000/1
   a=fmtp:98 layers=3
   a=mid:2
   a=depend:lay 1

Note that the dependency signaling according to [smd-sdp] is used in the third example above to indicate the relationship between the layers distributed into separate RTP sessions.

5. Security Considerations

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [RFC3550], and in any appropriate RTP profile (for example [RFC3551] or [RFC4585]). This implies that confidentiality of the media streams is achieved by encryption; for example, through the application of SRTP [RFC3711].
Because the data compression used with this payload format is applied end-to-end, any encryption needs to be performed after compression.

A potential denial-of-service threat exists for data encodings using compression techniques that have non-uniform receiver-end computational load. The attacker can inject pathological datagrams into the stream that will increase the processing load of the decoder and may cause the receiver to be overloaded. For example, inserting additional EDUs representing the higher enhancement layers on top of the ones actually transmitted may increase the decoder load. However, the EV-VBR codec is not particularly vulnerable to such an attack, since the majority of the computational load in an EV-VBR session is associated with the encoder.

Another possible form of attack is the forging of codec bit-rate control messages, which may result in the encoder employing a higher number of enhancement layers than originally intended and thereby requiring a larger amount of computational resources.

Therefore, the usage of data origin authentication and data integrity protection of at least the RTP packet is RECOMMENDED; for example, with SRTP [RFC3711].

Note that the appropriate mechanism to ensure confidentiality and integrity of RTP packets and their payloads is very dependent on the application and on the transport and signaling protocols employed. Thus, although SRTP is given as an example above, other possible choices exist.

Note that end-to-end security with either authentication, integrity or confidentiality protection will prevent a network element not within the security context from performing media-aware operations other than discarding complete packets. To allow any (media-aware) intermediate network element to perform its operations, it is required to be a trusted entity which is included in the security context establishment.
6. Congestion control

As a scalable codec, EV-VBR implicitly provides means for congestion control by providing a possibility for 'thinning' the bitstream. The RTP payload format according to this specification provides several different means for reducing the EV-VBR session bandwidth. The most appropriate mechanism (in terms of impact on the user experience) depends on the employed payload structure and also on the employed session configuration (single RTP session or multiple RTP sessions).

The following means (in no particular order) can be used to assist congestion control procedures -- either by the sender or by an intermediate node.

o The transport blocks carrying the EDUs representing the highest layers within the payload may be dropped.

o The payloads carrying the EDUs representing the highest layers in an EV-VBR session may be dropped.

o Transport blocks or payloads carrying EDUs belonging to redundant frames included in the payload may be dropped.

7. IANA Considerations

IANA is kindly requested to register a media type for the EV-VBR codec for RTP transport, as specified in section 4.1 of this document.

APPENDIX A: Payload examples

The EV-VBR payload structure enables flexible transport either by carrying all layers in the same payload or by separating the layers into separate payloads. The following subsections illustrate different possibilities for transport by simple examples. Note that the examples do not show the full payload structure, to keep the illustration simple.

A.1. Simple payload examples

A.1.1. All the layers in the same payload

The illustration below shows layers L1-L3 from two encoded frames encapsulated into separate payloads using a single transport block.
+-------+--------+-----+------+------+------+
| RTP1  | L-ID=3 |NF=0 |F1-L1 |F1-L2 |F1-L3 |
+-------+--------+-----+------+------+------+

+-------+--------+-----+------+------+------+
| RTP2  | L-ID=3 |NF=0 |F2-L1 |F2-L2 |F2-L3 |
+-------+--------+-----+------+------+------+

In case the same layers from two input frames are encapsulated into one payload using a single transport block, the structure is as shown below.

+-------+--------+-----+------+------+------+------+------+------+
| RTP1  | L-ID=3 |NF=1 |F1-L1 |F2-L1 |F1-L2 |F2-L2 |F1-L3 |F2-L3 |
+-------+--------+-----+------+------+------+------+------+------+

The third example illustrates the case where the layers L1-L3 from two input frames are encapsulated into one payload using two separate transport blocks, the first one carrying L1 and the other one containing L2 and L3.

+-------+--------+-----+------+------+
| RTP1  | L-ID=1 |NF=1 |F1-L1 |F2-L1 |
+-------+--------+-----+------+------+------+------+
        | L-ID=7 |NF=1 |F1-L2 |F2-L2 |F1-L3 |F2-L3 |
        +--------+-----+------+------+------+------+

A.1.2. Layers in separate RTP streams

In this case the data for each layer is transmitted in its own payload. In the first example each transport block including a single EDU is carried in its own RTP payload.
+-------+--------+-----+-----+  +-------+--------+-----+-----+
| RTP1a | L-ID=1 |NF=0 |F1-L1|  | RTP1b | L-ID=6 |NF=0 |F1-L2|
+-------+--------+-----+-----+  +-------+--------+-----+-----+

+-------+--------+-----+-----+  +-------+--------+-----+-----+
| RTP1c |L-ID=10 |NF=0 |F1-L3|  | RTP2a | L-ID=1 |NF=0 |F2-L1|
+-------+--------+-----+-----+  +-------+--------+-----+-----+

+-------+--------+-----+-----+  +-------+--------+-----+-----+
| RTP2b | L-ID=6 |NF=0 |F2-L2|  | RTP2c |L-ID=10 |NF=0 |F2-L3|
+-------+--------+-----+-----+  +-------+--------+-----+-----+

If the payloads carry data from two consecutive input frames, the same encoded data as in the previous example is arranged as follows.

+-------+--------+-----+-----+-----+
| RTP1a | L-ID=1 |NF=1 |F1-L1|F2-L1|
+-------+--------+-----+-----+-----+

+-------+--------+-----+-----+-----+
| RTP1b | L-ID=6 |NF=1 |F1-L2|F2-L2|
+-------+--------+-----+-----+-----+

+-------+--------+-----+-----+-----+
| RTP1c |L-ID=10 |NF=1 |F1-L3|F2-L3|
+-------+--------+-----+-----+-----+

A.2. Advanced examples

A.2.1. Different update rate for subset of layers

This example employs different update rates (i.e. different numbers of frames per packet) for selected subsets of layers. In these examples all core codec layers L1-L5 are shown.
+-------+--------+-----+-----+-----+-----+-----+
| RTP1  | L-ID=1 |NF=3 |F1-L1|F2-L1|F3-L1|F4-L1|
+-------+--------+-----+-----+-----+-----+-----+

+-------+--------+-----+-----+-----+-----+-----+
| RTP2a | L-ID=7 |NF=1 |F1-L2|F2-L2|F1-L3|F2-L3|
+-------+--------+-----+-----+-----+-----+-----+

+-------+--------+-----+-----+-----+
| RTP3a |L-ID=14 |NF=0 |F1-L4|F1-L5|
+-------+--------+-----+-----+-----+

+-------+--------+-----+-----+-----+
| RTP3b |L-ID=14 |NF=0 |F2-L4|F2-L5|
+-------+--------+-----+-----+-----+

+-------+--------+-----+-----+-----+-----+-----+
| RTP2b | L-ID=7 |NF=1 |F3-L2|F4-L2|F3-L3|F4-L3|
+-------+--------+-----+-----+-----+-----+-----+

+-------+--------+-----+-----+-----+
| RTP3c |L-ID=14 |NF=0 |F3-L4|F3-L5|
+-------+--------+-----+-----+-----+

+-------+--------+-----+-----+-----+
| RTP3d |L-ID=14 |NF=0 |F4-L4|F4-L5|
+-------+--------+-----+-----+-----+

A.2.2. Redundant frames with limited set of layers

An example transmitting layers L1-L3 as primary data and L1 (of the previous frame) as redundant data is shown below. Each payload carries one primary (i.e. new) frame in one transport block and one redundant frame, which in this example is the frame preceding the primary frame, in another transport block.
+-------+--------+-----+-----+--------+-----+-----+-----+-----+
| RTP1  | L-ID=1 |NF=0 |F0-L1| L-ID=3 |NF=0 |F1-L1|F1-L2|F1-L3|
+-------+--------+-----+-----+--------+-----+-----+-----+-----+

+-------+--------+-----+-----+--------+-----+-----+-----+-----+
| RTP2  | L-ID=1 |NF=0 |F1-L1| L-ID=3 |NF=0 |F2-L1|F2-L2|F2-L3|
+-------+--------+-----+-----+--------+-----+-----+-----+-----+

+-------+--------+-----+-----+--------+-----+-----+-----+-----+
| RTP3  | L-ID=1 |NF=0 |F2-L1| L-ID=3 |NF=0 |F3-L1|F3-L2|F3-L3|
+-------+--------+-----+-----+--------+-----+-----+-----+-----+

Alternatively, the payload also carrying redundant data for a subset of layers can be arranged differently, as shown in the example below.

+-------+--------+-----+-----+-----+-----+--------+-----+-----+
| RTP1  | L-ID=3 |NF=0 |F0-L1|F0-L2|F0-L3| L-ID=1 |NF=0 |F1-L1|
+-------+--------+-----+-----+-----+-----+--------+-----+-----+

+-------+--------+-----+-----+-----+-----+--------+-----+-----+
| RTP2  | L-ID=3 |NF=0 |F1-L1|F1-L2|F1-L3| L-ID=1 |NF=0 |F2-L1|
+-------+--------+-----+-----+-----+-----+--------+-----+-----+

+-------+--------+-----+-----+-----+-----+--------+-----+-----+
| RTP3  | L-ID=3 |NF=0 |F2-L1|F2-L2|F2-L3| L-ID=1 |NF=0 |F3-L1|
+-------+--------+-----+-----+-----+-----+--------+-----+-----+

Now the first transport block carries the primary data and the second transport block carries the redundant data, which in this case covers the frame following the primary frame. The benefit of this approach is that the redundant data is included in the last (secondary) transport block of the payload, which might be beneficial for a possible payload scaling operation within the network.

8. References

8.1.
Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R. and Jacobson, V., "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[G.718] ITU-T Recommendation G.718, "Frame Error Robust Narrowband and Wideband Embedded Variable Bit-Rate Coding of Speech and Audio from 8-32 Kbit/s", (consented) May 2008.

[AMR-WB] 3GPP TS 26.171, "Adaptive Multi-Rate Wideband (AMR-WB) speech codec; General description (Release 7)", v7.0.0, September 2006.

[RFC4867] Sjoberg, J., Westerlund, M., Lakaniemi, A., Xie, Q., "RTP Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", RFC 4867, April 2007.

[RFC5104] Wenger, S., Chandra, U., Westerlund, M., Burman, B., "Codec Control Messages in the RTP Audio-Visual Profile with Feedback (AVPF)", RFC 5104, February 2008.

[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., Rey, J., "Extended RTP Profile for Real-Time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July 2006.

[RFC4566] Handley, M., Jacobson, V. and Perkins, C., "SDP: Session Description Protocol", RFC 4566, July 2006.

[RFC4288] Freed, N., Klensin, J., "Media Type Specifications and Registration Procedures", BCP 13, RFC 4288, December 2005.

[RFC4855] Casner, S., "Media Type Registration of RTP Payload Formats", RFC 4855, February 2007.

[RFC3264] Rosenberg, J., Schulzrinne, H., "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.

[smd-sdp] Schierl, T., Wenger, S., "Signaling media decoding dependency in Session Description Protocol (SDP)", draft-schierl-mmusic-layered-codec-04 (work in progress), June 2007.
[RFC3551] Schulzrinne, H., Casner, S., "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, July 2003. [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., Norrman, K., "The Secure Real-Time Transport Protocol (SRTP)", RFC 3711, March 2004. 8.2. Informative References [McCanne] McCanne, S., Jacobson, V., and Vetterli, M., "Receiver- driven layered multicast", in Proc. of ACM SIGCOMM'96, pages 117--130, Stanford, CA, August 1996. [RFC5117] Westerlund, M., Wenger, S., "RTP Topologies", RFC 5117, January 2008. [RFC2326] Schulzrinne, H., Rao, A., Lanphier, R., "Real Time Streaming Protocol (RTSP)", RFC 2326, April 1998. [RFC2974] Handley, M., Perkins, C., Whelan, E., "Session Announcement Protocol", RFC 2974, October 2000. Author's Addresses Ari Lakaniemi Nokia P.O.Box 407 FIN-00045 Nokia Group, FINLAND Phone: +358-71-8008000 Email: ari.lakaniemi@nokia.com Ye-Kui Wang Nokia Research Center P.O. Box 1000 33721 Tampere Finland Phone: +358-50-466-7004 EMail: ye-kui.wang@nokia.com Lakaniemi, Wang Expires March 28, 2009 [Page 29] Internet-Draft RTP payload for G.718 speech/audio September 2008 Intellectual Property Statement The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79. 
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org. Disclaimer of Validity This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Copyright Statement Copyright (C) The IETF Trust (2008). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights. Acknowledgment Funding for the RFC Editor function is currently provided by the Internet Society. Lakaniemi, Wang Expires March 28, 2009 [Page 30] Internet-Draft RTP payload for G.718 speech/audio September 2008 9. Open Issues 1) Support of super-wideband (SWB) audio and stereophonic encoding extensions to ITU-T G.718 currently being worked on by ITU-T is to be specified after ITU-T completes the work in that regards. a. Some further study is needed to see if separate parameters for sending and receiving capabilities/preferences are needed -- especially for upcoming stereo and SWB options. b. 
         The support for the upcoming SWB and stereo options needs to
         be taken into account. Basically, we can either 1) extend the
         parameter "layers" to cover this aspect as well, or 2) define
         separate parameter(s) for these new options when more details
         on the stereo/SWB support are available.

   2) For streaming or other applications that allow for relatively
      long end-to-end delay, it would sometimes be beneficial to
      aggregate more than 4 frames in one Transport Block (TB). Should
      the length of the NF field be larger?

   3) On layer structure and configuration signalling. Currently, a
      unique layer ID is assigned to each possible layer combination.
      See the editing notes below Table 3 for other possible
      approaches. One of the alternatives may be chosen in the final
      draft.

   4) Currently, it is mandated that lower layer EDUs of later frames
      go before higher layer EDUs of earlier frames in a transport
      block. This ordering is friendlier to adaptation (dropping of
      higher layers). However, if all layers are received, the
      depacketizer needs to reorder the EDUs into their decoding order
      before feeding them to the decoder. The opposite ordering (i.e.
      lower layer EDUs of later frames go after higher layer EDUs of
      earlier frames, so that EDUs in transport blocks are placed in
      decoding order) is therefore friendlier to the depacketizer.
      Another benefit of the latter is that it does not introduce any
      end-to-end delay. Which ordering is to be specified (or whether
      both are allowed, if needed) is FFS.

   5) MANEs dropping RTP packets are RTP translators. But are MANEs
      that drop only a subset of the transport blocks in one packet
      also RTP translators?

   6) RTCP-based cross-session synchronization is not possible until
      the first RTCP SRs have been received in all sessions. This
      implies that only a subset of the layers may be decodable until
      RTCP SRs have been received in all sessions.
      This may impose higher end-to-end delay or higher bandwidth for
      RTCP data, and the approach may not work perfectly for some
      multicast topologies. There is a study ongoing by some AVT
      members. Once an acceptable solution is found, the draft
      documenting that solution may be referenced in this draft.

   7) It might be better to change the semantics of the media type
      parameter 'layers' to be similar to those of L-ID.

   8) Offer/answer in which the answerer is capable of modifying the
      layer configuration is FFS.

   9) Some references need to be updated in the final draft.

10. Changes Log

   From draft-lakaniemi-avt-rtp-evbr-02 to
   draft-lakaniemi-avt-rtp-evbr-03

   - In section 2.1, 1) updated the text and tables to include
     sampling rates and output as NB or WB, 2) corrected the bit rate
     values in Table 2, 3) clarified that all AMR-WB modes can be
     supported, and 4) added that in the AMR-WB interoperable mode,
     when the base layer L1' is transported in its own RTP packet
     stream, the packetisation specified in [RFC4867] MUST be used, to
     enable legacy RFC 4867 receivers to receive the base layer L1'.

   - In section 3.1.2, added one more alternative for layer structure
     and configuration signalling in an editing note. This alternative
     uses separate L-ID value spaces for different modes. For example,
     the mode with L1 present and the AMR-WB compatible mode (with L1'
     present) use different L-ID value spaces.

   - In section 3.1.2, clarified that the encoded data is not present
     (i.e. consists of zero octets) for an empty frame (with L-ID
     equal to 0).

   - In section 3.2, clarified that MANEs dropping some of the layers
     are RTP translators, and added references to RFC 5117 and
     RFC 3550, per Colin's comment.

   - In section 3.6, removed the payload-specific decoding order
     recovery mechanism for multi-session transmission that was based
     on time synchronization.
     Instead, the RTCP-based synchronization mechanism is used (worded
     as a SHOULD).

   - Removed the original section 4. How the preference for SWB or
     stereo is to be signaled is for further study after the ITU-T
     completes the relevant extension.

   - In section 4.1, added a new media type parameter, 'mode', to
     indicate whether the AMR-WB compatible mode is in use.

   - Added section 9 (Open Issues) and section 10 (Changes Log).