Alan Clark
Telchemy
Robert Cole
AT&T Labs
Kaynam Hedayat
Brix Networks

Internet Draft
Document: draft-clark-avt-rtcpvoip-01
July 2002
Expires: January 2003

         RTCP Extensions for Voice over IP Metric Reporting

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as work in progress.

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document specifies an extension to the Real-time Transport
   Control Protocol (RTCP) to support reporting of Voice over IP
   metrics from endpoints. The proposed extension is useful for
   supporting both mid-call and end-of-call reporting of metrics for
   management and active control applications.

1. Conventions and Acronyms

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in RFC 2119.

2. Introduction

   The Real-time Transport Protocol (RTP) [1] provides a real-time
   transport mechanism suitable for unicast or Internet Standard
   multicast communication between multimedia applications. Typical
   uses are for real-time or near real-time group communication via
   audio and video data streams.

Clark             Internet Draft - Expires 2002               [Page 1]

         RTCP Extensions for Voice over IP Metric Reporting

   An important component of the RTP protocol is the control channel,
   defined as the Real-time Transport Control Protocol (RTCP). RTCP
   involves the periodic transmission of control packets between group
   members in a session, enabling the distribution or calculation of
   session specific information such as packet loss and round trip time
   to other hosts, and group size estimation. An additional advantage
   of providing a control channel for a session is that a third-party
   session monitor can listen to the traffic to establish network
   conditions and to diagnose faults based on receiver locations.

   Multimedia services, for example Voice over IP, are very sensitive
   to short term variations in network impairments such as packet loss
   and jitter. Adaptive jitter buffers and other impairment mitigation
   techniques can improve performance, but they make it very difficult
   to infer the behavior of the end-to-end connection from observations
   of the packet stream or from RTCP reported statistics.

   The extensions to RTCP described in this document provide concise
   but useful metrics for Voice over IP calls.

3. Basic Operation

   This draft proposes a concise set of metrics that provide sufficient
   information to characterize a Voice over IP connection. The set of
   metrics includes:

3.1 Packet loss metrics

   It has been shown [2] that the distribution of packet loss on the
   Internet can be reasonably approximated by a Markov Model. In the
   case of a Voice over IP connection it is necessary to consider both
   packets lost within the IP network and those discarded by a jitter
   buffer due to late arrival, overrun or underrun.

   (i)   Packet loss rate - a measure of the proportion of packets lost
         within the IP network

   (ii)  Packet discard rate - a measure of the proportion of packets
         discarded by the jitter buffer

   Assume a Gilbert-Elliott model of the IP "channel" in which a
   "burst" is a period of time during which the number of packets
   received between two successive lost or discarded packets is less
   than some minimum Gmin [3]. This approach works well for Voice over
   IP as it treats isolated lost packets as distinct from periods of
   high packet loss, reflecting the effectiveness of packet loss
   concealment.

   (iii) Mean burst duration (ms) - the average length of a burst

   (iv)  Mean burst density - the proportion of packets lost within a
         burst

   (v)   Mean gap duration (ms) - the average time between bursts

   (vi)  Mean gap density - the proportion of packets lost within a gap

3.2 Delay

   Delay is critical in interactive multimedia sessions. In general
   this comprises the transmission delay and the end-system delay.
   Users are generally not aware of asymmetry in delay and hence it is
   sufficient to report round trip delay.

   (vii)  Round trip delay (ms) - as measured by RTCP

   (viii) End-system delay (ms) - round trip delay including jitter
          buffer and encoding/decoding delay

3.3 Analog metrics

   Analog metrics may be available in some systems.

   (ix)  Voice signal relative power (dB)
   (x)   Echo level
   (xi)  Noise level
   (xii) Distortion level

3.4 Voice Quality metrics

   Computing voice quality metrics in the end-system has the advantage
   that all the essential information is available and that time
   correlation is preserved.

   (xiii) Voice quality metric - Voice over IP segment (R factor)
   (xiv)  Voice quality metric - Voice over IP segment (MOS score)
   (xv)   Voice quality metric - External network (R factor)

4. Extended Report Blocks

4.1 RTCP Report Format

   Metrics will be carried in an RTCP extended report block [4].
   An extended report block has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      BT       | type-specific |            length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                      type-specific data                       :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   block type (BT): 8 bits = TBD
      Identifies the specific block format.

   type-specific: 8 bits
      The use of these bits is defined by the particular block type.

   length: 16 bits
      The length of this report block in 32-bit words minus one,
      including the header.

   type-specific data: variable length
      This MUST be a multiple of 32 bits long. It MAY be zero bits
      long.

   The encoding of the VoIP performance metrics consists of six 32-bit
   words following the block header, encoded using the following
   format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      BT       | type-specific |           length=6            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Loss Rate   | Discard rate  |        Burst duration         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Burst density |         Gap duration          |  Gap density  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Round trip delay        |       End system delay        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Sig power   |  Echo level   |  Noise level  |  Distortion   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   R Factor    |  R external   |    MOS-LQ     |    MOS-CQ     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Gmin      |  Imp Spec 1   |  Imp Spec 2   |  Imp Spec 3   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.2 Packet Loss and Discard Statistics

   Packet loss rate

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |   loss rate   |
   +-+-+-+-+-+-+-+-+

   The packet loss rate is defined as the fraction of packets detected
   as lost at the receiving RTCP instance during the entire RTP session
   or VoIP call, expressed as a fixed point value. This packet loss
   rate is calculated with both duplicated packets and packets arriving
   outside the jitter buffer window (i.e. discarded packets) excluded.
   The fraction is to be calculated by dividing the total number of
   packets lost by the total number of packets expected and expressing
   the result as a fixed point number with the binary point at the left
   hand side of the field (this is equivalent to multiplying the result
   of the division by 256 and taking the integer part).

   Packet discard rate

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   | discard rate  |
   +-+-+-+-+-+-+-+-+

   The packet discard rate is defined as the fraction of packets
   discarded due to late or early arrival, under-run or overflow at the
   receiving jitter buffer during the entire RTP session or VoIP call,
   expressed as a fixed point number. The fraction is to be calculated
   by dividing the total number of packets discarded (excluding
   duplicate packet discards) by the total number of packets expected
   and expressing the result as a fixed point number with the binary
   point at the left hand side of the field (this is equivalent to
   multiplying the result of the division by 256 and taking the integer
   part).

   Mean burst duration (ms)

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        burst duration         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The mean burst duration is defined as the average period of time
   spent in a high loss state during the entire RTP session or VoIP
   call. A high loss state is defined as a period of time bounded by
   loss or discard events during which the number of consecutive
   received packets between lost or discarded packets is less than the
   minimum value Gmin (default value 16).
   This parameter is expressed in milliseconds as these units are
   commonly used when describing such events.

   Mean burst density

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   | burst density |
   +-+-+-+-+-+-+-+-+

   Burst density is defined as the fraction of packets lost or
   discarded whilst in the burst state, expressed as a fixed point
   number. The fraction is to be calculated by dividing the total
   number of packets discarded or lost whilst in the burst state (as
   defined above) by the total number of packets expected whilst in the
   burst state and expressing the result as a fixed point number with
   the binary point at the left hand side of the field (this is
   equivalent to multiplying the result of the division by 256 and
   taking the integer part).

   Mean gap duration (ms)

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         gap duration          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The mean gap duration is defined as the average period of time spent
   in a low loss state during the entire RTP session or VoIP call. A
   low loss state is defined as a period of time bounded by received
   packets during which the number of consecutive received packets
   between lost or discarded packets is at least the minimum value Gmin
   (default value 16). This parameter is expressed in milliseconds as
   these units are commonly used when describing such events.

   Mean gap density

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |  gap density  |
   +-+-+-+-+-+-+-+-+

   Gap density is defined as the fraction of packets lost or discarded
   whilst in the gap state, expressed as a fixed point number.
   The fraction is to be calculated by dividing the total number of
   packets discarded or lost whilst in the gap state (as defined above)
   by the total number of packets expected whilst in the gap state and
   expressing the result as a fixed point number with the binary point
   at the left hand side of the field (this is equivalent to
   multiplying the result of the division by 256 and taking the integer
   part).

   Minimum gap size parameter - Gmin [Default 16]

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |     Gmin      |
   +-+-+-+-+-+-+-+-+

   This parameter is used to determine the packet loss density
   corresponding to a burst. The default value of 16 will result in the
   identification and measurement of bursts that have a packet loss
   rate of at least 6 percent.

4.3 Delay metrics

   Round trip delay (ms)

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              RTD              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Round trip delay is defined as the two-way delay measured between
   RTCP end points expressed in milliseconds. This may, for example, be
   calculated from RTCP SR/RR reports.

   End-system delay (ms)

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    ESD - end system delay     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   End system delay is defined as the total encoding, decoding and
   jitter buffer delay determined at the reporting end point. This is
   defined as the time delay that would result from an arriving RTP
   frame being buffered, decoded, converted to "analog" form, being
   looped back at the local "analog" interface, encoded and made
   available for transmission as an RTP frame.

   One way symmetric voice path delay

   This may be calculated from the round trip and end system delays as
   follows.
   If the round trip delay is denoted RTD and the end system delays
   associated with the two endpoints are ESD(A) and ESD(B) then:

      one way symmetric voice path delay =
                                   ( RTD + ESD(A) + ESD(B) ) / 2

4.4 Analog signal metrics

   Voice signal relative power (dB)

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |   Sig Power   |
   +-+-+-+-+-+-+-+-+

   The voice signal relative power is defined as the ratio of the peak
   signal power to overflow signal power expressed in dB as a signed 8
   bit integer number. A value of +127 dB, which would be encoded as
   binary 01111111, indicates that this parameter is not available.

   Echo level

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |  echo level   |
   +-+-+-+-+-+-+-+-+

   The echo level is defined as the ratio of far end echo to transmit
   level expressed in dB as a signed 8 bit integer number. A value of
   +127 dB, which would be encoded as binary 01111111, indicates that
   this parameter is not available.

   Noise level

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |  Noise level  |
   +-+-+-+-+-+-+-+-+

   The noise level is defined as the ratio of the silent period
   background noise level to overflow signal power expressed in dB as a
   signed 8 bit integer number. A value of +127 dB, which would be
   encoded as binary 01111111, indicates that this parameter is not
   available.

   Distortion level

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |  distortion   |
   +-+-+-+-+-+-+-+-+

   The distortion level is defined as the mean ratio of the signal
   distortion power to signal power expressed in dB as a signed 8 bit
   integer number. A value of +127 dB, which would be encoded as binary
   01111111, indicates that this parameter is not available.

4.5 Voice quality metrics

   Voice quality metric - Voice over IP segment R Factor

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |   R Factor    |
   +-+-+-+-+-+-+-+-+

   Voice quality metric expressed as an R Factor.
   An R Factor is an integer number in the range 0 to 100 with a value
   of 94 corresponding to "toll quality" and values of 50 or less being
   regarded as unusable. This metric is defined as including the
   effects of delay, consistent with ITU G.107 and ETSI TS101329-5. A
   value of +127, which would be encoded as binary 01111111, indicates
   that this parameter is not available.

   Voice quality metric - External network R Factor

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   | R(ext) Factor |
   +-+-+-+-+-+-+-+-+

   Voice quality metric related to an external network segment, for
   example a cellular network. This metric is defined as including the
   effects of delay, consistent with ITU G.107 and ETSI TS101329-5. A
   value of +127, which would be encoded as binary 01111111, indicates
   that this parameter is not available.

   An overall R factor may be estimated from the Voice over IP segment
   R Factor and the External Network R Factor.

      R total (estimated) = R + R(ext) - 94

   Voice quality metric - Voice over IP segment Listening Quality MOS

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |    MOS-LQ     |
   +-+-+-+-+-+-+-+-+

   Voice quality metric expressed as an estimated Mean Opinion Score
   (MOS). This metric is defined as not including the effects of delay
   and can be compared to MOS scores obtained from listening quality
   (ACR) tests. This is a score ranging from 1 to 5 in which 5
   represents excellent and 1 represents unacceptable. The metric is
   represented as MOS x 10, for example a value of 35 would correspond
   to an estimated MOS score of 3.5. A value of +127, which would be
   encoded as binary 01111111, indicates that this parameter is not
   available.

   Voice quality metric - Voice over IP segment Conversational MOS

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |    MOS-CQ     |
   +-+-+-+-+-+-+-+-+

   Voice quality metric expressed as an estimated Mean Opinion Score
   (MOS).
   This metric is defined as including the effects of delay and other
   effects that would affect conversational quality. The metric may be
   calculated by converting an R Factor determined according to G.107
   or TS 101 329-5 into an estimated MOS using the equation specified
   in G.107. The metric is represented as MOS x 10, for example a value
   of 35 would correspond to an estimated MOS score of 3.5. A value of
   +127, which would be encoded as binary 01111111, indicates that this
   parameter is not available.

4.6 Type specific data

   Implementation specific octets

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |    Imp - 1    |
   +-+-+-+-+-+-+-+-+

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |    Imp - 2    |
   +-+-+-+-+-+-+-+-+

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |    Imp - 3    |
   +-+-+-+-+-+-+-+-+

5. Discussion and Applications

6. Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances
   of licenses to be made available, or the result of an attempt made
   to obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification
   can be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard.
   Please address the information to the IETF Executive Director.

7. References

   [1] H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson, "RTP: A
       transport protocol for real-time applications", RFC 1889, IETF,
       February 1996

   [2] J. Bolot, S. Fosse-Parisis, D. Towsley, "Adaptive FEC-Based
       Error Control for Interactive Audio in the Internet", IEEE
       Infocom 1999

   [3] "QoS Measurement for Voice over IP", ETSI TIPHON TS 101 329-5,
       December 2000

   [4] T. Friedman, R. Caceres, K. Almeroth, K. Sarac, "RTCP Reporting
       Extensions", draft-friedman-avt-rtcp-report-extns-02.txt

8. Full Copyright Statement

   Copyright (C) The Internet Society (2002). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

9. Authors' Addresses

   Alan Clark
   Telchemy Incorporated
   3360 Martins Farm Road, Suite 200
   Suwanee, GA 30024
   Tel: +1-770-614-6944
   Fax: +1-770-614-3951
   Email: alan@telchemy.com

   Robert Cole
   AT&T Labs
   330 Saint Johns Street, 2nd Floor
   Havre de Grace, MD, USA 21078
   Tel: +1 410 939-8732
   Fax: +1 410 939-8732
   E-mail: rgcole@att.com

   Kaynam Hedayat
   Brix Networks
   285 Mill Road
   Chelmsford, MA 01824
   Tel: +1-978-367-5600
   Fax: +1-978-367-5700
   Email: khedayat@brixnet.com

Annex A

Example Algorithm for Determining Burst and Gap Metrics

   The channel is assumed to have high packet loss (burst) and low
   packet loss (gap) conditions. During the Voice over IP call, packet
   loss events and inter-loss gaps are counted. During the call or at
   the end of the call the transition probabilities of the Markov model
   may be determined and used to compute a voice quality metric for the
   call.

   A gap is defined with respect to a parameter Gmin, which represents
   the criterion that at least Gmin successive packets must be received
   between two lost packets in order for the channel to be in the gap
   state/condition. If the number of packets received between two
   successive lost packets is less than the minimum value Gmin then the
   sequence of the two lost packets and the intervening received
   packets is regarded as part of a burst.
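
   To make the gap criterion concrete, the following minimal Python
   sketch labels each run of received packets between two successive
   losses as belonging to a gap or to a burst. The function name
   classify_inter_loss_runs and the list-of-run-lengths input are
   illustrative assumptions and are not defined by this document.

```python
def classify_inter_loss_runs(received_runs, gmin=16):
    """Label each run of received packets between two successive losses.

    received_runs: number of packets received between each pair of
    successive lost packets (hypothetical input format).
    gmin: minimum gap size; 16 is the default given in the text.
    """
    labels = []
    for run in received_runs:
        # At least gmin received packets between two losses -> gap;
        # fewer than gmin -> the two losses belong to a burst.
        labels.append("gap" if run >= gmin else "burst")
    return labels


if __name__ == "__main__":
    print(classify_inter_loss_runs([20, 3, 0, 16, 15]))
```

   With the default Gmin of 16, a run of exactly 16 received packets is
   still a gap, while a run of 15 is treated as part of a burst.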
   The Markov model is defined as having the following states and
   associated transitions:

   State 1 - gap condition - packet not lost
      p11 - packet received
      p13 - packet loss (start of burst)
      p14 - isolated packet loss

   State 2 - burst condition - packet not lost
      p22 - packet received within burst
      p23 - packet lost within burst

   State 3 - burst condition - packet lost
      p31 - packet received (end of burst)
      p32 - packet received within burst
      p33 - packet lost

   State 4 - gap condition - isolated packet lost
      p41 - packet received

   This model can be constructed either by accumulating packet loss
   information during fixed sampling intervals or at packet loss
   events. An example of a computationally efficient method for
   determining the parameters of the Markov model is given below.

   Assume a counter 'pkt' tracks the number of consecutively received
   packets since the last packet loss event, 'lost' tracks the number
   of lost packets in a burst, 'gmin' is the minimum gap size, and that
   an event can be generated if a packet loss is detected:

      Initial condition:
         lost = 0
         c11, c13, c14, c22, c23, c33 = 0

      Packet loss event (pkt)->
         if pkt >= gmin then
            if lost = 1 then
               c14 = c14 + 1
            else
               c13 = c13 + 1
            lost = 1
            c11 = c11 + pkt
         else
            lost = lost + 1
            if pkt = 0 then
               c33 = c33 + 1
            else
               c23 = c23 + 1
               c22 = c22 + pkt - 1
         pkt = 0

   The series of counters c11 to c14 are used to determine the
   corresponding Markov model transition probabilities (i.e. c11 is
   used to calculate p11). Counter c5 is used to measure the delay
   since the last "significant" burst of lost packets. Parameter gmin,
   the minimum gap size, is typically 16.
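
   As a rough illustration, the event-driven counting procedure above
   can be sketched in Python. The class name BurstGapCounters, its
   method names and the helper count_events are hypothetical and are
   not defined by this document. The sketch also assumes that lost and
   discarded packets are reported through the same loss event, and it
   does not model counter c5.

```python
from dataclasses import dataclass


@dataclass
class BurstGapCounters:
    """Illustrative holder for the counters used in the pseudocode."""
    gmin: int = 16   # minimum gap size
    pkt: int = 0     # packets received since the last loss event
    lost: int = 0    # lost packets in the current burst
    c11: int = 0
    c13: int = 0
    c14: int = 0
    c22: int = 0
    c23: int = 0
    c33: int = 0

    def on_packet_received(self) -> None:
        self.pkt += 1

    def on_packet_lost(self) -> None:
        if self.pkt >= self.gmin:
            # A full gap preceded this loss.
            if self.lost == 1:
                self.c14 += 1          # previous loss was isolated
            else:
                self.c13 += 1          # gap-to-burst transition
            self.lost = 1
            self.c11 += self.pkt
        else:
            # Fewer than gmin packets since the previous loss: in a burst.
            self.lost += 1
            if self.pkt == 0:
                self.c33 += 1          # consecutive losses
            else:
                self.c23 += 1
                self.c22 += self.pkt - 1
        self.pkt = 0


def count_events(loss_flags, gmin=16):
    """Feed a sequence of booleans (True = packet lost) to the counters."""
    counters = BurstGapCounters(gmin=gmin)
    for is_lost in loss_flags:
        if is_lost:
            counters.on_packet_lost()
        else:
            counters.on_packet_received()
    return counters
```

   A caller would typically invoke on_packet_received or on_packet_lost
   once per expected sequence number and read the counters whenever a
   report is due.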
   The key metrics needed for determining application performance are:

      c31 = c13
      c32 = c23
      c11 = c11 + c14    (for simplicity - combine states 4 and 1)

      p11 = c11 / (c11 + c13)
      p13 = 1 - p11
      p31 = c31 / (c31 + c32 + c33)
      p32 = c32 / (c31 + c32 + c33)
      p33 = 1 - p31 - p32
      p22 = c22 / (c22 + c23)
      p23 = 1 - p22

      d  = (p23 p31 + p13 p32 + p13 p23)
      p1 = p31 p23 / d
      p2 = p13 p32 / d
      p3 = p13 p23 / d

      frame size                 F  = frame size (in seconds)
      average packet loss rate   L  = 100 p3 percent
      gap length                 g  = F / (1 - p11) seconds
      gap loss density           Dg = 100 c14 / c11 percent
      burst length               b  = F (1 - p1) / (p1 p13) seconds
      burst loss density         Db = 100 p23 / (p23 + p32) percent
      delay since last burst     y  = F c5 seconds
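
   The calculation above can be sketched as a single Python function.
   The function name burst_gap_metrics, the dictionary keys and the
   error handling are assumptions made for illustration. In particular
   this document does not say what to do when a denominator is zero,
   so the sketch simply rejects such inputs. The "delay since last
   burst" metric is omitted because counter c5 is not part of the
   counting pseudocode.

```python
def burst_gap_metrics(c11, c13, c14, c22, c23, c33, frame_seconds):
    """Derive the burst and gap metrics from the event counters.

    Assumption (not stated in the draft): c11 + c14, c13 and c23 must
    all be positive so that every division below is well defined.
    """
    c31 = c13
    c32 = c23
    c11 = c11 + c14  # combine states 4 and 1, as in the text

    if c11 <= 0 or c13 <= 0 or c23 <= 0:
        raise ValueError("counters too sparse to estimate the model")

    # Transition probabilities of the three-state model.
    p11 = c11 / (c11 + c13)
    p13 = 1 - p11
    p31 = c31 / (c31 + c32 + c33)
    p32 = c32 / (c31 + c32 + c33)
    p22 = c22 / (c22 + c23)
    p23 = 1 - p22

    # State probabilities p1 (gap) and p3 (burst, packet lost).
    d = p23 * p31 + p13 * p32 + p13 * p23
    p1 = p31 * p23 / d
    p3 = p13 * p23 / d

    return {
        "loss_rate_percent": 100 * p3,
        "gap_seconds": frame_seconds / (1 - p11),
        "gap_density_percent": 100 * c14 / c11,
        "burst_seconds": frame_seconds * (1 - p1) / (p1 * p13),
        "burst_density_percent": 100 * p23 / (p23 + p32),
    }


if __name__ == "__main__":
    # Made-up counter values with a 20 ms frame, for illustration only.
    for name, value in burst_gap_metrics(950, 10, 40, 60, 20, 10, 0.02).items():
        print(f"{name}: {value:.3f}")
```

   Returning percentages and seconds mirrors the units used in the list
   above; converting them to the 8-bit and 16-bit report fields is a
   separate step.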