AVTEXT Working Group                                       J. Samuelsson
Internet-Draft                                                  Ericsson
Intended status: Standards Track                                M. Coban
Expires: June 2015                                              Qualcomm
                                                               S. Wenger
                                                                   Vidyo
                                                       December 15, 2014


       Reference Picture Verification Information in the RTP
               Audio-Visual Profile with Feedback (AVPF)
                  draft-samuelsson-avtext-rpvi-00.txt

Abstract

   This document specifies an extension to the feedback messages
   defined in the Audio-Visual Profile with Feedback (AVPF). The new
   Reference Picture Verification Information (RPVI) feedback message
   conveys information about available reference pictures in the
   decoded picture buffer of a video decoder in the receiver of an RTP
   video stream. By including information related to Decoded Picture
   Hash (DPH) values, media senders and media receivers can verify that
   reference pictures used for prediction by the video encoder and the
   video decoder are aligned. It is also possible to use the RPVI
   feedback message to indicate that a specific reference picture has
   incorrect sample values (i.e. a mismatch in the DPH value between
   encoder and decoder) or that a specific reference picture has been
   lost.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

Samuelsson, et al.        Expires June 15, 2015                 [Page 1]

Internet-Draft  Reference Picture Verification Info        December 2014
   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on June 15, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Table of Contents

   1. Introduction
      1.1. Applicability
   2. Terminology
      2.1. Standards Language
      2.2. Glossary
   3. Reference Picture Verification Information
      3.1. Message Format
   4. SDP Signaling
   5. Security Considerations
   6. IANA Considerations
   7. References
      7.1. Normative References
      7.2. Informative References
   8. Acknowledgments

1. Introduction

   This document defines a new RTCP feedback message to augment those
   defined in [RFC4585], [RFC5104], and [RFC6642], for use with video
   codecs that exploit temporal prediction through the use of one or
   more reference pictures, e.g.,
   [H.264], VP8 [RFC6386], and [HEVC].

1.1. Applicability

   The video codecs [H.264] and [HEVC] both use temporal prediction in
   order to achieve efficient compression without compromising the
   visual quality of the compressed video. Video data (frames/pictures)
   are encoded together with non-video data (such as parameter sets),
   and an abstraction layer is used to structure the encoded bits in a
   format suitable for network transport.

   A stream encoded according to H.264 or HEVC, and packetized
   according to [RFC6184] or [I-D.ietf-payload-rtp-h265], respectively,
   is typically transmitted from a media sender to a media receiver.
   The media sender encodes the video and the media receiver decodes
   it. During the entire session (or, more specifically, within a coded
   video sequence), it is crucial that the process performed at the
   decoder is aligned with the process performed at the encoder. Even
   the slightest difference in the sample values of a decoded picture
   can result in severe visual degradation when that picture is used
   for prediction of following pictures.

   There are several factors that can affect the alignment of the
   encoding and decoding processes:

   o  Loss of data. In many applications it is possible to detect the
      loss of RTP packets and perform appropriate actions for repairing
      the loss without delivering corrupt data to the video decoder.
      However, in some applications such methods may not be available
      (for example, due to delay constraints) or they may fail.

   o  Bit errors. If the receiver has no means of detecting individual
      bit errors, such errors may be present in the data that is
      delivered to the video decoder.

   o  Random access. When performing random access into a stream, it
      might be difficult for the decoder to determine whether it is
      operating with the correct parameters and reference pictures.

   o  Hardware failure. The decoder hardware could malfunction, for
      example by failing to correctly store decoded pictures used for
      prediction.

   o  Incorrect implementations. Ideally, all video encoders and video
      decoders would be implemented impeccably according to the codec
      specification. In practice, however, there is a risk of
      misinterpretation of the specification as well as a risk of
      implementation bugs.

   The feedback message specified in this memo can be used to detect
   misalignment between encoder and decoder reference pictures. Other
   mechanisms not specified herein (such as sending IDR pictures) can
   be used to combat the potential negative effects of an
   encoder/decoder misalignment.

2. Terminology

2.1. Standards Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.2. Glossary

   AVPF  - Audio-Visual Profile with Feedback
   DPH   - Decoded Picture Hash
   FCI   - Feedback Control Information [RFC4585]
   IDR   - Instantaneous Decoder Refresh
   RPVI  - Reference Picture Verification Information
   SEI   - Supplemental Enhancement Information

3. Reference Picture Verification Information

   A Reference Picture Verification Information (RPVI) feedback message
   can be sent by media receivers to report which reference pictures
   are available in the decoded picture buffer. Along with identifiers
   of the available reference pictures, it is possible to transmit the
   result of verifying the Decoded Picture Hash (DPH) values or to
   transmit the actual DPH values (see Section 3.1). The feedback
   message can be sent at any time during an RTP session. This memo
   does not describe the process for handling incorrect DPH values.
   However, in order to achieve good media quality and to recover from
   errors in the sample values of decoded pictures, it is strongly
   recommended that a media sender (encoder) take appropriate action
   upon the detection of an incorrect DPH value or upon receiving
   negative acknowledgements (NACKs). Such actions could, for example,
   include:

   o  Transmission of data that resets the state of the decoder, e.g.,
      an Instantaneous Decoder Refresh (IDR) picture. By providing a
      refresh point, the media sender can ensure that errors that have
      occurred in decoded reference pictures do not propagate to future
      pictures.

   o  Encoding subsequent pictures using "old" reference pictures that
      have been received, decoded, and preferably verified to have
      correct sample values. Excluding all references to pictures with
      incorrect sample values has the same effect as providing a
      refresh point: errors that are present in decoded reference
      pictures do not propagate to future pictures.

   o  Retransmission of parameter sets. If an update of parameter sets
      is lost, there is a risk that the decoder uses some parameters
      incorrectly (e.g., an overly strong deblocking filter) without
      any detectable errors in the decoding process. By retransmitting
      the parameter sets, the encoder can make sure that the correct
      parameters are used, but this is not by itself sufficient for
      recovering from errors in the sample values of decoded reference
      pictures. This action is therefore recommended to be combined
      with one of the first two actions in this list.

   o  Changing encoder settings or parameters to avoid configurations
      that cause an incorrect decoder state. When errors continuously
      appear (even after performing one or both of the first two
      actions in this list), a media sender can try to change the
      configuration of the encoder in order to find a setting that does
      not result in errors in the decoded pictures.

3.1. Message Format

   The RPVI message is identified by RTCP packet type value PT=PSFB and
   FMT=TBD.

   The Feedback Control Information (FCI) for RPVI consists of one or
   more FCI entries, the content of which is depicted in Figure 1. Each
   entry applies to a different reference picture, identified by its
   Reference Picture Identifier (RefPicId).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | MT| Reserved6 |                   RefPicId                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   RefPicId    |                                               |
   +-+-+-+-+-+-+-+-+                                               +
   |                                                               |
   +              Decoded Picture Hash (conditional)               +
   |                                                               |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |
   +-+-+-+-+-+-+-+-+

          Figure 1: Syntax of an FCI Entry in the RPVI Message

   The semantics of the fields are as follows:

   MT: 2 bits

      Indicates the picture status information as follows:

      0: No hash information regarding the correctness of the reference
         picture is available.

      1: The Decoded Picture Hash of the reference picture is included
         in the Decoded Picture Hash field of this FCI entry.

      2: The indicated picture is entirely or partially lost, and hence
         not fully decodable.

      3: The Decoded Picture Hash has been used to verify that the
         reference picture is incorrect.

      When MT equals 0 or 1, the reference picture identified by the
      current entry is indicated as being available in the receiver's
      decoded picture buffer. If that picture is also still available
      in the sender's decoded picture buffer, the sender may use it for
      reference when encoding the next picture after the reception of
      the RPVI feedback message. When MT equals 1 and the encoder finds
      that the provided hash of the reference picture does not match
      the encoder's own hash value, the encoder MUST NOT use that
      reference picture.
      Informative note: When a feedback message contains one or more
      RPVI entries with MT equal to 0 or 1, the encoder may select, for
      use as reference, one or more of the identified pictures and/or
      other reference pictures whose availability can be inferred from
      that of the indicated pictures. The selection of which picture(s)
      to use for reference is out of scope of this memo, but may, for
      example, be based on maximizing compression efficiency.

      When MT equals 2 or 3, the reference picture identified by the
      current entry MUST NOT be used for reference when encoding the
      next picture or any subsequent picture. Other reference pictures
      that were predicted from the reference picture identified by the
      current entry SHOULD NOT be used for reference unless their
      Decoded Picture Hash has been verified to be correct.

   Reserved6: 6 bits

      This field is reserved for future definition. In the absence of
      such a definition, the bits in this field MUST be set to zero and
      ignored by the receiver of the RPVI feedback message.

   RefPicId: 32 bits

      If the video codec used for the media stream is HEVC, RefPicId
      represents the value of PicOrderCntVal (in network byte order) of
      the reference picture, as defined in [HEVC]. If the video codec
      used for the media stream is H.264, RefPicId represents the value
      of frame_num (in network byte order) of the reference picture, as
      defined in [H.264]. If the video codec used for the media stream
      is neither HEVC nor H.264, the picture identifier RefPicId SHOULD
      be defined outside of this specification.

   Decoded Picture Hash: variable number of bytes

      Present only if MT equals 1. Represents the Decoded Picture Hash
      Supplemental Enhancement Information (SEI) data of the decoded
      picture (in network byte order); see D.2.19 of [HEVC]. The
      Decoded Picture Hash data starts with a one-byte type field, from
      which the amount of hash data can be calculated.
      For video encoded with three color components, such as YCbCr and
      RGB, the total length of the Decoded Picture Hash will be 49 bytes
      when the first byte equals 0, 7 bytes when the first byte equals
      1, and 13 bytes when the first byte equals 2. These totals are the
      one-byte type field plus, per color component, a 16-byte MD5 hash
      (type 0), a 2-byte CRC (type 1), or a 4-byte checksum (type 2).

      Informative note: At the time of writing this memo, the Decoded
      Picture Hash SEI message is only specified for HEVC. However, the
      DPH calculations defined in D.3.19 of [HEVC] operate only on
      decoded sample values and are therefore codec agnostic. The DPH
      SEI message defined in D.2.19 of [HEVC] does not contain any
      HEVC-specific information and can therefore easily be replicated
      in the context of any video codec that decodes encoded data into
      arrays of sample values, such as H.264.

4. SDP Signaling

   A new feedback parameter "rpvi" is defined for use with both "ack"
   and "nack" to indicate the usage of the RPVI feedback message. (In
   the following ABNF [RFC5234], rtcp-fb-ack-param and
   rtcp-fb-nack-param are used as defined in [RFC4585].)

      rtcp-fb-ack-param  =/ SP "rpvi"
      rtcp-fb-nack-param =/ SP "rpvi"

   The following parameter is defined in this document for use with
   'ack':

   o  'rpvi' stands for Reference Picture Verification Information and
      indicates the use of RPVI messages as defined in Section 3.

   The following parameter is defined in this document for use with
   'nack':

   o  'rpvi' stands for Reference Picture Verification Information and
      indicates the use of RPVI messages as defined in Section 3.

   The offer/answer rules for these SDP feedback parameters are
   specified in the RTP/AVPF profile [RFC4585].

   Methods and rules for when to send RPVI messages are out of scope of
   this memo. When the RPVI message is used in "ack" mode, it may, for
   example, be sent at a regular interval or for all pictures that
   fulfill certain requirements (such as being coded as intra
   pictures).
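   As a non-normative illustration, the following Python sketch shows
   how a media receiver might check the SDP for the "rpvi" parameter
   and then apply the example "ack"-mode policy described above
   (report every intra-coded picture, otherwise at a regular
   interval). It assumes the [RFC4585] attribute form
   "a=rtcp-fb:<payload type or *> <ack|nack> <param>"; the names
   rpvi_modes() and AckReportPolicy and the default 1000 ms interval
   are hypothetical and are not defined by this memo.

```python
import re
from typing import Optional, Set

# RFC 4585 feedback attribute: "a=rtcp-fb:<payload type | *> <ack|nack> [<param>]".
# Only the first parameter after ack/nack is captured; that is where "rpvi" appears.
RTCP_FB_RE = re.compile(r"^a=rtcp-fb:(\*|\d+)\s+(ack|nack)(?:\s+(\S+))?")


def rpvi_modes(sdp: str, payload_type: int) -> Set[str]:
    """Return which of "ack"/"nack" were signaled with the "rpvi" parameter
    for the given payload type (a "*" wildcard applies to every type)."""
    modes: Set[str] = set()
    for line in sdp.splitlines():
        match = RTCP_FB_RE.match(line.strip())
        if match is None:
            continue
        pt, fb_type, param = match.groups()
        if param == "rpvi" and pt in ("*", str(payload_type)):
            modes.add(fb_type)
    return modes


class AckReportPolicy:
    """Hypothetical receiver-side trigger for "ack"-mode RPVI messages:
    report for every intra-coded picture and otherwise at a regular interval."""

    def __init__(self, interval_ms: int = 1000) -> None:
        self.interval_ms = interval_ms  # assumed reporting period, not from this memo
        self.last_report_ms: Optional[int] = None

    def should_report(self, now_ms: int, is_intra: bool) -> bool:
        interval_due = (self.last_report_ms is None
                        or now_ms - self.last_report_ms >= self.interval_ms)
        if is_intra or interval_due:
            # Any report, intra-triggered or periodic, restarts the interval.
            self.last_report_ms = now_ms
            return True
        return False


if __name__ == "__main__":
    offer = "\r\n".join([
        "m=video 49170 RTP/AVPF 96",
        "a=rtpmap:96 H265/90000",
        "a=rtcp-fb:96 ack rpvi",
        "a=rtcp-fb:96 nack rpvi",
    ])
    modes = rpvi_modes(offer, 96)
    print(sorted(modes))  # ['ack', 'nack']
    policy = AckReportPolicy(interval_ms=1000)
    if "ack" in modes:
        for t, intra in [(0, True), (400, False), (1000, False), (1200, True)]:
            print(t, policy.should_report(t, intra))
```

   In this sketch, an intra-triggered report also restarts the periodic
   interval; that is a choice made for the example only, since the
   actual sending rules are left to implementations.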
   In both "ack" mode and "nack" mode, the RPVI message can also be
   sent in response to a specific event (such as a picture loss). When
   "ack" mode is used with MT equal to 2 or 3, the message can be
   regarded as an acknowledgement that enough data has been received to
   derive the RefPicId of the indicated picture, but that some data
   appears to be missing (MT equal to 2) or the sample values appear to
   be incorrect (MT equal to 3).

5. Security Considerations

   The security considerations documented in [RFC4585] also apply to
   the RPVI message defined in this document. More specifically, a
   malicious group member can report incorrect DPH values in RPVI
   feedback messages to make the sender throttle the data transmission,
   increase the amount of redundancy information, or take other action
   to deal with the purportedly incorrect DPH value (e.g., changing the
   encoder configuration). This may result in a degradation of the
   quality of the reproduced media stream.

   Such attacks with maliciously sent RPVI feedback messages can be
   prevented by applying an authentication and integrity protection
   framework to the feedback messages. This can be accomplished using
   the RTP profile that combines Secure RTP [RFC3711] and AVPF into
   SAVPF [RFC5124].

6. IANA Considerations

   A new FMT value for the RPVI feedback message should be registered
   with IANA in the "FMT Values for PSFB Payload Types" registry.

7. References

7.1. Normative References

   [H.264]    ITU-T Recommendation H.264, "Advanced video coding for
              generic audiovisual services", February 2014.

   [HEVC]     ITU-T Recommendation H.265, "High Efficiency Video
              Coding", April 2013.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J.
              Rey, "Extended RTP Profile for Real-Time Transport
              Control Protocol (RTCP)-Based Feedback (RTP/AVPF)",
              RFC 4585, July 2006.
   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol
              (SRTP)", RFC 3711, March 2004.

   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
              Real-time Transport Control Protocol (RTCP)-Based
              Feedback (RTP/SAVPF)", RFC 5124, February 2008.

7.2. Informative References

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, February 2008.

   [RFC6642]  Wu, Q., Xia, F., and R. Even, "RTP Control Protocol
              (RTCP) Extension for a Third-Party Loss Report",
              RFC 6642, June 2012.

   [RFC6184]  Wang, Y., Even, R., Kristensen, T., and R. Jesup, "RTP
              Payload Format for H.264 Video", RFC 6184, May 2011.

   [I-D.ietf-payload-rtp-h265]
              Wang, Y., Sanchez, Y., Schierl, T., Wenger, S., and M.
              Hannuksela, "RTP Payload Format for High Efficiency Video
              Coding", draft-ietf-payload-rtp-h265 (work in progress),
              August 2014.

8. Acknowledgments

   The authors would like to thank Bo Burman, Rickard Sjoberg and
   Magnus Westerlund for valuable feedback during the development of
   this memo.

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Jonatan Samuelsson
   Ericsson
   Farogatan 6
   164 80 Stockholm
   Sweden

   Phone: +46 761 26 35 91
   Email: jonatan.samuelsson@ericsson.com

   Muhammed Coban
   Qualcomm

   Email: mcoban@qti.qualcomm.com

   Stephan Wenger
   Vidyo

   Email: stewe@stewe.org