Network Working Group                                         A. Vitali
Internet Draft                                       STMicroelectronics
Expires: January 2006                                      M. Fumagalli
                                        CEFRIEL - Politecnico di Milano
                                                              July 2005


        Standard-compatible Multiple-Description Coding (MDC)
           and Layered Coding (LC) of Audio/Video Streams

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
        http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
        http://www.ietf.org/shadow.html

   Comments are solicited and should be addressed to the AVT WG mailing
   list at avt@ietf and/or the author(s).

Abstract

   This document specifies an efficient way to ensure erasure resilient,
   scalable transmission of encoded multimedia sources via RTP using
   standard-compatible Multiple Description Coding (MDC) and Layered
   Coding (LC) together with interleaving, thus allowing a graceful
   degradation of the application quality with increasing packet loss
   rate and decreasing bandwidth/throughput on the network.

   This document describes what information needs to be communicated
   out-of-band to use a specific MDC/LC scheme and what information is


A. Vitali                   Standards Track                    [Page 1]

RFC PROPOSED        standard-compatible MDC/LC                 May 2005


   needed in RTP packets to identify what description/layer they carry.
   Definitions for SDP and MIME are provided.

Table of Contents

   1. Introduction
      1.1. Performance and efficiency of MDC/LC
      1.2. Application scenarios
      1.3. Standard-compatible framework, compatibility issues
      1.4. Complexity of MDC/LC, implementation issues
      1.5. Interaction with other techniques (ARQ/FEC)
      1.6. Joint Source-Channel Coding (JSCC)
   2. Conventions
   3. MDC/LC identification parameters
      3.1. SDP media-level attributes
      3.2. MIME registration
   4. Typical cases and examples
      4.1. Polyphase Downsampling Multiple Description (PDMD)
      4.2. Multiple Description by filter bank
      4.3. Frame-expanded Multiple Description
      4.4. Unbalanced Multiple Description (UMD)
      4.5. Classical Layered Coding (LC)
      4.6. Haar-wavelet Layered Coding
   5. Packetization
      5.1. Synchronization
      5.2. Multiplexing, Interleaving
   6. Security Considerations
   7. Congestion Control and bandwidth management
   8. IANA Considerations
   9. Informative appendix: rationale for std-compatible MDC/LC
      9.1. Rationale for video MDC
      9.2. MDC in the error-prediction or in the transform domain
      9.3. MDC in the pixel domain
   10. References
      10.1. Normative references
      10.2. Informative references
   Authors' Addresses
   Full Copyright Statement
   Intellectual Property Right
   Acknowledgements

1. Introduction

   Multiple Description Coding (MDC) and Layered Coding (LC) are
   analogous.  Each description/layer can contribute to one or more
   characteristics of multimedia data: spatial/temporal resolution and
   quality (SNR).  Descriptions/layers can also contribute to frequency
   content in the transform domain.

   The difference between MDC and LC lies in the dependency among
   bitstreams.  The simplest case is when two bitstreams are created.
   In the case of LC they are referred to as "base layer" and
   "enhancement layer"; the latter depends on the former and cannot be
   decoded independently.  In the case of MDC, on the other hand, each
   description can be individually decoded to get a base quality.  The
   more descriptions are decoded, the higher the output quality.

   The base layer is clearly more important than enhancement layers.
   Descriptions, on the other hand, can have the same importance (as
   for balanced MDC) or different importance (as for unbalanced MDC).

1.1. Performance and efficiency of MDC/LC

   MDC and LC are useful in case of varying bandwidth/throughput and in
   case of losses/erasures due to congestion (as on the Internet) and
   uncorrectable errors (as on wireless channels).

   MDC greatly improves loss/erasure resilience because each bitstream
   can be decoded independently, making it unlikely to have the same
   portion of data corrupted in every description.

   LC can improve error resilience when the protection level for a
   given layer can be adapted to its importance so that the base layer
   is more protected.

   MDC/LC eases the management of variable bandwidth/throughput by
   transmitting a suitable number of descriptions/layers.  It must be
   noted that, when neither LC nor MDC is used, an expensive
   transcoding process is needed to match the channel capacity.  MDC/LC
   can also exploit path diversity.

   Coding efficiency is somewhat reduced depending on the amount of
   redundancy left among descriptions/layers.  However, any technique
   that introduces scalability or resilience reduces coding efficiency.

1.2. Application scenarios

   Foreseen applications are summarized in the following list:

   - Easy cell hand-over: different descriptions can be streamed from
     different base stations exploiting multi-paths on a cell boundary.

   - Adaptation to low resolution/memory/power: mobiles decode as many
     descriptions/layers as they can based on their display size,
     available memory, processor speed, and battery level.

   - Easy picture-in-picture: with the classical solution, a second
     full decoding is needed plus downsizing; with MDC/LC, it is
     sufficient to decode one description or the base layer and paste
     it on the display.

   - Better delivered quality: in a broadcast scenario, there is no
     need to heavily protect the stream for the farthest user with FEC
     techniques, which lowers quality at the same gross rate
     (media+FEC); nearer users will experience a better quality by
     receiving more descriptions/layers.

   - Adaptation to varying bandwidth: the base station can simply drop
     descriptions/layers; more users can be easily served, and no
     transcoding process is needed.

   - Multi-standard support (simulcast without simulcast): descriptions
     can be encoded with different codecs (MPEG-2, H.263, H.264);
     there is no waste of capacity as descriptions carry different
     information.

   - Divide-et-impera approach for HDTV distribution: HDTV sequences
     can be split into SDTV descriptions; no custom high-bandwidth h/w
     is required.

   - Enhanced carousel: instead of repeating the same data over and
     over again, different descriptions are transmitted one after
     another; the decoder can store and combine them to get a higher
     quality.

   - "Pay-per-quality" services: the user can decide at which quality
     level to enjoy a service, from low-cost low-resolution (base layer
     or one description only) to higher-cost high-resolution (by paying
     for enhancement layers / more descriptions).

1.3. Standard-compatible framework, compatibility issues

   The implementation of the proposed MDC/LC scheme is completely
   independent of the underlying multimedia codec.  The creation of
   descriptions/layers is performed in the data domain.  This is done
   in a pre-processing stage.  Descriptions/layers can then be coded
   independently.  At the decoder side, there is a post-processing
   stage in which decoded descriptions/layers are merged.

   Specific information needs to be communicated out-of-band via SDP or
   MIME to specify which MDC/LC scheme is in use.  Standard decoders
   will ignore this specific information.  However, such decoders will
   still be able to decode each successfully received
   description/layer.  At the same time, MDC/LC-aware decoders will
   parse this information in order to properly decode and merge
   descriptions/layers.

   Balanced MDC can even be beneficial for standard decoders.
   Multiplexed descriptions can be marked so that standard decoders
   understand they are multiple copies of the same data.  Of course,
   decoded data will have a smaller resolution/quality.  As an example,
   when balanced descriptions are transmitted, standard decoders will
   understand that the same data is transmitted multiple times in a way
   similar, but not equal to, repetition codes.
   Actually, there is no repetition: slightly different data packets
   are transmitted.  Decoders can be instructed to decode only the
   first successfully received 'copy'.

1.4. Complexity of MDC/LC, implementation issues

   Pre- and post-processing stages can be completely decoupled from the
   underlying multimedia codec.  However, it must be noted that keeping
   MDC/LC decoupled from the underlying codec prevents MDC/LC from
   giving its best.

   To get maximum quality for the decoded MDC/LC and to do MDC/LC
   encoding with the least effort, joint or coordinated encoding could
   be used.  Also, to exploit MDC/LC redundancy and to maximize the
   error resilience, joint MDC/LC decoding is recommended.

   As an example, video encoders can share expensive encoding decisions
   (motion vectors) instead of computing them independently; they can
   also coordinate encoding decisions (quantization policies) to
   enhance quality or resilience (interleaved multiframe prediction
   policies, intra refresh policies).  Decoders can share decoded data
   to ease error concealment; they can also share critical internal
   variables (anchor frame buffer) to stop error propagation due to
   prediction.

   It is worth mentioning that, if balanced descriptions are properly
   compressed and packetized, losses/erasures can be recovered before
   the decoding stage.  In this case, decoders are preceded by a
   special processor in which lost packets are recovered by copying
   similar packets from other descriptions.  Similar packets are those
   that carry the same portion of data.

1.5. Interaction with other techniques (ARQ/FEC)

   Several techniques have been proposed to enhance error/loss
   resilience of multimedia streams sent through unreliable channels.
   Among these, there are techniques like forward error
   detection/correction codes (FEC) or automatic repeat request (ARQ).

   ARQ is very effective but it requires a feedback channel and it can
   be used only in point-to-point communications, not for broadcast.
   Of course, time must be allowed for retransmissions.

   Conversely, FEC does not require a feedback channel and it is
   suitable for broadcast.  FEC usually needs to be complex (plus it
   introduces a substantial coding and interleaving delay) in order to
   be effective and it has an all-or-nothing performance: if the
   correction capability is exceeded, almost nothing is delivered to
   the receiver.

   Capacity may be wasted if the worst case (worst channel conditions,
   farthest user) must be considered.  Conversely, when the channel is
   better than expected and there are no losses, FEC redundancy is
   useless.  A particular MDC scheme (frame-expanded MDC), on the other
   hand, can yield a superior quality to the user in this case, as
   redundant data may contribute to the reduction of quantization noise
   power.

   Both these techniques, ARQ and/or FEC, can be used together with
   MDC/LC.

   It is suggested to adapt the protection level of a given
   description/layer to its importance, a technique commonly known as
   unequal error protection.

   It is suggested to use unequal error protection even in the case of
   equally important descriptions (balanced MDC).  In fact, armoring
   only one description may be more effective than trying to protect
   all descriptions.  If this is done, there is one description which
   is heavily protected.  If the channel becomes really bad, this
   description is likely to survive losses.  The decoder will then be
   able to guarantee a basic quality thanks to this description.

1.6. Joint Source-Channel Coding (JSCC)

   It is often useful to optimize the parameters of the source and
   channel encoders jointly (joint source-channel coding, JSCC).
   In the case of multimedia communications, this means exploiting the
   error resilience that may be embedded in compressed multimedia
   bitstreams rather than using complex forward error
   detection/correction codes or complex communication protocols.

   As an example, in MPEG-x/H.26x video encoders, it is possible to
   increase error resilience through one or more of the following
   techniques: more frequent intra pictures to reset the motion
   prediction loop; a suitable intra macroblock update policy; more
   slices per picture to reset differential motion vector and DC
   coefficient coding; Flexible Macroblock Order (FMO), Arbitrary Slice
   Order (ASO), concealment motion vectors or interleaved multiframe
   prediction policies to ease error concealment; reversible variable
   length coding (RVLC, with MPEG-4), an error resilient entropy coding
   scheme (EREC) or more sync markers; etc.

   MDC can be seen as another way of enhancing error resilience without
   using complex channel coding schemes.  Channel coding can then be
   reduced, compensating for the reduced coding efficiency due to
   redundancy left among descriptions.  This can be seen as a form of
   joint source-channel coding.

2. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119 [3].

3. MDC/LC identification parameters

   Specific information needs to be communicated out-of-band to use an
   MDC/LC scheme.  There are two sets of parameters.

   The first set of parameters is related to the pre-processor: it
   describes the creation of each description/layer so that smart
   MDC/LC-aware receivers can compute how to merge them depending on
   the loss pattern.
   The second set of parameters is related to the post-processor: it
   describes the merge of descriptions/layers for basic MDC/LC-aware
   receivers that cannot compute how to merge them.  Smart MDC/LC-aware
   receivers can compute a second set for the merge given the first
   set.

   One of these sets MUST be sent via Session Description Protocol
   (SDP) using media-level attributes ("a=").  Both sets MAY be sent at
   the same time.  A mapping of the parameters into MIME optional
   parameters is also provided.  Equivalent parameters could be defined
   elsewhere for use with control protocols that do not use MIME or
   SDP.

   Standard receivers that are not MDC/LC-aware MUST ignore these sets.
   MDC/LC-aware receivers MAY ignore these sets.

3.1. SDP media-level attributes

   Media-level unregistered attributes ("X-attribute:value") are
   defined.  Their formatting in SDP is defined using the Augmented BNF
   (ABNF) grammar for syntax specifications (RFC 2234).  Note: this
   draft is not yet perfectly consistent with RFC 2234.

   - "a=X-mdclc-tag:" payload-type mdclc-tag

     All descriptions/layers used in MDC/LC MUST be identified by a
     media identification tag.

     Tag "S" MUST NOT be used: it is reserved to indicate the original
     multimedia data, which is the default starting point for the
     generation of descriptions/layers.

     "_Q" MUST NOT be used as part of the tag: it is a suffix reserved
     to indicate decoded (and quantized) descriptions/layers.  When
     lossless co/decoding is used (no quantization), the "_Q" suffix
     SHOULD NOT be used.

     "T_" MUST NOT be used as part of the tag: it is a prefix reserved
     to indicate temporary descriptions/layers for computations in the
     pre- and post-processor.

     "D" SHOULD be included as part of the tag to indicate that it is
     an independent description.  "L" SHOULD be included as part of the
     tag to indicate that it is a dependent layer.

     Numbers MAY be used.  Lower numbers SHOULD be used to indicate
     more important descriptions/layers.

     "_" SHOULD be used as separator to improve readability.

     The following suffixes are reserved for video data components and
     MUST NOT be used as part of the tag: ".Y" (luma), ".Cb" (first
     chroma), ".Cr" (second chroma), ".R" (red), ".G" (green), ".B"
     (blue).

     The following suffixes are reserved for audio channels and MUST
     NOT be used as part of the tag: ".L" (left), ".R" (right), ".C"
     (center), ".RL" (rear left) or ".SL" (surround left), ".RR" (rear
     right) or ".SR" (surround right), ".S" (surround, rear center),
     ".LFE" (low frequency effects).

     When they are used, they MUST be the last suffix.  Alternatively,
     numbers can be used instead of labels for video components or
     audio channels.  When they are not used as part of the tag, the
     same computations MUST be applied on all components or channels
     that are present.

   - "a=X-mdclc-group:" group-tag *(space mdclc-tag)

     Groups of descriptions/layers in MDC/LC are identified by a group
     tag followed by zero or more media identification tags.  The media
     name in the "m=" line of SDP MUST be the same for all
     descriptions/layers in the group.

   - "a=X-mdclc-pre:" *(tag"="expression";")

     The creation of the indicated description/layer is specified as a
     list of one or more expressions that are computed in the
     pre-processor.

     The tag for the starting point is "S"; it indicates the original
     data.  Valid expressions MAY use every tagged description/layer.
     Decoded descriptions/layers MAY be used; they are indicated by
     appending "_Q" to the tag.  Temporary descriptions/layers MAY be
     used; they SHOULD be indicated by prepending "T_" to the tag.  If
     an invalid expression is listed, the line MUST be entirely
     ignored.

   - "a=X-mdclc-post:" group-tag *(tag"="expression";")

     The merge of descriptions/layers in the indicated group is
     specified as a list of one or more expressions that are computed
     in the post-processor.

     The tag for the ending point is "E"; it indicates the
     reconstructed original data.  The ending point MUST be initialized
     to zero, which is equivalent to the following expression at the
     beginning: "E=0;".  Valid expressions MAY use every tagged
     description/layer.  Only decoded descriptions/layers are available
     at the receiver, therefore "_Q" SHOULD be added to every tag.
     Temporary descriptions/layers MAY be used; they SHOULD be
     indicated by prepending "T_" to the tag.  If an invalid expression
     is listed, the line MUST be ignored.

     All descriptions/layers listed in the indicated group MUST be
     received in order for the merge to take place.  If there is more
     than one group for which all descriptions/layers have been
     received, the first group listed SHOULD be used.

   Valid expressions are standard mathematical expressions: they are
   made of operands (tagged descriptions/layers) together with their
   operators; parentheses MAY be used to prioritize computations.
   Operands are numbers or tags.  The set of valid operators for the
   pre- and post-processor is listed below.  Also, there is a set of
   available functions for up/downsampling, filtering and conversion.
   Labels are case insensitive.

   Computations are done with the precision of the operands.  Typically
   for video each pixel is represented with 1 byte per component
   (YCbCr).  Typically for audio each sample is represented with 2
   bytes.  Precision can be changed by using specific conversion
   functions.

   - Arithmetic operators.

        Plus                           +
        Unary plus                     +
        Minus                          -
        Unary minus                    -
        Multiply                       *

   - Relational operators.

        Equal                          ==
        Not equal                      ~= or !=
        Less than                      <
        Greater than                   >
        Less than or equal             <=
        Greater than or equal          >=

   - Logical operators.

        Short-circuit logical AND      &&
        Short-circuit logical OR       ||
        Bit-wise logical AND           &
        Bit-wise logical OR            |
        Logical NOT                    ~ or !
        Logical EXCLUSIVE OR           ^

   - up(F,P,A,tag): up-sampling of the indicated description/layer by
     zero insertion along axis A.
     For every incoming sample, F output samples are created; F-1 zeros
     have to be inserted.  Valid factors are greater than or equal to
     one.  The phase P indicates the output position of the incoming
     sample; valid phases go from 0 to F-1.  The axis A is indicated by
     a letter: X (horizontal), Y (vertical) or Z (temporal); more than
     one letter can be specified at the same time.

   - dn(F,P,A,tag): down-sampling of the indicated description/layer by
     sample deletion along axis A.  For every F incoming samples, one
     output sample is selected; F-1 samples have to be deleted.  Valid
     factors are greater than or equal to one.  The phase P indicates
     the position of the incoming sample that survives the
     downsampling; valid phases go from 0 to F-1.  The axis A is
     indicated by a letter: X (horizontal), Y (vertical) or Z
     (temporal); more than one letter can be specified at the same
     time.

   - fir("["coefficients"]",tag[,IC]): finite impulse response (FIR)
     filter for the indicated description/layer.  Coefficients are
     listed from left to right and MAY be separated by commas; rows are
     listed from first to last and MUST be separated by semicolons;
     planes are listed from first to last and MUST be separated by
     dots.  Unlisted coefficients MUST be set to 0.  There MUST be an
     odd number of coefficients per row, an odd number of rows and an
     odd number of planes.  The coefficients of the filter are centered
     around the output sample.  Initial conditions are specified by IC:
     0=unavailable samples are set to zero, 1=unavailable samples are
     copies of the first nearest available sample, 2=unavailable
     samples are taken by mirroring nearest available samples.  The
     default is IC=0.

   To be added: conversion functions such as long(), int(), etc.;
   clipping function.

   To be discussed: allow for the definition of new functions?  C-like
   operators such as "? :".

3.2. MIME registration

   Equivalent MIME parameters are defined.

4. Typical cases and examples

   Several typical cases are discussed in this section for the case of
   video data.  To be done: audio data.

   Multiple description (MDC) schemes.  Basic polyphase downsampling
   MDC schemes (PDMD) with two balanced descriptions (2MD) and four
   balanced descriptions (4MD).  Slightly unbalanced alias-free PDMD by
   filter bank.  Frame-expanded MDC with three descriptions (3MD) and
   five descriptions (5MD).  Unbalanced MDC (UMD) where a
   low-resolution redundant description is added.

   Layered coding (LC) schemes.  Classical layered coding (LC) with two
   layers (2LC) and three layers (3LC).  1D and 2D Haar wavelet layered
   coding.

4.1. Polyphase Downsampling Multiple Description (PDMD)

   Two balanced descriptions can be generated by separating odd/even
   lines.  The short name of this scheme is 2MD.  The erasure
   resilience is low, but the overhead is also low.

   m=video 49170 RTP/AVP 97 98
   a=rtpmap:97 H264/90000
   a=fmtp:97 ...
   a=X-mdclc-tag: 97 D1
   a=X-mdclc-pre: D1= dn(2,0,Y, S);
   a=rtpmap:98 H264/90000
   a=fmtp:98 ...
   a=X-mdclc-tag: 98 D2
   a=X-mdclc-pre: D2= dn(2,1,Y, S);

   Note that the quantity of data to be encoded is 2*1/2.

   The behaviour of the basic MDC/LC-aware receiver SHOULD be specified
   for every loss pattern.  In the case of 2MD, there are four loss
   patterns, indicated as lpN, and four merge scenarios are specified.
   When all descriptions are available, they are merged as they are.
   When only one description is available, missing lines are computed
   by averaging neighboring lines.  When no description is received,
   pixels are set to gray (128).  Smart MDC/LC-aware receivers MAY
   ignore this and implement a more effective concealment (e.g.
   copy-previous or motion-compensated copy).
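
   The four 2MD merge scenarios described above can be sketched in a
   few lines of Python.  This is only an illustration under stated
   assumptions, not part of the specification: frames are plain lists
   of lines, the function names split_2md, conceal and merge_2md are
   hypothetical, and at the frame border the single available
   neighbouring line is reused (the default IC=0 of fir() would
   instead treat the unavailable line as zero).

```python
GRAY = 128  # value used when no description is received (lp3)


def split_2md(frame):
    """Pre-processor: D1 = dn(2,0,Y,S), D2 = dn(2,1,Y,S)."""
    return frame[0::2], frame[1::2]


def conceal(lines, phase, height):
    """Rebuild a frame from one description by averaging neighbouring lines.

    Assumption: at the border the only available neighbour is reused.
    """
    out = [None] * height
    for i, line in enumerate(lines):
        out[2 * i + phase] = list(line)
    for y in range(height):
        if out[y] is None:
            above = out[y - 1] if y - 1 >= 0 else None
            below = out[y + 1] if y + 1 < height else None
            if above is None:
                above = below
            if below is None:
                below = above
            out[y] = [(a + b) / 2 for a, b in zip(above, below)]
    return out


def merge_2md(d1, d2, height, width):
    """Post-processor: d1/d2 are None when the description was lost."""
    if d1 is not None and d2 is not None:   # lp0: interleave the lines
        out = [None] * height
        out[0::2] = [list(line) for line in d1]
        out[1::2] = [list(line) for line in d2]
        return out
    if d1 is not None:                      # lp1: only D1 received
        return conceal(d1, 0, height)
    if d2 is not None:                      # lp2: only D2 received
        return conceal(d2, 1, height)
    return [[GRAY] * width for _ in range(height)]  # lp3: nothing received


frame = [[10, 10], [20, 20], [30, 30], [40, 40]]
d1, d2 = split_2md(frame)
print(merge_2md(d1, d2, 4, 2))
print(merge_2md(d1, None, 4, 2))
```

   A smart receiver would replace conceal() with a better strategy;
   the sketch only mirrors the fixed behaviour of a basic receiver.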

   a=X-mdclc-group: lp0 D1 D2
   a=X-mdclc-post: lp0 E= up(2,0,Y,D1) + up(2,1,Y,D2);
   a=X-mdclc-group: lp1 D1
   a=X-mdclc-post: lp1 E= fir([0.5;1;0.5], up(2,0,Y,D1));
   a=X-mdclc-group: lp2 D2
   a=X-mdclc-post: lp2 E= fir([0.5;1;0.5], up(2,1,Y,D2));
   a=X-mdclc-group: lp3
   a=X-mdclc-post: lp3 E= 128;

   Two balanced descriptions can also be generated by separating
   odd/even frames.

   Four balanced descriptions can be generated by separating odd/even
   lines and taking every other pixel.  The short name of this scheme
   is 4MD.  The erasure resilience is high, but the overhead is also
   high.

   m=video 49170 RTP/AVP 97 98 99 100
   a=rtpmap:97 H264/90000
   a=fmtp:97 ...
   a=X-mdclc-tag: 97 D1
   a=X-mdclc-pre: D1= dn(2,0,XY, S);
   a=rtpmap:98 H264/90000
   a=fmtp:98 ...
   a=X-mdclc-tag: 98 D2
   a=X-mdclc-pre: D2= dn(2,1,X, dn(2,0,Y, S));
   a=rtpmap:99 H264/90000
   a=fmtp:99 ...
   a=X-mdclc-tag: 99 D3
   a=X-mdclc-pre: D3= dn(2,0,X, dn(2,1,Y, S));
   a=rtpmap:100 H264/90000
   a=fmtp:100 ...
   a=X-mdclc-tag: 100 D4
   a=X-mdclc-pre: D4= dn(2,1,XY, S);

   Note that the quantity of data to be encoded is 4*1/4.

   In the case of 4MD there are sixteen loss patterns.  The advantage
   of having a smart MDC/LC-aware receiver is clear: merge scenarios
   need not be specified explicitly.  The concealment of missing data
   may be driven by actual received data (e.g. edge-driven
   interpolation) rather than being independently fixed once and for
   all.

4.2. Multiple Description by filter bank

   Downsampling may cause aliasing in a given description if there are
   high frequencies in the original data.  The aliasing is canceled
   during the merge process.  The aliasing may also be avoided if the
   original data is lowpass filtered prior to downsampling.  Original
   data may be reconstructed (except for quantization noise) if filters
   are properly designed.

   Two balanced descriptions can be generated by different mild lowpass
   filters.
   Each odd line (A) is combined with the following even line (B) with
   different weights.

      D1 = +0.75 A +0.25 B
      D2 = +0.25 A +0.75 B

   Corresponding SDP attributes are:

   m=video 49170 RTP/AVP 97 98
   a=rtpmap:97 H264/90000
   a=fmtp:97 ...
   a=X-mdclc-tag: 97 D1
   a=X-mdclc-pre: D1= dn(2,0,Y, fir([0 0.75;0.25], S));
   a=rtpmap:98 H264/90000
   a=fmtp:98 ...
   a=X-mdclc-tag: 98 D2
   a=X-mdclc-pre: D2= dn(2,0,Y, fir([0 0.25;0.75], S));

   Note that in this example the phase of downsampling is not changed.
   Instead, the coefficients of the FIR filters are changed.

   If descriptions are quantized, their quantization noise affects the
   reconstructed lines.  Pairs of odd (A) and even (B) lines can be
   reconstructed as follows:

      A = +1.5 D1 -0.5 D2
      B = -0.5 D1 +1.5 D2

   The generation of descriptions can be seen as a filter bank whose
   output is downsampled.  Conversely, the merge of descriptions can be
   seen as a filter bank whose input is upsampled.  Corresponding SDP
   attributes are:

   a=X-mdclc-group: lp0 D1 D2
   a=X-mdclc-post: lp0 E= fir([0;-0.5;+1.5], up(2,0,Y, D1)) +
                          fir([0;+1.5;-0.5], up(2,0,Y, D2));

   Reconstruction can also be done in this trivial way:

   a=X-mdclc-group: lp0 D1 D2
   a=X-mdclc-post: lp0 E= up(2,0,Y,1.5*D1) + up(2,0,Y,-0.5*D2) +
                          up(2,1,Y,-0.5*D1) + up(2,1,Y,+1.5*D2);

   Four slightly unbalanced descriptions can be generated by different
   lowpass filters.  In the following example, the first description is
   not filtered and it may be aliased; the second description is
   lowpass filtered horizontally and it is free of horizontal alias;
   the third description is lowpass filtered vertically and it is free
   of vertical alias; the fourth description is lowpass filtered both
   horizontally and vertically and it is completely alias free.

   Pixels can be labeled in the following way:

      A   B   A'  ..
      C   D   C'
      A"  B"  A"'
      :

   The invertible equation system is:

      D1 =  1/1  A
      D2 =  1/4  A  + 1/2 B  + 1/4  A'
      D3 =  1/4  A  + 1/2 C  + 1/4  A"
      D4 =  1/16 A  + 1/8 B  + 1/16 A'  ...
          + 1/8  C  + 1/4 D  + 1/8  C'  ...
          + 1/16 A" + 1/8 B" + 1/16 A"'

   Corresponding SDP attributes are:

   m=video 49170 RTP/AVP 97 98 99 100
   a=rtpmap:97 H264/90000
   a=fmtp:97 ...
   a=X-mdclc-tag: 97 D1
   a=X-mdclc-pre: D1= dn(2,0,XY, S);
   a=rtpmap:98 H264/90000
   a=fmtp:98 ...
   a=X-mdclc-tag: 98 D2
   a=X-mdclc-pre: D2= dn(2,0,XY, fir([1,2,1]/4, S));
   a=rtpmap:99 H264/90000
   a=fmtp:99 ...
   a=X-mdclc-tag: 99 D3
   a=X-mdclc-pre: D3= dn(2,0,XY, fir([1;2;1]/4, S));
   a=rtpmap:100 H264/90000
   a=fmtp:100 ...
   a=X-mdclc-tag: 100 D4
   a=X-mdclc-pre: D4= dn(2,0,XY, fir([1,2,1; 2,4,2; 1,2,1]/16, S));

   Note that the gain of the FIR filters is 1.  The gain is computed as
   the sum of the absolute values of the coefficients.  If the gain is
   1, the range of data (max value - min value) is not changed.
   However, an offset may be required to have the same range as the
   original data (typ. 0-255).  It is better to have the same range
   because clipping due to over/underflows is avoided.

   Received multiplexed descriptions can be labeled in the following
   way:

      D1   D2   D1'  ..
      D3   D4   D3'
      D1"  D2"  D1"'
      :

   The inverse system is:

      A =  +1   D1
      B =  -1/2 D1  +2 D2  -1/2 D1'
      C =  -1/2 D1  +2 D3  -1/2 D1"
      D =  +1/4 D1  -1 D2  +1/4 D1'  ...
           -1   D3  +4 D4  -1   D3'  ...
           +1/4 D1" -1 D2" +1/4 D1"'

   Corresponding SDP attributes for reconstruction are:

   a=X-mdclc-group: lp0 D1 D2 D3 D4
   a=X-mdclc-post: lp0 E=
      fir([1/4,-1/2,1/4; -1/2,+1,-1/2; 1/4,-1/2,1/4],
          up(2,0,XY, D1)) +
      fir([-1,0,0; 2,0,0; -1,0,0], up(2,0,XY,D2)) +
      fir([-1,2,-1; 0,0,0; 0,0,0], up(2,0,XY,D3)) +
      fir([4,0,0; 0,0,0; 0,0,0], up(2,0,XY,D4));

4.3. Frame-expanded Multiple Description

   Frame expansion is a way to expand the original data so that some
   controlled redundancy is added.  In the literature, frame expansion
   has been used with quantized filter banks.  Here frame expansion is
   used with downsampled filter banks.

   In this example, 2 descriptions are generated by separating odd and
   even lines as for 2MD; the 3rd description is simply the average of
   odd and even lines.  The short name of this scheme is 3MD.

   It is clear that perfect reconstruction (except for quantization
   noise) is achieved if any 2 descriptions out of 3 are correctly
   received.  The 3MD system can be seen as equivalent to a FEC code
   with rate 2/3: one single erasure can be recovered.

   m=video 49170 RTP/AVP 97 98 99
   a=rtpmap:97 H264/90000
   a=fmtp:97 ...
   a=X-mdclc-tag: 97 D1
   a=X-mdclc-pre: D1= dn(2,0,Y,S);
   a=rtpmap:98 H264/90000
   a=fmtp:98 ...
   a=X-mdclc-tag: 98 D2
   a=X-mdclc-pre: D2= dn(2,1,Y,S);
   a=rtpmap:99 H264/90000
   a=fmtp:99 ...
   a=X-mdclc-tag: 99 D3
   a=X-mdclc-pre: D3= dn(2,0,Y, fir([0.5;0.5], S));

   Note that the quantity of data to be encoded is 3*1/2.

   It must be appreciated that if there is quantization noise, its
   power on the reconstruction is reduced when all three descriptions
   are received.  Unlike FEC, in case of a better than expected
   channel, there is an improvement in received quality.

   The filter bank made of two all-pass filters and one lowpass filter
   can be seen as an encoding matrix, where two data (an odd line A and
   the following even line B) are encoded into three descriptions (D1,
   D2 and D3):

      D1 = +1.0 A
      D2 =        +1.0 B
      D3 = +0.5 A +0.5 B

   Every 2x2 submatrix is invertible.  Hence when two descriptions are
   received, the original data is reconstructed, except for
   quantization noise, using the corresponding inverse matrix.  When
   all three descriptions are received, the pseudo-inverse matrix can
   be used:

      A = +0.833 D1 -0.166 D2 +0.333 D3
      B = -0.166 D1 +0.833 D2 +0.333 D3

   The range of the quantization noise from D1/D2/D3 to A/B can be
   found by summing the coefficients along the corresponding row.  The
   power of the quantization noise can be found by summing squared
   coefficients.
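
   The pseudo-inverse coefficients quoted above can be checked with a
   short computation.  The following Python sketch is an illustration
   only: the names G, pseudo_inverse, P and noise_gain are hypothetical,
   and the noise figure assumes independent quantization noise of equal
   power on the three descriptions.  It computes (G^T G)^-1 G^T for the
   3MD encoding matrix with exact fractions and then sums the squared
   coefficients of each row.

```python
from fractions import Fraction

# 3MD encoding matrix: rows are D1, D2, D3; columns are the lines A and B.
G = [[Fraction(1), Fraction(0)],
     [Fraction(0), Fraction(1)],
     [Fraction(1, 2), Fraction(1, 2)]]


def pseudo_inverse(g):
    """Left pseudo-inverse (G^T G)^-1 G^T of an n x 2 matrix."""
    a = sum(r[0] * r[0] for r in g)      # entries of the 2x2 matrix G^T G
    b = sum(r[0] * r[1] for r in g)
    d = sum(r[1] * r[1] for r in g)
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    return [[inv[i][0] * r[0] + inv[i][1] * r[1] for r in g]
            for i in range(2)]


P = pseudo_inverse(G)

# Noise power gain per reconstructed line: sum of squared coefficients
# (assumes independent, equal-power noise on D1, D2 and D3).
noise_gain = [sum(c * c for c in row) for row in P]

for name, row in zip("AB", P):
    print(name, "=", " ".join(f"{float(c):+.3f} D{i + 1}"
                              for i, c in enumerate(row)))
print("noise power gain:", [str(g) for g in noise_gain])
```

   The exact coefficients are 5/6, -1/6 and 1/3, and each reconstructed
   line sees 5/6 of the per-description noise power under the stated
   assumption, compared with a gain of 1 when only D1 and D2 are used.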
The redundancy can be controlled easily by quantizing the third downsampled description more heavily. This can be signalled via the "fmtp" attribute in SDP. When all three descriptions are received, an MDC/LC-aware receiver MAY decide not to use the third description, as it has a lower quality.

Another example: 4 descriptions can be generated by separating odd and even lines and taking every other pixel as for 4MD; the 5th description can be generated by averaging the separated pixels (i.e. averaging descriptions). The short name of this scheme is 5MD.

Note that the quantity of data to be encoded is 5*1/4.

4.4. Unbalanced Multiple Description (UMD)

In this example one description corresponds to the original; the other description is simply a downsampled version having 1/4th the size of the original. The second description is more important and should be more protected than the first. It is to be used in case of losses to enhance the concealment.

   m=video 49170 RTP/AVP 97 98
   a=rtpmap:97 H264/90000
   a=fmtp:97 ...
   a=X-mdclc-tag: 97 D1
   a=X-mdclc-pre: D1= S;
   a=rtpmap:98 H264/90000
   a=fmtp:98 ...
   a=X-mdclc-tag: 98 D2
   a=X-mdclc-pre: D2= dn(2,0,XY, S);

Note that the quantity of data to be encoded is 1+1/4. The redundancy can be controlled easily by quantizing the second downsampled description more heavily. The protection level of the second description can be increased by simply increasing the intra refresh rate. This is useful because the second description is used only for the concealment of the first.

When both descriptions are received, the second is discarded. When only the second description is received, it is upsampled by a linear interpolation filter.

   a=X-mdclc-group: lp01 D1
   a=X-mdclc-post: lp01 E= D1;
   a=X-mdclc-group: lp2 D2
   a=X-mdclc-post: lp2 E=
      fir([0.0625,0.125,0.0625; 0.125,0.25,0.125; 0.0625,0.125,0.0625],
      up(2,0,XY, D2));
   a=X-mdclc-group: lp3
   a=X-mdclc-post: lp3 E= 128;

4.5. Classical Layered Coding (LC)

Two layers are created as follows: the original data is downsampled to 1/4th the original and encoded. This is the base layer. The base layer is decoded, upsampled and subtracted from the original data, generating what can be seen as a prediction error to be encoded. This is the enhancement layer.

   m=video 49170 RTP/AVP 97 98
   a=rtpmap:97 H264/90000
   a=fmtp:97 ...
   a=X-mdclc-tag: 97 L1
   a=X-mdclc-pre: L1= dn(2,0,XY, S)
   a=rtpmap:98 H264/90000
   a=fmtp:98 ...
   a=X-mdclc-tag: 98 L2
   a=X-mdclc-pre: L2= S -
      fir([0.0625,0.125,0.0625; 0.125,0.25,0.125; 0.0625,0.125,0.0625],
      up(2,0,XY, L1_Q));

Note that the quantity of data to be encoded is 1+1/4.

Three layers are created as follows: the original data is downsampled to 1/16th the original and encoded. This is the base layer. The base layer is decoded, upsampled and subtracted from the original downsampled to 1/4th. This is the first enhancement layer. The first enhancement layer is decoded, upsampled and subtracted from the original. This is the second enhancement layer.

   m=video 49170 RTP/AVP 97 98 99
   a=rtpmap:97 H264/90000
   a=fmtp:97 ...
   a=X-mdclc-tag: 97 L1
   a=X-mdclc-pre: L1= dn(4,0,XY, S)
   a=rtpmap:98 H264/90000
   a=fmtp:98 ...
   a=X-mdclc-tag: 98 L2
   a=X-mdclc-pre: L2= dn(2,0,XY, S) -
      fir([0.0625,0.125,0.0625; 0.125,0.25,0.125; 0.0625,0.125,0.0625],
      up(2,0,XY, L1_Q));
   a=rtpmap:99 H264/90000
   a=fmtp:99 ...
   a=X-mdclc-tag: 99 L3
   a=X-mdclc-pre: L3= dn(1,0,XY, S) -
      fir([0.0625,0.125,0.0625; 0.125,0.25,0.125; 0.0625,0.125,0.0625],
      up(2,0,XY, L2_Q));

Note that the quantity of data to be encoded is 1+1/4+1/16.

4.6. Haar-wavelet Layered Coding

Classical layered encoding suffers from overhead. Wavelet coding does not, as the enhancement data is critically downsampled.

Two layers are created using the 1D vertical Haar wavelet.
The base layer is the average of odd and even lines. The enhancement layer is the difference between odd and even lines.

   m=video 49170 RTP/AVP 97 98
   a=rtpmap:97 H264/90000
   a=fmtp:97 ...
   a=X-mdclc-tag: 97 L1
   a=X-mdclc-pre: L1= dn(2,0,Y, fir([+0.5;+0.5], S));
   a=rtpmap:98 H264/90000
   a=fmtp:98 ...
   a=X-mdclc-tag: 98 L2
   a=X-mdclc-pre: L2= dn(2,0,Y, fir([+0.5;-0.5], S));

Note that the quantity of data to be encoded is 2*1/2.

Four layers are created using the 2D Haar wavelet. The base layer is the average of odd and even lines and columns. The other three enhancement layers are the horizontal, vertical and diagonal differences with respect to the base layer.

   m=video 49170 RTP/AVP 97 98 99 100
   a=rtpmap:97 H264/90000
   a=fmtp:97 ...
   a=X-mdclc-tag: 97 L1
   a=X-mdclc-pre: L1= dn(2,0,XY, fir([+0.25 +0.25;+0.25 +0.25], S));
   a=rtpmap:98 H264/90000
   a=fmtp:98 ...
   a=X-mdclc-tag: 98 L2
   a=X-mdclc-pre: L2= dn(2,0,XY, fir([+0.25 -0.25;+0.25 -0.25], S));
   a=rtpmap:99 H264/90000
   a=fmtp:99 ...
   a=X-mdclc-tag: 99 L3
   a=X-mdclc-pre: L3= dn(2,0,XY, fir([+0.25 +0.25;-0.25 -0.25], S));
   a=rtpmap:100 H264/90000
   a=fmtp:100 ...
   a=X-mdclc-tag:100 L4
   a=X-mdclc-pre: L4= dn(2,0,XY, fir([+0.25 -0.25;-0.25 +0.25], S));

Note that the quantity of data to be encoded is 4*1/4.

5. Packetization

5.1. Synchronization

The post-processor needs to merge the same portion of the data from each description/layer in order to produce the correct result. Synchronization of descriptions/layers is therefore critical and can be accomplished in one of the following ways.

   Timestamp synchronization. Data to be merged can be sent in packets having the same timestamp. This is possible if the sampling clock is the same.

   Sequence number synchronization. Data to be merged can be sent in packets having the same sequence number. This is possible if each packet contains the same portion of the data. This means that packets may have variable length.

   Payload synchronization.
   Data to be merged can be identified by looking at the payload.

5.2. Multiplexing, interleaving

Descriptions should be offset as much as possible when streams are multiplexed. In this way a burst of losses does not cause the loss of the same portion of data in all descriptions at the same time.

   +--------------------------------------> time
   D1_Nth_frame...
   |                    D2_Nth_frame...
   |<------offset------>|

If interleaving is used, the same criterion applies: descriptions are spaced as far apart as possible. In this way a burst of losses does not cause the loss of the same portion of data in all descriptions at the same time.

   +--------------------------------------> time
   D1_Nth_frame...D2_Nth_frame...|D1_N+1th_frame...
   |                             |
   |<-----interleaver depth----->|

6. Security Considerations

By altering one of the proposed SDP parameters, an entity that manages to modify the session descriptions exchanged between the participants to establish a multimedia session could prevent the participants from performing the correct decoding/merge process.

The attacker can modify datagrams, particularly the RTP header which is used for synchronization, to impede the correct decoding/merge process.

The attacker can insert pathological expressions into the SDP "X-mdclc-pre:" or "X-mdclc-post:" attributes that are complex to decode and that cause the post-processor in the receiver to be overloaded.

Integrity mechanisms provided by the protocols used to exchange session descriptions, together with media encryption, can be used to prevent these attacks.

7. Congestion control and bandwidth management

Congestion control for RTP SHALL be used in accordance with RFC 3550 [6], and with any applicable RTP profile; e.g., RFC 3551 [16]. An additional requirement if best-effort service is being used is: users of this payload format MUST monitor packet loss to ensure that the packet loss rate is within acceptable parameters.
Packet loss is considered acceptable if a TCP flow across the same network path, and experiencing the same network conditions, would achieve an average throughput, measured on a reasonable timescale, that is not less than the RTP flow is achieving. This condition can be satisfied by implementing congestion control mechanisms to adapt the transmission rate (or the number of descriptions/layers subscribed), or by arranging for a receiver to leave the session if the loss rate is unacceptably high.

8. IANA Considerations

This document defines several unregistered SDP attributes. IANA has not registered MIME types.

9. Informative appendix: rationale for std-compatible MDC/LC

In the following paragraphs the rationale for video MDC is presented. Neither video LC nor audio MDC/LC is covered.

9.1. Rationale for video MDC

There are many techniques to create multiple descriptions: MDC quantization, correlating transforms and filters, quantized frames or redundant bases, FEC combined with layered coding, spatial or temporal polyphase downsampling.

Many of these schemes can be adapted to existing video codecs, which are based on prediction, transform, quantization and entropy coding: it is possible to create descriptions in the pixel domain, in the error-prediction domain or in the transform domain.

9.2. MDC in the error-prediction or in the transform domain

Working in the error-prediction domain or in the transform domain yields very efficient but complex schemes. If not all descriptions are received correctly, the prediction drifts because the frame memory in the decoder will not be the same as the one used in the encoder. To solve this problem, prediction may be removed, but this greatly reduces the video compression capability of the codec.
A solution is to send a drift compensation term together with each description, but if there are more than two descriptions, the number of drift compensation terms increases dramatically, again reducing efficiency. Because of this, we restricted our attention to schemes that work in the pixel domain.

9.3. MDC in the pixel domain

Working in the pixel domain has the advantage that MDC can be completely decoupled from the underlying video codec: descriptions can be created in a pre-processing stage before compression; successfully received descriptions can be merged in a post-processing stage after decompression.

Spatial and temporal descriptions can be created by using Polyphase Downsampling (PDMD), where programmable lowpass filters control the redundancy; SNR descriptions can be created by means of MDC quantizers (either scalar or vector), where the structure of the quantizers controls the redundancy.

Replicated headers/syntax and replicated motion vectors among bitstreams greatly impede coding efficiency in SNR MDC. Replicated headers/syntax also hinder temporal MDC, and motion compensation is less effective because of the increased temporal distance between frames. Spatial MDC is hindered by headers/syntax as well but, unlike with temporal MDC, motion compensation is affected to a smaller extent, particularly when 8x8 blocks are split into smaller blocks, as in the latest H.264 codec.

According to our experience, spatial PDMD is preferable to temporal PDMD: at very low bitrates, when temporal PDMD is used, there is an annoying flashing in the decoded sequence due to the independent compression of descriptions; when spatial PDMD is used instead, there is a kind of dithering due to the same reason. Dithering is not annoying, it can be easily detected and eliminated, and it can even improve the perceived quality.

10. References

10.1.
Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [2]  Crocker & Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997.

   [3]  Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

   [4]  Camarillo et al., "Grouping of Media Lines in the Session Description Protocol (SDP)", RFC 3388, December 2002.

   [5]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.

   [6]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

10.2. Informative References

   [7]  John G. Apostolopoulos, "Video Streaming: Concepts, Algorithms and Systems", HP Laboratories, report HPL-2002-260, September 2002.

   [8]  Vivek K. Goyal, "Multiple Description Coding: Compression Meets the Network", IEEE Signal Processing Magazine, September 2001.

   [9]  Jelena Kovacevic, Vivek K. Goyal, "Multiple Descriptions - Source-Channel Coding Methods for Communications", Bell Labs, Innovation for Lucent Technologies, 1998.

   [10] R. Singh, A. Ortega, L. Perret, and W. Jiang, "Comparison of Multiple Description Coding and Layered Coding based on Network Simulations", in Proc. SPIE Image Video Proc., San Jose, CA, Jan. 2000, pp. 929-939.

   [11] A. Vitali, F. Rovati, R. Rinaldo, R. Bernardini, and M. Durigon, "Video Streaming over Lossy/Variable Bandwidth Networks by means of Multiple Description", in Proceedings of MMSP 2004, Siena, Italy, pp. 498-501, IEEE, September 2004.

   [12] R. Bernardini, R. Rinaldo, A. Tonello, and A. Vitali, "Frame based Multiple Description for Multimedia Transmission over Wireless Networks", in Proceedings of WPMC 2004, Abano Terme, Italy, July 2004.

   [13] R. Bernardini, M. Durigon, R. Rinaldo, L. Celetto, and A. Vitali, "Polyphase Spatial Subsampling Multiple Description Coding of Video Streams with H264", in Proceedings of ICIP 2004, Singapore, pp. 3213-3216, October 2004.

   [14] N. Franchi, M. Fumagalli, R. Lancini, S. Tubaro, "Multiple Description Video Coding for Scalable and Robust Transmission over IP", IEEE Transactions on CSVT, vol. 15, no. 3, pp. 321-334, March 2005.

   [15] R. Bernardini, L. Celetto, R. Rinaldo, A. Vitali, P. Zontone, "Bit Allocation and Quantizer Optimization in Multiple Description Coding with Oversampled Filterbanks", paper 1930, IEEE International Conference on Image Processing, Genova, Italy, September 2005.

Author's Address:

   Andrea L. Vitali
   STMicroelectronics
   via C. Olivetti 2
   20041 Agrate Brianza (MI)
   Italy
   Phone: +39-039-603-7244
   EMail: andrea.vitali@st.com

   Marco Fumagalli
   Cefriel - Politecnico di Milano
   via R. Fucini, 2
   20133 Milano
   Phone: +39-02-2395-4208
   Email: marco.fumagalli@cefriel.it

Full Copyright Statement

Copyright (C) The Internet Society (2005). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property Right

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgement

We wish to thank Nicola Franchi of CEFRIEL - Politecnico di Milano (ICT Center of Excellence For Research, Innovation, Education and industrial Labs partnership) for their help in defining this standard-compatible framework. We also wish to thank Roberto Rinaldo and Riccardo Bernardini of DIEGM - Universita` degli Studi di Udine for their work in the field.