RTCWEB                                                         S. Wenger
Internet-Draft                                          A. Eleftheriadis
Intended status: Standards Track                                   Vidyo
Expires: January 5, 2012                                    July 4, 2011


                      The Case for Layered Codecs
                  draft-wenger-rtcweb-layered-codec-00

Abstract

   RTCWEB is in the process of developing a protocol infrastructure and
   a browser API to support browser-to-browser real-time communication
   over IP.  Real-time communication necessarily requires the use of
   encoders and decoders (codecs) for media data.  This document
   advocates mandating support for a class of codecs known as scalable
   or layered codecs, for their superior network adaptivity, error
   resilience, and application adaptivity.  Examples are provided for
   use cases currently under discussion, focusing on video coding as
   perhaps the most challenging media type currently under
   consideration.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 5, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents


Wenger & Eleftheriadis   Expires January 5, 2012                [Page 1]

Internet-Draft         The Case for Layered Codecs             July 2011


   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  3
   3.  Detour: Suggested additional Requirements  . . . . . . . . . .  3
   4.  Scalable Codecs in the defined Use Cases . . . . . . . . . . .  4
     4.1.  Use case: Browser to browser use-cases . . . . . . . . . .  5
     4.2.  Telephony use cases  . . . . . . . . . . . . . . . . . . .  7
     4.3.  Video conferencing use cases . . . . . . . . . . . . . . .  8
       4.3.1.  Use-case: Multiparty Video Communication . . . . . . .  8
       4.3.2.  Use-case: Video Conferencing w/ Central Server . . . . 10
     4.4.  Embedded voice communication use cases . . . . . . . . . . 10
     4.5.  Bandwidth/QoS/mobility use cases . . . . . . . . . . . . . 11
   5.  Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . 11
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 12
   7.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 12
     7.1.  Normative References . . . . . . . . . . . . . . . . . . . 12
     7.2.  Informative References . . . . . . . . . . . . . . . . . . 12
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13


1.  Introduction

   In this document we advocate the mandatory support of scalable coding
   techniques for RTCWEB.  If consensus towards such a position is not
   achievable for whatever technical, commercial, or legal reasons, we
   suggest that not having mandatory codecs may be better than mandating
   the use of an inferior, non-scalable codec.

   The current status of RTCWEB's use case and requirements discussion
   is captured in [I-D.ietf-rtcweb-use-cases-and-requirements].  (Note:
   Version 00 of this I-D was used for the comparison in this document.)
   We use this document as a guide and attempt to compare the broad
   categories of scalable and non-scalable codecs in terms of how well
   they can satisfy the requirements specified therein.  We use video
   codecs as an example, and conclude that scalable codecs are
   considerably better-suited for RTCWEB than non-scalable codecs.

   We first address some additional - and, we believe, self-evident -
   requirements.


2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119] and
   indicate requirement levels for compliant implementations.


3.  Detour: Suggested additional Requirements

   There are certain requirements that, we believe, appear so obvious to
   the authors of [I-D.ietf-rtcweb-use-cases-and-requirements] that they
   have not been explicitly captured.  One of these requirements is:

   Fxx: The browser MUST enable communication with a peer (other
   browser(s), MCUs, etc.) with a latency adequate for "real-time"
   communication.

   This is a crucial requirement when selecting a (video) codec for
   reasons that will become apparent shortly.  We believe that RTCWEB
   should strive for solutions allowing latency (not counting long-
   distance network delays) below 200 msec.  Furthermore, any solution
   that cannot guarantee a latency below 500 msec (before dropping video
   altogether) should not be considered.  At the time of writing there
   are mailing list discussions ongoing regarding a requirement that may
   be differently formulated but addresses the same operational aspect.
   We don't specifically care about the formulation, but highly
   recommend going through the pain of agreeing on fixed millisecond
   numbers.

   An additional requirement that should be considered relates to
   heterogeneity:

   Fxy: The browser MUST be able to send fully decodable bitstreams,
   ideally without wasting resources, for a broad range of receivers
   (handheld to desktop) and, in multi-party cases, for a heterogeneous
   receiver population.

   Seems obvious?  Yes, to us.  Have the implications, however, been
   thoroughly considered?  Not that we are aware of.  We discuss this
   further when addressing the multiparty use case.

   Finally, a further requirement related to heterogeneity concerns the
   differing connection characteristics among peers:

   Fxz: The browser MUST be able to perform error control on multiple
   streams, with potentially different error characteristics,
   simultaneously.

   In a full-mesh, multipoint scenario, for example, handling of
   individual peer connection characteristics can be very challenging.
   Assume, by way of example, that in a five-way conference one of the
   streams gets damaged or that the corresponding connection is subject
   to congestion.  If the browser chooses to produce a more robust
   stream, then the other three receiving peers will be penalized
   through the overhead spent for error protection and resulting
   inferior picture quality, even if they have perfect connections to
   the sender.  If, on the other hand, the browser chooses to produce
   separate streams for each of its peers, then it must be able to
   handle an increasing computational load.  Even if the computational
   load is not a concern, battery power consumption certainly is.  In
   fact, there is only one thing that grows faster than CPU power
   (according to Moore's law): per-pixel cycle demands of video codecs.
   Also, we should bear in mind that even when staying with moderately
   complex (and efficient) video codecs, video resolution constantly
   goes up.  Worse, resolution grows in both horizontal and vertical
   dimensions, whereas CPU power grows only in one dimension.


4.  Scalable Codecs in the defined Use Cases

   The concept of scalable coding (for video, audio, or any media)
   consists of providing a representation of the source at multiple
   levels of fidelity using a bitstream comprised of a corresponding
   number of layers.  Typically these layers are formed in a pyramidal
   fashion, such that a given level of fidelity requires the
   availability of some or all of the lower layers (i.e., those that
   correspond to lower levels of fidelity).  The dependency exists
   because the encoding process uses prediction between layers, thus
   providing increased compression efficiency.  If no such prediction is
   used, then the layers are independent of one another, and we have
   what is called simulcasting.  Indeed, from a media coding point of
   view, simulcasting is scalable coding without inter-layer prediction.
   In simulcast, each layer provides a different level of fidelity and
   is independently decodable.
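
   As an informal illustration, the difference between pyramidal
   layering and simulcast can be sketched as a dependency table.  The
   layer names and the tables below are hypothetical and are not taken
   from any codec specification; the sketch only shows which layers a
   receiver needs in order to decode a given level of fidelity.

```python
# Hypothetical layer tables: each layer maps to the layer it predicts from.
# In the pyramidal (scalable) case an enhancement layer depends on a lower
# layer; in the simulcast case every layer is independently decodable.
SCALABLE = {"base": None, "enh1": "base", "enh2": "enh1"}
SIMULCAST = {"low": None, "mid": None, "high": None}


def layers_needed(depends_on, target):
    """Return the layers a receiver must have to decode `target`,
    ordered from the lowest layer up to the target itself."""
    chain = []
    layer = target
    while layer is not None:
        chain.append(layer)
        layer = depends_on[layer]
    return list(reversed(chain))


print(layers_needed(SCALABLE, "enh2"))   # ['base', 'enh1', 'enh2']
print(layers_needed(SIMULCAST, "high"))  # ['high']
```

   In the scalable case the lower layers are shared by all levels of
   fidelity, which is where the compression gain over simulcast comes
   from.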

   One example of a standardized codec that uses such pyramidal layering
   to implement scalable coding is H.264/SVC.  MPEG-2 and H.263+ are two
   other similar examples.  The scalable layers in H.264/SVC enhance
   fidelity in any of three dimensions: temporal rate, spatial
   resolution, or spatial quality (or signal-to-noise ratio, SNR).  More
   than one such enhancement can exist in a given video at the same
   time, i.e., scalability enhancements can be combined.

   Scalable coding has been a well-known design architecture in the
   media coding community since at least 1993, but it has only recently
   been used in products.  As a relatively recent addition to the
   commercial audio-video communication arsenal, it is not yet widely
   known among people who are not video coding experts.  An overview of
   the H.264/SVC standard can be found in [wiegand2007].  A detailed
   discussion of how it can be used in videoconferencing systems is
   provided in [eleft2006].

   In the following we revisit each of the use cases and discuss how
   scalable coding, and particularly pyramidal scalable video coding,
   can be used to provide a significantly improved user experience.

4.1.  Use case: Browser to browser use-cases

   This category has four sub-cases.  The sub-cases "simple video
   communication" and "simple video communication with inter-operator
   calling" are, from a codec viewpoint, the same, with the exception of
   the mentioned interoperability argument resulting in a need for a
   baseline codec.  From a (video) codec viewpoint the "Hockey" sub-case
   appears to consist of two independent video streams being sent by the
   mobile phone.  (We are actually not sure whether this, rather exotic,
   multi-camera scenario warrants its own use case, but we are not
   opposed to it, either.)  The sub-case "video size change" appears, to
   us, to introduce a feature that would be relevant to all video-
   capable use cases.  As a result, from a codec viewpoint, all four use
   cases appear to have similar requirements and can be discussed
   jointly.

   On the surface, these use cases appear to be trivially supported by
   any video codec as long as basic rate control and error resilience
   features are utilized (probably with the help of protocol support
   such as RTCP receiver reports and feedback messages therein,
   retransmission, and so on).

   Let's see, however, how scalable codecs can offer significant
   advantages over non-scalable codecs.

   First, temporal scalability (where pictures coded in an enhancement
   layer can be decoded in combination with a lower frame rate base
   layer) can greatly enhance error resilience, through techniques such
   as the one described in [wenger1998].  Especially in conjunction with
   feedback messages, virtually latency-neutral repair mechanisms can be
   devised without relying on latency- and bandwidth-unfriendly intra
   (I) pictures.  It is important to note that such "reactive"
   techniques do not penalize a system when no errors happen to occur.
   Also, by being reactive, they are amenable to heterogeneous error
   control support (thus addressing requirement Fxz).
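
   To make the temporal layering concrete, the sketch below assigns
   frames to layers in a dyadic hierarchy.  This arrangement is a common
   one that we assume for illustration; it is not necessarily the exact
   scheme of [wenger1998], and the function names are hypothetical.
   Under the assumption that a frame is predicted only from frames in
   lower layers, dropping (or losing) the highest layer halves the frame
   rate but leaves the remaining frames decodable.

```python
def temporal_id(frame_index, num_layers):
    """Temporal layer of a frame in a dyadic hierarchy (0 = base layer)."""
    period = 1 << (num_layers - 1)   # base-layer frames are `period` apart
    pos = frame_index % period
    if pos == 0:
        return 0
    tid = num_layers - 1
    while pos % 2 == 0:              # each factor of two moves one layer down
        pos //= 2
        tid -= 1
    return tid


def decodable_frames(frame_indices, num_layers, max_tid):
    """Frames that remain when all layers above `max_tid` are dropped."""
    return [i for i in frame_indices if temporal_id(i, num_layers) <= max_tid]


print([temporal_id(i, 3) for i in range(8)])  # [0, 2, 1, 2, 0, 2, 1, 2]
print(decodable_frames(range(8), 3, 1))       # [0, 2, 4, 6]
print(decodable_frames(range(8), 3, 0))       # [0, 4]
```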

   (Note: sending intra pictures is typically not a good idea.  First,
   when transmitting video over a bandwidth-limited link that is close
   to capacity (e.g., a 3G link), an I frame can bring the latency up to
   the seconds range.  Second, I frames are much larger than P or B
   frames and, therefore, statistically much more likely to be hit by a
   packet loss.  Sending I frames for error control is, in almost all
   cases, a bad idea.  It has been done for many years, but only because
   few, if any, better error control mechanisms were available.)

   A second scalability dimension is spatial scalability.  Here the
   enhancement layer enables decoding of the video in a higher
   resolution than a base layer, but can predict information from the
   base layer.  Spatial scalability can also offer advantages for error
   control, something that we would gladly elaborate on in the future.
   In this use case, however, its key advantage lies in the graceful
   and, if properly implemented, latency-neutral handling of both
   changes in available network bandwidth (addressing the bandwidth and
   error characteristics aspects of F23, among others) and receiver-side
   rendering requirements (addressing F22).

   Most, if not all, video coding experts will agree that there is
   nothing better than shedding pixels when running into a bandwidth
   issue, and this can be trivially done (without sending an I frame!)
   when using spatially scalable coding techniques.  Similarly,
   recovering from bandwidth shortages can be achieved without sending
   an I frame, again in those cases where the sender properly implements
   spatial scalability.  These techniques are generally more flexible
   and often lead to better-quality pictures compared to the rate-
   control techniques used in single-layer codecs (QP adjustment).
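
   "Shedding pixels" with spatial scalability amounts to choosing the
   highest layer whose cumulative bitrate still fits the available
   bandwidth.  The following is a minimal sketch of that choice; the
   layer names and bitrates are invented for illustration and do not
   come from any measurement.

```python
# Hypothetical cumulative bitrates (kbit/s) needed to decode up to each
# spatial layer, listed from the base layer upwards.
LAYERS = [
    ("QVGA", 150),
    ("VGA", 500),
    ("720p", 1500),
]


def highest_layer_for(available_kbps):
    """Pick the highest spatial layer whose cumulative rate fits.

    Returns None when not even the base layer fits."""
    chosen = None
    for name, cumulative_kbps in LAYERS:
        if cumulative_kbps <= available_kbps:
            chosen = name
    return chosen


print(highest_layer_for(600))   # VGA
print(highest_layer_for(2000))  # 720p
print(highest_layer_for(100))   # None
```

   Because the lower layers keep flowing while the choice changes, the
   switch does not by itself require restarting the stream.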

   Another point that speaks for spatial scalability is the option to
   gracefully and quickly react to rendering size requirements, i.e.,
   users enlarging or "full-screening" the rendering window (F22).
   Again, adding a high resolution scalable layer can be done without
   sending an I frame (which, as already pointed out, is evil), and, if
   implemented properly, can be done without interrupting the smoothness
   of the video playback experience.  (Note that even today's most
   advanced spatial scalability techniques still encode only a small set
   of discrete picture sizes in their finite, and small, number of
   layers.  As a result, some form of resampling in the rendering
   interface will probably still be required.)

   It has been remarked that simulcasting low and high resolution
   representations of the same picture can have very similar positive
   effects.  This is correct.  However, simulcast comes at a price,
   which is bandwidth and compute cycle requirements.  The quite
   extensive subjective evaluation tests performed by MPEG in
   conjunction with the H.264/SVC verification tests have shown that a
   2006-generation scalable video encoder offers between 17% and 40%
   bitrate savings over a same-generation encoder using simulcast
   [mpeg2007].
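
   As a worked example of what the 17% to 40% range means in practice,
   the sketch below applies both ends of the range to a simulcast pair.
   The two stream rates (300 and 1200 kbit/s) are assumptions chosen for
   round numbers, and we read the savings as being relative to the total
   simulcast bitrate.

```python
def scalable_rate_kbps(simulcast_rates_kbps, saving_percent):
    """Bitrate of a scalable stream that saves `saving_percent` over
    the sum of the given simulcast streams."""
    simulcast_total = sum(simulcast_rates_kbps)
    return simulcast_total * (100 - saving_percent) / 100


rates = [300, 1200]                    # assumed low and high resolution streams
print(sum(rates))                      # 1500
print(scalable_rate_kbps(rates, 17))   # 1245.0
print(scalable_rate_kbps(rates, 40))   # 900.0
```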

   (Note that "subjective" evaluation is the Mercedes of media quality
   testing, whereas "objective" evaluation compares to a pre-"Volt"
   Chevrolet :-).  Subjective tests, when performed properly, are by no
   means unscientific and they are, in fact, considered a much better
   way to assess the quality of a codec compared with objective tests
   (e.g., the PSNR, or less crude metrics devised by groups such as the
   Video Quality Experts Group).  Unfortunately, subjective evaluations
   are also orders of magnitude more expensive and time-consuming than
   objective tests, as it takes many person-hours to evaluate a single
   test case, whereas objective tests take only CPU cycles.)

   Further, while a modern scalable encoder creating two spatial layers
   may require roughly the same number of cycles as two encoders coding
   the same pixel count, there are savings in the decoder due to what
   SVC calls "single loop decoding" (i.e., the fact that the receiver
   only needs to maintain a prediction loop just for the layer that is
   being decoded, and not for the layers that the to-be-decoded layer
   may depend on).

4.2.  Telephony use cases

   As the telephony use cases appear to address interaction with legacy
   telephone equipment only, there is probably little use for the
   quality scalability that modern scalable speech/audio codecs offer.

   As formulated, from a codec viewpoint, any speech/audio codec used in
   telco environments (and many that have been specified outside this
   environment) ought to be acceptable choices.

4.3.  Video conferencing use cases

   There are two sub-cases in this broad category: full-mesh video
   conferencing, and centralized conferencing with resolution switching.
   The use cases also make particular assumptions regarding audio; the
   first case involves mono audio with panning at the web application,
   whereas the second involves mono or stereo audio with mixing at the
   server.

   We first point out that these two use cases offer only a small subset
   of the functionality offered in today's multiparty videoconferencing
   solutions.  In particular, the centralized server sub-case appears to
   deal with one, not frequently used, aspect of MCU-based communication
   that is implemented in a crude and unoptimized fashion (see below).
   We believe that video conferencing use cases in a standard drafted in
   2011 should encompass at least as much functionality as is routinely
   offered in 2005-generation MCUs.

   We wonder, specifically, why the group is not (yet?) considering
   other use cases in the same broad category of centralized
   conferencing that require more flexible server architectures.  For
   example, sometimes active speaker size-up is not wanted, e.g., in
   scenarios involving presentations with questions.  Or, one wants to
   be able to up-size more than one speaker (but fewer than the whole
   population).  Or a full continuous presence conference where it is up
   to the user of each receiving browser to decide how he/she wishes to
   render the received signals.  All of these capabilities are available
   with today's MCUs.  Many similar scenarios could be described.  We
   plan to contribute more detailed descriptions in the future unless we
   receive pushback.

   We now examine each of the two use-case categories.

4.3.1.  Use-case: Multiparty Video Communication

   On the surface, this use case is not very different from the Simple
   Video use case, at least from a codec viewpoint.  Indeed, it appears
   that a browser just needs to be able to simultaneously process N
   incoming independent video bitstreams (as spelled out in F12 and
   F14).  A slightly deeper examination will reveal that this use case
   appears to be a picture-perfect example of why scalable codecs offer
   superior performance.

   First off, all the arguments that were made for scalable codecs in
   the Simple Video use case still apply.  Let's consider, however, what
   happens when more than one stream is received.

   A first consideration is the CPU requirements for decoding all the
   pictures in full resolution.  Suppose a browser is receiving, for
   example, four video pictures in standard TV resolution and in a non-
   scalable format.  The computational load for decoding these pictures
   in addition to the load of encoding one's outgoing picture would
   pretty much max out today's desktop CPUs.  In typical multipoint
   layouts, the reconstructed pictures of most of the peers would be
   shown in thumbnail format, thus throwing away anywhere between 75%
   and 95% of the reconstructed pixel count!  What a waste.
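
   The share of discarded pixels follows from simple pixel arithmetic.
   The sketch below assumes a 640x480 decoded picture and two
   hypothetical thumbnail sizes; none of these dimensions are stated in
   the text above, they are merely plausible values that land inside the
   quoted range.

```python
def discarded_fraction(decoded, displayed):
    """Fraction of decoded pixels that are not shown after downscaling."""
    decoded_pixels = decoded[0] * decoded[1]
    displayed_pixels = displayed[0] * displayed[1]
    return 1 - displayed_pixels / decoded_pixels


# Assumed sizes: a 640x480 decoded picture shown as a 320x240 or a
# 160x120 thumbnail.
print(discarded_fraction((640, 480), (320, 240)))  # 0.75
print(discarded_fraction((640, 480), (160, 120)))  # 0.9375
```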

   But waste what?  Battery power?  Yes, for handheld devices, which do
   not have the screen real-estate to show all the pictures in full
   resolution.  We come to that in a moment.  Electricity?  Yes. This is
   not a joke.  A desktop motherboard can easily double its power intake
   based on CPU load and memory access.  Say, a motherboard's power
   consumption goes up from 100W to 200W, just because video is decoded
   in unnecessarily high resolution.  One kWh of electricity costs, at
   the home of one of the authors (Bay Area, California, USA, PG&E as
   the electricity provider), about 26 cents at his consumption level
   (single family house).  This means that 8 hours of unnecessary
   decoding of full-resolution video that is not rendered at full
   resolution will cost about $0.20.  This, by coincidence, is the same
   maximum amount that MPEG-LA charges as licensing fees for using
   H.264/SVC :-) Never mind the number of trees that can be saved...
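
   The cost estimate can be reproduced from the figures given above (an
   extra 100 W, 8 hours, 26 cents per kWh).  The function name is
   hypothetical; the arithmetic is simply energy times price.

```python
def extra_cost_usd(extra_watts, hours, usd_per_kwh):
    """Cost of drawing `extra_watts` of additional power for `hours`."""
    extra_kwh = extra_watts / 1000 * hours
    return extra_kwh * usd_per_kwh


# 200 W under load minus 100 W idle, for 8 hours, at $0.26 per kWh.
cost = extra_cost_usd(200 - 100, 8, 0.26)
print(f"{cost:.3f}")  # 0.208
```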

   A second consideration is when multiple streams are received, or
   transmitted, by the browser in a heterogeneous receiver population.
   Browsers are today ubiquitous and can be found anywhere from handheld
   devices with QVGA screens to gaming racks with multiple 2K pixel
   resolution screens.  That, incidentally, is a big part of the appeal
   of the RTCWEB activity.  If you are sitting behind a 23-inch 1080p
   monitor, your display is approximately 96 dpi.  A QVGA image,
   perfectly appropriate for being rendered on the screens of many
   smartphones, would be just 3.3 in by 2.5 in on the PC screen.  On one
   of the authors' laptops (a 1080p, 13.3-in model, roughly 166 dpi) it
   would be about 1.9 in by 1.4 in, not much more than the size of a
   thumbnail.  Not exactly what one expects to see, judging from daily
   use of Skype and our own (Vidyo's) desktop video conferencing
   services.  Further, when upsampling to, say, a quarter HD resolution,
   a QVGA signal simply doesn't look good no matter how well your
   upsampling filter is designed.
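
   As a rough cross-check of the display figures, pixel density can be
   estimated from a screen's pixel dimensions and its diagonal, and the
   physical size of a QVGA (320x240) image follows from that density.
   The sketch assumes square pixels and a 1920x1080 panel for both
   screens; the function names are hypothetical.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density of a screen, assuming square pixels."""
    return math.hypot(width_px, height_px) / diagonal_in


def size_in_inches(width_px, height_px, ppi):
    """Physical size of an image shown without scaling."""
    return width_px / ppi, height_px / ppi


for diagonal in (23.0, 13.3):
    ppi = pixels_per_inch(1920, 1080, diagonal)
    w, h = size_in_inches(320, 240, ppi)
    print(f"{diagonal} in: {ppi:.0f} ppi, QVGA is {w:.1f} in x {h:.1f} in")
# 23.0 in: 96 ppi, QVGA is 3.3 in x 2.5 in
# 13.3 in: 166 ppi, QVGA is 1.9 in x 1.4 in
```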

   This means that a transmitting browser that creates bitstreams for a
   receiver population including a smartphone and a PC, would have to,
   ideally, generate at least three resolutions: thumbnail, something
   useful for smartphones (e.g., QVGA), and something useful for PC
   users (VGA or better).  More options would be better.  There are
   commercial products available today that ship, using a software codec
   on a high-end PC, 1080p60 video conferencing.  Two years down the
   road, and a mainstream PC will be able to handle that type of load.

   It has already been pointed out that the compute cycle requirements
   for simulcast and scalable encoding are roughly the same, assuming a
   common set of coding tools (i.e., H.264 Baseline profile vs. H.264
   Scalable Baseline profile).  The sending bandwidth requirements,
   however, are not the same: scalable beats simulcast by several tens
   of percentage points.

   There are additional considerations for simulcast vs. scalable coding
   that have to do with RTP packetization and access unit alignment,
   resolution switching, and error resilience, that require much more
   extensive analysis than what's intended in this document.

4.3.2.  Use-case: Video Conferencing w/ Central Server

   This particular use case is interesting because, as formulated, it
   appears to assume that multiple video resolutions are produced by
   each sending browser and simulcast to the server.  As formulated, the
   receiving browser receives the active speaker in full resolution and
   the other participants in low resolution.  The server selects what to
   forward based on speech activity.  Apparently, what is emulated here
   is one operation point implemented in many continuous presence MCUs
   (namely voice activation), without requiring the video transcoding
   features of a continuous presence MCU.

   This use case almost begs for use of scalable video coding.  The
   simulcasting alternative would have all the drawbacks discussed in
   the Simple Video Communication Service.  For example, consider what
   happens when the active speaker changes.  How is the receiving
   participant to switch resolutions between the two simulcast streams?
   It appears to require the request (from server to sending browser) of
   an I frame, with all the drawbacks that entails.  Actually, as
   described, there appears to be no value in simulcasting all
   resolutions to the server, because of the need of such interaction...
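
   With a scalable stream, the server's job in this use case reduces to
   deciding which layers of each sender to forward to each receiver.
   The sketch below is a hypothetical illustration of that decision,
   assuming two layers per sender named "base" and "enhancement"; it is
   not a description of any existing server.

```python
def layers_to_forward(participants, active_speaker, receiver):
    """Map each sender to the layers the server forwards to `receiver`."""
    plan = {}
    for sender in participants:
        if sender == receiver:
            continue                  # do not echo a sender's own video
        if sender == active_speaker:
            plan[sender] = ["base", "enhancement"]   # full resolution
        else:
            plan[sender] = ["base"]                  # low resolution only
    return plan


print(layers_to_forward(["alice", "bob", "carol"], "bob", "alice"))
# {'bob': ['base', 'enhancement'], 'carol': ['base']}
```

   When the active speaker changes, only the forwarding plan changes;
   each receiver keeps getting every sender's base layer throughout.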

4.4.  Embedded voice communication use cases

   The benefits discussed in multiparty video communication can apply
   here for scalable audio codecs.  Scalable audio codecs haven't been
   mentioned prominently before, as in audio-visual systems the cost of
   video processing (in terms of compute cycles and bit rate, among
   others) typically outweighs that of audio.  The requirements for
   today's games are so high that the CPU and bandwidth requirements for
   a couple of audio channels may fall into the category of background
   noise.  However, note that this ceases to be the case as the number
   of audio channels increases and the quality and (therefore)
   complexity of the audio codecs grows.  We understand that the audio
   quality here would probably not be restricted to toll quality, and
   require something like Opus or MP3 for coding.  While one or two of
   these codecs probably would still qualify as "background noise" on a
   gaming rack, 20 certainly do not.  What appears to be useful here
   would be scalability in terms of both bitrate and complexity.

4.5.  Bandwidth/QoS/mobility use cases

   The use cases as formulated appear to have limited impact on the
   codec choices beyond aspects already raised above.  NIC changes (or
   other changes that materially affect connectivity) are best addressed
   by codecs that can flexibly change their operation points without
   essentially restarting the codec (such as sending I frames in video,
   or user-generated codebooks in audio).

   If QoS mechanisms that take advantage of QoS marking are supported in
   the underlying network - something that is at least questionable in
   today's Internet, but may be very relevant in the future and in
   special application fields such as the military - the pyramid
   structure of scalable codecs makes the selection of bits that are
   best to be transported at higher QoS trivial: the lower the layer,
   the higher the desired QoS.  We note that scalable systems today
   emulate such a desirable network behavior by protecting base layers
   better than enhancement layers, through techniques such as
   retransmission or FEC.
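
   The rule "the lower the layer, the higher the desired QoS" can be
   written down in a few lines.  The priority numbering below (0 as the
   highest priority) and the function name are assumptions made for
   illustration; they do not correspond to any particular QoS marking
   scheme.

```python
def qos_priority(layer_index, num_priority_levels):
    """Priority class for a layer; 0 is the highest priority.

    Layer 0 is the base layer.  Layers beyond the number of available
    priority levels share the lowest priority."""
    return min(layer_index, num_priority_levels - 1)


# Four layers mapped onto three assumed priority levels.
print([qos_priority(layer, 3) for layer in range(4)])  # [0, 1, 2, 2]
```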


5.  Concluding Remarks

   Scalability, while known as a concept for decades, is a relatively
   new technique in the commercial sphere of video and audio
   communication products.  As a result, a lot of people are not
   familiar with how it works, and how it can be beneficial to real-time
   communication systems.

   Most significantly, scalable coding has been successfully used to
   solve several fundamental, decades-old challenges in packet video
   communication, and it therefore behooves any standardization activity
   that considers codec design or adoption to take it into serious
   consideration.


6.  Security Considerations

   None


7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

7.2.  Informative References

   [I-D.ietf-rtcweb-use-cases-and-requirements]
              Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real-
              Time Communication Use-cases and Requirements",
              draft-ietf-rtcweb-use-cases-and-requirements-00 (work in
              progress), June 2011.

   [eleft2006]
              Eleftheriadis, A., Civanlar, M., and O. Shapiro,
              "Multipoint Videoconferencing with Scalable Video Coding",
              Journal of Zhejiang University SCIENCE A, Vol. 7, Nr. 5,
              pp. 696-705 (Proceedings of the Packet Video 2006
              Workshop), April 2006.

   [mpeg2007]
              HHI and MPEG, "MPEG Verification Test for SVCd", 2007,
              <http://ip.hhi.de/imagecom_G1/savce/MPEG-Verification-Test/MPEG-Verification-Test.htm>.

   [wenger1998]
              Wenger, S., "Temporal scalability using P-pictures for
              low-latency applications", Multimedia Signal Processing
              Workshop 1998,
              <http://www.stewe.org/papers/mmsp98/243/index.htm>, 1998.

   [wiegand2007]
              Schwarz, H., Marpe, D., and T. Wiegand, "Overview of the
              Scalable Video Coding Extension of the H.264/AVC
              Standard", IEEE Transactions on Circuits and Systems for
              Video Technology, Special Issue on Scalable Video Coding,
              vol. 17, no. 9, pp. 1103-1120, Invited Paper,
              September 2007.


Authors' Addresses

   Stephan Wenger
   Vidyo, Inc.
   2400 SKyfarm Dr.
   Hillsborough, CA  94010
   US

   Email: stewe@stewe.org


   Alex Eleftheriadis
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ  07601
   US

   Email: alex@vidyo.com

