Network Working Group                                          JM. Valin
Internet-Draft                                                   Mozilla
Intended status: Standards Track                            July 6, 2015
Expires: January 7, 2016

        Screencasting Considerations and L1-Tree Wavelet Coding
                      draft-valin-netvc-l1tw-01

Abstract

This document proposes a screencasting encoding mode based on the Haar wavelet transform and L1-tree wavelet (L1TW) coding.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 7, 2016.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

This document may not be modified, and derivative works of it may not be created, and it may not be published except as an Internet-Draft.

Valin                    Expires January 7, 2016                [Page 1]

Internet-Draft           Screencasting and L1TW                July 2015

Table of Contents

   1. Introduction
   2. The Haar Wavelet
   3. L1-Tree Coding
   4. Results
   5. Objective Evaluation
   6. Development Repository
   7. IANA Considerations
   8. Security Considerations
   9. Acknowledgements
   10. Informative References
   Author's Address

1. Introduction

Screensharing is an important application for an Internet video codec. Screensharing content differs from photographic images in many ways, including:

o Text: screenshots often contain anti-aliased text on a perfectly flat background. This makes ringing artefacts highly perceptible. Also, typical photographic codecs based on the discrete cosine transform (DCT) cannot take advantage of the fact that the background often has a constant colour.

o Lines and edges: screenshots often contain perfectly straight horizontal and/or vertical lines. They appear in window frames, toolbars, widgets, spreadsheets, etc. DCT-based codecs can represent those lines and edges, but not as compactly as codecs like PNG.

o Reduced number of colours: screenshots are much less "noisy" than photographic images. It is common for a given region of an image to contain only a handful of different colours, another property we would like to exploit in a video codec.

o Motion: a very common motion pattern in screensharing content is the displacement of windows, which typically involves rectangular boundaries.

The technique described in this document currently deals only with still images and focuses on the problem of efficiently coding anti-aliased text.
While it is implemented in the Daala [Daala-website] codec, it should be applicable to most other video codecs.

2. The Haar Wavelet

The Haar wavelet is the simplest of all orthogonal wavelets, and also the only compactly supported orthogonal wavelet with linear phase. We use the Haar transform both because it is spatially compact and because it makes it easy to switch between a wavelet transform and the DCT. In 1-D, a single level of the Haar transform is expressed as:

      [ y0 ]      1      [  1   1 ] [ x0 ]
      [    ] = -------   [        ] [    ]
      [ y1 ]   sqrt(2)   [ -1   1 ] [ x1 ]

that is, y0 = (x0 + x1)/sqrt(2) and y1 = (x1 - x0)/sqrt(2).

The 2-D Haar transform is implemented with a 2x2 lifting Haar kernel:

      inputs: x0, x1, x2, x3
      x0  <= x0 + x2
      x3  <= x3 - x1
      tmp <= (x0 - x3) >> 1
      x1  <= tmp - x1
      x2  <= tmp - x2
      x0  <= x0 - x1
      x3  <= x3 + x2
      outputs: x0, x1, x2, x3

Because every lifting step can be undone exactly in integer arithmetic, this kernel has perfect reconstruction, which also makes it useful for lossless compression. For 32x32 superblocks, the kernel is applied over 5 levels (32 = 2^5), so that the lowest band is reduced to a single DC coefficient. The resulting wavelet coefficients are quantized non-uniformly, using the following quantization scales relative to the DC quantizer (from low frequency to high frequency):

      horizontal/vertical: [1.0, 1.0, 1.0, 1.5, 2.0]
      diagonal:            [1.0, 1.0, 1.5, 2.0, 3.0]

3. L1-Tree Coding

Like other wavelet coding methods such as EZW and SPIHT, we code the wavelet coefficients using trees. The main difference, however, is that rather than being based on the maximum coefficient value in a tree, this technique is based on the sum of the absolute values (the L1 norm) of all coefficients in the tree.

Let x(i,j) denote the quantized wavelet coefficient at position (i,j); the children of x(i,j) are x(2*i,2*j), x(2*i,2*j+1), x(2*i+1,2*j), and x(2*i+1,2*j+1). The absolute sum of the tree rooted at (i,j) is defined recursively as:

      S(i,j) = |x(i,j)| + S(2*i,2*j) + S(2*i,2*j+1)
                        + S(2*i+1,2*j) + S(2*i+1,2*j+1),

with S(i,j) = 0 if i >= N or j >= N, where N is the block size.
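The C sketch below ties the lifting kernel of Section 2 to the tree sums S(i,j), C(i,j) = S(i,j) - |x(i,j)|, and S(0,0) used in this section. It rests on assumptions that are not in this draft: the kernel inputs x0, x1, x2, x3 are taken as the top-left, top-right, bottom-left, and bottom-right samples of each 2x2 block; the subbands are stored in a Mallat-style layout, in which the children of (i,j) are exactly the four (2*i+a, 2*j+b) positions; and a hypothetical 8x8 block with 3 levels stands in for the 32x32, 5-level case. Quantization and entropy coding are omitted, so the sums are taken over the unquantized integer coefficients. All function names (haar_kernel, haar_kernel_inv, haar_fwd, haar_inv, tree_sum, child_sum, top_sum) are illustrative, not taken from the Daala source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo size: 8x8 block, 3 levels (the draft uses 32x32, 5). */
#define N 8
#define LEVELS 3

/* 2x2 lifting Haar kernel, transcribed from Section 2. Inputs are assumed
 * to be top-left, top-right, bottom-left, bottom-right; afterwards x0 holds
 * the low-pass (DC) term and x1, x2, x3 the three detail terms. */
static void haar_kernel(int *x0, int *x1, int *x2, int *x3) {
  int tmp;
  *x0 += *x2;
  *x3 -= *x1;
  /* >> on a negative int is implementation-defined in C (arithmetic on
   * mainstream compilers); the inverse recomputes the same tmp, so the
   * round trip is exact either way. */
  tmp = (*x0 - *x3) >> 1;
  *x1 = tmp - *x1;
  *x2 = tmp - *x2;
  *x0 -= *x1;
  *x3 += *x2;
}

/* Inverse kernel: undoes the lifting steps above in reverse order. */
static void haar_kernel_inv(int *x0, int *x1, int *x2, int *x3) {
  int tmp;
  *x3 -= *x2;
  *x0 += *x1;
  tmp = (*x0 - *x3) >> 1;
  *x2 = tmp - *x2;
  *x1 = tmp - *x1;
  *x3 += *x1;
  *x0 -= *x2;
}

/* Multi-level forward transform in an assumed Mallat-style layout: each
 * level splits the n x n low band into four n/2 x n/2 bands. */
static void haar_fwd(int x[N][N]) {
  int t[N][N];
  for (int l = 0, n = N; l < LEVELS; l++, n /= 2) {
    int h = n / 2;
    for (int r = 0; r < h; r++) {
      for (int c = 0; c < h; c++) {
        int a = x[2*r][2*c], b = x[2*r][2*c+1];
        int e = x[2*r+1][2*c], d = x[2*r+1][2*c+1];
        haar_kernel(&a, &b, &e, &d);
        t[r][c] = a;     t[r][c+h] = b;
        t[r+h][c] = e;   t[r+h][c+h] = d;
      }
    }
    for (int r = 0; r < n; r++) memcpy(x[r], t[r], n * sizeof x[r][0]);
  }
}

/* Inverse of haar_fwd(), from the coarsest level back to full size. */
static void haar_inv(int x[N][N]) {
  int t[N][N];
  for (int n = N >> (LEVELS - 1); n <= N; n *= 2) {
    int h = n / 2;
    for (int r = 0; r < h; r++) {
      for (int c = 0; c < h; c++) {
        int a = x[r][c], b = x[r][c+h], e = x[r+h][c], d = x[r+h][c+h];
        haar_kernel_inv(&a, &b, &e, &d);
        t[2*r][2*c] = a;     t[2*r][2*c+1] = b;
        t[2*r+1][2*c] = e;   t[2*r+1][2*c+1] = d;
      }
    }
    for (int r = 0; r < n; r++) memcpy(x[r], t[r], n * sizeof x[r][0]);
  }
}

/* S(i,j) from Section 3, with S = 0 outside the block. (0,0) is excluded
 * because its children would include itself; the draft instead defines
 * S(0,0) as the sum over the three direction trees (see top_sum()). */
static int tree_sum(int x[N][N], int i, int j) {
  assert(i != 0 || j != 0);
  if (i >= N || j >= N) return 0;
  return abs(x[i][j]) + tree_sum(x, 2*i, 2*j) + tree_sum(x, 2*i, 2*j+1)
       + tree_sum(x, 2*i+1, 2*j) + tree_sum(x, 2*i+1, 2*j+1);
}

/* C(i,j) = S(i,j) - |x(i,j)|: the total left for the four subtrees. */
static int child_sum(int x[N][N], int i, int j) {
  return tree_sum(x, i, j) - abs(x[i][j]);
}

/* Top-level S(0,0) = S(1,0) + S(0,1) + S(1,1); the DC is not included. */
static int top_sum(int x[N][N]) {
  return tree_sum(x, 1, 0) + tree_sum(x, 0, 1) + tree_sum(x, 1, 1);
}
```

In this layout, every coefficient other than the DC lies in exactly one of the three direction trees, so top_sum() is the sum of the absolute values of all detail coefficients and is zero for a flat block, which is what lets such a block be signaled with a single symbol. Since the four child sums add up to child_sum(), a coder that already knows C(i,j) needs only three of them.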
C(i,j) is defined as S(i,j) - |x(i,j)|, i.e., the sum over the four subtrees below (i,j). Coefficient coding starts at the root of each of the three "direction trees": (1,0), (0,1), and (1,1). At each level, we code the value of |x(i,j)| using a cumulative distribution function adapted according to the value of S(i,j). Since S(i,j) is already known to the decoder, coding |x(i,j)| also determines C(i,j), so C(i,j) does not need to be coded. Because the four new roots S(2*i,2*j), S(2*i,2*j+1), S(2*i+1,2*j), and S(2*i+1,2*j+1) must add up to C(i,j), only three symbols are required to encode them; the fourth is implied. At the top level, we have S(0,0) = S(1,0) + S(0,1) + S(1,1), so that completely flat blocks can be coded with a single S(0,0)=0 symbol. The DC is coded separately.

4. Results

The images coded with the Haar transform and L1TW have far better subjective visual quality than those coded with the lapped DCT or with JPEG, and their quality is comparable to that of images coded with x264 and x265. An example image at around 0.35 bit/pixel is provided at . The x264 image was encoded with the options "--preset placebo --crf=27" and the x265 image with "--preset slow --crf=29".

While the technique presented here works relatively well on this example, there are still cases where it performs significantly worse than x265. These include gradients, such as those in toolbars and window title bars, and long horizontal and vertical lines, such as those found in spreadsheets. These cases should improve once we implement the ability to dynamically switch between the lapped DCT and the Haar transform. Other ways of improving performance on long lines and edges would be to use a different 2-D wavelet decomposition or an overcomplete basis.

5. Objective Evaluation

As a first step for evaluating screensharing quality, we have added a small collection of screenshot images to the "Are We Compressed Yet?" (AWCY) website, under the "screenshots" set name. AWCY currently runs four quality metrics: PSNR, PSNR-HVS, SSIM, and FAST-SSIM [I-D.daede-netvc-testing].
It is not yet clear whether any of these metrics is suitable for evaluating the quality of screensharing material.

6. Development Repository

The algorithms in this proposal are being developed as part of Xiph.Org's Daala project. The code is available in the Daala git repository at . See [Daala-website] for more information.

7. IANA Considerations

This document makes no request of IANA.

8. Security Considerations

This draft has no security considerations.

9. Acknowledgements

Thanks to Timothy B. Terriberry for useful feedback and for designing the 2-D Haar lifting kernel.

10. Informative References

   [Daala-website]
              "Daala website", Xiph.Org Foundation, .

   [I-D.daede-netvc-testing]
              Daede, T. and J. Jack, "Video Codec Testing and Quality
              Measurement", draft-daede-netvc-testing-00 (work in
              progress), March 2015.

Author's Address

   Jean-Marc Valin
   Mozilla
   331 E. Evelyn Avenue
   Mountain View, CA 94041
   USA

   Email: jmvalin@jmvalin.ca