Network Working Group                                            N. Egge
Internet-Draft                                             T. Terriberry
Intended status: Informational                       Mozilla Corporation
Expires: September 10, 2015                                March 9, 2015


             Time Domain Lapped Transforms for Video Coding
                     draft-egge-videocodec-tdlt-01

Abstract

   This proposes the use of Time Domain Lapped Transforms (TDLT) as the
   transform step for video coding.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 10, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may not be modified, and derivative works of it may not
   be created, and it may not be published except as an Internet-Draft.


Egge & Terriberry      Expires September 10, 2015               [Page 1]

Internet-Draft                    TDLT                        March 2015


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  TDLT Defined  . . . . . . . . . . . . . . . . . . . . . . . .   2
   3.  Lapped-Transform Selection  . . . . . . . . . . . . . . . . .   4
     3.1.  Coding Gain . . . . . . . . . . . . . . . . . . . . . . .   5
     3.2.  Transform Width . . . . . . . . . . . . . . . . . . . . .   6
   4.  Optimal Transform Coefficients  . . . . . . . . . . . . . . .   6
     4.1.  Exhaustive Search . . . . . . . . . . . . . . . . . . . .   6
     4.2.  Stochastic Search . . . . . . . . . . . . . . . . . . . .   7
     4.3.  Ramp Constraint . . . . . . . . . . . . . . . . . . . . .   8
   5.  Intra Prediction  . . . . . . . . . . . . . . . . . . . . . .  10
   6.  Motion Compensation . . . . . . . . . . . . . . . . . . . . .  11
   7.  Multiple Block Sizes  . . . . . . . . . . . . . . . . . . . .  11
     7.1.  Variable Sized Lapping  . . . . . . . . . . . . . . . . .  12
     7.2.  Fixed Sized Lapping . . . . . . . . . . . . . . . . . . .  15
   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  16
   9.  Security Considerations . . . . . . . . . . . . . . . . . . .  16
   10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . .  16
   11. Informative References  . . . . . . . . . . . . . . . . . . .  16
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  17

1.  Introduction

   This draft outlines a proposal to adapt the Time-Domain Lapped
   Transforms (TDLT) for use in video coding.  Lapped transforms were
   proposed for video coding at least as as far back as 1989 [Malv89].
   Like the loop filters more commonly found in recent video coding
   standards, TDLTs use a post-processing filter that runs between block
   edges to reduce or eliminate blocking artifacts.  Unlike a loop
   filter, the TDLT filter is invertible, allowing the encoder to run
   the inverse filter on the input video.  This decorrelates blocks
   before they are passed through a normal block transform and
   quantization step, improving coding gain (which helps in both smooth
   and highly textured areas), in addition to reducing blocking
   artifacts.

2.  TDLT Defined

   The Time-Domain Lapped Transform can be viewed as a set of pre and
   post filters to an existing block-based DCT transform.  The idea is
   to place an invertible filter along the block boundaries outside an
   existing block-based DCT encoder.


Egge & Terriberry      Expires September 10, 2015               [Page 2]

Internet-Draft                    TDLT                        March 2015


                             +----+     +----+
                             |    |  Q  |    |
                       +----+|DCT |  u  |iDCT|+----+
                       |    ||    |  a  |    ||    |
                       | P  |+----+  n  +----+|P^-1|
                       |    ||    |  t  |    ||    |
                       +----+|DCT |  i  |iDCT|+----+
                       |    ||    |  z  |    ||    |
                       | P  |+----+  a  +----+|P^-1|
                       |    ||    |  t  |    ||    |
                       +----+|DCT |  i  |iDCT|+----+
                             |    |  o  |    |
                             +----+  n  +----+

   The pre-filter P operates in the time domain, processing block
   boundaries and removing inter-block correlation.  The blocks are then
   transformed by the DCT into the frequency domain, where the resulting
   coefficients are quantized and encoded.  When decoding, the inverse
   operator P^-1 is applied as a post-filter to the output of the
   inverse DCT.  This has two benefits:

   1.  Quantization errors are spread over adjacent blocks via the post-
       filter P^-1, reducing blocking artifacts.  This eliminates the
       need for a separate deblocking filter.

   2.  The increased support region of the transform allows it to take
       advantage of inter-block correlation to achieve a higher coding
       gain than a non-overlapped DCT.  This allows it to more
       effectively code both smooth and textured regions.

   The pre-filter P is defined in [Tran01] as follows:

                          1  [ I  J ][ I  0 ][ I  J ]
                     P = --- [      ][      ][      ]
                          2  [ J -I ][ 0  V ][ J -I ]

   Here I is the identity matrix and J is the "reversal matrix",
   obtained by simply re-ordering the rows of the identity matrix in
   reverse order.  The V matrix is a free parameter, and as long as V is
   invertible, this filter structure guarantees perfect reconstruction,
   linear phase, and biorthogonality.  If V is orthogonal, then the
   overall transform is also orthogonal instead of just biorthogonal.

   For the case of the 4x8 TDLT, we use the following invertible matrix
   for V:


Egge & Terriberry      Expires September 10, 2015               [Page 3]

Internet-Draft                    TDLT                        March 2015


                         [ 1 q_0 ][  1  0 ][ s_0  0  ]
                     V = [       ][       ][         ]
                         [ 0  1  ][ p_0 1 ][  0  s_1 ]

   Thus for the 4x8 case, the pre-filter and post-filter are completely
   described by the four parameters q_0, p_0, s_0, and s_1.  In general,
   any invertible V matrix may be used.  However, factoring V into a
   series of lifting steps ensures that it can be implemented
   efficiently, and can reduce the number of parameters required by the
   optimization process, since the full flexibility of an arbitrary
   invertible matrix is not required to achieve good coding gain.
   [Tran01] proposes two reduced-parameter factorizations, dubbed
   Type III and Type IV.  These are identical in the 4x8 case, but for
   larger transforms the differ in the order that the p_i and q_i steps
   are applied: interleaved for Type III and ascending and then
   descending order for Type IV.  While Type III appears to give
   slightly higher coding gain when unconstrained, when coupled with the
   ramp constraint discussed below and the constraint that all
   coefficients be dyadic rationals, the number of feasible solutions is
   much smaller than with Type IV.  The increased number of feasible
   solutions allows Type IV transforms to achieve higher coding gains
   than Type III when these constraints are imposed.  This definition
   easily extends to the 8x16 and 16x32 TDLT case with similar
   parameterizations.  In general, we use the Type IV factorization
   from [Tran01].  For a V matrix of size M, this has (M-1) p_i and
   (M-1) q_i parameters, and M s_i parameters.  For a transform of size
   Nx2N, this gives a total of 1.5N-2 parameters.  This is also the
   number of lifting steps that must be performed to implement the V
   portion of the pre- and post-filters.

3.  Lapped-Transform Selection

   We would like to find good candidate transform coefficients that
   perform well within a video coding framework.  There are several
   metrics we can use for evaluating pre-filter parameters.  Including

   1.  Coding Gain - how well energy is compacted into only a few
       coefficients

   2.  Side Band Attenuation - how much energy from frequencies outside
       the passband leaks into each basis function

   3.  Transform Width - how wide are the basis functions and how much
       ringing they will cause

   4.  Orthogonality - how linearly independent the basis functions are


Egge & Terriberry      Expires September 10, 2015               [Page 4]

Internet-Draft                    TDLT                        March 2015


   Of these, the most important by far is coding gain as it allows us to
   directly measure the improvement in bits between different candidate
   transforms.  At high bit rates using an efficient quantizer, every
   6.02 dB improvement in coding gain saves a bit of entropy per
   coefficient.

3.1.  Coding Gain

   Coding gain is a useful metric for comparing different candidate
   transforms.  Roughly speaking, it is the measure of how well energy
   is compacted into only a few coefficients.  The formula for coding
   gain of the lapped transform can be found in [Terr12].  Using an
   AR(1) model with r=0.95, we have

                        /                     1                   \
       C_g = 10*Log_10 | ----------------------------------------- |
                        \ Prod_i((G*AR(1)*G^T)[i,i]*(H^T*H)[i,i]) /


   where G is the analysis filter of the lapped transform:

                              [         ][ P  0 ]
                          G = [ 0 DCT 0 ][      ]
                              [         ][ 0  P ]

   and H is the synthesis filter of the lapped transform:

                             [ P^-1  0   ][  0   ]
                         H = [           ][ iDCT ]
                             [  0   P^-1 ][  0   ]

   In [Terr12] the coding gain of the non-lapped DCT is compared with
   the optimal non-lapped Karhunen-Loeve transform for the same AR(1)
   model with r=0.95.

                             4 point     8 point    16 point
                          +-----------+-----------+-----------+
                     DCT  | 7.5701 dB | 8.8259 dB | 9.4555 dB |
                     KLT  | 7.5825 dB | 8.8462 dB | 9.4781 dB |
                          +-----------+-----------+-----------+

   Similarly, in [Tran01] the coding gain of the TDLT using fast
   factorizations with real coefficients produced by unconstrained
   optimization are


Egge & Terriberry      Expires September 10, 2015               [Page 5]

Internet-Draft                    TDLT                        March 2015


                              4x8         8x16       16x32
                          +-----------+-----------+-----------+
            Type III TDLT | 8.6349 dB | 9.6115 dB | 9.9496 dB |
            Type IV TDLT  | 8.6349 dB | 9.6005 dB | 9.9057 dB |
                          +-----------+-----------+-----------+

3.2.  Transform Width

   In general, the wider the transform, the higher the coding gain: a
   16-point DCT will always have a higher coding gain than a 4-point
   DCT.  In the case of lapped transform, the width of the transform is
   more than just counting the number of points, it involves the shape
   of the basis functions.  At equal coding gain, a narrower transform
   is better because it causes a smaller amount of ringing around edges.
   We define the width of the transform as

                                                          1/4
                     / sum_ij ( H[i,j]^2 * (j-N+1/2)^4 ) \
            w = C * | ----------------------------------  |   ,
                     \        sum_ij ( H[i,j]^2 )        /

   where C=2.991 is a constant calibrated such that the width of the
   1024-point non-overlapped DCT is equal to 1024.

4.  Optimal Transform Coefficients

   Of the four metrics described in Section 3 we chose to optimize our
   transform parameters for the highest coding gain.

   To avoid the use of floating point operations, we use dyadic
   rationals to represent the parameters of our TDLT.  These are the
   p's, q's and s's that describe the V matrix in the pre-filter.  We
   chose a base of 2^6 because it offered enough resolution to find good
   approximations of the optimal values for the p's, q's, and s's and
   still allowed us to fit the results of multiplications in a 16 bit
   word.  Increasing the base to 2^8 improves the achievable coding gain
   of the 4x8 transform by less than 0.002 dB.  On the other hand,
   dropping it even one bit to 2^5 lowers the coding gain by 0.037 dB.

4.1.  Exhaustive Search

   For the smaller lapped transforms, it is possible to simply do an
   exhaustive search and check all possible transform candidates to find
   the one with the best coding gain.  The limitation that the p's, q's,
   and s's all be dyadic rationals allows us to simply enumerate all
   reasonable values.  Additional constraints allowed us to further
   reduce the search space.  Because the p's and q's are liftings steps
   that represent rotations in the plane their, values are between -1.0


Egge & Terriberry      Expires September 10, 2015               [Page 6]

Internet-Draft                    TDLT                        March 2015


   and 1.0.  Likewise the limitation that the pre- and post-filter steps
   be reversible requires that the scale factors be greater than or
   equal 1.0, otherwise information would be lost during the transform.
   Finally, all things equal we prefer smaller scale factors as it makes
   quantizing and encoding the coefficients cheaper.  We thus cap the
   scale factors at 2.0.  Based on some limited experimentation, scale
   factors larger than this do not appear to produce useful transforms
   according to our metrics, anyway.

   With a dyadic rational base of 2^6, the number of possible candidates
   to consider is

   |C| = (2*(2^6)+1)^(|p|+|q|)   *  (2^6+1)^|s|
       = (2*(2^6)+1)^(2*(N/2-1)) *  (2^6+1)^(N/2)

   Thus for the transform sizes we are interested in, the number of
   candidates is tractable only for the 4x8 case:

                 N          |C|
              +-----+------------------+
    4x8  TDLT |  4  | 68161536         |
    8x16 TDLT |  8  | 7.731400 * 10^19 |
   16x32 TDLT | 16  | 9.947082 * 10^43 |
              +-----+------------------+

   An exhaustive search for parameters that give the optimal coding gain
   for the 4x8 TDLT are below:

   +-----+--------+      +-----+--------+      +-----+--------+
   | p_0 | -11/64 |      | q_0 |  36/64 |      | s_0 |  91/64 |
   +-----+--------+      +-----+--------+      | s_1 |  85/64 |
                                               +-----+--------+

4.2.  Stochastic Search

   For the larger lapped transforms, doing an exhaustive search is not
   possible.  Instead we formulate the optimization problem as an
   integer programming problem and use a robust industrial solver to
   find optimal integer values for the p's, q's, and s's.

   For the 8x16 TDLT, the parameters are below:

   +-----+--------+      +-----+--------+      +-----+--------+
   | p_0 | -23/64 |      | q_0 |  48/64 |      | s_0 |  90/64 |
   | p_1 | -18/64 |      | q_1 |  34/64 |      | s_1 |  73/64 |
   | p_2 |  -6/64 |      | q_2 |  20/64 |      | s_2 |  72/64 |
   +-----+--------+      +-----+--------+      | s_3 |  75/64 |
                                               +-----+--------+


Egge & Terriberry      Expires September 10, 2015               [Page 7]

Internet-Draft                    TDLT                        March 2015


   For the 16x32 TDLT, the parameters are below:

   +-----+--------+      +-----+--------+      +-----+--------+
   | p_0 | -24/64 |      | q_0 |  50/64 |      | s_0 |  90/64 |
   | p_1 | -23/64 |      | q_1 |  40/64 |      | s_1 |  74/64 |
   | p_2 | -17/64 |      | q_2 |  31/64 |      | s_2 |  73/64 |
   | p_3 | -12/64 |      | q_3 |  22/64 |      | s_3 |  71/64 |
   | p_4 | -14/64 |      | q_4 |  18/64 |      | s_4 |  67/64 |
   | p_5 | -13/64 |      | q_5 |  16/64 |      | s_5 |  67/64 |
   | p_6 |  -7/64 |      | q_6 |  11/64 |      | s_6 |  67/64 |
   +-----+--------+      +-----+--------+      | s_7 |  72/64 |
                                               +-----+--------+

   In order to confirm that the integer approximations found are in fact
   optimal, we can compare them with the optimal real valued coding
   gains for the three lapped-transforms we are proposing.  In [Tran01]
   a numeric solver was used to find optimal values for a Type IV lapped
   transform.

                    4x8          8x16         16x32
                +------------+------------+------------+
    Real Valued | 8.6349 dB  | 9.6005 dB  | 9.9057 dB  |
    Approximate | 8.63473 dB | 9.60021 dB | 9.89338 dB |
                +------------+------------+------------+
           Loss | 0.00017 dB | 0.00029 dB | 0.01232 dB |
                +------------+------------+------------+

4.3.  Ramp Constraint

   It is also possible to constrain the lapped transform so that it is
   (1,2)-regular [DT03], i.e., that it has one vanishing moment in the
   analysis filter and two vanishing moments in the synthesis filter.
   This allows the synthesis filter to reconstruct any piecewise linear
   function solely from the DC coefficients.  This causes the shape of
   the DC basis function to be a symmetric linear ramp.  This can be
   particularly useful when it matches the shape of other windowing
   functions used in the codec.  For example, a linear window is
   commonly used with Overlapped Block Motion Compensation (OBMC), which
   is one possible approach for avoiding blocking artifacts in the
   motion-compensation stage of the codec.  More vanishing moments are
   possible, allowing reconstruction of piecewise quadratic or even
   higher-order functions, but these require additional overlap stages.

   This regularity can be enforced solely by enforcing a series of
   constraints on the scale factors, s_i.


Egge & Terriberry      Expires September 10, 2015               [Page 8]

Internet-Draft                    TDLT                        March 2015


        s_0 = N*(1 - q_0)

                 N       /                            \
        s_i = ------- * | (q_{i-1} - 1)*p_{i-1} - q_i | , for i > 0
              2*i + 1    \                            /

   Since 2*i + 1 is odd, but we want s_i to be a dyadic rational value,
   the remainder of the expression must be evenly divisible by (2*i+1).
   A similar set of constraints can be derived for Type III, but they
   involve more of the p's and q's per s_i value, and thus have far
   fewer admissible solutions when coupled with the dyadic rational
   constraint.

   The additional restrictions described above greatly reduce the number
   of combinations to consider, both because there are fewer parameters
   (the s_i's can no longer be chosen independently) and because there
   are fewer combinations of parameter values which produce dyadic
   rational coefficients.  With these constraints, the number of
   combinations is small enough that an exhaustive search is now
   tractable for the 8x16 TDLT.

                  N      |C|
              +-----+-----------+
    4x8  TDLT |  4  | 442       |
    8x16 TDLT |  8  | 331677320 |
              +-----+-----------+

   An exhaustive search for parameters that give the optimal coding gain
   under the ramp and dyadic rational constraints for the 4x8 and 8x16
   TDLT are below:

   +-----+--------+      +-----+--------+      +-----+--------+
   | p_0 | -16/64 |      | q_0 |  41/64 |      | s_0 |  92/64 |
   +-----+--------+      +-----+--------+      | s_1 |  93/64 |
                                               +-----+--------+

   +-----+--------+      +-----+--------+      +-----+--------+
   | p_0 | -24/64 |      | q_0 |  53/64 |      | s_0 |  88/64 |
   | p_1 | -20/64 |      | q_1 |  40/64 |      | s_1 |  75/64 |
   | p_2 |  -4/64 |      | q_2 |  24/64 |      | s_2 |  76/64 |
   +-----+--------+      +-----+--------+      | s_3 |  76/64 |
                                               +-----+--------+

   Unfortunately, in the 16x32 TDLT case the number of combinations is
   still not tractable, even with these additional constraints.  Again,
   we use an integer programming model to solve for the integer
   parameters that optimize coding gain in this context.


Egge & Terriberry      Expires September 10, 2015               [Page 9]

Internet-Draft                    TDLT                        March 2015


   +-----+--------+      +-----+--------+      +-----+--------+
   | p_0 | -32/64 |      | q_0 |  59/64 |      | s_0 |  80/64 |
   | p_1 | -28/64 |      | q_1 |  53/64 |      | s_1 |  72/64 |
   | p_2 | -24/64 |      | q_2 |  46/64 |      | s_2 |  73/64 |
   | p_3 | -32/64 |      | q_3 |  41/64 |      | s_3 |  68/64 |
   | p_4 | -24/64 |      | q_4 |  35/64 |      | s_4 |  72/64 |
   | p_5 | -13/64 |      | q_5 |  24/64 |      | s_5 |  74/64 |
   | p_6 |  -2/64 |      | q_6 |  12/64 |      | s_6 |  74/64 |
   +-----+--------+      +-----+--------+      | s_7 |  70/64 |
                                               +-----+--------+

                      4x8          8x16         16x32
                  +------------+------------+------------+
           Dyadic | 8.63473 dB | 9.60021 dB | 9.89338 dB |
    Ramp + Dyadic | 8.59886 dB | 9.56161 dB | 9.78294 dB |
                  +------------+------------+------------+
             Loss | 0.03587 dB | 0.0386 dB  | 0.11044 dB |
                  +------------+------------+------------+

5.  Intra Prediction

   Since the final pixel values of a block are not available until after
   the post-filter runs, they cannot be used to predict neighboring
   blocks.  There are a number of possible solutions to this.  For
   example, one could simply use pixels from outside the overlap region.
   However, as these pixels are farther away, they are poorer
   predictors, and the extra distance reduces the range of prediction
   directions which have enough neighbors available to form an adequate
   extrapolation [OP11].

   An alternate approach is to perform the prediction in the frequency
   domain.  Initial experiments suggest that this is just as effective
   as prediction in the time domain, and has similar computational
   requirements [Egge13].  However, because the frequency domain
   coefficients of a neighboring block are impacted both by what size
   DCT was used, and the lapping across all four of its edges,
   directional predictors can only be so good.  At low rates, this meant
   more bits were spent correcting an incorrect predictor than were
   saved by coding only a directional mode.

   A signal free technique was developed for doing limited intra
   prediction in the frequency domain when using lapped transforms.
   Note that when the spatial prediction mode is exactly horizontal or
   vertical, applying the filters described in this draft along the
   orthogonal direction is the identity.  Thus it is possible to look at
   the horizontal coefficients of the neighboring block to the left, and
   the vertical energy of the neighboring block above and simply use the
   coefficients where the energy is larger.  When this technique is


Egge & Terriberry      Expires September 10, 2015              [Page 10]

Internet-Draft                    TDLT                        March 2015


   coupled with a quantization and coefficient coder that makes
   signaling no predictor cheap [Vali15], this becomes an effective
   frequency domain intra predictor.

   Finally, a technique was developed for intra predicting chroma
   frequency domain coefficients from decoded coincident luma
   coefficients [Egge15].  While this technique does not strictly
   require the use of lapped transforms, because the block size extent
   (and thus the lapping region) for both the chroma and luma planes is
   the same, the use of a lapped transform does not change the
   effectiveness of this technique.

6.  Motion Compensation

   There have been several lapped transform proposals that perform
   block-by-block motion compensation by simply expanding the size of
   the prediction region for each block [TT01], [OPT11].  However, in
   addition to increasing the amount of motion-compensated prediction
   pixels that must be computed by a factor of four, this also increases
   the number of applications of the pre- and post-filter by a factor of
   four, since this must now be done separately for each block, using
   the motion-compensated frame difference for that block.

   An alternate approach is simply perform motion compensation of the
   frame in a completely separate step, prior to any transform, using
   any method desired [Terr15].  The lapping can then be applied to this
   motion-compensated prediction, producing per-block predictors.  This
   still allows the prediction mode (inter, intra, bi-prediction, etc.)
   to be chosen on a block-by-block basis.  It also interacts well with
   other techniques designed to operate in the frequency domain, such as
   the Pyramid Vector Quantization (PVQ) proposed elsewhere.

   The downside is that motion estimation in the encoder needs to be
   performed for regions slightly beyond the current block.  However,
   this is already required by blocking-artifact-free motion
   compensation techniques, such as Overlapped Block Motion Compensation
   (OBMC).  Experience with OBMC has shown that an encoder can mostly
   ignore look-ahead and still get acceptable results, unlike other
   techniques, such as control-grid interpolation (CGI).

7.  Multiple Block Sizes

   Multiple block size support is important for lapped transforms, since
   the larger support region increases their susceptibility to ringing
   artifacts compared to a non-overlapped transform with the same number
   of coefficients (though it is greatly reduced compared to a non-
   overlapped transform with a support region of the same size).


Egge & Terriberry      Expires September 10, 2015              [Page 11]

Internet-Draft                    TDLT                        March 2015


7.1.  Variable Sized Lapping

   The most obvious approach is to require that the size of the overlap
   filter be constrained by the smallest block adjacent to a given edge.
   This requires some amount of look-ahead in the encoder, but has the
   benefit of using the largest lapping possible in regions where all
   blocks are the same size while not introducing discontinuities where
   blocks of different sizes meet.  Note that this has an effect on the
   coding syntax, as the block size decision for the block below the one
   being coded must made and communicated to the decoder prior to
   coding.  Using this convention no additional information need to be
   communicated other than the block size decision to completely
   describe how the variable sized lapping should be applied.

   Consider an example image that is 32x32 with the following block size
   decisions.  We apply the lapping recursively to blocks of 32x32 at a
   time until we reach a block that is not subdivided into smaller
   blocks.  At each step in the recursions, we apply a filter vertically
   across the block edges that run left to right splitting the block in
   half.  We then apply a horizontal filter across the block edges that
   split the block in half top to bottom.


Egge & Terriberry      Expires September 10, 2015              [Page 12]

Internet-Draft                    TDLT                        March 2015


   +-------+-------+---------------+-------------------------------+
   |       |       |               |                               |
   |       |       |               |                               |
   |       |       |               |                               |
   +-------+-------+               |                               |
   |       |       |               |                               |
   |       |       |               |                               |
   |       |       |               |                               |
   +-------+-------+---------------+                               |
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   +---------------+-------+-------+-------------------------------+
   |               |       |       |                               |
   |               |       |       |                               |
   |               |       |       |                               |
   |               +-------+-------+                               |
   |               |       |       |                               |
   |               |       |       |                               |
   |               |       |       |                               |
   +---------------+-------+-------+                               |
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   +-------------------------------+-------------------------------+

   Block size decision for a 32x32 frame.


Egge & Terriberry      Expires September 10, 2015              [Page 13]

Internet-Draft                    TDLT                        March 2015


   +-------+-------+---------------+-------------------------------+
   |       |       |               |                               |
   |       |       |               |                               |
   |       |       |               |                               |
   +-------+-------+               |                               |
   |       |       |               |                               |
   |       |       |               |                               |
   |       |       |               |                               |
   +-------+-------+---------------+                               |
   |               |               |X X X X X X X X X X X X X X X X|
   |               |               |X X X X X X X X X X X X X X X X|
   |               |               |X X X X X X X X X X X X X X X X|
   |               |               |X X X X X X X X X X X X X X X X|
   |X X X X X X X X|               |X X X X X X X X X X X X X X X X|
   |X X X X X X X X|               |X X X X X X X X X X X X X X X X|
   |X X X X X X X X|X X X X X X X X|X X X X X X X X X X X X X X X X|
   +X-X-X-X-X-X X-X+X-X-X-X+X-X-X-X+X-X-X-X-X-X-X-X-X-X-X-X-X-X-X-X+
   |X X X X X X X X|X X X X|X X X X|X X X X X X X X X X X X X X X X|
   |X X X X X X X X|X X X X|X X X X|X X X X X X X X X X X X X X X X|
   |X X X X X X X X|       |       |X X X X X X X X X X X X X X X X|
   |X X X X X X X X+-------+-------+X X X X X X X X X X X X X X X X|
   |               |       |       |X X X X X X X X X X X X X X X X|
   |               |       |       |X X X X X X X X X X X X X X X X|
   |               |       |       |X X X X X X X X X X X X X X X X|
   +---------------+-------+-------+X X X X X X X X X X X X X X X X|
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   |               |               |                               |
   +-------------------------------+-------------------------------+

   Apply the filter vertically across the horizontal internal edge.


Egge & Terriberry      Expires September 10, 2015              [Page 14]

Internet-Draft                    TDLT                        March 2015


   +-------+-------+---------------+-------------------------------+
   |       |       |        X X X X|X X X X                        |
   |       |       |        X X X X|X X X X                        |
   |       |       |        X X X X|X X X X                        |
   +-------+-------+        X X X X|X X X X                        |
   |       |       |        X X X X|X X X X                        |
   |       |       |        X X X X|X X X X                        |
   |       |       |        X X X X|X X X X                        |
   +-------+-------+--------X-X-X-X+X X X X                        |
   |               |        X X X X|X X X X                        |
   |               |        X X X X|X X X X                        |
   |               |        X X X X|X X X X                        |
   |               |        X X X X|X X X X                        |
   |               |        X X X X|X X X X                        |
   |               |        X X X X|X X X X                        |
   |               |        X X X X|X X X X                        |
   +---------------+-------+X-X-X-X+X-X-X-X------------------------+
   |               |       |    X X|X X                            |
   |               |       |    X X|X X                            |
   |               |       |    X X|X X                            |
   |               +-------+----X-X+X X                            |
   |               |       |    X X|X X                            |
   |               |       |    X X|X X                            |
   |               |       |    X X|X X                            |
   +---------------+-------+----X-X+X X                            |
   |               |            X X|X X                            |
   |               |            X X|X X                            |
   |               |            X X|X X                            |
   |               |            X X|X X                            |
   |               |            X X|X X                            |
   |               |            X X|X X                            |
   |               |            X X|X X                            |
   +----------------------------X-X+X-X----------------------------+

   Apply the filter horizontally across the vertical internal edge.

   The filters are then applied recursively in this manner to the four
   quadrants of the block.  By applying the filters recursively this
   way, we have prevented any discontinuities from appearing where block
   is split but its neighbor is not.

7.2.  Fixed Sized Lapping

   One of the challenges using variable sized lapping is that changing
   the block size decision (either splitting a block into four blocks a
   quarter as big, or merging four blocks into one four times the size)
   can have an impact on the coding performance outside the block
   considered.  This makes computing the optimal block size decision for


Egge & Terriberry      Expires September 10, 2015              [Page 15]

Internet-Draft                    TDLT                        March 2015


   a frame computationally difficult as traditional rate-distortion
   optimization (RDO) algorithms exploit this locality to iteratively
   improve an initial decision.

   One way to simplify the problem is to assume a fixed sized lapping
   across the entire image.  If only the 4-point filter is used across
   block boundaries, then it is possible to compare the distortion of an
   8x8 block with that of four 4x4 blocks by simply computing the mean
   squared error (MSE) of the 64 spatial domain coefficients after
   applying the inverse lapped transform.  Because changing the block
   size decision, and thus the interior lapping has no impact on the lap
   decision on the border of the 8x8 block, then just looking at the
   rate and distortion of the interior coefficients is sufficient.

   This approach has does not leverage the additional coding gain and
   deblocking achieved by using larger lapping filters but may make up
   for this by allowing computationally cheap block size decision
   heuristics in real-time encoding environments.

8.  IANA Considerations

   This document has no actions for IANA.

9.  Security Considerations

   This draft has no security considerations.

10.  Acknowledgments

   Thanks to Greg Maxwell and Jean-Marc Valin for their assistance in
   the experimentation and other valuable contributions to this
   document.

11.  Informative References

   [DT03]     Dai, W. and T. Tran, "Regularity-Constrained Pre- and
              Post-Filtering for Block DCT Based Systems", IEEE
              Transactions on Signal Processing 51(10):2568--2581,
              October 2003.

   [Egge13]   Egge, N., "Intra-Prediction in Daala", October 2013,
              <http://people.xiph.org/~unlord/Daala-Intra.pdf>.

   [Egge15]   Egge, N. and J. Valin, "Predicting Chroma from Luma with
              Frequency Domain Intra Prediction", February 2015,
              <http://people.xiph.org/~unlord/spie_cfl.pdf>.


Egge & Terriberry      Expires September 10, 2015              [Page 16]

Internet-Draft                    TDLT                        March 2015


   [Malv89]   Malvar, H. and D. Staelin, "The LOT: Transform Coding
              Without Blocking Effects", IEEE Transactions on Acoustics,
              Speech, and Signal Processing , April 1989,
              <http://research.microsoft.com/apps/pubs/
              default.aspx?id=102073>.

   [OP11]     de Oliveria, R. and B. Pesquet-Popescu, "Intra-Frame
              Prediction with Lapped Transforms for Image Coding", Proc.
              of the 36th IEEE International Conference on Acoustics,
              Speech, and Signal Processing (ICASSP'11) pp. 805--808,
              May 2011.

   [OPT11]    de Oliveria, R., Pesquet-Popescu, B., and M. Trocan,
              "Inter Prediction Using Lapped Transforms for Advanced
              Video Coding", Proc. of the 18th IEEE International
              Conference on Image Processing (ICIP'11) pp. 3705--3708,
              September 2011.

   [Tran01]   Tran, T., "Lapped Transform via Time-Domain Pre- and Post-
              Processing", IEEE Transactions on Signal Processing ,
              October 2001.

   [TT01]     Tran, T. and C. Tu, "Lapped Transform Based Video Coding",
              Proc. of the 24th SPIE Conference on Applications of
              Digital Image Processing vol. 4472, pp. 319--333, July
              2001.

   [Terr12]   Terriberry, T., "Introduction to Video Coding Part 1:
              Transform Coding", February 2012.

   [Terr15]   Terriberry, T., "Adaptive Motion Compensation Without
              Blocking Artifacts", February 2015,
              <https://people.xiph.org/~tterribe/daala/vbsobmc.pdf>.

   [Vali15]   Valin, J., "Perceptual Vector Quantization for Video
              Coding", February 2015,
              <http://jmvalin.ca/video/spie_pvq.pdf>.

Authors' Addresses

   Nathan E. Egge
   Mozilla Corporation
   650 Castro Street
   Mountain View, CA  94041
   USA

   Email: negge@dgql.org


Egge & Terriberry      Expires September 10, 2015              [Page 17]

Internet-Draft                    TDLT                        March 2015


   Timothy B. Terriberry
   Mozilla Corporation
   650 Castro Street
   Mountain View, CA  94041
   USA

   Phone: +1 650 903-0800
   Email: tterribe@xiph.org


Egge & Terriberry      Expires September 10, 2015              [Page 18]