VideoBits.org
home principles standards vendors publications  
intro color prediction transforms compression filtering streams interlace  
 


Transforms

Correction residuals

Prediction gives the decoder a way to approximate the content of the block, but because an image may changes in ways that make it impossible to find a perfectly matching prediction source in a neighboring block or in an area of a previously coded frame, the prediction of a block may be significantly different from that block in the video source. An encoder, knowing how the decoder will reconstruct the block from the prediction source, calculates the error between the reconstructed prediction and the original source for each pixel in the block. A block containing the difference values between the reconstructed prediction and the actual source block pixels represents the error in the prediction of each pixel. That block of pixel difference values is known as a block of residual data or a residuals block.

No video image data is actually encoded in a compressed video bitstream. Only two kinds of data are actually encoded:

  1. How to predict each block
    • motion vector(s) and previously coded frame buffer source reference(s) for an inter predicted block
    • prediction mode for an intra predicted block
  2. The residual block containing the value to add to (or subtract from) each predicted pixel value in order to recreate the correct intended value.

An example block of video source data, the block of prediction data using the best prediction source that the encoder could find, and the resulting residuals block is shown in figure 7.3.1.

3 20 62 135 5 18 66 133 -2 2 -4 2
8 34 88 175 8 33 96 169 0 1 -6 6
24 46 140 205 25 45 145 202 -1 1 -5 3
51 96 194 248 51 96 200 243 0 0 -6 5
Source Prediction Residuals

Notice that video pixel data is always positive because it is impossible to have a negative value for an intensity of visible light. Digital video pixel data is typically represented with unsigned binary numbers. Since residuals are a difference value, they can be negative and must be represented with signed binary numbers.

Spatial to frequency domain transformation

  1. The relationship of the space/time domain to the frequency domain

    Any arbitrary waveform within a fixed interval can be created by adding some (positive or negative) multiple of a constant and some multiple of a sine wave with a period equal to the interval and some multiple of a sine wave with a period equal to half of the interval and some multiple of a sine wave with a period equal to one third of the interval and so on. Another way to say the same thing is that any waveform can be synthesized by the summation of sine waves of integer multiples of the frequency corresponding to the interval period, starting with the multiple zero, with each frequency component weighted by a coefficient. This is expressed mathematically in figure 7.4.1 where Ci is a series of frequency weighting coefficients that can be used to create the desired waveform.

    waveform frequency summation

    When a waveform is expressed as a series of frequency weighting coefficients, that is known as an expression of the waveform in the frequency domain. If the waveform is an audio signal representing a change in air pressure over time then the signal can be identically transformed from the time domain to the frequency domain and back. This is most famously known as the Fourier transform.

    One interesting simple example is the creation of a square wave from the summation of frequency multiples of a sine wave. In the frequency domain, a square wave can be expressed as

            [0, 1, 0, 1/3, 0, 1/5, 0, 1/7, 0, 1/9, ...]
           

    Figure 7.4.2 shows these frequency multiples of sine waves weighted by their corresponding weighting factors.

    Weighted sine wave components of a square wave

    When those sine waves are all added together they approximate a square wave, as show in figure 7.4.3

    Square wave approximated by the summation of the first few sine wave frequency components

    The more frequency components added together with the appropriate patten of frequency domain coefficients, the closer the result looks to a true square wave. In fact, any abitrary real waveform in time or space is generally represented by an infinite series of frequency domain coefficients.

    The relationship of discretely sampled, limited resolution space/time to a limited number of frequency domain coefficients

    When a waveform is sampled at a fixed number of discrete sampling intervals within a larger waveform interval, the waveform is represented by a fixed set of samples. A limited number of space/time domain samples can be transformed to an equal number of frequency domain coefficients. This is most famously known as the discrete Fourier transform (DFT).

    When space/time domain sampels are represented digitally with a fixed number of bits, the number of bits used for each space/time domain sample corresponds to the number of bits required for each frequency domain coefficient. A specific algorithm for performing such transforms efficiently in digital computers is known as the fast Fourier transform (FFT).

    Notice that the number of bits and the number of samples in the space/time domain corresponds exactly to the number of bits and the number of coefficients in the frequency domain. The transformation from one domain to the other is one-to-one and reversible but does not provide any data reduction or compression.

    Two-dimensional space to two-dimensional frequencies

    The transformation of spatial domain samples to frequency domain coefficients can be extended to two-dimensions. This is most famously accomplished with the two-dimensional discrete cosine transform (DCT), which is reversed with the inverse DCT (IDCT).

    The DCT converts a square matrix of spatial domain data to an equally sized matrix of frequency domain coefficients. The coefficients on the left side of the matrix correspond to low frequencies in the horizontal direction and coefficients on the right side ofthe matrix correspond to high frequencies in the horizontal direction. Similarly, coefficients at the top of the matrix correspond to low frequencies in the vertical direction and coefficients at the bottom of the matrix correspond to high frequencies in the vertical direction.

    A spatial domain matrix with vertical stripes has high frequencies in the horizontal direction and low frequencies in the vertical direction. When transformed by a DCT operation, such a spatial domain matrix will yield a coefficient matrix with values of large magnitude near the top right and small values elsewhere. A spatial domain matrix with horizontal stripes has low frequencies in the horizontal direction and high frequencies in the vertical direction. When transformed by a DCT operation, such a spatial domain matrix will yield a coefficient matrix with values of large magnitude near the bottom left and small values elsewhere. A spatial domain matrix with a flat field of nearly identical values at all positions has low frequencies in the horizontal direction and low frequencies in the vertical direction. When transformed by a DCT operation, such a spatial domain matrix will yield a coefficient matrix with values of large magnitude near the upper left and small values elsewhere. A spatial domain matrix with a checkerboard pattern of high and low values has high frequencies in the horizontal direction and high frequencies in the vertical direction. When transformed by a DCT operation, such a spatial domain matrix will yield a coefficient matrix with values of large magnitude near the upper left and small values elsewhere.

    The matrix of residuals derived in the example shown in figure 7.3.1 roughly has vertical stripes of similar values. The frequency domain coefficients that result from the DCT transform of that matrix are shown in figure 7.4.3.1.

    -2 2 -4 2 -1.000 2.824 7.500 -10.737
    0 1 -6 6 0.079 0.146 -2.263 -0.146
    -1 1 -5 3 -0.500 0.1913 -1 0.4619
    0 0 -6 5 -1.115 0.854 -2.851 0.8536
    Spatial domain residuals Frequency domain coefficients

    Notice that the coefficients in the upper right have larger magnitude than the coefficients elsewhere in the matrix, as expected.

    The prevalence of low frequencies in digital video residuals blocks

    In the coding of most blocks of digital video data, prediction generates a close match to the source data. As a result, residuals blocks are most often most similar to a flat field than to a matrix of stripes or checkers. As a result, the DCT transform most often yields a matrix of frequency domain coefficients with the largest values in the upper left and with much smaller values elsewhere.

    Quantization

    So far, every step of the video coding process that we have discussed can be performed to encode video and reversed to decode a perfect copy of the original video data. We have not thrown away any data or lost any quality. Quantization is the step where we throw away some data to achieve compression.

    The frequency domain coefficients to be encoded, such as those in the example in figure 7.4.3.1, fall within a certain range of values. In the example, the highest value is 7.5 and the lowest value is -10.737. Using signed binary integers, the full range of coefficients can be encoded with 5 bit values. This eliminates 3 unnecessary high order bits from the bitstream, reducing the number of bits in this example by 37.5%.

    Next, by further rounding each coefficient to, for example, the nearest multiple of 4, we remove two more low order bits as shown in figure 7.5.1.

    -1.00 2.824 7.500 -10.737 0 4 8 -12 000 001 010 101
    0.079 0.146 -2.263 -0.146 0 0 -4 0 000 000 111 000
    -0.500 0.1913 -1 0.4619 0 0 0 0 000 000 000 000
    -1.115 0.854 -2.851 0.8536 0 0 -4 0 000 000 111 000
    Frequency domain coefficients Rounded coefficients Quantized binary representation

    This yields a matrix of coefficients with 3 bits per coefficient, which is 62.5% smaller than a matrix of 8-bit video data.

    Copyright © 2005-2006 Jonah Probell. All rights reserved.