VideoBits.org
home principles standards vendors publications  
intro color prediction transforms compression filtering streams interlace  
 


Compression

Coefficient reordering and run length encoding

Statistically, the greatest magnitudes of information in video blocks is most often in low frequencies both horizontally and vertically. That means that after quantization, there will most often be a large number of zero coefficients in the lower right corner of the frequency domain matrix and non-zero coefficients in the upper left. The coefficients can typically be highly compressed, in a lossles compression step, by run length encoding. The compression achieve by run length encoding is maximized when long strings of identical values (zeros in the case of digital video frequency domain coefficient blocks) are achieved.

To group the frequency coefficients that have the highest probability of being zero, the coefficients from the matrix are coded in a zig-zag order starting from the low frequency components and ending at the high frequency components. The exact zig-zag pattern varies from one digital video coding standard to another and can even be defined to fit the statistical nature of the video content in some video codecs. The example coefficient block from figure 7.5.1 is shown below in figure 7.6.1 along with the zig-zag ordering of coded coefficients.

Figure 7.6.1: Example quantized coefficients and their coding order

When coded in that order, the example data to be coded is:

000 001 000 000 000 010 101 111 000 000 000 000 000 000 111 000

Next, the data is run length coded by being expressed as the number of zeros followed by a one. To do so, the same data is grouped as:

000001 00000000001 01 01 1 1 1 0000000000000000001 1 1 000

and expressed as the count of zeros in each group.

5 10 1 1 0 0 0 18 0 0

These data run lengths are what is passed to the next stage of compression in order to generate the video bitstream.

Bitstream compression

  • Parameter prediction

    Parameters, such as motion vectors, are predicted in the decoder from those of previously coded neighboring blocks. Only the difference between the predicted and the actual parameter value is coded in the bitstream. Since for most video content neighboring blocks tend to have similar parameters, the differences tend to be small. Because it is possible for the differences to be as large as the parameters, encoding differences instead of the parameters themselves does not give any compression.

  • Statistical (entropy) coding

    Whereas all values for parameters of each video block are almost equally likely, the differences from a previously coded neighboring block is likely to be small. For the blocks that make up moving objects in a video sequence, all tend to have the same or similar motion vectors.

    As with coding parameters such as motion vectors, the likelihood of run lengths of quantized transformed residuals is weighted more towards some values than others. In the example in section 6.6, zero occurs most often, followed by one, followed by larger run lengths.

    Figure 7.7.1: Probabilities of example residual values
    Run Length Probability
    0 5/10
    1 2/10
    5 1/10
    10 1/10
    18 1/10

    Statistical (or entropy) coding takes advantage of the fact that certain values are more likely than others in a sequence of data. In fact, the greater the differences in probabilities between data values, the greater the amount of compression that is generally achievable. Two methods of entropy coding, used by many video codecs today, are Huffman coding and arithmetic coding. Both are described below.

    Compression occurs because the encoder maps each data value to a corresponding symbol such that small symbols are used for common data values and longer symbols for infrequent data values.

    In order to correctly decode symbols, the decoder must know what mapping the encoder used. For some video codecs, the mapping of symbols is defined by the coding standard as a result of research into the statistical probabilities of data values in real video sequences. For other video codecs, the symbol mapping for statistical coding is determined on the fly as a result of other recently coded data. Such a coding scheme is known as context-adaptive.

    Since different types of video material have different properties, context-adaptive compression can give a better compression ratio. The problem with context-adaptive compression is that if a bitstream transmission or storage error occurs, then a single incorrectly decoded piece of data could cause the propabilites measured by the decoder to be different from those used by the encoder. This could cause all subsequent data value to be incorrectly decoded.

  • Huffman Coding

    Many web pages discuss Huffman coding. A particularly good one is http://www.anaesthetist.com/mnm/compress/huffman/

    Huffman coding is used for bitstream compression in MPEG-1 and MPEG-2 video. Aside from video codecs, Huffman coding is the basis for PKZIP and many other lossless computer compression utilities. Huffman coding is also known as variable length coding (VLC) because coded symbols are of variable lengths.

  • CAVLC

    Context adaptive variable length coding (CAVLC) is a context-adaptive variation of Huffman coding. That means that the table of probabilities of each symbol changes depending on what kind of data is coded. In some context-adaptive methods, the probability table might change continuously based on the most recent data that has been coded.

  • CABAC

    Most video compression standards use some form of CAVLC. Some newer standards, in particular some profiles of the H.264, use context-adaptive binary arithmetic coding (CABAC). Arithmetic coding is a form of compression in which a single number with many digits of precision is used to represent a large number of symbols.

    The range from zero to one is divided among the symbols such that the range devoted to each possible symbol value to be coded is proportional to the probability of occurance of the symbol value. The number coded to represent the series of symbols is chosen such that the number falls in the range of the first symbol value. The number is also chosen so that within the range for the first symbol value, the numer also falls within the sub-range appropriate for the second symbol value. Furthermore, the number is chosen sothat within that sub-range, it falls in the sub-sub-range appropriate for the third symbol value, and so forth.

    CABAC is a form of arithmetic coding in which the table of probability for each symbol value varies based on the data that is coded. Like VLC, BAC is a form of lossless compression.

    CABAC compression requires more work in the processor, but generally gives a better compression ratio than CAVLC. It is interesting to note that in CAVLC, the theoretical best compression possible is one single bit per symbol. In CABAC, it is possible to achieve compression in which all coded symbols are represented with fewer bits than the number of symbols. Most high performance video processors perform CABAC in specialized dedicated processing hardware, rather than using software executed on a programmable processor.

Copyright © 2005-2006 Jonah Probell. All rights reserved.