|
Compression
Coefficient reordering and run length encoding
Statistically, the greatest magnitudes of information in video
blocks is most often in low frequencies both horizontally and
vertically. That means that after quantization, there will most
often be a large number of zero coefficients in the lower right
corner of the frequency domain matrix and non-zero coefficients in
the upper left. The coefficients can typically be highly
compressed, in a lossles compression step, by run length encoding.
The compression achieve by run length encoding is maximized when
long strings of identical values (zeros in the case of digital
video frequency domain coefficient blocks) are achieved.
To group the frequency coefficients that have the highest
probability of being zero, the coefficients from the matrix are
coded in a zig-zag order starting from the low frequency
components and ending at the high frequency components. The exact
zig-zag pattern varies from one digital video coding standard to
another and can even be defined to fit the statistical nature of
the video content in some video codecs. The example coefficient
block from figure 7.5.1 is shown below in figure 7.6.1 along with
the zig-zag ordering of coded coefficients.
Figure 7.6.1: Example quantized coefficients and their coding order
|
When coded in that order, the example data to be coded is:
000 001 000 000 000 010 101 111 000 000 000 000 000 000 111 000
Next, the data is run length coded by being expressed as the
number of zeros followed by a one. To do so, the same data is
grouped as:
000001 00000000001 01 01 1 1 1 0000000000000000001 1 1 000
and expressed as the count of zeros in each group.
5 10 1 1 0 0 0 18 0 0
These data run lengths are what is passed to the next stage of
compression in order to generate the video bitstream.
Bitstream compression
-
Parameter prediction
Parameters, such as motion vectors, are predicted in the decoder
from those of previously coded neighboring blocks. Only the
difference between the predicted and the actual parameter value
is coded in the bitstream. Since for most video content
neighboring blocks tend to have similar parameters, the
differences tend to be small. Because it is possible for the
differences to be as large as the parameters, encoding
differences instead of the parameters themselves does not give
any compression.
-
Statistical (entropy) coding
Whereas all values for parameters of each video block are almost
equally likely, the differences from a previously coded
neighboring block is likely to be small. For the blocks that
make up moving objects in a video sequence, all tend to have the
same or similar motion vectors.
As with coding parameters such as motion vectors, the likelihood
of run lengths of quantized transformed residuals is weighted
more towards some values than others. In the example in section
6.6, zero occurs most often, followed by one, followed by larger
run lengths.
Figure 7.7.1: Probabilities of example residual values
|
Run Length
|
Probability
|
| 0 |
5/10 |
| 1 |
2/10 |
| 5 |
1/10 |
| 10 |
1/10 |
| 18 |
1/10 |
Statistical (or entropy) coding takes advantage of the fact that
certain values are more likely than others in a sequence of data.
In fact, the greater the differences in probabilities between
data values, the greater the amount of compression that is
generally achievable. Two methods of entropy coding, used by
many video codecs today, are Huffman coding and arithmetic
coding. Both are described below.
Compression occurs because the encoder maps each data value to a
corresponding symbol such that small symbols are used for common
data values and longer symbols for infrequent data values.
In order to correctly decode symbols, the decoder must know what
mapping the encoder used. For some video codecs, the mapping of
symbols is defined by the coding standard as a result of
research into the statistical probabilities of data values in
real video sequences. For other video codecs, the symbol mapping
for statistical coding is determined on the fly as a result of
other recently coded data. Such a coding scheme is known as
context-adaptive.
Since different types of video material have different
properties, context-adaptive compression can give a better
compression ratio. The problem with context-adaptive compression
is that if a bitstream transmission or storage error occurs,
then a single incorrectly decoded piece of data could cause the
propabilites measured by the decoder to be different from those
used by the encoder. This could cause all subsequent data value
to be incorrectly decoded.
-
Huffman Coding
Many web pages discuss Huffman coding. A particularly good one
is
http://www.anaesthetist.com/mnm/compress/huffman/
Huffman coding is used for bitstream compression in MPEG-1 and
MPEG-2 video. Aside from video codecs, Huffman coding is the
basis for PKZIP and many other lossless computer compression
utilities. Huffman coding is also known as variable length
coding (VLC) because coded symbols are of variable lengths.
-
CAVLC
Context adaptive variable length coding (CAVLC) is a
context-adaptive variation of Huffman coding. That means that
the table of probabilities of each symbol changes depending on
what kind of data is coded. In some context-adaptive methods,
the probability table might change continuously based on the
most recent data that has been coded.
-
CABAC
Most video compression standards use some form of CAVLC. Some
newer standards, in particular some profiles of the H.264, use
context-adaptive binary arithmetic coding (CABAC). Arithmetic
coding is a form of compression in which a single number with
many digits of precision is used to represent a large number of
symbols.
The range from zero to one is divided among the symbols such
that the range devoted to each possible symbol value to be
coded is proportional to the probability of occurance of the
symbol value. The number coded to represent the series of
symbols is chosen such that the number falls in the range of
the first symbol value. The number is also chosen so that
within the range for the first symbol value, the numer also
falls within the sub-range appropriate for the second symbol
value. Furthermore, the number is chosen sothat within that
sub-range, it falls in the sub-sub-range appropriate for the
third symbol value, and so forth.
CABAC is a form of arithmetic coding in which the table of
probability for each symbol value varies based on the data that
is coded. Like VLC, BAC is a form of lossless compression.
CABAC compression requires more work in the processor, but
generally gives a better compression ratio than CAVLC. It is
interesting to note that in CAVLC, the theoretical best
compression possible is one single bit per symbol. In CABAC,
it is possible to achieve compression in which all coded
symbols are represented with fewer bits than the number of symbols.
Most high performance video processors perform CABAC in
specialized dedicated processing hardware, rather than using
software executed on a programmable processor.
|
Copyright © 2005-2006 Jonah Probell. All rights reserved.
|
|