The principles of digital video coding
Video
Video is an electronic representation of a moving picture. In practice, video is a series of
pictures that gives the illusion of motion when displayed in sequence at short, regular
intervals.
Digital Video
Digital video is a sequence of pictures, each constructed from a two-dimensional array of pixels
(pels), with the color of each pixel represented as a digital value.
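The definition above can be sketched as a small data structure. This is only an illustration: the names `Pixel`, `Frame`, and `blank_frame` are hypothetical, and the choice of 8-bit red, green, and blue values per pixel is an assumption borrowed from the raw representation discussed under Compression.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: each pixel is an (R, G, B) triple of 8-bit values (0..255).
Pixel = Tuple[int, int, int]


@dataclass
class Frame:
    """One picture: a two-dimensional array of pixels."""
    width: int
    height: int
    pixels: List[List[Pixel]]  # indexed as pixels[row][column]


def blank_frame(width: int, height: int, color: Pixel = (0, 0, 0)) -> Frame:
    """Build a frame in which every pixel has the same color."""
    return Frame(width, height, [[color] * width for _ in range(height)])


# A digital video is then simply an ordered sequence of frames.
video = [blank_frame(4, 3) for _ in range(2)]

# Change one pixel in the second frame to pure red.
video[1].pixels[0][0] = (255, 0, 0)
```

Real systems store the samples in packed buffers rather than nested lists, but the indexing idea (frame, then row, then column) is the same.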
Digital Video Systems
The figure below shows the generalized anatomy of a system that encodes digital
video. This could be a professional video camera, a film scanner, a surveillance
camera or a camcorder. Such a system does not necessarily include a mass storage
device and usually outputs data to only one form of transmission or media.
This figure shows the generalized anatomy of a system that decodes digital
video. This could be a DVD player, a cable box, a VCR receiver, a satellite
receiver, a television set or a personal computer. Such a system does not
necessarily include a mass storage device and usually accepts data from only
one form of transmission or media. The system might create video images on any
one of a wide range of display devices.
Compression
Standard D1 resolution NTSC television displays 720 pixels per line and
480 lines per video frame.
| Computer Standard | Resolution  | Aspect Ratio |
| VGA               | 640 x 480   | 4:3          |
| SVGA              | 800 x 600   | 4:3          |
| XGA               | 1024 x 768  | 4:3          |
| workstation       | 1152 x 864  | 4:3          |
| WXGA              | 1280 x 768  | 15:9         |
| SXGA              | 1280 x 1024 | 5:4          |
| UXGA              | 1600 x 1200 | 4:3          |
| QXGA              | 2048 x 1536 | 4:3          |

| Digital Video Standard | Resolution  | Aspect Ratio | Notes             |
| QCIF                   | 176 x 144   | 4:3          |                   |
| SIF                    | 352 x 240   | 4:3          |                   |
| SDTV (D1)              | 720 x 480   | 4:3          | non-square pixels |
| "720p"                 | 1280 x 720  | 16:9         |                   |
| HDTV                   | 1920 x 1080 | 16:9         |                   |

| Digital Film Standard | Resolution  | Aspect Ratio | Notes             |
| Academy standard      | 2048 x 1536 | 4:3          |                   |
| DVD                   | 720 x 480   | 4:3          | non-square pixels |
| Laserdisc             | 560 x 360   | 4:3          | non-square pixels |
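
The "non-square pixels" note in the tables above can be checked with a little arithmetic. The sketch below reduces a resolution to its simplest ratio and compares it with the display aspect ratio. The function names are hypothetical, and the calculation assumes the full stored width maps onto the full displayed picture, which is a simplification.

```python
from fractions import Fraction
from math import gcd


def storage_aspect_ratio(width, height):
    """Reduce width:height to lowest terms, e.g. 640 x 480 -> (4, 3)."""
    d = gcd(width, height)
    return (width // d, height // d)


def pixel_aspect_ratio(width, height, display_w, display_h):
    """Width-to-height ratio of one pixel needed to fill the display shape."""
    return Fraction(display_w, display_h) / Fraction(width, height)


print(storage_aspect_ratio(640, 480))      # (4, 3): square pixels on a 4:3 display
print(storage_aspect_ratio(1280, 768))     # (5, 3), the same ratio as 15:9
print(storage_aspect_ratio(720, 480))      # (3, 2): does not match 4:3
print(pixel_aspect_ratio(720, 480, 4, 3))  # 8/9: each pixel is narrower than it is tall
```

When the stored ratio and the display ratio differ, as for the 720 x 480 entries, the pixels cannot be square.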
Each pixel is redrawn 30 (actually 29.97)
times per second. Standard D1 resolution PAL and SECAM television
displays 720 pixels per line and 576 lines per video frame. Each pixel
is redrawn 25 times per second. For NTSC video, the pixel rate for D1
resolution is 10.4 million pixels per second.
720 pel/line * 480 line/frame * 30 frame/second = 10.4 million pel/second
For PAL and SECAM video, the pixel rate for D1 resolution is also 10.4
million pixels per second.
720 pel/line * 576 line/frame * 25 frame/second = 10.4 million pel/second
Today's high definition television (HDTV), at 1920 x 1080 pixels and 30 frames per second, requires 62.2 million pel/second.
A 2 hour HDTV movie requires drawing 450 billion pixels.
2 hour * 60 minute/hour * 60 second/minute * 62.2 million pel/second = 450 billion pels
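The pixel-rate figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function name `pixel_rate` is hypothetical, NTSC is taken at the nominal 30 frames per second rather than 29.97, and the HDTV case assumes 1920 x 1080 at 30 frames per second, which is what the 62.2 million figure implies.

```python
def pixel_rate(width, height, frames_per_second):
    """Pixels drawn per second for a given frame size and frame rate."""
    return width * height * frames_per_second


ntsc = pixel_rate(720, 480, 30)    # D1 NTSC, nominal 30 frame/second
pal = pixel_rate(720, 576, 25)     # D1 PAL and SECAM
hdtv = pixel_rate(1920, 1080, 30)  # assumed HDTV format

# Total pixels drawn during a 2 hour HDTV movie.
movie_pixels = 2 * 60 * 60 * hdtv

print(ntsc)          # 10368000, about 10.4 million
print(pal)           # 10368000, about 10.4 million
print(hdtv)          # 62208000, about 62.2 million
print(movie_pixels)  # 447897600000, which the text rounds to 450 billion
```

NTSC and PAL arrive at the same rate because PAL's extra lines per frame are offset by its lower frame rate.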
Using a raw representation with 8 bits of red, 8 bits of green and 8 bits
of blue at full resolution, 10.8 trillion bits of data
are required for a movie.
8 red bits/pel
+ 8 green bits/pel
+ 8 blue bits/pel
= 24 sample bits/pel
24 bits/pel * 450 billion pel = 10.8 trillion bits
That means that over 1 terabyte of video data is required to show a
typical movie. A DVD disc holds about 9 gigabytes of data. That means
that the video data needs to be compressed by more than a factor of 150
to fit an HDTV movie onto a DVD disc.
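The storage estimate can be worked through the same way. The sketch below uses the rounded figures from the text (450 billion pixels, 9 gigabytes per disc) and decimal units, so the result is approximate. The variable names are illustrative only.

```python
BITS_PER_PIXEL = 8 + 8 + 8          # red + green + blue samples
MOVIE_PIXELS = 450_000_000_000      # rounded figure from the text
DVD_BYTES = 9 * 10**9               # "about 9 gigabytes", decimal units

raw_bits = BITS_PER_PIXEL * MOVIE_PIXELS
raw_bytes = raw_bits // 8

# How many times smaller the video must become to fit on one disc.
compression_ratio = raw_bytes / DVD_BYTES

print(raw_bits)           # 10800000000000, i.e. 10.8 trillion bits
print(raw_bytes)          # 1350000000000, i.e. 1.35 terabytes
print(compression_ratio)  # 150.0
```

In practice a disc also has to hold audio and other data, so the video itself needs somewhat more compression than this ratio suggests.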
Blocks
This video frame shows many of the considerations
required in digital video encoding and decoding. This is a frame
from the popular television show, The Simpsons. The same
basic steps of digital video encoding and decoding are common to all
major video codecs. They are as follows:
Encoding and decoding each video frame one pixel at a time would
require a lot of unique pieces of information in the coded
bitstream. Since each pixel tends to be similar to its neighbors,
though not necessarily similar to all other pixels in the frame,
video is encoded in blocks of neighboring pixels. Certain steps of digital video coding perform
best on square matrices of data, so video blocks are traditionally square.
For many codecs, square blocks of 8 x 8 pixels are used, as shown below. The choice of an 8 x 8
block size is a good compromise
between the number of unique pieces of information coded and the
amount of commonality between pixels within a block.
Some video coding parameters apply to groups of 2 x 2 blocks, known
as macroblocks. Each macroblock, therefore, represents a 16 x 16
group of pixels.
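As a rough illustration of the partitioning described above, the sketch below cuts a frame into 8 x 8 blocks and 16 x 16 macroblocks. It is not taken from any particular codec. The function name `split_into_blocks` is hypothetical, the frame is a plain list of rows holding one sample per pixel, and the frame dimensions are assumed to be exact multiples of the block size.

```python
BLOCK_SIZE = 8
MACROBLOCK_SIZE = 16  # a 2 x 2 group of 8 x 8 blocks


def split_into_blocks(frame, size):
    """Cut a 2D list of samples into size x size tiles, in raster order."""
    height, width = len(frame), len(frame[0])
    if height % size or width % size:
        raise ValueError("frame dimensions must be multiples of the block size")
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            tile = [row[left:left + size] for row in frame[top:top + size]]
            blocks.append(tile)
    return blocks


# A small 32 x 16 test frame whose sample value is simply x + y.
frame = [[x + y for x in range(32)] for y in range(16)]

blocks = split_into_blocks(frame, BLOCK_SIZE)
macroblocks = split_into_blocks(frame, MACROBLOCK_SIZE)

print(len(blocks))       # 8 blocks of 8 x 8 pixels
print(len(macroblocks))  # 2 macroblocks of 16 x 16 pixels
```

A 720 x 480 frame divides evenly into 45 x 30 macroblocks. Frame sizes that are not multiples of 16 would need padding, which this sketch leaves out.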
The H.264 codec supports variable block sizes, including
blocks that are not square. Blocks can be 16 x 16 pixels, 8 x 16
pixels, 16 x 8 pixels, 8 x 8 pixels, 4 x 8 pixels, 8 x 4 pixels, or
4 x 4 pixels. The size, shape, and
orientation of each block are selected by the encoder in order to
create groups of pixels with a high degree of similarity. Regions
of a video frame that show flat fields will tend to be coded with
large blocks to minimize the number of block parameters coded.
Regions of a frame with great detail will tend to be coded with
smaller blocks in order to increase similarity and improve the
accuracy of the coding.
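The idea of matching block size to picture detail can be sketched with a simple recursive split. This is not how an H.264 encoder actually chooses its partitions. It is a simplified stand-in that uses sample variance as the measure of similarity, handles only square blocks (16 x 16 down to 4 x 4), and relies on a threshold value picked arbitrarily for illustration. The names `variance` and `choose_blocks` are hypothetical.

```python
def variance(frame, top, left, size):
    """Variance of the samples inside one square region of the frame."""
    samples = [frame[y][x]
               for y in range(top, top + size)
               for x in range(left, left + size)]
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


def choose_blocks(frame, top, left, size=16, min_size=4, threshold=100.0):
    """Return (top, left, size) tuples covering the region.

    A region is kept whole if its samples are similar enough (low variance)
    or it is already the smallest allowed size; otherwise it is split into
    four quadrants and each quadrant is examined in turn.
    """
    if size <= min_size or variance(frame, top, left, size) <= threshold:
        return [(top, left, size)]
    half = size // 2
    result = []
    for dy in (0, half):
        for dx in (0, half):
            result.extend(
                choose_blocks(frame, top + dy, left + dx, half, min_size, threshold))
    return result


# A flat field: every sample identical.
flat = [[128] * 16 for _ in range(16)]
# A highly detailed region: a one-pixel checkerboard.
busy = [[255 if (x + y) % 2 else 0 for x in range(16)] for y in range(16)]

print(choose_blocks(flat, 0, 0))       # [(0, 0, 16)]: one large block
print(len(choose_blocks(busy, 0, 0)))  # 16: split all the way down to 4 x 4
```

The flat field stays as a single 16 x 16 block, while the detailed region is broken into sixteen 4 x 4 blocks, mirroring the behavior described above.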
Frame Scaling
Graphics Overlays
2D Graphics
3D Graphics
Stereoscopic (3D) Video
Copyright © 2005-2006 Jonah Probell. All rights reserved.