VideoBits.org

The principles of digital video coding

Video

Video is an electronic representation of a moving picture. In practice, video is a series of pictures that can give the illusion of motion when displayed sequentially over intervals of time.

Digital Video

Digital video is a sequence of pictures, each constructed from a two-dimensional array of pixels (pels), with each pixel's color represented as a digital value.
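As a minimal sketch of this definition, the Python below models a picture as a list of rows of pixels and a video as a list of pictures. The (red, green, blue) tuple of 8-bit samples and the helper name make_frame are illustrative assumptions, not something this page prescribes.

```python
def make_frame(width, height, color=(0, 0, 0)):
    """Build one picture: `height` rows, each holding `width` pixels.

    Each pixel is an (r, g, b) tuple of 8-bit samples (an assumed format).
    """
    return [[color for _ in range(width)] for _ in range(height)]


# A tiny 5-frame "video" of 4 x 3 pictures whose red level rises each frame.
video = [make_frame(4, 3, (i * 10, 0, 0)) for i in range(5)]

frame_count = len(video)        # number of pictures in the sequence
lines_per_frame = len(video[0])  # rows in one picture
pels_per_line = len(video[0][0])  # pixels in one row
```

Indexing video[frame][line][pel] then returns the digital color value of a single pixel.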

Digital Video Systems

The figure below shows the generalized anatomy of a system that encodes digital video. This could be a professional video camera, a film scanner, a surveillance camera or a camcorder. Such a system does not necessarily include a mass storage device and usually outputs data to only one form of transmission or media.

This figure shows the generalized anatomy of a system that decodes digital video. This could be a DVD player, a cable box, a VCR receiver, a satellite receiver, a television set or a personal computer. Such a system does not necessarily include a mass storage device and usually accepts data from only one form of transmission or media. The system might create video images on any one of a wide range of display devices.

Compression

Standard D1 resolution NTSC television displays 720 pixels per line and 480 lines per video frame.

Computer Standard        Resolution     Aspect Ratio
VGA                      640 x 480      4:3
SVGA                     800 x 600      4:3
XGA                      1024 x 768     4:3
workstation              1152 x 864     4:3
WXGA                     1280 x 768     15:9
SXGA                     1280 x 1024    5:4
UXGA                     1600 x 1200    4:3
QXGA                     2048 x 1536    4:3

Digital Video Standard   Resolution     Aspect Ratio   Notes
QCIF                     176 x 144      4:3
SIF                      352 x 240      4:3
SDTV (D1)                720 x 480      4:3            non-square pixels
"720p"                   1280 x 720     16:9
HDTV                     1920 x 1080    16:9

Digital Film Standard    Resolution     Aspect Ratio   Notes
Academy standard         2048 x 1536    4:3
DVD                      720 x 480      4:3            non-square pixels
Laserdisc                560 x 360      4:3            non-square pixels

For NTSC video, each pixel is redrawn 30 (actually 29.97) times per second. Standard D1 resolution PAL and SECAM television displays 720 pixels per line and 576 lines per video frame, and each pixel is redrawn 25 times per second. For NTSC video, the pixel rate for D1 resolution is 10.4 million pixels per second.

720 pel/line * 480 line/frame * 30 frame/second = 10.4 million pel/second

For PAL and SECAM video, the pixel rate for D1 resolution is also 10.4 million pixels per second.

720 pel/line * 576 line/frame * 25 frame/second = 10.4 million pel/second

Today's high definition television (HDTV), at 1920 x 1080 pixels and 30 frames per second, requires 62.2 million pel/second.
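The three pixel rates above can be checked with a short calculation. This is a sketch only: the function name pixel_rate is hypothetical, and the HDTV case assumes 1920 x 1080 pixels at 30 frames per second, which is the combination that yields the 62.2 million figure.

```python
def pixel_rate(pels_per_line, lines_per_frame, frames_per_second):
    """Return the number of pixels drawn per second."""
    return pels_per_line * lines_per_frame * frames_per_second


ntsc_d1 = pixel_rate(720, 480, 30)   # nominal 30 frame/second, as in the text
pal_d1 = pixel_rate(720, 576, 25)    # PAL and SECAM
hdtv = pixel_rate(1920, 1080, 30)    # assumed 1080-line HDTV at 30 frame/second

for name, rate in (("NTSC D1", ntsc_d1), ("PAL D1", pal_d1), ("HDTV", hdtv)):
    print(f"{name}: {rate / 1e6:.1f} million pel/second")
```

NTSC and PAL come out identical because PAL's extra lines per frame are exactly offset by its lower frame rate.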

A 2-hour HDTV movie requires drawing about 450 billion pixels.

2 hour * 60 minute/hour * 60 second/minute * 62.2 million pel/second ≈ 450 billion pels

Using a raw representation with 8 bits of red, 8 bits of green and 8 bits of blue at full resolution, 10.8 trillion bits of data are required for a movie.

8 red bits/pel
+ 8 green bits/pel
+ 8 blue bits/pel
= 24 sample bits/pel

24 bits/pel * 450 billion pel = 10.8 trillion bits

That means that over 1 terabyte of video data is required to show a typical movie. A dual-layer DVD holds about 9 gigabytes of data. The video data therefore needs to be compressed by a factor of about 150 to fit an HDTV movie onto a DVD.
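The arithmetic in this section can be reproduced as follows. This sketch uses the unrounded pixel count, so its results differ slightly from the rounded figures in the text. The constant names are illustrative, and the 9 gigabyte DVD capacity is taken as 9 * 10^9 bytes, which is an assumption about how the round figure is meant.

```python
HDTV_PEL_PER_SECOND = 1920 * 1080 * 30   # assumed HDTV raster and frame rate
MOVIE_SECONDS = 2 * 60 * 60              # a 2-hour movie
BITS_PER_PEL = 8 + 8 + 8                 # red + green + blue samples
DVD_BYTES = 9 * 10**9                    # "about 9 gigabytes" (assumed decimal)

total_pels = MOVIE_SECONDS * HDTV_PEL_PER_SECOND
raw_bits = total_pels * BITS_PER_PEL
raw_bytes = raw_bits // 8
compression_factor = raw_bytes / DVD_BYTES

print(f"pixels drawn:       {total_pels / 1e9:.0f} billion")
print(f"raw size:           {raw_bits / 1e12:.2f} trillion bits "
      f"({raw_bytes / 1e12:.2f} terabytes)")
print(f"compression needed: about {compression_factor:.0f} to 1")
```

With unrounded inputs the required factor lands just under 150. Either way, the raw video is roughly two orders of magnitude too large for the disc.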

Blocks

example video frame image

This video frame shows many of the considerations required in digital video encoding and decoding. This is a frame from the popular television show, The Simpsons. The same basic steps of digital video encoding and decoding are common to all major video codecs. They are as follows:

Encoding and decoding each video frame one pixel at a time would require a large number of unique pieces of information in the coded bitstream. Since each pixel tends to be similar to its neighbors, though not necessarily similar to all other pixels in the frame, video is encoded in blocks of neighboring pixels. Certain steps of digital video coding perform best on square matrices of data, so video blocks are traditionally square.

For many codecs, square blocks of 8 x 8 pixels are used, as shown below. The choice of an 8 x 8 block size is a good compromise between the number of unique pieces of information coded and the amount of commonality between pixels within a block.

example video frame image with 8 x 8 pixel blocks
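Dividing a frame into 8 x 8 blocks can be sketched as below. The function name split_into_blocks and the list-of-rows frame layout are assumptions made for illustration. The sketch also assumes the frame dimensions are multiples of the block size, which holds for a 720 x 480 frame.

```python
def split_into_blocks(frame, block_size=8):
    """Yield (block_row, block_col, block) for every square tile of the frame.

    `frame` is a list of rows of samples. Its width and height must be
    multiples of `block_size`; real codecs pad or crop, which is omitted here.
    """
    height, width = len(frame), len(frame[0])
    if height % block_size or width % block_size:
        raise ValueError("frame dimensions must be multiples of block_size")
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            block = [row[left:left + block_size]
                     for row in frame[top:top + block_size]]
            yield top // block_size, left // block_size, block


# A synthetic 720 x 480 frame with one sample per pixel.
frame = [[(x + y) % 256 for x in range(720)] for y in range(480)]
blocks = list(split_into_blocks(frame))
print(len(blocks))  # 90 block columns * 60 block rows
```

Each yielded block is an 8 x 8 list of neighboring samples that later coding steps can treat as one unit.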

Some video coding parameters apply to groups of 2 x 2 blocks, known as macroblocks. Each macroblock, therefore, represents a 16 x 16 group of pixels.

example video frame image with 16 x 16 pixel macroblocks
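The relationship between pixels, 8 x 8 blocks, and 16 x 16 macroblocks comes down to integer division. The helper names locate_pixel and macroblock_grid are hypothetical. Rounding the grid up to cover partial macroblocks at the frame edge is an assumption of this sketch.

```python
BLOCK = 8        # pixels per block side
MACROBLOCK = 16  # pixels per macroblock side (2 x 2 blocks)


def locate_pixel(x, y):
    """Return ((mb_row, mb_col), (blk_row, blk_col)) for pixel (x, y).

    The second pair is the block's position inside its macroblock (0 or 1).
    """
    mb_row, mb_col = y // MACROBLOCK, x // MACROBLOCK
    blk_row = (y % MACROBLOCK) // BLOCK
    blk_col = (x % MACROBLOCK) // BLOCK
    return (mb_row, mb_col), (blk_row, blk_col)


def macroblock_grid(width, height):
    """Return (columns, rows) of macroblocks needed to cover the frame."""
    cols = -(-width // MACROBLOCK)   # ceiling division
    rows = -(-height // MACROBLOCK)
    return cols, rows


print(locate_pixel(25, 9))
print(macroblock_grid(720, 480))
print(macroblock_grid(1920, 1080))
```

For a 720 x 480 frame this gives a grid of 45 x 30 macroblocks. A 1080-line frame does not divide evenly by 16, so the last macroblock row is only partly filled with picture data.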

The H.264 codec supports variable block sizes, including blocks that are not square. Blocks can be 16 x 16 pixels, 8 x 16 pixels, 16 x 8 pixels, 8 x 8 pixels, 4 x 8 pixels, 8 x 4 pixels, or 4 x 4 pixels. The size, shape, and orientation of each block are selected by the encoder in order to create groups of pixels with a high degree of similarity. Regions of a video frame that show flat fields will tend to be coded with large blocks to minimize the number of block parameters coded. Regions of a frame with great detail will tend to be coded with smaller blocks in order to increase similarity and improve the accuracy of the coding.

a square of four macroblocks with variable size blocks
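The idea of using large blocks for flat regions and small blocks for detailed regions can be illustrated with a toy partitioner. This is not how an H.264 encoder actually chooses block sizes: it only considers square 16 x 16, 8 x 8, and 4 x 4 blocks, and it uses sample variance against an arbitrary threshold as a stand-in for similarity. All names and the threshold value are assumptions.

```python
def variance(block):
    """Population variance of all samples in a block (list of rows)."""
    samples = [s for row in block for s in row]
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


def crop(frame, top, left, size):
    """Cut a size x size block out of the frame."""
    return [row[left:left + size] for row in frame[top:top + size]]


def partition(frame, top, left, size=16, threshold=100.0, min_size=4):
    """Return a list of (top, left, size) blocks covering one macroblock.

    A block is split into four quadrants while its variance exceeds the
    threshold and it is still larger than min_size.
    """
    if size <= min_size or variance(crop(frame, top, left, size)) <= threshold:
        return [(top, left, size)]
    half = size // 2
    parts = []
    for dy in (0, half):
        for dx in (0, half):
            parts += partition(frame, top + dy, left + dx,
                               half, threshold, min_size)
    return parts


# A flat macroblock stays whole.
flat = [[128] * 16 for _ in range(16)]
flat_parts = partition(flat, 0, 0)

# A macroblock with a checkerboard in its top-left quadrant is split there.
detailed = [[128] * 16 for _ in range(16)]
for y in range(8):
    for x in range(8):
        detailed[y][x] = 255 if (x + y) % 2 else 0
detailed_parts = partition(detailed, 0, 0)
```

In this example the flat macroblock is kept as a single 16 x 16 block, while the detailed one is divided into four 4 x 4 blocks in the busy quadrant and three 8 x 8 blocks elsewhere.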

Frame Scaling

Graphics Overlays

2D Graphics

3D Graphics

Stereoscopic (3D) Video

Copyright © 2005-2006 Jonah Probell. All rights reserved.