Simple Definition of Digital Video
By the simplest definition, digital video is the representation, or encoding, of an analog video signal in digital bits for storage, transmission, and/or display. If you have rented a DVD, watched digital cable, DirecTV, or Dish, or played a video game, then you have experienced digital video.
The Encoding Process
An important component in digital video is the pixel, or picture element, which represents a color in bits. The color is a blend of red, green, and blue components. The number of bits per component is termed the bit depth (usually 8, 10, or 16 bits). More bits allow a more precise representation of each component, and therefore of the blended color. Graphics are usually represented in RGB (red-green-blue) format, and TV video is usually represented as Y’CbCr, where Y’ is the luma (the brightness, often loosely called luminance) and Cb and Cr carry the color (pure color with no brightness).
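The relationship between the two formats can be sketched in a few lines of Python. This is a minimal illustration, assuming 8-bit components, the BT.601 luma weights, and a full-range (JFIF-style) mapping; broadcast video typically uses a narrower "studio" range, and the function name rgb_to_ycbcr is made up for this example.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit R'G'B' to 8-bit full-range Y'CbCr (assumed BT.601 weights)."""
    # Luma: a weighted sum of the three color components.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Chroma: scaled blue and red differences, offset so that "no color" is 128.
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(value):
        # Round to the nearest integer and keep the result in the 8-bit range.
        return max(0, min(255, round(value)))

    return clamp(y), clamp(cb), clamp(cr)


white = rgb_to_ycbcr(255, 255, 255)
gray = rgb_to_ycbcr(128, 128, 128)
```

For any gray input (equal red, green, and blue) both chroma values come out at the neutral midpoint of 128, which shows that Cb and Cr carry color only.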
Video is basically a three-dimensional array of color pixels. Two dimensions serve as the spatial (horizontal and vertical) directions, and one dimension represents the temporal (time) domain. A frame is the set of all pixels that correspond to a single point in time. In other words, a frame is the same as a still picture.
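As a rough sketch of that three-dimensional layout, the Python below stores a tiny video as nested lists indexed by time, row, and column. The helper make_video and the sizes used are purely illustrative; real software would use packed buffers rather than nested lists.

```python
def make_video(num_frames, height, width, fill=(0, 0, 0)):
    """Build a video as video[t][y][x], where each pixel is an (R, G, B) tuple."""
    return [
        [[fill for _ in range(width)] for _ in range(height)]
        for _ in range(num_frames)
    ]


video = make_video(num_frames=3, height=2, width=4)

# Set one pixel to red: frame 1 (time), row 0 (vertical), column 2 (horizontal).
video[1][0][2] = (255, 0, 0)

# A single frame is just a still picture: a 2-D grid of pixels.
frame = video[1]
```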
The number of pixels defines the spatial resolution. Standard-definition television is displayed at 720x480 at 30Hz (720x576 at 25Hz for PAL). High-definition television is usually defined as 1280x720 at 60Hz (720p) or 1920x1080 at 30Hz (1080i). This means that uncompressed standard definition requires 720 * 480 * 30 fps * 24 color bits, or roughly 250Mb/s. HDTV at 1920x1080 is about 6 times more data. Thus, the data must be compressed.
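The arithmetic can be checked with a short Python sketch. The function name raw_bitrate is an assumption for this example, and the figures are for uncompressed video at 24 bits per pixel.

```python
def raw_bitrate(width, height, fps, bits_per_pixel=24):
    """Uncompressed bit rate in bits per second."""
    return width * height * fps * bits_per_pixel


sd = raw_bitrate(720, 480, 30)        # 248,832,000 bits/s, roughly 250 Mb/s
hd_1080 = raw_bitrate(1920, 1080, 30)  # 1,492,992,000 bits/s
hd_720 = raw_bitrate(1280, 720, 60)    # 1,327,104,000 bits/s

print(f"SD:    {sd / 1e6:.1f} Mb/s")
print(f"1080:  {hd_1080 / 1e6:.1f} Mb/s ({hd_1080 / sd:.2f}x SD)")
print(f"720p:  {hd_720 / 1e6:.1f} Mb/s ({hd_720 / sd:.2f}x SD)")
```

The 1920x1080 case works out to exactly 6 times the standard-definition rate, while 1280x720 at 60Hz is a little over 5 times.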
Video sequences contain spatial and temporal redundancy. Similarities can thus be encoded by merely registering differences within a frame (spatial) and/or between frames (temporal). Spatial encoding takes advantage of the fact that the human eye cannot distinguish small differences in color as easily as it can changes in brightness, so very similar areas of color can be "averaged out". With temporal compression, only the changes from one frame to the next are encoded, because often a large number of the pixels will be the same across a series of frames (About video compression).
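The idea behind temporal compression can be sketched in Python as follows. This is a deliberately simplified illustration that stores only the pixels that changed between two frames; real encoders use motion-compensated prediction and entropy coding instead, and the names encode_delta and decode_delta are invented for this example.

```python
def encode_delta(prev_frame, frame):
    """Return (index, new_value) pairs for the pixels that differ from prev_frame."""
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(prev_frame, frame))
        if old != new
    ]


def decode_delta(prev_frame, delta):
    """Rebuild the current frame from the previous frame plus the recorded changes."""
    frame = list(prev_frame)
    for i, value in delta:
        frame[i] = value
    return frame


# Two tiny "frames" (flattened to one row of brightness values).
frame0 = [10, 10, 10, 10, 200, 200]
frame1 = [10, 10, 10, 10, 10, 200]

delta = encode_delta(frame0, frame1)   # only one pixel changed
restored = decode_delta(frame0, delta)
```

Here six pixels are described by a single change record, which is the saving that temporal compression exploits when most of the picture stays the same.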
The Decoding (Displaying) Process
Once the video is created, stored, and transmitted, a computer process must open the encoded video and reconstruct it for display. This process is termed a video decoder. It reads the encoded video file, decompresses it, and displays it.
The video data must be played in the correct order, with little or no packet loss, and with smooth, continuous timing, or essential information will be missing. To ensure video quality, networking companies must provide a good transport mechanism, encoders must produce good picture quality, and the decoder must do a good job of displaying the video and compensating for errors.
Video Encoding Standards
MPEG-1 (ISO/IEC 11172)
The first digital video encoding standard was developed by the Moving Picture Experts Group and was termed MPEG-1. It was adopted in 1992 and provided VHS-quality digital video for CD-ROM playback.
MPEG-1 employs intra-frame spatial compression on redundant color values using discrete cosine transforms (DCTs). The DCT output is then further reduced by quantizing the coefficients (basically reducing their scale). In addition, the 4:4:4 RGB data is converted to 4:2:0 Y’CbCr before the transform, which reduces the amount of color information from an average of 24 bits to 12 bits per pixel.
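Both steps can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it applies a one-dimensional DCT to a single row of 8 samples, whereas MPEG-1 applies a two-dimensional DCT to 8x8 blocks, and it uses one uniform quantizer step instead of a quantization matrix. The function names and the step size of 16 are chosen only for the example.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(
            s * math.cos(math.pi * (i + 0.5) * k / n)
            for i, s in enumerate(samples)
        )
        coeffs.append(scale * total)
    return coeffs


def quantize(coeffs, step):
    """Reduce the scale of the coefficients; small values collapse to zero."""
    return [round(c / step) for c in coeffs]


def bits_per_pixel_420(bit_depth=8):
    """Average bits per pixel when a 2x2 block shares one Cb and one Cr sample."""
    samples_per_block = 4 + 1 + 1  # four Y' samples, one Cb, one Cr
    return samples_per_block * bit_depth / 4


flat_row = [100] * 8  # a row of identical pixels (maximum redundancy)
quantized = quantize(dct_1d(flat_row), step=16)
```

For a perfectly flat row, all of the energy lands in the first (DC) coefficient and the remaining seven quantize to zero, which is what makes the transformed data cheap to store. The 4:2:0 helper returns 12.0 for 8-bit components, matching the 24-to-12-bit reduction described above.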
MPEG-1 relies on prediction, or more precisely motion-compensated prediction, for temporal compression between frames. It uses three frame types to create temporal compression:
• I frames
• B frames
• P frames
An I-frame has no reference to past or future frames. P-frames are forward-predicted frames encoded with reference to a previous I or P frame. B-frames are encoded with reference to both previous and future I or P frames; in MPEG-1, B-frames are not themselves used as references. Because I frames are the largest, using fewer I frames than P and B frames reduces the bit rate even further.
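One consequence of B-frames referring to a future frame is that frames cannot be sent in display order: the decoder needs the later I or P frame first. The Python sketch below reorders a sequence from display order into a possible coding order. The frame labels and the function name coding_order are assumptions made for this illustration, not part of any standard API.

```python
def coding_order(display_order):
    """Move each I or P frame ahead of the B-frames that depend on it."""
    out = []
    pending_b = []
    for frame in display_order:
        if frame.startswith("B"):
            # Hold B-frames until the future reference frame has been emitted.
            pending_b.append(frame)
        else:
            out.append(frame)
            out.extend(pending_b)
            pending_b = []
    # Any trailing B-frames without a later reference are appended as-is
    # (an illustrative simplification).
    out.extend(pending_b)
    return out


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
coded = coding_order(display)
```

With this input, the coded sequence is I0, P3, B1, B2, P6, B4, B5: each pair of B-frames follows the two reference frames it is predicted from.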
MPEG-2 (ISO/IEC 13818-2)
MPEG-2 was designed to encompass, and to be backward compatible with, MPEG-1. It adds support for interlaced video for broadcast TV.
A television broadcast frame is created from two separate fields, a top and a bottom interlaced field, with the first line of the bottom field appearing immediately after the first line of the top field. Thus, 30 frames per second is actually sent as 60 fields per second.
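The relationship between a frame and its two fields can be sketched in Python. This is only an illustration of the line interleaving, with each scan line represented by a placeholder string; the names split_fields and weave are made up for the example.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its top and bottom fields."""
    top = frame[0::2]     # lines 0, 2, 4, ...
    bottom = frame[1::2]  # lines 1, 3, 5, ...
    return top, bottom


def weave(top, bottom):
    """Interleave the two fields back into a full frame."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.extend([top_line, bottom_line])
    if len(top) > len(bottom):
        # A frame with an odd number of lines has one extra top-field line.
        frame.append(top[-1])
    return frame


frame = ["line0", "line1", "line2", "line3"]
top, bottom = split_fields(frame)

frames_per_second = 30
fields_per_second = frames_per_second * 2  # two fields per frame
```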
In addition, MPEG-2 includes improved color-sub-sampling, error correction, and improved audio support.
MPEG-3
MPEG-3 was designed for high-definition television (HDTV), but the MPEG-2 standard scaled to encompass the requirements, so this initiative was withdrawn.
MPEG-4 (ISO/IEC 14496)
MPEG-4 arose from the need to have scalable support for low bit rate applications – streaming over the Internet – and the need for better compression at higher resolutions.

