
Image and video compression

If you want to know about image and video compression, you have come to the right page. Here is what image and video compression are and how they work.

Image format

Like any image representation, video is recorded in a format, and photography and cinema use many of them. The format is given by the width/height, or rather horizontal/vertical, ratio of the useful image. From the outset, television adopted the 4:3 format. Its screen, much smaller than a cinema screen and closer to a square, is well suited to live broadcasts and news.

However, advances in video signal processing, together with the spread of larger display devices and of the 16:9 format, are clearly changing the way television pictures are consumed. Live broadcasting, mainly used for news and events, now represents only a small part of airtime. With film viewing in living rooms exceeding that in theaters, and with the arrival of HDTV (high-definition television), the “television” format is widening and slowly relegating the old, small 4:3 window to the antiques department.
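The ratio can be computed directly from a frame's pixel dimensions. The sketch below is a minimal illustration, assuming square pixels; the function name `aspect_ratio` is hypothetical. Standard-definition digital video (720 × 576) uses non-square pixels, so its stored dimensions reduce to 5:4 even though the picture is displayed as 4:3.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> str:
    """Reduce a width/height pair to its simplest horizontal:vertical ratio."""
    d = gcd(width, height)
    return f"{width // d}:{height // d}"


# Example frame sizes, assuming square pixels
for w, h in [(640, 480), (1024, 768), (1280, 720), (1920, 1080)]:
    print(f"{w} x {h} -> {aspect_ratio(w, h)}")
```

The first two sizes reduce to 4:3 and the last two to 16:9.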

Video image

The screen is divided into elementary cells (points of light) called pixels. Each pixel corresponds to an electrical level. Pixels are arranged in lines, and the lines are stacked to form an image. This analysis, called television scanning, proceeds from left to right and from top to bottom.

Following this principle, the camera scans a light-sensitive target (the sensor). It delivers a signal to which the monitor or television is synchronized, so that a proportional light level is produced where the electron beam strikes the emissive surface of the cathode ray tube screen. Unlike photography or film, which record the image physically on a chemical support (film), the video image exists only in time. Indeed, at an infinitely short instant t, an electrical quantity can have one and only one value.
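Scanning turns a two-dimensional picture into a one-dimensional signal that varies in time. The sketch below models only the reading order described above; it ignores interlacing, blanking intervals, and synchronization pulses, and the function names `raster_scan` and `rebuild` are hypothetical.

```python
def raster_scan(image):
    """Turn a 2-D image (a list of lines) into a 1-D signal, in scanning order:
    lines from top to bottom, pixels within each line from left to right."""
    signal = []
    for line in image:          # top to bottom
        for level in line:      # left to right
            signal.append(level)
    return signal


def rebuild(signal, width):
    """Display side: cut the received signal back into lines of `width` pixels."""
    return [signal[i:i + width] for i in range(0, len(signal), width)]


frame = [[10, 20, 30],
         [40, 50, 60]]
signal = raster_scan(frame)
print(signal)                        # [10, 20, 30, 40, 50, 60]
print(rebuild(signal, 3) == frame)   # True
```

In a real signal, knowing where each line starts is the job of synchronization; here the `width` parameter stands in for it.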

MPEG standards

Video compression algorithms other than M-JPEG are particularly numerous and cover a variety of proprietary and open-source formats. Presenting them all here would be tedious and of little interest. Indeed, the various available codecs apply, to varying degrees, the MPEG video compression standards defined by the Moving Picture Experts Group, an ISO working group. It is therefore these standards, and their underlying principles, that we will examine in detail.

A few preliminary remarks are in order. First, these standards go beyond compression alone to tackle other issues, such as how the various objects (sounds and images) that make up a video are synchronized; these issues are equally fascinating but take us away from our subject. Second, they cover a variety of formats that draw on them more or less closely. Finally, each meets a specific set of needs, so they should not be seen as simple successive improvements to an original model.

The MPEG standards that deal, at least in part, with video compression are as follows:

- MPEG-1

- MPEG-2

- MPEG-4


The MPEG-1 standard was created in 1992 to meet the growing need for better storage possibilities linked to the rise of the compact disc. The basic principle of MPEG is that, in addition to removing the spatial redundancy within each image of a video sequence, it also exploits temporal redundancy: it determines which areas of pixels change from one image to the next and which remain the same. The encoder can thus avoid re-encoding, for each image, information identical to that of the previous image, as M-JPEG does. The other side of the coin is that access becomes sequential and the ability to work image by image is lost. This explains why MPEG compression techniques are intended essentially for broadcasting rather than for editing.
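Detecting which areas change between two images can be sketched by comparing frames macroblock by macroblock. This is a simplified illustration, assuming 16 × 16 macroblocks, luminance-only frames, and an arbitrary sum-of-absolute-differences threshold; real encoders go further and search for where each block has moved. The function name `changed_macroblocks` is hypothetical.

```python
def changed_macroblocks(prev, curr, size=16, threshold=0):
    """Return (row, col) indices of size x size macroblocks whose sum of absolute
    differences between two frames exceeds `threshold`.
    Frames are equal-sized lists of rows of luminance values."""
    height, width = len(curr), len(curr[0])
    changed = []
    for by in range(0, height, size):
        for bx in range(0, width, size):
            sad = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(by, min(by + size, height))
                for x in range(bx, min(bx + size, width))
            )
            if sad > threshold:
                changed.append((by // size, bx // size))
    return changed


# Two 32x32 frames: only one pixel in the top-right macroblock changes
prev = [[0] * 32 for _ in range(32)]
curr = [row[:] for row in prev]
curr[4][20] = 255
print(changed_macroblocks(prev, curr))  # [(0, 1)]
```

Only the macroblocks in that list would need new information; the other three could simply be repeated from the previous image.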

Temporal compression is based on defining three types of images within a video sequence:

- Key images (also called I-frames) are compressed individually, independently of the other images in the sequence. Spatial redundancy within the image is eliminated by applying a discrete cosine transform (DCT) to the 8 × 8 pixel blocks that make it up, followed by quantization and then run-length (RLE) coding.

- Predicted images (also called P-frames) are compressed using motion compensation, which aims to eliminate the temporal redundancy between the predicted image and the preceding reference image (an I-frame or P-frame). Each macroblock of the predicted frame is coded differently depending on whether or not it matches content in the reference frame. Macroblocks that differ significantly are encoded just like macroblocks in a keyframe, while those that are identical or nearly identical to a region of the previous frame are encoded by reference, as a motion vector plus a small residual difference.

- Bidirectional images (also called B-frames) are compressed in the same way as predicted images, except that their macroblocks can be encoded by reference to a preceding frame, a following frame, or both. The following images must therefore already be in memory before the bidirectional image can be reconstructed. Because very few macroblocks can be referenced neither by a preceding nor by a following image, B-frames yield considerable space savings.
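Motion compensation depends on finding, for each macroblock, where similar content sits in the reference frame. The sketch below uses exhaustive block matching with the sum of absolute differences (SAD), a common textbook approach. The MPEG standards specify how motion vectors are decoded, not how an encoder searches for them, so the search strategy, the ±4 pixel range, and the function names are assumptions.

```python
def sad(ref, cur, ry, rx, cy, cx, n):
    """Sum of absolute differences between the n x n block of `cur` at (cy, cx)
    and the n x n block of `ref` at (ry, rx)."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))


def motion_search(ref, cur, cy, cx, n=16, search=4):
    """Full search: try every displacement (dy, dx) within +/- `search` pixels
    and return the motion vector with the lowest SAD, together with that cost."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(ref, cur, cy, cx, cy, cx, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= h - n and 0 <= rx <= w - n:  # block must fit in the frame
                cost = sad(ref, cur, ry, rx, cy, cx, n)
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best, best_cost


# Reference frame: a 4x4 bright square at rows 5-8, columns 5-8 of a 16x16 frame
ref = [[0] * 16 for _ in range(16)]
for y in range(5, 9):
    for x in range(5, 9):
        ref[y][x] = 200

# Current frame: the same square moved 1 pixel down and 2 pixels right
cur = [[0] * 16 for _ in range(16)]
for y in range(6, 10):
    for x in range(7, 11):
        cur[y][x] = 200

print(motion_search(ref, cur, 6, 7, n=4))  # ((-1, -2), 0)
```

A zero cost means the block can be sent as a vector alone, with no residual. For a B-frame, an encoder could run the same search against both the previous and the next reference frame and keep whichever prediction, or average of the two, costs less.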


MPEG-2 is not, strictly speaking, a replacement for MPEG-1; it was designed to meet different bit-rate needs. It supports interlaced video, in which each frame consists of two fields rather than being a simple succession of images. Motion compensation can therefore also work on 16 × 8 pixel blocks (one field's share of a 16 × 16 macroblock), which is better suited to interlaced, higher-bit-rate material. MPEG-2 is mainly used in digital television, as well as in DVD-Video and SVCD.
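Interlacing splits each frame into two fields captured at different instants. The sketch below shows how the lines of a frame, or of a single 16 × 16 macroblock, divide into two fields of eight lines each, which is where 16 × 8 field blocks come from. It assumes 0-based line numbering with even lines forming the top field; the function names are hypothetical.

```python
def split_fields(frame):
    """Split an interlaced frame (a list of lines) into its two fields:
    lines 0, 2, 4, ... form the top field; lines 1, 3, 5, ... the bottom field."""
    return frame[0::2], frame[1::2]


def weave(top, bottom):
    """Interleave two fields back into a full frame (assumes equal line counts)."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.extend([top_line, bottom_line])
    return frame


# A 16 x 16 macroblock whose lines are labelled with their line number
macroblock = [[line] * 16 for line in range(16)]
top, bottom = split_fields(macroblock)
print(len(top), "lines of", len(top[0]), "pixels per field")  # 8 lines of 16 pixels per field
print(weave(top, bottom) == macroblock)                       # True
```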

MPEG-4:

The MPEG-4 standard covers many aspects that go far beyond compression alone. Two parts of the standard are specifically devoted to video coding: Part 2 (whose best-known profile is ASP, Advanced Simple Profile) and Part 10 (AVC, Advanced Video Coding, also known as H.264, at the heart of Apple's QuickTime 7). Building on earlier techniques, MPEG-4 achieves very high compression rates, its main advantage being that good-quality video can be transferred at low bit rates. ASP underlies popular codecs such as DivX and the free Xvid, which offer remarkable image quality at high compression rates. AVC significantly improves motion compensation by allowing up to 16 reference frames and by using blocks of variable sizes (from 16 × 16 down to 4 × 4).

- JPEG (Joint Photographic Experts Group) divides the image into elementary 8 × 8 blocks (matrices) before compressing it. This compression is dedicated to still images and is used mainly for photos in DTP (desktop publishing).

- M-JPEG, or Motion JPEG, is an extension of JPEG to moving image sequences. It is used in amateur camcorders and, with lower compression rates, in professional camcorders, non-linear editing systems (virtual editing stations), and video servers in broadcast playout control rooms. The same family of intra-frame codecs includes, among others, the DV formats (DV25, DV50, and DV100) as well as DVCPRO and DVCAM.

- DivX is a derivative of MPEG-4. It became very popular because it can fit a complete DVD-Video film onto a single CD-ROM while keeping good image quality, which earned it a reputation as a piracy tool for illegally copying DVD-Video discs. Building on this success, DivX is now a licensed format marketed by DivX Networks.

- MPEG-7 (standardized in 2001). This standard proposes a format for exchanging multimedia documents that describes their content and interactions in order to make searching and filtering information easier (for example, while viewing a sequence, the user could "click" on an object to manipulate it, obtain a text file, and so on). MPEG-7 is formally defined as a multimedia content description interface. Still little used, it enables new applications by integrating metadata synchronized with the audiovisual stream. It thus becomes possible to envisage “searchable” videos in which the main objects and characters appearing in the image are described, so that queries such as “Display all sequences in which this character appears” can be carried out.
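The kind of query described for MPEG-7 can be pictured with a simple time-coded index. The sketch below is a hypothetical in-memory model, not the MPEG-7 description schema (which is XML-based); the `Segment` fields, character names, and timings are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A time-coded video segment with descriptive metadata (hypothetical schema)."""
    start: float                                   # seconds from the beginning
    end: float
    characters: set[str] = field(default_factory=set)
    objects: set[str] = field(default_factory=set)


def sequences_with(index, character):
    """Answer a query such as 'display all sequences in which <character> appears'."""
    return [(seg.start, seg.end) for seg in index if character in seg.characters]


index = [
    Segment(0.0, 12.5, {"Anna"}, {"car"}),
    Segment(12.5, 40.0, {"Anna", "Paul"}, {"telephone"}),
    Segment(40.0, 55.0, {"Paul"}),
]
print(sequences_with(index, "Paul"))  # [(12.5, 40.0), (40.0, 55.0)]
```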

Network/Internet Broadcasting Standards:

In this area, the format war is in full swing. Users must keep their computers constantly updated, adding the codecs needed to play the streams and files available on networks. The main formats are:

- MPEG-4 (ISO standardization);

- Windows Media (Microsoft);

- QuickTime (Apple);

- RealVideo (Real Networks);

- DivX (DivX Networks).

Three primary colors, that is, pure or monochromatic light emissions, mixed together can reproduce a colored impression close to what we see in reality. Color synthesis for video is based on additive mixing: the device emits colored light, and mixing the primaries gives white. Printing works the opposite way: it relies on subtractive mixing, in which mixing the primaries gives black. The three primaries chosen for video and, by extension, for computer displays are red, green, and blue. They are obtained by filtering when the image is captured in the camera.
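Additive and subtractive synthesis can be contrasted in a few lines. The sketch below is idealized: lights are 8-bit RGB triples that add channel by channel, and the printing primaries (cyan, magenta, yellow) are modeled as perfect filters whose per-channel transmittances multiply; real phosphors and inks are far less clean. The function names are hypothetical.

```python
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
# Ideal inks, given as the fraction of red, green and blue light each lets through
CYAN, MAGENTA, YELLOW = (0.0, 1.0, 1.0), (1.0, 0.0, 1.0), (1.0, 1.0, 0.0)


def additive_mix(*lights):
    """Screens: light intensities add channel by channel (clipped at 255)."""
    return tuple(min(255, sum(channel)) for channel in zip(*lights))


def subtractive_mix(*inks):
    """Printing: each ink filters the light reflected by white paper,
    so transmittances multiply channel by channel."""
    reflected = [1.0, 1.0, 1.0]  # white paper reflects all three primaries
    for ink in inks:
        reflected = [r * t for r, t in zip(reflected, ink)]
    return tuple(reflected)


print(additive_mix(RED, GREEN, BLUE))          # (255, 255, 255): white
print(additive_mix(RED, GREEN))                # (255, 255, 0): yellow
print(subtractive_mix(CYAN, MAGENTA, YELLOW))  # (0.0, 0.0, 0.0): black
```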

Schematically, a color camera behaves like three black-and-white cameras, one per primary, operating in parallel. At the output, the three signals, called components, take up three times as much space as a black-and-white signal alone.

On playback, a color cathode ray tube generates three electron beams, each responsible for the light of its primary color. The emissive surface of the tube reproduces a matrix of small dots (round or oblong) alternately red, green, and blue. Each red, green, and blue triplet displays the color of one pixel of the image.

Active-matrix LCD screens and plasma screens also use red, green, and blue triplets.

2 - Digitization and digital compression

Digitizing a video signal consists of representing it as a series of binary numbers. This has many advantages. Unlike analog recording systems, which lose information and add noise with each copy, a digital system such as a hard drive has no theoretical limit on the number of copy generations. Information in numerical form offers almost unlimited possibilities for processing by calculation, making possible effects that have no analog equivalent. Finally, storage on computer media (hard disks or RAM) makes access easier and faster: moving from one sequence to another is fast enough to allow, for example, virtual editing applications.

However, the switch to digital does not bring only advantages. Digital video is very data-intensive: maintaining high quality means conveying and storing considerable volumes of information. In the professional 4:2:2 standard defined in 1982, also called D1 (for "digital one"), the video image requires a net bit rate of about 166 megabits/s (some 166 million "0"s and "1"s per second). Each image, at 720 × 576 resolution in 16.7 million colors, "weighs" a little over one megabyte. Under these conditions, and even ignoring access times and transfer speeds, a 4.7-gigabyte DVD could hold fewer than four minutes (roughly 227 seconds) of video. The data volumes and processing speeds involved clearly limit this standard to professional equipment.
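These figures can be checked with a few lines of arithmetic. The sketch assumes the 720 × 576 active picture at 25 images per second, 8 bits per sample, and 4:2:2 sampling (each chroma component at half the horizontal luminance resolution); blanking intervals and the gross interface rate are ignored.

```python
# Active picture of a 625-line 4:2:2 signal, 8 bits per sample
WIDTH, HEIGHT, FPS, BITS = 720, 576, 25, 8

luma = WIDTH * HEIGHT                  # Y: one sample per pixel
chroma = 2 * (WIDTH // 2) * HEIGHT     # Cb + Cr: half horizontal resolution each
bits_per_frame = (luma + chroma) * BITS
bitrate = bits_per_frame * FPS

rgb_frame_bytes = WIDTH * HEIGHT * 3   # 24-bit RGB ("16.7 million colors")
dvd_bytes = 4.7e9
seconds_on_dvd = dvd_bytes * 8 / bitrate

print(f"4:2:2 bit rate: {bitrate / 1e6:.1f} Mbit/s")                  # 165.9 Mbit/s
print(f"24-bit RGB frame: {rgb_frame_bytes / 1e6:.2f} MB")            # 1.24 MB
print(f"Uncompressed video on a 4.7 GB DVD: {seconds_on_dvd:.0f} s")  # 227 s
```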

Compression, or bit-rate reduction, is therefore required. It consists of transmitting or storing data with the smallest possible footprint or bandwidth by eliminating information from the digital sequences and from the signal they represent.

The compression techniques used in the audiovisual sector permanently delete part of the original signal. They are lossy (destructive): after compression, the original signal can no longer be reconstructed identically. However, current compression algorithms aim to remove only the "least important" information, based on an analysis of human perception. The first level of compression reduces the bandwidth of each image by removing details and reducing color information (to which the human eye is less sensitive). This is called intra-image compression. It relies on reducing, or decimating, the number of useful pixels and lines. The "reduced" image thus obtained is cut into blocks (generally 8 × 8 pixels), to which mathematical algorithms of the DCT (discrete cosine transform) type are applied, the same kind used to compress still images in JPEG format.
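The intra-image steps can be sketched on a single 8 × 8 block: level shift, two-dimensional DCT, quantization, zigzag reordering, and run-length coding. This is a minimal illustration, assuming a single uniform quantization step of 16 instead of the per-frequency quantization matrices real codecs use, and omitting the final entropy-coding stage; all function names are hypothetical.

```python
import math

N = 8  # block size used by JPEG and MPEG intra coding


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (rows indexed by x, columns by y)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, step=16):
    """Divide every coefficient by one step size and round (the lossy step)."""
    return [[round(value / step) for value in row] for row in coeffs]


def zigzag(block):
    """Read coefficients along anti-diagonals, from low to high frequency."""
    cells = [(x, y) for x in range(N) for y in range(N)]
    # Odd diagonals run downward (row increasing), even ones upward (column increasing)
    cells.sort(key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[x][y] for x, y in cells]


def run_length(values):
    """Encode as (zeros skipped, nonzero value) pairs, ending with an end-of-block marker."""
    pairs, zeros = [], 0
    for value in values:
        if value == 0:
            zeros += 1
        else:
            pairs.append((zeros, value))
            zeros = 0
    pairs.append("EOB")  # trailing zeros are implied
    return pairs


# A smooth 8x8 block: brightness rises gently from left to right
pixels = [[128 + 3 * y for y in range(N)] for _ in range(N)]
shifted = [[p - 128 for p in row] for row in pixels]  # center values on zero, as JPEG does
print(run_length(zigzag(quantize(dct2(shifted)))))  # [(0, 5), (0, -3), 'EOB']
```

Because the block varies smoothly, almost all of its energy lands in the first two coefficients; after quantization the other 62 are zero, and the whole block collapses to two pairs and an end-of-block marker.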

However, it is with the second level of compression, temporal compression, that the most spectacular gains are obtained. Within a video sequence, successive images differ only slightly, so coding only these differences, which is what really matters, saves considerable space. After compression, sequences have a GOP (group of pictures) structure made of intra images, called pivot images, which can be decoded independently of the surrounding images, and difference images, which exist only with reference to adjacent images. MPEG-1, one of the earliest digital compression standards, typically uses 12-image GOPs, that is, about two pivot images per second of video at 25 images per second. Some encoding formats that seek very high compression rates use much longer GOPs (20 to 50 images).
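A GOP's structure, and the reordering that B-frames force on the encoder, can be sketched as follows. The 12-image pattern with a P-frame every third image (IBBPBBPBBPBB) is a common choice assumed here for illustration, not a requirement of the standard; the function names are hypothetical.

```python
def gop_pattern(length=12, anchor_spacing=3):
    """Display-order frame types: an I-frame first, then a P-frame every
    `anchor_spacing` frames, with B-frames in between (e.g. IBBPBBPBBPBB)."""
    types = []
    for i in range(length):
        if i == 0:
            types.append("I")
        elif i % anchor_spacing == 0:
            types.append("P")
        else:
            types.append("B")
    return types


def coding_order(types):
    """Reorder frame indices for transmission. A B-frame needs the *next* I- or
    P-frame, so each anchor is sent before the B-frames displayed ahead of it."""
    order, waiting_b = [], []
    for index, kind in enumerate(types):
        if kind == "B":
            waiting_b.append(index)
        else:
            order.append(index)       # the anchor goes first...
            order.extend(waiting_b)   # ...then the B-frames that depend on it
            waiting_b = []
    # Trailing B-frames really wait for the next GOP's I-frame; here they are simply appended
    order.extend(waiting_b)
    return order


gop = gop_pattern()
print("".join(gop))       # IBBPBBPBBPBB
print(coding_order(gop))  # [0, 3, 1, 2, 6, 4, 5, 9, 7, 8, 10, 11]
print(f"{25 / len(gop):.1f} pivot images per second at 25 images/s")  # 2.1 pivot images per second at 25 images/s
```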

Many other mathematical parameters and algorithms (wavelet transforms, for example) can come into play during video compression. However, there is always a direct relationship between the final quality and the compression rate applied. The defects generated are numerous (aliasing, pixelation, block or mosaic effects, jerkiness, etc.). The compression rates applied to the image can be very high (ratios greater than one thousand). In general, the higher the compression while preserving good quality, the heavier and more complex the calculations. However, algorithms (mathematical processes) keep improving, and so does machine power, and the subjective quality of the restored images improves accordingly.
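The trade-off between compression and quality can be illustrated with uniform quantization and peak signal-to-noise ratio (PSNR), a common objective quality measure that is not itself discussed above. The sketch quantizes raw sample values directly, whereas real codecs quantize transform coefficients; the function names are hypothetical.

```python
import math


def quantize_signal(samples, step):
    """Uniform quantization: a coarser step keeps fewer distinct levels (fewer bits).
    Results are clipped to the 8-bit maximum of 255."""
    return [min(255, step * round(s / step)) for s in samples]


def psnr(original, restored, peak=255):
    """Peak signal-to-noise ratio in dB; higher means closer to the original."""
    mse = sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


samples = list(range(256))  # one sample for every 8-bit level
for step in (1, 4, 16, 64):
    restored = quantize_signal(samples, step)
    levels = len(set(restored))
    print(f"step {step:>2}: {levels:>3} levels, PSNR {psnr(samples, restored):.1f} dB")
```

As the step grows, fewer levels remain to be coded, and PSNR falls: the same direct relationship between compression rate and final quality described above.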
