video compressiontechniques&standards lamamahmoud_report#2

of 13

VIDEO COMPRESSION FUNDAMENTALS AND STANDARDS

LAMA MAHMOUD

Khalifa University of Science, Technology and Research

Electronics and Computer Engineering Department (ECE)

ELCE 491: Independent Study

Fall Semester / 2013-2014

A report submitted to Dr. Andrzej Sluzek as one of the independent study course reports


CONTENTS

ABSTRACT

CHAPTER [1]: INTRODUCTION

1.1 An overview of JPEG format for still images

1.2 Theoretical background about video conferencing specifications

CHAPTER [2]: VIDEO COMPRESSION STANDARDS

1. H.261 STANDARD

2. H.263 STANDARD

3. MPEG-1 Format

4. MPEG-2 Format

5. MPEG-4 Format

CHAPTER [3]: BENCHMARKS FOR COMPARISONS

1. Average Absolute Difference

2. Mean Square Error (MSE)

3. Signal to Noise Ratio (SNR)

4. Peak Signal to Noise Ratio (PSNR)

CHAPTER [4]: CONCLUSION

LIST OF FIGURES AND TABLES

Figure 1: A block diagram for the JPEG encoder

Figure 2: A block diagram for the JPEG decoder

Figure 3: An example of the intra-frame and predictive frame coding

Figure 4: Subjective and objective benchmarks for comparisons

Table 1: Different applications with the most suitable video compression format for each


ABSTRACT

A video can be categorized into one of two types: a video conference (slow motion) or a high-motion video. The former contains little or no motion; examples include the video captured by a laptop webcam during a Skype video call and the footage recorded by security cameras. The latter, high-motion video, includes content such as sports videos, in which intensive motion is present. Technically, video conferencing applications produce successive frames in which few pixels change from one frame to the next, whereas high-motion videos have many pixels changing between frames.

Several video compression formats have been developed over the past few decades. As technology has improved, especially with newer devices such as HD TV and Internet TV, higher compression ratios have been demanded to compensate for the limited available bandwidth. In addition, alongside the newly designed video compression formats, benchmarks have been defined to evaluate the quality of the compressed video.

This report describes the resolutions, advantages and disadvantages of the most commonly used video compression formats, namely H.261, H.263, MPEG-1, MPEG-2 and MPEG-4. Furthermore, benchmark assessments for determining the quality of the resulting video are discussed. Finally, commonly used video applications are matched with the most suitable compression format for each.


CHAPTER [1]: INTRODUCTION

1.1 An overview of JPEG format for still images

JPEG is one of the standard ways in which a still image can be compressed. In general, the image is divided into small 8x8 blocks. Then, the discrete cosine transform (DCT) of each block is computed, and the DCT coefficients are quantized according to a quantization table. Next, each 8x8 block of quantized coefficients is converted into a sequence of 64 values (1 DC + 63 AC) by zigzag scanning, and an entropy encoder is usually used to code these values. Figures [1] and [2] below present the block diagrams of the JPEG encoder and decoder, respectively.

Figure 1: A block diagram for the JPEG encoder

Figure 2: A block diagram for the JPEG Decoder
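The encoder path described above (8x8 blocks, DCT, quantization, zigzag scan) can be sketched in Python. This is a minimal sketch under stated assumptions, not a conforming JPEG codec: it works on a single block of 8-bit grey pixels, applies the usual level shift of 128, uses the orthonormal DCT scaling of baseline JPEG, replaces the quantization table with one hypothetical step size `step=16`, and leaves out entropy coding. All function names are illustrative.

```python
import math

BLOCK = 8

def dct_8x8(block):
    """2-D DCT-II of an 8x8 block using the JPEG (orthonormal) scaling."""
    out = [[0.0] * BLOCK for _ in range(BLOCK)]
    for u in range(BLOCK):
        for v in range(BLOCK):
            cu = 1 / math.sqrt(2) if u == 0 else 1.0
            cv = 1 / math.sqrt(2) if v == 0 else 1.0
            total = 0.0
            for x in range(BLOCK):
                for y in range(BLOCK):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * cu * cv * total
    return out

def quantize(coeffs, step=16):
    """Uniform quantization with one hypothetical step (real JPEG uses an 8x8 table)."""
    return [[round(c / step) for c in row] for row in coeffs]

def zigzag_order(n=BLOCK):
    """(row, col) visiting order of the zigzag scan: DC first, then the AC terms."""
    order = []
    for s in range(2 * n - 1):               # s = row + col indexes an anti-diagonal
        rows = list(range(max(0, s - n + 1), min(s, n - 1) + 1))
        if s % 2 == 0:
            rows.reverse()                   # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def encode_block(pixels, step=16):
    """Level-shift, transform, quantize and zigzag-scan one 8x8 block."""
    shifted = [[p - 128 for p in row] for row in pixels]
    q = quantize(dct_8x8(shifted), step)
    return [q[r][c] for r, c in zigzag_order()]

coded = encode_block([[100] * 8 for _ in range(8)])
print(coded[:4])
```

For a flat block of value 100, only the DC coefficient survives (-14 with `step=16`) and all 63 AC values are zero. The zigzag scan tends to group the (often zero) high-frequency coefficients at the end of the sequence, which is what makes the subsequent entropy coding effective.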

Furthermore, in image compression, a color image consists of three primary color components: red, green and blue (RGB). For further compression, and to gain more control over the color components, a still image is usually converted into luminance (i.e. grey-level information) and chrominance (i.e. color information). The luminance (Y) is given by the following formula:

Y= 0.3R + 0.59G + 0.11B
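A minimal sketch of the luminance formula, assuming pixels are given as (R, G, B) tuples of 8-bit values; the names `luminance` and `to_grey` are hypothetical. The weights are the report's, and they are close to the ITU-R BT.601 luma weights (0.299, 0.587, 0.114).

```python
def luminance(r, g, b):
    """Luminance from the report's formula Y = 0.3R + 0.59G + 0.11B."""
    return 0.3 * r + 0.59 * g + 0.11 * b

def to_grey(image):
    """Convert an image given as rows of (R, G, B) tuples into rows of Y values."""
    return [[luminance(*px) for px in row] for row in image]

print(to_grey([[(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]]))
```

Because the weights sum to 1, a white pixel keeps Y = 255, and green contributes the most to Y.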


1.2 Theoretical background about video conferencing specifications

Video conferencing applications are not motion-intensive and require only limited motion search and estimation strategies (Skype video calls are an example). Video conferencing techniques are optimized to achieve very high compression ratios for full-color, real-time video transmission. They combine intra-frame (DCT) coding and inter-frame coding to achieve good compression, as follows:

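1. Intra-frame coding: each frame is compressed on its own using JPEG-style coding, which is based on the discrete cosine transform (DCT). For an NxM block x(n,m), the DCT is:

$$C(k_1,k_2) = \sum_{n=0}^{N-1}\sum_{m=0}^{M-1} 4\,x(n,m)\cos\left[\frac{\pi k_1(2n+1)}{2N}\right]\cos\left[\frac{\pi k_2(2m+1)}{2M}\right], \quad 0 \le k_1 \le N-1,\; 0 \le k_2 \le M-1$$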

2. Inter-frame coding: a video typically runs at 20 to 30 frames per second, and in video conferencing the motion is assumed to be limited. Therefore, when a frame is compared with the previous one, only a few pixels have changed, so inter-frame coding sends only the changed pixels from one frame to the next. This is called “predictive inter-frame coding”.
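The idea of sending only the changed pixels can be illustrated with a minimal sketch. It assumes frames are lists of rows of grey values and uses a hypothetical `threshold` parameter. Real codecs such as H.261 instead predict whole blocks with motion compensation and DCT-code the prediction residual, so this is only a simplified model of predictive inter-frame coding.

```python
def encode_changes(prev, curr, threshold=0):
    """Encoder side: list (row, col, new_value) for pixels whose change exceeds threshold."""
    changes = []
    for r, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            if abs(q - p) > threshold:
                changes.append((r, c, q))
    return changes

def apply_changes(prev, changes):
    """Decoder side: copy the previous frame and overwrite the transmitted pixels."""
    frame = [row[:] for row in prev]
    for r, c, value in changes:
        frame[r][c] = value
    return frame

prev = [[10, 10, 10], [10, 10, 10]]
curr = [[10, 12, 10], [10, 10, 50]]
print(encode_changes(prev, curr))               # two changed pixels are sent
print(encode_changes(prev, curr, threshold=5))  # the small change is dropped
```

With `threshold=5` the 2-level change is not transmitted, so the decoded frame differs slightly from the original: a larger threshold sends fewer pixels at the cost of quality.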

Figure [3] below shows an example of the Intra-frame and the predictive-frame coding.

Figure 3: An example of the intra-frame and predictive frame coding

CHAPTER [2]: VIDEO COMPRESSION STANDARDS

Several video compression techniques have been developed over the past few decades; however, only some of them have been used in real-life applications. For a video compression technique to be accepted as a standard by the International Telecommunication Union (ITU), several criteria should be met, as follows:

Interoperability: encoders and decoders from different manufacturers should work together seamlessly.

Innovation: the standard should perform significantly better than the previous standard.

Competition: the standard should be flexible enough to allow competition between manufacturers based on technical merit, so only the bit-stream syntax and the reference decoder are standardized.

Independence from transmission and storage media: the standard should be flexible enough to be used for a range of applications.

Forward compatibility: new decoders should be able to decode bit-streams from the prior standard.

Backward compatibility: prior-generation decoders should be able to partially decode new bit-streams.

In this section, selected video compression standards will be presented along with their

specifications, advantages, disadvantages and most suitable applications.

1. H.261 STANDARD

The H.261 international standard was designed in 1990, mainly for ISDN picture phones and video conferencing systems. The main characteristics of the H.261 standard are as follows:

1. Quarter Common Intermediate Format (QCIF).

Luminance: 144x176, Chrominance: 72x88

2. Common Intermediate Format (CIF).

Luminance: 288x352, Chrominance: 144x176

3. The chrominance components are subsampled by two in both the vertical and horizontal directions.

4. 8x8 DCT.

5. Macroblocks.

6. Each macroblock has 4Y + U + V = 6 blocks, so more samples carry grey (luminance) information than color information.

7. Group of Blocks (GOB) = 11x3 macroblocks.

8. Inter-coded blocks use 16x16 macroblocks for motion compensation.
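The macroblock and GOB sizes listed above determine how a frame is partitioned. The sketch below does the arithmetic, assuming frame dimensions that are exact multiples of 16 (true for QCIF and CIF). It takes width x height (176x144 for QCIF), whereas the list above gives rows first; the function name is illustrative.

```python
MB_SIZE = 16          # a macroblock covers 16x16 luminance pixels
BLOCKS_PER_MB = 6     # 4 Y blocks + 1 U block + 1 V block, each 8x8
GOB_MBS = 11 * 3      # a Group of Blocks is 11x3 macroblocks

def h261_layout(width, height):
    """Macroblocks, 8x8 blocks and GOBs per frame for a luminance width x height."""
    mbs = (width // MB_SIZE) * (height // MB_SIZE)
    return {
        "macroblocks": mbs,
        "blocks": mbs * BLOCKS_PER_MB,
        "gobs": mbs // GOB_MBS,
    }

FORMATS = {"QCIF": (176, 144), "CIF": (352, 288)}
for name, (w, h) in FORMATS.items():
    print(name, h261_layout(w, h))
```

A QCIF frame therefore holds 99 macroblocks in 3 GOBs, and a CIF frame holds 396 macroblocks in 12 GOBs.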

Although the H.261 standard requires a comparatively low bandwidth, it has some disadvantages, as follows:

1. In the past, Etisalat in the UAE, for instance, sold it as hardware and software bundled together in one machine. Without that hardware, the technology could not be used, which is why it was unsuccessful.

2. Limited resolution: at most CIF, i.e. 288x352 luminance and 144x176 chrominance.

2. H.263 STANDARD

H.263 is an improved standard for low-bit-rate video coding. Like H.261, it uses transform coding for intra-frames and predictive coding for inter-frames. Furthermore, H.263 supports the following resolutions and features:


1. Sub QCIF:


2. QCIF

Luminance 144x176 Chrominance 72x88

3. CIF

Luminance 288x352 Chrominance 144x176

4. 4CIF:

Luminance 576x704, Chrominance 288x352 (comparable to normal TV)

5. 16CIF:

Luminance 1152x1408, Chrominance 576x704

6. 8x8 DCT (the same transform used by JPEG)

7. Macroblock: 4Y + U + V

8. Motion estimation on 16x16 or 8x8 blocks, chosen according to the residual error to achieve better performance
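To see why these formats need compression at all, the sketch below estimates the uncompressed bit rate of each CIF-family resolution in the list above. It assumes 8 bits per sample, 30 frames per second and chrominance subsampled by two in both directions; these are illustrative assumptions, not values taken from the standard, and the names are hypothetical.

```python
FRAME_RATE = 30        # assumed frames per second (the report cites 20-30)
BITS_PER_SAMPLE = 8    # assumed 8-bit samples

# Luminance (rows, cols) from the list above; each chrominance plane is half in both directions.
FORMATS = {
    "QCIF": (144, 176),
    "CIF": (288, 352),
    "4CIF": (576, 704),
    "16CIF": (1152, 1408),
}

def raw_bit_rate(rows, cols, fps=FRAME_RATE, bits=BITS_PER_SAMPLE):
    """Uncompressed bits per second of Y + U + V video, chroma subsampled by 2 both ways."""
    luma = rows * cols
    chroma = 2 * (rows // 2) * (cols // 2)   # U and V planes together
    return (luma + chroma) * bits * fps

for name, (rows, cols) in FORMATS.items():
    print(f"{name:>5}: {raw_bit_rate(rows, cols) / 1e6:8.2f} Mbit/s uncompressed")
```

Under these assumptions even QCIF needs about 9.1 Mbit/s uncompressed, far above the low bit rates H.261 and H.263 target, which is why both intra-frame and inter-frame compression are required.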

In comparison, the advantages of H.263 include:

1. Unlike H.261, it could be deployed as software only (which is why Skype was more successful).

2. Higher resolution than H.261, but still limited.

3. The resolution can be adapted to the available bandwidth (more bandwidth allows more pixels and more frames, depending on the software and device used, e.g. mobile or laptop).

As mentioned previously, the H.261 and H.263 formats are designed for video conferencing applications, where only slow motion takes place. The following formats, on the other hand, are used for full-motion video applications, which are more motion-intensive than video conferencing. This type of format requires a higher data rate and more bandwidth because its compression ratio is lower.

3. MPEG-1 Format

The Moving Picture Experts Group (MPEG) is a working group of experts that was formed by ISO and IEC to set standards for audio and video compression and transmission. The MPEG-1 algorithm uses the following frame types:

1. Intra-frame coding (I-frames): coded similarly to JPEG, with low compression ratios; they are used as random-access points.

2. Inter-frame coding (P-frames): predicted frames are coded using forward predictive coding, where the current frame is coded with reference to the previous frame. Their compression ratio is higher than that of I-frames.



3. B-frames: bi-directional frames are coded using two reference frames, a past frame and a future frame.
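Because a B-frame refers to a future frame, the encoder must transmit that future reference first. The sketch below takes a hypothetical frame-type pattern such as IBBPBBP in display order and computes the transmission order. It ignores GOP boundaries, so trailing B-frames, which in practice would wait for the next group's I-frame, are simply appended at the end; the function name is illustrative.

```python
def coding_order(display_types):
    """Reorder frame indices from display order to transmission order.

    A B-frame needs the following I- or P-frame (its future reference), so each
    anchor frame is sent before the B-frames that precede it in display order.
    """
    order, waiting_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            waiting_b.append(index)        # hold until the next anchor frame is sent
        else:                              # "I" or "P": an anchor (reference) frame
            order.append(index)
            order.extend(waiting_b)
            waiting_b = []
    order.extend(waiting_b)                # simplification: trailing B-frames sent last
    return order

gop = "IBBPBBP"
order = coding_order(gop)
print(order, "".join(gop[i] for i in order))
```

For IBBPBBP the frames are sent as IPBBPBB, so the decoder always holds both references before it meets a B-frame.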

MPEG-1 characteristics are as follows:

1. 8x8 DCT

2. 16x16 motion-compensated blocks

3. 4:3 aspect ratio

The disadvantages of MPEG-1:

1. MPEG-1 is used to store video on Video CDs (VCDs), at about 1 hour per disc, so a movie needs 2 discs and the viewer has to change the disc.

2. It uses two-channel stereo audio, not surround sound: MPEG-1 Audio Layer 1, Layer 2 and Layer 3 (MP3).

3. Low resolution: 4:3 aspect ratio (normal TV, not HD) with a bit rate of 1 to 1.5 Mbit/s.

The advantages of MPEG-1:

1. Bidirectional frame coding.

2. It is used for full-motion video rather than only for video conferencing compression.

Nevertheless, MPEG-2 outperforms MPEG-1.

4. MPEG-2 Format

MPEG-2 is the format on which DVDs are based, and any software DVD player should be able to play an MPEG-2 movie. Note, however, that MPEG-2 files are very large, approaching a megabyte per second of running time.
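The file size follows directly from the bit rate: at a constant 8 Mbit/s, one second of video occupies 8/8 = 1 megabyte, which matches "approaching a megabyte per second". The sketch below assumes a constant bit rate and decimal units (1 GB = 1000 MB) and ignores audio and container overhead; the helper names are hypothetical, and the rates tried are taken from the 2 to 10 Mbit/s range this report quotes for MPEG-2.

```python
def megabytes_per_second(bit_rate_mbps):
    """Convert a video bit rate in Mbit/s into megabytes per second of running time."""
    return bit_rate_mbps / 8            # 8 bits per byte

def file_size_gb(bit_rate_mbps, minutes):
    """Approximate file size in gigabytes (10**9 bytes) for a given running time."""
    return megabytes_per_second(bit_rate_mbps) * minutes * 60 / 1000

for rate in (2, 8, 10):
    print(f"{rate:>2} Mbit/s -> {megabytes_per_second(rate):.2f} MB/s, "
          f"{file_size_gb(rate, 120):.1f} GB for a 2-hour film")
```

At 8 Mbit/s a two-hour film comes to about 7.2 GB under these assumptions.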

MPEG-2 resolutions:

1. Supports high resolution (up to 16383x16383)

2. Chrominance sampling:

a. 4:2:0

b. 4:2:2

c. 4:4:4
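The three sampling schemes differ only in how much the two chrominance planes are reduced relative to luminance. The sketch below computes the chrominance plane size and the average number of stored samples per pixel; the 720x576 frame is an assumed example size, and the names are illustrative.

```python
# Chrominance plane size relative to luminance: (horizontal divisor, vertical divisor)
SUBSAMPLING = {
    "4:2:0": (2, 2),   # half width, half height
    "4:2:2": (2, 1),   # half width, full height
    "4:4:4": (1, 1),   # no chroma subsampling
}

def plane_sizes(width, height, scheme):
    """Return the (width, height) of the luminance plane and of each chrominance plane."""
    dx, dy = SUBSAMPLING[scheme]
    return (width, height), (width // dx, height // dy)

def samples_per_pixel(width, height, scheme):
    """Average number of stored samples per pixel: Y plus the two chroma planes."""
    (lw, lh), (cw, ch) = plane_sizes(width, height, scheme)
    return (lw * lh + 2 * cw * ch) / (lw * lh)

for scheme in SUBSAMPLING:
    print(scheme, plane_sizes(720, 576, scheme)[1], samples_per_pixel(720, 576, scheme))
```

4:2:0 stores 1.5 samples per pixel, 4:2:2 stores 2 and 4:4:4 stores 3, so 4:2:0 halves the raw data of 4:4:4.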

Advantages of MPEG-2:

1. Supports bit rates of 2 to 10 Mbit/s and a 4:3 aspect ratio.

2. Supports up to 5.1 audio channels (surround sound) on DVD.


Disadvantages of MPEG-2:

1. Pixels are not labeled as belonging to objects, so an object cannot be taken from one frame and placed in another.

2. Designed to be played only on a computer

5. MPEG-4 Format

MPEG-4 was originally intended for very low bit rates; however, it went beyond high compression ratios into areas such as content-based interactivity. MPEG-4 has a special language, MSDL, which describes how to process the elementary data streams. The data can be MPEG-1 or MPEG-2 streams, or 2D or 3D synthesized images. High compression ratios are achieved because missing bits are reconstructed using sets of tools and algorithms.

MPEG – 4 Features:

1. Universal accessibility

2. The ability to operate in extremely error-prone environments (mobile systems)

3. The possibility of user interaction when presenting audio and video information

4. Better compression ratio than MPEG – 2

5. Downloading decoding tools

6. Simultaneous use of data from different sources

7. Hybrid coding of natural and synthetic objects

8. Communications between several participants

9. Integration of real time applications and non-real time (stored) applications.

CHAPTER [3]: BENCHMARKS FOR COMPARISONS

After a selected video has been compressed, evaluation tools can be used to assess the quality of the compressed video file. In this chapter, several benchmarks for comparison are presented, such as the average absolute difference, MSE, SNR and PSNR (see Figure 4).

Figure 4: Subjective and Objective Benchmarks for comparisons


To begin with, subjective and objective assessments are the two main benchmarks by which a video can be evaluated. Subjective assessment is based on rankings collected in a survey of a selected number of people. For example, each sample of the distorted data set is rated from best to worst as follows:

Imperceptible 5

Perceptible, not annoying 4

Slightly annoying 3

Annoying 2

Very annoying 1
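One common way to summarize such a survey, assumed here rather than prescribed by the report, is to average the ratings into a mean opinion score (MOS). The ratings below are hypothetical, and the function names are illustrative.

```python
SCALE = {
    5: "Imperceptible",
    4: "Perceptible, not annoying",
    3: "Slightly annoying",
    2: "Annoying",
    1: "Very annoying",
}

def mean_opinion_score(ratings):
    """Average the 1-5 ratings given by the survey participants."""
    if not ratings:
        raise ValueError("no ratings collected")
    if any(r not in SCALE for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return sum(ratings) / len(ratings)

def nearest_label(score):
    """Map an average score back to the closest grade on the scale."""
    return SCALE[min(SCALE, key=lambda grade: abs(grade - score))]

ratings = [5, 4, 4, 3, 4]     # hypothetical survey of five viewers
score = mean_opinion_score(ratings)
print(score, nearest_label(score))
```

Here the five viewers give a mean score of 4.0, i.e. "Perceptible, not annoying"; a different group of viewers could well produce a different average.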

Such an assessment is never perfectly accurate, since it depends on people’s opinions, which may differ from one sample to another. Objective assessment, by contrast, relies on mathematical measures, which give a more accurate evaluation. The following sections present the four objective benchmarks used to evaluate the compressed data.

1. Average Absolute Difference

If two images have NxM pixels, where P_ij are the pixels of the original image and P'_ij are the pixels of the modified image, the average absolute difference is:

$$\text{AAD} = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}\left|P_{ij}-P'_{ij}\right|$$

For example, suppose both the original and modified images have 2x2 pixels as follows:

Original:

20 50

10 60

Modified:

23 49

10 64

Then, the average absolute difference is:

$$\text{AAD} = \frac{|20-23|+|50-49|+|10-10|+|60-64|}{4} = \frac{3+1+0+4}{4} = 2$$

2. Mean Square Error (MSE)

Similarly, the MSE is given by:

$$\text{MSE} = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}\left(P_{ij}-P'_{ij}\right)^2$$

Based on the previous example, the MSE is:

$$\text{MSE} = \frac{(20-23)^2+(50-49)^2+(10-10)^2+(60-64)^2}{4} = \frac{9+1+0+16}{4} = 6.5$$
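Both measures are straightforward to compute. The sketch below assumes the two images are lists of rows with identical dimensions and reproduces the worked example; the function names are illustrative.

```python
def average_absolute_difference(original, modified):
    """(1/NM) * sum of |P_ij - P'_ij| over all N x M pixels (same-size images assumed)."""
    diffs = [abs(p - q) for row_p, row_q in zip(original, modified)
             for p, q in zip(row_p, row_q)]
    return sum(diffs) / len(diffs)

def mean_squared_error(original, modified):
    """(1/NM) * sum of (P_ij - P'_ij)^2 over all N x M pixels (same-size images assumed)."""
    diffs = [(p - q) ** 2 for row_p, row_q in zip(original, modified)
             for p, q in zip(row_p, row_q)]
    return sum(diffs) / len(diffs)

original = [[20, 50], [10, 60]]
modified = [[23, 49], [10, 64]]
print(average_absolute_difference(original, modified))  # 2.0
print(mean_squared_error(original, modified))           # 6.5
```

Squaring makes the MSE penalize the single 4-level error far more than the 1-level error, whereas the average absolute difference weights them linearly.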


3. Signal to Noise Ratio (SNR)

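The signal to noise ratio is calculated as follows:

$$\text{SNR} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} P_{ij}^2}{\sum_{i=1}^{N}\sum_{j=1}^{M}\left(P_{ij}-P'_{ij}\right)^2}$$

Based on the previous example, the SNR is:

$$\text{SNR} = \frac{20^2+50^2+10^2+60^2}{(20-23)^2+(50-49)^2+(10-10)^2+(60-64)^2} = \frac{6600}{26} \approx 253.846$$

SNR dB = 10 log10 (SNR) = 10 log10 (253.846) ≈ 24 dB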

4. Peak Signal to Noise Ratio (PSNR)

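$$\text{PSNR} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} 255^2}{\sum_{i=1}^{N}\sum_{j=1}^{M}\left(P_{ij}-P'_{ij}\right)^2} = \frac{NM \cdot 255^2}{\sum_{i=1}^{N}\sum_{j=1}^{M}\left(P_{ij}-P'_{ij}\right)^2}$$

where 255 is the peak value of an 8-bit pixel. Based on the previous example, the PSNR is:

$$\text{PSNR} = \frac{(4)(255^2)}{26} = \frac{260100}{26} \approx 10003.8$$

PSNR dB = 10 log10 (PSNR) = 10 log10 (10003.8) ≈ 40 dB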
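The ratio-based measures can be checked in the same way. In this sketch the peak value of 255 assumes 8-bit pixels, identical images return infinity instead of dividing by zero, and the function names are illustrative. For the example images it gives SNR = 6600/26 ≈ 253.846 (about 24 dB) and PSNR ≈ 10003.8 (about 40 dB).

```python
import math

def squared_error_sum(original, modified):
    """Sum of (P_ij - P'_ij)^2 over all pixels (same-size images assumed)."""
    return sum((p - q) ** 2 for row_p, row_q in zip(original, modified)
               for p, q in zip(row_p, row_q))

def snr(original, modified):
    """Energy of the original image divided by the energy of the error."""
    error = squared_error_sum(original, modified)
    if error == 0:
        return math.inf                  # identical images: no noise at all
    return sum(p ** 2 for row in original for p in row) / error

def psnr(original, modified, peak=255):
    """N*M*peak^2 divided by the error energy; peak=255 assumes 8-bit pixels."""
    error = squared_error_sum(original, modified)
    if error == 0:
        return math.inf
    n_pixels = sum(len(row) for row in original)
    return n_pixels * peak ** 2 / error

def to_db(ratio):
    return 10 * math.log10(ratio)

original = [[20, 50], [10, 60]]
modified = [[23, 49], [10, 64]]
print(f"SNR  = {snr(original, modified):.3f} ({to_db(snr(original, modified)):.1f} dB)")
print(f"PSNR = {psnr(original, modified):.1f} ({to_db(psnr(original, modified)):.1f} dB)")
```

PSNR is the more common choice for comparing codecs because its numerator depends only on the pixel format, not on the brightness of the particular image.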

CHAPTER [4]: CONCLUSION

Finally, different video applications require different compression ratios and, hence, different video compression formats, and some formats outperform others. For example, MPEG-1 and MPEG-2 differ in the following aspects:

1. MPEG-2 succeeded MPEG-1 to address some of the older standard's weaknesses;

2. MPEG-2 has better quality than MPEG-1;

3. MPEG-1 is used for VCD, while MPEG-2 is used for DVD;

4. MPEG-2 can be considered as MPEG-1 with support for higher resolutions and for higher, variable bit rates;

5. MPEG-1 is older than MPEG-2, but it is arguably better at lower bit rates;

6. MPEG-2 has a more complex encoding algorithm.

The following table presents the different applications and the most suitable compression format

for each application.

http://www.winxdvd.com/resource/vcd-svcd.htm


Table 1: Different applications with most suitable video compression format for each

