VIDEO COMPRESSION FUNDAMENTALS AND
STANDARDS
LAMA MAHMOUD
Khalifa University of Science, Technology and Research
Electronics and Computer Engineering Department (ECE)
ELCE 491: Independent Study
Fall Semester / 2013-2014
A report submitted to Dr. Andrzej Sluzek as one of the independent study course
reports
CONTENTS
ABSTRACT
CHAPTER [1]: INTRODUCTION
1.1 An overview of JPEG format for still images
1.2 Theoretical background about video conferencing specifications
CHAPTER [2]: VIDEO COMPRESSION STANDARDS
1. H.261 STANDARD
2. H.263 STANDARD
3. MPEG-1 Format
4. MPEG-2 Format
5. MPEG-4 Format
CHAPTER [3]: BENCHMARKS FOR COMPARISON
1. Average Absolute Difference
2. Mean Square Error (MSE)
3. Signal to Noise Ratio (SNR)
4. Peak Signal to Noise Ratio (PSNR)
CHAPTER [4]: CONCLUSION
LIST OF FIGURES AND TABLES
Figure 1: A block diagram for the JPEG encoder
Figure 2: A block diagram for the JPEG Decoder
Figure 3: An example of the intra-frame and predictive frame coding
Figure 4: Subjective and Objective Benchmarks for comparisons
Table 1: Different applications with most suitable video compression format for each
ABSTRACT
A video can be categorized into two types: a video conference (slow motion) or a high-motion
video. The former is a video with little or no motion; examples include the video captured by a
laptop webcam during a Skype video call and the footage recorded by security cameras. The
latter, a high-motion video, contains intensive motion, as in sports videos. Technically, in video
conferencing applications only a few pixels change from one frame to the next, whereas in
high-motion videos a large number of pixels change between successive frames.
Several video compression formats have been introduced over the past few decades. As
technology has improved, especially with new hardware devices such as HD TV and Internet TV,
higher compression ratios have been demanded to compensate for the limited available
bandwidth. In addition, alongside the newly designed video compression formats, benchmarks
were established to evaluate the quality of the compressed video.
This report describes the resolutions, advantages and disadvantages of the most commonly
used video compression formats, namely H.261, H.263, MPEG-1, MPEG-2 and MPEG-4.
Furthermore, benchmark assessments for determining the quality of the resulting video are
discussed. Finally, the most commonly used video applications are matched with the most
suitable compression format for each.
CHAPTER [1]: INTRODUCTION
1.1 An overview of JPEG format for still images
JPEG is one of the most common formats for compressing still images. In general, the image is
divided into small 8x8 blocks. A discrete cosine transform (DCT) is then computed for each
block, and the resulting DCT coefficients are quantized according to a quantization table. Next,
each 8x8 block of quantized coefficients is converted into a sequence of 64 values (1 DC + 63
AC) by zigzag scanning, and these values are usually coded with an entropy encoder. Figures [1]
and [2] below show the block diagrams of the JPEG encoder and decoder, respectively.
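The encoder steps described above can be sketched in Python for a single 8x8 block. This is a minimal illustration under stated assumptions, not a JPEG codec: it assumes 8-bit grey-level pixels, uses the orthonormal DCT scaling, replaces the JPEG quantization table with one hypothetical uniform step size (`step=16`), and omits the entropy encoder. The function names are illustrative only.

```python
import math

N = 8  # JPEG processes the image in 8x8 blocks


def alpha(k):
    """Orthonormal DCT scaling factor for frequency index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct_2d(block):
    """2-D DCT-II of an N x N block of level-shifted pixel values."""
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


def quantize(coeffs, step=16):
    """Divide every coefficient by one (hypothetical) step size and round."""
    return [[round(c / step) for c in row] for row in coeffs]


def zigzag(block):
    """Read an N x N block in zigzag order: the DC term first, then the 63 AC terms."""
    cells = [(i, j) for i in range(N) for j in range(N)]
    # Visit anti-diagonals i + j = s in turn; odd s runs top-right to bottom-left,
    # even s runs bottom-left to top-right.
    cells.sort(key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))
    return [block[i][j] for i, j in cells]


def encode_block(pixels, step=16):
    """Level-shift 8-bit pixels by 128, then transform, quantize and zigzag-scan."""
    shifted = [[p - 128 for p in row] for row in pixels]
    return zigzag(quantize(dct_2d(shifted), step))


flat = [[200] * N for _ in range(N)]
print(encode_block(flat)[:4])  # [36, 0, 0, 0]: a flat block has only a DC term
```

A flat block shows why the zigzag order helps: all of its energy lands in the DC value, leaving a long run of zero AC values that an entropy encoder can code very cheaply.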
Figure 1: A block diagram for the JPEG encoder
Figure 2: A block diagram for the JPEG Decoder
Furthermore, a color image consists of three primary color components: red, green and blue
(RGB for short). For further compression, and to gain more control over the color proportions,
a still image is usually converted into luminance (i.e. brightness or grey-level information) and
chrominance (i.e. color information). The luminance (Y) is given by the following formula:
Y = 0.3R + 0.59G + 0.11B
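As a small illustration, the luminance formula can be applied to every pixel to obtain the grey-level (Y) image. The sketch assumes the image is stored as rows of 8-bit (R, G, B) tuples; the helper names are hypothetical, and the chrominance components are not computed here.

```python
def luminance(r, g, b):
    """Approximate luminance Y from 8-bit R, G, B values: Y = 0.3R + 0.59G + 0.11B."""
    return 0.3 * r + 0.59 * g + 0.11 * b


def to_grey(image):
    """Convert an image given as rows of (R, G, B) tuples into rows of Y values."""
    return [[luminance(r, g, b) for (r, g, b) in row] for row in image]


print(luminance(255, 0, 0))                     # pure red keeps only the 0.3 * R term
print(to_grey([[(255, 255, 255), (0, 0, 0)]]))  # white and black
```

Because the three weights add up to 1, white maps to the maximum value 255 and black to 0, while green contributes most to the perceived brightness.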
1.2 Theoretical background about video conferencing specifications
Video conferencing applications are not motion-intensive and require only limited motion search
and estimation (Skype video calls are one example). They are optimized to achieve very high
compression ratios for full-color, real-time video transmission. They combine intra-frame (DCT)
coding and inter-frame coding to provide good compression ratios, as follows:
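1. Intra-frame: JPEG compression is used, which is essentially a discrete cosine transform
(DCT). For an N x M block of pixels x(n,m), the DCT coefficients for 0 ≤ k1 ≤ N-1 and
0 ≤ k2 ≤ M-1 are:
$$C(k_1,k_2) = \sum_{n=0}^{N-1}\sum_{m=0}^{M-1} 4\,x(n,m)\cos\left[\frac{\pi k_1(2n+1)}{2N}\right]\cos\left[\frac{\pi k_2(2m+1)}{2M}\right]$$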
2. Inter-frame: A video typically runs at 20-30 frames per second, and in video conferencing
we assume that motion is limited. Therefore, when one frame is compared with the previous
one, only a few pixels have changed. In inter-frame coding, only the changed pixels are sent
from one frame to the next. This is called "predictive inter-frame coding".
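The idea of sending only the changed pixels can be sketched as follows. This is a simplified illustration, not how H.261 or H.263 actually code inter-frames (they use motion-compensated prediction and DCT-coded residuals); the `threshold` parameter and both function names are hypothetical.

```python
def changed_pixels(previous, current, threshold=0):
    """Encoder side: list (row, col, new_value) for pixels that changed by more than threshold."""
    updates = []
    for r, (prev_row, cur_row) in enumerate(zip(previous, current)):
        for c, (p, q) in enumerate(zip(prev_row, cur_row)):
            if abs(q - p) > threshold:
                updates.append((r, c, q))
    return updates


def apply_updates(previous, updates):
    """Decoder side: copy the previous frame and overwrite only the transmitted pixels."""
    frame = [row[:] for row in previous]
    for r, c, value in updates:
        frame[r][c] = value
    return frame


prev = [[10, 10, 10], [10, 10, 10]]
curr = [[10, 12, 10], [10, 10, 40]]
upd = changed_pixels(prev, curr)
print(upd)                               # [(0, 1, 12), (1, 2, 40)]
print(apply_updates(prev, upd) == curr)  # True
```

With `threshold=0` the decoder reconstructs the current frame exactly; a larger threshold sends fewer updates at the cost of small errors, which is the basic trade-off between compression ratio and quality.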
Figure [3] below shows an example of the Intra-frame and the predictive-frame coding.
Figure 3: An example of the intra-frame and predictive frame coding
CHAPTER [2]: VIDEO COMPRESSION STANDARDS
Several video compression techniques have been developed over the past few decades. However,
only some of them are used in real-life applications. In fact, for a video compression technique
to be adopted as a standard by the International Telecommunication Union (ITU for short),
several criteria should be met, as follows:
Interoperability: encoders and decoders from different manufacturers should work together
seamlessly.
Innovation: the standard should perform significantly better than the previous standard.
Competition: the standard should be flexible enough to allow competition between
manufacturers based on technical merit; only the bit-stream syntax and a reference decoder
are standardized.
Independence from transmission and storage media: the standard should be flexible enough to
be used for a range of applications.
Forward compatibility: new decoders should decode bit-streams from the prior standard.
Backward compatibility: prior-generation decoders should be able to partially decode new
bit-streams.
In this section, selected video compression standards are presented along with their
specifications, advantages, disadvantages and most suitable applications.
1. H.261 STANDARD
The H.261 international standard was designed in 1990, mainly for ISDN picture phones and
video conferencing systems. The main characteristics of the H.261 standard are as follows:
1. Quarter Common Intermediate Format (QCIF):
Luminance: 144x176, Chrominance: 72x88
2. Common Intermediate Format (CIF):
Luminance: 288x352, Chrominance: 144x176
3. The chrominance components are subsampled by two in both the vertical and horizontal
directions
4. 8x8 DCT
5. Macroblocks
6. Each macroblock contains 4Y + U + V = 6 blocks, so more blocks carry grey (luminance)
information
7. Group of Blocks (GOB) = 11x3 macroblocks
8. Inter-coded blocks use 16x16 motion-compensated macroblocks
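The frame structure listed above can be checked with a little arithmetic. The sketch below derives, for each H.261 picture format, the chrominance size, the number of 16x16 macroblocks, the number of 11x3 Groups of Blocks and the number of 8x8 blocks; the dictionary and function names are illustrative only.

```python
MB_SIZE = 16          # a macroblock covers 16x16 luminance pixels
GOB_MBS = 11 * 3      # one Group of Blocks = 11 x 3 macroblocks

FORMATS = {"QCIF": (144, 176), "CIF": (288, 352)}  # luminance (rows, columns)


def layout(rows, cols):
    """Return chroma size, macroblock count, GOB count and 8x8 block count for one frame."""
    chroma = (rows // 2, cols // 2)              # chroma subsampled by 2 in both directions
    mbs = (rows // MB_SIZE) * (cols // MB_SIZE)  # whole macroblocks per frame
    gobs = mbs // GOB_MBS
    blocks = mbs * 6                             # 4 Y + 1 U + 1 V blocks per macroblock
    return chroma, mbs, gobs, blocks


for name, (rows, cols) in FORMATS.items():
    print(name, layout(rows, cols))
```

QCIF works out to 99 macroblocks in 3 GOBs and CIF to 396 macroblocks in 12 GOBs, which is why a CIF frame carries four times the data of a QCIF frame.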
Although the H.261 standard requires comparatively low bandwidth, it has some disadvantages:
1. In the past, operators such as Etisalat in the UAE sold H.261 systems as hardware and
software bundled together in one machine. Without that hardware, the technology could not
be used, which is why it was unsuccessful.
2. Limited resolution: at most 288x352 luminance (144x176 chrominance) in CIF
2. H.263 STANDARD
H.263 is an improved standard for low bit-rate communication. Like H.261, it uses transform
coding for intra-frames and predictive coding for inter-frames. Furthermore, H.263 supports the
following resolutions:
1. Sub-QCIF:
Luminance: 96x128, Chrominance: 48x64
2. QCIF:
Luminance: 144x176, Chrominance: 72x88
3. CIF:
Luminance: 288x352, Chrominance: 144x176
4. 4CIF:
Luminance: 576x704, Chrominance: 288x352 (similar to a normal TV picture)
5. 16CIF:
Luminance: 1152x1408, Chrominance: 576x704
6. 8x8 DCT, as used in JPEG
7. Macroblock: 4Y + U + V
8. Motion estimation on 16x16 and 8x8 blocks; the block size varies according to the residual
error to achieve better performance
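Motion estimation and the choice between 16x16 and 8x8 blocks can be sketched with full-search block matching using the sum of absolute differences (SAD) as the residual measure. This is a simplified sketch, not the H.263 encoder: it searches integer-pixel positions only (no half-pixel refinement), and the search range `search=7`, the `margin` rule and all function names are hypothetical.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between the size x size block of cur at (cy, cx)
    and the block of ref at (ry, rx)."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(size) for j in range(size))


def best_motion_vector(cur, ref, cy, cx, size=16, search=7):
    """Full-search block matching over +/- search integer pixels.

    Returns ((dy, dx), cost) for the candidate with the smallest SAD.
    """
    rows, cols = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= rows - size and 0 <= rx <= cols - size:  # stay inside the frame
                cost = sad(cur, ref, cy, cx, ry, rx, size)
                if best is None or cost < best[1]:
                    best = ((dy, dx), cost)
    return best


def choose_mode(cur, ref, cy, cx, search=7, margin=100):
    """Pick one 16x16 vector or four 8x8 vectors for the macroblock at (cy, cx).

    The four 8x8 vectors are used only when they reduce the total residual SAD
    by more than margin (a hypothetical allowance for their extra side information).
    """
    vec16, cost16 = best_motion_vector(cur, ref, cy, cx, 16, search)
    quads = [best_motion_vector(cur, ref, cy + oy, cx + ox, 8, search)
             for oy in (0, 8) for ox in (0, 8)]
    cost8 = sum(cost for _, cost in quads)
    if cost16 - cost8 > margin:
        return "8x8", [vec for vec, _ in quads]
    return "16x16", [vec16]


# Hypothetical 32x32 test frames: cur is ref displaced by 2 rows and 3 columns.
W = 32
ref = [[(7 * x + 13 * y) % 251 for x in range(W)] for y in range(W)]
cur = [[ref[y + 2][x + 3] if y + 2 < W and x + 3 < W else 0 for x in range(W)]
       for y in range(W)]
print(best_motion_vector(cur, ref, 8, 8))  # ((2, 3), 0)
print(choose_mode(cur, ref, 8, 8))         # ('16x16', [(2, 3)])
```

For a purely translational shift one 16x16 vector already gives a zero residual, so the cheaper single-vector mode is kept; 8x8 vectors only pay off when different parts of a macroblock move differently.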
In comparison, the advantages of H.263 include:
1. Unlike H.261, it could be implemented in software only (that is why Skype was more
successful)
2. Improved resolution compared with H.261, but still limited
3. Resolution can be adapted to the available bandwidth (more bandwidth allows more pixels
and more frames, depending on the software and device used, e.g. mobile or laptop).
As mentioned previously, the H.261 and H.263 formats are designed for video conferencing
applications in which only slow motion takes place. On the other hand, the following formats are
used for full-motion video applications, which are more motion-intensive than video
conferencing. Clearly, this type of format requires a higher data rate and more bandwidth,
because the compression ratio is lower.
3. MPEG-1 Format
The Moving Picture Experts Group (MPEG) is a working group of experts formed by ISO and IEC
to set standards for audio and video compression and transmission. The MPEG-1 algorithm is
as follows:
1. Intra-frame coding (I-frames): coded similarly to JPEG (low compression ratio) and used as
random access points.
2. Inter-frame coding (P-frames): predicted frames are coded using forward predictive coding,
where the current frame is coded with reference to a previous frame. The compression ratio
is higher than that of I-frames.
3. B-frames: bi-directional frames are coded using two reference frames, one past frame and
one future frame.
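One practical consequence of B-frames is that frames cannot be transmitted in display order: a B-frame can only be decoded once its future reference has arrived. The sketch below reorders a sequence of frame types from display order into coding (transmission) order; the function name and the example pattern IBBPBBP are illustrative assumptions, and trailing B-frames with no later reference are simply appended as a simplification.

```python
def coding_order(display):
    """Reorder frame types from display order to coding (transmission) order.

    Each B-frame needs its future reference (the next I- or P-frame), so that
    reference is sent before the B-frames that precede it in display order.
    """
    out, pending_b = [], []
    for index, ftype in enumerate(display):
        label = f"{ftype}{index}"
        if ftype == "B":
            pending_b.append(label)  # hold until the next reference frame has been sent
        else:                        # I- or P-frame: a reference frame
            out.append(label)
            out.extend(pending_b)
            pending_b.clear()
    return out + pending_b           # leftover B-frames with no later reference (simplification)


print(coding_order("IBBPBBP"))  # ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

This reordering is also why B-frames add delay: the decoder must hold the future reference in memory before it can display the B-frames that come before it.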
MPEG-1 resolution and coding parameters are as follows:
1. 8x8 DCT
2. 16x16 motion-compensated blocks
3. 4:3 aspect ratio
Disadvantages of MPEG-1:
1. MPEG-1 is used to store video on Video CDs (VCD), which hold about 1 hour per disc, so a
movie needs 2 discs and the viewer has to change discs.
2. It uses two-channel stereo audio, not surround sound (MPEG-1 Audio Layer 1, Layer 2 and
Layer 3, or MP3).
3. Low resolution: 4:3 aspect ratio (normal TV, not HD) with a bit rate of 1 to 1.5 Mbit/s
Advantages of MPEG-1:
1. Bidirectional frame coding
2. Used for full-motion video rather than for video conferencing compression only
3. MPEG-2 outperforms MPEG-1
4. MPEG-2 Format
MPEG-2 is the format on which DVDs are based, so any software DVD player should be able to
play an MPEG-2 movie. However, note that MPEG-2 files are very large, approaching a megabyte
per second of running time.
MPEG-2 resolutions:
1. Supports high resolutions (up to 16383 x 16383)
2. Chrominance sampling:
a. 4:2:0
b. 4:2:2
c. 4:4:4
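The three chrominance sampling schemes differ in how much each chroma plane is subsampled relative to the luminance plane. The sketch below computes plane sizes and total samples per frame for each scheme; the 720x576 frame size is only an illustrative assumption, and the names are hypothetical.

```python
# (horizontal factor, vertical factor) by which each chroma plane is subsampled
SCHEMES = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def plane_sizes(width, height, scheme):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for one frame."""
    fx, fy = SCHEMES[scheme]
    return (width, height), (width // fx, height // fy)


def samples_per_frame(width, height, scheme):
    """Total samples in one frame: one luma plane plus two chroma planes."""
    (lw, lh), (cw, ch) = plane_sizes(width, height, scheme)
    return lw * lh + 2 * cw * ch


for s in SCHEMES:
    print(s, plane_sizes(720, 576, s), samples_per_frame(720, 576, s))
```

Relative to the luminance plane alone, 4:2:0 adds 50% more samples, 4:2:2 doubles the count and 4:4:4 triples it, which is why 4:2:0 is preferred when bandwidth is limited.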
Advantages of MPEG-2:
1. Supports bit rates of 2-10 Mbit/s and 4:3 and 16:9 aspect ratios
2. DVD audio with up to 5.1 channels (surround sound)
Disadvantages of MPEG-2:
1. Pixels are not labeled as belonging to objects, so an object cannot be taken from one frame
and placed in another one.
2. Designed to be played only on a computer
5. MPEG-4 Format
MPEG-4 was originally intended for very low bit rates; however, it went beyond high compression
ratios into areas such as content-based interactivity. MPEG-4 has a special language, MSDL,
which describes how to process the elementary data streams. The data can be MPEG-1 or
MPEG-2 streams, or 2D or 3D synthesized images. High compression ratios are achieved
because missing bits are reconstructed using sets of tools and algorithms.
MPEG-4 features:
1. Universal accessibility
2. The ability to operate in extremely error-prone environments (e.g. mobile systems)
3. The possibility of user interaction when presenting audio and video information
4. Better compression ratio than MPEG – 2
5. Downloading decoding tools
6. Simultaneous use of data from different sources
7. Hybrid coding of natural and synthetic objects
8. Communications between several participants
9. Integration of real time applications and non-real time (stored) applications.
CHAPTER [3]: BENCHMARKS FOR COMPARISON
After a selected video has been compressed, evaluation tools can be used to assess the quality
of the compressed video file. In this chapter, several benchmarks for comparison are discussed,
namely the average absolute difference, MSE, SNR and PSNR (see Figure 4).
Figure 4: Subjective and Objective Benchmarks for comparisons
To begin with, subjective and objective assessments are the two main ways in which a video can
be evaluated. Subjective assessment is based on the ratings given in a survey by a selected group
of people. For example, each distorted data set is rated on the following scale, from best to
worst:
Imperceptible 5
Perceptible, not annoying 4
Slightly annoying 3
Annoying 2
Very annoying 1
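A common way to summarize such a survey is to average the ratings of all viewers into a mean opinion score (MOS). The sketch below assumes each viewer gives one integer rating on the five-grade scale above; the panel ratings are hypothetical and the function name is illustrative.

```python
SCALE = {5: "Imperceptible", 4: "Perceptible, not annoying", 3: "Slightly annoying",
         2: "Annoying", 1: "Very annoying"}


def mean_opinion_score(ratings):
    """Average the 1-5 impairment ratings given by a panel of viewers."""
    if not ratings:
        raise ValueError("no ratings")
    for r in ratings:
        if r not in SCALE:
            raise ValueError(f"rating {r} is not on the 1-5 scale")
    return sum(ratings) / len(ratings)


panel = [5, 4, 4, 3, 4]  # hypothetical ratings from five viewers
score = mean_opinion_score(panel)
print(score, SCALE[round(score)])  # nearest label on the scale
```

Averaging reduces, but does not remove, the dependence on individual opinions, which is why the result can still change from one group of viewers to another.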
As may be noticed, such an assessment is never perfectly accurate, since it depends on people's
opinions, which may differ slightly from one sample to another. Objective assessment, in
contrast, relies on mathematical measures, which give a more accurate evaluation. The following
sections present four objective benchmarks used to evaluate compressed data.
1. Average Absolute Difference
Suppose two images each have N x M pixels, where Pij are the pixels of the original image and
P'ij are the pixels of the modified image. The average absolute difference is then:
$$\mathrm{AAD} = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}\left|P_{ij} - P'_{ij}\right|$$
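The formula translates directly into code. This sketch assumes both images are given as equally sized lists of rows of pixel values; the function name is illustrative, and the sample values are the 2x2 original and modified images used in this chapter's worked example.

```python
def average_absolute_difference(original, modified):
    """AAD = (1 / (N * M)) * sum over all pixels of |P_ij - P'_ij|."""
    n, m = len(original), len(original[0])
    total = sum(abs(p - q)
                for row_p, row_q in zip(original, modified)
                for p, q in zip(row_p, row_q))
    return total / (n * m)


original = [[20, 50], [10, 60]]
modified = [[23, 49], [10, 64]]
print(average_absolute_difference(original, modified))  # 2.0
```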
For example, suppose both the original and the modified images have 2x2 pixels as follows:
Original
20 50
10 60
Modified
23 49
10 64
Then the average absolute difference is:
$$\mathrm{AAD} = \frac{|20-23| + |50-49| + |10-10| + |60-64|}{4} = \frac{3+1+0+4}{4} = 2$$
2. Mean Square Error (MSE)
Similarly, the MSE is given by:
$$\mathrm{MSE} = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}\left(P_{ij} - P'_{ij}\right)^2$$
Based on the previous example, the MSE is:
$$\mathrm{MSE} = \frac{3^2 + 1^2 + 0^2 + 4^2}{4} = \frac{26}{4} = 6.5$$
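The MSE differs from the average absolute difference only in squaring each pixel error, which penalizes large errors more heavily. The sketch assumes the same list-of-rows image layout; the function name is illustrative and the values are those of the worked example.

```python
def mean_squared_error(original, modified):
    """MSE = (1 / (N * M)) * sum over all pixels of (P_ij - P'_ij)^2."""
    n, m = len(original), len(original[0])
    total = sum((p - q) ** 2
                for row_p, row_q in zip(original, modified)
                for p, q in zip(row_p, row_q))
    return total / (n * m)


original = [[20, 50], [10, 60]]
modified = [[23, 49], [10, 64]]
print(mean_squared_error(original, modified))  # 6.5
```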
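3. Signal to Noise Ratio (SNR)
The signal to noise ratio is the energy of the original image divided by the energy of the error:
$$\mathrm{SNR} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} P_{ij}^2}{\sum_{i=1}^{N}\sum_{j=1}^{M}\left(P_{ij} - P'_{ij}\right)^2}$$
Based on the previous example, the SNR is:
$$\mathrm{SNR} = \frac{20^2 + 50^2 + 10^2 + 60^2}{3^2 + 1^2 + 0^2 + 4^2} = \frac{6600}{26} \approx 253.846$$
$$\mathrm{SNR_{dB}} = 10\log_{10}(\mathrm{SNR}) = 10\log_{10}(253.846) \approx 24\ \mathrm{dB}$$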
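The SNR and its decibel form can be computed as below. The sketch uses the same image layout and example values as before; returning infinity for identical images (zero noise) is a design choice of this sketch, not part of the definition above.

```python
import math


def snr(original, modified):
    """Ratio of the original image's energy to the energy of the pixel errors."""
    signal = sum(p * p for row in original for p in row)
    noise = sum((p - q) ** 2
                for row_p, row_q in zip(original, modified)
                for p, q in zip(row_p, row_q))
    if noise == 0:
        return math.inf  # identical images: no error at all
    return signal / noise


original = [[20, 50], [10, 60]]
modified = [[23, 49], [10, 64]]
ratio = snr(original, modified)
print(round(ratio, 3), round(10 * math.log10(ratio), 2))  # 253.846 24.05
```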
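4. Peak Signal to Noise Ratio (PSNR)
The PSNR replaces the energy of the original image with that of an image whose pixels all take
the peak value 255 (the maximum value of an 8-bit pixel):
$$\mathrm{PSNR} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} 255^2}{\sum_{i=1}^{N}\sum_{j=1}^{M}\left(P_{ij} - P'_{ij}\right)^2} = \frac{255^2}{\mathrm{MSE}}$$
Based on the previous example, the PSNR is:
$$\mathrm{PSNR} = \frac{4 \times 255^2}{26} = \frac{255^2}{6.5} \approx 10003.8$$
$$\mathrm{PSNR_{dB}} = 10\log_{10}(\mathrm{PSNR}) = 10\log_{10}(10003.8) \approx 40\ \mathrm{dB}$$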
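Since the PSNR equals 255 squared divided by the MSE, it can be computed from the MSE directly. The sketch assumes 8-bit pixels (`peak=255`) and the same example images; the infinite result for identical images is again a choice of this sketch.

```python
import math


def psnr(original, modified, peak=255):
    """Peak signal to noise ratio: peak^2 / MSE (peak = 255 for 8-bit pixels)."""
    n, m = len(original), len(original[0])
    mse = sum((p - q) ** 2
              for row_p, row_q in zip(original, modified)
              for p, q in zip(row_p, row_q)) / (n * m)
    if mse == 0:
        return math.inf  # identical images
    return peak ** 2 / mse


original = [[20, 50], [10, 60]]
modified = [[23, 49], [10, 64]]
ratio = psnr(original, modified)
print(round(ratio, 1), round(10 * math.log10(ratio), 2))  # 10003.8 40.0
```

Unlike the SNR, the PSNR does not depend on how bright the original image is, which makes it easier to compare results across different images and videos.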
CHAPTER [4]: CONCLUSION
Finally, different video applications require different compression ratios and, hence, different
video compression formats, and some formats outperform others. For example, MPEG-1 and
MPEG-2 differ in the following aspects:
1. MPEG-2 succeeded MPEG-1 to address some of the older standard's weaknesses;
2. MPEG-2 has better quality than MPEG-1;
3. MPEG-1 is used for VCD, while MPEG-2 is used for DVD;
4. MPEG-2 can be considered as MPEG-1 with support for higher resolutions and for higher,
variable bit rates;
5. MPEG-1 is older than MPEG-2 but is arguably better at lower bit rates;
6. MPEG-2 has a more complex encoding algorithm.
The following table presents the different applications and the most suitable compression format
for each application.
Table 1: Different applications with most suitable video compression format for each