Transcoding of MPEG-4
Compressed Video Over
Error-Prone Channels
Aneela Jahan Zaib
A thesis submitted in conformity with the requirements
for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
© 2001 Copyright by Aneela Jahan Zaib
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services, 395 Wellington Street, Ottawa ON K1A 0N4, Canada
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Transcoding of MPEG-4 Compressed Video Over
Error-Prone Channels
Aneela Jahan Zaib
A thesis submitted in conformity with the requirements for the Degree of Master of
Applied Science, Graduate Department of Electrical and Computer Engineering, in
the University of Toronto, 2001
Abstract
This thesis considers the performance of MPEG-4 compressed video over noisy
channels. It is a design project that proposes a synchronization technique for
improving the resilience of MPEG-4 video to transmission errors without adding
any extra redundancy to the bitstream. Errors on noisy transmission channels
cause a loss of synchronization between encoder and decoder. The proposed scheme
transcodes the MPEG-4 compressed video bitstream into an error resilient structure
for transmission over noisy channels. The Resynchronization Markers traditionally
used in MPEG-4 for resynchronization are quite long and hence cause considerable
overhead. Furthermore, in the event of a transmission error, all the data between
two consecutive Resynchronization Markers must be discarded. The transcoding
scheme proposed in this thesis avoids these long Resynchronization Markers
and instead achieves resynchronization with minimal overhead using a technique
called Error Resilient Entropy Coding (EREC), which also provides enhanced resilience
to transmission errors. The proposed scheme is standard compatible and can be
implemented without any change to current codecs.
Acknowledgements
I would like to thank Prof. Anastasios N. Venetsanopoulos and Prof. Kostas
Plataniotis, my research supervisors, for their guidance throughout the course of my
graduate studies. I am extremely grateful to my family and friends, and all at the
Communications Group, who have been so generous in their support. I would like to
thank my husband, Jahan Zaib Ali, for his infinite love and never ending support.
It was not possible for me to complete this research work successfully without his
encouragement. Finally, I thank my precious son, Zoraiz Jahan, who put a great deal
of tolerance in me and provided a very pleasant diversion in some critical times.
Contents
List of Figures
1 Introduction
1.1 History of Video Compression
1.2 Video Compression
1.2.1 Effect of Transmission Errors
1.2.2 Traditional Approaches to deal with Transmission errors
1.2.3 Error Resilient Approach
1.3 Contribution of the Thesis
1.4 Thesis Organization
2 Image and Video Compression
2.1 Standard Video Coder
2.2 Video Decoder
2.3 MPEG-4 Video Standard
2.3.1 Content Based Functionality - Concept of Video Object Planes
2.4 Visual Bitstream Syntax and Structure
2.4.1 Start Codes
2.4.2 Visual Object Sequence (VS)
2.4.3 Video Object
2.4.4 Video Object Layer
2.4.5 Group of Video Object Planes (GOV)
2.4.6 Video Object Plane (VOP)
2.4.7 Macroblock
2.4.8 Block
2.5 Coding of Shape, Motion and Texture for each VOP
2.6 Support for Conventional as well as Content based Functionalities
3 Error Resilient Video Coding/Decoding
3.1 Traditional Robust Coding: Forward Error Correcting Coding
3.1.1 Bitrate-Quality Tradeoff
3.2 Error Resilient Approach to deal with Transmission errors
3.3 Error Resilience Tools
3.4 MPEG-4 Error Resilience Tools
3.5 Resynchronization in MPEG-4
3.6 Conclusion
4 Error Resilient Entropy Coding (EREC)
4.1 Disadvantages of Using Resynchronization Words
4.2 Error Resilient Entropy Coding
4.3 Assumptions
4.4 Operation of EREC Encoder
4.4.1 Operation of EREC Decoder
4.4.2 EREC Parameters
4.5 Implementation Issues of EREC
4.5.1 Highly Protected Parameters
4.5.2 Buffering
4.5.3 Error Propagation
4.6 EREC Performance in case of Burst Errors
4.7 Conclusion
5 Transcoding of MPEG-4 Video using EREC
5.1 Bitstream Parser
5.1.1 Pseudo-Code
5.2 EREC Encoder
5.3 EREC Decoder
5.4 EREC Decoding in the Presence of Channel Errors
5.5 Limitations of the EREC Scheme
5.5.1 Complexity of EREC Decoder
5.5.2 Error Propagation
5.5.3 Buffering and Delay
5.5.4 Enhancements to the EREC
5.6 Simulation Details
5.7 Overhead Analysis
5.8 Results: Experiment 1
5.9 Result: Experiment 2
5.10 Result: Experiment 3
5.11 Result: Experiment 4
5.12 Result: Experiment 5
5.13 Conclusion
6 Conclusions and Future Directions
6.1 Conclusions
6.2 Future Directions
Bibliography
List of Figures
Two Stage Process for Reducing the Temporal and Spatial Redundancy in Video Sequences
Standard Video Coder
Hierarchical levels in an MPEG-4 bitstream
Example of Visual Information - Logical Structure
Example Visual Bitstream - Separate Configuration Information and Elementary Stream Data
Source and Channel Coding
Decoder can only isolate the error to be between two resynchronization points
An MPEG-4 video packet
Image coding using Huffman coding of transformed blocks
Variable length blocks (left) are fitted into fixed length EREC slot structure (right)
The stages of EREC Encoding Process: Stage 1 (left) and Stage 2 (right)
The stages of EREC Encoding Process: Stage 3 (left) and Final Stage (right)
The MPEG-4 lossless Transcoder
The block diagram of proposed transcoding scheme
Two bitstreams running in parallel in EREC encoder
Logical blocks in EREC encoder
A Frame of foreman sequence coded using standard MPEG-4 scheme with resynchronization markers on error free channel.
The same frame of foreman sequence coded using proposed transcoding scheme on error free channel.
A Frame of foreman sequence coded using standard MPEG-4 scheme with resynchronization markers on error free channel.
The same frame of foreman sequence coded using proposed transcoding scheme on error free channel.
Proposed transcoding scheme using EREC with channel BER of
Proposed transcoding scheme using EREC with channel BER of 10^-6
Proposed transcoding scheme using EREC with channel BER of
Proposed transcoding scheme using EREC with channel BER of
Proposed transcoding scheme using EREC with channel BER of
Proposed transcoding scheme using EREC with channel BER of
PSNR (dB) vs Channel BER for a frame of foreman sequence encoded using proposed transcoding scheme.
Proposed transcoding scheme using EREC with channel BER of
Proposed transcoding scheme using EREC with channel BER of 10^-5
Proposed transcoding scheme using EREC with channel BER of
PSNR (dB) vs Channel BER for a frame of Table Tennis sequence encoded using proposed transcoding scheme.
List of Acronyms
AIR    Adaptive Intra Refresh
BER    Bit Error Rate
BMC    Block Motion Compensation
DCT    Discrete Cosine Transform
EREC   Error Resilient Entropy Coding
GOB    Group of Blocks
HEC    Header Extension Code
ISO    International Organization for Standardization
IEC    International Electrotechnical Commission
ITU    International Telecommunication Union
MB     Macroblock
MC     Motion compensation
ME     Motion estimation
MM     Motion Marker
QP     Quantization Parameter
RLVC   Reversible Variable Length Code
RM     Resynchronization Marker
VLC    Variable Length Code
VO     Video Object
VOL    Video Object Layer
VOP    Video Object Plane
Chapter 1
Introduction
Multimedia refers to a variety of media, such as voice, data, image and video, which
are present either simultaneously or sequentially. In the past, most of the data carried
on communication networks was textual. Today, the transmission of multimedia
information has become an important application requirement on the Internet.
Video clips, animated greeting cards with music and similar content have become
increasingly important on the Internet. Compared with traditional textual applications,
multimedia applications carry speech, voice and video information simultaneously. This
huge amount of data requires much higher bandwidth. A typical 25 second,
320 x 240 QuickTime movie could take 2.3 Megabits, which is equivalent to about
1000 screens of textual data [1]. Hence in multimedia systems, different sources of
information such as voice, data, audio, and video are compressed as much as possible
before the bits are transmitted via communication and storage channels.
In multimedia communications, on the one hand, separate applications are combined
for transmission, while on the other hand very adverse requirements have to be handled
simultaneously by sharing the provided bandwidth [2]. In particular, image and
video produce a large amount of data, and hence image and video communication
is considered to be the main system bottleneck. Assuming that wireless access to
multimedia data is an important objective for the future, it is necessary to find ways
to make the transmission of image and video over wireless channels as efficient and
reliable as possible.
1.1 History of Video Compression
ISO/IEC is the main standardization body behind well-known video compression
standards such as MPEG-1, MPEG-2 and MPEG-4. Although the basic video
compression techniques are almost the same for these standards, they differ
fundamentally in the applications they address.
The MPEG-1 standard deals with the storage and retrieval of multimedia information
on a CD-ROM. It uses JPEG and H.261 (developed by ITU-T) as a starting
point but provides many new features such as frame based random access of video and
fast forward/fast reverse. The channel bitrate for MPEG-1 was assumed to be
1.5 Mbits/sec.
MPEG-2 is focused on high quality multimedia compression for use in broadcast
applications. MPEG-2 hence covers application areas such as Direct Broadcast Satellite
(DBS), Digital Versatile Disk (DVD) and High Definition Television (HDTV).
The target data rate for MPEG-2 is 4-9 Mbits/sec. It supports both progressive
(non-interlaced) and interlaced formats.
MPEG-4 initially began as a low bitrate standard but has since been transformed into the
first standard that truly addresses multimedia by providing functionalities
such as compression, universal access and content based interactivity.
Very little attention was given to error resilience and concealment during the
development of MPEG-1, but work in this area started in parallel with the compression
activities for MPEG-2. Error resilience has since become a major effort within MPEG,
which is why the MPEG-4 standard comes with a variety of error resilience tools.
1.2 Video Compression
Image and video transmission is the main system bottleneck for two reasons. First, it
requires far more bandwidth than the transmission of other information sources such
as speech or data. Second, it is a more difficult problem because of the inherent complexity
of the coding methods.
Due to the enormous amount of bandwidth required, video data is typically
compressed before it is transmitted on a channel. Generally, the coding techniques offering
the greatest amounts of compression are the most vulnerable to errors. This implies that
MPEG-encoded video images are highly vulnerable to channel errors because of their extensive
use of interframe coding, which is susceptible to error accumulation upon decoding.
In fact, the higher the compression factor, the higher the vulnerability of the bitstream
to channel errors [2]. Of course, error control channel coding can be applied to the
compressed data streams, introducing structured redundancy which somewhat offsets
the redundancy removal achieved by compression. However, we should also look for
other image and video compression techniques that are less error prone. In this
regard, error resilient video compression schemes have been shown to be very effective: they
not only provide efficient protection against channel errors but also come with less
overhead than traditional error control channel coding schemes.
1.2.1 Effect of Transmission Errors
If an uncompressed video were transmitted, a single bit error would result in the loss
of a single pixel, which is a small element and will be barely noticeable. In compressed
pictures fewer bits are used to describe the same information, and consequently each
bit takes on a much greater meaning. Thus a single bit error in a compressed picture
can result in large areas of the picture being corrupted. Compressing a video signal
reduces the redundancy in each picture, so each error has a much greater impact.
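The contrast described above can be reproduced with a toy experiment. The following is a minimal sketch, not part of the thesis: both code tables (`FIXED` and `VLC`) are hypothetical and stand in for uncompressed samples and entropy-coded data respectively. A single flipped bit damages one symbol in the fixed-length stream, whereas in the variable-length stream it shifts the codeword boundaries.

```python
# Toy comparison of a one-bit error in fixed-length vs. variable-length coded data.
# Both code tables are hypothetical and exist only for this illustration.
FIXED = {"a": "00", "b": "01", "c": "10", "d": "11"}
VLC = {"a": "0", "b": "10", "c": "110", "d": "111"}  # prefix-free, Huffman-style


def encode(symbols, table):
    """Concatenate the codeword of every symbol."""
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    """Greedy prefix decoding; works for any prefix-free table."""
    inverse = {code: sym for sym, code in table.items()}
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return "".join(out)


def flip(bits, index):
    """Return the bit string with the bit at `index` inverted."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


message = "abacadab"
fixed_rx = decode(flip(encode(message, FIXED), 0), FIXED)
vlc_rx = decode(flip(encode(message, VLC), 0), VLC)

print(fixed_rx)  # one symbol is wrong, the rest stay aligned
print(vlc_rx)    # codeword boundaries shift, so the symbol count changes
```

In this small case the variable-length decoder happens to realign quickly; with real VLC tables and predictive coding the damage generally extends much further.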
Whenever data is transmitted, whether through a cable, radio or satellite link, the
transmitted signal will be subjected to distortion and noise. The received signal will
therefore be different from the transmitted signal, and transmission errors will have
been introduced.
The wireless channel is a noisy fading channel characterized by long bursts of error
[3]. When compressed video data is transmitted over wireless channels, the effect of
channel errors can be severe.
1.2.2 Traditional Approaches to deal with Transmission errors
The traditional approach has mainly been to add redundancy through channel
coding. Error checksums and error correcting codes can correct a certain number
of errors, so a specified error rate can be accommodated before any noticeable
degradation of picture quality is observed. However, this approach adds overhead
and hence may offset the compression, which rather defeats the purpose.
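The trade-off can be made concrete with the simplest possible channel code. The sketch below uses a rate-1/3 repetition code purely as an illustration; it is an assumption chosen for brevity and is not the channel code considered in this thesis. It corrects any single bit error per group of three, at the cost of tripling the number of transmitted bits.

```python
def rep3_encode(bits):
    """Repeat every data bit three times."""
    return [b for b in bits for _ in range(3)]


def rep3_decode(coded):
    """Majority vote over each group of three received bits."""
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]


data = [1, 0, 1, 1]
sent = rep3_encode(data)

received = list(sent)
received[4] ^= 1  # a single channel error in the second group

recovered = rep3_decode(received)
overhead = len(sent) / len(data)  # transmitted bits per data bit

print(recovered == data, overhead)
```

Practical channel codes are far more efficient than this, but the same principle holds: every corrected error is paid for with added bits.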
1.2.3 Error Resilient Approach
The main aim of the error resilient approach is to limit the propagation of errors and to
make the encoding process robust enough to deal with transmission errors, while at the
same time avoiding the addition of much extra redundancy.
1.3 Contribution of the Thesis
It has been found that it is the loss of bitstream synchronization that causes the major
artifacts and sometimes the spatial dislocation of decoded video data. To achieve
and maintain synchronization between encoder and decoder, Synchronization
Words (Sync. Words) have long been used. These Synchronization Words are unique
bit patterns that cannot be emulated by any entry in the Variable Length Coding (VLC)
tables. However, Synchronization Words are quite long and hence cause an overhead
that may offset the compression achieved. Also, because of their length,
Synchronization Words are used relatively infrequently. In the event of an error,
the data to be discarded is confined to the span between two consecutive Synchronization
Words, which is still quite large because Synchronization Words occur relatively
infrequently in the compressed bitstream. In MPEG-4, this discarded
data can span many macroblocks. Synchronization could be achieved more often
to limit the amount of data to be discarded, but this would introduce a huge overhead
and would not be practical.
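The tension between marker spacing and overhead can be quantified with a short calculation. This is a rough sketch under stated assumptions: the marker length `MARKER_BITS` and the payload sizes are illustrative values, not figures taken from this thesis. The payload between two markers is also the amount of data that is discarded when an error occurs there.

```python
def marker_overhead(marker_bits, payload_bits):
    """Fraction of the transmitted bits that is spent on synchronization markers."""
    return marker_bits / (marker_bits + payload_bits)


MARKER_BITS = 17  # assumed marker length, for illustration only

# Shorter payloads mean less data lost per error but a larger share of marker bits.
for payload_bits in (2000, 500, 100):
    share = marker_overhead(MARKER_BITS, payload_bits)
    print(f"payload {payload_bits:5d} bits -> overhead {share:.1%}")
```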
We suggest the use of a technique called Error Resilient Entropy Coding (EREC)
that avoids the use of long synchronization words and not only provides more frequent
synchronization than that achieved by Synchronization Words but is also resilient to
transmission errors. The transcoding scheme proposed in this thesis utilizes EREC
and makes the MPEG-4 compressed video more resilient to the errors occurring over
wireless channels without needing any changes in existing MPEG-4 Encoder/Decoder.
The main advantages of this scheme are frequent synchronization
with minimal overhead and graceful picture degradation: as the error rate increases,
the quality gets progressively worse instead of failing abruptly.
1.4 Thesis Organization
The thesis is organized as follows. Chapters 2 and 3 provide the theoretical
background on image and video compression, error control and error resilient
techniques. These chapters focus on traditional approaches to dealing with
transmission errors and present a review of the error resilient approach, which
limits the propagation of errors without adding much extra redundancy.
Chapter 4 introduces Error Resilient Entropy Coding (EREC). Chapter 5 describes
the design and details of the scheme proposed in this thesis, and also shows the
results obtained with the implemented scheme. Chapter 6 concludes and presents
future directions.
Chapter 2
Image and Video Compression
Image data tend to have a high degree of spatial redundancy. Spatial redundancy
implies that pixels are correlated across space. Most image compression schemes use
transform-domain block-based coding to exploit the spatial redundancy in images.
Similarly, successive pictures (images) from video sequences are very similar. Typically
two successive pictures will share a very similar background, and only a small area
of foreground will change. This correlation of video sequences in time is termed
temporal redundancy. There is a wide range of lossy video compression methods, but
basically they all exploit the spatial and temporal redundancy present in video sequences.
In order to exploit both the spatial and the temporal redundancy, most video coders
use a two-stage process, shown in figure 2.1, to achieve good compression [5]. The
first stage uses a method that exploits the temporal redundancy between frames. The
output of this stage is followed by a coding method that exploits spatial redundancy
within the frame. In fact, most of the current video coding standards, such as H.263
and MPEG-4, are based on this hybrid coding technique, shown in figure 2.2. The
hybrid coding technique consists of Block Motion Compensation (BMC) and the Discrete
Cosine Transform (DCT). BMC is used to exploit temporal redundancy while DCT is
used to reduce spatial redundancy. In discussing the compression of still and moving
images, we can distinguish between intraframe and interframe coding.
Figure 2.1: Two Stage Process for Reducing the Temporal and Spatial Redundancy
in Video Sequences (Stage 1: processing for reducing temporal redundancy, from
frame (t-1) and frame (t) to the frame difference; Stage 2: processing for reducing
spatial redundancy)
In intraframe coding, redundancy is removed from a single image frame by exploiting the spatial
correlation within that frame. Of course, a moving image can be compressed by
only applying intraframe coding separately to each successive frame. However, much
greater compression of moving images is achieved by exploiting the similarity between
successive frames, and this is termed interframe coding. There are two types of
interframe coding: predictive and interpolative.
• Predictive Coding: In predictive coding, first one picture frame is coded using
the intraframe coding technique, and then the differences between the reference
frame and successive frames are encoded. To prevent error propagation, the
process is periodically restarted with a new reference frame.
• Interpolative Coding: With interpolative coding, reference frames are again
used, but some frames between reference frames are simply not transmitted and
are restored during decompression by interpolating between reference frames.
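Predictive coding with a periodic restart, as described in the first item above, can be sketched in a few lines. The sketch below is illustrative only and makes several assumptions: frames are plain lists of sample values, each difference is taken against the immediately preceding frame, no motion compensation or quantization is applied, and the `refresh` interval is an arbitrary choice. It also shows how one corrupted difference propagates until the next intra-coded frame.

```python
def encode(frames, refresh=4):
    """Code every `refresh`-th frame as intra ("I"), the others as differences ("P")."""
    coded, prev = [], None
    for i, frame in enumerate(frames):
        if i % refresh == 0:
            coded.append(("I", list(frame)))
        else:
            coded.append(("P", [c - p for c, p in zip(frame, prev)]))
        prev = frame
    return coded


def decode(coded):
    """Rebuild frames by adding each difference to the previously decoded frame."""
    frames, prev = [], None
    for kind, data in coded:
        if kind == "I":
            frame = list(data)
        else:
            frame = [p + d for p, d in zip(prev, data)]
        frames.append(frame)
        prev = frame
    return frames


frames = [[10 + t, 20 + t, 30 + 2 * t] for t in range(8)]
coded = encode(frames, refresh=4)
assert decode(coded) == frames  # lossless round trip without channel errors

# Corrupt one difference value in frame 1 (a "P" frame).
kind, data = coded[1]
coded[1] = (kind, [data[0] + 5] + data[1:])
damaged = decode(coded)

# Frames 1 to 3 inherit the error; the intra frame at index 4 stops the propagation.
bad = [t for t in range(len(frames)) if damaged[t] != frames[t]]
print(bad)
```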
Figure 2.2: Standard Video Coder
2.1 Standard Video Coder
Figure 2.2 shows a standard hybrid BMC/DCT video coder configuration, similar
to those used in MPEG-2 and JPEG. This also forms the basis of the MPEG-4 video
coding scheme. Pictures are coded in one of two modes, intraframe mode or
interframe mode. In intraframe mode, pictures are coded without any relation to the
previous image, whereas in interframe mode, the current image is predicted from the
previous image using Block Motion Compensation (BMC), and the difference between the
current image and the predicted image, called the residual image, is coded. The basic
unit of data which is operated on is called a macroblock (MB) and is the data (both
luminance and chrominance components) corresponding to a block of 16x16 pixels.
The input image is split into disjoint macroblocks and the processing is done on
a macroblock basis.
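The partitioning into disjoint macroblocks can be sketched as follows. The function name `macroblock_origins` is hypothetical, and the sketch assumes frame dimensions that are exact multiples of 16; handling of other sizes is left out. QCIF resolution (176 x 144 luminance samples) is used as the example input.

```python
MB_SIZE = 16  # macroblock side length in luminance pixels


def macroblock_origins(width, height):
    """Top-left corners of the disjoint macroblocks, in raster-scan order."""
    if width % MB_SIZE or height % MB_SIZE:
        raise ValueError("this sketch assumes dimensions that are multiples of 16")
    return [(x, y)
            for y in range(0, height, MB_SIZE)
            for x in range(0, width, MB_SIZE)]


origins = macroblock_origins(176, 144)  # QCIF luminance resolution
print(len(origins), origins[0], origins[-1])
```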
2.2 Video Decoder
The macroblocks are reconstructed at the receiver by the decoder using a reverse
process. The variable length codewords present in the received video bitstream are
decoded first. For inter macroblocks, the pixel values of the prediction error are
reconstructed by inverse quantization and inverse DCT and are then added to the
motion compensated pixels from the previous frame to reconstruct the macroblocks.
2.3 MPEG-4 Video Standard
There are two different voluntary organizations in the field of visual communication.
The first is the International Telecommunication Union, Telecommunication
Standardization Sector (ITU-T). The second is the International Organization
for Standardization/International Electrotechnical Commission (ISO/IEC). Table 2.1
provides an overview of the standards developed by ISO/IEC.
Table 2.1 Description of Parameters
Standard | Year of Adoption | Functionality               | Description
MPEG-1   | 1992             | Digital Storage Media       | 1-2 Mbps
MPEG-2   | 1994             | Broadcast                   | 4-6 Mbps
MPEG-4   | 1999             | Content Based Interactivity | 10 Kbps-
Video coding standards MPEG-1 and MPEG-2, although perfectly well suited to
the environments for which they were designed, are not necessarily flexible enough to
efficiently address the requirements of multimedia applications [?]. The MPEG-4 visual
standard provides users a new level of interaction with visual content. It provides
technologies to view, access and manipulate objects rather than pixels, with great
error robustness and at a large range of bit error rates [il. This section describes the
MPEG-4 standard, as defined in the ISO/IEC 14496-2 document. The MPEG-4 visual
standard has been explicitly optimized for three bitrate ranges.
Below 64 Kbits/sec
MPEG-4 supports both interlaced and progressive material. The supported
chrominance format is 4:2:0. In this format, the Cb and Cr components each have
half as many samples as the luminance component in both the horizontal
and vertical directions. The resolutions supported by the MPEG-4 standard range from
sub-QCIF to beyond HDTV.
2.3.1 Content Based Functionality - Concept of Video Object Planes
The MPEG-4 video coding standard supports the functionalities already provided
by MPEG-1 and MPEG-2, including the provision to efficiently compress standard
rectangular sized image sequences at varying levels of input formats, frame rates, and
bit rates [7]. Furthermore, it provides support for the separate encoding and
decoding of content, i.e., physical objects in a scene. Within the context of MPEG-4,
the ability to identify and selectively decode and reconstruct video content of interest
is referred to as "content based scalability". Because of this functionality of MPEG-4,
we can interact with and manipulate the contents of images and video sequences in the
compressed domain without the need for further segmentation or transcoding at the
receiver.
To enable the content based interactive functionalities envisioned, the MPEG-4
standard introduces the concept of "Video Object Planes" (VOP's). It is assumed
that each frame of an input video sequence is segmented into a number of arbitrarily
shaped image regions (Video Object Planes). Each of the regions may cover
particular image or video content of interest, i.e., it describes a physical object within
the scene. In contrast to the video source format used for the MPEG-1 and MPEG-2
standards, the video to be coded by MPEG-4 is thus no longer considered a rectangular
region. The input to be coded can be a VOP image region of arbitrary shape, and
the shape and location of the region can vary from frame to frame. Successive VOP's
belonging to the same physical object in a scene are referred to as Video Objects
(VO's). A Video Object (VO) is a sequence of VOP's of possibly arbitrary shape and
position. The shape, motion and texture information of the VOP's belonging to the
same VO is encoded and transmitted in a separate Video Object Layer (VOL).
In addition, the information needed to identify each of the VOL's, and how the
various VOL's are arranged, is also included in the bitstream to allow the decoder to
reconstruct the entire original sequence at the receiver. Hence, each VOP is decoded
separately, which allows flexible manipulation of the video sequence. (The video
source input assumed for the VOL structure may be generated by means of on-line
or off-line segmentation algorithms.)
If the original input image sequences are not decomposed into several VOL's of
arbitrary shape, the coding structure simply degenerates into a single layer
representation, which supports conventional image sequences of rectangular shape. The
MPEG-4 content-based approach can thus be seen as a logical extension of the
conventional MPEG-1 and MPEG-2 coding approach towards image input sequences of
arbitrary shape.
2.4 Visual Bitstream Syntax and Structure
The central concept defined by the MPEG-4 standard is the audio-visual object.
An MPEG-4 scene may consist of one or more video objects. An MPEG-4 visual
bitstream provides a hierarchical description of a visual scene as shown in figure
2.3 [8]. Each level of the hierarchy can be accessed in the bitstream by special code
values called start codes.
2.4.1 Start Codes
Start codes are specific bit patterns that do not otherwise occur in the video stream.
Each start code consists of a start code prefix followed by a start code value. The
start code prefix is a string of 23 bits with the value zero followed by a single bit with
the value one. The start code prefix is thus the bit string '0000 0000 0000 0000 0000
0001'. The start code value is an 8-bit integer, which identifies the type of start code
as shown in table 2.2.
Table 2.2 Some Start Code Values for MPEG-4 Visual Bitstream
Name                              | Start Code Value (Hexadecimal)
video_object_start_code           | 00 through 1F
video_object_layer_start_code     | 20 through 2F
visual_object_sequence_start_code | B0
visual_object_sequence_end_code   | B1
group_of_vop_start_code           | B3
vop_start_code                    | B6
2.4.2 Visual Object Sequence (VS)
Visual object sequence is the highest syntactic structure of the coded visual bitstream.
A visual object sequence commences with a visual_object_sequence_start_code, which is
followed by one or more visual objects coded concurrently. The visual object sequence
is terminated by a visual_object_sequence_end_code.
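The start code structure of Section 2.4.1 lends itself to a simple scanner. The following is a minimal sketch rather than a real bitstream parser: it assumes that start codes are byte-aligned, it knows only the values listed in Table 2.2, and the function names and the test stream are hypothetical.

```python
PREFIX = b"\x00\x00\x01"  # 23 zero bits followed by a single one bit

NAMED_VALUES = {
    0xB0: "visual_object_sequence_start_code",
    0xB1: "visual_object_sequence_end_code",
    0xB3: "group_of_vop_start_code",
    0xB6: "vop_start_code",
}


def classify(value):
    """Map an 8-bit start code value to its name (subset from Table 2.2 only)."""
    if 0x00 <= value <= 0x1F:
        return "video_object_start_code"
    if 0x20 <= value <= 0x2F:
        return "video_object_layer_start_code"
    return NAMED_VALUES.get(value, "other")


def find_start_codes(data):
    """Return (byte offset, name) for every byte-aligned start code in `data`."""
    found = []
    pos = data.find(PREFIX)
    while pos != -1 and pos + 3 < len(data):
        found.append((pos, classify(data[pos + 3])))
        pos = data.find(PREFIX, pos + 4)  # continue after the value byte
    return found


# Hypothetical stream: VS start, VO start, VOL start, one VOP, VS end.
stream = (b"\x00\x00\x01\xB0" + b"\xAA" +
          b"\x00\x00\x01\x00" +
          b"\x00\x00\x01\x20" +
          b"\x00\x00\x01\xB6" + b"\x12\x34" +
          b"\x00\x00\x01\xB1")

for offset, name in find_start_codes(stream):
    print(offset, name)
```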
2.4.3 Video Object
A video object corresponds to a particular (2-D) object in the scene. In the simplest
case it can be a rectangular frame, or it can be an arbitrarily shaped object
corresponding to an object or background. A video object commences with a
video_object_start_code, and is followed by one or more video object layers.
2.4.4 Video Object Layer
The VOL provides support for scalable coding. Each Video Object may be encoded
in scalable (multi-layer) or non-scalable (single layer) form, depending upon the
application. The video_object_layer_start_code marks a new video object layer.
2.4.5 Group of Video Object Planes (GOV)
The GOV groups together Video Object Planes. GOVs are optional.
2.4.6 Video Object Plane (VOP)
A VOP is a time sample of a video object. A conventional video frame can be
represented by a VOP with rectangular shape. The VOP start code is the bit string
'000001B6' in hexadecimal. It marks the start of a video object plane. The VOP
contains the encoded video data of a time sample of a video object; that is, it
contains motion parameters, shape information and texture data. All this information
is coded using macroblocks. The above hierarchical levels can be accessed by the specific
start codes mentioned above. However, the macroblocks are coded sequentially and
there is no explicit boundary between macroblocks.
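Because every level above the macroblock is introduced by a start code, a parser can locate
these levels by scanning for the start code prefix. The following is a minimal sketch in
Python, not the normative parsing procedure. It assumes byte-aligned start codes; the names
classify and find_start_codes and the sample byte string are hypothetical, and only the
start code values listed in Table 2.2 are classified.

```python
def classify(value):
    """Map a start code value to a Table 2.2 name (other values are 'unknown')."""
    if 0x00 <= value <= 0x1F:
        return "video_object_start_code"
    if 0x20 <= value <= 0x2F:
        return "video_object_layer_start_code"
    return {
        0xB0: "visual_object_sequence_start_code",
        0xB1: "visual_object_sequence_end_code",
        0xB3: "group_of_vop_start_code",
        0xB6: "vop_start_code",
    }.get(value, "unknown")


def find_start_codes(data):
    """Return (byte_offset, name) for each 0x000001 prefix found in data."""
    found = []
    pos = data.find(b"\x00\x00\x01")
    while pos != -1 and pos + 3 < len(data):
        found.append((pos, classify(data[pos + 3])))
        pos = data.find(b"\x00\x00\x01", pos + 3)
    return found


# Hypothetical stream: VS start, VO 0, VOL 0, one VOP with dummy payload, VS end.
stream = (b"\x00\x00\x01\xB0" b"\x00\x00\x01\x00" b"\x00\x00\x01\x20"
          b"\x00\x00\x01\xB6" b"\x5A\xC3" b"\x00\x00\x01\xB1")
codes = find_start_codes(stream)
```

No comparable scan exists for macroblocks: since they carry no start code, their boundaries
are only known by decoding the preceding data.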
Figure 2.3: Hierarchical levels in an MPEG-4 bitstream
2.4.7 Macroblock
A macroblock contains a section of the luminance component and the spatially
corresponding chrominance components. The term macroblock can refer either to source
and decoded data or to the corresponding coded data elements. A skipped macroblock
is one for which no information is transmitted. Presently there is only one
chrominance format for a macroblock, namely the 4:2:0 format. The order of blocks
in a macroblock is illustrated below. A 4:2:0 macroblock consists of 6 blocks:
4 Y, 1 Cb and 1 Cr blocks.
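As a rough illustration of this structure, the sketch below splits a 16 x 16 luminance area
and its two 8 x 8 chrominance areas into the six 8 x 8 blocks of a 4:2:0 macroblock. The
function name split_macroblock, the sample data and the block order used here (the four
luminance blocks in raster order, then Cb, then Cr) are assumptions made for illustration.

```python
def split_macroblock(y, cb, cr):
    """Split a 16x16 Y area and 8x8 Cb, Cr areas into six 8x8 blocks."""
    assert len(y) == 16 and all(len(row) == 16 for row in y)
    assert len(cb) == 8 and len(cr) == 8
    blocks = []
    for top in (0, 8):          # upper half, then lower half
        for left in (0, 8):     # left block, then right block
            blocks.append([row[left:left + 8] for row in y[top:top + 8]])
    blocks.append([row[:] for row in cb])
    blocks.append([row[:] for row in cr])
    return blocks


# Hypothetical sample data: each luminance sample stores its own coordinates.
y = [[(r, c) for c in range(16)] for r in range(16)]
cb = [[128] * 8 for _ in range(8)]
cr = [[128] * 8 for _ in range(8)]
blocks = split_macroblock(y, cb, cr)
```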
2.4.8 Block
The term block can refer either to source and reconstructed data or to the DCT
coefficients or to the corresponding coded data elements.
The syntax for the visual bitstream defines two types of information: configuration
information and elementary stream data. Configuration information refers to
header information such as the Video Object Sequence Header, Video Object Header and
Video Object Layer Header. The elementary stream contains the data for a single
layer of a video object. Configuration information may be carried separately from
or combined with elementary stream data. How multiple elementary streams are
multiplexed into a single bitstream is beyond the scope of this work. Interested
readers are referred to [10] and [11] for more information.
2.5 Coding of Shape, Motion and Texture for each
VOP
In the MPEG-4 video standard, the information related to the shape, motion and texture
for each VO is coded into a separate VOL in order to support separate decoding of
VOs. An identical algorithm is used to code the shape, motion and texture information
in each layer. However, if the application requires only high coding efficiency,
without the need for extended content-based functionalities, the input image sequence to
be coded contains only standard rectangular images [SI and the shape information
is not transmitted. In this case the MPEG-4 video coding algorithm is similar to the
MPEG-1/2 or H.26X coding algorithms. For coding each VOP image sequence (rectangular
or not), the MPEG-4 coding standard uses the hybrid Block Motion Compensation
(BMC) and Discrete Cosine Transform (DCT) technique already employed in MPEG-1/2 [11].
Figure 2.4: Example of Visual Information - Logical Structure
Figure 2.5: Example Visual Bitstream - Separate Configuration Information and
Elementary Stream Data
The shape information, for arbitrarily shaped VOs, is referred to as "alpha
planes" in the context of MPEG-4. Shape coding can be lossless or lossy, allowing
a tradeoff between bitrate and accuracy. Two kinds of shape information, binary
shape information and gray-scale shape information, are commonly used [?].
Motion estimation and compensation are commonly used to compress video sequences
by exploiting temporal redundancies between frames. The approach to motion
compensation in the MPEG-4 standard is similar to that in other video coding
standards such as MPEG-2. The main difference is that the block-based techniques have
been adapted to the VOP structure used in MPEG-4. There are three modes for
encoding an input VOP, namely:
• A VOP may be encoded independently of any other VOP. In this case the
encoded VOP is called an Intra VOP (I-VOP).
• A VOP may be predicted (using motion compensation) based on another previously
decoded reference VOP. Such VOPs are called Predicted VOPs (P-VOPs).
• A VOP may be predicted based on past as well as future reference VOPs. Such
VOPs are called Bidirectional Predicted VOPs (B-VOPs). B-VOPs may only be
interpolated based on I-VOPs or P-VOPs.
Motion estimation is necessary only for coding P-VOPs and B-VOPs.
It is important to note that the coding of standard MPEG I-frames, P-frames
and B-frames is still supported by the MPEG-4 standard as a special case of input
image sequences (VOPs) of rectangular shape.
The texture information of a VOP is present in the luminance, Y, and the two
chrominance components, Cb and Cr, of the video data. In the case of an I-VOP, the texture
information resides directly in the luminance and chrominance components. In the
case of motion-compensated VOPs, the texture information represents the residual
error remaining after motion compensation. For encoding the texture information,
the standard 8 x 8 block-based DCT is used [6]. For each macroblock a maximum of
four 8 x 8 luminance blocks (Y1, Y2, Y3, Y4) and two 8 x 8 chrominance blocks (U
and V) are coded. For 8 x 8 blocks straddling the VOP borders, an image padding
technique is used to fill the macroblock content outside of the VOP prior to applying the
DCT in intra VOPs. For coding the motion-compensated prediction error in P-VOPs,
the pixels outside the active VOP area are set to 128 [18]. Scanning
of the DCT coefficients followed by quantization and run-length coding is performed
using techniques and VLC tables similar to those used in the MPEG-1/2 and H.263
standards. An efficient prediction of the DC and AC coefficients of the DCT is performed
for intra coded VOPs. The MPEG-4 standard basically supports all the tools (DCT, motion
estimation and compensation, etc.) defined in MPEG-1, H.263 and the MPEG-2 Main
Profile. The compressed alpha plane, motion vector and DCT bit words are multiplexed
into a VOL layer bitstream by coding the shape information first, followed by the
motion and texture information.
2.6 Support for Conventional as well as Content
based Functionalities
As indicated above, besides the provision of new content-based functionalities and
error resilience and robustness, the MPEG-4 video coding standard allows the coding
of standard rectangular image sequences, as a special case, using single-layer
VOP coding. In this coding mode, the VOP is considered to be rectangular
instead of arbitrarily shaped. Consequently, since the input image is not segmented
into several VOPs, there is only a single layer, instead of the many layers used for
coding each VOP separately.
Chapter 3
Error Resilient Video
Coding/Decoding
Current video compression standards are not designed for error-prone transmission;
they can suffer seriously if any of the compressed data is corrupted. The error rates
experienced on wireless channels are relatively high in comparison with those in wired
networks. For example, the error characteristics of a circuit-switched wireline POTS
("plain old telephone service") transmission are around 10^-6 random BER in the worst
cases. Wireless channels therefore present difficulties of their own for transmission
and networking, and the problem is made more severe
for the transmission of compressed image sequences. The following section explains
why this is the case.
In order to achieve highly efficient image/video compression, many systems use
variable-length coding (VLC) techniques such as entropy coding and run-length coding.
Variable-length coding techniques provide much better compression ratios than do
fixed-rate techniques but are degraded more severely by channel errors.
In VLC the boundary between codewords is implicit in the decoder. The variable-length
codeword decoder reads the compressed bitstream until a full codeword is
encountered, translates that codeword into a meaningful symbol, and then begins
decoding a new word. When there are transmission errors, the implicit nature of the
boundary between codewords leads to an incorrect number of bits being used in VLC
decoding. This means that if there is an error in a variable-length codeword,
the decoder may not be able to detect that error, but may instead decode an incorrect
symbol and subsequently lose synchronization with the encoder.
These errors may not be detected until a unique resynchronization point or start
code is encountered in the bitstream.
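The loss of synchronization described above can be reproduced with a toy example. The
sketch below is only an illustration: the four-symbol prefix code, the message and the
position of the bit error are all hypothetical and are not taken from the MPEG-4 VLC tables.

```python
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}   # hypothetical prefix code
DECODE = {bits: sym for sym, bits in CODE.items()}


def encode(symbols):
    return "".join(CODE[s] for s in symbols)


def decode(bits):
    """Read bits until a full codeword is seen, emit its symbol, then start again."""
    symbols, word = [], ""
    for bit in bits:
        word += bit
        if word in DECODE:
            symbols.append(DECODE[word])
            word = ""
    return "".join(symbols)


def flip(bits, index):
    """Simulate a single channel bit error at the given position."""
    return bits[:index] + ("1" if bits[index] == "0" else "0") + bits[index + 1:]


message = "ABACADAB"
sent = encode(message)
received = flip(sent, 1)        # corrupt the first bit of the first "B"
decoded = decode(received)      # "AAAACADAB": nine symbols instead of eight
```

The corrupted stream still parses into legal codewords, so the decoder sees no error, yet the
single "B" has become "AA" and every later symbol is displaced by one position.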
The demand for wireless services continues to grow, and as wireless networks
become more widely deployed, the need will inevitably arise for a variety of wireless
imagery and video transmission capabilities similar to those which are becoming
available in the existing public switched network and wired office environments [10].
There are some issues involved in migrating imagery and video transmission to the
wireless environment. First, image and video transmission is the main system
bottleneck, since it requires far more bandwidth than the transmission of other
information sources such as speech or data. Second, it is a more difficult problem
due to the inherent complexity of the coding methods. Two conflicting requirements
in this regard are:
• Limited capacity of the wireless channel. This compels the use of efficient
compression techniques for the source data.
• The wireless channel is a noisy channel characterized by long bursts of errors and
rapid degradation in signal quality due to interference and multipath fading. In
short, the quality of the channel is highly variable, which gives rise to erroneous
transmission. This requires that any coding scheme used must degrade gracefully in
the presence of errors introduced by the channel.
Present-day video coding techniques employ predictive coding and motion compensation
to exploit existing temporal and spatial redundancy. Hence the use of predictive
coding not only leads to the propagation of errors to neighboring spatial blocks, but
errors occurring in one frame also propagate to the following frames.
Due to the extensive use of Variable Length Coding (VLC) and predictive coding,
compressed data is very vulnerable to transmission errors. This problem becomes
severe for video transmission over wireless channels, firstly because the video data
is highly compressed and secondly because wireless channels have a higher error rate
than wireline channels. This leads to a rapid degradation in the reconstructed video
quality.
In short, in video communication over error-prone channels, transmission errors
will occur, and if compressed digital video is used, the compression algorithm has to
be error robust. Two approaches are known for making a video communication system
error robust: protection at the transmission level (channel coding), and error-robust
video compression.
3.1 Traditional Robust Coding: Forward Error Correcting Coding
A traditional method of coping with transmission errors is to employ Forward Error
Correcting (FEC) coding. Traditionally, source coding and channel coding have been
separated. Figure 3.1 shows the coding of digital data. In the source coding stage,
data is compressed, and as much uncontrolled redundancy is removed as possible. In
the channel coding stage, controlled redundancy is put back to allow error detection
and correction at the channel decoder. The addition of controlled redundancy
is referred to as forward error correcting (FEC) coding. Some examples are linear
block codes, linear cyclic codes and convolutional codes.
3.1.1 Bitrate-Quality Tradeoff
If video is passed down a noisy channel of fixed capacity C, then the number of
data bits, N, and the number of controlled redundancy check bits, R, must in total
be fewer than C.
Figure 3.1: Source and Channel Coding
The picture quality in the absence of errors is a function of N alone. R governs the
error correcting capability of a code, and this affects the picture quality in the presence
of channel errors. To combat the effect of channel errors, a few data bits (N) are traded
for code bits for FEC. With FEC, a lower quality is observed at low error rates, while a
higher quality is achieved at high error rates. FEC also produces a much sharper curve
[4], where the picture deteriorates very quickly with increasing error rate.
This occurs because the error correcting capability of the code has been exceeded,
and the code attempts to correct multiple errors incorrectly. Thus video protected by
FEC degrades very suddenly with little notice, whereas unprotected video degrades
more gracefully. FEC techniques provide effective error protection against random
bit errors, but their performance is usually inadequate against longer-duration burst
errors. FEC codes also come with an increased overhead in terms of bitstream size;
hence some of the coding efficiency achieved by the video compression scheme is lost,
which expands the data rate and may still not overcome the
effects of errors under severe channel degradation. FEC coding is also not well suited
to channels of highly variable quality: the need for very powerful FEC codes
for the worst-case channel situation would severely reduce the compression performance.
Furthermore, such a system would fail catastrophically whenever this worst case is
exceeded and, on the other hand, would be overprotected in the normal channel situation
[14]. Thus the noisy channel environment presents a difficult tradeoff between the
need to constrain transmission rates and the need to provide acceptable video quality
in the presence of channel errors. A video compression scheme designed for these
channels must degrade gracefully in performance when channel fading occurs.
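The sharper failure curve of FEC-protected data can be illustrated numerically. The sketch
below is an assumption-laden example rather than a model of any system discussed here: it
assumes independent random bit errors and uses the single-error-correcting (7,4) Hamming
code purely as a familiar linear block code. It compares the probability that a group of
4 data bits is received wrongly without protection against the probability that a 7-bit
Hamming block suffers more errors than it can correct. It does not model picture quality.

```python
def uncoded_failure(p, n=4):
    """Probability that at least one of n unprotected data bits is in error."""
    return 1.0 - (1.0 - p) ** n


def hamming74_failure(p):
    """Probability of two or more bit errors in a 7-bit block, which a
    single-error-correcting (7,4) Hamming code cannot correct."""
    return 1.0 - (1.0 - p) ** 7 - 7.0 * p * (1.0 - p) ** 6


overhead = 3 / 7    # share of the channel bits spent on check bits, R / (N + R)

for ber in (1e-4, 1e-3, 1e-2, 1e-1):
    print(f"BER {ber:g}: uncoded {uncoded_failure(ber):.2e}, "
          f"Hamming(7,4) {hamming74_failure(ber):.2e}")

# Growth of the failure rate when the BER rises tenfold from 1e-3 to 1e-2.
uncoded_growth = uncoded_failure(1e-2) / uncoded_failure(1e-3)
coded_growth = hamming74_failure(1e-2) / hamming74_failure(1e-3)
```

Under these assumptions the unprotected failure rate grows roughly in proportion to the
bit error rate, whereas the protected one grows roughly with its square, at the cost of
spending 3 of every 7 channel bits on redundancy.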
Generally, the coding techniques offering the greatest amounts of compression are
the most vulnerable to errors [11]. This implies that MPEG-encoded video images are
highly vulnerable to channel errors, due to the extensive use of interframe coding, which
is susceptible to error accumulation upon decoding. In fact, the higher the compression
factor, the higher the vulnerability of the bitstream to channel errors. Of course, error
control channel coding can be applied to the compressed data streams, introducing
structured redundancy which somewhat offsets the redundancy removal achieved by
compression. However, we should also look for other image compression techniques
which are less error prone.
3.2 Error Resilient Approach to deal with Transmission errors
What is needed, therefore, is to redesign the compression system to be more
resilient to channel errors. The aims of error resilient image and video coding should
be:
• To reduce the propagation effects of channel errors and maintain synchronization
even when the transmitted bitstream is corrupted by channel errors.
• To provide a graceful degradation in picture quality when the error rate is increased.
Error resilient coding schemes thus provide improved performance for both good
and poor quality channels, while an FEC system may provide superior performance
around a pre-designed channel quality.
3.3 Error Resilience Tools
In practical video communication schemes, error correcting codes are typically used
only to provide a certain level of error protection to the compressed video bitstream,
and it becomes necessary for the video decoder to accept some level of errors in
the compressed bitstream. This necessitates the use of error resilience tools to
handle the residual errors that remain after error correction, especially if low delay
is required. The goal of traditional video coding is to eliminate both spatial and
temporal redundancy in the video signal. However, to achieve high video quality for
transmission over an error-prone channel, it is highly desirable to have video codecs
designed with error resilience in mind.
Error resilience techniques are the tools employed to improve the error robustness
of a communication system. For the transmission of compressed digital video
[13], forward error resilience techniques are those in which the encoder plays
an important part in improving the error robustness, typically by introducing
redundancy into the transmitted information. For example, the error resilience tools
incorporated into the MPEG-4 video coder are basically forward error resilience tools.
These include resynchronization markers, data partitioning, Reversible Variable
Length Codes (RVLCs), the Header Extension Code (HEC) and Adaptive Intra Refresh
(AIR). Even after error control and correction, some residual
errors still exist in the compressed bitstream fed to the video decoder in the receiver, due
to transmission over wireless channels. Even a very low BER in this stream will
have a devastating effect on the quality of the decoded sequence because of the high
compression and the error propagation. Therefore, the video decoder should be robust
enough to provide acceptable video quality even in the presence of some residual
errors. The following error resilience tools are generally included in the video decoder to
make it more robust and minimize the effect of transmission errors [12]:
• Error detection and localization
• Resynchronization
• Data recovery
• Error concealment
Accurate detection of errors is an essential step, since most of the other error resilience
techniques can only be invoked if an error is detected. The presence of errors in the
compressed bitstream can be signaled by the FEC used in the multiplex layer. The video
decoder can also detect errors whenever illegal VLC codewords are encountered in
the bitstream or when the decoding of VLCs leads to an illegal value of decoded
information, e.g., the occurrence of more than 64 DCT coefficients for an 8x8 DCT block. The
detection of an error implies that the decoder has lost synchronization with the encoder.
The decoder is made to fall back into lock step with the encoder by using
resynchronization schemes. The encoder inserts unique synchronization words in the bitstream at
approximately equally spaced intervals. These synchronization words are chosen such
that they are distinct from any valid video bitstream, i.e., no valid combination of the video
algorithm's VLC tables can produce these words. The decoder, after detecting an
error, seeks forward in the bitstream looking for this known synchronization word.
Once this word is found, the decoder falls back into synchronization with the
encoder. At this point the decoder has detected an error, regained synchronization
with the encoder and isolated the error between two resynchronization points.
Due to the use of VLC, the location in the bitstream where the decoder detects the
error is not the location where the error occurred but some undetermined
distance away from it, as shown in figure 3.2. Since the decoder can only isolate the
error to be between two synchronization points and cannot pinpoint its exact location,
generally all of the data that corresponds to the macroblocks between these two
resynchronization points needs to be discarded. Otherwise, displaying an
image reconstructed from erroneous data can cause highly annoying visual artifacts.
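The seek-and-discard behaviour described above can be sketched as follows. This is a
simplified illustration, not decoder code: the bitstream is modelled as a Python string of
'0' and '1' characters, the 17-bit synchronization word MARKER and the payloads are
hypothetical, and the payloads are chosen so that they cannot imitate the marker.

```python
MARKER = "0" * 16 + "1"    # hypothetical synchronization word for this sketch


def isolate_error(bits, detected_at, marker=MARKER):
    """Return (start, end) of the span to discard around a detected error.

    start is the first payload bit after the last marker that ends at or
    before detected_at; end is the position of the next marker (or the end
    of the stream if there is none).
    """
    prev = bits.rfind(marker, 0, detected_at + 1)
    start = 0 if prev == -1 else prev + len(marker)
    nxt = bits.find(marker, detected_at)
    end = len(bits) if nxt == -1 else nxt
    return start, end


# Three hypothetical payloads, each preceded by a marker.
payloads = ["110101101", "1011101011011", "11101101"]
bits = "".join(MARKER + p for p in payloads)

# Suppose the decoder notices an illegal codeword at bit position 50.
start, end = isolate_error(bits, 50)
discarded = bits[start:end]     # the whole second payload is dropped
```

The decoder resumes at position end; everything between the two markers is treated as
lost, even though the actual bit error may lie anywhere between start and the detection point.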
After synchronization is reestablished, data recovery tools like reversible decoding
attempt to recover data that in general would be lost. This is made possible by
special VLC tables, called Reversible VLCs (RVLCs), used at the encoder in coding
DCT coefficients and motion vectors; their codewords can be decoded in both the
forward and reverse directions. The location of the error
can then be localized more precisely by comparing the forward and reverse decoded
data.
Figure 3.2: Decoder can only isolate the error to be between two resynchronization
points
The remaining operation, i.e., error concealment, is a kind of post-processing
technique that aims at minimizing the impact of data that is in error. Different
implementations of wireless video systems utilize different error concealment strategies
depending upon the available computational power and the quality of the channel.
One simple error concealment strategy is to replace the luminance and chrominance
components of erroneous macroblocks with the luminance and chrominance of
the corresponding macroblocks in the previous frame of the video sequence. More
complex techniques use some kind of estimation strategy to exploit the local correlation
that exists within a frame of the video sequence to fill in the missing information of
erroneous blocks of data.
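The simple temporal replacement strategy mentioned above can be sketched in a few lines.
The function name conceal, the frame representation (a list of rows of sample values) and
the sample frames are hypothetical; only the luminance plane with 16 x 16 macroblocks is
shown, and the chrominance planes would be handled in the same way with their own block size.

```python
def conceal(current, previous, bad_macroblocks, mb_size=16):
    """Copy each erroneous macroblock from the previous frame into the current one.

    Frames are lists of rows of sample values; bad_macroblocks holds
    (mb_row, mb_col) indices of macroblocks flagged as erroneous.
    """
    repaired = [row[:] for row in current]
    for mb_row, mb_col in bad_macroblocks:
        top, left = mb_row * mb_size, mb_col * mb_size
        for r in range(top, top + mb_size):
            repaired[r][left:left + mb_size] = previous[r][left:left + mb_size]
    return repaired


# Hypothetical 32x32 luminance frames: previous is all 100, current is all 50.
previous = [[100] * 32 for _ in range(32)]
current = [[50] * 32 for _ in range(32)]
repaired = conceal(current, previous, [(0, 1)])   # top-right macroblock is bad
```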
A third type of error resilience technique is possible if the network supports a
back channel. In this interactive error resilience technique, the decoder and encoder
interact through a feedback path to improve the error resilience, by retransmitting
the data or by influencing future encoder action so as to stop the propagation of
detected errors in the decoder.
For completeness, the error resilience tools developed for MPEG-4 are described
in the following section. They offer a clear set of tools which, when used properly,
can permit communication of video information in noisy environments. This is a
critical breakthrough in video technology because the error-prone environment is very
unforgiving to digital video [14].
Figure 3.3: An MPEG-4 video packet (fields in order: Resync. Marker, MB No., QP,
HEC, Macroblock data)
3.4 MPEG-4 Error Resilience Tools
A number of tools have been incorporated in the MPEG-4 video coder to make it more
error resilient and suitable for the transmission of compressed data over wireless
channels. These include [15]:
• Resynchronization
• Data Partitioning
• Reversible Variable Length Codes (RVLCs)
• Adaptive Intra Refresh (AIR)
The MPEG-4 standard transmits the data in the form of packets. An MPEG-4 video
packet is shown in figure 3.3. Each video packet is made up of an integer number of
macroblocks in raster scan order. These macroblocks can span several rows of
macroblocks in the image and can even include partial rows of macroblocks.
3.5 Resynchronization in MPEG-4
When a video decoder loses synchronization due to the decoding of a corrupted
bitstream, it becomes unable to identify the precise location in the image to which the
current data belongs. This results in rapid degradation of the quality of the decoded
video or, in some cases, renders the video unusable.
The MPEG-4 standard uses resynchronization markers to achieve resynchronization
of the video decoder with the encoder [13]. Resynchronization markers are specially designed
bit patterns that are usually placed at approximately regular intervals in the video
bitstream. When the decoder detects an error, it can look for this resynchronization
marker and regain synchronization. There are two main points of concern in
achieving resynchronization in MPEG-4:
• It differs from previous video coding standards in the way it inserts these
resynchronization markers in the bitstream. It inserts the markers at the beginning of
each video packet.
• The encoder needs to remove all data dependencies that exist between the data
belonging to two different video packets within the same image.
Previous standards such as H.261 and H.263 (Version 1) insert these resynchronization
markers at the beginning of each GOB [15]. The images to be encoded
are logically partitioned into rows of macroblocks called Groups of Blocks (GOBs);
in the case of QCIF images these GOBs correspond to a horizontal row of macroblocks.
Hence, in the case of H.263, one row of macroblocks is the smallest region to which the
error can be isolated. In MPEG-4, resynchronization markers are not spaced a
fixed number of macroblocks apart (the slice concept); instead, the encoder attempts to
space the markers evenly throughout the bitstream to obtain video packets of nearly
equal length. This is particularly advantageous in the case of short bursts of errors,
where the decoder can quickly localize the error to within a few macroblocks in the
important high-activity areas, as macroblocks corresponding to these areas generate more
bits than other parts of the image. Because the MPEG-4 encoder inserts the
resynchronization markers at uniformly spaced bit intervals (note that resynchronization markers
can only be placed at a macroblock boundary), the macroblock interval between the
markers is much shorter in high-activity areas and much longer in low-activity
areas. Hence MPEG-4 preserves the image quality in important areas, in contrast
to H.263 (Version 1), where the resynchronization markers are restricted to be at the
beginning of a fixed GOB independent of image content. The recommended spacing
of resynchronization markers (based on bit rates) is 480 bits for
24 Kb/s and 736 bits for bit rates between 25 Kb/s and 48 Kb/s. Another important
point of concern is that all predictively coded information is confined within one
video packet so as to prevent the propagation of errors if one of the video packets
in the current image is corrupted. This goal is achieved by inserting two
additional fields at the beginning of each video packet, as shown in figure 3.3 [13].
The first field is the absolute macroblock number (MB No.) of the first macroblock
in the video packet, which indicates the spatial location of that macroblock in the
current image. The second field is the quantization parameter (QP), which denotes the
initial quantization parameter used to quantize the DCT coefficients in the video packet.
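The effect of spacing markers by bits rather than by macroblock count can be seen in a
small sketch. The rule used here (close a packet at the first macroblock boundary at which
it has reached the target size) is an assumed encoder policy for illustration, the function
name packetize and the per-macroblock bit counts are hypothetical, and header bits are
ignored. The first element of each returned pair plays the role of the MB No. field.

```python
def packetize(mb_bits, target_bits):
    """Group macroblocks into video packets of roughly target_bits each.

    mb_bits[i] is the coded size of macroblock i in bits. A new packet is
    started at the first macroblock boundary at which the current packet has
    reached target_bits. Returns (first_mb_number, macroblock_count) pairs.
    """
    packets, first, size = [], 0, 0
    for i, bits in enumerate(mb_bits):
        size += bits
        if size >= target_bits:
            packets.append((first, i - first + 1))
            first, size = i + 1, 0
    if first < len(mb_bits):
        packets.append((first, len(mb_bits) - first))
    return packets


# Hypothetical frame: 6 low-activity macroblocks, 4 high-activity, 6 low-activity.
mb_bits = [40] * 6 + [240] * 4 + [40] * 6
packets = packetize(mb_bits, target_bits=480)   # [(0, 7), (7, 2), (9, 7)]
```

With the 480-bit spacing quoted above, the packet that lies entirely in the high-activity
region covers only two macroblocks, so an error there costs far fewer macroblocks than an
error in a low-activity packet.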
3.6 Conclusion
Only the bitstream syntax and error-free decoding procedures are standardized in
MPEG-4 [10]. So there is still room for improvement in other areas of error resilience
such as encoding, error detection, error localization and error concealment.
This research proposal aims to address the problem of error resilient image and
video source coding by utilizing an alternative resynchronization tool, called Error
Resilient Entropy Coding (EREC). We propose to use it to multiplex the
variable-length blocks of data produced by the MPEG-4 compression algorithm into an error
resilient structure.
Chapter 4
Error Resilient Entropy Coding
(EREC)
Error Resilient Entropy Coding (EREC) is a method of synchronization that can
be adapted to existing video coding schemes to provide enhanced resilience to
transmission errors. EREC achieves synchronization with minimal redundancy and hence
maintains the high coding efficiency achieved by the video coding scheme. It is designed
to provide graceful degradation in quality as the bit error rate increases, and is superior to
channel coding, which fails abruptly when the bit error rate increases beyond its capacity.
It is not the loss of any particular bit, but the loss of bitstream synchronization, which
causes the most visible corruption of compressed pictures. The block diagram of a
general source-coding algorithm is shown in figure 4.1. The process named multiplex
coding/decoding involves combining all data to be coded into a form that is suitable
for transmission, e.g., a single binary bitstream. This stage of coding may include
channel coding and the addition of synchronization words that help the decoder regain
synchronization once it is lost due to channel errors.
The most common way to implement multiplex coding is to transmit the variable-length
blocks consecutively. If this method is adopted and the compressed data is
corrupted by transmission errors, the decoder loses synchronization with the start
and end of the variable-length blocks. Even if synchronization is regained after the
loss of a few blocks, the block count is likely to be permanently offset, resulting in
the following data being decoded at the wrong location [16]. In image and video coding
this can result in large areas of the picture being spatially displaced. As explained
earlier, unique synchronization code words (resynchronization markers in MPEG-4)
followed by some block address information are inserted at intervals in the compressed
bitstream to achieve synchronization between encoder and decoder.
Figure 4.1: Image coding using Huffman coding of transformed blocks
4.1 Disadvantages of Using Resynchronization Words
The disadvantages associated with the use of resynchronization markers can be
summarized as follows:
• The resynchronization words are relatively long bit patterns. Although the
necessary length of these code words is minimized, the nature of the constraints
implies that they are still quite long. Thus, in order to maintain low redundancy,
they must be used relatively infrequently. This limits the propagation of errors
to a maximum of the separation between two resynchronization markers, which
due to the relatively infrequent occurrence of sync words is still quite large.
• Before inserting the resynchronization markers at the start of a video packet,
all predictively encoded information must be confined within a video packet to
prevent the error propagation caused by the predictive coding/decoding steps in
the algorithm. This results in some small sacrifices in coding efficiency.
• By using resynchronization markers, the error can only be isolated to lie between
two consecutive resynchronization markers, and generally all of the data that
corresponds to the macroblocks between these two resynchronization markers
needs to be discarded. Hence, many macroblocks are discarded, and this can
result in highly annoying visual artifacts if not concealed properly.
4.2 Error Resilient Entropy Coding
The Error Resilient Entropy Code (EREC) is a method for coding variable-length
blocks of data, before transmitting them on the channel, with low overhead and a high
resilience to transmission errors. EREC is a method of synchronization that
enhances the resilience of the coded bitstream to transmission errors. It does this
by reordering the variable-length block data produced by video coding
schemes into a fixed-length slot structure, such that each variable-length block starts
at a known position in the bitstream. This means that the decoder is automatically
synchronized with the encoder at the start of each block. Hence EREC achieves
resynchronization at the beginning of each variable-length block.
The most advantageous feature of EREC is that it achieves resynchronization
more frequently but with minimal redundancy, thus maintaining the high compression
achieved by the video coding scheme. EREC has been applied to both still image
compression and video compression schemes such as H.261, MPEG-2 and H.263 and has
shown improved performance for noisy channels [16]. EREC is applied to MPEG-4
video for the first time in this research work. The method is described in detail in
[4], [16] and [17], so we give only a brief summary here.
4.3 Assumptions
EREC is a general method of achieving synchronization that can be applied to any
image or video coding scheme to provide enhanced resilience to transmission errors,
provided the following assumptions are met:
• Each variable-length block must be a prefix code. A prefix code is a set
of codewords whose length can be determined by reading the individual bits of
a codeword, one by one, without reading beyond the end of the codeword. So
for a prefix code the decoder is aware of when it has finished decoding a block without
any reference to previous or following information.
• The information belonging to each variable-length block is causal; that is,
channel errors affect the current and following blocks only.
Most video coding standards, such as MPEG-2, H.263 and MPEG-4, use entropy-based
source coding such as Huffman coding or arithmetic coding. Hence these assumptions
are met by most video coding schemes.
4.4 Operation of EREC Encoder
If there are N variable length blocks of data, block i containing bi bits, then the total
number of bits to be encoded by the EREC encoder is given as:
$$T_b = \sum_{i=1}^{N} b_i$$
The EREC encoder places these variable length blocks of data into a fixed length slot
structure, called the EREC frame. The EREC frame structure consists of a total of Ts bits.
These Ts bits are arranged into N slots of fixed length sj, such that:
$$T_s = \sum_{j=1}^{N} s_j \ge T_b$$
The EREC encoding algorithm proceeds in stages. In the first stage, as much data as
possible is placed from block i into the corresponding slot j = i. However, some blocks
are longer than the slot size, while some are shorter. In
subsequent stages of the algorithm, the data left over from longer blocks is filled into the
spaces left by shorter blocks. This shifting of data is governed by an offset sequence. For
example, the offset sequence used in figure 4.3 is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
Figures 4.3 and 4.4 show an example of the EREC encoding process. There are 10
variable length blocks of data, giving a total of 70 bits. These 70 bits are rearranged
into an EREC frame as shown in figure 4.2. The EREC frame comprises 10 slots,
each of length 7 bits. The length of 7 bits is chosen as the slot length so that Ts = Tb.
At stage 1, the offset is 0 and the blocks are fitted into their corresponding slots. At
stage 2, the offset is 1, and each block with data still to be placed searches the
slot next to it; if this next slot has space, all or as much data as possible is placed in
it. At stage 3, the offset is 2 and each block with data still to be encoded searches the
slot next to its adjacent slot, and so on, until all the variable length block data has
been encoded into fixed length slots. The maximum number of stages is bounded by
the number of variable length blocks. In figures 4.3 and 4.4:
• the blocks with number of bits equal to the space available in the slot, i.e., bi = sj, are
encoded completely, leaving the slot full (e.g. slot number 7).
• the blocks with fewer bits than the slot length, i.e., bi < sj, are coded completely,
leaving sj - bi bits unused in the slot (e.g. slot numbers 2, 4, 5, 6 and 10).
• the blocks with more bits than the slot size, i.e., bi > sj, have sj of their bits coded
to fill the slot, and leave bi - sj bits remaining to be coded (e.g. slot numbers
1, 3, 8 and 9).
Figure 4.2: Variable length blocks (left) are fitted into fixed length EREC slot structure
(right)
Thus at the end of stage 1, slots 1, 3, 7, 8 and 9 are full, while slots 2, 4, 5, 6 and 10
have space left to code data for blocks 1, 3, 8 and 9.
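The staged placement described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis implementation (which was written in C++): the function name `erec_encode`, the use of bit strings, and the wrap-around of the slot search (modulo N) are assumptions made for the example, and the sequential offset sequence 0, 1, ..., N-1 is the one quoted in the text.

```python
def erec_encode(blocks, slot_lengths):
    """Pack variable-length bit strings into fixed-length slots (EREC sketch).

    blocks       -- list of N bit strings such as "10110" (hypothetical input)
    slot_lengths -- list of N slot lengths whose sum is at least the total bits
    """
    n = len(blocks)
    if sum(slot_lengths) < sum(len(b) for b in blocks):
        raise ValueError("EREC frame is too small for the block data")
    slots = [""] * n
    remaining = list(blocks)          # bits of each block not yet placed
    for offset in range(n):           # sequential offset sequence 0, 1, ..., N-1
        for i in range(n):
            if not remaining[i]:
                continue              # block i is already fully placed
            j = (i + offset) % n      # slot searched by block i (wrap-around assumed)
            space = slot_lengths[j] - len(slots[j])
            slots[j] += remaining[i][:space]      # place as much as fits
            remaining[i] = remaining[i][space:]   # keep the rest for later stages
    return slots


# Four illustrative blocks (11 bits in total) packed into slots of 3, 3, 3 and 2 bits.
print(erec_encode(["11110", "10", "110", "0"], [3, 3, 3, 2]))
```

In this small example the long first block fills its own slot at stage 1 and then spills its last two bits into the free space of slots 2 and 4 at later stages.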
4.4.1 Operation of EREC Decoder
The only information the EREC decoder requires to perform the decoding is the
number of slots and their size. This additional
information is present in the EREC frame header and is encoded using a traditional
Error Correcting Code. Since it is small, it presents no significant overhead. The
decoding process is just the reverse of encoding. In the first stage of decoding, the
decoder parses the data in each slot until it reaches either the End of Block (EOB) or the
End of Slot, whichever comes first.
• If the decoder reaches the EOB before the end of slot, then the block was short
and the whole block is successfully decoded.
• If the decoder reaches the end of slot and does not find the EOB, then this was
a longer block, and this block cannot be decoded in the current stage.
Figure 4.3: The stages of EREC Encoding Process: Stage 1 (left) and Stage 2 (right)
Figure 4.4: The stages of EREC Encoding Process: Stage 3 (left) and Final Stage
(right)
In the later stages of the algorithm, the decoder shifts the data from offset slots back
into the slots where it belongs according to the offset sequence. This continues until
all of the blocks are reconstructed.
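The decoding stages can be sketched in the same style. The sketch below is a minimal illustration under stated assumptions: `erec_decode` and `ends_with_zero` are hypothetical names, the slot search wraps around modulo N, and the end-of-block test is a toy prefix code in which every block is a run of 1s closed by a single 0. In a real codec that test would be supplied by the video bitstream parser.

```python
def erec_decode(slots, block_done):
    """Recover variable-length blocks from EREC slots (illustrative sketch).

    slots      -- list of N bit strings, one per fixed-length slot
    block_done -- function telling whether the bits read so far form a
                  complete block (stands in for End of Block detection)
    """
    n = len(slots)
    blocks = [""] * n
    read_pos = [0] * n                # bits already consumed from each slot
    for offset in range(n):           # same offset sequence as the encoder
        for i in range(n):
            if block_done(blocks[i]):
                continue              # block i was completed at an earlier stage
            j = (i + offset) % n      # slot searched by block i at this stage
            # Read until End of Block or End of Slot, whichever comes first.
            while read_pos[j] < len(slots[j]) and not block_done(blocks[i]):
                blocks[i] += slots[j][read_pos[j]]
                read_pos[j] += 1
    return blocks


def ends_with_zero(bits):
    """Toy end-of-block test: a block is a run of 1s closed by a single 0."""
    return bits.endswith("0")


# A four-slot frame holding the toy blocks 11110, 10, 110 and 0.
print(erec_decode(["111", "101", "110", "00"], ends_with_zero))
```

Because the decoder mirrors the encoder stage by stage, the read position of each slot always equals the fill level the encoder had at the same stage, which is why no explicit block boundaries need to be transmitted.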
4.4.2 EREC Parameters
The EREC parameters are:
• Total number of bits to be transmitted in an EREC frame, Ts. This parameter
is usually kept equal to the total number of bits in all the variable length blocks
of data, i.e., Tb.
• Number of slots, N. In simple cases, the number of slots can be made equal to the
number of variable length blocks.
• Slot size or slot length, s.
In the proposed scheme, the first two parameters are calculated at the transmitter
side, prior to the start of the EREC encoding process. These parameters are calculated
on the basis of the information obtained by the "Bitstream Parser", a stage prior to the
EREC encoder during the transcoding. The relationship between the above parameters
can be given by the simple formula below.
$$s = T_s / N$$
The total number of bits in the variable length blocks, Tb, is not necessarily a multiple
of the number of slots. In that case the total number of bits in the variable length blocks of
data is fitted into an EREC frame by making the first few slots one bit longer than
the last ones.
$$s_j = \mathrm{DIV}(T_b, N) + 1, \quad \forall\; 0 < j \le \mathrm{MOD}(T_b, N) \qquad (4\text{-}4)$$
$$s_j = \mathrm{DIV}(T_b, N), \quad \forall\; \mathrm{MOD}(T_b, N) < j \le N$$
For example, if the total number of bits of all the variable length blocks of data is 93
and the number of blocks is 10, then according to the above equations, each of the first 3
slots is made one bit longer than the rest of the slots. That is, slots 1, 2 and 3 have
a length of 10 bits, while each of the slots from 4 to 10 has a length of 9
bits. Hence, in this example, the first 3 slots are made 1 bit longer to accommodate all
the variable length block data.
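The slot length rule and the 93-bit example can be checked with a short sketch. The function name `erec_slot_lengths` is a hypothetical choice for illustration; Python's built-in `divmod` plays the role of DIV and MOD in the equations above.

```python
def erec_slot_lengths(total_bits, num_slots):
    """Slot lengths for an EREC frame holding total_bits in num_slots slots.

    The first MOD(total_bits, num_slots) slots are one bit longer than the rest.
    """
    base, extra = divmod(total_bits, num_slots)   # DIV and MOD of the text
    return [base + 1 if j < extra else base for j in range(num_slots)]


lengths = erec_slot_lengths(93, 10)
print(lengths)        # → [10, 10, 10, 9, 9, 9, 9, 9, 9, 9]
print(sum(lengths))   # → 93
```

The lengths always sum to the total number of bits, so the frame carries no padding.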
The information relating to the number of slots and their sizes is extremely
important. If this information is received incorrectly, then the whole EREC frame is
lost. This information must be highly protected with FEC.
4.5 Implementation Issues of EREC
There are some issues regarding the implementation of EREC to existing video coding
schemes, which are described in this section.
4.5.1 Highly Protected Parameters
The transmission and protection of header information is the only area in which EREC
adds any overhead. For typical EREC frame sizes of 50 Kbits in intra pictures, an
18 bit code is needed to describe this length, which, if protected up to a bit error rate
(BER) of 10%, corresponds to a coding overhead of the order of 0.2% [16]. Despite
this requirement of protecting the EREC header, EREC is still
expected to give improved performance compared to the use of Resynchronization
Markers. The reason is that the bits saved by avoiding Resynchronization
Markers exceed the bits we may use for EREC header protection, even in
worst case channel conditions.
4.5.2 Buffering
All the blocks of data in one EREC frame must have been received by the encoder,
before EREC encoding can begin. Similarly the decoder must receive the whole frame
before decoding can begin. The EREC scheme, therefore, introduces a delay of two
EREC frames. This will typically be eight slices of picture for MPEG-2 (128 lines),
which usually corresponds to less than 20 of the picture [4].
4.5.3 Error Propagation
If an error occurs in one EREC slot during transmission, it can affect the decoding
process in other slots. Thus the EREC structure can extend one error to other slots. For
example, channel errors can cause the End of Block (EOB) to be missed or falsely
detected. This will cause all the remaining data for that block to be incorrect, and the
block is treated as erroneous for the current and later stages of the decoder algorithm.
Suppose slot 5 contains data from macroblocks 5, 1 and 8. One error in block 5
will therefore cause errors at the end of blocks 1 and 8, as well as an error in block 5.
Thus one error in a short block may cause errors towards the end of other long blocks.
It has been seen that, in case of channel errors, the farther the data is from the start
of the slot in which it is coded, the more likely it is to be in error [16]. This
is because, during the encoding of a long block, some of the data of the long
block is fitted into spaces left by shorter blocks. If an error occurs in one of
the shorter blocks, the decoder may assume an incorrect length for the
shorter block. In this case, during the EREC decoding process the wrong data will be
shifted back into the long block, corrupting some of the data at the
end of the long block. Thus, it is important to place more important information near
the start of each slot. For most standards like JPEG and MPEG, this is already the
case, since the coefficients representing the lower frequency parts of the image are coded
first [20].
In practice, blocks representing high activity regions in an image will require many
bits to be encoded and will be longer. In case of image and video coding, the bits
at the end of longer blocks typically correspond to high-frequency information. As
described above, the data most likely to suffer from channel error propagation is
that placed in later stages of algorithm i.e., the data near the end of longer blocks.
Thus most of the effects of channel errors will be seen as high frequency errors in
high activity regions of the image. In case of subjective testing the distortion in these
regions of an image is less noticeable as compared to the errors in low activity regions.
EREC scheme, therefore, provides a subjective benefit as well.
4.6 EREC Performance in case of Burst Errors
It is further noticed that if two or more errors occur, then the total number of bits
affected will not be much greater than that for one error. The reason for this is that
successive errors in a burst are more likely to affect the same information, and thus
multiple errors together can often be considered as only a single error event. For
example, if there is an error in an EREC slot and we assume that this error causes
loss of synchronization until the next slot, then subsequent errors in that EREC slot
will have no further effect. Hence EREC copes well in case of burst errors too [5].
Because of this property, EREC is capable of showing graceful degradation at higher
bit error rates.
4.7 Conclusion
The EREC scheme has many advantages, including very low overhead, frequent
synchronization, graceful degradation at increased bit error rates and the ability to
cope well with random as well as burst errors. Frequent synchronization with minimal
overhead makes EREC useful for applications where channel coding is too expensive
and some loss of fidelity is preferred over complete breakdown at high error rates.
Graceful degradation also gives the user some indication of the channel condition, and
the situation can be improved by rerouting the channel, or by changing the frequency of
the radio link, etc. Hence EREC can be used for applications like speech, image and video
transfer over cellular networks and wireless channels. EREC is a very good option for
transferring video over wireless channels, where channel conditions are very unpredictable
and not only is the error rate high but errors also occur in bursts.
The above arguments suggest that the EREC technique is a very good option to
achieve synchronization as well as error resilience for transmitting compressed
image and video data over a noisy channel. The following part of the thesis describes
how we have used the EREC technique to replace the resynchronization markers used in
MPEG-4 video.
Chapter 5
Transcoding of MPEG-4 Video
using EREC
This chapter describes how we have used EREC for the error resilient transmission of
MPEG-4 video over channels subject to random bit errors. It explains in detail the
contribution of the research work, the implementation details and the results achieved.
The thesis considers video data that has already been compressed using the
MPEG-4 standard video coding scheme. We have used a lossless "black box"
approach [4], shown in figure 5.1. The MPEG-4 compressed video data, from a
standard MPEG-4 encoder, is transcoded into a more resilient structure using EREC,
transmitted, and finally recoded back, to be read by a standard MPEG-4 decoder.
The transcoder and inverse transcoder are lossless and reversible, so in the absence of
channel errors the output will be equivalent to the input. The transcoder is designed
not to significantly alter the bit-rate, so the transcoded data can be transmitted
through the original channel.
We have used a standard MPEG-4 video coder/decoder with the single layer coding
option. The proposed scheme replaces the existing method of synchronization
(Resynchronization Markers) used in the standard MPEG-4 codec by an alternative
synchronization technique called EREC. The synchronization is achieved at macroblock level in
both intra video object plane (I-VOP) and inter video object plane (P-VOP) using
the EREC. This simply means that the macroblock data from the MPEG-4 compressed
bitstream is organized into a fixed length slot structure such that each macroblock
starts at the beginning of a slot. Since the decoder also has the information about
this fixed length slot structure, synchronization is achieved at the beginning of
each macroblock. A very simple concealment method is used that is explained later
in this chapter.
Figure 5.1: The MPEG-4 lossless Transcoder
Figure 5.2: The block diagram of proposed transcoding scheme
The MPEG-4 compressed bitstream consists of macroblock data (containing DCT
coefficients, motion vectors, etc.), preceded by header information such as the Video
Object (VO) header, Video Object Layer (VOL) header, and Video Object Plane
(VOP) headers. The EREC algorithm is based on re-organization of the variable
length blocks in a way that each block starts at a specific known position within the
code.
A bitstream parser is designed to extract different data from the compressed
MPEG-4 bitstream. The input MPEG-4 bitstream is parsed to obtain variable length
blocks of data, in this case groups of macroblocks. The bitstream parser extracts
the macroblock (MB) data from the compressed bitstream without actually decoding
the macroblocks. While doing so, the parser keeps track of the total number of
macroblocks, N, and the number of bits in each macroblock. This information is later
used to calculate the slot size and the number of slots for the EREC frame. So the
bitstream parser gets rid of all the synchronization words and header information and
produces the macroblock data at its output.
Once all the macroblock data has been obtained from the bitstream parser, it is passed
on to the EREC encoder, which fits it into a fixed length slot structure. Synchronization
is achieved at the start of each macroblock since each slot starts at the beginning
of a macroblock. Hence synchronization is achieved more frequently, i.e., at
macroblock level, than in the standard MPEG-4 decoder, without any additional overhead
of sync words. The EREC encoder outputs the transcoded data (an .erec file), which
is transmitted through the channel along with the necessary header information. At
the receiver side, the EREC decoder parses the slots and outputs the data to a bitstream
formatter. The output of this formatter is a standard MPEG-4 bitstream structure
(.cmp file), which in the ideal case should be exactly the same as the original bitstream at
the transmitter. The standard MPEG-4 decoder operates normally and decodes this
.cmp file to produce a .yuv file that can be displayed.
5.1 Bitstream Parser
We designed and developed the code for the bitstream parser using the MPEG-4 decoder
code written in C++. We modified the functions in the MPEG-4 decoder that
help to get macroblock data from the compressed bitstream. This macroblock data
was later used as the input of the EREC encoder.
5.1.1 Pseudo-Code
% The bitstream parser takes the compressed MPEG-4 bitstream at
% its input and produces two files at its output. One of them
% contains the macroblock data from all the VOPs and the other one
% contains all the header information from the MPEG-4 compressed bitstream.
% Every time the program gets bits for parsing, it also transfers
% these bits to the EREC bitstream.
get VO Header Information
get VOL Header Information
while (End of input has not been reached)
{
    store retrieved header information for further use in an output file
    Identify VOP start code
    For each VOP
    {
        get VOP Header
        store retrieved header information for further use in an output file
        Parse VOP Macroblock data based on type
        If IVOP
        {
            for (int i = 0; i < number of MBs; i++)
            {
                parse MB information
                count bits extracted
                for (int j = 0; j < number of Blocks; j++)
                {
                    parse Block info
                    count bits extracted
                }
            }
        }
        If PVOP
        {
            for (int i = 0; i < number of MBs; i++)
            {
                parse MB information
                get motion vector information
                count bits extracted
                for (int j = 0; j < number of Blocks; j++)
                {
                    parse Block info
                    count bits extracted
                }
            }
        }
        store all the macroblocks and motion vector data to be sent to EREC encoder
    }
}
Figure 5.3: Two bitstreams running in parallel in EREC encoder
5.2 EREC Encoder
We have implemented the algorithm for the EREC encoder in C++. The algorithm
takes macroblock data as input and fits it into slots of fixed length. The length and
number of slots are calculated based on the total number of bits in all the variable
length macroblocks. The total number of macroblocks and the number of bits in each
macroblock are obtained during bitstream parsing. The encoder has two bitstreams
running in parallel. The first one is the MPEG-4 compressed bitstream. The second
one is the EREC bitstream, which is empty at the start of the encoder program.
This bitstream is built as the encoder program progresses. The parser stores all
the macroblock data in a file. The information in this file is used to calculate the
number of macroblocks in one VOP and the total number of bits in each macroblock.
This information is later used in the EREC encoder to calculate the number of slots and
the slot size. The EREC encoder starts filling the slots such that each macroblock is put in
one slot (starting from the bottom, going to the top). If the length of a macroblock
is greater than the slot size, the remaining data from this macroblock is placed in a neighboring
slot (having some empty space) at a later stage. Eventually, at stage N-1, all the data
from the N macroblocks is fitted into the N slots. The EREC slot structure is sequentially
put into the EREC bitstream. The EREC bitstream is transmitted through the
channel and is received by the EREC decoder.
Figure 5.4: Logical blocks in EREC encoder
5.3 EREC Decoder
The EREC decoder takes an ".erec" file created by the EREC encoder. This file differs
from the ".cmp" file only in that it has EREC headers (i.e. information regarding the total
bits in an EREC frame and the number of slots) and that the macroblock information has
been fitted into the EREC frame structure. The boundaries between macroblocks
are implicit, i.e. there is no specific codeword at the end of a macroblock. For the EREC
decoder to recognize the End of Block (a block is the contents of a slot, in this case a
macroblock), the MPEG-4 decoder must be incorporated into the EREC decoder. Since
the EREC decoder does not completely decode a slot in one pass (stage), the actual
MPEG-4 decoder functions were modified to perform the functions described in Table 5.1.
Table 5.1 The decoding procedure of EREC decoder
| Scenario | Actions performed by EREC decoder |
|---|---|
| End of slot is reached before End of Block (EOB) | This implies that the macroblock was longer than the slot length (bi > si), and its remaining data is placed in some other offset slot. This slot is flagged as "partially decoded". |
| End of Block (EOB) is reached before End of slot | This implies that the macroblock length is less than the slot length (bi < si), and the macroblock can be fully decoded at the current stage. The slot is flagged as "fully decoded". However, this slot contains data from some other slot. |
| End of slot coincides with EOB | The macroblock length is exactly the same as the size of the slot (bi = si). |
The EREC decoder performs decoding in stages until all the slot data has been shifted
back to produce the variable length macroblocks. This macroblock data is reformatted
to create a standard MPEG-4 compressed bitstream or ".cmp" file, which is fed to
the standard MPEG-4 decoder for display.
5.4 EREC Decoding in the Presence of Channel
Errors
To deal with the effects of transmission errors, each slot has a flag, which is used to mark
the slots containing errors. Any macroblock that is put in a slot "in Error State" is
also marked as erroneous and is not used in the subsequent stages of the algorithm. An
error is flagged when either a coefficient is read that exceeds a certain threshold, an
invalid Huffman code is read, or the number of coefficients read for a block exceeds
the known maximum number of coefficients coded per block. All the macroblocks in
error are replaced by the previously correctly decoded macroblocks.
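The three error conditions listed above can be written as a single check. This is a hedged sketch rather than the decoder's actual code: the function name `block_in_error`, its argument names and the threshold value used in the example are assumptions. The only figure taken as known is that an 8x8 DCT block holds at most 64 coefficients.

```python
MAX_COEFFS_PER_BLOCK = 64   # an 8x8 DCT block holds at most 64 coefficients


def block_in_error(coefficients, invalid_code_seen, level_threshold):
    """Return True if a parsed block should be flagged as erroneous.

    coefficients      -- coefficient values read for the block so far
    invalid_code_seen -- True if the parser met an invalid Huffman code
    level_threshold   -- largest plausible coefficient magnitude (assumed value)
    """
    if invalid_code_seen:
        return True                                   # invalid Huffman code
    if len(coefficients) > MAX_COEFFS_PER_BLOCK:
        return True                                   # too many coefficients
    return any(abs(c) > level_threshold for c in coefficients)   # out of range


# A hypothetical threshold of 2047 is used purely for illustration.
print(block_in_error([12, -3, 1], False, 2047))    # → False
print(block_in_error([5000], False, 2047))         # → True
```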
5.5 Limitations of the EREC Scheme
5.5.1 Complexity of EREC Decoder
The EREC encoder is fast, since it does not have to perform actual decoding; it only
parses the bitstream and fits the macroblock data into the slot structure of the EREC
frame. The complexity of the EREC decoder is much higher than that of the EREC encoder,
because for the decoder to detect the End of Block (EOB), it has to use the parsing
functions. These functions are used repeatedly for a particular slot since, in an EREC
frame, slot data is spread over other offset slots and must be obtained from those
offset slots. For example, the final offset value in most cases for the decoding process
is 396 (for a 352 x 288 frame). This means that there is at least one slot which uses
the parsing functions 396 times. The complexity of the EREC scheme also increases with
the number of blocks N, and is known to be proportional to Log(N) for an efficient
implementation. One solution to this complexity problem is to break down the large
number of blocks into several subframes of N blocks each. In this way N is reduced, as
each EREC frame now contains fewer blocks. But there is a slight increase
in redundancy, as each EREC frame will have some header information associated
with it.
5.5.2 Error Propagation
If an error occurs in one EREC slot during transmission, it can affect the decoding
process in other slots, as the data from longer blocks is fitted into spaces left by
shorter blocks. Any error in a short block may cause the wrong data to be shifted
back into the longer block during EREC decoding, and some of the data at the end
of the longer blocks may be corrupted. However, due to the inherent error extension
towards the end of long EREC blocks, many transmission errors will occur in the
high frequency DCT coefficients, because the data at the end of longer macroblocks
corresponds to the high frequency DCT coefficients for high activity regions of the image.
The distortions in these regions of an image are visually less noticeable than
errors in low activity regions. The EREC scheme therefore provides a subjective
benefit as well.
5.5.3 Buffering and Delay
The EREC algorithm requires that all the data for the N variable length blocks
coming from the video encoder must be known before EREC encoding starts. Similarly,
the EREC decoding algorithm starts when the EREC decoder has received the entire
EREC frame. This implies a delay of two EREC frames. Significant buffering
requirements also need to be fulfilled. The effect of these limitations can be minimized by
dividing one big EREC frame into several subframes of N blocks. There is, however,
a slight increase in redundancy associated with each EREC frame.
5.5.4 Enhancements to the EREC
The EREC performance may be improved by optimizing its parameters according to
the application for which EREC is being used. The following section discusses some
enhancements to EREC [22]. Error propagation can be minimized by placing as
much data as possible in the first EREC shift. This is possible if we use a few long slots
instead of many short slots. A smaller proportion of the whole data is then placed
towards the end of a slot and fewer error extensions are observed. However, a few long
slots will attain synchronization less frequently, while many short slots will achieve
synchronization more often. The choice between these two scenarios depends upon
the application. For example, if the data encoded has relatively many high activity
regions, then we will have many long blocks of data and we may choose a few long
slots to avoid error extensions.
Unequal error protection may be provided to the data according to its importance
by using slots of different lengths. For example, one long slot can be used
to accommodate all the important motion vector information instead of spreading it across
many slots. Less important AC coefficients can be placed in many short slots. Thus
the total number of bits sent in an EREC frame will not change, but error extensions
will not occur in the motion vector information. It is essential that the decoder has
knowledge of all slot lengths in advance.
The choice of offset sequence is also an important parameter. For images and
sequences having high activity, the longer blocks will often be clustered together. In
this case a simple sequential offset sequence will cause the full slots to search the next
full slots during the early stages of the algorithm. A pseudo-random sequence will be
a good choice in this case to increase the speed of the EREC encoding process.
All the above parameters can be optimized according to the intended application
of EREC. There is an optimum set of parameters for a given EREC implementation.
5.6 Simulation Details
The scheme above has been simulated with various transmission methods:
• Using standard MPEG-4 transmission with Resynchronization Markers at the
start of each video packet
• Using the EREC encoder and decoder and avoiding the use of synchronization words
Video sequences used for simulations are "foreman" (CIF 352 x 288 format) and "table
tennis" (SIF 352 x 240 format). These sequences are in 4:2:0 YUV concatenated
format, where each frame is represented by all its luminance (Y) samples, followed
by all its chrominance (U) samples and finally all its chrominance (V) samples. The
resolution used is 8 bits per sample.
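As a quick check of these formats, the sketch below computes the size of each plane of a 4:2:0 frame at 8 bits per sample, together with the number of 16 x 16 macroblocks per frame. The function name `yuv420_frame_layout` and the dictionary keys are hypothetical names chosen for this illustration.

```python
def yuv420_frame_layout(width, height, mb_size=16):
    """Plane sizes in bytes (8 bits per sample) and macroblock count of a 4:2:0 frame."""
    y_bytes = width * height                    # full-resolution luminance plane
    c_bytes = (width // 2) * (height // 2)      # each chrominance plane is subsampled 2:1 both ways
    macroblocks = (width // mb_size) * (height // mb_size)
    return {
        "y_bytes": y_bytes,
        "u_bytes": c_bytes,
        "v_bytes": c_bytes,
        "frame_bytes": y_bytes + 2 * c_bytes,   # Y, then U, then V, concatenated
        "macroblocks": macroblocks,
    }


print(yuv420_frame_layout(352, 288))   # CIF, as used for "foreman"
print(yuv420_frame_layout(352, 240))   # SIF, as used for "table tennis"
```

For the CIF frame this gives 396 macroblocks, which agrees with the final offset value of 396 quoted in Section 5.5.1 for a 352 x 288 frame.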
Table 5.2 MPEG-4 Simulation Parameters
| Parameter | Value |
|---|---|
| YUV Format | 4:2:0 |
| Number of Frames | 30 |
| Frame Rate | 30 frames per second |
| Target Bitrate | 48 kbps |
| Scalability | None |
| Quantization step for PVOP | 16 |
| Alpha Type | None |
| Quantization step for IVOP | 16 |
| PVOPs Count between IVOPs | 8 |
| BVOPs Count between IVOPs | 0 |
| Data Partitioning | Disabled |
| Motion Vector Search Range | 16 |
| Sprite Type | None |
| Reversible Variable Length Coding | Disabled |
| Video Packet mode | Disabled |
| Temporal Prediction Type | P P ... |
In experiments to explore the error robustness, random error patterns are applied
directly to the macroblocks of the encoded video bitstream. It is assumed that a
transmission error will not affect the EREC header data that carries information
about the length and number of slots. Hence random errors are introduced in the
macroblock data prior to fitting it in the EREC slot structure. For these simulations
we have ignored the effect of any concealment strategies and error correcting coding.
However, these techniques can be used in conjunction with all of the above methods
to further improve error resilience. For example, error correction coding can well be
applied to the output of the EREC encoder.
5.7 Overhead Analysis
The following example, taken from values observed during simulations, gives an idea of
the overhead involved in EREC and compares it with that for MPEG-4
resynchronization markers.
In MPEG-4, resynchronization markers are inserted at the start of a video packet.
The total overhead involved is equal to the sum of the sizes of the "Resynchronization Marker"
and the "Video Packet Header". The video packet header contains information regarding
the Absolute Macroblock Number (MB no.), the Header Extension Code (HEC), etc.
Resynchronization markers are typically inserted after every 736 bits for
bit-rates between 25 Kbits/sec and 48 Kbits/sec. For a P-VOP having a total of 5600 bits,
the insertion of resynchronization markers after every 736 bits means that there are 8
resynchronization markers in this P-VOP. The overhead associated with one
resynchronization marker, as explained above, is the sum of the sizes of the resynchronization
marker and the Video Packet Header. The size of the resynchronization marker varies
between 17 and 23 bits, and the Video Packet Header size can be taken to be approximately
15 bits. This results in an overhead of approximately 35 bits for one
resynchronization marker and hence an overhead of 35 x 8 = 280 bits per VOP. If
the EREC scheme is used instead, one EREC frame accommodates one VOP and it uses one
EREC header. The EREC header has information about the "Total number of bits in EREC
frame" and about the "Slot sizes". We have chosen the EREC header size to be
30 bits; 20 bits are used to specify the EREC frame length and 10 bits are
used to specify slot sizes. This implies an overhead of only 30 bits per P-VOP. Hence
the number of bits saved per VOP using the EREC scheme is 280 - 30 = 250 bits. Since
the EREC information has to be heavily protected, some of the bits saved above can be
spent on FEC coding. For example, if the 30 bit EREC header is protected up to a BER
of 10% using a (32,6) augmented Reed Muller (Distance 16) code, the number of bits
required would be 160, which is still less than the bits saved using EREC.
The situation becomes even better for an I-VOP. For a typical I-VOP having a size
of 34197 bits, the overhead of using resynchronization markers is approximately
1610 bits, while with EREC this overhead is again only 30 bits, saving 1610 - 30 = 1580
bits. This means a saving of about 6 times more bits than for a P-VOP.
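The arithmetic in this section can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the marker count is taken as the VOP size divided by 736 and rounded to the nearest integer (a rounding rule assumed here because it reproduces both figures quoted in the text), and the (32,6) code is assumed to carry 6 header bits in each 32 bit codeword.

```python
def marker_overhead_bits(vop_bits, packet_bits=736, bits_per_marker=35):
    """Approximate resynchronization overhead for one VOP, in bits."""
    markers = round(vop_bits / packet_bits)   # assumed rounding to the nearest marker
    return markers * bits_per_marker


def protected_header_bits(header_bits=30, n=32, k=6):
    """Size of the EREC header after an (n, k) block code, in bits."""
    codewords = -(-header_bits // k)          # ceiling division
    return codewords * n


EREC_HEADER_BITS = 30

p_vop_saving = marker_overhead_bits(5600) - EREC_HEADER_BITS
i_vop_saving = marker_overhead_bits(34197) - EREC_HEADER_BITS

print(marker_overhead_bits(5600), p_vop_saving)    # → 280 250
print(marker_overhead_bits(34197), i_vop_saving)   # → 1610 1580
print(protected_header_bits())                     # → 160
```

With these figures the ratio of the I-VOP saving to the P-VOP saving is 1580 / 250, or about 6.3, consistent with the factor of 6 stated above.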
5.8 Results: Experiment 1
First of all, we show that our proposed scheme can be successfully used to replace the
Resynchronization Markers in MPEG-4. Figures 5.5 through 5.8 show corresponding
frames from two foreman sequences, one coded with the MPEG-4 scheme using
resynchronization markers and the other coded using the transcoded scheme with EREC.
Figure 5.5: A frame of foreman sequence coded using standard MPEG-4 scheme with
resynchronization markers on error free channel.
Figure 5.6: The same frame of foreman sequence coded using proposed transcoding
scheme on error free channel.
Figure 5.7: A frame of foreman sequence coded using standard MPEG-4 scheme with
resynchronization markers on error free channel.
Figure 5.8: The same frame of foreman sequence coded using proposed transcoding
scheme on error free channel.
Figure 5.9, 5.11 and 5.13 show frames from foreman sequence coded using proposed
transcoding scheme with EREC a t random bit error rates of 1 0 - ~ , 10-~, and IO-"
respectively. These results show the performance of EREC in the event of random
channel errors: there is no significant degradation in visual quality as the
bit error rate increases. Unfortunately, we do not have any visual results
from the MPEG-4 scheme with resynchronization markers exposed to the same
random errors, because the MPEG-4 codec available to us was not designed to
handle an erroneous bitstream. However, PSNR values are available for some
test sequences from past research work [13]. Although these PSNR values give a
rough idea of the performance of the MPEG-4 scheme using resynchronization
markers, they cannot be compared directly with our scheme, as we do not know
the exact parameters used at the time of encoding. It is also worth mentioning
that the scheme in [13] used post-processing techniques to conceal the errors
and obtain good picture quality, while we have not used any sophisticated
concealment methods.
Although the results have been simulated using random errors, the nature of
this scheme means it is expected to give improved performance in the case of
burst errors too. Some results in this thesis are presented using Peak Signal
to Noise Ratio (PSNR) as a quality measure. However, PSNR is not a very good
indicator of visual quality: it is an objective measure and does not take into
account the tolerance that the human eye has for some distortion in images.
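For reference, the usual definition of PSNR for 8-bit images can be sketched as follows. This is a minimal illustration that assumes frames are given as flat lists of luminance samples with a peak value of 255; the function name `psnr` is hypothetical and this is not the measurement code used for the results in this chapter.

```python
import math


def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if not reference or len(reference) != len(decoded):
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error between the original and the decoded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


original = [100, 100, 100, 100]
damaged = [110, 90, 110, 90]
print(round(psnr(original, damaged), 2), "dB")
```

Because the measure depends only on the mean squared error, it treats an error in a smooth region and an error in a highly textured region identically, which is why it does not always track perceived quality.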
Figure 5.11: Proposed transcoding scheme using EREC with channel BER of IO-'
Figure 5.12: Proposed transcoding scheme using EREC with channel BER of 10^-5
Figure 5.13: Proposed transcoding scheme using EREC with channel BER of 10^-4
Figure 5.14: Proposed transcoding scheme using EREC with channel BER of IO-'
Figure 5.15: PSNR (dB) vs. channel BER for a frame of the foreman sequence encoded
using the proposed transcoding scheme. (Horizontal axis: bit error rate, with ticks
at 1.00E-06, 1.00E-05, 1.00E-04 and 1.00E-03.)
5.10 Results: Experiment 3
Figure 5.15 shows the degradation of PSNR vs. channel bit error rate (BER) for a
frame of the foreman sequence. The simulation is done using random bit errors and the
results presented are for 10 runs. The PSNR values at different error rates produce
a smooth curve, showing that the quality of the decoded picture degrades gracefully
as the bit error rate is increased. Hence, the EREC scheme not only combats the
effect of an increasing error rate gracefully but achieves this with much less
overhead than the MPEG-4 scheme with resynchronization markers.
Figure 5.16: Proposed transcoding scheme using EREC with channel BER of 10^-6
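The random-error simulation described above can be sketched as a binary symmetric channel applied to the coded bitstream and repeated over several runs. The sketch below is an assumption about the general procedure, not the simulator used for these experiments: the names `flip_random_bits` and `mean_errors_over_runs` are hypothetical, the bitstream is a stand-in, and the number of flipped bits is reported in place of PSNR because no decoder is included.

```python
import random


def flip_random_bits(bits, ber, rng):
    """Flip each bit independently with probability ber (binary symmetric channel)."""
    return [b ^ 1 if rng.random() < ber else b for b in bits]


def mean_errors_over_runs(bits, ber, runs=10, seed=0):
    """Average number of flipped bits over `runs` independent channel realisations."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    total = 0
    for _ in range(runs):
        received = flip_random_bits(bits, ber, rng)
        total += sum(s != r for s, r in zip(bits, received))
    return total / runs


bitstream = [0, 1] * 10_000  # hypothetical 20000-bit stand-in for a coded VOP
for ber in (1e-6, 1e-5, 1e-4, 1e-3):
    print(ber, mean_errors_over_runs(bitstream, ber))
```

In a full experiment the received bits would be passed to the EREC decoder and the PSNR of each decoded frame recorded for each run.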
5.11 Results: Experiment 4
Figures 5.16, 5.17 and 5.18 show frames from the "Table Tennis" (SIF format) sequence
coded using the proposed transcoding scheme with EREC at random bit error rates of
10F6, IO-', and IO-' respectively. These results show the performance of EREC in
the event of random channel errors. There is no significant degradation in
visual quality as the bit error rate increases. Since the Table Tennis
sequence has high spatial detail and fast motion, the damaged areas produced
under the proposed algorithm are subjectively less visible.
Figure 5.17: Proposed transcoding scheme using EREC with channel BER of 1 0 - ~
Figure 5.18: Proposed transcoding scheme using EREC with channel BER of
Figure 5.19: PSNR (dB) vs. channel BER for a frame of the Table Tennis sequence encoded
using the proposed transcoding scheme.
5.12 Results: Experiment 5
Figure 5.19 shows the degradation of PSNR vs. channel bit error rate (BER) for
a frame of the Table Tennis sequence. The simulation is done using random bit errors
and the results presented are for 10 runs. The PSNR values at different error rates
produce a smooth curve, showing that the quality of the decoded picture degrades
gracefully as the bit error rate is increased. Note that the Table Tennis sequence has
very fast motion, and the macroblocks corresponding to high activity regions of an
image are longer because they require many bits to be coded. As explained
earlier, the data placed at the end of longer macroblocks is more likely to suffer from
channel error propagation. The knee of the PSNR vs. BER curve for the Table Tennis
sequence therefore occurs earlier than that for the foreman sequence: Table Tennis
has higher activity and hence many more long macroblocks than foreman, which has a
relatively low amount of movement. Long macroblocks cause error propagation because
their data is spread over other slots at successive offsets. However, the scheme is
still able to combat the effects of the errors and shows a smooth degradation.
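The way the tail of a long block spills into other slots can be illustrated with a small sketch of EREC slot packing as it is generally described in the literature. It is not the implementation used in this thesis: the function name `erec_pack`, the use of bit strings, and the simple offset sequence 0, 1, ..., N-1 are all assumptions made for illustration.

```python
def erec_pack(blocks, slot_sizes):
    """Pack variable-length blocks (bit strings) into fixed-size slots.

    Stage 0 places each block in its own slot. At each later stage, block i
    looks at slot (i + offset) mod N and moves as many of its leftover bits
    as fit into the free space there.
    """
    n = len(blocks)
    if len(slot_sizes) != n:
        raise ValueError("one slot per block is required")
    if sum(slot_sizes) < sum(len(b) for b in blocks):
        raise ValueError("slots are too small to hold all the data")

    slots = [""] * n
    leftover = list(blocks)
    for offset in range(n):          # assumed offset sequence 0, 1, ..., N-1
        for i in range(n):
            if not leftover[i]:
                continue             # block i is already fully placed
            k = (i + offset) % n
            free = slot_sizes[k] - len(slots[k])
            slots[k] += leftover[i][:free]
            leftover[i] = leftover[i][free:]
    return slots


# One long block (high activity) and two short ones, three 4-bit slots.
packed = erec_pack(["1111111", "00", "101"], [4, 4, 4])
print(packed)
```

In this example the last three bits of the long block end up at the ends of the other two slots. A decoder can locate them only after it has correctly found the ends of the short blocks, which is why data at the end of long blocks is the most exposed to error propagation.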
5.13 Conclusion
In this chapter we have considered the performance of our proposed transcoding
scheme using EREC for two test sequences, namely "foreman" and "table tennis".
The "foreman" sequence has medium spatial detail and a low amount of movement,
while "table tennis" has high spatial detail and fast motion. The visual results
presented in this chapter show that the EREC scheme is a good alternative to
resynchronization markers, with much less overhead. In the case of transmission
errors, our proposed scheme also performs well: it limits the propagation of errors
and hence shows a graceful degradation as the channel BER increases. EREC is
especially useful for sequences with fast motion like "table tennis", as most of the
channel error effects are seen as high frequency errors in high activity regions of the
images, which are subjectively less noticeable than errors in low activity
regions. The reason is that, in practice, blocks representing high activity regions
in an image require many bits to be coded and are therefore longer. The bits at the
end of longer blocks typically correspond to high frequency DCT coefficients in
image and video coding. In EREC, because of the inherent error propagation towards
the end of EREC slots, many errors corrupt these high frequency DCT coefficients.
Moreover, the EREC scheme is flexible, and by optimizing the EREC parameters
according to the application it can give improved performance for most images and
video sequences. All the transmission errors are handled by the EREC decoder, which
uses a very coarse form of concealment and simply replaces an erroneous macroblock
with a previous correctly decoded macroblock. EREC is capable of performing even
better than these results reflect if used with more sophisticated error concealment
techniques.
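The coarse concealment described above can be sketched as follows. The text only says that an erroneous macroblock is replaced by a previous correctly decoded one; the sketch assumes this means the most recent good macroblock in decoding order, and the name `conceal` and the list-based representation are hypothetical.

```python
def conceal(macroblocks, decoded_ok):
    """Replace each erroneous macroblock by the last correctly decoded one.

    macroblocks: decoded macroblocks in decoding order (any object type).
    decoded_ok:  parallel list of booleans, False where decoding failed.
    """
    concealed = []
    last_good = None
    for mb, ok in zip(macroblocks, decoded_ok):
        if ok:
            last_good = mb
            concealed.append(mb)
        elif last_good is not None:
            concealed.append(last_good)  # copy the previous good macroblock
        else:
            concealed.append(mb)         # nothing to copy from yet, keep as is
    return concealed


print(conceal(["A", "B", "C", "D"], [True, False, True, False]))
```

A more sophisticated scheme could instead interpolate from spatial neighbours or copy the co-located macroblock from the previous frame.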
Chapter 6
Conclusions and Future Directions
6.1 Conclusions
In this research work we have presented the error resilience aspects of MPEG-4
video. A number of tools have been adopted into the MPEG-4 video standard which
enable robust transmission of compressed video over noisy communication channels
such as wireless links. One of these tools is the use of resynchronization markers to
regain synchronization even when the bitstream is corrupted by channel errors.
The use of resynchronization markers involves a tradeoff between the "amount of data
discarded because of transmission errors" and the "compression efficiency". By using
resynchronization markers more frequently, the amount of data discarded can be
reduced, but at the cost of increased overhead that offsets the compression achieved by
the encoder. There are, however, other methods, not specified by the standard, that
improve the performance of MPEG-4 video over these noisy channels. We
have used such a method, called Error Resilient Entropy Coding (EREC), in our
proposed scheme. EREC is an alternative to resynchronization markers that
limits the amount of data discarded in the event of transmission errors at the cost of
very little overhead. The proposed scheme transcodes the MPEG-4 compressed video
into an error resilient structure using the EREC technique, making it less vulnerable
to channel errors.
The EREC scheme does not suffer from the loss of synchronization and catastrophic
failure typical of channel coding. It is also a better alternative to resynchronization
markers, which provide synchronization at the cost of increased overhead.
Both channel coding and resynchronization markers increase overhead and
sacrifice some of the coding efficiency achieved by the video coding scheme. EREC,
however, achieves more frequent synchronization and enhanced resilience to transmission
errors with much less overhead than these two schemes. The overhead
of channel coding is due to the extra parity bits added to
the compressed bitstream to allow the decoder to correct a certain number of errors.
More powerful channel codes can provide protection at higher bit error rates,
but the more protection channel coding provides, the more overhead
it carries. Although resynchronization markers are bit patterns of fixed
length, they are placed at approximately regular intervals in the compressed
bitstream, and hence they also involve overhead. In contrast to both of these schemes,
EREC requires the least overhead and is still able to provide comparable quality.
The only overhead involved in EREC is the EREC header, which
contains information relating to the length and number of slots. This information is
very important, as the decoder needs it to perform the decoding. Simulation results
show that EREC saves a considerable number of bits per frame by avoiding the use
of resynchronization markers. The saving is even greater in the case of an I-VOP.
Some of the bits saved can be used to protect the EREC header information. The
overhead analysis of resynchronization markers and the EREC header shows
that the EREC scheme has less overhead than resynchronization
markers even if its header is protected very heavily by powerful channel
coding.
The EREC scheme provides a graceful degradation in quality, as it is able
to cope well with burst as well as random errors. The EREC scheme can localize
errors to only the corrupted macroblocks, while resynchronization markers can localize
errors only to the span between two resynchronization markers. In the case
of an error in one macroblock, all the macroblocks between two resynchronization
markers are discarded, which can cause highly annoying visual artifacts. The ability to
provide frequent synchronization and graceful degradation is very advantageous not
only for end users but for content providers as well. From the content provider's point
of view, EREC is a less expensive alternative to channel coding and resynchronization
markers. Channel coding is expensive in terms of added redundant information.
Hence by using EREC, a content provider can accommodate more channels, and hence
more users, under a fixed bandwidth constraint. The EREC scheme will provide the user
almost the same quality of video as obtained by using channel coding, but with much
less noticeable distortion when the error rate on the channel increases. The property of
graceful degradation, rather than abrupt failure of a coding scheme as the channel error
rate increases, is important for many reasons. For one, it provides users with a warning
of difficult conditions over the channel and allows them to improve the situation by
rerouting the channel, changing the frequency of a radio link, or adjusting the
position of the receiver.
EREC is more user-friendly than channel coding. On noisy channels EREC
produces distortion only for approximately the duration of the burst. Hence data
received after the burst can be useful to the user, in contrast to channel coding, which
may cause a complete breakdown of the received picture if the depth of interleaving is
insufficient to deal with the duration of the burst. EREC has many applications in source
coding systems for noisy and unpredictable channels such as wireless, where channel
coding is too expensive because it adds redundant information, and where some loss
of fidelity is preferable to complete breakdown. Example applications include the
transmission of speech, images or video over cellular networks, noisy telephone lines
or weak radio links. If used with proper error concealment techniques, the EREC scheme
can be more useful for images and video sequences with high activity, such as faces,
than for sequences with a low amount of movement, such as landscapes. The
reason is that EREC reduces channel error propagation, and the
remaining error propagation most likely affects high frequency information
in the more active blocks. These errors are subjectively less visible than errors in
inactive regions or errors in low frequencies and motion vectors. Our transcoding
scheme has the added advantage that it can be used with existing coders/decoders
without any need to change them; hence it is standard compatible.
6.2 Future Directions
Synchronization information in the compressed bitstream is also prone to channel
errors. If a channel error corrupts the synchronization marker, then the whole video
packet needs to be discarded. Similarly, an error occurring in the EREC header can
render the whole EREC frame unusable. Future directions of research on "Transcoding of
MPEG-4 using EREC" include a thorough investigation of the protection of the EREC
frame header using forward error correction coding, and a performance comparison
to see how the proposed scheme behaves if the EREC header gets corrupted. At
the same time, the same type of experimentation should be done on standard MPEG-4
schemes by corrupting the resynchronization markers. Furthermore, a complexity
analysis of the EREC decoder should be performed to assess the feasibility of building
an EREC decoder into the MPEG-4 decoder.
Further improvements within EREC are also open to exploration. The
optimization of EREC parameters such as the EREC frame size, number of slots and slot
lengths can lead to more error resilience and a better understanding of the EREC
concept. For example, separate slots can be allocated for AC and DC coefficients, giving
different levels of resilience to coefficients of differing importance. The use of a
pseudo-random offset sequence and the use of "Hierarchical EREC" are further areas to
be explored. Also, the option of dividing a big EREC frame into several subframes
can decrease the complexity of the EREC scheme. More sophisticated techniques can be
used for error concealment in conjunction with EREC.
Widespread use of multimedia communication on wireless channels depends
heavily on the reliable and efficient delivery of image and video sequences. This is a very
active research area right now. With MPEG-4 being the first standard specifically
designed for multimedia communication, standard as well as non-standard approaches
should be explored to make MPEG-4 video more robust. EREC should be considered
one of a selection of techniques for error resilient coding, as it can be used in
conjunction with error correction coding, error concealment and even synchronization
code words for systems that suffer from bit insertion or deletion errors.