
Transcoding of MPEG-4 Compressed Video Over Error-Prone Channels

Aneela Jahan Zaib

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Graduate Department of Electrical and Computer Engineering

University of Toronto

© 2001 Copyright by Aneela Jahan Zaib

National Library of Canada
Acquisitions and Bibliographic Services
395 Wellington Street
Ottawa ON K1A 0N4
Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


Transcoding of MPEG-4 Compressed Video Over Error-Prone Channels

Aneela Jahan Zaib

A thesis submitted in conformity with the requirements for the Degree of Master of Applied Science, Graduate Department of Electrical and Computer Engineering, in the University of Toronto, 2001

Abstract

This thesis considers the performance of MPEG-4 compressed video over noisy channels. It is a design project that proposes a synchronization technique for improving the resilience of MPEG-4 video to transmission errors without adding any extra redundancy to the bitstream. Errors on noisy transmission channels cause a loss of synchronization between the encoder and the decoder. The proposed scheme transcodes the MPEG-4 compressed video bitstream into an error resilient structure for transmission over noisy channels. The Resynchronization Markers traditionally used in MPEG-4 for resynchronization are quite long and hence cause considerable overhead. Furthermore, in the event of a transmission error, all the data between two consecutive Resynchronization Markers must be discarded. The transcoding scheme proposed in this thesis avoids these long Resynchronization Markers and instead achieves resynchronization with minimal overhead using a technique called Error Resilient Entropy Coding (EREC), which also provides enhanced resilience to transmission errors. The proposed scheme is standard-compatible and can be implemented without any change to current codecs.

Acknowledgements

I would like to thank Prof. Anastasios N. Venetsanopoulos and Prof. Kostas Plataniotis, my research supervisors, for their guidance throughout the course of my graduate studies. I am extremely grateful to my family and friends, and all at the Communications Group, who have been so generous in their support. I would like to thank my husband, Jahan Zaib Ali, for his infinite love and never-ending support. It would not have been possible for me to complete this research work successfully without his encouragement. Finally, I thank my precious son, Zoraiz Jahan, who instilled a great deal of tolerance in me and provided a very pleasant diversion in some critical times.

Contents

List of Figures
1 Introduction
1.1 History of Video Compression
1.2 Video Compression
1.2.1 Effect of Transmission Errors
1.2.2 Traditional Approaches to deal with Transmission errors
1.2.3 Error Resilient Approach
1.3 Contribution of the Thesis
1.4 Thesis Organization
2 Image and Video Compression
2.1 Standard Video Coder
2.2 Video Decoder
2.3 MPEG-4 Video Standard
2.3.1 Content Based Functionality: Concept of Video Object Planes
2.4 Visual Bitstream Syntax and Structure
2.4.1 Start Codes
2.4.2 Visual Object Sequence (VS)
2.4.3 Video Object
2.4.4 Video Object Layer
2.4.5 Group of Video Object Planes (GOV)
2.4.6 Video Object Plane (VOP)
2.4.7 Macroblock
2.4.8 Block
2.5 Coding of Shape, Motion and Texture for each VOP
2.6 Support for Conventional as well as Content based Functionalities
3 Error Resilient Video Coding/Decoding
3.1 Traditional Robust Coding: Forward Error Correcting Coding
3.1.1 Bitrate-Quality Tradeoff
3.2 Error Resilient Approach to deal with Transmission errors
3.3 Error Resilience Tools
3.4 MPEG-4 Error Resilience Tools
3.5 Resynchronization in MPEG-4
3.6 Conclusion
4 Error Resilient Entropy Coding (EREC)
4.1 Disadvantages of Using Resynchronization Words
4.2 Error Resilient Entropy Coding
4.3 Assumptions
4.4 Operation of EREC Encoder
4.4.1 Operation of EREC Decoder
4.4.2 EREC Parameters
4.5 Implementation Issues of EREC
4.5.1 Highly Protected Parameters
4.5.2 Buffering
4.5.3 Error Propagation
4.6 EREC Performance in case of Burst Errors
4.7 Conclusion
5 Transcoding of MPEG-4 Video using EREC
5.1 Bitstream Parser
5.1.1 Pseudo-Code
5.2 EREC Encoder
5.3 EREC Decoder
5.4 EREC Decoding in the Presence of Channel Errors
5.5 Limitations of the EREC Scheme
5.5.1 Complexity of EREC Decoder
5.5.2 Error Propagation
5.5.3 Buffering and Delay
5.5.4 Enhancements to the EREC
5.6 Simulation Details
5.7 Overhead Analysis
5.8 Results: Experiment 1
5.9 Results: Experiment 2
5.13 Conclusion
6 Conclusions and Future Directions
6.1 Conclusions
6.2 Future Directions
Bibliography

List of Figures

Two Stage Process for Reducing the Temporal and Spatial Redundancy in Video Sequences
Standard Video Coder
Hierarchical levels in an MPEG-4 bitstream
Example of Visual Information - Logical Structure
Example Visual Bitstream - Separate Configuration Information and Elementary Stream Data
Source and Channel Coding
Decoder can only isolate the error to be between two resynchronization points
An MPEG-4 video packet
Image coding using Huffman coding of transformed blocks
Variable length blocks (left) are fitted into fixed length EREC slot structure (right)
The stages of EREC Encoding Process: Stage 1 (left) and Stage 2 (right)
The stages of EREC Encoding Process: Stage 3 (left) and Final Stage (right)
The MPEG-4 lossless Transcoder
The block diagram of proposed transcoding scheme
Two bitstreams running in parallel in EREC encoder
Logical blocks in EREC encoder
A Frame of foreman sequence coded using standard MPEG-4 scheme with resynchronization markers on error free channel
The same frame of foreman sequence coded using proposed transcoding scheme on error free channel
A Frame of foreman sequence coded using standard MPEG-4 scheme with resynchronization markers on error free channel
The same frame of foreman sequence coded using proposed transcoding scheme on error free channel
Proposed transcoding scheme using EREC with channel BER of
Proposed transcoding scheme using EREC with channel BER of 10^-6
Proposed transcoding scheme using EREC with channel BER of
PSNR (dB) vs Channel BER for a frame of foreman sequence encoded using proposed transcoding scheme
Proposed transcoding scheme using EREC with channel BER of
Proposed transcoding scheme using EREC with channel BER of 10^-5
PSNR (dB) vs Channel BER for a frame of Table Tennis sequence encoded using proposed transcoding scheme

List of Acronyms

AIR: Adaptive Intra Refresh
BER: Bit Error Rate
BMC: Block Motion Compensation
DCT: Discrete Cosine Transform
EREC: Error Resilient Entropy Coding
GOB: Group of Blocks
HEC: Header Extension Code
ISO: International Organization for Standardization
IEC: International Electrotechnical Commission
ITU: International Telecommunication Union
MB: Macroblock
MC: Motion Compensation
ME: Motion Estimation
MM: Motion Marker
QP: Quantization Parameter
RVLC: Reversible Variable Length Code
RM: Resynchronization Marker
VLC: Variable Length Code
VO: Video Object
VOL: Video Object Layer
VOP: Video Object Plane

Chapter 1

Introduction

Multimedia refers to a variety of media, such as voice, data, image and video, which are present either simultaneously or sequentially. In the past, most of the data carried on communication networks was textual. Today, the transmission of multimedia information has become an important application requirement on the Internet, where video clips, animated greeting cards with music, etc. have become more and more common. Compared with traditional textual applications, multimedia applications carry speech, audio and video information simultaneously, and this huge amount of data requires much higher bandwidth. A typical 25-second, 320 × 240 QuickTime movie could take 2.3 Megabits, which is the equivalent of about 1000 screens of textual data [1]. Hence, in multimedia systems, the different sources of information such as voice, data, audio and video are compressed as much as possible before the bits are transmitted over communication channels or written to storage.

In multimedia communications, separate applications are combined for transmission on the one hand, while on the other hand very different, often conflicting, requirements have to be handled simultaneously by sharing the available bandwidth [2]. In particular, image and video produce a large amount of data, and hence image and video communication is considered to be the main system bottleneck. Assuming that wireless access to multimedia data is an important objective for the future, it is necessary to find ways to make the transmission of image and video over wireless channels as efficient and reliable as possible.

1.1 History of Video Compression

ISO/IEC is the main standardization body behind the well-known video compression standards MPEG-1, MPEG-2 and MPEG-4. Although the basic video compression techniques are almost the same for these standards, they differ fundamentally in the applications they address.

The MPEG-1 standard deals with the storage and retrieval of multimedia information on a CD-ROM. It uses JPEG and H.261 (developed by ITU-T) as a starting point but provides many new features, such as frame-based random access of video and fast forward/fast reverse. The channel bitrate for MPEG-1 was assumed to be 1.5 Mbits/sec.

MPEG-2 is focused on high-quality multimedia compression for use in broadcast applications. It hence covers application areas such as Direct Broadcast Satellite (DBS), Digital Versatile Disk (DVD) and High Definition Television (HDTV). The target data rate for MPEG-2 is 4-9 Mbits/sec. It supports both progressive (non-interlaced) and interlaced formats.

MPEG-4 initially began as a low-bitrate standard but has since evolved into the first standard that truly addresses multimedia, providing functionalities such as compression, universal access and content-based interactivity.

Very little attention was given to error resilience and concealment during the development of MPEG-1, but work in this area started in parallel with the compression activities for MPEG-2. Error resilience has now become a major effort within MPEG, which is why the MPEG-4 standard comes with a variety of error resilience tools.

1.2 Video Compression

Image and video transmission is the main system bottleneck for two reasons. First, it requires far more bandwidth than the transmission of other information sources such as speech or data. Second, it is a more difficult problem due to the inherent complexity of the coding methods.

Because of the enormous bandwidth required, video data is typically compressed before it is transmitted on a channel. Generally, the coding techniques offering the greatest amounts of compression are the most vulnerable to errors. In particular, MPEG-encoded video is highly vulnerable to channel errors because of its extensive use of interframe coding, in which errors accumulate upon decoding. In fact, the higher the compression factor, the more vulnerable the bitstream is to channel errors [2]. Error control channel coding can, of course, be applied to the compressed data stream, but the structured redundancy it introduces partly offsets the redundancy removal achieved by compression. We should therefore also look for image and video compression techniques that are less error-prone. In this regard, error resilient video compression schemes have been shown to be very effective: they not only provide efficient protection against channel errors but also come with less overhead than traditional error control channel coding schemes.

1.2.1 Effect of Transmission Errors

If uncompressed video were transmitted, a single bit error would corrupt only a single pixel, which is a small element and would barely be noticeable. In compressed pictures, fewer bits are used to describe the same information, so each bit carries much more meaning. A single bit error in a compressed picture can therefore corrupt large areas of the picture: compressing a video signal reduces the redundancy in each picture, so each error has a much greater impact.
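The damage is easiest to see with variable length codes. The Python sketch below uses a hypothetical four-symbol prefix code (not an MPEG-4 VLC table) and flips a single bit. The decoder reads one codeword as two, so every later symbol lands one position too late, which in video corresponds to data being placed in the wrong macroblock.

```python
# Hypothetical prefix-free VLC table, for illustration only.
VLC = {"A": "0", "B": "10", "C": "110", "D": "111"}
DECODE = {code: sym for sym, code in VLC.items()}


def encode(symbols):
    """Concatenate the codewords of a symbol sequence into one bit string."""
    return "".join(VLC[s] for s in symbols)


def decode(bits):
    """Greedy prefix decoding; returns the symbols and any leftover bits."""
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in DECODE:
            out.append(DECODE[cur])
            cur = ""
    return out, cur


def flip(bits, pos):
    """Simulate a single channel bit error at position pos."""
    return bits[:pos] + ("1" if bits[pos] == "0" else "0") + bits[pos + 1:]


msg = list("ABACDBAADCBA")
received, leftover = decode(flip(encode(msg), 1))
print(len(msg), len(received))  # 12 13
print(received[4:] == msg[3:])  # True: everything after the error is shifted
```

In this toy code the decoder happens to fall back into step after one codeword, yet every following symbol is still misplaced; with real VLC tables the decoder may also produce wrong values or invalid codewords for a long stretch.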

Whenever data is transmitted, whether through cable, radio or a satellite link, the transmitted signal is subjected to distortion and noise. The received signal will therefore differ from the transmitted signal, and transmission errors will have been introduced.

The wireless channel is a noisy fading channel characterized by long bursts of errors [3]. When compressed video data is transmitted over wireless channels, the effect of channel errors can be severe.

1.2.2 Traditional Approaches to deal with Transmission errors

The traditional approach has mainly been to add redundancy through channel coding. Error-detecting checksums and error-correcting codes allow a certain number of errors to be detected or corrected, so a specified error rate can be accommodated before any noticeable degradation of picture quality is observed. However, this approach adds overhead and may therefore offset the compression, which rather defeats the purpose.
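To make the overhead concrete, the Python sketch below implements the classic Hamming (7,4) code. It is chosen here only as a simple illustration; the thesis does not prescribe this code. Every 4 data bits are sent as 7 bits (75% extra bits) in exchange for correcting one bit error per 7-bit codeword.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the error, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
sent = hamming74_encode(data)
sent[4] ^= 1                       # one channel bit error
print(hamming74_decode(sent))      # [1, 0, 1, 1]
print(f"code rate = {4 / 7:.3f}")  # 0.571: 3 of every 7 bits are redundancy
```

Two errors in the same codeword would already be miscorrected, so stronger protection costs even more bits.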

1.2.3 Error Resilient Approach

The main aim of the error resilient approach is to limit the propagation of errors and to make the encoding process robust enough to deal with transmission errors, while at the same time avoiding the addition of much extra redundancy.

1.3 Contribution of the Thesis

It has been found that the loss of bitstream synchronization causes the major artifacts, and sometimes the spatial dislocation, of decoded video data. To achieve and maintain synchronization between encoder and decoder, Synchronization Words (Sync Words) have long been used. These Synchronization Words are unique bit patterns that cannot be emulated by any entry in the Variable Length Coding (VLC) tables. However, they are quite long and hence cause overhead that may offset the compression achieved. Also, because of their length, Synchronization Words are used relatively infrequently. In the event of an error, the data to be discarded is localized between two consecutive Synchronization Words, but this amount is still quite large because Synchronization Words occur relatively infrequently in the compressed bitstream. In MPEG-4, this discarded data can span many macroblocks. Synchronization Words could be inserted more often to limit the amount of data to be discarded, but this would introduce a huge overhead and is not practical.
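The cost of marker-based resynchronization can be sketched in Python as follows. The sketch assumes a hypothetical 17-bit marker of sixteen 0s followed by a 1, and payloads that never contain sixteen consecutive 0s; real MPEG-4 markers and packet contents differ. After an error, the decoder discards everything from the last marker before the error up to the next marker, even though only one bit was wrong.

```python
MARKER = "0" * 16 + "1"  # assumed marker pattern, for illustration only


def discarded_span(bits, error_pos, marker=MARKER):
    """Return the [start, end) bit range dropped after an error at error_pos.

    start: position of the last marker lying entirely before the error (0 if none)
    end:   position of the next marker starting after the error (len(bits) if none)
    """
    start = bits.rfind(marker, 0, error_pos)
    end = bits.find(marker, error_pos + 1)
    return (0 if start < 0 else start), (len(bits) if end < 0 else end)


packet = "10" * 20  # 40-bit payload with no run of 16 zeros
bits = MARKER + packet + MARKER + packet + MARKER + packet
start, end = discarded_span(bits, 90)
print(start, end, end - start)  # 57 114 57: a whole packet is lost for one bad bit
print(f"marker overhead = {len(MARKER) / (len(MARKER) + len(packet)):.1%}")  # 29.8%
```

Shorter packets shrink the discarded span but raise the marker overhead, which is the trade-off described above.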

We suggest the use of a technique called Error Resilient Entropy Coding (EREC), which avoids long synchronization words, provides more frequent synchronization than Synchronization Words do, and is also resilient to transmission errors. The transcoding scheme proposed in this thesis uses EREC to make MPEG-4 compressed video more resilient to the errors occurring over wireless channels, without needing any changes to existing MPEG-4 encoders and decoders.

The main advantages of this scheme are frequent synchronization with minimal overhead and graceful picture degradation: as the error rate increases, the picture quality becomes progressively worse instead of failing abruptly.
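To make the EREC idea concrete, here is a minimal Python sketch of EREC-style packing as it is commonly described: N variable-length blocks are placed into N slots of equal length; each block first fills its own slot, and at stage n a block that still has bits left searches slot (i + n) mod N for free space. The toy prefix code with an explicit end-of-block codeword, the offset sequence phi_n = n, and all function names are assumptions for illustration; this is not the implementation used in this thesis. Because every slot has a known length, block i always starts at the beginning of slot i, so the decoder knows where each block starts without any synchronization words.

```python
import math

# Toy prefix code (hypothetical); "111" marks the end of a block (EOB).
CODE = {"a": "0", "b": "10", "c": "110", "EOB": "111"}
INV = {bits: sym for sym, bits in CODE.items()}


def make_block(symbols):
    """Entropy-code one block (e.g. one macroblock) and terminate it with EOB."""
    return "".join(CODE[s] for s in symbols) + CODE["EOB"]


def is_complete(bits):
    """True as soon as prefix decoding of `bits` reaches the EOB codeword."""
    cur = ""
    for b in bits:
        cur += b
        if cur in INV:
            if INV[cur] == "EOB":
                return True
            cur = ""
    return False


def erec_encode(blocks):
    """Pack N variable-length blocks into N equal-length slots."""
    n = len(blocks)
    slot_len = math.ceil(sum(len(b) for b in blocks) / n)
    slots = ["" for _ in range(n)]
    remaining = list(blocks)  # bits of each block not yet placed
    for stage in range(n):    # offset sequence phi_n = n
        for i in range(n):
            if remaining[i]:
                j = (i + stage) % n
                take = remaining[i][: slot_len - len(slots[j])]
                slots[j] += take
                remaining[i] = remaining[i][len(take):]
    # With total slot capacity >= total bits, N stages always place every bit.
    assert not any(remaining)
    return [s.ljust(slot_len, "0") for s in slots]  # pad unused space


def erec_decode(slots):
    """Recover the blocks by replaying the encoder's search order."""
    n = len(slots)
    read = [0] * n  # read position inside each slot
    out = [""] * n
    done = [False] * n
    for stage in range(n):
        for i in range(n):
            j = (i + stage) % n
            while not done[i] and read[j] < len(slots[j]):
                out[i] += slots[j][read[j]]
                read[j] += 1
                done[i] = is_complete(out[i])
    return out


blocks = [make_block(s) for s in (["a", "b", "c", "c"], ["a"],
                                  ["b", "b", "a", "c", "a"], [])]
slots = erec_encode(blocks)
print(slots)                         # ['01011011', '01110111', '10100110', '11101110']
print(erec_decode(slots) == blocks)  # True
```

The decoder relies on the prefix property to know when each block ends, so only the slot length needs to be sent, which is far cheaper than a long synchronization word per block.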

1.4 Thesis Organization

The thesis is organized as follows. Chapters 2 and 3 provide the theoretical background on image and video compression, error control and error resilient techniques. These chapters focus on traditional approaches to dealing with transmission errors and present a review and some details of the error resilient approach, which limits the propagation of errors without adding much extra redundancy. Chapter 4 introduces Error Resilient Entropy Coding (EREC). Chapter 5 describes the design and details of the scheme proposed in this thesis and presents the results obtained with the implemented scheme. Chapter 6 concludes and presents future directions.

Chapter 2

Image and Video Compression

Image data tend to have a high degree of spatial redundancy. Spatial redundancy implies that pixels are correlated across space. Most image compression schemes use transform-domain block-based coding to exploit the spatial redundancy in images. Similarly, successive pictures (images) of a video sequence are very similar: typically, two successive pictures share a very similar background, and only a small area of the foreground changes. This correlation of video sequences in time is termed temporal redundancy. There is a wide range of lossy video compression methods, but basically they all exploit the spatial and temporal redundancy present in video sequences.

In order to exploit both the spatial and the temporal redundancy, most video coders use a two-stage process, shown in figure 2.1, to achieve good compression [5]. The first stage uses a method that exploits the temporal redundancy between frames. Its output is then coded by a method that exploits the spatial redundancy within the frame. In fact, most current video coding standards, such as H.263 and MPEG-4, are based on this hybrid coding technique, shown in figure 2.2. The hybrid coding technique consists of Block Motion Compensation (BMC) and the Discrete Cosine Transform (DCT): BMC is used to exploit temporal redundancy, while the DCT is used to reduce spatial redundancy.

Figure 2.1: Two Stage Process for Reducing the Temporal and Spatial Redundancy in Video Sequences (Stage 1: processing for reducing temporal redundancy, forming the frame difference between frame (t-1) and frame (t); Stage 2: processing for reducing spatial redundancy)

In discussing the compression of still and moving images, we can distinguish between intraframe and interframe coding. In intraframe coding, redundancy is removed from a single image frame by exploiting the spatial correlation within that frame. A moving image can, of course, be compressed by applying intraframe coding separately to each successive frame. However, much greater compression of moving images is achieved by exploiting the similarity between successive frames, which is termed interframe coding. There are two types of interframe coding: predictive and interpolative.

• Predictive Coding: In predictive coding, one picture frame is first coded using the intraframe coding technique, and then the differences between the reference frame and successive frames are encoded. To prevent error propagation, the process is periodically restarted with a new reference frame.

• Interpolative Coding: In interpolative coding, reference frames are again used, but some frames between reference frames are simply not transmitted and are restored during decompression by interpolating between reference frames.
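The following Python sketch illustrates why predictive coding is periodically restarted. For simplicity it assumes one-dimensional "frames" of four pixels, lossless coding, and that each predicted frame is coded as its difference from the previous frame; the function names are hypothetical. An error in one difference frame persists in every following frame until the next intra-coded reference frame.

```python
def encode_predictive(frames, refresh):
    """Intra-code every `refresh`-th frame; code the others as differences."""
    coded = []
    for t, frame in enumerate(frames):
        if t % refresh == 0:
            coded.append(("I", list(frame)))
        else:
            diff = [c - p for c, p in zip(frame, frames[t - 1])]
            coded.append(("P", diff))
    return coded


def decode_predictive(coded):
    """Rebuild each frame from the previous decoded frame plus the difference."""
    out = []
    for kind, data in coded:
        if kind == "I":
            out.append(list(data))
        else:
            out.append([p + d for p, d in zip(out[-1], data)])
    return out


frames = [[10 * t + k for k in range(4)] for t in range(8)]
coded = encode_predictive(frames, refresh=4)
coded[1][1][0] += 5  # channel error in frame 1's difference data
decoded = decode_predictive(coded)
errors = [sum(abs(a - b) for a, b in zip(f, g)) for f, g in zip(frames, decoded)]
print(errors)  # [0, 5, 5, 5, 0, 0, 0, 0]
```

A longer refresh period saves bits but lets the error survive for more frames.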

Figure 2.2: Standard Video Coder

2.1 Standard Video Coder

Figure 2.2 shows a standard hybrid BMC/DCT video coder configuration, similar to those used in MPEG-2 and JPEG. It also forms the basis of the MPEG-4 video coding scheme. Pictures are coded in one of two modes: intraframe mode or interframe mode. In intraframe mode, pictures are coded without any reference to the previous image, whereas in interframe mode the current image is predicted from the previous image using Block Motion Compensation (BMC), and the difference between the current image and the predicted image, called the residual image, is coded. The basic unit of data that is operated on is called a macroblock (MB): the data (both luminance and chrominance components) corresponding to a block of 16×16 pixels. The input image is split into disjoint macroblocks, and processing is done on a macroblock basis.
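Block Motion Compensation relies on motion estimation, i.e. finding, for each macroblock, the best-matching block in the previous frame. A minimal full-search sketch using the sum of absolute differences (SAD) is shown below. It assumes tiny frames stored as lists of rows, a block size of 4 instead of 16 and a search range of ±2 pixels; full search with SAD is only one common way of doing block matching, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur, ref, bx, by, n, r):
    """Return (best SAD, dx, dy) over all displacements within +/- r pixels."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if not (0 <= by + dy <= h - n and 0 <= bx + dx <= w - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# 8x8 reference frame with distinct pixel values.
ref = [[8 * y + x for x in range(8)] for y in range(8)]
# Current frame: content moved 1 pixel right and 2 pixels down.
cur = [[ref[y - 2][x - 1] if y >= 2 and x >= 1 else 0 for x in range(8)]
       for y in range(8)]
print(full_search(cur, ref, bx=2, by=3, n=4, r=2))  # (0, -1, -2)
```

The vector (-1, -2) points from the current block to its match in the previous frame; only this vector and the (here zero) residual would need to be coded.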

2.2 Video Decoder

The macroblocks are reconstructed at the receiver by a decoder that reverses the encoding process. The variable length codewords present in the received video bitstream are decoded first. For inter macroblocks, the pixel values of the prediction error are reconstructed by inverse quantization and inverse DCT and are then added to the motion-compensated pixels from the previous frame to reconstruct the macroblocks.
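The order of operations can be sketched in Python as below. For brevity the sketch assumes a one-dimensional, 8-sample orthonormal DCT and a plain uniform quantizer with step `step`; MPEG-4 itself uses a two-dimensional 8×8 DCT and its own quantization rules, so this only illustrates the pipeline: residual, DCT and quantization at the encoder, then inverse quantization, inverse DCT and addition of the motion-compensated prediction at the decoder.

```python
import math


def dct(x):
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    return [(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [sum((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)) * c[k]
                * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k in range(n))
            for i in range(n)]


step = 2.0
prediction = [100, 102, 104, 106, 108, 110, 112, 114]  # motion-compensated pixels
current = [101, 104, 103, 108, 110, 109, 115, 113]

# Encoder: transform and quantize the prediction error (residual).
levels = [round(c / step) for c in dct([a - p for a, p in zip(current, prediction)])]

# Decoder: inverse quantization, inverse DCT, then add the prediction.
residual = idct([q * step for q in levels])
reconstructed = [p + r for p, r in zip(prediction, residual)]
print([round(v, 1) for v in reconstructed])
```

Because the transform is orthonormal, the quantization error per coefficient (at most step/2) bounds the pixel error after reconstruction.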

2.3 MPEG-4 Video Standard

There are two different voluntary organizations in the field of visual communication. The first is the International Telecommunication Union, Telecommunication Standardization Sector (ITU-T). The second is the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC). Table 2.1 provides an overview of the standards developed by ISO/IEC.

Table 2.1 Description of Parameters

Standard | Year of Adoption | Functionality | Bit Rate
MPEG-1 | 1992 | Digital Storage Media | 1-2 Mbps
MPEG-2 | 1994 | Broadcast | 4-6 Mbps
MPEG-4 | 1999 | Content Based Interactivity | from 10 Kbps

The video coding standards MPEG-1 and MPEG-2, although perfectly well suited to the environments for which they were designed, are not necessarily flexible enough to efficiently address the requirements of multimedia applications [?]. The MPEG-4 visual standard provides users with a new level of interaction with visual content. It provides technologies to view, access and manipulate objects rather than pixels, with high error robustness over a large range of bit error rates [?]. This section describes the MPEG-4 standard as defined in the ISO/IEC 14496-2 document. The MPEG-4 visual standard has been explicitly optimized for three bitrate ranges:

• Below 64 Kbits/sec
• 64 to 384 Kbits/sec
• 384 Kbits/sec to 4 Mbits/sec

MPEG-4 provides support for both interlaced and progressive material. The supported chrominance format is 4:2:0, in which each of the Cb and Cr components has half as many samples as the luminance component in both the horizontal and vertical directions. The resolutions supported by the MPEG-4 standard range from sub-QCIF to beyond HDTV.
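The 4:2:0 arithmetic can be checked with a short Python sketch (the function name is hypothetical). Each chroma component has a quarter as many samples as the luminance, so a frame carries 1.5 samples per pixel; a QCIF frame (176 × 144) and a single 16×16 macroblock are used as examples.

```python
def samples_420(width, height):
    """Return (luma, samples per chroma component, total) for a 4:2:0 frame."""
    luma = width * height
    chroma = (width // 2) * (height // 2)  # Cb and Cr each
    return luma, chroma, luma + 2 * chroma


print(samples_420(176, 144))  # QCIF: (25344, 6336, 38016)
print(samples_420(16, 16))    # one macroblock: (256, 64, 384)
```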

2.3.1 Content Based Functionality: Concept of Video Object Planes

The MPEG-4 video coding standard supports the functionalities already provided by MPEG-1 and MPEG-2, including the provision to efficiently compress standard rectangular image sequences at varying levels of input formats, frame rates and bit rates [7]. Furthermore, it supports the separate encoding and decoding of content, i.e., physical objects in a scene. Within the context of MPEG-4, the ability to identify and selectively decode and reconstruct video content of interest is referred to as "content-based scalability". Because of this functionality, the contents of images and video sequences can be manipulated and interacted with in the compressed domain, without the need for further segmentation or transcoding at the receiver.

To enable the envisioned content-based interactive functionalities, the MPEG-4 standard introduces the concept of Video Object Planes (VOPs). It is assumed that each frame of an input video sequence is segmented into a number of arbitrarily shaped image regions (Video Object Planes). Each region may cover particular image or video content of interest, i.e., it describes a physical object within the scene. In contrast to the video source format used for the MPEG-1 and MPEG-2 standards, the video to be coded by MPEG-4 is thus no longer considered a rectangular region. The input to be coded can be a VOP image region of arbitrary shape, and the shape and location of the region can vary from frame to frame. Successive VOPs belonging to the same physical object in a scene are referred to as Video Objects (VOs): a Video Object (VO) is a sequence of VOPs of possibly arbitrary shape and position. The shape, motion and texture information of the VOPs belonging to the same VO is encoded and transmitted in a separate Video Object Layer (VOL). In addition, the information needed to identify each of the VOLs, and how the various VOLs are arranged, is also included in the bitstream so that the decoder can reconstruct the entire original sequence at the receiver. Hence, each VOP is decoded separately, which allows flexible manipulation of the video sequence. (The video source input assumed for the VOL structure may be generated by means of on-line or off-line segmentation algorithms.)

If the original input image sequences are not decomposed into several VOLs of arbitrary shape, the coding structure simply degenerates into a single-layer representation, which supports conventional image sequences of rectangular shape. The MPEG-4 content-based approach can thus be seen as a logical extension of the conventional MPEG-1 and MPEG-2 coding approach towards image input sequences of arbitrary shape.

2.4 Visual Bitstream Syntax and Structure

The central concept defined by the MPEG-4 standard is the audio-visual object. An MPEG-4 scene may consist of one or more video objects. An MPEG-4 visual bitstream provides a hierarchical description of a visual scene, as shown in figure 2.3 [8]. Each level of the hierarchy can be accessed in the bitstream through special code values called start codes.

2.4.1 Start Codes

Start codes are specific bit patterns that do not otherwise occur in the video stream. Each start code consists of a start code prefix followed by a start code value. The start code prefix is a string of 23 bits with the value zero followed by a single bit with the value one; it is thus the bit string '0000 0000 0000 0000 0000 0001'. The start code value is an 8-bit integer that identifies the type of start code, as shown in table 2.2.
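A decoder locates the levels of the hierarchy by scanning for these patterns. The Python sketch below assumes byte-aligned start codes, scans for the prefix bytes 00 00 01, and classifies the following value byte using the start code values listed in Table 2.2; the function names and the example byte string are hypothetical.

```python
PREFIX = b"\x00\x00\x01"

NAMES = {0xB0: "visual_object_sequence_start_code",
         0xB1: "visual_object_sequence_end_code",
         0xB3: "group_of_vop_start_code",
         0xB6: "vop_start_code"}


def classify(value):
    """Map an 8-bit start code value to its name (subset of Table 2.2)."""
    if value <= 0x1F:
        return "video_object_start_code"
    if 0x20 <= value <= 0x2F:
        return "video_object_layer_start_code"
    return NAMES.get(value, "other")


def find_start_codes(data):
    """Yield (byte offset, start code value) for each prefix found in `data`."""
    i = data.find(PREFIX)
    while i != -1 and i + 3 < len(data):
        yield i, data[i + 3]
        i = data.find(PREFIX, i + 3)


stream = bytes.fromhex("000001B0 11 00000100 00000120 ABCD 000001B6 55 000001B1")
for offset, value in find_start_codes(stream):
    print(offset, f"{value:02X}", classify(value))
```

Between start codes there is no such landmark, which is why errors inside a VOP are hard to recover from.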

Table 2.2 Some Start Code Values for MPEG-4 Visual Bitstream

Name | Start Code Value (Hexadecimal)
video_object_start_code | 00 through 1F
video_object_layer_start_code | 20 through 2F
visual_object_sequence_start_code | B0
visual_object_sequence_end_code | B1
group_of_vop_start_code | B3
vop_start_code | B6

2.4.2 Visual Object Sequence (VS)

The visual object sequence is the highest syntactic structure of the coded visual bitstream. A visual object sequence commences with a visual_object_sequence_start_code, which is followed by one or more visual objects coded concurrently. The visual object sequence is terminated by a visual_object_sequence_end_code.

2.4.3 Video Object

A video object corresponds to a particular (2-D) object in the scene. In the simplest case it can be a rectangular frame, or it can be an arbitrarily shaped object corresponding to an object or the background. A video object commences with a video_object_start_code and is followed by one or more video object layers.

2.4.4 Video Object Layer

The VOL provides support for scalable coding. Each Video Object may be encoded in scalable (multi-layer) or non-scalable (single-layer) form, depending on the application. The video_object_layer_start_code marks a new video object layer.

2.4.5 Group of Video Object Planes (GOV)

The GOV groups together Video Object Planes. GOVs are optional.

2.4.6 Video Object Plane (VOP)

A VOP is a time sample of a video object. A conventional video frame can be represented by a VOP with rectangular shape. The VOP start code is the bit string '000001B6' in hexadecimal; it marks the start of a video object plane. The VOP contains the encoded video data of a time sample of a video object, that is, motion parameters, shape information and texture data. All this information is coded using macroblocks. The hierarchical levels above can be accessed through their specific start codes, as mentioned above. However, the macroblocks are coded sequentially and there is no explicit boundary between macroblocks.
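Because every start code begins with the same 24-bit prefix, locating the hierarchical levels amounts to scanning for the byte pattern 00 00 01 and reading the value byte that follows. The sketch below does this in Python, assuming that start codes are byte-aligned; the function names are hypothetical, and only the start code values listed in Table 2.2 are classified.

```python
# Start code values from Table 2.2 (only the codes listed there are named).
NAMED_START_CODES = {
    0xB0: "visual_object_sequence_start_code",
    0xB1: "visual_object_sequence_end_code",
    0xB3: "group_of_vop_start_code",
    0xB6: "vop_start_code",
}
START_CODE_PREFIX = b"\x00\x00\x01"  # 23 zero bits followed by a single one bit


def classify_start_code(value: int) -> str:
    """Map the 8-bit start code value to its name in Table 2.2."""
    if 0x00 <= value <= 0x1F:
        return "video_object_start_code"
    if 0x20 <= value <= 0x2F:
        return "video_object_layer_start_code"
    return NAMED_START_CODES.get(value, "other")


def find_start_codes(data: bytes) -> list[tuple[int, str]]:
    """Return (byte offset, name) for every byte-aligned start code in data."""
    found = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(data):
        found.append((pos, classify_start_code(data[pos + 3])))
        # Resume the search after the value byte of this start code.
        pos = data.find(START_CODE_PREFIX, pos + 4)
    return found


if __name__ == "__main__":
    stream = bytes([0, 0, 1, 0xB0, 0x01,
                    0, 0, 1, 0x00,
                    0, 0, 1, 0x20, 0x55,
                    0, 0, 1, 0xB6, 0xAA])
    for offset, name in find_start_codes(stream):
        print(offset, name)
```

Note that the scan finds only the levels that carry start codes; within a VOP the macroblocks still have to be parsed sequentially.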

Figure 2.3: Hierarchical levels in an MPEG-4 bitstream (Visual Object Sequence (VS), Video Object (VO), Video Object Layer (VOL), Group of VOPs (GOV), Video Object Plane (VOP))

2.4.7 Macroblock

A macroblock contains a section of the luminance component and the spatially corresponding chrominance components. The term macroblock can refer either to source and decoded data or to the corresponding coded data elements. A skipped macroblock is one for which no information is transmitted. Presently there is only one chrominance format for a macroblock, namely the 4:2:0 format. The order of the blocks in a macroblock is illustrated below: a 4:2:0 macroblock consists of six blocks, four Y blocks followed by one Cb block and one Cr block.
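The block ordering can be made concrete by cutting one macroblock out of a 4:2:0 picture. In this sketch, frames are plain 2-D lists, the chroma planes are assumed to be subsampled by two in each direction, and the helper name `macroblock_blocks` is hypothetical.

```python
def macroblock_blocks(y, cb, cr, mb_row, mb_col):
    """Return the six 8x8 blocks of a 4:2:0 macroblock: Y1, Y2, Y3, Y4, Cb, Cr.

    y is the luminance plane; cb and cr are half the width and height of y.
    """
    def block(plane, top, left):
        return [row[left:left + 8] for row in plane[top:top + 8]]

    ly, lx = 16 * mb_row, 16 * mb_col  # top-left luminance sample of the macroblock
    cy, cx = 8 * mb_row, 8 * mb_col    # top-left chrominance sample
    return [
        block(y, ly, lx),           # Y1: top-left
        block(y, ly, lx + 8),       # Y2: top-right
        block(y, ly + 8, lx),       # Y3: bottom-left
        block(y, ly + 8, lx + 8),   # Y4: bottom-right
        block(cb, cy, cx),          # Cb
        block(cr, cy, cx),          # Cr
    ]
```

The 16 x 16 luminance area thus contributes four blocks, while each chroma plane contributes a single 8 x 8 block covering the same picture area.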

2.4.8 Block

The term block can refer either to source and reconstructed data or to the DCT

coefficients or to the corresponding coded data elements.

The syntax for the visual bitstream defines two types of information: configuration information and elementary stream data. Configuration information refers to header information such as the Video Object Sequence Header, Video Object Header and Video Object Layer Header. The elementary stream contains the data for a single layer of a video object. Configuration information may be carried separately from, or combined with, the elementary stream data. How multiple elementary streams are multiplexed into a single bitstream is beyond the scope of this work; interested readers are referred to [10] and [11] for more information.

2.5 Coding of Shape, Motion and Texture for each VOP

In the MPEG-4 video standard, the shape, motion and texture information for each VO is coded into a separate VOL in order to support separate decoding of VOs. An identical algorithm is used to code the shape, motion and texture information in each layer. However, if the application requires only high coding efficiency, without the need for extended content-based functionalities, the input image sequence to be coded contains only standard rectangular images [5] and the shape information is not transmitted. In this case the MPEG-4 video coding algorithm is similar to the MPEG-1/2 or H.26X coding algorithms. For coding each VOP image sequence (rectangular or not), the MPEG-4 coding standard uses a hybrid Block Motion Compensation (BMC) and Discrete Cosine Transform (DCT) technique, already employed in MPEG-1/2 [11].

Figure 2.4: Example of Visual Information - Logical Structure (Visual Object Sequence Header; Visual Object 1 and Visual Object 2, each with a Visual Object Header, VOL headers and per-layer elementary streams)

Figure 2.5: Example Visual Bitstream - Separate Configuration Information and Elementary Stream Data

The shape information for arbitrarily shaped VOs is referred to as "alpha planes" in the context of MPEG-4. Shape coding can be lossless or lossy, allowing a tradeoff between bitrate and accuracy. Two kinds of shape information are commonly used: binary shape information and gray-scale shape information [?].

Motion estimation and compensation are commonly used to compress video sequences by exploiting temporal redundancies between frames. The approach to motion compensation in the MPEG-4 standard is similar to that in other video coding standards such as MPEG-2. The main difference is that the block-based techniques have been adapted to the VOP structure used in MPEG-4. There are three modes for encoding an input VOP, namely:

• A VOP may be encoded independently of any other VOP. In this case the encoded VOP is called an Intra VOP (I-VOP).

• A VOP may be predicted (using motion compensation) from another previously decoded reference VOP. Such VOPs are called Predicted VOPs (P-VOPs).

• A VOP may be predicted from past as well as future reference VOPs. Such VOPs are called Bidirectionally Predicted VOPs (B-VOPs). B-VOPs may only be interpolated from I-VOPs or P-VOPs.

Motion estimation is necessary only for coding P-VOPs and B-VOPs.

It is very important to note that the coding of standard MPEG I-frames, P-frames and B-frames is still supported by the MPEG-4 standard, as a special case of input image sequences (VOPs) of rectangular shape.

The texture information of a VOP is present in the luminance component, Y, and the two chrominance components, Cb and Cr, of the video data. In the case of an I-VOP, the texture information resides directly in the luminance and chrominance components. In the case of motion-compensated VOPs, the texture information represents the residual error remaining after motion compensation. For encoding the texture information, the standard 8 x 8 block-based DCT is used [6]. For each macroblock, a maximum of four 8 x 8 luminance blocks (Y1, Y2, Y3, Y4) and two 8 x 8 chrominance blocks, U and V, are coded. For 8 x 8 blocks straddling the VOP borders, an image padding technique is used to fill the macroblock content outside the VOP prior to applying the DCT in intra VOPs. For coding the motion-compensated prediction error in P-VOPs, the contents of the pixels outside the active VOP area are set to 128 [18]. Scanning of the DCT coefficients, followed by quantization and run length coding, is performed using techniques and VLC tables similar to those used in the MPEG-1/2 and H.263 standards. An efficient prediction of the DC and AC coefficients of the DCT is performed for intra coded VOPs. The MPEG-4 standard basically supports all the tools (DCT, motion estimation and compensation, etc.) defined in MPEG-1, H.263 and the MPEG-2 Main Profile. The compressed alpha plane, motion vector and DCT bit words are multiplexed into a VOL bitstream by coding the shape information first, followed by the motion and texture information.
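The scanning and run length coding step can be sketched for a single quantized 8 x 8 block. The sketch uses the classic zigzag scan and emits (LAST, RUN, LEVEL) events in the style of H.263; it treats all 64 coefficients alike, and therefore ignores the separate coding of the intra DC coefficient, the alternate scans used with AC prediction, and the VLC table lookup that follows. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in classic zigzag scan order."""
    positions = [(r, c) for r in range(n) for c in range(n)]
    # Anti-diagonal d = row + col; odd diagonals run with row increasing,
    # even diagonals with row decreasing.
    return sorted(positions,
                  key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))


def run_level_events(block):
    """Zigzag-scan a quantized block and return (last, run, level) events.

    run counts the zero coefficients preceding each non-zero level;
    last is 1 only for the final non-zero coefficient of the block.
    """
    scanned = [block[r][c] for r, c in zigzag_order(len(block))]
    nonzero = [i for i, v in enumerate(scanned) if v != 0]
    events, run = [], 0
    for i, v in enumerate(scanned):
        if v == 0:
            run += 1
            continue
        events.append((int(i == nonzero[-1]), run, v))
        run = 0
    return events


if __name__ == "__main__":
    block = [[0] * 8 for _ in range(8)]
    block[0][0], block[1][0], block[0][2] = 12, -3, 1
    print(run_level_events(block))  # [(0, 0, 12), (0, 1, -3), (1, 2, 1)]
```

Because each event carries its own LAST flag, no separate end-of-block code is needed, and the trailing zeros after the last non-zero coefficient cost nothing.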

2.6 Support for Conventional as well as Content based Functionalities

As indicated above, besides providing new content-based functionalities and error resilience and robustness, the MPEG-4 video coding standard allows the coding of standard rectangular image sequences, as a special case, using single-layer VOP coding. In this coding mode, the VOP is considered to be rectangular instead of arbitrarily shaped. Consequently, since the input image is not segmented into several VOPs, there is only a single layer, instead of the many layers used for coding each VOP separately.

Chapter 3

Error Resilient Video Coding/Decoding

Current video compression standards are not designed for error-prone transmission and can suffer seriously if any of the compressed data is corrupted. The error rates experienced on wireless channels are relatively high in comparison with those on wired networks. For example, a circuit-switched wireline POTS ("plain old telephone service") connection exhibits a random BER of around 10^-6 in the worst case. Wireless channels therefore pose their own particular difficulties for transmission and networking, and the problem is made more severe for the transmission of compressed image sequences. The following section explains why this is the case.

In order to achieve highly efficient image/video compression, many systems use variable length coding (VLC) techniques such as entropy coding and run-length coding. Variable length coding techniques provide much better compression ratios than fixed-rate techniques, but are degraded more severely by channel errors.

In VLC, the boundary between codewords is implicit at the decoder. The variable length decoder reads the compressed bitstream until a full codeword is encountered, translates that codeword into a meaningful symbol, and then begins decoding a new word. When there are transmission errors, the implicit nature of the boundaries between codewords leads to an incorrect number of bits being used in VLC decoding. In other words, if there is an error in a variable length codeword, the decoder may not be able to detect it; instead it decodes an incorrect symbol and subsequently loses synchronization with the encoder. Such errors may not be detected until a unique resynchronization point or start code is encountered in the bitstream.
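The loss of synchronization described above is easy to reproduce with a toy prefix code. The four-symbol table below is hypothetical, not an MPEG-4 VLC table. Because it is a complete code, every bit pattern decodes to something, so a single flipped bit is never reported as an illegal codeword; it only produces wrong symbols, and it can even change how many symbols are decoded.

```python
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}  # hypothetical prefix code
INVERSE = {word: symbol for symbol, word in CODE.items()}


def encode(symbols):
    return "".join(CODE[s] for s in symbols)


def decode(bits):
    """Read bits until a complete codeword is seen, emit its symbol, repeat.

    Returns (decoded symbols, leftover bits that did not form a codeword).
    """
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in INVERSE:
            out.append(INVERSE[current])
            current = ""
    return "".join(out), current


def flip(bits, index):
    """Simulate a single channel error at the given bit position."""
    return bits[:index] + ("1" if bits[index] == "0" else "0") + bits[index + 1:]


if __name__ == "__main__":
    sent = encode("abcdabcd")
    print(decode(sent))            # ('abcdabcd', '')
    print(decode(flip(sent, 2)))   # ('adbdabcd', '')  wrong symbols, same count
    print(decode(flip(sent, 0)))   # ('ccdabcd', '')   one symbol fewer
```

In the second case the decoder silently falls back into step after a few symbols, but if each symbol were a block, every later block would now be assigned to the wrong position, which is exactly the offset problem that resynchronization schemes address.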

The demand for wireless services continues to grow, and as wireless networks become more widely deployed, the need will inevitably arise for a variety of wireless imagery and video transmission capabilities similar to those becoming available in the existing public switched network and wired office environments [10]. Several issues are involved in migrating imagery and video transmission to the wireless environment. First, image and video transmission is the main system bottleneck, since it requires far more bandwidth than other information sources such as speech or data. Second, it is a more difficult problem due to the inherent complexity of the coding methods. Two conflicting requirements arise:

• Limited capacity of the wireless channel. This compels the use of efficient compression techniques for the source data.

• The wireless channel is a noisy channel characterized by long bursts of errors and rapid degradation in signal quality due to interference and multipath fading. In short, the channel quality is highly variable, which gives rise to erroneous transmission. This requires that any coding scheme used must degrade gracefully in the presence of errors introduced by the channel.

Present-day video coding techniques employ predictive coding and motion compensation to exploit temporal and spatial redundancy. Hence predictive coding not only propagates errors to neighboring spatial blocks; errors occurring in one frame also propagate to the following frames.

Due to the extensive use of Variable Length Coding (VLC) and predictive coding, compressed data is very vulnerable to transmission errors. This problem becomes severe for video transmission over wireless channels, firstly because the video data is highly compressed and secondly because wireless channels have higher error rates than wireline channels. This leads to a rapid degradation in the reconstructed video quality.

In short, in video communication over error-prone channels transmission errors will occur, and if compressed digital video is used, the compression algorithm has to be error robust. Two approaches are known to make a video communication system error robust: protection at the transmission level (channel coding) and error robust video compression.

3.1 Traditional Robust Coding: Forward Error Correcting Coding

A traditional method to cope with transmission errors is to employ Forward Error Correcting (FEC) coding. Traditionally, source coding and channel coding have been separated. Figure 3.1 shows the coding of digital data. In the source coding stage, data is compressed, and as much uncontrolled redundancy as possible is removed. In the channel coding stage, controlled redundancy is put back to allow detection and correction of errors at the channel decoder. The addition of controlled redundancy is referred to as forward error correcting (FEC) coding. Some examples are linear block codes, linear cyclic codes and convolutional codes.

3.1.1 Bitrate-Quality Tradeoff

If video is passed down a noisy channel of a fixed capacity, C, then the number of databits, N, and the number of controlled redundancy checkbits, R, must together not exceed the capacity: N + R ≤ C.

Figure 3.1: Source and Channel Coding (source coding, channel coding, noisy channel, channel decoding, source decoding)

The picture quality in the absence of errors is a function of N alone. R governs the error correcting capability of the code, and this affects the picture quality in the presence of channel errors. To combat the effect of channel errors, some databits (N) are traded for FEC checkbits. With FEC, a lower quality is obtained at low error rates, while a higher quality is achieved at high error rates. FEC also produces a much sharper quality curve [4]: beyond a certain error rate, the picture deteriorates very quickly with increasing error rate. This occurs because the error correcting capability of the code has been exceeded, and the decoder attempts to correct multiple errors and does so incorrectly. Thus video protected by FEC degrades very suddenly with little warning, whereas unprotected video degrades more gracefully. FEC techniques provide effective protection against random bit errors, but their performance is usually inadequate against longer burst errors. FEC codes also increase the bitstream size; hence some of the coding efficiency achieved by the video compression scheme is lost, which expands the data rate and may still fail to overcome the effects of errors under severe channel degradation. Moreover, FEC coding is not well suited to channels of highly variable quality: the very powerful FEC codes needed for the worst-case channel would severely reduce compression performance. Such a system would fail catastrophically whenever this worst case is exceeded, and would be overprotected under normal channel conditions [14]. Thus the noisy channel environment presents a difficult tradeoff between the need to constrain transmission rates and the need to provide acceptable video quality in the presence of channel errors. A video compression scheme designed for these channels must degrade gracefully in performance when channel fading occurs.
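The sudden failure beyond a code's correcting capability can be reproduced with the smallest useful linear block code, the Hamming (7,4) code. It is chosen here only for illustration; the text does not say which FEC code would be used. The sketch corrects every single-bit error, but turns every double-bit error into a wrong 4-bit data word, because the decoder "corrects" a third, innocent bit.

```python
from itertools import combinations


def hamming74_encode(data):
    """Encode 4 data bits into 7 bits; parity bits sit at positions 1, 2 and 4."""
    c = [0] * 8                      # index 0 unused, so list index == bit position
    c[3], c[5], c[6], c[7] = data
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    return c[1:]


def hamming74_decode(word):
    """Correct at most one bit error and return the 4 data bits."""
    c = [0] + list(word)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos          # XOR of the positions that hold a 1
    if syndrome:
        c[syndrome] ^= 1             # assume a single error at that position
    return [c[3], c[5], c[6], c[7]]


def with_errors(word, positions):
    """Flip the given 1-based bit positions to simulate channel errors."""
    out = list(word)
    for p in positions:
        out[p - 1] ^= 1
    return out


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    sent = hamming74_encode(data)
    singles_ok = all(hamming74_decode(with_errors(sent, [p])) == data
                     for p in range(1, 8))
    doubles_wrong = all(hamming74_decode(with_errors(sent, pair)) != data
                        for pair in combinations(range(1, 8), 2))
    print(singles_ok, doubles_wrong)  # True True
```

The three checkbits buy perfect protection against one error per 7-bit word and none at all against two, which is the sharp threshold described in the paragraph above.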

Generally, the coding techniques offering the greatest amounts of compression are the most vulnerable to errors [11]. This implies that MPEG-encoded video is highly vulnerable to channel errors, due to its extensive use of interframe coding, which is susceptible to error accumulation upon decoding. In fact, the higher the compression factor, the more vulnerable the bitstream is to channel errors. Of course, error control channel coding can be applied to the compressed data streams, introducing structured redundancy which partly offsets the redundancy removal achieved by compression. However, we should also look for image compression techniques that are less error prone.

3.2 Error Resilient Approach to deal with Transmission errors

What is needed, therefore, is to redesign the compression system to be more resilient to channel errors. The aims of error resilient image and video coding should be:

• To reduce the propagation effects of channel errors and maintain synchronization even when the transmitted bitstream is corrupted by channel errors.

• To provide a graceful degradation in picture quality as the error rate increases.

Error resilient coding schemes thus provide improved performance for both good and poor quality channels, whereas an FEC system may provide superior performance only around the channel quality for which it was designed.

3.3 Error Resilience Tools

In practical video communication schemes, error correcting codes are typically used only to provide a certain level of error protection to the compressed video bitstream, and the video decoder must accept some level of errors in the compressed bitstream. This necessitates the use of error resilience tools to handle the residual errors that remain after error correction, especially when low delay is required. The goal of traditional video coding is to eliminate both spatial and temporal redundancy in the video signal. However, to achieve high video quality for transmission over an error-prone channel, it is highly desirable to design video codecs with error resilience in mind.

Error resilience techniques are the tools employed to improve the error robustness of a communication system. For the transmission of compressed digital video [13], Forward Error Resilience techniques are those in which the encoder plays an important part in improving error robustness, typically by introducing redundancy into the transmitted information. For example, the error resilience tools incorporated into the MPEG-4 video coder are basically Forward Error Resilience tools. These include Resynchronization Markers, Data Partitioning, Reversible Variable Length Codes (RVLCs), Header Extension Code (HEC) and Adaptive Intra Refresh (AIR). Even after error control and correction, some residual errors still exist in the compressed bitstream fed to the video decoder in the receiver, due to transmission over wireless channels. Even a very low BER in this stream will have a devastating effect on the quality of the decoded sequence, because of the high compression and error propagation. Therefore, the video decoder should be robust enough to provide acceptable video quality even in the presence of some residual errors. The following error resilience tools are generally included in a video decoder to make it more robust and minimize the effect of transmission errors [12]:

• Error detection and localization
• Resynchronization
• Data recovery
• Error concealment

Accurate detection of errors is an essential step, since most of the other error resilience techniques can only be invoked once an error is detected. The presence of errors in the compressed bitstream can be signaled by the FEC used in the multiplex layer. The video decoder can also detect errors whenever illegal VLC codewords are encountered in the bitstream, or when decoding a VLC leads to an illegal value of the decoded information, e.g., more than 64 DCT coefficients for an 8x8 DCT block. The detection of an error implies that the decoder has lost synchronization with the encoder.

The decoder is made to fall back into lock step with the encoder by using resynchronization schemes. The encoder inserts unique synchronization words into the bitstream at approximately equally spaced intervals. These synchronization words are chosen so that they cannot be emulated by a valid video bitstream, i.e., no valid combination of the video algorithm's VLC codewords can produce them. After detecting an error, the decoder seeks forward in the bitstream for this known synchronization word. Once the word is found, the decoder falls back into synchronization with the encoder. At this point the decoder has detected an error, regained synchronization with the encoder and isolated the error between two resynchronization points.
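The coefficient-count check and the forward search can be sketched as follows. The (last, run, level) event format, the function names and the 17-bit marker of sixteen zeros followed by a one are assumptions for illustration (in MPEG-4 the resynchronization marker length depends on the VOP coding type); bits are held as a Python string of '0'/'1' characters for readability.

```python
RESYNC_MARKER = "0" * 16 + "1"  # assumed marker: sixteen zeros followed by a one


def too_many_coefficients(events, block_size=64):
    """Illegal-value check: do the decoded (last, run, level) events address
    more coefficient positions than an 8x8 block contains?"""
    used = sum(run + 1 for _last, run, _level in events)
    return used > block_size


def seek_resync(bits, error_pos, marker=RESYNC_MARKER):
    """After an error is detected at error_pos, search forward for the next marker.

    Returns the bit index just after the marker, where decoding can restart,
    or None if no marker follows.
    """
    idx = bits.find(marker, error_pos)
    return None if idx == -1 else idx + len(marker)


if __name__ == "__main__":
    # A corrupted block whose runs point past coefficient 63.
    print(too_many_coefficients([(0, 63, 1), (1, 0, 2)]))  # True
    stream = "1011" + RESYNC_MARKER + "0101" + RESYNC_MARKER + "11"
    print(seek_resync(stream, 5))                          # 42
```

Everything between the previous marker and the restart point returned by `seek_resync` is what the decoder would normally discard.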

Due to the use of VLC, the location in the bitstream where the decoder detects the error is not the same as the location where the error occurred, but some undetermined distance away from it, as shown in figure 3.2. Since the decoder can only isolate the error to be between two synchronization points and cannot pinpoint its exact location, generally all of the data corresponding to the macroblocks between these two resynchronization points needs to be discarded. Otherwise, displaying an image reconstructed from erroneous data can cause highly annoying visual artifacts.

Figure 3.2: Decoder can only isolate the error to be between two resynchronization points (Resync. Pt., Error Location, Error detected, Resync. Pt.; the data between the two resynchronization points is discarded)

After resynchronization is reestablished, data recovery tools such as reversible decoding attempt to recover data that would in general be lost. These special VLCs, called RVLCs, can be decoded in both the forward and reverse directions, which is made possible by using a special kind of VLC table at the encoder for coding the DCT coefficients. By comparing the forward- and reverse-decoded data, the error can then be localized more precisely.
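The two-way decoding idea can be sketched with a toy reversible code. The four-word table is hypothetical, not MPEG-4's RVLC table for DCT coefficients: its codewords are palindromes and none is a prefix of another, so the same table decodes the bitstream read backwards. Each direction stops at the first bit pattern that cannot belong to any codeword; a real decoder would then compare where the two directions stopped to bracket the error.

```python
# Hypothetical reversible VLC: palindromic, prefix-free codewords.
RVLC = {"0": "A", "11": "B", "101": "C", "1001": "D"}
ENCODE = {symbol: word for word, symbol in RVLC.items()}
# Proper prefixes of codewords: patterns that may still grow into a codeword.
PREFIXES = {word[:k] for word in RVLC for k in range(1, len(word))}


def rvlc_encode(symbols):
    return "".join(ENCODE[s] for s in symbols)


def decode_forward(bits):
    """Decode left to right until an illegal pattern appears.

    Returns (symbols, number of bits consumed by complete codewords).
    """
    symbols, current, consumed = [], "", 0
    for i, bit in enumerate(bits):
        current += bit
        if current in RVLC:
            symbols.append(RVLC[current])
            current, consumed = "", i + 1
        elif current not in PREFIXES:
            break  # cannot become a codeword: error detected here
    return symbols, consumed


def decode_backward(bits):
    """Decode right to left; palindromic codewords read the same reversed."""
    symbols, consumed = decode_forward(bits[::-1])
    return symbols[::-1], consumed


if __name__ == "__main__":
    bits = rvlc_encode("ADCBDA")
    print(decode_forward(bits), decode_backward(bits))
    corrupted = bits[:4] + ("1" if bits[4] == "0" else "0") + bits[5:]
    print(decode_forward(corrupted))  # (['A'], 1): stops at the illegal '1000'
```

Detection in either direction is not guaranteed; an error can turn one valid codeword sequence into another, which is why RVLC recovery narrows rather than eliminates the discarded region.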

The remaining operation, error concealment, is a post-processing technique that aims to minimize the impact of data that is in error. Different implementations of wireless video systems use different error concealment strategies, depending on the available computational power and the quality of the channel. One simple strategy is to replace the luminance and chrominance components of erroneous macroblocks with those of the corresponding macroblocks in the previous frame of the video sequence. More complex techniques use some kind of estimation strategy that exploits the local correlation within a frame of the video sequence to fill in the missing information of erroneous blocks.
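The simple temporal replacement strategy can be sketched per colour component. Frames are assumed to be 2-D lists of samples, lost macroblocks are given as (row, column) macroblock coordinates, and the function name is hypothetical; for 4:2:0 video it would be called with a 16-sample macroblock size for Y and 8 for Cb and Cr.

```python
def conceal_macroblocks(current, previous, lost, mb_size=16):
    """Replace each lost macroblock with the co-located one from the previous frame.

    current and previous hold one colour component each; lost is an iterable of
    (mb_row, mb_col) pairs. current is modified in place and also returned.
    """
    for mb_row, mb_col in lost:
        top, left = mb_row * mb_size, mb_col * mb_size
        for r in range(top, top + mb_size):
            # Slice assignment copies this row segment of the previous frame.
            current[r][left:left + mb_size] = previous[r][left:left + mb_size]
    return current


if __name__ == "__main__":
    previous = [[1] * 32 for _ in range(32)]
    current = [[9] * 32 for _ in range(32)]
    conceal_macroblocks(current, previous, [(1, 0)])
    print(current[16][:16] == [1] * 16, current[0][:16] == [9] * 16)  # True True
```

This works well for static background but leaves visible artifacts where there is motion, which is what the more complex estimation-based strategies try to avoid.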

A third type of error resilience technique is possible if the network supports a back channel. In this interactive error resilience technique, the decoder and encoder interact through a feedback path to improve error resilience, either by retransmitting the data or by influencing future encoder actions so as to stop the propagation of detected errors in the decoder.

Figure 3.3: An MPEG-4 video packet (Resync. Marker, MB No., QP, HEC, Macroblock data)

For completeness, the error resilience tools developed for MPEG-4 are described in the following section. They offer a clear set of tools which, when used properly, permit communication of video information in noisy environments. This is a critical breakthrough in video technology, because error-prone environments are very unforgiving to digital video [14].

3.4 MPEG-4 Error Resilience Tools


A number of tools have been incorporated into the MPEG-4 video coder to make it more error resilient and suitable for the transmission of compressed data over wireless channels. These include [15]:

• Resynchronization
• Data Partitioning
• Reversible Variable Length Codes (RVLCs)
• Adaptive Intra Refresh (AIR)

The MPEG-4 standard transmits data in the form of video packets. Each video packet is made up of an integer number of macroblocks in raster-scan order. These macroblocks can span several rows of macroblocks in the image and can even include partial rows of macroblocks.

3.5 Resynchronization in MPEG-4

When a video decoder loses synchronization due to decoding a corrupted bitstream, it becomes unable to identify the precise location in the image where the current data belongs. This results in rapid degradation of the decoded video quality or, in some cases, renders the video unusable.

The MPEG-4 standard uses Resynchronization Markers to achieve resynchronization of the video decoder with the encoder [13]. Resynchronization Markers are specially designed bit patterns that are usually placed at approximately regular intervals in the video bitstream. When the decoder detects an error, it can look for the next resynchronization marker and regain synchronization. There are two main points of concern in achieving resynchronization in MPEG-4:

• MPEG-4 differs from previous video coding standards in the way it inserts these Resynchronization Markers into the bitstream: it inserts a marker at the beginning of each video packet.

• The encoder needs to remove all data dependencies that exist between the data belonging to two different video packets within the same image.

Previous standards such as H.261 and H.263 (Version 1) insert these Resynchronization Markers at the beginning of each GOB [15]. The images to be encoded are logically partitioned into rows of macroblocks called Groups of Blocks (GOBs); in the case of QCIF images, these GOBs correspond to one horizontal row of macroblocks. Hence, in H.263 one row of macroblocks is the smallest region to which an error can be isolated. In MPEG-4, resynchronization markers are not placed after a fixed number of macroblocks (the slice concept); instead, the encoder attempts to space the markers evenly throughout the bitstream to obtain video packets of nearly equal length. Because the markers are inserted at roughly uniform bit intervals (and can only be placed at a macroblock boundary), the macroblock interval between markers is much shorter in high-activity areas, whose macroblocks generate more bits than other parts of the image, and much longer in low-activity areas. This is particularly advantageous for short error bursts, since the decoder can quickly localize the error to within a few macroblocks in the important high-activity areas. Hence MPEG-4 preserves image quality in important areas, in contrast to H.263 (Version 1), where the resynchronization markers are restricted to the beginning of a fixed GOB independent of image content. The recommended spacing of resynchronization markers depends on the bit rate: 480 bits for 24 kb/s and 736 bits for bit rates between 25 kb/s and 48 kb/s.

Another important point is that all predictively coded information is confined within one video packet, so that errors do not propagate to other packets if one video packet in the current image is corrupted. This goal is achieved by inserting two additional fields at the beginning of each video packet, as shown in figure 3.3 [13]. The first field is the absolute Macroblock Number (MB No.) of the first macroblock in the video packet, which indicates the spatial location of that macroblock in the current image. The second field is the Quantization Parameter (QP), which denotes the initial quantization parameter used to quantize the DCT coefficients in the video packet.
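The effect of spacing markers by bits rather than by macroblocks can be sketched with a simple packetizer. The closing rule (end a packet at the first macroblock boundary where it holds at least the target number of bits), the function name and the per-macroblock bit counts are assumptions for illustration; the 480-bit default is the spacing recommended above for 24 kb/s.

```python
def form_video_packets(mb_bits, target_bits=480):
    """Group macroblocks, in raster-scan order, into video packets.

    mb_bits[k] is the number of coded bits produced by macroblock k. A packet is
    closed at the first macroblock boundary where it holds at least target_bits,
    so every resynchronization marker falls on a macroblock boundary.
    Returns a list of packets, each a list of macroblock numbers.
    """
    packets, current, size = [], [], 0
    for mb_number, bits in enumerate(mb_bits):
        current.append(mb_number)
        size += bits
        if size >= target_bits:
            packets.append(current)
            current, size = [], 0
    if current:  # trailing macroblocks form a final, shorter packet
        packets.append(current)
    return packets


if __name__ == "__main__":
    # Hypothetical bit counts: a high-activity region between low-activity ones.
    mb_bits = [40] * 12 + [240] * 4 + [40] * 12
    packets = form_video_packets(mb_bits)
    print([len(p) for p in packets])  # [12, 2, 2, 12] macroblocks per packet
    print([p[0] for p in packets])    # [0, 12, 14, 16] "MB No." header fields
```

The high-activity macroblocks end up two to a packet, so an error burst there costs at most two macroblocks, while the quiet regions are covered by long packets.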

3.6 Conclusion

Only the bitstream syntax and the error-free decoding procedures are standardized in MPEG-4 [10]. There is therefore still room for improvement in other areas of error resilience, such as encoding, error detection, error localization and error concealment.

This research proposal aims to address the problem of error resilient image and video source coding by using an alternative resynchronization tool, called Error Resilient Entropy Coding (EREC). We propose to use it to multiplex the variable length blocks of data produced by the MPEG-4 compression algorithm into an error resilient structure.

Chapter 4

Error Resilient Entropy Coding (EREC)

Error Resilient Entropy Coding (EREC) is a method of synchronization that can be adapted to existing video coding schemes to provide enhanced resilience to transmission errors. EREC achieves synchronization with minimal redundancy and hence maintains the high coding efficiency achieved by the video coding scheme. It is designed to provide graceful degradation in quality as the bit error rate increases, and is superior to channel coding, which fails abruptly when the bit error rate increases beyond its correcting capability.

It is not the loss of any particular bit, but the loss of bitstream synchronization, that causes the most visible corruption of compressed pictures. The block diagram of a general source coding algorithm is shown in figure 4.1. The process named multiplex coding/decoding combines all the data to be coded into a form suitable for transmission, e.g., a single binary bitstream. This stage of coding may include channel coding and the addition of synchronization words that help the decoder regain synchronization once it has been lost due to channel errors.

The most common way to implement multiplex coding is to transmit the variable length blocks consecutively. If this method is adopted and the compressed data is corrupted by transmission errors, the decoder loses synchronization with the start and end of the variable length blocks. Even if synchronization is regained after the loss of a few blocks, the block count is likely to be permanently offset, resulting in the following data being decoded at the wrong location [16]. In image and video coding this can result in large areas of the picture being spatially displaced. As explained earlier, unique synchronization code words (Resynchronization Markers in MPEG-4), followed by some block address information, are inserted at intervals in the compressed bitstream to achieve synchronization between encoder and decoder.

Figure 4.1: Image coding using Huffman coding of transformed blocks (input image, forward transform, forward quantization, forward run length coding, forward Huffman coding and multiplex coding; the corresponding inverse stages produce the output image)


4.1 Disadvantages of Using Resynchronization Words


The disadvantages associated with the use of Resynchronization Markers can be summarized as follows:


• The resynchronization words are relatively long bit patterns. Although the necessary length of these code words is minimized, the constraints they must satisfy still make them quite long. Thus, to maintain low redundancy, they must be used relatively infrequently. This limits the propagation of errors to at most the separation between two Resynchronization Markers, which, because sync words occur relatively infrequently, is still quite large.

• Before inserting the Resynchronization Markers at the start of a video packet, all predictively encoded information must be confined within the video packet to prevent the error propagation caused by the predictive coding/decoding steps of the algorithm. This results in a small sacrifice in coding efficiency.

• By using Resynchronization Markers, the error can only be isolated to the region between two consecutive resynchronization markers, and generally all of the data corresponding to the macroblocks between these two markers needs to be discarded. Hence, many macroblocks are discarded, which can result in highly annoying visual artifacts if they are not concealed properly.

4.2 Error Resilient Entropy Coding

The Error Resilient Entropy Code (EREC) is a method for coding variable length blocks of data before transmitting them on the channel, with low overhead and high resilience to transmission errors. EREC is a method of synchronization that enhances the resilience of the coded bitstream to transmission errors. It does this by reordering the variable length block data produced by video coding schemes into a fixed length slot structure, such that each variable length block starts at a known position in the bitstream. This means that the decoder is automatically synchronized with the encoder at the start of each block. Hence EREC achieves resynchronization at the beginning of each variable length block.

The most advantageous feature of EREC is that it achieves resynchronization more frequently but with minimal redundancy, thus maintaining the high compression achieved by the video coding scheme. EREC has been applied to both still image compression and video compression schemes such as H.261, MPEG-2 and H.263, and has shown improved performance on noisy channels [16]. In this research work, EREC is applied to MPEG-4 video for the first time. The method is described in detail in [dl], [16] and [17], so we give only a brief summary here.

4.3 Assumptions

EREC is a general method of achieving synchronization that can be applied to any image or video coding scheme to provide enhanced resilience to transmission errors, provided the following assumptions are met:

• Each variable length block must be a prefix code. A prefix code is a set of codewords in which no codeword is a prefix of another, so the length of a codeword can be determined by reading its bits one by one, without reading beyond the end of the codeword. With a prefix code, the decoder therefore knows when it has finished decoding a block without any reference to the previous block or to the information that follows.

• The information belonging to each variable length block is causal; that is, channel errors affect only the current and following blocks.

Most video coding standards, such as MPEG-2, H.263 and MPEG-4, use entropy-based source coding such as Huffman coding or arithmetic coding. Hence these assumptions are met by most video coding schemes.
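To illustrate the prefix condition, the following C++ sketch (not part of the thesis implementation; the function names are hypothetical) checks whether a codebook is prefix-free and reads one codeword bit by bit, stopping exactly at the codeword's last bit. Codewords are written as strings of '0'/'1' characters purely for readability.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if no codeword is a prefix of (or equal to) another codeword.
bool isPrefixFree(const std::vector<std::string>& code) {
    for (std::size_t a = 0; a < code.size(); ++a)
        for (std::size_t b = 0; b < code.size(); ++b)
            if (a != b && code[b].compare(0, code[a].size(), code[a]) == 0)
                return false;
    return true;
}

// Reads bits one at a time from `bits`, starting at `pos`, until they match a
// codeword. Returns the codeword index (or -1 if the input runs out first) and
// leaves `pos` just after the last bit read; it never looks ahead.
int readCodeword(const std::vector<std::string>& code,
                 const std::string& bits, std::size_t& pos) {
    assert(isPrefixFree(code));  // otherwise the first match may be the wrong one
    std::string prefix;
    while (pos < bits.size()) {
        prefix.push_back(bits[pos++]);
        for (std::size_t c = 0; c < code.size(); ++c)
            if (code[c] == prefix) return static_cast<int>(c);
    }
    return -1;
}
```

With the code {0, 10, 110, 111}, the input 110010 is read as codewords 2, 0 and 1, each recognized at its last bit. The same property is what lets an EREC decoder recognize the end of a block without an explicit length field.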

4.4 Operation of EREC Encoder

If there are N variable length blocks of data, block i containing b_i bits, then the total number of bits to be encoded by the EREC encoder is:

$$T_b = \sum_{i=1}^{N} b_i$$

The EREC encoder places these variable length blocks of data into a fixed length slot structure called an EREC frame. The EREC frame consists of T_s bits in total, arranged into N slots of fixed lengths s_j, such that:

$$T_s = \sum_{j=1}^{N} s_j \geq T_b$$

The EREC encoding algorithm proceeds in stages. In the first stage, as much data as possible from block i is placed in the corresponding slot j = i. However, some blocks are longer than the slot size, while others are shorter. In subsequent stages of the algorithm, the data left over from the longer blocks is placed in the space left by the shorter blocks. This shifting of data is governed by an offset sequence: at each stage, a block with data still to be placed searches the slot that lies the current offset further along (wrapping around from the last slot to the first). For example, the offset sequence used in figure 4.3 is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

Figures 4.3 and 4.4 show an example of the EREC encoding process. There are 10 variable length blocks of data, giving a total of 70 bits. These 70 bits are rearranged into an EREC frame as shown in figure 4.2. The EREC frame comprises 10 slots, each 7 bits long; the slot length of 7 bits is chosen so that T_s = T_b. At stage 1, the offset is 0 and the blocks are fitted into their corresponding slots. At stage 2, the offset is 1, and each block with data still to be placed searches the slot next to it; if this slot has space, all or as much of the remaining data as possible is placed in it. At stage 3, the offset is 2 and each block with data still to be placed searches the slot two positions along, and so on, until all the variable length block data has been placed into the fixed length slots. The number of variable length blocks, N, bounds the maximum number of stages. In figures 4.3 and 4.4:

• blocks whose number of bits equals the space available in the slot, i.e., b_i = s_j, are placed completely, leaving the slot full (e.g. slot number 7);

• blocks with fewer bits than the slot length, i.e., b_i < s_j, are placed completely, leaving s_j - b_i bits unused in the slot (e.g. slot numbers 2, 4, 5, 6 and 10);

• blocks with more bits than the slot size, i.e., b_i > s_j, have s_j of their bits placed to fill the slot, leaving b_i - s_j bits still to be placed (e.g. slot numbers 1, 3, 8 and 9).

Figure 4.2: Variable length blocks (left) are fitted into the fixed length EREC slot structure (right)

Thus at the end of stage 1, slots 1, 3, 7, 8 and 9 are full, while slots 2, 4, 5, 6 and 10 have space left to hold the remaining data of blocks 1, 3, 8 and 9.
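The stage-by-stage placement described above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: bits are held in std::vector<bool>, slots are indexed from 0, the sequential offset sequence 0, 1, ..., N-1 is used, and the function name erecEncode is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Bits = std::vector<bool>;

// EREC encoding with the sequential offset sequence 0, 1, ..., N-1.
// Stage k offers the leftover bits of block i to slot (i + k) mod N.
std::vector<Bits> erecEncode(const std::vector<Bits>& blocks,
                             const std::vector<std::size_t>& slotSizes) {
    const std::size_t n = blocks.size();
    assert(slotSizes.size() == n);  // one slot per block

    std::vector<Bits> slots(n);
    std::vector<std::size_t> placed(n, 0);  // bits of each block placed so far

    for (std::size_t k = 0; k < n; ++k) {          // stage k uses offset k
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t remaining = blocks[i].size() - placed[i];
            if (remaining == 0) continue;          // block already fully placed
            const std::size_t j = (i + k) % n;     // slot searched at this stage
            const std::size_t space = slotSizes[j] - slots[j].size();
            const std::size_t take = std::min(space, remaining);
            auto first = blocks[i].begin() + static_cast<std::ptrdiff_t>(placed[i]);
            slots[j].insert(slots[j].end(), first,
                            first + static_cast<std::ptrdiff_t>(take));
            placed[i] += take;
        }
    }
    // With total slot capacity T_s >= T_b every bit is placed after N stages;
    // any spare capacity (T_s > T_b) is simply left empty in this sketch.
    for (std::size_t i = 0; i < n; ++i) assert(placed[i] == blocks[i].size());
    return slots;
}
```

For instance, blocks of 11, 4, 10, 5, 6, 3, 7, 12, 9 and 3 bits (hypothetical lengths that total 70 bits and follow the pattern described above) fill all ten 7-bit slots completely. Every block is guaranteed to be placed because each block visits every slot once over the N stages; a block could only be left over if every slot it visited were already full, which would mean T_s < T_b.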

4.4.1 Operation of EREC Decoder

The only information the EREC decoder requires to perform decoding is the number of slots and their sizes. This additional information is carried in the EREC frame header and is encoded using a traditional error correcting code; since it is small, it presents no significant overhead. The decoding process is simply the reverse of encoding. In the first stage of decoding, the decoder parses the data in each slot until it reaches either the End of Block (EOB) or the end of the slot, whichever comes first:

• If the decoder reaches the EOB before the end of the slot, then the block was short and the whole block is successfully decoded.

• If the decoder reaches the end of the slot without finding the EOB, then this was a longer block, and it cannot be completely decoded in the current stage.

In the later stages of the algorithm, the decoder shifts the data from the offset slots back into the blocks to which it belongs, following the same offset sequence. This continues until all of the blocks are reconstructed.

Figure 4.3: The stages of EREC Encoding Process: Stage 1 (left) and Stage 2 (right)

Figure 4.4: The stages of EREC Encoding Process: Stage 3 (left) and Final Stage (right)
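A matching decoder can be sketched in C++ as below. It is an illustration under assumptions, not the thesis code: it uses the sequential offset sequence, and in place of the MPEG-4 parsing functions it takes a hypothetical callback isComplete(i, bitsSoFar) that returns true exactly when the bits read so far form a complete block i (its EOB has been reached). Error handling is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Bits = std::vector<bool>;

// Stand-in for the MPEG-4 parsing functions: must return true exactly when
// `bitsSoFar` holds the complete block `i` (its End of Block has been read).
using EobDetector = std::function<bool(std::size_t i, const Bits& bitsSoFar)>;

// EREC decoding with the sequential offset sequence 0, 1, ..., N-1.
std::vector<Bits> erecDecode(const std::vector<Bits>& slots,
                             const EobDetector& isComplete) {
    const std::size_t n = slots.size();
    std::vector<Bits> blocks(n);
    std::vector<bool> done(n, false);
    std::vector<std::size_t> readPos(n, 0);  // next unread bit in each slot

    for (std::size_t k = 0; k < n; ++k) {          // same stages as the encoder
        for (std::size_t i = 0; i < n; ++i) {
            if (done[i]) continue;
            const std::size_t j = (i + k) % n;     // slot searched at stage k
            // Read until the EOB or the end of the slot, whichever comes first.
            while (!done[i] && readPos[j] < slots[j].size()) {
                blocks[i].push_back(slots[j][readPos[j]++]);
                done[i] = isComplete(i, blocks[i]);
            }
        }
    }
    for (std::size_t i = 0; i < n; ++i) assert(done[i]);  // error-free input assumed
    return blocks;
}
```

As a small example, with slots 10, 11 and 00 and block lengths of 3, 1 and 2 bits, the decoder recovers the blocks 101, 1 and 00: the first block's third bit is recovered from the second slot at the second stage.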

4.4.2 EREC Parameters

The EREC parameters are:

• Total number of bits to be transmitted in an EREC frame, T_s. This parameter is usually kept equal to the total number of bits in all the variable length blocks of data, i.e., T_b.

• Number of slots, N. In simple cases, the number of slots can be made equal to the number of variable length blocks.

• Slot size or slot length, S.

In the proposed scheme, the first two parameters are calculated at the transmitter side, prior to the start of the EREC encoding process, on the basis of the information obtained by the "Bitstream Parser", a stage that precedes the EREC encoder during transcoding. When all slots have the same length, these parameters are related by the simple formula:

$$S = \frac{T_s}{N} = \frac{T_b}{N}$$

The total number of bits in the variable length blocks, T_b, is not necessarily a multiple of the number of slots. In that case, the variable length block data is fitted into an EREC frame by making the first few slots one bit longer than the remaining ones:

$$s_j = \operatorname{DIV}(T_b, N) + 1, \quad \forall\, 0 < j \le \operatorname{MOD}(T_b, N) \tag{4-4}$$

$$s_j = \operatorname{DIV}(T_b, N), \quad \forall\, \operatorname{MOD}(T_b, N) < j \le N$$

For example, if the total number of bits in all the variable length blocks of data is 93 and the number of blocks is 10, then DIV(93, 10) = 9 and MOD(93, 10) = 3, so according to the above equations each of the first 3 slots is made one bit longer than the rest. That is, slots 1, 2 and 3 are 10 bits long, while each of slots 4 to 10 is 9 bits long (3 × 10 + 7 × 9 = 93), so the slots accommodate all the variable length block data.
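Equation (4-4) translates directly into a few lines of C++. The sketch below (the function name is hypothetical) uses 0-based slot indices, so the one-bit-longer slots are indices 0 to MOD(T_b, N) - 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Slot lengths s_j from Eq. (4-4), with 0-based j: the first MOD(T_b, N)
// slots are DIV(T_b, N) + 1 bits long, the rest are DIV(T_b, N) bits long.
std::vector<std::size_t> slotSizes(std::size_t totalBits, std::size_t numSlots) {
    assert(numSlots > 0);
    const std::size_t base = totalBits / numSlots;   // DIV(T_b, N)
    const std::size_t extra = totalBits % numSlots;  // MOD(T_b, N)
    std::vector<std::size_t> sizes(numSlots, base);
    for (std::size_t j = 0; j < extra; ++j) sizes[j] += 1;
    return sizes;
}
```

For the example above, slotSizes(93, 10) yields 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, whose sum is exactly 93.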

The information relating to the number of slots and their sizes is extremely important: if it is received incorrectly, the whole EREC frame is lost. This information must therefore be highly protected with FEC.

4.5 Implementation Issues of EREC

Some issues arise when applying EREC to existing video coding schemes; they are described in this section.

4.5.1 Highly Protected Parameters

The transmission and protection of the header information is the only area in which EREC adds any overhead. For typical EREC frame sizes of 50 Kbits in intra pictures, an 18-bit code is needed to describe this length; protecting it up to a bit error rate (BER) of 10% corresponds to a coding overhead of the order of 0.2% [16]. Despite this requirement to protect the EREC header, EREC is still expected to perform better than Resynchronization Markers, because the bits saved by avoiding Resynchronization Markers exceed the bits spent on protecting the EREC header, even under worst-case channel conditions.

4.5.2 Buffering

All the blocks of data in one EREC frame must have been received by the encoder before EREC encoding can begin. Similarly, the decoder must receive the whole frame before decoding can begin. The EREC scheme therefore introduces a delay of two EREC frames. For MPEG-2 this will typically be eight slices of the picture (128 lines), which usually corresponds to less than 20 of the picture [4].

4.5.3 Error Propagation

If an error occurs in one EREC slot during transmission, it can affect the decoding process in other slots; thus the EREC structure can extend one error to other slots. For example, channel errors can cause the End of Block (EOB) to be missed or falsely detected. This makes all the remaining data for that block incorrect, and the block is marked as erroneous for the current and later stages of the decoding algorithm. Suppose slot 5 contains data from macroblocks 5, 1 and 8. One error in block 5 will then cause errors at the end of blocks 1 and 8, as well as in block 5. Thus one error in a short block may cause errors towards the end of other, longer blocks.

It has been observed that, in the presence of channel errors, the farther the data is from the start of the slot in which it is coded, the more likely it is to be in error [16]. This is because, during the encoding process, some of the data of a long block is fitted into the space left by shorter blocks. If an error occurs in one of the shorter blocks, the decoder may assume an incorrect length for that block. In this case, the EREC decoding process will shift the wrong data back into the long block, corrupting some of the data at the end of the long block. It is therefore important to place the more important information near the start of each slot. For most standards, such as JPEG and MPEG, this is naturally the case, since the coefficients representing the lower frequency parts of the image are coded first [20].

In practice, blocks representing high activity regions of an image require many bits to encode and are therefore longer. In image and video coding, the bits at the end of longer blocks typically correspond to high-frequency information. As described above, the data most likely to suffer from channel error propagation is the data placed in the later stages of the algorithm, i.e., the data near the end of longer blocks. Thus most of the effects of channel errors appear as high-frequency errors in high activity regions of the image. In subjective testing, distortion in these regions is less noticeable than errors in low activity regions, so the EREC scheme provides a subjective benefit as well.

4.6 EREC Performance in the Case of Burst Errors

It is further observed that if two or more errors occur close together, the total number of bits affected is not much greater than for a single error. The reason is that successive errors in a burst are likely to affect the same information, so multiple errors can often be treated as a single error event. For example, if an error in an EREC slot causes loss of synchronization until the next slot, then subsequent errors in that slot have no further effect. Hence EREC copes well with burst errors too [5]. Because of this property, EREC degrades gracefully at higher bit error rates.

4.7 Conclusion

The EREC scheme has many advantages, including very low overhead, frequent synchronization, graceful degradation at increased bit error rates, and the ability to cope well with random as well as burst errors. Frequent synchronization with minimal overhead makes EREC useful for applications where channel coding is too expensive and some loss of fidelity is preferred over complete breakdown at high error rates. Graceful degradation also gives the user some indication of the channel condition, which can then be improved by rerouting the channel, changing the frequency of the radio link, etc. Hence EREC can be used for applications such as speech, image and video transfer over cellular networks and wireless channels. EREC is a very good option for transferring video over wireless channels, where channel conditions are very unpredictable and not only is the error rate high but errors also occur in bursts.

The above arguments suggest that the EREC technique is a very good option for achieving synchronization as well as error resilience when transmitting compressed image and video data over noisy channels. The following part of the thesis describes how we have used the EREC technique to replace the resynchronization markers used in MPEG-4 video.

Chapter 5

Transcoding of MPEG-4 Video using EREC

This chapter describes how we have used EREC for the error resilient transmission of MPEG-4 video over channels subject to random bit errors. It explains in detail the contribution of the research work, the implementation details and the results achieved.

The thesis considers video data that has already been compressed using the MPEG-4 standard video coding scheme. We have used a lossless "black box" approach [4], shown in figure 5.1. The MPEG-4 compressed video data from a standard MPEG-4 encoder is transcoded into a more resilient structure using EREC, transmitted, and finally converted back so that it can be read by a standard MPEG-4 decoder. The transcoder and inverse transcoder are lossless and reversible, so in the absence of channel errors the output is identical to the input. The transcoder is designed not to significantly alter the bit rate, so the transcoded data can be transmitted through the original channel.

We have used a standard MPEG-4 video coder/decoder with the single layer coding option. The proposed scheme replaces the existing method of synchronization used in the standard MPEG-4 codec (Resynchronization Markers) with an alternative synchronization technique called EREC. Synchronization is achieved at the macroblock level in both intra video object planes (I-VOPs) and inter video object planes (P-VOPs) using EREC. This simply means that the macroblock data from the MPEG-4 compressed bitstream is organized into a fixed length slot structure such that each macroblock starts at the beginning of a slot. Since the decoder also knows this fixed length slot structure, synchronization is achieved at the beginning of each macroblock. A very simple concealment method is used, which is explained later in this chapter.

Figure 5.1: The MPEG-4 lossless Transcoder (labels: MPEG-4 compressed bitstream, .cmp, .erec, Inverse Transcoder)

Figure 5.2: The block diagram of proposed transcoding scheme (blocks: MPEG-4 Encoder, Transcoder, lossy channel, Inverse Transcoder; output .yuv)

The MPEG-4 compressed bitstream consists of macroblock data (containing DCT coefficients, motion vectors, etc.), preceded by header information such as the Video Object (VO) header, the Video Object Layer (VOL) header, and Video Object Plane (VOP) headers. The EREC algorithm is based on reorganizing the variable length blocks so that each block starts at a specific, known position within the bitstream.

A bitstream parser is designed to extract the different data from the compressed MPEG-4 bitstream. The input MPEG-4 bitstream is parsed to obtain variable length blocks of data, in this case macroblocks. The bitstream parser extracts the macroblock (MB) data from the compressed bitstream without actually decoding the macroblocks. While doing so, the parser keeps track of the total number of macroblocks, N, and the number of bits in each macroblock. This information is later used to calculate the slot size and the number of slots for the EREC frame. The bitstream parser thus strips out all the synchronization words and header information and produces the macroblock data at its output.

Once all the macroblock data has been obtained from the bitstream parser, it is passed to the EREC encoder, which fits it into a fixed length slot structure. Synchronization is achieved at the start of each macroblock, since each slot starts at the beginning of a macroblock. Hence synchronization is achieved more frequently than in the standard MPEG-4 decoder, i.e., at the macroblock level, without any additional overhead of sync words. The EREC encoder outputs the transcoded data (an .erec file), which is transmitted through the channel along with the necessary header information. At the receiver side, the EREC decoder parses the slots and outputs the data to a bitstream formatter. The output of this formatter is a standard MPEG-4 bitstream (.cmp file), which in the ideal case should be exactly the same as the original bitstream at the transmitter. The standard MPEG-4 decoder operates normally and decodes this .cmp file to produce a .yuv file that can be displayed.

5.1 Bitstream Parser

We designed and developed the code for the bitstream parser using the MPEG-4 decoder code written in C++. We modified the functions in the MPEG-4 decoder that help extract macroblock data from the compressed bitstream. This macroblock data is later used as the input to the EREC encoder.

5.1.1 Pseudo-Code

% The bitstream parser takes the compressed MPEG-4 bitstream at its input
% and produces two files at its output. One of them contains the macroblock
% data from all the VOPs and the other one contains all the header
% information from the MPEG-4 compressed bitstream.
% Every time the program gets bits for parsing, it also transfers
% these bits to the EREC bitstream.

get VO Header information
get VOL Header information
while (end of input has not been reached)
{
    store retrieved header information for further use in an output file
    identify VOP start code
    for each VOP
    {
        get VOP Header
        store retrieved header information for further use in an output file
        parse VOP macroblock data based on VOP type
        if (I-VOP)
        {
            for (int mb = 0; mb < numberOfMBs; mb++)
            {
                parse MB information
                count bits extracted
                for (int blk = 0; blk < numberOfBlocks; blk++)
                {
                    parse block information
                    count bits extracted
                }
            }
        }
        if (P-VOP)
        {
            for (int mb = 0; mb < numberOfMBs; mb++)
            {
                parse MB information
                get motion vector information
                count bits extracted
                for (int blk = 0; blk < numberOfBlocks; blk++)
                {
                    parse block information
                    count bits extracted
                }
            }
        }
        store all the macroblock and motion vector data to be sent to the EREC encoder
    }
}

Figure 5.3: Two bitstreams running in parallel in EREC encoder (MPEG-4 compressed bitstream with VO header, VOL header, VOP1 header and macroblocks of VOP1; the macroblock information is put into a slot structure called an EREC frame according to EREC)

5.2 EREC Encoder

We have implemented the EREC encoder algorithm in C++. The algorithm takes macroblock data as input and fits it into slots of fixed length. The length and number of slots are calculated from the total number of bits in all the variable length macroblocks; the total number of macroblocks and the number of bits in each macroblock are obtained during bitstream parsing. The encoder has two bitstreams running in parallel. The first is the MPEG-4 compressed bitstream. The second is the EREC bitstream, which is empty at the start of the encoder program and is built up as the program progresses. The parser stores all the macroblock data in a file. The information in this file is used to calculate the number of macroblocks in one VOP and the number of bits in each macroblock, which the EREC encoder later uses to calculate the number of slots and the slot size. The EREC encoder starts filling the slots such that each macroblock is put in one slot (starting from the bottom, going to the top). If the length of a macroblock is greater than the slot size, the remaining data from this macroblock is placed in a neighboring slot (having some empty space) at a later stage. Eventually, at stage N-1, all the data from the N macroblocks has been fitted into the N slots. The EREC slot structure is then put sequentially into the EREC bitstream, which is transmitted through the channel and received by the EREC decoder.

Figure 5.4: Logical blocks in EREC encoder (the macroblocks of each VOP are dumped into a macroblock file, converted into the EREC slot structure by applying the EREC algorithm, and dumped into the EREC bitstream; each EREC frame, with its own header, contains the MBs of one VOP)
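The following C++ sketch shows one way the EREC frame could be serialized into the EREC bitstream. The field layout is an assumption based on the header described in Section 5.7: a 20-bit frame length followed by a 10-bit field, taken here to be the number of slots N (the slot sizes then follow from Eq. (4-4)). Bits are written most significant first, the function names are hypothetical, and the FEC protection of the header is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bits = std::vector<bool>;

// Appends `value` as a fixed-width field, most significant bit first.
void putBits(Bits& out, std::uint32_t value, int width) {
    assert(width > 0 && width < 32 && value < (1u << width));
    for (int b = width - 1; b >= 0; --b) out.push_back(((value >> b) & 1u) != 0);
}

// Reads a fixed-width field written by putBits and advances `pos`.
std::uint32_t getBits(const Bits& in, std::size_t& pos, int width) {
    std::uint32_t value = 0;
    for (int b = 0; b < width; ++b) value = (value << 1) | (in.at(pos++) ? 1u : 0u);
    return value;
}

// Assumed frame layout: 20-bit T_s, 10-bit N, then the N slots back to back.
Bits writeErecFrame(const std::vector<Bits>& slots) {
    std::size_t totalBits = 0;
    for (const Bits& s : slots) totalBits += s.size();
    Bits out;
    putBits(out, static_cast<std::uint32_t>(totalBits), 20);
    putBits(out, static_cast<std::uint32_t>(slots.size()), 10);
    for (const Bits& s : slots) out.insert(out.end(), s.begin(), s.end());
    return out;
}
```

A 20-bit length field can describe frames of up to 1,048,575 bits, comfortably above the 34197-bit I-VOP quoted in Section 5.7, and a 10-bit field allows up to 1023 slots, enough for the 396 macroblocks of a CIF frame.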

5.3 EREC Decoder

The EREC decoder takes the ".erec" file created by the EREC encoder. This file differs from the ".cmp" file only in that it has EREC headers (i.e. information about the total number of bits in an EREC frame and the number of slots) and that the macroblock information has been fitted into the EREC frame structure. The boundaries between macroblocks are implicit, i.e. there is no specific codeword at the end of a macroblock. For the EREC decoder to recognize the End of Block (a block being the contents of a slot, in this case a macroblock), the MPEG-4 decoder must be incorporated into the EREC decoder. Since the EREC decoder does not completely decode a slot in one pass (stage), the actual MPEG-4 decoder functions were modified to perform the functions described in Table 5.1.

Table 5.1: The decoding procedure of the EREC decoder

Scenario | Action performed by EREC decoder
End of slot is reached before End of Block (EOB) | The macroblock was longer than the slot (b_i > s_i), and its remaining data is placed in some other offset slot. The slot is flagged as "partially decoded".
End of Block (EOB) is reached before end of slot | The macroblock is shorter than the slot (b_i < s_i) and can be fully decoded at the current stage. The slot is flagged as "fully decoded". However, the rest of this slot holds data belonging to other macroblocks.
End of slot coincides with EOB | The macroblock length is exactly the same as the slot size (b_i = s_i).

The EREC decoder performs decoding in stages until all the slot data has been shifted back to reproduce the variable length macroblocks. This macroblock data is reformatted to create a standard MPEG-4 compressed bitstream (a ".cmp" file), which is fed to the standard MPEG-4 decoder for display.

5.4 EREC Decoding in the Presence of Channel Errors

To deal with the effects of transmission errors, each slot has a flag that is used to mark slots containing errors. Any macroblock placed in a slot that is "in error state" is also marked as erroneous and is not used in subsequent stages of the algorithm. An error is flagged when a coefficient is read that exceeds a certain threshold, when an invalid Huffman code is read, or when the number of coefficients read for a block exceeds the known maximum number of coefficients coded per block. All the macroblocks in error are replaced by the previously correctly decoded macroblocks.
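The three error checks and the concealment step can be sketched as follows. The ParsedBlock structure, the threshold value and the function names are hypothetical stand-ins for what the modified MPEG-4 parsing functions report, and concealment is assumed to copy the co-located 16 × 16 luminance macroblock from the previous decoded frame (chrominance handling is omitted).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical summary of what the parser reported for one 8x8 block.
struct ParsedBlock {
    std::vector<int> coefficients;  // decoded DCT coefficient values
    bool invalidCodeword = false;   // a bit pattern matched no VLC table entry
};

constexpr std::size_t kMaxCoeffsPerBlock = 64;  // an 8x8 block has 64 coefficients

// The three error checks of Section 5.4.
bool blockInError(const ParsedBlock& blk, int threshold) {
    if (blk.invalidCodeword) return true;                           // invalid Huffman code
    if (blk.coefficients.size() > kMaxCoeffsPerBlock) return true;  // too many coefficients
    for (int c : blk.coefficients)
        if (std::abs(c) > threshold) return true;                   // implausible magnitude
    return false;
}

// Assumed concealment: copy the co-located 16x16 luminance macroblock
// (column mbx, row mby) from the previous decoded frame into the current one.
void concealMacroblock(std::vector<unsigned char>& cur,
                       const std::vector<unsigned char>& prev,
                       std::size_t width, std::size_t mbx, std::size_t mby) {
    assert(cur.size() == prev.size());
    for (std::size_t y = 0; y < 16; ++y)
        for (std::size_t x = 0; x < 16; ++x) {
            const std::size_t idx = (mby * 16 + y) * width + mbx * 16 + x;
            cur[idx] = prev[idx];
        }
}
```

The checks are cheap because they reuse information the parser already computes while searching for the End of Block; the threshold would have to be chosen from the range of legal coefficient values for the quantizer in use.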

5.5 Limitations of the EREC Scheme

5.5.1 Complexity of EREC Decoder

The EREC encoder is fast, since it does not have to perform actual decoding: it only parses the bitstream and fits the macroblock data into the slot structure of the EREC frame. The complexity of the EREC decoder is much higher, because to detect the End of Block (EOB) it has to use the parsing functions. These functions are used repeatedly for a particular slot, since in an EREC frame the slot data is spread over other offset slots and must be retrieved from them. For example, for a 352 × 288 frame, which contains 22 × 18 = 396 macroblocks, the final offset value of the decoding process is in most cases 396. This means that there is at least one slot that uses the parsing functions 396 times. The complexity of the EREC scheme also increases with the number of blocks N, and is known to be proportional to log(N) for an efficient implementation. One solution to this complexity problem is to break down the large number of blocks into several subframes of N blocks each. In this way N is reduced, as each EREC frame now contains fewer blocks. However, there is a slight increase in redundancy, as each EREC frame has some header information associated with it.

5.5.2 Error Propagation

If an error occurs in one EREC slot during transmission, it can affect the decoding process in other slots, since the data from longer blocks is fitted into spaces left by shorter blocks. An error in a short block may cause the wrong data to be shifted back into the longer block during EREC decoding, corrupting some of the data at the end of the longer block. However, because of this inherent error extension towards the end of long EREC blocks, many transmission errors will fall in the high-frequency DCT coefficients, since the data at the end of longer macroblocks corresponds to the high-frequency DCT coefficients of high activity regions of the image. Distortions in these regions are visually less noticeable than errors in low activity regions, so the EREC scheme provides a subjective benefit as well.

5.5.3 Buffering and Delay

The EREC algorithm requires that all the data for the N variable length blocks coming from the video encoder be known before EREC encoding starts. Similarly, EREC decoding starts only when the EREC decoder has received the entire EREC frame. This implies a delay of two EREC frames, and significant buffering requirements must also be met. The effect of these limitations can be reduced by dividing one big EREC frame into several subframes of N blocks, at the cost of a slight increase in redundancy, since each EREC frame carries its own header.

5.5.4 Enhancements to the EREC

The EREC performance may be improved by optimizing its parameters for the application in which EREC is used. The following paragraphs discuss some enhancements to EREC [22]. Error propagation can be minimized by placing as much data as possible in the first EREC stage. This is possible if we use a few long slots instead of many short slots: a smaller proportion of the data is then placed towards the end of a slot, and fewer error extensions are observed. However, a few long slots attain synchronization less frequently, while many short slots achieve synchronization more often. The choice between these two options depends on the application. For example, if the encoded data has relatively many high activity regions, there will be many long blocks of data, and we may choose a few long slots to avoid error extensions.

Unequal error protection may be provided to the data according to its importance by using slots of different lengths. For example, one long slot can be used to accommodate all the important motion vector information instead of spreading it across many slots, while the less important AC coefficients are placed in many short slots. The total number of bits sent in an EREC frame does not change, but error extensions will not occur in the motion vector information. It is essential that the decoder knows all the slot lengths in advance.

The choice of offset sequence is also an important parameter. For images and sequences with high activity, the longer blocks will often be clustered together. In this case a simple sequential offset sequence will cause blocks in full slots to search neighboring slots that are also full during the early stages of the algorithm. A pseudo-random offset sequence is a good choice in this case to increase the speed of the EREC encoding process.
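A pseudo-random offset sequence only works if the encoder and decoder regenerate exactly the same sequence, for example from a shared seed. The C++ sketch below is one way to do this and is an assumption, not the scheme of [22]: offset 0 is kept first so that every block still tries its own slot in the first stage, and the remaining offsets 1 to N-1 are shuffled with a hand-written Fisher-Yates loop driven by std::mt19937, whose output sequence is fixed by the C++ standard (the algorithm behind std::shuffle is not). The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Returns a pseudo-random EREC offset sequence of length n: offsets[0] == 0,
// followed by a permutation of 1..n-1 that depends only on `seed`.
std::vector<std::size_t> pseudoRandomOffsets(std::size_t n, std::uint32_t seed) {
    std::vector<std::size_t> offsets(n);
    for (std::size_t k = 0; k < n; ++k) offsets[k] = k;
    std::mt19937 rng(seed);  // same seed -> same sequence at encoder and decoder
    // Fisher-Yates shuffle of positions 1..n-1; the slight modulo bias is
    // irrelevant here because only reproducibility matters.
    for (std::size_t k = n; k > 2; --k) {
        const std::size_t j = 1 + static_cast<std::size_t>(rng() % (k - 1));  // 1..k-1
        std::swap(offsets[k - 1], offsets[j]);
    }
    assert(n == 0 || offsets[0] == 0);
    return offsets;
}
```

Stage k would then search slot (i + offsets[k]) mod N instead of (i + k) mod N. Because the offsets remain a permutation of 0..N-1, every block still visits every slot once, so all the data is still placed within N stages.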

All the above parameters can be optimized for the intended application of EREC; for a given EREC implementation there is an optimum set of parameters.

5.6 Simulation Details

The scheme above has been simulated with the following transmission methods:

• standard MPEG-4 transmission with Resynchronization Markers at the start of each video packet;

• transmission using the EREC encoder and decoder, avoiding the use of synchronization words.

The video sequences used for the simulations are "foreman" (CIF, 352 × 288) and "table tennis" (SIF, 352 × 240). These sequences are in 4:2:0 YUV concatenated format, where each frame is represented by all its luminance (Y) samples, followed by all its chrominance (U) samples and finally all its chrominance (V) samples. The resolution used is 8 bits per sample.
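The byte layout of such a file follows directly from the 4:2:0 subsampling: each chroma plane has half the width and half the height of the luma plane. The small C++ helper below (names hypothetical) computes the plane offsets within one frame; frame f of the file then starts at byte f × frameSize.

```cpp
#include <cassert>
#include <cstddef>

// Byte layout of one 8-bit 4:2:0 frame stored as the Y plane, then U, then V.
struct Yuv420Layout {
    std::size_t ySize;      // luma bytes (also the offset of the U plane)
    std::size_t uOffset;    // start of U within the frame
    std::size_t vOffset;    // start of V within the frame
    std::size_t frameSize;  // total bytes per frame
};

Yuv420Layout yuv420Layout(std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);  // chroma subsampled by 2 both ways
    const std::size_t y = width * height;              // one byte per luma sample
    const std::size_t c = (width / 2) * (height / 2);  // bytes per chroma plane
    return {y, y, y + c, y + 2 * c};
}
```

For CIF this gives 101376 luma bytes, two 25344-byte chroma planes and 152064 bytes per frame; for the 352 × 240 SIF sequence the frame size is 126720 bytes.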

Table 5.2: MPEG-4 Simulation Parameters

Parameter | Value
YUV Format | 4:2:0
Number of Frames | 30
Frame Rate | 30 frames per second
Target Bitrate | 48 kbps
Scalability | None
Alpha Type | None
Quantization step for IVOP | 16
Quantization step for PVOP | 16
PVOPs count between IVOPs | 8
BVOPs count between IVOPs | 0
Temporal Prediction Type | P P ...
Motion Vector Search Range | 16
Sprite Type | None
Data Partitioning | Disabled
Reversible Variable Length Coding | Disabled
Video Packet mode | Disabled

In experiments to explore error robustness, random error patterns are applied directly to the macroblocks of the encoded video bitstream. It is assumed that transmission errors do not affect the EREC header data, which carries information about the length and number of slots. Hence random errors are introduced into the macroblock data prior to fitting it into the EREC slot structure. For these simulations we have ignored the effect of any concealment strategies and error correcting codes. However, these techniques can be used in conjunction with all of the above methods to further improve error resilience; for example, error correction coding can be applied to the output of the EREC encoder.

5.7 Overhead Analysis

The following example, taken from values observed during the simulations, gives an idea of the overhead involved in EREC and compares it with that of MPEG-4 resynchronization markers.

In MPEG-4, resynchronization markers are inserted at the start of each video packet. The total overhead is the sum of the size of the "resynchronization marker" and the "Video Packet Header". The video packet header contains information such as the absolute macroblock number (MB no.) and the Header Extension Code (HEC).

Resynchronization markers are typically inserted every 736 bits for bit rates between 25 Kbits/sec and 48 Kbits/sec. For a P-VOP with a total of 5600 bits, inserting a resynchronization marker every 736 bits means that there are 8 resynchronization markers in this P-VOP. The overhead associated with one resynchronization marker, as explained above, is the sum of the size of the resynchronization marker and the Video Packet Header. The size of the resynchronization marker varies between 17 and 23 bits, and the Video Packet Header size can be taken to be approximately 15 bits. This results in an overhead of approximately 35 bits per resynchronization marker, and hence an overhead of 35 × 8 = 280 bits per VOP. If the EREC scheme is used instead, one EREC frame accommodates one VOP and uses one EREC header. The EREC header carries the "total number of bits in the EREC frame" and the "slot sizes". We have chosen the EREC header size to be 30 bits: 20 bits specify the EREC frame length and 10 bits specify the slot sizes. This implies an overhead of only 30 bits per P-VOP. Hence the number of bits saved per VOP using the EREC scheme is 280 - 30 = 250 bits. Since the EREC header has to be heavily protected, some of the bits saved above can be spent on FEC coding. For example, if the 30-bit EREC header is protected up to a BER of 10% using the (32,6) augmented Reed-Muller code (distance 16), the number of bits required is 160 (five 32-bit codewords, each carrying 6 header bits), which is still less than the bits saved using EREC.
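The arithmetic above can be reproduced with the short C++ sketch below. The per-marker cost of 35 bits, the 736-bit packet spacing and the (32,6) code come from the text; counting one marker per started 736-bit packet (a ceiling) is an assumption, chosen because it reproduces the 8 markers quoted for the 5600-bit P-VOP. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Marker overhead for one VOP: one marker (plus video packet header) per
// started packet of `packetBits` bits, costing `bitsPerMarker` bits each.
std::size_t markerOverheadBits(std::size_t vopBits,
                               std::size_t packetBits = 736,
                               std::size_t bitsPerMarker = 35) {
    assert(packetBits > 0);
    const std::size_t packets = (vopBits + packetBits - 1) / packetBits;  // ceiling
    return packets * bitsPerMarker;
}

// Bits needed to send `headerBits` with an (n, k) block code such as the
// (32,6) Reed-Muller code: each n-bit codeword carries k header bits.
std::size_t protectedHeaderBits(std::size_t headerBits = 30,
                                std::size_t k = 6, std::size_t n = 32) {
    assert(k > 0);
    const std::size_t codewords = (headerBits + k - 1) / k;  // ceiling
    return codewords * n;
}
```

For the P-VOP example, markerOverheadBits(5600) gives 280 bits and protectedHeaderBits() gives 160 bits, so even with a fully protected header EREC still saves 280 - 160 = 120 bits per P-VOP.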

The situation is even better for an I-VOP. For a typical I-VOP of 34197 bits, the overhead of resynchronization markers is approximately 1610 bits, while with EREC the overhead is again only 30 bits, saving 1610 - 30 = 1580 bits. This is about 6 times the saving obtained for a P-VOP (1580/250 is approximately 6.3).
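The overhead figures above can be reproduced with a short calculation. This is a sketch under stated assumptions: the 17-23 bit marker is taken as 20 bits so that marker plus header gives the quoted 35 bits, and the number of video packets is obtained by rounding the VOP size divided by 736 to the nearest integer. That rounding rule is our assumption; it reproduces both the 8 markers quoted for the P-VOP and the approximately 1610 bits quoted for the I-VOP. The names are illustrative.

```python
MARKER_BITS = 20        # assumed midpoint of the 17-23 bit marker range
VP_HEADER_BITS = 15     # approximate video packet header size (from the text)
PACKET_SPACING = 736    # bits between resynchronization markers (from the text)
EREC_HEADER_BITS = 30   # 20-bit frame length + 10-bit slot field (from the text)


def resync_overhead(vop_bits: int) -> int:
    """Approximate resynchronization-marker overhead for one VOP, in bits."""
    # Rounding to the nearest integer is an assumption (see the lead-in).
    packets = round(vop_bits / PACKET_SPACING)
    return packets * (MARKER_BITS + VP_HEADER_BITS)


def erec_saving(vop_bits: int) -> int:
    """Bits saved by replacing resynchronization markers with one EREC header."""
    return resync_overhead(vop_bits) - EREC_HEADER_BITS


if __name__ == "__main__":
    for name, size in (("P-VOP", 5600), ("I-VOP", 34197)):
        print(name, resync_overhead(size), erec_saving(size))
    # P-VOP 280 250
    # I-VOP 1610 1580
```

The saving grows roughly linearly with VOP size, because the marker overhead scales with the number of packets while the EREC header stays fixed at 30 bits.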

5.8 Results: Experiment 1

First of all, we show that our proposed scheme can successfully replace the resynchronization markers in MPEG-4. Figures 5.5 through 5.8 show corresponding frames from two versions of the foreman sequence, one coded with the MPEG-4 scheme using resynchronization markers and the other coded using the proposed transcoding scheme with EREC.

Figure 5.5: A frame of the foreman sequence coded using the standard MPEG-4 scheme with resynchronization markers on an error-free channel.

Figure 5.6: The same frame of the foreman sequence coded using the proposed transcoding scheme on an error-free channel.

Figure 5.7: A frame of the foreman sequence coded using the standard MPEG-4 scheme with resynchronization markers on an error-free channel.

Figure 5.8: The same frame of the foreman sequence coded using the proposed transcoding scheme on an error-free channel.

Figures 5.9, 5.11 and 5.13 show frames from the foreman sequence coded using the proposed transcoding scheme with EREC at random bit error rates of 10^-6, 10^-5 and 10^-4,

respectively. These results show the performance of EREC in the event of random channel errors, and show no significant degradation in visual quality as the bit error rate increases. Unfortunately, we do not have visual results from the MPEG-4 scheme with resynchronization markers exposed to the same random errors, because the available MPEG-4 codec was not designed to handle erroneous bitstreams. However, PSNR values for some test sequences are available from past research work [13]. Although these PSNR values give a rough idea of the performance of the MPEG-4 scheme with resynchronization markers, they cannot be compared directly with our scheme, as we do not know the exact parameters used at encoding time. It is also worth mentioning that the scheme in [13] used post-processing techniques to conceal errors and obtain good picture quality, whereas we have not used any sophisticated concealment methods.

Although the results have been simulated using random errors, the nature of this scheme leads us to expect improved performance for burst errors too. Some results in this thesis are presented using peak signal-to-noise ratio (PSNR) as a quality measure. However, PSNR is not a very good indicator of visual quality: it is an objective measure and does not take into account the tolerance of the human eye for some distortion in images.
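For reference, PSNR for 8-bit samples is 10 log10(255^2 / MSE), where MSE is the mean squared error between the original and decoded pictures. The sketch below computes it over flattened luminance samples; the function name and the choice to return infinity for identical pictures are our own conventions, not taken from the thesis software.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], decoded: Sequence[int], peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized sample arrays."""
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical pictures: PSNR is unbounded
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    original = [16, 32, 64, 128]
    noisy = [17, 31, 64, 130]  # squared errors 1, 1, 0, 4 -> MSE = 1.5
    print(round(psnr(original, noisy), 2))
```

Because the error is averaged over every sample, a few highly visible errors in a flat region and many small errors in a busy region can give the same PSNR, which is one reason it does not fully reflect perceived quality.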

Figure 5.11: Proposed transcoding scheme using EREC with channel BER of 10^-5

Figure 5.12: Proposed transcoding scheme using EREC with channel BER of 10^-5

Figure 5.13: Proposed transcoding scheme using EREC with channel BER of 10^-4

Figure 5.14: Proposed transcoding scheme using EREC with channel BER of 10^-4


Figure 5.15: PSNR (dB) vs. channel BER (bit error rate axis from 10^-6 to 10^-3) for a frame of the foreman sequence encoded using the proposed transcoding scheme.

5.10 Results: Experiment 3

Figure 5.15 shows the degradation of PSNR vs. channel bit error rate (BER) for a frame of the foreman sequence. The simulation uses random bit errors, and the results presented are based on 10 runs. The PSNR values at different error rates produce a smooth curve, showing that the quality of the decoded picture degrades gracefully as the bit error rate increases. Hence, the EREC scheme is not only able to cope gracefully with an increasing error rate, but also achieves this with much less overhead than the MPEG-4 scheme with resynchronization markers.

Figure 5.16: Proposed transcoding scheme using EREC with channel BER of 10^-6


Figures 5.16, 5.17 and 5.18 show frames from the "Table Tennis (SIF format)" sequence coded using the proposed transcoding scheme with EREC at random bit error rates of

10^-6, 10^-5 and 10^-4, respectively. These results show the performance of EREC in the event of random channel errors: there is no significant degradation in visual quality as the bit error rate increases. Because the Table Tennis sequence has high spatial detail and fast motion, the proposed algorithm is shown to make the damaged areas subjectively less visible.

Figure 5.17: Proposed transcoding scheme using EREC with channel BER of 10^-5

Figure 5.18: Proposed transcoding scheme using EREC with channel BER of 10^-4

Figure 5.19: PSNR (dB) vs. channel BER for a frame of the Table Tennis sequence encoded using the proposed transcoding scheme.


Figure 5.19 shows the degradation of PSNR vs. channel bit error rate (BER) for a frame of the Table Tennis sequence. The simulation uses random bit errors, and the results presented are based on 10 runs. The PSNR values at different error rates produce a smooth curve, showing that the quality of the decoded picture degrades gracefully as the bit error rate increases. Note that the Table Tennis sequence has very fast motion, and macroblocks corresponding to high-activity regions of an image are longer because they require many bits to be coded. As explained earlier, the data placed at the end of longer macroblocks is more likely to suffer from channel error propagation. The knee of the PSNR vs. BER curve for the Table Tennis sequence therefore occurs earlier than that for the foreman sequence. This is because Table Tennis has higher activity, and hence many more long macroblocks, than foreman, which has a relatively low amount of movement. Many long macroblocks cause error propagation, because the data from these macroblocks is spread over other offset slots. However, the scheme is still able to combat the effects of the errors and shows a smooth degradation.
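The random-error experiments can be outlined with a binary symmetric channel model: each bit of the transcoded bitstream is flipped independently with probability equal to the BER, and the quality measure is averaged over several runs (10 in the experiments above). The harness below is an assumption about how such a simulation might look, not the code used for the thesis; `metric` stands for a hypothetical callback that would decode a received bitstream and return, for example, its PSNR. Here it is only exercised with a simple bit-error count.

```python
import random
from typing import Callable


def bsc(bits: str, ber: float, rng: random.Random) -> str:
    """Binary symmetric channel: flip each '0'/'1' independently with probability ber."""
    if not 0.0 <= ber <= 1.0:
        raise ValueError("ber must be a probability")
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < ber else b for b in bits
    )


def average_metric(
    bits: str,
    ber: float,
    metric: Callable[[str], float],
    runs: int = 10,
    seed: int = 0,
) -> float:
    """Average metric(received) over independent channel realisations."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    return sum(metric(bsc(bits, ber, rng)) for _ in range(runs)) / runs


if __name__ == "__main__":
    tx = "0110" * 5_000  # 20,000 bits of dummy data

    def bit_errors(rx: str) -> float:
        return float(sum(a != b for a, b in zip(tx, rx)))

    for ber in (1e-6, 1e-5, 1e-4, 1e-3):
        print(f"BER {ber:.0e}: {average_metric(tx, ber, bit_errors):.1f} errors on average")
```

A fixed seed makes each BER point repeatable, and averaging over runs smooths out the large run-to-run variation that occurs at low BER, where a frame may receive zero or one error.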

5.13 Conclusion

In this chapter we have considered the performance of our proposed transcoding scheme using EREC for two test sequences, namely "foreman" and "table tennis". The "foreman" sequence has medium spatial detail and a low amount of movement, while "table tennis" has high spatial detail and fast motion. The visual results presented in this chapter show that the EREC scheme is a good alternative to resynchronization markers, with much less overhead. In the presence of transmission errors, our proposed scheme also performs well, limits the propagation of errors and hence shows graceful degradation as the channel BER increases. EREC is especially useful for sequences with fast motion like "table tennis", as most of the channel error effects appear as high-frequency errors in high-activity regions of the images, which are subjectively less noticeable than errors in low-activity regions. The reason is that, in practice, blocks representing high-activity regions of an image require many bits to be coded and are therefore longer. In image and video coding, the bits at the end of longer blocks typically correspond to high-frequency DCT coefficients. In EREC, because of the inherent error propagation towards the end of the slots, many errors will corrupt the high-frequency DCT coefficients. Thus most of the effects of channel errors are seen as high-frequency errors in high-activity regions of the image, which are visually less noticeable. Moreover, the EREC scheme is flexible, and by optimizing the EREC parameters for the application it can give improved performance for most images and video sequences. All transmission errors are handled by the EREC decoder, which uses a very coarse form of concealment and simply replaces an erroneous macroblock with a previously correctly decoded macroblock. EREC is therefore capable of performing even better than these results reflect if used with more sophisticated error concealment techniques.
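The slot-filling behaviour referred to above, where long blocks spill their final bits into offset slots, can be illustrated with a minimal sketch of basic EREC placement. It assumes the simplest offset sequence, phi_n = n, one variable-length block per macroblock, and a slot length chosen so that the total slot capacity is at least the total number of bits; the function names are hypothetical. For simplicity the decoder sketch is given the block lengths, whereas a real EREC decoder infers where each block ends from the variable-length code syntax itself.

```python
from typing import List


def erec_pack(blocks: List[str], slot_len: int) -> List[str]:
    """Place variable-length blocks (bit strings) into equal-length EREC slots.

    Assumes the simple offset sequence phi_n = n and that
    len(blocks) * slot_len >= total number of bits.
    """
    n = len(blocks)
    if n * slot_len < sum(len(b) for b in blocks):
        raise ValueError("total slot capacity is smaller than the data")
    slots: List[str] = [""] * n     # bits already placed in each slot
    remaining = list(blocks)        # bits of each block not yet placed
    for stage in range(n):
        for i in range(n):
            if not remaining[i]:
                continue
            j = (i + stage) % n     # stage 0: own slot; later stages: offset slots
            space = slot_len - len(slots[j])
            if space > 0:
                slots[j] += remaining[i][:space]
                remaining[i] = remaining[i][space:]
    # After n stages every bit has been placed; pad unused space with zeros.
    return [s.ljust(slot_len, "0") for s in slots]


def erec_unpack(slots: List[str], lengths: List[int]) -> List[str]:
    """Recover the blocks by replaying the placement (block lengths assumed known)."""
    n = len(slots)
    slot_len = len(slots[0])
    used = [0] * n                  # bits of each slot already consumed
    need = list(lengths)            # bits still missing from each block
    parts: List[List[str]] = [[] for _ in range(n)]
    for stage in range(n):
        for i in range(n):
            if need[i] == 0:
                continue
            j = (i + stage) % n
            take = min(slot_len - used[j], need[i])
            if take > 0:
                parts[i].append(slots[j][used[j]:used[j] + take])
                used[j] += take
                need[i] -= take
    return ["".join(p) for p in parts]


if __name__ == "__main__":
    blocks = ["1011", "0", "111000", "01"]                 # 13 bits in 4 blocks
    slot_len = -(-sum(map(len, blocks)) // len(blocks))    # ceil(13 / 4) = 4
    slots = erec_pack(blocks, slot_len)
    print(slots)  # ['1011', '0000', '1110', '0100']
    assert erec_unpack(slots, [len(b) for b in blocks]) == blocks
```

In this example, block 2 (6 bits) does not fit in its own 4-bit slot, so its last two bits are placed in slot 3 at stage 1. An error in slot 3 could therefore corrupt the tail of block 2 as well as block 3, which mirrors the propagation towards the ends of long blocks described above, while the start of every block stays at a known position.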

Chapter 6

Conclusions and Future Directions

6.1 Conclusions

In this research work we have presented the error resilience aspects of MPEG-4 video. A number of tools have been adopted into the MPEG-4 video standard to enable robust transmission of compressed video over noisy communication channels such as wireless links. One of these tools is the use of resynchronization markers to achieve synchronization even when the bitstream is corrupted by channel errors. The use of resynchronization markers involves a tradeoff between the "amount of data discarded because of transmission errors" and the "compression efficiency". By inserting resynchronization markers more frequently, the amount of discarded data can be reduced, but at the cost of increased overhead that offsets the compression achieved by the encoder. There are, however, other methods, not specified by the standard, that improve the performance of MPEG-4 video over these noisy channels. We have used one such method, Error Resilient Entropy Coding (EREC), in our proposed scheme. EREC is an alternative to resynchronization markers that limits the amount of data discarded in the event of transmission errors at the cost of very little overhead. The proposed scheme transcodes MPEG-4 compressed video into an error-resilient structure using the EREC technique, making it less vulnerable to channel errors.
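The resynchronization-marker tradeoff described above can be illustrated with a rough model. The sketch assumes the approximately 35-bit per-packet overhead estimated in Chapter 5, independent random bit errors, and that any error causes its whole video packet to be discarded; it ignores MPEG-4 tools such as data partitioning and reversible VLCs, and the function name is hypothetical.

```python
from typing import Tuple

OVERHEAD_PER_PACKET = 35  # resynchronization marker + video packet header (approx. bits)


def resync_tradeoff(vop_bits: int, spacing: int, ber: float) -> Tuple[float, float]:
    """Return (overhead bits, expected discarded bits) for one VOP.

    Simplifying assumptions: the number of packets is treated as vop_bits / spacing,
    bit errors are independent, and any error discards its whole packet.
    """
    packets = vop_bits / spacing
    overhead = packets * OVERHEAD_PER_PACKET
    p_packet_hit = 1.0 - (1.0 - ber) ** spacing  # P(at least one error in a packet)
    expected_discarded = packets * p_packet_hit * spacing
    return overhead, expected_discarded


if __name__ == "__main__":
    for spacing in (368, 736, 1472):
        ovh, lost = resync_tradeoff(vop_bits=5600, spacing=spacing, ber=1e-4)
        print(f"spacing {spacing:4d}: overhead {ovh:6.1f} bits, expected loss {lost:6.1f} bits")
```

Under this model, halving the packet spacing roughly doubles the marker overhead while reducing the expected amount of discarded data, which is the tradeoff that EREC aims to sidestep.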

The EREC scheme does not suffer from the loss of synchronization and catastrophic failure typical of channel coding. Similarly, it is a better alternative to resynchronization markers, which provide synchronization at the cost of increased overhead. Both channel coding and resynchronization markers increase overhead and sacrifice some of the coding efficiency achieved by the video coding scheme. EREC, however, achieves more frequent synchronization and enhanced resilience to transmission errors with much less overhead than either of these two schemes. The overhead of channel coding comes from the extra parity bits added to the compressed bitstream to allow the decoder to correct a certain number of errors. More powerful channel codes can provide protection at higher bit error rates; in fact, the more protection channel coding provides, the more overhead it incurs. Although each resynchronization marker is a short bit pattern, markers are placed at approximately regular intervals in the compressed bitstream, and hence they also add overhead. In contrast to both of these schemes, EREC requires less overhead and is still able to provide comparable quality. The only overhead in EREC is the EREC header, which contains information about the length and number of slots. This information is very important, as the decoder uses it to perform the decoding. Our overhead analysis shows that EREC saves a considerable number of bits per frame by avoiding the use of resynchronization markers, and the saving is even larger for an I-VOP. Some of the saved bits can be used to protect the EREC header information. Comparing the overhead of resynchronization markers with that of the EREC header shows that the EREC scheme has less overhead than resynchronization markers even if its header is protected very heavily with powerful channel coding.

The EREC scheme provides a graceful degradation in quality, as it has the capability to cope well with burst as well as random errors. EREC can localize errors to only the corrupted macroblocks, while resynchronization markers can localize errors only to the interval between two resynchronization markers. In case of an error in one macroblock, all the macroblocks between two resynchronization markers are discarded, which can cause highly annoying visual artifacts. The ability to provide frequent synchronization and graceful degradation is very advantageous not only for end users but also for content providers. From a content provider's point of view, EREC is a less expensive alternative to channel coding and resynchronization markers. Channel coding is expensive in terms of added redundant information. Hence, by using EREC, a content provider can accommodate more channels, and hence more users, under a fixed bandwidth constraint. The EREC scheme provides the user with almost the same video quality as channel coding, but with much less noticeable distortion when the error rate on the channel increases. Graceful degradation, rather than abrupt failure of a coding scheme as the channel error rate increases, is important for many reasons. For example, it warns users of difficult channel conditions and allows them to improve the situation by rerouting the channel, changing the frequency of a radio link or adjusting the position of the receiver.

EREC is more user-friendly than channel coding. On noisy channels, EREC produces distortion lasting only approximately as long as the burst, so data received after the burst can still be useful to the user. In contrast, channel coding may cause a complete breakdown of the received picture if the interleaving depth is insufficient to deal with the duration of the burst. EREC has many applications in source coding systems for noisy and unpredictable channels, such as wireless, where channel coding is too expensive because it adds redundant information, and where some loss of fidelity is preferable to complete breakdown. Example applications include the transmission of speech, images or video over cellular networks, noisy telephone lines or weak radio links. If used with proper error concealment techniques, the EREC scheme can be more useful for images and video sequences with high activity, such as faces, than for sequences with a low amount of movement, such as landscapes. The reason is that EREC reduces channel error propagation, and the remaining propagation most likely affects high-frequency information in the more active blocks. These errors are subjectively less visible than errors in inactive regions or errors in low frequencies and motion vectors. Our transcoding scheme has the added advantage that it can be used with existing coders/decoders without any need to change them; hence it is standard-compatible.

6.2 Future Directions

Synchronization information in the compressed bitstream is also prone to channel errors. If a channel error corrupts a synchronization marker, the whole video packet needs to be discarded. Similarly, an error in the EREC header can render the whole EREC frame unusable. Future research on "Transcoding of MPEG-4 using EREC" includes a thorough investigation of protecting the EREC frame header using forward error correction coding, together with a performance comparison to see how the proposed scheme behaves if the EREC header is corrupted. At the same time, the same type of experiments should be performed on the standard MPEG-4 scheme by corrupting the resynchronization markers. Furthermore, a complexity analysis of the EREC decoder should be performed to assess the feasibility of an EREC decoder built into the MPEG-4 decoder.

Further improvements within EREC are also open to exploration. Optimizing EREC parameters such as the EREC frame size, the number of slots and the slot lengths can lead to more error resilience and a better understanding of the EREC concept. For example, separate slots can be allocated for AC and DC coefficients, which would give different levels of resilience to coefficients of differing importance. The use of a pseudo-random offset sequence and of "Hierarchical EREC" are further areas to be explored. Dividing a large EREC frame into several subframes could also reduce the complexity of the EREC scheme. More sophisticated error concealment techniques can be used in conjunction with EREC.

Widespread use of multimedia communication over wireless channels depends heavily on the reliable and efficient delivery of images and video sequences, which is currently a very active research area. With MPEG-4 being the first standard specifically designed for multimedia communication, both standard and non-standard approaches should be explored to make MPEG-4 video more robust. EREC should be considered one of a selection of techniques for error-resilient coding, as it can be used in conjunction with error correction coding, error concealment and even synchronization code words for systems that suffer from bit insertion or deletion errors.
