
PhD Dissertation

International Doctorate School in Information and Communication Technologies

DIT - University of Trento

Design principles for embedded multimedia bitstreams transmission over wireless links

Cristina E. Costa

Advisor: Prof. Francesco G.B. De Natale, Università degli Studi di Trento

Co-Advisor: Prof. Aggelos Katsaggelos, Northwestern University

February 22, 2005

Abstract

Applications supported by new wireless communications systems are evolving from voice and/or pure data transmission to multimedia. These services are characterized by the transmission of a large amount of data under real-time constraints, which imposes a significant increase in the complexity and capability required of transmitting devices. Wireless networks place severe limitations on multimedia transmission, due to channel losses and the variability of channel characteristics; combined with limited resources such as energy and computational power, these lead to the study of different approaches to multimedia data transmission.

Multimedia bitstreams have unique characteristics that can be exploited to advantage if taken into account during the design phase of the system. In order to increase transmission robustness and flexibility, new approaches to compression and coding have been studied. Among them, progressive coding is one of the most interesting techniques, because it allows the creation of embedded bitstreams. Such bitstreams can be used to implement SNR scalability, since they can be truncated and still decoded, generating a lower quality version of the original data.

In this thesis we investigate the use of embedded bitstreams in wireless transmission, and various approaches are proposed. In particular, cross-layer techniques are considered for implementing energy-efficient coding and transmission.

Keywords: embedded multimedia coding, wireless transmission, cross-layer, energy efficient coding, MPEG-4 FGS, JPEG2000, region of interest.

Contents

1 Introduction
  1.1 Video source coding techniques
  1.2 Scalability in image and video coding
  1.3 Region of interest and non uniform compression
  1.4 Transmission of multimedia bitstreams
  1.5 Unequal Error Protection
  1.6 Joint source and channel coding
  1.7 Scope and main contributions

2 Progressive coding in video and image compression
  2.1 Progressive scalability in JPEG2000 image coding standard
    2.1.1 Rate-Distortion information
  2.2 Progressive scalability in MPEG-4 video coding standard
    2.2.1 FGS decoder simplification using post-clipping
    2.2.2 FGS Advanced Features
    2.2.3 Rate-Distortion model of the FGS bitstream
  2.3 Applications of progressive coding

3 Wireless video and embedded bitstreams transmission
  3.1 Joint source and channel coding in wireless transmission
  3.2 Unequal error protection in progressive and scalable bitstreams
  3.3 Cross-layer approaches
  3.4 Joint source coding and power control
  3.5 Modulation based UEP

4 Non uniform compression in image and video transmission
  4.1 Nonuniform compression of geometrically distorted images
  4.2 Evaluation of spatial distortion
  4.3 Adaptive compression of geometrically distorted images
    4.3.1 Adaptive Compression using a JPEG-like scheme and QDM
    4.3.2 Adaptive Compression using JPEG2000 and QDM
  4.4 Experimental results
    4.4.1 Quality measurement
    4.4.2 Non Uniform Compression using JPEG
    4.4.3 Non Uniform Compression using JPEG2000
  4.5 Conclusions

5 Interactive RoI selection using FGS in MPEG-4 video transmission
  5.1 The use of RoI in video browsing
  5.2 The proposed approach
  5.3 Application testbed
  5.4 Experimental results
  5.5 Conclusions

6 Energy efficient transmission
  6.1 Distortion in progressive and scalable bitstreams
  6.2 A general optimization approach to energy constrained problems
  6.3 Channel model
  6.4 Application to image transmission
    6.4.1 Simulation results for JPEG2000 transmission
  6.5 Application to video transmission
    6.5.1 Simulation results for FGS MPEG-4 video transmission
    6.5.2 Rate-Distortion model of the FGS bitstream
  6.6 Conclusions

7 Study of the effects of the modulation scheme choice
  7.1 AWGN channel model
  7.2 Solution for AWGN channel model
    7.2.1 Modulation scheme comparison
  7.3 Combined use of energy based UEP and channel coding
  7.4 Solution with RS coding
  7.5 Modulation comparison with error correcting codes
    7.5.1 Conclusions

8 Conclusions

Bibliography

A Detailed procedure
  A.1 Defining the channel model
  A.2 Dual problem

List of Tables

4.1 QDM statistics for JPEG encoder without adaptation, CR=10
4.2 QDM statistics for JPEG encoder without adaptation, CR=20
4.3 QDM statistics for JPEG2000 encoder without adaptation, CR=10
4.4 QDM statistics for JPEG2000 encoder without adaptation, CR=20
6.1 Image PSNR
6.2 General parameter settings
6.3 Parameter settings for the three experiments
7.1 Parameters a, α, and the spectral efficiency rb/BT for different modulations

List of Figures

2.1 FGS encoder block schema
2.2 Bit-plane encoding of the Enhancement Layer
2.3 FGS bitstream
2.4 FGS bitplane truncation
2.5 Comparison between the measured data and the R-D curve calculated from the BP data
4.1 Graphical representation of a fish-eye distorted image
4.2 Compression and transmission schemes considered
4.3 Example of application of QDM
4.4 Conceptual scheme of the proposed approach
4.5 QDM maps of a test image: (a) original, obtained by patch repetition of the Baboon image; (b) QDM map for a semi-spherical mirror; (c) for a parabolic mirror
4.6 Identification of a RoI from the QDM of a distorted image
4.7 Comparison between compression schemes at increasing compression ratio for JPEG
4.8 Comparison between compression schemes at increasing compression ratio for JPEG2000
4.9 Performance comparison on the Blood image
4.10 Performance comparison on the Tiled Baboon image
4.11 Performance comparison on the Mobile and Calendar image
4.12 Performance comparison on a synthetic image generated by PovRay
5.1 Block scheme of the proposed method
5.2 Mobile and Calendar sequence frame with and without the RoI enhancement layer
5.3 AIDER sequence frame with and without the RoI enhancement layer
6.1 Total energy Etot and frame distortion (MSE) versus the power of the last packet
6.2 Probability of packet loss ρj versus the assigned power PL for the last 4 packets (j = L−3, ..., L) for a frame in an FGS-coded video sequence
6.3 PSNR gain in dB vs. interference plus noise
6.4 Assigned power vs. packet number
6.5 Average size of the bit-planes for the Foreman sequence (QCIF)
6.6 Experiment A results
6.7 Experiment B results
6.8 Experiment C results
6.9 PSNR: (a) experiment A, (b) B and (c) C
6.10 Comparison between the PSNR obtained using measured data and the R-D model
7.1 PSNR comparison of the equal energy distribution method and the proposed scheme for different modulations
7.2 Average PSNR comparison of the equal energy distribution method and the proposed scheme for different modulations and energy budgets
7.3 Reed-Solomon RS(n, k) code
7.4 Random symbol block error performance for the RS(255, k) code
7.5 Performance curves for RS codes with CR = 0.92
7.6 Performance comparison between the proposed approach and equal energy distribution
7.7 Impact of different levels of error protection
7.8 Comparison among BPSK, FSK, and MSK modulation schemes in terms of PSNR at the receiver
7.9 Performance improvement deriving from the introduction of the RS(110, 100) code (BPSK, BT = 200 kHz)

Chapter 1

Introduction

Multimedia real-time transmission can be very challenging, due to variations in throughput, delay and packet loss, and to limited resources. Indeed, video transmission is very resource demanding in various ways, because of the joint effect of real-time requirements and the high volume of data to be transmitted.

Most concerns about multimedia transmission are related to compressing the data, transmitting it using the limited bandwidth offered by the channel, and protecting it in such a way that the decoded sequence is acceptable to the user. Indeed, video produces a huge amount of data that, without compression, is unmanageable, due to the limitations in storage size and transmission bandwidth. Currently, several compression approaches are available, but without appropriate countermeasures the generated bitstream can become very sensitive to errors. For this reason the newest video standards cover not only compression efficiency but also transmission related issues such as error resilience and scalability.

In section 1.1 a brief introduction to video and image compression techniques is given, while multimedia transmission issues are covered in depth in section 1.4.


    1.1 Video source coding techniques

In video data, a certain amount of redundancy is present both in the spatial domain (as for static images) and in the temporal domain. Compression algorithms reduce this redundancy with various methods, and eliminate information that the human eye cannot perceive. In video compression, spatial redundancy is reduced through methods similar to those used for image compression, based on the DCT or on wavelets. Temporal redundancy can be reduced by encoding the residual image resulting from the difference between the original image and its prediction, obtained by processing the decoded neighboring frames. The difference image is then encoded and transmitted, together with the side information necessary for generating the prediction image at the decoder side. This operation is usually referred to as motion compensation, because it tries to compensate for the motion between two or more frames, in order to find an image that can be considered a good prediction of the original one.
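As an illustration of this operation, the following minimal sketch (illustrative only: it uses an exhaustive full search, whereas practical encoders use much faster search strategies and then transform and entropy-code the residual) computes a motion vector and the residual block for one macroblock; ref and cur are assumed to be grayscale frames stored as NumPy arrays.

    import numpy as np

    def best_match(ref, block, by, bx, search=8):
        """Exhaustive block matching: the motion vector minimizing the SAD."""
        h, w = block.shape
        best_sad, mv = np.inf, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = by + dy, bx + dx
                if 0 <= y <= ref.shape[0] - h and 0 <= x <= ref.shape[1] - w:
                    sad = np.abs(ref[y:y+h, x:x+w].astype(int) - block).sum()
                    if sad < best_sad:
                        best_sad, mv = sad, (dy, dx)
        return mv

    def encode_block(ref, cur, by, bx, n=16):
        """Return the side information (motion vector) and the residual block."""
        block = cur[by:by+n, bx:bx+n].astype(int)
        dy, dx = best_match(ref, block, by, bx)
        pred = ref[by+dy:by+dy+n, bx+dx:bx+dx+n].astype(int)
        return (dy, dx), block - pred  # the residual is then coded and transmitted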

In the algorithms implemented by video standards, frames are typically encoded in three different modes, usually called I-frame, P-frame and B-frame. I-frames, also called Intra frames, are coded without motion compensation; they allow synchronization and random access to the media, and serve as a reference point for motion compensated frames. P-frames, also called Inter frames, are encoded by applying motion compensation to the previous I- or P-frame. B-frames, also called Bidirectional frames, are encoded by applying motion compensation to both the previous and the successive I- or P-frames. This last type of frame has the interesting property that no other frame is encoded from its data.

More complex encoding algorithms also exist, such as 3-D wavelet and 3-D DCT schemes, that take a number of frames into account at a time, performing both spatial and temporal compression at the same time.


Currently, various video standards exist:

• ISO MPEG-1, ISO MPEG-2, ISO MPEG-4
• H.263, H.26L, H.263+
• H.264/AVC

    1.2 Scalability in image and video coding

The need for scaling arises in various situations, especially in multicast or non-live streaming, where the same content must be used by different users, each with different available resources. Indeed, in these cases the characteristics of the user's device or the bandwidth offered by the channel are not known in advance. To cope with these situations, it may be necessary to scale the transmission bitrate, the spatial resolution, or the computational complexity.

Even if scalability is mainly used to cope with a variable channel bitrate, it can also be used to accommodate different display resolutions, computing capabilities, etc. Scalability techniques can be found for all types of multimedia data, and can be provided in various ways.

In video coding, several forms of scalability exist, depending on which aspect of the decoded sequence is affected. The main types are temporal, spatial and SNR scalability. Hybrid approaches, which combine more than one type of scalability, also exist.

Scalability is usually implemented during the compression process. In the traditional approach, also called Layered Scalability, the encoder generates more than one bitstream: the most important is the Base Layer (BL), which can be decoded independently from the other layers, generating a low resolution version of the original sequence. The BL contains critical information, because it is needed for the decoding of the subsequent layers. These add information to the BL and are called Enhancement Layers (ELs). The number of ELs depends on how many scalability layers are desired. When the BL is jointly decoded with one or more ELs, a higher resolution version of the video is generated (in terms of quality, spatial or temporal resolution, according to the scalable coding technique).

Temporal scalability generates different layers with increasing frame rate. It is the most immediate form of scalability, since it can be performed by dropping B-frames from the original bitstream. Another type of scalability is based on spatial resolution, and is useful in cases where the resolution of the display device is not known in advance. Finally, SNR scalability is also possible: the decoded video quality increases with the number of layers considered.

Scalability avoids encoding and maintaining different copies of the same video at the server side, as well as transmitting the same data multiple times: it is a valid alternative to simulcast transmission, since it solves the problem of transmitting multiple versions of the original data.

A more recent approach is embedded coding (also known as progressive scalability). Instead of using distinct layers, embedded coding implements scalability progressively within a single bitstream. Information is added as the bitstream is decoded, gradually increasing the resolution of the reconstructed data.

Forms of progressive scalability are present in image, video and even audio coding. They can be achieved using wavelet transforms or bit-plane coding. An introduction to progressive coding for both image and video is given in chapter 2.

Multimedia standards address scalability in various ways. For still image scalability, the JPEG2000 and MPEG-4 VTC standards offer wavelet-based embedded scalability.


For video, some sort of scalability exists in all the most recent image and video standards:

• MPEG-2 and H.262 implement temporal, spatial and SNR layered scalability.

• MPEG-4 includes coding modes that allow layered scalability (temporal, spatial, SNR), object scalability and progressive SNR scalability, also known as Fine Granular Scalability (FGS).

• H.263+ implements temporal scalability using B-frames.

• H.264 implements temporal scalability using B-frames.

• MPEG-21 video, currently under development, should in the future include scalability features.

Only the MPEG-1, H.261 and H.263 video standards do not include any sort of scalability.

A technique similar to scalability is multiple description (MD) coding. It has been included in the H.263+ standard and allows the sequence to be encoded into two equally important bitstreams. Each bitstream, when independently decoded, generates a low resolution version of the encoded sequence; when the two are decoded together, they generate a high definition version of it. MD is often proposed for transmission over error prone networks where data can take different paths to its destination: if one path fails, part of the data can still be received and decoded.

    1.3 Region of interest and non uniform compression

Visual data represented in an image or video sequence may not be equally important to the user. One reason is that the human eye usually focuses on the part of the visual information that is most relevant from a semantic point of view. Think, for example, of an anchorman speaking in a TV news program: the user is more interested in the face of the person, and in particular in the mouth and eyes, rather than in the background. Another example is environmental imagery, where areas of intense weather activity, such as a hurricane's eye, are certainly the most important for the interpretation of the image.

Commonly, a region of an image or video that contains more information for the user is called a Region of Interest (RoI). A RoI usually delimits an area containing information necessary for the user to correctly interpret the visual data. Rectangular RoIs are often preferred because they are easier to encode, but a RoI can be of any shape; in practice, its shape is limited by the characteristics of the coding algorithm. In the same image or video more than one RoI can exist, with various degrees of importance, and a RoI can be segmented into sub-RoIs if some areas are more important than others.

Since some regions are more visually important than others, losses and inaccuracies during coding and transmission are better tolerated outside the RoI. This effect can be obtained through non-uniform lossy compression, a technique used in video/image compression to reserve more coding resources for the RoI, allowing a worse quality for the background. The idea is to obtain a non-uniform quality in the image through non-uniform compression, thereby reducing the amount of data to be transmitted, or improving the perceived quality of the data.
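As a minimal illustration of this idea (a hypothetical scheme used only for explanation, distinct from the standard tools discussed next), a per-block quantization step can be derived from a RoI mask, compressing the background more aggressively:

    import numpy as np

    def quant_step_map(roi_mask, q_roi=4, q_background=16):
        """Per-block quantization steps: finer inside the RoI, coarser outside."""
        # roi_mask: boolean array with one entry per coding block (True = RoI).
        return np.where(roi_mask, q_roi, q_background)

    # Example: a 4x4 grid of blocks with a 2x2 RoI in the top-left corner.
    mask = np.zeros((4, 4), dtype=bool)
    mask[:2, :2] = True
    print(quant_step_map(mask))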

Currently, various techniques exist in video/image coding that allow RoI definition, and they have been introduced in the most recent standards. JPEG2000, a recent image compression standard based on the Discrete Wavelet Transform (DWT), was the first standard to introduce RoI definition. In JPEG2000, for example, it is possible to specify a RoI during the coding phase in order to implement non-uniform compression. Another possibility is to specify the RoI during the decoding phase, allowing the user to specify the RoI a posteriori. This feature, in combination with the JPIP communication protocol, allows a selective retrieval of the image and introduces the use of the RoI also as a scalability tool.

RoIs can also be implemented in video coding, for example through the object concept in the MPEG-4 video coding standard, or through scalability tools like selective enhancement in MPEG-4 FGS.

In chapters 4 and 5, examples of the use of RoIs and of non-uniform compression for transmission are presented.

    1.4 Transmission of multimedia bitstreams

As far as the channel is concerned, video transmission requires a stable and robust channel, a high bandwidth (even compressed, the amount of data is still significant), and little delay and jitter.

Compressed multimedia data, and in particular video, is highly sensitive to transmission errors. In compressed sequences, temporal and spatial predictive coding allows error propagation, and VBR coding generates peaks of data where the encoder finds it hard to compress (for example in the presence of rapid movement or of complex scenes).

Real-time video transmission, or streaming, is quite different from file transfer, for several reasons. In file transfer, a file containing a certain amount of data must be transmitted over the network in its entirety, because it can be used only once it has been completely transmitted. If even a minimal part of the data file is missing or damaged, the entire file is compromised. In video streaming, the user can start decoding and viewing the data before the entire encoded sequence has been transmitted or even, in the case of live streaming, encoded. This approach to data transmission imposes tighter constraints on transfer rate, delay and error resilience.


Contrary to normal data, video data (as well as other multimedia data) can still be used even if some of it is lost or missing. Transmission losses do not always compromise the entire sequence, and the eye can compensate for and tolerate some errors. Moreover, the introduction of error control tools that allow resynchronization, error recovery and concealment helps to ease the task.

Various error control tools exist that allow coping with channel fading, packet losses and transmission errors. These can be implemented at the encoder or at the decoder side. The so-called error resilient encoding falls into the first case. These techniques can follow different approaches, and include techniques that add redundancy to the bitstream, allow resynchronization, or divide the data into independently decodable sections.

For example, they can be based on the spatial position of the errors, trying to isolate them in a limited portion of the image or of the bitstream. This can be achieved with resynchronization markers or data partitioning. In data partitioning, data is organized in the bitstream so that important data is grouped together and isolated. Other techniques are based on the temporal characteristics of the encoded sequence, and involve the insertion of intra coded blocks or frames, at random or guided by a criterion such as minimum distortion. Tools for error resilient encoding are present in the most recent standards, such as H.263, H.264 and MPEG-4.

At the decoding side it is always possible to use error detection and concealment techniques for recovering from transmission errors [72]. The aim of error concealment is to exploit knowledge of the human visual system and common properties of visual data to reconstruct the missing bits, reducing as much as possible the perceived effects of losses. Concealment algorithms have to mediate between computational complexity and effectiveness, since video has strict timing constraints and does not tolerate delays. Visual standards do not define how to conceal transmission errors, but give the decoder designer the freedom to choose a concealment approach appropriate to the system resources and requirements.

Finally, it is always possible to use mixed techniques that involve both encoder and decoder, for example including some sort of interactivity between the two, based on feedback messages about received or lost data. Techniques that require the exchange of control messages between encoder and decoder are usually suitable for point-to-point transmission, but not always for point-to-multipoint scenarios.

In [32], the authors present a review of several channel-adaptive video streaming techniques that, employed in different components of the system, allow the provision of efficient, robust, scalable and low-latency streaming video.

A review of the technical challenges of video streaming and of approaches to solving the discussed problems is given in [46], while Zhang et al., in [74], give a good overview of challenges and approaches in transporting real-time video over the Internet.

    1.5 Unequal Error Protection

Unequal Error Protection (UEP) of bitstreams is implemented when different error resilience strategies are used to protect different parts of the same multimedia file.

UEP approaches can be implemented using different techniques. Typically they consider characteristics of the encoded video when deciding the protection strategy to adopt, because in multimedia bitstreams the data is not equally important. It is then a good idea to protect more strongly the data that is more important for the decoding process, or whose protection minimizes the distortion. By combining UEP with encoding strategies, such as data partitioning or scalable coding, different solutions can be found.


If we consider video coding, for example, I-frame reception is critical, because the predictive coding techniques cause error propagation, while the loss of a B-frame creates an isolated error, not visible in the subsequent frames. A UEP strategy can be defined by differentiating the transmission of I- and P-frames from that of B-frames, for example adding variable error correction codes to the transmitted data. Data partitioning can be combined with UEP to protect important data, such as motion vector information, more heavily. It is also possible to apply UEP to layered scalable bitstreams, using techniques that differentiate the protection applied to the different layers [35]. Another possibility is to use RoI based encoding together with UEP. A frame-type based assignment is sketched below.
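As a minimal sketch of such a frame-type based strategy (the RS parameters below are hypothetical choices for illustration, not values prescribed by any standard or by this thesis), packets can be mapped to protection levels as follows; an RS(n, k) code can correct up to (n − k)/2 symbol errors per block:

    # Hypothetical protection levels: stronger Reed-Solomon codes for frames
    # whose loss propagates (I, P), weaker protection for isolated B-frames.
    RS_PARAMS = {'I': (255, 191), 'P': (255, 223), 'B': (255, 251)}

    def protection_for(frame_type):
        """Return the RS(n, k) parameters assigned to a packet of this frame type."""
        return RS_PARAMS[frame_type]

    for ftype in 'IPB':
        n, k = protection_for(ftype)
        print(f"{ftype}-frame: RS({n},{k}), corrects {(n - k) // 2} symbol errors")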

    1.6 Joint source and channel coding

Even if encoders include a great number of tools, including error resilience ones, the main task of source coding is to reduce the bit size of the data, using techniques that eliminate spatial and, for video sequences, temporal redundancy. On the other side, channel coding introduces redundancy to protect data from channel errors and packet losses. Forward Error Correction (FEC) codes are mainly used for this aim, and among them the most popular are the Reed-Solomon codes. They allow the correction of up to a certain number of errors within a block of bits. When the data transmission is packet based, the problem is to cope with packet losses, and FEC is usually applied across packets.

Shannon's source and channel coding theorem (C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, 1948) states that, under certain conditions, source and channel coding in a communications system can be optimized independently. This important theorem is the foundation of the design of many communications systems; however, the hypotheses needed for its validity can become very restrictive for recent communication systems, and especially for video transmission. First, the theorem assumes that it is possible to use codewords of infinite length. This implies that we should allow infinite delay in transmission, a restrictive hypothesis for real-time transmission. A second requirement is that only point-to-point transmissions are considered.

    A second requirement is to consider only point-to-point transmissions.

    Methods that jointly consider the source and channel coding can be

    used when Shannon’s theorem is not valid. Instead of applying source and

    channel coding as two independent steps they are considered and optimized

    together, in order to better exploit the knowledge arising from the coding

    process during the transmission. Commonly known as joint source and

    channel coding (JSCC), it is usually implemented at the application layer.

    A great number of JSCC have been studied for both image and video

    transmission, in chapter 3 an introduction of the use of these techniques in

    wireless transmission in given.

    1.7 Scope and main contributions

In this thesis we present some approaches for embedded image and video transmission over wireless links.

An introduction to progressive coding is given in chapter 2, while in chapter 3 existing techniques for the transmission of progressively coded data over wireless networks are presented.

In chapter 4 we introduce the use of non-uniform compression for image transmission, while the use of RoIs and embedded coding for the interactive transmission of video is discussed in chapter 5.

In chapter 6 we introduce a general approach to the unequal error protection of embedded bitstreams based on energy management. The proposed method allows the optimization of the energy distribution among the packets, in order to minimize the distortion or the energy consumption, while in chapter 7 the efficiency of the method is compared for different modulation schemes, with and without channel coding.


Chapter 2

Progressive coding in video and image compression

Multimedia data can be coded with techniques that generate embedded bitstreams. Scalability allows encoding a video sequence in such a way that the compressed video can accommodate different bitrates. The progressive coding approach differs from traditional layered methods because the set of possible rates varies in a nearly continuous way.

The main characteristic of progressive scalability is its capability to achieve a smooth transition between different bit rates, since the enhancement layer frame information can be efficiently truncated at any point in order to achieve the desired target and still be decoded correctly.

Indeed, in pure embedded bitstreams there are no distinct layers, as there are in traditional layered coding. In the traditional approach, scalability is achieved by coding the data into different separate layers, starting from the Base Layer (BL), which contains essential information, and then generating one or more Enhancement Layers (ELs) with additional data. In progressive coding, instead, scalability is achieved through the direct truncation of the main bitstream.


When decoded, these bitstreams progressively add resolution data to the recovered image or sequence. The decoding process can be interrupted at any point, and the data decoded up to that point can be interpreted as a low resolution version of the fully decoded data.

Progressive scalability allows a gradual improvement of the decoded video, making it useful in applications like video browsing or remote access to video servers, in particular when dealing with narrowband data channels (as in mobile applications).

This encoding method can be employed with success in the field of video communication, allowing real-time stream processing able to adapt the bitstream to the channel bandwidth. In the context of rate control, progressively coded bitstreams can be used to obtain fine granular data representations at lower bitrates, since these bitstreams have the property of allowing different spatial/quality resolutions depending on the amount of data being transmitted and decoded.

Progressive coding also inherently allows complexity scalability and easy resource adaptation depending on the capabilities of video devices. From the transmitter's point of view, this means that the same bitstream can accommodate the different bitrates needed for sending video data to users on networks with heterogeneous capacity. The receiver can decide to decode only the amount of data supported by its own resources (i.e. memory, computational power, etc.).

The most popular progressive coding implementations are based on wavelet transforms and/or bitplane coding. These techniques enable the progressive coding of image, video and even audio data. For image coding, wavelet-based techniques like those used in SPIHT [56] and EBCOT [62] can be employed. These techniques differ in how the compression is achieved, but all of them can generate progressively coded bitstreams. In particular, wavelets were exploited by the newest image compression standard, JPEG2000 [4][64], which is based on the EBCOT paradigm and not only delivers state-of-the-art compression performance, but is also flexible enough to accommodate tools for the implementation of regions of interest (RoIs), perception-based quality optimization, and quality layers.

Wavelets can also be used in video compression: 3-D wavelet coding schemes, such as 3-D SPIHT [39], can be used to obtain embedded bitstreams of video data. These techniques group together a sequence of frames and apply the 3-D wavelet transform to them, eventually allowing both temporal and quality scalability.

Another important approach is represented by Fine Granular Scalability (FGS), which was recently included in the streaming profile of the MPEG-4 standard, Part 2 [45][5].

The most recent video standard, H.264/AVC, implements only temporal scalability, using B-frames, but a special committee (known as SVC, Scalable Video Coding) is evaluating the possibility of introducing progressive coding.

Finally, progressive coding can also be implemented in audio coding [43][47], and progressive coding techniques have been added to the MPEG-4 Audio standard [3].

2.1 Progressive scalability in JPEG2000 image coding standard

To create the embedded bitstream, the JPEG2000 baseline compression scheme [64] starts from a partitioning of the image into rectangular regions called tiles, to each of which a discrete wavelet transform (DWT) is applied. The DWT generates several wavelet sub-bands, which are divided for coding purposes into several smaller blocks called codeblocks. Each codeblock is then independently quantized and bitplane encoded, thus achieving an embedded bitstream at the codeblock level. Codeblocks are then grouped together to form precincts.

From the error resilience point of view, when the decoder detects an error in the codeblock data, it typically discards all the successive data related to that codeblock. This produces a decoded codeblock equivalent to the one generated by an encoder using a coarser quantization parameter.

As far as quality scalability properties are concerned, JPEG2000 creates, at encoding time, a certain number of Quality Layers (QLs). They are formed in such a way as to accommodate different coding rates and qualities in the same bitstream. The user can decide how many QLs to implement and the coding rates they must achieve. Each QL progressively accommodates a given number of bits from each precinct. The contributions from each precinct are chosen by the encoder in such a way as to minimize the distortion at the target rate. Each quality layer thus progressively reduces the distortion of the decoded image in an optimal way in the rate-distortion sense. If the number of layers is large enough, the distortion associated with the bitstream truncated at a random point will be close to the optimal one. In general, a layer is completely decodable only if all the preceding layers are received; the first layer is therefore fundamental for the decoding of the entire bitstream, and the importance of a layer decreases as we go from lower to higher layers.
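As an illustration of this layer formation principle, the following sketch (a strong simplification, not the actual rate-distortion optimization performed by JPEG2000 encoders, which also respect per-codeblock truncation-point ordering) greedily selects contributions in decreasing order of distortion reduction per byte until a layer's rate budget is filled; contributions is an assumed list of (size_bytes, distortion_gain) truncation increments:

    def form_layer(contributions, budget_bytes):
        """Greedy R-D layer formation: steepest distortion/rate slopes first."""
        ranked = sorted(contributions, key=lambda c: c[1] / c[0], reverse=True)
        layer, used = [], 0
        for size, gain in ranked:
            if used + size <= budget_bytes:  # skip increments that no longer fit
                layer.append((size, gain))
                used += size
        return layer, used

    # Hypothetical increments: (bytes, MSE reduction) from four codeblocks.
    incs = [(120, 40.0), (300, 55.0), (80, 30.0), (200, 12.0)]
    print(form_layer(incs, budget_bytes=400))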

Rate-distortion statistics for each QL are generated by the encoder during coding. If the set of QLs is sufficiently dense, the rate-distortion curve of the original image can be constructed from the statistics obtained for each QL.

    2.1.1 Rate-Distortion information

Most unequal error protection algorithms require the operational distortion-rate curve of the source coder for the original images. A general R-D model, valid for progressively coded images (JPEG2000), is shown in [14]. The authors propose the use of parametric models instead of the true D(R) curves for wavelet-based embedded image and video coders. This model is also used in [61].

In JPEG2000, rate-distortion data can also be collected during the encoding phase.
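To give the flavor of such parametric models, a classical high-rate form used for transform coders is an exponential decay of distortion with rate (shown purely as an illustration of the shape these models take, not necessarily the specific model proposed in [14]):

    D(R) \approx \sigma^2 \, 2^{-2\gamma R}

where \sigma^2 is the source variance, R is the rate in bits per sample, and \gamma is an efficiency parameter fitted to a few measured (R, D) points.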

2.2 Progressive scalability in MPEG-4 video coding standard

MPEG-4 FGS (Fine Granular Scalability) is a video coding approach that allows the introduction of quality scalability into the encoded video. It uses a mixed implementation of layered scalability and bit-plane coding to obtain two bitstreams, commonly called the Base and Enhancement Layers.

The Base Layer (BL) contains essential information about the sequence and can be decoded independently from the Enhancement Layer (EL), producing a low quality reconstruction of the video sequence. A higher quality reconstruction can then be achieved by decoding the Base and Enhancement Layers together. Since the EL is progressively coded, it can be used to gradually add information and detail to the BL.

Due to its structure, the EL can be truncated at any point and still be used to add information to the decoded BL. FGS's inherent scalability and flexibility also allow complexity scalability and easy resource adaptation depending on the capabilities of video devices. Thus FGS is suitable for video conferencing and video multicast. An interesting overview of applications enabled by FGS technology is given in [67].

Part 2 of the MPEG-4 standard includes FGS encoding and a hybrid method that combines FGS with temporal scalability (also called FGST). Advanced MPEG-4 FGS tools are Selective Enhancement, Frequency Weighting and, for improving error resilience, Synchronization Markers.

In FGS the base layer (BL) behaves as a normal compressed bitstream (like MPEG-4 Simple Profile), while the difference between the encoded/decoded video sequence and the original video sequence is encoded in the Enhancement Layer (EL) (Fig. 2.1).

[Figure: FGS encoder block diagram. The base layer path applies DCT, quantization (Q), inverse quantization (Q-1) and IDCT, with motion estimation/compensation and a frame memory, and VLC-codes the result into the Base Layer bitstream. The FGS enhancement layer encoding path applies clipping, DCT, a "find maximum" step, a bit-plane shift, and bit-plane VLC, producing the Enhancement Layer bitstream.]

Figure 2.1: FGS encoder block schema.

Progressive decoding is achieved by bit-plane coding of the DCT of the residual image: the frame data is transmitted starting from the most significant bit-plane (MSB) down to the least significant one (LSB). The DCT is performed on a block basis, as for the base layer, but is bit-plane coded after zig-zag scanning of the coefficients (Fig. 2.2).

The data of each bit-plane (BP) is grouped on a macroblock basis and sent one MB at a time, starting from the upper left corner.

Usually the MSBP is the BP with the smallest size, and the BP size increases going from MSB to LSB (Fig. 6.5).
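A minimal sketch of this bit-plane decomposition (illustrative only: the standard additionally run-length/VLC codes each plane and handles signs separately) is the following, where coeffs is assumed to hold the magnitudes of the zig-zag scanned residual DCT coefficients of a macroblock:

    import numpy as np

    def bit_planes(coeffs):
        """Split coefficient magnitudes into bit-planes, most significant first."""
        coeffs = np.asarray(coeffs, dtype=np.uint32)
        n_planes = int(coeffs.max()).bit_length()   # the "find maximum" step
        # Plane i holds bit (n_planes - 1 - i) of every coefficient.
        return [(coeffs >> (n_planes - 1 - i)) & 1 for i in range(n_planes)]

    planes = bit_planes([5, 0, 3, 1])    # max = 5, so 3 bit-planes, MSB first
    print([p.tolist() for p in planes])  # [[1,0,0,0], [0,0,1,0], [1,0,1,1]]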


    Figure 2.2: Bit-plane encoding of the Enhancement Layer.

    Figure 2.3: FGS bitstream.

    Figure 2.4: FGS bitplane truncation.


It is likely that the first bit-plane mostly contains information from those MBs that the BL found harder to compress, such as those containing movement. The first BP will also contain a great number of zeros, due to the small errors in the other MBs.

Not all the data in the EL have the same importance: data from the most significant bit-planes are necessary for the decoding of the following ones, and these data also carry more information with respect to the last bit-planes.

Due to its structure, the EL can be truncated at any point and still be decoded. Truncation can happen at any point of a frame's EL: it can happen, then, that half a frame is coded with a better quality than the rest of it, because the EL is truncated in the middle of a BP for that frame. This is more likely to happen in the LSBs, since the MSB contains many zeros and implicitly gives more importance to certain MBs with respect to others, providing a sort of quality priority.

FGS allows rate control to be performed on pre-encoded sequences by simply truncating the EL in such a way as to satisfy the required bit budget, as sketched below.

While the Base Layer is compressed setting a maximum bit rate RB, such that it can always be transmitted over the channel, the Enhancement Layer can be cut in such a way that the FGS coded video can be transmitted at any bit rate greater than RB (and lower than a certain RE, which depends on the number of BPs used in the EL), fully utilizing the bandwidth available at transmission time. In this way it is possible to adapt the coded video to the time-varying conditions of the channel.
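A minimal sketch of this truncation-based rate control (hypothetical helper, assuming the per-frame EL is available as a byte string and that budgets are expressed in bytes) could look as follows:

    def truncate_el(el_bytes, bl_bytes, budget_bytes):
        """Cut the EL so that BL + EL fit within the per-frame byte budget."""
        keep = max(0, budget_bytes - bl_bytes)  # bytes left for the EL
        return el_bytes[:keep]                  # an EL prefix is still decodable

    # Example: a 6000-byte frame budget with a 4500-byte BL keeps 1500 EL bytes.
    el_sent = truncate_el(b"\x00" * 4000, bl_bytes=4500, budget_bytes=6000)
    assert len(el_sent) == 1500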

FGS can also be used in multicast environments, allowing the transmission of the same compressed video to users with different requirements.


    2.2.1 FGS decoder simplification using post-clipping

In Fine Granular Scalability, the residual image from which the enhancement layer is created can be computed using a pre-clipping or a post-clipping structure [54].

In a pre-clipping structure, the residue is computed directly in the DCT domain, from the difference between the original DCT coefficients and the quantized ones (obtained during BL coding). In this case the EL, during decoding, depends on intermediate results of the BL decoder. In streaming, VOLs can arrive at different times, and cross dependencies between the BL and EL decoders (such as the use of intermediate data) may restrict decoder implementation options.

To decouple EL decoding from BL decoding, a post-clipping approach must be used. In this case the residue is calculated from the difference between the decoded BL VOP and the original one, without using any intermediate information.

In MPEG-4, FGS is implemented using a post-clipping coding scheme [54] because it presents implementation advantages. Indeed, in this kind of scheme the base and enhancement layers are decoupled, and the residue can be computed directly in the spatial domain. Various decoder implementations are possible: a sequential decoder, using the same hardware to operate on both the BL and EL VOPs; a parallel decoder, using dual hardware operating on different VOPs (one for the base, one for the enhancement); or pipelined decoding.
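The difference between the two residue computations can be condensed into a short sketch (an illustrative skeleton under the assumptions above: quantize and dequantize stand for the BL quantizer pair, and 8-bit video samples are assumed):

    import numpy as np

    def pre_clipping_residue(dct_coeffs, quantize, dequantize):
        """DCT-domain residue: requires the BL coder's intermediate coefficients."""
        return dct_coeffs - dequantize(quantize(dct_coeffs))

    def post_clipping_residue(original_vop, decoded_bl_vop):
        """Spatial-domain residue: needs only the clipped, fully decoded BL VOP."""
        clipped = np.clip(decoded_bl_vop, 0, 255)  # the "clipping" step
        return original_vop.astype(int) - clipped.astype(int)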

    2.2.2 FGS Advanced Features

Advanced features in FGS help to improve visual quality, usability, and error resilience.


    Fine-granular temporal scalability

Fine-granular temporal scalability (FGST) is a hybrid SNR-temporal scalability that allows trading off individual frame quality against temporal resolution.

FGST can be implemented with little added complexity, allowing a lower transmission bitrate to be obtained both by slowing down the sequence frame rate and by reducing the quality in terms of PSNR.

In MPEG-4, FGST is implemented in two modes: as a single layer scalability structure, referred to as FGST, and as a two layer structure, referred to as FGS-FGST.

    Selective Enhancement

Selective Enhancement (SE) is implemented at the frame level and allows the bit-plane coding order to be arranged based on region selection. The region of interest (RoI) considered can be arbitrarily shaped, with the macroblock as its shape unit. This method allows more bit-planes to be transmitted from the RoI macroblocks, as sketched below.
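A minimal sketch of the underlying bit-plane shift idea (illustrative only: the actual SE tool signals the shift in the bitstream rather than scaling coefficients as done here) is:

    import numpy as np

    def selective_enhancement(mb_coeffs, roi_mask, shift=2):
        """Left-shift RoI residuals so their bits reach more significant
        bit-planes and are therefore transmitted earlier.

        mb_coeffs: integer array of per-macroblock residual magnitudes
        roi_mask:  boolean array, True for macroblocks inside the RoI
        """
        shifted = np.asarray(mb_coeffs, dtype=np.int64).copy()
        shifted[roi_mask] <<= shift  # RoI data climbs `shift` bit-planes
        return shifted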

SE is a tool for encoder optimization and can be used for a number of operations, such as region-based quality adjustment or object tracking, and can be combined with frequency weighting operations (see the next paragraph).

Automated RoI selection combined with SE can be implemented in various ways. An example of how the SE tool can be used to improve the perceived quality of an encoded video conference stream is given in [68]. In this work, SE is combined with a real-time face detection algorithm, improving the subjective visual quality of the streaming video under various transmission bit-rates.

A more complex example is given in [36], where an automated coding mode selection is implemented. The encoder, based on the video content and the currently available bandwidth, selects among the available coding schemes (FGS, FGST, FGS-SE, and FGST) in order to achieve higher perceptual video quality. SE and background selection are based on the contents of the video sequences.

Even if no FGS and SE are implemented in H.264, the successor of MPEG-4, it is interesting to see how H.264 features are used to automatically select the visually important regions to be used as SE regions in a non-standard H.264 FGS implementation [66]. The method requires low computational complexity and can be used in real time transmissions.

    Frequency Weighting

The frequency weighting (FW) method has been included in the MPEG-4 standard to allow the prioritized transmission of low frequency DCT coefficients. When different frequencies are treated equally and the precision is limited by FGS bit-plane truncation, flickering artifacts can occur in certain sequences. This happens when high-frequency residues are added to a low quality, blocky BL.

Using the FW tool, DCT frequencies can be weighted on the basis of their different psychovisual importance. The approach is used to give more precision to the low frequencies, and it is similar to the use of customized quantization matrices in the BL. To apply FW correctly, separate weighting matrices must be used for I-frames and P-frames, to cope with their different statistics, since the former is applied to the residue of quantization and the latter to the residue of motion estimation.

An example of the application of FW can be found in [53], where the authors use different FW matrices in order to improve the FGS visual quality. The FW matrix is chosen automatically, depending on the video sequence characteristics, and succeeds in improving the visual quality of the decoded sequence.


    Error Resilience

Given that no temporal error propagation can occur in the FGS EL, since no prediction scheme is implemented in this layer, the most significant problem is maintaining synchronization and being able to decode as much of the sequential data as possible.

In FGS, data partitioning is used as an error resilience tool in order to improve the quality of EL transmission over error prone channels. Resynchronization markers are used taking bit-plane relations into account.

The error resilience of FGS video streaming is studied in [70][78]. An improvement attempt is described in [77], where a Header Extension Code (HEC) is proposed.

    2.2.3 Rate-Distortion model of the FGS bitstream

It is possible to define a rate-distortion (R-D) model of the EL based on the statistics collected during encoding. The rate-distortion model can be derived either from empirical considerations or from analytical calculations. An interesting analysis of the FGS EL is given by Loguinov and Radha in [26][24], where a distortion model is also defined.

Depending on the application, a simple R-D curve obtained from the R-D data measured at each bitplane can be used. Indeed, experimental measurements show that the R-D curve of the FGS EL is approximately linear within a bitplane [82][81]. This is reasonable if we consider that within a bitplane the distortion is improved gradually by adding bitplane information one MB at a time, and that the statistical properties of a bitplane are constant within the bitplane itself. From the R-D data measured for each BP, we can then obtain a good approximation of the R-D curve (Fig. 2.5). We recall that the R-D data can also be easily calculated in the frequency domain, as highlighted in [25].
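Under this per-bitplane linearity assumption, the full curve can be approximated by joining the measured per-BP points with straight segments, as in the following sketch (the data shown is hypothetical; in practice bp_points would come from encoder statistics as (rate, MSE) pairs at the bitplane boundaries):

    import numpy as np

    def rd_curve(bp_points, rate):
        """Piecewise-linear R-D model: interpolate MSE between measured BP points."""
        rates, mses = zip(*sorted(bp_points))
        return float(np.interp(rate, rates, mses))

    # Hypothetical per-bitplane measurements: (rate in bits, distortion in MSE).
    bp_points = [(0, 120.0), (4000, 70.0), (12000, 35.0), (26000, 15.0), (40000, 6.0)]
    print(rd_curve(bp_points, rate=8000))  # estimated MSE between BP 1 and BP 2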


[Figure: plot of distortion (MSE) versus rate (bits), comparing the R-D model, the measured data, and the measured data at each BP; the marked points correspond to the BL and to bit-planes BP 1 through BP 4.]

Figure 2.5: Comparison between the measured data and the R-D curve calculated from the BP data.


Cuetos et al. collected interesting FGS statistics in their work [27], where they present a publicly available library of frame size and quality traces of long MPEG-4 FGS encoded videos.

    2.3 Applications of progressive coding

Progressive coding provides easy bandwidth adaptability because it allows the encoding process to be separated from the transmission. Indeed, the encoder does not need to know the bitrate at which the bitstream will be transmitted. Moreover, at the receiver side it is possible, if necessary, to decode only part of the transmitted data, according to the receiver's own computational capabilities. The same transmitted bitstream can be used by different users or appliances, according to their own needs and resources.

Progressive coding can be used as a rate control tool for pre-encoded video. Indeed, to cope with the bandwidth variations often present in wireless links, some sort of rate control must be adopted. Traditionally, in real-time non-scalable coding and transmission, the data bitrate is adapted on the fly to the available bandwidth during coding, in order to adapt the bitstream to the changing conditions of the channel. If the data is already compressed, this approach is not possible and transcoding techniques may be adopted. These methods allow a lower bitrate version of the video data to be created directly from the compressed bitstream, without going through a computationally intense decompression and recompression process. An alternative to this approach is switching between pre-encoded bitstreams at transmission time, using simulcast techniques.

In this context, scalable bitstreams represent a good solution for the rate control of pre-encoded sequences, since this technique allows encoding once while transmitting at different bitrates. Traditional layered encoded bitstreams permit transmission of the data at different, but fixed, bitrates. A further enhancement to this approach is given by the use of embedded bitstreams for performing fine-granular rate control, as sketched below. In [52] Radha and Parthasarathy present two optimum (in a rate-distortion sense) rate-control algorithms for FGS scalable video transmission.
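As an illustration, the following minimal Python sketch shows prefix truncation of a pre-encoded embedded enhancement layer. The byte-budget logic is a deliberate simplification: a real FGS truncator must also respect bitplane and packet boundaries and preserve headers.

    def truncate_embedded(el_frame: bytes, frame_rate: float,
                          bandwidth_bps: float) -> bytes:
        """Crude rate control for a pre-encoded embedded (FGS-like) EL:
        keep only the prefix that fits the per-frame bit budget. Any prefix
        of an embedded bitstream remains decodable at reduced quality."""
        budget_bytes = int(bandwidth_bps / frame_rate / 8)
        return el_frame[:budget_bytes]

    # Example: a 40 kB enhancement layer frame, 256 kbit/s link, 25 fps.
    el = bytes(40_000)
    sent = truncate_embedded(el, frame_rate=25.0, bandwidth_bps=256_000)
    print(len(sent), "bytes transmitted")   # 1280 bytes per frame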

An interesting review of FGS applications is given by van der Schaar et al. in [67]. The paper refers to FGS coding, but the considerations made in it apply equally well to other scalable coding techniques.


Chapter 3

Wireless video and embedded bitstreams transmission

Wireless transmission is becoming popular in both home and office contexts, and the number and variety of emerging applications is growing. Residential WLANs are growing in popularity, and wireless hot spots can be found in major airports, hotel chains and conference rooms.

The term wireless transmission commonly covers both mobile/cellular services and wireless data networks. WLAN standards include IEEE 802.11, HIPERLAN, Bluetooth, NMAC, etc. Mobile transmission systems include GPRS, EDGE, 3G/UMTS and the upcoming 4G technologies. It is realistic to think that, in the future, these two frameworks will merge.

As Girod and Färber highlight in [31], the challenges of multimedia data transmission go well beyond the problems related to limited bandwidth. In wireless networks, transmission errors cannot be avoided, due to the very nature of the medium, even when error correction techniques are implemented. The authors argue that the only practicable solution is to achieve a compromise between reliability, throughput and delay. They focus their work on cellular networks, but most of the presented strategies can be applied to the transmission of video over


    WLANs.

A review of error concealment strategies for fine granularity scalable (FGS) video transmission is given in [11].

3.1 Joint source and channel coding in wireless transmission

Various approaches to JSCC for wireless communication links exist in the literature; a general approach is presented in [9] by Appadwedula et al. The authors propose a JSCC scheme based on a parametric distortion model. An advantage of the method is that it can be applied to most classes of source and channel coders, making it possible to obtain nearly all of the benefits of joint source-channel optimization by matching existing source and channel coding standards using a simple and general approach.
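As a rough illustration of how a parametric model can drive the source/channel trade-off, the following Python sketch uses a generic hyperbolic distortion term and a toy residual-loss model. The functional form and all numbers are assumptions made here for illustration; this is not the model of [9].

    def expected_distortion(r_source: float, p_loss: float,
                            d0: float = 1.0, theta: float = 8e5,
                            r0: float = 1e4, kappa: float = 600.0) -> float:
        """Generic parametric distortion model (illustrative only): source
        distortion decays hyperbolically with the source rate, while
        channel-induced distortion grows with the residual loss rate."""
        return d0 + theta / (r_source - r0) + kappa * p_loss

    # Sweep the split of a fixed 64 kbit budget between source bits and FEC:
    # spending more bits on the source leaves less redundancy, so the
    # residual loss probability (toy model) rises with the source rate.
    candidates = range(20_000, 60_001, 5_000)
    best = min(candidates,
               key=lambda s: expected_distortion(s, p_loss=0.1 * s / 64e3))
    print("best source rate:", best, "bits")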

3.2 Unequal error protection in progressive and scalable bitstreams

The application of UEP to the transmission of progressively coded images and video is not new. It has been implemented in various fashions, and some UEP techniques designed for other contexts can be adapted to the transmission of embedded bitstreams.

As highlighted before, in embedded bitstreams the data is implicitly sorted by importance, and this characteristic can be exploited for the implementation of error resilience techniques based on UEP. Traditional equal error protection (EEP) schemes consider all the data as having the same importance and assign the same amount of protection to the whole bitstream. In contrast, UEP schemes give more importance, hence more protection, to the most critical parts of the coded image; a toy allocation is sketched below.
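The following Python sketch contrasts UEP with EEP by splitting a fixed parity budget across the segments of an embedded bitstream. The proportional-weight rule is an illustrative assumption made here, not a scheme from the cited literature.

    def parity_allocation(total_parity: int, weights) -> list[int]:
        """Toy UEP: split a fixed parity budget across the segments of an
        embedded bitstream in proportion to importance weights (earlier
        data gets more protection). EEP is the special case of uniform
        weights."""
        s = float(sum(weights))
        return [round(total_parity * w / s) for w in weights]

    # Four segments from most to least important (e.g. BL, BP1, BP2, BP3):
    print(parity_allocation(400, [8, 4, 2, 1]))  # UEP -> [213, 107, 53, 27]
    print(parity_allocation(400, [1, 1, 1, 1]))  # EEP -> [100, 100, 100, 100]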


Reed-Solomon codes were used by Natu and Taubman [48] for the protection of JPEG2000 bitstreams during transmission over wireless channels. In [75], channel coding is used for implementing UEP of a JPEG2000 bitstream.

For video transmission, the application of UEP within the FGS EL bitstream was first considered by van der Schaar et al. in [70], where the fine granular loss protection (FGLP) framework was introduced. Based on it, Yang et al. proposed in [78] a “degressive” protection algorithm (DEP) based on FEC for the optimal assignment of protection redundancy among bit-planes. In [71], Wang et al. studied the problem of rate-distortion optimized UEP for Progressive FGS (PFGS) over wireless channels, using prioritized FEC for the BL and EL. A similar problem was studied in [79], in which the objective was to minimize the processing power for PFGS video given bandwidth and distortion constraints.

    3.3 Cross-layer approaches

Each network layer (physical, link and application) is able to individually apply error protection schemes that are independent of each other. This behavior is implicit in the layering paradigm commonly used in the definition of network structures. Of course, independent strategies do not provide the overall optimal solution, since they ignore each other and do not create useful synergies.

The idea behind cross-layer approaches is to jointly consider the error protection strategies at the various layers, in order to improve the transmission efficiency in terms of protection, bandwidth and resource consumption. These techniques do not necessarily involve JSCC, and are aware of the whole system.

Usually, cross-layer approaches use optimization strategies in order to minimize the overall resource utilization, or the video distortion, and parameterize the characteristics of the different network layers. The adaptation parameters that can be considered in a cross-layer scheme can be found at any layer:

• physical layer: transmission power, antenna characteristics, modulation and equalization schemes;

• link layer: frame size, error correction coding strategy, ARQ, admission control and scheduling, packetization;

• transport and network layer: signaling and packetization;

• application layer: compression strategy, error concealment, rate control, error correction codes, ARQ, scheduling, packetization.

The number of parameters involved in this process influences the overall complexity of the optimization, and should be limited if a feasible approach is desired.

For complex problems, closed-form solutions are difficult to achieve, and dynamic programming is often used to find solutions; a brute-force search over a small parameter grid, sketched below, illustrates the idea.
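The following Python sketch performs an exhaustive search over a small, hypothetical grid of per-layer parameters. The distortion and energy models are placeholders invented for illustration; a real system would replace the brute-force loop with dynamic programming or Lagrangian techniques.

    from itertools import product

    # Hypothetical discrete adaptation parameters at three layers.
    tx_power = [10, 50, 100]        # mW           (physical layer)
    retry_limit = [0, 2, 4]         # ARQ retries  (link layer)
    source_rate = [64, 128, 256]    # kbit/s       (application layer)

    def energy(p, r, s):
        # Toy energy model: cost grows with power, retries and bits sent.
        return p * (1 + r) * s / 64.0

    def distortion(p, r, s):
        # Toy model: residual loss falls with power and retries,
        # source distortion falls with source rate.
        loss = 0.2 / ((1 + 0.05 * p) * (1 + r))
        return 1000.0 / s + 5000.0 * loss

    budget = 400.0  # energy units available per frame (arbitrary)
    feasible = (c for c in product(tx_power, retry_limit, source_rate)
                if energy(*c) <= budget)
    best = min(feasible, key=lambda c: distortion(*c))
    print("best (power, retries, rate):", best)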

Resource optimization can involve one or more aspects of the transmission, and can aim at optimizing the consumption of a single resource (for example the transmission energy or the bandwidth) or a single outcome (overall distortion, video quality).

Shakkottai et al. give in [60] an interesting overview of the issues related to cross-layer design for wireless networks.

Cross-layer approaches to embedded bitstream transmission have been studied. In [80], a hybrid UEP and ARQ scheme is used for the transmission of scalable video over wireless links. FEC and ARQ are also used for the transmission of FGS streams over 802.11 channels in [76] and [41]. The approach considers the characteristics of IEEE 802.11 WLANs for evaluating


    the transmission parameters.

In [33], a cross-layer optimization of OFDM transmission systems for MPEG-4 video streaming is presented. In [17], Radha and Cohen present an efficient method for streaming FGS video over packet-based networks.

In [44], Li and van der Schaar present several heuristic algorithms for the real-time transmission of layered video bitstreams over wireless LANs, providing adaptive QoS through real-time retry-limit adaptation (RTRA).

In [38], Khayam et al. propose the MAC Lite strategy as a cross-layer protocol design for real-time multimedia applications over 802.11b networks.

    3.4 Joint source coding and power control

For wireless networks, energy is an important and limited resource. In order to optimize its consumption, source coding parameters and power control can be jointly considered and optimized. This form of cross-layering is commonly referred to as joint source coding and power control (JSCPC).

In [83], a joint FEC and transmission power allocation scheme for layered video transmission over multiple-user CDMA networks was proposed. In that work, scalability was achieved using 3D-SPIHT (wavelet-based coding). The objective was to minimize the end-to-end distortion through optimal bit allocation among source layers and power allocation among the different CDMA channels.

The authors in [12] considered jointly adapting the source bit rate and the transmission power in order to maximize the performance of a CDMA system subject to a constraint on the equivalent bandwidth. In that work, an H.263+ codec was used to generate the layered bitstream.

In [13], Chan considers a JSCPC approach for video transmission over 3G wireless CDMA cellular networks. In [59], Sehlstedt and Le Blanc propose the use of alternative metrics to dynamically fine-tune the performance optimization, together with a dynamically adjustable bit-energy distribution.

For progressively coded video, the position of the first bit error within a frame is more important than the overall bit error probability. In [30], Fossorier, Xiong and Zeger derive the optimal channel code rate and the optimal energy allocation per transmitted bit for the transmission of a progressively coded source, numerically optimizing the choice of channel code rate and the per-bit energy allocation.

    In [15] the authors propose an energy-aware MPEG-4 FGS video stream-

    ing system with client feedback.

    3.5 Modulation based UEP

An interesting approach to cross-layer design and UEP involves the usage of multiple modulation channels.

In [10], Atzori proposes an approach for the robust transmission of JPEG2000 images over wireless networks using a wavelet transmultiplexer. In [69], van der Schaar and Meehan propose to use an adaptive modulation scheme in combination with FGS coded video, in order to obtain UEP-based video transmission over wireless links. The approach, termed Adaptive Modulated FGS (AM-FGS), is able to cope with channel bandwidth variations and degradation by exploiting the FGS structure and tailoring the modulation scheme to the channel conditions and data characteristics.

Chapter 4

Non uniform compression in image and video transmission

In wireless channels, bandwidth is limited and must be used wisely. Non-uniform compression using RoIs is an interesting option in transmission, because it allows bits to be distributed according to data importance, achieving a better perceived quality.

In this chapter, we consider the transmission of still images affected by geometrical distortion, and analyse the effects of different lossy compression strategies on their transmission. Indeed, in this specific case, the encoding-decoding process and the geometric correction together generate a non-homogeneous image degradation, since a different amount of information is associated with each resulting pixel. A distortion measure named Quadtree Distortion Map (QDM), able to quantify this degradation, is described in the following. In order to ensure a uniform quality of the final image, the QDM is exploited during compression. The resulting method is able to reduce the total size of compressed geometrically distorted pictures. Tests performed using the JPEG and JPEG2000 coding standards show that it is possible to improve both the measured and the perceived quality of the transmitted image.

In the next section, some background on the nonuniform compression of geometrically distorted images is given, and the effects of non-linear geometric distortions on co-decoded images are considered. A detailed description of the concept of Quadtree Distortion Map (QDM) is given in section 4.2. In section 4.3, it is shown how the QDM can be used to design an adaptive image compressor able to achieve a uniform error distribution over the decompressed and de-warped image. It is also shown how this approach can be applied to the standard JPEG and JPEG2000 image compression algorithms, while maintaining full compliance only in the latter case. A selection of quantitative results demonstrating the viability and effectiveness of the proposed approach is provided in section 4.4. (This chapter was published in [21].)

4.1 Nonuniform compression of geometrically distorted images

Images acquired by optical sensors usually present some kind of geometrical distortion, due to the characteristics of the lenses and sensors adopted in the acquisition system, or to the physical structure of the object under inspection, as in the case of textures projected onto non-planar surfaces [29]. In specific applications, such effects may become even more significant, due to the specific nature of the acquisition system. This is the case, for instance, of acquisition systems used in video surveillance or ambient intelligence applications, where wide-angle lenses are commonly used to acquire large areas with a single camera. In particular, fish-eye lenses and panoramic lenses using omni-directional mirrors are adopted to grab large portions of narrow indoor environments (a room, the inside of a car, etc.) [58]. Another application that strongly suffers from geometrical distortion is remote sensing [73].

In the projection of the real-world scene onto the image plane, the geometrical distortion acts as a non-linear spatial compression and expansion

    of the luminance function in the pixel plane. This may cause problems

    in all the successive image treatment stages, from low-level processing to

    the interpretation of the scene, and can be partially solved by applying

    geometrical correction techniques based on sensor models and calibration

processes. Unfortunately, the correction is seldom performed at the sensor level; it usually takes place at some remotely connected unit, where the application software is run. The geometrical correction may then happen to be carried out after important processing steps have already been applied: in particular, compression and encoding of images are often implemented on-board to attain a more efficient transmission.

    Some proposals to exploit the knowledge about the acquisition process

    to improve image processing have already been made, with application to

    specific domains such as medical tele-radiology [23]. In [51] a generic and

    very simple acquisition model is studied, where the acquisition sensor is

    modeled through a modulation transfer function which simply introduces

blurring. Another related work on the topic can also be found in [57], where

    the features of a retina-like sensor, associated with an omni-directional

    mirror, are exploited for imaging purposes.

In this framework, we have investigated the impact of geometrical distortion on image compression. The aim is to limit as much as possible the amount of encoded data, in order to fit the available transmission bandwidth. As a result, we verified that it is possible to improve the compression performance when encoding is applied to the geometrically distorted image.

The analysis was conducted both on distorted images produced by real systems, and on synthetic images obtained by warping algorithms that simulate common distortion effects (fish-eye and mirrored lenses). For this reason, in the following we will use the terms warping and distortion, as well as the opposite terms de-warping and geometric correction, to refer to the same concepts.

    4.2 Evaluation of spatial distortion

The first goal of this work is to evaluate the impact of lossy compression-decompression followed by geometric distortion correction on the final image quality.

The underlying assumption is that the image is compressed and decompressed before applying any geometrical correction. This hypothesis is reasonable in many practical systems for several reasons, including: the necessity of ensuring a low complexity of the acquisition system, the use of sensors with embedded compression tools, frequent changes of optical lens or environment preventing the use of an embedded de-warping algorithm, etc.

    On the other hand, compression is increasingly used in the early stage of

    the acquisition, in particular for applications where the sensor is remotely

    connected to the processing unit using narrow-bandwidth channels (e.g.,

    wireless cameras) or is attached to a limited-capacity local storage device.

A spatial distortion in the acquisition system introduces a non-uniform distribution of the visual information in the acquired image. As a matter of fact, given two image areas with equivalent frequency content in the undistorted domain, the corresponding areas in the acquired picture will show a higher frequency content where spatial compression occurred, and vice-versa. Conversely, the coding algorithm usually operates in a homogeneous way over the whole image. To achieve effective data compression it must neglect some information, especially at the higher frequencies, and it has to produce an information loss that is as uniform as possible over the whole image, in order to avoid local peaks in the distortion.

Figure 4.1: Graphical representation of a fish-eye distorted image: (a) before, and (b) after de-warping.

Consequently, the error introduced by the encoder in an image region will be proportional to the local spatial deformation.

Where spatial compression is present, the error will affect a larger zone in the final corrected image, and will be more severe due to the presence of higher frequency content. On the other hand, in areas with low information density, the error will be attenuated by the averaging effect introduced by the geometrical correction algorithms. Figure 4.1 depicts an example of this phenomenon for the case of a fish-eye lens, where the above concepts find clear evidence. It can be observed that two areas of equal dimension in the undistorted (or corrected) domain, represented in dark and light gray in Fig. 4.1.b, are associated in the distorted domain with areas containing more or fewer samples, according to their spatial position and to the geometry of the acquisition system.

In order to quantify this effect, the idea is to compare two schemes (see Fig. 4.2): in the former, labeled “scheme A”, the acquired image is compressed and transmitted after the geometric correction; in the latter, “scheme B”, compression and transmission are performed prior to the geometric correction of the image. The distortion is measured in both cases by comparing the final result (the decompressed, de-warped image) with the uncompressed, de-warped image, since the real-world (undistorted) picture is unavailable in real cases.

    A commonly accepted metric to estimate the distortion introduced by a

    processing system is the Peak Signal-to-Noise Ratio (PSNR), which treats

    the distortion as a kind of noise introduced on the original data, indepen-

    dently of its origin. The noise power is estimated through the computation

    of the Mean Square Error (MSE), and the signal power is computed on the

    basis of the maximum excursion of the luminance function, namely:

PSNR(dB) = 10 · log10( 2^(2·b) / MSE )    (4.1)

where b is the number of bits per pixel in the original image. Usually the PSNR is calculated over the whole image, but we are mostly interested in local measures that can highlight the non-homogeneous distortion introduced by scheme B as compared to scheme A. For the purpose of evaluating the local distortion introduced by the process, we propose a method, called QDM, which uses a quadtree decomposition to generate a local map of the distortion effects. It will be demonstrated that the QDM can be useful to evaluate the performance of compression schemes applied to geometrically distorted images, as well as to design optimized compression schemes able to improve the overall coding performance. It is to be pointed out that the concept of QDM is independent of the use of PSNR as a quality measure: QDM-based approaches can also be implemented using more sophisticated perceptual error models, at the price of an increased complexity [65].
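For reference, a direct implementation of Eq. (4.1) is sketched below (Python/NumPy). Note that it follows the text in taking the signal power as 2^(2b), whereas other common PSNR definitions use (2^b − 1)^2.

    import numpy as np

    def psnr_db(ref: np.ndarray, out: np.ndarray, b: int = 8) -> float:
        """PSNR as in Eq. (4.1): 10 * log10(2^(2b) / MSE), with b bits per
        pixel. Assumes ref and out differ (MSE > 0)."""
        mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
        return 10.0 * np.log10(2.0 ** (2 * b) / mse)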

Figure 4.2: The two alternative compression and transmission schemes considered in the estimation of the impact of geometrical distortion on compression performance: (a) co-decoding is applied after geometrical correction, (b) vice-versa.

QDM is based on the application of the well-known quadtree decomposition algorithm [5]. The quadtree segmentation was demonstrated to efficiently represent simple image partitions subject to rigid geometric constraints. In our approach, the quadtree decomposition is applied to the error image Ierr(x, y), defined as the absolute difference, computed on a pixel-by-pixel basis, between the reference image Iref(x, y) (i.e., the geometrically corrected uncompressed image) and the output image (the image after co-decompression and de-warping, in either order) Idist(x, y):

∀(x, y)  Ierr(x, y) = |Iref(x, y) − Idist(x, y)|    (4.2)

The aim is to obtain a map representing the spatial distribution of the distortion through local measurements of the PSNR. The areas where the PSNR is considered to be homogeneous are those identified by the leaves of the quadtree decomposition. The QDM algorithm is a recursive process, and proceeds as follows (a minimal sketch in code is given after the list):

i. Compute the variance σ² of the error image Ierr(x, y).

ii. If σ² is greater than a given threshold Σth, split the image into four sub-images, halving its size along the x and y directions; otherwise stop the recursion.

iii. Recursively apply steps (i) and (ii) to each sub-image until each block fulfills the variance condition defined at point (ii) or reaches a minimum size ∆min.
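A minimal Python/NumPy sketch of this recursion follows. It assumes a square, power-of-two error image; the threshold and minimum block size correspond to the Σth and ∆min just introduced.

    import numpy as np

    def qdm_split(err, x=0, y=0, size=None, sigma_th=25.0, d_min=8, leaves=None):
        """Recursive quadtree decomposition of the error image |Iref - Idist|.
        A block is split while its error variance exceeds sigma_th and its
        side is larger than d_min; each leaf is recorded as
        (x, y, size, mean_error)."""
        if leaves is None:
            leaves = []
        if size is None:
            size = err.shape[0]  # assumes a square, power-of-two image
        block = err[y:y + size, x:x + size]
        if block.var() > sigma_th and size > d_min:
            half = size // 2
            for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
                qdm_split(err, x + dx, y + dy, half, sigma_th, d_min, leaves)
        else:
            leaves.append((x, y, size, float(block.mean())))
        return leaves

    # Usage: err = np.abs(i_ref - i_dist)
    #        qdm = qdm_split(err, sigma_th=alpha * var_a)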

The stop condition at point (iii) also takes into account a minimum allowed dimension ∆min for each sub-image, to avoid excessive splitting: in our tests, we used ∆min = 8, corresponding to the typical block size used in coding standards. Σth is set equal to α · σ²A, where σ²A is the variance of the error image in scheme A, and α is a parameter in the range 1–2 taking into account the type of spatial distortion and the characteristics of the compression algorithm. More in detail, the choice of α is connected to the distortion introduced by the acquisition device, which largely depends on the viewing angle. For instance, the effect of fish-eye lenses can be approximated by a spherical transform, in which the error is distributed over large image areas while not reaching very high values. In this case, a low value of α (e.g., 1.2–1.5) is required to achieve a precise QDM map. On the other hand, the parabolic or conic projections typical of mirrored lenses produce heavier distortions, thus requiring higher values of α (1.6–2) to focus on greatly distorted areas. Consequently, it has been found that it is possible to heuristically set α a priori on the basis of the type of geometrical distortion, independently of the image content. Further considerations on the setting of α are provided in section 4.4 (tables 4.1, 4.2, 4.3 and 4.4 and the relevant discussion), where the impact of the coding algorithm is also considered.

As a consequence of the above procedure, the areas where the error fluctuates more are split, thus achieving a subdivision of the error image into areas with nearly constant distortion. The result of the decomposition is a sparse matrix that indicates the subdivision of the error image into blocks of various dimensions, associated with different error values. In Fig. 4.3, an example of QDM is shown with application to the “blood” test image, 256x256, 8 bpp. Here, the distortion introduced by the acquisition system is simulated by a polar coordinate transform (4.3.a), which reproduces the behavior of a 360° mirrored lens. The error is computed between the reference image (uncompressed, de-warped) in Fig. 4.3.c and the output of schemes ’A’ and ’B’, in Figs. 4.3.d-e, respectively. A standard JPEG encoder with compression ratio CR = 10 was used in both cases (the co-decoded images in the warped and de-warped domains are shown in Figs. 4.3.b and 4.3.d, respectively), while the parameter α was set to 1.4. The compression ratio CR is defined as:

CR = Nb,o / Nb,c    (4.3)

where Nb,o is the number of bits required for representing the original image in its canonical form and Nb,c is the number of bits after compression.


Since the variance threshold Σth is higher than the error variance in scheme A, the corresponding output image does not produce any split. As far as scheme B is concerned, the result of the splitting process is represented in Fig. 4.3.f. In Fig. 4.3.g, called the QDM map, each leaf of the relevant quadtree is associated with a gray level proportional to the local distortion (the higher the distortion, the darker the corresponding block). The QDM map of scheme B makes it evident that compression in the distorted domain generates an uneven distribution of the error. To better appreciate this fact, in Fig. 4.3.h the QDM map associated with scheme B is transformed back into polar coordinates, i.e., into the original acquisition domain. The resulting map provides a convincing confirmation of the above reasoning about the implications of lossy compression applied to geometrically distorted images. As a matter of fact, it can be observed that the quality degradation progressively increases toward the image center, where the information density is higher (due to spatial compression).

It is important to point out that, in the compression of natural images, the distribution of the error can fluctuate even in the absence of geometrical distortions, due to the non-stationarity of the input image and to the characteristics and parameters of the encoder. Nevertheless, this effect can be neglected for two reasons.

First, the image content is the same for both schemes A and B, thus allowing a comparative assessment. The underlying assumption is that the effects of non-stationary image content and geometrical distortions on the error distribution are uncorrelated and additive. This is not completely true in general, due to the fact that a geometrical deformation can alter not only the magnitude but also the orientation of spatial frequencies (e.g., straight lines become curves when acquired by a wide-angle lens). Therefore, due to the different treatment of the spatial frequencies at the encoder, the distortion can have some “second-order” effects on the final

Figure 4.3: Example of application of the QDM: (a) original, uncompressed and warped by polar transform; (b) compressed in the warped domain; (c) original, uncompressed, de-warped; (d) output of scheme A; (e) output of scheme B; (f) result of the split process for scheme B; (g) QDM map for scheme B; (h) polar transform of the QDM map for scheme B. Note that when the split process is applied to scheme A (with the same parameters used for scheme B), there is no split at all, and the QDM map is a constant-value image.

result. Nevertheless, these phenomena relate more to the perceptual quality of the decompressed image than to its objective assessment, and can therefore be neglected in the QDM, which is simply based on absolute error estimation.

Second, and more importantly, in practical applications the QDM is meant to be computed off-line, by presenting to the system some pre-defined calibration images, designed to match the application to which the acquisition system is targeted. For instance, in a fixed-camera surveillance system the calibration set could be obtained by selecting some shots acquired in typical operating conditions, thus also taking the local image content into account. In contrast, to achieve a general-purpose system the calibration image should have a frequency content as uniform as possible, to ensure uniform behavior independently of the application. According to this last model, in our tests we used images containing statistical or structural textures, as in the case of the “blood” image, or synthetic patterns obtained by patch repetition.

A further consideration about system calibration concerns the possibility of computing the distortion map a priori, simply based on the characteristics of the acquisition system. For instance, it would be possible to determine the local compression and expansion due to the geometrical deformation, and directly estimate the relevant impact on the compression distortion. Unfortunately, this is not a trivial task, since the deformation produces in general a re-sampling of the picture over an irregular sampling grid, which in turn generates very different spatial frequencies (both

    in magnitude and orientation). Moreover, such spurious freq