Non-uniform Filter Banks and
Context Modeling for Image Coding
Ho Man Wing
M.Phil. Thesis
The University of Hong Kong
August 2001
Abstract of thesis entitled
Non-uniform Filter Banks and
Context Modeling for Image Coding
submitted by
Ho Man Wing
for the degree of Master of Philosophy
at the University of Hong Kong
in August 2001
In recent years, subband coding has become popular for image compression. Traditionally,
subband filter banks decompose input signals into subbands of equal bandwidth. Doubting
whether uniform subband decomposition is best for image coding, we carried out coding
experiments using non-uniform filter banks, which decompose an image into subbands of
unequal bandwidths, to evaluate their performance. First, we revisited non-uniform
filter banks derived from uniform ones and revised the constraints. The resulting filter
banks have some branches with rational sampling rates. Then the linear phase property,
an essential issue in image coding, was discussed. Finally, different non-uniform filter
banks were designed and applied to image coding. In addition, different horizontal and
vertical non-uniform partitions were applied based on the spectral analysis of the image.
Improvements over uniform filter banks were observed and discussed.
As another important aspect of image compression, coefficient context modeling
algorithms have shown improvements in rate-distortion performance. In this dissertation,
we propose a new coefficient context modeling algorithm that is tailor-made for
transform coding. The algorithm exploits the intraband and interband dependence
through the use of space and frequency neighbors in a transformed image. Context
quantization is used to reduce the possible configurations of the neighborhood. It
attains improved gain in conditional arithmetic coding. Besides, precise bit rate control
is possible by simple truncation of the embedded bit stream while good rate-distortion
performance is maintained. Finally, its performance was compared with other
algorithms and satisfactory results were obtained.
Non-uniform Filter Banks and
Context Modeling for Image Coding
by
Ho Man Wing
B.Eng. H.K.
A thesis submitted in partial fulfillment of the
requirements for
the Degree of Master of Philosophy
at the University of Hong Kong.
August 2001
Declaration
I declare that this thesis represents my own work, except where due acknowledgement
is made, and that it has not been previously included in a thesis, dissertation or report
submitted to this University or to any other institution for a degree, diploma or other
qualification.
Signed _____________________________
( Ho Man Wing )
August, 2001
Acknowledgements
I am pleased to express my sincere thanks and appreciation to my thesis supervisor, Dr.
Kwok-Ping Chan. He gave constructive advice and comments on my research work, and
patiently provided encouragement and supervision of my research progress. I would
also like to thank Dr. Li Chen for giving me his thesis manuscript and the resources he
had worked out on linear phase paraunitary filter banks. His help gave my work a good
start. Special thanks go to Dr. Jelena Kovačević for explaining her work in detail. Her
theory on non-uniform filter banks has been the foundation of my research.
Thanks also go to the staff of the general office and the technical support team
of the Department of Computer Science and Information Systems for their help in every
aspect during my study. Gratitude is given to the members of the Committee of
Research and Conference Grants and of Student Research Support for providing
financial support for my research work.
Lastly, I would like to thank all my friends in The University of Hong Kong,
Ben Ng, Wan Hon Hing, Tse Hiu Ming, Cheng Ho Yin, Choy Ming Fai and Sam Yu.
Contents
List of Figures vi
List of Tables ix
Chapter 1 Introduction 1
1.1 Non-uniform filter banks...................................................................................2
1.2 Coefficient context modeling............................................................................3
1.3 Thesis outline.....................................................................................................4
Chapter 2 Linear Phase Paraunitary Filter Banks and Introduction to Non-
uniform Filter Banks 6
2.1 Introduction.......................................................................................................6
2.2 Perfect reconstruction filter banks.....................................................................7
2.2.1 Polyphase representation...................................................................7
2.2.2 Paraunitary systems.........................................................................10
2.3 Linear phase paraunitary filter banks..............................................................10
2.3.1 Linear phase paraunitary filter banks for even M............................11
2.3.2 The GenLOT, DCT and LOT..........................................................14
2.3.3 Linear phase paraunitary filter banks for odd M.............................17
2.4 Optimizations...................................................................................................19
2.5 Introduction to non-uniform filter banks.........................................................21
2.5.1 Non-uniform filter bank derived from uniform filter bank.............24
2.5.2 Equivalent rational sampling and filtering structure.......................28
Chapter 3 Overview of Image Coding 31
3.1 Image decomposition.......................................................................................31
3.2 Quantization and bit rate allocation.................................................................34
3.3 Entropy Coding................................................................................................36
3.4 Embedded Zerotree Wavelet (EZW)...............................................................38
3.5 Set Partitioning in Hierarchical Trees (SPIHT)...............................................40
Chapter 4 Non-uniform filter banks and its application to image coding 42
4.1 Introduction.....................................................................................................42
4.2 Extracting the correct spectrum of signal........................................................43
4.3 Linear phase property......................................................................................49
4.4 Design examples..............................................................................................50
4.5 Image coding examples and results.................................................................57
4.5.1 Subband images after non-uniform decomposition.........................57
4.5.2 Coding performance and observations............................................57
4.6 Conclusion.......................................................................................................68
Chapter 5 Space and Frequency Context Modeling for Block-Transform-Based
Image Coding 69
5.1 Introduction.....................................................................................................69
5.2 Intra-band and Inter-band Dependence...........................................................70
5.2.1 Space and frequency neighborhood.................................................71
5.2.2 Statistical context modeling.............................................................73
5.2.3 Context quantization........................................................................75
5.2.4 Another context quantization method..............................................76
5.3 Coding Passes..................................................................................................77
5.3.1 Significance pass.............................................................................78
5.3.2 Refinement pass...............................................................................78
5.3.3 Cleanup pass....................................................................................78
5.3.4 Illustration of coding passes............................................................79
5.3.5 Embedded bit stream truncation......................................................82
5.4 Coding Examples and Results.........................................................................82
5.5 Conclusion.......................................................................................................87
Chapter 6 Conclusion 90
References 92
List of Figures
Figure 2-1 M-channel analysis/synthesis filter bank.........................................................9
Figure 2-2 Equivalent polyphase structure of filter bank in above figure.........................9
Figure 2-3 Polyphase representation after rearrangement of decimators and expanders in
Figure 2-2..................................................................................................................9
Figure 2-4 Frequency response of a 16-channel LPPUFB that has been optimized for
coding gain..............................................................................................................14
Figure 2-5 Example lattice structure for GenLOT of 8 basis..........................................15
Figure 2-7 Frequency response of a 7-channel linear phase paraunitary filter bank.......18
Figure 2-8 Filter bank with rational sampling factors.....................................................22
Figure 2-9 Illustration on how signal can be extracted in fractions and reconstructed
perfectly...................................................................................................................23
Figure 2-10 Non-uniform filter bank designed indirectly. (a) Analysis Part (b) Synthesis
Part...........................................................................................................................25
Figure 2-11 Spectral analysis of filter bank with ideal filters for (1/3, 2/3) splitting......27
Figure 2-12 Equivalent structure for Figure 2-10...........................................................30
Figure 3-1 Forward discrete wavelet transform through a 2-channel filter bank............33
Figure 3-2 Two popular subband decomposition structures, left: dyadic wavelet
transform, right: parallel M-channel decomposition...............................................34
Figure 3-3 Illustration of scalar quantization with dead zone.........................................35
Figure 3-4 Demonstration of arithmetic coding using a 4-ary string "CCBDB"............37
Figure 3-5 Parent-child relationship of wavelet coefficients in an EZW coder.................39
Figure 4-1 Illustration on the upsampled version of Hk(z) & Gi(z) and relation of m & n
.................................................................................................................................47
Figure 4-2 Real non-uniform filter bank design example with different values of k. The
plots show the spectrums of the combined channels with bandwidth of 5π/8 for
k=0,1,2,3..................................................................................................................48
Figure 4-3 Symmetry of combined filters of our proposed design.................................49
Figure 4-4 6-channel non-uniform filter bank with bandwidth: (π/16, π/16, π/16, π/16,
5π/16, 7π/16)...........................................................................................................52
Figure 4-5 Frequency plot of the extra 3π/16 band. It is overlapping with that of 5π/16.
.................................................................................................................................53
Figure 4-6 10-channel non-uniform filter bank with bandwidth:(π/16, π/16, π/16, π/16,
π/16, π/16, π/16, π/16, π/16, 7π/16).........................................................................54
Figure 4-7 4-channel non-uniform filter bank with bandwidth: (π/16, 5π/16, 5π/16,
5π/16).......................................................................................................................55
Figure 4-8 4-channel non-uniform Filter bank with bandwidth: (π/16, 5π/16, 5π/16,
5π/16). Both the base bank and the synthesis banks are optimized........................56
Figure 4-9 Subbands of the image Lena obtained after the decomposition by a 4-channel
non-uniform filter bank: (π/16, 3π/16, 5π/16, 7π/16)..............................................60
Figure 4-10 Subbands of the image Lena obtained after the decomposition by a 16-
channel uniform filter bank.....................................................................................60
Figure 4-13 Plot of PSNR vs Bit rates using different decomposition for the image
Barbara....................................................................................................................67
Figure 4-14 Piecewise constant horizontal and vertical power spectrum of image
Barbara. Dotted lines show the best partition found for the non-uniform filter
banks........................................................................................................................67
Figure 4-15 Subbands obtained by different horizontal and vertical non-uniform filter
banks with partitions as shown above.....................................................................67
Figure 5-1 Space and frequency neighbors in three different directions inside a block-
transformed image using M-channel FB or transforms with M basis.....................71
Figure 5-2 Image after transformation with 16x32 LPPUFB..........................................72
Figure 5-3 Example of finding context label...................................................................77
Figure 5-4 Illustration of the coding process...................................................................79
Figure 5-5 Example transformed image with 3x3 blocks...............................................80
Figure 5-6 Example of coding passes for the image in Figure 5-5.................................81
Figure 5-7 Plots of PSNR vs bit rates for images: Lena, Barbara & Goldhill................88
Figure 5-8 Reconstructed images of Barbara at bit rates of 0.5bpp and 1.0bpp.............89
List of Tables
Table 4.1 PSNR results for image coding using different NUFB and LPPUFB on
different test images................................................................................................62
Table 5.1 Lookup table for context labels.......................................................................75
Table 5.2 Objective PSNR results in dB for various image coding methods...................86
Chapter 1 Introduction
In recent years, subband image coding has become popular as it has achieved a certain
degree of success. Traditionally, the input signal is decomposed into subbands of equal
size using an M-channel filter bank. Meanwhile, the wavelet transform is regarded as
another kind of subband decomposition method: it decomposes the input in a dyadic
manner and gives a multiresolution representation of an image. However, these methods
analyze images with fixed frequency bandwidths and thus may not adapt to the different
spectral distributions of some images. In this dissertation, we attempt to investigate and
characterize filter banks that can decompose input signals into subbands with different
bandwidths. We also apply these filter banks to image compression to evaluate their
performance.
In addition, the recent development and success of coefficient modeling algorithms
in image coding have attracted great attention from researchers. As an important part of
the image coding process, an effective coefficient modeling algorithm can often
contribute an observable entropy gain and hence improve the rate-distortion
performance. Among these recent advances in image coding, most were tailor-made for
wavelet-based image coding, like the Embedded Zerotree Wavelet (EZW) and the Set
Partitioning in Hierarchical Trees (SPIHT) algorithms. However, some of these
algorithms were adapted to block-transform-based coding and similar success has been
observed. Therefore, it is attractive to see whether any algorithm can better model the
coefficients after a block transform. Consequently, this dissertation also introduces an
effective coefficient context modeling algorithm dedicated to block-transform-based
image compression.
1.1 Non-uniform filter banks
It was found that filter banks with rational sampling factors can be obtained by
cascading synthesis filter banks onto a uniform filter bank base. The synthesis filter bank
is essentially used to combine the signals from several uniform channels into one single
channel occupying a larger bandwidth. This can be easily justified by considering the
transmultiplexer structure, which is used to multiplex more than one signal into a single
signal at a higher rate. The most trivial choice is the delay chain, which actually
performs time-division multiplexing (TDM) of the signals. With synthesis and analysis
filters having brick-wall frequency responses, it instead performs frequency-division
multiplexing (FDM).
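As a quick illustration of the TDM view, the delay-chain transmultiplexer can be sketched in a few lines of Python (an illustrative toy, not one of the filter banks designed later): each half-rate signal is upsampled by 2, the second branch is delayed by one sample, and their sum is the interleaved full-rate signal; the analysis side simply de-interleaves, so reconstruction is perfect.

```python
# Delay-chain transmultiplexer: synthesis = upsample + delay + add (TDM);
# analysis = de-interleave. The trivial perfect-reconstruction pair.

def tdm_synthesis(x0, x1):
    """Interleave two half-rate signals into one full-rate signal."""
    y = []
    for a, b in zip(x0, x1):
        y.append(a)  # branch 0: upsampled by 2, no delay
        y.append(b)  # branch 1: upsampled by 2, delayed by one sample
    return y

def tdm_analysis(y):
    """Split a full-rate signal back into its two half-rate components."""
    return y[0::2], y[1::2]

x0, x1 = [1, 2, 3], [4, 5, 6]
y = tdm_synthesis(x0, x1)        # [1, 4, 2, 5, 3, 6]
r0, r1 = tdm_analysis(y)
assert (r0, r1) == (x0, x1)      # perfect reconstruction
```

Replacing the delay chain by filters with brick-wall responses turns the same structure into FDM, as noted above.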
Based on this construction, the numbers of channels of the synthesis filter bank and
the base filter bank are restricted to be relatively prime to each other; otherwise, the
downsampler and upsampler cannot be interchanged. Also, in order to have a contiguous
frequency response, the desired non-uniform channel has to start at an even position in
the base bank. By choosing a base filter bank with 2N channels, any odd-channel filter
bank can act as the synthesis bank for combining the desired channels. In this
dissertation, we try to relax the constraint on the starting position of the non-uniform
channel by allowing the matching order of filters from the two banks (base and
synthesis filter banks) to be reversed.
The linear phase property implies symmetry (i.e. each filter is either symmetric or
anti-symmetric) and is crucial for image coding. Filter banks with non-linear phase
cause undesirable artifacts in the reconstructed image when it is quantized coarsely.
Therefore, we try to find the criteria for linear phase in the proposed non-uniform
filter banks.
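The link between symmetry and linear phase can be checked numerically: for a symmetric FIR filter of length N, the frequency response equals e^{-jω(N-1)/2} times a real function, so removing that linear-phase term leaves a purely real response. A small self-contained check with an arbitrary symmetric filter (not a filter from this thesis):

```python
import cmath

def freq_response(h, w):
    """H(e^{jw}) of an FIR filter h evaluated at angular frequency w."""
    return sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))

h = [1.0, 3.0, 5.0, 3.0, 1.0]        # symmetric impulse response, N = 5
N = len(h)
for w in [0.1, 0.7, 1.3, 2.9]:
    # Removing the linear term e^{-jw(N-1)/2} must leave a real response.
    zero_phase = freq_response(h, w) * cmath.exp(1j * w * (N - 1) / 2)
    assert abs(zero_phase.imag) < 1e-12
```

An anti-symmetric filter behaves the same way up to an extra factor of j, which is why both filter types qualify as linear phase.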
Theoretically, subband coding exploits the non-flatness of the spectrum of the input.
Finer division in frequency produces lower quantization noise at the same rate. Our
approach of combining subband signals into one signal seems to contradict this.
Nevertheless, we will see that the channels that we are going to combine basically have
similar variances, i.e. there is very little non-flatness among them. Hence, there will be
only a small increase in quantization error at the same bit rate. Conversely, we may
achieve an enhancement during the lossless coding part if we observe any nice structure
in the wider subband. For example, for subbands with similar variance, i.e. when the
signal spreads over subbands, the energy is probably localized in the time domain. This
occurs, for instance, at object boundaries in a picture, where the energy spreads over
many uniform subbands after decomposition. However, with a subband of large
bandwidth, we can easily see the boundaries, and the connected nature of the boundaries
is useful in a context modeling algorithm.
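The claim that merging subbands of similar variance costs little can be quantified with the classical subband coding-gain figure: the ratio of the arithmetic to the geometric mean of the subband variances (a textbook formula, used here purely as an illustration with made-up variance values):

```python
import math

def coding_gain(variances):
    """Subband coding gain: arithmetic mean / geometric mean of variances."""
    M = len(variances)
    am = sum(variances) / M
    gm = math.prod(variances) ** (1.0 / M)
    return am / gm

# Very different variances: fine frequency splitting pays off.
print(coding_gain([100.0, 10.0, 1.0, 0.1]))   # substantially above 1

# Nearly equal variances (the bands we propose to merge): the gain is
# close to 1, so merging them loses almost nothing at the same bit rate.
print(coding_gain([1.0, 1.1, 0.9, 1.05]))
```

When all variances are equal the gain is exactly 1, i.e. splitting (or merging) such bands is rate-distortion neutral, which is precisely the situation exploited above.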
1.2 Coefficient context modeling
Our coefficient context modeling algorithm is dedicated to block-transform image
compression schemes. It has been pointed out that dependencies exist among intraband
as well as interband coefficients. For an image transformed or decomposed by an
M-channel filter bank in parallel, we can always identify the coefficients adjacent to a
particular coefficient in the frequency and space domains. We call these the frequency
and space neighbors. Once the first few bits of the neighbors have appeared, it may be
easy to guess how large the magnitude of the coefficient will be. For example, if most of
the neighbors seem to be small, then the surrounded coefficient has a high chance of
being small too. This is justified for smooth regions in an image, where most of the
coefficients are small in both the space and frequency domains. At the same time, large
coefficients may also be predicted from their significant neighbors. For example, in a
highly textured region, the energy concentrates in a few bands while occupying a large
area; meanwhile, at object boundaries, the signal spreads over different bands.
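For a 1-D M-channel block transform, the neighbor sets just described can be sketched as follows: coefficient (b, k) is band k of block b; its space neighbors are the same band in adjacent blocks, and its frequency neighbors are adjacent bands in the same block. (The indexing convention here is illustrative; Chapter 5 defines the exact neighborhood used by the coder.)

```python
def space_neighbors(b, k, num_blocks):
    """Same frequency band k in the adjacent blocks b-1 and b+1."""
    return [(bb, k) for bb in (b - 1, b + 1) if 0 <= bb < num_blocks]

def frequency_neighbors(b, k, M):
    """Adjacent bands k-1 and k+1 inside the same block b."""
    return [(b, kk) for kk in (k - 1, k + 1) if 0 <= kk < M]

# Coefficient in block 2, band 0 of an 8-band transform with 4 blocks:
print(space_neighbors(2, 0, 4))      # [(1, 0), (3, 0)]
print(frequency_neighbors(2, 0, 8))  # [(2, 1)]
```

For 2-D images the same idea applies separably: space neighbors come from neighboring blocks and frequency neighbors from neighboring subbands.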
Our algorithm codes the coefficient values bit by bit, starting from the most
significant bit. By observing the status of the neighborhood, we can determine the
probabilities of the two possible symbols, 0 and 1. Using a binary arithmetic coder, we
can then code the bits according to these probabilities. This attains a reasonable gain
whenever the probability is biased towards either symbol. Considering the significance
status of each neighbor, there are quite a lot of possible combinations; a well-designed
context quantization helps to solve this problem. Furthermore, the transformed image
probably contains quite a lot of insignificant values. We usually find that these
insignificant values occur in groups, and hence coding them by groups may enhance the
gain and speed up the coding process.
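The gain from conditioning each bit on its neighborhood can be illustrated without a full arithmetic coder by summing the ideal code lengths -log2 p assigned by an adaptive, context-conditioned binary model; an arithmetic coder approaches this total. (A simplified sketch with hypothetical context labels; the actual contexts and coding passes are defined in Chapter 5.)

```python
import math

def ideal_code_length(bits, contexts, num_contexts):
    """Adaptive per-context model: Laplace-smoothed counts, -log2 p cost."""
    counts = [[1, 1] for _ in range(num_contexts)]  # [zeros, ones] per context
    total = 0.0
    for bit, ctx in zip(bits, contexts):
        c0, c1 = counts[ctx]
        p = (c1 if bit else c0) / (c0 + c1)   # probability of the actual bit
        total += -math.log2(p)                # ideal arithmetic-coding cost
        counts[ctx][bit] += 1                 # update the model
    return total

# Bits strongly predicted by their context cost far less than 1 bit each:
bits = [0] * 50 + [1] * 50
ctxs = [0] * 50 + [1] * 50   # context 0 -> mostly 0s, context 1 -> mostly 1s
print(ideal_code_length(bits, ctxs, 2) / len(bits))  # well below 1.0
```

With a single context the same bit sequence would cost close to 1 bit per symbol, which is exactly the entropy gain that context quantization tries to preserve while keeping the number of contexts manageable.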
1.3 Thesis outline
The organization of this thesis is as follows. In Chapter 2, basic concepts and design
approaches of linear phase filter banks are reviewed briefly. The chapter includes the
criteria for perfect reconstruction, the polyphase representation of filter banks, the
formulation of the linear phase property, the factorization of linear phase filter banks
and, finally, the optimization measures. The design of linear phase filter banks is
essential to our work in Chapter 4, where non-uniform filter banks are described. In the
last section of Chapter 2, the principle of deriving filter banks with rational sampling
factors from uniform filter banks is explained. This includes the constraints and the
equivalent structure.
Chapter 3 reviews some fundamental matters in image compression as well as
some recent achievements by other researchers. These include the major processes in
lossy image coding: image decomposition, quantization and entropy coding techniques.
The author hopes this chapter will help readers to better understand the work in
Chapters 4 and 5.
In Chapter 4, the constraint on the position of the synthesized channel for a contiguous
spectrum is relaxed. It is found that the matching of each pair of filters from the two
banks is constrained by the position of the synthesized channel. The linear phase
property of such filter banks is considered, and such filter banks are then applied to
image coding. Different non-uniform bandwidth partitions are mentioned and discussed
based on the spectrum of the image.
Chapter 5 proposes a new image coding algorithm dedicated to block-transform-based
image coders. Its working principle and implementation details are explained. The
coding process is illustrated and coding results are shown. The performance and
comparisons with other coding algorithms are also discussed.
Lastly, Chapter 6 presents a summary of this dissertation.
Chapter 2 Linear Phase Paraunitary Filter Banks
and Introduction to Non-uniform
Filter Banks
2.1 Introduction
In recent years, the theory of design and implementation of filter banks has been
extensively studied and researched. Applications of filter banks can be found in many
signal processing fields, like waveform coding in audio and image processing,
crosstalk-free transmultiplexing in communications and feature detection in pattern
recognition.
Linear phase (LP) paraunitary filter banks play a major role in our work. Even-
channel filter banks serve as the bases in our designs of non-uniform filter banks; they
also act as the control for comparisons of coding performance with our non-uniform
filter banks. In the part on context modeling for image coding, we used linear phase
filter banks as the fundamental tool for image decomposition. Odd-channel LP FBs
were mainly used as the synthesis filter banks to combine several uniform channels
into a non-uniform channel. Therefore, some theory of LP FBs will be briefly reviewed
in this chapter before going on to the main parts of this thesis. Besides, some useful
filter banks that were widely used in our work will be shown in this chapter.
In the last section, non-uniform filter banks with rational sampling factors will be
introduced. The proposed filter banks are derived from uniform filter banks. The
principle of maintaining the perfect reconstruction property for such filter banks will be
explained, and the constraints for the equivalent rational sampling structure will be
mentioned. This section is preliminary to our discussion of non-uniform filter banks in
Chapter 4.
2.2 Perfect reconstruction filter banks
Traditionally, we only consider maximally decimated M-channel uniform filter banks
for coding purposes, where the overall sampling rate is preserved. In such filter bank
systems, the input signal X(z) is passed through the M analysis filters H_i(z) and
filtered. Each resulting signal is then decimated by a factor of M. For applications like
image coding, these subband signals are usually quantized, entropy coded, stored and/or
transmitted. In the synthesis part, the subband signals are upsampled by a factor of M
before being filtered by the M synthesis filters and added together to reconstruct the
original input signal. Theoretically, if there is no loss of signal after the analysis stage,
the signal can be recovered perfectly during the synthesis stage when a perfect
reconstruction (PR) filter bank is used. The only difference is a delay and a scaling
factor introduced by the filter bank system, i.e.

    X'(z) = c z^{-d} X(z)    (2.)

where c is a constant and d is an integer. Figure 2-1 depicts the block diagram of the
signal flow of a typical maximally decimated analysis/synthesis filter bank system.
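A minimal numerical check of the perfect reconstruction property, using the 2-channel Haar paraunitary pair as a stand-in for the M-channel banks discussed here (so M = 2; the output equals the input up to the one-sample delay d = 1 and scale c = 1 predicted by the equation above):

```python
import math

def convolve(x, h):
    """Direct-form FIR filtering."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

a = 1 / math.sqrt(2)
h = [[a, a], [a, -a]]            # analysis filters H0, H1 (Haar)
f = [[a, a], [-a, a]]            # synthesis filters F0, F1 (time-reversed)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sub = [convolve(x, hk)[::2] for hk in h]        # filter, then decimate by 2

rec = [0.0] * (2 * len(sub[0]) + 1)
for yk, fk in zip(sub, f):
    up = [0.0] * (2 * len(yk))                  # upsample by 2
    up[::2] = yk
    for n, v in enumerate(convolve(up, fk)):
        rec[n] += v                             # filter and add branches

# Perfect reconstruction: output is the input delayed by d = 1 sample.
assert all(abs(rec[n + 1] - x[n]) < 1e-12 for n in range(len(x)))
```

The same decimate/upsample skeleton carries over to any M and any PR filter set; only the filters and the decimation factor change.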
2.2.1 Polyphase representation
Let the filters be H_k(z) = Σ_n h_k(n) z^{-n}, k = 0, 1, ..., M-1. Suppose they can be
written as [25]

    H_k(z) = Σ_{l=0}^{M-1} z^{-l} Σ_n h_k(Mn + l) z^{-Mn}    (2.)

Then,

    E_{kl}(z) = Σ_n h_k(Mn + l) z^{-n}    (2.)

and hence,

    H_k(z) = Σ_{l=0}^{M-1} z^{-l} E_{kl}(z^M)    (2.)

Equation (2.) is called the Type 1 polyphase representation [25] and the E_{kl}(z) are
referred to as the polyphase components of H_k(z). Similarly, the set of synthesis filters
F_k(z), k = 0, 1, ..., M-1, can be represented as

    F_k(z) = Σ_{l=0}^{M-1} z^{-(M-1-l)} Σ_n f_k(Mn + M - 1 - l) z^{-Mn}    (2.)

where

    R_{lk}(z) = Σ_n f_k(Mn + M - 1 - l) z^{-n}    (2.)

and hence

    F_k(z) = Σ_{l=0}^{M-1} z^{-(M-1-l)} R_{lk}(z^M)    (2.)

Equation (2.) is called the Type 2 polyphase representation.
As a result, the set of analysis filters can be represented as

    [H_0(z) H_1(z) ... H_{M-1}(z)]^T = E(z^M) [1 z^{-1} ... z^{-(M-1)}]^T    (2.)

where [E(z)]_{kl} = E_{kl}(z), and

    [F_0(z) F_1(z) ... F_{M-1}(z)] = [z^{-(M-1)} z^{-(M-2)} ... 1] R(z^M)    (2.)

where [R(z)]_{lk} = R_{lk}(z).
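The Type 1 decomposition is easy to verify numerically: the l-th polyphase component of a filter h is just the subsequence h(Mn + l), and interleaving the components recovers the filter. A small self-contained sketch (generic filter coefficients, not a filter from this thesis):

```python
def polyphase_components(h, M):
    """Type 1 polyphase components: e_l(n) = h(Mn + l), l = 0..M-1."""
    return [h[l::M] for l in range(M)]

def from_polyphase(components, M):
    """Rebuild h(n) by interleaving the components: h(Mn + l) = e_l(n)."""
    h = [0.0] * sum(len(c) for c in components)
    for l, comp in enumerate(components):
        for n, v in enumerate(comp):
            h[M * n + l] = v
    return h

h = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # a length-6 filter with M = 3
comps = polyphase_components(h, 3)        # [[1, 4], [2, 5], [3, 6]]
assert from_polyphase(comps, 3) == h      # interleaving recovers h
```

Running each component at the low rate and recombining through the delay chain is exactly the polyphase implementation drawn in Figures 2-2 and 2-3.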
E(z) and R(z) are regarded as the Type 1 and Type 2 polyphase matrices [25] for the
analysis and synthesis filter banks. The graphical structures of (2.) and (2.) can be found
in Figure 2-2. Using the noble identities, we can rearrange the structure in Figure 2-2
into that in Figure 2-3 [25].

Figure 2-1 M-channel analysis/synthesis filter bank
Figure 2-2 Equivalent polyphase structure of filter bank in above figure
Figure 2-3 Polyphase representation after rearrangement of decimators and expanders in Figure 2-2.
2.2.2 Paraunitary systems
Based on Figure 2-3, it will be a perfect reconstruction filter bank if [25]

    R(z) E(z) = c z^{-d} I    (2.)

Obviously, given any FIR polyphase matrix E(z), perfect reconstruction requires

    R(z) = c z^{-d} E^{-1}(z)    (2.)

and we want R(z) to be FIR as well. One easy way is to choose a paraunitary matrix
E(z) such that

    E~(z) E(z) = I,  where E~(z) = E^T(z^{-1})    (2.)

so that R(z) = c z^{-d} E~(z) is also FIR. Therefore, a paraunitary filter bank can be
designed once its polyphase matrix E(z) is determined. This can be done by cascading a
series of elementary paraunitary matrices. The factorization of paraunitary filter banks
has been well described in [25].
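A small numerical check of this recipe (an illustrative rotation-based example, not the linear phase factorization of the next section): build an order-1 paraunitary E(z) = R(θ2) Λ(z) R(θ1) with Λ(z) = diag(1, z^{-1}), take R(z) = z^{-1} E~(z), and verify coefficient by coefficient that R(z)E(z) = z^{-1} I.

```python
import math

def pmat_mul(A, B):
    """Multiply 2x2 polynomial matrices, stored as lists of 2x2 coefficient
    matrices: A[p][i][j] is the (i, j) coefficient of z^{-p}."""
    C = [[[0.0] * 2 for _ in range(2)] for _ in range(len(A) + len(B) - 1)]
    for p, Ap in enumerate(A):
        for q, Bq in enumerate(B):
            for i in range(2):
                for j in range(2):
                    for k in range(2):
                        C[p + q][i][j] += Ap[i][k] * Bq[k][j]
    return C

def rot(t):
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

t1, t2 = 0.3, 1.1
Lam = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]  # diag(1, z^-1)
E = pmat_mul([rot(t2)], pmat_mul(Lam, [rot(t1)]))

# R(z) = z^{-1} E~(z): transpose each coefficient matrix and reverse the
# powers of z (E has order 1, so z^{-1} E^T(z^{-1}) is causal FIR).
R = [[[E[len(E) - 1 - p][j][i] for j in range(2)] for i in range(2)]
     for p in range(len(E))]

P = pmat_mul(R, E)
# P(z) should equal z^{-1} I: identity at power 1, zeros elsewhere.
for p, Pp in enumerate(P):
    for i in range(2):
        for j in range(2):
            target = 1.0 if (p == 1 and i == j) else 0.0
            assert abs(Pp[i][j] - target) < 1e-12
```

Because every factor (the two rotations and Λ(z)) is individually paraunitary, the cascade is paraunitary too; this is the same propagation argument used by the factorizations below.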
2.3 Linear phase paraunitary filter banks
Consider a set of M paraunitary transfer functions whose polyphase matrix E(z)
satisfies the property [4]

    E(z) = z^{-N} D E(z^{-1}) J    (2.)

where N is the order of the paraunitary matrix E(z), J is the reversal matrix (the
anti-diagonal permutation) and D is a diagonal matrix with entries +1 for symmetric
filters and -1 for anti-symmetric filters. Although equation (2.) does not represent all
linear phase paraunitary filter banks, it represents a large and useful class of them from
a practical point of view. In a LP filter bank, each filter is either symmetric or anti-
symmetric. The number of symmetric and anti-symmetric filters is imposed by
equation (2.) and summarized in the following theorem [4]:
Theorem 1: Consider an M-channel linear phase perfect reconstruction system.
1) If M is even, there are M/2 symmetric and M/2 anti-symmetric filters.
2) If M is odd, there are (M+1)/2 symmetric and (M-1)/2 anti-symmetric filters.
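Theorem 1 can be spot-checked numerically on the DCT-II basis (an even-channel linear phase transform that reappears in Section 2.3.2): row k satisfies c_k(M-1-n) = (-1)^k c_k(n), so exactly M/2 rows are symmetric and M/2 anti-symmetric.

```python
import math

def dct_basis(M):
    """DCT-II basis: C[k][n] = a_k * cos((2n+1) k pi / (2M))."""
    C = []
    for k in range(M):
        ak = math.sqrt((1 if k == 0 else 2) / M)
        C.append([ak * math.cos((2 * n + 1) * k * math.pi / (2 * M))
                  for n in range(M)])
    return C

M = 8
C = dct_basis(M)
sym = sum(1 for row in C
          if all(abs(row[n] - row[M - 1 - n]) < 1e-12 for n in range(M)))
anti = sum(1 for row in C
           if all(abs(row[n] + row[M - 1 - n]) < 1e-12 for n in range(M)))
assert (sym, anti) == (M // 2, M // 2)   # Theorem 1, even M
```

The even-indexed rows turn out symmetric and the odd-indexed rows anti-symmetric, matching the M/2 + M/2 split the theorem demands.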
2.3.1 Linear phase paraunitary filter banks for even M
For paraunitary filter banks, the paraunitary property propagates along the structure
whenever building blocks with the paraunitary property are added. In addition to the
paraunitary property, we have to maintain the linear phase property. However, if we
constrain the system to be linear phase, the linear phase property itself cannot
propagate along the system when building blocks are added [4]. Instead, we can use an
intermediate property to achieve this: the so-called pairwise time-reversed property. It
can easily be converted to a linear phase polyphase matrix, as the sum (difference) of a
time-reversed pair is symmetric (anti-symmetric). Moreover, any linear combination of
those symmetric (anti-symmetric) filters will not disturb the symmetry. Suppose F_m(z)
is the polyphase matrix at stage m, satisfying the pairwise time-reversed property [4]

    (2.)

and we would like to add a building block so that at the next stage

    (2.)
It was shown that the time-reversed property is maintained by multiplying a building
block in front of F_m(z) as long as the building blocks have a particular form [4].
Therefore, we have a factorization for a polyphase matrix having the time-reversed
property. Finally, to convert it into a polyphase matrix with linear phase, it was shown
that the final conversion stage may have the following form

    (2.)

Consequently, we have the complete factorization

    (2.)

where

    (2.)

and

    (2.)
The factorization in (2.) was further simplified in [28]. With the introduction of

    W = (1/sqrt(2)) [ I  I ; I  -I ]

we have the simpler factorization

    E(z) = K_{N-1}(z) K_{N-2}(z) ... K_1(z) E_0,  K_i(z) = Phi_i W Lambda(z) W    (2.)

where Phi_i = diag(U_i, V_i) is built from two arbitrary (M/2)x(M/2) orthogonal
matrices U_i and V_i, Lambda(z) = diag(I, z^{-1} I), and E_0 is the order-0 linear phase
starting block (itself containing the orthogonal matrices U_0 and V_0).
This is called the LPPUFB (linear phase paraunitary filter bank) in [28]. We can see
that the LPPUFB is characterized by predefined paraunitary matrices like W and
Lambda(z) as well as some arbitrary orthogonal matrices. Therefore, the problem of
designing or optimizing filter banks becomes the process of fixing these arbitrary
orthogonal matrices under certain optimization criteria. It was shown that any
orthogonal matrix of size MxM can be factored into a product of M(M-1)/2 Givens
rotations [25]. Each Givens rotation is determined by one rotation angle. Observing
(2.), we can see that for a LPPUFB with filter length NM (i.e. N stages), there are 2N
orthogonal matrices of size (M/2)x(M/2). Hence, there are in total NM(M-2)/4 free
parameters. As an example of an even-channel LPPUFB design, Figure 2-4 depicts the
frequency response of a 16-channel LPPUFB that has been optimized for coding gain.
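The rotation parameterization can be sketched directly: assemble an MxM orthogonal matrix from M(M-1)/2 Givens angles and verify orthogonality. This is a minimal version of the search space swept during optimization (the angle values below are arbitrary):

```python
import math

def givens_product(M, angles):
    """Build an M x M orthogonal matrix from M(M-1)/2 Givens rotation angles."""
    assert len(angles) == M * (M - 1) // 2
    Q = [[float(i == j) for j in range(M)] for i in range(M)]  # identity
    it = iter(angles)
    for i in range(M):
        for j in range(i + 1, M):
            t = next(it)
            c, s = math.cos(t), math.sin(t)
            for row in Q:   # apply one rotation in the (i, j) plane
                row[i], row[j] = c * row[i] - s * row[j], s * row[i] + c * row[j]
    return Q

M = 4
Q = givens_product(M, [0.1 * k for k in range(M * (M - 1) // 2)])

# Check orthogonality: Q^T Q = I.
for i in range(M):
    for j in range(M):
        dot = sum(Q[k][i] * Q[k][j] for k in range(M))
        assert abs(dot - float(i == j)) < 1e-12
```

An optimizer for a LPPUFB simply tunes such angle vectors (one per U_i and V_i) against a criterion like coding gain, since every angle setting yields a valid paraunitary bank.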
Figure 2-4 Frequency response of a 16-channel LPPUFB that has been optimized for coding gain
2.3.2 The GenLOT, DCT and LOT
A. Generalized Lapped Orthogonal Transform (GenLOT)
The GenLOT was defined as a LPPUFB satisfying (2.) with E_0 chosen to be the DCT
matrix [28]. A GenLOT with filter length L = NM has N-1 stages after the DCT. The
DCT outputs have to be arranged in two groups according to their symmetry (symmetric
or anti-symmetric), as indicated in Figure 2-5. The polyphase matrix was defined as

    E(z) = K_{N-1}(z) K_{N-2}(z) ... K_1(z) C    (2.)

where C is the MxM DCT matrix. The DCT (Discrete Cosine Transform) and the LOT
(Lapped Orthogonal Transform) can be regarded as special cases of the GenLOT with
N = 1 and N = 2 respectively. Since E_0 is chosen to be the DCT matrix, there are only
N-1 stages of free parameters, i.e. (N-1)M(M-2)/4 free parameters.
B. Discrete Cosine Transform (DCT )
The DCT has been widely used in image and video coding applications. It has
sub-optimal energy compaction performance for processes modeled by first-order
autoregression with correlation coefficient ρ, approaching the optimum as ρ → 1. Therefore, it is an
alternative choice to the optimal Karhunen-Loève transform (KLT) for image coding.

Figure 2-5 Example lattice structure for a GenLOT with 8 basis functions.
Since the KLT is signal-dependent, it is computationally expensive, whereas the DCT
can be computed with fast algorithms. The basis functions of the DCT are defined as
(2.)    a(i, k) = c(i) √(2/M) cos( (2k+1) i π / (2M) ),  i, k = 0, 1, …, M−1,
where c(0) = 1/√2 and c(i) = 1 for i > 0.
The disadvantage of the DCT in image coding is that blocking artifacts appear when the
image is compressed at a high ratio, because achieving a high compression ratio
unavoidably requires coarse quantization of the DCT coefficients.
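The basis functions just defined can be checked numerically. The following sketch (Python/NumPy; the helper name is ours) builds the M×M DCT matrix from the standard DCT-II formula and verifies that it is orthogonal, i.e. that its inverse is its transpose:

```python
import numpy as np

def dct_matrix(M):
    # a(i, k) = c(i) * sqrt(2/M) * cos((2k + 1) * i * pi / (2M))
    A = np.zeros((M, M))
    for i in range(M):
        c = 1.0 / np.sqrt(2.0) if i == 0 else 1.0
        for k in range(M):
            A[i, k] = c * np.sqrt(2.0 / M) * np.cos((2 * k + 1) * i * np.pi / (2 * M))
    return A

A = dct_matrix(8)
print(np.allclose(A @ A.T, np.eye(8)))  # orthogonality: A^{-1} = A^T
```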
C. Lapped Orthogonal Transform (LOT)
The LOT was developed to alleviate the major disadvantage of traditional block
transforms: blocking effects [10]. In block transforms like the DCT, the input signal is
divided into separate blocks and the blocks are transformed independently. This produces
discontinuities at block boundaries when the coefficients are coarsely quantized
before the inverse transform. The LOT, on the other hand, allows overlapping of the input
signal and thus smooths the block boundaries without increasing the bit rate.
Moreover, the LOT (9.20dB for the type-I LOT with 8 basis functions) has higher coding
gain than the DCT (8.83dB) for image coding [10]. Generally, it has better coding
performance than the DCT.
Also, the LOT has its own fast implementation algorithm based on fast algorithms for the
DCT and the DST-IV (Discrete Sine Transform, type IV). Nevertheless, there is still a
trade-off between the LOT and the DCT: better coding performance (along with reduced
blocking effects) versus computational efficiency. As mentioned above, the LOT can be
regarded as a special case of the GenLOT with 2 stages and filter lengths of 2M. For the
type-I LOT, U1 is taken to be I while V1 is characterized by 3 Givens rotations.
2.3.3 Linear phase paraunitary filter banks for odd M
Similar to subsection 2.3.1, we can achieve linear phase filter banks with an odd
number of channels by cascading paraunitary building blocks onto a structure having the
pairwise time-reversed property [4]. However, using the factorization in [4], we will
have some zeros imposed in each of the filters (for order > 0), and the number of zeros
depends on the order of the polyphase matrix. Moreover, it is impossible to obtain a
meaningful 3-channel system having more than 1 stage. Indeed, using the above
factorization, we will have zeros inserted between the 3 non-zero coefficients for filter
banks with filters longer than 3. This is undesirable, as we usually need a
3-channel linear phase system during our design of non-uniform filter banks later in
this dissertation.
Fortunately, there is another cascade structure for linear phase systems that
solves the problem [6]. Using this factorization, we have more degrees of freedom when
designing linear phase systems. Besides, for the 3-channel case, we may have systems
with order higher than 0. The factorization for the filter bank with an odd number of channels is as
follows:
(2.)
where
(2.)
(2.)
(2.)
and
(2.)
where and
For an order-N system in the above factorization, the number of free parameters is
(2.)
As a design example, Figure 2-6 shows the frequency response of a 7-channel filter bank.
Figure 2-6 Frequency response of a 7-channel linear phase paraunitary filter bank
2.4 Optimizations
A. Coding Gain
For coding purposes, it is always preferred to optimize filter banks using their coding
gain. For a paraunitary system, the coding gain was proved to be [2]
(2.)    G = σx² / ( σ0² σ1² ⋯ σM−1² )^(1/M)
where σi² is the variance of the ith subband and σx² is the variance of the input:
(2.)    σx² = (1/M) Σ_{i=0}^{M−1} σi²
Usually, we use an AR(1) process to model the input for image coding, and the intersample
autocorrelation coefficient ρ is taken to be 0.95. The autocorrelation of such a process is
defined as
(2.)    R(k) = σx² ρ^|k|
And we have the subband variance
(2.)    σi² = (1/2π) ∫_{−π}^{π} |Hi(e^{jω})|² Sxx(e^{jω}) dω
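The coding gain is easy to evaluate numerically. The sketch below (Python/NumPy; the helper name is ours) computes the coding gain in dB of an orthonormal M×M transform for a unit-variance AR(1) input, using the fact that the subband variances are the diagonal of A R Aᵀ with R(k) = ρ^|k|:

```python
import numpy as np

def coding_gain_db(A, rho=0.95):
    # A: orthonormal MxM analysis transform; input: unit-variance AR(1) process
    M = A.shape[0]
    idx = np.arange(M)
    R = rho ** np.abs(np.subtract.outer(idx, idx))  # autocorrelation matrix R(k) = rho^|k|
    sub_var = np.diag(A @ R @ A.T)                  # subband variances sigma_i^2
    # G = sigma_x^2 / (prod sigma_i^2)^(1/M), with sigma_x^2 = 1 here
    return 10.0 * np.log10(1.0 / np.prod(sub_var) ** (1.0 / M))
```

With the identity transform, the subband variances are all equal to σx² and the gain is 0 dB; any transform that compacts energy into fewer subbands yields a positive gain.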
B. Stopband Attenuation
For some applications, we may need to optimize the filter banks according to the
stopband attenuation of their filters, formulated as
(2.)    Es = Σ_i ∫_{ω∈Ωi} |Hi(e^{jω})|² dω
where Ωi denotes the stopband of the ith filter. The above formula sums up the stopband
energy of all subband filters in the filter bank. For a paraunitary system, the last filter
may be excluded, as the whole system is power complementary. The stopband attenuation
of a filter can be computed through the autocorrelation of its impulse response.
C. Polyphase Normalization and DC Leakage
Sometimes, it is desirable to have only one transform coefficient be non-zero for a
constant input. In other words, we do not want the DC (i.e. ω = 0) component to
leak into subbands other than the DC subband. This avoids the so-called
checkerboard artifacts. Hence, we need to have
(2.)    Hi(e^{j0}) = Σ_n hi(n) = 0,  i = 1, …, M−1
And the criterion becomes minimizing
(2.)    Σ_{i=1}^{M−1} |Hi(e^{j0})|²
2.5 Introduction to non-uniform filter banks
In [18] and [12], non-uniform filter banks with rational sampling rate based on the
uniform filter bank system have been introduced. By cascading the uniform analysis
filter bank with corresponding synthesis bank with certain number of channels, desired
non-uniform filter can be synthesized. Repeat this with other set of contiguous channels
on the uniform base filter bank will eventually form the required complete non-uniform
filter bank. For the synthesis system of the non-uniform system, one just needs to
reverse the process, i.e. split the combined signals using the corresponding analysis
filter bank. This is then followed by the synthesis part of the uniform base. This
structure is actually still a system with two stages, i.e. filtering and downsampling
20
followed by upsampling and filtering. However, our aim is to form filter banks with
rational sampling rates, i.e. to first upsample the input signal, filter it and then
downsample it. To achieve this, we should be able to exchange the
downsampler and upsampler as well as move them across the two filters. This is
valid if and only if the downsampling factor of the analysis base bank and the
upsampling factor of the synthesis bank are relatively prime.
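The coprimality condition can be demonstrated numerically. In this sketch (Python/NumPy; helper names are ours), upsampling by P followed by downsampling by Q gives the same result as downsampling first and upsampling afterwards exactly when gcd(P, Q) = 1:

```python
import numpy as np

def upsample(x, L):
    # insert L-1 zeros between consecutive samples
    y = np.zeros(len(x) * L, dtype=float)
    y[::L] = x
    return y

def downsample(x, M):
    # keep every Mth sample
    return x[::M]

x = np.arange(12, dtype=float)
# interchangeable: 2 and 3 are coprime
a = downsample(upsample(x, 2), 3)
b = upsample(downsample(x, 3), 2)
print(np.array_equal(a, b))  # True
# not interchangeable: gcd(2, 2) = 2
c = downsample(upsample(x, 2), 2)
d = upsample(downsample(x, 2), 2)
print(np.array_equal(c, d))  # False
```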
In [18], the synthesis filter bank used to form combined channels was always
chosen to be the delay chain (i.e. 1, z⁻¹, z⁻², …). This produces perfect reconstruction
filter banks, as the cascaded parts are perfect reconstruction. It can actually be
regarded as time-division multiplexing of separate signals into one combined channel.
This choice of synthesis filter bank yields the shortest resulting rational-sampling
filters, since the delay chain is the shortest possible perfect reconstruction filter
bank. However, the delay chain introduces relative shifts of the individual symmetric (or
anti-symmetric) filters and therefore shifts the center of symmetry of each filter. As a result,
the combined filter will not be symmetric. This is undesirable for image coding
and not suitable for us.
In [12], the indirect method allows us to design non-uniform filter banks based on
uniform filter banks using synthesis filter banks that are not restricted to the delay-chain
system but must still be perfect reconstruction in nature. If the chosen synthesis bank
has ideal filter responses, one can regard this as frequency-division multiplexing of
signals. This allows more design freedom because we can alter the synthesis bank to
achieve certain design criteria such as low stopband error. Besides, by choosing a
linear phase uniform base bank and linear phase synthesis banks, symmetric or
anti-symmetric rational-sampling filters can be formed.
In our discussion, we have the following assumptions:
All filters are assumed to be FIR with real coefficients and contiguous passbands;
All signals are assumed to be real;
All filter banks are critically sampled;
The synthesis banks used for recombination will contain bandpass filters with
central frequencies in increasing order.
2.5.1 Non-uniform filter bank derived from uniform filter bank
Figure 2-7 illustrates our desired structure of filter bank with rational sampling factors.
In order to make a clear picture about how a signal can be extracted in rational fractions
Figure 2-7 Filter bank with rational sampling factors
and perfectly reconstructed from these fractions, we make use of an example filter
bank with partition (2π/3, π/3). Using this filter bank, we can extract the 2/3 lowpass
portion of the signal by first upsampling the signal by 2, filtering it and downsampling it by 3.
Meanwhile, we can extract the 1/3 highpass portion through filtering followed by downsampling
by 3. Figure 2-8 illustrates the details.
To obtain a perfect reconstruction filter bank, we can combine a base analysis
filter bank consisting of Q filters with N synthesis filter banks, each with pi filters (i = 0,
…, N-1). Obviously, if the analysis base bank and synthesis banks are both perfect
reconstruction, the resulting system will be perfect reconstruction as well. However, one
has to take care of the difference in delay between the different rational-sampling filters and the
unaffected filters before passing from the analysis part of the system to the synthesis part.
Before going further in our discussion, we need to understand the constraints that this
method places on the sampling factors. Suppose our desired filter bank has
sampling factors (p0/q0, p1/q1, …, pN-1/qN-1). Ideally, the ith channel (starting
from 0) will occupy the frequency range:
(2.)    [ π Σ_{j=0}^{i−1} pj/qj ,  π Σ_{j=0}^{i} pj/qj ]
Assume the filter bank is critically sampled, i.e.:
Figure 2-8 Illustration of how a signal can be extracted in fractions and reconstructed perfectly
(top: filter bank structure; bottom left: spectral analysis of the rational-sampling (2/3) lowpass branch; bottom right: spectral analysis of the (1/3) highpass branch).
(2.)    Σ_{i=0}^{N−1} pi/qi = 1
As will be mentioned later in this chapter, for the upsampler and downsampler to be
interchangeable, all pi's have to be coprime with Q, where Q is defined as:
(2.)    Q = lcm(q0, q1, …, qN−1)
Therefore,
(2.)    Q = ri qi,  where ri is a positive integer
Obviously, for all pi's to be coprime with Q, all ri's have to be 1. This means
(2.)    qi = Q,  i = 0, 1, …, N−1
and
(2.)    Σ_{i=0}^{N−1} pi = Q
As a result, in our following discussion, we assume all filter banks have sampling rates
of (p0/Q, p1/Q, …, pN-1/Q) [12].
Figure 2-9 Non-uniform filter bank designed indirectly. (a) Analysis Part (b) Synthesis Part
Although it can maintain the perfect reconstruction property, this method does
not ensure that the filter bank gives meaningful spectral output. For example, consider a
filter bank with sampling factors (2/3, 1/3) (that is, a lowpass filter of bandwidth 2π/3
and a highpass filter of bandwidth π/3). It can be verified that the lowpass channel
conserves the input frequencies from 0 to 2π/3 in correct and contiguous order. If we use the
highpass and lowpass filters of the synthesis bank to match the first and second channels
respectively, the system would still be perfect reconstruction, but now the spectrum in
the 2π/3 subband would be the mirror image of the true spectrum of the signal (that is,
"high" frequencies would appear before "low" ones).
Consider a filter bank with sampling factors (1/3, 2/3); now we have a wider
highpass filter. It can be seen that the original signal cannot be preserved contiguously
and in the correct order. If a lowpass/highpass pair (from the 2-channel synthesis bank) is
matched with the second/third channels (from the base bank), we get a shuffled and
mirrored version of the spectral bands of interest. With a highpass/lowpass pair, the low
and high bands are interchanged instead. This is illustrated in Figure 2-10.
To formalize the observation, the synthesis banks used are assumed to have
bandpass filters with central frequencies in increasing order; i.e., analysis filters with
increasing central frequencies combine only with synthesis filters in increasing order.
Suppose we want to extract the following part of input signal:
(2.)
Consider the first branch only; after downsampling by Q,
(2.)
(a) Original spectrum (dotted lines show the parts to be extracted by the three analysis filters)
(b) Spectrum of the 2nd band output after downsampling by 3 and upsampling by 2.
(c) Spectrum of the 3rd band output after downsampling by 3 and upsampling by 2.
(d) If we choose the lowpass/highpass pair (synthesis bank) for the 2nd/3rd channels (base bank), a non-contiguous spectrum results.
(e) If we choose the highpass/lowpass pair (synthesis bank) for the 2nd/3rd channels (base bank), a non-contiguous spectrum results again.
Figure 2-10 Spectral analysis of a filter bank with ideal filters for (1/3, 2/3) splitting.
and after upsampling by P,
(2.)
To avoid shuffling of frequencies, one of the XRi, but not its mirrored version, has to fall
within [0, π/P]. Hence,
(2.)
and thus we have the following proposition [12]:
Proposition: To avoid shuffling of frequencies in the indirect method, k (as defined
above) has to be even (assuming P > 1).
2.5.2 Equivalent rational sampling and filtering structure
Consider the sampling factors pi and Q in our proposed non-uniform filter bank design.
If pi and Q are relatively prime, the upsampler and downsampler are commutative [25].
According to the noble identities, a downsampler by M followed by G(z) is equivalent
to G(z^M) followed by the downsampler, and G(z) followed by an upsampler by L is
equivalent to the upsampler followed by G(z^L).
We can move the upsampler towards the left across the analysis filter H(z) and raise the z
in it to the power of pi, so that it becomes H(z^pi). This is actually equivalent to
upsampling H(z) by a factor of pi. Similarly, we can move the downsampler across the
synthesis filter G(z) towards the right and raise z to the power of Q; again, this is the
same as upsampling G(z) by a factor of Q. Then, the two upsampled filters H(z^pi)
and G(z^Q) are brought adjacent to each other and can be merged into H(z^pi)G(z^Q).
Now we have pi branches, each consisting of an upsampler by pi and the merged analysis filter
followed by a downsampler by Q. These branches in parallel can be added to form the
equivalent transfer function below; the structure is as illustrated in Figure 2-11:
(2.)    H'(z) = Σ_{j=0}^{p−1} H_{k+j}(z^p) Gj(z^Q)
Having this single transfer function, one has a clear idea of the frequency response of
the signal filtered by this rational-sampling channel. Besides, when optimizing the filter
bank, one can have better control of the magnitude response of the non-uniform filters.
To implement the equivalent filtering and rational sampling, we can adopt the
polyphase implementation mentioned in [25], as the upsampling and downsampling rates are
relatively prime. The implementation requires only (N+1)/M MPUs and (N+1−L)/M APUs, where N,
L and M are the length of the equivalent filter, the upsampling rate and the downsampling rate,
respectively. For the equivalent filter, the length is N = (NH − 1)L + (NG − 1)M + 1, where
NH and NG represent the lengths of the base analysis filters and synthesis filters respectively.
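The merged filter and its length formula can be verified directly. In this sketch (Python/NumPy; helper names and the example filter taps are ours), the equivalent filter of one branch is H(z^L)·G(z^M), obtained by zero-stuffing each impulse response and convolving:

```python
import numpy as np

def upsample_taps(h, L):
    # impulse response of H(z^L): L-1 zeros between consecutive taps
    y = np.zeros((len(h) - 1) * L + 1)
    y[::L] = h
    return y

def equivalent_filter(h, g, L, M):
    # H(z^L) * G(z^M) as a single impulse response
    return np.convolve(upsample_taps(h, L), upsample_taps(g, M))

h = np.array([1.0, 2.0, 3.0])   # hypothetical base analysis filter, NH = 3
g = np.array([1.0, -1.0])       # hypothetical synthesis filter, NG = 2
eq = equivalent_filter(h, g, L=2, M=3)
print(len(eq) == (3 - 1) * 2 + (2 - 1) * 3 + 1)  # N = (NH-1)L + (NG-1)M + 1
```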
Figure 2-11 Equivalent structure for Figure 2-9
Chapter 3 Overview of Image Coding
Image coding has been widely researched in pursuit of higher compression gain as well as faster
algorithms. Among the common practices, we can summarize the coding algorithm in a
few steps: 1) the image is first transformed or decomposed into subbands; then 2) bit rates
are allocated according to the subband variances and the coefficients are quantized; finally
3) the quantized coefficients are entropy coded or coded by dedicated coefficient modeling
algorithms. Transforms or subband filter banks are used to decorrelate the input and
compact its energy into fewer coefficients. As a kind of subband decomposition technique,
the wavelet transform has been found to be effective for image coding. Along with the development
of subband decomposition techniques, some dedicated coefficient modeling algorithms
have also been introduced. In this chapter, we will review some common aspects of
image coding.
3.1 Image decomposition
A. Transform coding
Consider an input signal x(n); it can be arranged into blocks xm of size M:
(3.)    xm = [x(mM), x(mM+1), …, x(mM+M−1)]ᵀ
The M samples of xm are then mapped to a set of M coefficients in the vector ym through
a linear combination
(3.)    ym(i) = Σ_{k=0}^{M−1} a(i,k) xm(k),  i = 0, 1, …, M−1
where the a(i,k) are called the basis functions. The above relation can be written in matrix
form
(3.)    ym = A xm
where A = [a(i,k)] is M×M. To reconstruct xm from ym, it is clear we have to choose the inverse
transform to be A⁻¹. For an orthogonal transform, the inverse transform is Aᵀ.
The most popular transform for image coding is the DCT (Discrete Cosine
Transform). It is an orthogonal transform and is essentially good at energy compaction for
most correlated sources. In addition, its performance is very close to that of the optimal KLT
(Karhunen-Loève Transform) for an AR(1) process with high correlation coefficient ρ. As
images can be modeled by AR(1) processes with high correlation coefficients, the DCT
has been adopted in many international standards like JPEG, MPEG and CCITT H.261.
B. Subband coding
Typically, in subband coding, the input signal is decomposed by a set of M analysis
filters. To preserve the overall sampling rate, each subband signal is downsampled by a
factor of M. The signal can be perfectly reconstructed if the subband signals are fed into
the synthesis bank immediately and a PR filter bank is chosen. However, in real coding
applications, subband signals undergo quantization and entropy coding before being
transmitted or stored. During decoding, the compressed data are entropy decoded and
dequantized. Then, at the synthesis stage, each subband signal is upsampled by a factor
of M by inserting M-1 zeros between every two samples, filtered by the M synthesis filters
and added to form the output signal.
Transform coding mentioned above can be thought of as a special case of subband
coding in which each filter length is no longer than M. Conversely, we can view
subband decomposition as equivalent to transforming blocks of elements by the
polyphase matrix, whose elements can be of order higher than 0. If the polyphase matrix
is only of order 0, it is the same as the typical transform
mentioned in the last subsection.
C. Wavelet decomposition
It has been found that the discrete wavelet transform can be computed by iteratively
cascading a 2-channel perfect reconstruction filter bank on its lowpass output, as depicted in
Figure 3-12. It can be observed that such a construction is perfect reconstruction as
long as the 2-channel filter bank is PR. The discrete wavelet transform can be regarded
as a multiresolution decomposition of a signal into coarse and detail constituents. For
image compression, wavelet decomposition is performed horizontally and then
vertically, resulting in four sub-images. Three of them are highpass and contain
horizontal, vertical and diagonal edges and textures respectively. The lowpass subimage
gives a small-scale version of the original image. Splitting the lowpass subimage with the
2-channel filter bank repeatedly produces coarse and detail representations of the image at
different scales. In addition, from a frequency-domain point of view, the wavelet transform
has good energy compaction capability that fits the decaying-spectrum hypothesis of
image compression.
Figure 3-12 Forward discrete wavelet transform through a 2-channel filter bank.
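The iterated 2-channel split can be sketched as follows (Python/NumPy; Haar filters are used purely for illustration, and the plain non-periodic convolution here ignores boundary handling, which a real codec must treat carefully):

```python
import numpy as np

def analysis_step(x, h0, h1):
    # filter by the 2-channel bank, then downsample by 2
    return np.convolve(x, h0)[::2], np.convolve(x, h1)[::2]

def dwt(x, h0, h1, levels):
    # iterate the split on the lowpass output only
    details = []
    for _ in range(levels):
        x, d = analysis_step(x, h0, h1)
        details.append(d)
    return x, details  # coarse approximation + detail signals, finest first

h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # Haar lowpass
h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)  # Haar highpass
coarse, details = dwt(np.ones(8), h0, h1, 2)
```

For a constant input, the interior highpass coefficients are zero at every level, which illustrates why smooth image regions produce long zero runs in the detail subbands.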
3.2 Quantization and bit rate allocation
A. Quantization
In lossy compression, the only process that introduces noise and thus causes loss in the
reconstruction of the signal is quantization. Quantization is a process of representing the
source output using a smaller number of codewords. If the inputs are scalars, we call the
process scalar quantization; similarly, if the inputs are vectors, we call it vector
quantization. For scalar quantization, a scalar value within a particular interval is
mapped to the corresponding integer representing that range. During dequantization,
Figure 3-13 Two popular subband decomposition structures, left: dyadic wavelet transform; right: parallel M-channel
decomposition.
the integer is mapped back to the estimated value for that particular interval. The estimated
value is taken to be the centroid of the range, which minimizes the error if the
source inputs are evenly distributed within it. If all the intervals are of the same size,
we call it uniform quantization. Otherwise, it is non-uniform quantization.
Due to its simplicity, scalar quantization with uniform intervals is usually chosen
for practical coding purposes. For image coding, it is crucial to have zero values
represented accurately. Therefore, it is desirable to have the so-called midtread
quantizer that can output zero upon quantization. In image coding, most of the high
frequency coefficients will be quantized to zero. More coefficients will be quantized to zero if we
set the quantization interval containing zero to be larger than the other intervals. This was
found to give higher compression gain, as it favors entropy coding of zero runs. Figure
3-14 illustrates this simple scalar quantization method with a dead zone.
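A minimal sketch of such a dead-zone uniform quantizer (Python/NumPy; our own helper names, with a dead zone twice the nominal step and midpoint reconstruction rather than a true centroid):

```python
import numpy as np

def quantize(x, step):
    # zero bin is (-step, step): twice as wide as the other bins
    sign = np.sign(x)
    mag = np.abs(x)
    q = np.where(mag < step, 0, np.floor((mag - step) / step) + 1)
    return (sign * q).astype(int)

def dequantize(q, step):
    # midpoint reconstruction for the non-zero bins
    mag = np.where(q == 0, 0.0, step + (np.abs(q) - 0.5) * step)
    return np.sign(q) * mag
```

With step = 1, any input in (−1, 1) maps to the zero symbol, so small noisy coefficients collapse to zero while larger ones are coded with ordinary uniform bins.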
B. Bit rate allocation
After the image has been decomposed or transformed into different subbands, we
have to quantize the coefficients to a certain number of bits. Since subband coding
exploits the non-flatness of the input spectrum, it makes sense to assign different bit
rates to different subbands. The problem is: given an overall bit rate R, how can we
assign bit rate Ri to subband i such that the overall reconstruction error is
minimized? Under scalar quantization with rate Ri, the reconstruction error variance of
quantized subband i is given by
(3.)    σr,i² = c · 2^(−2Ri) σi²
where σi² is the variance of subband i and c is a constant. Suppose there are M subbands
in total. Then, the overall error variance after quantization is
Figure 3-14 Illustration of scalar quantization with dead zone
(3.)    σr² = (1/M) Σ_{i=0}^{M−1} c · 2^(−2Ri) σi²
Our objective is to determine the Ri that minimize this error variance. The minimization
problem can be solved using Lagrange multipliers, and the optimal bit rate allocation
is [19]
(3.)    Ri = R + (1/2) log₂( σi² / (σ0² σ1² ⋯ σM−1²)^(1/M) )
3.3 Entropy Coding
After quantization, we still observe some dependencies among the quantized values. For
example, there is a large proportion of zero samples compared to non-zero samples, and the
probability of seeing large values is much lower than that of small values. This leaves room for
better compression through entropy coding. Entropy coding
techniques include Huffman coding, arithmetic coding and run-length coding. Here, we
briefly describe the principles of arithmetic coding and run-length coding, as they
contribute to the success of our context modeling algorithm in Chapter 5.
A. Arithmetic coding
The idea of arithmetic coding is to represent a string of symbols using a real number
between 0 and 1. Initially, we have the interval [0, 1), which is divided into sub-intervals
with lengths equal to the probabilities of the symbols. When a symbol is encoded, we take
its corresponding sub-interval. This sub-interval is then divided similarly into ranges
proportional to the symbol probabilities. When encoding the second symbol, we
choose the corresponding sub-interval again. This process is repeated until all symbols
have been encoded. The string of symbols can be represented using any number within
the final interval; of course, we would choose the shortest fixed-point representation
that falls inside that final interval. As an example, Figure 3-15 demonstrates
the process of arithmetic coding for the alphabet {A, B, C, D} with probabilities
{0.1, 0.3, 0.4, 0.2}. From that example, we have the final interval [0.61536, 0.61824).
We can represent the string using 0.6171875, which is in the final interval; its binary
representation is 0.1001111.
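The interval narrowing in this example can be reproduced with a short sketch (Python; this computes only the final interval, not the bit-level encoder):

```python
def arithmetic_interval(message, probs):
    # cumulative probabilities give each symbol's sub-interval of [0, 1)
    cum, acc = {}, 0.0
    for sym, p in probs.items():
        cum[sym] = acc
        acc += p
    low, width = 0.0, 1.0
    for sym in message:
        low += width * cum[sym]   # descend into the symbol's sub-interval
        width *= probs[sym]
    return low, low + width

probs = {'A': 0.1, 'B': 0.3, 'C': 0.4, 'D': 0.2}
lo, hi = arithmetic_interval("CCBDB", probs)
print(lo, hi)  # approximately 0.61536 and 0.61824, matching the example
```

Any number in [lo, hi), such as 0.6171875 = 0.1001111 in binary, identifies the message uniquely given the model.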
Arithmetic coding allows symbols to be represented with a fractional number of bits.
Therefore, it benefits entropy coding of small alphabets (with as few as 2 symbols).
Another advantage of arithmetic coding is that it only needs to update the probability
counts adaptively during coding. Also, it allows easy switching between different context
models, which makes it suitable for conditional entropy coding. The drawback of arithmetic
coding is that it requires a multiplication and a division each time a symbol is
encoded. For the binary case, this can be solved by an arithmetic coder called the MQ-coder
mentioned in [11], which requires no multiplication or division when encoding
symbols.
B. Run-length coding
After image decomposition and quantization, there are always long runs of zeros
in the high-frequency subbands. These long runs of zeros represent the smooth or blurred
regions of the image. It is often beneficial to encode zero runs using a special symbol or
Figure 3-15 Demonstration of arithmetic coding using a 4-ary string "CCBDB".
prefix followed by a count of the upcoming zeros. This contributes to
compression gain as well as speeding up decoding.
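A zero-run encoder of this kind can be sketched as follows (Python; the ('Z', n) pair stands in for the special run symbol plus its count, a representation of our own choosing):

```python
def rle_zeros(coeffs):
    # replace each run of zeros with a single ('Z', run_length) token
    out, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            if run:
                out.append(('Z', run))
                run = 0
            out.append(c)
    if run:
        out.append(('Z', run))
    return out

print(rle_zeros([5, 0, 0, 0, -2, 0, 0]))  # [5, ('Z', 3), -2, ('Z', 2)]
```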
Recently, some algorithms have been proposed that exploit the correlation among
coefficients when coding the quantized coefficients. These algorithms surpass
traditional approaches by representing zeros effectively and coding important
coefficients first.
3.4 Embedded Zerotree Wavelet (EZW)
The embedded zerotree wavelet (EZW) coder introduced by Shapiro [17] is a
wavelet-based image coder. It exploits the self-similarity across different scales of an image by
organizing the coefficients in a tree structure. A coefficient at a coarser scale (lower
frequency subband) has four child coefficients at the next finer scale (higher frequency subband)
at the same relative spatial location of the subimage, as depicted in Figure 3-16.
Coefficients in the LL subband have only three child coefficients, while
coefficients at the finest scale do not have any children.
The EZW algorithm is based on successive-approximation quantization (SAQ)
to provide the embedded property. SAQ is actually the same as bit-plane encoding of
magnitudes. Given the threshold T, a coefficient is said to be significant if its magnitude ≥ T;
otherwise, it is insignificant. After all the coefficients have been scanned with respect
to T, T is halved. The EZW coder consists of two passes: the dominant pass
encodes the leading zero bits, the leftmost 1 and the sign of the coefficients; the
subordinate pass refines the magnitudes of coefficients that have been found to be
significant. Coefficients are scanned from subbands of lower resolution to higher
resolution. An arithmetic coder is employed to code one of four possible
symbols: POS - positive coefficient, NEG - negative coefficient, IZ - isolated zero and
ZTR - zerotree root. For POS, NEG and IZ, at least one descendant of
the current coefficient is significant and thus its children have to be coded. ZTR
Figure 3-16 Parent-child relationship of wavelet coefficient in a EZW coder
denotes that all nodes in the tree rooted at the current coefficient are insignificant and thus
we do not have to code any of its descendants.
The EZW algorithm has been found to have superior performance [17]. The
philosophy behind it is the implication of insignificance of coefficients from their parents at
coarser scales, i.e. the insignificance of a coefficient at a coarser scale tends to imply
insignificance of its children at finer scales. This is quite adequate for image coding, as
for natural images energy is concentrated at low frequencies and decays towards
high frequencies. As a result, the zerotree structure fits the spectrum quite well and is able
to represent many zero samples effectively.
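The zerotree test itself is a simple recursion. In this sketch (Python; `coeff` and `children` are our own stand-ins for the coefficient values and the parent-child structure of Figure 3-16), a node is a valid ZTR for threshold T when it and all of its descendants are insignificant:

```python
def is_zerotree_root(coeff, children, node, T):
    # ZTR: the node and every descendant are insignificant w.r.t. T
    if abs(coeff[node]) >= T:
        return False
    return all(is_zerotree_root(coeff, children, c, T)
               for c in children.get(node, []))

# toy tree: node 0 has children 1 and 2; node 1 has child 3
coeff = {0: 3, 1: -1, 2: 0, 3: 9}
children = {0: [1, 2], 1: [3]}
print(is_zerotree_root(coeff, children, 0, T=16))  # True: all below threshold
print(is_zerotree_root(coeff, children, 0, T=8))   # False: descendant 3 is significant
```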
3.5 Set Partitioning in Hierarchical Trees (SPIHT)
The set partitioning in hierarchical trees (SPIHT) [5] image compression algorithm has
been regarded as an improved version of the EZW algorithm. Instead of using a 4-ary
alphabet, it codes each significance status (either for a set of coefficients or an individual
coefficient) and the signs of coefficients in 0s and 1s. The SPIHT algorithm uses the
following notation for explaining the new coding method:
O(i, j): set of coordinates of all offspring of node (i, j);
D(i, j): set of coordinates of all descendants of the node (i, j);
H: set of coordinates of all spatial orientation tree roots (nodes in the highest
pyramid level);
L(i, j) = D(i, j) – O(i, j)
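For the standard dyadic layout, these sets can be enumerated directly. The sketch below (Python; the parent-child rule (i, j) → (2i, 2j), (2i, 2j+1), (2i+1, 2j), (2i+1, 2j+1) and the square transform width `size` are our assumptions) computes O, D and L for a node:

```python
def offspring(i, j, size):
    # O(i, j): the four children of node (i, j), clipped to the transform extent
    kids = [(2*i, 2*j), (2*i, 2*j + 1), (2*i + 1, 2*j), (2*i + 1, 2*j + 1)]
    return [(k, l) for (k, l) in kids if k < size and l < size]

def descendants(i, j, size):
    # D(i, j): offspring plus all of their descendants
    out, stack = [], offspring(i, j, size)
    while stack:
        k, l = stack.pop()
        out.append((k, l))
        stack.extend(offspring(k, l, size))
    return out

def l_set(i, j, size):
    # L(i, j) = D(i, j) - O(i, j)
    direct = set(offspring(i, j, size))
    return [n for n in descendants(i, j, size) if n not in direct]
```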
The parent-child relationship is the same as in the EZW algorithm. We can
regard {(i, j)} ∪ D(i, j) as the tree rooted at (i, j), while L(i, j) is newly introduced. The
algorithm maintains three lists: the LIP (list of insignificant pixels) contains insignificant
pixels resulting from set partitioning; the LIS (list of insignificant sets) consists of all
insignificant sets of pixels; and the LSP (list of significant pixels) contains pixels whose
magnitudes are to be refined in the refinement pass. Initially, we have only the tree roots in the LIP
and their corresponding D(i, j) sets in the LIS. During the sorting pass (similar to the
dominant pass in the EZW), the LIP is scanned prior to the LIS. After that, the LSP is
scanned during the refinement pass.
When examining the LIP, each pixel is coded by outputting its significance
state, and its sign if it is significant. Any significant pixel is moved to the LSP.
When scanning the LIS, the algorithm outputs the significance state of each set. If
a set is insignificant, none of the coefficients in it is significant, so they need not be
coded individually. However, if the set is significant, it is partitioned as follows:
if it is a D(i, j) set, it is broken into L(i, j) plus four individual elements
(k, l), (k, l) ∈ O(i, j);
if it is an L(i, j) set, it is broken into four sets D(k, l), (k, l) ∈ O(i, j).
The old set is removed while the resulting sets (except empty ones) are appended to
the end of the LIS. The four elements coming from the set partitioning are
coded similarly to those in the LIP. After that, any pixel that remains insignificant is
moved to the LIP.
The SPIHT [5] algorithm surpasses the EZW [17] in terms of PSNR
performance. Since it outputs each event as a 0 or 1, there is also a binary uncoded
version, which omits the arithmetic coder. The binary uncoded version is 0.3dB to
0.6dB inferior to the version with arithmetic coding; nevertheless, it is faster in coding and
simpler to implement.
Chapter 4 Non-uniform filter banks and its
application to image coding
4.1 Introduction
In section 2.5 of Chapter 2, we discussed the principle of deriving filter banks
with rational sampling rates from uniform filter banks [12]. According to [12], the
position of the synthesis bank (k) is restricted to be even. This is very limiting when we want to
adapt to the spectral characteristics of images using non-uniform filter banks with different
bandwidth partitions. Moreover, for image coding, it is very critical for filter banks to
have linear phase. The work in [12] did not consider the linear phase property.
In this chapter, the conditions for relaxing the constraint on k will be shown so that
it can take odd positions as well. In that case, we have to allow a reversed
matching order of the two sets of filters in terms of their passband frequencies. Besides,
the linear phase (LP) property can be achieved when we have both an LP base filter bank and
LP synthesis filter banks; the symmetry polarities simply depend on the parity of k.
Moreover, the optimization issues of such non-uniform filter banks will be discussed.
Then, the optimized filter banks will be applied to image coding. It will be proposed
that different non-uniform partitions can be applied subject to the spectral distribution
of images, and the idea of determining the non-uniform partition will be explained. Lastly, the
coding results will be discussed and the observations explained.
4.2 Extracting the correct spectrum of the signal
It has been emphasized that the combined channels of the filter bank should have a
meaningful spectrum. By “meaningful”, we mean that the spectrum extracted should be
contiguous and in the correct order. Consider the equivalent structure of a rational
sampling branch, in which the input is upsampled by p, filtered by the cascade of an
analysis filter Hk+j(z^p) and a synthesis filter Gi(z^q), and downsampled by q.
The first analysis filter Hk(z) is situated in

[kπ/q, (k+1)π/q]

and the last analysis filter Hk+p-1(z) is situated in

[(k+p-1)π/q, (k+p)π/q]

while Gi(z) resides at

[iπ/p, (i+1)π/p],  i = 0, 1, …, p-1.

Consider the upsampled versions of Hk(z) and Gi(z), i.e. Hk(z^p) and Gi(z^q). For
simplicity, we neglect everything outside [0, π], as it is just the periodic and mirrored
version of what lies within [0, π]. The images of Hk(z^p) are of width π/(pq); for even
n, the normal version occupies

[(nq+k)π/(pq), (nq+k+1)π/(pq)]

and for odd n, the mirrored version occupies

[((n+1)q-k-1)π/(pq), ((n+1)q-k)π/(pq)]

where n = 0, 1, …, p-1. For Gi(z^q), the normal version occupies

[(mp+i)π/(pq), (mp+i+1)π/(pq)]

and the mirrored version occupies

[((m+1)p-i-1)π/(pq), ((m+1)p-i)π/(pq)]

where m = 0, 1, …, q-1.
Suppose we want to extract a contiguous spectrum in the upsampled domain, and we
allow the two sets of filters to be matched in reverse order in terms of passband
distribution. The spectrum can be extracted only if the images of the two sets of filters
coincide, i.e. the p consecutive images of Hk(z^p), …, Hk+p-1(z^p) fall exactly onto the
p images of G0(z^q), …, Gp-1(z^q) belonging to a single block m. Therefore, we have to
find m and n such that, for even n,

nq + k = mp

and for odd n,

(n+1)q - k = (m+1)p

where m = 0, 1, …, q-1 and n = 0, 1, …, p-1. Rewriting the above two equations
modulo p, we have

nq + k ≡ 0 (mod p) for even n

or

(n+1)q - k ≡ 0 (mod p) for odd n.

To solve the above two congruences, one can make use of the Chinese Remainder
Theorem, as p and q are relatively prime. The even-n case becomes

nq + k = ap

and the odd-n case becomes

(n+1)q - k = bp.

On solving for a and b in the above two equations, we can find m and n, with m = a in
the even case and m = b - 1 in the odd case.
Examples
Let us consider a non-uniform filter bank with passband bandwidths (π/8, π/8, π/8,
5π/8). Here, we have p = 5, q = 8 and k = 3 for the combined channel. We have

8n + 3 ≡ 0 (mod 5) for even n, or 8(n+1) - 3 ≡ 0 (mod 5) for odd n.

Thus, for the even-n congruence,

8n ≡ 2 (mod 5), giving n = 4 and m = (4×8 + 3)/5 = 7,

while the odd-n congruence gives n = 0, which violates the assumption that n should be
odd.
Therefore, we have n = 4 and m = 7. Similarly, one can verify the following non-
uniform filter bank cases:
(π/8, π/8, 5π/8, π/8), (π/8, 5π/8, π/8, π/8) and (5π/8, π/8, π/8, π/8)
The values of m and n are m = 5, n = 3; m = 2, n = 1; and m = 0, n = 0 respectively.
Figure 4-17 illustrates the locations of Hk(z^p) and Gi(z^q) and the relation of m and n
in the upsampled domain, while Figure 4-18 shows the spectra of the combined channels
for the different values of k in our practical design.
Now, let us consider the case of (1/3, 2/3) we have mentioned before. We have p = 2,
q = 3 and k = 1. Hence, we have

3n + 1 ≡ 0 (mod 2) for even n, or 3(n+1) - 1 ≡ 0 (mod 2) for odd n.

The even-n congruence requires n to be odd, and the odd-n congruence requires n to be
even: both violate the parity assumption of n. As a result, we cannot obtain a
contiguous passband for the band [π/3, 2π/3] using this method.
[Grids over the upsampled frequency axis in [0, π] showing the image indices of
Gi(z^q) for m = 0, …, 7 and of Hk(z^p) for k = 0, 1, 2, 3 against n = 0, …, 4]
Figure 4-17 Illustration of the upsampled versions of Hk(z) and Gi(z) and the relation of m and n
Figure 4-18 Real non-uniform filter bank design examples with different values of k. The plots show the spectra of the combined
channels with bandwidth 5π/8 for k = 0, 1, 2, 3.
Note that for even n the images form the normal version, i.e. the bands go in
ascending order as frequency increases, while for odd n they form the mirrored version
and the bands go in descending order; the same holds for Gi(z^q) with respect to the
parity of m. As a result, the filters should be matched in normal order if the two sets
of bands go in the same direction, and in reverse order if they go in opposite
directions. In other words, the filters should be matched in normal order if

m ≡ n (mod 2)

Otherwise, the channels should be matched in reverse order:

m ≢ n (mod 2).
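A hypothetical helper (not part of the thesis) can search for n and m by testing the two alignment conditions, nq + k = mp for even n and (n+1)q - k = (m+1)p for odd n, and then reading off the matching order from the parities of m and n:

```python
def find_matching(p, q, k):
    """Return (n, m, order) for a combined channel of p bands starting at
    position k in a q-channel base bank (gcd(p, q) = 1), or None if no
    contiguous spectrum can be extracted this way."""
    # Even-n case: the normal images of Hk(z^p) require n*q + k = m*p.
    for n in range(0, p, 2):
        if (n * q + k) % p == 0:
            m = (n * q + k) // p
            order = "normal" if m % 2 == n % 2 else "reverse"
            return (n, m, order)
    # Odd-n case: the mirrored images require (n+1)*q - k = (m+1)*p.
    for n in range(1, p, 2):
        if ((n + 1) * q - k) % p == 0:
            m = ((n + 1) * q - k) // p - 1
            order = "normal" if m % 2 == n % 2 else "reverse"
            return (n, m, order)
    return None
```

For (p, q, k) = (5, 8, 3) this returns n = 4 and m = 7 with reverse-order matching, agreeing with the worked example, and for (p, q, k) = (2, 3, 1) it finds no solution, agreeing with the (1/3, 2/3) case.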
4.3 Linear phase property
For image coding using uniform filter banks, the number of channels is always chosen
to be 2^N. So, in our image coding examples later in this chapter, we use filter banks with
Figure 4-19 Symmetry of the combined filters in our proposed design, where S = symmetric filter and A = anti-symmetric filter.
(a) The resulting filter is symmetric if the first base filter is symmetric, i.e. k even.
(b) The resulting filter is anti-symmetric if the first base filter is anti-symmetric, i.e. k odd.
[Diagram: the alternating S/A polarities of the odd number of combined branches, for k even and k odd]
2^N channels as the base banks for deriving non-uniform filter banks. As mentioned
before, the downsampling rate of the base filter bank and the upsampling rate of the
synthesis bank have to be coprime, and our base filter banks have 2^N channels (i.e. the
downsampling rate has factors of 2 only). As a result, the number of channels of the
synthesis banks can be any odd number larger than 1 and less than 2^N.
The work in [12] only considered general filter banks with rational sampling
rates, not filter banks with linear phase. However, as image coding using non-linear-
phase filter banks produces undesirable artifacts upon quantization, the linear phase
property is essential for our non-uniform filter bank, i.e. each filter should have a
symmetric or anti-symmetric impulse response. Fortunately, one can achieve linear phase
combined channels if both the base filter bank and the synthesis banks chosen to form
the rational sampling channels are of linear phase. Usually, successive filters of linear
phase filter banks have alternating symmetric and anti-symmetric properties. Hence, all
branches to be combined will be of the same symmetry, i.e. either symmetric or anti-
symmetric, as illustrated in Figure 4-19. As the resulting filter is the sum of all these
branches, it will be symmetric or anti-symmetric if these branches have the same length
and thus the same center of symmetry. So, the simplest way is to choose base analysis
filters of equal length, and likewise for the synthesis filters. For simplicity, in our
design examples later in this chapter, all the filter banks, including the analysis base
banks and the synthesis banks, have equal-length filters. Since our synthesis banks
always have an odd number of channels, the first filter will be symmetric. Thus the
symmetry of the combined channel is determined solely by the first analysis filter to be
combined, as depicted in Figure 4-19.
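This symmetry argument is easy to check numerically. The sketch below (a hypothetical helper, not from the thesis) classifies an impulse response and confirms that summing equal-length branches of the same polarity preserves linear phase:

```python
def symmetry(h, tol=1e-12):
    """Return 'S' for a symmetric impulse response, 'A' for anti-symmetric,
    or None if the filter is not linear phase."""
    n = len(h)
    if all(abs(h[i] - h[n - 1 - i]) < tol for i in range(n)):
        return "S"
    if all(abs(h[i] + h[n - 1 - i]) < tol for i in range(n)):
        return "A"
    return None

# Two equal-length symmetric branches share the same center of symmetry,
# so their sum stays symmetric (the same holds for anti-symmetric pairs).
h1 = [1.0, 2.0, 3.0, 2.0, 1.0]
h2 = [0.5, -1.0, 0.0, -1.0, 0.5]
combined = [a + b for a, b in zip(h1, h2)]
```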
4.4 Design examples
Based on the above discussions, we are now able to design non-uniform filter banks
with rational sampling factors. These filter banks will be applied to image coding in
the next section to see the improvements over uniform ones. As mentioned before, our
base uniform filter banks have 2^N channels. Here, we chose a 16-channel LPPUFB
as reviewed in Chapter 2, optimized for the coding gain of an AR(1) process as shown
in Figure 2-4. For the odd-channel filter banks used to combine several channels into
one with a rational sampling rate, we make use of the lattice structure in section 2.3.3.
A. Optimization
Due to the large number of lattice coefficients involved in the optimization of our
proposed non-uniform filter banks, we simplify the problem by first designing the
uniform filter bank before bringing in the synthesis banks that form the desired
rational sampling bands. For image coding, it is always better to optimize the filter
bank with the coding gain for the AR(1) process, as mentioned in section 2.4. After
that, following section 4.2 of this chapter, we need to find the values of m and n as
well as the order in which the two sets of filters should be matched. We can then
optimize the rational sampling bands one by one with the uniform base unaffected. To
optimize a non-uniform channel, we choose the stopband attenuation as the criterion.
The passband distribution is very similar to that of one of the uniform filters and is
determined by m. Therefore, the stopband is

[0, mπ/M - Δ1] ∪ [(m+1)π/M + Δ2, π]

where M, m, Δ1 and Δ2 are the number of channels of the base filter bank, the relative
position of the resulting non-uniform filter, and the two transition bandwidths
respectively. For the
computation of the stopband attenuation from an impulse response, we may refer to
section 2.4 of Chapter 2. Upon optimizing the non-uniform channels to a satisfactory
stopband error, we could further optimize the filter bank by allowing the uniform base
to be fine-tuned. However, we could not find any literature on the coding gain of
filter banks with rational sampling factors. Besides, a filter bank with a smaller
overall stopband error does not necessarily give better coding performance.
Consequently, we chose not to make any change to the finely optimized base bank for
the non-uniform banks applied to image coding. However, a filter bank design example
in which both the base bank and the synthesis banks are optimized will also be shown.
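As an illustration of this criterion, the stopband energy can be evaluated on a uniform frequency grid (a sketch, assuming the stopband is the complement of the passband [mπ/M, (m+1)π/M] minus the two transition bands d1 and d2; the helper name is hypothetical):

```python
import cmath
import math

def stopband_energy(h, m, M, d1, d2, grid=256):
    """Average |H(e^jw)|^2 over the stopband
    [0, m*pi/M - d1] U [(m+1)*pi/M + d2, pi]."""
    lo, hi = m * math.pi / M - d1, (m + 1) * math.pi / M + d2
    total, count = 0.0, 0
    for g in range(grid + 1):
        w = math.pi * g / grid
        if w <= lo or w >= hi:
            # evaluate the frequency response directly from the impulse response
            H = sum(h[n] * cmath.exp(-1j * w * n) for n in range(len(h)))
            total += abs(H) ** 2
            count += 1
    return total / max(count, 1)

# For a lowpass averaging filter, the high-frequency stopband (m = 0) holds
# far less energy than a highpass placement (m = 3) would demand.
h = [0.25, 0.25, 0.25, 0.25]
```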
B. Examples
Here we show five design examples of non-uniform filter banks. The first three are
related, while the last two have the same frequency distribution. All of them are based
on the 16-channel LPPUFB described in Chapter 2 and shown in Figure 2-4. The odd-
channel LPPUFBs responsible for synthesizing the desired bands were also introduced in
Chapter 2.
In Figure 4-20 to Figure 4-24 of the following examples, the dashed lines are the
uniform channels coming from the base filter bank, while the solid lines are the
rational sampling channels. Rational sampling channels have bandwidths similar to those
of the uniform channels, as the input is upsampled before passing into them. The
vertical dashed lines in Figure 4-21 separate the upsampled input into three parts;
basically, the input is filtered in one of the three parts.
1. Filter bank with bandwidths (π/16, π/16, π/16, π/16, 5π/16, 7π/16)
This non-uniform filter bank is formed based on a 16-channel LPPUFB, employing a
5-channel and a 7-channel LPPUFB to combine the 5th to 9th and the 10th to 16th bands
of the 16-channel base. For the 5-channel LPPUFB, we use one with filter lengths of
5×3; for the 7-channel LPPUFB, we use one with filter lengths of 7×5.
For the band of 5π/16: m = 11, n = 3, and the filters are matched in normal order.
For the band of 7π/16: m = 15, n = 6, and the filters are matched in reverse order.
The frequency response is plotted in Figure 4-20.
2. Filter bank with bandwidths (π/16, 3π/16, 5π/16, 7π/16)
This filter bank is similar to the previous one except that a 3-channel LPPUFB is
additionally required to combine the 2nd to 4th channels into a rational sampling band
with bandwidth 3π/16. Therefore, for convenience, one can first design the previous
bank before adding the 3π/16 band. Once the previous filter bank has been optimized, we
Figure 4-20 6-channel non-uniform filter bank with bandwidths (π/16, π/16, π/16, π/16, 5π/16, 7π/16)
can add this 3π/16 band by giving up the 2nd to 4th uniform channels. We make use of a
3-channel LPPUFB with filter lengths of 3×5.
For the band of 3π/16: m = 11, n = 2, and the filters are matched in reverse order.
The frequency response of the extra 3π/16 band can be found in Figure 4-21. The
frequency plot of the whole filter bank is the superposition of this plot and that of
the previous filter bank. Notice that the frequency response of this 3π/16 band
overlaps with that of the 5π/16 band, since their m's are equal.
3. Filter bank with bandwidths (π/16, π/16, π/16, π/16, π/16, π/16, π/16, π/16, π/16,
7π/16)
We can form a filter bank with an extra rational sampling band based on a bank that has
already been designed, i.e. we can obtain example 2 from example 1. Conversely, one
can drop any of the rational sampling bands and replace the original uniform
Figure 4-21 Frequency plot of the extra 3π/16 band [π/16, 4π/16]. It overlaps with that of the 5π/16 band.
channels back to obtain a filter bank with more uniform channels. Thus, by giving up
the 5π/16 band in example 1, we achieve the desired bandwidths of this example.
In a practical design process, one would design this filter bank first, example 1
second and example 2 last. The frequency plot is shown in Figure 4-22.
4. Filter bank with bandwidths (π/16, 5π/16, 5π/16, 5π/16)
For this 4-channel filter bank, the three rational sampling bands have the same
bandwidth of 5π/16. However, we use 5-channel LPPUFBs with filter lengths of 5×5, 5×3
and 5×1 respectively. The values of m are 13, 14 and 15 respectively, and they share
the same n = 4. Therefore, the three bands are adjacent to each other, as shown in
Figure 4-23.
Figure 4-22 10-channel non-uniform filter bank with bandwidths (π/16, π/16, π/16, π/16, π/16, π/16, π/16, π/16, π/16, 7π/16)
5. Filter bank with bandwidths (π/16, 5π/16, 5π/16, 5π/16), with both the base bank
and the synthesis banks optimized for minimum overall stopband error
This bank comes from example 4 but is further optimized by allowing modification of
the base filter bank so as to obtain a better overall stopband attenuation. Figure 4-24
depicts the frequency plot of the filter bank. We can see that the stopband attenuation
is much higher than that in Figure 4-23, approaching 30 dB; this is obviously because
we allowed much more freedom in the optimization. Despite the much better stopband
error, however, its coding performance is not as good as that of the bank in Figure 4-23.
Figure 4-23 4-channel non-uniform filter bank with bandwidths (π/16, 5π/16, 5π/16, 5π/16)
4.5 Image coding examples and results
To see the coding performance of the non-uniform filter banks we have designed, we
applied them in our image coding algorithm and tested them on several test images.
For quantization, we employed scalar quantization with a dead zone for all subbands.
For the entropy coding part, we applied a coefficient modeling method similar to that
in [11].
4.5.1 Subband images after non-uniform decomposition
Upon analysis, the original image is decomposed into a set of subband images, each
representing different frequency contents of the original image. For a 4-channel
non-uniform filter bank with bandwidths (π/16, 3π/16, 5π/16, 7π/16), the widths or
heights of the subbands are in the ratio 1:3:5:7, as
Figure 4-24 4-channel non-uniform filter bank with bandwidths (π/16, 5π/16, 5π/16, 5π/16). Both the base bank and the synthesis
banks are optimized.
it is determined by their sampling ratios (1/16 : 3/16 : 5/16 : 7/16). Figure 4-25
depicts the subbands upon decomposition by the filter bank we have just mentioned. On
the other hand, for decomposition using a traditional uniform filter bank, one can
expect subbands of equal size, as illustrated in Figure 4-26.
For the decomposition results of filter banks like (π/16, π/16, π/16, π/16, 5π/16,
7π/16) and (π/16, π/16, π/16, π/16, π/16, π/16, π/16, π/16, π/16, 7π/16), we can
imagine they will look like Figure 4-25 with some rational sampling subbands replaced
by their uniform counterparts from Figure 4-26.
4.5.2 Coding performance and observations
In our experiments, we chose the well-known test images Lena, Barbara, Goldhill,
Baboon and Airplane. All of them are 8-bit grey level images of size 512×512. First,
we used the following non-uniform filter banks: (1,3,5,7)π/16, (1,1,1,1,5,7)π/16 and
(1,1,1,1,1,1,1,1,1,7)π/16. We also applied the 16-channel uniform filter bank for
comparison. The Peak Signal-to-Noise Ratio (PSNR) was recorded for bit rates from
0.1 bpp to 1.0 bpp. To obtain coding results over such a wide range of bit rates, we
encoded each image at 1.0 bpp with its optimal quantization step size; then, to reach a
desired bit rate, we simply truncated the stream at the point where that bit rate was
reached. The PSNR results for 0.125, 0.25, 0.5 and 1.0 bpp can be found in Table 4.1.
The table also contains the results for the image Barbara with different horizontal and
vertical non-uniform partitions, which we will discuss later in this chapter.
Meanwhile, Figure 4-27(a)-(e) illustrates the coding performance of these filter banks
for the different images graphically.
From the plots, we can see that the non-uniform filter banks generally outperform
the uniform 16-channel LPPUFB in most cases, except for the image Barbara.
Among the non-uniform filter banks we investigated, the filter bank with
bandwidths (1,3,5,7)π/16 performs the best: it achieves the best PSNR results at almost
all bit rates, and its improvement over the uniform bank can be as high as 0.9 dB, as
found for the image Airplane.
We observed that the filter banks perform consistently over the different images,
and we can order their coding performance as:
(1,3,5,7)π/16 > (1,1,1,1,5,7)π/16 > (1,1,1,1,1,1,1,1,1,7)π/16 > 16-channel uniform
With 7 of the uniform channels combined into one rational sampling channel, we improve
the performance; we improve it further with the filter bank (1,1,1,1,5,7)π/16, and
finally we have the best filter bank, (1,3,5,7)π/16. We see that larger subbands at
relatively high frequencies benefit image coding. In images, energy is always
concentrated at low frequencies; the high frequency contents are less dominant, and
upon quantization many of the high frequency coefficients are quantized to zero. These
zeros usually form large areas, since large areas of zero coefficients represent the
flat or blurred regions of the image. Therefore, having larger subbands at high
frequencies helps form large zero areas, which favors the run length and arithmetic
coding of these zeros. For run length coding, several zeros can be coded efficiently as
a group, since they are very likely to appear together. For arithmetic coding with
context models, the implication of zeros from insignificant neighbors means fewer bits
are required. Besides, high frequency subbands mainly contain the outlines and
irregular textures of objects in the image, and these kinds of signal are usually quite
localized in the spatial domain. Thus, the larger the bandwidth, the more precise the
representation in the spatial domain, with fewer significant coefficients. We can
exploit this by 'predicting' the significance of a coefficient from its significant neighbors
since outlines are always connected in nature. For a more intuitive understanding, we
can examine the decomposed subband images in Figure 4-25 and Figure 4-26. For the
decomposition with non-uniform and larger subbands, the shape of Lena is easily seen;
in other words, the significant coefficients are concentrated and connected. In
contrast, for the subband images decomposed by the uniform filter bank, the significant
coefficients representing the shape of Lena are spread over most of the subbands. We
see many small Lena images and thus more significant coefficients.
We observed that the best filter bank has bandwidths (1,3,5,7)π/16, i.e. its
bandwidths increase from low frequency to high frequency. This fits the decaying
spectrum property of most images. To verify this, we may compare its performance with
a similar 4-channel filter bank whose last three channels
Figure 4-25 Subbands of the image Lena obtained after decomposition by the 4-channel non-uniform filter bank (π/16, 3π/16, 5π/16,
7π/16)
have equal bandwidths, i.e. the filter bank with bandwidths (π/16, 5π/16, 5π/16,
5π/16). From Figure 4-27(f), we see that this 4-channel non-uniform filter bank is
inferior to the (1,3,5,7)π/16 filter bank. With the decaying spectrum property, the
higher the frequency, the smaller the variation of the energy. Therefore, combining
more channels into a wider one at high frequency causes only a small loss in energy
compaction but gives a much higher entropy gain.
There is also another filter bank with the same bandwidths (1,5,5,5)π/16 but
optimized for overall stopband attenuation by altering both the base bank and the
synthesis banks. Its performance is even worse than the previous one, because a higher
overall stopband attenuation does not guarantee the best coding result. Indeed,
allowing optimization of the uniform base for a smaller overall stopband error destroys
its high coding gain.
Figure 4-26 Subbands of the image Lena obtained after the decomposition by a 16-channel uniform filter bank
                                                         PSNR (dB) at bit rate (bpp)
Filter bank                                   Image      0.125     0.25      0.5       1.0
16-channel LPPUFB                             Lena       30.1420   33.2126   36.4530   39.7104
                                              Barbara    26.2664   29.2165   32.8824   37.6777
                                              Goldhill   28.0476   30.2388   32.8717   36.3483
                                              Baboon     22.9430   24.3481   26.5659   30.1622
                                              Airplane   28.5113   31.9842   35.7285   40.1436
NUFB (1,3,5,7)π/16                            Lena       30.3032   33.6093   36.9375   40.2971
                                              Barbara    25.4638   28.4150   32.6808   37.9097
                                              Goldhill   28.3059   30.4461   33.1392   36.7305
                                              Baboon     23.1276   24.4758   26.8839   30.7011
                                              Airplane   29.1193   32.5229   36.4834   41.0526
NUFB (1,1,1,1,5,7)π/16                        Lena       30.4151   33.5935   36.8346   40.1878
                                              Barbara    25.8020   28.5844   32.6387   37.8062
                                              Goldhill   28.2578   30.3836   33.0102   36.6103
                                              Baboon     23.1553   24.4383   26.7921   30.5779
                                              Airplane   28.9182   32.1912   36.1962   40.7981
NUFB (1,1,1,1,1,1,1,1,1,7)π/16                Lena       30.2659   33.2908   36.6305   39.9146
                                              Barbara    26.0305   28.9936   32.8727   37.7500
                                              Goldhill   28.2398   30.3075   32.9070   36.4032
                                              Baboon     23.0611   24.4017   26.6660   30.3457
                                              Airplane   28.8474   32.0874   35.8219   40.3351
NUFB (1,5,5,5)π/16                            Lena       29.7453   33.1299   36.7196   40.1831
NUFB (1,5,5,5)π/16 (base and synthesis
banks optimized)                              Lena       29.5969   32.9706   36.5334   40.0654
H: (1,1,3,3,5,1,1,1)π/16, V: (1,1,3,3,1,7)π/16  Barbara  26.1195   29.4099   33.5413   38.3710
H: (1,1,3,3,1,7)π/16, V: (1,1,3,3,5,1,1,1)π/16  Barbara  26.0782   29.2324   33.2516   38.1143
Table 4.1 PSNR results for image coding using different NUFBs and the LPPUFB on different test images
[Plots of PSNR (dB) versus bit rate (bpp): (a) Lena, (b) Barbara, (c) Goldhill, (d) Baboon and (e) Airplane, each comparing the
16-channel LPPUFB with the NUFBs (1,3,5,7)π/16, (1,1,1,1,5,7)π/16 and (1,1,1,1,1,1,1,1,1,7)π/16; (f) Lena, comparing the
16-channel LPPUFB, the NUFB (1,3,5,7)π/16 and the two (1,5,5,5)π/16 NUFBs]
Figure 4-27 (a)-(f) Plots of PSNR vs bit rate for different images using four different filter banks
Figure 4-28 Reconstructed Lena images (512×512): (a) & (b) at bit rate 0.25 bpp, (c) & (d) at bit rate 0.5 bpp; (a) & (c) used the
16-channel LPPUFB (PSNR = 33.21 dB and 36.45 dB), (b) & (d) used the NUFB (π/16, 3π/16, 5π/16, 7π/16) (PSNR = 33.61 dB and 36.94 dB)
For the image Barbara, the result is quite different from the other images. The PSNR
results for the non-uniform filter banks are better than those for the 16-channel
uniform bank only at the bit rate of 1.0 bpp. This is probably because of the spectral
characteristics of the image Barbara: its power spectrum does not satisfy the decaying
spectrum hypothesis well. Moreover, the horizontal and vertical spectra, shown in
Figure 4-30, are not alike. Therefore, decomposing the image Barbara in the (1,3,5,7)
manner in both the horizontal and vertical orientations cannot provide the best
solution for this image.
How, then, can we find a non-uniform filter bank adapted to the image's spectra?
To answer this, consider the horizontal and vertical spectra in Figure 4-30. For an
ideal subband coder, combining subbands into one covering the same frequency range can
only worsen the energy compaction capability. However, we can give up a little energy
compaction gain and obtain a better entropy gain by forming larger subbands. To ensure
only a limited energy compaction loss, we should combine only those subbands that have
little or no power variation between them. Hence, for Barbara, we designed the
partitions of the non-uniform filter banks shown as the vertical dotted lines in
Figure 4-30. After the image was decomposed into subbands with different horizontal and
vertical filter banks, we achieved a better entropy coding result, as explained before.
Numerical results can be found in Table 4.1, while Figure 4-29 shows the improvement
graphically compared with the 16-channel LPPUFB. There are also results for
decomposition with the horizontal and vertical filter banks exchanged. Its performance
is worse than the normal arrangement but still better than the uniform one; this is
because the horizontal and vertical filter banks differ only in the upper π/2 of the
band. Here, we see that decomposition with suitably different horizontal and vertical
filter banks can sometimes provide better coding performance.
From the above observations, we propose an algorithm for partitioning non-uniform
filter banks based on the energy variation of the subbands. The basic idea is to merge
subbands with similar energy levels: the algorithm searches from low frequency to high
frequency until the energy level of a subband differs from the running average by more
than Δ. Here, Δ is determined by the DC energy S1 of the image and an adjustable
parameter α. The algorithm continues searching until all subbands have been covered.
Before running the algorithm, we have to compute the energy of every subband of the
image, which can be done with the Fast Fourier Transform. As mentioned before, we may
use different vertical and horizontal partitions; therefore the algorithm can be
carried out for an image both vertically and horizontally. In the algorithm below, Si
represents the energy of subband i, while k and M are the starting position of the
synthesis bank and the number of channels of the uniform base respectively. Note that
the algorithm keeps the DC subband alone, as its energy is always much higher than the
others.
Algorithm for partitioning non-uniform filter banks

output 1, 1
Δ = αS1
Stt = S2
k = 2
i = 3
while i <= M+1
    Sd = |Stt/(i-k) - Si|
    if (Sd >= Δ or i = M+1)
        if (i-k is odd)
            output k, i-1
            k = i
            i = i+1
            Stt = Sk
        else
            output k, i-2
            k = i-1
            Stt = Sk
    else
        Stt = Stt + Si
        i = i+1
end while
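The algorithm above can be rendered directly in Python (a sketch; the energies are passed as a 1-indexed list S[1..M], and each output band is returned as a (start, end) channel pair):

```python
def partition_subbands(S, alpha):
    """S[1..M]: subband energies (S[0] unused); returns (start, end) pairs."""
    M = len(S) - 1
    bands = [(1, 1)]                 # the DC subband is always kept alone
    delta = alpha * S[1]
    k, i = 2, 3
    s_tot = S[2]                     # running energy sum of the current group
    while i <= M + 1:
        at_end = (i == M + 1)
        s_d = 0.0 if at_end else abs(s_tot / (i - k) - S[i])
        if s_d >= delta or at_end:
            if (i - k) % 2 == 1:     # group size odd: emit channels k .. i-1
                bands.append((k, i - 1))
                k = i
                i += 1
                if k <= M:
                    s_tot = S[k]
            else:                    # keep the group size odd: emit k .. i-2
                bands.append((k, i - 2))
                k = i - 1
                s_tot = S[k]
        else:
            s_tot += S[i]
            i += 1
    return bands
```

With energies resembling a decaying spectrum, e.g. S2..S9 = (10, 10, 10, 1, 1, 1, 1) after a DC energy of 100 and α = 0.05, the sketch returns the partition (1,1), (2,4), (5,7), (8,8), i.e. groups of 1, 3, 3 and 1 channels, all odd as required by the odd-channel synthesis banks.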
Figure 4-29 Plot of PSNR vs Bit rates using different decomposition for the image Barbara
[Plots of ln Sxx(e^jω) versus frequency: horizontal and vertical power spectra of the image Barbara]
Figure 4-30 Piecewise constant horizontal and vertical power spectra of the image Barbara. Dotted lines show the best partitions
found for the non-uniform filter banks.
[Plot of PSNR (dB) versus bit rate (bpp) for Barbara, comparing the 16-channel LPPUFB with H: (1,1,3,3,5,1,1,1)π/16,
V: (1,1,3,3,1,7)π/16 and with H: (1,1,3,3,1,7)π/16, V: (1,1,3,3,5,1,1,1)π/16]
Figure 4-31 Subbands obtained by different horizontal and vertical non-uniform filter banks with partitions as shown above.
4.6 Conclusion
In this chapter, the constraints on designing non-uniform filter banks with rational
sampling factors have been relaxed and formulated. Based on a 16-channel LPPUFB, we
have designed various non-uniform filter banks with different bandwidth distributions.
We have shown that non-uniform filter banks outperform traditional uniform filter banks
consistently. For images whose spectra deviate from the decaying spectrum property or
differ between the horizontal and vertical directions, non-uniform filter banks with
suitable partitions can be found to offer better coding results. The method we used to
design such filter banks is sub-optimal in terms of implementation complexity and
coding performance; nevertheless, the potential of non-uniform filter banks for image
coding has been demonstrated.
Chapter 5 Space and Frequency Context
Modeling for Block-Transform-Based
Image Coding
In this chapter, a new image compression algorithm for block-transform-based image
coding is proposed. The algorithm is based on context modeling of coefficients, and
both the intra-band and inter-band dependences are exploited. Coefficients are coded
in bit-planes to yield an embedded stream. Instead of coding each subband
independently, our algorithm treats the transformed image as a whole. It has been found
to have performance comparable to wavelet-based algorithms despite its simplicity of
implementation.
5.1 Introduction
The recent success of wavelet image coding is largely due to the application of
context modeling and entropy coding to the quantized coefficients. Being capable of
coding groups of zeros efficiently and of making good "predictions" of significant
coefficients from the neighboring context, wavelet image coding has been brought to
maturity. Besides, coding the coefficients from the most significant bit-plane to the
least significant bit-plane gives a progressive and embedded data stream. Hence, it
provides satisfactory rate-distortion performance and allows precise bit rate control
by truncation at any point of the stream. Therefore, it is worth seeing whether this
kind of technique can be applied to image coding using uniform decomposition. Actually,
there have been efforts to adapt image coding algorithms originally designed for the
wavelet transform to block transform image coding [34][40]. However, these methods
require each block of transform coefficients to be treated analogously to a full
wavelet tree. This requires either rearrangement or index mapping of the block
transform coefficients and hence increases implementation complexity. Moreover, these
methods perform further decomposition of the DC coefficients and are thus more complex.
On the other hand, by considering 8 frequency neighbors and 8 space neighbors and
coding the coefficients in bit-planes in three passes, our proposed algorithm is
competitive with state-of-the-art algorithms like SPIHT.
5.2 Intra-band and Inter-band Dependence
It is known that context dependence can be exploited across subbands or within
subbands. The well-known zerotree coders EZW (Embedded Zerotree Wavelet) [17] and
SPIHT (Set Partitioning in Hierarchical Trees) [5] are two examples that exploit the
interband dependence. The idea is that the insignificance of a coefficient in a coarser
scale subband will most probably imply the insignificance of the coefficients at the
same location in the finer scale subbands. In [11], the adopted algorithm further
improves the coding performance by exploiting intraband dependence, i.e. the
probability of a coefficient being significant depends on the significance of its
adjacent coefficients within the same subband. The parent-child relationship in
zerotree coders can indeed be considered a frequency neighborhood, since a parent
coefficient and its child coefficients are adjacent horizontally, vertically and
diagonally in terms of frequency.
5.2.1 Space and frequency neighborhood
For block-transform-based coding, each coefficient in a transformed block represents certain horizontal and vertical frequency content at that rough location. Adjacent coefficients within the same transformed block have increasing (or decreasing) frequency in the horizontal, vertical or diagonal direction. We call these coefficients frequency neighbors of each other. Coefficients having the same relative position in adjacent blocks we call space neighbors, as they can be regarded as lying in the same subband but at adjacent locations. Therefore, for images decomposed by M-channel filter banks or transforms with M bases, the frequency neighbors are the adjacent coefficients except those outside the transformed block, while space neighbors are coefficients that are M-1 coefficients apart. Figure 5-32 depicts the relationship between a coefficient and all its space and frequency neighbors.
5.2.2 Statistical context modeling
In our proposed algorithm, we code the transformed coefficients in bits. Therefore we have a bit sequence, and the minimum code length for the sequence is given by [37]:

    L = -Σ_i log2 P(x_i | x_{i-1}, x_{i-2}, …, x_1)    (5.)

where x_i is the current bit to be coded and x_{i-1}, x_{i-2}, …, x_1 represent the bits coded in the past. Given the quantized coefficients, this achieves the minimum lossless rate.
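To make the minimum-rate expression concrete, the following sketch (our illustration, not the thesis implementation) computes the ideal code length of a bit sequence under a simple adaptive binary model with add-one counts, the quantity a binary arithmetic coder can approach:

```python
# Sketch (assumption): ideal code length -sum log2 P(x_i | past) under an
# adaptive binary model with Laplace (add-one) smoothed counts.
import math

def ideal_code_length(bits):
    n0 = n1 = 1          # smoothed counts of past zeros and ones
    length = 0.0
    for b in bits:
        p = (n1 if b else n0) / (n0 + n1)   # P(b | bits coded so far)
        length += -math.log2(p)
        if b:
            n1 += 1
        else:
            n0 += 1
    return length
```

A highly predictable sequence costs far fewer bits than its length, while a balanced, unpredictable one costs about one bit per symbol.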
[Figure layout: the centre coefficient C is surrounded by horizontal (H), vertical (V) and diagonal (D) neighbors; the context scores shown in brackets are Hf = 4, Vf = 4, Df = 2 for frequency neighbors and Hs = 3, Vs = 3, Ds = 1 for space neighbors.]

Figure 5-32 Space and frequency neighbors in three different directions inside a block-transformed image using an M-channel FB or a transform with M bases.
The notations H, V and D represent the horizontal, vertical and diagonal neighbors of the coefficient C respectively, while the subscripts s and f indicate space and frequency neighbors of the three types. The numbers inside brackets are the context scores of each type of neighbor.
Practically, we can use a binary arithmetic coder to approach this minimum rate. The main issue is to make a good choice of a subset of {x_{i-1}, x_{i-2}, …, x_1} such that we still have a reasonable approximation of the conditional probability. In our discussion, let us denote such a subset of {x_{i-1}, x_{i-2}, …, x_1} by S. The probability of the current bit is determined by the set of past observations S; therefore S serves as the modeling context. Given the observation S, we may form an estimate of the conditional probability that serves as the source to the statistical model. For image coding, the past observations can consist of any causal adjacent bits in both the space and frequency domains. What we have chosen are the 8 frequency neighbors and 8 space neighbors surrounding the current coefficient bit C; for a graphical representation, refer to Figure 5-32. Here H*, V* and D* represent the existence of corresponding significant neighbors. As an example, consider a horizontal neighbor at a particular bit-plane with threshold T_b: it is marked significant when its magnitude reaches the threshold, e.g.

    Hf0 = 1 if |c(Hf0)| ≥ T_b, and Hf0 = 0 otherwise,

where c(Hf0) denotes the coefficient at that neighbor position. Having the set of past observations, we have the statistical model

    P(x_i | Hf0, Hf1, Vf0, Vf1, Df0, …, Df3, Hs0, Hs1, Vs0, Vs1, Ds0, …, Ds3).    (5.)
Though this is quite precise for probability estimation, it is a rather high order model and thus causes two problems: (1) it is both memory and time consuming in real applications, and (2) we most likely do not have sufficient past observations to produce a robust estimate. Therefore, instead of looking at the individual significance states of the neighbors, we sum up each type of them. According to their space or frequency nature and their direction (i.e. horizontal, vertical or diagonal), we have 6 types of neighbors. Hence, the model becomes

    P(x_i | N_Hf, N_Vf, N_Df, N_Hs, N_Vs, N_Ds),    (5.)

where N_t is the number of significant neighbors of type t.
We still hope to avoid such high order modeling. It would be convenient if the probability were linearly dependent on the significance of the different types of neighbors; fortunately, good coding performance results from this assumption. What remains is to find the coefficients (i.e. the weights) for the corresponding neighbor types. Although this could be done through linear regression, choosing the weights empirically already achieves quite good results. Hence, we now have a much simpler estimate of the probability:

    P(x_i | w_Hf N_Hf + w_Vf N_Vf + w_Df N_Df + w_Hs N_Hs + w_Vs N_Vs + w_Ds N_Ds),    (5.)

where the weights w_t play the role of the context scores discussed in the next subsection. In order to fit the binary arithmetic coder, we have to map the above estimate into a finite number of context groups that represent the real probabilistic counts during coding.
5.2.3 Context quantization
With 8 frequency neighbors and 8 space neighbors, we have up to 65536 possible combinations of context information. Consequently, we have to reduce this to a manageable number of contexts. This process is called context quantization. Our context quantization rule is based on the discussion in the previous subsection. It maps the possible combinations to one of 9 context labels as follows: each type of neighbor that has been found significant is given a context score. The score can be thought of as reflecting the importance of that particular type of neighbor to the significance of the coefficient. Experimentally, we found that the context scores shown in brackets in Figure 5-32 give the best coding performance. Neighbors that are still insignificant have a score of 0. The overall context score of a coefficient is the sum of the context scores of all its significant neighbors found so far. The context label is then found from the context score using the lookup table in Table 5.2.

Context Score   Context Label
0               0
1               1
2               2
3-4             3
5-6             4
7-9             5
10-12           6
13-16           7
>16             8

Table 5.2 Lookup table for context labels

These nine context labels correspond to nine probability models for arithmetic coding. A coefficient has context 0 only if it does not have any significant frequency or space neighbor. The context quantization involves only integer arithmetic and a small lookup table, and is therefore quite efficient. Unlike coefficient modeling algorithms for the wavelet transform, our algorithm does not need different context quantization rules for different subbands, e.g. for subbands with different orientations.
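The scoring and lookup just described can be sketched as follows (an illustration under our reading of Figure 5-32 and Table 5.2; the function names are ours):

```python
# Sketch (assumption): summing per-type context scores of significant
# neighbors and quantizing the total to one of 9 context labels.

# Context scores from Figure 5-32: frequency neighbors Hf/Vf score 4 and
# Df scores 2; space neighbors Hs/Vs score 3 and Ds scores 1.
SCORES = {"Hf": 4, "Vf": 4, "Df": 2, "Hs": 3, "Vs": 3, "Ds": 1}

def context_label(significant_neighbors):
    """significant_neighbors: list of neighbor types found significant."""
    score = sum(SCORES[t] for t in significant_neighbors)
    # Table 5.2: upper bound of the score range for labels 0..7.
    for label, upper in enumerate([0, 1, 2, 4, 6, 9, 12, 16]):
        if score <= upper:
            return label
    return 8  # score > 16
```

For example, a coefficient whose horizontal and vertical frequency neighbors are both significant has score 4 + 4 = 8 and thus label 5, matching the 7-9 row of Table 5.2.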
5.2.4 Another context quantization method
Here we suggest an alternative method for context quantization whose coding performance is as good as the previous one but requires more context labels. This method only requires a bit to be set when updating the significance information of the neighbors, whenever a significant coefficient is found. Instead of counting the number of significant neighbors of each kind, it simply sets the significance state (bit) for that type of neighbor to 1 whenever any one of them is found to be significant. We have 6 different types of neighbors, so the context label consists of 6 bits. Therefore, there are a total of 64 different contexts. Again, context label 0 represents the context of a coefficient with no significant neighbor. An example of finding the context label from the neighboring significance states is shown in Figure 5-34. Here, we give up summing the total number of each type of significant neighbor; however, we make no assumption about the importance of the contribution of each type. This makes the method more adaptive to images of different natures, where different neighboring dependences may be exhibited. This alternative context quantization method provides results similar to those of the method in subsection 5.2.3; however, all the coding results in this chapter are still based on the method of subsection 5.2.3.

[Figure 5-33 Image after transformation with a 16×32 LPPUFB]
Type of significant neighbor:  Hf  Vf  Df  Hs  Vs  Ds
Bit values:                    1   0   0   0   1   0   →  context label 34

Figure 5-34 Example of finding a context label
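The 6-bit label can be formed by simple bit setting; the sketch below is our illustration, where the bit ordering (Hf as the most significant bit down to Ds) is an assumption chosen to reproduce the example of Figure 5-34:

```python
# Sketch (assumed bit ordering): the alternative context label packs one
# significance bit per neighbor type, Hf in the most significant position.

TYPES = ["Hf", "Vf", "Df", "Hs", "Vs", "Ds"]

def bitmask_label(significant_types):
    """6-bit context label from the set of neighbor types found significant."""
    label = 0
    for i, t in enumerate(TYPES):
        if t in significant_types:
            label |= 1 << (len(TYPES) - 1 - i)
    return label
```

With significant Hf and Vs neighbors this gives the binary pattern 100010, i.e. label 34 as in Figure 5-34, out of 64 possible contexts.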
5.3 Coding Passes
In our proposed algorithm, the probability for a coefficient bit to become significant is determined by its context label, which in turn depends on the neighboring significances of the coefficient. The conditional probabilities then drive the adaptive arithmetic coder. When coding a coefficient bit, the bit value and its context label are passed to the arithmetic coder, which outputs the bit stream. The arithmetic coder benefits from the uneven probability distributions of the different contexts and thus gives an entropy gain. Besides the context labels for the significance of coefficients, we have several context labels for the sign bits, the refinement bits and those used in the cleanup pass; they will be explained later in this section. The arithmetic coder starts with equal probabilities for the different contexts and is reset before coding each bit-plane for better adaptation. For the working principle of arithmetic coding, refer to Section 2.5.

During the coding process, coefficients are coded in bit-planes, and each bit-plane consists of three coding passes: the significance pass, the refinement pass and the cleanup pass. Each coefficient bit is coded in one and only one of the three coding passes. Since all coefficients initially have context 0, the coder starts with the cleanup pass of the highest bit-plane of the transformed image. Each bit-plane is scanned in a particular order. Starting at the top left of the transformed image, the first four bits of the first column are scanned, then the first four bits of the second column, and so on until the width of the image has been covered. After that, the second four bits of the first column are scanned, and so on. The scan continues until all bits of the image have been covered. Figure 5-35 illustrates the coding process, and each coding pass is described as follows.
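The scan order described above can be sketched as follows (our illustration; the (row, column) convention is an assumption):

```python
# Sketch (assumption): the bit-plane scan visits horizontal stripes of 4
# rows, column by column within each stripe, top-to-bottom within each
# column of 4.

def scan_order(height, width):
    order = []
    for stripe in range(0, height, 4):
        for col in range(width):
            for row in range(stripe, min(stripe + 4, height)):
                order.append((row, col))
    return order
```

For an 8×2 bit-plane this yields the first column of four, then the second column of four, before moving down to the next stripe of rows.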
5.3.1 Significance pass
In this pass, coefficients that are more likely to become significant in the current bit-plane are coded based on their significance context, i.e. on the existence of significant neighbors. This pass comes before the other passes because newly found significant coefficients contribute most to distortion reduction. Only coefficients with non-zero context are coded here. For sign coding, a single independent context label, 9, is reserved, since the sign appears to be independent of the context for transform coding.
5.3.2 Refinement pass
Significant coefficients that have already been discovered are refined in this pass. Since one more magnitude bit is coded, the uncertainty interval of a coefficient is halved and the coefficient is thus refined. We have two context labels for coding the refinement bits: if it is the first magnitude refinement bit of the coefficient, context label 10 is used; otherwise, context label 11 is used.
5.3.3 Cleanup pass
The cleanup pass includes all remaining bits of coefficients that were insignificant and were not coded in the significance pass. The coefficient bits are considered in vertical groups of four. If all four bits of a group need to be coded in the cleanup pass, an extra bit is used to indicate whether all of them are insignificant. Otherwise, they are coded in the same way as in the significance pass. If at least one of the four bits is significant, a two-bit run-length code is generated to indicate the first significant coefficient of the four, followed by sign coding of that coefficient. The other significant coefficients of the four are coded in the same way as in the significance pass. Context label 12 is assigned for coding the bit representing the insignificance of all 4 coefficient bits, while context label 13 is assigned for the two-bit run-length code. The cleanup pass codes the coefficients that are less likely to be significant and thus helps to speed up the decoding process, since the decoder may skip all 4 coefficients when a single decoded bit indicates that all 4 bits are insignificant.

[Figure 5-35 Illustration of the coding process: for each bit-plane, from the most significant bit-plane of the transformed image downwards, the significance pass, refinement pass and cleanup pass feed the binary arithmetic coder.]
5.3.4 Illustration of coding passes
In this subsection, we demonstrate the coding passes using the example transformed image shown in Figure 5-36, where we suppose the image was transformed by a transform with 4 bases. We illustrate the coding procedure using the middle block of coefficients. The shaded squares denote significant coefficients that have been found, with different grayscales representing significance found at different bit-planes. Figure 5-37 shows the bit output (at the top) and the context label (at the bottom) passed to the arithmetic coder for the bit-planes of T=8 and T=4.

During the bit-plane of T=8, the context score of the coefficient 13 is 7 and thus its context label is 5. It is found to be significant and its sign '+' is coded too. For the coefficient -3 below 13, the context score changes from 0 to 3 as the coefficient 13 is newly found to be significant. The other coefficients with non-zero context labels (i.e. -7, 1, 0) are coded in the significance pass, while the rest are ignored in this pass. During the refinement pass, since the coefficient 21 was discovered to be significant in the last bit-plane (T=16), its magnitude residue is refined here; as it is the first refinement bit, context label 10 is given. In the cleanup pass, there are two vertical groups of four, i.e. (0, -3, 1, 0) and (0, 0, 4, 0), that can be coded as groups. Both groups are insignificant in this bit-plane.
21  -7   0   0
13   1  -3   0
-3   0   1   4
 0  -1   0   0

Legend: shading indicates coefficients found significant after the bit-planes of T=16, T=8 and T=4 respectively.

Figure 5-36 Example transformed image with 3×3 blocks
T   Pass             Output (top) and context labels (bottom) for the coefficients
                     21, 13, -3, 0, -7, 1, 0, -1, 0, -3, 1, 0, 0, 0, 4, 0

8   Significance     1,+   0    0   0   0
    Context label    7,9   3    5   5   2
    Refinement       0
    Context label    10
    Cleanup          0   0   (group: 0)    (group: 0)
    Context label    0   0   (group: 12)   (group: 12)

4   Significance     0   1,-   0   0   0   0
    Context label    3   8,9   6   2   3   2
    Refinement       1    1
    Context label    11   10
    Cleanup          0   0   0   0   (1, 10, -, 0)
    Context label    0   0   0   0   (12, 13, 9, 3)

Figure 5-37 Example of coding passes for the image in Figure 5-36
For the significance and refinement passes of the bit-plane with T=4, the output bits and context labels are found similarly. In the cleanup pass, the group of four (0, -3, 1, 0) is no longer coded together, as (0, -3) have already been coded in the significance pass; therefore, (1, 0) are coded individually. For the group (0, 0, 4, 0), the first '1' indicates that at least one of them is significant. Then '10' indicates that the first significant coefficient is the third one. This is followed by its sign ('-'). Lastly, the last 0 is coded with context label 3.
5.3.5 Embedded bit stream truncation
Nowadays, embedded image coding is essential for various applications such as progressive image transmission, internet browsing, and scalable image and video databases. With embedded coding, the bit stream can be truncated at any point and the image can still be reconstructed with reasonable quality. As the transmission goes on, the quality of the reconstructed image improves, possibly up to lossless quality. In [17][5], coefficients are coded in bit-planes, and subbands of coarser scale are coded before those of finer scale, since low frequency subbands are assumed to contribute larger distortion decreases than high frequency subbands. As mentioned, our algorithm does not handle subbands separately, so we cannot guarantee that low frequency subbands are coded before those of higher frequencies. However, during the significance pass, only coefficient bits with non-zero contexts are scanned. Usually, low frequency subbands are more likely to have significant neighbors, and thus non-zero contexts, than higher frequency ones. Hence, coefficients of lower frequencies are usually coded in the significance pass (and thus before those of higher frequencies) during the early stage of coding. Therefore, our coder performs reasonably even at relatively low bit rates when the stream is truncated at some point. It can be seen later in this chapter that the PSNR curves are quite "convex"; that is, good rate-distortion performance is obtained even though only casual truncation of the bit stream is applied for bit rate control.
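The rate control mechanism amounts to a byte-budget cut of the embedded stream; a minimal sketch (our illustration, with the budget computation an assumption) is:

```python
# Sketch (assumption): casual truncation of an embedded bit stream to hit
# a target bit rate, measured in bits per pixel (bpp).

def truncate(bitstream, width, height, target_bpp):
    budget = int(width * height * target_bpp / 8)   # allowed bytes
    return bitstream[:budget]
```

Because the stream is embedded, the decoder simply stops when the truncated stream runs out, and the image reconstructed from the prefix remains usable.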
5.4 Coding Examples and Results
In our experiments, an 8×8 DCT, a 16×16 DCT and a 16×32 LPPUFB [28] were used for testing. The LPPUFB was our own implementation based on [28] and was optimized for coding gain (9.81 dB); for the details of the design and optimization of that filter bank, refer to Chapter 2. First, the images were block transformed by the above transforms. They were then quantized uniformly using a single quantization step before being passed to our proposed context modeling algorithm. To obtain the PSNR at various rates for the same image, we first compressed the image to a bit rate of 1.0 bpp; then we simply truncated the bit stream when the target bit rate was reached during decoding.
Table 5.3 shows the coding performance of our algorithm in terms of PSNR. The images tested were the 512×512 gray scale images Lena, Barbara, Goldhill, Baboon and Airplane. The results of the well-known wavelet coder SPIHT (with a 6-level wavelet decomposition) are included too. To isolate the contribution of the entropy coding part of our algorithm, we have to compare it with other image coders that also employ block transform techniques. Hence, in the last two columns of the table, we also include the results of the image coder of [34] using an 8×8 DCT plus a 3-level wavelet transform (9/7-tap filters) of the dc coefficients, and a 16×32 GLBT (Generalized Lapped Biorthogonal Transform) plus a 2-level wavelet transform (9/7-tap filters) of the dc coefficients, respectively; that image coder used SPIHT for quantization and entropy coding [34]. Figure 5-38 shows plots of PSNR versus bit rate for these image coding algorithms.
From the plots, one should note that our proposed algorithm with the 16×32 LPPUFB outperforms SPIHT consistently for all three images. Generally, our algorithm is better than SPIHT for Lena and Goldhill, with improvements under 0.4 dB. However, the improvement can be as high as 3.0 dB in the case of Barbara. This is because Barbara contains many regular textures that are high frequency dominant. It is known that uniform filter banks and lapped transforms can provide better high frequency resolution, and thus energy compaction, than the wavelet transform. Besides, our algorithm is able to obtain further entropy gain from the space contexts, since those high frequency components spread over the spatial domain.
From Table 5.3, it can be seen that our algorithm using the 8×8 DCT is comparable to the SPIHT-based coder of [34] (second last column of the table) except at relatively low bit rates. Also, our algorithm using the 16×32 LPPUFB is competitive with the coder based on the GLBT (with coding gain 9.96 dB) together with the SPIHT algorithm [34]. For the image Barbara, our algorithm even outperforms the GLBT with SPIHT consistently by about 0.5 dB. The GLBT, whose coding gain (9.96 dB) is higher than that of the LPPUFB (9.81 dB) used in our tests, is otherwise similar to the LPPUFB, as both can be regarded as block transform techniques. This confirms that the improvement brought by our algorithm for a highly-textured image like Barbara is due not only to energy compaction capability but also to our context modeling algorithm. On the other hand, one has to realize that the method in [34] requires three (respectively two) more levels of decomposition of the dc coefficients using the popular biorthogonal 9/7-tap wavelet pair for the 8×8 DCT (respectively the 16×32 GLBT). Furthermore, the SPIHT algorithm is tailor-made not for block transforms but for wavelet decompositions; to fit SPIHT, one has to rearrange the coefficients in a wavelet-transform-like manner, which unavoidably increases implementation complexity. In contrast, our algorithm requires only a significance map to store the neighboring significance states. The significance state of a coefficient is updated whenever one of its neighbor coefficients is found to be significant. Hence, we have the flexibility to speed up the coding process by giving up some of the space/frequency neighbors, with a trade-off of decreased coding performance.
With the results for the 16×16 DCT also included, we observe that coding performance improves significantly over the 8×8 DCT. This is partly due to the better energy compaction capability of the 16×16 DCT. The increased number of transform bases also provides better frequency context modeling, as more precise frequency information is available.
In principle, our algorithm works as follows. The dc coefficient of each block is discovered first, and the other frequency components near dc appear gradually. For a smooth region, only a few coefficients per block are found significant, while most of the other coefficients, having no significant neighbors (space or frequency), are coded efficiently and compactly in the cleanup pass. For a complex region, the signal spreads in both the space and frequency domains, and both space and frequency neighborhoods are useful. Highly-textured images, i.e. those dominated by certain frequency ranges, are handled by our space context modeling, since the textures usually cover a large area of the image. In general, the more significant neighbors a coefficient has, the higher the chance that it becomes significant. Moreover, the frequency neighborhood is more important than the space neighborhood for the significance of a coefficient, and diagonal neighbors, being farther away than horizontal and vertical neighbors, are less related to the significance of a coefficient. This is reflected in the context scoring.
In our experience, scanning subbands of lower frequencies before the others improves PSNR slightly, especially at relatively low bit rates, because the lower frequency subbands are usually more dominant. When the truncation point falls within a certain pass of a bit-plane, this gives slightly better progressive performance, as more significant coefficients are found before truncation. However, we still insist on maintaining implementation simplicity at the cost of a small coding performance trade-off.
Image     Rate (bpp)  SPIHT   16×32 LPPUFB  16×16 DCT  8×8 DCT  8×8 DCT + SPIHT [34]  16×32 GLBT + SPIHT [34]
Lena      1.0         40.41   40.52         40.26      39.96    39.91                 40.43
          0.5         37.21   37.34         36.92      36.39    36.38                 37.33
          0.25        34.11   34.29         33.67      32.51    32.90                 34.27
          0.125       31.10   31.09         30.36      28.49    29.67                 31.18
Barbara   1.0         36.41   38.97         37.78      36.84    36.31                 38.43
          0.5         31.40   34.53         32.97      31.52    31.11                 33.94
          0.25        27.58   30.70         29.17      27.44    27.28                 30.18
          0.125       24.86   27.50         26.13      24.37    24.58                 27.13
Goldhill  1.0         36.55   36.94         36.63      36.34    36.25                 36.78
          0.5         33.13   33.35         33.02      32.65    32.76                 33.42
          0.25        30.56   30.65         30.27      29.73    30.07                 30.84
          0.125       28.48   28.55         28.12      27.29    27.93                 28.74
Baboon    1.0         -----   30.86         30.63      30.32    -----                 -----
          0.5         -----   27.07         26.81      26.46    -----                 -----
          0.25        -----   24.61         24.36      23.93    -----                 -----
          0.125       -----   23.17         22.94      22.50    -----                 -----
Airplane  1.0         -----   41.04         40.80      40.63    -----                 -----
          0.5         -----   36.73         36.29      35.77    -----                 -----
          0.25        -----   32.75         32.18      31.12    -----                 -----
          0.125       -----   29.18         28.58      27.06    -----                 -----

Table 5.3 Objective PSNR results in dB for various image coding methods
5.5 Conclusion
It has been shown that context modeling techniques [17][5][7][36][37] help to improve coding performance significantly. This chapter presented our context modeling method, which is dedicated to block-transform-based image coding algorithms. Our algorithm was shown to be competitive with wavelet-based algorithms, and it is simple not only in concept but also in implementation: one can view it simply as coding the bit-planes of a transformed image while considering the context of each bit. With only the inclusion of space and frequency neighbors and 14 context labels, this algorithm is comparable to state-of-the-art algorithms like SPIHT.
[Figure 5-38 Plots of PSNR (dB) versus bit rate (bpp) for the images Lena, Barbara and Goldhill; each plot compares the curves labelled LPPUFB, DCT(16x16), DCT(8x8) and SPIHT [11] (DCT(8x8)).]

Figure 5-38 Plots of PSNR vs bit rates for images: Lena, Barbara & Goldhill
Figure 5-39 Reconstructed images of Barbara (512×512) at bit rates of 0.5 bpp (PSNR = 34.53 dB) and 1.0 bpp (PSNR = 38.97 dB)
Chapter 6 Conclusion
Linear phase paraunitary filter banks play an important role in image coding and are a generalization of the DCT and the LOT. They also formed the foundation of our non-uniform filter bank design in Chapter 4. Their theory of design and implementation was reviewed in Chapter 2. In the last section of Chapter 2, we revisited the construction of non-uniform filter banks by cascading a perfect reconstruction transmultiplexer between the analysis and synthesis parts. The basic idea was to combine several ordinary channels into a channel with wider bandwidth. Such a construction method results in filter banks having some branches with rational sampling factors.
General issues in image compression were reviewed in Chapter 3, including subband image decomposition, image transforms, quantization and bit rate allocation, and entropy coding. The two well-known image coding algorithms EZW and SPIHT were also briefly discussed.
In Chapter 4, the constraints on designing non-uniform filter banks with rational sampling factors were discussed. It was found that the combined channel can take any position k, depending on the matching order of the channels from the two uniform filter banks. Being critical for image coding, the linear phase property was sought for the non-uniform channels. Several design examples of non-uniform filter banks with different bandwidth partitions were shown in Chapter 4. The filter banks were designed based on a 16-channel linear phase filter bank, and we used odd-channel filter banks to merge the desired channels. These non-uniform filter banks were then applied to image coding, and the PSNR results were compared with those of the traditional uniform one. It was found that the non-uniform filter banks generally give improvements over the traditional M-channel filter bank. Furthermore, we proposed using different non-uniform partitions for horizontal and vertical filtering: the spectrum of an image can first be analyzed and the non-uniform partition then determined accordingly. It was shown that different horizontal and vertical partitions that fit the frequency energy distribution achieve better coding performance.
It can be seen that an effective coefficient modeling algorithm often contributes a compression gain to an image coder. Motivated by this, we proposed in Chapter 5 a coefficient context modeling algorithm that is applicable to any transform-based image coder. The algorithm utilizes the intraband and interband correlations of coefficients through the space and frequency neighbors defined in that chapter. Context quantization rules were used to reduce the number of possible configurations of the neighbors' states, and conditional entropy coding of the coefficients was driven by an adaptive arithmetic coder. Coding experiments showed promising results in spite of the algorithm's simplicity of implementation, and it was found to be comparable to the state-of-the-art algorithm SPIHT.
References
[1] A. Bilgin, P. J. Sementilli, and M. W. Marcellin, “Progressive Image Coding
Using Trellis Coded Quantization”, IEEE Trans. Image Processing, vol. 8, no. 11,
pp.1638-1643, Nov. 1999.
[2] A. K. Soman, P. P. Vaidyanathan, “Coding gain in paraunitary analysis/synthesis
systems,” IEEE Trans. Signal Processing, vol. 41, May. 1993.
[3] A. K. Soman, P. P. Vaidyanathan, and T. Q. Nguyen, “Linear-phase orthonormal
filter banks,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing,
Minneapolis, MN, vol. III, Apr. 1993, pp. 209-212.
[4] A. K. Soman, P. P. Vaidyanathan, and T. Q. Nguyen, “Linear-phase paraunitary
filter banks: Theory, factorizations and designs,” IEEE Trans. Signal Processing,
vol. 41, Dec. 1993.
[5] A. Said and W. A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 243-250, June 1996.
[6] C. W. Kok, T. Nagai, M. Ikehara and T. Q. Nguyen, “Structures and factorizations
of linear phase paraunitary filter banks,” Proc. IEEE Int. Symp. Circuits Syst.,
Hong Kong, Jun. 1997, pp. 365-368
[7] D. Taubman, "High Performance Scalable Image Compression with EBCOT," IEEE Trans. Image Processing, vol. 9, no. 7, pp. 1158-1170, Jul. 2000.
[8] G. Strang and T. Q. Nguyen, Wavelets and Filter Banks. Wellesley, MA: Wellesley-Cambridge, 1996.
[9] H. S. Malvar and D. H. Staelin, "The LOT: Transform coding without blocking effects," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, no. 4, pp. 553-559, Apr. 1989.
[10] H. S. Malvar, Signal Processing with Lapped Transforms. Norwood, MA: Artech
House, 1992.
[11] ISO/IEC JTC1/SC29 WG1, JPEG 2000 Part I Final Committee Draft Version
1.0, Mar. 2000.
[12] J. Kovačević and M. Vetterli, “Perfect Reconstruction Filter Banks with Rational
Sampling Factors”, IEEE Trans. Signal Processing, vol. 41, no. 6, pp. 2047-2066,
Jun 1993.
[13] J. Kovačević and M. Vetterli, “Perfect reconstruction filter banks with rational
sampling rates in one and two dimensions”, in Proc. SPIE Conf. Visual Commun.
Image Processing, Philadelphia, PA, Nov. 1989, pp. 1258-1268
[14] J. Kovačević and M. Vetterli, “Perfect reconstruction filter banks with rational
sampling rate changes”, in Proc. IEEE Int. Conf. Acoust., Speech, Signal
Processing, Toronto, Canada, May 1991, pp. 1785-1788
[15] J. Kovačević and M. Vetterli, “The commutativity of up/downsampling in two
dimensions”, IEEE Trans. Inform. Theory, vol. 37, pp.695-698, May 1991.
[16] J. Li and S. Lei, “An embedded still image coder with rate-distortion
optimization,” IEEE Trans. Image Processing, vol. 8, no. 7, pp. 913-924, Jul,
1999.
[17] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. Signal Processing, vol. 41, no. 12, pp. 3445-3462, Dec. 1993.
[18] K. Nayebi, T. P. Barnwell, III and M. J. T. Smith, "The design of perfect reconstruction non-uniform band filter banks," IEEE Trans. Signal Processing, vol. 41, no. 3, pp. 1114-1127, Mar. 1993.
[19] Khalid Sayood, Introduction to Data Compression, Second Edition, San
Francisco, CA 94104-3205, USA, Morgan Kaufmann Publishers, 2000
[20] L. Chen, Design of Linear Phase Paraunitary Filter Banks and Finite Length
Signal Processing, PhD thesis, The University of Hong Kong, Hong Kong,
August 1997.
[21] M. J. Tsai, J. D. Villasenor, and F. Chen, "Stack-run image coding," IEEE Trans. Circuits and Systems for Video Technology, vol. 6, pp. 519-521, Oct. 1996.
[22] M. Vetterli and D. Le Gall, "Perfect reconstruction filter banks: Some properties and factorizations," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 1057-1071, July 1989.
[23] M. W. Ho and K. P. Chan, “Space and frequency context modeling for block-
transform –based image coding”, Proc. Int. Conf. Imaging Science, Systems, and
Technology, vol. 1, pp. 209-214, Jun, 2001
[24] M.W. Marcellin, M.J. Gormish, A. Bilgin, M.P. Boliek, “An Overview of JPEG-
2000”, Proc. IEEE Data Compression Conf., pp. 523-541, 2000.
[25] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ:
Prentice-Hall, 1993.
[26] P. Q. Hoang and P.P. Vaidyanathan, “Non-uniform multirate filter banks: Theory
and design”, in Proc. IEEE Int. Symp. Circuits Syst., Portland, OR, 1989, pp. 371-
374.
[27] Pankaj N. Topiwala, Wavelet Image and Video Compression, Massachusetts:
Kluwer Academic Publishers, 1998.
[28] R. L. de Queiroz, T. Q. Nguyen, and K. R. Rao, “The GenLOT: Generalized linear-
phase lapped orthogonal transform”, IEEE Trans. Signal Processing, vol. 44, pp.
497-507, Mar. 1996.
[29] T. D. Tran, “The BinDCT: fast multiplierless approximation of the DCT,” IEEE
Signal Processing Letters, vol. 7, no. 6, pp. 141-144, June 2000.
[30] T. D. Tran, “The LiftLT: fast lapped transforms via lifting steps,” IEEE Signal
Processing Letters, vol. 7, no. 6, pp. 145-148, June 2000.
[31] T. D. Tran, R. L. de Queiroz and T. Q. Nguyen, “Linear-phase perfect
reconstruction filter bank: lattice structure, design, and application in image
coding,” IEEE Trans. Signal Processing, vol. 48, pp. 133-147, Jan. 2000.
[32] T. Q. Nguyen and P. P. Vaidyanathan, “Structures for M-channel perfect
reconstruction FIR QMF banks which yield linear phase analysis filters,” IEEE
Trans. Acoust., Speech, Signal Processing, vol. 38, pp. 433-446, Mar. 1990.
[33] T. Q. Nguyen and P. P. Vaidyanathan, “Two-channel perfect reconstruction FIR
QMF structures which yield linear phase analysis and synthesis filters,” IEEE
Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 676-690, May 1989.
[34] T.D. Tran and T.Q. Nguyen, “A progressive transmission image coder using
linear phase filter banks as block transforms,” IEEE Trans. Image Processing,
vol. 8, pp. 1493-1507, Nov. 1999.
[35] W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, Numerical
Recipes in C, Cambridge University Press, 1988.
[36] W. Zhang and T. R. Fischer, “Comparison of Different Image Subband Coding
Methods at Low Bit Rates”, IEEE Trans. Circuits and Systems for Video
Technology, vol. 9, pp. 419-423, Apr. 1999.
[37] X. Wu, “High-order context modeling and embedded conditional entropy coding
of wavelet coefficients for image compression”, Proc. 31st Asilomar Conf. on
Signals, Systems, and Computers, pp. 1378-1382, Nov. 1997.
[38] Y. P. Lin and P. P. Vaidyanathan, “Linear phase cosine modulated maximally
decimated filter banks with perfect reconstruction,” IEEE Trans. Signal
Processing, vol. 42, no. 11, pp. 2525-2539, Nov. 1995.
[39] Z. Xiong, K. Ramchandran, M. T. Orchard, and Y. Q. Zhang, “A Comparative
Study of DCT- and Wavelet-Based Image Coding,” IEEE Trans. Circuits and
Systems for Video Technology, vol. 8, no. 5, pp. 692-695, Aug. 1999.
[40] Z. Xiong, O. Guleryuz, and M. T. Orchard, “A DCT-based embedded image
coder,” IEEE Signal Processing Letters, vol. 3, pp. 289-290, Nov. 1996.