Non-uniform Filter Banks and
Context Modeling for Image Coding
Ho Man Wing
M.Phil. Thesis
The University of Hong Kong
August 2001
Abstract of thesis entitled
Non-uniform Filter Banks and
Context Modeling for Image Coding
submitted by
Ho Man Wing
for the degree of Master of Philosophy
at the University of Hong Kong
in August 2001
In recent years, subband coding has become popular for image compression. Traditionally,
subband filter banks decompose input signals into subbands of equal bandwidth. Doubting
whether uniform subband decomposition is best for image coding, we carried out coding
experiments using non-uniform filter banks, which decompose an image into subbands of
unequal bandwidths, to evaluate their performance. First, we revisited non-uniform
filter banks derived from uniform ones and revised the constraints. The resulting filter
banks have some branches with rational sampling rates. Then the linear phase property,
an essential issue in image coding, was discussed. Finally, different non-uniform filter
banks were designed and applied to image coding. In addition, different horizontal and
vertical non-uniform partitions were applied based on the spectral analysis of the image.
Improvements over uniform filter banks were observed and discussed.
As another important aspect of image compression, coefficient context modeling
algorithms have shown improvements in rate-distortion performance. In this dissertation,
we propose a new coefficient context modeling algorithm that is tailor-made for
transform coding. The algorithm exploits the intraband and interband dependence
through the use of space and frequency neighbors in a transformed image. Context
quantization is used to reduce the possible configurations of the neighborhood. It
attains improved gain in conditional arithmetic coding. Besides, precise bit rate control
is possible by simple truncation of the embedded bit stream while good rate-distortion
performance is maintained. Finally, its performance was compared with other
algorithms and satisfactory results were obtained.
Non-uniform Filter Banks and
Context Modeling for Image Coding
by
Ho Man Wing
B.Eng. H.K.
A thesis submitted in partial fulfillment of the
requirements for
the Degree of Master of Philosophy
at the University of Hong Kong.
August 2001
Declaration
I declare that this thesis represents my own work, except where due acknowledgement
is made, and that it has not been previously included in a thesis, dissertation or report
submitted to this University or to any other institution for a degree, diploma or other
qualification.
Signed _____________________________
( Ho Man Wing )
August, 2001
Acknowledgements
I am pleased to express my sincere thanks and appreciation to my thesis supervisor, Dr.
Kwok-Ping Chan. He gave constructive advice and comments on my research work, and
patiently provided encouragement and supervision of my research progress. I would
also like to thank Dr. Li Chen for giving me his thesis manuscript and the resources he
had worked out on linear phase paraunitary filter banks. His help gave my work a good
start. Special thanks go to Dr. Jelena Kovačević for explaining her work in detail. Her
theory on non-uniform filter banks has been the foundation of my research.
Thanks also go to the staff of the general office and the technical support team
of the Department of Computer Science and Information Systems for their help in every
aspect during my study. Gratitude is given to the members of the Committee of
Research and Conference Grants and of Student Research Support for providing
financial support for my research work.
Lastly, I would like to thank all my friends in The University of Hong Kong,
Ben Ng, Wan Hon Hing, Tse Hiu Ming, Cheng Ho Yin, Choy Ming Fai and Sam Yu.
Contents
List of Figures vi
List of Tables ix
Chapter 1 Introduction 1
1.1 Non-uniform filter banks...................................................................................2
1.2 Coefficient context modeling............................................................................3
1.3 Thesis outline.....................................................................................................4
Chapter 2 Linear Phase Paraunitary Filter Banks and Introduction to Non-
uniform Filter Banks 6
2.1 Introduction.......................................................................................................6
2.2 Perfect reconstruction filter banks.....................................................................7
2.2.1 Polyphase representation...................................................................7
2.2.2 Paraunitary systems.........................................................................10
2.3 Linear phase paraunitary filter banks..............................................................10
2.3.1 Linear phase paraunitary filter banks for even M............................11
2.3.2 The GenLOT, DCT and LOT..........................................................14
2.3.3 Linear phase paraunitary filter banks for odd M.............................17
2.4 Optimizations...................................................................................................19
2.5 Introduction to non-uniform filter banks.........................................................21
2.5.1 Non-uniform filter bank derived from uniform filter bank.............24
2.5.2 Equivalent rational sampling and filtering structure.......................28
Chapter 3 Overview of Image Coding 31
3.1 Image decomposition.......................................................................................31
3.2 Quantization and bit rate allocation.................................................................34
3.3 Entropy Coding................................................................................................36
3.4 Embedded Zerotree Wavelet (EZW)...............................................................38
3.5 Set Partitioning in Hierarchical Trees (SPIHT)...............................................40
Chapter 4 Non-uniform filter banks and its application to image coding 42
4.1 Introduction.....................................................................................................42
4.2 Extracting the correct spectrum of signal........................................................43
4.3 Linear phase property......................................................................................49
4.4 Design examples..............................................................................................50
4.5 Image coding examples and results.................................................................57
4.5.1 Subband images after non-uniform decomposition.........................57
4.5.2 Coding performance and observations............................................57
4.6 Conclusion.......................................................................................................68
Chapter 5 Space and Frequency Context Modeling for Block-Transform-Based
Image Coding 69
5.1 Introduction.....................................................................................................69
5.2 Intra-band and Inter-band Dependence...........................................................70
5.2.1 Space and frequency neighborhood.................................................71
5.2.2 Statistical context modeling.............................................................73
5.2.3 Context quantization........................................................................75
5.2.4 Another context quantization method..............................................76
5.3 Coding Passes..................................................................................................77
5.3.1 Significance pass.............................................................................78
5.3.2 Refinement pass...............................................................................78
5.3.3 Cleanup pass....................................................................................78
5.3.4 Illustration of coding passes............................................................79
5.3.5 Embedded bit stream truncation......................................................82
5.4 Coding Examples and Results.........................................................................82
5.5 Conclusion.......................................................................................................87
Chapter 6 Conclusion 90
References 92
List of Figures
Figure 2-1 M-channel analysis/synthesis filter bank.........................................................9
Figure 2-2 Equivalent polyphase structure of filter bank in above figure.........................9
Figure 2-3 Polyphase representation after rearrangement of decimators and expanders in
Figure 2-2..................................................................................................................9
Figure 2-4 Frequency response of a 16-channel LPPUFB that has been optimized for
coding gain..............................................................................................................14
Figure 2-5 Example lattice structure for GenLOT of 8 basis..........................................15
Figure 2-7 Frequency response of a 7-channel linear phase paraunitary filter bank.......18
Figure 2-8 Filter bank with rational sampling factors.....................................................22
Figure 2-9 Illustration on how signal can be extracted in fractions and reconstructed
perfectly...................................................................................................................23
Figure 2-10 Non-uniform filter bank designed indirectly. (a) Analysis Part (b) Synthesis
Part...........................................................................................................................25
Figure 2-11 Spectral analysis of filter bank with ideal filters for (1/3, 2/3) splitting......27
Figure 2-12 Equivalent structure for Figure 2-10...........................................................30
Figure 3-1 Forward discrete wavelet transform through a 2-channel filter bank............33
Figure 3-2 Two popular subband decomposition structures, left: dyadic wavelet
transform, right: parallel M-channel decomposition...............................................34
Figure 3-3 Illustration of scalar quantization with dead zone.........................................35
Figure 3-4 Demonstration of arithmetic coding using a 4-ary string "CCBDB"............37
Figure 3-5 Parent-child relationship of wavelet coefficients in an EZW coder.................39
Figure 4-1 Illustration on the upsampled version of Hk(z) & Gi(z) and relation of m & n
.................................................................................................................................47
Figure 4-2 Real non-uniform filter bank design example with different values of k. The
plots show the spectrums of the combined channels with bandwidth of 5π/8 for
k=0,1,2,3..................................................................................................................48
Figure 4-3 Symmetry of combined filters of our proposed design.................................49
Figure 4-4 6-channel non-uniform filter bank with bandwidth: (π/16, π/16, π/16, π/16,
5π/16, 7π/16)...........................................................................................................52
Figure 4-5 Frequency plot of the extra 3π/16 band. It is overlapping with that of 5π/16.
.................................................................................................................................53
Figure 4-6 10-channel non-uniform filter bank with bandwidth:(π/16, π/16, π/16, π/16,
π/16, π/16, π/16, π/16, π/16, 7π/16).........................................................................54
Figure 4-7 4-channel non-uniform filter bank with bandwidth: (π/16, 5π/16, 5π/16,
5π/16).......................................................................................................................55
Figure 4-8 4-channel non-uniform Filter bank with bandwidth: (π/16, 5π/16, 5π/16,
5π/16). Both the base bank and the synthesis banks are optimized........................56
Figure 4-9 Subbands of the image Lena obtained after the decomposition by a 4-channel
non-uniform filter bank: (π/16, 3π/16, 5π/16, 7π/16)..............................................60
Figure 4-10 Subbands of the image Lena obtained after the decomposition by a 16-
channel uniform filter bank.....................................................................................60
Figure 4-13 Plot of PSNR vs Bit rates using different decomposition for the image
Barbara....................................................................................................................67
Figure 4-14 Piecewise constant horizontal and vertical power spectrum of image
Barbara. Dotted lines show the best partition found for the non-uniform filter
banks........................................................................................................................67
Figure 4-15 Subbands obtained by different horizontal and vertical non-uniform filter
banks with partitions as shown above.....................................................................67
Figure 5-1 Space and frequency neighbors in three different directions inside a block-
transformed image using M-channel FB or transforms with M basis.....................71
Figure 5-2 Image after transformation with 16x32 LPPUFB..........................................72
Figure 5-3 Example of finding context label...................................................................77
Figure 5-4 Illustration of the coding process...................................................................79
Figure 5-5 Example transformed image with 3x3 blocks...............................................80
Figure 5-6 Example of coding passes for the image in Figure 5-5.................................81
Figure 5-7 Plots of PSNR vs bit rates for images: Lena, Barbara & Goldhill................88
Figure 5-8 Reconstructed images of Barbara at bit rates of 0.5bpp and 1.0bpp.............89
List of Tables
Table 4.1 PSNR results for image coding using different NUFB and LPPUFB on
different test images................................................................................................62
Table 5.1 Lookup table for context labels.......................................................................75
Table 5.2 Objective PSNR results in dB for various image coding methods...................86
Chapter 1 Introduction
In recent years, subband image coding has become popular as it has achieved a certain
degree of success. Traditionally, the input signal is decomposed into subbands of equal
size using an M-channel filter bank. Meanwhile, the wavelet transform is regarded as
another kind of subband decomposition method: it decomposes the input in a dyadic
manner and gives a multiresolution representation of an image. However, these methods
analyze images with fixed frequency bandwidths and thus may not adapt to the different
spectral distributions of some images. In this dissertation, we attempt to investigate and
characterize filter banks that can decompose input signals into subbands with different
bandwidths. We also apply these filter banks to image compression to evaluate their
performance.
In addition, the recent development and success of coefficient modeling algorithms
in image coding have attracted great attention from researchers. As an important part of
the image coding process, an effective coefficient modeling algorithm can often
contribute an observable entropy gain and hence improve the rate-distortion
performance. Among these recent advances in image coding, most were tailor-made for
wavelet-based image coding, like the Embedded Zerotree Wavelet (EZW) and the Set
Partitioning in Hierarchical Trees (SPIHT) algorithms. However, some of these
algorithms were adapted to block-transform-based coding and similar success has been
observed. Therefore, it is attractive to see whether any algorithm can better model the
coefficients after a block transform. Consequently, this dissertation also introduces an
effective coefficient context modeling algorithm dedicated to block-transform-based
image compression.
1.1 Non-uniform filter banks
It was found that filter banks with rational sampling factors can be obtained by
cascading synthesis filter banks onto a uniform filter bank base. The synthesis filter bank
is essentially used to combine the signals from several uniform channels into one single
channel occupying a larger bandwidth. This can be easily justified by considering the
transmultiplexer structure, which is used to multiplex more than one signal into a single
signal at a higher rate. The most trivial choice is the delay chain, which actually
performs time-division multiplexing (TDM) of the signals. With synthesis and analysis
filters having brick-wall frequency responses, it instead performs frequency-division
multiplexing (FDM).
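As a quick illustration of the TDM view, the delay-chain transmultiplexer can be sketched in a few lines of Python (an illustrative toy, not one of the filter banks designed later): each half-rate signal is upsampled by 2, the second branch is delayed by one sample, and their sum is the interleaved full-rate signal; the analysis side simply de-interleaves, so reconstruction is perfect.

```python
# Delay-chain transmultiplexer: synthesis = upsample + delay + add (TDM);
# analysis = de-interleave. The trivial perfect-reconstruction pair.

def tdm_synthesis(x0, x1):
    """Interleave two half-rate signals into one full-rate signal."""
    y = []
    for a, b in zip(x0, x1):
        y.append(a)  # branch 0: upsampled by 2, no delay
        y.append(b)  # branch 1: upsampled by 2, delayed by one sample
    return y

def tdm_analysis(y):
    """Split a full-rate signal back into its two half-rate components."""
    return y[0::2], y[1::2]

x0, x1 = [1, 2, 3], [4, 5, 6]
y = tdm_synthesis(x0, x1)        # [1, 4, 2, 5, 3, 6]
r0, r1 = tdm_analysis(y)
assert (r0, r1) == (x0, x1)      # perfect reconstruction
```

Replacing the delay chain by filters with brick-wall responses turns the same structure into FDM, as noted above.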
Based on this construction, the numbers of channels of the synthesis filter bank and
the base filter bank are restricted to be relatively prime to each other; otherwise, the
downsampler and upsampler cannot be interchanged. Also, in order to have a contiguous
frequency response, the desired non-uniform channel has to start at an even position in
the base bank. By choosing a base filter bank with 2N channels, any odd-channel filter
bank can act as the synthesis bank for combining the desired channels. In this
dissertation, we try to relax the constraint on the starting position of the non-uniform
channel by allowing the matching order of filters from the two banks (base and
synthesis filter banks) to be reversed.
The linear phase property implies symmetry (i.e. each filter is either symmetric or
anti-symmetric) and is crucial for image coding. Filter banks with non-linear phase
cause undesirable artifacts in the reconstructed image when it is quantized coarsely.
Therefore, we try to find the criteria for linear phase in the proposed non-uniform
filter banks.
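The link between symmetry and linear phase can be checked numerically: for a symmetric FIR filter of length N, the frequency response equals e^{-jω(N-1)/2} times a real function, so removing that linear-phase term leaves a purely real response. A small self-contained check with an arbitrary symmetric filter (not a filter from this thesis):

```python
import cmath

def freq_response(h, w):
    """H(e^{jw}) of an FIR filter h evaluated at angular frequency w."""
    return sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))

h = [1.0, 3.0, 5.0, 3.0, 1.0]        # symmetric impulse response, N = 5
N = len(h)
for w in [0.1, 0.7, 1.3, 2.9]:
    # Removing the linear term e^{-jw(N-1)/2} must leave a real response.
    zero_phase = freq_response(h, w) * cmath.exp(1j * w * (N - 1) / 2)
    assert abs(zero_phase.imag) < 1e-12
```

An anti-symmetric filter behaves the same way up to an extra factor of j, which is why both filter types qualify as linear phase.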
Theoretically, subband coding exploits the non-flatness of the spectrum of the input.
Finer division in frequency produces lower quantization noise at the same rate. Our
approach of combining subband signals into one signal seems to contradict this.
Nevertheless, we will see that the channels that we are going to combine basically have
similar variances, i.e. there is very little non-flatness among them. Hence, there will be
only a small increase in quantization error at the same bit rate. Conversely, we may
achieve an enhancement during the lossless coding part if we observe any nice structure
in the wider subband. For example, for subbands with similar variance, i.e. when the
signal spreads over subbands, the energy is probably localized in the time domain. This
occurs, for instance, at object boundaries in a picture, where the energy spreads over
many uniform subbands after decomposition. However, with a subband of large
bandwidth, we can easily see the boundaries, and the connected nature of the boundaries
is useful in a context modeling algorithm.
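The claim that merging subbands of similar variance costs little can be quantified with the classical subband coding-gain figure: the ratio of the arithmetic to the geometric mean of the subband variances (a textbook formula, used here purely as an illustration with made-up variance values):

```python
import math

def coding_gain(variances):
    """Subband coding gain: arithmetic mean / geometric mean of variances."""
    M = len(variances)
    am = sum(variances) / M
    gm = math.prod(variances) ** (1.0 / M)
    return am / gm

# Very different variances: fine frequency splitting pays off.
print(coding_gain([100.0, 10.0, 1.0, 0.1]))   # substantially above 1

# Nearly equal variances (the bands we propose to merge): the gain is
# close to 1, so merging them loses almost nothing at the same bit rate.
print(coding_gain([1.0, 1.1, 0.9, 1.05]))
```

When all variances are equal the gain is exactly 1, i.e. splitting (or merging) such bands is rate-distortion neutral, which is precisely the situation exploited above.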
1.2 Coefficient context modeling
Our coefficient context modeling algorithm is dedicated to block-transform image
compression schemes. It has been pointed out that dependencies exist among intraband
as well as interband coefficients. For an image transformed or decomposed by an
M-channel filter bank in parallel, we can always identify the coefficients adjacent to a
particular coefficient in the frequency and space domains. We call these the frequency
and space neighbors. Once the first few bits of the neighbors have appeared, it may be
easy to guess how large the magnitude of the coefficient will be. For example, if most of
the neighbors seem to be small, then the surrounded coefficient has a high chance of
being small too. This is justified for smooth regions in an image, where most of the
coefficients are small in both the space and frequency domains. At the same time, large
coefficients may also be predicted from their significant neighbors. For example, in a
highly textured region, the energy concentrates in a few bands while occupying a large
area; meanwhile, at object boundaries, the signal spreads over different bands.
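For a 1-D M-channel block transform, the neighbor sets just described can be sketched as follows: coefficient (b, k) is band k of block b; its space neighbors are the same band in adjacent blocks, and its frequency neighbors are adjacent bands in the same block. (The indexing convention here is illustrative; Chapter 5 defines the exact neighborhood used by the coder.)

```python
def space_neighbors(b, k, num_blocks):
    """Same frequency band k in the adjacent blocks b-1 and b+1."""
    return [(bb, k) for bb in (b - 1, b + 1) if 0 <= bb < num_blocks]

def frequency_neighbors(b, k, M):
    """Adjacent bands k-1 and k+1 inside the same block b."""
    return [(b, kk) for kk in (k - 1, k + 1) if 0 <= kk < M]

# Coefficient in block 2, band 0 of an 8-band transform with 4 blocks:
print(space_neighbors(2, 0, 4))      # [(1, 0), (3, 0)]
print(frequency_neighbors(2, 0, 8))  # [(2, 1)]
```

For 2-D images the same idea applies separably: space neighbors come from neighboring blocks and frequency neighbors from neighboring subbands.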
Our algorithm codes the coefficient values bit by bit, starting from the most
significant bit. By observing the status of the neighborhood, we can determine the
probabilities of the two possible symbols, 0 and 1. Using a binary arithmetic coder, we
can then code the bits according to these probabilities. This attains a reasonable gain
whenever the probability is biased towards either symbol. Considering the significance
status of each neighbor, there are quite a lot of possible combinations; a well-designed
context quantization helps to solve this problem. Furthermore, the transformed image
probably contains quite a lot of insignificant values. We usually find that these
insignificant values occur in groups, and hence coding them by groups may enhance the
gain and speed up the coding process.
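The gain from conditioning each bit on its neighborhood can be illustrated without a full arithmetic coder by summing the ideal code lengths -log2 p assigned by an adaptive, context-conditioned binary model; an arithmetic coder approaches this total. (A simplified sketch with hypothetical context labels; the actual contexts and coding passes are defined in Chapter 5.)

```python
import math

def ideal_code_length(bits, contexts, num_contexts):
    """Adaptive per-context model: Laplace-smoothed counts, -log2 p cost."""
    counts = [[1, 1] for _ in range(num_contexts)]  # [zeros, ones] per context
    total = 0.0
    for bit, ctx in zip(bits, contexts):
        c0, c1 = counts[ctx]
        p = (c1 if bit else c0) / (c0 + c1)   # probability of the actual bit
        total += -math.log2(p)                # ideal arithmetic-coding cost
        counts[ctx][bit] += 1                 # update the model
    return total

# Bits strongly predicted by their context cost far less than 1 bit each:
bits = [0] * 50 + [1] * 50
ctxs = [0] * 50 + [1] * 50   # context 0 -> mostly 0s, context 1 -> mostly 1s
print(ideal_code_length(bits, ctxs, 2) / len(bits))  # well below 1.0
```

With a single context the same bit sequence would cost close to 1 bit per symbol, which is exactly the entropy gain that context quantization tries to preserve while keeping the number of contexts manageable.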
1.3 Thesis outline
The organization of this thesis is as follows. In Chapter 2, basic concepts and design
approaches of linear phase filter banks are reviewed briefly. The chapter includes the
criteria for perfect reconstruction, the polyphase representation of filter banks, the
formulation of the linear phase property, the factorization of linear phase filter banks
and, finally, the optimization measures. The design of linear phase filter banks is
essential to our work in Chapter 4, where non-uniform filter banks are described. In the
last section of Chapter 2, the principle of deriving filter banks with rational sampling
factors from uniform filter banks is explained. This includes the constraints and the
equivalent structure.
Chapter 3 reviews some fundamental matters in image compression as well as
some recent achievements by other researchers. These include the major processes in
lossy image coding: image decomposition, quantization and entropy coding techniques.
The author hopes this chapter will help readers to better understand the work in
Chapters 4 and 5.
In Chapter 4, the constraint on the position of the synthesized channel for a contiguous
spectrum is relaxed. It is found that the matching of each pair of filters from the two
banks is constrained by the position of the synthesized channel. The linear phase
property of such filter banks is considered, and such filter banks are then applied to
image coding. Different non-uniform bandwidth partitions are mentioned and discussed
based on the spectrum of the image.
Chapter 5 proposes a new image coding algorithm dedicated to block-transform-based
image coders. Its working principle and implementation details are explained. The
coding process is illustrated and coding results are shown. The performance and
comparisons with other coding algorithms are also discussed.
Lastly, Chapter 6 presents a summary of this dissertation.
Chapter 2 Linear Phase Paraunitary Filter Banks
and Introduction to Non-uniform
Filter Banks
2.1 Introduction
In recent years, the theory of design and implementation of filter banks has been
extensively studied and researched. Applications of filter banks can be found in many
signal processing fields, like waveform coding in audio and image processing,
crosstalk-free transmultiplexing in communications and feature detection in pattern
recognition.
Linear phase (LP) paraunitary filter banks play a major role in our work. Even-
channel filter banks serve as the bases in our designs of non-uniform filter banks; they
also act as the control for comparisons of coding performance with our non-uniform
filter banks. In the part on context modeling for image coding, we used linear phase
filter banks as the fundamental tool for image decomposition. Odd-channel LP FBs
were mainly used as the synthesis filter banks to combine several uniform channels
into a non-uniform channel. Therefore, some theory of LP FBs will be briefly reviewed
in this chapter before going on to the main parts of this thesis. Besides, some useful
filter banks that were widely used in our work will be shown in this chapter.
In the last section, non-uniform filter banks with rational sampling factors will be
introduced. The proposed filter banks are derived from uniform filter banks. The
principle of maintaining the perfect reconstruction property for such filter banks will be
explained, and the constraints for the equivalent rational sampling structure will be
mentioned. This section is preliminary to our discussion of non-uniform filter banks in
Chapter 4.
2.2 Perfect reconstruction filter banks
Traditionally, we only consider maximally decimated M-channel uniform filter banks
for coding purposes, where the overall sampling rate is preserved. In such filter bank
systems, the input signal X(z) is passed through the M analysis filters H_i(z) and
filtered. Each resulting signal is then decimated by a factor of M. For applications like
image coding, these subband signals are usually quantized, entropy coded, stored and/or
transmitted. In the synthesis part, the subband signals are upsampled by a factor of M
before being filtered by the M synthesis filters and added together to reconstruct the
original input signal. Theoretically, if there is no loss of signal after the analysis stage,
the signal can be recovered perfectly during the synthesis stage when a perfect
reconstruction (PR) filter bank is used. The only difference is a delay and a scaling
factor introduced by the filter bank system, i.e.

    X'(z) = c z^{-d} X(z)    (2.)

where c is a constant and d is an integer. Figure 2-1 depicts the block diagram of the
signal flow of a typical maximally decimated analysis/synthesis filter bank system.
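A minimal numerical check of the perfect reconstruction property, using the 2-channel Haar paraunitary pair as a stand-in for the M-channel banks discussed here (so M = 2; the output equals the input up to the one-sample delay d = 1 and scale c = 1 predicted by the equation above):

```python
import math

def convolve(x, h):
    """Direct-form FIR filtering."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

a = 1 / math.sqrt(2)
h = [[a, a], [a, -a]]            # analysis filters H0, H1 (Haar)
f = [[a, a], [-a, a]]            # synthesis filters F0, F1 (time-reversed)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sub = [convolve(x, hk)[::2] for hk in h]        # filter, then decimate by 2

rec = [0.0] * (2 * len(sub[0]) + 1)
for yk, fk in zip(sub, f):
    up = [0.0] * (2 * len(yk))                  # upsample by 2
    up[::2] = yk
    for n, v in enumerate(convolve(up, fk)):
        rec[n] += v                             # filter and add branches

# Perfect reconstruction: output is the input delayed by d = 1 sample.
assert all(abs(rec[n + 1] - x[n]) < 1e-12 for n in range(len(x)))
```

The same decimate/upsample skeleton carries over to any M and any PR filter set; only the filters and the decimation factor change.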
2.2.1 Polyphase representation
Let the filters be H_k(z) = Σ_n h_k(n) z^{-n}, k = 0, 1, ..., M-1. Suppose they can be
written as [25]

    H_k(z) = Σ_{l=0}^{M-1} z^{-l} Σ_n h_k(Mn + l) z^{-Mn}    (2.)

Then,

    E_{kl}(z) = Σ_n h_k(Mn + l) z^{-n}    (2.)

and hence,

    H_k(z) = Σ_{l=0}^{M-1} z^{-l} E_{kl}(z^M)    (2.)

Equation (2.) is called the Type 1 polyphase representation [25] and the E_{kl}(z) are
referred to as the polyphase components of H_k(z). Similarly, the set of synthesis filters
F_k(z), k = 0, 1, ..., M-1, can be represented as

    F_k(z) = Σ_{l=0}^{M-1} z^{-(M-1-l)} Σ_n f_k(Mn + M - 1 - l) z^{-Mn}    (2.)

where

    R_{lk}(z) = Σ_n f_k(Mn + M - 1 - l) z^{-n}    (2.)

and hence

    F_k(z) = Σ_{l=0}^{M-1} z^{-(M-1-l)} R_{lk}(z^M)    (2.)

Equation (2.) is called the Type 2 polyphase representation.
As a result, the set of analysis filters can be represented as

    [H_0(z) H_1(z) ... H_{M-1}(z)]^T = E(z^M) [1 z^{-1} ... z^{-(M-1)}]^T    (2.)

where [E(z)]_{kl} = E_{kl}(z), and

    [F_0(z) F_1(z) ... F_{M-1}(z)] = [z^{-(M-1)} z^{-(M-2)} ... 1] R(z^M)    (2.)

where [R(z)]_{lk} = R_{lk}(z).
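The Type 1 decomposition is easy to verify numerically: the l-th polyphase component of a filter h is just the subsequence h(Mn + l), and interleaving the components recovers the filter. A small self-contained sketch (generic filter coefficients, not a filter from this thesis):

```python
def polyphase_components(h, M):
    """Type 1 polyphase components: e_l(n) = h(Mn + l), l = 0..M-1."""
    return [h[l::M] for l in range(M)]

def from_polyphase(components, M):
    """Rebuild h(n) by interleaving the components: h(Mn + l) = e_l(n)."""
    h = [0.0] * sum(len(c) for c in components)
    for l, comp in enumerate(components):
        for n, v in enumerate(comp):
            h[M * n + l] = v
    return h

h = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # a length-6 filter with M = 3
comps = polyphase_components(h, 3)        # [[1, 4], [2, 5], [3, 6]]
assert from_polyphase(comps, 3) == h      # interleaving recovers h
```

Running each component at the low rate and recombining through the delay chain is exactly the polyphase implementation drawn in Figures 2-2 and 2-3.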
E(z) and R(z) are regarded as the Type 1 and Type 2 polyphase matrices [25] for the
analysis and synthesis filter banks. The graphical structures of (2.) and (2.) can be found
in Figure 2-2. Using the noble identities, we can rearrange the structure in Figure 2-2
into that in Figure 2-3 [25].

Figure 2-1 M-channel analysis/synthesis filter bank
Figure 2-2 Equivalent polyphase structure of filter bank in above figure
Figure 2-3 Polyphase representation after rearrangement of decimators and expanders in Figure 2-2.
2.2.2 Paraunitary systems
Based on Figure 2-3, it will be a perfect reconstruction filter bank if [25]

    R(z) E(z) = c z^{-d} I    (2.)

Obviously, given any FIR polyphase matrix E(z), perfect reconstruction requires

    R(z) = c z^{-d} E^{-1}(z)    (2.)

and we want R(z) to be FIR as well. One easy way is to choose a paraunitary matrix
E(z) such that

    E~(z) E(z) = I,  where E~(z) = E^T(z^{-1})    (2.)

so that R(z) = c z^{-d} E~(z) is also FIR. Therefore, a paraunitary filter bank can be
designed once its polyphase matrix E(z) is determined. This can be done by cascading a
series of elementary paraunitary matrices. The factorization of paraunitary filter banks
has been well described in [25].
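A small numerical check of this recipe (an illustrative rotation-based example, not the linear phase factorization of the next section): build an order-1 paraunitary E(z) = R(θ2) Λ(z) R(θ1) with Λ(z) = diag(1, z^{-1}), take R(z) = z^{-1} E~(z), and verify coefficient by coefficient that R(z)E(z) = z^{-1} I.

```python
import math

def pmat_mul(A, B):
    """Multiply 2x2 polynomial matrices, stored as lists of 2x2 coefficient
    matrices: A[p][i][j] is the (i, j) coefficient of z^{-p}."""
    C = [[[0.0] * 2 for _ in range(2)] for _ in range(len(A) + len(B) - 1)]
    for p, Ap in enumerate(A):
        for q, Bq in enumerate(B):
            for i in range(2):
                for j in range(2):
                    for k in range(2):
                        C[p + q][i][j] += Ap[i][k] * Bq[k][j]
    return C

def rot(t):
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

t1, t2 = 0.3, 1.1
Lam = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]  # diag(1, z^-1)
E = pmat_mul([rot(t2)], pmat_mul(Lam, [rot(t1)]))

# R(z) = z^{-1} E~(z): transpose each coefficient matrix and reverse the
# powers of z (E has order 1, so z^{-1} E^T(z^{-1}) is causal FIR).
R = [[[E[len(E) - 1 - p][j][i] for j in range(2)] for i in range(2)]
     for p in range(len(E))]

P = pmat_mul(R, E)
# P(z) should equal z^{-1} I: identity at power 1, zeros elsewhere.
for p, Pp in enumerate(P):
    for i in range(2):
        for j in range(2):
            target = 1.0 if (p == 1 and i == j) else 0.0
            assert abs(Pp[i][j] - target) < 1e-12
```

Because every factor (the two rotations and Λ(z)) is individually paraunitary, the cascade is paraunitary too; this is the same propagation argument used by the factorizations below.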
2.3 Linear phase paraunitary filter banks
Consider a set of M paraunitary transfer functions whose polyphase matrix E(z)
satisfies the property [4]

    E(z) = z^{-N} D E(z^{-1}) J    (2.)

where N is the order of the paraunitary matrix E(z), J is the reversal matrix (the
anti-diagonal permutation) and D is a diagonal matrix with entries +1 for symmetric
filters and -1 for anti-symmetric filters. Although equation (2.) does not represent all
linear phase paraunitary filter banks, it represents a large and useful class of them from
a practical point of view. In a LP filter bank, each filter is either symmetric or anti-
symmetric. The number of symmetric and anti-symmetric filters is imposed by
equation (2.) and summarized in the following theorem [4]:
Theorem 1: Consider an M-channel linear phase perfect reconstruction system.
1) If M is even, there are M/2 symmetric and M/2 anti-symmetric filters.
2) If M is odd, there are (M+1)/2 symmetric and (M-1)/2 anti-symmetric filters.
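Theorem 1 can be spot-checked numerically on the DCT-II basis (an even-channel linear phase transform that reappears in Section 2.3.2): row k satisfies c_k(M-1-n) = (-1)^k c_k(n), so exactly M/2 rows are symmetric and M/2 anti-symmetric.

```python
import math

def dct_basis(M):
    """DCT-II basis: C[k][n] = a_k * cos((2n+1) k pi / (2M))."""
    C = []
    for k in range(M):
        ak = math.sqrt((1 if k == 0 else 2) / M)
        C.append([ak * math.cos((2 * n + 1) * k * math.pi / (2 * M))
                  for n in range(M)])
    return C

M = 8
C = dct_basis(M)
sym = sum(1 for row in C
          if all(abs(row[n] - row[M - 1 - n]) < 1e-12 for n in range(M)))
anti = sum(1 for row in C
           if all(abs(row[n] + row[M - 1 - n]) < 1e-12 for n in range(M)))
assert (sym, anti) == (M // 2, M // 2)   # Theorem 1, even M
```

The even-indexed rows turn out symmetric and the odd-indexed rows anti-symmetric, matching the M/2 + M/2 split the theorem demands.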
2.3.1 Linear phase paraunitary filter banks for even M
For paraunitary filter banks, the paraunitary property propagates along the structure
whenever building blocks with the paraunitary property are added. In addition to the
paraunitary property, we have to maintain the linear phase property. However, if we
constrain the system to be linear phase, the linear phase property itself cannot
propagate along the system when building blocks are added [4]. Instead, we can use an
intermediate property to achieve this: the so-called pairwise time-reversed property. It
can easily be converted to a linear phase polyphase matrix, as the sum (difference) of a
time-reversed pair is symmetric (anti-symmetric). Moreover, any linear combination of
those symmetric (anti-symmetric) filters will not disturb the symmetry. Suppose F_m(z)
is the polyphase matrix at stage m, satisfying the pairwise time-reversed property [4]

    (2.)

and we would like to add a building block so that at the next stage

    (2.)
It was shown that the time-reversed property is maintained by multiplying a building
block in front of F_m(z) as long as the building blocks have a particular form [4].
Therefore, we have a factorization for a polyphase matrix having the time-reversed
property. Finally, to convert it into a polyphase matrix with linear phase, it was shown
that the final conversion stage may have the following form

    (2.)

Consequently, we have the complete factorization

    (2.)

where

    (2.)

and

    (2.)
The factorization in (2.) was further simplified in [28]. With the introduction of

    W = (1/sqrt(2)) [ I  I ; I  -I ]

we have the simpler factorization

    E(z) = K_{N-1}(z) K_{N-2}(z) ... K_1(z) E_0,  K_i(z) = Phi_i W Lambda(z) W    (2.)

where Phi_i = diag(U_i, V_i) is built from two arbitrary (M/2)x(M/2) orthogonal
matrices U_i and V_i, Lambda(z) = diag(I, z^{-1} I), and E_0 is the order-0 linear phase
starting block (itself containing the orthogonal matrices U_0 and V_0).
This is called the LPPUFB (linear phase paraunitary filter bank) in [28]. We can see
that the LPPUFB is characterized by predefined paraunitary matrices like W and
Lambda(z) as well as some arbitrary orthogonal matrices. Therefore, the problem of
designing or optimizing filter banks becomes the process of fixing these arbitrary
orthogonal matrices under certain optimization criteria. It was shown that any
orthogonal matrix of size MxM can be factored into a product of M(M-1)/2 Givens
rotations [25]. Each Givens rotation is determined by one rotation angle. Observing
(2.), we can see that for a LPPUFB with filter length NM (i.e. N stages), there are 2N
orthogonal matrices of size (M/2)x(M/2). Hence, there are in total NM(M-2)/4 free
parameters. As an example of an even-channel LPPUFB design, Figure 2-4 depicts the
frequency response of a 16-channel LPPUFB that has been optimized for coding gain.
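The rotation parameterization can be sketched directly: assemble an MxM orthogonal matrix from M(M-1)/2 Givens angles and verify orthogonality. This is a minimal version of the search space swept during optimization (the angle values below are arbitrary):

```python
import math

def givens_product(M, angles):
    """Build an M x M orthogonal matrix from M(M-1)/2 Givens rotation angles."""
    assert len(angles) == M * (M - 1) // 2
    Q = [[float(i == j) for j in range(M)] for i in range(M)]  # identity
    it = iter(angles)
    for i in range(M):
        for j in range(i + 1, M):
            t = next(it)
            c, s = math.cos(t), math.sin(t)
            for row in Q:   # apply one rotation in the (i, j) plane
                row[i], row[j] = c * row[i] - s * row[j], s * row[i] + c * row[j]
    return Q

M = 4
Q = givens_product(M, [0.1 * k for k in range(M * (M - 1) // 2)])

# Check orthogonality: Q^T Q = I.
for i in range(M):
    for j in range(M):
        dot = sum(Q[k][i] * Q[k][j] for k in range(M))
        assert abs(dot - float(i == j)) < 1e-12
```

An optimizer for a LPPUFB simply tunes such angle vectors (one per U_i and V_i) against a criterion like coding gain, since every angle setting yields a valid paraunitary bank.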
Figure 2-4 Frequency response of a 16-channel LPPUFB that has been optimized for coding gain
2.3.2 The GenLOT, DCT and LOT
A. Generalized Lapped Orthogonal Transform (GenLOT)
The GenLOT was defined as a LPPUFB satisfying (2.) with E_0 chosen to be the DCT
matrix [28]. A GenLOT with filter length L = NM has N-1 stages after the DCT. The
DCT outputs have to be arranged in two groups according to their symmetry (symmetric
or anti-symmetric), as indicated in Figure 2-5. The polyphase matrix was defined as

    E(z) = K_{N-1}(z) K_{N-2}(z) ... K_1(z) C    (2.)

where C is the MxM DCT matrix. The DCT (Discrete Cosine Transform) and the LOT
(Lapped Orthogonal Transform) can be regarded as special cases of the GenLOT with
N = 1 and N = 2 respectively. Since E_0 is chosen to be the DCT matrix, there are only
N-1 stages of free parameters, i.e. (N-1)M(M-2)/4 free parameters.
B. Discrete Cosine Transform (DCT )
The DCT has been widely used in image and video coding applications. It has
sub-optimal energy compaction performance for processes modeled by first-order
autoregression with correlation coefficient ρ, approaching the optimum as ρ → 1. Therefore, it is an
alternative choice to the optimal Karhunen-Loève transform (KLT) for image coding.

Figure 2-5 Example lattice structure for a GenLOT with 8 basis functions.
Since the KLT is signal-dependent, it is computationally expensive, whereas the DCT
can be computed with fast algorithms. The basis functions of the DCT are defined as
(2.)    a(i, k) = c(i) √(2/M) cos( (2k+1) i π / (2M) ),  i, k = 0, 1, …, M−1,
where c(0) = 1/√2 and c(i) = 1 for i > 0.
The disadvantage of the DCT in image coding is that blocking artifacts appear when the
image is compressed at a high ratio, because achieving a high compression ratio
unavoidably requires coarse quantization of the DCT coefficients.
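The basis functions just defined can be checked numerically. The following sketch (Python/NumPy; the helper name is ours) builds the M×M DCT matrix from the standard DCT-II formula and verifies that it is orthogonal, i.e. that its inverse is its transpose:

```python
import numpy as np

def dct_matrix(M):
    # a(i, k) = c(i) * sqrt(2/M) * cos((2k + 1) * i * pi / (2M))
    A = np.zeros((M, M))
    for i in range(M):
        c = 1.0 / np.sqrt(2.0) if i == 0 else 1.0
        for k in range(M):
            A[i, k] = c * np.sqrt(2.0 / M) * np.cos((2 * k + 1) * i * np.pi / (2 * M))
    return A

A = dct_matrix(8)
print(np.allclose(A @ A.T, np.eye(8)))  # orthogonality: A^{-1} = A^T
```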
C. Lapped Orthogonal Transform (LOT)
The LOT was developed to alleviate the major disadvantage of traditional block
transforms: blocking effects [10]. In block transforms like the DCT, the input signal is
divided into separate blocks and the blocks are transformed independently. This produces
discontinuities at block boundaries when the coefficients are coarsely quantized
before the inverse transform. The LOT, on the other hand, allows overlapping of the input
signal and thus smooths the block boundaries without increasing the bit rate.
Moreover, the LOT (9.20dB for the type-I LOT with 8 basis functions) has higher coding
gain than the DCT (8.83dB) for image coding [10]. Generally, it has better coding
performance than the DCT.
Also, the LOT has its own fast implementation algorithm based on fast algorithms for the
DCT and the DST-IV (Discrete Sine Transform, type IV). Nevertheless, there is still a
trade-off between the LOT and the DCT: better coding performance (along with reduced
blocking effects) versus computational efficiency. As mentioned above, the LOT can be
regarded as a special case of the GenLOT with 2 stages and filter lengths of 2M. For the
type-I LOT, U1 is taken to be I while V1 is characterized by 3 Givens rotations.
2.3.3 Linear phase paraunitary filter banks for odd M
Similar to subsection 2.3.1, we can achieve linear phase filter banks with an odd
number of channels by cascading paraunitary building blocks onto a structure having the
pairwise time-reversed property [4]. However, using the factorization in [4], we will
have some zeros imposed in each of the filters (for order > 0), and the number of zeros
depends on the order of the polyphase matrix. Moreover, it is impossible to obtain a
meaningful 3-channel system having more than 1 stage. Indeed, using the above
factorization, we will have zeros inserted between the 3 non-zero coefficients for filter
banks with filters longer than 3. This is undesirable, as we usually need a
3-channel linear phase system during our design of non-uniform filter banks later in
this dissertation.
Fortunately, there is another cascade structure for linear phase systems that
solves the problem [6]. Using this factorization, we have more degrees of freedom when
designing linear phase systems. Besides, for the 3-channel case, we may have systems
with order higher than 0. The factorization for the filter bank with an odd number of channels is as
follows:
(2.)
where
(2.)
(2.)
(2.)
and
(2.)
where and
For an order-N system in the above factorization, the number of free parameters is
(2.)
As a design example, Figure 2-6 shows the frequency response of a 7-channel filter bank.
Figure 2-6 Frequency response of a 7-channel linear phase paraunitary filter bank
2.4 Optimizations
A. Coding Gain
For coding purposes, it is always preferred to optimize filter banks using their coding
gain. For a paraunitary system, the coding gain was proved to be [2]
(2.)    G = σx² / ( σ0² σ1² ⋯ σM−1² )^(1/M)
where σi² is the variance of the ith subband and σx² is the variance of the input:
(2.)    σx² = (1/M) Σ_{i=0}^{M−1} σi²
Usually, we use an AR(1) process to model the input for image coding, and the intersample
autocorrelation coefficient ρ is taken to be 0.95. The autocorrelation of such a process is
defined as
(2.)    R(k) = σx² ρ^|k|
And we have the subband variance
(2.)    σi² = (1/2π) ∫_{−π}^{π} |Hi(e^{jω})|² Sxx(e^{jω}) dω
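The coding gain is easy to evaluate numerically. The sketch below (Python/NumPy; the helper name is ours) computes the coding gain in dB of an orthonormal M×M transform for a unit-variance AR(1) input, using the fact that the subband variances are the diagonal of A R Aᵀ with R(k) = ρ^|k|:

```python
import numpy as np

def coding_gain_db(A, rho=0.95):
    # A: orthonormal MxM analysis transform; input: unit-variance AR(1) process
    M = A.shape[0]
    idx = np.arange(M)
    R = rho ** np.abs(np.subtract.outer(idx, idx))  # autocorrelation matrix R(k) = rho^|k|
    sub_var = np.diag(A @ R @ A.T)                  # subband variances sigma_i^2
    # G = sigma_x^2 / (prod sigma_i^2)^(1/M), with sigma_x^2 = 1 here
    return 10.0 * np.log10(1.0 / np.prod(sub_var) ** (1.0 / M))
```

With the identity transform, the subband variances are all equal to σx² and the gain is 0 dB; any transform that compacts energy into fewer subbands yields a positive gain.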
B. Stopband Attenuation
For some applications, we may need to optimize the filter banks according to the
stopband attenuation of their filters, formulated as
(2.)    Es = Σ_i ∫_{ω∈Ωi} |Hi(e^{jω})|² dω
where Ωi denotes the stopband of the ith filter. The above formula sums up the stopband
energy of all subband filters in the filter bank. For a paraunitary system, the last filter
may be excluded, as the whole system is power complementary. The stopband attenuation
of a filter can be computed through the autocorrelation of its impulse response.
C. Polyphase Normalization and DC Leakage
Sometimes, it is desirable to have only one transform coefficient be non-zero for a
constant input. In other words, we do not want the DC (i.e. ω = 0) component to
leak into subbands other than the DC subband. This avoids the so-called
checkerboard artifacts. Hence, we need to have
(2.)    Hi(e^{j0}) = Σ_n hi(n) = 0,  i = 1, …, M−1
And the criterion becomes minimizing
(2.)    Σ_{i=1}^{M−1} |Hi(e^{j0})|²
2.5 Introduction to non-uniform filter banks
In [18] and [12], non-uniform filter banks with rational sampling rate based on the
uniform filter bank system have been introduced. By cascading the uniform analysis
filter bank with corresponding synthesis bank with certain number of channels, desired
non-uniform filter can be synthesized. Repeat this with other set of contiguous channels
on the uniform base filter bank will eventually form the required complete non-uniform
filter bank. For the synthesis system of the non-uniform system, one just needs to
reverse the process, i.e. split the combined signals using the corresponding analysis
filter bank. This is then followed by the synthesis part of the uniform base. This
structure is actually still a system with two stages, i.e. filtering and downsampling
20
followed by upsampling and filtering. However, our aim is to form filter banks with
rational sampling rates, i.e. to first upsample the input signal, filter it and then
downsample it. To achieve this, we should be able to exchange the
downsampler and upsampler as well as move them across the two filters. This is
valid if and only if the downsampling factor of the analysis base bank and the
upsampling factor of the synthesis bank are relatively prime.
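The coprimality condition can be demonstrated numerically. In this sketch (Python/NumPy; helper names are ours), upsampling by P followed by downsampling by Q gives the same result as downsampling first and upsampling afterwards exactly when gcd(P, Q) = 1:

```python
import numpy as np

def upsample(x, L):
    # insert L-1 zeros between consecutive samples
    y = np.zeros(len(x) * L, dtype=float)
    y[::L] = x
    return y

def downsample(x, M):
    # keep every Mth sample
    return x[::M]

x = np.arange(12, dtype=float)
# interchangeable: 2 and 3 are coprime
a = downsample(upsample(x, 2), 3)
b = upsample(downsample(x, 3), 2)
print(np.array_equal(a, b))  # True
# not interchangeable: gcd(2, 2) = 2
c = downsample(upsample(x, 2), 2)
d = upsample(downsample(x, 2), 2)
print(np.array_equal(c, d))  # False
```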
In [18], the synthesis filter bank used to form combined channels was always
chosen to be the delay chain (i.e. 1, z⁻¹, z⁻², …). This produces perfect reconstruction
filter banks, as the cascaded parts are perfect reconstruction. It can actually be
regarded as time-division multiplexing of separate signals into one combined channel.
This choice of synthesis filter bank yields the shortest resulting rational-sampling
filters, since the delay chain is the shortest possible perfect reconstruction filter
bank. However, the delay chain introduces relative shifts of the individual symmetric (or
anti-symmetric) filters and therefore shifts the center of symmetry of each filter. As a result,
the combined filter will not be symmetric. This is undesirable for image coding
and not suitable for us.
In [12], the indirect method allows us to design non-uniform filter banks based on
uniform filter banks using synthesis filter banks that are not restricted to the delay-chain
system but must still be perfect reconstruction in nature. If the chosen synthesis bank
has ideal filter responses, one can regard this as frequency-division multiplexing of
signals. This allows more design freedom because we can alter the synthesis bank to
achieve certain design criteria such as low stopband error. Besides, by choosing a
linear phase uniform base bank and linear phase synthesis banks, symmetric or
anti-symmetric rational-sampling filters can be formed.
In our discussion, we have the following assumptions:
All filters are assumed to be FIR with real coefficients and contiguous passbands;
All signals are assumed to be real;
All filter banks are critically sampled;
The synthesis banks used for recombination will contain bandpass filters with
central frequencies in increasing order.
2.5.1 Non-uniform filter bank derived from uniform filter bank
Figure 2-7 illustrates our desired structure of filter bank with rational sampling factors.
In order to make a clear picture about how a signal can be extracted in rational fractions
Figure 2-7 Filter bank with rational sampling factors
and perfectly reconstructed from these fractions, we make use of an example filter
bank with partition (2π/3, π/3). Using this filter bank, we can extract the 2/3 lowpass
portion of the signal by first upsampling the signal by 2, filtering it and downsampling it by 3.
Meanwhile, we can extract the 1/3 highpass portion through filtering followed by downsampling
by 3. Figure 2-8 illustrates the details.
To obtain a perfect reconstruction filter bank, we can combine a base analysis
filter bank consisting of Q filters with N synthesis filter banks, each with pi filters (i = 0,
…, N-1). Obviously, if the analysis base bank and synthesis banks are both perfect
reconstruction, the resulting system will be perfect reconstruction as well. However, one
has to take care of the difference in delay between the different rational-sampling filters and the
unaffected filters before passing from the analysis part of the system to the synthesis part.
Before going further in our discussion, we need to understand the constraints that this
method places on the sampling factors. Suppose our desired filter bank has
sampling factors (p0/q0, p1/q1, …, pN-1/qN-1). Ideally, the ith channel (starting
from 0) will occupy the frequency range:
(2.)    [ π Σ_{j=0}^{i−1} pj/qj ,  π Σ_{j=0}^{i} pj/qj ]
Assume the filter bank is critically sampled, i.e.:
Figure 2-8 Illustration of how a signal can be extracted in fractions and reconstructed perfectly
(top: filter bank structure; bottom left: spectral analysis of the rational-sampling (2/3) lowpass branch; bottom right: spectral analysis of the (1/3) highpass branch).
(2.)    Σ_{i=0}^{N−1} pi/qi = 1
As will be mentioned later in this chapter, for the upsampler and downsampler to be
interchangeable, all pi's have to be coprime with Q, where Q is defined as:
(2.)    Q = lcm(q0, q1, …, qN−1)
Therefore,
(2.)    Q = ri qi,  where ri is a positive integer
Obviously, for all pi's to be coprime with Q, all ri's have to be 1. This means
(2.)    qi = Q,  i = 0, 1, …, N−1
and
(2.)    Σ_{i=0}^{N−1} pi = Q
As a result, in our following discussion, we assume all filter banks have sampling rates
of (p0/Q, p1/Q, …, pN-1/Q) [12].
Figure 2-9 Non-uniform filter bank designed indirectly. (a) Analysis Part (b) Synthesis Part
Although it can maintain the perfect reconstruction property, this method does
not ensure that the filter bank gives meaningful spectral output. For example, consider a
filter bank with sampling factors (2/3, 1/3) (that is, a lowpass filter of bandwidth 2π/3
and a highpass filter of bandwidth π/3). It can be verified that the lowpass channel
conserves the input frequencies from 0 to 2π/3 in correct and contiguous order. If we use the
highpass and lowpass filters of the synthesis bank to match the first and second channels
respectively, the system would still be perfect reconstruction, but now the spectrum in
the 2π/3 subband would be the mirror image of the true spectrum of the signal (that is,
"high" frequencies would appear before "low" ones).
Consider a filter bank with sampling factors (1/3, 2/3); now we have a wider
highpass filter. It can be seen that the original signal cannot be preserved contiguously
and in the correct order. If a lowpass/highpass pair (from the 2-channel synthesis bank) is
matched with the second/third channels (from the base bank), we get a shuffled and
mirrored version of the spectral bands of interest. With a highpass/lowpass pair, the low
and high bands are interchanged instead. This is illustrated in Figure 2-10.
To formalize the observation, the synthesis banks used are assumed to have
bandpass filters with central frequencies in increasing order; i.e., analysis filters with
increasing central frequencies combine only with synthesis filters in increasing order.
Suppose we want to extract the following part of input signal:
(2.)
Consider the first branch only; after downsampling by Q,
(2.)
(a) Original spectrum (dotted lines show the parts to be extracted by the three analysis filters)
(b) Spectrum of the 2nd band output after downsampling by 3 and upsampling by 2.
(c) Spectrum of the 3rd band output after downsampling by 3 and upsampling by 2.
(d) If we choose the lowpass/highpass pair (synthesis bank) for the 2nd/3rd channels (base bank), a non-contiguous spectrum results.
(e) If we choose the highpass/lowpass pair (synthesis bank) for the 2nd/3rd channels (base bank), a non-contiguous spectrum results again.
Figure 2-10 Spectral analysis of a filter bank with ideal filters for (1/3, 2/3) splitting.
and after upsampling by P,
(2.)
To avoid shuffling of frequencies, one of the XRi, but not its mirrored version, has to fall
within [0, π/P]. Hence,
(2.)
and thus we have the following proposition [12]:
Proposition: To avoid shuffling of frequencies in the indirect method, k (as defined
above) has to be even (assuming P > 1).
2.5.2 Equivalent rational sampling and filtering structure
Consider the sampling factors pi and Q in our proposed non-uniform filter bank design.
If pi and Q are relatively prime, the upsampler and downsampler are commutative [25].
According to the noble identities, a downsampler by M followed by G(z) is equivalent
to G(z^M) followed by the downsampler, and G(z) followed by an upsampler by L is
equivalent to the upsampler followed by G(z^L).
We can move the upsampler towards the left across the analysis filter H(z) and raise the z
in it to the power of pi, so that it becomes H(z^pi). This is actually equivalent to
upsampling H(z) by a factor of pi. Similarly, we can move the downsampler across the
synthesis filter G(z) towards the right and raise z to the power of Q; again, this is the
same as upsampling G(z) by a factor of Q. Then, the two upsampled filters H(z^pi)
and G(z^Q) are brought adjacent to each other and can be merged into H(z^pi)G(z^Q).
Now we have pi branches, each consisting of an upsampler by pi and the merged analysis filter
followed by a downsampler by Q. These branches in parallel can be added to form the
equivalent transfer function below; the structure is as illustrated in Figure 2-11:
(2.)    H'(z) = Σ_{j=0}^{p−1} H_{k+j}(z^p) Gj(z^Q)
Having this single transfer function, one has a clear idea of the frequency response of
the signal filtered by this rational-sampling channel. Besides, when optimizing the filter
bank, one can have better control of the magnitude response of the non-uniform filters.
To implement the equivalent filtering and rational sampling, we can adopt the
polyphase implementation mentioned in [25], as the upsampling and downsampling rates are
relatively prime. The implementation requires only (N+1)/M MPUs and (N+1−L)/M APUs, where N,
L and M are the length of the equivalent filter, the upsampling rate and the downsampling rate,
respectively. For the equivalent filter, the length is N = (NH − 1)L + (NG − 1)M + 1, where
NH and NG represent the lengths of the base analysis filters and synthesis filters respectively.
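The merged filter and its length formula can be verified directly. In this sketch (Python/NumPy; helper names and the example filter taps are ours), the equivalent filter of one branch is H(z^L)·G(z^M), obtained by zero-stuffing each impulse response and convolving:

```python
import numpy as np

def upsample_taps(h, L):
    # impulse response of H(z^L): L-1 zeros between consecutive taps
    y = np.zeros((len(h) - 1) * L + 1)
    y[::L] = h
    return y

def equivalent_filter(h, g, L, M):
    # H(z^L) * G(z^M) as a single impulse response
    return np.convolve(upsample_taps(h, L), upsample_taps(g, M))

h = np.array([1.0, 2.0, 3.0])   # hypothetical base analysis filter, NH = 3
g = np.array([1.0, -1.0])       # hypothetical synthesis filter, NG = 2
eq = equivalent_filter(h, g, L=2, M=3)
print(len(eq) == (3 - 1) * 2 + (2 - 1) * 3 + 1)  # N = (NH-1)L + (NG-1)M + 1
```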
Figure 2-11 Equivalent structure for Figure 2-9
Chapter 3 Overview of Image Coding
Image coding has been widely researched in pursuit of higher compression gain as well as faster
algorithms. Among the common practices, we can summarize the coding algorithm in a
few steps: 1) the image is first transformed or decomposed into subbands; then 2) bit rates
are allocated according to the subband variances and the coefficients are quantized; finally
3) the quantized coefficients are entropy coded or coded by dedicated coefficient modeling
algorithms. Transforms or subband filter banks are used to decorrelate the input and
compact its energy into fewer coefficients. As a kind of subband decomposition technique,
the wavelet transform has been found to be effective for image coding. Along with the development
of subband decomposition techniques, some dedicated coefficient modeling algorithms
have also been introduced. In this chapter, we will review some common aspects of
image coding.
3.1 Image decomposition
A. Transform coding
Consider an input signal x(n); it can be arranged into blocks xm of size M:
(3.)    xm = [x(mM), x(mM+1), …, x(mM+M−1)]ᵀ
The M samples of xm are then mapped to a set of M coefficients in the vector ym through
a linear combination
(3.)    ym(i) = Σ_{k=0}^{M−1} a(i,k) xm(k),  i = 0, 1, …, M−1
where the a(i,k) are called the basis functions. The above relation can be written in matrix
form
(3.)    ym = A xm
where A = [a(i,k)] is M×M. To reconstruct xm from ym, it is clear we have to choose the inverse
transform to be A⁻¹. For an orthogonal transform, the inverse transform is Aᵀ.
The most popular transform for image coding is the DCT (Discrete Cosine
Transform). It is an orthogonal transform and is essentially good at energy compaction for
most correlated sources. In addition, its performance is very close to that of the optimal KLT
(Karhunen-Loève Transform) for an AR(1) process with high correlation coefficient ρ. As
images can be modeled by AR(1) processes with high correlation coefficients, the DCT
has been adopted in many international standards like JPEG, MPEG and CCITT H.261.
B. Subband coding
Typically, in subband coding, the input signal is decomposed by a set of M analysis
filters. To preserve the overall sampling rate, each subband signal is downsampled by a
factor of M. The signal can be perfectly reconstructed if the subband signals are fed into
the synthesis bank immediately and a PR filter bank is chosen. However, in real coding
applications, subband signals undergo quantization and entropy coding before being
transmitted or stored. During decoding, the compressed data are entropy decoded and
dequantized. Then, at the synthesis stage, each subband signal is upsampled by a factor
of M by inserting M-1 zeros between every two samples, filtered by the M synthesis filters
and added to form the output signal.
Transform coding mentioned above can be thought of as a special case of subband
coding in which each filter length is no longer than M. Conversely, we can view
subband decomposition as equivalent to transforming blocks of elements by the
polyphase matrix, whose elements can be of order higher than 0. If the polyphase matrix
is only of order 0, it is the same as the typical transform
mentioned in the last subsection.
C. Wavelet decomposition
It has been found that the discrete wavelet transform can be computed by iteratively
cascading a 2-channel perfect reconstruction filter bank on its lowpass output, as depicted in
Figure 3-12. It can be observed that such a construction is perfect reconstruction as
long as the 2-channel filter bank is PR. The discrete wavelet transform can be regarded
as a multiresolution decomposition of a signal into coarse and detail constituents. For
image compression, wavelet decomposition is performed horizontally and then
vertically, resulting in four sub-images. Three of them are highpass and contain
horizontal, vertical and diagonal edges and textures respectively. The lowpass subimage
gives a small-scale version of the original image. Splitting the lowpass subimage with the
2-channel filter bank repeatedly produces coarse and detail representations of the image at
different scales. In addition, from a frequency-domain point of view, the wavelet transform
has good energy compaction capability that fits the decaying-spectrum hypothesis of
image compression.
Figure 3-12 Forward discrete wavelet transform through a 2-channel filter bank.
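The iterated 2-channel split can be sketched as follows (Python/NumPy; Haar filters are used purely for illustration, and the plain non-periodic convolution here ignores boundary handling, which a real codec must treat carefully):

```python
import numpy as np

def analysis_step(x, h0, h1):
    # filter by the 2-channel bank, then downsample by 2
    return np.convolve(x, h0)[::2], np.convolve(x, h1)[::2]

def dwt(x, h0, h1, levels):
    # iterate the split on the lowpass output only
    details = []
    for _ in range(levels):
        x, d = analysis_step(x, h0, h1)
        details.append(d)
    return x, details  # coarse approximation + detail signals, finest first

h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # Haar lowpass
h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)  # Haar highpass
coarse, details = dwt(np.ones(8), h0, h1, 2)
```

For a constant input, the interior highpass coefficients are zero at every level, which illustrates why smooth image regions produce long zero runs in the detail subbands.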
3.2 Quantization and bit rate allocation
A. Quantization
In lossy compression, the only process that introduces noise and thus causes loss in the
reconstruction of the signal is quantization. Quantization is a process of representing the
source output using a smaller number of codewords. If the inputs are scalars, we call the
process scalar quantization; similarly, if the inputs are vectors, we call it vector
quantization. For scalar quantization, a scalar value within a particular interval is
mapped to the corresponding integer representing that range. During dequantization,
Figure 3-13 Two popular subband decomposition structures, left: dyadic wavelet transform; right: parallel M-channel
decomposition.
the integer is mapped back to the estimated value for that particular interval. The estimated
value is taken to be the centroid of the range, which minimizes the error if the
source inputs are evenly distributed within it. If all the intervals are of the same size,
we call it uniform quantization. Otherwise, it is non-uniform quantization.
Due to its simplicity, scalar quantization with uniform intervals is usually chosen
for practical coding purposes. For image coding, it is crucial to have zero values
represented accurately. Therefore, it is desirable to have the so-called midtread
quantizer that can output zero upon quantization. In image coding, most of the high
frequency coefficients will be quantized to zero. More coefficients will be quantized to zero if we
set the quantization interval containing zero to be larger than the other intervals. This was
found to give higher compression gain, as it favors entropy coding of zero runs. Figure
3-14 illustrates this simple scalar quantization method with a dead zone.
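A minimal sketch of such a dead-zone uniform quantizer (Python/NumPy; our own helper names, with a dead zone twice the nominal step and midpoint reconstruction rather than a true centroid):

```python
import numpy as np

def quantize(x, step):
    # zero bin is (-step, step): twice as wide as the other bins
    sign = np.sign(x)
    mag = np.abs(x)
    q = np.where(mag < step, 0, np.floor((mag - step) / step) + 1)
    return (sign * q).astype(int)

def dequantize(q, step):
    # midpoint reconstruction for the non-zero bins
    mag = np.where(q == 0, 0.0, step + (np.abs(q) - 0.5) * step)
    return np.sign(q) * mag
```

With step = 1, any input in (−1, 1) maps to the zero symbol, so small noisy coefficients collapse to zero while larger ones are coded with ordinary uniform bins.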
B. Bit rate allocation
After the image has been decomposed or transformed into different subbands, we
have to quantize the coefficients to a certain number of bits. Since subband coding
exploits the non-flatness of the input spectrum, it makes sense to assign different bit
rates to different subbands. The problem is: given an overall bit rate R, how can we
assign bit rate Ri to subband i such that the overall reconstruction error is
minimized? Under scalar quantization with rate Ri, the reconstruction error variance of
quantized subband i is given by
(3.)    σr,i² = c · 2^(−2Ri) σi²
where σi² is the variance of subband i and c is a constant. Suppose there are M subbands
in total. Then, the overall error variance after quantization is
Figure 3-14 Illustration of scalar quantization with dead zone
(3.)    σr² = (1/M) Σ_{i=0}^{M−1} c · 2^(−2Ri) σi²
Our objective is to determine the Ri that minimize this error variance. The minimization
problem can be solved using Lagrange multipliers, and the optimal bit rate allocation
is [19]
(3.)    Ri = R + (1/2) log₂( σi² / (σ0² σ1² ⋯ σM−1²)^(1/M) )
3.3 Entropy Coding
After quantization, we still observe some dependencies among the quantized values. For
example, there is a large proportion of zero samples compared to non-zero samples, and the
probability of seeing large values is much lower than that of small values. This leaves room for
better compression through entropy coding. Entropy coding
techniques include Huffman coding, arithmetic coding and run-length coding. Here, we
briefly describe the principles of arithmetic coding and run-length coding, as they
contribute to the success of our context modeling algorithm in Chapter 5.
A. Arithmetic coding
The idea of arithmetic coding is to represent a string of symbols using a real number
between 0 and 1. Initially, we have the interval [0, 1), which is divided into sub-intervals
with lengths equal to the probabilities of the symbols. When a symbol is encoded, we take
its corresponding sub-interval. This sub-interval is then divided similarly into ranges
proportional to the symbol probabilities. When encoding the second symbol, we
choose the corresponding sub-interval again. This process is repeated until all symbols
have been encoded. The string of symbols can be represented using any number within
the final interval; of course, we would choose the shortest fixed-point representation
that falls inside that final interval. As an example, Figure 3-15 demonstrates
the process of arithmetic coding for the alphabet {A, B, C, D} with probabilities
{0.1, 0.3, 0.4, 0.2}. From that example, we have the final interval [0.61536, 0.61824).
We can represent the string using 0.6171875, which is in the final interval; its binary
representation is 0.1001111.
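The interval narrowing in this example can be reproduced with a short sketch (Python; this computes only the final interval, not the bit-level encoder):

```python
def arithmetic_interval(message, probs):
    # cumulative probabilities give each symbol's sub-interval of [0, 1)
    cum, acc = {}, 0.0
    for sym, p in probs.items():
        cum[sym] = acc
        acc += p
    low, width = 0.0, 1.0
    for sym in message:
        low += width * cum[sym]   # descend into the symbol's sub-interval
        width *= probs[sym]
    return low, low + width

probs = {'A': 0.1, 'B': 0.3, 'C': 0.4, 'D': 0.2}
lo, hi = arithmetic_interval("CCBDB", probs)
print(lo, hi)  # approximately 0.61536 and 0.61824, matching the example
```

Any number in [lo, hi), such as 0.6171875 = 0.1001111 in binary, identifies the message uniquely given the model.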
Arithmetic coding allows symbols to be represented with a fractional number of bits.
Therefore, it benefits entropy coding of small alphabets (with as few as 2 symbols).
Another advantage of arithmetic coding is that it only needs to update the probability
counts adaptively during coding. Also, it allows easy switching between different context
models, which makes it suitable for conditional entropy coding. The drawback of arithmetic
coding is that it requires a multiplication and a division each time a symbol is
encoded. For the binary case, this can be solved by an arithmetic coder called the MQ-coder
mentioned in [11], which requires no multiplication or division when encoding
symbols.
B. Run-length coding
After image decomposition and quantization, there are always long runs of zeros
in the high-frequency subbands. These long runs of zeros represent the smooth or blurred
regions of the image. It is often beneficial to encode zero runs using a special symbol or
Figure 3-15 Demonstration of arithmetic coding using a 4-ary string "CCBDB".
prefix followed by a count of the upcoming zeros. This contributes to
compression gain as well as speeding up decoding.
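A zero-run encoder of this kind can be sketched as follows (Python; the ('Z', n) pair stands in for the special run symbol plus its count, a representation of our own choosing):

```python
def rle_zeros(coeffs):
    # replace each run of zeros with a single ('Z', run_length) token
    out, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            if run:
                out.append(('Z', run))
                run = 0
            out.append(c)
    if run:
        out.append(('Z', run))
    return out

print(rle_zeros([5, 0, 0, 0, -2, 0, 0]))  # [5, ('Z', 3), -2, ('Z', 2)]
```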
Recently, some algorithms have been proposed that exploit the correlation among
coefficients when coding the quantized coefficients. These algorithms surpass
traditional approaches by representing zeros effectively and coding important
coefficients first.
3.4 Embedded Zerotree Wavelet (EZW)
The embedded zerotree wavelet (EZW) coder introduced by Shapiro [17] is a
wavelet-based image coder. It exploits the self-similarity across different scales of an image by
organizing the coefficients in a tree structure. A coefficient at a coarser scale (lower
frequency subband) has four child coefficients at the next finer scale (higher frequency subband)
at the same relative spatial location of the subimage, as depicted in Figure 3-16.
Coefficients in the LL subband have only three child coefficients, while
coefficients at the finest scale do not have any children.
The EZW algorithm is based on successive-approximation quantization (SAQ)
to provide the embedded property. SAQ is actually the same as bit-plane encoding of
magnitudes. Given the threshold T, a coefficient is said to be significant if its magnitude ≥ T;
otherwise, it is insignificant. After all the coefficients have been scanned with respect
to T, T is halved. The EZW coder consists of two passes: the dominant pass
encodes the leading zero bits, the leftmost 1 and the sign of the coefficients; the
subordinate pass refines the magnitudes of coefficients that have been found to be
significant. Coefficients are scanned from subbands of lower resolution to higher
resolution. An arithmetic coder is employed to code one of four possible
symbols: POS - positive coefficient, NEG - negative coefficient, IZ - isolated zero and
ZTR - zerotree root. For POS, NEG and IZ, at least one descendant of
the current coefficient is significant and thus its children have to be coded. ZTR
Figure 3-16 Parent-child relationship of wavelet coefficient in a EZW coder
denotes that all nodes in the tree rooted at the current coefficient are insignificant and thus
we do not have to code any of its descendants.
The EZW algorithm has been found to have superior performance [17]. The
philosophy behind it is the implication of insignificance of coefficients from their parents at
coarser scales, i.e. the insignificance of a coefficient at a coarser scale tends to imply
insignificance of its children at finer scales. This is quite adequate for image coding, as
for natural images energy is concentrated at low frequencies and decays towards
high frequencies. As a result, the zerotree structure fits the spectrum quite well and is able
to represent many zero samples effectively.
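The zerotree test itself is a simple recursion. In this sketch (Python; `coeff` and `children` are our own stand-ins for the coefficient values and the parent-child structure of Figure 3-16), a node is a valid ZTR for threshold T when it and all of its descendants are insignificant:

```python
def is_zerotree_root(coeff, children, node, T):
    # ZTR: the node and every descendant are insignificant w.r.t. T
    if abs(coeff[node]) >= T:
        return False
    return all(is_zerotree_root(coeff, children, c, T)
               for c in children.get(node, []))

# toy tree: node 0 has children 1 and 2; node 1 has child 3
coeff = {0: 3, 1: -1, 2: 0, 3: 9}
children = {0: [1, 2], 1: [3]}
print(is_zerotree_root(coeff, children, 0, T=16))  # True: all below threshold
print(is_zerotree_root(coeff, children, 0, T=8))   # False: descendant 3 is significant
```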
3.5 Set Partitioning in Hierarchical Trees (SPIHT)
The set partitioning in hierarchical trees (SPIHT) [5] image compression algorithm has
been regarded as an improved version of the EZW algorithm. Instead of using a 4-ary
alphabet, it codes each significance status (either for a set of coefficients or an individual
coefficient) and the signs of coefficients in 0s and 1s. The SPIHT algorithm uses the
following notation for explaining the new coding method:
O(i, j): set of coordinates of all offspring of node (i, j);
D(i, j): set of coordinates of all descendants of the node (i, j);
H: set of coordinates of all spatial orientation tree roots (nodes in the highest
pyramid level);
L(i, j) = D(i, j) – O(i, j)
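For the standard dyadic layout, these sets can be enumerated directly. The sketch below (Python; the parent-child rule (i, j) → (2i, 2j), (2i, 2j+1), (2i+1, 2j), (2i+1, 2j+1) and the square transform width `size` are our assumptions) computes O, D and L for a node:

```python
def offspring(i, j, size):
    # O(i, j): the four children of node (i, j), clipped to the transform extent
    kids = [(2*i, 2*j), (2*i, 2*j + 1), (2*i + 1, 2*j), (2*i + 1, 2*j + 1)]
    return [(k, l) for (k, l) in kids if k < size and l < size]

def descendants(i, j, size):
    # D(i, j): offspring plus all of their descendants
    out, stack = [], offspring(i, j, size)
    while stack:
        k, l = stack.pop()
        out.append((k, l))
        stack.extend(offspring(k, l, size))
    return out

def l_set(i, j, size):
    # L(i, j) = D(i, j) - O(i, j)
    direct = set(offspring(i, j, size))
    return [n for n in descendants(i, j, size) if n not in direct]
```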
The parent-child relationship is the same as in the EZW algorithm. We can
regard {(i, j)} ∪ D(i, j) as the tree rooted at (i, j), while L(i, j) is newly introduced. The
algorithm maintains three lists: the LIP (list of insignificant pixels) contains insignificant
pixels resulting from set partitioning; the LIS (list of insignificant sets) consists of all
insignificant sets of pixels; and the LSP (list of significant pixels) contains pixels whose
magnitudes are to be refined in the refinement pass. Initially, we have only the tree roots in the LIP
and their corresponding D(i, j) sets in the LIS. During the sorting pass (similar to the
dominant pass in the EZW), the LIP is scanned prior to the LIS. After that, the LSP is
scanned during the refinement pass.
When examining the LIP, each pixel is coded by outputting its significance
state, and its sign if it is significant. Any significant pixel is moved to the LSP.
When scanning the LIS, the algorithm outputs the significance state of each set. If
a set is insignificant, none of the coefficients in it is significant, so they need not be
coded individually. However, if the set is significant, it is partitioned as follows:
if it is a D(i, j) set, it is broken into L(i, j) plus four individual elements
(k, l), (k, l) ∈ O(i, j);
if it is an L(i, j) set, it is broken into four sets D(k, l), (k, l) ∈ O(i, j).
The old set is removed while the resulting sets (except empty ones) are appended to
the end of the LIS. The four elements coming from the set partitioning are
coded similarly to those in the LIP. After that, any pixel that remains insignificant is
moved to the LIP.
The SPIHT [5] algorithm surpasses the EZW [17] in terms of PSNR
performance. Since it outputs each event as a 0 or 1, there is also a binary uncoded
version, which omits the arithmetic coder. The binary uncoded version is 0.3dB to
0.6dB inferior to the version with arithmetic coding; nevertheless, it is faster in coding and
simpler to implement.
Chapter 4 Non-uniform filter banks and its
application to image coding
4.1 Introduction
In section 2.5 of Chapter 2, we discussed the principle of deriving filter banks
with rational sampling rates from uniform filter banks [12]. According to [12], the
position of the synthesis bank (k) is restricted to be even. This is very limiting when we want to
adapt to the spectral characteristics of images using non-uniform filter banks with different
bandwidth partitions. Moreover, for image coding, it is very critical for filter banks to
have linear phase. The work in [12] did not consider the linear phase property.
In this chapter, the conditions for relaxing the constraint on k will be shown so that
it can take odd positions as well. In that case, we have to allow a reversed
matching order of the two sets of filters in terms of their passband frequencies. Besides,
the linear phase (LP) property can be achieved when we have both an LP base filter bank and
LP synthesis filter banks; the symmetry polarities simply depend on the parity of k.
Moreover, the optimization issues of such non-uniform filter banks will be discussed.
Then, the optimized filter banks will be applied to image coding. It will be proposed
that different non-uniform partitions can be applied subject to the spectral distribution
of images, and the idea of determining the non-uniform partition will be explained. Lastly, the
coding results will be discussed and the observations explained.
4.2 Extracting the correct spectrum of the signal
It has been emphasized that the combined channels of the filter bank should have a
meaningful spectrum. By “meaningful”, we mean that the spectrum extracted should be
contiguous and in the correct order. Consider the equivalent structure of a rational
sampling branch, in which the input is upsampled by p, filtered by the cascade of an
analysis filter Hk+j(z^p) and a synthesis filter Gi(z^q), and downsampled by q.
The first analysis filter Hk(z) is situated in

[kπ/q, (k+1)π/q]

and the last analysis filter Hk+p-1(z) is situated in

[(k+p-1)π/q, (k+p)π/q]

while Gi(z) resides at

[iπ/p, (i+1)π/p],  i = 0, 1, …, p-1.

Consider the upsampled versions of Hk(z) and Gi(z), i.e. Hk(z^p) and Gi(z^q). For
simplicity, we neglect everything outside [0, π], as it is just the periodic and mirrored
version of what lies within [0, π]. The images of Hk(z^p) are of width π/(pq); for even
n, the normal version occupies

[(nq+k)π/(pq), (nq+k+1)π/(pq)]

and for odd n, the mirrored version occupies

[((n+1)q-k-1)π/(pq), ((n+1)q-k)π/(pq)]

where n = 0, 1, …, p-1. For Gi(z^q), the normal version occupies

[(mp+i)π/(pq), (mp+i+1)π/(pq)]

and the mirrored version occupies

[((m+1)p-i-1)π/(pq), ((m+1)p-i)π/(pq)]

where m = 0, 1, …, q-1.
Suppose we want to extract a contiguous spectrum in the upsampled domain, and we
allow the two sets of filters to be matched in reverse order in terms of passband
distribution. The spectrum can be extracted only if the images of the two sets of filters
coincide, i.e. the p consecutive images of Hk(z^p), …, Hk+p-1(z^p) fall exactly onto the
p images of G0(z^q), …, Gp-1(z^q) belonging to a single block m. Therefore, we have to
find m and n such that, for even n,

nq + k = mp

and for odd n,

(n+1)q - k = (m+1)p

where m = 0, 1, …, q-1 and n = 0, 1, …, p-1. Rewriting the above two equations
modulo p, we have

nq + k ≡ 0 (mod p) for even n

or

(n+1)q - k ≡ 0 (mod p) for odd n.

To solve the above two congruences, one can make use of the Chinese Remainder
Theorem, as p and q are relatively prime. The even-n case becomes

nq + k = ap

and the odd-n case becomes

(n+1)q - k = bp.

On solving for a and b in the above two equations, we can find m and n, with m = a in
the even case and m = b - 1 in the odd case.
Examples
Let us consider a non-uniform filter bank with passband bandwidths (π/8, π/8, π/8,
5π/8). Here, we have p = 5, q = 8 and k = 3 for the combined channel. We have

8n + 3 ≡ 0 (mod 5) for even n, or 8(n+1) - 3 ≡ 0 (mod 5) for odd n.

Thus, for the even-n congruence,

8n ≡ 2 (mod 5), giving n = 4 and m = (4×8 + 3)/5 = 7,

while the odd-n congruence gives n = 0, which violates the assumption that n should be
odd.
Therefore, we have n = 4 and m = 7. Similarly, one can verify the following non-
uniform filter bank cases:
(π/8, π/8, 5π/8, π/8), (π/8, 5π/8, π/8, π/8) and (5π/8, π/8, π/8, π/8)
The values of m and n are m = 5, n = 3; m = 2, n = 1; and m = 0, n = 0 respectively.
Figure 4-17 illustrates the locations of Hk(z^p) and Gi(z^q) and the relation of m and n
in the upsampled domain, while Figure 4-18 shows the spectra of the combined channels
for the different values of k in our practical design.
Now, let us consider the case of (1/3, 2/3) we have mentioned before. We have p = 2,
q = 3 and k = 1. Hence, we have

3n + 1 ≡ 0 (mod 2) for even n, or 3(n+1) - 1 ≡ 0 (mod 2) for odd n.

The even-n congruence requires n to be odd, and the odd-n congruence requires n to be
even: both violate the parity assumption of n. As a result, we cannot obtain a
contiguous passband for the band [π/3, 2π/3] using this method.
[Grids over the upsampled frequency axis in [0, π] showing the image indices of
Gi(z^q) for m = 0, …, 7 and of Hk(z^p) for k = 0, 1, 2, 3 against n = 0, …, 4]
Figure 4-17 Illustration of the upsampled versions of Hk(z) and Gi(z) and the relation of m and n
Figure 4-18 Real non-uniform filter bank design examples with different values of k. The plots show the spectra of the combined
channels with bandwidth 5π/8 for k = 0, 1, 2, 3.
Note that for even n the images form the normal version, i.e. the bands go in
ascending order as frequency increases, while for odd n they form the mirrored version
and the bands go in descending order; the same holds for Gi(z^q) with respect to the
parity of m. As a result, the filters should be matched in normal order if the two sets
of bands go in the same direction, and in reverse order if they go in opposite
directions. In other words, the filters should be matched in normal order if

m ≡ n (mod 2)

Otherwise, the channels should be matched in reverse order:

m ≢ n (mod 2).
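A hypothetical helper (not part of the thesis) can search for n and m by testing the two alignment conditions, nq + k = mp for even n and (n+1)q - k = (m+1)p for odd n, and then reading off the matching order from the parities of m and n:

```python
def find_matching(p, q, k):
    """Return (n, m, order) for a combined channel of p bands starting at
    position k in a q-channel base bank (gcd(p, q) = 1), or None if no
    contiguous spectrum can be extracted this way."""
    # Even-n case: the normal images of Hk(z^p) require n*q + k = m*p.
    for n in range(0, p, 2):
        if (n * q + k) % p == 0:
            m = (n * q + k) // p
            order = "normal" if m % 2 == n % 2 else "reverse"
            return (n, m, order)
    # Odd-n case: the mirrored images require (n+1)*q - k = (m+1)*p.
    for n in range(1, p, 2):
        if ((n + 1) * q - k) % p == 0:
            m = ((n + 1) * q - k) // p - 1
            order = "normal" if m % 2 == n % 2 else "reverse"
            return (n, m, order)
    return None
```

For (p, q, k) = (5, 8, 3) this returns n = 4 and m = 7 with reverse-order matching, agreeing with the worked example, and for (p, q, k) = (2, 3, 1) it finds no solution, agreeing with the (1/3, 2/3) case.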
4.3 Linear phase property
For image coding using uniform filter banks, the number of channels is always chosen
to be 2^N. So, in our image coding examples later in this chapter, we use filter banks with
Figure 4-19 Symmetry of the combined filters in our proposed design, where S = symmetric filter and A = anti-symmetric filter.
(a) The resulting filter is symmetric if the first base filter is symmetric, i.e. k even.
(b) The resulting filter is anti-symmetric if the first base filter is anti-symmetric, i.e. k odd.
[Diagram: the alternating S/A polarities of the odd number of combined branches, for k even and k odd]
2^N channels as the base banks for deriving non-uniform filter banks. As mentioned
before, the downsampling rate of the base filter bank and the upsampling rate of the
synthesis bank have to be coprime, and our base filter banks have 2^N channels (i.e. the
downsampling rate has factors of 2 only). As a result, the number of channels of the
synthesis banks can be any odd number larger than 1 and less than 2^N.
The work in [12] only considered general filter banks with rational sampling
rates, not filter banks with linear phase. However, as image coding using non-linear-
phase filter banks produces undesirable artifacts upon quantization, the linear phase
property is essential for our non-uniform filter bank, i.e. each filter should have a
symmetric or anti-symmetric impulse response. Fortunately, one can achieve linear phase
combined channels if both the base filter bank and the synthesis banks chosen to form
the rational sampling channels are of linear phase. Usually, successive filters of linear
phase filter banks have alternating symmetric and anti-symmetric properties. Hence, all
branches to be combined will be of the same symmetry, i.e. either symmetric or anti-
symmetric, as illustrated in Figure 4-19. As the resulting filter is the sum of all these
branches, it will be symmetric or anti-symmetric if these branches have the same length
and thus the same center of symmetry. So, the simplest way is to choose base analysis
filters of equal length, and likewise for the synthesis filters. For simplicity, in our
design examples later in this chapter, all the filter banks, including the analysis base
banks and the synthesis banks, have equal-length filters. Since our synthesis banks
always have an odd number of channels, the first filter will be symmetric. Thus the
symmetry of the combined channel is determined solely by the first analysis filter to be
combined, as depicted in Figure 4-19.
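This symmetry argument is easy to check numerically. The sketch below (a hypothetical helper, not from the thesis) classifies an impulse response and confirms that summing equal-length branches of the same polarity preserves linear phase:

```python
def symmetry(h, tol=1e-12):
    """Return 'S' for a symmetric impulse response, 'A' for anti-symmetric,
    or None if the filter is not linear phase."""
    n = len(h)
    if all(abs(h[i] - h[n - 1 - i]) < tol for i in range(n)):
        return "S"
    if all(abs(h[i] + h[n - 1 - i]) < tol for i in range(n)):
        return "A"
    return None

# Two equal-length symmetric branches share the same center of symmetry,
# so their sum stays symmetric (the same holds for anti-symmetric pairs).
h1 = [1.0, 2.0, 3.0, 2.0, 1.0]
h2 = [0.5, -1.0, 0.0, -1.0, 0.5]
combined = [a + b for a, b in zip(h1, h2)]
```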
4.4 Design examples
Based on the above discussions, we are now able to design non-uniform filter banks
with rational sampling factors. These filter banks will be applied to image coding in
the next section to see the improvements over uniform ones. As mentioned before, our
base uniform filter banks have 2^N channels. Here, we chose a 16-channel LPPUFB
as reviewed in Chapter 2, optimized for the coding gain of an AR(1) process as shown
in Figure 2-4. For the odd-channel filter banks used to combine several channels into
one with a rational sampling rate, we make use of the lattice structure in section 2.3.3.
A. Optimization
Due to the large number of lattice coefficients involved in the optimization of our
proposed non-uniform filter banks, we simplify the problem by first designing the
uniform filter bank before bringing in the synthesis banks that form the desired
rational sampling bands. For image coding, it is always better to optimize the filter
bank with the coding gain for the AR(1) process, as mentioned in section 2.4. After
that, following section 4.2 of this chapter, we need to find the values of m and n as
well as the order in which the two sets of filters should be matched. We can then
optimize the rational sampling bands one by one with the uniform base unaffected. To
optimize a non-uniform channel, we choose the stopband attenuation as the criterion.
The passband distribution is very similar to that of one of the uniform filters and is
determined by m. Therefore, the stopband is

[0, mπ/M - Δ1] ∪ [(m+1)π/M + Δ2, π]

where M, m, Δ1 and Δ2 are the number of channels of the base filter bank, the relative
position of the resulting non-uniform filter, and the two transition bandwidths
respectively. For the
computation of the stopband attenuation from an impulse response, we may refer to
section 2.4 of Chapter 2. Upon optimizing the non-uniform channels to a satisfactory
stopband error, we could further optimize the filter bank by allowing the uniform base
to be fine-tuned. However, we could not find any literature on the coding gain of
filter banks with rational sampling factors. Besides, a filter bank with a smaller
overall stopband error does not necessarily give better coding performance.
Consequently, we chose not to make any change to the finely optimized base bank for
the non-uniform banks applied to image coding. However, a filter bank design example
in which both the base bank and the synthesis banks are optimized will also be shown.
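As an illustration of this criterion, the stopband energy can be evaluated on a uniform frequency grid (a sketch, assuming the stopband is the complement of the passband [mπ/M, (m+1)π/M] minus the two transition bands d1 and d2; the helper name is hypothetical):

```python
import cmath
import math

def stopband_energy(h, m, M, d1, d2, grid=256):
    """Average |H(e^jw)|^2 over the stopband
    [0, m*pi/M - d1] U [(m+1)*pi/M + d2, pi]."""
    lo, hi = m * math.pi / M - d1, (m + 1) * math.pi / M + d2
    total, count = 0.0, 0
    for g in range(grid + 1):
        w = math.pi * g / grid
        if w <= lo or w >= hi:
            # evaluate the frequency response directly from the impulse response
            H = sum(h[n] * cmath.exp(-1j * w * n) for n in range(len(h)))
            total += abs(H) ** 2
            count += 1
    return total / max(count, 1)

# For a lowpass averaging filter, the high-frequency stopband (m = 0) holds
# far less energy than a highpass placement (m = 3) would demand.
h = [0.25, 0.25, 0.25, 0.25]
```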
B. Examples
Here we show five design examples of non-uniform filter banks. The first three are
related, while the last two have the same frequency distribution. All of them are based
on the 16-channel LPPUFB described in Chapter 2 and shown in Figure 2-4. The odd-
channel LPPUFBs responsible for synthesizing the desired bands were also introduced in
Chapter 2.
In Figure 4-20 to Figure 4-24 of the following examples, the dashed lines are the
uniform channels coming from the base filter bank, while the solid lines are the
rational sampling channels. Rational sampling channels have bandwidths similar to those
of the uniform channels, as the input is upsampled before passing into them. The
vertical dashed lines in Figure 4-21 separate the upsampled input into three parts;
basically, the input is filtered in one of the three parts.
1. Filter bank with bandwidths (π/16, π/16, π/16, π/16, 5π/16, 7π/16)
This non-uniform filter bank is formed based on a 16-channel LPPUFB, employing a
5-channel and a 7-channel LPPUFB to combine the 5th to 9th and the 10th to 16th bands
of the 16-channel base. For the 5-channel LPPUFB, we use one with filter lengths of
5×3; for the 7-channel LPPUFB, we use one with filter lengths of 7×5.
For the band of 5π/16: m = 11, n = 3, and the filters are matched in normal order.
For the band of 7π/16: m = 15, n = 6, and the filters are matched in reverse order.
The frequency response is plotted in Figure 4-20.
2. Filter bank with bandwidths (π/16, 3π/16, 5π/16, 7π/16)
This filter bank is similar to the previous one except that a 3-channel LPPUFB is
additionally required to combine the 2nd to 4th channels into a rational sampling band
with bandwidth 3π/16. Therefore, for convenience, one can first design the previous
bank before adding the 3π/16 band. Once the previous filter bank has been optimized, we
Figure 4-20 6-channel non-uniform filter bank with bandwidths (π/16, π/16, π/16, π/16, 5π/16, 7π/16)
can add this 3π/16 band by giving up the 2nd to 4th uniform channels. We make use of a
3-channel LPPUFB with filter lengths of 3×5.
For the band of 3π/16: m = 11, n = 2, and the filters are matched in reverse order.
The frequency response of the extra 3π/16 band can be found in Figure 4-21. The
frequency plot of the whole filter bank is the superposition of this plot and that of
the previous filter bank. Notice that the frequency response of this 3π/16 band
overlaps with that of the 5π/16 band, since their m's are equal.
3. Filter bank with bandwidths (π/16, π/16, π/16, π/16, π/16, π/16, π/16, π/16, π/16,
7π/16)
We can form a filter bank with an extra rational sampling band based on a bank that has
already been designed, i.e. we can obtain example 2 from example 1. Conversely, one
can drop any of the rational sampling bands and replace the original uniform
Figure 4-21 Frequency plot of the extra 3π/16 band [π/16, 4π/16]. It overlaps with that of the 5π/16 band.
channels back to obtain a filter bank with more uniform channels. Thus, by giving up
the 5π/16 band in example 1, we achieve the desired bandwidths of this example.
In a practical design process, one would design this filter bank first, example 1
second and example 2 last. The frequency plot is shown in Figure 4-22.
4. Filter bank with bandwidths (π/16, 5π/16, 5π/16, 5π/16)
For this 4-channel filter bank, the three rational sampling bands have the same
bandwidth of 5π/16. However, we use 5-channel LPPUFBs with filter lengths of 5×5, 5×3
and 5×1 respectively. The values of m are 13, 14 and 15 respectively, and they share
the same n = 4. Therefore, the three bands are adjacent to each other, as shown in
Figure 4-23.
Figure 4-22 10-channel non-uniform filter bank with bandwidths (π/16, π/16, π/16, π/16, π/16, π/16, π/16, π/16, π/16, 7π/16)
5. Filter bank with bandwidths (π/16, 5π/16, 5π/16, 5π/16), with both the base bank
and the synthesis banks optimized for minimum overall stopband error
This bank comes from example 4 but is further optimized by allowing modification of
the base filter bank so as to obtain a better overall stopband attenuation. Figure 4-24
depicts the frequency plot of the filter bank. We can see that the stopband attenuation
is much higher than that in Figure 4-23, approaching 30 dB; this is obviously because
we allowed much more freedom in the optimization. Despite the much better stopband
error, however, its coding performance is not as good as that of the bank in Figure 4-23.
Figure 4-23 4-channel non-uniform filter bank with bandwidths (π/16, 5π/16, 5π/16, 5π/16)
4.5 Image coding examples and results
To see the coding performance of the non-uniform filter banks we have designed, we
applied them in our image coding algorithm and tested them on several test images.
For quantization, we employed scalar quantization with a dead zone for all subbands.
For the entropy coding part, we applied a coefficient modeling method similar to that
in [11].
4.5.1 Subband images after non-uniform decomposition
Upon analysis, the original image is decomposed into a set of subband images, each
representing different frequency contents of the original image. For a 4-channel
non-uniform filter bank with bandwidths (π/16, 3π/16, 5π/16, 7π/16), the widths or
heights of the subbands are in the ratio 1:3:5:7, as
Figure 4-24 4-channel non-uniform filter bank with bandwidths (π/16, 5π/16, 5π/16, 5π/16). Both the base bank and the synthesis
banks are optimized.
it is determined by their sampling ratios (1/16 : 3/16 : 5/16 : 7/16). Figure 4-25
depicts the subbands upon decomposition by the filter bank we have just mentioned. On
the other hand, for decomposition using a traditional uniform filter bank, one can
expect subbands of equal size, as illustrated in Figure 4-26.
For the decomposition results of filter banks like (π/16, π/16, π/16, π/16, 5π/16,
7π/16) and (π/16, π/16, π/16, π/16, π/16, π/16, π/16, π/16, π/16, 7π/16), we can
imagine they will look like Figure 4-25 with some rational sampling subbands replaced
by their uniform counterparts from Figure 4-26.
4.5.2 Coding performance and observations
In our experiments, we chose the well-known test images Lena, Barbara, Goldhill,
Baboon and Airplane. All of them are 8-bit grey level images of size 512×512. First,
we used the following non-uniform filter banks: (1,3,5,7)π/16, (1,1,1,1,5,7)π/16 and
(1,1,1,1,1,1,1,1,1,7)π/16. We also applied the 16-channel uniform filter bank for
comparison. The Peak Signal-to-Noise Ratio (PSNR) was recorded for bit rates from
0.1 bpp to 1.0 bpp. To obtain coding results over such a wide range of bit rates, we
encoded each image at 1.0 bpp with its optimal quantization step size; then, to reach a
desired bit rate, we simply truncated the stream at the point where that bit rate was
reached. The PSNR results for 0.125, 0.25, 0.5 and 1.0 bpp can be found in Table 4.1.
The table also contains the results for the image Barbara with different horizontal and
vertical non-uniform partitions, which we will discuss later in this chapter.
Meanwhile, Figure 4-27(a)-(e) illustrates the coding performance of these filter banks
for the different images graphically.
From the plots, we can see that the non-uniform filter banks generally outperform
the uniform 16-channel LPPUFB in most cases, except for the image Barbara.
Among the non-uniform filter banks we investigated, the filter bank with
bandwidths (1,3,5,7)π/16 performs the best: it achieves the best PSNR results at almost
all bit rates, and its improvement over the uniform bank can be as high as 0.9 dB, as
found for the image Airplane.
We observed that the filter banks perform consistently over the different images,
and we can order their coding performance as:
(1,3,5,7)π/16 > (1,1,1,1,5,7)π/16 > (1,1,1,1,1,1,1,1,1,7)π/16 > 16-channel uniform
With 7 of the uniform channels combined into one rational sampling channel, we improve
the performance; we improve it further with the filter bank (1,1,1,1,5,7)π/16, and
finally we have the best filter bank, (1,3,5,7)π/16. We see that larger subbands at
relatively high frequencies benefit image coding. In images, energy is always
concentrated at low frequencies; the high frequency contents are less dominant, and
upon quantization many of the high frequency coefficients are quantized to zero. These
zeros usually form large areas, since large areas of zero coefficients represent the
flat or blurred regions of the image. Therefore, having larger subbands at high
frequencies helps form large zero areas, which favors the run length and arithmetic
coding of these zeros. For run length coding, several zeros can be coded efficiently as
a group, since they are very likely to appear together. For arithmetic coding with
context models, the implication of zeros from insignificant neighbors means fewer bits
are required. Besides, high frequency subbands mainly contain the outlines and
irregular textures of objects in the image, and these kinds of signal are usually quite
localized in the spatial domain. Thus, the larger the bandwidth, the more precise the
representation in the spatial domain, with fewer significant coefficients. We can
exploit this by 'predicting' the significance of a coefficient from its significant neighbors
since outlines are always connected in nature. For a more intuitive understanding, we
can examine the decomposed subband images in Figure 4-25 and Figure 4-26. For the
decomposition with non-uniform and larger subbands, the shape of Lena is easily seen;
in other words, the significant coefficients are concentrated and connected. In
contrast, for the subband images decomposed by the uniform filter bank, the significant
coefficients representing the shape of Lena are spread over most of the subbands. We
see many small Lena images and thus more significant coefficients.
We observed that the best filter bank has bandwidths (1,3,5,7)π/16, i.e. its
bandwidths increase from low frequency to high frequency. This fits the decaying
spectrum property of most images. To verify this, we may compare its performance with
a similar 4-channel filter bank whose last three channels
Figure 4-25 Subbands of the image Lena obtained after decomposition by the 4-channel non-uniform filter bank (π/16, 3π/16, 5π/16,
7π/16)
have equal bandwidths, i.e. the filter bank with bandwidths (π/16, 5π/16, 5π/16,
5π/16). From Figure 4-27(f), we see that this 4-channel non-uniform filter bank is
inferior to the (1,3,5,7)π/16 filter bank. With the decaying spectrum property, the
higher the frequency, the smaller the variation of the energy. Therefore, combining
more channels into a wider one at high frequency causes only a small loss in energy
compaction but gives a much higher entropy gain.
There is also another filter bank with the same bandwidths (1,5,5,5)π/16 but
optimized for overall stopband attenuation by altering both the base bank and the
synthesis banks. Its performance is even worse than the previous one, because a higher
overall stopband attenuation does not guarantee the best coding result. Indeed,
allowing optimization of the uniform base for a smaller overall stopband error destroys
its high coding gain.
Figure 4-26 Subbands of the image Lena obtained after the decomposition by a 16-channel uniform filter bank
                                                         PSNR (dB) at bit rate (bpp)
Filter bank                                   Image      0.125     0.25      0.5       1.0
16-channel LPPUFB                             Lena       30.1420   33.2126   36.4530   39.7104
                                              Barbara    26.2664   29.2165   32.8824   37.6777
                                              Goldhill   28.0476   30.2388   32.8717   36.3483
                                              Baboon     22.9430   24.3481   26.5659   30.1622
                                              Airplane   28.5113   31.9842   35.7285   40.1436
NUFB (1,3,5,7)π/16                            Lena       30.3032   33.6093   36.9375   40.2971
                                              Barbara    25.4638   28.4150   32.6808   37.9097
                                              Goldhill   28.3059   30.4461   33.1392   36.7305
                                              Baboon     23.1276   24.4758   26.8839   30.7011
                                              Airplane   29.1193   32.5229   36.4834   41.0526
NUFB (1,1,1,1,5,7)π/16                        Lena       30.4151   33.5935   36.8346   40.1878
                                              Barbara    25.8020   28.5844   32.6387   37.8062
                                              Goldhill   28.2578   30.3836   33.0102   36.6103
                                              Baboon     23.1553   24.4383   26.7921   30.5779
                                              Airplane   28.9182   32.1912   36.1962   40.7981
NUFB (1,1,1,1,1,1,1,1,1,7)π/16                Lena       30.2659   33.2908   36.6305   39.9146
                                              Barbara    26.0305   28.9936   32.8727   37.7500
                                              Goldhill   28.2398   30.3075   32.9070   36.4032
                                              Baboon     23.0611   24.4017   26.6660   30.3457
                                              Airplane   28.8474   32.0874   35.8219   40.3351
NUFB (1,5,5,5)π/16                            Lena       29.7453   33.1299   36.7196   40.1831
NUFB (1,5,5,5)π/16 (base and synthesis
banks optimized)                              Lena       29.5969   32.9706   36.5334   40.0654
H: (1,1,3,3,5,1,1,1)π/16, V: (1,1,3,3,1,7)π/16  Barbara  26.1195   29.4099   33.5413   38.3710
H: (1,1,3,3,1,7)π/16, V: (1,1,3,3,5,1,1,1)π/16  Barbara  26.0782   29.2324   33.2516   38.1143
Table 4.1 PSNR results for image coding using different NUFBs and the LPPUFB on different test images
[Plots of PSNR (dB) versus bit rate (bpp): (a) Lena, (b) Barbara, (c) Goldhill, (d) Baboon and (e) Airplane, each comparing the
16-channel LPPUFB with the NUFBs (1,3,5,7)π/16, (1,1,1,1,5,7)π/16 and (1,1,1,1,1,1,1,1,1,7)π/16; (f) Lena, comparing the
16-channel LPPUFB, the NUFB (1,3,5,7)π/16 and the two (1,5,5,5)π/16 NUFBs]
Figure 4-27 (a)-(f) Plots of PSNR vs bit rate for different images using four different filter banks
Figure 4-28 Reconstructed Lena images (512×512): (a) & (b) at bit rate 0.25 bpp, (c) & (d) at bit rate 0.5 bpp; (a) & (c) used the
16-channel LPPUFB (PSNR = 33.21 dB and 36.45 dB), (b) & (d) used the NUFB (π/16, 3π/16, 5π/16, 7π/16) (PSNR = 33.61 dB and 36.94 dB)
For the image Barbara, the result is quite different from the other images. The PSNR
results for the non-uniform filter banks are better than those for the 16-channel
uniform bank only at the bit rate of 1.0 bpp. This is probably because of the spectral
characteristics of the image Barbara: its power spectrum does not satisfy the decaying
spectrum hypothesis well. Moreover, the horizontal and vertical spectra, shown in
Figure 4-30, are not alike. Therefore, decomposing the image Barbara in the (1,3,5,7)
manner in both the horizontal and vertical orientations cannot provide the best
solution for this image.
How, then, can we find a non-uniform filter bank adapted to the image's spectra?
To answer this, consider the horizontal and vertical spectra in Figure 4-30. For an
ideal subband coder, combining subbands into one covering the same frequency range can
only worsen the energy compaction capability. However, we can give up a little energy
compaction gain and obtain a better entropy gain by forming larger subbands. To ensure
only a limited energy compaction loss, we should combine only those subbands that have
little or no power variation between them. Hence, for Barbara, we designed the
partitions of the non-uniform filter banks shown as the vertical dotted lines in
Figure 4-30. After the image was decomposed into subbands with different horizontal and
vertical filter banks, we achieved a better entropy coding result, as explained before.
Numerical results can be found in Table 4.1, while Figure 4-29 shows the improvement
graphically compared with the 16-channel LPPUFB. There are also results for
decomposition with the horizontal and vertical filter banks exchanged. Its performance
is worse than the normal arrangement but still better than the uniform one; this is
because the horizontal and vertical filter banks differ only in the upper π/2 of the
band. Here, we see that decomposition with suitably different horizontal and vertical
filter banks can sometimes provide better coding performance.
From the above observations, we propose an algorithm for partitioning non-uniform
filter banks based on the energy variation of the subbands. The basic idea is to merge
subbands with similar energy levels: the algorithm searches from low frequency to high
frequency until the energy level of a subband differs from the running average by more
than Δ. Here, Δ is determined by the DC energy S1 of the image and an adjustable
parameter α. The algorithm continues searching until all subbands have been covered.
Before running the algorithm, we have to compute the energy of every subband of the
image, which can be done with the Fast Fourier Transform. As mentioned before, we may
use different vertical and horizontal partitions; therefore the algorithm can be
carried out for an image both vertically and horizontally. In the algorithm below, Si
represents the energy of subband i, while k and M are the starting position of the
synthesis bank and the number of channels of the uniform base respectively. Note that
the algorithm keeps the DC subband alone, as its energy is always much higher than the
others.
Algorithm for partitioning non-uniform filter banks

output 1, 1
Δ = αS1
Stt = S2
k = 2
i = 3
while i <= M+1
    Sd = |Stt/(i-k) - Si|
    if (Sd >= Δ or i = M+1)
        if (i-k is odd)
            output k, i-1
            k = i
            i = i+1
            Stt = Sk
        else
            output k, i-2
            k = i-1
            Stt = Sk
    else
        Stt = Stt + Si
        i = i+1
end while
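The algorithm above can be rendered directly in Python (a sketch; the energies are passed as a 1-indexed list S[1..M], and each output band is returned as a (start, end) channel pair):

```python
def partition_subbands(S, alpha):
    """S[1..M]: subband energies (S[0] unused); returns (start, end) pairs."""
    M = len(S) - 1
    bands = [(1, 1)]                 # the DC subband is always kept alone
    delta = alpha * S[1]
    k, i = 2, 3
    s_tot = S[2]                     # running energy sum of the current group
    while i <= M + 1:
        at_end = (i == M + 1)
        s_d = 0.0 if at_end else abs(s_tot / (i - k) - S[i])
        if s_d >= delta or at_end:
            if (i - k) % 2 == 1:     # group size odd: emit channels k .. i-1
                bands.append((k, i - 1))
                k = i
                i += 1
                if k <= M:
                    s_tot = S[k]
            else:                    # keep the group size odd: emit k .. i-2
                bands.append((k, i - 2))
                k = i - 1
                s_tot = S[k]
        else:
            s_tot += S[i]
            i += 1
    return bands
```

With energies resembling a decaying spectrum, e.g. S2..S9 = (10, 10, 10, 1, 1, 1, 1) after a DC energy of 100 and α = 0.05, the sketch returns the partition (1,1), (2,4), (5,7), (8,8), i.e. groups of 1, 3, 3 and 1 channels, all odd as required by the odd-channel synthesis banks.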
Figure 4-29 Plot of PSNR vs Bit rates using different decomposition for the image Barbara
[Plots of ln Sxx(e^jω) versus frequency: horizontal and vertical power spectra of the image Barbara]
Figure 4-30 Piecewise constant horizontal and vertical power spectra of the image Barbara. Dotted lines show the best partitions
found for the non-uniform filter banks.
[Plot of PSNR (dB) versus bit rate (bpp) for Barbara, comparing the 16-channel LPPUFB with H: (1,1,3,3,5,1,1,1)π/16,
V: (1,1,3,3,1,7)π/16 and with H: (1,1,3,3,1,7)π/16, V: (1,1,3,3,5,1,1,1)π/16]
Figure 4-31 Subbands obtained by different horizontal and vertical non-uniform filter banks with partitions as shown above.
4.6 Conclusion
In this chapter, the constraints on designing non-uniform filter banks with rational
sampling factors have been relaxed and formulated. Based on a 16-channel LPPUFB, we
have designed various non-uniform filter banks with different bandwidth distributions.
We have shown that non-uniform filter banks outperform traditional uniform filter banks
consistently. For images whose spectra deviate from the decaying spectrum property or
differ between the horizontal and vertical directions, non-uniform filter banks with
suitable partitions can be found to offer better coding results. The method we used to
design such filter banks is sub-optimal in terms of implementation complexity and
coding performance; nevertheless, the potential of non-uniform filter banks for image
coding has been demonstrated.
Chapter 5 Space and Frequency Context
Modeling for Block-Transform-Based
Image Coding
In this chapter, a new image compression algorithm for block-transform-based image
coding is proposed. The algorithm is based on context modeling of coefficients, and
both the intra-band and inter-band dependences are exploited. Coefficients are coded
in bit-planes to yield an embedded stream. Instead of coding each subband
independently, our algorithm treats the transformed image as a whole. It has been found
to have performance comparable to wavelet-based algorithms despite its simplicity of
implementation.
5.1 Introduction
The recent success of wavelet image coding is largely due to the application of
context modeling and entropy coding to the quantized coefficients. Being capable of
coding groups of zeros efficiently and of making good "predictions" of significant
coefficients from the neighboring context, wavelet image coding has been brought to
maturity. Besides, coding the coefficients from the most significant bit-plane to the
least significant bit-plane gives a progressive and embedded data stream. Hence, it
provides satisfactory rate-distortion performance and allows precise bit rate control
by truncation at any point of the stream. Therefore, it is worth seeing whether this
kind of technique can be applied to image coding using uniform decomposition. Actually,
there have been efforts to adapt image coding algorithms originally designed for the
wavelet transform to block transform image coding [34][40]. However, these methods
require each block of transform coefficients to be treated analogously to a full
wavelet tree. This requires either rearrangement or index mapping of the block
transform coefficients and hence increases implementation complexity. Moreover, these
methods perform further decomposition of the DC coefficients and are thus more complex.
On the other hand, by considering 8 frequency neighbors and 8 space neighbors and
coding the coefficients in bit-planes in three passes, our proposed algorithm is
competitive with state-of-the-art algorithms like SPIHT.
5.2 Intra-band and Inter-band Dependence
It is known that context dependence can be exploited across subbands or within
subbands. The well-known zerotree coders EZW (Embedded Zerotree Wavelet) [17] and
SPIHT (Set Partitioning in Hierarchical Trees) [5] are two examples that exploit the
interband dependence. The idea is that the insignificance of a coefficient in a coarser
scale subband will most probably imply the insignificance of the coefficients at the
same location in the finer scale subbands. In [11], the adopted algorithm further
improves the coding performance by exploiting intraband dependence, i.e. the
probability of a coefficient being significant depends on the significance of its
adjacent coefficients within the same subband. The parent-child relationship in
zerotree coders can indeed be considered a frequency neighborhood, since a parent
coefficient and its child coefficients are adjacent horizontally, vertically and
diagonally in terms of frequency.
5.2.1 Space and frequency neighborhood
For block-transform-based coding, each coefficient in a transformed block represents certain horizontal and vertical frequency content at that rough location. Adjacent coefficients within the same transformed block have increasing (or decreasing) frequency in the horizontal, vertical or diagonal direction. We call these coefficients frequency neighbors of each other. Coefficients having the same relative position in adjacent blocks we call space neighbors, as they can be regarded as lying in the same subband but at adjacent locations. Therefore, for images decomposed by M-channel filter banks or transforms with M bases, the frequency neighbors are the adjacent coefficients except those outside the transformed block, while space neighbors are coefficients that are M-1 coefficients apart. Figure 5-32 depicts the relationship between a coefficient and all its space and frequency neighbors.
5.2.2 Statistical context modeling
In our proposed algorithm, we code the transformed coefficients in bits. Therefore we have a bit sequence, and the minimum code length for the sequence is given by [37]:

    L = -Σ_i log2 P(x_i | x_{i-1}, x_{i-2}, …, x_1)    (5.)

where x_i is the current bit to be coded and x_{i-1}, x_{i-2}, …, x_1 represent the bits coded in the past. Given the quantized coefficients, this achieves the minimum lossless rate.
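To make the minimum-rate expression concrete, the following sketch (our illustration, not the thesis implementation) computes the ideal code length of a bit sequence under a simple adaptive binary model with add-one counts, the quantity a binary arithmetic coder can approach:

```python
# Sketch (assumption): ideal code length -sum log2 P(x_i | past) under an
# adaptive binary model with Laplace (add-one) smoothed counts.
import math

def ideal_code_length(bits):
    n0 = n1 = 1          # smoothed counts of past zeros and ones
    length = 0.0
    for b in bits:
        p = (n1 if b else n0) / (n0 + n1)   # P(b | bits coded so far)
        length += -math.log2(p)
        if b:
            n1 += 1
        else:
            n0 += 1
    return length
```

A highly predictable sequence costs far fewer bits than its length, while a balanced, unpredictable one costs about one bit per symbol.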
[Figure layout: the centre coefficient C is surrounded by horizontal (H), vertical (V) and diagonal (D) neighbors; the context scores shown in brackets are Hf = 4, Vf = 4, Df = 2 for frequency neighbors and Hs = 3, Vs = 3, Ds = 1 for space neighbors.]

Figure 5-32 Space and frequency neighbors in three different directions inside a block-transformed image using an M-channel FB or a transform with M bases.
The notations H, V and D represent the horizontal, vertical and diagonal neighbors of the coefficient C respectively, while the subscripts s and f indicate space and frequency neighbors of the three types. The numbers inside brackets are the context scores of each type of neighbor.
Practically, we can use a binary arithmetic coder to approach this minimum rate. The main issue is to make a good choice of a subset of {x_{i-1}, x_{i-2}, …, x_1} such that we still have a reasonable approximation of the conditional probability. In our discussion, let us denote such a subset of {x_{i-1}, x_{i-2}, …, x_1} by S. The probability of the current bit is determined by the set of past observations S; therefore S serves as the modeling context. Given the observation S, we may form an estimate of the conditional probability that serves as the source to the statistical model. For image coding, the past observations can consist of any causal adjacent bits in both the space and frequency domains. What we have chosen are the 8 frequency neighbors and 8 space neighbors surrounding the current coefficient bit C; for a graphical representation, refer to Figure 5-32. Here H*, V* and D* represent the existence of corresponding significant neighbors. As an example, consider a horizontal neighbor at a particular bit-plane with threshold T_b: it is marked significant when its magnitude reaches the threshold, e.g.

    Hf0 = 1 if |c(Hf0)| ≥ T_b, and Hf0 = 0 otherwise,

where c(Hf0) denotes the coefficient at that neighbor position. Having the set of past observations, we have the statistical model

    P(x_i | Hf0, Hf1, Vf0, Vf1, Df0, …, Df3, Hs0, Hs1, Vs0, Vs1, Ds0, …, Ds3).    (5.)
Though this is quite precise for probability estimation, it is a rather high order model and thus causes two problems: (1) it is both memory and time consuming in real applications, and (2) we most likely do not have sufficient past observations to produce a robust estimate. Therefore, instead of looking at the individual significance states of the neighbors, we sum up each type of them. According to their space or frequency nature and their direction (i.e. horizontal, vertical or diagonal), we have 6 types of neighbors. Hence, the model becomes

    P(x_i | N_Hf, N_Vf, N_Df, N_Hs, N_Vs, N_Ds),    (5.)

where N_t is the number of significant neighbors of type t.
We still hope to avoid such high order modeling. It would be convenient if the probability were linearly dependent on the significance of the different types of neighbors; fortunately, good coding performance results from this assumption. What remains is to find the coefficients (i.e. the weights) for the corresponding neighbor types. Although this could be done through linear regression, choosing the weights empirically already achieves quite good results. Hence, we now have a much simpler estimate of the probability:

    P(x_i | w_Hf N_Hf + w_Vf N_Vf + w_Df N_Df + w_Hs N_Hs + w_Vs N_Vs + w_Ds N_Ds),    (5.)

where the weights w_t play the role of the context scores discussed in the next subsection. In order to fit the binary arithmetic coder, we have to map the above estimate into a finite number of context groups that represent the real probabilistic counts during coding.
5.2.3 Context quantization
With 8 frequency neighbors and 8 space neighbors, we have up to 65536 possible combinations of context information. Consequently, we have to reduce this to a manageable number of contexts. This process is called context quantization. Our context quantization rule is based on the discussion in the previous subsection. It maps the possible combinations to one of 9 context labels as follows: each type of neighbor that has been found significant is given a context score. The score can be thought of as reflecting the importance of that particular type of neighbor to the significance of the coefficient. Experimentally, we found that the context scores shown in brackets in Figure 5-32 give the best coding performance. Neighbors that are still insignificant have a score of 0. The overall context score of a coefficient is the sum of the context scores of all its significant neighbors found so far. The context label is then found from the context score using the lookup table in Table 5.2.

Context Score   Context Label
0               0
1               1
2               2
3-4             3
5-6             4
7-9             5
10-12           6
13-16           7
>16             8

Table 5.2 Lookup table for context labels

These nine context labels correspond to nine probability models for arithmetic coding. A coefficient has context 0 only if it does not have any significant frequency or space neighbor. The context quantization involves only integer arithmetic and a small lookup table, and is therefore quite efficient. Unlike coefficient modeling algorithms for the wavelet transform, our algorithm does not need different context quantization rules for different subbands, e.g. for subbands with different orientations.
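The scoring and lookup just described can be sketched as follows (an illustration under our reading of Figure 5-32 and Table 5.2; the function names are ours):

```python
# Sketch (assumption): summing per-type context scores of significant
# neighbors and quantizing the total to one of 9 context labels.

# Context scores from Figure 5-32: frequency neighbors Hf/Vf score 4 and
# Df scores 2; space neighbors Hs/Vs score 3 and Ds scores 1.
SCORES = {"Hf": 4, "Vf": 4, "Df": 2, "Hs": 3, "Vs": 3, "Ds": 1}

def context_label(significant_neighbors):
    """significant_neighbors: list of neighbor types found significant."""
    score = sum(SCORES[t] for t in significant_neighbors)
    # Table 5.2: upper bound of the score range for labels 0..7.
    for label, upper in enumerate([0, 1, 2, 4, 6, 9, 12, 16]):
        if score <= upper:
            return label
    return 8  # score > 16
```

For example, a coefficient whose horizontal and vertical frequency neighbors are both significant has score 4 + 4 = 8 and thus label 5, matching the 7-9 row of Table 5.2.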
5.2.4 Another context quantization method
Here we suggest an alternative method for context quantization whose coding performance is as good as the previous one but requires more context labels. This method only requires a bit to be set when updating the significance information of the neighbors, whenever a significant coefficient is found. Instead of counting the number of significant neighbors of each kind, it simply sets the significance state (bit) for that type of neighbor to 1 whenever any one of them is found to be significant. We have 6 different types of neighbors, so the context label consists of 6 bits. Therefore, there are a total of 64 different contexts. Again, context label 0 represents the context of a coefficient with no significant neighbor. An example of finding the context label from the neighboring significance states is shown in Figure 5-34. Here, we give up summing the total number of each type of significant neighbor; however, we make no assumption about the importance of the contribution of each type. This makes the method more adaptive to images of different natures, where different neighboring dependences may be exhibited. This alternative context quantization method provides results similar to those of the method in subsection 5.2.3; however, all the coding results in this chapter are still based on the method of subsection 5.2.3.

[Figure 5-33 Image after transformation with a 16×32 LPPUFB]
Type of significant neighbor:  Hf  Vf  Df  Hs  Vs  Ds
Bit values:                    1   0   0   0   1   0   →  context label 34

Figure 5-34 Example of finding a context label
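The 6-bit label can be formed by simple bit setting; the sketch below is our illustration, where the bit ordering (Hf as the most significant bit down to Ds) is an assumption chosen to reproduce the example of Figure 5-34:

```python
# Sketch (assumed bit ordering): the alternative context label packs one
# significance bit per neighbor type, Hf in the most significant position.

TYPES = ["Hf", "Vf", "Df", "Hs", "Vs", "Ds"]

def bitmask_label(significant_types):
    """6-bit context label from the set of neighbor types found significant."""
    label = 0
    for i, t in enumerate(TYPES):
        if t in significant_types:
            label |= 1 << (len(TYPES) - 1 - i)
    return label
```

With significant Hf and Vs neighbors this gives the binary pattern 100010, i.e. label 34 as in Figure 5-34, out of 64 possible contexts.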
5.3 Coding Passes
In our proposed algorithm, the probability for a coefficient bit to become significant is determined by its context label, which in turn depends on the neighboring significances of the coefficient. The conditional probabilities then drive the adaptive arithmetic coder. When coding a coefficient bit, the bit value and its context label are passed to the arithmetic coder, which outputs the bit stream. The arithmetic coder benefits from the uneven probability distributions of the different contexts and thus gives an entropy gain. Besides the context labels for the significance of coefficients, we have several context labels for the sign bits, the refinement bits and those used in the cleanup pass; they will be explained later in this section. The arithmetic coder starts with equal probabilities for the different contexts and is reset before coding each bit-plane for better adaptation. For the working principle of arithmetic coding, refer to Section 2.5.

During the coding process, coefficients are coded in bit-planes, and each bit-plane consists of three coding passes: the significance pass, the refinement pass and the cleanup pass. Each coefficient bit is coded in one and only one of the three coding passes. Since all coefficients initially have context 0, the coder starts with the cleanup pass of the highest bit-plane of the transformed image. Each bit-plane is scanned in a particular order. Starting at the top left of the transformed image, the first four bits of the first column are scanned, then the first four bits of the second column, and so on until the width of the image has been covered. After that, the second four bits of the first column are scanned, and so on. The scan continues until all bits of the image have been covered. Figure 5-35 illustrates the coding process, and each coding pass is described as follows.
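The scan order described above can be sketched as follows (our illustration; the (row, column) convention is an assumption):

```python
# Sketch (assumption): the bit-plane scan visits horizontal stripes of 4
# rows, column by column within each stripe, top-to-bottom within each
# column of 4.

def scan_order(height, width):
    order = []
    for stripe in range(0, height, 4):
        for col in range(width):
            for row in range(stripe, min(stripe + 4, height)):
                order.append((row, col))
    return order
```

For an 8×2 bit-plane this yields the first column of four, then the second column of four, before moving down to the next stripe of rows.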
5.3.1 Significance pass
In this pass, coefficients that are more likely to become significant in the current bit-plane are coded based on their significance context, i.e. on the existence of significant neighbors. This pass comes before the other passes because newly found significant coefficients contribute most to distortion reduction. Only coefficients with non-zero context are coded here. For sign coding, a single independent context label, 9, is reserved, since the sign appears to be independent of the context for transform coding.
5.3.2 Refinement pass
Significant coefficients that have already been discovered are refined in this pass. Since one more magnitude bit is coded, the uncertainty interval of a coefficient is halved and the coefficient is thus refined. We have two context labels for coding the refinement bits: if it is the first magnitude refinement bit of the coefficient, context label 10 is used; otherwise, context label 11 is used.
5.3.3 Cleanup pass
The cleanup pass includes all remaining bits of coefficients that were insignificant and were not coded in the significance pass. The coefficient bits are considered in vertical groups of four. If all four bits of a group need to be coded in the cleanup pass, an extra bit is used to indicate whether all of them are insignificant. Otherwise, they are coded in the same way as in the significance pass. If at least one of the four bits is significant, a two-bit run-length code is generated to indicate the first significant coefficient of the four, followed by sign coding of that coefficient. The other significant coefficients of the four are coded in the same way as in the significance pass. Context label 12 is assigned for coding the bit representing the insignificance of all 4 coefficient bits, while context label 13 is assigned for the two-bit run-length code. The cleanup pass codes the coefficients that are less likely to be significant and thus helps to speed up the decoding process, since the decoder may skip all 4 coefficients when a single decoded bit indicates that all 4 bits are insignificant.

[Figure 5-35 Illustration of the coding process: for each bit-plane, from the most significant bit-plane of the transformed image downwards, the significance pass, refinement pass and cleanup pass feed the binary arithmetic coder.]
5.3.4 Illustration of coding passes
In this subsection, we demonstrate the coding passes using the example transformed image shown in Figure 5-36, where we suppose the image was transformed by a transform with 4 bases. We illustrate the coding procedure using the middle block of coefficients. The shaded squares denote significant coefficients that have been found, with different grayscales representing significance found at different bit-planes. Figure 5-37 shows the bit output (at the top) and the context label (at the bottom) passed to the arithmetic coder for the bit-planes of T=8 and T=4.

During the bit-plane of T=8, the context score of the coefficient 13 is 7 and thus its context label is 5. It is found to be significant and its sign '+' is coded too. For the coefficient -3 below 13, the context score changes from 0 to 3 as the coefficient 13 is newly found to be significant. The other coefficients with non-zero context labels (i.e. -7, 1, 0) are coded in the significance pass, while the rest are ignored in this pass. During the refinement pass, since the coefficient 21 was discovered to be significant in the last bit-plane (T=16), its magnitude residue is refined here; as it is the first refinement bit, context label 10 is given. In the cleanup pass, there are two vertical groups of four, i.e. (0, -3, 1, 0) and (0, 0, 4, 0), that can be coded as groups. Both groups are insignificant in this bit-plane.
21  -7   0   0
13   1  -3   0
-3   0   1   4
 0  -1   0   0

Legend: shading indicates coefficients found significant after the bit-planes of T=16, T=8 and T=4 respectively.

Figure 5-36 Example transformed image with 3×3 blocks
T   Pass             Output (top) and context labels (bottom) for the coefficients
                     21, 13, -3, 0, -7, 1, 0, -1, 0, -3, 1, 0, 0, 0, 4, 0

8   Significance     1,+   0    0   0   0
    Context label    7,9   3    5   5   2
    Refinement       0
    Context label    10
    Cleanup          0   0   (group: 0)    (group: 0)
    Context label    0   0   (group: 12)   (group: 12)

4   Significance     0   1,-   0   0   0   0
    Context label    3   8,9   6   2   3   2
    Refinement       1    1
    Context label    11   10
    Cleanup          0   0   0   0   (1, 10, -, 0)
    Context label    0   0   0   0   (12, 13, 9, 3)

Figure 5-37 Example of coding passes for the image in Figure 5-36
For the significance and refinement passes of the bit-plane with T=4, the output bits and context labels are found similarly. In the cleanup pass, the group of four (0, -3, 1, 0) is no longer coded together, as (0, -3) have already been coded in the significance pass; therefore, (1, 0) are coded individually. For the group (0, 0, 4, 0), the first '1' indicates that at least one of them is significant. Then '10' indicates that the first significant coefficient is the third one. This is followed by its sign ('-'). Lastly, the last 0 is coded with context label 3.
5.3.5 Embedded bit stream truncation
Nowadays, embedded image coding is essential for various applications such as progressive image transmission, internet browsing, and scalable image and video databases. With embedded coding, the bit stream can be truncated at any point and the image can still be reconstructed with reasonable quality. As the transmission goes on, the quality of the reconstructed image improves, possibly up to lossless quality. In [17][5], coefficients are coded in bit-planes, and subbands of coarser scale are coded before those of finer scale, since low frequency subbands are assumed to contribute larger distortion decreases than high frequency subbands. As mentioned, our algorithm does not handle subbands separately, so we cannot guarantee that low frequency subbands are coded before those of higher frequencies. However, during the significance pass, only coefficient bits with non-zero contexts are scanned. Usually, low frequency subbands are more likely to have significant neighbors, and thus non-zero contexts, than higher frequency ones. Hence, coefficients of lower frequencies are usually coded in the significance pass (and thus before those of higher frequencies) during the early stage of coding. Therefore, our coder performs reasonably even at relatively low bit rates when the stream is truncated at some point. It can be seen later in this chapter that the PSNR curves are quite "convex"; that is, good rate-distortion performance is obtained even though only casual truncation of the bit stream is applied for bit rate control.
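The rate control mechanism amounts to a byte-budget cut of the embedded stream; a minimal sketch (our illustration, with the budget computation an assumption) is:

```python
# Sketch (assumption): casual truncation of an embedded bit stream to hit
# a target bit rate, measured in bits per pixel (bpp).

def truncate(bitstream, width, height, target_bpp):
    budget = int(width * height * target_bpp / 8)   # allowed bytes
    return bitstream[:budget]
```

Because the stream is embedded, the decoder simply stops when the truncated stream runs out, and the image reconstructed from the prefix remains usable.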
5.4 Coding Examples and Results
In our experiments, an 8×8 DCT, a 16×16 DCT and a 16×32 LPPUFB [28] were used for testing. The LPPUFB was our own implementation based on [28] and was optimized for coding gain (9.81 dB); for the details of the design and optimization of that filter bank, refer to Chapter 2. First, the images were block transformed by the above transforms. They were then quantized uniformly using a single quantization step before being passed to our proposed context modeling algorithm. To obtain the PSNR at various rates for the same image, we first compressed the image to a bit rate of 1.0 bpp; then we simply truncated the bit stream when the target bit rate was reached during decoding.
Table 5.3 shows the coding performance of our algorithm in terms of PSNR. The images tested were the 512×512 gray scale images Lena, Barbara, Goldhill, Baboon and Airplane. The results of the well-known wavelet coder SPIHT (with a 6-level wavelet decomposition) are included too. To isolate the contribution of the entropy coding part of our algorithm, we have to compare it with other image coders that also employ block transform techniques. Hence, in the last two columns of the table, we also include the results of the image coder of [34] using an 8×8 DCT plus a 3-level wavelet transform (9/7-tap filters) of the dc coefficients, and a 16×32 GLBT (Generalized Lapped Biorthogonal Transform) plus a 2-level wavelet transform (9/7-tap filters) of the dc coefficients, respectively; that image coder used SPIHT for quantization and entropy coding [34]. Figure 5-38 shows plots of PSNR versus bit rate for these image coding algorithms.
From the plots, one should note that our proposed algorithm with the 16×32 LPPUFB outperforms SPIHT consistently for all three images. Generally, our algorithm is better than SPIHT for Lena and Goldhill, with improvements under 0.4 dB. However, the improvement can be as high as 3.0 dB in the case of Barbara. This is because Barbara contains many regular textures that are high frequency dominant. It is known that uniform filter banks and lapped transforms can provide better high frequency resolution, and thus energy compaction, than the wavelet transform. Besides, our algorithm is able to obtain further entropy gain from the space contexts, since those high frequency components spread over the spatial domain.
From Table 5.3, it can be seen that our algorithm using the 8×8 DCT is comparable to the SPIHT-based coder of [34] (second last column of the table) except at relatively low bit rates. Also, our algorithm using the 16×32 LPPUFB is competitive with the coder based on the GLBT (with coding gain 9.96 dB) together with the SPIHT algorithm [34]. For the image Barbara, our algorithm even outperforms the GLBT with SPIHT consistently by about 0.5 dB. The GLBT, whose coding gain (9.96 dB) is higher than that of the LPPUFB (9.81 dB) used in our tests, is otherwise similar to the LPPUFB, as both can be regarded as block transform techniques. This confirms that the improvement brought by our algorithm for a highly-textured image like Barbara is due not only to energy compaction capability but also to our context modeling algorithm. On the other hand, one has to realize that the method in [34] requires three (respectively two) more levels of decomposition of the dc coefficients using the popular biorthogonal 9/7-tap wavelet pair for the 8×8 DCT (respectively the 16×32 GLBT). Furthermore, the SPIHT algorithm is tailor-made not for block transforms but for wavelet decompositions; to fit SPIHT, one has to rearrange the coefficients in a wavelet-transform-like manner, which unavoidably increases implementation complexity. In contrast, our algorithm requires only a significance map to store the neighboring significance states. The significance state of a coefficient is updated whenever one of its neighbor coefficients is found to be significant. Hence, we have the flexibility to speed up the coding process by giving up some of the space/frequency neighbors, with a trade-off of decreased coding performance.
With the results for the 16×16 DCT also included, we observe that coding performance improves significantly over the 8×8 DCT. This is partly due to the better energy compaction capability of the 16×16 DCT. The increased number of transform bases also provides better frequency context modeling, as more precise frequency information is available.
In principle, our algorithm works as follows. The dc coefficient of each block is discovered first, and the other frequency components near dc appear gradually. For a smooth region, only a few coefficients per block are found significant, while most of the other coefficients, having no significant neighbors (space or frequency), are coded efficiently and compactly in the cleanup pass. For a complex region, the signal spreads in both the space and frequency domains, and both space and frequency neighborhoods are useful. Highly-textured images, i.e. those dominated by certain frequency ranges, are handled by our space context modeling, since the textures usually cover a large area of the image. In general, the more significant neighbors a coefficient has, the higher the chance that it becomes significant. Moreover, the frequency neighborhood is more important than the space neighborhood for the significance of a coefficient, and diagonal neighbors, being farther away than horizontal and vertical neighbors, are less related to the significance of a coefficient. This is reflected in the context scoring.
In our experience, scanning subbands of lower frequencies before the others improves PSNR slightly, especially at relatively low bit rates, because the lower frequency subbands are usually more dominant. When the truncation point falls within a certain pass of a bit-plane, this gives slightly better progressive performance, as more significant coefficients are found before truncation. However, we still insist on maintaining implementation simplicity at the cost of a small coding performance trade-off.
Image     Rate (bpp)  SPIHT   16×32 LPPUFB  16×16 DCT  8×8 DCT  8×8 DCT + SPIHT [34]  16×32 GLBT + SPIHT [34]
Lena      1.0         40.41   40.52         40.26      39.96    39.91                 40.43
          0.5         37.21   37.34         36.92      36.39    36.38                 37.33
          0.25        34.11   34.29         33.67      32.51    32.90                 34.27
          0.125       31.10   31.09         30.36      28.49    29.67                 31.18
Barbara   1.0         36.41   38.97         37.78      36.84    36.31                 38.43
          0.5         31.40   34.53         32.97      31.52    31.11                 33.94
          0.25        27.58   30.70         29.17      27.44    27.28                 30.18
          0.125       24.86   27.50         26.13      24.37    24.58                 27.13
Goldhill  1.0         36.55   36.94         36.63      36.34    36.25                 36.78
          0.5         33.13   33.35         33.02      32.65    32.76                 33.42
          0.25        30.56   30.65         30.27      29.73    30.07                 30.84
          0.125       28.48   28.55         28.12      27.29    27.93                 28.74
Baboon    1.0         -----   30.86         30.63      30.32    -----                 -----
          0.5         -----   27.07         26.81      26.46    -----                 -----
          0.25        -----   24.61         24.36      23.93    -----                 -----
          0.125       -----   23.17         22.94      22.50    -----                 -----
Airplane  1.0         -----   41.04         40.80      40.63    -----                 -----
          0.5         -----   36.73         36.29      35.77    -----                 -----
          0.25        -----   32.75         32.18      31.12    -----                 -----
          0.125       -----   29.18         28.58      27.06    -----                 -----

Table 5.3 Objective PSNR results in dB for various image coding methods
5.5 Conclusion
It has been shown that context modeling techniques [17][5][7][36][37] help to improve coding performance significantly. This chapter presented our context modeling method, which is dedicated to block-transform-based image coding algorithms. Our algorithm was shown to be competitive with wavelet-based algorithms, and it is simple not only in concept but also in implementation: one can view it simply as coding the bit-planes of a transformed image while considering the context of each bit. With only the inclusion of space and frequency neighbors and 14 context labels, this algorithm is comparable to state-of-the-art algorithms like SPIHT.
[Figure 5-38 Plots of PSNR (dB) versus bit rate (bpp) for the images Lena, Barbara and Goldhill; each plot compares the curves labelled LPPUFB, DCT(16x16), DCT(8x8) and SPIHT [11] (DCT(8x8)).]

Figure 5-38 Plots of PSNR vs bit rates for images: Lena, Barbara & Goldhill
Figure 5-39 Reconstructed images of Barbara (512×512) at bit rates of 0.5 bpp (PSNR = 34.53 dB) and 1.0 bpp (PSNR = 38.97 dB)
Chapter 6 Conclusion
Linear phase paraunitary filter banks play an important role in image coding and are a generalization of the DCT and the LOT. They also formed the foundation of our non-uniform filter bank design in Chapter 4. Their theory of design and implementation was reviewed in Chapter 2. In the last section of Chapter 2, we revisited the construction of non-uniform filter banks by cascading a perfect reconstruction transmultiplexer between the analysis and synthesis parts. The basic idea was to combine several ordinary channels into a channel with wider bandwidth. Such a construction method results in filter banks having some branches with rational sampling factors.
General issues in image compression were reviewed in Chapter 3, including subband image decomposition, image transforms, quantization and bit rate allocation, and entropy coding. The two well-known image coding algorithms EZW and SPIHT were also briefly discussed.
In Chapter 4, the constraints on designing non-uniform filter banks with rational sampling factors were discussed. It was found that the combined channel can take any position k, depending on the matching order of the channels from the two uniform filter banks. Being critical for image coding, the linear phase property was sought for the non-uniform channels. Several design examples of non-uniform filter banks with different bandwidth partitions were shown in Chapter 4. The filter banks were designed based on a 16-channel linear phase filter bank, and we used odd-channel filter banks to merge the desired channels. These non-uniform filter banks were then applied to image coding, and the PSNR results were compared with those of the traditional uniform one. It was found that the non-uniform filter banks generally give improvements over the traditional M-channel filter bank. Furthermore, we proposed using different non-uniform partitions for horizontal and vertical filtering: the spectrum of an image can first be analyzed and the non-uniform partition then determined accordingly. It was shown that different horizontal and vertical partitions that fit the frequency energy distribution achieve better coding performance.
It can be seen that an effective coefficient modeling algorithm often contributes a compression gain to an image coder. Motivated by this, we proposed in Chapter 5 a coefficient context modeling algorithm that is applicable to any transform-based image coder. The algorithm utilizes the intraband and interband correlations of coefficients through the space and frequency neighbors defined in that chapter. Context quantization rules were used to reduce the number of possible configurations of the neighbors' states, and conditional entropy coding of the coefficients was driven by an adaptive arithmetic coder. Coding experiments showed promising results in spite of the algorithm's simplicity of implementation, and it was found to be comparable to the state-of-the-art algorithm SPIHT.
References
[1] A. Bilgin, P. J. Sementilli, and M. W. Marcellin, “Progressive Image Coding
Using Trellis Coded Quantization”, IEEE Trans. Image Processing, vol. 8, no. 11,
pp.1638-1643, Nov. 1999.
[2] A. K. Soman, P. P. Vaidyanathan, “Coding gain in paraunitary analysis/synthesis
systems,” IEEE Trans. Signal Processing, vol. 41, May. 1993.
[3] A. K. Soman, P. P. Vaidyanathan, and T. Q. Nguyen, “Linear-phase orthonormal
filter banks,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing,
Minneapolis, MN, vol. III, Apr. 1993, pp. 209-212.
[4] A. K. Soman, P. P. Vaidyanathan, and T. Q. Nguyen, “Linear-phase paraunitary
filter banks: Theory, factorizations and designs,” IEEE Trans. Signal Processing,
vol. 41, Dec. 1993.
[5] A. Said and W. A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 243-250, June 1996.
[6] C. W. Kok, T. Nagai, M. Ikehara and T. Q. Nguyen, “Structures and factorizations
of linear phase paraunitary filter banks,” Proc. IEEE Int. Symp. Circuits Syst.,
Hong Kong, Jun. 1997, pp. 365-368
[7] D. Taubman, "High Performance Scalable Image Compression with EBCOT," IEEE Trans. Image Processing, vol. 9, no. 7, pp. 1158-1170, Jul. 2000.
[8] G. Strang and T. Q. Nguyen, Wavelets and Filter Banks. Wellesley, MA: Wellesley-Cambridge, 1996.
[9] H. S. Malvar and D. H. Staelin, "The LOT: Transform coding without blocking effects," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, no. 4, pp. 553-559, Apr. 1989.
[10] H. S. Malvar, Signal Processing with Lapped Transforms. Norwood, MA: Artech
House, 1992.
[11] ISO/IEC JTC1/SC29 WG1, JPEG 2000 Part I Final Committee Draft Version
1.0, Mar. 2000.
[12] J. Kovačević and M. Vetterli, “Perfect Reconstruction Filter Banks with Rational
Sampling Factors”, IEEE Trans. Signal Processing, vol. 41, no. 6, pp. 2047-2066,
Jun 1993.
[13] J. Kovačević and M. Vetterli, “Perfect reconstruction filter banks with rational
sampling rates in one and two dimensions”, in Proc. SPIE Conf. Visual Commun.
Image Processing, Philadelphia, PA, Nov. 1989, pp. 1258-1268
[14] J. Kovačević and M. Vetterli, “Perfect reconstruction filter banks with rational
sampling rate changes”, in Proc. IEEE Int. Conf. Acoust., Speech, Signal
Processing, Toronto, Canada, May 1991, pp. 1785-1788
[15] J. Kovačević and M. Vetterli, “The commutativity of up/downsampling in two
dimensions”, IEEE Trans. Inform. Theory, vol. 37, pp.695-698, May 1991.
[16] J. Li and S. Lei, “An embedded still image coder with rate-distortion
optimization,” IEEE Trans. Image Processing, vol. 8, no. 7, pp. 913-924, Jul,
1999.
[17] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. Signal Processing, vol. 41, no. 12, pp. 3445-3462, Dec. 1993.
[18] K. Nayebi, T. P. Barnwell, III and M. J. T. Smith, "The design of perfect reconstruction non-uniform band filter banks," IEEE Trans. Signal Processing, vol. 41, no. 3, pp. 1114-1127, Mar. 1993.
[19] Khalid Sayood, Introduction to Data Compression, Second Edition, San
Francisco, CA 94104-3205, USA, Morgan Kaufmann Publishers, 2000
[20] L. Chen, Design of Linear Phase Paraunitary Filter Banks and Finite Length
Signal Processing, PhD thesis, The University of Hong Kong, Hong Kong,
August 1997.
[21] M. J. Tsai, J. D. Villasenor, and F. Chen, "Stack-run image coding," IEEE Trans. Circuits and Systems for Video Technology, vol. 6, pp. 519-521, Oct. 1996.
[22] M. Vetterli and D. Le Gall, "Perfect reconstruction filter banks: Some properties and factorizations," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 1057-1071, July 1989.
[23] M. W. Ho and K. P. Chan, “Space and frequency context modeling for block-
transform –based image coding”, Proc. Int. Conf. Imaging Science, Systems, and
Technology, vol. 1, pp. 209-214, Jun, 2001
[24] M.W. Marcellin, M.J. Gormish, A. Bilgin, M.P. Boliek, “An Overview of JPEG-
2000”, Proc. IEEE Data Compression Conf., pp. 523-541, 2000.
[25] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ:
Prentice-Hall, 1993.
[26] P. Q. Hoang and P.P. Vaidyanathan, “Non-uniform multirate filter banks: Theory
and design”, in Proc. IEEE Int. Symp. Circuits Syst., Portland, OR, 1989, pp. 371-
374.
[27] Pankaj N. Topiwala, Wavelet Image and Video Compression, Massachusetts:
Kluwer Academic Publishers, 1998.
[28] R. L. de Queiroz, T. Q. Nguyen, and K. R. Rao, “The GenLOT: Generalized linear-
phase lapped orthogonal transform”, IEEE Trans. Signal Processing, vol. 44, pp.
497-507, Mar. 1996.
[29] T. D. Tran, “The BinDCT: fast multiplierless approximation of the DCT,” IEEE
Signal Processing Letters, vol. 7, no. 6, pp. 141-144, June 2000.
[30] T. D. Tran, “The LiftLT: fast lapped transforms via lifting steps,” IEEE Signal
Processing Letters, vol. 7, no. 6, pp. 145-148, June 2000.
[31] T. D. Tran, R. L. de Queiroz and T. Q. Nguyen, “Linear-phase perfect
reconstruction filter bank: lattice structure, design, and application in image
coding,” IEEE Trans. Signal Processing, vol. 48, pp. 133-147, Jan. 2000.
[32] T. Q. Nguyen and P. P. Vaidyanathan, “Structures for M-channel perfect
reconstruction FIR QMF banks which yield linear phase analysis filters,” IEEE
Trans. Acoust., Speech, Signal Processing, vol. 38, pp. 433-446, Mar. 1990.
[33] T. Q. Nguyen and P. P. Vaidyanathan, “Two-channel perfect reconstruction FIR
QMF structures which yield linear phase analysis and synthesis filters,” IEEE
Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 676-690, May 1989.
[34] T.D. Tran and T.Q. Nguyen, “A progressive transmission image coder using
linear phase filter banks as block transforms,” IEEE Trans. Image Processing,
vol. 8, pp. 1493-1507, Nov. 1999.
[35] W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, Numerical
Recipes in C, Cambridge University Press, 1988.
[36] W. Zhang and T. R. Fischer, “Comparison of Different Image Subband Coding
Methods at Low Bit Rates”, IEEE Trans. Circuits and Systems for Video
Technology, vol. 9, pp. 419-423, Apr. 1999.
[37] X. Wu, “High-order context modeling and embedded conditional entropy coding
of wavelet coefficients for image compression”, Proc. 31st Asilomar Conf. on
Signals, Systems, and Computers, pp. 1378-1382, Nov. 1997.
[38] Y. P. Lin and P. P. Vaidyanathan, “Linear phase cosine modulated maximally
decimated filter banks with perfect reconstruction,” IEEE Trans. Signal
Processing, vol. 42, no. 11, pp. 2525-2539, Nov. 1995.
[39] Z. Xiong, K. Ramchandran, M. T. Orchard, and Y. Q. Zhang, “A Comparative
Study of DCT- and Wavelet-Based Image Coding,” IEEE Trans. Circuits and
Systems for Video Technology, vol. 8, no. 5, pp. 692-695, Aug. 1999.
[40] Z. Xiong, O. Guleryuz, and M. T. Orchard, “A DCT-based embedded image
coder,” IEEE Signal Processing Letters, vol. 3, pp. 289-290, Nov. 1996.