lee: image representation using 2d gabor wavelets 3 · lee: image representation using 2d gabor...

$: LEE: IMAGE REPRESENTATION USING 2D GABOR WAVELETS 3 · lee: image representation using 2d gabor wavelets 3 \\ca_net1\sys\library\share\trans\pami\2-inprod\p96072\p96072_1.doc trans-96.dot$
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 18, NO. 10, OCTOBER 1996 1

\\CA_NET1\SYS\LIBRARY\SHARE\TRANS\PAMI\2-INPROD\P96072\P96072_1.DOC trans-96.dot KSM 19,968 09/20/96 10:37 AM 1 / 13

Image RepresentationUsing 2D Gabor Wavelets

Tai Sing Lee

Abstract—This paper extends to two dimensions the frame criterion developed by Daubechies for one-dimensional wavelets, and itcomputes the frame bounds for the particular case of 2D Gabor wavelets. Completeness criteria for 2D Gabor imagerepresentations are important because of their increasing role in many computer vision applications and also in modeling biologicalvision, since recent neurophysiological evidence from the visual cortex of mammalian brains suggests that the filter responseprofiles of the main class of linearly-responding cortical neurons (called simple cells) are best modeled as a family of self-similar 2DGabor wavelets. We therefore derive the conditions under which a set of continuous 2D Gabor wavelets will provide a completerepresentation of any image, and we also find self-similar wavelet parameterizations which allow stable reconstruction bysummation as though the wavelets formed an orthonormal basis. Approximating a “tight frame” generates redundancy which allowslow-resolution neural responses to represent high-resolution images, as we illustrate by image reconstructions with severelyquantized 2D Gabor coefficients.

Index Terms—Gabor wavelets, coarse coding, image representation, visual cortex, image reconstruction.

—————————— ✦ ——————————

1 INTRODUCTION

INCE Hubel and Wiesel’s [16] discovery of the crystal-line organization of the primary visual cortex in mam-

malian brains some thirty years ago, an enormous amountof experimental and theoretical research has greatly ad-vanced our understanding of this area and the responseproperties of its cells. On the theoretical side, an importantinsight has been advanced by Marcelja [25] and Daugman[6], [7] that simple cells in the visual cortex can be modeledby Gabor functions. The 2D Gabor functions proposed byDaugman are local spatial bandpass filters that achieve thetheoretical limit for conjoint resolution of information in the2D spatial and 2D Fourier domains.

Gabor [13] showed that there exists a “quantum princi-ple” for information: the conjoint time-frequency domainfor 1D signals must necessarily be quantized so that no sig-nal or filter can occupy less than a certain minimal area init. This minimal area, which reflects the inevitable trade-offbetween time resolution and frequency resolution, has alower bound in their product, analogous to Heisenberg’suncertainty principle in physics. He discovered that Gaus-sian-modulated complex exponentials provide the besttrade-off. The original Gabor elementary functions, in theform proposed by Gabor [13], are generated with a fixedGaussian while the frequency of the modulating wave var-ies. These are equivalent to a family of “canonical” coherentstates generated by the Weyl-Heisenberg group. Examplesof a set of these coherent states or elementary functions arepresented in Fig. 1. A signal can be encoded by its projec-

tion onto these elementary functions. This decomposition isequivalent to the Gaussian-windowed Fourier transform.

Fig. 1. An ensemble of odd (a) and even (b) Gabor filters or Weyl-Heisenberg coherent states.

Daugman [6], [7] generalized the Gabor function to thefollowing 2D form to model the receptive fields of the ori-entation-selective simple cells:

G x y e e

x x y y

i x y

o o

o,b gc h c h

=-

-+

-LNMM

OQPP +1

2

2

2

2

20

psb

ps b x n (1)

where (xo, yo) is the center of the receptive field in the spa-tial domain and ([o, Qo) is the optimal spatial frequency ofthe filter in the frequency domain. V and E are the standarddeviations of the elliptical Gaussian along x and y. The 2DGabor function is thus a product of an elliptical Gaussianand a complex plane wave. The careful mapping of the re-ceptive fields of the simple cells by Jones and Palmer [17]confirmed the validity of this model. Mathematically, the2D Gabor function achieves the resolution limit in the con-joint space only in its complex form. Since a complex-valued 2D Gabor function contains in quadrature projection

0162-8828/96$05.00 ©1996 IEEE

————————————————

• The author is with the Center for the Neural Basis of Cognition, 115 Mel-lon Institute, Carnegie Mellon University, Pittsburgh, PA 15213.

E-mail: [email protected]..

Manuscript received Aug. 16, 1994; revised June 6, 1996. Recommended for accep-tance by J. Daugman.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number P96072.

S

2 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 18, NO. 10, OCTOBER 1996


an even-symmetric cosine component and an odd-symmetric sine component, Pollen and Ronner’s [33] find-ing that simple cells exist in quadrature-phase pairs there-fore showed that the design of the cells might indeed beoptimal. The fact that the visual cortical cell has evolved toan optimal design for information encoding has caused aconsiderable amount of excitement not only in the neuro-science community but in the computer science communityas well. Gabor filters, rediscovered and generalized to 2D,are now being used extensively in various computer visionapplications [3], [20].

Recent neurophysiological evidence [11] suggests thatthe spatial structure of the receptive fields of simple cellshaving different sizes is virtually invariant. Daugman [8]and others [3], [34] have proposed that an ensemble of sim-ple cells is best modeled as a family of 2D Gabor waveletssampling the frequency domain in a log-polar manner. Thisclass is equivalent to a family of affine coherent states gen-erated by the affine group. The decomposition of an image finto these states is called the wavelet transform of the image:

T f a x y a dxdyf x yx x

ay y

awav

o oo oe jc h b g, , , , ,q y q=

- -FHG

IKJ

- zz1 (2)

where a is the dilation parameter, related to V and E, xo, and yo

the spatial translation parameters, T the orientation parameter

of the wavelet. y yq q( , , , , ) ,a x y x y ao ox x

ay y

ao o= - - -1 e j is the

2D wavelet elementary function, rotated by T. A particularGabor elementary function can be used as the motherwavelet to generate a whole family of Gabor wavelets. Ex-amples of a particular class of 2D Gabor wavelets, to bederived in the next section, are presented in Fig. 2.

Fig. 2. An ensemble of Gabor wavelets (1.5 octave bandwidth) (a) andtheir coverage of the spatial frequency plane (b). Each ellipse showsthe half-amplitude bandwidth contour dilated by a factor of 2, coveringalmost the complete support of a wavelet.

Despite this recognition, many questions, originallyposed by Daugman [7], about how the degrees-of-freedomwhich create the “ coding budget” in the visual cortex areallocated and constrained, such as the trade-off betweenorientation sampling and spatial sampling, have remainedunanswered. We here address and propose answers tothose questions.

In this paper, we first derive a class of 2D Gaborwavelets, with their parameters properly constrained by

neurophysiological data on simple cells and by the wavelettheory. We extend Daubechies’ completeness criteria on 1Dwavelets to 2D and apply such criteria to study thephysiologically relevant family of 2D Gabor wavelets. Bynumerically computing the frame bounds for this family ofwavelets in different phase space sampling schemes, weelucidate the conditions under which they form a tightframe. We find that the phase space sampling density pro-vided by the simple cells in the primary visual cortex issufficient to form an almost tight frame that allows stablereconstruction of the image by linear superposition of theGabor wavelets with their own projection coefficients, andprovides representation of high resolution images usingcoarse neuronal responses. Finally, we demonstrate thesetheoretical insights with results from image reconstructionexperiments.

2 DERIVATION OF THE 2D GABOR WAVELETS

In this section, we will derive the following family of 2DGabor wavelets that satisfies the wavelet theory and theneurophysiological constraints for simple cells,

y w qw

pk

w

kq q q q

x y eoo x y x yo

, , ,cos sin sin cosc h b g b ge j

=- + + - +

2

2

22 2

84

◊ -LNMM

OQPP

+ -e ei x yo ow q w q

kcos sinc h

2

2 (3)

where Zo is the radial frequency in radians per unit lengthand T is the wavelet orientation in radians. The Gaborwavelet is centered at (x = 0, y = 0) and the normalizationfactor is such that <\, \> = 1, i.e., normalized by L2 norm. Nis a constant, with N < S for a frequency bandwidth of oneoctave and N < 2.5 for a frequency bandwidth of 1.5 oc-taves. A discrete ensemble of the 1.5 octave bandwidthfamily of Gabor wavelets is shown in Fig. 2.

Each member of this family of Gabor wavelets modelsthe spatial receptive field structure of a simple cell. The re-sponse of a simple cell is the projection of an image onto aGabor wavelet, which is the inner product of the image fwith the receptive field centered at (xo, yo),

R x y fs o o o o= < >y w q, , , ,c h= - -zz y w qx x y y f x y dxdyo o o oyx

, : , ,c h b g (4)

where < , > denotes the inner product. Because the responseof a simple cell is half-wave rectified, each projection is infact represented by the responses of two simple cells, theresponse of an on-center cell (Rs+) and the response of anoff-center cell (Rs−), where

Rs+(xo, yo, Zo, To) = [<\(xo, yo, Zo, To), f>]+ (5)

and

Rs−(xo, yo, Zo, To) = [<\(xo, yo, Zo, To), f>]− (6)

respectively, and [D]+ = {D if D � 0}, [D]+ = {0 if D < 0}, [D]− ={−D if D � 0} and [D]− = {0 if D > 0} are the half-wave rectifi-cations of the function D.

LEE: IMAGE REPRESENTATION USING 2D GABOR WAVELETS 3


Let us start our derivation with the most general 2Dcomplex Gabor function, normalized so that its L2 norm<\, \> = 1.

y x n r q s b

psb

q q

s

q q

b

x y x y

e

o o o o

x x y y x x y yo o o

, , , , , , , , ,

cos sin sin cos

c hc h c hd i c h c hd i

=

-- + -

+- - + -F

HGG

I

KJJ1

2

20

2

22 2

◊- + - +

ei x x y yo o ox n r0c h c hd i (7)

The filter is centered at (x = xo, y = yo) in the spatial domain,and at (

�= [o, � = Qo) in the spatial frequency domain. V and

E are the standard deviations of an elliptical Gaussian alongthe x and y axes. T is the orientation of the filter, rotatedcounter-clockwise around the origin. U is the absolute phaseof an individual filter. There are, therefore, eight degrees offreedom in the general Gabor function: [o, Qo, T, U, V, E, xo,and yo.

Simple cells are often not strictly even or odd symmetric,but the quadrature-phase relationship between a pair ofsimple cells is generally preserved, whatever the absolutephase may be [33], [7], [9]. Since only the power-modulusenters into our later calculation, we can simplify the Gaborfilter by setting the spatial location of the filter’s center(xo = 0, yo = 0) and the absolute phase U of the filter to 0. TheFourier Transform of the simplified complex-valued Gaborfunction is

$ , , , , , ,y x n x n q s bo oc h =

212 0

2 20

2 2

psbx x q n w q s x x q n n q b

eo o- - + - + - - + -L

NMOQPc h c hd i c h c hd icos sin sin cos (8)

where [ and Q are the spatial frequencies in radians per unitlength along x and y.1

We can further reduce the number of degrees of freedomof the filter by noting that V, E, and T are constrained by [oand Qo according to the following physiological findings [7].The spatial frequency bandwidths of the simple and com-plex cells have been found to range from 0.5 to 2.5 octaves,clustering around 1.2 octaves [10], [18], and 1.5 octaves [38].The Gaussian envelope is usually elliptical, with an aspectratio of 1.5-2.0 [38], [7] and with the plane wave’s propa-gating direction along the short axis of the elliptical Gaus-sian (Fig. 2).

CONSTRAINT #1. The aspect ratio bs of the elliptical Gaussian

envelope is 2:1 [7].

CONSTRAINT #2. The plane wave with frequency ([o, Qo) tends tohave its “propagating direction” along the short axis of theelliptical Gaussian.

When the plane wave rotates, the elliptical Gaussian ro-tates correspondingly. This implies the center frequency

1. In this paper, the convention of the Fourier transform is,$ , ,

, $ ,

f f x y e e dxdy

f x y f e e d d

i x i y

R

i x i y

R

x n

px n x n

x n

x n

a f a f

a f a f

=

=

- -zzzz

2

2

1

4 2

([o, Qo) of the filter is related to the rotation angle T of the

modulating Gaussian by [o = Zo cosT and Qo = Zo sinT,

where the radial frequency w x no o o= +2 2 .With these two constraints, the Gabor filter becomes

y x n sx y o o, , , ,c h =

1

2

1

842

0

2 2

pss

xw

nw

nw

xw x ne e

x y x yi x y

o o

o

o

o

o

oo o

- +FHG

IKJ + - +

FHG

IKJ

L

NMMM

O

QPPP +◊ c h . (9)

The orientation of the filter is aligned with the long axis ofthe elliptical Gaussian.

CONSTRAINT #3. The half-amplitude bandwidth of the frequencyresponse is about 1 to 1.5 octaves along the optimal orienta-tion [18], [38], [7].

The relationship between V and Zo can be derived to be,

skw k

f

f= =+

-

FHG

IKJo

where 2 22 1

2 1ln (10)

where I is the bandwidth in octaves. For I = 1 octave, V <S/Zo = O/2 . For I = 1.5 octave, V < 2.5/Zo.

Imposing this constraint, we obtain the following familyof self-similar Gabor filters depending on four continuousvariables,

y w qw

pk

w

kq q q q

x y eoo x y x yo


=- + + - +

2

2

22 2

84

◊ +ei x yo ow q w qcos sinc h (11)

where q nx= arctan o

o, and N is fixed for Gabor wavelets of a

particular bandwidth. The whole family can be translatedto any spatial position (xo, yo). In order to make the Gaborfilters into admissible wavelets, we need to introduce thefollowing constraint.

CONSTRAINT #4. Admissible wavelets are functions having zeromean.

Why is this important? Consider a map \(f) : L2(R2) �L2(R4), the Gabor wavelet is admissible only if its norm isfinite, i.e.,

y y w qw

w qp

f f x yd

d dxdyL R RR

b g b gd ie j2 4

2

00

2=

•zzzz , , ,

= ◊ < •$ $fL R L dd2 2 2e j c hy w

w q . (12)

This implies $ (y w = =0 0 is a necessary condition; other-

wise the norm will become infinite in the measure of dww as

Z � 0.The sine component of the complex-valued Gabor filter

has zero mean, but its cosine component has nonzero mean(d.c. response). The d.c. response can be computed from itsFourier transform, with

�= 0 and � = 0,

$ , : ,y x n x n psk

= = =-

0 0 8

2

2o o ec h . (13)



A family of admissible 2D Gabor wavelets can be obtainedby subtracting this d.c. response from the Gabor filter,

y w qw

pk

w

kq q q q

x y eoo x y x yo


=L

NMMM

O

QPPP

- + + - +

2

2

22 2

84

◊ -LNMM

OQPP

+ -e ei x yo ow q w q

kcos sinc h

2

2 (14)

with its Fourier transform equal to

$ , , , [cos sin sin cos

y x n x npk

w

k

wx x q n n q x x q n n q

o oo

e oo o o oc h

c h c hd i c h c hd i=

- - + - + - - + -LNM

OQP8

2

2

2 2

24

-- + + - + +

e oo

k

wx q n q x q n q w

2

22 2 2

24cos sin sin cos

]b g b g

(15)

where

[[o + QQo = 0 (16)

specifies the line of zeroes ( $ ( , , , )y x n x no o = 0) on the fre-quency plane.

Each of these two families of Gabor wavelets can be gen-erated by rotation and dilation (affine group) from the fol-lowing mother Gabor wavelet,

yp

kk

x y e e ex y i x,b g e j= ◊ -

LNMM

OQPP

- + -1

2

18 4

22 2

2

(17)

whose Fourier transform is

$ ,y x n px k n x n kb g b g

= -RS|T|

UV|W|

- - + - + +8

12 4

12 4

2 2 2 2 2

e e (18)

3 NONORTHOGONAL WAVELETS AND FRAMES

Now we have obtained the properly parameterized Gaborwavelet equation, we can address the issue of completerepresentation. A transform is said to provide a completerepresentation if we can reconstruct f in a numerically sta-ble way from the transform of f, or alternatively, if anyfunction f can be written as a superposition of the trans-form’s elementary functions.

Recall that the continuous wavelet transform is given by

T f a x y a dxdyf x yx x

ay y

awav

o oo oe jc h b g, , , , ,q y q=

- -FHG

IKJ

- zz1 (19)

where a is the dilation parameter, xo and yo the spatialtranslation parameters, T the orientation parameter of the

wavelet. y yq q( , , , , ) ( , )a x y x y ao ox x

ay y

ao o= - - -1 is the 2D

wavelet elementary function, rotated by T.A function can always be reconstructed from its con-

tinuous wavelet transform by means of the following resolu-tion of identity formula, provided that the wavelets are ad-missible [5],

f Cda

adxdy d f a x y a x y= < >-

-•

•

-•

•• zzz zy

pq y q y q1

30 0

2, , , , , , ,b g b g (20)

where < , > denotes the L2 inner product and

Cd

dy

pp

ww

q y w q w q=•z z4 2

0

2

0

2$ cos , sina f (21)

where w x n= +2 2 , and \ ° L1(R), $y is continuous and

C\ < � only if $ ( , )y 0 0 0= . This is because it can be shownthat [5]

da

adxdy d T f a x y T g a x ywav wav

30 0

2•

-•

•

-•

•z zz z =q q qp

, , , , , ,b g b g

C f gy < >, . (22)

When a, x, y, T take on discrete values (i.e., a aom= , ao > 1, T

= Tl = lTo, x nb ao om= , and y kb ao o

m= , and m, n, l, k are inte-gers ranging over Z), the resulting transform

T f a dxdyf x y a x nb a y kbm n k lwav

om

om

o om

ol, , , , ,e j b g e j= - -- - -zz y q (23)

is called the discrete wavelet transform. y q l is a rotated ver-

sion of the mother wavelet \(x, y). ao > 1, To > 0, and bo > 0are fixed. In this scheme, a spatially narrower wavelettranslates by finer steps, and a wider wavelet translated bylarger steps. As a result, the discretized wavelets ateach m level “ cover” the spatial domain in the sameway as \(x − nbo, y − kbo) at the m = 0 level.

The wavelets are given by

y y qm n k l om

om

o om

om

o omx y a a x nb a a y kb a

l, , , , ,b g e j e je j= - -- - -

= - -- - -a a x nb a y kbom

om

o om

oly q ,e j. (24)

Note that although the parameterization is discrete, eachwavelet elementary function is a function of the continuousvariables x and y.

Certain discrete 1D wavelets, for examples, the one-dimensional Harr bases, Meyer wavelets, and Battle-Lemarie wavelets, form orthonormal bases. In this case, thefunction can be reconstructed by a linear superposition ofthe bases weighed by the wavelet coefficients,

f f m n m nm n

= < >Â , , ,,

y y (25)

where <D1, D2> denotes the inner product of D1 and D2.However, 1D or 2D Gabor wavelets do not form or-

thonormal bases. They are called nonorthogonal wavelets. Inthis case, we are interested in knowing whether a discrete setof the nonorthogonal wavelets form a frame. If the waveletsform a frame, then one can completely characterize a func-tion f by knowing Twav(f) and can reconstruct f in a numeri-

cally stable way from Twav(f) with the dual frame ~y [5],

f f fm n k l m n k lm n k l

m n k l m n k lm n k l

= < > = < >Â Â, ~ , ~, , , , , ,

, , ,, , , , , ,

, , ,

y y y y (26)

The concept of frames was first introduced by Duffin andSchaeffer [12] in the context of non-harmonic Fourier series.Their definition is as follows:

DEFINITION. A family of functions {\m,n; m, n ° Z} in a HilbertSpace H is called a frame if there exist A > 0, B < � sothat, for all f in H,



A f f B fm nm n Z

2 2 2£ < > £Œ

Â , ,,

y (27)

where f f x y dxdy=-•

•

-•

• zz ( , )2

. A and B are called

frame bounds.

When a discrete family of wavelet forms a frame, theyprovide a complete representation of the input functions[2], [5].

When 0 < A � B < 2, the family of wavelets can be treatedas if it were an orthonormal basis, and the function can berecovered with good approximation using the followinginversion formula [26],

f A B fm n k l m n k lm n k l

ª + < >Â2y y, , , , , ,

, , ,

, (28)

(A + B)/2 is a measure of the redundancy of the framewhereas B/A is a measure of the tightness of the frame.When B = A , the frame is called a tight frame and the re-construction by linear superposition using the above inver-sion formula is exact, analogous to the resolution of identityequation [5]. Therefore, a reasonably tight frame behaves ina way similar to the continuous wavelet transform and canbe treated as though it were an orthonormal basis in imagedecomposition and reconstruction.

Readers should consult [5], [26], and [15] for a carefulmathematical elucidation and review of frames. The framebounds can be simply interpreted as the bounds on the gainof the wavelet transform, with i f i2 as the total power of the

input function and < >Â f m nm n, ,,y

2 the total output

power of the transform. Therefore A > 0 means that theoutput will not be null if the input signal is nonzero and B < �means that given an input of finite power, the outputpower of the transform has to be finite also. A tight frameprovides a uniform gain for the frequency response of theinput functions.

4 2D GENERALIZATION OF DAUBECHIES’S FRAMECRITERION

How dense should the phase space or the conjoint spa-tial/spatial frequency domain be sampled so that the 2DGabor wavelets will provide a complete representation ofany image? To answer this question, we need to generalizeto 2D the following frame criterion for 1D wavelets ofDaubechies [5]:

Daubechies’s Frame Criterion: Let \ ° L1(R) be a functionof zero mean. It defines a sequence E(k), k ° Z, by theformula,

b y x y x px

k kj jb g e j e j= +£ £ -•

•

Âsup $ $1 2

2 2 2 . (29)

Now suppose that

sup $1 2

22

£ £ -•

•

Â £ < •x

y xj Ce j (30)

and introduce the constant

W j=£ £ -•

•

Âsup $1 2

22

xy xe j (31)

Then if

W k k> -•

Â21 2

1

b bb g b gc h /(32)

the collection 2j/2\(2jx − k), j, k ° Z, forms a frame for

L2(R).

At the outset, it is important to note that here we arestudying the frame formed by a discrete set of continuouswavelets, i.e., the discretization is in the phase space sam-pling but not in the description of the wavelet function.

Let us consider the following 2D admissible waveletfamily

y y qm n k l om

om

o om

ox y a a x nb a y kbl, , , , ,b g e j= - -- - - (33)

where

y y q q q qq lx y x l y l x l y lo o o o( , ) ( cos( ) sin( ), sin( ) cos( ));= + - +

\ is the mother wavelet as defined in (17); To denotes thestep size of each angular rotation, l the index of rotationsteps, bo the unit spatial interval, and ao

m the dilation inscale. The (x, y) domain is sampled by each wavelet in ascale-dependent spatial lattice with horizontal and verticaltranslation intervals given by a bo

mo at each m level.

The Fourier transform of the 2D wavelet is given by,

$ , $ ,, , ,y x n y x nqx n

m n k l om

om

om ia b n ia b ka a a e e

l

om

o om

ob g e j= - - , (34)

where

$ ( , ) $ ( cos( ) sin( ), sin( ) cos( ))y x n y x q n q x q n qq ll l l lo o o= + - + 0 ,

and $y is the Fourier transform of the mother wavelet asdefined in (18).

To evaluate the frame bounds A and B as in

02 2

< £ < >-•

•

-•

• zz ÂA f d d f x y x ym n k lm n k l

$ , , , ,, , ,, , ,

x n x n yb g b g b g

£ < •-•

•

-•

• zzB f d d$ ,x n x nb g 2, (35)

the basic idea is to sum up the power of all the wavelet re-sponses to an image. This can be done in the spectral do-main using Parseval’s theorem:

f x y dxdy f d d, $ ,b g b g2

2

21

4-•

•

-•

•

-•

•

-•

•zz zz=p

x n x n (36)

Let ao be the scaling step and bo be the unit translation step,then Daubechies’s derivation of frame bounds for 1Dwavelets [5] can be generalized for 2D wavelets as follows:



< >

=

=

◊ + +

Œ Œ

-•

•

-•

•

- - - -

Â

zzÂ

zzÂ

f

d d f a a a e e

a d d e e

f ha b ja b

m n k lm n k Z l Q

om

om

om ib a n ib a k

m n k l

om ib a n ib a k

m n k l

om

o om

o

l

o om

o om

o om

o om

boaomboao

m

,

| $ , $ , |

|

$ , $

, , ,, , ,

, , ,

, , ,

y

px n x n y x n

px n

x p n p y

qx n

x n

q

pp

2

22

22

00

1 1

1

4

1

4

2 2

22

b g e j

e jl

boaomboao

m

l

a hb a jb

bd d f hb jb

a ha b a ja b

bd d f f pa b qa b

om

o om

oh j Z

oo o

h j Zm l

om

om

o om

om

o

oo

mo o

mo

x p n p

x n x p n p

y x p n p

x n x n x p n p

pp

q

+ +

= + +

◊ + +

= + +

- -

Œ

- -

Œ

- - - -

- - - -

Â

ÂzzÂ

2 2

12 2

2 2

12 2

1 1 2

21 1

00

1 1 2

21 1

22

, |

| [ $ ,

$ , ]|

[ $ , $ ,

,

,,

e j

e j

e jb g e j

-•

•

-•

•

Œ Œ

- -

zzÂ

◊ + +

m p q Z l Q

om

om

om

o om

ol la a a pb a qb

, , ,

$ , $ , ]y x n y x p n pq qe j e j2 21 1

= +P R (37)

where the set Q = {0, 1, 2, 3 ... K − 1} and K is the number ofsampling orientations, y is the complex conjugate of \.

The first principal term P (i.e., the power for (p = 0, q = 0))is the product between the power of the input function andthe sum of the spectral powers of all the wavelets.

Pb

d d f a ao

om

om

m Zl Ql

=-•

•

-•

•

ŒŒzz ÂÂ1

2

2 2x n x n y x nq

$ , $ ,b g e j (38)

=-•

•

-•

• zz12

2

bd d f F

o

x n x n x n$ , ,b g b g (39)

Because of the self-similarity of the wavelets in both thespectral and spatial domains, the spectral power of

F a al o

mom

m Z l Q

x n y x nq, $ ,,

b g e j=Œ ŒÂ

2

is also self-similar, i.e.,

F a a Fom

omx n x n, ,e j b g= (40)

and

F([ cos To + Q sin To, −[ sin To + Q cos To) = F([, Q). (41)

Therefore, the infimum and supremum of F over thewhole frequency domain is equivalent to the infimum andsupremum of F over a fundamental sector S (see Figs. 3 and

4). S is defined by 1 2 2< + <x n ao and 0 < <tan( )nx

pK ,

where K is the number of sampling orientations, and ao isthe dilation scaling factor. As a result, the product of theinfimum and supremum of F over S with the spectralpower of the input function provides the lower and upperbounds of the first term, respectively,

inf $ , $ ,,x n qx n x n y x n

Œ -•

•

-•

•

ŒŒ

FHG

IKJ £ zz ÂÂS o

mom

m Zl Q

F f d d f a al

2 2 2b g e j

<FHG

IKJŒ

sup,x n S

F f2

(42)

Fig. 3. The principal power F([) of 1D Gabor wavelets of 0.65 octavebandwidth (a) and 1.0 octave bandwidth (b) are shown to illustrate itsself-similarity in the frequency spectrum. The principal term becomessmoother as the number of voices (N) increases, corresponding to theincrease in the tightness of the frame. The progression to tightness isfaster for the wavelets with broader bandwidths (e.g., 1.0 octavebandwidth).

Fig. 4. (a) The principal power F([, Q) for the 2D Gabor wavelet. It isperiodic along the orientation axis, and exhibits dilating oscillationalong the radial frequency axis. The fundamental sector S (enclosed bythe black rectangle) is repeated in its dilated and translated versions tocover the whole frequency domain. (b) Three examples of F([, Q) withdifferent sampling densities are shown. The gray level indicates thevalue of the spectral power. Higher value is shown with lighter color.Generally, as the number of orientations and voices (i.e., frequencysteps per octave) increases, the power spectrum becomes more uni-form in absolute term.

The second residue term R captures the interference ef-fect between the wavelets in a family due to their nonor-thogonality. It is computed by summing all the cross prod-ucts of each wavelet transform and the transform of itsspectrally displaced (scaled and rotated) version, for all(p, q) ° Z2 except at (p = 0, q = 0),

Rb

d d f a ao

om

om

p q Zm Z l Ql

=-•

•

-•

•

ŒŒ ŒzzÂÂ| $ , $ ,

, \ ,,

12

0 02

x n x n y x nqb g e jb g a fm r

$ , $ , |f pa b qa b a pb a qbom

o om

o om

o om

olx p n p y x p n pq+ + + +- - - - - -2 2 2 21 1 1 1e j e j (43)



£

+ +LNM

OQP

ŒŒ Œ

- -

-•

•

-•

•

ÂÂ

zz

1

2 2

20 0

2 1 1

2

12

b

d d f a a a pb a qb

o p q Zm Z l Q

om

om

om

o om

ol l

, \ ,,

$ , $ , $ ,

b g a fm r

b g e j e jx n x n y x n y x p n pq q

◊ - -LNM

OQP

- -

-•

•

-•

• zz d d f a pb a qb a al lo

mo o

mo o

mom~ ~ $ ~

, ~ $~

, ~ $~

, ~x n x n y x p n p y x nq qe j e j e j2

1 12 212

(44)

(with change of variables ~x x p= + -2 1pb ao o

m , ~n n p= + - -2 1qb ao om

in the second factor)

£

+ +LNMM

OQPP

Œ

- -

-•

•

-•

•

Â

Âzz

1

2 2

20 0

2 1 1

2

12

b

d d f a a a pb a qb

o p q Z

om

om

om

o om

om l

l l

, \ ,

,

$ , $ , $ ,

b g a fm r

b g e j e jx n x n y x n y x p n pq q

◊ - -LNMM

OQPP-•

•

-•

• - -zz Âd d f a a a pb a qbl lo

mom

om

o om

om l

~ ~ $ ~, ~ $

~, ~ $

~, ~

,

x n x n y x n y x p n pq qe j e j e j2

1 12 2

12

(45)

(use Cauchy-Schwarz on the sum over m and l)

£FHG

IKJ -

FHG

IKJ

LNMM

OQPPŒ

Â1 2 2 2 22

21 2

0 02bf

pb

qb

pb

qb

o o o o op q Z

x n bp p

bp p

, , ,/

, \ ,

b gb g a fm r

(46)

where

b y x n y x nx n

q qs t a a a s a tl lo

mom

om

om

m Z l Q

, sup $ , $ ,, ,

a f e j e j= + +Œ ŒÂ (47)

and s pbo

= 2p and t qbo

= 2p .

Since the “ cross-product” term,

$ , $ ,,

y x n y x nq ql la a a s a to

mom

om

om

m Z l Qe j e j+ +

Œ ŒÂ , (48)

is also self-similar. Its supremum and infimum over thewhole spectrum are also equal to its supremum and in-fimum over the fundamental sector S. Therefore, we canwrite,

b y x n y x nx n

q qs t a a a s a tS

om

om

om

om

m Z l Ql l

, sup $ , $ ,, ,

a f e j e j= + +Œ Œ Œ

Â (49)

Thus, the infimum and supremum of the power output isgiven by

inf , { inf $ ,, , , ,

, , ,,

,f f m n k l

m n k l oS o

mom

m Z l Q

f fb

a alŒ π

-

ŒŒ Œ

< > ≥Â Â+ 0

2 2

2

21y y x n

x n q e j

-FHG

IKJ - -

FHG

IKJ

LNMM

OQPPŒ

Â bp p

bp p2 2 2 2

1 2

20 0

pb

qb

pb

qbo o o op q Z

, , }/

, \ ,b g a fo t

(50)

sup , {sup $ ,,

, , ,, , , , ,f f

m n k lm n k l o S

om

om

m Z l Q

f fb

a al

Œ π

-

Œ Œ Œ

< > £Â Â+ 0

2 2

2

21y y x n

x nq e j

+FHG

IKJ - -

FHG

IKJ

LNMM

OQPPŒ

Â bp p

bp p2 2 1 2 2 1

1 2

0 02

pb b

pb bo o o op q Z

, , }/

, \ ,b g a fm r (51)

and the frame bounds A and B for 2D wavelets are given by,

Ab

a ao

S om

om

m Z l Ql

= -RS|T| Œ

Œ ŒÂ1

2

2inf $ ,,

,x n qy x ne j

bp p

bp p2 2 2 2

0

1 2

0 02

pb

qb

pb

qbo o op q Z

, ,/

, \ ,

FHG

IKJ - -

FHG

IKJ

LNMM

OQPP

UV|W|Œ

Âb g a fm r

(52)

Bb

a ao S

om

om

m Z l Ql

=RS|T|

+Œ Œ Œ

Â12

2sup $ ,, ,x n

qy x ne j

bp p

bp p2 2 2 2

1 2

0 02

pb

qb

pb

qbo o o op q Z

, ,/

, \ ,

FHG

IKJ - -

FHG

IKJ

LNMM

OQPP

UV|W|Œ

Âb g a fm r

(53)

The Gabor wavelet is a complex-valued function. If onlyreal signals f are represented and reconstructed by means ofGabor wavelet responses <\m,n,k,l, f>, then the complexwavelet really consists of two wavelets, Re(\) and Im(\).Therefore, the frame bounds for 2D Gabor wavelets aregiven as follows,

Ab

a a a a Ro

S om

om

om

om

l Qm Zl l

= + - -LNM

OQP -

RS|T|

UV|W|Œ

ŒŒÂÂ1 1

22

2 2inf $ , $ ,,x n q qy x n y x ne j e j (54)

Bb

a a a a Ro S

om

om

om

om

l Qm Zl l

= + - -LNM

OQP +

RS|T|

UV|W|Œ ŒŒ

ÂÂ1 122

2 2sup $ , $ ,,x n

q qy x n y x ne j e j (55)

where

Rp

bq

bp

bq

bo o o op q Z

=FHG

IKJ - -

FHG

IKJ

LNMM

OQPPŒ=+ -

ÂÂ bp p

bp p

e ee

2 2 2 21 2

0 02

, ,/

, \ ,, b g a fm r (56)

and

b y x n ey x nex n

q qs t a a a aS

om

om

om

om

m Z l Ql l

, sup $ , $ ,, ,

a f e j e j= + - -Œ Œ Œ

Â14

$ , $ ,y x n ey x nq ql la s a t a s a to

mom

om

om+ + + - - - -e j e j (57)

When the frame bounds A > 0 and B < � for a discrete fam-ily of Gabor wavelets, this family of wavelets is said to pro-vide a complete representation.

5 FRAME BOUNDS FOR FRACTIONALLY DILATED 2DWAVELETS

In the above derivation, ao = 2, the frequency space is sam-pled every octave. However, it is known that in the visualcortex, the frequency space is sampled every half or everythird of an octave [32], [36]. What is the advantage ofsuboctave sampling? Grossmann, Kronland-Martinet, andMorlet [14] have found that when the frequency space issampled suboctavely, the frame will become tighter, i.e.,A � B. They proposed to construct a frame based on“ fractionally” dilated versions of a single 1D wavelet \:

\K(x) = 2−K/N

\(2−K/Nx) (58)

N is the number of sampling frequency steps per octave



and K is the index of the frequency step per octave. Withineach octave, the scale of the wavelet is changed without thecorresponding change in its spatial sampling intervals.Daubechies [1989] remarked that the \K, constructed this way,do not have the same L2-normalization, i \K i2 = 2−K/2N i � i2.This change in normalization compensates for the fact thatthe phase space lattice, which is a superposition of N lat-tices, is “ denser” than the lattice whose translation intervalis strictly proportional to the scale of the wavelet.

The 2D version of the fractionally dilated wavelets canbe generalized to be

y y hqh h

qh h

l lx y x y NN N N, , , ,/ / /b g e j= = -- - -2 2 2 0 12 K , (59)

with its Fourier transform equal to

$ , $ , , ,/ /y x n y x n hqh

qh n

l l

N N Nb g e j= = -2 2 0 1K , (60)

The general frame conditions for the fractionally dilated 2Dwavelets are given below.

Define

m a N a ao o S om

om

m Z l Q

N

l, , inf $ ,

,,

q y x nx n q

h

hc h e j=

ŒŒ Œ=

-

ÂÂ2

0

1

(61)

M a N a ao oS

om

om

m Z l Q

N

l, , sup $ ,

, ,

q y x nx n

qh

hc h e j=

Œ Œ Œ=

-

ÂÂ2

0

1

(62)

b y x n y x nh

x nqh

qhs t a a a s a t

Som

om

om

om

m Z l Ql l

, sup $ , $ ,, ,

a f e j e j= + +Œ Œ Œ

Â (63)

where To = S/K, K is the number of sampling orientations,Q = {0, 1, 2 ... K − 1}; N is the number of voices or frequencysteps per octave (i.e., N = 4 implies sampling at 2−1/4 inter-vals). S is the fundamental sector defined earlier.

Choose bo and To such that

bp p

bp ph h

h

2 2 2 21 2

0 00

1

2

pb

qb

pb

qbo o o op q Z

N

, ,/

, \ ,

FHG

IKJ - -

FHG

IKJ

LNMM

OQPPŒ=

-

ÂÂb g a fm r

< m a No o, , ,qc h (64)

then

{ ; , , , , , , , , }, , ,y hhm n k l m n k Z l K NŒ = - = -0 1 0 1K K

constitute a frame with the following frame bounds,

Ab

m a Np

bq

bp

bq

bo

o oo o o op q Z

N

= -FHG

IKJ - -

FHG

IKJ

LNMM

OQPP

RS|T|

UV|W|Œ=

-

ÂÂ1 2 2 2 22

1 2

0 00

1

2

, , , ,/

, \ ,

q bp p

bp ph h

hc h

b g a fm r(65)

Bb

M a Np

bq

bp

bq

bo

o oo o op q Z

N

= +FHG

IKJ - -

FHG

IKJ

LNMM

OQPP

RS|T|

UV|W|Œ=

-

ÂÂ1 2 2 2 22

0

1 2

0 00

1

2

, , , , ./

, \ ,

q bp p

bp ph h

hc h

b g a fm r(66)

The proof is a simple variant of the proof for the one-octavesampling case (see [5]).

When A > 0 and B < �, \m,n,k,l constitute a frame. This isvalid only if the following conditions are true, with thedefinitions in (61), (62), and (63),

For bo sufficiently small, m(\; ao, To, N) > 0, M(\; ao, To, N)< � , EK(s, t) decays at least as fast as (1 + s2+t2)−(1+H),

with e > 0, or put in another way,

sup ,,s t R

s t s t CŒ

++ +L

NMOQP = < • >1 02 2 1e j a fa fe

eb e for some (67)

where s and t are defined as in EK(s, t).

The condition on EK and the bounds are satisfied if $ ( , )y x nh

decays exponentially as in our case and that $ ( , )y x nq la ao

mom

= 0 when ( � = 0, � = 0), for all m, l.For 2D multivoice Gabor wavelets, the frame bounds are

as follows,

Ab

a a a a Ro

S om

om

om

om

l Qm Z

N

l l= + - -L

NMOQP -

RS|T|

UV|W|Œ

ŒŒ=

-

ÂÂÂ1 122

2 2

0

1

inf $ , $ ,,x n q

hqh

h

y x n y x ne j e j(68)

Bb

a a a a Ro S

om

om

om

om

l Qm Z

N

l l= + - -L

NMOQP +

RS|T|

UV|W|Œ ŒŒ=

-

ÂÂÂ1 122

2 2

0

1

sup $ , $ ,,x n

qh

qh

h

y x n y x ne j e j(69)

where

Rp

bq

bp

bq

bo o o op q Z

N

=FHG

IKJ - -

FHG

IKJ

LNMM

OQPPŒ=

-

=+ -ÂÂÂ b

p pb

p peh

eh

he

2 2 2 21 2

0 00

1

2

, ,/

, \ ,, b g a fm r (70)

and

b y x n ey x neh

x nqh

qhs t a a a a

Som

om

om

om

l

K

m Zl l

, sup $ , $ ,,

a f e j e j= + - -Œ =Œ

ÂÂ14

1

$ , $ ,y x n ey x nqh

qh

l la s a t a s a to

mom

om

om+ + + - - - -e j e j (71)

These frame bound equations for Gabor wavelets withsuboctave, multi-orientation phase space sampling allow usto study the implications of the cortical sampling strategies.

6 FRAME BOUNDS FOR VARIOUS SAMPLINGSCHEMES

Given \ is admissible, i.e., it has reasonable decay in spaceand spatial frequency domains, with zero d.c. response( $ ( , )y 0 0 0= ), then there exists a range of ao, To, bo for whicha family of \ constitute a frame. The lower bounds forthese parameters are ao > 1, bo > 0, To > 0 and the upperbounds are given by the critical values ao

coc,q , and bo

c .

b boc

o= inf{ s. t.

bp p

bp p

y qh h

h

2 2 2 2

2 0 00

1 pb

qb

pb

qb m a

o o o op q Z

N

o o, , , , }, \ ,

FHG

IKJ - -

FHG

IKJ

LNMM

OQPP

≥Œ=

-

ÂÂb g a fm r

c h (72)

q qoc

o= inf{ s. t.

bp p

bp p

yh h

h

2 2 2 2

2 0 0

1 pb

qb

pb

qb m a b

o o o op q Z

N

o o, , , }, \ ,

FHG

IKJ - -

FHG

IKJ

LNMM

OQPP

≥Œ

-

ÂÂb g a fm r

c h (73)



a aoc

o= inf{ s. t.

bp p

bp p

y qh h

h

2 2 2 2

2 0 00

1 pb

qb

pb

qb m b

o o o op q Z

N

o o, , , , }, \ ,

FHG

IKJ - -

FHG

IKJ

LNMM

OQPP

≥Œ=

-

ÂÂb g a fm r

c h (74)

Because there is a line of zeroes in the frequency plane foreach 2D Gabor wavelet (16),

[[o + QQo = 0 (75)

at least two sampling orientations are required to cover thefrequency domain, except at ([ = 0, � = 0). This is becausewavelets of the same orientation have the same line of ze-roes, since Qo/[o is fixed for each orientation. Therefore, theupper bound of the rotation step size q o

c , cannot exceed S/2.The upper bound of the unit spatial interval bo is 1 for

the single voice wavelets. This can be deduced from theNyquist sampling theorem: the maximum sampling inter-vals (max nx) before aliasing occurs is O/2, where O is thewavelength of the filter. Since nx = aobo, where ao = V =O/2, the upper bound of bo is 1 for the one-octave samplingscheme. However, for fractionally dilated wavelets, theupper bound of bo could be pushed above 1 becausewavelets of the fractional scales are imposed on denserlattices.

To determine whether the 2D Gabor wavelets form aframe and how tight that frame is, we computed theirframe bounds using the frame bound (68) and (69), themother wavelet (18), and the fractional wavelet scaling(60). The frame bounds for two families of the 2D Gaborwavelets with bandwidths of 1.0 octave and 1.5 octaves

respectively were computed for different chosen samplingdensities. The results are shown in Tables 1 to 5.

We also computed the frame bounds for a set of K and bofor N = 1 to 4. Fig. 5 shows the frame regions of differentdegrees of tightness for each N. The tightness of the frameis measured by the ratio B/A and is classified into five de-grees, each indicated by a different gray level in the figure.It is found that when N = 1, there is no tight frame. As Nincreases, the frame region expands and the frame becomestighter and tighter.

Several general observations can be made from these re-sults. First, the frame becomes tighter with the increase inthe number of orientation, frequency and spatial samplingsteps. Second, at each phase space sampling density, the 1.5octave bandwidth Gabor wavelets provide a tighter framethan the 1.0 octave bandwidth Gabor wavelets.

Finally, with a phase space sampling density of 20 ori-

TABLE 1FRAME BOUNDS FOR THE 2D GABOR WAVELETS WITH1.5 OCTAVE BANDWIDTH FOR DIFFERENT NUMBERS OF

SAMPLING ORIENTATIONS (K), WITH THE NUMBER OF FRE-QUENCY STEPS PER OCTAVE N = 1 AND N = 3,

AND THE UNIT SPATIAL SAMPLING INTERVAL bo = 'X/ao = 0.8N = 1

K A B B/A4 2.502 54.388 21.7396 17.131 56.965 3.3258 32.107 62.141 1.93512 57.150 81.810 1.43216 78.494 107.516 1.37020 98.499 134.178 1.362

N = 3K A B B/A4 19.273 136.424 7.0796 67.908 144.066 2.1228 118.664 161.044 1.35712 201.665 217.529 1.07916 272.766 286.155 1.04920 341.454 357.199 1.046

TABLE 2FRAME BOUNDS FOR THE 2D GABOR WAVELETS WITH1.5 OCTAVE BANDWIDTH FOR DIFFERENT NUMBERS OFSAMPLING ORIENTATIONS (K), WITH THE NUMBER OF

FREQUENCY STEPS PER OCTAVE N = 1 AND 3,AND THE UNIT SPATIAL SAMPLING INTERVAL bo = 1

N = 1K A B B/A4 - - -6 - - -8 5.114 55.205 10.79512 18.168 70.766 3.89516 26.255 92.792 3.53420 33.033 115.880 3.508

N = 3K A B B/A4 - - -6 25.440 110.223 4.3338 57.460 121.554 2.11512 107.492 160.792 1.49616 146.924 210.786 1.43520 184.056 263.082 1.429

The notation ‘-’ implies not guaranteed to form a frame.

Fig. 5. The figures depict regions of tightness in the parametric spaceof sampling density. The frames become tighter with the increase inphase space sampling density.



entations per cycle and 3 scale steps per octave, the 1.5 oc-tave bandwidth Gabor wavelets family forms a tight framewhen nx = 0.4O, (bo = 0.8). There is physiological evidencesuggesting that the phase space sampling density of thecortical cells in the brain’s primary visual cortex is close tothis tight frame sampling density. Hubel and Wiesel’s [16]data suggested that at least 16 to 20 orientations are beingsampled per hypercolumn, and the receptive fields of thecortical cells in one hypercolumn overlap half of the recep-tive fields of the cells in the adjacent hypercolumns. Camp-bell and Robson [4], De Valois et al. [11], [10], found that ateach eccentricity the human visual system is sensitive to aspatial frequency range of three to five octaves. Pollen andFeldon [32] found that frequency is sampled at 1/2 octaveintervals at cat visual cortex, while the data of Silverman etal. [36] suggested the sampling interval in spatial frequencyis about 1/3 octave in monkeys.

TABLE 5FRAME BOUNDS FOR 2D GABOR WAVELETS OF

1.0 OCTAVE BANDWIDTH FOR DIFFERENT UNIT SPATIALSAMPLING INTERVALS BO, WITH THE NUMBER OF SAMPLING

ORIENTATIONS K = 20 AND THE NUMBER OFFREQUENCY STEPS PER OCTAVE N = 1 AND 3

N = 1

bo A B B/A0.25 543.289 844.349 1.5540.50 135.817 211.093 1.5540.75 39.490 114.692 2.9040.80 21.067 114.444 5.4321.00 - - -1.25 - - -

N = 3

bo A B B/A0.25 2087.619 2090.410 1.0010.50 521.899 522.608 1.0010.75 210.553 253.672 1.2050.80 170.087 237.924 1.3991.00 57.254 203.873 3.5611.25 1.550 165.572 106.846

These findings suggest that the cortical cells in the primaryvisual cortex not only provide a complete representation of theimage within a three to five octave frequency band at eacheccentricity, but also provide an almost tight frame represen-tation through oversampling. In the next section, we will ex-plore the implications of such a representation.

7 ILLUSTRATIVE EXPERIMENTAL RESULTS

The numerical results of the last section indicate that at thephase space sampling density of K = 20, N = 3, bo < 0.8, theGabor wavelets can form a very tight frame. In fact, even atK = 8, N = 1, bo = 0.8, the frame is reasonably tight (B/A = 2)so that any image can be reconstructed approximately by alinear superposition of the Gabor wavelet elementary func-tions, as follows,

f A B fm n k l m n k lm n k l

approx = + < >Â2y y, , , , , ,

, , ,

, (76)

without the use of a dual frame [5]. This is because as theredundancy increases, the frame becomes tighter. Both thelower and upper bounds of the spectral power of the trans-form (A, B in (68) and (69)) will increase and converge to-ward each other, rendering insignificant the interferencedue to the nonorthogonality of the Gabor wavelets (R in(70)). On the other hand, if the frame is not tight, the resi-dues due to the nonorthogonality of the Gabor waveletswill introduce spurious effect to the reconstructed image asshown in Fig. 7a. In this case, coefficients from the projec-tion onto the dual frame {~ }, , ,y m n k l are needed if f is to bereconstructed by a linear superposition of the wavelet ele-mentary functions as in (26). Dual frames are often difficultto obtain. But the projection coefficients to the dual framecan be obtained numerically by an iterative error minimi-zation procedure [8].

In this iterative method, which we will use to evaluateempirically the completeness of a frame, the set of projec-

TABLE 3FRAME BOUNDS FOR THE 2D GABOR WAVELETS WITH

1.5 OCTAVE BANDWIDTH FOR DIFFERENT UNIT SPATIALSAMPLING INTERVALS BO, WITH THE NUMBER OF

SAMPLING ORIENTATIONS K = 20 AND THE NUMBER OFFREQUENCY STEPS PER OCTAVE N = 1 AND N = 3

N = 1

bo A B B/A0.25 1087.822 1294.789 1.1900.50 271.955 323.697 1.1900.75 117.261 147.473 1.2580.80 98.499 134.178 1.3621.00 33.033 115.880 3.5081.25 - - -

N = 3

bo A B B/A0.25 3576.876 3577.334 1.0000.50 894.219 894.333 1.0000.75 393.801 401.112 1.0190.80 341.454 357.200 1.0461.00 184.056 263.082 1.4291.25 65.683 220.456 3.357

TABLE 4FRAME BOUNDS FOR THE 2D GABOR WAVELETS WITH1.0 OCTAVE BANDWIDTH FOR DIFFERENT NUMBERS OFSAMPLING ORIENTATIONS (K), WITH THE NUMBER OF

FREQUENCY STEPS PER OCTAVE N = 1 AND 3, AND THEUNIT SPATIAL SAMPLING INTERVAL bo = 0.8

N = 1K A B B/A4 - - -6 - - -8 - - -12 7.266 74.224 10.21516 15.765 92.388 5.86020 21.068 114.444 5.432

N = 3K A B B/A4 - - -6 2.727 125.798 46.1378 34.536 129.686 3.75512 92.217 152.605 1.65516 134.362 192.045 1.42920 170.087 237.924 1.398



tion coefficients to the dual frame c fm n k l m n k l, , , , , ,, ~=< >y canbe obtained by minimizing the following cost function,

E f cm n k l m n k lm n k l

= - ◊Â , , , , , ,, , ,

y2

(77)

where f is the original image, and \ is a family of 2D Gaborfilters as specified in (33). Then the image can be recon-structed by

f cm n k l m n k lm n k l

= ◊Â , , , , , ,, , ,

y (78)

However, the cortical sampling rate approximates a tightframe. Therefore, we should be able to reconstruct the im-age using the following linear summation formula,

f x y A B f x y x y x ym n k l m n k ln m k l

, , , , ,, , , , , ,, , ,

b g b g b g b g= + < >Â2y y (79)

This reconstruction formula implies the use of a pyramidsampling scheme, with spatial sampling interval Dx a bo

mo=

at each m level. We conducted a set of experiments to testthese theoretical insights.

Fig. 6 presents the reconstruction results using both thelinear summation method as well as the iterative method.In this reconstruction, only eight orientations and sevenscales (one octave apart) are used (K = 8, N = 1, bo = 0.8).The reasonable reconstruction result using the iterativemethod shows the completeness of the reconstruction(Fig. 6c), establishing that the wavelets indeed form aframe. The result using the linear summation method pro-vides a good approximation for the original image also(Fig. 6b), suggesting that, even at this relatively coarsesampling density, the frame is reasonably tight. As thenumber of orientations and the number of voices increasetoward the sampling rate of the visual cortex, the frame willtheoretically become tighter, and the result using this linearsummation method should be a better approximation of theoriginal image.

Fig. 6. Reconstruction of image ‘Lena’ (a) using the linear summationmethod (b) and the iterative error minimization method (c). The exam-ple is computed with a relatively tight frame (K = 8, N = 1, bo = 0.8).Wavelets with seven scales are used in a pyramid scheme. Even atthis coarse sampling density, the approximation to the original image isreasonable.

Fig. 7 presents the reconstruction results obtained whenonly three orientations are used (K = 3, N = 1, bo = 0.8). Thelinear summation results indicate that the frame is veryloose, and therefore linear summation is not a good ap-proximation (Fig. 7a). The iterative method, however, dem-onstrates that even three orientations can provide a com-

plete representation of the image (Fig. 7b). On the otherhand, when bo = 3.0, the reconstruction (K = 8, N = 1, bo = 3.0),even with the iterative method, is no longer complete be-cause this family of Gabor wavelets no longer forms aframe (Fig. 7c).

Fig. 7. (a) Reconstruction of image ‘Lena’ using a complete by looseframe (k = 3, N = 1, bo = 0.8) with the linear summation method showsthat this method will introduce spurious results when the frame is nottight. However, the iterative error minimization method yields a reason-able reconstruction (b), indicating that the frame is complete. When aframe is not complete, even the iterative method cannot eliminate thespurious effect due to incompleteness. (c) shows an example of thereconstruction using an incomplete frame (K = 8, N = 1, bo = 3).

The results above show that two or three orientations aresufficient for complete representation of the image. Why,then, does the brain construct a tight frame? Is there anyadvantage for sampling at such a large number of orienta-tion and frequency steps?

Grossman et al. [14] have proposed that the continuouswavelet transform, through its redundancy, might providea robust representation in the sense that the wavelet coeffi-cients for a function can be computed and stored with lowprecision and still can be used to reconstruct or representthe original function with much higher precision. Could theredundancy provided by a tight frame reduce the“ resolution burden” of the cortical cells? Suppose a neu-ron’s firing rate is limited in resolution, probably to threebits or four bits. Can an eight bit resolution image be repre-sented, for example, using cells with four bits in resolution?

To test this idea, we conducted another set of experi-ments in which the coefficients of the Gabor wavelets (i.e.,the responses of the simple cells) are severely quantized.The phase space sampling density is the same as in the pre-vious experiment, i.e., K = 8, N = 1, bo = 0.8. Fig. 8 shows thereconstruction of an image from the 2D Gabor wavelettransform with its coefficients quantized to four bits, threebits, and two bits, respectively. The quantization is doneindependently on each scale, and the range of the responseat each scale is stored and used subsequently in the recon-struction. We observe that even at such a coarse samplingdensity, the original image can be reasonably representedwhen the resolution of the coefficients or neuronal re-sponses is reduced to four bits or even three bits (Fig. 8b).The degradation of the image representation is gracefulwith the decrease in the resolution of the representationalelements (Fig. 8c).

8 CONCLUSION

In this paper, we have derived, based on physiological con-straints and the wavelet theory, a family of 2D Gabor



wavelets which model the receptive fields of the simplecells in the brain’s primary visual cortex. By generalizingDaubechies’s [5] frame criteria to 2D, we established theconditions under which a discrete class of continuous Ga-bor wavelets will provide complete representation of anyimage. We computed the frame bounds for the 2D Gaborwavelet families with 1.0 octave frequency bandwidth and1.5 octave frequency bandwidth, and found that the latterfamily provides a tighter frame at identical phase spacesampling densities. For the 1.5 octave family, the frame istightened significantly when the number of voices (i.e., thenumber of wavelet frequencies per octave) is increasedfrom one to two, but not much more with further increasein the number of voices (see Fig. 5). Even at one voice andeight orientations, the family forms a reasonably tightframe at bo = 0.8 (i.e., spatial sampling interval nx = 0.4O).Hence, image reconstruction can be achieved by linear su-perposition of the Gabor wavelets using the wavelet pro-jection coefficients. This finding suggests that such a parsi-monious sampling of the phase space is sufficient for anapproximation of the continuous wavelet transform— aninsight that is important for computer vision applicationsand brain modeling which involve continuous wavelettransform formulations [20].

Physiological data suggest that the cortical samplingdensity is far greater than the parsimonious sampling den-sity required for complete representation. In fact, it is denseenough to form a tight frame within a three to five octavesfrequency band at each eccentricity, providing an over-complete and redundant representation of the retinal imagewithin that frequency band. We have demonstrated thereare at least two advantages to such a redundant represen-tation: first, an image can be represented and easily recon-structed as a linear superposition of the receptive fieldstructures of the simple cells weighed by their firing rates;second, high precision information can be computed andstored by a population of low-precision neurons. It hasbeen suggested by information theoretic analysis that indi-vidual neurons can signal at most three to four bits of in-formation in their instantaneous responses [39], [31], [37].Earlier work on Gabor wavelets focused on the issues inimage compression and compact representation [8]. Ourwork suggests that precision in resolution achievedthrough redundancy may be a more relevant issue in brainmodeling. Furthermore, the high orientation and frequencysampling densities allow orientation and scale variables tobe represented precisely and explicitly in the primary visual

cortex. As a result, only a small number of cells need to beactivated at a time to signal precisely the structures in natu-ral images [30]. Finally, the visual cortex is primarily con-cerned with extracting and computing perceptual informa-tion such as segmenting a scene [19], [22], [23], rather thanre-presenting simply the retinal image. The simple cells,modeled by Gabor wavelets, with the redundancy providedby a tight frame, facilitate these computations by providingan ideal medium for representing surface texture and sur-face boundary with high resolution.

ACKNOWLEDGMENTS

The author thanks David Mumford, Josef Segman, PeterHallinan, Robert Hewes, and Song-Chun Zhu for helpfuldiscussion. The author is indebted to the three reviewersand particularly John Daugman for their thoughtful com-ments and careful correction of the manuscript. This re-search is supported in part by a Harvard-MIT Division ofHealth Science and Technology fellowship and in part byNational Science Foundation grant DMS 91-21266.

REFERENCES

[1] Z. Li and J.J. Atick, “ Towards a Theory of the Striate Cortex,”Neural Computation, vol. 6, pp. 127-146, 1994.

[2] V. Bargmann, P. Butera, L. Girardello, and J.R. Klauder, “ On theCompleteness of Coherent States,” Rep. Math. Phys., vol. 2, pp. 221-228, 1989.

[3] A.C. Bovik, M. Clark, and W.S. Geisler, “ Multichannel TextureAnalysis Using Localized Spatial Filters,” IEEE Trans. PatternAnalysis and Machine Intelligence, vol. 12, no. 1, pp. 55-73, Jan. 1990.

[4] F.W. Campbell and J.G. Robson, “ Application of Fourier Analysisto the Visibility of Gratings,” J. Physiology, vol. 197, pp. 551-566,1968.

[5] I. Daubechies, “ The Wavelet Transform, Time-Frequency Localiza-tion and Signal Analysis,” IEEE Trans. Information Theory, vol. 36,no. 5, pp. 961-1004, 1990.

[6] J.G. Daugman, “ Two-Dimensional Spectral Analysis of CorticalReceptive Field Profile,” Vision Research, vol. 20, pp. 847-856, 1980.

[7] J.G. Daugman, “ Uncertainty Relation for Resolution in Space,Spatial Frequency, and Orientation Optimized by Two-Dimensional Visual Cortical Filters,” J. Optical Soc. Amer., vol. 2,no. 7, pp. 1,160-1,169, 1985.

[8] J.G. Daugman, “ Complete Discrete 2-D Gabor Transforms byNeural Networks for Image Analysis and Compression,” IEEE Trans.Acoustics, Speech, and Signal Processing, vol. 36, no. 7, pp. 1,169-1,179,1988.

[9] J.G. Daugman, “ Quadrature-Phase Simple-Cell Pairs Are Appro-priately Described in Complex Analytic Form,” J. Optical Soc. Am.,vol. 10, no. 7, pp. 375-377, 1993.

[10] R.L. De Valois, D.G. Albrecht, and L.G. Thorell, “ Spatial Fre-quency Selectivity of Cells in Macaque Visual Cortex,” Vision Re-search, vol. 22, pp. 545-559, 1982.

[11] R.L. De Valois and K.K. De Valois, Spatial Vision. New York: Ox-ford Univ. Press, 1988.

[12] R.J. Duffin and A.C. Schaeffer, “ A Class of Nonharmonic FourierSeries,” Trans. Am. Math. Soc., vol. 72, pp. 341-366, 1952.

[13] D. Gabor, “ Theory of Communication,” J. IEE, vol. 93, pp. 429-459, 1946.

[14] A. Grossmann, R. Kronland-Martinet, and J. Morlet, “ Readingand Understanding Continuous Wavelet Transforms,” Wavelets,J.M. Combes, A. Grossmann, and P. Tchamitchian, eds., pp. 2-20.Berlin: Springer-Verlag, 1989.

[15] C.E. Heil and D.F. Walnut, “ Continuous and Discrete WaveletTransforms,” SIAM Review, vol. 31, no. 4, pp. 628-666, 1989.

[16] D.H. Hubel and T.N. Wiesel, “ Functional Architecture of Ma-caque Monkey Visual Cortex,” Proc. Royal Soc. B (London), vol. 198,pp. 1-59, 1978.

Fig. 8. Reconstruction of image ‘Lena’ using the linear summationmethod with coefficients quantized to four bits, three bits, and two bits,respectively. The example is computed with a relatively tight frame(K = 8, N = 1, bo = 0.8).



[17] J.P. Jones and L.A. Palmer, “ An Evaluation of the Two-DimensionalGabor Filter Model of Simple Receptive Fields in the Cat StriateCortex,” J. Neurophysiology, vol. 58, pp. 1,233-1,258, 1987.

[18] J.J. Kulikowski and P.O. Bishop, “ Fourier Analysis and SpatialRepresentation in the Visual Cortex,” Experientia, vol. 37, pp. 160-163, 1981.

[19] V.A.F. Lamme, “ The Neurophysiology of Figure-Ground Segre-gation in Primary Visual Cortex,” J. Neuroscience, vol. 10, no. 2,pp. 649-669, 1995.

[20] T.S. Lee, D. Mumford, and A.L. Yuille, “ Texture Segmentation byMinimizing Vector-Valued Energy Functionals: The Couple-Membrane Model,” Lecture Notes in Computer Science, 588, G.Sandini, ed., Computer Vision ECCV ’92, pp. 165-173. Springer-Verlag, 1992.

[21] T.S. Lee, “ A Bayesian Framework for Understanding the TextureSegmentation in the Primary Visual Cortex,” Vision Research, vol. 35,no. 18, pp. 2,643-2,657, 1995.

[22] T.S. Lee, D. Mumford, and P.H. Schiller, “ Neural Correlates ofBoundary and Medial Axis Representations in Primate VisualCortex.” Invest. Opth. Vis. Sci, vol. 36, p. 477, 1995.

[23] T.S. Lee, V.A.F. Lamme, and D. Mumford, “ The Role of V1 inScene Segmentation and Shape Representation,” unpublishedmanuscript.

[24] S. Mallat, “ A Theory for Multiresolution Signal Decomposition:The Wavelet Representation,” IEEE Trans. Pattern Analysis andMachine Intelligence, vol. 31, pp. 674-693, 1989.

[25] S. Marcelja, “ Mathematical Description of the Responses of Sim-ple Cortical Cells,” J. Optical Soc. Am., vol. 70, pp. 1,297-1,300,1980.

[26] Y. Meyer, Wavelets and Operators. New York: Cambridge Univ.Press, 1992.

[27] J. Morlet, G. Arens, E. Fourgeau, and D. Giard, “ Wave Propaga-tion and Sampling Theory— Part II: Sampling Theory and Com-plex Waves,” Geophysics, vol. 47, no. 2, pp. 222-236, 1982.

[28] J.A. Movshon and D.J. Tolhurst, “ On the Response Linearity ofNeurons in Cat Visual Cortex,” J. Physiology (London), vol. 249,pp. 56P-57P, 1975.

[29] D. Mumford, “ Neuronal Architectures for Pattern-TheoreticProblems,” Large Scale Neuronal Theories of the Brain, C. Koch, ed.MIT Press, 1993.

[30] B.A. Olshausen and D.J. Field, “ Sparse Coding of Natural ImagesProduces Localized, Oriented, Bandpass Receptive Fields,” un-published manuscript.

[31] L.M. Optican and B.J. Richmond, “ Temporal Encoding of Two-Dimensional Patterns by Single Units in Primate Inferior Tempo-ral Cortex. III. Information Theoretic Analysis,” J. Neurophyisology,vol. 57, pp. 162-178, 1987.

[32] D.A. Pollen and S.E. Feldon, “ Spatial Periodicities of PeriodicComplex Cells in the Visual Cortex Cluster at One-Half OctaveIntervals,” Invest. Ophth. Vis. Sci, pp. 429-434, Apr. 1979.

[33] D.A. Pollen and S.F. Ronner, “ Phase Relationship Between Adja-cent Simple Cells in the Visual Cortex,” Science, vol. 212, pp. 1,409-1,411, 1981.

[34] M. Porat and Y. Zeevi, “ The Generalized Gabor Scheme of ImageRepresentation in Biological and Machine Vision,” IEEE Trans.Pattern Analysis and Machine Intelligence, vol. 10, no. 4, pp. 452-468, 1988.

[35] J. Segman and Y.Y. Zeevi, “ A Wavelet-Type Approach to ImageAnalysis and Vision,” Brussels: NATO Book on Wavelets, Aug. 1992.

[36] M.S. Silverman, D.H. Grosof, R.L. De Valois, and S.D. Elfar,“ Spatial-Frequency Organization in Primate Striate Cortex,” Proc.Nat’l Academy of Science U.S.A., vol. 86, Jan. 1989.

[37] D.J. Tolhurst, “ The Amount of Information Transmitted AboutContrast by Neurons in the Cat’s Visual Cortex,” Visual Neurosci-ence, vol. 2, pp. 409-413, 1989.

[38] M.A. Webster and R.L. De Valois, “ Relationship Between SpatialFrequency and Orientation Tuning of Striate Cortex Cells,” J. Op-tical Soc. Am., vol. A2, no. 7, July 1985.

[39] G. Werner and V.B. Mountcastle, “ Neural Activity in Mechano-Receptive Cutaneous Afferents: Stimulus-Response Relations,Weber Functions, and Information Transmission,” J. Neurophysiol-ogy, vol. 28, pp. 359-397, 1965.

Tai Sing Lee received the SB degree in 1986and the PhD degree in engineering sciences in1993, both from Harvard University, Cambridge,Massachusetts. He also graduated from thePhD program in medical engineering and medi-cal physics from the Harvard-MIT Division ofHealth Science and Technology in 1993. From1993 to 1996, he was a postdoctoral fellow inneurophysiology in the Department of Brain andCognitive Sciences at MIT, and a postdoctoralfellow in biomedical physics in the Division of

Applied Sciences, Harvard University.Dr. Lee is currently an assistant professor in the Department of

Computer Science at Carnegie Mellon University and in the CMU-University of Pittsburgh Center of Neural Basis of Cognition. His re-search involves the application of mathematical, computational, andneurophysiological techniques to uncover the neural mechanisms un-derlying visual perception in the brain.

lee: image representation using 2d gabor wavelets 3 · lee: image representation using 2d gabor...

Documents