lecture 1 introductory mathematical ideaslecture 1 introductory mathematical ideas the term wavelet,...

Lecture 1

Introductory mathematical ideas

The term wavelet, meaning literally ‘little wave’, originated in the early 1980s in itsFrench version ‘ondelette’ in the work of Morlet and some French seismologists.

Prototypical example. Haar wavelet

-3

-2

-1

0

1

2

1

1

−1

h(x) =

1, 0 ≤ x < 12,

−1, 12 ≤ x < 1,

0, elsewhere.

12

Here the ‘wave’ property is shown in the graph and indicated also by the fact that h hasaverage zero i.e., ∫ ∞

−∞h(x) dx =

∫ 1

0

h(x) dx = 0,

so the Haar wavelet has some oscillation as do trigonometric functions, but unlike suchperiodic functions as sin and cosine it has compact support - it ‘lives’ on the interval [0, 1)and doesn’t oscillate everywhere on (−∞, ∞) - hence the addition of the word ‘little’.

Much of the impetus for this development, however, came from the realization that theideas underlying wavelets could be found in many different forms in many different areasof mathematics, science and engineering:

(I) Harmonic analysis:

(II) Mathematical physics:

(III) Electrical engineering, signal processing:

(IV) Geology:

(V) Image analysis:

This meant that researchers from many disparate fields were automatically attracted towavelets. As a result the development of the subject, both theory and applications, has‘exploded’ since the early 1980s. It is not our intention to cover all these topics, even if

1

2

time were available, nor can we hope to describe a lot of the ‘cutting-edge’ applicationsthat have been made of wavelets to these and other areas, but we shall try to developenough of the ideas that one can at least begin to explore the world of wavelets with anemphasis on signal analysis, admittedly from a mathematical point of view!

We shall use the words function and signal interchangably. A function will be a function

f = f(x), f = f(x, y)

of one or more variables (at most two in these notes), or a sequence

a = {an}n, a = {amn}m,n

with one or more indices. The continuous variable case corresponds to analogue signals,while the discrete case corresponds to digital signals. Whether continuous or discrete,the single variable should usually be thought of as time or space; on the other hand, twovariables should always be thought of as space variables so that the function or signal thencorresponds to an image, a still image that is, since moving images will involve three ormore variables.

(1.1) Energy. The ‘size’ of a function can be measured in many ways - the absolutemaximum or absolute minimum value was a standard way in calculus, for instance. Anaverage rather than a pointwise measure will be more useful to us, however.

As always we’ll take as our starting point the N -dimensional space CN of all N -tuplesz = (z1, z2, . . . , zN ) of complex numbers where ‘size’ or ‘Energy’ of z means the square

E(z) = |z1|2 + |z2|2 + . . .+ |zN |2 =(‖z‖CN

)2

of the usual Euclidean length. A point of view more rooted in algebraic structures, however,points the way to the future. Let’s denote by ZN the set

ZN ={

0, 1, 2, . . . , N − 1}

of integers; you may have learned already in algebraic structures that this set has anadditive structure defined on it, namely addition m + n = k where m + n ≡ k(modN)which in number theory courses you probably called modular addition. A sequence z inCN is then just a function z : ZN → C. But to appreciate why modular addition arises,set w = e2πi/N so that wN = 1; then wmwn = wm+n = wk. This will become fundamentalwhen we want to apply Fourier analysis to finite data sets.

There are natural extensions to functions f : X → C defined on any set X on whichsome notion of integration or summation and are defined. For example, when X = R =(−∞, ∞) or X = R2 the Energy of a analogue signal f is defined by

E(f) =

∫ ∞

−∞|f(x)|2 dx,

∫ ∞

−∞

∫ ∞

−∞|f(x, y)|2 dxdy .

3

while in the case of a (two-way) infinite digital signal a defined on X = Z or on Z2 theenergy is given by

E(a) =∞∑

n =−∞

|an|2,∞∑

m, n =−∞

|amn|2

respectively; addition on R, R2 or on Z, Z2 is well-known! A finite energy function (orsignal) is thus one for which E(f) < ∞. In the analogue case, the set of all finite energyfunctions f = X → C will be denoted by L2(X), and by ℓ2(X) in the discrete case ofdigital signals. In particular, therefore, we can, and often do, write ℓ2(ZN ) instead of CN .Typical examples of such functions/signals are

0 50 100 150 200 250 300-4

-3

-2

-1

0

1

2

3

4

5

‘transients’ ‘transients+noise’

in the one variable case, and

0 25 50 75 1001251501750

50

100

150

200

‘real data’ ‘synthetic data’

in the two variable case.

4

(1.2) Analogue to digital. Many discrete signals come from sampling continuous func-tions:

f(t) −→ {an}n, an = f( n

2k

)

where the integer n varies but the integer k is fixed. Here we have sampled f by evaluatingthe function uniformly at dyadic rationals, the fixed choice of k indicating the resolution,the larger k is the finer the resolution.

-2-10123456789

1 2−1

1

2

-2

-1

0

1

2

3

4

5

6

7

8

9

1 2−1

1

2

b

b

b

b

b

b

b

b

b

b

b

b

b

b

b

b

b

b

A/D

Such analogue to digital (A/D) conversion is central to communications technology. Butdoes A/D conversion necessarily lead to a loss in energy?

(1.3) Digital to analogue. At the heart of wavelet analysis is the reversal of A/Dconversion. Fix a function φ = φ(x) and for each pair of integers k, n define a newfunction

φkn(x) = 2k/2 φ(2kx− n).

The inclusion of the 2k/2-factor preserves energy since

E(φkn) = 2k

∫ ∞

−∞|φ(2kx− n)|2 dx =

∫ ∞

−∞|φ(y)|2 dy = E(φ)

after a change of variable x → y = 2kx − n (notice how we are using structural such asaddition and scaling properties of the set X on which φ is defined. A simple example anda few graphs suggest what the significance of φkn is. Consider the box function

5

-1 0 1 2

-1

0

1

2

1

1

b(x) =

{1, 0 ≤ x < 1,

0, elsewhere.

Then b(x) lives on [0, 1), whereas b(2kx) lives on [0, 1/2k), i.e., b(2kx) is a ‘speeded up’or ‘slowed down’ version of b(x) according as k > 0 or k < 0; on the other hand, b(x−n)lives on [n, n + 1), i.e., b(x − n) is a ‘delayed’ or ‘advanced’ version of b(x) according asn > 0 or n < 0. Putting the two together we see that bkn(x) lives on [2−kn, 2−k(n+ 1));in general mathematical terms,

supp(bkn

)=

[ n

2k,n+ 1

2k

).

For fixed k, therefore, we get a specific resolution, while varying n changes location. Someexamples of the bkn are shown in

-1

0

1

2

3

1 2 3 4−1−2

2

b0,−1

b1,0

b2,3

b−1,1

b0,1

Given a sequence a = {an} we can now associate a function

f(x) =∑

n

an bkn(x)

at each fixed resolution k. For example at resolution k = 1, we get

6

-2-10123456789

1 2−1

1

2

-2

-1

0

1

2

3

4

5

6

7

8

9

1 2−1

1

2

b

b

b

b

b

b

b

b

b

b

b

b

bc

bc

bc

bc

bc

bc

D/A

In addition,

E(f) =

∫ ∞

−∞

∣∣∑

n

an bkn(x)∣∣2 dx =

∑

n

|an|2∫ ∞

−∞|bkn(x)|2 dx = E(a),

since

∫ ∞

−∞bkm(x)bkn(x) dx =

{1, m = n,

0, m 6= n,

(see problems). Consequently, this D/A conversion a → f using the box function haspreserved energy. Notice that the composition

(D/A) ◦ (A/D) : f −→∑

n

f( n

2k

)bkn(x)

takes a function to an approximating function at a fixed resolution k. As the example,again at k = 1 we get

-2-10123456789

1 2−1

1

2

-2

-1

0

1

2

3

4

5

6

7

8

9

1 2−1

1

2

b

b

b

b

b

b

b

b

b

b

b

b

b

b

b

b

b

b

bc

bc

bc

bc

bc

bc

D/A ◦A/D

7

shows, however, using the box function gives a poor visual approximation which may notimprove much even at high resolution because of the discontinuity of b(x). This beginsto suggest that continuous functions will do a better job for us; perhaps differentiablefunctions will give an even greater improvement - the eye can perceive slope and concavity!

(1.4) Inner product spaces. Just as functions can be added to form a new function,or vectors in 3-space added to form a new vector, so the sum of two signals makes goodsense. Doubling the strength of a signal shows that (real) multiples of a signal also is anaturally occuring concept exactly as it is for functions and vectors. All this suggests thatcollections of functions, vectors, and signals share two common features: they have length(energy) and linear combinations of them can be formed. As the mathematical notion ofvector space captures the idea of linear combinations, what we need to do is add a notionof length to a vector space. With the basic idea from calculus of dot (or inner) product inmind, let’s introduce inner product spaces.

Definition. let V be a (real or complex) vector space. An inner product on V is a mapping

( . , .) : V × V −→ C such that

(i) (f, f) ≥ 0, (f, f) = 0 if and only if f = 0, (f, g) = (g, f),

(ii) (f + g, h) = (f, h) + (g, h), (λf, f) = λ (f, g),

for all f, g, h in V and λ in C. The Energy of f in V is defined by E(f) = (f, f).When V is a real vector space we require that ( . , .) : V × V −→ (−∞, ∞) and that

(f, g) = (g, f).

The usual dot product

u .v = (u1, u2, u3) . (v1, v2, v3) = u1 v1 + u2 v2 + u3

on vectors makes 3-space into a (real) inner product space. The energy

E(u ) = u21 + u2

2 + u23 = (u, u)

of u is then simply the square of the usual notion of length, ‖u‖, of u, agreeing with theearlier definition of energy for a signal {u1, u2, u3}. In the same way, the spaces of finiteenergy signals introduced earlier become inner product spaces under the respective innerproducts

(i) ℓ2(ZN ) = CN : for sequences z = {zn}n, w = {bn}n

(z, w) =

N−1∑

n = 0

zn wn ;

8

(ii) ℓ2: for sequences a = {an}n, b = {bn}n

(a, b) =∞∑

n =−∞

an bn ;

(iii) ℓ2(Z2): for sequences a = {amn}mn, b = {bmn}mn

(a, b) =

∞∑

m, n =−∞

amn bmn ;

(iv) L2[0, 1]: for functions f = f(x), g = g(x)

(f, g) =

∫ 1

0

f(x) g(x)dx ,

and correspondingly for functions on any interval [a, b].

(v) L2(−∞, ∞): for functions f = f(x), g = g(x)

(f, g) =

∫ ∞

−∞f(x) g(x)dx ,

and correspondingly for functions of several variables.

It will be convenient to collect together some properties of an inner product.

Theorem. Let V be a complex inner product space. Then

(i) (Cauchy-Schwarz inequality)

|(f, g)|2 ≤ E(f)E(g),

(ii) (Triangle inequality)

E1/2(f + g) ≤ E1/2(f) +E1/2(g),

(iii) (Polarization identity)

(f, g) = 14

{E(f + g) − E(f − g) + iE(f + ig) − iE(f − ig)

}

9

hold for all f, g in V.

Exactly the same results hold for a real inner product space except that the polarizationidentity simplifies to

(f, g) = 14

{E(f + g) −E(f − g)

}.

Proof of Theorem : (i) When g = 0 the inequality is obvious, so we can assume that g 6= 0.Thus it is enough to show that

∣∣∣(f,

g

E1/2(g)

)∣∣∣2

≤ E(f),

or, equivalently, that

E(g) = 1 =⇒ |(f, g)|2 ≤ E(f).

Now when E(g) = 1,

0 ≤ E(f − (f, g)g) = (f − (f, g) g, f − (f, g) g)

= E(f) + |(f, g)|2 − (f, (f, g) g)− ((f, g) g, f).

But

(f, (f, g) g) + ((f, g) g, f) = 2|(f, g)|2.

Consequently,

E(f)− |(f, g)|2 ≥ 0, E(g) = 1,

establishing the Cauchy-Schwarz inequality.

(ii) For any f, g in V,

0 ≤ E(f + g) = (f + g, f + g) = E(f) + E(g) + (f, g) + (g, f)

= E(f) +E(g) + (f, g) + (f, g).

Consequently, in view of the Cauchy–Schwarz inequality,

0 ≤ E(f + g) ≤ E(f) + E(g) + 2|(f, g)| ≤ (E1/2(f) +E1/2(g))2,

establishing also the triangle inequality.

(iii) Left as exercise (see problems) �

10

(1.5) Orthonormal families. Just as the dot product on 3-vectors has an equivalentgeometric formulation

u .v = ‖u‖ ‖v‖ cos θ,

with θ the angle between u and v, the inner product (f, g) provides also a measure ofthe ‘angle’ between f and g for a general inner product space. From an analytic point ofview, the inner product (f, g) of two functions provides us with a measure of how well fcorrelates with g because if f looks like g then (f, g) will be close to the energy E(g) of g;for instance, if g oscillates, then f will correlate well with g when f is oscillating like g.

The geometric interpretation of the dot product allowed us to introduce perpendicular

unit vectorsi = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)

in 3-space, (aka orthogonal ), so we can do the same in a general V. This is another of thereally key ideas in wavelets.

Definition. A subset {φn} of an inner product space V is said to be an orthonormal family

when

E(φn) = 1, (φm, φn) = 0 m 6= n,

i.e., the family consists of mutually orthogonal elements having unit energy.

For example, when the sequence ε(n) ={ε(n)ℓ

}ℓ

is defined by

ε(n)ℓ =

{1, ℓ = n,

0, ℓ 6= n,

the family{ε(n) : n = . . . ,−1, 0, 1, . . .

}is orthonormal in ℓ2; it is an obvious gener-

alization of the orthonormal family i, j, k for 3-space. But a vector v in 3-space can berepresented as

v = (v1, v2, v3) = v1i + v2j + v3k

where

v1 = (v, i ), v2 = (v, j ), v3 = (v, k );

naturally, there is a corresponding representation for any sequence a = {an} in the infinite-dimensional case ℓ2:

a =∑

n

an ε(n) =

∑

n

(a, ε(n)) ε(n).

This use of an orthonormal family to represent a signal in ℓ2 carries over to an arbitraryinner product space.

11

Definition. For a function f in an inner product space V the orthonormal series, S[f ],of f associated with an orthonormal family {φn} in V is defined by

S[f ] =∑

n

(f, φn)φn.

The numerical values (f, φn) are called the coefficients of f .

Notice that we don’t claim that f = S[f ] automatically holds. For instance, the family{i, j } is orthonormal in 3-space, but no vector in the z-direction can be represented interms of {i, j} since these vectors lie in the x-y plane; we obviously don’t have a largeenough family of orthonormal vectors! Making precise when f = S[f ] is a key step in thetheory.

Bessel’s Inequality. Let {φn} be an orthonormal family in an inner product space V.

Then the inequality

∑

n

|(f, φn)|2 ≤ E(f)

holds for all f in V; in other words, the sequence {(f, φn)}n of coefficients has finite energy

as a sequence in ℓ2, and E({(f, φn)}n

)≤ E(f).

Proof : using the symmetry and linearity properties of the inner product we see that

0 ≤ E(f −

N∑

n = 0

(f, φn)φn

)=

(f −

N∑

m = 0

(f, φm)φm , f −N∑

n = 0

(f, φn)φn

)

= E(f) − 2N∑

n = 0

|(f, φn)|2 +N∑

m, n =0

(f, φm) (f, φn) (φm, φn).

But by orthonormality,N∑

m, n = 0

(f, φm) (f, φn) (φm, φn) =

N∑

n = 0

|(f, φn)|2.

Consequently,

E(f)−N∑

n = 0

|(f, φn)|2 ≥ 0

for all choices of N . From this Bessel’s inequality follows immediately letting N −→ ∞,making obvious changes if the orthonormal family is indexed differently. �

In the example of {i, j} in 3-space, Bessel’s inequality becomes a strict inequality

|(v, i )|2 + |(v, j )|2 < E(v)

when v has a component in the z-direction, i.e., in the k-direction. This suggests whatwe have to do to guarantee that f = S[f ].

12

Definition. An orthonormal family {φn} in V is said to be complete when the equality∑

n

|(f, φn)|2 = E(f)

holds for all f in V; in other words, when Bessel’s inequality becomes an equality for all f .

Given a complete orthonormal family{φn

}∞

n=1it can be shown that

S[f ] =∑

n

(f, φn)φn = f

in the sense

(†) limN →∞

E(f −

N∑

n = 1

(f, φn)φn

)= 0.

When the orthonormal family is indexed by something other than {1, 2, . . .}, the partialsums have to be changed, but it’s usually clear how to do that. For instance, if the familyis indexed by {. . . , −1, 0, 1, . . . }, all the way from −∞ to ∞, then we would replace (†)with

(‡) limN →∞

E(f −

N∑

n =−N

(f, φn)φn

)= 0.

Given the importance of these ideas, let’s collect them together in one result.

Theorem. If {φn}n is a complete orthonormal family in an inner product space V, then

each f in V admits an orthonormal series representation

f = S[f ] =∑

n

(f, φn)φn.

Furthermore, the coefficients {(f, φn)}n satisfy the identities

(i) (Plancherel)

E(f) =∑

n

|(f, φn)|2 ,

(ii) (Parseval)

(f, g) =∑

n

(f, φn) (g, φn)

for all f, g in V.

Parseval’s equation follows immediately from Plancherel’s theorem by polarization (seeproof of a more general result in (1.7)), while Plancherel’s theorem is just the same as ourdefinition of completeness!

13

(1.6) Examples. 1. The simplest generalization of the {i, j, k} basis of R3 is the so-calledStandard Basis

ε(1) = (1, 0, 0, . . . , 0, 0), ε(2) = (0, 1, 0, . . . , 0, 0), . . . , ε(N) = (0, 0, 0, . . . , 0, 1)

of ℓ2(ZN ) and its generalization to the complete orthonormal family

{ ε(n) : −∞ < n <∞}in ℓ2 we have met already. This family will also be referred as the Standard Basis.

2. Of crucial importance to signal analysis is the so-called Fourier Basis: set w = e2πi/N .

Then each power wk, k ≥ 0, is an Nth root of unity in the sense that wN = 1. Now set

ξ(0) =1√N

(1, 1, 1, . . . , 1

), ξ(1) =

1√N

(1, w, w2, . . . , wN−1

),

...

ξ(k) =1√N

(1, wk, w2k, . . . , wk(N−1)

),

...

ξ(N−1) =1√N

(1, wN−1, w2(N−1, . . . , w(N−1)2

)

define N elements of ℓ2(ZN ) such that E(ξ(k)

)= 1 for each k. On other hand, by using

the geometric series fact:

1 + wn + w2n + · · ·+ wn(N−1) =1 − wnN

1 − w= 0 ,

it can be shown that the family

ξ(0) , ξ(1) , ξ(2) , . . . , ξ(N−1)

is a complete orthonormal family in ℓ2(ZN ).

3. In R2 the vectors

1√2

(i + j),

1√2

(i − j

)

form an orthonormal basis because they are just the rotation through 45◦ of the standard{i, j}-basis in the plane. This two dimensional example generalizes easily to ℓ2 in the sameway that {ε(n)}n generalizes {i, j}. Indeed, define ϕ(n), ϕ̃(n) in ℓ2 by

ϕ(n) =1√2

(ε(2n) + ε(2n+1)), ϕ̃(n) =

1√2

(ε(2n) − ε(2n+1)).

14

Then each of {ϕ(n)}n and {ϕ̃(n)}n is orthonormal in ℓ2; in addition, together the family{ϕ(n), ϕ̃(n)}n is a complete orthonormal family in ℓ2. Consequently, every x in ℓ2 admitsa representation

x =∑

n

(x, ϕ(n))ϕ(n) +∑

n

(x, ϕ̃(n)) ϕ̃(n).

In the signal processing literature, such bases were called block bases (can you see why?),and were very common prior to the appearance of wavelets. We shall meet these families{ϕ(n)}n , {ϕ̃(n)}n later in lecture 2 and crucially in lecture 4 in connection with filterbanks.

4. Since

∫ a+1

a

e2πimξ e−2πinξ dξ =

{1, m = n,

0, m 6= n,

the family {e2πinξ : −∞ < n < ∞} of period 1 functions is orthonormal in L2[a, a+ 1]for any choice of a; in particular, it is orthonormal in L2[−1

2 ,12 ] as well as in L2[0, 1]. It

can be shown that this family is also complete in L2[a, a+ 1] for any choice of a.

5. The family

{b0n : −∞ < n <∞}, b0n(x) = b(x− n),

of all integer delays of the box function b is orthonormal in L2(−∞, ∞) because

∫ ∞

−∞b0m(x) b0n(x) dx =

∫ m+1

m

b(x− n) dx =

{1, m = n,

0, m 6= n.

As any combination∑

n an b(x − n) is constant on intervals of length 1, however, thereis no way that the Haar function, for instance, can be written as

∑n an b(x− n). Hence

{b0n : −∞ < n <∞} is not complete in L2(−∞, ∞). Similarly, the family

{bkn : −∞ < n <∞}, bkn(x) = 2k/2 b(2kx− n),

is orthonormal, but not complete, in L2(−∞, ∞), for each fixed resolution k.

6. The family

{h0n : −∞ < n <∞}, h0n(x) = h(x− n),

of all integer delays of the Haar function h is orthonormal in L2(−∞, ∞). Since theinteger translate h0n = h(x−n) lives on [n, n+1) and has average 0 on that interval, the

15

box function b cannot be written as a linear combination∑

n an b(x − n); in particular,therefore, the family {h0n is not complete in L2(−∞, ∞). Similarly, the family

{hkn : −∞ < n <∞}, hkn(x) = 2k/2 h(2kx− n),

is orthonormal, but not complete, in L2(−∞, ∞), for each fixed resolution k.

7. By allowing all possible resolutions and integer delays of the Haar function, however,we do get a complete orthonormal family in L2(−∞, ∞). To be precise, when hkn is definedas in the previous example, then the family {hkn : −∞ < k, n <∞} is orthonormal sincecalculations show that

(hjm, hkℓ) = 2(j+k)/2

∫ ∞

−∞h(2jx−m) h(2kx− ℓ) dx =

{1, j = k, m = ℓ,

0, otherwise;

in addition, the family is complete in L2(−∞, ∞) so each finite energy function f admitsa represention

f(x) =∑

k, ℓ =−∞

(f, hkℓ) hkℓ(x).

This Haar decomposition is the prototype for all wavelet representations of finite energyanalogue signals for reasons we shall make clearer in lecture 4.

(1.7) Energy-preserving mappings, adjoints. In linear algebra a key role is playedby m× n matrices A = [Amn] because they define linear transformations

A : x −→ Ax =

a11 a12 . . . a1n

a21 a22 . . . a2n...

......

am1 am2 . . . amn

x1

x2...xn

by (right) matrix multiplication from Rn into R

m. The same is true in the complex case,of course. This idea extends to arbitrary linear transformations T : V −→ W from onevector space V into a possibly different vector space W . When V and W are inner productspaces, however, we shall want such T to ‘respect’ the energy of vectors in V . Let’s makethis precise because it too will be important in our development of the theory of wavelets.

Definition. A linear operator T : V −→ W from one inner product space V into a possibly

different inner product space W (aka linear mapping) is said to be bounded when there

is a constant such that the inequality

E(T f) ≤ const. E(f)

holds for all f in V. It is said to be energy-preserving when E(T f) = E(f) for all f .

16

If {φn} is orthonormal in V, then by Bessel’s inequality the coefficient mapping

T : f −→{

(f, φn)}

n

is bounded (with constant at most one) from V into ℓ2; in addition, this coefficient mappingis energy-preserving when {φn} is a complete orthonormal family. In the reverse direction,consider the D/A mapping

S : a = {an}n −→∑

n

an b(x− n)

from ℓ2 into L2(−∞, ∞). Because the integer translates b(x − n) are orthonormal inL2(−∞, ∞), we see that

E(Sa) =

∫ ∞

−∞

∣∣∣∑

n

an b(x− n)∣∣∣2

dx =∑

n

|an|2 = E(a).

Consequently, S is energy-preserving as a mapping from finite energy digital signals tofinite energy analogue signals. Similarly, the D/A mapping

Sk : a = {an}n −→∑

n

an bkn(x), bkn(x) = 2k/2b(2kx− n),

is energy-preserving at every fixed resolution k because the family {bkn}n is orthonormalfor each fixed k.

The previous examples show the close relationship between energy-preserving operatorsand orthonormal families. We need to make this precise in general. Suppose then thatT : V −→ W is an energy-preserving operator from one inner product space into a possiblydifferent one W. By the polarization identity, T preserves inner products also since

(T f, T g) = 14

{E(T (f + g))− E(T (f − g)) + iE(T (f + ig)) − iE(T (f − ig))

}

= 14

{E(f + g) − E(f − g) + iE(f + ig) − iE(f − ig)

}= (f, g)

(linearity of T ensures that T f + T g = T (f + g) etc). Hence, if {φn}n is orthonormal inV,

(T φm, T φn) = (φm, φn) =

{1, m = n,

0, m 6= n,

proving the following result.

17

Theorem. An energy-preserving mapping T : V −→ W from one inner product space

into a possibly different one W maps an orthonormal family {φn} in V to an orthonormal

family {T (φn)}n in W.

Finally, recall the idea of the adjoint

A∗ = At

=

a11 a21 . . . am1

a12 a22 . . . am2...

......

a1n a2n . . . amn

of the m × n matrix dealt with earlier. Then A∗ is an n ×m matrix, hence it defines alinear transformtion Rm −→ Rn by right matrix multiplication. But is there a deepermeaning to this? The answer lies in the notion of adjoint of a bounded operator betweeninner product spaces.

Definition. Let T : V −→ W be a bounded linear operator from one inner product space

into a possibly different one W. Then the adjoint T ∗ of T is the bounded linear operator

T ∗ : W −→ V such that (Tf, g) = (f, T ∗g) holds for all f in V and g in W.

When T is determined by an m×n matrix A, the adjoint of T is determined by the adjointmatrix A∗ of A (see problems).

A particularly interesting example for us is the adjoint of the coefficient mapping

T : V −→ ℓ2, T f = {(f, φn)}n

determined by an orthonormal family {φn}n in V. For then

T ∗ : {an}n −→∑

n

an φn

since

(Tf, a) =∑

n

(f, φn) an =(f,

∑

n

an φn

)= (f, T ∗a).

Notice that T ∗ is always energy-preserving whether or not {φn}n is complete in V. Wehave used this several times already. For instance, the family {b0n}n, b0n(x) = b(x− n),is orthonormal in L2(−∞, ∞), and the adjoint of the coefficient mapping

f −→ {(f, b0n)}n, (f, b0n) =

∫ n+1

n

f(y) dy

is simply the D/A mapping

S : {an}n −→∑

n

anb(x− n).

18

(1.8) Summary, compression. So what have we achieved so far? It has been convenientto think of collections of signals, whether analogue or digital, and functions, of continuousor discrete variables, as belonging to an inner product space. This combined the linearstructure of signals with the notion of energy and correlation. Families {φn}n of particularelements of that space, orthonormal with respect to the inner product, could then be chosenwith the crucial idea of representing a signal as an orthonormal series f =

∑n (f, φn)φn

when the family is complete. Why bother? Well, it will surely be helpful if the members ofthat orthonormal family somehow capture the fundamental properties of the model we arestudying - they are to be the basic building blocks for the model. Furthermore, complete-ness ensures that the energy of f is completely determined by the energy of the coefficients{(f, φn)}n. There are then two things we can do using these coefficients:

(i) by dropping very small coefficients from∑

n (f, φn)φn we get only an approx-imation to f , but the terms (f, φn)φn omitted will have very small energy, sopresumably the approximation may still be a ‘good’ representation of f . In otherwords, we have represented the original f , up to a very small error, by fewer terms,thus providing compression of f .

(ii) If the orthonormal family captures key features of a model - say, sharp spikesor sudden oscillations - then the large coefficients {(f, φn)}n in the representationf =

∑n (f, φn)φn probably identify sharp edges in a signal, thus providing feature

detection: think ‘little wave’ property.

The four signals pictured in section (1.1) illustrate these ideas well. For example, detec-tion of transients could be an important issue for the first of the one-dimensional signals,while for the two fingerprint pictures the question might be one of detecting or charac-terizing edges. For efficient transmission or storage of fingerprints, however, compressionwould be crucial. In practice, say in medical imaging, most transients or edges, however,would be hidden in lots of noise as in the second of the one-dimensional signals. So thequestion would become one of isolating features within noise; or one could simply ask forde-noising. Whatever the question, however, the ‘moral of the story’ is always the same:everything depends on the choice of orthonormal family {φn}n.

19

Problems.

1. Establish the parallelogram law

E(f + g) + E(f − g) = 2E(f) + 2E(g), f, g ∈ V

for an inner product space V. Why is this called the parallelogram law?

2. Establish the polarization identity

(f, g) = 14

{E(f + g) −E(f − g) + iE(f + ig) − iE(f − ig)

}

for f, g in an inner product space.

There are many examples of inner product spaces. The next three problems discussexamples quite different from the ones given in this lecture.

3. Recall that the trace of a 2 × 2 matrix is defined by

trace(A) = a+ d, A =

[a bc d

].

Use this to show that the set C2×2 of all such 2×2 matrices having complex entries becomesan inner product space with respect to

(A, B) = trace(AB∗).

Exhibit at least one family in C2×2 which is complete and orthonormal with respect tothis inner product.

4. Modify the inner product on 2 × 2 matrices introduced in problem 3 so that thefamily C2×3 of 2 × 3 matrices having complex entries becomes an inner product space.(Don’t forget to show that (A, B) = (B, A) for all 2 × 3-matrices A, B.)

5. Denote by A the family of all (real-valued) functions f that have a Taylor seriesexpansion

f(x) =∞∑

n = 0

1

n!f (n)(0) xn

whose interval of convergence is some interval about the origin, the interval of convergencebeing allowed to vary with f . Notice that

f(D) =

∞∑

n = 0

1

n!f (n)(0)Dn, D =

d

dx,

then makes sense as a differential operator. Show that A becomes an inner product spacewith respect to the inner product

(f, g) = f(D)g(x)∣∣∣x =0

;

20

and that the corresponding energy E(f) of a function in A is given by

E(f) =∞∑

n = 0

1

n!(f (n)(0))2.

(a) Show that the family { 1√n!xn : n = 0, 1, . . . , } is orthonormal and complete in

A with respect to this inner product.

(b) Interpret the degree N Taylor polynomial

PN (f, x) =N∑

n = 0

1

n!f (n)(0) xn

as a ‘compression’ of f . What is the loss in energy by using this compression? Doyou think this is a good measure of the compression of f?

6. Show that every 2 × 2 matrix A acting by right matrix multiplication

A =

[a bc d

]: z = (z1 z2) −→ (az1 + bz2, cz1 + dz2)

on C2 automatically defines a bounded operator on C2 such that

E(Az) ≤ E(A)E(z), x = (z1, z2) ∈ C2.

(Hint: apply Cauchy-Schwarz to the entries in Az).

The next few problems establish orthonormality properties of the functions

bkn(x) = 2k/2 b(2kx− n), hkn(x) = 2k/2 h(2kx− n)

obtained by stretching and translating the box and Haar functions. One of the best waysof providing careful proofs of these properties as well as beginning to develop a goodunderstanding of the more general wavelet functions

φkn(x) = 2k/2 φ(2kx− n), ψkn(x) = 2k/2 ψ(2kx− n)

defined later is to look at the geometry and number theory associated with dyadic intervals.Recall from the lecture that to each pair of integers k, n there corresponds a dyadic interval

Ikn = [2−kn, 2−k(n+ 1)).

Often we get ‘clever’ and denote such intervals by I, I ′ etc. when specific values of k, naren’t needed.

21

7. Establish the following results for dyadic intervals.

(a) Show that the functions bkn and hkn ‘live on’ Ikn in the sense that they are ZEROoff Ikn.

(b) Show that the family { Ikn : −∞ < n,∞ } is a partition of (−∞, ∞) for eachfixed choice of k. (Think of this as the partition of (−∞, ∞) at resolution k.)

(c) Let I, I ′, I 6= I ′ be dyadic intervals such that I 6= I ′. Show that exactly one of

I ⊂ I ′, I ′ ⊂ I, I ∩ I ′ = ∅,

holds. Show furthermore that when I ⊂ I ′, then either I lies in the first half[2−kn, 2−k(n+ 1

2 )) of I ′ or in the second half [2−k(n+ 12), 2−k(n+ 1)) of I ′.

8. Show that the family { bkn : −∞ < n < ∞ } is orthonormal in L2(−∞, ∞) in thesense that

(bkm, bkn) =

{1, m = n,

0, m 6= n,

for each fixed k.

9. Use property (c) in problem 7 to establish the following results:

(a) the family {hkn : −∞ < k, n < ∞ } is orthonormal in L2(−∞, ∞) in the sensethat

(hjm, hkn) =

{1, j = k, m = n,

0, otherwise.

(b) the family {hjm : j ≥ k, −∞ < m < ∞ } is orthogonal to the family { bkn :−∞ < n <∞ } for each fixed k in the sense that

(hjm, bkn) = 0

for all j ≥ k and all integers m, n.

10. Let T : V −→ W be energy-preserving. Provide an example to show that {T (φn)}n

need not be a complete orthonormal family in W even if {φn}n is a complete orthonormalfamily in V.

lecture 1 introductory mathematical ideaslecture 1 introductory mathematical ideas the term wavelet,...

Documents