AMATH 353 lecture notes, University of Waterloo (links.uwaterloo.ca/amath353docs/set10.pdf)


Lecture 28

Solution of Heat Equation via Fourier Transforms and Convolution Theorem

Relevant sections of text: 10.4.2, 10.4.3

In the previous lecture, we derived the unique solution to the heat/diffusion equation on R,

∂u/∂t = k ∂²u/∂x²,  −∞ < x < ∞,  (1)

with initial condition

u(x, 0) = f(x). (2)

The result was

u(x, t) = ∫_{−∞}^{∞} f(s) h_t(x − s) ds,  t > 0,  (3)

where the “heat kernel” function h_t(x) is given by

h_t(x) = (1/√(4πkt)) e^{−x²/(4kt)},  t > 0.  (4)

This is a fundamental result – it states that the solution u(x, t) is the spatial convolution of the functions f(x) and h_t(x).

The derivation given in the last lecture may seem a little contrived, since we took the integral form

of the solution u(x, t) derived in a previous lecture and then performed some manipulations inside the

integral.

In this lecture, we provide another derivation, in terms of a convolution theorem for Fourier

transforms. Starting with the heat equation in (1), we take Fourier transforms of both sides, i.e.,

F(∂u/∂t) = k F(∂²u/∂x²),  (5)

where we have acknowledged the linearity of the Fourier transform in moving the constant k out of

the transform. It now remains to make sense of these Fourier transforms. You may recall that in the

case of Laplace transforms (LTs), the LT of a derivative of a function could be related to the LT of

the function.

First, let us recall the definition of the FT of a function u(x, t):

F(u) = U(ω, t) = (1/2π) ∫_{−∞}^{∞} u(x, t) e^{iωx} dx.  (6)


This implies that

F(∂u/∂t) = (1/2π) ∫_{−∞}^{∞} (∂u/∂t)(x, t) e^{iωx} dx.  (7)

The only t-dependence in the integral on the right is in the integrand u(x, t). As a result, we may

write

F(∂u/∂t) = ∂/∂t [ (1/2π) ∫_{−∞}^{∞} u(x, t) e^{iωx} dx ] = (∂/∂t) U(ω, t).  (8)

In other words, the FT of the partial time-derivative of u(x, t) is simply the partial time-derivative of U(ω, t).

In order to make sense of the right hand side of Eq. (5), we should first examine the FT of ∂u/∂x,

i.e.,

F(∂u/∂x) = (1/2π) ∫_{−∞}^{∞} (∂u/∂x)(x, t) e^{iωx} dx.  (9)

In this case, of course, we may not pull the partial derivative ∂/∂x out of the integral, since the variable

x is involved in the integration procedure. This integral looks like it is begging for an integration by

parts, so let’s try it: We’ll set f = e^{iωx} and dg = (∂u/∂x) dx to yield

(1/2π) ∫_{−∞}^{∞} (∂u/∂x)(x, t) e^{iωx} dx = (1/2π) e^{iωx} u(x, t) |_{−∞}^{∞} − (1/2π)(iω) ∫_{−∞}^{∞} u(x, t) e^{iωx} dx.  (10)

Recall that the boundary conditions for the heat equation on the infinite interval were

u(x, t) → 0 as x → ±∞. (11)

As a result, the contributions to the first term vanish and we are left with the integral. But the

integral is simply a multiple of the Fourier transform of u, i.e.,

F(∂u/∂x) = −iω F(u) = −iω U(ω, t).  (12)

We may iterate this result to obtain the FT of ∂2u/∂x2:

F(∂²u/∂x²) = F( ∂/∂x (∂u/∂x) ) = −iω F(∂u/∂x) = (−iω)² F(u) = −ω² F(u).  (13)
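The derivative rule (12) is easy to check numerically. The sketch below is an illustrative check of my own (not part of the lecture): it approximates the transform of Eq. (6) by a Riemann sum for the test function u(x) = e^{−x²}, whose derivative is known exactly, and compares F(u′) against −iωF(u) at a few arbitrary frequencies.

```python
import numpy as np

# Verify Eq. (12), F(du/dx) = -i*omega*F(u), for u(x) = exp(-x^2),
# using the transform convention of Eq. (6):
#   F(u)(w) = (1/2pi) * Int u(x) e^{iwx} dx.
x = np.linspace(-20.0, 20.0, 40001)
dx = x[1] - x[0]
u = np.exp(-x**2)
du = -2.0 * x * np.exp(-x**2)          # exact derivative u'(x)

def ft(f, w):
    """Riemann-sum approximation of (1/2pi) * Int f(x) e^{iwx} dx."""
    return np.sum(f * np.exp(1j * w * x)) * dx / (2.0 * np.pi)

for w in (0.5, 1.0, 2.0):
    lhs = ft(du, w)                     # F(u')
    rhs = -1j * w * ft(u, w)            # -i*w*F(u), Eq. (12)
    assert abs(lhs - rhs) < 1e-6
```

The boundary term of the integration by parts vanishes here because the Gaussian decays rapidly, which is exactly the condition (11) used above.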


We may now substitute the results of these calculations into Eq. (5) to give

∂U(ω, t)/∂t = −kω² U(ω, t).  (14)

Taking FTs of both sides of the heat equation converts a PDE involving both partial derivatives in x

and t into a PDE that has only partial derivatives in t. This means that we can solve Eq. (14) as we

would an ordinary differential equation in the independent variable t – in essence, we may ignore any

dependence on the ω variable.

The above equation may be solved in the same way as we solve a first-order linear ODE in t.

(Actually it is also separable.) We write it as

∂U(ω, t)/∂t + kω² U(ω, t) = 0,  (15)

and note that the integrating factor is I(t) = e^{kω²t} to give

∂/∂t [ e^{kω²t} U(ω, t) ] = 0.  (16)

Integrating (partially) with respect to t yields

e^{kω²t} U(ω, t) = c,  (17)

or

U(ω, t) = c e^{−kω²t},  (18)

where c is a “constant” with respect to partial differentiation by t. This means that c can be a function

of ω. As such, we’ll write

U(ω, t) = c(ω) e^{−kω²t}.  (19)

You can differentiate this expression partially with respect to t to check that it satisfies Eq. (14).

Now notice that at time t = 0,

U(ω, 0) = c(ω). (20)

In other words, c(ω) is the FT of the function u(x, 0). But this is the initial temperature distribution

f(x)! In other words

c(ω) = (1/2π) ∫_{−∞}^{∞} f(x) e^{iωx} dx = F(ω),  (21)

where we have written F (ω) to represent the FT of f(x). Therefore, Eq. (19) may be rewritten as

U(ω, t) = F(ω) e^{−kω²t}.  (22)


Now, it seems that we have almost solved our heat equation problem. All we have to do is to take

the inverse FT of both sides to retrieve u(x, t). But how do we take the inverse FT of the right-hand

side? The function e^{−kω²t} is not a complex exponential in x, e.g., e^{iωx}, so that we cannot use a

shift theorem. On the other hand, if we may consider this Gaussian as a Fourier transform, then the

right-hand side becomes a product of Fourier transforms, i.e.,

U(ω, t) = F(ω)G(ω, t),  G(ω, t) = e^{−kω²t}.  (23)

You may recall that there is a convolution theorem for products of Laplace transforms – there is also

a convolution theorem for Fourier transforms:

Convolution Theorem for Fourier Transforms: Let F(f) = F and F(g) = G. Then, assuming

that all of the integrals in the equation below exist,

F⁻¹(FG) = (1/2π) ∫_{−∞}^{∞} f(s) g(x − s) ds = (1/2π) (f ∗ g)(x).  (24)

Or, equivalently,

F( (1/2π) f ∗ g ) = F(ω)G(ω).  (25)

Proof: By definition

F⁻¹(FG) = ∫_{−∞}^{∞} F(ω)G(ω) e^{−iωx} dω.  (26)

We replace F(ω) by its definition as an integral, using s as the variable of integration, i.e.,

F(ω) = (1/2π) ∫_{−∞}^{∞} f(s) e^{iωs} ds,  (27)

so that

F⁻¹(FG) = ∫_{−∞}^{∞} [ (1/2π) ∫_{−∞}^{∞} f(s) e^{iωs} ds ] G(ω) e^{−iωx} dω.  (28)

We now reverse the order of integration. (Theoretically, this should involve a little care, since the

intervals of integration are infinite. We assume that the functions in the integrand are suitably well-

behaved, i.e., piecewise continuous, L1-integrable and then invoke “Fubini’s Theorem.”) The result

is

F⁻¹(FG) = (1/2π) ∫_{−∞}^{∞} f(s) [ ∫_{−∞}^{∞} G(ω) e^{−iω(x−s)} dω ] ds.  (29)


But, by definition, the integral in the square brackets is the inverse Fourier transform of G, i.e., the

function g, but evaluated at x− s, because of the term “x− s” in the complex exponential. The result

is

F⁻¹(FG) = (1/2π) ∫_{−∞}^{∞} f(s) g(x − s) ds,  (30)

which proves the theorem. Note the similarity – at least in idea – to the convolution theorem for

Laplace transforms. In that case, the convolution between two functions was defined slightly differently.

As well, there is an additional factor of 1/2π which arises from the asymmetric appearance of this

factor in the FT but not in the inverse FT.
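The theorem can also be confirmed numerically. Below is a small sketch of my own (with f = g = e^{−x²} and ad hoc grids, not from the notes) that computes both sides of Eq. (24) at a sample point by direct quadrature, using the conventions above: F(f)(ω) = (1/2π)∫f e^{iωx} dx and F⁻¹(H)(x) = ∫H e^{−iωx} dω.

```python
import numpy as np

# Both sides of Eq. (24) at a sample point x0, for f = g = exp(-x^2).
x = np.linspace(-12.0, 12.0, 1201); dx = x[1] - x[0]
w = np.linspace(-12.0, 12.0, 1201); dw = w[1] - w[0]
f = np.exp(-x**2)

# F(w) on the w-grid: (1/2pi) Int f(x) e^{iwx} dx for each w
F = (np.exp(1j * np.outer(w, x)) @ f) * dx / (2.0 * np.pi)
G = F.copy()                                    # g = f here, so G = F

x0 = 0.7
lhs = np.sum(F * G * np.exp(-1j * w * x0)) * dw   # F^{-1}(FG)(x0)
conv = np.sum(f * np.exp(-(x0 - x)**2)) * dx      # (f*g)(x0) by quadrature
rhs = conv / (2.0 * np.pi)                        # (1/2pi)(f*g)(x0)
assert abs(lhs - rhs) < 1e-6
```

The 1/2π mismatch between the forward and inverse transforms is exactly why the factor appears in front of the convolution.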

We now wish to apply the above convolution theorem to Eq. (22). This, of course, means that we

must be able to produce the inverse FT of the Gaussian G(ω, t) = e^{−kω²t}. But we did this a couple of lectures ago – it is the function

g_t(x) = √(π/(kt)) e^{−x²/(4kt)}.  (31)

From the Convolution Theorem for FTs, then, we have

u(x, t) = (1/2π) ∫_{−∞}^{∞} f(s) g_t(x − s) ds

= ∫_{−∞}^{∞} f(s) (1/√(4πkt)) e^{−(x−s)²/(4kt)} ds

= ∫_{−∞}^{∞} f(s) h_t(x − s) ds,  (32)

which is precisely the result obtained in the previous lecture. Note that the functions g_t(x) and h_t(x) are related by the factor 1/2π, i.e.,

h_t(x) = (1/2π) g_t(x).  (33)
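As a sanity check, the convolution formula can be tested numerically. The sketch below (illustrative only; k = 1, a Gaussian initial profile, and the grid are my own arbitrary choices) builds u(x, t) from Eq. (32) by quadrature and confirms that it satisfies the heat equation to finite-difference accuracy.

```python
import numpy as np

# u(x,t) = Int f(s) h_t(x - s) ds, evaluated by quadrature, then checked
# against du/dt = k d2u/dx2 using centered finite differences.
k = 1.0
x = np.linspace(-15.0, 15.0, 1501); dx = x[1] - x[0]
f = np.exp(-x**2)                      # initial profile u(x, 0) = f(x)

def evolve(t):
    """u(., t) on the grid: heat-kernel convolution of f."""
    ht = np.exp(-(x[:, None] - x[None, :])**2 / (4*k*t)) / np.sqrt(4*np.pi*k*t)
    return (ht @ f) * dx

t, eps = 1.0, 1e-4
u = evolve(t)
dudt = (evolve(t + eps) - evolve(t - eps)) / (2.0 * eps)   # centered in t
uxx = (u[2:] - 2.0*u[1:-1] + u[:-2]) / dx**2               # centered in x
assert np.max(np.abs(dudt[1:-1] - k * uxx)) < 1e-3
```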


Lecture 29

Fourier Transforms (cont’d)

Let’s review our results to date regarding the heat equation on the (infinite) real line R:

∂u/∂t = k ∂²u/∂x²,  −∞ < x < ∞,  (34)

with initial condition

u(x, 0) = f(x),  f ∈ L²(R).  (35)

The condition that f ∈ L²(R) implies zero-endpoint conditions, i.e.,

f(x) → 0 as x → ±∞.  (36)

By taking Fourier transforms of both sides of Eq. (34), we arrive at the following equation of evolution

of the Fourier transform U(ω, t) of u(x, t):

(∂U/∂t)(ω, t) = −kω² U(ω, t),  (37)

with initial condition

U(ω, 0) = F (ω) = F(f). (38)

Since there are no partial derivatives with respect to ω in this equation, we may solve it as we would

an ordinary differential equation, in fact, a first order, linear (as well as separable) differential equation

with respect to the independent variable t. The solution was easily found to be

U(ω, t) = F(ω) e^{−kω²t}.  (39)

We may now obtain the solution u(x, t) to the heat equation by taking inverse Fourier transforms

(IFTs) of both sides of this equation. The IFT of the LHS is u(x, t). The RHS of (39) is a product of functions, namely, F(ω) and G(ω) = e^{−kω²t}. From the Convolution Theorem for FTs, the IFT of this product is a convolution of the IFTs of F and G, i.e.,

u(x, t) = (1/2π) ∫_{−∞}^{∞} f(s) g_t(x − s) ds,  (40)

where

g_t(x) = F⁻¹[G(ω)] = √(π/(kt)) e^{−x²/(4kt)}.  (41)


The final result is

u(x, t) = ∫_{−∞}^{∞} f(s) h_t(x − s) ds,  (42)

where

h_t(x) = (1/√(4πkt)) e^{−x²/(4kt)}  (43)

is the heat kernel. The heat kernel h_t(x) is a Gaussian function that “spreads out” in time: its standard deviation is given by σ(t) = √(2kt). The convolution in Eq. (42) represents a weighted averaging of

the function f(s) about the point x – as time increases, this weighted averaging includes more points

away from x, which accounts for the smoothing of the temperature distribution u(x, t).
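The Gaussian shape of the kernel is easy to confirm: h_t is exactly the normal density with mean 0 and variance 2kt. A quick numerical check (the values of k and t are illustrative choices of mine):

```python
import numpy as np

# h_t(x) = exp(-x^2/(4kt)) / sqrt(4 pi k t) is a Gaussian density:
# unit mass and standard deviation sqrt(2kt).
k, t = 0.7, 1.3
x = np.linspace(-30.0, 30.0, 60001); dx = x[1] - x[0]
h = np.exp(-x**2 / (4.0 * k * t)) / np.sqrt(4.0 * np.pi * k * t)

mass = np.sum(h) * dx                   # total integral of the kernel
sigma = np.sqrt(np.sum(x**2 * h) * dx)  # standard deviation (mean is 0)
assert abs(mass - 1.0) < 1e-8
assert abs(sigma - np.sqrt(2.0 * k * t)) < 1e-6
```

Unit mass is what makes the convolution a weighted average, and the growing σ(t) is what widens the averaging window.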

You are already familiar with such smoothing from our study of the heat/diffusion equation in

1D on a finite interval, but here is an illustration of the qualitative aspects of such smoothing. In the

figure below, we show the graph of a temperature distribution that exhibits a concentration of heat

energy around the position x = a. At a time t₂ > t₁, this heat energy distribution has been somewhat

smoothened – some of the heat in the region of concentration has been transported to points farther

away from x = a:

(Figure: a temperature profile u(x, t₁) with heat concentrated near x = a, and the smoother profile u(x, t₂), t₂ > t₁.)

In the figure below, we illustrate how a discontinuity in the temperature function gets smoothened

into a continuous distribution. At time t = 0, there is a discontinuity in the temperature function

u(x, 0) = f(x) at x = a. Recall that we can obtain all future distributions u(x, t) by convolving f(x)

with the Gaussian heat kernel. For some representative time t > 0, but not too large, three such

Gaussian kernels are shown at points x1 < a, x2 = a and x3 > a. Since t is not too large, the Gaussian

kernels are not too wide. A convolution of u(x, 0) with these kernels at x1 and x3 will not change the

distribution too significantly, since u(x, 0) is constant over most of the region where these kernels have

significant values. However, the convolution of u(x, 0) with the Gaussian kernel centered at the point

of discontinuity x = x2 will involve values of u(x, 0) to the left of the discontinuity as well as values of

u(x, 0) to the right. This will lead to a significant averaging of the temperature values, as shown in

the figure. At a time t2 > t1, the averaging of u(x, 0) with even wider Gaussian kernels will produce


even greater smoothing. We shall return to this phenomenon in a later section.

(Figure: initial profile u(x, 0) with a discontinuity at x = a; Gaussian kernels centered at x₁ < a, x₂ = a, x₃ > a; smoothed profiles u(x, t) for t₁ > 0 and t₂ > t₁.)

Let’s now return to the evolution of the Fourier transform function U(ω, t) in Eq. (39). From this

equation, we see that

U(0, t) = F(0) for all t, and U(ω, t) → 0 as t → ∞ for all ω ≠ 0.  (44)

This means that in the limit t → ∞, U(ω, t) approaches the zero function – it is nonzero only at the

point ω = 0. Let’s call this limiting function U_∞(ω), i.e.,

U_∞(ω) = F(0) for ω = 0, and U_∞(ω) = 0 for ω ≠ 0.  (45)

Then

U(ω, t) → U_∞(ω) as t → ∞ for all ω ∈ R.  (46)

The inverse Fourier transform of this limiting function, which we’ll call u∞(x), is, by definition, given

by

u_∞(x) = ∫_{−∞}^{∞} U_∞(ω) e^{−iωx} dω = 0.  (47)

(The possibly nonzero value of U_∞(0) does not contribute to the integral.) Taking inverse Fourier

transforms of each side of Eq. (46), we obtain the result

u(x, t) → 0 as t → ∞ for all x ∈ R. (48)

The fact that we start with a nonzero initial temperature distribution u(x, 0) = f(x) which evolves

to the zero solution on the real line might be somewhat bothersome from the viewpoint of conservation

of energy. Where did all of that energy go? We claim that it is still there, i.e., on the line, but that

it has dispersed over the entire infinite line.


To see this, let’s start by computing the total thermal energy associated with the temperature

distribution u(x, 0) = f(x) for x ∈ R. We’ll assume that the real line represents a homogeneous rod of

constant cross-sectional area A and lineal density ρ. Furthermore, we assume that the thermal energy

density function is given by

e(x, t) = cρ[u(x, t) − u₀].  (49)

And for simplicity, we again assume that u₀ = 0.

The initial total thermal energy of the infinite rod is then given by

E(0) = cρ ∫_{−∞}^{∞} u(x, 0) A dx = cρA ∫_{−∞}^{∞} f(x) dx.  (50)

We assume that this integral is finite. The thermal energy E(t) at any time t > 0 is then given by

E(t) = cρA ∫_{−∞}^{∞} u(x, t) dx.  (51)

Let’s now compute the rate of change E′(t):

E′(t) = cρA d/dt ∫_{−∞}^{∞} u(x, t) dx

= cρA ∫_{−∞}^{∞} (∂u/∂t)(x, t) dx   (by Leibniz rule)

= cρA ∫_{−∞}^{∞} k (∂²u/∂x²)(x, t) dx   (since u is a solution to the heat equation)

= ckρA [ (∂u/∂x)(x, t) ]_{x→−∞}^{x→∞}   (Fundamental Theorem of Calculus II)

= 0.  (52)

The final line comes from the fact that the partial derivative ∂u/∂x must go to zero as x → ±∞ since

u → 0 as x → ±∞.

Since E′(t) = 0 for all t > 0, it follows that

E(t) = E(0), t ≥ 0. (53)

Even though u(x, t) → 0 as t → ∞, the total energy remains constant – it simply becomes spread out

over the entire infinite real line R.

Without loss of generality, we assume that the initial temperature function f(x) is nonnegative

for all x ∈ R and u(x, t) ≥ 0 for all x ∈ R and t ≥ 0. Then if we strip away the constant multiplicative


factor cρA from the above result, we have

∫_{−∞}^{∞} u(x, t) dx = ∫_{−∞}^{∞} |u(x, t)| dx = ∫_{−∞}^{∞} |f(x)| dx.  (54)

This last line defines the so-called “L¹ norm” of f, denoted as ‖f‖₁. This comes from the definition of the space of integrable functions, L¹(R) (as opposed to the square-integrable functions, L²(R)):

L¹(R) = { f : R → R | ‖f‖₁ = ∫_{−∞}^{∞} |f(x)| dx < ∞ }.  (55)

In other words, the L1 norm of the temperature function u(x, t) remains constant in time, even though

u(x, t) → 0.

On the other hand, we now show that the L2 norm of u(x, t) goes to zero as t → ∞, i.e.,

‖u(x, t)‖₂ = [ ∫_{−∞}^{∞} |u(x, t)|² dx ]^{1/2} → 0 as t → ∞.  (56)

To show this, let’s compute the time derivative of the squared L2 norm of u:

d/dt ‖u‖₂² = d/dt ∫_{−∞}^{∞} u(x, t)² dx

= ∫_{−∞}^{∞} ∂/∂t [u(x, t)²] dx   (Leibniz rule)

= 2 ∫_{−∞}^{∞} u (∂u/∂t) dx

= 2k ∫_{−∞}^{∞} u (∂²u/∂x²) dx   (u is a solution to the heat equation).  (57)

We now integrate by parts, with f = u and g′ = ∂²u/∂x²:

d/dt ‖u‖₂² = 2k [ u (∂u/∂x) ]_{x→−∞}^{x→∞} − 2k ∫_{−∞}^{∞} (∂u/∂x)² dx.  (58)

The first term on the RHS vanishes because u → 0 as x → ±∞. The integral on the RHS is nonnegative

since the integrand is nonnegative. The only way that the integral can be zero is if ∂u/∂x = 0 for all x ∈ R, in which case u(x, t) must be constant. The only constant function that is in the space L²(R) is the zero function. So if we exclude the case that u = 0, we have

d/dt ‖u(x, t)‖₂² < 0,  t > 0.  (59)

This implies that ‖u(x, t)‖₂ is decreasing in time. Unfortunately, this result does not imply that u(x, t) → 0 as t → ∞: it doesn’t prevent ‖u(x, t)‖₂ from approaching a nonzero constant


asymptotically. In a little while, with the help of Parseval’s identity, we’ll be able to show that,

indeed, the L2 norm of u does go to zero. For the moment, let us simply accept this result.
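Before moving on, the interplay between pointwise decay, the L¹ norm, and the L² norm can be illustrated numerically. The sketch below is my own illustration (k = 1 and a box initial profile are arbitrary choices): it evolves f by the heat-kernel convolution of Eq. (42) and tracks both norms.

```python
import numpy as np

# Under the heat flow a nonnegative profile keeps its L1 norm while its
# L2 norm and its pointwise values decay.
k = 1.0
x = np.linspace(-40.0, 40.0, 2001); dx = x[1] - x[0]
f = np.where(np.abs(x) <= 1.0, 1.0, 0.0)       # box initial condition

def evolve(t):
    """Heat-kernel convolution of f, by quadrature on the grid."""
    ht = np.exp(-(x[:, None] - x[None, :])**2 / (4*k*t)) / np.sqrt(4*np.pi*k*t)
    return (ht @ f) * dx

def l1(u): return np.sum(np.abs(u)) * dx
def l2(u): return np.sqrt(np.sum(u**2) * dx)

u1, u2 = evolve(1.0), evolve(4.0)
assert abs(l1(u1) - l1(f)) < 1e-6 and abs(l1(u2) - l1(f)) < 1e-6
assert l2(u2) < l2(u1) < l2(f)                 # L2 norm strictly decreases
assert u2.max() < u1.max() < f.max()           # pointwise decay toward zero
```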

The above results – that as t → ∞:

1. u(x, t) → 0 for all x ∈ R,

2. ‖u‖1 = ‖f‖1, constant,

3. ‖u‖2 → 0,

may be somewhat difficult to reconcile. Let’s consider a rather simple example that also demonstrates

these properties. For any L > 0, define the following function,

f_L(x) = 1/(2L) for −L ≤ x ≤ L, and 0 otherwise.  (60)

The graph of this function is sketched below.

(Figure: graph of y = f_L(x), a box of height 1/(2L) on the interval [−L, L].)

We are concerned with the behaviour of f_L(x) as L → ∞. From the figure above, it is clear that the nonzero part of the graph of f_L gets wider, but its height also approaches zero. First of all, note that

‖f_L‖₁ = ∫_{−∞}^{∞} |f_L(x)| dx = ∫_{−L}^{L} 1/(2L) dx = 1.  (61)

This, of course, implies that

‖f_L‖₁ → 1 as L → ∞.  (62)


This is a simple consequence of the fact that the area enclosed by the graph of fL(x) and the x-axis

is constant. (Yes, the function was cleverly contrived to exhibit this behaviour.)

Now consider

‖f_L‖₂² = ∫_{−∞}^{∞} |f_L(x)|² dx = ∫_{−L}^{L} (1/(2L))² dx = (1/(4L²)) · 2L = 1/(2L).  (63)

This implies that

‖f_L‖₂ = 1/√(2L) → 0 as L → ∞.  (64)

We have constructed a simple function that behaves qualitatively in a manner similar to the temperature function u(x, t) discussed earlier.
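The two norm computations above are easily confirmed by quadrature; the sketch below (grid sizes are illustrative choices of mine) evaluates both norms of f_L for several values of L.

```python
import numpy as np

# ||f_L||_1 = 1 for every L, while ||f_L||_2 = 1/sqrt(2L) -> 0 as L grows.
def box_norms(L, n=200001):
    x = np.linspace(-2.0*L, 2.0*L, n); dx = x[1] - x[0]
    fL = np.where(np.abs(x) <= L, 1.0/(2.0*L), 0.0)
    n1 = np.sum(np.abs(fL)) * dx               # L1 norm by quadrature
    n2 = np.sqrt(np.sum(fL**2) * dx)           # L2 norm by quadrature
    return n1, n2

for L in (1.0, 10.0, 100.0):
    n1, n2 = box_norms(L)
    assert abs(n1 - 1.0) < 1e-3                # constant area under the graph
    assert abs(n2 - 1.0/np.sqrt(2.0*L)) < 1e-3 # shrinking L2 norm
```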


Smoothing produced by the heat/diffusion equation and some applications in signal and image processing

We now return to the idea of smoothing under evolution of the heat/diffusion equation, in particular,

the smoothing of a discontinuity, as illustrated earlier. The graphs are reproduced below.

(Figure: reproduction of the earlier graphs showing the smoothing of a discontinuity at x = a for times t₂ > t₁ > 0.)

We now move away from the idea of the heat or diffusion equation modelling the behaviour of

temperatures or concentrations. Instead, we consider the function u(x, t) as representing a time-varying signal. And in the case of two spatial dimensions, we may consider the function u(x, y, t) as representing the time evolution of an image. Researchers in signal and image processing noted the

smoothing behaviour of the heat/diffusion equation quite some time ago and asked whether it could

be exploited to accomplish desired tasks with signals and images. We outline some of these ideas very

briefly below for the case of images, since the results are quite dramatic visually.

First of all, images generally contain many points of discontinuity – these are the edges of the

image. In fact, edges are considered to define an image to a very large degree, since the boundaries

of any objects in the image produce edges. In what follows, we shall let the function u(x, y, t) denote

the evolution of a (non-negative) image function under the heat/diffusion equation,

∂u/∂t = k∇²u = k [ ∂²u/∂x² + ∂²u/∂y² ],  u(x, y, 0) = f(x, y).  (65)

Here we shall not worry about the domain of definition of the image. In practice, of course, image

functions are defined on a finite set, e.g., the rectangular region [a, b] × [c, d] ⊂ R2.

Some additional notes on the representation of images

Digitized images are defined over a finite set of points in a rectangular domain. Therefore,

they are essentially m × n arrays of greyscale or colour values – the values that are used


to assign brightness values to pixels on a computer screen. In what follows, we may

consider such matrices to define images that are piecewise constant over a region of R2.

We shall also be looking only at black-and-white (BW) images. Mathematically, we may

consider the range of greyscale values of a BW image function to be an interval [A,B], with

A representing black and B representing white. In mathematical analysis, one usually

assumes that [A,B] = [0, 1]. As you probably know, the practical storage of digitized

images also involves a quantization of the greyscale values into discrete values. For so-called “8 bit-per-pixel” BW images, where each pixel may assume one of 2⁸ = 256 discrete values (8 bits of computer memory are used to store the greyscale value at each pixel), the greyscale values assume the values (black) 0, 1, 2, · · · , 255 (white). This greyscale range,

[A,B] = [0, 255] is used for the analysis and display of the images below.

From our earlier discussions, we expect that edges of the input image f(x, y) will become more

and more smoothened as time increases. The result will be an increasingly blurred image, as we show

in Figure 1 below. The top image in the figure is the input image f(x, y). The bottom row shows

the image u(x, y, t) for two future times. The solutions u(x, y, t) we computed by means of a 2D

finite-difference scheme using forward time difference and centered difference for the Laplacian. It

assumes the following form,

u^{(n+1)}_{ij} = u^{(n)}_{ij} + s [ u^{(n)}_{i−1,j} + u^{(n)}_{i+1,j} + u^{(n)}_{i,j−1} + u^{(n)}_{i,j+1} − 4u^{(n)}_{ij} ],   s = kΔt/(Δx)².  (66)

(For details, you may consult the text by Haberman, Chapter 6, p. 253.) This scheme is numerically

stable for s < 0.25: The value s = 0.1 was used to compute the images in the figure.
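A minimal version of this scheme is sketched below. This is my own illustration on a synthetic 64 × 64 test array (not the image of Figure 1); the lecture does not specify the boundary handling, so the border pixels are simply held fixed here.

```python
import numpy as np

# Forward-time / centered-space scheme of Eq. (66) applied to a 2D array.
# One Jacobi-style update per iteration; s = k*dt/dx^2 < 0.25 for stability.
def diffuse(u, s=0.1, n_iter=100):
    u = u.astype(float).copy()
    for _ in range(n_iter):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
               np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
        unew = u + s * lap
        unew[0, :], unew[-1, :] = u[0, :], u[-1, :]   # hold boundary fixed
        unew[:, 0], unew[:, -1] = u[:, 0], u[:, -1]
        u = unew
    return u

# a sharp synthetic "image": white square on a black background
img = np.zeros((64, 64)); img[24:40, 24:40] = 255.0
blurred = diffuse(img, s=0.1, n_iter=100)
assert blurred.max() < img.max()        # the sharp edge has been smoothed
assert blurred.min() >= 0.0             # discrete maximum principle (s < 0.25)
```

For s ≤ 0.25, each updated pixel is a convex combination of its old neighbours, which is why the greyscale values stay within their original range.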

Heat/diffusion equation and “deblurring”

You may well question the utility of blurring an image: Why would one wish to degrade an image in

this way? We’ll actually provide an answer very shortly but, for the moment, let’s make use of the

blurring result in the following way: If we know that an image is blurred by the heat/diffusion equation

as time increases, i.e., as time proceeds forward, then perhaps a blurry image can be deblurred by

letting it evolve as time proceeds backward. The problem of “image deblurring” is an important one,

e.g., acquiring the license plate numbers of cars that are travelling at a very high speed.

If everything proceeded properly, could a blurred edge possibly be restored by such a procedure? In theory, the answer is “yes”, provided that the blurred image is known for all (continuous)


Figure 1. Image blurring produced by the heat/diffusion equation. Top: 2288 × 1712 pixel (8 bits = 256 grayscale levels/pixel) image as initial data function f(x, y) = u(x, y, 0) for the heat/diffusion equation on R². Bottom: Evolution of the image function u(x, y, t) under the discrete 2D finite-difference scheme (see main text). Left: After n = 100 iterations. Right: After n = 500 iterations.

values of x and y. In practice, however, the answer is “generally no”, since we know only discrete, sam-

pled values of the image. In addition, running the heat/diffusion equation backwards is an unstable

process. Instead of the exponential damping that we saw very early in the course, i.e., an eigensolution


in one dimension, u_n(x, t) = φ_n(x)h_n(t), evolving as follows,

u_n(x, t) = φ_n(x) e^{−k(nπ/L)²t},  (67)

we encounter exponential increase: replacing t with −t yields,

u_n(x, t) = φ_n(x) e^{k(nπ/L)²t}.  (68)

As such, any inaccuracies in the function will be amplified. As a result, numerical procedures associated

with running the heat/diffusion equation backwards are generally unstable.

To investigate this effect, the blurred image obtained after 100 iterations of the first experiment

was used as the initial data for a heat/diffusion equation that was run backwards in time. This may

be done by changing ∆t to −∆t in the finite difference scheme, implying that s is replaced by −s.

The result is the following “backward scheme,”

u^{(n−1)}_{ij} = u^{(n)}_{ij} − s [ u^{(n)}_{i−1,j} + u^{(n)}_{i+1,j} + u^{(n)}_{i,j−1} + u^{(n)}_{i,j+1} − 4u^{(n)}_{ij} ].  (69)

The first blurred image of the previous experiment (lower left image of Figure 1) was used as input

into the above backward-time scheme. It is shown at the top of Figure 2. After five iterations, the

image at the lower left of Figure 2 is produced. Some deblurring of the edges has been accomplished.

(This may not be visible if the image is printed on paper, since the printing process itself introduces

a degree of blurring.) After another five iterations, additional deblurring is achieved at some edges

but at the expense of some severe degradation at other regions of the image. Note that much of the

degradation occurs at smoother regions of the image, i.e., where spatially neighbouring values of the

image function are closer to each other. This degradation is an illustration of the numerical instability

of the backward-time procedure.

Image denoising under the heat/diffusion equation

We now return to the smoothing effect of the heat/diffusion equation and ask whether or not it could

be useful. The answer is “yes” – it may be useful in the denoising of signals and/or images. In many

applications, signals and images are degraded by noise in a variety of possible situations: e.g., (i) atmospheric disturbances, particularly in the case of images of the earth obtained from satellites or astronomical images obtained from telescopes, or (ii) the channel over which such signals are transmitted is noisy. These are part of the overall problem of signal/image degradation, which may include both


Figure 2. Attempts to deblur images by running the heat/diffusion equation backwards. Top: Blurred image u^{(100)}_{ij} from the previous experiment, used as input into the “backward” heat/diffusion equation scheme, with s = 0.1. Bottom left: Result after n = 5 iterations. Some deblurring has been achieved. Bottom right: Result after n = 10 iterations. Some additional deblurring, but at the expense of degradation in some regions due to numerical instabilities.

blurring as well as noise. The removal of such degradations, which is almost always only partial, is

known as signal/image enhancement.

A noisy signal may look something like the sketch at the left of Figure 3 below. Recalling that

the heat/diffusion equation causes blurring, one might imagine that the blurring of a noisy signal may


produce some denoising, as sketched at the right of Figure 3. This is, of course, a very simplistic

idea, but it does provide the starting point for a number of signal/image denoising methods.

Figure 3. A noisy signal f(x) = u(x, 0) (left) and its denoised counterpart u(x, t) (right).

In Figure 4 below, we illustrate this idea as applied to image denoising. The top left image is our original, “noiseless” image u. Some noise was added to this image to produce the noisy image ũ at the top right. Very simply,

ũ(i, j) = u(i, j) + n(i, j),  (70)

where n(i, j) ∈ R was chosen randomly from the real line according to the normal Gaussian distribution N(0, σ), i.e., zero-mean and standard deviation σ. In this case, σ = 20 was used. The above equation is usually written more generally as follows,

ũ = u + N(0, σ).  (71)

For reference purposes, the (discrete) L² error between ũ and u was computed as follows,

‖ũ − u‖₂ = [ (1/512²) Σ_{i,j=1}^{512} (ũ(i, j) − u(i, j))² ]^{1/2} = 20.03.  (72)

This is the “root mean squared error” (RMSE) between the discrete functions ũ and u. (You first compute the average value of the squares of the differences of the greyscale values at all pixels, then take the square root of this average value.) In retrospect, it should be close to the standard deviation σ of the added noise. In other words, the average magnitude of the error between ũ(i, j) and u(i, j) should be about the σ-value of the noise added.
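This back-of-the-envelope claim is easy to check numerically: for zero-mean Gaussian noise, the RMSE between the clean and noisy arrays is a sample estimate of σ. Below is a synthetic stand-in sketch (the actual San Francisco test image is not reproduced here, so a random array plays the role of u).

```python
import numpy as np

# RMSE between a clean array and the same array plus N(0, sigma) noise
# should come out close to sigma.
rng = np.random.default_rng(1)
sigma = 20.0
u = rng.uniform(0.0, 255.0, size=(512, 512))       # stand-in "image"
u_noisy = u + rng.normal(0.0, sigma, size=u.shape)  # Eq. (71)

rmse = np.sqrt(np.mean((u_noisy - u)**2))           # Eq. (72)
assert abs(rmse - sigma) < 0.5
```

With 512² pixels the sample estimate is accurate to a few hundredths, which is why the reported value 20.03 sits so close to σ = 20.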

The noisy image u was then used as the input image for the diffusion equation, more specifically,

the 2D finite difference scheme used earlier, i.e.,

u^{(n+1)}_{ij} = u^{(n)}_{ij} + s [ u^{(n)}_{i−1,j} + u^{(n)}_{i+1,j} + u^{(n)}_{i,j−1} + u^{(n)}_{i,j+1} − 4u^{(n)}_{ij} ],  (73)

with s = 0.1.


After five iterations (lower left), we see that some denoising has been produced, but at the expense

of blurring, particularly at the edges. The L2 distance between this denoised/blurred image and the

original noiseless image u is computed to be

‖u₅ − u‖₂ = 16.30.  (74)

We see that, from the viewpoint of L² distance, the denoised image u₅ is “closer” to the noiseless image u than the noisy image ũ is. This is a good sign – we would hope that the denoising procedure would produce an image that is closer to u. But more on this later.

After another five iterations, as expected, there is further denoising but accompanied by additional

blurring. The L2 distance between this image and u is computed to be

‖u₁₀ − u‖₂ = 18.23.  (75)

Note that the L² distance of this image is larger than that of u₅ – in other words, we have done worse. One explanation is that the increased blurring of the diffusion equation has degraded the image farther away from u than the denoising has improved it.

We now step back and ask: Which of the above results is “better” or “best”? In the L² sense, the lower left result is better, since its L² error (i.e., distance to u) is smaller. But is it “better” visually? Quite often, a result that is better in terms of L² error is poorer visually. And are the denoised images visually “better” than the noisy image itself? You will recall that some people in class had the opinion that the noisy image ũ actually looked better than any of the denoised/blurred results. This

illustrates an important point about image processing – the L2 distance, although easy to work with,

is not necessarily the best indicator of visual quality. Psychologically, our minds are sometimes more

tolerant of noise than degradation in the edges – particularly in the form of blurring – that define an

image.

Image denoising using “anisotropic diffusion”

We’re not totally done with the idea of using the heat/diffusion equation to remove noise by means

of blurring. Once upon a time, someone got the idea of employing a “smarter” form of diffusion –

one which would perform blurring of images but which would leave their edges relatively intact. We

could do this by making the diffusion parameter k to be sensitive to edges – when working in the

vicinity of an edge, we restrict the diffusion so that the edges are not degraded. As we mentioned

earlier, edges represent discontinuities – places where the magnitudes of the gradients become quite


Figure 4. Image denoising using the heat/diffusion equation. Top left: 512 × 512 pixel (8 bits = 256 grayscale levels/pixel) San Francisco test image u. Top right: Noisy image ũ = u + N(0, σ) (test image plus zero-mean Gaussian noise, σ = 20), which will serve as the initial data function for the heat/diffusion equation on R². L² error of the noisy image: ‖ũ − u‖₂ = 20.03. Bottom: Evolution under the discrete 2D finite-difference scheme (forward time difference). Left: After n = 5 iterations, some denoising along with some blurring, ‖u₅ − u‖₂ = 16.30. Right: After n = 10 iterations, some additional denoising with additional blurring, ‖u₁₀ − u‖₂ = 18.23.

large. (Technically, in the continuous domain, the gradients would be undefined. But we are working

with finite differences, so the gradients will be defined, but large in magnitude.)


This implies that the diffusion parameter k would depend upon the position (x, y). But this is

only part of the process – since k would be sensitive to the gradient ∇u(x, y) of the image, it would, in fact, be dependent upon the image function u(x, y) itself!

One way of accomplishing this selective diffusion, i.e., slower at edges, is to let k(x, y) be inversely proportional to some power of the gradient, e.g.,

k = k(‖∇u‖) = C‖∇u‖^{−α},  α > 0.  (76)

The resulting diffusion equation,

∂u/∂t = k(‖∇u‖) ∇²u,  (77)

would be a nonlinear diffusion equation, since k is now dependent upon u, and it multiplies the

Laplacian of u. And since the diffusion process is no longer constant throughout the region, it is

no longer homogeneous but nonhomogeneous or anisotropic. As such, Eq. (77) is often called the

anisotropic diffusion equation.

To illustrate this process, we have considered a very simple example, where

k(‖∇u‖) = ‖∇u‖^{−1/2}. (78)

Some results are presented in Figure 5. This simple anisotropic scheme works well to preserve edges, thereby producing better denoising of the noisy image used in the previous experiment. The denoised image u_20 is better not only in terms of L2 distance but also from the perspective of visual quality, since its edges are better preserved.
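To make the scheme concrete, here is a minimal NumPy sketch of one explicit finite-difference step of Eqs. (77)–(78). The step size s, the regularizing constant eps, and the clipping value kmax are our own additions (not part of the lecture's experiment), included to avoid division by zero on flat regions and to keep the explicit scheme stable (s·k ≤ 1/4).

```python
import numpy as np

def aniso_step(u, s=0.1, alpha=0.5, eps=1e-3, kmax=2.0):
    """One explicit step of u_t = k(||grad u||) * Laplacian(u), Eq. (77)."""
    gy, gx = np.gradient(u)                  # central differences
    grad = np.sqrt(gx**2 + gy**2)            # ||grad u|| at each pixel
    # k = ||grad u||^(-alpha), Eq. (78) with alpha = 1/2; eps regularizes
    # flat regions, and clipping at kmax keeps the explicit scheme stable
    k = np.minimum((grad + eps) ** (-alpha), kmax)
    p = np.pad(u, 1, mode="edge")            # replicate edges: no-flux boundary
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u
    return u + s * k * lap

# demo: denoise a flat image corrupted by zero-mean Gaussian noise
rng = np.random.default_rng(0)
noisy = 100.0 + rng.normal(0.0, 5.0, (32, 32))
v = noisy.copy()
for _ in range(20):
    v = aniso_step(v)
```

Iterating on an actual photograph (as in Figure 5) works the same way; the flat noisy image here simply makes the smoothing effect easy to measure.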

Needless to say, a great deal of research has been done on nonlinear, anisotropic diffusion and its

applications to signal and image processing.


Figure 5. Image denoising and edge preservation via “anisotropic diffusion.” Top: Noisy image ū = u + N(0, σ) (test image plus zero-mean Gaussian noise, σ = 20). L2 error of noisy image: ‖ū − u‖_2 = 20.03. Bottom left: Denoising with the (isotropic) heat/diffusion equation, u_t = k∇²u, reported earlier. Finite-difference scheme, s = 0.1, n = 5 iterations. L2 error of denoised image: ‖u_5 − u‖_2 = 16.30. Bottom right: Denoising with the anisotropic heat equation; finite-difference scheme, s = 0.1, k(‖∇u‖) = ‖∇u‖^{−1/2}, n = 20 iterations. There is denoising but much less blurring around edges. L2 error: ‖u_20 − u‖_2 = 15.08. Not only is the result from anisotropic diffusion better in the L2 sense, but it is also better visually, since edges have been better preserved.


Lecture 30

Fourier Transforms (cont’d)

At this point, we stop to examine some of the mathematical properties of the Fourier transform –

such details were skipped over in previous lectures so that we could quickly arrive at some operational

results on how to solve PDEs with FTs.

Let us recall the basic formula for the Fourier transform, F(ω), of a function f : R → R:

F(ω) = (1/2π) ∫_{−∞}^{∞} f(x) e^{iωx} dx, ω ∈ R. (79)

First of all, the most obvious aspect of F(ω) is that it is complex-valued, i.e., F : R → C. This follows from the Euler formula for the complex exponential in the integrand; we won't go into any particulars since they are not important to the present discussion.

Let us consider the FT in Eq. (79) as defining a mapping of the function f(x) to the function F(ω). Note that both functions, f and F, are defined over the real line R; it is conventional that “x” (the space variable) or “t” (the time variable) is used to denote the argument of f, and “ω” (frequency) the argument of F.

So we’ll consider the Fourier transform as a mapping

F : f → F, or F(f) = F. (80)

And we also have encountered the inverse mapping,

F−1 : F → f, or F−1(F ) = f. (81)

The first thing that we can say about F (and its inverse) is that it is linear, i.e., if F1 = F(f1)

and F2 = F(f2), then

F(c1f1 + c2f2) = c1F1 + c2F2. (82)

This is easily verified using the definition in Eq. (79).
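Linearity can also be checked numerically by approximating the integral in Eq. (79) with a trapezoid rule. The Gaussian and two-sided exponential test functions below are arbitrary choices for illustration.

```python
import numpy as np

def trap(y, x):
    # simple trapezoid rule (works for complex-valued integrands)
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

def ft(f, x, w):
    # course convention, Eq. (79): F(w) = (1/2pi) * integral f(x) e^{iwx} dx
    return trap(f * np.exp(1j * w * x), x) / (2 * np.pi)

x = np.linspace(-20.0, 20.0, 100001)
f1 = np.exp(-x**2)           # a Gaussian
f2 = np.exp(-np.abs(x))      # a two-sided exponential
c1, c2 = 2.0, -3.0
w = 1.5
lhs = ft(c1 * f1 + c2 * f2, x, w)       # F(c1 f1 + c2 f2)
rhs = c1 * ft(f1, x, w) + c2 * ft(f2, x, w)  # c1 F1 + c2 F2
```

The two sides agree to machine precision, since the quadrature rule is itself a linear operation on the integrand.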

However, we would like to make more mathematical sense of this mapping. For example, do all functions f(x) have Fourier transforms? Or are there some for which the Fourier transform in (79) is undefined? As an example, consider the function f(x) = 1 for x ∈ R. From Eq. (79), its FT is given by

F(ω) = (1/2π) ∫_{−∞}^{∞} e^{iωx} dx, ω ∈ R. (83)


Does this integral make sense? Let’s consider the truncated integral

F_b(ω) = (1/2π) ∫_{−b}^{b} e^{iωx} dx = (1/(2πiω)) [ e^{iωb} − e^{−iωb} ] = sin(ωb)/(πω). (84)

For each value of ω ≠ 0, F_b(ω) simply oscillates with b (and at ω = 0 it grows without bound), so the limit of F_b(ω) as b → ∞ does not exist. Therefore the FT, as an improper integral, does not exist.
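A short numerical illustration of this failure: evaluating the closed form (84) at ω = 1 for increasing values of b shows the oscillation directly — the values bounce around inside [−1/π, 1/π] and never settle.

```python
import numpy as np

def Fb(omega, b):
    # closed form (84) of the truncated transform of f(x) = 1
    return np.sin(omega * b) / (np.pi * omega)

# as b grows, Fb(1, b) keeps oscillating instead of converging
vals = [Fb(1.0, b) for b in (10.0, 20.0, 40.0, 80.0)]
```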

This simple example indicates that we’ll have to be careful to specify what kinds of functions are being mapped to what other functions. In other words, what space of functions is being mapped to what corresponding space?

It is not too hard to show that if f belongs to the space L1(R) introduced earlier, that is, the set of functions f : R → R for which

∫_{−∞}^{∞} |f(x)| dx < ∞, (85)

then the integral in (79) is well-defined, i.e., is finite. But since we are now encountering complex-valued functions F(ω), it makes sense to extend our definition of these function spaces. We’ll let L1(R) now denote the set of integrable complex-valued functions, i.e.,

L1(R) = { f : R → C : ∫_{−∞}^{∞} |f(x)| dx < ∞ }, (86)

where |f(x)| denotes the modulus of the complex number f(x). Furthermore, we let the above integral define the L1-norm of the function f:

‖f‖_1 = ∫_{−∞}^{∞} |f(x)| dx, (87)

which is finite.

The extension of our space of functions to complex-valued ones is not a big deal: It merely states

that we can take Fourier transforms of complex-valued functions. In applications, we normally deal

with real-valued functions, but sometimes it’s convenient to extend the treatment to cover complex-

valued ones. After all, when we take the inverse Fourier transform, it will often be the inverse FT of

a complex-valued function.


So let’s now assume that f ∈ L1(R) and look at its Fourier transform. In particular, we’ll look at the truncated integral:

F_b(ω) = (1/2π) ∫_{−b}^{b} f(x) e^{iωx} dx. (88)

Noting that |e^{iωx}| = 1, we examine the modulus of this truncated integral:

|F_b(ω)| = | (1/2π) ∫_{−b}^{b} f(x) e^{iωx} dx | ≤ (1/2π) ∫_{−b}^{b} |f(x) e^{iωx}| dx = (1/2π) ∫_{−b}^{b} |f(x)| dx. (89)

But now note that, by definition,

|F(ω)| = lim_{b→∞} |F_b(ω)| ≤ lim_{b→∞} (1/2π) ∫_{−b}^{b} |f(x)| dx = (1/2π) ‖f‖_1 < ∞. (90)

In other words, the Fourier transform F (ω) is well-defined for all ω ∈ R.

Unfortunately, we cannot go further: It is not necessarily true that if f ∈ L1(R), then its Fourier

transform F (ω) is an L1 function. This, however, is not a stumbling block in this course.
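The bound (1/2π)‖f‖_1 in (90) can be checked numerically. Below we use f(x) = e^{−|x|}, for which ‖f‖_1 = 2 and (as one can show by direct integration) F(ω) = 1/(π(1 + ω²)); the bound 1/π is actually attained at ω = 0.

```python
import numpy as np

def trap(y, x):
    # trapezoid rule; keeps complex values intact
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

x = np.linspace(-30.0, 30.0, 300001)
f = np.exp(-np.abs(x))                  # an L1 function with ||f||_1 = 2
L1_norm = trap(np.abs(f), x).real
bound = L1_norm / (2 * np.pi)           # the bound (1/2pi)||f||_1 from (90)

# |F(w)| over a range of frequencies, via quadrature of Eq. (79)
omegas = np.linspace(-5.0, 5.0, 101)
F_abs = [abs(trap(f * np.exp(1j * w * x), x)) / (2 * np.pi) for w in omegas]
```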

In many applications, it is desirable to work in another space of integrable functions, namely the square-integrable functions L2. We shall modify the definition given earlier in this course to include complex-valued functions, i.e.,

L2(R) = { f : R → C : ∫_{−∞}^{∞} |f(x)|² dx < ∞ }. (91)

As in the case of real-valued functions, this space is an inner product space: the inner product of two functions f and g is given by

〈f, g〉 = ∫_{−∞}^{∞} f(x)∗ g(x) dx, (92)

where f(x)∗ denotes the complex conjugate of f(x). The L2-norm of a function f ∈ L2(R) is then given by

‖f‖_2 = 〈f, f〉^{1/2} = [ ∫_{−∞}^{∞} f(x)∗ f(x) dx ]^{1/2} = [ ∫_{−∞}^{∞} |f(x)|² dx ]^{1/2}. (93)

We’ll show below that the Fourier transform F maps L2(R) functions to L2(R) functions, i.e.,

F : L2(R) → L2(R). (94)


This is important in quantum mechanics. First of all, the space L2(R) (if we simply limit ourselves to one-dimensional systems on R for the moment) is natural since we wish the (complex-valued) wavefunction Ψ(x) to be square-integrable. In fact, one requires that Ψ satisfy the normalization condition

∫_{−∞}^{∞} Ψ(x)∗ Ψ(x) dx = 1. (95)

As those who have studied quantum mechanics know, the probability of finding the particle in the interval (x, x + dx) is given by |Ψ(x)|² dx. Since the probability of finding the particle somewhere in space, i.e., over the real line R, is unity, we have the above normalization condition.

And as those who have studied quantum mechanics may also have seen, the Fourier transform

F (ω) is related to the so-called momentum representation of the wavefunction Ψ(x).

Note that, loosely speaking, a function f(x) belonging to L1(R) or L2(R) must decay at infinity: if f(x) has limits as x → ±∞, those limits must be zero.

Parseval’s Identity

Relevant section of text: 10.4.3

We now derive an important result that relates the L2-norms of a function f(x) and its Fourier

transform F (ω).

Let f, g ∈ L2(R) with Fourier transforms F, G ∈ L2(R), respectively. Furthermore, suppose that

H(ω) = F(ω) G(ω). (96)

From the Convolution Theorem for FTs, it follows that h, the inverse FT of H, is given by

h = (1/2π) f ∗ g = F^{−1}(FG). (97)

From the definition of the FT and its inverse, this implies that

(1/2π) ∫_{−∞}^{∞} f(s) g(x − s) ds = ∫_{−∞}^{∞} F(ω) G(ω) e^{−iωx} dω. (98)

This relation is true for all x ∈ R; in particular, we may set x = 0, so that

(1/2π) ∫_{−∞}^{∞} f(s) g(−s) ds = ∫_{−∞}^{∞} F(ω) G(ω) dω. (99)


Now for a function f(x), define the function g(x) such that

g(−x) = f(x)∗. (100)

Then the LHS of Eq. (99) becomes

(1/2π) ∫_{−∞}^{∞} f(s) f(s)∗ ds. (101)

It now remains to make sense of the RHS of Eq. (99). We need to determine what the Fourier transform G(ω) is. By definition,

G(ω) = (1/2π) ∫_{−∞}^{∞} g(s) e^{iωs} ds = (1/2π) ∫_{−∞}^{∞} f(−s)∗ e^{iωs} ds. (102)

Now make the change of variable u = −s, du = −ds (which reverses the limits of integration), so that the above integral becomes

G(ω) = −(1/2π) ∫_{∞}^{−∞} f(u)∗ e^{−iωu} du = (1/2π) ∫_{−∞}^{∞} f(u)∗ e^{−iωu} du = [ (1/2π) ∫_{−∞}^{∞} f(u) e^{iωu} du ]∗ = F(ω)∗. (103)

We now substitute the results of (101) and (103) into (99) to obtain the final result,

(1/2π) ∫_{−∞}^{∞} f(s) f(s)∗ ds = ∫_{−∞}^{∞} F(ω) F(ω)∗ dω. (104)

This is Parseval’s identity. It is the continuous analogue of the (discrete) Parseval equality, which expresses the L2-norm of a function f in terms of the sum of squares of its Fourier coefficients with respect to an orthonormal basis {φ_n}. The above result can be expressed in terms of the L2 norms of f and F:

(1/2π) ‖f‖_2² = ‖F‖_2². (105)

In signal processing language, up to the factor 1/(2π), the “energy” of the signal f(x) is equal to the energy of its Fourier transform F(ω). The factor 1/(2π) appears because our convention places it in the definition of the Fourier transform and not in the inverse; in other, “symmetric” definitions, no factor appears in Parseval’s identity.
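Parseval’s identity (105) is easy to verify numerically for a Gaussian, f(x) = e^{−x²}, whose transform under convention (79) is F(ω) = (√π/(2π)) e^{−ω²/4} (a standard result; the code below also spot-checks this closed form by direct quadrature before using it).

```python
import numpy as np

def trap(y, x):
    # trapezoid rule for real or complex integrands
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

x = np.linspace(-20.0, 20.0, 200001)
f = np.exp(-x**2)                                   # Gaussian test function
w = np.linspace(-20.0, 20.0, 200001)
F = (np.sqrt(np.pi) / (2 * np.pi)) * np.exp(-w**2 / 4)  # known transform of f

# spot-check the closed form against direct quadrature of Eq. (79) at w = 1
F1_num = trap(f * np.exp(1j * 1.0 * x), x) / (2 * np.pi)

lhs = trap(np.abs(f) ** 2, x).real / (2 * np.pi)    # (1/2pi) ||f||_2^2
rhs = trap(np.abs(F) ** 2, w).real                  # ||F||_2^2
```

Both sides of (105) come out to the same value (analytically, √(π/2)/(2π) ≈ 0.1995).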


Some final comments:

1. Parseval’s identity tells us that the Fourier transform F is a mapping from L2(R) to L2(R).

From Eq. (105), it follows that if f ∈ L2(R), i.e., ‖f‖2 < ∞, then F is also in L2(R), i.e.,

‖F‖2 < ∞.

2. This result now allows us to conclude that the L2 norm of the solution u(x, t) to the heat equation goes to zero, i.e.,

‖u(x, t)‖_2 → 0, as t → ∞. (106)

Remember that we showed that ‖u(x, t)‖_2 was decreasing in time, but we couldn’t show that it was decreasing to zero. We may now use Parseval’s identity to establish this result. Recall that the time evolution of the Fourier transform of u(x, t) is given by

U(ω, t) = F(ω) e^{−kω²t}, (107)

where F = F(f) is the FT of the initial temperature distribution. The squared L2 norm of U(ω, t) is given by

‖U(ω, t)‖_2² = 〈U(ω, t), U(ω, t)〉 = ∫_{−∞}^{∞} |F(ω)|² e^{−2kω²t} dω. (108)

We now claim that the function F(ω), the Fourier transform of f(x), is bounded for all ω ∈ R. The proof of this statement follows in the same way that we showed that F(ω) was bounded in the case that f(x) is an L1 function. Now define

M = max_{ω∈R} |F(ω)|. (109)

The integral in the second line of Eq. (108) may then be bounded as follows:

∫_{−∞}^{∞} |F(ω)|² e^{−2kω²t} dω ≤ M² ∫_{−∞}^{∞} e^{−2kω²t} dω. (110)

The integral on the RHS may be evaluated with the help of the following result that we have used earlier in our discussion of Gaussian functions,

∫_{−∞}^{∞} e^{−x²} dx = √π. (111)


We make the change of variable x = ω√(2kt), so that dx = √(2kt) dω. With just a little work, we find that

∫_{−∞}^{∞} e^{−2kω²t} dω = √(π/(2kt)). (112)

Putting all the results together, we obtain the result

‖U(ω, t)‖_2² ≤ M² √(π/(2kt)) → 0 as t → ∞. (113)

From Parseval’s identity,

(1/2π) ‖u(x, t)‖_2² = ‖U(ω, t)‖_2², (114)

it follows that

‖u(x, t)‖2 → 0 as t → ∞, (115)

thus proving the desired result.
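The decay estimate (113) can also be checked numerically. Here F(ω) = e^{−ω²} is an arbitrary sample transform (our assumption, with M = 1 and k = 1); the computed norms respect the bound M²√(π/(2kt)) at every time and decrease monotonically with t.

```python
import numpy as np

def trap(y, x):
    # trapezoid rule on a uniform grid
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

k = 1.0
w = np.linspace(-20.0, 20.0, 200001)
F = np.exp(-w**2)            # a sample initial transform F(w) (an assumption)
M = np.max(np.abs(F))        # M = max |F(w)| = 1 here

times = [0.5, 1.0, 2.0, 4.0, 8.0]
# squared norms ||U(., t)||_2^2 from Eq. (108)
norms = [trap(np.abs(F)**2 * np.exp(-2 * k * w**2 * t), w).real for t in times]
# the decay bound from Eq. (113)
bounds = [M**2 * np.sqrt(np.pi / (2 * k * t)) for t in times]
```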
