
Notes for MATH 519-520

Paul E. Sacks

August 20, 2015


Contents

1  Orientation
   1.1  Introduction

2  Preliminaries
   2.1  Ordinary differential equations
        2.1.1  Initial Value Problems
        2.1.2  Boundary Value Problems
        2.1.3  Some exactly solvable cases
   2.2  Integral equations
   2.3  Partial differential equations
        2.3.1  First order PDEs and the method of characteristics
        2.3.2  Second order problems in R^2
        2.3.3  Further discussion of model problems
        2.3.4  Standard problems and side conditions
   2.4  Well-posed and ill-posed problems
   2.5  Exercises

3  Vector spaces
   3.1  Axioms of a vector space
   3.2  Linear independence and bases
   3.3  Linear transformations of a vector space
   3.4  Exercises

4  Metric spaces
   4.1  Axioms of a metric space
   4.2  Topological concepts
   4.3  Functions on metric spaces and continuity
   4.4  Compactness and optimization
   4.5  Contraction mapping theorem
   4.6  Exercises

5  Normed linear spaces and Banach spaces
   5.1  Axioms of a normed linear space
   5.2  Infinite series
   5.3  Linear operators and functionals
   5.4  Contraction mappings in a Banach space
   5.5  Exercises

6  Inner product spaces and Hilbert spaces
   6.1  Axioms of an inner product space
   6.2  Norm in a Hilbert space
   6.3  Orthogonality
   6.4  Projections
   6.5  Gram-Schmidt method
   6.6  Bessel's inequality and infinite orthogonal sequences
   6.7  Characterization of a basis of a Hilbert space
   6.8  Isomorphisms of a Hilbert space
   6.9  Exercises

7  Distributions
   7.1  The space of test functions
   7.2  The space of distributions
   7.3  Algebra and Calculus with Distributions
        7.3.1  Multiplication of distributions
        7.3.2  Convergence of distributions
        7.3.3  Derivative of a distribution
   7.4  Convolution and distributions
   7.5  Exercises

8  Fourier analysis and distributions
   8.1  Fourier series in one space dimension
   8.2  Alternative forms of Fourier series
   8.3  More about convergence of Fourier series
   8.4  The Fourier Transform on R^N
   8.5  Further properties of the Fourier transform
   8.6  Fourier series of distributions
   8.7  Fourier transforms of distributions
   8.8  Exercises

9  Distributions and Differential Equations
   9.1  Weak derivatives and Sobolev spaces
   9.2  Differential equations in D'
   9.3  Fundamental solutions
   9.4  Exercises

10 Linear operators
   10.1  Linear mappings between Banach spaces
   10.2  Examples of linear operators
   10.3  Linear operator equations
   10.4  The adjoint operator
   10.5  Examples of adjoints
   10.6  Conditions for solvability of linear operator equations
   10.7  Fredholm operators and the Fredholm alternative
   10.8  Convergence of operators
   10.9  Exercises

11 Unbounded operators
   11.1  General aspects of unbounded linear operators
   11.2  The adjoint of an unbounded linear operator
   11.3  Exercises

12 Spectrum of an operator
   12.1  Resolvent and spectrum of a linear operator
   12.2  Examples of operators and their spectra
   12.3  Properties of spectra
   12.4  Exercises

13 Compact Operators
   13.1  Compact operators
   13.2  The Riesz-Schauder theory
   13.3  The case of self-adjoint compact operators
   13.4  Some properties of eigenvalues
   13.5  The Singular Value Decomposition and Normal Operators
   13.6  Exercises

14 Spectra and Green's functions for differential operators
   14.1  Green's functions for second order ODEs
   14.2  Adjoint problems
   14.3  Sturm-Liouville theory
   14.4  The Laplacian with homogeneous Dirichlet boundary conditions
   14.5  Exercises

15 Further study of integral equations
   15.1  Singular integral operators
   15.2  Layer potentials
   15.3  Convolution equations
   15.4  Wiener-Hopf technique
   15.5  Exercises

16 Variational methods
   16.1  The Dirichlet quotient
   16.2  Eigenvalue approximation
   16.3  The Euler-Lagrange equation
   16.4  Variational methods for elliptic boundary value problems
   16.5  Other problems in the calculus of variations
   16.6  The existence of minimizers
   16.7  The Frechet derivative
   16.8  Exercises

17 Weak solutions of partial differential equations
   17.1  Lax-Milgram theorem
   17.2  More function spaces
   17.3  Galerkin's method
   17.4  PDEs with variable coefficients
   17.5  Exercises

18 Appendices
   18.1  Inequalities
   18.2  Integration by parts
   18.3  Spherical coordinates in R^N

19 Bibliography

Chapter 1

Orientation

1.1 Introduction

While the phrase 'Applied Mathematics' has a very broad meaning, the purpose of this textbook is much more limited, namely to present techniques of mathematical analysis which have been found to be particularly useful in understanding some kinds of mathematical problems which occur very commonly in scientific and technological disciplines, especially physics and engineering. These methods, which are often regarded as belonging to the realm of functional analysis, have been motivated most specifically in connection with the study of ordinary differential equations, partial differential equations and integral equations. The mathematical modeling of physical phenomena typically involves one or more of these types of equations, and insight into the physical phenomenon itself may result from a deep understanding of the underlying mathematical properties which the models possess. All concepts and techniques discussed in this book are ultimately of interest because of their relevance for the study of these three general types of problems. There is a great deal of beautiful mathematics which has grown out of these ideas, and so intrinsic mathematical motivation cannot be denied or ignored.


Chapter 2

Preliminaries


In this chapter we will discuss 'standard problems' in the theory of ordinary differential equations (ODEs), integral equations, and partial differential equations (PDEs). The techniques developed in these notes are all meant to have some relevance for one or more of these kinds of problems, so it seems best to start with some awareness of exactly what the problems are. In each case there are some relatively elementary methods, which the reader may well have seen before, or which depend only on simple considerations, which we will review. At the same time we establish terminology and notations, and begin to get some sense of the ways in which problems are classified.

2.1 Ordinary differential equations

An n'th order ordinary differential equation for an unknown function u = u(t) on an interval (a, b) ⊂ R may be given in the form

    F(t, u, u', u'', ..., u^{(n)}) = 0    (2.1.1)

where we use the usual notations u', u'', ... for derivatives of order 1, 2, ..., and also u^{(n)} for the derivative of order n. Unless otherwise stated, we will assume that the ODE can be solved for the highest derivative, i.e. written in the form

    u^{(n)} = f(t, u, u', ..., u^{(n−1)})    (2.1.2)

For the purpose of this discussion, a solution of either equation will mean a real valued function on (a, b) possessing continuous derivatives up through order n, and for which the equation is satisfied at every point of (a, b). While it is easy to write down ODEs in the form (2.1.1) without any solutions (for example, (u')² + u² + 1 = 0), we will see that ODEs of the type (2.1.2) essentially always have solutions, subject to some very minimal assumptions on f.

The ODE is linear if it can be written as

    Σ_{j=0}^{n} a_j(t) u^{(j)}(t) = g(t)    (2.1.3)

for some coefficients a_0, ..., a_n, g, and homogeneous linear if also g(t) ≡ 0. It is common to use also operator notation for derivatives, especially in the linear case. Set

    D = d/dt    (2.1.4)

so that u' = Du, u'' = D²u etc., and (2.1.3) may be given as

    Lu := Σ_{j=0}^{n} a_j(t) D^j u = g(t)    (2.1.5)

By standard calculus properties L is a linear operator, meaning that

    L(c_1 u_1 + c_2 u_2) = c_1 Lu_1 + c_2 Lu_2    (2.1.6)

for any scalars c_1, c_2 and any n times differentiable functions u_1, u_2.

An ODE normally has infinitely many solutions – the collection of all solutions is called the general solution of the given ODE.

Example 2.1. By elementary calculus considerations, the simple ODE u' = 0 has general solution u(t) = c, where c is an arbitrary constant. Likewise u' = u has the general solution u(t) = ce^t, and u'' = 1 has the general solution u(t) = t²/2 + c_1 t + c_2, where c_1, c_2 are arbitrary constants. □

2.1.1 Initial Value Problems

The general solution of an n'th order ODE typically contains exactly n arbitrary constants, whose values may then be chosen so that the solution satisfies n additional, or side, conditions. The most common kind of side conditions for an ODE are initial conditions,

    u^{(j)}(t_0) = γ_j    j = 0, 1, ..., n−1    (2.1.7)

where t_0 is a given point in (a, b) and γ_0, ..., γ_{n−1} are given constants. Thus we are prescribing the value of the solution and its derivatives up through order n − 1 at the point t_0. The problem of solving (2.1.2) together with the initial conditions (2.1.7) is called an initial value problem (IVP), and it is a very important fact that under fairly unrestrictive hypotheses a unique solution exists. In stating conditions on f, we regard it as a function f = f(t, y_1, ..., y_n) defined on some domain in R^{n+1}.

Theorem 2.1. Assume that

    f, ∂f/∂y_1, ..., ∂f/∂y_n    (2.1.8)

are defined and continuous in a neighborhood of the point (t_0, γ_0, ..., γ_{n−1}) ∈ R^{n+1}. Then there exists ε > 0 such that the initial value problem (2.1.2),(2.1.7) has a unique solution on the interval (t_0 − ε, t_0 + ε).

A proof of this theorem may be found in standard ODE textbooks, see for example [4],[7]. A slightly weaker version of this theorem will be proved in Section 4.5. As will be discussed there, the condition of continuity of the partial derivatives of f with respect to each of the variables y_i can actually be replaced by the weaker assumption that f is Lipschitz continuous with respect to each of these variables. If we assume only that f is continuous in a neighborhood of the point (t_0, γ_0, ..., γ_{n−1}) then it can be proved that at least one solution exists, but it may not be unique, see Exercise 3.

It should also be emphasized that the theorem asserts a local existence property, i.e. only in some sufficiently small interval centered at t_0. It has to be this way, first of all, since the assumptions on f are made only in the vicinity of (t_0, γ_0, ..., γ_{n−1}). But even if the continuity properties of f were assumed to hold throughout R^{n+1}, then as the following example shows, it would still only be possible to prove that a solution exists for points t close enough to t_0.

Example 2.2. Consider the first order initial value problem

    u' = u²    u(0) = γ    (2.1.9)

for which the assumptions of Theorem 2.1 hold for any γ. It may be checked that the solution of this problem is

    u(t) = γ/(1 − γt)    (2.1.10)

which is only a valid solution for t < 1/γ, which can be arbitrarily small. □
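The blow-up in Example 2.2 is easy to observe numerically. The following sketch (not part of the original notes) integrates u' = u² for the hypothetical choice γ = 1 with scipy and compares against the exact solution γ/(1 − γt), which ceases to exist at t = 1/γ = 1.

    # Sketch: numerical illustration of blow-up in Example 2.2, with gamma = 1
    # chosen only for illustration; the exact solution is u(t) = 1/(1 - t).
    from scipy.integrate import solve_ivp

    gamma = 1.0
    sol = solve_ivp(lambda t, u: u**2, (0.0, 0.99), [gamma],
                    rtol=1e-8, atol=1e-10, dense_output=True)
    for t in [0.5, 0.9, 0.99]:
        exact = gamma / (1.0 - gamma * t)
        print(f"t = {t}: numerical = {sol.sol(t)[0]:.4f}, exact = {exact:.4f}")

As t approaches 1 both values grow without bound, consistent with the fact that the solution exists only locally.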


With more restrictions on f it may be possible to show that the solution exists on any interval containing t_0, in which case we would say that the solution exists globally. This is the case, for example, for the linear ODE (2.1.3).

Whenever the conditions of Theorem 2.1 hold, the set of all possible solutions may be regarded as being parametrized by the n constants γ_0, ..., γ_{n−1}, so that as mentioned above, the general solution will contain n arbitrary parameters. In the special case of the linear equation (2.1.3) it can be shown that the general solution may be given as

    u(t) = Σ_{j=1}^{n} c_j u_j(t) + u_p(t)    (2.1.11)

where u_p is any particular solution of (2.1.3), and u_1, ..., u_n are any n linearly independent solutions of the corresponding homogeneous equation Lu = 0. Any such set of functions u_1, ..., u_n is also called a fundamental set for Lu = 0.

Example 2.3. If Lu = u'' + u then by direct substitution we see that u_1(t) = sin t, u_2(t) = cos t are solutions, and they are clearly linearly independent. Thus {sin t, cos t} is a fundamental set for Lu = 0 and u(t) = c_1 sin t + c_2 cos t is the general solution of Lu = 0. For the inhomogeneous ODE u'' + u = e^t one may check that u_p(t) = e^t/2 is a particular solution, so the general solution is u(t) = c_1 sin t + c_2 cos t + e^t/2.
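As a quick symbolic sanity check (a sketch, not from the notes; sympy is assumed to be available), one can verify the fundamental set and the particular solution of Example 2.3:

    # Sketch: check that u_p = e^t/2 satisfies u'' + u = e^t, and that
    # sin t, cos t solve the homogeneous equation u'' + u = 0.
    import sympy as sp

    t = sp.symbols('t')
    up = sp.exp(t) / 2
    print(sp.simplify(sp.diff(up, t, 2) + up - sp.exp(t)))   # -> 0
    for uh in (sp.sin(t), sp.cos(t)):
        print(sp.simplify(sp.diff(uh, t, 2) + uh))           # -> 0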

2.1.2 Boundary Value Problems

For an ODE of order n ≥ 2 it may be of interest to impose side conditions at more than one point, typically the endpoints of the interval of interest. We will then refer to the side conditions as boundary conditions and the problem of solving the ODE subject to the given boundary conditions as a boundary value problem (BVP). Since the general solution still contains n parameters, we still expect to be able to impose a total of n side conditions. However we can see from simple examples that the situation with regard to existence and uniqueness in such boundary value problems is much less clear than for initial value problems.

Example 2.4. Consider the boundary value problem

    u'' + u = 0    0 < t < π    u(0) = 0    u(π) = 1    (2.1.12)

Starting from the general solution u(t) = c_1 sin t + c_2 cos t, the two boundary conditions lead to u(0) = c_2 = 0 and u(π) = −c_2 = 1. Since these are inconsistent, the BVP has no solution. □


Example 2.5. For the boundary value problem

    u'' + u = 0    0 < t < π    u(0) = 0    u(π) = 0    (2.1.13)

we have solutions u(t) = C sin t for any constant C, that is, the BVP has infinitely many solutions.

The topic of boundary value problems will be studied in much more detail in Chapter ( ).

2.1.3 Some exactly solvable cases

Let us recall explicit solution methods for some commonly occurring types of ODEs.

• For the first order linear ODE

    u' + p(t)u = q(t)    (2.1.14)

define the so-called integrating factor ρ(t) = e^{P(t)} where P' = p. Multiplying the equation through by ρ we then get

    (ρu)' = ρq    (2.1.15)

so if we pick Q such that Q' = ρq, the general solution may be given as

    u(t) = (Q(t) + C)/ρ(t)    (2.1.16)

(A worked sketch of this recipe is given after this list.)

• Next consider the linear homogeneous constant coefficient ODE

    Lu = Σ_{j=0}^{n} a_j u^{(j)} = 0    (2.1.17)

If we look for solutions in the form u(t) = e^{λt} then by direct substitution we find that u is a solution provided λ is a root of the corresponding characteristic polynomial

    P(λ) = Σ_{j=0}^{n} a_j λ^j    (2.1.18)

We therefore obtain as many linearly independent solutions as there are distinct roots of P. If this number is less than n, then we may seek further solutions of the form te^{λt}, t²e^{λt}, ..., until a total of n linearly independent solutions have been found. In the case of complex roots, equivalent expressions in terms of trigonometric functions are often used in place of complex exponentials.

• Finally, closely related to the previous case is the so-called Cauchy-Euler type equation

    Lu = Σ_{j=0}^{n} (t − t_0)^j a_j u^{(j)} = 0    (2.1.19)

for some constants a_0, ..., a_n. In this case we look for solutions in the form u(t) = (t − t_0)^λ with λ to be found. Substituting into (2.1.19) we will find again an n'th order polynomial whose roots determine the possible values of λ. The interested reader may refer to any standard undergraduate level ODE book for the additional considerations which arise in the case of complex or repeated roots.
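The following minimal sketch (not part of the original notes) carries out the integrating factor recipe (2.1.14)-(2.1.16) symbolically for the hypothetical example u' + 2u = cos t with u(0) = 0, chosen only for illustration.

    # Sketch: integrating factor method (2.1.14)-(2.1.16) for u' + 2u = cos t.
    import sympy as sp

    t, C = sp.symbols('t C')
    p, q = 2, sp.cos(t)
    P = sp.integrate(p, t)              # P' = p
    rho = sp.exp(P)                     # integrating factor rho = e^{P(t)}
    Q = sp.integrate(rho * q, t)        # Q' = rho * q
    u = (Q + C) / rho                   # general solution (2.1.16)
    u = u.subs(C, sp.solve(sp.Eq(u.subs(t, 0), 0), C)[0])   # impose u(0) = 0
    print(sp.simplify(sp.diff(u, t) + 2 * u - sp.cos(t)))    # -> 0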

2.2 Integral equations

In this section we discuss the basic set-up for the study of linear integral equations. See for example [15], [21] for general references in the classical theory of integral equations. Let Ω ⊂ R^N be a measurable set and set

    Tu(x) = ∫_Ω K(x, y) u(y) dy    (2.2.1)

Here the function K should be a measurable function on Ω × Ω, and is called the kernel of the integral operator T, which is linear since (2.1.6) obviously holds.

A class of associated integral equations is then

    ∫_Ω K(x, y) u(y) dy = λu(x) + g(x)    x ∈ Ω    (2.2.2)

for some scalar λ and given function g in some appropriate class. If λ = 0 then (2.2.2) is a first kind integral equation, otherwise it is second kind. Let us consider some simple examples which may be studied by elementary means.


Example 2.6. Let Ω = (0, 1) ⊂ R and K(x, y) ≡ 1. The corresponding first kind integral equation is therefore

    ∫_0^1 u(y) dy = g(x)    0 < x < 1    (2.2.3)

For simplicity here we will assume that g is a continuous function. The left hand side is independent of x, thus a solution can exist only if g(x) is a constant function. When g is constant, on the other hand, infinitely many solutions will exist, since we just need to find any u with the given definite integral.

For the corresponding second kind equation,

    ∫_0^1 u(y) dy = λu(x) + g(x)    (2.2.4)

a solution must have the specific form u(x) = (C − g(x))/λ for some constant C. Substituting into the equation then gives, after obvious simplification, that

    C − ∫_0^1 g(y) dy = Cλ    (2.2.5)

or

    C = (∫_0^1 g(y) dy)/(1 − λ)    (2.2.6)

in the case that λ ≠ 1. Thus, for any continuous function g and λ ≠ 0, 1, there exists a unique solution of the integral equation, namely

    u(x) = (∫_0^1 g(y) dy)/(λ(1 − λ)) − g(x)/λ    (2.2.7)

In the remaining case that λ = 1 it is immediate from (2.2.5) that a solution can exist only if ∫_0^1 g(y) dy = 0, in which case u(x) = C − g(x) is a solution for any choice of C.
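A quick numerical check of formula (2.2.7) (a sketch, not from the notes) for the hypothetical choices g(x) = x and λ = 1/2; any continuous g and λ ≠ 0, 1 would do.

    # Sketch: verify (2.2.7) against the second kind equation (2.2.4).
    from scipy.integrate import quad

    lam = 0.5
    g = lambda x: x
    Ig = quad(g, 0.0, 1.0)[0]                              # integral of g over (0,1)
    u = lambda x: Ig / (lam * (1.0 - lam)) - g(x) / lam    # formula (2.2.7)

    Iu = quad(u, 0.0, 1.0)[0]                              # left hand side of (2.2.4)
    for x in [0.1, 0.5, 0.9]:
        print(abs(Iu - (lam * u(x) + g(x))))               # residual ~ 1e-16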

This very simple example already exhibits features which turn out to be common to a much larger class of integral equations of this general type. These are

• The first kind integral equation will require much more restrictive conditions on g in order for a solution to exist.

• For most λ ≠ 0 the second kind integral equation has a unique solution for any g.

• There may exist a few exceptional values of λ for which either existence or uniqueness fails in the corresponding second kind equation.

All of these points will be elaborated and made precise in Chapter ( ).

Example 2.7. Let Ω = (0, 1) and

    Tu(x) = ∫_0^x u(y) dy    (2.2.8)

corresponding to the kernel

    K(x, y) = 1 if y < x,  0 if x ≤ y    (2.2.9)

The corresponding integral equation may then be written as

    ∫_0^x u(y) dy = λu(x) + g(x)    (2.2.10)

This is the prototype of an integral operator of so-called Volterra type, see the definition below.

In the first kind case, λ = 0, we see that g(0) = 0 is a necessary condition for solvability, in which case the solution is u(x) = g'(x), provided that g is differentiable in some suitable sense. For λ ≠ 0 we note that differentiation of (2.2.10) with respect to x gives

    u' − u/λ = −g'(x)/λ    (2.2.11)

which is of the type (2.1.14), and so may be solved by the method given there. The result, after some obvious algebraic manipulation, is

    u(x) = −(e^{x/λ}/λ) g(0) − (1/λ) ∫_0^x e^{(x−y)/λ} g'(y) dy    (2.2.12)

Note, however, that by an integration by parts, this formula is seen to be equivalent to

    u(x) = −g(x)/λ − (1/λ²) ∫_0^x e^{(x−y)/λ} g(y) dy    (2.2.13)

Observe that (2.2.12) seems to require differentiability of g even though (2.2.13) does not, thus (2.2.13) would be the preferred solution formula. It may be verified directly by substitution that (2.2.13) is a valid solution of (2.2.10) for all λ ≠ 0, assuming that g is continuous on [0, 1].
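The following sketch (not part of the notes) performs that verification numerically for the hypothetical data g(x) = sin x and λ = 0.3.

    # Sketch: check that formula (2.2.13) solves the Volterra equation (2.2.10).
    import numpy as np
    from scipy.integrate import quad

    lam, g = 0.3, np.sin

    def u(x):
        integral = quad(lambda y: np.exp((x - y) / lam) * g(y), 0.0, x)[0]
        return -g(x) / lam - integral / lam**2           # formula (2.2.13)

    for x in [0.2, 0.6, 1.0]:
        lhs = quad(u, 0.0, x)[0]                         # integral of u from 0 to x
        print(abs(lhs - (lam * u(x) + g(x))))            # residual, small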

Concerning the two simple integral equations just discussed observe that

• For the first kind equation, there are fewer restrictions on g needed for solvability in the Volterra case (2.2.10) than in the non-Volterra case (2.2.4).

• There are no exceptional values λ ≠ 0 in the Volterra case, that is, a unique solution exists for every λ ≠ 0 and every continuous g.

Here are some of the more important ways in which integral operators are classified:

Definition 2.1. The kernel K(x, y) is called

• symmetric if K(x, y) = K(y, x)

• Volterra type if N = 1 and K(x, y) = 0 for x > y or x < y

• convolution type if K(x, y) = K(x − y)

• Hilbert-Schmidt type if ∫_{Ω×Ω} |K(x, y)|² dxdy < ∞

• singular if K(x, y) is unbounded on Ω × Ω

Some important examples of integral operators, which will receive much more attention later in the book, are the Fourier transform

    Tu(x) = (2π)^{−N/2} ∫_{R^N} e^{−ix·y} u(y) dy,    (2.2.14)

the Laplace transform

    Tu(x) = ∫_0^∞ e^{−xy} u(y) dy,    (2.2.15)

the Hilbert transform

    Tu(x) = (1/π) ∫_{−∞}^{∞} u(y)/(x − y) dy,    (2.2.16)

and the Abel operator

    Tu(x) = ∫_0^x u(y)/√(x − y) dy.    (2.2.17)
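As an aside (a sketch, not from the notes): with the normalization used in (2.2.14) and N = 1, the Gaussian u(y) = e^{−y²/2} is its own Fourier transform, which can be checked by direct quadrature.

    # Sketch: for N = 1 and the (2*pi)^(-1/2) normalization of (2.2.14),
    # the Gaussian exp(-y^2/2) is mapped to itself.
    import numpy as np
    from scipy.integrate import quad

    def fourier(u, x):
        re = quad(lambda y: np.cos(x * y) * u(y), -np.inf, np.inf)[0]
        im = quad(lambda y: -np.sin(x * y) * u(y), -np.inf, np.inf)[0]
        return (re + 1j * im) / np.sqrt(2.0 * np.pi)

    u = lambda y: np.exp(-y**2 / 2.0)
    for x in [0.0, 1.0, 2.5]:
        print(abs(fourier(u, x) - u(x)))    # ~ 1e-12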


2.3 Partial differential equations

An m'th order partial differential equation (PDE) for an unknown function u = u(x) on a domain Ω ⊂ R^N may be given in the form

    F(x, {D^α u}_{|α| ≤ m}) = 0    (2.3.1)

Here we are using the so-called multi-index notation for partial derivatives, which works as follows. A multi-index is a vector of non-negative integers

    α = (α_1, α_2, ..., α_N)    α_i ∈ {0, 1, ...}    (2.3.2)

In terms of α we define

    |α| = Σ_{i=1}^{N} α_i    (2.3.3)

the order of α, and

    D^α u = ∂^{|α|} u / (∂x_1^{α_1} ∂x_2^{α_2} ... ∂x_N^{α_N})    (2.3.4)

the corresponding α derivative of u. For later use it is also convenient to define the factorial of a multi-index,

    α! = α_1! α_2! ... α_N!    (2.3.5)

The PDE (2.3.1) is linear if it can be written as

    Lu(x) = Σ_{|α| ≤ m} a_α(x) D^α u(x) = g(x)    (2.3.6)

for some coefficient functions a_α.

2.3.1 First order PDEs and the method of characteristics

Let us start with the simplest possible example.

Example 2.8. When N = 2 and m = 1 consider

    ∂u/∂x_1 = 0    (2.3.7)

By elementary calculus considerations it is clear that u is a solution if and only if u is independent of x_1, i.e.

    u(x_1, x_2) = f(x_2)    (2.3.8)

for some function f. This is then the general solution of the given PDE, which we note contains an arbitrary function f.


Example 2.9. Next consider, again for N = 2, m = 1, the PDE

    a ∂u/∂x_1 + b ∂u/∂x_2 = 0    (2.3.9)

where a, b are fixed constants. This amounts precisely to the condition that u has directional derivative 0 in the direction θ = ⟨a, b⟩, so u is constant along any line parallel to θ. This in turn leads to the conclusion that u(x_1, x_2) = f(ax_2 − bx_1) for some arbitrary function f, which at least for the moment would seem to need to be differentiable. □

The collection of lines parallel to θ, i.e. the lines ax_2 − bx_1 = C, obviously plays a special role in the above example; they are the so-called characteristics, or characteristic curves, associated to this particular PDE. The general concept of characteristic curve will now be described for the case of a first order linear PDE in two independent variables (with a temporary change of notation),

    a(x, y)u_x + b(x, y)u_y = c(x, y)    (2.3.10)

Consider the associated ODE system

    dx/dt = a(x, y)    dy/dt = b(x, y)    (2.3.11)

and suppose we have some solution pair x = x(t), y = y(t), which we regard as a parametrically given curve in the (x, y) plane. Such a curve is then, by definition, a characteristic curve for (2.3.10). Observe that if u(x, y) is a differentiable solution of (2.3.10) then

    (d/dt) u(x(t), y(t)) = a(x(t), y(t)) u_x(x(t), y(t)) + b(x(t), y(t)) u_y(x(t), y(t)) = c(x(t), y(t))    (2.3.12)

so that u satisfies a certain first order ODE along any characteristic curve. For example if c(x, y) ≡ 0 then, as in the previous example, any solution of the PDE is constant along any characteristic curve.

Now let Γ ⊂ R² be some curve, which we assume can be parametrized as

    x = f(s),  y = g(s),    s_0 < s < s_1    (2.3.13)

The Cauchy problem for (2.3.10) consists in finding a solution of (2.3.10) with values prescribed on Γ, that is,

    u(f(s), g(s)) = h(s)    s_0 < s < s_1    (2.3.14)

for some given function h. Assuming for the moment that such a solution u exists, let x(t, s), y(t, s) be the characteristic curve passing through (f(s), g(s)) ∈ Γ when t = 0, i.e.

    ∂x/∂t = a(x, y)    x(0, s) = f(s)
    ∂y/∂t = b(x, y)    y(0, s) = g(s)    (2.3.15)

We must then have

    (∂/∂t) u(x(t, s), y(t, s)) = c(x(t, s), y(t, s))    u(x(0, s), y(0, s)) = h(s)    (2.3.16)

This is a first order initial value problem in t, depending on s as a parameter, which is then guaranteed to have a solution at least for |t| < ε for some ε > 0. The three relations x = x(t, s), y = y(t, s), z = u(x(t, s), y(t, s)) generally amount to the parametric description of a surface in R³ containing Γ. If we can eliminate the parameters s, t to obtain the surface in non-parametric form z = u(x, y) then u is the sought after solution of the Cauchy problem.

Example 2.10. Let Γ denote the x axis and let us solve

    x u_x + u_y = 1    (2.3.17)

with u = h on Γ. Introducing f(s) = s, g(s) = 0 as the parametrization of Γ, we must then solve

    ∂x/∂t = x    x(0, s) = s
    ∂y/∂t = 1    y(0, s) = 0
    (∂/∂t) u(x(t, s), y(t, s)) = 1    u(s, 0) = h(s)    (2.3.18)

We then easily obtain

    x(s, t) = s e^t    y(s, t) = t    u(x(s, t), y(s, t)) = t + h(s)    (2.3.19)

and eliminating t, s yields the solution formula

    u(x, y) = y + h(x e^{−y})    (2.3.20)

The characteristics in this case are the curves x = s e^t, y = t for fixed s, or x = s e^y in nonparametric form. Note here that the solution is defined throughout the x, y plane even though nothing in the preceding discussion guarantees that. Since h has not been otherwise prescribed we may also regard (2.3.20) as the general solution of (2.3.17).
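A symbolic check of Example 2.10 (a sketch, not from the notes), with h left as an arbitrary function:

    # Sketch: verify that (2.3.20) solves x*u_x + u_y = 1 and that u = h(x)
    # on the x axis (y = 0).
    import sympy as sp

    x, y = sp.symbols('x y')
    h = sp.Function('h')
    u = y + h(x * sp.exp(-y))                                    # formula (2.3.20)
    print(sp.simplify(x * sp.diff(u, x) + sp.diff(u, y) - 1))    # -> 0
    print(sp.simplify(u.subs(y, 0) - h(x)))                      # -> 0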

The attentive reader may already realize that this procedure cannot work in all cases, as is made clear by the following consideration: if c ≡ 0 and Γ is itself a characteristic curve, then the solution on Γ would have to simultaneously be equal to the given function h and to be constant, so that no solution can exist except possibly in the case that h is a constant function. From another, more general, point of view we must eliminate the parameters s, t by inverting the relations x = x(s, t), y = y(s, t) to obtain s, t in terms of x, y, at least near Γ, and according to the inverse function theorem this should require that the Jacobian matrix

    ( ∂x/∂t  ∂y/∂t )              ( a(f(s), g(s))  b(f(s), g(s)) )
    ( ∂x/∂s  ∂y/∂s ) at t = 0  =  ( f'(s)          g'(s)         )    (2.3.21)

be nonsingular for all s. Equivalently the direction ⟨f', g'⟩ should not be parallel to ⟨a, b⟩, and since ⟨a, b⟩ must be tangent to the characteristic curve, this amounts to the requirement that Γ itself should have a non-characteristic tangent direction at every point. We say that Γ is non-characteristic for the PDE (2.3.10) when this condition holds.

The following precise theorem can be established, see for example Chapter 1 of [18], or Chapter 3 of [10].

Theorem 2.2. Let Γ ⊂ R² be a continuously differentiable curve which is non-characteristic for (2.3.10), let h be a continuously differentiable function on Γ, and let a, b, c be continuously differentiable functions in a neighborhood of Γ. Then there exists a unique continuously differentiable function u(x, y) defined in a neighborhood of Γ which is a solution of (2.3.10).

The method of characteristics is capable of a considerable amount of generalization, in particular to first order PDEs in any number of independent variables, and to fully nonlinear first order PDEs, see the references just given above.

2.3.2 Second order problems in R^2

Let us next look at the following special type of second order PDE in two independent variables x, y:

    A u_xx + B u_xy + C u_yy = 0    (2.3.22)

where A, B, C are real constants, not all zero. Consider introducing new coordinates ξ, η by means of a linear change of variables

    ξ = αx + βy    η = γx + δy    (2.3.23)


with αδ − βγ ≠ 0, so that the transformation is invertible. Our goal is to make a good choice of α, β, γ, δ so as to achieve a simpler, but equivalent, PDE to study.

Given any PDE and any change of coordinates, we obtain the expression for the PDE in the new coordinate system by straightforward application of the chain rule. In our case, for example, we have

    ∂u/∂x = (∂u/∂ξ)(∂ξ/∂x) + (∂u/∂η)(∂η/∂x) = α ∂u/∂ξ + γ ∂u/∂η    (2.3.24)

    ∂²u/∂x² = (α ∂/∂ξ + γ ∂/∂η)(α ∂u/∂ξ + γ ∂u/∂η) = α² ∂²u/∂ξ² + 2αγ ∂²u/∂ξ∂η + γ² ∂²u/∂η²    (2.3.25)

with similar expressions for u_xy and u_yy. Substituting into (2.3.22) the resulting PDE is

    a u_ξξ + b u_ξη + c u_ηη = 0    (2.3.26)

where

    a = α²A + αβB + β²C    (2.3.27)
    b = 2αγA + (αδ + βγ)B + 2βδC    (2.3.28)
    c = γ²A + γδB + δ²C    (2.3.29)

The idea now is to make special choices of α, β, γ, δ to achieve as simple a form as possible for the transformed PDE (2.3.26).

Suppose first that B² − 4AC > 0, so that there exist two real and distinct roots r_1, r_2 of Ar² + Br + C = 0. If α, β, γ, δ are chosen so that

    α/β = r_1    γ/δ = r_2    (2.3.30)

then a = c = 0 (and αδ − βγ ≠ 0), so that the transformed PDE is simply u_ξη = 0. The general solution of this second order PDE is easily obtained: u_ξ must be a function of ξ alone, so integrating with respect to ξ and observing that the 'constant of integration' could be any function of η, we get

    u(ξ, η) = F(ξ) + G(η)    (2.3.31)

for any differentiable functions F, G. Finally, reverting to the original coordinate system, the result is

    u(x, y) = F(αx + βy) + G(γx + δy)    (2.3.32)

The lines αx + βy = C, γx + δy = C are called the characteristics for (2.3.22). Characteristics are an important concept for this and some more general second order PDEs, but they don't play as central a role as in the first order case.


Example 2.11. For the PDE

    u_xx − u_yy = 0    (2.3.33)

the roots r satisfy r² − 1 = 0. We may then choose, for example, α = β = γ = 1, δ = −1, to get the general solution

    u(x, y) = F(x + y) + G(x − y)    (2.3.34)
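The reduction in Example 2.11 can also be checked mechanically from the coefficient formulas (2.3.27)-(2.3.29); the following sketch (not from the notes) does so.

    # Sketch: apply (2.3.27)-(2.3.29) to (A, B, C) = (1, 0, -1) with
    # alpha = beta = gamma = 1, delta = -1, and confirm a = c = 0.
    A, B, C = 1, 0, -1
    al, be, ga, de = 1, 1, 1, -1
    a = al**2 * A + al * be * B + be**2 * C
    b = 2 * al * ga * A + (al * de + be * ga) * B + 2 * be * de * C
    c = ga**2 * A + ga * de * B + de**2 * C
    print(a, b, c)    # -> 0 4 0, i.e. the transformed PDE is 4*u_xi_eta = 0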

Next assume that B² − 4AC = 0. If either of A or C is 0, then so is B, in which case the PDE already has the form u_ξξ = 0 or u_ηη = 0, say the first of these without loss of generality. Otherwise, choose

    α = −B/(2A)    β = 1    γ = 1    δ = 0    (2.3.35)

to obtain a = b = 0, c = A, so that (after interchanging the names of ξ and η if necessary) the transformed PDE in all cases is u_ξξ = 0.

Finally, if B² − 4AC < 0 then A ≠ 0 must hold, and we may choose

    α = −B/√(4AC − B²)    β = 2A/√(4AC − B²)    γ = 1    δ = 0    (2.3.36)

One may check from (2.3.27)-(2.3.29) that this gives a = c = A and b = 0, so that the transformed equation is

    u_ξξ + u_ηη = 0    (2.3.37)

We have therefore established that any PDE of the type (2.3.22) can be transformed, by means of a linear change of variables, to one of the three simple types

    u_ξη = 0    u_ξξ = 0    u_ξξ + u_ηη = 0    (2.3.38)

each of which then leads to a prototype for a certain class of PDEs. If we allow lower order terms,

    A u_xx + B u_xy + C u_yy + D u_x + E u_y + F u = G    (2.3.39)

then after the transformation (2.3.23) it is clear that the lower order terms remain as lower order terms. Thus any PDE of the type (2.3.39) is, up to a change of coordinates, one of the three types (2.3.38), up to lower order terms, and only the value of the discriminant B² − 4AC needs to be known to determine which of the three types is obtained.

The above discussion motivates the following classification: The PDE (2.3.39) is said to be:


• hyperbolic if B² − 4AC > 0

• parabolic if B² − 4AC = 0

• elliptic if B² − 4AC < 0

The terminology comes from an obvious analogy with conic sections, i.e. the solution set of Ax² + Bxy + Cy² + Dx + Ey + F = 0 is respectively a hyperbola, parabola or ellipse (or a degenerate case) according as B² − 4AC is positive, zero or negative.

We can also allow the coefficients A, B, ..., G to be variable functions of x, y, and in this case the classification is done pointwise, so the type can change. An important example of this phenomenon is the so-called Tricomi equation (see e.g. Chapter 12 of [13])

    u_xx − x u_yy = 0    (2.3.40)

which is hyperbolic for x > 0 and elliptic for x < 0. One might refer to the equation as being parabolic for x = 0, but generally speaking we do not do this, since it is not really meaningful to speak of a PDE being satisfied in a set without interior points.

The above discussion is special to the case of N = 2 independent variables, and in the case of N ≥ 3 there is no such complete classification. As we will see there are still PDEs referred to as being hyperbolic, parabolic or elliptic, but there are others which are not of any of these types, although these tend to be of less physical importance.

2.3.3 Further discussion of model problems

According to the previous discussion, we should focus our attention on a representative problem for each of the three types, since then we will also gain considerable information about other problems of the given type.

Wave equation

For the hyperbolic case we consider the wave equation

    u_tt − c² u_xx = 0    (2.3.41)

where c > 0 is a constant. Here we have changed the name of the variable y to t, following the usual convention of regarding u = u(x, t) as depending on a 'space' variable x and a 'time' variable t. This PDE arises in the simplest model of wave propagation in one dimension, where u represents, for example, the displacement of a vibrating medium from its equilibrium position, and c is the wave speed.

Following the procedure outlined at the beginning of this section, an appropriate change of coordinates is ξ = x + ct, η = x − ct, and we obtain the expression, also known as d'Alembert's formula, for the general solution,

    u(x, t) = F(x + ct) + G(x − ct)    (2.3.42)

for arbitrary twice differentiable functions F, G. The general solution may be viewed as the superposition of two waves of fixed shape, moving to the right and to the left with speed c.

The initial value problem for the wave equation consists in solving (2.3.41) for x ∈ R and t > 0 subject to the side conditions

    u(x, 0) = f(x)    u_t(x, 0) = g(x)    x ∈ R    (2.3.43)

where f, g represent the initial displacement and initial velocity of the vibrating medium. This problem may be completely and explicitly solved by means of d'Alembert's formula. We have

    F(x) + G(x) = f(x)    c(F'(x) − G'(x)) = g(x)    x ∈ R    (2.3.44)

Integrating the second relation gives F(x) − G(x) = (1/c) ∫_0^x g(s) ds + C for some constant C, and combining with the first relation yields

    F(x) = (1/2)( f(x) + (1/c) ∫_0^x g(s) ds + C )    G(x) = (1/2)( f(x) − (1/c) ∫_0^x g(s) ds − C )    (2.3.45)

Substituting into (2.3.42) and doing some obvious simplification we obtain

    u(x, t) = (1/2)( f(x + ct) + f(x − ct) ) + (1/(2c)) ∫_{x−ct}^{x+ct} g(s) ds    (2.3.46)

We remark that a general solution formula like (2.3.42) can be given for any PDE which is exactly transformable to u_ξη = 0, that is to say, any hyperbolic PDE of the form (2.3.22), but once lower order terms are allowed such a simple solution method is no longer available. For example the so-called Klein-Gordon equation u_tt − u_xx + u = 0 may be transformed to u_ξη + 4u = 0, which cannot be solved in so transparent a form. Thus the d'Alembert solution method, while very useful when applicable, is limited in its scope.
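The following sketch (not part of the notes) checks d'Alembert's formula (2.3.46) symbolically for the hypothetical data f(x) = e^{−x²}, g(x) = cos x, for which the integral in (2.3.46) has a closed form.

    # Sketch: verify (2.3.46) for f(x) = exp(-x^2), g(x) = cos(x); here the
    # integral of g over (x - c*t, x + c*t) is sin(x + c*t) - sin(x - c*t).
    import sympy as sp

    x, t, c = sp.symbols('x t c', positive=True)
    f = lambda z: sp.exp(-z**2)
    u = (f(x + c*t) + f(x - c*t)) / 2 \
        + (sp.sin(x + c*t) - sp.sin(x - c*t)) / (2*c)

    print(sp.simplify(sp.diff(u, t, 2) - c**2 * sp.diff(u, x, 2)))   # -> 0
    print(sp.simplify(u.subs(t, 0)))                                 # -> exp(-x**2)
    print(sp.simplify(sp.diff(u, t).subs(t, 0)))                     # -> cos(x)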


Heat equation

Another elementary method, which may be used in a wide variety of situations, is separation of variables. We illustrate with the case of the initial and boundary value problem

    u_t = u_xx    0 < x < 1    t > 0    (2.3.47)
    u(0, t) = u(1, t) = 0    t > 0    (2.3.48)
    u(x, 0) = f(x)    0 < x < 1    (2.3.49)

Here (2.3.47) is the heat equation, a parabolic equation modeling for example the temperature u = u(x, t) in a one dimensional medium as a function of location x and time t; (2.3.48) are the boundary conditions, stating that the temperature is held at zero at the two boundary points x = 0 and x = 1 for all t; and (2.3.49) represents the initial condition, i.e. that the initial temperature distribution is given by the prescribed function f(x).

We begin by ignoring the initial condition and otherwise looking for special solutions of the form u(x, t) = φ(t)ψ(x). Obviously u = 0 is such a solution, but it cannot be of any help in eventually solving the full stated problem, so we insist that neither of φ and ψ is the zero function. Inserting into (2.3.47) we obtain immediately that

    φ'(t)ψ(x) = φ(t)ψ''(x)    (2.3.50)

must hold, or equivalently

    φ'(t)/φ(t) = ψ''(x)/ψ(x)    (2.3.51)

Since the left side depends on t alone and the right side on x alone, it must be that both sides are equal to a common constant, which we denote by −λ (without yet at this point ruling out the possibility that λ itself is negative or even complex). We have therefore obtained ODEs for φ and ψ,

    φ'(t) + λφ(t) = 0    ψ''(x) + λψ(x) = 0    (2.3.52)

linked via the separation constant λ. Next, from the boundary condition (2.3.48) we get φ(t)ψ(0) = φ(t)ψ(1) = 0, and since φ is nonzero we must have ψ(0) = ψ(1) = 0.

The ODE and side conditions for ψ, namely

    ψ''(x) + λψ(x) = 0    0 < x < 1    ψ(0) = ψ(1) = 0    (2.3.53)

is the simplest example of a so-called Sturm-Liouville problem, a topic which will be studied in detail in Chapter ( ), but this particular case can be handled by elementary considerations. We emphasize that our goal is to find nonzero solutions of (2.3.53), along with the values of λ these correspond to, and as we will see, only certain values of λ will be possible.

Considering first the case that λ > 0, the general solution of the ODE is

    ψ(x) = c_1 sin(√λ x) + c_2 cos(√λ x)    (2.3.54)

The first boundary condition ψ(0) = 0 implies that c_2 = 0, and the second gives c_1 sin √λ = 0. We are not allowed to have c_1 = 0, since otherwise ψ = 0, so instead sin √λ = 0 must hold, i.e. √λ = π, 2π, .... Thus we have found one collection of solutions of (2.3.53), which we denote ψ_k(x) = sin kπx, k = 1, 2, .... Since they were found under the assumption that λ > 0, we should next consider other possibilities, but it turns out that we have already found all possible solutions of (2.3.53). For example if we suppose λ < 0 and k = √(−λ) then to solve (2.3.53) we must have ψ(x) = c_1 e^{kx} + c_2 e^{−kx}. From the boundary conditions

    c_1 + c_2 = 0    c_1 e^k + c_2 e^{−k} = 0    (2.3.55)

we see that the unique solution is c_1 = c_2 = 0 for any k > 0. Likewise we can check that ψ = 0 is the only possible solution for k = 0 and for nonreal k.

For each allowed value of λ we obviously have the corresponding function φ(t) = e^{−λt}, so that

    u_k(x, t) = e^{−k²π²t} sin kπx    k = 1, 2, ...    (2.3.56)

represents, aside from multiplicative constants, all possible product solutions of (2.3.47),(2.3.48).

To complete the solution of the initial and boundary value problem, we observe that any sum Σ_{k=1}^∞ c_k u_k(x, t) is also a solution of (2.3.47),(2.3.48) as long as c_k → 0 sufficiently rapidly, and we try to choose the coefficients c_k to achieve the initial condition (2.3.49). The requirement is therefore that

    f(x) = Σ_{k=1}^∞ c_k sin kπx    (2.3.57)

hold. For any f for which such a sine series representation is valid, we then have the solution of the given PDE problem

    u(x, t) = Σ_{k=1}^∞ c_k e^{−k²π²t} sin kπx    (2.3.58)

The question then becomes to characterize this set of f's in some more straightforward way, and this is done, among many other things, within the theory of Fourier series, which will be discussed in Chapter 8. Roughly speaking the result will be that essentially any reasonable function can be represented this way, but there are many aspects to this, including elaboration of the precise sense in which the series converges. One other fact concerning this series which we can easily anticipate at this point is a formula for the coefficient c_k: if we assume that (2.3.57) holds, we can multiply both sides by sin mπx for some integer m and integrate with respect to x over (0, 1), to obtain

    ∫_0^1 f(x) sin mπx dx = c_m ∫_0^1 sin² mπx dx = c_m/2    (2.3.59)

since ∫_0^1 sin kπx sin mπx dx = 0 for k ≠ m. Thus, if f is representable by a sine series, there is only one possibility for the k'th coefficient, namely

    c_k = 2 ∫_0^1 f(x) sin kπx dx    (2.3.60)
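As an illustration (a sketch, not from the notes), the coefficients (2.3.60) and the series solution (2.3.58) are easy to evaluate numerically for hypothetical initial data, here f(x) = x(1 − x).

    # Sketch: sine coefficients (2.3.60) and truncated series solution (2.3.58)
    # for the hypothetical initial data f(x) = x*(1 - x).
    import numpy as np
    from scipy.integrate import quad

    f = lambda x: x * (1.0 - x)
    K = 50
    c = [2.0 * quad(lambda x: f(x) * np.sin(k * np.pi * x), 0.0, 1.0)[0]
         for k in range(1, K + 1)]

    def u(x, t):
        return sum(ck * np.exp(-(k * np.pi)**2 * t) * np.sin(k * np.pi * x)
                   for k, ck in enumerate(c, start=1))

    print(abs(u(0.3, 0.0) - f(0.3)))    # small truncation error at t = 0
    print(u(0.3, 0.1))                  # the solution has decayed by t = 0.1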

Laplace equation

Finally we discuss a model problem of elliptic type,

    u_xx + u_yy = 0    x² + y² < 1    (2.3.61)
    u(x, y) = f(x, y)    x² + y² = 1    (2.3.62)

where f is a given function. The PDE in (2.3.61) is known as Laplace's equation, and is commonly written as Δu = 0, where Δ = ∂²/∂x² + ∂²/∂y² is the Laplace operator, or Laplacian. A function satisfying Laplace's equation in some set is said to be a harmonic function on that set; thus we are solving the boundary value problem of finding a harmonic function in the unit disk x² + y² < 1 subject to a prescribed boundary condition on the boundary of the disk.

One should immediately recognize that it would be natural here to make use of polar coordinates (r, θ), where according to the usual calculus notations,

    r = √(x² + y²)    tan θ = y/x    x = r cos θ    y = r sin θ    (2.3.63)

and we regard u = u(r, θ) and f = f(θ).


To begin we need to find the expression for Laplace's equation in polar coordinates. Again this is a straightforward calculation with the chain rule, for example

    ∂u/∂x = (∂u/∂r)(∂r/∂x) + (∂u/∂θ)(∂θ/∂x)    (2.3.64)
          = (x/√(x² + y²)) ∂u/∂r − (y/(x² + y²)) ∂u/∂θ    (2.3.65)
          = cos θ ∂u/∂r − (sin θ / r) ∂u/∂θ    (2.3.66)

and similar expressions for ∂u/∂y and the second derivatives. The end result is

    u_xx + u_yy = u_rr + (1/r) u_r + (1/r²) u_θθ = 0    (2.3.67)

We may now try separation of variables, looking for solutions in the product form u(r, θ) = R(r)Θ(θ). Substituting into (2.3.67) and dividing by RΘ gives

    r² R''(r)/R(r) + r R'(r)/R(r) = −Θ''(θ)/Θ(θ)    (2.3.68)

so both sides must be equal to a common constant λ. Therefore R and Θ must be nonzero solutions of

    Θ'' + λΘ = 0    r² R'' + r R' − λR = 0    (2.3.69)

Next it is necessary to recognize that there are two 'hidden' side conditions which we must make use of. The first of these is that Θ must be 2π periodic, since otherwise it would not be possible to express the solution u in terms of the original variables x, y in an unambiguous way. We can make this explicit by requiring

    Θ(0) = Θ(2π)    Θ'(0) = Θ'(2π)    (2.3.70)

As in the case of (2.3.53) we can search for allowable values of λ by considering the various cases λ > 0, λ < 0, etc. The outcome is that nontrivial solutions exist precisely if λ = k², k = 0, 1, 2, ..., with corresponding solutions, up to a multiplicative constant,

    Θ_k(θ) = 1 for k = 0,    sin kθ or cos kθ for k = 1, 2, ...    (2.3.71)

If one is willing to use the complex form, we could replace sin kθ, cos kθ by e^{±ikθ} for k = 1, 2, ....


With λ determined we must next solve the corresponding R equation,

    r² R'' + r R' − k² R = 0    (2.3.72)

which is of the Cauchy-Euler type (2.1.19). The general solution is

    R(r) = c_1 + c_2 log r    for k = 0
    R(r) = c_1 r^k + c_2 r^{−k}    for k = 1, 2, ...    (2.3.73)

and here we encounter the second hidden condition: the solution R should not be singular at the origin, since otherwise the PDE would not be satisfied throughout the unit disk. Thus we should choose c_2 = 0 in each case, leaving R(r) = r^k, k = 0, 1, ....

Summarizing, we have found all possible product solutions R(r)Θ(θ) of (2.3.61), and they are

    1,    r^k sin kθ,    r^k cos kθ    k = 1, 2, ...    (2.3.74)

up to constant multiples. Any sum of such terms is also a solution of (2.3.61), so we seek a solution of (2.3.61),(2.3.62) in the form

    u(r, θ) = a_0 + Σ_{k=1}^∞ ( a_k r^k cos kθ + b_k r^k sin kθ )    (2.3.75)

The coefficients must then be determined from the requirement that

    f(θ) = a_0 + Σ_{k=1}^∞ ( a_k cos kθ + b_k sin kθ )    (2.3.76)

This is another problem in the theory of Fourier series, very similar to that associated with (2.3.57), and which as mentioned before will be studied in detail in Chapter 8. Exact formulas for the coefficients in terms of f may be given, as in (2.3.60), see Exercise 19.
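For illustration (a sketch, not from the notes), the following code solves (2.3.61),(2.3.62) numerically for the hypothetical boundary data f(θ) = cos θ + cos 2θ, using the standard full-range Fourier coefficient formulas (the subject of Exercise 19); the exact solution in this case is u = r cos θ + r² cos 2θ.

    # Sketch: series solution (2.3.75) of the disk problem for
    # f(theta) = cos(theta) + cos(2*theta).
    import numpy as np
    from scipy.integrate import quad

    f = lambda th: np.cos(th) + np.cos(2 * th)
    K = 10
    a0 = quad(f, 0.0, 2 * np.pi)[0] / (2 * np.pi)
    a = [quad(lambda th: f(th) * np.cos(k * th), 0.0, 2 * np.pi)[0] / np.pi
         for k in range(1, K + 1)]
    b = [quad(lambda th: f(th) * np.sin(k * th), 0.0, 2 * np.pi)[0] / np.pi
         for k in range(1, K + 1)]

    def u(r, th):       # truncated series (2.3.75)
        return a0 + sum(r**k * (a[k-1] * np.cos(k * th) + b[k-1] * np.sin(k * th))
                        for k in range(1, K + 1))

    r, th = 0.5, 1.0
    print(abs(u(r, th) - (r * np.cos(th) + r**2 * np.cos(2 * th))))   # ~ machine precision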

2.3.4 Standard problems and side conditions

Let us now formulate a number of typical PDE problems which will recur throughout this book, and which are for the most part variants of the model problems discussed in the previous section. Let Ω be some domain in R^N and let ∂Ω denote the boundary of Ω. For any sufficiently differentiable function u, the Laplacian of u is

    Δu = Σ_{k=1}^{N} ∂²u/∂x_k²    (2.3.77)

• The PDE

    Δu = h    x ∈ Ω    (2.3.78)

is Poisson's equation, or Laplace's equation in the special case that h = 0. It is regarded as being of elliptic type, by analogy with the N = 2 case discussed in the previous section, or on account of a more general definition of ellipticity which will be given in Chapter 9. The most common types of side conditions associated with this PDE are

  – Dirichlet, or first kind, boundary conditions

        u(x) = g(x)    x ∈ ∂Ω    (2.3.79)

  – Neumann, or second kind, boundary conditions

        ∂u/∂n(x) = g(x)    x ∈ ∂Ω    (2.3.80)

    where ∂u/∂n(x) = (∇u · n)(x) is the directional derivative in the direction of the outward normal n(x) for x ∈ ∂Ω.

  – Robin, or third kind, boundary conditions

        ∂u/∂n(x) + σ(x)u(x) = g(x)    x ∈ ∂Ω    (2.3.81)

    for some given function σ.

• The PDE

    Δu + λu = h    x ∈ Ω    (2.3.82)

where λ is some constant, is the Helmholtz equation, also of elliptic type. The three types of boundary condition mentioned for the Poisson equation may also be imposed in this case.


• The PDE

    u_t = Δu    x ∈ Ω    t > 0    (2.3.83)

is the heat equation and is of parabolic type. Here u = u(x, t), where x is regarded as a spatial variable and t a time variable. By convention, the Laplacian acts only with respect to the N spatial variables x_1, ..., x_N. Appropriate side conditions for determining a solution of the heat equation are an initial condition

    u(x, 0) = f(x)    x ∈ Ω    (2.3.84)

and boundary conditions of the Dirichlet, Neumann or Robin type mentioned above. The only needed modification is that the functions involved may be allowed to depend on t; for example the Dirichlet boundary condition for the heat equation is

    u(x, t) = g(x, t)    x ∈ ∂Ω    t > 0    (2.3.85)

and similarly for the other two types.

• The PDE

    u_tt = Δu    x ∈ Ω    t > 0    (2.3.86)

is the wave equation and is of hyperbolic type. Since it is second order in t it is natural that there be two initial conditions, usually given as

    u(x, 0) = f(x)    u_t(x, 0) = g(x)    x ∈ Ω    (2.3.87)

Suitable boundary conditions for the wave equation are precisely the same as for the heat equation.

• Finally, the PDE

    iu_t = Δu    x ∈ R^N    t > 0    (2.3.88)

is the Schrodinger equation. Even when N = 1 it does not fall under the classification scheme of Section 2.3.2 because of the complex coefficient i = √−1. It is nevertheless one of the fundamental partial differential equations of mathematical physics, and we will have some things to say about it in later chapters. The spatial domain here is taken to be all of R^N rather than a subset Ω because this is by far the most common situation and the only one which will arise in this book. Since there is no spatial boundary, the only needed side condition is an initial condition for u, u(x, 0) = f(x), as in the heat equation case.

32

Page 34: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

2.4 Well-posed and ill-posed problemsillposed

All of the PDEs and associated side conditions discussed in the previous section turnout to be natural, in the sense that they lead to what are called well-posed problems, asomewhat imprecise concept we explain next. Roughly speaking a problem is well-posedif

• A solution exists.

• The solution is unique.

• The solution depends continuously on the data.

Here by ’data’ we mean any of the ingredients of the problem which we might imaginebeing changed, to obtain a problem of the same general type. For example in the Dirichletproblem for the Poisson equation

�u = f x 2 ⌦ u = 0 x 2 @⌦ (2.4.1)

the term f = f(x) would be regarded as the given data. The idea of continuous depen-dence is that if a ’small’ change is made in the data, then the resulting solution shouldalso undergo only a small change. For such a notion to be made precise, it is necessaryto have some specific idea in mind of how we would measure the magnitude of a changein f . As we shall see, there may be many natural ways to do so, and no precise state-ment about well-posedness can be given until such choices are made. In fact, even theexistence and uniqueness requirements, which may seem more clear cut, may also turnout to require much clarification in terms of what the exact meaning of ’solution’ is.

A problem which is not well-posed is called ill-posed. A classical problem in whichill-posedness can be easily recognized is Hadamard’s example, which we note is not ofone of the standard types mentioned above:

uxx + uyy = 0 �1 < x < 1 y > 0 (2.4.2)

u(x, 0) = 0 uy(x, 0) = g(x) �1 < x1 (2.4.3)

If g(x) = ↵ sin kx for some ↵, k > 0 then a corresponding solution is

u(x, y) = ↵sin kx

keky (2.4.4)

33

Page 35: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

This is known to be the unique solution, but notice that a change in ↵ (i.e. of the data g)of size ✏ implies a corresponding change in the solution for, say, y = 1 of size ✏ek. Sincek can be arbitrarily large, it follows that the problem is ill-posed, that is, small changesin the data do not necessarily lead to small changes in the solution.

Note that in this example if we change the PDE from uxx + uyy = 0 to uxx � uyy = 0then (aside from the name of a variable) we have precisely the problem (2.3.41),(2.3.43),which from the explicit solution (2.3.46) may be seen to be well-posed under any rea-sonable interpretation. Thus we see that some care must be taken in recognizing whatare the ’correct’ side conditions for a given PDE. Other interesting examples of ill-posedproblems are given in exercises 23 and 26, see also [24].

2.5 Exercises

1. Find a fundamental set and the general solution of u000 + u00 + u0 = 0.

2. Let L = aD2+bD+c (a 6= 0) be a constant coe�cient second order linear di↵erentialoperator, and let p(�) = a�2 + b� + c be the associated characteristic polynomial.If �

1

,�2

are the roots of p, show that we can express the operator L as L = a(D ��1

)(D � �2

). Use this factorization to obtain the general solution of Lu = 0 in thecase of repeated roots, �

1

= �2

.

ex22 3. Show that the solution of the initial value problem y0 = 3

py, y(0) = 0 is not unique.

(Hint: y(t) = 0 is one solution, find another one.) Why doesn’t this contradict theassertion in Theorem 2.1 about unique solvability of the initial value problem?

4. Solve the initial value problem for the Cauchy-Euler equation

(t+ 1)2u00 + 4(t+ 1)u0 � 10u = 0 u(1) = 2 u0(1) = �1

5. Consider the integral equationZ

1

0

K(x, y)u(y) dy = �u(x) + g(x)

for the kernel

K(x, y) =x2

1 + y3

a) For what values of � 2 C does there exist a unique solution for any function gwhich is continuous on [0, 1]?

34

Page 36: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

b) Find the solution set of the equation for all � 2 C and continuous functions g.

(Hint: For � 6= 0 any solution must have the form u(x) = �g(x)�

+ Cx2 for someconstant C.)

6. Find a kernel K(x, y) such that u(x) =R1

0

K(x, y)f(y) dy is the solution of

u00 + u = f(x) u(0) = u0(0) = 0

(Hint: Review the variation of parameters method in any undergraduate ODEtextbook.)

2-7 7. If f 2 C([0, 1]),

K(x, y) =

(y(x� 1) 0 < y < x < 1

x(y � 1) 0 < x < y < 1

and

u(x) =

Z1

0

K(x, y)f(y) dy

show thatu00 = f 0 < x < 1 u(0) = u(1) = 0

8. For each of the integral operators in (2.2.8),(2.2.14),(2.2.15),(2.2.16),and (2.2.17),discuss the classification(s) of the corresponding kernel, according to Definition(2.1).

9. Find the general solution of (1 + x2)ux + uy = 0. Sketch some of the characteristiccurves.

10. The general solution in Example 2.10 was found by solving the correspondingCauchy problem with � being the x axis. But the general solution should notactually depend on any specific choice of �. Show that the same general solution isfound if instead we take � to be the y axis.

11. Find the solution ofyux + xuy = 1 u(0, y) = e�y2

Discuss why the solution you find is only valid for |y| � |x|.

12. The method of characteristics developed in Section 2.3.1 for the linear PDE (2.3.10)can be easily extended to the so-called semilinear equation

a(x, y)ux + b(x, y)uy = c(x, y, u) (2.5.1)

35

Page 37: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

We simply replace (2.3.12) by

d

dtu(x(t), y(t)) = c(x(t), y(t), u(x(t), y(t))) (2.5.2)

which is still an ODE along a characteristic. With this in mind, solve

ux + xuy + u2 = 0 u(0, y) =1

y(2.5.3)

13. Find the general solution of uxx � 4uxy + 3uyy = 0.

14. Find the regions of the xy plane where the PDE

yuxx � 2uxy + xuyy � 3ux + u = 0

is elliptic, parabolic, and hyperbolic.

15. Find a solution formula for the half line wave equation problem

utt � c2uxx = 0 x > 0 t > 0 (2.5.4)

u(0, t) = h(t) t > 0 (2.5.5)

u(x, 0) = f(x) x > 0 (2.5.6)

ut(x, 0) = g(x) x > 0 (2.5.7)

Note where the solution coincides with (2.3.46) and explain why this should beexpected.

16. Complete the details of verifying (2.3.67)

ex-2-17 17. If u is a twice di↵erentiable function on RN depending only on r = |x|, show that

�u = urr +N � 1

rur

(Spherical coordinates in RN are reviewed in Section 18.3, but the details of theangular variables are not needed for this calculation. Start by showing that @u

@xj

=

u0(r)xj

r.)

18. Verify in detail that there are no nontrivial solutions of (2.3.53) for nonreal � 2 C.

ex23 19. Assuming that (2.3.76) is valid, find the coe�cients ak, bk in terms of f . (Hint:multiply the equation by sinm✓ or cosm✓ and integrate from 0 to 2⇡.)

36

Page 38: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

20. In the two dimensional case, solutions of Laplace’s equation �u = 0 may also befound by means of analytic function theory. Recall that if z = x+iy then a functionf(z) is analytic in an open set ⌦ if f 0(z) exists at every point of ⌦. If we think off = u + iv and u, v as functions of x, y then u = u(x, y), v = v(x, y) must satisfythe Cauchy-Riemann equations ux = vy, uy = �vx. Show in this case that u, v arealso solutions of Laplace’s equation. Find u, v if f(z) = z3 and f(z) = ez.

21. Find all of the product solutions u(x, t) = �(t) (x) that you can which satisfy thedamped wave equation

utt + ↵ut = uxx 0 < x < ⇡ t > 0

and the boundary conditions

u(0, t) = ux(⇡, t) = 0 t > 0

Here ↵ > 0 is the damping constant. What is the significance of the condition↵ < 1?

ex24 22. Show that any solution of the wave equation utt � uxx = 0 has the ‘four pointproperty’

u(x, t) + u(x+ h� k, t+ h+ k) = u(x+ h, t+ h) + u(x� k, t+ k)

for any h, k. (Suggestion: Use d’Alembert’s formula.)

ex25 23. In the Dirichlet problem for the wave equation

utt � uxx = 0 0 < x < 1 0 < t < 1

u(0, t) = u(1, t) = 0 0 < t < 1

u(x, 0) = 0 u(x, 1) = f(x) 0 < x < 1

show that neither existence nor uniqueness holds. (Hint: For the non-existencepart, use exercise 22 to find an f for which no solution exists.)

24. Let ⌦ be the rectangle [0, a]⇥ [0, b] in R2. Find all possible product solutions

u(x, y, t) = �(t) (x)⇣(y)

satisfyingut ��u = 0 (x, y) 2 ⌦ t > 0

u(x, y, t) = 0 (x, y) 2 @⌦ t > 0

37

Page 39: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

25. Find a solution of the Dirichlet problem for u = u(x, y) in the unit disc ⌦ = {(x, y) :x2 + y2 < 1},

�u = 1 (x, y) 2 ⌦ u(x, y) = 0 (x, y) 2 @⌦

(Suggestion: look for a solution in the form u = u(r) and recall (2.3.67).)

ex26 26. The problem

ut = uxx 0 < x < 1 t < T (2.5.8)

u(0, t) = u(1, t) = 0 t > 0 (2.5.9)

u(x, T ) = f(x) 0 < x < 1 (2.5.10)

is sometimes called a final value problem for the heat equation.

a) Show that this problem is ill-posed.

b) Show that this problem is equivalent to (2.3.47),(2.3.48),(2.3.49) except with theheat equation (2.3.47) replaced by the backward heat equation ut = �uxx.

38

Page 40: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

Chapter 3

Vector spaces

We will be working frequently with function spaces which are themselves special casesof more abstract spaces. Most such spaces which are of interest to us have both linearstructure and metric structure. This means that given any two elements of the space itis meaningful to speak of (i) a linear combination of the elements, and (ii) the distancebetween the two elements. These two kinds of concepts are abstracted in the definitionsof vector space and metric space.

3.1 Axioms of a vector spacechvec-1

Definition 3.1. A vector space is a set X such that whenever x, y 2 X and � is a scalarwe have x+ y 2 X and �x 2 X, and the following axioms hold.

[V1] x+ y = y + x for all x, y 2 X

[V2] (x+ y) + z = x+ (y + z) for all x, y, z 2 X

[V3] There exists an element 0 2 X such that x+ 0 = x for all x 2 X

[V4] For every x 2 X there exists an element �x 2 X such that x+ (�x) = 0

[V5] �(x+ y) = �x+ �y for all x, y 2 X and any scalar �

[V6] (�+ µ)x = �x+ µx for any x 2 X and any scalars �, µ

39

Page 41: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

[V7] �(µx) = (�µ)x for any x 2 X and any scalars �, µ

[V8] 1x = x for any x 2 X

Here the field of scalars my be either the real numbers R or the complex numbersC, and we may refer to X as a real or complex vector space accordingly, if a distinctionneeds to be made.

By an obvious induction argument, if x1

, . . . , xm 2 X and �1

, . . . ,�m are scalars, thenthe linear combination

Pmj=1

�jxj is itself an element of X.

Example 3.1. Ordinary N -dimensional Euclidean space

RN := {x = (x1

, x2

. . . xN) : xj 2 R}

is a real vector space with the usual operations of vector addition and scalar multiplica-tion,

(x1

, x2

. . . xN) + (y1

, y2

, . . . yN) = (x1

+ y1

, x2

+ y2

. . . xN + yN)

�(x1

, x2

. . . xN) = (�x1

,�x2

, . . .�xN) � 2 R

If we allow the components xj as well as the scalars � to be complex, we obtaininstead the complex vector space CN .

Example 3.2. If E ⇢ RN , let

C(E) = {f : E ! R : f is continous at x for every x 2 E}

denote the set of real valued continuous functions on E. Clearly C(E) is a real vectorspace with the ordinary operations of function addition and scalar multiplication

(f + g)(x) = f(x) + g(x) (�f)(x) = �f(x) � 2 R

If we allow the range space in the definition of C(E) to be C then C(E) becomes acomplex vector space.

Spaces of di↵erentiable functions likewise may be naturally regarded as vector spaces,for example

Cm(E) = {f : D↵f 2 C(E), |↵| m}and

C1(E) = {f : D↵f 2 C(E), for all ↵}2

40

Page 42: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

Example 3.3. If 0 < p < 1 and E is a measurable subset of RN , the space Lp(E) isdefined to be the set of measurable functions f : E ! R or f : E ! C such that

Z

E

|f(x)|p dx < 1 (3.1.1)

Here the integral is defined in the Lebesgue sense. Those unfamiliar with measure theoryand Lebesgue integration should consult a standard textbook such as [29],[27], or see abrief summary in Appendix ( ).

It may then be shown that Lp(E) is vector space for any 0 < p < 1. To see this weuse the known fact that if f, g are measurable then so are f + g and �f for any scalar �,and the numerical inequality (a+ b)p Cp(ap+ bp) for a, b � 0, where Cp = max (2p�1, 1)to prove that f +g 2 Lp(E) whenever f, g 2 Lp(E). Verification of the remaining axiomsis routine.

The related vector space L1(E) is defined as the set of measurable functions f forwhich

ess supx2E|f(x)| < 1 (3.1.2)

Here M = ess supx2E|f(x)| if |f(x)| M a.e. and there is no smaller constant with thisproperty.

Definition 3.2. If X is a vector space, a subset M ⇢ X is a subspace of X if

(i) x+ y 2 M whenever x, y 2 M

(ii) �x 2 M whenever x 2 M and � is a scalar

That is to say, a subspace is a subset of X which is closed under formation of linearcombinations. Clearly a subspace of a vector space is itself a vector space.

Example 3.4. The subset M = {x 2 RN : xj = 0} is a subspace of RN for any fixed j.

Example 3.5. If E ⇢ RN then C1(E) is a subspace of Cm(E) for any m, which in turnis a subspace of C(E).

Example 3.6. If X is any vector space and S ⇢ X, then the set of all finite linearcombinations of elements of S,

L(S) := {v 2 X : x =mX

j=1

�jxj for some scalars �1

,�2

, . . .�m and elements x1

, . . . xm 2 S}

41

Page 43: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

is a subspace ofX. It is also called the span, or linear span of S, or the subspace generatedby S. 2

Example 3.7. If in Example 5 we take X = C([a, b]) and fj(x) = xj�1 for j = 1, 2, . . .then the subspace generated by {fj}N+1

j=1

is PN , the vector space of polynomials of degreeless than or equal to N . Likewise, the the subspace generated by {fj}1j=1

is P , the vectorspace of all polynomials. 2

3.2 Linear independence and bases

Definition 3.3. We say that S ⇢ X is linearly independent if whenever x1

, . . . xm 2 S,�1

, . . .�m are scalars andPm

j=1

�jxj = 0 then �1

= �2

= . . .�m = 0. Otherwise S islinearly dependent.

Equivalently, S is linearly dependent if it is possible to express at least one of itselements as a linear combination of the remaining ones. In particular any set containingthe zero element is linearly dependent.

hamel Definition 3.4. We say that S ⇢ X is a basis of X if for any x 2 X there exists uniquescalars �

1

,�2

, . . .�m and elements x1

, . . . , xm 2 S such that x =Pm

j=1

�jxj.

The following characterization of a basis is then immediate:

Theorem 3.1. S ⇢ X is a basis of X if and only if S is linearly independent andL(S) = X.

It is important to emphasize that in this definition of basis it is required that everyx 2 X be expressible as a finite linear combination of the basis elements. This notionof basis will be inadequate for later purposes, and will be replaced by one which allowsinfinite sums, but this cannot be done until a meaning of convergence is available. Thenotion of basis in Definition 3.4 is called a Hamel basis if a distinction is necessary.

Definition 3.5. We say that dimX, the dimension of X, is m if there exist m linearlyindependent vectors in X but any collection of m+1 elements of X is linearly dependent.If there exists m linearly independent vectors for any positive integer m, then we saydimX = 1.

prop31 Proposition 3.1. The elements {x1

, x2

, . . . xm} form a basis for L({x1

, x2

, . . . xm}) ifand only if they are linearly independent.

42

Page 44: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

prop32 Proposition 3.2. The dimension of X is the number of vectors in any basis of X.

The proof of both of these Propositions is left for the exercises.

Example 3.8. RN or CN has dimension N . We will denote by ej the standard unitvector with a one in the j’th position and zero elsewhere. Then {e

1

, e2

, . . . eN} is thestandard basis for either RN or CN .

Example 3.9. In the vector space C([a, b]) the elements fj(t) = tj�1 are clearly linearlyindependent, so that the dimension is 1, as is the dimension of the subspace P . Alsoevidently the subspace PN has dimension N + 1.

Example 3.10. The set of solutions of the ordinary di↵erential equation u00 + u = 0is precisely the set of linear combinations u(t) = �

1

sin t + �2

cos t. Since sin t, cos t arelinearly independent functions, they form a basis for this two dimensional space.

The following is interesting, although not of great practical significance. Its proof,which is not obvious in the infinite dimensional case, relies on the Axiom of Choice andwill not be given here.

Theorem 3.2. Every vector space has a basis.

3.3 Linear transformations of a vector spacesec33

If X and Y are vector spaces, a mapping T : X 7�! Y is called linear if

T (�1

x1

+ �2

x2

) = �1

T (x1

) + �2

T (x2

) (3.3.1)

for all x1

, x2

2 X and all scalars �1

,�2

. Such a linear transformation is uniquely deter-mined on all of X by its action on any basis of X, i.e. if S = {x↵}↵2A is a basis of Xand y↵ = T (x↵), then for any x =

Pmj=1

�jx↵j

we have Tx =Pm

j=1

�jy↵j

.

In the case thatX andY are both of finite dimension let us choose bases {x1

, x2

, . . . xm},{y

1

, y2

, . . . yn} of X,Y respectively. For 1 j m there must exist unique scalars akjsuch that Txj =

Pnk=1

akjyk and it follows that

x =mX

j=1

�jxj ) Tx =nX

k=1

µkyk where µk =mX

j=1

akj�j (3.3.2)

43

Page 45: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

For a given basis {x1

, x2

, . . . xm} of X, if x =Pm

j=1

�jxj we say that �1

,�2

, . . .�m arethe coordinates of x with respect to the given basis. The n ⇥m matrix A = [akj] thusmaps the coordinates of x with respect to the basis {x

1

, x2

, . . . xm} to the coordinates ofTx with respect to the basis {y

1

, y2

, . . . yn}, and thus encodes all information about thelinear mapping T .

If T : X 7�! Y is linear, one-to-one and onto then we say T is an isomorphismbetween X to Y, and the vector spaces X and Y are isomorphic whenever there existsan isomorphism between them. If T is such an isomorphism, and S is a basis of X then iteasy to check that the image set T (S) is a basis of Y. In particular, any two isomorphicvector spaces have the same finite dimension or are both infinite dimensional.

For any linear mapping T : X ! Y we define the kernel, or null space, of T as

N(T ) = {x 2 X : Tx = 0} (3.3.3)

and the range of T as

R(T ) = {y 2 Y : y = Tx for some x 2 X} (3.3.4)

It is immediate that N(T ) and R(T ) are subspaces of X,Y respectively, and T is anisomorphism precisely if N(T ) = {0} and R(T ) = Y. If X = Y = RN or CN , we learnin linear algebra that these two conditions are equivalent.

3.4 Exercises

1. Using only the vector space axioms, show that the zero element in [V3] is unique.

2. Prove Propositions 3.1 and 3.2.

3. Show that the intersection of any family of subspaces of a vector space is also asubspace. What about the union of subspaces?

4. Show that Mm⇥n, the set of m⇥ n matrices, with the usual definitions of additionand scalar multiplication, is a vector space of dimension mn. Show that the subsetof symmetric matrices n ⇥ n matrices forms a subspace of Mn⇥n. What is itsdimension?

5. Under what conditions on a measurable set E ⇢ RN and p 2 (0,1] will it be truethat C(E) is a subspace of Lp(E)? Under what conditions is Lp(E) a subset ofLq(E)?

44

Page 46: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

6. Let uj(t) = t�j where �1

, . . .�n are arbitrary unequal real numbers. Show that{u

1

. . . un} are linearly independent functions on any interval (a, b) ⇢ R. (Sugges-tion: If

Pnj=1

↵jt�j = 0, divide by t�1 and di↵erentiate.)

7. A side condition for a di↵erential equation is homogeneous if whenever two func-tions satisfy the side condition then so does any linear combination of the twofunctions. For example the Dirichlet type boundary condition u = 0 for x 2 @⌦ ishomogeneous. Now let Lu =

P|↵|m a↵(x)D↵u denote any linear di↵erential oper-

ator. Show that the set of functions satisfying Lu = 0 and any homogeneous sideconditions is a vector space.

8. Consider the di↵erential equation u00 + u = 0 on the interval (0, ⇡). What is thedimension of the vector space of solutions which satisfy the homogeneous boundaryconditions a) u(0) = u(⇡), and b) u(0) = u(⇡) = 0. Repeat the question if theinterval (0, ⇡) is replaced by (0, 1) and (0, 2⇡).

9. Let Df = f 0 for any di↵erentiable function f on R. For any N � 0 show thatD : PN ! PN is linear and find its null space and range.

10. If X and Y are vector spaces, then the Cartesian product of X and Y, is definedas the set of ordered pairs

X⇥Y = {(x, y) : x 2 X, y 2 Y} (3.4.1)

Addition and scalar multiplication on X⇥Y are defined in the natural way,

(x, y) + (x, y) = (x+ x, y + y) �(x, y) = (�x,�y) (3.4.2)

a) Show that X⇥Y is a vector space.

b) Show that R⇥ R is isomorphic to R2.

11. IfX,Y are vector spaces of the same finite dimension, showX andY are isomorphic.

12. Show that Lp(0, 1) and Lp(a, b) are isomorphic, for any a, b 2 R and p 2 (0,1].

45

Page 47: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

Chapter 4

Metric spaces

chmetric

4.1 Axioms of a metric space

A metric space is a set on which some natural notion of distance may be defined.

Definition 4.1. A metric space is a pair (X, d) where X is a set and d is a real valuedmapping on X⇥X, such that the following axioms hold.

[M1] d(x, y) � 0 for all x, y 2 X

[M2] d(x, y) = 0 if and only if x = y

[M3] d(x, y) = d(y, x) for all x, y 2 X

[M4] d(x, y) d(x, z) + d(z, y) for all x, y, z 2 X.

Here d is the metric on X, i.e. d(x, y) is regarded as the distance from x to y. Axiom[M4] is known as the triangle inequality. Although strictly speaking the metric space isthe pair (X, d) it is a common practice to refer to X itself as being the metric space, withthe metric d understood from context. But as we will see in examples it is often possibleto assign di↵erent metrics to the same set X.

If (X, d) is a metric space and Y ⇢ X then it is clear that (Y, d) is also a metricspace, and in this case we say that Y inherits the metric of X.

46

Page 48: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

ex41 Example 4.1. If X = RN then there are many choices of d for which (RN , d) is a metricspace. The most familiar is the ordinary Euclidean distance

d(x, y) =

NX

j=1

|xj � yj|2! 1

2

(4.1.1)

In general we may define

dp(x, y) =

NX

j=1

|xj � yj|p! 1

p

1 p < 1 (4.1.2)

andd1(x, y) = max (|x

1

� y1

|, |x2

� y2

|, . . . |xn � yn|) (4.1.3)

The verification that (Rn, dp) is a metric space for 1 p 1 is left to the exercises– the triangle inequality is the only nontrivial step. The same family of metrics may beused with X = CN.

CofE Example 4.2. To assign a metric to C(E) more specific assumptions must be madeabout E. If we assume, for example, that E is a closed and bounded1 subset of RN wemay set

d1(f, g) = maxx2E

|f(x)� g(x)| (4.1.4) CMetric

so that d(f, g) is always finite by virtue of the well known theorem that a continuousfunction achieves its maximum on a closed, bounded set. Other possibilities are

dp(f, g) =

✓Z

E

|f(x)� g(x)|p dx◆ 1

p

1 p < 1 (4.1.5)

Note the analogy with the definition of dp in the case of RN or CN .

For more arbitrary sets E there is in general no natural metric for C(E). For example,if E is an open set, none of the metrics dp can be used since there is no reason why dp(f, g)should be finite for f, g 2 C(E).

As in the case of vector spaces, some spaces of di↵erentiable functions may also bemade into metric spaces. For this we will assume a bit more about E, namely that E is

1I.e. E is compact in RN . Compactness is discussed in more detail below, and we avoid using the term untilthen.

47

Page 49: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

the closure of a bounded open set O ⇢ RN , and in this case will say that D↵f 2 C(E) ifthe function D↵f defined in the usual pointwise sense on O has a continuous extensionto E. We then can define

Cm(E) = {f : D↵f 2 C(E) whenever |↵| m} (4.1.6)

with metricd(f, g) = max

|↵|mmaxx2E

|D↵(f � g)(x)| (4.1.7) CmMetric

which may be easily checked to satisfy [M1]-[M4].

We cannot define a metric on C1(E) in the obvious way just by letting m ! 1in the above definition, since there is no reason why the resulting maximum over m in(4.1.7) will be finite, even if f 2 Cm(E) for every m. See however Exercise 18.

Example 4.3. Recall that if E is a measurable subset of RN , we have defined corre-sponding vector spaces Lp(E) for 0 < p 1. To endow them with metric space structurelet

dp(f, g) =

✓Z

E

|f(x)� g(x)|p dx◆ 1

p

(4.1.8) dpmet

for 1 p < 1, andd1(f, g) = ess supx2E|f(x)� g(x)| (4.1.9) dinfmet

The validity of axioms [M1] and [M3] is clear, and the triangle inequality [M4] isan immediate consequence of the Minkowski inequality (18.1.10). But axiom [M2] doesnot appear to be satisfied here, since for example, two functions f, g agreeing except ata single point, or more generally agreeing except on a set of measure zero, would havedp(f, g) = 0. It is necessary, therefore, to modify our point of view concerning Lp(E) asfollows. We define an equivalence relation f ⇠ g if f = g almost everywhere, i.e. excepton a set of measure zero. If dp(f, g) = 0 we would be able to correctly conclude thatf ⇠ g, in which case we will regard f and g as being the same element of Lp(E). Thusstrictly speaking, Lp(E) is the set of equivalence classes of measurable functions, wherethe equivalence classes are defined by means of the above equivalence relation.

The distance dp([f ], [g]) between two equivalence classes [f ] and [g] may be unam-biguously determined by selecting a representative of each class and then evaluatingthe distance from (4.1.8) or (4.1.9). Likewise the vector space structure of Lp(E) ismaintained since, for example, we can define the sum of equivalence classes [f ] + [g] byselecting a representative of each class and observing that if f

1

⇠ f2

and g1

⇠ g2

then

48

Page 50: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

f1

+g1

⇠ f2

+g2

. It is rarely necessary to make a careful distinction between a measurablefunction and the equivalence class it belongs to, and whenever it can cause no confusionwe will follow the common practice of referring to members of Lp(E) as functions ratherthan equivalence classes. The notation f may be used to stand for either a function or itsequivalence class. An element f 2 Lp(E) will be said to be continuous if its equivalenceclass contains a continuous function, and in this way we can naturally regard C(E) as asubset of Lp(E).

Although Lp(E) is a vector space for 0 < p 1, we cannot use the above definitionof metric for 0 < p < 1, since it turns out the triangle inequality is not satisfied (seeExercise 6 of Chapter 5) except in degenerate cases.

4.2 Topological concepts

In a metric space various concepts of point set topology may be introduced.

Definition 4.2. If (X, d) is a metric space then

1. B(x, ✏) = {y 2 X : d(x, y) < ✏} is the ball centered at x of radius ✏.

2. A set E ⇢ X is bounded if there exists some x 2 X and R < 1 such thatE ⇢ B(x,R).

3. If E ⇢ X, then a point x 2 X is an interior point of E if there exists ✏ > 0 suchthat B(x, ✏) ⇢ E.

4. If E ⇢ X, then a point x 2 X is a limit point of E if for any ✏ > 0 there exists apoint y 2 B(x, ✏) \ E, y 6= x.

5. A subset E ⇢ X is open if every point of E is an interior point of E. By convention,the empty set is open.

6. A subset E ⇢ X is closed if every limit point of E is in E.

7. The closure E of a set E ⇢ X is the union of E and the limit points of E.

8. The interior E� of a set E is the set of all interior points of E.

9. A subset E is dense in X if E = X

49

Page 51: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

10. X is separable if it contains a countable dense subset.

11. If E ⇢ X, we say that x 2 X is a boundary point of E if for any ✏ > 0 the ballB(x, ✏) contains at least one point of E and at least one point of the complementEc = {x 2 X : x 62 E}. The boundary of E is denoted @E.

The following Proposition states a number of elementary but important properties.Proofs are essentially the same as in the more familiar special case when the metric spaceis a subset of RN , and will be left for the reader.

Proposition 4.1. Let (X, d) be a metric space. Then

1. B(x, ✏) is open for any x 2 X and ✏ > 0.

2. E ⇢ X is open if and only if its complement Ec is closed

3. An arbitrary union or finite intersection of open sets is open.

4. An arbitrary intersection or finite union of closed sets is closed.

5. If E ⇢ X then E� is the union of all open sets contained in E, E� is open, and Eis open if and only if E = E�.

6. E is the intersection of all closed sets containing E, E is closed, and E is closed ifand only if E = E.

7. If E ⇢ X then @E = E\E� = E \ Ec

Next we study infinite sequences in X.

Definition 4.3. We say that a sequence {xn}1n=1

in X is convergent to x, that is,limn!1 xn = x, if for any ✏ > 0 there exists n

0

< 1 such that d(xn, x) < ✏ when-ever n � n

0

.

Example 4.4. If X = RN or CN , and d is any one of the metrics dp, then xn ! x if andonly if each component sequence converges to the corresponding limit, i.e. xj,n ! xj asn ! 1 in the ordinary sense of convergence in R or C. (Here xj,n is the j’th componentof xn.)

Example 4.5. In the metric space (C(E), d1) of Example 4.2, limn!1 fn = f is equiv-alent to the definition of uniform convergence on E.

50

Page 52: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

Definition 4.4. We say that a sequence {xn}1n=1

in X is a Cauchy sequence if for any✏ > 0 there exists n

0

< 1 such that d(xn, xm) < ✏ whenever n,m � n0

.

It is easy to see that a convergent sequence is always a Cauchy sequence, but theconverse may be false.

Definition 4.5. A metric space X is said to be complete if every Cauchy sequence in Xis convergent in X.

Example 4.6. Completeness is one of the fundamental properties of the real numbersR, see for example Chapter 1 of [28]. If a sequence {xn}1n=1

in RN is Cauchy with respectto any of the metrics dp, then each component sequence {xj,n}1n=1

is a Cauchy sequencein R, hence convergent in R. It then follows immediately that {xn}1n=1

is convergent inRN , again with any of the metrics dp. The same conclusion holds for CN , so that RN ,CN

are complete metric spaces. These spaces are also separable since the subset consistingof points with rational co-ordinates is countable and dense. A standard example of anincomplete metric space is the set of rational numbers with the metric inherited from R.

Most metric spaces used in this book, and indeed most metric spaces used in appliedmathematics, are complete.

prop42 Proposition 4.2. If E ⇢ RN is closed and bounded, then the metric space C(E) withmetric d = d1 is complete.

Proof: Let {fn}1n=1

be a Cauchy sequence in C(E). If ✏ > 0 we may then find n0

suchthat

maxx2E

|fn(x)� fm(x)| < ✏ (4.2.1) eq401

whenever n,m � n0

. In particular the sequence of numbers {fn(x)}1n=1

is Cauchy in Ror C for each fixed x 2 E, so we may define f(x) := limn!1 fn(x). Letting m ! 1 in(4.2.1) we obtain

|fn(x)� f(x)| ✏ n � n0

x 2 E (4.2.2)

which means d(fn, f) ✏ for n � n0

. It remains to check that f 2 C(E). If we pickx 2 E, then since fn

0

2 C(E) there exists � > 0 such that |fn0

(x) � fn0

(y)| < ✏ if|y � x| < �. Thus for |y � x| < � we have

|f(x)� f(y)| |f(x)� fn0

(x)|+ |fn0

(x)� fn0

(y)|+ |fn0

(y)� f(y)| < 3✏ (4.2.3)

Since ✏ is arbitrary, f is continuous at x, and since x is arbitrary f 2 C(E). Thus wehave concluded that the Cauchy sequence {fn}1n=1

is convergent in C(E) to f 2 C(E),as needed. 2

51

Page 53: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

The final part of the above proof should be recognized as the standard proof of thefamiliar fact that a uniform limit of continuous functions is continuous.

The spaces Cm(E) can likewise be shown, again assuming that E is closed andbounded, to be complete metric spaces with the metric defined in (4.1.7), see Exercise19.

If we were to choose the metric d1

on C(E) then the resulting metric space is not

complete. Choose for example E = [�1, 1] and fn(x) = x1

2n+1 so that the pointwise limitof fn(x) is

f(x) = 1 x > 0 f(x) = �1 x < 0 f(0) = 0 (4.2.4)

By a simple calculation Z1

�1

|fn(x)� f(x)| = 1

n+ 1(4.2.5)

so that {fn}1n=1

must be Cauchy in C(E) with metric d1

. On the other hand {fn}1n=1

cannot be convergent in this space, since the only possible limit is f which does notbelong to C(E).

The same example can be modified to show that C(E) is not complete with any ofthe metrics dp for 1 p < 1, and so d1 is in some sense the ’natural’ metric. For thisreason C(E) will always be assumed to supplied with the metric d1 unless otherwisestated.

We next summarize in the form of a theorem some especially important facts aboutthe metric spaces Lp(E), which may be found in any standard textbook on Lebesgueintegration, for example Chapter 3 of [29] or Chapter 8 of [37].

th41 Theorem 4.1. If E ⇢ RN is measurable, then

1. Lp(E) is complete for 1 p 1.

2. Lp(E) is separable for 1 p < 1.

3. If Cc(E) is the set of continuous functions of bounded support, i.e.

Cc(E) = {f 2 C(E) : there exists R < 1 such that f(x) ⌘ 0 for |x| > R} (4.2.6)

then Cc(E) is dense in Lp(E) for 1 p < 1

The completeness property is a significant result in measure theory, often known asthe Riesz-Fischer Theorem.

52

Page 54: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

4.3 Functions on metric spaces and continuity

Next, suppose X,Y are two metric spaces with metrics dX, dY respectively.

Definition 4.6. Let T : X ! Y be a mapping.

1. We say T is continuous at a point x 2 X if for any ✏ > 0 there exists � > 0 suchthat dY(T (x), T (x)) ✏ whenever dX(x, x) �.

2. T is continuous on X if it is continuous at each point of X.

3. T is uniformly continuous on X if for any ✏ > 0 there exists � > 0 such thatdY(T (x), T (x)) ✏ whenever dX(x, x) �, x, x 2 X.

4. T is Lipschitz continuous on X if there exists L such that

dY(T (x), T (x)) LdX(x, x) x, x 2 X (4.3.1)

The infimum of all L’s which work in this definition is called the Lipschitz constantof T .

Clearly we have the implications that T Lipschitz continuous implies T is uniformlycontinuous, which in turn implies that T is continuous.

T is one-to-one, or injective, if T (x1

) = T (x2

) only if x1

= x2

, and onto, or surjective,if for every y 2 Y there exists some x 2 X such that T (x) = y. If T is both one-to-oneand onto then we say it is bijective, and in this case there must exist an inverse mappingT�1 : Y ! X.

For any mapping T : X ! Y we define, for E ⇢ X and F ⇢ Y

T (E) = {y 2 Y : y = T (x) for some x 2 E} (4.3.2)

the image of E in Y, and

T�1(E) = {x 2 X : T (x) 2 E} (4.3.3)

the preimage of F in X. Note that T is not required to be bijective in order that thepreimage be defined.

The following theorem states two useful characterizations of continuity. Condition b)is referred to as the sequential definition of continuity, for obvious reasons, while c) isthe topological definition, since it may be used to define continuity in much more generaltopological spaces.

53

Page 55: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

Theorem 4.2. Let X,Y be metric spaces and T : X ! Y. Then the following areequivalent:

a) T is continuous on X.

b) If xn 2 X and xn ! x, then T (xn) ! T (x).

c) If E is open in Y then T�1(E) is open in X.

Proof: Assume T is continuous on X and let xn ! x in X. If ✏ > 0 then there exists� > 0 such that dY(T (x), T (x)) < ✏ if dX(x, x) < �. Choosing n

0

su�ciently large thatdX(xn, x) < � for n � n

0

we then must have dY(T (xn), T (x)) < ✏ for n � n0

, so thatT (xn) ! T (x). Thus a) implies b).

To see that b) implies c), suppose condition b) holds, E is open in Y and x 2 T�1(E).We must show that there exists � > 0 such that x 2 T�1(E) whenever dX(x, x) < �. If notthen there exists a sequence xn ! x such that xn 62 T�1(E), and by b), T (xn) ! T (x).Since y = T (x) 2 E and E is open, there exists ✏ > 0 such that z 2 E if dY(z, y) < ✏.Thus T (xn) 2 E for su�ciently large n, i.e. xn 2 T�1(E), a contradiction.

Finally, suppose c) holds and fix x 2 X. If ✏ > 0 then corresponding to the open setE = B(T (x), ✏) in Y there exists a ball B(x, �) in X such that B(x, �) ⇢ T�1(E). Butthis means precisely that if dX(x, x) < � then dY(T (x), T (x)) < ✏, so that T is continuousat x. 2

4.4 Compactness and optimization

Another important topological concept is that of compactness.

Definition 4.7. If E ⇢ X then a collection of open sets {G↵}↵2A is an open cover of Eif E ⇢ [↵2AG↵.

Here A is the index set and may be finite, countably or uncountably infinite.

Definition 4.8. K ⇢ X is compact if any open cover of K has a finite subcover. Moreexplicitly, K is compact if whenever K ⇢ [↵2AG↵, where each G↵ is open, there exists afinite number of indices ↵

1

,↵2

, . . .↵m 2 A such that K ⇢ [mj=1

G↵j

. In addition, E ⇢ X

is precompact (or relatively compact) if E is compact.

54

Page 56: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

compact1 Proposition 4.3. A compact set is closed and bounded. A closed subset of a compactset is compact.

Proof: Suppose that K is compact and pick x 2 Kc. For any r > 0 let Gr = {y 2X : d(x, y) > r}. It is easy to see that each Gr is open and K ⇢ [r>0

Gr. Thus thereexists r

1

, r2

, . . . rm such that K ⇢ [mj=1

Grj

and so B(x, r) ⇢ Kc if r < min {r1

, r2

, . . . rm}.Thus Kc is open and so K is closed.

Obviously [r>0

B(x, r) is an open cover of K for any fixed x 2 X. If K is compactthen there must exist r

1

, r2

, . . . rm such that K ⇢ [mj=1

B(x, rj) and so K ⇢ B(x,R) whereR = max {r

1

, r2

, . . . rm}. Thus K is bounded.

Now suppose that F ⇢ K where F is closed and K is compact. If {G↵}↵2A is anopen cover of F then these sets together with the open set F c are an open cover of K.Hence there exists ↵

1

,↵2

, . . .↵m such that K ⇢ ([mj=1

G↵j

)[F c, from which we concludethat F ⇢ [m

j=1

G↵j

. 2

There will be frequent occasions for wanting to know if a certain set is compact, butit is rare to use the above definition directly. A useful equivalent condition is that ofsequential compactness.

Definition 4.9. A set K ⇢ X is sequentially compact if any infinite sequence in E hasa subsequence convergent to a point of K.

Proposition 4.4. A set K ⇢ X is compact if and only if it is sequentially compact.

We will not prove this result here, but instead refer to Theorem 16, Section 9.5 of[27] for details. It follows immediately that if E ⇢ X is precompact then any infinitesequence in X has a convergent subsequence (the point being that the limit need notbelong to E).

We point out that the concepts of compactness and sequential compactness are ap-plicable in spaces even more general than metric spaces, and are not always equivalentin such situations. In the case that X = RN or CN we have an even more explicit char-acterization of compactness, the well known Heine-Borel Theorem, for which we refer to[28] for a proof.

thhb Theorem 4.3. E ⇢ RN or E ⇢ CN is compact if and only if it is closed and bounded.

While we know from Proposition 4.3 that a compact set is always closed and bounded,

55

Page 57: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

the converse implication is definitely false in most function spaces we will be interestedin.

In later chapters a great deal of attention will be paid to optimization problems infunction spaces, that is, problems in the Calculus of Variations. A simple result alongthese lines that we can prove already is:

th43 Theorem 4.4. Let X be a compact metric space and f : X ! R be continuous. Thenthere exists x

0

2 X such that

f(x0

) = maxx2X

f(x) (4.4.1)

Proof: LetM = supx2X f(x) (which may be +1). so there exists a sequence {xn}1n=1

such that limn!1 f(xn) = M . By sequential compactness there is a subsequence {xnk

}and x

0

2 X such that limk!1 xnk

= x0

and since f is continuous on X we must havef(x

0

) = limk!1 f(xnk

) = M . Thus M < 1 and 4.4.1 holds. 2

A common notation expressing the same conclusion as 4.4.1 is 2

x0

2 argmax(f(x)) (4.4.2)

which is also useful in making the distinction between the maximum value of a functionand the point(s) at which the maximum is achieved.

We emphasize here the distinction between maximum and supremum, which is anessential point in later discussion of optimization. If E ⇢ R then M = supE if

• x M for all x 2 E

• if M 0 < M there exists x 2 E such that x > M 0

Such a number M exists for any E ⇢ R if we allow the value M = +1; by conventionM = �1 if E is the empty set. On the other hand M = maxE if

• x M 2 E2Even though argmax(f(x)) is in general a set of points, i.e. all points where f achieves its maximum value,

one will often see this written as x

0

= argmax(f(x)). Naturally we use the corresponding notation argmin forpoints where the minimum of f is achieved.

56

Page 58: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

in which case evidently the maximum is finite and equal to the supremum.

If f : X ! C is continuous on a compact metric space X, then we can apply Theorem4.4 with f replaced by |f |, to obtain that there exists x

0

2 X such that |f(x)| |f(x0

)|for all x 2 X. We can then also conclude, as in Example 4.2 and Proposition 4.2

Proposition 4.5. If X is a compact metric space, then

C(X) = {f : X ! C : f is continous at x for every x 2 X} (4.4.3)

is a complete metric space with metric d(f, g) = maxx2X |f(x)� g(x)|.

In general C(X), or even a bounded set in C(X), is not itself precompact. A usefulcriteria for precompactness of a set of functions in C(X) is given by the Arzela-Ascolitheorem, which we review here, see e.g. [28] for a proof.

Definition 4.10. We say a family of real or complex valued functions F defined on ametric space X is uniformly bounded if there exists a constant M such that

|f(x)| M whenever x 2 X , f 2 F (4.4.4)

and equicontinuous if for every ✏ > 0 there exists � > 0 such that

|f(x)� f(y)| < ✏ whenever x, y 2 E d(x, y) < � f 2 F (4.4.5)

We then have

arzasc Theorem 4.5. (Arzela-Ascoli) If X is a compact metric space and F ⇢ C(X) is uni-formly bounded and equicontinuous, then F is precompact in C(X).

ex48 Example 4.7. Let

F = {f 2 C([0, 1]) : |f 0(x)| M 8x 2 (0, 1), f(0) = 0} (4.4.6)

for some fixed M . Then for f 2 F we have

f(x) =

Z x

0

f 0(s) ds (4.4.7)

implying in particular that |f(x)| R x

0

M ds M . Also

|f(x)� f(y)| =����Z y

x

f 0(s) ds

���� M |x� y| (4.4.8)

57

Page 59: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

so that for any ✏ > 0, � = ✏/M works in the definition of equicontinuity. Thus by theArzela-Ascoli theorem F is precompact in C([0, 1]).

IfX is a compact subset of RN then since uniform convergence implies Lp convergence,it follows that any set which is precompact in C(X) is also precompact in Lp(X). Butthere are also more refined, i.e. less restrictive, criteria for precompactness in Lp spaces,which are known, see e.g. [5], Section 4.5.

4.5 Contraction mapping theoremMet-Contr

One of the most important theorems about metric spaces, frequently used in appliedmathematics, is the Contraction Mapping Theorem, which concerns fixed points of amapping of X into itself.

Definition 4.11. A mapping T : X ! X is a contraction on X if it is Lipschitz contin-uous with Lipschitz constant ⇢ < 1, that is, there exists ⇢ 2 [0, 1) such that

d(T (x), T (x)) ⇢d(x, x) 8x, x 2 X (4.5.1)

If ⇢ = 1 is allowed, we say T is nonexpansive.

cmt Theorem 4.6. If T is a contraction on a complete metric space X then there exists aunique x 2 X such that T (x) = x.

Proof: The uniqueness assertion is immediate, namely if T (x1

) = x1

and T (x2

) = x2

then d(x1

, x2

) = d(T (x1

), T (x2

)) ⇢d(x1

, x2

). Since ⇢ < 1 we must have d(x1

, x2

) = 0 sothat x

1

= x2

.

To prove the existence of x, fix any point x1

2 X and define

xn+1

= T (xn) (4.5.2) fpi

for n = 1, 2, . . . . We first show that {xn}1n=1

must be a Cauchy sequence.

Note thatd(x

3

, x2

) = d(T (x2

), T (x1

)) ⇢d(x1

, x1

) (4.5.3)

and by induction

d(xn+1

, xn) = d(T (xn), T (xn�1

) ⇢n�1d(x2

, x1

) (4.5.4)

58

Page 60: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

Thus by the triangle inequality and the usual summation formula for a geometric series,if m > n > 1

d(xm, xn) m�1X

j=n

d(xj+1

, xj) m�1X

j=n

⇢j�1d(x2

, x1

) (4.5.5)

=⇢n�1(1� ⇢m�n+1)

1� ⇢d(x

2

, x1

) ⇢n�1

1� ⇢d(x

2

, x1

) (4.5.6)

It follows immediately that {xn}1n=1

is a Cauchy sequence, and since X is complete thereexists x 2 X such that limn!1 xn = x. Since T is continuous T (xn) ! T (x) as n ! 1and so x = T (x) must hold. 2

The point x in the Contraction Mapping Theorem which satisfies T (x) = x is calleda fixed point of T , and the process (4.5.2) of generating the sequence {xn}1n=1

, is calledfixed point iteration. Not only does the theorem show that T possesses a unique fixedpoint under the stated hypotheses, but the proof shows that the fixed point may beobtained by fixed point iteration starting from an arbitrary point of X.

As a simple application of the theorem, consider a second kind integral equation

u(x) +

Z

K(x, y)u(y) dy = f(x) (4.5.7) inteq

with ⌦ ⇢ RN a bounded open set, a kernel function K = K(x, y) defined and continuousfor (x, y) 2 ⌦⇥ ⌦ and f 2 C(⌦). We can then define a mapping T on X = C(⌦) by

T (u)(x) = �Z

K(x, y)u(y) dy + f(x) (4.5.8)

so that (4.5.7) is equivalent to the fixed point problem u = T (u) in X. Since K isuniformly continuous on ⌦ ⇥ ⌦ it is immediate that Tu 2 X whenever u 2 X, and byelementary estimates we have

d(T (u), T (v)) = maxx2⌦

|T (u)(x)� T (v)(x)| = maxx2⌦

����Z

K(x, y)(u(y)� v(y)) dy

���� Ld(u, v)

(4.5.9)where L := maxx2⌦

R⌦

|K(x, y)| dy. We therefore may conclude from the ContractionMapping Theorem the following:

Proposition 4.6. If

maxx2⌦

Z

|K(x, y)| dy < 1 (4.5.10) 410

59

Page 61: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

then (4.5.7) has a unique solution for every f 2 C(⌦).

The condition (4.5.10) will be satisfied if either the maximum of |K| is small enoughor the size of the domain ⌦ is small enough. Eventually we will see that some suchsmallness condition is necessary for unique solvability of (4.5.7), but the exact conditionswill be sharpened considerably.

If we consider instead the family of second kind integral equations

�u(x) +

Z

K(x, y)u(y) dy = f(x) (4.5.11)

with the same conditions on K and f , then the above argument show unique solvabilityfor all su�ciently large �, namely provided

maxx2⌦

Z

|K(x, y)| dy < |�| (4.5.12)

As a second example, consider the initial value problem for a first order ODE

du

dt= f(t, u) u(t

0

) = u0

(4.5.13) odeivp1

where we assume at least that f is continuous on [a, b]⇥R with t0

2 (a, b). If a classicalsolution u exists, then integrating both sides of the ODE from t

0

to t, and taking accountof the initial condition we obtain

u(t) = u0

+

Z t

t0

f(s, u(s)) ds (4.5.14) odeie

Conversely, if u 2 C([a, b]) and satisfies (4.5.14) then necessarily u0 exists, is also contin-uous and (4.5.13) holds. Thus the problem of solving (4.5.13) is seen to be equivalentto that of finding a continuous solution of (4.5.14). In turn this can be viewed as theproblem of finding a fixed point of the nonlinear mapping T : C([a, b]) ! C([a, b]) definedby

T (u)(t) = u0

+

Z t

t0

f(s, u(s)) ds (4.5.15)

Now if we assume that f satisfies the Lipschitz condition with respect to u,

|f(t, u)� f(t, v)| L|u� v| u, v 2 R t 2 [a, b] (4.5.16)

60

Page 62: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

then

|T (u)(t)� T (v)(t)| L

Z t

t0

|u(s)� v(s)| ds L|b� a| maxatb

|u(t)� v(t)| (4.5.17)

ord(T (u), T (v)) L|b� a|d(u, v) (4.5.18) odelip

where d is again the usual metric on C([a, b]). Thus the contraction mapping provides aunique local solution, that is, on any interval [a, b] containing t

0

for which (b� a) < 1/L.

Instead of the requirement that the Lipschitz condition (4.5.18) be valid on the entireinfinite strip [a, b]⇥R, it is actually only necessary to assume it holds on [a, b]⇥[c, d] whereu0

2 (c, d). Also, first order systems of ODEs (and thus scalar higher order equations)can be handled in essentially the same manner. Such generalizations may be found instandard ODE textbooks, e.g. Chapter 1 of [CL] or Chapter 3 of [BN].

We conclude with a useful variant of the contraction mapping theorem. If T : X ! Xthen we can define the (composition) powers of T by T 2(x) = T (T (x)), T 3(x) = T (T 2(x))etc. Thus T n : X ! X for n = 1, 2, 3, . . . .

Theorem 4.7. If there exists a positive integer n such that T n is a contraction on acomplete metric space X then there exists a unique x 2 X such that T (x) = x.

Proof: By Theorem 4.6 there exists a unique x 2 X such that T n(x) = x. Applying Tto both sides gives T n(T (x)) = T n+1(x) = T (x) so that T (x) is also a fixed point of T n.By uniqueness, T (x) = x, i.e. T has at least one fixed point. To see that the fixed pointof T is unique, observe that any fixed point of T is also a fixed point of T 2, T 3, . . . . Inparticular, if T has two distinct fixed points then so does T n, which is a contradiction.2

4.6 Exercises

1. Verify that dp defined in Example 4.1 is a metric on RN or CN . (Suggestion: toprove the triangle inequality, use the finite dimensional version of the Minkowskiinequality (18.1.15)).

2. If (X, dX), (Y, dY ) are metric spaces, show that the Cartesian product

Z = X ⇥ Y = {(x, y) : x 2 X, y 2 Y }

61

Page 63: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

is a metric space with distance function

d((x1

, y1

), (x2

, y2

)) = dX(x1

, x2

) + dY (y1, y2)

3. Is d(x, y) = |x � y|2 a metric on R? What about d(x, y) =p|x� y|? Find

reasonable conditions on a function � : [0,1) ! [0,1) such that d(x, y) = �(|x�y|)is a metric on R.

4. Prove that a closed subset of a compact set in a metric space is also compact.

5. Let (X, d) be a metric space, A ⇢ X be nonempty and define the distance from apoint x to the set A to be

d(x,A) = infy2A

d(x, y)

a) Show that |d(x,A) � d(y, A)| d(x, y) for x, y 2 X (i.e. x ! d(x,A) is nonex-pansive).

b) Assume A is closed. Show that d(x,A) = 0 if and only if x 2 A.

c) Assume A is compact. Show that for any x 2 X there exists z 2 A such thatd(x,A) = d(x, z).

6. Suppose that F is closed and G is open in a metric space (X, d) and F ⇢ G. Showthat there exists a continuous function f : X ! R such that

i) 0 f(x) 1 for all x 2 X.

ii) f(x) = 1 for x 2 F .

iii) f(x) = 0 for x 2 Gc.

Hint: Consider

f(x) =d(x,Gc)

d(x,Gc) + d(x, F )

7. Two metrics d, d on a set X are said to be equivalent if there exist constants 0 <C < C⇤ < 1 such that

C d(x, y)

d(x, y) C⇤ 8x, y 2 X

a) If d, d are equivalent, show that a sequence {xk}1k=1

is convergent in (X, d) if andonly if it is convergent in (X, d).

b) Show that any two of the metrics dp on Rn are equivalent.

62

Page 64: Paul E. Sacks August 20, 2015 - Department of … Examples of adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 ... a solution of either equation will mean a

ex4-8 8. Prove that C([a, b]) is separable (you may quote the Weierstrass approximationtheorem) but L1(a, b) is not separable.

9. If X, Y are metric spaces, f : X ! Y is continuous and K is compact in X, showthat the image f(K) is compact in Y .

10. Let

F = {f 2 C([0, 1]) : |f(x)� f(y)| |x� y| for all x, y,Z

1

0

f(x) dx = 0}

Show that F is compact in C([0, 1]). (Suggestion: to prove that F is uniformlybounded, justify and use the fact that if f 2 F then f(x) = 0 for some x 2 [0, 1].)

11. Show that the set F in Example 4.7 is not closed.

12. From the proof of the contraction mapping it is clear that the smaller ⇢ is, the fasterthe sequence xn converges to the fixed point x. With this in mind, explain whyNewton’s method

xn+1

= xn �f(xn)

f 0(xn)

is in general a very rapidly convergent method for approximating roots of f : R ! R,as long as the initial guess is close enough.

13. Let f_n(x) = sin^n x for n = 1, 2, . . . .

a) Is the sequence {f_n}_{n=1}^∞ convergent in C([0, π])?

b) Is the sequence convergent in L²(0, π)?

c) Is the sequence compact or precompact in either of these spaces?

14. Let X be a complete metric space and T : X → X satisfy d(T(x), T(y)) < d(x, y) for all x, y ∈ X, x ≠ y. Show that T can have at most one fixed point, but may have none. (Suggestion: for an example of non-existence look at T(x) = √(x² + 1) on R.)

15. Let S denote the linear Volterra type integral operator

   Su(x) = ∫_a^x K(x, y) u(y) dy

where the kernel K is continuous and satisfies |K(x, y)| ≤ M for a ≤ y ≤ x.

a) Show that

   |S^n u(x)| ≤ ( M^n (x − a)^n / n! ) max_{a ≤ y ≤ x} |u(y)|,   x > a,  n = 1, 2, . . .

b) Deduce from this that for any b > a there exists an integer n such that S^n is a contraction on C([a, b]).

c) Show that for any f ∈ C([a, b]) the second kind Volterra integral equation

   ∫_a^x K(x, y) u(y) dy = u(x) + f(x),   a < x < b

has a unique solution u ∈ C([a, b]).

16. Show that for sufficiently small |λ| there exists a unique solution of the boundary value problem

   u″ + λu = f(x),  0 < x < 1,   u(0) = u(1) = 0

for any f ∈ C([0, 1]). (Suggestion: use the result of Chapter 2, Exercise 7 to transform the boundary value problem into a fixed point problem for an integral operator, then apply the Contraction Mapping Theorem.) Be as precise as you can about which values of λ are allowed.

17. Let f = f(x, y) be continuously differentiable on [0, 1] × R and satisfy

   0 < m ≤ ∂f/∂y (x, y) ≤ M

Show that there exists a unique continuous function φ(x) such that

   f(x, φ(x)) = 0,   0 < x < 1

(Suggestion: Define the transformation

   (Tφ)(x) = φ(x) − λ f(x, φ(x))

and show that T is a contraction on C([0, 1]) for some choice of λ. This is a special case of the implicit function theorem.)


18. Show that if we let

   d(f, g) = Σ_{n=0}^∞ 2^{−n} e_n / (1 + e_n),   where  e_n = max_{x ∈ [a,b]} |f^{(n)}(x) − g^{(n)}(x)|

then (C^∞([a, b]), d) is a metric space, in which f_k → f if and only if f_k^{(n)} → f^{(n)} uniformly on [a, b] for n = 0, 1, . . . .

19. If E ⊂ R^N is closed and bounded, show that C^∞(E) is a complete metric space with the metric defined by (4.1.7).

Chapter 5

Normed linear spaces and Banach spaces

5.1 Axioms of a normed linear space

Definition 5.1. A vector space X is said to be a normed linear space if for every x ∈ X there is defined a nonnegative real number ‖x‖, the norm of x, such that the following axioms hold.

[N1] ‖x‖ = 0 if and only if x = 0

[N2] ‖λx‖ = |λ| ‖x‖ for any x ∈ X and any scalar λ.

[N3] ‖x + y‖ ≤ ‖x‖ + ‖y‖ for any x, y ∈ X.

As in the case of a metric space it is technically the pair (X, ‖·‖) which constitutes a normed linear space, but the definition of the norm will usually be clear from the context. If two different normed spaces are needed we will use a notation such as ‖x‖_X to indicate the space in which the norm is calculated.

Example 5.1. In the vector space X = R^N or C^N we can define the family of norms

   ‖x‖_p = ( Σ_{j=1}^N |x_j|^p )^{1/p},   1 ≤ p < ∞

   ‖x‖_∞ = max_{1 ≤ j ≤ N} |x_j|   (5.1.1)

Axioms [N1] and [N2] are obvious, while axiom [N3] amounts to the Minkowski inequality (18.1.15).
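As a small illustration (not part of the original notes), the following Python sketch computes ‖x‖_p and spot-checks the triangle inequality on random vectors; the function name p_norm is my own, and a numerical spot test is of course not a proof of [N3].

    import numpy as np

    def p_norm(x, p):
        """Compute ||x||_p for a vector x; p = np.inf gives the max norm."""
        x = np.asarray(x, dtype=float)
        if p == np.inf:
            return np.max(np.abs(x))
        return np.sum(np.abs(x) ** p) ** (1.0 / p)

    rng = np.random.default_rng(0)
    for p in [1, 2, 3, np.inf]:
        # spot-check the Minkowski inequality [N3] on random vectors
        x, y = rng.standard_normal(5), rng.standard_normal(5)
        assert p_norm(x + y, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
        print(p, p_norm(x, p))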

We obviously have d_p(x, y) = ‖x − y‖_p here, and this correspondence between norm and metric is a special case of the following general fact, that a norm always gives rise to a metric, whose proof is immediate from the definitions involved.

Proposition 5.1. Let (X, ‖·‖) be a normed linear space. If we set d(x, y) = ‖x − y‖ for x, y ∈ X then (X, d) is a metric space.

Example 5.2. If E ⊂ R^N is closed and bounded then it is easy to verify that

   ‖f‖ = max_{x ∈ E} |f(x)|   (5.1.2)

defines a norm on C(E), and the usual metric (4.1.4) on C(E) amounts to d(f, g) = ‖f − g‖. Likewise, the metrics (4.1.8), (4.1.9) on L^p(E) may be viewed as coming from the corresponding L^p norms,

   ‖f‖_{L^p(E)} = ( ∫_E |f(x)|^p dx )^{1/p}  for 1 ≤ p < ∞,   ‖f‖_{L^∞(E)} = ess sup_{x ∈ E} |f(x)|   (5.1.3)

Note that for such a metric we must have d(λx, λy) = |λ| d(x, y), so that if this property does not hold, the metric cannot arise from a norm in this way. For example,

   d(x, y) = |x − y| / (1 + |x − y|)   (5.1.4)

is a metric on R which does not come from a norm.

Since any normed linear space may now be regarded as a metric space, all of the topological concepts defined for a metric space are meaningful in a normed linear space. Completeness holds in many situations of interest, so we have a special designation in that case.

Definition 5.2. A Banach space is a complete normed linear space.

Example 5.3. The spaces R^N, C^N are vector spaces which are also complete metric spaces with any of the norms ‖·‖_p, hence they are Banach spaces. Similarly C(E), L^p(E) are Banach spaces with the norms indicated above.

Here are a few simple results we can prove already.


Proposition 5.2. If X is a normed linear space then the norm is a continuous function on X. If E ⊂ X is compact and y ∈ X then there exists x_0 ∈ E such that

   ‖y − x_0‖ = min_{x ∈ E} ‖y − x‖   (5.1.5)

Proof: From the triangle inequality we get | ‖x_1‖ − ‖x_2‖ | ≤ ‖x_1 − x_2‖, so that f(x) = ‖x‖ is Lipschitz continuous (with Lipschitz constant 1) on X. Similarly f(x) = ‖x − y‖ is also continuous for any fixed y, so we may apply Theorem 4.4 with X replaced by the compact metric space E and f(x) = −‖x − y‖ to get the second conclusion (5.1.5). □

Another topological point of interest is the following.

Theorem 5.1. If M is a subspace of a normed linear space X, and dim M < ∞, then M is closed.

Proof: The proof is by induction on the number of dimensions. Let dim(M) = 1, so that M = {u = λe : λ ∈ C} for some e ∈ X, ‖e‖ = 1. If u_n ∈ M then u_n = λ_n e for some λ_n ∈ C, and u_n → u in X implies, since ‖u_n − u_m‖ = |λ_n − λ_m|, that {λ_n} is a Cauchy sequence in C. Thus there exists λ ∈ C such that λ_n → λ, so that u_n → u = λe ∈ M, as needed.

Now suppose we know that all N dimensional subspaces are closed and dim M = N + 1, so we can find e_1, . . . , e_{N+1} linearly independent unit vectors such that M = L(e_1, . . . , e_{N+1}). Let M′ = L(e_1, . . . , e_N), which is closed by the induction assumption. If u_n ∈ M there exist λ_n ∈ C and v_n ∈ M′ such that u_n = v_n + λ_n e_{N+1}. Suppose that u_n → u in X. We claim first that {λ_n} is bounded in C. If not, there must exist λ_{n_k} such that |λ_{n_k}| → ∞, and since u_n remains bounded in X we get u_{n_k}/λ_{n_k} → 0. It follows that

   e_{N+1} − u_{n_k}/λ_{n_k} = −v_{n_k}/λ_{n_k} ∈ M′   (5.1.6)

Since M′ is closed, it would follow, upon letting n_k → ∞, that e_{N+1} ∈ M′, which is impossible.

Thus we can find a subsequence λ_{n_k} → λ for some λ ∈ C, and

   v_{n_k} = u_{n_k} − λ_{n_k} e_{N+1} → u − λ e_{N+1}   (5.1.7)

Again since M′ is closed it follows that u − λ e_{N+1} ∈ M′, so that u ∈ M as needed. □

For the proof, see for example Theorem 1.21 of [30]. For an infinite dimensional subspace this is false in general. For example, the Weierstrass approximation theorem states that if f ∈ C([a, b]) and ε > 0 there exists a polynomial p such that |p(x) − f(x)| ≤ ε on [a, b]. Thus if we take X = C([a, b]) and E to be the set of all polynomials on [a, b], then clearly E is a subspace of X and every point of X is a limit point of E. Thus E cannot be closed, since otherwise E would be equal to all of X.

Recall that when Ē = X, as in this example, we say that E is a dense subspace of X. Such subspaces play an important role in functional analysis. According to Theorem 5.1 a finite dimensional Banach space X has no dense subspace aside from X itself.
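The density of polynomials can be observed numerically. The sketch below (my own illustration, not from the notes) interpolates f(x) = e^x at Chebyshev nodes on [−1, 1]; for this smooth f the sup-norm errors of the interpolants tend to zero, which is consistent with the Weierstrass theorem, although of course a computation proves nothing.

    import numpy as np

    f = np.exp
    xx = np.linspace(-1.0, 1.0, 2001)          # fine grid for the sup norm
    for n in [1, 2, 4, 8, 12]:
        k = np.arange(n + 1)
        nodes = np.cos((2 * k + 1) * np.pi / (2 * n + 2))   # Chebyshev nodes
        coeffs = np.polyfit(nodes, f(nodes), n)             # degree-n interpolant
        err = np.max(np.abs(np.polyval(coeffs, xx) - f(xx)))
        print(f"degree {n:2d}: sup-norm error about {err:.2e}")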

5.2 Infinite series

In a normed linear space we can study limits of sums, i.e. infinite series.

Definition 5.3. We say Σ_{j=1}^∞ x_j is convergent in X to the limit s if lim_{n→∞} s_n = s, where s_n = Σ_{j=1}^n x_j is the n'th partial sum of the series.

A useful criterion for convergence can then be given, provided the space is also complete.

Proposition 5.3. If X is a Banach space, x_n ∈ X for n = 1, 2, . . . and Σ_{n=1}^∞ ‖x_n‖ < ∞, then Σ_{n=1}^∞ x_n is convergent to an element s ∈ X with ‖s‖ ≤ Σ_{n=1}^∞ ‖x_n‖.

Proof: If m > n we have ‖s_m − s_n‖ = ‖Σ_{j=n+1}^m x_j‖ ≤ Σ_{j=n+1}^m ‖x_j‖ by the triangle inequality. Since Σ_{j=1}^∞ ‖x_j‖ is convergent, its partial sums form a Cauchy sequence in R, and hence {s_n} is also Cauchy. Since the space is complete, s = lim_{n→∞} s_n exists. We also have ‖s_n‖ ≤ Σ_{j=1}^n ‖x_j‖ for any fixed n, and ‖s_n‖ → ‖s‖ by Proposition 5.2, so ‖s‖ ≤ Σ_{j=1}^∞ ‖x_j‖ must hold. □

The concepts of linear combination, linear independence and basis may now be extended to allow for infinite sums in an obvious way: We say a countably infinite set of vectors {x_n}_{n=1}^∞ is linearly independent if

   Σ_{n=1}^∞ λ_n x_n = 0 if and only if λ_n = 0 for all n   (5.2.1)

and x ∈ L({x_n}_{n=1}^∞), the span of {x_n}_{n=1}^∞, provided x = Σ_{n=1}^∞ λ_n x_n for some scalars {λ_n}_{n=1}^∞. A basis of X is then a linearly independent spanning set, or equivalently {x_n}_{n=1}^∞ is a basis of X if for any x ∈ X there exist unique scalars {λ_n}_{n=1}^∞ such that x = Σ_{n=1}^∞ λ_n x_n.

We emphasize that this definition of basis is not the same as that given in Definition 3.4 for a basis of a vector space, the difference being that the sum there is required to always be finite. The term Schauder basis is sometimes used for the above definition if the distinction needs to be made. Throughout the remainder of these notes, the term basis will always mean Schauder basis unless otherwise stated.

A Banach space X which contains a Schauder basis {x_n}_{n=1}^∞ is always separable, since then the set of all finite linear combinations of the x_n's with rational coefficients is easily seen to be countable and dense. It is known that not every separable Banach space has a Schauder basis (recall there must always exist a Hamel basis); see for example Section 1.1 of [38].

5.3 Linear operators and functionals

We have previously defined what it means for a mapping T : X → Y between vector spaces to be linear. When the spaces X, Y are normed linear spaces we usually refer to such a mapping T as a linear operator. We say that T is bounded if there exists a finite constant C such that ‖Tx‖ ≤ C‖x‖ for every x ∈ X, and we may then define the norm of T as the smallest such C, or equivalently

   ‖T‖ = sup_{x ≠ 0} ‖Tx‖ / ‖x‖   (5.3.1)

The condition ‖T‖ < ∞ is equivalent to continuity of T.

Proposition 5.4. If X, Y are normed linear spaces and T : X → Y is linear then the following three conditions are equivalent.

a) T is bounded

b) T is continuous

c) There exists x_0 ∈ X such that T is continuous at x_0.


Proof: If x_0, x ∈ X then

   ‖T(x) − T(x_0)‖ = ‖T(x − x_0)‖ ≤ ‖T‖ ‖x − x_0‖   (5.3.2)

Thus if T is bounded then it is (Lipschitz) continuous at any point of X. The implication that b) implies c) is trivial. Finally suppose T is continuous at x_0 ∈ X. For any ε > 0 there must exist δ > 0 such that ‖T(z − x_0)‖ = ‖T(z) − T(x_0)‖ ≤ ε if ‖z − x_0‖ ≤ δ. For any x ≠ 0, choose z = x_0 + δ x/‖x‖ to get

   ‖T( δ x/‖x‖ )‖ ≤ ε   (5.3.3)

or equivalently, using the linearity of T, ‖Tx‖ ≤ C‖x‖ with C = ε/δ. Thus T is bounded. □

A continuous linear operator is therefore the same as a bounded linear operator, and the two terms are used interchangeably. When the range space Y is the scalar field R or C we call T a linear functional instead of a linear operator, and correspondingly a bounded (or continuous) linear functional if |Tx| ≤ C‖x‖ for some finite constant C.

We introduce the notation

   B(X, Y) = {T : X → Y : T is linear and bounded}   (5.3.4)

and the special cases

   B(X) = B(X, X),   X* = B(X, C)   (5.3.5)

Examples of linear operators and functionals will be studied much more extensively later. For now we just give two simple examples.

Example 5.4. If X = R^N, Y = R^M and A is an M × N real matrix with entries a_{kj}, then y_k = Σ_{j=1}^N a_{kj} x_j defines a linear mapping T, and according to the discussion of Section 3.3 any linear mapping of R^N to R^M is of this form. It is not hard to check that T is always bounded, assuming that we use any of the norms ‖·‖_p in X and in Y. Evidently T is a linear functional if M = 1.
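To make Example 5.4 concrete, here is a hedged numerical sketch (not from the notes): it compares a Monte-Carlo lower bound for ‖T‖ = sup_{x≠0} ‖Ax‖_p/‖x‖_p with the induced matrix norms that NumPy computes exactly for p = 1, 2, ∞. The matrix A and the sampling scheme are arbitrary choices of my own.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 4))            # an M x N matrix, here M = 3, N = 4

    for p in [1, 2, np.inf]:
        # random sampling gives a lower bound for the operator norm ...
        X = rng.standard_normal((4, 20000))
        estimate = np.max(np.linalg.norm(A @ X, ord=p, axis=0) /
                          np.linalg.norm(X, ord=p, axis=0))
        exact = np.linalg.norm(A, ord=p)       # ... the induced norm itself
        print(f"p={p}: sampled {estimate:.4f} <= exact {exact:.4f}")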

Example 5.5. If Ω ⊂ R^N is compact and X = C(Ω), pick x_0 ∈ Ω and set T(f) = f(x_0) for f ∈ X. Clearly T is a linear functional and |Tf| ≤ ‖f‖, so that ‖T‖ ≤ 1.


5.4 Contraction mappings in a Banach space

If the Contraction Mapping Theorem, Theorem 4.6, is specialized to a Banach space, the resulting statement is that if X is a Banach space and F : X → X satisfies

   ‖F(x) − F(y)‖ ≤ L‖x − y‖,   x, y ∈ X   (5.4.1)

for some L < 1, then F has a unique fixed point in X.

A particular case which arises frequently in applications is when the mapping F has the form F(x) = Tx + b for some b ∈ X and bounded linear operator T on X, in which case the contraction condition (5.4.1) simply amounts to the requirement that ‖T‖ < 1. If we then initialize the fixed point iteration process (4.5.2) with x_1 = b, the successive iterates are

   x_2 = F(x_1) = F(b) = Tb + b   (5.4.2)
   x_3 = F(x_2) = Tx_2 + b = T²b + Tb + b   (5.4.3)

etc., the general pattern being

   x_n = Σ_{j=0}^{n−1} T^j b,   n = 1, 2, . . .   (5.4.4)

with T⁰ = I as usual. If ‖T‖ < 1 we already know that this sequence must converge, but it could also be checked directly from Proposition 5.3 using the obvious inequality ‖T^j b‖ ≤ ‖T‖^j ‖b‖. In fact we know that x_n → x, the unique fixed point of F, so

   x = Σ_{j=0}^∞ T^j b   (5.4.5)

is an explicit solution formula for the linear, inhomogeneous equation x − Tx = b. The right hand side of (5.4.5) is known as the Neumann series for x = (I − T)^{−1} b, and symbolically we may write

   (I − T)^{−1} = Σ_{j=0}^∞ T^j   (5.4.6)

Note the formal similarity to the usual geometric series formula for (1 − z)^{−1} if z ∈ C, |z| < 1. If T and b are such that ‖T^j b‖ ≪ ‖Tb‖ for j ≥ 2, then truncating the series after two terms we get the Born approximation formula x ≈ b + Tb.
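The following Python sketch (my own illustration, with an arbitrarily chosen matrix scaled so that ‖T‖₂ = 0.4 < 1) shows the partial sums (5.4.4) converging to the solution of x − Tx = b, and prints the error of the two-term Born approximation for comparison.

    import numpy as np

    rng = np.random.default_rng(2)
    T = rng.standard_normal((4, 4))
    T *= 0.4 / np.linalg.norm(T, 2)            # rescale so that ||T|| = 0.4 < 1
    b = rng.standard_normal(4)

    x_exact = np.linalg.solve(np.eye(4) - T, b)

    x, term = np.zeros(4), b.copy()
    for n in range(1, 31):                     # after this step x = sum_{j<n} T^j b
        x, term = x + term, T @ term
        if n in (2, 5, 10, 30):
            print(f"n={n:2d}  error {np.linalg.norm(x - x_exact):.2e}")

    born = b + T @ b                           # two-term (Born) approximation
    print("Born approximation error:", np.linalg.norm(born - x_exact))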


5.5 Exercises

1. Give the proof of Proposition 5.1.

2. Show that any two norms on a finite dimensional normed linear space are equivalent. That is to say, if (X, ‖·‖), (X, |||·|||) are both normed linear spaces, then there exist constants 0 < c < C < ∞ such that

   c‖x‖ ≤ |||x||| ≤ C‖x‖  for all x ∈ X

3. If X is a normed linear space and Y is a Banach space, show that B(X, Y) is a Banach space, with the norm given by (5.3.1).

4. If T is a linear integral operator, Tu(x) = ∫_Ω K(x, y) u(y) dy, then T² is also a linear integral operator. What is the kernel of T²?

5. If X is a normed linear space and E is a subspace of X, show that Ē is also a subspace of X.

6. If p ∈ (0, 1) show that ‖f‖_p = ( ∫ |f(x)|^p dx )^{1/p} does not define a norm.

7. The simple initial value problem

   u′ = u,   u(0) = 1

is equivalent to the integral equation

   u(x) = 1 + ∫_0^x u(s) ds

which may be viewed as a fixed point problem of the special type discussed in Section 5.4. Find the Neumann series for the solution u. Where does it converge?

8. If Tf = f(0), show that T is not a bounded linear functional on L^p(−1, 1) for 1 ≤ p < ∞.

9. Let A ∈ B(X).

a) Show that

   exp(A) = e^A := Σ_{n=0}^∞ A^n / n!   (5.5.1)

is defined in B(X).

b) If also B ∈ B(X) and AB = BA, show that exp(A + B) = exp(A) exp(B).

c) Show that exp((t + s)A) = exp(tA) exp(sA) for any t, s ∈ R.

d) Show that the conclusion in b) is false, in general, if A and B do not commute. (Suggestion: a counterexample can be found in X = R².)

10. Find an integral equation of the form u = Tu + f, T linear, which is equivalent to the initial value problem

   u″ + u = x²,  x > 0,   u(0) = 1,  u′(0) = 2   (5.5.2)

Calculate the Born approximation to the solution u and compare to the exact solution.

Chapter 6

Inner product spaces and Hilbert spaces

6.1 Axioms of an inner product space

Definition 6.1. A vector space X is said to be an inner product space if for every x, y ∈ X there is defined a complex number ⟨x, y⟩, the inner product of x and y, such that the following axioms hold.

[H1] ⟨x, x⟩ ≥ 0 for all x ∈ X

[H2] ⟨x, x⟩ = 0 if and only if x = 0

[H3] ⟨λx, y⟩ = λ⟨x, y⟩ for any x, y ∈ X and any scalar λ.

[H4] ⟨x, y⟩ = \overline{⟨y, x⟩} for any x, y ∈ X.

[H5] ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for any x, y, z ∈ X

Note that from axioms [H3] and [H4] it follows that

   ⟨x, λy⟩ = \overline{⟨λy, x⟩} = \overline{λ⟨y, x⟩} = \bar{λ} \overline{⟨y, x⟩} = \bar{λ} ⟨x, y⟩   (6.1.1)


Another immediate consequence of the axioms is that

   ‖x + y‖² = ⟨x + y, x + y⟩ = ‖x‖² + 2 Re ⟨x, y⟩ + ‖y‖²   (6.1.2)

If we replace y by −y and add the resulting identities we obtain the so-called Parallelogram Law

   ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖²   (6.1.3)

Example 6.1. The vector space R^N is an inner product space if we define

   ⟨x, y⟩ = Σ_{j=1}^N x_j y_j   (6.1.4)

In the case of C^N we must define

   ⟨x, y⟩ = Σ_{j=1}^N x_j \overline{y_j}   (6.1.5)

in order that [H4] be satisfied.

Example 6.2. For the vector space L²(Ω), with Ω ⊂ R^N, we may define

   ⟨f, g⟩ = ∫_Ω f(x) \overline{g(x)} dx   (6.1.6)

where of course the complex conjugation can be ignored in the case of the real vector space L²(Ω). Note the formal analogy with the inner product in the case of R^N or C^N. The finiteness of ⟨f, g⟩ is guaranteed by the Hölder inequality (18.1.6), and the validity of [H1]–[H5] is clear.

Example 6.3. Another important inner product space which we introduce at this point is the sequence space

   ℓ² = { x = {x_j}_{j=1}^∞ : Σ_{j=1}^∞ |x_j|² < ∞ }   (6.1.7)

with inner product

   ⟨x, y⟩ = Σ_{j=1}^∞ x_j \overline{y_j}   (6.1.8)

The fact that ⟨x, y⟩ is finite for any x, y ∈ ℓ² follows now from (18.1.14), the discrete form of the Hölder inequality. The notation ℓ²(Z) is often used when the sequences involved are bi-infinite, i.e. of the form x = {x_j}_{j=−∞}^∞.


6.2 Norm in a Hilbert space

Proposition 6.1. If x, y ∈ X, an inner product space, then

   |⟨x, y⟩|² ≤ ⟨x, x⟩⟨y, y⟩   (6.2.1)

Proof: For any z ∈ X we have

   0 ≤ ⟨x − z, x − z⟩ = ⟨x, x⟩ − ⟨x, z⟩ − ⟨z, x⟩ + ⟨z, z⟩   (6.2.2)
    = ⟨x, x⟩ + ⟨z, z⟩ − 2 Re ⟨x, z⟩   (6.2.3)

and hence

   2 Re ⟨z, x⟩ ≤ ⟨x, x⟩ + ⟨z, z⟩   (6.2.4)

If y = 0 there is nothing to prove; otherwise choose z = (⟨x, y⟩/⟨y, y⟩) y to get

   2 |⟨x, y⟩|² / ⟨y, y⟩ ≤ ⟨x, x⟩ + |⟨x, y⟩|² / ⟨y, y⟩   (6.2.5)

The conclusion (6.2.1) now follows upon rearrangement. □

Theorem 6.1. If X is an inner product space and we set ‖x‖ = √⟨x, x⟩, then ‖·‖ is a norm on X.

Proof: By axiom [H1] ‖x‖ is defined as a nonnegative real number for every x ∈ X, and axiom [H2] implies the corresponding axiom [N1] of a norm. If λ is any scalar then ‖λx‖² = ⟨λx, λx⟩ = λ\bar{λ}⟨x, x⟩ = |λ|²‖x‖², so that [N2] also holds. Finally, if x, y ∈ X then

   ‖x + y‖² = ⟨x + y, x + y⟩ = ‖x‖² + 2 Re ⟨x, y⟩ + ‖y‖²   (6.2.6)
    ≤ ‖x‖² + 2|⟨x, y⟩| + ‖y‖² ≤ ‖x‖² + 2‖x‖ ‖y‖ + ‖y‖²   (6.2.7)
    = (‖x‖ + ‖y‖)²   (6.2.8)

so that the triangle inequality [N3] also holds. □

The inequality (6.2.1) may now be restated as

   |⟨x, y⟩| ≤ ‖x‖ ‖y‖   (6.2.9)

for any x, y ∈ X, and in this form is usually called the Schwarz or Cauchy-Schwarz inequality.


Corollary 6.1. If x_n → x in X then ⟨x_n, y⟩ → ⟨x, y⟩ for any y ∈ X.

Proof: We have that

   |⟨x_n, y⟩ − ⟨x, y⟩| = |⟨x_n − x, y⟩| ≤ ‖x_n − x‖ ‖y‖ → 0   (6.2.10)  □

By Theorem 6.1 an inner product space may always be regarded as a normed linear space, and analogously to the definition of a Banach space we have

Definition 6.2. A Hilbert space is a complete inner product space.

Example 6.4. The spaces R^N and C^N are Hilbert spaces, as is L²(Ω) on account of the completeness property mentioned in Theorem 4.1 of Chapter 4. On the other hand, if we consider C(E) with inner product ⟨f, g⟩ = ∫_E f(x)\overline{g(x)} dx, then it is an inner product space which is not a Hilbert space, since, as previously observed, C(E) is not complete in the L²(E) metric. The sequence space ℓ² is also a Hilbert space, see Exercise 7.

6.3 Orthogonality

Recall from elementary calculus that in R^n the inner product allows one to calculate the angle between two vectors, namely

   ⟨x, y⟩ = ‖x‖ ‖y‖ cos θ   (6.3.1)

where θ is the angle between x and y. In particular x and y are perpendicular if and only if ⟨x, y⟩ = 0. The concept of perpendicularity, also called orthogonality, is fundamental in Hilbert space analysis, even if the geometric picture is less clear.

Definition 6.3. If x, y ∈ X, an inner product space, we say x, y are orthogonal if ⟨x, y⟩ = 0.

From (6.1.2) we obtain immediately the 'Pythagorean Theorem': if x and y are orthogonal then

   ‖x + y‖² = ‖x‖² + ‖y‖²   (6.3.2)


A set of vectors {x_1, x_2, . . . , x_n} is called an orthogonal set if x_j and x_k are orthogonal whenever j ≠ k, and for such a set we have

   ‖Σ_{j=1}^n x_j‖² = Σ_{j=1}^n ‖x_j‖²   (6.3.3)

The set is called orthonormal if in addition ‖x_j‖ = 1 for every j. The same terminology is used for countably infinite sets, with (6.3.3) still valid provided that the series on the right is convergent.

We may also use the notation x ⊥ y if x, y are orthogonal, and if E ⊂ X we define the orthogonal complement of E

   E^⊥ = {x ∈ X : ⟨x, y⟩ = 0 for all y ∈ E}

(E^⊥ = x^⊥ if E consists of the single point x). We obviously have 0^⊥ = X, and also X^⊥ = {0}, since if x ∈ X^⊥ then ⟨x, x⟩ = 0 so that x = 0.

Proposition 6.2. If E ⊂ X then E^⊥ is a closed subspace of X. If E is a closed subspace then E = E^⊥⊥.

We leave the proof as an exercise. Here E^⊥⊥ means (E^⊥)^⊥, the orthogonal complement of the orthogonal complement.

Example 6.5. If X = R³ and E = {x = (x_1, x_2, x_3) : x_1 = x_2 = 0} then E^⊥ = {x ∈ R³ : x_3 = 0}.

Example 6.6. If X = L²(Ω) with Ω a bounded open set in R^N, let E = L{1}, i.e. the set of constant functions. Then f ∈ E^⊥ if and only if ⟨f, 1⟩ = ∫_Ω f(x) dx = 0. Thus E^⊥ is the set of functions in L²(Ω) with mean value zero.

6.4 Projections

If E ⊂ X and x ∈ X, the projection P_E x of x onto E is the element of E closest to x, if such an element exists. That is, y = P_E(x) if y is the unique solution of the minimization problem

   min_{z ∈ E} ‖x − z‖   (6.4.1)

Of course such a point may not exist, and may not be unique if it does exist. In a Hilbert space the projection will be well defined provided E is closed and convex.

Definition 6.4. If X is a vector space and E ⊂ X, we say E is convex if λx + (1 − λ)y ∈ E whenever x, y ∈ E and λ ∈ [0, 1].

Example 6.7. If X is a vector space then any subspace of X is convex. If X is a normed linear space then any ball B(x, R) ⊂ X is convex.

Theorem 6.2. Let H be a Hilbert space, E ⊂ H closed and convex, and x ∈ H. Then y = P_E x exists. Furthermore, y = P_E x if and only if

   y ∈ E and Re ⟨x − y, z − y⟩ ≤ 0 for all z ∈ E   (6.4.2)

Proof: Set d = inf_{z ∈ E} ‖x − z‖, so that there exists a sequence z_n ∈ E such that ‖x − z_n‖ → d. We wish to show that {z_n} is a Cauchy sequence. From the Parallelogram Law (6.1.3) applied to z_n − x, z_m − x we have

   ‖z_n − z_m‖² = 2‖z_n − x‖² + 2‖z_m − x‖² − 4‖(z_n + z_m)/2 − x‖²   (6.4.3)

Since E is convex, (z_n + z_m)/2 ∈ E, so that ‖(z_n + z_m)/2 − x‖ ≥ d, and it follows that

   ‖z_n − z_m‖² ≤ 2‖z_n − x‖² + 2‖z_m − x‖² − 4d²   (6.4.4)

Letting n, m → ∞ the right hand side tends to zero, so that {z_n} is Cauchy. Since the space is complete there exists y ∈ H such that lim_{n→∞} z_n = y, and y ∈ E since E is closed. It follows that ‖y − x‖ = lim_{n→∞} ‖z_n − x‖ = d, so that min_{z ∈ E} ‖z − x‖ is achieved at y.

For the uniqueness assertion, suppose ‖y − x‖ = ‖y′ − x‖ = d with y, y′ ∈ E. Then (6.4.4) holds with z_n, z_m replaced by y, y′, giving

   ‖y − y′‖² ≤ 2‖y − x‖² + 2‖y′ − x‖² − 4d² = 0   (6.4.5)

so that y = y′. Thus y = P_E x exists.

To obtain the characterization (6.4.2), note that for any z ∈ E

   f(t) = ‖x − (y + t(z − y))‖²   (6.4.6)

has its minimum value on the interval [0, 1] when t = 0, since y + t(z − y) = tz + (1 − t)y ∈ E. We explicitly calculate

   f(t) = ‖x − y‖² − 2t Re ⟨x − y, z − y⟩ + t²‖z − y‖²   (6.4.7)

By elementary calculus considerations, the minimum of this quadratic on [0, 1] occurs at t = 0 only if f′(0) = −2 Re ⟨x − y, z − y⟩ ≥ 0, which is equivalent to (6.4.2). If, on the other hand, (6.4.2) holds, then for any z ∈ E we must have

   ‖z − x‖² = f(1) ≥ f(0) = ‖y − x‖²   (6.4.8)

so that min_{z ∈ E} ‖z − x‖ must occur at y, i.e. y = P_E x. □

The most important special case of the above theorem is when E is a closed subspace of the Hilbert space H (recall a subspace is always convex), in which case we have

Theorem 6.3. If E ⊂ H is a closed subspace of a Hilbert space H and x ∈ H, then y = P_E x if and only if y ∈ E and x − y ∈ E^⊥. Furthermore

1. x − y = x − P_E x = P_{E^⊥} x

2. The identity

   x = y + (x − y) = P_E x + P_{E^⊥} x   (6.4.9)

is the unique decomposition of x as the sum of an element of E and an element of E^⊥.

3. P_E is a linear operator on H with ‖P_E‖ = 1 except for the case E = {0}.

Proof: If y = P_E x then for any w ∈ E we also have y ± w ∈ E, and choosing z = y ± w in (6.4.2) gives ±Re ⟨x − y, w⟩ ≤ 0. Thus Re ⟨x − y, w⟩ = 0, and repeating the same argument with z = y ± iw gives Re ⟨x − y, iw⟩ = Im ⟨x − y, w⟩ = 0 also. We conclude that ⟨x − y, w⟩ = 0 for all w ∈ E, i.e. x − y ∈ E^⊥. The converse statement may be proved in a similar manner.

Recall that E^⊥ is always a closed subspace of H. The statement that x − y = P_{E^⊥} x is then equivalent, by the previous paragraph, to x − y ∈ E^⊥ and ⟨x − (x − y), w⟩ = ⟨y, w⟩ = 0 for every w ∈ E^⊥, which is evidently true since y ∈ E.

Next, if x = y_1 + z_1 = y_2 + z_2 with y_1, y_2 ∈ E and z_1, z_2 ∈ E^⊥, then y_1 − y_2 = z_2 − z_1, implying that y = y_1 − y_2 belongs to both E and E^⊥. But then y ⊥ y, i.e. ⟨y, y⟩ = 0, must hold, so that y = 0 and hence y_1 = y_2, z_1 = z_2. We leave the proof of linearity to the exercises. □

If we denote by I the identity mapping, we have just proved that P_{E^⊥} = I − P_E. We also obtain that

   ‖x‖² = ‖P_E x‖² + ‖P_{E^⊥} x‖²   (6.4.10)

for any x ∈ H.

Example 6.8. In the Hilbert space L²(−1, 1) let E denote the subspace of even functions, i.e. f ∈ E if f(x) = f(−x) for almost every x ∈ (−1, 1). We claim that E^⊥ is the subspace of odd functions on (−1, 1). The fact that any odd function belongs to E^⊥ is clear, since if f is even and g is odd then fg is odd, and so ⟨f, g⟩ = ∫_{−1}^1 f(x)g(x) dx = 0. Conversely, if g ⊥ E then for any f ∈ E we have

   0 = ⟨g, f⟩ = ∫_{−1}^1 g(x)f(x) dx = ∫_0^1 (g(x) + g(−x)) f(x) dx   (6.4.11)

by an obvious change of variables. Choosing f(x) = g(x) + g(−x) we see that

   ∫_0^1 |g(x) + g(−x)|² dx = 0   (6.4.12)

so that g(x) = −g(−x) for almost every x ∈ (0, 1) and hence for almost every x ∈ (−1, 1). Thus any element of E^⊥ is an odd function on (−1, 1).

Any function f ∈ L²(−1, 1) thus has the unique decomposition f = P_E f + P_{E^⊥} f, a sum of an even and an odd function. Since one such splitting is

   f(x) = (f(x) + f(−x))/2 + (f(x) − f(−x))/2   (6.4.13)

we conclude from the uniqueness property that these two terms are the projections, i.e.

   P_E f(x) = (f(x) + f(−x))/2,   P_{E^⊥} f(x) = (f(x) − f(−x))/2   (6.4.14)
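For readers who like to experiment, here is a short numerical sketch of Example 6.8 (my own addition): it splits a sample function into the even and odd parts of (6.4.14) and checks, by the trapezoid rule, that the parts are orthogonal and that (6.4.10) holds approximately. The particular f is an arbitrary choice.

    import numpy as np

    f = lambda x: np.exp(x) + x**3
    x = np.linspace(-1.0, 1.0, 4001)

    even = 0.5 * (f(x) + f(-x))        # P_E f
    odd  = 0.5 * (f(x) - f(-x))        # P_{E^⊥} f

    inner = np.trapz(even * odd, x)    # ⟨P_E f, P_{E^⊥} f⟩, should be about 0
    lhs = np.trapz(f(x)**2, x)
    rhs = np.trapz(even**2, x) + np.trapz(odd**2, x)
    print("inner product of the parts:", inner)
    print("||f||^2 vs ||P_E f||^2 + ||P_{E^⊥} f||^2:", lhs, rhs)   # cf. (6.4.10)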

Example 6.9. Let {x_1, x_2, . . . , x_n} be an orthogonal set of nonzero elements in a Hilbert space X and E = L(x_1, x_2, . . . , x_n) the span of these elements. Let us compute P_E for this closed subspace E. If y = P_E x then y = Σ_{j=1}^n λ_j x_j for some scalars λ_1, . . . , λ_n, since y ∈ E. From Theorem 6.3 we also have that x − y ⊥ E, which is equivalent to x − y ⊥ x_k for each k. Thus ⟨x, x_k⟩ = ⟨y, x_k⟩ = λ_k ⟨x_k, x_k⟩, using the orthogonality assumption. Thus we conclude that

   y = P_E x = Σ_{j=1}^n ( ⟨x, x_j⟩ / ⟨x_j, x_j⟩ ) x_j   (6.4.15)

6.5 Gram-Schmidt method

The projection formula (6.4.15) provides an explicit and very convenient expression for the solution y of the best approximation problem (6.4.1) provided E is a subspace spanned by mutually orthogonal vectors {x_1, x_2, . . . , x_n}. If instead E = L(x_1, x_2, . . . , x_n) is a subspace but {x_1, x_2, . . . , x_n} are not orthogonal vectors, we can still use (6.4.15) to compute y = P_E x if we can find a set of orthogonal vectors {y_1, y_2, . . . , y_m} such that E = L(x_1, x_2, . . . , x_n) = L(y_1, y_2, . . . , y_m), i.e. if we can find an orthogonal basis of E. This may always be done by the Gram-Schmidt orthogonalization procedure from linear algebra, which we now describe.

Assume that {x_1, x_2, . . . , x_n} are linearly independent, so that m = n must hold. First set y_1 = x_1. If orthogonal vectors y_1, y_2, . . . , y_k have been chosen for some 1 ≤ k < n such that E_k := L(y_1, y_2, . . . , y_k) = L(x_1, x_2, . . . , x_k), then define y_{k+1} = x_{k+1} − P_{E_k} x_{k+1}. Clearly {y_1, y_2, . . . , y_{k+1}} are orthogonal, since y_{k+1} is the projection of x_{k+1} onto E_k^⊥. Also, since y_{k+1} and x_{k+1} differ by an element of E_k, it is evident that L(x_1, x_2, . . . , x_{k+1}) = L(y_1, y_2, . . . , y_{k+1}). Thus after n steps we obtain an orthogonal set {y_1, y_2, . . . , y_n} which spans E. If the original set {x_1, x_2, . . . , x_n} is not linearly independent then some of the y_k's will be zero. After discarding these and relabeling, we obtain {y_1, y_2, . . . , y_m} for some m ≤ n, an orthogonal basis for E. Note that we may compute y_{k+1} using (6.4.15), namely

   y_{k+1} = x_{k+1} − Σ_{j=1}^k ( ⟨x_{k+1}, y_j⟩ / ⟨y_j, y_j⟩ ) y_j   (6.5.1)

In practice the Gram-Schmidt method is often modified to produce an orthonormal basis of E by normalizing y_k to be a unit vector at each step, or else discarding it if it is already a linear combination of {y_1, y_2, . . . , y_{k−1}}. More explicitly:

• Set y_1 = x_1 / ‖x_1‖.

• If orthonormal vectors {y_1, y_2, . . . , y_k} have been chosen, set

   y_{k+1} = x_{k+1} − Σ_{j=1}^k ⟨x_{k+1}, y_j⟩ y_j   (6.5.2)

If y_{k+1} = 0 discard it, otherwise replace y_{k+1} by y_{k+1} / ‖y_{k+1}‖.

The reader may easily check that {y_1, y_2, . . . , y_m} constitutes an orthonormal basis of E, and consequently P_E x = Σ_{j=1}^m ⟨x, y_j⟩ y_j for any x ∈ H.

6.6 Bessel’s inequality and infinite orthogonal sequences

The formula (6.4.15) for P_E may be adapted for use in infinite dimensional subspaces E. If {x_n}_{n=1}^∞ is a countable orthogonal set in H, x_n ≠ 0 for all n, we formally expect that if E = L({x_n}_{n=1}^∞) then

   P_E x = Σ_{n=1}^∞ ( ⟨x, x_n⟩ / ⟨x_n, x_n⟩ ) x_n   (6.6.1)

To verify that this is correct, we must show that the infinite series in (6.6.1) is guaranteed to be convergent in H.

First of all, let us set

   e_n = x_n / ‖x_n‖,   c_n = ⟨x, e_n⟩,   E_N = L(x_1, x_2, . . . , x_N)   (6.6.2)

so that {e_n}_{n=1}^∞ is an orthonormal set, and

   P_{E_N} x = Σ_{n=1}^N c_n e_n   (6.6.3)

From (6.4.10) we have

   Σ_{n=1}^N |c_n|² = ‖P_{E_N} x‖² ≤ ‖x‖²   (6.6.4)

Letting N → ∞ we obtain Bessel's inequality

   Σ_{n=1}^∞ |c_n|² = Σ_{n=1}^∞ |⟨x, e_n⟩|² ≤ ‖x‖²   (6.6.5)

The immediate implication that lim_{n→∞} c_n = 0 is sometimes called the Riemann-Lebesgue lemma.

Proposition 6.3. (Riesz-Fischer) Let {e_n}_{n=1}^∞ be an orthonormal set in H, E = L({e_n}_{n=1}^∞), x ∈ H and c_n = ⟨x, e_n⟩. Then the infinite series Σ_{n=1}^∞ c_n e_n is convergent in H to P_E x.

Proof: First we note that the series Σ_{n=1}^∞ c_n e_n is Cauchy in H, since if M > N

   ‖Σ_{n=N}^M c_n e_n‖² = Σ_{n=N}^M |c_n|²   (6.6.6)

which is less than any prescribed ε > 0 for all M > N with N sufficiently large, since Σ_{n=1}^∞ |c_n|² < ∞. Thus y = Σ_{n=1}^∞ c_n e_n exists in H, and clearly y ∈ E. Since ⟨Σ_{n=1}^N c_n e_n, e_m⟩ = c_m if N > m, it follows easily that ⟨y, e_m⟩ = c_m = ⟨x, e_m⟩. Thus y − x ⊥ e_m for any m, which implies y − x ∈ E^⊥. From Theorem 6.3 we conclude that y = P_E x. □
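As a concrete (and entirely optional) illustration of the coefficients c_n and of Bessel's inequality, the sketch below uses the normalized Legendre polynomials e_n = √((2n+1)/2) P_n, which form an orthonormal set in L²(−1, 1), and Gauss-Legendre quadrature from NumPy; the choice f(x) = e^x and the truncation at n = 7 are my own.

    import numpy as np
    from numpy.polynomial.legendre import Legendre, leggauss

    f = np.exp
    x, w = leggauss(60)                     # quadrature nodes and weights on [-1, 1]
    norm_f_sq = np.sum(w * f(x)**2)         # ||f||^2 in L^2(-1, 1)

    partial = 0.0
    for n in range(8):
        e_n = Legendre.basis(n)(x) * np.sqrt((2 * n + 1) / 2)   # orthonormalized P_n
        c_n = np.sum(w * f(x) * e_n)        # c_n = <f, e_n>
        partial += c_n**2
        print(n, partial, "<=", norm_f_sq)  # Bessel's inequality (6.6.5)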

6.7 Characterization of a basis of a Hilbert space

Now suppose we have an orthogonal set {x_n}_{n=1}^∞ and we wish to determine whether or not it is a basis of the Hilbert space H. There are a number of interesting ways to answer this question, summarized in Theorem 6.4 below. First we must make some more definitions.

Definition 6.5. A collection of vectors {x_n}_{n=1}^∞ is closed in H if the set of all finite linear combinations of {x_n}_{n=1}^∞ is dense in H.

A collection of vectors {x_n}_{n=1}^∞ is complete in H if there is no nonzero vector orthogonal to all of them, i.e. ⟨x, x_n⟩ = 0 for all n if and only if x = 0.

An orthogonal set {x_n}_{n=1}^∞ in H is a maximal orthogonal set if it is not contained in any larger orthogonal set.

Theorem 6.4. Let {e_n}_{n=1}^∞ be an orthonormal set in a Hilbert space H. Then the following are equivalent.

a) {e_n}_{n=1}^∞ is a basis of H.

b) x = Σ_{n=1}^∞ ⟨x, e_n⟩ e_n for every x ∈ H.

c) ⟨x, y⟩ = Σ_{n=1}^∞ ⟨x, e_n⟩⟨e_n, y⟩ for every x, y ∈ H.

d) ‖x‖² = Σ_{n=1}^∞ |⟨x, e_n⟩|² for every x ∈ H.

e) {e_n}_{n=1}^∞ is a maximal orthonormal set.

f) {e_n}_{n=1}^∞ is closed in H.

g) {e_n}_{n=1}^∞

is complete in H.

Proof: a) implies b): If {e_n}_{n=1}^∞ is a basis of H then for any x ∈ H there exist unique constants d_n such that x = lim_{N→∞} S_N, where S_N = Σ_{n=1}^N d_n e_n. Since ⟨S_N, e_m⟩ = d_m if N > m, it follows that

   |d_m − ⟨x, e_m⟩| = |⟨S_N − x, e_m⟩| ≤ ‖S_N − x‖ ‖e_m‖ → 0   (6.7.1)

as N → ∞, using the Schwarz inequality. Hence

   x = Σ_{n=1}^∞ d_n e_n = Σ_{n=1}^∞ ⟨x, e_n⟩ e_n   (6.7.2)

b) implies c): For any x, y ∈ H we have

   ⟨x, y⟩ = ⟨x, lim_{N→∞} Σ_{n=1}^N ⟨y, e_n⟩ e_n⟩   (6.7.3)
    = lim_{N→∞} ⟨x, Σ_{n=1}^N ⟨y, e_n⟩ e_n⟩ = lim_{N→∞} Σ_{n=1}^N \overline{⟨y, e_n⟩} ⟨x, e_n⟩   (6.7.4)
    = Σ_{n=1}^∞ ⟨x, e_n⟩ \overline{⟨y, e_n⟩} = Σ_{n=1}^∞ ⟨x, e_n⟩ ⟨e_n, y⟩   (6.7.5)

Here we have used Corollary 6.1 (together with the conjugate symmetry [H4], to pass the limit through the second argument) in the second line.

c) implies d): We simply choose x = y in the identity stated in c).

d) implies e): If {e_n}_{n=1}^∞ is not maximal then there exists e ∈ H such that {e_n}_{n=1}^∞ ∪ {e} is orthonormal. Since ⟨e, e_n⟩ = 0 for all n but ‖e‖ = 1, this contradicts d).

e) implies f): Let E denote the set of finite linear combinations of the e_n's. If {e_n}_{n=1}^∞ is not closed then Ē ≠ H, so there must exist x ∉ Ē. If we let y = x − P_{Ē} x then y ≠ 0 and y ⊥ Ē. If e = y/‖y‖ we would then have that {e_n}_{n=1}^∞ ∪ {e} is orthonormal, so that {e_n}_{n=1}^∞ could not be maximal.


f) implies g): Assume that ⟨x, e_n⟩ = 0 for all n. If {e_n}_{n=1}^∞ is closed then for any ε > 0 there exist λ_1, . . . , λ_N such that ‖x − Σ_{n=1}^N λ_n e_n‖² < ε. But then ‖x‖² + Σ_{n=1}^N |λ_n|² < ε, and in particular ‖x‖² < ε. Thus x = 0, so {e_n}_{n=1}^∞ is complete.

g) implies a): Let E = L({e_n}_{n=1}^∞). If x ∈ H and y = P_E x = Σ_{n=1}^∞ ⟨x, e_n⟩ e_n, then as in the proof of Proposition 6.3, ⟨y, e_n⟩ = ⟨x, e_n⟩ for every n. Since {e_n}_{n=1}^∞ is complete it follows that x = y ∈ E, so that L({e_n}_{n=1}^∞) = H. Since an orthonormal set is obviously linearly independent, it follows that {e_n}_{n=1}^∞ is a basis of H. □

Because of the equivalence of the stated conditions, the phrases 'complete orthonormal set', 'maximal orthonormal set', and 'closed orthonormal set' are often used interchangeably with 'orthonormal basis' in a Hilbert space setting. The identity in d) is called the Bessel equality (recall the corresponding inequality (6.6.5) is valid whether or not the orthonormal set {e_n}_{n=1}^∞ is a basis), while the identity in c) is the Parseval equality. For reasons which should become more clear in Chapter 8, the infinite series Σ_{n=1}^∞ ⟨x, e_n⟩ e_n is often called the generalized Fourier series of x with respect to the orthonormal basis {e_n}_{n=1}^∞, and ⟨x, e_n⟩ is the n'th generalized Fourier coefficient.

Theorem 6.5. Every separable Hilbert space has an orthonormal basis.

Proof: If {x_n}_{n=1}^∞ is a countable dense sequence in H and we carry out the Gram-Schmidt procedure, we obtain an orthonormal sequence {e_n}_{n=1}^∞. This sequence must be complete, since any vector orthogonal to every e_n must also be orthogonal to every x_n, so must be zero, since {x_n}_{n=1}^∞ is dense. Therefore by Theorem 6.4, {e_n}_{n=1}^∞ (or {e_1, e_2, . . . , e_n} in the finite dimensional case) is an orthonormal basis of H. □

The same conclusion is actually correct in a non-separable Hilbert space also, but needs more explanation. See for example Chapter 4 of [29].

6.8 Isomorphisms of a Hilbert space

There are two interesting isomorphisms of every separable Hilbert space: one is to its so-called dual space, and the second is to the sequence space ℓ². In this section we explain both of these facts.

Recall that in Chapter 5 we have already introduced X* = B(X, C), the space of continuous linear functionals on the normed linear space X. It is itself always a Banach space (see Exercise 3 of Chapter 5), and is also called the dual space of X.


Example 6.10. If H is a Hilbert space and y ∈ H, define φ(x) = ⟨x, y⟩. Then φ : H → C is clearly linear, and |φ(x)| ≤ ‖y‖ ‖x‖ by the Schwarz inequality, hence φ ∈ H*, with ‖φ‖ ≤ ‖y‖.

The following theorem asserts that every element of the dual space H* arises in this way.

Theorem 6.6. (Riesz representation theorem) If H is a Hilbert space and φ ∈ H* then there exists a unique y ∈ H such that φ(x) = ⟨x, y⟩.

Proof: Let M = {x ∈ H : φ(x) = 0}, which is clearly a closed subspace of H. If M = H then φ can only be the zero functional, so y = 0 has the required properties. Otherwise, there must exist e ∈ M^⊥ such that ‖e‖ = 1. For any x ∈ H let z = φ(x)e − φ(e)x and observe that φ(z) = 0, so z ∈ M, and in particular z ⊥ e. It then follows that

   0 = ⟨z, e⟩ = φ(x)⟨e, e⟩ − φ(e)⟨x, e⟩   (6.8.1)

Thus φ(x) = ⟨x, y⟩ with y := \overline{φ(e)} e, for every x ∈ H.

The uniqueness property is even easier to show. If φ(x) = ⟨x, y_1⟩ = ⟨x, y_2⟩ for every x ∈ H then necessarily ⟨x, y_1 − y_2⟩ = 0 for all x, and choosing x = y_1 − y_2 we get ‖y_1 − y_2‖² = 0, that is, y_1 = y_2. □

We view the element y ∈ H as 'representing' the linear functional φ ∈ H*, hence the name of the theorem. There are actually several theorems one may encounter, all called the Riesz representation theorem, and what they all have in common is that the dual space of some other space is characterized. The Hilbert space version here is by far the easiest of these theorems.

If we define the mapping R : H → H* (the Riesz map) by the condition R(y) = φ, with φ, y related as above, then Theorem 6.6 amounts to the statement that R is one to one and onto. Since it is easy to check that R is also linear, it follows that R is an isomorphism from H to H*. In fact more is true: R is an isometric isomorphism, which means that ‖R(y)‖ = ‖y‖ for every y ∈ H. To see this, recall we have already seen in Example 6.10 that ‖φ‖ ≤ ‖y‖, and by choosing x = y we also get φ(y) = ‖y‖², which implies ‖φ‖ ≥ ‖y‖.

Next, suppose that H is an infinite dimensional separable Hilbert space. According to Theorem 6.5 there exists an orthonormal basis of H, which cannot be finite, and so may be written as {e_n}_{n=1}^∞. Associate with any x ∈ H the corresponding sequence of generalized Fourier coefficients {c_n}_{n=1}^∞, where c_n = ⟨x, e_n⟩, and let Λ denote this mapping, i.e. Λ(x) = {c_n}_{n=1}^∞.

We know by Theorem 6.4 that Σ_{n=1}^∞ |c_n|² < ∞, i.e. Λ(x) ∈ ℓ². On the other hand, suppose Σ_{n=1}^∞ |c_n|² < ∞ and let x = Σ_{n=1}^∞ c_n e_n. This series is Cauchy, hence convergent in H, by precisely the same argument as used in the beginning of the proof of Proposition 6.3. Since {e_n}_{n=1}^∞ is a basis, we must have c_n = ⟨x, e_n⟩, thus Λ(x) = {c_n}_{n=1}^∞, and consequently Λ : H → ℓ² is onto. It is also one-to-one, since Λ(x_1) = Λ(x_2) means that ⟨x_1 − x_2, e_n⟩ = 0 for every n, hence x_1 − x_2 = 0 by the completeness property of a basis. Finally it is straightforward to check that Λ is linear, so that Λ is an isomorphism. Like the Riesz map, the isomorphism Λ is also isometric, ‖Λ(x)‖ = ‖x‖, on account of the Bessel equality. By the above considerations we have then established the following theorem.

Theorem 6.7. If H is an infinite dimensional separable Hilbert space, then H is isometrically isomorphic to ℓ².

Since all such Hilbert spaces are isometrically isomorphic to ℓ², they are then obviously isometrically isomorphic to each other. If H is a Hilbert space of dimension N, the same arguments show that H is isometrically isomorphic to the Hilbert space R^N or C^N, depending on whether real or complex scalars are allowed. Finally, see Theorem 4.17 of [29] for the nonseparable case.

6.9 Exercises

1. Prove Proposition 6.2.

2. In the Hilbert space L²(−1, 1) what is M^⊥ if

a) M = {u : u(x) = u(−x) a.e.}

b) M = {u : u(x) = 0 a.e. for −1 < x < 0}.

Give an explicit formula for the projection onto M in each case.

3. Prove that P_E is a linear operator on H with norm ‖P_E‖ = 1 except in the trivial case when E = {0}. Suggestion: If x = c_1 x_1 + c_2 x_2 first show that

   P_E x − c_1 P_E x_1 − c_2 P_E x_2 = −P_{E^⊥} x + c_1 P_{E^⊥} x_1 + c_2 P_{E^⊥} x_2


4. Show that the parallelogram law fails in L¹(Ω), so there is no choice of inner product which can give rise to the norm in L¹(Ω). (The same is true in L^p(Ω) for any p ≠ 2.)

5. If (X, ⟨·, ·⟩) is an inner product space prove the polarization identity

   ⟨x, y⟩ = (1/4) ( ‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖² )

Thus, in any normed linear space, there can exist at most one inner product giving rise to the norm.

6. Let M be a closed subspace of a Hilbert space H, and P_M be the corresponding projection. Show that

a) P_M² = P_M

b) ⟨P_M x, y⟩ = ⟨P_M x, P_M y⟩ = ⟨x, P_M y⟩ for any x, y ∈ H.

7. Show that ℓ² is a Hilbert space. (Discussion: The only property you need to check is completeness, and you may freely use the fact that R is complete. A Cauchy sequence in this case is a sequence of sequences, so use a notation like

   x^{(n)} = {x^{(n)}_1, x^{(n)}_2, . . . }

where x^{(n)}_j denotes the j'th term of the n'th sequence x^{(n)}. Given a Cauchy sequence {x^{(n)}}_{n=1}^∞ in ℓ² you'll first find a sequence x such that lim_{n→∞} x^{(n)}_j = x_j for each fixed j. You then must still show that x ∈ ℓ², and one good way to do this is by first showing that x − x^{(n)} ∈ ℓ² for some n.)

8. Let H be a Hilbert space.

a) If x_n → x in H show that {x_n}_{n=1}^∞ is bounded in H.

b) If x_n → x, y_n → y in H show that ⟨x_n, y_n⟩ → ⟨x, y⟩.

9. Compute orthogonal polynomials of degree 0, 1, 2, 3 on [−1, 1] and on [0, 1] by applying the Gram-Schmidt procedure to 1, x, x², x³ in L²(−1, 1) and L²(0, 1). (In the case of L²(−1, 1), you are finding so-called Legendre polynomials.)

10. Use the result of Exercise 9 and the projection formula (6.6.1) to compute the best polynomial approximations of degrees 0, 1, 2 and 3 to u(x) = e^x in L²(−1, 1). Feel free to use any symbolic calculation tool you know to compute the necessary integrals, but give exact coefficients, not calculator approximations. If possible, produce a graph displaying u and the 4 approximations.

11. Let Ω ⊂ R^N, ρ be a measurable function on Ω, and ρ(x) > 0 a.e. on Ω. Let X denote the set of measurable functions u for which ∫_Ω |u(x)|² ρ(x) dx is finite. We can then define the weighted inner product

   ⟨u, v⟩_ρ = ∫_Ω u(x) \overline{v(x)} ρ(x) dx

and corresponding norm ‖u‖_ρ = √⟨u, u⟩_ρ on X. The resulting inner product space is complete, often denoted L²_ρ(Ω). (As in the case of ρ(x) ≡ 1 we regard any two functions which agree a.e. as being the same element, so L²_ρ(Ω) is again really a set of equivalence classes.)

a) Verify that all of the inner product axioms are satisfied.

b) Suppose that there exist constants C_1, C_2 such that 0 < C_1 ≤ ρ(x) ≤ C_2 a.e. Show that u_n → u in L²_ρ(Ω) if and only if u_n → u in L²(Ω).

12. More classes of orthogonal polynomials may be derived by applying the Gram-Schmidt procedure to {1, x, x², . . . } in L²_ρ(a, b) for various choices of ρ, a, b, two of which occur in Exercise 9. Another class is the Laguerre polynomials, corresponding to a = 0, b = ∞ and ρ(x) = e^{−x}. Find the first four Laguerre polynomials.

13. Show that equality holds in the Schwarz inequality (6.2.1) if and only if x, y are linearly dependent.

14. Show by examples that the best approximation problem (6.4.1) may not have a solution if E is either not closed or not convex.

15. If Ω is a compact subset of R^N, show that C(Ω) is a subspace of L²(Ω) which isn't closed.

16. Show that

   { 1/√2, cos nπx, sin nπx }_{n=1}^∞   (6.9.1)

is an orthonormal set in L²(−1, 1). (Completeness of this set will be shown in Chapter 8.)

17. For nonnegative integers n define

   v_n(x) = cos(n cos^{−1} x)

a) Show that v_{n+1}(x) + v_{n−1}(x) = 2x v_n(x) for n = 1, 2, . . .

b) Show that v_n is a polynomial of degree n (the so-called Chebyshev polynomials).

c) Show that the v_n are orthogonal in L²_ρ(−1, 1), where the weight function is ρ(x) = 1/√(1 − x²).

18. If H is a Hilbert space we say a sequence {x_n}_{n=1}^∞ converges weakly to x (notation: x_n ⇀ x) if ⟨x_n, y⟩ → ⟨x, y⟩ for every y ∈ H.

a) Show that if x_n → x then x_n ⇀ x.

b) Prove that the converse is false, as long as dim(H) = ∞, by showing that if {e_n}_{n=1}^∞ is any orthonormal sequence in H then e_n ⇀ 0, but lim_{n→∞} e_n doesn't exist.

c) Prove that if x_n ⇀ x then ‖x‖ ≤ lim inf_{n→∞} ‖x_n‖.

d) Prove that if x_n ⇀ x and ‖x_n‖ → ‖x‖ then x_n → x.

19. Let M_1, M_2 be closed subspaces of a Hilbert space H and suppose M_1 ⊥ M_2. Show that

   M_1 ⊕ M_2 = {x ∈ H : x = y + z, y ∈ M_1, z ∈ M_2}

is also a closed subspace of H.

Chapter 7

Distributions

In this chapter we will introduce and study the concept of distribution, also commonly known as generalized function. To motivate this study we first mention two examples.

Example 7.1. The wave equation u_{tt} − u_{xx} = 0 has the general solution u(x, t) = F(x + t) + G(x − t), where F, G must be in C²(R) in order that u be a classical solution. However, from a physical point of view there is no apparent reason why such smoothness restrictions on F, G should be needed. Indeed the two terms represent waves of fixed shape moving to the left and right respectively with speed one, and it ought to be possible to allow the shape functions F, G to have discontinuities. The calculus of distributions will allow us to regard u as a solution of the wave equation in a well defined sense even for such irregular F, G.

Example 7.2. In physics and engineering one frequently encounters the Dirac delta function δ(x), which has the properties

   δ(x) = 0 for x ≠ 0,   ∫_{−∞}^∞ δ(x) dx = 1   (7.0.1)

Unfortunately these properties are inconsistent for ordinary functions – any function which is zero except at a single point must have integral zero. The theory of distributions will allow us to give a precise mathematical meaning to the delta function, and in so doing justify formal calculations with it.

Roughly speaking, a distribution is a mathematical object whose unique identity is specified by how it acts on all test functions. It is in a sense quite analogous to a function in the ordinary sense, whose unique identity is specified by how it acts on (i.e. how it maps) all points in its domain. As we will see, most ordinary functions may be viewed as a special kind of distribution, which explains the 'generalized function' terminology. In addition, there is a well defined calculus of distributions which is basic to the modern theory of partial differential equations. We now start to give precise meaning to these concepts.

7.1 The space of test functions

For any real or complex valued function f defined on some domain in R^N, the support of f, denoted supp f, is the closure of the set {x : f(x) ≠ 0}.

Definition 7.1. If Ω is any open set in R^N the space of test functions on Ω is

   C_0^∞(Ω) = {φ ∈ C^∞(Ω) : supp φ is compact in Ω}   (7.1.1)

This function space is also commonly denoted D(Ω), which is the notation we will use from now on. Clearly D(Ω) is a vector space, but it may not be immediately evident that it contains any function other than φ ≡ 0.

Example 7.3. Define

   φ(x) = e^{1/(x² − 1)} for |x| < 1,   φ(x) = 0 for |x| ≥ 1   (7.1.2)

Then φ ∈ D(Ω) with Ω = R. To see this one only needs to check that lim_{x→1−} φ^{(k)}(x) = 0 for k = 0, 1, . . . , and similarly at x = −1. Once we have one such function, then many others can be derived from it by dilation (φ(x) → φ(αx)), translation (φ(x) → φ(x − α)), scaling (φ(x) → αφ(x)), differentiation (φ(x) → φ^{(k)}(x)) or any linear combination of such terms. See also Exercise 1.
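The extremely flat decay of this bump function near the endpoints is easy to see numerically. The sketch below (my own, with an arbitrary set of sample points and a crude finite-difference derivative) evaluates φ of (7.1.2) and an approximate φ′ near x = 1; it is only an illustration, not a substitute for the limit computation just described.

    import numpy as np

    def bump(x):
        """The test function of (7.1.2): smooth, supported in [-1, 1]."""
        x = np.asarray(x, dtype=float)
        out = np.zeros_like(x)
        inside = np.abs(x) < 1
        out[inside] = np.exp(1.0 / (x[inside]**2 - 1.0))
        return out

    x = np.array([0.0, 0.5, 0.9, 0.99, 0.999])
    print(bump(x))                              # values decay very fast as |x| -> 1
    h = 1e-6                                    # crude central-difference derivative
    print((bump(x + h) - bump(x - h)) / (2 * h))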

Next, we define convergence in the test function space.

Definition 7.2. If φ_n ∈ D(Ω) then we say φ_n → 0 in D(Ω) if

(i) There exists a compact set K ⊂ Ω such that supp φ_n ⊂ K for every n

(ii) lim_{n→∞} max_{x ∈ Ω} |D^α φ_n(x)| = 0 for every multi-index α

We also say that φ_n → φ in D(Ω) provided φ_n − φ → 0 in D(Ω). By specifying what convergence of a sequence in D(Ω) means, we are partly, but not completely, specifying a topology on D(Ω). We will have no need of further details about this topology; see Chapter 6 of [30] for more on this point.

7.2 The space of distributions

We come now to the basic definition – a distribution is a continuous linear functional on D(Ω). More precisely

Definition 7.3. A linear mapping T : D(Ω) → C is a distribution on Ω if T(φ_n) → T(φ) whenever φ_n → φ in D(Ω). The set of all distributions on Ω is denoted D′(Ω).

The distribution space D′(Ω) is another example of a dual space X′, the set of all continuous linear functionals on X, which can be defined if X is any vector space in which convergence of sequences is defined. The dual space is always itself a vector space. We'll discuss many more examples of dual spaces later on. We emphasize that the distribution T is defined solely in terms of the values it assigns to test functions φ; in particular two distributions T_1, T_2 are equal if and only if T_1(φ) = T_2(φ) for every φ ∈ D(Ω).

To clarify the concept, let us discuss a number of examples.

Example: If f ∈ L¹(Ω) define

   T(φ) = ∫_Ω f(x)φ(x) dx   (7.2.1)

Obviously |T(φ)| ≤ ‖f‖_{L¹(Ω)} ‖φ‖_{L^∞(Ω)}, so that T : D(Ω) → C, and T is also clearly linear. If φ_n → φ in D(Ω) then by the same token

   |T(φ_n) − T(φ)| ≤ ‖f‖_{L¹(Ω)} ‖φ_n − φ‖_{L^∞(Ω)} → 0   (7.2.2)

so that T is continuous. Thus T ∈ D′(Ω).

Because of the fact that φ must have compact support in Ω, one does not really need f to be in L¹(Ω), but only in L¹(K) for any compact subset K of Ω. For any 1 ≤ p ≤ ∞ let us define

   L^p_loc(Ω) = {f : f ∈ L^p(K) for any compact set K ⊂ Ω}   (7.2.3)


Thus a function in L^p_loc(Ω) can become infinite arbitrarily rapidly at the boundary of Ω. We say that f_n → f in L^p_loc(Ω) if f_n → f in L^p(K) for every compact subset K ⊂ Ω. Functions in L¹_loc(Ω) are said to be locally integrable on Ω.

Now if we let f ∈ L¹_loc(Ω), the definition (7.2.1) still produces a finite value, since

   |T(φ)| = |∫_Ω f(x)φ(x) dx| = |∫_K f(x)φ(x) dx| ≤ ‖f‖_{L¹(K)} ‖φ‖_{L^∞(K)} < ∞   (7.2.4)

if K = supp φ. Similarly if φ_n → φ in D(Ω) we can choose a fixed compact set K ⊂ Ω containing supp φ and supp φ_n for every n, hence again

   |T(φ_n) − T(φ)| ≤ ‖f‖_{L¹(K)} ‖φ_n − φ‖_{L^∞(K)} → 0   (7.2.5)

so that T ∈ D′(Ω).

When convenient, we will denote the distribution in (7.2.1) by T_f. The correspondence f → T_f allows us to think of L¹_loc(Ω) as a special subspace of D′(Ω), i.e. locally integrable functions are always distributions. The point is that a function f can be thought of as a mapping

   φ → ∫_Ω fφ dx   (7.2.6)

instead of the more conventional

   x → f(x)   (7.2.7)

In fact for L¹_loc functions the former is in some sense more natural, since it doesn't require us to make special arrangements for sets of measure zero. A distribution of the form T = T_f for some f ∈ L¹_loc(Ω) is sometimes referred to as a regular distribution, while any distribution not of this type is a singular distribution.
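As an optional numerical aside (my own, not part of the notes), the action (7.2.1) of a regular distribution can be approximated by quadrature even when f is unbounded, as long as f is locally integrable. Here f(x) = |x|^{−1/2} and φ is the bump function of Example 7.3; the midpoint rule with an even number of cells never samples x = 0, and the printed values stabilize as the mesh is refined.

    import numpy as np

    def phi(x):                        # the bump function of Example 7.3
        out = np.zeros_like(x)
        m = np.abs(x) < 1
        out[m] = np.exp(1.0 / (x[m]**2 - 1.0))
        return out

    f = lambda x: np.abs(x) ** -0.5    # locally integrable, but unbounded at 0

    for N in [10**3, 10**4, 10**5, 10**6]:          # N even, so midpoints avoid 0
        x = np.linspace(-1, 1, N, endpoint=False) + 1.0 / N
        print(N, np.sum(f(x) * phi(x)) * (2.0 / N))  # approximates T_f(phi)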

The correspondence f → T_f is also one-to-one. This is a slightly technical result in measure theory which we leave for the exercises, for those with the necessary background. See also Theorem 2, Chapter II of [31]:

Theorem 7.1. Two distributions T_{f_1}, T_{f_2} on Ω are equal if and only if f_1 = f_2 almost everywhere on Ω.

Example 7.4. Fix a point x_0 ∈ Ω and define

   T(φ) = φ(x_0)   (7.2.8)

Clearly T is defined and linear on D(Ω), and if φ_n → φ in D(Ω) then

   |T(φ_n) − T(φ)| = |φ_n(x_0) − φ(x_0)| → 0   (7.2.9)

since φ_n → φ uniformly on Ω. We claim that T is not of the form T_f for any f ∈ L¹_loc(Ω) (i.e. not a regular distribution). To see this, suppose some such f existed. We would then have

   ∫_Ω f(x)φ(x) dx = 0   (7.2.10)

for any test function φ with φ(x_0) = 0. In particular if Ω′ = Ω\{x_0} and φ ∈ D(Ω′), then defining φ(x_0) = 0 we clearly have φ ∈ D(Ω) and T(φ) = 0, hence f = 0 a.e. on Ω′ and so on Ω, by Theorem 7.1. On the other hand we must also have, for any φ ∈ D(Ω), that

   φ(x_0) = T(φ) = ∫_Ω f(x)φ(x) dx   (7.2.11a)
    = ∫_Ω f(x)(φ(x) − φ(x_0)) dx + φ(x_0) ∫_Ω f(x) dx = φ(x_0) ∫_Ω f(x) dx   (7.2.11b)

since f = 0 a.e. on Ω, and therefore ∫_Ω f(x) dx = 1, a contradiction.

Note that f(x) = 0 for a.e. x ∈ Ω and ∫_Ω f(x) dx = 1 are precisely the formal properties of the delta function mentioned in Example 7.2. We define T to be the Dirac delta distribution with singularity at x_0, usually denoted δ_{x_0}, or simply δ in the case x_0 = 0. By an acceptable abuse of notation, pretending that δ is an actual function, we may write a formula like

   ∫_Ω δ(x)φ(x) dx = φ(0)   (7.2.12)

but we emphasize that this is simply a formal expression of (7.2.8), and any rigorous arguments must make use of (7.2.8) directly. In the same formal sense δ_{x_0}(x) = δ(x − x_0), so that

   ∫_Ω δ(x − x_0)φ(x) dx = φ(x_0)   (7.2.13)

Example 7.5. Fix a point $x_0\in\Omega$ and a multiindex $\alpha$, and define
\[
T(\phi)=(D^\alpha\phi)(x_0) \tag{7.2.14}
\]
One may show, as in the previous example, that $T\in D'(\Omega)$.

Example 7.6. Let $\Sigma$ be a sufficiently smooth hypersurface in $\Omega$ of dimension $m\le N-1$ and define
\[
T(\phi)=\int_\Sigma\phi(x)\,ds(x) \tag{7.2.15}
\]
where $ds$ is the surface area element on $\Sigma$. Then $T$ is a distribution on $\Omega$, sometimes referred to as the delta distribution concentrated on $\Sigma$, sometimes written as $\delta_\Sigma$.

Example 7.7. Let $\Omega=\mathbb{R}$ and define
\[
T(\phi)=\lim_{\epsilon\to 0^+}\int_{|x|>\epsilon}\frac{\phi(x)}{x}\,dx \tag{7.2.16}
\]
As we'll show below, the indicated limit always exists and is finite for $\phi\in D(\Omega)$ (even for $\phi\in C^1_0(\Omega)$). In general, a limit of the form
\[
\lim_{\epsilon\to 0^+}\int_{\Omega\cap\{|x-a|>\epsilon\}}f(x)\,dx \tag{7.2.17}
\]
when it exists, is called the Cauchy principal value of $\int_\Omega f(x)\,dx$, which may be finite even when $\int_\Omega f(x)\,dx$ is divergent in the ordinary sense. For example $\int_{-1}^1\frac{dx}{x}$ is divergent, regarded as either a Lebesgue integral or an improper Riemann integral, but
\[
\lim_{\epsilon\to 0^+}\int_{1>|x|>\epsilon}\frac{dx}{x}=0 \tag{7.2.18}
\]
To distinguish the principal value meaning of the integral, the notation
\[
\mathrm{pv}\int_\Omega f(x)\,dx \tag{7.2.19}
\]
may be used instead of (7.2.17), where the point $a$ in question must be clear from context.

Let us now check that (7.2.16) defines a distribution. If $\operatorname{supp}\phi\subset[-M,M]$ then since
\[
\int_{|x|>\epsilon}\frac{\phi(x)}{x}\,dx=\int_{M>|x|>\epsilon}\frac{\phi(x)}{x}\,dx=\int_{M>|x|>\epsilon}\frac{\phi(x)-\phi(0)}{x}\,dx+\phi(0)\int_{M>|x|>\epsilon}\frac{1}{x}\,dx \tag{7.2.20}
\]
and the last term on the right is zero, we have
\[
T(\phi)=\lim_{\epsilon\to 0^+}\int_{M>|x|>\epsilon}\psi(x)\,dx \tag{7.2.21}
\]
where $\psi(x)=(\phi(x)-\phi(0))/x$. It now follows from the mean value theorem that
\[
|T(\phi)|\le\int_{|x|<M}|\psi(x)|\,dx\le 2M\|\phi'\|_{L^\infty} \tag{7.2.22}
\]
so $T(\phi)$ is defined and finite for all test functions. Linearity of $T$ is clear, and if $\phi_n\to\phi$ in $D(\Omega)$ then
\[
|T(\phi_n)-T(\phi)|\le 2M\|\phi_n'-\phi'\|_{L^\infty}\to 0 \tag{7.2.23}
\]
where $M$ is chosen so that $\operatorname{supp}\phi_n,\operatorname{supp}\phi\subset[-M,M]$, and it follows that $T$ is continuous.

The distribution $T$ is often denoted $\mathrm{pv}\,\frac1x$, so for example $\mathrm{pv}\,\frac1x(\phi)$ means the same thing as the right hand side of (7.2.16). For reasons which will become more clear later, it may also be referred to as $\mathrm{pf}\,\frac1x$, pf standing for pseudofunction (also finite part).
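
The existence of the limit in (7.2.16) can also be seen numerically. The following short computation is an illustration of my own and not part of the notes; the particular bump function and its shift are arbitrary choices made so that $\phi(0)\ne0$.

```python
# Numerical check that the principal-value limit (7.2.16) exists for a smooth,
# compactly supported test function.  The bump and its shift are illustrative
# choices, not anything prescribed by the notes.
import math
from scipy.integrate import quad

def phi(x):
    # smooth bump supported in (-0.7, 1.3); phi(0) != 0, so each piece of the
    # truncated integral is large by itself, but the sum converges
    u = x - 0.3
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0

def truncated_pv(eps):
    left, _ = quad(lambda x: phi(x) / x, -0.7, -eps, limit=200)
    right, _ = quad(lambda x: phi(x) / x, eps, 1.3, limit=200)
    return left + right

for eps in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]:
    print(f"eps = {eps:.0e}   integral over |x| > eps : {truncated_pv(eps): .6f}")
# The printed values stabilize as eps -> 0+, which is exactly the statement that
# pv(1/x) applied to phi is well defined, consistent with the bound (7.2.22).
```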

7.3 Algebra and Calculus with Distributions

7.3.1 Multiplication of distributions

As noted above, $D'(\Omega)$ is a vector space, hence distributions can be added and multiplied by scalars. In general it is not possible to multiply together arbitrary distributions: for example $\delta^2=\delta\cdot\delta$ cannot be defined in any consistent way. It is always possible, however, to multiply a distribution by a $C^\infty$ function. More precisely, if $a\in C^\infty(\Omega)$ and $T\in D'(\Omega)$ then we may define the product $aT$ as a distribution via

Definition 7.4. $(aT)(\phi)=T(a\phi)$ for $\phi\in D(\Omega)$.

Clearly $a\phi\in D(\Omega)$, so that the right hand side is well defined, and it is straightforward to check that $aT$ satisfies the necessary linearity and continuity conditions. One should also note that if $T=T_f$ then this definition is consistent with ordinary pointwise multiplication of the functions $f$ and $a$.

7.3.2 Convergence of distributions

An appropriate definition of convergence of a sequence of distributions is as follows.

Definition 7.5. If $T,T_n\in D'(\Omega)$ for $n=1,2,\dots$ then we say $T_n\to T$ in $D'(\Omega)$ (or in the sense of distributions) if $T_n(\phi)\to T(\phi)$ for every $\phi\in D(\Omega)$.

It is an interesting fact, which we shall not prove here, that it is not necessary to assume that the limit $T$ belongs to $D'(\Omega)$; that is to say, if $T(\phi):=\lim_{n\to\infty}T_n(\phi)$ exists for every $\phi\in D(\Omega)$ then necessarily $T\in D'(\Omega)$ (see Theorem 6.17 of [30]).

Example 7.8. If $f_n\in L^1_{loc}(\Omega)$ and $f_n\to f$ in $L^1_{loc}(\Omega)$ then the corresponding distributions $T_{f_n}\to T_f$ in the sense of distributions, since
\[
|T_{f_n}(\phi)-T_f(\phi)|\le\int_K|f_n-f||\phi|\,dx\le\|f_n-f\|_{L^1(K)}\|\phi\|_{L^\infty(\Omega)} \tag{7.3.1}
\]
where $K$ is the support of $\phi$. Because of the one-to-one correspondence $f\leftrightarrow T_f$, we will usually write instead that $f_n\to f$ in the sense of distributions.

Example 7.9. Define
\[
f_n(x)=\begin{cases}n & 0<x<\frac1n\\ 0 & \text{otherwise}\end{cases} \tag{7.3.2}
\]
We claim that $f_n\to\delta$ in the sense of distributions. We see this by first observing that
\[
|T_{f_n}(\phi)-\delta(\phi)|=\Big|n\int_0^{1/n}\phi(x)\,dx-\phi(0)\Big|=\Big|n\int_0^{1/n}(\phi(x)-\phi(0))\,dx\Big| \tag{7.3.3}
\]
By the continuity of $\phi$, if $\epsilon>0$ there exists $\delta>0$ such that $|\phi(x)-\phi(0)|\le\epsilon$ whenever $|x|\le\delta$. Thus if we choose $n>\frac1\delta$ there follows
\[
n\int_0^{1/n}|\phi(x)-\phi(0)|\,dx\le n\epsilon\int_0^{1/n}dx=\epsilon \tag{7.3.4}
\]
from which the conclusion follows. Note that the formal properties of the $\delta$ function, $\delta(x)=0$ for $x\ne0$, $\delta(0)=+\infty$, $\int\delta(x)\,dx=1$, are clearly reflected in the pointwise limit of the sequence $f_n$, but it is only the distributional definition that is mathematically satisfactory.

Sequences converging to $\delta$ play a very large role in methods of applied mathematics, especially in the theory of differential and integral equations. The following theorem includes many cases of interest.

Theorem 7.2. Suppose $f_n\in L^1(\mathbb{R}^N)$ for $n=1,2,\dots$ and assume

a) $\int_{\mathbb{R}^N}f_n(x)\,dx=1$ for all $n$.

b) There exists a constant $C$ such that $\|f_n\|_{L^1(\mathbb{R}^N)}\le C$ for all $n$.

c) $\lim_{n\to\infty}\int_{|x|>\delta}|f_n(x)|\,dx=0$ for all $\delta>0$.

If $\phi$ is bounded on $\mathbb{R}^N$ and continuous at $x=0$ then
\[
\lim_{n\to\infty}\int_{\mathbb{R}^N}f_n(x)\phi(x)\,dx=\phi(0) \tag{7.3.5}
\]
and in particular $f_n\to\delta$ in $D'(\mathbb{R}^N)$.

Proof: For any such $\phi$ we have
\[
\int_{\mathbb{R}^N}f_n(x)\phi(x)\,dx-\phi(0)=\int_{\mathbb{R}^N}f_n(x)(\phi(x)-\phi(0))\,dx \tag{7.3.6}
\]
and so we will be done if we show that the integral on the right tends to zero as $n\to\infty$. Fix $\epsilon>0$ and choose $\delta>0$ such that $|\phi(x)-\phi(0)|\le\epsilon$ whenever $|x|<\delta$. Write the integral on the right in (7.3.6) as the sum $A_{n,\delta}+B_{n,\delta}$ where
\[
A_{n,\delta}=\int_{|x|\le\delta}f_n(x)(\phi(x)-\phi(0))\,dx\qquad B_{n,\delta}=\int_{|x|>\delta}f_n(x)(\phi(x)-\phi(0))\,dx \tag{7.3.7}
\]
We then have, by obvious estimations, that
\[
|A_{n,\delta}|\le\epsilon\int_{\mathbb{R}^N}|f_n(x)|\,dx\le C\epsilon \tag{7.3.8}
\]
while
\[
\limsup_{n\to\infty}|B_{n,\delta}|\le\limsup_{n\to\infty}2\|\phi\|_{L^\infty}\int_{|x|>\delta}|f_n(x)|\,dx=0 \tag{7.3.9}
\]
Thus
\[
\limsup_{n\to\infty}\Big|\int_{\mathbb{R}^N}f_n(x)\phi(x)\,dx-\phi(0)\Big|\le C\epsilon \tag{7.3.10}
\]
and the conclusion follows since $\epsilon>0$ is arbitrary. $\Box$

It is often the case that $f_n\ge0$ for all $n$, in which case assumption b) follows automatically from a) with $C=1$. We will refer to any sequence satisfying the assumptions of Theorem 7.2 as a delta sequence. A common way to construct such a sequence is to pick any $f\in L^1(\mathbb{R}^N)$ with $\int_{\mathbb{R}^N}f(x)\,dx=1$ and set
\[
f_n(x)=n^Nf(nx) \tag{7.3.11}
\]
The verification of this is left to the exercises. If, for example, we choose $f(x)=\chi_{[0,1]}(x)$, then the resulting sequence $f_n(x)$ is the same as is defined in (7.3.2). Since we can also choose such an $f$ in $D(\mathbb{R}^N)$ we also have

Proposition 7.1. There exists a sequence $\{f_n\}_{n=1}^\infty$ such that $f_n\in D(\mathbb{R}^N)$ and $f_n\to\delta$ in $D'(\mathbb{R}^N)$.
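
As a quick sanity check on the scaling construction (7.3.11), the following sketch (my own illustration, not part of the notes) takes $N=1$ and $f=\chi_{[0,1]}$, so that $f_n$ is exactly the sequence of Example 7.9, and evaluates its action on one arbitrarily chosen smooth bounded $\phi$; the values approach $\phi(0)$ as Theorem 7.2 predicts.

```python
# Sketch: the delta sequence f_n(x) = n * chi_[0,1](n x) of (7.3.2)/(7.3.11)
# acting on a smooth bounded phi approaches phi(0).
import numpy as np
from scipy.integrate import quad

phi = lambda x: np.cos(x) * np.exp(-x * x)   # arbitrary smooth choice, phi(0) = 1

def action(n):
    # integral of f_n * phi  =  n * integral_0^{1/n} phi(x) dx
    val, _ = quad(phi, 0.0, 1.0 / n)
    return n * val

for n in [1, 10, 100, 1000]:
    print(f"n = {n:5d}   f_n(phi) = {action(n):.6f}")
# The values tend to phi(0) = 1, illustrating f_n -> delta in D'(R).
```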

7.3.3 Derivative of a distribution

Next we explain how it is possible to define the derivative of an arbitrary distribution. For the moment, suppose $(a,b)\subset\mathbb{R}$, $f\in C^1(a,b)$ and $T=T_f$ is the corresponding distribution. We clearly then have from integration by parts that
\[
T_{f'}(\phi)=\int_a^bf'(x)\phi(x)\,dx=-\int_a^bf(x)\phi'(x)\,dx=-T_f(\phi') \tag{7.3.12}
\]
This suggests defining
\[
T'(\phi)=-T(\phi')\qquad\phi\in C^\infty_0(a,b) \tag{7.3.13}
\]
whenever $T\in D'(a,b)$. The previous equation shows that this definition is consistent with the ordinary concept of differentiability for $C^1$ functions. Clearly $T'(\phi)$ is always defined, since $\phi'$ is a test function whenever $\phi$ is, linearity of $T'$ is obvious, and if $\phi_n\to\phi$ in $C^\infty_0(a,b)$ then $\phi_n'\to\phi'$ also in $C^\infty_0(a,b)$, so that
\[
T'(\phi_n)=-T(\phi_n')\to-T(\phi')=T'(\phi) \tag{7.3.14}
\]
Thus, $T'\in D'(a,b)$.

Example: Consider the case of the Heaviside (unit step) function $H(x)$,
\[
H(x)=\begin{cases}0 & x<0\\ 1 & x>0\end{cases} \tag{7.3.15}
\]
If we seek the derivative of $H$ (i.e. of $T_H$) according to the above distributional definition, then we compute
\[
H'(\phi)=-H(\phi')=-\int_{-\infty}^\infty H(x)\phi'(x)\,dx=-\int_0^\infty\phi'(x)\,dx=\phi(0) \tag{7.3.16}
\]
(where we use the natural notation $H'$ in place of $T_H'$). This means that $H'(\phi)=\delta(\phi)$ for any test function $\phi$, and so $H'=\delta$ in the sense of distributions. This relationship clearly captures the fact that $H'=0$ at all points where the derivative exists in the classical sense, since we think of the delta function as being zero on any interval not containing the origin. Since $H$ is not differentiable at the origin, the distributional derivative is itself a distribution which is not a function.

Since $\delta$ is again a distribution, it will itself have a derivative, namely
\[
\delta'(\phi)=-\delta(\phi')=-\phi'(0) \tag{7.3.17}
\]
a distribution of the type discussed in Example 7.5, often referred to as the dipole distribution, which of course we may regard as the second derivative of $H$.

For an arbitrary domain $\Omega\subset\mathbb{R}^N$ and sufficiently smooth function $f$ we have the similar integration by parts formula (see (18.2.3))
\[
\int_\Omega\frac{\partial f}{\partial x_i}\phi\,dx=-\int_\Omega f\frac{\partial\phi}{\partial x_i}\,dx \tag{7.3.18}
\]
leading to the definition

Definition 7.6.
\[
\frac{\partial T}{\partial x_i}(\phi)=-T\Big(\frac{\partial\phi}{\partial x_i}\Big)\qquad\phi\in D(\Omega) \tag{7.3.19}
\]

As in the one dimensional case we easily check that $\frac{\partial T}{\partial x_i}$ belongs to $D'(\Omega)$ whenever $T$ does. This has the far reaching consequence that every distribution is infinitely differentiable in the sense of distributions. Furthermore we have the general formula, obtained by repeated application of the basic definition, that
\[
(D^\alpha T)(\phi)=(-1)^{|\alpha|}T(D^\alpha\phi) \tag{7.3.20}
\]
for any multiindex $\alpha$.

A simple and useful property is

Proposition 7.2. If $T_n\to T$ in $D'(\Omega)$ then $D^\alpha T_n\to D^\alpha T$ in $D'(\Omega)$ for any multiindex $\alpha$.

Proof: $D^\alpha T_n(\phi)=(-1)^{|\alpha|}T_n(D^\alpha\phi)\to(-1)^{|\alpha|}T(D^\alpha\phi)=D^\alpha T(\phi)$ for any test function $\phi$. $\Box$

Next we consider a more generic one dimensional situation. Let $x_0\in\mathbb{R}$ and consider a function $f$ which is $C^\infty$ on $(-\infty,x_0)$ and on $(x_0,\infty)$, and for which $f^{(k)}$ has finite left and right hand limits at $x=x_0$, for any $k$. Thus, at the point $x=x_0$, $f$ or any of its derivatives may have a jump discontinuity, and we denote
\[
\Delta_kf=\lim_{x\to x_0^+}f^{(k)}(x)-\lim_{x\to x_0^-}f^{(k)}(x) \tag{7.3.21}
\]
(and by convention $\Delta f=\Delta_0f$). Define also
\[
[f^{(k)}](x)=\begin{cases}f^{(k)}(x) & x\ne x_0\\ \text{undefined} & x=x_0\end{cases} \tag{7.3.22}
\]
which we'll refer to as the pointwise $k$'th derivative. The notation $f^{(k)}$ will always be understood to mean the distributional derivative unless otherwise stated. The distinction between $f^{(k)}$ and $[f^{(k)}]$ is crucial; for example if $f(x)=H(x)$, the Heaviside function, then $H'=\delta$ but $[H'](x)=0$ for $x\ne0$, and is undefined for $x=0$.

For $f$ as described above, we now proceed to calculate the distributional derivative. If $\phi\in C^\infty_0(\mathbb{R})$ we have
\begin{align}
\int_{-\infty}^\infty f(x)\phi'(x)\,dx&=\int_{-\infty}^{x_0}f(x)\phi'(x)\,dx+\int_{x_0}^\infty f(x)\phi'(x)\,dx \tag{7.3.23a}\\
&=f(x)\phi(x)\Big|_{-\infty}^{x_0}-\int_{-\infty}^{x_0}f'(x)\phi(x)\,dx+f(x)\phi(x)\Big|_{x_0}^{\infty}-\int_{x_0}^{\infty}f'(x)\phi(x)\,dx \tag{7.3.23b}\\
&=-\int_{-\infty}^\infty[f'(x)]\phi(x)\,dx+(f(x_0^-)-f(x_0^+))\phi(x_0) \tag{7.3.23c}
\end{align}
It follows that
\[
f'(\phi)=\int_{-\infty}^\infty[f'(x)]\phi(x)\,dx+(\Delta f)\phi(x_0) \tag{7.3.24}
\]
or
\[
f'=[f']+(\Delta f)\delta(x-x_0) \tag{7.3.25}
\]
Note in particular that $f'=[f']$ if and only if $f$ is continuous at $x_0$.

The function $[f']$ satisfies all of the same assumptions as $f$ itself, with $\Delta[f']=\Delta_1f$, thus we can differentiate again in the distribution sense to obtain
\[
f''=[f']'+(\Delta f)\delta'(x-x_0)=[f'']+(\Delta_1f)\delta(x-x_0)+(\Delta f)\delta'(x-x_0) \tag{7.3.26}
\]
Here we use the evident fact that the distributional derivative of $\delta(x-x_0)$ is $\delta'(x-x_0)$.

A similar calculation can be carried out for higher derivatives of $f$, leading to the general formula
\[
f^{(k)}=[f^{(k)}]+\sum_{j=0}^{k-1}(\Delta_jf)\,\delta^{(k-1-j)}(x-x_0) \tag{7.3.27}
\]
One can also obtain a similar formula if $f$ is allowed to have any finite number of such singular points.

Example 7.10. Let
\[
f(x)=\begin{cases}x & x<0\\ \cos x & x>0\end{cases} \tag{7.3.28}
\]
Clearly $f$ satisfies all of the assumptions mentioned above with $x_0=0$, and
\[
[f'](x)=\begin{cases}1 & x<0\\ -\sin x & x>0\end{cases} \tag{7.3.29}
\]
\[
[f''](x)=\begin{cases}0 & x<0\\ -\cos x & x>0\end{cases} \tag{7.3.30}
\]
so that $\Delta f=1$, $\Delta_1f=-1$. Thus
\[
f'=[f']+\delta\qquad f''=[f'']-\delta+\delta' \tag{7.3.31}
\]
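
A direct numerical check of the first identity in (7.3.31) is easy to carry out: the defining quantity $f'(\phi)=-f(\phi')$ should equal $\int[f']\phi\,dx+(\Delta f)\phi(0)$ with $\Delta f=1$. The sketch below (my own, with one arbitrarily chosen rapidly decaying $\phi$) does this; the two sides agree to quadrature accuracy.

```python
# Check of f' = [f'] + delta for the function of Example 7.10, i.e.
#   -f(phi') = integral([f'] * phi) + phi(0)   for a rapidly decaying phi.
import numpy as np
from scipy.integrate import quad

phi  = lambda x: np.exp(-x * x)             # test function (decays fast enough)
dphi = lambda x: -2.0 * x * np.exp(-x * x)  # its derivative

f  = lambda x: x if x < 0 else np.cos(x)        # f as in (7.3.28)
fp = lambda x: 1.0 if x < 0 else -np.sin(x)     # pointwise derivative [f']

lhs = -(quad(lambda x: f(x) * dphi(x), -20, 0)[0]
        + quad(lambda x: f(x) * dphi(x), 0, 20)[0])              # f'(phi) = -f(phi')
rhs = (quad(lambda x: fp(x) * phi(x), -20, 0)[0]
       + quad(lambda x: fp(x) * phi(x), 0, 20)[0] + phi(0))      # [f'](phi) + (Delta f) phi(0)

print(lhs, rhs)   # the two values agree up to quadrature error
```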

Here is one more instructive example in the one dimensional case.

Example 7.11. Let
\[
f(x)=\begin{cases}\log x & x>0\\ 0 & x\le0\end{cases} \tag{7.3.32}
\]
Since $f\in L^1_{loc}(\mathbb{R})$ we may regard it as a distribution on $\mathbb{R}$, but its pointwise derivative $H(x)/x$ is not locally integrable, so does not have an obvious distributional meaning. Nevertheless $f'$ must exist in the sense of $D'(\mathbb{R})$. To find it we use the definition above,
\begin{align}
f'(\phi)=-f(\phi')&=-\int_0^\infty\phi'(x)\log x\,dx=-\lim_{\epsilon\to0^+}\int_\epsilon^\infty\phi'(x)\log x\,dx \tag{7.3.33}\\
&=\lim_{\epsilon\to0^+}\Big[\phi(\epsilon)\log\epsilon+\int_\epsilon^\infty\frac{\phi(x)}{x}\,dx\Big] \tag{7.3.34}\\
&=\lim_{\epsilon\to0^+}\Big[\phi(0)\log\epsilon+\int_\epsilon^\infty\frac{\phi(x)}{x}\,dx\Big] \tag{7.3.35}
\end{align}
where the final equality is valid because the difference between it and the previous line is $\lim_{\epsilon\to0^+}(\phi(\epsilon)-\phi(0))\log\epsilon=0$. The functional defined by the final expression above will be denoted $\mathrm{pf}\big(\frac{H(x)}{x}\big)$ (recall the pf notation was introduced earlier in Section 7.2), i.e.
\[
\mathrm{pf}\Big(\frac{H(x)}{x}\Big)(\phi)=\lim_{\epsilon\to0^+}\Big[\phi(0)\log\epsilon+\int_\epsilon^\infty\frac{\phi(x)}{x}\,dx\Big] \tag{7.3.36}
\]
Since we have already established that the derivative of a distribution is also a distribution, it follows that $\mathrm{pf}\big(\frac{H(x)}{x}\big)\in D'(\mathbb{R})$, and in particular the limit here always exists for $\phi\in D(\mathbb{R})$. It should be emphasized that if $\phi(0)\ne0$ then neither of the two terms on the right hand side in (7.3.36) will have a finite limit separately, but the sum always will. For a test function $\phi$ with support disjoint from the singularity at $x=0$, the action of the distribution $\mathrm{pf}\big(\frac{H(x)}{x}\big)$ coincides with that of the ordinary function $H(x)/x$, as we might expect.

Next we turn to examples involving partial derivatives.

Example 7.12. Let $F\in L^1_{loc}(\mathbb{R})$ and set $u(x,t)=F(x+t)$. We claim that $u_{tt}-u_{xx}=0$ in $D'(\mathbb{R}^2)$. Recall that this is the point that was raised in the first example at the beginning of this chapter. A similar argument works for $F(x-t)$. To verify this claim, first observe that for any $\phi\in D(\mathbb{R}^2)$
\[
(u_{tt}-u_{xx})(\phi)=u(\phi_{tt}-\phi_{xx})=\iint_{\mathbb{R}^2}F(x+t)(\phi_{tt}(x,t)-\phi_{xx}(x,t))\,dx\,dt \tag{7.3.37}
\]
Make the change of coordinates
\[
\xi=x-t\qquad\eta=x+t \tag{7.3.38}
\]
to obtain
\[
(u_{tt}-u_{xx})(\phi)=-2\int_{-\infty}^\infty F(\eta)\Big[\int_{-\infty}^\infty\phi_{\xi\eta}(\xi,\eta)\,d\xi\Big]d\eta=-2\int_{-\infty}^\infty F(\eta)\Big[\phi_\eta(\xi,\eta)\big|_{\xi=-\infty}^{\infty}\Big]d\eta=0 \tag{7.3.39}
\]
since $\phi$ has compact support.

Example 7.13. Let $N\ge3$ and define
\[
u(x)=\frac{1}{|x|^{N-2}} \tag{7.3.40}
\]

We claim that
\[
\Delta u=C_N\delta\ \text{ in }D'(\mathbb{R}^N) \tag{7.3.41}
\]
where $C_N=(2-N)\Omega_{N-1}$ and $\Omega_{N-1}$ is the surface area of the unit sphere in $\mathbb{R}^N$ (the usual notation is to use $N-1$ rather than $N$ as the subscript because the sphere is a surface of dimension $N-1$). First note that for any $R$ we have
\[
\int_{|x|<R}|u(x)|\,dx=\Omega_{N-1}\int_0^R\frac{1}{r^{N-2}}\,r^{N-1}\,dr<\infty \tag{7.3.42}
\]
(using, for example, (18.3.1)) so $u\in L^1_{loc}(\mathbb{R}^N)$ and in particular $u\in D'(\mathbb{R}^N)$.

It is natural here to use spherical coordinates in $\mathbb{R}^N$; see Section 18.3 for a review. In particular the expression for the Laplacian in spherical coordinates may be derived from the chain rule, as was done in (2.3.67) for the two dimensional case. When applied to a function depending only on $r=|x|$, such as $u$, the result is
\[
\Delta u=u_{rr}+\frac{N-1}{r}u_r \tag{7.3.43}
\]
(see Exercise 17 of Chapter 2) and it follows that $\Delta u(x)=0$ for $x\ne0$.

We may use Green's identity (18.2.6) to obtain, for any $\phi\in D(\mathbb{R}^N)$,
\begin{align}
\Delta u(\phi)=u(\Delta\phi)&=\int_{\mathbb{R}^N}u(x)\Delta\phi(x)\,dx=\lim_{\epsilon\to0^+}\int_{|x|>\epsilon}u(x)\Delta\phi(x)\,dx \tag{7.3.44}\\
&=\lim_{\epsilon\to0^+}\Big[\int_{|x|>\epsilon}\Delta u(x)\phi(x)\,dx+\int_{|x|=\epsilon}\Big(u(x)\frac{\partial\phi}{\partial n}(x)-\phi(x)\frac{\partial u}{\partial n}(x)\Big)dS(x)\Big] \tag{7.3.45}
\end{align}
Since $\Delta u=0$ for $x\ne0$ and $\frac{\partial}{\partial n}=-\frac{\partial}{\partial r}$ on $\{x:|x|=\epsilon\}$, this simplifies to
\[
\Delta u(\phi)=\lim_{\epsilon\to0^+}\int_{|x|=\epsilon}\Big(\frac{2-N}{\epsilon^{N-1}}\phi(x)-\frac{1}{\epsilon^{N-2}}\frac{\partial\phi}{\partial r}(x)\Big)dS(x) \tag{7.3.46}
\]
We next observe that
\[
\lim_{\epsilon\to0^+}\int_{|x|=\epsilon}\frac{2-N}{\epsilon^{N-1}}\phi(x)\,dS(x)=(2-N)\Omega_{N-1}\phi(0) \tag{7.3.47}
\]
since the average of $\phi$ over the sphere of radius $\epsilon$ converges to $\phi(0)$ as $\epsilon\to0$. Finally, the second integral tends to zero, since
\[
\Big|\int_{|x|=\epsilon}\frac{1}{\epsilon^{N-2}}\frac{\partial\phi}{\partial r}(x)\,dS(x)\Big|\le\frac{\Omega_{N-1}\epsilon^{N-1}}{\epsilon^{N-2}}\|\nabla\phi\|_{L^\infty}\to0 \tag{7.3.48}
\]
Thus (7.3.41) holds. When $N=2$ an analogous calculation shows that if $u(x)=\log|x|$ then $\Delta u=2\pi\delta$ in $D'(\mathbb{R}^2)$.

7.4 Convolution and distributions

If $f,g$ are locally integrable functions on $\mathbb{R}^N$ the classical convolution of $f$ and $g$ is defined to be
\[
(f*g)(x)=\int_{\mathbb{R}^N}f(x-y)g(y)\,dy \tag{7.4.1}
\]
whenever the integral is defined. By an obvious change of variable we see that convolution is commutative, $f*g=g*f$.

Proposition 7.3. If $f\in L^p(\mathbb{R}^N)$ and $g\in L^q(\mathbb{R}^N)$ then $f*g\in L^r(\mathbb{R}^N)$ if $1+\frac1r=\frac1p+\frac1q$, so in particular $f*g$ is defined almost everywhere. Furthermore
\[
\|f*g\|_{L^r(\mathbb{R}^N)}\le\|f\|_{L^p(\mathbb{R}^N)}\|g\|_{L^q(\mathbb{R}^N)} \tag{7.4.2}
\]

The inequality (7.4.2) is Young's convolution inequality, and we refer to [37] (Theorem 9.2) for a proof. In the case $r=\infty$ it can actually be shown that $f*g\in C(\mathbb{R}^N)$.

Our goal here is to generalize the definition of convolution in such a way that at least one of the two factors can be a distribution. Let us introduce the notations for translation and inversion of a function $f$,
\[
(\tau_hf)(x)=f(x-h) \tag{7.4.3}
\]
\[
\check f(x)=f(-x) \tag{7.4.4}
\]
so that $f(x-y)=(\tau_x\check f)(y)$. If $f\in D(\mathbb{R}^N)$ then so is $\tau_x\check f$, so that $(f*g)(x)$ may be regarded as $T_g(\tau_x\check f)$, i.e. the value obtained when the distribution corresponding to the locally integrable function $g$ acts on the test function $\tau_x\check f$. This motivates the following definition.

Definition 7.7. If $T\in D'(\mathbb{R}^N)$ and $\phi\in D(\mathbb{R}^N)$ then $(T*\phi)(x)=T(\tau_x\check\phi)$.

By this definition $(T*\phi)(x)$ exists and is finite for every $x\in\mathbb{R}^N$, but other smoothness or decay properties of $T*\phi$ may not be apparent.

Example 7.14. If $T=\delta$ then
\[
(T*\phi)(x)=\delta(\tau_x\check\phi)=(\tau_x\check\phi)(y)\big|_{y=0}=\phi(x-y)\big|_{y=0}=\phi(x) \tag{7.4.5}
\]
Thus, $\delta$ is the 'convolution identity', $\delta*\phi=\phi$, at least for $\phi\in D(\mathbb{R}^N)$. Formally this corresponds to the widely used formula
\[
\int_{\mathbb{R}^N}\delta(x-y)\phi(y)\,dy=\phi(x) \tag{7.4.6}
\]
If $T_n\to\delta$ in $D'(\mathbb{R}^N)$ then likewise
\[
(T_n*\phi)(x)=T_n(\tau_x\check\phi)\to\delta(\tau_x\check\phi)=\phi(x) \tag{7.4.7}
\]
for any fixed $x\in\mathbb{R}^N$.

A key property of convolution is that in computing a derivative $D^\alpha(T*\phi)$, the derivative may be applied to either factor in the convolution. More precisely we have the following theorem.

Theorem 7.3. If $T\in D'(\mathbb{R}^N)$ and $\phi\in D(\mathbb{R}^N)$ then $T*\phi\in C^\infty(\mathbb{R}^N)$ and for any multi-index $\alpha$
\[
D^\alpha(T*\phi)=D^\alpha T*\phi=T*D^\alpha\phi \tag{7.4.8}
\]

Proof: First observe that
\[
(-1)^{|\alpha|}D^\alpha(\tau_x\check\phi)=\tau_x((D^\alpha\phi)^\vee) \tag{7.4.9}
\]
and applying $T$ to these identical test functions we get the right hand equality in (7.4.8). We refer to Theorem 6.30 of [30] for the proof of the left hand equality. $\Box$

When $f,g$ are continuous functions of compact support it is elementary to see that $\operatorname{supp}(f*g)\subset\operatorname{supp}f+\operatorname{supp}g$. The same property holds for $T*\phi$ if $T\in D'(\mathbb{R}^N)$ and $\phi\in D(\mathbb{R}^N)$, once a proper definition of the support of a distribution is given.

If $\omega\subset\Omega$ is an open set we say that $T=0$ in $\omega$ if $T(\phi)=0$ whenever $\phi\in D(\Omega)$ and $\operatorname{supp}(\phi)\subset\omega$. If $W$ denotes the largest open subset of $\Omega$ on which $T=0$ (equivalently the union of all open subsets of $\Omega$ on which $T=0$) then the support of $T$ is the complement of $W$ in $\Omega$. In other words, $x\not\in\operatorname{supp}T$ if there exists $\epsilon>0$ such that $T(\phi)=0$ whenever $\phi$ is a test function with support in $B(x,\epsilon)$. One can easily verify that the support of a distribution is closed, and agrees with the usual notion of support of a function, up to sets of measure zero. The set of distributions of compact support in $\Omega$ forms a vector subspace of $D'(\Omega)$ denoted $\mathcal{E}'(\Omega)$. This notation is appropriate because $\mathcal{E}'(\Omega)$ turns out to be precisely the dual space of $C^\infty(\mathbb{R}^N)=:\mathcal{E}(\mathbb{R}^N)$ when a suitable definition of convergence is given; see for example Chapter II, section 5 of [31].

If now $T\in\mathcal{E}'(\mathbb{R}^N)$ and $\phi\in D(\mathbb{R}^N)$, we observe that
\[
\operatorname{supp}(\tau_x\check\phi)=x-\operatorname{supp}\phi \tag{7.4.10}
\]
Thus
\[
(T*\phi)(x)=T(\tau_x\check\phi)=0 \tag{7.4.11}
\]
unless there is a nonempty intersection of $\operatorname{supp}T$ and $x-\operatorname{supp}\phi$, in other words, unless $x\in\operatorname{supp}T+\operatorname{supp}\phi$. Thus from these remarks and Theorem 7.3 we have

Proposition 7.4. If $T\in\mathcal{E}'(\mathbb{R}^N)$ and $\phi\in D(\mathbb{R}^N)$ then
\[
\operatorname{supp}(T*\phi)\subset\operatorname{supp}T+\operatorname{supp}\phi \tag{7.4.12}
\]
and in particular $T*\phi\in D(\mathbb{R}^N)$.

Convolution provides an extremely useful and convenient way to approximate functions and distributions by very smooth functions, the exact sense in which the approximation takes place being dependent on the object being approximated. We will discuss several results of this type.

Theorem 7.4. Let $f\in C(\mathbb{R}^N)$ with $\operatorname{supp}f$ compact in $\mathbb{R}^N$. Pick $\psi\in D(\mathbb{R}^N)$ with $\int_{\mathbb{R}^N}\psi(x)\,dx=1$, set $\psi_n(x)=n^N\psi(nx)$ and $f_n=f*\psi_n$. Then $f_n\in D(\mathbb{R}^N)$ and $f_n\to f$ uniformly on $\mathbb{R}^N$.

Proof: The fact that $f_n\in D(\mathbb{R}^N)$ is immediate from Proposition 7.4. Fix $\epsilon>0$. By the assumption that $f$ is continuous and of compact support it must be uniformly continuous on $\mathbb{R}^N$, so there exists $\delta>0$ such that $|f(x)-f(z)|<\epsilon$ if $|x-z|<\delta$. Now choose $n_0$ such that $\operatorname{supp}\psi_n\subset B(0,\delta)$ for $n>n_0$. We then have, for $n>n_0$, that
\begin{align}
|f_n(x)-f(x)|&=\Big|\int_{\mathbb{R}^N}(f(x-y)-f(x))\psi_n(y)\,dy\Big| \tag{7.4.13}\\
&\le\int_{|y|<\delta}|f(x-y)-f(x)||\psi_n(y)|\,dy\le\epsilon\|\psi\|_{L^1(\mathbb{R}^N)} \tag{7.4.14}
\end{align}
and the conclusion follows. $\Box$

If $f$ is not assumed continuous then of course it is not possible for there to exist $f_n\in D(\mathbb{R}^N)$ converging uniformly to $f$. However the following can be shown.

Theorem 7.5. Let $f\in L^p(\mathbb{R}^N)$, $1\le p<\infty$. Pick $\psi\in D(\mathbb{R}^N)$ with $\int_{\mathbb{R}^N}\psi(x)\,dx=1$, set $\psi_n(x)=n^N\psi(nx)$ and $f_n=f*\psi_n$. Then $f_n\in C^\infty(\mathbb{R}^N)\cap L^p(\mathbb{R}^N)$ and $f_n\to f$ in $L^p(\mathbb{R}^N)$.

Proof: If $\epsilon>0$ we can find $g\in C(\mathbb{R}^N)$ of compact support such that $\|f-g\|_{L^p(\mathbb{R}^N)}<\epsilon$. If $g_n=g*\psi_n$ then
\begin{align}
\|f-f_n\|_{L^p(\mathbb{R}^N)}&\le\|f-g\|_{L^p(\mathbb{R}^N)}+\|g-g_n\|_{L^p(\mathbb{R}^N)}+\|f_n-g_n\|_{L^p(\mathbb{R}^N)} \tag{7.4.15}\\
&\le C\|f-g\|_{L^p(\mathbb{R}^N)}+\|g-g_n\|_{L^p(\mathbb{R}^N)} \tag{7.4.16}
\end{align}
where we have used Young's convolution inequality (7.4.2) to obtain
\[
\|f_n-g_n\|_{L^p(\mathbb{R}^N)}\le\|\psi_n\|_{L^1(\mathbb{R}^N)}\|f-g\|_{L^p(\mathbb{R}^N)}=\|\psi\|_{L^1(\mathbb{R}^N)}\|f-g\|_{L^p(\mathbb{R}^N)} \tag{7.4.17}
\]
Since $g_n\to g$ uniformly by Theorem 7.4 and $g-g_n$ has support in a fixed compact set independent of $n$, it follows that $\|g-g_n\|_{L^p(\mathbb{R}^N)}\to0$, and so $\limsup_{n\to\infty}\|f-f_n\|_{L^p(\mathbb{R}^N)}\le C\epsilon$. $\Box$

Further refinements and variants of these results can be proved; see for example Section C.4 of [10].
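
To see Theorems 7.4 and 7.5 in action, here is a small discretized sketch (my own illustration; the grid, the bump $\psi$, and the choice $p=2$ are all arbitrary): convolving the discontinuous function $f=\chi_{[0,1]}$ with $\psi_n(x)=n\psi(nx)$ produces smooth functions whose $L^2$ distance to $f$ shrinks as $n$ grows.

```python
# Discrete sketch of mollification: f = chi_[0,1] convolved with psi_n(x) = n psi(n x),
# psi a normalized bump, approximates f in L^2 while being smooth.
import numpy as np

x = np.linspace(-2.5, 2.5, 5001)      # symmetric grid so that x = 0 is the center sample
dx = x[1] - x[0]
f = ((x >= 0) & (x <= 1)).astype(float)

def bump(t):
    out = np.zeros_like(t)
    m = np.abs(t) < 1
    out[m] = np.exp(-1.0 / (1.0 - t[m] ** 2))
    return out

norm = bump(x).sum() * dx             # normalize so that psi integrates to 1

for n in [2, 8, 32]:
    psi_n = n * bump(n * x) / norm
    f_n = np.convolve(f, psi_n, mode="same") * dx      # discrete approximation of f * psi_n
    err = np.sqrt(((f_n - f) ** 2).sum() * dx)
    print(f"n = {n:3d}   approximate ||f_n - f||_L2 = {err:.4f}")
# The error decreases with n, consistent with f_n -> f in L^p (here p = 2),
# while each f_n is smooth because psi_n is.
```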

Next consider the even more general case that $T\in D'(\mathbb{R}^N)$. As in Proposition 7.1 we can choose $\psi_n\in D(\mathbb{R}^N)$ such that $\psi_n\to\delta$ in $D'(\mathbb{R}^N)$. Set $T_n=T*\psi_n$, so that $T_n\in C^\infty(\mathbb{R}^N)$. If $\phi\in D(\mathbb{R}^N)$ we then have
\begin{align}
T_n(\phi)&=(T_n*\check\phi)(0)=((T*\psi_n)*\check\phi)(0) \tag{7.4.18}\\
&=(T*(\psi_n*\check\phi))(0)=T((\psi_n*\check\phi)^\vee) \tag{7.4.19}
\end{align}
It may be checked that $(\psi_n*\check\phi)^\vee\to\phi$ in $D(\mathbb{R}^N)$, thus $T_n(\phi)\to T(\phi)$ for all $\phi\in D(\mathbb{R}^N)$, that is, $T_n\to T$ in $D'(\mathbb{R}^N)$.

In the above derivation we used associativity of convolution. This property is not completely obvious, and in fact is false in a more general setting in which convolution of two distributions is defined. For example, if we were to assume that convolution of distributions was always defined and that Theorem 7.3 holds, we would have $1*(\delta'*H)=1*H'=1*\delta=1$, but $(1*\delta')*H=0*H=0$. Nevertheless, associativity is correct in the case we have just used it, and we refer to [30] Theorem 6.30(c) for the proof.

The pattern of the results just stated is that $T*\psi_n$ converges to $T$ in the topology appropriate to the space that $T$ itself belongs to, but this cannot be true in all situations which may be encountered. For example it cannot be true that if $f\in L^\infty$ then $f*\psi_n$ converges to $f$ in $L^\infty$, since this would amount to uniform convergence of a sequence of continuous functions, which is impossible if $f$ itself is not continuous.

7.5 Exercises

1. Construct a test function $\phi\in C^\infty_0(\mathbb{R})$ with the following properties: $0\le\phi(x)\le1$ for all $x\in\mathbb{R}$, $\phi(x)\equiv1$ for $|x|<1$ and $\phi(x)\equiv0$ for $|x|>2$. (Suggestion: think about what $\phi'$ would have to look like.)

2. Show that
\[
T(\phi)=\sum_{n=1}^\infty\phi^{(n)}(n)
\]
defines a distribution $T\in D'(\mathbb{R})$.

3. If $\phi\in D(\mathbb{R})$ show that $\psi(x)=(\phi(x)-\phi(0))/x$ (this function appeared in Example 7.7) belongs to $C^\infty(\mathbb{R})$. (Suggestion: first prove $\psi(x)=\int_0^1\phi'(xt)\,dt$.)

4. Find the distributional derivative of $f(x)=[x]$, the greatest integer function.

5. Find the distributional derivatives up through order four of $f(x)=|x|\sin x$.

6. (For readers familiar with the concept of absolute continuity.) If $f$ is absolutely continuous on $(a,b)$ and $f'=g$ a.e., show that $f'=g$ in the sense of distributions on $(a,b)$.

7. Let $\lambda_n>0$, $\lambda_n\to+\infty$ and set
\[
f_n(x)=\sin\lambda_nx\qquad g_n(x)=\frac{\sin\lambda_nx}{\pi x}
\]
a) Show that $f_n\to0$ in $D'(\mathbb{R})$ as $n\to\infty$.
b) Show that $g_n\to\delta$ in $D'(\mathbb{R})$ as $n\to\infty$.
(You may use without proof the fact that the value of the improper integral $\int_{-\infty}^\infty\frac{\sin x}{x}\,dx$ is $\pi$.)

8. Let $\phi\in C^\infty_0(\mathbb{R})$ and $f\in L^1(\mathbb{R})$.
a) If $\psi_n(x)=n(\phi(x+\frac1n)-\phi(x))$, show that $\psi_n\to\phi'$ in $C^\infty_0(\mathbb{R})$. (Suggestion: use the mean value theorem over and over again.)
b) If $g_n(x)=n(f(x+\frac1n)-f(x))$, show that $g_n\to f'$ in $D'(\mathbb{R})$.

9. Let $T=\mathrm{pv}\,\frac1x$. Find a formula analogous to (7.3.35) for the distributional derivative of $T$.

10. Find $\lim_{n\to\infty}\sin^2nx$ in $D'(\mathbb{R})$, or show that it doesn't exist.

11. Define the distribution
\[
T(\phi)=\int_{-\infty}^\infty\phi(x,x)\,dx
\]
for $\phi\in C^\infty_0(\mathbb{R}^2)$. Show that $T$ satisfies the wave equation $u_{xx}-u_{yy}=0$ in the sense of distributions on $\mathbb{R}^2$. Discuss why it makes sense to regard $T$ as being $\delta(x-y)$.

12. Let $\Omega\subset\mathbb{R}^N$ be a bounded open set and $K\subset\subset\Omega$. Show that there exists $\phi\in C^\infty_0(\Omega)$ such that $0\le\phi(x)\le1$ and $\phi(x)\equiv1$ for $x\in K$. (Hint: approximate the characteristic function of $\Sigma$ by convolution, where $\Sigma$ satisfies $K\subset\subset\Sigma\subset\subset\Omega$. Use Proposition 7.4 for the needed support property.)

13. If $a\in C^\infty(\Omega)$ and $T\in D'(\Omega)$ prove the product rule
\[
\frac{\partial}{\partial x_j}(aT)=a\frac{\partial T}{\partial x_j}+\frac{\partial a}{\partial x_j}T
\]

14. Let $T\in D'(\mathbb{R}^N)$. We may then regard $\phi\longmapsto A\phi=T*\phi$ as a linear mapping from $C^\infty_0(\mathbb{R}^N)$ into $C^\infty(\mathbb{R}^N)$. Show that $A$ commutes with translations, that is, $\tau_hA\phi=A\tau_h\phi$ for any $\phi\in C^\infty_0(\mathbb{R}^N)$. (The following interesting converse statement can also be proved: if $A:C^\infty_0(\mathbb{R}^N)\longmapsto C(\mathbb{R}^N)$ is continuous and commutes with translations then there exists a unique $T\in D'(\mathbb{R}^N)$ such that $A\phi=T*\phi$. An operator commuting with translations is also said to be translation invariant.)

15. If $f\in L^1(\mathbb{R}^N)$, $\int_{\mathbb{R}^N}f(x)\,dx=1$, and $f_n(x)=n^Nf(nx)$, use Theorem 7.2 to show that $f_n\to\delta$ in $D'(\mathbb{R}^N)$.

16. Prove Theorem 7.1.

17. If $T\in D'(\Omega)$ prove the equality of mixed partial derivatives
\[
\frac{\partial^2T}{\partial x_i\partial x_j}=\frac{\partial^2T}{\partial x_j\partial x_i} \tag{7.5.1}
\]
in the sense of distributions, and discuss why there is no contradiction with known examples from calculus showing that the mixed partial derivatives need not be equal.

18. Show that the expression
\[
T(\phi)=\int_{-1}^1\frac{\phi(x)-\phi(0)}{|x|}\,dx+\int_{|x|>1}\frac{\phi(x)}{|x|}\,dx
\]
defines a distribution on $\mathbb{R}$. Show also that $xT=\operatorname{sgn}x$.

19. If $f$ is a function defined on $\mathbb{R}^N$ and $\lambda>0$, let $f_\lambda(x)=f(\lambda x)$. We say that $f$ is homogeneous of degree $\alpha$ if $f_\lambda=\lambda^\alpha f$ for any $\lambda>0$. If $T$ is a distribution on $\mathbb{R}^N$ we say that $T$ is homogeneous of degree $\alpha$ if
\[
T(\phi_\lambda)=\lambda^{-\alpha-N}T(\phi)\quad\text{for all }\lambda>0
\]
a) Show that these two definitions are consistent, i.e., if $T=T_f$ for some $f\in L^1_{loc}(\mathbb{R}^N)$ then $T$ is homogeneous of degree $\alpha$ if and only if $f$ is homogeneous of degree $\alpha$.
b) Show that the delta function is homogeneous of degree $-N$.

20. Show that $u(x)=\frac{1}{2\pi}\log|x|$ satisfies $\Delta u=\delta$ in $D'(\mathbb{R}^2)$.

21. Without appealing to Theorem 7.3, give a direct proof of the fact that $T*\phi$ is a continuous function of $x$, for $T\in D'(\mathbb{R}^N)$ and $\phi\in D(\mathbb{R}^N)$.

22. Let
\[
f(x)=\begin{cases}\log^2x & x>0\\ 0 & x<0\end{cases}
\]
Show that $f\in D'(\mathbb{R})$ and find the distributional derivative $f'$. Is $f$ a tempered distribution?

23. If $a\in C^\infty(\mathbb{R})$, show that
\[
a\delta'=a(0)\delta'-a'(0)\delta
\]

24. If $T\in D'(\mathbb{R}^N)$ has compact support, show that $T(\phi)$ is defined in an unambiguous way for any $\phi\in C^\infty(\mathbb{R}^N)=:\mathcal{E}(\mathbb{R}^N)$. (Suggestion: write $\phi=\psi\phi+(1-\psi)\phi$ where $\psi\in D(\mathbb{R}^N)$ satisfies $\psi\equiv1$ on the support of $T$.)

Chapter 8

Fourier analysis and distributions

In this chapter we present some of the elements of Fourier analysis, with special attention to those aspects arising in the theory of distributions. Fourier analysis is often viewed as made up of two parts, one being a collection of topics relating to Fourier series, and the second being those connected to the Fourier transform. The essential distinction is that the former focuses on periodic functions while the latter is concerned with functions defined on all of $\mathbb{R}^N$. In either case the central question is that of how we may represent fairly arbitrary functions, or even distributions, as combinations of particularly simple periodic functions.

We will begin with Fourier series, and restrict attention to the one dimensional case. See for example [25] for a treatment of multidimensional Fourier series.

8.1 Fourier series in one space dimension

The fundamental point is that if $u_n(x)=e^{inx}$ then the functions $\{u_n\}_{n=-\infty}^\infty$ make up an orthogonal basis of $L^2(-\pi,\pi)$. It will then follow from the general considerations of Chapter 6 that any $f\in L^2(-\pi,\pi)$ may be expressed as a linear combination
\[
f(x)=\sum_{n=-\infty}^\infty c_ne^{inx} \tag{8.1.1}
\]
where
\[
c_n=\frac{\langle f,u_n\rangle}{\langle u_n,u_n\rangle}=\frac{1}{2\pi}\int_{-\pi}^\pi f(y)e^{-iny}\,dy \tag{8.1.2}
\]
The right hand side of (8.1.1) is a Fourier series for $f$, and (8.1.2) is a formula for the $n$'th Fourier coefficient of $f$. It must be understood that the equality in (8.1.1) is meant only in the sense of $L^2$ convergence of the partial sums, and need not be true at any particular point. From the theory of Lebesgue integration it follows that there is a subsequence of the partial sums which converges almost everywhere on $(-\pi,\pi)$, but more than that we cannot say without further assumptions on $f$. Any finite sum $\sum_{n=-N}^N\gamma_ne^{inx}$ is called a trigonometric polynomial, so in particular we will be showing that trigonometric polynomials are dense in $L^2(-\pi,\pi)$.

Let us set
\[
e_n(x)=\frac{1}{\sqrt{2\pi}}e^{inx}\qquad n=0,\pm1,\pm2,\dots \tag{8.1.3}
\]
\[
D_n(x)=\frac{1}{2\pi}\sum_{k=-n}^ne^{ikx} \tag{8.1.4}
\]
\[
K_N(x)=\frac{1}{N+1}\sum_{n=0}^ND_n(x) \tag{8.1.5}
\]
It is immediate from checking the necessary integrals that $\{e_n\}_{n=-\infty}^\infty$ is an orthonormal set in $H=L^2(-\pi,\pi)$. The main goal for the rest of this section is to prove that $\{e_n\}_{n=-\infty}^\infty$ is actually an orthonormal basis of $H$.

For the rest of this section, the inner product symbol $\langle f,g\rangle$ and norm $\|\cdot\|$ refer to the inner product and norm in $H$ unless otherwise stated. In the context of Fourier analysis, $D_n$ and $K_N$ are known as the Dirichlet kernel and Fejér kernel respectively. Note that
\[
\int_{-\pi}^\pi D_n(x)\,dx=\int_{-\pi}^\pi K_N(x)\,dx=1 \tag{8.1.6}
\]
for any $n,N$.

If $f\in H$, let
\[
s_n(x)=\sum_{k=-n}^nc_ke^{ikx} \tag{8.1.7}
\]

where $c_k$ is given by (8.1.2), and
\[
\sigma_N(x)=\frac{1}{N+1}\sum_{n=0}^Ns_n(x) \tag{8.1.8}
\]
Since
\[
s_n(x)=\sum_{k=-n}^n\langle f,e_k\rangle e_k(x) \tag{8.1.9}
\]
it follows that the partial sum $s_n$ is also the projection of $f$ onto the span of $\{e_k\}_{k=-n}^n$, and so in particular the Bessel inequality
\[
\|s_n\|=\sqrt{\sum_{k=-n}^n|c_k|^2}\le\|f\| \tag{8.1.10}
\]
holds for all $n$. In particular, $\lim_{n\to\infty}\langle f,e_n\rangle=0$, which is the Riemann-Lebesgue lemma for the Fourier coefficients of $f\in H$.

Next observe that by substitution of (8.1.2) into (8.1.7) we obtain
\[
s_n(x)=\int_{-\pi}^\pi f(y)D_n(x-y)\,dy \tag{8.1.11}
\]
We can therefore regard $s_n$ as being given by the convolution $D_n*f$ if we let $f(x)=0$ outside of the interval $(-\pi,\pi)$. We can also express $D_n$ in an alternative and useful way:
\[
D_n(x)=\frac{1}{2\pi}e^{-inx}\sum_{k=0}^{2n}e^{ikx}=\frac{1}{2\pi}e^{-inx}\left(\frac{1-e^{(2n+1)ix}}{1-e^{ix}}\right) \tag{8.1.12}
\]
for $x\ne0$. Multiplying top and bottom of the fraction by $e^{-ix/2}$ then yields
\[
D_n(x)=\frac{1}{2\pi}\,\frac{\sin\,(n+\frac12)x}{\sin\frac{x}{2}}\qquad x\ne0 \tag{8.1.13}
\]
and obviously $D_n(0)=(2n+1)/2\pi$.

An alternative viewpoint of the convolutional relation (8.1.11), which is in some sense more natural, starts by defining the unit circle as $\mathbb{T}=\mathbb{R}\ \mathrm{mod}\ 2\pi\mathbb{Z}$, i.e. we identify any two points of $\mathbb{R}$ differing by an integer multiple of $2\pi$. Any $2\pi$ periodic function, such as $e_n$, $D_n$, $s_n$ etc., may be regarded as a function on $\mathbb{T}$, and if $f$ is originally given as a function on $(-\pi,\pi)$ then it may be extended in a $2\pi$ periodic manner to all of $\mathbb{R}$ and so also viewed as a function on the circle $\mathbb{T}$. With $f$, $D_n$ both $2\pi$ periodic, the integral (8.1.11) could be written as
\[
s_n(x)=\int_{\mathbb{T}}f(y)D_n(x-y)\,dy \tag{8.1.14}
\]
since (8.1.11) simply amounts to using one natural parametrization of the independent variable. By the same token
\[
s_n(x)=\int_a^{a+2\pi}f(y)D_n(x-y)\,dy \tag{8.1.15}
\]
for any convenient choice of $a$. A $2\pi$ periodic function is continuous on $\mathbb{T}$ if it is continuous on $[-\pi,\pi]$ and $f(\pi)=f(-\pi)$, and the space $C(\mathbb{T})$ may simply be regarded as
\[
C(\mathbb{T})=\{f\in C([-\pi,\pi]):f(\pi)=f(-\pi)\} \tag{8.1.16}
\]
a closed subspace of $C([-\pi,\pi])$, so it is itself a Banach space with the maximum norm. Likewise we can define
\[
C^m(\mathbb{T})=\{f\in C^m([-\pi,\pi]):f^{(j)}(\pi)=f^{(j)}(-\pi),\ j=0,1,\dots m\} \tag{8.1.17}
\]
a Banach space with the analogous norm.

Next let us make some corresponding observations about $K_N$.

Proposition 8.1. There holds
\[
\sigma_N(x)=\int_{\mathbb{T}}K_N(x-y)f(y)\,dy \tag{8.1.18}
\]
and
\[
K_N(x)=\sum_{k=-N}^N\Big(1-\frac{|k|}{N+1}\Big)e^{ikx}=\frac{1}{2\pi(N+1)}\left(\frac{\sin\big(\frac{(N+1)x}{2}\big)}{\sin\big(\frac{x}{2}\big)}\right)^2\qquad x\ne0 \tag{8.1.19}
\]

Proof: The identity (8.1.18) is immediate from (8.1.14) and the definition of $K_N$, and the first identity in (8.1.19) is left as an exercise. To complete the proof we observe that
\begin{align}
2\pi\sum_{n=0}^ND_n(x)&=\frac{\sum_{n=0}^N\sin\,(n+\frac12)x}{\sin\frac{x}{2}} \tag{8.1.20}\\
&=\frac{\mathrm{Im}\big(e^{ix/2}\sum_{n=0}^Ne^{inx}\big)}{\sin\frac{x}{2}} \tag{8.1.21}\\
&=\frac{\mathrm{Im}\Big(e^{ix/2}\Big(\frac{1-e^{i(N+1)x}}{1-e^{ix}}\Big)\Big)}{\sin\frac{x}{2}} \tag{8.1.22}\\
&=\frac{\mathrm{Im}\Big(\frac{1-\cos\,(N+1)x-i\sin\,(N+1)x}{-2i\sin\frac{x}{2}}\Big)}{\sin\frac{x}{2}} \tag{8.1.23}\\
&=\frac{1-\cos\,(N+1)x}{2\sin^2\frac{x}{2}} \tag{8.1.24}\\
&=\left(\frac{\sin\frac{(N+1)x}{2}}{\sin\frac{x}{2}}\right)^2 \tag{8.1.25}
\end{align}
and the conclusion follows upon dividing by $2\pi(N+1)$. $\Box$

Theorem 8.1. Suppose that $f\in C(\mathbb{T})$. Then $\sigma_N\to f$ in $C(\mathbb{T})$.

Proof: Since $K_N\ge0$ and $\int_{\mathbb{T}}K_N(x-y)\,dy=1$ for any $x$, we have
\[
|\sigma_N(x)-f(x)|=\Big|\int_{\mathbb{T}}K_N(x-y)(f(y)-f(x))\,dy\Big|\le\int_{x-\pi}^{x+\pi}K_N(x-y)|f(y)-f(x)|\,dy \tag{8.1.26}
\]
If $\epsilon>0$ is given, then since $f$ must be uniformly continuous on $\mathbb{T}$, there exists $\delta>0$ such that $|f(x)-f(y)|<\epsilon$ if $|x-y|<\delta$. Thus
\begin{align}
|\sigma_N(x)-f(x)|&\le\epsilon\int_{|x-y|<\delta}K_N(x-y)\,dy+2\|f\|_\infty\int_{\delta<|x-y|<\pi}K_N(x-y)\,dy \tag{8.1.28}\\
&\le\epsilon+\frac{2\|f\|_\infty}{(N+1)\sin^2\big(\frac\delta2\big)} \tag{8.1.29}
\end{align}
Thus there exists $N_0$ such that for $N\ge N_0$, $|\sigma_N(x)-f(x)|<2\epsilon$ for all $x$, that is, $\sigma_N\to f$ uniformly. $\Box$

Corollary 8.1. The functions $\{e_n(x)\}_{n=-\infty}^\infty$ form an orthonormal basis of $H=L^2(-\pi,\pi)$.

Proof: We have already observed that these functions form an orthonormal set, so it remains only to verify one of the equivalent conditions stated in Theorem 6.4. We will show the closedness property, i.e. that the set of finite linear combinations of $\{e_n(x)\}_{n=-\infty}^\infty$ is dense in $H$. Given $g\in H$ and $\epsilon>0$ we may find $f\in C(\mathbb{T})$ such that $\|f-g\|<\epsilon$, $f\in D(-\pi,\pi)$ for example. Then choose $N$ such that $\|\sigma_N-f\|_{C(\mathbb{T})}<\epsilon$, which implies $\|\sigma_N-f\|<\sqrt{2\pi}\,\epsilon$. Thus $\sigma_N$ is a finite linear combination of the $e_n$'s and
\[
\|g-\sigma_N\|<(1+\sqrt{2\pi})\epsilon \tag{8.1.30}
\]
Since $\epsilon$ is arbitrary, the conclusion follows. $\Box$

Corollary 8.2. For any $f\in H=L^2(-\pi,\pi)$, if
\[
s_n(x)=\sum_{k=-n}^nc_ke^{ikx} \tag{8.1.31}
\]
where
\[
c_k=\frac{1}{2\pi}\int_{-\pi}^\pi f(x)e^{-ikx}\,dx \tag{8.1.32}
\]
then $s_n\to f$ in $H$.

For $f\in H$, we will often write
\[
f(x)=\sum_{n=-\infty}^\infty c_ne^{inx} \tag{8.1.33}
\]
but we emphasize that without further assumptions this only means that the partial sums converge in $L^2(-\pi,\pi)$.

At this point we have looked at the convergence properties of two different sequences of trigonometric polynomials, $s_n$ and $\sigma_N$, associated with $f$. While $s_n$ is simply the $n$'th partial sum of the Fourier series of $f$, the $\sigma_N$'s are the so-called Fejér means of $f$. While each Fejér mean is a trigonometric polynomial, the sequence $\sigma_N$ does not amount to the partial sums of some other Fourier series, since the $n$'th coefficient would also have to depend on $N$. For $f\in H$, we have that $s_N\to f$ in $H$, and so the same is obviously true under the stronger assumption that $f\in C(\mathbb{T})$. On the other hand, for $f\in C(\mathbb{T})$ we have shown that $\sigma_N\to f$ uniformly, but it need not be true that $s_N\to f$ uniformly, or even pointwise (example of P. du Bois-Reymond, see Section 1.6.1 of [25]). For $f\in H$ it can be shown that $\sigma_N\to f$ in $H$, but on the other hand the best $L^2$ approximation property of $s_N$ implies that
\[
\|s_N-f\|\le\|\sigma_N-f\| \tag{8.1.34}
\]
since both $s_N$ and $\sigma_N$ are in the span of $\{e_k\}_{k=-N}^N$. That is to say, the rate of convergence of $s_N$ to $f$ is faster, in the $L^2$ sense at least, than that of $\sigma_N$. In summary, both $s_N$ and $\sigma_N$ provide a trigonometric polynomial approximating $f$, but each has some advantage over the other, depending on what is to be assumed about $f$.
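
The trade-off between $s_N$ and $\sigma_N$ is easy to see numerically. The following sketch is my own illustration (the square wave $f(x)=\operatorname{sgn}x$ on $(-\pi,\pi)$ is an arbitrary choice of a discontinuous function): the Fejér means stay between $-1$ and $1$ and avoid the Gibbs overshoot near the jump, while the partial sums have the smaller $L^2$ error, as (8.1.34) requires.

```python
# Partial sums s_N versus Fejer means sigma_N for f(x) = sgn(x) on (-pi, pi).
import numpy as np

x = np.linspace(-np.pi, np.pi, 4001)
f = np.sign(x)
dx = x[1] - x[0]

def partial_sum(N):
    # Fourier series of sgn(x): sum over odd k of (4 / (pi k)) sin(k x)
    s = np.zeros_like(x)
    for k in range(1, N + 1, 2):
        s += 4.0 / (np.pi * k) * np.sin(k * x)
    return s

def fejer_mean(N):
    # sigma_N = average of s_0, ..., s_N
    return np.mean([partial_sum(n) for n in range(N + 1)], axis=0)

l2 = lambda g: np.sqrt(np.sum((g - f) ** 2) * dx)

for N in [9, 33, 129]:
    sN, sigN = partial_sum(N), fejer_mean(N)
    print(f"N = {N:4d}  max s_N = {sN.max():.3f}  max sigma_N = {sigN.max():.3f}"
          f"  ||s_N - f|| = {l2(sN):.3f}  ||sigma_N - f|| = {l2(sigN):.3f}")
# Typically max s_N stays near 1.18 (the Gibbs overshoot), max sigma_N stays <= 1,
# and ||s_N - f|| < ||sigma_N - f||, as described above.
```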

8.2 Alternative forms of Fourier series

From the basic Fourier series (8.1.1) a number of other closely related and useful expressions can be immediately derived. First suppose that $f\in L^2(-L,L)$ for some $L>0$. If we let $\tilde f(x)=f(Lx/\pi)$ then $\tilde f\in L^2(-\pi,\pi)$, so
\[
\tilde f(x)=\sum_{n=-\infty}^\infty c_ne^{inx}\qquad c_n=\frac{1}{2\pi}\int_{-\pi}^\pi\tilde f(y)e^{-iny}\,dy \tag{8.2.1}
\]
or equivalently
\[
f(x)=\sum_{n=-\infty}^\infty c_ne^{i\pi nx/L}\qquad c_n=\frac{1}{2L}\int_{-L}^Lf(y)e^{-i\pi ny/L}\,dy \tag{8.2.2}
\]
Likewise (8.2.2) holds if we just regard $f$ as being $2L$ periodic and in $L^2$, and in the formula for $c_n$ we could replace $(-L,L)$ by any other interval of length $2L$. The functions $e^{i\pi nx/L}/\sqrt{2L}$ make up an orthonormal basis of $L^2(a,b)$ if $b-a=2L$.

Next observe that we can write
\[
f(x)=\sum_{n=-\infty}^\infty c_n\Big(\cos\frac{n\pi x}{L}+i\sin\frac{n\pi x}{L}\Big)=c_0+\sum_{n=1}^\infty\Big[(c_n+c_{-n})\cos\frac{n\pi x}{L}+i(c_n-c_{-n})\sin\frac{n\pi x}{L}\Big] \tag{8.2.3}
\]
If we let
\[
a_n=c_n+c_{-n}\qquad b_n=i(c_n-c_{-n})\qquad n=0,1,2,\dots \tag{8.2.4}
\]
then we obtain the equivalent formulas
\[
f(x)=\frac{a_0}{2}+\sum_{n=1}^\infty a_n\cos\frac{n\pi x}{L}+b_n\sin\frac{n\pi x}{L} \tag{8.2.5}
\]
where
\[
a_n=\frac1L\int_{-L}^Lf(y)\cos\frac{n\pi y}{L}\,dy\quad n=0,1,\dots\qquad b_n=\frac1L\int_{-L}^Lf(y)\sin\frac{n\pi y}{L}\,dy\quad n=1,2,\dots \tag{8.2.6}
\]
We refer to (8.2.5),(8.2.6) as the 'real form' of the Fourier series, which is natural to use, for example, if $f$ is real valued, since then no complex quantities appear. Again the precise meaning of (8.2.5) is that $s_n\to f$ in $H=L^2(-L,L)$ or any other interval of length $2L$, where now
\[
s_n(x)=\frac{a_0}{2}+\sum_{k=1}^na_k\cos\frac{k\pi x}{L}+b_k\sin\frac{k\pi x}{L} \tag{8.2.7}
\]
with results analogous to those mentioned above for the Fejér means also being valid. It may be easily checked that the set of functions
\[
\left\{\frac{1}{\sqrt{2L}},\ \frac{\cos\frac{n\pi x}{L}}{\sqrt{L}},\ \frac{\sin\frac{n\pi x}{L}}{\sqrt{L}}\right\}_{n=1}^\infty \tag{8.2.8}
\]
make up an orthonormal basis of $L^2(-L,L)$.

Another important variant is obtained as follows. If $f\in L^2(0,L)$ then we may define the associated even and odd extensions of $f$ in $L^2(-L,L)$, namely
\[
f_e(x)=\begin{cases}f(x) & 0<x<L\\ f(-x) & -L<x<0\end{cases}\qquad
f_o(x)=\begin{cases}f(x) & 0<x<L\\ -f(-x) & -L<x<0\end{cases} \tag{8.2.9}
\]
If we replace $f$ by $f_e$ in (8.2.5),(8.2.6), then we obtain immediately that $b_n=0$ and a resulting cosine series representation for $f$,
\[
f(x)=\frac{a_0}{2}+\sum_{n=1}^\infty a_n\cos\frac{n\pi x}{L}\qquad a_n=\frac2L\int_0^Lf(y)\cos\frac{n\pi y}{L}\,dy\quad n=0,1,\dots \tag{8.2.10}
\]
Likewise replacing $f$ by $f_o$ gives us a corresponding sine series,
\[
f(x)=\sum_{n=1}^\infty b_n\sin\frac{n\pi x}{L}\qquad b_n=\frac2L\int_0^Lf(y)\sin\frac{n\pi y}{L}\,dy\quad n=1,2,\dots \tag{8.2.11}
\]
Note that if the $2L$ periodic extension of $f$ is continuous, then the same is true of the $2L$ periodic extension of $f_e$, but this need not be true in the case of $f_o$. Thus we might expect that the cosine series of $f$ typically has better convergence properties than the sine series.

8.3 More about convergence of Fourier series

If $f\in L^2(-\pi,\pi)$ it was already observed that, since the partial sums $s_n$ converge to $f$ in $L^2(-\pi,\pi)$, some subsequence of the partial sums converges pointwise a.e. In fact it is a famous theorem of Carleson ([6]) that $s_n\to f$ (i.e. the entire sequence, not just a subsequence) pointwise a.e. This is a complicated proof and even now is not to be found even in advanced textbooks. No better result could be expected, since $f$ itself is only defined up to sets of measure zero.

If we were to assume the stronger condition that $f\in C(\mathbb{T})$ then it might be natural to conjecture that $s_n\to f$ for every $x$ (recall we know $\sigma_N\to f$ uniformly in this case), but that turns out to be false, as mentioned above: in fact there exist continuous functions for which $s_n(x)$ is divergent at infinitely many $x\in\mathbb{T}$, see Section 5.11 of [29].

A sufficient condition implying that $s_n(x)\to f(x)$ for every $x\in\mathbb{T}$ is that $f$ be piecewise continuously differentiable on $\mathbb{T}$. In fact the following more precise theorem can be proved.

Theorem 8.2. Assume that there exist points $-\pi\le x_0<x_1<\dots<x_M=\pi$ such that $f\in C^1([x_j,x_{j+1}])$ for $j=0,1,\dots M-1$. Let
\[
f(x)=\begin{cases}\frac12\big(\lim_{y\to x^+}f(y)+\lim_{y\to x^-}f(y)\big) & -\pi<x<\pi\\[4pt]
\frac12\big(\lim_{y\to-\pi^+}f(y)+\lim_{y\to\pi^-}f(y)\big) & x=\pm\pi\end{cases} \tag{8.3.1}
\]
Then $\lim_{n\to\infty}s_n(x)=f(x)$ for $-\pi\le x\le\pi$.

Under the stated assumptions on $f$, the theorem states in particular that $s_n$ converges to $f$ at every point of continuity of $f$ (with appropriate modification at the endpoints), and otherwise converges to the average of the left and right hand limits. The proof is somewhat similar to that of Theorem 8.1; steps in the proof are outlined in the exercises.

So far we have discussed the convergence properties of the Fourier series based on assumptions about $f$, but another point of view we could take is to focus on how convergence properties are influenced by the behavior of the Fourier coefficients $c_n$. A first simple result of this type is:

Proposition 8.2. If $f\in H=L^2(-\pi,\pi)$ and its Fourier coefficients satisfy
\[
\sum_{n=-\infty}^\infty|c_n|<\infty \tag{8.3.2}
\]
then $f\in C(\mathbb{T})$ and $s_n\to f$ uniformly on $\mathbb{T}$.

Proof: By the Weierstrass M-test, the series $\sum_{n=-\infty}^\infty c_ne^{inx}$ is uniformly convergent on $\mathbb{R}$ to some limit $g$, and since each partial sum is continuous, the same must be true of $g$. Since uniform convergence implies $L^2$ convergence on any finite interval, we have $s_n\to g$ in $H$, but also $s_n\to f$ in $H$ by Corollary 8.2. By uniqueness of the limit $f=g$ and the conclusion follows. $\Box$

We say that $f$ has an absolutely convergent Fourier series when (8.3.2) holds. We emphasize here that the conclusion $f=g$ is meant in the sense of $L^2$, i.e. $f(x)=g(x)$ a.e., so by saying that $f$ is continuous, we are really saying that the equivalence class of $f$ contains a continuous function, namely $g$.

It is not the case that every continuous function has an absolutely convergent Fourier series, according to remarks made earlier in this section. It would therefore be of interest to find other conditions on $f$ which guarantee that (8.3.2) holds. One such condition follows from the following, which is also of independent interest.

Proposition 8.3. If $f\in C^m(\mathbb{T})$, then $\lim_{n\to\pm\infty}n^mc_n=0$.

Proof: We integrate by parts in (8.1.2) to get, for $n\ne0$,
\[
c_n=\frac{1}{2\pi}\left[\frac{f(y)e^{-iny}}{-in}\Big|_{-\pi}^\pi+\frac{1}{in}\int_{-\pi}^\pi f'(y)e^{-iny}\,dy\right]=\frac{1}{2\pi in}\int_{-\pi}^\pi f'(y)e^{-iny}\,dy \tag{8.3.3}
\]
if $f\in C^1(\mathbb{T})$. Since $f'\in L^2(\mathbb{T})$, the Riemann-Lebesgue lemma implies that $nc_n\to0$ as $n\to\pm\infty$. If $f\in C^2(\mathbb{T})$ we could integrate by parts again to get $n^2c_n\to0$, etc. $\Box$

It is immediate from this result that if $f\in C^2(\mathbb{T})$ then it has an absolutely convergent Fourier series, but in fact even $f\in C^1(\mathbb{T})$ is more than enough; see Exercise 6.

One way to regard Proposition 8.3 is that it says that the smoother $f$ is, the more rapidly its Fourier coefficients must decay. The next result is a sort of converse statement.

Proposition 8.4. If $f\in H=L^2(-\pi,\pi)$ and its Fourier coefficients satisfy
\[
|n^{m+\alpha}c_n|\le C \tag{8.3.4}
\]
for some $C$ and $\alpha>1$, then $f\in C^m(\mathbb{T})$.

Proof: When $m=0$ this is just a special case of Proposition 8.2. When $m=1$ we see that it is permissible to differentiate the series (8.1.1) term by term, since the differentiated series
\[
\sum_{n=-\infty}^\infty inc_ne^{inx} \tag{8.3.5}
\]
is uniformly convergent, by the assumption (8.3.4). Thus $f,f'$ are both a.e. equal to an absolutely convergent Fourier series, so $f\in C^1(\mathbb{T})$, by Proposition 8.2. The proof for $m=2,3,\dots$ is similar. $\Box$

Note that Proposition 8.3 states a necessary condition on the Fourier coefficients for $f$ to be in $C^m$ and Proposition 8.4 states a sufficient condition. The two conditions are not identical, but both point to the general tendency that increased smoothness of $f$ is associated with more rapid decay of the corresponding Fourier coefficients.
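
The tendency described by Propositions 8.3 and 8.4 is easy to observe numerically. In the sketch below (my own illustration; the two sample functions are arbitrary choices), the $2\pi$-periodic extension of $f(x)=x$ has a jump and its coefficients decay like $1/n$, while $f(x)=|x|$ is continuous with only a corner and its coefficients decay like $1/n^2$.

```python
# Decay of Fourier coefficients versus smoothness: f(x) = x (jump in the periodic
# extension) versus f(x) = |x| (continuous, but not C^1).
import numpy as np
from scipy.integrate import quad

def coeff(f, n):
    # c_n as in (8.1.2), computed by quadrature
    re = quad(lambda x: f(x) * np.cos(n * x), -np.pi, np.pi, limit=500)[0]
    im = quad(lambda x: -f(x) * np.sin(n * x), -np.pi, np.pi, limit=500)[0]
    return (re + 1j * im) / (2.0 * np.pi)

for n in [3, 9, 27, 81]:
    c_jump = coeff(lambda x: x, n)          # discontinuous periodic extension
    c_corner = coeff(lambda x: abs(x), n)   # continuous, corner at 0
    print(f"n = {n:3d}   |c_n| for x : {abs(c_jump):.2e}   |c_n| for |x| : {abs(c_corner):.2e}")
# The first column scales like 1/n, the second like 1/n^2: smoother f, faster decay,
# consistent with Propositions 8.3 and 8.4.
```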

8.4 The Fourier Transform on $\mathbb{R}^N$

If $f$ is a given function on $\mathbb{R}^N$ the Fourier transform of $f$ is defined as
\[
\hat f(y)=\frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}f(x)e^{-ix\cdot y}\,dx\qquad y\in\mathbb{R}^N \tag{8.4.1}
\]
provided that the integral is defined in some sense. This will always be the case, for example, if $f\in L^1(\mathbb{R}^N)$, for any $y\in\mathbb{R}^N$, since then
\[
|\hat f(y)|\le\frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}|f(x)|\,dx<\infty \tag{8.4.2}
\]
so in fact $\hat f\in L^\infty(\mathbb{R}^N)$ in this case.

There are a number of other commonly used definitions of the Fourier transform, obtained by changing the numerical constant in front of the integral, and/or replacing $-ix\cdot y$ by $ix\cdot y$, and/or including a factor of $2\pi$ in the exponent in the integrand. Each convention has some convenient properties in certain situations, but none of them is always the best, hence the lack of a universally agreed upon definition. The differences are non-essential, all having to do with the way certain numerical constants turn up, so the only requirement is that we adopt one specific definition, such as (8.4.1), and stick with it.

The Fourier transform is a particular integral operator, and an alternative operator type notation for it,
\[
\mathcal{F}\phi=\hat\phi \tag{8.4.3}
\]
is often convenient to use, especially when discussing its mapping properties.

Example 8.1. If $N=1$ and $f(x)=\chi_{[a,b]}(x)$, the indicator function of the interval $[a,b]$, then the Fourier transform of $f$ is
\[
\hat f(y)=\frac{1}{\sqrt{2\pi}}\int_a^be^{-ixy}\,dx=\frac{e^{-iay}-e^{-iby}}{\sqrt{2\pi}\,iy} \tag{8.4.4}
\]

Example 8.2. If $N=1$, $\alpha>0$ and $f(x)=e^{-\alpha x^2}$ (a Gaussian function) then
\begin{align}
\hat f(y)&=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-\alpha x^2}e^{-ixy}\,dx=\frac{e^{-\frac{y^2}{4\alpha}}}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-\alpha(x+\frac{iy}{2\alpha})^2}\,dx \tag{8.4.5}\\
&=\frac{e^{-\frac{y^2}{4\alpha}}}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-\alpha x^2}\,dx=\frac{e^{-\frac{y^2}{4\alpha}}}{\sqrt{2\pi}}\sqrt{\frac{\pi}{\alpha}}=\frac{1}{\sqrt{2\alpha}}e^{-\frac{y^2}{4\alpha}} \tag{8.4.6}
\end{align}
In the above derivation, the key step is the third equality, which is justified by contour integration techniques in complex function theory: the integral of $e^{-\alpha z^2}$ along the real axis is the same as the integral along the parallel line $\mathrm{Im}\,z=\frac{y}{2\alpha}$ for any $y$.

Thus the Fourier transform of a Gaussian is another Gaussian, and in particular $\hat f=f$ if $\alpha=\frac12$.

It is clear from the Fourier transform definition that if $f$ has the special product form $f(x)=f_1(x_1)f_2(x_2)\dots f_N(x_N)$ then $\hat f(y)=\hat f_1(y_1)\hat f_2(y_2)\dots\hat f_N(y_N)$. The Gaussian in $\mathbb{R}^N$, namely $f(x)=e^{-\alpha|x|^2}$, is of this type, so using (8.4.6) we immediately obtain
\[
\hat f(y)=\frac{e^{-\frac{|y|^2}{4\alpha}}}{(2\alpha)^{N/2}} \tag{8.4.7}
\]
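
As a sanity check on (8.4.6), one can evaluate the transform (8.4.1) by straightforward numerical quadrature and compare with the closed form; the sketch below (my own, with an arbitrary value of $\alpha$) does this in dimension one.

```python
# Numerical check of (8.4.6): with the convention (8.4.1), the Fourier transform of
# f(x) = exp(-alpha x^2) is exp(-y^2 / (4 alpha)) / sqrt(2 alpha).
import numpy as np
from scipy.integrate import quad

alpha = 0.7    # arbitrary positive value

def fhat(y):
    g = lambda x: np.exp(-alpha * x * x)
    re = quad(lambda x: g(x) * np.cos(x * y), -np.inf, np.inf)[0]
    im = quad(lambda x: -g(x) * np.sin(x * y), -np.inf, np.inf)[0]
    return (re + 1j * im) / np.sqrt(2.0 * np.pi)

for y in [0.0, 0.5, 1.5, 3.0]:
    exact = np.exp(-y * y / (4.0 * alpha)) / np.sqrt(2.0 * alpha)
    print(f"y = {y:.1f}   quadrature = {fhat(y).real:.6f}   closed form = {exact:.6f}")
# The imaginary parts vanish (the integrand is even in x) and the real parts
# reproduce the closed form (8.4.6).
```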

To state our first theorem about the Fourier transform, let us denote
\[
C_0(\mathbb{R}^N)=\{f\in C(\mathbb{R}^N):\lim_{|x|\to\infty}|f(x)|=0\} \tag{8.4.8}
\]
the space of continuous functions vanishing at $\infty$. It is a closed subspace of $L^\infty(\mathbb{R}^N)$, hence a Banach space with the $L^\infty$ norm. We emphasize that despite the notation, functions in this space need not be of compact support.

Theorem 8.3. If $f\in L^1(\mathbb{R}^N)$ then $\hat f\in C_0(\mathbb{R}^N)$.

Proof: If $y_n\in\mathbb{R}^N$ and $y_n\to y$ then clearly $f(x)e^{-ix\cdot y_n}\to f(x)e^{-ix\cdot y}$ for a.e. $x\in\mathbb{R}^N$. Also, $|f(x)e^{-ix\cdot y_n}|\le|f(x)|$, and since we assume $f\in L^1(\mathbb{R}^N)$ we can immediately apply the dominated convergence theorem to obtain
\[
\lim_{n\to\infty}\int_{\mathbb{R}^N}f(x)e^{-ix\cdot y_n}\,dx=\int_{\mathbb{R}^N}f(x)e^{-ix\cdot y}\,dx \tag{8.4.9}
\]
that is, $\hat f(y_n)\to\hat f(y)$. Hence $\hat f\in C(\mathbb{R}^N)$.

Next, suppose temporarily that $g\in C^1(\mathbb{R}^N)$ and has compact support. An integration by parts gives us, for $j=1,2,\dots N$, that
\[
\hat g(y)=\frac{1}{(2\pi)^{N/2}}\frac{1}{iy_j}\int_{\mathbb{R}^N}\frac{\partial g}{\partial x_j}e^{-ix\cdot y}\,dx \tag{8.4.10}
\]
Thus there exists some $C$, depending on $g$, such that
\[
|\hat g(y)|^2\le\frac{C}{y_j^2}\qquad j=1,2,\dots N \tag{8.4.11}
\]
from which it follows that
\[
|\hat g(y)|^2\le\min_j\Big(\frac{C}{y_j^2}\Big)\le\frac{CN}{|y|^2} \tag{8.4.12}
\]
Thus $\hat g(y)\to0$ as $|y|\to\infty$ in this case.

Finally, such $g$'s are dense in $L^1(\mathbb{R}^N)$, so given $f\in L^1(\mathbb{R}^N)$ and $\epsilon>0$, choose $g$ as above such that $\|f-g\|_{L^1(\mathbb{R}^N)}<\epsilon$. We then have, taking into account (8.4.2),
\[
|\hat f(y)|\le|\hat f(y)-\hat g(y)|+|\hat g(y)|\le\frac{1}{(2\pi)^{N/2}}\|f-g\|_{L^1(\mathbb{R}^N)}+|\hat g(y)| \tag{8.4.13}
\]
and so
\[
\limsup_{|y|\to\infty}|\hat f(y)|<\frac{\epsilon}{(2\pi)^{N/2}} \tag{8.4.14}
\]
Since $\epsilon>0$ is arbitrary, the conclusion $\hat f\in C_0(\mathbb{R}^N)$ follows. $\Box$

The fact that $\hat f(y)\to0$ as $|y|\to\infty$ is analogous to the property that the Fourier coefficients $c_n\to0$ as $n\to\pm\infty$ in the case of Fourier series, and in fact is also called the Riemann-Lebesgue Lemma.

One of the fundamental properties of the Fourier transform is that it is 'almost' its own inverse. A first precise version of this is given by the following Fourier Inversion Theorem.

Theorem 8.4. If $f,\hat f\in L^1(\mathbb{R}^N)$ then
\[
f(x)=\frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}\hat f(y)e^{ix\cdot y}\,dy\qquad\text{a.e. }x\in\mathbb{R}^N \tag{8.4.15}
\]

The right hand side of (8.4.15) is not precisely the Fourier transform of $\hat f$ because the exponent contains $ix\cdot y$ rather than $-ix\cdot y$, but it does mean that we can think of it as saying that $f(x)=\hat{\hat f}(-x)$, or
\[
\hat{\hat f}=\check f, \tag{8.4.16}
\]
where $\check f(x)=f(-x)$ is the reflection of $f$. (Warning: some authors use the symbol $\check f$ to mean the inverse Fourier transform of $f$.) The requirement in the theorem that both $f$ and $\hat f$ be in $L^1$ will be weakened later on.

Proof: Since $\hat f\in L^1(\mathbb{R}^N)$ the right hand side of (8.4.15) is well defined, and we denote it temporarily by $g(x)$. Define also the family of Gaussians
\[
G_\alpha(x)=\frac{e^{-\frac{|x|^2}{4\alpha}}}{(4\pi\alpha)^{N/2}} \tag{8.4.17}
\]

We then have
\begin{align}
g(x)&=\lim_{\alpha\to0^+}\frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}\hat f(y)e^{ix\cdot y}e^{-\alpha|y|^2}\,dy \tag{8.4.18}\\
&=\lim_{\alpha\to0^+}\frac{1}{(2\pi)^N}\int_{\mathbb{R}^N}\int_{\mathbb{R}^N}f(z)e^{-\alpha|y|^2}e^{-i(z-x)\cdot y}\,dz\,dy \tag{8.4.19}\\
&=\lim_{\alpha\to0^+}\frac{1}{(2\pi)^N}\int_{\mathbb{R}^N}f(z)\Big(\int_{\mathbb{R}^N}e^{-\alpha|y|^2}e^{-i(z-x)\cdot y}\,dy\Big)dz \tag{8.4.20}\\
&=\lim_{\alpha\to0^+}\int_{\mathbb{R}^N}f(z)\frac{e^{-\frac{|z-x|^2}{4\alpha}}}{(4\pi\alpha)^{N/2}}\,dz \tag{8.4.21}\\
&=\lim_{\alpha\to0^+}(f*G_\alpha)(x) \tag{8.4.22}
\end{align}
Here (8.4.18) follows from the dominated convergence theorem and (8.4.20) from Fubini's theorem, which is applicable here because
\[
\int_{\mathbb{R}^N}\int_{\mathbb{R}^N}|f(z)e^{-\alpha|y|^2}|\,dz\,dy<\infty \tag{8.4.23}
\]
In (8.4.21) we have used the explicit calculation (8.4.7) above for the Fourier transform of a Gaussian.

Noting that $\int_{\mathbb{R}^N}G_\alpha(x)\,dx=1$ for every $\alpha>0$, we see that the difference $f*G_\alpha(x)-f(x)$ may be written as
\[
\int_{\mathbb{R}^N}G_\alpha(y)(f(x-y)-f(x))\,dy \tag{8.4.24}
\]
so that
\[
\|f*G_\alpha-f\|_{L^1(\mathbb{R}^N)}\le\int_{\mathbb{R}^N}G_\alpha(y)\Phi(y)\,dy \tag{8.4.25}
\]
where $\Phi(y)=\int_{\mathbb{R}^N}|f(x-y)-f(x)|\,dx$. Then $\Phi$ is bounded and continuous at $y=0$ with $\Phi(0)=0$ (see Exercise 10), and we can verify that the hypotheses of Theorem 7.2 are satisfied with $f_n$ replaced by $G_{\alpha_n}$ as long as $\alpha_n\to0^+$. For any sequence $\alpha_n>0$, $\alpha_n\to0$, it follows that $G_{\alpha_n}*f\to f$ in $L^1(\mathbb{R}^N)$, and so there is a subsequence $\alpha_{n_k}\to0$ such that $(G_{\alpha_{n_k}}*f)(x)\to f(x)$ a.e. We conclude that (8.4.15) holds. $\Box$

8.5 Further properties of the Fourier transform

Formally speaking we have

@

@yj

Z

RN

f(x)e�ix·y dx =

Z

RN

�ixjf(x)e�ix·y dx (8.5.1)

or in more compact notation@f

@yj= (�ixjf ) (8.5.2)

This is rigorously justified by standard theorems of analysis about di↵erentiation of in-tegrals with respect to parameters provided that

RRN

|xjf(x)| dx < 1.

A companion property, obtained formally using integration by parts, is thatZ

RN

@f

@xj

e�ix·y dx =

Z

RN

iyjf(x)e�ix·y dx (8.5.3)

or ✓@f

@xj

◆ˆ= iyj f (8.5.4)

which is rigorously correct provided at least that f 2 C1(RN) andR|x|=R

|f(x)| dS ! 0as R ! 1. Repeating the above arguments with higher derivatives we obtain

Proposition 8.5. If $\alpha$ is any multi-index then
\[
D^\alpha\hat f(y)=\big((-ix)^\alpha f\big)^{\wedge}(y) \tag{8.5.5}
\]
if
\[
\int_{\mathbb{R}^N}|x^\alpha f(x)|\,dx<\infty \tag{8.5.6}
\]
and
\[
(D^\alpha f)^{\wedge}(y)=(iy)^\alpha\hat f(y) \tag{8.5.7}
\]
if
\[
f\in C^m(\mathbb{R}^N), \qquad \int_{|x|=R}|D^\beta f(x)|\,dS\to 0 \text{ as } R\to\infty \quad\text{for } |\beta|<|\alpha|=m \tag{8.5.8}
\]


We will eventually see that (8.5.5) and (8.5.7) remain valid, suitably interpreted in a distributional sense, under conditions much more general than (8.5.6) and (8.5.8). But for now we introduce a new space in which these last two conditions are guaranteed to hold.

Definition 8.1. The Schwartz space is defined as
\[
\mathcal{S}(\mathbb{R}^N)=\{\phi\in C^\infty(\mathbb{R}^N):x^\alpha D^\beta\phi\in L^\infty(\mathbb{R}^N)\ \text{for all }\alpha,\beta\} \tag{8.5.9}
\]
Thus a function is in the Schwartz space if any derivative of it decays more rapidly than the reciprocal of any polynomial. Clearly $\mathcal{S}(\mathbb{R}^N)$ contains all test functions in $\mathcal{D}(\mathbb{R}^N)$, as well as other kinds of functions such as the Gaussians $e^{-\alpha|x|^2}$ for any $\alpha>0$.

If $\phi\in\mathcal{S}(\mathbb{R}^N)$ then in particular, for any $n$,
\[
|D^\beta\phi(x)|\le\frac{C}{(1+|x|^2)^n} \tag{8.5.10}
\]
for some $C$, and so clearly both (8.5.5) and (8.5.7) hold; thus the two key identities (8.5.5) and (8.5.7) are correct whenever $f$ is in the Schwartz space. It is also immediate from (8.5.10) that $\mathcal{S}(\mathbb{R}^N)\subset L^1(\mathbb{R}^N)\cap L^\infty(\mathbb{R}^N)$.

Proposition 8.6. If $\phi\in\mathcal{S}(\mathbb{R}^N)$ then $\hat\phi\in\mathcal{S}(\mathbb{R}^N)$.

Proof: Note from (8.5.5) and (8.5.7) that
\[
(iy)^\alpha D^\beta\hat\phi(y)=(iy)^\alpha\big((-ix)^\beta\phi\big)^{\wedge}(y)=\big(D^\alpha((-ix)^\beta\phi)\big)^{\wedge}(y) \tag{8.5.11}
\]
holds for $\phi\in\mathcal{S}(\mathbb{R}^N)$. Also, since $\mathcal{S}(\mathbb{R}^N)\subset L^1(\mathbb{R}^N)$ it follows from (8.4.2) that if $\phi\in\mathcal{S}(\mathbb{R}^N)$ then $\hat\phi\in L^\infty(\mathbb{R}^N)$. Thus we have the following list of implications:
\begin{align}
\phi\in\mathcal{S}(\mathbb{R}^N)&\Longrightarrow(-ix)^\beta\phi\in\mathcal{S}(\mathbb{R}^N) \tag{8.5.12}\\
&\Longrightarrow D^\alpha((-ix)^\beta\phi)\in\mathcal{S}(\mathbb{R}^N) \tag{8.5.13}\\
&\Longrightarrow\big(D^\alpha((-ix)^\beta\phi)\big)^{\wedge}\in L^\infty(\mathbb{R}^N) \tag{8.5.14}\\
&\Longrightarrow y^\alpha D^\beta\hat\phi\in L^\infty(\mathbb{R}^N) \tag{8.5.15}\\
&\Longrightarrow\hat\phi\in\mathcal{S}(\mathbb{R}^N) \tag{8.5.16}
\end{align}
$\Box$

Corollary 8.3. The Fourier transform $\mathcal{F}:\mathcal{S}(\mathbb{R}^N)\to\mathcal{S}(\mathbb{R}^N)$ is one to one and onto.


Proof: The above proposition says that $\mathcal{F}$ maps $\mathcal{S}(\mathbb{R}^N)$ into $\mathcal{S}(\mathbb{R}^N)$, and if $\mathcal{F}\phi=\hat\phi=0$ then the inversion theorem, Theorem 8.4, is applicable, since both $\phi,\hat\phi$ are in $L^1(\mathbb{R}^N)$. We conclude $\phi=0$, i.e. $\mathcal{F}$ is one to one. If $\psi\in\mathcal{S}(\mathbb{R}^N)$, let $\phi=\check{\hat\psi}$. Clearly $\phi\in\mathcal{S}(\mathbb{R}^N)$, and one may check directly, again using the inversion theorem, that $\hat\phi=\psi$, so that $\mathcal{F}$ is onto. $\Box$

The next result, usually known as the Parseval identity, is the key step needed to define the Fourier transform of a function in $L^2(\mathbb{R}^N)$, which turns out to be the more natural setting.

Proposition 8.7. If $\phi,\psi\in\mathcal{S}(\mathbb{R}^N)$ then
\[
\int_{\mathbb{R}^N}\hat\phi(x)\psi(x)\,dx=\int_{\mathbb{R}^N}\phi(x)\hat\psi(x)\,dx \tag{8.5.17}
\]

Proof: The proof is simply an interchange of order in an iterated integral, which is easily justified by Fubini's theorem:
\begin{align}
\int_{\mathbb{R}^N}\phi(x)\hat\psi(x)\,dx&=\frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}\phi(x)\left(\int_{\mathbb{R}^N}\psi(y)\,e^{-ix\cdot y}\,dy\right)dx \tag{8.5.18}\\
&=\frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}\psi(y)\left(\int_{\mathbb{R}^N}\phi(x)\,e^{-ix\cdot y}\,dx\right)dy \tag{8.5.19}\\
&=\int_{\mathbb{R}^N}\hat\phi(y)\psi(y)\,dy \tag{8.5.20}
\end{align}
$\Box$

There is a slightly different but equivalent formula, which is also sometimes called the Parseval identity; see Exercise 11. The content of the following corollary is the Plancherel identity.

Corollary 8.4. For every $\phi\in\mathcal{S}(\mathbb{R}^N)$ we have
\[
\|\hat\phi\|_{L^2(\mathbb{R}^N)}=\|\phi\|_{L^2(\mathbb{R}^N)} \tag{8.5.21}
\]

Proof: Given $\phi\in\mathcal{S}(\mathbb{R}^N)$ there exists, by Corollary 8.3, $\psi\in\mathcal{S}(\mathbb{R}^N)$ such that $\hat\psi=\bar\phi$. In addition it follows directly from the definition of the Fourier transform and the inversion theorem that $\psi=\overline{\hat\phi}$. Therefore, by Parseval's identity,
\[
\|\hat\phi\|_{L^2(\mathbb{R}^N)}^2=\int_{\mathbb{R}^N}\hat\phi(x)\psi(x)\,dx=\int_{\mathbb{R}^N}\phi(x)\hat\psi(x)\,dx=\int_{\mathbb{R}^N}\phi(x)\overline{\phi(x)}\,dx=\|\phi\|_{L^2(\mathbb{R}^N)}^2 \tag{8.5.22}
\]
$\Box$

Recalling that $\mathcal{D}(\mathbb{R}^N)$ is dense in $L^2(\mathbb{R}^N)$, it follows that the same is true of $\mathcal{S}(\mathbb{R}^N)$, and the Plancherel identity therefore implies that the Fourier transform has an extension to all of $L^2(\mathbb{R}^N)$. To be precise, if $f\in L^2(\mathbb{R}^N)$ pick $\phi_n\in\mathcal{S}(\mathbb{R}^N)$ such that $\phi_n\to f$ in $L^2(\mathbb{R}^N)$. Since $\{\phi_n\}$ is Cauchy in $L^2(\mathbb{R}^N)$, (8.5.21) implies the same for $\{\hat\phi_n\}$, so $g:=\lim_{n\to\infty}\hat\phi_n$ exists in the $L^2$ sense, and this limit is by definition $\hat f$. From elementary considerations this limit is independent of the choice of approximating sequence $\{\phi_n\}$, the extended definition of $\hat f$ agrees with the original definition if $f\in L^1(\mathbb{R}^N)\cap L^2(\mathbb{R}^N)$, and (8.5.21) continues to hold for all $f\in L^2(\mathbb{R}^N)$.

Since $\hat\phi_n\to\hat f$ in $L^2(\mathbb{R}^N)$, it follows by similar reasoning that $\hat{\hat\phi}_n\to\hat{\hat f}$. By the inversion theorem we know that $\hat{\hat\phi}_n=\check\phi_n$, which must converge to $\check f$; thus $\hat{\hat f}=\check f$, i.e. the Fourier inversion theorem continues to hold on $L^2(\mathbb{R}^N)$.

The subset $L^1(\mathbb{R}^N)\cap L^2(\mathbb{R}^N)$ is dense in $L^2(\mathbb{R}^N)$, so we also have that $\hat f=\lim_{n\to\infty}\hat f_n$ if $f_n$ is any sequence in $L^1(\mathbb{R}^N)\cap L^2(\mathbb{R}^N)$ convergent in $L^2(\mathbb{R}^N)$ to $f$. A natural choice of such a sequence is
\[
f_n(x)=\begin{cases} f(x) & |x|<n\\ 0 & |x|>n\end{cases} \tag{8.5.23}
\]
leading to the following explicit formula, similar to an improper integral, for the Fourier transform of an $L^2$ function:
\[
\hat f(y)=\lim_{n\to\infty}\frac{1}{(2\pi)^{N/2}}\int_{|x|<n}f(x)\,e^{-ix\cdot y}\,dx \tag{8.5.24}
\]
where again without further assumptions we only know that the limit takes place in the $L^2$ sense.

Let us summarize.

Theorem 8.5. For any $f\in L^2(\mathbb{R}^N)$ there exists a unique $\hat f\in L^2(\mathbb{R}^N)$ such that $\hat f$ is given by (8.4.1) whenever $f\in L^1(\mathbb{R}^N)\cap L^2(\mathbb{R}^N)$, and
\[
\|\hat f\|_{L^2(\mathbb{R}^N)}=\|f\|_{L^2(\mathbb{R}^N)}. \tag{8.5.25}
\]


Furthermore, $f,\hat f$ are related by (8.5.24) and
\[
f(x)=\lim_{n\to\infty}\frac{1}{(2\pi)^{N/2}}\int_{|y|<n}\hat f(y)\,e^{ix\cdot y}\,dy \tag{8.5.26}
\]
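The statements (8.5.24)-(8.5.25) lend themselves to a quick numerical sanity check. The sketch below (a rough Python illustration with ad hoc grids and the arbitrarily chosen function $f(x)=e^{-|x|}$, whose transform $\sqrt{2/\pi}/(1+y^2)$ is known) approximates $\hat f$ by quadrature from the definition and then compares $\int|f|^2\,dx$ with $\int|\hat f|^2\,dy$; both are close to $1$.

```python
import numpy as np

x = np.linspace(-25.0, 25.0, 50001); dx = x[1] - x[0]
f = np.exp(-np.abs(x))                               # f(x) = exp(-|x|)

yy = np.linspace(-40.0, 40.0, 4001); dyy = yy[1] - yy[0]
# quadrature approximation of fhat(y) = (2*pi)^(-1/2) * integral f(x) e^{-i x y} dx
fhat = np.array([np.sum(f * np.exp(-1j * x * yk)) * dx for yk in yy]) / np.sqrt(2 * np.pi)

print(np.max(np.abs(fhat.real - np.sqrt(2 / np.pi) / (1 + yy**2))))   # small: matches the closed form
print(np.sum(np.abs(f)**2) * dx, np.sum(np.abs(fhat)**2) * dyy)       # both approximately 1 (Plancherel)
```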

We conclude this section with one final important property of the Fourier transform.

Proposition 8.8. If $f,g\in L^1(\mathbb{R}^N)$ then $f*g\in L^1(\mathbb{R}^N)$ and
\[
(f*g)^{\wedge}=(2\pi)^{N/2}\,\hat f\,\hat g \tag{8.5.27}
\]

Proof: The fact that $f*g\in L^1(\mathbb{R}^N)$ is immediate from Fubini's theorem, or, alternatively, is a special case of Young's convolution inequality (7.4.2). To prove (8.5.27) we have
\begin{align}
(f*g)^{\wedge}(z)&=\frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}(f*g)(x)\,e^{-ix\cdot z}\,dx \tag{8.5.28}\\
&=\frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}\left(\int_{\mathbb{R}^N}f(x-y)g(y)\,dy\right)e^{-ix\cdot z}\,dx \tag{8.5.29}\\
&=\frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}g(y)\,e^{-iy\cdot z}\left(\int_{\mathbb{R}^N}f(x-y)\,e^{-i(x-y)\cdot z}\,dx\right)dy \tag{8.5.30}\\
&=(2\pi)^{N/2}\,\hat f(z)\,\hat g(z) \tag{8.5.31}
\end{align}

with the exchange of order of integration justified by Fubini’s theorem.
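The identity (8.5.27) can also be seen numerically. The following Python sketch (an informal discrete check: two arbitrarily chosen Gaussians on a truncated grid, with quadrature transforms) computes $f*g$ on a grid, takes its transform at a few frequencies, and compares with $\sqrt{2\pi}\,\hat f\,\hat g$.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 4001); dx = x[1] - x[0]   # symmetric grid, odd number of points
f = np.exp(-x**2)
g = np.exp(-2 * x**2)
conv = np.convolve(f, g, mode='same') * dx             # samples of (f*g) on the same grid

def ft(vals, yk):                                       # quadrature Fourier transform at frequency yk
    return np.sum(vals * np.exp(-1j * x * yk)) * dx / np.sqrt(2 * np.pi)

for yk in (0.0, 0.7, 1.5, 3.0):
    lhs = ft(conv, yk)
    rhs = np.sqrt(2 * np.pi) * ft(f, yk) * ft(g, yk)
    print(yk, lhs.real, rhs.real)                       # the two columns agree closely
```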

8.6 Fourier series of distributions

In this and the next section we will see how the theory of Fourier series and Fourier transforms can be extended to a distributional setting. To begin with let us consider the case of the delta function, viewed as a distribution on $(-\pi,\pi)$. Formally speaking, if $\delta(x)=\sum_{n=-\infty}^{\infty}c_ne^{inx}$, then the coefficients $c_n$ should be given by
\[
c_n=\frac{1}{2\pi}\int_{-\pi}^{\pi}\delta(x)\,e^{-inx}\,dx=\frac{1}{2\pi} \tag{8.6.1}
\]


for every $n$, so that
\[
\delta(x)=\frac{1}{2\pi}\sum_{n=-\infty}^{\infty}e^{inx} \tag{8.6.2}
\]

Certainly this is not a valid formula in any classical sense, since the terms of the series do not decay to zero. On the other hand, the $N$'th partial sum of this series is precisely the Dirichlet kernel $D_N(x)$, as in (8.1.4) or (8.1.13), and one consequence of Theorem 8.2 is precisely that $D_N\to\delta$ in $\mathcal{D}'(-\pi,\pi)$. Thus we may expect to find Fourier series representations of distributions, provided that we allow for the series to converge in a distributional sense.
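This distributional convergence is easy to observe numerically. In the Python sketch below (an informal illustration; the bump function and grid are ad hoc choices) the partial sums $D_N$ are paired with a smooth, compactly supported test function $\phi$, and the pairing approaches $\phi(0)=e^{-1}\approx 0.3679$ as $N$ grows, even though $D_N(x)$ itself does not converge pointwise.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 200001)
dx = x[1] - x[0]
phi = np.zeros_like(x)
inside = np.abs(x) < 2.0
phi[inside] = np.exp(1.0 / ((x[inside] / 2.0)**2 - 1.0))   # smooth bump supported in [-2, 2]

for N in (4, 16, 64, 256):
    DN = (1.0 + 2.0 * sum(np.cos(n * x) for n in range(1, N + 1))) / (2 * np.pi)
    print(N, np.sum(DN * phi) * dx, phi[x.size // 2])        # pairing -> phi(0) = exp(-1)
```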

Note that since $D_N\to\delta$ we must also have, by Proposition 7.2, that
\[
D_N'=\frac{i}{2\pi}\sum_{n=-N}^{N}n\,e^{inx}\to\delta' \tag{8.6.3}
\]
as $N\to\infty$. By repeatedly differentiating, we see that any formal Fourier series $\sum_{n=-\infty}^{\infty}n^me^{inx}$ is meaningful in the distributional sense, and is simply, up to a constant multiple, some derivative of the delta function. The following proposition shows that we can allow any sequence of Fourier coefficients as long as the rate of growth is at most a power of $n$.

Proposition 8.9. Let $\{c_n\}_{n=-\infty}^{\infty}$ be any sequence of constants satisfying
\[
|c_n|\le C|n|^M \tag{8.6.4}
\]
for some constant $C$ and positive integer $M$. Then there exists $T\in\mathcal{D}'(-\pi,\pi)$ such that
\[
T=\sum_{n=-\infty}^{\infty}c_ne^{inx} \tag{8.6.5}
\]

Proof: Let
\[
g(x)=\sum_{n=-\infty}^{\infty}\frac{c_n}{(in)^{M+2}}\,e^{inx} \tag{8.6.6}
\]
(the $n=0$ term, a constant, may be treated separately), which is a uniformly convergent Fourier series, so in particular the partial sums $S_N\to g$ in the sense of distributions on $(-\pi,\pi)$. But then $S_N^{(j)}\to g^{(j)}$ also in the distributional sense, and in particular
\[
\sum_{n=-\infty}^{\infty}c_ne^{inx}=T:=g^{(M+2)} \tag{8.6.7}
\]
$\Box$


It seems clear that any distribution on $\mathbb{R}$ of the form (8.6.5) should be $2\pi$ periodic, since every partial sum is. To make this precise, define the translate of any distribution $T\in\mathcal{D}'(\mathbb{R}^N)$ by the natural definition $\tau_hT(\phi)=T(\tau_{-h}\phi)$, where as usual $\tau_h\phi(x)=\phi(x-h)$, $h\in\mathbb{R}^N$. We then say that $T$ is periodic with period $h\in\mathbb{R}^N$ if $\tau_hT=T$, and it is immediate that if $T_n$ is $h$-periodic and $T_n\to T$ in $\mathcal{D}'(\mathbb{R}^N)$ then $T$ is also $h$-periodic.

Example 8.3. The Fourier series identity (8.6.2) becomes
\[
\sum_{n=-\infty}^{\infty}\delta(x-2n\pi)=\frac{1}{2\pi}\sum_{n=-\infty}^{\infty}e^{inx} \tag{8.6.8}
\]
when regarded as an identity in $\mathcal{D}'(\mathbb{R})$, since the left side is $2\pi$ periodic and coincides with $\delta$ on $(-\pi,\pi)$.

A $2\pi$ periodic distribution on $\mathbb{R}$ may also naturally be regarded as an element of the distribution space $\mathcal{D}'(\mathbb{T})$, which is defined as the space of continuous linear functionals on $C^\infty(\mathbb{T})$. Here, convergence in $C^\infty(\mathbb{T})$ means that $\phi_n^{(j)}\to\phi^{(j)}$ uniformly on $\mathbb{T}$ for all $j=0,1,2,\dots$. Any function $f\in L^1(\mathbb{T})$ gives rise in the usual way to the regular distribution $T_f$ defined by $T_f(\phi)=\int_{-\pi}^{\pi}f(x)\phi(x)\,dx$, and if $f\in L^2$ then the $n$'th Fourier coefficient is $c_n=\frac{1}{2\pi}T_f(e^{-inx})$. Since $e^{-inx}\in C^\infty(\mathbb{T})$ it follows that
\[
c_n=\frac{1}{2\pi}T(e^{-inx}) \tag{8.6.9}
\]
is defined for $T\in\mathcal{D}'(\mathbb{T})$, and is defined to be the $n$'th Fourier coefficient of the distribution $T$. This definition is then consistent with the definition of Fourier coefficient for a regular distribution, and it can be shown (Exercise 29) that

\[
\sum_{n=-N}^{N}c_ne^{inx}\to T \quad\text{in } \mathcal{D}'(\mathbb{T}) \tag{8.6.10}
\]

Example 8.4. Let us evaluate the distributional Fourier series
\[
\sum_{n=0}^{\infty}e^{inx} \tag{8.6.11}
\]
The $n$'th partial sum is
\[
s_n(x)=\sum_{k=0}^{n}e^{ikx}=\frac{1-e^{i(n+1)x}}{1-e^{ix}} \tag{8.6.12}
\]


so that we may write, since $\int_{-\pi}^{\pi}s_n(x)\,dx=2\pi$,
\[
s_n(\phi)=2\pi\phi(0)+\int_{-\pi}^{\pi}\frac{1-e^{i(n+1)x}}{1-e^{ix}}\,(\phi(x)-\phi(0))\,dx \tag{8.6.13}
\]
for any test function $\phi$.

The function $(\phi(x)-\phi(0))/(1-e^{ix})$ belongs to $L^2(-\pi,\pi)$, hence
\[
\int_{-\pi}^{\pi}\frac{e^{i(n+1)x}}{1-e^{ix}}\,(\phi(x)-\phi(0))\,dx\to 0 \tag{8.6.14}
\]
as $n\to\infty$ by the Riemann-Lebesgue lemma. Next, using obvious trigonometric identities we see that $1/(1-e^{ix})=\frac12(1+i\cot\frac x2)$, and so
\begin{align}
\int_{-\pi}^{\pi}\frac{\phi(x)-\phi(0)}{1-e^{ix}}\,dx
&=\lim_{\epsilon\to0^+}\frac12\int_{\epsilon<|x|<\pi}(\phi(x)-\phi(0))\Big(1+i\cot\frac x2\Big)\,dx \tag{8.6.15}\\
&=\frac12\int_{-\pi}^{\pi}\phi(x)\,dx-\pi\phi(0) \tag{8.6.16}\\
&\qquad+\lim_{\epsilon\to0^+}\frac i2\int_{\epsilon<|x|<\pi}\phi(x)\cot\frac x2\,dx \tag{8.6.17}
\end{align}
The principal value integral in (8.6.17) is naturally defined to be the action of the distribution $\mathrm{pv}(\cot\frac x2)$, and we obtain the final result, upon letting $n\to\infty$, that
\[
\sum_{n=0}^{\infty}e^{inx}=\pi\delta+\frac12+\frac i2\,\mathrm{pv}\Big(\cot\frac x2\Big) \tag{8.6.18}
\]
By taking the real and imaginary parts of this identity we also find
\[
\sum_{n=0}^{\infty}\cos nx=\pi\delta+\frac12, \qquad \sum_{n=1}^{\infty}\sin nx=\frac12\,\mathrm{pv}\Big(\cot\frac x2\Big) \tag{8.6.19}
\]
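The first identity in (8.6.19) can be observed numerically in the same way as the Dirichlet kernel earlier: pairing the partial sums of $\sum\cos nx$ with a smooth test function gives numbers approaching $\pi\phi(0)+\frac12\int\phi$. The Python sketch below is again only an informal illustration with an ad hoc bump function and grid.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 200001)
dx = x[1] - x[0]
phi = np.zeros_like(x)
inside = np.abs(x) < 2.0
phi[inside] = np.exp(1.0 / ((x[inside] / 2.0)**2 - 1.0))     # smooth bump supported in [-2, 2]

target = np.pi * phi[x.size // 2] + 0.5 * np.sum(phi) * dx   # pi*phi(0) + (1/2) * integral of phi
for N in (8, 32, 128, 512):
    sN = sum(np.cos(n * x) for n in range(0, N + 1))          # partial sums of sum_{n>=0} cos(nx)
    print(N, np.sum(sN * phi) * dx, target)                   # pairing approaches the target value
```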

8.7 Fourier transforms of distributions

Taking again the example of the delta function, now considered as a distribution on $\mathbb{R}^N$, it appears formally correct that it should have a Fourier transform which is a constant function, namely
\[
\hat\delta(y)=\frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}\delta(x)\,e^{-ix\cdot y}\,dx=\frac{1}{(2\pi)^{N/2}} \tag{8.7.1}
\]


If the inversion theorem remains valid then any constant should also have a Fourier transform, e.g. $\hat 1=(2\pi)^{N/2}\delta$. On the other hand it will turn out that a function such as $e^x$ does not have a Fourier transform in any reasonable sense.

We will now show that the set of distributions for which the Fourier transform can be defined turns out to be precisely the dual space of the Schwartz space, known also as the space of tempered distributions. To define this we must first have a definition of convergence in $\mathcal{S}(\mathbb{R}^N)$.

Definition 8.2. We say that $\phi_n\to\phi$ in $\mathcal{S}(\mathbb{R}^N)$ if
\[
\lim_{n\to\infty}\|x^\alpha D^\beta(\phi_n-\phi)\|_{L^\infty(\mathbb{R}^N)}=0 \quad\text{for all }\alpha,\beta \tag{8.7.2}
\]

Proof of the following lemma will be left for the exercises.

Lemma 8.1. If $\phi_n\to\phi$ in $\mathcal{S}(\mathbb{R}^N)$ then $\hat\phi_n\to\hat\phi$ in $\mathcal{S}(\mathbb{R}^N)$.

Definition 8.3. The set of tempered distributions on $\mathbb{R}^N$ is the space of continuous linear functionals on $\mathcal{S}(\mathbb{R}^N)$, denoted $\mathcal{S}'(\mathbb{R}^N)$.

It was already observed that $\mathcal{D}(\mathbb{R}^N)\subset\mathcal{S}(\mathbb{R}^N)$ and in addition, if $\phi_n\to\phi$ in $\mathcal{D}(\mathbb{R}^N)$ then the sequence also converges in $\mathcal{S}(\mathbb{R}^N)$. It therefore follows that
\[
\mathcal{S}'(\mathbb{R}^N)\subset\mathcal{D}'(\mathbb{R}^N) \tag{8.7.3}
\]
i.e. any tempered distribution is also a distribution, as the choice of language suggests. On the other hand, if $T_f$ is the regular distribution corresponding to the $L^1_{loc}$ function $f(x)=e^x$, then $T_f\notin\mathcal{S}'(\mathbb{R})$ since this would require $\int_{-\infty}^{\infty}e^x\phi(x)\,dx$ to be finite for any $\phi\in\mathcal{S}(\mathbb{R})$, which is not true. Thus the inclusion (8.7.3) is strict. Convergence in $\mathcal{S}'(\mathbb{R}^N)$ is defined in the expected way, analogously to Definition 7.5:

Definition 8.4. If $T,T_n\in\mathcal{S}'(\mathbb{R}^N)$ for $n=1,2,\dots$ then we say $T_n\to T$ in $\mathcal{S}'(\mathbb{R}^N)$ (or in the sense of tempered distributions) if $T_n(\phi)\to T(\phi)$ for every $\phi\in\mathcal{S}(\mathbb{R}^N)$.

It is easy to see that the delta function belongs to $\mathcal{S}'(\mathbb{R}^N)$, as does any derivative or translate of the delta function. A regular distribution $T_f$ will belong to $\mathcal{S}'(\mathbb{R}^N)$ provided it satisfies the condition
\[
\lim_{|x|\to\infty}\frac{f(x)}{|x|^m}=0 \tag{8.7.4}
\]
for some $m$. Such an $f$ is sometimes referred to as a function of slow growth. In particular, any polynomial belongs to $\mathcal{S}'(\mathbb{R}^N)$.

We can now define the Fourier transform $\hat T$ for any $T\in\mathcal{S}'(\mathbb{R}^N)$. For motivation of the definition, recall the Parseval identity (8.5.17), which amounts to the identity $T_{\hat\psi}(\phi)=T_\psi(\hat\phi)$, if we regard $\psi$ as a function in $\mathcal{S}(\mathbb{R}^N)$ and as a tempered distribution.

Definition 8.5. If $T\in\mathcal{S}'(\mathbb{R}^N)$ then $\hat T$ is defined by $\hat T(\phi)=T(\hat\phi)$ for any $\phi\in\mathcal{S}(\mathbb{R}^N)$.

The action of $\hat T$ on any $\phi\in\mathcal{S}(\mathbb{R}^N)$ is well-defined, since $\hat\phi\in\mathcal{S}(\mathbb{R}^N)$, and linearity of $\hat T$ is immediate. If $\phi_n\to\phi$ in $\mathcal{S}(\mathbb{R}^N)$ then by Lemma 8.1 $\hat\phi_n\to\hat\phi$ in $\mathcal{S}(\mathbb{R}^N)$, so that
\[
\hat T(\phi_n)=T(\hat\phi_n)\to T(\hat\phi)=\hat T(\phi) \tag{8.7.5}
\]
We have thus verified that $\hat T\in\mathcal{S}'(\mathbb{R}^N)$ whenever $T\in\mathcal{S}'(\mathbb{R}^N)$.

Example 8.5. If $T=\delta$, then from the definition,
\[
\hat T(\phi)=T(\hat\phi)=\hat\phi(0)=\frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}\phi(x)\,dx \tag{8.7.6}
\]
Thus, as expected, $\hat\delta=\frac{1}{(2\pi)^{N/2}}$, the constant distribution.

Example 8.6. If $T=1$ (the constant distribution) then
\[
\hat T(\phi)=T(\hat\phi)=\int_{\mathbb{R}^N}\hat\phi(x)\,dx=(2\pi)^{N/2}\,\hat{\hat\phi}(0)=(2\pi)^{N/2}\,\phi(0) \tag{8.7.7}
\]
where the last equality follows from the inversion theorem, which is valid for any $\phi\in\mathcal{S}(\mathbb{R}^N)$. Thus again the expected result is obtained,
\[
\hat 1=(2\pi)^{N/2}\,\delta \tag{8.7.8}
\]

The previous two examples verify the validity of one particular instance of the Fourier inversion theorem in the distributional context, but it turns out to be rather easy to prove that it always holds. One more definition is needed first, that of the reflection of a distribution.

Definition 8.6. If $T\in\mathcal{D}'(\mathbb{R}^N)$ then $\check T$, the reflection of $T$, is the distribution defined by $\check T(\phi)=T(\check\phi)$.


We now obtain the Fourier inversion theorem in its most general form, analogous to the statement (8.4.16) first justified when $f,\hat f$ are in $L^1(\mathbb{R}^N)$.

Theorem 8.6. If $T\in\mathcal{S}'(\mathbb{R}^N)$ then $\hat{\hat T}=\check T$.

Proof: For any $\phi\in\mathcal{S}(\mathbb{R}^N)$ we have
\[
\hat{\hat T}(\phi)=T(\hat{\hat\phi})=T(\check\phi)=\check T(\phi) \tag{8.7.9}
\]
$\Box$

The apparent triviality of this proof should not be misconstrued, as it relies on the validity of the inversion theorem in the Schwartz space, and other technical machinery which we have developed.

Here we state several more simple but useful properties. Here and elsewhere, we follow the convention of using $x$ and $y$ as the independent variables before and after Fourier transformation respectively.

Proposition 8.10. Let $T\in\mathcal{S}'(\mathbb{R}^N)$ and $\alpha$ be a multi-index. Then

1. $x^\alpha T\in\mathcal{S}'(\mathbb{R}^N)$.

2. $D^\alpha T\in\mathcal{S}'(\mathbb{R}^N)$.

3. $D^\alpha\hat T=\big((-ix)^\alpha T\big)^{\wedge}$.

4. $(D^\alpha T)^{\wedge}=(iy)^\alpha\hat T$.

5. If $T_n\in\mathcal{S}'(\mathbb{R}^N)$ and $T_n\to T$ in $\mathcal{S}'(\mathbb{R}^N)$ then $\hat T_n\to\hat T$ in $\mathcal{S}'(\mathbb{R}^N)$.

Proof: We give the proof of part 3 only, leaving the rest for the exercises. Just like the inversion theorem, it is more or less a direct consequence of the corresponding identity for functions in $\mathcal{S}(\mathbb{R}^N)$. For any $\phi\in\mathcal{S}(\mathbb{R}^N)$ we have
\begin{align}
D^\alpha\hat T(\phi)&=(-1)^{|\alpha|}\hat T(D^\alpha\phi) \tag{8.7.10}\\
&=(-1)^{|\alpha|}T\big((D^\alpha\phi)^{\wedge}\big) \tag{8.7.11}\\
&=(-1)^{|\alpha|}T\big((iy)^\alpha\hat\phi\big) \tag{8.7.12}\\
&=\big((-ix)^\alpha T\big)(\hat\phi)=\big((-ix)^\alpha T\big)^{\wedge}(\phi) \tag{8.7.13}
\end{align}
as needed, where we used (8.5.7) to obtain (8.7.12).


Example 8.7. If $T=\delta'$ regarded as an element of $\mathcal{S}'(\mathbb{R})$ then
\[
\hat T=(\delta')^{\wedge}=iy\,\hat\delta=\frac{iy}{\sqrt{2\pi}} \tag{8.7.14}
\]
by part 4 of the previous proposition. In other words,
\[
\hat T(\phi)=\frac{i}{\sqrt{2\pi}}\int_{-\infty}^{\infty}x\,\phi(x)\,dx \tag{8.7.15}
\]

Example 8.8. Let $T=H(x)$, the Heaviside function, again regarded as an element of $\mathcal{S}'(\mathbb{R})$. To evaluate the Fourier transform $\hat H$, one possible approach is to use part 4 of Proposition 8.10 along with $H'=\delta$ to first obtain $iy\hat H=1/\sqrt{2\pi}$. A formal solution is then $\hat H=1/\sqrt{2\pi}\,iy$, but it must then be recognized that this distributional equation does not have a unique solution; rather we can add to it any solution of $yT=0$, e.g. $T=C\delta$ for any constant $C$. It must be verified that there are no other solutions, the constant $C$ must be evaluated, and the meaning of $1/y$ in the distribution sense must be made precise. See Example 8, Section 2.4 of [32] for details of how this calculation is completed.

An alternate approach, which yields other useful formulas along the way, is as follows. For any $\phi\in\mathcal{S}(\mathbb{R})$ we have
\begin{align}
\hat H(\phi)=H(\hat\phi)&=\int_{0}^{\infty}\hat\phi(y)\,dy \tag{8.7.16}\\
&=\frac{1}{\sqrt{2\pi}}\int_{0}^{\infty}\int_{-\infty}^{\infty}\phi(x)\,e^{-ixy}\,dx\,dy \tag{8.7.17}\\
&=\lim_{R\to\infty}\frac{1}{\sqrt{2\pi}}\int_{0}^{R}\int_{-\infty}^{\infty}\phi(x)\,e^{-ixy}\,dx\,dy \tag{8.7.18}\\
&=\lim_{R\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\phi(x)\left(\int_{0}^{R}e^{-ixy}\,dy\right)dx \tag{8.7.19}\\
&=\lim_{R\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\phi(x)\left(\frac{1-e^{-iRx}}{ix}\right)dx \tag{8.7.20}\\
&=\lim_{R\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{\sin Rx}{x}\,\phi(x)\,dx+\frac{i}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{\cos Rx-1}{x}\,\phi(x)\,dx \tag{8.7.21}
\end{align}

It can then be verified that
\[
\frac{\sin Rx}{x}\to\pi\delta, \qquad \frac{\cos Rx-1}{x}\to-\,\mathrm{pv}\,\frac1x \tag{8.7.22}
\]


as $R\to\infty$ in $\mathcal{D}'(\mathbb{R})$. The first limit is just a restatement of the result of part b) in Exercise 7 of Chapter 7, and the second we leave for the exercises. The final result, therefore, is that
\[
\hat H=\sqrt{\frac{\pi}{2}}\,\delta-\frac{i}{\sqrt{2\pi}}\,\mathrm{pv}\,\frac1x \tag{8.7.23}
\]

Example 8.9. Let $T_n=\delta(x-n)$, i.e. $T_n(\phi)=\phi(n)$, for $n=0,\pm1,\dots$, so that
\[
\hat T_n(\phi)=\hat\phi(n)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\phi(x)\,e^{-inx}\,dx \tag{8.7.24}
\]
Equivalently, $\sqrt{2\pi}\,\hat T_n=e^{-inx}$. If we now set $T=\sum_{n=-\infty}^{\infty}T_n$ then $T\in\mathcal{S}'(\mathbb{R})$ and
\[
\hat T=\frac{1}{\sqrt{2\pi}}\sum_{n=-\infty}^{\infty}e^{-inx}=\frac{1}{\sqrt{2\pi}}\sum_{n=-\infty}^{\infty}e^{inx}=\sqrt{2\pi}\sum_{n=-\infty}^{\infty}\delta(x-2\pi n) \tag{8.7.25}
\]
where the last equality comes from (8.6.8). The relation $\hat T(\phi)=T(\hat\phi)$ then yields the very interesting identity
\[
\sum_{n=-\infty}^{\infty}\hat\phi(n)=\sqrt{2\pi}\sum_{n=-\infty}^{\infty}\phi(2\pi n) \tag{8.7.26}
\]
valid at least for $\phi\in\mathcal{S}(\mathbb{R})$, which is known as the Poisson summation formula.
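Since $\phi(x)=e^{-x^2/2}$ is its own transform under the present normalization, (8.7.26) can be checked to high accuracy in a few lines of Python. The snippet below is only an informal check; the truncation at $|n|\le 50$ is an arbitrary choice, more than sufficient here because both series converge extremely fast.

```python
import numpy as np

# phi(x) = exp(-x^2/2) satisfies phi_hat(y) = exp(-y^2/2) with this normalization.
n = np.arange(-50, 51)
lhs = np.sum(np.exp(-n**2 / 2))                                       # sum of phi_hat(n)
rhs = np.sqrt(2 * np.pi) * np.sum(np.exp(-(2 * np.pi * n)**2 / 2))    # sqrt(2 pi) * sum of phi(2 pi n)
print(lhs, rhs)    # both print approximately 2.5066283..., as (8.7.26) predicts
```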

We conclude this section with some discussion of the Fourier transform and convolution in a distributional setting. Recall we gave a definition of the convolution $T*\phi$ in Definition 7.7, when $T\in\mathcal{D}'(\mathbb{R}^N)$ and $\phi\in\mathcal{D}(\mathbb{R}^N)$. We can use precisely the same definition if $T\in\mathcal{S}'(\mathbb{R}^N)$ and $\phi\in\mathcal{S}(\mathbb{R}^N)$, that is

Definition 8.7. If $T\in\mathcal{S}'(\mathbb{R}^N)$ and $\phi\in\mathcal{S}(\mathbb{R}^N)$ then $(T*\phi)(x)=T(\tau_x\check\phi)$.

Note that in terms of the action of the distribution $T$, $x$ is just a parameter, and that we must regard $\check\phi$ as a function of some unnamed other variable, say $y$ or $\cdot$. By methods similar to those used in the proof of Theorem 7.3 it can be shown that
\[
T*\phi\in C^\infty(\mathbb{R}^N)\cap\mathcal{S}'(\mathbb{R}^N) \tag{8.7.27}
\]
and
\[
D^\alpha(T*\phi)=D^\alpha T*\phi=T*D^\alpha\phi \tag{8.7.28}
\]
In addition we have the following generalization of Proposition 8.8:


Theorem 8.7. If $T\in\mathcal{S}'(\mathbb{R}^N)$ and $\phi\in\mathcal{S}(\mathbb{R}^N)$ then
\[
(T*\phi)^{\wedge}=(2\pi)^{N/2}\,\hat T\,\hat\phi \tag{8.7.29}
\]

Sketch of proof: First observe that from Proposition 8.8 and the inversion theorem we see that
\[
(\phi\psi)^{\wedge}=\frac{1}{(2\pi)^{N/2}}\,(\hat\phi*\hat\psi) \tag{8.7.30}
\]
for $\phi,\psi\in\mathcal{S}(\mathbb{R}^N)$. Thus for $\psi\in\mathcal{S}(\mathbb{R}^N)$
\[
(\hat T\hat\phi)(\psi)=\hat T(\hat\phi\psi)=T\big((\hat\phi\psi)^{\wedge}\big)=\frac{1}{(2\pi)^{N/2}}\,T(\hat{\hat\phi}*\hat\psi)=\frac{1}{(2\pi)^{N/2}}\,T(\check\phi*\hat\psi) \tag{8.7.31}
\]
On the other hand,
\begin{align}
(T*\phi)^{\wedge}(\psi)&=(T*\phi)(\hat\psi) \tag{8.7.32}\\
&=\int_{\mathbb{R}^N}(T*\phi)(x)\hat\psi(x)\,dx=\int_{\mathbb{R}^N}T(\tau_x\check\phi)\,\hat\psi(x)\,dx \tag{8.7.33}\\
&=T\left(\int_{\mathbb{R}^N}\tau_x\check\phi(\cdot)\,\hat\psi(x)\,dx\right)=T\left(\int_{\mathbb{R}^N}\check\phi(\cdot-x)\,\hat\psi(x)\,dx\right) \tag{8.7.34}\\
&=T(\check\phi*\hat\psi) \tag{8.7.35}
\end{align}
which completes the proof. $\Box$

We have labeled the above proof a 'sketch' because one key step, the first equality in (8.7.34), was not explained adequately. See the conclusion of the proof of Theorem 7.19 in [30] for why it is permissible to move $T$ across the integral in this way.

8.8 Exercises

1. Find the Fourier series $\sum_{n=-\infty}^{\infty}c_ne^{inx}$ for the function $f(x)=x$ on $(-\pi,\pi)$. Use some sort of computer graphics to plot a few of the partial sums of this series on the interval $[-3\pi,3\pi]$.

2. Use the Fourier series in problem 1 to find the exact value of the series
\[
\sum_{n=1}^{\infty}\frac{1}{n^2}, \qquad \sum_{n=1}^{\infty}\frac{1}{(2n-1)^2}
\]


3. Evaluate explicitly the Fourier series, justifying your steps:
\[
\sum_{n=1}^{\infty}\frac{n}{2^n}\cos(nx)
\]
(Suggestion: start by evaluating $\sum_{n=1}^{\infty}\frac{e^{inx}}{2^n}$, which is a geometric series.)

4. Produce a sketch of the Dirichlet and Fejér kernels $D_N$ and $K_N$, either by hand or by computer, for some reasonably large value of $N$.

5. Verify the first identity in (8.1.19).

6. We say that $f\in H^k(\mathbb{T})$ if $f\in\mathcal{D}'(\mathbb{T})$ and its Fourier coefficients $c_n$ satisfy
\[
\sum_{n=-\infty}^{\infty}n^{2k}|c_n|^2<\infty \tag{8.8.1}
\]
a) If $f\in H^1(\mathbb{T})$ show that $\sum_{n=-\infty}^{\infty}|c_n|$ is convergent and so the Fourier series of $f$ is uniformly convergent.

b) Show that $f\in H^k(\mathbb{T})$ for every $k$ if and only if $f\in C^\infty(\mathbb{T})$.

7. Evaluate the Fourier series
\[
\sum_{n=1}^{\infty}(-1)^n n\sin(nx)
\]
in $\mathcal{D}'(\mathbb{R})$. If possible, plot some partial sums of this series.

8. Find the Fourier transform of $H(x)e^{-\alpha x}$ for $\alpha>0$.

9. Let $f\in L^1(\mathbb{R}^N)$.

a) If $f_\lambda(x)=f(\lambda x)$ for $\lambda>0$, find a relationship between $\hat{f_\lambda}$ and $\hat f$.

b) If $f_h(x)=f(x-h)$ for $h\in\mathbb{R}^N$, find a relationship between $\hat{f_h}$ and $\hat f$.

10. If $f\in L^1(\mathbb{R}^N)$ show that $\tau_hf\to f$ in $L^1(\mathbb{R}^N)$ as $h\to 0$. (Hint: First prove it when $f$ is continuous and of compact support.)

11. Show that
\[
\int_{\mathbb{R}^N}\phi(x)\overline{\psi(x)}\,dx=\int_{\mathbb{R}^N}\hat\phi(x)\overline{\hat\psi(x)}\,dx \tag{8.8.2}
\]
for $\phi$ and $\psi$ in the Schwartz space. (This is also sometimes called the Parseval identity and leads even more directly to the Plancherel formula.)


12. Prove Lemma 8.1.

13. In this problem $J_n$ denotes the Bessel function of the first kind and of order $n$. It may be defined in various ways, one of which is
\[
J_n(z)=\frac{i^{-n}}{\pi}\int_{0}^{\pi}e^{iz\cos\theta}\cos(n\theta)\,d\theta \tag{8.8.3}
\]
Suppose that $f$ is a radially symmetric function in $L^1(\mathbb{R}^2)$, i.e. $f(x)=f(r)$ where $r=|x|$. Show that
\[
\hat f(y)=\int_{0}^{\infty}J_0(r|y|)\,f(r)\,r\,dr
\]
It follows in particular that $\hat f$ is also radially symmetric. Using the known identity $\frac{d}{dz}(zJ_1(z))=zJ_0(z)$, compute the Fourier transform of $\chi_{B(0,R)}$, the indicator function of the ball $B(0,R)$ in $\mathbb{R}^2$.

14. For $\alpha\in\mathbb{R}$ let $f_\alpha(x)=\cos\alpha x$.

a) Find the Fourier transform $\hat f_\alpha$.

b) Find $\lim_{\alpha\to 0}\hat f_\alpha$ and $\lim_{\alpha\to\infty}\hat f_\alpha$ in the sense of distributions.

15. Compute the Fourier transform of the Heaviside function $H(x)$ in yet a different way by justifying that
\[
\hat H=\lim_{n\to\infty}\hat H_n
\]
in the sense of distributions, where $H_n(x)=H(x)e^{-\frac{x}{n}}$, and then evaluating this limit.

16. Prove the remaining parts of Proposition 8.10.

17. Let $f\in C(\mathbb{R})$ be $2\pi$ periodic. It then has a Fourier series in the classical sense, but it also has a Fourier transform since $f$ is a tempered distribution. What is the relationship between the Fourier series and the Fourier transform?

18. Let $f\in L^2(\mathbb{R}^N)$. Show that $f$ is real valued if and only if $\hat f(-k)=\overline{\hat f(k)}$ for all $k\in\mathbb{R}^N$. What is the analog of this for Fourier series?

19. Let $f$ be a continuous $2\pi$ periodic function with the usual Fourier coefficients
\[
c_n=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(x)\,e^{-inx}\,dx
\]


Show that
\[
c_n=-\frac{1}{2\pi}\int_{-\pi}^{\pi}f\Big(x+\frac{\pi}{n}\Big)\,e^{-inx}\,dx
\]
and therefore
\[
c_n=\frac{1}{4\pi}\int_{-\pi}^{\pi}\Big(f(x)-f\Big(x+\frac{\pi}{n}\Big)\Big)\,e^{-inx}\,dx.
\]
If $f$ is Lipschitz continuous, use this to show that there exists a constant $M$ such that
\[
|c_n|\le\frac{M}{|n|} \qquad n\ne 0
\]

20. Let $R=(-1,1)\times(-1,1)$ be a square in $\mathbb{R}^2$, let $f$ be the indicator function of $R$ and $g$ be the indicator function of the complement of $R$.

a) Compute the Fourier transforms $\hat f$ and $\hat g$.

b) Is either $\hat f$ or $\hat g$ in $L^2(\mathbb{R}^2)$?

21. Verify the second limit in (8.7.22).

22. A distribution $T$ on $\mathbb{R}^N$ is even if $\check T=T$, and odd if $\check T=-T$. Prove that the Fourier transform of an even (resp. odd) tempered distribution is even (resp. odd).

23. Let $\phi\in\mathcal{S}(\mathbb{R})$, $\|\phi\|_{L^2(\mathbb{R})}=1$, and show that
\[
\left(\int_{-\infty}^{\infty}x^2|\phi(x)|^2\,dx\right)\left(\int_{-\infty}^{\infty}y^2|\hat\phi(y)|^2\,dy\right)\ge\frac14 \tag{8.8.4}
\]
This is a mathematical statement of the Heisenberg uncertainty principle. (Suggestion: start with the identity
\[
1=\int_{-\infty}^{\infty}|\phi(x)|^2\,dx=-\int_{-\infty}^{\infty}x\,\frac{d}{dx}|\phi(x)|^2\,dx
\]
Make sure to allow $\phi$ to be complex valued.) Show that equality is achieved in (8.8.4) if $\phi$ is a Gaussian.

24. Let $\theta(t)=\sum_{n=-\infty}^{\infty}e^{-\pi n^2t}$. (It is a particular case of a class of special functions known as theta functions.) Use the Poisson summation formula (8.7.26) to show that
\[
\theta(t)=\sqrt{\frac1t}\,\theta\Big(\frac1t\Big)
\]


25. Use (8.7.23) to obtain the Fourier transform of $\mathrm{pv}\,\frac1x$,
\[
\Big(\mathrm{pv}\,\frac1x\Big)^{\wedge}(y)=-i\sqrt{\frac{\pi}{2}}\,\mathrm{sgn}\,y \tag{8.8.5}
\]

26. The proof of Theorem 8.7 implicitly used the fact that if $\phi,\psi\in\mathcal{S}(\mathbb{R}^N)$ then $\phi*\psi\in\mathcal{S}(\mathbb{R}^N)$. Prove this property.

27. Where is the mistake in the following argument? If $u(x)=e^{-x}$ then $u'+u=0$, so by Fourier transformation
\[
iy\,\hat u(y)+\hat u(y)=(1+iy)\hat u(y)=0 \qquad y\in\mathbb{R}
\]
Since $1+iy\ne 0$ for real $y$, it follows that $\hat u(y)=0$ for all real $y$ and hence $u(x)=0$.

28. If $f\in L^2(\mathbb{R}^N)$, the autocorrelation function of $f$ is defined to be
\[
g(x)=(f*\check{\bar f})(x)=\int_{\mathbb{R}^N}f(y)\overline{f(y-x)}\,dy
\]
Show that $\hat g(y)=|\hat f(y)|^2$, $\hat g\in L^1(\mathbb{R}^N)$ and that $g\in C_0(\mathbb{R}^N)$. ($\hat g$ is called the power spectrum or spectral density of $f$.)

29. If $T\in\mathcal{D}'(\mathbb{T})$ and $c_n=\frac{1}{2\pi}T(e^{-inx})$, show that $T=\sum_{n=-\infty}^{\infty}c_ne^{inx}$ in $\mathcal{D}'(\mathbb{T})$.

30. The ODE $u''-xu=0$ is known as Airy's equation, and solutions of it are called Airy functions.

a) If $u$ is an Airy function which is also a tempered distribution, use the Fourier transform to find a first order ODE for $\hat u(y)$.

b) Find the general solution of the ODE for $\hat u$.

c) Obtain the formal solution formula
\[
u(x)=C\int_{-\infty}^{\infty}e^{ixy+iy^3/3}\,dy
\]
d) Explain why this formula is not meaningful as an ordinary integral, and how it can be properly interpreted.

e) Is this the general solution of the Airy equation?


Chapter 9

Distributions and Differential Equations

In this chapter we will begin to apply the theory of distributions developed in the previous chapter in a more systematic way to problems in differential equations. The modern theory of partial differential equations, and to a somewhat lesser extent ordinary differential equations, makes extensive use of the so-called Sobolev spaces, which we now proceed to introduce.

9.1 Weak derivatives and Sobolev spaces

If $f\in L^p(\Omega)$ then for any multi-index $\alpha$ we know that $D^\alpha f$ exists as an element of $\mathcal{D}'(\Omega)$, but in general the distributional derivative need not itself be a function. However if there exists $g\in L^q(\Omega)$ such that $D^\alpha f=T_g$ in $\mathcal{D}'(\Omega)$ then we say that $f$ has the weak $\alpha$ derivative $g$ in $L^q(\Omega)$. That is to say, the requirement is that
\[
\int_\Omega fD^\alpha\phi\,dx=(-1)^{|\alpha|}\int_\Omega g\phi\,dx \qquad \forall\phi\in\mathcal{D}(\Omega) \tag{9.1.1}
\]
and we write $D^\alpha f\in L^q(\Omega)$. It is important to distinguish the concept of weak derivative and almost everywhere (a.e.) derivative.

Example 9.1. Let $\Omega=(-1,1)$ and $f(x)=|x|$. Obviously $f\in L^p(\Omega)$ for any $1\le p\le\infty$, and in the sense of distributions we have $f'(x)=2H(x)-1$ (use, for example, (7.3.27)). Thus $f'\in L^q(\Omega)$ for any $1\le q\le\infty$. On the other hand $f''=2\delta$, which does not coincide with $T_g$ for any $g$ in any $L^q$ space. Thus $f$ has the weak first derivative, but not the weak second derivative, in $L^q(\Omega)$ for any $q$. The first derivative of $f$ coincides with its a.e. derivative. In the case of the second derivative, $f''=2\delta$ in the sense of distributions, and obviously $f''=0$ a.e., but this function does not coincide with the weak second derivative; indeed there is no weak second derivative according to the above definition.
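The defining identity (9.1.1) for the weak first derivative of $|x|$ can be checked symbolically. The short sympy sketch below is only an illustration: the polynomial $\phi(x)=x(1-x^2)^3$, which vanishes at $x=\pm1$, stands in for a genuine test function, and both sides of (9.1.1) come out equal to $-1/4$.

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = sp.Piecewise((-x, x < 0), (x, True))      # f(x) = |x|
g = sp.Piecewise((-1, x < 0), (1, True))      # candidate weak derivative 2H(x) - 1 = sgn(x)
phi = x * (1 - x**2)**3                       # vanishes at x = +-1; stands in for a test function

lhs = sp.integrate(f * sp.diff(phi, x), (x, -1, 1))   # integral of f * phi'
rhs = -sp.integrate(g * phi, (x, -1, 1))              # minus integral of g * phi
print(lhs, rhs)                                       # both print -1/4
```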

We may now define the spaces $W^{k,p}(\Omega)$, known as Sobolev spaces.

Definition 9.1. If $\Omega\subset\mathbb{R}^N$ is an open set, $1\le p\le\infty$ and $k=1,2,\dots$ then
\[
W^{k,p}(\Omega):=\{f\in\mathcal{D}'(\Omega):D^\alpha f\in L^p(\Omega)\ \text{for}\ |\alpha|\le k\} \tag{9.1.2}
\]
We emphasize that the meaning of the condition $D^\alpha f\in L^p(\Omega)$ is that $f$ should have the weak $\alpha$ derivative in $L^p(\Omega)$ as discussed above. Clearly
\[
\mathcal{D}(\Omega)\subset W^{k,p}(\Omega)\subset L^p(\Omega) \tag{9.1.3}
\]
so that $W^{k,p}(\Omega)$ is always a dense subspace of $L^p(\Omega)$ for $1\le p<\infty$.

Example 9.2. If $f(x)=|x|$ then, referring to the discussion in the previous example, we see that $f\in W^{1,p}(-1,1)$ for any $p\in[1,\infty]$, but $f\notin W^{2,p}$ for any $p$.

It may be readily checked that $W^{k,p}(\Omega)$ is a normed linear space with norm
\[
\|f\|_{W^{k,p}(\Omega)}=\begin{cases}\Big(\sum_{|\alpha|\le k}\|D^\alpha f\|_{L^p(\Omega)}^p\Big)^{\frac1p} & 1\le p<\infty\\[4pt] \max_{|\alpha|\le k}\|D^\alpha f\|_{L^\infty(\Omega)} & p=\infty\end{cases} \tag{9.1.4}
\]
Furthermore, the necessary completeness property can be shown (Exercise 5, or see Theorem 9.1 below) so that $W^{k,p}(\Omega)$ is a Banach space. When $p=2$ the norm may be regarded as arising from the inner product
\[
\langle f,g\rangle=\sum_{|\alpha|\le k}\int_\Omega D^\alpha f(x)\,D^\alpha g(x)\,dx \tag{9.1.5}
\]
so that it is a Hilbert space. The alternative notation $H^k(\Omega)$ is commonly used in place of $W^{k,2}(\Omega)$.

There is a second natural way to give meaning to the idea of a function $f\in L^p(\Omega)$ having a derivative in an $L^q$ space, which is as follows: if there exist $g\in L^q(\Omega)$ and a sequence $f_n\in C^\infty(\Omega)$ satisfying $f_n\to f$ in $L^p(\Omega)$ and $D^\alpha f_n\to g$ in $L^q(\Omega)$, then we say $f$ has the strong $\alpha$ derivative $g$ in $L^q(\Omega)$.

It is elementary to see that a strong derivative is also a weak derivative -- we simply let $n\to\infty$ in the identity
\[
\int_\Omega D^\alpha f_n\,\phi\,dx=(-1)^{|\alpha|}\int_\Omega f_n\,D^\alpha\phi\,dx \tag{9.1.6}
\]
for any test function $\phi$. Far more interesting is that when $p<\infty$ the converse statement is also true, that is, weak=strong. This important result, which shall not be proved here, was first established by Friedrichs [12] in some special situations, and then in full generality by Meyers and Serrin [23]. A more thorough discussion may be found, for example, in Chapter 3 of Adams [1]. The key idea is to use convolution, as in Theorem 7.5, to obtain the needed sequence $f_n$ of $C^\infty$ functions. For $f\in W^{k,p}(\Omega)$ the approximating sequence may clearly be supposed to belong to $C^\infty(\Omega)\cap W^{k,p}(\Omega)$, so this space is dense in $W^{k,p}(\Omega)$ and we have

Theorem 9.1. For any open set $\Omega\subset\mathbb{R}^N$, $1\le p<\infty$ and $k=0,1,2,\dots$ the Sobolev space $W^{k,p}(\Omega)$ coincides with the closure of $C^\infty(\Omega)\cap W^{k,p}(\Omega)$ in the $W^{k,p}(\Omega)$ norm.

We now define another class of Sobolev spaces which will be important for later use.

Definition 9.2. For $\Omega\subset\mathbb{R}^N$, $W^{k,p}_0(\Omega)$ is defined to be the closure of $C_0^\infty(\Omega)$ in the $W^{k,p}(\Omega)$ norm.

Obviously $W^{k,p}_0(\Omega)\subset W^{k,p}(\Omega)$, but it may not be immediately clear whether these are actually the same space. In fact this is certainly true when $k=0$, since in this case we know $C_0^\infty(\Omega)$ is dense in $L^p(\Omega)$, $1\le p<\infty$. It also turns out to be correct for any $k,p$ when $\Omega=\mathbb{R}^N$ (see Corollary 3.19 of Adams [1]). But in general the inclusion is strict, and $f\in W^{k,p}_0(\Omega)$ carries the interpretation that $D^\alpha f=0$ on $\partial\Omega$ for $|\alpha|\le k-1$. This topic will be continued in more detail in Chapter ( ).

9.2 Differential equations in $\mathcal{D}'$

If we consider the simplest differential equation $u'=f$ on an interval $(a,b)\subset\mathbb{R}$, then from elementary calculus we know that if $f$ is continuous on $[a,b]$, then every solution is of the form $u(x)=\int_a^xf(y)\,dy+C$, for some constant $C$. Furthermore in this case $u\in C^1([a,b])$, $u'(x)=f(x)$ for every $x\in(a,b)$, and we would refer to $u$ as a classical solution of $u'=f$. If we make the weaker assumption that $f\in L^1(a,b)$ then we can no longer expect $u$ to be $C^1$ or $u'(x)=f(x)$ to hold at every point, since $f$ itself is only defined up to sets of measure zero. If, however, we let $u(x)=\int_a^xf(y)\,dy+C$ then it is an important result of measure theory that $u'(x)=f(x)$ a.e. on $(a,b)$. The question remains whether all solutions of $u'=f$ are of this form, and the answer must now depend on precisely what is meant by 'solution'. If we were to interpret the differential equation as meaning $u'=f$ a.e. then the answer is no. For example $u(x)=H(x)$ is a nonconstant function on $(-1,1)$ with $u'(x)=0$ for $x\ne 0$. An alternative meaning is that the differential equation should be satisfied in the sense of distributions on $(a,b)$, in which case we have the following theorem.

Theorem 9.2. Let $f\in L^1(a,b)$.

a) If $F(x)=\int_a^xf(y)\,dy$ then $F'=f$ in $\mathcal{D}'(a,b)$.

b) If $u'=f$ in $\mathcal{D}'(a,b)$, then there exists a constant $C$ such that
\[
u(x)=\int_a^xf(y)\,dy+C \qquad a<x<b \tag{9.2.1}
\]

Proof: If $F(x)=\int_a^xf(y)\,dy$, then for any $\phi\in C_0^\infty(a,b)$ we have
\begin{align}
F'(\phi)=-F(\phi')&=-\int_a^bF(x)\phi'(x)\,dx \tag{9.2.2}\\
&=-\int_a^b\left(\int_a^xf(y)\,dy\right)\phi'(x)\,dx \tag{9.2.3}\\
&=-\int_a^bf(y)\left(\int_y^b\phi'(x)\,dx\right)dy \tag{9.2.4}\\
&=\int_a^bf(y)\phi(y)\,dy=f(\phi) \tag{9.2.5}
\end{align}
Here the interchange of order of integration in the third line is easily justified by Fubini's theorem. This proves part a).

Now if $u'=f$ in the distributional sense then $T=u-F$ satisfies $T'=0$ in $\mathcal{D}'(a,b)$, and we will finish by showing that $T$ must be a constant. Choose $\phi_0\in C_0^\infty(a,b)$ such that $\int_a^b\phi_0(y)\,dy=1$. If $\phi\in C_0^\infty(a,b)$, set
\[
\psi(x)=\phi(x)-\left(\int_a^b\phi(y)\,dy\right)\phi_0(x) \tag{9.2.6}
\]
so that $\psi\in C_0^\infty(a,b)$ and $\int_a^b\psi(x)\,dx=0$. Let
\[
\zeta(x)=\int_a^x\psi(y)\,dy \tag{9.2.7}
\]
Obviously $\zeta\in C^\infty(a,b)$ since $\zeta'=\psi$, but in fact $\zeta\in C_0^\infty(a,b)$ since $\zeta(a)=\zeta(b)=0$ and $\zeta'=\psi\equiv 0$ in some neighborhood of $a$ and of $b$. Finally it follows, since $T'=0$, that
\[
0=T'(\zeta)=-T(\zeta')=-T(\psi)=\left(\int_a^b\phi(y)\,dy\right)T(\phi_0)-T(\phi) \tag{9.2.8}
\]
or equivalently $T(\phi)=\int_a^bC\phi(y)\,dy$ where $C=T(\phi_0)$. Thus $T$ is the distribution corresponding to the constant function $C$. $\Box$

We emphasize that part b) of this theorem is of interest, and not obvious, even when $f=0$: any distribution whose distributional derivative on some interval is zero must be a constant distribution on that interval. Therefore, any distribution is uniquely determined up to an additive constant by its distributional derivative, which, to repeat, is not the case for the a.e. derivative.

Now let $\Omega\subset\mathbb{R}^N$ be an open set and
\[
Lu=\sum_{|\alpha|\le m}a_\alpha(x)D^\alpha u \tag{9.2.9}
\]
be a differential operator of order $m$. We assume that $a_\alpha\in C^\infty(\Omega)$, in which case $Lu\in\mathcal{D}'(\Omega)$ is well defined for any $u\in\mathcal{D}'(\Omega)$. We will use the following terminology for the rest of this chapter.

Definition 9.3. If $f\in\mathcal{D}'(\Omega)$ then

• $u$ is a classical solution of $Lu=f$ in $\Omega$ if $u\in C^m(\Omega)$ and $Lu(x)=f(x)$ for every $x\in\Omega$.

• $u$ is a weak solution of $Lu=f$ in $\Omega$ if $u\in L^1_{loc}(\Omega)$ and $Lu=f$ in $\mathcal{D}'(\Omega)$.

• $u$ is a distributional solution of $Lu=f$ in $\Omega$ if $u\in\mathcal{D}'(\Omega)$ and $Lu=f$ in $\mathcal{D}'(\Omega)$.

It is clear that a classical solution is also a weak solution, and a weak solution is a distributional solution. The converse statements are false in general, but may be true in special cases. For example, we have proved above that any distributional solution of $u'=0$ must be constant, hence in particular any distributional solution of this differential equation is actually a classical solution. On the other hand $u=\delta$ is a distributional solution of $x^2u'=0$, but is not a classical or weak solution. Of course a classical solution cannot exist if $f$ is not continuous on $\Omega$. A theorem which says that any solution of a certain differential equation must be smoother than what is actually needed for the definition of solution is called a regularity result. Regularity theory is a large and important research topic within the general area of differential equations.

Example 9.3. Let $Lu=u_{xx}-u_{yy}$. If $F,G\in C^2(\mathbb{R})$ and $u(x,y)=F(x+y)+G(x-y)$ then we know $u$ is a classical solution of $Lu=0$. We have also observed, in Example 7.12, that if $F,G\in L^1_{loc}(\mathbb{R})$ then $Lu=0$ in the sense of distributions, thus $u$ is a weak solution of $Lu=0$ according to the above definition. The equation has distributional solutions also, which are not weak solutions, for example the singular distribution $T$ defined by $T(\phi)=\int_{-\infty}^{\infty}\phi(x,x)\,dx$ in Exercise 11 of Chapter 7.

Example 9.4. If $Lu=u_{xx}+u_{yy}$ then it turns out that all solutions of $Lu=0$ are classical solutions; in fact, any distributional solution must be in $C^\infty(\Omega)$. This is an example of a very important kind of regularity result in PDE theory, and will not be proved here; see for example Corollary 2.20 of [11]. The difference between Laplace's equation and the wave equation, i.e. that Laplace's equation has only classical solutions, while the wave equation has many non-classical solutions, is a typical difference between solutions of PDEs of elliptic and hyperbolic types.

9.3 Fundamental solutions

Let $\Omega\subset\mathbb{R}^N$, $L$ be a differential operator as in (9.2.9), and suppose $G(x,y)$ has the following properties¹:
\[
G(\cdot,y)\in\mathcal{D}'(\Omega), \qquad L_xG(x,y)=\delta(x-y) \quad \forall y\in\Omega \tag{9.3.1}
\]
¹The subscript $x$ in $L_x$ is used here to emphasize that the differential operator is acting in the $x$ variable, with $y$ in the role of a parameter.


We then call $G$ a fundamental solution of $L$ in $\Omega$. If such a $G$ can be found, then formally, if we let
\[
u(x)=\int_\Omega G(x,y)f(y)\,dy \tag{9.3.2}
\]
we may expect that
\[
Lu(x)=\int_\Omega L_xG(x,y)f(y)\,dy=\int_\Omega\delta(x-y)f(y)\,dy=f(x) \tag{9.3.3}
\]

That is to say, (9.3.2) provides a way to obtain solutions of the PDE $Lu=f$, and perhaps also a tool to analyze specific properties of solutions. We are of course ignoring here all questions of rigorous justification -- whether the formula for $u$ even makes sense if $G$ is only a distribution in $x$, for what class of $f$'s this might be so, and whether it is permissible to differentiate under the integral to obtain (9.3.3). A more advanced PDE text such as Hormander [16] may be consulted for such study. Fundamental solutions are not unique in general, since we could always add to $G$ any function $H(x,y)$ satisfying the homogeneous equation $L_xH=0$ for fixed $y$.

We will focus now on the case that $\Omega=\mathbb{R}^N$ and $a_\alpha(x)\equiv a_\alpha$ for every $\alpha$, i.e. $L$ is a constant coefficient operator. In this case, if we can find $\Phi\in\mathcal{D}'(\mathbb{R}^N)$ for which $L\Phi=\delta$, then $G(x,y)=\Phi(x-y)$ is a fundamental solution according to the above definition, and it is normal in this situation to refer to $\Phi$ itself as the fundamental solution rather than $G$.

Formally, the solution formula (9.3.2) becomes
\[
u(x)=\int_{\mathbb{R}^N}\Phi(x-y)f(y)\,dy \tag{9.3.4}
\]
an integral operator of convolution type. Again it may not be clear if this makes sense as an ordinary integral, but recall that we have earlier defined (Definition 7.7) the convolution of an arbitrary distribution and test function, namely
\[
u(x)=(\Phi*f)(x):=\Phi(\tau_x\check f) \tag{9.3.5}
\]
if $\Phi\in\mathcal{D}'(\mathbb{R}^N)$ and $f\in C_0^\infty(\mathbb{R}^N)$. Furthermore, using Theorem 7.3, it follows that $u\in C^\infty(\mathbb{R}^N)$ and
\[
Lu(x)=((L\Phi)*f)(x)=(\delta*f)(x)=f(x) \tag{9.3.6}
\]
We have therefore proved

Proposition 9.1. If there exists $\Phi\in\mathcal{D}'(\mathbb{R}^N)$ such that $L\Phi=\delta$, then for any $f\in C_0^\infty(\mathbb{R}^N)$ the function $u=\Phi*f$ is a classical solution of $Lu=f$.


It will essentially always be the case that the solution formula $u=\Phi*f$ is actually valid for a much larger class of $f$'s than $C_0^\infty(\mathbb{R}^N)$, but this will depend on specific properties of the fundamental solution $\Phi$, which in turn depend on those of the original operator $L$.

Example 9.5. If $L=\Delta$, the Laplacian operator in $\mathbb{R}^3$, then we have already shown (Example 7.13) that $\Phi(x)=-1/4\pi|x|$ satisfies $\Delta\Phi=\delta$ in the sense of distributions on $\mathbb{R}^3$. Thus
\[
u(x)=\left(-\frac{1}{4\pi|x|}*f\right)(x)=-\frac{1}{4\pi}\int_{\mathbb{R}^3}\frac{f(y)}{|x-y|}\,dy \tag{9.3.7}
\]
provides a solution of $\Delta u=f$ in $\mathbb{R}^3$, at least when $f\in C_0^\infty(\mathbb{R}^3)$. The integral on the right in (9.3.7) is known as the Newtonian potential of $f$, and can be shown to be a valid solution formula for a much larger class of $f$'s. It is in any case always a 'candidate' solution, which can be analyzed directly. A fundamental solution of the Laplacian exists in $\mathbb{R}^N$ for any dimension, and will be recalled at the end of this section.

Example 9.6. Consider the wave operator $Lu=u_{tt}-u_{xx}$ in $\mathbb{R}^2$. A fundamental solution for $L$ (see Exercise 9) is
\[
\Phi(x,t)=\frac12H(t-|x|) \tag{9.3.8}
\]
The support of $\Phi$, namely the set $\{(x,t):|x|<t\}$, is in this context known as the forward light cone, representing the set of points $x$ which, for fixed $t>0$, may have been reached by a signal emanating from the origin $x=0$ at time $t=0$ and travelling with speed one.

The resulting solution formula for $Lu=f$ may then be obtained as
\begin{align}
u(x,t)&=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\Phi(x-y,t-s)f(y,s)\,dy\,ds \tag{9.3.9}\\
&=\frac12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}H(t-s-|x-y|)f(y,s)\,dy\,ds \tag{9.3.10}\\
&=\frac12\int_{-\infty}^{t}\int_{x-t+s}^{x+t-s}f(y,s)\,dy\,ds \tag{9.3.11}
\end{align}

In many cases of interest $f(x,t)\equiv 0$ for $t<0$, in which case we replace the lower limit in the $s$ integral by $0$. In any case the region over which $f$ is integrated is the 'backward' light cone, with vertex at $(x,t)$. Under this support assumption on $f$ it also follows that $u(x,0)=u_t(x,0)\equiv 0$, so by adding in the corresponding terms in D'Alembert's solution (2.3.46) we find that
\[
u(x,t)=\frac12\int_{0}^{t}\int_{x-t+s}^{x+t-s}f(y,s)\,dy\,ds+\frac12\big(h(x+t)+h(x-t)\big)+\frac12\int_{x-t}^{x+t}g(s)\,ds \tag{9.3.12}
\]


is the unique solution of
\begin{align}
u_{tt}-u_{xx}&=f(x,t) \qquad x\in\mathbb{R},\ t>0 \tag{9.3.13}\\
u(x,0)&=h(x) \qquad x\in\mathbb{R} \tag{9.3.14}\\
u_t(x,0)&=g(x) \qquad x\in\mathbb{R} \tag{9.3.15}
\end{align}

It is of interest to note that this solution formula could also be written, formally at least, as
\[
u(x,t)=(\Phi*f)(x,t)+\frac{\partial}{\partial t}(\Phi*_{(x)}h)(x,t)+(\Phi*_{(x)}g)(x,t) \tag{9.3.16}
\]
where the notation $(\Phi*_{(x)}h)$ indicates that the convolution takes place in $x$ only, with $t$ as a parameter. Thus the fundamental solution $\Phi$ enters into the solution not only of the inhomogeneous equation $Lu=f$ but in solving the Cauchy problem as well. This is not an accidental feature, and we will see other instances of this sort of thing later.

So far we have seen a couple of examples where an explicit fundamental solution is known, but have given no indication of a general method for finding it, or even determining if a fundamental solution exists. Let us address the second issue first, by stating without proof a remarkable theorem.

Theorem 9.3. (Malgrange-Ehrenpreis) If $L\ne 0$ is any constant coefficient linear differential operator then there exists a fundamental solution of $L$.

The proof of this theorem is well beyond the scope of this book; see for example Theorem 8.5 of [30] or Theorem 10.2.1 of [16]. The assumption of constant coefficients is essential here; counterexamples are known otherwise.

If we now consider how it might be possible to compute a fundamental solution for a given operator $L$, it soon becomes apparent that the Fourier transform may be a useful tool. If we start with the distributional PDE
\[
L\Phi=\sum_{|\alpha|\le m}a_\alpha D^\alpha\Phi=\delta \tag{9.3.17}
\]
and take the Fourier transform of both sides, the result is
\[
\sum_{|\alpha|\le m}a_\alpha(D^\alpha\Phi)^{\wedge}=\sum_{|\alpha|\le m}a_\alpha(iy)^\alpha\hat\Phi=\frac{1}{(2\pi)^{N/2}} \tag{9.3.18}
\]
or
\[
P(y)\hat\Phi(y)=1 \tag{9.3.19}
\]


where $P(y)$, the so-called symbol or characteristic polynomial of $L$, is defined as
\[
P(y)=(2\pi)^{N/2}\sum_{|\alpha|\le m}(iy)^\alpha a_\alpha \tag{9.3.20}
\]

Note it was implicitly assumed here that $\hat\Phi$ exists, which would be the case if $\Phi$ were a tempered distribution, but this is not actually guaranteed by Theorem 9.3. This is a rather technical issue which we will not discuss here, but rather take the point of view that we seek a formal solution which, potentially, further analysis may show is a bona fide fundamental solution.

We have thus obtained $\hat\Phi(y)=1/P(y)$, or by the inversion theorem
\[
\Phi(x)=\frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}\frac{1}{P(y)}\,e^{ix\cdot y}\,dy \tag{9.3.21}
\]

as a candidate for a fundamental solution of $L$. One particular source of difficulty in making sense of the inverse transform of $1/P$ is that in general $P$ has zeros, which might be of arbitrarily high order, making the integrand too singular to have meaning in any ordinary sense. On the other hand, we have seen, at least in one dimension, how well-defined distributions of the 'pseudo-function' type may be associated with non-locally integrable functions such as $1/x^m$. Thus there may be some analogous construction in more than one dimension as well. This is in fact one possible means to proving the Malgrange-Ehrenpreis theorem.

It also suggests that the situation may be somewhat easier to deal with if the zero set of $P$ in $\mathbb{R}^N$ is empty, or at least not very large. As a polynomial, of course, $P$ always has zeros, but some or all of these could be complex, whereas the obstructions to making sense of (9.3.21) pertain to the real zeros of $P$ only. If $L$ is a constant coefficient differential operator of order $m$ as above, define
\[
P_m(y)=(2\pi)^{N/2}\sum_{|\alpha|=m}(iy)^\alpha a_\alpha \tag{9.3.22}
\]
which is known as the principal symbol of $L$.

Definition 9.4. We say that $L$ is elliptic if $y\in\mathbb{R}^N$, $P_m(y)=0$ implies that $y=0$.

That is to say, the principal symbol has no nonzero real roots. For example the Laplacian operator $L=\Delta$ is elliptic, as is $\Delta$ plus lower order terms, since either way $P_2(y)=-|y|^2$. On the other hand, the wave operator, written say as $Lu=\Delta u-u_{x_{N+1}x_{N+1}}$, is not elliptic, since the principal symbol is $P_2(y)=y_{N+1}^2-\sum_{j=1}^{N}y_j^2$.

The following is not so difficult to establish (Exercise 16), and may be exploited in working with the representation (9.3.21) in the elliptic case.

Proposition 9.2. If $L$ is elliptic then
\[
\{y\in\mathbb{R}^N:P(y)=0\} \tag{9.3.23}
\]
the real zero set of $P$, is compact in $\mathbb{R}^N$, and $\lim_{|y|\to\infty}|P(y)|=\infty$.
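When the full symbol has no real zeros at all, (9.3.21) can be evaluated numerically without any special interpretation. The Python sketch below (an informal illustration; the value of $\lambda$ and the truncation of the $y$-integral are arbitrary choices) does this for $Lu=u''-\lambda u$ in one dimension, whose symbol according to (9.3.20) is $P(y)=-\sqrt{2\pi}\,(y^2+\lambda)$, and compares the result with the directly verifiable fundamental solution $-e^{-\sqrt\lambda|x|}/(2\sqrt\lambda)$.

```python
import numpy as np

lam = 2.0
y = np.linspace(-2000.0, 2000.0, 2_000_001)
dy = y[1] - y[0]
P = -np.sqrt(2 * np.pi) * (y**2 + lam)        # symbol of Lu = u'' - lam*u, per (9.3.20)

for x in (0.0, 0.5, 1.0, 2.0):
    Phi = np.sum(np.exp(1j * x * y) / P) * dy / np.sqrt(2 * np.pi)   # quadrature of (9.3.21)
    exact = -np.exp(-np.sqrt(lam) * abs(x)) / (2 * np.sqrt(lam))
    print(x, Phi.real, exact)                  # agreement to roughly three decimal places
```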

We will next derive a fundamental solution for the heat equation by using the Fourier transform, although in a slightly different way from the above discussion. Consider first the initial value problem for the heat equation
\begin{align}
u_t-\Delta u&=0 \qquad x\in\mathbb{R}^N,\ t>0 \tag{9.3.24}\\
u(x,0)&=h(x) \qquad x\in\mathbb{R}^N \tag{9.3.25}
\end{align}
with $h\in C_0^\infty(\mathbb{R}^N)$. Assuming a solution exists, define the Fourier transform in the $x$ variables,
\[
\hat u(y,t)=\frac{1}{(2\pi)^{N/2}}\int_{\mathbb{R}^N}u(x,t)\,e^{-ix\cdot y}\,dx \tag{9.3.26}
\]

Taking the partial derivative with respect to $t$ of both sides gives $(\hat u)_t=(u_t)^{\wedge}$, so by the usual Fourier transformation calculation rules,
\[
(u_t)^{\wedge}=(\hat u)_t=-|y|^2\hat u \tag{9.3.27}
\]
and $\hat u(y,0)=\hat h(y)$. We may regard this as an ODE in $t$ satisfied by $\hat u(y,t)$ for fixed $y$, for which the solution obtained by elementary means is
\[
\hat u(y,t)=e^{-|y|^2t}\,\hat h(y) \tag{9.3.28}
\]
If we let $\Phi$ be such that $\hat\Phi(y,t)=\frac{1}{(2\pi)^{N/2}}e^{-|y|^2t}$ then by Theorem 8.8 it follows that
\[
u(x,t)=(\Phi*_{(x)}h)(x,t) \tag{9.3.29}
\]
Since $\hat\Phi$ is a Gaussian in $y$, the same is true for $\Phi$ itself as a function of $x$, as long as $t>0$, and from (8.4.7) we get
\[
\Phi(x,t)=H(t)\,\frac{e^{-\frac{|x|^2}{4t}}}{(4\pi t)^{N/2}} \tag{9.3.30}
\]


By including the $H(t)$ factor we have for later convenience defined $\Phi(x,t)=0$ for $t<0$. Thus we get an integral representation for the solution of (9.3.24)-(9.3.25), namely
\[
u(x,t)=\int_{\mathbb{R}^N}\Phi(x-y,t)h(y)\,dy=\frac{1}{(4\pi t)^{N/2}}\int_{\mathbb{R}^N}e^{-\frac{|x-y|^2}{4t}}\,h(y)\,dy \tag{9.3.31}
\]
valid for $x\in\mathbb{R}^N$ and $t>0$. As usual, although this was derived for convenience under very restrictive conditions on $h$, it is actually valid much more generally (see Exercise 12).
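For Gaussian initial data the convolution (9.3.31) can be carried out in closed form, which makes for an easy numerical check of the representation. The Python sketch below (an informal check with arbitrarily chosen parameters $a$ and $t$) compares quadrature of (9.3.31) in one dimension, with $h(x)=e^{-x^2/4a}$, against the exact value $\sqrt{a/(a+t)}\,e^{-x^2/4(a+t)}$.

```python
import numpy as np

a, t = 0.5, 0.3
y = np.linspace(-30.0, 30.0, 200001)
dy = y[1] - y[0]
h = np.exp(-y**2 / (4 * a))                                 # Gaussian initial data

for x in (0.0, 1.0, 2.5):
    kernel = np.exp(-(x - y)**2 / (4 * t)) / np.sqrt(4 * np.pi * t)
    u = np.sum(kernel * h) * dy                             # formula (9.3.31) with N = 1
    exact = np.sqrt(a / (a + t)) * np.exp(-x**2 / (4 * (a + t)))
    print(x, u, exact)                                      # the two columns agree to many digits
```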

Now to derive a solution formula for $u_t-\Delta u=f$, let $v=v(x,t;s)$ be the solution of (9.3.24)-(9.3.25) with $h(x)$ replaced by $f(x,s)$, regarding $s$ for the moment as a parameter, and define
\[
u(x,t)=\int_0^t v(x,t-s;s)\,ds \tag{9.3.32}
\]
Assuming that $f$ is sufficiently regular, it follows that
\begin{align}
u_t(x,t)&=v(x,0;t)+\int_0^t v_t(x,t-s;s)\,ds \tag{9.3.33}\\
&=f(x,t)+\int_0^t\Delta v(x,t-s;s)\,ds \tag{9.3.34}\\
&=f(x,t)+\Delta u(x,t) \tag{9.3.35}
\end{align}

Inserting the formula (9.3.31) with $h$ replaced by $f(\cdot,s)$ gives
\[
u(x,t)=(\Phi*f)(x,t)=\int_0^t\int_{\mathbb{R}^N}\Phi(x-y,t-s)f(y,s)\,dy\,ds \tag{9.3.36}
\]
with $\Phi$ given again by (9.3.30). Strictly speaking, we should assume that $f(x,t)\equiv 0$ for $t<0$ in order that the integral on the right in (9.3.36) coincide with the convolution in $\mathbb{R}^{N+1}$, but this is without loss of generality, since we only seek to solve the PDE for $t>0$. The procedure used above for obtaining the solution of the inhomogeneous PDE starting with the solution of a corresponding initial value problem is known as Duhamel's method, and is generally applicable, with suitable modifications, for time dependent PDEs in which the coefficients are independent of time.

Since $u(x,t)$ in (9.3.32) evidently satisfies $u(x,0)\equiv 0$, it follows (compare to (9.3.16)) that
\[
u(x,t)=(\Phi*_{(x)}h)(x,t)+(\Phi*f)(x,t) \tag{9.3.37}
\]
is a solution² of
\begin{align}
u_t-\Delta u&=f(x,t) \qquad x\in\mathbb{R}^N,\ t>0 \tag{9.3.38}\\
u(x,0)&=h(x) \qquad x\in\mathbb{R}^N \tag{9.3.39}
\end{align}

Let us also observe here that if
\[
F(x)=\frac{1}{(4\pi)^{N/2}}\,e^{-\frac{|x|^2}{4}} \tag{9.3.40}
\]
then $F\ge 0$, $\int_{\mathbb{R}^N}F(x)\,dx=1$, and
\[
\Phi(x,t)=\Big(\frac{1}{\sqrt t}\Big)^N F\Big(\frac{x}{\sqrt t}\Big) \tag{9.3.41}
\]
for $t>0$. From Theorem 7.2, and the observation that a sequence of the form (7.3.11) satisfies the assumptions of that theorem, it follows that $n^NF(nx)\to\delta$ in $\mathcal{D}'(\mathbb{R}^N)$ as $n\to\infty$. Choosing $n=\frac{1}{\sqrt t}$ we conclude that
\[
\lim_{t\to 0^+}\Phi(\cdot,t)=\delta \quad\text{in } \mathcal{D}'(\mathbb{R}^N) \tag{9.3.42}
\]
In particular $\lim_{t\to 0^+}(\Phi*_{(x)}h)(x,t)=h(x)$ for all $x\in\mathbb{R}^N$, at least when $h\in C_0^\infty(\mathbb{R}^N)$.

We conclude this section by collecting all in one place a number of important fundamental solutions. Some of these have been discussed already, some will be left for the exercises, and in several other cases we will be content with a reference.

Laplace operator

For $L=\Delta$ in $\mathbb{R}^N$ there exist the following fundamental solutions³:
\[
\Phi(x)=\begin{cases}\dfrac{|x|}{2} & N=1\\[4pt] \dfrac{1}{2\pi}\log|x| & N=2\\[4pt] \dfrac{C_N}{|x|^{N-2}} & N\ge 3\end{cases} \tag{9.3.43}
\]
²Note we do not say 'the solution' here; in fact the solution is not unique without further restrictions.
³Some texts will use consistently the fundamental solution of $-\Delta$ rather than $\Delta$, in which case all of the signs will be reversed.


where
\[
C_N=\frac{1}{(2-N)\,\Omega_{N-1}}, \qquad \Omega_{N-1}=\int_{|x|=1}dS(x) \tag{9.3.44}
\]
Thus $C_N$ is a geometric constant, related to the area of the unit sphere in $\mathbb{R}^N$ -- an equivalent formula in terms of the volume of the unit ball in $\mathbb{R}^N$ is also commonly used. Of the various cases, $N=1$ is elementary to check, $N=2$ is requested in Exercise 20 of Chapter 7, and we have done the $N\ge 3$ case in Example 7.13.

Heat operator

For the heat operator $L=\frac{\partial}{\partial t}-\Delta$ in $\mathbb{R}^{N+1}$, we have derived earlier in this section the fundamental solution
\[
\Phi(x,t)=H(t)\,\frac{e^{-\frac{|x|^2}{4t}}}{(4\pi t)^{N/2}} \tag{9.3.45}
\]
for all $N$.

Wave operator

For the wave operator $L=\frac{\partial^2}{\partial t^2}-\Delta$ in $\mathbb{R}^{N+1}$, the fundamental solution is again significantly dependent on $N$. The cases of $N=1,2,3$ are as follows:
\[
\Phi(x,t)=\begin{cases}\frac12H(t-|x|) & N=1\\[4pt] \dfrac{1}{2\pi}\,\dfrac{H(t-|x|)}{\sqrt{t^2-|x|^2}} & N=2\\[4pt] \dfrac{\delta(t-|x|)}{4\pi|x|} & N=3\end{cases} \tag{9.3.46}
\]

We have discussed the $N=1$ case earlier in this section, and refer to [10] or [18] for the cases $N=2,3$. As a distribution, the meaning of the fundamental solution in the $N=3$ case is just what one expects from the formal expression, namely
\[
\Phi(\phi)=\int_{\mathbb{R}^3}\int_{-\infty}^{\infty}\frac{\delta(t-|x|)}{4\pi|x|}\,\phi(x,t)\,dt\,dx=\int_{\mathbb{R}^3}\frac{\phi(x,|x|)}{4\pi|x|}\,dx \tag{9.3.47}
\]
for any test function $\phi$. Note the tendency for the fundamental solution to become more and more singular as $N$ increases. This pattern persists in higher dimensions, as the fundamental solution starts to contain expressions involving $\delta'$ and higher derivatives of the $\delta$ function.


Schrodinger operator

The Schrodinger operator is defined as $L=\frac{\partial}{\partial t}-i\Delta$ in $\mathbb{R}^{N+1}$. The derivation of a fundamental solution here is nearly the same as for the heat equation, the result being
\[
\Phi(x,t)=H(t)\,\frac{e^{-\frac{|x|^2}{4it}}}{(4\pi it)^{N/2}} \tag{9.3.48}
\]
In quantum mechanics $\Phi$ is frequently referred to as the 'propagator'. See [26] for much material about the Schrodinger equation.

Helmholtz operator

The Helmholtz operator is defined by $Lu=\Delta u-\lambda u$. For $\lambda>0$ and dimensions $N=1,2,3$ fundamental solutions are
\[
\Phi(x)=\begin{cases}\dfrac{\sin(\sqrt{\lambda}\,|x|)}{2\sqrt{\lambda}} & N=1\\[4pt] \dfrac{\sqrt{\lambda}}{2\pi}\,K_0(\sqrt{\lambda}\,|x|) & N=2\\[4pt] -\dfrac{e^{-\sqrt{\lambda}\,|x|}}{4\pi|x|} & N=3\end{cases} \tag{9.3.49}
\]
where $K_0$ is the so-called modified Bessel function of the second kind and order 0. See Chapter 6 of [3] for derivations of these formulas when $N=2,3$, while the $N=1$ case is left for the exercises. This is a case where it may be convenient to use the Fourier transform method directly, since the symbol of $L$, $P(y)=-|y|^2-\lambda$, has no real zeros.

Klein-Gordon operator

The Klein-Gordon operator is defined by $Lu=\frac{\partial^2u}{\partial t^2}-\Delta u+\lambda u$ in $\mathbb{R}^{N+1}$. We mention only the case $N=1$, $\lambda>0$, in which case a fundamental solution is
\[
\Phi(x,t)=\frac12H(t-|x|)\,J_0\!\big(\sqrt{\lambda(t^2-x^2)}\big) \qquad N=1 \tag{9.3.50}
\]
where $J_0$ is the Bessel function of the first kind and order zero (see Exercise 13 of Chapter 8). This may be derived, for example, by the method presented in Problem 2, Section 5.1 of [18], choosing the parameter there equal to $\lambda$.


Biharmonic operator

The biharmonic operator is $L=\Delta^2$, i.e. $Lu=\Delta(\Delta u)$. It arises especially in connection with the theory of plates and shells, so that $N=2$ is the most interesting case. A fundamental solution is
\[
\Phi(x)=\frac{1}{8\pi}|x|^2\log|x| \qquad N=2 \tag{9.3.51}
\]
for which a derivation is outlined in Exercise 10.

9.4 Exercises

1. Show that an equivalent definition of $W^{s,2}(\mathbb{R}^N)=H^s(\mathbb{R}^N)$ for $s=0,1,2,\dots$ is
\[
H^s(\mathbb{R}^N)=\Big\{f\in\mathcal{S}'(\mathbb{R}^N):\int_{\mathbb{R}^N}|\hat f(y)|^2(1+|y|^2)^s\,dy<\infty\Big\} \tag{9.4.1}
\]
The second definition makes sense even if $s$ isn't a positive integer and leads to one way to define fractional and negative order differentiability. Implicitly it requires that $\hat f$ (but not $f$ itself) must be a function.

2. Using the definition (9.4.1), show that $H^s(\mathbb{R}^N)\subset C_0(\mathbb{R}^N)$ if $s>\frac N2$. Show that $\delta\in H^s(\mathbb{R}^N)$ if $s<-\frac N2$.

3. If $\Omega$ is a bounded open set in $\mathbb{R}^3$, and $u(x)=\frac{1}{|x|}$, show that $u\in W^{1,p}(\Omega)$ for $1\le p<\frac32$. Along the way, you should show carefully that a distributional first derivative $\frac{\partial u}{\partial x_i}$ agrees with the corresponding pointwise derivative.

4. Prove that if $f\in W^{1,p}(a,b)$ for $p>1$ then
\[
|f(x)-f(y)|\le\|f\|_{W^{1,p}(a,b)}\,|x-y|^{1-\frac1p} \tag{9.4.2}
\]
so in particular $W^{1,p}(a,b)\subset C([a,b])$. (Caution: You would like to use the fundamental theorem of calculus here, but it isn't quite obvious whether it is valid assuming only that $f\in W^{1,p}(a,b)$.)

5. Prove directly that $W^{k,p}(\Omega)$ is complete (relying of course on the fact that $L^p(\Omega)$ is complete).

6. Show that Theorem 9.1 is false for $p=\infty$.


7. If $f$ is a nonzero constant function on $[0,1]$, show that $f\notin W^{1,p}_0(0,1)$ for $1\le p<\infty$.

8. Let $Lu=u''+u$ and $E(x)=H(x)\sin x$, $x\in\mathbb{R}$.

a) Show that $E$ is a fundamental solution of $L$.

b) What is the corresponding solution formula for $Lu=f$?

c) The fundamental solution $E$ is not the same as the one given in (9.3.49). Does this call for any explanation?

9. Show that $E(x,t)=\frac12H(t-|x|)$ is a fundamental solution for the wave operator $Lu=u_{tt}-u_{xx}$.

10. The fourth order operator $Lu=u_{xxxx}+2u_{xxyy}+u_{yyyy}$ in $\mathbb{R}^2$ is the biharmonic operator which arises in the theory of deformation of elastic plates.

a) Show that $L=\Delta^2$, i.e. $Lu=\Delta(\Delta u)$ where $\Delta$ is the Laplacian.

b) Find a fundamental solution of $L$. (Suggestions: To solve $LE=\delta$, first solve $\Delta F=\delta$ and then $\Delta E=F$. Since $F$ will depend on $r=\sqrt{x^2+y^2}$ only, you can look for a solution $E=E(r)$ also.)

11. Let $Lu=u''+\alpha u'$ where $\alpha>0$ is a constant.

a) Find a fundamental solution of $L$ which is a tempered distribution.

b) Find a fundamental solution of $L$ which is not a tempered distribution.

12. Show directly that $u(x,t)$ defined by (9.3.31) is a classical solution of the heat equation for $t>0$, under the assumption that $h$ is bounded and continuous on $\mathbb{R}^N$.

13. Assuming that (9.3.31) is valid and $h\in L^p(\mathbb{R}^N)$, derive the decay property
\[
\|u(\cdot,t)\|_{L^\infty(\mathbb{R}^N)}\le\frac{\|h\|_{L^p(\mathbb{R}^N)}}{t^{\frac{N}{2p}}} \tag{9.4.3}
\]
for $1\le p\le\infty$.

14. If
\[
G(x,y)=\begin{cases}y(x-1) & 0<y<x<1\\ x(y-1) & 0<x<y<1\end{cases}
\]
show that $G$ is a fundamental solution of $Lu=u''$ in $(0,1)$.

15. Is the heat operator $L=\frac{\partial}{\partial t}-\Delta$ elliptic?

16. Prove Proposition 9.2.
