strichartz_the way of analysis 2000


Upload: pedro-cango

Post on 25-Oct-2015

DESCRIPTION

Advanced Mathematics for economics

TRANSCRIPT

Page 1: Strichartz_The Way of Analysis 2000

The Way of Analysis
Revised Edition

Robert S. Strichartz
Cornell University

JONES AND BARTLETT PUBLISHERS
Sudbury, Massachusetts
BOSTON TORONTO LONDON SINGAPORE

World Headquarters
Jones and Bartlett Publishers
40 Tall Pine Drive
Sudbury, MA [email protected]

Jones and Bartlett Publishers Canada
2100 Bloor St. West
Suite 6-272
Toronto, ON M6S 5A5
CANADA

Jones and Bartlett Publishers International
Barb House, Barb Mews
London W6 7PA
UK

Copyright © 2000 by Jones and Bartlett Publishers, Inc.

All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without written permission from the copyright owner.

ISBN: 0-7637-1497-6

The quotations in problem 1 in problem set 1.1.3 (page 7) are of proverbial or biblical origin, except for a. from "Home on the Range," by Brewster M. Higley, and g. from Macbeth, by William Shakespeare.

Printed in the United States of America
04 03 10 9 8 7 6 5 4 3

Contents

Preface

1 Preliminaries
  1.1 The Logic of Quantifiers
    1.1.1 Rules of Quantifiers
    1.1.2 Examples
    1.1.3 Exercises
  1.2 Infinite Sets
    1.2.1 Countable Sets
    1.2.2 Uncountable Sets
    1.2.3 Exercises
  1.3 Proofs
    1.3.1 How to Discover Proofs
    1.3.2 How to Understand Proofs
  1.4 The Rational Number System
  1.5 The Axiom of Choice*

2 Construction of the Real Number System
  2.1 Cauchy Sequences
    2.1.1 Motivation
    2.1.2 The Definition
    2.1.3 Exercises
  2.2 The Reals as an Ordered Field
    2.2.1 Defining Arithmetic
    2.2.2 The Field Axioms
    2.2.3 Order
    2.2.4 Exercises
  2.3 Limits and Completeness
    2.3.1 Proof of Completeness
    2.3.2 Square Roots
    2.3.3 Exercises
  2.4 Other Versions and Visions
    2.4.1 Infinite Decimal Expansions
    2.4.2 Dedekind Cuts*
    2.4.3 Non-Standard Analysis*
    2.4.4 Constructive Analysis*
    2.4.5 Exercises
  2.5 Summary

3 Topology of the Real Line
  3.1 The Theory of Limits
    3.1.1 Limits, Sups, and Infs
    3.1.2 Limit Points
    3.1.3 Exercises
  3.2 Open Sets and Closed Sets
    3.2.1 Open Sets
    3.2.2 Closed Sets
    3.2.3 Exercises
  3.3 Compact Sets
    3.3.1 Exercises
  3.4 Summary

4 Continuous Functions
  4.1 Concepts of Continuity
    4.1.1 Definitions
    4.1.2 Limits of Functions and Limits of Sequences
    4.1.3 Inverse Images of Open Sets
    4.1.4 Related Definitions
    4.1.5 Exercises
  4.2 Properties of Continuous Functions
    4.2.1 Basic Properties
    4.2.2 Continuous Functions on Compact Domains
    4.2.3 Monotone Functions
    4.2.4 Exercises
  4.3 Summary

5 Differential Calculus
  5.1 Concepts of the Derivative
    5.1.1 Equivalent Definitions
    5.1.2 Continuity and Continuous Differentiability
    5.1.3 Exercises
  5.2 Properties of the Derivative
    5.2.1 Local Properties
    5.2.2 Intermediate Value and Mean Value Theorems
    5.2.3 Global Properties
    5.2.4 Exercises
  5.3 The Calculus of Derivatives
    5.3.1 Product and Quotient Rules
    5.3.2 The Chain Rule
    5.3.3 Inverse Function Theorem
    5.3.4 Exercises
  5.4 Higher Derivatives and Taylor's Theorem
    5.4.1 Interpretations of the Second Derivative
    5.4.2 Taylor's Theorem
    5.4.3 L'Hopital's Rule*
    5.4.4 Lagrange Remainder Formula*
    5.4.5 Orders of Zeros*
    5.4.6 Exercises
  5.5 Summary

6 Integral Calculus
  6.1 Integrals of Continuous Functions
    6.1.1 Existence of the Integral
    6.1.2 Fundamental Theorems of Calculus
    6.1.3 Useful Integration Formulas
    6.1.4 Numerical Integration
    6.1.5 Exercises
  6.2 The Riemann Integral
    6.2.1 Definition of the Integral
    6.2.2 Elementary Properties of the Integral
    6.2.3 Functions with a Countable Number of Discontinuities*
    6.2.4 Exercises
  6.3 Improper Integrals*
    6.3.1 Definitions and Examples
    6.3.2 Exercises
  6.4 Summary

7 Sequences and Series of Functions
  7.1 Complex Numbers
    7.1.1 Basic Properties of C
    7.1.2 Complex-Valued Functions
    7.1.3 Exercises
  7.2 Numerical Series and Sequences
    7.2.1 Convergence and Absolute Convergence
    7.2.2 Rearrangements
    7.2.3 Summation by Parts*
    7.2.4 Exercises
  7.3 Uniform Convergence
    7.3.1 Uniform Limits and Continuity
    7.3.2 Integration and Differentiation of Limits
    7.3.3 Unrestricted Convergence*
    7.3.4 Exercises
  7.4 Power Series
    7.4.1 The Radius of Convergence
    7.4.2 Analytic Continuation
    7.4.3 Analytic Functions on Complex Domains*
    7.4.4 Closure Properties of Analytic Functions*
    7.4.5 Exercises
  7.5 Approximation by Polynomials
    7.5.1 Lagrange Interpolation
    7.5.2 Convolutions and Approximate Identities
    7.5.3 The Weierstrass Approximation Theorem
    7.5.4 Approximating Derivatives
    7.5.5 Exercises
  7.6 Equicontinuity
    7.6.1 The Definition of Equicontinuity
    7.6.2 The Arzela-Ascoli Theorem
    7.6.3 Exercises
  7.7 Summary

8 Transcendental Functions
  8.1 The Exponential and Logarithm
    8.1.1 Five Equivalent Definitions
    8.1.2 Exponential Glue and Blip Functions
    8.1.3 Functions with Prescribed Taylor Expansions*
    8.1.4 Exercises
  8.2 Trigonometric Functions
    8.2.1 Definition of Sine and Cosine
    8.2.2 Relationship Between Sines, Cosines, and Complex Exponentials
    8.2.3 Exercises
  8.3 Summary

9 Euclidean Space and Metric Spaces
  9.1 Structures on Euclidean Space
    9.1.1 Vector Space and Metric Space
    9.1.2 Norm and Inner Product
    9.1.3 The Complex Case
    9.1.4 Exercises
  9.2 Topology of Metric Spaces
    9.2.1 Open Sets
    9.2.2 Limits and Closed Sets
    9.2.3 Completeness
    9.2.4 Compactness
    9.2.5 Exercises
  9.3 Continuous Functions on Metric Spaces
    9.3.1 Three Equivalent Definitions
    9.3.2 Continuous Functions on Compact Domains
    9.3.3 Connectedness
    9.3.4 The Contractive Mapping Principle
    9.3.5 The Stone-Weierstrass Theorem*
    9.3.6 Nowhere Differentiable Functions, and Worse*
    9.3.7 Exercises
  9.4 Summary

10 Differential Calculus in Euclidean Space
  10.1 The Differential
    10.1.1 Definition of Differentiability
    10.1.2 Partial Derivatives
    10.1.3 The Chain Rule
    10.1.4 Differentiation of Integrals
    10.1.5 Exercises
  10.2 Higher Derivatives
    10.2.1 Equality of Mixed Partials
    10.2.2 Local Extrema
    10.2.3 Taylor Expansions
    10.2.4 Exercises
  10.3 Summary

11 Ordinary Differential Equations
  11.1 Existence and Uniqueness
    11.1.1 Motivation
    11.1.2 Picard Iteration
    11.1.3 Linear Equations
    11.1.4 Local Existence and Uniqueness*
    11.1.5 Higher Order Equations*
    11.1.6 Exercises
  11.2 Other Methods of Solution*
    11.2.1 Difference Equation Approximation
    11.2.2 Peano Existence Theorem
    11.2.3 Power-Series Solutions
    11.2.4 Exercises
  11.3 Vector Fields and Flows*
    11.3.1 Integral Curves
    11.3.2 Hamiltonian Mechanics
    11.3.3 First-Order Linear P.D.E.'s
    11.3.4 Exercises
  11.4 Summary

12 Fourier Series
  12.1 Origins of Fourier Series
    12.1.1 Fourier Series Solutions of P.D.E.'s
    12.1.2 Spectral Theory
    12.1.3 Harmonic Analysis
    12.1.4 Exercises
  12.2 Convergence of Fourier Series
    12.2.1 Uniform Convergence for C^1 Functions
    12.2.2 Summability of Fourier Series
    12.2.3 Convergence in the Mean
    12.2.4 Divergence and Gibb's Phenomenon*
    12.2.5 Solution of the Heat Equation*
    12.2.6 Exercises
  12.3 Summary

13 Implicit Functions, Curves, and Surfaces
  13.1 The Implicit Function Theorem
    13.1.1 Statement of the Theorem
    13.1.2 The Proof*
    13.1.3 Exercises
  13.2 Curves and Surfaces
    13.2.1 Motivation and Examples
    13.2.2 Immersions and Embeddings
    13.2.3 Parametric Description of Surfaces
    13.2.4 Implicit Description of Surfaces
    13.2.5 Exercises
  13.3 Maxima and Minima on Surfaces
    13.3.1 Lagrange Multipliers
    13.3.2 A Second Derivative Test*
    13.3.3 Exercises
  13.4 Arc Length
    13.4.1 Rectifiable Curves
    13.4.2 The Integral Formula for Arc Length
    13.4.3 Arc Length Parameterization*
    13.4.4 Exercises
  13.5 Summary

14 The Lebesgue Integral
  14.1 The Concept of Measure
    14.1.1 Motivation
    14.1.2 Properties of Length
    14.1.3 Measurable Sets
    14.1.4 Basic Properties of Measures
    14.1.5 A Formula for Lebesgue Measure
    14.1.6 Other Examples of Measures
    14.1.7 Exercises
  14.2 Proof of Existence of Measures*
    14.2.1 Outer Measures
    14.2.2 Metric Outer Measure
    14.2.3 Hausdorff Measures*
    14.2.4 Exercises
  14.3 The Integral
    14.3.1 Non-negative Measurable Functions
    14.3.2 The Monotone Convergence Theorem
    14.3.3 Integrable Functions
    14.3.4 Almost Everywhere
    14.3.5 Exercises
  14.4 The Lebesgue Spaces L^1 and L^2
    14.4.1 L^1 as a Banach Space
    14.4.2 L^2 as a Hilbert Space
    14.4.3 Fourier Series for L^2 Functions
    14.4.4 Exercises
  14.5 Summary

15 Multiple Integrals
  15.1 Interchange of Integrals
    15.1.1 Integrals of Continuous Functions
    15.1.2 Fubini's Theorem
    15.1.3 The Monotone Class Lemma*
    15.1.4 Exercises
  15.2 Change of Variable in Multiple Integrals
    15.2.1 Determinants and Volume
    15.2.2 The Jacobian Factor*
    15.2.3 Polar Coordinates
    15.2.4 Change of Variable for Lebesgue Integrals*
    15.2.5 Exercises
  15.3 Summary

Index

Preface

Do not ask permission to understand.
Do not wait for the word of authority.
Seize reason in your own hand.
With your own teeth savor the fruit.

Mathematics is more than a collection of theorems, definitions, problems and techniques; it is a way of thought. The same can be said about an individual branch of mathematics, such as analysis. Analysis has its roots in the work of Archimedes and other ancient Greek geometers, who developed techniques to find areas, volumes, centers of gravity, arc lengths, and tangents to curves. In the seventeenth century these techniques were further developed, culminating in the invention of the calculus of Newton and Leibniz. During the eighteenth century the calculus was fashioned into a tool of bold computational power and applied to diverse problems of practical and theoretical interest. At the same time the foundation of analysis (the logical justification for the success of the methods) was left in limbo. This had practical consequences: for example, Euler, the leading mathematician of the eighteenth century, developed all the techniques needed for the study of Fourier series, but he never carried out the project. On the contrary, he argued in print against the possibility of representing functions as Fourier series, when this proposal was put forth by Daniel Bernoulli, and his argument was based on fundamental misconceptions concerning the nature of functions and infinite series.

In the nineteenth century, the problem of the foundation of analysis was faced squarely and resolved. The theory that was developed forms most of the content of this book. We will describe it in its logical order, starting from the most basic concepts such as sets and numbers and building up to the more involved concepts of limits, continuity, derivative, and integral. The actual historical order of discovery was almost the reverse; much like peeling a cabbage, mathematicians began with the outermost layers and worked their way inward. Cauchy and Bolzano began the process in the 1820s by developing the theory of functions without defining the real numbers. The first rigorous definition of the real number system came in the work of Dedekind, Weierstrass, and Heine in the 1860s. Set theory came later in the work of Cantor, Peano, and Frege.

The consequences of the nineteenth century foundational work were enormous and are still being felt today. Perhaps the least important consequence was the establishment of a logically valid explanation of the calculus. More important, with the clearing away of the conceptual murk, new problems emerged with clarity and were developed into important theories. We will give some illustrations of these new nineteenth century discoveries in our discussions of differential equations, Fourier series, higher dimensional calculus, and manifolds. Most important of all, however, the nineteenth century foundational work paved the way for the work of the twentieth century. Analysis today is a subject of vast scope and beauty, ranging from the abstract to the concrete, characterized both by the bold computational power of the eighteenth century and the logical subtlety of the nineteenth century. Most of these developments are beyond the scope of this book or at best merely hinted at. Still, it is my hope that the reader, after having entered so deeply along the way of analysis, will be encouraged to continue the study.

My goal in writing this book is to communicate the mathematical ideas of the subject to the reader. I have tried to be generous with explanations. Perhaps there will be places where I belabor the obvious; nevertheless, I think there is enough truly challenging material here to inspire even the strongest students. On the other hand, there will inevitably be places where each reader will find difficulties in following the arguments. When this happens, I suggest that you write your questions in the margins. Later, when you go over the material, you may find that you can answer the question. If not, be sure to ask your instructor or another student; often, it is a minor misunderstanding that causes confusion and can easily be cleared up. Sometimes, the inherent difficulty of the material will demand considerable effort on your part to attain understanding. I hope you will not become frustrated in the process; it is something which all students of mathematics must confront. I believe that what you learn through a process of struggle is more likely to stick with you than what you learn without effort.

Understanding mathematics is a complex process. It involves not only following the details of an argument and verifying its correctness, but seeing the overall strategy of the argument, the role played by every hypothesis, and understanding how different theorems and definitions fit together to create the whole. It is a long-term process; in a sense, you cannot appreciate the significance of the first theorem until you have learned the last theorem. So please be sure to review old material; you may find the chapter summaries useful for this purpose. The mathematical ideas presented in this book are of fundamental importance, and you are sure to encounter them again in further studies in both pure and applied mathematics. Learn them well and they will serve you well in the future. It may not be an easy task, but it is a worthy one.

To the Instructor

This book is designed so that it may be used in several ways, including

1. a one-semester introductory real analysis course,

2. a two-semester real analysis course not including Lebesgue integration,

3. a two-semester real analysis course including an introduction to Lebesgue integration.

There are many optional sections, marked with an asterisk (*), that can be covered or omitted at your discretion. There is some flexibility in the ordering of the later chapters. Thus you can design a course in accordance with your interests and requirements. There are three chapters on applications (Chapter 11, Ordinary Differential Equations; Chapter 12, Fourier Series; and Chapter 13, Implicit Functions, Curves and Surfaces). These topics are often omitted, or treated very briefly, in a real analysis course because they are covered in other courses. However, they serve an important purpose in illustrating how the abstract theory may be applied to more concrete situations. I would urge you to try to fit as much of this material as time allows into your course.

The chapters may be divided into four groupings:

1. functions of one variable: 1, 2, 3, 4, 5, 6, 7, 8;

2. functions of several variables: 9, 10, 15;

3. applications: 11, 12, 13;

4. Lebesgue integration: 14, 15.

Note that Chapter 15, Multiple Integrals, may be used either with or without the Lebesgue integral.

The first 10 chapters are designed to be used in the given order (sections marked with an asterisk may be omitted or postponed). If you are not covering the Lebesgue integral, then selections from Chapter 15 (15.1.1, 15.2.1, and 15.2.3) can be covered any time after Chapter 10. The applications, Chapters 11, 12, and 13, can be done in any order. It is advisable to do at least some of Chapter 12, Fourier Series, before doing the Lebesgue Integral, Chapter 14. In Chapter 6 (section 6.1.3) I have included a preview of some results in integration theory that are covered in detail later in the book; this is the only place I have violated the principle of presenting full proofs of all results in the order they are discussed. I think this is a reasonable compromise, in view of the facts that (a) the students will want to use these results in doing exercises, and (b) to present proofs at this point in the text would require long detours.

Here are some concrete suggestions for using this book.

1. One-semester course: do Chapters 1-8 in order, omitting all sections marked with an asterisk. This will cover the one variable theory. If time remains at the end, return to omitted sections.

2. Two-semester course (without Lebesgue integrals)

First semester: do Chapters 1-7 in order, including most sections marked with an asterisk.

Second semester: do Chapters 8-10; then 15.1.1, 15.2.1, 15.2.3, then Chapters 11-13, including some sections marked with an asterisk.

3. Two-semester course (with Lebesgue integrals)

First semester: do Chapters 1-7 in order, including most sections marked with an asterisk, but omit 6.2.3.

Second semester: do Chapters 8-10; then selections from Chapters 11-13; then Chapters 14 and 15.

This book contains a generous selection of exercises, ranging in difficulty from straightforward to challenging. The most difficult ones are marked with an asterisk.

All the main results are presented with complete proofs; indeed the emphasis is on a careful explanation of the ideas behind the proofs. One important goal is to develop the reader's mathematical maturity. For many students, a course in real analysis may be their first encounter with rigorous mathematical reasoning. This can be a daunting experience but also an inspiring one. I have tried to supply the students with the support they will need to meet the challenge.

My recommendation is that students be required to read the material before it is discussed in class. (This may be difficult to enforce in practice, but here is one suggestion: have students submit brief written answers to a question based on the reading and also a question they would like to have answered in class.) The ability to read and learn from a mathematical text is a valuable skill for students to develop. This book was written to be read, not deciphered. If I have perhaps coddled the students too much, I'm sure they won't complain about that!

The presentation of the material in this book is often informal. A lot of space is given to motivation and a discussion of proof strategies. Not every result is labeled as a theorem, and sometimes the precise statement of the result does not emerge until after the proof has been given. Formulas are not numbered, and theorems are referred to by name and not number. To compensate for the informality of the body of the text, I have included summaries at the end of each chapter (except the first) of all the main results, in standard dry mathematical format. The students should find these chapter summaries handy both for review purposes and for references.

I have tried to give some historical perspective on the material presented, but the basic organization follows logical rather than historical order. I use conventional names for theorems, even if this perpetuates injustices and errors (for example, I believe it is more important to know what the Cauchy-Schwartz inequality is than to decide whether or not Bunyakowsky deserves some/most/all of the credit for it). One important lesson from the historical record is that abstract theorems did not grow up in a vacuum: they were motivated by concrete problems and proved their worth through a variety of applications. This text gives students ample opportunity to see this interplay in action, especially in Chapters 11, 12, and 13.

In order to give the material unity, I have emphasized themes that recur. Also, many results are presented twice, first in a more concrete setting. For example, I develop the topology of the real line first, postponing the general theory of metric spaces to Chapter 9. This is perhaps not the most efficient route, but I think it makes it easier for the students. Whenever possible I give the most algorithmic proof, even if it is sometimes harder (for example, I construct a Fourier series of a continuous function that diverges at a point). I have tried to emphasize techniques that can be used again in other contexts. I construct the real number system by Cauchy completion of the rationals, since Cauchy completion is an important technique. The derivative in one variable is defined by best affine approximation, since the same definition can be used in R^n. Chapter 7, Sequences and Series of Functions, is presented entirely in the context of functions of one variable, even though most of the results extend easily to the multivariable setting, or more generally to functions on metric spaces.

A good text should make the job of teaching easier. I hope I have succeeded in providing you with a text that you can easily teach from. I would appreciate receiving any comments or suggestions for improvements from you.

Acknowledgments

I would like to thank the students at Cornell who learned this material from me as I developed the preliminary versions of the text over the past dozen years. I am grateful for your criticism as well as your encouragement. I needed both, even if I was more gracious about the encouragement (special thanks to Chris Wittemann). I am also grateful to Graeme Bailey, Dan Barbasch, Eugene Dynkin, Archil Gulisashvili, and Oscar Rothaus, who also taught using this material.

I am especially grateful to Mark Barsamian, who went over the text line-by-line from the point of view of the student, and made me change just about everything. He helped me to improve the organization and style of presentation and pointed out possible sources of ambiguity and confusion. I know that the text is much stronger as a result of his criticism, and I feel more confident that I have fulfilled my promise to write a book that students can understand.

I am grateful to Carl Hesler, my editor at Jones and Bartlett, who has encouraged me throughout the long process of turning my rough lecture notes into a polished book, and I especially appreciate his confidence in the value of this work for the mathematical community.

I would like to thank June Meyermann for her outstanding job preparing the manuscript in LaTeX and David Larkin who produced many of the figures using Mathematica.

Chapter 1

Preliminaries

1.1 The Logic of Quantifiers

1.1.1 Rules of Quantifiers

Logic plays a central role in mathematics. While other considerations (such as intuition, agreement with empirical evidence, taste, esthetics, wishful thinking, personal ambition) may influence the way mathematicians think and act, there is always a central core of mathematical reasoning that is supposed to be logically sound. But is it? There are really two questions here. The first concerns the structure of the logic itself; the second concerns how it is used. Mathematicians have made a careful study of logic, and the results are rather impressive. The main result, called the completeness theorem for first-order predicate calculus, shows that the logical reasoning we are going to use in this book, if used correctly, is both sound and incapable of being improved (in other words, additional new forms of logical reasoning would not enable us to do more than we already can). The interested reader is referred to any text on mathematical logic for a full discussion, which is beyond the scope of this work.

But what about the second question? Do mathematicians make mistakes in logic? Of course they do! Carelessness and wishful thinking are the main culprits, and there is little that can be done other than constant vigilance and rechecking of work. At the very least, however, every mathematician and student of mathematics should clearly understand the rules of logic so that deliberate mistakes are not made.


We will not employ special symbols such as ∧ for "and", → for "implies", but we will use the usual circumlocutions such as "if A then B" for "A implies B". Notice that the use of connectives is purely finitistic; although the individual statements may refer to the infinite, the use of connectives does not introduce any new infinite reference. The statement "A or B" means A is true or B is true (or both), whatever A or B may be. Of course a proof of "A or B" may not tell us which of the two is true. Notice that the double negative "not not A" is identical to the statement "A". The statement "A implies B" is logically equivalent to its contrapositive "not B implies not A" but is distinct from its converse "B implies A". The logic of connectives, also called the propositional calculus, will be assumed as part of the mathematical background of the reader.
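Because the connectives are finitistic, the equivalences claimed in this paragraph can be checked by brute force over the four truth assignments. The following is a minimal sketch in Python; the helper name `implies` and the variable names are illustrative choices, not notation from the text.

```python
from itertools import product

def implies(a, b):
    """Material implication: 'A implies B' is false only when A is true and B is false."""
    return (not a) or b

# All four assignments of truth values to (A, B).
ASSIGNMENTS = list(product([True, False], repeat=2))

# "not not A" has the same truth value as "A".
double_negation_ok = all((not (not a)) == a for a in [True, False])

# "A implies B" agrees with its contrapositive "not B implies not A" everywhere.
contrapositive_ok = all(
    implies(a, b) == implies(not b, not a) for a, b in ASSIGNMENTS
)

# Assignments on which "A implies B" and its converse "B implies A" disagree.
converse_differs = [(a, b) for a, b in ASSIGNMENTS if implies(a, b) != implies(b, a)]

if __name__ == "__main__":
    print(double_negation_ok, contrapositive_ok, converse_differs)
```

The list `converse_differs` is nonempty exactly because an implication and its converse are different statements: they disagree whenever A and B have different truth values.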

Quantifiers introduce the infinite, potential or actual, into our finite language. We can write "2 + 1 = 1 + 2, 2 + 2 = 2 + 2, 2 + 3 = 3 + 2, 2 + 4 = 4 + 2, ..." and assume that the three dots are self-explanatory, or we can write "for all natural numbers x, x + 2 = 2 + x". In either case, the finite sentence we have written is supposed to convey an infinite amount of information. The infinite enters because the set over which x varies (the natural numbers) is infinite. In this context we can, if we wish, regard this infinite as only potential; any individual will

Ignorance of logic is no excuse! Some of the theorems we are going to study in this book were originally stated incorrectly or given incorrect proofs because of misunderstandings concerning the logic of quantifiers. So our first responsibility is to make perfectly clear the meaning and use of "there exists" and "for all". Of course preliminary to this we need the logic of connectives. Fortunately, this is a straightforward matter, summarized by the following truth table.

TRUTH TABLE

A  B  not A  A and B  A or B  A implies B  A if and only if B
T  T  F      T        T       T            T
T  F  F      F        T       F            F
F  T  T      F        T       T            F
F  F  T      F        F       T            T
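The truth table for the connectives can also be generated mechanically, which is a convenient way to double-check a hand-written copy. This is a small Python sketch, assuming the usual reading of "implies" as material implication; the names `CONNECTIVES`, `truth_table`, and `format_table` are hypothetical helpers introduced only for this illustration.

```python
from itertools import product

# Each connective is modelled as a function of the truth values of A and B.
CONNECTIVES = {
    "not A": lambda a, b: not a,
    "A and B": lambda a, b: a and b,
    "A or B": lambda a, b: a or b,
    "A implies B": lambda a, b: (not a) or b,
    "A if and only if B": lambda a, b: a == b,
}

def truth_table():
    """Return one row (A, B, not A, A and B, A or B, A implies B, A iff B) per assignment."""
    rows = []
    for a, b in product([True, False], repeat=2):
        rows.append((a, b) + tuple(f(a, b) for f in CONNECTIVES.values()))
    return rows

def format_table(rows):
    """Render the rows with T/F entries under a header line."""
    header = ["A", "B"] + list(CONNECTIVES)
    lines = ["  ".join(header)]
    for row in rows:
        lines.append("  ".join("T" if value else "F" for value in row))
    return "\n".join(lines)

if __name__ == "__main__":
    print(format_table(truth_table()))
```

The rows come out in the order TT, TF, FT, FF, so they can be compared line by line with the table in the text.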


To deny a universal statement we need find only one counterexample, so the negation of "for all x in U, A(x)" is the existential statement "there exists x in U such that not A(x)". But to negate an existential statement we must show that every possible instance is false, so the negation of "there exists x in U such that A(x)" is the universal "for all x in U, not A(x)". This is to be expected if we recall that the negation of a conjunction (and) is the disjunction (or) of the negations, and the negation of a disjunction is the conjunction of the negations (universal and existential quantifiers being grand conjunctions and disjunctions, respectively). To paraphrase: commuting a quantifier with a negation changes the type of quantifier.
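On a finite domain the two negation rules can be tested directly. The sketch below is only an illustration under that finiteness assumption (a computer cannot run through an infinite set); the function names and the sample predicate are made up for the example.

```python
def negated_forall_is_exists_not(domain, A):
    """Check that 'not (for all x, A(x))' agrees with 'there exists x with not A(x)'."""
    return (not all(A(x) for x in domain)) == any(not A(x) for x in domain)

def negated_exists_is_forall_not(domain, A):
    """Check that 'not (there exists x, A(x))' agrees with 'for all x, not A(x)'."""
    return (not any(A(x) for x in domain)) == all(not A(x) for x in domain)

def counterexample(domain, A):
    """Return one x in the domain with A(x) false, or None if A holds throughout."""
    for x in domain:
        if not A(x):
            return x
    return None

# Sample universal claim on U = {1, ..., 5}: "for all x in U, x * x > x".
U = range(1, 6)
square_exceeds = lambda x: x * x > x

if __name__ == "__main__":
    print(negated_forall_is_exists_not(U, square_exceeds))
    print(negated_exists_is_forall_not(U, square_exceeds))
    print(counterexample(U, square_exceeds))
```

A single value returned by `counterexample` is all that is needed to deny the universal claim, which is the point of the first sentence of the paragraph.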

The real fun comes when there is more than one quantifier in a statement. For example, the commutative law of addition of integers can be stated as follows: for all integers x, for all integers y, x + y = y + x. Clearly the order of the quantifiers is immaterial, and in virtue of this fact we will abbreviate the statement as follows: for all integers x and y, x + y = y + x. Similarly, the order of two or more consecutive existential quantifiers is immaterial, so we say "there exist integers x and y such that x + y = 2 and x + 2y = 3". To paraphrase, quantifiers of like type commute. But the situation is different for multiple quantifiers

experience only a finite number of instances of x + 2 = 2 + x. However, in order to understand the way of analysis, we will have to accept the completed infinite. This may create problems, both mathematical and psychological, but it is inescapable.

In this book, the quantifiers used will always refer to variables whose domains are clearly specified sets. Thus we may say "for all sets of sets of integers", but we will not say "for all sets". The "for all" is the universal quantifier, and the "there exists" is the existential quantifier. The universal quantifier is a grand "and", while the existential quantifier is a grand "or". The statement "for all x in U, A(x)" where U is a prescribed set and A is a statement in which the variable x appears has the meaning that A(x) is true for every value of x in U. If the elements in U could be enumerated u1, u2, u3, ..., then "for all x in U, A(x)" would mean "A(u1) and A(u2) and A(u3) and ...". If the set U is finite, then "for all x in U, A(x)" is exactly the finite conjunction. Similarly "there exists x in U such that A(x)" means A(x) is true for at least one x in U or "A(u1) or A(u2) or ..." if u1, u2, ... enumerates U.
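For a finite set U, the description of the quantifiers as a grand "and" and a grand "or" can be written out literally as a fold over the elements. This is a hedged sketch: `for_all` and `there_exists` are illustrative names, and the approach says nothing about infinite domains, where no such loop terminates.

```python
from functools import reduce

def for_all(domain, A):
    """Universal quantifier on a finite domain: A(u1) and A(u2) and ... (True if empty)."""
    return reduce(lambda acc, x: acc and A(x), domain, True)

def there_exists(domain, A):
    """Existential quantifier on a finite domain: A(u1) or A(u2) or ... (False if empty)."""
    return reduce(lambda acc, x: acc or A(x), domain, False)

# A finite stand-in for the natural numbers, used only for illustration.
U = range(100)

if __name__ == "__main__":
    print(for_all(U, lambda x: x + 2 == 2 + x))
    print(there_exists(U, lambda x: x * x == 49))
    print(there_exists(U, lambda x: x * x == 50))
```

On finite domains these agree with Python's built-in `all` and `any`, which are the usual idiomatic choice; the explicit fold is shown only to mirror the wording of the text.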


To understand the way of analysis we have to deal with sentences that contain many quantified variables; we have to understand what these sentences mean and how to form their negation. This involves only applying the above principles several times; but it can get confusing, especially when a lot of intricate mathematical ideas are involved at the same time. To gain some confidence with the process we will examine some examples where the mathematics is simple.

Goldbach's conjecture states: every even natural number greater than 2 is the sum of two primes. This can be written: for every x in the set E of even natural numbers greater than 2, there exists p in the set P of prime natural numbers and there exists q in P such that x = p + q. Since the last two consecutive quantifiers are both existential, we can combine them: for every x in E there exist p and q in P such that x = p + q. To make the dependence of p and q on x clearer we could write them as functions p(x) and q(x) from E to P, and Goldbach's conjecture becomes: there exist functions p(x) and

1.1.2 Examples

of differing type. Consider the statement: everyone has a mother. We can express this via quantifiers as follows: for every person x there exists a person y such that y is the mother of x. If we reverse the order of quantifiers the statement becomes: there exists a person y such that for every person x, y is the mother of x. Clearly this is not the same statement as before; rather it says "y is the mother of us all". This is a stronger statement, less likely to be true; in fact, it is false.

Since this is a crucial point, it is worth looking into more closely. The statement "there exists y in U such that for every x in V, A(x, y)" asserts that one y will make A(x, y) true no matter what x is. The statement "for every x in V there exists y in U such that A(x, y)" asserts only that A(x, y) can be made true by choosing y depending on x. It really asserts the existence of a function y = f(x) (in the example above we can call it the mother function) such that A(x, f(x)) is true. For the existential-universal form the function must be constant. To paraphrase: the existential-universal implies the universal-existential but not vice versa; the universal-existential is equivalent to asserting the existence of a function from the domain of the universally quantified variable to the domain of the existentially quantified variable.
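The difference between the two orders of quantifiers can be seen on a small finite example. This is a sketch under our own assumptions: the sets V and U, the relation A, and the helper `find_witness_function` are hypothetical choices made for illustration only.

```python
from itertools import product

V = [0, 1, 2]   # domain of the universally quantified variable x
U = [0, 1, 2]   # domain of the existentially quantified variable y

def A(x, y):
    """Illustrative relation: y is the successor of x modulo 3."""
    return y == (x + 1) % 3

# Universal-existential: for every x in V there exists y in U with A(x, y).
forall_exists = all(any(A(x, y) for y in U) for x in V)

# Existential-universal: there exists y in U such that for every x in V, A(x, y).
exists_forall = any(all(A(x, y) for x in V) for y in U)

def find_witness_function():
    """Search all functions f from V to U for one with A(x, f(x)) for every x."""
    for values in product(U, repeat=len(V)):
        f = dict(zip(V, values))
        if all(A(x, f[x]) for x in V):
            return f
    return None

f = find_witness_function()
print(forall_exists, exists_forall, f)
```

Here the universal-existential statement holds and is witnessed by a function f that is not constant, which is why the existential-universal statement fails for the same relation.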


q(x) from E to P such that x = p(x) + q(x).

What is the negation of Goldbach's conjecture? We can calculate it in stages as follows: (not) (for every x in E) (there exist p and q in P) (x = p + q) is equivalent to (there exists x in E) (not) (there exist p and q in P) (x = p + q) is equivalent to (there exists x in E) (for all p and q in P) (not) (x = p + q). In other words, there exists x in E such that for all p and q in P, x ≠ p + q. The negation of Goldbach's conjecture thus asserts the existence of a counterexample: an even number x (greater than 2) that is not the sum of two primes. Looking back at our computation of this rather simple fact, we see that it was a bit long-winded. Clearly to form the negation of a statement with a string of quantifiers we simply change the type of each quantifier, preserving the order, and negate whatever follows the string of quantifiers. From now on we will do this without further ado.
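The rule for negating a string of quantifiers can be tried out on a finite truncation of Goldbach's conjecture. The sketch below makes an assumption the text does not: it replaces E and P by their members up to a hypothetical bound `LIMIT`, so it says nothing about the conjecture itself. It only shows that the statement and its computed negation always take opposite truth values.

```python
def is_prime(n):
    """Trial division; adequate for the small bound used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

LIMIT = 200  # hypothetical cutoff, chosen only to keep the check finite

E = list(range(4, LIMIT + 1, 2))                      # even numbers greater than 2
P = [p for p in range(2, LIMIT + 1) if is_prime(p)]   # primes up to LIMIT

# (for every x in E) (there exist p and q in P) (x = p + q)
goldbach_holds = all(any(x == p + q for p in P for q in P) for x in E)

# (there exists x in E) (for all p and q in P) (x != p + q)
negation_holds = any(all(x != p + q for p in P for q in P) for x in E)

print(goldbach_holds, negation_holds)
```

Each quantifier has changed type, the order is preserved, and the final equation has been negated, exactly as in the computation above.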

Goldbach's conjecture being rather difficult (no one has succeeded in proving or disproving it), mathematicians have looked at some weaker statements (also without success) in the same vein. For example, "Goldbach's conjecture has at most a finite number of counterexamples". This can be written several ways. We can say, there exists a natural number n such that for all x in E there exist p and q in P such that x ≤ n or x = p + q. This is perhaps a somewhat artificial form in that the p and q whose existence is asserted for x ≤ n are completely arbitrary and irrelevant. However, it has the advantage of placing all the quantifiers first. A second version is: there exists a natural number n such that for all x in E, x > n implies there exist p and q in P such that x = p + q. A third version is: there exists a natural number n such that for all x in E_n, the set of even numbers greater than n, there exist p and q in P such that x = p + q.

What is the negation of the sentence "Goldbach's conjecture has at most a finite number of counterexamples"? From the first version we find: for all natural numbers n there exists x in E such that for all p and q in P, x > n and x ≠ p + q; in other words, the existence of counterexamples greater than any prescribed n. (Here we have used the propositional logic equivalence of "not (A or B)" and "not A and not B".)

Another weakening of Goldbach's conjecture can be formulated as "every even number is the sum of at most a fixed number of primes". We can write this as follows: there exists a natural number k such


that for every x in E there exist p_1, p_2, ..., p_j in P such that j ≤ k and x = p_1 + p_2 + ... + p_j. The negation of this statement is: for every natural number k there exists x in E such that for all j ≤ k and p_1, ..., p_j in P, x ≠ p_1 + p_2 + ... + p_j.

Here is a statement with four consecutive quantifiers: for every natural number n there exists a natural number m such that for every natural number x there exist non-negative integers a_1, a_2, ..., a_m such that x = a_1^n + a_2^n + ... + a_m^n. What does it mean? It says that given n, there is a number m depending on n (we might write m(n)) such that every number is the sum of m(n) n-th powers. This statement is known as Waring's problem, and there are some rather difficult proofs of it. Its negation would say: there exists a natural number n such that for every natural number m there exists a natural number x such that for all non-negative integers a_1, a_2, ..., a_m, x ≠ a_1^n + a_2^n + ... + a_m^n.
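To get a feeling for the innermost quantifier in Waring's problem, one can test small cases by brute force. The sketch below is an illustration only: the function name `is_sum_of_powers` and the search bounds are our own, and a finite search proves nothing about all natural numbers x. It does exhibit the known values m(2) = 4 (Lagrange's four-square theorem) and m(3) = 9 on a small range.

```python
from itertools import combinations_with_replacement

def is_sum_of_powers(x, m, n):
    """True if x = a_1**n + ... + a_m**n for some non-negative integers a_i."""
    # Only bases whose n-th power does not exceed x can contribute.
    bases = [a for a in range(x + 1) if a ** n <= x]
    return any(
        sum(a ** n for a in combo) == x
        for combo in combinations_with_replacement(bases, m)
    )

# n = 2: four squares suffice on this range, but three do not (x = 7 fails).
four_squares_ok = all(is_sum_of_powers(x, 4, 2) for x in range(1, 61))
seven_needs_four = not is_sum_of_powers(7, 3, 2)

# n = 3: nine cubes suffice on this range, but eight do not (x = 23 fails).
nine_cubes_ok = all(is_sum_of_powers(x, 9, 3) for x in range(1, 61))
twenty_three_needs_nine = not is_sum_of_powers(23, 8, 3)

print(four_squares_ok, seven_needs_four, nine_cubes_ok, twenty_three_needs_nine)
```

Zero is allowed as a base, which is why the text speaks of non-negative integers: a number needing fewer than m powers is padded with zeros.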

In deciphering complicated strings of quantifiers it is amusing to imagine a game played with the devil. Every time an existential quantifier appears it is your move, and every time a universal quantifier appears it is the devil's move. You make a choice of the quantified variable from the specified set that makes things as good as possible, while the devil does his best to mess you up. If you have a strategy to beat the devil every time, then the statement is true; otherwise, it is false.
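When every quantified variable ranges over a finite set, the game with the devil can be played out exhaustively. The following sketch rests on our own assumptions: the function `you_can_win`, the encoding of quantifiers as ("exists", domain) or ("forall", domain) pairs, and the sample statement are all hypothetical.

```python
def you_can_win(quantifiers, predicate, chosen=()):
    """Decide a quantified statement over finite domains by playing every game.

    quantifiers: list of (kind, domain) pairs, kind being "exists" or "forall".
    predicate:   function of the chosen values, taken in quantifier order.
    """
    if not quantifiers:
        return predicate(*chosen)
    (kind, domain), rest = quantifiers[0], quantifiers[1:]
    outcomes = (you_can_win(rest, predicate, chosen + (value,)) for value in domain)
    if kind == "exists":
        return any(outcomes)   # your move: one good choice is enough
    return all(outcomes)       # the devil's move: you must survive every choice

D = range(4)

def adds_to_three(first, second):
    return first + second == 3

# "for every x in D there exists y in D such that x + y = 3"
devil_moves_first = you_can_win([("forall", D), ("exists", D)], adds_to_three)

# "there exists y in D such that for every x in D, x + y = 3"
you_move_first = you_can_win([("exists", D), ("forall", D)], adds_to_three)

print(devil_moves_first, you_move_first)
```

When the devil moves first you can answer each x with y = 3 - x; when you must commit to y first, the devil simply picks an x that spoils it.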

A direct proof of a quantified statement is one that gives the strategy for beating the devil explicitly. However, we will also allow indirect proofs, in which we only prove (by contradiction) that such a strategy must exist. Clearly a direct proof is preferable, if it can be found. There is a minority school of thought, called Intuitionism or Constructivism, that holds that we should not accept indirect proofs. The majority of mathematicians reject this as counterproductive, for the following reason: by allowing indirect proofs, we create a much richer mathematical world, that contains more direct proofs than would have been discovered (presumably) had indirect proofs been rejected outright.

In the best of all possible worlds (from the point of view of students), mathematicians would be required to write all quantified sentences with the quantifiers at the beginning of the sentence (in the correct order). Similarly, all implication sentences (or statements of theorems) would have the hypotheses first and the conclusions second. In this world, they don't do it that way! The English language allows many different


forms of expression for the same ideas, and mathematicians help themselves freely. This is bound to cause confusion. As you do the exercises and look back at the examples already discussed, try to develop the insight to recognize hidden quantifiers or transposed word orders. It is a skill that will serve you well in what follows.

1.1.3 Exercises

1. For each of the following famous sayings, rewrite the statement making all the quantifiers explicit. Then form the negation of the statement. Finally, recast the negation in a form similar to the original saying.

a. The skies are not cloudy all day.

b. Man cannot live by bread alone.

c. The sun never sets on the British Empire.

d. To every thing there is a season, and a time for every purpose under heaven.

e. The devil makes work for idle hands.

f. Sufficient unto the day is the evil thereof.

g. All our yesterdays have lighted fools the way to dusty death.

2. For each of the following mathematical statements, rewrite the statement making all the quantifiers explicit. Then form the negation of the statement. Finally recast the negation in a form similar to the original statement.

a. Every positive integer has a unique prime factorization.

b. The only even prime is 2.

c. Multiplication of integers is associative.

d. Two points in the plane determine a line.

e. The altitudes of a triangle intersect at a point.

f. Given a line in the plane and a point not on it, there exists a unique line passing through the given point parallel to the given line.


g. Any partitioning of the integers into a finite number of disjoint subsets has the property that one of the subsets contains arbitrarily long arithmetic progressions.

h. If there are more letters than mailboxes, at least one mailbox must get more than one letter.

3. Each of the following true statements is in universal-existential form. Write the corresponding statement with the order of quantifiers reversed, and show why it is false.

a. Every line segment has a midpoint.

b. Every non-zero rational number has a rational reciprocal.

c. Every non-empty subset of the positive integers has a smallest element.

d. There is no largest prime.

1.2 Infinite Sets

1.2.1 Countable Sets

Set theory, like logic, plays a central role in mathematics today. In contrast to the settled state of logic, however, we are in a state of grievous ignorance about some of the basic properties of sets and are likely to remain so. The reason for this is that our finite minds have difficulty penetrating to the core of the infinite; at best we can play with pebbles on the shore of a vast ocean. Fortunately we possess some mighty pebbles that will enable us to enter the way of analysis. Some of the material in the section may be familiar to you, but in any case it is worth reviewing because of its importance.

The basis for our intuition of the infinite is the set of natural numbers 1, 2, 3, .... As Galileo observed, this set can be put in one-to-one correspondence with a proper subset, say the even numbers, an observation that convinced Galileo that he shouldn't toy with the infinite. Cantor was braver, adopting the definition that two sets have the same cardinality if their elements can be put in one-to-one correspondence and facing squarely the fact that infinite sets can have the same cardinality as proper subsets. For finite sets this cannot happen, of course.


The k-th row of this matrix enumerates the set A_k. Of course the matrix may contain duplications, but this can be handled as before. The question of the countability of the union set then boils down to the question: are the elements of the above infinite matrix countable? The answer is obtained by turning your head 45° counterclockwise (or equivalently turning the paper 45° clockwise). The matrix then looks

a_11  a_12  a_13  ...
a_21  a_22  a_23  ...
a_31  a_32  a_33  ...
...

This leads immediately to the question: do all infinite sets have the same cardinality? A set with the same cardinality as the natural numbers is called countable; the one-to-one correspondence with 1, 2, 3, ... amounts to a counting or enumeration of the set. The elements of a countable set can be listed u_1, u_2, u_3, ..., although the order of the elements may have nothing to do with any relationships between the elements. The listing is merely a convenient way of displaying the one-to-one correspondence with the natural numbers. In order to construct a set that is not countable (uncountable) we attempt to build larger sets by the natural processes of set theory.

Suppose we have two countable sets, A = (a_1, a_2, ...) and B = (b_1, b_2, ...). Is their union countable? Clearly we can splice the two listings a_1, b_1, a_2, b_2, ... to obtain a listing of A ∪ B. This listing may involve duplication, since we have not assumed the sets A and B are disjoint, so it may not be a one-to-one correspondence. However, the problem is easily fixed by tossing out the duplications as they arise. In other words we use the simple lemma: if there is a mapping of the natural numbers onto a set U (not necessarily one-to-one), then U is either finite or countable. In this case A ∪ B cannot be finite since it contains the infinite subset A.
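The splicing argument translates directly into a procedure that lists A ∪ B while tossing out duplications as they arise. The sketch below is an illustration under our own assumptions: the function `splice` and the two sample sets (the even numbers and the multiples of 3) are hypothetical choices, and the listings are represented as lazy Python generators.

```python
from itertools import count, islice

def splice(listing_a, listing_b):
    """List a_1, b_1, a_2, b_2, ... and skip any element already listed."""
    seen = set()
    for a, b in zip(listing_a, listing_b):
        for element in (a, b):
            if element not in seen:
                seen.add(element)
                yield element

evens = (2 * k for k in count(1))            # A = 2, 4, 6, ...
multiples_of_3 = (3 * k for k in count(1))   # B = 3, 6, 9, ...

# Only a finite initial segment of the listing can ever be printed.
first_ten = list(islice(splice(evens, multiples_of_3), 10))
print(first_ten)
```

The number 6 belongs to both sets, so it is listed once (from B) and skipped when it reappears in A; the resulting listing has no repetitions.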

A similar argument shows that the union of any finite number of countable sets is countable. What about the union of a countable number of countable sets? Suppose A_1, A_2, A_3, ... are sets and each one is countable. Let us denote the elements of A_1 by a_11, a_12, ..., the elements of A_2 by a_21, a_22, ..., and in general the elements of A_k by a_k1, a_k2, .... We can then write all the elements of the union ∪_{k=1}^∞ A_k in an infinite matrix as


To escape beyond the countable we need a more powerful set operation, called the power set. If A is any set, then the power set, 2^A, is the set of all subsets of A. That is, the elements x of 2^A are the subsets of A (in other words, x is a set, all of whose elements are elements of A). The reason behind the notation 2^A is that we can "parametrize" the subsets of A by attaching a two-element set, say (Yes_a, No_a), to each element a of A. A particular subset x of A is then uniquely determined by a choice of one of the two, Yes_a if a is in x or No_a if a is not in x, for each a in A. Alternatively we can think of the choice Yes_a or No_a as giving a function from A to the two-element set (Yes, No). It is important to understand that the choice Yes_a or No_a is completely arbitrary; we do not assume that it is given by a particular rule that we could describe, even potentially, in our finite language. This is essential because the number of subsets that are potentially describable in any finite language is at most countable and rather limited in many ways (when the set A is the natural numbers then these describable sets are called recursive). To consider 2^A for any infinite set A is thus to admit into mathematics an object well beyond the scope of our full comprehension. This is an order of magnitude greater than the admission of the completed infinite of the natural numbers, because every natural number has a potential name in our finite language.
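For a finite set A the Yes/No parametrization can be carried out completely, and it explains the notation: a set with n elements has 2^n subsets. The sketch below is only an illustration; the function `power_set` and the three-element set are our own, and nothing of the kind is possible for infinite A, which is the whole point of the paragraph above.

```python
from itertools import product

def power_set(A):
    """Build every subset of a finite set A from a Yes/No choice per element."""
    elements = list(A)
    subsets = []
    # Each tuple in the product is one function from A to {No, Yes},
    # encoded here as False (No) and True (Yes).
    for choice in product([False, True], repeat=len(elements)):
        subset = frozenset(a for a, yes in zip(elements, choice) if yes)
        subsets.append(subset)
    return subsets

A = ["a", "b", "c"]
subsets = power_set(A)
print(len(subsets))
for x in subsets:
    print(sorted(x))
```

Distinct Yes/No choices give distinct subsets, so the count of subsets equals the count of choice functions, 2 multiplied by itself once for each element of A.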

1.2.2 Uncountable Sets

an infinite triangle and can be counted by following the arrows. Thus we cannot construct an uncountable set by the process of unions. Incidentally, the same argument shows that the Cartesian product A × B of two countable sets is also countable (recall that the Cartesian product A × B is the set of all ordered pairs (a, b) where a varies over A and b varies over B). One can similarly show that any finite Cartesian product A_1 × A_2 × ... × A_k of countable sets is countable, but the same is not true of the Cartesian product of a countable number of countable sets (see exercise set 1.2.3, number 5).
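Counting the infinite triangle by following the arrows means listing the index pairs in the order (1,1), (2,1), (1,2), (3,1), (2,2), (1,3), and so on. The sketch below generates that order; the function name `diagonal_pairs` is our own, and the pairs stand for the positions (row, column) of the entries a_jk of the matrix.

```python
from itertools import islice

def diagonal_pairs():
    """Yield (row, column) index pairs along successive anti-diagonals."""
    total = 2  # row + column is constant on each anti-diagonal
    while True:
        for column in range(1, total):
            yield (total - column, column)
        total += 1

first_six = list(islice(diagonal_pairs(), 6))
print(first_six)

# Every pair is reached after finitely many steps; for example (3, 2)
# already appears among the first ten pairs.
first_ten = list(islice(diagonal_pairs(), 10))
print((3, 2) in first_ten)
```

Read as pairs (a, b) rather than as matrix positions, the same listing enumerates the Cartesian product of two countable sets.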

a_11 →
→ a_21 → a_12 →
→ a_31 → a_22 → a_13 →
...

like


Cantor's proof that 2^N is uncountable, where N is the set of natural numbers, goes as follows: Suppose 2^N is countable; then we will obtain a contradiction. Let u_1, u_2, u_3, ... be the supposed enumeration of 2^N. This means that each u_k is a set of natural numbers, and every set of natural numbers must appear on the list. The contradiction arises from the fact that one can now describe a set v that is not on the list. The set v is the set that contains the number k if and only if u_k does not contain k. Symbolically, v = {k : k is not in u_k}. The set v is unambiguously defined, in fact by a statement in our finite language, but the definition of v depends on the particular enumeration u_1, u_2, u_3, .... Now the very construction of v guarantees that it does not appear on the list, because it differs from u_k in the matter of the number k; one contains k and the other doesn't. This contradicts the fact that u_1, u_2, u_3, ... was supposed to list every subset of N, and thus the impossibility of such an enumeration is demonstrated.
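The construction of v can be imitated on a finite list of sets. This is a sketch under a loud assumption: Cantor's argument concerns an infinite enumeration, whereas the code below uses a hypothetical list of five sets and builds only the part of v determined by k = 1, ..., 5. It shows the mechanism (v differs from u_k in the matter of k), not the theorem.

```python
def diagonal_set(listing):
    """Return {k : k is not in u_k} for a finite listing u_1, ..., u_n."""
    return {k for k, u_k in enumerate(listing, start=1) if k not in u_k}

# A hypothetical finite listing of sets of natural numbers.
listing = [{1, 2, 3}, {1, 3, 5}, {2, 4}, set(), {5, 6}]

v = diagonal_set(listing)
print(v)

# v disagrees with each u_k about the number k, so it equals none of them.
for k, u_k in enumerate(listing, start=1):
    print(k, (k in v) != (k in u_k))
```

The set produced depends on the listing: a different listing, or the same sets in a different order, generally gives a different v.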

Despite the simplicity of Cantor's proof, it contains some subtle points that deserve to be explicated. For example, how come we can't repair the error of having left out the set v simply by inserting it somewhere on the list, say first: v, u_1, u_2, ...? Now we have an enumeration that contains v. And yet, doesn't the same argument show that v is not on the list? The key to overcoming these sophistries is to remember that the definition of v depended on the particular enumeration. Thus the set constructed by the argument from the new enumeration v, u_1, u_2, ... is a different set; call it v'. We could of course add v' to the enumeration, but then the argument would construct still another set. Thus in fact what the proof shows is that given any countable set u_1, u_2, u_3, ... of subsets of N, it is possible to construct a subset of N different from all of them. This construction is often called a diagonalization argument, since if we represent the enumeration u_1, u_2, u_3, ... by an infinite matrix of Yes's and No's, with a Yes in the j-k place meaning that j is in u_k, then v is obtained by changing all the diagonal entries. This kind of argument is used quite frequently in mathematics. We will also encounter later a different kind of diagonalization argument that shows how we can pass to a subsequence a countable number of times.

The conclusion that 2^N is of greater cardinality than N (we say that a set A has greater cardinality than a set B if A cannot be put in one-to-one correspondence with B, but a proper subset of A can be put


in one-to-one correspondence with B) raises the interesting question of whether there are sets of intermediate cardinality. In other words, does there exist a subset A of 2^N that is of greater cardinality than N but of lesser cardinality than 2^N; or on the contrary, must every infinite subset A of 2^N be capable of being put in one-to-one correspondence with either N or 2^N, as Cantor's famous Continuum Hypothesis asserts? We now have very convincing evidence, in the work of Kurt Gödel and Paul Cohen, that this question will never be answered. Gödel showed that the Continuum Hypothesis cannot be disproved, and Cohen showed that it cannot be proved. They constructed different "models" of set theory in which all the usual axioms of set theory are valid and the Continuum Hypothesis is in one case true and in another case not true. The usual interpretation of these results is that we do not have enough axioms for set theory to distinguish between these wildly different "set theories". But we don't have a clue where to search for new axioms for set theory nor any legitimate way to decide whether or not to accept new axioms.

It is possible that some intuitively appealing set-theoretic axiom may yet be discovered that will settle the Continuum Hypothesis, but at present this is just idle speculation. At any rate, we have no right to expect that our finite reasoning can fully illuminate the uncountable infinite.

Fortunately, for the analysis we are going to do in this book, these set-theoretic questions will not enter. In fact one might say that the way of analysis consists of using the finite to get at the countably infinite (by constructing arbitrarily large finite segments of countable sets) and then using the countably infinite to get at the uncountably infinite by an appropriate approximation process. The uncountably infinite will remain largely potential, just as in number theory the infinite is largely potential. Also, we will be dealing only with sets that are built up from the natural numbers in a finite number of steps; that is, we may deal with sets of sets of numbers or sets of sets of sets of numbers, and the like, although we will introduce abbreviations that will hide this fact. We will certainly never deal with such linguistic contortions as "the set of all sets", that leads to the famous paradoxes of naive set theory such as "the set of all sets not containing themselves". All the set theory we will use is capable of being axiomatized, although we shall not do so.


1.2.3 Exercises

1. Prove that every subset of N is either finite or countable. (Hint: use the ordering of N.) Conclude from this that there is no infinite set with cardinality less than that of N.

2. Is the set of all finite subsets of N countable or uncountable? Give a proof of your assertion.

3. Prove that the rational numbers are countable. (Hint: they can be written as the union over k ∈ N of the sets Q_k = {±j/k : j ∈ N}.)

4. Show that if a countable subset is removed from an uncountable set, the remainder is still uncountable.

5. Let A_1, A_2, A_3, ... be countable sets, and let their Cartesian product A_1 × A_2 × A_3 × ... be defined to be the set of all sequences (a_1, a_2, ...) where a_k is an element of A_k. Prove that the Cartesian product is uncountable. Show that the same conclusion holds if each of the sets A_1, A_2, ... has at least two elements.

6. Let A be a set for which there exists a function f from A to N with the property that for every natural number k, the subset of A given by the solutions to f(a) = k is finite. Show that A is finite or countable.

7. Generalize the diagonalization argument to show that 2^A has greater cardinality than A for every infinite set A.

1.3 Proofs

1.3.1 How to Discover Proofs

This book is about proofs. Not only will you read the proofs presented in the text, but in the exercises you will be asked to prove things. What is a proof? How do you go about finding one? How should you write a proof? Presumably, these are questions you have thought about before in the course of your mathematical education. These are not easy questions; and although there is a reasonable consensus


in the mathematical community on the answers, there is still some disagreement on where to put the emphasis.

In principle, a proof is a sequence of logical deductions from the hypotheses to the conclusion of the statement being proved. The statement is typically in the form of an implication, say "P and Q implies R". In this case there are two hypotheses, P and Q, and one conclusion, R. All terms involved in the expressions P, Q, and R should be previously defined. The first step in reading (or finding) a proof is to make sure you understand all the hypotheses and the conclusions and remember the definitions of all the terms used. Sometimes, important hypotheses are hidden in the fine print. Read the fine print.

The reasoning allowed in a proof involves the accepted principles of logical reasoning and the application of axioms or previously established theorems. A common source of error in proofs involves applying a theorem without verifying all the hypotheses of that theorem. This may result in an incorrect proof (if the theorem does not apply) or an incomplete proof (if the theorem does apply but some of the key ideas of the proof are involved in verifying this). Another common source of error involves making extra assumptions, beyond what is given in the statement. Also, if the conclusion of the statement says "for all x ...", it is not correct to prove the statement just for x = 2.

When faced with the task of finding a proof for a given statement, it is a good idea to make a list of what you know (the hypotheses, and sometimes also the definitions of key terms) and what you want to show (the conclusion, sometimes spelled out by supplying definitions of terms). Then search for theorems that connect the two. It is advisable to work simultaneously backward and forward. If your hypotheses contain the hypotheses of a theorem that you know, you may deduce the conclusion of that theorem. This is working forward. Or, you may know a theorem whose conclusion is the conclusion you are trying to show. Then the hypotheses of this theorem become the new target (what you have to show). This is working backward. Often, the fit between the theorem you want to use and the way you want to use it is not perfect. For example, in working forward, the hypotheses of the theorem you want to use may not all be exactly on your list of what you know, but you may be able to deduce them (with a little work) from what you know. In this way you can build a chain of deductions from your hypotheses and a chain of deductions to your conclusion.


If you are lucky, at some point these two chains will meet, as in Figure 1.3.1. Then you can join them to form a single forward chain of deduction from hypotheses to conclusion, the proof you sought. Such a simple strategy may be called "follow your nose". It should work for the simpler problems. Here is a good tip if you get stuck. A good problem will require that all the hypotheses be used in the proof. So if you are wondering what to do next, go down the list of what you know and see if anything hasn't been used yet. Then figure out a way to use it; that is, look for theorems that have it (or a consequence of it) as one hypothesis.

If "follow your nose" doesn't lead to a proof, you may have to resort to a "construction". This means that you introduce a new object that is not mentioned in either the hypotheses or conclusions of the statement. For example, if the statement concerns two numbers x and y, you might need to consider the average z = (x + y)/2. This is a simple example, but constructions can be much more elaborate; their discovery requires ingenuity and creativity. There is no set formula for how to go about the process, but you will find that experience is helpful. The goal of the construction is to make more theorems applicable to the problem. You may get a hint that a construction is needed if there is a theorem that seems likely to be relevant but doesn't quite apply to the objects mentioned in the problem.

After you have discovered a proof, and checked that it is a valid proof, you will want to write it up. There are two approaches you can take. You can write it in the order in which you discovered it, or you can try to rearrange the order of reasoning to "polish" the argument. For example, in Figure 1.3.1, you might first have observed that "P

Figure 1.3.1: [diagram not reproduced; surviving label: S and Q ⇒ T]


and T imply R", then that "P implies S", and finally that "S and Q imply T". A proof in the order of discovery might read as follows:

Proof: We need to show R. But the Purple Peril Theorem says P and T imply R; and since we already know P, it suffices to show T to complete the proof. But observe that P implies S by the definition of grunginess, and Q and S imply T by the previous lemma. This completes the proof. QED

Or, you could rearrange the deductions to run forward. The result might read like this:

Proof: We are assuming P and Q. Then P implies S by the definition of grunginess, and Q and S imply T by the previous lemma. Thus we have both P and T, the hypotheses of the Purple Peril Theorem, whose conclusion is R. QED

The argument for preferring the rearranged proof is that it is easier to follow. On the other hand, the first proof indicates the reasoning used in the process of discovery and so may make more sense to the reader. A little common sense may help one decide between the two options (or a compromise, that retains some of the original order of discovery but does some pruning and straightening in between).

One important issue with which you will have to grapple is the amount of detail you include in the proof. Often there are very simple arguments that can either be omitted altogether or dismissed with a phrase like "it is obvious that". If all these simple arguments were included, the proof might become so long that the reader would "lose the forest for the trees". On the other hand, you don't want to leave out any non-trivial parts of the argument or make a mistake in reasoning. The ultimate test of a written proof is whether or not it convinces the reader. This, of course, depends on the reader! For a written assignment that is being graded, you had better include enough detail to convince the grader that you know how to supply the missing arguments. For the proofs presented in the text, I have tried to supply enough detail so that you can fill in whatever is omitted.

A word about proof by contradiction (also known as indirect proof,


reductio ad absurdum, or proof by the contrapositive). It is an accepted logical principle that "not Q implies not P" is the same as "P implies Q". So to prove the statement "P implies Q", it is legitimate to start by assuming Q is false and reach a contradiction. Along the way you may use both the hypothesis P and the assumed not Q in the argument. Often this allows you to successfully find a proof that otherwise might be elusive. However, it is my experience that students tend to abuse proof by contradiction in the following way. The student will assume Q is false, then give an argument that produces Q as a consequence, and find the contradiction between Q and not Q. But in fact, buried in the middle of the student's proof will be a direct proof that P implies Q. Generally speaking, this might be the case with your proof if the contradiction involves Q and not Q. If so, check to see whether you really need not Q in your argument. If you didn't use not Q, then it is preferable to present the proof as a direct proof.

Why is the direct proof preferred over the indirect proof? Usually a direct proof is more constructive. If the conclusion is an existential statement, "there exists x such that ...", the direct proof may include an algorithm for finding x. This gives more information than simply showing that the non-existence of such an x always leads to a contradiction.

1.3.2 How to Understand Proofs

One of the goals of this book is to get you to understand the proofs of the theorems presented. There are many levels of understanding, and they will involve considerable effort on your part. On the first level of understanding, you should be able to read through the proof and be convinced that the reasoning is correct. There must be no "proof by intimidation". You have to see all the individual steps of the argument; and this means going through the proof line by line, checking all the details that are presented, and filling in the details that are left out. I have tried to present the proofs clearly, but it will probably happen that some statements are not clear to you on first reading. Put a mark in the text when this happens and be sure to return to these points and clear them up, either by yourself or with the help of someone else.

The second level of understanding involves grasping the structure of the proof. What are the main theorems that are used? What constructions are involved? What are the main difficulties, and how are they overcome? You should have in your mind an outline of the proof, with some hints as to how to fill in the details. Without actually having to memorize anything, you should retain enough to be able to reconstruct the proof in your own words without referring to the book.

The third level of understanding involves seeing how every hypothesis enters into the proof. You should be able to point out where each hypothesis is used: better still, you should be able to give a counterexample to the theorem without that hypothesis. You should explore the possibilities of weakening the hypotheses or strengthening the conclusion. Is the result still true, or can you find a counterexample? If the proof involves a construction, try to understand why that particular construction was chosen. In short, this level of understanding requires that you take the proof apart and put it back together to see how it ticks (this cliché refers to the distant past when watches actually ticked, and you could actually take them apart). When you have achieved this level of understanding, you should have no difficulty remembering the exact statement of the theorem.

There is another level of understanding, although this refers to the theorem as a whole, not just the proof. This level refers to the way the theorem is used and how it can be generalized. Obviously, you cannot achieve this level right away. Working the problems will help give you a feel for what is involved in using a theorem. By the time you reach the end of this book you will have seen how many of the theorems in the beginning of the book are used. As you continue your mathematical education you will see how the themes developed in this book recur and how theorems are used and generalized. By the same token, you should find as you study this book that your understanding of previously learned mathematics is broadened and deepened. In mathematics, truly, nothing should be forgotten.

1.4 The Rational Number System

We take the rational number system as our starting point in the construction of the real number system. We could, of course, give a detailed construction of the rational numbers in terms of more primitive notions. However, every mathematical work must start somewhere, with some

Chapter 1 Preliminaries18

Page 38: Strichartz_The Way of Analysis 2000

p p' p.p'q.q'=q.q'.

These operations are well defined in that if we replace one of the ex­pressions in the sum or product by a different express ion for the samerational number, then the result will be a different expression for thesame rational number (so 1/2 + 1/3 = 5/6 and 2/4 + 1/3 = 10/12 but5/6 and 10/12 are expressions for the same rational number). This mayseem like a trivial point, but it is essential that we observe it. Later,when we define real numbers, we will encounter a similar situation inthat there will be many different expressions for the same real number,and when we define operations like addition on real numbers we will al­so be obliged to verify that the result is independent of that particularexpression is chosen.

The rational numbers form a field under the operations of additionand multiplication defined aboye. This means

1. addition and multiplication are each commutative and associative,a+b = b+a, (a+b)+c = a+(b+c), a-b = b-a, (a·b)·c = a·(b·c);

and

common notions that are accepted without formal development. It isappropriate, since we are entering upon the way of analysis, that ischaracterized by infinite processes, that we begin with a system thatis essentially finite and yet is as close to our goal as possible-so thatwe do not excessively weary ourselves with preliminaries. The rationalnumber system serves this purpose admirably. (There are an infinitenumber of rationals, but we do not require infinite notions to definethe algebraic and order structures of the rationals.)

It is assumed that the reader has had ample experience in dealingwith the rational number system and feels comfortable with its prop­erties. We give here a summary of the properties we will be using.

A rational number is a number of the form p/q, where p and q areintegers and q is not zero. The expressions p/q and p' /q denote thesame rational number if and only if pq = p'q. Every rational numberhas a unique irreducible express ion p/q where q is positive and as smallas possible.

Arithmetic is defined for rational numbers by

p p' pq +p'qq + q' = qq'

191.4 The Rational Number System

Page 39: Strichartz_The Way of Analysis 2000

In addition to the connections with arithmetic, there are some otherproperties of the ordering of the rational numbers that are noteworthy.One is called the Axiom 01 Archimedes: for every positive rationalnumber a > O there exists an integer n such that a > l/no (Notethat the term "axiom" is used here for historie reasons only; it is nottaken as an axiom for the rational numbers but rather is a theoremthat can be proven for the rational nurnber system). The reciprocalversion of this is that every positive rational number is less than sorneinteger. This will turn out to be a very crucial property--one that isalso possessed by the real number system.

la - bl ~ [e] - Ibl·

The fíeld axioms imply a11the usual laws of arithmetie and allow thedefinition of subtraction and division by a non-zero rational number.

The rational numbers also possess an order. A rational is positiveif it has the expression p/ q with p and q positive. It is negative if ithas the expression p/ q with p negative and q positive. Every rationalis either positive, negative, or zero. In terms of this we define a < b ifb - a is positive and a ~ b if b - a is positive or zero (non-negative).

The order and the arithmetic are connected. The sum and productof positive rational numbers are positive. These properties expresa thefact that the rational numbers form an ordered field. Al! the usualproperties relating order and arithmetie (such as a > b and e ~ dimply a + e > b + d) are easily deducible from them.

For example, we can define the absolute value lal = a ir a ~ O and-a if a < O and prove the triangle inequality la+bl ~ lal+lbl for rationalnumbers. This property will be used frequently and generalized broadly(the terminology "triangle inequality" comes from a generalization tovectors in the plane, where la + bl is interpreted as the length of thethird side of a triangle whose two other sides have length [e] and lb!).The triangle inequality is also frequently used in transposed form

2. multiplication distributes over addition, a . (b + e) = a . b + a . C;

3. O is the additive identity, O + a = a, and 1 is the multiplicativeidentity, 1 . a = e;

4. every rational has a negative, a + (-a) = O, and every non-zerorational has a reciprocal, a· l/a = 1.

Chapter 1 Preliminaries20
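Some of the properties of the rationals summarized in this section can be checked on small examples. What follows is only a minimal sketch, assuming Python's standard-library `fractions.Fraction` as a stand-in for the rational numbers; the helper names `add_pairs`, `same_rational`, and `archimedes_n` are hypothetical and introduced purely for illustration. It applies the addition rule to two different expressions for the same rational, spot-checks the triangle inequality in both forms, and finds an integer $n$ with $a > 1/n$.

```python
from fractions import Fraction
from math import floor


def add_pairs(p, q, p2, q2):
    """Add p/q and p2/q2 by the rule (p*q2 + p2*q) / (q*q2), without reducing."""
    return (p * q2 + p2 * q, q * q2)


def same_rational(a, b):
    """Expressions p/q and p'/q' denote the same rational when p*q' == p'*q."""
    return a[0] * b[1] == b[0] * a[1]


def archimedes_n(a):
    """Return a positive integer n with a > 1/n, for a positive rational a."""
    if a <= 0:
        raise ValueError("a must be positive")
    # n exceeds 1/a, hence 1/n is below a.
    return floor(1 / a) + 1


# Well-definedness: 1/2 and 2/4 are different expressions for one rational.
s1 = add_pairs(1, 2, 1, 3)  # (5, 6)
s2 = add_pairs(2, 4, 1, 3)  # (10, 12)
print(s1, s2, same_rational(s1, s2))

# Triangle inequality and its transposed form on one sample pair.
a, b = Fraction(-7, 3), Fraction(5, 4)
print(abs(a + b) <= abs(a) + abs(b))
print(abs(a - b) >= abs(a) - abs(b))

# Axiom of Archimedes for a small positive rational.
n = archimedes_n(Fraction(3, 1000))
print(n, Fraction(3, 1000) > Fraction(1, n))
```

A finite check of this kind proves nothing in general; it only illustrates what the statements say.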


Another interesting property of the ordering of rational numbers is that between any two distinct rationals there is an infinite number of other rationals. Thus there is no next largest rational. If you have ever studied the concept of well-ordering you will recognize the fact that the ordering of the rationals is not one.

The rational number system, with its arithmetic and ordering, forms such a simple and elegant mathematical model that it is tempting to want to stay within its comfortable domain. What more could one demand of a number system? Why not do analysis here? To find out, turn to the next chapter.

1.5 The Axiom of Choice*

In thinking about infinite sets, we are inclined to adopt forms of reasoning that arise from our intuitive ideas about finite sets. This transference of ideas from the finite to the infinite is by no means routine and often has consequences that are unforeseen. Such an innocent principle as the law of the excluded middle (that a statement must either be true or false) results in the non-constructive nature of mathematics. We can prove "there exists x such that blah" by showing that "for all x not blah" leads to a contradiction, without offering a clue as to how to find the x whose existence is asserted. With the exception of the Intuitionist and Constructivist schools of thought, most mathematicians accept this sort of non-constructivity routinely, with the feeling that the problem of actually finding the x is a legitimate, but different, mathematical problem.

The Axiom of Choice is another principle (obvious for finite sets and transferred to infinite sets by analogy) that leads to non-constructive mathematics. In many ways the use of this axiom leads to a higher degree of non-constructiveness than the use of the law of the excluded middle. (Here I am referring to an intuitive conception of the degree of non-constructiveness rather than a formal mathematical theory.) For this reason, the axiom of choice has received careful scrutiny and perhaps a bit of notoriety as well. Since we will be using it as a valid method of reasoning, we will take the time here at the beginning to discuss it in detail.

If $A$ and $B$ are two sets that are non-empty, we define the Cartesian product $A \times B$ to be the set of ordered pairs $(a, b)$ with $a$ in $A$ and $b$ in $B$. For finite sets $A$ and $B$ this is a completely straightforward definition, and even for infinite sets $A$ and $B$ it causes little problem. If $A$ is non-empty it must contain at least one element $a_1$ and if $B$ is non-empty it must contain at least one element $b_1$, so $(a_1, b_1)$ is in $A \times B$. The Cartesian product of two non-empty sets is non-empty. Of course we may not have a constructive procedure for obtaining the element $(a_1, b_1)$ of $A \times B$, if we lack a constructive method for getting $a_1$ or $b_1$. But the process of pairing does not add to the non-constructivity. If we can "construct" $a_1$ and $b_1$, then we can "construct" $(a_1, b_1)$.

The same ideas can be used to create the Cartesian product $A_1 \times A_2 \times \cdots \times A_n$ of $n$ sets for any finite $n$. This is the set of ordered $n$-tuples $(a_1, a_2, \ldots, a_n)$ where each $a_j$ is an element of $A_j$. Once again, the Cartesian product of non-empty sets is non-empty.

We encounter the axiom of choice when we try to extend these ideas to an infinite collection of sets. Suppose $A_1, A_2, \ldots$ is a countable collection of sets. The Cartesian product $A_1 \times A_2 \times \cdots$ is defined to be the set of sequences $(a_1, a_2, \ldots)$ where each $a_n$ belongs to $A_n$. The countable axiom of choice asserts that if the sets $A_n$ are all non-empty, then the Cartesian product is also non-empty. The term "choice" refers to the fact that any particular element $(a_1, a_2, \ldots)$ of the Cartesian product arises from the "choice" of one $a_n$ from each set $A_n$.

There are two important points that need to be emphasized here. The first is that it is usually not necessary to invoke this axiom in order to show that the Cartesian product is non-empty. In most particular cases we know enough about the sets $A_n$ to produce a sequence $(a_1, a_2, \ldots)$ by other methods of reasoning. For example, if the $A_n$ are lines in the plane (being considered as sets of the points on the lines), we can take a fixed origin in the plane and define $a_n$ to be the point on the line closest to the origin. We thus use a theorem of Euclidean geometry to produce the element $(a_1, a_2, \ldots)$. From the point of view of "choice", we have replaced the infinite simultaneous unspecified choice of the axiom with an infinite simultaneous specified choice. When a specified choice is available, the axiom of choice is unnecessary.
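The example of lines in the plane can be made concrete. The following is a sketch only, under assumptions that are not in the text: each line is written as $ax + by = c$ with rational coefficients and $(a, b) \ne (0, 0)$, and the family $A_n$ is taken to be the hypothetical lines $x + ny = n$. The function names are likewise illustrative. The point is that a single rule produces $a_n$ for every $n$ at once, which is what makes the choice "specified".

```python
from fractions import Fraction


def closest_point_to_origin(a, b, c):
    """Closest point to (0, 0) on the line a*x + b*y = c, with (a, b) != (0, 0).

    The closest point is the foot of the perpendicular from the origin,
    namely (a*c, b*c) / (a*a + b*b).
    """
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    norm_sq = a * a + b * b
    if norm_sq == 0:
        raise ValueError("a and b cannot both be zero")
    return (a * c / norm_sq, b * c / norm_sq)


def choose(n):
    """A specified choice a_n from the assumed line A_n: x + n*y = n."""
    return closest_point_to_origin(1, n, n)


# The first few terms of the sequence (a_1, a_2, ...).
for n in range(1, 5):
    print(n, choose(n))
```

No axiom is needed here because `choose` is an explicit function of $n$; the axiom only enters when no such uniform rule is known.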

The second point is that it is the infinite number of sets involved and not the infinity of cardinality of the sets $A_n$ that requires the axiom. We might even need to use the axiom if each of the sets $A_n$ contains only two elements. It might be that each $A_n$ contains two elements, and we might even possess procedures for choosing one of the two elements of $A_n$, but the procedures might be so unrelated to each other that we cannot specify a general procedure. Thus the countable axiom of choice, applied to sets of cardinality two, leads to non-constructive existence.

Normally the countable axiom of choice is used to justify the inclusion in an argument of the making of a countable number of simultaneous choices where we cannot (or are too lazy to) make the choices in a specific way. In such applications we can avoid all mention of the Cartesian product, since it is the individual sequence $(a_1, a_2, \ldots)$ that we need. Such uses are relatively uncontroversial and do not lead to any worse non-constructiveness than the use of the law of the excluded middle.

The general axiom of choice refers to an arbitrary (perhaps uncountable) collection $\mathcal{A}$ of non-empty sets $A$ and asserts the possibility of making a choice of one element from each of the sets $A$ (formally, a function $f$, called a choice function, whose domain is $\mathcal{A}$, and such that $f(A)$ is a point in $A$ for each set $A$ in $\mathcal{A}$). Although the general concept of set as an arbitrary collection of elements would naturally lead us to accept this axiom (if each of the sets $A$ is non-empty, why shouldn't there be a choice function?), it does lead to a level of non-constructivity that is mind-boggling.

Here is an example. The general axiom of choice can be used to show that there exists an ultrafilter on the set of natural numbers. Intuitively, an ultrafilter is a collection of the "big" sets of numbers. It must possess the property that if $A$ contains $B$ and $B$ is big, then $A$ is big and also that the intersection of any finite number of big sets is also big. These two consistency conditions define the notion of filter. There are many filters; the simplest example is to define the complements of finite sets to be big. The ultra denotes the additional property that every set of numbers must either be big or else its complement must be big. This means for instance that either the even numbers or the odd numbers must be big (not both, for their intersection is empty). Thus the ultrafilter makes an arbitrary choice, in a consistent manner, between each set and its complement. Clearly there is no specific way to make such a choice, and it is beyond the imagination how such a choice could be made. The use of the words "there exists" in the phrase "there exists an ultrafilter on the natural numbers" thus involves a further step away from the constructive. For this reason, any use of the general axiom of choice deserves special mention and comment. Fortunately we will not need to use this axiom in this work.


Chapter 2

Construction of the Real Number System

2.1 Cauchy Sequences

2.1.1 Motivation

We have an intuitive concept of the real number system; it is the number system that should be used for measurements of space and time as well as for other quantities such as mass, temperature, and pressure that are thought of as varying continuously rather than discretely. This intuitive real number system has been used by mathematicians since at least the period of ancient Greek mathematics, but it was not until the second half of the nineteenth century that a satisfactory formal mathematical system was constructed that could serve in its place. In fact, the whole history of the discovery of the foundations of analysis reads backward, much as one peels a cabbage starting from the outermost leaves and working inward: mathematicians started by giving precise definitions for the most advanced concepts such as derivative and integral (Cauchy and Bolzano, 1820s) in terms of an intuitive real number system, then worked inward to construct the real number system (Weierstrass, Dedekind, Meray, Heine, Cantor, 1860s and 1870s) in terms of an intuitive set theory, and then worked to the core of axiomatic set theory (Peano, Frege, Zermelo, Russell, Whitehead, Fraenkel, starting in the 1890s and extending well into the twentieth century). Whether or not this is really the core or merely another inner leaf surrounding a more elementary core is left for the future to answer. The order of eating the cabbage does not have to correspond to the order of peeling, so in this book we will discard the core and start with the inner leaves.

We formulate the problem as follows: construct a mathematical system, by means of precise, unambiguous definitions and theorems proved by purely logical reasoning, accepting as given the logic, set theory, and rational number system discussed in the previous chapter, which has as much as possible in common with the intuitive real number system. This problem is clearly not as well formulated as we might like. It is by no means clear that a solution is possible or that there are not many different solutions; we do not even have a strict standard by which to judge what constitutes a solution. Nevertheless it is an important problem, and we shall study in detail one solution, called the Cauchy completion of the rationals. This solution is equivalent to several others, which we will discuss later. Together we can refer to these as the classical real number system. Since there is no completely objective method to evaluate how closely this system conforms to the intuitive concept of real numbers, we must rely on the consensus of the majority of working mathematicians and users of mathematics to ratify the choice of the classical real number system as a worthy and successful solution. As a student of mathematics, you are invited to study this system and become part of the consensus or, if you wish, to oppose the consensus. (There are two other seriously competing number systems, called the constructive real number system (E. Bishop, 1970) and the non-standard real number system (A. Robinson, 1960). Each has a following of mathematicians who believe that the alternative system is in fact a better solution to the problem as formulated above than the classical real number system. These systems are discussed briefly at the end of the chapter. Neither would make a good basis for a book on this level because both require a thorough understanding of the classical real number system.)

Before discussing the construction of the classical real number system, we should at least attempt to spell out some of the properties of the intuitive real number system that we expect the formal mathematical system to possess. It should certainly contain the rational numbers, and it should have an arithmetic with similar properties. In addition to the arithmetic operations, the rational numbers possess a compatible notion of order that is of great importance, so we want the real number system to have a similar order. We can summarize the above by saying we want the real number system to be an ordered field, that is, a set $\mathbb{R}$ with two operations, addition ($x + y$) and multiplication ($x \cdot y$), an additive unit 0, a multiplicative unit 1, and an order relation ($x < y$), such that the ordered field axioms described in Section 1.4 are satisfied. (We will recall these axioms when we prove that the system we construct is in fact an ordered field.)

So far we have not discussed any requirements for the real number system that are not already possessed by the rational number system. Nevertheless we know that the rational number system is not large enough for even simple geometry, let alone analysis. There are "numbers", such as $\sqrt{2}$, for which we have intuitive evidence favoring inclusion in the real number system, which are not included in the rational number system. For $\sqrt{2}$ the evidence is especially striking, consisting of drawing the diagonal of a square whose sides have length one and quoting the Pythagorean theorem. While this "construction" of $\sqrt{2}$ is extremely picturesque, it is somewhat misleading in that it only involves a finite number of steps; if we were to insist that all "constructions" involve only a finite number of steps we would never be able to follow the way of analysis. Of course there are many other examples of numbers such as $\pi$, $e$, $\sqrt{2}^{\sqrt{2}}$ for which no finite construction exists; still it is particularly simple to discuss $\sqrt{2}$ and to learn an important point from the discussion.

What do we know about $\sqrt{2}$? By definition it is the positive solution to the equation $x^2 = 2$. (You probably remember the proof that no rational number can satisfy this equation: if $x = p/q$ is a rational number, factor out all powers of two, so $x = 2^k p_1/q_1$ with $p_1$ and $q_1$ odd and $k$ an arbitrary integer. Then $x^2 = 2^{2k} p_1^2/q_1^2$ and $x^2 = 2$ lead to $2^{2k-1} p_1^2 = q_1^2$, an even = odd contradiction for $k$ positive, or $2^{1-2k} q_1^2 = p_1^2$ if $k$ is not positive.) Perhaps you remember an algorithm for computing the decimal expansion of $\sqrt{2}$. It is somewhat cumbersome, and in fact there is a simpler and more efficient (you get more accuracy for the same amount of labor) method, which is a lot more fun. Choose a first guess $x_1$, and take for the second guess the average of the first guess and two divided by the first guess:

$$x_2 = \frac{1}{2}\left(x_1 + \frac{2}{x_1}\right).$$

If the first guess were exactly $\sqrt{2}$, then the second guess would also be $\sqrt{2}$; but if you guessed too low, then $2/x_1$ would be greater than $\sqrt{2}$ and the average would be closer to $\sqrt{2}$, and similarly if you guessed too high. The process can then be iterated, producing a third guess $x_3 = \frac{1}{2}(x_2 + 2/x_2)$ and so on. In this way we get a sequence $x_1, x_2, x_3, \ldots$ of better and better approximations to $\sqrt{2}$, and with a little calculus one can show that convergence is quite rapid (we will discuss this example later). It is quite likely that your pocket calculator uses this procedure, or something similar, when it tells you $\sqrt{2} = 1.414\ldots$ (the calculator stops iterating when the iteration produces no change in the number of decimals retained).
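The averaging procedure is easy to try out. Here is a minimal sketch, assuming Python's standard-library `fractions.Fraction` so that every guess stays an exact rational number; the function name `sqrt2_guesses` and the first guess $x_1 = 1$ are choices made only for this illustration.

```python
from fractions import Fraction


def sqrt2_guesses(first_guess, count):
    """Return the guesses x_1, ..., x_count, where x_{k+1} = (x_k + 2/x_k) / 2."""
    x = Fraction(first_guess)
    guesses = []
    for _ in range(count):
        guesses.append(x)
        # Average the current guess with two divided by the current guess.
        x = (x + 2 / x) / 2
    return guesses


guesses = sqrt2_guesses(1, 5)
for k, x in enumerate(guesses, start=1):
    # Show each guess exactly, as a decimal, and how far x^2 is from 2.
    print(k, x, float(x), float(abs(x * x - 2)))
```

Starting from 1, the exact guesses begin 1, 3/2, 17/12, 577/408, and the printed distance between $x^2$ and 2 shrinks very quickly, in line with the remark that convergence is rapid.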

The key point of this discussion is that when we calculate $\sqrt{2}$ numerically, what we actually obtain is a sequence of approximations to $\sqrt{2}$, whether it be the successive partial decimals 1, 1.4, 1.41, 1.414 or the successive guesses $x_1, x_2, x_3, \ldots$. In any particular computation we obtain only a finite number of approximations, but in principle the approximation could continue indefinitely. The evidence for the existence of $\sqrt{2}$ as a number is then that we can approximate it by other numbers whose existence we already know (the partial decimals are all rational numbers, and the same is true of the sequence $x_1, x_2, \ldots$ above, provided the first guess $x_1$ is a rational number). The same can be said, for example, for $\pi$, $e$, $\sqrt{2}^{\sqrt{2}}$. Thus we want the real numbers to possess a property not shared by the rational numbers, which we will call completeness. For now we can describe this intuitively by the condition that anything that can be approximated arbitrarily closely by real numbers must also be a real number; later we will give the formal counterpart of this statement and prove that the real number system we construct does have this property. In the meantime we will use this intuitive description as a clue to how to proceed.
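The partial decimals 1, 1.4, 1.41, 1.414 can themselves be produced with rational arithmetic alone. The sketch below is one possible way to do so, assuming Python's `math.isqrt` (the integer square root) and `fractions.Fraction`; the function name `sqrt2_partial_decimal` is hypothetical. It returns the largest decimal with $k$ places whose square does not exceed 2.

```python
from fractions import Fraction
from math import isqrt


def sqrt2_partial_decimal(k):
    """Largest decimal with k places whose square is at most 2, as a rational.

    isqrt(2 * 10**(2k)) is the largest integer d with d*d <= 2 * 10**(2k),
    so d / 10**k is the desired k-place truncation.
    """
    return Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)


for k in range(6):
    d = sqrt2_partial_decimal(k)
    # Each approximation is rational, and 2 lies strictly between the squares
    # of d and d + 10**(-k).
    print(k, d, d * d < 2 < (d + Fraction(1, 10 ** k)) ** 2)
```

Every term is a rational number, yet no term squares to exactly 2, which is the situation the intuitive notion of completeness is meant to address.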

As a counterweight to the intuitive notion of completeness, we wanta principIe that will keep the real number system from being too large(for example, to exclude imaginary numbers like H)o The simplestsuch principIe, called the density 01 the rational numbers, is that thereare rational numbers arbitrarily close to any real number, This means

Chapter 2 Construction of the Real Number System28

Page 48: Strichartz_The Way of Analysis 2000

where the ... here means that we don't know what comes next? Is ita 2 or a 3? We can't say in advance how many more digits we need tocompute before we know, despite the fact that we already know thatthe sum is extremely close to .3.

But aside from these technical problems, which can be overcome,there is a more important reason why mathematicians prefer not todescribe the real number system in terms of infinite decimal expansions.This reason is that the infinite decimal is only one way of describing realnumbers and, although it has its uses, is a somewhat peculiar one (wementioned two such peculiarities above). It would be pedagogically andpsychologically unsound to devote minute attention to the peculiaritiesoí this system oí representing numbers, since these peculiarities do notshed any light on the path we intend to follow. Instead we wiIl look

.199999999 .+ .100000000 .

that if we represent the real numbers and rational numbers graphicallyby points along a line, we will not see any difference. The "holes" inthe rational number system due to the absence of irrational numberslike J2 are not visible, because of the rational numbers nearby.

Having rejected the rational number system because it is too smaIl,let us pause to consider another possible easy way out: infinite dec­imal expansions. AIl the numbers we have been talking about haveinfinite decimal expansions, and we could just as easily think of thecomputations above of J2 as merely producing more and more digitsof this expansiono On a certain level we would not be far from wrongto define the real number system as the set of all infinite decimals ±N.al, a2, a3, ... , where N is a nonegative integer, and each aj is a digitfrom O to 9. Of course there is nothing special about the base 10; wecould use any other base just as weIl. Because of the familiarity ofinfinite decimals, this proposal is quite appealing. However, it has twotechnical drawbacks. The first is that the decimal expansion is not u­nique: .999 ... and 1.000 ... are the same number. This is usuaIly metby the ad hoc requirement that the decimal cannot end in an infinitestring of zeroes. The second drawback is that it is somewhat awkwardto define addition and multiplication, because long carries could changeearlier digits. For example, what is the first digit in the sum

292.1 Cauchy Sequences

Page 49: Strichartz_The Way of Analysis 2000

Let us examine more closely the idea oí approximating exotic numbers,like V2, by sequences oí more prosaic numbers. What is it about thesequence 1, 1.4,1.41,1.414, ... that gives us confidence there is sornenumber being approximated? We might say that there is a "comingtogether oí terms", unlike the sequences 1,2,3, ... or 1,2, 1,2,1,2, ... ,which do not appear to approximate anything. The key question, whichcan be proposed first in the intuitive real number system and was firstsolved by Cauchy in that context, is the following: what condition ona sequence 01 numbers is necessary and sufficient [or the sequence toconverge to a limit but does not explicitly involve the limit? Oí courseif we knew in advance the number x to which the sequence Xl, X2,' ••

oí numbers is supposed to converge, we could express the convergencein the typical way: [or all natural numbers n, there exists a naturalnumber m (depending on n), such that Ix - xkl < l/n [or all k ~ m.In other words, given any prescribed error l/n, if we go far enough outin the sequence (beyond m) the terms all differ from X by at most l/n.This is the standard definition oí limito Commonly the error l/n isdenoted € and is allowed to be any positive quantity. But this is merelyan equivalent variant since we can always find l/n smaller than €. Inthis book we will use l/n rather than € because it simplifies matters.Regardless oí which variant we use, it should be recognized that thisdefinition is only precise if we know what "number" means.

Cauchy's problem was how to get the limit x out oí the definition oílimit! His solution was to observe on an informal level that il the num­bers Xk are getting close to x, they must be getting close to each other.To translate this into a precise statement, however, requires sorne careoSuppose we try the most obvious condition: that consecutive terms getclose together. While this is true oí convergent sequences, it is also trueoí the sequence 1, 1~, 2, 2i, 2¡, 3, 3i, 3~, 3~, 4, 4t, ... , which does notconverge. It is not sufficient that consecutive terms be close; we needall terms beyond a certain point to be close. This is called the Cauchycriterion: [or all natural numbers n there exists a natural number m

2.1.2 The Definition

for a deeper method, one that will cast a shadow forward as well asbackward. In the end we will show that all the numbers in our systemhave infinite decimal expansions, so that we have an equivalent system.

Chapter 2 Construction oí the Real Number System30

Page 50: Strichartz_The Way of Analysis 2000

Now we examine what happens when we increase n, say to n'. Thenthere exists m', beyond which the terms all differ from one anotherby at most l/n'; in other words, by going farther in the sequence, the

Figure 2.1.1:

Xm+lXm+2

lIn

Informal Proof: Let Xl, X2, X3, ••• be a Cauchy sequence ofreal num­bers. We want there to be a real number X that is the limito Whatshould z be? Suppose we want to determine z to an accuracy of l/noThen by the Cauchy criterion there exists m (depending on n) suchthat all terms beyond the m-th differ from each other by at most l/noIf we plot all the numbers Xk for k ~ m(n) on a line, they willlie in asegment of width at most l/n and the limit presumably must also liein that segment, as shown in Figure 2.1.1.

(depending on n) sucb that for all j ~m and k ~ m, IXj - xkl ~ l/noIn other words, beyond the m-th term in the sequence, all terms differfrom one another by at most l/no A sequence that satisfies the Cauchycriterion is called a Cauchy sequence. Clearly any convergent sequenceis a Cauchy sequence, because if we go far enough out in the sequencethat Xj and XIc differ from the limit x by at most 1/2n, then they willdiffer from each other by at most 1/2n + 1/2n = l/no But the def­inition of Cauchy sequence does not involve the limit, as we wanted,so it is not immediately clear that every Cauchy sequence has a limitoCauchy claimed to have proved this, but on a rigorous level his proofhad to be bogus since he never defined "number". Nevertheless, weshould consider carefully an informal proof that the statement "everyCauchy sequence converges to a real number" accords with our intuitiveconcepts, especially the idea of completeness.

312.1 Cauchy Sequences

Page 51: Strichartz_The Way of Analysis 2000

Considering the situation for all values of n, we come to the conclusionthat there exists a nested sequence of segments of length l/n in whichthe limit presumably lies. This suggests that the limit x is exactly thatnumber that is in all those segments. Ir there were no such number,this would suggest a "hole" in our number system, something thatthe idea of completeness is supposed to prevente In any event we can"compute" this number x to any desired accuracy, say l/n, by takingany value in the n-th segment (say Xm for m large enough that Xm liesin the n-th segment). Finally the sequence xI, X2, X3,'" must convergeto this limit since for any error l/n, there exists m (depending on n),such that all the terms beyond the m-th and the number x all He inthe m-th segment and so differ by at most l/no

We can illustrate this argument neatly in what I will call the "two­dimensional picture", Draw a graph that plots Xn on the y-axis overthe point l/n on the x-axis, as in Figure 2.1.3, drawing straight-linesegments in between to make the picture clearer.

The limit of the sequence is the point on the y-axis that the graphhits. The Cauchy criterion means that a portion of the graph is boxedin by a sequence of concentric rectangles (two of which are drawn),whose x-coordinates go from Oto l/m (the condition k ~ m) and whosey-coordinates lie in a segment of length l/n (these are the segmentsdiscussed aboye). The two-dimensional picture lets you "see" how theCauchy criterion forces the graph to hit the y-axis at a precise spot.

At present we have no way of making this informal proof precise.Nevertheless it is important because it will motivate our construction

Figure 2.1.2:

l/n

,.__ l/n' _____.

segment that contains all the terms, and presumably the limit, narrows,as shown in Figure 2.1.2.

Chapter 2 Construction of the Real Number System32

Page 52: Strichartz_The Way of Analysis 2000

of the real number system. Since we already have the rational numbers and we believe that every Cauchy sequence should have a real number as limit, we should certainly add to the rational numbers a real number to be the limit of each Cauchy sequence of rational numbers (a Cauchy sequence of rational numbers does not necessarily have a rational number as limit, as the sequence $1, 1.4, 1.41, 1.414, \ldots$ approximating the irrational number $\sqrt{2}$ shows). Suppose we had a number system that contained rational numbers and limits of Cauchy sequences of rational numbers; would that be enough? One might first guess "No", since there would then be Cauchy sequences of irrational numbers with which to contend. They would have to have limits, and wouldn't these constitute a new variety of real numbers? Perhaps so, but perhaps not, because more than one Cauchy sequence can have the same limit (compare $1, 1/2, 1/3, 1/4, \ldots$ with $0, 0, 0, \ldots$). It might be that every Cauchy sequence of real numbers $x_1, x_2, x_3, \ldots$ has the same limit as a Cauchy sequence of rational numbers $y_1, y_2, y_3, \ldots$. In fact the following informal proof appears convincing: just choose for $y_n$ any rational number that differs from $x_n$ by at most $1/n$ (here we use the intuitive property that there are rational numbers arbitrarily close to any real number). The error involved in changing from $x_n$ to $y_n$ gets smaller as we go out in the sequence and so shouldn't change the limit. Another informal demonstration that we should get every real number as a limit of Cauchy sequences of rational numbers is the infinite decimal expansion; if a real number $x$ has an infinite decimal representation $\pm N.a_1a_2a_3\ldots$, then $\pm N, \pm N.a_1, \pm N.a_1a_2, \ldots$ is a Cauchy sequence of rational numbers approximating $x$.

Figure 2.1.3: [figure not reproduced; axis labels 0, 1/5, 1/4, 1/3, 1/2]

We now believe that we can obtain every real number as the limit of a Cauchy sequence of rational numbers (if the number $x$ is rational, then we can take the sequence $x, x, x, \ldots$) and that every Cauchy sequence of rational numbers has a real number as a limit. We are almost ready to make this insight into a definition, but there is one missing ingredient. We need to know when two different sequences have the same limit. Fortunately this is an easy problem to solve. Suppose the sequences $x_1, x_2, \ldots$ and $x'_1, x'_2, \ldots$ have the same limit $x$. Then for every error $1/n$ there is a position $m$ in the sequence $x_1, x_2, \ldots$ beyond which all the $x_k$ differ from $x$ by at most $1/n$, and similarly there exists a position $m'$ in the sequence $x'_1, x'_2, \ldots$ beyond which all the $x'_k$ differ from $x$ by at most $1/n$. If we go beyond the larger of the two, $m$ and $m'$, then $x_k$ will differ from $x'_k$ by at most $1/n + 1/n = 2/n$. Clearly it is just a change of notation to arrive at the formulation: for every $n$ there exists $m$ (depending on $n$) such that for all $k \ge m$, $|x_k - x'_k| \le 1/n$. Two Cauchy sequences with this property will be called equivalent (at present this definition can only be formally stated for Cauchy sequences of rational numbers, since we have not defined the meaning of $|x_k - x'_k| \le 1/n$ for real numbers $x_k$ and $x'_k$; however, once we have done so, the definition will apply verbatim to sequences of real numbers). We have thus shown, informally, that if two Cauchy sequences have the same limit, then they are equivalent. What about the converse? If two Cauchy sequences $x_1, x_2, \ldots$ and $x'_1, x'_2, \ldots$ are equivalent, by how much can their limits $x$ and $x'$ differ? Certainly not by more than $1/n + 1/n$ for any $n$. This implies that the two Cauchy sequences have the same limit, at least if we believe the axiom of Archimedes (every positive number is greater than $1/n$ for some $n$). We have seen that the axiom of Archimedes is true for rational numbers, and it is one of the properties we want to keep when we pass from the rationals to the reals.
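The equivalence condition can be tried out on the two sequences compared above, $1, 1/2, 1/3, \ldots$ and $0, 0, 0, \ldots$. The following Python sketch is only a numerical illustration: it tests the inequality on a finite window of indices, which proves nothing about the whole sequence. The helper name `equivalent_on_window` and the `window` parameter are assumptions introduced here, and sequences are modeled as functions from the index $k$ to an exact rational.

```python
from fractions import Fraction

def equivalent_on_window(x, y, n, m, window=1000):
    """Check |x(k) - y(k)| <= 1/n for every k with m <= k < m + window.

    This is a finite spot check of the condition, not a proof of equivalence.
    """
    bound = Fraction(1, n)
    return all(abs(x(k) - y(k)) <= bound for k in range(m, m + window))

# The two sequences compared in the text: 1, 1/2, 1/3, ... and 0, 0, 0, ...
harmonic = lambda k: Fraction(1, k)
zeros = lambda k: Fraction(0)

# For the error 1/n the position m = n works, since 1/k <= 1/n whenever k >= n.
for n in (1, 10, 100):
    assert equivalent_on_window(harmonic, zeros, n, m=n)

# Starting too early fails: the first term 1 differs from 0 by more than 1/10.
assert not equivalent_on_window(harmonic, zeros, n=10, m=1)
```

Note that the position $m$ has to grow with $n$, which is exactly the "depending on $n$" clause in the formulation.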

Let us summarize the informal conclusions we have obtained so far: if the real number system is to be an ordered field containing the rational numbers for which completeness, density of the rationals, and the axiom of Archimedes hold, then it should consist of limits of Cauchy sequences of rationals with two such limits being considered equal if and only if the Cauchy sequences are equivalent.

The use of the term "equivalent" in mathematics requires that the properties of reflexivity ($A$ is equivalent to $A$), symmetry ($A$ is equivalent to $B$ implies $B$ is equivalent to $A$), and transitivity ($A$ is equivalent to $B$ and $B$ is equivalent to $C$ implies $A$ is equivalent to $C$) be satisfied. If these properties hold, then we have a true equivalence relation, and we can divide the set on which the relation is defined into disjoint subsets, called equivalence classes, such that $A$ and $B$ belong to the same equivalence class if and only if $A$ is equivalent to $B$. In the present case the first two properties are obvious. The transitivity is not hard to show.

Lemma 2.1.1 The equivalence of Cauchy sequences of rationals is transitive.

Proof: Let $x_1, x_2, \ldots$, $y_1, y_2, \ldots$, and $z_1, z_2, \ldots$ be Cauchy sequences of rationals such that the $x$-$y$ and $y$-$z$ pairs are equivalent. To show the $x$-$z$ pair is equivalent we must show that given any error $1/n$, we can find an $m$ (depending on $n$) such that for all $k \ge m$, $|x_k - z_k| \le 1/n$. But by the $x$-$y$ equivalence there exists $m_1$ (depending on $n$) such that for $k \ge m_1$, $|x_k - y_k| \le 1/2n$. Similarly by the $y$-$z$ equivalence there exists $m_2$ (depending on $n$) such that for all $k \ge m_2$, $|y_k - z_k| \le 1/2n$. Thus by taking $m$ to be the larger of $m_1$ and $m_2$, for all $k \ge m$ we have

$$|x_k - z_k| \le |x_k - y_k| + |y_k - z_k| \le \frac{1}{2n} + \frac{1}{2n} = \frac{1}{n}$$

by the triangle inequality for rationals. QED

Notice that this formal proof is a little artificial in the choice of $1/2n$ as the error in the first two estimates, which leads to the final estimate with an error of $1/n$. Alternatively, we could have started with the estimates $|x_k - y_k| \le 1/n$ and $|y_k - z_k| \le 1/n$ and come up with the conclusion that $|x_k - z_k| \le 2/n$ for all $k \ge m$ and then remarked that we can obtain the desired conclusion by a change of variable. Either form of proof is acceptable. Notice that the idea behind this simple proof is that if $x$ is close to $y$ and $y$ is close to $z$, then $x$ is close to $z$. The triangle inequality gives this idea quantitative form, and then it is inserted into the universal-existential-universal form of sentence that defines equivalence. This proof is a simple prototype of many of the proofs we will be seeing and so I have somewhat belabored the point.
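The bookkeeping in the transitivity proof (find $m_1$ and $m_2$ for the error $1/2n$, then take the larger) can be traced numerically. The Python sketch below is an illustration on a finite range of indices, not a proof; the helper name `first_good_position`, the cutoff `horizon`, and the three sample sequences are all assumptions introduced here.

```python
from fractions import Fraction

def first_good_position(x, y, bound, horizon=2000):
    """Smallest m such that |x(k) - y(k)| <= bound for all m <= k <= horizon.

    Returns horizon + 1 if even the last index in range violates the bound.
    """
    m = horizon + 1
    for k in range(horizon, 0, -1):
        if abs(x(k) - y(k)) > bound:
            break
        m = k
    return m

# Three hypothetical sample sequences, all approaching zero.
x = lambda k: Fraction(1, k)
y = lambda k: Fraction(0)
z = lambda k: Fraction(-1, k * k)

n = 10
half = Fraction(1, 2 * n)                 # the error 1/2n used in the proof
m1 = first_good_position(x, y, half)      # from the x-y equivalence
m2 = first_good_position(y, z, half)      # from the y-z equivalence
m = max(m1, m2)                           # the larger of the two positions

# Beyond m, the triangle inequality gives |x_k - z_k| <= 1/2n + 1/2n = 1/n.
assert all(abs(x(k) - z(k)) <= Fraction(1, n) for k in range(m, 2001))
```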

Eventually the steps involved in constructing the formal proof from the idea of the proof, or conversely in extracting the idea of the proof from the formal proof, should become routine (except that ideas of proofs are often quite subtle and difficult).

We can now proclaim the formal definition of the real number system:

Definition 2.1.1 Let $\mathcal{C}$ denote the set of all Cauchy sequences $x_1, x_2, \ldots$ of rational numbers, and let $\mathbb{R}$ denote the set of equivalence classes of elements of $\mathcal{C}$. We call $\mathbb{R}$ the real number system, or the reals, for short, and the elements $x$ of $\mathbb{R}$ (which are equivalence classes of Cauchy sequences of rationals) are called real numbers. We will also say that a particular Cauchy sequence in the equivalence class $x$ converges to $x$ or has $x$ as a limit.

One should not get too hung up on the formal nature of this definition. Although we have defined the real number $x$ to be the equivalence class of Cauchy sequences, this is largely a linguistic convention. We could equally well think of the particular Cauchy sequences as specifying the real number, or labelling it, with the different elements of an equivalence class providing different labels for the same number. We could also try to pick out a particular Cauchy sequence from each equivalence class, and in fact we will do this when we discuss the infinite decimal expansions. From the point of view of the mathematics that results, all these approaches are equivalent. Quite frankly, no mature mathematician thinks of an equivalence class of Cauchy sequences of rationals every time the word "real number" appears. This definition, or any equivalent one, is merely a device to begin studying the properties of the real number system; the properties of the system eventually lead to the individual mathematician's mental conception of the real numbers. Since we have as yet no properties in sight, we had best not linger on the linguistic conventions of the definition.

We want to think of the reals as an enlargement of the rationals, so we want the rationals to form a subset of the reals. This is not strictly speaking the case in our formal definition, but we have the informal idea that the rational number $r$ should be the limit of the Cauchy sequence $r, r, \ldots$, and so we should attempt to identify the rational number $r$ with the real number which is the equivalence class of $r, r, \ldots$. To make

this identification at this stage requires only that we verify that distinct rationals $r \neq s$ are identified with distinct reals ($r, r, \ldots$ not equivalent to $s, s, \ldots$). This verification is simple, since $r \neq s$ implies there exists $N$ with $|r - s| \ge 1/N$ by the axiom of Archimedes for rationals, and this prevents $r, r, \ldots$ from being equivalent to $s, s, \ldots$. The identification also brings with it a future commitment to consistency: whenever a property or operation is defined for the reals that is already defined for the rationals, the two definitions must coincide on the rational subset of the reals. Fortunately, this will never be difficult to verify.

2.1.3 Exercises

1. Show that there is an uncountable number of Cauchy sequences of rational numbers equivalent to any given Cauchy sequence of rational numbers.

2. Show that every real number can be given by a Cauchy sequence of rationals $r_1, r_2, \ldots$, where none of the rational numbers $r_1, r_2, \ldots$ is an integer.

3. What kinds of real numbers are representable by Cauchy sequences of integers?

4. Suppose $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ are two sequences of rational numbers. Define the shuffled sequence to be $x_1, y_1, x_2, y_2, \ldots$. Prove that the shuffled sequence is a Cauchy sequence if and only if $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ are equivalent Cauchy sequences.

5. Prove that if a Cauchy sequence $x_1, x_2, \ldots$ of rationals is modified by changing a finite number of terms, the result is an equivalent Cauchy sequence.

6. Give a proof that any infinite decimal expansion $\pm N.a_1a_2a_3\ldots$ gives a Cauchy sequence $\pm N, \pm N.a_1, \pm N.a_1a_2, \ldots$.

7. Show that the Cauchy sequence $.9, .99, .999, \ldots$ is equivalent to $1, 1, 1, \ldots$.

8. Can a Cauchy sequence of positive rational numbers be equivalent to a Cauchy sequence of negative rational numbers?

9. Show that if $x_1, x_2, \ldots$ is a Cauchy sequence of rational numbers there exists a positive integer $N$ such that $x_j \le N$ for all $j$.

2.2 The Reals as an Ordered Field

2.2.1 Defining Arithmetic

To begin the justification of the definition of the real number system that we have chosen, we want to transfer the basic properties of order and arithmetic from the rationals to the reals. The basic idea for accomplishing this is to work term-by-term on the Cauchy sequences of rationals. This requires a certain amount of detailed checking that everything makes sense and is well defined. Let us consider addition first. What should $x + y$ mean if $x$ and $y$ are real numbers? Suppose $x_1, x_2, \ldots$ is a particular Cauchy sequence of rational numbers in the equivalence class that defines $x$, and similarly let $y_1, y_2, \ldots$ be a Cauchy sequence of rationals that represents $y$. Recall our intuition that the terms $x_k$ in the first sequence are approximating $x$. Shouldn't $x_k + y_k$ then approximate $x + y$? Surely this is what working term-by-term suggests. But is this a reasonable definition? The sequence $x_1 + y_1, x_2 + y_2, \ldots$ is clearly a sequence of rationals, but is it a Cauchy sequence? Presumably it is, but this is the first thing we have to check to determine if the definition makes sense. The second thing we have to check is more subtle but equally important. We chose particular Cauchy sequences $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ out of the equivalence classes defining $x$ and $y$. Suppose we chose different ones, i.e., $x'_1, x'_2, \ldots$ equivalent to $x_1, x_2, \ldots$ and $y'_1, y'_2, \ldots$ equivalent to $y_1, y_2, \ldots$; would $x'_1 + y'_1, x'_2 + y'_2, \ldots$ then be a Cauchy sequence equivalent to $x_1 + y_1, x_2 + y_2, \ldots$? This is certainly necessary if the sum $x + y$ is to be well defined as the equivalence class of $x_1 + y_1, x_2 + y_2, \ldots$. To summarize: when defining an operation on real numbers, we need to verify first that the operation preserves Cauchy sequences of rationals and then to verify that it respects equivalence classes.

Lemma 2.2.1

a. Let $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ be Cauchy sequences of rationals. Then $x_1 + y_1, x_2 + y_2, \ldots$ is also a Cauchy sequence of rationals.

b. In addition, let $x'_1, x'_2, \ldots$ be a Cauchy sequence of rationals equivalent to $x_1, x_2, \ldots$, and let $y'_1, y'_2, \ldots$ be a Cauchy sequence of rationals equivalent to $y_1, y_2, \ldots$. Then $x'_1 + y'_1, x'_2 + y'_2, \ldots$ is equivalent to $x_1 + y_1, x_2 + y_2, \ldots$.

Proof: The argument is very similar to the proof of the transitivity of equivalence given in the last section. For part a, given any error $1/n$, there exists $m_1$ such that $|x_j - x_k| \le 1/2n$ for $j, k \ge m_1$ and there exists $m_2$ such that $|y_j - y_k| \le 1/2n$ for $j, k \ge m_2$, because $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ are Cauchy sequences. Then by taking $m$ to be the larger of $m_1$ and $m_2$, we have

$$|(x_j + y_j) - (x_k + y_k)| \le |x_j - x_k| + |y_j - y_k| \le \frac{1}{2n} + \frac{1}{2n} = \frac{1}{n}$$

for $j, k \ge m$.

For part b, given any error $1/n$, there exists $m_1$ such that $|x_k - x'_k| \le 1/2n$ for $k \ge m_1$ and there exists $m_2$ such that $|y_k - y'_k| \le 1/2n$ for $k \ge m_2$, because of the equivalence of $x_1, x_2, \ldots$ and $x'_1, x'_2, \ldots$ and the equivalence of $y_1, y_2, \ldots$ and $y'_1, y'_2, \ldots$. If we take $m$ to be the larger of $m_1$ and $m_2$, then we have

$$|(x_k + y_k) - (x'_k + y'_k)| \le |x_k - x'_k| + |y_k - y'_k| \le \frac{1}{2n} + \frac{1}{2n} = \frac{1}{n}$$

for $k \ge m$. QED

Definition 2.2.1 The real number $x + y$ is the equivalence class of the Cauchy sequence $x_1 + y_1, x_2 + y_2, \ldots$, where $x_1, x_2, \ldots$ represents $x$ and $y_1, y_2, \ldots$ represents $y$.

By the lemma, $x_1 + y_1, x_2 + y_2, \ldots$ is a Cauchy sequence and the equivalence class depends only on $x$ and $y$ and not on the particular Cauchy sequences $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ chosen.

The story for multiplication is similar. We want to define $x \cdot y$ as the equivalence class of $x_1 y_1, x_2 y_2, \ldots$. Why is this a Cauchy sequence? Look at a typical difference $x_j y_j - x_k y_k$. We have information about $x_j - x_k$ and $y_j - y_k$, so we write

$$x_j y_j - x_k y_k = y_j (x_j - x_k) + x_k (y_j - y_k)$$

(this is also the idea behind the formula for the derivative of a product). We can make $x_j - x_k$ and $y_j - y_k$ small by taking $j$ and $k$ large, but we still have to control the factors $y_j$ and $x_k$ that multiply them. Note that we don't have to make both factors in a product small in order to make the product small; it is enough to make one factor small and the other bounded. Thus we need an upper bound for $y_j$ and $x_k$, independent of $j$ and $k$.

Lemma 2.2.2 Every Cauchy sequence of rationals is bounded. That is, there exists a natural number $N$ (depending on the sequence $x_1, x_2, \ldots$) such that for all $k$, $|x_k| \le N$.

Proof: By the Cauchy criterion there exists $m$ such that $|x_j - x_k| \le 1$ for $j, k \ge m$. If we choose $N$ larger than $|x_m| + 1$, then

$$|x_j| = |(x_j - x_m) + x_m| \le |x_j - x_m| + |x_m| \le |x_m| + 1 \le N$$

for all $j \ge m$. If we also take $N$ greater than $|x_1|, \ldots, |x_{m-1}|$, then we will trivially have $|x_j| \le N$ for $j < m$. This imposes only a finite number of conditions on $N$, so such a number can be found. QED

Lemma 2.2.3

a. Let $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ be Cauchy sequences of rationals. Then $x_1 y_1, x_2 y_2, \ldots$ is also a Cauchy sequence of rationals.

b. In addition, let $x'_1, x'_2, \ldots$ be a Cauchy sequence of rationals equivalent to $x_1, x_2, \ldots$, and let $y'_1, y'_2, \ldots$ be a Cauchy sequence of rationals equivalent to $y_1, y_2, \ldots$. Then $x'_1 y'_1, x'_2 y'_2, \ldots$ is equivalent to $x_1 y_1, x_2 y_2, \ldots$.

Proof:

a. Given any error $1/n$, there exists (as before) $m$ such that $|x_j - x_k| \le 1/n$ and $|y_j - y_k| \le 1/n$ for all $j, k \ge m$. Also by the previous lemma there exists $N$ such that $|x_j| \le N$ and $|y_j| \le N$ for all $j$. Note that $N$ is fixed once and for all and does not depend on $n$. This is crucial, for we have

$$|x_j y_j - x_k y_k| = |y_j (x_j - x_k) + x_k (y_j - y_k)| \le |y_j|\,|x_j - x_k| + |x_k|\,|y_j - y_k| \le N \cdot \frac{1}{n} + N \cdot \frac{1}{n} = \frac{2N}{n}$$

for all $j, k \ge m$. By a change of variable (replacing $m$ by $m'$ determined by $2Nn$ rather than $n$) we have $|x_j y_j - x_k y_k| \le 1/n$ for all $j, k \ge m'$, which proves that $x_1 y_1, x_2 y_2, \ldots$ is a Cauchy sequence.

b. Let $N$ be an upper bound for all four Cauchy sequences. Given any error $1/n$, we use the equivalence of $x_1, x_2, \ldots$ and $x'_1, x'_2, \ldots$ to find $m_1$ such that $|x_j - x'_j| \le 1/2Nn$ (guess why this choice!) if $j \ge m_1$ and similarly find $m_2$ such that $|y_j - y'_j| \le 1/2Nn$ if $j \ge m_2$. Taking for $m$ the larger of $m_1$ or $m_2$, we have

$$|x_j y_j - x'_j y'_j| = |y_j (x_j - x'_j) + x'_j (y_j - y'_j)| \le |y_j|\,|x_j - x'_j| + |x'_j|\,|y_j - y'_j| \le N \cdot \frac{1}{2Nn} + N \cdot \frac{1}{2Nn} = \frac{1}{n}$$

for all $j \ge m$, proving the equivalence of $x_1 y_1, x_2 y_2, \ldots$ and $x'_1 y'_1, x'_2 y'_2, \ldots$. QED

Definition 2.2.2 The real number $x \cdot y$ is the equivalence class of the Cauchy sequence $x_1 y_1, x_2 y_2, \ldots$, where $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ respectively represent $x$ and $y$.

2.2.2 The Field Axioms

We have now defined an arithmetic of real numbers. The rational numbers being identified with a subset of the real numbers, we should check that the arithmetics are consistent; that is, if we add the rationals $r$ and $s$ as rationals, $r + s$, we get the same real number $(r + s, r + s, \ldots)$ as we get by adding the real numbers $r, r, \ldots$ and $s, s, \ldots$. Clearly this is trivially true. Less trivially, we want the arithmetic of the reals to have all the properties of arithmetic of rationals. These properties are exactly the field axioms (and their consequences). Recall that a set $F$ with two operations $x + y$ and $x \cdot y$ defined is called a field if the following axioms hold:

i. addition is commutative ($x + y = y + x$) and associative ($(x + y) + z = x + (y + z)$).

ii. there exists an additive identity $0$, such that $0 + x = x$.

iii. every element $x$ has a negative $-x$, such that $-x + x = 0$ (this implies that subtraction is always possible, $x - y$ being $x + (-y)$).

iv. multiplication is commutative and associative.

v. there exists a multiplicative unit $1$ (distinct from $0$), such that $1 \cdot x = x$.

vi. every element $x$ except $0$ has a reciprocal $x^{-1}$, such that $x^{-1} \cdot x = 1$ (this implies that division by non-zero numbers is always possible).

vii. multiplication distributes over addition, $x \cdot (y + z) = x \cdot y + x \cdot z$.

Theorem 2.2.1 The real numbers form a field.

Proof: Clearly the zero is the equivalence class of $0, 0, \ldots$, and the one is the equivalence class of $1, 1, \ldots$. Almost all the field axioms are trivial to establish and entail little more than quoting the analogous axiom for the rationals. The exception is axiom vi, the existence of reciprocals, so let us go into the details. Intuitively it is clear that we want to define $x^{-1}$ by $x_1^{-1}, x_2^{-1}, \ldots$ if $x$ is defined by $x_1, x_2, \ldots$. We meet here a preliminary obstacle in that $x_j^{-1}$ is not defined if $x_j = 0$. Even supposing this is never the case, we will need to verify that $x_1^{-1}, x_2^{-1}, \ldots$ is a Cauchy sequence. For this we will need to estimate the difference $x_j^{-1} - x_k^{-1}$, knowing that $x_j - x_k$ is small. Now we have $x_j^{-1} - x_k^{-1} = x_j^{-1} x_k^{-1} (x_k - x_j)$, so again we are in the situation of having a product in which one factor is small. We need to show that the other factors are bounded. Thus we need an upper bound for $x_j^{-1}$ or equivalently a positive lower bound for $x_j$. The piece of information that we have at our disposal, and which we have not yet used, is that $x$ is not zero, which means $x_1, x_2, \ldots$ is not in the equivalence class of zero. This property, which we will be able to identify later as the axiom of Archimedes, we formulate as a separate lemma.

Lemma 2.2.4 Let $x$ be any real number different from zero. Then there exists a natural number $N$ such that for every Cauchy sequence $x_1, x_2, \ldots$ in the equivalence class of $x$, there exists $m$ such that $|x_j| \ge 1/N$ for all $j \ge m$. The number $m$ of exceptions will depend on the particular Cauchy sequence, but the lower bound $1/N$ will not.

Proof: What does $x \neq 0$ mean in terms of a particular Cauchy sequence $x_1, x_2, \ldots$ in the equivalence class $x$? We need to form the negation of the statement that $x_1, x_2, \ldots$ is equivalent to $0, 0, \ldots$; in

other words, the negation of the statement: for all $n$ there exists $m$ such that $j \ge m$ implies $|x_j| \le 1/n$. The negation is: there exists $n$ such that for all $m$ there exists $j \ge m$ such that $|x_j| > 1/n$. This is close to what we want but not quite right. We want the estimate $|x_j| \ge 1/N$ to hold for all $j \ge m$, not just for an infinite set of $j$'s ("for all $m$ there exists $j \ge m$" is equivalent to "for an infinite set of $j$'s"). To get this added information we need to use the fact that $x_1, x_2, \ldots$ is a Cauchy sequence, a fact we have not yet used and without which the conclusion is false (as in the sequence $0, 1, 0, 1, 0, 1, \ldots$). The Cauchy criterion makes the terms eventually all become close together, so we can't have an infinite number satisfy $|x_j| > 1/n$ without all but a finite number satisfying a slightly weaker estimate, say $|x_j| \ge 1/2n$. To make this precise, we use the Cauchy criterion with error $1/2n$: there exists $m$ such that for all $j, k \ge m$, $|x_j - x_k| \le 1/2n$. By then choosing one particular value of $j \ge m$ such that $|x_j| > 1/n$, which we know exists by the first step of the proof, we have

$$|x_k| = |(x_k - x_j) + x_j| \ge |x_j| - |x_k - x_j| \ge \frac{1}{n} - \frac{1}{2n} = \frac{1}{2n}$$

for all $k \ge m$. (Note the use of the transposed triangle inequality $|a - b| \ge |a| - |b|$.)

This establishes the desired estimate with $N = 2n$ for each particular Cauchy sequence in the equivalence class. However, it is not yet clear that we can find a single value for $N$ that will work for all Cauchy sequences in the equivalence class. To see this we need one more observation: if there exists $m$ such that $|x_j| \ge 1/N$ for all $j \ge m$ and if $x'_1, x'_2, \ldots$ is equivalent to $x_1, x_2, \ldots$, then there exists $m'$ such that $|x'_j| \ge 1/2N$ for all $j \ge m'$. This will complete the proof since once we have the lower bound $1/N$ for one representative Cauchy sequence, we obtain the lower bound $1/2N$ for all Cauchy sequences in the equivalence class. To establish the observation we need only choose $m'$ greater than $m$ such that $|x_j - x'_j| \le 1/2N$ for all $j \ge m'$ (using the equivalence of the sequences), and then

$$|x'_j| = |(x'_j - x_j) + x_j| \ge |x_j| - |x'_j - x_j| \ge \frac{1}{N} - \frac{1}{2N} = \frac{1}{2N}$$

for all $j \ge m'$. QED

Having completed the proof of the lemma, we return to the proof of the theorem, namely the existence of $x^{-1}$. Let $x_1, x_2, \ldots$ be a Cauchy sequence representing $x$. By the lemma, all but a finite number of terms are non-zero. If we modify all the zero terms, say replace them by ones, we get an equivalent sequence. Assuming this done (without changing notation), we can form the sequence of rationals $x_1^{-1}, x_2^{-1}, \ldots$. As before we need to show two things: i) it is a Cauchy sequence, and ii) the equivalence class of $x_1^{-1}, x_2^{-1}, \ldots$ does not depend on the particular choice of $x_1, x_2, \ldots$.

To prove it is a Cauchy sequence we let $N$ be the natural number given by the lemma. Then for any given error $1/n$, there exists $m$ such that $|x_j - x_k| \le 1/N^2 n$ for all $j, k \ge m$, since $x_1, x_2, \ldots$ is a Cauchy sequence. If we also choose $m$ large enough so that $|x_j| \ge 1/N$ for all $j \ge m$ as the lemma asserts we can, then for $j, k \ge m$ we have

$$|x_j^{-1} - x_k^{-1}| = |x_j^{-1} x_k^{-1} (x_k - x_j)| = |x_j^{-1}|\,|x_k^{-1}|\,|x_k - x_j| \le N \cdot N \cdot \frac{1}{N^2 n} = \frac{1}{n},$$

so $x_1^{-1}, x_2^{-1}, \ldots$ is a Cauchy sequence.

If $x'_1, x'_2, \ldots$ is any equivalent Cauchy sequence (again assuming that a finite number of terms are modified if necessary so that all $x'_j$ are non-zero), we want to show that $x_1'^{-1}, x_2'^{-1}, \ldots$ is equivalent to $x_1^{-1}, x_2^{-1}, \ldots$. Again we choose $N$ as in the lemma; then given $1/n$ we choose $m$ so that $|x_j - x'_j| \le 1/N^2 n$ for all $j \ge m$ (the equivalence of $x_1, x_2, \ldots$ and $x'_1, x'_2, \ldots$) and also so that $|x_j| \ge 1/N$ and $|x'_j| \ge 1/N$ for all $j \ge m$. Then for all $j \ge m$ we have

$$|x_j^{-1} - x_j'^{-1}| = |x_j^{-1} x_j'^{-1} (x'_j - x_j)| = |x_j^{-1}|\,|x_j'^{-1}|\,|x'_j - x_j| \le N \cdot N \cdot \frac{1}{N^2 n} = \frac{1}{n}.$$

We have thus constructed a unique real number $x^{-1}$ for every non-zero $x$, and it is clear that $x^{-1} \cdot x = 1$ by multiplying $x_1^{-1}, x_2^{-1}, \ldots$ and $x_1, x_2, \ldots$ directly. QED
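The term-by-term constructions of this section (sum, product, and reciprocal with zero terms replaced by ones) can be sketched in Python using exact rationals. This is only an illustration on finitely many terms under stated assumptions: sequences are modeled as functions from the index $k$ to a `Fraction`, and the helper names `add`, `mul`, `reciprocal` and the sample sequence are hypothetical choices made here, not notation from the text.

```python
from fractions import Fraction

def add(x, y):
    """Term-by-term sum of two sequences given as functions k -> Fraction."""
    return lambda k: x(k) + y(k)

def mul(x, y):
    """Term-by-term product of two sequences."""
    return lambda k: x(k) * y(k)

def reciprocal(x):
    """Term-by-term reciprocal; a zero term is first replaced by 1, as in the proof."""
    def inv(k):
        term = x(k)
        if term == 0:
            term = Fraction(1)
        return 1 / term
    return inv

# A hypothetical representative of 2 whose first term is zero: 0, 1, 4/3, 3/2, ...
x = lambda k: 2 - Fraction(2, k)
two = lambda k: Fraction(2)          # the constant representative of 2

x_inv = reciprocal(x)
prod = mul(x, x_inv)

assert x_inv(1) == 1                 # the zero term was replaced by 1 before inverting
assert all(prod(k) == 1 for k in range(2, 200))   # x_k^{-1} * x_k = 1 once x_k != 0

# Spot check of well-definedness: reciprocals of two representatives stay within 1/n.
n = 100
assert all(abs(x_inv(k) - reciprocal(two)(k)) <= Fraction(1, n)
           for k in range(n, 10 * n))
```

The product sequence here is $0, 1, 1, 1, \ldots$, which differs from $1, 1, 1, \ldots$ in only one term, so it represents the same real number.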

2.2.3 Order

Having established the field axioms for the real number system, we have completed the transference of arithmetic from the rationals to the reals, since all the usual facts of arithmetic are consequences of the field axioms. We turn next to the concept of order. Every rational number is either positive, negative, or zero (only one of the above!); and for two rational numbers $r$ and $s$, we say $r > s$, $r < s$, or $r = s$ according to whether $r - s > 0$, $r - s < 0$, or $r - s = 0$. For the real numbers we want a similar ordering, and the first step is to decide for each real $x$ whether $x > 0$, $x < 0$, or $x = 0$. Clearly it is tempting to say $x > 0$ if $x_j > 0$ for all $j$ where $x_1, x_2, \ldots$ is a Cauchy sequence of rationals representing $x$. But there are some troubles with this idea, since it depends on the particular choice of Cauchy sequence; a modification of, say, the first term to $x_1 = -1$ will spoil things without changing the number $x$. We might then be tempted to try the condition: there exists $m$ such that $x_j > 0$ for $j \ge m$. But there is still a problem here, since the sequence $1, 1/2, 1/3, \ldots$ represents zero but still satisfies the condition. Thus while the condition might well be necessary for positivity (in fact it is), it is not sufficient.

To understand the difficulty better, let's look at the two-dimensional picture, both for the sequence $1, 1/2, 1/3, \ldots$ and for a general sequence representing a point above the x-axis, as shown in Figure 2.2.1.

Figure 2.2.1: [figure not reproduced; axis labels 1/3, 1/2]

Notice that in the second picture we can slip a rectangle under the graph. If the height of this rectangle is $1/N$ and the base is the interval $[0, 1/m]$, then we are saying $x_j \ge 1/N$ for all $j \ge m$ (we could

equally well say $x_j > 1/N$ by replacing $N$ by $N + 1$). You should think of this condition as showing that the tail of the sequence is bounded away from zero; this is a stronger condition than merely being positive since it asserts that the separation from zero remains greater than a fixed amount $1/N$ for all the terms beyond the $m$-th (of course for an individual number there will be no distinction between being positive and being bounded away from zero, as we will show in the proof of the axiom of Archimedes).

Definition 2.2.3 A real number $x$ is said to be positive if there exist natural numbers $N$ and $m$ such that $x_j \ge 1/N$ for all $j \ge m$, where $x_1, x_2, \ldots$ represents $x$. The number $m$ depends on the particular Cauchy sequence, but the number $N$ does not (the proof of the previous lemma shows that once we have verified the condition for one Cauchy sequence, it is valid for all equivalent ones). A real number $x$ is called negative, $x < 0$, if $-x$ is positive. Note that if $x$ is rational, these definitions are consistent with the definitions of positivity and negativity for rational numbers, since the rationals satisfy the axiom of Archimedes.

Theorem 2.2.2 Each real number is either positive, negative, or zero, but only one of the three. The sum and product of positive numbers are positive.

Remarks A field with a notion of positivity with the properties given by this theorem is called an ordered field. Thus the reals are an ordered field.

Proof: Zero is not positive, as the Cauchy sequence $0, 0, \ldots$ clearly shows. Since $-0 = 0$, zero is not negative either. Let $x$ be any non-zero number; we need to show that either $x$ or $-x$ is positive, but not both. Let $x_1, x_2, \ldots$ be a Cauchy sequence representing $x$. By the previous lemma there exist $N$ and $m$ such that $|x_j| \ge 1/N$ for all $j \ge m$. But the signs of the rational numbers cannot keep changing, because each sign change produces a jump of at least $2/N$ between terms and this would violate the Cauchy criterion. Thus by increasing $m$ if necessary, we have either $x_j \ge 1/N$ for all $j \ge m$ or $x_j \le -1/N$ for all $j \ge m$. In the first case $x$ is positive, in the second case $-x$ is positive, and both can't occur for the same $x$. Finally the verification that the sum or

product of positive numbers is positive is essentially trivial, since the sum or product of the lower bounds will give lower bounds. QED

We can use the concept of positive number to define inequalities for real numbers. Thus $x > y$ means $x - y$ is positive, $x \ge y$ means $x > y$ or $x = y$, $|x| < y$ means $y - x$ and $y + x$ are positive, and so on. We define $|x|$ to be $x$ if $x > 0$, $-x$ if $x < 0$, and $|0| = 0$. In verifying inequalities for real numbers it is convenient to be able to pass from inequalities involving the rational approximations; this requires that the inequalities be non-strict ($\le$ or $\ge$) but not strict ($<$ or $>$). For example, $1/n > 0$, but the Cauchy sequence $1, 1/2, 1/3, \ldots$ represents the number $0$, and we do not have $0 > 0$. The next lemma gives the positive result.

Lemma 2.2.5 Let $x$ and $y$ be real numbers defined by Cauchy sequences $\{x_k\}$ and $\{y_k\}$ of rational numbers. If $x_k \le y_k$ for all $k \ge m$, then $x \le y$.

Remarks The converse is not true (can you give a counterexample?).

Proof: If not, then $x > y$; so $x - y$ is positive. But then by the definition, $x_k - y_k > 1/n$ for $k$ large and some $1/n$, contradicting $x_k \le y_k$ for all $k \ge m$. QED

Basically all the properties of inequalities for rationals are true for reals. The next two are especially important.

Theorem 2.2.3 (Triangle inequality) $|x + y| \le |x| + |y|$ for real numbers $x$ and $y$.

Proof: Apply the previous lemma to the triangle inequality for rationals $|x_k + y_k| \le |x_k| + |y_k|$ where $\{x_k\}$ and $\{y_k\}$ are Cauchy sequences of rationals defining $x$ and $y$. QED

The triangle inequality is frequently used in the form $|x - z| \le |x - y| + |y - z|$, in transposed form $|x - y| \ge |x| - |y|$, and for the sums of more than two numbers.

Theorem 2.2.4 (Axiom of Archimedes) For any positive real number $x > 0$ there exists a natural number $n$ such that $x \ge 1/n$.

Proof: We have already shown that there exists $n$ such that $x_j \ge 1/n$ for all $j \ge m$, where $\{x_j\}$ is a Cauchy sequence of rationals defining $x$. By the lemma this implies $x \ge 1/n$. QED

This theorem is frequently used in the following form: if $|x| \le 1/n$ for every natural number $n$, then $x = 0$.

The fact that the real numbers form an ordered field means that all the familiar algebraic identities, such as $(x + y)^2 = x^2 + 2xy + y^2$, which only involve the operations of arithmetic, are valid for real numbers; and the same is true for inequalities such as $(x^2 + y^2)/x^2 \ge 1$. The reason for this is that such identities and inequalities are consequences of the ordered field axioms. We will use such "facts" freely from now on without special mention. In exercises 8 and 9 you will be asked to derive some of these facts from the axioms of an ordered field, and this will give you some confidence that the ordered field axioms are really sufficient to contain this aspect of elementary algebra.

We conclude this section with a precise formulation of the density of the rationals in the reals. This is essentially built into the definition, since the Cauchy sequence of rationals $\{x_k\}$ defining the real number $x$ consists of rational numbers $x_k$ that are approximating $x$, and the density of the rationals in the reals simply says that every real can be approximated arbitrarily closely by rationals.

Theorem 2.2.5 (Density of Rationals) Given any real number $x$ and error $1/n$, there exists a rational number $y$ such that $|x - y| \le 1/n$.

Proof: Let $\{x_k\}$ be any Cauchy sequence of rationals defining $x$. Given the error $1/n$, there exists $m$ such that $|x_k - x_j| \le 1/n$ if $j, k \ge m$. Choose $y = x_m$, so that $|x_k - y| \le 1/n$ for every $k \ge m$. By the lemma we have $|x - y| \le 1/n$. QED

2.2.4 Exercises

1. Write out a proof of the commutative and associative laws for addition of real numbers.

2. Show that the real number system is uncountable and, in fact, has the same cardinality as the set of all subsets of the integers.

3. If $x$ is a real number, show that there exists a Cauchy sequence of rationals $x_1, x_2, \ldots$ representing $x$ such that $x_n < x$ for all $n$.

4. Let $x$ be a real number. Show that there exists a Cauchy sequence of rationals $x_1, x_2, \ldots$ representing $x$ such that $x_n \le x_{n+1}$ for every $n$.

5. Prove that there are an infinite number of rational numbers in between any two distinct real numbers.

6. Let $x$ be a positive real number. Prove that there exists a Cauchy sequence of rationals of the special form $p^2/q^2$, $p$ and $q$ integers, representing $x$.

7. Prove $|x - y| \ge |x| - |y|$ for any real numbers $x$ and $y$. (Hint: use the triangle inequality.)

8. Prove the following identities from the field axioms:

a. $(x+y)^2 = x^2 + 2xy + y^2$.
b. $(x + a/x)^2 - 4a = (x - a/x)^2$, $x \ne 0$.
c. $ax^2 + bx + c = a(x + b/2a)^2 + c - b^2/4a$.
d. $(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$ where $\binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$.

9. Prove the following inequalities from the ordered field axioms:

a. $(x^2 + y^2)/x^2 \ge 1$, $x \ne 0$.
b. $2xy \le x^2 + y^2$.
c. $x/y > x$ if $x > 0$ and $0 < y < 1$.

10. Show that if a real number $x$ can be represented by a Cauchy sequence of positive rationals, then $x \ge 0$. What does this tell you about real numbers that can be represented by two equivalent Cauchy sequences of rationals, one consisting of only positive rationals and the other consisting of only negative rationals?

11. Prove that no real number satisfies $x^2 = -1$.

12. Define $x^3 = x \cdot x^2$. Prove that if $x_1, x_2, \ldots$ represents $x$, then $x_1^3, x_2^3, \ldots$ represents $x^3$.


2.3 Limits and Completeness

2.3.1 Proof of Completeness

At this stage in our development of the real number system, we have succeeded, with a lot of hard work, in arriving at about where we started. We have shown that the real number system is an ordered field. But the rational number system was also an ordered field. Now we need to show that we have really plugged up all the holes (such as $\sqrt{2}$) in the rational number system. This property is called completeness and can be succinctly described by saying that if we repeated the process whereby the reals were constructed from the rationals, starting instead from the reals, then we would not end up with anything new.

Our construction of real numbers was based on sequences of rational numbers. We now want to consider sequences of real numbers $x_1, x_2, x_3, \ldots$. Here each $x_j$ is a real number and so, strictly speaking, is a symbol that stands for an equivalence class of Cauchy sequences of rational numbers. Again I must emphasize the desirability of doing a bit of mental gymnastics: think of the real number $x_j$ as a single entity (like a pebble), and yet be capable at times of recalling the definition as an equivalence class of Cauchy sequences of rationals (the pebble is actually an amalgam of molecules, each of which is composed of atoms, each of which is composed of ...). We can now apply the Cauchy criterion to this sequence of reals: for every $n$ there exists $m$ such that $|x_j - x_k| \le 1/n$ if $j, k \ge m$. This is a meaningful statement since the inequality $|x_j - x_k| \le 1/n$ is meaningful for real numbers. A sequence of real numbers that satisfies the Cauchy criterion is called a Cauchy sequence. The intuition involved is the same as for the definition of Cauchy sequences of rationals: the terms of the sequence get closer and closer together as you go out in the sequence.

Since not every Cauchy sequence of rational numbers had a limit that was a rational number, we were motivated to invent real numbers to be these limits. We do not have to invent any new numbers to be limits of Cauchy sequences of real numbers. To see this we need first to formalize the idea of limit. When is a real number $x$ the limit of the sequence of real numbers $x_1, x_2, \ldots$? Clearly when the terms $x_k$ in the sequence get closer and closer to $x$. We can use the order of the reals to define the inequality $|x_k - x| \le 1/n$ and, hence, the definition


of limit: $x = \lim_{k\to\infty} x_k$ if for every natural number $n$ there exists a natural number $m$ such that $k \ge m$ implies $|x_k - x| \le 1/n$. Notice that while the limit of a sequence of real numbers need not always exist, if it exists it is unique. This is because if $y$ were another limit, then we would have

$$|x - y| = |(x - x_k) - (y - x_k)| \le |x - x_k| + |y - x_k| \le 2/n$$

if $k$ is large enough; hence, $x - y = 0$ by the Axiom of Archimedes.

We should also verify, for consistency of notation, that if $\{x_k\}$ is a Cauchy sequence of rationals defining $x$, then $\lim_{k\to\infty} x_k = x$. Indeed, given the error $1/n$, we can find $m$ such that $j, k \ge m$ implies $|x_j - x_k| \le 1/n$, since $\{x_k\}$ is a Cauchy sequence. But then $j \ge m$ also implies $|x_j - x| \le 1/n$, since this follows from $|x_j - x_k| \le 1/n$ for all $k$ large, which is what we have for $k \ge m$. Thus $\lim_{k\to\infty} x_k = x$.

Theorem 2.3.1 (Completeness of the Reals) A sequence $x_1, x_2, \ldots$ of real numbers has a limit if and only if it is a Cauchy sequence.

Proof: The fact that the existence of the limit implies the Cauchy criterion is trivial: if $m$ is such that $k \ge m$ implies $|x - x_k| \le 1/n$, then $j, k \ge m$ implies $|x_j - x_k| \le 2/n$ by the triangle inequality. The non-trivial part is the converse.

Suppose then that the sequence $x_1, x_2, \ldots$ satisfies the Cauchy criterion. We need to construct the limit as a real number. This means we have to find a Cauchy sequence of rationals $y_1, y_2, \ldots$ to define $y$ and then prove $\lim_{k\to\infty} x_k = y$. The idea is that we want to take for $y_k$ a rational number close to $x_k$, say so close that $|x_k - y_k| \le 1/k$. This is possible by the density of rationals. Then it is a simple matter to show that $\{y_k\}$ is a Cauchy sequence. Given an error $1/n$, choose $m$ so that $|x_j - x_k| \le 1/(2n)$ for $j, k \ge m$ (this is possible because $\{x_k\}$ is a Cauchy sequence). Then

$$|y_j - y_k| \le |y_j - x_j| + |x_j - x_k| + |x_k - y_k| \le \frac{1}{j} + \frac{1}{2n} + \frac{1}{k} \le \frac{1}{2n} + \frac{2}{m},$$


which can be made less than $1/n$ if $m$ is chosen greater than $4n$ (there is no harm in increasing $m$). Thus $\{y_k\}$ is a Cauchy sequence of rational numbers and, hence, defines a real number $y$.

It remains to show $\lim_{k\to\infty} x_k = y$. But again this is easy, since

$$|y - x_k| \le |y - y_k| + |y_k - x_k| \le |y - y_k| + \frac{1}{k}.$$

Since $y_1, y_2, \ldots$ represents $y$, we know that we can make $|y - y_k| \le 1/(2n)$ for $k \ge m$, and hence $|y - x_k| \le 1/(2n) + 1/k \le 1/n$ if $m \ge 2n$ also. Thus $\lim_{k\to\infty} x_k = y$. QED

Next we state a theorem that summarizes the basic properties of limits. Essentially it says that limits preserve the arithmetic and order properties of real numbers. We have actually seen all these statements before in terms of Cauchy sequences of rational numbers.

Theorem 2.3.2

a. If $\lim_{k\to\infty} x_k = x$ and $\lim_{k\to\infty} y_k = y$, then

$$\lim_{k\to\infty} (x_k + y_k) = x + y \quad\text{and}\quad \lim_{k\to\infty} (x_k y_k) = xy.$$

If in addition $y \ne 0$, then there exists $m$ such that $y_k \ne 0$ for $k \ge m$ and $\lim_{k\to\infty} x_k/y_k = x/y$.

b. If $x_k \ge y_k$ for all $k \ge m$, then $x \ge y$.

We will not go through the formal proofs of these statements, as they are merely repetitions of arguments already given. Notice that in the case of the quotient, some of the terms $x_k/y_k$ may be undefined if $y_k = 0$, but since this cannot happen for $k \ge m$, it does not really matter. Also part b would not be valid with strict inequalities, since $1/k > 0$ but $\lim_{k\to\infty} 1/k = 0$.

2.3.2 Square Roots

We can illustrate the abstract ideas that we have been developing in a concrete example by discussing square roots. It was, after all, the square root of 2 that started the whole idea of irrational numbers. We can now show that within the real number system, all positive numbers have square roots.


Theorem 2.3.3 Let $x$ be any positive real number. Then there exists a unique positive real number $y$ such that $y^2 = x$ (we then write $y = \sqrt{x}$).

Proof: It is easy to show uniqueness, for if also $z^2 = x$, then $y^2 - z^2 = 0$. But $y^2 - z^2 = (y - z)(y + z)$; and since the reals form a field, we must either have $y - z = 0$ or $y + z = 0$. Since both $y$ and $z$ were assumed positive, we must have $y + z > 0$, so $y - z = 0$ and hence $y = z$.

To prove existence we use a method that we call divide and conquer. This is an idea we will use many times in the pages to come. We start by finding two numbers $y_1$ and $z_1$ such that $y = \sqrt{x}$ lies between them. This is easy. If $x > 1$, then $x^2 = x + x(x - 1) > x$ since $x(x - 1) > 0$. From $1 < x < x^2$ we would expect to have $1 < y < x$ if $y = \sqrt{x}$. Thus we set $y_1 = 1$ and $z_1 = x$. (Similarly, if $0 < x < 1$, we can take $y_1 = x$ and $z_1 = 1$, because $x^2 < x < 1$. If $x = 1$ we can take $y = 1$ and we are done.) Note that we are not claiming to have proved $y_1 \le y \le z_1$ (this does not make sense because we have not yet constructed $y$), but we have proved $y_1^2 \le x \le z_1^2$, which is intuitively an equivalent statement.

[Figure 2.3.1: the two cases for choosing $y_2$ and $z_2$ relative to the midpoint $m_1$.]

Now we divide and conquer. The interval $y_1$ to $z_1$ has midpoint $m_1 = (y_1 + z_1)/2$. We choose the next interval $y_2$ to $z_2$ to be either the left ($y_1$ to $m_1$) or the right interval ($m_1$ to $z_1$), depending on the relative size of $m_1^2$ and $x$. If $m_1^2 > x$ we take $y_2 = y_1$ and $z_2 = m_1$, while if $m_1^2 < x$ we take $y_2 = m_1$ and $z_2 = z_1$ (if $m_1^2 = x$, then $\sqrt{x} = m_1$ and we're done). The two possibilities are illustrated in Figure 2.3.1. The point of this choice is that we still have $y_2^2 \le x \le z_2^2$, but we have divided the distance between $y_1$ and $z_1$ in half when we pass to $y_2$ and $z_2$. Now we iterate; in other words, we repeat the process of going from $y_1, z_1$ to $y_2, z_2$ but starting with $y_2, z_2$ to obtain $y_3, z_3$ (so


1. Write out a proof that $\lim_{k\to\infty}(x_k + y_k) = x + y$ if $\lim_{k\to\infty} x_k = x$ and $\lim_{k\to\infty} y_k = y$ for sequences of real numbers.

2.3.3 Exercises

The divide and conquer algorithm is not very efficient, since at each iteration we only cut the error in half. To reduce the error by a factor of $10^{-3}$ would require 10 iterations. In exercise 6 the reader is invited to give a different proof based on the more efficient algorithm described in section 2.1.1. However, the proof is much trickier.
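The divide and conquer construction of $\sqrt{x}$ can be sketched in a few lines of Python using exact rational arithmetic. This is only an illustration of the halving step, assuming a rational input $x$; the function name `bisect_sqrt` and its `steps` parameter are hypothetical and not part of the text.

```python
from fractions import Fraction


def bisect_sqrt(x, steps):
    """Return rationals (y, z) with y*y <= x <= z*z after `steps` halvings.

    Hypothetical helper: follows the divide and conquer step of the proof,
    starting from [1, x] when x >= 1 and from [x, 1] when 0 < x < 1.
    """
    x = Fraction(x)
    if x <= 0:
        raise ValueError("x must be positive")
    y, z = (Fraction(1), x) if x >= 1 else (x, Fraction(1))
    for _ in range(steps):
        m = (y + z) / 2          # midpoint of the current interval
        if m * m > x:
            z = m                # keep the left half
        elif m * m < x:
            y = m                # keep the right half
        else:
            return m, m          # hit the square root exactly
    return y, z


y, z = bisect_sqrt(2, 10)
print(float(y), float(z), z - y)  # the gap is (z_1 - y_1) / 2**10
```

Ten halvings shrink the starting gap by a factor of $2^{10} = 1024$, which is the "10 iterations for $10^{-3}$" estimate.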

Of course you still cannot take square roots of negative numbers within the real number system; to do this it is necessary to enlarge the system still further to the complex numbers. We will discuss this extension later. For now we will just point out that it is rather different from the extension rationals → reals in at least two important ways. The first way is that it is algebraic, not involving infinite processes. The second is that the complex number system has a completely separate interpretation and intuitive image, whereas both the real and rational number systems have a common intuitive basis in the idea of magnitude.

Now we claim that $y_1, y_2, \ldots$ and $z_1, z_2, \ldots$ are Cauchy sequences and that they are equivalent. This is easy to see. Given an error $1/n$ we take $m$ large enough that $(z_1 - y_1)/2^{m-1} < 1/n$. Then by condition b, $0 \le z_m - y_m < 1/n$, and all $y_k$ and $z_k$ for $k \ge m$ lie in the interval from $y_m$ to $z_m$. Thus $|y_j - y_k| < 1/n$, $|z_j - z_k| < 1/n$, and $|y_j - z_k| < 1/n$ for $j, k \ge m$. This proves the claim.

Let $y$ be the real number equal to the common limit of these two sequences. Passing to the limit in a yields $y^2 \le x \le y^2$, which implies $y^2 = x$. QED

$m_2 = (y_2 + z_2)/2$ and either $y_3 = m_2$ and $z_3 = z_2$ if $m_2^2 < x$ or $y_3 = y_2$ and $z_3 = m_2$ if $x < m_2^2$), and then $y_4, z_4$, and so on. We repeat the process of dividing infinitely often (unless we happen to hit $\sqrt{x}$ exactly at one of the midpoints). In this way we obtain two sequences $y_1, y_2, \ldots$ increasing and $z_1, z_2, \ldots$ decreasing such that

a. $y_k^2 \le x \le z_k^2$ for all $k$, and

b. $z_k - y_k = (z_1 - y_1)/2^{k-1}$.


2. Prove that every real number has a unique real cube root.

3. Let $x_1, x_2, \ldots$ be a sequence of real numbers such that $|x_n| \le 1/2^n$, and set $y_n = x_1 + x_2 + \cdots + x_n$. Show that the sequence $y_1, y_2, \ldots$ converges.

4. Let $ax^2 + bx + c$ be a quadratic polynomial with real coefficients $a, b, c$ and positive discriminant, $b^2 - 4ac > 0$. Prove that

$$\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

are the unique real roots.

5. For every sequence $k_0, k_1, k_2, \ldots$ of non-negative integers, let $x_0, x_1, x_2, \ldots$ be the associated sequence of continued fractions. Prove that $x_0, x_1, x_2, \ldots$ is a Cauchy sequence of rationals and that every positive real number arises as a limit.

6. *Suppose $x > 1$, and define the sequence $y_1, y_2, \ldots$ by $y_1 = x$ and $y_{k+1} = T(y_k)$ for $T(y) = (y + x/y)/2$.

a. Show $y - T(y) = (y^2 - x)/2y$ and $T(y)^2 - x = (y^2 - x)^2/4y^2$.
b. Show $0 \le y - T(y) \le (y^2 - x)/2$ and $0 \le T(y)^2 - x \le (y^2 - x)^2/4$ for $y \ge 1$ and $y^2 \ge x$.
c. Show $0 \le y_k - y_{k+1} \le (y_k^2 - x)/2$ and $0 \le y_{k+1}^2 - x \le (y_k^2 - x)^2/4$.
d. Show that $y_1, y_2, \ldots$ is a Cauchy sequence and if $y = \lim_{k\to\infty} y_k$, then $y^2 = x$.
e. Show that if $|y_k^2 - x| \le 10^{-3}$, then $|y_{k+1}^2 - x| \le 10^{-6}/4$ and $|y_{k+2}^2 - x| \le 10^{-13}$.


A less trivial matter is the question whether every real number (defined by a Cauchy sequence of rationals) has an infinite decimal expansion. The naive approach to solving this problem would be to

if $k \ge m$.

2.4.1 Infinite Decimal Expansions

We have now completed the basic task of establishing the real number system. Ahead of us lies the more challenging task of exploring the deeper properties of this system. But before plunging ahead, we should pause to consider some alternate ways we might have proceeded. First we will discuss other versions of the same system: infinite decimal expansions and Dedekind cuts. Then we will briefly discuss other visions, essentially different mathematical systems that offer a competing view of what the real number system should be.

Let's consider infinite decimal expansions. From the decimal expansion we obtain immediately a Cauchy sequence of rationals by truncating (if $\sqrt{2} = 1.414\ldots$, then $1, 1.4, 1.41, 1.414, \ldots$ is a Cauchy sequence of rationals defining $\sqrt{2}$). In fact we can say that an infinite decimal expansion is a special kind of Cauchy sequence of rationals, one for which $x_k = n + \sum_{j=1}^{k} a_j/10^j$ where $n$ is an integer and $0 \le a_j \le 9$ (for negative numbers this must be slightly modified). In fact it is trivial to verify that this is a Cauchy sequence since

2.4 Other Versions and Visions

7. Prove that $a > b > 0$ implies $\sqrt{a} > \sqrt{b} > 0$.

8. Prove that if $\lim_{k\to\infty} x_k = x$ and $x_k \ge 0$ for all $k$, then $\lim_{k\to\infty} \sqrt{x_k} = \sqrt{x}$.

9. Prove the completeness of the integers (every Cauchy sequence of integers converges to an integer). Why is this result not very interesting?

10. Prove that the irrational numbers are dense in $\mathbb{R}$.


Proof: The idea of the proof is represented by Figure 2.4.1 where the brackets [ ] indicate an interval $[x - 1/n, x + 1/n]$ that lies within the interval $(y, z)$. By the axiom of Archimedes we can always choose the error $1/n$ small enough to achieve this fit (since $x - y$ and $z - x$

Lemma 2.4.1 If $y < x < z$ for any three real numbers and if $\{x_k\}$ is a Cauchy sequence of rationals defining $x$, then there exists $m$ such that $y < x_k < z$ for all $k \ge m$.

write out the infinite decimal expansion of each rational number $x_n$ in the Cauchy sequence defining $x$ and to hope that these expansions eventually settle down to the expansion for $x$. Of course the example $1.1, .99, 1.01, .999, 1.001, .999, \ldots$ shows that this procedure may not always work. However, the key observation is that it can only fail for numbers like $x = 1$, which have two distinct infinite decimal expansions. Such numbers are of the form $m/10^k$; and since we know how to write infinite decimal expansions for them, we can use the naive procedure on all the other numbers.

Thus assume $x$ is a real number such that $x \ne m/10^k$ for any $m$ or $k$. For simplicity let us assume $x > 0$. Let $x_1, x_2, x_3, \ldots$ be any Cauchy sequence of rational numbers defining $x$. Consider the infinite decimal expansions of the rational numbers $x_1, x_2, \ldots$. We want to show that eventually they all agree to any number of terms desired. For example, consider the first three terms $n.a_1a_2a_3$. The rational numbers $r$ whose infinite decimal expansions begin this way are those satisfying the inequalities $n + a_1/10 + a_2/100 + a_3/1000 < r < n + a_1/10 + a_2/100 + a_3/1000 + 1/1000$ (what happens at the endpoint is a matter of convention). We want to show that eventually all the rational numbers $x_k$ satisfy one such inequality, namely the one satisfied by $x$. Indeed $x$ must satisfy such an inequality because we have assumed it is not of the form $n + a_1/10 + a_2/100 + a_3/1000$ (in more detail, choose the largest number of the form $n.a_1a_2a_3$ such that $n.a_1a_2a_3 < x$; then we must have $x < n.a_1a_2a_3 + 1/1000$ because $x \ne n.a_1a_2a_3 + 1/1000$ and if we had $x > n.a_1a_2a_3 + 1/1000$, then $n.a_1a_2a_3$ would not be the largest).

We claim that the first three terms of the infinite decimal expansions of the rational numbers $x_k$ are all $n.a_1a_2a_3$ for $k$ sufficiently large. To do this we clearly need the following result.


Taking $y = n.a_1a_2a_3$ and $z = n.a_1a_2a_3 + 1/1000$ in the lemma shows that all the infinite decimal expansions of the $x_k$ beyond a certain point begin with $n.a_1a_2a_3$. In this way we can find any number of terms in the decimal expansion of $x$, so the infinite decimal expansion of $x$ is defined unambiguously for every real number not of the form $n + \sum_{j=1}^{k} a_j/10^j$. However, there is one severe shortcoming to this procedure: there is no a priori bound for how far out in the sequence $\{x_k\}$ you have to go before, say, $a_1$ is determined. For example, suppose you know that for all $j, k \ge m_0$, $|x_j - x_k| \le 1/100$. Does this mean that the decimal expansions of all $x_j$ for $j \ge m_0$ agree up to $a_1$? Well, it depends! If one such $x_j$ is 1.427, then all the others must lie between 1.417 and 1.437 and so must begin 1.4. But if one such $x_j$ is 1.4005, then all we can say is that all the others must lie between 1.3905 and 1.4105, so we don't know if the infinite decimal expansion begins 1.3 or 1.4. Furthermore, we cannot say in advance how accurately we need to control the variation in the $x_j$ before we can decide between 1.3 and 1.4. All we know is that if we continue long enough, we will eventually get a decision (this is only true because we have assumed $x$ is not of the form $n + \sum_{j=1}^{k} a_j/10^j$).
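The two numerical cases above (1.427 versus 1.4005, each known to within 1/100) can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the function name `decided_places` is hypothetical, inputs are positive, and a digit is treated as decided only when both ends of the interval truncate to the same value.

```python
from fractions import Fraction
from math import floor


def decided_places(approx, err, max_places=12):
    """Count decimal places shared by every number in [approx - err, approx + err].

    Hypothetical helper. Returns None when even the integer part is undecided.
    """
    center, radius = Fraction(approx), Fraction(err)
    lo, hi = center - radius, center + radius
    if lo < 0:
        raise ValueError("this sketch only handles positive numbers")
    if floor(lo) != floor(hi):
        return None
    places = 0
    while places < max_places:
        scale = 10 ** (places + 1)
        # The next digit is decided only if both endpoints truncate alike.
        if floor(lo * scale) != floor(hi * scale):
            break
        places += 1
    return places


print(decided_places("1.427", "0.01"))   # interval 1.417 to 1.437
print(decided_places("1.4005", "0.01"))  # interval 1.3905 to 1.4105
```

With the same error bound, the first call settles the digit after the decimal point and the second does not, which is exactly the lack of an a priori bound described above.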

We have now shown that every real number has an infinite decimal expansion. To complete the identification of the real number system with the system of infinite decimal expansions, we would have to define the arithmetic (addition and multiplication) of infinite decimals and the order relation $x > y$ and show they agree with the arithmetic and order relation already defined for the real number system. This would actually be a rather noxious task, since we have to deal with the possibility of infinite carries. Thus we will have to be content with the

are both positive by assumption, they must both be bounded below by $1/n$ for some $n$, and these inequalities translate into the above picture). Since $\lim_{k\to\infty} x_k = x$, there must be an $m$ such that $x_k$ lies within the interval in brackets for all $k \ge m$, and hence $y < x_k < z$. QED

[Figure 2.4.1: the bracketed interval $[x - 1/n, x + 1/n]$ lying within the interval $(y, z)$.]


plausibility of the outcome. We will have no further need for infinite decimal expansions in this work.

2.4.2 Dedekind Cuts*

The other version of the real number system that we should discuss is the Dedekind cut construction. The method of Dedekind can actually be traced back to the work of the Greek mathematician Eudoxus, which is included in Euclid's text. It is based entirely upon the ordering of the rational numbers. The idea is that a real number $x$ creates a division of the rational numbers into two sets: those greater than $x$ and those less than $x$. If $x$ itself is rational, then we have also to do something with $x$; and by convention we can lump it with the big guys. Thus, assuming we have constructed a real number system, we want to associate to each real number $x$ the set $L_x$ of all rational numbers less than $x$. What kind of set is $L_x$? Clearly it contains some but not all rational numbers; and if $r$ is in $L_x$, then every rational number less than $r$ is also in $L_x$. The convention that $x$ should not belong to $L_x$ if $x$ is rational means that $L_x$ does not contain a largest rational number. It turns out that these properties characterize the sets of rational numbers that are of the form $L_x$. That is, any subset $L$ of the rational numbers that satisfies:

1. $L$ is not empty;

2. $L$ is not all the rationals;

3. if $r$ is in $L$ and $q < r$, then $q$ is in $L$;

4. if $r$ is in $L$, there exists $s$ in $L$ with $s > r$

must be $L_x$ for some real $x$. Since we have already constructed the real numbers, we can prove this statement as a theorem.

Dedekind's idea, however, was to use this proposition as a definition, and by means of it to construct the real number system. In other words, let us return to our initial position of knowing only the rational number system. It is certainly possible to think of sets of rational numbers $L$ satisfying 1-4. Call such sets Dedekind cuts, and define the Dedekind real number system to be the set of all Dedekind cuts! We can identify the rational number $q$ with the Dedekind cut $L_q = \{r \text{ rational} : r < q\}$ and so embed the rational number system. We can also


define arithmetic and order on Dedekind cuts. For example, $L_1 + L_2$ is the cut $L$ consisting of all rationals of the form $r_1 + r_2$ where $r_1$ is in $L_1$ and $r_2$ is in $L_2$ (multiplication, alas, is more tricky to define, because the product of two negative numbers is positive; it requires a segregation according to sign). Of course such a definition requires a proof that $L$ is actually a cut and that $L_{q_1} + L_{q_2} = L_{(q_1 + q_2)}$ if $q_1$ and $q_2$ are rationals so that the sum for rationals agrees with the sum for cuts. The order relation $L_1 < L_2$ is the same as containment, $L_1 \subset L_2$. One can then show, with some work, that the Dedekind real number system is a complete ordered field.

What we want to see now is that the Dedekind real number system is identical to the real number system that we have constructed. We have already seen how to associate a Dedekind cut $L_x$ to a real number $x$. The fact that distinct real numbers $x \ne y$ give rise to distinct cuts $L_x \ne L_y$ is just the observation that there exist rational numbers in between $x$ and $y$, a fact we observed as a consequence of the axiom of Archimedes. The fact that every Dedekind cut is of the form $L_x$, which we mentioned but have yet to prove, will show that the correspondence $x \to L_x$ is onto.

So let $L$ be a Dedekind cut; that is, a subset of the rational numbers satisfying the properties 1-4. We want to find a real number $x$ such that $L = L_x$. The idea of the proof is divide and conquer. We know from the first two properties of $L$ that there is at least one rational, call it $p_1$, in $L$, and at least one rational $q_1$, not in $L$. Now $q_1 > p_1$ by property 3, and clearly $x$ (if it exists) must lie in between, as shown in Figure 2.4.2.

[Figure 2.4.2: the point $x$ lying between $p_1$ and $q_1$.]

Consider the midpoint $(p_1 + q_1)/2$. If it lies in $L$, then $x$ (if it exists) must lie between it and $q_1$, as shown in Figure 2.4.3; whereas if it does not lie in $L$, then $x$ (if it exists) must lie between $p_1$ and it, as in Figure 2.4.4.


[Figure 2.4.5: the points $q$, $r$, $p_k$, $q_k$ in increasing order on the line.]

In the first case let $p_2 = (p_1 + q_1)/2$ and $q_2 = q_1$, and in the second case let $p_2 = p_1$ and $q_2 = (p_1 + q_1)/2$. In other words we replace one of the two points $p_1$ or $q_1$ by the midpoint and leave the other alone. In this way we still have $p_2$ in $L$ and $q_2$ not in $L$, but the gap between $p_1$ and $q_1$ has been halved. We then repeat the process, dividing the interval $p_2, q_2$ in half and choosing for $p_3, q_3$ the half such that $p_3$ is in $L$ and $q_3$ is not in $L$. Repeating the process indefinitely we obtain sequences $p_1, p_2, \ldots$ and $q_1, q_2, \ldots$ of rationals. It is easy to see that these are Cauchy sequences; and they are equivalent, so define a real number $x$. It seems plausible that for this $x$, $L_x = L$. Let us in fact prove it. Remember that $L_x$ was defined as the cut consisting of all rational numbers less than $x$. $L$ was the cut with which we started. Suppose $q$ is a rational number in $L$. Why is $q$ in $L_x$? Here we have to use property 4. There exists $r$ in $L$ greater than $q$. Eventually the sequence $p_1, p_2, \ldots$ must exceed $r$ (whatever the initial distance $q_1 - r$, the distance $q_k - p_k$ must eventually be smaller, since it is halved at each step, and so $p_k > r$, as shown in Figure 2.4.5).
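The halving of the gap between a point in the cut and a point outside it can be sketched in Python. This is a minimal illustration, assuming the cut is given as a membership test on rationals; the names `cut_bisection` and `sqrt2_cut` are hypothetical, and the example cut (negative rationals together with those whose square is less than 2) is chosen only for demonstration.

```python
from fractions import Fraction


def cut_bisection(in_cut, p, q, steps):
    """Return the list of pairs (p_k, q_k) with p_k in the cut and q_k outside it.

    Hypothetical helper: `in_cut` is a membership test for a Dedekind cut L.
    """
    p, q = Fraction(p), Fraction(q)
    if not in_cut(p) or in_cut(q):
        raise ValueError("need p in the cut and q outside it")
    pairs = [(p, q)]
    for _ in range(steps):
        mid = (p + q) / 2
        if in_cut(mid):
            p = mid      # midpoint lies in L: move the lower point up
        else:
            q = mid      # midpoint is not in L: move the upper point down
        pairs.append((p, q))
    return pairs


def sqrt2_cut(r):
    """Example cut: rationals that are negative or have square less than 2."""
    return r < 0 or r * r < 2


p, q = cut_bisection(sqrt2_cut, 1, 2, 20)[-1]
print(float(p), float(q), q - p)
```

Both sequences stay on their own side of the cut while the gap $q_k - p_k$ is halved at each step, so they are equivalent Cauchy sequences of rationals.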

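[Figure 2.4.3: the case where the midpoint lies in $L$, so that $p_2$ is the midpoint.]

[Figure 2.4.4: the case where the midpoint does not lie in $L$, so that $p_2 = p_1$.]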


From $p_k \ge r$ for all large $k$ we obtain $x \ge r$ and so $x > q$. This says $q$ is in $L_x$.

Conversely suppose $q$ is in $L_x$; i.e., $q < x$. We want to show $q$ is in $L$. We claim $q \le p_k$ for some $k$, for if not, then $q \ge p_k$ for all $k$ and hence $q \ge x$, contradicting $q < x$. Thus $q \le p_k$ and so $q$ is in $L$ by property 3.

This shows $L = L_x$ and so establishes a one-to-one correspondence between the real numbers and the Dedekind cuts. Of course to show that the two number systems are the same we have to verify that the arithmetic operations and ordering relations agree. For example, is the cut $L_{(x+y)}$ associated with the real number $x + y$ the same as the sum $L_x + L_y$ of the cuts associated with the numbers $x$ and $y$? Remember that $L_{(x+y)} = \{\text{rationals } q < x + y\}$ while $L_x + L_y = \{\text{rationals } q = q_1 + q_2 \text{ where } q_1 < x \text{ and } q_2 < y\}$. Clearly $L_x + L_y \subseteq L_{(x+y)}$ because if $q_1 < x$ and $q_2 < y$, then $q_1 + q_2 < x + y$. For the reverse inclusion we need to see that if $q$ is a rational number less than $x + y$, then it can be written $q = q_1 + q_2$ where $q_1$ and $q_2$ are rational numbers with $q_1 < x$ and $q_2 < y$. But by the axiom of Archimedes $x + y - q > 1/n$ for some $n$, so if we take $q_1$ any rational such that $x - 1/2n < q_1 < x$, then $q_2 = q - q_1$ is less than the largest value for $q$, which is $x + y - 1/n$, minus the smallest value for $q_1$, which is $x - 1/2n$, as shown in Figure 2.4.6. In other words, by making $q_1$ closer to $x$ than $q$ is to $x + y$, this forces $q_2 = q - q_1$ to be less than $y$.

[Figure 2.4.6: the points $q$ and $x + y$ on the line.]

The argument for products is similar, although more complicated, while the argument for the order relations is extremely simple (can you give it?).

Thus the method of Dedekind cuts gives a third version of the same real number system. Comparing it to the method of Cauchy sequences, we can see some advantages and disadvantages. The main advantage of the Dedekind cut method is that it involves less set theory: we only have to deal with a countable set of rationals and its power set, the set


of subsets of the rationals. For the Cauchy sequence method we have to deal with equivalence classes of Cauchy sequences, involving sets of sets of rationals, just to get a single real number.

On the other hand, we are going to have to deal with the concept of Cauchy sequences eventually, whereas the concept of a Dedekind cut is not of much significance once the real number system is established. But the most compelling argument in favor of the Cauchy sequence method must look to the future. The construction of the reals from the rationals via Cauchy sequences is a prototype of a general construction that is used frequently in mathematics. Thus there is an advantage to becoming familiar with this construction in its most concrete example. (It is also true that there are some constructions in mathematics that are generalizations of the method of Dedekind cuts, but these are less frequently encountered. Probably, a well-educated mathematician should be familiar with both approaches.)

2.4.3 Non-Standard Analysis*

We have constructed the real number system in which the Axiom of Archimedes is valid: there is no positive number less than $1/n$ for all $n$. Another way of saying this is that there are no infinitesimals. Nevertheless, there is a body of informal mathematics that is based on the concept of infinitesimals. We know that many of the mathematicians who contributed to the development of the calculus during the seventeenth and eighteenth centuries believed that the true foundations of the subject should be based on infinitesimals; and even after the general acceptance of the non-infinitesimal foundations in the nineteenth century, the use of infinitesimals in informal or heuristic arguments persisted. It would seem plausible, then, that there should exist a logically satisfactory foundation for analysis based on infinitesimals. Such a foundation was finally discovered around 1960 by Abraham Robinson, who called it non-standard analysis. The reason it took so long to discover is that it is a rather sophisticated mathematical system, requiring a strong background in mathematical logic and general topology in order to be understood. While I cannot begin to describe this system precisely here, I can give some indication of its general features.

We start with the real number system $\mathbb{R}$, and we enlarge it to the non-standard real number system ${}^*\mathbb{R}$, which contains, in addition to real


numbers, infinitesimal numbers and infinite numbers (the reciprocals of the infinitesimals). The general finite number in ${}^*\mathbb{R}$ is the sum of a real number and an infinitesimal. The real number is uniquely determined by the finite non-standard number and is called its standard part. Thus we have the mental picture of the real line enlarged by surrounding each real point by a cloud, or galaxy, of infinitesimally close points. The cloud about the point 0 is the set of infinitesimals, and the reciprocals of this cloud form the infinite non-standard numbers. The set of infinitesimals is uncountable; in fact, it has cardinality equal to the set of subsets of real numbers, which is greater than the cardinality of the real numbers.

Every concept involving real numbers has an extension to the non-standard real numbers, and every true statement about the real numbers is also true about the non-standard real numbers, if properly interpreted. For example, the non-standard real numbers form a complete ordered field. However, the interpretation of statements in the non-standard real numbers requires a great deal of caution. For example, consider the Axiom of Archimedes. Since the non-standard real numbers contain infinitesimals, numbers $x > 0$ such that $x < 1, x < 1/2, x < 1/3, \ldots$, it is clear that the Axiom of Archimedes in its usual interpretation is false for ${}^*\mathbb{R}$. Nevertheless, since the Axiom of Archimedes is true for $\mathbb{R}$, it must be true for ${}^*\mathbb{R}$ in the appropriate interpretation. To find the appropriate interpretation we write out the statement formally: for every positive $x$ in $\mathbb{R}$ there exists $n$ in $\mathbb{N}$ (the natural numbers) such that $x > 1/n$. We then must substitute the non-standard version for everything, not just $x$ in $\mathbb{R}$ but also $n$ in $\mathbb{N}$. There is a non-standard notion of natural number, a subset ${}^*\mathbb{N}$ of ${}^*\mathbb{R}$ that is larger than $\mathbb{N}$. It contains infinite integers as well as finite integers. Thus the true interpretation is: for every positive $x$ in ${}^*\mathbb{R}$ there exists $n$ in ${}^*\mathbb{N}$ (the non-standard natural numbers) such that $x > 1/n$. If $x$ is an infinitesimal, then $x > 1/n$ for some infinite integer $n$.

The real power of the non-standard real number system is that essentially all heuristic reasoning concerning infinitesimals can be made into valid logical proofs. For example, the derivative $f'(x)$ can be computed by forming the difference quotient $(f(x + h) - f(x))/h$ when $h$ is an infinitesimal and by taking the standard part of this non-standard real number (taking the standard part replaces the dubious practice of discarding infinitesimals in the final limit). Similarly, limits of infinite sequences $x_1, x_2, \ldots$ become the standard part of $x_n$ for $n$ infinite, and


integrals can be interpreted as infinite sums of infinitesimals. Because the non-standard real number system encompasses valid methods of reasoning that were formerly thought of as merely plausible arguments, it has enabled mathematicians to prove theorems that they might otherwise have not discovered.

On the other hand, there are also several drawbacks to the non-standard real number system that must be pointed out:

1. It is not unique. There are many non-standard real number systems ${}^*\mathbb{R}$ that are not equivalent to each other but work just as well, so there is no reason to prefer one to another. Intuitively one would expect the real number system to be unique; of course, one first has to deal with the more serious problem of the non-uniqueness of set theory, but once a version of set theory is chosen the real number system $\mathbb{R}$ is uniquely determined.

2. The "construction" of ${}^*\mathbb{R}$ (there are now several equivalent ones) involves the use of the axiom of choice in a more serious way than the "construction" of $\mathbb{R}$. This is not to say that our "construction" of $\mathbb{R}$ is truly "constructive". When we say there exists a Cauchy sequence of rationals ... we do not imply the existence of an algorithm for producing the rationals in the sequence. Our arguments also frequently use the countable axiom of choice: we assume that a countable number of unspecified choices can be made. However, the very definition of ${}^*\mathbb{R}$ requires the uncountable axiom of choice: simultaneously an uncountable number of unspecified choices must be made. This puts an additional level of abstraction between the intuition and the non-standard real number system.

3. Any theorem about the real number system that can be proved using the non-standard real number system can be proved without it. This is a meta-mathematical theorem; there is even a translation mechanism for taking a non-standard proof and replacing it with a standard one. In fact, the first well-known published theorem discovered using non-standard analysis was followed in the same journal by a "translation" so that mathematicians unfamiliar with non-standard analysis could follow the proof. From a practical standpoint, however, this is a rather trivial drawback,


since we are interested in theorems whose proofs we can discover rather than whose proofs exist. Since non-standard analysis is a proven aid to the discovery of proofs, it helps us learn more mathematics.

4. Non-standard analysis is more difficult than standard analysis to learn. We will take this drawback as the incontestable reason for ending this discussion of non-standard analysis.

2.4.4 Constructive Analysis*

At the opposite philosophic pole to non-standard analysis, we find Errett Bishop's theory, enunciated in his book Foundations of Constructive Analysis (Academic Press, 1967). The idea behind his work is that we have failed to make the distinction in meaning between "there exists blah" and "there exists an algorithm for constructing blah". This failure is embedded in our logic, which allows us to pass in an argument from a constructive existence to a non-constructive existence. A simple example of this is as follows: let $x_1, x_2, \ldots$ be the Cauchy sequence of rationals defined by $x_n = 0$ if every even number ($\geq 4$) less than $n$ is the sum of two primes, while otherwise $x_n = 1/k$ where $k$ is the first even number ($\geq 4$) less than $n$ that is not the sum of two primes. Clearly there is an algorithm (involving finding all primes less than $n$, forming all sums of pairs, and comparing the list with all even numbers less than $n$) for computing $x_n$ in a finite number of steps. The number this Cauchy sequence represents, call it $x$, is $0$ if Goldbach's conjecture is true and $x > 0$ if Goldbach's conjecture is false. However, our proof that $x = 0$ or $x > 0$ does not provide us with an algorithm for deciding which case is true, for if it did it would provide us with an algorithm for settling Goldbach's conjecture. Such an algorithm might exist, but no one expects its discovery to be a routine consequence of real analysis.

Thus the theorems proved in the standard theory of the real number system are lacking in constructive content. We don't know if things asserted to exist can actually be found, even if every hypothesis of the theorem is true constructively. If we are interested in the constructive content of our theorems we must initiate a new mathematical development. Such an a posteriori development is given by the theory of recursive


functions; however, the school of Bishop rejects this development because it allows too broad a notion of algorithm (a recursive sequence $x_n$ consists of a finite set of instructions that enables you to compute $x_n$ in a finite length of time given $n$; the problem is that there may not be a proof that the length of the computation is finite even though this is in fact the case). Instead, Bishop proposes that we begin a priori with only algorithmic existence and use a system of logic that allows us to deduce only algorithmic existence. He develops a substitute for the real number system, called the constructive real number system, and he is able to prove substitute constructive versions of the theorems in this book.
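The Goldbach sequence introduced earlier in this section can be written down as a short program, which makes the point concrete: each term is computable by a finite search, yet no finite number of terms decides whether the real number the sequence represents is zero. The following Python sketch is only an illustration. The names `is_prime`, `is_sum_of_two_primes`, and `goldbach_term` are hypothetical, and trial division is used for brevity rather than efficiency.

```python
from fractions import Fraction


def is_prime(m):
    """Trial-division primality test (adequate for small m)."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def is_sum_of_two_primes(k):
    """True if the integer k can be written as p + q with p and q prime."""
    return any(is_prime(p) and is_prime(k - p) for p in range(2, k // 2 + 1))


def goldbach_term(n):
    """The n-th term x_n of the sequence described in the text.

    x_n = 0 if every even number >= 4 less than n is a sum of two primes;
    otherwise x_n = 1/k for the first even k >= 4 less than n that is not.
    """
    for k in range(4, n, 2):
        if not is_sum_of_two_primes(k):
            return Fraction(1, k)
    return Fraction(0)


# Every term we can afford to compute is 0, which is consistent with
# Goldbach's conjecture but says nothing about the terms not yet computed.
first_terms = [goldbach_term(n) for n in range(1, 200)]
```

Computing more terms only extends the list of zeros or, at some point, produces a single non-zero value $1/k$ that then repeats forever; the program by itself cannot tell us which of the two will happen.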

The definition of the constructive real number system bears a superficial resemblance to the definition of the real number system. It is based on Cauchy sequences of rationals, but with the additional restriction that there must be an algorithm for demonstrating the rate of convergence (Bishop uses the technical trick of requiring $|x_n - x_m| < 1/n + 1/m$ for all his Cauchy sequences). However, the actual number system that results from this definition is radically different from the usual one. The most striking distinction is that the act of doing mathematics actually changes the constructive real number system. The reason for this is that there is a large "don't know" category of sequences $x_1, x_2, \ldots$ of rationals for which we neither have a proof of the Cauchy criterion nor a disproof. These sequences do not define constructive real numbers now. However, if tomorrow we discover an algorithmic proof that one of these does satisfy the Cauchy criterion, then that sequence defines a new constructive real number. Another way the system may change is that we may "learn" that two numbers $x$ and $y$ are really equal, if we discover an algorithmic proof that the Cauchy sequences defining them are equivalent.
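As a small illustration of the rate-of-convergence condition $|x_n - x_m| < 1/n + 1/m$ quoted above, the following Python sketch builds an explicit sequence of rationals approaching $\sqrt{2}$ and checks the condition on a finite range of indices. The choice $x_n = \lfloor n\sqrt{2} \rfloor / n$ and the names `sqrt2_approx` and `is_regular_up_to` are assumptions made for this example; they are not taken from Bishop's book. A finite check of this kind is only a sanity test, since the constructive definition asks for a proof covering all $n$ and $m$.

```python
from fractions import Fraction
from math import isqrt


def sqrt2_approx(n):
    """x_n = floor(n * sqrt(2)) / n, computed exactly with integers.

    isqrt(2 * n * n) is the integer part of n * sqrt(2), so
    sqrt(2) - 1/n < x_n <= sqrt(2).
    """
    return Fraction(isqrt(2 * n * n), n)


def is_regular_up_to(term, limit):
    """Check |x_n - x_m| < 1/n + 1/m for all 1 <= n, m <= limit."""
    return all(
        abs(term(n) - term(m)) < Fraction(1, n) + Fraction(1, m)
        for n in range(1, limit + 1)
        for m in range(1, limit + 1)
    )


ok = is_regular_up_to(sqrt2_approx, 40)
```

Here the inequality holds for every pair of indices, not just those tested, because each $x_n$ lies within $1/n$ of $\sqrt{2}$; that one-line argument plays the role of the algorithmic proof of the Cauchy criterion.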

It is easy to dismiss the work of Bishop and his disciples as that of a group of reactionary religious zealots intent on bullying the mathematical community into accepting its peculiar orthodox tenets. This is probably the attitude of most mainstream mathematicians, and it is aided and abetted by some overenthusiastic pronouncements from the constructivist school. However, I believe it is more valuable to look on this work as an interesting, and potentially fruitful, separate branch of mathematics. There is no doubt that the theorems of constructive analysis are legitimate mathematical discoveries, even when


interpreted in conventional terms. They spell out in precisely what ways conventional statements can be made algorithmic. Of course it is the responsibility of constructive mathematicians to come up with mathematical discoveries that will be exciting and profound enough to capture the imagination of the larger mathematical community, if they want their work to be taken very seriously.

Constructive analysis should not be confused with applied mathematics, where the goal is not just algorithms but algorithms that can be carried out efficiently. Applied mathematics has not been notably influenced by the work of the constructivists. It has, on the contrary, benefitted greatly from mathematics developed in a completely non-constructive spirit, which leads to the paradox that constructive mathematics may be best advanced by allowing free use of non-constructive reasoning.

One can also argue that mathematics, as it develops, becomes progressively less constructive. Even the notion of "construction" has evolved from the purely finite notion in Greek mathematics (straightedge and compass) to the contemporary idea of an algorithm giving a finite set of instructions for obtaining successive approximations. Constructivism can then be seen as a "reaction" to this "progress".

In the remainder of this book, we will freely use non-constructive ideas, although whenever possible we will give constructive demonstrations because they usually convey more information. Thus when we assert that something exists, we do not necessarily imply that we have provided explicitly or implicitly a method for finding it. The reader who has misgivings about this may wish to study Bishop's work, but I recommend that this be done only after the non-constructive theory has been well digested.

2.4.5 Exercises

1. Let $L$ be the set of all negative rational numbers and those positive rational numbers satisfying $r^2 < 2$. Show that $L$ is a Dedekind cut that represents the number $\sqrt{2}$.

2. If $A$ is a set of real numbers and $L_x$ is the Dedekind cut associated with each $x$ in $A$, show that the union of all the cuts $L_x$ is itself either a Dedekind cut or the set of all rationals.


3. If $x$ and $y$ are positive reals, show that $L_{xy}$ consists of all non-positive rationals and all positive rationals of the form $rs$ where $r$ and $s$ are positive rationals in $L_x$ and $L_y$.

4. Give a simple non-constructive proof that there exist positive irrational numbers $a$ and $b$ such that $a^b$ is rational by considering the possibilities for $\sqrt{2}^{\sqrt{2}}$ and $(\sqrt{2}^{\sqrt{2}})^{\sqrt{2}}$. Does this proof allow you to compute $a$ to within an error of $1/100$?

5. Let $a_k, b_k, c_k, n_k$ for $k = 1, 2, \ldots$ be an enumeration of all quadruples of positive integers with $n_k \geq 3$. Let $x_j = 0$ if $a_k^{n_k} + b_k^{n_k} \neq c_k^{n_k}$ for all $k \leq j$ and otherwise $x_j = 1/k$ where $k$ is the smallest integer $\leq j$ for which $a_k^{n_k} + b_k^{n_k} = c_k^{n_k}$. Prove that $x_1, x_2, \ldots$ is a Cauchy sequence. What is the relationship between the Axiom of Archimedes for the real number given by this Cauchy sequence and Fermat's Last Theorem?

2.5 Summary

2.1 Cauchy Sequences

Definition A Cauchy sequence of rational numbers is a sequence $x_1, x_2, \ldots$ of rational numbers such that for every $n$ there exists $m$ such that $|x_j - x_k| \leq 1/n$ for all $j$ and $k \geq m$.

Definition Two Cauchy sequences $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ are equivalent if for every $n$ there exists $m$ such that $|x_k - y_k| \leq 1/n$ for $k \geq m$.

Lemma 2.1.1 The equivalence of Cauchy sequences of rationals is an equivalence relation (it is symmetric, reflexive, and transitive).

Definition 2.1.1 A real number is an equivalence class of Cauchy sequences of rationals.

2.2 The Reals as an Ordered Field

Definition 2.2.1 The real number $x + y$ is the equivalence class of


$x_1 + y_1, x_2 + y_2, \ldots$ where $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ are Cauchy sequences of rationals representing $x$ and $y$.

Lemma 2.2.2 A Cauchy sequence is bounded.

Definition 2.2.2 The real number $x \cdot y$ is the equivalence class of $x_1 y_1, x_2 y_2, \ldots$ where $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ are Cauchy sequences representing $x$ and $y$.

Definition A field is a set with two operations, addition and multiplication, such that both operations are commutative and associative, multiplication distributes over addition, there exist distinct additive identity $0$ and multiplicative identity $1$, and every element $x$ has an additive inverse $-x$ and a multiplicative inverse $x^{-1}$ (if $x \neq 0$).

Theorem 2.2.1 The real numbers form a field.

Lemma 2.2.4 Every real number not equal to zero is bounded away from zero (there exists $N$ such that for every Cauchy sequence $x_1, x_2, \ldots$ representing $x$ there exists $m$ with $|x_j| \geq 1/N$ for all $j \geq m$).

Definition 2.2.3 A real number $x$ is positive if there exist $N$ and $m$ and $x_1, x_2, \ldots$ a Cauchy sequence representing $x$ such that $x_j \geq 1/N$ for all $j \geq m$. A real number is negative if $-x$ is positive.

Theorem 2.2.2 The reals form an ordered field: every real number is either positive, negative, or zero, and sums and products of positive numbers are positive.

Definition $x > y$ means $x - y$ is positive, $|x| = x$ if $x \geq 0$ and $-x$ if $x < 0$.

Theorem 2.2.3 (Triangle Inequality) $|x + y| \leq |x| + |y|$.

Theorem 2.2.4 (Axiom of Archimedes) If $x > 0$ there exists $n$ such that $x \geq 1/n$.


Theorem 2.2.5 (Density of Rationals) Given $x$ a real number and $n$, there exists a rational number $y$ such that $|x - y| \leq 1/n$.

2.3 Limits and Completeness

Definition A Cauchy sequence of real numbers is a sequence $x_1, x_2, \ldots$ of real numbers such that for every $n$ there exists $m$ such that $|x_j - x_k| \leq 1/n$ for all $j$ and $k \geq m$.

Definition For $x$ a real number and $x_1, x_2, \ldots$ a sequence of real numbers, $x = \lim_{k\to\infty} x_k$ if for every $n$ there exists $m$ such that $k \geq m$ implies $|x_k - x| \leq 1/n$.

Theorem 2.3.1 (Completeness of Reals) Every Cauchy sequence of real numbers has a real limit.

Theorem 2.3.2

a. Limits commute with addition, multiplication, and division (with non-zero denominator).

b. Limits preserve non-strict inequalities.

Theorem 2.3.3 Every positive real has a unique real positive square root.

2.4 Other Versions and Visions

Theorem Every real number has an infinite decimal expansion.

Definition A Dedekind cut is a set $L$ of rational numbers satisfying

1. $L$ is not empty;

2. $L$ is not all the rationals;

3. if $r$ is in $L$ and $q < r$, then $q$ is in $L$;

4. if $r$ is in $L$ there exists $s$ in $L$ with $s > r$.


Theorem There is a one-to-one correspondence between real numbers and Dedekind cuts.


Chapter 3

Topology of the Real Line

3.1 The Theory of Limits

3.1.1 Limits, Sups, and Infs

In this chapter we delve deeper into the properties of real numbers, sequences of real numbers, and sets of real numbers. Many of the concepts introduced in this chapter will reappear in a broader context (metric spaces) in Chapter 9. By considering these concepts first in the concrete case of the real number line, you will have the opportunity to develop an intuition for them. This will make it easier to appreciate the generalizations that follow.

We can think of the real number system as representing a geometric line. We are interested in properties that have a qualitative geometric nature, which is the meaning of the word "topology". One of the fundamental concepts is that of limit, which we have already discussed but repeat its definition for emphasis. If $x_1, x_2, x_3, \ldots$ is a sequence of real numbers and if $x$ is a real number such that given any error $1/n$ there exists a place in the sequence $m$, such that $|x - x_j| < 1/n$ for all $j \geq m$, then we say $x$ is the limit of the sequence, $x = \lim_{j\to\infty} x_j$.

A limit need not exist, but if it does it is unique. We have already motivated this definition by the idea that any number from the sequence beyond the $m$th place is very close to $x$. We can also think of this in a geometric way. For each $1/n$, the set of all real numbers $y$ that satisfy the inequality $|x - y| < 1/n$ is the open interval $x - 1/n < y < x + 1/n$ of length $2/n$ centered at $x$. Think of this interval as comprising a


neighborhood of $x$ (we will give this a precise definition later). As we increase $n$ these neighborhoods shrink, giving a nested picture, as shown in Figure 3.1.1, more and more accurately pinpointing the location of $x$. The definition of the limit says that $\lim_{j\to\infty} x_j = x$ if the sequence eventually ($j \geq m$) lies entirely in each neighborhood.

Figure 3.1.1: nested neighborhoods of $x$ shrinking as $n$ increases.

It is sometimes convenient to give a meaning to the expressions $\lim_{j\to\infty} x_j = +\infty$ and $\lim_{j\to\infty} x_j = -\infty$. If we think of the half-infinite intervals $\{x : x > n\}$ and $\{x : x < n\}$ as defining neighborhoods of $+\infty$ and $-\infty$, respectively (allowing $n$ to be any integer, positive or negative), then the definition is again the same: $\lim_{j\to\infty} x_j = +\infty$ if the sequence eventually lies entirely in each neighborhood of $+\infty$. Sometimes it is convenient to think of the symbols $+\infty$ and $-\infty$ as standing for new numbers in what is called the extended real number system. The system consists of the real numbers together with $+\infty$ and $-\infty$. It has an obvious order ($+\infty$ is the biggest, $-\infty$ the smallest) and a limited arithmetic, with rules like $x + (+\infty) = +\infty$ for $x$ real, but certain expressions, such as $+\infty + (-\infty)$, must remain undefined.

Of course not every sequence that fails to have a real number limit will have $+\infty$ or $-\infty$ as a limit. A sequence that jumps about, such as $0, 1, 0, 1, 0, 1, \ldots$, is an obvious counterexample. (Some mathematicians, such as Leibniz and Euler, felt that this sequence should have $1/2$ as limit; nevertheless it does not under the definition we have adopted, since the sequence never lies in the neighborhood $1/4 < x < 3/4$ about $1/2$.) However, we can consider weaker notions that provide some information about where the points of a sequence lie, and these notions turn out to have great importance. The simplest of these are the supremum and infimum, which are based on the order properties of the real numbers. These concepts can be defined not just for sequences but for any sets of real numbers. We will use the abbreviations sup and inf.
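The neighborhood description of the limit discussed above can be tried out numerically. The following Python sketch checks, over a finite stretch of indices only, whether a sequence stays inside the neighborhood $|x - x_j| < 1/n$ from place $m$ onward. The helper name `stays_within` and the two sample sequences are assumptions made for illustration; a finite check can refute a proposed $m$ but can never prove that a limit exists.

```python
from fractions import Fraction


def stays_within(term, x, n, m, horizon):
    """Check |x - x_j| < 1/n for every j with m <= j <= horizon."""
    return all(abs(x - term(j)) < Fraction(1, n) for j in range(m, horizon + 1))


def reciprocal(j):
    """The sequence 1, 1/2, 1/3, ... whose limit is 0."""
    return Fraction(1, j)


def alternating(j):
    """The sequence 0, 1, 0, 1, ... (x_1 = 0, x_2 = 1, ...)."""
    return Fraction((j + 1) % 2)


# For the error 1/10, the place m = 11 works for the reciprocal sequence,
# while m = 10 does not, because |0 - 1/10| is not strictly less than 1/10.
works_from_11 = stays_within(reciprocal, 0, 10, 11, 1000)
works_from_10 = stays_within(reciprocal, 0, 10, 10, 1000)

# No term of the alternating sequence lies in the neighborhood
# 1/4 < x < 3/4 about 1/2, so 1/2 cannot be its limit.
hits_near_half = [j for j in range(1, 101)
                  if abs(Fraction(1, 2) - alternating(j)) < Fraction(1, 4)]
```

Exact rational arithmetic is used so that the strict inequality in the definition is tested without rounding error.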


If $E$ is any finite set of real numbers, we can define $\sup E$ and $\inf E$ to be the largest and smallest numbers in $E$, respectively. But if $E$ is an infinite set of real numbers, there may not be a largest or smallest number in $E$. For example, the set of positive integers has no largest number; the set of positive reals has no smallest number. Nevertheless, if we imagine the set $E$ as represented geometrically on the line, there should be a left-most and right-most point (possibly $-\infty$ or $+\infty$) indicating the range of the set; these endpoints might or might not belong to the set. We will call them $\inf E$ and $\sup E$. How are we to define them? First let us develop the intuition that leads us to believe they exist; this will indicate the definition we want to adopt, and then we will have to give a proof that $\inf E$ and $\sup E$ exist. For simplicity we will deal only with $\sup E$; the treatment of $\inf E$ is completely analogous, or one may simply say that $\inf E$ is minus the sup of $-E$. To avoid triviality we assume $E$ is non-empty.
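For a finite set the remark that $\inf E$ is minus the sup of $-E$ can be checked directly. The short Python sketch below does this for one sample set; the set `E` and the function names `sup_finite` and `inf_finite` are made up for the example and apply only to finite, non-empty sets.

```python
from fractions import Fraction

# A sample finite, non-empty set of rationals (an arbitrary choice).
E = {Fraction(3, 2), Fraction(-1), Fraction(7, 4), Fraction(0)}


def sup_finite(s):
    """For a finite non-empty set the sup is simply the largest element."""
    return max(s)


def inf_finite(s):
    """inf E computed as minus the sup of -E = {-x : x in E}."""
    return -sup_finite({-x for x in s})
```

The identity is what allows the text to treat only $\sup E$ in detail: every statement about infs can be obtained by reflecting the set through the origin.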

If $\sup E$ is supposed to indicate the top-most extent of $E$, then every point in $E$ must be less than or equal to it. Thus, in searching for $\sup E$, we can confine attention to numbers $y$ with the property that every number $x$ in $E$ satisfies $x \leq y$. (The example of a finite set should convince you that we want $x \leq y$ and not $x < y$.) Such numbers are called upper bounds of $E$. Not every set possesses upper bounds, for example, the set of positive integers. We say $E$ is bounded from above if it has any upper bounds or unbounded from above if it has none. By convention we set $\sup E = +\infty$ if $E$ is unbounded from above.

Among all upper bounds for $E$ (if they exist), which shall we choose for $\sup E$? Obviously the smallest one! But are we sure that a smallest one exists? We have already mentioned that some sets of numbers, such as the positive numbers, have no smallest element. Well, suppose we start with one upper bound, call it $y_1$. If it is not the smallest, then pick a smaller one, $y_2$.

Continuing in this way, we could pick a sequence $y_1, y_2, y_3, \ldots$ of upper bounds that get smaller and smaller and hope that $\lim_{j\to\infty} y_j = y$ would give us $\sup E$. Unfortunately, we have to be a bit more careful to really go down far enough at each step, for it is clear from Figure 3.1.2 only that $y$ is an upper bound for $E$, not that it is the smallest.

To bring the sequence $\{y_j\}$ down close to $E$ we must consider some points in $E$ as well as upper bounds. Along with $y_1$ choose a point


$x_1$ in $E$ (we assumed $E$ is not empty). Since $y_1$ is an upper bound for $E$, $x_1 \leq y_1$, and clearly the sup must lie somewhere in between. Again we use a divide and conquer argument. Consider the midpoint, $(x_1 + y_1)/2$. If it is an upper bound for $E$, choose it for $y_2$, and set $x_2 = x_1$. If it is not an upper bound for $E$, then it fails because there is some point $x_2$ in $E$ bigger than the midpoint; in this case we take $y_2 = y_1$. The two cases are illustrated in Figure 3.1.3.

Figure 3.1.2: the set $E$ with upper bounds $y_1, y_2, y_3, \ldots$ decreasing to $y$.

Figure 3.1.3: first case, $x_2 = x_1$ and $y_2$ is the midpoint; second case, $x_2$ lies above the midpoint and $y_2 = y_1$.

In either case, we have replaced the original pair $x_1, y_1$ with a new pair $x_2, y_2$ with $x_2$ in $E$ and $y_2$ an upper bound to $E$, with the distance apart $|x_2 - y_2|$ at most half the original distance $|x_1 - y_1|$ (in the first case $|x_2 - y_2| = |x_1 - y_1|/2$ since $y_2$ is the midpoint, while in the second case $|x_2 - y_2| < |x_1 - y_1|/2$ since $x_2$ is greater than the midpoint). By iterating this argument we obtain an increasing sequence $x_1, x_2, x_3, \ldots$ of points in $E$ and a decreasing sequence $y_1, y_2, y_3, \ldots$ of upper bounds for $E$ such that $|x_n - y_n| \leq |x_1 - y_1|/2^{n-1}$. It is an easy matter to conclude from this that these are equivalent Cauchy sequences and, hence, converge to the same limit; call it $y$. What can we say about this point $y$?

1. It is an upper bound for $E$. In fact this follows from the more general fact that a limit of upper bounds for $E$ is an upper bound


for $E$. The condition of being an upper bound for $E$ is given by non-strict inequalities $x \leq y$ for all $x$ in $E$, and non-strict inequalities are preserved by limits (thus $x \leq y_j$ for all $y_j$ implies $x \leq \lim_{j\to\infty} y_j$, and this reasoning applies at each point $x$ of $E$).

2. It is the least upper bound of $E$: if $y'$ is another upper bound for $E$, then $y \leq y'$. The reason for this is that $x_j \leq y'$ since $x_j$ is in $E$ and $y'$ is an upper bound for $E$ and so $y = \lim_{j\to\infty} x_j \leq y'$ since the non-strict inequality is preserved in the limit.

Clearly there is only one number with these two properties, and this deserves to be defined as $\sup E$. The terminology least upper bound, abbreviated l.u.b., is used synonymously, while greatest lower bound, abbreviated g.l.b., is used for inf. We restate the important theorem that we have established.

Theorem 3.1.1 For every non-empty set $E$ of real numbers that is bounded above, there exists a unique real number $\sup E$ such that

1. $\sup E$ is an upper bound for $E$;

2. if $y$ is any upper bound for $E$, then $y \geq \sup E$.

A closely related theorem concerns sequences whose terms are increasing (there is an analogous result for decreasing sequences). A sequence $x_1, x_2, \ldots$ is called monotone increasing if $x_{j+1} \geq x_j$ for every $j$. Note that we do not demand strict inequality, which explains the use of the awkward adverb "monotone". If a monotone increasing sequence is unbounded, then it has limit $+\infty$ (once $x_j > n$ we have $x_k > n$ for all $k \geq j$). But suppose it is bounded; it would appear to rise to some finite limit, namely, the sup of the set of numbers $\{x_j\}$.

It is important to understand the distinction between the sequence $x_1, x_2, \ldots$ and the set $\{x_1, x_2, \ldots\}$. The sequence has the numbers in a specified order, and numbers may be repeated. In the set, elements are unordered, and repeated numbers are treated no differently from unrepeated numbers. Thus the set associated to the sequence $3, 2, 1, 2, 1, \ldots$ is just the three element set $\{1, 2, 3\}$. We will sometimes follow conventional usage and denote a sequence by $\{x_j\}$. It should be clear from context that we mean the sequence and not the set, even though the notation makes no distinction.
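The divide and conquer argument behind Theorem 3.1.1 can be run as a program whenever we are able to decide, for a given midpoint, whether it is an upper bound and, if it is not, to produce a point of $E$ above it. The Python sketch below does this for the particular set $E = \{1 - 1/k : k = 1, 2, \ldots\}$, whose sup is $1$ and does not belong to $E$. The set, the helper `witness_above`, and the function `bisect_sup` are assumptions chosen for illustration; for a general set no such decision procedure need exist, which is one reason the proof in the text is not constructive.

```python
from fractions import Fraction
from math import floor


def witness_above(mid):
    """For E = {1 - 1/k : k = 1, 2, ...}: return a point of E greater than
    mid, or None if mid is an upper bound for E."""
    if mid >= 1:
        return None                      # every 1 - 1/k is < 1 <= mid
    k = floor(1 / (1 - mid)) + 1         # guarantees 1/k < 1 - mid
    return 1 - Fraction(1, k)


def bisect_sup(x, y, steps):
    """Repeat the midpoint step: x stays in E, y stays an upper bound,
    and the gap y - x is at least halved each time."""
    for _ in range(steps):
        mid = (x + y) / 2
        w = witness_above(mid)
        if w is None:
            y = mid                      # first case: midpoint is an upper bound
        else:
            x = w                        # second case: a point of E above the midpoint
    return x, y


# Start with x_1 = 0 (the element for k = 1) and the upper bound y_1 = 2.
x_n, y_n = bisect_sup(Fraction(0), Fraction(2), 20)
```

After 20 steps the two values differ by at most $2/2^{20}$, and both are pinned to within that distance of $\sup E = 1$.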


Now we define the sup of a sequence to be the sup of the associated set, which we can write either as $\sup\{x_j\}$ or $\sup_j x_j$. Let us verify that indeed $\lim_{j\to\infty} x_j = \sup\{x_j\}$ if the sequence is monotone increasing. Let $y$ denote the sup. We know $x_k \leq y$ for every $k$ since $y$ is an upper bound. Since $y$ is the least upper bound, we know $y - 1/n$ is not an upper bound, for any choice of $1/n$. What does this mean? It means that $x_j \leq y - 1/n$ must fail for some $x_j$. This is the same as $x_j > y - 1/n$, and because the sequence is monotone increasing, $x_k > y - 1/n$ for every $k \geq j$, as indicated in Figure 3.1.4.

Figure 3.1.4: the points $y - 1/n$ and $y$.

The combined inequalities $x_k \leq y$ and $x_k > y - 1/n$ show $|x_k - y| \leq 1/n$ for every $k \geq j$, and this says exactly $\lim_{j\to\infty} x_j = y$. We have shown:

Theorem 3.1.2 A monotone increasing sequence that is bounded from above has a finite limit, and the limit equals the sup.

3.1.2 Limit Points

Now we turn to the case of a general sequence, which may not have a limit. We have defined the sup and inf of the sequence (the sup may be $+\infty$ and the inf $-\infty$). The interval between the inf and sup contains all the points in the sequence and is the smallest such interval. Nevertheless, it provides only a very crude indication of where the sequence really lies. For instance, the convergent sequence $0, 3, 1, 1, 1, \ldots$ has inf $= 0$ and sup $= 3$. What do $0$ and $3$ have to do with the limit? We need a more refined concept, one that is not influenced by only a finite number of terms.

Suppose we take two convergent sequences $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ that have different limits, $x$ and $y$, and shuffle them to form the sequence $x_1, y_1, x_2, y_2, \ldots$. The shuffled sequence will not have a limit; nevertheless the two values $x$ and $y$ are connected to the sequence in


some weaker but still important way. Suppose we look at a neighborhood of $x$, the interval from $x - 1/n$ to $x + 1/n$, where we choose $1/n$ so small that $y$ and all points in a neighborhood of $y$ do not lie in this neighborhood (we are assuming $x \neq y$, so this is possible by the axiom of Archimedes). The situation is illustrated in Figure 3.1.5. Will it be true that all the terms of the sequence beyond a certain place lie in the neighborhood? No, because the $y_j$'s eventually all lie in the neighborhood of $y$.

Figure 3.1.5: disjoint neighborhoods of $x$ and $y$.

However, all the $x_j$'s eventually will lie in the neighborhood of $x$. We can state this as follows: an infinite number of terms of the shuffled sequence lie in each neighborhood of $x$. Comparing this with the definition of limit, we see that the strong statement, "all terms beyond the $m$th" has been replaced by a weaker statement, "an infinite number of terms". We use this weakening to define the concept of limit-point (the expressions accumulation point and cluster point are frequently used synonymously).

Definition 3.1.1 If $\{x_j\}$ is a sequence of real numbers and $x$ a real number, we say $x$ is a limit-point of the sequence if for every error $1/n$, there are an infinite number of terms $x_j$ satisfying $|x - x_j| < 1/n$.

An equivalent way of formulating this is the following: given any $n$ and $m$, there exists $j \geq m$ such that $|x_j - x| < 1/n$. You should be able to show that these are equivalent. By convention we say $+\infty$ is a limit-point of the sequence if for every $n$ there are infinitely many terms satisfying $x_j > n$. You should be able to verify that this is equivalent to the condition that the sequence is unbounded from above.

In the example of the shuffled sequence $x_1, y_1, x_2, y_2, \ldots$, the two numbers $x$ and $y$ are both limit-points of the sequence, and there are no other limit-points (why?). In a way this is typical. In order to understand this we need to think about subsequences. The concept


of subsequence is extremely simple to comprehend but remarkably difficult to notate. If $x_1, x_2, x_3, \ldots$ is a sequence, a subsequence is any other sequence obtained by crossing out some (possibly infinitely many) terms, keeping the same order for the remaining terms. For an exact definition we first need to specify the class of functions $m(n)$ from the non-negative integers to the non-negative integers that are increasing, $m(n + 1) > m(n)$ for all $n$. Call such a function a subsequence selection function (it will pick out the position of the terms that remain after the crossing out). Then $\{y_j\}$ is a subsequence of $\{x_j\}$ if there exists a subsequence selection function such that $y_n = x_{m(n)}$. In place of the compound subscripts (it gets worse if you take a subsequence of a subsequence) we will follow the convention of using primes to denote subsequences, so $\{x'_j\}$ denotes a subsequence of $\{x_j\}$. Note that in the shuffled sequence $x_1, y_1, x_2, y_2, \ldots$ each of the original sequences is a subsequence. (What are the corresponding subsequence selection functions?) Notice the connection in this example between limit-points ($x$ and $y$) and limits of subsequences. This is true in general.

Theorem 3.1.3 Let $\{x_j\}$ be any sequence of real numbers. A real number (or even an extended real number) $x$ is a limit-point of $\{x_j\}$ if and only if there exists a subsequence $\{x'_j\}$ such that $\lim_{j\to\infty} x'_j = x$.

Proof: This theorem is almost completely trivial if you understand the definitions. We assume that $x$ is a real number. The case $x = \pm\infty$ is treated similarly. First suppose there exists a subsequence with $x$ as a limit. Given any error $1/n$, the subsequence approaches within $1/n$ of $x$ beyond a certain place, and since all the terms of the subsequence belong to the sequence, there are an infinite number of terms of the sequence within $1/n$ of $x$. Thus $x$ is a limit-point.

For the converse we have to do a little work; namely, we have to construct a subsequence with limit $x$, assuming $x$ is a limit-point of the sequence $\{x_j\}$. To make life easy we will choose the subsequence $\{x'_j\}$ so that $|x'_n - x| < 1/n$; this clearly implies $\lim_{j\to\infty} x'_j = x$. How do we choose $x'_n$? By the definition of limit-point there are infinitely many $x_j$ satisfying $|x_j - x| < 1/n$; we choose $x'_1, x'_2, \ldots$ in order such that, after choosing $x'_1, \ldots, x'_{n-1}$, we take for $x'_n$ some $x_j$ beyond $x'_1, \ldots, x'_{n-1}$ in the original sequence with $|x_j - x| < 1/n$. In this way $\{x'_j\}$ is a subsequence of $\{x_j\}$ with limit $x$. QED
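The notion of a subsequence selection function is perhaps easiest to absorb by writing one down. The Python sketch below models a sequence as a function of its index (starting at $1$, an indexing choice made here for convenience) and forms $y_n = x_{m(n)}$ for two selection functions applied to a shuffled sequence. The particular sequences $1/k$ and $1 + 1/k$, and the names `shuffled`, `subsequence`, `odd_places`, and `even_places`, are assumptions made for the example.

```python
from fractions import Fraction


def shuffled(j):
    """The shuffle x_1, y_1, x_2, y_2, ... of x_k = 1/k (limit 0)
    and y_k = 1 + 1/k (limit 1), indexed from j = 1."""
    k = (j + 1) // 2
    return Fraction(1, k) if j % 2 == 1 else 1 + Fraction(1, k)


def subsequence(term, select):
    """Return the subsequence n -> term(select(n)); `select` must be
    strictly increasing for this to be a genuine subsequence."""
    return lambda n: term(select(n))


def odd_places(n):
    """Selection function m(n) = 2n - 1."""
    return 2 * n - 1


def even_places(n):
    """Selection function m(n) = 2n."""
    return 2 * n


xs = subsequence(shuffled, odd_places)    # recovers 1, 1/2, 1/3, ...
ys = subsequence(shuffled, even_places)   # recovers 2, 3/2, 4/3, ...
```

The two subsequences converge to $0$ and $1$, which are exactly the two limit-points of the shuffled sequence, in line with Theorem 3.1.3.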


Thus if the sequence $\{x_j\}$ has a limit-point $x$, we can think of it as a kind of shuffling of a sequence converging to $x$ with another sequence but not necessarily an even shuffling. Of course the structure of the set of limit-points of a sequence can be quite complex, as the following example illustrates. Let $\{x_j\}$ be a sequence in which every rational number appears infinitely often. Such a sequence is easily constructed by applying the diagonalization argument to the rectangular array

$$
\begin{array}{cccc}
r_1 & r_2 & r_3 & \cdots \\
r_1 & r_2 & r_3 & \cdots \\
r_1 & r_2 & r_3 & \cdots \\
\vdots & \vdots & \vdots &
\end{array}
$$

where each row is the same enumeration of the rational numbers; thus the sequence $\{x_j\}$ is $r_1, r_1, r_2, r_1, r_2, r_3, r_1, r_2, r_3, r_4, \ldots$. It has the property that every real number is a limit-point! Indeed if $x$ is a real number there is a Cauchy sequence of rationals converging to it, and any sequence of rationals is a subsequence of $\{x_j\}$ (why?).

A convergent sequence has only one limit-point, namely its limit, since every subsequence of a convergent sequence converges to the same limit. The converse is also true: if a sequence has only one limit-point (counting $+\infty$ and $-\infty$ as possible limit-points), then it is convergent. This is not so obvious, but it will emerge from further considerations.

Does every sequence have a limit-point? We will show that this is the case (allowing $+\infty$ and $-\infty$). In fact there are two special limit-points, the largest, called limsup, and the smallest, called liminf. We could simply define limsup to be the sup of the set of limit-points, but this begs the question of the existence of limit-points. Instead we will write down a formula for limsup. Note that limsup is quite different from sup; for the sequence $2, 1, 1, 1, \ldots$, the sup is $2$, but the only limit-point is $1$, and this is the limsup. Nevertheless, the sup is a good starting point for finding limsup. Let's assume the sequence is bounded above, for otherwise we take $+\infty$ by convention for both sup and limsup. The trouble with the sup, as the example shows, is that it might be $x_1$, which has nothing to do with the limiting behavior of the sequence (just as in the case of the limit, any finite number of terms of a sequence can be changed without changing the limit-points). We could try to fix things by throwing the rascal out: consider the sup over all


Proof of fact 2: We have to show that y = limsup is an upper boundfor the set of limit-points-since we know by fact 1 that it is a limit­point, this will show it is the least upper bound. Thus what we needto show is that if x is any limit-point of the sequence, then x ~ y. Ifx is a limit-point let {xj} be a subsequence converging to X. Let us

I • h S· {/} . b Icompare Xk+1 W1t Yk = SUPj>kxi' mee xj lS a su sequence, Xk+1

is one of the xi with j > k, so Yk is the sup of a set containing x~+l'

Proof of fact 1: Suppose first that y is finite. Since y is the limit ofsUPi>k xi, given any 1/n we can find m such that ly-suPi>k x) I ~ 1/2nfor all k ~ m. Since sUPi>k xi is finite, we can find e¡ for 1 > k suchthat IXl - sUPi>k xi I ~ 1/2n, so Iy - xli ~ l/no In fact we can find aninfinite number of Xl (since 1 > k and we can take any k ~ m). Thisshows y is a limit-point of the sequence.

If y = +∞, then {sup_{j>k} x_j} is unbounded above, hence {x_j} is unbounded above; so +∞ is a limit-point. Finally, if y = -∞, then given any -n there exists m such that sup_{j>k} x_j ≤ -n for all k ≥ m. Thus there are infinitely many x_j with x_j ≤ -n, so -∞ is a limit-point of the sequence.

Incidentally, fact 2 does not imply fact 1 a priori, since there exist sets that do not contain their sups.

1. limsup is a limit-point of the sequence.

2. limsup is the sup of the set of limit-points of the sequence.

x_j in the sequence except x_1; write this sup_{j>1}{x_j}. This is not much of an improvement, for now x_2 might be the culprit. If we continue to throw the rascals out, considering in turn sup_{j>2}{x_j}, sup_{j>3}{x_j}, ..., we will never achieve our objective, but we may be approximating it. Since at each stage we are taking the sup over a smaller set (these sets are infinite, so smaller refers to containment, not cardinality), the sups are decreasing. That is, if we let y_k = sup_{j>k}{x_j}, then y_{k+1} ≤ y_k, so the sequence {y_k} is monotone decreasing and so has a limit, its inf (possibly -∞). If we write out the expression for this limit, y = lim_{k→∞} sup_{j>k}{x_j}, then we are sorely tempted, on purely linguistic terms, to define this to be limsup_{j→∞} x_j.

To justify the definition we need to verify two facts:


for all j > k, so |x - x_j| < 1/n for all j > k, proving x is the limit of {x_j}. Thus if limsup = liminf is finite, the common value is the limit. (Incidentally, it is not so easy to prove this using only the property of

Because the proof is non-trivial, it would be a good exercise for you to try to write out the analogous proof that liminf_{k→∞} x_k = lim_{k→∞} inf_{j>k} x_j is a limit-point and the inf of the set of limit-points. Incidentally, since the sequence {sup_{j>k} x_j} is monotone decreasing and the sequence {inf_{j>k} x_j} is monotone increasing, we can also write limsup_{k→∞} x_k = inf_k sup_{j>k} x_j and liminf_{k→∞} x_k = sup_k inf_{j>k} x_j.

The difference of the limsup and liminf, sometimes called the oscillation of the sequence, measures the spread of the set of limit-points. We are not asserting that every value in between is a limit-point; in the sequence 0, 1, 0, 1, ..., the limsup is 1 and the liminf is 0, and there are no other limit-points. But if the limsup and liminf are equal, we would expect the sequence to converge to their common value. To see why this is true, call the common value x. Suppose first that x is finite. We have two sequences {y_k} and {z_k} converging to x, where y_k = sup_{j>k} x_j and z_k = inf_{j>k} x_j, by the definition of limsup and liminf. Note that z_k ≤ x_j ≤ y_k if j > k. If we choose k large enough so that |x - y_k| < 1/n and |x - z_k| < 1/n, which we can do because x is the limit of {y_k} and {z_k}, then

    x - 1/n < z_k ≤ x_j ≤ y_k < x + 1/n

Theorem 3.1.4 The limsup of a sequence is a limit-point of the sequence and is the sup of the set of limit-points of the sequence.

We have proved the following theorem.

Definition 3.1.2 The limsup of a sequence is the extended real number limsup_{k→∞} x_k = lim_{k→∞} sup_{j>k} x_j. Similarly, the liminf is defined by liminf_{k→∞} x_k = lim_{k→∞} inf_{j>k} x_j.
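To see the definition at work, here is a minimal Python sketch (an illustration added here, not part of the original text) that computes the tail sups y_k = sup_{j>k} x_j on a finite stretch of a sequence. The function name tail_sups and the sample sequence x_j = (-1)^j (1 + 1/j) are assumptions chosen for the example. A finite list can only approximate a sup over an infinite tail; for this particular sequence the sup over j > k is attained at the first even j > k, so the truncated values agree with the true y_k as long as the list ends at an even index.

```python
from fractions import Fraction


def tail_sups(xs):
    """Given the terms x_1, ..., x_N as a list, return [y_1, ..., y_{N-1}],
    where y_k is the largest of x_{k+1}, ..., x_N (a finite stand-in for
    sup_{j>k} x_j)."""
    sups = []
    running = float("-inf")
    # Walk backwards from x_N to x_2, keeping a running maximum.
    for value in reversed(xs[1:]):
        running = max(running, value)
        sups.append(running)
    sups.reverse()
    return sups


# Sample sequence (an assumption for illustration): x_j = (-1)^j (1 + 1/j).
xs = [(-1) ** j * (1 + Fraction(1, j)) for j in range(1, 1001)]
ys = tail_sups(xs)

# y_1, y_2, y_100 and y_999: the tail sups decrease toward the limsup, 1.
print(ys[0], ys[1], ys[99], ys[-1])
```

The sup of the whole sequence is 3/2, attained at x_2, while the tail sups keep shrinking toward 1. For the sequence 2, 1, 1, 1, ... of the text, tail_sups([2, 1, 1, 1, 1]) discards the 2 at once and returns only 1's.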

hence x'_{k+1} ≤ y_k. Since this holds for each k, we have x ≤ y since limits preserve non-strict inequality (you should understand why the k + 1 in place of k makes no difference).

We can reformulate the above discussion by adopting the following definition.


4. Prove sup(A ∪ B) ≥ sup A and sup(A ∩ B) ≤ sup A.

3. If E is a set and y a point that is the limit of two sequences, {x_n} and {y_n}, such that x_n is in E and y_n is an upper bound for E, prove that y = sup E. Is the converse true?

2. If a bounded sequence is the sum of a monotone increasing and a monotone decreasing sequence (x_n = y_n + z_n where {y_n} is monotone increasing and {z_n} is monotone decreasing) does it follow that the sequence converges? What if {y_n} and {z_n} are bounded?

1. Compute the sup, inf, limsup, liminf, and all the limit-points of the following sequences x_1, x_2, ... where

a. x_n = 1/n + (-1)^n,
b. x_n = 1 + (-1)^n/n,
c. x_n = (-1)^n + 1/n + 2 sin(nπ/2).

3.1.3 Exercises

Finally, we need to consider the case when limsup = liminf = +∞, say. The condition liminf = +∞ means z_k = inf_{j>k} x_j can be made greater than any n by taking k large enough; but then x_j ≥ z_k > n for all j > k, and this means lim_{j→∞} x_j = +∞.

Theorem 3.1.5 A bounded sequence is convergent if and only if the limsup equals the liminf or, equivalently, if and only if it has only one limit-point.

limsup being the sup of the limit-points, etc.) From this we deduce the immediate corollary: if a sequence {x_j} has only one limit-point and the limit-point is finite, then the sequence is convergent. Notice that we have to know there is only one limit-point among the extended real numbers, for a sequence like 0, 1, 0, 2, 0, 3, 0, 4, ... has the two limit-points 0 and +∞ but is not convergent. Another way to state this is to require that the sequence be bounded (bounded means bounded above and below; this is sometimes expressed concisely as |x_j| ≤ M for all j). A bounded sequence has finite limsup and liminf, since limsup ≤ sup and inf ≤ liminf.


12. Say two sequences are equivalent if they differ in only a finite number of terms (there exists m such that x_j = y_j for all j ≥ m). Prove that this is an equivalence relation. Show that equivalent sequences have the same set of limit-points.

11. Consider a sequence obtained by diagonalizing a rectangular array.

    a_11   a_12   a_13   ...
    a_21   a_22   a_23   ...
    a_31   a_32   a_33   ...
    ...

Prove that any limit-point of any row or column of the array is a limit-point of the sequence. Do you necessarily get all limit-points this way?

10. Prove that the set of limit-points of a shuffled sequence x_1, y_1, x_2, y_2, ... is exactly the union of the set of limit-points of {x_j} and the set of limit-points of {y_j}. Is the same true if the shuffling is not regular?

9. Can there exist a sequence whose set of limit-points is exactly 1, 1/2, 1/3, ... ? (Hint: what is the liminf of the sequence?)

8. Write out the proof that +∞ is a limit-point of {x_n} if and only if there exists a subsequence whose limit is +∞.

7. Construct a sequence whose set of limit-points is exactly the set of integers.

6. Is every subsequence of a subsequence of a sequence also a subsequence of the sequence?

5. Prove limsup{x_n + y_n} ≤ limsup{x_n} + limsup{y_n} if both limsups are finite, and give an example where equality does not hold.


We have already had many occasions to use inequalities and sets defined by inequalities. Perhaps you have noticed that on some occasions the distinction between strict and non-strict inequalities is not essential (for example, in the definition of limit, we could require either |x_k - x| < 1/n or |x_k - x| ≤ 1/n and it would make no difference); however, on other occasions the distinction is essential: for example, non-strict inequalities are preserved in the limit, but strict inequalities may not be. We are now going to delve further into the matter, from the point of view of the sets defined by the inequalities. The set determined by the strict inequalities a < x < b (for a < b) we will call an open interval, written (a, b); while the set determined by the non-strict inequalities a ≤ x ≤ b we will call a closed interval, written [a, b]. For the open interval we will also allow a = -∞ or b = +∞ or both. It may seem a trifling matter whether or not the endpoints are included in the interval, but it makes a significant difference for certain questions. In the open interval, every point is surrounded by a sea of other points. This is the qualitative feature we will want when we define the notion of open sets; it is certainly not true of the endpoints of the closed interval. On the other hand, an open interval (a, b) seems to be "missing" its endpoints. Although they are not points in the interval, they can be approached arbitrarily closely from within the interval. It is as if they had been unfairly omitted. The closed interval has all the points it should from this point of view, and this is the "closed" aspect that we will generalize when we define a closed set.

Let us begin with open sets. The idea of "open" suggests that one should always be able to go a little further, that one should never reach the end. Thus we will define an open set A of real numbers to be a set with the property that every point x of A lies in an open interval (a, b) that is contained in A. The open interval may vary with x, and it may be very small. If A itself is an open interval, then it is trivially an open set because A contains the open interval A that contains each point x in A (this is an example where the interval does not have to vary with x). A closed interval [a, b] is not an open set because the point a in A does not lie in any open interval contained in A. The union of

3.2.1 Open Sets

3.2 Open Sets and Closed Sets


Thus a union of open intervals can be simplified to a union of disjoint open intervals (disjoint means no two intersect). (Strictly speaking this requires a more elaborate proof, since there may be multiple overlaps. We leave the details to exercise set 3.2.3, number 15.) How many open intervals? There could be any finite number or an infinite number. As an example of an infinite number, consider the set (1/2, 1) ∪ (1/4, 1/2) ∪ (1/8, 1/4) ∪ ... shown in Figure 3.2.2.
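For a finite collection of intervals the simplification can be carried out mechanically. The following Python sketch is an illustration added here, under the assumptions that each interval is given as a pair (a, b) with a < b and finite endpoints; the name merge_open_intervals is hypothetical. It sorts the intervals by left endpoint and sweeps from left to right, which also takes care of multiple overlaps. Note the strict inequality in the test for overlap: the open intervals (0, 1) and (1, 2) do not intersect, and they are not merged, because the point 1 is not in their union.

```python
def merge_open_intervals(intervals):
    """Rewrite a finite union of open intervals (a, b), a < b, as a union
    of disjoint open intervals, returned in increasing order."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a < merged[-1][1]:
            # (a, b) overlaps the last interval kept: combine the two.
            left, right = merged[-1]
            merged[-1] = (left, max(right, b))
        else:
            # No overlap (touching endpoints do not count): start a new piece.
            merged.append((a, b))
    return merged


print(merge_open_intervals([(2, 5), (0, 3), (4, 4.5), (5, 6)]))
```

In this example the first three intervals overlap in a chain and collapse to (0, 5), while (5, 6) stays separate because 5 belongs to neither piece.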

Figure 3.2.1: two ways in which open intervals (a, b) and (c, d) can overlap; in one case (a, b) ∪ (c, d) = (a, d), in the other (a, b) ∪ (c, d) = (a, b).

two open intervals, say A = (0, 1) ∪ (1, 2), is open, since any point in A lies in either the open interval (0, 1) or the open interval (1, 2). The fact that the point 1 does not lie in an interval contained in A is not relevant, because 1 does not belong to A. The empty set is an open set because it satisfies the definition trivially (since it contains no points, there is nothing to verify). In what follows we are mainly interested in non-empty open sets, but we phrase the results so that they remain true for the empty set as well.

The union of any number (finite or infinite) of open intervals is an open set, simply because any point in the union must belong to one of the open intervals. Conversely, every open set is a union of open intervals. In fact, if A is an open set, then each point x in A, according to the definition, lies in some open interval I_x contained in A. Then A is the union of all the intervals I_x, A = ∪_{x∈A} I_x; for x is in I_x for each x in A, so A ⊆ ∪_{x∈A} I_x; on the other hand each I_x is contained in A, so ∪_{x∈A} I_x ⊆ A.

Actually we can say a little more precisely what every open set is like. For if two open intervals intersect at all, they must overlap, and so their union can be combined into a single open interval, as shown in Figure 3.2.1.


This set can also be described as the interval (0, 1) with the points 1/2, 1/4, 1/8, ... deleted. From this second description it is not obvious that we are dealing with a disjoint union of intervals. Of course the general disjoint union of intervals can be much more complicated than this, with the sizes of the intervals distributed in incredibly complex ways. We can, however, assert that the cardinality of the collection of disjoint intervals is at most countable. That is, we cannot have an uncountable union of disjoint open intervals. To understand why this is so we have to reason about the length of the intervals. Let us call 𝒜 the collection of disjoint intervals. Consider 𝒜_1, the subset of 𝒜 of those intervals of length greater than 1. The set 𝒜_1 is at most countable because every interval of length greater than 1 must contain at least one integer, and the disjointness means that no two intervals in 𝒜_1 can contain the same integer. Next consider 𝒜_2, the subset of 𝒜 of those intervals of length greater than 1/2. Every interval in 𝒜_2 contains at least one half-integer (a number m/2 where m is an integer), so again 𝒜_2 is at most countable. Continue in this way to define 𝒜_n to be the subset of 𝒜 of those intervals of length greater than 1/n. By the same reasoning all the sets 𝒜_n are at most countable. But 𝒜 = ∪_n 𝒜_n, for every open interval has a length greater than 1/n for some n. Since 𝒜 is a countable union of at most countable sets, it is countable.
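The heart of the counting argument is that an interval of length greater than 1/n must contain a multiple of 1/n. The Python sketch below is an illustration added here, not the text's proof verbatim: for an interval (a, b) with finite rational endpoints it picks the smallest n with 1/n < b - a and returns the first multiple k/n lying to the right of a. The name rational_label is hypothetical. Since k/n ≤ a + 1/n < b, the label lies inside the interval, so disjoint intervals necessarily receive different labels.

```python
from fractions import Fraction
from math import floor


def rational_label(a, b):
    """Return a rational k/n inside the open interval (a, b), a < b,
    where n is the smallest positive integer with 1/n < b - a."""
    a, b = Fraction(a), Fraction(b)
    n = floor(1 / (b - a)) + 1   # guarantees 1/n < b - a
    k = floor(a * n) + 1         # smallest integer k with k/n > a
    return Fraction(k, n)


# The intervals (1/2, 1), (1/4, 1/2), (1/8, 1/4), ... from the text.
intervals = [(Fraction(1, 2 ** (i + 1)), Fraction(1, 2 ** i)) for i in range(10)]
labels = [rational_label(a, b) for a, b in intervals]
print(labels[:3])
```

Every interval gets its own rational number, which is one way to see that a collection of disjoint open intervals can be no larger than the set of rationals.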

Thus we have a structure theorem for open sets: every open set of real numbers is a disjoint union of a finite or countable number of open intervals. The intervals that comprise the union are uniquely determined; we will leave the proof to exercise set 3.2.3, number 16. This structure theorem has no analogue in higher dimensions and so is rarely emphasized. (For example, if you want to prove something about open sets of real numbers, it would be better not to use the structure theorem if you can avoid it, for then your proof would stand a better chance of generalizing.)

Figure 3.2.2: the interval (0, 1) with the points 1/2, 1/4, 1/8, ... deleted.


Proof of Property 2: Let A_1, ..., A_n be a finite number of open sets, and let A = A_1 ∩ ... ∩ A_n be their intersection. Any point x in A lies in all the A_k. (If A is empty there is nothing more to do, since the empty set is open.) Since each A_k is open, x lies in an open interval (a_k, b_k) contained in A_k. So we have a picture like Figure 3.2.4 (n = 3).

We want an open interval containing x that lies entirely in A. We can't take any one of the intervals (a_k, b_k), because all we know about

Proof of Property 1: Let 𝒜 denote any collection of open sets and ∪_{A∈𝒜} A their union. A point x in the union must belong to one particular open set A. Since A is open, x must lie in an open interval I contained in A and, hence, in the union. Thus the union is open.

Figure 3.2.3: the closed interval [0, 1] inside an open interval (-1/n, 1 + 1/n).

Notice also that we have to restrict the intersections in property 2 to finitely many, while there is no such restriction for unions (the number of open sets in the union is even allowed to be uncountable). Here is an example that shows why we need this restriction. Take a closed interval, say [0, 1]. We have observed that this is not an open set. Yet we can easily get it as a countable intersection of, say, the open intervals (-1/n, 1 + 1/n), as shown in Figure 3.2.3.

Theorem 3.2.1

1. The union of any number of open sets is an open set.

2. The intersection of a finite number of open sets is an open set.

Next we study the closure properties of the class of open sets; in other words, what operations can you perform on open sets and still come out with an open set?


This rather innocent theorem on the closure properties of open sets turns out to have an unexpected significance. In the general theory of topology, properties 1 and 2 are chosen as the axioms for the abstract notion of "open set" (together with a trivial axiom that the empty set and the whole universe are open sets). We will not discuss the general theory in this work, but from time to time we will point out some of the ways in which abstract theories are foreshadowed in concrete instances.

What is the significance of the open sets? To explain this let's consider a closely related concept. A neighborhood of a point x is defined to be any open set containing x (sometimes the word "neighborhood" is taken to mean any set containing an open set containing x, and the narrower meaning we have ascribed to the term is denoted by "open neighborhood"). A particularly simple neighborhood of x is an open interval containing x, or even an interval of the form (x - 1/n, x + 1/n). By the definition of open set, every neighborhood of x contains an open interval containing x and, by the axiom of Archimedes, even one of the special form (x - 1/n, x + 1/n). Thus, although there are an uncountable number of possible neighborhoods of x, if we are willing to shrink neighborhoods, we need only consider a countable number, the

it is that it lies in A_k. The only hope is to take the intersection of all the intervals, since that will lie in A. But is the intersection an open interval containing x? A glance at Figure 3.2.4 will convince you that the intersection is the open interval (a, b) where a is the largest of the a_k's and b is the smallest of the b_k's, and of course it contains x since it is the intersection of intervals containing x. (Note that this is where the argument breaks down for infinite intersections: if there were an infinite number of intervals the intersection might not be an open interval.) QED
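The recipe in the proof, take the largest left endpoint and the smallest right endpoint, is easy to try out. The Python sketch below is an illustration added here; the name intersect_open_intervals is hypothetical, and intervals are assumed to be given as pairs (a_k, b_k) of finite numbers. It also intersects the first N of the intervals (-1/n, 1 + 1/n): every finite intersection is still an open interval, but its endpoints creep toward 0 and 1, which is why the infinite intersection [0, 1] is no longer open.

```python
from fractions import Fraction


def intersect_open_intervals(intervals):
    """Intersection of finitely many open intervals (a_k, b_k):
    the open interval (max a_k, min b_k), or None if it is empty."""
    a = max(left for left, _ in intervals)
    b = min(right for _, right in intervals)
    return (a, b) if a < b else None


# Three intervals containing the point x = 2.5, as in the proof with n = 3.
print(intersect_open_intervals([(0, 3), (1, 4), (2, 5)]))

# Finite intersections of the intervals (-1/n, 1 + 1/n), n = 1, ..., N.
for N in (1, 10, 100):
    family = [(Fraction(-1, n), 1 + Fraction(1, n)) for n in range(1, N + 1)]
    print(N, intersect_open_intervals(family))
```

For each N the result is the open interval (-1/N, 1 + 1/N); no single N produces [0, 1], which only appears when all infinitely many intervals are intersected.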

Figure 3.2.4: three open intervals (a_k, b_k), each containing the point x.


We now turn to the closed intervals and their generalization, the closed sets. We have said that the term "closed" is used to indicate that the set contains all the points that it "ought to" contain. Here we are thinking that a set "ought to" contain those numbers that can be approximated arbitrarily closely by numbers already in the set. In this sense the endpoints of an interval ought to be in the interval, and their absence

3.2.2 Closed Sets

intervals (x - 1/n, x + 1/n). In most instances when we use the term "neighborhood" it comes quantified, "for all neighborhoods of x blah-blah-blah", or "there exists a neighborhood of x such that phooey", and it usually does not change the meaning of the sentence to say "for every interval (x - 1/n, x + 1/n) blah-blah-blah", or "there exists an interval (x - 1/n, x + 1/n) such that phooey". This simple observation will play an important role in Chapter 14 as part of a general strategy to replace uncountable collections of sets by countable collections of sets.

An important and typical example of the use of the neighborhood concept is in the definition of limit. We have defined lim_{j→∞} x_j = x to mean that given any error 1/n, we can make |x_j - x| less than that error by taking j sufficiently large. But the condition |x_j - x| < 1/n is exactly the statement that x_j lies in the neighborhood (x - 1/n, x + 1/n) of x. Bearing in mind what we said about replacing general neighborhoods by special ones, we expect the following reformulation of the definition of limit to be equivalent to the old one: lim_{j→∞} x_j = x if for every neighborhood of x, there exists m such that all x_j are in that neighborhood for j ≥ m. Indeed this says exactly the same as before for the special neighborhoods (x - 1/n, x + 1/n), and since every neighborhood contains a special one, the statement is true for all neighborhoods if and only if it is true for the special ones.

What is the advantage of reformulating the definition of limit as above? It shows that the concept of open set alone suffices to define limit, without any reference to distance. We will find that this is true for many other concepts as well. As an exercise, try giving the definition of limit-point using neighborhoods. On the other hand, a concept like sup or inf, which involves the order properties of the real number system, cannot be defined by neighborhoods alone.


If we reformulate the definition to include this, we can drop the clause

Figure 3.2.5: a smaller neighborhood of x that excludes finitely many nearby points.

Note that this definition actually implies that every neighborhood of x contains an infinite number of points of A. For if on the contrary it only contained a finite number of points of A, say a_1, ..., a_n, we could find a smaller neighborhood of x that does not contain a_1, ..., a_n, as indicated in Figure 3.2.5.

in the open interval leads to the expectation that the open interval isn't closed.

To make these ideas precise we need to introduce the concept of limit-point for sets. This will be analogous to the concept of limit-point for sequences but with a subtle twist. Let A denote a set of real numbers, and let x denote a real number (perhaps in A, perhaps not). We want to say that x is a limit-point of A if there are points in A that approximate x arbitrarily closely. We can say this in terms of distance: given any error 1/n, there exists a point y_n in A (depending on n) such that |x - y_n| < 1/n; or we can say it in terms of neighborhood: every neighborhood of x contains a point in A. It is easy to see that these statements are equivalent; however, they are not quite right. The reason is that they allow x to approximate x. If A contains x, then the above requirement is trivially satisfied by taking the point x in A, so we would end up having all the points of A automatically limit-points. While this makes perfect logical sense, it fails to capture the meaning we want, which is that of an infinite cluster of points in A around x (the synonym "cluster point" reinforces this point). Thus we must add to the definition a clause that eliminates x from consideration as an approximating point.

Definition 3.2.1 x is a limit-point of A if given any error 1/n, there exists a point y_n of A not equal to x satisfying |y_n - x| < 1/n or, equivalently, if every neighborhood of x contains a point of A not equal to x.


excluding x: x is a limit-point of A if every neighborhood of x contains infinitely many points of A.

Note that if x is contained in an interval (closed or open) contained in A, then x is a limit-point of A. In particular, every point of an open set is a limit-point. But there are other ways of being a limit-point. The set {1, 1/2, 1/4, ...} has 0 as limit-point; in fact 0 is the only limit-point of this set. The set of integers has no limit-points, for no number has an infinite number of integers nearby.

What is the relationship between the concepts of limit-point for sequences and sets? If we denote the sequence by x_1, x_2, ... and the set by {x_j}, are the limit-points of x_1, x_2, ... the same as the limit-points of {x_j}? Unfortunately, the answer is no. The simplest example is the sequence 5, 5, 5, ..., which has the limit-point 5. The corresponding set contains just the one point 5, so it has no limit-points. Clearly the culprit here is repetition. If the sequence has no repetition, or only a finite number of repetitions, then the two concepts of limit-point coincide. In the general case we can only say that a limit-point of the set is a limit-point of the sequence. Incidentally, this confusion would not be cleared up if we adopted the convention of allowing x to approximate itself. See exercise set 3.2.3, number 3.

Now we can define a closed set to be any set that contains all its limit-points. Note that we do not require that all points of the set be limit-points (such sets are called perfect sets). A set with no limit-points, such as the empty set, or a finite set, is automatically closed. A closed interval [a, b] (with a ≤ b) is a closed set; in fact, if a < b, it is a perfect set. A non-empty open interval (a, b) with a or b finite is not closed because the finite endpoint(s) are limit-points. However the whole line (-∞, ∞) is closed (note that we have not defined the possibility of ±∞ being limit-points of a set; although we could do so in the same way we did for sequences, it would confuse matters too much).

The whole story of closed sets is revealed in the following basic theorem:

Theorem 3.2.2 A set is closed if and only if its complement is open.

To simplify the proof we first prove a lemma. If B denotes any set in ℝ, write B' for its complement, B' = {x in ℝ : x is not in B}.

Lemma 3.2.1 A point x in B' is not a limit-point of B if and only if x is contained in an open interval contained in B'.


Proof:

a. Let B_1, ..., B_n be closed sets, and let B = B_1 ∪ ... ∪ B_n. To show that B is closed, we have to show it contains all its limit-points. So let x be a limit-point of B. Does this mean x is a limit-point of B_1? Perhaps not, as Figure 3.2.6 suggests.

Of course x is a limit-point of B_2 in this case. This suggests that perhaps we must have x as a limit-point of one of the sets B_1, ..., B_n. If this is true it will certainly do the trick, for each of these sets is

Theorem 3.2.3

1. The union of a finite number of closed sets is a closed set.

2. The intersection of any number of closed sets is a closed set.

Note that this theorem does not say that if B is not closed, then B is open. Most sets are neither open nor closed. Can you give an example?

Using this theorem we can deduce many properties of closed sets from properties of open sets. For example, since union and intersection are interchanged by complementation, it follows that the closed sets are closed under finite unions and arbitrary intersections. Of course we can also prove this directly, and it is worth doing so.

Proof of Theorem 3.2.2: The definition that B be closed says "for all x, x is a limit-point of B implies x is in B". Replacing the implication by its contrapositive yields the equivalent statement "for all x, x is in B' implies x is not a limit-point of B". Since x is assumed to be in B', we can use the lemma to replace "x is not a limit-point of B" by the equivalent "x is contained in an open interval contained in B'". We now have the definition that B' be open: for all x in B', x is contained in an open interval contained in B'. QED

Proof: x is not a limit-point of B means that there exists a neighborhood (x - 1/n, x + 1/n) containing no points of B other than x. But x already is in B', so (x - 1/n, x + 1/n) is the open interval (containing x) that is contained in B'. Conversely, if x is in (a, b) contained in B', then (x - 1/n, x + 1/n) is contained in (a, b), hence B', for sufficiently small 1/n. This implies x is not a limit-point of B. QED


Let's look at an example of a closed set that is perhaps a bit more complicated than you might expect. It is called the Cantor set and is the prototype of a large family of sets, which are called somewhat loosely Cantor sets. These sets are obtained from the closed interval [0, 1] by removing a countable collection of open intervals, so they are

closed, hence contains all its limit-points, so x would belong to one of these sets, hence to the union.

Let's try to prove it. What do we know? Since x is a limit-point of B, every neighborhood of x contains a point, not x, of B. Now B is the union of the B_j's, so to each neighborhood, say (x - 1/k, x + 1/k) of x, there is a point y_k in the neighborhood in one of the B_j's. Now, we want to focus on each of the sets B_1, B_2, ..., B_n in turn. The y_k's are distributed among them in some unspecified way. But since there are only a finite number of sets and an infinite number of points y_1, y_2, ... to distribute among them, one of the sets must contain an infinite number of points. Call the set B_29 and the points y'_1, y'_2, y'_3, .... Since the points y'_1, y'_2, y'_3, ... approximate x arbitrarily closely, x is a limit-point of B_29. (The fact that not all the points in the original sequence y_1, y_2, ... belong to B_29 is not important; the ones that do, being infinite in number, get into every neighborhood of x and so x is a limit-point.) Thus we have x in B_29 because B_29 is closed and, hence, x in B.

b. Let ℬ be any collection of closed sets, and let ∩_{B∈ℬ} B be the intersection. To show the intersection is closed we have to show it contains all its limit-points. So let x be a limit-point. It follows easily from the definition of limit-point that if you increase a set you do not lose any limit-points (you may gain some). Since each B in ℬ contains the intersection, it follows that x is a limit-point of each B in ℬ. Since each B is closed, x belongs to each B and, hence, to the intersection. QED

Figure 3.2.6:


Iterating this process infinitely often produces the Cantor set. There is another way to describe this set. Write the numbers between 0 and 1 in base-3 notation. Then the numbers in the middle third (1/3, 2/3) are those that begin .1..., those in the second set of middle thirds (1/9, 2/9) and (7/9, 8/9) are those that begin .01... and .21.... Thus all the deleted numbers are those with a 1 in their base-3 expression, so the Cantor set consists of all numbers expressible with just 0's and 2's (note that numbers like 1/3 and 2/3 that have ambiguous expressions, .1000... or .0222... for 1/3, .2000... or .1222... for 2/3, are included in the Cantor set, as they should be, because they have one expression not involving 1's).
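The base-3 description gives a practical membership test for rational numbers. The following Python sketch is an illustration added here (the function name in_cantor_set is hypothetical): it generates base-3 digits one at a time using exact fractions. If a digit 1 appears and nothing but zeros follows, the expansion ...1000... can be rewritten as ...0222..., so the number is in the set; if a digit 1 is followed by anything else, the 1 cannot be avoided and the number was deleted at some stage. Since the base-3 expansion of a rational number eventually repeats, the loop stops as soon as a remainder recurs.

```python
from fractions import Fraction


def in_cantor_set(x):
    """Decide whether the rational number x lies in the Cantor set by
    examining its base-3 digits."""
    x = Fraction(x)
    if x < 0 or x > 1:
        return False
    if x == 1:
        return True            # 1 = .2222... in base 3
    seen = set()
    while x not in seen:       # a repeated remainder means the digits cycle
        seen.add(x)
        x *= 3
        digit = x.numerator // x.denominator   # next base-3 digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            # A final 1 followed only by 0's can be rewritten as 0222...;
            # a 1 followed by anything else cannot be avoided.
            return x == 0
    return True                # digits cycle through 0's and 2's only


for q in (Fraction(1, 3), Fraction(1, 4), Fraction(1, 2), Fraction(7, 9)):
    print(q, in_cantor_set(q))
```

The endpoints 1/3 and 7/9 are accepted, as is 1/4 = .020202..., which is in the Cantor set although it is not an endpoint of any deleted interval, while 1/2 = .1111... is rejected.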

The Cantor set is a perfect set but it contains no intervals. It is uncountable (as all perfect sets must be, although this is more difficult to prove). We leave these facts as exercises.

We conclude this section with some concepts related to open and closed sets. A point x is said to be in the interior of a set A if A contains a neighborhood of x. Every point in an open set is in its interior. The concept of interior points is clearly a localized version of open set. Note that the interior of any set is automatically an open set.

Figure 3.2.7: the first stages of the construction: [0, 1]; then [0, 1/3] ∪ [2/3, 1]; then [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1].

the countable intersection of finite unions of closed intervals, hence closed sets. In this example we will successively remove the middle-third of every closed interval. In the first stage we remove the interval (1/3, 2/3) and so are left with the two intervals [0, 1/3] ∪ [2/3, 1]. In the second stage we remove the middle-third of each of these intervals, (1/9, 2/9) and (7/9, 8/9) respectively, leaving the four intervals [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1], as shown in Figure 3.2.7.


In fact it is the largest open set contained in the set (see exercises).

If A is any set, the closure of A is the set consisting of all the points of A together with all the limit-points of A. Thus a set is closed if and only if it is equal to its closure. The closure of A is always a closed set. This is not obvious from the definition but requires a proof. The issue here is that by adding the limit-points of A to A, we might conceivably produce new limit-points that were not there before. But we can rule out this possibility as follows: Suppose x is a limit-point of the closure of A. This means that every neighborhood of x, say (x - 1/n, x + 1/n), contains points y_n not equal to x in the closure of A. If y_n belongs to the closure of A either y_n belongs to A or y_n is a limit-point of A. Now we want to show that x is a limit-point of A; so we need to show that (x - 1/n, x + 1/n) contains a point of A. Now if y_n is in A we are done. If not, y_n is a limit-point of A, so the neighborhood (y_n - 1/n, y_n + 1/n) contains points of A, say z_n (choose z_n ≠ x, which is possible because we know there are infinitely many, hence at least two, points of A there). From |z_n - y_n| < 1/n and |x - y_n| < 1/n we obtain |z_n - x| < 2/n, so the neighborhood (x - 2/n, x + 2/n) contains a point of A not equal to x. Clearly the factor of 2 is irrelevant, so x is a limit-point of A.

The closure of A is thus closed. Also it is clearly the smallest closed set containing A. One could very easily obtain the smallest closed set containing A by taking the intersection of all closed sets containing A, but this abstract construction is less informative.

If B is a subset of A such that A is contained in the closure of B (so B ⊆ A ⊆ closure(B)), we say that B is dense in A. Put another way, B is dense in A (or we say B is a dense subset of A) if B is a subset of A, and every point in A is either a point of B or a limit-point of B. For example, the rational numbers are dense in the real numbers, since every real number is a limit of a sequence of rational numbers; that's how we constructed the real number system. The open interval (a, b) is dense in the closed interval [a, b]. Dense subsets are very convenient and are used in the following manner. If you want to prove that every point in A has a certain property that is preserved under limits, then it suffices to prove that every point in a dense subset B of A has that property. The dense subset might be simpler and smaller than A.


3.2.3 Exercises

1. Let A be an open set. Show that if a finite number of points are removed from A, the remaining set is still open. Is the same true if a countable number of points are removed?

2. Let $x_1, x_2, \ldots$ be a sequence, and let A be the set whose elements are $x_1, x_2, \ldots$. Show that a limit-point of A is a limit-point of the sequence. Show that if no point in A occurs more than a finite number of times in the sequence, then a limit-point of the sequence is a limit-point of the set.

3. Suppose that the definition of limit-point of a set is changed to the one first suggested (every neighborhood of x contains a point of the set, without requiring the point to be different from x). Give an example to show that it would still not be true that the limit-points of a sequence and the limit-points of the underlying set must be the same. Can you show that one contains the other?

4. Let A be a set and x a number. Show that x is a limit-point of A if and only if there exists a sequence $x_1, x_2, \ldots$ of distinct points in A that converges to x.

5. Let A be a closed set, x a point in A, and B be the set A with x removed. Under what conditions is B closed?

6. Prove that every infinite set has a countable dense subset. Give an example of a set A such that the intersection of A with the rational numbers is not dense in A.

7. Give an example of a set A that is not closed but such that every point of A is a limit-point.

8. Show that the set of limit-points of a sequence is a closed set.

9. Given a closed set A, construct a sequence whose set of limit-points is A. (Hint: use exercise 6.)

10. Show that the set of numbers of the form $k/5^n$, where k is an integer and n a positive integer, is dense in the line.


11. Show that the set of numbers in the interval [0, 1] having decimal expansions using only odd digits is closed. Describe this set by a Cantor-set type construction.

12. a) Show that the Cantor set is a perfect set that contains no open intervals. Show that it is uncountable. b) Are the same statements true of the set in exercise 11?

13. Define the derived set of a set A as the set of limit-points of A. Prove that the derived set is always closed. Give an example of a closed set A that is not equal to its derived set. Give an example of a set A such that the derived set of A is not equal to the derived set of the derived set of A. (Note: Cantor was originally led to study set theory in order to understand better the notion of derived set and to answer questions similar to the above.)

14. What sets are both open and closed?

15. Show that a union of open intervals can be written as a disjoint union of open intervals.

16. Show that an open set cannot be written in two different ways as a disjoint union of open intervals (except for a change in the order of the intervals).

3.3 Compact Sets

Infinite sets are more difficult to deal with than finite sets because of the large number of points they contain. Nevertheless, there is a class of infinite sets, called compact sets, that behave in certain limited ways very much like finite sets. The compact sets of real numbers turn out to be exactly the sets that are both closed and bounded, but this is a theorem and not the definition of compactness. The concept of compactness is not confined to sets of real numbers; we shall deal with it again later in other guises.

In what way can an infinite set behave like a finite set? Consider the infinite pigeon-hole principle: if an infinite number of letters arrive addressed to a finite number of people, then at least one person must receive an infinite number of letters. In more conventional terms, if


$x_1, x_2, \ldots$ is an infinite sequence of real numbers, and each $x_j$ belongs to a finite set A, then at least one element of A must be equal to $x_j$ for an infinite number of j's. Now if A were an infinite set, this statement is obviously false. However, we could hope for a slightly weaker conclusion: that A contains a limit-point of the sequence. After all, a limit-point is one that is approximated arbitrarily closely infinitely often, and for most purposes such approximation is just as good as equality.

Let us take this property as the definition of compactness (there are several other equivalent conditions, and any of them could serve as well for the definition).

Definition 3.3.1 A set A of real numbers is said to be compact if it has the property that every sequence $x_1, x_2, \ldots$ of real numbers that lies entirely in A has a (finite) limit-point in A (or, equivalently, has a subsequence that converges to a point in A).

What kind of sets can be compact? Certainly only closed sets. For if A is not closed, it must have a limit-point y not in A; but then by the definition of limit-point we could construct a sequence $x_1, x_2, \ldots$ of points in A that converges to y. Thus y would be the sole limit-point of the sequence, and the defining condition of compactness would fail.

For similar reasons an unbounded set can never be compact. Indeed if A is unbounded we can find a sequence of points $x_1, x_2, \ldots$ in A such that $x_n > n$ or $x_n < -n$, and such a sequence clearly has no finite limit-point.

Thus only closed and bounded sets can be compact. We will show, conversely, that all closed and bounded sets are compact. The proof will be easy because we know that every bounded sequence has a finite limit-point.

Theorem 3.3.1 A set of real numbers is compact if and only if it is closed and bounded.

Proof: We have already seen the necessity of the conditions that the set be closed and bounded. Conversely, let A be closed and bounded, and let $x_1, x_2, \ldots$ be any sequence of points in A. Since A is bounded, the sequence is bounded; and we have proved that a bounded sequence possesses a limit-point. But A is closed and contains all its limit-points,


so we need only show that a limit-point of the sequence is a limit-point of the set. Actually this is a statement about limit-points that is false in general. Nevertheless we can still save the proof. Recall that y is a limit-point of the sequence if every neighborhood of y contains infinitely many points in the sequence. For y to be a limit-point of the set, every neighborhood of y must contain points of A not equal to y. Clearly the only way things could go wrong would be if all these points were equal to y. But if y ever appears in the sequence that means y is in A, which is what we are trying to prove. Thus to complete the proof we need only consider first the special case when the limit-point y appears in the sequence (in that case y is already in A); and then in the contrary case, when y never appears in the sequence, we can conclude that y is a limit-point of A and then that y is in A since A is closed. QED

We consider now another property of compact sets that resembles a property of finite sets. This may seem rather artificial at first, but it turns out to be extremely useful. We introduce the concept of a cover of a set A, which is any collection of sets, finite or infinite in number, whose union contains A. We have the obvious "picture" of the sets in the cover covering A. Now if A is a finite set, we clearly have no need for an infinite number of sets to cover it. Nevertheless, by very bad planning, we might find ourselves with an infinite collection $\mathcal{B}$ of sets that cover A. In that case we could certainly simplify things by throwing away all but a finite number of the sets in $\mathcal{B}$. We need only select one set containing each point of A and throw the rest away as redundant. We define a subcover of a cover to be a subset $\mathcal{B}_1$ of $\mathcal{B}$ that is still a cover (note that in $\mathcal{B}$ we are dealing with a set of sets; the "subset" in the definition refers to the big collection $\mathcal{B}$, not to the individual sets in $\mathcal{B}$ that are either accepted whole into $\mathcal{B}_1$ or discarded; similarly when we speak of finite covers we mean that the big set $\mathcal{B}$ should be finite, consisting say of $B_1, B_2, \ldots, B_n$, but the sets $B_j$ themselves may be infinite). Then we can express this trivial property of finite sets as follows: every cover contains a finite subcover.

Again we have a statement that is obviously false for any infinite set; just consider a cover by sets with one element. The remarkable fact is that we obtain a true principle for compact sets by simply requiring the sets in the cover to be open sets! We call such a cover an open cover. Why should an open cover be better than an arbitrary cover?


Essentially because it is quite difficult to cover things with open sets. Two open intervals that intersect must overlap, and two open intervals that just nestle together fail to cover the point in between, as in Figure 3.3.1.

[Figure 3.3.1]

Let us attempt to cover the closed interval [0, 1] by open sets so that no finite subcover exists. We might try something like $(1/2, 2)$, $(1/3, 1)$, $(1/4, 1/2)$, $(1/5, 1/3)$, and so on, as in Figure 3.3.2.

[Figure 3.3.2: number line marked at 0, 1/5, 1/4, 1/3, 1/2, 2]

Clearly we cannot remove any one of these sets without uncovering a point (if we remove $(1/(n+2), 1/n)$ we uncover the point $1/(n+1)$). Of course we have not quite covered the closed interval [0, 1]. We have missed the point 0. Well, why not just add one more set to the cover to cover 0? Aye, but there's the rub! The set must be open, and, therefore, it must contain a neighborhood of zero, say $(-1/n, 1/n)$. But then we can use this set and discard all but a finite number of the other sets; we need keep only $(1/2, 2)$ and $(1/(k+2), 1/k)$ for $k = 1, 2, \ldots, n - 1$.

Well, perhaps we need to try a more ingenious method of covering. I will not pursue the matter but invite you to try your own ideas for covering the closed interval [0, 1] by an infinite number of open sets so


that no finite subcover exists. Remember that you must cover every point and that you must show that no finite subcover is possible, not merely that one particular attempt to find a finite subcover fails. I promise you it will be a frustrating experience.

Let's turn the problem around, then, and try to find sets that do not have the property that every open cover has a finite subcover. We have already found one such set, namely (0, 1]. Note that this set is not closed because it does not contain the limit-point 0. Can we show that every set that fails to be closed fails to have this property? Stated in contrapositive form, if every open cover of set A has a finite subcover, can we prove that A is closed? Suppose y is a limit-point of A. We have to show y is in A. Suppose it were not. We want to construct an open cover of A with no finite subcover. Let's try the complements of closed intervals $[y - 1/n, y + 1/n]$; we take complements of closed intervals to get open sets. Clearly this covers the whole real line except for the point y and so covers A since y is not in A. Suppose it had a finite subcover. Since these are nested sets (they increase with n), this would mean that a single one contains A, as in Figure 3.3.3.

[Figure 3.3.3: number line showing $y - 1/n$, $y$, $y + 1/n$ and the set A]

But this means the neighborhood $(y - 1/n, y + 1/n)$ of y contains no points of A, contradicting the fact that y is a limit-point of A.

Thus only closed sets can have this property. We can also easily show that only bounded sets can have this property. For the open sets $(n, n + 2)$, with n varying over all integers, cover the whole line, hence any set A. If a finite number cover A, then A is bounded.

So far we have shown that if a set has the property that every open cover has a finite subcover, then the set must be closed and bounded, hence compact. The converse is also true and is called the Heine-Borel Theorem. You may very well feel at this point that such a theorem could not possibly be very interesting; I certainly felt that way when I first encountered this theorem. I hope that when you see the way this theorem is used, you will gain an appreciation for it.
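The earlier attempt to cover [0, 1] can be checked mechanically. The sketch below, using hypothetical helper names `finite_subcover` and `covers_unit_interval`, builds the finite family $(-1/n, 1/n)$, $(1/2, 2)$, $(1/(k+2), 1/k)$ for $k = 1, \ldots, n-1$ and tests whether it covers [0, 1]. It uses exact rational arithmetic so that open endpoints are handled correctly; it illustrates one particular cover and proves nothing about covers in general.

```python
from fractions import Fraction


def finite_subcover(n: int):
    """Open intervals (a, b): (-1/n, 1/n), (1/2, 2), and
    (1/(k+2), 1/k) for k = 1, ..., n-1."""
    cover = [(Fraction(-1, n), Fraction(1, n)), (Fraction(1, 2), Fraction(2))]
    cover += [(Fraction(1, k + 2), Fraction(1, k)) for k in range(1, n)]
    return cover


def covers_unit_interval(intervals) -> bool:
    """Check whether a finite family of open intervals covers [0, 1]."""
    point = Fraction(0)  # leftmost point of [0, 1] not yet known to be covered
    while point <= 1:
        # Right endpoints of the open intervals that contain `point`.
        reaches = [b for (a, b) in intervals if a < point < b]
        if not reaches:
            return False  # `point` lies in no interval of the family
        # All of [point, max(reaches)) is covered; the endpoint itself is
        # not covered by that interval, so it is the next point to check.
        point = max(reaches)
    return True


if __name__ == "__main__":
    cover = finite_subcover(4)
    print(covers_unit_interval(cover))      # full family
    print(covers_unit_interval(cover[1:]))  # without the neighborhood of 0
```

Dropping the neighborhood of 0 leaves the point 0 uncovered, and dropping an interval such as $(1/4, 1/2)$ uncovers $1/3$, in line with the discussion of Figure 3.3.2.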


Theorem 3.3.2 Every open cover of a compact set has a finite subcover.

Proof: Let A be a compact set and $\mathcal{B}$ an open cover. As a first step we show that $\mathcal{B}$ has a countable subcover. (This part of the argument does not use the fact that A is compact.) The idea is that, although there are an uncountable number of open sets, there exists a countable set of open sets (namely, the open intervals with rational endpoints) that suffices for most purposes. Clearly the open intervals with rational endpoints form a countable set because they are indexed by a pair of rational numbers, the endpoints. We will choose a subcover of $\mathcal{B}$ that is in one-to-one correspondence with a subset of the collection of open intervals with rational endpoints and, hence, is a finite or countable subcover. Here is how we do it: for each such interval J, we choose one set in $\mathcal{B}$ that contains J, if there are any. (If there are none, leave well enough alone.) We claim that we still have a cover of A. Indeed, start with any point x in A. We need to show it is covered. Since all of $\mathcal{B}$ is a cover, there must be a set B in $\mathcal{B}$ containing x.

[Figure 3.3.4: the point x inside an open interval]

Since B is open, it must contain an open interval J containing x, and by shrinking the interval J if need be, we can arrange for it to have rational endpoints, as shown in Figure 3.3.4. So now we have the existence of at least one set in $\mathcal{B}$ containing J; hence we must have selected a set in $\mathcal{B}$ containing J for our subcover, and since x is in J, it is covered.

Thus we have a subcover $B_1, B_2, \ldots$ that is at most countable (if it is finite then there is nothing more to prove). We now come to the heart of the proof: if we take n large enough, then $B_1, B_2, \ldots, B_n$ already covers A. Suppose not. Then for each n there is a point $x_n$ of A that isn't covered by $B_1, B_2, \ldots, B_n$. Since A is compact, the sequence $x_1, x_2, \ldots$ has a limit-point x in A. Now, since the infinite collection $B_1, B_2, \ldots$ covers A, there must be a $B_k$ that contains x. But $x_k, x_{k+1}, x_{k+2}, \ldots$ are not in $B_k$ by the choice of the $x_k$. We now have a contradiction, because if x is a limit-point of the sequence $x_1, x_2, \ldots$, the neighborhood


$B_k$ of x must contain infinitely many of them. The contradiction shows that $B_1, \ldots, B_n$ must cover A for some n. QED

The property that every open cover has a finite subcover is sometimes taken as the definition of compactness; in that case the Heine-Borel Theorem says that every closed, bounded set is compact. To bring these ideas full circle we should prove directly that if A is a set with the property that every open cover has a finite subcover, then every sequence of points in A has a limit-point in A. We already have a two-stage proof of this; the argument given before the Heine-Borel Theorem shows A is closed and bounded, and this implies A is compact. For a more direct proof, consider a sequence $x_1, x_2, \ldots$ in A with no limit-point in A. Then the closure of the set $\{x_1, x_2, \ldots\}$ is closed and contains no points of A other than $x_1, x_2, \ldots$. Now the complement of this set is open. Take it as the first set $B_0$ in an open cover. For the other sets $B_1, B_2, \ldots$ in the cover simply choose $B_j$ to be an open interval containing $x_j$ but no other $x_k$ that is not equal to $x_j$. Then $B_0, B_1, B_2, \ldots$ is an open cover of A, since $B_0$ contains all points of A except $x_1, x_2, \ldots$. If this open cover is to have a finite subcover, then there can only be a finite number of distinct points in the sequence, for $B_0$ contains none of them, and each $B_j$ for $j \geq 1$ contains exactly one of them. But if the sequence contains only a finite number of distinct points at least one of them must repeat infinitely often and then that point is a limit-point of the sequence in A.

To summarize, we have shown that the following three conditions on a set A are equivalent:

1. A is closed and bounded.

2. Every sequence of points in A has a limit-point in A.

3. Every open cover of A has a finite subcover.

There are other equivalent statements, but they are less important and are left as exercises. We conclude this section with an important property of compact sets. A sequence of sets $A_1, A_2, \ldots$ is called nested if $A_n$ contains $A_{n+1}$ for every n. For a nested sequence of sets, the intersection of the first n, $A_1 \cap A_2 \cap \cdots \cap A_n$, is equal to $A_n$, so we may think of the intersection of all of them, $\bigcap_{n=1}^{\infty} A_n$, as a kind of "limit"


of the sequence. If the sets $A_n$ are all non-empty we might expect this intersection to be non-empty also, but this need not be the case. If we take the sequence of open intervals $(0, 1)$, $(0, 1/2)$, $(0, 1/3)$, $(0, 1/4)$, ..., as shown in Figure 3.3.5,

[Figure 3.3.5: number line marked at 0, 1/4, 1/3, 1/2]

there is no point in the intersection. The same is true if we take for $A_n$ the closed set $\{x : x \geq n\}$. However, if the sets $A_n$ are compact, this cannot happen.

Theorem 3.3.3 A nested sequence of non-empty compact sets has a non-empty intersection.

Proof: Let $A_1, A_2, \ldots$ denote the nested sequence. Choose points $x_n$ in $A_n$. The sequence $x_1, x_2, \ldots$ lies in $A_1$ by the nesting and has a limit-point x in $A_1$ since $A_1$ is compact. But x is also a limit-point of $x_n, x_{n+1}, \ldots$ and since this sequence lies in $A_n$, x must lie in $A_n$ (again we must argue separately that either x equals one of the points $x_n, x_{n+1}, \ldots$ or x is a limit-point of the set $\{x_n, x_{n+1}, \ldots\}$ and, hence, is in $A_n$ since $A_n$ is closed). Since x lies in $A_n$ for every n, x is in the intersection, so the intersection is non-empty. QED

3.3.1 Exercises

1. Show that compact sets are closed under arbitrary intersections and finite unions.

2. Show that the following finite intersection property for a set A is equivalent to compactness: if $\mathcal{B}$ is any collection of closed sets such that the intersection of any finite number of them contains a point of A, then the intersection of all of them contains a point of A. (Hint: consider the complements of the sets of $\mathcal{B}$.)


3. If $B_1, \ldots, B_n$ is a finite open cover of a compact set A, can the union $B_1 \cup \cdots \cup B_n$ equal A exactly?

4. If $A \subseteq B_1 \cup B_2$ where $B_1$ and $B_2$ are disjoint open sets and A is compact, show that $A \cap B_1$ is compact. Is the same true if $B_1$ and $B_2$ are not disjoint?

5. For which compact sets can you set an upper bound on the number of sets in a subcover of an open cover?

6. For two non-empty sets of numbers A and B, define A + B to be the set of all sums a + b where a is in A and b is in B. Show that if A is open, then A + B is open. Show that if A and B are compact, then A + B is compact. Give an example where A and B are closed but A + B is not.

7. Which of the analogous statements of exercise 6 are valid for the product set $A \cdot B$ (consisting of all products $a \cdot b$)? Can you modify the false ones slightly to make them true?

8. If A is compact, show that sup A and inf A belong to A. Give an example of a non-compact set A such that both sup A and inf A belong to A.

9. Show that every infinite compact set has a limit-point. Is the same true of closed sets? of open sets?

10. Find necessary and sufficient conditions for A to be the complement of a compact set.

3.4 Summary

3.1 The Theory of Limits

Definition $\lim_{j \to \infty} x_j = +\infty$ if for every n there exists m such that $x_j \geq n$ for all $j \geq m$.

Definition y is an upper bound for a set E if $x \leq y$ for all x in E.


Theorem 3.1.1 For every non-empty set E of real numbers that is bounded above there exists a unique real number sup E such that

1. sup E is an upper bound for E.

2. if y is an upper bound for E then $y \geq \sup E$.

Definition A sequence $x_1, x_2, \ldots$ is said to be monotone increasing if $x_{j+1} \geq x_j$ for every j.

Theorem 3.1.2 A monotone increasing sequence that is bounded from above has a finite limit, equal to the sup.

Definition 3.1.1 A real number x is said to be a limit-point of a sequence $x_1, x_2, \ldots$ if for every n there exists an infinite number of terms $x_j$ such that $|x - x_j| < 1/n$.

Definition A sequence $y_1, y_2, \ldots$ is said to be a subsequence of $x_1, x_2, \ldots$ if there is an increasing function m(n) (meaning m(n + 1) > m(n) for all n) such that $y_n = x_{m(n)}$.

Theorem 3.1.3 x is a limit-point of $x_1, x_2, \ldots$ if and only if there exists a subsequence with limit x.

Definition 3.1.2 $\limsup_{k \to \infty} x_k = \lim_{k \to \infty} \sup_{j > k} x_j$ and $\liminf_{k \to \infty} x_k = \lim_{k \to \infty} \inf_{j > k} x_j$.

Theorem 3.1.4 The set of limit-points in the extended reals of a sequence is non-empty, containing limsup, which is its sup, and liminf, which is its inf.

Theorem 3.1.5 A bounded sequence converges if and only if it has only one limit-point (if and only if the limsup and liminf are equal).

3.2 Open and Closed Sets

Definition A set is open if every point of the set lies in an open interval entirely contained in the set.


Theorem A set of reals is open if and only if it is the disjoint union of at most countably many open intervals.

Theorem 3.2.1 Open sets are preserved under arbitrary unions and finite intersections.

Definition A neighborhood of a point is an open set containing the point.

Theorem $x = \lim_{j \to \infty} x_j$ if and only if every neighborhood of x contains all but a finite number of the $x_j$.

Definition 3.2.1 x is a limit-point of a set A if every neighborhood of x contains a point of A different from x.

Definition A set is said to be closed if it contains all its limit-points.

Theorem 3.2.2 A set is closed if and only if its complement is open.

Theorem 3.2.3 Closed sets are preserved under finite unions and arbitrary intersections.

Definition The Cantor set is the subset of [0, 1] of all numbers expressible in base 3 with digits 0 and 2.

Definition The interior of a set A is the subset of all points which lie in an open interval entirely contained in A.

Definition The closure of a set is the union of the set and all its limit-points.

Theorem The closure of a set is closed.

Definition A subset B of A is said to be dense in A if the closure of B contains A.


3.3 Compact Sets

Definition 3.3.1 A set A of real numbers is said to be compact if every sequence of points in A has a limit-point in A.

Theorem 3.3.1 A set of real numbers is compact if and only if it is closed and bounded.

Theorem 3.3.2 (Heine-Borel) A set is compact if and only if it has the property that every open cover has a finite subcover.

Theorem 3.3.3 A nested sequence of non-empty compact sets has a non-empty intersection.


Chapter 4

Continuous Functions

4.1 Concepts of Continuity

4.1.1 Definitions

In this section we introduce four important concepts: 1) functions, 2) continuity, 3) uniform continuity, and 4) limits of functions. There are some subtle distinctions to be made here, and it will be important to look at some examples and to pay attention to the motivation behind the definitions. It is more common to define limits of functions first and then base the definition of continuity on the notion of limits. However, it is easier to motivate the definition of continuity, so that is the order we follow.

The abstract notion of function is that of a correspondence between sets. Two sets, called the domain and the range, are given; and to each element x of the domain there is given an element f(x) of the range. We will study functions whose domain and range are sets of real numbers, and unless explicitly stated otherwise, the term function will be reserved for this special case.

Definition 4.1.1 A function consists of a domain D, a range R (both subsets of the real numbers), and a correspondence $x \to f(x)$ where x is a variable point in D and f(x) is a point in R. We do not require that every point in R actually occurs as f(x) for some x in D. We will call the image f(D) the set of values f(x) as x varies in D. The image is a subset of the range. We say the function is onto if the image equals the


range. We say the function is one-to-one if $f(x_1) \neq f(x_2)$ if $x_1 \neq x_2$ for any $x_1$ and $x_2$ in the domain. We will usually take the range R to be the whole real line $\mathbb{R}$, and we will not distinguish between functions that are the same except for the range (for example, $f(x) = x^2$ with domain $\mathbb{R}$ and range $\mathbb{R}$ and $f(x) = x^2$ with domain $\mathbb{R}$ and range $y \geq 0$). However we must distinguish between functions with the same rule and different domains ($f(x) = x^2$ with domain $\mathbb{R}$ and $f(x) = x^2$ with domain [0, 1] are different functions). Concerning the correspondence $x \to f(x)$, we make no assumptions. It may be given by a recognizable rule, or by a bizarre rule, or by no describable rule at all. We only require that there is a unique value f(x) that can or could be determined if the value of x is given. This might be described as an agnostic view of function. It is one of the broadest views possible, except that we are requiring that the function be single-valued.

This is not the only concept of function that mathematicians have put forth. The concept of a function as a formula was very prevalent during the eighteenth century, but it led to great confusion since the concept of "formula" kept changing. Later, when we discuss Fourier series, we will discover that many of the functions in our agnostic sense do have formulas after all. In the twentieth century, various concepts of recursive function and constructible function in which the correspondence $x \to f(x)$ must be given in a manner that would in principle be computable, have been put forth. In my opinion it would be impossible to learn about these concepts without first mastering the agnostic theory presented in this work.

Basically there are three ways a function can be specified:

1. Explicitly, the rule $x \to f(x)$ is given by a formula, and the domain is specified.

2. Implicitly, the correspondence is given by the solution of an equation involving x and f(x) = y, and the domain consists of all x for which the equation has a solution. In this case you have to be careful about uniqueness of the solution, since we insist on single-valued functions. For example the equation $x^2 + y^2 = 1$ does not lead to a single-valued function, but together with the condition $y \geq 0$ we obtain the function $f(x) = \sqrt{1 - x^2}$ with domain [-1, 1].


3. By giving the graph, which is by definition the set of ordered pairs (x, f(x)) where x varies in the domain. We follow the usual Cartesian convention of picturing the ordered pairs of real numbers as points in the Euclidean plane.

The concept of function, even restricting the domain and range to sets of numbers, is in a way too general to be very interesting. Certainly in order to do any analysis we have to restrict further the kind of functions with which we deal. In this chapter we will discuss continuous functions. This is neither the largest nor the smallest class of functions that is convenient for analysis, but it is perhaps the most intuitive. Unfortunately there are many intuitive ideas behind the notion of continuity. Some of them are quite helpful and valid, while others may lead to confusion. Our goal is to provide a rigorous framework to which we can attach our intuitions and to make some subtle but important distinctions that would otherwise slip by.

One rather geometric intuitive idea of a continuous function is one whose graph consists of a single, connected piece. Now it is indeed possible to make a precise mathematical definition of connected sets in the plane so that the graph of a continuous function is connected. However, it turns out that there are functions that are not continuous whose graphs are also connected. Thus we will put aside the intuitive idea of continuous as "without a break" or "drawable without lifting the pen" and turn to other ideas.

Let us for the moment think of a function as a mathematical representation of a relationship between variables in the real world. Think of a measurement or experiment where the input is x and the output is f(x). One of the things we require of experimental science is repeatability. If we put the input x in several times, we should always get f(x) out. The requirement that a function be single-valued would seem to take care of this, but in fact it is not enough. The reason is that in any real situation we cannot control the input exactly, any more than we can measure the output exactly. So $f(x_1)$ and $f(x_2)$ might represent outputs of what we believe to be identical experiments if $x_1$ and $x_2$ are very close to each other. Therefore the requirement of repeatability is actually that very close values of the input should yield very close values of the output. This is exactly the intuitive idea behind continuity.


Let us look at the situation a little more closely. What do we mean by very close values of the input and output? Clearly we must be referring to some condition like $|x - x_0| < 1/n$ for the input and $|f(x) - f(x_0)| < 1/m$ for the output, where 1/n and 1/m are small errors. The statement "very close values of the input yield very close values of the output" then should translate into "$|x - x_0| < 1/n$ implies $|f(x) - f(x_0)| < 1/m$". But this is still vague, for we have not specified the relationships between the errors 1/n and 1/m. We have the correct statement, but we have to decide on the quantifiers and their order. To help us decide, let's ask how small an error in the output we would accept. Would 1/10 be good enough? Or 1/100 or $1/10^{23}$? Isn't the tolerance for error in the output a relative judgment, one that should not be made once and for all? Today we might be happy with an error of 1/1,000, but tomorrow we might want to do better. To build a mathematical theory on a fixed notion of acceptable error would be absurd. We must have the flexibility to make the error in the output as small as we like. Thus the first quantifier must be "for all errors 1/m in the output". To meet any given tolerance for error in the output, we may have to take drastic action to control the error in the input. Again we do not want to say in advance how small this error must be; only that some small error in the input will do the trick. Thus the second quantifier must be "there exists an error 1/n (depending on 1/m) in the input". Altogether we now have: "for every 1/m there exists 1/n such that $|x - x_0| < 1/n$ implies $|f(x) - f(x_0)| < 1/m$". In the familiar $\varepsilon$-$\delta$ formulation we would write $\delta$ for 1/n and $\varepsilon$ for 1/m.
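Written out in symbols, the quantifier order arrived at above looks as follows. This is only a restatement of the quoted sentence, with the roles of $x$ and $x_0$ deliberately left unquantified; the final display, with the quantifiers swapped, is included for contrast and is not a condition the text proposes.

```latex
% Quantifier order chosen in the text (m, n positive integers):
\[
  \forall m \;\; \exists n : \quad
  |x - x_0| < \tfrac{1}{n} \;\Longrightarrow\; |f(x) - f(x_0)| < \tfrac{1}{m}.
\]

% The same statement with \delta = 1/n and \varepsilon = 1/m:
\[
  \forall \varepsilon > 0 \;\; \exists \delta > 0 : \quad
  |x - x_0| < \delta \;\Longrightarrow\; |f(x) - f(x_0)| < \varepsilon.
\]

% For contrast only: swapping the quantifiers would demand one input
% tolerance 1/n that works for every output tolerance 1/m at once.
\[
  \exists n \;\; \forall m : \quad
  |x - x_0| < \tfrac{1}{n} \;\Longrightarrow\; |f(x) - f(x_0)| < \tfrac{1}{m}.
\]
```

In the swapped version the conclusion $|f(x) - f(x_0)| < 1/m$ for every m forces $f(x) = f(x_0)$ whenever $|x - x_0| < 1/n$, which is far stronger than "close inputs give close outputs".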

We are not quite done, for we have not specified what x and $x_0$ are and how the errors 1/m and 1/n relate to them. This is by no means a trivial question. In fact there are two distinct concepts that await the resolution, in two different ways, of this question. Let us imagine that $x_0$ is the value (perhaps idealized) of the input variable in which we are interested. Then x represents a nearby value, which must be allowed to vary over the domain of the function with no control other than the error specification $|x - x_0| < 1/n$. This leads to the definition of continuity at a point $x_0$: for every 1/m there exists 1/n such that for every x in the domain of the function with $|x - x_0| < 1/n$ we have $|f(x) - f(x_0)| < 1/m$. The error 1/n depends on 1/m and on $x_0$; the dependence on $x_0$ need not be mentioned in this definition, since $x_0$ is fixed. However, when we define a continuous function to


be a function that is continuous at every point of its domain, this dependence becomes important. The best way to see this is through an example.

Let's show that the function f(x) = 1/x on the domain x > 0 is continuous. We fix a point x_0 in the domain. Given any error 1/m, we need to find an error 1/n such that |x - x_0| < 1/n and x > 0 (x in the domain) implies |1/x - 1/x_0| < 1/m. Now we compute 1/x - 1/x_0 = (x_0 - x)/(x x_0); and if we are to bound this from above, we need to bound x x_0 from below. Since x_0 does not vary, the problem is to keep x from getting close to zero. Thus we want to require something like 1/n < x_0/2, for then |x - x_0| < 1/n implies x > x_0/2, as in Figure 4.1.1.

Figure 4.1.1: number line marking 0, x_0/2, and x_0.

Then

\[ \left| \frac{1}{x} - \frac{1}{x_0} \right| = \frac{|x - x_0|}{x x_0} \le \frac{2}{x_0^2} |x - x_0| < \frac{2}{n x_0^2} \]

and we need 2/(n x_0^2) ≤ 1/m. Running the argument backward, if we choose 1/n to be less than x_0/2 and less than x_0^2/(2m), we have |x - x_0| < 1/n (the condition x > 0 is now redundant) implies |1/x - 1/x_0| < 1/m. This demonstrates the continuity of the function at each point of its domain. Note, however, that the error 1/n depends on the point x_0. As x_0 gets closer to zero the error 1/n must be made smaller in order to guarantee the same error 1/m in the output. This is clear from the graph of the function, as shown in Figure 4.1.2.
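The choice of input error in the 1/x example can be checked numerically. The Python sketch below is only an illustration: the names `input_error` and `check` are assumptions of this sketch, and the function returns the bound min(x_0/2, x_0^2/(2m)) itself rather than a reciprocal integer 1/n (any 1/n below the bound would serve).

```python
import random


def input_error(x0, m):
    """Bound on the input error for f(x) = 1/x at the point x0 > 0.

    Any 1/n below min(x0/2, x0**2 / (2*m)) keeps |1/x - 1/x0| < 1/m.
    """
    if x0 <= 0:
        raise ValueError("x0 must lie in the domain x > 0")
    return min(x0 / 2, x0 ** 2 / (2 * m))


def check(x0, m, trials=10_000, seed=0):
    """Sample points with |x - x0| < bound and test |1/x - 1/x0| < 1/m."""
    rng = random.Random(seed)
    delta = input_error(x0, m)
    for _ in range(trials):
        # stay strictly inside the interval |x - x0| < delta
        x = x0 + 0.999 * delta * rng.uniform(-1, 1)
        if abs(1 / x - 1 / x0) >= 1 / m:
            return False
    return True


if __name__ == "__main__":
    for x0 in (1.0, 0.1, 0.01):
        print(x0, input_error(x0, m=10), check(x0, m=10))
```

For a fixed tolerance 1/m the returned bound shrinks roughly like x_0^2 as x_0 approaches 0, which is the dependence on x_0 described in the example. A passing sample check is evidence, not a proof; the proof is the inequality chain in the text.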


Of course this function is not even bounded near x = 0, so we might expect trouble. The trouble with which we are dealing, however, can turn up even for bounded functions. For example, consider a function with domain (0, 1] whose values zigzag between 0 and 1, hitting 0 at 1/n for n even and 1 at 1/n for n odd, as shown in Figure 4.1.3. If we want the error in the output less than 1/2, we will have to restrict |x - x_0| severely if x_0 is close to 0, and no condition |x - x_0| < 1/n will work for all x_0 in the domain (for example, it fails for x_0 = 1/n and x = 1/(n+1), where the values of the function differ by 1).

In both examples the domain of the function isn't closed, and the trouble arises near a limit-point that is not in the domain. Later we will see that if the domain is compact this situation can't arise. If the error in the input |x - x_0| < 1/n can be chosen so as to make the error in the output |f(x) - f(x_0)| < 1/m for all points x and x_0 in the domain, then we have a stronger condition than continuity, which is called uniform continuity. We summarize the discussion in a formal definition.

Figure 4.1.2: graph of y = 1/x, with the value 1/x_0 marked on the vertical axis.

If f is continuous at x_0, then the values of f(x) for x near x_0 are approximating the value f(x_0). This suggests a close relationship between the concept of continuity of functions and limits of sequences. We can make the analogy stronger if we introduce a related concept, the limit of a function. Let f be a function defined on a domain D, and let x_0 be a limit-point of D. We do not require that x_0 be actually in D; frequently we will want to take D to be an open interval and x_0 an endpoint. The reason we require x_0 to be a limit-point of D is that we need values of x in D nearby. We want to define the limit of f(x) at x_0 to be the number, if it exists, that is approximated by f(x) for x near x_0. For purposes of defining the limit we will ignore the value f(x_0) if

Definition 4.1.2 Let f be a function on a domain D. Let x_0 be a point of D. We say f is continuous at x_0 if for every 1/m there exists 1/n such that |f(x) - f(x_0)| < 1/m for every x in D satisfying |x - x_0| < 1/n. We say f is continuous if for every x_0 in D and for every 1/m there exists 1/n (depending on x_0 and 1/m) such that |f(x) - f(x_0)| < 1/m for every x in D satisfying |x - x_0| < 1/n. We say f is uniformly continuous if for every 1/m there exists 1/n such that |f(x) - f(x_0)| < 1/m for all x and x_0 in D satisfying |x - x_0| < 1/n.
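Written with quantifiers, the three conditions of Definition 4.1.2 differ only in where x_0 is quantified. The LaTeX restatement below is a summary sketch, not part of the original definition; it assumes m and n range over the positive integers, as in the text, and uses the amsmath package.

```latex
\begin{align*}
\text{continuous at } x_0:\quad
  & \forall m\ \exists n\ \forall x \in D:\
    |x - x_0| < \tfrac{1}{n} \implies |f(x) - f(x_0)| < \tfrac{1}{m} \\
\text{continuous on } D:\quad
  & \forall x_0 \in D\ \forall m\ \exists n\ \forall x \in D:\
    |x - x_0| < \tfrac{1}{n} \implies |f(x) - f(x_0)| < \tfrac{1}{m} \\
\text{uniformly continuous on } D:\quad
  & \forall m\ \exists n\ \forall x_0 \in D\ \forall x \in D:\
    |x - x_0| < \tfrac{1}{n} \implies |f(x) - f(x_0)| < \tfrac{1}{m}
\end{align*}
```

In the second line n is chosen after x_0 and may depend on it; in the third line n is chosen before x_0 and must work for every point of D at once.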

Figure 4.1.3: the zigzag function on (0, 1], equal to 0 at 1/n for n even and 1 at 1/n for n odd.

Comparing the definition of limit of a function with limit of a sequence we see that the only difference is that the condition of going far out in the sequence is replaced by the condition of going near the point x_0.

The connection between the definitions of limit of a function and continuity is also clear. A function is continuous at a point x_0 in its domain that is a limit-point of its domain if and only if f has a limit at x_0 and the value of f(x_0) equals lim_{x→x_0} f(x). If x_0 is a point of the domain that is not a limit-point of the domain, then the definition of continuity is trivially valid: we take 1/n so that the only point of the domain satisfying |x - x_0| < 1/n is x = x_0 itself. Thus every function is continuous at a non-limit-point (called an isolated point) of its domain, and we normally will not be interested in this trivial case.

Figure 4.1.4: graph of f(x), equal to 1 for x ≠ 0 and 0 at x = 0.

x_0 happens to be in the domain, this being a convention chosen not for compelling reasons but for convenience in certain applications.

Definition 4.1.3 Let f be a function defined on a domain D, and let x_0 be a limit-point of D. We say f has a limit at x_0 if there exists a number y, which we call the limit of f at x_0 and write y = lim_{x→x_0} f(x), such that for every 1/m there exists 1/n such that |f(x) - y| < 1/m for all x in D not equal to x_0 satisfying |x - x_0| < 1/n.

The number y, if it exists, is unique for the same reason that the limit of a sequence, if it exists, is unique. There is no requirement that the limit have anything to do with f(x_0), if x_0 happens to be in D, since we have made the convention of excluding the value x = x_0. Thus the function equal to 1 for all values of x ≠ 0 and 0 for x = 0, as shown in Figure 4.1.4, has limit 1 at x = 0.


Proof: Suppose lim_{x→x_0} f(x) = y exists, and let x_1, x_2, ... converge to x_0. Given any error 1/m, we want to make |f(x_j) - y| < 1/m by going far out in the sequence. Now first by the existence of lim_{x→x_0} f(x) we know we can make |f(x) - y| < 1/m by taking |x - x_0| < 1/n, where 1/n depends on 1/m and x_0. We would like to apply this to x = x_j, so we must next use the convergence of the sequence x_1, x_2, ... to x_0. Given 1/n, there exists k such that |x_j - x_0| < 1/n for all j ≥ k, by this convergence. Also x_j ≠ x_0 by assumption. Thus we can apply the implication |x - x_0| < 1/n implies |f(x) - y| < 1/m derived above to

Theorem 4.1.1 Let x_0 be a limit-point of the domain D of a function f. Then lim_{x→x_0} f(x) exists if and only if f(x_1), f(x_2), ... converges for every sequence x_1, x_2, ... of points of D, none equal to x_0, converging to x_0. It is not necessary to assume that the limit of all the sequences is the same, although this must be true, and the common limit is equal to the limit of the function.

Now let us further examine the connection between the notion of limits of functions and sequences. Let x_0 be a limit-point of the domain D of a function, and let x_1, x_2, ... be a sequence of points in D, none equal to x_0, that converges to x_0. Then we would expect the sequence f(x_1), f(x_2), ... to converge to lim_{x→x_0} f(x), if the limit exists, for by going far out in the sequence we make |x_j - x_0| < 1/n. Of course there are usually an uncountable number of points in the domain satisfying |x - x_0| < 1/n, so we cannot expect that the convergence of f(x_1), f(x_2), ... for any one sequence will contain as much information as the existence of the limit of the function. In the example of the zigzag function in Figure 4.1.3, the function does not have a limit as x → 0; but if we choose the sequence 1, 1/3, 1/5, 1/7, ... where the function takes on the value 1, then the sequence f(1), f(1/3), ... is just 1, 1, 1, ... and so has the limit 1. Of course it was only by a careful choice of the sequence of points that we were able to come out with a limit; the sequence 1, 1/2, 1/3, 1/4, ... also converges to zero, but f(1), f(1/2), f(1/3), ... is the sequence 1, 0, 1, 0, ..., which has no limit. This suggests that perhaps the convergence of f(x_1), f(x_2), ... for every sequence x_1, x_2, ... converging to x_0 is equivalent to the existence of lim_{x→x_0} f(x). This is indeed the case.
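The two sequences in the zigzag example can be evaluated directly. The text only specifies the values of the zigzag function at the points 1/n, so the sketch below assumes a piecewise-linear interpolation between them; the name `zigzag` is likewise an assumption of this sketch. Exact rational arithmetic is used so that the values at 1/n come out exactly.

```python
from fractions import Fraction


def zigzag(x):
    """Piecewise-linear zigzag on (0, 1]: 1 at 1/n for n odd, 0 at 1/n for n even.

    The linear interpolation between the points 1/n is an assumption;
    the text only fixes the values at the points 1/n.
    """
    x = Fraction(x)
    if not 0 < x <= 1:
        raise ValueError("x must lie in (0, 1]")
    n = int(1 / x)  # floor(1/x), so that 1/(n+1) < x <= 1/n
    left, right = Fraction(1, n + 1), Fraction(1, n)
    v_left, v_right = (n + 1) % 2, n % 2  # values at the two endpoints
    t = (x - left) / (right - left)
    return v_left + t * (v_right - v_left)


if __name__ == "__main__":
    odd_points = [Fraction(1, n) for n in range(1, 20, 2)]  # 1, 1/3, 1/5, ...
    all_points = [Fraction(1, n) for n in range(1, 11)]     # 1, 1/2, 1/3, ...
    print([int(zigzag(x)) for x in odd_points])  # constant sequence of ones
    print([int(zigzag(x)) for x in all_points])  # alternates between 1 and 0
```

Both input sequences converge to 0, yet only the first produces a convergent sequence of values, which is why a single well-chosen sequence cannot certify the existence of the limit of the function.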

4.1.2 Limits of Functions and Limits of Sequences


This theorem has a very striking consequence regarding continuous functions. Suppose f is continuous, and let x_1, x_2, ... be any sequence of points in the domain that converges to a point x_0 in the domain (with none of the x_j equal to x_0). Then x_0 is a limit-point of the domain; and since lim_{x→x_0} f(x) exists and equals f(x_0), we have that f(x_1), f(x_2), ... converges to f(x_0). We can paraphrase this by saying that the image under f of a convergent sequence (converging to a point in the domain) is convergent. We can easily remove the artificial requirement that none of the x_j equal x_0; however the condition that x_0 be in the domain is crucial, as the zigzag example shows. The converse is also true, so we can characterize continuous functions as those that preserve convergent sequences or, equivalently, as those that commute with sequential limits, f(lim_{j→∞} x_j) = lim_{j→∞} f(x_j).

x = x_j if j ≥ k, and we obtain j ≥ k implies |f(x_j) - y| < 1/m. Thus we have established the convergence of f(x_1), f(x_2), ... to y.

Conversely, suppose the sequences f(x_1), f(x_2), ... always converge if x_1, x_2, ... converges to x_0, with the x_j in D and different from x_0. We want to show this implies the existence of the limit of the function lim_{x→x_0} f(x).

First we claim all the sequences f(x_1), f(x_2), ... have a common limit, for if not we could shuffle two sequences with different limits. In other words, if x_1, x_2, ... and y_1, y_2, ... both converge to x_0 and f(x_1), f(x_2), ... converges to a and f(y_1), f(y_2), ... converges to b with a ≠ b, then the shuffled sequence x_1, y_1, x_2, y_2, ... still has limit x_0 but the shuffled sequence f(x_1), f(y_1), f(x_2), f(y_2), ... does not converge, contradicting the hypotheses. Thus there is a common limit, call it y, of all the sequences f(x_1), f(x_2), .... This is the value that we will show is equal to lim_{x→x_0} f(x).

Suppose lim_{x→x_0} f(x) = y were false. Negating the definition leads to the statement: there exists 1/m such that for all 1/n there exists a point z_n in the domain, not equal to x_0, such that |z_n - x_0| < 1/n, and yet |f(z_n) - y| ≥ 1/m. This is the statement that we must show leads to a contradiction. But if this statement were true, the sequence {z_n} would converge to x_0 (since |z_n - x_0| < 1/n) and yet f(z_1), f(z_2), ... would not converge to y, since |f(z_n) - y| ≥ 1/m for every n. This contradicts the fact that f(z_1), f(z_2), ... converges to y that we just established. QED


We now have two equivalent characterizations of continuity, and we will presently find a third. Recall that we were able to characterize convergent sequences entirely in terms of open sets: x_1, x_2, ... converges to x_0 if for every neighborhood of x_0, all but a finite number of terms in the sequence lie in that neighborhood. Similarly we can rephrase the definition of continuity of f at x_0 as follows: for every neighborhood of f(x_0) there exists a neighborhood of x_0 that is mapped into the neighborhood of f(x_0) by f. If we denote by M the set {y : |y - f(x_0)| < 1/m} and N the set {x : |x - x_0| < 1/n}, then the statement "f maps N into M" is the same as "|x - x_0| < 1/n implies |f(x) - f(x_0)| < 1/m". Now if A is any subset of the range of f, we denote by f^{-1}(A), the inverse image of A under f, the set of points x in the domain of f such that f(x) is in A. The fact that f maps N into M is the same as saying N

4.1.3 Inverse Images of Open Sets

Theorem 4.1.2 Let f be a function defined on a domain D. Then f is continuous if and only if for every sequence of points x_1, x_2, ... that has a limit in D, the sequence f(x_1), f(x_2), ... is convergent. It is not necessary to assume that the limit of the sequence f(x_1), f(x_2), ... is equal to f(lim_{j→∞} x_j), but this follows from the hypotheses.

Proof: The proof is quite similar to the proof of the previous theorem; we could in fact reduce it to the proof of the previous theorem, at the expense of a lot of special cases.

Suppose first f is continuous. Then if x_1, x_2, ... converges to x_0 we can show f(x_1), f(x_2), ... converges to f(x_0) by the same argument as in the previous theorem: given 1/m we first find 1/n such that |x - x_0| < 1/n implies |f(x) - f(x_0)| < 1/m by continuity of f and then find k such that j ≥ k implies |x_j - x_0| < 1/n, by the convergence of x_1, x_2, ... to x_0. Then j ≥ k implies |f(x_j) - f(x_0)| < 1/m.

For the converse we first use the shuffling argument to show the limit of the sequence f(x_1), f(x_2), ... is the same for all sequences x_1, x_2, ... converging to the point x_0; and since x_0, x_0, ... is one such sequence and the limit of f(x_0), f(x_0), ... is f(x_0), it follows that the common limit of all these sequences is f(x_0). Now if x_0 is not a limit-point of D there is nothing to prove, while if x_0 is a limit-point of D the previous theorem implies lim_{x→x_0} f(x) exists and equals the common limit f(x_0). Thus f is continuous at x_0. QED


Proof: First suppose f is continuous. Let A be an open set of real numbers. We want to show f^{-1}(A) is open. To do this we need to show that every point in f^{-1}(A) is contained in an open interval lying in f^{-1}(A). Of course f^{-1}(A) may be empty, but in that case there is nothing to prove since the empty set is open.

So suppose x_0 is in f^{-1}(A). This means that x_0 is in the domain of f and f(x_0) is in A. Since A is open, there is an open interval about f(x_0), say {y : |y - f(x_0)| < 1/m}, contained in A. By the continuity of f, there is an open interval about x_0, |x - x_0| < 1/n, that is mapped into the interval about f(x_0) (actually the definition says |x - x_0| < 1/n and x in the domain implies |f(x) - f(x_0)| < 1/m, but since we have assumed the domain is open, we can arrange that |x - x_0| < 1/n implies x is in the domain by taking 1/n small enough). But this implies that the interval |x - x_0| < 1/n lies in f^{-1}(A), which shows that f^{-1}(A) is open.

Conversely, suppose f^{-1}(A) is open for every open set A. We want to show that f is continuous at every point x_0 of the domain. To do this we have to show that given any 1/m we can find 1/n such that |x - x_0| < 1/n implies |f(x) - f(x_0)| < 1/m. Let us choose for A the open set {y : |y - f(x_0)| < 1/m}. By hypothesis f^{-1}(A) is open. Note that x_0 is in f^{-1}(A), for |f(x_0) - f(x_0)| = 0. Since x_0 is a point of the open set f^{-1}(A), there is an open interval about x_0, say |x - x_0| < 1/n,

Theorem 4.1.3 Let f be a function defined on an open domain. Then f is continuous if and only if the inverse image of every open set is an open set.

is contained in f^{-1}(M) and so continuity at x_0 becomes: the inverse image of every neighborhood of f(x_0) contains a neighborhood of x_0. It is important to convince yourself that it is the inverse image that belongs in this statement. It is not true that the image of a neighborhood of x_0 under a continuous function f contains a neighborhood of f(x_0); for example, a constant function is continuous, but its entire image is a single point.

The situation actually improves if we reformulate the definition of continuity on the whole domain rather than at a single point. For simplicity we assume that the domain is an open set. (We will return to this in Chapter 9.)


\[ |f(x) - f(x_0)| \le M |x - x_0|^{\alpha}. \]

This implies continuity in that |f(x) - f(x_0)| < 1/m if |x - x_0| < 1/(mM)^{1/α}. Another way of thinking about these conditions is to

If this holds with a fixed M for all x and x_0 in the domain we say f satisfies a Lipschitz condition. A typical function satisfying this condition is f(x) = |x|, where the constant M can be taken equal to 1. Other variants of this are the Hölder conditions of order α, 0 < α ≤ 1 (sometimes referred to as Lipschitz conditions of order α):

\[ |f(x) - f(x_0)| \le M |x - x_0|. \]

Continuity is a qualitative property, in that it concerns a relation between the error of the input and error of the output that is not specified. It is not surprising that there are many related quantitative properties, where the relation between the errors takes on a specific form. The simplest possible form is proportionality: the existence of a constant M such that |x - x_0| < 1/(Mm) implies |f(x) - f(x_0)| < 1/m. It is simple to see that this is equivalent to the condition

4.1.4 Related Definitions

This last characterization of continuous functions may be furthest from the intuition of continuity, but it is undeniably simple and elegant. It is extremely general (we will see that it is valid also for functions of several variables) and it is in fact taken to be the definition of continuity in general topology. There is a similar characterization in which open sets are replaced by closed sets, if the domain of f is closed. This is not at all surprising in view of the fact that complements of open sets are closed, but it is an important fact. We leave the details to exercise set 4.1.5, number 1. If the domain of f is the whole line, then both results apply. In particular, a set defined by an "open" condition like f(x) > a is open and a set defined by a "closed" condition like f(x) ≥ a or f(x) = a is closed. See exercise set 4.1.5, numbers 2 and 3 for related results.

contained entirely in f^{-1}(A). But this means that |x - x_0| < 1/n implies f(x) is in A, and by the choice of A, |f(x) - f(x_0)| < 1/m as desired. QED


This function does not have a limit at x = 0, because if you approach 0 from positive numbers the value is 1 while if you approach from negative numbers the value is -1. We would then like to say lim_{x→0+} sgn x = +1 and lim_{x→0-} sgn x = -1, and it is not hard to define such limits from above and below (or right and left) to make this so. By lim_{x→x_0+} f(x) = y (where x_0 is a limit-point of the domain) we will mean for every 1/n there exists 1/m such that x_0 < x < x_0 + 1/m, and x in the domain implies |f(x) - y| < 1/n, while by lim_{x→x_0-} f(x) = y we will mean the same condition with now x_0 - 1/m < x < x_0. Similarly we will say

Figure 4.1.5: graph of sgn x, equal to +1 for x > 0 and -1 for x < 0.

The domain of this function is all x ≠ 0, although sometimes one takes the convention sgn 0 = 0 to have the function defined on the whole line. The graph is shown in Figure 4.1.5.

\[ \operatorname{sgn} x = \begin{cases} +1 & \text{if } x > 0, \\ -1 & \text{if } x < 0. \end{cases} \]

define the modulus of continuity ω(x_0, δ) of f to be the sup of the values |f(x) - f(x_0)| as x varies over the interval |x - x_0| < δ. The condition that f be continuous at x_0 is the statement that ω(x_0, δ) as a function of δ has limit 0 as δ → 0, while the quantitative conditions describe the rate of convergence: ω(x_0, δ) ≤ Mδ for the Lipschitz condition.

Generally speaking, for every theorem about continuous functions, there is an analogous quantitative version for Lipschitz functions, which can be proved by similar methods.
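The quantitative conditions lend themselves to a small numerical experiment. The Python sketch below is illustrative only: `holder_input_error` computes the input error 1/(mM)^{1/α} that a Hölder condition of order α guarantees, and `modulus_estimate` approximates the modulus of continuity by sampling. Both names are assumptions of this sketch, and a sampled maximum is only a lower estimate of the true sup.

```python
import math


def holder_input_error(m, M, alpha=1.0):
    """Input error (1/(m*M))**(1/alpha) guaranteeing output error < 1/m
    when |f(x) - f(x0)| <= M * |x - x0|**alpha (alpha = 1 is Lipschitz)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    return (1.0 / (m * M)) ** (1.0 / alpha)


def modulus_estimate(f, x0, delta, domain=(-math.inf, math.inf), samples=2001):
    """Lower estimate of the modulus of continuity w(x0, delta).

    Takes the largest |f(x) - f(x0)| over sample points x in the domain
    with |x - x0| < delta; the true sup can only be larger.
    """
    lo, hi = domain
    best = 0.0
    for i in range(samples):
        # sample points lie strictly inside (x0 - delta, x0 + delta)
        x = x0 - delta + 2 * delta * (i + 0.5) / samples
        if lo <= x <= hi:
            best = max(best, abs(f(x) - f(x0)))
    return best


if __name__ == "__main__":
    # f(x) = |x| is Lipschitz with M = 1, so w(x0, delta) <= delta.
    print(modulus_estimate(abs, 0.3, 0.1))
    # f(x) = sqrt(x) on [0, 1] satisfies a Holder condition of order 1/2 with M = 1.
    print(modulus_estimate(math.sqrt, 0.0, 0.01, domain=(0.0, 1.0)))
    print(holder_input_error(m=10, M=1, alpha=0.5))
```

For the square root at x_0 = 0 the estimate behaves like δ^{1/2} rather than δ, which is the difference between a Hölder condition of order 1/2 and a Lipschitz condition.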

There are many circumstances when we will want to consider limits and continuity that are one-sided. For example, consider the signum function


1. Let f be a function defined on a closed domain. Show that f is continuous if and only if the inverse image of every closed set is a closed set.

4.1.5 Exercises

\[ \lim_{x \to \infty} f(x) = y \quad \text{and} \quad \lim_{x \to x_0} f(x) = +\infty. \]

that f is continuous from the right at x_0 if f(x_0) = lim_{x→x_0+} f(x) and that f is continuous from the left at x_0 if f(x_0) = lim_{x→x_0-} f(x).

Clearly continuity is the same as continuity from both the left and right. The choice sgn 0 = 0 makes the signum continuous from neither side at 0, but a different convention would allow us to have one or the other but not both.
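The effect of the convention at 0 can be seen by sampling the signum function along sequences approaching 0 from each side. This is a sketch under stated assumptions: the parameter `value_at_zero` is a hypothetical way to encode the choice of convention, and sampling along one sequence illustrates a one-sided limit without proving it.

```python
def sgn(x, value_at_zero=0):
    """Signum function; value_at_zero encodes the convention chosen at x = 0."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return value_at_zero


# values along sequences approaching 0 from the right and from the left
from_right = [sgn(1 / 10 ** k) for k in range(1, 8)]
from_left = [sgn(-1 / 10 ** k) for k in range(1, 8)]

if __name__ == "__main__":
    print(from_right, from_left)
    # With sgn 0 = +1 the value at 0 matches the values from the right,
    # with sgn 0 = -1 it matches the values from the left,
    # and with sgn 0 = 0 it matches neither.
    for convention in (1, -1, 0):
        value = sgn(0, value_at_zero=convention)
        print(convention, value == from_right[-1], value == from_left[-1])
```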

The kind of discontinuity that signum has is called a jump discontinuity or discontinuity of the first kind. The definition is that both lim_{x→x_0+} f(x) and lim_{x→x_0-} f(x) should exist and be different. Any worse discontinuity, where one or another one-sided limit does not exist, is called a discontinuity of the second kind. If we adjoin 0 to the domain of the zigzag function (Figure 4.1.3) it will have such a discontinuity there. There is another technical kind of discontinuity, in which the limit exists at x_0 but is different from f(x_0). This is called a removable discontinuity, because if we simply redefine the function at x_0 to equal its limit then we will have a continuous function there. An example is the function

\[ f(x) = \begin{cases} 1 & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases} \]

It is not unreasonable to think of removable discontinuities as simply mistakes that can be, and should be, corrected.

The one-sided limits we have discussed involve restricting the domain of the function to one side of the point. There are also concepts of upper semi-continuity and lower semi-continuity that involve restricting the range of the function. These are less frequently used and will not be needed in this book.

Finally we note that it is often convenient to allow the extended real numbers ±∞ to appear in limits, in either the domain or range. We leave it to the reader to supply the obvious meaning for statements like


2. Let A be the set defined by the equations f_1(x) = 0, f_2(x) = 0, ..., f_n(x) = 0, where f_1, ..., f_n are continuous functions defined on the whole line. Show that A is closed. Must A be compact?

3. Let A be the set defined by the inequalities f_1(x) ≥ 0, f_2(x) ≥ 0, ..., f_n(x) ≥ 0, where f_1, ..., f_n are continuous functions defined on the whole line. Show that A is closed. Show that the set defined by f_1(x) > 0, ..., f_n(x) > 0 is open.

4. Give a definition of lim_{x→∞} f(x) = y. Show that this is true if and only if for every sequence x_1, x_2, ... of points in the domain of f such that lim_{n→∞} x_n = +∞, we have lim_{n→∞} f(x_n) = y.

5. Show that the function f(x) = x^β on [0, 1] for 0 < β ≤ 1 satisfies a Hölder condition of order α for 0 < α ≤ β but not for α > β.

6. Let f have a jump discontinuity at x_0. Show that if x_1, x_2, ... is any sequence of points in the domain of f converging to x_0, with no x_j equal to x_0, then the sequence f(x_1), f(x_2), ... has at most two limit-points.

7. Give an example of a continuous function with domain R such that the inverse image of a compact set is not compact.

8. Give an example of a continuous function with domain R such that the image of a closed set is not closed.

9. Show that the function f(x) = x^2 with domain 0 ≤ x < ∞ is one-to-one but the function f(x) = x^2 with domain R is not. What is the image of these functions? Are they uniformly continuous?

10. Show that a function that satisfies a Lipschitz condition is uniformly continuous.

11. If f is continuous on R, is it necessarily true that f(limsup_{n→∞} x_n) = limsup_{n→∞} f(x_n)?

12. If f is a continuous function on R, is it true that x is a limit-point of x_1, x_2, ... implies f(x) is a limit-point of f(x_1), f(x_2), ...?

13. Is the inverse image of a convergent sequence under a continuous function necessarily a convergent sequence?


We begin with some simple observations on the preservation of continuity under arithmetic operations. Suppose f and g are defined on the same domain D. Then by f + g we mean the function with domain D that takes the value f(x) + g(x) at x. Even if f and g have different domains, we can define f + g on the intersection of their domains. If f and g are both continuous, then f + g will also be continuous. This follows immediately from the characterization of continuity in terms of taking convergent sequences to convergent sequences and from the fact that the sum of convergent sequences is convergent. This property is expressed by saying the continuous functions are preserved by addition or the sum of continuous functions is continuous. This is true whether we consider continuity at a point or continuity on the whole domain. (For uniform continuity and Lipschitz conditions, see the exercises.) Clearly the same is true for scalar multiples af, differences f - g, and products f · g. For quotients f/g we must avoid dividing by zero. If we have f and g defined and continuous on D, then f/g will be defined and continuous on the subset of D of points where g is not zero. It may happen that f/g can be further defined and continuous at points where both f and g are zero, but this has to be determined on a case-by-case basis. We will return to this when we discuss l'Hôpital's Rule in Section 5.4.3.

Since the constant functions and the identity function f(x) = x are easily seen to be continuous, it follows that all rational functions p(x)/q(x) where p and q are polynomials are continuous on the domain {x : q(x) ≠ 0}. This gives us a large collection of continuous functions.

4.2.1 Basic Properties

4.2 Properties of Continuous Functions

14. Show that f^{-1}(A ∪ B) = f^{-1}(A) ∪ f^{-1}(B) and f^{-1}(A ∩ B) = f^{-1}(A) ∩ f^{-1}(B) for any function f. Is the same true of images (as opposed to inverse images)?

15. If f is defined on a finite open interval (a, b) and uniformly continuous, show that the limit of f exists at the endpoints and f can be extended to a uniformly continuous function on the closed interval.


Using these kinds of ideas it is generally speaking possible to construct continuous functions that will do whatever you like. For example,

Proof: Fix a point x_0 in the domain, and suppose f(x_0) ≥ g(x_0), so max(f, g)(x_0) = f(x_0). To show max(f, g) is continuous at x_0 we have to show that for each 1/m there exists 1/n such that |x - x_0| < 1/n implies |max(f, g)(x) - max(f, g)(x_0)| < 1/m. We know that f and g are continuous at x_0, so given 1/m we can find 1/n (take the smaller of the two values for f and g) such that |x - x_0| < 1/n implies |f(x) - f(x_0)| < 1/m and |g(x) - g(x_0)| < 1/m. This 1/n will do the job for max(f, g). To see this we have to examine the two possibilities, max(f, g)(x) = f(x) or g(x). In the first case f(x) - f(x_0) = max(f, g)(x) - max(f, g)(x_0), so |f(x) - f(x_0)| < 1/m gives the result we want. In the second case we have g(x) ≥ f(x) and max(f, g)(x) - max(f, g)(x_0) = g(x) - f(x_0).

Now g(x) - f(x_0) ≥ f(x) - f(x_0) because g(x) ≥ f(x), while g(x) - f(x_0) ≤ g(x) - g(x_0) since f(x_0) ≥ g(x_0). So |g(x) - f(x_0)| ≤ max(|f(x) - f(x_0)|, |g(x) - g(x_0)|) < 1/m. The case f(x_0) < g(x_0) is the same with the roles of f and g exchanged, and the argument for min(f, g) is similar. QED

Theorem 4.2.1 If f and g are continuous, then max(f, g) and min(f, g) are continuous.

is continuous on [a, c]. The proof is left to the exercises.

The maximum and minimum of two continuous functions are also continuous. If f and g are defined on the same domain D, let max(f, g) denote the function on D that at x takes the value f(x) if f(x) ≥ g(x) or g(x) if g(x) ≥ f(x); define min(f, g) similarly. (Note that the graphs of f and g may cross each other infinitely often, so we cannot reduce this to a gluing argument.)

Most of the special functions, such as sin x, cos x, e^x, log x, are also continuous. We will prove this when we give the precise construction of these functions.

Continuous functions can also be created by gluing together continuous pieces. If f is defined and continuous on [a, b] and g is defined and continuous on [b, c] with f(b) = g(b), then

\[ h(x) = \begin{cases} f(x), & a \le x \le b, \\ g(x), & b \le x \le c, \end{cases} \]


It remains to see that f is continuous. But in fact f satisfies the Lipschitz condition |f(x) - f(y)| ≤ |x - y|, which implies continuity. We leave the verification of the Lipschitz condition to the exercises.

Figure 4.2.2:

which rises with slope 1 to the midpoint and falls with slope -1 to the end. For the infinite intervals (there are at most two of them, (a, ∞) and (-∞, b)) we keep the slope +1 or -1 throughout. The total picture might resemble that in Figure 4.2.2. From the construction it is clear that f = 0 exactly on the set A.
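The tent construction can be sketched in Python for the special case where the complement of A is a finite union of disjoint open intervals; the general case of countably many intervals is not covered here. The name `tent_function` and the representation of the complement as a list of (a, b) pairs are assumptions of this sketch.

```python
import math


def tent_function(gaps):
    """Return f with f = 0 exactly on A, where the complement of A is the
    union of the disjoint open intervals in `gaps`.

    Each gap is a pair (a, b); a may be -math.inf and b may be math.inf,
    but not both (A is assumed to be nonempty).
    """
    def f(x):
        for a, b in gaps:
            if a < x < b:
                if a == -math.inf:
                    return b - x          # slope -1 throughout (-inf, b)
                if b == math.inf:
                    return x - a          # slope +1 throughout (a, inf)
                return min(x - a, b - x)  # tent: up to the midpoint, then down
        return 0.0                        # x lies in A
    return f


if __name__ == "__main__":
    # Example: A = [-2, 0] union [1, 3]
    f = tent_function([(-math.inf, -2), (0, 1), (3, math.inf)])
    for x in (-5, -2, -1, 0.25, 0.5, 0.75, 2, 4):
        print(x, f(x))
```

On every gap the function has slope +1 or -1 and it vanishes on A, which is consistent with the Lipschitz condition with constant 1 mentioned in the text; the sketch does not prove that condition.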

Figure 4.2.1:

suppose we want to construct a continuous function f(x) on the line that is equal to zero exactly on a set A. We know (exercise set 4.1.5, number 1) that the set of solutions to f(x) = 0 is closed, so A will have to be closed, but no other restrictions on A are needed. To construct the function f we look at the complement of A, which is open. By our structure theorem for open sets, the complement of A is an at most countable union of disjoint open intervals so that the whole line is a disjoint union of A and some open intervals. The function f will be zero on A, and on each open interval we construct a tent (see Figure 4.2.1),


Proof: We use the divide and conquer method. We repeatedly bisect the interval, retaining the half that might contain a solution. In other words, fixing a value of y, we consider a sequence of intervals [a_1, b_1], [a_2, b_2], ... where [a_1, b_1] is the original interval [a, b] and [a_2, b_2] is one of the halves [a, (a + b)/2] or [(a + b)/2, b], chosen so that f(a_2) < y < f(b_2). Of course if f((a + b)/2) = y, then you are done; otherwise either f((a + b)/2) < y, in which case you take [(a + b)/2, b], or f((a + b)/2) > y, in which case you take [a, (a + b)/2]. Iterating this process we obtain either the solution we seek or a pair of sequences a_1, a_2, ... and b_1, b_2, ... such that f(a_k) < y < f(b_k), where b_k - a_k = 2^{1-k}(b - a). This condition shows that both sequences are Cauchy sequences and converge to a common limit x.

Theorem 4.2.2 (Intermediate Value Theorem) Let f be a continuous function on a domain containing [a, b], with say f(a) < f(b). Then for any y in between, f(a) < y < f(b), there exists x in (a, b) with f(x) = y.

Figure 4.2.3: graph of f over [a, b], with a horizontal line at height y between y_1 and y_2.

We now turn to the properties of continuous functions. One property that seems obvious is that a continuous function must pass through all intermediate values on its way from one value to another. If f(a) = y_1 and f(b) = y_2, with f continuous on [a, b] (or some larger domain) with say y_1 < y_2, then for any value of y in between, y_1 < y < y_2, there must exist at least one solution of f(x) = y in (a, b). This is certainly evident graphically, as shown in Figure 4.2.3: if we imagine a moving horizontal line going from height y_1 to height y_2, the intersections of this line with the graph of f give the solutions to y = f(x).


Next we consider some theorems concerning continuous functions on a compact domain. Since we have said that compact sets have properties analogous to finite sets, we would expect properties of all functions on a finite set to be valid for continuous functions on a compact set. Two

4.2.2 Continuous Functions on Compact Domains

and the expression in parentheses can be made less than one by taking |x| large enough, so that a_{n-1}x^{n-1} + ... + a_0 is too small to change the sign of a_n x^n. Thus since the polynomial of odd degree assumes positive and negative values, it must have a root (we can even give an upper bound for the absolute value of the root in terms of the coefficients, namely the smallest value of x that makes the expression in parentheses equal one). This argument does not work for polynomials of even degree, for the sign of the leading term is then always positive; this is as it should be, since such polynomials may (x^2 - 1) or may not (x^2 + 1) have real roots.
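The sign argument for odd-degree polynomials can be checked numerically. The sketch below does not compute the sharp bound mentioned in the text (the smallest x making the expression in parentheses equal to one); it uses the cruder value R = 1 + Σ|a_i/a_n|, which is an assumption of this sketch chosen because for |x| ≥ R the expression in parentheses is at most (R - 1)/R < 1. The names `root_bound` and `evaluate` are likewise hypothetical.

```python
def root_bound(coeffs):
    """coeffs = [a_n, a_(n-1), ..., a_0] with a_n != 0.

    Returns R >= 1 such that for |x| >= R the lower-order terms are
    smaller in absolute value than the leading term a_n * x**n.
    """
    lead = coeffs[0]
    if lead == 0:
        raise ValueError("leading coefficient must be nonzero")
    return 1 + sum(abs(c / lead) for c in coeffs[1:])


def evaluate(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


if __name__ == "__main__":
    p = [1, 0, -2, -5]  # x**3 - 2x - 5, odd degree with a_n > 0
    R = root_bound(p)   # 1 + (0 + 2 + 5) = 8
    # opposite signs at -R and R, so a real root lies in (-R, R)
    print(R, evaluate(p, -R), evaluate(p, R))
```

Since the polynomial is negative at -R and positive at R, the sign-change form of the intermediate value theorem places a real root between them.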

Another application of the intermediate value theorem is that theimage of a function defined on an interval is also an interval-the end­points being the sup and inf of the image, which may or may not belongto the image. We leave the proof for an exercise.

lan_lXn-1 + ... + aol ~ lan_lllxln-1 + + laols lanxnl (1 ~I x-II + 1~:2x-21 + + 1:: x-ni)

A frequently used special case of this theorem is the following: if acontinuous function changes sign on an interval, then it has a zero in theinterval. This is a valuable method for locating zeros of a function. Wecan use it to give a quick proof that a polynomial of odd degree (withreal coefficients) has a real root. Ifthe polynomial is anXn+lln_lXn-1 +

... + ao with say an > Oand n odd, then it is not hard to see that thesign is determined by the leading term if [z I is large-positive for x > Oand negative for x < Obecause n is odd. Indeed

Since f is continuous, f(x) = limk_oo f(ak) = limk_oo f(bk) whilefrom /(ak) < y < f(bk) we obtain f(x) ~ y ~ f(x) in the limit, hencef(x) = y. QED
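The sign-change special case of the intermediate value theorem lends itself to a short numerical sketch. The Python below is an illustration only and not part of the text: the function name `odd_poly_root`, the tolerance, and the sample polynomial are assumptions chosen here. It brackets a real root of an odd-degree polynomial using a crude bound R = 1 + Σ|a_k/a_n| (cruder than the smallest bound mentioned in the text), beyond which the leading term dominates, and then repeatedly halves the interval while keeping a sign change inside it.

```python
def odd_poly_root(coeffs, tol=1e-12, max_iter=200):
    """Approximate a real root of a_0 + a_1 x + ... + a_n x^n with n odd.

    coeffs is [a_0, a_1, ..., a_n]; this helper is a hypothetical
    illustration, not a routine from the text.
    """
    n = len(coeffs) - 1
    a_n = coeffs[-1]
    if n % 2 == 0 or a_n == 0:
        raise ValueError("need an odd-degree polynomial with a_n != 0")

    def p(x):
        # Horner evaluation of the polynomial.
        value = 0.0
        for a in reversed(coeffs):
            value = value * x + a
        return value

    # Crude bound: for |x| >= R the lower-order terms are smaller in
    # absolute value than |a_n x^n|, so p(R) and p(-R) have opposite signs.
    R = 1.0 + sum(abs(a / a_n) for a in coeffs[:-1])

    # Arrange the endpoints so that p(neg) < 0 < p(pos).
    neg, pos = (-R, R) if a_n > 0 else (R, -R)

    for _ in range(max_iter):
        if abs(pos - neg) <= tol:
            break
        mid = (neg + pos) / 2.0
        if p(mid) < 0:
            neg = mid
        else:
            pos = mid
    return (neg + pos) / 2.0


root = odd_poly_root([-5.0, -2.0, 0.0, 1.0])  # the polynomial x^3 - 2x - 5
print(root)
```

The halving step needs only the sign of the polynomial at the midpoint, so the same loop would apply to any continuous function that changes sign on the starting interval.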


very obvious properties of all functions on a finite set are that they are bounded and attain their maximum and minimum values.

Theorem 4.2.3 Let f be a continuous function with domain D that is compact. Then f is bounded and there exist points y and z in D (not necessarily unique) such that f(y) = sup{f(x) : x in D} and f(z) = inf{f(x) : x in D}.

Proof: If f were not bounded above, there would exist a sequence of points x_1, x_2, ... in D such that f(x_j) ≥ j. Since D is compact, we could find a subsequence converging to a point x_0 in D. Since f is continuous at x_0, it would have to take this convergent subsequence to a convergent sequence, which contradicts f(x_j) ≥ j. Similarly f is bounded below.

Since the set {f(x) : x in D}, the image of f, is bounded above, it has a finite sup, and there must exist a sequence of values {f(x_j)} converging to this sup. Again by the compactness of D, there must exist a convergent subsequence of x_1, x_2, ...; say x'_1, x'_2, ... converges to y in D. Then {f(x'_j)} is a subsequence of {f(x_j)} and so also converges to the sup. By the continuity of f, f(y) = f(lim_{j→∞} x'_j) = lim_{j→∞} f(x'_j), which equals the sup. QED

A function defined on a finite set takes on only a finite number of values; in other words, its image is also a finite set. To obtain the correct analogy we must replace both finite sets by compact sets.

Theorem 4.2.4 If f is a continuous function on a compact domain, then the image of f is compact.

Proof: Let f(D) denote the image. To show it is compact we will show that every sequence in f(D) has a limit-point in f(D). Now the points in f(D) are the values f(x), so a sequence in f(D) has the form f(x_1), f(x_2), ... for points x_1, x_2, ... in D. If the value f(x_1) is assumed more than once then we could change x_1 without changing f(x_1), so the sequence x_1, x_2, ... is not uniquely determined by the sequence of values f(x_1), f(x_2), .... But this turns out not to matter. The important thing is that we can pass from a sequence of values in the image to some sequence of points in the domain. Since the domain is compact, there is a limit-point x of the sequence x_1, x_2, ...


in D or, equivalently, there is a subsequence x'_1, x'_2, ... that converges to x. Then f(x'_1), f(x'_2), ... is a subsequence of the given sequence of values in the image, and by the continuity of f it converges to f(x). Thus f(x) is a point in the image that is a limit-point of the sequence f(x_1), f(x_2), .... This shows the image is compact. QED

Note that the image of a closed set under a continuous function is not necessarily closed. As an example take f(x) = 1/(1 + x^2) defined on the whole line, shown in Figure 4.2.4. Then 0 is a limit-point of the image but is not in the image, which is (0, 1]. This gives an example of a continuous function on a closed set that does not attain its inf.

Figure 4.2.4: graph of f(x) = 1/(1 + x^2).

Finally we have a result without any analogue for functions on finite sets: the uniform continuity of continuous functions on compact sets. If you look back at the examples of functions that are not uniformly continuous (Figure 4.1.3, or exercise set 4.1.5, number 9) you will notice that the domains are not compact.

Theorem 4.2.5 (Uniform Continuity Theorem) Let f be a function on a compact domain D that is continuous. Then it is uniformly continuous.

Proof: First let us recall what the issue is here. For f to be continuous means that given the error 1/m and the point x_0, we can find an error 1/n such that |x - x_0| < 1/n and x in the domain implies |f(x) - f(x_0)| < 1/m. We do not know how the error 1/n varies with the point x_0. Uniform continuity means that we can find a value of 1/n that will work for all points x_0 in the domain. This is what we are going to prove must be true if the domain is compact.


4.2.3 Monotone Functions

We conclude this section with a discussion of monotone functions. We have seen that bounded monotone sequences have limits, so we would

Let us consider what would have to be true for uniform continuity to fail. To negate a statement that begins "for all 1/m there exists 1/n", we have to begin with "there exists 1/m such that for all 1/n". The negation of "|x - y| < 1/n implies |f(x) - f(y)| < 1/m" would be an example of two points x, y in the domain such that |x - y| < 1/n and |f(x) - f(y)| ≥ 1/m. Since there must be one such example for each 1/n, we should label the points x_n, y_n. We now have the full statement that f is not uniformly continuous: there exists 1/m such that for all 1/n there exist two points x_n, y_n in the domain such that |x_n - y_n| < 1/n but |f(x_n) - f(y_n)| ≥ 1/m. We have to show that this leads to a contradiction and so is impossible.

Since D is assumed compact, the obvious first step is to replace the sequences x_1, x_2, ... and y_1, y_2, ... by convergent subsequences. The condition |x_n - y_n| < 1/n implies that both subsequences converge to the same limit, call it x_0. Calling the subsequences x'_1, x'_2, ... and y'_1, y'_2, ... we have |f(x'_n) - f(y'_n)| ≥ 1/m and yet both lim_{n→∞} f(x'_n) = f(x_0) and lim_{n→∞} f(y'_n) = f(x_0) by the continuity of f at the point x_0. This is a contradiction; we cannot have lim_{n→∞}(f(x'_n) - f(y'_n)) = f(x_0) - f(x_0) = 0 and |f(x'_n) - f(y'_n)| ≥ 1/m for all n. QED

These theorems concerning continuous functions on compact sets can be used in a relative way for continuous functions on non-compact domains. Suppose for example that f is a continuous function on an open interval (a, b). Then the restriction of f to any compact subinterval [c, d] is a continuous function on a compact set. Thus f is bounded, uniformly continuous, and attains its sup and inf on the set [c, d]. However, all these statements must be interpreted relative to the domain [c, d] and say something different from what they would say for the domain (a, b), where they may be false. For example, f(x) = 1/x on the domain (0, ∞) is continuous but is unbounded and not uniformly continuous. On the domain [c, d], however, for 0 < c < d < ∞ the function f(x) = 1/x shown in Figure 4.2.5 is bounded and uniformly continuous, attaining its sup at x = c and its inf at x = d.
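The contrast between the domains (0, ∞) and [c, d] for f(x) = 1/x can be checked numerically. The following Python sketch is an illustration only; the helper names, the sample interval [0.5, 2], and the error 1/m = 1/10 are assumptions made here, not taken from the text. It relies on the identity |1/x - 1/y| = |x - y|/(xy).

```python
def f(x):
    return 1.0 / x


def witness_pair(n):
    """Points of (0, oo) with |x - y| < 1/n but |f(x) - f(y)| = n >= 1."""
    return 1.0 / n, 1.0 / (2 * n)


def delta_on_compact(c, m):
    """A value of 1/n that works on [c, d] for the error 1/m.

    For x, y >= c we have |1/x - 1/y| = |x - y|/(xy) <= |x - y|/c^2,
    so |x - y| < c^2/m forces |f(x) - f(y)| < 1/m.
    """
    return c * c / m


# On (0, oo): the gap between the points shrinks while the gap in values grows.
for n in (1, 10, 100):
    x, y = witness_pair(n)
    print(n, abs(x - y), abs(f(x) - f(y)))

# On [c, d] = [0.5, 2]: one delta serves every pair of sample points.
c, d, m = 0.5, 2.0, 10
delta = delta_on_compact(c, m)
samples = [c + (d - c) * i / 1000 for i in range(1001)]
worst = max(
    abs(f(x) - f(y))
    for x in samples
    for y in samples
    if abs(x - y) < delta
)
print(delta, worst, 1.0 / m)
```

The pairs returned by `witness_pair` play the role of the points x_n, y_n in the negation of uniform continuity; on the compact interval no such pairs can be found for the sampled grid.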


Figure 4.2.5: graph of f(x) = 1/x, with the points c and d marked.

expect some analogous result for monotone functions. We say f is monotone increasing on a domain D if x < y for two points in the domain implies f(x) ≤ f(y), and monotone decreasing if x < y for two points in the domain implies f(x) ≥ f(y). For simplicity we will restrict our discussion to the case when the domain is an interval. Now a monotone function can have jumps (the signum, for example), so we cannot prove that a monotone function is continuous. The analogue of the convergence theorem for monotone sequences is that one-sided limits always exist, so a bounded monotone function has at worst jump discontinuities.

Theorem 4.2.6 (Monotone Function Theorem) Let f be a monotone function defined on an interval. Then the one-sided limits lim_{x→x_0^+} f(x) and lim_{x→x_0^-} f(x) both exist (allowing +∞ or -∞) at all points x_0 in the domain. These limits are finite except perhaps at the endpoints.


Proof: Let f be monotone increasing, and consider first a point x_0 in the interior of the domain. Then the sequence f(x_0 - 1), f(x_0 - 1/2), f(x_0 - 1/3), ... is monotone increasing (the first few terms may be undefined, but eventually x_0 - 1/n is in the domain since x_0 is an interior point) and so has a limit y that must be finite since f(x_0 - 1/n) ≤ f(x_0). We claim this value y must be lim_{x→x_0^-} f(x). The reason for this is simply that any point x less than x_0 but near it must be squeezed between some x_n and x_{n+1} (writing x_n = x_0 - 1/n), and by the monotonicity f(x_n) ≤ f(x) ≤ f(x_{n+1}), which forces f(x) to be close to y. To make this more precise, suppose we are given an error 1/m. By the convergence of the sequence {f(x_0 - 1/n)} there must exist k such that y - 1/m ≤ f(x_0 - 1/n) ≤ y for all n ≥ k. Then if x_0 - 1/k < x < x_0 we have x_0 - 1/n ≤ x ≤ x_0 - 1/(n + 1) for some n ≥ k, so f(x_0 - 1/n) ≤ f(x) ≤ f(x_0 - 1/(n + 1)), which implies y - 1/m ≤ f(x) ≤ y. Thus lim_{x→x_0^-} f(x) = y.
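The construction in this argument can be traced numerically. The Python sketch below is illustrative only: the step-like function `f`, the point x_0 = 1, and the list of values of n are assumptions chosen here. It evaluates f along x_0 - 1/n, whose values increase toward the left limit, and, symmetrically, along x_0 + 1/n, whose values decrease toward the right limit.

```python
def f(x):
    """A monotone increasing function with a jump of size 1 at x = 1."""
    return x if x < 1.0 else x + 1.0


X0 = 1.0
ns = [1, 10, 100, 1000, 10000]

# Values along x_0 - 1/n: increasing in n and bounded above by f(X0).
left_values = [f(X0 - 1.0 / n) for n in ns]
# Values along x_0 + 1/n: decreasing in n and bounded below by f(X0).
right_values = [f(X0 + 1.0 / n) for n in ns]

for n, lo, hi in zip(ns, left_values, right_values):
    print(n, lo, hi)

# Rough estimates of the one-sided limits and of the jump at X0.
left_limit_estimate = left_values[-1]
right_limit_estimate = right_values[-1]
jump_estimate = right_limit_estimate - left_limit_estimate
print(jump_estimate)
```

Monotonicity is what makes the sampled values a reliable guide here: every other point near x_0 is squeezed between two consecutive sample points, exactly as in the proof.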

A similar argument shows the existence of lim_{x→x_0^+} f(x), where this time we use the convergence of the monotone decreasing sequence f(x_0 + 1), f(x_0 + 1/2), f(x_0 + 1/3), .... At the endpoints of the interval we can also show the existence of the one-sided limit, allowing the possibility of +∞ and -∞. We leave the details as an exercise. QED

Corollary 4.2.1 A monotone function on an open interval is continuous at all points except at an at most countable number of points where it has a jump discontinuity.

Proof: We have to show that there are at most a countable number of points of discontinuity. Let us define the jump at a jump discontinuity x_0 to be lim_{x→x_0^+} f(x) - lim_{x→x_0^-} f(x). If the function is monotone increasing, then the jumps are all positive and it would seem plausible that the sum of all the jumps between a and b should be at most f(b) - f(a). We will use this idea cautiously in fashioning the proof.

Let [c, d] be any compact interval contained in the domain, and consider the set of jump discontinuities in [c, d] for which the jump exceeds 1/m. We claim this set is finite; if we can prove it we are done because the set of all discontinuities is a countable union of such sets, where we vary 1/m and [c, d] over countable sets.

In fact we can show that the number of jump discontinuities in (c, d) with jump exceeding 1/m is bounded by m(f(d) - f(c)). Suppose x_1, ..., x_n are distinct jump discontinuities in (c, d). Then we can


shuffle them inside a sequence c, y_1, ..., y_{n-1}, d so that c < x_1 < y_1 < x_2 < y_2 < ... < y_{n-1} < x_n < d. Because the function is monotone increasing, the jump at x_1 is bounded above by f(y_1) - f(c) (see Figure 4.2.6), the jump at x_2 is bounded above by f(y_2) - f(y_1), and so on. We leave the proof of this as an exercise for the reader. Adding up, the sum of the jumps at the points x_1, ..., x_n is at most f(d) - f(c). Thus if each jump exceeds 1/m, there are at most m(f(d) - f(c)) points. QED

Figure 4.2.6:

Figure 4.2.7: graph of f_k(x).

Despite the apparent simplicity of this result, you should not be lulled into thinking that the general monotone function is anything like the simple pictures you can draw. For example, if r_1, r_2, ... is an enumeration of the rational numbers and if f_k(x) is the function shown in Figure 4.2.7, which is one for x ≥ r_k and zero for x < r_k, then the function f(x) = Σ_{k=1}^∞ 2^{-k} f_k(x) is a monotone increasing function with a jump


discontinuity at every rational number. We postpone the discussion of this example to the chapter on infinite series.

4.2.4 Exercises

1. If f is monotone increasing on an interval and has a jump discontinuity at x_0 in the interior of the domain, show that the jump is bounded above by f(x_2) - f(x_1) for any two points x_1, x_2 of the domain surrounding x_0, x_1 < x_0 < x_2.

2. If f is monotone increasing on an interval (a, b), write out the complete proof that lim_{x→b} f(x) exists either as a real number or +∞.

3. If the domain of a continuous function is an interval, show that the image is an interval. Give examples where the image is an open interval.

4. If a continuous function on an interval takes only a finite set of values, show that the function is constant.

5. Suppose f and g both satisfy a Lipschitz condition on an interval (|f(x) - f(y)| ≤ M|x - y| for all x and y in the interval). Show that f + g also satisfies a Lipschitz condition.

6. Show that if f and g are bounded and satisfy a Lipschitz condition on an interval, then f·g satisfies a Lipschitz condition. Give a counterexample to show that it is necessary to assume boundedness.

7. Let f be a monotone function on an interval. Show that if the image of f is an interval, then f is continuous. Give an example of a non-monotone function on an interval whose image is an interval but that is not continuous.

8. Let f = p + g where p is a polynomial of odd degree and g is a bounded continuous function on the line. Show that there is at least one solution of f(x) = 0.

9. If f and g are uniformly continuous, show that f + g is uniformly continuous.


10. If f and g are uniformly continuous and bounded, show that f·g is uniformly continuous. Give a counterexample to show that it is necessary to assume boundedness.

11. If f is a continuous function on a compact set, show that either f has a zero or f is bounded away from zero (|f(x)| > 1/n for all x in the domain, for some 1/n).

12. If f and g are continuous functions and the domain of g contains the image of f, show that the composition g∘f defined by g∘f(x) = g(f(x)) is continuous. If f and g are uniformly continuous, is g∘f uniformly continuous? What about Lipschitz conditions?

13. If f is continuous on [a, b] and g is continuous on [b, c], show that

\[
h(x) = \begin{cases} f(x), & a \le x \le b, \\ g(x), & b \le x \le c, \end{cases}
\]

is continuous if and only if f(b) = g(b).

14. Show that the function constructed in Section 4.2.1 to vanish exactly on the closed set A satisfies a Lipschitz condition.

15. Give an example of a function on R that assumes its sup and inf on every compact interval and yet is not continuous.

16. Let f be a function defined on the extended reals R ∪ {±∞} but whose range is R, which is continuous in the usual sense for points in R, and lim_{x→±∞} f(x) = f(±∞). Prove that f is bounded and attains its sup and inf (possibly at the points ±∞). Prove that f is uniformly continuous when restricted to R.

17. Give an example of a function on R that has the intermediate value property for every interval (it takes on all values between f(a) and f(b) on a ≤ x ≤ b) but fails to be continuous at a point. Can such a function have jump discontinuities?


4.3 Summary

4.1 Concepts of Continuity

Definition 4.1.1 A function consists of a domain D, a range R, and a correspondence x → f(x) assigning a point f(x) of R to each point x of D. The image f(D) is the set of all values f(x). The function is onto if the image equals the range and is one-to-one if x ≠ y implies f(x) ≠ f(y).

Definition 4.1.2 A function f is said to be continuous at a point x_0 of its domain D if for every m there exists n such that |x - x_0| < 1/n and x in D implies |f(x) - f(x_0)| < 1/m. We say f is continuous if it is continuous at every point of D. We say f is uniformly continuous if for every m there exists n such that |x - y| < 1/n and x and y in D implies |f(x) - f(y)| < 1/m.

Definition 4.1.3 y = lim_{x→x_0} f(x) for x_0 a limit-point of the domain D means for every m there exists n such that |x - x_0| < 1/n, x ≠ x_0, and x in D implies |f(x) - y| < 1/m.

Theorem 4.1.1 lim_{x→x_0} f(x) exists if and only if f(x_1), f(x_2), ... converges for every sequence x_1, x_2, ... of points of D not equal to x_0 but converging to x_0.

Theorem 4.1.2 A function is continuous if and only if it takes convergent sequences to convergent sequences.

Theorem 4.1.3 A function on an open domain is continuous if and only if the inverse image of every open set is open.

Definition A function satisfies a Lipschitz condition if |f(x) - f(x_0)| ≤ M|x - x_0| for some M and all x and x_0 in the domain.

Definition A function f is said to have a limit from the right at x_0 equal to y, written lim_{x→x_0^+} f(x) = y, if for every 1/n there exists 1/m such that x_0 < x < x_0 + 1/m implies |f(x) - y| ≤ 1/n. Similarly we


define limits from the left, written lim_{x→x_0^-} f(x). We say f has a jump discontinuity at x_0 if it has limits from both sides at x_0 and they are different.

4.2 Properties of Continuous Functions

Theorem Continuity is preserved under addition, multiplication, and division (if the denominator never vanishes).

Theorem 4.2.1 If f and g are continuous, so are max(f, g) and min(f, g).

Theorem 4.2.2 (Intermediate Value Theorem) A continuous function f on a closed interval [a, b] assumes all values between f(a) and f(b).

Theorem A polynomial of odd degree has a real zero.

Theorem 4.2.3 A continuous function on a compact set is bounded and attains its sup and inf.

Theorem 4.2.4 The image of a continuous function on a compact set is compact.

Theorem 4.2.5 A continuous function on a compact set is uniformly continuous.

Theorem 4.2.6 A monotone function on an interval has one-sided limits at all points of the domain, finite except perhaps at the endpoints.

Corollary 4.2.1 A monotone function on an interval has at most a countable number of discontinuities, all of which are jump discontinuities.


Chapter 5

Differential Calculus

5.1 Concepts of the Derivative

5.1.1 Equivalent Definitions

In this chapter we are going to review the highlights of the differential calculus, supplying precise definitions and proofs for all results. From a computational point of view you will not learn very much new; you will still compute the derivative of (sin(sin 1/(1 + x^2)))^3 in the same way you always have. From a conceptual point of view, however, we will pursue two goals. First, we want to provide a sound logical foundation for this enormously successful branch of mathematics. (This is not to say that this is the only possible foundation or even that it is the foundation that Newton and Leibniz had in mind but were not able to formulate clearly. Another possible foundation can be based on Abraham Robinson's Non-Standard Analysis, but this is considerably more difficult to describe.) Our second goal is to clarify certain concepts in the rather simple one-dimensional case so that we will be better prepared to deal with the many beautiful and more complicated generalizations to higher dimensions and beyond.

The idea of the derivative comes from the intuitive concepts of rate of change, velocity, and slope of a curve, which are thought of as instantaneous or infinitesimal versions of the basic difference quotient (f(x) - f(x_0))/(x - x_0) where f is a function defined on a neighborhood of x_0. This is often written (f(x_0 + h) - f(x_0))/h, which is clearly equivalent if we set x - x_0 = h. The difference quotient has the immediate interpretation as the ratio of changes in the variables x and f(x) over the interval from x_0 to x, as shown in Figure 5.1.1.

Figure 5.1.1:

The intuitive idea of letting x get closer and closer to x_0 (without actually reaching it) is captured in the definition of the limit of a function: form q(x) = (f(x) - f(x_0))/(x - x_0) as a function of x, and take its limit as x → x_0. We call this limit, if it exists, the derivative f'(x_0) at x_0, and we say the function is differentiable at x_0. We only allow real number limits and explicitly exclude the possibility that f'(x_0) might equal +∞ or -∞ when we say f is differentiable at x_0. Note that our explicit refusal to consider the value at x = x_0 in the definition of the limit pays off here, since the difference quotient is undefined, (f(x_0) - f(x_0))/(x_0 - x_0), at x = x_0. Thus x_0 is not in the domain of the difference quotient q(x), which consists of the domain of f with x_0 removed.

This is the definition of derivative that is usually given in modern calculus books, without a complete explanation of the limit concept. Let us write out the complete statement defining f'(x_0) with the meaning of the limit made explicit:

Definition 5.1.1 Let f be a function defined in a neighborhood of x_0. Then f is said to be differentiable at x_0 with derivative equal to the real number f'(x_0) if for every error 1/m there exists an error 1/n such that


|x - x_0| < 1/n and x ≠ x_0 implies

\[
\left| \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0) \right| \le \frac{1}{m}.
\]

Now observe that since x - x_0 ≠ 0, we can multiply the inequality above by |x - x_0| to obtain the equivalent inequality

\[
|f(x) - f(x_0) - f'(x_0)(x - x_0)| \le \frac{1}{m} |x - x_0|.
\]

This new inequality admits an interesting interpretation. We think of f(x) - f(x_0) - f'(x_0)(x - x_0) as the difference of two functions: the original function f(x) and the function f(x_0) + f'(x_0)(x - x_0). Here x_0 is thought of as a constant, as are f(x_0) and f'(x_0), so this second function is simply an affine function ax + b, where a = f'(x_0) and b = f(x_0) - x_0 f'(x_0). We use the term affine rather than linear because we want to reserve the term linear for the special case b = 0. An affine function is one whose graph is a straight line, and any non-vertical straight line is the graph of an affine function (for the function to be linear the line must pass through the origin). Clearly the affine functions are extremely simple to understand; almost any question about an affine function that one can imagine asking can be answered by a simple computation.

Now the existence of the derivative of f at x_0 is a statement about the difference between the original function and the affine function

\[
g(x) = f(x_0) + f'(x_0)(x - x_0),
\]

which we will think of as a statement about how well g approximates f. The problem is that there are many different notions of approximation, and we must pick out the one that is pertinent here and distinguish it from others. To do this let us rewrite the definition in terms of the functions f and g: for every 1/m there exists 1/n such that |x - x_0| < 1/n implies |f(x) - g(x)| ≤ |x - x_0|/m. (According to the definition we should require x ≠ x_0, but it is trivially true that |f(x_0) - g(x_0)| ≤ 0 because f(x_0) = g(x_0) by the form of g.) Notice that this is a statement about what happens for x near x_0. It is a local approximation property. How close does x have to be to x_0? That will depend on the choice of 1/m, but it will never hurt us to make it closer, and we may be forced


to make it very close indeed. For a particular value of x, not equal to x_0, the statement may say nothing at all, since this value of x may not satisfy any of the conditions |x - x_0| < 1/n. It is only as x is varied closer to x_0 that the statement implies that g(x) approximates f(x) well.

Now let us look at the graphs of f(x) and the various affine functions that might be g(x) (see Figure 5.1.2). Since g(x) is supposed to be approximating f at x_0 we may as well have the graphs cross at x_0; in other words take f(x_0) = g(x_0). However, there are many affine functions whose graph crosses the graph of f at the point (x_0, f(x_0)). These functions have the form f(x_0) + a(x - x_0) for any real constant a. What distinguishes the unique correct choice? It is what we visually identify as tangency: the extremely close touching of the graphs for x near x_0.

Figure 5.1.2:

It is not just that f(x) - g(x) tends to 0 as x tends to x_0 (this would happen with any choice of a) but it goes to zero faster than for any other choice of a. In fact the condition "|x - x_0| < 1/n implies |f(x) - g(x)| ≤ |x - x_0|/m" means exactly that it goes to zero faster than |x - x_0|.

If we compare any two distinct affine functions g_1(x) = f(x_0) + a_1(x - x_0) and g_2(x) = f(x_0) + a_2(x - x_0) passing through (x_0, f(x_0)) we find |g_1(x) - g_2(x)| = |a_1 - a_2||x - x_0|, which goes to zero at a rate proportional to |x - x_0|. This is fundamentally different from the rate of vanishing of the difference f(x) - g(x). In fact it shows the uniqueness of the affine function g(x) = f(x_0) + f'(x_0)(x - x_0) corresponding to the


choice a = f'(x_0) and, hence, the uniqueness of the derivative. If a_1 is different from f'(x_0), then the difference f(x) - g_1(x) can be estimated from below by the triangle inequality,

\[
\begin{aligned}
|f(x) - g_1(x)| &= |(f(x) - g(x)) + (g(x) - g_1(x))| \\
&\ge |g(x) - g_1(x)| - |f(x) - g(x)| \\
&\ge |a_1 - f'(x_0)||x - x_0| - \frac{1}{m}|x - x_0|
\end{aligned}
\]

if |x - x_0| < 1/n. By taking 1/m less than |a_1 - f'(x_0)|/2 we have

\[
|f(x) - g_1(x)| \ge |a_1 - f'(x_0)||x - x_0|/2
\]

for all x satisfying |x - x_0| < 1/n for some value of 1/n. This means we can never make f(x) - g_1(x) go to zero faster than |x - x_0|. We can see this clearly in Figure 5.1.2 by the way the graphs of f(x) and g_1(x) cross cleanly (the technical term transversal is sometimes used to describe this).

We can formulate the condition that f(x) - g(x) vanishes at x = x_0 at a faster rate than |x - x_0| in a convenient manner by introducing "big Oh" and "little oh" notation.

Definition 5.1.2 Let f and g denote arbitrary functions defined near x = x_0. We say f(x) = O(g(x)) as x → x_0 (read f is "big Oh" of g) if there exist 1/n and a positive constant c such that |x - x_0| < 1/n implies |f(x)| ≤ c|g(x)| (or equivalently, the ratio f/g remains bounded for |x - x_0| < 1/n). We say f(x) = o(g(x)) as x → x_0 (read f is "little oh" of g) if for every 1/m there exists 1/n such that |x - x_0| < 1/n implies |f(x)| ≤ |g(x)|/m (or equivalently, lim_{x→x_0} f(x)/g(x) = 0). Note that o(g(x)) is a stronger statement than O(g(x)); o(g(x)) implies O(g(x)) but not conversely.

Usually the function g(x) in the definition is taken to be something relatively simple, such as a power of |x - x_0|, so that the condition gives a comparative statement concerning the size of f near x_0 and a standard of decay or growth. The simplest choice is g ≡ 1. Then f(x) = O(1) as x → x_0 means f is bounded near x = x_0, while f(x) = o(1) as x → x_0 means lim_{x→x_0} f(x) = 0. Continuity at x_0 can be expressed by f(x) - f(x_0) = o(1) as x → x_0. The choice g(x) = |x - x_0| enables us to


express differentiability as f(x) - h(x) = o(|x - x_0|) as x → x_0, where h is the affine function f(x_0) + f'(x_0)(x - x_0). For any other affine function h_1(x) = f(x_0) + a(x - x_0) we have merely f(x) - h_1(x) = O(|x - x_0|).

Let us define a best affine approximation to f at x_0 to be an affine function g(x) such that f(x) - g(x) = o(|x - x_0|) as x → x_0. We have seen that f is differentiable at x_0 if and only if it has a best affine approximation at x_0, in which case the best affine approximation is unique and equals f(x_0) + f'(x_0)(x - x_0). Thus the derivative here appears as the slope of the best affine approximation, and the graph of the best affine approximation is the tangent line to the graph of f(x) at the point (x_0, f(x_0)).

Having defined differentiability at a point, we define differentiability on an open set A simply to mean differentiability at every point of A. (One might think, in analogy with continuity, that one would also want to consider a stronger condition of uniform differentiability, where the relationship between the errors is specified independently of the point x_0. In this regard see exercise set 5.2.4, numbers 8 and 9.) Thus f is differentiable on A if for every x_0 in A there exists a constant f'(x_0) such that for every 1/m there exists 1/n (depending on 1/m and x_0) such that |x - x_0| < 1/n implies |f(x) - (f(x_0) + f'(x_0)(x - x_0))| ≤ |x - x_0|/m. Note that if f is defined and differentiable on A, then the derivative x_0 → f'(x_0) can also be viewed as a function defined on A. This is the point of view that we will adopt. Note that it requires a non-trivial change of perspective, since the definition of f'(x_0) involves holding x_0 fixed.

5.1.2 Continuity and Continuous Differentiability

Differentiability of f at x_0 implies that f is continuous at x_0. Indeed, continuity at x_0 would require that we can make f(x) - f(x_0) small, whereas differentiability at x_0 means we can make f(x) - f(x_0) - f'(x_0)(x - x_0) small. Since f'(x_0) is fixed, we can also make f'(x_0)(x - x_0) small and, hence, f(x) - f(x_0) small. More precisely, we choose 1/n so that |x - x_0| < 1/n implies |f(x) - f(x_0) - f'(x_0)(x - x_0)| ≤ |x - x_0|. Then by the triangle inequality we have

\[
|f(x) - f(x_0)| \le |f'(x_0)(x - x_0)| + |x - x_0| = (1 + |f'(x_0)|)|x - x_0|.
\]
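The notion of a best affine approximation, and the bound |f(x) - f(x_0)| ≤ (1 + |f'(x_0)|)|x - x_0| used to deduce continuity from differentiability, can be checked numerically. The Python below is a minimal sketch under assumptions made here: the sample function f(x) = x^2, the point x_0 = 1, the competing slope 3, and the helper names are all illustrative and do not come from the text. It compares the error ratio |f(x) - g(x)|/|x - x_0| for the tangent slope with the same ratio for another slope.

```python
def f(x):
    return x * x


X0 = 1.0
DERIVATIVE = 2.0  # f'(1) for f(x) = x^2


def affine(a, x):
    """Affine function through (X0, f(X0)) with slope a."""
    return f(X0) + a * (x - X0)


def error_ratio(a, x):
    """|f(x) - affine(a, x)| / |x - X0| for x != X0."""
    return abs(f(x) - affine(a, x)) / abs(x - X0)


for h in (1e-1, 1e-2, 1e-3):
    x = X0 + h
    tangent = error_ratio(DERIVATIVE, x)  # equals h up to rounding, tends to 0
    other = error_ratio(3.0, x)           # equals 1 - h, tends to |3 - f'(1)| = 1
    # Local Lipschitz-type bound that follows from differentiability at X0.
    bound_ok = abs(f(x) - f(X0)) <= (1.0 + abs(DERIVATIVE)) * abs(x - X0)
    print(h, tangent, other, bound_ok)
```

Only the tangent slope gives a ratio that tends to zero, which is the little-oh condition; any other slope leaves a ratio that tends to the nonzero difference of slopes, which is merely big-Oh.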


(In fact, this argument only uses the condition f(x) - f(x_0) - f'(x_0)(x - x_0) = O(|x - x_0|), which is weaker than differentiability.) This is a kind of local Lipschitz condition that implies continuity as follows: given 1/m, choose 1/k such that both k ≥ n and k > (1 + |f'(x_0)|)m. Then |x - x_0| < 1/k implies |f(x) - f(x_0)| ≤ (1 + |f'(x_0)|)|x - x_0| < 1/m.

Since differentiability at a point implies continuity at a point, it follows that differentiability on an open set implies continuity on that set. Later we will show that if the derivative is also bounded, then the function is uniformly continuous; in fact it will satisfy a Lipschitz condition.

Differentiability of a function implies continuity of the function, but it does not imply continuity of the derivative. This is a rather subtle point, since the obvious attempt to create a counterexample doesn't work. Since the simplest function that fails to be continuous is one with a jump discontinuity, such as the signum, it would seem plausible that to create a function with a discontinuous derivative one would take a function like |x| (shown in Figure 5.1.3) whose derivative is sgn x. The problem with this example is that the function is not differentiable at x = 0, the very point where the discontinuity in the derivative occurs. Indeed the difference quotient at x = 0 is +1 for positive values and -1 for negative values, so it can't have a limit (it has two distinct one-sided limits).

Figure 5.1.3: graph of f(x) = |x|.

No amount of patching will make this an example of a function differentiable at every point of an open set with a derivative that is discontinuous. In fact later we will prove that jump discontinuities never occur in a derivative that exists at every point. Thus we have to look for discontinuities of the second kind. The picture we have in mind is a function that oscillates more and more rapidly as x approaches x_0, but the size of the oscillations also decreases, tending to zero. We must then control the relative heights and widths of the oscillation rather carefully. The idea is that by making the heights decrease rapidly enough we can make the derivative exist and equal zero at the point x = x_0 and then by making the width decrease even more rapidly we can make the derivative discontinuous (even unbounded).

Figure 5.1.4:

An explicit example is the function

\[
f(x) = \begin{cases} x^2 \sin(1/x^2), & x \ne 0, \\ 0, & x = 0, \end{cases}
\]

shown in Figure 5.1.4. We may compute the derivative in the usual fashion for x ≠ 0 (we are assuming here that the usual laws of differential calculus are valid, facts that we will eventually prove). We find

\[
f'(x) = 2x \sin\frac{1}{x^2} - \frac{2}{x} \cos\frac{1}{x^2} \quad \text{if } x \ne 0;
\]

and this function is clearly unbounded as x → 0, so no way of defining f'(0) could possibly make it continuous. Notice here that the speed of oscillation of the factor sin(1/x^2) overcomes the decay of x^2 to produce the unbounded derivative.

The usual procedures of the differential calculus do not provide a computation of f'(0), let alone a guarantee that the derivative exists at


\[
\frac{f(x) - f(0)}{x - 0} = \frac{x^2 \sin(1/x^2) - 0}{x - 0} = x \sin\frac{1}{x^2},
\]

which clearly has limit equal to 0 since |sin(1/x^2)| ≤ 1. Thus f is differentiable at x = 0, and f'(0) = 0. This is an everywhere differentiable function with a derivative that is discontinuous at x = 0. (In fact, by modifying this example it is possible to produce an everywhere differentiable function whose derivative is nowhere continuous.)
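The behavior of f(x) = x^2 sin(1/x^2) near 0 can be explored numerically. The Python below is only a sketch: the helper names and the sample points x_k = 1/sqrt(2πk) are assumptions chosen here, and the formula in `f_prime` is the derivative for x ≠ 0 obtained from the usual product and chain rules. It shows the difference quotient at 0 being squeezed by |h|, while the derivative away from 0 grows without bound along the chosen points.

```python
import math


def f(x):
    return x * x * math.sin(1.0 / (x * x)) if x != 0 else 0.0


def difference_quotient_at_zero(h):
    """(f(h) - f(0)) / (h - 0), which equals h * sin(1/h^2) for h != 0."""
    return (f(h) - f(0.0)) / h


def f_prime(x):
    """Derivative for x != 0, from the product and chain rules."""
    inv_sq = 1.0 / (x * x)
    return 2.0 * x * math.sin(inv_sq) - (2.0 / x) * math.cos(inv_sq)


# The difference quotient at 0 is bounded by |h| in absolute value.
for h in (1e-1, 1e-2, 1e-3):
    print(h, difference_quotient_at_zero(h))

# Along x_k = 1/sqrt(2*pi*k) the cosine is close to 1, so f'(x_k) is
# close to -2*sqrt(2*pi*k), which is unbounded as k grows.
for k in (1, 100, 10000):
    x_k = 1.0 / math.sqrt(2.0 * math.pi * k)
    print(k, x_k, f_prime(x_k))
```

The two loops illustrate the two competing effects: the factor x^2 controls the difference quotient at 0, while the oscillation of sin(1/x^2) dominates the derivative at nearby points.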

A function whose derivative exists and is continuous is called continuously differentiable or of class $C^1$. We will see that many of the theorems of differential calculus do not require this hypothesis; nevertheless, it is a very frequently encountered condition, and one could make a case for the viewpoint that there is very little importance attached to the game of trying to eliminate this hypothesis from theorems. In deference to established traditions we will play the game for a while.

Before passing to the study of properties of differentiable functions, we will discuss briefly the intuitive notion of smoothness of the graph and its relation to differentiability. A rule of thumb that is frequently expounded in calculus courses is that a continuous function is one whose graph can be drawn without a break (without lifting pen from paper or chalk from blackboard), while a differentiable function is one whose graph is sufficiently smooth so that you can run your finger along it without getting cut. There is a good deal of truth to this maxim, although the example of $x^2 \sin(1/x^2)$ should convince you of the superficial nature of the assumption that you can always "draw" the graph of a function. Nevertheless, if you can draw the graph, then sharp corners do indicate points where the derivative fails to exist. However, there is another reason the derivative can fail to exist even when the graph is smooth: namely, the tangent can become vertical. An example is the function $f(x) = \sqrt[3]{x}$ defined for all real $x$ (the cube root of a negative number is negative), whose graph is the graph of $x = y^3$, as shown in Figure 5.1.5. At $x = 0$ the tangent is vertical and the derivative fails

$x = 0$. Indeed it would be natural to guess that the derivative could not possibly exist at $x = 0$ because of all the oscillations nearby. However, it turns out that the decay of the factor $x^2$ is enough to overwhelm the oscillations and produce a zero derivative at $x = 0$. In fact the difference quotient at $x = 0$ is


5.1.3 Exercises

1. Show that $f(x) = O(|x - x_0|^2)$ as $x \to x_0$ implies $f(x) = o(|x - x_0|)$ as $x \to x_0$, but give an example to show that the converse is not true.

2. Show that $f(x) = O(|x - x_0|^k)$ and $g(x) = O(|x - x_0|^k)$ imply $(f + g)(x) = O(|x - x_0|^k)$. Is the same true of "little oh"?

3. Show that $f(x) = O(|x - x_0|^k)$ and $g(x) = o(|x - x_0|^j)$ imply $f \cdot g(x) = o(|x - x_0|^{k+j})$.

4. Show that $f(x) = O(|x - x_0|^k)$ implies $f(x)/(x - x_0) = O(|x - x_0|^{k-1})$ if $k \ge 1$.

5. Show that if $f(x_0) = 0$ and $f(x) = o(|x - x_0|)$ as $x \to x_0$, then $f'(x_0)$ exists. What is $f'(x_0)$? What does this tell you about $x^2 \sin(1/x^{1,000})$?

Figure 5.1.5:

$f(x) = \sqrt[3]{x}$

to exist ($\frac{f(x) - f(0)}{x - 0} = \frac{\sqrt[3]{x}}{x} = \frac{1}{(\sqrt[3]{x})^2}$ tends to $+\infty$ as $x \to 0$), even though the graph is perfectly smooth.
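The blow-up of the difference quotient of the cube root at 0 is easy to observe. This is a small sketch; the helper `cbrt` (a real cube root that is negative for negative input) and the sample values of $h$ are our own choices.

```python
import math

def cbrt(x):
    """Real cube root; the cube root of a negative number is negative."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def diff_quotient_at_zero(h):
    """(cbrt(h) - cbrt(0)) / (h - 0), which equals 1 / cbrt(h)^2."""
    return (cbrt(h) - cbrt(0.0)) / h

# The quotient is positive on both sides of 0 and grows like |h|^(-2/3).
for h in [1e-2, 1e-4, 1e-6, -1e-6]:
    print(h, diff_quotient_at_zero(h))
```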


Definition 5.2.1 Let $f$ be a function defined in a neighborhood of a point $x_0$. We say $f$ is monotone increasing at $x_0$ if there exists a (perhaps smaller) neighborhood of $x_0$ such that $f(x_1) \le f(x_0) \le f(x_2)$ for all points $x_1$ and $x_2$ in the neighborhood satisfying $x_1 < x_0 < x_2$.

We say $f$ is strictly increasing at $x_0$ if there exists a neighborhood of $x_0$ such that $f(x_1) < f(x_0) < f(x_2)$ for all points $x_1$ and $x_2$ in the neighborhood satisfying $x_1 < x_0 < x_2$. We say that $f$ is monotone increasing on an interval if $f(x_1) \le f(x_2)$ for all points in the interval satisfying $x_1 < x_2$; $f$ strictly increasing on an interval is defined in the same way with strict inequalities $f(x_1) < f(x_2)$. We define monotone

The basic idea of the differential calculus is to relate properties of the function to properties of its best affine approximation at a point. The simplest such properties involve questions of increase, decrease, and maxima and minima. We study first these properties at a point.

5.2.1 Local Properties

5.2 Properties of the Derivative

9. A "zoom" on the graph of $y = f(x)$ near $(x_0, y_0)$ (with $y_0 = f(x_0)$) with magnification factor $M$ (the same in both $x$ and $y$ directions) is the graph of the function defined by $f(x_0 + x/M) = y_0 + y/M$. Prove that if $f$ is differentiable at $x_0$, then the zoom converges to the straight line through the origin with slope $f'(x_0)$, as $M \to \infty$. What happens to the zoom of $|x|$ near the origin?

8. Show that if $f$ is an affine function, it is equal to its own best affine approximation at every point. What does this tell you about the derivative of $f$?

7. Give an example of a differentiable function whose tangent line at a point fails to stay on one side of the graph (above or below) even locally (when restricted to any neighborhood of the point).

6. Show that $x \sin(1/x)$ fails to have a derivative at $x = 0$ and even the one-sided limits ($\lim_{x \to 0^+} (f(x) - f(0))/(x - 0)$, etc.) fail to exist.


The function is increasing if the graph passes from the left-down to the right-up quadrant, decreasing if the graph passes from left-up to right-down, and so on. For strict behavior we exclude the boundary of the quadrants. The condition "there exists a neighborhood of $x_0$" in all the definitions means we only look at a piece of the graph near the point.

One possible source of confusion is that the fact that $f$ is monotone increasing at $x_0$ is not the same as saying $f$ is monotone increasing on a neighborhood of $x_0$. To say $f$ is monotone increasing in a neighborhood of $x_0$ we would want to know $x_1 < x_2$ implies $f(x_1) \le f(x_2)$ for every $x_1$ and $x_2$ in the neighborhood, which is a stronger condition. For example, the zig-zag function illustrated in Figure 5.2.2 is monotone increasing

Figure 5.2.1: [quadrants around the point $x_0$: left-up, right-up, right-down]

These definitions are fairly obvious. One way to think of them is to draw the horizontal and vertical lines through the graph of $f$ at the given point, dividing the plane into four quadrants, as in Figure 5.2.1.

and strictly decreasing similarly, reversing the inequalities for $f(x)$. We say that $f$ has a local maximum at $x_0$ if there exists a neighborhood of $x_0$ such that $f(x) \le f(x_0)$ for all $x$ in the neighborhood. We say that $f$ has a strict local maximum at $x_0$ if there exists a neighborhood of $x_0$ such that $f(x) < f(x_0)$ for all $x$ not equal to $x_0$ in the neighborhood. We define local minimum and strict local minimum similarly, reversing the inequalities.


a. If $f'(x_0) > 0$, then $f$ is strictly increasing at $x_0$. Similarly, if $f'(x_0) < 0$, then $f$ is strictly decreasing at $x_0$.

b. If $f$ is monotone increasing at $x_0$, then $f'(x_0) \ge 0$. Similarly, if $f$ is monotone decreasing at $x_0$, then $f'(x_0) \le 0$.

Theorem 5.2.1 Let $f$ be defined in a neighborhood of $x_0$, and let $f$ be differentiable at $x_0$.

On an intuitive level we expect these local properties of a function to correspond to the same properties for the best affine approximation at the point in question, and the properties of the best affine approximation hinge on the sign of the derivative. Since the derivative is the limit of the difference quotient $(f(x) - f(x_0))/(x - x_0)$ and the sign of the difference quotient is related to the relative values of $f(x)$ and $f(x_0)$, it is not difficult to establish the basic facts. Since there is a loss of information in passing to the limit, in that strict inequalities may not be preserved, we can't get a perfect match-up of conditions. Let us start with the true implications.

Figure 5.2.2:

at $x = 0$, but it is not monotone increasing in any neighborhood of 0. The same remark applies to strict increasing.


Almost as important as what the theorem says is what it does not say. If the function is strictly increasing we cannot conclude that the derivative is positive, as the function $f(x) = x^3$ at $x = 0$ shows (see Figure 5.2.3). Similarly if the derivative is zero at a point we cannot draw any conclusions. Also it is necessary to assume the function is

Figure 5.2.3:

c. If $f$ has a local maximum or minimum at $x_0$, then $f'(x_0) = 0$.

Proof:

a. Since the limit of the difference quotient is strictly positive, there must be a neighborhood of $x_0$ in which the difference quotient is strictly positive. For $x$ in this neighborhood $(f(x) - f(x_0))/(x - x_0) > 0$, so $f(x) - f(x_0) > 0$ if $x > x_0$ while $f(x) - f(x_0) < 0$ if $x < x_0$, showing that $f$ is strictly increasing at $x_0$.

b. If $f$ is monotone increasing at $x_0$, then there exists a neighborhood of $x_0$ for which the difference quotient is $\ge 0$. Since non-strict inequality is preserved in the limit, the derivative at $x_0$ is also $\ge 0$.

c. Suppose $f$ has a local maximum at $x_0$. Then the difference quotient formed for $x < x_0$ will be $\ge 0$, while the difference quotient formed for $x > x_0$ will be $\le 0$, for $x$ in a neighborhood of $x_0$. Since we can take the limit from either side, the derivative at $x_0$ must be both $\ge 0$ and $\le 0$, hence zero. QED


We turn now to properties of functions that are differentiable on an open interval. The two main theorems are the intermediate value theorem (not to be confused with the theorem of the same name in Section

5.2.2 Intermediate Value and Mean Value Theorems

Thus Fermat looks at the equation $f(x + h) = f(x)$, finds the non-zero solution $h$ ($h = 0$ is always a solution), expresses $x$ as a function of $h$, and then sets $h = 0$ (strictly speaking one must take the limit as $h \to 0$). Thus for $f(x) = x^2$ his method sets $(x + h)^2 = x^2$. The two solutions are $h = 0$ and $h = -2x$. For the non-zero solution we find $x = -h/2$; and when $h \to 0$ we find the minimum at $x = 0$. For $f(x) = x^3$ the equation $(x + h)^3 = x^3$ has only $h = 0$ as a real root, so Fermat's method does not get fooled by the critical point $x = 0$, which is neither a maximum nor a minimum. Despite the fact that Fermat's method has this advantage over the usual method, it has been largely forgotten because it requires the explicit solution of the equation $f(x + h) = f(x)$, which is often intractable.
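Fermat's recipe can be traced step by step for a general quadratic $f(x) = ax^2 + bx + c$, where the equation $f(x + h) = f(x)$ can be solved by hand. The sketch below is our own illustration (the function names are hypothetical, and the quadratic family is chosen only because the algebra is explicit): since $f(x+h) - f(x) = h(2ax + ah + b)$, the non-zero root is $h = -(2ax + b)/a$.

```python
from fractions import Fraction

def quadratic(a, b, c, x):
    """f(x) = a x^2 + b x + c."""
    return a * x * x + b * x + c

def fermat_nonzero_h(a, b, x):
    """Non-zero solution h of f(x + h) = f(x) for the quadratic above.

    f(x + h) - f(x) = h (2 a x + a h + b), so h = -(2 a x + b) / a.
    The constant term c cancels and plays no role.
    """
    return -(2 * a * x + b) / a

def fermat_x_from_h(a, b, h):
    """Invert h = -(2 a x + b) / a to express x as a function of h."""
    return -(a * h + b) / (2 * a)

# f(x) = x^2: the non-zero solution is h = -2x, as in the text.
print(fermat_nonzero_h(Fraction(1), Fraction(0), Fraction(3)))

# f(x) = x^2 - 4x: setting h = 0 locates the minimum.
print(fermat_x_from_h(Fraction(1), Fraction(-4), Fraction(0)))
```

Exact rational arithmetic (`Fraction`) is used so that the identity $f(x + h) = f(x)$ can be checked without rounding error.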

Figure 5.2.4:

differentiable at the point in question; $f(x) = |x|$ has a local minimum at $x = 0$ but is not differentiable there.

Part c of the theorem will be extremely useful to us and forms the basis of many familiar applications of calculus. It is usually attributed to Fermat, but in fact Fermat developed a somewhat different method for locating maxima and minima. Fermat observed that in a neighborhood of a strict local maximum (say $f(x_0) = M$), the function assumes smaller values exactly twice but the maximum value only once, as in Figure 5.2.4.


To find a solution to $f'(x) = 0$ we need only show that $f$ has a local maximum or minimum in the open interval $(x_1, x_2)$. In the case we are considering we can show it must have at least one local minimum (it may, of course, have more than one, and it may have local maxima as well). To find a local minimum we look for a point where $f$ attains its inf

Figure 5.2.5:

Proof: First let us prove the theorem in the case when the value we want $f'(x)$ to assume is zero. This means zero must lie between $f'(x_1)$ and $f'(x_2)$, so one must be positive and one must be negative, say $f'(x_1) < 0 < f'(x_2)$. This is shown in Figure 5.2.5.

Intermediate Value Theorem Let $f(x)$ be differentiable on an open interval $(a, b)$. Then its derivative has the intermediate value property: if $x_1 < x_2$ are any two points in the interval, then $f'(x)$ assumes all values between $f'(x_1)$ and $f'(x_2)$ on the interval $(x_1, x_2)$.

4.2.1) and the mean value theorem. The proofs are quite similar, based on the observation that to get a solution of $f'(x) = 0$ we can take a local maximum or minimum and then by subtracting an appropriate affine function we can get $f'(x)$ to take on other values. The intermediate value theorem is something of a curiosity, since its conclusion is a consequence of the other intermediate value theorem if we assume the derivative is continuous, which we will frequently do for other reasons. The mean value theorem, on the other hand, is one of the most useful theorems in analysis. It turns out that its proof is not made any simpler by assuming the derivative is continuous.


Thus we have to use the hypotheses $f'(x_1) < 0 < f'(x_2)$ to show that the inf cannot occur at either endpoint. It can't occur at the left endpoint $x_1$, because there the function is strictly decreasing, so $f(x) < f(x_1)$ for $x$ in the interval near $x_1$, and it can't occur at the right endpoint $x_2$, because there the function is strictly increasing, so $f(x) < f(x_2)$ for $x$ in the interval near $x_2$. (If we had assumed the reverse inequalities $f'(x_2) < 0 < f'(x_1)$, then a similar argument would show that $f$ can't attain its sup at either endpoint.)

We have thus proved that $f'(x)$ attains an intermediate value if that value happens to be zero. Now we need to see how the general case can be reduced to this special case. Suppose we want to show that $f'(x)$ can attain the value $y_0$, where say $f'(x_1) < y_0 < f'(x_2)$. The linear function $g(x) = y_0 x$ has derivative everywhere equal to $y_0$, so we need to solve $F'(x) = 0$ where $F = f - g$ (we are using here the elementary property $F' = f' - g'$, whose proof will be given in Section 5.3.1). But $F'(x_1) = f'(x_1) - y_0 < 0$ and $F'(x_2) = f'(x_2) - y_0 > 0$, so we can apply the previous argument to $F$. QED
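The reduction in this proof can be imitated numerically: to find a point where $f'$ takes the value $y_0$, minimize $F(x) = f(x) - y_0 x$ on $[x_1, x_2]$. The sketch below uses our own choices (the test function $f(x) = x^3$, the crude grid search, and the names `point_with_derivative` and `steps`), and a grid search only approximates the true minimizer.

```python
def f(x):
    return x ** 3

def fprime(x):
    return 3 * x ** 2

def point_with_derivative(y0, x1, x2, steps=20000):
    """Approximate minimizer of F(x) = f(x) - y0 * x on [x1, x2] by grid search.

    Assumes f'(x1) < y0 < f'(x2), so the minimum of F is interior.
    """
    best_x, best_val = x1, f(x1) - y0 * x1
    for i in range(1, steps + 1):
        x = x1 + (x2 - x1) * i / steps
        val = f(x) - y0 * x
        if val < best_val:
            best_x, best_val = x, val
    return best_x

# f'(0) = 0 < 1 < 3 = f'(1), so f' should take the value 1 inside (0, 1).
x_star = point_with_derivative(1.0, 0.0, 1.0)
print(x_star, fprime(x_star))
```

Here the exact answer is $x = 1/\sqrt{3}$, and the grid search lands within about one grid step of it.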

Figure 5.2.6: inf $\neq$ local minimum

on the closed interval $[x_1, x_2]$. Here we use the fact that differentiability implies continuity of $f$, and then the fact that continuous functions on compact intervals attain their inf. If the inf is attained at an interior point $x_0$ of the interval, then $x_0$ is a local minimum (because $(x_1, x_2)$ is a neighborhood of $x_0$ such that $f(x) \ge f(x_0)$ for all points in the neighborhood). However, if the inf were attained at an endpoint we could not assert that we have a local minimum, as in the example $f(x) = x$ (Figure 5.2.6).


This intuition can be made to form the basis of the proof if we first reduce to the case where the lines are all horizontal and then notice that a horizontal line loses contact with the graph at exactly the local maximum and minimum points.

Figure 5.2.7:

The great significance of the mean value theorem is that it enables us to obtain information about the derivative from the computation of a single difference quotient, without passing to the limit. Of course we lose the precision of knowing the point $x_0$ exactly. The intuitive content of the theorem is that the difference quotient is the slope of the secant line joining the two points $(a, f(a))$ and $(b, f(b))$ of the graph, while the derivative is the slope of the tangent line. If we translate the secant line parallel to itself up or down, then at the moment it loses contact with the graph it should be tangent to it, as shown in Figure 5.2.7.

Mean Value Theorem Let $f$ be a continuous function on a compact interval $[a, b]$ that is differentiable at every point in the interior. Then there exists a point $x_0$ in the interior where $f'(x_0) = (f(b) - f(a))/(b - a)$.

The intermediate value theorem explains why it is so tricky to get a function whose derivative is discontinuous: the discontinuity can't be a jump because then the intermediate value property would be violated. The proof of the intermediate value theorem is a good warm-up exercise for the proof of the mean value theorem.


is the formula for the affine function, and one can then verify alge-

$$g(x) = \frac{f(b) - f(a)}{b - a}(x - a) + f(a)$$

But clearly $F$ vanishes at both endpoints because $f$ and $g$ attain the same values at the endpoints. Thus $F(b) = F(a)$, so the previous argument applies.

You can easily verify that

Figure 5.2.8: [graph of $y = f(x)$]

Proof: We begin with the special case when $f(a) = f(b)$, so the difference quotient is zero (this case is known as Rolle's Theorem). Then we are looking for a point where $f'(x) = 0$, so it will suffice to find a local maximum or minimum. We have assumed that $f$ is continuous on the compact interval $[a, b]$, so we know that $f$ attains its sup or inf. If either happens at an interior point we have our local maximum or minimum. Thus we need only consider the special case where $f$ attains both its sup and inf at the endpoints. This is not an impossible occurrence (for example, $f(x) = x$); but because we have assumed $f(b) = f(a)$, this implies $f$ must be constant on $[a, b]$, so $f'(x) = 0$ at any point in the interval.

Next we reduce the general case to the special case. Let $g(x)$ be the affine function passing through the two points $(a, f(a))$ and $(b, f(b))$, as shown in Figure 5.2.8. The slope of the graph is obviously the difference quotient, so $g(x)$ has derivative equal to $(f(b) - f(a))/(b - a)$ at every point. By subtracting it off, $F = f - g$, we reduce the problem of solving $f'(x_0) = (f(b) - f(a))/(b - a)$ to that of solving $F'(x_0) = 0$.


Proof:

a. If $f$ is monotone increasing on the interval, then the same argument as in the pointwise case (the difference quotients are all $\ge 0$ and non-strict inequality is preserved in the limit) shows that $f'(x_0) \ge 0$ at any particular point $x_0$ in the interval. Conversely, suppose $f'(x) \ge 0$ on the interval. Let $x_1$ and $x_2$ be two points in the interval with $x_1 < x_2$. Then apply the mean value theorem to $f$ on the closed interval $[x_1, x_2]$. Notice that the continuity hypothesis of the mean value theorem is satisfied since differentiability implies continuity. The mean value theorem gives us the identity $f'(x_0) = (f(x_2) - f(x_1))/(x_2 - x_1)$ for some point $x_0$. Since $f'(x_0) \ge 0$, we obtain $f(x_2) - f(x_1) \ge 0$, proving that $f$ is monotone increasing.

b. We argue exactly as in part a to get $f'(x_0) = (f(x_2) - f(x_1))/(x_2 - x_1)$. Since now we are assuming $f'(x_0) > 0$, we obtain $f(x_2) - f(x_1) > 0$, proving that $f$ is strictly increasing.

c. This time from $f'(x_0) = (f(x_2) - f(x_1))/(x_2 - x_1)$ and $f'(x_0) = 0$ we obtain $f(x_2) - f(x_1) = 0$. Since this is true for any two points in the interval, $f$ is constant. QED

Theorem 5.2.2 Let $f$ be a differentiable function on the interval $(a, b)$.

a. $f$ is monotone increasing on $(a, b)$ if and only if $f'(x) \ge 0$ for every $x$ in the interval. Similarly $f$ is monotone decreasing on $(a, b)$ if and only if $f'(x) \le 0$ on the interval.

b. If $f'(x) > 0$ for every $x$ in the interval, then $f$ is strictly increasing on the interval. Similarly $f'(x) < 0$ implies that $f$ is strictly decreasing.

c. If $f'(x) = 0$ for every $x$ in the interval, then $f$ is constant on the interval.

Using the mean value theorem we can obtain relations between the derivative on an interval and the behavior of the function on the interval.

5.2.3 Global Properties

braically that $g'(x) = (f(b) - f(a))/(b - a)$ while $f = g$ at $x = a$ and $x = b$, facts that are evident from the graph. QED
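The construction in this proof, subtracting the secant line and looking for an interior extremum of $F = f - g$, can be tried out numerically. The following is a sketch under our own assumptions: the example $f = \exp$ on $[0, 1]$, the grid search, and the name `mean_value_point` are illustrative, and the grid only approximates the extremum.

```python
import math

def mean_value_point(f, a, b, steps=20000):
    """Approximate an interior extremum of F = f - g, where g is the secant line.

    F vanishes at both endpoints, so the largest value of |F| on the grid
    is attained at an interior point (or F is zero on the whole grid).
    Returns that point together with the secant slope.
    """
    slope = (f(b) - f(a)) / (b - a)

    def F(x):
        return f(x) - (slope * (x - a) + f(a))

    best_x = (a + b) / 2
    best_abs = abs(F(best_x))
    for i in range(1, steps):
        x = a + (b - a) * i / steps
        if abs(F(x)) > best_abs:
            best_x, best_abs = x, abs(F(x))
    return best_x, slope

# For f = exp the derivative is exp itself, so f'(x0) = exp(x0).
x0, slope = mean_value_point(math.exp, 0.0, 1.0)
print(x0, math.exp(x0), slope)
```

In this example the exact point is $x_0 = \ln(e - 1)$, where the tangent slope $e^{x_0}$ equals the secant slope $e - 1$.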


5.2.4 Exercises

1. Let $f$ and $g$ be continuous functions on $[a, b]$ and differentiable at every point in the interior, with $g(a) \neq g(b)$. Prove that there

Because of the greater flexibility that corners allow, there are some applications where it is important to allow Lipschitz functions. (A very profound theorem of Lebesgue, well beyond the scope of this work, says that Lipschitz functions must be differentiable at some points and, in fact, at most points in a certain well-defined sense.)

Figure 5.2.9:

Notice that we have done somewhat better in part a than in the pointwise case (from $f'(x_0) \ge 0$ at one point we can deduce nothing): we have an if and only if statement. The converse of part b is of course false, as the familiar example $f(x) = x^3$ shows. One can in fact show that $f'(x) \ge 0$ on an interval implies that $f$ is strictly increasing unless $f'(x) = 0$ on an open subinterval. We leave this as an exercise. Part c can be used to prove the uniqueness of the indefinite integral.

Another easy application of the mean value theorem shows that a differentiable function with bounded derivative satisfies a Lipschitz condition. Indeed, suppose $f$ is continuous on $[a, b]$ and differentiable in the interior and $|f'(x)| \le M$ for all $x$ in $(a, b)$. Then for any $x_1, x_2$ in the interval, $f(x_2) - f(x_1) = f'(x_0)(x_2 - x_1)$ for some $x_0$, so $|f(x_2) - f(x_1)| \le M|x_2 - x_1|$, which is the desired Lipschitz condition. The converse is not true, for $f(x) = |x|$ satisfies a Lipschitz condition but is not differentiable at $x = 0$. Lipschitz functions have slightly rougher graphs than differentiable functions; as a rule of thumb they can have corners but not cusps (see Figure 5.2.9).
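A spot check of the Lipschitz bound, offered only as an illustration: for $f = \sin$ we have $|f'| \le 1$, so $M = 1$ should work, and $|x|$ satisfies the same bound despite its corner at 0. The sampling scheme, the seed, and the helper name `lipschitz_ratio` are our own choices, and random sampling can of course only fail to find a counterexample, not prove the inequality.

```python
import math
import random

def lipschitz_ratio(f, x1, x2):
    """|f(x2) - f(x1)| / |x2 - x1| for x1 != x2."""
    return abs(f(x2) - f(x1)) / abs(x2 - x1)

random.seed(0)
pairs = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(1000)]
pairs = [(x1, x2) for x1, x2 in pairs if x1 != x2]

worst_sin = max(lipschitz_ratio(math.sin, x1, x2) for x1, x2 in pairs)
worst_abs = max(lipschitz_ratio(abs, x1, x2) for x1, x2 in pairs)
print(worst_sin, worst_abs)

# The corner of |x| at 0: the one-sided difference quotients disagree.
right = (abs(1e-6) - abs(0.0)) / 1e-6
left = (abs(-1e-6) - abs(0.0)) / -1e-6
print(right, left)
```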


4. Suppose $f$ is defined on $[a, b]$ and $g$ is defined on $[b, c]$ with $f(b) = g(b)$. Then define

$$h(x) = \begin{cases} f(x) & \text{if } a \le x \le b, \\ g(x) & \text{if } b \le x \le c. \end{cases}$$

Give an example where $f$ and $g$ are differentiable but $h$ is not. Give a definition of one-sided derivatives $f'(b)$ and $g'(b)$ and show that the equality of these is a necessary and sufficient condition for $h$ to be differentiable, given that $f$ and $g$ are differentiable.

5. Draw a picture of the graph of a function that is strictly increasing at a point but is not even monotone increasing in a neighborhood of that point.

3. Is the converse of the mean value theorem true, in the sense that if $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$, given a point $x_0$ in $(a, b)$ must there exist points $x_1, x_2$ in $[a, b]$ such that

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} = f'(x_0)?$$

2. If $f$ is a function satisfying

$$|f(x) - f(y)| \le M|x - y|^{\alpha}$$

for all $x$ and $y$ and some fixed $M$ and $\alpha > 1$, prove that $f$ is constant. (Hint: what is $f'$?) It is rumored that a graduate student once wrote a whole thesis on the class of functions satisfying this condition!

exists a point $x_0$ in $(a, b)$ such that

$$\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(x_0)}{g'(x_0)}.$$

(Hint: apply the mean value theorem to the function

$$(f(b) - f(a))g(x) - (g(b) - g(a))f(x).)$$

This is sometimes called the second mean value theorem.


5.3 The Calculus of Derivatives

5.3.1 Product and Quotient Rules

In this section we derive the familiar rules of the differential calculus. These rules are all analogous to, and derived from, rules of the calculus of differences. One might find it surprising that the calculus of

6. Show that if $f$ is differentiable and $f'(x) \ge 0$ on $(a, b)$, then $f$ is strictly increasing provided there is no subinterval $(c, d)$ with $c < d$ on which $f'$ is identically zero.

7. Draw the graph of a function that has a local maximum that is not a strict local maximum but is not constant on an interval.

8. Suppose $f$ is continuously differentiable on an interval $(a, b)$. Prove that on any closed subinterval $[c, d]$ the function is uniformly differentiable in the sense that given any $1/n$ there exists $1/m$ (independent of $x_0$) such that $|f(x) - f(x_0) - f'(x_0)(x - x_0)| \le |x - x_0|/n$ whenever $|x - x_0| < 1/m$. (Hint: use the mean value theorem and the uniform continuity of $f'$ on $[c, d]$.)

9. *Show that the converse to problem 8 is also valid: if $f$ is uniformly differentiable on an interval, then $f$ is continuously differentiable. (Hint: for two nearby points $x_1$ and $x_2$, consider the difference quotient $(f(x_2) - f(x_1))/(x_2 - x_1)$ as an approximation to both $f'(x_1)$ and $f'(x_2)$.)

10. If $f$ assumes a local maximum or minimum at an endpoint of its domain $[a, b]$, what can you say about the one-sided derivative (assuming it exists)? Warning: The answer depends on which endpoint it is.

11. Prove that if $f'$ is constant, then $f$ is an affine function.

12. Give an example of a function that is differentiable on $(a, b)$ but cannot be made continuous on $[a, b]$ by any definition of $f(a)$ or $f(b)$. Can you give an example where $f$ is bounded?

13. If $f$ is a differentiable function, prove that between any two zeroes of $f$ there must be a zero of $f'$.


computation:

$$\begin{aligned}
\Delta_h(f \pm g)(x) &= (f(x + h) \pm g(x + h)) - (f(x) \pm g(x)) \\
&= \Delta_h f(x) \pm \Delta_h g(x); \\
\Delta_h(f \cdot g)(x) &= f(x + h)g(x + h) - f(x)g(x) \\
&= f(x + h)(g(x + h) - g(x)) + g(x)(f(x + h) - f(x)) \\
&= f(x + h)\Delta_h g(x) + g(x)\Delta_h f(x); \\
\Delta_h(f \cdot g)(x) &= f(x)(g(x + h) - g(x)) + g(x + h)(f(x + h) - f(x)) \\
&= f(x)\Delta_h g(x) + g(x + h)\Delta_h f(x); \\
\Delta_h\left(\frac{f}{g}\right)(x) &= \frac{f(x + h)}{g(x + h)} - \frac{f(x)}{g(x)} \\
&= \frac{f(x + h)g(x) - f(x)g(x + h)}{g(x + h)g(x)} \\
&= \frac{g(x)(f(x + h) - f(x)) - f(x)(g(x + h) - g(x))}{g(x + h)g(x)} \\
&= \frac{g(x)\Delta_h f(x) - f(x)\Delta_h g(x)}{g(x + h)g(x)}.
\end{aligned}$$

derivatives is somewhat simpler than the calculus of differences, but then the whole success of calculus is due to similar surprises.

If $f$ is a function, we write $\Delta_h f(x) = f(x + h) - f(x)$ whenever it is defined. Think of $h$ as a fixed number and $\Delta_h$ as an "operator", something that takes the function $f$ as input and produces the function $f(x + h) - f(x)$ of $x$ as output. The concept of operator is a distinctly twentieth century idea. We will use it only in an informal way in this work, but you will encounter it frequently if you continue your studies. We can also think of the derivative as an operator, taking the function $f$ as input and producing the function $f'$ as output. We are claiming here that the properties of the derivative operator are related to properties of the difference operator. Since $f'(x) = \lim_{h \to 0} \Delta_h f(x)/h$ by the definition of the derivative, this is not at all surprising.

Suppose we apply arithmetic operations on functions to obtain new functions. What happens to the differences? This is a matter of simple


$$\frac{g(x_0)f'(x_0) - f(x_0)g'(x_0)}{g(x_0)^2}.$$

The proof for the sum and product is similar. For the product we can use either identity for the difference of a product; in the limit they yield the same formula.
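As a sanity check on the quotient formula, one can compare it with a difference quotient of $f/g$ for a small $h$. This is a sketch with our own example ($f = \sin$, $g(x) = 1 + x^2$, the point $x_0 = 0.7$, and the step $h = 10^{-6}$ are arbitrary), and agreement to a few digits is all a finite $h$ can show.

```python
import math

def quotient_rule(f, fp, g, gp, x0):
    """(g f' - f g') / g^2 evaluated at x0; requires g(x0) != 0."""
    return (g(x0) * fp(x0) - f(x0) * gp(x0)) / g(x0) ** 2

def difference_quotient(func, x0, h):
    """Delta_h func(x0) / h."""
    return (func(x0 + h) - func(x0)) / h

def g(x):
    return 1 + x * x

def gp(x):
    return 2 * x

def ratio(x):
    return math.sin(x) / g(x)

x0 = 0.7
exact = quotient_rule(math.sin, math.cos, g, gp, x0)
approx = difference_quotient(ratio, x0, 1e-6)
print(exact, approx)
```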

We obtain immediately similar results about differentiability on an interval and even continuous differentiability since the formulas for derivatives preserve continuity (the condition $g(x) \neq 0$ for all $x$ in the interval is required for the quotient, of course). What happens to the derivative of a quotient when both $f$ and $g$ are zero is a very interesting question, one we will be able to answer after we discuss Taylor's theorem.

Now to obtain the analogous rules for derivatives, we simply divide these identities by $h$ and take the limit as $h \to 0$. For sums and products there is no difficulty, since the limits interchange with these operations. For the quotient we need to assume that $g$ is not zero at the point in question.

Theorem 5.3.1 If $f$ and $g$ are differentiable at $x_0$, then $f \pm g$ and $f \cdot g$ are also differentiable at $x_0$, and $(f \pm g)'(x_0) = f'(x_0) \pm g'(x_0)$, $(f \cdot g)'(x_0) = f(x_0)g'(x_0) + f'(x_0)g(x_0)$. In addition, if $g(x_0) \neq 0$, then $f/g$ is differentiable at $x_0$, and

$$\left(\frac{f}{g}\right)'(x_0) = \frac{g(x_0)f'(x_0) - f(x_0)g'(x_0)}{g(x_0)^2}.$$

Proof: We write out the complete proof for the quotient. The assumptions about differentiability mean $\lim_{h \to 0} \Delta_h f(x_0)/h = f'(x_0)$ and $\lim_{h \to 0} \Delta_h g(x_0)/h = g'(x_0)$. The fact that $g$ is differentiable at $x_0$ implies that it is continuous at $x_0$, so $\lim_{h \to 0} g(x_0 + h) = g(x_0)$. Since we are assuming $g(x_0) \neq 0$, we have also $\lim_{h \to 0} g(x_0 + h) \neq 0$. Altogether

$$\begin{aligned}
\lim_{h \to 0} \frac{\Delta_h(f/g)(x_0)}{h} &= \lim_{h \to 0} \frac{g(x_0)\Delta_h f(x_0) - f(x_0)\Delta_h g(x_0)}{h\, g(x_0 + h)g(x_0)} \\
&= \frac{g(x_0) \lim_{h \to 0} \Delta_h f(x_0)/h - f(x_0) \lim_{h \to 0} \Delta_h g(x_0)/h}{g(x_0) \lim_{h \to 0} g(x_0 + h)}
\end{aligned}$$

which shows the limit exists (meaning $f/g$ is differentiable at $x_0$) and equals


and so the limit will exist (and equal the product) if the limits of both $\Delta_z g(f(x))/z$ and $\Delta_h f(x)/h$ exist. Now if we fix $x = x_0$ and assume that $f$ is differentiable at $x_0$, then we have $\lim_{h \to 0} \Delta_h f(x_0)/h = f'(x_0)$. For the other factor, if we assume $g$ is differentiable at the point $f(x_0)$, we have $\lim_{z \to 0} \Delta_z g(f(x_0))/z = g'(f(x_0))$. This is not quite what is called for, since we are taking the limit as $h$ goes to zero, and $z$ is defined in terms of $h$, namely $z = \Delta_h f(x_0)$. Now we do know that $f$ is continuous at $x_0$, because it is differentiable, so $\lim_{h \to 0} z = 0$, which seems to imply $\lim_{h \to 0} \Delta_z g(f(x_0))/z = g'(f(x_0))$, leading to the familiar chain rule $(g \circ f)'(x_0) = g'(f(x_0))f'(x_0)$. However, if you try to make this into a precise proof you will come upon one very sticky point: if $z = 0$ nothing is defined. Of course nothing is defined if $h = 0$, either, but this is excluded in the definition of the derivative, while there is

$$\frac{\Delta_h g \circ f(x)}{h} = \frac{(\Delta_z g)(f(x))}{h} = \frac{\Delta_z g(f(x))}{z} \cdot \frac{z}{h} = \frac{\Delta_z g(f(x))}{z} \cdot \frac{\Delta_h f(x)}{h}$$

This rather cumbersome notation means we take the difference of the function $g$ at the point $f(x)$ with increment $\Delta_h f(x)$. Of course the domain of $g$ must include $f(x)$ and $f(x + h)$ for this to be meaningful. To compute the derivative we want to divide by $h$ and let $h \to 0$. To simplify the notation we let $z = \Delta_h f(x)$. Then

$$\begin{aligned}
\Delta_h g \circ f(x) &= g(f(x + h)) - g(f(x)) \\
&= g(f(x) + (f(x + h) - f(x))) - g(f(x)) \\
&= (\Delta_{\Delta_h f(x)} g)(f(x)).
\end{aligned}$$

The next basic calculus formula is the chain rule for the derivative of the composition $g \circ f(x) = g(f(x))$.

We begin with a direct approach to the computation in terms of difference quotients; this approach encounters some obstacles, which will lead us to rethink the whole problem. We compute first the difference of the composition:

5.3.2 The Chain Rule


where the remainder term is the sum of $g'(f(x_0))\,o(x - x_0)$ and $o(y - y_0)$. To complete the proof we have to show this remainder is $o(x - x_0)$. Before doing this we should provide some interpretation for these computations. The formula $f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0)$ is the same as $f(x) - f(x_0) = f'(x_0)(x - x_0) + o(x - x_0)$ or even $\Delta_h f(x_0) = f'(x_0)h + o(h)$ if we set $x = x_0 + h$. We interpret this to say that for values of $x$ near $x_0$, changes in the $x$ variable are multiplied by the magnification factor $f'(x_0)$, aside from a small remainder term, in order to obtain changes in $y = f(x)$. Similarly $g(y) - g(y_0) = g'(y_0)(y - y_0) + o(y - y_0)$ means changes in $y$, for $y$ near $y_0$, get multiplied by the factor $g'(y_0)$, again aside from a small error, in passing to changes in $g(y)$. Thus in the composition $g \circ f$ the change in

$$\begin{aligned}
g(f(x)) &= g(f(x_0)) + g'(f(x_0))(f(x) - f(x_0)) + o(y - y_0) \\
&= g(f(x_0)) + g'(f(x_0))(f'(x_0)(x - x_0) + o(x - x_0)) + o(y - y_0) \\
&= g(f(x_0)) + g'(f(x_0))f'(x_0)(x - x_0) + \text{remainder},
\end{aligned}$$

[Strictly speaking, we should write $f(x) = f(x_0) + f'(x_0)(x - x_0) + R(x)$ with $R(x) = o(x - x_0)$ as $x \to x_0$, and this is the meaning of our abbreviation. This is a convenient and standard notational short-hand. You should keep in mind, however, that different occurrences of the same symbol $o(x - x_0)$ might refer to different functions.] If $g$ is differentiable at $y_0 = f(x_0)$, then $g(y) = g(y_0) + g'(y_0)(y - y_0) + o(y - y_0)$. Substituting $y = f(x)$ we obtain

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0).$$

nothing to exclude $z = 0$. In fact $z = 0$ whenever $f(x_0 + h) = f(x_0)$, which may well happen for values of $h$ arbitrarily close to zero. Thus we have only established the chain rule if there exists a neighborhood of zero such that $\Delta_h f(x_0) \neq 0$ for non-zero $h$ in the neighborhood.

We can make a separate proof in the contrary case, showing that then both $f'(x_0) = 0$ and $(g \circ f)'(x_0) = 0$. However this results in a rather awkward proof, so we will try another tack, leaving the completion of the first approach for the exercises.
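The sticky point is not hypothetical. In the sketch below, the inner function $f(x) = x^2 \cdot \operatorname{dist}(1/x, \mathbb{Z})$ (with $f(0) = 0$) is our own example, chosen because it vanishes exactly at $x = 2^{-k}$ in floating point; it satisfies $|f(x)| \le x^2/2$, so $f'(0) = 0$. The factored quotient $\frac{\Delta_z g}{z} \cdot \frac{z}{h}$ then breaks down at those $h$, while the direct difference quotient of $g \circ f$ behaves well and tends to $g'(f(0))f'(0) = 0$. The outer function $g = \exp$ and the helper names are likewise only illustrative.

```python
import math

def f(x):
    """x^2 times the distance from 1/x to the nearest integer; f(0) = 0."""
    if x == 0:
        return 0.0
    t = 1.0 / x
    return x * x * abs(t - round(t))

def g(y):
    return math.exp(y)

def factored_quotient(h):
    """(Delta_z g(f(0)) / z) * (z / h) with z = Delta_h f(0); undefined if z == 0."""
    z = f(0.0 + h) - f(0.0)
    return ((g(f(0.0) + z) - g(f(0.0))) / z) * (z / h)

def direct_quotient(h):
    """Delta_h (g o f)(0) / h, which is defined for every h != 0."""
    return (g(f(h)) - g(f(0.0))) / h

for k in [5, 10, 20]:
    h = 2.0 ** -k                      # here z = 0, so the factored form fails
    try:
        factored_quotient(h)
    except ZeroDivisionError:
        print("factored form undefined at h =", h)
    print(direct_quotient(h), direct_quotient(1.5 * h))
```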

Let us think about the chain rule in terms of best affine approximations. If $f$ is differentiable at $x_0$, then


km

Ifwe can show that each ofthe terms in brackets is at most Ix-xo!/2m,we will be done. For the second term this is just the differentiability off at xo, which enables us to get

, Ix - xolIf(x) - f(xo) - f (xo)(x - xo)! < 2mlg'(!(xo))1

by taking x close to Xo (if g'(!(xo)) = O the whole term is zero andthere is nothing to prove). For the first term we have to work a littleharder. By the differentiability of 9 at f(xo) = Yo we can make

$$|g(y) - g(y_0) - g'(y_0)(y - y_0)| \le \frac{|y - y_0|}{km}$$

$$\begin{aligned} g(f(x)) - g(f(x_0)) &- g'(f(x_0))\,f'(x_0)(x - x_0) \\ &= \bigl[g(f(x)) - g(f(x_0)) - g'(f(x_0))(f(x) - f(x_0))\bigr] \\ &\quad + \bigl[g'(f(x_0))\bigl(f(x) - f(x_0) - f'(x_0)(x - x_0)\bigr)\bigr]. \end{aligned}$$

Proof: We need to show that given any $1/m$ we can make

$$|g(f(x)) - g(f(x_0)) - g'(f(x_0))\,f'(x_0)(x - x_0)| \le \frac{|x - x_0|}{m}$$

by taking $x$ close enough to $x_0$. From our discussion we know that we want to break this into two parts as follows:

$x$ first gets multiplied by $f'(x_0)$ and then by $g'(y_0) = g'(f(x_0))$, hence altogether by $g'(f(x_0))\,f'(x_0)$, before producing a change in $g \circ f$. Notice that the error in the first stage also gets multiplied by $g'(f(x_0))$ and is then added to the error in the second stage. We can now put this all together into a complete proof.

Theorem 5.3.2 (Chain Rule) Let $f$ be defined in a neighborhood of $x_0$ and differentiable at $x_0$, and let $g$ be defined in a neighborhood of $f(x_0)$ and differentiable at $f(x_0)$. Then $g \circ f$ is differentiable at $x_0$ and $(g \circ f)'(x_0) = g'(f(x_0))\,f'(x_0)$.


The last of the important formulas in the calculus of derivatives is the rule for differentiating functions given implicitly. We are going to have to postpone a full discussion until a later chapter, because it requires the differential calculus in two variables. Here we will discuss a special case, that of inverse functions. In the abstract definition of function, if $f$ is one-to-one from its domain $D$ to its range $R$ and is onto $R$ (the image $f(D)$ is all of $R$), then the inverse function $f^{-1}$ with domain $R$ and range $D$ is defined by $f^{-1}(y) = x$ if and only if $f(x) = y$. If $I_D$ and $I_R$ denote the identity functions on $D$ and $R$ respectively, then $f^{-1} \circ f = I_D$ and $f \circ f^{-1} = I_R$.

To begin the discussion we will assume that $f$ is a numerical function and $f$ and $f^{-1}$ are both differentiable. This is a big assumption, and we will have to return to the point later. What we can see easily is that the chain rule establishes an identity involving the derivatives of $f$ and $f^{-1}$. Note that $I(x) = x$ is the identity function on R, so $f^{-1}(f(x)) = x$. Since this is an equality between functions, we may differentiate both sides of the equation. (Unfortunately, mathematical notation is sometimes ambiguous about whether an equation $f(x) = g(x)$ is meant to hold for all $x$ or just for some particular value $x$.

From the equality $f(x) = g(x)$ at one point $x$ we can conclude nothing about the derivatives of $f$ and $g$ at that point; but if $f(x) = g(x)$ for all $x$ in the domains, then $f$ and $g$ are the same function, so $f'(x) = g'(x)$ for all $x$ because $f'$ and $g'$ are the same function.) From the chain rule

5.3.3 Inverse Function Theorem

When we come to discuss the differential calculus in several variables we will also have a chain rule, and we will be able to adapt the above proof to that context.

Finally, by the differentiability of $f$ at $x_0$, we can make both $|f(x) - f(x_0)| < 1/n$, so the above applies, and $|f(x) - f(x_0)| \le M|x - x_0|$ where $M = 1 + |f'(x_0)|$ (recall the proof that differentiability implies Lipschitz continuity) by taking $x$ close enough to $x_0$. This gives the estimate $M|x - x_0|/km$ for the first term in brackets, and we need only take $k = 2M$ to complete the proof. QED
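The chain rule can also be checked numerically. The following is a minimal sketch, not part of the proof: the functions `f` and `g` and the point `x0` are hypothetical choices made only for illustration, and the difference quotients of $g \circ f$ are compared with $g'(f(x_0))\,f'(x_0)$ as $h$ shrinks.

```python
import math

def difference_quotient(func, x0, h):
    """Return (func(x0 + h) - func(x0)) / h, the difference quotient at x0."""
    return (func(x0 + h) - func(x0)) / h

# Hypothetical sample functions, chosen only for illustration.
def f(x):
    return math.sin(x)            # inner function, f'(x) = cos x

def g(y):
    return y * y + 3.0 * y        # outer function, g'(y) = 2y + 3

def f_prime(x):
    return math.cos(x)

def g_prime(y):
    return 2.0 * y + 3.0

def composite(x):
    return g(f(x))

x0 = 0.7
chain_rule_value = g_prime(f(x0)) * f_prime(x0)

# The difference quotients of g o f should approach g'(f(x0)) f'(x0) as h -> 0.
errors = [abs(difference_quotient(composite, x0, 10.0 ** -k) - chain_rule_value)
          for k in range(1, 6)]
for k, err in zip(range(1, 6), errors):
    print(f"h = 1e-{k}: error = {err:.2e}")
```

The printed errors shrink roughly in proportion to $h$, which is what the $o(x - x_0)$ remainder in the proof predicts for a twice differentiable composite.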


Now the tangent line to the graph of $f$ at the point $(x_0, y_0)$ is also reflected in $y = x$ into the tangent line to the graph of $f^{-1}$ at the point

Figure 5.3.1:

$$(f^{-1})'(f(x)) = \frac{1}{f'(x)}, \quad \text{or}$$

$$(f^{-1})'(y) = \frac{1}{f'(x)} \quad \text{if } y = f(x), \text{ or}$$

$$(f^{-1})'(y) = \frac{1}{f'(f^{-1}(y))}.$$

The last form is what we obtain by differentiating $f(f^{-1}(y)) = y$ and solving for the derivative of $f^{-1}$.

We might attempt to paraphrase this relation as saying the derivative of the inverse function is the reciprocal (the multiplicative inverse) of the derivative of the function. However, this is only part of the story, because it doesn't say where the derivatives are evaluated. It is not true that $(f^{-1})'(x)$ has any relation to $1/f'(x)$, for example. A good way to think about the situation is via the graphs of $f$ and $f^{-1}$. Since $y = f(x)$ if and only if $x = f^{-1}(y)$, the graph of $f^{-1}$ is obtained from the graph of $f$ by interchanging the axes. This is the same as reflecting the graph in the diagonal line $y = x$, as shown in Figure 5.3.1.

we obtain $(f^{-1})'(f(x))\,f'(x) = 1$, hence


It turns out that all these problems are interconnected. We will start with a differentiable function $f$, and to avoid splitting hairs let us assume that $f$ is continuously differentiable on an interval $(a, b)$. How can $f$ be one-to-one? If $f$ is strictly increasing or strictly decreasing, then $f$ cannot assume one value twice. Actually the converse is true; we will leave the proof as an exercise, since we will not need this result. How can we assure that $f$ is strictly increasing or decreasing? By making $f'(x) > 0$ or $f'(x) < 0$ on the interval. If we make this assumption,

The interchange of axes clearly changes the slope of the tangent line $\Delta y/\Delta x$ into its reciprocal $\Delta x/\Delta y$, while the first coordinate of the point of tangency is also clearly changed from $x_0$ to $y_0$.

The formal computation of $(f^{-1})'$, made under the assumption that $f^{-1}$ exists and is differentiable, has given us a good intuitive grasp of the situation. Now we have to come to grips with three important problems:

1. How do we know if $f^{-1}$ exists?

2. How do we know if $f^{-1}$ is differentiable?

3. What happens when $f'(x) = 0$?

Figure 5.3.2:


$(y_0, x_0)$, as shown in Figure 5.3.2.


provided $|x - x_0| < 1/n$ and $x \neq x_0$. What we need to show is $\lim_{y \to y_0} (x - x_0)/(y - y_0) = 1/f'(x_0)$, in other words given any error $1/m$ there exists $1/k$ such that

$$\left| \frac{x - x_0}{y - y_0} - \frac{1}{f'(x_0)} \right| \le \frac{1}{m}$$

exists and is non-zero, so $\lim_{x \to x_0} (x - x_0)/(y - y_0) = 1/f'(x_0)$. Now $(x - x_0)/(y - y_0)$ is the difference quotient whose limit is the derivative of $f^{-1}$ at $y_0$. The only problem is that in the definition of $(f^{-1})'$ we are supposed to take the limit as $y \to y_0$. Is this the same thing? To find out, let's write what we know and what we need to show. We know $\lim_{x \to x_0} (x - x_0)/(y - y_0) = 1/f'(x_0)$, so given any error $1/m$ there exists $1/n$ such that

$$\lim_{x \to x_0} \frac{y - y_0}{x - x_0} = f'(x_0)$$

Proof: Let $x_0, y_0$ denote fixed points such that $f(x_0) = y_0$, and let $x, y$ denote variable points such that $f(x) = y$. Then

Inverse Function Theorem Let $f$ be a continuously differentiable function on an open interval $(a, b)$, with image $(c, d)$; and suppose either $f'(x) > 0$ or $f'(x) < 0$ on $(a, b)$. Then $f^{-1}$ with domain $(c, d)$ and image $(a, b)$ is continuously differentiable, and $(f^{-1})'(y) = 1/f'(x)$ if $y = f(x)$.

then we can define $f^{-1}$ with domain equal to the image of $f$, which is also an open interval. We also avoid the problem of what happens at a point where $f'$ is zero. In fact it is clear from the reflection picture that if $f'(x_0)$ is zero, hence the tangent is horizontal, and if $f^{-1}$ exists, then the tangent to the graph of $f^{-1}$ must be vertical at the corresponding point $y_0 = f(x_0)$, so $f^{-1}$ is not differentiable at $y_0$. The simplest example of this is $f(x) = x^3$ at $x = 0$. The inverse function $f^{-1}(y) = \sqrt[3]{y}$ is not differentiable at $y = 0$. Therefore we are not missing any positive results if we assume either $f'(x) > 0$ or $f'(x) < 0$ on the interval.
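The failure of differentiability for the cube root at zero is easy to see numerically. The sketch below is only an illustration; the helper name `cube_root` and the sample increments are our own choices. The difference quotients of the inverse at $y = 0$ equal $h^{-2/3}$, so they grow without bound instead of converging.

```python
import math

def cube_root(y):
    """Real cube root, the inverse of f(x) = x**3 on the whole line."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

# Difference quotients (cube_root(0 + h) - cube_root(0)) / h for shrinking h.
increments = (1e-2, 1e-4, 1e-6)
quotients = [cube_root(h) / h for h in increments]
for h, q in zip(increments, quotients):
    print(f"h = {h:.0e}: difference quotient = {q:.1f}")
```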


Proof: Since $f'$ is assumed continuous and $f'(x_0) \neq 0$, we can find a neighborhood $(a, b)$ of $x_0$ such that either $f'(x) > 0$ or $f'(x) < 0$ there, and then we apply the global theorem. QED

Local Inverse Function Theorem Let $f$ be defined and continuously differentiable in a neighborhood of $x_0$, and suppose $f'(x_0) \neq 0$. Then there exists a neighborhood $(a, b)$ of $x_0$ such that the restriction of $f$ to $(a, b)$ has a continuously differentiable inverse on the image $(c, d) = f((a, b))$.

The inverse function theorem as stated is unique to one dimension, because only in one dimension do we have a criterion like strictly increasing or decreasing for the function to be one-to-one. There is a local version of the theorem, however, that can be generalized to higher dimensions, and for that reason we state it here.

provided $|y - y_0| < 1/k$ and $y \neq y_0$. Thus to bridge the gap between these statements we need to show: for any error $1/n$ there exists $1/k$ such that $|x - x_0| < 1/n$ and $x \neq x_0$ provided $|y - y_0| < 1/k$ and $y \neq y_0$. Note that $y \neq y_0$ implies $x \neq x_0$ because $f$ is one-to-one, so that part is easy. The rest of the statement is exactly the definition of the continuity of $f^{-1}$ at $y_0$. This is the non-trivial part.

Why should $f^{-1}$ be continuous? The idea is that since $f'$ is never zero, it must be bounded away from zero in a neighborhood of $x_0$, so that changes in $x$ must result in substantial changes in $y$; we get estimates that go in the opposite direction of the continuity estimates for $f$, so that when we switch over to $f^{-1}$ the estimates will go in the correct direction. More precisely, first choose a neighborhood of $x_0$ such that $|f'(x)| \ge 1/N$ for some fixed $N$ and all $x$ in the neighborhood (here we use the continuity of the derivative). Then the mean value theorem $(f(x) - f(x_0))/(x - x_0) = f'(x_1)$ gives the estimate $|f(x) - f(x_0)| \ge |x - x_0|/N$ for all $x$ in the neighborhood. Note that this is the reverse of the usual Lipschitz condition. But now $y = f(x)$ and $y_0 = f(x_0)$, so $|y - y_0| < 1/Nn$ implies $|x - x_0| \le N|y - y_0| < 1/n$ as desired. This proves the continuity of $f^{-1}$ and so completes the proof of the differentiability of $f^{-1}$ with the correct derivative. Finally $(f^{-1})'(y) = 1/f'(f^{-1}(y))$ is continuous because $f'$ and $f^{-1}$ are continuous and $f'$ is never zero. QED
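As a numerical illustration of the inverse function theorem, here is a short sketch under stated assumptions: the function $f(x) = x^3 + x$, the bisection helper `f_inverse`, and the point `x0` are hypothetical choices, not taken from the text. It compares a difference quotient of $f^{-1}$ at $y_0$ with $1/f'(x_0)$, and also with the reciprocal evaluated at the wrong point, $1/f'(y_0)$.

```python
def f(x):
    """Hypothetical example: strictly increasing, since f'(x) = 3x^2 + 1 > 0."""
    return x ** 3 + x

def f_prime(x):
    return 3.0 * x ** 2 + 1.0

def f_inverse(y, lo=-10.0, hi=10.0, iterations=200):
    """Solve f(x) = y by bisection; valid because f is strictly increasing on [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x0 = 1.5
y0 = f(x0)
h = 1e-6

# Symmetric difference quotient of the inverse function at y0.
numeric = (f_inverse(y0 + h) - f_inverse(y0 - h)) / (2.0 * h)
predicted = 1.0 / f_prime(x0)     # reciprocal evaluated at x0 = f^{-1}(y0)
wrong_point = 1.0 / f_prime(y0)   # reciprocal evaluated at y0 instead

print(f"difference quotient of the inverse: {numeric:.6f}")
print(f"1/f'(x0):                           {predicted:.6f}")
print(f"1/f'(y0) (wrong point):             {wrong_point:.6f}")
```

The first two numbers agree closely while the third does not, which is the point made in the text about where the derivatives must be evaluated.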


The inverse function theorem is an extremely useful and powerful theorem. Many important functions such as exp, sine, and cosine are best defined as inverses of functions given by explicit integrals. We will develop these ideas in a later chapter. The inverse function theorem in higher dimensions, which is a generalization of the local version only, is used to prove the differentiability (and existence) of functions defined implicitly.

5.3.4 Exercises

1. Define

$$x_+ = \begin{cases} x & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases}$$

Prove that $f(x) = x_+^k$ is continuously differentiable if $k$ is an integer greater than one.

2. Show that $(x - a)_+^2 (b - x)_+^2$ is a continuously differentiable function that is non-zero exactly on the interval $(a, b)$.

3. Given a closed set $A$, construct a continuously differentiable function that has $A$ as its set of zeroes.

4. Prove that if $f$ is any continuous one-to-one function on an interval $(a, b)$, then either $f$ is strictly increasing or strictly decreasing.

5. If $f$ is differentiable on $(a, b)$ and $f'(x) \neq 0$ for all $x$ in the interval, prove that either $f'(x) > 0$ or $f'(x) < 0$ on the entire interval.

6. Give a proof of the chain rule arguing separately in the case when every neighborhood of zero contains a value of $h$ for which $\Delta_h f(x_0) = 0$.

7. Prove that $f(x) = x^{1/k}$ can be defined on $[0, \infty)$ by the requirement that it be the inverse function of $g(x) = x^k$ on $[0, \infty)$, where $k$ is any positive integer. Use the inverse function theorem to derive the usual formula for $f'$.

8. For any rational number $r$ give a definition of $f(x) = x^r$ for $x > 0$ and show $f'(x) = r x^{r-1}$.

9. Show that $x_+^r$ is continuously differentiable for any rational $r > 1$.


10. Show that a polynomial of even order ($\neq 0$) has either a global maximum or a global minimum but not both.

11. Show that the class of rational functions (polynomial divided by polynomial) is closed under the operation of differentiation.

12. *Show that no rational function has derivative equal to $1/x$.

13. Let $f_n$ denote the $n$th iterate of $f$, $f_1 = f$, $f_2(x) = f(f_1(x))$, ..., $f_n(x) = f(f_{n-1}(x))$. Express $f_n'$ in terms of $f'$. Show that if $a \le |f'(x)| \le b$ for all $x$, then $a^n \le |f_n'(x)| \le b^n$.

14. If $f$ is a polynomial, show that $f_n$ is a polynomial. What is the degree of $f_n$ if $f$ has degree $N$? Similarly, show that if $f$ is a rational function, then $f_n$ is a rational function.

15. A function is called algebraic if it satisfies a polynomial identity $\sum a_{jk} x^j f(x)^k = 0$ (finite sum, not all coefficients $a_{jk}$ zero). Assuming $f(x)$ is differentiable, find a formula for $f'$ in terms of $f$.

5.4 Higher Derivatives and Taylor's Theorem

5.4.1 Interpretations of the Second Derivative

If $f$ is a differentiable function on an interval, then $f'$ is also a function on that interval, which may or may not be differentiable. If $f'$ is differentiable at $x_0$ we call its derivative $f''(x_0)$ the second derivative of $f$ at $x_0$. If $f''$ exists for every point in the interval and is continuous we say $f$ is twice continuously differentiable or $f$ is $C^2$.

You are no doubt familiar with the interpretation of acceleration as a second derivative. The significance of acceleration in Newton's theory of mechanics guarantees that the notion of second derivative is of great importance. Another familiar application of second derivatives involves concavity of the graph and the problem of distinguishing local maxima and minima. Roughly speaking, in an interval where $f''$ is positive, the graph lies above tangent lines and below secant lines, and only local minima can occur, while the reverse is true on an interval where $f''$ is negative. This is illustrated in Figure 5.4.1.


Proof: We begin with part c. If $f''(x_0) > 0$ this means the derivative of $f'$ is positive at $x_0$, hence $f'(x)$ is strictly increasing at $x_0$. Since $f'(x_0) = 0$, this means that $f'(x) < 0$ if $x < x_0$ and $f'(x) > 0$ if $x > x_0$ for $x$ near $x_0$, say $|x - x_0| < 1/n$. But this implies $f$ is strictly decreasing on $(x_0 - 1/n, x_0)$ and strictly increasing on $(x_0, x_0 + 1/n)$, so $x_0$ is a

Theorem 5.4.1 Suppose $f$ is differentiable in a neighborhood of $x_0$ and $f''(x_0)$ exists. Let $g(x) = f(x_0) + f'(x_0)(x - x_0)$ denote the best affine approximation to $f$ at $x_0$.

a. If $f''(x_0) > 0$, then there exists a neighborhood of $x_0$ where $f(x) \ge g(x)$ (even $f(x) > g(x)$ for $x \neq x_0$), while if $f''(x_0) < 0$ there exists a neighborhood of $x_0$ where $f(x) \le g(x)$.

b. If $f(x) \ge g(x)$ in a neighborhood of $x_0$, then $f''(x_0) \ge 0$; while if $f(x) \le g(x)$ in a neighborhood of $x_0$, then $f''(x_0) \le 0$.

c. Suppose also $f'(x_0) = 0$. If $f''(x_0) > 0$, then $x_0$ is a strict local minimum; while if $f''(x_0) < 0$, then $x_0$ is a strict local maximum.

d. If $x_0$ is a local minimum, then $f''(x_0) \ge 0$; while if $x_0$ is a local maximum, then $f''(x_0) \le 0$.

We will establish these relationships in two theorems. The first deals with properties of the second derivative at a single point, and the second with properties of a continuous second derivative on an interval. The relationship involving the secant line only shows up in the second theorem.

Figure 5.4.1:


The same proof also applies to non-strict inequalities. There is a converse to this theorem, which we leave to the exercises.

A different interpretation of the second derivative involves the notion of the second difference. This idea is implicit in the notation

Proof: Suppose $f'' > 0$ in the interval $(x_1, x_2)$. Then the same is true for $h = f - g$ because $g'' = 0$. Note that $h$ vanishes at the endpoints of the interval. We want to prove that $h$ is negative on $(x_1, x_2)$. If not, it would achieve a local maximum on $(x_1, x_2)$, say at $x_0$, and by part d of the previous theorem this would imply $h''(x_0) \le 0$, a contradiction. QED
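The secant-line statement of Theorem 5.4.2 can be spot-checked on a concrete function. This is only an illustrative sketch: the choice $f = \exp$ (whose second derivative is positive everywhere), the endpoints, and the sample grid are assumptions made for the example.

```python
import math

f = math.exp              # hypothetical example with f'' = exp > 0 everywhere
x1, x2 = 0.0, 1.0

def secant(x):
    """Affine function through (x1, f(x1)) and (x2, f(x2))."""
    slope = (f(x2) - f(x1)) / (x2 - x1)
    return f(x1) + slope * (x - x1)

# Sample interior points of (x1, x2) and measure how far the secant lies above f.
interior_points = [x1 + (x2 - x1) * k / 10.0 for k in range(1, 10)]
gaps = [secant(x) - f(x) for x in interior_points]
for x, gap in zip(interior_points, gaps):
    print(f"x = {x:.1f}: secant(x) - f(x) = {gap:.4f}")
```

Every printed gap is positive, consistent with $f(x) < g(x)$ on the open interval; a finite sample is of course no substitute for the proof.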

Theorem 5.4.2 Let $f$ be a $C^2$ function on an interval $(a, b)$. Let $g$ denote any affine function whose graph intersects the graph of $f$ at two points $(x_1, y_1)$ and $(x_2, y_2)$ with $x_1 < x_2$. If $f''(x) > 0$ for all $x$ in $(x_1, x_2)$, then $f(x) < g(x)$ for all $x$ in $(x_1, x_2)$; while if $f''(x) < 0$ for all $x$ in $(x_1, x_2)$, then $f(x) > g(x)$ for all $x$ in $(x_1, x_2)$.

The function $f(x) = x^4$ has a strict minimum at $x = 0$, but its second derivative is zero there, showing that we cannot improve part d to have strict inequality.

strict local minimum. Notice that the proof used twice the relations between sign of the first derivative and behavior of the function: first in going from the sign of the derivative of $f'$ at a point to behavior of $f'$ near the point and then in going from the sign of $f'$ on an interval to behavior of $f$ on the interval. The proof that $f''(x_0) < 0$ implies $x_0$ is a strict local maximum is analogous.

Next part d is essentially the contrapositive of part c, for a local minimum cannot be also a strict local maximum, so we cannot have $f''(x_0) < 0$ by part c.

Finally we can derive part a from part c and part b from part d. The function $f(x) - g(x)$ vanishes at $x_0$, and its derivative is also zero at $x_0$ while its second derivative is merely $f''(x_0)$. If $f''(x_0) > 0$, then part c implies that $f(x) - g(x)$ has a strict local minimum at $x_0$, hence $f(x) - g(x) \ge f(x_0) - g(x_0) = 0$ in a neighborhood of $x_0$, with strict inequality for $x \neq x_0$. Similarly, we can derive part b from part d. QED


$$\frac{\Delta_h^2 f(x)}{h} = f'(x_0 + h) - f'(x_0);$$

Proof: We apply the mean value theorem to the function $g(t) = f(t + h) - f(t)$ on the interval $[x, x + h]$. Since $g'(t) = f'(t + h) - f'(t)$, we have the required differentiability and continuity of $g$ if $h$ is small enough. The mean value theorem gives $(g(x + h) - g(x))/h = g'(x_0)$ for $x_0$ in $(x, x + h)$, and this is exactly

Theorem 5.4.3 If $f$ is $C^2$ on an interval, then $\lim_{h \to 0} \Delta_h^2 f(x)/h^2$ exists and equals $f''(x)$ for any $x$ in the interval.

$$\frac{f(x + h) - f(x)}{h} \approx f'(x),$$

so

$$\frac{\Delta_h^2 f(x)}{h^2} \approx \frac{f'(x + h) - f'(x)}{h} \approx f''(x).$$

To make this argument precise we will need to use the mean value theorem. A direct approach would be to replace the difference quotients $(f(x + 2h) - f(x + h))/h$ and $(f(x + h) - f(x))/h$ by derivatives. However this would lead to the awkward situation of having $\Delta_h^2 f(x)/h^2 = (f'(x_2) - f'(x_1))/h$ where $x_2$ is in $(x + h, x + 2h)$ and $x_1$ is in $(x, x + h)$, and we would have no control over $x_2 - x_1$. We need a more clever idea.

and

$$\frac{f(x + 2h) - f(x + h)}{h} \approx f'(x + h)$$

The second derivative should then be the limit of $\Delta_h^2 f(x)/h^2$ as $h \to 0$. The reason for this is that for small $h$

$$\begin{aligned} \Delta_h(\Delta_h f)(x) &= \Delta_h f(x + h) - \Delta_h f(x) \\ &= [f(x + h + h) - f(x + h)] - [f(x + h) - f(x)] \\ &= f(x + 2h) - 2f(x + h) + f(x). \end{aligned}$$

$d^2y/dx^2$ and is useful in numerical solutions to differential equations. Recall that we defined the difference operator $\Delta_h$ with increment $h$ as $\Delta_h f(x) = f(x + h) - f(x)$ and $f'(x)$ as the limit of $\Delta_h f(x)/h$ as $h \to 0$. The second difference operator $\Delta_h^2$ is simply $\Delta_h$ applied twice:


We come now to the last, and in many ways the most important, interpretation of the second derivative. We have seen that the first derivative can be thought of as one of the constants in the best affine approximation to $f$ at the point. Now an affine function is a polynomial of degree one, and the derivative shows up as the coefficient of the leading (highest degree) term. We can think of the best affine approximation as an improvement over approximation by the constant function $y = f(x_0)$, which is the best polynomial of degree-zero approximation to $f$ at $x_0$. The affine approximation is an improvement because we only have $f(x) - f(x_0) = o(1) = o(|x - x_0|^0)$ while $f(x) - (f(x_0) + f'(x_0)(x - x_0)) = o(|x - x_0|)$ as $x \to x_0$. From this we would guess that by allowing polynomials of degree 2 we should be able to improve the approximation to $o(|x - x_0|^2)$. Now if $f$ itself is a polynomial of degree at most 2, then

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \tfrac{1}{2} f''(x_0)(x - x_0)^2,$$

as can be verified by a simple computation. For more general $f$, we would expect

$$g_2(x) = f(x_0) + f'(x_0)(x - x_0) + \tfrac{1}{2} f''(x_0)(x - x_0)^2$$

to be the best approximation to $f$ at $x_0$ by polynomials of degree at most 2. The second derivative thus appears as the coefficient of the leading term, except for the factor of 1/2.

5.4.2 Taylor's Theorem

There is also a converse to this theorem, but it is much more difficult to prove.

This is similar to what we had before, with $x_1 = x_0$ and $x_2 = x_0 + h$, but now we have the difference $x_2 - x_1 = h$ under control. Applying the mean value theorem again to this difference quotient we obtain $\Delta_h^2 f(x)/h^2 = f''(x_1)$ for some point $x_1$ in $(x_0, x_0 + h)$. Since $x_1$ must lie between $x$ and $x + 2h$ and $f''$ is assumed continuous, we may take the limit as $h \to 0$ and get $f''(x)$. QED
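Theorem 5.4.3 is the basis of the usual finite-difference approximation to the second derivative. A minimal sketch follows, assuming the hypothetical test function $f = \exp$ at $x = 0.3$, so that the exact value $f''(x) = e^{x}$ is known; the helper name `second_difference` is ours.

```python
import math

def second_difference(func, x, h):
    """Delta_h^2 f(x) = f(x + 2h) - 2 f(x + h) + f(x)."""
    return func(x + 2.0 * h) - 2.0 * func(x + h) + func(x)

f = math.exp                  # hypothetical test function; f'' = exp as well
x = 0.3
exact = math.exp(x)

steps = [10.0 ** -k for k in range(1, 5)]
errors = [abs(second_difference(f, x, h) / h ** 2 - exact) for h in steps]
for h, err in zip(steps, errors):
    print(f"h = {h:.0e}: |second difference / h^2 - f''(x)| = {err:.2e}")
```

For much smaller $h$ the subtraction in the second difference loses accuracy in floating point, so the error would eventually stop decreasing; that is a feature of the arithmetic, not of the theorem.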

hence,

$$\frac{\Delta_h^2 f(x)}{h^2} = \frac{f'(x_0 + h) - f'(x_0)}{h}.$$


Figure 5.4.2:

Since the quadratic term $f''(x_0)(x - x_0)^2/2$ is $O(|x - x_0|^2)$ as $x \to x_0$, we can expect this order of improvement.

Theorem 5.4.4 Let $f$ be a $C^2$ function defined in a neighborhood of $x_0$, and let

$$g_2(x) = f(x_0) + f'(x_0)(x - x_0) + \tfrac{1}{2} f''(x_0)(x - x_0)^2.$$

Then $f - g_2 = o(|x - x_0|^2)$ as $x \to x_0$; in other words, given any error $1/m$, there exists $1/n$ such that $|x - x_0| < 1/n$ implies

$$|f(x) - g_2(x)| \le \frac{1}{m}|x - x_0|^2.$$

Now the expected improvement in using $g_2$ rather than the affine approximation concerns the rate of convergence as $x \to x_0$. For any particular value of $x$ we do not know whether $g_2$ is a better approximation; it might very well be worse. However, the error being $o(|x - x_0|^2)$ means that by taking $x$ close enough to $x_0$ we have a very good approximation, since $|x - x_0|^2$ is an order of magnitude smaller than $|x - x_0|$. The graphs of $f$ and $g_2$ in Figure 5.4.2 show this in the order of contact at the point.


$$T_n(x_0, x) = f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2 + \cdots + \frac{1}{n!} f^{(n)}(x_0)(x - x_0)^n,$$

called the Taylor expansion of $f$ at $x_0$. Here $f^{(n)}$ denotes the $n$th derivative of $f$, defined by induction $f^{(n)} = (f^{(n-1)})'$. We say $f$ is $C^n$ if all derivatives up to order $n$ exist and are continuous. We think of $x_0$ as a fixed point and $T_n(x_0, x)$ as a function of $x$. It is a polynomial of degree $n$ and is uniquely determined by the requirement that at the point $x_0$ it agrees with $f$ up to the $n$th derivative. If we need to discuss

We will not discuss the problem of obtaining a converse kind of statement, deducing the existence of the second derivative from the existence of quadratic polynomial approximations, because such theorems are extremely difficult to prove and have few applications.

The theorem we have just established is clearly part two of a more general theorem, which is known as Taylor's Theorem. Let us write

as required. QED

Proof: The function $g_2$ was chosen so that $f(x_0) = g_2(x_0)$, $f'(x_0) = g_2'(x_0)$, and $f''(x_0) = g_2''(x_0)$. Thus if we let $F = f - g_2$ we have $F(x_0) = 0$, $F'(x_0) = 0$, and $F''(x_0) = 0$, and $F$ is $C^2$, being the difference of $f$ and a quadratic polynomial. We need to show that we can make $|F(x)| \le |x - x_0|^2/m$ by taking $x$ near $x_0$, and we can do this by applying the mean value theorem twice to $F$. First, since $F(x_0) = 0$, we have $F(x) = F'(x_1)(x - x_0)$ for some $x_1$ between $x_0$ and $x$. Similarly, since $F'(x_0) = 0$, $F'(x_1) = F''(x_2)(x_1 - x_0)$ for some $x_2$ between $x_0$ and $x_1$. The fact that $F$ is $C^2$ guarantees that the continuity and differentiability conditions of the mean value theorem are satisfied by $F$ and $F'$. Putting together these two equations gives us $F(x) = F''(x_2)(x - x_0)(x_1 - x_0)$. Hence $|F(x)| \le |F''(x_2)|\,|x - x_0|^2$ since $|x_1 - x_0| \le |x - x_0|$ because $x_1$ lies between $x_0$ and $x$. Now $F''(x_0) = 0$ and $F''$ is continuous, so given $1/m$ there exists $1/n$ such that $|x - x_0| < 1/n$ implies $|F''(x)| < 1/m$. The point $x_2$ must lie in the neighborhood $|x - x_0| < 1/n$ also, since it lies between $x_0$ and $x$, so $|F''(x_2)| < 1/m$ and we have


$$\frac{g(x) - T_n(x_0, x)}{(x - x_0)^n} \to 0$$

Taylor's theorem is extremely useful for understanding the behavior of a function near a point. It is important to realize, however, that the theorem does not assert anything about the behavior of $T_n(x_0, x)$ at a point $x$ as $n$ varies. The parameter $n$ must be fixed in any application of the theorem, and $x$ must vary close to $x_0$. We will return to this later when we discuss two seemingly related but in fact quite different topics: power series and the Weierstrass approximation theorem.

Taylor's theorem gives us a formula for the Taylor expansion $T_n(x_0, x)$, but it is not always necessary, or advisable, to use this formula to compute the Taylor expansion. We know $T_n(x_0, x)$ is uniquely determined among polynomials of degree $n$ by the condition $f(x) - T_n(x_0, x) = o(|x - x_0|^n)$ as $x \to x_0$, for if we also had $f(x) - g(x) = o(|x - x_0|^n)$ as $x \to x_0$ for $g$ a polynomial of degree $n$, then

$$F(x) = F'(x_1)(x - x_0), \quad F'(x_1) = F''(x_2)(x_1 - x_0), \quad \ldots, \quad F^{(n-1)}(x_{n-1}) = F^{(n)}(x_n)(x_{n-1} - x_0),$$

each time using $F^{(k)}(x_0) = 0$, and the hypothesis that $f$ is $C^n$ (hence $F$ is $C^n$). Altogether we obtain

$$F(x) = F^{(n)}(x_n)(x - x_0)(x_1 - x_0) \cdots (x_{n-1} - x_0),$$

hence $|F(x)| \le |F^{(n)}(x_n)|\,|x - x_0|^n$ and we complete the proof as before using $F^{(n)}(x_0) = 0$ and the continuity of $F^{(n)}$. QED
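The statement $f - T_n = o(|x - x_0|^n)$ says that the ratio of the error to $|x - x_0|^n$ tends to zero. The sketch below illustrates this for a fixed $n$; the choices $f = \exp$, $x_0 = 0$, $n = 2$, and the helper name `taylor_polynomial` are assumptions made for the example only.

```python
import math

def taylor_polynomial(derivatives_at_x0, x0, x):
    """Evaluate T_n(x0, x) = sum over k of f^(k)(x0) (x - x0)^k / k!."""
    return sum(d * (x - x0) ** k / math.factorial(k)
               for k, d in enumerate(derivatives_at_x0))

x0 = 0.0
n = 2
derivs = [1.0, 1.0, 1.0]      # exp and its first two derivatives all equal 1 at 0

ratios = []
for k in range(1, 9):
    x = x0 + 2.0 ** -k
    error = abs(math.exp(x) - taylor_polynomial(derivs, x0, x))
    ratios.append(error / abs(x - x0) ** n)
    print(f"x - x0 = 2^-{k}: error / |x - x0|^n = {ratios[-1]:.3e}")
```

Note that $n$ stays fixed while $x$ approaches $x_0$, which is exactly the regime the theorem describes.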

Theorem 5.4.5 (Taylor's Theorem) Let $f$ be $C^n$ in a neighborhood of $x_0$. Then $f - T_n = o(|x - x_0|^n)$ as $x \to x_0$.

Proof: Setting $F = f - T_n$, we have $F^{(k)}(x_0) = 0$ for $k = 0, 1, \ldots, n$. We need to show $F(x) = o(|x - x_0|^n)$ as $x \to x_0$. We apply the mean value theorem $n$ times to the functions $F, F', \ldots, F^{(n-1)}$ to obtain points $x_1, x_2, \ldots, x_n$ all between $x_0$ and $x$, such that

more than one function at a time we will write $T_n(f, x_0, x)$ instead of $T_n(x_0, x)$.


$$\frac{f(x)}{g(x)} = \frac{f'(x_0) + o(1)}{g'(x_0) + o(1)}$$

since we are assuming $f(x_0) = 0$ and $g(x_0) = 0$. Now we would like to cancel the common factor $x - x_0$, which is non-zero if $x \neq x_0$. This will change the $o(|x - x_0|)$ terms, which stand for functions with limit zero at $x_0$ after dividing by $x - x_0$, to $o(1)$ terms, which stand for functions with limit zero at $x_0$. Thus

$$\frac{f(x)}{g(x)} = \frac{f(x_0) + f'(x_0)(x - x_0) + o(|x - x_0|)}{g(x_0) + g'(x_0)(x - x_0) + o(|x - x_0|)} = \frac{f'(x_0)(x - x_0) + o(|x - x_0|)}{g'(x_0)(x - x_0) + o(|x - x_0|)}$$

As an application, let's look at the notorious L'Hôpital's rule for evaluating limits of quotients, $\lim_{x \to x_0} f(x)/g(x)$ when both limits $\lim_{x \to x_0} f(x)$ and $\lim_{x \to x_0} g(x)$ vanish. We will see that Taylor's Theorem leads to a conceptually clear proof of L'Hôpital's rule and also allows us to answer some related questions, such as: what is the derivative of a quotient $f/g$ at a common zero of $f$ and $g$?

Suppose $f$ and $g$ are $C^1$ and $g'(x_0) \neq 0$. We write

5.4.3 L'Hôpital's Rule*

as $x \to x_0$ and by writing $g(x) - T_n(x_0, x) = \sum_{k=0}^n a_k (x - x_0)^k$ we can show first $a_0 = 0$, then $a_1 = 0$, etc., since the lowest order non-zero term of $\sum_{k=0}^n a_k (x - x_0)^k$ dominates all the others as $x \to x_0$. This means that if we can obtain, by hook or crook, a polynomial $g$ of degree $n$ such that $f(x) - g(x) = o(|x - x_0|^n)$ as $x \to x_0$, then $g(x) = T_n(x_0, x)$. For example, if $f$ is the product of two functions $f = f_1 \cdot f_2$ and we know the Taylor expansions of order $n$ of $f_1$ and $f_2$, then we can obtain the Taylor expansion of order $n$ of $f$ by multiplying the Taylor expansions of $f_1$ and $f_2$ and retaining only the terms of degree up to $n$. Indeed the powers $(x - x_0)^k$ for $k > n$ are all $o(|x - x_0|^n)$ and so may be discarded. We will not state this, or other related results, as a formal theorem but rather enunciate a useful informal principle: you can operate with Taylor expansions in the same way you can operate with functions, discarding higher order terms. We will leave as exercises various special cases of this principle.
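The informal principle for products can be carried out mechanically on coefficient lists. The sketch below is an illustration under stated assumptions: the helper `multiply_truncated` and the example functions $e^x$ and $\cos x$ at $x_0 = 0$ with $n = 4$ are our own choices. Exact rational arithmetic is used so the coefficients can be read off directly.

```python
from fractions import Fraction

def multiply_truncated(p, q, n):
    """Multiply coefficient lists p and q (entry k is the coefficient of
    (x - x0)^k) and discard every term of degree greater than n."""
    product = [Fraction(0)] * (n + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= n:
                product[i + j] += a * b
    return product

n = 4
# Taylor coefficients at x0 = 0 up to order 4.
exp_coeffs = [Fraction(1), Fraction(1), Fraction(1, 2), Fraction(1, 6), Fraction(1, 24)]
cos_coeffs = [Fraction(1), Fraction(0), Fraction(-1, 2), Fraction(0), Fraction(1, 24)]

product_coeffs = multiply_truncated(exp_coeffs, cos_coeffs, n)
print([str(c) for c in product_coeffs])
```

The result gives $1 + x - x^3/3 - x^4/6$ as the order 4 Taylor expansion of $e^x \cos x$ at $0$, which agrees with what one obtains by differentiating the product four times.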


$$\frac{g'(x_0)\left[\tfrac{1}{2} f''(x_0) + o(1)\right] - f'(x_0)\left[\tfrac{1}{2} g''(x_0) + o(1)\right]}{g'(x_0)\left[g'(x_0) + \tfrac{1}{2} g''(x_0)(x - x_0) + o(|x - x_0|)\right]}.$$

$$B(x) = (x - x_0)\,g'(x_0)\left[g'(x_0)(x - x_0) + \tfrac{1}{2} g''(x_0)(x - x_0)^2 + o(|x - x_0|^2)\right]$$

since $f(x_0) = g(x_0) = 0$. Notice that the terms $f'(x_0)g'(x_0)(x - x_0)$ in the numerator cancel, and then we may factor out $(x - x_0)^2$ in the numerator and denominator to obtain simply

and

$$\begin{aligned} A(x) ={}& g'(x_0)\left[f'(x_0)(x - x_0) + \tfrac{1}{2} f''(x_0)(x - x_0)^2 + o(|x - x_0|^2)\right] \\ &- f'(x_0)\left[g'(x_0)(x - x_0) + \tfrac{1}{2} g''(x_0)(x - x_0)^2 + o(|x - x_0|^2)\right] \end{aligned}$$

where

$$\frac{1}{x - x_0}\left[\frac{f(x)}{g(x)} - \frac{f'(x_0)}{g'(x_0)}\right] = \frac{f(x)g'(x_0) - f'(x_0)g(x)}{(x - x_0)\,g(x)\,g'(x_0)} = \frac{A(x)}{B(x)}$$

because $f'(x_0)/g'(x_0)$ is the value of the function at $x_0$. Let us assume that $f$ and $g$ are $C^2$ so that we can take the Taylor expansions to order 2. We obtain

$$\frac{1}{x - x_0}\left[\frac{f(x)}{g(x)} - \frac{f'(x_0)}{g'(x_0)}\right]$$

$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \frac{\lim_{x \to x_0}(f'(x_0) + o(1))}{\lim_{x \to x_0}(g'(x_0) + o(1))} = \frac{f'(x_0)}{g'(x_0)}.$$

Now let's ask the question: what is the derivative of $f(x)/g(x)$ at $x_0$? We assume, of course, that we define the value of $f(x)/g(x)$ at $x_0$ to be $f'(x_0)/g'(x_0)$ so as to have a continuous function. It is not obvious that $f(x)/g(x)$ is differentiable at $x_0$, and even if it is, the usual quotient formula for the derivative will not be very helpful. Instead we want to look at the difference quotient and use Taylor's theorem to find its limit. The difference quotient at $x_0$ is

and if $g'(x_0) \neq 0$ there is no difficulty in taking the limit as $x \to x_0$ to get


Since this is the value we computed for $(f/g)'$ at $x_0$, we have the continuity of $(f/g)'$ at $x_0$. We leave the details as an exercise.

The point of the above applications (and a number of exercises to follow) is that Taylor's theorem reduces certain kinds of problems to rather straightforward computations. Whenever the issue is the local behavior of a function near a point, it is the first technique you should try. Incidentally, I have not given the best possible results for L'Hôpital's rule. There are slightly weaker hypotheses that will also do.
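Both conclusions of this section, the limit $f'(x_0)/g'(x_0)$ and the derivative $\bigl(g'(x_0)f''(x_0) - f'(x_0)g''(x_0)\bigr)/2g'(x_0)^2$ of the quotient at a common zero, can be checked on an example. The sketch below assumes the hypothetical pair $f(x) = \sin x$ and $g(x) = e^x - 1$ with $x_0 = 0$; it is an illustration, not a proof.

```python
import math

def quotient(x):
    """f(x)/g(x) with f = sin and g(x) = e^x - 1, which share a zero at x0 = 0."""
    return math.sin(x) / math.expm1(x)

# Values of the derivatives at x0 = 0.
f1, f2 = 1.0, 0.0     # f'(0) = cos 0,  f''(0) = -sin 0
g1, g2 = 1.0, 1.0     # g'(0) = e^0,    g''(0) = e^0

limit_value = f1 / g1                                     # value assigned at x0
derivative_value = (g1 * f2 - f1 * g2) / (2.0 * g1 ** 2)  # predicted (f/g)'(x0)

h = 1e-4
numeric_limit = quotient(h)
numeric_derivative = (quotient(h) - limit_value) / h      # difference quotient at x0

print(f"f/g near x0: {numeric_limit:.6f} (predicted {limit_value:.6f})")
print(f"difference quotient at x0: {numeric_derivative:.6f} (predicted {derivative_value:.6f})")
```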

$$\lim_{x \to x_0}\left(\frac{f}{g}\right)' = \frac{g'(x_0)f''(x_0) - f'(x_0)g''(x_0)}{2g'(x_0)^2}.$$

which is the derivative of $f/g$ for $x \neq x_0$, we can show, under the assumptions that $f$ and $g$ are $C^3$ and $f(x_0) = g(x_0) = 0$ but $g'(x_0) \neq 0$, that

$$\frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}$$

which clearly has limit $f^{(n)}(x_0)/g^{(n)}(x_0)$ as $x \to x_0$. Using this generalized L'Hôpital's rule with $n = 2$ on the quotient

by the quotient formula for limits, so $f/g$ is differentiable at $x_0$ and this is its derivative. We can also compute higher derivatives of $f/g$ at $x_0$ by similar arguments; to get the $n$th derivative we need to assume that $f$ and $g$ are $C^{n+1}$.

So far we have been dealing only with the case $g'(x_0) \neq 0$. If $g'(x_0) = 0$ but $f'(x_0) \neq 0$ we can easily show that $f/g$ does not have a finite limit as $x \to x_0$. If both $f'(x_0)$ and $g'(x_0)$ are zero, then we can go to higher order Taylor expansions (assuming more derivatives of $f$ and $g$ exist) to compute the limit of $f/g$. If $f$ and $g$ are $C^n$ and $f^{(k)}(x_0) = 0$ and $g^{(k)}(x_0) = 0$ for $k = 0, 1, \ldots, n - 1$ but $g^{(n)}(x_0) \neq 0$, then

$$\frac{g'(x_0)f''(x_0) - f'(x_0)g''(x_0)}{2g'(x_0)^2}$$

It is clear that this has a limit as $x \to x_0$ equal to


Proof: It would appear that we should attempt to give a proof by induction, based on the mean value theorem. However, if you look back at the proof that we gave for Taylor's theorem, you will see that

Lagrange Remainder Theorem Suppose $f$ is $C^{n+1}$ in a neighborhood of $x_0$. Then for every $x$ in the neighborhood there exists $x_1$ between $x_0$ and $x$ such that

$$f(x) = T_n(x_0, x) + \frac{1}{(n+1)!} f^{(n+1)}(x_1)(x - x_0)^{n+1}.$$

so it goes to zero as $x \to x_0$) but would also require assuming $f$ is $C^{n+1}$ since the derivative of order $n + 1$ is involved (as in the case of the mean value theorem we could get away without assuming the continuity of $f^{(n+1)}$).

$$\frac{f(x) - T_n(x_0, x)}{|x - x_0|^n} = O(|x - x_0|),$$

for some point $x_1$ between $x_0$ and $x$. Note that this would give the error as $O(|x - x_0|^{n+1})$, which is somewhat stronger than $o(|x - x_0|^n)$ (if $f(x) - T_n(x_0, x) = O(|x - x_0|^{n+1})$ then

For many applications of Taylor's theorem one needs a more precise form for the remainder, or error, term $o(|x - x_0|^n)$. We give now one such expression, the Lagrange remainder formula. We will give an integral remainder formula in the next chapter. The Lagrange remainder formula is really a generalization of the mean value theorem since it involves the value of a higher derivative of the function at an unspecified point. If we write the mean value theorem as $f(x) = f(x_0) + f'(x_1)(x - x_0)$ for some $x_1$ between $x_0$ and $x$, we can interpret this as a zero-order Taylor theorem with remainder $f'(x_1)(x - x_0)$. Note that the remainder looks exactly like the next term in the Taylor expansion except for the one change that the point $x_1$ appears instead of $x_0$ when we evaluate $f'$. This suggests the generalization

5.4.4 Lagrange Remainder Formula"


we do not quite get the desired form for the remainder. Therefore we will take a different approach. We ask what could make the difference $f(x) - T_n(x_0, x)$ as bad as possible? If $f^{(n+1)}(x)$ were identically zero, then $f$ would be a polynomial of degree $n$ and so $f$ would equal $T_n(x_0, x)$ exactly. It would appear, then, that to make $f(x) - T_n(x_0, x)$ big we need to have $f^{(n+1)}(x)$ big, and it seems reasonable that taking $f^{(n+1)}(x)$ equal to a constant $M$ would do the most damage among all possibilities with $f^{(n+1)}(x)$ bounded by $|M|$. Of course if $f^{(n+1)}(x) = M$ for all $x$, then $f$ is a polynomial of degree $n+1$ and so

$$f(x) = T_n(x_0, x) + \frac{1}{(n+1)!} f^{(n+1)}(x_1)(x - x_0)^{n+1}$$

for any $x_1$. More generally, if $M_+$ and $M_-$ denote the sup and inf of $f^{(n+1)}$ on the interval between $x_0$ and $x$, then it is reasonable to expect that $f(x)$ should lie between the extremes

$$T_n(x_0, x) + \frac{1}{(n+1)!} M_+ (x - x_0)^{n+1}$$

and

$$T_n(x_0, x) + \frac{1}{(n+1)!} M_- (x - x_0)^{n+1}.$$

This will allow us to complete the proof since $f^{(n+1)}$ assumes all values between $M_-$ and $M_+$ on the interval between $x_0$ and $x$.

Suppose for simplicity that $x > x_0$. We need to show that

$$\frac{1}{(n+1)!} M_- (x - x_0)^{n+1} \le f(x) - T_n(x_0, x) \le \frac{1}{(n+1)!} M_+ (x - x_0)^{n+1}$$

where $M_- \le f^{(n+1)} \le M_+$ on $[x_0, x]$. Let us write

$$g(x) = f(x) - T_n(x_0, x) - \frac{1}{(n+1)!} M_- (x - x_0)^{n+1}.$$

We need to show $g(x) \ge 0$; in fact we will show $g$ is non-negative on the interval $[x_0, x]$. Now the way we have constructed $g$ (subtracting off the $n$th order Taylor expansion from $f$ and then subtracting $M_-(x - x_0)^{n+1}/(n+1)!$) guarantees that $g$ and all its derivatives up to order $n$ vanish at $x_0$ (all these derivatives of $(x - x_0)^{n+1}$ vanish at


$x_0$), and furthermore $g^{(n+1)}$ is non-negative on the interval $[x_0, x]$ because $g^{(n+1)} = f^{(n+1)} - M_-$ and $M_-$ is the inf of $f^{(n+1)}$ on the interval (notice that here the factor $1/(n+1)!$ cancels the $(n+1)$-derivative of $(x - x_0)^{n+1}$).

Next we use reverse induction to show $g^{(n)}, g^{(n-1)}, \ldots, g', g$ are all non-negative on the interval. First we note that $g^{(n)}(x_0) = 0$ and $g^{(n)}$ is monotone increasing because its derivative $g^{(n+1)}$ is non-negative, so $g^{(n)}$ is non-negative. Once we know $g^{(n)}$ is non-negative we can apply the same reasoning to $g^{(n-1)}$ and so on. Finally when we have $g(x) \ge 0$ we have established $M_-(x - x_0)^{n+1}/(n+1)! \le f(x) - T_n(x_0, x)$. The other inequality $f(x) - T_n(x_0, x) \le M_+(x - x_0)^{n+1}/(n+1)!$ is established by analogous reasoning.

The case $x < x_0$ is actually a little more complicated, because we have the same estimates when $n+1$ is even but the reverse estimates

$$\frac{1}{(n+1)!} M_+ (x - x_0)^{n+1} \le f(x) - T_n(x_0, x) \le \frac{1}{(n+1)!} M_- (x - x_0)^{n+1}$$

when $n+1$ is odd (when $(x - x_0)^{n+1}$ is negative). We will leave the details as an exercise. Notice that the reversal of the estimates does not in any way destroy the final step in the argument, that we must have $f(x) - T_n(x_0, x) = f^{(n+1)}(x_1)(x - x_0)^{n+1}/(n+1)!$ for some $x_1$ because $f^{(n+1)}$ assumes all values between $M_-$ and $M_+$. QED

5.4.5 Orders of Zeros*

Taylor's theorem allows us to generalize many concepts and theorems about polynomials to more general functions that are sufficiently differentiable. Here we discuss the notion of the order of a zero. If $g$ is a polynomial and $g(x_0) = 0$, the order of the zero at $x_0$ is defined to be the highest integer $k$ such that $(x - x_0)^k$ divides $g(x)$; in other words $g(x) = g_1(x)(x - x_0)^k$ for some polynomial $g_1(x)$ and $g_1(x_0) \neq 0$. Now it is simple algebra to show that the order $k$ is characterized by the fact that $g(x_0) = 0, g'(x_0) = 0, \ldots, g^{(k-1)}(x_0) = 0$ but $g^{(k)}(x_0) \neq 0$. If the order is one we say $x_0$ is a simple zero. The order gives the number of times $(x - x_0)$ divides the polynomial. It also gives the rate at which $g(x)$ tends to zero as $x \to x_0$; namely, $g(x) = O(|x - x_0|^k)$.
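The characterization of the order of a zero of a polynomial by its derivatives can be turned into a small computation. The following is a minimal sketch, assuming the polynomial is nonzero, has integer coefficients (so that the test for zero is exact), and is stored as a list of coefficients with the lowest degree first; the helper names `derivative`, `evaluate` and `order_of_zero` are ours.

```python
def derivative(coeffs):
    """Coefficients (lowest degree first) of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def order_of_zero(coeffs, x0):
    """Smallest k with g^(k)(x0) != 0 (assumes a nonzero integer polynomial)."""
    k = 0
    while coeffs and evaluate(coeffs, x0) == 0:
        coeffs = derivative(coeffs)
        k += 1
    return k

# g(x) = (x - 1)^2 (x + 2) = x^3 - 3x + 2, lowest degree first.
g = [2, -3, 0, 1]
print(order_of_zero(g, 1), order_of_zero(g, -2), order_of_zero(g, 0))
```

For this $g$ the point $1$ is a zero of order two, $-2$ is a simple zero, and $0$ is not a zero at all, in agreement with the factorization.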


Now let $f$ be an arbitrary function that vanishes at $x_0$ and is of class $C^k$. We can say that $f$ has a zero of order $k$ at $x_0$ if $f(x_0) = 0$, $f'(x_0) = 0, \ldots, f^{(k-1)}(x_0) = 0$ but $f^{(k)}(x_0) \neq 0$; in other words if the polynomial $T_k(x_0, x)$ has a zero of order $k$ at $x_0$. Zeroes of order 1, 2, 3 are shown in Figure 5.4.3. Taylor's theorem then says $f(x) = a_k(x - x_0)^k + o(|x - x_0|^k)$ where $a_k = f^{(k)}(x_0)/k!$ is non-zero, so $f(x) = O(|x - x_0|^k)$ as $x \to x_0$. Thus the rate at which $f$ tends to zero is the same as for a polynomial with the same order zero. We can also deduce that if the zero is of odd order the function must change sign near $x_0$, while if the zero is of even order it does not change sign near $x_0$.

Figure 5.4.3: Order 1, Order 2, Order 3

We can paraphrase this discussion by saying the zeroes of order $k$ of an arbitrary function are qualitatively like $(x - x_0)^k$. A natural question that arises is: are there any other kinds of zeroes? If we call the zeroes we have been discussing zeroes of finite order, it turns out that there are also zeroes of infinite order. For this we must assume the function is $C^\infty$, which means that derivatives of all orders exist (hence must be continuous). A $C^\infty$ function is said to have a zero of infinite order at $x_0$ if $f^{(n)}(x_0) = 0$ for all $n$, so that all the Taylor expansions $T_n(x_0, x)$ vanish identically. It is not obvious that such zeros exist (except for $f \equiv 0$), but the function $e^{-1/x^2}$ at $x = 0$ gives one example. We will discuss this further in the chapter on transcendental functions.

Finally we can also prove an analogue of factorization: $f(x) = f_1(x)(x - x_0)^k$ where $f_1(x)$ is a continuous function if $f$ has a zero of order $k$ at $x_0$. This is in fact a variation on l'Hôpital's rule, which we leave as an exercise.
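The claim that $e^{-1/x^2}$ vanishes at $0$ faster than any power of $x$ can be illustrated numerically. This is only a sketch, not a proof: the function names `f` and `quotient` and the sample points are our own choices, and we set $f(0) = 0$ by hand.

```python
import math

def f(x):
    # e^{-1/x^2}, extended by f(0) = 0.
    return 0.0 if x == 0 else math.exp(-1.0 / (x * x))

def quotient(x, k):
    # If f had a zero of finite order k, f(x)/x^k would tend to a non-zero limit.
    return f(x) / x**k

for k in (1, 5, 10):
    # For every k the quotients still collapse toward 0 as x shrinks.
    print(k, [quotient(x, k) for x in (0.5, 0.2, 0.1)])
```

Even after division by $x^{10}$ the values drop rapidly toward zero, which is the behavior one expects from a zero of infinite order.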


5.4.6 Exercises

1. Suppose $f$ is a $C^2$ function on an interval $(a, b)$ and the graph of $f$ lies above every secant line. Prove that $f''(x) \le 0$ on the interval.

2. Suppose $f'(x_0) = 0, f''(x_0) = 0, \ldots, f^{(n-1)}(x_0) = 0$ and $f^{(n)}(x_0) > 0$, for a $C^n$ function $f$. Prove that $f$ has a local minimum at $x_0$ if $n$ is even and that $x_0$ is neither a local maximum nor a local minimum if $n$ is odd.

3. If $f$ is $C^2$ on an interval prove that

$$\lim_{h \to 0} \frac{f(x + h) - 2f(x) + f(x - h)}{h^2} = f''(x).$$

The expression $f(x + h) - 2f(x) + f(x - h)$ is called the symmetric second difference.

4. Let $f$ and $g$ be $C^3$ functions with $f(x_0) = g(x_0) = 0$ but $g'(x_0) \neq 0$. Show that the derivative of $f/g$ is continuous at $x_0$.

5. Under the same hypotheses as exercise 4 show that $(f/g)''$ exists at $x_0$ and compute it.

6. If $f$ and $g$ are $C^{n+1}$ and $f(x_0) = g(x_0) = 0$ but $g'(x_0) \neq 0$ show that $f/g$ is $C^n$ near $x_0$ and find a formula for $(f/g)^{(n)}(x_0)$.

7. If $f$ and $g$ are $C^3$ functions and $f(x_0) = f'(x_0) = g(x_0) = g'(x_0) = 0$ but $g''(x_0) \neq 0$ show that $(f/g)'$ exists at $x_0$ and compute it.

8. For each of the following functions defined for $x \neq 0$ find $\lim_{x \to 0} f(x)$ and $f'(0)$ if the function is appropriately defined at $x = 0$. You may use the familiar formulas for derivatives of sine and cosine:

a. $f(x) = \sin x / x$,
b. $f(x) = (1 - \cos x)/x^2$,
c. $f(x) = (x^2 - x)/\sin x$,
d. $f(x) = x/(1 - \cos x - \sin x)$.

9. Suppose $f$ is a $C^n$ function on an interval and $T_n(x_0, x)$ is the same function of $x$ for all $x_0$ in the interval. What can you say about $f$?


10. Complete the proof of the Lagrange remainder formula for $x < x_0$.

11. Suppose $f$ and $g$ are $C^n$ functions with Taylor expansions denoted $T_n(f, x_0, x)$ and $T_n(g, x_0, x)$. Prove that $T_n(f, x_0, x) + T_n(g, x_0, x)$ is the Taylor expansion of $f + g$ at $x_0$.

12. Under the same hypotheses as exercise 11, show that the Taylor expansion of $f \cdot g$ at $x_0$ is obtained by taking $T_n(f, x_0, x) T_n(g, x_0, x)$ and retaining only the powers of $(x - x_0)$ up to $n$.

13. Suppose $f$ is a $C^n$ function in a neighborhood of $x_0$ and $g$ is $C^n$ in a neighborhood of $f(x_0)$. Let $\sum_{k=0}^n a_k(x - x_0)^k$ be $T_n(f, x_0, x)$, and let $\sum_{j=0}^n b_j(y - y_0)^j$ be $T_n(g, y_0, y)$ where $y_0 = f(x_0)$. Show that $T_n(g \circ f, x_0, x)$ is obtained from

$$\sum_{j=0}^n b_j \left( \sum_{k=1}^n a_k (x - x_0)^k \right)^j$$

by retaining only the powers of $(x - x_0)$ up to $n$.

14. If $f(x) = 1/(1 + x)$ show that $T_n(f, 0, x) = \sum_{k=0}^n (-1)^k x^k$.

15. Suppose $f$ is a $C^n$ function in a neighborhood of $x_0$ and suppose $f(x_0) \neq 0$. Let $T_n(f, x_0, x) = \sum_{k=0}^n a_k(x - x_0)^k$. Show that $T_n(1/f, x_0, x)$ is obtained from

$$\frac{1}{a_0} \left( 1 + \sum_{j=1}^n (-1)^j \left( \sum_{k=1}^n \frac{a_k}{a_0} (x - x_0)^k \right)^j \right)$$

by retaining only the powers of $(x - x_0)$ up to $n$. (Hint: use exercises 13 and 14.)

16. Compute the Taylor expansions to order 3 for each of the following functions at the points $x_0 = 0$ and $x_0 = 1$:

a. $f(x) = (x^2 + 1)^{25}$,
b. $f(x) = x/(x^2 + 1)$,
c. $f(x) = (1 + x + 2x^2) \sin 2\pi x$,
d. $f(x) = \cos(1 + x^2)$.


17. Suppose $f$ is $C^1$ on an interval and $f'$ satisfies the Hölder condition of order $\alpha$, $|f'(x) - f'(y)| \le M|x - y|^\alpha$ for all $x$ and $y$ in the interval, where $\alpha$ is a fixed value, $0 < \alpha \le 1$. Show that $|\Delta_h^2 f(x)| \le c|h|^{1+\alpha}$. How does the constant $c$ relate to the constant $M$?

18. Let $f$ be a $C^n$ function. Show that the derivative of $T_n(f, x_0, x)$ is equal to $T_{n-1}(f', x_0, x)$.

19. Suppose $f$ has a zero of order $j$ at $x_0$ and $g$ has a zero of order $k$ at $x_0$. What can you say about the order of zero of the functions $f + g$, $f \cdot g$, $f/g$ at $x_0$?

20. Use the second-order Taylor theorem with Lagrange remainder to estimate JTIIT.

21. Apply Taylor's theorem with Lagrange remainder to $(x + y)^a$ for $a$ rational to obtain a form of the binomial theorem.

22. *a. Let $f$ be a $C^2$ function on $[a, b]$ and let $x_1, \ldots, x_n$ be points in $[a, b]$. Show that $f''(x) \ge 0$ on $[a, b]$ implies

$$f\left( \frac{1}{n}(x_1 + \cdots + x_n) \right) \le \frac{1}{n}\left( f(x_1) + \cdots + f(x_n) \right)$$

while $f''(x) \le 0$ implies the reverse inequality.

b. More generally, let $p_1, \ldots, p_n$ be positive and satisfy the condition $p_1 + \cdots + p_n = 1$. Show $f''(x) \ge 0$ on $[a, b]$ implies $f(p_1 x_1 + \cdots + p_n x_n) \le p_1 f(x_1) + \cdots + p_n f(x_n)$, while $f''(x) \le 0$ implies the reverse inequality.

23. a. If $f$ is $C^n$ on an interval and has $n + 1$ distinct zeroes, prove that $f^{(n)}$ has at least one zero on the interval.

b. If $f$ is $C^n$ on an interval and $f^{(n)}$ never vanishes, then $f$ has at most $n$ zeroes on the interval.

c. A polynomial of degree $n$ has at most $n$ real zeroes.

d. *If $f$ is $C^2$ on an interval and $x_1, x_2, x_3$ are three distinct points on the interval, then there exists $y$ in the interval with

$$(x_1 - x_2) f(x_3) + (x_2 - x_3) f(x_1) + (x_3 - x_1) f(x_2) = -\frac{1}{2} f''(y)(x_1 - x_2)(x_2 - x_3)(x_3 - x_1).$$


(Hint: subtract a quadratic polynomial to reduce to part a).

24. If $f$ is $C^2$, prove that $f$ cannot have a local maximum or minimum at an inflection point (note that an inflection point is defined as a point where $f''$ changes sign; it is not enough that $f''$ vanish at the point).

5.5 Summary

5.1 Concepts of the Derivative

Definition 5.1.1 A function $f$ defined in a neighborhood of $x_0$ is said to be differentiable at $x_0$ with derivative $f'(x_0)$ if

$$\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = f'(x_0)$$

or, equivalently, if for every $m$ there exists $n$ such that $|x - x_0| < 1/n$ implies $|f(x) - g(x)| \le |x - x_0|/m$ where $g(x) = f(x_0) + f'(x_0)(x - x_0)$, called the best affine approximation to $f$ at $x_0$.

Definition 5.1.2 For functions $f$ and $g$ defined in a neighborhood of $x_0$, we say $f(x) = O(g(x))$ as $x \to x_0$ if $|f(x)| \le c|g(x)|$ for some constant $c$ in a neighborhood of $x_0$. We say $f(x) = o(g(x))$ as $x \to x_0$ if $\lim_{x \to x_0} f(x)/g(x) = 0$.

Theorem If $f$ is differentiable at $x_0$, then $f$ is continuous at $x_0$.

Definition A function is said to be differentiable on an open set if it is differentiable at each point of the set. It is said to be continuously differentiable ($C^1$) if the derivative is a continuous function on the set.

Example

$$f(x) = \begin{cases} x^2 \sin(1/x^2), & x \neq 0, \\ 0, & x = 0, \end{cases}$$

is differentiable but not $C^1$.


5.2 Properties of the Derivative

Definition 5.2.1 A function $f$ defined in a neighborhood of $x_0$ is said to be monotone (resp. strictly) increasing at $x_0$ if there exists a neighborhood of $x_0$ on which $x_1 < x_0 < x_2$ implies $f(x_1) \le f(x_0) \le f(x_2)$ (resp. $f(x_1) < f(x_0) < f(x_2)$). It is said to have a local maximum (resp. strict local maximum) if there exists a neighborhood of $x_0$ on which $f(x) \le f(x_0)$ (resp. $f(x) < f(x_0)$ for $x \neq x_0$). A function defined on an interval is said to be monotone (resp. strictly) increasing on the interval if $x < y$ implies $f(x) \le f(y)$ (resp. $f(x) < f(y)$) for all $x$ and $y$ in the interval. Similar definitions apply to monotone and strict decreasing and local minimum and strict local minimum by reversing the inequalities.

Theorem 5.2.1 $f'(x_0) > 0$ implies $f$ is strictly increasing at $x_0$. If $f$ is monotone increasing at $x_0$, then $f'(x_0) \ge 0$. If $f$ has a local maximum at $x_0$, then $f'(x_0) = 0$.

Intermediate Value Theorem If $f$ is differentiable on $(a, b)$, then $f'(x)$ assumes all values between $f'(x_1)$ and $f'(x_2)$ on the interval $(x_1, x_2)$.

Mean Value Theorem If $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then $f'(x_0) = (f(b) - f(a))/(b - a)$ for some $x_0$ in $(a, b)$.

Theorem 5.2.2 Let $f$ be differentiable on $(a, b)$.

1. $f$ is monotone increasing on $(a, b)$ if and only if $f'(x) \ge 0$ on $(a, b)$.

2. $f'(x) > 0$ on $(a, b)$ implies $f$ is strictly increasing on $(a, b)$.

3. $f'(x) = 0$ on $(a, b)$ implies $f$ is constant on $(a, b)$.

Theorem If $f$ is differentiable on $(a, b)$ and $f'$ is bounded, then $f$ satisfies a Lipschitz condition uniformly on $(a, b)$.


5.3 The Calculus of Derivatives

Theorem 5.3.1 If $f$ and $g$ are differentiable at $x_0$, then so are $f \pm g$, $f \cdot g$, and $f/g$ (if $g(x_0) \neq 0$) with the familiar formulas for the derivatives.

Theorem 5.3.2 (Chain Rule) If $f$ is differentiable at $x_0$ and $g$ is differentiable at $f(x_0)$, then $g \circ f$ is differentiable at $x_0$ and $(g \circ f)'(x_0) = g'(f(x_0)) f'(x_0)$.

Inverse Function Theorem Let $f$ be $C^1$ on $(a, b)$ with image $(c, d)$, and suppose $f'(x) > 0$ on $(a, b)$ (or $f'(x) < 0$ on $(a, b)$). Then $f^{-1}$ exists on $(c, d)$ and is $C^1$ with $(f^{-1})'(y) = 1/f'(x)$ if $y = f(x)$.

Local Inverse Function Theorem Let $f$ be $C^1$ in a neighborhood of $x_0$ with $f'(x_0) \neq 0$. Then there exists a neighborhood $(a, b)$ of $x_0$ such that $f$ restricted to $(a, b)$ has a $C^1$ inverse on $(c, d) = f((a, b))$.

5.4 Higher Derivatives and Taylor's Theorem

Definition If $f'$ is defined in a neighborhood of $x_0$ and differentiable at $x_0$, then $f$ is said to be twice differentiable at $x_0$ with second derivative $f''(x_0)$ equal to $(f')'(x_0)$. If $f''(x)$ exists and is continuous on an interval we say $f$ is twice continuously differentiable ($C^2$).

Theorem 5.4.1 Suppose $f''(x_0)$ exists, and let $g(x) = f(x_0) + f'(x_0)(x - x_0)$ be the best affine approximation to $f$ at $x_0$.

1. If $f''(x_0) > 0$, then $f(x) > g(x)$ on a neighborhood of $x_0$, for $x \neq x_0$ (the graph of $f$ lies above the tangent line).

2. If $f(x) \ge g(x)$ in a neighborhood of $x_0$, then $f''(x_0) \ge 0$.

3. If $f'(x_0) = 0$ and $f''(x_0) > 0$, then $x_0$ is a strict local minimum.

4. If $x_0$ is a local minimum, then $f''(x_0) \ge 0$.

Theorem 5.4.2 If $f$ is $C^2$ on $(a, b)$ and $f''(x) > 0$ on $(a, b)$, then the graph of $f$ lies below any secant line.


Theorem 5.4.3 If $f$ is $C^2$ on an interval, then

$$\lim_{h \to 0} \frac{\Delta_h^2 f(x)}{h^2} = f''(x)$$

on the interval.

Theorem 5.4.4 If $f$ is $C^2$ in a neighborhood of $x_0$, then $f(x) - g_2(x) = o(|x - x_0|^2)$ as $x \to x_0$ where

$$g_2(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2.$$

Definition $f^{(n)}(x_0) = (f^{(n-1)})'(x_0)$ by induction, if it exists. We say $f$ is $C^n$ if $f^{(k)}$ exists and is continuous for all $k \le n$. The Taylor expansion of order $n$ at $x_0$ for a $C^n$ function is defined by

$$T_n(x_0, x) = \sum_{k=0}^n \frac{1}{k!} f^{(k)}(x_0)(x - x_0)^k.$$

Theorem 5.4.5 (Taylor's Theorem) If $f$ is $C^n$, then $f(x) - T_n(x_0, x) = o(|x - x_0|^n)$ as $x \to x_0$.

L'Hôpital Rule If $f$ and $g$ are $C^1$ with $f(x_0) = g(x_0) = 0$ but $g'(x_0) \neq 0$, then

$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \frac{f'(x_0)}{g'(x_0)}.$$

If $f$ and $g$ are $C^2$, then $f/g$ is differentiable at $x_0$ with

$$\left( \frac{f}{g} \right)'(x_0) = \frac{g'(x_0) f''(x_0) - f'(x_0) g''(x_0)}{2 g'(x_0)^2}.$$

If $f$ and $g$ are $C^3$, then $f/g$ is $C^1$.

Lagrange Remainder Theorem If $f$ is $C^{n+1}$, then

$$f(x) = T_n(x_0, x) + \frac{1}{(n+1)!} f^{(n+1)}(x_1)(x - x_0)^{n+1}$$


for some $x_1$ between $x_0$ and $x$.

Definition If $f$ is $C^k$ we say $f$ has a zero of order $k$ at $x_0$ if $f(x_0) = f'(x_0) = \cdots = f^{(k-1)}(x_0) = 0$ but $f^{(k)}(x_0) \neq 0$.

Theorem If $f$ has a zero of order $k$ at $x_0$, then $f(x) = O(|x - x_0|^k)$ as $x \to x_0$.


Chapter 6

Integral Calculus

6.1 Integrals of Continuous Functions

6.1.1 Existence of the Integral

In this section we will prove the existence of the definite integral $\int_a^b f(x)\,dx$ of a continuous function $f$ on a compact interval $[a, b]$, following the usual approach. That is, we partition the interval into $n$ subintervals $[x_{k-1}, x_k]$ where $a = x_0 < x_1 < \cdots < x_n = b$ and form the approximating sum $\sum_{k=1}^n f(x_k)(x_k - x_{k-1})$. The value $f(x_k)(x_k - x_{k-1})$ is the area of the rectangle shown in Figure 6.1.1 with base $[x_{k-1}, x_k]$ and height $f(x_k)$, which should be approximately the area under the graph of $f$ over the interval $[x_{k-1}, x_k]$ if $f$ does not vary much. Thus, the approximating sum should be close to the area under the graph of $f$ (with the convention that regions below the $x$-axis are counted with a minus sign). The integral should then be obtained as a limit of these approximating sums as the size of the subintervals decreases to zero, and the number $n$ of subintervals increases without bound. The notion of "limit" we are using here is somewhat different from the notion of a limit of a function or limit of a sequence, but it is very much in the same spirit. We will give the precise definition below, but before doing so we need to discuss some of the intuitive ideas that will be needed in the proof and that will also play a role in our later extension of the notion of integral to more general (not necessarily continuous) functions.

The choice of the point $x_k$ at which we evaluate $f$ is somewhat arbitrary. We could just as well choose any other point $a_k$ in the interval


$[x_{k-1}, x_k]$. In fact let us write $S(f, P) = \sum_{k=1}^n f(a_k)(x_k - x_{k-1})$ where $P$ denotes the partition and $a_k$ is any point in $[x_{k-1}, x_k]$ and call this a Cauchy sum. Two important special cases are when we choose $f(a_k)$ as large (or as small) as possible. If $M_k$ denotes the sup of $f$ on $[x_{k-1}, x_k]$ and $m_k$ the inf, then $\sum_{k=1}^n M_k(x_k - x_{k-1})$ and $\sum_{k=1}^n m_k(x_k - x_{k-1})$ are the largest and smallest Cauchy sums we can form (since $f$ is assumed continuous, it attains its sup and inf on the closed subintervals). We call these the upper and lower Riemann sums, written

$$S^+(f, P) = \sum_{k=1}^n M_k(x_k - x_{k-1}),$$

$$S^-(f, P) = \sum_{k=1}^n m_k(x_k - x_{k-1}).$$

Figure 6.1.1:

(See Figure 6.1.2). Clearly any Cauchy sum $S(f, P)$ must lie between these. Also, from the intuitive properties of area, it is clear that the area under the graph of $f$ must lie somewhere in between the upper and lower Riemann sums.

Definition 6.1.1 For any partition $P$, the maximum interval length is the maximum length of the subintervals of the partition. We say that the limit of $S(f, P)$ exists and equals the number $\int_a^b f(x)\,dx$ if for every error $1/N$ there exists $1/m$ such that $|S(f, P) - \int_a^b f(x)\,dx| \le 1/N$ for any partition $P$ with maximum interval length $\le 1/m$.
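The three kinds of sums can be computed directly for a concrete function. The sketch below makes several assumptions of our own: the partition is into $n$ equal subintervals, the Cauchy sum uses the right endpoints as the points $a_k$, and the sup and inf on each subinterval are only approximated by sampling (which is exact when $f$ is monotone on each subinterval, as for $x^2$ on $[0, 1]$). The function name `riemann_sums` is hypothetical.

```python
def riemann_sums(f, a, b, n, samples=200):
    """Return (lower, cauchy, upper) sums for n equal subintervals of [a, b].

    The sup and inf on each subinterval are approximated by sampling,
    always including both endpoints of the subinterval.
    """
    h = (b - a) / n
    lower = cauchy = upper = 0.0
    for k in range(1, n + 1):
        left, right = a + (k - 1) * h, a + k * h
        values = [f(left), f(right)]
        values += [f(left + (right - left) * i / samples) for i in range(1, samples)]
        lower += min(values) * h      # approximates m_k (x_k - x_{k-1})
        cauchy += f(right) * h        # Cauchy sum with a_k = x_k
        upper += max(values) * h      # approximates M_k (x_k - x_{k-1})
    return lower, cauchy, upper

lower, cauchy, upper = riemann_sums(lambda x: x * x, 0.0, 1.0, 100)
print(lower, cauchy, upper)
```

For $f(x) = x^2$ on $[0, 1]$ with $n = 100$ the three values bracket $1/3$, and the gap between the upper and lower sums is about $1/100$.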


Figure 6.1.2:

Since the "area under the graph of $f$" is only an intuitive concept, we cannot use it to prove the existence of the integral; rather we want to use the integral to give a precise mathematical counterpart to the intuitive concept of area. Our strategy will be to show that the upper and lower Riemann sums converge to a common limit. Then the intuitive "area" should also be equal to this limit, because it is squeezed in between. The same argument will show that any Cauchy sums $S(f, P)$ will converge to the limit. Thus we need to show that $S^+(f, P)$ and $S^-(f, P)$ converge to a common limit as the size of the intervals in $P$ tends to zero.

Why is this true? Let's first look at a somewhat simpler question that contains the crux of the matter: what will make $S^+(f, P) - S^-(f, P)$ small? The difference of the upper and lower Riemann sums is the area of the little rectangles in Figure 6.1.3, or $\sum_{k=1}^n (M_k - m_k)(x_k - x_{k-1})$. The sum of the lengths of the bases of these rectangles is clearly the length of the full interval $b - a = \sum_{k=1}^n (x_k - x_{k-1})$, which does not depend on the particular partition. The heights of the rectangles, $M_k - m_k$, are the variations of the function $f$ over the subintervals $[x_{k-1}, x_k]$, and these can be made small by taking the subintervals sufficiently small, by the continuity of $f$. Since we need to make them all small simultaneously, it is clear that we should use uniform continuity. Because $f$ is assumed continuous on the compact interval $[a, b]$, we know that $f$ is uniformly continuous: given any error $1/N$ there exists $1/m$ such that $|x - y| < 1/m$ implies $|f(x) - f(y)| < 1/N$.


Figure 6.1.3:

Thus if we choose the partition $P$ such that $x_k - x_{k-1} < 1/m$ for all $k$, then $M_k - m_k < 1/N$ since $M_k$ and $m_k$ are values assumed by $f$ on $[x_{k-1}, x_k]$. Thus

$$S^+(f, P) - S^-(f, P) = \sum_{k=1}^n (M_k - m_k)(x_k - x_{k-1}) < \frac{1}{N} \sum_{k=1}^n (x_k - x_{k-1}) = \frac{1}{N}(b - a),$$

which can be made as small as desired since $b - a$ is fixed.

Being able to make the upper and lower Riemann sums close to each other for a fixed partition does not in itself imply the existence of the limit defining the integral. It is still conceivable that the values of these Riemann sums vary a lot as we vary the partition. This in fact does not happen, but to see why we need a new idea: the upper Riemann sums decrease and the lower Riemann sums increase when we add points to the partitions. Let us say that a partition $P'$ is a refinement of $P$ if $P'$ contains $P$ (we think of the partitions as consisting of the endpoints of the subintervals); in other words $P'$ is obtained by further partitioning the subintervals of $P$. Since the sup of $f$ over the smaller subintervals of $P'$ will be smaller than the sup of $f$ over the containing subintervals of $P$, we will have $S^+(f, P') \le S^+(f, P)$. Similarly $S^-(f, P') \ge S^-(f, P)$. Thus if we consider a sequence of partitions $P_1, P_2, \ldots$, each of which


is a refinement of the previous one, the sequence $\{S^+(f, P_k)\}$ will be monotone decreasing and $\{S^-(f, P_k)\}$ will be monotone increasing, so each will have a limit (the limits are finite since all Cauchy sums are bounded above and below by $b - a$ times the sup and inf of $f$ on the interval). If the maximum interval length of $P_k$ tends to zero as $k \to \infty$, then the argument above shows $S^+(f, P_k) - S^-(f, P_k) \to 0$, so the two limits must be equal. We can take the limiting value as the definition of $\int_a^b f(x)\,dx$. Of course we then need to verify that the limit does not depend on the particular sequence of partitions $P_1, P_2, \ldots$. This is not hard to accomplish, using the ideas we have already encountered.

Theorem 6.1.1 (Existence of the Integral) Let $f$ be a continuous function on $[a, b]$. Then the limit of the Cauchy sums $S(f, P)$ exists. The integral $\int_a^b f(x)\,dx$ is also equal to the inf of the upper Riemann sums $S^+(f, P)$ and the sup of the lower Riemann sums $S^-(f, P)$ as $P$ varies over all partitions.

Proof: Let $U$ denote the set of values of upper Riemann sums $S^+(f, P)$ and $L$ denote the set of values of lower Riemann sums $S^-(f, P)$. We claim that $\inf U = \sup L$. Indeed $\inf U \ge \sup L$ because every element in $U$ is greater than every element of $L$ (if $P_1$ and $P_2$ are two partitions and $P_3$ is the union of the points in $P_1$ and $P_2$, so that $P_3$ is a refinement of both $P_1$ and $P_2$, then $S^+(f, P_1) \ge S^+(f, P_3) \ge S^-(f, P_3) \ge S^-(f, P_2)$). If $P_k$ is any sequence of partitions such that $P_{k+1}$ is a refinement of $P_k$ and the maximum interval length of $P_k$ tends to zero, then $\lim_{k \to \infty} S^+(f, P_k) = \lim_{k \to \infty} S^-(f, P_k)$, so we cannot have $\inf U > \sup L$.

Let $\int_a^b f(x)\,dx$ equal the common value of $\inf U$ and $\sup L$. Given the error $1/N$, choose $1/m$ so that $|x - y| < 1/m$ implies $|f(x) - f(y)| \le 1/(N(b - a))$. If $P$ is any partition with maximum interval length at most $1/m$, then

$$S^+(f, P) - S^-(f, P) = \sum_{k=1}^n (M_k - m_k)(x_k - x_{k-1}) \le \frac{1}{N(b - a)} \sum_{k=1}^n (x_k - x_{k-1}) = \frac{1}{N}$$

and so any Cauchy sum $S(f, P)$ lies in the interval $[S^-(f, P), S^+(f, P)]$ of length at most $1/N$. But $\int_a^b f(x)\,dx$ also lies in this interval ($\inf U \le$


$S^+(f, P)$ and $\sup L \ge S^-(f, P)$), so the two differ by at most $1/N$. QED

From the definition we obtain immediately the linearity properties of the integral:

$$\int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx,$$

$$\int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx, \quad c \text{ constant},$$

and the additivity:

$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx$$

if $a < b < c$. By defining the integral for $a > b$ by $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$ and for $a = b$ by $\int_a^b f(x)\,dx = 0$, we can easily verify that the additivity continues to hold. We also have the basic estimate

$$m(b - a) \le \int_a^b f(x)\,dx \le M(b - a)$$

for $a < b$ where $M$ is the sup and $m$ the inf of $f$ on $[a, b]$. This follows from the fact that $M(b - a)$ and $m(b - a)$ are Riemann upper and lower sums for the partition consisting of the single interval $[a, b]$. The expression $\int_a^b f(x)\,dx / (b - a)$ may be interpreted as an "average" value of $f$ on the interval $[a, b]$, so our basic estimate can be interpreted as saying that the average lies between the maximum and the minimum values. This is certainly a property that one would expect an average to have. Another property one would expect from an average is linearity, and this is an immediate consequence of the linearity of the integral. This integral average is an "unbiased" average in that subintervals of equal length contribute to the average equally. For some applications it is desirable to give different weights to different portions of the interval. This can be accomplished by using a continuous, positive weight function $w(x)$ and defining the weighted integral average to be $\int_a^b f(x) w(x)\,dx / \int_a^b w(x)\,dx$. We leave to the exercises the verification that this satisfies the above properties of a reasonable average.
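The weighted integral average is easy to experiment with numerically. The following sketch is an illustration under our own assumptions: integrals are approximated by Cauchy sums with midpoints as the chosen points, and the names `integral` and `weighted_average`, the test function $f(x) = x$, and the weight $w(x) = 1 + x$ are all hypothetical choices.

```python
def integral(f, a, b, n=10_000):
    """Approximate the integral by a Cauchy sum with midpoints as tags."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

def weighted_average(f, w, a, b):
    """Weighted integral average of f with a positive weight w on [a, b]."""
    return integral(lambda x: f(x) * w(x), a, b) / integral(w, a, b)

f = lambda x: x
# The unbiased average (w = 1) of f(x) = x on [0, 1] is 1/2.
print(weighted_average(f, lambda x: 1.0, 0.0, 1.0))
# With w(x) = 1 + x the right half of the interval counts more: the value is 5/9.
print(weighted_average(f, lambda x: 1.0 + x, 0.0, 1.0))
```

Both values lie between the minimum $0$ and the maximum $1$ of $f$ on $[0, 1]$, as the basic estimate leads us to expect, and the weighted one is pulled toward the end of the interval where the weight is larger.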


6.1.2 Fundamental Theorems of Calculus

We will frequently want to consider the integral with one of the endpoints variable, to obtain a function, say $F(x) = \int_a^x f(t)\,dt$ for $a \le x \le b$ if $f$ is continuous on $[a, b]$. We will informally call $F$ the "integral" of $f$. We make the obvious remark that it is easier for the integral of $f$ to exist than for the derivative of $f$ to exist, contrary to the impression given in calculus courses, where one is impressed by the fact that it is easier to compute a "formula" for the derivative than to compute a "formula" for the integral. Our perspective now is that $\int_a^b f(x)\,dx$ is a formula for the integral; this formula does involve an infinite process, but we can always obtain approximations to $\int_a^x f(t)\,dt$ with any desired error. The restriction that $f$ be continuous can be relaxed, and we will discuss this in detail later.

We can now obtain easily the fundamental theorem of the calculus, or rather the two fundamental theorems: differentiation of the integral and integration of the derivative.

Theorem 6.1.2 (Differentiation of the Integral) Let $f$ be a continuous function on $[a, b]$, and let $F(x) = \int_a^x f(t)\,dt$ for $a \le x \le b$. Then $F$ is $C^1$ and $F' = f$.

Proof: From the basic properties of the integral we find the difference quotient for $F$ is

$$\frac{F(x_0 + h) - F(x_0)}{h} = \frac{1}{h} \int_{x_0}^{x_0 + h} f(t)\,dt.$$

Now $(1/h) \int_{x_0}^{x_0 + h} f(t)\,dt$ is a kind of average value of $f$ on the interval $[x_0, x_0 + h]$, so we would expect it to converge to $f(x_0)$ since $f$ is continuous. (In Figure 6.1.4, it is the shaded area divided by the length of the base.)

More precisely, if $h > 0$, then $\int_{x_0}^{x_0 + h} f(t)\,dt$ lies between $hM$ and $hm$ where $M$ and $m$ are the sup and inf of $f$ on $[x_0, x_0 + h]$. Thus $m \le (1/h) \int_{x_0}^{x_0 + h} f(t)\,dt \le M$. If $h < 0$, then

$$\frac{1}{h} \int_{x_0}^{x_0 + h} f(t)\,dt = \frac{1}{-h} \int_{x_0 + h}^{x_0} f(t)\,dt$$

and

$$-mh \le \int_{x_0 + h}^{x_0} f(t)\,dt \le -Mh$$


where now $M$ and $m$ are the sup and inf of $f$ on $[x_0 + h, x_0]$. Since $-h$ is positive, we obtain again $m \le (1/h) \int_{x_0}^{x_0 + h} f(t)\,dt \le M$. Since $f$ is continuous, both $M$ and $m$ tend to $f(x_0)$ as $h \to 0$, so

$$\lim_{h \to 0} \frac{1}{h} \int_{x_0}^{x_0 + h} f(t)\,dt = f(x_0).$$

This shows $F' = f$; and since $f$ was assumed continuous, we have $F$ is $C^1$. QED

Figure 6.1.4:

The antiderivative $F$ is not the unique solution to the equation $g' = f$. However, any other solution must differ from $F$ by a constant, for then $F - g$ would have derivative zero; but we have seen that a function with zero derivative on an interval must be constant.

Note that the previous theorem has a rather trivial analogue involving sums and differences: if $x_1, x_2, \ldots$ is any sequence of numbers and we form the sequence of sums $y_1, y_2, \ldots$ with $y_n = x_1 + x_2 + \cdots + x_n$, then $x_n = y_n - y_{n-1}$.

Similarly, the next theorem (integration of the derivative) is an "infinitesimal" version of the familiar fact that sums of differences "telescope": $\sum_{k=1}^n (x_k - x_{k-1}) = x_n - x_0$. It was first stated in the fourteenth century by Nicole Oresme in the form: the area under the graph of velocity is the distance traveled. He justified it by saying that the area was the sum of the areas of the vertical lines (thought of as rectangles with infinitesimal bases) and the areas of the vertical lines represent the distance traveled in an infinitesimal time interval, as indicated in Figure 6.1.5.

Figure 6.1.5:

Theorem 6.1.3 (Integration of the Derivative) Let $f$ be $C^1$ on $[a, b]$ (since the interval is closed, the derivative $f'$ at the endpoints is a one-sided derivative). Then $\int_a^b f'(x)\,dx = f(b) - f(a)$.

Proof: For any partition $P$ of the interval, the Cauchy sum

$$\sum_{k=1}^n f'(a_k)(x_k - x_{k-1})$$

adds up terms $f'(a_k)(x_k - x_{k-1})$ that are approximately $f(x_k) - f(x_{k-1})$. In fact, by the mean value theorem, it is possible to choose $a_k$ in the interval $[x_{k-1}, x_k]$ to make this exact equality: $f'(a_k)(x_k - x_{k-1}) = f(x_k) - f(x_{k-1})$. For this choice of $a_k$ the Cauchy sum telescopes, so $S(f', P) = f(b) - f(a)$ exactly. Since we know that $\int_a^b f'(x)\,dx$ is the limit of any choice of Cauchy sum $S(f', P)$ as the maximum interval length tends to zero, it follows that it is the limit with this particular choice, hence $f(b) - f(a)$. QED

It is also possible to prove the integration of the derivative as a consequence of the differentiation of the integral, but this proof lacks intuitive appeal. We leave it as an exercise.
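The integration of the derivative can be watched at work numerically. The sketch below is an illustration only, with our own choices: $f = \sin$ on $[0, 2]$, so $f' = \cos$, and the Cauchy sums use right endpoints on equal subintervals rather than the special mean value points from the proof; `cauchy_sum` is a hypothetical helper name.

```python
import math

def cauchy_sum(g, a, b, n):
    """Right-endpoint Cauchy sum of g on n equal subintervals of [a, b]."""
    h = (b - a) / n
    return sum(g(a + k * h) for k in range(1, n + 1)) * h

a, b = 0.0, 2.0
exact = math.sin(b) - math.sin(a)   # f(b) - f(a) with f = sin

for n in (10, 100, 1000):
    # Cauchy sums of f' = cos approach f(b) - f(a) as the partition is refined.
    print(n, cauchy_sum(math.cos, a, b, n), exact)
```

With the mean value points in place of the right endpoints every one of these sums would equal $f(b) - f(a)$ exactly; with right endpoints the sums only converge to it, with an error that shrinks roughly in proportion to the maximum interval length.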


It is interesting to compare the integration of the derivative theorem with the mean value theorem. Both give the value of $(f(b) - f(a))/(b - a)$ in terms of the derivative $f'$. In the mean value theorem the derivative is evaluated at an unspecified point; in the integration of the derivative theorem this is replaced by the integral average $\int_a^b f'(t)\,dt/(b - a)$. In many applications you can use either result. (For example, as an exercise, try reproving the results in Section 5.2.3 by using the integration of the derivative theorem). Although the mean value theorem has the advantage that it holds with weaker hypotheses (the derivative does not have to be continuous), it also has the disadvantage of the unspecified nature of the point. A good rule of thumb is to try the mean value theorem first, but if you run into difficulties, to switch to the integration of the derivative theorem.

The two fundamental theorems enable us to define the indefinite integral or primitive of the continuous function $f$ to be any $C^1$ function $F$ such that $F' = f$. The differentiation of the integral theorem shows that an indefinite integral always exists, and we have seen that it is unique up to an additive constant. The integration of the derivative theorem shows us how to evaluate definite integrals using indefinite integrals. The two theorems can also be interpreted as saying the operations of differentiation and integration are inverse operations; however, since differentiation is not one-to-one, the inverse indefinite integration operator is only defined up to an additive constant.

As a consequence of the integration of the derivative and the product formula for derivatives we obtain the familiar integration by parts formula.

Theorem 6.1.4 (Integration by Parts) Let $f$ and $g$ be $C^1$ on $[a, b]$. Then

$$\int_a^b f(x) g'(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b f'(x) g(x)\,dx.$$

Proof: Since $(f \cdot g)' = f'g + fg'$, we have $\int_a^b [f'(x)g(x) + f(x)g'(x)]\,dx = f(b)g(b) - f(a)g(a)$. QED

As an application we derive the integral remainder formula for Taylor's theorem. Assume $f$ is $C^{n+1}$. We define the remainder $R_n(x_0, x)$ by the equation


Page 230: Strichartz_The Way of Analysis 2000

where $T_n(x_0, x)$ is the Taylor expansion of $f$ to order $n$. We have already established Lagrange's remainder formula

$$R_n(x_0, x) = \frac{1}{(n+1)!} f^{(n+1)}(x_1)(x - x_0)^{n+1},$$

where $x_1$ is some unspecified point between $x_0$ and $x$. For some applications, especially when one needs to vary $x$, the unspecific nature of $x_1$ causes difficulties (it gives no information about the derivative of the remainder, for example). The integral remainder formula is

$$R_n(x_0, x) = \frac{1}{n!}\int_{x_0}^{x} (x - t)^n f^{(n+1)}(t)\,dt.$$

Note that $\int_{x_0}^{x} (x - t)^n\,dt/n! = (x - x_0)^{n+1}/(n+1)!$, so both the integral remainder and Lagrange remainder are of the same order of magnitude, namely $O(|x - x_0|^{n+1})$. Notice that the Taylor expansion with integral remainder formula for $n = 0$ is $f(x) = f(x_0) + \int_{x_0}^{x} f'(t)\,dt$, just the integration of the derivative. To establish the integral remainder formula in general we simply apply integration by parts to

$$\frac{1}{n!}\int_{x_0}^{x} (x - t)^n f^{(n+1)}(t)\,dt,$$

integrating $f^{(n+1)}$ and differentiating $(x - t)^n/n!$ to obtain

$$\frac{1}{n!}\int_{x_0}^{x} (x - t)^n f^{(n+1)}(t)\,dt = \frac{1}{(n-1)!}\int_{x_0}^{x} (x - t)^{n-1} f^{(n)}(t)\,dt - \frac{1}{n!}(x - x_0)^n f^{(n)}(x_0),$$

and the proof can be completed by induction (see exercises).

Another important consequence of the fundamental theorems is the familiar change of variable formula.

Theorem 6.1.5 (Change of Variable Formula) Let $g$ be $C^1$ and increasing on $[a, b]$. Then for any continuous function $f$ on $[g(a), g(b)]$, we have

$$\int_{g(a)}^{g(b)} f(x)\,dx = \int_a^b f(g(x))g'(x)\,dx.$$
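As a numerical sanity check on the integral remainder formula, the following Python sketch compares $f(x) - T_n(x_0, x)$ with $\frac{1}{n!}\int_{x_0}^{x}(x-t)^n f^{(n+1)}(t)\,dt$ for one specific case. The choice $f = \exp$, $x_0 = 0$, $x = 1$, $n = 2$, the helper names `integral_remainder` and `taylor_exp`, and the use of a midpoint sum with 20,000 steps are all illustrative assumptions, not part of the text.

```python
import math

def integral_remainder(deriv_np1, x0, x, n, steps=20000):
    """Approximate (1/n!) * integral from x0 to x of (x - t)^n f^(n+1)(t) dt
    by a midpoint sum (an assumed quadrature, chosen only for simplicity)."""
    h = (x - x0) / steps
    total = 0.0
    for k in range(steps):
        t = x0 + (k + 0.5) * h          # midpoint of the k-th subinterval
        total += (x - t) ** n * deriv_np1(t)
    return total * h / math.factorial(n)

def taylor_exp(x0, x, n):
    """Taylor expansion of exp about x0 to order n (every derivative of exp is exp)."""
    return sum(math.exp(x0) * (x - x0) ** j / math.factorial(j) for j in range(n + 1))

x0, x, n = 0.0, 1.0, 2
actual_remainder = math.exp(x) - taylor_exp(x0, x, n)
formula_remainder = integral_remainder(math.exp, x0, x, n)
print(actual_remainder, formula_remainder)
```

The two printed numbers should agree up to the quadrature error, which is what the formula predicts; this is a check on one example, not a proof.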


Page 231: Strichartz_The Way of Analysis 2000

Proof: Let $F$ be an indefinite integral for $f$. Then $F(g(x))$ has derivative $f(g(x))g'(x)$ by the chain rule, so $\int_a^b f(g(x))g'(x)\,dx = F(g(b)) - F(g(a))$ by the integration of the derivative theorem. But this is the same value that the same theorem gives for $\int_{g(a)}^{g(b)} f(x)\,dx$ since $F' = f$. QED

As a special case we have the translation invariance of the integral.

Corollary 6.1.1 (Translation Invariance) If $f$ is continuous on $[a, b]$, then $\int_a^b f(x)\,dx = \int_{a-y}^{b-y} f(x + y)\,dx$ for any $y$.

Proof: Use $g(x) = x + y$, and observe $g' \equiv 1$. QED

6.1.3 Useful Integration Formulas

In this section we discuss briefly three integration formulas that will be proved later in the text. From time to time we will need to use these formulas, and you may find them useful in doing some of the exercises. Strictly speaking, we should not be allowed to do this, but it would simply take us too far afield to define all the concepts and do the preparatory work needed to present the proofs here. The other extreme, avoiding all use of these theorems until after they are proved, would have the negative consequence that we would not be able to complete the discussion of other topics in the place where they naturally belong. So a healthy compromise seems in order. Of course you should check, when we eventually prove these theorems, that we have not used anything in the proof that was derived assuming the result.

The first result is the familiar arclength formula for the graph of a function.

Theorem 6.1.6 (Arclength Formula) Let $f$ be a $C^1$ function on $[a, b]$. Then the length of the curve given by the graph of the function $f$ on $[a, b]$ is equal to $\int_a^b \sqrt{1 + f'(x)^2}\,dx$.

Since we have not defined the length of a curve, we cannot begin to prove this result here. A thorough discussion will be given in Chapter 13, where we will give a more general result (for curves not given as


Page 232: Strichartz_The Way of Analysis 2000

graphs). We will need to use this result in Chapter 8 to motivate the definition of the trigonometric functions.

The next result is the general formula for differentiating a function defined by an integral. Here we allow the integrand to be a function of two variables $g(x, t)$, and let

$$f(x) = \int_{a(x)}^{b(x)} g(x, t)\,dt$$

where $a(x)$ and $b(x)$ are functions of $x$ as well.

Theorem 6.1.7 Let $a(x)$ and $b(x)$ be $C^1$ functions and $g(x, t)$ be a $C^1$ function of two variables. Then if $f(x) = \int_{a(x)}^{b(x)} g(x, t)\,dt$ we have

$$f'(x) = b'(x)g(x, b(x)) - a'(x)g(x, a(x)) + \int_{a(x)}^{b(x)} \frac{\partial g}{\partial x}(x, t)\,dt.$$

Since we have not discussed functions of two variables yet and have not defined partial derivatives, we will have to postpone the complete proof until Chapter 10. However, we can indicate the main ideas of the proof rather easily. We write the difference quotient as a sum of three terms that isolate the three appearances of the $x$ variable:

$$\frac{1}{h}\bigl(f(x + h) - f(x)\bigr) = \frac{1}{h}\int_{b(x)}^{b(x+h)} g(x + h, t)\,dt - \frac{1}{h}\int_{a(x)}^{a(x+h)} g(x + h, t)\,dt + \int_{a(x)}^{b(x)} \frac{1}{h}\bigl(g(x + h, t) - g(x, t)\bigr)\,dt.$$

Each of the three terms will converge to the corresponding term in the formula. For the first two terms this is almost the differentiation of the integral theorem combined with the chain rule, except for the translation of $x$ by $h$ in $g(x + h, t)$. It is plausible that this does not affect the outcome since $h$ is tending to zero. In the special case that $g$ is a function of $t$ alone this problem does not arise (see exercises).

For the third term the issue is the interchange of the integral and the limit that defines $\partial g/\partial x$. We will discuss the question of interchanging limits and integrals in Chapter 7.
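The formula of Theorem 6.1.7 can be spot-checked numerically. The sketch below is only an illustration under assumed choices: $a(x) = x$, $b(x) = x^2 + 1$, $g(x, t) = \sin(xt)$, the point $x = 0.7$, a midpoint sum for the integrals, and a central difference quotient for $f'$. None of these choices or helper names come from the text.

```python
import math

def midpoint_integral(func, lo, hi, steps=2000):
    """Midpoint-sum approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))

# Assumed example data: limits a(x), b(x) and integrand g(x, t).
def a(x): return x
def b(x): return x * x + 1.0
def g(x, t): return math.sin(x * t)
def dg_dx(x, t): return t * math.cos(x * t)   # partial derivative of g in x

def f(x):
    """f(x) = integral from a(x) to b(x) of g(x, t) dt."""
    return midpoint_integral(lambda t: g(x, t), a(x), b(x))

def formula_derivative(x):
    """Right-hand side of the formula in Theorem 6.1.7."""
    da, db = 1.0, 2.0 * x                      # a'(x) and b'(x) for the choices above
    boundary = db * g(x, b(x)) - da * g(x, a(x))
    interior = midpoint_integral(lambda t: dg_dx(x, t), a(x), b(x))
    return boundary + interior

x = 0.7
h = 1e-4
difference_quotient = (f(x + h) - f(x - h)) / (2 * h)
print(difference_quotient, formula_derivative(x))
```

The two printed values should be close; the three pieces of `formula_derivative` correspond to the three terms of the difference quotient in the sketch of the proof.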


Page 233: Strichartz_The Way of Analysis 2000

The last formula in this section is the interchange of integrals. Suppose $f(x, y)$ is a continuous function of two variables for $x$ in $[a, b]$ and $y$ in $[c, d]$. The exact definition is given in Chapter 9. We may consider the function $\int_c^d f(x, y)\,dy$ as a function of $x$. If it is continuous, which is in fact always true, we can take its integral

$$\int_a^b \left(\int_c^d f(x, y)\,dy\right) dx.$$

Similarly, we may perform the integrations in the reverse order.

Theorem 6.1.8 (Interchange of Integrals) Let $f(x, y)$ be a continuous function for $x$ in $[a, b]$ and $y$ in $[c, d]$. Then $\int_c^d f(x, y)\,dy$ is a continuous function of $x$ for $x$ in $[a, b]$, $\int_a^b f(x, y)\,dx$ is a continuous function of $y$ for $y$ in $[c, d]$, and

$$\int_a^b \left(\int_c^d f(x, y)\,dy\right) dx = \int_c^d \left(\int_a^b f(x, y)\,dx\right) dy.$$

We will prove this theorem in Chapter 15, where we show that both of the above iterated integrals are equal to the double integral $\iint f(x, y)\,dx\,dy$ defined by partitioning the rectangle $[a, b] \times [c, d]$ in the plane.

6.1.4 Numerical Integration

The existence of the integral as a limit of sums is a qualitative result in that we know the sums approximate the integral but don't know how fast the process converges. For practical purposes we would like to have a quantitative counterpart, an estimate for the difference of the sum and the integral. This is possible if we assume more about the function in terms of smoothness (this is quite reasonable because wiggling of the function is likely to cause greater errors). In this section we discuss briefly four methods of numerical integration and estimates for their errors.

Suppose $f$ is differentiable and $|f'(x)| \le M_1$ on $[a, b]$. Then if $x$ and $y_k$ are points in the interval $[x_{k-1}, x_k]$, we have $|f(x) - f(y_k)| \le M_1|x - y_k|$ by the mean value theorem. Now the difference between $\int_{x_{k-1}}^{x_k} f(x)\,dx$ and $f(y_k)(x_k - x_{k-1})$ can be written $\int_{x_{k-1}}^{x_k} (f(x) - f(y_k))\,dx$


Page 234: Strichartz_The Way of Analysis 2000

(think of the contribution $f(y_k)(x_k - x_{k-1})$ to the Cauchy sum as the integral of the constant function $f(y_k)$ over the interval $[x_{k-1}, x_k]$). The biggest this can be (in absolute value) is $\int_{x_{k-1}}^{x_k} M_1|x - y_k|\,dx$ because $f(x) - f(y_k)$ is at most $M_1|x - y_k|$ (in absolute value). This is an elementary integral that we can evaluate. Note that it is worst (largest) if $y_k$ is one of the endpoints and best if $y_k$ is the midpoint, as seen in Figure 6.1.6.

Figure 6.1.6: [panels labeled $y_k$ = left endpoint and $y_k$ = midpoint; height $= M_1(x_k - x_{k-1})/2$]

At worst the error is $M_1(x_k - x_{k-1})^2/2$; and if we sum over all the intervals in the partition, the total error is bounded by

$$\sum_{k=1}^{n} M_1(x_k - x_{k-1})^2/2.$$

If we let $\delta$ denote the maximum interval length of the partition, the error bound is $\sum_{k=1}^{n} M_1\delta(x_k - x_{k-1})/2 = (b - a)M_1\delta/2$. This is called a first-order estimate because $\delta$ appears to the first power. Since $\delta$ is the only quantity we can control, by making the partition finer we are assured that we can make the error as small as we like but at a rather large cost in computation time and computation accuracy. For example, if $(b - a)M_1/2 \le 1$ and we want an accuracy of $10^{-4}$, we have to take $\delta = 10^{-4}$, which means about 10,000 intervals in the partition.

We can do a lot better simply by taking the point $y_k$ to be the midpoint of the interval, provided the function is smooth enough. Assume


Page 235: Strichartz_The Way of Analysis 2000

$f$ is $C^2$ and $|f''(x)| \le M_2$ on the interval. The point of using the midpoint rule, $y_k = (x_{k-1} + x_k)/2$, is that it gives the exact answer for any affine function on each subinterval. In other words, if $f(x) = ax + b$ on $[x_{k-1}, x_k]$, then $\int_{x_{k-1}}^{x_k} f(x)\,dx = f(y_k)(x_k - x_{k-1})$. (Geometrically, this just says that the area of a trapezoid is the product of the base times the midpoint altitude.) To exploit this fact we simply use the first-order Taylor expansion with Lagrange remainder about the point $y_k$, so $f(x) = f(y_k) + (x - y_k)f'(y_k) + R_1(x)$ with $R_1(x) = (x - y_k)^2 f''(z)/2$ for some point $z$ between $x$ and $y_k$, so that we have $|R_1(x)| \le M_2(x - y_k)^2/2$. Now we integrate the Taylor expansion:

$$\int_{x_{k-1}}^{x_k} f(x)\,dx = \int_{x_{k-1}}^{x_k} f(y_k)\,dx + f'(y_k)\int_{x_{k-1}}^{x_k} (x - y_k)\,dx + \int_{x_{k-1}}^{x_k} R_1(x)\,dx.$$

The first integral on the right is exactly $f(y_k)(x_k - x_{k-1})$, the midpoint rule; and the second integral is zero because $x - y_k$ is an odd function about $y_k$ (this is where we require that $y_k$ be the midpoint). If $f$ were affine there would be no third term, so the midpoint rule would produce no error. But in any case, the error is at most

$$\int_{x_{k-1}}^{x_k} \frac{1}{2}M_2(x - y_k)^2\,dx = \frac{1}{6}M_2(x - y_k)^3\Big|_{x_{k-1}}^{x_k} = \frac{1}{24}M_2(x_k - x_{k-1})^3.$$

The total error using the midpoint rule is at most

$$\sum_{k=1}^{n} \frac{1}{24}M_2(x_k - x_{k-1})^3 \le \frac{1}{24}M_2\delta^2\sum_{k=1}^{n}(x_k - x_{k-1}) = \frac{1}{24}M_2(b - a)\delta^2.$$

This is a second-order estimate, because of the factor $\delta^2$. If $M_2(b - a)/24 \le 1$ and we want an error of at most $10^{-4}$ we only have to take $\delta = 10^{-2}$ or about 100 points in the partition.

A closely related method is the trapezoidal rule, which is obtained by replacing $f(y_k)$ by $\frac{1}{2}(f(x_{k-1}) + f(x_k))$, the average value at the endpoints, the point being that $\frac{1}{2}(f(x_{k-1}) + f(x_k))(x_k - x_{k-1})$ is exactly the area of the trapezoid lying under the line segment joining the two points $(x_{k-1}, f(x_{k-1}))$ and $(x_k, f(x_k))$ on the graph of $f$, as shown in Figure 6.1.7. The trapezoidal rule also gives the exact integral for any affine function on each subinterval and also has a second-order error estimate. This is a little trickier to prove, and we leave it to the exercises.
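To make the first-order and second-order estimates concrete, here is a small Python sketch that applies the left-endpoint, midpoint, and trapezoidal rules on a uniform partition and compares the observed errors with the bounds $(b - a)M_1\delta/2$ and $(b - a)M_2\delta^2/24$. The test function $\exp$ on $[0, 1]$, the choice $n = 100$, and the helper name `rule_sum` are assumptions made only for illustration.

```python
import math

def rule_sum(f, a, b, n, rule):
    """Sum over a uniform partition of [a, b] into n subintervals,
    using the chosen rule to pick the height on each subinterval."""
    h = (b - a) / n
    total = 0.0
    for k in range(1, n + 1):
        left, right = a + (k - 1) * h, a + k * h
        if rule == "left":
            value = f(left)                    # y_k = left endpoint
        elif rule == "midpoint":
            value = f((left + right) / 2)      # y_k = midpoint
        elif rule == "trapezoid":
            value = (f(left) + f(right)) / 2   # average of the endpoint values
        else:
            raise ValueError(rule)
        total += value * h
    return total

a, b, n = 0.0, 1.0, 100
exact = math.e - 1.0            # integral of exp over [0, 1]
M1 = M2 = math.e                # sup of |exp'| and |exp''| on [0, 1]
delta = (b - a) / n             # maximum interval length

errors = {rule: abs(rule_sum(math.exp, a, b, n, rule) - exact)
          for rule in ("left", "midpoint", "trapezoid")}
first_order_bound = (b - a) * M1 * delta / 2
second_order_bound = (b - a) * M2 * delta ** 2 / 24
print(errors, first_order_bound, second_order_bound)
```

In this example the left-endpoint error should sit under the first-order bound and the midpoint error under the second-order bound. The trapezoidal error is reported without a bound because the text leaves its constant to the exercises.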


Page 236: Strichartz_The Way of Analysis 2000

Figure 6.1.7:

Simpson's method uses

$$\sum_{k=1}^{n} \frac{1}{6}\bigl(f(x_{k-1}) + 4f(y_k) + f(x_k)\bigr)(x_k - x_{k-1}),$$

with $y_k$ again the midpoint of $[x_{k-1}, x_k]$. It gives the exact integral for all cubic polynomials and has a fourth-order error estimate involving $M_4 = \sup |f''''(x)|$, assuming $f$ is $C^4$.

Notice that we have a trade-off: in order to obtain higher order estimates we must assume more smoothness for the function. For any fixed $\delta$, it is not obvious which method will give the best approximation, since the constants multiplying the power of $\delta$ will vary. However, once we let $\delta$ get small, the higher order methods quickly win out. It usually pays to be clever.

6.1.5 Exercises

1. If $f(x) = \int_{a(x)}^{b(x)} g(t)\,dt$ where $a(x)$ and $b(x)$ are $C^1$ functions and $g$ is continuous, prove that $f'(x) = b'(x)g(b(x)) - a'(x)g(a(x))$. (This is a special case of Theorem 6.1.7.)

2. Show that $\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx$.

3. Derive the integration of the derivative theorem from the differentiation of the integral theorem. Can you prove the converse implication?


Page 237: Strichartz_The Way of Analysis 2000

4. Prove the integral mean value theorem: if $f$ is continuous on $[a, b]$, then there exists $y$ in $(a, b)$ such that $\int_a^b f(x)\,dx = (b - a)f(y)$.

5. Let $g$ be continuous on $[a, b]$, and let $f(x) = \int_a^x (x - t)g(t)\,dt$. Prove that $f$ is a solution of the differential equation $f'' = g$ and the initial conditions $f(a) = f'(a) = 0$.

6. Suppose you want to compute a table of arctangents for values of $x$ in $[0, 1]$ using the formula $\arctan x = \int_0^x 1/(1 + t^2)\,dt$. What size must you take for $\delta$ if you want the error to be at most 1/1,000 using the midpoint rule?

7. Write out the complete proof of the integral remainder formula in Taylor's theorem.

8. Let $f$ be a $C^1$ function on the line, and let $g(x) = \int_0^1 f(xy)y^2\,dy$. Prove that $g$ is a $C^1$ function and establish a formula for $g'(x)$ in terms of $f$.

9. If $f$ is continuous on $[ca, cb]$, show $\int_a^b f(cx)\,dx = (1/c)\int_{ca}^{cb} f(x)\,dx$ for $c > 0$. Is the same true for $c < 0$?

10. For a continuous, positive function $w(x)$ on $[a, b]$, define the weighted average operator $A_w$ to be

$$A_w(f) = \int_a^b f(x)w(x)\,dx \Big/ \int_a^b w(x)\,dx$$

for continuous functions $f$. Prove that $A_w$ is linear and lies between the maximum and minimum values of $f$.

11. Show that Simpson's rule gives the exact integral for any cubic polynomial.

12. Let $f(x)$ be a continuous function that is periodic of period $a$ ($f(x + a) = f(x)$). Prove that $F(x) = \int_0^x f(t)\,dt$ is also periodic of period $a$ if and only if $\int_0^a f(t)\,dt = 0$.

13. Let $F(x) = \int_a^x f(t)\,dt$ for continuous $f$. Show that $F$ has strict local maxima and minima at points where $f$ changes sign. Compare this to the $F'(x) = 0$ criterion.


Page 238: Strichartz_The Way of Analysis 2000

14. *a. If $f$ is $C^2$ on $[a, b]$ with $|f''(x)| \le M_2$ and $f(a) = f(b) = 0$, prove that $|f(x)| \le M_2|x - a|\,|x - b|$. (Hint: use the mean value theorem to write $f(x) = f'(y)(x - a)$, use Rolle's theorem to find a point $z$ where $f'(z) = 0$, and use the mean value theorem to write $f'(y) = f''(w)(y - z)$.)

b. Use part a on each subinterval to establish a second-order estimate for the trapezoidal rule.

6.2 The Riemann Integral

6.2.1 Definition of the Integral

Often we want to form the integral of a function that is not continuous. For example, it is convenient to think of the Cauchy sums as the integral of a function that is constant on each of the subintervals of the partition. This function is not continuous at the partition points, where it may have a jump discontinuity. It is not hard to show that the definition of integral we have given can be extended to functions that have a finite number of jump discontinuities. One can then ask if it can be extended further. The answer is yes if the function is bounded and has a finite number of discontinuities of any type, and even if the number of discontinuities is countable. The trouble with each such theorem extending the definition of integration is that it raises still more questions. Are there other functions for which the definition of the integral also makes sense? No amount of piecemeal extension of the concept of the integral will put to rest the possibility that some important cases have been left out. However, there is a bold-stroke method for getting all possible extensions at once. This method was introduced by Bernhard Riemann in the middle of the nineteenth century after many piecemeal methods had been discovered.

Here is Riemann's idea. We restrict attention to bounded functions on bounded intervals (so the graph of the function lies in a finite rectangle). This is an important restriction, because many new kinds of difficulties arise when the interval or the function is unbounded; later we will come back and deal with some of the more well-behaved unbounded cases when we discuss "improper integrals". The Riemann integral is thus a theory of "proper integrals". If we want the largest


Page 239: Strichartz_The Way of Analysis 2000

possible class of functions for which the definition of the integral makes sense, we should define the class of functions exactly in that way. Thus we will say that a bounded function on $[a, b]$ is Riemann integrable if the Cauchy sums $S(f, P)$ converge to some limit, called the Riemann integral of $f$ and written $\int_a^b f(x)\,dx$ as before, as the maximum interval length of $P$ tends to zero.

Notice that this definition is nothing but a linguistic trick, taking a theorem and making it a definition. Of course the theorem can be reinterpreted in light of the definition: every continuous function on $[a, b]$ is Riemann integrable. The important thing is that we have made a change in viewpoint. Instead of thinking of integration as a process attached to a specific class of functions (such as continuous functions), we think of it as a general process and search for functions to which it can be applied. Notice that this is the way we have dealt with the derivative from the start (you might argue that in a calculus course the derivative is presented piecemeal: first you learn to differentiate polynomials, then rational functions, and so forth).

Before we can make Riemann's idea precise we have to clarify one technical but important point. In establishing the existence of the integral of a continuous function, we saw that there were several equivalent ways of obtaining it. We could take the limit of $S(f, P_n)$ for a specific sequence of Cauchy sums and partitions, or the limit of $S(f, P)$ for all Cauchy sums and partitions, or the inf of upper Riemann sums $S^+(f, P)$ or the sup of lower Riemann sums $S^-(f, P)$. If we were to define the Riemann integrable functions by any one of these conditions, would we obtain the same class of functions? It turns out that the answer is yes, provided we avoid one rather silly choice. To understand why, we need to make a few simple observations about the sums $S(f, P)$. We are not assuming that $f$ is continuous any more, merely that it is bounded. Thus the sup $M_k$ and the inf $m_k$ of $f$ on the interval $[x_{k-1}, x_k]$ are still defined, but there do not have to exist points of the interval where $f$ takes on these values. Thus the Riemann upper and lower sums $S^+(f, P) = \sum_{k=1}^{n} M_k(x_k - x_{k-1})$ and $S^-(f, P) = \sum_{k=1}^{n} m_k(x_k - x_{k-1})$ are not necessarily Cauchy sums $S(f, P) = \sum_{k=1}^{n} f(y_k)(x_k - x_{k-1})$. However the Cauchy sums lie in between, $S^-(f, P) \le S(f, P) \le S^+(f, P)$, and can be made as close as desired to the Riemann sums (this is with $P$ fixed) simply by choosing $y_k$ in $[x_{k-1}, x_k]$ so that $f(y_k)$ is as close as needed to the sup or inf.
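The relation $S^-(f, P) \le S(f, P) \le S^+(f, P)$ can be seen in a small computation. The Python sketch below is restricted to an increasing function, for which the sup and inf on each subinterval are simply the values at the right and left endpoints; that restriction, the example $f(x) = x^2$ on $[0, 1]$, the uniform partition, and the helper names are all assumptions for illustration.

```python
import random

def riemann_sums_increasing(f, partition):
    """Upper and lower Riemann sums for an increasing f, whose sup and inf on
    each subinterval are attained at the right and left endpoints."""
    pairs = list(zip(partition, partition[1:]))
    upper = sum(f(r) * (r - l) for l, r in pairs)
    lower = sum(f(l) * (r - l) for l, r in pairs)
    return upper, lower

def cauchy_sum(f, partition, rng):
    """A Cauchy sum with one randomly chosen point y_k in each subinterval."""
    return sum(f(rng.uniform(l, r)) * (r - l)
               for l, r in zip(partition, partition[1:]))

def square(x):
    return x * x

n = 50
partition = [k / n for k in range(n + 1)]       # uniform partition of [0, 1]
upper, lower = riemann_sums_increasing(square, partition)
spread = upper - lower                           # S^+ minus S^-
sample = cauchy_sum(square, partition, random.Random(0))
print(lower, sample, upper, spread)
```

For this example the spread $S^+ - S^-$ telescopes to $(f(1) - f(0))/n$, so refining the partition squeezes every Cauchy sum toward the integral $1/3$.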


Page 240: Strichartz_The Way of Analysis 2000

Let us define the oscillation of $f$ on $P$, $\mathrm{Osc}(f, P)$, to be the difference $S^+(f, P) - S^-(f, P)$. Since this also measures the spread of the different Cauchy sums $S(f, P)$ associated with $P$, it is clear that it is the important quantity in deciding whether $f$ is Riemann integrable. If $f$ is Riemann integrable we will have the integral $\int_a^b f(x)\,dx$ also lying in the interval between $S^-(f, P)$ and $S^+(f, P)$, so if $\mathrm{Osc}(f, P)$ is small we know any Cauchy sum $S(f, P)$ is close to the integral.

Now we need to relate the oscillation of $f$ for different partitions. If $P'$ is a refinement of $P$, then clearly $\mathrm{Osc}(f, P') \le \mathrm{Osc}(f, P)$, because the upper sums decrease and the lower sums increase as points are added to the partition. However we also need to compare the oscillation of $f$ on partitions that are not necessarily refinements of $P$ but are in some ways finer than $P$. Let $\delta$ be the minimum length of the subintervals in $P$. Then if $P'$ is another partition with the maximum interval length less than $\delta$, we claim $\mathrm{Osc}(f, P') \le 3\,\mathrm{Osc}(f, P)$. The factor 3 arises in this estimate because the subintervals of $P'$ covering a fixed subinterval of $P$ can be three times as long; it will turn out to be harmless when we try to make $\mathrm{Osc}(f, P')$ small.

Lemma 6.2.1 If the maximum interval length of $P'$ is less than the minimum length of subintervals of $P$, then $\mathrm{Osc}(f, P') \le 3\,\mathrm{Osc}(f, P)$.

Proof: Look at one particular interval $[x'_{j-1}, x'_j]$ of $P'$. By the length assumptions, it overlaps at most two consecutive subintervals $[x_{k-1}, x_k]$ and $[x_k, x_{k+1}]$ of $P$. The oscillation $M'_j - m'_j$ of $f$ on $[x'_{j-1}, x'_j]$ is certainly at most the sum of the oscillations $M_k - m_k$ and $M_{k+1} - m_{k+1}$ over the intervals $[x_{k-1}, x_k]$ and $[x_k, x_{k+1}]$ that cover $[x'_{j-1}, x'_j]$. Thus we have

$$(M'_j - m'_j)(x'_j - x'_{j-1}) \le (M_k - m_k)(x'_j - x'_{j-1}) + (M_{k+1} - m_{k+1})(x'_j - x'_{j-1}).$$

We sum this inequality over all $j$, and on the left side we obtain exactly $\mathrm{Osc}(f, P')$. On the right side we obtain a rather complicated expression; however, if we sort the terms according to the index $k$ of $(M_k - m_k)$, then we find $(M_k - m_k)$ multiplied by $(x'_j - x'_{j-1})$ for every subinterval of $P'$ that intersects $[x_{k-1}, x_k]$. These subintervals of $P'$ cover $[x_{k-1}, x_k]$ with perhaps some overlap at either end, as illustrated in Figure 6.2.1. But because the lengths of the subintervals of $P'$ are less than $(x_k - x_{k-1})$, we certainly have the sum of the lengths of the covering intervals less


Page 241: Strichartz_The Way of Analysis 2000

than $3(x_k - x_{k-1})$. Thus $(M_k - m_k)$ is multiplied by at most $3(x_k - x_{k-1})$ on the right side, proving $\mathrm{Osc}(f, P') \le 3\,\mathrm{Osc}(f, P)$. QED

Figure 6.2.1:

We are now in a position to prove the equivalence of several possible ways of defining Riemann integrability and the Riemann integral.

Theorem 6.2.1 Let $f$ be a bounded function on $[a, b]$. Then the following are equivalent:

a. There exists a sequence $P_j$ of partitions such that $\mathrm{Osc}(f, P_j) \to 0$.

b. $\mathrm{Osc}(f, P) \to 0$ as the maximum interval length of $P$ tends to zero, in the sense that given any $1/n$ there exists $1/m$ such that $\mathrm{Osc}(f, P) \le 1/n$ for any partition $P$ with maximum interval length $\le 1/m$.

c. $\inf_P S^+(f, P) = \sup_P S^-(f, P)$.

d. There exists a sequence of partitions $P_j$ and a real number $\int_a^b f(x)\,dx$ such that $S(f, P_j) \to \int_a^b f(x)\,dx$ as $j \to \infty$ for every choice of Cauchy sums $S(f, P_j)$.

e. $S(f, P) \to \int_a^b f(x)\,dx$ as the maximum interval length of $P$ tends to zero, in the sense that given any $1/n$ there exists $1/m$ such that $\left|S(f, P) - \int_a^b f(x)\,dx\right| \le 1/n$ for any Cauchy sum $S(f, P)$ and any partition $P$ with maximum interval length $\le 1/m$.

Proof: Clearly condition b is a stronger statement than condition a, but condition a implies condition b by the lemma, since once $\mathrm{Osc}(f, P_j) \le 1/(3n)$, we have $\mathrm{Osc}(f, P) \le 1/n$ if the maximum interval length of $P$ is less than the minimum length of subintervals of $P_j$. Thus conditions a and b are equivalent. The same reasoning shows that conditions d and e are equivalent.


Page 242: Strichartz_The Way of Analysis 2000

Now let's show that condition c is equivalent to condition a. Just as in the case of continuous functions, every upper sum $S^+(f, P)$ is greater than or equal to every lower sum $S^-(f, P')$, as follows by comparison with a partition $P''$, which is a common refinement of $P$ and $P'$. Thus $\inf_P S^+(f, P) \ge \sup_P S^-(f, P)$ in any case. If we have equality, there must exist sequences $P_j$ and $P'_j$ of partitions such that $\lim_{j\to\infty} S^+(f, P_j) = \lim_{j\to\infty} S^-(f, P'_j)$. By passing to common refinements $P''_j$ of $P_j$ and $P'_j$ we have $\lim_{j\to\infty} S^+(f, P''_j) = \lim_{j\to\infty} S^-(f, P''_j)$, which is the same thing as $\lim_{j\to\infty} \mathrm{Osc}(f, P''_j) = 0$. Thus condition c implies condition a. But conversely condition a implies $\lim_{j\to\infty} S^+(f, P_j) = \lim_{j\to\infty} S^-(f, P_j)$ for such a sequence $P_j$, so $\inf_P S^+(f, P) = \sup_P S^-(f, P)$.

So far we have established the equivalences a $\Leftrightarrow$ b $\Leftrightarrow$ c and d $\Leftrightarrow$ e. To complete the proof we will show condition d implies condition a and condition c implies condition d. Since the oscillation of $f$ on $P$ is the spread of the values of the Cauchy sums $S(f, P)$, the convergence of $S(f, P_j)$ implies $\mathrm{Osc}(f, P_j) \to 0$. So condition d implies condition a. But as before condition c implies $\lim_{j\to\infty} S^+(f, P_j) = \lim_{j\to\infty} S^-(f, P_j)$ for a sequence of partitions $P_j$ and so $\lim_{j\to\infty} S(f, P_j)$ equals the common value $\inf_P S^+(f, P) = \sup_P S^-(f, P)$ for any Cauchy sums $S(f, P_j)$. This is condition d. QED

A function satisfying any one of the above equivalent conditions will be called Riemann integrable. Of course any continuous function is Riemann integrable, as was shown in Theorem 6.1.1. Condition a or b is useful because it does not involve explicitly the value of the integral. We note that for Riemann integrability it does not suffice to verify that $S(f, P_j)$ converges for one particular sequence of Cauchy sums. To see this we need to examine a famous example due to Dirichlet: the function $f$ on $[0, 1]$ equal to 1 if $x$ is rational and 0 if $x$ is irrational. This function is not continuous at any point, and there is no way to picture its graph. It is trivial to compute that $S^+(f, P) = 1$ and $S^-(f, P) = 0$ for this function and any partition because the sup and inf of $f$ on any interval are 1 and 0, respectively. Thus $f$ is not Riemann integrable. However one can easily choose Cauchy sums to converge to zero or one.

Dirichlet's function may strike you as rather pathological; indeed many of his contemporaries dismissed it as the work of a crackpot mentality. However, from the point of view of twentieth century mathematics, Dirichlet's function is rather tame in comparison with many functions that arise in solving very practical problems. Thus the inability of the Riemann theory of integration to deal with it is a sign of significant weakness. We shall also observe that there is rather compelling evidence for assigning the value zero to the integral of Dirichlet's function. We will return to this example when we discuss the Lebesgue theory of integration in a later chapter.

Page 243: Strichartz_The Way of Analysis 2000

6.2.2 Elementary Properties of the Integral

It is a simple matter to show that the elementary properties of the integral of continuous functions are also true of the Riemann integral. Such properties now require an additional statement about Riemann integrability. Here is linearity: if $f$ and $g$ are Riemann integrable functions on $[a, b]$ and $c$ is a real number, then $f + g$ and $cf$ are also Riemann integrable functions on $[a, b]$ and

$$\int_a^b (f + g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx,$$

$$\int_a^b (cf)(x)\,dx = c\int_a^b f(x)\,dx.$$

Additivity says that if $a < b < c$ and $f$ is defined on $[a, c]$ and the restrictions of $f$ to $[a, b]$ and $[b, c]$ are Riemann integrable, then $f$ is Riemann integrable on $[a, c]$ and

$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx.$$

We leave the proofs as exercises. The basic estimate

$$m(b - a) \le \int_a^b f(x)\,dx \le M(b - a)$$

where $M$ and $m$ are the sup and inf of $f$ on $[a, b]$ is also easily verified for Riemann integrable functions. This implies that the integral of a non-negative function is non-negative, and hence the integral preserves order: if $f(x) \le g(x)$ for every $x$ in $[a, b]$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.

It is also possible to show that the product of two Riemann integrable functions is Riemann integrable, although there is no formula


Page 244: Strichartz_The Way of Analysis 2000

for the integral of a product. The proof is based on the $\mathrm{Osc}(f, P_j) \to 0$ criterion, as might be expected, and the estimate $\mathrm{Osc}(fg, P) \le M(\mathrm{Osc}(f, P) + \mathrm{Osc}(g, P))$ where $M$ is the sup of $|f|$ and $|g|$ over the interval. We leave the details for the exercises, along with some related results on quotients and compositions.

It is important to be able to estimate the size of an integral; in other words to obtain a bound for $\left|\int_a^b f(x)\,dx\right|$. Now if $g$ is any non-negative Riemann integrable function such that $|f(x)| \le g(x)$ (we say $g$ dominates $f$) for every $x$ in $[a, b]$, then $-g(x) \le f(x) \le g(x)$, so

$$-\int_a^b g(x)\,dx \le \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$$

by the order preservation of the integral; hence, we obtain

$$\left|\int_a^b f(x)\,dx\right| \le \int_a^b g(x)\,dx.$$

Often we will choose $g$ in such a way that we can evaluate $\int_a^b g(x)\,dx$ explicitly.

Now $|f(x)|$ is the smallest non-negative function that dominates $f$. By the above argument we expect to have the estimate

$$\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx.$$

There is one hitch in the argument, however: how do we know $|f(x)|$ is Riemann integrable? It turns out to be true, but not obvious, that if $f$ is Riemann integrable, then $|f|$ is also Riemann integrable. Before proving this let us observe that the converse is not true. We can have $|f|$ Riemann integrable without $f$ being Riemann integrable; just consider the variant of Dirichlet's function that takes the value 1 for $x$ rational and $-1$ for $x$ irrational. This is still not Riemann integrable, but the absolute value of this function is identically one, hence continuous. This example helps to motivate the positive result in that it shows how taking the absolute value decreases the amount of oscillation in a function.
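As a brief illustration of the domination estimate (the particular function is an assumed example, not one from the text), take $f(x) = x^2\sin(1/x)$ for $x \ne 0$ and $f(0) = 0$ on $[0, 1]$. This $f$ is continuous, and it is dominated by $g(x) = x^2$, whose integral we can evaluate explicitly:

```latex
\left| \int_0^1 x^2 \sin(1/x)\,dx \right|
  \;\le\; \int_0^1 x^2\,dx
  \;=\; \frac{1}{3}.
```

The bound costs nothing to compute, even though the integral on the left has no elementary closed form.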


Page 245: Strichartz_The Way of Analysis 2000

Theorem 6.2.2

a. If $f$ is Riemann integrable on $[a, b]$, then so is $|f|$.

b. If $f$ and $g$ are Riemann integrable on $[a, b]$, then so are $\max(f, g)$ and $\min(f, g)$.

Proof:

a. We claim $\mathrm{Osc}(|f|, P) \le \mathrm{Osc}(f, P)$. Indeed on any subinterval $[x_{k-1}, x_k]$ let $M_k$ and $m_k$ denote the sup and inf of $f$. If they both have the same sign, then the oscillation of $|f|$ on the subinterval is the same. If they have the opposite sign, then $M_k > 0$ and $m_k < 0$ and $M_k - m_k = M_k + (-m_k)$ is greater than either $M_k$ or $-m_k$. But the sup of $|f|$ on the subinterval is the larger of $M_k$ or $-m_k$, and the inf of $|f|$ is at least zero since $|f| \ge 0$. Thus the sup minus the inf for $|f|$ is less than $M_k - m_k$. Adding up over all the subintervals gives $\mathrm{Osc}(|f|, P) \le \mathrm{Osc}(f, P)$. Since $f$ is Riemann integrable, there exists a sequence $P_j$ such that $\mathrm{Osc}(f, P_j) \to 0$; hence, $\mathrm{Osc}(|f|, P_j) \to 0$ and $|f|$ is Riemann integrable.

b. Let us consider

$$\max(f, g)(x) = \begin{cases} f(x) & \text{if } f(x) \ge g(x), \\ g(x) & \text{if } g(x) \ge f(x). \end{cases}$$

We claim $\mathrm{Osc}(\max(f, g), P) \le \mathrm{Osc}(f, P) + \mathrm{Osc}(g, P)$. This follows by considering the values that $\max(f, g)$ assumes on each subinterval; they can vary at most from the larger of the two sups to the smaller of the two infs. Then $\mathrm{Osc}(\max(f, g), P) \to 0$ as the maximum length of the subintervals of $P$ tends to zero since $\mathrm{Osc}(f, P) \to 0$ and $\mathrm{Osc}(g, P) \to 0$, so $\max(f, g)$ is Riemann integrable. QED

We now turn to the problem of showing that some discontinuous functions are Riemann integrable. We begin with a simple result.

Theorem 6.2.3 Let $f$ be a bounded function on $[a, b]$ that is continuous except at a finite number of points (we do not assume anything about the nature of the discontinuities at these points). Then $f$ is Riemann integrable.

Proof: Let $a_1, \ldots, a_N$ denote the points of discontinuity. Given any $1/n$, surround each of these points by an interval $I_k$ of length at most


Page 246: Strichartz_The Way of Analysis 2000

[Note: the arguments in this section use sorne elementary facts aboutinfinite series that are discussed in detail in the next chapter.]

The arguments of the last section could be generalized to showthe Riemann integrability of any function f that is continuous exceptfor a set E with the property that given any l/n there exists a finitecovering of E by intervals whose lengths add up to at most l/no Such

6.2.3 Functions with a Countable Number of Disconti­nuities*

Notice that the idea of the proof is to cover the set of discontinuitiesof f by a union of a finite number of intervals whose lengths add up toa small number (N/ n in the argument). Inside these intervals we haveno control over f other than its boundedness, but this is enough inestirnating Osc(f, P) because the totallengths of the intervals is srnall.Outside these intervals the function f is continuous and we can controlthe oscillation as before.

1/n. Then the function is continuous on the set consisting of [a, b] with $\bigcup_{j=1}^{N} I_j$ removed. Let P denote any partition that contains all the intervals $I_j$. It will also contain some other intervals $J_k$. In estimating Osc(f, P) we will use a separate argument for contributions from the I and J type intervals. For the I type intervals we use the crude estimate that the length of each interval is at most 1/n and the oscillation of the function is at most M - m, where M and m are the sup and inf of f on [a, b] (we assumed f was bounded, so M and m are finite). Thus each interval contributes at most (M - m)/n, and there are N such intervals, so the total contribution is at most N(M - m)/n. For the contribution from the J type intervals we use the existence of the integral for continuous functions to conclude that it can be made as small as desired, say less than 1/n, by taking the length of the subintervals $J_k$ sufficiently small.

Thus we have shown that Osc(f, P) ≤ 1/n + N(M - m)/n if P is any partition containing the intervals $I_j$ and the remaining subintervals of P are sufficiently small. Now the quantities N (number of discontinuities) and M and m (sup and inf of f on [a, b]) are fixed, so by taking n large we can make Osc(f, P) as small as desired, proving f is Riemann integrable. QED
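The estimate in this proof can be checked on a concrete case. The following is a minimal sketch, not part of the text: the step function, its jump points, and the helper names `f`, `partition`, and `osc` are all assumptions chosen for illustration. Because a step function is constant between jumps, the J type intervals contribute nothing here, and only the N(M - m)/n part of the bound is exercised.

```python
from fractions import Fraction

# Hypothetical example on [0, 1]: (jump point, jump height) pairs chosen for illustration.
JUMPS = [(Fraction(1, 4), 1), (Fraction(1, 2), -2), (Fraction(3, 4), 3)]

def f(x):
    """Piecewise-constant function: sum of the heights of all jumps at or to the left of x."""
    return sum(h for d, h in JUMPS if d <= x)

def partition(n):
    """Partition of [0, 1] containing an interval of length 1/n centred at each jump (needs n >= 5)."""
    half = Fraction(1, 2 * n)
    points = [Fraction(0)]
    for d, _ in JUMPS:
        points += [d - half, d + half]
    points.append(Fraction(1))
    return points

def osc(points):
    """Osc(f, P): sum over subintervals of (sup f - inf f) * length."""
    total = Fraction(0)
    for left, right in zip(points, points[1:]):
        # f is a step function, so its values on [left, right] are those at the
        # endpoints and at any jump points strictly inside.
        samples = [left, right] + [d for d, _ in JUMPS if left < d < right]
        values = [f(x) for x in samples]
        total += (max(values) - min(values)) * (right - left)
    return total

N = len(JUMPS)      # number of discontinuities
M, m = 2, -1        # sup and inf of this particular f on [0, 1]
for n in (10, 100, 1000):
    print(n, osc(partition(n)), "<=", Fraction(N * (M - m), n))
```

Exact rational arithmetic is used so that the comparison with N(M - m)/n involves no rounding.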


However, not every countable set has content zero. The rational numbers in [0,1], for instance, can be covered by a finite number of intervals only if their lengths add up to at least one. Nevertheless, a bounded function with only a countable number of discontinuities is also Riemann integrable. This will require a new and more difficult kind of proof. Before embarking on it we will consider an example that shows there are functions that have discontinuities on the rational numbers alone (Dirichlet's function is not an example because it is discontinuous at every point). The idea of constructing the example is to place jumps at each rational point but to make the jumps very small so that the function will be continuous at all irrational points. The fact that we are forced to make the jumps small helps explain why such a function must be Riemann integrable.

Let $r_1, r_2, \ldots$ be any enumeration of the rational numbers. Then the function

$$q_k(x) = \begin{cases} 1 & \text{if } x < r_k, \\ 0 & \text{if } x \ge r_k \end{cases}$$

is continuous at every point except $r_k$. We will multiply $q_k$ by a factor $b_k$ and take the sum $f(x) = \sum_{k=1}^{\infty} b_k q_k(x)$. For simplicity we take $b_k = 2^{-k}$, but any absolutely convergent series $\sum |b_k| < \infty$ will do as well. Then the function f can also be expressed $f(x) = \sum 2^{-k}$ where the sum extends over all k such that $x < r_k$. Since $\sum_{k=1}^{\infty} 2^{-k} = 1$, the function

Figure 6.2.2:

a set is said to have content zero. (Sets of content zero will play a role in the theory of multiple integrals in Chapter 15. The general notion of content has been superseded by the notion of measure, which will be discussed in Chapter 14.) An example of a set of content zero is a countable set $a_1, a_2, \ldots$ that converges to the limit $a_0$, together with $a_0$; for then, given 1/n, we cover $a_0$ with an interval of length 1/2n, and this will cover all but a finite number of the $a_j$ since $\lim_{j\to\infty} a_j = a_0$. The finite number of points left uncovered can then be covered by a finite number of intervals with lengths adding up to at most 1/2n.
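The covering described for a convergent sequence can be built explicitly. The sketch below assumes the particular sequence $a_j = 1/j$ with limit $a_0 = 0$; the function names `cover`, `total_length`, and `is_covered` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def cover(n):
    """Finite cover of {0} and all points 1/j (j >= 1) by open intervals of total length 1/n."""
    big_half = Fraction(1, 4 * n)                 # interval around a_0 = 0 has length 1/(2n)
    intervals = [(-big_half, big_half)]
    # Points 1/j not strictly inside (-1/(4n), 1/(4n)) are those with j <= 4n.
    uncovered = [Fraction(1, j) for j in range(1, 4 * n + 1)]
    # Give each leftover point an interval of length 1/(2n*K), K = number of leftover points.
    small_half = Fraction(1, 4 * n * len(uncovered))
    intervals += [(a - small_half, a + small_half) for a in uncovered]
    return intervals

def total_length(intervals):
    return sum(right - left for left, right in intervals)

def is_covered(x, intervals):
    return any(left < x < right for left, right in intervals)

for n in (1, 5, 20):
    c = cover(n)
    print(n, len(c), total_length(c))
```

The number of intervals grows with n, but for each fixed n it is finite, which is exactly what content zero requires.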


Proof: The first step is to make precise the idea that f cannot jump around a lot at too many points. Define the oscillation of f at a point $x_0$, denoted Osc(f, $x_0$), to be the limit as n → ∞ of the difference of the sup and inf of f on the interval $[x_0 - 1/n, x_0 + 1/n]$. Clearly if f is continuous at $x_0$, then Osc(f, $x_0$) = 0, and conversely. Furthermore, the size of Osc(f, $x_0$) at a point of discontinuity gives a quantitative measurement of the amount of jumping around. If f has a jump discontinuity at $x_0$, then Osc(f, $x_0$) is the size of the jump (as long as $f(x_0)$ takes on some intermediate value between $\lim_{x\to x_0^+} f(x)$ and $\lim_{x\to x_0^-} f(x)$). Regardless of the type of discontinuity, Osc(f, $x_0$) is well defined because the difference of the sup and inf decreases with the size of the interval.

Now the set where Osc(f, x) < 1/n is open. To see this, notice that if Osc(f, $x_0$) < 1/n, then Osc(f, $x_0$) ≤ δ for some δ < 1/n. This means there is an interval $[x_0 - 1/k, x_0 + 1/k]$ on which the difference of the sup and inf of f is less than 1/n, and hence the oscillation of f at every point of $(x_0 - 1/k, x_0 + 1/k)$ is less than 1/n.

Since the set where Osc(f, x) < 1/n is open, its complement, where Osc(f, x) ≥ 1/n, is closed. Note that the set of discontinuities of f is the union over n of these closed sets. This fact, which is true of an arbitrary function, is sometimes stated as follows: the set of discontinuities of a function is an $F_\sigma$ set ($F_\sigma$ means a countable union of closed sets).

Theorem 6.2.4 Let f be a bounded function on [a, b]. If f is continuous except at a countable set of points, then f is Riemann integrable.

f takes on values between 0 and 1. Now f is continuous at every irrational point $x_0$ because given any N we can find a neighborhood of $x_0$ that does not contain the first N rational numbers $r_1, \ldots, r_N$. The variation of f over this neighborhood is confined to the terms $2^{-k} q_k(x)$ for k > N, and the maximum these change is $\sum_{k=N+1}^{\infty} 2^{-k} = 2^{-N}$. Thus $|f(x) - f(x_0)| \le 1/2^N$ in this neighborhood, so f is continuous at $x_0$. Roughly the same argument proves that f is discontinuous at each point $r_k$. Just write $f = 2^{-k} q_k + \sum_{j \ne k} 2^{-j} q_j$. Then $\sum_{j \ne k} 2^{-j} q_j$ is continuous at $r_k$ by the argument just given, so f can't be continuous at $r_k$ without implying $2^{-k} q_k$ is continuous there, which it clearly is not. Again it is not possible to draw a picture of the graph of this function.
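Although the graph cannot be drawn, finite truncations of the series can be evaluated exactly. The sketch below is only an illustration under stated assumptions: it uses one particular enumeration (rationals in [0, 1] listed by increasing denominator) and only the first 30 terms, and the names `enumerate_rationals` and `partial_f` are hypothetical. It shows the jump of size $2^{-k}$ at $r_k$ in the truncated sum.

```python
from fractions import Fraction

def enumerate_rationals(count):
    """First `count` rationals in [0, 1], listed by increasing denominator (one possible enumeration)."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def partial_f(x, rationals):
    """Truncated sum of 2^{-k} over all listed k with x < r_k (k starts at 1)."""
    return sum(Fraction(1, 2 ** k) for k, r in enumerate(rationals, start=1) if x < r)

rationals = enumerate_rationals(30)     # denominators up to 10 in this enumeration
h = Fraction(1, 10 ** 6)                # smaller than any gap between the listed rationals
for k in (1, 2, 3, 4):
    r_k = rationals[k - 1]
    jump = partial_f(r_k - h, rationals) - partial_f(r_k, rationals)
    print(k, r_k, jump)                 # the jump at r_k in the truncated sum
```

In the full series the terms beyond the truncation add further, smaller jumps at the remaining rationals, which is why this is a sketch of the construction and not the function itself.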


Now the hypothesis that the set of discontinuities is countable implies that the smaller set where Osc(f, x) ≥ 1/n is at most countable. The fact that it is also closed means that it is in some sense better behaved than the original set of discontinuities. In particular we want to show that it has content zero (it can be covered by a finite number of intervals whose lengths add up to as small a number as desired).

To do this we use the Heine-Borel theorem. Given 1/n, we cover the closed countable set by a countable collection of open intervals whose lengths are

$$\frac{1}{2n}, \frac{1}{2^2 n}, \frac{1}{2^3 n}, \ldots.$$

Thus if $a_1, a_2, \ldots$ is the countable closed set (it must be compact because it is also bounded), we cover it by the intervals

$$\left(a_k - \frac{1}{2^{k+1} n},\; a_k + \frac{1}{2^{k+1} n}\right).$$

By the Heine-Borel theorem a finite number of these intervals will cover, and the sum of lengths of these intervals is at most $\sum_{k=1}^{\infty} 1/2^k n = 1/n$. Thus $a_1, a_2, \ldots$ has zero content.

Now let us consider what we have accomplished. We have the interval [a, b] divided into two sets. On one, where Osc(f, x) < 1/n, the function is fairly well behaved; it is not necessarily continuous, but at least whatever discontinuities exist are not too jumpy. The complementary set, where Osc(f, x) ≥ 1/n, is small, in that we can cover it by a finite number of intervals of small length. We want to combine these two properties. However we still need to move cautiously and apply the Heine-Borel theorem one more time. First we cover the set where Osc(f, x) ≥ 1/n by a finite number of open intervals $I_1, \ldots, I_N$ such that the lengths add up to at most 1/n (the 1/n here is deliberately the same as the 1/n above). Now consider the complementary set, [a, b] with $\bigcup_{j=1}^{N} I_j$ removed. This is again a compact set, and Osc(f, x) < 1/n on this set. But Osc(f, x) < 1/n means that x lies in an open interval in which f varies by at most 1/n. Thus there exists a covering of the compact set $[a, b] - \bigcup_{j=1}^{N} I_j$ by such intervals and by the Heine-Borel theorem, a finite covering. By shrinking these intervals we can obtain a non-overlapping covering by closed intervals $J_1, \ldots, J_k$.

Altogether we have the interval [a, b] partitioned into subintervals $I_1, \ldots, I_N$ and $J_1, \ldots, J_k$ with the properties:


6.2.4 Exercises

1. If a < b < c and f is Riemann integrable on [a, c], prove that f is Riemann integrable on [a, b] (strictly speaking we should say that the restriction of f to [a, b] is Riemann integrable).

2. Prove the linearity of the Riemann integral.

3. Prove the additivity of the Riemann integral.

4. Prove that if f and g are Riemann integrable on [a, b], then f · g is Riemann integrable on [a, b].

5. Prove that if f and g are Riemann integrable on [a, b] and g is bounded away from zero (there exists 1/n such that |g(x)| ≥ 1/n for all x in [a, b]), then f/g is Riemann integrable.

6. Prove that if f is Riemann integrable on [a, b] and g(x) = f(x) for every x except for a finite number, then g is Riemann integrable.

7. *Let f be Riemann integrable on [a, b] and let g be continuous on [m, M], where M is the sup and m the inf of f on [a, b]. Prove that g ∘ f is Riemann integrable on [a, b]. (Hint: This is tricky. Not only do you need to use the uniform continuity of g, but you need to argue separately concerning the subintervals of a partition for which Osc(f, P) is small but $M_k - m_k$ is large.)

1. The sum of the lengths of $I_1, \ldots, I_N$ is at most 1/n.

2. The difference of the sup and inf of f on each $J_j$ is at most 1/n.

If P denotes the partition of [a, b] consisting of these intervals we can estimate Osc(f, P) as follows: the contribution from the I type intervals is at most (M - m) · 1/n where M and m are the sup and inf of f on [a, b]; while the contribution from the J type intervals is at most 1/n times the sum of the lengths of the J intervals, hence at most (b - a)/n. Thus Osc(f, P) ≤ (M - m)/n + (b - a)/n; and since M - m and b - a are fixed quantities, we can make Osc(f, P) as small as desired. Thus f is Riemann integrable. QED


8. Let g be $C^1$ and increasing on [a, b], and let f be Riemann integrable on [g(a), g(b)]. Prove that f ∘ g is Riemann integrable on [a, b] and the change of variable formula holds.

9. a. If f is Riemann integrable on [a, b], prove that $F(x) = \int_a^x f(t)\,dt$ is continuous.

b. Prove it satisfies a Lipschitz condition.

10. If f is Riemann integrable on [a, b] and continuous at $x_0$, prove that $F(x) = \int_a^x f(t)\,dt$ is differentiable at $x_0$ and $F'(x_0) = f(x_0)$. Show that if f has a jump discontinuity at $x_0$, then F is not differentiable at $x_0$.

11. If f is continuous on [a, b] and differentiable on (a, b) and f' is Riemann integrable on [a, b], show that $\int_a^b f'(x)\,dx = f(b) - f(a)$.

12. Prove that the complement of an $F_\sigma$ set is a $G_\delta$ set (a countable intersection of open sets).

13. Give an example of an $F_\sigma$ set that is not a $G_\delta$ set and a $G_\delta$ set that is not an $F_\sigma$ set.

14. Prove that every open set is an $F_\sigma$ set and every closed set is a $G_\delta$ set.

15. Prove that a $G_\delta$ set that is dense must be the whole line.

6.3 Improper Integrals*

6.3.1 Definitions and Examples

We frequently need to deal with integrals of functions that are unbounded and with integrals over unbounded intervals. In the Riemann theory of integration the only way to handle these is to take appropriate limits of integrals over smaller intervals. The term improper integral is used informally to denote any of a variety of such integrals. For example, consider the expression $\int_0^1 x^a\,dx$ where a < 0. (Strictly speaking, we have only defined $x^a$ for a rational; the general case will be dealt with in Chapter 8. For now you can either assume that a is rational or


and so the limit exists and equals -1/(a + 1) if a < -1 and fails to exist if a ≥ -1. Notice that a = -1 is again the cut-off point between the existence and non-existence of the improper integral, but the inequality goes the other way. In particular, for no value of a does the improper integral $\int_0^\infty x^a\,dx$ exist. We can state these basic facts informally as follows: the function $x^a$ has an integrable singularity near x = 0 if and only if a > -1 and an integrable singularity near ∞ if and only if a < -1.
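The two cut-off statements can be seen numerically from the closed-form antiderivatives. The following is a small sketch; the function names `integral_near_zero` and `integral_near_infinity` and the sample exponents are assumptions made for the illustration.

```python
import math

def integral_near_zero(a, eps):
    """Closed form of the integral of x**a over [eps, 1]."""
    if a == -1:
        return -math.log(eps)
    return (1 - eps ** (a + 1)) / (a + 1)

def integral_near_infinity(a, y):
    """Closed form of the integral of x**a over [1, y]."""
    if a == -1:
        return math.log(y)
    return (y ** (a + 1) - 1) / (a + 1)

for a in (-0.5, -1, -2):
    near_zero = [integral_near_zero(a, 10.0 ** -p) for p in (2, 4, 8)]
    near_inf = [integral_near_infinity(a, 10.0 ** p) for p in (2, 4, 8)]
    print(a, near_zero, near_inf)
```

For a = -0.5 the first list settles near 1/(a + 1) = 2 while the second grows without bound; for a = -2 the roles are reversed, with the second list settling near -1/(a + 1) = 1; for a = -1 both lists grow like a logarithm.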

So far we have looked at examples of improper integrals where the existence of the limit does not depend on cancellation of positive and negative values of the function. Suppose, to be specific, that f is a function defined on [0,1] that is Riemann integrable on [ε, 1] for every ε > 0. We say that f has an absolutely convergent improper integral on [0,1] if $\lim_{\varepsilon\to0} \int_\varepsilon^1 |f(x)|\,dx$ exists. This implies the existence of $\int_0^1 f(x)\,dx$ as well. To see this we use the Cauchy criterion. Given ε and δ with say ε < δ, then $\int_\varepsilon^1 f(x)\,dx - \int_\delta^1 f(x)\,dx = \int_\varepsilon^\delta f(x)\,dx$ by additivity and

$$\int_1^y x^a\,dx = \begin{cases} \dfrac{y^{a+1} - 1}{a + 1}, & a \ne -1, \\ \log y, & a = -1, \end{cases}$$

Thus we define the improper integral $\int_0^1 x^a\,dx$ to be $\lim_{\varepsilon\to0^+} \int_\varepsilon^1 x^a\,dx$, and this is 1/(a + 1) if -1 < a < 0; while the limit does not exist (or equals +∞ in the extended real numbers) if a ≤ -1.

Next let us look at an example where the interval of integration is infinite, $\int_1^\infty x^a\,dx$, again with a < 0. We define this to be $\lim_{y\to\infty} \int_1^y x^a\,dx$. Again we can compute explicitly

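$$\int_\varepsilon^1 x^a\,dx = \begin{cases} \dfrac{1 - \varepsilon^{a+1}}{a + 1}, & a \ne -1, \\ -\log \varepsilon, & a = -1. \end{cases}$$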

else accept that the basic calculus formulas for $x^a$ are also valid for all real a.) Since the function $x^a$ is unbounded, we have not yet defined this integral. However, the function $x^a$ is bounded and continuous on the interval [ε, 1] for any ε > 0 and so $\int_\varepsilon^1 x^a\,dx$ is defined. In fact we can compute it exactly as


$$\left|\int_0^z \frac{\sin x}{x}\,dx - \int_0^y \frac{\sin x}{x}\,dx\right| \le \frac{1}{z} + \frac{1}{y} + \int_y^z \frac{1}{x^2}\,dx = \frac{2}{y} \quad \text{if } z > y,$$

and this goes to zero as y, z → ∞. It is possible to compute $\int_0^\infty (\sin x/x)\,dx$ exactly using methods of complex variables. You might be amused to look up the answer in a table of integrals.
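The Cauchy-criterion bound 2/y can be compared with numerical values of the partial integrals. This is a rough sketch using a hand-written composite Simpson rule; the helper names `sinc`, `simpson`, and `partial_integral` and the step density are assumptions, and the printed numbers are approximations, not exact values.

```python
import math

def sinc(x):
    """sin(x)/x, extended by its limit 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x

def simpson(f, a, b, steps_per_unit=200):
    """Composite Simpson approximation of the integral of f over [a, b]."""
    n = max(2, 2 * math.ceil((b - a) * steps_per_unit / 2))   # even number of subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def partial_integral(y):
    """Approximate integral of sin(x)/x over [0, y]."""
    return simpson(sinc, 0.0, y)

for y in (10.0, 100.0, 1000.0):
    print(y, partial_integral(y), "tail bound:", 2 / y)
```

The printed values differ from one another by less than the tail bound 2/y of the smaller endpoint, as the estimate predicts, and they settle near the tabulated value π/2.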

Another important class of examples are the Cauchy principal value integrals. In these examples the singularity lies in the interior of the interval. Say f is defined on [-1, 1] (possibly undefined at x = 0) and is bounded on [-1, -ε] and [ε, 1], for every ε > 0, but unbounded on (-ε, ε). We define the principal value integral $\text{P.V.} \int_{-1}^{1} f(x)\,dx$ to be the limit $\lim_{\varepsilon\to0^+} \left(\int_{-1}^{-\varepsilon} + \int_\varepsilon^1\right) f(x)\,dx$ if it exists. In other words, we cut away a symmetric neighborhood of the singularity and take the limit as the size of the deleted neighborhood goes to zero. For example, $\text{P.V.} \int_{-1}^{1} (1/x)\,dx = 0$ although f(x) = 1/x is not absolutely

by integration by parts. Using the crude estimate |cos x| ≤ 1 we find

There is no difficulty near x = 0 because sin x/x is continuous there. However, $\int_0^y |\sin x/x|\,dx$ does not have a finite limit as y → ∞, for much the same reason that $\sum_{n=1}^{\infty} 1/n$ diverges. Nevertheless, we can easily show that $\lim_{y\to\infty} \int_0^y (\sin x/x)\,dx$ exists by applying the Cauchy criterion. We have

$$\int_0^z \frac{\sin x}{x}\,dx - \int_0^y \frac{\sin x}{x}\,dx = \int_y^z \frac{\sin x}{x}\,dx = -\frac{\cos z}{z} + \frac{\cos y}{y} - \int_y^z \frac{\cos x}{x^2}\,dx$$

$$\int_0^\infty \frac{\sin x}{x}\,dx = \lim_{y\to\infty} \int_0^y \frac{\sin x}{x}\,dx.$$

so the Cauchy criterion for convergence of $\lim_{\varepsilon\to0} \int_\varepsilon^1 |f(x)|\,dx$ implies the Cauchy criterion for convergence of $\lim_{\varepsilon\to0} \int_\varepsilon^1 f(x)\,dx$. Similar reasoning shows the existence of $\lim_{y\to\infty} \int_1^y |f(x)|\,dx$ implies the existence of $\lim_{y\to\infty} \int_1^y f(x)\,dx$.

There are some important examples of improper integrals that are not absolutely convergent. One is the integral

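$\left|\int_\varepsilon^\delta f(x)\,dx\right| \le \int_\varepsilon^\delta |f(x)|\,dx$ by our previous results. This means

$$\left|\int_\varepsilon^1 f(x)\,dx - \int_\delta^1 f(x)\,dx\right| \le \int_\varepsilon^1 |f(x)|\,dx - \int_\delta^1 |f(x)|\,dx,$$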


6.3.2 Exercises

1. For which values of a and b does the improper integral $\int_0^{1/2} x^a |\log x|^b\,dx$ exist?

2. For which values of a and b does the improper integral $\int^{\infty} x^a |\log x|^b\,dx$ (lower limit illegible in the source) exist?

3. If f is non-negative and $\int_1^\infty f(x)\,dx$ exists as an improper integral, must $\lim_{x\to\infty} f(x) = 0$? Must f be bounded? What can you say if $\lim_{x\to\infty} f(x)$ exists?

4. Let p(x) be a polynomial with a simple zero at x = 0 but no other zeroes in [-1, 1]. Show that $\text{P.V.} \int_{-1}^{1} 1/p(x)\,dx$ exists. Also show that $\int_{-1}^{1} |p(x)|^{-a}\,dx$ exists for 0 < a < 1.

5. Show that $\text{P.V.} \int_{-1}^{1} (f(x)/x)\,dx$ exists if f is $C^1$ on [-1, 1].

6. Show that $\int_0^\infty x^{-a} \sin x\,dx$ exists for 0 < a < 2.

7. If the improper integral $\int_{-\infty}^{\infty} f(x)\,dx$ exists, prove $\int_{-\infty}^{\infty} f(x + y)\,dx = \int_{-\infty}^{\infty} f(x)\,dx$ for all real y.

8. If f is positive and continuous on (0, 1] and the improper integral $\int_0^1 f(x)\,dx$ exists, prove that the lower Riemann sums converge to the integral.

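integrable. One can also show (see exercises) that $\text{P.V.} \int_{-1}^{1} (f(x)/x)\,dx$ always exists if f is $C^1$. The fact that the neighborhood is symmetric is important, since for example $\lim_{\varepsilon\to0^+} \left(\int_{-1}^{-\varepsilon} + \int_{2\varepsilon}^{1}\right) (1/x)\,dx = -\log 2$ while $\lim_{\varepsilon\to0^+} \left(\int_{-1}^{-\varepsilon} + \int_{\varepsilon^2}^{1}\right) (1/x)\,dx$ doesn't exist.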
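The dependence on how the neighborhood of the singularity is cut away can be made concrete for 1/x, whose antiderivative is log|x|. In the sketch below the function name `cut_integral` is hypothetical, and the asymmetric cuts (right-hand cut at 2ε or at ε²) are chosen as illustrations.

```python
import math

def cut_integral(left_cut, right_cut):
    """Integral of 1/x over [-1, -left_cut] plus the integral over [right_cut, 1].

    Using the antiderivative log|x|: the first piece is log(left_cut),
    the second piece is -log(right_cut).
    """
    return math.log(left_cut) - math.log(right_cut)

for p in (2, 4, 8):
    eps = 10.0 ** -p
    print(eps,
          cut_integral(eps, eps),        # symmetric cut
          cut_integral(eps, 2 * eps),    # right-hand cut twice as wide
          cut_integral(eps, eps ** 2))   # right-hand cut much narrower
```

The symmetric column is identically zero, the second column stays at -log 2, and the third column equals -log ε, which grows without bound as ε shrinks.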

The question of the existence or non-existence of improper integrals boils down to the question of existence of limits, which we will take up in detail in the next chapter.


6.4 Summary

6.1 The Fundamental Theorem

Definition Let P denote a partition of the interval [a, b], $a = x_0 < x_1 < \cdots < x_n = b$, and f a continuous function on [a, b]. A Cauchy sum S(f, P) is any sum of the form $\sum_{k=1}^{n} f(a_k)(x_k - x_{k-1})$ where $a_k$ is in $[x_{k-1}, x_k]$. The upper and lower Riemann sums are $S^+(f, P) = \sum_{k=1}^{n} M_k (x_k - x_{k-1})$ and $S^-(f, P) = \sum_{k=1}^{n} m_k (x_k - x_{k-1})$ where $M_k$ and $m_k$ denote the sup and inf of f on $[x_{k-1}, x_k]$. The maximum interval length of P is the maximum value of $x_k - x_{k-1}$.

Theorem 6.1.1 (Existence of the Integral) If f is continuous on [a, b] there exists a real number $\int_a^b f(x)\,dx$ such that given N there exists m such that if the maximum interval length of P is less than 1/m, then $|S(f, P) - \int_a^b f(x)\,dx| < 1/N$. Also $\int_a^b f(x)\,dx = \inf S^+(f, P) = \sup S^-(f, P)$ where P varies over all partitions.

Theorem The integral of continuous functions is linear and additive, and $m(b - a) \le \int_a^b f(x)\,dx \le M(b - a)$ where M and m denote the sup and inf of f over [a, b].

Theorem 6.1.2 (Differentiation of the Integral) If f is continuous on [a, b], then $F(x) = \int_a^x f(t)\,dt$ is $C^1$ and F' = f.

Theorem 6.1.3 (Integration of the Derivative) If f is $C^1$ on [a, b], then $\int_a^b f'(x)\,dx = f(b) - f(a)$.

Definition F is called an indefinite integral or primitive of f if F' = f.

Theorem 6.1.4 (Integration by Parts) If f and g are $C^1$ on [a, b] then

$$\int_a^b f(x) g'(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b f'(x) g(x)\,dx.$$

Theorem (Integral Remainder Formula for Taylor's Theorem) If f is


Theorem 6.1.5 (Change of Variable) If g is $C^1$ and increasing on [a, b] and f continuous on [g(a), g(b)],

$$\int_{g(a)}^{g(b)} f(x)\,dx = \int_a^b f(g(x)) g'(x)\,dx.$$

Corollary 6.1.1 (Translation Invariance)

$$\int_a^b f(x)\,dx = \int_{a-y}^{b-y} f(x + y)\,dx.$$

Theorem 6.1.6 (Arclength formula) The length of the graph of a $C^1$ function f on [a, b] is $\int_a^b \sqrt{1 + f'(x)^2}\,dx$.

Theorem 6.1.7 Let $f(x) = \int_{a(x)}^{b(x)} g(x, t)\,dt$ for a(x), b(x), and g(x, t) $C^1$ functions. Then

$$f'(x) = b'(x) g(x, b(x)) - a'(x) g(x, a(x)) + \int_{a(x)}^{b(x)} \frac{\partial g}{\partial x}(x, t)\,dt.$$

Theorem 6.1.8 (Interchange of Integrals) Let f(x, y) be continuous for x in [a, b] and y in [c, d]. Then $\int_c^d f(x, y)\,dy$ and $\int_a^b f(x, y)\,dx$ are continuous and

$$\int_a^b \left(\int_c^d f(x, y)\,dy\right) dx = \int_c^d \left(\int_a^b f(x, y)\,dx\right) dy.$$


Theorem If f is $C^1$ on [a, b] with $|f'(x)| \le M_1$ and δ is the maximum interval length of P, then

$$\left|\int_a^b f(x)\,dx - S(f, P)\right| \le \tfrac{1}{2} M_1 (b - a)\delta.$$

Theorem (Midpoint rule) If f is $C^2$ on [a, b] with $|f''(x)| \le M_2$ and S(f, P) is formed by evaluating f at the midpoint of each subinterval, then

6.2 The Riemann Integral

Definition A bounded function f on [a, b] is said to be Riemann integrable with integral $\int_a^b f(x)\,dx$ if S(f, P) converges to $\int_a^b f(x)\,dx$ as the maximum interval length of P tends to zero.

Definition Osc(f, P) = $S^+(f, P) - S^-(f, P)$.

Lemma 6.2.1 If the maximum interval length of P' is less than the minimum length of subintervals of P, then

Osc(f, P') ≤ 3 Osc(f, P).

Theorem 6.2.1 Let f be bounded on [a, b]. Then the following are equivalent:

a. Osc(f, $P_j$) → 0 for some sequence of partitions.

b. Osc(f, P) → 0 as the maximum interval length of P goes to zero.

c. inf $S^+(f, P)$ = sup $S^-(f, P)$.

d. S(f, $P_j$) converges for every choice of Cauchy sums for some sequence of partitions.


e. f is Riemann integrable.

Theorem The Riemann integral is linear and additive, and $m(b - a) \le \int_a^b f(x)\,dx \le M(b - a)$ where M and m denote the sup and inf of f on [a, b].

Theorem 6.2.2 If f and g are Riemann integrable on [a, b], then so are |f|, max(f, g), and min(f, g).

Theorem 6.2.3 A bounded function with a finite set of discontinuities on [a, b] is Riemann integrable.

Example There exists a function continuous at the irrational numbers but discontinuous at the rational numbers.

Theorem 6.2.4 A bounded function on [a, b] continuous except at a countable set of points is Riemann integrable.

6.3 Improper Integrals

Example The function $x^a$ has an integrable singularity near x = 0 if and only if a > -1 and an integrable singularity near ∞ if and only if a < -1.

Definition If f is defined on [0,1] and Riemann integrable on [ε, 1] for every ε > 0 we say f has an absolutely convergent improper integral on [0,1] if $\lim_{\varepsilon\to0} \int_\varepsilon^1 |f(x)|\,dx$ exists. This implies $\lim_{\varepsilon\to0} \int_\varepsilon^1 f(x)\,dx$ exists.

Example $\lim_{n\to\infty} \int_0^n (\sin x/x)\,dx$ exists, but $\lim_{n\to\infty} \int_0^n |\sin x/x|\,dx$ does not.

Definition The Cauchy principal value integral $\text{P.V.} \int_{-1}^{1} f(x)\,dx$ is defined to be $\lim_{\varepsilon\to0} \left(\int_{-1}^{-\varepsilon} + \int_\varepsilon^1\right) f(x)\,dx$ if the limit exists.


Chapter 7

Sequences and Series of Functions

7.1 Complex Numbers

7.1.1 Basic Properties of C

So far we have been dealing with the real number system R and functions whose domain and range are subsets of R. For many purposes it is important to consider also the complex number system C. We can describe this system succinctly by defining a complex number to be a symbol x + iy, where x and y are real numbers and i is a formal symbol, which operationally is to be thought of as a solution of the equation $x^2 = -1$. We have already observed that there are no real solutions of this equation. The arithmetic of complex numbers is described by the formulas

$$(x + iy) + (x' + iy') = (x + x') + i(y + y'),$$

$$(x + iy) \cdot (x' + iy') = x \cdot x' - y \cdot y' + i(x \cdot y' + x' \cdot y),$$

which are obtained by adopting the usual rules of arithmetic together with the identity $i^2 = -1$. It is a familiar fact from algebra that the complex numbers satisfy the field axioms. Perhaps the only nontrivial one is the existence of multiplicative inverses, but we easily verify that

$$\frac{1}{x + iy} = \frac{x}{x^2 + y^2} + i\left(\frac{-y}{x^2 + y^2}\right)$$

Page 261: Strichartz_The Way of Analysis 2000

The two most important properties of the absolute value are the multiplicativity, $|z \cdot z_1| = |z| \cdot |z_1|$, and the triangle inequality, $|z + z_1| \le |z| + |z_1|$, where z and $z_1$ are any complex numbers. The multiplicativity is established by a direct computation; if z = x + iy and $z_1 = x_1 + iy_1$, then

Figure 7.1.1: The point (x, y) in the plane, at distance $\sqrt{x^2 + y^2}$ from the origin.

does the trick. We leave the details as an exercise.

The construction of the complex numbers from the real numbers is purely algebraic (involving only finite operations). It is an example of a general procedure for enlarging any field by adjoining roots of an equation (in this case $x^2 + 1 = 0$). It is a remarkable fact that the complex numbers are algebraically complete, meaning that all polynomials with complex coefficients ($(\sqrt{5} + i)x^{27} + 3x^6 - \pi$, for example) have complex roots. Thus the complex numbers cannot be further enlarged algebraically. This fact is called the fundamental theorem of algebra. It will not be used in this book. For a proof, the reader can consult any text on complex variable theory.

The complex numbers do not possess an order that is mathematically relevant (one could impose an order, say lexicographic, but it would not have enough properties to make it worth studying). But there is a related concept of absolute value or modulus of a complex number, $|x + iy| = \sqrt{x^2 + y^2}$. Note that $x^2 + y^2$ is non-negative, so the square root exists as a non-negative real number. If we adopt the familiar convention of identifying the complex numbers with points in the plane (x + iy corresponds to (x, y)), then |x + iy| is the distance of the point to the origin by the Pythagorean theorem (see Figure 7.1.1).


and so $|z_1 - z| \le |z| + |z_1|$ says the length of one side of the triangle is less than the sum of the lengths of the other two sides. This is equivalent to the inequality $|z_1 + z| \le |z| + |z_1|$ since |-z| = |z|. It is then clear from the geometry that the inequality is an equality exactly when the triangle degenerates into a straight line with z and $z_1$ on the same side of the origin. Another way of saying this is $z_1 = rz$ where r is a non-negative real number.

We can use this insight to fashion a proof. Let us hold $z_1$ fixed and vary z so that |z| = c is also fixed and ask when $|z + z_1|$ is maximized. The answer should be when $z = c z_1/|z_1|$, and if this is indeed the case we have $|z + z_1| = |z| + |z_1|$ for this particular choice and, hence, $|z + z_1| < |z| + |z_1|$ for every other choice (note that $|z| + |z_1|$ is not varying because $z_1$ and |z| = c are fixed).

Figure 7.1.2: The triangle with vertices at the origin, at z, and at $z_1$.

$$|z \cdot z_1|^2 = |(x x_1 - y y_1) + i(x y_1 + x_1 y)|^2 = (x x_1 - y y_1)^2 + (x y_1 + x_1 y)^2 = x^2 x_1^2 + y^2 y_1^2 + x^2 y_1^2 + x_1^2 y^2$$

because the cross terms cancel, and this is $(x^2 + y^2)(x_1^2 + y_1^2)$. Thus $|z \cdot z_1|^2 = |z|^2 |z_1|^2$, so the multiplicativity follows by taking the square root.
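The cancellation of cross terms can be confirmed by brute force on integer points, where the squared identity holds exactly. The sketch below is an illustration only; `multiply` and `check_grid` are hypothetical names, and the triangle inequality is tested alongside with a small floating-point tolerance.

```python
import math

def multiply(x, y, x1, y1):
    """Real and imaginary parts of (x + iy)(x1 + iy1), from the product formula."""
    return x * x1 - y * y1, x * y1 + x1 * y

def check_grid(n):
    """Check both properties for all integer points with coordinates in [-n, n]."""
    pts = [(x, y) for x in range(-n, n + 1) for y in range(-n, n + 1)]
    for x, y in pts:
        for x1, y1 in pts:
            u, v = multiply(x, y, x1, y1)
            # Multiplicativity, squared so that integer arithmetic is exact.
            if u * u + v * v != (x * x + y * y) * (x1 * x1 + y1 * y1):
                return False
            # Triangle inequality, with a small tolerance for floating-point square roots.
            if math.hypot(x + x1, y + y1) > math.hypot(x, y) + math.hypot(x1, y1) + 1e-12:
                return False
    return True

print(check_grid(4))
```

A finite check of this kind proves nothing in general, but it is a quick guard against sign errors when expanding the squares by hand.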

There are many different proofs of the triangle inequality. First let's justify the name with a geometric interpretation. Consider the triangle in the plane with vertices at the origin, at z and at $z_1$. Then the lengths of the sides are |z|, $|z_1|$, and $|z_1 - z|$, as shown in Figure 7.1.2,


Now we have the problem of maximizing $|z + z_1|$, or what is the same thing, maximizing $|z + z_1|^2 = (x + x_1)^2 + (y + y_1)^2$ given that $z_1 = x_1 + iy_1$ is fixed and $x^2 + y^2 = c^2$, as shown in Figure 7.1.3. But $(x + x_1)^2 + (y + y_1)^2 = x^2 + y^2 + x_1^2 + y_1^2 + 2xx_1 + 2yy_1$ and, since $x^2 + y^2 + x_1^2 + y_1^2 = c^2 + x_1^2 + y_1^2$ is fixed we need to maximize $2(xx_1 + yy_1)$ given $x^2 + y^2 = c^2$. This can be reduced to a simple calculus problem by solving $y = \pm\sqrt{c^2 - x^2}$ and finding the critical points of $f(x) = 2(xx_1 \pm y_1\sqrt{c^2 - x^2})$. Since $f'(x) = 2(x_1 \mp y_1 x/\sqrt{c^2 - x^2})$, we obtain f'(x) = 0 when $x_1 y = x y_1$. The two critical points occur when z and $z_1$ are collinear with the origin, but the maximum is clearly assumed when they both lie on the same side.
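The maximization can also be explored numerically by scanning the circle |z| = c. This is a sketch under assumed values ($z_1 = 3 + 4i$, c = 2, and 3600 sample angles); `best_point` is a hypothetical helper, and a grid scan only approximates the true maximizer.

```python
import cmath
import math

def best_point(z1, c, samples=3600):
    """Scan z = c*exp(i*theta) over equally spaced angles; return the sample maximizing |z + z1|."""
    candidates = (c * cmath.exp(1j * 2 * math.pi * k / samples) for k in range(samples))
    return max(candidates, key=lambda z: abs(z + z1))

z1, c = 3 + 4j, 2.0
z_best = best_point(z1, c)
print("scanned maximizer:", z_best, "value:", abs(z_best + z1))
print("predicted point  :", c * z1 / abs(z1), "value:", c + abs(z1))
```

The scanned maximizer lands next to $c z_1/|z_1|$, the point on the same side of the origin as $z_1$, and the maximal value is close to $|z| + |z_1|$.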

This is by no means the simplest proof of the triangle inequality, and we will give another after we discuss trigonometric functions.

A useful consequence of the triangle inequality is the inequality $\bigl||z| - |z_1|\bigr| \le |z - z_1|$, which has the interpretation that the distance between points on concentric circles is at least the difference of the radii (see Figure 7.1.4). We leave the details to the exercises.

Figure 7.1.3:

For our purposes, the main reason for introducing the absolute value is to use it to formulate topological properties of the complex numbers. For example, what do we mean by the limit of a sequence of complex numbers? If $z_n = x_n + iy_n$ we could define $\lim_{n\to\infty} z_n = z$ where z = x + iy to mean $\lim_{n\to\infty} x_n = x$ and $\lim_{n\to\infty} y_n = y$. In other words, we separate the real and imaginary parts of the complex numbers and require convergence of each. In fact, with most concepts involving complex numbers we will take this approach. But there is another approach, which is to replace the neighborhood $|x - x_0| < 1/n$ concept for reals by the neighborhood $|z - z_0| < 1/n$ concept for complex numbers. This would lead to the definition: $\lim_{k\to\infty} z_k = z$ means given any error 1/n there exists m such that k ≥ m implies $|z_k - z| < 1/n$.

Figure 7.1.4:

We claim that the two definitions are equivalent. Indeed the second definition says the sequence eventually is entirely contained in any disc about z, while the first definition says the sequence eventually is entirely contained in any rectangle about z, as in Figure 7.1.5 ($\lim_{k\to\infty} x_k = x$ means eventually $|x_k - x| < 1/n$ and $\lim_{k\to\infty} y_k = y$ means eventually $|y_k - y| < 1/m$, and these two conditions mean $z_k$ lies eventually in a rectangle).

Figure 7.1.5:

The equivalence of the two definitions is thus a consequence of the fact that for every such rectangle there exists a disc about z entirely contained in the rectangle (so $z_k$ in the disc implies $z_k$ is in the rectangle), and conversely for every disc about z there exists such a rectangle entirely contained in the disc (so $z_k$ in the rectangle implies $z_k$ is in the disc). These facts are illustrated in Figure 7.1.6. We leave the details of the algebraic verification to the exercises.

Figure 7.1.6:

In the same way we can define a Cauchy sequence of complex numbers $z_1, z_2, \ldots$ by the condition: for all 1/n there exists m such that $|z_k - z_j| < 1/n$ if j ≥ m and k ≥ m. We can show this is equivalent to the condition that $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ be Cauchy sequences of real numbers and hence obtain the completeness of the complex numbers: every Cauchy sequence of complex numbers converges to a complex number limit. (Note that the word completeness is used here in an entirely different sense than the algebraic completeness mentioned before.)

The complex number system has no direct intuitive interpretation. For this reason it met with some difficulty in being accepted into the mainstream of mathematics; such terminology as "imaginary" and "complex" reflects the resistance and even hostility of the imaginations of many mathematicians to accepting the complex numbers as a legitimate number system. After all, what do these numbers mean? Fortunately, such objections did not prevail, as the complex number system proved extraordinarily useful in almost every branch of pure and applied mathematics. As far as the question of "meaning" is concerned, we can be content with no answer because there are so many different "interpretations" of the complex number system, each of which gives some significance (if not meaning) to the abstract mathematical system. Intuition is not completely excluded from the picture; one simply has to develop an intuition for the complex number system.

7.1.2 Complex-Valued Functions

We will be considering mostly functions whose domain lies in the real numbers and whose range lies in the complex numbers. Functions with domain and range in the complex numbers involve the theory of complex analysis, which requires a book of its own. We will only give a hint of this theory in the section on power series.

Functions whose range lies in the complex numbers, also called complex-valued functions, can be dealt with by splitting the real and imaginary parts. If F(x) is a complex number for each x in the domain, then F(x) = f(x) + ig(x) for uniquely determined real numbers f(x) and g(x); and this defines f and g as real-valued functions on the same domain. Conversely, if f and g are real-valued functions on a common domain, then F(x) = f(x) + ig(x) is a complex-valued function on the same domain. We call f and g the real and imaginary parts of F, respectively. Properties such as continuity, differentiability, and integrability can be defined for complex-valued functions by modifying the definition for real-valued functions, and it can easily be shown that the complex-valued function has the property if and only if its real and imaginary parts both have the property. For example, if F is a complex-valued function on [a, b], then F is said to be differentiable at $x_0$, a point in (a, b), if $\lim_{h\to0} (F(x_0 + h) - F(x_0))/h$ exists. For integrability we do not have a notion of Riemann upper and lower sums, but we can still define F to be integrable if all the Cauchy sums S(F, P) converge to a limit as the maximum length of subintervals of P goes to zero. We leave the details as exercises. Again we must stress that the complex values appear in the range and not the domain of the functions. For functions defined on the complex numbers there is no corresponding splitting into functions defined on the real numbers.

By and large, all the theorems we have established so far for real-valued functions are also valid for complex-valued functions. The proofs can either be modified simply or one may break up the complex-valued function F = f + ig and apply the real-valued theorem to the real and imaginary parts. We discuss here only the few exceptions to this rule, where either the analogous theorem is false or a new idea is needed for


$$\sup_{x,y} \bigl||F(x)| - |F(y)|\bigr| \le \sup_{x,y} |f(x) - f(y)| + \sup_{x,y} |g(x) - g(y)|,$$

Theorem 7.1.1 Let F be a complex-valued integrable function. Then the real-valued function |F| is integrable, and $\left|\int_a^b F(x)\,dx\right| \le \int_a^b |F(x)|\,dx$.

Proof: We first show $\mathrm{Osc}(|F|, P) \le \mathrm{Osc}(f, P) + \mathrm{Osc}(g, P)$. This proves the integrability of |F| since F integrable implies f and g are integrable, so Osc(f, P) and Osc(g, P) can be made as small as desired.

Now the oscillation over a partition is just the sum of the oscillations over each subinterval multiplied by the length of the subintervals, so it suffices to show that on each subinterval the oscillation of |F| is less than or equal to the sum of the oscillations of f and g. Recall that the oscillation of a real-valued function f on an interval was defined to be the difference of the sup and inf, and this is clearly the sup of the values |f(x) - f(y)| as x and y vary over the interval. Thus we are trying to prove

$$\frac{g(b) - g(a)}{b - a} = g'(x_2),$$

but there does not have to be any connection between the points $x_1$ and $x_2$. It is easy to construct a counterexample, say $F(x) = x^2 + ix^3$ on [0, 1] where $F'(x) = 2x + i3x^2$ and $(F(1) - F(0))/(1 - 0) = 1 + i$, and ne'er the twain shall meet. Similarly, the Lagrange remainder formula for Taylor's theorem does not hold for complex-valued functions, but Taylor's theorem itself is true, as is the integral remainder formula. There is also no intermediate value property for continuous complex-valued functions, since there is no notion of "between" for complex numbers.
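As a quick numerical illustration of this counterexample (a sketch only; the function names and the sampling grid are assumptions introduced here, not part of the text), one can check that $F'(x)$ never gets close to the mean slope $1 + i$ on [0, 1], because the real part forces $x_1 = 1/2$ while the imaginary part forces $x_2 = 1/\sqrt{3}$:

```python
import math


def F_prime(x: float) -> complex:
    """Derivative of F(x) = x**2 + i*x**3, namely 2x + 3i*x**2."""
    return complex(2 * x, 3 * x * x)


# Mean slope (F(1) - F(0)) / (1 - 0) for F(x) = x**2 + i*x**3.
MEAN_SLOPE = complex(1, 1)

# Points given by the real-valued mean value theorem for each part.
X1 = 0.5                 # solves 2x = 1 (real part)
X2 = 1 / math.sqrt(3)    # solves 3x**2 = 1 (imaginary part)


def smallest_gap(samples: int = 10_001) -> float:
    """Smallest |F'(x) - (1 + i)| over an evenly spaced grid in [0, 1]."""
    grid = (k / (samples - 1) for k in range(samples))
    return min(abs(F_prime(x) - MEAN_SLOPE) for x in grid)
```

The gap stays bounded away from zero on the grid, which is consistent with the claim that no single point serves both the real and the imaginary part.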

The next result is an example of a theorem that is true for both real- and complex-valued functions, but the complex version requires a more intricate proof.

and

$$\frac{f(b) - f(a)}{b - a} = f'(x_1)$$

the proof.

The mean value theorem does not hold for complex-valued functions. If we apply the theorem to the real and imaginary parts we obtain


Page 268: Strichartz_The Way of Analysis 2000

7.1.3 Exercises

1. Prove that a complex-valued function is Riemann integrable (in the sense of convergence of Cauchy sums) if and only if its real and imaginary parts are Riemann integrable.

2. Prove the integral remainder formula for Taylor's theorem for a complex-valued function.

$$|S(F, P)| = \left|\sum_{j=1}^{n} F(y_j)(x_j - x_{j-1})\right| \le \sum_{j=1}^{n} |F(y_j)(x_j - x_{j-1})| = S(|F|, P)$$

since

$|F(y_j)(x_j - x_{j-1})| = |F(y_j)|(x_j - x_{j-1})$, $x_j - x_{j-1}$ being positive. QED

for then we may take the sup over all x and y in the interval and use the fact that the sup of a sum is less than or equal to the sum of the sups (the worst possible case is when the sups of |f(x) - f(y)| and |g(x) - g(y)| are assumed for the same value of x and y, when $\sup_{x,y}(|f(x) - f(y)| + |g(x) - g(y)|) = \sup_{x,y} |f(x) - f(y)| + \sup_{x,y} |g(x) - g(y)|$, whereas if |f(x) - f(y)| and |g(x) - g(y)| are largest for different values of x and y, then there will be inequality). But this follows from the inequality involving complex numbers $\bigl|\,|z| - |z_1|\,\bigr| \le |x - x_1| + |y - y_1|$, where $z = x + iy$ and $z_1 = x_1 + iy_1$. The proof is left to the exercise set 7.1.3, number 10b.

Altogether, then, we have shown that $\mathrm{Osc}(|F|, P) \le \mathrm{Osc}(f, P) + \mathrm{Osc}(g, P)$. (We could also prove $\mathrm{Osc}(|F|, P) \le \mathrm{Osc}(F, P)$ if we defined Osc(F, P) appropriately, taking the definition of the oscillation of a complex-valued function on an interval to be the sup of |F(x) - F(y)| over all x and y in the interval.) Thus F integrable implies |F| integrable. The inequality $\left|\int_a^b F(x)\,dx\right| \le \int_a^b |F(x)|\,dx$ then follows from the corresponding inequality $|S(F, P)| \le S(|F|, P)$ for Cauchy sums (evaluated at the same points), which is just a consequence of the triangle inequality (iterated to sums of n numbers)

$$\bigl|\,|F(x)| - |F(y)|\,\bigr| \le |f(x) - f(y)| + |g(x) - g(y)|$$

and for this it suffices to prove


Page 269: Strichartz_The Way of Analysis 2000

7.2.1 Convergence and Absolute Convergence

We have already discussed the meaning of the convergence of an infinite sequence of real or complex numbers. We will also want to consider infinite series, written $x_1 + x_2 + \cdots$ or $\sum_{k=1}^{\infty} x_k$. To each infinite series corresponds the infinite sequence of partial sums $s_n = \sum_{k=1}^{n} x_k$, and we define $\sum_{k=1}^{\infty} x_k$ to be convergent with $\sum_{k=1}^{\infty} x_k = s$ if $s_1, s_2, \ldots$ is convergent with limit s. This definition applies to real- or complex-valued

7.2 Numerical Series and Sequences

3. If f(x) is a continuous complex-valued function and $f(x) \ne 0$ for any x in the domain, prove that 1/f(x) is continuous.

4. Find a countable dense subset of $\mathbb{C}$.

5. If z is a complex number define the complex conjugate $\bar{z}$ to be x - iy if z = x + iy. a) Show that $\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2$, $\overline{z_1 z_2} = \bar{z}_1 \bar{z}_2$, and $|z| = \sqrt{z\bar{z}}$. b) Show $1/z = \bar{z}/|z|^2$. c) Show that z is real if and only if $z = \bar{z}$.

6. If f is a complex-valued $C^1$ function and $f(x) \ne 0$ on the domain, prove that |f| is $C^1$. Can you find a formula for $|f|'$? (Hint: use exercise 5.)

7. Show that $z^2 = i$ has solutions $z = \pm(1/\sqrt{2} + i/\sqrt{2})$.

8. State and prove the fundamental theorem of the calculus (both forms) for complex-valued functions.

9. Prove that every disc $\{z : |z - z_0| < 1/n\}$ is contained in a square $\{z : |x - x_0| < 1/n \text{ and } |y - y_0| < 1/n\}$ and that every square $\{z : |x - x_0| < 1/n \text{ and } |y - y_0| < 1/n\}$ is contained in a disc $\{z : |z - z_0| < \sqrt{2}/n\}$.

10. a. Prove the inequality $\bigl|\,|z| - |z_1|\,\bigr| \le |z - z_1|$ for complex numbers. (See Figure 7.1.4.)

b. Prove $\bigl|\,|z| - |z_1|\,\bigr| \le |x - x_1| + |y - y_1|$ for $z = x + iy$ and $z_1 = x_1 + iy_1$.


Page 270: Strichartz_The Way of Analysis 2000

$$(1 - r)(r + r^2 + \cdots + r^n) = r + r^2 + \cdots + r^n - r^2 - \cdots - r^n - r^{n+1} = r - r^{n+1},$$

we have

$$s_n = \sum_{k=1}^{n} r^k = \frac{r}{1 - r} - \frac{r^{n+1}}{1 - r},$$

so

$$s_n - \frac{r}{1 - r} = -\frac{r^{n+1}}{1 - r}.$$

series. This convergence is sometimes called ordinary convergence to distinguish it from the more stringent absolute convergence we will define later. A series that is not convergent is called divergent, and we will sometimes speak of series that diverge to $+\infty$ or $-\infty$ (or write $\sum_{k=1}^{\infty} x_k = \pm\infty$) if $s_1, s_2, \ldots$ converges to $+\infty$ or $-\infty$ in the extended real numbers.

From the sequence of partial sums $s_1, s_2, \ldots$ we can recover the terms $x_n$ of the series by the difference formula $x_1 = s_1$ and $x_n = s_n - s_{n-1}$ for n > 1. Note that if $s_1, s_2, \ldots$ is any infinite sequence of numbers we can obtain by the same formula an infinite series $\sum_{k=1}^{\infty} x_k$ that has $s_1, s_2, \ldots$ as partial sums. In this way we have a naturally defined one-to-one correspondence between infinite series and infinite sequences, and so in a sense the theory of the two is the same. Nevertheless, it is sometimes more convenient to think about certain problems in one or the other form, so we will maintain both perspectives. An example of a concept that is natural for series but not for sequences is that of absolute convergence. We say $\sum_{k=1}^{\infty} x_k$ is absolutely convergent if $\sum_{k=1}^{\infty} |x_k|$ is convergent. This definition applies to real- or complex-valued series. The idea of absolute convergence in the real case is that the convergence should not be caused by cancellation of positive and negative terms.

Later in this chapter we will discuss infinite series and sequences of functions. The material in this section lays the foundation for that more complicated theory. The reader is probably familiar with most of the ideas, at least on an informal level, from calculus.

Before beginning the discussion of general series, we start with an important and familiar example, the geometric series $\sum_{k=1}^{\infty} r^k$ for 0 < r < 1. Since we have


Page 271: Strichartz_The Way of Analysis 2000

by the binomial theorem. Since all the terms in the binomial expansion are positive, we have $1/r^n \ge 1 + n/k$, so $r^n \le k/(n + k)$ and this can be made $\le 1/m$ by taking $n \ge k(m - 1)$.

Note that the rate of convergence of $r^n$ to 0 (hence the rate of convergence of $\sum_{n=1}^{\infty} r^n$) depends on how close r is to 1. The closer r is to 1, the larger k is in the estimate $1/r \ge 1 + 1/k$ and the slower the convergence. The estimate we have given is rather crude; the geometric series actually converges much more rapidly for fixed r. Nevertheless it is true that there is no uniform rate of convergence for all r < 1. We leave this as an exercise (note that the argument we gave for the convergence does not suffice to prove that the rate of convergence is not uniform; it only shows that one attempt to estimate the rate of convergence leads to dependence on r).
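To see how crude the estimate $n \ge k(m - 1)$ is, and how the number of terms needed grows as r approaches 1, one can compare it with the actual first n for which $r^n \le 1/m$. This is a minimal sketch; the function names are hypothetical, and floating-point arithmetic is assumed to be accurate enough for the sample values of r used.

```python
import math


def crude_terms_needed(r: float, m: int) -> int:
    """n = k(m - 1), where k is an integer with 1/r >= 1 + 1/k."""
    k = math.ceil(r / (1 - r))  # 1/r >= 1 + 1/k is equivalent to k >= r/(1 - r)
    return k * (m - 1)


def actual_terms_needed(r: float, m: int) -> int:
    """Smallest n with r**n <= 1/m, found by repeated multiplication."""
    n, power = 0, 1.0
    while power > 1 / m:
        power *= r
        n += 1
    return n
```

For r = 0.5 and m = 100 the crude bound asks for 99 terms while 7 suffice; for r closer to 1 both counts grow, in line with the remark that the rate depends on r.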

We begin the discussion of general properties of series with the elementary observations of linearity and order preservation. If $\sum_{k=1}^{\infty} x_k$ and $\sum_{k=1}^{\infty} y_k$ are convergent, then $\sum_{k=1}^{\infty} (x_k + y_k)$ is convergent and $\sum_{k=1}^{\infty} a x_k$ is convergent with $\sum_{k=1}^{\infty} (x_k + y_k) = \sum_{k=1}^{\infty} x_k + \sum_{k=1}^{\infty} y_k$ and $\sum_{k=1}^{\infty} a x_k = a \sum_{k=1}^{\infty} x_k$. Also if $x_k \le y_k$ for every k, then $\sum_{k=1}^{\infty} x_k \le \sum_{k=1}^{\infty} y_k$. These follow from the corresponding results for the sequences of partial sums and the fact that like operations prevail (the partial sums of the sum series $\sum_{k=1}^{\infty} (x_k + y_k)$ are the sums of the partial sums of the summands, etc.). Incidentally, there is no such result for products; $\sum_{k=1}^{\infty} x_k y_k$ need not be convergent (see exercises). Indeed there is no relation between $\sum_{k=1}^{\infty} x_k y_k$ and $\sum_{k=1}^{\infty} x_k$ and $\sum_{k=1}^{\infty} y_k$.

Another relatively simple observation is that the convergence of an infinite series (but not the limit) does not depend on any finite number of terms. Indeed changing the terms $x_k$ for $k \le n$ means that all the partial sums $s_m$ beyond $s_n$ are changed to $s_m + c$ for a fixed number c and so converge or not as with $s_m$. A particular form of modifying a finite number of terms is to rearrange the first n terms. In this case the

$$\frac{1}{r^n} \ge \left(1 + \frac{1}{k}\right)^n = 1 + \frac{n}{k} + \cdots + \frac{n}{k^{n-1}} + \frac{1}{k^n}$$

We claim for 0 < r < 1 that this has limit zero as $n \to \infty$, which will show $\sum_{k=1}^{\infty} r^k$ is convergent and equals r/(1 - r). Since r < 1, the factor 1/(1 - r) is harmless, so we need to show $\lim_{n \to \infty} r^n = 0$. If r < 1, then 1/r > 1, so by the axiom of Archimedes $1/r \ge 1 + 1/k$ for some integer k. Then


Page 272: Strichartz_The Way of Analysis 2000

Theorem 7.2.2 (Comparison test) Let $\sum_{k=1}^{\infty} x_k$ and $\sum_{k=1}^{\infty} y_k$ be infinite series with the $y_k$'s non-negative and $|x_k| \le y_k$ (it suffices to have this for all but a finite number of terms). If $\sum_{k=1}^{\infty} y_k$ is convergent, then $\sum_{k=1}^{\infty} x_k$ is absolutely convergent.

A special case of this is the observation that convergence of $\sum_{k=1}^{\infty} x_k$ implies $\lim_{k \to \infty} x_k = 0$, since $x_k = s_k - s_{k-1}$. Of course the converse is false (we have already discussed the fact that the smallness of the difference between neighboring terms $s_{k-1}$ and $s_k$ in a sequence is not the same thing as the Cauchy criterion). As we will show shortly, $\sum_{k=1}^{\infty} 1/k$ diverges.

A series $\sum_{k=1}^{\infty} x_k$ is said to be absolutely convergent if the series of absolute values $\sum_{k=1}^{\infty} |x_k|$ is convergent. To justify the terminology let us observe that absolute convergence implies convergence. This is an immediate consequence of the Cauchy criterion, since $\left|\sum_{k=p}^{q} x_k\right| \le \sum_{k=p}^{q} |x_k|$ by the triangle inequality. Intuitively we can argue that the possibility of cancellation of positive and negative values (or the more complicated cancellation of positive and negative values in the real and imaginary parts in the complex-valued case) can only help with convergence. An example of a convergent series that is not absolutely convergent is $1 - 1/2 + 1/3 - 1/4 + 1/5 - 1/6 + \cdots = \sum_{k=1}^{\infty} (-1)^{k+1}/k$ (we will prove this later).

Absolute convergence is an extremely useful concept, and most tests for convergence actually establish absolute convergence. The most fundamental test is the comparison test.

Proof: If $s_1, s_2, \ldots$ denotes the partial sums, then $s_q - s_{p-1} = \sum_{k=p}^{q} x_k$, so this is exactly the Cauchy criterion for $s_1, s_2, \ldots$. QED

Theorem 7.2.1 $\sum_{k=1}^{\infty} x_k$ converges if and only if for every error 1/n there exists m such that $\left|\sum_{k=p}^{q} x_k\right| < 1/n$ for all p, q satisfying $q \ge p \ge m$.

constant c is zero, so the limit is unchanged. Rearranging an infinite number of terms is another matter, which we will return to later.

The Cauchy criterion for convergence of sequences translates easily into a criterion for convergence of series that again does not involve the limit.


Page 273: Strichartz_The Way of Analysis 2000

Next we consider the important class of series $\sum_{n=1}^{\infty} 1/n^a$ where a is a positive real number. (Strictly speaking we have not yet defined $n^a$ unless a is rational, so we should either restrict the discussion to rational numbers a = p/q when $n^{p/q} = \sqrt[q]{n^p}$ or else observe that the argument we give depends only on the familiar properties of powers, so the results will ultimately be justified in the next chapter when we establish these properties in general.)

We want to show $\sum_{n=1}^{\infty} 1/n^a$ converges for a > 1 and diverges for $a \le 1$. If this reminds you of the result concerning the improper integrals $\int_1^{\infty} 1/x^a\,dx$, it should. In fact one can use the results about the integrals to establish the results about the series (see exercises). We shall give a more direct argument here. Notice that we will not evaluate $\sum_{n=1}^{\infty} 1/n^a$ exactly, as a function of a; this is the notorious Riemann zeta function, which is related to questions about the distribution of prime numbers in ways that are too mysterious to explain here. You

Theorem 7.2.3

a. (Ratio test) If $|x_{n+1}/x_n| < r$ for all sufficiently large n and some r < 1, then $\sum x_n$ converges absolutely; while if $|x_{n+1}/x_n| \ge 1$ for all sufficiently large n, then $\sum x_n$ diverges.

b. (Root test) If $\sqrt[n]{|x_n|} < r$ for all sufficiently large n and some r < 1, then $\sum x_n$ converges absolutely.

The comparison test can also be used in contrapositive form to prove divergence of series of non-negative terms (see exercises).

Many applications of the comparison test use the geometric series; in fact the familiar root and ratio tests are proved in this fashion. We state the results here but leave the proofs to the exercises.

Proof: Note that for a series of non-negative terms, convergence and absolute convergence are the same. The theorem is a trivial consequence of the Cauchy criterion as $\sum_{k=p}^{q} |x_k| \le \sum_{k=p}^{q} y_k$ from the hypothesis (if there is a finite number of exceptions to $|x_k| \le y_k$ we must take p sufficiently large). From the convergence of $\sum_{k=1}^{\infty} y_k$ we know we can make $\sum_{k=p}^{q} y_k < 1/n$ if $q \ge p \ge m$ and so $\sum_{k=p}^{q} |x_k| < 1/n$ also, so $\sum_{k=1}^{\infty} |x_k|$ converges. QED


Page 274: Strichartz_The Way of Analysis 2000

Note that each block of $2^k$ terms (all equal to $1/2^{ka}$) adds up to exactly $2^k \cdot 1/2^{ka} = 2^{(1-a)k}$, so $\sum_{n=1}^{\infty} b_n = \sum_{k=0}^{\infty} 2^{(1-a)k}$ and this is a geometric series with $r = 2^{1-a}$. If a > 1, then r < 1, so the geometric series converges; while if $a \le 1$, then $r \ge 1$ and the geometric series diverges. Strictly speaking we have only analysed the behavior of the partial sums $\sum_{n=1}^{k} b_n$ for $k = 2^m - 1$; but since the terms $b_n$ are non-negative, this is enough to decide the convergence (a > 1) and divergence ($a \le 1$) of the series.
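The dyadic comparison can be checked numerically block by block. The sketch below assumes floating-point arithmetic and uses hypothetical function names; it compares the sum of $1/n^a$ over one dyadic block with the upper bound $2^{(1-a)k}$ and the lower bound $2^{-a} \cdot 2^{(1-a)k}$ described above.

```python
def dyadic_block_sum(a: float, k: int) -> float:
    """Sum of 1/n**a over the dyadic block 2**k <= n < 2**(k + 1)."""
    return sum(1.0 / n ** a for n in range(2 ** k, 2 ** (k + 1)))


def block_bounds(a: float, k: int):
    """Return (lower, upper) bounds for the block sum.

    There are 2**k terms, each at most 1/2**(k*a) and at least 1/2**((k+1)*a).
    """
    upper = 2.0 ** ((1 - a) * k)
    lower = upper / 2.0 ** a
    return lower, upper
```

For a = 1 every block of the harmonic series contributes at least 1/2, so the partial sum up to $2^m - 1$ is at least m/2, which is one way to see the divergence concretely.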

In particular, we have established the divergence of the harmonic series $\sum_{n=1}^{\infty} 1/n$. (As an interesting computer "experiment", try evaluating the sum of the harmonic series on your favorite computer or programmable calculator. You may come up with a finite answer because 1/k is rounded off to zero before the partial sums become very large. This illustrates how slowly the series diverges. It also raises a profound

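$$\sum_{n=1}^{\infty} b_n = \frac{1}{2^0} + \underbrace{\frac{1}{2^a} + \frac{1}{2^a}}_{2^1\ \text{terms}} + \underbrace{\frac{1}{2^{2a}} + \frac{1}{2^{2a}} + \frac{1}{2^{2a}} + \frac{1}{2^{2a}}}_{2^2\ \text{terms}} + \cdots + \underbrace{\frac{1}{2^{ka}} + \cdots + \frac{1}{2^{ka}}}_{2^k\ \text{terms}} + \cdots.$$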

might also be intrigued with the identity $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$, which we will establish in the chapter on Fourier series.

The key idea for deciding the convergence or divergence of the series $\sum_{n=1}^{\infty} 1/n^a$ is to break the sum into dyadic pieces, meaning the range $2^k \le n \le 2^{k+1} - 1$. For n in this range the values of $1/n^a$ vary between $1/2^{ka}$ at the largest to $1/2^{(k+1)a} = 1/2^a \cdot 1/2^{ka}$ at the smallest, so they are roughly the same order of magnitude. For the proof of convergence we use the upper bound, $1/n^a \le 1/2^{ka}$ if $2^k \le n < 2^{k+1}$, and for the proof of divergence we use the lower bound, $1/n^a \ge 1/2^a \cdot 1/2^{ka}$ if $2^k \le n < 2^{k+1}$. By the comparison test we thus need to determine the convergence of the series $\sum_{n=1}^{\infty} b_n$ where $b_n = 1/2^{ka}$ where k is related to n by $2^k \le n < 2^{k+1}$. Thus


Page 275: Strichartz_The Way of Analysis 2000

If $\sum_{n=1}^{\infty} a_n$ is an infinite series, we say $\sum_{n=1}^{\infty} b_n$ is a rearrangement if there is a one-to-one correspondence between the terms; or, put another way, $b_n = a_{m(n)}$ where m is a function from the natural numbers to the natural numbers that is one-to-one and onto. Thus $\sum_{n=1}^{\infty} b_n$ is a series consisting of the same terms as $\sum_{n=1}^{\infty} a_n$ but with a different order.

At first we might think that if $\sum_{n=1}^{\infty} a_n$ is convergent, then $\sum_{n=1}^{\infty} b_n$ should also be convergent with the same limit, but this turns out to be the case only for absolutely convergent series. To understand why this should be so, we consider the intuitive idea that the tail of the series is small. That is, $\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{N} a_n + \sum_{n=N+1}^{\infty} a_n$ and the finite sum $\sum_{n=1}^{N} a_n$ is close to the limit, so $\sum_{n=N+1}^{\infty} a_n$ is small. In the absolutely convergent case $\sum_{n=N+1}^{\infty} |a_n|$ can also be made small, but in the nonabsolutely convergent case $\sum_{n=N+1}^{\infty} |a_n|$ is divergent no matter how large N is. This means $\sum_{n=N+1}^{\infty} a_n$ is small only because of cancellation.

Now consider a partial sum of the rearrangement $\sum_{n=1}^{m} b_n$. If we take m large enough, then all the terms $a_1, a_2, \ldots, a_N$ will show up (here we fix N and m will depend on N) so that

7.2.2 Rearrangements

The series $\sum_{k=1}^{\infty} -1/((2k - 1)2k)$ converges absolutely by comparison with $\sum_{k=1}^{\infty} 1/k^2$, so the partial sums $\sum_{n=1}^{m} (-1)^n/n$ with m even converge and if m is odd we have $\sum_{n=1}^{m} (-1)^n/n = (-1)^m/m + \sum_{n=1}^{m-1} (-1)^n/n$, so the odd partial sums converge to the same limit.
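The pairing of consecutive terms can be verified exactly with rational arithmetic. This is a small sketch with hypothetical function names; the comparison with $-\ln 2$ relies on the standard value of the alternating harmonic series, which the text does not state at this point.

```python
from fractions import Fraction
import math


def alternating_partial_sum(m: int) -> Fraction:
    """Exact partial sum of (-1)**n / n for n = 1, ..., m."""
    return sum(Fraction((-1) ** n, n) for n in range(1, m + 1))


def paired_partial_sum(pairs: int) -> Fraction:
    """Exact partial sum of -1/((2k - 1) * 2k) for k = 1, ..., pairs."""
    return sum(Fraction(-1, (2 * k - 1) * 2 * k) for k in range(1, pairs + 1))
```

The even partial sums of the alternating series agree term for term with the partial sums of the paired series, and numerically they approach $-\ln 2$.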

We will discuss another way to establish the convergence of $\sum_{n=1}^{\infty} (-1)^n/n$ in section 7.2.3.

$$\frac{-1}{2k - 1} + \frac{1}{2k} = \frac{-1}{(2k - 1)2k}.$$

philosophical question: can there be any natural phenomenon whose existence depends on the divergence of the harmonic series?) Why, then, does the series $\sum_{n=1}^{\infty} (-1)^n/n$ converge? One way to see this is to combine consecutive terms, say n = 2k - 1 and n = 2k,


Page 276: Strichartz_The Way of Analysis 2000

a. Let $\sum_{n=1}^{\infty} a_n$ be absolutely convergent. Then any rearrangement is also absolutely convergent and has the same limit.

b. Suppose every rearrangement of $\sum_{n=1}^{\infty} a_n$ is convergent. Then $\sum_{n=1}^{\infty} a_n$ is absolutely convergent.

Proof: a. Given any error 1/m, we have to show that $\sum_{n=1}^{k} b_n$ can be made to differ from $\sum_{n=1}^{\infty} a_n$ by at most 1/m, by taking k large enough. Here $\sum_{n=1}^{\infty} b_n$ denotes a fixed rearrangement. First we choose N large enough so that $\sum_{n=N+1}^{\infty} |a_n| \le 1/m$ and then choose k large enough so that $a_1, a_2, \ldots, a_N$ occur in $b_1, b_2, \ldots, b_k$. Then $\sum_{n=1}^{k} b_n - \sum_{n=1}^{N} a_n$ is a selection of the tail $\sum_{n=N+1}^{\infty} a_n$, hence

$$\left|{\sum_{N+1}}' a_n\right| \le \sum_{n=N+1}^{p} |a_n|$$

for p large enough so that each $a_n$ in the selection of the tail ${\sum_{N+1}}' a_n$ has $n \le p$. This can be made as small as desired by taking N large (hence m depending on N also large), so $\sum_{n=1}^{m} b_n$ differs from $\sum_{n=1}^{N} a_n$ by as little as desired, showing that $\sum_{n=1}^{\infty} b_n$ converges to the same limit. But in the nonabsolutely convergent case we do not know that the selection of the tail ${\sum_{N+1}}' a_n$ is small just because the tail is small. On the contrary, we expect that by an especially nasty selection of the tail we should be able to get something large, since $\sum_{n=N+1}^{\infty} |a_n|$ diverges. For example, in the case that all the terms are real, the divergence of $\sum_{n=N+1}^{\infty} |a_n|$ can only occur if either the sum of all the positive $a_n$'s or the sum of all the negative $a_n$'s diverges. Let us say $a_{n_k} > 0$ for some sequence $n_1 < n_2 < n_3 < \cdots$ and $\sum_{k=1}^{\infty} a_{n_k} = +\infty$. Then if the selection of the tail ${\sum_{N+1}}' a_n$ is made entirely from the values of n equal $n_1, n_2, \ldots$, then the selection of the tail can be made as large as desired! This is the idea that lies behind the next theorem.

Theorem 7.2.4

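$$\sum_{n=1}^{m} b_n = \sum_{n=1}^{N} a_n + {\sum_{N+1}}' a_n$$

where ${\sum_{N+1}}' a_n$ just represents a finite sum of some of the $a_n$ with $n \ge N + 1$, a selection of the tail $\sum_{n=N+1}^{\infty} a_n$. For the absolutely convergent case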


Page 277: Strichartz_The Way of Analysis 2000

We say that $\sum_{n=1}^{\infty} a_n$ converges unconditionally if every rearrangement converges. The theorem we have just proved shows that $\sum_{n=1}^{\infty} a_n$ converges unconditionally if and only if it converges absolutely.

A series is said to converge conditionally if it converges but some rearrangement diverges. We can generalize the previous argument to show that a conditionally convergent series of real numbers can be rearranged so that it converges to any prescribed real number! The

b. Suppose $\sum_{n=1}^{\infty} |a_n|$ diverges. We need to construct a rearrangement of $\sum_{n=1}^{\infty} a_n$ that diverges. Assume first that the terms $a_n$ are real. We can also assume without loss of generality that there exists a sequence $n_1 < n_2 < n_3 < \cdots$ such that $a_{n_k} > 0$ and $\sum_{k=1}^{\infty} a_{n_k} = +\infty$ (if not, then we can find $a_{n_k} < 0$ with $\sum_{k=1}^{\infty} a_{n_k} = -\infty$, and the argument is essentially the same). To simplify the notation let $c_k = a_{n_k}$ and let $d_1, d_2, \ldots$ be the remaining $a_n$'s (they may be finite in number). To describe a rearrangement of $\sum_{n=1}^{\infty} a_n$ we will tell how to pick all the $c_k$'s and $d_k$'s in some order.

Now the idea is that we would like to take $c_1, c_2, \ldots$ for this will clearly give a divergent series. But this is not a rearrangement, because we have omitted the $d_k$'s. So we need to fix this by sprinkling the $d_k$'s very thinly among the $c_k$'s. Look at $d_1$. Wait until $c_1 + c_2 + \cdots + c_{N_1}$ exceeds $|d_1| + 1$, so then $c_1 + c_2 + \cdots + c_{N_1} + d_1 \ge 1$. Next look at $d_2$. Wait until $c_1 + \cdots + c_{N_1} + d_1 + c_{N_1+1} + \cdots + c_{N_2}$ exceeds $|d_2| + 2$, so then $c_1 + \cdots + c_{N_1} + d_1 + c_{N_1+1} + \cdots + c_{N_2} + d_2 \ge 2$. We can continue in this fashion to sprinkle in $d_k$ after $c_{N_k}$ so that the sum up to $d_k$ exceeds k. In this way we obtain a rearrangement that diverges.
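The sprinkling construction can be tried out on a concrete series. The sketch below is an illustration under assumptions made here: it uses the series $1 - 1/2 + 1/3 - \cdots$, so that the $c_k$ are $1, 1/3, 1/5, \ldots$ and the $d_k$ are $-1/2, -1/4, \ldots$; the function name is hypothetical and floating-point arithmetic is used.

```python
from itertools import count


def divergent_rearrangement_checkpoints(rounds: int) -> list:
    """Partial sums of the rearrangement right after d_1, d_2, ..., d_rounds."""
    positives = (1.0 / n for n in count(1, 2))    # c_1, c_2, ... = 1, 1/3, 1/5, ...
    negatives = (-1.0 / n for n in count(2, 2))   # d_1, d_2, ... = -1/2, -1/4, ...
    total = 0.0
    checkpoints = []
    for k in range(1, rounds + 1):
        d = next(negatives)
        # Keep adding c's until the running sum exceeds |d_k| + k ...
        while total <= abs(d) + k:
            total += next(positives)
        # ... and only then sprinkle in d_k, leaving the sum above k.
        total += d
        checkpoints.append(total)
    return checkpoints
```

Each round needs many more positive terms than the last (the count grows roughly exponentially for this series), so only a few rounds are practical, but every term of the original series is eventually used exactly once.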

In the case where the $a_n$ are complex, say $a_n = x_n + iy_n$, the divergence of $\sum_{n=1}^{\infty} |a_n|$ implies that either $\sum_{n=1}^{\infty} |x_n|$ or $\sum_{n=1}^{\infty} |y_n|$ diverges (else $\sum_{n=1}^{\infty} |a_n| \le \sum_{n=1}^{\infty} |x_n| + \sum_{n=1}^{\infty} |y_n|$ by the triangle inequality, hence $\sum_{n=1}^{\infty} |a_n|$ converges by comparison). Then by the previous argument we can rearrange $\sum_{n=1}^{\infty} a_n$ so that either the real or imaginary parts of the series diverge, and this implies the complex series diverges (convergence of a complex series is equivalent to convergence of the real and imaginary parts). QED

so $\sum_{n=1}^{\infty} b_n$ converges to the same limit as $\sum_{n=1}^{\infty} a_n$. The absolute convergence of $\sum_{n=1}^{\infty} b_n$ follows because $\sum_{n=1}^{\infty} |b_n|$ is a rearrangement of $\sum_{n=1}^{\infty} |a_n|$.


Page 278: Strichartz_The Way of Analysis 2000

these correspond to summing by rows or by columns. We could also consider summing along diagonals to get an ordinary infinite series. In general, the fact that one of these procedures yields a finite number does not imply that any of the others will or that even if they are all finite that they must be equal. However, if any of these procedures yields a finite number when applied to $|a_{mn}|$, then we say the double

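$$\begin{matrix} a_{11} & a_{12} & a_{13} & \cdots \\ a_{21} & a_{22} & a_{23} & \cdots \\ a_{31} & a_{32} & a_{33} & \cdots \\ \vdots & \vdots & \vdots & \end{matrix}$$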

Thinking of $a_{mn}$ as an infinite matrix

idea is that now we can divide the terms $a_n$ into those that are non-negative, $c_1, c_2, c_3, \ldots$ and those that are negative, $d_1, d_2, \ldots$, and we must have $\sum_{n=1}^{\infty} c_n = +\infty$ and $\sum_{n=1}^{\infty} d_n = -\infty$ (if both were finite the series would converge absolutely, while if only one were finite the series would diverge). Also, $\lim_{n \to \infty} c_n = 0$ and $\lim_{n \to \infty} d_n = 0$ since the series $\sum_{n=1}^{\infty} a_n$ converges. Now suppose we want to rearrange the series to converge to A. Say $A \ge 0$. Take $c_1, c_2, \ldots, c_{N_1}$ until $c_1 + c_2 + \cdots + c_{N_1} > A$ for the first time. Then take $d_1, \ldots, d_{N_2}$ until $c_1 + \cdots + c_{N_1} + d_1 + \cdots + d_{N_2} < A$ for the first time. Keep switching back and forth between c's and d's to make the partial sums oscillate above and below A. This is always possible since $\sum_{n=1}^{\infty} c_n = +\infty$ and $\sum_{n=1}^{\infty} d_n = -\infty$, and we eventually use all the c's and d's so that we have a genuine rearrangement. Finally the conditions $\lim_{n \to \infty} c_n = 0$ and $\lim_{n \to \infty} d_n = 0$ imply that the limit of the rearranged series is A, because we switch directions just when the partial sums cross the value A. We leave the details as an exercise.
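The back-and-forth procedure is easy to simulate. The following sketch is only an illustration under assumptions introduced here: it works with the series $1 - 1/2 + 1/3 - \cdots$ (so the c's are $1, 1/3, 1/5, \ldots$ and the d's are $-1/2, -1/4, \ldots$), uses floating-point arithmetic, and treats a partial sum exactly equal to the target as "not yet above it".

```python
from itertools import count


def rearranged_partial_sums(target: float, steps: int) -> list:
    """First `steps` partial sums of a rearrangement aimed at `target`."""
    positives = (1.0 / n for n in count(1, 2))    # c_1, c_2, ... = 1, 1/3, 1/5, ...
    negatives = (-1.0 / n for n in count(2, 2))   # d_1, d_2, ... = -1/2, -1/4, ...
    total = 0.0
    sums = []
    for _ in range(steps):
        if total <= target:
            total += next(positives)   # climb until the sum passes the target
        else:
            total += next(negatives)   # then descend until it drops back
        sums.append(total)
    return sums
```

After the first crossing, each partial sum lies within the size of the most recent crossing term of the target, and those terms tend to zero, which is the mechanism the argument in the text relies on.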

Returning to the positive results, we note also that absolute convergence implies the possibility of rearrangement of multiply indexed series. For example, let $a_{mn}$ denote a real or complex number for each natural number n and m. Then we can sum all the $a_{mn}$ in either order,


Page 279: Strichartz_The Way of Analysis 2000

Thus, if the term $A_{m+1}B_m$ goes to zero, the question of convergence and the limit of $\sum_{n=1}^{\infty} A_n b_n$ is the same as the question of convergence

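$$\begin{aligned} & b_1\bigl((A_2 - A_1) + (A_3 - A_2) + \cdots + (A_{m+1} - A_m)\bigr) \\ &\quad + b_2\bigl((A_3 - A_2) + (A_4 - A_3) + \cdots + (A_{m+1} - A_m)\bigr) \\ &\quad + \cdots + b_m(A_{m+1} - A_m) \\ &= b_1(A_{m+1} - A_1) + b_2(A_{m+1} - A_2) + \cdots + b_m(A_{m+1} - A_m) \\ &= -\sum_{n=1}^{m} b_n A_n + A_{m+1}B_m. \end{aligned}$$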

If we collect all the terms that contain $b_1$, then all the terms that contain $b_2$, and so on, we obtain

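$$\begin{aligned} \sum_{n=1}^{m} a_n B_n &= a_1B_1 + a_2B_2 + \cdots + a_mB_m \\ &= (A_2 - A_1)b_1 + (A_3 - A_2)(b_1 + b_2) + (A_4 - A_3)(b_1 + b_2 + b_3) \\ &\quad + \cdots + (A_{m+1} - A_m)(b_1 + b_2 + b_3 + \cdots + b_m). \end{aligned}$$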

In this section we establish a general method of proving convergence of non-absolutely convergent series, called summation by parts, which is an analog of the integration by parts formula. It can be used to give another proof of the convergence of $\sum_{n=1}^{\infty} (-1)^n/n$.

Suppose a series can be written in the form $\sum_{n=1}^{\infty} A_n b_n$ (of course this is always possible in many ways, the idea being that a clever choice of $A_n$ and $b_n$ will be required to get anything out of the method). We then want to do the analog of integrating $b_n$ and differentiating $A_n$. So we let $B_n = \sum_{k=1}^{n} b_k$ and $a_n = A_{n+1} - A_n$. Then

7.2.3 Summation by Parts*

infinite series $\sum\sum a_{mn}$ is absolutely convergent, and we can prove that all three summing procedures yield the same finite number. The idea of the proof is the same as in Theorem 7.2.4. We leave the details as an exercise. An analogous result holds for multiple integrals; we will cover this in Chapter 15.


Page 280: Strichartz_The Way of Analysis 2000

$$\sum_{n=1}^{m} -a_n = -(A_2 - A_1) - (A_3 - A_2) - \cdots - (A_{m+1} - A_m) = A_1 - A_{m+1},$$

which converges as $m \to \infty$ to $A_1$ since $A_m \to 0$. Thus $\sum_{n=1}^{\infty} -a_n$ is an absolutely convergent series. Now if we multiply the terms of an absolutely convergent series by a bounded sequence the resulting series is still absolutely convergent. In this case $|B_n| \le M$, so $\sum_{n=p}^{q} |a_nB_n| \le M \sum_{n=p}^{q} |a_n| \to 0$ as $p, q \to \infty$, proving $\sum_{n=1}^{\infty} a_nB_n$ is absolutely convergent by the Cauchy criterion. Since we also have $\lim_{n \to \infty} A_{n+1}B_n = 0$ since $\lim_{n \to \infty} A_{n+1} = 0$ and $|B_n| \le M$, we conclude that $\sum_{n=1}^{\infty} A_n b_n$ is convergent by summation by parts. QED

Proof: Part a is a special case of part b with $b_n = (-1)^n$, so we prove part b. Note that if we form $a_n = A_{n+1} - A_n$, then $-a_n \ge 0$ since $A_{n+1} \le A_n$ and

Theorem 7.2.5

a. Let $A_1, A_2, \ldots$ be a sequence of positive numbers converging monotonically to zero (so $A_1 \ge A_2 \ge A_3 \ge \cdots$ and $\lim_{n \to \infty} A_n = 0$). Then $\sum_{n=1}^{\infty} (-1)^n A_n$ is convergent.

b. Suppose also $b_1, b_2, \ldots$ is any sequence of real numbers with $B_n = b_1 + \cdots + b_n$ bounded, say $|B_n| \le M$ for all n. Then $\sum_{n=1}^{\infty} A_n b_n$ is convergent.

and the limit of $-\sum_{n=1}^{\infty} a_nB_n$ (note that the absolute convergence of these two series is a different question, because the above manipulations would not work with absolute values). By clever choice of $A_n$ and $b_n$ we may well find it easier to prove convergence of $\sum a_nB_n$. For the series $\sum_{n=1}^{\infty} (-1)^n/n$ we take $A_n = 1/n$ and $b_n = (-1)^n$. Then

$$a_n = \frac{1}{n + 1} - \frac{1}{n} = \frac{-1}{(n + 1)n}$$

and $B_n = 0$ or $-1$ depending on whether n is even or odd. Since $A_{n+1}B_n \to 0$ as $n \to \infty$, we conclude $\sum_{n=1}^{\infty} (-1)^n/n$ is convergent since $\sum_{n=1}^{\infty} a_nB_n$ is absolutely convergent by comparison with $\sum_{n=1}^{\infty} 1/n^2$. Of course this is essentially the same argument as before, but we can now generalize it.
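The summation by parts identity, rearranged as $\sum_{n=1}^{m} A_n b_n = A_{m+1}B_m - \sum_{n=1}^{m} a_nB_n$, can be confirmed exactly for the example $A_n = 1/n$, $b_n = (-1)^n$. This is a minimal sketch using rational arithmetic; the function name and the list-based indexing (position 0 holds the term with index 1) are conventions chosen here.

```python
from fractions import Fraction


def summation_by_parts_sides(A, b):
    """Return both sides of sum A_n b_n = A_{m+1} B_m - sum a_n B_n.

    A holds A_1, ..., A_{m+1} and b holds b_1, ..., b_m.
    """
    m = len(b)
    # Partial sums B_n = b_1 + ... + b_n.
    B, running = [], Fraction(0)
    for term in b:
        running += term
        B.append(running)
    # Differences a_n = A_{n+1} - A_n.
    a = [A[n + 1] - A[n] for n in range(m)]
    lhs = sum(A[n] * b[n] for n in range(m))
    rhs = A[m] * B[m - 1] - sum(a[n] * B[n] for n in range(m))
    return lhs, rhs


def alternating_harmonic_data(m: int):
    """A_n = 1/n for n = 1..m+1 and b_n = (-1)**n for n = 1..m."""
    A = [Fraction(1, n) for n in range(1, m + 2)]
    b = [Fraction((-1) ** n) for n in range(1, m + 1)]
    return A, b
```

With this data the partial sums $B_n$ alternate between $-1$ and $0$, so the boundary term $A_{m+1}B_m$ visibly shrinks as m grows.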


Page 281: Strichartz_The Way of Analysis 2000

7.2.4 Exercises

1. Give an example of two convergent series $\sum_{k=1}^{\infty} x_k$ and $\sum_{k=1}^{\infty} y_k$ such that $\sum_{k=1}^{\infty} x_k y_k$ diverges. Can this happen if one of the series is absolutely convergent?

2. State a contrapositive form of the comparison test that can be used to show divergence of a series.

3. Show that it is not true that for every error 1/m there exists n such that $\left|\sum_{k=1}^{n} r^k - r/(1 - r)\right| < 1/m$ for all r in 0 < r < 1.

4. Prove the ratio test (Theorem 7.2.3a). What does this tell you if $\lim_{n \to \infty} |x_{n+1}/x_n|$ exists?

5. Show $\sum_{n=2}^{\infty} 1/n^a \le \int_1^{\infty} 1/x^a\,dx$, and use this to prove the convergence of the series for a > 1.

6. Prove that every conditionally convergent series of real numbers that does not converge absolutely can be rearranged to have any prescribed real limit.

7. Prove that a series of complex numbers is absolutely convergent if and only if the series of real and imaginary parts are absolutely convergent.

8. Prove that if $\sum_{n=1}^{\infty} |a_{mn}|$ is finite for every m and $\sum_{m=1}^{\infty} \left(\sum_{n=1}^{\infty} |a_{mn}|\right)$ is finite, then $\sum_{m=1}^{\infty} \left(\sum_{n=1}^{\infty} a_{mn}\right) = \sum_{n=1}^{\infty} \left(\sum_{m=1}^{\infty} a_{mn}\right)$.

9. Give an example of a doubly indexed series $a_{mn}$ such that $\sum_{m=1}^{\infty} \left(\sum_{n=1}^{\infty} a_{mn}\right) \ne \sum_{n=1}^{\infty} \left(\sum_{m=1}^{\infty} a_{mn}\right)$.

10. Prove the root test (Theorem 7.2.3b).

11. Suppose $|a_n| \le b_n - b_{n+1}$ where $b_n$ decreases monotonically to zero. Prove that $\sum_{n=1}^{\infty} a_n$ converges absolutely.

12. *Show that if $\sum_{n=1}^{\infty} a_n$ is absolutely convergent, there exists an absolutely convergent series $\sum_{n=1}^{\infty} b_n$ such that $\lim_{n \to \infty} a_n/b_n = 0$. Explain why this result shows that there is no "universal" comparison series for testing absolute convergence.


Page 282: Strichartz_The Way of Analysis 2000

The results discussed in this section are valid for both sequences and series of functions. We will usually state the result for sequences $\{f_n(x)\}$ and leave as an exercise for the reader the formulation of the analogous result for series $\sum_{k=1}^{\infty} g_k$, since this amounts to stating the sequence result for the partial sums $f_n = \sum_{k=1}^{n} g_k$ of the series.

Let $f_1(x), f_2(x), \ldots$ be a sequence of real- or complex-valued functions defined on a common domain D. We say the sequence converges to a function f(x), written $\lim_{n \to \infty} f_n(x) = f(x)$, if for each $x_0$ in the domain the sequence of numbers $f_n(x_0)$ converges to the number $f(x_0)$. This notion of convergence is sometimes called pointwise or simple convergence in order to distinguish it from other notions of convergence we will have to consider. In fact, we will see that pointwise convergence is not always a very useful notion.

Consider, for example, the infinite series $\sum_{n=1}^{\infty} x^n$. From our discussion of the geometric series we recognize that this is a pointwise converging series of functions on the domain -1 < x < 1, and the limit is x/(1 - x). However, we observed that the rate of convergence gets slower as x approaches 1. Thus if we select an error 1/m, we cannot say how many terms N we have to take to make $\sum_{n=1}^{N} x^n$ differ from x/(1 - x) by at most 1/m without first specifying x. In other words, the order of quantifiers in the definition of pointwise convergence is universal-existential (for every x there exists N). If we want the existential-universal form, which is a stronger condition, then we come up with a stronger notion of convergence, called uniform convergence: a sequence $f_n(x)$ of functions on a common domain D is said to converge uniformly to a function f(x) (equivalently, f(x) is said to be the uniform limit of $f_n(x)$) if for every error 1/m there exists N (depending on 1/m) such that for all x in the domain D, $|f_n(x) - f(x)| < 1/m$ if

7.3.1 Uniform Limits and Continuity

7.3 Uniform Convergence

13. Give an example of a divergent series whose partial sums are bounded.

14. *Show that $\sum_{n=1}^{\infty} 1/a_n$ converges, where $a_1, a_2, \ldots$ is the Fibonacci sequence 1, 1, 2, 3, 5, 8, 13, ...


Page 283: Strichartz_The Way of Analysis 2000

so that $f_n(x) = 0$ if $x \le 0$, $f_n(x) = 1$ if $x \ge 1/n$, and $f_n(x)$ is linear in between, $f_n(x) = nx$ if 0 < x < 1/n. Then clearly $\lim_{n \to \infty} f_n(x)$ exists for every x. If $x \le 0$, then $f_n(x) = 0$ for every n, so the limit is zero; while if x > 0, then $f_n(x) = 1$ once n > 1/x, so the limit is one. The limit function (shown in Figure 7.3.2) has a jump discontinuity at x = 0, but the functions $f_n(x)$ are all continuous. Of course the convergence is not uniform: the closer x is to zero the longer it takes for $f_n(x)$ to approach one. Since $f_n(1/2n) = 1/2$ and $f(1/2n) = 1$, we can never make $|f_n(x) - f(x)| < 1/2$ for all x. We will see shortly that this must always be the case: a uniform limit of continuous functions is continuous. This example also shows that compactness of the domain will not suffice to turn convergence into uniform convergence. (It is important not to confuse uniform convergence with uniform continuity, where compactness of the domain does suffice!)
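The failure of uniform convergence in this example can be seen directly by evaluating $f_n$ and the limit function at the point $1/2n$. The code below is a small sketch with hypothetical function names, using floating-point arithmetic.

```python
def f_n(n: int, x: float) -> float:
    """Piecewise linear ramp: 0 for x <= 0, n*x on (0, 1/n), 1 for x >= 1/n."""
    if x <= 0:
        return 0.0
    if x >= 1.0 / n:
        return 1.0
    return n * x


def f_limit(x: float) -> float:
    """Pointwise limit: 0 for x <= 0 and 1 for x > 0."""
    return 1.0 if x > 0 else 0.0


def gap_at_midpoint(n: int) -> float:
    """|f_n(x) - f(x)| at x = 1/(2n), which equals 1/2 for every n."""
    x = 1.0 / (2 * n)
    return abs(f_n(n, x) - f_limit(x))
```

For any fixed x > 0 the values $f_n(x)$ eventually equal 1, yet the gap at $1/2n$ never drops below 1/2, so no single N works for all x at once.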

The notion of uniform convergence can be rephrased in terms of

Figure 7.3.1: [graph of $f_n$, with the point $1/n$ marked on the horizontal axis]

$n \ge N$. The definition of uniform convergence of a series of functions is analogous. In particular, $\sum_{n=1}^{\infty} x^n$ does not converge uniformly on $-1 < x < 1$. On the other hand it does converge uniformly on a smaller domain $-x_0 < x < x_0$ for fixed $x_0 < 1$.
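To see the difference between the two domains in numbers, here is a small sketch (an illustration added under stated assumptions, not the author's computation). It approximates $\sup_x |\sum_{n=1}^{N} x^n - x/(1-x)|$ on $[-x_0, x_0]$ by sampling a uniform grid; the helper names and the grid size are arbitrary choices, and a grid maximum only approximates the true supremum.

```python
def partial_sum(x, N):
    """Partial sum x + x**2 + ... + x**N of the series."""
    return sum(x**n for n in range(1, N + 1))


def limit(x):
    """Sum of the full series for |x| < 1."""
    return x / (1.0 - x)


def sup_error(N, x0, samples=2001):
    """Grid approximation of sup |partial_sum - limit| over [-x0, x0]."""
    xs = [-x0 + 2 * x0 * i / (samples - 1) for i in range(samples)]
    return max(abs(partial_sum(x, N) - limit(x)) for x in xs)


# With x0 = 0.5 twenty terms already give a tiny error everywhere;
# with x0 = 0.99 the same number of terms is nowhere near enough.
print(sup_error(20, 0.5), sup_error(20, 0.99))
```

For fixed $x_0 < 1$ the error is controlled by $x_0^{N+1}/(1-x_0)$, which tends to zero with $N$; as $x_0$ approaches 1 this bound blows up for any fixed $N$, which is the failure of uniformity on the whole interval.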

If a convergent sequence fails to converge uniformly, there may be serious consequences in that the limit function may fail to share properties with the approximating functions. For example, the pointwise limit of continuous functions may not be continuous! Let $f_n(x)$ be the function whose graph is shown in Figure 7.3.1

We leave the details to the reader. Conversely, suppose the Cauchy criterion is satisfied. It follows that

at each point $x$, the numerical sequence $\{f_n(x)\}$ satisfies the Cauchy criterion, hence it must converge to a limit. We then define $f(x)$ to be

$|f_n(x) - f_k(x)| \le |f_n(x) - f(x)| + |f(x) - f_k(x)|.$

Proof: It is easy to show that a uniformly convergent sequence satisfies the Cauchy criterion using the estimate

Theorem 7.3.1 (Cauchy criterion) A sequence of functions $f_n(x)$ converges uniformly to some limit function if and only if given any error $1/m$ there exists $N$ such that $k, n \ge N$ imply $|f_n(x) - f_k(x)| \le 1/m$ for all $x$.

the quantities $\sup_x\{|f_n(x) - f(x)|\}$, which can be thought of as a kind of "distance" between the functions $f_n$ and $f$ (we will return to this interpretation in a later chapter). We claim $f_n \to f$ uniformly if and only if $\lim_{n\to\infty} \sup_x\{|f_n(x) - f(x)|\} = 0$. Note that the limit here is the limit of a sequence of real numbers. Writing out the meaning of $\lim_{n\to\infty} \sup_x\{|f_n(x) - f(x)|\} = 0$ we have: given any error $1/m$ there exists $N$ such that $n \ge N$ implies $\sup_x\{|f_n(x) - f(x)|\} \le 1/m$. Of course $\sup_x\{|f_n(x) - f(x)|\} \le 1/m$ is the same as $|f_n(x) - f(x)| \le 1/m$ for every $x$, so we are back to the definition of uniform convergence.

There is a Cauchy criterion for uniform convergence:

Figure 7.3.2: [graph of the limit function, with a jump at $x = 0$]

Figure 7.3.3: [graphs of $f_n$ and $f$ compared at the points $x$ and $y$]

Proof: The idea of the proof is that since we can make $f_n(x)$ close to $f(x)$ for all points $x$, we can turn questions of continuity about $f$ into questions of continuity about $f_n$. More precisely, let an error $1/m$ be given, and choose $n$ large enough so that $|f_n(x) - f(x)| \le 1/3m$ for all $x$. Then we can compare $f(x)$ with $f(y)$ by first comparing $f(x)$ with $f_n(x)$, then comparing $f_n(x)$ with $f_n(y)$, and finally comparing $f_n(y)$ with $f(y)$, using the triangle inequality, as shown in Figure 7.3.3:

Theorem 7.3.2 Let $f_n$ converge to $f$ uniformly on the domain $D$. If all the $f_n$ are continuous at a point $x_0$ in $D$, then $f$ is also continuous at $x_0$. If all the $f_n$ are continuous on $D$, then $f$ is continuous on $D$. If all the $f_n$ are uniformly continuous on $D$, then $f$ is uniformly continuous on $D$.

We leave it as an exercise to formulate the Cauchy criterion for pointwise convergence. The next theorem makes precise the idea that uniform limits preserve continuity.

this limit. To complete the proof we need to show that the convergence of $f_n(x)$ to $f(x)$ is uniform. To see this, we take the Cauchy criterion as stated and let one of the indices, say $k$, go to infinity. That is, given the error $1/m$, we find $N$ such that $k, n \ge N$ imply $|f_n(x) - f_k(x)| \le 1/m$ for all $x$; then $n \ge N$ implies $|f_n(x) - f(x)| \le 1/m$ for all $x$ since non-strict inequalities are preserved in the limit. But this is exactly the statement of uniform convergence. QED

This theorem is a two-edged sword: on the one hand it shows the usefulness of uniform convergence; but on the other hand it shows that we frequently can expect uniform convergence to fail. For example, when we consider Fourier series, we will want discontinuous functions to have Fourier series. But these cannot converge uniformly because the sines and cosines in the Fourier series are continuous functions, so the theorem says that uniformly convergent Fourier series can only represent continuous functions.

The first and third terms are at most $1/3m$ by the uniform closeness of $f$ and $f_n$, so $|f(x) - f(y)| \le 2/3m + |f_n(x) - f_n(y)|$ for all $x$ and $y$. Note that we still have $1/3m$ to play around with when we want to make $|f(x) - f(y)| \le 1/m$. The value of $n$ and hence the function $f_n$ is now fixed, but the choice of $n$ did depend on the error $1/m$ that we were given.

Consider the continuity at the point $x_0$. After choosing $n$ as above, we can then use the continuity of $f_n$ to assert that there exists a neighborhood $|x - x_0| < 1/N$ of $x_0$ in which $|f_n(x) - f_n(x_0)| \le 1/3m$. This will then make $|f(x) - f(x_0)| \le 2/3m + |f_n(x) - f_n(x_0)| \le 1/m$. This is the continuity of $f$ at $x_0$.

Notice that the order of choice is crucial: given $1/m$ we first choose $n$, then based on the continuity of the particular $f_n$ we choose the neighborhood $|x - x_0| < 1/N$. If we did not have uniform convergence we would not know that a single function $f_n$ could make $|f_n(x) - f(x)|$ small for all $x$. We would only know that for each $x$ some $f_n$ would do this, and we would be unable to use the continuity of $f_n$ to carry out the proof.

The continuity of $f$ on $D$ follows from the continuity of $f_n$ on $D$ because of the pointwise result.

Finally suppose the $f_n$ are uniformly continuous. Then given the error $1/m$ we need to find $1/N$ independent of the point so that $|x - y| < 1/N$ implies $|f(x) - f(y)| < 1/m$. Choosing $n$ as before, we already have $|f(x) - f(y)| \le 2/3m + |f_n(x) - f_n(y)|$. By the uniform continuity of this $f_n$ we can find $1/N$ so that $|x - y| < 1/N$ implies $|f_n(x) - f_n(y)| \le 1/3m$, and this $1/N$ then does the trick. QED

$|f(x) - f(y)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(y)| + |f_n(y) - f(y)|.$

(or more generally we can interchange the limit and integral over any subinterval).

Proof: The idea of the proof is that if $|f_n(x) - f(x)| \le 1/m$, then the Cauchy sums $S(f, P)$ and $S(f_n, P)$ evaluated at the same points can differ by at most $(b - a)/m$. Thus suppose the error $1/m$ is given. We want to show that the Cauchy sums $S(f, P)$ can be made to differ from each other by at most $1/m$ by making $P$ sufficiently fine, since this will prove the integrability of $f$. Thus choose $n$ large enough (say $n \ge N$) so that $|f_n(x) - f(x)| \le \frac{1}{3m(b - a)}$. Then

$|S(f_n, P) - S(f, P)| = \left|\sum_j (f_n(y_j) - f(y_j))(x_{j+1} - x_j)\right| \le \sum_j |f_n(y_j) - f(y_j)|(x_{j+1} - x_j) \le \frac{1}{3m(b - a)} \sum_j (x_{j+1} - x_j) = \frac{1}{3m}$

7.3.2 Integration and Differentiation of Limits

Uniform convergence can also be used to establish integrability and differentiability of limits of functions and the interchange of the operation and the limit. We begin with integration because it is simpler, and we will need it in the discussion of differentiation.

Theorem 7.3.3 Let $f_n$ converge uniformly to $f$ on a finite interval $[a, b]$. If all the $f_n$ are Riemann integrable on $[a, b]$, then so is $f$ and

$\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx$

The theorem just proved is notorious because Cauchy got it wrong: he claimed to prove that the limit of continuous functions is continuous. At least part of the reason why he went wrong was that he couched his proof in the language of infinitesimals. For an interesting discussion of this see Appendix 1 to Proofs and Refutations by Imre Lakatos, Cambridge University Press, 1976.

Uniform convergence depends on the domain of the functions. Frequently we encounter the situation of a sequence of functions $f_n(x)$ that converges pointwise but not uniformly on an open domain $D$, but the restrictions of $f_n$ to all compact subsets converge uniformly. In such a case we say that $f_n$ converges uniformly on compact sets. For example, $\sum_{n=1}^{\infty} x^n$ converges uniformly on the domain $|x| \le 1 - \epsilon$ for any $\epsilon > 0$.

There are a few subtle points about the above proof that are worth observing. First note that in the proof that $f$ is integrable we did not use the full strength of the uniform limit; we only needed $|f_n(x) - f(x)| \le 1/m$ for one particular value of $n$ for each $1/m$. For the proof of the limit and integral interchange we did need to use this estimate holding for all $n \ge N$. Here we had to introduce a partition $P$ to

$|S(f, P) - S(f, P')| \le |S(f, P) - S(f_n, P)| + |S(f_n, P) - S(f_n, P')| + |S(f_n, P') - S(f, P')| \le \frac{2}{3m} + |S(f_n, P) - S(f_n, P')|.$

Now that $n$ is fixed, we know from the integrability of $f_n$ that $S(f_n, P)$ converges to $\int_a^b f_n(x)\,dx$ as the maximum interval length of the partition goes to zero. Therefore by taking the maximum interval length for $P$ and $P'$ sufficiently small we can make $|S(f_n, P) - S(f_n, P')| \le 1/3m$, hence $|S(f, P) - S(f, P')| \le 1/m$. This proves the integrability of $f$.

Finally we need to show that $\lim_{n\to\infty} \int_a^b f_n(x)\,dx$ exists and equals $\int_a^b f(x)\,dx$. From the integrability of $f$ we know $|S(f, P) - \int_a^b f(x)\,dx| \le 1/3m$ if the partition $P$ is sufficiently fine. If we choose $n$ as before, then $|S(f_n, P) - S(f, P)| \le 1/3m$, so $|S(f_n, P) - \int_a^b f(x)\,dx| \le 2/3m$. Note that for this to hold we only have to choose $n$ large enough ($n \ge N$) so that $|f_n(x) - f(x)| \le \frac{1}{3m(b - a)}$ and the partition $P$ sufficiently fine. These conditions on $n$ and $P$ depend on $1/m$ but are independent of each other. Now for any fixed $n \ge N$, we can also require that the partition $P$ be sufficiently fine so that $|S(f_n, P) - \int_a^b f_n(x)\,dx| \le 1/3m$. Here the partition $P$ does depend on $n$, but when we combine this with the previous estimate $|S(f_n, P) - \int_a^b f(x)\,dx| \le 2/3m$ we obtain simply $|\int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx| \le 1/m$, so the partition $P$ used for making the comparison drops out of the picture. Since this holds for every $n \ge N$, with $N$ depending on $1/m$, we have the result $\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx$ as desired. QED
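The key estimate in this proof, that Cauchy sums of $f_n$ and $f$ over the same partition and evaluation points differ by at most $(b - a)\sup|f_n - f|$, can be illustrated numerically. The sketch below is an added illustration; the sequence $f_n(x) = x + \sin(nx)/n$ on $[0, 1]$ is a hypothetical choice (it converges uniformly to $f(x) = x$ with $\sup|f_n - f| \le 1/n$), and midpoints are used as the evaluation points $y_j$.

```python
import math


def cauchy_sum(g, a, b, pieces):
    """Cauchy sum of g over a uniform partition of [a, b], using midpoints."""
    h = (b - a) / pieces
    return sum(g(a + (j + 0.5) * h) for j in range(pieces)) * h


def f_n(n):
    """Hypothetical example sequence: x + sin(n x)/n, within 1/n of f."""
    return lambda x: x + math.sin(n * x) / n


def f(x):
    """Uniform limit of the sequence above."""
    return x


# The gap between the two Cauchy sums is bounded by (b - a) * (1/n) = 1/n.
for n in (1, 10, 100):
    gap = abs(cauchy_sum(f_n(n), 0.0, 1.0, 1000) - cauchy_sum(f, 0.0, 1.0, 1000))
    print(n, gap)
```

The bound holds for every partition at once, which is what lets the partition drop out of the final comparison of the integrals.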

for any partition $P$, where $y_j$ denotes an arbitrary point in the subinterval $[x_j, x_{j+1}]$ that is the same for both Cauchy sums (we could also get the same estimate comparing Riemann upper and lower sums). Thus if $P'$ is any other partition,

that we sought for all $n \ge N$.

This theorem may seem perfectly reasonable, but it turns out not to be as useful as one might like because there are many examples of non-uniform limits of functions (especially in Fourier series) where one wants to interchange the limit and integral. Here is a particularly vexing example. Let $r_1, r_2, \ldots$ be an enumeration of the rational numbers in the interval $[0, 1]$, and let $f_n$ be defined on $[0, 1]$ by $f_n(x) = 1$ if $x = r_1, r_2, \ldots, r_n$ and by $f_n(x) = 0$ otherwise. Clearly $f_n(x)$ is converging pointwise (but not uniformly) to Dirichlet's function $f(x) = 1$ if $x$ is rational and $f(x) = 0$ if $x$ is irrational. Now each $f_n$ is zero except on a finite set of points, so $f_n$ is integrable and $\int_0^1 f_n(x)\,dx = 0$. Thus $\lim_{n\to\infty} \int_0^1 f_n(x)\,dx$ exists and equals zero, so we would expect $\int_0^1 f(x)\,dx = 0$. But $f(x)$ is not integrable and $\int_0^1 f(x)\,dx$ is not defined. It is exactly this kind of failure of the Riemann integral to behave properly in limits that are not uniform that will motivate us to study a more general notion of integration, the Lebesgue theory, in a later chapter.

Next we consider the problem of interchanging limits and derivatives. First we note that the limit of differentiable functions need not be differentiable, even if the limit is uniform. The reason for this is that the uniform closeness of two functions, $|f_n(x) - f(x)| \le 1/m$, does not imply anything about the relative smoothness of the graphs. The function $f(x)$ can be perfectly smooth, say $f(x) \equiv 0$, while $f_n(x)$ can have lots of bumps and wiggles, as in Figure 7.3.4. Perhaps the simplest example is the approximation of $f(x) = |x|$ by smooth functions obtained by rounding out the corner, as shown in Figure 7.3.5.
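A concrete instance of "bumps and wiggles" can be computed directly. The sketch below is an added illustration, and the family $f_n(x) = \sin(n^2 x)/n$ is a hypothetical choice, not one taken from the text. These functions satisfy $|f_n(x)| \le 1/n$, so they converge uniformly to $f \equiv 0$, while the derivatives $f_n'(x) = n\cos(n^2 x)$ take the value $n$ at $x = 0$ and so do not converge to $f'(0) = 0$.

```python
import math


def f_n(n, x):
    """Hypothetical wiggly approximation to the zero function."""
    return math.sin(n * n * x) / n


def df_n(n, x):
    """Exact derivative of f_n with respect to x."""
    return n * math.cos(n * n * x)


def sup_abs(g, a=-1.0, b=1.0, samples=10001):
    """Grid approximation of sup |g| over [a, b]."""
    return max(abs(g(a + (b - a) * i / (samples - 1))) for i in range(samples))


for n in (5, 50, 200):
    # sup |f_n| shrinks like 1/n while the slope at 0 grows like n.
    print(n, sup_abs(lambda x: f_n(n, x)), df_n(n, 0.0))
```

This is why a positive result about derivatives needs a hypothesis on the sequence of derivatives itself.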

To get a positive result we need to assume the uniform convergence of the derivatives.

$\left|\int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx\right| \le \frac{1}{m}$

make the comparisons $S(f, P)$ to $\int_a^b f(x)\,dx$, $S(f_n, P)$ to $S(f, P)$, and $S(f_n, P)$ to $\int_a^b f_n(x)\,dx$. The particular partition was restricted by the maximum length of subintervals in two ways, once depending on $f$ and once depending on $f_n$. It is not true that one particular partition $P$ can work for all $n \ge N$. Nevertheless, we still obtained the estimate

for all small $h$. We therefore seek an indirect method via the fundamental theorem of the calculus. This is frequently a good method for gaining information about derivatives.

$\frac{f(x + h) - f(x)}{h} - \frac{f_n(x + h) - f_n(x)}{h}$

Proof: A direct approach to the proof will not work, since there is no obvious way to estimate the difference of the difference quotients for $f$ and $f_n$,

Theorem 7.3.4 Let $f_n$ be defined on $(a, b)$ and be $C^1$. If $f_n(x) \to f(x)$ pointwise and $f_n'$ converges uniformly to $g(x)$, then $f$ is $C^1$ and $f' = g$.

Figure 7.3.5:

Figure 7.3.4:

7.3.3 Unrestricted Convergence*

We conclude this section with another way of looking at uniform convergence, at least for sequences of continuous functions on a compact interval. Since we are dealing with sequences of functions, it makes sense to vary the point as well as the function and to ask if $f_n(x_n)$ converges to $f(x)$ whenever $x_n$ converges to $x$.

Theorem 7.3.5 Let $f_n(x)$ be a sequence of continuous functions on a compact domain $D$. Then $f_n(x_n)$ converges to $f(x)$ for all sequences

There are a number of ways to improve this result. It suffices to have $f_n'$ converge uniformly on every subinterval $(a + 1/m, b - 1/m)$ because then we can apply the argument on each subinterval and the conclusion $f' = g$ then holds on $(a, b)$. It suffices to assume $f_n(x)$ converges at just one point $x = x_0$, for then $f_n(x) = f_n(x_0) + \int_{x_0}^{x} f_n'(t)\,dt$ implies $f_n(x)$ converges at every point since the right side converges. In fact, the convergence is uniform if the interval is bounded. We cannot, however, conclude that $f_n(x)$ converges merely from the uniform convergence of $f_n'(x)$, as the example $f_n(x) \equiv n$ shows. Also, the pointwise convergence of $f_n'$ and $f_n$ will not guarantee the differentiability of $f$. For a counterexample we need only integrate the functions in the example of a discontinuous limit of continuous functions.

It is also possible to show that the theorem is valid for differentiability in place of continuous differentiability. But the proof is trickier (using the mean value theorem in place of the fundamental theorem), and the result is not as useful.

which holds because $f_n$ is $C^1$. We now take the limit as $n \to \infty$, to obtain $f(x) = f(x_0) + \int_{x_0}^{x} g(t)\,dt$, where we have used the previous theorem to obtain the limit of the integral. Since we know $g$ is continuous (it is the uniform limit of continuous functions), the differentiation of the integral theorem tells us that $f$ is $C^1$ and $f'(x) = g(x)$ (the term $f(x_0)$ is just a constant whose derivative is zero). QED

$f_n(x) = f_n(x_0) + \int_{x_0}^{x} f_n'(t)\,dt,$

We fix a point $x_0$ in the interval and write the integration of the derivative theorem

The trouble with the first estimate is that we would need to invoke the continuity of $f_n$ to estimate $|f_n(x_n) - f_n(x)|$, and this would require making $|x_n - x|$ small depending on $f_n$. Since we cannot vary the $n$ separately for $x_n$ and $f_n$, we would be in trouble. Thus we work with the second inequality.

Here we need to use the uniform convergence to estimate $|f_n(x_n) - f(x_n)|$ and the continuity of $f$ to estimate $|f(x_n) - f(x)|$. Given any error $1/m$ we first use the uniform convergence to find $N$ such that $n \ge N$ implies $|f_n(y) - f(y)| \le 1/2m$ for any $y$ and, hence, in particular for $y = x_n$. Thus $|f_n(x_n) - f(x_n)| \le 1/2m$ for all $n \ge N$. Next by the continuity of $f$, a consequence of the fact that $f$ is the uniform limit of continuous functions, there exists $1/k$ such that $|y - x| \le 1/k$ implies $|f(y) - f(x)| \le 1/2m$ (here $x$ is fixed and $y$ is variable). In particular setting $y = x_n$ we have $|x_n - x| \le 1/k$ implies $|f(x_n) - f(x)| \le 1/2m$. Summing the two estimates we obtain $|f_n(x_n) - f(x)| \le 1/m$, the estimate we want, under the two conditions $n \ge N$ and $|x_n - x| \le 1/k$. But since $x_n \to x$ by assumption, we can make $|x_n - x| \le 1/k$ by taking $n$ large enough (say $n \ge N'$), so the conclusion $|f_n(x_n) - f(x)| \le 1/m$ follows by taking $n$ larger than both $N$ and $N'$. This is the statement $\lim_{n\to\infty} f_n(x_n) = f(x)$.

Conversely, assume $f_n(x_n) \to f(x)$ if $x_n \to x$. To show $f_n$ converges uniformly to $f$ we consider the sets $A_{m,N} = \{x \text{ in } D \text{ such that } |f_n(x) - f(x)| \le 1/m \text{ for all } n \ge N\}$. The condition for uniform convergence that we want to prove is that for all $m$ there exists $N$ such that $A_{m,N} = D$. Ordinary pointwise convergence would say that for

$|f_n(x_n) - f(x)| \le |f_n(x_n) - f_n(x)| + |f_n(x) - f(x)|$

or

$|f_n(x_n) - f(x)| \le |f_n(x_n) - f(x_n)| + |f(x_n) - f(x)|.$

Proof: Suppose first $f_n \to f$ uniformly and $x_n \to x$ in $D$ (this half of the proof does not require the compactness of $D$). We need to compare $f_n(x_n)$ with $f(x)$. Following the method of proof used in Theorem 7.3.2, there are two intermediate values, $f_n(x)$ and $f(x_n)$, that we might consider in making the comparison,

$x_n$ in $D$ convergent to $x$ (for all $x$ in $D$) if and only if $f_n$ converges uniformly to $f$.

4. Prove that a sequence of complex-valued functions converges uniformly if and only if the sequences of real and imaginary parts converge uniformly.

3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function.

2. Suppose $f_n \to f$ and the functions $f_n$ all satisfy the Lipschitz condition $|f_n(x) - f_n(y)| \le M|x - y|$ for some constant $M$ independent of $n$. Prove that $f$ also satisfies the same Lipschitz condition.

1. State and prove a Cauchy criterion for pointwise convergence.

7.3.4 Exercises

all $x$ and all $m$ there exists $N$ such that $x$ is in $A_{m,N}$. But we are assuming more than pointwise convergence, and we claim the following stronger conclusion: for each $x$ in $D$ and every $m$, there exists $N$ such that $A_{m,N}$ contains a neighborhood of $x$.

We prove this claim by contradiction. If it were not true, then there would exist a fixed $x$ and $m$ and a sequence of points $\{x_N\}$ converging to $x$ such that each $x_N$ is not in $A_{m,N}$. But $x_N$ not in $A_{m,N}$ means there exists $k(N) \ge N$ such that $|f_{k(N)}(x_N) - f(x)| > 1/m$. By inserting some terms equal to $x$ into the sequence $\{x_N\}$ we can obtain a new sequence $\{y_k\}$ converging to $x$ such that $|f_k(y_k) - f(x)| > 1/m$ for infinitely many $k$. Specifically, we take $y_k = x_N$ if $k = k(N)$, choosing the smallest $N$ if there is more than one choice, and $y_k = x$ otherwise. But this contradicts the hypothesis $\lim_{k\to\infty} f_k(y_k) = f(x)$, proving the claim.

To complete the proof we apply the Heine-Borel theorem. Fix $m$. For each $x$ in $D$ there exists $N$ and a neighborhood $(x - 1/n, x + 1/n)$ of $x$ contained in $A_{m,N}$. This set of neighborhoods is an open cover of $D$, which is compact, so there exists a finite subcover. Thus $D$ is a finite union of sets of the form $A_{m,N}$ for $m$ fixed. But the sets $A_{m,N}$ increase with $N$, so the finite union is just $A_{m,N}$ for the largest value of $N$. Since $D = A_{m,N}$ for some $N$ holds for every $m$, we have proved uniform convergence. QED

14. Define a linear spline to be a continuous function on a compact interval that is piecewise linear (equal to an affine function on each

a. Prove that every continuous function on a compact interval is a uniform limit of step functions.

b. Prove that a uniform limit of step functions (on a compact interval) is Riemann integrable.

13. Define a step function to be a function that is piecewise constant, $f(x) = \sum_{j=1}^{n} c_j \chi_{[a_j, b_j)}$ where $[a_j, b_j)$ are disjoint intervals. ($\chi$ denotes the characteristic function of the interval.)

12. If $f_n \to f$ uniformly on $[a, b]$, prove that $F_n \to F$ uniformly on $[a, b]$ where $F_n(x) = \int_a^x f_n(t)\,dt$. Is the same true on the whole line?

9. If $f_n \to f$ uniformly and $\lim_{x\to x_0} f_n(x)$ exists for every $n$, then $\lim_{x\to x_0} f(x)$ exists and equals $\lim_{n\to\infty} \lim_{x\to x_0} f_n(x)$ (note: we are not assuming continuity of $f_n$ or $f$).

10. If $f_n \to f$ uniformly and the functions $f_n$ have only jump discontinuities, prove that $f$ has only jump discontinuities.

11. *Give an example of a sequence of continuous functions $f_n$ on $[0, 1]$ that converge pointwise to zero but such that $\lim_{n\to\infty} \int_0^1 f_n(x)\,dx$ is not zero.

7. If $|f_n(x)| \le a_n$ for all $x$ and $\sum_{n=1}^{\infty} a_n$ converges, prove that $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly.

8. Give an example of a sequence of continuous functions on a non-compact domain $D$ that does not converge uniformly, yet $\lim_{n\to\infty} f_n(x_n) = f(x)$ for every sequence $\{x_n\}$ converging to $x$ in $D$.

6. Give an example of a sequence of continuous functions converging pointwise to a function with a discontinuity of the second kind.

5. If $\lim_{n\to\infty} f_n = f$ and the functions $f_n$ are all monotone increasing, must $f$ be monotone increasing? What happens if $f_n$ are all strictly increasing?

The first question we need to answer is the question of convergence: given a specific power series $\sum_{n=0}^{\infty} a_n (x - x_0)^n$, that is, given $x_0$ and $a_0, a_1, a_2, \ldots$, for which values of $x$ does it converge? Note that the answer depends only on $x - x_0$, so if we can answer the question for $x_0 = 0$ we

"There was a little girl
Who had a little curl
Right in the middle of her forehead
When she was good
She was very, very good
But when she was bad she was horrid."

7.4.1 The Radius of Convergence

An important class of infinite series of functions is the class of power series $\sum_{n=0}^{\infty} a_n (x - x_0)^n$ where $x_0$ is fixed (we refer to $x_0$ as the point about which the power series is expanded) and $a_n$ are real (or complex) coefficients. By definition $(x - x_0)^0 = 1$, so the first term is merely the constant function $a_0$. Note that the partial sums of the power series are polynomials; $\sum_{n=0}^{N} a_n (x - x_0)^n$ is a polynomial of degree $N$. In many ways power series can be thought of as polynomials of infinite order. In fact, the theory of power series was developed during the eighteenth century in just such a spirit, with an unfortunate disregard for the question of convergence. We are in a position now to develop the basic properties of power series with complete rigor. Before beginning it is only fair to warn you that power series are very atypical of series of functions in general, so you should not expect the insights derived from the study of power series to extend to other classes of series. This will be especially important in a later chapter when we discuss Fourier series.

In discussing power series it is good to recall a nursery rhyme:

7.4 Power Series

subinterval in a finite partition). Prove that every continuous function (on a compact interval) is the uniform limit of linear splines.

can answer the question in general. In what follows we will frequently deal with the case $x_0 = 0$ only, in order to simplify notation.

The convergence of $\sum_{n=0}^{\infty} a_n x^n$ at a point $x$ depends on the coefficients $a_n$. We can always arrange for $a_n x^n$ to be unbounded (if $x \ne 0$), and in fact by taking $a_n = n!$ we can make $a_n x^n$ unbounded for every $x \ne 0$, so the power series need not converge except at $x = 0$. However, if the power series does converge at some point $x \ne 0$, this has implications about convergence at other points. The idea is extremely simple. If $\sum_{n=0}^{\infty} a_n x^n$ converges at some point $x = x_1$, then the terms $a_n x_1^n$ must certainly be bounded. So there must exist a constant $M$ such that $|a_n x_1^n| \le M$ for all $n$. We are not saying that this condition is sufficient for the convergence; only that it is necessary. But it turns out to be sufficient for the convergence of the series for any $x$ with $|x| < |x_1|$. In fact, if we rewrite the condition as $|a_n| \le M|x_1|^{-n}$, then we have $|a_n x^n| \le M|x/x_1|^n$ and so $\sum_{n=0}^{\infty} a_n x^n$ converges by comparison with $\sum_{n=0}^{\infty} M r^n$ for $r = |x/x_1| < 1$ if $|x| < |x_1|$. In fact the convergence is absolute, and it is uniform in any interval $|x| \le R$ for fixed $R < |x_1|$, because we can compare with $\sum_{n=0}^{\infty} M (R/|x_1|)^n$ for all $x$, $|x| \le R$. This implies that the power series represents a continuous function on $|x| < |x_1|$ since it is the uniform limit on compact subsets of the partial sums, which are continuous because they are polynomials. We will see shortly that the behavior of power series is much better than continuous.

This simple observation leads immediately to the definition of the radius of convergence of the power series $\sum_{n=0}^{\infty} a_n x^n$ as the unique number $R$ such that the series converges for $|x| < R$ and diverges for $|x| > R$. We allow $R = +\infty$ if the series converges for all $x$ and $R = 0$ if the series converges only for $x = 0$. If the series diverges for some value of $x$, then $R = \sup\{x : \text{series converges}\}$. At the values $x = \pm R$ the series may either converge or diverge. However, if $|x| > R$, then the series must diverge; in fact, the terms must be unbounded, since otherwise by our observation we would have convergence for all values in $(-|x|, |x|)$, contradicting the definition of $R$. Thus the region $|x| < R$ is where the series is very, very good, and the region $|x| > R$ is where it is horrid. Again we must emphasize that this property is special to power series.

Here are some simple examples that illustrate the variety of behaviors of power series at the radius of convergence. The geometric series

Lemma 7.4.1 $\lim_{n\to\infty} M^{1/n} = 1$ for any $M > 0$.

since $\lim_{n\to\infty} M^{1/n} = 1$ for any $M > 0$ (we will give a proof of this as a separate lemma below). This shows $\limsup_{n\to\infty} |a_n|^{1/n} \le 1/|r|$ for any value of $r$ with $|r| < R$, so $\limsup_{n\to\infty} |a_n|^{1/n} \le 1/R$.

For the reverse inequality write $\limsup_{n\to\infty} |a_n|^{1/n} = 1/R_0$ (allowing $R_0 = 0$ or $\infty$). We need to show that $\sum_{n=0}^{\infty} a_n r^n$ converges if $|r| < R_0$, for that will imply $R \ge R_0$. Now for any fixed $r$ with $|r| < R_0$ we can find $R_1$ satisfying $|r| < R_1 < R_0$ (if $R_0 = 0$ there is nothing to prove). Then $\limsup_{n\to\infty} |a_n|^{1/n} < 1/R_1$, so that for $k$ large enough, $|a_n|^{1/n} \le 1/R_1$ for all $n \ge k$ (this is a consequence of the definition $\limsup_{n\to\infty} |a_n|^{1/n} = \lim_{k\to\infty} \sup_{n \ge k} |a_n|^{1/n}$). This means $|a_n| \le 1/R_1^n$ and, hence, $|a_n r^n| \le |r/R_1|^n$ for all $n \ge k$, which proves the convergence of $\sum a_n r^n$ by comparison with the geometric series since $|r/R_1| < 1$. One can also prove the convergence of $\sum a_n r^n$ by appealing directly to the root test; the above argument essentially incorporates the reasoning used to derive the root test. QED


Theorem 7.4.1 The radius of convergence $R$ of a power series $\sum_{n=0}^{\infty} a_n x^n$ is given by $1/R = \limsup_{n\to\infty} \sqrt[n]{|a_n|}$.

Proof: First we will establish the inequality $\limsup_{n\to\infty} |a_n|^{1/n} \le 1/R$. If $R = 0$ there is nothing to prove. If $R > 0$, choose any $r < R$. Then the terms $a_n x^n$ are bounded for $x = r$, $|a_n r^n| \le M$. This means $|a_n|^{1/n} \le M^{1/n}/|r|$ and so

$\limsup_{n\to\infty} |a_n|^{1/n} \le (1/|r|) \limsup_{n\to\infty} M^{1/n} = 1/|r|$

$\sum_{n=0}^{\infty} x^n$ converges for $|x| < 1$ and diverges for $|x| > 1$, so the radius of convergence is 1. At $x = \pm 1$ the terms $(\pm 1)^n$ are bounded but the series diverges. The series $\sum_{n=0}^{\infty} n x^n$ also has radius of convergence 1 but the terms are unbounded at $x = \pm 1$. The series $\sum_{n=1}^{\infty} (1/n) x^n$ has radius of convergence 1, and at $x = 1$ it diverges while at $x = -1$ it converges, since $\sum_{n=1}^{\infty} (-1)^n/n$ converges, although not absolutely. Finally the series $\sum_{n=1}^{\infty} (1/n^2) x^n$ converges at both endpoints $x = \pm 1$. We will verify the computations of the radius of convergence for these series after the next theorem.

The root test for convergence gives a formula for the radius of convergence.

Next we look at some examples. In these, as is usually the case, the limit of $|a_n|^{1/n}$ will exist, so it is not necessary to invoke the limsup. First consider the case when the coefficients $a_n$ are given by a rational function of $n$, $a_n = p(n)/q(n)$ where $p$ and $q$ are polynomials (perhaps modified at the finite number of zeroes of $q$). We claim the radius of convergence is 1. To see this it suffices to show $\lim_{n\to\infty} |p(n)|^{1/n} = 1$ for any non-zero polynomial. In the lemma we have already established this for $p(n) = M$, the constant polynomials. Since it is easy to show $M_0 n^k \le |p(n)| \le M n^k$ for all large $n$ and some values of $M_0$, $M$ and $k$ (depending on the polynomial), it suffices to show $\lim_{n\to\infty} n^{1/n} = 1$. Then $\lim_{n\to\infty} (M n^k)^{1/n} = (\lim_{n\to\infty} M^{1/n})(\lim_{n\to\infty} n^{1/n})^k = 1$ by the properties of limits ($f(x) = x^k$ is continuous, so we can interchange it with the limit). Finally we establish $\lim_{n\to\infty} n^{1/n} = 1$ without logarithms using the binomial theorem. Since $(1 + 1/m)^n = 1 + n/m + \cdots + (1/m)^n$, we have $n/m \le (1 + 1/m)^n$, so $n^{1/n} \le m^{1/n}(1 + 1/m)$. Keeping $m$ fixed and letting $n \to \infty$ we obtain $\limsup_{n\to\infty} n^{1/n} \le (1 + 1/m)$

$1 + \frac{1}{m} \ge \left(1 + \frac{n}{m}\right)^{1/n}$

since the function $f(x) = x^n$ preserves the order relation. Now if we fix $M > 1$, then given any error $1/m$ we can find $k$ such that $M \le 1 + k/m$, so $M \le 1 + n/m$ for all $n \ge k$. Altogether $M^{1/n} \le (1 + n/m)^{1/n} \le 1 + 1/m$; and since the inequality $1 \le M^{1/n}$ is an immediate consequence of $M > 1$, we have $|M^{1/n} - 1| \le 1/m$ for all $n \ge k$, proving $\lim_{n\to\infty} M^{1/n} = 1$. QED

hence

All the terms are positive, so we have $(1 + 1/m)^n \ge 1 + n/m$,

Proof: It suffices to do this for $M > 1$ since $\lim_{n\to\infty} (1/M)^{1/n} = (\lim_{n\to\infty} M^{1/n})^{-1}$. One can give a quick proof using logarithms but since we have not discussed the properties of logarithms, we will give a longer proof using the binomial theorem. We use the identity $(1 + 1/m)^n = 1 + n/m + \cdots + (1/m)^n$.

But $\lim_{n\to\infty} (n + 1)^{1/n} = 1$ by the arguments already given, while $\limsup_{n\to\infty} |a_n|^{1/(n-1)} = \limsup_{n\to\infty} |a_n|^{1/n}$ (because $|a_n|^{1/(n-1)} = (|a_n|^{1/n})^{n/(n-1)}$ and $n/(n - 1) \to 1$ as $n \to \infty$). One can also derive directly that the boundedness of the terms $a_n x^n$ for fixed $x = r$ implies the convergence of $\sum n a_n x^{n-1}$ for $|x| < r$ because the geometric decrease of $|x/r|^n$ swamps the growth of the factor $n$.

We can also verify that the derived power series $\sum (n + 1) a_{n+1} x^n$ has the same radius of convergence as the original power series. Note that

$\sum_{n=0}^{\infty} n a_n x^{n-1} = \sum_{n=0}^{\infty} (n + 1) a_{n+1} x^n.$

since we already have shown $\lim_{n\to\infty} m^{1/n} = 1$. Since this is true for any value of $m$, $\limsup_{n\to\infty} n^{1/n} \le 1$ and the obvious inequality $n^{1/n} \ge 1$ implies $\lim_{n\to\infty} n^{1/n} = 1$.

To get an example of a power series with radius of convergence $+\infty$ or 0 we must do something more dramatic (to get a finite positive value for $R$ we can merely take $\sum R^{-n} x^n$). The most important example is the exponential power series $\sum_{n=0}^{\infty} (1/n!) x^n$. To see this has radius of convergence $+\infty$ we need to show $\lim_{n\to\infty} 1/(n!)^{1/n} = 0$ or, equivalently, $\lim_{n\to\infty} (n!)^{1/n} = +\infty$. This will also show that the series $\sum_{n=0}^{\infty} n!\,x^n$ has radius of convergence 0. Now $\lim_{n\to\infty} (n!)^{1/n} = +\infty$ is plausible because $n! = 1 \cdot 2 \cdots n$, a product of $n$ factors, so that $(n!)^{1/n}$ is a kind of average (the geometric average) of the numbers from 1 to $n$, and this average should not remain bounded as $n \to \infty$. In fact, for any fixed $m$, once $n \ge m$ we can estimate $n! \ge m^{n-m}$ simply by ignoring the first $m$ factors and noting that the last $n - m$ factors, $m + 1, m + 2, \ldots, n$, are all greater than $m$. Thus $(n!)^{1/n} \ge m^{1 - m/n}$ and so $\liminf_{n\to\infty} (n!)^{1/n} \ge m \lim_{n\to\infty} (m^{-m})^{1/n} = m$ by our previous result $\lim_{n\to\infty} M^{1/n} = 1$. Since $m$ is arbitrary, we have $\lim_{n\to\infty} (n!)^{1/n} = +\infty$.
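The limits used in these examples can be watched numerically. The following sketch is an added illustration under stated assumptions: the helper `root_of_abs` and the dictionary `examples` are names introduced here, and each coefficient sequence is supplied through $\log|a_n|$ (using `math.lgamma(n + 1)` for $\log n!$) so that large factorials do not overflow. A single large $n$ only suggests the limit; it does not prove it.

```python
import math


def root_of_abs(log_abs_a, n):
    """Return |a_n|**(1/n), computed from log|a_n| to avoid overflow."""
    return math.exp(log_abs_a(n) / n)


# Each entry maps n to log|a_n| for the coefficient sequence named in the key.
examples = {
    "a_n = 1": lambda n: 0.0,
    "a_n = n": lambda n: math.log(n),
    "a_n = 1/n**2": lambda n: -2.0 * math.log(n),
    "a_n = 1/n!": lambda n: -math.lgamma(n + 1),
    "a_n = n!": lambda n: math.lgamma(n + 1),
}

for name, log_abs_a in examples.items():
    # Values near 1 suggest R = 1; near 0 suggests R = +infinity;
    # very large values suggest R = 0.
    print(name, root_of_abs(log_abs_a, 10**6))
```

The same helper can be used to check the elementary bound $(n!)^{1/n} \ge m^{1 - m/n}$ from the argument above for particular $m$ and $n$.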

Next we discuss differentiability of power series, which is based on the fact that in the interior of the interval of convergence the power series converges very rapidly. This will allow us to differentiate the series term-by-term. It is easy to verify that, on a formal level, the power series $\sum_{n=0}^{\infty} a_n x^n$ has derivative

7.4.2 Analytic Continuation

Suppose ¿ Onxn converges to f(x) in Ixl < R (we have expanded aboutx = Ofor simplicity). Now for any fixed Xo in Ixol < R, we might hope

This result is a two-edged sword. It shows that power series arereally terrific-you can differentiate them term-by-term in the interiorof the interval of convergence; but it also shows that only very specialkinds of functions can be represented by power series-any such func­tion must be C'", In the next chapter we will even give examples ofeoo functions that do not have power-series expansions.

We can now relate the coefficients of a convergent power series with the Taylor expansions of the function equal to the sum, $f(x) = \sum_{n=0}^\infty a_n x^n$. By differentiating $n$ times and setting $x = 0$ we obtain $f^{(n)}(0) = n!\,a_n$ (by convention $0! = 1$). In other words, the partial sums of the power series are the Taylor expansions of the function. From this we also obtain the uniqueness of the power series. If $\sum a_n x^n$ and $\sum b_n x^n$ converge in $|x| < r$ to the same function $f$, then $a_n = b_n = f^{(n)}(0)/n!$, so they are identical series. Notice that the same thing holds for power series about an arbitrary point $x_0$. If $\sum a_n(x - x_0)^n$ converges to $f(x)$ in $|x - x_0| < r$, then $a_n = f^{(n)}(x_0)/n!$; so if $\sum a_n(x - x_0)^n = \sum b_n(x - x_0)^n$ in $|x - x_0| < r$, then $a_n = b_n$ for all $n$. However, we can have equality between different power series about different points. As we will see in the next section, this is a rather important idea.

Proof: We apply Theorem 7.3.4 on differentiating infinite sequences of functions. The individual terms $a_n x^n$ are all differentiable, so we need to show the convergence of $\sum a_n x^n$ and the uniform convergence of $\sum n a_n x^{n-1}$. But we have already shown that a power series with radius of convergence $R$ converges uniformly on any smaller interval and that the power series $\sum n a_n x^{n-1}$ has radius of convergence $R$ also. Thus we obtain $f'(x) = \sum n a_n x^{n-1}$ on any smaller interval and, hence, on $(-R, R)$. The result for $f^{(k)}$ then follows by induction. QED

Theorem 7.4.2 Let $\sum a_n x^n$ have radius of convergence $R$ with $R \neq 0$. Then the function $f(x) = \sum a_n x^n$ is $C^1$ on $(-R, R)$ and $f'(x) = \sum n a_n x^{n-1}$ there. In fact $f(x)$ is $C^\infty$ on $(-R, R)$, and the derivative $f^{(k)}$ of order $k$ is given by the power series formally differentiated $k$ times.
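Term-by-term differentiation can be illustrated on the geometric series, whose sum $1/(1-x)$ has derivative $1/(1-x)^2$ on $|x| < 1$. This is a small numerical sketch under stated assumptions: the helper names `partial_sum` and `differentiate` are hypothetical, and the series is truncated at 200 terms, which is ample for the sample points used.

```python
def partial_sum(coeffs, x):
    """Evaluate sum(coeffs[n] * x**n) by Horner's rule."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * x + c
    return total

def differentiate(coeffs):
    """Formal derivative of a coefficient list: a_n x^n -> n a_n x^(n-1)."""
    return [n * c for n, c in enumerate(coeffs)][1:]

N = 200
geometric = [1.0] * N                 # truncated sum of x^n, close to 1/(1 - x)
derived = differentiate(geometric)    # truncated sum of n x^(n-1)

if __name__ == "__main__":
    for x in (0.5, -0.3):
        print(x, partial_sum(derived, x), 1.0 / (1.0 - x) ** 2)
```

The two printed columns agree to many digits for points well inside the interval of convergence; near $x = \pm 1$ the truncation would need many more terms.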


$$\binom{n+k}{k} = \frac{(n+k)(n+k-1)\cdots(n+1)}{k!}$$

Notice that the expression we have guessed for the coefficient $b_k$ is itself a power series in $x_0$ that converges also in $|x_0| < R$ (if $\sum a_n x^n$ converges in $|x| < R$) because the factor

We observe that if the series were finite (so $f$ would be a polynomial), we could simply write $x = (x - x_0) + x_0$ and

$$x^n = ((x - x_0) + x_0)^n = \sum_{k=0}^{n}\binom{n}{k}(x - x_0)^k x_0^{n-k}$$

and substitute this into $f(x) = \sum_{n=0}^{N} a_n x^n$ to obtain

$$f(x) = \sum_{n=0}^{N} a_n \sum_{k=0}^{n}\binom{n}{k}(x - x_0)^k x_0^{n-k},$$

and by regrouping and rearranging terms, $f(x) = \sum_{k=0}^{N} b_k(x - x_0)^k$ where $b_k = \sum_{n=k}^{N} a_n\binom{n}{k}x_0^{n-k}$. This is just algebra. But it suggests that for the infinite series $f(x) = \sum_{n=0}^\infty a_n x^n$ we should have $f(x) = \sum_{k=0}^\infty b_k(x - x_0)^k$ with

$$b_k = \sum_{n=k}^{\infty} a_n\binom{n}{k}x_0^{n-k} = \sum_{n=0}^{\infty} a_{n+k}\binom{n+k}{k}x_0^{n}.$$

[Figure 7.4.1: number line with $-R$, $0$, and $R$ marked.]

to find a power series $\sum b_n(x - x_0)^n$ converging to $f(x)$ in $|x - x_0| < r$ where $r = R - |x_0|$ (see Figure 7.4.1). In fact this is always possible, as we will now see. We know in fact that if there is such a series, it must be given by $b_n = f^{(n)}(x_0)/n!$, but surprisingly this remark does not help in establishing the convergence of $\sum b_n(x - x_0)^n$ to $f$. We therefore take a different approach, which is more direct.


$$\sum_{n=0}^{\infty}\sum_{k=0}^{n} a_n\binom{n}{k}(x - x_0)^k x_0^{n-k} = \sum_{n=0}^{\infty} a_n x^n = f(x).$$

On the other hand, if we rearrange the series, summing first on $n$ with $k$ fixed, we obtain

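$$\sum_{k=0}^{\infty}\sum_{n=k}^{\infty} a_n\binom{n}{k}(x - x_0)^k x_0^{n-k} = \sum_{k=0}^{\infty}\left(\sum_{n=k}^{\infty} a_n\binom{n}{k}x_0^{n-k}\right)(x - x_0)^k$$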

$$\sum_{n=0}^{\infty}\sum_{k=0}^{n} |a_n|\binom{n}{k}|x - x_0|^k|x_0|^{n-k} = \sum_{n=0}^{\infty} |a_n| r^n,$$

which converges because the power series is absolutely convergent in $(-R, R)$. Thus we are dealing with an absolutely convergent series. If we take the given order and sum on $k$ first we obtain

$$\sum_{n=0}^{\infty}\sum_{k=0}^{n} a_n\binom{n}{k}(x - x_0)^k x_0^{n-k}$$

(this is an infinite series, but it is not indexed by $n$; rather it is indexed by the pairs $(n, k)$ with $k \le n$ in lexicographic order). If we put in absolute values for each term $a_n\binom{n}{k}(x - x_0)^k x_0^{n-k}$ we obtain the expression $\sum_{n=0}^\infty\sum_{k=0}^{n} |a_n|\binom{n}{k}|x - x_0|^k|x_0|^{n-k}$, since the binomial coefficients are positive. We can next evaluate the $k$-sum

$$\sum_{k=0}^{n}\binom{n}{k}|x - x_0|^k|x_0|^{n-k} = (|x - x_0| + |x_0|)^n = r^n$$

where $r = |x - x_0| + |x_0|$ satisfies $r < R$ by the conditions we have assumed for $x$ and $x_0$. Thus

expression

is a polynomial in $n$ for $k$ fixed, so $\lim_{n\to\infty}\binom{n+k}{k}^{1/n} = 1$ for each $k$. Thus the rearrangement (note that we are using this term in a somewhat different sense than in section 7.2.3) $\sum b_k(x - x_0)^k$ of $\sum a_n x^n$, where $b_k = \sum_{n=0}^\infty a_{n+k}\binom{n+k}{k}x_0^n$, is well defined for any $x_0$ in $(-R, R)$. Incidentally, the formula for $b_k$ agrees with the previously derived $f^{(k)}(x_0)/k!$, as can be seen by differentiating $f(x) = \sum a_n x^n$ $k$ times and setting $x = x_0$.

It remains to show that $\sum b_k(x - x_0)^k$ actually converges to $f(x)$, at least for $|x - x_0| < R - |x_0|$. To see this we want to invoke the unconditional nature of an absolutely convergent series, namely the


In fact, this computation gives the power series of $1/(1 - x)$ about any point $x_0$ except $1$.

[Figure 7.4.2: number line with $-1$, $x_0$, $0$, and $1$ marked, showing the interval $|x - x_0| < |1 - x_0|$ from $-1 + 2x_0$ to $1$.]

provided $|(x - x_0)/(1 - x_0)| < 1$, using a power-series expansion about $0$. Notice that we have a power-series expansion for $1/(1 - x)$ convergent in the interval $|x - x_0| < |1 - x_0|$. If $0 < x_0 < 1$ this is the same interval predicted by the theorem, but for $-1 < x_0 < 0$ it is a larger interval.

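$$\frac{1}{1 - x} = \frac{1}{1 - x_0 - (x - x_0)} = \frac{1}{1 - x_0}\cdot\frac{1}{1 - (x - x_0)/(1 - x_0)} = \frac{1}{1 - x_0}\sum_{n=0}^{\infty}\left(\frac{x - x_0}{1 - x_0}\right)^n = \sum_{n=0}^{\infty}(1 - x_0)^{-n-1}(x - x_0)^n,$$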
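For the geometric series ($a_n = 1$) the rearrangement formula $b_k = \sum_{n=0}^\infty a_{n+k}\binom{n+k}{k}x_0^n$ should reproduce the coefficients $(1 - x_0)^{-k-1}$ of the expansion of $1/(1-x)$ about $x_0$. The sketch below checks this numerically; the name `recentered_coeff` and the truncation at 400 terms are assumptions made for illustration.

```python
from math import comb

def recentered_coeff(a, k, x0, terms=400):
    """Approximate b_k = sum_{n>=0} a(n+k) * C(n+k, k) * x0**n by truncation.

    `a` is a function returning the coefficient a_n of the series about 0.
    """
    return sum(a(n + k) * comb(n + k, k) * x0 ** n for n in range(terms))

def geometric_coeff(n):
    """Coefficients of sum x^n = 1/(1 - x): every a_n equals 1."""
    return 1.0

if __name__ == "__main__":
    x0 = 0.5
    for k in range(4):
        print(k, recentered_coeff(geometric_coeff, k, x0), (1 - x0) ** (-k - 1))
```

The truncation is only adequate when $|x_0|$ is well inside the interval of convergence; for $x_0$ close to $\pm 1$ far more terms would be needed.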

Notice that the theorem does not preclude the convergence of the new power series in a larger interval. For example, $\sum_{n=0}^\infty x^n = 1/(1 - x)$ converges in $|x| < 1$. If we fix a point $x_0$ in this interval, we can compute the power-series expansion $\sum_{n=0}^\infty b_n(x - x_0)^n$ by $b_n = f^{(n)}(x_0)/n!$ or more directly as

Theorem 7.4.3 If $\sum a_n(x - x_1)^n$ is a convergent power series in $|x - x_1| < R$ converging to the function $f$, then $f$ also has a power-series expansion about each point $x_2$ of the interval $|x - x_1| < R$, which converges at least in the largest symmetric interval $|x - x_2| < r$ lying entirely in the original interval $|x - x_1| < R$.

so $f(x) = \sum_{k=0}^\infty b_k(x - x_0)^k$ in $|x - x_0| < R - |x_0|$. We have thus proved the following theorem.

$$= \sum_{k=0}^{\infty} b_k(x - x_0)^k,$$


If we know $f(x)$ for $|x - x_0| < \varepsilon$, then we can compute $f^{(n)}(x_0)$ for all $n$ and so obtain the values of $f(x)$ on the interval of convergence from $\sum_{n=0}^\infty a_n(x - x_0)^n$. This in turn allows us to compute $f^{(n)}(x_1)$ for any point $x_1$ in the interval of convergence, hence we can obtain the power series for $f$ about other points. This process of passing from power series to power series about new points is called analytic continuation. We leave as an exercise the proof that if $f$ is defined in $(a, b)$ and analytic, then we can obtain the value of $f(x_1)$ for any $x_1$ in $(a, b)$ by analytic continuation from the power series $f(x) = \sum a_n(x - x_0)^n$ about any point $x_0$ in $(a, b)$ in a finite number of steps. Note that analytic continuation of the power series $\sum_{n=0}^\infty x^n$ leads to the analytic function $1/(1 - x)$ on the domain $(-\infty, 1)$. However, analytic continuation will not enable us to extend $1/(1 - x)$ past the singularity at $x = 1$ to the region $(1, \infty)$. To do this, we need to move into the complex plane. This will be discussed briefly in the next section.

Most important functions in mathematics are analytic, at least if the domain is suitably restricted. Polynomials are the simplest examples, having power series with a finite number of non-zero terms. All the special functions (sine, cosine, exponential, and logarithm, and more exotic functions such as Bessel functions, hypergeometric functions, and so on) are analytic; indeed they are often defined by power series. As we will see, the class of analytic functions is closed under arithmetic operations and compositions, so that any function for which you can write a formula is analytic (again with the domain suitably restricted). This means that power-series expansions should be extremely useful. However, it would be a mistake to think that only analytic functions are

Functions that have power-series expansions about all points in their domain are called analytic functions. The theorem shows that functions defined by power series about one point are analytic: they have convergent power series about the other points in their domain. Analytic functions have remarkable properties: the values of the function on any small interval determine the values of the function on any larger interval. This is just a consequence of the uniqueness of power series and the formula $a_n = f^{(n)}(x_0)/n!$.


To understand the behavior of a power series, even if we are only interested in a real domain, requires that we consider its behavior in a complex domain. If $f(x)$ is a real- or complex-valued function of a real variable $x$, defined on some domain in $\mathbb{R}$, it is not a priori clear what we should mean by $f(z)$, where $z$ varies in $\mathbb{C}$. Of course for some special functions we have a good candidate. If $f$ is a polynomial, $f(x) = \sum_{n=0}^{N} a_n x^n$, then we simply take for granted that $f(z) = \sum_{n=0}^{N} a_n z^n$. This is not the unique possible choice: if $f(x) = 0$ we could consider $F(x + iy) = y$, which also extends the original function in that $F(z) = f(z)$ if $z = x$ is real. However, we are certainly justified in claiming that the first choice is somehow most natural. Now it turns out that we can do the same thing for power series. If $\sum_{n=0}^\infty a_n x^n = f(x)$ converges on the interval $|x| < R$, then $\sum_{n=0}^\infty a_n z^n = f(z)$ also converges in the circle $|z| < R$ (this explains why $R$ is referred to as the radius of convergence). In fact the identical argument for the absolute convergence of the power series for real $x$ proves also the absolute convergence for complex $z$. Thus analytic functions possess natural extensions to domains in $\mathbb{C}$. For example, $1/(1 - x) = \sum_{n=0}^\infty x^n$ for $|x| < 1$, and also $\sum_{n=0}^\infty z^n = 1/(1 - z)$ for $|z| < 1$. Clearly $f(z) = 1/(1 - z)$ is the most natural candidate for the extension of $f(x) = 1/(1 - x)$ to complex numbers, although you might be hard pressed to explain why without the use of power series. By using analytic continuation in $\mathbb{C}$ we can get around the obstacle at $z = 1$ that we could not get past in $\mathbb{R}$. However, analytic continuation in $\mathbb{C}$ leads to other types of complications that can't be discussed here.

The theory of analytic functions of a complex variable is beyond the scope of this book; it requires a book of its own. I will give one example that hints at some of the ways the theory of analytic functions of a complex variable can shed light on questions that only involve real numbers. If we are told that the function $1/(1 - x)$ has a power-series expansion about $x = 0$, we can guess immediately that the power series

7.4.3 Analytic Functions on Complex Domains*

of interest. Many functions that are supposed to represent physical data from the real world are not analytic. Also, the procedure of analytic continuation is computationally unstable, involving high derivatives of the function that cannot be effectively controlled.


because taking absolute values of $(-1)^n\binom{n}{k}(2x_0(x - x_0))^{n-k}(x - x_0)^{2k}$

$$\frac{1}{1 + x^2} = \frac{1}{1 + x_0^2}\sum_{n=0}^{\infty}\sum_{k=0}^{n}\frac{(-1)^n}{(1 + x_0^2)^n}\binom{n}{k}(2x_0(x - x_0))^{n-k}(x - x_0)^{2k}$$

we still have the absolute convergence of

$$(-2x_0(x - x_0) - (x - x_0)^2)^n = (-1)^n\sum_{k=0}^{n}\binom{n}{k}(2x_0(x - x_0))^{n-k}(x - x_0)^{2k}$$

with absolute convergence if $|2x_0(x - x_0) + (x - x_0)^2| < 1 + x_0^2$. Now if $|x - x_0| < r$ for $r$ sufficiently small we have $2|x_0||x - x_0|$ and $(x - x_0)^2$ each $< (1 + x_0^2)/2$, so we not only have the absolute convergence of the above series (this is not quite a power series) but after substituting the binomial expansion

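$$f(x) = \frac{1}{1 + x^2} = \frac{1}{1 + ((x - x_0) + x_0)^2} = \frac{1}{1 + x_0^2 + 2x_0(x - x_0) + (x - x_0)^2}$$

$$= \frac{1}{1 + x_0^2}\cdot\frac{1}{1 - (-2x_0(x - x_0) - (x - x_0)^2)/(1 + x_0^2)} = \frac{1}{1 + x_0^2}\sum_{n=0}^{\infty}\left(\frac{-2x_0(x - x_0) - (x - x_0)^2}{1 + x_0^2}\right)^n$$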

could converge at most for $|x| < 1$ because the function $1/(1 - x)$ has a singularity at $x = 1$. We might be led to guess, then, that a power series $\sum a_n x^n = f(x)$ should converge on the largest interval $|x| < R$ for which the function $f(x)$ has no singularities (never mind exactly what singularity means; just interpret it as any "irregular" behavior). Put another way, if $\sum a_n x^n = f(x)$ converges in $|x| < R$, then $f$ is analytic in $|x| < R$; we can ask, conversely, whether $f$ being analytic in $|x| < R$ (remember this means it has a power-series expansion about each point in $|x| < R$) implies that the power series about $0$ converges in $|x| < R$. The answer turns out to be no. The function $f(x) = 1/(1 + x^2)$ is analytic on the whole real line. At any point $x_0$ we can compute its power series from


Theorem 7.4.4 Let $\sum a_n(x - x_0)^n = f(x)$ and $\sum b_n(x - x_0)^n = g(x)$ converge in $|x - x_0| < R$. Then $f \pm g$ and $f \cdot g$ have power-series expansions about $x_0$ convergent in $|x - x_0| < R$. Furthermore, if $g(x_0) \neq$

7.4.4 Closure Properties of Analytic Functions*

In this section we show that analytic functions are preserved under operations of arithmetic and composition. This will involve understanding how power series behave under these operations. We work entirely in the real domain.

but now there are singularities at $z = +i$ and $z = -i$ (because $1/(1 + i^2) = 1/(1 - 1) = 1/0$, which is undefined). The presence of those singularities explains why the real power series cannot converge in any interval $|x| < R$ for $R > 1$. If it did, the complex power series would converge in $|z| < R$ and the function would be defined at $z = +i$ and $z = -i$. In fact one can prove, although we will not do so here, that the radius of convergence of a power series $\sum a_n(x - x_0)^n = f(x)$ is exactly equal to the distance from $x_0$ to the first complex singularity of $f(z) = \sum a_n(z - x_0)^n$.
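The claim about the distance to the nearest complex singularity can be probed numerically for $f(x) = 1/(1 + x^2)$, whose singularities $\pm i$ lie at distance $\sqrt{1 + x_0^2}$ from a real point $x_0$. The sketch below is an illustration, not a proof. It uses the partial-fraction identity $1/(1 + x^2) = \frac{1}{2i}\left(\frac{1}{x - i} - \frac{1}{x + i}\right)$, which gives the Taylor coefficients $a_n = (-1)^n \operatorname{Im}\,(x_0 - i)^{-(n+1)}$, and then estimates $1/\limsup |a_n|^{1/n}$ by a maximum over a finite window. The function names and the window $100 \le n < 200$ are assumptions chosen for the example.

```python
import math

def taylor_coeff(x0: float, n: int) -> float:
    """n-th Taylor coefficient of 1/(1 + x^2) about the real point x0."""
    w = complex(x0, -1.0) ** (-(n + 1))
    return (-1) ** n * w.imag

def estimated_radius(x0: float, lo: int = 100, hi: int = 200) -> float:
    """Crude estimate of 1 / limsup |a_n|**(1/n) using a finite window of n."""
    root = max(abs(taylor_coeff(x0, n)) ** (1.0 / n) for n in range(lo, hi))
    return 1.0 / root

if __name__ == "__main__":
    for x0 in (0.0, 1.0, 2.0):
        print(x0, estimated_radius(x0), math.sqrt(1.0 + x0 * x0))
```

The finite window only approximates the lim sup, so agreement to a percent or two is all that should be expected.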

$$\sum_{n=0}^{\infty}(-1)^n z^{2n} = \frac{1}{1 + z^2} \quad\text{for } |z| < 1,$$

which is $< (1 + x_0^2)^n$. Since we have an absolutely convergent series, we can rearrange it according to the powers of $(x - x_0)$ and so obtain the convergent power series of $f(x) = 1/(1 + x^2)$ about $x = x_0$. The computation for general $x_0$ is quite messy, but for $x_0 = 0$ it is simply $1/(1 + x^2) = \sum_{n=0}^\infty(-1)^n x^{2n}$. From this we see that the radius of convergence is exactly $1$. This, despite the fact that the function $1/(1 + x^2)$ does nothing unusual at $x = \pm 1$, destroys our conjecture.

Nevertheless, if we look at the function $1/(1 + z^2)$ of a complex variable, we see immediately what is happening. We still have

$$\sum_{k=0}^{n}\left|(-1)^n\binom{n}{k}(2x_0(x - x_0))^{n-k}(x - x_0)^{2k}\right| = \left(|2x_0(x - x_0)| + |x - x_0|^2\right)^n,$$

leads to


where $c_k = a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0$. Since everything we have done is valid in $|x - x_0| < R$, we may conclude that the power series for $fg$ has radius of convergence at least $R$. (This could also be verified directly from the formula for $c_k$.)
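The coefficient formula $c_k = a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0$ translates directly into code. The sketch below works on truncated coefficient lists; the function name `cauchy_product` is an illustrative choice, and only the first $\min(\text{len}(a), \text{len}(b))$ coefficients are returned because those are the only ones fully determined by the truncated inputs.

```python
def cauchy_product(a, b):
    """Coefficients c_k = a_0 b_k + a_1 b_(k-1) + ... + a_k b_0, truncated.

    `a` and `b` hold the leading coefficients of two power series about the
    same point x0; the result holds the leading coefficients of the product.
    """
    size = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(size)]

if __name__ == "__main__":
    geometric = [1] * 6            # 1/(1 - x) = sum x^n
    # The product should give 1/(1 - x)^2 = sum (k + 1) x^k.
    print(cauchy_product(geometric, geometric))
```

As a second check, multiplying the series of $1/(1-x)$ by the polynomial $1 - x$ should give the constant series $1$.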

Finally, we need to consider quotients. Since $f/g = f \cdot (1/g)$, it suffices to show that $1/g$ has a convergent power series about $x_0$ if

$$f(x)g(x) = \sum_{n=0}^{\infty}\sum_{m=0}^{\infty} a_n b_m(x - x_0)^{n+m} = \sum_{k=0}^{\infty} c_k(x - x_0)^k$$

which is bounded (independent of $N$ and $M$). This justifies the formal multiplication and allows us to rearrange terms to make a convergent power series:

$$\sum_{n=0}^{N}\sum_{m=0}^{M}\left|a_n b_m(x - x_0)^{n+m}\right|$$

(We must use a different label for the index of summation in each power series because the distributive law for multiplication requires that we take all products of terms, one from each series.) Now the point is that the power series are absolutely convergent in $|x - x_0| < R$, and this implies that the double series is also absolutely convergent

Proof: The theorem for $f \pm g$ is trivial since the sum or difference of the convergent power series gives a convergent power series for $f \pm g$, $(f \pm g)(x) = \sum_{n=0}^\infty(a_n \pm b_n)(x - x_0)^n$. We have to do a little work for the product. In the process we will find a formula for the power series of the product. If we formally multiply the two power series we obtain

$0$, then $1/g$ has a power-series expansion about $x_0$ convergent in some (perhaps smaller) neighborhood of $x_0$.


It is merely a question of whether we can rearrange the sum to read

$$\frac{1}{g(x)} = \frac{1}{b_0}\sum_{k=0}^{\infty}(-1)^k h(x)^k = \frac{1}{b_0}\left(1 + \sum_{k=1}^{\infty}(-1)^k\sum_{n=k}^{\infty} h_{n,k}(x - x_0)^n\right).$$

where both the inner and outer series converge absolutely if $|x - x_0| < r$. This is not yet a power series, but it is very close to one. Remember that we have already shown how to multiply two power series, so by induction we know $h(x)^k = \left(\sum_{n=1}^\infty d_n(x - x_0)^n\right)^k$ has a power-series expansion obtained by formally multiplying out all terms and collecting the (finite) sums corresponding to each power of $x - x_0$. Let us say $h(x)^k = \sum_{n=k}^\infty h_{n,k}(x - x_0)^n$ (because there is no zero-order term in the $h(x)$ power series, the lowest order term in the $h(x)^k$ power series is $(x - x_0)^k$). Using this notation, we have

with the series converging absolutely. Thus we have found

$$\frac{1}{g(x)} = \frac{1}{b_0}\cdot\frac{1}{1 + h(x)} = \frac{1}{b_0}\sum_{k=0}^{\infty}(-1)^k h(x)^k$$

where $d_n = b_n/b_0$. We define $h(x) = \sum_{n=1}^\infty d_n(x - x_0)^n$. We note that this is a convergent power series in $|x - x_0| < R$, so $h$ is continuous. Then since $h(x_0) = 0$, we can find some smaller neighborhood $|x - x_0| < r$ on which $|h(x) - h(x_0)| = |h(x)| < 1$, so that we can expand

$$g(x) = b_0 + \sum_{n=1}^{\infty} b_n(x - x_0)^n = b_0\left(1 + \sum_{n=1}^{\infty} d_n(x - x_0)^n\right)$$

$g(x_0) \neq 0$. Note that $g(x_0) \neq 0$ is the same as $b_0 \neq 0$. We can then write


The theorem implies immediately that if $f$ and $g$ are analytic functions on a domain $D$, then so are $f \pm g$, $f \cdot g$, and $f/g$ (if $g \neq 0$ on $D$). The proof of the expansion for $1/g$ actually contains the germ of an important generalization concerning compositions ($1/g$ is the composition of $g$ followed by $f(x) = 1/x$).

$$1 + \sum_{k=1}^{\infty} H(x)^k = 1 + \sum_{k=1}^{\infty}\left(\sum_{n=1}^{\infty}|d_n||x - x_0|^n\right)^k$$

converges absolutely. In expanding the kth power we have

$$\left(\sum_{n=1}^{\infty}|d_n||x - x_0|^n\right)^k = \sum_{n=k}^{\infty} H_{n,k}|x - x_0|^n,$$

Note that all the terms are non-negative (this uses the fact that the combinatorial factors are all natural numbers) and so the double series $\sum_{k=1}^\infty\sum_{n=k}^\infty H_{n,k}|x - x_0|^n$ converges in the order indicated. Finally, we have $|h_{n,k}| \le H_{n,k}$, since the $H_{n,k}$ are obtained in the same manner as $h_{n,k}$ except that $d_n$ is replaced by $|d_n|$. Thus we have proved the absolute convergence of the double series $\sum_{k=1}^\infty(-1)^k\sum_{n=k}^\infty h_{n,k}(x - x_0)^n$, hence its rearrangement as a power series in $|x - x_0| < r_0$ is justified. QED

is finite.

At first glance this looks like an unpleasant task, since even to obtain a closed-form expression for the coefficients $h_{n,k}$ would require some formidable combinatorial notation. Fortunately we can argue the difficulties away. However, it is first necessary to shrink the neighborhood somewhat, say $|x - x_0| < r_0$, so that not only do we have $|h(x)| = \left|\sum_{n=1}^\infty d_n(x - x_0)^n\right| < 1$ but also $H(x) < 1$, where we define $H(x) = \sum_{n=1}^\infty |d_n||x - x_0|^n$. This of course is possible since the convergence of $\sum_{n=1}^\infty d_n(x - x_0)^n$ implies the convergence of the power series defining $H$, and we still have $H(x_0) = 0$. Since we have $|H(x)| < 1$ for $|x - x_0| < r_0$, we may conclude $1/(1 - H(x)) = 1 + \sum_{k=1}^\infty H(x)^k$ converges absolutely, so

$$\sum_{k=1}^{\infty}\sum_{n=k}^{\infty}|h_{n,k}||x - x_0|^n$$

in order to obtain a convergent power series. We know that the condition for rearrangement is absolute convergence. Thus we need to show


follows if we restrict $x - x_0$ so that $\sum_{k=1}^\infty |a_k||x - x_0|^k < r_1$, for then $\sum_{n=0}^\infty |b_n|\left(\sum_{k=1}^\infty |a_k||x - x_0|^k\right)^n$ converges absolutely and expanding

$$\sum_{n=0}^{\infty}\sum_{k=n}^{\infty}|b_n| A_{k,n}|x - x_0|^k$$

Finally the convergence of the dominating double series

where $|a_{k,n}| \le A_{k,n}$ and $A_{k,n}$ is obtained in the same way as $a_{k,n}$ with $|a_n|$ replacing $a_n$,

$$\sum_{n=0}^{\infty}\sum_{k=n}^{\infty}|b_n| A_{k,n}|x - x_0|^k$$

with convergence in $|x - x_0| < r_0$, so we need to show the absolute convergence of the double infinite series $\sum_{n=0}^\infty\sum_{k=n}^\infty b_n a_{k,n}(x - x_0)^k$. As in the previous proof we do this by comparison with

$$g \circ f(x) = \sum_{n=0}^{\infty} b_n\left(\sum_{k=1}^{\infty} a_k(x - x_0)^k\right)^n$$

converging if $|f(x) - x_1| < r_1$ also. Since $f(x_0) = x_1$ and $f$ is continuous, this will be true if $|x - x_0|$ is small enough. To complete the proof we have to justify the rearrangement. We have

Theorem 7.4.5 Let $f(x) = \sum_{n=0}^\infty a_n(x - x_0)^n$ converge in $|x - x_0| < r$, and let $g(x) = \sum_{n=0}^\infty b_n(x - x_1)^n$ converge in $|x - x_1| < r_1$ where $f(x_0) = x_1$ (so $x_1 = a_0$). Then $g \circ f(x)$ has a convergent power series in a sufficiently small neighborhood of $x_0$ that is obtained by rearrangement from $\sum_{n=0}^\infty b_n\left(\sum_{k=1}^\infty a_k(x - x_0)^k\right)^n$.

Proof: Since $f(x) - x_1 = \sum_{k=1}^\infty a_k(x - x_0)^k$ and this converges for $|x - x_0| < r_0$ (hence absolutely), we have


The theorems in this section justify the assertion that essentially all functions that can be written in closed form are analytic, provided the domain is suitably restricted to eliminate points where division by zero is called for at some stage of the definition. This of course assumes that all the special functions we introduce are analytic. (To show that general powers $f(x) = x^a$ are analytic for $x > 0$ we anticipate the results $x^a = e^{a\log x}$ and the analytic nature of $e^x$ and $\log x$ from the next chapter.)

The proof of the rearrangement of the series for $g \circ f$ suggests an interesting question. If $f_1, f_2, \ldots$ are analytic functions with convergent power series $f_k = \sum_{n=0}^\infty a_{n,k}(x - x_0)^n$ in an interval $|x - x_0| < R$ and if $\sum_{k=1}^\infty f_k = f$ converges in $|x - x_0| < R$, does $f$ have a convergent power series $\sum_{n=0}^\infty\left(\sum_{k=1}^\infty a_{n,k}\right)(x - x_0)^n$? The answer turns out to be no, even if we require the convergence of the series $\sum_{k=1}^\infty f_k$ to be uniform. We will see this in the next section, where we show that any continuous function can be obtained as a uniform limit of polynomials. A deeper theorem says that if $\sum f_k(z)$ converges uniformly for all complex $z$ in a disc, then the limit is analytic. (For a proof, consult any book on complex variables.)

We now summarize what we have discovered about three closely related topics: power series, analytic functions, and Taylor series. Suppose we start with a $C^\infty$ function $f(x)$ defined in a neighborhood of $x_0$. Then we can form the Taylor expansions $\sum_{n=0}^{N} f^{(n)}(x_0)(x - x_0)^n/n!$ to any order $N$ at $x_0$. Taylor's theorem describes the accuracy of approximation to $f(x)$ as $x \to x_0$ for fixed $N$. It says nothing about limiting behavior as $N \to \infty$, and in fact we will see in the next chapter that in general nothing can be said. However, there is nothing to prevent us from considering the behavior as $N \to \infty$ and asking if the resulting power series $\sum_{n=0}^\infty a_n(x - x_0)^n$ converges (where we have set $a_n = f^{(n)}(x_0)/n!$). If it does converge to $f$ (there are examples where it converges but to a different function) on an interval $|x - x_0| < R$, then we say that $f$ is analytic on $|x - x_0| < R$. We have shown that $f$ is also equal to a power-series expansion $\sum_{n=0}^\infty b_n(x - x_1)^n$ about an arbitrary point $x_1$ in the interval; the coefficients $b_n$ are again given

out $\left(\sum_{k=1}^\infty |a_k||x - x_0|^k\right)^n$ does not change the sum since all terms are non-negative. QED


3. If $f$ is analytic on $(a, b)$ and $f(x_k) = 0$ for a sequence of distinct points $x_k$ in $(a, b)$ with $\lim_{k\to\infty} x_k = x_0$ in $(a, b)$ (note $x_0 = a$ or $x_0 = b$ is not allowed), prove that $f \equiv 0$. (Hint: show $f(x_0) = 0$ and divide the power series by $x - x_0$.)

2. If $f$ is analytic in a neighborhood of $x_0$ and $f(x_0) = 0$, show that $f(x)/(x - x_0)$ is analytic in the same neighborhood.

1. Let $f$ be defined in $(a, b)$ and have a power-series expansion about every point $x_0$ in $(a, b)$ that converges in a neighborhood of $x_0$. Show that the values of $f(x)$ on $(a, b)$ are determined from the values of $f(x)$ on any neighborhood in $(a, b)$.

7.4.5 Exercises

as in Taylor's theorem by $b_n = f^{(n)}(x_1)/n!$, and the interval of convergence is at least as large as the largest symmetric interval about $x_1$ contained in the original interval about $x_0$. Inside the interval of convergence, these power series are very well behaved. The convergence is absolute, uniform on any smaller interval, and the series may be differentiated term-by-term. Two power series about the same point converge to the same function only if they are identical series. Power series about a point $x_0$ may be combined by arithmetic operations, and power series may be composed if the expansion points match up appropriately. These properties make power series a powerful tool in the study of differential equations and other applications.

The class of analytic functions is wide enough to contain most important functions. On the other hand, it is a rather special class of functions. An analytic function is determined by its values in an arbitrarily small neighborhood of a point since these suffice to determine all the derivatives $f^{(n)}(x_0)$ and, hence, the power series about $x_0$. It is even true that $f$ is determined by its values on any sequence of points $x_1, x_2, \ldots$ converging to a point in the interior of its domain (see the exercises), but there is then no nice formula for $f(x)$ in terms of $f(x_1), f(x_2), \ldots$.

To learn more about analytic functions see A Primer of Real Analytic Functions by S. G. Krantz and H. R. Parks, Birkhäuser-Verlag, 1992.


8. Compute the radius of convergence of the following power series:

a. $\sum (n^4/n!)\,x^n$,
b. $\sum \sqrt{n}\,x^n$,
c. $\sum n^2 2^n x^n$.

9. *Compute the power-series expansion of $1/(1 + x^2)$ about any point $x_0$ from the formula $a_n = f^{(n)}(x_0)/n!$.

10. If $f$ is analytic on $(a, b)$ prove that for every $x_0$ in $(a, b)$ there exists a neighborhood of $x_0$ and constants $M$ and $r$ such that $|f^{(k)}(x)| \le M\,k!\,r^k$ for all $k$ and $x$ in the neighborhood.

11. *Prove the converse to 10, namely if $f$ is $C^\infty$ on $(a, b)$ and if for every $x_0$ in $(a, b)$ there exists a neighborhood of $x_0$ and constants $M$ and $r$ such that $|f^{(k)}(x)| \le M\,k!\,r^k$ for all $k$ and $x$ in the neighborhood, then $f$ is analytic on $(a, b)$. (Hint: show that the Taylor approximations about $x_0$ converge to $f$ in a neighborhood of $x_0$.)

a. $f(x) = x^2/(1 - x^2)$,
b. $f(x) = 1/(1 - x)^2$,
c. $f(x) = \sqrt{1 + x}$.

converges in $|x| < 1$, for any real $a$.

5. Expand $f(x) = \int_1^x (1/t)\,dt$ in a power series about $x = 1$ by expanding $1/t$ in a power series about $t = 1$ and integrating term-by-term. What is the radius of convergence of the series for $f(x)$?

6. Prove that if $f(x)$ is analytic on $(a, b)$, then $F(x) = \int_c^x f(t)\,dt$ is also analytic on $(a, b)$, where $c$ is any point in $(a, b)$.

7. Compute the power-series expansion of the following functions about $x = 0$:

4. Prove that the binomial series

$$(1 + x)^a = 1 + ax + \frac{a(a - 1)}{2!}x^2 + \cdots + \frac{a(a - 1)\cdots(a - n + 1)}{n!}x^n + \cdots$$


In this section we discuss the problem of approximating functions by sequences of polynomials. This is clearly a worthwhile goal, since a polynomial is a simpler object than a general continuous function, and many questions about polynomials can be answered easily by algebraic computations. One's first thoughts on the matter naturally turn to power series since the partial sums of a power series form a sequence of polynomials. However, using power series restricts us to analytic functions, excluding such simple functions as $|x|$, and even for analytic functions the interval of convergence may be too small. But we will show that an arbitrary continuous function on a compact interval can be approximated uniformly by polynomials; this is the famous Weierstrass approximation theorem. This does not in any way contradict the fact that some continuous functions do not have power-series expansions, because the partial sums of power series are very special ways of creating a sequence of polynomials. If the power series is expanded about the origin, for example, and $P_n(x) = \sum_{k=0}^{n} a_k x^k$, then the sequence $P_1(x), P_2(x), \ldots$ has the property that once the coefficient $a_n$ of $x^n$ is added in $P_n$, it never changes in the subsequent polynomials. When we construct sequences of polynomials $P_1, P_2, \ldots$ converging uniformly to $f$, we will not be requiring any relationship between the coefficients of $P_n$ and the subsequent polynomials.

Before getting into the proof of the Weierstrass approximation theorem, we will discuss a related but simpler problem that was first solved by Lagrange: that of fitting a polynomial to any finite set of data. In other words, we want to find a polynomial $P(x)$ satisfying $P(x_k) = a_k$ for $k = 1, \ldots, n$ where $x_k$ are arbitrary distinct real points and $a_k$ are arbitrary real (or complex) values. Since an arbitrary polynomial of degree $n - 1$, $c_0 + c_1 x + \cdots + c_{n-1}x^{n-1}$, has $n$ arbitrary constants, we would hope to solve the problem with a polynomial of degree $n - 1$. In fact, substituting this in the equations $P(x_k) = a_k$ leads to $n$ linear equations in $n$ unknowns, which may or may not have a unique solution. In the present case we can simply write down a solution. Note that the function $q_k(x) = \prod_{j\neq k}(x - x_j)$ is a polynomial of degree $n - 1$ that vanishes at every $x_j$ except $x_k$. Note that $q_k(x_k) = \prod_{j\neq k}(x_k - x_j)$ is a

7.5.1 Lagrange Interpolation

7.5 Approximation by Polynomials


The method we are going to use is called convolution with an approximate identity. The term convolution refers to a kind of product between

7.5.2 Convolutions and Approximate Identities

Even if we were to choose additional points on the graph of $f$ through which to pass a polynomial, we would not be sure of a better fit.

In section 7.5.3 we will give a constructive proof of the Weierstrass approximation theorem. The proof is instructive for two reasons. The first is that it is the prototype of a very general method for obtaining approximations to functions. The second is that it allows one to say more about the approximation if more is known about $f$ (for example, if $f$ is $C^1$, then the derivatives of the polynomials approximate $f'$).

[Figure 7.5.1: graph of $y = f(x)$.]

so $P(x) = \sum_{k=1}^{n} a_k Q_k(x)$ gives a solution to the problem that is called the Lagrange interpolation polynomial.
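The Lagrange construction can be written out directly: $Q_k(x) = \prod_{j\neq k}(x - x_j)/(x_k - x_j)$ equals $1$ at $x_k$ and $0$ at the other nodes, and $P(x) = \sum_k a_k Q_k(x)$. The following is a minimal sketch; the function names are illustrative, indices run from $0$ rather than $1$ as is usual in Python, and the nodes are assumed to be distinct.

```python
def lagrange_basis(xs, k, x):
    """Q_k(x) = prod over j != k of (x - x_j) / (x_k - x_j); nodes distinct."""
    value = 1.0
    for j, xj in enumerate(xs):
        if j != k:
            value *= (x - xj) / (xs[k] - xj)
    return value

def lagrange_interpolate(xs, values, x):
    """P(x) = sum_k a_k Q_k(x), the polynomial through the points (x_k, a_k)."""
    return sum(a * lagrange_basis(xs, k, x) for k, a in enumerate(values))

if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0]
    values = [1.0, 3.0, 7.0]      # samples of x^2 + x + 1
    print(lagrange_interpolate(xs, values, 3.0))
```

Because three points determine a polynomial of degree at most two, the interpolant of these samples coincides with $x^2 + x + 1$ everywhere, not just at the nodes; for data from a general function no such agreement between the nodes is guaranteed.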

However, merely passing the graph of a polynomial through a finite set of points in the plane does not solve the problem of approximating a function because the formula for the Lagrange interpolation polynomial does not allow you to control $P(x)$ at points in between the $x_k$'s. For example, for the function $f$ in Figure 7.5.1, the polynomial $P(x) \equiv 0$ passes through all seven points where the function $f$ crosses the $x$-axis, and yet $P$ is a very poor approximation to $f$.

$$Q_k(x_j) = \begin{cases} 0 & \text{if } j \neq k, \\ 1 & \text{if } j = k, \end{cases}$$

non-zero constant and the polynomials $Q_k(x) = [q_k(x_k)]^{-1}q_k(x)$ satisfy


Since this is a finite sum, we can interchange it with the integral to obtain $f * g(x) = \sum_{j=0}^{n} b_j x^j$ where

$$f * g(x) = \int g(x - y)f(y)\,dy = \int\left(\sum_{k=0}^{n} a_k(x - y)^k\right)f(y)\,dy = \int\sum_{k=0}^{n}\sum_{j=0}^{k}(-1)^{k-j}a_k\binom{k}{j}x^j y^{k-j}f(y)\,dy.$$

In the case we will consider, both $f$ and $g$ will be continuous functions and one of them will be zero outside a bounded interval, so that the integral will be a proper integral and $f * g(x)$ will be defined for each $x$. More generally one considers convolution products under much weaker assumptions on $f$ and $g$. The key point to observe about the convolution is the commutativity, $f * g(x) = g * f(x)$. This follows from the simple change of variable in the integral: replace $y$ by $x - y$ (hence $x - y$ gets replaced by $y$). The convolution product is also associative, $(f * g) * h = f * (g * h)$, a fact we leave as an exercise.

What is the significance of the convolution product? We can interpret $f(x - y)$ as a translate of $f$ (the graph is translated to the right by $y$), and this is then "averaged" with the weight $g(y)$. So $f * g$ is a weighted average of translates of $f$. But writing $f * g(x) = g * f(x) = \int g(x - y)f(y)\,dy$ shows that the convolution product is also a weighted average of translates of $g$. Thus $f * g$ is a kind of hybrid, having the properties of both $f$ and $g$ (at least those properties that are preserved under translation and averaging). For example, if $g$ is a polynomial and $f$ is continuous and vanishes outside a bounded interval, then $f * g$ is a polynomial. Indeed if $g(x) = \sum_{k=0}^{n} a_k x^k$, then

$$f * g(x) = \int_{-\infty}^{\infty} f(x - y)g(y)\,dy.$$

functions defined by


$$|f * g(x) - f(x)| \le n\int_0^{1/n}|f(x - y) - f(x)|\,dy \le n\int_0^{1/n}\frac{1}{m}\,dy = \frac{1}{m}$$

if $|f(x - y) - f(x)| \le 1/m$ for all $|y| \le 1/n$.

Clearly the exact form of the function 9 is not crucial, but we doneed the condition J g(y) dy = 1 and sorne condition that rnakes 9concentrate near zero. Any sequence of functions {gn} with such prop­erties is called an approximate identity. This is not quite a definition

so

(l/nI * g(x) - I(x) = n Jo (f(z - y) - I(x)) dy,

Figure 7.5.2: (graph of $g$, with height $n$ over the interval from $0$ to $1/n$)

Then $\int_0^{1/n} g(y)\,dy = 1$, so we are getting a fair average, and $f * g(x) = n \int_0^{1/n} f(x - y)\,dy$ approximates $f$ well since $f(x - y)$ is close to $f(x)$ for all $y$ in $0 \le y \le 1/n$ by the uniform continuity of $f$. This can be seen most simply by writing

$b_j = \sum_{k=j}^{n} (-1)^{k-j} a_k \binom{k}{j} \int y^{k-j} f(y)\,dy$ are just constants (note the integral $\int y^{k-j} f(y)\,dy$ is a proper integral since $f = 0$ outside a bounded interval). Thus we can obtain polynomials to approximate $f$ by taking $f * g$ where $g$ is a polynomial.
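As a hedged illustration of why $f * g$ is a polynomial when $g$ is, the sketch below computes the coefficients of $f * g$ from the moments $\int y^{p} f(y)\,dy$ and compares the resulting polynomial with a direct numerical convolution. The coefficient formula used is the one obtained by expanding $(x - y)^k$ with the binomial theorem and collecting powers of $x$; the tent function $f$, the sample polynomial $g(x) = 1 + 2x + 3x^2$, and all helper names are assumptions made for the example.

```python
import math

def tent(y):
    """Continuous function vanishing outside [-1, 1] (illustrative choice of f)."""
    return max(0.0, 1.0 - abs(y))

def midpoint_integral(h, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    width = (b - a) / steps
    return width * sum(h(a + (i + 0.5) * width) for i in range(steps))

def moment(p):
    """Integral of y**p * f(y) dy; proper because f vanishes outside [-1, 1]."""
    return midpoint_integral(lambda y: y ** p * tent(y), -1.0, 1.0)

def convolution_coefficients(a):
    """b_j = sum over k >= j of (-1)**(k-j) * a_k * C(k, j) * moment(k - j)."""
    n = len(a) - 1
    moments = [moment(p) for p in range(n + 1)]
    return [
        sum((-1) ** (k - j) * a[k] * math.comb(k, j) * moments[k - j]
            for k in range(j, n + 1))
        for j in range(n + 1)
    ]

a = [1.0, 2.0, 3.0]                 # g(x) = 1 + 2x + 3x^2
b = convolution_coefficients(a)     # coefficients of the polynomial f * g

def conv_direct(x):
    """f * g(x) computed straight from the integral of g(x - y) f(y) dy."""
    g = lambda t: sum(ak * t ** k for k, ak in enumerate(a))
    return midpoint_integral(lambda y: g(x - y) * tent(y), -1.0, 1.0)

print(b)
```

For this choice the moments are $1$, $0$, and $1/6$, so the coefficients should come out close to $1.5$, $2$, and $3$.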

We still need one further idea to make $f * g$ approximate $f$: that of an approximate identity. Looking at $f * g(x) = \int f(x - y) g(y)\,dy$, this will be close to $f(x)$ if only small values of $y$ are emphasized (then $f(x - y)$ will be close to $f(x)$ by the continuity of $f$) and the average is "fair." Suppose for example that the graph of $g$ were something like Figure 7.5.2.


Definition 7.5.1 A sequence of continuous functions on the line $\{g_n\}$ satisfying

1. $g_n(x) \ge 0$,

2. $\int_{-\infty}^{\infty} g_n(x)\,dx = 1$,

3. $\lim_{n \to \infty} \int_{|x| \ge 1/n} g_n(x)\,dx = 0$

$$|f * g_n(x) - f(x)| \le 2M \int_{|y| \ge 1/n} g_n(y)\,dy + \frac{1}{m} \int_{-1/n}^{1/n} g_n(y)\,dy \le 2M \int_{|y| \ge 1/n} g_n(y)\,dy + \frac{1}{m}$$

since $\int_{-1/n}^{1/n} g_n(y)\,dy \le \int_{-\infty}^{\infty} g_n(y)\,dy = 1$. But we are assuming $\int_{|y| \ge 1/n} g_n(y)\,dy \to 0$ as $n \to \infty$ (this is our concentrating hypothesis), so we can make $2M \int_{|y| \ge 1/n} g_n(y)\,dy \le 1/m$ by taking $n$ large enough and, hence, $|f * g_n(x) - f(x)| \le 2/m$ if $n$ is large enough. We can summarize this result as follows:

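$$|f * g_n(x) - f(x)| = \left| \int_{-\infty}^{\infty} \big( f(x - y) - f(x) \big) g_n(y)\,dy \right| \le \left( \int_{-\infty}^{-1/n} + \int_{1/n}^{\infty} \right) |f(x - y) - f(x)|\, g_n(y)\,dy + \int_{-1/n}^{1/n} |f(x - y) - f(x)|\, g_n(y)\,dy.$$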

We are assuming $f$ is continuous and vanishes outside a bounded interval. This implies that $f$ is bounded and uniformly continuous. Thus $|f(x)| \le M$ for all $x$, and given any error $1/m$ there exists a $1/n$ such that $|f(x - y) - f(x)| < 1/m$ for all $|y| \le 1/n$. Substituting these estimates into the integrals (using $|f(x - y) - f(x)| \le |f(x - y)| + |f(x)| \le 2M$ in the $|y| \ge 1/n$ integral) we obtain

because I have been vague as to what is meant by the condition that $g_n$ gets more concentrated near zero as $n \to \infty$. In fact, depending upon the context, there are several ways of making this precise. Here is one: suppose $g_n \ge 0$ satisfies $\int g_n(y)\,dy = 1$ and $\left( \int_{-\infty}^{-1/n} + \int_{1/n}^{\infty} \right) g_n(y)\,dy \to 0$ as $n \to \infty$. Then


and then extending $f$ to be zero outside $[a - 1, b + 1]$. Any uniform approximation of $f$ on $[a - 1, b + 1]$ will automatically yield uniform approximation on the smaller interval $[a, b]$. The other method is to subtract from $f$ an appropriate affine function $Ax + B$ so that $f - Ax - B$ vanishes at the endpoints; since $Ax + B$ is a polynomial, we can add it back on to the polynomials approximating $f - Ax - B$ to obtain polynomials approximating $f$.
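The second method is easy to make concrete. A minimal sketch, assuming $f = \exp$ on $[0, 2]$ as an example (the function and the helper name `endpoint_correction` are illustrative, not from the text): the affine function is the chord through the two endpoint values.

```python
import math

def endpoint_correction(f, a, b):
    """Return (A, B) with f(a) = A*a + B and f(b) = A*b + B."""
    A = (f(b) - f(a)) / (b - a)
    B = f(a) - A * a
    return A, B

f = math.exp
a, b = 0.0, 2.0
A, B = endpoint_correction(f, a, b)

def reduced(x):
    """f(x) - A x - B, which vanishes at both endpoints."""
    return f(x) - A * x - B

print(reduced(a), reduced(b))
```

Any polynomial approximation of `reduced` on $[a, b]$ becomes an approximation of $f$ with the same error once $Ax + B$ is added back.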

Figure 7.5.3: (graph of $f$ on $[a, b]$ with flaps added over $[a - 1, a]$ and $[b, b + 1]$)

Proof: There are two obstacles to using the approximate identity lemma. The first is that $f$ is not defined and continuous on the whole line, and the second is that it is impossible to find polynomials $g_n$ to satisfy the approximate identity conditions.

The first obstacle is readily overcome in two different ways. Note that if it happens that $f(a) = f(b) = 0$, then we can extend the domain of $f$ to the whole line by setting $f = 0$ outside $[a, b]$, and the extended function will be continuous. We can reduce the general case to this special case by enlarging the domain, say to $[a - 1, b + 1]$, adding "flaps" to the graph of $f$, as in Figure 7.5.3

7.5.3 The Weierstrass Approximation Theorem

Theorem 7.5.1 (Weierstrass Approximation Theorem) Let $f$ be any continuous function on a compact interval $[a, b]$. Then there exists a sequence of polynomials converging uniformly to $f$ on $[a, b]$.

Lemma 7.5.1 (Approximate Identity Lemma) Let $\{g_n\}$ be an approximate identity. Then if $f$ is any continuous function on the line vanishing outside a bounded interval, $f * g_n$ converges uniformly to $f$.
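The lemma can be watched in action on a simple example. The sketch below is an illustration under assumptions of my own choosing: $f$ is a tent function, and $g_n$ is a triangular bump of height $n$ supported on $[-1/n, 1/n]$, which is continuous, nonnegative, has integral one, and trivially satisfies the concentration condition. Integrals are approximated by a midpoint rule and the sup is estimated on a grid.

```python
def tent(x):
    """Continuous function vanishing outside [-1, 1] (illustrative choice of f)."""
    return max(0.0, 1.0 - abs(x))

def g(n, y):
    """Triangular bump: continuous, nonnegative, integral 1, zero for |y| >= 1/n."""
    return max(0.0, n * (1.0 - n * abs(y)))

def convolve_at(f, n, x, steps=400):
    """Midpoint-rule value of the integral of f(x - y) g_n(y) dy over [-1/n, 1/n]."""
    width = (2.0 / n) / steps
    total = 0.0
    for i in range(steps):
        y = -1.0 / n + (i + 0.5) * width
        total += f(x - y) * g(n, y)
    return total * width

def sup_error(f, n, grid):
    """Grid estimate of sup |f * g_n(x) - f(x)|."""
    return max(abs(convolve_at(f, n, x) - f(x)) for x in grid)

grid = [-2.0 + 0.1 * i for i in range(41)]
errors = {n: sup_error(tent, n, grid) for n in (2, 8, 32)}
print(errors)
```

Because this $f$ satisfies $|f(x - y) - f(x)| \le |y|$, the error is at most $\int |y|\, g_n(y)\,dy = 1/(3n)$ for this particular $g_n$, so the printed errors should shrink roughly like $1/n$.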

is called an approximate identity.


Thus to complete the proof of the Weierstrass approximation theorem via the approximate identity lemma we need to show that there exists an approximate identity that consists of functions $g_n$ that are equal to polynomials on $[a', b']$ and vanish outside. For simplicity we take the interval $[-1, 1]$, since the general case can be obtained by

Figure 7.5.4: (graph of $(1 - x^2)^m$ on $[-1, 1]$)

only involves the values of $g$ on the interval $[x - b, x - a]$; and if $x$ is also restricted to lie in $[a, b]$, then only the values of $g$ on the compact interval $[a', b'] = [a - b, b - a]$ are involved. Therefore, if we take $g_n(x)$ to be equal to a polynomial on $[a', b']$ and zero elsewhere, then $f * g_n$ will be equal to a polynomial on $[a, b]$ (for $x$ not in $[a, b]$ we will not have $f * g_n(x)$ equal to a polynomial, but we are only interested in what happens on $[a, b]$).

$$f * g(x) = \int f(x - y) g(y)\,dy = \int_a^b f(y) g(x - y)\,dy$$

So now we assume that $f$ is a continuous function on the whole line vanishing outside $[a, b]$, and we need to approximate $f$ by polynomials on $[a, b]$. It is crucial that we observe that the approximation is only needed on $[a, b]$, since the growth of polynomials as $x \to \infty$ precludes approximation on the whole line. Also, by restricting attention to the interval $[a, b]$, we can overcome the problem that no polynomials satisfy the approximate identity properties. The idea is that if $f$ vanishes outside $[a, b]$, then


(the length of the interval, $m^{-1/2}$, times the lower bound for the function).

So if we set $h_m(x) = c_m^{-1} (1 - x^2)^m$ (on $[-1, 1]$ and zero elsewhere) we will have $\int_{-1}^{1} h_m(x)\,dx = 1$ and, by the above estimate, $|h_m(x)| \le (4/3) m^{1/2} (1 - x^2)^m$. Clearly $h_m$ satisfies the first two conditions for an approximate identity. To verify the third condition we need to show $\int_{|x| \ge 1/n} h_m(x)\,dx$ goes to zero as $m \to \infty$, for every fixed $n$. Since $h_m$ vanishes outside $[-1, 1]$, this is a proper integral, so it suffices to show $h_m(x) \to 0$ uniformly on $[1/n, 1]$ (since $h_m$ is even, the behavior

composing $g_n$ with an appropriate affine function and multiplying by a constant.

We can actually write down a simple formula for the approximate identity we want. Consider the function $(1 - x^2)^m$, shown in Figure 7.5.4. Note that it vanishes to high order at $x = \pm 1$ and its graph over the interval $[-1, 1]$ appears concentrated near $x = 0$. The condition that the integral be equal to one is not satisfied, so we have to multiply by the appropriate constant, namely $c_m^{-1}$, where $c_m = \int_{-1}^{1} (1 - x^2)^m\,dx$. Now while it is possible to compute $c_m$ explicitly, the result is quite complicated (see exercise set 7.5.5, number 14) and it is more illuminating to get an estimate instead. Since we will want an estimate from above for $c_m^{-1}$, we need an estimate from below for $c_m$ and, hence, an estimate from below for $(1 - x^2)^m$. For $x$ near zero it is natural to compare $(1 - x^2)^m$ with $1 - mx^2$, the first terms of the Taylor expansion about $x = 0$. (See Figure 7.5.5 where the two functions are graphed together.) Since they are equal at $x = 0$ and $d/dx\,(1 - x^2)^m = -2mx(1 - x^2)^{m-1}$ while $d/dx\,(1 - mx^2) = -2mx$, we see $1 - mx^2 \le (1 - x^2)^m$ for all $x$ in $[-1, 1]$, the desired estimate.
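Both the comparison $1 - mx^2 \le (1 - x^2)^m$ on $[-1, 1]$ and the size of $c_m$ can be sanity-checked numerically. The sketch below is illustrative only: it uses the product form of $c_m$ stated in exercise 14 of set 7.5.5, a midpoint-rule approximation of the integral, and a grid check of the inequality. The helper names and the sample values of $m$ are assumptions.

```python
import math

def c_exact(m):
    """c_m = 2 * (2/3) * (4/5) * ... * (2m / (2m + 1)), the closed form of exercise 14."""
    value = 2.0
    for k in range(1, m + 1):
        value *= 2.0 * k / (2.0 * k + 1.0)
    return value

def c_numeric(m, steps=20000):
    """Midpoint-rule approximation of the integral of (1 - x^2)^m over [-1, 1]."""
    width = 2.0 / steps
    return width * sum((1.0 - (-1.0 + (i + 0.5) * width) ** 2) ** m for i in range(steps))

def tangent_bound_holds(m, points=2001):
    """Check 1 - m x^2 <= (1 - x^2)^m on a grid in [-1, 1], up to rounding."""
    xs = [-1.0 + 2.0 * i / (points - 1) for i in range(points)]
    return all(1.0 - m * x * x <= (1.0 - x * x) ** m + 1e-12 for x in xs)

# Lower bound (3/4) * m^(-1/2) obtained from the interval |x| <= 1/(2 sqrt(m)).
lower_bounds = {m: 0.75 / math.sqrt(m) for m in (1, 5, 50)}
for m in (1, 5, 50):
    print(m, c_exact(m), c_numeric(m), lower_bounds[m], tangent_bound_holds(m))
```

The lower bound is far from sharp, but it is all the proof needs: it gives $c_m^{-1} \le (4/3) m^{1/2}$.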

Note that $1 - mx^2$ vanishes at $x = \pm 1/\sqrt{m}$, so the estimate $1 - mx^2 \le (1 - x^2)^m$ is of interest only for $|x| \le 1/\sqrt{m}$. In fact, if $|x| \le 1/(2\sqrt{m})$, then the estimate tells us

$$(1 - x^2)^m \ge 1 - mx^2 \ge 1 - m/4m = 3/4$$

and so a lower bound for $\int_{-1}^{1} (1 - x^2)^m\,dx$ is $\tfrac{3}{4} m^{-1/2}$


as $m \to \infty$.

This completes the verification that $\{h_m\}$ is an approximate identity. The approximate identity lemma tells us that $h_m * f \to f$ uniformly

is the same on $[-1, -1/n]$). But $h_m$ is clearly decreasing on $[1/n, 1]$, so it assumes its maximum at $x = 1/n$. Thus we really need to show $\lim_{m \to \infty} h_m(1/n) = 0$ for each $n$. Because of our bound for $h_m(x)$, this will follow from $\lim_{m \to \infty} m^{1/2} (1 - 1/n^2)^m = 0$.

Notice what the issue is here. The factor $m^{1/2}$ goes to infinity, and the factor $(1 - 1/n^2)^m$ goes to zero because $1 - 1/n^2 < 1$. Which term dominates? In the next chapter we will discuss the general principle that exponential factors always dominate polynomial factors. Here we can finish the proof by recalling that our proof that $\lim_{m \to \infty} r^m = 0$ for $r < 1$ gave the estimate $(1 + c)^m \ge 1 + cm$; hence, $1/(1 + c)^m \le 1/(1 + cm)$. In our case $1/(1 + c) = 1 - 1/n^2$; hence,

$$m^{1/2} \left( 1 - \frac{1}{n^2} \right)^m \le \frac{m^{1/2}}{1 + cm} \le \frac{1}{c\, m^{1/2}} \to 0$$

Figure 7.5.5: (graphs of $(1 - x^2)^m$ and $1 - mx^2$ on $[-1, 1]$)


7.5.4 Approximating Derivatives

Next we show that if $f$ is differentiable we can also approximate $f'$ by the derivatives of the polynomials approximating $f$ (this is by no means automatic, as we have seen). To be precise, let us assume that $f$ is $C^1$ on $[a, b]$, meaning that one-sided derivatives exist at the endpoints and $f'(x)$ is continuous on $[a, b]$. As in the proof of the Weierstrass approximation theorem, we extend $f$ to the whole line, but this time we need to do it in a more elaborate way so that the extension is $C^1$. This can be accomplished either by adding flaps that are adjusted to match up derivatives or by subtracting from $f$ a higher order polynomial to make both $f$ and $f'$ vanish at the endpoints. After this more careful preparation, the same construction of $f * g_n$ will yield the desired approximation. Indeed it suffices to verify that $(f * g_n)' = f' * g_n$, for the proof of the Weierstrass approximation theorem shows $f' * g_n$ converges uniformly to $f'$. We state this as a general principle.

so $|f|^2 \equiv 0$ and $f \equiv 0$. The problem of actually reconstructing a function from its moments is more difficult, and we will not discuss it here.

A typical application of the Weierstrass approximation theorem is the following: if $f$ is a continuous function on $[0, 1]$ and all the moments $a_n = \int_0^1 f(x) x^n\,dx$ are known for $n = 0, 1, 2, \ldots$, then is $f$ determined? By considering the difference of two functions with the same moments, the question reduces to: if the moments of $f$ are all zero, is $f$ zero? If the moments of $f$ are all zero, then also $\int_0^1 f(x) P(x)\,dx = 0$ for any polynomial $P$ by linearity of the integral. But by the Weierstrass approximation theorem we can find a sequence of polynomials $P_n$ converging uniformly to $f$ (if $f$ is complex-valued take $\bar{f}$ instead). Then $f P_n$ converges to $|f|^2$ uniformly (the difference $|f P_n - |f|^2| = |f|\,|P_n - f|$ is dominated by the maximum value of $|f|$, which is finite, times the value of $|P_n - f|$, which tends to zero uniformly). Thus we can interchange the limit and integral:

$$0 = \lim_{n \to \infty} 0 = \lim_{n \to \infty} \int_0^1 f(x) P_n(x)\,dx = \int_0^1 |f(x)|^2\,dx,$$

on $[-1, 1]$, and we have observed that $h_m * f$ is equal to a polynomial on $[-1, 1]$. QED
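The whole construction can be tried out on a small example. The sketch below is an illustration, not the book's own computation: it takes $f$ to be a hat function vanishing outside $[0, 1]$ (so that $x - y$ stays in $[-1, 1]$ for $x, y$ in $[0, 1]$), uses the product form of $c_m$ from exercise 14, and replaces the integral by a midpoint sum. Each midpoint sum is itself a polynomial in $x$ of degree at most $2m$. The names `landau_polynomial` and `hat` are assumptions.

```python
def c_m(m):
    """Normalizing constant: integral of (1 - x^2)^m over [-1, 1] (product form)."""
    value = 2.0
    for k in range(1, m + 1):
        value *= 2.0 * k / (2.0 * k + 1.0)
    return value

def landau_polynomial(f, m, steps=2000):
    """Return P with P(x) ~ integral over [0, 1] of f(y) h_m(x - y) dy, for x in [0, 1]."""
    scale = 1.0 / c_m(m)
    width = 1.0 / steps
    ys = [(i + 0.5) * width for i in range(steps)]
    fs = [f(y) for y in ys]

    def P(x):
        # For x, y in [0, 1] we have |x - y| <= 1, so h_m(x - y) is given by its polynomial formula.
        return scale * width * sum(fy * (1.0 - (x - y) ** 2) ** m for y, fy in zip(ys, fs))

    return P

def hat(x):
    """Continuous on the line, zero outside [0, 1], peak value 1 at x = 1/2."""
    return max(0.0, 1.0 - abs(2.0 * x - 1.0))

grid = [i / 20 for i in range(21)]
errors = []
for m in (10, 100, 1000):
    P = landau_polynomial(hat, m)
    errors.append(max(abs(P(x) - hat(x)) for x in grid))
print(errors)
```

The errors decrease slowly, roughly like $m^{-1/2}$ for this non-smooth $f$, which is consistent with the kernel $h_m$ having width of order $m^{-1/2}$.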


Because of the commutativity of the convolution product, we also have $(f * g)' = f * g'$. By induction we can extend the theorem to higher derivatives: if $f$ is $C^k$, then so is $f * g$ and $(f * g)^{(k)} = f^{(k)} * g$. By choosing more sophisticated flaps we can extend $C^k$ functions on a compact interval $[a, b]$ to $C^k$ functions on the line that vanish outside a larger interval and so obtain in $\{f * g_n\}$ a sequence of polynomials that converges uniformly to $f$ with all derivatives of orders $\le k$ converging to the corresponding derivative of $f$. Finally it is even possible to have derivatives of all orders of $f * g_n$ simultaneously converge uniformly to the corresponding derivatives of $f$, provided $f$ is $C^\infty$. This requires adapting the flaps to match derivatives of all orders. We will see how to do this in the next chapter.

In obtaining a sequence of polynomials approximating $f$, we have not paid particular attention to the orders of the polynomials (of course the orders must increase to $\infty$ unless $f$ is a polynomial). An interesting

Theorem 7.5.2 Let $f$ be $C^1$ and vanish outside a bounded interval, and let $g$ be continuous. Then $f * g$ is $C^1$ and $(f * g)' = f' * g$.

Proof: We form the difference quotient

$$\frac{f * g(x) - f * g(x_0)}{x - x_0} = \frac{1}{x - x_0} \left( \int f(x - y) g(y)\,dy - \int f(x_0 - y) g(y)\,dy \right) = \int \left( \frac{f(x - y) - f(x_0 - y)}{x - x_0} \right) g(y)\,dy.$$

Now we claim

$$\frac{f(x - y) - f(x_0 - y)}{x - x_0}$$

converges uniformly to $f'(x_0 - y)$ as $x \to x_0$. Indeed by the mean value theorem it is $f'(x_1 - y)$ for some $x_1$ between $x_0$ and $x$ ($x_1$ may depend on $y$), and by the uniform continuity of $f'$ we have $f'(x_1 - y) \to f'(x_0 - y)$ uniformly. Since the integration only extends over a finite interval, we can interchange the integral with the uniform limit to obtain $\int f'(x_0 - y) g(y)\,dy$ as the limit of the difference quotient for $f * g$ at $x_0$, simultaneously proving that $f * g$ is differentiable and supplying the formula $f' * g$ for the derivative. QED
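The formula $(f * g)' = f' * g$ can be checked numerically on an example. The sketch below assumes a specific $C^1$ function $f(u) = (1 - u^2)^2$ on $[-1, 1]$ (zero outside, with $f'$ vanishing at $\pm 1$) and $g = \cos$; it writes the convolution in the commuted form $\int f(u) g(x - u)\,du$, approximates integrals with a midpoint rule, and compares a central difference quotient of $f * g$ with $f' * g$. All of these choices are illustrative assumptions.

```python
import math

def f(u):
    """C^1 function vanishing outside [-1, 1]: (1 - u^2)^2 inside, zero outside."""
    return (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def f_prime(u):
    """Derivative of f; continuous, and zero at u = -1 and u = 1."""
    return -4.0 * u * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

g = math.cos  # any continuous g would do for the theorem

def convolve(phi, x, steps=2000):
    """Midpoint-rule value of the integral of phi(u) g(x - u) du over [-1, 1]."""
    width = 2.0 / steps
    total = 0.0
    for i in range(steps):
        u = -1.0 + (i + 0.5) * width
        total += phi(u) * g(x - u)
    return total * width

def difference_quotient(x0, step=1e-4):
    """Central difference quotient of f * g at x0."""
    return (convolve(f, x0 + step) - convolve(f, x0 - step)) / (2.0 * step)

for x0 in (-0.7, 0.0, 1.3):
    print(x0, difference_quotient(x0), convolve(f_prime, x0))
```

The two columns should agree to several decimal places; the small discrepancy comes from the difference quotient and the quadrature, not from the theorem.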


question that involves such considerations is the question of best approximation to $f$ by polynomials of degree $\le n$. For any fixed polynomial $P$, we define

$$E(f, P) = \sup\{ |f(x) - P(x)| : x \text{ is in } [a, b] \},$$

the maximum error (the sup is achieved because $f - P$ is continuous and $[a, b]$ compact). We then define $E_n(f) = \inf\{ E(f, P) : P \text{ is any polynomial of degree} \le n \}$. It is not clear that this inf is attained nor that the polynomial attaining the inf, if it exists, is unique. Nevertheless, both statements are true. The Weierstrass approximation theorem implies $\lim_{n \to \infty} E_n(f) = 0$, but we can also ask at what rate $E_n(f)$ vanishes. It turns out that we can relate the rate of convergence to the smoothness of $f$. All these considerations are beyond the scope of this book.

7.5.5 Exercises

1. Show that there exists a polynomial of degree $2n - 1$ satisfying $f(x_k) = a_k$ and $f'(x_k) = b_k$ for $k = 1, \ldots, n$.

2. Let $f$ be $C^1$ on $[a, b]$. Construct a $C^1$ extension of $f$ to the line that vanishes outside $[a - 1, b + 1]$. (Hint: use exercise 1.)

3. If $f$ and $g$ are continuous on the line and $f$ vanishes outside a bounded interval, prove $f * g$ is continuous.

4. Prove that $(f * g) * h = f * (g * h)$ if $f$, $g$, and $h$ are continuous and two of them vanish outside a bounded interval. (Note: this requires interchanging the order of two integrations.)

5. If $f$ is $C^k$ and $g$ is $C^m$ and one of them vanishes outside a bounded interval, prove that $f * g$ is $C^{k+m}$ and $(f * g)^{(k+m)} = f^{(k)} * g^{(m)}$.

6. Define the support of $f$ to be the closure of the set of points where $f \ne 0$. Prove that a continuous function $f$ has compact support if and only if $f$ vanishes outside a bounded interval. Prove that support$(f * g) \subseteq$ support$(f)$ + support$(g)$, where the $+$ means the set of all sums of numbers from support$(f)$ and support$(g)$.


7. If $f$ is $C^1$ on $[a, b]$ prove that there exists a cubic polynomial $P$ such that $f - P$ and its first derivative vanish at the endpoints of the interval.

8. If $f \ge 0$ on $[a, b]$ show that the polynomials approximating $f$ may be all taken $\ge 0$ on $[a, b]$.

9. If $f(c) = 0$ for some point $c$ in $(a, b)$, prove that the polynomials approximating $f$ on $[a, b]$ may be taken to vanish at $c$.

10. Let $f$ be an even function ($f(x) = f(-x)$) on $[-1, 1]$. Prove that if $\int_{-1}^{1} f(x) x^{2k}\,dx = 0$ for $k = 0, 1, 2, \ldots$, then $f \equiv 0$.

11. Prove that none of the power-series expansions of $1/(1 + x^2)$ converge to it on $[-2, 2]$.

12. Let $f$ be defined and $C^1$ on $(a, b)$, and suppose one-sided limits of $f'$ exist at $a$ and $b$. Prove that one-sided limits of $f$ exist at $a$ and $b$ and $f$ can be extended to a $C^1$ function on $[a, b]$.

13. Let $P_n \to f$ uniformly on $[a, b]$ where $\{P_n\}$ is a sequence of polynomials of degree $\le N$. Prove that $f$ is a polynomial of degree $\le N$. (Hint: for each $k \le N$ find a continuous function $h_k(x)$ such that $\int_a^b h_k(x) x^j\,dx = 0$ for all $j \le N$ such that $j \ne k$ but $\int_a^b h_k(x) x^k\,dx = 1$, and consider $\lim_{n \to \infty} \int_a^b h_k(x) P_n(x)\,dx$.)

14. a. For $c_m = \int_{-1}^{1} (1 - x^2)^m\,dx$, obtain the identity $c_m = c_{m-1} - (1/2m) c_m$ by integration by parts.

b. Show that

$$c_m = 2\, \frac{2 \cdot 4 \cdot 6 \cdots (2m)}{3 \cdot 5 \cdot 7 \cdots (2m + 1)} = 2\, \frac{(2^m m!)^2}{(2m + 1)!}.$$

15. Compute $f * f$ for $f$ equal to the characteristic function of $[0, 1]$ (equal to one on the interval, zero elsewhere). Explain why this is called a "hat function".


7.6 Equicontinuity

7.6.1 The Definition of Equicontinuity

Compactness is a powerful method for producing existence theorems; for example, the existence of a point where a continuous function achieves its maximum or minimum on a compact interval. In many problems we need to find functions, rather than points, that maximize or minimize certain quantities (often physical quantities such as energy or entropy are involved). For such problems we need a different notion of compactness. In order to describe one such notion we consider the compactness condition "every sequence of points in a set has a subsequence converging to a point in the set". If we replace the word "point" by "function", we are led to consider the problem of when a sequence of functions has a subsequence that converges. We will limit the discussion to continuous functions defined on a compact interval $[a, b]$, and we will demand uniform convergence. A typical way this problem arises is when we try to minimize some "functional" $E(f)$ over all continuous functions $f$ (for example, $E(f) = \int_0^1 |f(x)|^2\,dx$). If we can show that $E(f)$ is bounded from below, $E(f) \ge c$, then the inf of all the real values $E(f)$ exists and so we can find a sequence of functions $f_n$ such that $E(f_n)$ converges to this inf. If we knew there were a subsequence that converged uniformly, $f_{n_j} \to f$, then we could hope that $\lim_{j \to \infty} E(f_{n_j}) = E(f)$ (in actual applications this is usually the most difficult step) and $f$ would then be a continuous function (the uniform limit of continuous functions) minimizing $E$.

Not every sequence of continuous functions on $[a, b]$ has a uniformly convergent subsequence. The simplest example is $f_n(x) \equiv n$. Here the trouble is that the functions are unbounded. It is natural then to impose the condition that functions be uniformly bounded, meaning that $|f_n(x)| \le M$ for some $M$ for all $n$ and all $x$. Note that the condition that the sequence of functions be bounded at each point (for every $x$ there exists $M_x$ such that $|f_n(x)| \le M_x$ for all $n$) is enough to guarantee that for every point there is a subsequence converging at that point (for every $x_0$ there exists $\{n(k)\}$ depending on $x_0$ such that $\lim_{k \to \infty} f_{n(k)}(x_0)$ exists). The reason we require uniform boundedness (the bound $M$ does not depend on the point $x$) is that it is a property of every uniformly convergent sequence of continuous functions. If $f_n \to f$


An individual continuous function is prevented from jumping around by the continuity condition. Since the domain is assumed compact, we automatically have uniform continuity: for every error $1/m$ there exists $1/n$ such that $|x - y| < 1/n$ implies $|f(x) - f(y)| < 1/m$. If we switch to a different continuous function $g$, the same condition is satisfied, except that the value of $1/n$ may be different. If we look at functions like $\sin kx$ for large $k$ we see that $1/n$ must be taken very small: the larger $k$, the smaller $1/n$. However, if $f_k \to f$ uniformly,

Figure 7.6.1: (graph of $\sin nx$)

so $|f_n(x)| \le 1 + \sup_x |f(x)|$ for $n \ge N$ and we can take for $M$ the largest of $\sup_x |f_n(x)|$ for $n \le N$ and $1 + \sup_x |f(x)|$. Thus we are imposing on the original sequence a condition that must be met by the convergent subsequence.

But we are still far from our goal. It is easy to give examples of uniformly bounded sequences of continuous functions that possess no uniformly convergent subsequences, essentially because they oscillate too much. A simple example is the sequence $\{\sin nx\}$. Figure 7.6.1 shows a typical function from this sequence. We will not give a detailed proof that no convergent subsequence exists, but this is clear from the graph. In order to find the correct condition that will rule out this kind of oscillatory behavior, let us try to determine what kind of behavior a uniformly convergent sequence of continuous functions possesses that rules out such unrestrained agitation.
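One way to make the claim about $\{\sin nx\}$ plausible without a graph is to measure how far apart two members of the sequence are. On $[0, 2\pi]$, for distinct positive integers $j$ and $k$, the mean square of $\sin jx - \sin kx$ equals one, so the sup distance is at least $1$, and no subsequence can satisfy the uniform Cauchy criterion there. The sketch below estimates these sup distances on a grid; the interval $[0, 2\pi]$, the range of indices, and the helper name are assumptions chosen for illustration.

```python
import math

def sup_distance(j, k, points=4000):
    """Grid estimate of sup over [0, 2*pi] of |sin(j x) - sin(k x)|."""
    xs = (2.0 * math.pi * i / points for i in range(points + 1))
    return max(abs(math.sin(j * x) - math.sin(k * x)) for x in xs)

# Smallest pairwise distance among sin(x), sin(2x), ..., sin(12x).
smallest = min(sup_distance(j, k) for j in range(1, 13) for k in range(j + 1, 13))
print(smallest)
```

Since every pair stays at least distance $1$ apart in the sup sense, the terms never get uniformly close to each other, however a subsequence is chosen.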

uniformly, then

$$|f_n(x) - f(x)| \le 1 \quad \text{for all } n \ge N,$$


We can also define equicontinuity at a point, but since we will not use this notion explicitly, we will not give the definition.

We can summarize the discussion so far by saying that any uniformly convergent sequence of continuous functions on a compact interval is

1. uniformly bounded and

2. uniformly equicontinuous.

We will now show that these two conditions imply the existence of a uniformly convergent subsequence. (Note: if you have difficulty pronouncing the words "equicontinuity" and "equicontinuous", try saying

Definition 7.6.1 A sequence of functions $\{f_k\}$ defined on a domain $D$ is said to be uniformly equicontinuous if for every $1/m$ there exists an error $1/n$ (depending only on $1/m$) such that $|x - y| < 1/n$ implies $|f_k(x) - f_k(y)| < 1/m$ for all $k$. Similarly we say any family (possibly uncountable) of functions is uniformly equicontinuous if for every $1/m$ there exists $1/n$ such that $|x - y| < 1/n$ implies $|f(x) - f(y)| < 1/m$ for every function $f$ in the family.
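To get a feel for the definition, it helps to compare a family where one $1/n$ works for every member with one where it does not. The sketch below is an illustration under my own choice of examples: the translates $\sin(x + k)$ all change by at most $\delta$ over a distance $\delta$, while the dilates $\sin(kx)$ change by more and more as $k$ grows. The function `oscillation` only samples pairs at distance exactly $\delta$ on a grid, so it is a lower estimate of the true sup over $|x - y| \le \delta$.

```python
import math

def oscillation(f, delta, points=2000):
    """Grid estimate of sup |f(x + delta) - f(x)| for x in [0, 2*pi].

    Only pairs at distance exactly delta are sampled, so this is a lower
    estimate of the sup over all pairs with |x - y| <= delta.
    """
    xs = (2.0 * math.pi * i / points for i in range(points + 1))
    return max(abs(f(x + delta) - f(x)) for x in xs)

delta = 0.01

# Translates sin(x + k): a single delta controls every member of the family.
shifted = [oscillation(lambda x, k=k: math.sin(x + k), delta) for k in range(1, 51)]

# Dilates sin(k x): the same delta fails once k is large.
dilated = [oscillation(lambda x, k=k: math.sin(k * x), delta) for k in (1, 10, 100, 200)]

print(max(shifted), dilated)
```

For the dilates, no fixed $\delta$ can keep the oscillation below, say, $1$ for all $k$, which is the failure of uniform equicontinuity.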

then the error estimate $1/n$ for the single function $f$ can serve as well for all the functions $f_k$ with $k$ sufficiently large. More precisely, if we choose $1/n$ so that $|x - y| < 1/n$ implies $|f(x) - f(y)| < 1/(3m)$, then $|x - y| < 1/n$ also implies $|f_k(x) - f_k(y)| < 1/m$ once $k$ is large enough that $|f_k(x) - f(x)| < 1/(3m)$ for all $x$. (The proof follows from the three-term estimate

$$|f_k(x) - f_k(y)| \le |f_k(x) - f(x)| + |f(x) - f(y)| + |f(y) - f_k(y)|$$

and the fact that each term is $\le 1/(3m)$.) This still leaves a finite set of functions $f_k$ for which we don't have $|f_k(x) - f(x)| < 1/(3m)$ for all $x$. But they do not cause much trouble because there are a finite number of them. For each $f_k$ there is $1/n_k$ such that $|x - y| < 1/n_k$ implies $|f_k(x) - f_k(y)| < 1/m$ because each $f_k$ is uniformly continuous. Thus by taking the minimum of $1/n_k$ for the finite number of $f_k$ not covered above and the $1/n$ that works for all large $k$, we obtain a single value of $1/N$ such that $|x - y| < 1/N$ implies $|f_k(x) - f_k(y)| < 1/m$ for all $k$. This kind of continuity estimate that is valid for all $f_k$ is referred to as equicontinuity.


and take the diagonal, $f_{11}, f_{22}, f_{33}, \ldots$. The diagonal is a subsequence of the original sequence; in fact, except for the first $k$ terms it is a subsequence of the $k$th row $f_{k1}, f_{k2}, f_{k3}, \ldots$, because every row below the $k$th

7.6.2 The Arzela-Ascoli Theorem

Theorem 7.6.1 (Arzela-Ascoli Theorem) Let $\{f_k\}$ be a sequence of functions on a compact interval that is uniformly bounded and uniformly equicontinuous. Then there exists a uniformly convergent subsequence.

Proof: The idea of the proof is first to obtain a subsequence that converges at every point of a countable dense subset of the domain and then to show that this subsequence converges uniformly. The first step requires the uniform boundedness; the second step requires the uniform equicontinuity.

Let $x_1, x_2, \ldots$ be a countable dense subset of the domain (say all rational numbers in the interval). The sequence of numbers $f_1(x_k), f_2(x_k), \ldots$ for each fixed $k$ is bounded (by the uniform boundedness), so for each $k$ there is a subsequence that converges. The problem is to obtain a subsequence that converges simultaneously at all the points $x_k$. We do this by a diagonalization procedure. First choose a subsequence of $\{f_n\}$ that converges at $x_1$. Call it $f_{11}, f_{12}, f_{13}, \ldots$. Then choose a subsequence of $f_{11}, f_{12}, f_{13}, \ldots$ that converges at $x_2$. Call it $f_{21}, f_{22}, f_{23}, \ldots$. Then this sequence also converges at $x_1$ because it is a subsequence of a sequence that converges at $x_1$. Next we choose a subsequence of $f_{21}, f_{22}, f_{23}, \ldots$ that converges at $x_3$. In this way we obtain a sequence of subsequences of the original sequence, each one a subsequence of the previous one, such that the subsequence $f_{k1}, f_{k2}, f_{k3}, \ldots$ converges at the points $x_1, x_2, \ldots, x_k$. We still do not have a subsequence converging at the infinite set of points $x_1, x_2, \ldots$. To get this we write all our subsequences in an infinite matrix

$$\begin{array}{cccc} f_{11} & f_{12} & f_{13} & \cdots \\ f_{21} & f_{22} & f_{23} & \cdots \\ f_{31} & f_{32} & f_{33} & \cdots \\ \vdots & & & \end{array}$$

the words "continuity" and "continuous" and then prefix them with an unaccented "equi".)


where $x_p$ is chosen from the finite set of points and is near $x$. The middle term will be controlled by the Cauchy criterion at $x_p$, while the other two terms will be controlled by the uniform equicontinuity.

Let the error $1/m$ be given. By the uniform equicontinuity, there exists $1/n$ such that $|x - y| < 1/n$ implies $|f_j(x) - f_j(y)| < 1/(3m)$. Let $x_1, x_2, \ldots, x_r$ be chosen from the countable dense set such that every point $x$ in the domain is within a distance of $1/n$ of at least one of $x_1, x_2, \ldots, x_r$. The number of such points will depend on $1/n$, but it will be finite because

1. the full sequence $x_1, x_2, \ldots$ is dense and

2. the interval is compact (if the countable dense set was all rational numbers in the interval, then we need only choose all points of the form $k/n$ that lie in the interval).

Since $f_k$ converges at each of the points $x_1, x_2, \ldots, x_r$, we can find $N$ such that $j, k \ge N$ implies $|f_j(x_p) - f_k(x_p)| \le 1/(3m)$ for $p = 1, 2, \ldots, r$ (choose $N$ to be the largest value of those required to make this statement true for each $x_p$).

We claim that $j, k \ge N$ implies $|f_j(x) - f_k(x)| \le 1/m$ for any point in the interval. Indeed given any $x$ we find $x_p$ with $|x - x_p| < 1/n$, and

is a subsequence of the $k$th row. Therefore the diagonal sequence converges at $x_k$ (a subsequence of a convergent sequence converges). This is true for all values of $k$, so $f_{11}, f_{22}, f_{33}, \ldots$ is the desired subsequence converging at a countable dense subset.

Now we come to the second step. We need to show that this subsequence actually converges uniformly. To do this we need to verify the uniform Cauchy criterion (given any error $1/m$ there exists $N$ such that $j, k \ge N$ implies $|f_j(x) - f_k(x)| < 1/m$ for all $x$) since we know this implies uniform convergence. How can we accomplish this? The idea is that we need only nail down the Cauchy criterion at a finite set of points (depending on $1/m$) from our countable dense subset, and then the uniform equicontinuity holds down the other points (imagine a finite number of clothespins holding together two clotheslines that are not too wiggly). We will make the comparison

$$|f_j(x) - f_k(x)| \le |f_j(x) - f_j(x_p)| + |f_j(x_p) - f_k(x_p)| + |f_k(x_p) - f_k(x)|$$


7.6.3 Exercises

1. If $f_n$ is a uniformly equicontinuous sequence of functions on a compact interval and $f_n \to f$ pointwise, prove that $f_n \to f$ uniformly. (You should not assume that $f$ is continuous, although this is a consequence of the result.)

In order to apply the Arzela-Ascoli theorem we need a criterion for equicontinuity. Fortunately there is one that is very simple to verify. Suppose all the functions in the sequence are $C^1$ and the derivatives are uniformly bounded, say $|f_k'(x)| \le M$. Then $|f_k(x) - f_k(y)| = |f_k'(z)|\,|x - y| \le M|x - y|$ for some $z$ between $x$ and $y$ by the mean value theorem, and so the sequence is uniformly equicontinuous (choose $1/n = 1/(Mm)$). This is the most common condition used to verify uniform equicontinuity, so we state it separately:

Corollary 7.6.1 Suppose $\{f_k\}$ is a sequence of $C^1$ functions on a compact interval such that $|f_k(x)| \le M$ and $|f_k'(x)| \le M$ for all $k$ and $x$. Then there exists a uniformly convergent subsequence.
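The derivative criterion is easy to test on a concrete family. The sketch below is only an illustration: it assumes the family $f_k(x) = \sin(kx)/k$, for which $|f_k| \le 1$ and $|f_k'| \le 1$, so $M = 1$; for a chosen $m$ it takes $\delta = 1/(Mm)$ as in the argument above and checks on a grid that pairs of points within $\delta$ never differ in value by more than $1/m$, for every $k$ tried.

```python
import math

def f(k, x):
    """f_k(x) = sin(k x) / k, so |f_k| <= 1 and |f_k'| = |cos(k x)| <= 1."""
    return math.sin(k * x) / k

M = 1.0
m = 50
delta = 1.0 / (M * m)        # the choice 1/n = 1/(M m)

xs = [i / 500 for i in range(501)]          # grid of base points in [0, 1]
worst = 0.0
for k in range(1, 41):
    for x in xs:
        # Partner points at several distances not exceeding delta.
        for y in (x + 0.25 * delta, x + 0.5 * delta, x + delta):
            worst = max(worst, abs(f(k, x) - f(k, y)))

print(worst, 1.0 / m)
```

The same $\delta$ serves all $k$ at once, which is exactly what uniform equicontinuity asks for. Contrast this with $\sin(kx)$ itself, whose derivatives are not uniformly bounded.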

Of course the condition of uniform boundedness of derivatives is not necessary for uniform equicontinuity, so there may be cases where the theorem applies but not the corollary. Unfortunately we cannot give here any honest applications of either, since they are all difficult and would take us far afield. (Some of the difficulties arise from the unbalanced nature of the corollary: although we make hypotheses about the derivatives of the functions, we do not obtain any conclusions concerning derivatives.) In a later chapter we will show how the Arzela-Ascoli theorem can be interpreted as a compactness theorem.

then

$$|f_j(x) - f_k(x)| \le |f_j(x) - f_j(x_p)| + |f_j(x_p) - f_k(x_p)| + |f_k(x_p) - f_k(x)| \le \frac{1}{3m} + \frac{1}{3m} + \frac{1}{3m} = \frac{1}{m},$$

the first and third estimates following from $|x - x_p| < 1/n$ and the uniform equicontinuity, and the middle one from the Cauchy criterion at $x_p$. QED


2. If $|f_n(x) - f_n(y)| \le M|x - y|^{\alpha}$ for some fixed $M$ and $\alpha > 0$ and all $x, y$ in a compact interval, show that $\{f_n\}$ is uniformly equicontinuous.

3. Let $\{f_n\}$ be a sequence of $C^\infty$ functions on a compact interval such that for each $k$ there exists $M_k$ such that $|f_n^{(k)}(x)| \le M_k$ for all $n$ and $x$. Prove that there exists a subsequence converging uniformly, together with the derivatives of all orders, to a $C^\infty$ function.

4. Prove that the sequence $f_n(x) = \sin nx$ is not uniformly equicontinuous on any non-trivial compact interval.

5. Give an example of a sequence that is uniformly equicontinuous but not uniformly bounded.

6. Prove that the family of all polynomials of degree $\le N$ with coefficients in $[-1, 1]$ is uniformly bounded and uniformly equicontinuous on any compact interval.

7. Prove that the family of all polynomials $P(x)$ of degree $\le N$ satisfying $|P(x)| \le 1$ on $[0, 1]$ is uniformly equicontinuous on $[0, 1]$.

8. Let $f_1, f_2, \ldots, f_n$ be any finite set of continuous functions on a compact interval. Show that the family of all linear combinations $\sum_{j=1}^{n} a_j f_j$ with all $|a_j| \le 1$ is uniformly bounded and uniformly equicontinuous.

9. Give an example of a uniformly bounded and uniformly equicontinuous sequence of functions on the whole line that does not have any uniformly convergent subsequences.

10. Give an example of a sequence of functions satisfying the hypotheses of the corollary that has no subsequence whose derivatives converge uniformly.

11. Let $\{f_k\}$ be a sequence of uniformly bounded uniformly equicontinuous functions on a bounded open interval $(a, b)$. Show that the functions can be extended to the compact interval $[a, b]$ so that they are still uniformly bounded and uniformly equicontinuous.


12. Let $\{f_k\}$ be a sequence of functions defined on an open interval $(a, b)$ (not necessarily bounded) satisfying $|f_j(x)| \le F(x)$ and $|f_j'(x)| \le G(x)$ for all $j$, where $F$ and $G$ are continuous functions on $(a, b)$. Prove that $\{f_k\}$ has a subsequence that converges uniformly on compact subsets of $(a, b)$. (Hint: use the diagonalization argument after obtaining subsequences converging uniformly on $[a + 1/n, b - 1/n]$.)

13. Suppose $f_1, f_2, \ldots$ is a sequence of functions on a compact interval that is pointwise bounded ($|f_k(x)| \le M(x)$ for all $k$) and pointwise equicontinuous (for each $x$ in the interval and for all $1/m$, there exists $1/n$ such that $|f_k(x) - f_k(y)| \le 1/m$ for all $k$ provided $|x - y| \le 1/n$). Prove that there is a pointwise convergent subsequence.

7.7 Summary

7.1 Complex Numbers

Definition A complex number $z$ is an expression $x + iy$ where $x$ and $y$ are real numbers. Under the operations $(x + iy) + (x' + iy') = x + x' + i(y + y')$ and $(x + iy) \cdot (x' + iy') = xx' - yy' + i(xy' + x'y)$ the complex numbers form a field, denoted $\mathbb{C}$. The modulus or absolute value of a complex number is $|z| = |x + iy| = \sqrt{x^2 + y^2}$.

Theorem $|z \cdot z_1| = |z|\,|z_1|$ and $|z + z_1| \le |z| + |z_1|$ (triangle inequality).

Definition If $z_1, z_2, \ldots$ is a sequence of complex numbers ($z_k = x_k + iy_k$), then $z_k \to z$ if for every $1/n$ there exists $m$ such that $k \ge m$ implies $|z_k - z| < 1/n$.

Theorem $z_k \to z$ if and only if $x_k \to x$ and $y_k \to y$.

Theorem (Completeness of $\mathbb{C}$) If $z_1, z_2, \ldots$ is a Cauchy sequence of complex numbers (for every $1/n$ there exists $m$ such that $j, k \ge m$ implies $|z_j - z_k| \le 1/n$), then there exists a complex number $z$ such that $z_k \to z$.


b. (Root test) If $\sqrt[n]{|x_n|} < r$ for all sufficiently large $n$ and some $r < 1$, then $\sum x_n$ converges absolutely.

a. (Ratio test) If $|x_{n+1}/x_n| < r$ for all sufficiently large $n$ and some $r < 1$, then $\sum x_n$ converges absolutely, while if $|x_{n+1}/x_n| \ge 1$ for all sufficiently large $n$, then $\sum x_n$ diverges.

Theorem 7.2.3

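Theorem 7.2.2 (Comparison test) If $|x_k| \le y_k$ and $\sum_{k=1}^\infty y_k$ converges, then $\sum_{k=1}^\infty x_k$ converges absolutely.

Theorem Absolute convergence implies convergence.

Theorem 7.2.1 (Cauchy criterion) $\sum_{k=1}^\infty x_k$ converges if and only if for every $1/n$ there exists $m$ such that $q \ge p \ge m$ implies $\left|\sum_{k=p}^{q} x_k\right| < 1/n$.

Theorem If $\sum_{k=1}^\infty x_k$ and $\sum_{k=1}^\infty y_k$ are convergent, then so are $\sum_{k=1}^\infty a x_k$ and $\sum_{k=1}^\infty (x_k + y_k)$, and $\sum_{k=1}^\infty a x_k = a \sum_{k=1}^\infty x_k$ and $\sum_{k=1}^\infty (x_k + y_k) = \sum_{k=1}^\infty x_k + \sum_{k=1}^\infty y_k$. If $x_k \le y_k$ for every $k$ then $\sum_{k=1}^\infty x_k \le \sum_{k=1}^\infty y_k$. If $\sum_{k=1}^\infty x_k$ is convergent and $y_k = x_k$ for all but a finite number of $k$, then $\sum_{k=1}^\infty y_k$ is convergent.

Example The geometric series $\sum_{k=1}^\infty r^k = r/(1 - r)$ converges for $0 < r < 1$ and diverges for $r \ge 1$.

Definition $\sum_{k=1}^\infty x_k$ is convergent if $s_1, s_2, \ldots$ is convergent, where $s_n$ denotes the partial sums, $s_n = \sum_{k=1}^n x_k$, and $\sum_{k=1}^\infty x_k = \lim_{n\to\infty} s_n$. Otherwise, the series is said to be divergent. If $\lim_{n\to\infty} s_n = +\infty$, then $\sum_{k=1}^\infty x_k$ is said to diverge to $+\infty$. If $\sum_{k=1}^\infty |x_k|$ is convergent, $\sum_{k=1}^\infty x_k$ is said to be absolutely convergent.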

7.2 Numerical Series and Sequences

Theorem If a complex-valued function $F$ is integrable, then $|F|$ is integrable and $\left|\int_a^b F(x)\,dx\right| \le \int_a^b |F(x)|\,dx$.


Definition Let $f_1(x), f_2(x), \ldots$ be a sequence of functions on a domain $D$. We say $f_n(x) \to f(x)$ pointwise if for every $x$ in $D$, the sequence of numbers $\{f_n(x)\}$ converges to the number $f(x)$. We say $f_n(x) \to f(x)$ uniformly if for every $1/n$ there exists $m$ (not depending on $x$) such that for all $x$ in $D$ and all $k \ge m$ we have $|f_k(x) - f(x)| \le 1/n$.

7.3 Uniform Convergence

Theorem 7.2.5 If $A_1, A_2, \ldots$ is a sequence of positive numbers, $A_1 \ge A_2 \ge \cdots$, and $\lim_{n\to\infty} A_n = 0$, then $\sum_{n=1}^\infty (-1)^n A_n$ converges. More generally $\sum_{n=1}^\infty b_n A_n$ converges if also $|B_n| \le M$ for all $n$, where $B_n = \sum_{k=1}^n b_k$.

Theorem (Summation by Parts) Let $A_1, A_2, \ldots$ and $b_1, b_2, \ldots$ be two sequences and $a_n = A_n - A_{n+1}$, $B_n = \sum_{k=1}^n b_k$; and suppose $A_{n+1}B_n \to 0$. Then $\sum_{n=1}^\infty A_n b_n$ converges if and only if $\sum_{n=1}^\infty a_n B_n$ converges, and $\sum_{n=1}^\infty A_n b_n = \sum_{n=1}^\infty a_n B_n$.

Theorem If $\sum_{m=1}^\infty |a_{nm}|$ is convergent for every $n$ and $\sum_{n=1}^\infty \left(\sum_{m=1}^\infty |a_{nm}|\right)$ is convergent, then $\sum_{n=1}^\infty \left(\sum_{m=1}^\infty a_{nm}\right) = \sum_{m=1}^\infty \left(\sum_{n=1}^\infty a_{nm}\right)$.

Definition A series is said to converge unconditionally if every rearrangement converges, conditionally if it converges but some rearrangement diverges.

Theorem 7.2.4 A series is absolutely convergent if and only if every rearrangement is convergent, in which case the rearrangement is absolutely convergent and has the same limit.

Definition $\sum_{n=1}^\infty b_n$ is a rearrangement of $\sum_{n=1}^\infty a_n$ if $b_n = a_{m(n)}$ for some one-to-one onto function $m(n)$ from the natural numbers to the natural numbers.

Example $\sum_{n=1}^\infty (-1)^n/n$ converges but not absolutely.

Example $\sum_{n=1}^\infty 1/n^a$ converges for $a > 1$ and diverges for $0 \le a \le 1$.


Theorem 7.4.1 The radius of convergence $R$ of $\sum_{n=0}^\infty a_n(x - x_0)^n$ is given by $1/R = \limsup_{n\to\infty} \sqrt[n]{|a_n|}$.

Example The power series $\sum_{n=0}^\infty x^n$, $\sum_{n=1}^\infty (1/n)x^n$, and $\sum_{n=1}^\infty (1/n^2)x^n$ all have radius of convergence $R = 1$, but the first diverges at $x = \pm 1$, the second converges at $x = -1$ but diverges at $x = +1$, while the third converges at $x = \pm 1$.

Definition The radius of convergence of a power series is the unique number $R$ in $[0, \infty]$ such that the series converges for $|x - x_0| < R$ and diverges for $|x - x_0| > R$.

Lemma If a power series converges for $x = x_1$, then it converges absolutely for $|x - x_0| < |x_1 - x_0|$ and uniformly in $|x - x_0| \le r$ for any $r < |x_1 - x_0|$.

Definition A power series about $x_0$ is a series of the form $\sum_{n=0}^\infty a_n(x - x_0)^n$.

7.4 Power Series

Theorem 7.3.5 If $f_n$ are continuous functions on a compact domain $D$, then $f_n(x_n) \to f(x)$ for all sequences $x_n \to x$ in $D$ if and only if $f_n \to f$ uniformly.

Theorem 7.3.4 If $f_n$ are $C^1$ functions on $(a, b)$, $f_n \to f$ pointwise and $f_n' \to g$ uniformly, then $f$ is $C^1$ and $f' = g$.

Theorem 7.3.3 $\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx$ if $f_n(x) \to f(x)$ uniformly on $[a, b]$.

Theorem 7.3.2 Uniform convergence preserves continuity at a point, continuity on the domain, or uniform continuity.

Theorem 7.3.1 (Cauchy criterion) $\{f_n(x)\}$ converges uniformly to some limit function if and only if for every $1/n$ there exists $m$ such that $|f_j(x) - f_k(x)| \le 1/n$ for all $j, k \ge m$ and all $x$ in $D$.


Theorem Given distinct points $x_1, \ldots, x_n$ and values $a_1, \ldots, a_n$, there exists a unique polynomial $P(x)$ of degree $\le n - 1$ such that $P(x_k) = a_k$, namely $P(x) = \sum_{k=1}^n a_k Q_k(x)$ where $Q_k(x) = (q_k(x_k))^{-1} q_k(x)$ and $q_k(x) = \prod_{j \ne k}(x - x_j)$. $P(x)$ is called the Lagrange interpolation polynomial.
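The formula in the Lagrange interpolation theorem can be evaluated directly. The following is a minimal Python sketch, not taken from the text; the function name `lagrange_interpolate` and the sample data are illustrative assumptions.

```python
def lagrange_interpolate(xs, values, x):
    """Evaluate P(x) = sum_k a_k * q_k(x) / q_k(x_k),
    where q_k(x) is the product of (x - x_j) over j != k.

    xs     -- distinct interpolation points x_1, ..., x_n
    values -- prescribed values a_1, ..., a_n
    (Hypothetical helper written for illustration only.)
    """
    total = 0.0
    for k, (xk, ak) in enumerate(zip(xs, values)):
        qk_x = 1.0   # q_k evaluated at x
        qk_xk = 1.0  # q_k evaluated at x_k (nonzero since points are distinct)
        for j, xj in enumerate(xs):
            if j != k:
                qk_x *= x - xj
                qk_xk *= xk - xj
        total += ak * qk_x / qk_xk
    return total
```

With the three points 0, 1, 2 and the values 1, 3, 7 (taken from $x^2 + x + 1$), the interpolant has degree at most 2 and so, by uniqueness, should agree with $x^2 + x + 1$ everywhere.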

7.5 Approximation by Polynomials

Theorem 7.4.5 If $f$ and $g$ are analytic functions and the range of $f$ lies in the domain of $g$, then $g \circ f$ is analytic with the power series obtained by formal substitution of the power series for $f$ in the power series for $g$.

Theorem 7.4.4 If $f$ and $g$ are analytic, then so are $f \pm g$, $f \cdot g$, and $f/g$ (provided $g$ never vanishes), with the power series obtained by formal combination of the series for $f$ and $g$.

Definition An analytic function $f(x)$ on an interval $(a, b)$ is a function with a convergent power-series expansion (non-zero radius of convergence) about each point in the domain.

Theorem 7.4.3 If $f(x) = \sum_{n=0}^\infty a_n(x - x_1)^n$ in $|x - x_1| < R$, then for any $x_2$ satisfying $|x_2 - x_1| < R$ there exists a power series $\sum_{n=0}^\infty b_n(x - x_2)^n$ converging to $f$ at least for $|x - x_2| < R - |x_2 - x_1|$, and

Theorem If $f(x) = \sum_{n=0}^\infty a_n(x - x_0)^n$ with radius of convergence $R > 0$, then $a_n = f^{(n)}(x_0)/n!$. If $\sum_{n=0}^\infty a_n(x - x_0)^n = \sum_{n=0}^\infty b_n(x - x_0)^n$ for $|x - x_0| < R$ and $R > 0$, then $a_n = b_n$ for all $n$.

Theorem 7.4.2 If $f(x) = \sum_{n=0}^\infty a_n(x - x_0)^n$ with radius of convergence $R > 0$, then $f$ is $C^\infty$ in $|x - x_0| < R$ and the series can be differentiated term-by-term.

Examples If $a_n = p(n)/q(n)$ where $p$ and $q$ are polynomials, then $R = 1$. $\sum_{n=0}^\infty x^n/n!$ has $R = +\infty$, while $\sum_{n=0}^\infty n!\,x^n$ has $R = 0$.


Definition 7.6.1 A sequence of functions $\{f_k\}$ on a domain $D$ is said to be uniformly bounded if there exists $M$ such that $|f_k(x)| \le M$ for all $k$ and all $x$ in $D$. It is said to be uniformly equicontinuous if for every

7.6 Equicontinuity

Theorem If $f$ is $C^k$ on $[a, b]$, then there exists a sequence of polynomials $p_n$ with $p_n^{(j)} \to f^{(j)}$ uniformly on $[a, b]$ for all $j \le k$.

Theorem 7.5.2 If $f$ is $C^1$ and vanishes outside a bounded interval and $g$ is continuous, then $f * g$ is $C^1$ and $(f * g)' = f' * g$.

Corollary If $f$ is continuous on $[0, 1]$ and $\int_0^1 f(x)x^n\,dx = 0$ for all $n = 0, 1, \ldots$, then $f \equiv 0$.

Theorem 7.5.1 (Weierstrass Approximation) Any continuous function on a compact interval is the uniform limit, on that interval, of polynomials.

2. $\int_{-\infty}^{\infty} g_n(x)\,dx = 1$, and

3. $\lim_{n\to\infty} \left(\int_{-\infty}^{-1/n} + \int_{1/n}^{\infty}\right) g_n(x)\,dx = 0$.

Lemma 7.5.1 (Approximate Identity) If $g_n$ is an approximate identity and $f$ is a continuous function vanishing outside a bounded interval, then $g_n * f \to f$ uniformly.

1. $g_n \ge 0$,

Definition 7.5.1 An approximate identity is a sequence of continuous functions $g_n$ on the line satisfying

Theorem The convolution product is commutative.

$$f * g(x) = \int_{-\infty}^{\infty} f(x - y)g(y)\,dy.$$

Definition The convolution $f * g$ of two continuous functions on the line, one of them vanishing outside a bounded set, is given by


Corollary 7.6.1 If $\{f_k\}$ is a sequence of $C^1$ functions on $[a, b]$ satisfying $|f_k(x)| \le M$ and $|f_k'(x)| \le M$ for all $k$ and $x$, then it has a uniformly convergent subsequence.

Theorem 7.6.1 (Arzela-Ascoli) A sequence of uniformly bounded and uniformly equicontinuous functions on a compact interval has a uniformly convergent subsequence.

$1/m$ there exists $1/n$ such that $|x - y| < 1/n$ implies $|f_k(x) - f_k(y)| < 1/m$ for all $k$.


This list does not exhaust all the characterizations, but chances are any calculus book will use one of the above. One of the main goals of this section is to show that all five are equivalent, so it is not too important which one we take as the official definition. However, not all of these characterizations are of the same nature; for example, definitions 1 and 5 are directly algorithmic, giving a formula to compute $\exp(x)$ (although neither formula would be a particularly good choice for computing $\exp(100\pi)$). Many of these characterizations require that

5. the limit of $(1 + x/n)^n$ as $n \to \infty$.

4. the inverse of the natural logarithm $\int_1^x 1/t\,dt$,

3. the number $e$ raised to the power $x$,

1. the power series expansion $\exp(x) = \sum_{n=0}^\infty x^n/n!$,

2. the unique solution to the differential equation $f' = f$ with $f(0) = 1$,

8.1.1 Five Equivalent Definitions

The exponential function, $\exp(x)$ or $e^x$, is one of the most important functions in mathematics. There are many ways to characterize it, such as:

8.1 The Exponential and Logarithm

Transcendental Functions

Chapter 8


by the binomial theorem. The rearrangement is justified by the abso­lute convergence of the doubly indexed series.

$$\exp(x)\exp(y) = \sum_{n=0}^\infty \frac{x^n}{n!} \sum_{m=0}^\infty \frac{y^m}{m!} = \sum_{k=0}^\infty \left(\sum_{n+m=k} \frac{x^n y^m}{n!\,m!}\right) = \sum_{k=0}^\infty \frac{(x + y)^k}{k!}$$
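The regrouping of the product of the two series by total degree $k = n + m$ can be checked numerically. The following Python sketch is illustrative only; the function names and the truncation at 40 terms are assumptions made here, not part of the text.

```python
from math import factorial


def exp_series(x, terms=40):
    """Partial sum of the power series sum_{n>=0} x^n / n!.
    (The cutoff `terms` is an arbitrary choice for this sketch.)"""
    return sum(x**n / factorial(n) for n in range(terms))


def degree_k_block(x, y, k):
    """Sum of x^n y^m / (n! m!) over all n + m = k.
    By the binomial theorem this should equal (x + y)^k / k!."""
    return sum(
        x**n * y**(k - n) / (factorial(n) * factorial(k - n))
        for n in range(k + 1)
    )
```

For moderate values such as $x = 0.3$, $y = 1.1$, each block `degree_k_block(x, y, k)` should agree with $(x + y)^k/k!$ up to rounding, and `exp_series(x) * exp_series(y)` should agree with `exp_series(x + y)`.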

Proof: a. The differential equation follows by term-by-term differentiation of the power series, and $\exp(0) = 1$ since $0^n = 0$ for $n \ge 1$. Indeed, we could deduce the form of the power series from the differential equation: if $(\sum a_n x^n)' = \sum a_n x^n$, then $n a_n = a_{n-1}$, so $a_n = a_0/n!$ and $f(0) = 1$ yields $a_0 = 1$.

b. The identity may be established by multiplying the power series and rearranging terms:

c. $\exp(x) > 0$ for any real $x$.

b. $\exp(x + y) = \exp x \cdot \exp y$ for any real $x$ and $y$.

a. The exponential function satisfies the differential equation $f' = f$, with $f(0) = 1$.

Theorem 8.1.1

We have observed in the last chapter that this power series has infinite radius of convergence, so $\exp(x)$ is well defined and is a $C^\infty$ function and the power series converges absolutely and uniformly on any bounded interval; in particular, the series can be rearranged and differentiated term-by-term.

Definition 8.1.1 The exponential function $\exp(x)$ is defined for any real $x$ by $\exp(x) = \sum_{n=0}^\infty x^n/n!$.

certain facts be proved first, for example, that the limit in definition 5 actually exists.

We will take definition 1 for our definition, use it to establish some of the basic properties of the exponential function, and then prove in turn that each of the other descriptions yields the same function.


For the third characterization, we define the real number $e$ to be $\exp(1) = \sum_{n=0}^\infty 1/n! \approx 2.7$. This number is not rational, a fact that is not too difficult to prove, but we will not give the proof here. It is not even algebraic (i.e., it does not satisfy any polynomial equation with rational coefficients), although this is more difficult to establish. For this reason the number $e$ is called transcendental. For a similar reason the function $\exp(x)$ is called transcendental (there is no polynomial $F(x, y)$ in two variables with rational coefficients such that $F(x, \exp x) = 0$, for then $F(1, e) = 0$ would imply that $e$ is algebraic).

Theorem 8.1.3 For every rational number $p/q$ ($p$ and $q$ integers, $q > 0$) we have $\exp(p/q) = (e^p)^{1/q}$. For $x$ real and $x_k = p_k/q_k$ a Cauchy sequence of rationals converging to $x$, $\exp(x) = \lim_{k\to\infty} \exp(x_k) = \lim_{k\to\infty} (e^{p_k})^{1/q_k}$.

Proof: The identity $\exp(x + y) = \exp x \cdot \exp y$ implies $\exp(2) = e^2$, and by induction $\exp(k) = e^k$ for any non-negative integer $k$. From

$$g'(x) = \frac{f'(x)\exp(x) - f(x)\exp(x)}{(\exp x)^2}$$

using the fact that $d/dx\,\exp(x) = \exp(x)$. But $f' = f$ then gives $g' = 0$ for all $x$; hence $g$ is constant, so $f(x) = g(x)\exp(x)$ is a constant multiple of $\exp(x)$. Clearly the constant is 1 if and only if $f(0) = 1$. QED

Proof: We have already established existence. The uniqueness is a special case of a more general theorem to be established in Chapter 11. We give here a simple but tricky proof of this special case.

Look at $g(x) = f(x)/\exp(x)$. Since $\exp(x)$ never vanishes, this is well defined and $g$ is differentiable (it is implicit in writing $f' = f$ that $f$ is differentiable; and since differentiability implies continuity, $f$ is also continuous; hence $f$ is $C^1$ since $f' = f$). We compute

Theorem 8.1.2 $\exp(x)$ is the unique solution of the differential equation $f' = f$ with $f(0) = 1$.

c. For $x > 0$ each term in the power series is positive, so $\exp(x) > 0$. For $x$ negative just use part b to obtain $\exp(x)\exp(-x) = 1$, hence $\exp x = 1/\exp(-x)$ is positive. QED


so $d/dx\,\log x = 1/x$. From the fundamental theorem of the calculus and $\log 1 = 0$ (from $e^0 = 1$) we deduce $\log x = \int_1^x 1/t\,dt$. QED

$$\left.\frac{d}{dx}\log x\right|_{x=x_0} = \frac{1}{\left.\frac{d}{dx}\exp x\right|_{x=y_0}} = \frac{1}{\exp y_0} = \frac{1}{x_0},$$

Proof: By the inverse function theorem $\log x$ is differentiable since $d/dx\,\exp(x) \ne 0$, and the derivative of $\log x$ at $x = x_0$ is the reciprocal of the derivative of $\exp(x)$ at $x = y_0$ where $\exp(y_0) = x_0$. Thus

Theorem 8.1.5 $\log x = \int_1^x 1/t\,dt$.

The inverse function to $\exp$ is called the natural logarithm, denoted $\log x$ or $\ln x$ (or $\log_e x$). It is a function with domain $\mathbb{R}^+$ and range $\mathbb{R}$, increasing, one-to-one, and onto.

Proof: We have seen that $\exp(x)$ assumes only positive values. From the differential equation $d/dx\,\exp(x) = \exp(x)$ and the positivity of $\exp(x)$ we see that $\exp(x)$ is strictly increasing and, hence, one-to-one. To show that it assumes all positive values we observe that $\lim_{n\to\infty} e^n = +\infty$ and $\lim_{n\to\infty} e^{-n} = 0$, since $e > 1$ and $0 < e^{-1} < 1$. Since $\exp$ is continuous, the intermediate value theorem implies that it takes on all positive values. QED

Theorem 8.1.4 The exponential function maps $\mathbb{R}$ one-to-one and onto $\mathbb{R}^+$.

Next we consider the fourth characterization in terms of the inverse function. For this we need to know that the image of the exponential function is the set of positive real numbers, which we denote $\mathbb{R}^+$.

$e = \exp(1/2 + 1/2) = \exp(1/2)^2$ we obtain $\exp(1/2) = e^{1/2}$ (remember $\exp$ is always positive). In a similar way we obtain $\exp(1/n) = e^{1/n}$ and $\exp(p/q) = (e^p)^{1/q}$ for every rational number $p/q$. Since $\exp$ is continuous, we have $\exp(x) = \lim_{k\to\infty} \exp(x_k)$ where $x_k$ is any Cauchy sequence of rational numbers converging to $x$, so $\exp(x) = \lim_{k\to\infty} (e^{p_k})^{1/q_k}$. This shows the limit exists (it would be awkward, although not impossible, to prove this directly), and it justifies writing $\exp(x) = e^x$ even for irrational values of $x$. QED


This completes the proof of the equivalence of all five characterizations of $\exp$.

An important property of the logarithm is that it also has power-series expansions, although the radius of convergence is finite. Starting from $\log x = \int_1^x 1/t\,dt$ we compute

$$\log(1 + x/n)^n = n\log(1 + x/n) = x\left(\frac{\log(1 + x/n) - \log 1}{x/n}\right)$$

since $\log 1 = 0$. But with $x \ne 0$ fixed, $x/n$ goes to zero as $n \to \infty$,

so $(\log(1 + x/n) - \log 1)/(x/n)$ converges to the derivative of $\log x$ at $x = 1$, which is one. This establishes $\lim_{n\to\infty} \log(1 + x/n)^n = x$ as claimed. QED
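The convergence of $(1 + x/n)^n$ to $\exp(x)$, and of $n\log(1 + x/n)$ to $x$, can be observed numerically. The sketch below is an illustration only; the function names are hypothetical, and `math.log1p` from the Python standard library is used to evaluate $\log(1 + t)$ accurately for small $t$.

```python
import math


def compound_limit(x, n):
    """The n-th term (1 + x/n)^n of the sequence in definition 5."""
    return (1.0 + x / n) ** n


def n_log_term(x, n):
    """n * log(1 + x/n), which the proof shows tends to x as n grows."""
    return n * math.log1p(x / n)
```

The convergence is slow: the error in `compound_limit(1.0, n)` relative to $e$ shrinks roughly like $1/n$, which is one way to see why this formula is a poor choice for actual computation.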

since $\exp$ is continuous. Thus it suffices to show $\lim_{n\to\infty} \log(1 + x/n)^n = x$. But

Theorem 8.1.6 $\exp(x) = \lim_{n\to\infty} (1 + x/n)^n$ for any real $x$.

Proof: We start by writing

$$\lim_{n\to\infty} (1 + x/n)^n = \lim_{n\to\infty} \exp\log(1 + x/n)^n = \exp\lim_{n\to\infty} \log(1 + x/n)^n$$

An alternative approach to defining the exponential and logarithm functions is to start with $\log x = \int_1^x 1/t\,dt$. From this definition it is a simple matter to deduce $\log(xy) = \log x + \log y$ by a change of variable argument. Then we can define $\exp$ as the inverse function to $\log$ and deduce the differential equation $\exp' = \exp$ from the inverse function theorem.

The fundamental identity $\exp(x + y) = \exp x \exp y$ and its equivalent $\log(xy) = \log x + \log y$ show that $\exp$ and $\log$ establish an isomorphism between the additive group of $\mathbb{R}$ and the multiplicative group of $\mathbb{R}^+$. Logarithms were invented by Napier to exploit this isomorphism for computational purposes; only recently have these applications become obsolete.

We come now to the fifth characterization of the exponential in terms of a limit.


$$= \sum_{k=0}^\infty \frac{(-1)^k (x - 1)^{k+1}}{k + 1}$$

where we assume $|x - 1| < 1$ in order to have uniform convergence of the $1/(1 + t)$ expansion, justifying the interchange of sum and integral. This is frequently written

$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots.$$

For the power-series expansion of $\log x$ about an arbitrary point $x_0 > 0$ we use $\log x = \log x_0 + \log(x/x_0) = \log x_0 + \log(1 + (x - x_0)/x_0) = \log x_0 + \sum_{k=1}^\infty (-1)^{k+1}((x - x_0)/x_0)^k/k$, which converges for $|x - x_0| < x_0$. Thus $\log x$ is an analytic function.
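As a rough numerical check of the expansion of $\log x$ about $x_0$, one can sum finitely many terms and compare with a library logarithm. This Python sketch is an assumption-laden illustration: the name `log_series`, the cutoff of 200 terms, and the use of `math.log` for the constant term $\log x_0$ are choices made here, not part of the text.

```python
import math


def log_series(x, x0, terms=200):
    """Partial sum of log(x0) + sum_{k>=1} (-1)^(k+1) * u^k / k,
    with u = (x - x0) / x0. Meaningful only for |x - x0| < x0."""
    if not abs(x - x0) < x0:
        raise ValueError("series is only used here for |x - x0| < x0")
    u = (x - x0) / x0
    tail = sum((-1) ** (k + 1) * u**k / k for k in range(1, terms + 1))
    return math.log(x0) + tail
```

Convergence is fast when $|x - x_0|/x_0$ is well below 1 and becomes very slow as this ratio approaches 1, in keeping with the finite radius of convergence.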

We can use $\exp$ and $\log$ to define general powers, $a^b = \exp(b\log a)$ for $a > 0$ and any real $b$, and we can verify that $a^b = (a^p)^{1/q}$ if $b = p/q$ is rational. Also $a^b$ is an analytic function of either variable. If we fix $a > 0$, then $a^x = \exp(x\log a)$ is analytic in $x$ because $\exp$ is (here $\log a$ is just a constant). If we fix $b$, then $x^b = \exp(b\log x)$ is the composition of analytic functions and, hence, analytic in $x > 0$. Similarly $f(x)^{g(x)}$ is analytic if $f$ and $g$ are analytic and $f(x) > 0$. However, there may be points where $f(x) = 0$ and $f(x)^{g(x)}$ is still defined but not analytic (as in $(x^2)^{1/2} = |x|$). The familiar identities $(a^b)^c = a^{bc}$ and $\log(a^b) = b\log a$ can also be easily deduced; we leave these as exercises.

In addition to the identities for $\exp$ and $\log$, we need to understand the asymptotic behavior of these functions. The significance of "exponential growth" is that $\exp(x)$ beats out any polynomial in $x$, that is, $\lim_{x\to\infty} x^{-n}\exp(x) = +\infty$ for any $n$. This is an immediate consequence of the power-series expansion, since $\exp(x) \ge x^n/n!$ for any $n$, if $x > 0$ (just throw away the other positive terms of the power series), and so (substituting $n + 1$ for $n$) $x^{-n}\exp(x) \ge x^{-n}x^{n+1}/(n + 1)! = x/(n + 1)! \to \infty$ as $x \to \infty$. Similarly we have rapid exponential decay as $x \to -\infty$, $\lim_{x\to+\infty} x^n\exp(-x) = 0$ for any $n$, which is just the reciprocal of the exponential growth. For the logarithm we have slow growth, $\lim_{x\to\infty} \log x = +\infty$ but $\lim_{x\to\infty} x^{-a}\log x = 0$ for any $a > 0$. This follows from $\log x = \int_1^x 1/t\,dt \le \int_1^x t^{a-1}\,dt = (x^a - 1)/a$ for any $a > 0$, hence (substituting $a/2$ for $a$) $\lim_{x\to\infty} x^{-a}\log x \le \lim_{x\to\infty} x^{-a}\,\frac{x^{a/2} - 1}{a/2} = 0$. Since $\log 1/x = -\log x$, we have similar estimates near zero: $\lim_{x\to 0} x^a\log x = 0$ for any $a > 0$ (substitute $1/x$ for $x$).
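The two growth statements can be illustrated numerically. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, and $x^{-n}\exp(x)$ is computed as $\exp(x - n\log x)$ only to avoid floating-point overflow for large $x$.

```python
import math


def growth_ratio(x, n):
    """x^(-n) * exp(x) for x > 0, evaluated as exp(x - n*log(x))."""
    return math.exp(x - n * math.log(x))


def log_ratio(x, a):
    """x^(-a) * log(x) for x > 1 and a > 0."""
    return math.log(x) / x**a


def log_upper_bound(x, a):
    """The bound (x^a - 1)/a >= log(x) used in the text, for x >= 1."""
    return (x**a - 1.0) / a
```

Evaluating `growth_ratio` at increasing $x$ with $n$ fixed shows the ratio growing without bound, while `log_ratio` at increasing $x$ with $a$ fixed shows the ratio shrinking toward zero.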


Figure 8.1.1: Graph of $\exp(-1/x^2)$.

With the aid of the exponential function, we can do gluing of functions with matching of all derivatives. The basic tool is the function $f(x) = \exp(-1/x^2)$, which is defined for all $x \ne 0$ and can be extended by defining $f(0) = 0$, since $\lim_{x\to 0} \exp(-1/x^2) = \lim_{x\to\infty} \exp(-x) = 0$. Thus $f(x)$ is a continuous function. We claim that $f$ is in fact a $C^\infty$ function. This is clear at every point $x \ne 0$ (in fact it is analytic on $x > 0$ and $x < 0$). At $x = 0$ the function $f(x)$ has a zero of infinite order, $f(x) = O(|x|^n)$ as $x \to 0$ for every $n$. Indeed the substitution $1/x^2 = t$ shows $\lim_{x\to 0} |x|^{-n}\exp(-1/x^2) = \lim_{t\to+\infty} t^{n/2}\exp(-t) = 0$. From this we can prove by induction that $f^{(n)}(0) = 0$. Since $f(0) = 0$, we have $(f(x) - f(0))/x = f(x)/x \to 0$ as $x \to 0$, so $f'(0) = 0$. Also $f'(x) = 2x^{-3}\exp(-1/x^2) = 2x^{-3}f(x)$ for $x \ne 0$, so $\lim_{x\to 0} f'(x) = 0$, proving that $f$ is $C^1$. It is clear for $x \ne 0$ that $f^{(n)}(x) = Q_n(x)f(x)$ where $Q_n$ is a polynomial in $1/x$, so $\lim_{x\to 0} f^{(n)}(x) = 0$; and if we assume by induction that $f^{(n)}(0) = 0$, then $(f^{(n)}(x) - f^{(n)}(0))/x = Q_n(x)f(x)/x \to 0$ as $x \to 0$, proving $f^{(n+1)}(0) = 0$. Thus $f$ is $C^\infty$ and all derivatives vanish at $x = 0$.

Notice that the Taylor expansions of $f$ about $x = 0$ vanish identically for all orders. Thus $f$ is not analytic at $x = 0$, for the Taylor expansions converge to zero and not to $f(x)$ (if $x \ne 0$). The graph of $f(x)$ is completely flat at $x = 0$, although the function is not constant in a neighborhood of $x = 0$. We can thus glue $f(x)$ for $x > 0$ to the zero function on $x \le 0$ and have a $C^\infty$ function. It is shown in Figure 8.1.1.
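The infinite-order vanishing of $\exp(-1/x^2)$ at the origin can be seen numerically by dividing by powers of $|x|$. This is a minimal Python sketch for illustration; the names `flat` and `flat_over_power` are hypothetical.

```python
import math


def flat(x):
    """f(x) = exp(-1/x^2) for x != 0, extended by f(0) = 0."""
    if x == 0.0:
        return 0.0
    t = 1.0 / x
    return math.exp(-t * t)


def flat_over_power(x, n):
    """f(x) / |x|^n for x != 0; the text shows this tends to 0 as x -> 0."""
    return flat(x) / abs(x) ** n
```

Even for a fairly large exponent such as $n = 10$, the quotient is already astronomically small at $x = 0.1$ and smaller still at $x = 0.05$, consistent with $f(x) = O(|x|^n)$ for every $n$.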

8.1.2 Exponential Glue and Blip Functions


then $g_\lambda(x)$ satisfies all of the above, except that in place of condition 2 we have $g_\lambda(x) = 0$ if $|x| \ge \lambda$. We call $g_\lambda$ a $C^\infty$ approximate identity.

Figure 8.1.2:

vanishes outside $(-1, 1)$ and satisfies $g(0) = 1$. We will refer to $g$ as a blip function centered at $x = 0$. We can obtain other blip functions centered at other points $x = x_0$ by taking $g(\lambda^{-1}(x - x_0))$; this function vanishes outside $|x - x_0| < \lambda$ and satisfies $g(x_0) = 1$.
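The blip function and its rescaled, recentered versions are easy to evaluate. The Python sketch below is illustrative only; it assumes the formula $g(x) = e^2\exp(-1/(x - 1)^2)\exp(-1/(x + 1)^2)$ on $(-1, 1)$ and $g(x) = 0$ elsewhere, and the function names are hypothetical.

```python
import math


def blip(x):
    """g(x) = e^2 * exp(-1/(x-1)^2) * exp(-1/(x+1)^2) on (-1, 1), else 0.
    The three exponentials are combined into one to limit rounding."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(2.0 - 1.0 / (x - 1.0) ** 2 - 1.0 / (x + 1.0) ** 2)


def blip_at(x, x0, lam):
    """g((x - x0)/lam): a blip centered at x0, vanishing for |x - x0| >= lam."""
    return blip((x - x0) / lam)
```

At the center the exponent is $2 - 1 - 1 = 0$, so `blip(0.0)` is exactly 1, matching the normalization $g(0) = 1$.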

We can construct other $C^\infty$ functions using blip functions. The idea is to use the convolution of a blip function with an arbitrary continuous function. Let $g(x)$ be a blip function satisfying

1. $g(x)$ is $C^\infty$,

2. $g(x) = 0$ if $|x| \ge 1$,

3. $g(x) \ge 0$, and

4. $\int_{-\infty}^{\infty} g(x)\,dx = \int_{-1}^{1} g(x)\,dx = 1$.

This involves nothing more serious than multiplying the previous blip function by a constant to obtain condition 4. If we set

$$g(x) = \begin{cases} e^2 \exp\left(-1/(x - 1)^2\right)\exp\left(-1/(x + 1)^2\right), & -1 < x < 1,\\ 0, & x \ge 1 \text{ or } x \le -1, \end{cases}$$

Using this idea we can create $C^\infty$ functions to suit every need. For example, the $C^\infty$ function shown in Figure 8.1.2,


if $x < a - 3\varepsilon$ or $x > b + 3\varepsilon$, again for $\lambda < \varepsilon$. Thus $F = f * g_\lambda$ is a $C^\infty$ function satisfying $F(x) \equiv 1$ on $[a, b]$ and $F(x) \equiv 0$ outside $[a - 3\varepsilon, b + 3\varepsilon]$. We also observe that $0 \le F(x) \le 1$ for every $x$ because $f$ satisfies the same estimate. The graph of $F$ is shown in Figure 8.1.4.

$$f * g_\lambda(x) = \int_{-\lambda}^{\lambda} f(x - y)g_\lambda(y)\,dy = \int_{-\lambda}^{\lambda} 0 \cdot g_\lambda(y)\,dy = 0$$

since $f(x - y) = 1$ for all values of $x - y$ in the integral. We also have

$$f * g_\lambda(x) = \int_{-\lambda}^{\lambda} f(x - y)g_\lambda(y)\,dy = \int_{-\lambda}^{\lambda} g_\lambda(y)\,dy = 1$$

Since $f \equiv 1$ on $[a - \varepsilon, b + \varepsilon]$, we have for $\lambda < \varepsilon$ and $x$ in $[a, b]$

Figure 8.1.3: Graph of $f$, with the points $a - 2\varepsilon$, $a - \varepsilon$, $b + \varepsilon$, $b + 2\varepsilon$ marked on the axis.

If $f$ is any continuous function, then we can form $f * g_\lambda$. As we saw in the discussion of the Weierstrass approximation theorem, $f * g_\lambda$ will be a $C^\infty$ function ($(f * g_\lambda)^{(k)} = f * g_\lambda^{(k)}$) and $f * g_\lambda$ will converge to $f$ as $\lambda \to 0$, uniformly on compact intervals. This procedure is called regularization. It differs from what we did in the proof of the Weierstrass approximation theorem only in that $g_\lambda$ are $C^\infty$ (the polynomials $(1 - x^2)^n$ were joined up to zero at $x = \pm 1$ creating discontinuities in the derivatives of order $\ge n$). Regularization is an important technique, although most of its applications are beyond the scope of this work.

Suppose we apply regularization to the function $f(x)$ whose graph is shown in Figure 8.1.3.


Theorem 8.1.7 (Borel) Given any sequence $a_0, a_1, a_2, \ldots$ of real (or complex) numbers and given any point $x_0$ and neighborhood $|x - x_0| < \lambda$,

$$f^{(n)}(0) = n!\,b_n + \left.(d/dx)^n\left(\sum_{k=0}^{n-1} b_k x^k g(\lambda^{-1}x)\right)\right|_{x=0} = a_n.$$

Since the $k$th equation expresses $k!\,b_k$ in terms of $a_k$ and the previously determined $b_0, b_1, \ldots, b_{k-1}$, there is a unique solution. Notice that it would not hurt to vary $\lambda$ with $k$ (taking $\lambda_k \le \lambda$). This is the important observation that enables us to control all derivatives simultaneously.

$$f(0) = b_0 = a_0$$
$$f'(0) = b_1 + \left.d/dx\left(b_0 g(\lambda^{-1}x)\right)\right|_{x=0} = a_1$$
$$f''(0) = 2b_2 + \left.d^2/dx^2\left(b_0 g(\lambda^{-1}x) + b_1 x g(\lambda^{-1}x)\right)\right|_{x=0} = a_2$$

8.1.3 Functions with Prescribed Taylor Expansions*

Using blip functions we can build $C^\infty$ functions having prescribed Taylor expansions. The key observation is that the function $f_k(x) = g(\lambda^{-1}x)x^k/k!$ satisfies $(d/dx)^k f_k(x)|_{x=0} = 1$ but $(d/dx)^j f_k(x)|_{x=0} = 0$ for $j < k$. The reason for this is that $(d/dx)^j x^k|_{x=0} = 0$ unless $j = k$, so by the product rule for derivatives each term of $(d/dx)^j g(\lambda^{-1}x)(x^k/k!)$ contains a factor vanishing at $x = 0$ if $j < k$ and when $j = k$ the only term not vanishing is $g(\lambda^{-1}x)(d/dx)^k(x^k/k!)$, which gives 1 at $x = 0$. Thus if we want a $C^\infty$ function $f$ vanishing outside $|x| < \lambda$ with $(d/dx)^k f(x)|_{x=0} = a_k$ for $k = 0, \ldots, n$ we need only take $f = \sum_{k=0}^n b_k x^k g(\lambda^{-1}x)$ and then successively solve the linear equations

Figure 8.1.4: Graph of $y = F(x)$, with the points $a$ and $b$ marked on the axis.


$$n!\,b_n + \left.(d/dx)^n\left(\sum_{k=0}^{n-1} b_k x^k g(\lambda_k^{-1}x)\right)\right|_{x=0} = a_n.$$

We will use this equation to define inductively $b_0, b_1, \ldots$. Notice that the value of $\lambda_n$ does not enter into this equation or any earlier one, so we are free to choose $\lambda_n$ after seeing what $b_0, \ldots, b_n$ are, and the choice of $\lambda_n$ will not change $b_0, b_1, \ldots, b_n$. Therefore we choose $\lambda_n$ to make $(d/dx)^k(b_n x^n g(\lambda_n^{-1}x))$ very small for all $k < n$ and all $x$, say $|(d/dx)^k(b_n x^n g(\lambda_n^{-1}x))| \le 2^{-n}$ for all $k < n$ and all $x$. Why is this possible? Essentially because the function $x^n g(x)$ vanishes to order $n$ at $x = 0$, so by taking $\lambda_n$ very small we can nip it off before it has grown very much. This is easiest to see for $k = 0$, where there are no derivatives involved. Then we can simply estimate

$$|b_n x^n g(\lambda_n^{-1}x)| \le |b_n|\lambda_n^n$$

since $g(\lambda_n^{-1}x) = 0$ if $|x| \ge \lambda_n$, $|g(\lambda_n^{-1}x)| \le 1$ in general, and $|x^n| \le \lambda_n^n$ if $|x| \le \lambda_n$. Thus, no matter how large $b_n$ may be, we can make $|b_n|\lambda_n^n \le 2^{-n}$ by taking $\lambda_n \le 1/(2|b_n|^{1/n})$. But even when there are derivatives involved, there are less than $n$ of them. If $k = 1$ we have

$$d/dx\left(b_n x^n g(\lambda_n^{-1}x)\right) = n b_n x^{n-1} g(\lambda_n^{-1}x) + b_n \lambda_n^{-1} x^n g'(\lambda_n^{-1}x)$$

there exists a $C^\infty$ function $f$ vanishing outside the neighborhood and satisfying $f^{(k)}(x_0) = a_k$ for all $k$.

Proof: For simplicity we take $x_0 = 0$. (For the general case just replace $x$ by $x - x_0$.) We want $f(x) = \sum_{k=0}^\infty b_k x^k g(\lambda_k^{-1}x)$ where $b_k$ and $\lambda_k < \lambda$ are to be determined. The idea of the proof is that while we cannot control the values of $b_k$ (they will be determined as above by solving linear equations to make $f^{(k)}(0) = a_k$), we can control the values of $\lambda_k$. Thus if we insist that $\lambda_k \to 0$ as $k \to \infty$, the sum defining $f(x)$ is actually finite for any fixed $x$. Indeed, if $x = 0$ only the first term is nonzero; while if $x \ne 0$, then $g(\lambda_k^{-1}x) = 0$ once $\lambda_k \le |x|$ (because $g(\lambda_k^{-1}x)$ vanishes outside $|x| < \lambda_k$). Of course the number of non-zero terms increases without bound as $x \to 0$.

Assuming we can differentiate the series term-by-term at $x = 0$ (this is true for $x \ne 0$ because the series is finite, but it is not clear at $x = 0$, and indeed it won't be true unless we choose $\lambda_k$ cleverly), we find the following equation for the condition $f^{(n)}(0) = a_n$:


Using this theorem we can glue flaps onto $C^\infty$ functions on an interval. Suppose $f$ is $C^\infty$ on $[a, b]$ (by this we mean one-sided derivatives exist at the endpoints, and $f^{(n)}$ is continuous on $[a, b]$ for all $n$). Then by constructing $C^\infty$ functions $f_a$ and $f_b$ to match up all derivatives

converges uniformly. Thus $f$ is $C^\infty$ and term-by-term differentiation is justified, proving $f^{(n)}(0) = a_n$. Finally we note that the conditions on $\lambda_n$ were all of the nature that $\lambda_n$ must be sufficiently close to zero, so we can arrange to have $\lambda_n \le \lambda$ for any prescribed $\lambda > 0$, proving that $f$ vanishes outside $|x| < \lambda$. QED

Since $n - k > 0$, we can make this as small as we like by taking $\lambda_n$ small enough. There are only a finite number of such terms in $(d/dx)^k(b_n x^n g(\lambda_n^{-1}x))$, so we can bound this uniformly by $2^{-n}$. Finally, we can do this for all $k < n$ since there are only a finite number of $k$.

This completes the description of how to choose the $b_n$'s and $\lambda_n$'s. We still have to justify the term-by-term differentiation of $f(x) = \sum_{k=0}^\infty b_k x^k g(\lambda_k^{-1}x)$ in order to conclude that $f^{(n)}(0) = a_n$. To do this we use the estimates $|(d/dx)^k b_n x^n g(\lambda_n^{-1}x)| \le 2^{-n}$ for all $k < n$ and all $x$. These estimates imply that the differentiated series converges uniformly, since for any fixed $k$ the condition $k < n$ is satisfied for all but a finite number of $n$, so

$$\left|c_j b_n x^{n-j}(\lambda_n^{-1})^{k-j} g^{(k-j)}(\lambda_n^{-1}x)\right| \le c_j M_{k-j} |b_n| \lambda_n^{n-j}(\lambda_n^{-1})^{k-j} = c_j M_{k-j} |b_n| \lambda_n^{n-k}.$$

where $c_j$ are combinatorial coefficients and $j \le k < n$. Each such term can be estimated by the same reasoning, using $g^{(k-j)}(\lambda_n^{-1}x) = 0$ for $|x| \ge \lambda_n$ and $|g^{(k-j)}(\lambda_n^{-1}x)| \le M_{k-j}$ where $M_{k-j}$ is the sup of $|g^{(k-j)}|$, some finite number independent of $\lambda_n$, and $|x^{n-j}| \le \lambda_n^{n-j}$ on $|x| \le \lambda_n$. We obtain

and in general $(d/dx)^k(b_n x^n g(\lambda_n^{-1}x))$ is a sum of terms of the form

$$c_j b_n x^{n-j}(\lambda_n^{-1})^{k-j} g^{(k-j)}(\lambda_n^{-1}x)$$


7. For which values of $a$ is the improper integral $\int_2^\infty \frac{1}{x|\log x|^a}\,dx$ finite?

6. For which values of $a$ is the improper integral $\int_0^{1/2} \frac{1}{x|\log x|^a}\,dx$ finite?

5. Show that $f' = \lambda f$ for a real constant $\lambda$ has only $ce^{\lambda x}$ as solutions.

4. Show directly that if $(1 + x/n)^n$ converges uniformly to a function $f$ on a compact interval, then $f$ is $C^1$ and $f' = f$.

3. Compute $d/dx\,(f(x)^{g(x)})$ if $f$ and $g$ are $C^1$, $f(x) > 0$.

2. Find $\lim_{x\to 0} x^x$.

1. Using $\log x = \int_1^x 1/t\,dt$ show $\log(xy) = \log x + \log y$.

8.1.4 Exercises

a $C^\infty$ function on the whole line extending $f$ and vanishing outside $[a - 1, b + 1]$.

This theorem can be paraphrased as saying that there are no a priori restrictions on the infinite Taylor expansion of a $C^\infty$ function, since the Taylor expansion is determined by the derivatives at the point. In particular, the infinite Taylor series expansion about a point $x_0$ may diverge at every point except $x_0$.

$$F(x) = \begin{cases} f_a(x) & \text{if } x < a,\\ f(x) & \text{if } a \le x \le b,\\ f_b(x) & \text{if } x > b \end{cases}$$

with $f$ at $x = a$ and $x = b$ and to vanish outside $[a - 1, b + 1]$, we can obtain in


16. Show that any function of the form $f(x) = cx^a$ will have a straight line graph on log-log graph paper. What is the slope of the line?

15. For $f(t) = e^{at}$ ($a$ positive) define the doubling time as the time $T$ such that $f(t + T) = 2f(t)$. Show that $T$ is well defined, and find the relationship between $a$ and $T$.

14. For $f(t) = e^{-at}$ ($a$ positive) define the half-life as the time $T$ such that $f(t + T) = f(t)/2$. Show that $T$ is well defined, and find the relationship between $a$ and $T$.

13. Give an interpretation of definition 5 in terms of compound interest rates. What does it say about interest that is compounded instantaneously?

12. Prove that any $C^1$ real-valued function satisfying $f(x + y) = f(x)f(y)$ must be $\exp(ax)$ for some real $a$. (Hint: differentiate the identity.)

11. Show that there exists a $C^\infty$ function on $(a, b)$ having prescribed derivatives of all orders on any sequence of distinct points $x_1, x_2, \ldots$ with no limit points in $(a, b)$.

10. Show that there exists a $C^\infty$ function on the line having prescribed derivatives of all orders on any sequence of distinct points $x_1, x_2, \ldots$ with no finite limit point.

9. Show that for every closed set $A$ there exists a $C^\infty$ function on the line such that $f(x) = 0$ if and only if $x$ is in $A$.

8. For which values of $a$ is the improper integral $\int_{1/2}^{2} \frac{1}{x|\log x|^a}\,dx$ finite?


$$\theta = \int_0^{\sin\theta} \sqrt{1 + g'(y)^2}\,dy = \int_0^{\sin\theta} \sqrt{1 + \left(\frac{-y}{\sqrt{1 - y^2}}\right)^2}\,dy = \int_0^{\sin\theta} \frac{1}{\sqrt{1 - y^2}}\,dy.$$

For $0 \le \theta \le \pi/2$ this is consistent with the sides of a right triangle definition (taking the radian measurement of angles, equal to the length of the arc of the unit circle subtended), and for other values of $\theta$ it is consistent with the usual conventions. In order to turn this into an analytic definition we use the calculus formula for the arc length of a piece of the circle. We can compute the length of the piece of the circle between $0$ and $\theta$ (for $\theta < \pi/2$) by considering it as the graph of the function $g(y) = \sqrt{1 - y^2}$ as $y$ varies between $0$ and $\sin\theta$, as in Figure 8.2.2:

Figure 8.2.1:

Our approach to defining the sine and cosine functions will be indirect: first we will obtain the inverse functions, which will enable us to define sine and cosine on an interval, and then we will extend the definition to the whole line. Along the way we will derive some important properties of these functions. We begin with the geometric idea that $(\cos\theta, \sin\theta)$ should be the $x$- and $y$-coordinates of the point obtained by measuring a length $\theta$ along the unit circle from the point $(1, 0)$, with the counterclockwise direction taken as positive, as in Figure 8.2.1.

8.2.1 Definition of Sine and Cosine

8.2 Trigonometric Functions


$$\frac{\pi}{2} = \arcsin 1 = \int_0^1 \frac{1}{\sqrt{1-y^2}}\,dy = \lim_{x\to 1}\int_0^x \frac{1}{\sqrt{1-y^2}}\,dy,$$

which is simply a translation of the geometric definition of $\pi$ as half the length of the unit circle.
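The integral definition of arcsin (and hence of $\pi$) can be checked numerically. The sketch below is only an illustration under stated assumptions: the function name `arcsin_by_integral` and the step count are hypothetical choices, not part of the text. It uses a midpoint rule, which never evaluates the integrand at the singular endpoint $y = 1$.

```python
import math

def arcsin_by_integral(x, steps=200_000):
    """Midpoint-rule approximation of the integral of 1/sqrt(1 - y^2) from 0 to x.

    Hypothetical helper for illustration; assumes -1 <= x <= 1.
    """
    h = x / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h                      # midpoint of the k-th subinterval
        total += 1.0 / math.sqrt(1.0 - y * y)  # integrand is finite at every midpoint
    return total * h

# For |x| < 1 the integrand is bounded on [0, x] and the rule converges quickly.
print(arcsin_by_integral(0.5), math.asin(0.5))

# At x = 1 the integral is improper: the rule still converges, but only slowly.
print(2 * arcsin_by_integral(1.0), math.pi)
```

The slow convergence at $x = 1$ reflects the $1/\sqrt{1-y}$ singularity of the integrand; agreement with the library values is a sanity check, not a proof.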

in $[0,1)$ and so the improper integral exists and is finite (recall that for a local singularity the cutoff point for integrability of $|x - x_0|^{-\alpha}$ is $\alpha = 1$).

We can now define the number $\pi$ by

and the first term remains bounded as $y \to 1$. Thus,

$$\frac{1}{\sqrt{1-y^2}} \le \frac{c}{\sqrt{1-y}}$$

$$\frac{1}{\sqrt{1-y^2}} = \frac{1}{\sqrt{1+y}} \cdot \frac{1}{\sqrt{1-y}}$$

Upon examining this formula we find that it does not in fact furnish a definition of $\sin\theta$ but rather of its inverse function arcsin. It tells us how to compute $\theta$ from $\sin\theta$. If we substitute $x = \sin\theta$ so that $\theta = \arcsin x$ we have

$$\arcsin x = \int_0^x \frac{1}{\sqrt{1-y^2}}\,dy, \quad \text{for } -1 < x < 1$$

(because of our sign conventions this formula is also valid for negative values of $x$). For $x = 1$ we have an improper integral since the integrand tends to $+\infty$. However, note that

Figure 8.2.2:



Figure 8.2.3:

Since the equation of the circle is $x^2 + y^2 = 1$ and the right half circle is given by $x > 0$, we can define $\cos\theta = +\sqrt{1 - \sin^2\theta}$ for $-\pi/2 < \theta < \pi/2$. This gives the $x$-coordinate of the point on the circle with angle $\theta$. The formula for the derivative of $\sin\theta$ now simplifies to

$$\frac{d}{d\theta}(\sin\theta) = \sqrt{1 - \sin^2\theta} = \cos\theta,$$


such that

$$\frac{d}{d\theta}\sin\theta = 1\Big/\left(\frac{d}{dx}\arcsin x\right) = \sqrt{1-x^2} \quad \text{for } x = \sin\theta.$$

Thus on the right half circle we have recovered the geometric description of $\sin\theta$ as the $y$-coordinate of the point on the unit circle where the angle $\theta$ is measured by arc length along the circle from the point $(1, 0)$, as shown in Figure 8.2.3.

The function $\arcsin x$ is defined on $(-1,1)$ and is $C^1$ with derivative $1/\sqrt{1-x^2}$ by the fundamental theorem of the calculus (note the derivative does not exist at $x = \pm 1$). Since the derivative is strictly positive, we can invoke the inverse function theorem to define $\sin\theta$ on $(-\pi/2, \pi/2)$, the image of $\arcsin x$ on $(-1,1)$ (note that the continuity of $\arcsin x$ up to $x = \pm 1$ guarantees that $(-\pi/2, \pi/2)$ is the image of $(-1, 1)$). The inverse function theorem tells us that there is a unique $C^1$ function $\sin\theta$ defined on $(-\pi/2, \pi/2)$ taking values in $(-1, 1)$ satisfying

$$\theta = \int_0^{\sin\theta} \frac{1}{\sqrt{1-y^2}}\,dy,$$


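$$\sin\theta = \cos\left(\theta - \frac{\pi}{2}\right), \qquad \cos\theta = -\sin\left(\theta - \frac{\pi}{2}\right)$$

for $0 < \theta < \pi/2$. Indeed suppose $(x,y) = (\cos\theta, \sin\theta)$ is the point on the unit circle corresponding to angle $\theta$ in $0 < \theta < \pi/2$. Then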

holding on this interval. Next we will extend the functions $\sin\theta$ and $\cos\theta$ to the whole line

in accordance with our geometric definition so that these identities continue to hold. The easiest way to do this is first to establish the identities

Figure 8.2.4:


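The graphs of the functions $\sin\theta$ and $\cos\theta$ for $\theta$ in $(-\pi/2, \pi/2)$ are shown in Figure 8.2.4. We have the fundamental identities

$$\sin^2\theta + \cos^2\theta = 1, \qquad \frac{d}{d\theta}\sin\theta = \cos\theta, \qquad \frac{d}{d\theta}\cos\theta = -\sin\theta,$$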

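$$\frac{d}{d\theta}(\cos\theta) = \frac{d}{d\theta}\sqrt{1-\sin^2\theta} = \frac{-\sin\theta\,(d/d\theta)\sin\theta}{\sqrt{1-\sin^2\theta}} = \frac{-\sin\theta\cos\theta}{\cos\theta} = -\sin\theta.$$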

and by differentiating the equation defining $\cos\theta$ we obtain


Figure 8.2.5:

The portion of the arc from $(1, 0)$ to $(x, y)$ has length $\int_0^y 1/\sqrt{1-t^2}\,dt$ and the portion from $(y, -x)$ to $(1, 0)$ has length $\int_y^1 1/\sqrt{1-t^2}\,dt$ (here we are using the $x$-coordinate as parameter, or equivalently the identity $\int_y^1 1/\sqrt{1-t^2}\,dt = \int_0^{\sqrt{1-y^2}} 1/\sqrt{1-s^2}\,ds$, which follows from the substitution $s = \sqrt{1-t^2}$). Thus the total length is

$$\int_0^y \frac{1}{\sqrt{1-t^2}}\,dt + \int_y^1 \frac{1}{\sqrt{1-t^2}}\,dt = \int_0^1 \frac{1}{\sqrt{1-t^2}}\,dt = \frac{\pi}{2}$$

as desired. This result is clear on geometric grounds because the radii joining the origin to the points $(x, y)$ and $(y, -x)$ are perpendicular and perpendicular radii cut the circle in quarters.

Now the identity $\sin\theta = \cos(\theta - \pi/2)$ allows us to define $\sin\theta$ in the interval $[\pi/2, \pi)$, since $\cos(\theta - \pi/2)$ will already be defined. Similarly we can define $\cos\theta = -\sin(\theta - \pi/2)$ for $\theta$ in $[\pi/2, \pi)$, and then we can extend sine and cosine for $[\pi, 3\pi/2)$, and so on. The extension

$(y, -x)$ is also a point on the unit circle (see Figure 8.2.5). It clearly lies in the fourth quadrant, so we must have $(\sin\theta, -\cos\theta) = (y, -x) = (\cos\phi, \sin\phi)$ for some $\phi$ in $-\pi/2 < \phi < 0$. We need to show $\phi = \theta - \pi/2$ in order to have the desired identities. In other words, we need to show that the arc length along the circle from $(y, -x)$ to $(x, y)$ is $\pi/2$, one quarter of the circle.

Since $(x, y)$ is in the first quadrant and $(y, -x)$ is in the fourth quadrant, $(1, 0)$ is an intermediate point.


Another way to obtain a global definition of sine and cosine is via power series. Since we can easily compute the derivatives of sine and cosine of all orders, we find the infinite Taylor expansions about the origin are

Figure 8.2.6:


to values of $\theta \le -\pi/2$ is accomplished in a similar way. Notice what happens to $\sin\theta$ near $\theta = \pi/2$. We have $\sin\theta = \cos(\theta - \pi/2)$ (for $\theta < \pi/2$ because we proved it, and for $\theta \ge \pi/2$ because we defined it), so $\sin\theta$ is continuous and differentiable at $\theta = \pi/2$ and $\frac{d}{d\theta}\sin\theta\big|_{\theta=\pi/2} = \frac{d}{d\theta}\cos(\theta - \pi/2)\big|_{\theta=\pi/2} = -\sin(\theta - \pi/2)\big|_{\theta=\pi/2} = 0$.

It is easy to verify that the functions $\sin\theta$ and $\cos\theta$ so extended agree with the geometric definition in terms of $(\cos\theta, \sin\theta)$ giving the coordinates of a point on the unit circle a distance $\theta$ from $(1, 0)$ in the counterclockwise direction, and they continue to satisfy the identities

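$$\sin^2\theta + \cos^2\theta = 1,$$

$$\frac{d}{d\theta}\sin\theta = \cos\theta, \qquad \frac{d}{d\theta}\cos\theta = -\sin\theta,$$

$$\sin\theta = \cos\left(\theta - \frac{\pi}{2}\right), \qquad \cos\theta = -\sin\left(\theta - \frac{\pi}{2}\right).$$

Iterating the last two identities we find $\sin\theta = -\sin(\theta - \pi)$, $\cos\theta = -\cos(\theta - \pi)$, $\sin\theta = \sin(\theta - 2\pi)$, $\cos\theta = \cos(\theta - 2\pi)$, so sine and cosine are periodic functions of period $2\pi$. Their graphs are shown in Figure 8.2.6. Since sine and cosine are continuous, the derivative formulas imply they are $C^\infty$, $(d/d\theta)^{2k}\sin\theta = (-1)^k\sin\theta$, etc.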


Proof: This result is a consequence of a general uniqueness theorem for ordinary differential equations that we will establish in a later chapter. However, there is a simpler proof of this special case, modeled on the proof of the uniqueness of solutions of the exponential differential equation. We observe that the complex-valued function $F(x) = g(x) + if(x)$ satisfies the differential equation $F'(x) = -f(x) + ig(x) = iF(x)$, as does the function $\operatorname{cis} x = \cos x + i\sin x$. Also, both $F$ and cis take on the value 1 at $x = 0$. Therefore we want to form $F(x)/\operatorname{cis} x$ and show that the derivative is zero. We note that $|\operatorname{cis} x|^2 = \cos^2 x + \sin^2 x = 1$, so $\operatorname{cis} x \neq 0$; hence we can divide by it.

Lemma 8.2.1 If $f$ and $g$ are $C^1$ functions on the line satisfying $f' = g$ and $g' = -f$ and $f(0) = 0$, $g(0) = 1$, then $f(x) = \sin x$ and $g(x) = \cos x$.

We consider them as power series. From the size of the coefficients (they are comparable with the exponential power series) we see that they converge for all $x$. However, we cannot conclude from this that they converge to $\sin x$ and $\cos x$ (as we have seen the Taylor expansion of $\exp(-1/x^2)$ about $x = 0$ converges to zero). This will turn out to be true, but it will require a proof, and a rather indirect proof, since our definition of sine and cosine via the arcsine does not give us direct information about the power series. We do observe that $S(x)$ and $C(x)$ satisfy the same sort of differential equations as sine and cosine, $S'(x) = C(x)$ and $C'(x) = -S(x)$; these follow simply by differentiating the power series term-by-term. It would seem plausible that these differential equations essentially characterize sine and cosine, at least with the conditions $S(0) = 0$ and $C(0) = 1$, which are clearly satisfied. Thus we will have $S(x) = \sin x$ and $C(x) = \cos x$ for all $x$ as a consequence of the following lemma.

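$$S(x) = \sum_{k=0}^\infty (-1)^k \frac{x^{2k+1}}{(2k+1)!}, \qquad C(x) = \sum_{k=0}^\infty (-1)^k \frac{x^{2k}}{(2k)!}.$$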
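As a numerical illustration of the power series $S(x)$ and $C(x)$, the following sketch evaluates partial sums and compares them with the library sine and cosine. The function names `S` and `C` mirror the text; the default of 30 terms is an arbitrary assumption. Agreement of the numbers does not replace the proof via Lemma 8.2.1.

```python
import math

def S(x, terms=30):
    """Partial sum of sum_k (-1)^k x^(2k+1) / (2k+1)!."""
    term, total = x, 0.0
    for k in range(terms):
        total += term
        # ratio of consecutive terms: -x^2 / ((2k+2)(2k+3))
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def C(x, terms=30):
    """Partial sum of sum_k (-1)^k x^(2k) / (2k)!."""
    term, total = 1.0, 0.0
    for k in range(terms):
        total += term
        # ratio of consecutive terms: -x^2 / ((2k+1)(2k+2))
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total

for x in (0.3, 1.0, 2.0, 5.0):
    print(x, S(x) - math.sin(x), C(x) - math.cos(x), S(x) ** 2 + C(x) ** 2)
```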


The connection between sines and cosines and the exponential function is now clear from two perspectives. We have seen that $F(x) = \cos x + i\sin x$ satisfies $F'(x) = iF(x)$, the same differential equation

8.2.2 Relationship Between Sines, Cosines, and Complex Exponentials

(the values at $x = 0$ providing the constants 1 and 0). We get $f(x) = \sin x$ and $g(x) = \cos x$ by solving these linear equations (multiply the first by $\sin x$ and the second by $\cos x$ and add, using $\cos^2 x + \sin^2 x = 1$). Of course such a purely real-valued proof would seem quite magical (out of what hat did we draw $\cos x\,g(x) + \sin x\,f(x)$ and $\cos x\,f(x) - \sin x\,g(x)$?) if presented without motivation.

$$\cos x\,g(x) + \sin x\,f(x) = 1 \quad\text{and}\quad \cos x\,f(x) - \sin x\,g(x) = 0$$

from the differential equations, hence

$$\frac{d}{dx}[\cos x\,g(x) + \sin x\,f(x)] = 0 \quad\text{and}\quad \frac{d}{dx}[\cos x\,f(x) - \sin x\,g(x)] = 0$$

Thus we compute

$$\frac{F(x)}{\operatorname{cis} x} = (\cos x - i\sin x)(g(x) + if(x)) = [\cos x\,g(x) + \sin x\,f(x)] + i[\cos x\,f(x) - \sin x\,g(x)].$$

We could also disguise the argument to remove all reference to the complex numbers by noting $1/(\cos x + i\sin x) = \cos x - i\sin x$, since $\cos^2 x + \sin^2 x = 1$, so that

is as easy as before. Thus $F(x) = a\operatorname{cis} x$ and $a = 1$ follows by setting $x = 0$. QED

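$$\frac{d}{dx}\left(\frac{F(x)}{\operatorname{cis} x}\right) = \frac{\operatorname{cis} x\,\frac{d}{dx}F(x) - F(x)\,\frac{d}{dx}\operatorname{cis} x}{(\operatorname{cis} x)^2} = \frac{i\operatorname{cis} x\,F(x) - iF(x)\operatorname{cis} x}{(\operatorname{cis} x)^2} = 0$$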

The computation


cos(x + y) + isin(x + y) = (cosx + isinx)(cosy + isiny).

after substituting the Euler identities:

exp(i(x + y)) = exp(ix) exp(iy)

cos(x + y) = cosx cosy - sin x sin y

can be obtained by taking the real and imaginary parts of the identity

sin(x + y) = sin x cosy + cosx sin y,

These are truly remarkable identities. Although they are trivial consequences of the power-series expansions, they are totally unexpected from the geometric definitions of sine and cosine. We should also point out the rather remarkable nature of the power-series expansions for sine and cosine. For example, the periodicity of the cosine means that $\sum_{k=0}^\infty (-1)^k x^{2k}/(2k)! = \sum_{k=0}^\infty (-1)^k (x + 2\pi)^{2k}/(2k)!$. This identity is true in the sense that rearrangement of the second series, expanding $(x + 2\pi)^{2k}$ by the binomial theorem, gives the first series. But a direct proof of this fact is out of the question.
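The Euler identities and the addition formulas they imply are easy to spot-check with complex arithmetic. The sketch below relies on Python's built-in `cmath.exp`; the helper names `euler_gap` and `addition_formulas` are hypothetical, introduced only for this illustration.

```python
import cmath
import math

def euler_gap(x):
    """Distance between exp(ix) and cos x + i sin x (zero up to rounding)."""
    return abs(cmath.exp(1j * x) - complex(math.cos(x), math.sin(x)))

def addition_formulas(x, y):
    """Real and imaginary parts of exp(ix) * exp(iy).

    By the Euler identities these should equal cos(x + y) and sin(x + y).
    """
    product = cmath.exp(1j * x) * cmath.exp(1j * y)
    return product.real, product.imag

print(euler_gap(0.7))
c, s = addition_formulas(0.4, 1.1)
print(c - math.cos(1.5), s - math.sin(1.5))
```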

The Euler identities simplify the study of the trigonometric functions. For example, the addition formulas

$$\sin x = \frac{\exp(ix) - \exp(-ix)}{2i}, \qquad \cos x = \frac{\exp(ix) + \exp(-ix)}{2}.$$

by separating the real and imaginary parts (the rearrangement is justified by the absolute convergence of the series). Thus we have $\exp(ix) = \cos x + i\sin x$, the notorious Euler identities. In a like manner we find $\exp(-ix) = \cos x - i\sin x$, and so we can solve for sine and cosine:

$$\exp(ix) = \sum_{n=0}^\infty \frac{(ix)^n}{n!} = \sum_{n=0}^\infty \frac{i^n x^n}{n!} = \sum_{k=0}^\infty (-1)^k \frac{x^{2k}}{(2k)!} + i\sum_{k=0}^\infty (-1)^k \frac{x^{2k+1}}{(2k+1)!}$$

that $\exp(ix)$ should satisfy. By $\exp(z)$ for $z$ a complex number we mean the result of substituting $z$ in the exponential power series, $\exp(z) = \sum_{n=0}^\infty z^n/n!$. For $z = ix$ we find


By the inverse function theorem $\tan\theta$ has a $C^1$ inverse (denoted $\arctan x$) on the whole line taking values in $(-\pi/2, \pi/2)$ with the derivative given by $\frac{d}{dx}\arctan x = 1/(\frac{d}{d\theta}\tan\theta) = 1/(1 + \tan^2\theta)$ if $x = \tan\theta$. Thus $\frac{d}{dx}(\arctan x) = 1/(1 + x^2)$; and since $\arctan 0 = 0$, we obtain $\arctan x = \int_0^x 1/(1 + t^2)\,dt$ by the fundamental theorem of the calculus.
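The arctangent integral can be illustrated with a simple quadrature. This is a minimal sketch using the trapezoidal rule; the name `arctan_by_integral` and the number of steps are assumptions made for the example only.

```python
import math

def arctan_by_integral(x, steps=10_000):
    """Trapezoidal-rule approximation of the integral of 1/(1 + t^2) from 0 to x."""
    def f(t):
        return 1.0 / (1.0 + t * t)

    h = x / steps
    total = 0.5 * (f(0.0) + f(x))   # endpoints carry weight 1/2
    for k in range(1, steps):
        total += f(k * h)           # interior points carry weight 1
    return total * h

print(arctan_by_integral(1.0), math.atan(1.0))
print(4 * arctan_by_integral(1.0), math.pi)
```

Unlike the arcsine integral, the integrand here is smooth on the whole line, so there is no endpoint singularity to slow the convergence.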

With the aid of the sine and cosine we can give another perspective on the complex number system. If $z$ is a non-zero complex number, then $z/|z|$ is a complex number of absolute value 1, $|z/|z|| = |z|/|z| = 1$. Since $|z|^2 = x^2 + y^2$ for $z = x + iy$, a complex number of absolute

Figure 8.2.7:


which shows $\tan\theta$ is increasing on $(-\pi/2, \pi/2)$; and since $\cos(\pm\pi/2) = 0$, we see that $\tan\theta$ maps $(-\pi/2, \pi/2)$ onto the real line. The graph is shown in Figure 8.2.7.

The other trigonometric functions are definable in terms of sine and cosine, so we will not discuss their properties in detail. We will mention one interesting result, the arctangent integral: $\arctan x = \int_0^x 1/(1 + t^2)\,dt$. To derive this we compute

$$\frac{d}{d\theta}\tan\theta = \frac{d}{d\theta}\frac{\sin\theta}{\cos\theta} = \frac{\cos^2\theta + \sin^2\theta}{\cos^2\theta} = 1 + \tan^2\theta,$$


We can use these ideas to define an infinite-valued logarithm function for complex numbers. We want log to be the inverse function of exp, but exp is not one-to-one on $\mathbb{C}$, since $\exp(z + 2\pi ki) = \exp z$. To see how to define $\log z$ we look at the equation $\exp(\log z) = z$ (in the other direction we can only expect that one of the values of $\log(\exp z)$ will equal $z$). Writing $z = re^{i\theta}$ and $\log z = a + bi$ we are led to the equation $re^{i\theta} = e^a e^{ib}$ and, hence, $r = e^a$ and $e^{i\theta} = e^{ib}$, so $a = \log r$ and $b = \theta + 2\pi k$ for some integer $k$. Thus $\log z = \log r + i(\theta + 2\pi k) = \log|z| + i\arg z$. We can also express this in terms of the real and imaginary parts of $z$ since $|z|^2 = x^2 + y^2$ and

and now $ak$ need not be an integer. In fact, if $a$ is irrational there are infinitely many values for $z^a$, while for $a$ rational there are only finitely many. For example, there are two square roots, $(re^{i\theta})^{1/2} = r^{1/2}e^{i\theta/2}$ and $(re^{i(\theta + 2\pi)})^{1/2} = r^{1/2}e^{i\theta/2}e^{\pi i} = -r^{1/2}e^{i\theta/2}$, exactly as we expect. Thus

$$(i)^{1/2} = (e^{\pi i/2})^{1/2} = \pm e^{\pi i/4} = \pm\left(\cos\frac{\pi}{4} + i\sin\frac{\pi}{4}\right) = \pm\left(\frac{1}{\sqrt{2}} + \frac{i}{\sqrt{2}}\right).$$
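The computation of the square roots of $i$ can be reproduced from the polar form. The sketch below, with a hypothetical helper `nth_roots`, lists the $n$ values of $z^{1/n}$ by letting the argument range over $\theta + 2\pi k$ for $k = 0, \ldots, n-1$; it assumes $z \neq 0$.

```python
import cmath
import math

def nth_roots(z, n):
    """All n values of z**(1/n) for nonzero z, via z = r * exp(i * theta)."""
    r, theta = cmath.polar(z)            # modulus and one choice of argument
    return [
        cmath.rect(r ** (1.0 / n), (theta + 2 * math.pi * k) / n)
        for k in range(n)
    ]

roots = nth_roots(1j, 2)
print(roots)                             # the two square roots of i
print([w * w for w in roots])            # each squares back to i, up to rounding
```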

since $kn$ is also an integer. However, for non-integer powers we find more than one value for $z^a$, since

value one corresponds to a point $(x, y)$ in the plane lying on the unit circle. Thus, $(x, y) = (\cos\theta, \sin\theta)$ and $z/|z| = \cos\theta + i\sin\theta$. The value of the angle $\theta$ is only determined up to a multiple of $2\pi$ and is called the argument of $z$ or $\arg z$. Thus, we have the polar coordinates representation of an arbitrary non-zero complex number $z = re^{i\theta} = r\cos\theta + ir\sin\theta$ where $r = |z|$ is the length of the line segment joining $(0, 0)$ to $(x, y)$ and $\theta$ is the angle it makes with the positive $x$-axis. This gives us a better understanding of complex multiplication. If $z_1 = r_1e^{i\theta_1}$ and $z_2 = r_2e^{i\theta_2}$, then $z_1z_2 = r_1e^{i\theta_1}r_2e^{i\theta_2} = (r_1r_2)e^{i(\theta_1 + \theta_2)}$, so the absolute values multiply and the arguments add. Similarly for powers: $z^n = (re^{i\theta})^n = r^ne^{in\theta}$. The absolute value is raised to the power $n$ and the argument multiplied by $n$. Notice that in both these formulas the ambiguity in the argument does not matter. For example,


which is the familiar law of cosines for the triangle (shown in Figure 8.2.8) with vertices at $0, z_1, z_2$. Since $\cos(\theta_1 - \theta_2)$ varies between $+1$ and $-1$, we have $|z_1 - z_2|^2$ lying between $r_1^2 + r_2^2 - 2r_1r_2 = (r_1 - r_2)^2$ and $r_1^2 + r_2^2 + 2r_1r_2 = (r_1 + r_2)^2$, with the maximum and minimum values assumed when the angle $\theta_1 - \theta_2$ is $\pi$ and $0$. Thus $|z_1 - z_2|^2 \le (r_1 + r_2)^2$, so $|z_1 - z_2| \le |z_1| + |z_2|$ with equality holding only when $z_1$ and $z_2$ are colinear.
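A quick numerical check of the law of cosines and the resulting triangle inequality, written as a sketch with a hypothetical helper name `law_of_cosines_sides`; the sample points are arbitrary.

```python
import cmath
import math

def law_of_cosines_sides(z1, z2):
    """Return (|z1 - z2|^2, r1^2 + r2^2 - 2 r1 r2 cos(theta1 - theta2))."""
    r1, t1 = cmath.polar(z1)
    r2, t2 = cmath.polar(z2)
    lhs = abs(z1 - z2) ** 2
    rhs = r1 * r1 + r2 * r2 - 2 * r1 * r2 * math.cos(t1 - t2)
    return lhs, rhs

z1, z2 = 3 + 4j, -1 + 2j
print(law_of_cosines_sides(z1, z2))          # the two sides agree up to rounding
print(abs(z1 - z2), abs(z1) + abs(z2))       # triangle inequality
```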

In addition to the exponential, logarithm, and trigonometric functions, there are a number of other transcendental functions that have been carefully studied. Examples of these special functions are Bessel functions, hypergeometric functions, Legendre functions, and the gamma function. The main tools used in the study of these functions are those we have already discussed: representation as integrals, power-series expansions, and differential equations.

Figure 8.2.8:


$\arg z = \arctan y/x$, so $\log(x + iy) = \frac{1}{2}\log(x^2 + y^2) + i\arctan y/x$ (actually there is more ambiguity in $\arctan y/x$ than in $\arg z$, so this equation must be understood in the sense that only half the values of $\arctan y/x$ are allowed).

With the aid of the polar coordinate representation we can give a simple proof of the triangle inequality ($|z_1 - z_2| \le |z_1| + |z_2|$) for complex numbers. Writing $z_1 = r_1e^{i\theta_1}$ and $z_2 = r_2e^{i\theta_2}$ we compute

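$$\begin{aligned} |z_1 - z_2|^2 &= (z_1 - z_2)(\bar z_1 - \bar z_2) \\ &= (r_1e^{i\theta_1} - r_2e^{i\theta_2})(r_1e^{-i\theta_1} - r_2e^{-i\theta_2}) \\ &= r_1^2 + r_2^2 - r_1r_2\left(e^{i(\theta_1 - \theta_2)} + e^{-i(\theta_1 - \theta_2)}\right) \\ &= r_1^2 + r_2^2 - 2r_1r_2\cos(\theta_1 - \theta_2), \end{aligned}$$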


8. *Show that $e^z = z$ has no real solutions but that it has complex solutions. (Hint: write $z = x + iy$, and show that $e^z = z$ is equivalent to $x^2 + y^2 = e^{2x}$ and $y/x = \tan y$.)

9. By expanding $1/(1 + x^2)$ in a power series about $x = 0$ prove $\pi/4 = \int_0^1 1/(1 + x^2)\,dx = \lim_{r\to 1}\sum_{k=0}^\infty (-1)^k r^{2k+1}/(2k + 1)$, and then show $\pi/4 = \sum_{k=0}^\infty (-1)^k/(2k + 1)$. (Hint: combine neighboring terms in the series.)

7. Let $f(z) = \sum_{n=0}^\infty a_nz^n$ be convergent in $|z| < R$, and suppose the coefficients $a_n$ are all real. Show $f(\bar z) = \overline{f(z)}$.

6. Find all complex solutions of the equation $z^n = 1$. Can you give a geometric interpretation of the result?

5. Compute the power-series expansions of $\sin\theta$ and $\cos\theta$ about the point $\pi/2$.

Interpret this result in terms of the arctangent.

3. *Using $\pi = 4\int_0^1 1/(1 + x^2)\,dx$ and the midpoint and trapezoidal rules with increments of 1/10, compute the approximate value of $\pi$. Compare the predicted error with the actual error. (Use a calculator for this problem. You might also try programming the computation to allow different values of the increment and see if the actual error depends on the increment in the predicted manner.)

4. Verify the identity $\sin^2\theta + \cos^2\theta = 1$ by rearranging the power-series expansions.

2. Show by direct substitution that

8.2.3 Exercises

1. Show by direct substitution ($t = x/\sqrt{1 + x^2}$) that

$$\int_0^\infty \frac{1}{1 + x^2}\,dx = \int_0^1 \frac{1}{\sqrt{1 - t^2}}\,dt.$$


Theorem 8.1.2 $\exp(x)$ is the unique solution of the differential equation $f' = f$ with $f(0) = 1$.

b. $\exp(x + y) = \exp x \exp y$ for any real $x$ and $y$.

c. $\exp(x) > 0$ for any real $x$.

a. The exponential function satisfies the differential equation $f' = f$, with $f(0) = 1$.

Theorem 8.1.1

Definition 8.1.1 The exponential function $\exp(x)$ is defined for any real $x$ by $\exp(x) = \sum_{n=0}^\infty x^n/n!$.

8.1 The Exponential and Logarithm

8.3 Summary

12. Express $(\sin x)^n$ as a linear combination of $\cos kx$ and $\sin kx$ for $0 \le k \le n$. (Hint: use $\sin x = (e^{ix} - e^{-ix})/2i$.)

13. Prove ¡:sin nz sinmz dz = Oif n :1: m.

14. Prove $\cos x\cos y = \frac{1}{2}\cos(x + y) + \frac{1}{2}\cos(x - y)$. Explain how this formula could be used, in conjunction with a table of cosines, to simplify the process of multiplication of two numbers in $[0, 1]$. (This idea was proposed by Francois Viete in the 1590s under the name prosthaphaeresis. It became obsolete after the introduction of logarithms by Napier.)

11. Show that $\exp(z)$ assumes every complex value except zero and that $\exp(z_1) = \exp(z_2)$ if and only if $z_1 - z_2 = 2\pi ki$ for some integer $k$.

10. Show $a\sin x + b\cos x = A\cos(x + B)$. How are $a, b$ and $A, B$ related?


Theorem The function

$$F(x) = \begin{cases} \exp(-1/x^2), & x \neq 0, \\ 0, & x = 0, \end{cases}$$

is $C^\infty$ and vanishes to infinite order at $x = 0$; hence, it is not analytic at $x = 0$.

Theorem $\lim_{x\to+\infty} x^{-n}\exp(x) = +\infty$ and $\lim_{x\to+\infty} x^n\exp(-x) = 0$, for any $n \ge 0$; $\lim_{x\to+\infty} x^{-a}\log x = 0$ and $\lim_{x\to 0} x^a\log x = 0$ for any $a > 0$.

Theorem

$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots$$

for $|x| < 1$, and more generally

$$\log x = \log x_0 + \sum_{k=1}^\infty \frac{(-1)^{k+1}}{k}\left(\frac{x - x_0}{x_0}\right)^k$$

for $|x - x_0| < x_0$; hence $\log x$ is analytic.

Theorem 8.1.6 $\exp(x) = \lim_{n\to\infty}(1 + x/n)^n$ for any real $x$.

Theorem The functions exp and log establish an isomorphism between the additive group of the reals and the multiplicative group of the positive reals.

Theorem 8.1.5 $\log x = \int_1^x 1/t\,dt$.

Theorem 8.1.4 The exponential function maps $\mathbb{R}$ one-to-one and onto $\mathbb{R}^+$.

Theorem 8.1.3 For every rational number $p/q$ ($p$ and $q$ integers, $q > 0$) we have $\exp(p/q) = (e^p)^{1/q}$. For $x$ real and $x_k = p_k/q_k$ a Cauchy sequence of rationals converging to $x$, $\exp(x) = \lim_{k\to\infty}\exp(x_k) = \lim_{k\to\infty}(e^{p_k})^{1/q_k}$.


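Theorem sin and cos are analytic functions with power series

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots = \sum_{k=0}^\infty (-1)^k \frac{x^{2k+1}}{(2k+1)!},$$

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots = \sum_{k=0}^\infty (-1)^k \frac{x^{2k}}{(2k)!}.$$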

Theorem $\sin\theta$ and $\cos\theta$ are $C^1$ functions periodic of period $2\pi$, and satisfy the identities $\sin^2\theta + \cos^2\theta = 1$, $\frac{d}{d\theta}\sin\theta = \cos\theta$, $\frac{d}{d\theta}\cos\theta = -\sin\theta$.

Theorem There exist unique functions $\sin\theta$ and $\cos\theta$ defined for all real $\theta$ coinciding with the above definitions for $-\pi/2 < \theta < \pi/2$ and satisfying $\sin\theta = \cos(\theta - \pi/2)$, $\cos\theta = -\sin(\theta - \pi/2)$ for all real $\theta$.

Definition $\cos\theta = +\sqrt{1 - \sin^2\theta}$ for $-\pi/2 < \theta < \pi/2$.

Theorem $\arcsin x$ is a $C^1$ function mapping $(-1, 1)$ onto $(-\pi/2, \pi/2)$ with derivative $1/\sqrt{1 - x^2}$, and has a $C^1$ inverse function $\sin\theta$ mapping $(-\pi/2, \pi/2)$ onto $(-1, 1)$ with derivative $\frac{d}{d\theta}\sin\theta = \sqrt{1 - \sin^2\theta}$.

Definition $\arcsin x = \int_0^x 1/\sqrt{1 - y^2}\,dy$ for $-1 \le x \le 1$, and $\pi = 2\arcsin 1$.

8.2 Trigonometric Functions

Corollary Given any $C^\infty$ function on a compact interval $[a, b]$ (meaning one-sided derivatives exist at the endpoints) there exists a $C^\infty$ extension to the line vanishing outside a larger interval $[a - \varepsilon, b + \varepsilon]$.

Theorem 8.1.7 (Borel) Given any sequence $a_0, a_1, \ldots$ of reals, any point $x_0$, and any neighborhood $|x - x_0| < A$, there exists a $C^\infty$ function $f$ vanishing outside the neighborhood such that $f^{(k)}(x_0) = a_k$ for all $k$.

Theorem The "blip function"

$$g(x) = \begin{cases} e^2\exp\left(-1/(x - 1)^2\right)\exp\left(-1/(x + 1)^2\right), & -1 < x < 1, \\ 0, & |x| \ge 1, \end{cases}$$

is $C^\infty$, vanishes outside $|x| \le 1$, and satisfies $g(0) = 1$.


Theorem (Law of Cosines) $|z_1 - z_2|^2 = r_1^2 + r_2^2 - 2r_1r_2\cos(\theta_1 - \theta_2)$ if $z_1 = r_1e^{i\theta_1}$, $z_2 = r_2e^{i\theta_2}$.

Theorem An arbitrary non-zero complex number $z$ can be written $z = re^{i\theta}$, where $\theta$ (determined modulo $2\pi$) is called the argument and $r$ the modulus of $z$. In multiplying complex numbers the moduli are multiplied and the arguments added, while $z^n = r^ne^{in\theta}$ for integer $n$.

Theorem $\arctan x = \int_0^x 1/(1 + t^2)\,dt$.

Theorem (Euler identities) $\exp(ix) = \cos x + i\sin x$,

$$\sin x = \frac{\exp(ix) - \exp(-ix)}{2i}, \qquad \cos x = \frac{\exp(ix) + \exp(-ix)}{2}.$$

Lemma 8.2.1 The unique $C^1$ solutions to $f' = g$ and $g' = -f$ with $f(0) = 0$ and $g(0) = 1$ are $f(x) = \sin x$ and $g(x) = \cos x$.

converging for all real $x$.


We are now ready to begin the study of functions of several variables, $f(x_1, x_2, \ldots, x_n)$, where each $x_k$ varies over $\mathbb{R}$. Since we live in a three-dimensional world, we can easily appreciate the importance of this subject, at least for $n = 3$. We allow the number $n$ to be arbitrary because it is necessary to do so for many applications and also because the mathematics is not appreciably more difficult. We will frequently appeal to our three-dimensional geometric intuition to guide us to an understanding of the general case.

We begin by studying the Euclidean space $\mathbb{R}^n$ of $n$ real variables. We can simply define $\mathbb{R}^n$ to be the set of all ordered $n$-tuples $(x_1, x_2, \ldots, x_n)$ of real numbers. Of course this is not the whole story. We also want to define on the set $\mathbb{R}^n$ various "structures": vector space, metric space, normed space (Banach space), and inner product space (Hilbert space). Each of these structures can be defined by rather simple formulas; however, we will also be interested in a more abstract description of these structures. For each type of structure, we will first describe the Euclidean version, then present some of the basic properties of the structure, and finally use these properties to define a general notion of the structure. In this way, the Euclidean version becomes just a

9.1.1 Vector Space and Metric Space

9.1 Structures on Euclidean Space

Euclidean Space and Metric Spaces

Chapter 9


special case of abstract structure; but it is the special case in which we are most interested, and it serves to motivate the abstract definition. All these structures play an important role in mathematics, although it is the metric space structure that will be most emphasized in this book. The reader who has not seen these ideas before should not expect to appreciate the significance of these structures immediately; such an understanding can only come after the theory is developed.

In order to simplify the notation we will adopt the following conventions. Letters at the end of the alphabet $x, y, z$, etc., will be used to denote points in $\mathbb{R}^n$, so $x = (x_1, x_2, \ldots, x_n)$ and $x_k$ will always refer to the $k$th coordinate of $x$. We will reserve the letter $n$ for the dimension of the space. Letters from the beginning of the alphabet, $a, b, c$, etc., will denote real numbers, also called scalars.

We begin with the vector space structure of $\mathbb{R}^n$. We define vector addition by $x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$ and scalar multiplication by $ax = (ax_1, ax_2, \ldots, ax_n)$. It is easy to verify that with these definitions $\mathbb{R}^n$ forms a vector space over the scalar field $\mathbb{R}$. We recall the vector space axioms: a set $V$ with a vector addition and scalar multiplication is said to be a vector space over the scalar field $\mathbb{F}$ (in this book we will always take $\mathbb{R}$ or $\mathbb{C}$) provided

1. vector addition satisfies the commutative group axioms: commutativity ($x + y = y + x$), associativity ($(x + y) + z = x + (y + z)$), existence of zero ($x + 0 = x$ for all $x$), and existence of additive inverses ($x + (-x) = 0$); and

2. scalar multiplication is associative ($(ab)x = a(bx)$) and distributes over addition in both ways ($a(x + y) = ax + ay$ and $(a + b)x = ax + bx$).

The study of vector spaces is called linear algebra. We assume the reader has had some exposure to the elementary theory of linear algebra, at least in the concrete setting of $\mathbb{R}^n$. We recall that $\mathbb{R}^n$ has dimension $n$ because the vectors $(1, 0, \ldots, 0), (0, 1, 0, \ldots, 0), \ldots, (0, \ldots, 0, 1)$ form a basis (a set of vectors that is linearly independent and spans) and that every basis must have $n$ elements. We refer to this special basis as the standard basis, and we sometimes denote it by $e^{(1)}, e^{(2)}, \ldots, e^{(n)}$. The vector $x$ is written uniquely $x_1e^{(1)} + x_2e^{(2)} + \cdots + x_ne^{(n)}$ as a linear combination of the standard basis vectors, and the coordinate $x_k$ is the coefficient of $e^{(k)}$ in this representation.


The Euclidean distance clearly satisfies the first two properties; only the triangle inequality requires proof. We have already proved this in the case $n = 2$ under the guise of a triangle inequality for distances in the complex plane. Indeed we can regard $\mathbb{C}$ as $\mathbb{R}^2$ with some additional structure (complex multiplication) by identifying $x = (x_1, x_2)$ in $\mathbb{R}^2$ with $z = x_1 + ix_2$ in $\mathbb{C}$, and the distance functions are the same. We could generalize the proof given for $\mathbb{C}$ to the case of $\mathbb{R}^n$ but will obtain a simpler proof very shortly, so we postpone the discussion until the next section.

The abstract notion of metric space is that of a set $M$ with a distance function (or metric) $d(x, y)$ taking real values for $x$ and $y$ in $M$ and satisfying the above three conditions of positivity, symmetry, and triangle inequality. These are considered to be the minimum conditions needed to justify the crudest intuitions of distance (sometimes, however, a condition weaker than the triangle inequality, such as $d(x, z) \le M(d(x, y) + d(y, z))$ for some constant $M$, can be substituted and one still obtains a useful notion of distance). Thus we have shown that $\mathbb{R}^n$ with the Pythagorean distance function forms a metric space. We will give many more examples in the next section.
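The three metric conditions can be tested on a finite sample of points. The sketch below uses hypothetical helper names (`euclidean`, `violates_metric_axioms`) and an arbitrary tolerance; a finite check can expose a violation but cannot prove that a function is a metric.

```python
import itertools
import math

def euclidean(x, y):
    """Pythagorean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def violates_metric_axioms(d, points, tol=1e-12):
    """Return True if d breaks positivity, symmetry, or the triangle
    inequality on some pair or triple drawn from the sample points."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return True                      # positivity fails
        if abs(d(x, y) - d(y, x)) > tol:
            return True                      # symmetry fails
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return True                      # triangle inequality fails
    return False

pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
print(violates_metric_axioms(euclidean, pts))
# The squared distance is positive and symmetric but fails the triangle inequality.
print(violates_metric_axioms(lambda x, y: euclidean(x, y) ** 2, pts))
```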

3. $d(x, z) \le d(x, y) + d(y, z)$ (triangle inequality).

2. d(x, y) = d(y, x) (symmetry).

1. $d(x, y) \ge 0$ with equality if and only if $x = y$ (positivity).

The vector space structure of $\mathbb{R}^n$ is not enough, on its own, to allow us to express geometric concepts. We are thus led to consider the metric structure in order to define length. We take the Pythagorean formula $d(x, y) = \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2}$ as the definition of the Euclidean distance between $x$ and $y$. Our geometric intuition validates this definition when $n = 1, 2, 3$. It is also the only reasonable choice in general if we want subspace consistency (if $x$ and $y$ happen to lie in an $m$-dimensional subspace defined by the vanishing of a specified set of $n - m$ coordinates, then the distance is the same measured in the subspace or in all of $\mathbb{R}^n$). This distance function satisfies three basic conditions:


$$d(x, y) = \|x - y\| = \|(-1)(y - x)\| = |-1|\,\|y - x\| = \|y - x\| = d(y, x).$$

Note that in the statement of homogeneity the symbol $|\cdot|$ is used with two different meanings, $|a|$ referring to the absolute value of $a$. Of course the absolute value and the norm coincide for $\mathbb{R}^1$, so there is no danger of misinterpreting the formula. Nevertheless, it is sometimes preferable to use double bars $\|x\|$ to denote the norm. We have chosen to use the single bars for the norm on $\mathbb{R}^n$ so that we can use double bars to refer to other norms. The abstract definition of norm is of course any function on a vector space that satisfies the above conditions of positivity, homogeneity, and triangle inequality. Notice that a norm must be defined on a vector space in order for conditions 2 and 3 to make sense. For a metric there is no need to assume the space has a vector space structure.

The verification that the Euclidean norm on $\mathbb{R}^n$ actually satisfies the positivity and homogeneity conditions is trivial, and again we postpone the proof of the triangle inequality. What we want to show is that these conditions on the norm imply the defining conditions of a metric for the distance $d(x, y) = |x - y|$. In other words, if we start with any vector space $V$ with a norm $\|x\|$, then $V$ becomes a metric space with the distance function $d(x, y) = \|x - y\|$. The proof of this is trivial: the positivity of the norm implies the positivity of the metric, $d(x, y) = \|x - y\| \ge 0$ with equality if and only if $x - y = 0$, or in other words $x = y$. The symmetry of the metric follows from the homogeneity of the norm with $a = -1$,

3. $|x + y| \le |x| + |y|$ (triangle inequality).

2. $|ax| = |a|\,|x|$ for any scalar $a$ (homogeneity).

1. $|x| \ge 0$ with equality if and only if $x = 0$ (positivity).

The metric structure of $\mathbb{R}^n$ is related to the vector space structure, since $d(x, y)$ depends only on $x - y$. To make this explicit we introduce the Euclidean norm, defined by $|x| = \sqrt{x_1^2 + \cdots + x_n^2}$, so that $d(x, y) = |x - y|$. The basic properties of the norm are:

9.1.2 Norm and Inner Product


$$\|x\|_p = \left(\sum_{j=1}^n |x_j|^p\right)^{1/p}$$

with $p$ a constant satisfying $1 \le p < \infty$ (the norm $\|x\|_{\sup}$ can be thought of as $\lim_{p\to\infty}\|x\|_p$ and is frequently denoted $\|x\|_\infty$). The proof of the triangle inequality for the $p$-norm in general is more difficult. If we draw the graph of the set of points in $\mathbb{R}^2$ satisfying $\|x\| = 1$ we can get a picture of the differences between these norms, as in Figure 9.1.1.
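To see how the $p$-norms relate to the sup-norm, the following sketch evaluates them for one sample vector. The function names `p_norm` and `sup_norm` and the sample vector are assumptions chosen for the illustration.

```python
def p_norm(x, p):
    """The p-norm (sum_j |x_j|^p)^(1/p), assuming p >= 1."""
    return sum(abs(c) ** p for c in x) ** (1.0 / p)

def sup_norm(x):
    """The sup-norm max_j |x_j|."""
    return max(abs(c) for c in x)

x = (3.0, -4.0)
for p in (1, 2, 10, 100):
    print(p, p_norm(x, p))       # values decrease toward the sup-norm as p grows
print("sup", sup_norm(x))
```

For this vector the 1-norm is the taxicab length 7, the 2-norm is the Euclidean length 5, and large $p$ gives values close to 4, consistent with viewing the sup-norm as the limit of the $p$-norms.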

For another important example we consider the space $C([a, b])$ of continuous real-valued functions on the compact interval $[a, b]$, with norm $\|f\|_{\sup} = \sup_x |f(x)|$. The vector space structure of $C([a, b])$

The verification of the norm axioms is straightforward for these; we leave it to the exercises. The associated distances are different from the Pythagorean distance. In the first case we can interpret the distance $\|x - y\|$ as the shortest distance between $x$ and $y$ along a broken line segment that moves parallel to the axes (taxicab distance in a city laid out on a square grid). The norms we are considering are special cases of the $p$-norm

$$\|x\|_1 = \sum_{j=1}^n |x_j|,$$

$$\|x\|_{\sup} = \max_j \{|x_j|\}.$$

The metric $d(x, y) = \|x - y\|$ is said to be the metric associated with (or induced by) the norm. Note that in our proof that $d(x, y)$ is a metric we used the homogeneity of the norm only in the special case $a = -1$. This means that the homogeneity condition is considerably stronger than is absolutely essential, so not every metric on a vector space that depends only on $x - y$ is associated with a norm (for an example see exercise set 9.1.4, number 15).

Here are two more examples of norms, with the underlying vector space being $\mathbb{R}^n$:

$$d(x, z) = \|x - z\| = \|(x - y) + (y - z)\| \le \|x - y\| + \|y - z\| = d(x, y) + d(y, z).$$

Finally, the triangle inequality for the norm implies the triangle inequality for the metric:


1. $x \cdot y = y \cdot x$ (symmetry).

2. $(ax + by) \cdot z = a\,x \cdot z + b\,y \cdot z$ and $x \cdot (ay + bz) = a\,x \cdot y + b\,x \cdot z$ (bilinearity).

$$x \cdot y = rR(\cos\theta\cos\phi + \sin\theta\sin\phi) = rR\cos(\theta - \phi),$$

and $r = |x|$, $R = |y|$, and $\theta - \phi$ is the angle between them, as in Figure 9.1.2. The basic properties of the inner product are:

is the obvious one: $f + g$ is the function $f(x) + g(x)$, and $af$ is the function $af(x)$. The verification of the norm axioms for the sup-norm is again simple and left as an exercise. This vector space is not finite dimensional.

The last structure on $\mathbb{R}^n$ we want to consider is the inner product. We need this structure in order to express the geometric notion of angle. We define $x \cdot y = x_1 y_1 + \cdots + x_n y_n$; this is sometimes called the scalar product or dot product. The connection with angle is given by the formula $x \cdot y = |x||y|\cos\theta$ defining the angle $\theta$ between the vectors x and y. Note that the angle is only defined if both x and y are nonzero, and then the sign of $\theta$ is not defined (in dimension $n \ge 3$ we cannot unambiguously choose a sign convention). For most applications we will only be interested in the condition $x \cdot y = 0$ characterizing perpendicular vectors. The geometric justification of the angle formula in $\mathbb{R}^2$ is familiar: if $x = (r\cos\theta, r\sin\theta)$ and $y = (R\cos\phi, R\sin\phi)$ in polar coordinates, then

Figure 9.1.1: The sets $\|x\| = 1$ in $\mathbb{R}^2$: a diamond for $\|x\|_1 = 1$, a circle for $\|x\|_2 = 1$, and a square for $\|x\|_\infty = 1$.


Page 380: Strichartz_The Way of Analysis 2000

This is one of the most important inequalities in all of analysis and appears under myriad guises. By stating it as we have in the abstract formulation we include all the special cases, corresponding to different choices of inner product. For the case of the Euclidean inner product the inequality reads

$$\Big|\sum_{j=1}^{n} x_j y_j\Big| \le \Big(\sum_{j=1}^{n} x_j^2\Big)^{1/2} \Big(\sum_{j=1}^{n} y_j^2\Big)^{1/2}.$$

An integral version,

$$\Big|\int_a^b f(x)g(x)\,dx\Big| \le \Big(\int_a^b |f(x)|^2\,dx\Big)^{1/2} \Big(\int_a^b |g(x)|^2\,dx\Big)^{1/2},$$

Notice that all three conditions are trivially verified. Any real-valued function $(x, y)$ defined for x and y in a vector space and satisfying symmetry, bilinearity, and positive definiteness is said to be an inner product.
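The three properties of the dot product and the angle formula $x \cdot y = |x||y|\cos\theta$ can be spot-checked on concrete vectors. This is only an illustrative sketch: the helper names `dot` and `angle` and the sample vectors are assumptions made for the example, and a numerical check is of course not a proof.

```python
import math

def dot(x, y):
    """Euclidean inner product x . y = x_1 y_1 + ... + x_n y_n."""
    return sum(a * b for a, b in zip(x, y))

def angle(x, y):
    """Unsigned angle in [0, pi] between nonzero vectors x and y."""
    c = dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp to guard against rounding

# Hypothetical sample data in R^2.
x, y, z = (1.0, 0.0), (1.0, 1.0), (-2.0, 3.0)
a, b = 2.0, -3.0
combo = tuple(a * p + b * q for p, q in zip(x, y))  # the vector ax + by

print(dot(x, y) == dot(y, x))                           # symmetry
print(dot(combo, z) == a * dot(x, z) + b * dot(y, z))   # linearity in the first slot
print(dot(z, z) > 0)                                    # positivity for z != 0
print(angle(x, y), math.pi / 4)                         # angle between (1,0) and (1,1)
```

Perpendicular vectors such as (1, 0) and (0, 1) have dot product zero, so `angle` returns a right angle for them.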

The connection between the Euclidean inner product and norm is evidently given by $|x| = \sqrt{x \cdot x}$. In general, if we are given an inner product $(x, y)$ we define the associated (or induced) norm to be $\|x\| = \sqrt{(x, x)}$. This requires, of course, that we prove $\|x\| = \sqrt{(x, x)}$ actually is a norm. To do this we need an important estimate, known as the Cauchy-Schwarz inequality:

3. $x \cdot x \ge 0$ with equality if and only if $x = 0$ (positive definiteness).

Figure 9.1.2:


Page 381: Strichartz_The Way of Analysis 2000

Notice that the first inequality obtained, $|(x, y)| \le ((x, x) + (y, y))/2$, is in general weaker than the Cauchy-Schwarz inequality because the geometric mean $(x, x)^{1/2}(y, y)^{1/2}$ is less than or equal to the arithmetic mean $((x, x) + (y, y))/2$. Thus the "scaling" part of the proof is indispensable.
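A small numerical illustration may help separate the two bounds. The sketch below uses the Euclidean inner product on hypothetical sample vectors (the names `dot`, `lhs`, `geometric`, `arithmetic` are assumptions for the example) and compares $|(x, y)|$ with the geometric-mean bound from Cauchy-Schwarz and with the weaker arithmetic-mean bound.

```python
import math

def dot(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

# Hypothetical vectors with |x| = 3 and |y| = 5.
x = [1.0, 2.0, 2.0]
y = [4.0, 0.0, 3.0]

lhs = abs(dot(x, y))                                      # |(x, y)|
geometric = math.sqrt(dot(x, x)) * math.sqrt(dot(y, y))   # Cauchy-Schwarz bound
arithmetic = (dot(x, x) + dot(y, y)) / 2                  # bound before scaling

print(lhs, geometric, arithmetic)   # lhs <= geometric <= arithmetic

# Colinear vectors give equality in Cauchy-Schwarz.
y2 = [2.0 * c for c in x]
print(abs(dot(x, y2)), math.sqrt(dot(x, x)) * math.sqrt(dot(y2, y2)))
```

The arithmetic-mean bound matches the geometric-mean bound only when the two vectors have the same length, which is why the proof first normalizes to unit vectors.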

$$|(x, y)| = |(ax', by')| = |ab\,(x', y')| = |ab|\,|(x', y')| \le |ab|$$

by the special case $|(x', y')| \le 1$, and $|ab| = (x, x)^{1/2}(y, y)^{1/2}$.

To see when equality can hold consider first the special case of unit vectors. Then $(x', y') = 1$ implies $(x' - y', x' - y') = 0$, while $(x', y') = -1$ implies $(x' + y', x' + y') = 0$. Thus equality holds if and only if $x' = \pm y'$. In the general case equality holds if and only if $|(x', y')| = 1$, so the condition of colinearity follows. QED

Proof: The proof is based on the observation $(x + y, x + y) \ge 0$ and $(x - y, x - y) \ge 0$ by the positivity. Expanding these using the bilinearity and symmetry we obtain $(x, x) + (y, y) + 2(x, y) \ge 0$ and $(x, x) + (y, y) - 2(x, y) \ge 0$, so $|(x, y)| \le ((x, x) + (y, y))/2$. This is almost the inequality we want. Notice that if $(x, x) = 1$ and $(y, y) = 1$, then this is just $|(x, y)| \le 1 = (x, x)^{1/2}(y, y)^{1/2}$ as desired. Thus we already have the Cauchy-Schwarz inequality for unit vectors (those satisfying $(x, x) = (y, y) = 1$). Finally we can reduce the general case to this special case. Leaving aside the trivial cases $x = 0$ or $y = 0$ (then both sides of the inequality are zero), we can always write $x = ax'$, $y = by'$ with $(x', x') = (y', y') = 1$ simply by choosing $a = (x, x)^{1/2}$ and $b = (y, y)^{1/2}$ and setting $x' = a^{-1}x$, $y' = b^{-1}y$ ($a^{-1}$ and $b^{-1}$ are defined because $x \ne 0$, $y \ne 0$, and the inner product is positive definite). Then

Theorem 9.1.1 (Cauchy-Schwarz Inequality) On any vector space with an inner product $(x, y)$, we have $|(x, y)| \le (x, x)^{1/2}(y, y)^{1/2}$ (or equivalently, $|(x, y)| \le \|x\|\|y\|$) for any x, y. Furthermore, equality occurs if and only if the vectors x and y are colinear ($x = ay$ or $y = bx$ for some scalar a or b).

follows easily from the fact that $(f, g) = \int_a^b f(x)g(x)\,dx$ is an inner product on $C([a, b])$. We will give a simple proof in the general setting. A good exercise would be to rewrite it in the special case of the Euclidean inner product.


Page 382: Strichartz_The Way of Analysis 2000

Geometrically this can be interpreted to say the sum of the squares of the diagonals of a parallelogram equals the sum of the squares of

$$(x, y) = \tfrac{1}{4}\big(\|x + y\|^2 - \|x - y\|^2\big),$$

which follows immediately by expanding $\|x \pm y\|^2 = (x \pm y, x \pm y)$ using the bilinearity. Of course not every norm is associated to an inner product (of the examples of norms on $\mathbb{R}^n$, only the Euclidean one is), so the polarization identity does not make sense for an arbitrary norm. Norms that are associated to inner products satisfy the parallelogram law

$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2.$$

It is a curious, and sometimes useful, fact that we can also express the inner product in terms of the norm. This is called the polarization identity:

$$\|x + y\|^2 = (x + y, x + y) = (x, x) + 2(x, y) + (y, y) = \|x\|^2 + 2(x, y) + \|y\|^2.$$

Now by the Cauchy-Schwarz inequality $\|x + y\|^2 \le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2$ as desired. QED

follows from the bilinearity. The only nontrivial property is the triangle inequality, which we prove in its squared version, $\|x + y\|^2 \le (\|x\| + \|y\|)^2 = \|x\|^2 + 2\|x\|\|y\| + \|y\|^2$, for we can then take the square root. Now

$$\|ax\| = (ax, ax)^{1/2} = (a^2 (x, x))^{1/2} = |a|(x, x)^{1/2} = |a|\|x\|$$

Proof: The positive definiteness of the inner product implies the positivity of the norm, and the homogeneity

Theorem 9.1.2 Let $(x, y)$ be an inner product on a vector space. Then $\|x\| = (x, x)^{1/2}$ is a norm.

We can now prove that the formula $\|x\| = (x, x)^{1/2}$ really defines a norm, incidentally giving the promised proof of the triangle inequality for the Euclidean norm, since we have already verified that the Euclidean inner product $x \cdot y$ is an inner product.


Page 383: Strichartz_The Way of Analysis 2000

A discussion of the structure of vector space, norm, and inner product would be incomplete without mention of the complex analogs because they have many important applications. A complex vector space is a space satisfying the axioms of a vector space with $\mathbb{C}$ as the field of scalars. This means that $ax$ is defined for complex numbers a. A complex vector space may be thought of as a real vector space with additional structure (multiplication by $i = \sqrt{-1}$). The simplest example is $\mathbb{C}^n$, the set of n-tuples $z = (z_1, z_2, \ldots, z_n)$ of complex numbers. This has complex dimension n since the basis vectors $e^{(1)}, \ldots, e^{(n)}$ of $\mathbb{R}^n$ also form a basis of $\mathbb{C}^n$, $z = z_1 e^{(1)} + \cdots + z_n e^{(n)}$. (Warning: regarded as a real vector space, $\mathbb{C}^n$ has dimension 2n, with $e^{(1)}, \ldots, e^{(n)}, ie^{(1)}, \ldots, ie^{(n)}$ forming a basis. The paradox is explained because the definitions of linear independence and spanning involve the notion of linear combinations, which refers to the scalar field.)

The definition of norm on a complex vector space is the same as for a real vector space except that the homogeneity condition $\|ax\| = |a|\|x\|$ must hold for all complex scalars, where $|a|$ is the absolute value of the

9.1.3 The Complex Case

Figure 9.1.3:


the sides (see Figure 9.1.3). It is again a simple exercise to derive the parallelogram law for norms associated with an inner product. It is actually true that the parallelogram law characterizes such norms. If a norm satisfies the parallelogram law, then the polarization identity defines an inner product, and the norm associated with the inner product is equal to the original norm. We leave the details as an exercise.
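One way to see the parallelogram law at work is to measure by how much a given norm misses it. The sketch below assumes the standard form of the law, $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$; the function names and the choice of test vectors are illustrative assumptions.

```python
def norm1(x):
    """Taxicab norm: sum of |x_j|."""
    return sum(abs(c) for c in x)

def norm2(x):
    """Euclidean norm."""
    return sum(c * c for c in x) ** 0.5

def normsup(x):
    """Sup norm: max of |x_j|."""
    return max(abs(c) for c in x)

def parallelogram_defect(norm, x, y):
    """||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2; zero when the law holds for x, y."""
    s = tuple(a + b for a, b in zip(x, y))
    d = tuple(a - b for a, b in zip(x, y))
    return norm(s) ** 2 + norm(d) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2

# Hypothetical test vectors: the standard basis of R^2.
x, y = (1.0, 0.0), (0.0, 1.0)
for name, norm in (("1-norm", norm1), ("2-norm", norm2), ("sup-norm", normsup)):
    print(name, parallelogram_defect(norm, x, y))
```

For these two vectors the Euclidean defect vanishes up to rounding, while the other two norms give a nonzero defect, which is consistent with the statement that they do not come from an inner product.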



Page 384: Strichartz_The Way of Analysis 2000

($a + \bar{a} = 2\,\mathrm{Re}\,a$ for any complex number a). Thus we have $|\mathrm{Re}(z, w)| \le (\|z\|^2 + \|w\|^2)/2$ or $|\mathrm{Re}(z, w)| \le 1$ for $\|z\| = \|w\| = 1$. Now for the twist: if we knew that $(z, w)$ were real, then we would have $\mathrm{Re}(z, w) = (z, w)$ and, hence, $|(z, w)| \le 1$. The point is that we can always make

$$(z, w) + (w, z) = 2\,\mathrm{Re}(z, w)$$

$$|(z, w) + (w, z)| \le (z, z) + (w, w).$$

However, we do not have $(w, z) = (z, w)$ but rather $(w, z) = \overline{(z, w)}$ because of the Hermitian symmetry, so

For $\mathbb{C}^n$ the usual inner product is $(z, w) = z_1\bar{w}_1 + \cdots + z_n\bar{w}_n$. It satisfies these conditions, as is easily verified. The associated norm is $\|z\| = (z, z)^{1/2}$, which is still a nonnegative real number. Note that the homogeneity of the norm, $\|az\| = (az, az)^{1/2} = (a\bar{a}(z, z))^{1/2} = |a|\|z\|$, depends on the Hermitian form of the linearity since $|a| = (a\bar{a})^{1/2}$ but $|a| \ne (a^2)^{1/2}$ for general complex a.
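The role of the complex conjugate can be seen on a small example. The following sketch uses the convention stated in the text (conjugate on the right side); the function names `herm`, `bilinear`, `norm` and the sample vectors are assumptions made for illustration.

```python
def herm(z, w):
    """Hermitian inner product on C^n, conjugating the right-hand argument."""
    return sum(a * b.conjugate() for a, b in zip(z, w))

def bilinear(z, w):
    """The same sum without conjugation; NOT an inner product on C^n."""
    return sum(a * b for a, b in zip(z, w))

def norm(z):
    """Norm induced by the Hermitian inner product."""
    return herm(z, z).real ** 0.5

# Hypothetical sample vectors in C^2 and a scalar of absolute value one.
z = [1 + 2j, 3 - 1j]
w = [2 - 1j, 1j]
a = 1j
az = [a * c for c in z]

print(herm(z, w), herm(w, z))          # complex conjugates of each other
print(norm(az), abs(a) * norm(z))      # homogeneity holds for the Hermitian form
print(bilinear(z, z), bilinear(az, az))  # without conjugation: not even real
```

Without the conjugate, the quantity `bilinear(z, z)` need not be a nonnegative real number, so its square root could not serve as a norm.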

Now the Cauchy-Schwarz inequality $|(z, w)| \le \|z\|\|w\|$ is still valid for complex inner products, but the proof requires one additional twist. We begin, as in the real case, by expanding $(z + w, z + w) \ge 0$ and $(z - w, z - w) \ge 0$ to obtain $(z, z) + (z, w) + (w, z) + (w, w) \ge 0$ and $(z, z) - (z, w) - (w, z) + (w, w) \ge 0$. Thus $(z, w) + (w, z)$ is real and satisfies

1. $(x, y) = \overline{(y, x)}$ (Hermitian symmetry).

2. $(ax + by, z) = a(x, z) + b(y, z)$ and $(x, ay + bz) = \bar{a}(x, y) + \bar{b}(x, z)$ (Hermitian linearity).

3. $(x, x)$ is real and $(x, x) \ge 0$ with equality if and only if $x = 0$.

complex number a. The proof that a complex norm defines a metric via $d(x, y) = \|x - y\|$ is the same as in the real case.

The definition of complex inner product requires one important modification. In place of the bilinearity, we must introduce complex conjugates on one or the other side. (By convention mathematicians have chosen the right side and physicists the left.) This is referred to as Hermitian linearity, and it also spoils the symmetry. The conditions on $(x, y)$ necessary for it to be an inner product in the complex case read as follows:


Page 385: Strichartz_The Way of Analysis 2000

9.1.4 Exercises

1. Let $u^{(1)}, u^{(2)}, \ldots, u^{(n)}$ be any orthonormal basis ($u^{(j)} \cdot u^{(k)} = 0$ if $j \ne k$, $u^{(j)} \cdot u^{(j)} = 1$) in $\mathbb{R}^n$. Prove that $x = \sum_{j=1}^{n} a_j u^{(j)}$ where $a_j = x \cdot u^{(j)}$ and $|x| = (\sum_{j=1}^{n} a_j^2)^{1/2}$.

2. Verify that the sup norm on $C([a, b])$ is a norm.

3. Verify that $\|f\|_1 = \int_a^b |f(x)|\,dx$ on $C([a, b])$ is a norm.

4. Prove that $\|x\|_{\sup} = \lim_{p\to\infty} \|x\|_p$ on $\mathbb{R}^n$.

We leave the verification as an exercise.

where we use the Cauchy-Schwarz inequality and the trivial inequality $\mathrm{Re}(z, w) \le |(z, w)|$.

Incidentally, the polarization identity also must be modified in the complex case. It reads:

$$(z, w) = \tfrac{1}{4}\big(\|z + w\|^2 - \|z - w\|^2 + i\|z + iw\|^2 - i\|z - iw\|^2\big).$$

$$\|z + w\|^2 = (z + w, z + w) = (z, z) + 2\,\mathrm{Re}(z, w) + (w, w) \le \|z\|^2 + 2\|z\|\|w\| + \|w\|^2 = (\|z\| + \|w\|)^2$$

Thus $|(z, w)| \le 1$ for $\|z\| = \|w\| = 1$ and we can complete the proof in general by scaling as in the real case. The triangle inequality then follows by the same token:

$(z, w)$ real by multiplying z by the appropriate complex number of absolute value one without changing $\|z\|$. In other words, $\|az\| = \|z\|$ if $|a| = 1$ and if $(z, w) = re^{i\theta}$ in polar coordinates, then the choice $a = e^{-i\theta}$ makes $(az, w) = a(z, w) = e^{-i\theta} r e^{i\theta} = r$ real, so

$$|(z, w)| = |(az, w)| = |\mathrm{Re}(az, w)| \le \tfrac{1}{2}\big(\|az\|^2 + \|w\|^2\big) = \tfrac{1}{2}\big(\|z\|^2 + \|w\|^2\big).$$


Page 386: Strichartz_The Way of Analysis 2000

15. Verify that $d(x, y) = |x - y|/(1 + |x - y|)$ defines a metric on $\mathbb{R}^n$, but this metric is not induced by any norm. (Hint: homogeneity fails.)

13. Prove that $Ax \cdot y = x \cdot A^t y$, where $A^t$ denotes the transpose matrix, obtained from A by interchanging rows and columns.

14. Let A denote any $n \times m$ matrix, and define $\|A\| = \sup\{|Ax| : |x| \le 1\}$. Show that this is indeed a norm on the space of $n \times m$ matrices (regarded as an $(n \cdot m)$-dimensional vector space).

9. Prove that if $\|x\|$ is any norm on $\mathbb{R}^n$, then there exists a positive constant M such that $\|x\| \le M|x|$ for all x in $\mathbb{R}^n$, where $|x|$ is the Euclidean norm. (Hint: $M = (\sum_{j=1}^{n} \|e^{(j)}\|^2)^{1/2}$ will do.)

10. Prove that the norm $\|x\|_1$ on $\mathbb{R}^n$ for $n > 1$ is not associated with an inner product. (Hint: violate the parallelogram law.) Do the same for $\|x\|_{\sup}$.

11. Prove that a real $n \times n$ matrix A satisfies $Ax \cdot Ay = x \cdot y$ for all x and y in $\mathbb{R}^n$ if and only if $|Ax| = |x|$ for all x in $\mathbb{R}^n$. Such matrices are called orthogonal.

12. Prove that a real $n \times n$ matrix is orthogonal if and only if its columns form an orthonormal basis for $\mathbb{R}^n$.

7. Verify the parallelogram law.

8. *Prove that if a norm $\|x\|$ on a real vector space satisfies the parallelogram law, then the polarization identity defines an inner product and that the norm associated with this inner product is the original norm.

6. Verify the polarization identity in both the real and complex cases.

5. Verify that $(f, g) = \int_a^b f(x)g(x)\,dx$ is an inner product on $C([a, b])$. What is the associated norm and metric?


Page 387: Strichartz_The Way of Analysis 2000

We have also seen that $\mathbb{R}^n$ has other metrics, such as those associated with the norms $\|x\|_{\sup}$ and $\|x\|_1$. Another important example is the space $C([a, b])$ of real-valued continuous functions on a compact interval $[a, b]$ with the sup-norm metric $d(f, g) = \sup_x |f(x) - g(x)|$.

Whenever we have one metric space we can immediately get many more by the simple device of restricting to a subset. If M is a metric space with distance function $d(x, y)$, then any subset $M' \subseteq M$ becomes a metric space, called a subspace, with the same distance function $d(x, y)$ (now x and y are restricted to lie in $M'$). Clearly there are many interesting metric subspaces of $\mathbb{R}^n$ (warning: the term "subspace" here is being used in a completely different way than in linear algebra, where a vector subspace must have the vector space properties). We will find in particular that it is very useful to consider metric space

$$d(x, y) = \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2}.$$

We begin by describing some examples so that we can refer the abstract concepts to concrete situations. Our basic example is Euclidean space $\mathbb{R}^n$ with the Pythagorean metric

2. d(x, y) = d(y, x) (symmetry), and

3. $d(x, z) \le d(x, y) + d(y, z)$ for any x, y, z in M (triangle inequality).

In this section we want to discuss the generalizations of the concepts introduced in Chapter 3, including open set, closed set, compact set, limits, and completeness. It turns out that these concepts can be described in terms of the most rudimentary structure on $\mathbb{R}^n$, the metric space structure. In order to make this clear we will develop those concepts for an abstract metric space. This doesn't really involve any more work, and it has many rewards, for there are other contexts in which these concepts are very useful.

Recall that a metric space is simply a set M on which we have defined a distance function $d(x, y)$, real-valued, for x and y in M, satisfying

1. $d(x, y) \ge 0$ with equality if and only if $x = y$ (positivity),

9.2.1 Open Sets

9.2 Topology of Metric Spaces


Page 388: Strichartz_The Way of Analysis 2000

$$d(y, z) \le d(y, x) + d(x, z) < r_1 + (r - r_1) = r$$

where the radius r is positive and the center y is an arbitrary point in the metric space. We follow the mathematical convention of using the word "ball" for the solid region and "sphere" for the boundary, $\{x : d(x, y) = r\}$. In the example of $\mathbb{R}^n$ for n = 2 or 3 these balls are what we normally think of as balls (or discs in $\mathbb{R}^2$) with radius r and center y. In a general metric space, however, we cannot expect the balls to have any "roundness". As a consequence of the triangle inequality, we can easily show that an open ball with center y also contains open balls centered at all its other points. To be precise, if x is a point in $B_r(y)$, then $d(x, y) = r_1 < r$ and so $B_{r - r_1}(x) \subseteq B_r(y)$ because if z is in $B_{r - r_1}(x)$, then $d(x, z) < r - r_1$, which implies

$$B_r(y) = \{x : d(x, y) < r\}$$

Now that we have some examples, we begin the topological theory. The role of open intervals in $\mathbb{R}$ will be played by the open balls

Figure 9.2.1:

concepts for subspaces of Euclidean space. Sometimes the restricted distance function for a subspace may not be the most natural one to consider. For example, the unit circle in the plane as a subspace of $\mathbb{R}^2$ has a chord-length distance function, as illustrated in Figure 9.2.1. It is not difficult to verify that arc-length distance makes the circle a metric space, a different but perhaps more natural metric space. We will return to this idea in a later chapter. For a completely different example of a metric, see exercise set 9.2.5, number 12.
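The two distance functions on the unit circle can be compared directly. In this sketch points of the circle are described by their angle; the parametrization and the function names `chord_distance` and `arc_distance` are assumptions chosen for the illustration.

```python
import math

def point(t):
    """Point on the unit circle at angle t (radians)."""
    return (math.cos(t), math.sin(t))

def chord_distance(s, t):
    """Distance inherited from R^2: length of the chord joining the two points."""
    (a, b), (c, d) = point(s), point(t)
    return math.hypot(a - c, b - d)

def arc_distance(s, t):
    """Length of the shorter arc of the circle joining the two points."""
    gap = abs(s - t) % (2 * math.pi)
    return min(gap, 2 * math.pi - gap)

for t in (0.1, math.pi / 2, math.pi):
    print(t, chord_distance(0.0, t), arc_distance(0.0, t))
```

For nearby points the two distances nearly agree, while for antipodal points the chord has length 2 and the arc has length $\pi$; the chord is never longer than the arc.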

3699.2 Topology 01 Metric Spaces

Page 389: Strichartz_The Way of Analysis 2000

Specifically, we define an open set in a metric space to be a set A with the property that every point in A lies in an open ball contained in A. This is equivalent to saying for every point y in A, $B_r(y) \subseteq A$ for some $r > 0$. Notice that it suffices to consider only balls with radius 1/m for some integer m, since $r > 1/m$ for some m by the axiom of Archimedes, so $B_{1/m}(y) \subseteq B_r(y)$. We define a neighborhood of a point to be an open set containing that point. For most applications we can restrict attention to the countable sequence $B_{1/m}(y)$ of neighborhoods of y. We define the interior of a set A to be the subset of all points in A contained in open balls contained in A. Thus A is open if and only if it equals its interior.

From the definition it is clear that a nonempty open set (the empty set is open by convention) is a union of open balls and conversely. However, the open sets in $\mathbb{R}^n$ are quite complicated; there is no nice structure theorem as there is in $\mathbb{R}^1$, where every open set is a disjoint union of open intervals. Even more distressing, if we consider subspaces of $\mathbb{R}^n$, the concept of open set changes. If M is a subspace of $\mathbb{R}^n$, then the open balls of M are the sets $\{x \text{ in } M : d(x, y) < r\}$ for center points y in M. The open subsets of M are thus unions of these kinds of balls and need not be open in $\mathbb{R}^n$. To make the distinction clear we sometimes refer to these sets as open relative to M or relatively open sets. The set M is always open relative to M but need not be open in $\mathbb{R}^n$. Of course if M is open in $\mathbb{R}^n$, then all open subsets of M are open

Figure 9.2.2:

by the triangle inequality; so z is in $B_r(y)$. (See Figure 9.2.2.) We will use this property to define a general notion of open set.
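The nesting of balls, $B_{r - r_1}(x) \subseteq B_r(y)$ with $r_1 = d(x, y)$, can be spot-checked by sampling in a metric whose balls are not round. The sketch below uses the taxicab metric on the plane with exact rational arithmetic; the function names, the sample points, and the sampling scheme are all assumptions made for illustration, and sampling is of course no substitute for the triangle-inequality argument.

```python
from fractions import Fraction
import random

def taxicab(p, q):
    """Metric induced by the 1-norm."""
    return sum(abs(a - b) for a, b in zip(p, q))

def small_ball_inside(d, y, r, x, samples=2000, seed=0):
    """Sample points z near x and check: d(x, z) < r - d(x, y) implies d(y, z) < r."""
    rng = random.Random(seed)
    r1 = d(x, y)
    if not r1 < r:
        raise ValueError("x must lie in the open ball B_r(y)")
    s = r - r1  # radius of the smaller ball centered at x
    for _ in range(samples):
        z = tuple(c + s * Fraction(rng.randint(-100, 100), 100) for c in x)
        if d(x, z) < s and not d(y, z) < r:
            return False  # would contradict the triangle inequality
    return True

y = (Fraction(0), Fraction(0))
x = (Fraction(3, 10), Fraction(4, 10))
print(small_ball_inside(taxicab, y, Fraction(1), x))
```

Because the arithmetic is exact, the check can only succeed; any other function `d` satisfying the metric axioms could be passed in place of `taxicab`.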


Page 390: Strichartz_The Way of Analysis 2000

so A is open in M.

$$B_r(y) = M \cap B_r(y)_1 \subseteq M \cap A_1 = A,$$

Since the distance function is the same (this is the definition of metric subspace), we clearly have $B_r(y) = M \cap B_r(y)_1$. In other words, the open balls in M are exactly the intersections of M with open balls in $M_1$ whose centers happen to lie in M.

Now given an open set A in M, we construct the required set $A_1$. Simply take $A_1$ to be the union of all the balls $B_r(y)_1$ such that $B_r(y)$ lies in A. Since $B_r(y) \subseteq A \subseteq M$, we have $B_r(y)_1 \cap M = B_r(y)$; so $A_1 \cap M$ is the union of all the balls $B_r(y)$, which equals A from the defining property of A being open in M. Also $A_1$ is clearly open in $M_1$ because it is a union of open balls. Thus we have constructed an open subset $A_1$ of $M_1$ such that $A_1 \cap M = A$. Incidentally, we are not claiming that such a set $A_1$ is unique.

Conversely, if $A_1$ is an open subset of $M_1$, then $A = A_1 \cap M$ is an open subset of M. Indeed if y is in A, then $B_r(y)_1 \subseteq A_1$ for small enough radius r because $A_1$ is open and y is in $A_1$. Then

$$B_r(y)_1 = \{x \text{ in } M_1 : d(x, y) < r\}.$$

On the other hand we can also consider the ball of radius r centered at y in the larger metric space $M_1$,

$$B_r(y) = \{x \text{ in } M : d(x, y) < r\}.$$

Proof: If $A \subseteq M$ is open in M, this means for every y in A, $B_r(y) \subseteq A$ for small enough radius r. Here $B_r(y)$ refers to the metric space M and so is defined by

Theorem 9.2.1 Let M be a metric subspace of a metric space $M_1$. Then a subset A of M is open in M if and only if there exists an open subset $A_1$ of $M_1$ such that $A = A_1 \cap M$. If M is open in $M_1$, then for subsets A of M we have A is open in M if and only if A is open in $M_1$.

in $\mathbb{R}^n$. But in general all we can say is that the open subsets of M are the intersections of M with open subsets of $\mathbb{R}^n$. None of this has to do specifically with $\mathbb{R}^n$, so we prove it in general.


Page 391: Strichartz_The Way of Analysis 2000

In the more general theory of topological spaces, the closure properties of this theorem together with the trivial property that the empty set and the whole space are open are taken as axioms for the open sets. That is, a topological space is defined to be any set with a collection of open subsets satisfying those axioms. A metric space then becomes a special case of a topological space with the open sets given by the definition we have chosen. It is possible to introduce many of the concepts we are discussing in the still more abstract setting of general topological spaces, but we have chosen not to do this here because all the examples we need to deal with are in fact metric spaces. There are, however, topological spaces that are not metric spaces, and some of these are important in more advanced analysis.

Proof: These are immediate consequences of the definition. If $\mathcal{A}$ is any collection of open subsets A of M, we want to show $\bigcup_{A \in \mathcal{A}} A$ is open. Since the union consists of all points that lie in at least one of the sets A, given any point x in the union there is a set A in $\mathcal{A}$ containing x. Since A is open, it contains $B_r(x)$ for small enough radius r; so $B_r(x)$ is contained in the union, proving the union is open.

Next consider a finite intersection $A = A_1 \cap A_2 \cap \cdots \cap A_m$, where each $A_k$ is open. If x is in A, then x is in each $A_k$; hence, $B_{r_k}(x) \subseteq A_k$ for some positive $r_k$ because $A_k$ is open. Taking r to be the minimum of $r_1, \ldots, r_m$ (r is positive because there are only a finite number of $r_k$'s) we have $B_r(x) \subseteq B_{r_k}(x) \subseteq A_k$ for every k; so $B_r(x) \subseteq A$, proving A is open. QED

Theorem 9.2.2 In any metric space, an arbitrary union of open sets is open and a finite intersection of open sets is open.

The basic closure properties of open sets on the line are true in general.

Finally if M itself is open in $M_1$, then $B_r(y)_1 = B_r(y)$ for small enough r (depending on y) for each y in M. Thus the conditions that A be open in M and $M_1$ are the same: for all y in A, $B_r(y) = B_r(y)_1$ must lie in A for r small enough. QED


Page 392: Strichartz_The Way of Analysis 2000

In a similar way we define limit point of a sequence, this time requiring infinitely many $x_n$ to be in each neighborhood of x. Just as in the real case, this is equivalent to a subsequence converging to x. A limit point of a set A is a point x such that every $B_{1/m}(x)$ contains a point of A different from x; hence, infinitely many points of A. A set A is said to be closed if it contains all its limit points. The closure of a set is the union of the set and all its limit points. The closure is always

Figure 9.2.3:

Next we discuss the concept of limit in a metric space. If $x_1, x_2, \ldots$ is a sequence of points in M, then we say the sequence has a limit x in M (or the sequence converges to x), written $x_n \to x$ or $\lim_{n\to\infty} x_n = x$, provided that for any error 1/m there exists N such that $n \ge N$ implies $d(x_n, x) \le 1/m$. An equivalent way of saying this is that every neighborhood of x contains all but a finite number of $x_n$. This is the identical definition we gave for limits in $\mathbb{R}$; we have simply replaced the distance $|x_n - x|$ in $\mathbb{R}$ with the distance function $d(x_n, x)$ in M. Again we have the bull's eye picture of the neighborhoods $B_{1/m}(x)$, as shown in Figure 9.2.3, and the points in the sequence must eventually end up in each neighborhood.

9.2.2 Limits and Closed Sets


Page 393: Strichartz_The Way of Analysis 2000

One very important fact that we learned about convergence in $\mathbb{R}$ is that a sequence converges if and only if it satisfies the Cauchy criterion. In an arbitrary metric space we can define a Cauchy sequence as follows: for every error 1/m there exists N such that $j, k \ge N$ implies $d(x_j, x_k) \le 1/m$. It is easy to see that a convergent sequence is always a Cauchy sequence because once $d(x_k, x) \le 1/(2m)$ for all $k \ge N$ it follows that $d(x_j, x_k) \le 1/m$ for $j, k \ge N$ by the triangle inequality. However, the converse is not true for the general metric space. For example it is not true for the rational numbers, which is a metric subspace of $\mathbb{R}$. A metric space is called complete if every Cauchy sequence is convergent. We will show that $\mathbb{R}^n$ is a complete metric space. This will be easy once we show that convergence in $\mathbb{R}^n$ is equivalent to convergence in each

9.2.3 Completeness

From this we can conclude that finite unions and arbitrary intersections of closed sets are closed. We could also prove this directly from the definition.

Proof: Let A be open and B be its complement. To show B is closed we need to show it contains its limit points. So let x be a limit point of B. To show x is in B we must show x is not in A since B is the complement of A. But if x were in A, then $B_r(x) \subseteq A$ for the radius r sufficiently small since A is open. But this contradicts the fact that x is a limit point of B, for every $B_r(x)$ must contain points of B. Therefore x is not in A, hence in B.

Now conversely assume B is closed and A is its complement. We want to show A is open. So given any point x in A, we want to show $B_r(x) \subseteq A$ for some r. If not, then $B_r(x)$ would contain points of B (not equal to x) for all r, hence x would be a limit point of B. Since B is assumed closed, it would have to contain x, contradicting the fact that x is in A. Thus $B_r(x) \subseteq A$ for some r, proving A is open. QED

Theorem 9.2.3 A subset of a metric space is closed if and only if its complement is open.

a closed set, and a set is closed if and only if it equals its closure. These are all straightforward to verify. We say a subset A of B is dense in B if the closure of A contains B.


Page 394: Strichartz_The Way of Analysis 2000

Another interesting example of a complete metric space is $C([a, b])$.

Proof: Let $x^{(1)}, x^{(2)}, \ldots$ be a Cauchy sequence. Then by the theorem, $x_k^{(1)}, x_k^{(2)}, \ldots$ is a Cauchy sequence of real numbers for each k. By the completeness of the reals each of these sequences has a limit, say $\lim_{j\to\infty} x_k^{(j)} = x_k$. Then $\lim_{j\to\infty} x^{(j)} = x = (x_1, \ldots, x_n)$, also by the theorem. QED

Corollary 9.2.1 $\mathbb{R}^n$ is a complete metric space.

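$$|x_k^{(j)} - x_k| \le \frac{1}{\sqrt{n}\,m}.$$

By taking N to be the largest of the $N_k$ we have $j \ge N$ implies $|x_k^{(j)} - x_k| \le 1/(\sqrt{n}\,m)$ for all k. Then (square in the circle)

$$d(x^{(j)}, x) = \Big(\sum_{k=1}^{n} |x_k^{(j)} - x_k|^2\Big)^{1/2} \le \Big(n \cdot \frac{1}{n m^2}\Big)^{1/2} = \frac{1}{m}.$$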

Proof: The idea of the proof is that we can fit a circle in a square and a square in a circle (or more precisely, the n-dimensional analog). We essentially proved this result for n = 2 when we showed convergence in $\mathbb{C}$ was equivalent to convergence of the real and imaginary parts.
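The "circle in a square, square in a circle" idea amounts to the two-sided estimate $\max_k |a_k| \le |a| \le \sqrt{n}\,\max_k |a_k|$ for a vector a in $\mathbb{R}^n$. The sketch below checks it along a hypothetical sequence in $\mathbb{R}^3$; the sequence `x_seq`, its limit, and the helper names are assumptions chosen only for illustration.

```python
import math

def euclid(a):
    """Euclidean norm of a."""
    return math.sqrt(sum(c * c for c in a))

def max_coord(a):
    """Largest coordinate of a in absolute value."""
    return max(abs(c) for c in a)

def x_seq(j):
    """Hypothetical sequence in R^3 converging coordinatewise to (0, 1, 2)."""
    return (1.0 / j, 1.0 + 2.0 ** (-j), 2.0 - 1.0 / (j * j))

limit = (0.0, 1.0, 2.0)
for j in (1, 10, 100, 1000):
    diff = tuple(a - b for a, b in zip(x_seq(j), limit))
    n = len(diff)
    lower = max_coord(diff)                  # circle inside the square
    upper = math.sqrt(n) * max_coord(diff)   # square inside the circle
    print(j, lower, euclid(diff), upper)
```

Since the Euclidean distance is squeezed between two multiples of the largest coordinate error, it tends to zero exactly when every coordinate error does.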

Suppose first $\lim_{j\to\infty} x^{(j)} = x$ exists. Given any error 1/m there exists N such that $j \ge N$ implies $d(x^{(j)}, x) \le 1/m$. Since $|a_k| \le (\sum_{p=1}^{n} |a_p|^2)^{1/2}$, we have $|x_k^{(j)} - x_k| \le d(x^{(j)}, x)$, so $|x_k^{(j)} - x_k| \le 1/m$ for $j \ge N$ and so $\lim_{j\to\infty} x_k^{(j)} = x_k$. This is the circle in the square part.

Conversely, suppose $\lim_{j\to\infty} x_k^{(j)} = x_k$ for each k. Then given any error 1/m we can find $N_k$ such that $j \ge N_k$ implies

Theorem 9.2.4 Let $x^{(1)}, x^{(2)}, \ldots$ be a sequence of points in $\mathbb{R}^n$. Then $x^{(1)}, x^{(2)}, \ldots$ converges to a limit x if and only if the sequence of real numbers $x_k^{(1)}, x_k^{(2)}, \ldots$ converges to $x_k$ for $k = 1, 2, \ldots, n$.

coordinate. Here we follow the notational conventions of the previous section.
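The remark that the rational numbers are not complete can be made concrete with exact rational arithmetic. The sketch below is not from the text: it uses Newton's iteration $x \mapsto (x + 2/x)/2$ as one convenient (assumed) way to produce rationals whose squares approach 2; the function name `newton_sqrt2` is hypothetical.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Rational iterates of x -> (x + 2/x)/2 starting from 1."""
    x = Fraction(1)
    seq = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        seq.append(x)
    return seq

seq = newton_sqrt2(5)
gaps = [abs(b - a) for a, b in zip(seq, seq[1:])]  # distances between consecutive terms

for x, g in zip(seq[1:], gaps):
    print(x, float(g), float(x * x - 2))
```

The consecutive distances shrink rapidly, as one expects of a Cauchy sequence, while every term is a rational number whose square differs from 2. Since no rational number has square 2, such a sequence cannot converge within the rationals, although it does converge in $\mathbb{R}$.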


Page 395: Strichartz_The Way of Analysis 2000

It is important to realize that this result concerns just the sup-norm metric on the space $C([a, b])$. There are other metrics, such as the $L^1$ metric $d(f, g) = \int_a^b |f(x) - g(x)|\,dx$ on the same set $C([a, b])$, for which completeness fails. In fact a sequence of continuous functions converging pointwise to a discontinuous function can give an example of a Cauchy sequence in this metric with no limit in $C([a, b])$. See the exercises for details. The $L^1$ metric is associated to a norm and will be discussed in detail in Chapter 14. In a finite-dimensional vector space every metric associated to a norm is complete, so it is the infinite-dimensional nature of the space $C([a, b])$ that is crucial in this example. The kind of incompleteness here is different in nature from the incompleteness of the rationals in that there are no "holes" in between points of the space.
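A rough numerical picture of this phenomenon: the sketch below takes continuous "ramp" functions on [0, 1] that steepen toward a step at 1/2 and compares their $L^1$ and sup-norm distances. The family `ramp`, the interval, and the simple quadrature used here are assumptions chosen for illustration, not the example from the exercises.

```python
def ramp(k):
    """Continuous piecewise-linear f_k on [0, 1]: 0 up to 1/2, then slope k until it reaches 1."""
    return lambda x: min(1.0, max(0.0, k * (x - 0.5)))

def l1_distance(f, g, steps=20000):
    """Midpoint-rule approximation of the integral of |f - g| over [0, 1]."""
    h = 1.0 / steps
    return h * sum(abs(f((i + 0.5) * h) - g((i + 0.5) * h)) for i in range(steps))

def sup_distance(f, g, steps=20000):
    """Grid approximation of sup |f - g| over [0, 1]."""
    return max(abs(f(i / steps) - g(i / steps)) for i in range(steps + 1))

for j in (10, 100, 1000):
    f, g = ramp(j), ramp(2 * j)
    print(j, l1_distance(f, g), sup_distance(f, g))
```

The $L^1$ distance between `ramp(j)` and `ramp(2 * j)` shrinks as j grows, while their sup-norm distance stays near one half, so the sequence behaves like a Cauchy sequence in the $L^1$ metric but not in the sup-norm metric. Its pointwise limit is a step function, which is not continuous.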

Any metric space that is not complete can be completed by a process completely analogous to the procedure whereby we constructed the real numbers from the rationals. The completion $\overline{M}$ of M is defined to be the set of equivalence classes of Cauchy sequences of points in M, where equivalence is defined in exactly the same way as with

Proof: This theorem is really just a reformulation of the Cauchy criterion for uniform convergence (Theorem 7.3.1). In fact, we claim that convergence in the sup-norm metric is the same as uniform convergence.

Indeed convergence in the sup-norm metric means given any error 1/m there exists N such that $k \ge N$ implies $\sup_x |f_k(x) - f(x)| \le 1/m$, which is the same as saying there exists N independent of x such that $k \ge N$ implies $|f_k(x) - f(x)| \le 1/m$ for every x, which is uniform convergence.

The same reasoning shows that the Cauchy criterion for a sequence $\{f_n\}$ in the sup-norm metric is identical to the uniform Cauchy criterion: given any error 1/m there exists N such that $|f_j(x) - f_k(x)| \le 1/m$ for all x provided $j, k \ge N$. Thus any Cauchy sequence in the sup-norm metric converges uniformly to a function f that is continuous (being the uniform limit of continuous functions), which is the same as saying $f_n \to f$ in the sup-norm metric. QED

Theorem 9.2.5 $C([a, b])$ with the sup-norm metric is a complete metric space.


Page 396: Strichartz_The Way of Analysis 2000

The idea of compactness is important in the general setting of metric spaces and $\mathbb{R}^n$ in particular. The definition of a compact subset A of a metric space is that every sequence $a_1, a_2, \ldots$ of points in A has a limit point in A (or, equivalently, has a subsequence that converges to a point in A). Notice that the definition refers only to the points in A and to the distance function between points of A, but it does not refer

9.2.4 Compactness

numerical sequences. We regard M as a subset of $\overline{M}$ by identifying the point x in M with the equivalence class of the sequence (x, x, ...), exactly the way we regarded the rationals as a subset of the reals. We can make $\overline{M}$ into a metric space by defining the distance between the equivalence class of $x_1, x_2, \ldots$ and the equivalence class of $y_1, y_2, \ldots$ to be $\lim_{n\to\infty} d(x_n, y_n)$. This definition requires that we verify: 1) the limit exists, 2) the limit is independent of the choice of sequences from the equivalence classes, and 3) the distance so defined satisfies the axioms for a metric. All these verifications are routine. A general metric space M has no further structure, so we can't say any more about $\overline{M}$. But if M happens to be a vector space and the metric is associated to a norm or an inner product on M, then we can extend the norm or inner product structure to $\overline{M}$. Finally we can prove that $\overline{M}$ is complete, justifying the terminology. Again the proof of the completeness of $\overline{M}$ is completely analogous to the proof of the completeness of $\mathbb{R}$. A complete normed vector space is called a Banach space, and a complete inner product space is called a Hilbert space. The study of these structures forms an important part of twentieth century analysis, but it is beyond the scope of this work.

It should be pointed out that the abstract construction of the completion $\overline{M}$ sketched above is not always very satisfactory. Frequently we want a more concrete description of $\overline{M}$. For example, in the $L^1$ metric $d(f, g) = \int_a^b |f(x) - g(x)|\,dx$ on $C([a, b])$, the completion is essentially the space of Lebesgue integrable functions. However, the construction of Lebesgue integration theory is much more difficult than the abstract completion; indeed the abstract completion does not enable you to identify the elements of the completion space as certain functions on $[a, b]$, while the Lebesgue theory does. For this reason we will not discuss completions in more detail.


to points not in $A$. Thus it is the same thing to say $A$ is a compact subset of $M$ or $A$ is a compact subset of $N$ if $N$ is any subspace of $M$ containing $A$ (remember that "subspace" implies the metric is the same). Thus compactness is an absolute concept, unlike "open" and "closed", which depend on the whole space. In particular we say that a metric space $M$ is compact if $M$ is a compact subset of itself or, in other words, if all sequences of points in $M$ have limit points in $M$. Then $A$ is a compact subset of $M$ if and only if $A$ as a subspace is a compact metric space.

Clearly there is a close connection between the concepts of compactness and completeness, since both refer to the existence of limits of sequences. In fact it is easy to see that compactness implies completeness, but completeness does not imply compactness. Indeed suppose $M$ is compact. To show that $M$ is complete, consider any Cauchy sequence $x_1, x_2, \ldots$. We need to show it has a limit in $M$. By the compactness it has a limit point in $M$, and it is easy to show that the Cauchy criterion then implies that the limit point is actually a limit (see exercises). On the other hand, $\mathbb{R}$ is already an example of a complete metric space that is not compact.

As in the case of subsets of $\mathbb{R}$, compactness is also equivalent to other conditions. The most important is the Heine-Borel property: every open covering has a finite subcovering. The definitions are the same as in the case of subsets of $\mathbb{R}$. If $A$ is a subset of $M$, we say $\mathcal{B}$, a collection of subsets $B$ of $M$, is a covering of $A$ if $\bigcup_{\mathcal{B}} B \supseteq A$ and an open covering if all the sets $B$ are open sets in $M$. A subcovering simply means a subcollection $\mathcal{B}'$ of $\mathcal{B}$ that still covers $A$.

Notice now that there are two ways in which we can interpret the Heine-Borel property: in the metric space $M$ and in the metric subspace $A$. They are somewhat different but turn out to be equivalent. The reason is that "open" means different things for subsets of $A$ and subsets of $M$. If we were to consider the Heine-Borel property with respect to $A$ we would have to consider coverings of $A$ by open subsets of $A$. If $B$ is an open subset of $A$, it is not in general an open subset of $M$. Thus a covering of $A$ by open subsets of $A$ is not a special case of a covering of $A$ by open subsets of $M$. Nevertheless, because of the relationship between open subsets of $A$ and $M$, we know that if $\mathcal{B}$ is a covering by open subsets of $A$, then by extending each $B$ in $\mathcal{B}$ to $\tilde{B}$, an open subset of $M$ (with $\tilde{B} \cap A = B$), we obtain a covering $\tilde{\mathcal{B}}$ by


Proof: Many of the ideas of the proof have already been given in the proof of the analogous theorem for sets of real numbers. We start by proving that the Heine-Borel property implies compactness. Thus we need to show that an arbitrary sequence $x_1, x_2, \ldots$ of points in $A$ has a limit point in $A$. We may assume without loss of generality that all the points $x_1, x_2, \ldots$ are distinct. We want to show that not having a limit point would contradict the Heine-Borel property. To do this we construct a cover of $A$ by the sets $B_1, B_2, \ldots$ where $B_1$ is $A$ with the sequence $x_1, x_2, \ldots$ removed, $B_2$ is $A$ with $x_2, x_3, \ldots$ removed, and in general $B_k$ is $A$ with $x_k, x_{k+1}, \ldots$ removed. Clearly the sets $\{B_k\}$ cover $A$, and no finite subcovering exists. But are the sets $B_k$ open in $A$? The answer is yes because we are assuming the sequence $x_1, x_2, \ldots$ (and hence $x_k, x_{k+1}, \ldots$) has no limit points. Since the sequence $x_k, x_{k+1}, \ldots$ has no limit points, it is a closed subset of $A$ (the definition of closed is vacuously satisfied), so its complement $B_k$ is open. Thus the Heine-Borel property is not satisfied, proving half the theorem by contradiction.

Conversely, suppose $A$ is compact. We want to show that the Heine-Borel property holds. Let us first show that every countable open cover has a finite subcover, and afterward we will reduce the general case to this one. Let the sets in the cover be $B_1, B_2, \ldots$. If there were no finite subcover, then each of the sets $B_1, B_1 \cup B_2, \ldots, B_1 \cup B_2 \cup \cdots \cup B_n, \ldots$ would fail to cover $A$. Thus there would be points $a_1, a_2, \ldots$ in $A$ with $a_n$ not in $B_1 \cup \cdots \cup B_n$. By the compactness of $A$ there must be a limit point $a$ of the sequence $a_1, a_2, \ldots$ in $A$. For any fixed $n$, the tail of the sequence $a_n, a_{n+1}, a_{n+2}, \ldots$ consists of points not in $B_n$; and since the complement of the open set $B_n$ is closed, it must contain its

Theorem 9.2.6 (Abstract Heine-Borel Theorem) A metric space (or a subset of a metric space) is compact if and only if it has the Heine-Borel property.

open subsets of $M$. Conversely we can intersect open subsets of $M$ with the set $A$ to obtain open subsets of $A$, and by doing this to each set in a covering of $A$ by open subsets of $M$ we can obtain a covering of $A$ by open subsets of $A$. In this way we can go back and forth between the two Heine-Borel properties and show they are equivalent (see exercises).


limit point $a$ (of course a separate, more trivial argument must be given if infinitely many of the $a_n$ equal $a$). Thus $a$ is not in any $B_n$, contradicting the fact that $B_1, B_2, \ldots$ was supposed to be a cover of $A$.

To reduce the case of an arbitrary open cover to a countable one we need the fact that a compact metric space has a countable dense set (without the assumption of compactness this is not necessarily true of an arbitrary metric space, although it is true of an arbitrary subspace of the reals, or of $\mathbb{R}^n$). We will prove this as a separate lemma following this proof. Assuming it is true, let us denote by $x_1, x_2, \ldots$ this countable dense set and consider the countable collection of balls $B_{1/m}(x_n)$ of radius $1/m$ about $x_n$, where $m$ and $n$ vary over all positive integers. The idea is to reduce an arbitrary cover to a cover that is in one-to-one correspondence with a subset of this countable collection of balls, and hence will be a countable (or finite) subcover. To do this we start with the original cover $\mathcal{B}$ and, for each ball $B_{1/m}(x_n)$, choose one set $B$ containing $B_{1/m}(x_n)$, if any such sets exist. This requires the countable axiom of choice, one arbitrary selection from each of the sets $\{B \text{ in } \mathcal{B} \text{ containing } B_{1/m}(x_n)\}$. This gives us a countable (or finite) subcollection $\mathcal{B}'$ of $\mathcal{B}$. Why does it cover $A$? Let $a$ be any point in $A$. Since $\mathcal{B}$ covers $A$, there must be a set $B$ in $\mathcal{B}$ containing $a$. Since $B$ is open, it must contain a neighborhood of $a$. Now since $x_1, x_2, \ldots$ is dense in $A$, we can find points in the sequence arbitrarily close to $a$, and it follows easily that one of the balls $B_{1/m}(x_n)$ contains $a$ and is contained in $B$ (if say $B_{2/m}(a) \subseteq B$, then by choosing $d(x_n, a) < 1/m$ we have $a$ in $B_{1/m}(x_n)$ and $B_{1/m}(x_n) \subseteq B_{2/m}(a) \subseteq B$), as shown in Figure 9.2.4.

Since $B_{1/m}(x_n) \subseteq B$ for some $B$ in $\mathcal{B}$, by the definition of $\mathcal{B}'$ there is a set in $\mathcal{B}'$ also containing $B_{1/m}(x_n)$ and, hence, $a$ since $a$ is in $B_{1/m}(x_n)$. Thus $\mathcal{B}'$ covers $A$. To complete the proof of the theorem we need only prove the following lemma.

Lemma 9.2.1 Any compact metric space has a countable dense subset.

Proof: Choose any point for $x_1$. For $x_2$ we choose any point not too close to $x_1$. We let $R$ be the sup of $d(x_1, x)$ as $x$ varies over the space and require that $d(x_1, x_2) \ge R/2$. The fact that $R$ is finite follows from the compactness, since otherwise there would be a sequence $y_1, y_2, \ldots$ with $\lim_{n\to\infty} d(x_1, y_n) = +\infty$, and no subsequence of $y_1, y_2, \ldots$ could converge, for $y_j' \to y$ would imply

$$d(x_1, y_j') \le d(x_1, y) + d(y, y_j')$$


remains bounded since $d(y, y_j') \to 0$ and $d(x_1, y)$ is a fixed constant. Having chosen $x_1$ and $x_2$, we choose $x_3$ so that both $d(x_1, x_3)$ and $d(x_2, x_3)$ exceed $R_2/2$, where $R_2$ is the sup of the minimum of $d(x_1, x)$ and $d(x_2, x)$ as $x$ varies over the space. In this way $x_3$ is not too close to either $x_1$ or $x_2$. Proceeding inductively, having chosen $x_1, \ldots, x_n$ we let $R_n$ be the sup of the minimum of $d(x_1, x), \ldots, d(x_n, x)$ as $x$ varies over the space. As long as there are an infinite number of points in the space (if there were only a finite number there would be nothing to prove) there are always points $x$ not equal to any of $x_1, \ldots, x_n$, so $R_n$ is not zero. And $R_n$ is finite since it is always $\le R$. We can think of $R_n$ as measuring the furthest one can possibly get away from all the points $x_1, \ldots, x_n$ in the space. We then choose for $x_{n+1}$ any point such that $d(x_k, x_{n+1}) \ge R_n/2$ for all $k = 1, 2, \ldots, n$. This construction requires the countable axiom of choice.

So far we have merely obtained a sequence $x_1, x_2, \ldots$ that is "spread out" over the space. Our argument did not use the compactness of the space in any essential way (only boundedness). Now we are going to use the compactness to show that the set $x_1, x_2, \ldots$ is in fact dense. First we need to show that $\lim_{n\to\infty} R_n = 0$. Note that the choice of $x_{n+1}$ required that $d(x_{n+1}, x_k) \ge R_n/2$ for $k \le n$; so if we did not have $\lim_{n\to\infty} R_n = 0$, then there would exist $\varepsilon > 0$ such that $d(x_j, x_k) \ge \varepsilon$ for all distinct $j$ and $k$ (just take $\varepsilon = \frac{1}{2}\lim_{n\to\infty} R_n$, the limit existing because $R_n$

Figure 9.2.4:


In the course of the proof we have established several properties of compact metric spaces that are of interest on their own. The first is boundedness. We can express this in two different but equivalent ways: 1) there exists a point $x$ in the space such that $d(x, y) < R$ for every $y$ in the space, or $B_R(x)$ is the whole space. The inf of such $R$ defines the radius of the space with respect to $x$. The triangle inequality then implies that the radius is finite with respect to every point in the space, $\le 2R$ in fact. 2) Let $D = \sup_{x,y} d(x, y)$ be the diameter of the space. The diameter is finite if and only if the radius is finite, with $R \le D \le 2R$ no matter which point we use to compute the radius. These are immediate consequences of the triangle inequality.
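The relation $R \le D \le 2R$ between radius and diameter can be checked numerically on a small example. The following is only a sketch under stated assumptions: the "space" is a hypothetical finite set of points in the plane with the Euclidean distance, and the helper names `radius` and `diameter` are illustrative, not taken from the text.

```python
import math
from itertools import product

def radius(points, center):
    """Radius of the finite space `points` with respect to `center`:
    the largest distance from `center` to any point of the space."""
    return max(math.dist(center, y) for y in points)

def diameter(points):
    """Diameter: the largest distance between any two points of the space."""
    return max(math.dist(x, y) for x, y in product(points, repeat=2))

# Hypothetical sample space: four corners of a 4-by-3 rectangle plus one interior point.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0), (1.0, 1.0)]

D = diameter(points)
# For every choice of center, the radius R satisfies R <= D <= 2R.
checks = [radius(points, c) <= D <= 2 * radius(points, c) for c in points]
```

The radius depends on the center chosen (it is smaller from the interior point than from a corner), while the diameter does not, which is why the inequality is stated "no matter which point we use".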

In addition to boundedness, compact metric spaces have the property that given any $1/m$ there exists a finite set of points $x_1, \ldots, x_n$ such that every point $x$ is within a distance $1/m$ of one of them. This was established in the course of the proof of the lemma. Notice that it says $B_{1/m}(x_1), \ldots, B_{1/m}(x_n)$ covers the space, so it is also an immediate consequence of the Heine-Borel property.

We have seen that a compact metric space must be bounded and complete. Here the completeness is playing the role of "closed" for subsets of the reals. In the abstract setting, saying that a set is closed may have no significance: if the whole space is being considered, then it is always closed. The connection between "complete" and "closed subset of the reals" is explained by the following simple fact, whose proof we leave as an exercise: a subspace $A$ of a complete metric space $M$ is itself complete if and only if it is a closed set in $M$. Since the reals are complete, the closed subsets (as metric subspaces) are the same as the complete subspaces.

It is natural to pose the question: if a metric space is complete and bounded, is it necessarily compact? For subsets of the reals we have seen this is true, but in general it is false. Before giving a counterexample, let us first show that the analogous statement in Euclidean space

is decreasing). But then no subsequence could converge, contradicting the compactness. Thus $\lim_{n\to\infty} R_n = 0$.

To show that $x_1, x_2, \ldots$ is dense, choose any point $x$ in the space. Given any $1/m$ we need to find $x_n$ with $d(x, x_n) \le 1/m$. Choose $n$ large enough so that $R_n < 1/m$. Then $d(x, x_k) \le 1/m$ for some $k \le n$ or else by the definition of $R_n$ we would have $R_n \ge 1/m$. QED
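The "spread out" construction in this proof can be imitated on a finite set of points, where everything is computable. This is a sketch only, under assumptions that are not in the text: the space is a hypothetical finite grid in the plane, and each new point is taken to be one that attains the sup $R_n$ (the proof only asks for distance at least $R_n/2$). The names `spread_out_sequence` and `within_reach` are illustrative.

```python
import math

def spread_out_sequence(points):
    """Greedy version of the lemma's construction on a finite set of distinct points.

    Returns (order, gaps), where gaps[n-1] plays the role of R_n: the largest
    distance from any point of the space to the nearest of the first n chosen points.
    """
    order = [points[0]]                        # x_1: any point
    gaps = []
    # nearest[i] = distance from points[i] to the closest point chosen so far
    nearest = [math.dist(p, order[0]) for p in points]
    while True:
        r = max(nearest)                       # this is R_n
        gaps.append(r)
        if r == 0:                             # every point has been chosen
            break
        i = nearest.index(r)                   # a point at distance R_n >= R_n / 2
        order.append(points[i])
        nearest = [min(d, math.dist(p, points[i])) for d, p in zip(nearest, points)]
    return order, gaps

def within_reach(points, chosen, eps):
    """True if every point of the space is within eps of some chosen point."""
    return all(min(math.dist(p, c) for c in chosen) <= eps for p in points)

# Hypothetical sample space: a 5-by-5 grid in the unit square.
points = [(i / 4, j / 4) for i in range(5) for j in range(5)]
order, gaps = spread_out_sequence(points)

eps = 0.3
n = next(k for k, r in enumerate(gaps, start=1) if r < eps)   # first n with R_n < eps
dense_enough = within_reach(points, order[:n], eps)
```

In this finite setting the values in `gaps` decrease to zero, and as soon as $R_n$ drops below the tolerance the first $n$ chosen points come within that tolerance of every point, which mirrors the last step of the proof.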


provides us with examples of subspaces that are closed (hence complete) and bounded but not compact. To see this, let $f_1, f_2, \ldots$ be a sequence of continuous functions converging pointwise to a discontinuous function; for example, on $[-1,1]$, we could take $f_n$ as shown in Figure 9.2.5. Let $A$ be the set $\{f_1, f_2, \ldots\}$ in $C([-1, 1])$. $A$ is bounded because $d(f_n, 0) \le 1$. $A$ is closed because it has no limit points: any sequence from $A$ converges pointwise to a discontinuous function and so cannot converge uniformly (remember uniform convergence is the metric convergence) to a continuous function. But $A$ is not compact, since as we have just seen $f_1, f_2, \ldots$ is a sequence from $A$ with no convergent subsequence.
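A small computation makes the failure of compactness concrete. The sketch below assumes a specific shape for $f_n$, since the figure itself is not reproduced here: $f_n$ is taken to be $0$ on $[-1, 0]$, linear from $0$ to $1$ on $[0, 1/n]$, and $1$ on $[1/n, 1]$. With that assumption the sup-norm distance between $f_n$ and $f_m$ for $m > n$ is $1 - n/m$, so no subsequence can be Cauchy.

```python
def f(n, x):
    """Assumed shape of f_n: 0 on [-1, 0], rising linearly to 1 on [0, 1/n], then 1."""
    return min(1.0, max(0.0, n * x))

def sup_distance(n, m):
    """Sup-norm distance between f_n and f_m on [-1, 1].

    The difference of two piecewise linear functions is piecewise linear,
    so its largest absolute value is attained at one of the breakpoints.
    """
    breakpoints = [-1.0, 0.0, 1.0 / n, 1.0 / m, 1.0]
    return max(abs(f(n, x) - f(m, x)) for x in breakpoints)

# d(f_n, f_2n) stays at 1/2 however large n is (powers of two keep the arithmetic exact).
distances = [sup_distance(n, 2 * n) for n in (1, 2, 4, 8, 16)]
```

Each $f_n$ also satisfies $d(f_n, 0) = 1$, so the set is bounded, yet its members never get uniformly close to one another.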

We can now give an interpretation of the Arzela-Ascoli theorem in terms of compactness in $C([a, b])$. Recall that the Arzela-Ascoli theorem says that a sequence of functions in $C([a, b])$ that is both uniformly bounded and uniformly equicontinuous has a uniformly convergent subsequence. Now suppose $A$ is any subspace of $C([a, b])$ that is 1) closed,

$$d(f, g) = \sup_x |f(x) - g(x)|$$

The complete metric space $C([a, b])$ with sup-norm metric

Proof: Since we have already seen that a compact set is closed and bounded, it remains to show that if $A$ is a closed and bounded subset of $\mathbb{R}^n$, then it is compact. Thus let $x^{(1)}, x^{(2)}, \ldots$ be any sequence of points in $A$. Since $A$ is bounded, the sequence of coordinates $x_k^{(1)}, x_k^{(2)}, \ldots$ must also be bounded, for each fixed $k$, $1 \le k \le n$. Thus each of these sequences of reals has a limit point and, hence, a convergent subsequence. By taking subsequences of subsequences we can arrange to find a single subsequence, call it $y^{(1)}, y^{(2)}, \ldots$, such that $y_k^{(1)}, y_k^{(2)}, \ldots$ converges, for each $k$, $1 \le k \le n$. Since convergence in each coordinate implies convergence in $\mathbb{R}^n$, the subsequence $y^{(1)}, y^{(2)}, \ldots$ converges to some limit $y$ in $\mathbb{R}^n$. Since $A$ is closed, the limit must be in $A$. This shows $A$ is compact. QED

Theorem 9.2.7 A subspace of $\mathbb{R}^n$ is compact if and only if it is closed and bounded.

$\mathbb{R}^n$ is true.


4. Prove that the space of bounded sequences with metric $d(\{x_n\}, \{y_n\}) = \sup_n |x_n - y_n|$ is complete, and the same is true on the subspace of sequences converging to zero.

2. If $x_n \to x$ and $y_n \to y$ in a metric space, show $\lim_{n\to\infty} d(x_n, y_n) = d(x, y)$.

3. Prove that the metric $d(f,g) = \int_a^b |f(x) - g(x)|\,dx$ on $C([a,b])$ is not complete. (Hint: consider the example of a sequence of continuous functions converging pointwise to a discontinuous function.)

1. If $x_n \to x$ in a metric space and $y$ is any other point in the space, show $\lim_{n\to\infty} d(x_n, y) = d(x, y)$.

9.2.5 Exercises

2) bounded, and 3) uniformly equicontinuous (for every $1/m$ there exists $1/n$ such that for every $f$ in $A$ and $x$ and $y$ in $[a,b]$ with $|x - y| < 1/n$, we have $|f(x) - f(y)| < 1/m$). Given any sequence of functions in $A$, 2) and 3) imply by the Arzela-Ascoli theorem that there exists a uniformly convergent subsequence, and the limit is in $A$ by 1). Thus $A$ is compact. Conversely, if $A$ is a compact subspace of $C([a, b])$, then it must satisfy 1), 2), and 3). Indeed, we have already seen that a compact set is closed and bounded; the uniform equicontinuity is a consequence of the Heine-Borel property. We leave the details as an exercise. Thus we have a complete characterization of the compact subspaces of $C([a, b])$.

Figure 9.2.5: [graph of $f_n$ on $[-1, 1]$; horizontal axis marked at $-1$, $0$, $1/n$, $1$]


(Metrics with this property are called ultrametrics.)

c. Show that the Euclidean metric on $\mathbb{R}^3$ is not an ultrametric.

d. Show that the completion of the integers with the p-adic metric can be realized concretely by infinite base $p$ integers

$$d(x, z) \le \max(d(x, y), d(y, z)).$$

12. *a. Let $\mathbb{Z}$ denote the integers, and let $p$ be any fixed prime. Every integer $z$ can be written uniquely base $p$ as $\pm a_N a_{N-1} \ldots a_1 a_0$ where $0 \le a_j \le p - 1$ and $z = \pm \sum_{j=0}^{N} a_j p^j$. Let $|z|_p = p^{-k}$, where $k$ is the smallest integer for which $a_k \ne 0$. Prove that $d(x, y) = |x - y|_p$ is a metric. (It is called the p-adic metric.)

b. Show that the p-adic metric satisfies

11. Prove that if $A_1 \supseteq A_2 \supseteq A_3 \supseteq \cdots$ is a nested sequence of nonempty compact subsets of a metric space, then $\bigcap_{n=1}^{\infty} A_n$ is nonempty.

10. Show that any compact subspace of $C([a, b])$ in the sup norm is uniformly equicontinuous. (Hint: for each $1/m$ consider the covering by $B_n = \{f \text{ in } C([a, b]) : |x - y| \le 1/n \text{ implies } |f(x) - f(y)| < 1/m\}$.)

9. Construct a sequence of functions $f_1, f_2, \ldots$ in $C([0, 1])$ with sup-norm metric such that $d(f_k, 0) = 1$ and $d(f_j, f_k) = 1$ for any $j \ne k$.

8. Prove that a subspace of a complete metric space is complete if and only if it is closed.

7. Prove that a metric space is compact if and only if it is bounded, complete, and given any $1/m$ there exists a finite subset $x_1, \ldots, x_n$ such that every point $x$ in the space is within $1/m$ of one of them ($d(x, x_k) < 1/m$ for some $k$, $1 \le k \le n$).

6. Prove directly that if $A$ is a subspace of $M$, then the Heine-Borel property for $A$ as a subspace of $M$ (open meaning open in $M$) is equivalent to the Heine-Borel property for $A$ as a subspace of $A$.

5. Prove that if a Cauchy sequence in a metric space (not assumed to be complete) has a limit point, then it has a limit.


In this section we are going to discuss functions whose domains and ranges are metric spaces. Although most of the examples we deal with will concern only subspaces of Euclidean space, it is instructive to see the theory in the more abstract setting. The proofs are certainly no more difficult, and there are many applications in which other metric

9.3.1 Three Equivalent Definitions

9.3 Continuous Functions on Metric Spaces

16. Prove that the metrics on $\mathbb{R}^n$ associated to the norms $|x|_{\sup}$ and $|x|_1$ are equivalent to the usual metric.

17. Which subsets of $\mathbb{R}^n$ are both open and closed?

18. Prove that the intersection of an open subset of $\mathbb{R}^2$ with the x-axis is an open subset of the line.

19. Let $S$ denote the circle $x^2 + y^2 = 1$ in $\mathbb{R}^2$. For points on $S$ define the distance to be the length of the shortest arc of the circle joining them. Prove this is a metric. Is it the same metric as that of $S$ as a subspace of $\mathbb{R}^2$? Can you describe the distance function in terms of the angular parameter $\theta$ in the representation $(\cos\theta, \sin\theta)$ of points on $S$?

15. Prove that equivalent metrics have the same open sets. Give an example of two metrics on $\mathbb{R}$ that have the same open sets but are not equivalent.

13. Give explicitly a countable dense subset of $\mathbb{R}^n$.

14. Call two metrics $d_1$ and $d_2$ on the same set $M$ equivalent if there exist positive constants $c_1, c_2$ such that $d_1(x, y) \le c_2 d_2(x, y)$ and $d_2(x, y) \le c_1 d_1(x, y)$ for all $x$ and $y$ in $M$. Prove that $x_n \to x$ in $d_1$-metric if and only if $x_n \to x$ in $d_2$-metric, if $d_1$ and $d_2$ are equivalent.

$\pm \ldots a_n a_{n-1} \ldots a_1$ where $0 \le a_n \le p - 1$ and that the ordinary rules of addition and subtraction make the completion a group (called the p-adic integers).


Note that in definition 1 the distance function $d(x, x_0)$ refers to the metric on $M$, while $d(f(x), f(x_0))$ refers to the metric on $N$. We will not burden the notation with this distinction. Of course definition 1 is the continuity at the point $x_0$ for each point $x_0$ in the domain. We occasionally will need the notion of continuity at a point: the condition in definition 1 for an individual $x_0$. In definition 2 we do not need to add the condition $f(\lim_{n\to\infty} x_n) = \lim_{n\to\infty} f(x_n)$, but this will be an immediate consequence of shuffling $x = \lim_{n\to\infty} x_n$ into the sequence. We can paraphrase definition 2 as saying $f$ preserves limits.

In the case of numerical functions we only discussed definition 3 (inverse images of open sets are open) for functions whose domains were open subsets of $\mathbb{R}$. The reason for this is that we had not yet discussed the concept of open subset of $M$ where $M$ is a subspace of $\mathbb{R}$. Of course when $M$ is an open subset of $\mathbb{R}$ the open subsets of $M$ are the subsets of $M$ that are open in $\mathbb{R}$, so there is no difficulty. When $M$ is not open in $\mathbb{R}$, the meaning of "$f^{-1}(B)$ is open in $M$" is exactly what it says, open in $M$, not necessarily open in $\mathbb{R}$. Thus the general viewpoint enables

2. For all sequences in $M$, if $x_1, x_2, \ldots$ converges in $M$, then $f(x_1), f(x_2), \ldots$ converges in $N$.

3. The inverse image $f^{-1}(B)$ of any open set $B$ in $N$ is an open set in $M$.

1. For every $1/m$ and every $x_0$ in $M$ there exists $1/n$ such that $d(x, x_0) \le 1/n$ implies $d(f(x), f(x_0)) \le 1/m$.

spaces are involved. Since we have had the experience of studying functions whose domains and ranges are subsets of the reals, we will find many of the concepts and proofs familiar. On the other hand there will also be a few new ideas.

We introduce the notation $f : M \to N$ to mean $f$ is a function whose domain is $M$ and whose range is $N$, where both $M$ and $N$ are metric spaces. The image $f(M)$ is the set of all values actually assumed by $f$, $f(M) = \{y \text{ in } N : \text{there exists } x \text{ in } M \text{ with } f(x) = y\}$. It is a subset of $N$, not assumed equal to all of $N$. If it is all of $N$ we say $f$ is onto. We are interested primarily in continuous functions, and as in the case of numerical functions there are several equivalent definitions. We list three important definitions, all familiar:


Proof: First we show the equivalence of definitions 1 and 2. Suppose definition 1 holds, and let $x_k \to x_0$ in $M$. We want to show $f(x_k) \to f(x_0)$ in $N$. Given any error $1/m$ we use definition 1 at $x_0$ to find $1/n$ such that $d(x, x_0) \le 1/n$ implies $d(f(x), f(x_0)) \le 1/m$. Then from $x_k \to x_0$ we know there exists $j$ such that $k \ge j$ implies $d(x_k, x_0) \le 1/n$. Thus $d(f(x_k), f(x_0)) \le 1/m$ for $k \ge j$, which proves $f(x_k) \to f(x_0)$ and, hence, definition 2.

Conversely, assume definition 2 holds. By shuffling $x_0 = \lim_{k\to\infty} x_k$ into the original sequence, $x_0, x_1, x_0, x_2, x_0, x_3, \ldots$, we still have a convergent sequence, and hence by definition 2 $f(x_0), f(x_1), f(x_0), f(x_2), \ldots$ is also convergent, which can only happen if $\lim_{k\to\infty} f(x_k) = f(x_0)$. Thus definition 2 implies the stronger statement "$x_k \to x_0$ implies $f(x_k) \to f(x_0)$". Now let's establish definition 1 at the point $x_0$. Suppose it were false. Then there would exist $1/m$ such that for every $1/n$ there exists $x_n$ such that $d(x_n, x_0) \le 1/n$ but $d(f(x_n), f(x_0)) > 1/m$. The sequence $x_1, x_2, \ldots$ clearly violates definition 2, since $x_n \to x_0$ but $f(x_n)$ does not converge to $f(x_0)$.

Next we show the equivalence of definitions 1 and 3. Assume first definition 1 holds, and let $B$ be any open set in $N$. We have to show $f^{-1}(B)$ is open. So let $x_0$ be in $f^{-1}(B)$; this simply means $f(x_0)$ is in $B$. We need to find a ball $B_r(x_0)$ contained in $f^{-1}(B)$. Since $B$ is open, it contains a ball $B_{1/m}(f(x_0))$ about $f(x_0)$, and by definition 1 there exists $1/n$ (for that $1/m$ and $x_0$) such that $d(x, x_0) < 1/n$ ($x$ in

Theorem 9.3.1 For a function $f : M \to N$, the three definitions above are equivalent.

us to improve our understanding even in the concrete setting of subsets of $\mathbb{R}$.

On the other hand, it is immaterial whether we take the range $N$ as given, or reduce it to the image $f(M)$, or enlarge it to some space containing $N$, as long as we keep the same metric on the image. The reason for this is that $f^{-1}(B)$ is the same as $f^{-1}(B \cap f(M))$, because only points of $B$ in the image of $f$ contribute to the inverse image. Thus as $B$ varies over the open subsets of $N$, $B \cap f(M)$ varies over the open subsets of the image $f(M)$, and the same inverse images occur.
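The set identity $f^{-1}(B) = f^{-1}(B \cap f(M))$ used here is purely set-theoretic, so it can be illustrated with finite sets. The sketch below uses a hypothetical domain, function, and set $B$ chosen only for illustration; it says nothing about openness, only about which points contribute to the inverse image.

```python
def preimage(f, domain, subset):
    """Inverse image of `subset` under `f`, restricted to the given finite domain."""
    return {x for x in domain if f(x) in subset}

def square(x):
    return x * x

M = range(-3, 4)                      # hypothetical finite domain {-3, ..., 3}
image = {square(x) for x in M}        # f(M) = {0, 1, 4, 9}
B = {1, 2, 3, 4, 5}                   # a subset of the range; 2, 3, 5 are never attained

same = preimage(square, M, B) == preimage(square, M, B & image)
```

Here both inverse images equal $\{-2, -1, 1, 2\}$: the values $2$, $3$, and $5$ in $B$ lie outside the image and contribute nothing.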

We can take any one of the three conditions as the definition of "$f$ is continuous on $M$", since we will now show they are equivalent.


We note some simple properties of continuity that are easily deduced from the definition. If $f : M \to N$ and $g : N \to P$ are continuous, then $g \circ f : M \to P$ is continuous. If $f : M \to \mathbb{R}^n$ and $g : M \to \mathbb{R}^n$ are continuous, then $f + g : M \to \mathbb{R}^n$ is continuous. If $f : M \to \mathbb{R}^n$ and $g : M \to \mathbb{R}$ are continuous, then $g \cdot f : M \to \mathbb{R}^n$ is continuous. If $f : M \to N$ is continuous and $M_1 \subseteq M$ is any subspace, then the restriction of $f$ to $M_1$ is continuous. If $f : M \to \mathbb{R}^n$ and we define the coordinate functions $f_k : M \to \mathbb{R}$ for $1 \le k \le n$ by $f(x) = (f_1(x), \ldots, f_n(x))$, then $f$ is continuous if and only if all the $f_k$ are continuous. We leave the verification of these facts as exercises.

In the special case where the domain and range are Euclidean spaces, or subspaces of Euclidean spaces, it is important to have a rich collection of continuous functions. We have already observed that the range space can be split into coordinate components, so it is really enough to consider the case $f : \mathbb{R}^n \to \mathbb{R}$. The simplest nontrivial examples are the coordinate projection maps $f_k(x) = f_k(x_1, \ldots, x_n) = x_k$. These are easily seen to be continuous by the preservation of limits criterion. From these, using composition and arithmetic operations, we can establish the continuity of all "elementary" functions (functions for which we have a finite formula) provided we restrict the domain to all points for which the formula defining the function makes sense.

The class of polynomials on $\mathbb{R}^n$ is the class of functions built up by addition and multiplication from the coordinate projections and the constants. Writing these in concise notation is difficult, but the following multi-index convention seems to work extremely well. Let $\alpha = (\alpha_1, \ldots, \alpha_n)$ denote an $n$-tuple of non-negative integers (each $\alpha_k$ can equal $0, 1, 2, \ldots$), and let $x^\alpha = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}$. Then $p(x) = \sum c_\alpha x^\alpha$, where the sum is finite and $c_\alpha$ are constants, is the general polynomial

$B_{1/n}(x_0)$) implies $d(f(x), f(x_0)) < 1/m$ ($f(x)$ in $B_{1/m}(f(x_0))$). Thus the ball $B_{1/n}(x_0)$ lies in $f^{-1}(B)$.

Conversely, assume definition 3 holds. Given the point $x_0$ and $1/m$, we look at the inverse image of the ball $B_{1/m}(f(x_0))$. Since $B_{1/m}(f(x_0))$ is open in $N$, $f^{-1}(B_{1/m}(f(x_0)))$ is open in $M$ by definition 3. Now $x_0$ belongs to this inverse image since $f(x_0)$ is in $B_{1/m}(f(x_0))$; so by the definition of open set in $M$ there is a ball $B_{1/n}(x_0)$ contained in the inverse image $f^{-1}(B_{1/m}(f(x_0)))$, or $x$ in $B_{1/n}(x_0)$ implies $f(x)$ is in $B_{1/m}(f(x_0))$. This is just another way of writing definition 1. QED


on $\mathbb{R}^n$. We further let $|\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_n$ (note this is a different notational convention from the usual Pythagorean metric on $\mathbb{R}^n$). We call $x^\alpha$ a monomial of order or degree $|\alpha|$, and we call the order of the polynomial the order of the highest monomial appearing in it with non-zero coefficient (frequently we are sloppy in saying "a polynomial of order $m$" when we really mean "a polynomial of order $\le m$").
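The multi-index convention translates directly into a data structure. The sketch below is one possible encoding, not something prescribed by the text: a polynomial is stored as a dictionary mapping each multi-index $\alpha$ (a tuple of non-negative integers) to its coefficient $c_\alpha$, and the function names are illustrative.

```python
from math import prod

def monomial(x, alpha):
    """x^alpha = x_1^alpha_1 * x_2^alpha_2 * ... * x_n^alpha_n."""
    return prod(xk ** ak for xk, ak in zip(x, alpha))

def evaluate(poly, x):
    """p(x) = sum over alpha of c_alpha * x^alpha, for poly = {alpha: c_alpha}."""
    return sum(c * monomial(x, alpha) for alpha, c in poly.items())

def order(poly):
    """Largest |alpha| = alpha_1 + ... + alpha_n among terms with non-zero coefficient."""
    return max(sum(alpha) for alpha, c in poly.items() if c != 0)

# Hypothetical example on R^2: p(x1, x2) = 3 x1^2 x2 - x2^3 + 5
p = {(2, 1): 3, (0, 3): -1, (0, 0): 5}

value = evaluate(p, (2, 1))   # 3*4*1 - 1 + 5
degree = order(p)             # both (2, 1) and (0, 3) have |alpha| = 3
```

Storing only the non-zero terms keeps the sum finite by construction, matching the requirement in the definition.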

In dealing with functions defined in $\mathbb{R}^n$ or subsets of $\mathbb{R}^n$, it is tempting to think of the variable $x = (x_1, \ldots, x_n)$ as consisting of $n$ distinct real variables $x_1, x_2, \ldots, x_n$. In particular, we can hold $n - 1$ of them fixed and vary just one, obtaining a function of one variable. We could hope to reduce questions about a single function of $n$ variables to questions about the many functions of one variable that arise in this fashion. However, while there are some situations in which this technique proves useful, in general it is very misleading. To understand why, we have only to consider the case $n = 2$, where we can visualize the domain as the plane. By fixing one variable and varying the other, we sweep out all the horizontal and vertical lines in the plane. The function $f(x_1, x_2)$ considered as a function of $x_1$ with $x_2$ fixed is then the restriction of $f$ to the horizontal line $x_2 = a$. Thus our seductive suggestion is to try to reduce questions about $f$ to questions about its restriction to all horizontal and vertical lines. This may not be too helpful if we need to compare the values of $f$ at two different points that are not on the same horizontal or vertical line.

Let's look at continuity from this point of view. It is easy to see that the continuity of $f : \mathbb{R}^n \to \mathbb{R}$ implies the continuity of the restriction of $f$ to each of the lines obtained by holding $n - 1$ variables constant; this is an immediate consequence of the preservation of limits criterion for continuity. We say that $f$ is separately continuous if it has this property: for every $k$ and every fixed value of all $x_j$ with $j \ne k$, the function $g(x_k) = f(x_1, \ldots, x_n)$ is continuous. Thus continuity implies separate continuity. But the converse is not true. It is easy to give a counterexample in $\mathbb{R}^2$. Take the function $f(x, y) = \sin 2\theta$ where $\theta = \arctan(y/x)$ is the angular polar coordinate, and $f(0, 0) = 0$. The factor of 2 is chosen so that $f$ is zero, hence continuous, on the coordinate axes. On every line not passing through the origin the function is continuous because $f(x, y) = 2xy/(x^2 + y^2)$ (this follows from $\sin 2\theta = 2\sin\theta\cos\theta$ and $\sin\theta = y/\sqrt{x^2 + y^2}$, $\cos\theta = x/\sqrt{x^2 + y^2}$) and we don't encounter any zero divisions. Thus $f$ is separately continuous. But $f$ is discon-


Continuous functions on compact domains have other special properties. For example, if the range is the reals, then we can assert that the sup and inf are attained.

Proof: Since $f$ is continuous, for each $1/m$ and each point $x_0$ there exists $1/n$ such that $d(x, x_0) < 2/n$ implies $d(f(x), f(x_0)) < 1/(2m)$. By the triangle inequality $d(f(x), f(y)) < 1/m$ if both $x$ and $y$ are in $B_{2/n}(x_0)$. Here $1/n$ depends on $x_0$ and $1/m$. Keeping $1/m$ fixed and varying $x_0$ over $M$, consider the open covering by the smaller balls $B_{1/n}(x_0)$. By the Heine-Borel property, there exists a finite subcover. That means there exists a finite number of points $x_1, x_2, \ldots, x_N$ and radii $1/n_j$ such that every point lies in one of the balls $B_{1/n_j}(x_j)$. We now take $1/n$ to be the smallest value of $1/n_j$. Given any point $x$, we have $x$ in $B_{1/n_j}(x_j)$ for some $j$. If $d(x, y) \le 1/n$, then $d(y, x_j) \le d(x, y) + d(x, x_j) \le 1/n + 1/n_j \le 2/n_j$ by the triangle inequality, so both $x$ and $y$ belong to $B_{2/n_j}(x_j)$. We have already observed that this implies $d(f(x), f(y)) \le 1/m$. This proves the uniform continuity. QED
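A numerical illustration of what uniform continuity asserts (not of the proof) may help. The sketch below assumes two hypothetical examples that are not in the text: $\sqrt{x}$ on the compact interval $[0, 1]$, where a single $\delta = \varepsilon^2$ works everywhere because $|\sqrt{x} - \sqrt{y}| \le \sqrt{|x - y|}$, and $1/x$ on the non-compact interval $(0, 1]$, where no single $\delta$ works. Checking on a finite grid is only suggestive, of course.

```python
import math

def worst_gap(f, grid, delta):
    """Largest |f(x) - f(y)| over pairs of grid points with |x - y| <= delta."""
    return max(abs(f(x) - f(y)) for x in grid for y in grid if abs(x - y) <= delta)

def reciprocal(x):
    return 1.0 / x

grid = [k / 400 for k in range(401)]      # sample points of [0, 1]
eps = 0.1
delta = eps ** 2                          # one delta for the whole interval

gap_sqrt = worst_gap(math.sqrt, grid, delta)         # stays within eps (up to rounding)
gap_recip = worst_gap(reciprocal, grid[1:], delta)   # on (0, 1]: enormous near 0
```

The same `delta` that controls $\sqrt{x}$ everywhere on $[0, 1]$ fails badly for $1/x$ near $0$, which is consistent with the theorem since $(0, 1]$ is not compact.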

Theorem 9.3.2 Let $M$ be a compact metric space, and let $f : M \to N$ be continuous. Then $f$ is uniformly continuous.

We continue our discussion of continuous functions $f : M \to N$ for general metric spaces. Just as in the case of numerical functions, we can define a notion of uniform continuity in which the error relations are uniform over the domain: for every error $1/m$ there exists an error $1/n$ such that $d(x, y) \le 1/n$ implies $d(f(x), f(y)) \le 1/m$. Once again we can show that a continuous function on a compact domain is automatically uniformly continuous. The proof is a nice application of the Heine-Borel property.

9.3.2 Continuous Functions on Compact Domains

tinuous along any line through the origin not equal to one of the axes, for it assumes the value zero at the origin and is equal to the constant non-zero value $\sin 2\theta$ along the ray with $\theta$ constant. From this it is easy to see that $f$ is not continuous.
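The counterexample can be probed numerically using the formula $f(x, y) = 2xy/(x^2 + y^2)$ with $f(0, 0) = 0$ from the discussion above. The following sketch evaluates $f$ at points approaching the origin along an axis and along the diagonal; the sample values of $t$ are arbitrary choices for illustration.

```python
def f(x, y):
    """f(x, y) = 2xy / (x^2 + y^2) away from the origin, and f(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return 2 * x * y / (x * x + y * y)

samples = (1e-1, 1e-3, 1e-6)
on_axis = [f(t, 0.0) for t in samples]        # restriction to the x-axis: always 0
on_diagonal = [f(t, t) for t in samples]      # along y = x: always sin(2 * pi/4) = 1
on_other_ray = [f(t, 2 * t) for t in samples] # along y = 2x: always 4/5
```

Along each ray the value is constant and independent of how close the point is to the origin, so the values near $(0, 0)$ do not approach $f(0, 0) = 0$ except along the axes.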


We mention now that for sequences or series of functions on metric spaces, results analogous to those of Chapter 7 for numerical functions can be obtained without difficulty. For example, if a sequence of continuous functions $f_n : M \to N$ converges uniformly to $f : M \to N$, then $f$ is continuous. The definition of uniform convergence and the proof can be repeated almost word for word. We leave the details as an exercise. Similarly for the Arzela-Ascoli theorem: if $f_n : M \to \mathbb{R}$ is a sequence of real-valued functions on a compact metric space $M$ that is uniformly bounded and uniformly equicontinuous, then there exists a uniformly convergent subsequence. The same result is even true for functions $f_n : M \to N$ as long as $N$ is complete.

Proof: Let $A$ be compact. To show $f(A)$ is compact we need to show every sequence of points in $f(A)$ has a subsequence converging to a point in $f(A)$. But a sequence of points in $f(A)$ must have the form $f(x_1), f(x_2), \ldots$ where $x_1, x_2, \ldots$ is a sequence of points in $A$. (The point $x_k$ may not be uniquely determined by $f(x_k)$ if $f$ is not one-to-one, but all that matters is that there is at least one such point.) By the compactness of $A$ there exists a convergent subsequence $x_k' \to x_0$ with $x_0$ in $A$. Then by the continuity of $f$, the subsequence $f(x_k')$ converges to $f(x_0)$, which is in $f(A)$. Thus $f(A)$ is compact. QED

Theorem 9.3.4 The image of a compact set under a continuous function is compact.

More generally, if $f : M \to N$ is continuous and $M$ is compact, then the image $f(M)$ is compact. This implies the previous result because a compact subset of $\mathbb{R}$ is bounded and contains its sup and inf.

Proof: We give the proof just for $\sup_x f(x)$. There exists a sequence of values $\{f(x_n)\}$ converging to the sup (or to $+\infty$ if the sup is $+\infty$). By the compactness of $M$ we can obtain a convergent subsequence $x_n' \to x_0$. Then $f(x_n') \to f(x_0)$ by the continuity of $f$. Thus the sup is finite and equals $f(x_0)$. QED

Theorem 9.3.3 Let $f : M \to \mathbb{R}$ be continuous and $M$ compact. Then $\sup f(x)$ and $\inf f(x)$ are both finite, and there are points in $M$ where these values are assumed.


We consider next some analogs of the intermediate value property of numerical functions. If $f$ is a continuous numerical function defined on an interval, then the intermediate value property says it assumes all values in between values it assumes, and this implies easily that the image of the interval must be an interval. The first question that faces us, then, is to find some intrinsic properties of intervals that have counterparts in a general metric space. The kind of properties for which we are looking will express the intuitive idea of being of one piece, or connectedness. There are actually two distinct concepts of connectedness. The first, which we will call connectedness, involves the impossibility of splitting the space up into pieces. The second, which we call arcwise connectedness, involves being able to join any two points by a continuous curve; it turns out to be a stronger condition than connectedness.

Let $M$ be a metric space. We say $M$ is connected if there does not exist a pair of disjoint nonempty open sets $A$ and $B$ with $M = A \cup B$. Equivalently, since the complement of an open set is closed, $M$ is connected if the only subsets of $M$ that are both open and closed are the empty set and the whole space. To justify this definition, at least in part, we should observe that an interval of $\mathbb{R}$ is connected, and the only connected subsets of $\mathbb{R}$ are intervals.

Let $I$ be an interval of $\mathbb{R}$; it doesn't matter whether or not $I$ contains its endpoints or whether or not the endpoints are finite or infinite. Suppose $I = A \cup B$, with $A$ and $B$ disjoint open (in $I$) subsets. If $A$ and $B$ are both nonempty, they must contain points $a$ and $b$ and we may assume $a < b$ ($A$ and $B$ are disjoint, so we can't have $a = b$). To produce a contradiction we look for a dividing point between $A$ and $B$. Thus let $r = \sup\{x \text{ in } I : x \le b \text{ and } x \text{ is in } A\}$. Since $a$ is in $A$ and $a < b$, we have $r \ge a$. By the definition of the sup we have a sequence of points $x_1, x_2, \ldots$ in $A$ converging to $r$, so $r$ is in $A$ since $A$ is closed in $I$. But $A$ is also open in $I$. Since $r < b$ (we have $r \ne b$ since $r$ is in $A$), $r$ is in the interior of $I$, so $A$ must contain an open neighborhood of $r$, which contradicts the definition of $r$ as the sup of points of $A$. Thus $I$ is connected.

We now show, conversely, that intervals are the only connected subsets of $\mathbb{R}$ (points are special cases of intervals, $[a, a]$). Suppose $A$ is

9.3.3 Connectedness


a connected set in $\mathbb{R}$ and $a < b < c$ with $a$ and $c$ in $A$. We want to show this implies $b$ is in $A$. Suppose it were not; then by dividing $A$ at $b$ as $A = (A \cap (-\infty, b)) \cup (A \cap (b, \infty))$ we would obtain a forbidden decomposition into disjoint open sets. Thus we must have $b$ in $A$. It is then a simple exercise to show that $A$ must be an interval with endpoints $\inf A$ and $\sup A$.

For the second concept of connectedness, we need to define the notion of curve, also known as path or arc. A curve in $M$ is defined to be a continuous function $f : I \to M$ where $I$ is an interval in $\mathbb{R}$. The image $f(I)$ is a set of points in $M$, which we can think of as being traced out by $f(t)$ as $t$ varies in $I$, interpreted as a time variable. Thus the curve is a "trajectory of a moving particle" in $M$. We do not assume $f$ is one-to-one, so the image of the curve is allowed to intersect itself. It is tempting to identify the curve with the image $f(I)$ in $M$, with $f(t)$ supplying a parametric representation of it. Of course the same set of points in $M$ can be the image of many different curves, so, strictly speaking, we are not justified in calling $f(I)$ a curve; nevertheless, we will do so when there is no danger of confusion. When $M$ is a subspace of $\mathbb{R}^n$ the curve has the form $f(t) = (f_1(t), f_2(t), \ldots, f_n(t))$ where $f_k(t)$ are continuous numerical functions, giving the coordinates of the trajectory at each time $t$. For example, $f(t) = (\cos t, \sin t)$ is the curve whose image is the unit circle traced out in the usual manner (counterclockwise at constant angular velocity) infinitely often as $t$ varies in $\mathbb{R}$. The graph of a continuous function $g : I \to \mathbb{R}$ is a curve in the plane given by $f(t) = (t, g(t))$ for $t$ in $I$.

A metric space (or subspace of a metric space) is called arcwise connected (sometimes the term pathwise is used instead, but never curvewise) if there exists a curve connecting any two points. In more precise language, given any two points $x$ and $y$ in $M$ there exists a curve $f : [a, b] \to M$ such that $f(a) = x$ and $f(b) = y$. The intuitive content of this definition is clear; on a technical level it allows us to reduce questions about $M$ to questions about the intervals $[a, b]$. For example, if $g : M \to \mathbb{R}$ is any continuous real-valued function on an arcwise connected space $M$, we can show that $g$ has the intermediate value property. Suppose $g(x) = a$ and $g(y) = b$, with $a < b$. Let $f : I \to M$ be a curve connecting $x$ to $y$, as shown in Figure 9.3.1. Then $g \circ f$ is a continuous function from $I$ to $\mathbb{R}$ that assumes the values $a$ and $b$ at the endpoints of $I$. By the intermediate value property for


The converse of this theorem is not true, although it is true underextra assumptions on M. (See exercise set 9.3.7, number 7.)

From a practical point of view, the easiest way to show a space is connected is to show that it is arcwise connected. Indeed, most of the subspaces of $\mathbb{R}^n$ with which we will deal, such as balls and generalized rectangles ($\{x \text{ in } \mathbb{R}^n : a_j \le x_j \le b_j \text{ for } j = 1, \ldots, n\}$), are obviously arcwise connected. Any straight line segment in $\mathbb{R}^n$ is a curve, as is any broken line segment; so any subspace of $\mathbb{R}^n$ such that all points may be joined by broken line segments is connected, such as the region shown in Figure 9.3.2.

Proof: Suppose $M$ were arcwise connected but not connected. Then $M = A \cup B$ where $A$ and $B$ are disjoint non-empty open sets. Let $x$ be in $A$ and $y$ be in $B$, and let $f : [a, b] \to M$ be a curve connecting $x$ and $y$ in $M$. We claim $f^{-1}(A)$ and $f^{-1}(B)$ give a forbidden decomposition of the connected interval $[a, b]$. Indeed they are open sets because $f$ is continuous and the inverse image of open sets is open. Furthermore, for each $t$ in $[a, b]$, either $f(t)$ is in $A$ or $B$ but not both. Thus $[a, b] = f^{-1}(A) \cup f^{-1}(B)$ and $f^{-1}(A)$ and $f^{-1}(B)$ are disjoint. This contradiction to the connectedness of $[a, b]$ proves that $M$ is connected. QED

Theorem 9.3.5 An arcwise connected space is connected.

Figure 9.3.1:

numerical functions, $g \circ f$ assumes all values in $[a, b]$; hence, $g$ assumes all these values.
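The reduction to an interval can be made concrete numerically. The following is only a sketch under stated assumptions: the space is the plane, the function `g`, the endpoints, and the target value are hypothetical choices made for illustration, and the curve is a straight segment. Bisection is applied to the one-variable function $g \circ f$ on $[0, 1]$ to locate a point of the curve where $g$ takes a prescribed intermediate value.

```python
def straight_path(p, q):
    """Return the curve f : [0, 1] -> R^2 tracing the segment from p to q."""
    def f(t):
        return tuple(pi + t * (qi - pi) for pi, qi in zip(p, q))
    return f


def point_with_value(g, path, c, tol=1e-12):
    """Bisect on t in [0, 1] to find a point of the path where g is close to c.

    Assumes g(path(0)) <= c <= g(path(1)) and that g o path is continuous.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(path(mid)) <= c:
            lo = mid   # keep g(path(lo)) <= c
        else:
            hi = mid   # keep g(path(hi)) > c
    return path(lo)


def g(point):
    """A hypothetical continuous real-valued function on the plane."""
    x, y = point
    return x * x + 3 * y


path = straight_path((0.0, 0.0), (1.0, 2.0))  # g is 0 at the start and 7 at the end
z = point_with_value(g, path, 5.0)            # a point of the segment with g(z) near 5
```

Only the continuity of $g \circ f$ on the interval is used, which is exactly the point of the argument in the text.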


The implications in the theorem cannot be reversed. It is easy to give examples (see exercises) of $f : M \to N$ onto, where $N$ is connected but $M$ is not.

Proof: Both results are simple consequences of the definitions. Suppose, first, that $M$ is connected. If $N = A \cup B$ with $A$ and $B$ disjoint non-empty open subsets, then $M = f^{-1}(A) \cup f^{-1}(B)$ since $N = f(M)$, with $f^{-1}(A)$ and $f^{-1}(B)$ disjoint non-empty open subsets of $M$, a contradiction. So $N$ is connected.

Next assume $M$ is arcwise connected. Let $x$ and $y$ be points of $N = f(M)$. Then $x = f(u)$ and $y = f(v)$ for $u$ and $v$ in $M$. Since $M$ is arcwise connected, there exists a curve $g : [a, b] \to M$ joining $u$ to $v$. Then $f \circ g : [a, b] \to N$ is a curve joining $x$ to $y$. Thus $N$ is arcwise connected. QED

Theorem 9.3.6 Let $f : M \to N$ be continuous and onto, $f(M) = N$. If $M$ is connected, then $N$ is connected. If $M$ is arcwise connected, then $N$ is arcwise connected.

The notion of connectedness allows us to attain the broadest generalization of the intermediate value property: the continuous image of a connected set is connected. Furthermore, since the only connected subsets of $\mathbb{R}$ are intervals, this implies immediately that a continuous function $f : M \to \mathbb{R}$ has the intermediate value property if $M$ is connected.

Figure 9.3.2:


Hence, by induction, $d(f^{n+1}(x), f^n(x)) \le r^n d(f(x), x)$.

Proof: Choose any $x$ in $M$, and consider the sequence $f(x), f^2(x), f^3(x), \ldots$. We will show it is a Cauchy sequence; for then, by the completeness of $M$, it will have a limit. By the contractive mapping property, $d(f^{n+1}(x), f^n(x)) \le r\, d(f^n(x), f^{n-1}(x))$.

Contractive Mapping Principle Let $M$ be a complete metric space and $f : M \to M$ a contractive mapping. Then there exists a unique fixed point $x_0$, and $x_0 = \lim_{n \to \infty} f^n(x)$ for any point $x$ in $M$, with $d(x_0, f^n(x)) \le c r^n$ for a constant $c$ depending on $x$.
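The constructive character of the principle can be illustrated with a short numerical sketch. The example is an assumption made for illustration and is not taken from the text: the map is $\cos$ on the closed interval $[0, 1]$, which maps $[0, 1]$ into itself and is contractive there with constant $r = \sin 1 < 1$ by the mean value theorem. The number of iterations is chosen in advance from the estimate $d(x_0, f^n(x)) \le \frac{r^n}{1-r} d(f(x), x)$.

```python
import math


def fixed_point(f, x, r, tol=1e-10):
    """Iterate x -> f(x) until the a priori bound r^n/(1-r) * d(f(x), x) is <= tol.

    r must be a contraction constant for f with 0 <= r < 1.
    Returns the approximate fixed point and the number of iterations used.
    """
    bound = abs(f(x) - x) / (1.0 - r)  # the bound for n = 0
    n = 0
    while bound > tol:
        bound *= r                     # after n steps: r^n/(1-r) * d(f(x), x)
        n += 1
    for _ in range(n):
        x = f(x)
    return x, n


r = math.sin(1.0)                      # sup of |cos'| = |sin| on [0, 1]
x0, steps = fixed_point(math.cos, 0.5, r)
```

The step count is fixed before any iteration happens, which is what the explicit rate of convergence buys; in practice the iterates are usually much closer to the fixed point than the bound guarantees.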

Note that this condition is just a Lipschitz condition with constant less than one. It says that under the mapping, all distances are reduced by at least a factor of $r$. If we apply $f$ repeatedly we will shrink distances drastically, which makes the existence of a unique fixed point seem plausible.

We introduce the notation $f^n$ for the iterated mapping $f \circ f \circ \cdots \circ f$ ($n$ times).

Definition 9.3.1 We say $f : M \to M$ is a contractive mapping if there exists a constant $r < 1$ such that $d(f(x), f(y)) \le r\, d(x, y)$ for all $x$ and $y$ in $M$.

The next theorem we discuss is the Contractive Mapping Principle. It has a very simple proof but many important applications. We will use it many times in the chapters to come. We consider a function whose domain and range are the same metric space, which we assume is complete. The term mapping is sometimes used to denote a continuous function and suggests the intuitive idea of moving points around. For a mapping $f : M \to M$, we can ask if there are any fixed points, points for which $f(x) = x$. This is not merely idle speculation, because many problems can be cast in this form. The Contractive Mapping Principle gives a criterion for there to be a unique fixed point and a simple constructive means of finding the fixed point.

9.3.4 The Contractive Mapping Principle


It would seem that contractive mappings are hard to come by, since they have such strong properties. This is in fact the case, and in most applications we must first restrict the space. That is, we start with a mapping $f : M \to M$ that is not necessarily contractive and seek a closed subspace $M_0$ (this implies $M_0$ is complete) on which $f$ is contractive. We then have to verify that $d(f(x), f(y)) \le r\, d(x, y)$ for every $x$ and $y$ in $M_0$ and also that $f(x)$ is in $M_0$ for every $x$ in $M_0$. This second condition is easy to overlook, but it is necessary if we are to apply the theorem to $f$ restricted to $M_0$ and have a mapping $f : M_0 \to M_0$ with the same domain and range. We will discuss this in detail when we apply the Contractive Mapping Principle to obtain solutions of differential equations.

with r < 1, which is impossible. QED

Finally the fixed point is unique, for if $x_1$ were another one, then $d(x_0, x_1) = d(f(x_0), f(x_1)) \le r\, d(x_0, x_1)$

$$d(f^n(x), x_0) = \lim_{m \to \infty} d(f^n(x), f^m(x)) \le \sum_{k=n}^{\infty} r^k d(f(x), x) = \frac{r^n}{1-r}\, d(f(x), x).$$

by the continuity of $f$. The above estimates also give us the rate of convergence,

$$f(x_0) = f\left(\lim_{n \to \infty} f^n(x)\right) = \lim_{n \to \infty} f^{n+1}(x) = x_0$$

by the triangle inequality. But $d(f(x), x)$ is just a constant, and we can make $(r^{m-1} + \cdots + r^n)$ as small as we please by making $n$ large enough, because $\sum r^n$ converges (since $r < 1$). Thus, $\{f^n(x)\}$ is a Cauchy sequence and $x_0 = \lim_{n \to \infty} f^n(x)$ exists because $M$ is complete. It is easy to see that $x_0$ is a fixed point since

$$d(f^m(x), f^n(x)) \le d(f^m(x), f^{m-1}(x)) + \cdots + d(f^{n+1}(x), f^n(x)) \le (r^{m-1} + r^{m-2} + \cdots + r^n)\, d(f(x), x)$$

Thus, if m > n, we have


We consider next a famous generalization of the Weierstrass approximation theorem. A straightforward generalization would simply be that any continuous real-valued function $f : M \to \mathbb{R}$ for $M$ a compact subspace of $\mathbb{R}^n$ can be uniformly approximated by polynomials. This will follow from a more general theorem of Marshall Stone, which is known as the Stone-Weierstrass theorem. Stone asked the question: what are the properties of the polynomials that enable them to approximate arbitrary continuous functions on an interval? He observed that the collection of polynomials on an interval, call it $\mathcal{P}$, has the property that it forms a vector space: if $f$ and $g$ are in $\mathcal{P}$, then $af + bg$ is in $\mathcal{P}$ for constants $a, b$. Also it is closed under multiplication: $f$ and $g$ in $\mathcal{P}$ imply $f \cdot g$ is in $\mathcal{P}$. We summarize these properties by saying $\mathcal{P}$ forms an algebra. More generally, if $\mathcal{A}$ denotes any collection of real-valued functions on a set $M$, we say $\mathcal{A}$ forms an algebra if $f$ and $g$ in $\mathcal{A}$ imply $af + bg$ and $f \cdot g$ are in $\mathcal{A}$. If $M$ is any subset of $\mathbb{R}^n$, then the polynomials on $M$ form an algebra.

Stone discovered that any algebra $\mathcal{A}$ of continuous functions on a compact metric space $M$ will suffice to approximate uniformly all continuous real-valued functions on $M$, provided that we impose some conditions, which are obviously necessary, to guarantee that $\mathcal{A}$ is large enough. We say that $\mathcal{A}$ strongly separates points on $M$ if 1) given any $x$ in $M$ there exists $f$ in $\mathcal{A}$ with $f(x) \ne 0$, and 2) given any distinct points $x$ and $y$ in $M$ there exists $f$ in $\mathcal{A}$ with $f(x) \ne f(y)$. If $\mathcal{A}$ failed to

9.3.5 The Stone-Weierstrass Theorem*

It is tempting to consider weakening the contractive property to allow $r = 1$, but this will not work. The simplest counterexample is the translation $f(x) = x + 1$ on $\mathbb{R}$, for which $d(f(x), f(y)) = d(x, y)$ but there are no fixed points. It is not even possible to allow the condition $d(f(x), f(y)) < d(x, y)$ for every $x$ and $y$, for there are examples of mappings without fixed points satisfying this condition.
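One standard example of such a mapping, which is not given in the text and is offered here only as an illustration, is $f(x) = \sqrt{x^2 + 1}$ on $\mathbb{R}$. Its derivative $x/\sqrt{x^2+1}$ has absolute value strictly less than one, so distances strictly shrink, yet $f(x) > x$ for every $x$, so there is no fixed point. The sketch below spot-checks both facts on a few arbitrarily chosen sample points and follows an orbit, which drifts off to infinity instead of converging.

```python
import math


def f(x):
    """A map that strictly shrinks distances on R but has no fixed point."""
    return math.sqrt(x * x + 1.0)


samples = [-50.0, -3.0, -0.5, 0.0, 0.25, 2.0, 40.0]

# d(f(x), f(y)) < d(x, y) on every pair of distinct sample points
shrinks = all(abs(f(x) - f(y)) < abs(x - y)
              for x in samples for y in samples if x != y)

# f(x) - x stays positive, so f(x) = x has no solution among the samples
gaps = [f(x) - x for x in samples]

# starting from 0 the n-th iterate is sqrt(n), which is unbounded
x = 0.0
orbit = []
for _ in range(100):
    x = f(x)
    orbit.append(x)
```

No single constant $r < 1$ works here, since the ratio $d(f(x), f(y))/d(x, y)$ approaches one for large $x$ and $y$.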

There are many other fixed point theorems, such as the Brouwer fixed point theorem, which asserts that there is always a fixed point (not necessarily unique) if $M$ is a closed ball in $\mathbb{R}^n$. But that theorem is non-constructive, and Brouwer himself had to renounce it when he became a constructivist! Incidentally, there does not have to be a fixed point if $M$ is an open ball.


Lemma 9.3.1 Let $g_1$ and $g_2$ be in $\mathcal{A}$. Then for every error $1/m$ there exist functions $g_3$ and $g_4$ in $\mathcal{A}$ such that $|g_3 - \max(g_1, g_2)| \le 1/m$ and $|g_4 - \min(g_1, g_2)| \le 1/m$ for all points of $M$.

Proof: Since $\max(g_1, g_2) = (g_1 + g_2 + |g_1 - g_2|)/2$ and $\min(g_1, g_2) = (g_1 + g_2 - |g_1 - g_2|)/2$, it suffices to prove the analogous statement for $|g|$: if $g$ is in $\mathcal{A}$ there exists $g_1$ in $\mathcal{A}$ with $|g_1 - |g|| \le 1/m$. Here we use the fact that $\mathcal{A}$ is an algebra. Let $A = \sup_x |g(x)|$, which is finite because $M$ is compact and $g$ is continuous. Then $|g(x)|$ is the composition of $g : M \to \mathbb{R}$ followed by the function $|x| : [-A, A] \to \mathbb{R}$. By the Weierstrass approximation theorem on the interval $[-A, A]$, there exists

Proof: We have to show that given any continuous function $f : M \to \mathbb{R}$ and any error $1/m$ there exists $g$ in $\mathcal{A}$ with $|f(x) - g(x)| \le 1/m$ for all $x$ in $M$. We write this estimate as $f(x) - 1/m \le g(x) \le f(x) + 1/m$. We will first get $g$ in $\mathcal{A}$ to satisfy $g(x) \ge f(x) - 1/m$, and then we will decrease $g$ to get the other inequality. A key technical device is that if $g_1$ and $g_2$ are in $\mathcal{A}$, then $\max(g_1, g_2)$ and $\min(g_1, g_2)$ are "almost" in $\mathcal{A}$. More precisely, we have the following lemma:

Theorem 9.3.7 (Stone-Weierstrass Theorem) Let $M$ be a compact metric space and $\mathcal{A}$ an algebra of continuous real-valued functions on $M$ that strongly separates points. Then any continuous real-valued function on $M$ can be uniformly approximated by functions in $\mathcal{A}$.

strongly separate points (say $f(x_0) = 0$ for all $f$ in $\mathcal{A}$ or $f(x_1) = f(x_2)$ for all $f$ in $\mathcal{A}$), then the same would be true of all uniform limits of functions in $\mathcal{A}$ and we could never approximate all continuous functions without these properties. Thus the condition that $\mathcal{A}$ strongly separates points is necessary. On the other hand the condition that $\mathcal{A}$ forms an algebra is not necessary; there are many interesting examples of vector spaces of functions that do not form an algebra and yet can uniformly approximate all continuous functions. Nevertheless, there are enough applications of the Stone-Weierstrass theorem to justify its fame. The proof of the Stone-Weierstrass theorem uses the one-dimensional Weierstrass theorem and so cannot be used to give an alternative proof of the Weierstrass theorem.


a polynomial $p$ with $|p(t) - |t|| \le 1/m$ for every $t$ in $[-A, A]$ (of course $p$ depends on both $1/m$ and $A$). Then $|p(g(x)) - |g(x)|| \le 1/m$ (just take $t = g(x)$), and $p(g(x))$ is in $\mathcal{A}$ because $\mathcal{A}$ is an algebra. This proves the lemma.
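The mechanism of the lemma can be tried out numerically. The sketch below does not use the Weierstrass theorem itself; as a stand-in it uses the classical recursion $p_0 = 0$, $p_{k+1}(t) = p_k(t) + (t^2 - p_k(t)^2)/2$, whose iterates are polynomials with $p_k(0) = 0$ that converge uniformly to $|t|$ on $[-1, 1]$, with error at most $2/(k+1)$. The functions $\sin$ and $\cos$, the bound `2.0` on $|g_1 - g_2|$, and the grid are hypothetical choices made for illustration.

```python
import math


def abs_poly(t, n):
    """Evaluate p_n(t), where p_0 = 0 and p_{k+1} = p_k + (t^2 - p_k^2)/2.

    Each p_n is a polynomial in t with zero constant term, and p_n -> |t|
    uniformly on [-1, 1].
    """
    p = 0.0
    for _ in range(n):
        p = p + (t * t - p * p) / 2.0
    return p


def approx_max(g1, g2, bound, n):
    """Approximate max(g1, g2) by (g1 + g2 + bound * p_n((g1 - g2)/bound)) / 2.

    bound must satisfy |g1(x) - g2(x)| <= bound at every point considered.
    """
    def h(x):
        d = (g1(x) - g2(x)) / bound   # rescaled so that d lies in [-1, 1]
        return (g1(x) + g2(x) + bound * abs_poly(d, n)) / 2.0
    return h


grid = [2 * math.pi * k / 1000 for k in range(1001)]
h = approx_max(math.sin, math.cos, 2.0, 200)
worst = max(abs(h(x) - max(math.sin(x), math.cos(x))) for x in grid)
```

Because `h` is built from `g1` and `g2` using only sums, products, and scalar multiples, it is the kind of function an algebra containing `g1` and `g2` must contain, which is the sense in which the maximum is "almost" in the algebra.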

Returning to the proof of the theorem, we begin by finding a solution of a simpler interpolation problem: to find a function $g$ in $\mathcal{A}$ satisfying $g(x_k) = y_k$ for any finite distinct set of points $x_1, \ldots, x_n$ in $M$ and any real values $y_1, \ldots, y_n$. This step in the proof uses the hypothesis that $\mathcal{A}$ strongly separates points. We will only need to use the result for pairs of points $x_1, x_2$ and so we give the proof in that case. The reader can easily supply the general case proof by induction. We know there is a function $h_1$ in $\mathcal{A}$ such that $h_1(x_1) \ne h_1(x_2)$, and we can assume without loss of generality that $h_1(x_1) \ne 0$ and even $h_1(x_1) = 1$ by multiplying by a suitable constant. We have to consider two cases, depending on whether or not $h_1(x_2) = 0$. If $h_1(x_2) \ne 0$, then $h_1^2(x_2) \ne h_1(x_2)$ (we know $h_1(x_2) \ne h_1(x_1) = 1$), and linear combinations of $h_1$ and $h_1^2$ will solve the interpolation problem. If $h_1(x_2) = 0$, then we use the existence of $h_2$ in $\mathcal{A}$ with $h_2(x_2) \ne 0$, say $h_2(x_2) = 1$, and then linear combinations of $h_1$ and $h_2$ will solve the interpolation problem.

Consider any continuous function $f : M \to \mathbb{R}$. For any two points $x_1$ and $x_2$ we can find $h$ in $\mathcal{A}$ with $h(x) = f(x)$ for $x = x_1$ or $x_2$. Because both functions are continuous, there exists a neighborhood of each of these points on which $h(x) \ge f(x) - 1/m$. By holding one of the points fixed, say $x_1$, and varying $x_2$, we get an open covering of $M$ by neighborhoods $B$ for which there exists $h$ in $\mathcal{A}$ with $h(x_1) = f(x_1)$ and $h(x) \ge f(x) - 1/m$ for all $x$ in $B$. By the compactness of $M$ there exists a finite subcovering. Thus $M \subseteq \bigcup_{j=1}^{k} B_j$ and for each $B_j$ there exists $h_j$ in $\mathcal{A}$ such that $h_j(x_1) = f(x_1)$ and $h_j(x) \ge f(x) - 1/m$ for $x$ in $B_j$. Now the function $H = \max(h_1, \ldots, h_k)$ satisfies $H(x_1) = f(x_1)$ and $H(x) \ge f(x) - 1/m$ for every $x$ in $M$. By the lemma, and induction, we can find $g$ in $\mathcal{A}$ such that $|g(x) - H(x)| \le 1/m$ for every $x$ in $M$. Thus the function $g$ in $\mathcal{A}$ satisfies $g(x) \ge f(x) - 2/m$ for every $x$ in $M$ but $g(x_1) \le f(x_1) + 1/m$, as well. (In Figure 9.3.3 the graph of $g$ is constrained to lie above the lower line, the graph of $f(x) - 2/m$, but at $x = x_1$ it must pass through the vertical line between $f(x_1) - 2/m$ and $f(x_1) + 1/m$.)


There is also a complex-valued version of this theorem, where we require the additional hypothesis that if $g(x)$ is in $\mathcal{A}$, then the complex conjugate $\overline{g(x)}$ is in $\mathcal{A}$. The proof can be reduced to the real version by taking real and imaginary parts. We leave the details for the exercises. Without this additional hypothesis the theorem is not true, but a proper explanation of the counterexample requires the theory of complex variables.

If $M$ is any compact subset of $\mathbb{R}^n$, then the Stone-Weierstrass theorem applies to the algebra of polynomials on $M$. Indeed the constant polynomial $f(x) \equiv 1$ gives a non-zero value at every point of $M$, and

Thus we have succeeded in both pinning down the value of $g$ near that of $f$ at one point $x_1$ and at the same time bounding the values of $g(x)$ below by $f(x) - 2/m$ at all points.

We now repeat the same procedure to decrease $g$, preserving the lower bound we have already obtained, and at the same time getting the desired upper bound. Since we have one such function for each point $x_1$ of $M$ and by the continuity of $g$ and $f$ we have $g(x) \le f(x) + 2/m$ for $x$ in a neighborhood of $x_1$, we can use the compactness of $M$ to find a finite set of functions $g_1, \ldots, g_k$ in $\mathcal{A}$ and an open covering $B_1, \ldots, B_k$ such that $g_j(x) \le f(x) + 2/m$ on $B_j$. Of course we still have $g_j(x) \ge f(x) - 2/m$ on all of $M$. Thus $G(x) = \min\{g_1(x), \ldots, g_k(x)\}$ satisfies $f(x) - 2/m \le G(x) \le f(x) + 2/m$ for all $x$ in $M$. By the lemma we can find $g$ in $\mathcal{A}$ satisfying $|G(x) - g(x)| \le 1/m$ on $M$, so $|g(x) - f(x)| \le 3/m$. Thus any continuous function $f$ can be uniformly approximated by functions in $\mathcal{A}$. QED

Figure 9.3.3: [graph labels: $f(x)$, $f(x) - 2/m$, $f(x_1) + 1/m$, $x_1$]


We turn now to certain examples of continuous functions that have variously been described as pathological, bizarre, monstrous, obscene, etc. The first such example is a continuous nowhere differentiable function, discovered by Bolzano around 1830. His discovery was not widely circulated and so when Weierstrass discovered a similar example some 40 years later it was regarded as new and shocking. The reason for this was that mathematicians had tacitly assumed that all continuous functions would be differentiable except for isolated exceptional points. It is tempting to react to these examples by saying, "very well, but let's just add some hypotheses to rule them out". It turns out, however, that nowhere differentiable functions have an important role to play in very down-to-earth problems. In the study of Brownian motion (one of the central topics in modern probability theory with widespread physical applications) one finds that nowhere differentiable functions are the rule, not the exception. (With probability one, every Brownian motion path is nowhere differentiable.)

Bolzano's example was very graphic. It is also a model for a kind of construction that occurs often in the theory of fractals. The function we seek will be the limit of a sequence of approximating functions. For simplicity, we take $[0, 1]$ for the domain and range. The first function we consider is $f_0(x) = x$. For the second function $f_1$ we add a zigzag to the graph, as shown in Figure 9.3.4. For the third function $f_2$ we take each of the three straight line segments of the graph of $f_1$ and add zigzags to them. Continuing in this fashion we pass from $f_{n-1}$ to $f_n$ by adding zigzags to all the straight line segments of the graph of $f_{n-1}$. This process is illustrated in Figure 9.3.5. By controlling the size of the zigzags appropriately we can make the sequence $f_0, f_1, f_2, \ldots$ converge uniformly to a limit function $f$ that will be continuous. It is certainly plausible that $f$ should fail to be differentiable at the countable set of points where we have corners in the graphs of the $f_n$. But why should $f(x)$ fail to have a derivative at every point?

To understand this we need a simple remark. Normally we compute the derivative at a point $x_0$ by taking the limit of a difference quotient

9.3.6 Nowhere Differentiable Functions, and Worse*

one of the coordinate monomials $f_k(x) = x_k$ will assume different values at two distinct points.


$$\frac{f(x_n) - f(x_0)}{x_n - x_0} - f'(x_0)$$

where $a_n = (x_n - x_0)/(x_n - y_n)$ and $b_n = (x_0 - y_n)/(x_n - y_n)$. Notice that $a_n + b_n = 1$ and $a_n$ and $b_n$ are $\ge 0$ because of the assumption $y_n \le x_0 \le x_n$. Thus $0 \le a_n, b_n \le 1$ and since both

Proof: The idea is that the difference quotient $(f(x_n) - f(y_n))/(x_n - y_n)$ is an average of the difference quotients $(f(x_n) - f(x_0))/(x_n - x_0)$ and $(f(y_n) - f(x_0))/(y_n - x_0)$. In fact we compute

$$\frac{f(x_n) - f(y_n)}{x_n - y_n} = a_n \frac{f(x_n) - f(x_0)}{x_n - x_0} + b_n \frac{f(y_n) - f(x_0)}{y_n - x_0}$$

$$\lim_{n \to \infty} \frac{f(x_n) - f(y_n)}{x_n - y_n} = f'(x_0).$$

Lemma 9.3.2 Suppose $f(x)$ is defined in a neighborhood of $x_0$ and differentiable at $x_0$. Then if $x_n \to x_0$ and $y_n \to x_0$ with $y_n \le x_0 \le x_n$ we have
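A quick numerical illustration of the lemma, under assumptions chosen only for this sketch: $f = \sin$, $x_0 = 1$, and the two sequences $x_n = x_0 + 1/n$ and $y_n = x_0 - 1/n^2$, which straddle $x_0$ and approach it at different rates. The straddling difference quotients should approach $f'(x_0) = \cos 1$.

```python
import math


def straddling_quotient(f, x0, n):
    """Difference quotient of f between points on opposite sides of x0."""
    x_n = x0 + 1.0 / n        # approaches x0 from the right
    y_n = x0 - 1.0 / n ** 2   # approaches x0 from the left, at a different rate
    return (f(x_n) - f(y_n)) / (x_n - y_n)


quotients = [straddling_quotient(math.sin, 1.0, n) for n in (10, 100, 1000, 10000)]
errors = [abs(q - math.cos(1.0)) for q in quotients]
```

A finite table of values proves nothing, of course; the sketch only shows the kind of convergence the lemma asserts, and the lemma is used in the text in contrapositive form.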

$(f(x) - f(x_0))/(x - x_0)$ where $x_0$ is one of the points at which we evaluate $f$. However we could also hope to get $f'(x_0)$ as a limit of difference quotients $(f(x) - f(y))/(x - y)$ where both points $x$ and $y$ approach $x_0$. It turns out that this is the case if $x$ and $y$ stay on opposite sides of $x_0$.

Figure 9.3.4:


Note that $f_m$ is linear on the interval $[k/3^m, (k+1)/3^m]$, so the estimate for $f_m'$ shows $|3^m(f_m((k+1)/3^m) - f_m(k/3^m))| \ge (3/2)^m$. Thus the difference quotient for $f$ computed at points $k/3^m$ and $(k+1)/3^m$ is large. But for any point $x_0$ in the unit interval and any $m$ there is some

We use this result in contrapositive form. To show $f'(x_0)$ does not exist we need only find sequences $\{x_n\}$ and $\{y_n\}$ surrounding $x_0$ and converging to $x_0$ for which the difference quotients $(f(x_n) - f(y_n))/(x_n - y_n)$ do not converge. This makes life easy, for we need only know how to compute $f$ at a countable dense set of points in order to compute difference quotients and show $f'$ does not exist at any point.

We are now in a position to verify Bolzano's example. To be specific, we add the zigzags by dividing each interval of the domain in thirds. On the first and last thirds we cover $3/4$ of the vertical distance in the same direction and in the middle third we cover $1/2$ the vertical distance in the opposite direction, as shown in Figure 9.3.5. The exact values are not important. We note that $|f_1'(x)| \ge 3/2$ at all points except $x = 1/3$ and $2/3$ where $f_1'$ doesn't exist. By induction $|f_n'(x)| \ge (3/2)^n$ at all points except $x = k/3^n$ where $f_n'$ doesn't exist, since adding the zigzag multiplies the slope of the line segment by $9/4$ on the first and last thirds and $-3/2$ on the middle third. We also note that the value of $f_n(x)$ does not change for $x$ of the form $k/3^m$ once $n \ge m$ so that $f(k/3^m) = f_m(k/3^m)$.

can also be made as small as desired. QED

$$a_n \left( \frac{f(x_n) - f(x_0)}{x_n - x_0} - f'(x_0) \right) + b_n \left( \frac{f(y_n) - f(x_0)}{y_n - x_0} - f'(x_0) \right)$$

$$\frac{f(x_n) - f(y_n)}{x_n - y_n} - f'(x_0)$$

$$\frac{f(y_n) - f(x_0)}{y_n - x_0} - f'(x_0)$$

can be made as small as desired,

and


value of $k$ such that $k/3^m \le x_0 \le (k+1)/3^m$ and so, by the lemma, $f$ is not differentiable at $x_0$. Thus $f$ is nowhere differentiable.

It remains to verify that $f$ is continuous, that is, that the sequence $\{f_n\}$ converges uniformly. In fact we claim $|f_n(x) - f_{n-1}(x)| \le (3/4)^n$ for all $x$, which will establish the uniform convergence by comparison with $\sum (3/4)^n$. Notice $|f_1(x) - f_0(x)| \le 3/4$ by inspection. Furthermore, the maximum vertical height of any line segment in the graph of $f_1$ is at most $3/4$. Adding the zigzag to any line segment does not change the value of the function by more than $3/4$ times the vertical height of the segment, and each of the three segments of the zigzag are of height at most $3/4$ times the vertical height of the original segment. Thus by induction we have that all the vertical heights of the segments composing the graph of $f_n$ are at most $(3/4)^n$, and $|f_n(x) - f_{n-1}(x)| \le (3/4)^n$ as desired.
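The zigzag construction is easy to carry out exactly on a computer, which gives a way to check the slope and height estimates for small $n$. The sketch below is an illustration only; the function names are hypothetical, the graph of each approximation is stored as its list of corner points, and exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction


def add_zigzags(points):
    """Replace every segment of a piecewise linear graph by a three-piece zigzag."""
    refined = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        w, h = x1 - x0, y1 - y0
        # after the first third: 3/4 of the vertical change has been covered
        refined.append((x0 + w / 3, y0 + Fraction(3, 4) * h))
        # after the middle third: reversed by 1/2, leaving a net 1/4
        refined.append((x0 + 2 * w / 3, y0 + Fraction(1, 4) * h))
        # the last third covers the remaining 3/4
        refined.append((x1, y1))
    return refined


def bolzano_approximation(n):
    """Corner points of the n-th approximation, starting from the line y = x on [0, 1]."""
    points = [(Fraction(0), Fraction(0)), (Fraction(1), Fraction(1))]
    for _ in range(n):
        points = add_zigzags(points)
    return points


def slopes(points):
    return [(y1 - y0) / (x1 - x0) for (x0, y0), (x1, y1) in zip(points, points[1:])]


def heights(points):
    return [abs(y1 - y0) for (_, y0), (_, y1) in zip(points, points[1:])]


f3 = bolzano_approximation(3)
smallest_slope = min(abs(s) for s in slopes(f3))   # compare with (3/2)**3
largest_height = max(heights(f3))                  # compare with (3/4)**3
```

The corner points of one approximation are kept by all later ones, which reflects the remark that values at points of the form $k/3^m$ stop changing after step $m$.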

While Bolzano's example is geometrically appealing, it has the defect that there is no reasonable formula for the function. To obtain an example of a nowhere differentiable continuous function with a formula, we note that in Bolzano's example the differences $f_n - f_{n-1}$ have graphs that zigzag rapidly around the $x$-axis. Now if $g(x)$ denotes the function periodic of period 2 whose graph is as shown in Figure 9.3.6, then $g(2^n x)$ is also a function that zigzags rapidly about the $x$-axis and vanishes at $x = k/2^n$.

Thus we could try to get an example in the form $\sum_{n=0}^{\infty} a_n g(2^n x)$ for appropriate coefficients $a_n$. In fact it is not hard to show $f(x) = \sum_{n=0}^{\infty} (3/4)^n g(4^n x)$ will work. We leave the details for the exercises.

Figure 9.3.5: [two panels: $f_0$ and $f_1$; $f_1$ and $f_2$]


The example of Weierstrass is similar, except $\sin \pi x$ is used in place of $g(x)$.

Our last example is that of a space-filling curve. The individual components of the curve are nowhere differentiable functions, so we can think of this curve as a higher species of monster. To be specific, we will construct a continuous function $f : [0, 1] \to \mathbb{R}^2$ such that the image is the square $0 \le x, y \le 1$. This function is not one-to-one, and it is possible to prove that there is no one-to-one example. However, the proof requires a rather elaborate study of the topological meaning of dimension and is beyond the scope of this book. Think of a child scribbling on a piece of paper if you want an intuitive idea of how one might proceed to obtain an approximation to such a curve. We just have to make sure that the scribbling doesn't omit any region of the paper!

The original example is due to Peano, and such curves are usually called Peano curves. The example we give is due to I. J. Schoenberg. We let $t$ denote the parameter variable in $[0, 1]$ and $x(t)$ and $y(t)$ the coordinates of the curve $(x(t), y(t))$. We need to show that $x(t)$ and $y(t)$ are continuous functions and that for every point $(x_0, y_0)$ in the square there exists $t_0$ such that $(x(t_0), y(t_0)) = (x_0, y_0)$.

Let $f(t)$ and $g(t)$ be the functions whose graphs are sketched in Figure 9.3.7 and which are extended to the whole line to have period 1. The idea behind these functions is that the curve $(f(t), g(t))$ visits all four corners $(0, 0)$, $(0, 1)$, $(1, 0)$, and $(1, 1)$ of the square and remains in them during the intervals $[.1, .2]$, $[.3, .4]$, $[.5, .6]$, $[.7, .8]$ respectively. In particular, if we write an arbitrary point $(x, y)$ in the square by expressing $x$ and $y$ in binary expansion, $x = .x_1 x_2 \cdots$, $y = .y_1 y_2 \cdots$, then we can match the first digits $.x_1$ and $.y_1$ by $f(t)$ and $g(t)$ by controlling the first digit in the decimal expansion of $t$ (here we are

Figure 9.3.6:


$$\varepsilon_k = \begin{cases} 0 & \text{if } t_k = 1 \text{ or } 3, \\ 1 & \text{if } t_k = 5 \text{ or } 7. \end{cases}$$

Similarly we have $y(t) = \sum_{k=1}^{\infty} \delta_k/2^k = .\delta_1 \delta_2 \cdots$ in binary notation,

$$y(t) = \sum_{k=1}^{\infty} \frac{1}{2^k}\, g(10^{k-1} t).$$

Notice that these series converge uniformly by comparison with $\sum 1/2^k$ and so the limits are continuous functions.

Now suppose $t$ is a real number in $[0, 1]$ whose decimal expansion $t = .t_1 t_2 \cdots$ contains only the digits 1, 3, 5, 7. Then $f(10^{k-1} t) = f(.t_k t_{k+1} \cdots)$ because $f$ is periodic of period 1, and this is 0 if $t_k = 1$ or 3 and 1 if $t_k = 5$ or 7. Thus $x(t) = \sum_{k=1}^{\infty} \varepsilon_k/2^k = .\varepsilon_1 \varepsilon_2 \cdots$ in binary notation, where

$$x(t) = \sum_{k=1}^{\infty} \frac{1}{2^k}\, f(10^{k-1} t),$$

thinking of $.111\ldots$ as the binary expansion of 1). To control all the binary digits of $x$ and $y$ simultaneously by the decimal digits in $t$ we take the infinite series

Figure 9.3.7: [graphs of $f$ and $g$ on $[0, 1]$, horizontal axis marked 0, .1, .2, ..., .9, vertical axis from 0 to 1]


9. Give an example of a continuous mapping of a noncompact set onto a compact set.

10. Formulate a definition of uniform convergence for a sequence of functions $f_n : M \to N$, and prove that the uniform limit of continuous functions is continuous.

7. Prove that if $A$ is an open set in $\mathbb{R}^n$, then $A$ is connected if and only if $A$ is arcwise connected. (Hint: consider the set of points in $A$ that can be joined to a given point $x_0$ by a curve.)

8. Give an example of a continuous mapping of a disconnected set onto a connected set.

6. If $f_k : M \to \mathbb{R}$ for $k = 1, \ldots, n$ and $f(x) = (f_1(x), \ldots, f_n(x)) : M \to \mathbb{R}^n$, prove $f$ is continuous if and only if all the $f_k$ are continuous.

5. If $f : M \to N$ is continuous and $M_1 \subseteq M$ is any subspace, prove that the restriction of $f$ to $M_1$ is continuous.

3. If $f : M \to N$ and $g : N \to P$ are continuous, prove $g \circ f : M \to P$ is continuous. Give an example where $g \circ f$ is continuous but $g$ and $f$ are not.

4. If 1 :M -+ Rn and 9 : M -+ R are continuous, prove 9 . 1 :M -+Rn is continuous.

9.3.7 Exercises1. Prove that 1 : M -+ N is continuous if and only if 1-1(A) is

closed in M for every set A closed in N.

2. If M is a metric space and Xo a point in M, prove that d(x, xo) :M -+ R is continuous.

Thus by choosing tk appropriately we can simultaneously obtain anybinary expansions .€1€2 ... and .8182,,, and so (z(t),y(t)) takes onevery value in the unit square for one such t.

if tk = 1 or 5,if tk = 3 or 7.

where

4099.3 Continuous Functions on Metric Spaces

Page 429: Strichartz_The Way of Analysis 2000

11. State and prove a version of the Arzela-Ascoli theorem for sequences of functions $f_n : M \to N$ where N is complete and M is compact.

12. Prove that if $\mathcal{A}$ satisfies the hypotheses of the Stone-Weierstrass theorem then for any distinct points $x_1, \ldots, x_n$ in M and any real values $y_1, \ldots, y_n$ there exists a function f in $\mathcal{A}$ satisfying $f(x_k) = y_k$, k = 1, ..., n.

13. Prove that if $\mathcal{A}$ is an algebra of complex-valued functions on a compact metric space that strongly separates points and such that if f is in $\mathcal{A}$, then $\bar{f}$ is in $\mathcal{A}$, then any continuous complex-valued function on M can be uniformly approximated by functions in $\mathcal{A}$.

14. Show that the conclusion of the Stone-Weierstrass theorem is equivalent to saying $\mathcal{A}$ is dense in the metric space C(M) of continuous real-valued functions on M with metric $d(f, g) = \sup_x |f(x) - g(x)|$.

15. Give an example of a continuous mapping of (0, 1) onto (0, 1) with no fixed points.

16. Let a be a fixed real number with 1 < a < 3. Prove that the mapping $f(x) = (x/2) + (a/2x)$ satisfies the hypotheses of the contractive mapping principle on the domain $(1, \infty)$. What is the fixed point?

17. Let $T : C([0, 1]) \to C([0, 1])$ be defined by $Tf(x) = x + \int_0^x t f(t)\,dt$. Prove that T satisfies the hypotheses of the contractive mapping principle. Show that the fixed point is a solution to the differential equation $f'(x) = x f(x) + 1$.

18. Show that the set of trigonometric polynomials (functions of the form $\sum_{k=-n}^{n} a_k e^{ik\theta}$ for some n) satisfies the hypotheses of the Stone-Weierstrass theorem (complex-valued) on the circle.

19. Let g(x) be the sawtooth function in Figure 9.3.6. Prove that $f(x) = \sum_{n=0}^{\infty} (3/4)^n g(4^n x)$ is a nowhere differentiable continuous function. (Hint: evaluate the difference quotient $4^n\bigl(f((k+1)/4^n) - f(k/4^n)\bigr)$, and show that the contribution from $(3/4)^{n-1} g(4^{n-1}x)$ is the dominant term.)

20. Let $f : M \to N$ be continuous and onto, and let M be compact. Prove that A is closed in N if and only if $f^{-1}(A)$ is closed in M. Prove that if f is also one-to-one, then $f^{-1} : N \to M$ is continuous.

21. Show that $f : [0, 2\pi) \to S$ defined by $f(t) = (\cos t, \sin t)$, where S is the unit circle in $\mathbb{R}^2$, is one-to-one, onto, and continuous, but $f^{-1}$ is not continuous.

22. Give an example of a metric space that is complete but not connected and one that is connected but not complete.

23. Give an example of a continuous function $f : M \to N$ that does not take Cauchy sequences in M to Cauchy sequences in N.

24. Give examples of continuous functions $f : M \to N$ that are onto such that M is complete and N is incomplete or M is incomplete and N is complete.

25. Give an example of a contractive mapping on an incomplete metric space with no fixed point. (Hint: remove the fixed point from a complete metric space.)

26. Show that the graph of

$$f(x) = \begin{cases} \sin (1/x) & x \neq 0, \\ 0 & x = 0 \end{cases}$$

is a connected subset of $\mathbb{R}^2$ even though f is discontinuous. Is it arcwise connected?

27. Prove that the graph of a continuous function $f : I \to \mathbb{R}$ for an interval I is a connected subset of $\mathbb{R}^2$.

28. If $A_1 \supseteq A_2 \supseteq \ldots$ is a nested sequence of compact, connected sets, show that $\bigcap_n A_n$ is also connected. Similarly, show that if the sets are arcwise connected, then so is the intersection. Would the same be true if we did not assume compactness?

29. *Let $\mathcal{K}$ denote the set of compact subsets of a complete metric space M. Define the Hausdorff distance on $\mathcal{K}$ as follows: $d_H(A, B)$

is the smallest value of $\delta$ such that for every point a in A there exists a point b in B with $d(a, b) \le \delta$ and for every b in B there exists a in A with $d(a, b) \le \delta$.

a. Show that $d_H$ is a metric on $\mathcal{K}$.

b. Show that $\mathcal{K}$ is complete in this metric.

30. *An iterated function system on a complete metric space M is defined to be a finite set $f_1, \ldots, f_m$ of contractive mappings. Prove that there exists a unique compact set K (called the attractor) such that $K = \bigcup_{j=1}^{m} f_j(K)$. (Hint: show that the mapping $A \mapsto \bigcup_{j=1}^{m} f_j(A)$ on $\mathcal{K}$ satisfies the hypotheses of the contractive mapping principle, using the Hausdorff distance from the previous exercise.)

31. If f satisfies the hypotheses of the contractive mapping principle and $x_1$ is any point in M, show that $d(x_1, x_0) \le d(x_1, f(x_1))/(1 - r)$ where $x_0$ is the fixed point. Informally, this says that if $f(x_1)$ is close to $x_1$, then $x_1$ is close to the fixed point (but r must not be too close to 1 for this to be a good estimate).

32. Let $T_1$ and $T_2$ satisfy the hypotheses of the contractive mapping theorem on the same metric space M with the same contractive ratio r. Suppose $T_1$ and $T_2$ are close together, in the sense that $d(T_1x, T_2x) \le \delta$ for all x in M. Show that the fixed points $x_1$ and $x_2$ of $T_1$ and $T_2$ are also close together, namely $d(x_1, x_2) \le \delta/(1 - r)$.

9.4 Summary

9.1 Structures on Euclidean Space

Definition $\mathbb{R}^n$ is the set of ordered n-tuples $x = (x_1, x_2, \ldots, x_n)$ of real numbers.

Definition A metric space M is a set with a real-valued distance function d(x, y) defined for x, y in M satisfying

1. $d(x, y) \ge 0$ with equality if and only if x = y (positivity),

2. $d(x, y) = d(y, x)$ (symmetry),

3. $d(x, z) \le d(x, y) + d(y, z)$ (triangle inequality).

Definition A norm on a real or complex vector space is a function $\|x\|$ defined for every x in the vector space satisfying

1. $\|x\| \ge 0$ with equality if and only if x = 0 (positivity),

2. $\|ax\| = |a|\,\|x\|$ for any scalar a (homogeneity),

3. $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality).

Theorem If $\|x\|$ is a norm, then $d(x, y) = \|x - y\|$ (called the induced metric) is a metric.

Example Let C([a, b]) denote the continuous functions on [a, b]. Then $\|f\|_{\sup} = \sup_x |f(x)|$ is a norm on C([a, b]), called the sup norm.

Definition An inner product on a real vector space is a real-valued function $\langle x, y \rangle$ defined for all x and y in the vector space satisfying

1. $\langle x, y \rangle = \langle y, x \rangle$ (symmetry),

2. $\langle ax + by, z \rangle = a\langle x, z \rangle + b\langle y, z \rangle$ and $\langle x, ay + bz \rangle = a\langle x, y \rangle + b\langle x, z \rangle$ for all real numbers a, b (bilinearity),

3. $\langle x, x \rangle \ge 0$ with equality if and only if x = 0 (positive definiteness).

Theorem 9.1.1 (Cauchy-Schwarz Inequality) On a real inner product space, $|\langle x, y \rangle| \le \sqrt{\langle x, x \rangle}\,\sqrt{\langle y, y \rangle}$, with equality if and only if x and y are colinear.

Theorem 9.1.2 If $\langle x, y \rangle$ is an inner product, then $\|x\| = \sqrt{\langle x, x \rangle}$ is a norm.

Example On $\mathbb{R}^n$, $x \cdot y = \sum_{j=1}^{n} x_j y_j$ is an inner product; hence, $|x| = \sqrt{\sum_{j=1}^{n} x_j^2}$ is a norm and $|x - y|$ is a metric.

Theorem On an inner product space the polarization identity $\langle x, y \rangle = (\|x + y\|^2 - \|x - y\|^2)/4$ holds, and the associated norm satisfies the parallelogram law $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$.
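The polarization identity and the parallelogram law stated in the summary of Section 9.1 can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names (`dot`, `euclid_norm`, `max_norm`, `polarization`, `parallelogram_defect`) and the sample vectors are assumptions chosen for illustration. It also evaluates the parallelogram law for the max norm on the plane, where the law fails, which suggests that this norm is not associated with any inner product.

```python
import math

def dot(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def euclid_norm(x):
    """Norm associated with the Euclidean inner product."""
    return math.sqrt(dot(x, x))

def max_norm(x):
    """Max norm on R^n (a norm that does not come from an inner product)."""
    return max(abs(a) for a in x)

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

def polarization(norm, x, y):
    """Right-hand side of the polarization identity."""
    return (norm(add(x, y)) ** 2 - norm(sub(x, y)) ** 2) / 4

def parallelogram_defect(norm, x, y):
    """|x+y|^2 + |x-y|^2 - 2|x|^2 - 2|y|^2; zero when the law holds."""
    return (norm(add(x, y)) ** 2 + norm(sub(x, y)) ** 2
            - 2 * norm(x) ** 2 - 2 * norm(y) ** 2)

x = [1.0, -2.0, 3.0]
y = [0.5, 4.0, -1.0]
pol = polarization(euclid_norm, x, y)            # should agree with dot(x, y)
gap_euclid = parallelogram_defect(euclid_norm, x, y)
gap_max = parallelogram_defect(max_norm, [1.0, 0.0], [0.0, 1.0])
print(dot(x, y), pol, gap_euclid, gap_max)
```

For the Euclidean norm the defect vanishes up to rounding, while for the max norm with x = (1, 0) and y = (0, 1) it does not.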

Definition A complex inner product on a complex vector space is a complex-valued function $\langle x, y \rangle$ defined for all x and y in the space satisfying

1. $\langle x, y \rangle = \overline{\langle y, x \rangle}$ (Hermitian symmetry),

2. $\langle ax + by, z \rangle = a\langle x, z \rangle + b\langle y, z \rangle$ and $\langle x, ay + bz \rangle = \bar{a}\langle x, y \rangle + \bar{b}\langle x, z \rangle$ (Hermitian linearity),

3. $\langle x, x \rangle$ is real and $\langle x, x \rangle \ge 0$ with equality if and only if x = 0 (positive definiteness).

9.2 Topology of Metric Spaces

Definition A subspace M' of a metric space M is a subset of M with the same metric.

Definition The open ball $B_r(y)$ in a metric space with center y and radius r is $B_r(y) = \{x : d(x, y) < r\}$.

Definition A subset A of a metric space M is said to be open in M if every point of A lies in an open ball entirely contained in A. A neighborhood of a point is an open set containing the point. The interior of a set A is the subset of all points contained in open balls contained in A.

Theorem 9.2.1 Let M be a subspace of $M_1$. A set A is open in M if and only if there exists $A_1$ open in $M_1$ such that $A = A_1 \cap M$. If M is open in $M_1$, then a subset of M is open in M if and only if it is open in $M_1$.

Theorem 9.2.2 In a metric space, an arbitrary union of open sets, or a finite intersection of open sets, is open.

Definition $\lim_{n \to \infty} x_n = x$ means for all 1/m there exists N such that $n \ge N$ implies $d(x_n, x) \le 1/m$. We say x is a limit point of a sequence $\{x_n\}$ if every neighborhood of x contains $x_n$ for infinitely many n and x is a limit point of a set A if every neighborhood of x contains points

of A not equal to x. A set is closed if it contains all its limit points. The closure of a set consists of the set together with all its limit points. If $A \subseteq B$ we say A is dense in B if the closure of A contains B.

Theorem 9.2.3 In a metric space, a set is closed if and only if its complement is open.

Definition We say $\{x_n\}$ is a Cauchy sequence if for every 1/m there exists N such that $d(x_j, x_k) \le 1/m$ for all $j, k \ge N$. A metric space is complete if every Cauchy sequence has a limit.

Theorem 9.2.4 A sequence $x^{(1)}, x^{(2)}, \ldots$ in $\mathbb{R}^n$ converges to x if and only if the sequence of coordinates $x_k^{(1)}, x_k^{(2)}, \ldots$ converges to $x_k$ for every k = 1, ..., n.

Corollary 9.2.1 $\mathbb{R}^n$ is complete.

Theorem 9.2.5 C([a, b]) with the sup-norm metric is complete.

Definition We say A is compact if every sequence of points in A has a limit point in A.

Theorem A compact metric space is complete.

Theorem 9.2.6 (Heine-Borel) A metric space is compact if and only if it has the Heine-Borel property: every open covering has a finite subcovering.

Lemma 9.2.1 A compact metric space has a countable dense subset, and given any 1/m there exists a finite set of points such that every point is within distance 1/m of one of them.

Theorem A subspace A of a complete metric space M is itself complete if and only if it is a closed set in M.

Theorem 9.2.7 A subspace of $\mathbb{R}^n$ is compact if and only if it is closed and bounded (this is not true of general metric spaces).

9.3 Continuous Functions on Metric Spaces

Theorem 9.3.1 Let M and N be metric spaces, $f : M \to N$ a function. The following three conditions are equivalent (and a function satisfying them is called continuous):

1. for every 1/m and $x_0$ in M there exists 1/n such that $d(x, x_0) \le 1/n$ implies $d(f(x), f(x_0)) \le 1/m$;

2. if $x_1, x_2, \ldots$ is any convergent sequence in M, then $f(x_1), f(x_2), \ldots$ is convergent in N;

3. if B is any open set in N, then $f^{-1}(B)$ is open in M.

In part 2 it follows that $\lim_{n \to \infty} f(x_n) = f(\lim_{n \to \infty} x_n)$.

Theorem Continuous functions are closed under restriction to a subspace, composition, addition (when the range is $\mathbb{R}^n$) and multiplication (when the range of one is $\mathbb{R}$ and the other $\mathbb{R}^n$). If $f(x) = (f_1(x), \ldots, f_n(x))$ is a function $f : M \to \mathbb{R}^n$, then f is continuous if and only if all $f_k : M \to \mathbb{R}$ are continuous.

Example The function $f : \mathbb{R}^2 \to \mathbb{R}$ given by

$$f(x, y) = \begin{cases} \sin 2\theta & (x, y) \neq (0, 0), \\ 0 & (x, y) = (0, 0), \end{cases}$$

where $\theta = \arctan(y/x)$ is the polar coordinates angular variable, is not continuous at the origin, but is continuous in x for each fixed y and continuous in y for each fixed x.

Definition $f : M \to N$ is said to be uniformly continuous if for every 1/m there exists 1/n such that $d(x, y) \le 1/n$ implies $d(f(x), f(y)) \le 1/m$.

Theorem 9.3.2 Let M be compact. Then $f : M \to N$ continuous implies it is uniformly continuous.

Theorem 9.3.3 If M is compact and $f : M \to \mathbb{R}$ is continuous, then $\sup_x f(x)$ and $\inf_x f(x)$ are finite and there are points in M where f attains these values.

Theorem 9.3.4 The image of a compact set under a continuous function is compact.

Definition M is said to be connected if there do not exist disjoint nonempty open sets A and B with $M = A \cup B$; or equivalently, the only sets both open and closed in M are the empty set and M.

Theorem A subspace of $\mathbb{R}$ is connected if and only if it is an interval.

Definition A curve (or arc) in M is a continuous function from an interval to M. A space M is said to be arcwise connected if for every pair of points x, y in M there exists a curve $f : [a, b] \to M$ with f(a) = x, f(b) = y.

Theorem 9.3.5 Arcwise connected implies connected.

Theorem 9.3.6 Let $f : M \to N$ be continuous and onto. If M is connected, then so is N. If M is arcwise connected, then so is N.

Definition A contractive mapping is a function $f : M \to M$ such that there exists r < 1 with $d(f(x), f(y)) \le r\,d(x, y)$ for all x, y in M.

Theorem (Contractive Mapping Principle) A contractive mapping on a complete metric space has a unique fixed point $x_0$, which is the limit of $f^n(x)$ ($f^n$ denotes f iterated n times) for any x, and $d(x_0, f^n(x)) \le c\,r^n$.

Theorem 9.3.7 (Stone-Weierstrass) Let M be a compact metric space and $\mathcal{A}$ a collection of real-valued functions on M that forms an algebra (is closed under linear combinations and products) and strongly separates points (there are functions in $\mathcal{A}$ assuming distinct and non-zero values at distinct points). Then any continuous real-valued function on M can be uniformly approximated by functions in $\mathcal{A}$.

Corollary Any continuous function $f : M \to \mathbb{R}$ for $M \subseteq \mathbb{R}^n$ compact can be uniformly approximated by polynomials.

Example (Bolzano) There exists a continuous function $f : \mathbb{R} \to \mathbb{R}$

such that f' fails to exist at every point.

Lemma 9.3.2 If $f'(x_0)$ exists, then

$$f'(x_0) = \lim_{n \to \infty} \frac{f(x_n) - f(y_n)}{x_n - y_n}$$

for any pair of sequences $x_n \to x_0$ and $y_n \to x_0$ such that $y_n \le x_0 \le x_n$.

Example (Peano) There exists a continuous function from [0, 1] onto the unit square in $\mathbb{R}^2$.
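The digit-matching idea behind the Peano example (Section 9.3) can be sketched in code. This is only an illustration under stated assumptions: the text specifies f and g only on the intervals [.1, .2], [.3, .4], [.5, .6], [.7, .8], so the piecewise-linear knots `F_KNOTS` and `G_KNOTS` below are one hypothetical way to fill in the rest of the graph continuously with period 1, and the names `t_from_bits` and `peano` are invented here. Exact rational arithmetic is used so that shifting the decimal point of t loses no digits.

```python
from fractions import Fraction as Fr

# Hypothetical continuous, period-1, piecewise-linear graphs agreeing with the
# text on the four intervals: (f, g) = (0,0), (0,1), (1,0), (1,1) there.
F_KNOTS = [(Fr(0), 0), (Fr(4, 10), 0), (Fr(5, 10), 1), (Fr(8, 10), 1), (Fr(1), 0)]
G_KNOTS = [(Fr(0), 0), (Fr(2, 10), 0), (Fr(3, 10), 1), (Fr(4, 10), 1),
           (Fr(5, 10), 0), (Fr(6, 10), 0), (Fr(7, 10), 1), (Fr(8, 10), 1),
           (Fr(1), 0)]

def piecewise_linear(knots, t):
    """Evaluate the period-1 piecewise-linear function through the knots."""
    s = t % 1  # reduce to [0, 1) using periodicity
    for (a, va), (b, vb) in zip(knots, knots[1:]):
        if a <= s <= b:
            return va + (vb - va) * (s - a) / (b - a)
    raise ValueError("knots must cover [0, 1]")

# Decimal digit of t that produces the binary digit pair (epsilon_k, delta_k).
DIGIT = {(0, 0): 1, (0, 1): 3, (1, 0): 5, (1, 1): 7}

def t_from_bits(eps, delta):
    """Build t = .t_1 t_2 ... from the desired binary digits of x and y."""
    pairs = zip(eps, delta)
    return sum(Fr(DIGIT[p], 10 ** k) for k, p in enumerate(pairs, start=1))

def peano(t, terms):
    """Partial sums of x(t) = sum f(10^(k-1) t)/2^k and the analogous y(t)."""
    x = sum(piecewise_linear(F_KNOTS, 10 ** (k - 1) * t) / 2 ** k
            for k in range(1, terms + 1))
    y = sum(piecewise_linear(G_KNOTS, 10 ** (k - 1) * t) / 2 ** k
            for k in range(1, terms + 1))
    return x, y

# Target x = .101 and y = .011 in binary, i.e. x = 5/8 and y = 3/8.
t = t_from_bits([1, 0, 1], [0, 1, 1])   # decimal digits 5, 3, 7
point = peano(t, 10)
print(t, point)
```

Because the chosen t has only finitely many decimal digits, the shifted values eventually land on 0, where both assumed graphs vanish, so the partial sums reproduce the target point exactly.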

Chapter 10

Differential Calculus in Euclidean Space

10.1 The Differential

10.1.1 Definition of Differentiability

Recall that the idea of the differential calculus for numerical functions was to approximate locally a general function by a special kind of function, an affine function ax + b, and to relate properties of the general function to properties of its affine approximation. The idea of the differential calculus for functions $f : \mathbb{R}^n \to \mathbb{R}^m$ is essentially the same. The affine functions will still have the form ax + b, but now a is an m x n matrix (m rows and n columns) and b is a vector in $\mathbb{R}^m$. We adopt the convention that all vectors are considered as column vectors in any equation involving matrix multiplication. Thus

$$g(x) = ax + b = \begin{pmatrix} g_1(x) \\ \vdots \\ g_m(x) \end{pmatrix}$$

with $g_k(x) = \sum_{j=1}^{n} a_{kj}x_j + b_k$. Since these affine functions are more complicated than in the n = m = 1 case, the differential calculus in general will also be more complicated. However, we assume the reader has some familiarity with elementary linear algebra and so will feel comfortable in dealing with these affine functions.

Suppose f is a function $f : D \to \mathbb{R}^m$ with $D \subseteq \mathbb{R}^n$. For simplicity we will assume that the domain D is an open set. We want to define the best affine approximation to f at a point y. From the n = m = 1 case we expect this should be given by the condition $f(x) = g(x) + o(|x - y|)$ as $x \to y$, where g is an affine function. This means

$$\lim_{x \to y} \frac{f(x) - g(x)}{|x - y|} = 0.$$

The presumption is that if such an affine function exists, then it is unique. To see why this is so suppose there were two such functions $g_1$ and $g_2$, and let $g = g_1 - g_2$ be their difference. Then

$$g(x) = -(f(x) - g_1(x)) + (f(x) - g_2(x)) = o(|x - y|) + o(|x - y|) = o(|x - y|),$$

and of course g is also affine. But we can show that zero is the only affine function such that $g(x) = o(|x - y|)$ as $x \to y$. We write $g(x) = a(x - y) + b$; since g(y) = b we obtain b = 0 from $g(x) = o(|x - y|)$ as $x \to y$. So it suffices to show that $|a(x - y)| = o(|x - y|)$ implies a = 0. Setting z = x - y, the hypothesis is $|az|/|z| \to 0$ as $z \to 0$. But if we let z vary along a ray z = tw, for t > 0 and w in $\mathbb{R}^n$ with $w \neq 0$ fixed, then $|az| = t|aw|$ while $|z| = t|w|$, so $|az|/|z| = |aw|/|w|$ is a constant independent of t. Since this must tend to zero as $t \to 0$, it must already be zero, so aw = 0 for all $w \neq 0$, and this implies a = 0.

Thus the best affine approximation to f at a given point, if it exists, is uniquely defined. We can also assert that the best affine approximation, if it exists, must have the form $g(x) = a(x - y) + b$ where b = f(y). This follows simply from the fact that $f(x) - g(x) = o(|x - y|)$ as $x \to y$ implies $f(y) - g(y) = 0$, and b = g(y). In other words, the graphs of f and g pass through the same point at the value y of the independent variable. Thus it is only the matrix a that needs to be determined, and we shall call it the differential of f at y, written df(y). (The terms derivative and total derivative are often used to mean the same thing.)

Definition 10.1.1 We say f is differentiable at y if there exists an m x n matrix df(y), called the differential of f at y, such that

$$f(x) = f(y) + df(y)(x - y) + o(|x - y|)$$

as $x \to y$; or in other words, given any error 1/N, there exists 1/k such that $|x - y| < 1/k$ implies $|f(x) - f(y) - df(y)(x - y)| \le |x - y|/N$.
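Definition 10.1.1 can be explored numerically: for a differentiable map the ratio $|f(x) - f(y) - df(y)(x - y)|/|x - y|$ should shrink as x approaches y. The sketch below assumes a sample map $f(x_1, x_2) = (x_1^2 x_2, \sin x_1 + x_2)$ and its hand-computed matrix of partial derivatives; the map, the base point, the direction w, and the helper names are all illustrative choices, not taken from the text.

```python
import math

def f(x):
    """Sample map R^2 -> R^2 (an assumption for illustration)."""
    x1, x2 = x
    return (x1 * x1 * x2, math.sin(x1) + x2)

def df(y):
    """Candidate differential of f at y: the 2 x 2 matrix of partial derivatives."""
    y1, y2 = y
    return ((2 * y1 * y2, y1 * y1),
            (math.cos(y1), 1.0))

def matvec(a, v):
    """Matrix times column vector."""
    return tuple(sum(aij * vj for aij, vj in zip(row, v)) for row in a)

def norm(v):
    """Euclidean length."""
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(y, x):
    """|f(x) - f(y) - df(y)(x - y)| / |x - y|."""
    h = tuple(xi - yi for xi, yi in zip(x, y))
    lin = matvec(df(y), h)
    err = tuple(a - b - c for a, b, c in zip(f(x), f(y), lin))
    return norm(err) / norm(h)

y = (1.0, 2.0)
w = (0.6, 0.8)          # fixed unit direction of approach
ratios = []
for p in range(1, 6):   # |x - y| = 10^-1, ..., 10^-5
    t = 10.0 ** (-p)
    x = (y[0] + t * w[0], y[1] + t * w[1])
    ratios.append(remainder_ratio(y, x))
print(ratios)
```

A shrinking ratio along one direction is only evidence, not a proof: the definition requires the estimate for all x near y, not just along a single ray.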

Notice that $|x - y|$ refers to the distance in $\mathbb{R}^n$ and $|f(x) - f(y) - df(y)(x - y)|$ refers to the distance in $\mathbb{R}^m$. Otherwise, the definition is the same as in the case of numerical functions. We do not specify in the definition what the entries of the matrix df(y) should be. We will deal with this question in the next section, after we derive some basic properties that follow from the definition directly.

It is easy to show that the m variables in the range do not add anything significantly new. In fact, if we write

$$f(x) = \begin{pmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{pmatrix}$$

in components, then f is differentiable at y if and only if all the functions $f_j(x)$ are differentiable at y and df(y) is the m x n matrix whose jth row is $df_j(y)$. However, having n variables in the domain is a significant generalization.

If f is real-valued, then df(y) is a 1 x n matrix, a row vector. It is sometimes called the gradient of f and written $\nabla f(y)$. We will later give a geometric interpretation of this vector. We may also rearrange the row vector into a column and then write

$$f(x) = f(y) + \nabla f(y) \cdot (x - y) + o(|x - y|)$$

where the dot indicates the inner product of vectors. It will not be necessary for us to distinguish between row and column vectors. However, it is only fair to point out that in the study of differential geometry this distinction becomes meaningful and important.

The best affine approximation $f(y) + df(y)(x - y)$ has a graph that can be interpreted as a tangent plane (actually an n-dimensional plane) to the graph of f at y. This is most meaningful geometrically in the cases $f : \mathbb{R}^2 \to \mathbb{R}^1$ where the tangent plane is an honest plane in $\mathbb{R}^3$ and $f : \mathbb{R}^1 \to \mathbb{R}^2$ where the tangent "plane" is a line in $\mathbb{R}^3$.

Just as in the numerical case, it is easy to show that the differentiability of f at y implies the continuity of f at y, as follows. From $f(x) = f(y) + df(y)(x - y) + o(|x - y|)$ we take limits as $x \to y$. We obtain $\lim_{x \to y} f(x) = f(y)$ because f(y) is constant and both $df(y)(x - y)$ and $o(|x - y|)$ tend to zero as $x \to y$. We can show a little more, namely that a pointwise Lipschitz condition holds: there exists a neighborhood

of y on which $|f(x) - f(y)| \le M|x - y|$ for some constant M. To see this, observe that

$$|f(x) - f(y)| = |df(y)(x - y) + o(|x - y|)| \le |df(y)(x - y)| + |o(|x - y|)|.$$

Since $o(|x - y|)$ implies $O(|x - y|)$, we can make $|o(|x - y|)| \le |x - y|$ on a neighborhood of y. Then $|df(y)(x - y)| \le M|x - y|$ because $|Ax| \le M|x|$ for any matrix A, with M depending on A (see exercises). This proves the pointwise Lipschitz condition $|f(x) - f(y)| \le M|x - y|$ (after increasing M by 1 to absorb the remainder), for x in a neighborhood of y. Note, however, that since the constant M may depend on y, this is not as strong as a uniform Lipschitz condition.

If f is differentiable at every point of the domain D, we say f is differentiable on D. We can then regard the differential df(y) as a function of y, taking values in the space of m x n matrices $\mathbb{R}^{m \times n}$. If $df : D \to \mathbb{R}^{m \times n}$ is continuous we say f is continuously differentiable or f is $C^1$. This will turn out to be the more useful notion.

We note that differentiability and the differential are linear: if f and g (both mapping $D \to \mathbb{R}^m$) are differentiable at y, then so is af + bg for scalars a, b and $d(af + bg) = a\,df + b\,dg$. These are immediate consequences of the linearity of matrix multiplication,

$$\begin{aligned} af(x) + bg(x) &= a\bigl(f(y) + df(y)(x - y) + o(|x - y|)\bigr) + b\bigl(g(y) + dg(y)(x - y) + o(|x - y|)\bigr) \\ &= af(y) + bg(y) + \bigl(a\,df(y) + b\,dg(y)\bigr)(x - y) + a\,o(|x - y|) + b\,o(|x - y|) \end{aligned}$$

and the error term $a\,o(|x - y|) + b\,o(|x - y|)$ is also $o(|x - y|)$. It also follows that if f and g are differentiable or $C^1$ on D, then so is af + bg.

There is also a product formula, in the case that $f : D \to \mathbb{R}^m$ and $g : D \to \mathbb{R}$ so that $g \cdot f : D \to \mathbb{R}^m$. Then the differentiability of f and g at y implies that $g \cdot f$ is differentiable at y and $d(g \cdot f)(y) = g(y)\,df(y) + f(y)\,dg(y)$. Here df(y) is an m x n matrix and is multiplied by the scalar g(y), while f(y) is an m x 1 matrix and dg(y) is a 1 x n matrix, so the matrix product f(y)dg(y) is an m x n matrix. This follows from

$$\begin{aligned} g(x)f(x) &= \bigl(g(y) + dg(y)(x - y) + o(|x - y|)\bigr)\bigl(f(y) + df(y)(x - y) + o(|x - y|)\bigr) \\ &= g(y)f(y) + \bigl[(dg(y)(x - y))f(y) + g(y)\,df(y)(x - y)\bigr] + o(|x - y|) \end{aligned}$$

(the new remainder term includes $o(|x - y|)$ multiplied by bounded functions, and $(dg(y)(x - y))\,df(y)(x - y)$, which is $O(|x - y|^2)$ hence $o(|x - y|)$.) All that remains is to verify that $(dg(y)(x - y))f(y) = (f(y)\,dg(y))(x - y)$, and this is simply matrix algebra.

10.1.2 Partial Derivatives

We come now to the problem of determining the entries of the matrix df. If we choose $x = y + te_j$ where $e_j$ is the unit vector in the jth direction, then

$$f(x) = f(y + te_j) = f(y) + df(y)\,te_j + o(|te_j|) = f(y) + t\,df(y)e_j + o(|t|).$$

The kth coordinate of this equation is

$$f_k(y + te_j) = f_k(y) + t\,df(y)_{kj} + o(|t|),$$

where $df(y)_{kj}$ denotes the kth coordinate of the vector $df(y)e_j$ or the kj entry of the matrix df(y). But this equation simply expresses the fact that the function $f_k(y + te_j)$ as a function of t is differentiable at t = 0 with derivative $df(y)_{kj}$. We recognize this as just the usual partial derivative $\partial f_k/\partial x_j$, which is obtained by keeping all the variables $x_1, \ldots, x_n$ except $x_j$ fixed and differentiating $f_k$ as a function of $x_j$.

Note that we can define the partial derivatives independently of the concept of the differential. We simply say $\partial f_k/\partial x_j$ exists at a point y if $f_k(y + te_j) = f_k(y) + at + o(|t|)$ for some constant a as $t \to 0$ and then set $a = \partial f_k/\partial x_j(y)$. This can also be expressed as the limit of the difference quotient

$$\frac{\partial f_k}{\partial x_j}(y) = \lim_{t \to 0} \frac{f_k(y + te_j) - f_k(y)}{t}.$$

Notice that in general the differential is not expressible as the limit of a difference quotient, and for this reason we have consistently downplayed difference quotients in our development of differential calculus.

More generally, if u is any non-zero vector in $\mathbb{R}^n$, we can define the directional derivative in the u direction:

$$d_u f(y) = \lim_{t \to 0} \frac{f(y + tu) - f(y)}{t}.$$

Note that this means $f(y + tu) = f(y) + t\,d_u f(y) + o(t)$ as $t \to 0$. If f takes values in $\mathbb{R}^m$, then so does $d_u f$; we can also consider $d_u f_k$, the kth component of $d_u f$. Then $\partial f_k/\partial x_j(y) = d_{e_j} f_k(y)$. We will also write $\partial f/\partial x_j$ for $d_{e_j} f$ without taking components and call this the partial derivative of f with respect to $x_j$. (In some calculus books the term "directional derivative" is reserved for the case that u is a unit vector, but there is no need for such a restriction. In fact, we will also allow u = 0 in $d_u f$, in which case we obviously have $d_u f = 0$.)

We have observed that the existence of the differential implies the existence of partial derivatives. This generalizes easily to the case of directional derivatives.

Theorem 10.1.1 If $f : D \to \mathbb{R}^m$ with $D \subseteq \mathbb{R}^n$ is differentiable at y with differential df(y), then $d_u f$ exists at y for any u in $\mathbb{R}^n$ and $d_u f(y) = df(y)u$.

Proof: From $f(x) = f(y) + df(y)(x - y) + o(|x - y|)$ we obtain by substituting x = y + tu

$$f(y + tu) = f(y) + df(y)\,tu + o(|tu|) = f(y) + t\,df(y)u + o(t).$$

QED

Notice that choosing $u = e_j$ shows $df(y)e_j = \partial f/\partial x_j(y)$ as we originally noted. Also we can write out $df(y)u = \sum_{j=1}^{n} \partial f/\partial x_j(y)\,u_j$ or $(df(y)u)_k = \sum_{j=1}^{n} \partial f_k/\partial x_j(y)\,u_j$ in components. Thus all directional derivatives are determined by the special ones in the coordinate directions, if f is differentiable. This is a somewhat surprising result, because the directional derivative $d_u f(y)$ is defined entirely in terms of the values of f along the line y + tu passing through y in the u direction (these lines are shown in Figure 10.1.1). Aside from the point y itself, these lines have no points in common, so there is no reason to believe there should be any connection between the derivatives of f along them. In fact it is easy to construct functions for which there is no connection. Take the plane $\mathbb{R}^2$ and y = (0, 0), and define $f(r\cos\theta, r\sin\theta) = r\,g(\theta)$ for any function $g(\theta)$ that is odd, $g(-\theta) = -g(\theta)$. Then $d_u f(0, 0) = g(\theta)$ if $u = (\cos\theta, \sin\theta)$. The function f is linear along each line through the origin, but the slopes are unrelated.
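The example $f(r\cos\theta, r\sin\theta) = r\,g(\theta)$ from the discussion of directional derivatives can be tried out numerically. The sketch below assumes the particular choice $g(\theta) = \sin 3\theta$, which is not taken from the text; it satisfies $g(-\theta) = -g(\theta)$ and also $g(\theta + \pi) = -g(\theta)$, so f is linear along each full line through the origin. The function names and the step size are likewise illustrative.

```python
import math

def g(theta):
    """Assumed angular profile; odd, and g(theta + pi) = -g(theta)."""
    return math.sin(3 * theta)

def f(x, y):
    """f(r cos(theta), r sin(theta)) = r g(theta), with f(0, 0) = 0."""
    r = math.hypot(x, y)
    if r == 0.0:
        return 0.0
    return r * g(math.atan2(y, x))

def directional_derivative_at_origin(u, t=1e-3):
    """Difference quotient (f(tu) - f(0)) / t, evaluated for t and for -t."""
    forward = (f(t * u[0], t * u[1]) - f(0.0, 0.0)) / t
    backward = (f(-t * u[0], -t * u[1]) - f(0.0, 0.0)) / (-t)
    return forward, backward

d_e1 = directional_derivative_at_origin((1.0, 0.0))   # partial in x_1
d_e2 = directional_derivative_at_origin((0.0, 1.0))   # partial in x_2
d_u = directional_derivative_at_origin((1.0, 1.0))    # direction e_1 + e_2
print(d_e1, d_e2, d_u)
```

If f were differentiable at the origin, Theorem 10.1.1 would force the derivative in the direction $e_1 + e_2$ to equal the sum of the two partial derivatives. Here the partial derivatives sum to about -1 while the derivative along (1, 1) is about 1, so the slopes along different lines are indeed unrelated.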

Figure 10.1.1:

The point is that these functions are not differentiable (unless $g(\theta)$ is very special). The graph of f near the origin has a folding fan appearance and is not smooth. We see that differentiability for a function of several variables is a rather strong condition. The existence of the partial derivatives, or even all directional derivatives, does not imply differentiability. In fact, in the above example if we take $g(\theta)$ smooth, say $g(\theta) = \sin 2\theta$, then $f(x, y) = 2xy/\sqrt{x^2 + y^2}$ has directional derivatives in all directions and at all points in the plane but is not differentiable at the origin. Thus the converse to the last theorem is false. It is a remarkable fact, however, that if the partial derivatives are continuous, then f has to be differentiable; this will provide a modified converse to the theorem. The hypothesis of continuity of $\partial f/\partial x_j(y)$ is strong enough to allow us to compare quantities computed from the values of f along different lines parallel to the axes and passing through points near y. Figure 10.1.2 illustrates the geometry in the plane. These lines fill up a neighborhood of y.

Theorem 10.1.2 Let $f : D \to \mathbb{R}^m$ for $D \subseteq \mathbb{R}^n$ have partial derivatives $\partial f/\partial x_j : D \to \mathbb{R}^m$ for j = 1, ..., n that are continuous in a neighborhood of y. Then f is differentiable at y. Moreover, a necessary and

sufficient condition that f be $C^1$ on D is that all the partial derivatives $\partial f/\partial x_j$ exist and are continuous.

Figure 10.1.2:

Proof: The idea of the proof is to apply the mean value theorem n times, to express f(x) - f(y) in terms of partial derivatives. The mean value theorem, however, involves evaluating derivatives at undetermined points, and we will use the continuity of the partial derivatives to make the change to the partial derivatives evaluated at y. For simplicity of notation we present the proof in the case n = 2 and m = 1.

Assume that $\partial f/\partial x_1$ and $\partial f/\partial x_2$ are continuous in a neighborhood of y. To prove the differentiability of f at y we write

$$f(x_1, x_2) - f(y_1, y_2) = [f(x_1, x_2) - f(x_1, y_2)] + [f(x_1, y_2) - f(y_1, y_2)]$$

and apply the mean value theorem to both terms. (These are the three corner points in Figure 10.1.3.) Notice that $f(x_1, x_2) - f(x_1, y_2)$ is the difference of the values of f along the vertical line, so by the one-dimensional mean value theorem

$$f(x_1, x_2) - f(x_1, y_2) = \frac{\partial f}{\partial x_2}(x_1, z_2)(x_2 - y_2)$$

for some value $z_2$ between $x_2$ and $y_2$.

Figure 10.1.3:

Similarly, by the mean value theorem applied to f on the horizontal line,

$$f(x_1, y_2) - f(y_1, y_2) = \frac{\partial f}{\partial x_1}(z_1, y_2)(x_1 - y_1)$$

for some $z_1$ between $x_1$ and $y_1$. This stage of the argument only requires the existence of the partial derivatives (not their continuity) to justify the use of the mean value theorem. Summing the two identities, we obtain

$$f(x_1, x_2) - f(y_1, y_2) = \frac{\partial f}{\partial x_1}(z_1, y_2)(x_1 - y_1) + \frac{\partial f}{\partial x_2}(x_1, z_2)(x_2 - y_2),$$

which is related to but not identical to the statement of differentiability

$$f(x_1, x_2) - f(y_1, y_2) = \frac{\partial f}{\partial x_1}(y_1, y_2)(x_1 - y_1) + \frac{\partial f}{\partial x_2}(y_1, y_2)(x_2 - y_2) + o(|x - y|),$$

that we are trying to prove. Therefore, we rewrite our identity as

$$f(x_1, x_2) - f(y_1, y_2) = \frac{\partial f}{\partial x_1}(y_1, y_2)(x_1 - y_1) + \frac{\partial f}{\partial x_2}(y_1, y_2)(x_2 - y_2) + R,$$

where

$$R = \Bigl[\frac{\partial f}{\partial x_1}(z_1, y_2) - \frac{\partial f}{\partial x_1}(y_1, y_2)\Bigr](x_1 - y_1) + \Bigl[\frac{\partial f}{\partial x_2}(x_1, z_2) - \frac{\partial f}{\partial x_2}(y_1, y_2)\Bigr](x_2 - y_2).$$

To complete the proof we need to show $R = o(|x - y|)$ as $x \to y$ or $\lim_{x \to y} |R|/|x - y| = 0$. But $|x_1 - y_1| \le |x - y|$ and $|x_2 - y_2| \le |x - y|$, so

$$\frac{|R|}{|x - y|} \le \Bigl|\frac{\partial f}{\partial x_1}(z_1, y_2) - \frac{\partial f}{\partial x_1}(y_1, y_2)\Bigr| + \Bigl|\frac{\partial f}{\partial x_2}(x_1, z_2) - \frac{\partial f}{\partial x_2}(y_1, y_2)\Bigr|.$$

Now we are ready to use the continuity of $\partial f/\partial x_1$ and $\partial f/\partial x_2$. We have only to note that as $(x_1, x_2)$ approaches $(y_1, y_2)$, so do $(z_1, y_2)$ and $(x_1, z_2)$ because $z_1$ and $z_2$ are intermediate values. Thus $\lim_{x \to y} R/|x - y| = 0$, proving the differentiability of f at y.

Next, assume that the partial derivatives are continuous on all of D. Then, by what we have shown, f is differentiable on all of D. Since the matrix df(y) has entries $\partial f_k/\partial x_j(y)$ and these are continuous by hypothesis, it follows that $df : D \to \mathbb{R}^{m \times n}$ is continuous, hence f is $C^1$. Conversely, assume f is $C^1$; then $\partial f_k/\partial x_j(y)$ exists and is an entry of the matrix df(y). But the continuity of the matrix function implies the continuity of the entries, so $\partial f_k/\partial x_j(y)$ is continuous. QED

This theorem is extraordinarily useful, since it provides a simple and expedient method of showing functions are differentiable. We even get continuity of the derivative in the bargain! We can immediately conclude from this theorem that every function $f : \mathbb{R}^n \to \mathbb{R}^m$ given by a finite formula involving arithmetic operations, roots, and special numerical functions such as exp, log, sin, cos, which are known to be differentiable, is continuously differentiable on its natural domain: all x for which all operations are defined (without dividing by zero or taking roots of zero, etc.).

If we examine the argument carefully, we observe an asymmetry among the variables. This was caused by the arbitrary choice of $f(x_1, y_2)$ over $f(y_1, x_2)$ for the intermediate comparison. As a result we ended up with a comparison of values of $\partial f/\partial x_1$ along a horizontal line, but the values of $\partial f/\partial x_2$ are compared between $(x_1, z_2)$ and $(y_1, y_2)$, not along a vertical line. We will see in the next section that the proof of commutativity of partial derivatives is very similar.

10.1.3 The Chain Rule

While the previous theorem allows us to reduce many questions in the differential calculus of several variables to computations of partial

This will prove both the difIerentiability of gof at y and the formula forthe difIerential, if we can show that dg(z)Rl(x, y) + R2(f(x), f(y)) =o(lx-yl) as x --+ y. But dg(z)Rl(x,y) = o(lx-yl) because Rl(x,y) =

g(f(x)) = g(f(y)) + dg(z)(f(x) - f(y)) + R2(f(x), f(y))= g(f(y)) + dg(z)df(y)(x - y)

+ dg(z)Rl(x, y) + R2(f(x), f(y))·

g(w) = g(z) + dg(z)(w - z) + R2(w, z)

where Rl (x, y) = o(lx - yl) as x --+ y and R2(W, z) = o(lw - zl) asw --+ z. Setting w = f(x) and z = f(y) in the second equation andthen using the first equation we obtain

and

f(x) = f(y) + df(y)(x - y) + Rl(x,y)

Proof: We write

Theorem 10.1.3 (Chain Rule) If $f$ is differentiable at $y$ and $g$ is differentiable at $z = f(y)$, then $g \circ f$ is differentiable at $y$ and $d(g \circ f)(y) = dg(z)\,df(y)$ (matrix multiplication). Moreover, if $f$ and $g$ are differentiable (respectively $C^1$) on their domains, then so is $g \circ f$.

derivatives, hence to the one-dimensional calculus, there are still situations in which we need to think in terms of the differential in its full multi-dimensional incarnation. One of these is the chain rule. The underlying idea is that the composition of two linear transformations corresponds to matrix multiplication. Since the differential calculus lets us approximate a differentiable function $f$ by an affine function, which is a linear transformation plus a constant, the composition of differentiable functions should be differentiable and the differential should be obtained by appropriately multiplying the differentials of the functions being composed. As in the one-dimensional case, we need to specify the points at which the differentials are evaluated.

Let $f : D \to \mathbb{R}^m$ with $D \subseteq \mathbb{R}^n$ and $g : A \to \mathbb{R}^p$ with $A \subseteq \mathbb{R}^m$, and suppose the image $f(D)$ is contained in $A$; so $g \circ f : D \to \mathbb{R}^p$ is defined by $g \circ f(x) = g(f(x))$ for $x$ in $D$. Now fix a point $y$ in $D$, and let $z = f(y)$. Note that $z$ is an image point of $f$, so $z$ is in $A$.


This formula was derived under the assumption that $f$ and $g$ are differentiable, not just that the partial derivatives exist. We can easily interpret this formula if we think about how a small change in the $x_j$ variable affects $g \circ f$. First it produces a change in each $f_k(x)$ roughly proportional to $\partial f_k/\partial x_j$, and each of these is transmitted to $g \circ f$ with a factor roughly equal to $\partial g/\partial z_k$. The fact that these terms are more or less independent, so we sum their contributions, can be justified by arguing that any interactions between terms will be of smaller order and hence vanish in the limit. This can be seen clearly if we take polynomial

$$\frac{\partial (g \circ f)}{\partial x_j}(y) = \sum_{k=1}^{m} \frac{\partial g}{\partial z_k}(z)\, \frac{\partial f_k}{\partial x_j}(y).$$

By taking components we can obtain the chain rule for partial derivatives. For simplicity of notation we assume the range of $g$ is $\mathbb{R}^1$, so $g \circ f(x) = g(f_1(x), \ldots, f_m(x))$. Then

$o(|x - y|)$ and $dg(z)$ is a fixed matrix (so $|dg(z)R_1(x, y)| \le c|R_1(x, y)|$ where $c$ depends only on $dg(z)$). To show $R_2(f(x), f(y)) = o(|x - y|)$ we need to appeal to the pointwise Lipschitz continuity of $f$ at $y$, which we have shown to be a consequence of the differentiability of $f$ at $y$.

Given any error $1/N$, we first use $R_2(w, z) = o(|w - z|)$ to find $1/k$ such that $|w - z| \le 1/k$ implies $|R_2(w, z)| \le |w - z|/NM$, where $M$ is the constant in the pointwise Lipschitz condition $|f(x) - f(y)| \le M|x - y|$ holding for $x$ in a neighborhood of $y$, say $|x - y| < 1/p$. Next we choose $1/q$ so that $1/q \le 1/p$ and $|x - y| < 1/q$ implies $|f(x) - f(y)| < 1/k$ since $f$ is continuous (we may take $1/q = \min(1/p, 1/kM)$). Then if $|x - y| < 1/q$ we have

$$|R_2(f(x), f(y))| \le \frac{|f(x) - f(y)|}{NM} \le \frac{|x - y|}{N}.$$

This proves $R_2(f(x), f(y)) = o(|x - y|)$.

Thus we have simultaneously shown that $g \circ f$ is differentiable at $y$ and computed its differential at $y$ to be $dg(z)\,df(y)$, under the assumptions that $f$ is differentiable at $y$ and $g$ is differentiable at $z = f(y)$. If $f$ and $g$ are differentiable on their domains this shows $g \circ f$ is differentiable on its domain, and if $dg$ and $df$ are continuous the formula for $d(g \circ f)$ shows that it, too, is continuous. QED


Of course this notation suppresses some of the evidence of what is going on, but it is undeniably convenient. There is, however, one grave danger of confusion. If some of the $z$ variables are the same as the $x$ variables, the corresponding partial derivatives may be different. In other words, $z_1 = x_1$ does not imply $\partial g/\partial x_1$ and $\partial g/\partial z_1$ are equal. Perhaps the simplest way to understand this is to realize that $\partial g/\partial x_1$

$$\frac{\partial g}{\partial x_j} = \sum_{k=1}^{m} \frac{\partial g}{\partial z_k} \left( \frac{\partial z_k}{\partial x_j} \right).$$

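$$\begin{aligned} g(f_1(x_1 + h, x_2), f_2(x_1 + h, x_2)) &= f_1(x_1 + h, x_2) \cdot f_2(x_1 + h, x_2) \\ &= (x_1 x_2 + x_2 h)(x_1^2 - x_2^2 + 2x_1 h + h^2) \\ &= (x_1 x_2)(x_1^2 - x_2^2) + \bigl((x_1 x_2)\,2x_1 + x_2(x_1^2 - x_2^2)\bigr)h \\ &\quad + (x_2\, 2x_1 + x_1 x_2)h^2 + x_2 h^3 \end{aligned}$$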

and the $h^2$ and $h^3$ terms are discarded in the $o(h)$ remainder. Notice that the term $x_2\, 2x_1 h^2$ comes from an interaction between $\partial f_1/\partial x_1(x_1, x_2)h$ and $\partial f_2/\partial x_1(x_1, x_2)h$ and it is, as expected, of smaller order than the significant terms.
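The polynomial example can also be checked numerically. The following is only an illustrative sketch: the helper names (`composite`, `chain_rule_dx1`, `difference_quotient`) and the sample point are assumptions made here, not part of the text. It compares the chain rule value of $\partial (g \circ f)/\partial x_1$ with difference quotients for shrinking $h$.

```python
# f1, f2 and g are the polynomials of the example; everything else is a
# hypothetical helper introduced for this check.
def f1(x1, x2):
    return x1 * x2

def f2(x1, x2):
    return x1**2 - x2**2

def g(z1, z2):
    return z1 * z2

def composite(x1, x2):
    return g(f1(x1, x2), f2(x1, x2))

def chain_rule_dx1(x1, x2):
    # dg/dz1 * df1/dx1 + dg/dz2 * df2/dx1, with dg/dz1 = z2 and dg/dz2 = z1
    z1, z2 = f1(x1, x2), f2(x1, x2)
    return z2 * x2 + z1 * (2 * x1)

def difference_quotient(x1, x2, h):
    return (composite(x1 + h, x2) - composite(x1, x2)) / h

x1, x2 = 1.5, 0.5  # arbitrary sample point
exact = chain_rule_dx1(x1, x2)
# The gap between quotient and chain rule value is the h^2 and h^3 terms divided by h.
errors = [abs(difference_quotient(x1, x2, h) - exact) for h in (1e-1, 1e-2, 1e-3)]
```

The errors shrink in proportion to $h$, which reflects the fact that the discarded interaction terms are of order $h^2$ and higher.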

The chain rule can also be interpreted as giving a formula for transforming partial derivatives under a change of variable. For this interpretation we think of $g(z_1, \ldots, z_m)$ as a given function of $m$ variables whose partial derivatives $\partial g/\partial z_k$ are known. We then assume that $x_1, \ldots, x_n$ are new variables that are connected to the $z$ variables by the equations $z_k = f_k(x_1, \ldots, x_n)$. (Usually we have $n = m$ in this interpretation, although strictly speaking this isn't necessary.) Then we may regard $g$ as also a function of the $x$ variables, $g(z_1, \ldots, z_m) = g(f_1(x_1, \ldots, x_n), f_2(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n))$. Of course this new function is just $g \circ f(x)$. Then the partial derivatives of the new $g$ with respect to $x_j$ are computed by the chain rule:

functions, say $f_1(x_1, x_2) = x_1 x_2$, $f_2(x_1, x_2) = x_1^2 - x_2^2$, and $g(z_1, z_2) = z_1 z_2$. Then giving $x_1$ an increment, $x_1 + h$, results in $f_1(x_1 + h, x_2) = (x_1 + h)x_2 = x_1 x_2 + x_2 h$ and $f_2(x_1 + h, x_2) = (x_1 + h)^2 - x_2^2 = x_1^2 - x_2^2 + 2x_1 h + h^2$. Here we have $f_1(x_1 + h, x_2) = f_1(x_1, x_2) + \partial f_1/\partial x_1(x_1, x_2)h$ exactly, while $f_2(x_1 + h, x_2) = f_2(x_1, x_2) + \partial f_2/\partial x_1(x_1, x_2)h + h^2$. Substituting into $g$ we find


We are now in a position to give the proof.

$$f'(x) = b'(x)\,g(x, b(x)) - a'(x)\,g(x, a(x)) + \int_{a(x)}^{b(x)} \frac{\partial g}{\partial x}(x, y)\, dy.$$
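The formula for $f'(x)$ can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: the choices $a(x) = x$, $b(x) = x^2$, $g(x, y) = x y^2$ and the midpoint-rule helper are made up for this check and do not come from the text. For these choices $f(x) = (x^7 - x^4)/3$, so the derivative has a closed form to compare against.

```python
def midpoint_integral(func, lo, hi, n=2000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / n
    return width * sum(func(lo + (i + 0.5) * width) for i in range(n))

# Illustrative choices (assumptions): a(x) = x, b(x) = x^2, g(x, y) = x * y^2.
def a(x):
    return x

def b(x):
    return x * x

def da(x):
    return 1.0

def db(x):
    return 2.0 * x

def g(x, y):
    return x * y * y

def dg_dx(x, y):
    return y * y

def f(x):
    # f(x) = integral of g(x, y) dy from a(x) to b(x)
    return midpoint_integral(lambda y: g(x, y), a(x), b(x))

def leibniz_derivative(x):
    # Two boundary terms plus the integral of the partial derivative in x.
    boundary = db(x) * g(x, b(x)) - da(x) * g(x, a(x))
    interior = midpoint_integral(lambda y: dg_dx(x, y), a(x), b(x))
    return boundary + interior

x0, h = 1.5, 1e-4
finite_difference = (f(x0 + h) - f(x0 - h)) / (2 * h)
formula_value = leibniz_derivative(x0)
closed_form = (7 * x0**6 - 4 * x0**3) / 3
```

The three quantities agree up to discretization error, with the two boundary terms and the interior integral each contributing a visible share of the total.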

$$f(x) = \int_{a(x)}^{b(x)} g(x, y)\, dy$$

where $a(x)$ and $b(x)$ are $C^1$ functions of one variable and $g$ is $C^1$ on $\mathbb{R}^2$. In Chapter 6, we stated without proof the theorem that $f$ is $C^1$ and its derivative is given by

We return to the question of differentiating a general function defined by an integral. A function of a single variable $x$ might be given by an expression

10.1.4 Differentiation of Integrals

Proof: If $y$ is in the interior of $D$, then $y + te_j$ must belong to $D$ for $t$ in a neighborhood of zero; and if $df(y)$ exists, then $g(t) = f(y + te_j)$ is differentiable at $t = 0$. Clearly $g$ attains its max or min at $t = 0$, so $g'(0) = 0$ by the $n = 1$ case. Thus $\partial f/\partial x_j(y) = g'(0) = 0$ for all $j$, so $df(y) = 0$. QED

Theorem 10.1.4 Let $f : D \to \mathbb{R}$ for $D \subseteq \mathbb{R}^n$, and let $y$ be a point in the interior of $D$. If $f$ assumes its maximum or minimum value at $y$ and $f$ is differentiable at $y$, then $df(y) = 0$ (i.e., $\partial f/\partial x_1(y) = 0$, $\partial f/\partial x_2(y) = 0$, $\ldots$, $\partial f/\partial x_n(y) = 0$).

means the derivative of $g$ obtained by varying $x_1$ and holding fixed $x_2, \ldots, x_n$, while $\partial g/\partial z_1$ involves varying $z_1 = x_1$ but holding fixed $z_2, \ldots, z_m$. Clearly these are different. The real problem is that when $z_1 = x_1$ one is tempted not to introduce a new name for the variable. Clearly this is a temptation to be resisted.

We conclude this section with a discussion of maximum and minimum problems in several variables. This is just a preliminary discussion, and we will return to the topic several times again. Here we simply want to observe that the vanishing of the gradient is a necessary condition for the existence of a max or min in the interior of the domain, which is almost an immediate consequence of the analogous one-dimensional result.


Lemma 10.1.1

a. If $g$ is a continuous function on $\mathbb{R}^2$, then $\int_a^b g(x, y)\, dy$ is a continuous function of $x$.

We also need to show that this is a continuous function. Thus to complete the proof of our differentiation formula we need to establish the following lemma.

$$\frac{\partial}{\partial x} \int_a^b g(x, y)\, dy = \int_a^b \frac{\partial g}{\partial x}(x, y)\, dy.$$

$$\frac{\partial F}{\partial x_2}(x_1, x_2, x_3) = \frac{\partial G}{\partial a} \frac{\partial a}{\partial x_2} = -a'(x_2)\, g(x_3, a(x_2)).$$

This shows $\partial F/\partial x_1$ and $\partial F/\partial x_2$ are continuous, and $\partial F/\partial x_1(x, x, x)$ and $\partial F/\partial x_2(x, x, x)$ are the first two terms in the claimed formula for $f'(x)$.

It remains to compute $\partial F/\partial x_3$. Since $b(x_1)$ and $a(x_2)$ are held fixed, we can simplify the notation and show

and

by the chain rule (provided $F$ is $C^1$). But the computation of $\partial F/\partial x_1$ and $\partial F/\partial x_2$ is easily accomplished by the one-dimensional chain rule and the fundamental theorem of the calculus (differentiation of the integral). For $F(x_1, x_2, x_3) = G(b(x_1), a(x_2), x_3)$ where $G(b, a, x_3) = \int_a^b g(x_3, y)\, dy$, so

$$\frac{\partial F}{\partial x_1}(x_1, x_2, x_3) = \frac{\partial G}{\partial b} \frac{\partial b}{\partial x_1} = b'(x_1)\, g(x_3, b(x_1))$$

Then $f(x) = F(x, x, x)$ and so

$$f'(x) = \frac{\partial F}{\partial x_1}(x, x, x) + \frac{\partial F}{\partial x_2}(x, x, x) + \frac{\partial F}{\partial x_3}(x, x, x)$$

To do this we introduce a function $F(x_1, x_2, x_3)$ of three variables that isolates the three appearances of $x$ in the definition of $f$:

$$F(x_1, x_2, x_3) = \int_{a(x_2)}^{b(x_1)} g(x_3, y)\, dy.$$


We want to take the limit as $h \to 0$ and interchange the limit and the integral, which will be justified if we can show that $(g(x + h, y) - g(x, y))/h$ converges to $\partial g/\partial x(x, y)$ uniformly for $y$ in $[c, d]$. But the mean value theorem shows that the difference quotient $(g(x + h, y) - g(x, y))/h$ is equal to $\partial g/\partial x(z, y)$ for some point $z$ between $x$ and $x + h$ ($z$ depends on $y$ also). The uniform convergence of $\partial g/\partial x(z, y)$ to $\partial g/\partial x(x, y)$ then follows from the uniform continuity of $\partial g/\partial x$ on the rectangle. QED

$$\frac{1}{h} \left( \int_a^b g(x + h, y)\, dy - \int_a^b g(x, y)\, dy \right) = \int_a^b \frac{1}{h} \bigl( g(x + h, y) - g(x, y) \bigr)\, dy.$$

Keeping $x$ and $x'$ fixed (with $|x - x'| \le 1/m$) and taking the limit as the maximum interval length of the partition goes to zero, the sums become integrals and we obtain $\left| \int_a^b g(x, y)\, dy - \int_a^b g(x', y)\, dy \right| \le (b - a)/n$. This is the desired continuity of the integral.

b. We form the difference quotient

$$\left| \sum g(x, y_j)\Delta y_j - \sum g(x', y_j)\Delta y_j \right| \le \sum |g(x, y_j) - g(x', y_j)|\, \Delta y_j \le \frac{1}{n} \sum \Delta y_j = (b - a)/n.$$

Proof: a. On any compact set, say the rectangle $a \le x \le b$, $c \le y \le d$, the function $g$ is uniformly continuous. So given any $1/n$ there exists $1/m$ such that $|g(x, y) - g(x', y)| \le 1/n$ provided $|x - x'| \le 1/m$ for $(x, y)$ and $(x', y)$ in the rectangle (of course more is true, but this is all we need). If we consider two Cauchy sums approximating $\int_a^b g(x, y)\, dy$ and $\int_a^b g(x', y)\, dy$, evaluating at the same $y$ values, they can differ by at most $(b - a)/n$, because

$$\frac{d}{dx} \int_a^b g(x, y)\, dy = \int_a^b \frac{\partial g}{\partial x}(x, y)\, dy.$$

b. If $g$ is $C^1$ on $\mathbb{R}^2$, then $\int_a^b g(x, y)\, dy$ is $C^1$ on $\mathbb{R}$ and


10.1.5 Exercises

1. If $A$ is any $m \times n$ matrix prove that there exists a constant $M$ such that $|Ax| \le M|x|$ for every $x$ in $\mathbb{R}^n$.

2. Prove that $f : D \to \mathbb{R}^m$ is differentiable at a point if and only if each of the coordinate functions $f_k : D \to \mathbb{R}$ is differentiable at that point.

3. If $f$ is differentiable at $y$, show that $d_u f(y)$ is linear in $u$, meaning $d_{(au + bv)} f(y) = a\, d_u f(y) + b\, d_v f(y)$ for any scalars $a$ and $b$.

4. Let $f : D \to \mathbb{R}^m$ and $g : D \to \mathbb{R}^m$ be differentiable at $y$. Let $f \cdot g : D \to \mathbb{R}$ be defined by the dot product in $\mathbb{R}^m$. Prove that $f \cdot g$ is differentiable at $y$, and find a formula for the differential $d(f \cdot g)(y)$.

5. Let $f : D \to \mathbb{R}^3$ and $g : D \to \mathbb{R}^3$ be differentiable at $y$, and let $f \times g$ be defined by the vector cross product in $\mathbb{R}^3$. Prove that $f \times g$ is differentiable at $y$ and $d(f \times g)(y) = df(y) \times g(y) + f(y) \times dg(y)$.

6. Let $f : D \to \mathbb{R}$ be differentiable at $y$, and suppose $\nabla f(y) \ne 0$. Show that $d_u f(y)$ as $u$ varies over all unit vectors ($|u| = 1$) attains its maximum value when $u = \lambda \nabla f(y)$ for some $\lambda > 0$ and $d_u f(y) = |\nabla f(y)|$ for that choice of $u$.

7. Let $f : D \to \mathbb{R}$ be differentiable at $y$, and suppose $\nabla f(y) \ne 0$. Show that $d_u f(y) = 0$ if $u$ is orthogonal to $\nabla f(y)$.

8. Let $f : D \to \mathbb{R}$ for $D \subseteq \mathbb{R}^2$ be differentiable. Let $(x, y)$ denote cartesian coordinates in $\mathbb{R}^2$ and $(r, \theta)$ denote polar coordinates in $\mathbb{R}^2$. Express $\partial f/\partial x$ and $\partial f/\partial y$ in terms of $\partial f/\partial r$ and $\partial f/\partial \theta$ and, conversely, at every point except the origin.

9. Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable. Show that there exists $g : \mathbb{R}^{n-1} \to \mathbb{R}$ with $f(x_1, \ldots, x_n) = g(x_2, \ldots, x_n)$ if and only if $\partial f/\partial x_1 \equiv 0$.

10. Let $g : [a, b] \to \mathbb{R}^n$ be differentiable. If $f : \mathbb{R}^n \to \mathbb{R}^1$ is differentiable, what is the derivative $(d/dt) f(g(t))$?


11. Let $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ be given real numbers with the $x$'s distinct. Find the affine function $g(x) = ax + b$ such that $\sum_{j=1}^{n} (y_j - g(x_j))^2$ is minimized.

12. *Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is $C^1$ and $g : \mathbb{R}^n \to \mathbb{R}$ is an affine function such that the graphs of $f$ and $g$ intersect at the point $(y, f(y))$ in $\mathbb{R}^{n+1}$ but do not intersect at any other point in a neighborhood of $(y, f(y))$, where $n \ge 2$. Prove that $g$ is the best affine approximation to $f$ at $y$.

13. Show that the following functions are differentiable and compute $df$:

a. $f : \mathbb{R}^2 \to \mathbb{R}^1$, $f(x_1, x_2) = x_1 e^{x_2}$.
b. $f : \mathbb{R}^3 \to \mathbb{R}^2$, $f(x_1, x_2, x_3) = (x_3, x_2)$.
c. $f : \mathbb{R}^2 \to \mathbb{R}^3$, $f(x_1, x_2) = (x_1, x_2, x_1 x_2)$.

14. *A contour map shows the curves $h(x, y) = c$ for values of $c$ differing by fixed amounts (usually 50 feet or 100 feet), where $h$ is the altitude function. The gradient $\nabla h$ is larger in regions where the contour curves are denser, and $\nabla h$ lies in a direction roughly perpendicular to the contour curves. Explain why this is so.

15. If $f : D \to \mathbb{R}$ is $C^1$ with $D \subseteq \mathbb{R}^n$ and $D$ contains the line segment joining $x$ and $y$, show that $f(y) = f(x) + \nabla f(z) \cdot (y - x)$ for some point $z$ on the line segment. Explain why this is an $n$-dimensional analog of the mean value theorem.

16. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be $C^1$. Show that $df \equiv 0$ if and only if $f$ is constant and that $df$ is constant if and only if $f$ is an affine function.

17. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be $C^1$ and satisfy $f(0, y) = 0$ for all $y$. Prove that there exists $g : \mathbb{R}^2 \to \mathbb{R}^1$ that is $C^1$, such that $f(x, y) = x\,g(x, y)$.

18. If $f : \mathbb{R} \to \mathbb{R}$ is $C^1$ and $g : \mathbb{R} \to \mathbb{R}$ is continuous and one of them has compact support, show that $f * g$ is $C^1$ and $(f * g)' = f' * g$.

19. If $g : \mathbb{R}^{n+1} \to \mathbb{R}$ is $C^1$ and $f(x) = \int_a^b g(x, y)\, dy$ (for $x \in \mathbb{R}^n$), show that $f$ is $C^1$ and $\partial f/\partial x_j(x) = \int_a^b \partial g/\partial x_j(x, y)\, dy$.


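10.2 Higher Derivatives

10.2.1 Equality of Mixed Partials

We begin by defining the second derivative. If $f : D \to \mathbb{R}^m$ (with $D \subseteq \mathbb{R}^n$) is differentiable, we can regard $df$ as a function $df : D \to \mathbb{R}^{m \times n}$ taking values in the space of $m \times n$ matrices. We can then ask if this function is differentiable. If it is, its differential $d(df)(y)$ at a point will be an $(m \times n) \times n$ matrix, which we will call the second derivative, $d^2 f(y)$. For simplicity we will usually deal with the case $m = 1$, since the general case reduces to this by considering coordinate functions. If $f : D \to \mathbb{R}$, then $d^2 f(y)$, if it exists, is an $n \times n$ matrix called the Hessian of $f$ at $y$. We can also define higher derivatives, but the notation becomes a bit awkward.

What are the entries of the Hessian matrix? To answer this we write

$$df = \begin{pmatrix} \dfrac{\partial f}{\partial x_1} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{pmatrix}$$

as a column vector. (We are thinking of $df$ as taking values in $\mathbb{R}^n$ that we identify with $1 \times n$ matrices by transposing the row vector to a column vector.) If $df : D \to \mathbb{R}^n$ is differentiable, then we can again express $d(df)$ in terms of partial derivatives:

$$d(df) = \begin{pmatrix} \dfrac{\partial}{\partial x_1} \dfrac{\partial f}{\partial x_1} & \cdots & \dfrac{\partial}{\partial x_n} \dfrac{\partial f}{\partial x_1} \\ \vdots & & \vdots \\ \dfrac{\partial}{\partial x_1} \dfrac{\partial f}{\partial x_n} & \cdots & \dfrac{\partial}{\partial x_n} \dfrac{\partial f}{\partial x_n} \end{pmatrix}.$$

We write $\partial^2 f/\partial x_j \partial x_k$ for $\partial/\partial x_j(\partial f/\partial x_k)$ and so $(d^2 f)_{jk} = \partial^2 f/\partial x_j \partial x_k$. We call $\partial^2 f/\partial x_j \partial x_k$ a second-order partial derivative, which is given by

$$\frac{\partial^2 f}{\partial x_j \partial x_k}(y) = \lim_{t \to 0} \frac{\partial f/\partial x_k(y + te_j) - \partial f/\partial x_k(y)}{t}.$$

Notice that this can be defined, independent of the existence of $d^2 f$, as long as $\partial f/\partial x_k$ exists in a neighborhood of $y$.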


Theorem 10.2.1 Let $f : D \to \mathbb{R}$ and all partial derivatives of order one and two be continuous. Then $\partial^2 f/\partial x_j \partial x_k = \partial^2 f/\partial x_k \partial x_j$ for all $j$

On the other hand, if we compute $\partial^2 f/\partial x_j \partial x_k$ we will obtain the double limit of the same expression (think of it as a mixed second difference quotient) with the order of the limits interchanged. Thus the identity of $\partial^2 f/\partial x_k \partial x_j$ and $\partial^2 f/\partial x_j \partial x_k$ is a statement about the interchange of two limits.
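To make the mixed second difference quotient concrete, here is a small numerical sketch. The function $f(x_1, x_2) = \sin x_1 \, e^{x_2}$, the sample point and the step sizes are illustrative assumptions, with $e_k = e_1$ and $e_j = e_2$. The quotient is evaluated once with $s$ much smaller than $t$ and once with $t$ much smaller than $s$, mimicking the two orders of taking limits.

```python
import math

def f(x1, x2):
    # Illustrative smooth function (an assumption, not taken from the text).
    return math.sin(x1) * math.exp(x2)

def mixed_difference_quotient(x1, x2, s, t):
    """[f(x + t e_1 + s e_2) - f(x + t e_1) - f(x + s e_2) + f(x)] / (s t)."""
    numerator = f(x1 + t, x2 + s) - f(x1 + t, x2) - f(x1, x2 + s) + f(x1, x2)
    return numerator / (s * t)

x1, x2 = 0.7, 0.2
exact_mixed_partial = math.cos(x1) * math.exp(x2)

# s shrinks first (s << t), then the roles are swapped (t << s).
s_first = mixed_difference_quotient(x1, x2, s=1e-5, t=1e-3)
t_first = mixed_difference_quotient(x1, x2, s=1e-3, t=1e-5)
```

For this smooth function both values approximate the same mixed partial derivative, as the theorem predicts when the second partials are continuous.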

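$$\begin{aligned} \frac{\partial}{\partial x_k} \left( \frac{\partial f}{\partial x_j}(x) \right) &= \lim_{t \to 0} \frac{\dfrac{\partial f}{\partial x_j}(x + te_k) - \dfrac{\partial f}{\partial x_j}(x)}{t} \\ &= \lim_{t \to 0} \left( \lim_{s \to 0} \frac{f(x + te_k + se_j) - f(x + te_k)}{st} - \lim_{s \to 0} \frac{f(x + se_j) - f(x)}{st} \right) \\ &= \lim_{t \to 0} \lim_{s \to 0} \frac{f(x + te_k + se_j) - f(x + te_k) - f(x + se_j) + f(x)}{st}. \end{aligned}$$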

The main result in which we are interested is the equality $\partial^2 f/\partial x_j \partial x_k = \partial^2 f/\partial x_k \partial x_j$. This is not true without additional hypotheses, but the counterexamples are not of great significance, so we leave them to the exercises. We can interpret the identity $\partial^2 f/\partial x_j \partial x_k = \partial^2 f/\partial x_k \partial x_j$ in two ways. First, it says that the Hessian is a symmetric matrix. This is a significant observation since there are a number of powerful theorems of linear algebra that apply to symmetric matrices. For example, a symmetric matrix always has a complete set of eigenvectors. This fact is especially valuable in studying maxima and minima.

The second interpretation of $\partial^2 f/\partial x_j \partial x_k = \partial^2 f/\partial x_k \partial x_j$ involves the commutativity of the "operators" $\partial/\partial x_j$ and $\partial/\partial x_k$. Here we are thinking of the partial derivatives $\partial/\partial x_j$ as functions ("operators") whose domain and range consist of spaces of functions $f : D \to \mathbb{R}$, with $\partial/\partial x_j$ mapping the "point" $f$ to the "point" $\partial f/\partial x_j$. Without going into the details of the precise definition of $\partial/\partial x_j$ as an operator, it is clear that $\partial/\partial x_j(\partial f/\partial x_k) = \partial/\partial x_k(\partial f/\partial x_j)$ does in fact express a commutative law for partial derivatives.

If we simply substitute the definitions in terms of difference quotients we find


for different values $t_2$ and $s_2$ in the same range. Since the difference operators commute, we have the equality of the mixed partial derivatives in reverse order at two different points, $\partial^2 f/\partial x_k \partial x_j(x') = \partial^2 f/\partial x_j \partial x_k(x'')$, where $x' = x + t_1 e_k + s_1 e_j$ and $x'' = x + t_2 e_k + s_2 e_j$ are both confined to the same small rectangle near $x$, as shown in Figure 10.2.1. Since this is true for all $s$ and $t$, we have only to let $s \to 0$ and

where $s_1$ lies between $0$ and $s$. The same argument with the difference operators in the reverse order shows

Then one more application of the mean value theorem yields

Proof: We will use the mean value theorem twice to replace the mixed second difference quotient by a second partial derivative evaluated at an undetermined point and then use the continuity of the second partial derivatives. Define the difference operator, $\Delta_u$, by $\Delta_u f(x) = f(x + u) - f(x)$. In this notation the mixed second difference quotient is $\Delta_{te_k} \Delta_{se_j} f(x)/st$. A direct computation shows that the operators $\Delta_{te_k}$ and $\Delta_{se_j}$ commute, that is, $\Delta_{te_k} \Delta_{se_j} f(x) = \Delta_{se_j} \Delta_{te_k} f(x)$, since both are equal to $f(x + te_k + se_j) - f(x + te_k) - f(x + se_j) + f(x)$. Consider first $\Delta_{te_k} \Delta_{se_j} f(x)$, and think of it as $\Delta_{te_k} g(x)$ for the function $g(x) = \Delta_{se_j} f(x)$. Note that the mean value theorem for $g$ (regarded as a function of $x_k$ alone, with the other variables held fixed) can be written in the form $\Delta_{te_k} g(x)/t = \partial g/\partial x_k(x + t_1 e_k)$ for some $t_1$ between $0$ and $t$. Thus

and k.


again assuming $f$ is $C^k$ for $k = |\alpha| + |\beta|$.

for any multi-index $\alpha$. We will always assume the functions involved are $C^k$ for $k = |\alpha|$ (recall $|\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_n$) so that the order of the partial derivatives is irrelevant. Note that

A careful reworking of the final limiting argument in the proof will allow you to deduce the equality from the assumption that just one of the mixed second partial derivatives is continuous; however, this is of minor interest.

Under the hypothesis that all partial derivatives of orders one and two are continuous we can apply Theorem 10.1.2 twice to conclude that $df$ is differentiable. Such functions are said to be of class $C^2$. Similarly if all partial derivatives of orders up to $k$ are continuous, the function is said to be $C^k$. This again implies that full derivatives up to order $k$ exist and are continuous. If $f$ is in $C^k$ for all finite $k$, we say $f$ is $C^\infty$.

In dealing with partial derivatives of higher order, a good notation is very important. We will find the multi-index notation introduced for polynomials in the last chapter extends nicely to this context. We let

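$t \to 0$ in any order and appeal to the continuity of $\partial^2 f/\partial x_j \partial x_k$ and $\partial^2 f/\partial x_k \partial x_j$ to obtain $\partial^2 f/\partial x_j \partial x_k(x) = \partial^2 f/\partial x_k \partial x_j(x)$. QED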

Figure 10.2.1: [figure not reproduced; labeled points $x$, $x + te_k$, $x + se_j$]


We now consider further the question of local maxima and minima for $C^2$ functions. We have seen already that the vanishing of the differential is a necessary condition for a local extremum. A point where $df(x) = 0$ is called a critical point. In this section we will show how to use the second derivative $d^2 f$ to analyze the behavior of a function near a critical point. It will turn out that we can reduce the problem to the one-dimensional situation by considering the restriction of the function to all lines passing through the critical point.

Let $f : D \to \mathbb{R}$ be a $C^2$ function with a critical point at $y$, so $df(y) = 0$. Let $d^2 f(y)$ denote the Hessian matrix $\{\partial^2 f(y)/\partial x_j \partial x_k\}$ at the point $y$. For any line passing through $y$, given as $y + tu$, the restriction of $f$ to the line gives a function $g(t) = f(y + tu)$ of one variable, which is $C^2$. Note that $g'(t) = df(y + tu)u = \sum_{j=1}^{n} \partial f/\partial x_j(y + tu)u_j$ by the chain rule, so $g'(0) = 0$. Also $g''(t) = \sum_{j,k=1}^{n} \partial^2 f/\partial x_j \partial x_k(y + tu)u_j u_k = (d^2 f(y + tu)u, u)$, where $(\,,\,)$ denotes the inner product on $\mathbb{R}^n$. Thus it is not surprising that the expression $(d^2 f(y)u, u)$ (regarded as a function of $u$ in $\mathbb{R}^n$) is the key quantity to study, since it is $g''(0)$ and we know how $g''(0)$ relates to the behavior of $g(t)$ near the critical point $t = 0$. For example, if $f$ has a local minimum at $y$, then so does $g$ at $0$ and so $g''(0) \ge 0$. On the other hand, if $g''(0) > 0$, then we know $g$ has a strict local minimum at $0$. If this is true for all lines through $y$, we expect to be able to prove that $f$ has a strict local minimum at $y$.
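The identity $g''(0) = (d^2 f(y)u, u)$ can be illustrated numerically. In the sketch below the quartic $f$, its hand-computed Hessian, the critical point $(2, 0)$ and the direction $u$ are illustrative choices, and the helper names are hypothetical. A second central difference of $g(t) = f(y + tu)$ is compared with the quadratic form of the Hessian.

```python
def f(x, y):
    # Illustrative quartic (an assumption for this check); (2, 0) is a critical point.
    return x**4 / 32 + x**2 * y**2 - x - y**2

def hessian(x, y):
    # Second partial derivatives of f, computed by hand.
    return [[3 * x**2 / 8 + 2 * y**2, 4 * x * y],
            [4 * x * y, 2 * x**2 - 2]]

def quadratic_form(matrix, u):
    # (matrix u, u) written out as a double sum.
    return sum(matrix[j][k] * u[j] * u[k] for j in range(2) for k in range(2))

def second_derivative_along_line(point, u, h=1e-3):
    # Central second difference of g(t) = f(point + t u) at t = 0.
    def g(t):
        return f(point[0] + t * u[0], point[1] + t * u[1])
    return (g(h) - 2 * g(0.0) + g(-h)) / h**2

point = (2.0, 0.0)
u = (0.6, 0.8)  # a unit vector
from_hessian = quadratic_form(hessian(*point), u)
from_line = second_derivative_along_line(point, u)
```

Repeating the comparison for other directions $u$ gives a quick feel for how the sign of the quadratic form controls the behavior of $f$ along each line through the critical point.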

This discussion points to the importance of conditions like $(d^2 f(y)u, u) \ge 0$ or $(d^2 f(y)u, u) > 0$ for all vectors $u \ne 0$ in $\mathbb{R}^n$. The first case we call non-negative definite and the second we call positive definite. These definitions come from the theory of quadratic forms, which we can define simply as functions on $\mathbb{R}^n$ of the form $(Au, u) = \sum_{j,k=1}^{n} A_{jk} u_j u_k$ where $A$ is any symmetric matrix. A quadratic form is said to be non-negative definite if $(Au, u) \ge 0$ for all non-zero $u$ in $\mathbb{R}^n$ and positive definite if $(Au, u) > 0$ for all non-zero $u$ in $\mathbb{R}^n$. We define non-positive definite and negative definite in the same way with the inequalities reversed.

We will need the following basic facts about positive definite quadratic forms:

10.2.2 Local Extrema


But the sphere is compact, and $(Au, u)$ is continuous on $\mathbb{R}^n$ and, hence, on the sphere (with the subspace metric). Thus the quadratic form attains its inf on the sphere, which must be positive since $(Au, u) > 0$. Thus we must have $(Au, u) \ge \varepsilon > 0$ on the sphere for some $\varepsilon > 0$. This then implies $(Au, u) \ge \varepsilon|u|^2$ for general $u$ (because $|tu| = 1$ if $t = |u|^{-1}$).

b. The symmetric matrices may be viewed as a Euclidean space of dimension $n(n + 1)/2$ (as coordinate variables take all entries on or above the diagonal; the below-diagonal entries are determined by symmetry). Given a positive definite quadratic form $(Au, u)$, choose

Figure 10.2.2:

Proof: a. It is clear that the inequality implies that the quadratic form is positive definite, so it suffices to prove the converse. Note that $(Atu, tu) = t^2(Au, u)$ for real $t$, so the quadratic form has the same sign along lines through the origin. Thus $(Au, u)$ is positive definite if and only if it satisfies $(Au, u) > 0$ for $u$ in the unit sphere $|u| = 1$. (The setup is symbolized in Figure 10.2.2).

a. A quadratic form $(Au, u)$ is positive definite if and only if there exists $\varepsilon > 0$ such that $(Au, u) \ge \varepsilon|u|^2$ for all $u$.

b. If $(Au, u)$ is positive definite, then so is $(Bu, u)$ for all symmetric matrices $B$ sufficiently close to $A$.

Lemma 10.2.1


Proof: a. This is an immediate consequence of the one-dimensional case and the fact that $g''(0) = (d^2 f(y)u, u)$ for $g(t) = f(y + tu)$.

b. Here we have to proceed more cautiously, because we cannot deduce that $f$ has a local minimum at $y$ merely from the fact that $g(t) = f(y + tu)$ has a strict local minimum at $0$ for every $u$. The problem is that $g(t)$ having a strict local minimum at $0$ means there exists $\varepsilon > 0$ such that $g(t) > g(0)$ for $t \ne 0$ and $|t| < \varepsilon$. But this $\varepsilon$ depends on $g$ and, hence, on $u$. Thus we know only that $f(y + tu) > f(y)$ if $t \ne 0$ and $|t| < \varepsilon(u)$, and the set of such values $y + tu$ may not necessarily constitute a neighborhood of $y$, as in Figure 10.2.3. To show $y$ is a strict local minimum we need to show $f(x) > f(y)$ for $x \ne y$ in a neighborhood of $y$. We will show that this is true on a ball $|x - y| < \varepsilon$ on which $d^2 f(x)$ is positive definite; we have already observed that such a ball exists by the lemma.

Fix $u$ with $|u| = 1$, and look at $g(t) = f(y + tu)$ on $|t| < \varepsilon$ (this $\varepsilon$ does not depend on $u$). We have $g'(0) = 0$ and $g''(t) = (d^2 f(y + tu)u, u) > 0$ as a consequence of the positive definiteness. We claim that this implies

Theorem 10.2.2 Let $f : D \to \mathbb{R}$ be $C^2$ with $D \subseteq \mathbb{R}^n$ an open set. Let $y$ in $D$ be a critical point.

a. If $y$ is a local minimum (respectively maximum), then $d^2 f(y)$ is non-negative definite (respectively non-positive definite).

b. If $d^2 f(y)$ is positive definite (respectively negative definite), then $y$ is a strict local minimum (respectively maximum).

Part b of the lemma says exactly that the positive definite matrices form an open set in the space of symmetric matrices. In particular, for a $C^2$ function $f$, if $(d^2 f(x)u, u)$ is positive definite for one value of $x$ it must be positive definite in a neighborhood of $x$.

so B is positive definite. QED

$\varepsilon > 0$ as in part a. If we take any symmetric matrix $B$ sufficiently close to $A$ in the Euclidean metric, then all the entries of $B - A$ will be small and we can make $|((B - A)u, u)| \le \varepsilon|u|^2/2$. Then

$$(Bu, u) = (Au, u) + ((B - A)u, u) \ge \varepsilon|u|^2 - \varepsilon|u|^2/2 = \varepsilon|u|^2/2,$$


In view of this result it becomes important to know whether or not a quadratic form is positive definite. We can understand the problem better if we use a basic fact from linear algebra: the spectral theorem. Given any square matrix $A$, we say $u$ is an eigenvector with eigenvalue $\lambda$ if $u \ne 0$ and $Au = \lambda u$. Here $\lambda$ is any scalar value, with $\lambda = 0$ allowed. We must insist $u \ne 0$, otherwise $A0 = \lambda 0$ for any $\lambda$. The spectral theorem says that for a symmetric matrix $A$ there exists a complete set of eigenvectors; that is, there exists an orthonormal basis $u^{(1)}, \ldots, u^{(n)}$ of $\mathbb{R}^n$ with $Au^{(k)} = \lambda_k u^{(k)}$. This is sometimes expressed by saying $A$ is diagonalizable by an orthogonal matrix, for the linear transformation

for $0 < t_3 < t_2$ by another application of the mean value theorem, contradicting $g''(t) > 0$ (a similar argument works if $-\varepsilon < t_1 < 0$.) Thus $f(y) < f(y + tu)$ for $t \ne 0$, $|t| < \varepsilon$, and every $u$ with $|u| = 1$; hence, $f(y) < f(x)$ for $x \ne y$ in the ball $|x - y| < \varepsilon$. QED

$$\frac{g'(t_2) - g'(0)}{t_2} = g''(t_3) \le 0$$

for $0 < t_2 < t_1$ by the mean value theorem and then

$$\frac{g(t_1) - g(0)}{t_1} = g'(t_2) \le 0$$

$0$ is a strict minimum of $g$ on the interval $|t| < \varepsilon$. (Indeed if $g(t_1) \le g(0)$ for $0 < t_1 < \varepsilon$, then

Figure 10.2.3:


and so $(Ax, x) = \sum_{j=1}^{n} \lambda_j (x, u^{(j)})^2$ because the $u^{(j)}$ are orthonormal ($(u^{(j)}, u^{(k)}) = 0$ if $j \ne k$ and $(u^{(j)}, u^{(j)}) = 1$). Thus the spectral theorem shows us that the most general quadratic form is a weighted sum of squares. The terms $(x, u^{(j)})$ are just the coordinates of $x$ with respect to the orthonormal basis $\{u^{(j)}\}$, and the weighting factors $\lambda_j$ are just the eigenvalues. It is clear that the sign of $(Ax, x)$ will be determined by the sign of the eigenvalues $\lambda_j$. The quadratic form will be positive definite if and only if all the eigenvalues are positive, non-negative definite if and only if the eigenvalues are all non-negative, and so on.
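As a small check of the weighted-sum-of-squares identity, the sketch below uses an illustrative symmetric matrix (an assumption made here, not an example from the text), $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, whose eigenvalues $3$ and $1$ and orthonormal eigenvectors $(1, 1)/\sqrt{2}$, $(1, -1)/\sqrt{2}$ can be verified by hand.

```python
import math

# Illustrative symmetric matrix with known eigen-decomposition (an assumption).
A = [[2.0, 1.0], [1.0, 2.0]]
eigenvalues = [3.0, 1.0]
r = 1.0 / math.sqrt(2.0)
eigenvectors = [(r, r), (r, -r)]

def inner(v, w):
    return v[0] * w[0] + v[1] * w[1]

def apply(matrix, v):
    return (matrix[0][0] * v[0] + matrix[0][1] * v[1],
            matrix[1][0] * v[0] + matrix[1][1] * v[1])

x = (0.3, -1.2)  # arbitrary test vector

# Quadratic form computed directly ...
direct = inner(apply(A, x), x)
# ... and as the eigenvalue-weighted sum of squared coordinates.
weighted = sum(lam * inner(x, u) ** 2 for lam, u in zip(eigenvalues, eigenvectors))
```

Both eigenvalues are positive here, so every term in the weighted sum is non-negative, which is the positive definite case described above.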

The problem of deciding the type of a quadratic form is thus reduced to the signs of the eigenvalues. Since the eigenvalues of a matrix $A$ are the roots of the characteristic polynomial $p(\lambda) = \det(\lambda I - A)$, this reduces to an examination of the signs of the coefficients of $p(\lambda)$. In fact, $p(\lambda) = \prod_{j=1}^{n} (\lambda - \lambda_j) = \sum_{k=0}^{n} a_k \lambda^k$ with $a_n = 1$. If $\lambda_j > 0$ for all $j$, then the signs of $a_k$ alternate; while if all $\lambda_j < 0$, then $a_k > 0$ for all $k$. The converse statements are also true and so we have a criterion for positive or negative definiteness in terms of the characteristic polynomial. There is also a related criterion in terms of the signs of the determinants of submatrices of $A$.

In view of the importance of the spectral theorem, we will give a proof of it. This proof is not algebraic but rather uses the differential calculus in Euclidean space. It also has the virtue of being adaptable to

$$Ax = \sum_{j=1}^{n} (x, u^{(j)}) Au^{(j)} = \sum_{j=1}^{n} (x, u^{(j)}) \lambda_j u^{(j)}$$

with respect to the basis $u^{(1)}, \ldots, u^{(n)}$. We will not make use of this interpretation.

What does the spectral theorem say about the quadratic form $(Ax, x)$? Since an arbitrary vector $x$ can be expressed as a linear combination of the orthonormal basis elements $u^{(j)}$ as $x = \sum_{j=1}^{n} (x, u^{(j)}) u^{(j)}$, we can substitute this into $(Ax, x)$, noting that

$x \to Ax$ is represented by the diagonal matrix $\operatorname{diag}(\lambda_1, \ldots, \lambda_n)$


Thus we have a way to get eigenvectors and so we know that there isat least one eigenvector, which we can normalize to have length one bymultiplying by an appropriate scalar. Call it u. To complete the proofof the spectral theorem by induction we want to restrict attention to

Thus the critical point equations are $2|x|^2 (Ax)_j = 2x_j (Ax, x)$ or $Ax = \lambda x$ where $\lambda = (Ax, x)/(x, x)$. QED
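The lemma can be illustrated numerically. The sketch below assumes an illustrative symmetric matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and hypothetical helper names; it approximates the gradient of the Rayleigh quotient by central differences, once at the eigenvector $(1, 1)$ and once at $(1, 0)$, which is not an eigenvector.

```python
# Illustrative symmetric matrix (an assumption, not taken from the text).
A = [[2.0, 1.0], [1.0, 2.0]]

def rayleigh(matrix, x):
    """Rayleigh quotient (Ax, x) / (x, x) for a 2 x 2 matrix."""
    ax = (matrix[0][0] * x[0] + matrix[0][1] * x[1],
          matrix[1][0] * x[0] + matrix[1][1] * x[1])
    return (ax[0] * x[0] + ax[1] * x[1]) / (x[0] ** 2 + x[1] ** 2)

def gradient(matrix, x, h=1e-6):
    """Central-difference approximation to the gradient of the Rayleigh quotient."""
    grad = []
    for j in range(2):
        forward = list(x)
        backward = list(x)
        forward[j] += h
        backward[j] -= h
        grad.append((rayleigh(matrix, forward) - rayleigh(matrix, backward)) / (2 * h))
    return grad

eigvec = (1.0, 1.0)     # A applied to (1, 1) gives (3, 3), so the eigenvalue is 3
non_eigvec = (1.0, 0.0)  # A applied to (1, 0) gives (2, 1), not a multiple of (1, 0)

grad_at_eigvec = gradient(A, eigvec)
grad_at_other = gradient(A, non_eigvec)
value_at_eigvec = rayleigh(A, eigvec)
```

At the eigenvector the gradient vanishes and the quotient equals the eigenvalue, while at the other point the gradient is visibly non-zero, in line with the critical point equations.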

$$\frac{\partial}{\partial x_j}(Ax, x) = 2 \sum_{k=1}^{n} A_{jk} x_k = 2(Ax)_j.$$

$$\frac{\partial}{\partial x_j} \left( \frac{(Ax, x)}{(x, x)} \right) = \frac{|x|^2 \dfrac{\partial}{\partial x_j}(Ax, x) - 2x_j (Ax, x)}{|x|^4}$$

and so the equations for a critical point are $|x|^2\, \partial/\partial x_j (Ax, x) = 2x_j (Ax, x)$, for $j = 1, \ldots, n$. Now since $A$ is symmetric,

Proof: We compute

Lemma 10.2.2 Let $x \ne 0$ in $\mathbb{R}^n$ be a critical point for $(Ax, x)/(x, x)$. Then $x$ is an eigenvector for $A$.

certain infinite-dimensional situations. It is based on the observation that the function $(Ax, x)/(x, x)$, called the Rayleigh quotient, attains its maximum value when $x$ is an eigenvector corresponding to the largest eigenvalue. If we assumed the spectral theorem this would be a simple exercise; but since we are trying to prove the spectral theorem, we will have to use this only for motivation. In fact, we observe that the Rayleigh quotient is homogeneous of degree zero: if we multiply $x$ by a scalar it does not change the Rayleigh quotient. This means that the values of the Rayleigh quotient on $\mathbb{R}^n$ minus the origin (it is not defined for $x = 0$) are the same as its values on the unit sphere. But the unit sphere is compact and the Rayleigh quotient on the sphere is just $(Ax, x)$, a continuous function, so it attains its maximum. Passing back to $\mathbb{R}^n$ minus the origin, we have shown the existence of a maximum for $(Ax, x)/(x, x)$, hence a critical point.


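Example Let $f(x, y) = \frac{1}{32}x^4 + x^2 y^2 - x - y^2$. The critical points are the solutions of $\partial f/\partial x = \frac{1}{8}x^3 + 2xy^2 - 1 = 0$ and $\partial f/\partial y = 2x^2 y - 2y = 0$. The second equation forces $y = 0$ or $x = \pm 1$, and substituting into the first shows there are only the three critical points $(2, 0)$, $(1, \sqrt{7}/4)$ and $(1, -\sqrt{7}/4)$. The Hessian for $f$ is

$$\begin{pmatrix} \frac{3}{8}x^2 + 2y^2 & 4xy \\ 4xy & 2x^2 - 2 \end{pmatrix}.$$

At the point $(2, 0)$ this is

$$\begin{pmatrix} 3/2 & 0 \\ 0 & 6 \end{pmatrix},$$

which is clearly positive definite, so $(2, 0)$ is a local minimum. At $(1, \pm\sqrt{7}/4)$ this is

$$\begin{pmatrix} 5/4 & \pm\sqrt{7} \\ \pm\sqrt{7} & 0 \end{pmatrix}.$$

Since this matrix has determinant $-7$ and the determinant is equal to the product of the eigenvalues, we conclude that the two eigenvalues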

the orthogonal complement of $u$, the $(n - 1)$-dimensional vector space of all $x$ in $\mathbb{R}^n$ with $(x, u) = 0$. Now the symmetry of the matrix $A$ is equivalent to the condition $(Ax, y) = (x, Ay)$ for any $x$ and $y$ (choosing $x = e_j$ and $y = e_k$ gives $A_{jk} = A_{kj}$). Thus if $x$ is orthogonal to $u$, then so is $Ax$: $(Ax, u) = (x, Au) = (x, \lambda_1 u) = \lambda_1 (x, u) = 0$ since $u$ is an eigenvector. Thus if we call this vector subspace $u^\perp$, we have $A$ acting on $u^\perp$. Furthermore, $A$ is still symmetric on $u^\perp$ because the condition $(Ax, y) = (x, Ay)$ is true when we restrict $x$ and $y$ to $u^\perp$. We can then repeat the argument to get an eigenvector of length one in $u^\perp$, and continuing in this fashion by induction we obtain the complete set of eigenvectors $u^{(1)}, \ldots, u^{(n)}$. The manner of choice shows that these form an orthonormal set, hence a basis for $\mathbb{R}^n$. This procedure also yields the formula

{ (Ax, x) ~ O d' h al (1) (Ie-l)}Ale = sup (x, x) : x -r an x 18 ort ogon to u , ... , u

for the kth largest eigenvalue. This formula is often used in obtainingestimates for the eigenvalues.


have opposite signs and so these points are neither local maxima nor minima (they are called saddle points).

We will return again to the theory of maxima and minima in Chapter 13, after we have discussed the implicit function theorem.

10.2.3 Taylor Expansions

We have been thinking of the differential $df$ as giving the crucial piece of information needed to form the best approximation to $f$ by an affine function in a neighborhood of a point. We can think of an affine function as a polynomial of degree one, and then the natural generalization is to consider the best local approximation to $f$ by polynomials of higher degrees. For simplicity we take $f$ to be real-valued, $f : D \to \mathbb{R}$ with $D \subseteq \mathbb{R}^n$ an open set. Motivated by the $n = 1$ case, we seek a polynomial $T_m(y, x)$ of degree $m$ in $x$ (here $y$ is a fixed point in $D$) such that $f(x) = T_m(y, x) + o(|x - y|^m)$ as $x \to y$. Notice that this is truly an $n$-dimensional problem in that we need to show for every $1/N$ there exists $1/k$ such that $|x - y| < 1/k$ implies $|f(x) - T_m(y, x)| \le |x - y|^m/N$. It is not enough to show that this is true along every line through $y$, for then the $1/k$ might depend on the choice of the line. Nevertheless, our method of proof will be to use Taylor's theorem in one dimension applied to the restriction of $f$ to each line through $y$. This will tell us what $T_m(y, x)$ has to be, and we will be able to give a proof based on the proof of the one-dimensional Taylor's theorem that will enable us to control $1/k$ independent of the line.

Let $f$ be $C^m$. This means all partial derivatives $(\partial/\partial x)^\alpha f$ for $|\alpha| \le m$ (recall $|\alpha| = \alpha_1 + \cdots + \alpha_n$) exist and are continuous. It is easy to see that the restriction of $f$ to every line in $\mathbb{R}^n$ (intersected with the domain $D$ of $f$) is $C^m$. We will need a formula for the derivatives of the restriction. Let $y + tu$ be such a line, where $y$ is in $D$, $u$ is in $\mathbb{R}^n$ ($u \neq 0$), and $t$ is a real variable. Then $g(t) = f(y + tu)$ is the restriction.

Lemma 10.2.3 If $f$ is $C^m$, then $g(t) = f(y + tu)$ is $C^m$ and

$$\frac{1}{k!}\left(\frac{d}{dt}\right)^k g(t) = \sum_{|\alpha| = k} \frac{u^\alpha}{\alpha!}\left(\frac{\partial}{\partial x}\right)^\alpha f(y + tu)$$

where $\alpha! = \alpha_1!\,\alpha_2! \cdots \alpha_n!$, for any $k \le m$ (note $0! = 1$ by convention).
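As a sanity check on the lemma in the case $k = 2$, $n = 2$, one can compare both sides for a concrete polynomial. This is a sketch under stated assumptions: the function `f`, the base point `y`, the direction `u`, and every helper name are made up for illustration, and the left side is approximated by a central difference rather than computed symbolically.

```python
def f(x1, x2):
    # Hypothetical test function: a cubic polynomial in two variables.
    return x1**2 * x2 + x2**3

def second_partials(x1, x2):
    # Exact second-order partial derivatives of f, keyed by multi-index alpha.
    return {(2, 0): 2 * x2, (1, 1): 2 * x1, (0, 2): 6 * x2}

FACTORIAL = {0: 1, 1: 1, 2: 2}

def lemma_rhs(y, u):
    """Sum over |alpha| = 2 of u^alpha / alpha! times the alpha-derivative of f at y."""
    total = 0.0
    for (a1, a2), value in second_partials(*y).items():
        total += u[0]**a1 * u[1]**a2 / (FACTORIAL[a1] * FACTORIAL[a2]) * value
    return total

def lemma_lhs(y, u, h=1e-3):
    """(1/2!) g''(0) for g(t) = f(y + t u), via a central second difference."""
    g = lambda t: f(y[0] + t * u[0], y[1] + t * u[1])
    return (g(h) - 2 * g(0.0) + g(-h)) / h**2 / 2.0

y = (0.7, -1.2)
u = (0.6, 0.8)
lhs = lemma_lhs(y, u)
rhs = lemma_rhs(y, u)
```

Because `g` is a cubic polynomial in `t`, the central second difference has no truncation error, so the two sides differ only by rounding. The mixed multi-index $(1, 1)$ enters with weight $1/\alpha! = 1$ while the pure ones enter with weight $1/2$, which is exactly the bookkeeping the proof carries out.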


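Proof: This is essentially just an application of the chain rule $k$ times. For $k = 1$ we have $\frac{d}{dt} f(y + tu) = \sum_{j=1}^n u_j \frac{\partial f}{\partial x_j}(y + tu)$, which is of the required form ($|\alpha| = 1$ implies all $\alpha_j = 0$ except for one $\alpha_j = 1$). For $k = 2$ we compute

$$\frac{d^2}{dt^2} f(y + tu) = \frac{d}{dt}\left(\sum_{j=1}^n u_j \frac{\partial f}{\partial x_j}(y + tu)\right) = \sum_{i=1}^n \sum_{j=1}^n u_i u_j \frac{\partial^2 f}{\partial x_i \partial x_j}(y + tu).$$

There are two kinds of multi-indices $\alpha$ with $|\alpha| = 2$: those with $n - 2$ zeroes and two ones and those with $n - 1$ zeroes and one two. The first kind correspond to $\partial^2 f/\partial x_i \partial x_j = \partial^2 f/\partial x_j \partial x_i$ (off-diagonal terms in $d^2 f$) with $j \neq i$ and occur twice in the sum. The second kind correspond to $\partial^2 f/\partial x_j^2$ (diagonal terms in $d^2 f$) and occur only once. But $\alpha!$ is equal to one for the first kind and two for the second kind, so

$$\sum_{i=1}^n \sum_{j=1}^n u_i u_j \frac{\partial^2 f}{\partial x_i \partial x_j}(y + tu) = 2 \sum_{|\alpha| = 2} \frac{u^\alpha}{\alpha!}\left(\frac{\partial}{\partial x}\right)^\alpha f(y + tu),$$

which gives the result for $k = 2$.

The general case is proved by induction. So let us assume that

$$\frac{1}{(k-1)!}\left(\frac{d}{dt}\right)^{k-1} f(y + tu) = \sum_{|\alpha| = k-1} \frac{u^\alpha}{\alpha!}\left(\frac{\partial}{\partial x}\right)^\alpha f(y + tu)$$

and prove the result for $k$. We differentiate both sides of the induction hypothesis identity to obtain

$$\frac{1}{k!}\left(\frac{d}{dt}\right)^k f(y + tu) = \frac{1}{k} \sum_{|\alpha| = k-1} \frac{u^\alpha}{\alpha!} \frac{d}{dt}\left(\frac{\partial}{\partial x}\right)^\alpha f(y + tu) = \sum_{|\alpha| = k-1} \sum_{j=1}^n \frac{1}{k} \frac{u^\alpha}{\alpha!} u_j \frac{\partial}{\partial x_j}\left(\frac{\partial}{\partial x}\right)^\alpha f(y + tu).$$

But $(\partial/\partial x_j)(\partial/\partial x)^\alpha = (\partial/\partial x)^\beta$ where $\beta_i = \alpha_i$ for $i \neq j$ and $\beta_j = \alpha_j + 1$. Every multi-index $\beta$ with $|\beta| = k$ arises in this way once for each $j$ with $\beta_j \ge 1$ (the indices $j$ with $\beta_j = 0$ contribute nothing, which is consistent with the factor $\beta_j$ in the next formula). Also $\beta! = \beta_j \alpha!$, so

$$\sum_{|\alpha| = k-1} \sum_{j=1}^n \frac{1}{k} \frac{u^\alpha}{\alpha!} u_j \frac{\partial}{\partial x_j}\left(\frac{\partial}{\partial x}\right)^\alpha f(y + tu) = \sum_{|\beta| = k} \frac{1}{k} \sum_{j=1}^n \beta_j \frac{u^\beta}{\beta!}\left(\frac{\partial}{\partial x}\right)^\beta f(y + tu)$$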


and $(1/k) \sum_{j=1}^n \beta_j = 1$ (since $|\beta| = k$); hence, we have the desired identity. QED

We write out the one-dimensional Taylor expansion for $g(t) = f(y + tu)$, substituting in the above computation, to obtain

$$g(t) = \sum_{k=0}^m \sum_{|\alpha| = k} \frac{u^\alpha}{\alpha!}\left(\frac{\partial}{\partial x}\right)^\alpha f(y)\, t^k + o(t^m).$$

Choosing $u = (x - y)/|x - y|$ and $t = |x - y|$ this becomes

$$f(x) = \sum_{|\alpha| \le m} \frac{(x - y)^\alpha}{\alpha!}\left(\frac{\partial}{\partial x}\right)^\alpha f(y) + o(|x - y|^m).$$

This looks like the kind of result we want, with

$$T_m(y, x) = \sum_{|\alpha| \le m} \frac{(x - y)^\alpha}{\alpha!}\left(\frac{\partial}{\partial x}\right)^\alpha f(y).$$

Note that $T_m(y, x)$ is a polynomial of degree $\le m$ in $x$. We call it the Taylor expansion of $f$ at $y$ of order $m$. When $n = 1$ this agrees with our previous definition. Also note that $T_m(y, x) = f(x)$ exactly if $f$ is a polynomial of degree $\le m$. The problem is that we do not yet have an honest proof that $f(x) = T_m(y, x) + o(|x - y|^m)$, because we have only proved it line by line. Since the $o(|x - y|^m)$ estimate on each line through $y$ may vary, we may not be able to make $o(|x - y|^m)/|x - y|^m$ small uniformly on a neighborhood of $y$. Nevertheless, we will see that we can produce a valid argument by looking back at exactly what was proved in the one-dimensional case.

Taylor's Theorem Let $f : D \to \mathbb{R}$ be $C^m$, where $D \subseteq \mathbb{R}^n$ is open, and let $y$ be in $D$. Define the $m$th order Taylor expansion as

$$T_m(y, x) = \sum_{|\alpha| \le m} \frac{(x - y)^\alpha}{\alpha!}\left(\frac{\partial}{\partial x}\right)^\alpha f(y).$$

Then $f(x) = T_m(y, x) + o(|x - y|^m)$ as $x \to y$.

Proof: We have to show that given any error $1/N$ there exists $1/k$ such that $|x - y| < 1/k$ implies $|f(x) - T_m(y, x)| \le |x - y|^m/N$. Let


$h(x) = f(x) - T_m(y, x)$. Then we claim $(\partial/\partial x)^\beta h(y) = 0$ for any $|\beta| \le m$. This follows by direct computation since $(\partial/\partial x)^\beta (x - y)^\alpha = 0$ at $x = y$ unless $\beta = \alpha$, and then $(\partial/\partial x)^\alpha (x - y)^\alpha = \alpha!$. Thus $(\partial/\partial x)^\beta T_m(y, x)$ at $x = y$ consists of just the one term $(\partial/\partial x)^\beta f(y)$ and so $(\partial/\partial x)^\beta h(y) = 0$.

Now we use the assumption that $f$ is $C^m$, hence $h$ is $C^m$, to conclude that $(\partial/\partial x)^\beta h$ must be small in a neighborhood of $y$ for all $|\beta| \le m$. Let $1/N$ be given. Then there exists $1/k$ such that $|x - y| < 1/k$ implies $|(\partial/\partial x)^\beta h(x)| \le 1/N$ for all $|\beta| \le m$. We want to prove that this in turn implies $|h(x)| \le c_n |x - y|^m/N$ where $c_n$ is a constant depending only on the dimension $n$. If we can do this we will have shown $f(x) - T_m(y, x) = o(|x - y|^m)$ as desired.

But now we can apply the proof of the one-dimensional Taylor theorem to the function $g(t) = h(y + t(x - y))$. We have

$$\left(\frac{d}{dt}\right)^k g(t) = \sum_{|\alpha| = k} \frac{k!\,(x - y)^\alpha}{\alpha!}\left(\frac{\partial}{\partial x}\right)^\alpha h(y + t(x - y))$$

by the lemma. So setting $t = 0$ we obtain $(d/dt)^k g(0) = 0$ for $k \le m$ since $(\partial/\partial x)^\alpha h(y) = 0$. We also have $|y + t(x - y) - y| = t|x - y| < 1/k$ for $0 \le t \le 1$ if $|x - y| < 1/k$. Thus $|(\partial/\partial x)^\alpha h(y + t(x - y))| \le 1/N$, which gives us the key estimate

$$\left|\left(\frac{d}{dt}\right)^m g(t)\right| \le m!\, c_n\, |x - y|^m/N \qquad (*)$$

for $0 \le t \le 1$, where $c_n = \sum_{|\alpha| = m} 1/\alpha!$. (We have also used the estimate $|(x - y)^\alpha| \le |x - y|^m$ for $|\alpha| = m$, which we leave as an exercise.) The proof of the one-dimensional Taylor's theorem (Theorem 5.4.5) shows that $|g(t)| \le (t^m/m!)\,|(d/dt)^m g(t_1)|$ for some value of $t_1$ in $(0, t)$. Taking $t = 1$ and applying (*) we obtain

$$|h(x)| = |g(1)| \le \frac{1}{m!}\left|\left(\frac{d}{dt}\right)^m g(t_1)\right| \le c_n |x - y|^m/N,$$

as desired. QED

It is worth pointing out the similarity of this proof with that of Theorem 10.2.2 ($d^2 f(y)$ being positive definite at a critical point implies the critical point is a local minimum). In both cases we reduced


the $n$-dimensional result to the one-dimensional result along all lines passing through the point. However, in both cases it was not enough simply to quote the one-dimensional result; we needed further to use the continuity of the derivatives in $n$ dimensions to obtain estimates uniformly on the lines, and this in turn required that we re-examine the proof of the one-dimensional result.

There is also a Lagrange Remainder Formula for $f(x) - T_m(y, x)$, under the assumption that $f$ is $C^{m+1}$. It has the form

$$f(x) - T_m(y, x) = \sum_{|\alpha| = m+1} \frac{(x - y)^\alpha}{\alpha!}\left(\frac{\partial}{\partial x}\right)^\alpha f(z)$$

where $z$ is some point on the line joining $x$ and $y$. From this we obtain $f(x) - T_m(y, x) = O(|x - y|^{m+1})$. The proof of the Lagrange Remainder Formula in $n$ dimensions is a direct consequence of the one-dimensional case and the lemma. We leave the details as an exercise.

10.2.4 Exercises

1. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$ for $(x, y) \neq (0, 0)$ and $f(0, 0) = 0$. Express $f$ in polar coordinates. Show that $\partial f/\partial x$, $\partial f/\partial y$, $\partial^2 f/\partial x \partial y$, and $\partial^2 f/\partial y \partial x$ exist for all $(x, y)$ in $\mathbb{R}^2$ but $\partial^2 f/\partial x \partial y(0, 0) \neq \partial^2 f/\partial y \partial x(0, 0)$.

2. Prove the equality of $\partial^2 f/\partial x_j \partial x_k$ and $\partial^2 f/\partial x_k \partial x_j$ under the hypothesis that one of them is continuous (and $\partial f/\partial x_j$ and $\partial f/\partial x_k$ are continuous).

3. Let $A$ be a symmetric matrix. Prove that $A$ is nondegenerate (for every $x \neq 0$ there exists $y \neq 0$ such that $(Ax, y) \neq 0$) if and only if all the eigenvalues of $A$ are non-zero.

4. Prove that a positive definite matrix has positive entries on the diagonal. Give an example of a symmetric matrix with positive entries on the diagonal that is not positive definite.

5. Prove $|x^\alpha| \le |x|^{|\alpha|}$ for every $\alpha$.

6. Prove the Lagrange Remainder Formula in $n$ dimensions.


7. *Find a formula for $(\partial/\partial x)^\alpha (f \cdot g)$ in terms of derivatives of $f$ and $g$.

8. For the affine function $g(x) = ax + b$ that minimizes $\sum_j (y_j - g(x_j))^2$ (see problem 11 of section 10.1), show by direct computation that the Hessian is positive definite.

9. Show that if $f$ is $C^2$ and $f(x, y) = g(r, \theta)$ where $(r, \theta)$ are polar coordinates in $\mathbb{R}^2$, then

for all $(x, y) \neq (0, 0)$.

10. Prove $d_u d_v f = d_v d_u f$ if $f$ is $C^2$ for any vectors $u$ and $v$.

11. Prove that if $f$ is $C^3$ and $d^2 f(y) = 0$ at a critical point $y$ but $(\partial/\partial x)^\alpha f(y) \neq 0$ for some $\alpha$ with $|\alpha| = 3$, then $y$ is not a local maximum or minimum.

12. Let $f$ and $g$ be $C^2$ real-valued functions with $f(y) = g(y) = 0$, $df(y) = dg(y) = 0$, and $d^2 f(y) = \lambda\, d^2 g(y)$ where $d^2 g(y)$ is positive definite (or negative definite). Prove that $\lim_{x \to y} f(x)/g(x) = \lambda$. Give a counterexample to the naive generalization of l'Hôpital's rule to dimensions greater than one (i.e., find $f$ and $g$ that are $C^1$ with $f(y) = g(y) = 0$, $df(y) = \lambda\, dg(y) \neq 0$, but $\lim_{x \to y} f(x)/g(x)$ does not exist).

13. If $f$ and $g$ are $C^m$ functions, prove that the Taylor expansion of order $m$ about $y$ of $f \cdot g$ is obtained by multiplying the corresponding Taylor expansions of $f$ and $g$ and retaining only the terms of order $\le m$.

14. Prove that the Taylor expansion $T_m(y, x)$ is unique in that if $g$ is any polynomial of degree $\le m$ such that $f(x) = g(x) + o(|x - y|^m)$ as $x \to y$, then $g(x) = T_m(y, x)$.

15. Classify the critical points of the following functions:

a. $f(x, y) = x^4 + x^2y^2 - y$.


b. $f(x, y) = \dfrac{x}{1 + x^2 + y^2}$.

c. $f(x, y) = x^4 + y^4 - x^3$.

16. Show that if $f : \mathbb{R}^n \to \mathbb{R}$ is $C^2$ and $d^2 f(y)$ is positive definite, then the graph of $f$ locally lies above the graph of its tangent plane at $y$. Prove conversely that if the graph of $f$ lies locally above its tangent plane at $y$, then $d^2 f(y)$ is non-negative definite.

17. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be $C^2$. Show that $f$ satisfies $\partial^2 f/\partial x \partial y(x, y) \equiv 0$ if and only if there exist $C^2$ functions $g, h : \mathbb{R} \to \mathbb{R}$ such that $f(x, y) = g(x) + h(y)$. To what extent are $g$ and $h$ uniquely determined by $f$?

18. *Let $f : \mathbb{R}^2 \to \mathbb{R}$ be $C^2$. Show that $f$ satisfies

$$\frac{\partial^2}{\partial t^2} f(x, t) = c^2 \frac{\partial^2}{\partial x^2} f(x, t)$$

(vibrating string equation) if and only if there exist $C^2$ functions $g, h : \mathbb{R} \to \mathbb{R}$ such that $f(x, t) = g(x + ct) + h(x - ct)$. Here $c$ is a constant (the speed of sound). (Hint: make a change of variable to reduce to problem 17.)

19. Define $Pf(x) = \sum_{|\alpha| \le m} c_\alpha (\partial/\partial x)^\alpha f(x)$ for $f : \mathbb{R}^n \to \mathbb{R}$ any $C^m$ function, where $c_\alpha$ are constants. Show that if $Pf = 0$, then the same is true for any translate of $f$ ($g(x) = f(x + y)$ for fixed $y$). $P$ is called a constant coefficient partial differential operator.

20. If $f : \mathbb{R} \to \mathbb{R}$ is $C^k$ and even ($f(-x) = f(x)$), show that $F : \mathbb{R}^n \to \mathbb{R}$ defined by $F(x) = f(|x|)$ is $C^k$.

10.3 Summary

10.1 The Differential

Definition Let $f : D \to \mathbb{R}^m$ with $D \subseteq \mathbb{R}^n$ open. We say $f$ is differentiable at $y$ (a point in $D$) if there exists an $m \times n$ matrix $df(y)$, called the differential of $f$ at $y$, such that

$$f(x) = f(y) + df(y)(x - y) + o(|x - y|)$$


as $x \to y$ (or in other words, given any $1/N$ there exists $1/k$ such that $|x - y| < 1/k$ implies

$$|f(x) - f(y) - df(y)(x - y)| \le |x - y|/N).$$

The differential is uniquely determined by this condition. If $m = 1$ we also call $df(y)$ the gradient of $f$ at $y$ and write it $\nabla f(y)$. If $f$ is differentiable at every point of $D$ we say $f$ is differentiable on $D$, and if $df : D \to \mathbb{R}^{m \times n}$ is also continuous we say $f$ is continuously differentiable or $C^1$.

Theorem If $f$ is differentiable at $y$, then $f$ is continuous at $y$; in fact, $|f(x) - f(y)| \le M|x - y|$ for $x$ in a neighborhood of $y$, for some $M$.

Theorem If $f$ and $g$ are differentiable at $y$ (or $C^1$, respectively), then so is $af + bg$ and the differential is linear: $d(af + bg)(y) = a\,df(y) + b\,dg(y)$. If $f : D \to \mathbb{R}^m$ and $g : D \to \mathbb{R}$ are differentiable, then so is $g \cdot f$ and $d(gf)(y) = g(y)\,df(y) + f(y)\,dg(y)$.

Definition The partial derivative $\partial f_k/\partial x_j$ is said to exist at a point $y$ if $f_k(y + te_j) = f_k(y) + \partial f_k/\partial x_j(y)\,t + o(t)$ as $t \to 0$. More generally, if $u$ is in $\mathbb{R}^n$, the directional derivative $d_u f$ is said to exist at $y$ if $f(y + tu) = f(y) + t\,d_u f(y) + o(t)$ as $t \to 0$.

Theorem 10.1.1 If $f$ is differentiable at $y$, then all partial and directional derivatives exist at $y$ and $df(y)$ is the matrix $\partial f_k/\partial x_j(y)$, while $d_u f(y) = df(y)u$.

Example $f(x, y) = xy/\sqrt{x^2 + y^2}$ has directional derivatives in all directions at all points in the plane but is not differentiable at the origin.

Theorem 10.1.2 A function $f : D \to \mathbb{R}^m$ with $D \subseteq \mathbb{R}^n$ open is $C^1$ if and only if the partial derivatives exist and are continuous on $D$.

Theorem 10.1.3 (Chain Rule) Let $f : D \to \mathbb{R}^m$ with $D \subseteq \mathbb{R}^n$ open, and let $g : A \to \mathbb{R}^p$ with $A \subseteq \mathbb{R}^m$ open and $f(D) \subseteq A$. If $f$ is differentiable at $y$ and $g$ is differentiable at $z = f(y)$, then $g \circ f$ is differentiable at $y$ and $d(g \circ f)(y) = dg(z)\,df(y)$ (matrix multiplication).


Theorem 10.1.4 Let $f : D \to \mathbb{R}$ with $D \subseteq \mathbb{R}^n$ and $y$ in the interior of $D$. If $f$ assumes its maximum or minimum at $y$ and $f$ is differentiable at $y$, then $df(y) = 0$.

Theorem Let $g : \mathbb{R}^2 \to \mathbb{R}$, $a : \mathbb{R} \to \mathbb{R}$, and $b : \mathbb{R} \to \mathbb{R}$ be $C^1$. Then $f(x) = \int_{a(x)}^{b(x)} g(x, y)\,dy$ is $C^1$ and

$$f'(x) = b'(x)\,g(x, b(x)) - a'(x)\,g(x, a(x)) + \int_{a(x)}^{b(x)} \frac{\partial g}{\partial x}(x, y)\,dy.$$

Lemma 10.1.1

a. If $g : \mathbb{R}^2 \to \mathbb{R}$ is continuous, then $G(x) = \int_a^b g(x, y)\,dy$ is continuous.

b. If $g$ is $C^1$, then $G$ is $C^1$ with $G'(x) = \int_a^b \partial g/\partial x(x, y)\,dy$.

10.2 Higher Derivatives

Definition Let $f : D \to \mathbb{R}^m$ with $D \subseteq \mathbb{R}^n$ be differentiable. If $df : D \to \mathbb{R}^{m \times n}$ is differentiable at $y$, then its differential $d(df)(y)$ is called the second derivative, denoted $d^2 f(y)$. If $m = 1$, then $d^2 f(y)$ is an $n \times n$ matrix called the Hessian of $f$ at $y$.

Definition If $f$ is differentiable and $\partial f/\partial x_j$ has a partial derivative $\partial/\partial x_k(\partial f/\partial x_j)$ at a point $y$, we say the second-order partial derivative $\partial^2 f/\partial x_k \partial x_j$ exists at $y$ and equals $\partial/\partial x_k(\partial f/\partial x_j)$.

Theorem If $f : D \to \mathbb{R}$ with $D \subseteq \mathbb{R}^n$ has a second derivative at a point $y$, then all second-order partial derivatives exist at $y$ and the Hessian matrix $d^2 f(y)$ is $(d^2 f)_{jk} = \partial^2 f/\partial x_j \partial x_k$.

Theorem 10.2.1 Let $f : D \to \mathbb{R}$ be continuous together with all partial derivatives of order one and two. Then $\partial^2 f/\partial x_k \partial x_j = \partial^2 f/\partial x_j \partial x_k$; hence, the Hessian matrix is symmetric.

Definition A function is said to be of class $C^k$ if all partial derivatives of orders up to $k$ exist and are continuous.


Notation $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is called a multi-index, each $\alpha_j$ being a non-negative integer; $(\partial/\partial x)^\alpha f = (\partial/\partial x_1)^{\alpha_1}(\partial/\partial x_2)^{\alpha_2} \cdots (\partial/\partial x_n)^{\alpha_n} f$; and $|\alpha| = \alpha_1 + \cdots + \alpha_n$ is the order of the partial derivative.

Definition A quadratic form on $\mathbb{R}^n$ is a function of the form $(Au, u)$ where $A$ is a symmetric $n \times n$ matrix. It is said to be non-negative definite if $(Au, u) \ge 0$ for all $u$ and positive definite if $(Au, u) > 0$ for $u \neq 0$. (Similarly, we define non-positive definite and negative definite by reversing the inequalities.)

Lemma 10.2.1

a. A quadratic form $(Au, u)$ is positive definite if and only if there exists $\varepsilon > 0$ such that $(Au, u) \ge \varepsilon|u|^2$ for all $u$.

b. If $(Au, u)$ is positive definite, then so is $(Bu, u)$ for all symmetric matrices $B$ sufficiently close to $A$.

Theorem 10.2.2 Let $f : D \to \mathbb{R}$ be $C^2$ with $D \subseteq \mathbb{R}^n$ an open set. Let $y$ in $D$ be a critical point.

a. If $y$ is a local minimum (respectively maximum), then $d^2 f(y)$ is non-negative definite (respectively non-positive definite).

b. If $d^2 f(y)$ is positive definite (respectively negative definite), then $y$ is a strict local minimum (respectively maximum).

Definition An eigenvector $u$ for a matrix $A$ with eigenvalue $\lambda$ is a non-zero solution of $Au = \lambda u$.

Spectral Theorem A symmetric matrix has a complete set of eigenvectors.

Lemma 10.2.2 Let $x \neq 0$ in $\mathbb{R}^n$ be a critical point for $(Ax, x)/(x, x)$. Then $x$ is an eigenvector for $A$.

Lemma 10.2.3 If $f$ is $C^m$, then $g(t) = f(y + tu)$ is $C^m$ and

$$\frac{1}{k!}\left(\frac{d}{dt}\right)^k g(t) = \sum_{|\alpha| = k} \frac{u^\alpha}{\alpha!}\left(\frac{\partial}{\partial x}\right)^\alpha f(y + tu)$$


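where $\alpha! = \alpha_1!\,\alpha_2! \cdots \alpha_n!$, for any $k \le m$ (note $0! = 1$ by convention).

Taylor's Theorem Let $f : D \to \mathbb{R}$ be $C^m$, where $D \subseteq \mathbb{R}^n$ is open, and let $y$ be in $D$. Define the $m$th order Taylor expansion as

$$T_m(y, x) = \sum_{|\alpha| \le m} \frac{(x - y)^\alpha}{\alpha!}\left(\frac{\partial}{\partial x}\right)^\alpha f(y).$$

Then $f(x) = T_m(y, x) + o(|x - y|^m)$ as $x \to y$.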
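To see the $o(|x - y|^m)$ statement of Taylor's theorem at work, here is a small numerical sketch for $m = 2$ at $y = (0, 0)$. The function $f(x, y) = e^x \sin y$ is an illustrative choice, not from the text; its second-order expansion at the origin is $y + xy$. The sketch takes the worst ratio $|f - T_2|/r^2$ over several directions on the circle of radius $r$, which mimics the uniformity over all lines through $y$ that the proof has to establish.

```python
import math

def f(x, y):
    # Hypothetical smooth test function.
    return math.exp(x) * math.sin(y)

def taylor2(x, y):
    # Second-order Taylor polynomial of f at the origin. There f = 0,
    # f_x = 0, f_y = 1, f_xx = 0, f_xy = 1, f_yy = 0, so T_2 = y + x*y.
    return y + x * y

def worst_ratio(r, directions=16):
    """Largest |f - T_2| / r^2 over equally spaced points on the circle of radius r."""
    worst = 0.0
    for i in range(directions):
        theta = 2 * math.pi * i / directions
        x, y = r * math.cos(theta), r * math.sin(theta)
        worst = max(worst, abs(f(x, y) - taylor2(x, y)) / r**2)
    return worst

# The ratio should shrink roughly linearly in r, since the first omitted terms are cubic.
ratios = {r: worst_ratio(r) for r in (1e-1, 1e-2, 1e-3)}
```

Sampling finitely many directions is of course not a proof of uniformity; it only illustrates the kind of estimate the theorem asserts.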


Chapter 11

Ordinary Differential Equations

11.1 Existence and Uniqueness

11.1.1 Motivation

An ordinary differential equation is a relationship between a function and its derivatives, where the derivative is taken with respect to a single variable. Usually we think of this variable as time, although in some applications there may be other interpretations for it. We also want to consider systems of ordinary differential equations that involve more than one function. The term "ordinary" thus refers to the number of variables with respect to which we take derivatives, not to the number of variables in the problem. For example, the theory of celestial mechanics (describing the motions of any number of bodies through three-dimensional space under the influence of gravitational forces) is cast in the form of a system of ordinary differential equations since only time derivatives appear. On the other hand, Einstein's theory of General Relativity describing the same system involves partial derivatives.

We have already used ordinary differential equations in the study of transcendental functions. The exponential function satisfies $y' = y$, and up to a constant multiple it is the unique solution. Similarly $y(t) = \cos t$ and $z(t) = \sin t$ satisfy the system $y' = -z$, $z' = y$; and with the conditions $y(0) = 1$, $z(0) = 0$ they are again the unique solution to the system. These observations played a crucial role in our derivation of the


properties of these functions. We will see that these results are special cases of the general theory that we are going to develop; however, the special tricks that made the proof of existence and uniqueness for these special equations so simple are not available in general, so we will have to work harder.

We let $t$ denote the variable with respect to which we take derivatives, and we let $x(t) : [a, b] \to \mathbb{R}^n$ denote a function of $t$ taking values in $\mathbb{R}^n$. The coordinate components $x_j(t)$ of $x(t)$ can be thought of as $n$ separate real-valued functions that the system of equations describes. An ordinary differential equation, abbreviated o.d.e., is an equation of the form $F(t, x(t), x'(t), \ldots, x^{(m)}(t)) = 0$ where $F$ is a function defined on an open subset of $\mathbb{R}^{1 + n(m+1)}$ and taking values in $\mathbb{R}^k$. Each of the $k$ components of $F$ may be thought of as a separate equation, so we are compressing $k$ equations into one. (Usually $k = n$, but we will not insist on this here.) Choosing zero for the right side is merely a simplifying convention, for we could always absorb a constant into $F$ (in place of $y' + y = 27$ write $y' + y - 27 = 0$). The order of the o.d.e. is defined to be $m$, the highest derivative that appears in it. Notice that all the functions in the o.d.e. are evaluated at the same time $t$. Sometimes it is necessary to consider relationships such as $x'(t) + x(t - 1) = 1$, which involve simultaneously different time values; however, such equations are not included in the standard theory of o.d.e.'s; they are called functional-differential equations or retarded differential equations.

By a solution to the o.d.e. $F(t, x(t), x'(t), \ldots, x^{(m)}(t)) = 0$ on an interval $I$ we mean a function $x(t) : I \to \mathbb{R}^n$ of class $C^m$ such that the o.d.e. holds for every $t$ in $I$ (if the interval contains one or both endpoints we interpret the o.d.e. as referring to one-sided derivatives at those endpoints). This means in particular that for each $t$ in $I$, the value $(t, x(t), x'(t), \ldots, x^{(m)}(t))$ must lie in the domain of $F$. The solutions we are considering may be thought of as "local" solutions, in that we do not require that the interval $I$ be of maximal length in any sense. The function $F$ may be defined for $t$ outside the interval $I$, and it may even be possible to extend the solution beyond $I$. For example, consider the o.d.e. $x'(t) + x(t)^2 = 0$ for $x(t)$ taking values in $\mathbb{R}$. The function $F(t, x(t), x'(t))$ is defined on all of $\mathbb{R}^3$ (in terms of coordinates $(y_1, y_2, y_3)$ for $\mathbb{R}^3$ the function is $F(y) = y_2^2 + y_3$). A solution for this o.d.e. is the function $x(t) = 1/t$ on any interval $I$ not containing zero. Other solutions are $x(t) = 1/(t - t_0)$ where $t_0$


is any fixed constant and the interval $I$ does not contain $t_0$. It is easy to verify that these are solutions since $x'(t) = -1/(t - t_0)^2$ and $x'(t) + x(t)^2 = -1/(t - t_0)^2 + 1/(t - t_0)^2 = 0$. As a consequence of the uniqueness theorem we will prove that there are no other solutions to this o.d.e. apart from the trivial solution $x(t) \equiv 0$. In particular there are no non-zero solutions defined on the entire line, despite the fact that the function $F$ is defined everywhere and is $C^\infty$. This example underlies the motivation for looking at merely "local" solutions: if we demanded "global" solutions defined for all $t$ for which the o.d.e. makes sense, then we would put ourselves out of business for even this simple example. Later we will see that it is still possible to get global solutions for a certain important class of o.d.e.'s, the linear o.d.e.'s.

There is a fairly standard device for reducing an arbitrary o.d.e. to one of first order. In the process we will increase the number of variables (the dimension of the range of $x(t)$); in particular, we will always end up with a system, even if we started out with a single equation. The trick is to introduce new names for the derivatives $x'(t), \ldots, x^{(m-1)}(t)$, up to but not including the highest order. Say $m = 3$; then let $y(t) = x'(t)$ and $z(t) = x''(t)$. If the original o.d.e. involves $x(t)$ taking values in $\mathbb{R}^n$ ($n$ unknown functions) and $F$ taking values in $\mathbb{R}^k$ ($k$ equations), the new o.d.e. will involve $3n$ unknown functions, namely $x(t)$, $y(t)$, and $z(t)$, and $k + 2n$ equations, namely

$$F(t, x(t), y(t), z(t), z'(t)) = 0,$$
$$x'(t) - y(t) = 0,$$
$$y'(t) - z(t) = 0.$$

The second and third sets of equations simply say what $y(t)$ and $z(t)$ are in terms of $x(t)$, and the first set of equations restates the original o.d.e. The claim is that $x(t) : I \to \mathbb{R}^n$ is a solution of the original o.d.e. if and only if $(x(t), x'(t), x''(t)) : I \to \mathbb{R}^{3n}$ is a solution of the new o.d.e. and furthermore any solution $(x(t), y(t), z(t)) : I \to \mathbb{R}^{3n}$ of the new o.d.e. must be of the above form. We leave the simple verification of this claim as an exercise. The new o.d.e. may appear more complicated than the original o.d.e., but it is of first order.

It is customary in discussions of o.d.e.'s to insist on this reduction to the first-order case since then one can proceed with a minimum of notational baggage. However, this has the disadvantage that whenever


one wants to apply the theory to a higher order equation, it becomes necessary to go through a translation process. We will sometimes opt to deal directly with equations of higher order.
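The reduction to first order is also the form in which most numerical integrators expect an equation, so a small sketch may make the translation process concrete. Everything here is an assumption made for illustration: the equation is taken already solved for the highest derivative, $x''' = G(t, x, x', x'')$, the example $x''' = -x'$ with $x(0) = 0$, $x'(0) = 1$, $x''(0) = 0$ (solution $\sin t$) is made up, and the classical fourth-order Runge-Kutta scheme is simply one convenient integrator, not a method discussed in the text.

```python
import math

def first_order_system(G):
    """Turn x''' = G(t, x, x', x'') into the system (x, y, z)' = (y, z, G(t, x, y, z))."""
    def field(t, state):
        x, y, z = state
        return (y, z, G(t, x, y, z))
    return field

def rk4(field, t0, state, t1, steps):
    """Integrate state' = field(t, state) from t0 to t1 with classical Runge-Kutta."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = field(t, state)
        k2 = field(t + h / 2, tuple(s + h / 2 * k for s, k in zip(state, k1)))
        k3 = field(t + h / 2, tuple(s + h / 2 * k for s, k in zip(state, k2)))
        k4 = field(t + h, tuple(s + h * k for s, k in zip(state, k3)))
        state = tuple(
            s + h / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
        t += h
    return state

# Hypothetical third-order equation x''' = -x'; here y stands for x' and z for x''.
G = lambda t, x, y, z: -y

x1, y1, z1 = rk4(first_order_system(G), 0.0, (0.0, 1.0, 0.0), 1.0, 200)
```

The three components of the final state approximate $x(1)$, $x'(1)$ and $x''(1)$, that is, $\sin 1$, $\cos 1$ and $-\sin 1$ for this example.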

There is one other preliminary simplification that we will need to do, namely to solve the equations for the highest order derivatives. In other words, we want to rewrite the equation in the form $x^{(m)}(t) = G(t, x(t), \ldots, x^{(m-1)}(t))$. Usually this is easy to do by inspection. If the original o.d.e. is $x''(t)x'(t) + t^2 = 0$, then we can rewrite it as $x''(t) = -t^2/x'(t)$. However, if the original o.d.e. is $x'(t)^2 - 1 = 0$, then solving for $x'(t)$ yields two different o.d.e.'s, $x'(t) = 1$ and $x'(t) = -1$, which have different solutions, all of which are solutions of the original equation. Notice that by solving for the highest order derivative we may end up with a more singular equation. In going from $x'(t)^2 + 1 - x(t)^2 = 0$ to $x'(t) = \pm\sqrt{x(t)^2 - 1}$ we go from an everywhere defined $C^\infty$ function $F$ to a partially defined, multivalued $G$ that is not even differentiable when $x(t) = 1$. This is an important observation, since we will be making assumptions about the function $G$, and these are not consequences of the corresponding assumptions on $F$. The abstract question of when we can solve equations like $F(t, x(t), \ldots, x^{(m)}(t)) = 0$ for $x^{(m)}(t) = G(t, x(t), \ldots, x^{(m-1)}(t))$ is one we will take up in a later chapter with the aid of the Implicit Function Theorem. For now we will simply assume that this preliminary step has been accomplished, and all o.d.e.'s we consider will be in the normal form $x^{(m)}(t) = G(t, x(t), \ldots, x^{(m-1)}(t))$ where $G$ is a function defined on an open set of $\mathbb{R}^{mn+1}$ taking values in $\mathbb{R}^n$. We will always assume $G$ to be continuous, since we want $x(t)$ to be $C^m$ and so $x^{(m)}(t)$ to be continuous. Notice that in the normal form the number of equations and unknown functions is the same (namely $n$). From now on we will always assume this.

Perhaps the simplest o.d.e. in normal form we might consider is $x'(t) = g(t)$ for a continuous function $g : \mathbb{R} \to \mathbb{R}$ (here $n = k = 1$). We know from the fundamental theorem of the calculus that this o.d.e. has solutions $x(t) = \int_0^t g(s)\,ds + c$ where $c$ is an arbitrary constant. (The choice of 0 for the lower endpoint in the integral is also arbitrary; we could just as well have started the integration from another point.) We also know these are the only solutions. We can therefore expect in general that solutions to o.d.e.'s are not unique, but we can hope that they can all be described in terms of a finite number of constants.


Looked at another way, we can hope to adjoin a finite number of side conditions to the o.d.e. in order to make the solution unique. In the example $x'(t) = g(t)$, the condition $x(0) = a$ determines the unique solution $x(t) = \int_0^t g(s)\,ds + a$. Ideally we would like to adjoin just the correct number of conditions so that the solution is unique and so the solution always exists for all values of the parameters that are involved. Clearly, existence demands fewer conditions and uniqueness demands more conditions, so we hope to strike a happy balance in the middle. (In the example $x'(t) = g(t)$, the conditions $x(0) = a$ and $x(1) = b$ would also guarantee uniqueness, but there does not exist a solution for every choice of $a$ and $b$.)
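The example $x'(t) = g(t)$, $x(0) = a$ is simple enough to carry out numerically. In the sketch below the choice $g = \cos$, the value of `a`, the helper names, and the use of composite Simpson quadrature are all illustrative assumptions. It also shows why the pair of conditions $x(0) = a$, $x(1) = b$ is too much to ask in general: once $a$ is fixed, the value at $t = 1$ is forced.

```python
import math

def integrate(g, lower, upper, panels=1000):
    """Composite Simpson approximation of the integral of g over [lower, upper].

    The number of panels must be even.
    """
    h = (upper - lower) / panels
    total = g(lower) + g(upper)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * g(lower + i * h)
    return total * h / 3

def solution(g, a):
    """Return the function x(t) = integral from 0 to t of g(s) ds, plus a."""
    return lambda t: integrate(g, 0.0, t) + a

g = math.cos   # hypothetical right-hand side
a = 2.0        # hypothetical initial value x(0) = a
x = solution(g, a)

# The o.d.e. itself, checked by a central difference at one sample time.
t, h = 0.5, 1e-4
derivative_gap = abs((x(t + h) - x(t - h)) / (2 * h) - g(t))

# With x(0) = a fixed, x(1) is forced; a second condition x(1) = b
# is satisfiable only when b equals this value.
forced_b = x(1.0)
```

For this choice of `g` the exact solution is $\sin t + a$, so `forced_b` should be close to $\sin 1 + 2$.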

We would like to give an intuitive answer to the question: how much extra information do we need to determine a solution to a given o.d.e.? The first approach is to ask how many free constants (parameters) one can expect in the general solution. That is, we can expect all solutions to be given by a formula $x(t) = f(t, a, b, \ldots)$, each choice of the parameters $a, b, \ldots$ yielding a solution and all solutions corresponding to some choice of the parameters. Indeed, if you look at any elementary textbook on o.d.e.'s, you will find most of the book devoted to methods for explicitly obtaining such formulas in many special cases. Such an approach is often disparagingly described as "cookbook", but in fact many of the "recipes" turn out to be valuable in unexpected ways. Here we will use only the very imprecise observation that all the "recipes" for solving an o.d.e. of order $m$ involve performing $m$ integrations, and of course each integration picks up an arbitrary constant. Since there are $n$ functions $x_1(t), \ldots, x_n(t)$ being integrated $m$ times, we expect $mn$ constants to be generated and so there should be $mn$ parameters in the general solution. This also means that we will be looking for $mn$ side conditions in order to determine a solution uniquely. This does not mean that we can expect any $mn$ side conditions to work; we still have to examine the problem in detail for particular choices of side conditions. We are only claiming at this point to have a grasp of the number of conditions that need to be imposed.

We will now simplify the discussion to consider only conditions involving the unknown functions and their derivatives at a single value of $t$, say $t_0$. Such conditions are known as initial value conditions, with the interpretation that $t_0$ is some initial time at which we make some measurements on the system, and then we want to predict the behavior


of the system at future times $t > t_0$. Actually our methods will also allow us to "predict the past" as well, $t < t_0$, so we should take "initial" with a grain of salt. It is certainly not the case that all interesting or natural problems involving o.d.e.'s lead to initial value conditions; many involve boundary conditions in which the values at two points, thought of as the endpoints of the interval on which the solution is expected to exist, are specified. However, initial value conditions are the easiest with which to deal, and form the basis for discussing more general problems.

Let us look at the simplest case: one first-order equation in one unknown function, $x'(t) = G(t, x(t))$; and consider an initial value condition at $t = t_0$. Since we expect only one such condition ($m = n = 1$), the simplest choice would be to specify the value $x(t_0) = a$. Is it reasonable to expect that this should determine a unique solution for each choice of the parameter $a$? One way to think about this is graphically. Suppose we consider the graph of the solution in the $t$-$x$ plane. What does the differential equation say about this graph? It says the slope of the curve $x'(t)$ at a point $t$ is specified by $G(t, x(t))$, the value of $G$ at the point in the plane on the graph of the solution, as shown in Figure 11.1.1.

Figure 11.1.1: slope $= G(t, x(t))$

We can thus think of $G(t, x)$ as a function assigning a slope to every point in the $t$-$x$ plane. The differential equation requires that we draw the graph of a function subject to the restriction that at each point on the graph the slope is the one prescribed by $G$. We can further imagine $G$ as given by drawing a tiny (perhaps infinitesimal) line segment at


each point with the given slope so that the plane is covered by a pattern of porcupine quills. The graph of a solution is then obtained by piecing together a curve out of the porcupine quills. The initial condition $x(t_0) = a$ asserts that the graph of the solution must pass through the point $(t_0, a)$. Some people find that this graphical description makes the existence and uniqueness of a solution seem plausible.

Another way of looking at the question is in terms of the Taylor expansion of the solution at the point $t_0$. The initial condition $x(t_0) = a$ allows one to determine the value of the first derivative $x'(t)$ at the point $t_0$ via the differential equation $x'(t) = G(t, x(t))$, namely $x'(t_0) = G(t_0, x(t_0)) = G(t_0, a)$. (This computation reveals why it would be foolish to try to specify $x'(t_0)$ as well as $x(t_0)$.) Thus we have the first two terms of the Taylor expansion of $x$ about $t_0$, $x(t) = a + G(t_0, a)(t - t_0) + o(|t - t_0|)$. But we can in fact go further, simply by differentiating the differential equation. Since $x'(t) = G(t, x(t))$ holds for $t$ in an interval, we also have $(d/dt)x'(t) = (d/dt)G(t, x(t))$ or $x''(t) = \partial G/\partial t(t, x(t)) + \partial G/\partial x(t, x(t))\,x'(t)$ by the chain rule. This, of course, presupposes that $x(t)$ is $C^2$ and $G$ is $C^1$. Under these additional hypotheses we can set $t = t_0$ and obtain

$$x''(t_0) = \frac{\partial G}{\partial t}(t_0, x(t_0)) + \frac{\partial G}{\partial x}(t_0, x(t_0))\,x'(t_0) = \frac{\partial G}{\partial t}(t_0, a) + \frac{\partial G}{\partial x}(t_0, a)\,G(t_0, a)$$

where we have substituted the previously determined $x'(t_0) = G(t_0, a)$. By induction we can continue to determine all the derivatives $x^{(n)}(t_0)$ at $t_0$ by differentiating the o.d.e. $n - 1$ times, $(d/dt)^{n-1}x'(t) = (d/dt)^{n-1}G(t, x(t))$ and setting $t = t_0$, for the right side of this equation only involves derivatives of $x$ at $t_0$ of orders $< n$, which have previously been determined in the induction step. Again we need to assume that $G$ is $C^n$ and $x(t)$ is $C^{n+1}$. If the function $x(t)$ were actually analytic, then we would have a complete determination of the power series of $x(t)$ about $t_0$, which would uniquely specify $x(t)$. Thus we have another plausible reason why the initial condition $x(t_0) = a$ should specify a unique solution.

Despite these plausible reasons, the uniqueness of the solution is not assured without further assumptions. The way to obtain a counterexample is to work backward from the solution to the equation. We


want a function whose graph looks like that in Figure 11.1.2, so that it is identically zero for a while and then slowly lifts off, say

$$x(t) = \begin{cases} t^{3/2} & \text{if } t > 0, \\ 0 & \text{if } t \le 0. \end{cases}$$

Figure 11.1.2:

We choose $t^{3/2}$ so that the second derivative will fail to exist, but the function is still $C^1$ and

$$x'(t) = \begin{cases} (3/2)\,t^{1/2} & \text{if } t > 0, \\ 0 & \text{if } t \le 0 \end{cases}$$

so that $x'(0) = 0$. Now it is easy to concoct the o.d.e., namely $x'(t) = (3/2)x(t)^{1/3}$, which this function satisfies. Note that the function $G(t, x) = (3/2)x^{1/3}$ is continuous and everywhere defined, although it does fail to be differentiable. This o.d.e. also has the solution $x(t) \equiv 0$, and both these solutions satisfy the initial condition $x(t_0) = 0$ for any choice of $t_0 \le 0$. Thus uniqueness fails for this example. It will turn out that the failure of $G$ to be differentiable is the culprit here: we will be able to prove the uniqueness and existence under hypotheses on $G$ that are slightly weaker than differentiability.

Before discussing the positive results, let's consider the case of general systems ($n \ge 1$) and general orders ($m \ge 1$). For first-order systems, the same reasoning as above leads us to choose the initial conditions $x(t_0) = a$, where this is now a vector equation, $x_k(t_0) = a_k$


for $k = 1, \ldots, n$, involving $n$ parameters. (Incidentally, we can't reduce the general first-order system to $n$ separate first-order equations, because they may be "coupled", as in the example

$$x_1'(t) = -x_2(t), \quad x_2'(t) = x_1(t),$$

which is familiar from the discussion of the sine and cosine functions.) When we reduce the general o.d.e. system of order $m$ to one of first order, we introduce new variables for the derivatives of $x(t)$ up to order $m - 1$, so the initial conditions for the first-order system translate to the conditions

$$x(t_0) = a^{(0)}, \quad x'(t_0) = a^{(1)}, \quad \ldots, \quad x^{(m-1)}(t_0) = a^{(m-1)}$$

where each $a^{(j)}$ is a vector in $\mathbb{R}^n$. Notice that there are exactly $mn$ conditions. We will call these conditions the Cauchy initial value conditions, and we will refer to the specified values $a^{(j)}$ of the parameters as the Cauchy data. Then the Cauchy problem for the o.d.e. $x^{(m)}(t) = G(t, x(t), \ldots, x^{(m-1)}(t))$ consists of solving the o.d.e. with given Cauchy data. It is a simple matter to verify that the Cauchy data determines uniquely the values of all derivatives of all the functions $x_k(t)$ at the point $t = t_0$, under the assumption that the $x_k$ and $G$ are $C^\infty$ functions.

11.1.2 Picard Iteration

We will now consider how to obtain a proof of the existence and uniqueness of solutions of an o.d.e. with Cauchy initial conditions. We will begin with first-order systems and then extend the results to higher order systems. The key idea of the proof is to integrate the o.d.e. to obtain an equivalent integral equation. That is, we take the equation $x'(t) = G(t, x(t))$ and integrate both sides from $t_0$ to $t$:

$$x(t) - x(t_0) = \int_{t_0}^t G(s, x(s))\,ds$$

or

$$x(t) = a + \int_{t_0}^t G(s, x(s))\,ds$$

using the initial condition $x(t_0) = a$. This is exactly the way we would obtain the solution of the simple o.d.e. $x'(t) = g(t)$. In the general case we do not obtain the solution in this manner, because after integration the unknown function $x(t)$ appears on both sides of the equation. But


it is easy to show that the new equation is equivalent to the o.d.e. with the Cauchy initial condition.

Lemma 11.1.1 A $C^1$ function $x(t)$ defined on an interval $I$ containing $t_0$ satisfying the Cauchy problem $x'(t) = G(t, x(t))$ and $x(t_0) = a$ also satisfies the integral equation

$$x(t) = a + \int_{t_0}^t G(s, x(s))\,ds.$$

Conversely, any continuous function $x(t)$ satisfying the integral equation is $C^1$ and satisfies the Cauchy problem.

Proof: This is an immediate consequence of the two fundamental theorems of the calculus. We use integration of the derivative as above to go from the Cauchy problem to the integral equation. Conversely, given the integral equation, the differentiation of the integral theorem tells us that $f(t) = \int_{t_0}^t G(s, x(s))\,ds$ is a $C^1$ function with derivative $f'(t) = G(t, x(t))$; this uses the hypotheses that $G$ and $x$ are continuous functions, hence so is $G(t, x(t))$. But the integral equation says $x(t) = a + f(t)$, so we have that $x$ is $C^1$ and $x'(t) = f'(t) = G(t, x(t))$. Thus the o.d.e. is satisfied. Finally we obtain the initial condition $x(t_0) = a$ by substituting $t = t_0$ in the integral equation. QED

Thus we have reduced the Cauchy problem to the solution of an integral equation. Have we gained anything by doing this? On a very superficial level we have managed to combine both the o.d.e. and the initial value condition into a single equation. But the real accomplishment is considerably deeper. We can see a hint of it in the asymmetric statement of the lemma. For the o.d.e. we had to assume $x(t)$ was $C^1$, but in the integral equation it was enough to assume $x(t)$ was continuous; it then followed that $x(t)$ was $C^1$. This gratuitous gain of a derivative is highly non-trivial: normally one doesn't expect a continuous function to be automatically differentiable. There is a reason for it in this case: the integral equation. Integration "smooths things out", while differentiation "roughens things up". Therefore it is easier to deal with an equation in which integrals appear than with an equation involving derivatives. There is also one technical point that we will exploit in the proof: aside from the constant $a$, the right side of the integral equation $\int_{t_0}^t G(s, x(s))\,ds$ is small if $t$ is close to $t_0$.
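As a rough numerical illustration of Lemma 11.1.1 (not part of the original text), the sketch below takes the example $G(t, x) = x$ with $t_0 = 0$ and $a = 1$, whose Cauchy problem is solved by $x(t) = e^t$, and checks that this function satisfies the integral equation up to quadrature error. The helper names `cumulative_trapezoid` and `integral_equation_residual`, the grid size, and the use of the trapezoid rule are all assumptions made for this example.

```python
import math

def cumulative_trapezoid(values, h):
    """Running trapezoid-rule integral of equally spaced samples (spacing h)."""
    out = [0.0]
    for k in range(1, len(values)):
        out.append(out[-1] + 0.5 * h * (values[k - 1] + values[k]))
    return out

def integral_equation_residual(G, x, t0, t1, a, n=2000):
    """Max over the grid of |x(t) - (a + integral_{t0}^{t} G(s, x(s)) ds)|."""
    h = (t1 - t0) / n
    ts = [t0 + k * h for k in range(n + 1)]
    integrand = [G(s, x(s)) for s in ts]
    integral = cumulative_trapezoid(integrand, h)
    return max(abs(x(t) - (a + part)) for t, part in zip(ts, integral))

# Assumed example: x'(t) = x(t), x(0) = 1, solved by x(t) = exp(t).
residual = integral_equation_residual(lambda t, x: x, math.exp, 0.0, 1.0, 1.0)
print(residual)  # only quadrature error remains

# A function that does not solve the Cauchy problem leaves a large residual.
bad_residual = integral_equation_residual(lambda t, x: x, lambda t: 1.0 + t, 0.0, 1.0, 1.0)
print(bad_residual)
```

The residual measures how far a candidate function is from being a fixed point of the map $x \mapsto a + \int_{t_0}^t G(s, x(s))\,ds$, which is the point of view taken in the rest of the section.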


It is also significant to consider that the process whereby we passed from the o.d.e. to the integral equation, integration, gave the exact solution to the simpler o.d.e. $x'(t) = g(t)$. In fact, the gist of our argument will be that the general o.d.e. is some kind of a "perturbation" of the simpler one; this will be especially clear in the case of linear equations where we can actually write down a perturbation series expansion of the solution. The idea of exploiting an explicit solution to a simpler problem to prove the existence of a less explicit solution to a more difficult problem is an important theme in the theory of differential equations and perhaps in all of mathematics.

The form of the integral equation suggests that we look for the solution as a fixed point of the mapping $x(t) \mapsto a + \int_{t_0}^t G(s, x(s))\,ds$. That is, we think of $a + \int_{t_0}^t G(s, x(s))\,ds$ as a mapping from the function $x(t)$ to a new function of $t$; the integral equation then says that the solution we seek is a fixed point of this mapping. It is natural, then, to try to apply the Contractive Mapping Principle, since this will give us a unique fixed point. We consider the metric space $C(I)$ of continuous functions $x : I \to \mathbb{R}^n$, where $I$ is some interval containing $t_0$, with the sup-norm metric

$$d(x, y) = \sup_I |x(t) - y(t)|$$

where $|\ |$ denotes the Euclidean norm in $\mathbb{R}^n$. We have seen that this is a complete metric space. We define a mapping $T : C(I) \to C(I)$ by $Tx(t) = a + \int_{t_0}^t G(s, x(s))\,ds$. For this to be defined we need only assume $G$ to be continuous, but for this to be a contraction ($d(Tx, Ty) \le \rho\, d(x, y)$ for some $\rho < 1$) we need to assume more. We have already seen that uniqueness can fail, and even if the solution exists it may not exist over the whole interval. Thus we must both put additional hypotheses on $G$ and shrink the interval $I$. Since the contraction property is a kind of Lipschitz condition, it is not surprising that we will need to have $G$ satisfy a Lipschitz condition. A quick glance at the form of

$$d(Tx, Ty) = \sup_I \left| a + \int_{t_0}^t G(s, x(s))\,ds - \left( a + \int_{t_0}^t G(s, y(s))\,ds \right) \right| = \sup_I \left| \int_{t_0}^t \bigl( G(s, x(s)) - G(s, y(s)) \bigr)\,ds \right|$$

indicates that we will have to estimate differences $G(s, x) - G(s, y)$ where the value of $s$ is the same for both terms. Thus the kind of


Lipschitz condition we want is $|G(s, x) - G(s, y)| \le M|x - y|$ for all $x$ and $y$ in $\mathbb{R}^n$ and all $s$ in $I$ (here we are using the letters $x$ and $y$ to denote points in $\mathbb{R}^n$ rather than functions taking values in $\mathbb{R}^n$, although there is no harm in substituting these functions into the Lipschitz condition). One might think that we have to put some restriction on the size of the Lipschitz constant $M$ because for the contraction property we need $\rho < 1$, but it turns out that this is unnecessary because we can obtain $\rho < 1$ by shrinking the interval. We will refer to this kind of Lipschitz condition as a global Lipschitz condition, because it must hold for all $x$ and $y$ in $\mathbb{R}^n$. This is usually too strong a condition (it rules out too many important examples), so we will later have to consider also local Lipschitz conditions.

Lemma 11.1.2 Suppose $G(t, x)$ is a continuous function defined for $t$ in $I$ and $x$ in $\mathbb{R}^n$ taking values in $\mathbb{R}^n$, which satisfies the global Lipschitz condition $|G(t, x) - G(t, y)| \le M|x - y|$ for all $x$ and $y$ in $\mathbb{R}^n$ and $t$ in $I$, and some constant $M$. Let $J$ be any subinterval of $I$ contained in $|t - t_0| \le \rho/M$ for some $\rho < 1$. Then the mapping

$$Tx(t) = a + \int_{t_0}^t G(s, x(s))\,ds$$

on $C(J)$ is contractive, $d(Tx, Ty) \le \rho\, d(x, y)$ for any $x(t)$ and $y(t)$ in $C(J)$.

Proof: As before, we have

$$d(Tx, Ty) = \sup_J \left| \int_{t_0}^t \bigl( G(s, x(s)) - G(s, y(s)) \bigr)\,ds \right|,$$

except now we are restricting attention to the interval $J$. Here we need to use a basic inequality known as Minkowski's inequality, which says $\left| \int_a^b f(s)\,ds \right| \le \int_a^b |f(s)|\,ds$ for any continuous function $f(s)$ taking values in $\mathbb{R}^n$, where $|\ |$ denotes the Euclidean norm. We have already discussed this inequality in the case $n = 1$. For the general case it is really a generalization of the triangle inequality. The integral $\int_a^b f(s)\,ds$ is a limit of Cauchy sums $\sum f(t_j)(s_j - s_{j-1})$, for which we have $\left| \sum f(t_j)(s_j - s_{j-1}) \right| \le \sum |f(t_j)|(s_j - s_{j-1})$ by the triangle inequality, and $\sum |f(t_j)|(s_j - s_{j-1})$ is a Cauchy sum for $\int_a^b |f(s)|\,ds$. Passing to the limit we obtain Minkowski's inequality.
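The Cauchy-sum argument for Minkowski's inequality can be watched numerically. The sketch below is an illustration with an assumed integrand, not something taken from the text: it forms left-endpoint Cauchy sums for the vector-valued function $f(s) = (\cos s, \sin s)$ on $[0, \pi]$ and compares the norm of the vector sum with the sum of the norms. The function name `cauchy_sums` and the number of subintervals are hypothetical choices.

```python
import math

def cauchy_sums(f, a, b, n):
    """Left-endpoint Cauchy sums on n equal subintervals of [a, b].
    Returns (vector sum of f(t_j) * h, sum of |f(t_j)| * h)."""
    h = (b - a) / n
    dim = len(f(a))
    vec_sum = [0.0] * dim
    norm_sum = 0.0
    for j in range(n):
        value = f(a + j * h)
        for i in range(dim):
            vec_sum[i] += value[i] * h
        norm_sum += math.sqrt(sum(c * c for c in value)) * h
    return vec_sum, norm_sum

# Assumed example: f(s) = (cos s, sin s), so |f(s)| = 1 for every s.
vec_sum, norm_sum = cauchy_sums(lambda s: (math.cos(s), math.sin(s)), 0.0, math.pi, 1000)
lhs = math.sqrt(sum(c * c for c in vec_sum))  # approximates |integral of f| = 2
rhs = norm_sum                                # approximates integral of |f| = pi
print(lhs, rhs)
```

In this example the inequality is far from an equality because the direction of $f(s)$ keeps turning; equality would require $f(s)$ to point in a fixed direction.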


We apply Minkowski's inequality to estimate

$$\left| \int_{t_0}^t \bigl( G(s, x(s)) - G(s, y(s)) \bigr)\,ds \right| \le \int_{t_0}^t |G(s, x(s)) - G(s, y(s))|\,ds.$$

We then substitute the global Lipschitz condition to obtain

$$\int_{t_0}^t |G(s, x(s)) - G(s, y(s))|\,ds \le \int_{t_0}^t M|x(s) - y(s)|\,ds.$$

Thus we have altogether

$$d(Tx, Ty) \le \sup_J M \int_{t_0}^t |x(s) - y(s)|\,ds.$$

Finally we make the crude estimate $|x(s) - y(s)| \le d(x, y)$ for $s$ in $J$, so $\int_{t_0}^t |x(s) - y(s)|\,ds \le |t - t_0|\,d(x, y)$ for $t$ in $J$. Since we chose $J$ so that $|t - t_0| \le \rho/M$ for $t$ in $J$, we have

$$d(Tx, Ty) \le \sup_J M \int_{t_0}^t |x(s) - y(s)|\,ds \le \sup_J M|t - t_0|\,d(x, y) \le M \cdot (\rho/M)\,d(x, y) = \rho\,d(x, y),$$

which is the desired contraction property. QED

Notice the crucial way we have used the observation that $\int_{t_0}^t G(s, x(s))\,ds$ is small for $t$ near $t_0$ to overcome the lack of control over $M$.

We are now in a position to apply the contractive mapping principle to obtain the existence and uniqueness of the solution of the integral equation, hence the Cauchy problem, on the subinterval $J$. Notice that the contractive mapping principle gives us a constructive method for finding the solution by iteration: choose an arbitrary first guess $x_1(t)$ (it is simplest to take $x_1(t) \equiv a$) and then define inductively $x_{k+1}(t) = a + \int_{t_0}^t G(s, x_k(s))\,ds$. The solution $x(t)$ is just the limit of $x_k(t)$ as $k \to \infty$ in the metric, which means $x_k(t)$ converges to $x(t)$ uniformly on $J$. This construction of the solution is known as the Picard iteration method. It can be shown by using more careful estimates that the Picard iteration method actually converges to the solution on the whole interval $I$ not just on $J$. However, the Picard iteration method is


not really very practical for obtaining approximations to the solution, since doing successive integrations is very time consuming. (It is never used in numerical solutions.) Therefore we will not devote our energy here to obtain more information about the scope of this method, but we will simply take it as a method for proving existence and uniqueness over a small interval. We can then get existence and uniqueness over a larger interval by piecing together solutions over smaller intervals.

Theorem 11.1.1 (Global Existence and Uniqueness) If $G(s, x)$ satisfies a global Lipschitz condition for $s$ in $I$, then the Cauchy problem

$$x'(t) = G(t, x(t)), \quad x(t_0) = a,$$

has a unique solution on $I$ (the solution is even unique on any subinterval containing $t_0$).

Figure 11.1.3:

Proof: By the lemmas and the Contractive Mapping Principle, there exists a unique solution on $J$. If $J$ is not all of $I$ we take a point $t_1$ of $J$ and the value $x(t_1) = b$ and give the Cauchy initial conditions $x(t_1) = b$. Applying the argument to this Cauchy problem we obtain a solution to the o.d.e. on an interval $J_1$ containing $t_1$. Since the two solutions have the same Cauchy data at $t_1$, they must be equal on the overlap of the domains $J \cap J_1$ because of the uniqueness of the Cauchy problem on $J_1$. Thus we may combine the two solutions to obtain a single solution of the o.d.e. on the union $J \cup J_1$ of the two intervals, as shown in Figure 11.1.3. Repeating this process a finite number of times, we will eventually extend the solution to every point of $I$ since the size of the interval of solution at each stage is


restricted only by the condition $|t - t_k| \le \rho/M$ for the same fixed $\rho$ and $M$ ($t_k$ here denotes the point at which the Cauchy data is given). If the interval $I$ is unbounded it will take an infinite number of steps to extend the solution to all of $I$ but only a finite number of steps to get to any given point in $I$.

In this way we pass from local existence and uniqueness to global existence by piecing together solutions. To complete the proof it remains to show that the solution is unique. This is done by looking at the first point where two proposed solutions begin to differ and applying the local uniqueness result there. Suppose then that $x(t)$ and $y(t)$ are two solutions of the o.d.e. on a subinterval $I_1$, containing $t_0$, and suppose $x(t_0) = y(t_0)$. We want to show $x(t) = y(t)$ on $I_1$. Let $t_1 = \sup\{t \text{ in } I_1 : t \ge t_0 \text{ and } x(t) = y(t)\}$. Then, by the continuity of $x$ and $y$, we have $x(t_1) = y(t_1)$ and so, by the local uniqueness of the Cauchy problem at $t_1$, we have $x(t) = y(t)$ in a neighborhood of $t_1$. This contradicts the definition of $t_1$, unless $t_1$ is the upper endpoint of $I_1$. In a similar way we show that $x(t) = y(t)$ down to the lower endpoint of $I_1$. QED

11.1.3 Linear Equations

The global existence and uniqueness theorem allows an immediate extension: if $I = \bigcup_{j=1}^\infty I_j$ and $G$ satisfies a global Lipschitz condition $|G(s, x) - G(s, y)| \le M_j|x - y|$ for $s$ in $I_j$ and all $x, y$ in $\mathbb{R}^n$, where the Lipschitz constant $M_j$ depends on the subinterval $I_j$, then existence and uniqueness holds on $I$. The reason for this is simply that existence and uniqueness on $I$ is equivalent to existence and uniqueness on each $I_j$, which is a consequence of the theorem. This extension is valuable because such a function $G$ may fail to satisfy a global Lipschitz condition on all of $I$.

An important example where this remark applies is the class of linear o.d.e.'s, $x'(t) = A(t)x(t) + b(t)$ where $A : I \to \mathbb{R}^{n \times n}$ is a continuous $(n \times n)$-matrix-valued function and $b : I \to \mathbb{R}^n$ is continuous. Strictly speaking we should call these "affine" equations, limiting the term "linear" to the case $b = 0$, but instead it is traditional to call $x'(t) = A(t)x(t)$ the homogeneous linear o.d.e. and $x'(t) = A(t)x(t) + b(t)$ the inhomogeneous linear o.d.e. Of course the homogeneous linear o.d.e. is "linear" in the sense that linear combina-


tions of solutions are solutions, and the theory of linear algebra may be applied.

For a linear first-order o.d.e., the function $G(t, x)$ is $A(t)x + b(t)$. The assumptions that $A$ and $b$ are continuous imply that $G$ is continuous. The global Lipschitz condition that needs to be verified is

$$|G(t, x) - G(t, y)| = |A(t)x - A(t)y| = |A(t)(x - y)| \le M|x - y|.$$

Now for any fixed matrix $A$ we have $|A(x - y)| \le M|x - y|$ where $M$ depends on the entries of the matrix. The question of finding the smallest $M$ is very delicate, but we can easily get a crude estimate with $M^2 = \sum_{j,k} |A_{jk}|^2$. Thus if all the entries $A_{jk}(t)$ of $A(t)$ are bounded on $I$, we will have a global Lipschitz condition for $G$ holding on $I$. Since we are assuming that $A(t)$ (hence the entries) are continuous, this is immediately the case if $I$ is compact. But even if $I$ is not compact, we can always write $I = \bigcup_{j=1}^\infty I_j$ with $I_j$ compact, and $G$ satisfies a global Lipschitz condition on each $I_j$. Thus we have

Corollary 11.1.1 The Cauchy problem for the linear o.d.e. $x'(t) = A(t)x(t) + b(t)$ with $A : I \to \mathbb{R}^{n \times n}$ and $b : I \to \mathbb{R}^n$ continuous has a unique solution on $I$.

It is instructive to examine the Picard iteration method in the special case of linear equations, for then we can represent it in terms of a perturbation series. We consider the equation $x'(t) = A(t)x(t) + b(t)$ as a perturbation of the simpler equation $x'(t) = b(t)$, which we know how to solve exactly. We can represent this in concise notation by writing $Dx$ for $x'(t)$ and $Ax$ for $A(t)x(t)$. Then $Dx = b$ is the simple equation and $Dx = Ax + b$ or $(D - A)x = b$ the perturbed equation. Under the initial condition $x(t_0) = a$, the simple equation $Dx = b$ has the explicit solution $x(t) = a + \int_{t_0}^t b(s)\,ds$. For simplicity we will set $a = 0$. We define $D^{-1}f(t) = \int_{t_0}^t f(s)\,ds$ so that $D^{-1}b$ is the solution of $Dx = b$ with initial condition $x(t_0) = 0$.

In the perturbed equation $(D - A)x = b$ with initial condition $x(t_0) = 0$, we will think of $A$ as being small relative to $D$. This suggests writing the equation as $(D - A)x = D(I - D^{-1}A)x = b$, where $I$ denotes the identity operator, $Ix = x$. Then, at least formally, we can solve $x = (I - D^{-1}A)^{-1}D^{-1}b$. We already know what $D^{-1}$ means, but what does $(I - D^{-1}A)^{-1}$ mean? We are thinking of $A$ as small compared to


$D$, so $D^{-1}A$ should be small compared to $I$. Now if $r$ is a real number that is small compared to 1 (if $|r| < 1$), then $(1 - r)^{-1} = 1 + r + r^2 + \cdots = \sum_{k=0}^\infty r^k$. We could thus hope that

$$(I - D^{-1}A)^{-1} = I + D^{-1}A + (D^{-1}A)^2 + \cdots = \sum_{k=0}^\infty (D^{-1}A)^k.$$

This leads to the "solution"

$$x = \sum_{k=0}^\infty (D^{-1}A)^k D^{-1}b = D^{-1}b + D^{-1}AD^{-1}b + D^{-1}AD^{-1}AD^{-1}b + \cdots.$$

Of course we have only derived this perturbation series solution in a formal manner.

Now, what does this have to do with the Picard iteration method? Since we are free to choose any initial approximation $x_0(t)$ to the solution, let's take $x_0(t) = D^{-1}b(t)$, the solution to the simpler equation. Then the Picard iteration method defines

$$x_1(t) = \int_{t_0}^t \bigl( A(s)x_0(s) + b(s) \bigr)\,ds,$$

and inductively

$$x_k(t) = \int_{t_0}^t \bigl( A(s)x_{k-1}(s) + b(s) \bigr)\,ds.$$

Since this is just $x_k = D^{-1}Ax_{k-1} + D^{-1}b$, we find

$$x_1 = D^{-1}AD^{-1}b + D^{-1}b, \qquad x_2 = D^{-1}AD^{-1}AD^{-1}b + D^{-1}AD^{-1}b + D^{-1}b,$$

and in general

$$x_k = \sum_{j=0}^k (D^{-1}A)^j D^{-1}b.$$

Thus the Picard iteration method produces the partial sums of the perturbation series. If we take the general case $a \ne 0$ and choose $x_0(t) = a + D^{-1}b(t)$, then we find $x_k = \sum_{j=0}^k (D^{-1}A)^j(a + D^{-1}b)$. We


leave the details as an exercise. In particular, since we know the Picard iterations approximate the solution on some interval about $t_0$, we can conclude that the perturbation series converges on that interval (in fact, that the convergence is uniform, since we are using the sup-norm metric in the Contractive Mapping Principle). There are many other contexts in which the method of perturbation series can be applied.

11.1.4 Local Existence and Uniqueness*

Aside from the linear equations, there are very few o.d.e.'s for which the function $G$ satisfies a global Lipschitz condition. We have already observed that the o.d.e. $x'(t) = -x(t)^2$ has solutions $x(t) = 1/(t - t_1)$ that fail to exist for all $t$, so the global existence and uniqueness theorem can't apply. In this case it is easy to see that $G(t, x) = -x^2$ fails to satisfy a global Lipschitz condition. Nevertheless, it satisfies a local Lipschitz condition, $|G(t, x) - G(t, y)| \le M|x - y|$, if we restrict $x$ and $y$ suitably, say $|x| \le N$ and $|y| \le N$ (then the Lipschitz constant $M$ will depend on $N$). Such a local Lipschitz condition is rather easy to obtain; in fact it is true in general if we merely assume that $G$ is $C^1$. We could hope, then, that the local Lipschitz condition would suffice to prove a local existence and uniqueness theorem. This is indeed the case and provides us with a very valuable theorem.

The idea of the proof is again to use the contractive mapping principle, not on the whole space $C(J)$, but rather on a part of it (a part on which $x(t)$ is bounded) so the local Lipschitz condition applies. The difficult part of the proof turns out not to be the contractive estimate (that argument is the same as before) but the verification that the part of $C(J)$ is mapped into itself. As before we will take a subinterval $J$ of $I$, but now we will also impose the condition $|x(t) - a| \le N$ for all $t$ in $J$. We let $C_0(J)$ denote the metric subspace of $C(J)$ of all continuous functions $x : J \to \mathbb{R}^n$ satisfying $|x(t) - a| \le N$ for all $t$ in $J$, with the sup-norm metric $d(x, y) = \sup_J |x(t) - y(t)|$. Note that the condition $|x(t) - a| \le N$ for all $t$ in $J$ is the same as $d(x, a) \le N$, where $a$ denotes the function that is identically equal to $a$. Thus the subspace $C_0(J)$ is actually the closed ball of radius $N$ and center $a$ in $C(J)$. Being a closed subspace of a complete space it is complete, so we can apply the contractive mapping principle to mappings of $C_0(J)$ to itself. We want to apply it in particular to the mapping $Tx(t) = a + \int_{t_0}^t G(s, x(s))\,ds$.


It will not be hard to show that $T$ is a contractive mapping from $C_0(J)$ to $C(J)$, but first we show that the image lies in $C_0(J)$.

Lemma 11.1.3 Let $G(t, x)$ be defined and continuous for $t$ in $I$ and $x$ in $|x - a| \le N$, and let $M_0$ be the sup of $|G(t, x)|$ for $t$ in $I$ and $|x - a| \le N$. Then $Tx(t) = a + \int_{t_0}^t G(s, x(s))\,ds$ maps $C_0(J)$ into $C_0(J)$ provided $J$ is contained in $|t - t_0| \le N/M_0$.

Proof: Note that $Tx(t)$ is a continuous function (even $C^1$) because it is a constant plus an integral of the continuous function $G(s, x(s))$. Thus the image of $T$ lies in $C(J)$. We need to show $|Tx(t) - a| \le N$ for every $t$ in $J$. But

$$|Tx(t) - a| = \left| \int_{t_0}^t G(s, x(s))\,ds \right| \le \int_{t_0}^t |G(s, x(s))|\,ds$$

by Minkowski's inequality, and then we can substitute $|G(s, x(s))| \le M_0$ to obtain $|Tx(t) - a| \le M_0|t - t_0| \le N$ if $|t - t_0| \le N/M_0$. QED

Notice that the local Lipschitz condition did not play a role in the above argument; it will only be used to get the contractive estimate.

Theorem 11.1.2 (Local Existence and Uniqueness) Let $G(t, x)$ be defined and continuous for $t$ in $I$ and $|x - a| \le N$, and let it satisfy the local Lipschitz condition $|G(t, x) - G(t, y)| \le M|x - y|$ for $t$ in $I$ and $|x - a| \le N$, $|y - a| \le N$. Then for $t_0$ in $I$ there exists a subinterval $J$ containing $t_0$ on which the Cauchy problem $x'(t) = G(t, x(t))$, $x(t_0) = a$, has a unique solution.

Proof: By the lemma, if we take $J$ small enough, $T$ will map $C_0(J)$ into $C_0(J)$. But by the same argument as in the proof of the global theorem, we will have the contractive estimate $d(Tx, Ty) \le \rho\,d(x, y)$ for some $\rho < 1$ and for $x$ and $y$ in $C_0(J)$, if $J$ is small enough. This is because that argument only involved using the Lipschitz condition for $G(s, x(s)) - G(s, y(s))$, and the condition that $x$ and $y$ be in $C_0(J)$ says exactly that the Lipschitz condition applies. Thus once again the Contractive Mapping Principle yields the existence and uniqueness of solutions to the integral equation and, hence, to the Cauchy problem. QED
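The existence proof is constructive: the fixed point is the limit of the Picard iterates $x_{k+1}(t) = a + \int_{t_0}^t G(s, x_k(s))\,ds$. The following sketch (an illustration under stated assumptions, not the author's procedure) runs the iteration on a grid for the example $x'(t) = -x(t)^2$, $x(0) = 1$, whose exact solution is $x(t) = 1/(1 + t)$. Taking $N = 1$ gives $M_0 = 4$ and Lipschitz constant $M = 4$ on $|x - 1| \le 1$, so the interval $|t| \le 0.2$ satisfies both $|t - t_0| \le N/M_0$ and $|t - t_0| \le \rho/M$ with $\rho = 0.8$. The function name `picard`, the grid size, and the trapezoid rule are choices made for this example.

```python
def picard(G, a, t0, radius, iterations=30, half=1000):
    """Picard iteration on a uniform grid for |t - t0| <= radius.
    Starts from the constant function a.
    Returns (ts, x_vals, gaps), where gaps[k] is the sup distance
    between successive iterates."""
    ts = [t0 + radius * (k - half) / half for k in range(2 * half + 1)]
    x = [a] * len(ts)                      # first guess: x(t) identically a
    gaps = []
    for _ in range(iterations):
        g = [G(s, xv) for s, xv in zip(ts, x)]
        new = [a] * len(ts)
        for k in range(half + 1, len(ts)):         # integrate forward from t0
            new[k] = new[k - 1] + 0.5 * (ts[k] - ts[k - 1]) * (g[k - 1] + g[k])
        for k in range(half - 1, -1, -1):          # integrate backward from t0
            new[k] = new[k + 1] - 0.5 * (ts[k + 1] - ts[k]) * (g[k] + g[k + 1])
        gaps.append(max(abs(p - q) for p, q in zip(new, x)))
        x = new
    return ts, x, gaps

rho = 0.8
ts, x, gaps = picard(lambda t, x: -x * x, a=1.0, t0=0.0, radius=0.2)
error = max(abs(xv - 1.0 / (1.0 + t)) for t, xv in zip(ts, x))
print(error)      # iteration error plus quadrature error
print(gaps[:4])   # successive distances shrink at least by the factor rho
```

The list `gaps` makes the contractive estimate visible: each successive distance is at most $\rho$ times the previous one, and in practice it shrinks much faster. The code is only meant to show the contraction at work, not to serve as a practical solver.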

Corollary 11.1.2 Let $G(t, x)$ be a $C^1$ function defined for $t$ in $I$ and for $x$ in some open set $D$ in $\mathbb{R}^n$, taking values in $\mathbb{R}^n$. Then the Cauchy problem $x'(t) = G(t, x(t))$, $x(t_0) = a$ for $t_0$ in $I$ and $a$ in $D$ has a unique solution on some subinterval $J$ containing $t_0$.

Proof: By taking $N$ small enough we can make the ball $B = \{|x - a| \le N\}$ lie in $D$ since $D$ is open. We then can apply the theorem to this ball and a subinterval $J$, provided we verify

$$|G(t, x) - G(t, y)| \le M|x - y|$$

for $x$ and $y$ in $B$. But this follows from the hypothesis that $G$ is $C^1$ by considering the restriction of $G$ to the line segment joining $x$ and $y$. We can parameterize this segment by $x + s(y - x)$ for $0 \le s \le 1$. Then for fixed $t$, $x$, and $y$, $g(s) = G(t, x + s(y - x))$ is a $C^1$ function (for $x$ and $y$ in the ball $B$ the line segment stays inside $B$ because the ball is convex, as shown in Figure 11.1.4).

Figure 11.1.4:

Also $g'(s) = \sum_{k=1}^n (y_k - x_k)\,\partial G/\partial x_k(t, x + s(y - x))$ by the chain rule. We have

$$G(t, y) - G(t, x) = g(1) - g(0) = \int_0^1 g'(s)\,ds$$

by the fundamental theorem of the calculus, so

$$|G(t, y) - G(t, x)| \le \int_0^1 |g'(s)|\,ds$$

by Minkowski's inequality. But

$$|g'(s)| = \left| \sum_{k=1}^n (y_k - x_k)\,\frac{\partial G}{\partial x_k}(t, x + s(y - x)) \right|,$$


and this can be dominated by $M|x - y|$ where $M$ depends only on the maximum of $|\partial G/\partial x_k|$ on $J \times B$. (This is finite because $B$ and $J$ are compact and $G$ is $C^1$.) The most efficient way to do this is to apply the Cauchy-Schwarz inequality in $\mathbb{R}^n$ to obtain

$$\left| \sum_{k=1}^n (y_k - x_k)\,\frac{\partial G}{\partial x_k}(t, x + s(y - x)) \right| \le \left( \sum_{k=1}^n (y_k - x_k)^2 \right)^{1/2} \left( \sum_{k=1}^n \left| \frac{\partial G}{\partial x_k}(t, x + s(y - x)) \right|^2 \right)^{1/2} = |x - y| \left( \sum_{k=1}^n \left| \frac{\partial G}{\partial x_k}(t, x + s(y - x)) \right|^2 \right)^{1/2}.$$

One could also use the triangle inequality and $|x_k - y_k| \le |x - y|$ to obtain

$$\left| \sum_{k=1}^n (y_k - x_k)\,\frac{\partial G}{\partial x_k}(t, x + s(y - x)) \right| \le \sum_{k=1}^n |y_k - x_k| \left| \frac{\partial G}{\partial x_k}(t, x + s(y - x)) \right| \le |x - y| \sum_{k=1}^n \left| \frac{\partial G}{\partial x_k}(t, x + s(y - x)) \right|.$$

Either approach yields the desired Lipschitz condition with slightly different $M$. (One could also use the mean value theorem in place of the fundamental theorem of the calculus.) Thus the theorem applies. QED

It is natural to ask how big the interval $J$, on which we have existence and uniqueness, can be taken. In particular, why can't we use the idea of piecing together local solutions as we did in the proof of the global existence and uniqueness theorem? The answer is that of course we can use the idea, although it will not take us quite as far. The uniqueness of the solution (if it exists) is a global fact. If two solutions $x(t)$ and $y(t)$ exist on any subinterval $I_0$ of $I$ and $x(t_0) = y(t_0)$ for $t_0$ in $I_0$ then they must be equal on $I_0$. The same argument as in the global theorem works: just apply the local uniqueness at the points where the two solutions stop being equal.
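To see in a concrete case why piecing together local solutions need not reach every $t$, consider again $x'(t) = -x(t)^2$, now with $x(0) = -1$, so that $x(t) = 1/(t - 1)$ blows up at $t = 1$. The sketch below (an illustrative calculation, not from the text) restarts the local theorem repeatedly. At a restart point with value $a$ it assumes the ball radius $N = |a|$, so that $|G| \le 4a^2$ and the Lipschitz constant is $4|a|$ on the ball, and it advances by the guaranteed step $\min(N/M_0, \rho/M) = \rho/(4|a|)$ with $\rho = 1/2$. The choices of $N$ and $\rho$ and the function name `guaranteed_extension` are assumptions; other choices change the step sizes but not the qualitative picture.

```python
def guaranteed_extension(a0, t0, rho=0.5, steps=200):
    """Restart times t_k for x'(t) = -x(t)^2 with x(t0) = a0 < 0.

    The exact solution is x(t) = 1/(t - t_blow) with t_blow = t0 - 1/a0.
    At each restart the ball radius is N = |x(t_k)|, which gives the
    guaranteed step length rho / (4 * |x(t_k)|)."""
    t_blow = t0 - 1.0 / a0
    times = [t0]
    for _ in range(steps):
        t = times[-1]
        a = 1.0 / (t - t_blow)             # current value of the exact solution
        times.append(t + rho / (4.0 * abs(a)))
    return times, t_blow

times, t_blow = guaranteed_extension(a0=-1.0, t0=0.0)
print(times[1], times[10], times[-1], t_blow)
```

With these choices each step covers a fixed fraction $\rho/4$ of the remaining distance to the blow-up time, so the restart times increase forever but never pass $t = 1$.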


Regarding the existence of the solution on larger intervals, the best we can say is that the solution exists until it becomes unbounded, or leaves the domain of the function $G$. To treat the simplest case, let us assume that $G$ is defined and $C^1$ for all $t$ in $\mathbb{R}$ and all $x$ in $\mathbb{R}^n$. Starting with the initial condition $x(t_0) = a$, we have the existence of the solution on an interval $J$. Then, by starting at points near the endpoints of $J$ we can extend the solution. Let $I$ denote the largest interval on which the solution exists. By the local existence and uniqueness this must be an open interval. If it is not the whole line, then what is the behavior of the solution as we approach an endpoint of $I$? We claim that the solution must become unbounded. The reason for this is a bit subtle. Suppose to the contrary that $x(t)$ remained bounded, say $|x(t)| \le N$, as $t \to c$ from below, where $c$ is the upper endpoint of $I$. Then the function $G(t, x)$ is bounded and satisfies a Lipschitz condition under the restrictions $t_0 \le t \le c + \varepsilon$ and $|x| \le 2N$. Now an examination of the proof of the local existence and uniqueness theorem shows that starting with Cauchy data in the region $t_0 \le t \le c$ and $|x| \le N$, the solution will exist over an interval of fixed length $\delta$, where $\delta$ depends only on the bounds for $G$ and the Lipschitz constant over the larger region $t_0 \le t \le c + \varepsilon$, $|x| \le 2N$.

Figure 11.1.5: (labels $N$, $2N$, $c$, $c + \varepsilon$)

Therefore, if we start at the point $t = c - \delta/2$ we will be able to extend the solution past $c$, contrary to the assumption that the solution could not be extended past $c$. Of course, if $x(t)$ becomes unbounded as $t \to c$, then the interval of existence gets smaller and smaller as $t \to c$ (because the bounds on $G$ get larger and larger), and there is no contradiction. This argument shows that functions like $\sin 1/x$ with


oscillating discontinuities cannot be solutions to the kind of o.d.e.'s we are considering.

11.1.5 Higher Order Equations*

Finally, we turn to the case of higher order equations,

$$x^{(m)}(t) = G(t, x(t), \ldots, x^{(m-1)}(t)),$$

where $G$ is a continuous function defined on an open set in $\mathbb{R}^{1+mn}$ taking values in $\mathbb{R}^n$. We will assume that either $G$ satisfies a global Lipschitz condition

$$|G(t, y) - G(t, z)| \le M|y - z|$$

for all $t$ in $I$ and all $y$ and $z$ in $\mathbb{R}^{nm}$ or that $G$ satisfies a local Lipschitz condition

$$|G(t, y) - G(t, z)| \le M|y - z|$$

for all $t$ in $I$ and all $y$ and $z$ in some ball $B$ in $\mathbb{R}^{nm}$. In the first case we have global existence and uniqueness and in the second case local existence and uniqueness for the Cauchy problem

$$x^{(m)}(t) = G(t, x(t), \ldots, x^{(m-1)}(t)), \qquad x(t_0) = a^{(0)}, \ldots, x^{(m-1)}(t_0) = a^{(m-1)}$$

(with the Cauchy data $a^{(0)}, \ldots, a^{(m-1)}$ in the ball $B$ in the second case). This can be proved by reducing to the first-order case as indicated before, but it can also be proved directly. The idea is that we now want to integrate $m$ times to obtain an equivalent integral equation. To see what this should look like let's consider the case $m = 2$. Then integrating $x''(t) = G(t, x(t), x'(t))$ once yields

$$x'(t) - x'(t_0) = \int_{t_0}^t x''(s)\,ds = \int_{t_0}^t G(s, x(s), x'(s))\,ds$$

or

$$x'(t) = a^{(1)} + \int_{t_0}^t G(s, x(s), x'(s))\,ds.$$

Integrating again we obtain

$$x(t) = a^{(0)} + (t - t_0)a^{(1)} + \int_{t_0}^t \int_{t_0}^r G(s, x(s), x'(s))\,ds\,dr.$$
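As a quick numerical sanity check of the twice-integrated equation for $m = 2$ (an illustration with an assumed example, not from the text), the sketch below takes $x''(t) = -x(t)$ with $x(0) = 0$ and $x'(0) = 1$, which is solved by $x(t) = \sin t$, and measures how well $\sin t$ satisfies $x(t) = a^{(0)} + (t - t_0)a^{(1)} + \int_{t_0}^t \int_{t_0}^r G(s, x(s), x'(s))\,ds\,dr$. The function name `double_integral_residual` and the trapezoid rule are choices made for this example.

```python
import math

def double_integral_residual(G, x, dx, t0, t1, a0, a1, n=2000):
    """Max over the grid of
    |x(t) - (a0 + (t - t0)*a1 + int_{t0}^{t} int_{t0}^{r} G(s, x(s), x'(s)) ds dr)|,
    with both integrals computed by the running trapezoid rule."""
    h = (t1 - t0) / n
    ts = [t0 + k * h for k in range(n + 1)]
    g = [G(s, x(s), dx(s)) for s in ts]
    inner = [0.0]                          # inner integral as a function of r
    for k in range(1, n + 1):
        inner.append(inner[-1] + 0.5 * h * (g[k - 1] + g[k]))
    outer = [0.0]                          # outer integral as a function of t
    for k in range(1, n + 1):
        outer.append(outer[-1] + 0.5 * h * (inner[k - 1] + inner[k]))
    return max(abs(x(t) - (a0 + (t - t0) * a1 + part))
               for t, part in zip(ts, outer))

# Assumed example: x'' = -x, x(0) = 0, x'(0) = 1, solved by x(t) = sin t.
residual = double_integral_residual(
    lambda t, x, v: -x, math.sin, math.cos, 0.0, 3.0, 0.0, 1.0)
print(residual)  # only quadrature error remains
```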


We can simplify the iterated integral by interchanging the order (we will give a proof that this is valid in Chapter 15),

$$\int_{t_0}^t \int_{t_0}^r G(s, x(s), x'(s))\,ds\,dr = \int_{t_0}^t \int_s^t G(s, x(s), x'(s))\,dr\,ds = \int_{t_0}^t (t - s)\,G(s, x(s), x'(s))\,ds.$$

Thus the integral equation for $m = 2$ is

$$x(t) = a^{(0)} + (t - t_0)a^{(1)} + \int_{t_0}^t (t - s)\,G(s, x(s), x'(s))\,ds,$$

and it is easy to verify directly that any solution of the integral equation is a solution to the Cauchy problem. By induction, it follows that the Cauchy problem in general is equivalent to the integral equation

$$x(t) = \sum_{j=0}^{m-1} \frac{(t - t_0)^j}{j!}\,a^{(j)} + \int_{t_0}^t \frac{(t - s)^{m-1}}{(m - 1)!}\,G(s, x(s), \ldots, x^{(m-1)}(s))\,ds.$$

We leave the details as an exercise. Notice that the integral equation also involves derivatives (up to order $m - 1$), but there is still a gain of one derivative: assuming that $x(t)$ is $C^{m-1}$ and satisfies the integral equation we obtain that $x(t)$ is $C^m$ by the fundamental theorem of the calculus applied $m$ times to the right side of the equation.

To solve the integral equation by the Picard iteration method we choose an initial approximation $x_0(t)$ and define inductively

$$x_k(t) = \sum_{j=0}^{m-1} \frac{(t - t_0)^j}{j!}\,a^{(j)} + \int_{t_0}^t \frac{(t - s)^{m-1}}{(m - 1)!}\,G(s, x_{k-1}(s), \ldots, x_{k-1}^{(m-1)}(s))\,ds.$$

Again we prove the sequence $\{x_k\}$ converges to a solution of the integral equation on a sufficiently small interval by appealing to the Contractive Mapping Principle. This time we have to take the metric space $C^{(m-1)}(J)$ with metric

$$d(x, y) = \sup_J \max_{j=0,\ldots,m-1} |x^{(j)}(t) - y^{(j)}(t)|$$

Chapter 11 Ordinary Differential Equations482

(or a ball in $C^{(m-1)}(J)$ in the local case). Of course, it is necessary to show that this is a complete metric space and that the Lipschitz condition on $G$ implies that the mapping

$$Tx(t) = \sum_{j=0}^{m-1} \frac{(t - t_0)^j}{j!}\,a^{(j)} + \int_{t_0}^{t} \frac{(t - s)^{m-1}}{(m-1)!}\,G(s, x(s), \ldots, x^{(m-1)}(s))\,ds$$

is a contraction on that metric space. The details are not too much different than before, so we leave them for the exercises. It is interesting to observe that the Picard iterations described above are somewhat different from the iterations obtained if you first reduce the equation to first order. Of course either method produces approximations that converge to the same solution.

11.1.6 Exercises

1. Put the following o.d.e.'s into normal form by solving for the highest order derivatives:

a. $x'(t)x(t) + \sin x(t) = 27$.

b. $x'(t)y(t) + t^2 y'(t) = 0$, $\quad x(t)y'(t) + x(t)^2 = t^4$ (a system).

c. $\exp(x'') - x'(t)^2 = t^2$.

2. Reduce the following o.d.e.'s to first-order systems by introducing new variables:

a. $x''(t) = t^2 x'(t) + \sin t$.

b. $x'''(t) = x''(t)x'(t) + x(t)$.

c. $x''(t) = \sin y'(t) + x(t)y(t)$, $\quad y''(t) = y'(t)^2 + x'(t)^2$ (a system).

3. Verify that $x(t)$ is a solution of the $m$th order o.d.e. $x^{(m)}(t) = G(t, x(t), \ldots, x^{(m-1)}(t))$ if and only if $(x_0, \ldots, x_{m-1}) =$

$(x, x', \ldots, x^{(m-1)})$ is a solution of the first-order system

$$x'_{m-1}(t) = G(t, x_0(t), \ldots, x_{m-1}(t)), \qquad x'_k(t) = x_{k+1}(t), \quad k = 0, 1, \ldots, m - 2.$$

Also verify that $x(t)$ satisfies the Cauchy initial conditions $x^{(k)}(t_0) = a^{(k)}$ for $k = 0, 1, \ldots, m - 1$ if and only if $(x_0, \ldots, x_{m-1})$ satisfies the Cauchy initial conditions $(x_0(t_0), \ldots, x_{m-1}(t_0)) = (a^{(0)}, \ldots, a^{(m-1)})$.

4. Show that all solutions of $x''(t) = -x(t)$ are of the form $x(t) = A\cos t + B\sin t$. Using this, decide for which values of $t_1$ and $t_2$ the o.d.e. $x''(t) = -x(t)$ with boundary conditions $x(t_1) = a_1$, $x(t_2) = a_2$ has a unique solution on $[t_1, t_2]$, for any choice of $a_1, a_2$.

5. Prove that a homogeneous linear o.d.e. $x'(t) = A(t)x(t)$, where $x(t)$ takes values in $\mathbb{R}^m$ and $A(t)$ is a continuous $m \times m$ matrix valued function, has an $m$-dimensional vector space of solutions.

6. Write out Picard iterations explicitly as a perturbation series for the inhomogeneous linear o.d.e. $x'(t) = A(t)x(t) + b(t)$ with Cauchy data $x(t_0) = a$ choosing $x_0(t) = a + D^{-1}b(t)$. What would happen with a different choice of $x_0(t)$?

7. Prove that $x(t)$ is a $C^m$ solution to the o.d.e.

$$x^{(m)}(t) = G(t, x(t), \ldots, x^{(m-1)}(t))$$

with Cauchy data

$$x^{(j)}(t_0) = a^{(j)}, \quad j = 0, 1, \ldots, m - 1,$$

if and only if $x(t)$ is a $C^{m-1}$ solution of the integral-differential equation

$$x(t) = \sum_{j=0}^{m-1} \frac{(t - t_0)^j}{j!}\,a^{(j)} + \int_{t_0}^{t} \frac{(t - s)^{m-1}}{(m-1)!}\,G(s, x(s), \ldots, x^{(m-1)}(s))\,ds.$$

8. Compare the Picard iterations for the above integral-differential equation when $m = 2$ with the Picard iterations obtained by first reducing the system to first order.

9. The first-order o.d.e. with $n = 1$ is called separable if $G(t, x) = g_1(t)g_2(x)$ for some functions $g_1$ and $g_2$. Assume they are $C^1$ and $g_2$ is never zero. The usual technique for solving a separable o.d.e. is to write formally $dx/dt = g_1(t)g_2(x)$, hence $dx/g_2(x) = g_1(t)\,dt$, so $\int dx/g_2(x) = \int g_1(t)\,dt + c$. Justify this method by showing that for $G_1(t) = \int_{t_0}^{t} g_1(s)\,ds$ and $h(x) = \int_{a}^{x} dy/g_2(y)$, the solution to the o.d.e. with initial value $x(t_0) = a$ is given by $x(t) = h^{-1}(G_1(t))$.

11.2 Other Methods of Solution*

In this section we discuss two alternate approaches to obtaining solutions of the Cauchy problem. The first, which is sometimes called Euler's method, involves obtaining approximations to the solution by replacing the differential equation by a difference equation. This is analogous to approximating a definite integral by a Cauchy sum. Euler's method actually yields a proof of existence under weaker conditions than Picard's method: we need only assume the function $G$ in the o.d.e. is continuous. Of course we have seen that under such weak assumptions the uniqueness of the solution is not assured. Quite frankly, I don't know of any applications in which existence without uniqueness is of any value, but I will present the proof anyway because it uses the Arzela-Ascoli theorem in a nice way. The real importance of Euler's method is that it forms the basis of efficient numerical algorithms for approximating solutions of o.d.e.'s. There are many improvements on Euler's method, essentially based on the idea of using an improvement on the Cauchy sum approximation to an integral, such as the trapezoidal rule or Simpson's rule. We will not discuss these here.

The second method involves expansions into power series of both the solution and all the functions involved in the o.d.e. This method is evidently limited to o.d.e.'s for which power-series expansions are available, but within this narrower class of equations it gives a computational algorithm of some practical importance, and it yields some information that is not readily available through the other approaches.

11.2.1 Difference Equation Approximation

We will consider Euler's method in the case of a single first-order o.d.e. $x'(t) = G(t, x(t))$. Recall the interpretation of this equation as

saying that the graph of $x(t)$ in the $t$-$x$ plane has its slope prescribed by $G(t, x)$ and so can be imagined as being pieced together out of infinitesimal line segments of slope $G(t, x)$ through the point $(t, x)$. Euler's method takes small, rather than infinitesimal, line segments and pieces them together to obtain an approximate solution. Let $x(t_0) = a$ be the given initial data, and suppose we want the solution on the interval $[t_0, T]$. We partition the interval as $t_0 < t_1 < \cdots < t_N = T$. On the first interval $[t_0, t_1]$ we take the affine linear function $y(t)$ that satisfies the same initial condition $y(t_0) = y_0$ and the condition $y'(t) = G(t_0, y_0)$, where we set $y_0 = a$ for consistency of notation.

Figure 11.2.1: (figure not reproduced; labels $t_1$, $y_0 = a$, $y_1$)

Clearly this determines $y(t)$ on the interval $[t_0, t_1]$ to be $y(t) = y_0 + G(t_0, y_0)(t - t_0)$, as shown in Figure 11.2.1. We set $y_1 = y(t_1)$. The function $y(t)$ on the interval $[t_0, t_1]$ differs slightly from the solution $x(t)$ of the o.d.e. $x'(t) = G(t, x(t))$ because $G(t, x(t))$ will in general be different from $G(t_0, y_0)$. But if the interval is small the difference will not be great (assuming $G$ is continuous), so we should have a good approximation. We then repeat the process on the interval $[t_1, t_2]$ starting with the initial data $y(t_1) = y_1$. We now require $y'(t) = G(t_1, y_1)$ and so $y(t) = y_1 + G(t_1, y_1)(t - t_1)$. We continue by induction to define $y(t)$ on each of the intervals $[t_k, t_{k+1}]$ by $y(t) = y_k + G(t_k, y_k)(t - t_k)$ where $y_k = y(t_k)$. In this way we obtain a continuous piecewise affine function $y(t)$ on the interval $[t_0, T]$ where each affine piece of the graph of $y$ is a line segment of the prescribed slope at the left end, as illustrated in Figure 11.2.2.

Figure 11.2.2: (figure not reproduced)

Before we get into the question of whether (or how well) $y(t)$ approximates the solution $x(t)$, we should consider some alternate interpretations of $y(t)$. Since the graph of $y(t)$ is a broken line, it clearly suffices to compute the values $y_k = y(t_k)$, for then we can "connect the dots" to obtain the intermediate values. Notice that the equations $y(t) = y_k + G(t_k, y_k)(t - t_k)$ for $t$ in $[t_k, t_{k+1}]$ evaluated at $t = t_{k+1}$ yield the relations $y_{k+1} - y_k = G(t_k, y_k)(t_{k+1} - t_k)$ or

$$\frac{y_{k+1} - y_k}{t_{k+1} - t_k} = G(t_k, y_k),$$

which is clearly a difference equation analog of the o.d.e. $x'(t) = G(t, x)$. Thus Euler's method can be construed as solving the differential equation approximately by replacing it with a difference equation.

Another interpretation of Euler's method derives from looking at the integral equation

$$x(t) = a + \int_{t_0}^{t} G(s, x(s))\,ds$$

and replacing the integral by a Cauchy sum. If we take $t = t_{k+1}$ and we partition the interval $[t_0, t_{k+1}]$ in the obvious way as $t_0 < t_1 < t_2 < \cdots < t_{k+1}$, this becomes

$$y_{k+1} = y_0 + \sum_{j=0}^{k} G(t_j, x(t_j))(t_{j+1} - t_j)$$

(evaluating $G(s, x(s))$ at the left endpoint $s = t_j$). Of course we don't know $x(t_j)$ exactly, but we could approximate it by $y_j$; this yields

$$y_{k+1} = y_0 + \sum_{j=0}^{k} G(t_j, y_j)(t_{j+1} - t_j),$$

which is just another way of writing Euler's method. This interpretation shows clearly that there are two sources of error, one from replacing the integral by a Cauchy sum, and one from evaluating $G$ at $(t_j, y_j)$ rather than $(t_j, x(t_j))$. Also there is the possibility that errors will compound, since the $y_j$ are determined inductively and any error in one particular $y_j$ gets passed on to all subsequent ones. Thus we can't simply conclude that $y_k$ approximates $x(t_k)$ from the fact that the Cauchy sum approximates the integral.

Nevertheless, it is possible to show that if $G$ is $C^1$, then on a sufficiently small interval $[t_0, T]$ Euler's method does approximate the solution. In fact if $h$ denotes the maximum length of the subintervals $[t_k, t_{k+1}]$, then $y(t) - x(t) = O(h)$; in other words, $|y(t) - x(t)| \le ch$ for $t_0 \le t \le T$ where $c$ is a constant depending only on the o.d.e. From a practical point of view, the $O(h)$ error is not very good, and more sophisticated methods, based on better approximations to the integral, are preferred.

To avoid technicalities we assume the partition intervals are of equal length $h$, so $t_k = t_0 + kh$, and that $G$ and its first derivatives are globally bounded. Under this hypothesis the solution exists for all $t$ and the Euler approximation is valid on any interval.

Theorem 11.2.1 Let $G$ be a $C^1$ function that is bounded together with its first derivatives. Then there exist constants $M_1$ and $M_2$ (depending on the bounds for $G$ and $dG$) such that

$$|y(t) - x(t)| \le M_2\left(e^{M_1|t - t_0|} - 1\right)h,$$

where $x(t)$ is the exact solution and $y(t)$ is the Euler approximation using step size $h$.

Notice that if we extend the size of the interval on which we do the approximation, the constant $e^{M_1|t - t_0|} - 1$ that multiplies $h$ in our estimate grows exponentially. This is exactly what we should expect because the errors in the method may compound. Of course, if we fix any interval, we can choose $h$ small enough to make the error estimate as small as we like on that interval. But it is only fair to point out that the smaller we take $h$, the more points we have to take in the partition and, hence, the more computations we have to perform. Also, any numerical implementation of this algorithm will involve round-off

error; and when a large number of calculations need to be performed, the round-off error has to be taken into account.

Proof: For simplicity we only prove the error estimate at the partition points $t_k$. Thus we need to show

$$(*)\qquad |y_k - x(t_k)| \le M_2\left(e^{M_1 kh} - 1\right)h.$$

We will prove this by induction. Notice that for $k = 0$ we have $y_0 = x(t_0) = a$ as required. Thus we may assume that $(*)$ holds, and we need to show the analogous estimate holds for $k + 1$. (In the course of the argument we will specify the constants $M_1$ and $M_2$.) Euler's method gives us $y_{k+1} = y_k + G(t_k, y_k)h$, and the integral equation gives $x(t_{k+1}) = x(t_k) + \int_{t_k}^{t_{k+1}} G(s, x(s))\,ds$ for the exact solution. Then

$$y_{k+1} - x(t_{k+1}) = y_k - x(t_k) + G(t_k, y_k)h - \int_{t_k}^{t_{k+1}} G(s, x(s))\,ds$$

and the first difference $y_k - x(t_k)$ is controlled by $(*)$. In comparing the integral $\int_{t_k}^{t_{k+1}} G(s, x(s))\,ds$ with $G(t_k, y_k)h$ we are on slippery footing because the value $y_k$ is not exactly $x(t_k)$. Therefore we will write

$$G(t_k, y_k)h - \int_{t_k}^{t_{k+1}} G(s, x(s))\,ds = \left[G(t_k, y_k)h - G(t_k, x(t_k))h\right] + \left[G(t_k, x(t_k))h - \int_{t_k}^{t_{k+1}} G(s, x(s))\,ds\right]$$

and estimate each of the bracketed differences separately.

The first is easy. Using the mean value theorem we have $|G(t_k, y_k)h - G(t_k, x(t_k))h| \le M_1 h|y_k - x(t_k)|$ where $M_1$ (this will be the constant $M_1$ in $(*)$) is an upper bound for the derivatives of $G$ (actually just the $x$-derivatives, since the point $t_k$ is the same in both $G(t_k, y_k)$ and $G(t_k, x(t_k))$). Notice that this is just what the doctor ordered, because we have $|y_k - x(t_k)|$ on the right, which is controlled by $(*)$.

For the second difference, notice that we have the integral over an interval of length $h$ of the function $G(s, x(s))$ being compared with the value of this function at the left endpoint $s = t_k$ multiplied by the length of the interval. If we call this function $g(s) = G(s, x(s))$, then

$$|g(s) - g(t_k)| \le M(s - t_k) \quad \text{for } t_k \le s \le t_{k+1}$$

by the mean value theorem. Thus

$$\left|g(t_k)h - \int_{t_k}^{t_{k+1}} g(s)\,ds\right| \le M\int_{t_k}^{t_{k+1}} (s - t_k)\,ds = \tfrac{1}{2}Mh^2$$

where $M$ is a bound for $g'$. But

$$g'(s) = \frac{\partial G}{\partial s}(s, x(s)) + \frac{\partial G}{\partial x}(s, x(s))\,x'(s)$$

by the chain rule and $x'(s) = G(s, x(s))$ by the o.d.e., so $g'$ can be bounded in terms of the bounds for $G$ and its derivatives.

If we combine all three estimates we have altogether (triangle inequality)

$$|y_{k+1} - x(t_{k+1})| \le |y_k - x(t_k)| + M_1 h|y_k - x(t_k)| + \tfrac{1}{2}Mh^2.$$

Then substituting $(*)$, we have

$$|y_{k+1} - x(t_{k+1})| \le (1 + M_1 h)\,M_2\left(e^{M_1 kh} - 1\right)h + \tfrac{1}{2}Mh^2.$$

We now can specify that $M_2 = M/2M_1$. It then follows that

$$|y_{k+1} - x(t_{k+1})| \le M_2 h\left[(1 + M_1 h)\left(e^{M_1 kh} - 1\right) + M_1 h\right] = M_2 h\left[(1 + M_1 h)e^{M_1 kh} - 1\right] \le M_2 h\left(e^{M_1(k+1)h} - 1\right)$$

because $1 + M_1 h \le e^{M_1 h}$ (remember the power series for exp). Thus we have the desired analog of $(*)$ for $k + 1$. QED

11.2.2 Peano Existence Theorem

We will now show how Euler's method can be used to establish the existence of solutions under the hypothesis that $G$ is continuous. Since we don't yet know that solutions exist under such weak assumptions, we cannot refer to "the solution" $x(t)$; in fact we know by example ($x' = (3/2)x^{1/3}$) that there may be more than one solution to the Cauchy problem. Nevertheless, the formulas defining Euler's method,

$$y(t) = y_k + G(t_k, y_k)(t - t_k)$$

for $t_k \le t \le t_{k+1}$, make sense without reference to any solution. We can use these to obtain a solution via the Arzela-Ascoli theorem.
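The Euler recursion $y_{k+1} = y_k + G(t_k, y_k)h$ and the $O(h)$ estimate of Theorem 11.2.1 can be tried out numerically. The following is a minimal Python sketch, not a production solver. The test equation $x' = \sin x$, $x(0) = \pi/2$, with exact solution $x(t) = 2\arctan(e^t)$, and all function names are assumptions made here for illustration. For this equation one may take $M_1 = 1$ (a bound for $|\partial G/\partial x| = |\cos x|$) and $M = 1/2$ (a bound for $|g'| = |\cos x \sin x|$), so $M_2 = M/2M_1 = 1/4$.

```python
import math


def euler(G, t0, a, T, n):
    """Return the Euler values y_0, ..., y_n on the uniform partition of [t0, T]."""
    h = (T - t0) / n
    y = a
    ys = [y]
    for k in range(n):
        t_k = t0 + k * h
        y = y + G(t_k, y) * h  # y_{k+1} = y_k + G(t_k, y_k) h
        ys.append(y)
    return ys


def exact(t):
    """Exact solution of x' = sin x with x(0) = pi/2."""
    return 2.0 * math.atan(math.exp(t))


def endpoint_error(n):
    """|y_n - x(1)| for the test equation on [0, 1] with n steps."""
    ys = euler(lambda t, x: math.sin(x), 0.0, math.pi / 2, 1.0, n)
    return abs(ys[-1] - exact(1.0))


def theorem_bound(n):
    """M_2 (e^{M_1 (T - t0)} - 1) h with M_1 = 1, M_2 = 1/4, T - t0 = 1."""
    h = 1.0 / n
    return 0.25 * (math.e - 1.0) * h


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        print(n, endpoint_error(n), theorem_bound(n))
```

Doubling `n` (halving $h$) should roughly halve the observed error, in line with the first-order estimate, and the observed error should stay below the bound computed from the theorem.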

Theorem 11.2.2 (Peano Existence Theorem) Let $G(t, x)$ be a continuous real-valued function for $t$ in some interval $I$ containing $t_0$ and $x$ in some neighborhood of $a$. Then there exists an interval $J$ containing $t_0$ and a solution $x(t)$ to the Cauchy problem $x'(t) = G(t, x(t))$ and $x(t_0) = a$ on the interval $J$. The solution need not be unique.

Proof: If the domain of $G$ is not compact we can always make it so by shrinking it. Then let $M$ be the sup of $|G|$ over this compact domain, which is finite because $G$ is continuous. We want to shrink the domain still further to a rectangle ($t$ in $J$ and $|x - a| \le r$) in which the approximations $y(t)$ and the solution will be trapped, as shown in Figure 11.2.3.

Figure 11.2.3: (figure not reproduced; labels $a$, $J$, slope $+M$, slope $-M$)

The condition for trapping the solution is that $M|t - t_0| \le r$ for $t$ in $J$, which can be achieved by making $J$ small enough while holding $r$ fixed (the constant $M$, which is the sup of $|G|$ over a larger domain, will still be an upper bound for $|G|$ over this smaller domain). The reason the condition $M|t - t_0| \le r$ for $t$ in $J$ traps the solution is that the derivative $x'(t)$ is trapped between $-M$ and $+M$, so any solution must lie between the lines through $(t_0, a)$ with slope $-M$ and $+M$.

For simplicity we assume $J$ is $[t_0, T]$. Consider any partition $t_0 < t_1 < \cdots < t_N = T$ of this interval, and let's attempt to define inductively the Euler method function

$$y(t) = y_k + G(t_k, y_k)(t - t_k)$$

for $t$ in $[t_k, t_{k+1}]$ where $y_0 = a$ and $y_k = y(t_k)$. For this to make sense we need to verify that $(t_k, y_k)$ is in the domain of $G$; in fact we will

show $|y_k - a| \le r$ or more precisely $|y(t) - a| \le M|t - t_0|$ for $t$ in $J$. We do this by induction. Since $y_0 = a$, we have trivially $|y_0 - a| \le r$ and

$$|y(t) - a| = |G(t_0, y_0)(t - t_0)| \le M|t - t_0|$$

for $t$ in $[t_0, t_1]$. Since we are assuming $M|t - t_0| \le r$ in $J$, we have $|y_1 - a| \le r$. Continuing by induction, assuming $|y_k - a| \le r$ and $|y(t) - a| \le M|t - t_0|$ for $t_0 \le t \le t_k$, we have

$$|y(t) - a| = |y_k - a + G(t_k, y_k)(t - t_k)| \le |y_k - a| + |G(t_k, y_k)(t - t_k)| \le M(t_k - t_0) + M(t - t_k) = M(t - t_0)$$

for $t_k \le t \le t_{k+1}$. Thus also $|y_{k+1} - a| \le r$ since $M(t_{k+1} - t_0) \le r$ by assumption. This completes the induction argument.

Now we want to vary the partition, letting the maximum length of the subintervals go to zero. Let $y^{(1)}, y^{(2)}, \ldots$ denote the sequence of functions obtained by the Euler method from the partitions. We would hope to obtain a solution as the limit of this sequence, but the nonuniqueness means the limit may not exist. Instead we must settle for something weaker, a limit of a subsequence. For this we invoke the Arzela-Ascoli theorem. Thus we need to verify the hypotheses of that theorem, that the sequence is uniformly bounded and uniformly equicontinuous (they are obviously continuous functions defined on a compact domain). The estimate $|y(t) - a| \le M|t - t_0| \le r$ established above shows that the functions are uniformly bounded. The uniform equicontinuity is essentially a consequence of the fact that the functions are piecewise affine with derivative $G(t_k, y_k)$ bounded by $M$. We claim that any function $y(t)$ constructed by Euler's method will satisfy $|y(t) - y(s)| \le M|t - s|$ for $t$ and $s$ in $J$; this just follows by summing the same estimate over all the subintervals connecting $t$ and $s$, since

$$|y(t) - y(s)| = |G(t_k, y_k)(t - s)| \le M|t - s|$$

if $t$ and $s$ belong to the same subinterval $[t_k, t_{k+1}]$. Since the constant $M$ does not depend on the partition, we obtain the uniform equicontinuity.

Thus the Arzela-Ascoli theorem applies and so there exists a uniformly convergent subsequence. For simplicity of notation let us also denote this subsequence $y^{(1)}, y^{(2)}, \ldots$. Let $x(t) = \lim_{j\to\infty} y^{(j)}(t)$, the

limit converging uniformly on $J$. To complete the proof we will show that $x(t)$ satisfies the integral equation

$$x(t) = a + \int_{t_0}^{t} G(s, x(s))\,ds$$

for we have already seen that the integral equation is equivalent to the Cauchy problem. We will obtain the integral equation if we can show that

$$y^{(j)}(t) = a + \int_{t_0}^{t} G(s, y^{(j)}(s))\,ds + R_j(t)$$

where $R_j(t) \to 0$ as $j \to \infty$, for

$$\lim_{j\to\infty} \int_{t_0}^{t} G(s, y^{(j)}(s))\,ds = \int_{t_0}^{t} G(s, x(s))\,ds$$

because the convergence of $y^{(j)}$ to $x$ is uniform.

The definition of $y^{(j)}(t)$ by Euler's method shows that $y^{(j)}(t) - a$ is a certain Cauchy sum for the integral $\int_{t_0}^{t} G(s, y^{(j)}(s))\,ds$ corresponding to the partition $t_0 < t_1^{(j)} < \cdots < t_k^{(j)} < t$ of $[t_0, t]$ and evaluation of $G(s, y^{(j)}(s))$ at the left endpoint of each subinterval. Here $t_0 < t_1^{(j)} < \cdots < t_N^{(j)} = T$ is the partition corresponding to $y^{(j)}$ and $t_k^{(j)}$ is the largest value for which $t_k^{(j)} < t$ (thus $k$ depends on $t$ and $j$, but we have not burdened the notation with this dependence). This is just the equation

$$\begin{aligned} y^{(j)}(t) &= y^{(j)}(t_k^{(j)}) + G(t_k^{(j)}, y^{(j)}(t_k^{(j)}))(t - t_k^{(j)}) \\ &= a + \sum_{p=0}^{k-1} G(t_p^{(j)}, y^{(j)}(t_p^{(j)}))(t_{p+1}^{(j)} - t_p^{(j)}) + G(t_k^{(j)}, y^{(j)}(t_k^{(j)}))(t - t_k^{(j)}). \end{aligned}$$

Therefore the error $R_j$ is just the difference between a Cauchy sum and an integral. Notice that both the partition and the function $G(s, y^{(j)}(s))$ being integrated vary with $j$. Since the maximum interval lengths of the partitions are going to zero, we know that the Cauchy sums for any fixed continuous function $f$ are converging to the integral of $f$. An examination of that proof, however, shows that the rate of convergence depends only on the modulus of continuity, which is defined

to be $\omega(\delta) = \sup\{|f(s) - f(t)| \text{ given that } |s - t| \le \delta\}$. In the present case, the uniform equicontinuity of the functions $y^{(j)}$ implies that a single function $\omega(\delta)$ dominates the modulus of continuity for all the integrands $G(s, y^{(j)}(s))$ so that the difference between $\int_{t_0}^{t} G(s, y^{(j)}(s))\,ds$ and a Cauchy sum is small for all $j$ provided the partition is small. This implies $R_j \to 0$ as $j \to \infty$ and so the integral equation holds. QED

Although we have stated the theorem only for single equations, it is clear that the proof goes over almost without change to first-order systems and, hence, to systems of arbitrary order

$$x^{(m)}(t) = G(t, x(t), x'(t), \ldots, x^{(m-1)}(t))$$

for $G$ continuous.

11.2.3 Power-Series Solutions

We turn now to power-series methods. We will consider only linear o.d.e.'s of second order, because this special case includes many important examples. Making the assumption of linearity vastly simplifies the problem, although it is still true that the Cauchy problem for the general equation

$$x^{(m)}(t) = G(t, x(t), x'(t), \ldots, x^{(m-1)}(t))$$

has a power-series solution that converges in a small neighborhood of $t_0$ provided the function $G$ has a convergent multiple power series in a neighborhood of the Cauchy data. The proof of this, however, would take us too far afield. We should also point out that the result has a generalization to partial differential equations, the famous Cauchy-Kovalevsky Theorem.

The o.d.e.'s we consider are of the form

$$x''(t) = p(t)x'(t) + q(t)x(t) + r(t)$$

where $p, q, r$ are the analytic functions

$$p(t) = \sum_{k=0}^{\infty} p_k(t - t_0)^k,$$

$$q(t) = \sum_{k=0}^{\infty} q_k(t - t_0)^k,$$

$$r(t) = \sum_{k=0}^{\infty} r_k(t - t_0)^k$$

with power series converging at least in $|t - t_0| < \varepsilon$. We wish to solve the Cauchy problem ($x(t_0) = a$, $x'(t_0) = b$) by an analytic function with power series $x(t) = \sum_{k=0}^{\infty} c_k(t - t_0)^k$, also convergent in $|t - t_0| < \varepsilon$.

Notice then that the Cauchy initial data $x(t_0) = a$, $x'(t_0) = b$ just means $c_0 = a$, $c_1 = b$. Thus the problem can be restated: find an analytic solution of the o.d.e. $x(t) = \sum_{k=0}^{\infty} c_k(t - t_0)^k$ with $c_0$ and $c_1$ specified. For simplicity of notation we will set $t_0 = 0$.

We proceed backward, assuming the solution exists. What does the o.d.e. say about the coefficients $c_k$? Inside the radius of convergence we can perform all the operations called for in the o.d.e. by operating formally on the power series. We obtain

$$x''(t) = \sum_{k=2}^{\infty} k(k - 1)c_k t^{k-2} = \sum_{k=0}^{\infty} (k + 1)(k + 2)c_{k+2}t^k$$

and also (by making a change of variable in the $k$ summations)

$$\begin{aligned} p(t)x'(t) + q(t)x(t) + r(t) &= \left(\sum_{j=0}^{\infty} p_j t^j\right)\left(\sum_{k=0}^{\infty} (k + 1)c_{k+1}t^k\right) + \left(\sum_{j=0}^{\infty} q_j t^j\right)\left(\sum_{k=0}^{\infty} c_k t^k\right) + \sum_{k=0}^{\infty} r_k t^k \\ &= \sum_{k=0}^{\infty}\left(\sum_{j=0}^{k} p_j(k - j + 1)c_{k-j+1}\right)t^k + \sum_{k=0}^{\infty}\left(\sum_{j=0}^{k} q_j c_{k-j}\right)t^k + \sum_{k=0}^{\infty} r_k t^k. \end{aligned}$$
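Matching the coefficient of $t^k$ in the expansion of $x''(t)$ with the coefficient of $t^k$ in the expansion of $p(t)x'(t) + q(t)x(t) + r(t)$ determines $c_{k+2}$ from $c_0, \ldots, c_{k+1}$, so the coefficients can be generated mechanically. The following minimal Python sketch (with $t_0 = 0$) does this; the function name and the convention of passing the coefficients $p_k, q_k, r_k$ as finite lists are assumptions made here for illustration, and the sketch says nothing about convergence.

```python
def series_coefficients(p, q, r, c0, c1, n):
    """Return [c_0, ..., c_n] for x'' = p x' + q x + r expanded about t0 = 0.

    p, q, r are lists of power-series coefficients p_k, q_k, r_k;
    entries beyond the end of a list are treated as zero.
    """
    def coeff(seq, k):
        return seq[k] if k < len(seq) else 0.0

    c = [c0, c1]
    for k in range(n - 1):
        # Coefficient of t^k in p(t) x'(t) + q(t) x(t) + r(t).
        rhs = coeff(r, k)
        for j in range(k + 1):
            rhs += coeff(p, j) * (k - j + 1) * c[k - j + 1]
            rhs += coeff(q, j) * c[k - j]
        # The coefficient of t^k in x''(t) is (k + 1)(k + 2) c_{k+2}.
        c.append(rhs / ((k + 1) * (k + 2)))
    return c[: n + 1]


if __name__ == "__main__":
    # x'' = -x with x(0) = 1, x'(0) = 0: the coefficients of cos t.
    print(series_coefficients([], [-1.0], [], 1.0, 0.0, 6))
```

For $x'' = -x$ with $c_0 = 1$, $c_1 = 0$ the sketch reproduces the familiar coefficients $1, 0, -1/2, 0, 1/24, \ldots$ of $\cos t$.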

All these power series will be convergent in $|t| < \varepsilon$. By the uniqueness of power-series expansions we can equate the coefficients of $x''(t)$ and $p(t)x'(t) + q(t)x(t) + r(t)$. This leads to a system of algebraic equations

$$(*)\qquad (k + 1)(k + 2)c_{k+2} = \sum_{j=0}^{k}\left(p_j(k - j + 1)c_{k-j+1} + q_j c_{k-j}\right) + r_k$$

for $k = 0, 1, 2, \ldots$. Notice that the right side of the equation contains $c_0, \ldots, c_{k+1}$ but no other coefficients and that the left side is just $c_{k+2}$ multiplied by a positive constant. Thus, given values for $c_0$ and $c_1$, these algebraic equations have a unique solution; in fact they are inhomogeneous linear equations in triangular form. It is the simplicity of the solution of these equations that makes the use of power series so appealing here.

So far we have seen that if the problem has a power-series solution, then we have the recursion formulas $(*)$ to find the coefficients. The next question is can we reverse the process? That is, we start by solving the equations $(*)$ to obtain the formal power series $\sum_{k=0}^{\infty} c_k t^k$. If we can show that this power series converges in $|t| < \varepsilon$, we will be done, for then $(*)$ will show that the power series satisfies the o.d.e. However, this seems a formidable task, since the coefficients $c_k$ are only given indirectly as solutions of $(*)$ and because in order to show that the power series converges we will need some estimates for the size of $|c_k|$. While it is possible to derive such estimates directly from the equations $(*)$, there is a simpler indirect approach known as Cauchy's method of majorants. The idea of this method is to show there exists a majorant series $\sum_{k=0}^{\infty} d_k t^k$, with $|c_k| \le d_k$ (so the $d_k$ are all positive) and yet $\sum_{k=0}^{\infty} d_k t^k$ converges in $|t| < \varepsilon$. This implies the convergence of $\sum_{k=0}^{\infty} c_k t^k$ by the comparison test. In order to produce the majorant series we will systematically increase all the coefficients of the power series of $p, q, r$. Because of the form of $(*)$, it is easy to show this produces a majorant series. Finally, we will obtain the convergence of the majorant series not by estimating its coefficients but because we will be able to write down the function $\sum_{k=0}^{\infty} d_k t^k$ explicitly; it will be

the solution of an o.d.e. that can be solved by inspection. This is the trick that saves us the tedious work of estimating coefficients.

We begin by proving the Majorant lemma:

Lemma 11.2.1 (Majorant lemma) Suppose

$$P(t) = \sum_{k=0}^{\infty} P_k t^k, \quad Q(t) = \sum_{k=0}^{\infty} Q_k t^k, \quad \text{and} \quad R(t) = \sum_{k=0}^{\infty} R_k t^k$$

are majorants of $p(t) = \sum_{k=0}^{\infty} p_k t^k$, $q(t) = \sum_{k=0}^{\infty} q_k t^k$, and $r(t) = \sum_{k=0}^{\infty} r_k t^k$ respectively, meaning $|p_k| \le P_k$, $|q_k| \le Q_k$, and $|r_k| \le R_k$. Suppose $c_0$ and $c_1$ are given and $\sum_{k=0}^{\infty} c_k t^k$ is defined by the equation

$$(*)\qquad (k + 1)(k + 2)c_{k+2} = \sum_{j=0}^{k}\left(p_j(k - j + 1)c_{k-j+1} + q_j c_{k-j}\right) + r_k,$$

and similarly suppose $d_0$ and $d_1$ are given and $\sum_{k=0}^{\infty} d_k t^k$ is defined by the analogous equation

$$(**)\qquad (k + 1)(k + 2)d_{k+2} = \sum_{j=0}^{k}\left(P_j(k - j + 1)d_{k-j+1} + Q_j d_{k-j}\right) + R_k.$$

Finally assume $|c_0| \le d_0$ and $|c_1| \le d_1$. Then $\sum d_k t^k$ is a majorant for $\sum c_k t^k$, meaning $|c_k| \le d_k$ for every $k$.

Proof: Notice that this is a purely algebraic fact: we assume nothing and conclude nothing about the convergence of any of the formal power series. The proof is by induction and is almost completely trivial. We are assuming $|c_0| \le d_0$ and $|c_1| \le d_1$, so let us assume we know $|c_j| \le d_j$ for $0 \le j \le k + 1$ and then prove $|c_{k+2}| \le d_{k+2}$. But from $(*)$ and the triangle inequality

$$(k + 1)(k + 2)|c_{k+2}| \le \sum_{j=0}^{k}\left(|p_j|(k - j + 1)|c_{k-j+1}| + |q_j||c_{k-j}|\right) + |r_k|$$

since the coefficients $(k - j + 1)$ are all positive. Substituting the majorants for $p_j, q_j, r_k$ and the assumed estimates $|c_j| \le d_j$ for $0 \le j \le k + 1$ we obtain

$$(k + 1)(k + 2)|c_{k+2}| \le \sum_{j=0}^{k}\left(P_j(k - j + 1)d_{k-j+1} + Q_j d_{k-j}\right) + R_k,$$

and the right side is just $(k + 1)(k + 2)d_{k+2}$. QED

Theorem 11.2.3 Let the power series for $p, q, r$ converge in $|t| < \varepsilon$. Then the power series $x(t) = \sum_{k=0}^{\infty} c_k t^k$, where $c_k$ are given by $(*)$ and $c_0$ and $c_1$ are arbitrarily chosen, also converges in $|t| < \varepsilon$ to a solution of the Cauchy problem

$$x''(t) = p(t)x'(t) + q(t)x(t) + r(t), \quad x(0) = c_0, \quad x'(0) = c_1.$$

Proof: In order to put the lemma to work we have to make a clever choice of the majorants $P, Q, R$. Fix a value $\delta < \varepsilon$. From the convergence of $\sum_{k=0}^{\infty} p_k t^k$ in $|t| \le \delta$ we know that $|p_k| \le M\delta^{-k}$ for some $M$ and all $k$ (recall that this followed from the boundedness of $p_k t^k$ for $t = \delta$). Thus the choice $P_k = M\delta^{-k}$ would seem natural. Notice then that

$$P(t) = \sum_{k=0}^{\infty} P_k t^k = \sum_{k=0}^{\infty} M\delta^{-k}t^k = M(1 - \delta^{-1}t)^{-1} = \frac{M\delta}{\delta - t} = \frac{M_1}{\delta - t}$$

is an elementary function and its power series converges in $|t| < \delta$.

We might similarly be tempted to try $Q(t) = M_2/(\delta - t)$ and $R(t) = M_3/(\delta - t)$ for suitable constants $M_2$ and $M_3$, but this choice leads to the o.d.e.

$$x''(t) = P(t)x'(t) + Q(t)x(t) + R(t),$$

which can't be solved by inspection. However, a slight modification of the choice of $Q$ and $R$ will lead to a linear o.d.e. of homogeneous type

$$x''(t) = \frac{M_1}{\delta - t}\,x'(t) + \frac{M_2}{(\delta - t)^2}\,x(t) + \frac{M_3}{(\delta - t)^3},$$

which can easily be solved. This means we want to choose

$$Q(t) = \frac{M_2}{(\delta - t)^2} \quad \text{and} \quad R(t) = \frac{M_3}{(\delta - t)^3}.$$

Since

$$Q(t) = \sum_{k=0}^{\infty} \delta^{-2}M_2(k + 1)\delta^{-k}t^k$$

and

$$R(t) = \sum_{k=0}^{\infty} \delta^{-3}M_3\,\tfrac{1}{2}(k + 2)(k + 1)\delta^{-k}t^k,$$

we can always choose $M_2$ and $M_3$ large enough to make these majorants of $q(t)$ and $r(t)$, the point being that the factors $(k + 1)$ and $(k + 2)(k + 1)$ do not spoil the estimates.

Choosing $d_0 = |c_0|$ and $d_1 = |c_1|$, we can apply the lemma to conclude $|c_k| \le d_k$ where $\sum d_k t^k$ is the power series for the solution to

$$x''(t) = \frac{M_1}{\delta - t}\,x'(t) + \frac{M_2}{(\delta - t)^2}\,x(t) + \frac{M_3}{(\delta - t)^3}, \quad x(0) = |c_0|, \quad x'(0) = |c_1|.$$

But this o.d.e. has solutions of the form

$$a(\delta - t)^{\lambda_1} + b(\delta - t)^{\lambda_2} + c(\delta - t)^{-1}.$$

Indeed by substituting this into the o.d.e. and doing the routine calculations (see exercises) we find that $\lambda_1$ and $\lambda_2$ must satisfy the quadratic equation $\lambda(\lambda - 1) = -M_1\lambda + M_2$ (the minus sign comes from differentiating $(\delta - t)^{\lambda}$ once) and $c$ must satisfy $2c = M_1 c + M_2 c + M_3$. Thus we have determined a unique value of $c$ (if $M_1 + M_2 = 2$ we can increase $M_1$ to avoid this problem), and the quadratic equation for $\lambda$ has two distinct real roots

$$\lambda_{1,2} = \frac{(1 - M_1) \pm \sqrt{(1 - M_1)^2 + 4M_2}}{2}.$$

Thus by adjusting the two constants $a$ and $b$ we can meet the initial conditions $x(0) = |c_0|$, $x'(0) = |c_1|$ (this requires only the distinctness of the two roots $\lambda_1$ and $\lambda_2$). The solution

$$a(\delta - t)^{\lambda_1} + b(\delta - t)^{\lambda_2} + c(\delta - t)^{-1}$$

is an analytic function with a power series about $t = 0$ convergent in $|t| < \delta$; the explicit power series is given by the binomial theorem (see exercise 4 of section 7.4). But this power series must be the same as $\sum d_k t^k$ because the coefficients must satisfy $(**)$ of the majorant

lemma. Thus $\sum d_k t^k$ converges in $|t| < \delta$ and so $\sum c_k t^k$ converges in $|t| < \delta$ by the majorant lemma. Since we could take any $\delta < \varepsilon$, it follows that $\sum c_k t^k$ converges in $|t| < \varepsilon$ as desired. QED

Of course one could also compute $d_k$ explicitly (first computing $M_1, M_2, M_3, \lambda_1, \lambda_2, a, b, c$) from the binomial theorem and then prove by induction $|c_k| \le d_k$ directly from $(*)$, thereby replacing the above indirect argument by a mass of incomprehensible computation.

For many applications it is necessary to extend this technique to the case when the functions $p(t), q(t), r(t)$ do not have power-series expansions about the point $t_0$ but rather have singularities of specified nature (for example, Bessel's o.d.e.

$$t^2 x''(t) + t x'(t) + (t^2 - \lambda^2)x(t) = 0,$$

where $\lambda$ is a fixed constant, becomes

$$x''(t) = -\frac{1}{t}\,x'(t) + \left(\frac{\lambda^2}{t^2} - 1\right)x(t),$$

so $p(t)$ and $q(t)$ have singularities at $t = 0$). This is the theory of regular singular points.

11.2.4 Exercises

1. Solve the Cauchy problem $x'(t) = x(t)^2$, $x(t_0) = a$ by power series. What is the radius of convergence of the solution?

2. Write out the explicit recursion formulas for Euler's method for approximating solutions of the Cauchy problem

$$x''(t) = G(t, x(t), x'(t)), \quad x(t_0) = a, \quad x'(t_0) = b,$$

for a second-order o.d.e. by reducing to a first-order system.

3. Prove that the first-order linear o.d.e. $x'(t) = p(t)x(t) + q(t)$ with initial condition $x(t_0) = a$ is solved in closed form by

$$x(t) = a\,u(t) + \int_{t_0}^{t} \frac{u(t)}{u(s)}\,q(s)\,ds$$

where $u(t) = \exp\left(\int_{t_0}^{t} p(s)\,ds\right)$. (Hint: show first $u'(t) = p(t)u(t)$.)

4. Substitute $x(t) = a(\delta - t)^{\lambda_1} + b(\delta - t)^{\lambda_2} + c(\delta - t)^{-1}$ into the o.d.e.

$$x''(t) = \frac{M_1}{\delta - t}\,x'(t) + \frac{M_2}{(\delta - t)^2}\,x(t) + \frac{M_3}{(\delta - t)^3}$$

and derive the conditions on $a, b, c$ that are necessary and sufficient for this to be a solution. Then show (with $M_1 + M_2 - 2 \ne 0$) that any initial conditions $x(0) = d_0$ and $x'(0) = d_1$ can be met by the appropriate choice of $a$ and $b$.

5. Write out a proof that the first-order linear o.d.e. in problem 3 can also be solved by power-series methods if $p(t) = \sum_{k=0}^{\infty} p_k(t - t_0)^k$ and $q(t) = \sum_{k=0}^{\infty} q_k(t - t_0)^k$ converge in $|t - t_0| < \varepsilon$.

6. Show that if $G$ is $C^1$, then there exists an interval $[t_0, T]$ and constants $M_1, M_2$ such that $|y(t) - x(t)| \le M_2\left(e^{M_1|t - t_0|} - 1\right)h$ on $[t_0, T]$ where $y$ is the Euler approximation to the exact solution $x$ with stepsize $h$.

7. Show that

$$J_k(t) = \sum_{j=0}^{\infty} \frac{(-1)^j (t/2)^{k+2j}}{j!\,(k + j)!}$$

has infinite radius of convergence and satisfies Bessel's differential equation $x''(t) + (1/t)x'(t) + (1 - k^2/t^2)x(t) = 0$.

11.3 Vector Fields and Flows*

11.3.1 Integral Curves

We want to consider now some elegant applications of the existence and uniqueness theorem for o.d.e.'s. We want to discuss the notion of vector field, which intuitively is the assignment of an arrow to every point in space. More precisely, a continuous vector field on an open domain $D$ in $\mathbb{R}^n$ is defined to be a continuous function $F : D \to \mathbb{R}^n$, but we want to think of the value $F(x)$ as giving a vector sitting at the point $x$ (pictorially, the arrow joining $x$ to $x + F(x)$, as in Figure 11.3.1).

Page 521: Strichartz_The Way of Analysis 2000

Of course we cannot draw in too many of the arrows at once or they will criss-cross unintelligibly, but we can certainly imagine the vector field as a well-groomed head of hair on a flat-headed person. You can also think of a weather map showing wind velocities. A common way to get a vector field is to take the gradient of a scalar function, $f : D \to \mathbb{R}$ where $f$ is assumed $C^1$, so $F = \nabla f$. This is called a gradient vector field. Not all vector fields are of this form, however (see exercises).

We want to think of the vector field as giving directional instructions: when at $x$, move with velocity $F(x)$. If we obey the instructions we will trace out an integral curve of the vector field, which is defined to be a $C^1$ curve $x : [a, b] \to \mathbb{R}^n$ such that $x'(t) = F(x(t))$ for all $t$ in $[a, b]$. If we think of $x'(t)$ as the tangent vector to the curve, then we are saying that the curve has tangent vectors prescribed by the vector field. (Warning: This is slightly different from, although related to, our interpretation of the o.d.e. $x'(t) = G(t, x(t))$ as prescribing the slope of the graph of the solution.) It is clear that the equation for an integral curve is just a first-order o.d.e. that is already in normal form, so we are ready to apply the existence and uniqueness theorem. We will assume that the vector field $F$ is $C^1$ so that the local theorem can be applied. The o.d.e. for the integral curve is sometimes called an autonomous o.d.e. because it does not depend on $t$ (thought of as time).
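To make the definition of an integral curve concrete, here is a minimal numerical sketch in Python (our choice of language, not part of the text). It follows the instructions "when at $x$, move with velocity $F(x)$" for the hypothetical field $F(x, y) = (x, -y)$, using the classical Runge-Kutta step as an assumed integrator, and compares the result with the exact curve $(x_0 e^t, y_0 e^{-t})$. The names `F`, `rk4_step`, and `integral_curve` are ours.

```python
import math

def F(point):
    """Hypothetical example field F(x, y) = (x, -y); not taken from the text."""
    x, y = point
    return (x, -y)

def rk4_step(field, point, h):
    """Advance one step of size h along x'(t) = field(x(t)) with classical RK4."""
    def shifted(base, direction, scale):
        return tuple(b + scale * d for b, d in zip(base, direction))
    k1 = field(point)
    k2 = field(shifted(point, k1, h / 2))
    k3 = field(shifted(point, k2, h / 2))
    k4 = field(shifted(point, k3, h))
    return tuple(
        p + (h / 6) * (a + 2 * b + 2 * c + d)
        for p, a, b, c, d in zip(point, k1, k2, k3, k4)
    )

def integral_curve(field, start, t_end, steps):
    """Approximate the point x(t_end) on the integral curve with x(0) = start."""
    h = t_end / steps
    point = start
    for _ in range(steps):
        point = rk4_step(field, point, h)
    return point

start = (1.0, 2.0)
approx = integral_curve(F, start, 1.0, 1000)
# Exact integral curve of this particular field: (x0 * e^t, y0 * e^(-t)).
exact = (start[0] * math.exp(1.0), start[1] * math.exp(-1.0))
```

For this field the product $xy$ is constant along each integral curve, so the computed point should stay on the hyperbola $xy = 2$ up to the integration error.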

Figure 11.3.1: [the vector $F(x)$ drawn as an arrow at the point $x$]

Theorem 11.3.1 Let $F : D \to \mathbb{R}^n$ with $D$ an open subset of $\mathbb{R}^n$ be a $C^1$ vector field. Then through every point $y$ of $D$ there passes a unique integral curve $x(t)$ with $x(0) = y$. Furthermore $x(t)$ has a natural maximal domain of definition $I$, an interval of $\mathbb{R}$, such that as $t$ approaches a finite endpoint of $I$ (if $I$ is not all of $\mathbb{R}$) the curve $x(t)$ either becomes unbounded or leaves the domain $D$. Two different integral curves either coincide as subsets of $\mathbb{R}^n$ ($x^{(1)}(t) = x^{(2)}(t + s)$ for some fixed $s$) or are disjoint.

Proof: The existence and uniqueness of $x(t)$ follow from the local existence and uniqueness theorem applied to $x'(t) = F(x(t))$ with $x(0) = y$. As we observed following the proof of that theorem, the solution can be continued beyond $t = a$ if $\lim_{t \to a} x(t)$ exists and is in $D$. Thus the integral curves cannot simply disappear but must move out of $D$ or toward infinity if they don't exist for all $t$.

The fact that two integral curves must either coincide or be disjoint follows from the fact that the o.d.e. is autonomous. If $x(t)$ satisfies the o.d.e., then so does $x(t + s)$ for any fixed $s$, for $(d/dt)\,x(t + s) = x'(t + s) = F(x(t + s))$. Now suppose $x^{(1)}(t)$ and $x^{(2)}(t)$ are two integral curves and $x^{(1)}(t_1) = x^{(2)}(t_2)$ for some values $t_1$ and $t_2$. Then $x^{(3)}(t) = x^{(1)}(t + s)$ for $s = t_1 - t_2$ satisfies the o.d.e. and $x^{(3)}(t_2) = x^{(1)}(t_1) = x^{(2)}(t_2)$. By the uniqueness for the Cauchy problem at $t_2$ we must have $x^{(3)}(t) = x^{(2)}(t)$ and, hence, $x^{(1)}(t + s) = x^{(2)}(t)$ for all $t$. QED

Notice that the integral curves are actually $C^2$; one derivative exists via the o.d.e. and $x''(t) = dF(x(t))\,x'(t)$ by the chain rule, which is continuous because $F$ is assumed $C^1$. You should imagine a picture of the domain $D$ filled with disjoint smooth integral curves, such as the example shown in Figure 11.3.2. Incidentally, nothing prevents the curves from being closed. We can have $x(t_1) = x(0)$ for some $t_1$, in which case the curve repeats periodically with period $t_1$, $x(t + t_1) = x(t)$ for all $t$, again by the uniqueness. The curves may in fact consist of a single point, $x(t) \equiv y$ if $F(y) = 0$.

Next, we observe that we can follow the integral curves to obtain a flow. We think of $t$ as a time parameter, and we move the point $x(0)$ to $x(t)$ in time $t$. Putting all the integral curves together we define the flow $f_t$ by $f_t(y) = x(t)$ where $x$ is the unique integral curve such that $x(0) = y$. The flow is possibly only locally defined, as $x(t)$ may not exist for all values of $t$; but for each fixed $y$ it is defined for some interval of time containing zero. The flow has a local group property, $f_t(f_s(y)) = f_{t+s}(y)$, whenever both sides are defined. The reason is again the uniqueness. If we let $z = f_s(y)$ and start the flow at $z$, say $x^{(1)}(t)$ is the integral curve with $x^{(1)}(0) = z$, then $x(t + s) = x^{(1)}(t)$ because this is true for $t = 0$ and so $f_{s+t}(y) = f_t(z) = f_t(f_s(y))$.
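The local group property of a flow can be seen in a case where everything is explicit. The sketch below is an illustration of ours, not an example from the text: it takes the scalar vector field $F(x) = x^2$ on $\mathbb{R}$, whose integral curve through $y$ is $x(t) = y/(1 - ty)$, so the flow is only locally defined (the curve becomes unbounded as $ty \to 1$). The function name `flow` and the sample values are assumptions.

```python
def flow(t, y):
    """Flow of the scalar field F(x) = x**2: the solution of x' = x**2, x(0) = y.

    Only locally defined: the formula is valid while 1 - t*y stays positive.
    """
    denom = 1.0 - t * y
    if denom <= 0.0:
        raise ValueError("flow not defined: the integral curve has become unbounded")
    return y / denom

y, s, t = 0.5, 0.3, 0.4
left = flow(t, flow(s, y))      # f_t(f_s(y))
right = flow(t + s, y)          # f_{t+s}(y)
```

Here `left` and `right` agree up to rounding, while a call such as `flow(2.0, 0.5)` raises an error because the integral curve through $y = 0.5$ does not exist up to time $t = 2$; this is the sense in which both sides of the group law must be defined.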

Figure 11.3.2:

Conversely, from the flow we can recover the vector field, $F(y) = (\partial/\partial t) f_t(y)|_{t=0}$. This is an immediate consequence of the definition, for if $x(t)$ is the integral curve with $x(0) = y$, then $(\partial/\partial t) f_t(y) = x'(t) = F(x(t))$, so $(\partial/\partial t) f_t(y)|_{t=0} = F(x(0)) = F(y)$. In fact we have actually shown $(\partial/\partial t) f_t(y) = F(f_t(y))$ for any value of $t$ since $x(t) = f_t(y)$.

Even if we don't start with a vector field but are simply given a flow, a function $f_t(x)$ defined for $x$ in $D$ and $t$ in some neighborhood of $0$ (depending on $x$) with $f_t(x)$ also in $D$, satisfying the local group law $f_t(f_s(x)) = f_{t+s}(x)$ whenever both sides are defined, then we can define a vector field $F$ by the equation $F(x) = (\partial/\partial t) f_t(x)|_{t=0}$ (we must assume that $f_t(x)$ is differentiable in $t$, of course). We can again show that $(\partial/\partial t) f_t(x) = F(f_t(x))$, this time by appealing to the local group law: $F(f_t(x)) = (\partial/\partial s) f_s(f_t(x))|_{s=0}$ by definition and $(\partial/\partial s) f_s(f_t(x))|_{s=0} = (\partial/\partial s) f_{s+t}(x)|_{s=0} = (\partial/\partial t) f_t(x)$. From this it follows that $f_t(x)$ for any fixed $x$ is an integral curve for the vector field $F$. In this way we can establish a one-to-one correspondence between vector fields and flows. We will refrain from stating this as a formal theorem because the comparison of smoothness properties of the vector field and the flow requires some more careful work.
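As a small check on the recipe $F(x) = (\partial/\partial t) f_t(x)|_{t=0}$, the following sketch recovers a vector field from a flow by a centered difference quotient. The flow $f_t(y) = y/(1 - ty)$ is a hypothetical example of ours (it is the flow of $F(x) = x^2$), and the step size `h` is an arbitrary assumption.

```python
def flow(t, y):
    """Hypothetical example flow f_t(y) = y / (1 - t*y), the flow of x' = x**2."""
    return y / (1.0 - t * y)

def field_from_flow(y, h=1e-5):
    """Approximate F(y) = d/dt f_t(y) at t = 0 by a centered difference in t."""
    return (flow(h, y) - flow(-h, y)) / (2.0 * h)

def time_derivative(t, y, h=1e-5):
    """Approximate d/dt f_t(y) at a general time t."""
    return (flow(t + h, y) - flow(t - h, y)) / (2.0 * h)

estimate = field_from_flow(0.5)                 # compare with F(0.5) = 0.25
# The identity d/dt f_t(y) = F(f_t(y)) at t = 0.3, y = 0.5, with F(x) = x**2:
lhs = time_derivative(0.3, 0.5)
rhs = flow(0.3, 0.5) ** 2
```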

11.3.2 Hamiltonian Mechanics

We can describe classical mechanics in Hamiltonian form using vector fields and flows. Following the standard physicist's conventions, we let $q = (q_1, \ldots, q_n)$ denote a variable in a domain $D$ in $\mathbb{R}^n$ called position or configuration space and let $p = (p_1, \ldots, p_n)$ denote a variable in a domain in $\mathbb{R}^n$ (usually all of $\mathbb{R}^n$) called momentum space. The domain $D \times \mathbb{R}^n$ in $\mathbb{R}^{2n}$ of all $q, p$ variables is called phase space. Usually the position variables $q$ are determined by the location of certain masses in space, and $p_j$ is the coordinate of the momentum associated with the mass located at $q_j$. For $k$ particles in three-dimensional space we take $n = 3k$.

Now we assume we are given a real-valued function on phase space $H(q, p)$ called the Hamiltonian. This function is interpreted as giving the energy of the system when the positions and momenta are given by $q$, $p$. The Hamiltonian is determined by the physics of the system under study. We are making the special assumption that the Hamiltonian is time-independent. That means that any external influences on the system cannot vary with time.

Each point in phase space represents a possible description of the system at a moment of time. The basic idea of Hamiltonian mechanics (which is, of course, equivalent to Newtonian mechanics) is that the system is completely determined by a point in phase space and that the future evolution of the system is also determined; in fact there is a Hamiltonian vector field determined by $H$, and the flow associated with this vector field, called the Hamiltonian flow, gives the evolution of the system. If the system is at the point $x$ in phase space at time $t_0$, then it will be at the point $f_t(x)$ in phase space at time $t_0 + t$.

The Hamiltonian vector field is given by

$$\left( \frac{\partial H(q, p)}{\partial p},\; -\frac{\partial H(q, p)}{\partial q} \right),$$

so the o.d.e. for the integral curves $x(t) = (q(t), p(t))$ is

$$\frac{dq_j(t)}{dt} = \frac{\partial H}{\partial p_j}(q(t), p(t)), \qquad \frac{dp_j(t)}{dt} = -\frac{\partial H}{\partial q_j}(q(t), p(t)).$$

Note that this is really an o.d.e. because the partial derivatives that appear are taken of a known function $H$; the unknown functions are

$q(t)$ and $p(t)$. This o.d.e. system is known as the Hamilton-Jacobi equations; they play an important role in the modern theory of partial differential equations and in mechanics. Clearly our existence and uniqueness theorem applies if we assume $H$ is $C^2$.

Conservation of energy is an immediate consequence of the Hamilton-Jacobi equations. Since $H(q(t), p(t))$ is the energy of the system at time $t$, we need to show $(d/dt) H(q(t), p(t)) = 0$ to conclude that the energy remains the same. But by the chain rule

$$\frac{d}{dt} H(q(t), p(t)) = \sum_{j=1}^{n} \frac{\partial H}{\partial p_j}(q(t), p(t))\, \frac{dp_j(t)}{dt} + \sum_{j=1}^{n} \frac{\partial H}{\partial q_j}(q(t), p(t))\, \frac{dq_j(t)}{dt} = 0$$

since

$$\frac{\partial H}{\partial p_j}(q(t), p(t))\, \frac{dp_j(t)}{dt} = -\frac{\partial H}{\partial q_j}(q(t), p(t))\, \frac{dq_j(t)}{dt}$$

by the Hamilton-Jacobi equations. It is usually possible to deduce the existence of solutions for all time from the conservation of energy and some special features of $H$ that prevent the solution from becoming unbounded in a finite time with $H$ constant.

A typical example of a Hamiltonian system is a single particle of mass $m$ moving under the influence of a force vector field $F(q)$ that is assumed to be a gradient vector field, $F(q) = -\nabla V(q)$ where $V(q)$ is called the potential energy. The configuration space is all of $\mathbb{R}^3$, and the phase space is $\mathbb{R}^6$. The Hamiltonian for this system is $H(q, p) = (1/2m)|p|^2 + V(q)$, interpreted as a sum of kinetic and potential energy. The Hamilton-Jacobi equations are

$$\frac{dq_j(t)}{dt} = \frac{p_j(t)}{m}$$

and

$$\frac{dp_j(t)}{dt} = -\frac{\partial V}{\partial q_j}(q(t)) = F_j(q(t)).$$

The first set of equations defines the momentum $p_j(t) = m(dq_j(t)/dt)$ as mass times velocity, and the second set of equations can then be interpreted as Newton's $F = ma$ law.

11.3.3 First-Order Linear P.D.E.'s

As a final application of the existence and uniqueness theorem for o.d.e.'s, we show how to obtain all solutions to a first-order linear partial differential equation $\sum_{j=1}^{n} p_j(x)\, \partial f/\partial x_j(x) = q(x) f(x) + r(x)$ where

$p_j$, $q$, $r$ are given real-valued continuous functions on an open domain $D$ in $\mathbb{R}^n$ and $f$ is the unknown function. (No such simple method works for higher order equations.) Let us consider first the case $q \equiv 0$ and $r \equiv 0$. Think of $p = (p_1, \ldots, p_n)$ as a vector field on $D$. Then $\sum_{j=1}^{n} p_j(x)\, \partial f/\partial x_j(x) = 0$ says that $f$ has directional derivative equal to zero along the vector field and so should be constant along the integral curves. Indeed if $x(t)$ is an integral curve for the vector field $p(x)$, then

$$\frac{d}{dt} f(x(t)) = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}(x(t))\, x_j'(t) = \sum_{j=1}^{n} p_j(x(t))\, \frac{\partial f}{\partial x_j}(x(t)),$$

so $f$ being constant along the integral curves is necessary and sufficient for $f$ to be a solution of $\sum_{j=1}^{n} p_j(x)\, \partial f/\partial x_j(x) = 0$. Here we must assume $p$ is $C^1$ in order to have the domain $D$ covered by integral curves.

If we want to consider the more general equation

$$\sum_{j=1}^{n} p_j(x)\, \frac{\partial f}{\partial x_j}(x) = q(x) f(x) + r(x) \qquad (*)$$

we can use the same computation to reduce the problem to a first-order linear o.d.e. along the integral curves as

$$\frac{d}{dt} f(x(t)) = q(x(t)) f(x(t)) + r(x(t)). \qquad (**)$$

A function $f$ is a solution of the p.d.e. (*) if and only if for each integral curve the restriction $f(x(t))$ is a solution of the o.d.e. (**). Of course the o.d.e. (**) is linear and so can be solved in closed form (see exercise 3 of section 11.2). However the o.d.e. for the integral curves $x(t)$ is in general non-linear and rarely can we give an explicit formula for $x(t)$.

11.3.4 Exercises

1. Prove that a $C^1$ vector field $F$ that is a gradient vector field must satisfy the curl $F = 0$ equations $\partial F_j/\partial x_k - \partial F_k/\partial x_j = 0$ for all $j \ne k$.

2. Prove that if $F$ is a $C^1$ vector field on $\mathbb{R}^n$ satisfying curl $F = 0$, then $F$ is a gradient vector field. (Hint: integrate components of $F$ along lines parallel to the axes.)

3. Let $F = \operatorname{grad} \theta$ where $\theta$ is the angular polar coordinate in $\mathbb{R}^2$. Write out an explicit formula for $F$ in terms of rectilinear coordinates. Show that $F$ is single-valued (even though $\theta$ is not) and $C^1$ in $\mathbb{R}^2 \setminus \{0\}$. Show that curl $F = 0$ but $F$ is not a gradient vector field.

4. Suppose $F = \nabla V$ is a gradient vector field and $x(t)$ an integral curve. Show that $V(x(t))$ is an increasing function of $t$.

5. Describe explicitly the integral curves and the flow for the vector field $F(x, y) = (-y, x)$ in $\mathbb{R}^2$.

6. The $n$-body problem is described by the Hamiltonian

$$H(q, p) = \sum_{k=1}^{n} \frac{|p^{(k)}|^2}{2 m_k} - \sum_{j < k} \frac{G\, m_j m_k}{|q^{(j)} - q^{(k)}|}$$

where $q^{(k)} = (q_1^{(k)}, q_2^{(k)}, q_3^{(k)})$ denotes the position, $p^{(k)} = (p_1^{(k)}, p_2^{(k)}, p_3^{(k)})$ denotes the momentum of a particle of mass $m_k$, and $G$ is the universal gravitational constant. Show that the Hamilton-Jacobi equations reduce to Newton's $F = ma$ law and the universal law of gravitation. Show that the total momentum $\sum_{k=1}^{n} p^{(k)}$ and total angular momentum $\sum_{k=1}^{n} q^{(k)} \times p^{(k)}$ (here $\times$ is the vector cross product in $\mathbb{R}^3$) are conserved, that is, constant on integral curves.

7. Find all solutions to the p.d.e. $-y(\partial f/\partial x)(x, y) + x(\partial f/\partial y)(x, y) = f(x, y)$ in the quadrant $x > 0$, $y > 0$.
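As a numerical companion to the conservation of energy argument of section 11.3.2, the following sketch integrates the Hamilton-Jacobi equations for a single degree of freedom and records how far $H(q(t), p(t))$ drifts from its initial value. Everything specific here is an assumption of ours rather than part of the text: the potential $V(q) = -\cos q$, the mass, the initial data, and the use of the classical Runge-Kutta step as integrator.

```python
import math

M = 1.0  # assumed mass of the particle

def V(q):
    """Assumed potential energy (a pendulum-like example, not from the text)."""
    return -math.cos(q)

def dV(q):
    """Derivative of the assumed potential."""
    return math.sin(q)

def H(q, p):
    """Hamiltonian: kinetic plus potential energy."""
    return p * p / (2.0 * M) + V(q)

def hamiltonian_field(q, p):
    """The Hamiltonian vector field (dH/dp, -dH/dq)."""
    return (p / M, -dV(q))

def rk4_step(q, p, h):
    """One classical Runge-Kutta step for the Hamilton-Jacobi equations."""
    k1 = hamiltonian_field(q, p)
    k2 = hamiltonian_field(q + h / 2 * k1[0], p + h / 2 * k1[1])
    k3 = hamiltonian_field(q + h / 2 * k2[0], p + h / 2 * k2[1])
    k4 = hamiltonian_field(q + h * k3[0], p + h * k3[1])
    q_next = q + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    p_next = p + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return q_next, p_next

q, p = 1.0, 0.0
energy_start = H(q, p)
max_drift = 0.0
for _ in range(5000):           # integrate up to t = 5 with step 0.001
    q, p = rk4_step(q, p, 0.001)
    max_drift = max(max_drift, abs(H(q, p) - energy_start))
```

The exact flow conserves $H$; the small value of `max_drift` reflects only the error of the numerical integrator, which is not designed to conserve energy exactly.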

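The method of section 11.3.3 for first-order linear p.d.e.'s can be tried on a case where the integral curves are explicit. The sketch below uses a hypothetical equation of ours, $x\,\partial f/\partial x + y\,\partial f/\partial y = 2f$ away from the origin, so $p(x, y) = (x, y)$, $q \equiv 2$, $r \equiv 0$. The integral curves are the rays $t \mapsto (e^t \cos\alpha, e^t \sin\alpha)$, and along each ray the o.d.e. $u' = 2u$ gives $u(t) = u(0)e^{2t}$. The data `g` on the unit circle and all function names are assumptions.

```python
import math

def solve_along_characteristics(x, y, boundary_value):
    """Solve x f_x + y f_y = 2 f given f on the unit circle (assumed example).

    The integral curve of p(x, y) = (x, y) through (cos a, sin a) is
    t -> (e^t cos a, e^t sin a), and along it u' = 2u, so u(t) = u(0) e^{2t}.
    """
    r = math.hypot(x, y)
    angle = math.atan2(y, x)
    t = math.log(r)                       # time at which the curve reaches radius r
    return boundary_value(angle) * math.exp(2.0 * t)

def g(angle):
    """Hypothetical data prescribed on the unit circle."""
    return 1.0 + 0.5 * math.sin(3.0 * angle)

def f(x, y):
    return solve_along_characteristics(x, y, g)

def pde_residual(x, y, h=1e-5):
    """Centered-difference estimate of x f_x + y f_y - 2 f."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return x * fx + y * fy - 2.0 * f(x, y)

residual = pde_residual(0.7, 1.1)
on_circle = f(math.cos(0.3), math.sin(0.3))   # should reproduce g(0.3)
```

In this example the values of $f$ may be prescribed freely on a curve that crosses each integral curve once, which is how the general solution acquires an arbitrary function of one variable.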

11.4 Summary

11.1 Existence and Uniqueness

Definition An ordinary differential equation (or system of equations), abbreviated o.d.e., of order $m$ is an equation

$$F(t, x(t), x'(t), \ldots, x^{(m)}(t)) = 0$$

where $F$ is a given function defined on an open subset of $\mathbb{R}^{1 + n(m+1)}$ taking values in $\mathbb{R}^k$ and the unknown function $x(t)$ is defined on an interval $I$ of $\mathbb{R}$ taking values in $\mathbb{R}^n$. We say $x(t)$ is a solution of the o.d.e. if the equation holds for every $t$ in $I$.

Example The o.d.e. $x'(t) + x(t)^2 = 0$ has solutions $x(t) = 1/(t - t_0)$ on any interval not containing $t_0$.

Theorem Every o.d.e. of order $m$ may be reduced to an equivalent o.d.e. of order one by introducing new variables equal to the derivatives of $x$ of orders $\le m - 1$.

Definition An o.d.e. of order $m$ is said to be in normal form if it is written $x^{(m)}(t) = G(t, x(t), \ldots, x^{(m-1)}(t))$ where $G$ is a function defined on an open set in $\mathbb{R}^{1 + nm}$ taking values in $\mathbb{R}^n$.

Example The o.d.e. $x'(t) = (3/2)\,x(t)^{1/3}$ has the solution

$$x(t) = \begin{cases} t^{3/2} & \text{if } t \ge 0, \\ 0 & \text{if } t \le 0 \end{cases}$$

as well as $x(t) \equiv 0$, both with $x(0) = 0$, despite the fact that $G(x) = (3/2)x^{1/3}$ is continuous.

Definition For an o.d.e. of order $m$ in normal form, the conditions $x(t_0) = a^{(0)}, x'(t_0) = a^{(1)}, \ldots, x^{(m-1)}(t_0) = a^{(m-1)}$, where each $a^{(j)}$ is a vector in $\mathbb{R}^n$, are called the Cauchy initial value conditions, the vectors $a^{(j)}$ are called the Cauchy data, and the problem of solving the o.d.e. subject to the Cauchy initial value conditions on an interval containing $t_0$ is called the Cauchy problem.
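The non-uniqueness example above, $x'(t) = (3/2)x(t)^{1/3}$ with $x(0) = 0$, is easy to check numerically. The following sketch is only a finite-difference sanity check of ours, with assumed sample points and step size; it confirms that both $t^{3/2}$ and the zero function satisfy the equation for $t > 0$ and share the same Cauchy data.

```python
def G(x):
    """Right-hand side of x' = (3/2) x^(1/3); used here only for x >= 0."""
    return 1.5 * x ** (1.0 / 3.0)

def nonzero_solution(t):
    """x(t) = t^(3/2) for t >= 0 and 0 for t <= 0."""
    return t ** 1.5 if t > 0 else 0.0

def zero_solution(t):
    """The identically zero solution."""
    return 0.0

def residual(solution, t, h=1e-6):
    """Centered-difference estimate of x'(t) - G(x(t))."""
    derivative = (solution(t + h) - solution(t - h)) / (2 * h)
    return derivative - G(solution(t))

sample_times = (0.5, 1.0, 2.0)
residuals = [residual(nonzero_solution, t) for t in sample_times]
zero_residuals = [residual(zero_solution, t) for t in sample_times]
```

Both residual lists are zero up to discretization error, although the two functions differ for every $t > 0$; the Lipschitz condition fails at $x = 0$, which is where the two solutions separate.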

Lemma 11.1.1 A $C^1$ function $x(t)$ solving the Cauchy problem $x'(t) = G(t, x(t))$, $x(t_0) = a$, also solves the integral equation $x(t) = a + \int_{t_0}^{t} G(s, x(s))\, ds$. Conversely, any continuous solution of the integral equation solves the Cauchy problem.

Theorem (Minkowski's Inequality) If $f : [a, b] \to \mathbb{R}^n$ is continuous, then $\left| \int_a^b f(t)\, dt \right| \le \int_a^b |f(t)|\, dt$ where $|\cdot|$ denotes the Euclidean norm.

Lemma 11.1.2 Let $G(t, x)$ be continuous for $t$ in $I$ and $x$ in $\mathbb{R}^n$, taking values in $\mathbb{R}^n$, and let $G$ satisfy the global Lipschitz condition $|G(t, x) - G(t, y)| \le M|x - y|$ for all $x$ and $y$ in $\mathbb{R}^n$ and $t$ in $I$. Then the Contractive Mapping Principle applies to the mapping $Tx(t) = a + \int_{t_0}^{t} G(s, x(s))\, ds$ on the space $C(J)$ of continuous functions on $J$, where $J$ is a sufficiently small interval containing $t_0$.

Theorem 11.1.1 (Global Existence and Uniqueness) Under the hypothesis of the lemma, the Cauchy problem has a unique solution on $I$. The same is true if we can write $I = \bigcup_{j=1}^{\infty} I_j$ and for every subinterval $I_j$ there exists $M_j$ such that $|G(s, x) - G(s, y)| \le M_j |x - y|$ for all $x$ and $y$ in $\mathbb{R}^n$ and all $s$ in $I_j$.

Corollary 11.1.1 The Cauchy problem for the linear o.d.e. $x'(t) = A(t)x(t) + b(t)$ where $A : I \to \mathbb{R}^{n \times n}$ and $b : I \to \mathbb{R}^n$ are continuous has a unique solution on $I$. In fact, the solution with $x(t_0) = 0$ can be written as a convergent perturbation series $x = \sum_{k=0}^{\infty} (D^{-1}A)^k D^{-1} b = D^{-1}b + D^{-1}AD^{-1}b + D^{-1}AD^{-1}AD^{-1}b + \cdots$ where $D^{-1}$ denotes the solution to the problem with $A = 0$, namely $D^{-1}x(t) = \int_{t_0}^{t} x(s)\, ds$.

Theorem 11.1.2 (Local Existence and Uniqueness) Let $G(t, x)$ be defined and continuous for $t$ in $I$ and $|x - a| \le N$, and satisfy $|G(t, x) - G(t, y)| \le M|x - y|$ for $t$ in $I$ and $|x - a| \le N$, $|y - a| \le N$. Then for a sufficiently small interval $J$ containing $t_0$ there exists a solution of the Cauchy problem $x' = G(t, x)$, $x(t_0) = a$, which is unique on as large an interval as the solution exists.

Corollary 11.1.2 Let $G(t, x)$ be a $C^1$ function defined for $t$ in $I$ and $x$ in $D$ (an open set in $\mathbb{R}^n$), taking values in $\mathbb{R}^n$. Then the Cauchy

problem $x' = G(t, x)$, $x(t_0) = a$ for $t_0$ in $I$ and $a$ in $D$ has a unique solution on a sufficiently small interval $J$ containing $t_0$.

Theorem If $G(t, x)$ satisfies the hypotheses of the local existence and uniqueness theorem, then the solution can be extended uniquely until $x(t)$ either becomes unbounded or leaves the domain of $G$.

Theorem Let $G(t, y)$ be a continuous function defined for $t$ in $I$ and $y$ in an open set in $\mathbb{R}^{nm}$. Assume $|G(t, y) - G(t, z)| \le M|y - z|$ either a) for all $t$ in $I$ and all $y$ and $z$ in $\mathbb{R}^{nm}$ or b) for all $t$ in $I$ and $y$ and $z$ in some open ball $B$ in $\mathbb{R}^{nm}$. Then the Cauchy problem $x^{(m)} = G(t, x, x', \ldots, x^{(m-1)})$, $x(t_0) = a^{(0)}, \ldots, x^{(m-1)}(t_0) = a^{(m-1)}$, has a unique solution a) on $I$ or b) on some sufficiently small interval $J$ containing $t_0$ if the Cauchy data $(a^{(0)}, \ldots, a^{(m-1)})$ belongs to $B$.

11.2 Other Methods of Solution*

Definition The Euler approximation to the solution of the Cauchy problem $x'(t) = G(t, x)$, $x(t_0) = a$, associated to the partition $t_0 < t_1 < \cdots < t_n = T$, is the piecewise affine function $y(t)$ on $[t_0, T]$ defined inductively by $y(t) = y_k + G(t_k, y_k)(t - t_k)$ on $[t_k, t_{k+1}]$ where $y_0 = a$ and $y_k = y(t_k)$ for $k \ge 1$.

Theorem 11.2.1 If $G$ and its first derivatives are bounded and $t_k = t_0 + kh$ is a partition with uniform stepsize, then there exist constants $M_1$ and $M_2$ (depending only on the bounds for $G$ and $dG$) such that $|y(t) - x(t)| \le M_2 (e^{M_1 |t - t_0|} - 1) h$ where $y(t)$ is the Euler approximation.

Theorem 11.2.2 (Peano Existence) Let $G(t, x)$ be continuous for $t$ in $I$ and $x$ in a neighborhood of $a$. Then there exists a solution (not necessarily unique) to the Cauchy problem $x' = G(t, x)$, $x(t_0) = a$, on a sufficiently small interval $J$ containing $t_0$. The solution is a limit of Euler approximations.

Definition If $\sum_{k=0}^{\infty} a_k t^k$ and $\sum_{k=0}^{\infty} A_k t^k$ are formal power series, we say $\sum A_k t^k$ is a majorant of $\sum a_k t^k$ if $|a_k| \le A_k$ for every $k$.
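The definition of the Euler approximation given above translates directly into a short program. The sketch below computes the nodes $y_k$ for a uniform partition and applies them to the hypothetical test problem $x' = x$, $x(0) = 1$ (our choice, with exact solution $e^t$), to illustrate the error bound of Theorem 11.2.1, which is proportional to the stepsize $h$. The function names are ours.

```python
import math

def euler_nodes(G, t0, a, T, n):
    """Return the nodes (t_k, y_k) of the Euler approximation with uniform stepsize."""
    h = (T - t0) / n
    nodes = [(t0, a)]
    t, y = t0, a
    for k in range(n):
        y = y + G(t, y) * h          # y_{k+1} = y_k + G(t_k, y_k)(t_{k+1} - t_k)
        t = t0 + (k + 1) * h
        nodes.append((t, y))
    return nodes

def G(t, x):
    """Hypothetical test problem x' = x, x(0) = 1, with exact solution e^t."""
    return x

def error_at_T(n):
    """Error of the Euler approximation at T = 1 using n steps."""
    t, y = euler_nodes(G, 0.0, 1.0, 1.0, n)[-1]
    return abs(y - math.exp(1.0))

coarse = error_at_T(100)    # stepsize h = 0.01
fine = error_at_T(200)      # stepsize h = 0.005
ratio = coarse / fine       # close to 2 when the error is proportional to h
```

Halving the stepsize roughly halves the error in this example, as one expects when the error behaves like a constant multiple of $h$.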

Lemma If $\sum A_k t^k$ is a majorant of $\sum a_k t^k$ and $\sum A_k t^k$ converges in $|t| < \varepsilon$, then so does $\sum a_k t^k$.

Lemma 11.2.1 (Majorant Lemma) Suppose $P(t) = \sum_{k=0}^{\infty} P_k t^k$, $Q(t) = \sum_{k=0}^{\infty} Q_k t^k$, $R(t) = \sum_{k=0}^{\infty} R_k t^k$ are formal power series that are majorants of $p(t) = \sum_{k=0}^{\infty} p_k t^k$, $q(t) = \sum_{k=0}^{\infty} q_k t^k$, $r(t) = \sum_{k=0}^{\infty} r_k t^k$, respectively. Suppose $c_0$ and $c_1$ are given and $\sum_{k=0}^{\infty} c_k t^k$ is defined by the equations

$$(k + 1)(k + 2)\, c_{k+2} = \sum_{j=0}^{k} \left( p_j (k - j + 1)\, c_{k-j+1} + q_j c_{k-j} \right) + r_k,$$

and suppose $d_0$ and $d_1$ are given and $\sum_{k=0}^{\infty} d_k t^k$ is defined by

$$(k + 1)(k + 2)\, d_{k+2} = \sum_{j=0}^{k} \left( P_j (k - j + 1)\, d_{k-j+1} + Q_j d_{k-j} \right) + R_k.$$

If $|c_0| \le d_0$ and $|c_1| \le d_1$, then $\sum d_k t^k$ is a majorant of $\sum c_k t^k$.

Theorem 11.2.3 Let $p$, $q$, $r$ have convergent power series in $|t| < \varepsilon$. Then the Cauchy problem $x'' = px' + qx + r$, $x(0) = c_0$, $x'(0) = c_1$, has a solution given by a power series $x(t) = \sum_{k=0}^{\infty} c_k t^k$ convergent in $|t| < \varepsilon$.

11.3 Vector Fields and Flows*

Definition A continuous vector field on an open set $D$ in $\mathbb{R}^n$ is a continuous function $F : D \to \mathbb{R}^n$. It is said to be a gradient vector field if there exists a $C^1$ function $f : D \to \mathbb{R}$ such that $F = \nabla f$. An integral curve of a vector field $F$ is a $C^1$ curve $x : [a, b] \to \mathbb{R}^n$ such that $x'(t) = F(x(t))$ for all $t$ in $[a, b]$.

Theorem 11.3.1 If a vector field $F$ is $C^1$, then there exist unique integral curves satisfying $x(0) = y$ for any $y$ in $D$. Two integral curves either coincide (as subsets of $D$) or are disjoint.

Definition A flow on an open set $D$ in $\mathbb{R}^n$ is a continuous function $f_t(x)$ defined for $x$ in $D$ and $t$ in a neighborhood of $0$ (depending on $x$)

taking values in $D$, satisfying $f_t(f_s(x)) = f_{t+s}(x)$ whenever both sides are defined. The flow associated to a $C^1$ vector field $F$ is given by $f_t(y) = x(t)$ where $x(t)$ is the integral curve satisfying $x(0) = y$. The vector field associated to a $C^1$ flow $f_t$ is $F(x) = (\partial/\partial t) f_t(x)|_{t=0}$.

Example (Hamiltonian Mechanics) Let $q$ and $p$ denote variables in $\mathbb{R}^n$ (interpreted as position and momentum), and let $H(q, p)$ be a real-valued function, called the Hamiltonian (interpreted as energy). The vector field $(\partial H(q, p)/\partial p, -\partial H(q, p)/\partial q)$ is called the Hamiltonian vector field, and the o.d.e.'s for an integral curve comprise the Hamilton-Jacobi equations

$$\frac{dq_j}{dt} = \frac{\partial H}{\partial p_j}(q, p), \qquad \frac{dp_j}{dt} = -\frac{\partial H}{\partial q_j}(q, p).$$

The Hamiltonian is constant on integral curves (conservation of energy). If $H(q, p) = (1/2m)|p|^2 + V(q)$, the Hamilton-Jacobi equations are equivalent to Newton's laws of motion for a particle of mass $m$ moving under the influence of a force $F = -\nabla V$ and $V$ is interpreted as potential energy.

Theorem Let $p_j$ for $j = 1, \ldots, n$ be $C^1$ and $q$ and $r$ be continuous functions on an open set $D$ in $\mathbb{R}^n$. Then $f(x)$ is a solution of the p.d.e.

$$\sum_{j=1}^{n} p_j(x)\, \frac{\partial f}{\partial x_j}(x) = q(x) f(x) + r(x)$$

if and only if for each integral curve $x(t)$ of the vector field $p$, the restriction $f(x(t))$ is a solution of the o.d.e. $u'(t) = q(x(t))u(t) + r(x(t))$.
Chapter 12

Fourier Series

12.1 Origins of Fourier Series

12.1.1 Fourier Series Solutions of P.D.E.'s

Fourier series were first stumbled upon by Daniel Bernoulli in the 1750s while studying the equation of a vibrating string, which is the partial differential equation (p.d.e.)

$$\frac{\partial^2 y(x, t)}{\partial t^2} = c^2\, \frac{\partial^2 y(x, t)}{\partial x^2}.$$

Here $y(x, t)$ denotes the vertical displacement of a string at the horizontal point $x$ at time $t$ for $0 \le x \le L$, as in Figure 12.1.1.

Figure 12.1.1: [axes $x$ and $y$, with the string displacement $y(x, t)$]

Also $L$ is the length of the string and the constant $c$ is determined by the density and tension of the string ($c^2 = \text{tension}/\text{density}$ for appropriate units). We will not dwell on the derivation of this equation, which is a combination of Newtonian mechanics, simplifying assumptions, a limiting argument, and a little hocus pocus. We can give a plausible explanation of the equation, however, as follows: $\partial^2 y/\partial t^2$ is the acceleration of a point on the string located horizontally at $x$. According to Newtonian mechanics, the acceleration should be proportional to the force acting on that point on the string. This force is caused by the tension on the string, which acts to straighten it out. But $\partial^2 y/\partial x^2$ measures the concavity of the string shape and has the correct sign to impart an acceleration in the direction of straightening out the string, as shown in Figure 12.1.2.

Figure 12.1.2:

In the problem considered by Bernoulli, the string is held down at the endpoints, yielding the boundary conditions $y(0, t) = y(L, t) = 0$ for all $t$. Also the string is initially ($t = 0$) held at rest in some position and then is released to follow the time evolution dictated by the equation (as in all Newtonian mechanics, the initial position and velocity are supposed to determine the future evolution of the system). This means the initial conditions

$$y(x, 0) = f(x), \qquad \frac{\partial y}{\partial t}(x, 0) = 0$$

are imposed, where $f(x)$ is the initial shape of the string (for consistency we need $f(0) = f(L) = 0$).

Bernoulli observed that there were some particularly simple solutions to the p.d.e., namely

$$y(x, t) = \sin\frac{k\pi}{L}x\, \cos\frac{ck\pi}{L}t$$

where $k$ is a positive integer, that satisfy the boundary conditions and the initial conditions with $f(x) = \sin(k\pi/L)x$. These solutions have a musical interpretation in terms of the sound produced by the string; they give the overtone series for the fundamental tone ($k = 1$). Since the p.d.e. is linear, linear combinations of these simple solutions are also solutions. In Bernoulli's day it was presumed that infinite linear combinations would also satisfy the p.d.e., so

$$y(x, t) = \sum_{k=1}^{\infty} a_k \sin\frac{k\pi}{L}x\, \cos\frac{ck\pi}{L}t$$

would satisfy the p.d.e. and the initial conditions for

$$f(x) = \sum_{k=1}^{\infty} a_k \sin\frac{k\pi}{L}x.$$

We can in fact verify that this is the case if we make suitable assumptions on the coefficients to justify differentiating the infinite series term-by-term (see exercise set 12.1.4).

Bernoulli went further, however, and claimed that in this manner he obtained all solutions. This claim can be made plausible in terms of the musical interpretation, but Bernoulli rested his claim on the vague and dubious grounds that with an infinite number of variables (the $a_k$) at his disposal, he could do anything at all.

Euler immediately attacked Bernoulli's claim, pointing out that it would imply that an arbitrary function $f(x)$ on $0 \le x \le L$ with $f(0) = f(L) = 0$ could be expanded in an infinite sine series

$$f(x) = \sum_{k=1}^{\infty} a_k \sin\frac{k\pi}{L}x.$$

Euler thought this was patently absurd, since the sine series possessed special properties, such as being an odd function in $x$ and periodic of period $2L$, which a function such as $f(x) = x(L - x)$ does not possess.
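Euler's test case $f(x) = x(L - x)$ can be explored numerically. The sketch below is an experiment of ours, not an argument from the text: it computes sine coefficients with the formula $a_k = (2/L)\int_0^L f(x)\sin(k\pi x/L)\,dx$, which the text attributes to Fourier later in this section, approximating the integral by the midpoint rule, and then evaluates a partial sum of the sine series at a point of $[0, L]$. The choices $L = 1$, the number of sample points, and the number of terms are assumptions.

```python
import math

L = 1.0  # assumed string length

def f(x):
    """Euler's example of a function vanishing at both endpoints."""
    return x * (L - x)

def sine_coefficient(k, samples=2000):
    """Approximate (2/L) * integral_0^L f(x) sin(k pi x / L) dx by the midpoint rule."""
    h = L / samples
    total = 0.0
    for i in range(samples):
        x = (i + 0.5) * h
        total += f(x) * math.sin(k * math.pi * x / L)
    return 2.0 / L * total * h

def partial_sum(x, terms):
    """Sum of the first `terms` terms of the sine series at the point x."""
    return sum(
        sine_coefficient(k) * math.sin(k * math.pi * x / L)
        for k in range(1, terms + 1)
    )

a1 = sine_coefficient(1)    # exact value is 8 L^2 / pi^3
a2 = sine_coefficient(2)    # vanishes by symmetry about x = L/2
gap = abs(partial_sum(0.3, 15) - f(0.3))
```

On the interval $0 \le x \le L$ the partial sums approach $f$ quite rapidly, even though outside that interval the series represents the odd, $2L$-periodic extension rather than the polynomial $x(L - x)$.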

We will return to discuss this objection later. Of course, Euler had an axe to grind: he (and D'Alembert, independently) had found a completely different solution to the problem, namely

$$y(x, t) = \frac{1}{2}\left( \tilde{f}(x + ct) + \tilde{f}(x - ct) \right)$$

where $\tilde{f}$ denotes the extension of $f$ to the whole line, which is odd, $\tilde{f}(-x) = -\tilde{f}(x)$, and periodic of period $2L$, $\tilde{f}(x + 2L) = \tilde{f}(x)$. It is a routine exercise to verify that Euler's solution actually works. Euler pointed out that Bernoulli's solution was a special case of his, which is clear from the trigonometric identity

$$\sin\frac{k\pi}{L}x\, \cos\frac{ck\pi}{L}t = \frac{1}{2}\left( \sin\frac{k\pi}{L}(x + ct) + \sin\frac{k\pi}{L}(x - ct) \right).$$

Bernoulli was not able to answer Euler well, except to repeat his lame argument that the equation

$$f(x) = \sum_{k=1}^{\infty} a_k \sin\frac{k\pi}{L}x$$

was like an algebraic system of an infinite number of linear equations in an infinite number of unknowns. One of the most significant weaknesses in his argument was that he could not produce a formula for the coefficients $a_k$ in terms of $f$. This formula was actually first discovered by Euler several years later in the course of an unrelated investigation (Euler had cosines instead of sines), but as Euler was predisposed to reject Bernoulli's claim he never pointed out the possible relevance. Thus did the two of them botch the opportunity of developing Fourier series a full half century before Fourier.

Before leaving this interesting historic episode behind, it is worth commenting on one line of thought that Bernoulli might have pursued to great advantage. If we modify the vibrating string equation to include a damping term for friction,

$$\frac{\partial^2 y}{\partial t^2} = c^2\, \frac{\partial^2 y}{\partial x^2} - a\, \frac{\partial y}{\partial t},$$

which is plausible on physical grounds (no real string vibrates forever like Bernoulli's simple solutions), then Bernoulli's method of solution

is easily modified (see exercise set 12.1.4), but Euler's fails completely. If there had been as many mathematicians in those days as there are today, no doubt some graduate student looking for a thesis topic would have observed this, which might have been enough to push Euler and Bernoulli onto the right track.

Fourier came to consider Fourier series by way of a related p.d.e., the heat equation

$$\frac{\partial u}{\partial t}(x, t) = c^2\, \frac{\partial^2 u}{\partial x^2}(x, t)$$

where $u(x, t)$ represents the temperature of a one-dimensional object (think of an insulated thin bar) at the point $x$ at time $t$ where $c$ is a constant that depends on the physical properties of the object. Imposing boundary conditions $u(0, t) = u(L, t) = 0$ (the physical interpretation of this kind of boundary condition is that the ends of the bar are attached to reservoirs held at a constant temperature, which is taken to be zero) and initial conditions $u(x, 0) = f(x)$ for $0 \le x \le L$, where $f(x)$ is the initial temperature distribution, Fourier found the simple solution (for $t > 0$)

$$u(x, t) = \exp\left( -\frac{c^2 k^2 \pi^2 t}{L^2} \right) \sin\frac{k\pi}{L}x$$

if $f(x) = \sin(k\pi/L)x$ and hence

$$u(x, t) = \sum_{k=1}^{\infty} a_k \exp\left( -\frac{c^2 k^2 \pi^2 t}{L^2} \right) \sin\frac{k\pi}{L}x$$

if $f(x) = \sum_{k=1}^{\infty} a_k \sin(k\pi/L)x$ by the linearity of the equation. Fourier went farther than Bernoulli and produced the formula

$$a_k = \frac{2}{L} \int_0^L f(x) \sin\frac{k\pi}{L}x\, dx$$

for the coefficients in terms of $f$. This is easily verified by multiplying by $\sin(j\pi/L)x$ and integrating term-by-term (assuming $\sum_{k=1}^{\infty} |a_k|$ is finite, so the series converges uniformly). We then need the elementary calculation

$$\int_0^L \sin\frac{k\pi}{L}x\, \sin\frac{j\pi}{L}x\, dx$$

$$= \frac{1}{2} \int_0^L \left[ \cos\frac{(k - j)\pi}{L}x - \cos\frac{(k + j)\pi}{L}x \right] dx = \begin{cases} L/2 & \text{if } k = j, \\ 0 & \text{if } k \ne j. \end{cases}$$

(Curiously, neither Euler nor Fourier presented this simple derivation immediately but rather first gave long-winded and largely incomprehensible derivations using power-series expansions and clever but invalid manipulations, presumably the method they used to discover the result, and then added the simple computation above as an afterthought.)

Armed with a formula for the coefficients, Fourier was able to make a more convincing claim that every function $f(x)$ on $0 \le x \le L$ with $f(0) = f(L) = 0$ has a sine-series expansion. Actually this was quite a bold claim for his time, since neither the concept of function nor the meaning of the integral in the formula for $a_k$ with a general function $f$ (in Fourier's time the integral was usually defined as an anti-derivative) were at all clear. In fact it was the interest in understanding Fourier's claim that helped spur a good deal of the work we have discussed in earlier chapters. Fourier devoted a good deal of effort to computing the sine-series expansions of explicit functions and examining the convergence of the partial sums. This "empirical" evidence helped bolster his claim.

12.1.2 Spectral Theory

Before we begin the detailed study of Fourier series, it is worth remarking on some of the special features of the trigonometric functions that enter into the expansions. Bluntly stated, we want to know why we should care about Fourier series and into what general framework they fit. It turns out that there are two distinct perspectives available, one leading to spectral theory and the other to harmonic analysis.

If we ask why the sines appeared in the simple solutions to the vibrating string or heat equation, the answer can be given in terms of the "operator" $\partial^2/\partial x^2$ on the space of $C^2$ functions on $0 \le x \le L$ with boundary conditions $f(0) = f(L) = 0$. The sines are the solutions

( d2 I ) ( 82g)dx2,g = 1, 8x2for I and 9 in V. This is easily seen by integrating by parts twice, usingthe boundary conditions 1(0) = I(L) = Oand g(O) = g(L) = O (from Iand 9 being in V) to obtain the vanishing of the boundary terms in theintegration by parts formula (see exercise set 12.1.4). The condition(Al, g) = (1, Ag) for a linear operator on an inner product space iscalled symmetry, and it is a direct generalization of the symmetry oí

lJ2 lJ2 lJ2dx2 (al(x) + bg(x)) = a dx2 I(x) + bdx2g(x).

Since the o.d.e. has explicit solutions for every possible ). (real or com­plex), it is simple to verify, because of the boundary conditions, that theonly solutions to the eigenvalue problem are multiples of sin( kat] L)x,which correspond to ). = _k2rr2 / L2 (see exercise set 12.1.4). Theeigenvectors 1/Jthen yield simple solutions of the aboye problems (andmany more) by separation oí variables: try u(x, t) = g(t)1/J(x) in theheat equation and obtain g'(t) = c2 ).g(t), hence 9 is a multiple of exp(c2).t).

So we can restate Fourier's claim as follows: every function has anexpansion in eigenvectors for the operator d2 / dx2 on the space V oí C2functions on O~ x ~ L satisfying 1(0) = I(L) = O, called the Dirichletboundary conditions. (The fact that the range of d2 / dx2 on V is notequal to V creates some technical problems, but the eigenvectors willbe in V as a consequence of the eigenvalue equation.) We note onespecial property oí this operator. If (1, g) denotes the inner productL L --Jo I(x)g(x) dx (or Jo I(x)g(x) dx if we want to allow complex-valued

functions), then

This is completely analogous to the eigenvalue equation A1/J= ).1/JwhereA is an n x n matrix and 1/Ja vector in R". In the present case, in placeof A we take tP / dx2 and in place of lRnwe take 1/Jin the vector spaceoí C2 functions on O :5 x :5 L satisfying 1/J(O)= 1/J(L) = O. The spaceoí C2 functions is an infinite-dimensional vector space, but lJ2 / dx2 islinear:

tPdx21/J(x) = ).1/J(x) on O~ x ~ L, 1/J(0)= 1/J(L) = O.

(eigenvectors) of the eigenvalue problem

5211!.1 Origins 01Fourier Series

an $n \times n$ matrix considered as a linear operator on the inner product space $\mathbb{R}^n$. Recall that for a symmetric matrix we proved a spectral theorem asserting that $\mathbb{R}^n$ has an orthonormal basis of eigenvectors. The Fourier sine expansion can be expressed as a generalization of this spectral theorem, and both are special cases of a very general spectral theorem of von Neumann.

In the spectral theorem for symmetric matrices, the eigenvectors can be taken to be orthogonal. This is true in general for symmetric operators. If $A\psi_1 = \lambda_1\psi_1$ and $A\psi_2 = \lambda_2\psi_2$ for $\lambda_1 \neq \lambda_2$, then

$$(A\psi_1, \psi_2) = (\lambda_1\psi_1, \psi_2) = \lambda_1(\psi_1, \psi_2)$$

and

$$(\psi_1, A\psi_2) = (\psi_1, \lambda_2\psi_2) = \lambda_2(\psi_1, \psi_2);$$

so the symmetry $(A\psi_1, \psi_2) = (\psi_1, A\psi_2)$ implies $\lambda_1(\psi_1, \psi_2) = \lambda_2(\psi_1, \psi_2)$, hence $(\psi_1, \psi_2) = 0$ if $\lambda_1 \neq \lambda_2$. This is just

$$\int_0^L \sin\frac{k\pi}{L}x\,\sin\frac{j\pi}{L}x\,dx = 0$$

if $j \neq k$ in the special case under consideration, which is the key formula for finding the Fourier coefficients. In the general case, if $f = \sum c_k\psi_k$ where the $\psi_k$ are eigenvectors with distinct eigenvalues, then

$$c_k = \frac{(f, \psi_k)}{(\psi_k, \psi_k)}$$

(if there exist linearly independent eigenvectors with the same eigenvalue, then we must use the Gram-Schmidt orthogonalization process to make them orthogonal to each other).

Another closely related example is the operator $d^2/dx^2$ on the vector space $V_1$ of $C^2$ functions satisfying the boundary conditions $f'(0) = f'(L) = 0$ (called the Neumann boundary conditions). Notice that we have not changed the formula for the operator but that we have changed the vector space of functions on which it operates. We take the same formula for the inner product as before, $(f, g) = \int_0^L f(x)g(x)\,dx$. Once again we find that $d^2/dx^2$ is symmetric, $(d^2/dx^2 f, g) = (f, d^2/dx^2 g)$, by integration by parts, this time using the Neumann boundary conditions to show that the boundary terms vanish. In case you are beginning to
suspect that $d^2/dx^2$ will be symmetric for any boundary conditions, this turns out to be false (see exercise set 12.1.4).

The eigenvectors for $d^2/dx^2$ on $V_1$ are $\cos(k\pi/L)x$ with eigenvalue $-k^2\pi^2/L^2$ for $k = 0, 1, \ldots$. From the symmetry we can conclude immediately that $\int_0^L \cos(k\pi/L)x\,\cos(j\pi/L)x\,dx = 0$ if $j \neq k$. Using the standard trigonometric identities as in the sine case we could obtain the same result, and also

$$\int_0^L \left(\cos\frac{k\pi}{L}x\right)^2 dx = \begin{cases} L & \text{if } k = 0,\\ L/2 & \text{if } k \neq 0. \end{cases}$$

Thus if

$$f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty} a_k\cos\frac{k\pi}{L}x,$$

then

$$a_k = \frac{2}{L}\int_0^L f(x)\cos\frac{k\pi}{L}x\,dx$$

(the choice of the factor 1/2 in front of $a_0$ is purely conventional, so the formula for $a_k$ is the same for $k = 0$ as for $k \neq 0$). This is called the Fourier cosine expansion, and Fourier also conjectured that every function $f(x)$ has such an expansion. Fourier used it to solve the heat equation with Neumann boundary conditions

$$\frac{\partial u}{\partial x}(0, t) = \frac{\partial u}{\partial x}(L, t) = 0,$$

which can be interpreted physically as saying the ends of the bar are insulated, hence no heat flows across them.

We can also combine the Fourier sine and cosine expansions by doubling the interval. If $f(x)$ is defined on $[-L, L]$, then we can write $f(x) = f_e(x) + f_o(x)$ where $f_e(x) = \frac{1}{2}(f(x) + f(-x))$ is even and $f_o(x) = \frac{1}{2}(f(x) - f(-x))$ is odd about the point $x = 0$. Since the cosines are even and the sines odd, if $f_e(x)$ restricted to $[0, L]$ has a Fourier cosine series

$$\frac{a_0}{2} + \sum_{k=1}^{\infty} a_k\cos\frac{k\pi}{L}x,$$

then the same series will give $f_e(x)$ on $[-L, L]$ and, similarly, if $f_o(x)$ restricted to $[0, L]$ has a Fourier sine expansion $\sum_{k=1}^{\infty} b_k\sin(k\pi/L)x$,
then the same series will give $f_o(x)$ on $[-L, L]$. Adding, we have the full Fourier series

$$f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty}\left(a_k\cos\frac{k\pi}{L}x + b_k\sin\frac{k\pi}{L}x\right)$$

where

$$a_k = \frac{1}{L}\int_{-L}^{L} f(x)\cos\frac{k\pi}{L}x\,dx$$

and

$$b_k = \frac{1}{L}\int_{-L}^{L} f(x)\sin\frac{k\pi}{L}x\,dx,$$

the factor 2 having disappeared because we extended the integration to the full interval (notice also that

$$\int_{-L}^{L} f_o(x)\cos\frac{k\pi}{L}x\,dx = 0 \quad\text{and}\quad \int_{-L}^{L} f_e(x)\sin\frac{k\pi}{L}x\,dx = 0$$

because the integrands are odd functions).

Now recall that for the Fourier sine series we were assuming that $f(0) = f(L) = 0$. What does this mean for the odd function $f_o(x)$ above? The condition $f_o(0) = 0$ is automatic from the definition $f_o(x) = \frac{1}{2}(f(x) - f(-x))$, but the condition $f_o(L) = 0$ is the same as $f(L) = f(-L)$ (for the even part the condition $f_e(L) = f_e(-L)$ is automatic). We can interpret this as a periodicity condition if we extend $f$ to be periodic of period $2L$. This is a natural thing to do since the sines and cosines in the series already are periodic of period $2L$. Thus we define $f(x)$ on the whole line by saying $f(x + 2L) = f(x)$ for all $x$. The condition $f(-L) = f(L)$ is then necessary and sufficient for this condition to be fulfilled. We will usually discuss Fourier series in terms of the periodic extension; thus the condition that $f$ be continuous means that $f$ is continuous on $[-L, L]$ and $f(L) = f(-L)$, the condition that $f$ be $C^1$ means that $f$ is $C^1$ on $[-L, L]$ and $f(L) = f(-L)$ and $f'(L) = f'(-L)$, and so on.

Thus Fourier's conjecture is that every continuous function $f(x)$ on the line that is periodic of period $2L$ has a full Fourier series expansion. This follows from the two conjectures concerning Fourier sine and cosine series, and it implies them by considering even and odd extensions across $x = 0$ (see exercise set 12.1.4). The conjecture is actually not
quite correct as we will see in the next section, but it can be modified in a number of ways to give a correct theorem. Notice already that we have successfully countered Euler's objection that the function $f(x) = x(L - x)$ cannot have a Fourier sine-series expansion because it is not odd and periodic. Fourier's claim is only that

$$x(L - x) = \sum_{k=1}^{\infty} a_k\sin\frac{k\pi}{L}x$$

for $0 \le x \le L$. For other values of $x$ the sine series converges to the function $\tilde{f}(x)$, which is obtained by extending $x(L - x)$ to be odd and periodic of period $2L$, but $\tilde{f}(x) \neq x(L - x)$ outside the interval $[0, L]$. Euler could not conceive of this possibility because he was experienced in dealing with power series and knew that one cannot have a power series converge to different analytic expressions on different intervals. Euler naively assumed that trigonometric series would behave like power series; in fact nothing could be further from the truth!

12.1.3 Harmonic Analysis

In dealing with the full Fourier series we can simplify matters considerably by introducing complex numbers. Recall the Euler relations $e^{ix} = \cos x + i\sin x$, $\cos x = (e^{ix} + e^{-ix})/2$, and $\sin x = (e^{ix} - e^{-ix})/2i$. Using these it is easy to show that

$$\frac{a_0}{2} + \sum_{k=1}^{\infty}\left(a_k\cos\frac{k\pi}{L}x + b_k\sin\frac{k\pi}{L}x\right) = \sum_{n=-\infty}^{\infty} c_n e^{(in\pi/L)x}$$

where $c_n = (1/2L)\int_{-L}^{L} f(x)e^{-(in\pi/L)x}\,dx$ (see exercise set 12.1.4). The condition that $f(x)$ be real-valued is then equivalent to the condition $c_{-n} = \overline{c_n}$ on the Fourier coefficients. Note then that the functions $e^{(in\pi/L)x}$ are exactly the eigenfunctions of the operator $d/dx$ on the space of $C^1$ functions with the periodic boundary conditions $f(-L) = f(L)$ (note that the operator $d/dx$ is now skew-symmetric instead of symmetric but it can be made symmetric by multiplying by $\pm i$).

Now we are in a position to discuss the second interpretation of Fourier series. Let $T_y$ denote the operation of translation by $y$, $T_y f(x) =$
$f(x + y)$. We can apply this to any function on the line, but for this discussion we restrict attention to periodic functions of period $2L$. We can relate the operators $T_y$ and the operator $d/dx$ by the formula $d/dx = (\partial/\partial y)T_y|_{y=0}$. In a sense that we need not make precise, $d/dx$ is a sort of infinitesimal translation, and conversely the translations may be obtained by integrating $d/dx$. The point of this vague discussion is the following: if the functions $e^{(ik\pi/L)x}$ are eigenvectors for $d/dx$, they should also be eigenvectors for $T_y$. Having said this, it is immediately obvious that they are:

$$T_y e^{(ik\pi/L)x} = e^{(ik\pi/L)(x+y)} = e^{(ik\pi/L)y}\,e^{(ik\pi/L)x}.$$

It is not hard to show (see exercise set 12.1.4) that (up to constant multiples) these are the only periodic functions that are eigenvectors for all the operators $T_y$.

We can go a little further with the eigenvalue equation for $T_y$ and write it as

$$\psi(x + y) = \psi(x)\psi(y).$$

This equation, called the character identity, has a beautiful symmetry. It is not hard to show that any complex-valued $C^1$ solution must be of the form $\exp(\lambda x)$ where $\lambda$ is a complex number (see exercise set 12.1.4), and with more work it is possible to show the same is true under the hypothesis that $\psi$ is continuous. The condition that $\psi$ be periodic of period $2L$ then implies $\lambda = ik\pi/L$ for some integer $k$. Notice that the character identity determines $\exp(ik\pi/L)x$ exactly, whereas the eigenvalue equation allows an arbitrary multiple of $\exp(ik\pi/L)x$.

The significance of the character identity is that it has a group-theoretic interpretation. We think of the real numbers mod $2L$ as forming a commutative group under addition (the elements of this group are the sets $\{x + 2Lk\}$ as $k$ varies over the integers, and $\{x + 2Lk\} + \{y + 2Lk\} = \{x + y + 2Lk\}$ is well defined). A periodic function of period $2L$ may be thought of as a function on this group. The character identity then says that $\psi$ is a homomorphism of this group into the multiplicative group of the non-zero complex numbers. We note that in this case all the solutions of the character identity $e^{(ik\pi/L)x}$ take on only complex values of absolute value one, $|\psi(x)| = 1$. Recall that the complex numbers of absolute value one, $|z| = 1$, can be written $z = e^{i\theta}$
for $\theta$ real and $\theta$ is determined mod $2\pi$. Since $e^{i\theta_1}e^{i\theta_2} = e^{i(\theta_1 + \theta_2)}$, we see that the complex numbers of absolute value one under multiplication give an isomorphic model of the group we are considering for $L = \pi$. Let us denote this group $T$.

We can summarize the above discussion (for $L = \pi$) by saying that the functions $e^{ikx}$ are continuous homomorphisms of the group $T$ into itself. It turns out that these are the only ones (see exercise set 12.1.4, number 18 for a slightly weaker result). They are called the characters of $T$. Fourier's conjecture then says that an arbitrary complex-valued function on $T$ is an infinite linear combination of characters. This is a special case of what is called harmonic analysis (or sometimes Fourier analysis). There are far-reaching generalizations to other groups. If a group $G$ is commutative we again consider characters, which are defined to be homomorphisms of $G$ into $T$ (usually there is a metric or more generally a topology on $G$, and the characters are assumed continuous). If a group is non-commutative, then the character identity must be further generalized to the theory of group representations. In the case where the group $G$ is the additive group of the line, the harmonic analysis leads to the theory of Fourier transforms.

If we had not expanded our perspective to the complex numbers, the group-theoretic significance of the sines and cosines would not be so apparent. Nevertheless, there is still something we can say. Look at the two-dimensional vector space $\{a\cos(k\pi/L)x + b\sin(k\pi/L)x\}$ of linear combinations of the two functions $\cos(k\pi/L)x$ and $\sin(k\pi/L)x$. Then applying $T_y$ to any function in the space produces another function in this space:

$$\begin{aligned}
T_y\left(a\cos\frac{k\pi}{L}x + b\sin\frac{k\pi}{L}x\right)
&= a\cos\frac{k\pi}{L}(x + y) + b\sin\frac{k\pi}{L}(x + y)\\
&= a\left(\cos\frac{k\pi}{L}x\cos\frac{k\pi}{L}y - \sin\frac{k\pi}{L}x\sin\frac{k\pi}{L}y\right)\\
&\quad + b\left(\sin\frac{k\pi}{L}x\cos\frac{k\pi}{L}y + \cos\frac{k\pi}{L}x\sin\frac{k\pi}{L}y\right)\\
&= A\cos\frac{k\pi}{L}x + B\sin\frac{k\pi}{L}x
\end{aligned}$$

where

$$\begin{pmatrix} A\\ B \end{pmatrix} = \begin{pmatrix} \cos(k\pi/L)y & \sin(k\pi/L)y\\ -\sin(k\pi/L)y & \cos(k\pi/L)y \end{pmatrix}\begin{pmatrix} a\\ b \end{pmatrix}.$$

The constant functions of course form a one-dimensional vector space of functions that is preserved by all translations. It turns out that any finite-dimensional vector space of functions (periodic of period $2L$) that is preserved by all translations must be a kind of sum of these basic building blocks. Thus the sines and cosines that appear in the Fourier series are in this sense the functions that behave most simply under translation.

We have now given three reasons why Fourier series are so important: 1) they are useful in solving p.d.e.'s, 2) they are a special case of the spectral theorem, 3) they are a special case of harmonic analysis.

12.1.4 Exercises

1. Assuming $\sum_{k=1}^{\infty} k^2|a_k| < \infty$, verify that

$$y(x, t) = \sum_{k=1}^{\infty} a_k\cos\frac{k\pi c}{L}t\,\sin\frac{k\pi}{L}x$$

converges uniformly and absolutely to a $C^2$ solution of the vibrating string equation with boundary conditions $y(0, t) = y(L, t) = 0$ and initial conditions $y(x, 0) = f(x)$, $(\partial/\partial t)y(x, 0) = 0$ for

$$f(x) = \sum_{k=1}^{\infty} a_k\sin\frac{k\pi}{L}x.$$

2. Show that it is impossible to have $\sin x = \sum_{k=2}^{\infty} a_k\sin kx$ on $0 \le x \le \pi$ with the series converging uniformly, for any choice of the $a_k$, even though there are an infinite number of parameters in the problem. (Hint: multiply by $\sin x$ and integrate.)

3. Verify that the Euler-d'Alembert solution

$$y(x, t) = \frac{1}{2}\bigl(\tilde{f}(x + ct) + \tilde{f}(x - ct)\bigr),$$

where $\tilde{f}$ denotes the extension of $f$ to the whole line satisfying $\tilde{f}(-x) = -f(x)$ and $\tilde{f}(x + 2L) = \tilde{f}(x)$, actually solves the initial value problem for the vibrating string equation for any function $f(x)$ such that $\tilde{f}$ is $C^2$. What conditions on $f$ are necessary and sufficient for $\tilde{f}$ to be $C^2$?

4. Solve the damped vibrating string equation

$$\frac{\partial^2 y}{\partial t^2} = c^2\frac{\partial^2 y}{\partial x^2} - a\frac{\partial y}{\partial t}$$

with boundary conditions $y(0, t) = y(L, t) = 0$ and initial conditions $y(x, 0) = f(x)$, $\partial y/\partial t(x, 0) = 0$ assuming

$$f(x) = \sum_{k=1}^{\infty} a_k\sin\frac{k\pi}{L}x$$

with $\sum k^2|a_k| < \infty$. (Hint: look for special solutions $g(t)\sin(k\pi/L)x$ and obtain an o.d.e. for $g$. You will have to distinguish three cases depending on whether $a$ is greater than, equal to, or less than $2ck\pi/L$.)

5. Verify that

$$u(x, t) = \sum_{k=1}^{\infty} a_k e^{-c^2k^2\pi^2 t/L^2}\sin\frac{k\pi}{L}x$$

satisfies the heat equation for $t > 0$ with boundary conditions $u(0, t) = u(L, t) = 0$ and initial condition $u(x, 0) = f(x)$ if $f(x) = \sum_{k=1}^{\infty} a_k\sin(k\pi/L)x$ with $\sum|a_k| < \infty$.

6. Show that all solutions of $\psi''(x) = \lambda\psi(x)$ on $0 \le x \le L$ with $\psi(0) = \psi(L) = 0$ are of the form $c\sin(k\pi/L)x$. (Hint: write down all solutions of the o.d.e. and impose the boundary conditions.)

7. Show that all solutions of $\psi''(x) = \lambda\psi(x)$ on $0 \le x \le L$ with $\psi'(0) = \psi'(L) = 0$ are of the form $c\cos(k\pi/L)x$.

8. Prove the symmetry of $d^2/dx^2$ on the vector space of $C^2$ functions with a) Dirichlet ($f(0) = f(L) = 0$) or b) Neumann ($f'(0) = f'(L) = 0$) boundary conditions.

9. For which values of $a$ and $b$ is the operator $d^2/dx^2$ symmetric on the vector space of $C^2$ functions on $[0, L]$ satisfying $af(0) + bf'(0) = af'(L) + bf'(L) = 0$? Prove the symmetry in the cases it is true, and give counterexamples to the symmetry in the cases it is false.

10. Find all eigenvectors for the operator $d^2/dx^2$ on the space of $C^2$ functions on $[0, L]$ satisfying the boundary conditions $f(0) = f'(L) = 0$.

11. Prove the symmetry of the Sturm-Liouville operator $Af = pf'' + p'f' + qf$, where $p$ and $q$ are real-valued functions, on the space of $C^2$ functions with a) Dirichlet boundary conditions, or b) Neumann boundary conditions.

12. Show how any result about convergence of the full Fourier series on $[-L, L]$ implies a result about Fourier sine and cosine series on $[0, L]$.

13. Express the $a$'s and $b$'s in terms of the $c$'s and vice versa in the two forms

$$\frac{a_0}{2} + \sum_{k=1}^{\infty}\left(a_k\cos\frac{k\pi}{L}x + b_k\sin\frac{k\pi}{L}x\right)$$

and

$$\sum_{n=-\infty}^{\infty} c_n e^{(in\pi/L)x}$$

of the full Fourier series.

14. Show that $\sum_{n=-\infty}^{\infty} c_n e^{(in\pi/L)x}$ is real-valued if and only if $c_{-n} = \overline{c_n}$.

15. Show that all eigenfunctions of $d/dx$ on the space of $C^1$ functions on $[-L, L]$ satisfying $f(-L) = f(L)$ are of the form $ce^{(ik\pi/L)x}$.

16. Show that $(Af, g) = -(f, Ag)$ on a complex inner product space if and only if $iA$ is symmetric.

17. Show that if $T_y f(x) = g(y)f(x)$ for every $x$ and $y$ and some $C^1$ function $g(y)$, where $f$ is a $C^1$ periodic function of period $2L$, then $f(x) = ce^{(ik\pi/L)x}$. (Hint: differentiate and use exercise 15.)

18. Show that any complex-valued $C^1$ function satisfying the character identity must be of the form $\exp(\lambda x)$ for $\lambda$ complex.

19. Show that any bounded $C^1$ function on the line satisfying the character identity must be of the form $f(x) = e^{itx}$ for real $t$.

20. a. "Sketch" the graph of $\sin 10{,}000x$. (Note: Your ear is capable of hearing a tone with this frequency, a little more than an octave above the highest note on a piano.)

b. "Sketch" the graph of $\sin 10{,}000x + \sin 10{,}001x$. (Hint: use the trigonometric identity $\sin(a + b) + \sin(a - b) = 2\sin a\cos b$.)

21. Solve the differential equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0$$

in the rectangle $0 \le x \le a$, $0 \le y \le b$ with boundary conditions that $u(x, 0) = f(x)$ on the bottom side and $u = 0$ on the other three sides (i.e., $u(x, b) = 0$, $u(0, y) = 0$ and $u(a, y) = 0$) by expanding $u(x, y)$ in a Fourier sine series in $x$, for each fixed $y$.

12.2 Convergence of Fourier Series

12.2.1 Uniform Convergence for $C^1$ Functions

Dirichlet was the first mathematician to give a good proof of the convergence of Fourier series. His work on the question was important also because it stimulated Riemann to investigate integration theory. Dirichlet proved that a continuous function is equal to a pointwise convergent Fourier series if it has only a finite number of maxima and minima. Dirichlet also proved a similar result for a function that has a finite number of discontinuities, and he remarked that he only used that property to ensure the existence of the integral. He had the integrity to admit that he saw no way to remove the hypothesis that the number of extrema be finite, even though he could not see any natural reason why the hypothesis should be needed. We will follow Dirichlet's argument part way, and use it to establish the uniform convergence for a $C^1$ function.

Our starting point will be a function $f(x)$ that is continuous and periodic of period $2\pi$ (we now follow the convention of taking the period equal to $2\pi$ in order to simplify the notation; clearly the arguments will go through in the general case). We form the Fourier coefficients

$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$$

and the Fourier series

$$\sum_{n=-\infty}^{\infty} c_n e^{inx}.$$

To investigate the convergence we form the partial sums $S_N f(x) = \sum_{n=-N}^{N} c_n e^{inx}$. Notice that the order we are taking corresponds to the natural order in terms of sines and cosines,

$$S_N f(x) = \frac{a_0}{2} + \sum_{k=1}^{N}(a_k\cos kx + b_k\sin kx).$$

We want to show $S_N f(x) \to f(x)$ as $N \to \infty$. Dirichlet's key idea is simply to substitute the definition of the Fourier coefficients in order to write $S_N f(x)$ as a convolution. To see how this works, we need the elementary observation that if $f$ is $2\pi$-periodic, then

$$\int_{-\pi}^{\pi} f(x)\,dx = \int_a^{a+2\pi} f(x)\,dx$$

for any real $a$, for any interval of length $2\pi$ can be cut in two pieces and translated back by multiples of $2\pi$ to the interval $[-\pi, \pi]$. Note that this remark also applies to $f(x)e^{-inx}$, since this function is again $2\pi$-periodic.

Using these ideas, we find

$$S_N f(x) = \sum_{n=-N}^{N}\left(\frac{1}{2\pi}\int_{-\pi}^{\pi} f(y)e^{-iny}\,dy\right)e^{inx} = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(y)\sum_{n=-N}^{N} e^{in(x-y)}\,dy = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(y)D_N(x - y)\,dy,$$

where we have defined the Dirichlet kernel $D_N$ by

$$D_N(t) = \sum_{n=-N}^{N} e^{int}.$$

Since this is a geometric progression, we can evaluate it exactly. We have

$$e^{it}D_N(t) = \sum_{n=1-N}^{N+1} e^{int} = D_N(t) + e^{i(N+1)t} - e^{-iNt},$$

so

$$D_N(t) = \frac{e^{i(N+1)t} - e^{-iNt}}{e^{it} - 1} = \frac{e^{i(N+1/2)t} - e^{-i(N+1/2)t}}{e^{it/2} - e^{-it/2}} = \frac{\sin(N + 1/2)t}{\sin(1/2)t}.$$

We call the expression $\int_{-\pi}^{\pi} f(y)g(x - y)\,dy$ the periodic convolution of $f$ and $g$, written $f * g(x)$, for any $2\pi$-periodic functions $f$ and $g$. We observe that it is a commutative product, for by the change of variable $y \to x - y$ we obtain

$$\int_{-\pi}^{\pi} f(y)g(x - y)\,dy = \int_{x-\pi}^{x+\pi} f(x - y)g(y)\,dy = \int_{-\pi}^{\pi} f(x - y)g(y)\,dy.$$

Thus we have

$$S_N f(x) = \frac{1}{2\pi}f * D_N(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x - y)D_N(y)\,dy.$$

Let us examine the graph of $D_N(t)$ (see Figure 12.2.1 for the case $N = 3$). Notice that $D_N(0) = 2N + 1$, so $D_N$ has a peak at the origin. However, the function $\sin(N + 1/2)x$ in the numerator oscillates frequently between $-1$ and $+1$, and the function $\sin x/2$ in the denominator has only the single zero at $x = 0$. Thus the quotient does not actually become small for $x$ away from zero. However, there is a great deal of oscillation away from $x = 0$ so that "on the average" $D_N$ is small away from the origin. We do have $(1/2\pi)\int_{-\pi}^{\pi} D_N(t)\,dt = 1$, which can be seen quite easily from the expression $D_N(t) = \sum_{n=-N}^{N} e^{int}$, since only the term $n = 0$ survives the integration.

If $(1/2\pi)D_N$ behaves like an approximate identity we can hope to show $S_N f \to f$.
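The two expressions for the Dirichlet kernel, and the normalization $(1/2\pi)\int_{-\pi}^{\pi} D_N(t)\,dt = 1$, can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `dirichlet_sum`, `dirichlet_closed`, and `mean_over_period` are illustrative and not taken from the text.

```python
import numpy as np

def dirichlet_sum(N, t):
    """D_N(t) as the finite sum of e^{int}, n = -N..N (imaginary parts cancel)."""
    n = np.arange(-N, N + 1)
    return np.exp(1j * np.outer(t, n)).sum(axis=1).real

def dirichlet_closed(N, t):
    """Closed form sin((N + 1/2)t) / sin(t/2), with the value 2N + 1 where sin(t/2) = 0."""
    t = np.asarray(t, dtype=float)
    den = np.sin(t / 2)
    out = np.full_like(t, 2 * N + 1)      # limiting value at the removable singularity
    mask = np.abs(den) > 1e-12
    out[mask] = np.sin((N + 0.5) * t[mask]) / den[mask]
    return out

def mean_over_period(N, M=4096):
    """(1/2pi) * integral of D_N over one period, via M equally spaced samples.

    For a trigonometric polynomial of degree below M this average is exact
    up to rounding, so it should return a value very close to 1.
    """
    t = -np.pi + 2 * np.pi * np.arange(M) / M
    return dirichlet_closed(N, t).mean()
```

Calling `dirichlet_closed(3, t)` on a grid of `t` values reproduces the curve $\sin(7t/2)/\sin(t/2)$ referred to in Figure 12.2.1.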

Figure 12.2.1: Graphs of $\sin(7x/2)$, $\sin(x/2)$, and the quotient $\sin(7x/2)/\sin(x/2)$.

Since we are interested in the convergence of $S_N f$ to $f$, we write

$$S_N f(x) - f(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} [f(x - y) - f(x)]D_N(y)\,dy.$$

But now we cannot take absolute values inside the integral without courting disaster, since we have no control over $|D_N(y)|$. In fact it can be shown that $\int_{-\pi}^{\pi}|D_N(y)|\,dy \ge c\log N$ (see exercise set 12.2.6). It follows that a direct appeal to the approximate identity method will not work. Nevertheless, the cancellation implied by the oscillation of $D_N$ away from the origin should make the contribution to the integral from $\delta \le |y| \le \pi$ very small. The way to see this is by the appropriate integration by parts. We begin by proving the convergence under stronger hypotheses, namely that $f$ is $C^2$. The proof is simpler in this case. As a bonus we get a faster rate of convergence.
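The logarithmic growth of $\int_{-\pi}^{\pi}|D_N(y)|\,dy$ is easy to observe numerically. The sketch below, which assumes NumPy and uses an illustrative function name `dirichlet_l1`, approximates the integral by the midpoint rule; it is a numerical illustration, not a proof of the lower bound.

```python
import numpy as np

def dirichlet_l1(N, M=200_000):
    """Approximate the integral of |D_N(y)| over [-pi, pi] by the midpoint rule.

    M is taken even, so no midpoint lands on y = 0, where the quotient
    sin((N + 1/2)y) / sin(y/2) would be 0/0.
    """
    h = 2 * np.pi / M
    y = -np.pi + (np.arange(M) + 0.5) * h
    d = np.sin((N + 0.5) * y) / np.sin(y / 2)
    return np.abs(d).sum() * h

# Each tenfold increase in N adds a roughly constant amount to the integral,
# which is the signature of growth proportional to log N.
for N in (10, 100, 1000):
    print(N, dirichlet_l1(N))
```

The increments between successive printed values should be nearly equal, in contrast with $(1/2\pi)\int D_N = 1$, which does not grow at all.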

Theorem 12.2.1 Let $f$ be a $C^2$ periodic function of period $2\pi$. Then the Fourier series of $f$ converges uniformly to $f$, and in fact $S_N f(x) - f(x) = O(1/N)$ uniformly as $N \to \infty$.

Proof: We have seen that we can write

$$S_N f(x) - f(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} (f(x - y) - f(x))D_N(y)\,dy.$$

The trick is to substitute the explicit formula for $D_N(y)$ and to group the terms as follows:

$$S_N f(x) - f(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left(\frac{f(x - y) - f(x)}{\sin\frac{1}{2}y}\right)\sin\left(N + \frac{1}{2}\right)y\,dy.$$

We write $g(y) = (f(x - y) - f(x))/\sin y/2$, and observe that this is a $C^1$ function on $[-\pi, \pi]$. Indeed, there is no problem away from $y = 0$, and near $y = 0$ we can use the Taylor expansion of $f(x - y)$ about $x$, as in our discussion of L'Hôpital's rule. It is here that we use the hypothesis that $f$ is $C^2$. Furthermore, we can find a bound $|g'(y)| \le M$ for all $x$ and $y$.

Now we are ready to do integration by parts, differentiating $g$ and integrating $\sin(N + 1/2)y = (-d/dy)\cos(N + 1/2)y/(N + 1/2)$. There are no boundary terms because the cosine of $(N + 1/2)y$ vanishes when $y = \pm\pi$. Thus

$$S_N f(x) - f(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} g'(y)\frac{\cos(N + \frac{1}{2})y}{N + \frac{1}{2}}\,dy;$$

hence, $|S_N f(x) - f(x)| \le M/(N + 1/2)$ as claimed. QED

In fact, by integrating by parts $k - 1$ times, you can show that if $f$ is $C^k$, then the rate of convergence is $O(1/N^{k-1})$. We leave the details to the exercises.

Theorem 12.2.2 Let $f$ be a $C^1$ periodic function of period $2\pi$. Then the Fourier series of $f$ converges uniformly to $f$; in fact, $S_N f(x) - f(x) = O(N^{-1/2})$ uniformly as $N \to \infty$.

Proof: Once again we write

$$S_N f(x) - f(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} g(y)\sin\left(N + \frac{1}{2}\right)y\,dy,$$

for $g(y) = (f(x - y) - f(x))/\sin y/2$. This time we divide the interval $[-\pi, \pi]$ of integration up into two pieces, the interval $[-\delta, \delta]$ and its complement $\delta \le |y| \le \pi$, where $\delta$ will be chosen later. In $[-\delta, \delta]$ we do nothing, and in its complement we integrate by parts. The hypothesis that $f$ is $C^1$ implies $g(y)$ is continuous away from $y = 0$, but even at $y = 0$ we have the limit

$$\lim_{y\to 0}\frac{f(x - y) - f(x)}{\sin\frac{1}{2}y} = \lim_{y\to 0}\frac{y}{\sin\frac{1}{2}y}\cdot\frac{f(x - y) - f(x)}{y} = -2f'(x).$$

By the mean value theorem $f(x - y) - f(x) = -yf'(z)$ for some $z$, which gives the estimate

$$|g(y)| \le \left|\frac{yf'(z)}{\sin\frac{1}{2}y}\right| \le \pi\sup_z|f'(z)|,$$

using the fact that $|y| \le \pi|\sin y/2|$ on $|y| \le \pi$ (see exercise set 12.2.6). Thus

$$\left|\int_{-\delta}^{\delta} g(y)\sin\left(N + \frac{1}{2}\right)y\,dy\right| \le \int_{-\delta}^{\delta}|g(y)|\,dy \le 2\pi\delta\sup_z|f'(z)|.$$

This is a good estimate, because we can make the integral small by taking $\delta$ small.

Next we consider the integral over the interval $[\delta, \pi]$ (we treat $[-\pi, -\delta]$ analogously). We integrate by parts, differentiating $g(y)$ and integrating $\sin(N + 1/2)y$, to obtain

$$\int_{\delta}^{\pi} g(y)\sin\left(N + \frac{1}{2}\right)y\,dy = \int_{\delta}^{\pi} g'(y)\frac{\cos(N + \frac{1}{2})y}{N + \frac{1}{2}}\,dy + g(\delta)\frac{\cos(N + \frac{1}{2})\delta}{N + \frac{1}{2}}$$

and now can take absolute values and estimate

$$\left|\int_{\delta}^{\pi} g(y)\sin\left(N + \frac{1}{2}\right)y\,dy\right| \le \frac{1}{N + \frac{1}{2}}\left(\int_{\delta}^{\pi}|g'(y)|\left|\cos\left(N + \frac{1}{2}\right)y\right|dy + |g(\delta)|\left|\cos\left(N + \frac{1}{2}\right)\delta\right|\right).$$

Now $|g(\delta)| \le \pi\sup_z|f'(z)|$ as above, and

$$g'(y) = \frac{-f'(x - y)}{\sin\frac{1}{2}y} - \frac{\cos\frac{1}{2}y}{2\sin\frac{1}{2}y}\,g(y),$$

so $|g'(y)| \le (\pi/\delta)\sup_z|f'(z)| + (\pi^2/2\delta)\sup_z|f'(z)|$ using $\sin y/2 \ge \delta/\pi$ on $[\delta, \pi]$ (this follows from $|y| \le \pi|\sin y/2|$). Thus

$$\left|\int_{\delta}^{\pi} g(y)\sin\left(N + \frac{1}{2}\right)y\,dy\right| \le \frac{1}{N + \frac{1}{2}}\left(\frac{c}{\delta} + \pi\right)\sup_z|f'(z)|.$$

The factor of $N + 1/2$ in the denominator is helpful, but the $\delta$ in the denominator is a potential problem. Adding the estimates for $[-\delta, \delta]$ and its complement we obtain

$$|S_N f(x) - f(x)| = \frac{1}{2\pi}\left|\int_{-\pi}^{\pi} g(y)\sin\left(N + \frac{1}{2}\right)y\,dy\right| \le \left[\delta + \frac{1}{N + \frac{1}{2}}\left(\frac{c}{\delta} + 1\right)\right]\sup_z|f'(z)|.$$

Is this good enough? Yes, if we choose $\delta = (N + 1/2)^{-1/2}$ we obtain the estimate

$$|S_N f(x) - f(x)| \le c\left(N + \frac{1}{2}\right)^{-1/2}\sup_z|f'(z)|$$

where the constant $c$ does not depend on $f$ or $x$ or $N$ (we use here $(N + 1/2)^{-1} \le (N + 1/2)^{-1/2}$ for $N \ge 1$). Thus we have the uniform convergence of the partial sums to $f$. We have actually shown that the rate of convergence is $O(N^{-1/2})$, with the constant depending only on $\sup_z|f'(z)|$. QED

It would be nice to know that the convergence is also absolute so that the order of terms is immaterial. This is indeed the case as we will show later. However, there are conditions on $f$ that yield the convergence of $S_N f$ but not the absolute convergence. There are also many variants of this theorem that give convergence under weaker conditions and local versions in which only the convergence at a single point is proved. Some of these are given in exercise set 12.2.6.

12.2.2 Summability of Fourier Series

We have now seen that Fourier's conjecture is true if we add the additional hypothesis that the function be $C^1$. However, there is something very unsatisfactory about this result, because the hypothesis is stronger
than the conclusion. That is, the natural hypothesis for convergence of the Fourier series is that $f$ be continuous; if $f$ is $C^1$ we would like to know that the differentiated Fourier series $\sum_{n=-\infty}^{\infty} inc_n e^{inx}$ also converges to $f'$. But neither statement is true. Still, there is a way to improve matters by changing the question. Instead of asking for the convergence of the Fourier series, we ask for a weaker condition called summability. This notion exploits the great amount of cancellation that is to be expected due to the oscillations of the functions $e^{inx}$.

The point is that by simply adding the terms of the Fourier series in pairs $(c_N e^{iNx} + c_{-N}e^{-iNx})$ in passing from $S_{N-1}f$ to $S_N f$, we are doing things too abruptly, giving too much emphasis to the new terms. It is like dropping a pebble in a lake: even if the pebble is small it can make a big splash. However, there is a way to ease the pebble into the water gently. What we do with the Fourier series is to multiply each term $c_n e^{inx}$ by a constant that depends on both $n$ and $N$. That is, we look at

$$\sigma_N f(x) = \sum_{n=-N}^{N} A_{N,n}c_n e^{inx}$$

instead of $S_N f(x)$. We choose the coefficients $A_{N,n}$ to be small when $n$ is near $\pm N$ so that the new terms are gently eased in, but we let $A_{N,n} \to 1$ for fixed $n$ as $N \to \infty$ so that each term is eventually counted toward the sum. We can then ask if $\sigma_N f$ converges to $f$. Each choice of coefficients $A_{N,n}$ (with $0 \le A_{N,n} \le 1$ and $\lim_{N\to\infty} A_{N,n} = 1$ for each fixed $n$) is called a summability method, and if $\sigma_N f \to f$ we say the Fourier series is summable to $f$ by the particular summability method. In fact the idea of summability methods can be applied to any sort of series, not just Fourier series. A key fact is that if the series converges, then it is summable to the same limit. If this were not true, the use of summability methods would be very suspicious. Fortunately, it is simple to prove, and we leave it to the exercises.

The simplest summability method is Cesaro summability, where we take the coefficients $A_{N,n}$ to be linear in $n$,

$$A_{N,n} = (N + 1 - |n|)/(N + 1) = 1 - |n|/(N + 1),$$

as shown in Figure 12.2.2. For this choice of coefficients $\sigma_N f$ is just the arithmetic mean of the first $N + 1$ partial sums, $\sigma_N f = (S_0 f + S_1 f + \cdots + S_N f)/(N + 1)$.
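The claim that the Cesaro weights $A_{N,n} = 1 - |n|/(N+1)$ reproduce the arithmetic mean of $S_0 f, \ldots, S_N f$ comes from counting: the term $c_n e^{inx}$ appears in $S_k f$ exactly when $k \ge |n|$, that is, in $N + 1 - |n|$ of the $N + 1$ partial sums. A minimal sketch of this check, assuming NumPy, with coefficients stored in a dictionary `c` keyed by $n$ (an illustrative representation, not one used in the text):

```python
import numpy as np

def partial_sum(c, N, x):
    """S_N(x) = sum over n = -N..N of c[n] * e^{inx}."""
    return sum(c[n] * np.exp(1j * n * x) for n in range(-N, N + 1))

def cesaro_mean(c, N, x):
    """Arithmetic mean of the partial sums S_0, ..., S_N."""
    return sum(partial_sum(c, k, x) for k in range(N + 1)) / (N + 1)

def cesaro_weighted(c, N, x):
    """The same quantity computed with the weights A_{N,n} = 1 - |n|/(N + 1)."""
    return sum((1 - abs(n) / (N + 1)) * c[n] * np.exp(1j * n * x)
               for n in range(-N, N + 1))
```

The two functions agree for any choice of coefficients, since the identity is purely algebraic and does not depend on the $c_n$ being Fourier coefficients of a function.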

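Figure 12.2.2: The Cesaro coefficients $A_{N,n}$.

The idea of using Cesaro summability on Fourier series is due to Fejér, and it works splendidly.

Following Fejér, we write

$$\begin{aligned}
\sigma_N f(x) &= \frac{1}{N + 1}\bigl(S_0 f(x) + \cdots + S_N f(x)\bigr)\\
&= \frac{1}{N + 1}\sum_{n=0}^{N}\frac{1}{2\pi}\int_{-\pi}^{\pi} f(x - y)D_n(y)\,dy\\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x - y)K_N(y)\,dy
\end{aligned}$$

where the Fejér kernel $K_N(y)$ is given by

$$K_N(y) = \frac{1}{N + 1}\sum_{n=0}^{N} D_n(y) = \frac{1}{N + 1}\sum_{n=0}^{N}\frac{\sin(n + 1/2)y}{\sin(1/2)y}.$$

Now the idea is that the Fejér kernel should be better behaved than the Dirichlet kernel, because it involves an averaging process that exploits the cancellations in the Dirichlet kernel. To see this more clearly we have to simplify the expression for $K_N$, using familiar trigonometric identities in an especially clever way. We observe that

$$\begin{aligned}
-\cos(n + 1)y + \cos ny
&= -\cos\left[\left(n + \tfrac{1}{2}\right)y + \tfrac{1}{2}y\right] + \cos\left[\left(n + \tfrac{1}{2}\right)y - \tfrac{1}{2}y\right]\\
&= -\left(\cos\left(n + \tfrac{1}{2}\right)y\cos\tfrac{1}{2}y - \sin\left(n + \tfrac{1}{2}\right)y\sin\tfrac{1}{2}y\right)\\
&\quad + \left(\cos\left(n + \tfrac{1}{2}\right)y\cos\tfrac{1}{2}y + \sin\left(n + \tfrac{1}{2}\right)y\sin\tfrac{1}{2}y\right)\\
&= 2\sin\left(n + \tfrac{1}{2}\right)y\,\sin\tfrac{1}{2}y.
\end{aligned}$$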

Thus

$$\sum_{n=0}^{N}\frac{\sin(n + \frac{1}{2})y}{\sin\frac{1}{2}y} = \frac{1}{2\sin^2\frac{1}{2}y}\sum_{n=0}^{N}\bigl(-\cos(n + 1)y + \cos ny\bigr) = \frac{1 - \cos(N + 1)y}{2\sin^2\frac{1}{2}y} = \left(\frac{\sin\left(\frac{N+1}{2}\right)y}{\sin\frac{1}{2}y}\right)^2$$

and so finally

$$K_N(y) = \frac{1}{N + 1}\left(\frac{\sin\left(\frac{N+1}{2}\right)y}{\sin\frac{1}{2}y}\right)^2.$$

Examining this expression carefully, we notice two important improvements over the Dirichlet kernel

$$D_N(y) = \frac{\sin(N + \frac{1}{2})y}{\sin\frac{1}{2}y}.$$

The first is the appearance of the factor $1/(N + 1)$, which means $K_N(y)$ gets small as $N \to \infty$, except near $y = 0$ where $\sin y/2$ vanishes. This is the concentration of $K_N(y)$ near $y = 0$, which gives it the properties of an approximate identity. This fact is not surprising in view of our strategy of averaging oscillations of $D_n(y)$. The second fact is that $K_N$ is non-negative, $K_N(y) \ge 0$ for all $y$. This is rather startling, and I know of no explanation for it other than the computation itself. However, it is not essential for the theorems we will prove. The sketches of the graphs of $D_3(x)$ and $K_3(x)$ in Figure 12.2.3 show a marked contrast.

Theorem 12.2.3 (Fejér) Let $f$ be any continuous function periodic of period $2\pi$. Then the Fourier series of $f$ is uniformly Cesaro summable to $f$, $\sigma_N f \to f$ uniformly as $N \to \infty$ where

$$\sigma_N f(x) = \sum_{n=-N}^{N}\frac{N + 1 - |n|}{N + 1}c_n e^{inx}.$$

Page 560: Strichartz_The Way of Analysis 2000
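The closed form for the Fejér kernel, its non-negativity, and its mean value 1 can be checked numerically. The following is only a minimal sketch: the order `N = 8`, the 200-point grid, and the function names are illustrative assumptions, not anything taken from the text.

```python
import math

def dirichlet(n, y):
    """D_n(y) = sum_{k=-n}^{n} e^{iky} = 1 + 2*sum_{k=1}^{n} cos(k*y)."""
    return 1.0 + 2.0 * sum(math.cos(k * y) for k in range(1, n + 1))

def fejer_average(N, y):
    """K_N(y) computed directly as the average of D_0(y), ..., D_N(y)."""
    return sum(dirichlet(n, y) for n in range(N + 1)) / (N + 1)

def fejer_closed(N, y):
    """Closed form (1/(N+1)) * (sin((N+1)y/2) / sin(y/2))**2, valid when sin(y/2) != 0."""
    return (math.sin((N + 1) * y / 2) / math.sin(y / 2)) ** 2 / (N + 1)

N = 8  # arbitrary small order, chosen only for illustration
# 200 midpoints of equal subintervals of [-pi, pi]; none of them is y = 0.
ys = [-math.pi + (j + 0.5) * 2 * math.pi / 200 for j in range(200)]

max_gap = max(abs(fejer_average(N, y) - fejer_closed(N, y)) for y in ys)
min_fejer = min(fejer_average(N, y) for y in ys)
min_dirichlet = min(dirichlet(N, y) for y in ys)
# Midpoint rule for (1/(2*pi)) * integral of K_N over [-pi, pi].
mean_value = sum(fejer_average(N, y) for y in ys) / len(ys)

print("closed form vs average, max gap:", max_gap)
print("min of K_N on the grid:", min_fejer)
print("min of D_N on the grid:", min_dirichlet)
print("mean value of K_N:", mean_value)
```

On this grid the averaged kernel stays non-negative up to rounding, while the Dirichlet kernel of the same order dips well below zero, which is the contrast the two kernels are meant to show.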

Figure 12.2.3:

Proof: This is essentially a consequence of the approximate identity lemma, since we have seen

$$\sigma_N f(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x-y)K_N(y)\,dy.$$

Thus we need to verify that

1. $K_N(y) \ge 0$,

2. $(1/2\pi)\int_{-\pi}^{\pi} K_N(y)\,dy = 1$, and

3. $\lim_{N\to\infty} K_N(y) = 0$ uniformly on $|y| \ge \delta$ for any $\delta > 0$.

From the expression

$$K_N(y) = \frac{1}{N+1}\left(\frac{\sin\frac{N+1}{2}y}{\sin\tfrac12 y}\right)^2$$

we obtain property 1, and then property 3 follows since

$$|K_N(y)| \le \frac{1}{N+1}\cdot\frac{1}{(\sin\tfrac12 y)^2} \le \frac{c}{\delta^2(N+1)}$$

for $\delta \le |y| \le \pi$. To verify property 2 we integrate the expression $K_N(y) = \sum_{n=0}^{N} D_n(y)/(N+1)$, using the fact that $(1/2\pi)\int_{-\pi}^{\pi} D_n(y)\,dy = 1$ for all $n$.

Repeating the argument of the approximate identity lemma, we write (using property 2)

$$\sigma_N f(x) - f(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi}[f(x-y) - f(x)]K_N(y)\,dy$$

and so

$$|\sigma_N f(x) - f(x)| \le \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x-y) - f(x)|K_N(y)\,dy$$

by Minkowski's inequality. Given $\varepsilon > 0$ we first find $\delta > 0$ such that $|y| \le \delta$ implies $|f(x-y) - f(x)| \le \varepsilon/2$ for all $x$. This follows from the uniform continuity of $f$, which is a consequence of periodicity. Indeed it suffices to prove the result for $|x| \le \pi$ by periodicity, and then we have the uniform continuity of $f$ on the compact interval $|x| \le \pi + 1$ (keeping $\delta \le 1$ to stay inside it). This enables us to estimate

$$\frac{1}{2\pi}\int_{-\delta}^{\delta}|f(x-y) - f(x)|K_N(y)\,dy \le \frac{\varepsilon}{2}\cdot\frac{1}{2\pi}\int_{-\delta}^{\delta}K_N(y)\,dy \le \frac{\varepsilon}{2}\cdot\frac{1}{2\pi}\int_{-\pi}^{\pi}K_N(y)\,dy = \frac{\varepsilon}{2}$$

for all $N$. Given $\delta$ we choose $N$ large enough so that

$$\frac{1}{2\pi}\left(\int_{-\pi}^{-\delta} + \int_{\delta}^{\pi}\right)K_N(y)\,dy \le \frac{\varepsilon}{4M}$$

where $M = \sup_x |f(x)|$. This is possible using property 3 if we take $2\pi c/\delta^2(N+1) \le \varepsilon/4M$. Then

$$\frac{1}{2\pi}\left(\int_{-\pi}^{-\delta} + \int_{\delta}^{\pi}\right)|f(x-y) - f(x)|K_N(y)\,dy \le 2M\cdot\frac{1}{2\pi}\left(\int_{-\pi}^{-\delta} + \int_{\delta}^{\pi}\right)K_N(y)\,dy \le \frac{\varepsilon}{2}$$

and so $|\sigma_N f(x) - f(x)| \le \varepsilon$ for all $x$. QED
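Fejér's theorem can be watched at work on a concrete function. The sketch below is an illustration under stated assumptions: the test function $f(x) = |x|$ on $[-\pi, \pi]$ (extended periodically) and its coefficients $c_0 = \pi/2$, $c_n = ((-1)^n - 1)/(\pi n^2)$ are choices made here and computed by hand, not an example from the text.

```python
import math

def coeff(n):
    """Fourier coefficient c_n of the assumed test function f(x) = |x| on [-pi, pi]."""
    if n == 0:
        return math.pi / 2
    return ((-1) ** n - 1) / (math.pi * n * n)

def cesaro_mean(N, x):
    """sigma_N f(x) = sum_{|n|<=N} (N+1-|n|)/(N+1) * c_n * e^{inx}.

    The coefficients are real and even in n, so the sum is written with cosines.
    """
    total = coeff(0)
    for n in range(1, N + 1):
        total += (N + 1 - n) / (N + 1) * 2 * coeff(n) * math.cos(n * x)
    return total

# Largest deviation from f on a grid of [-pi, pi], for increasing N.
xs = [-math.pi + j * 2 * math.pi / 400 for j in range(401)]
sup_errors = [max(abs(cesaro_mean(N, x) - abs(x)) for x in xs) for N in (4, 16, 64)]
print(sup_errors)
```

The grid maximum shrinks as $N$ grows, which is what uniform Cesaro summability predicts; a grid maximum is of course only a stand-in for the true supremum.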

There are many variants of this result. For example, if $f$ is just Riemann integrable but continuous at a point $x_0$, a simple modification of the same proof shows $\sigma_N f(x_0) \to f(x_0)$. Even if $f$ has a jump discontinuity at $x_0$, $\sigma_N f(x_0)$ will converge to the average value

$$\tfrac12\Bigl(\lim_{x\to x_0^+} f(x) + \lim_{x\to x_0^-} f(x)\Bigr) = \lim_{\delta\to 0}\tfrac12\bigl(f(x_0+\delta) + f(x_0-\delta)\bigr).$$

Notice that we are then approximating a discontinuous function by continuous functions, so the convergence cannot be uniform. For the same reason, the family $\sigma_N f$ cannot be uniformly equicontinuous.

An immediate corollary of Fejér's theorem is the uniqueness of Fourier series: if $f$ and $g$ are two continuous functions with the same Fourier coefficients, then $f = g$. Indeed if the Fourier coefficients are equal, then $\sigma_N f = \sigma_N g$ for every $N$ and, letting $N \to \infty$, we obtain $f = g$.

Fejér's theorem gives an explicit proof that every periodic continuous function can be uniformly approximated by trigonometric polynomials. A trigonometric polynomial is defined to be any finite Fourier series, $\sum_{n=-N}^{N} c_n e^{inx}$. Since the trigonometric polynomials form an algebra, this fact is also obtainable from the Stone-Weierstrass theorem, although in less explicit form. In the opposite direction, we can obtain the Weierstrass approximation theorem as a consequence of Fejér's theorem by replacing the exponentials $e^{inx}$ in $\sigma_N f(x)$ by partial sums of their power-series expansion, which we know converge uniformly on bounded intervals.

12.2.3 Convergence in the Mean

Although Fejér's theorem suggests that the Cesaro sums $\sigma_N f$ may do a better job than the partial sums $S_N f$ in approximating $f$, there is a criterion by which $S_N f$ is the best approximation among all trigonometric polynomials $\sum_{n=-N}^{N} a_n e^{inx}$ of order $N$. This involves measuring the error in the mean-square sense as

$$d_2(f, g) = \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x) - g(x)|^2\,dx\right)^{1/2}.$$

This is the metric associated to the $L^2$-norm

$$\|f\|_2 = \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx\right)^{1/2}$$

associated to the inner product

$$\langle f, g\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx.$$

Recall that the exponential functions $e^{inx}$ are orthogonal vectors with respect to this inner product, $\langle e^{inx}, e^{ikx}\rangle = 0$ if $k \ne n$; and that the choice of the factor $1/2\pi$ in the inner product makes them normalized, $\langle e^{inx}, e^{inx}\rangle = 1$. The normalization is purely conventional, and everything would work as well without it.

We will need to use the following basic fact about orthonormal vectors in an inner product space (real or complex), which generalizes the formula for projecting a vector onto a subspace of $\mathbb{R}^n$. We state and prove the result in the abstract context because it clarifies the ideas.

Theorem 12.2.4 (Projection Theorem) Let $v_1, \ldots, v_N$ be orthonormal vectors in an inner product space $V$ and $W$ denote the subspace they span, $W = \{\sum_{n=1}^{N} a_n v_n\}$. Given any vector $u$ in $V$, the problem of finding the vector $w$ in $W$ that minimizes the distance $d(u, w)$ is solved by $w = Pu = \sum_{n=1}^{N}\langle u, v_n\rangle v_n$. $Pu$ is called the orthogonal projection of $u$ onto $W$ and is characterized by the condition that $Pu$ is in $W$ and $u - Pu$ is orthogonal to $W$. The distance $d(u, Pu)$ is given by

$$d(u, Pu)^2 = \|u\|^2 - \sum_{n=1}^{N}|\langle u, v_n\rangle|^2;$$

hence, we have Bessel's inequality

$$\sum_{n=1}^{N}|\langle u, v_n\rangle|^2 \le \|u\|^2.$$

Proof: We define $Pu = \sum_{n=1}^{N}\langle u, v_n\rangle v_n$ and observe $\langle Pu, v_k\rangle = \langle u, v_k\rangle$ for any $k$ by the orthonormality condition. Therefore $\langle u - Pu, v_k\rangle = 0$ and so $\langle u - Pu, \sum_{k=1}^{N} a_k v_k\rangle = 0$, showing $u - Pu$ is orthogonal to $W$. Of course $Pu$ is in $W$ from the definition. Now if

$w$ is any vector in $W$, then $u - w = (u - Pu) + (Pu - w)$ and $Pu - w$ is also in $W$. Thus

$$d(u, w)^2 = \langle u - w, u - w\rangle = \langle (u - Pu) + (Pu - w), (u - Pu) + (Pu - w)\rangle = \langle u - Pu, u - Pu\rangle + \langle Pu - w, Pu - w\rangle = d(u, Pu)^2 + d(Pu, w)^2$$

because $u - Pu$ is orthogonal to $W$, as in Figure 12.2.4. This is a generalization of the Pythagorean theorem.

Figure 12.2.4:

Notice that the first term $d(u, Pu)^2$ is independent of $w$, and the second term $d(Pu, w)^2$ is non-negative. It is obvious that the sum is minimized if the second term $d(Pu, w)^2$ vanishes, which happens exactly when $w = Pu$. Thus $w = Pu$ is the unique minimizer. To obtain the formula for $d(u, Pu)^2$ we compute

$$d(u, Pu)^2 = \langle u - Pu, u - Pu\rangle = \langle u, u\rangle + \langle Pu, Pu\rangle - \langle u, Pu\rangle - \langle Pu, u\rangle$$

and (using the definitions of $Pu$)

$$\langle Pu, Pu\rangle = \langle u, Pu\rangle = \langle Pu, u\rangle = \sum_{n=1}^{N}|\langle u, v_n\rangle|^2.$$

QED

Returning to the concrete context of the $L^2$ inner product on $2\pi$-periodic continuous functions, we consider the set of orthonormal vectors $e^{inx}$ for $-N \le n \le N$. Note that $Pf$ is exactly $S_N f$. Thus we have the following corollary.

Corollary 12.2.1 Among all trigonometric polynomials $\sum_{n=-N}^{N} a_n e^{inx}$ of degree $N$, $S_N f$ minimizes the $L^2$ distance to $f$, for any $2\pi$-periodic continuous function $f$. Furthermore, we have Bessel's inequality

$$\sum_{n=-N}^{N}|c_n|^2 \le \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx$$

and

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x) - S_N f(x)|^2\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx - \sum_{n=-N}^{N}|c_n|^2$$

where $c_n = (1/2\pi)\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$.

In particular,

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x) - S_N f(x)|^2\,dx$$

is less than or equal to

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x) - \sigma_N f(x)|^2\,dx.$$

On the other hand, $f - \sigma_N f$ goes to zero uniformly, so

$$\lim_{N\to\infty}\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x) - \sigma_N f(x)|^2\,dx = 0.$$

Thus combining Fejér's theorem and the projection theorem we obtain

$$\lim_{N\to\infty}\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x) - S_N f(x)|^2\,dx = 0,$$

the mean-square convergence of Fourier series. We can then take the limit as $N \to \infty$ in the identity

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x) - S_N f(x)|^2\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx - \sum_{n=-N}^{N}|c_n|^2$$

to obtain Parseval's identity:

$$\sum_{n=-\infty}^{\infty}|c_n|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx.$$

This identity gives us very useful information concerning the rate of growth of the Fourier coefficients. In particular, it implies $\lim_{n\to\pm\infty} c_n = 0$, a fact that is referred to as the Riemann-Lebesgue Lemma. Incidentally, both the Riemann-Lebesgue lemma and Parseval's identity are valid under much more general hypotheses than continuity on $f$, but it requires the Lebesgue theory of integration to explain these hypotheses. We will return to this in Chapter 14. There is also a bilinear form of Parseval's identity:

$$\sum_{n=-\infty}^{\infty} c_n(f)\overline{c_n(g)} = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$$

where $c_n(f)$ and $c_n(g)$ denote the Fourier series coefficients of $f$ and $g$. This is obtainable by applying the polarization identity to both sides of Parseval's identity.

The mean-square convergence of Fourier series implies a property of the orthonormal system $e^{inx}$ called completeness (warning: this use of the term is somewhat different from the previous usages we have encountered): there is no non-zero continuous function $f(x)$ that is orthogonal to all the functions $e^{inx}$, $-\infty < n < \infty$. For if there were such a function, then all $S_N f$ would be identically zero and so

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x) - S_N f(x)|^2\,dx \to 0$$

as $N \to \infty$; hence

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx = 0,$$

so $f \equiv 0$ (the same conclusion also follows from Fejér's theorem since $\sigma_N f$ would also be identically zero). The completeness can be interpreted as saying we have not omitted any terms in forming the Fourier series expansion; in retrospect, this is the idea Bernoulli should have advanced.
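The projection theorem, the corollary, and Parseval's identity can all be probed on one example. This is a hedged sketch: the test function $f(x) = |x|$, its hand-computed coefficients, the order `N = 8`, and the midpoint-rule quadrature are assumptions made for illustration only.

```python
import math

def coeff(n):
    """Fourier coefficient c_n of the assumed test function f(x) = |x| on [-pi, pi]."""
    if n == 0:
        return math.pi / 2
    return ((-1) ** n - 1) / (math.pi * n * n)

def trig_poly(weights, x):
    """sum_{|n|<=N} weights[|n|] * c_n * e^{inx}, using c_n = c_{-n} real."""
    return coeff(0) * weights[0] + sum(
        2 * weights[n] * coeff(n) * math.cos(n * x) for n in range(1, len(weights))
    )

def mean_square_error(weights, points=2000):
    """Midpoint-rule approximation of (1/2pi) * integral of |f - T|^2 over [-pi, pi]."""
    h = 2 * math.pi / points
    xs = (-math.pi + (j + 0.5) * h for j in range(points))
    return sum((abs(x) - trig_poly(weights, x)) ** 2 for x in xs) / points

N = 8
norm_sq = math.pi ** 2 / 3  # (1/2pi) * integral of |x|^2 over [-pi, pi]
bessel_sum = sum(coeff(n) ** 2 for n in range(-N, N + 1))

partial_sum_weights = [1.0] * (N + 1)                           # gives S_N f
cesaro_weights = [(N + 1 - n) / (N + 1) for n in range(N + 1)]  # gives sigma_N f

err_partial = mean_square_error(partial_sum_weights)
err_cesaro = mean_square_error(cesaro_weights)
print("d(f, S_N f)^2 by quadrature:", err_partial)
print("||f||^2 - sum |c_n|^2      :", norm_sq - bessel_sum)
print("d(f, sigma_N f)^2          :", err_cesaro)

# Parseval: the gap ||f||^2 - sum_{|n|<=M} |c_n|^2 should shrink to zero.
gaps = [norm_sq - sum(coeff(n) ** 2 for n in range(-M, M + 1)) for M in (1, 10, 100, 1000)]
print(gaps)
```

The first two printed numbers should agree to within quadrature error, and the Cesaro mean, although it converges uniformly, sits farther from $f$ in the mean-square sense than the partial sum of the same order.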

One striking property of the mean-square convergence is that it does not depend on the order of the terms. If $A$ is any finite subset of integers, let $P_A f(x) = \sum_{A} c_n e^{inx}$. Then

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x) - P_A f(x)|^2\,dx = \sum_{n \in B}|c_n|^2$$

where $B$ denotes the complement of $A$, by Parseval's identity (the Fourier coefficients of $f - P_A f$ are $c_n$ if $n$ is in $B$ and $0$ if $n$ is in $A$). If $A_N$ is any sequence of finite subsets increasing to all of the integers and $B_N$ is the complement of $A_N$, then

$$\lim_{N\to\infty}\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x) - P_{A_N} f(x)|^2\,dx = \lim_{N\to\infty}\sum_{n \in B_N}|c_n|^2 = 0.$$

We are now in a position to show the absolute convergence of Fourier series of $C^1$ functions. Because $|e^{inx}| \equiv 1$, the absolute convergence is only a question of the size of the coefficients, $\sum_{n=-\infty}^{\infty} c_n e^{inx}$ converges absolutely if and only if $\sum_{n=-\infty}^{\infty}|c_n| < \infty$. Now if $f$ is $C^1$, then $f'$ is continuous, and the Fourier coefficients of $f'$ are

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} f'(x)e^{-inx}\,dx = in\,\frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx = inc_n$$

by integration by parts (there are no boundary terms because $f$ is assumed periodic, so $f(-\pi) = f(\pi)$). Note this is the same result that we would obtain by formally differentiating the Fourier series. The Parseval identity for $f'$ is

$$\sum_{n=-\infty}^{\infty} n^2|c_n|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f'(x)|^2\,dx,$$

which implies

$$\sum_{n \ne 0}|c_n| = \sum_{n \ne 0}\frac{1}{|n|}\cdot|n||c_n| \le \Bigl(\sum_{n \ne 0}\frac{1}{n^2}\Bigr)^{1/2}\Bigl(\sum_{n \ne 0} n^2|c_n|^2\Bigr)^{1/2}$$

is finite by the Cauchy-Schwarz inequality. Thus the order of terms in the Fourier series of a $C^1$ function is immaterial, $P_{A_N} f \to f$ uniformly. We should point out, however, that this argument only shows that $S_N f$

converges; it does not show that the limit is $f$. For that we have to refer to the original proof.

Another application of Parseval's identity (actually the Riemann-Lebesgue lemma) is the principle of localization: the convergence of a Fourier series at a point $x_0$ depends only on the behavior of $f(x)$ for $x$ in a neighborhood of $x_0$.

Theorem 12.2.5 (Localization) Let $f$ and $g$ be two continuous periodic functions such that $f(x) = g(x)$ for $x$ in $[x_0 - \delta, x_0 + \delta]$, for some fixed $\delta > 0$. Then the Fourier series for $f$ and $g$ either both converge or both diverge at $x_0$.

Proof: As in the proof of convergence of the Fourier series for $C^1$ functions, we write

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}[f(x_0 - y) - f(x_0)]D_N(y)\,dy = \frac{1}{2\pi}\int_{-\delta}^{\delta}[f(x_0 - y) - f(x_0)]D_N(y)\,dy + \frac{1}{2\pi}\left(\int_{-\pi}^{-\delta} + \int_{\delta}^{\pi}\right)[f(x_0 - y) - f(x_0)]D_N(y)\,dy.$$

The first term will be the same for $g$ in place of $f$, so it suffices to show that the second term goes to zero as $N \to \infty$. Using a trigonometric identity for $\sin(N + 1/2)y$ we compute

$$(f(x_0 - y) - f(x_0))D_N(y) = (f(x_0 - y) - f(x_0))\frac{\sin(N+\tfrac12)y}{\sin\tfrac12 y} = \frac{f(x_0 - y) - f(x_0)}{\tan\tfrac12 y}\sin Ny + (f(x_0 - y) - f(x_0))\cos Ny.$$

When we integrate over $[-\pi, -\delta]$ and $[\delta, \pi]$ we avoid the zero of $\tan\tfrac12 y$, so the limit is $0$ as $N \to \infty$ by the Riemann-Lebesgue lemma. (Actually we have to extend the Riemann-Lebesgue lemma to discontinuous functions because the integral cuts off at $y = \pm\delta$. Another way to get around this is to use a continuous cutoff of the integral, multiplying by a continuous function equal to 1 on $|y| \ge \delta$ and zero on $|y| \le \delta/2$. We leave the details to the exercises.) QED
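The Riemann-Lebesgue decay that drives the localization proof can be observed on one piece of the "far" integral. The sketch below is an assumption-laden illustration: it takes $f(x) = |x|$, $x_0 = 0$, $\delta = 0.5$, looks only at the $\sin Ny$ term on $[\delta, \pi]$, and uses a hand-written Simpson rule; none of these choices come from the text.

```python
import math

def simpson(func, a, b, intervals):
    """Composite Simpson rule; `intervals` must be even."""
    h = (b - a) / intervals
    total = func(a) + func(b)
    for j in range(1, intervals):
        total += (4 if j % 2 else 2) * func(a + j * h)
    return total * h / 3

delta = 0.5  # assumed cutoff, for illustration only

def far_factor(y):
    """(f(x0 - y) - f(x0)) / tan(y/2) for the assumed choice f(x) = |x|, x0 = 0."""
    return abs(y) / math.tan(y / 2)

def oscillatory_integral(N):
    """Integral over [delta, pi] of far_factor(y) * sin(N*y)."""
    return simpson(lambda y: far_factor(y) * math.sin(N * y), delta, math.pi, 20000)

values = {N: oscillatory_integral(N) for N in (10, 100, 1000)}
for N, value in values.items():
    print(N, value, N * abs(value))
```

For this smooth, monotone factor an integration by parts gives a bound of roughly $4/N$, so the printed products $N\,|I_N|$ should stay bounded while the integrals themselves shrink.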

12.2.4 Divergence and Gibbs' Phenomenon*

We come now to our first negative result, an example of a continuous function whose Fourier series diverges at a point. The first such example was given by du Bois Reymond in 1876. The example we give is due to Fejér. In a sense these examples show that Fourier's conjecture was false. Nevertheless, the three positive results that we have established (mean-square convergence and Cesaro summability for continuous functions, and uniform convergence for $C^1$ functions) more than compensate. In this regard we should also mention three more recent results that are beyond the scope of this work:

1. A "typical" continuous function has a Fourier series that diverges on a countable dense set of points.

2. In 1926 Kolmogorov gave an example of a function (integrable in the sense of Lebesgue) whose Fourier series diverges at every point. This function is not continuous, however.

3. In 1966 Carleson showed that for a large class of functions ($L^2$ integrable in the sense of Lebesgue) including all continuous functions, the Fourier series converges at "almost every" point (the set of points at which the series diverges has Lebesgue measure zero).

The first result is relatively easy to demonstrate using some functional analysis, the second result is very difficult to demonstrate, and the third result is horrendously difficult. These results should give you an inkling of the complexity and subtlety of the issues raised by Fourier's conjecture. However, the essential validity of Fourier's point of view is firmly established.

The idea of Fejér's example of a divergent Fourier series is to construct the Fourier coefficients $c_n$ in such a way that a certain subsequence of partial sums converges uniformly (this will give us the continuous function $f$) but other subsequences of partial sums will be unbounded at $x = 0$. The basic building block is the sum

$$Q_{n,m}(x) = \frac{\cos mx}{n} + \frac{\cos(m+1)x}{n-1} + \frac{\cos(m+2)x}{n-2} + \cdots + \frac{\cos(m+n-1)x}{1}$$

$$\qquad - \frac{\cos(m+n+1)x}{1} - \frac{\cos(m+n+2)x}{2} - \cdots - \frac{\cos(m+2n)x}{n}.$$

Notice that if we sum just the first $n$ positive terms at $x = 0$ we get $1/n + 1/(n-1) + \cdots + 1 > \log n$ by comparison with $\log n = \int_1^n (1/t)\,dt$. This sum is $S_{n+m}Q_{n,m}(0)$. On the other hand, $Q_{n,m}(0) = 0$, and in fact we will show that $Q_{n,m}(x)$ is bounded by a constant independent of $n$ and $m$. Thus $Q_{n,m}(x)$ is a function with the property that a certain partial sum of its Fourier series is very much larger at $x = 0$ than any value of $Q_{n,m}(x)$. Once we have established this it will be a simple matter to create $f(x)$ by taking a suitable infinite linear combination of such functions.

Lemma 12.2.1 There exists a constant $c$ such that $|Q_{n,m}(x)| \le c$ for all $n$, $m$, and $x$.

Proof: We group together the two terms

$$\frac{\cos(m+n-k)x}{k} - \frac{\cos(m+n+k)x}{k}$$

and use $\cos(m+n\pm k)x = \cos(m+n)x\cos kx \mp \sin(m+n)x\sin kx$ to obtain

$$Q_{n,m}(x) = 2\sin(m+n)x\sum_{k=1}^{n}\frac{\sin kx}{k}.$$

To complete the proof we will show that $\sum_{k=1}^{n}(\sin kx)/k$ is uniformly bounded.

We use an interpretation for $\sum_{k=1}^{n}(\sin kx)/k$ in terms of integrals of the Dirichlet kernel. Recall that

$$D_n(x) = \sum_{k=-n}^{n} e^{ikx} = 1 + 2\sum_{k=1}^{n}\cos kx$$

and so

$$\sum_{k=1}^{n}\frac{\sin kx}{k} = \int_0^x \tfrac12(D_n(t) - 1)\,dt = \tfrac12\int_0^x D_n(t)\,dt - \tfrac12 x.$$

Thus we need to verify the uniform boundedness of

$$\int_0^x D_n(t)\,dt = \int_0^x \frac{\sin(n+\tfrac12)t}{\sin\tfrac12 t}\,dt.$$

Figure 12.2.5:

This can easily be seen if we recall that the graph of $D_n(t)$ oscillates between positive and negative values, changing sign at the zeros of $\sin(n + 1/2)t$,

$$t = \frac{\pi}{n+1/2}, \frac{2\pi}{n+1/2}, \ldots, \frac{n\pi}{n+1/2}.$$

(See Figure 12.2.5). The function $\int_0^x D_n(t)\,dt$ thus increases on $[0, \pi/(n+1/2)]$, then decreases on $[\pi/(n+1/2), 2\pi/(n+1/2)]$, and continues to alternate between intervals of increase and decrease. Furthermore, the absolute value of

$$\int_{k\pi/(n+1/2)}^{(k+1)\pi/(n+1/2)} D_n(t)\,dt$$

decreases with $k$ because the numerator $\sin(n + 1/2)t$ is the same in absolute value and the denominator $\sin(1/2)t$ increases with $k$.

Thus the function $\int_0^x D_n(t)\,dt$ attains its absolute maximum at $x = \pi/(n + 1/2)$. But we can easily estimate

$$\int_0^{\pi/(n+1/2)} D_n(t)\,dt \le \frac{\pi}{n+\tfrac12}\cdot 2\left(n+\tfrac12\right) = 2\pi$$

because $D_n(t)$ attains its maximum value $2(n + 1/2)$ at $t = 0$. QED

Concerning the functions $Q_{n,m}(x)$, we have shown $|Q_{n,m}(x)| \le c$ for all $n$ and $m$ while $S_{n+m}Q_{n,m}(0) \ge \log n$. Now we select sequences $\{n_k\}$ and $\{m_k\}$ so that there is no overlap between the exponentials occurring in the different $Q_{n_k,m_k}$. We can accomplish this by requiring $m_k > m_{k-1} + 2n_{k-1}$ since $\cos jx$ occurs in $Q_{n,m}$ only for $m \le j \le m + 2n$. We then choose a sequence $\{a_k\}$ of positive coefficients such that $\sum_{k=1}^{\infty} a_k$ is finite and set $f(x) = \sum_{k=1}^{\infty} a_k Q_{n_k,m_k}(x)$. The series converges uniformly by Lemma 12.2.1, so $f$ is continuous. Also, the series that defines $f$ is the Fourier series of $f$, because

$$c_j = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-ijx}\,dx = \sum_{k=1}^{\infty} a_k\,\frac{1}{2\pi}\int_{-\pi}^{\pi} Q_{n_k,m_k}(x)e^{-ijx}\,dx,$$

the sum commuting with the integral because the limit is uniform. At most one of the terms $\int_{-\pi}^{\pi} Q_{n_k,m_k}(x)e^{-ijx}\,dx$ is non-zero because of the non-overlapping condition. In particular $S_{n_k+m_k}f(0) = a_k S_{n_k+m_k}Q_{n_k,m_k}(0)$ because $S_{n_k+m_k}Q_{n_j,m_j} = Q_{n_j,m_j}$ for $j < k$ and $Q_{n_j,m_j}(0) = 0$ while $S_{n_k+m_k}Q_{n_j,m_j} \equiv 0$ for $j > k$. Thus $S_{n_k+m_k}f(0) \ge a_k\log n_k$, and if we choose $a_k$ and $n_k$ so that $a_k\log n_k \to \infty$ we have the divergence of the Fourier series for $f$ at $0$. A particular choice of the $m_k$, $n_k$ and $a_k$ that meet all the above conditions is $a_k = 1/k^2$ (so $\sum a_k < \infty$), $n_k = 2^{(k^3)}$ (so $a_k\log n_k \to \infty$), and $m_k = 2^{(k^3)}$ (so $m_k > m_{k-1} + 2n_{k-1}$).

An interesting fact about the sums $\sum_{k=1}^{n}(\sin kx)/k$ is that they are the partial sums of the Fourier series of the function

$$g(x) = \begin{cases}(\pi - x)/2 & \text{if } 0 \le x \le \pi,\\ -(\pi + x)/2 & \text{if } -\pi \le x < 0,\end{cases}$$

whose graph is shown in Figure 12.2.6. To see this, we compute

$$\frac{2}{\pi}\int_0^{\pi}\frac{\pi - x}{2}\sin kx\,dx = \frac{1}{\pi}(\pi - x)\left(\frac{-\cos kx}{k}\right)\Big|_0^{\pi} - \frac{1}{\pi}\int_0^{\pi}\frac{\cos kx}{k}\,dx = \frac{1}{k}$$
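Two facts about the sums $\sum_{k=1}^{n}(\sin kx)/k$ can be explored numerically: the uniform bound on $Q_{n,m}$ set against the logarithmic growth of its first $n$ terms at $x = 0$, and the size of the partial sums of the series for $g$ at the point $x = \pi/(n + 1/2)$. This is a rough sketch; the grid, the sample values of $n$, and the choice $m = n$ are assumptions made for illustration.

```python
import math

def sine_sum(n, x):
    """sum_{k=1}^{n} sin(kx)/k, the n-th partial sum of the Fourier series of g."""
    return sum(math.sin(k * x) / k for k in range(1, n + 1))

def Q(n, m, x):
    """Fejer's building block: n positive cosine terms followed by n negative ones."""
    positive = sum(math.cos((m + n - k) * x) / k for k in range(1, n + 1))
    negative = sum(math.cos((m + n + k) * x) / k for k in range(1, n + 1))
    return positive - negative

grid = [-math.pi + j * 2 * math.pi / 600 for j in range(601)]

sup_Q = {}
harmonic = {}
for n in (8, 64, 256):
    m = n  # arbitrary choice; the bound in the lemma does not depend on m
    sup_Q[n] = max(abs(Q(n, m, x)) for x in grid)
    harmonic[n] = sum(1 / k for k in range(1, n + 1))  # first n terms of Q at x = 0
    print(n, round(sup_Q[n], 3), round(harmonic[n], 3), round(math.log(n), 3))

# Partial sums of the series for g, evaluated at x = pi/(n + 1/2),
# compared with pi/2, the largest value of g.
peaks = {n: sine_sum(n, math.pi / (n + 0.5)) for n in (10, 100, 1000)}
for n, peak in peaks.items():
    print(n, round(peak, 4), round(peak / (math.pi / 2), 4))
```

The grid maximum of $|Q_{n,m}|$ stays bounded while the harmonic sum keeps growing, and the partial sums of the series for $g$ exceed $\pi/2$ at $x = \pi/(n + 1/2)$ by a margin that does not shrink as $n$ grows.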

and observe that there are no cosine terms since $g$ is odd. Note that $g(x)$ has a jump discontinuity at $x = 0$. What is the behavior of the partial sums of the Fourier series of $g$ in a neighborhood of this discontinuity? Of course they must approximate the jump, but in fact they do something more. At the point $x = \pi/(n + 1/2)$ we computed

$$\sum_{k=1}^{n}\frac{\sin kx}{k} = \frac12\int_0^{\pi/(n+1/2)} D_n(t)\,dt - \frac{\pi}{2n+1}.$$

Figure 12.2.6:

Now the claim is that this value actually exceeds $\pi/2$, which is the maximum value of $g$, by an amount that does not go to zero as $n \to \infty$. In fact

$$\begin{aligned}
\lim_{n\to\infty}\sum_{k=1}^{n}\frac{\sin kx}{k}\Big|_{x=\pi/(n+1/2)} &= \lim_{n\to\infty}\left(\frac12\int_0^{\pi/(n+1/2)} D_n(t)\,dt - \frac{\pi}{2n+1}\right)\\
&= \lim_{n\to\infty}\frac12\int_0^{\pi/(n+1/2)}\frac{\sin(n+1/2)t}{\sin t/2}\,dt\\
&= \lim_{n\to\infty}\frac12\int_0^{\pi/(n+1/2)}\left(\frac{t}{\sin t/2}\right)\frac{\sin(n+1/2)t}{t}\,dt\\
&= \lim_{n\to\infty}\int_0^{\pi/(n+1/2)}\frac{\sin(n+1/2)t}{t}\,dt = \lim_{n\to\infty}\int_0^{\pi}\frac{\sin t}{t}\,dt = \int_0^{\pi}\frac{\sin t}{t}\,dt
\end{aligned}$$

because $(1/2)t/\sin(1/2)t \to 1$ uniformly on $0 \le t \le \pi/(n + 1/2)$. Now we can compute $\int_0^{\pi}(\sin t/t)\,dt \approx 1.18\pi/2$ by numerical integration (we can also argue that $\int_0^{\pi}(\sin t/t)\,dt > \int_0^{\infty}(\sin t/t)\,dt$ by an argument similar to the proof that $\int_0^x \sin(n+1/2)t/\sin(t/2)\,dt$ attains its maximum at $x = \pi/(n + 1/2)$; and we compute $\int_0^{\infty}(\sin t)/t\,dt = \pi/2$ exactly, giving

$\int_0^{\pi}(\sin t)/t\,dt > \pi/2$). This means the partial sums of the Fourier series of $g(x)$ make a little leap before they jump across the discontinuity, as can be seen in Figure 12.2.7, and the height of the leap is independent of the number of terms in the partial sum. This is referred to as Gibbs' phenomenon, and it is true of any function with a jump discontinuity that is $C^1$ up to the jump, for such a function can be written as the sum of a $C^1$ function plus a variant of $g$. It may be regarded as a negative result because it shows that the behavior of Fourier series near jump discontinuities is worse than it has to be.

Figure 12.2.7:

12.2.5 Solution of the Heat Equation*

To conclude this chapter on a positive note, let's return to the heat equation that Fourier considered and see what we can say. For simplicity of notation we choose $L = \pi$ and $c = 1$, so the p.d.e. is $\partial u/\partial t = \partial^2 u/\partial x^2$ on $0 \le x \le \pi$ with boundary conditions $u(0, t) = u(\pi, t) = 0$. We will reflect $u$ oddly about $x = 0$, $u(-x, t) = -u(x, t)$, and then extend it to be periodic of period $2\pi$. Thus instead of the original boundary conditions we will assume periodicity: $u(x + 2\pi, t) = u(x, t)$. We will assume that $u(x, t)$ is a $C^2$ function on $t > 0$ and that $u$ is continuous on $t \ge 0$ with $u(x, 0) = f(x)$.

For each fixed $t > 0$ we consider the Fourier series of the continuous

function $u(x, t)$. The Fourier coefficients are

$$c_n(t) = \frac{1}{2\pi}\int_{-\pi}^{\pi} u(x, t)e^{-inx}\,dx.$$

Now we claim that the p.d.e. for $u$ implies that $c_n(t)$ satisfies the o.d.e. $c_n'(t) = -n^2 c_n(t)$. This requires first differentiating inside the integral to show $c_n(t)$ is $C^1$ and

$$c_n'(t) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\partial u}{\partial t}(x, t)e^{-inx}\,dx,$$

which is justified by the uniform convergence of the difference quotient $(u(x, t+s)e^{-inx} - u(x, t)e^{-inx})/s$ to the derivative as $s \to 0$. Then we use the p.d.e. to replace $\partial u/\partial t$ by $\partial^2 u/\partial x^2$ and integrate by parts twice:

$$c_n'(t) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\partial^2 u}{\partial x^2}(x, t)e^{-inx}\,dx = \frac{in}{2\pi}\int_{-\pi}^{\pi}\frac{\partial u}{\partial x}(x, t)e^{-inx}\,dx = -\frac{n^2}{2\pi}\int_{-\pi}^{\pi} u(x, t)e^{-inx}\,dx = -n^2 c_n(t),$$

the boundary terms always cancelling because all the functions are periodic.

Now we know the o.d.e. has only the solutions $c_n(t) = c_n e^{-n^2 t}$ where $c_n$ is a constant. The fact that $u(x, t)$ is continuous on the compact set $|x| \le \pi$, $0 \le t \le 1$ implies that it is uniformly bounded there, so $c_n(t)$ must satisfy $|c_n(t)| \le M$ for some fixed constant $M$ for $0 \le t \le 1$ and so $|c_n| \le M$. This means that the Fourier series $\sum_{n=-\infty}^{\infty} c_n e^{-n^2 t}e^{inx}$ converges absolutely if $t > 0$ (by comparison with $\sum e^{-n^2 t}$). We could also observe that since $u(x, t)$ is assumed differentiable, the absolute and uniform convergence $u(x, t) = \sum_{n=-\infty}^{\infty} c_n(t)e^{inx}$ has already been established. Finally we can assert that $c_n$ are the Fourier coefficients of $f$ because the uniform continuity of $u(x, t)$ on $0 \le t \le 1$ (by compactness) implies the uniform convergence of $u(x, t)$ to $f(x)$ as $t \to 0^+$, so

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx = \lim_{t\to 0^+}\frac{1}{2\pi}\int_{-\pi}^{\pi} u(x, t)e^{-inx}\,dx = \lim_{t\to 0^+} c_n e^{-n^2 t} = c_n.$$
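The coefficient formula $c_n(t) = c_n e^{-n^2 t}$ can be tried out on a concrete initial condition. The sketch below rests on several assumptions made only for illustration: the initial data $f(x) = |x|$ with hand-computed coefficients, a truncation at 200 terms, a finite-difference step $h = 10^{-3}$, and a single sample point for the p.d.e. check.

```python
import math

def coeff(n):
    """Fourier coefficient c_n of the assumed initial condition f(x) = |x| on [-pi, pi]."""
    if n == 0:
        return math.pi / 2
    return ((-1) ** n - 1) / (math.pi * n * n)

def u(x, t, terms=200):
    """Truncated series sum_{|n|<=terms} c_n * exp(-n^2 t) * e^{inx}.

    The coefficients are real and even in n, so the sum is written with cosines.
    """
    return coeff(0) + sum(
        2 * coeff(n) * math.exp(-n * n * t) * math.cos(n * x) for n in range(1, terms + 1)
    )

# Finite-difference check of du/dt = d2u/dx2 at one point with t > 0.
x0, t0, h = 1.0, 0.1, 1e-3
u_t = (u(x0, t0 + h) - u(x0, t0 - h)) / (2 * h)
u_xx = (u(x0 + h, t0) - 2 * u(x0, t0) + u(x0 - h, t0)) / (h * h)
print("u_t :", u_t)
print("u_xx:", u_xx)

# Largest distance from the initial data on a grid, for decreasing t.
grid = [-math.pi + j * 2 * math.pi / 200 for j in range(201)]
sup_gap = {t: max(abs(u(x, t) - abs(x)) for x in grid) for t in (0.1, 0.01, 0.001)}
print(sup_gap)
```

The two difference quotients should agree closely, and the grid distance to the initial data should shrink as $t$ decreases, consistent with $u(x, t) \to f(x)$ as $t \to 0^+$.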

Thus we have shown that if the heat equation with periodic boundary conditions and initial condition $u(x, 0) = f(x)$ has a solution, then this solution is

$$u(x, t) = \sum_{n=-\infty}^{\infty} c_n e^{-n^2 t}e^{inx} \quad\text{where}\quad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx.$$

This establishes the uniqueness of the solution and justifies Fourier's formula.

But now we have to confront the question: does this formula always produce a solution? We cannot use the above argument to answer this question because we assumed the existence of the solution in deriving the formula. What we have to do is start from the formula and see if it solves the problem. In other words, we are given the continuous function $f(x)$, compute its Fourier coefficients $c_n = (1/2\pi)\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$ and then write the series

$$u(x, t) = \sum_{n=-\infty}^{\infty} c_n e^{-n^2 t}e^{inx}.$$

What can we say about it? First we observe that the sequence $c_n$ is bounded, $|c_n| \le (1/2\pi)\int_{-\pi}^{\pi}|f(x)|\,dx$, so that the series defining $u(x, t)$ converges if $t > 0$; in fact the convergence is absolute and uniform in $t \ge \varepsilon$ for any $\varepsilon > 0$ by comparison with $\sum e^{-\varepsilon n^2}$. Thus the function $u(x, t)$ is well defined and continuous in $t > 0$, and it is clearly periodic in $x$. Furthermore, we can differentiate term-by-term any number of times with respect to either variable, because $\sum_{n=-\infty}^{\infty}|n|^k e^{-n^2 t}$ converges for any fixed $k$. This enables us to conclude that $u(x, t)$ is $C^2$ in $t > 0$ (it is also $C^{\infty}$) and satisfies the p.d.e. by direct computation. There remains only the question of what happens as $t \to 0$. We would like to assert that $u(x, t) \to f(x)$ uniformly as $t \to 0^+$, for this will show that by setting $u(x, 0) = f(x)$ we obtain a continuous function on $t \ge 0$ with the correct initial conditions.

This turns out to be more difficult to do. We do not know that $\sum_{n=-\infty}^{\infty} c_n e^{inx}$ converges in general, so the formula defining $u(x, t)$ does not necessarily make sense for $t = 0$. If we are willing to assume that $f$ is $C^1$, then we know the Fourier series for $f$ converges absolutely, $\sum|c_n| < \infty$, and from this it is straightforward to prove $u(x, t) \to f(x)$

uniformly since

$$\left|\sum_{n=-\infty}^{\infty}\bigl(c_n e^{-n^2 t}e^{inx} - c_n e^{inx}\bigr)\right| \le \sum_{n=-\infty}^{\infty}|c_n|\,|e^{-n^2 t} - 1|.$$

However, this hypothesis is not natural, and in fact the conclusion is true without it. The idea of the proof is very similar to the proof of Fejér's theorem. We substitute the definition of $c_n$ in the formula for $u$ and interchange the integral and sum to obtain

$$u(x, t) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(y)\left(\sum_{n=-\infty}^{\infty} e^{-n^2 t}e^{in(x-y)}\right)dy,$$

this being justified by the uniform convergence of the sum for $t > 0$. We thus have $u(x, t)$ written as a periodic convolution operator applied to $f$:

$$u(x, t) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(y)H_t(x - y)\,dy$$

where the heat kernel $H_t$ is given by

$$H_t(x) = \sum_{n=-\infty}^{\infty} e^{-n^2 t}e^{inx}.$$

It turns out that the heat kernel is positive and behaves like an approximate identity (with continuous parameter $t \to 0$ in place of the discrete parameter $n \to \infty$ before). This implies $u(x, t) \to f(x)$ uniformly as $t \to 0^+$. We will not give the proof of the properties of the heat kernel, since this would take us too far afield.

This example shows the utility of the Fourier series technique for the problems that gave rise to it, but it also indicates that it is by no means a trivial exercise to carry out the details of the applications.

During the eighteenth century, the concept of "function" was often confused with that of "analytic expression" or "formula". The concept of "function" that we now accept was introduced in the nineteenth

century; Dirichlet is reputed to be the first to state it explicitly, although Fourier also seemed to have it in mind. On the other hand, if we accept a Fourier series as an "analytic expression", then we can close the circle of ideas: at least for periodic continuous functions, there always is an analytic expression!

12.2.6 Exercises

1. Prove there exists a constant $c > 0$ such that

Hint:

$$\int_{k\pi/(N+1/2)}^{(k+1)\pi/(N+1/2)}|D_N(t)|\,dt \ge \int_{k\pi/(N+1/2)}^{(k+1)\pi/(N+1/2)}\left|\sin\left(N+\tfrac12\right)t\right|dt \Big/ \sin\frac{(k+1)\pi}{2N+1}.$$

2. Let $f$ be continuous and let $f'(x_0)$ exist. Prove $S_N f(x_0) \to f(x_0)$ as $N \to \infty$.

3. Let $f$ be Riemann integrable and continuous at the point $x_0$. Prove $\sigma_N f(x_0) \to f(x_0)$ as $N \to \infty$.

4. Let $f$ be continuous except for the point $x_0$ where $f$ has a jump discontinuity. Prove $\sigma_N f(x_0) \to (\lim_{x\to x_0^+} f(x) + \lim_{x\to x_0^-} f(x))/2$.

5. Let $f$ be Riemann integrable. Prove $\int_{-\pi}^{\pi}|\sigma_N f(x) - f(x)|\,dx \to 0$ as $N \to \infty$.

6. Apply Parseval's identity to the function

$$g(x) = \begin{cases}(\pi - x)/2, & 0 \le x \le \pi,\\ -(\pi + x)/2, & -\pi \le x < 0,\end{cases}$$

to evaluate $\sum_{n=1}^{\infty} 1/n^2$.

7. Compute the Fourier sine series for $\cos x$ and the Fourier cosine series for $\sin x$ on $[0, \pi]$.

8. Give a complete proof of the localization theorem.

9. Prove there exists a continuous function whose Fourier series diverges at a countable set of points. (Hint: if the example of a divergent Fourier series given in the text only diverges at a finite set of points, take an infinite linear combination of translates of it.)

10. If $\{f_k\}$ is a sequence of continuous functions converging uniformly to $f$, prove that the Fourier coefficients of $f_k$ converge to the Fourier coefficients of $f$.

11. Let $T_y f(x) = f(x + y)$. What is the relationship between the Fourier coefficients of $f$ and $T_y f$?

12. Let $f * g(x) = \int_{-\pi}^{\pi} f(x - y)g(y)\,dy$ for continuous periodic functions. What is the relationship between the Fourier coefficients of $f$, $g$, and $f * g$?

13. Prove that $|y| \le \pi|\sin y/2|$ on $|y| \le \pi$.

14. *An alternative summability method, invented by John Hubbard, uses the factors

a. Show that

(Hint: use the Euler identity for $\cos x/2$ and expand $(\cos x/2)^{2N}$ using the binomial theorem.)

b. Show that

$$\left(2^{2N}\Big/\binom{2N}{N}\right)(\cos x/2)^{2N}$$

satisfies the same approximate identity properties as the Fejér kernel $K_N$ (properties 1-3 in the proof of Fejér's theorem).

c. Conclude that $\sum_{n=-N}^{N} A_{N,n}c_n e^{inx}$ converges to $f$ uniformly as $N \to \infty$ for any continuous, periodic function $f$.

15. *Let $f$ be a $C^k$ periodic function of period $2\pi$ for $k \ge 2$.

a. Show that $g(y) = (f(x - y) - f(x))/\sin y/2$ is $C^{k-1}$ and there exists $M$ such that $|g^{(k-1)}(y)| \le M$ for all $x$ and $y$.

b. Show that $g(y + 2\pi) = -g(y)$.

c. Show that

$$S_N f(x) - f(x) = \pm\frac{1}{2\pi}\int_{-\pi}^{\pi} g^{(k-1)}(y)\frac{\cos(N + 1/2)y}{(N + 1/2)^{k-1}}\,dy$$

if $k$ is even and a similar formula holds with cosine replaced with sine if $k$ is odd. (Hint: use b to show that the boundary terms at $\pm\pi$ cancel.)

d. Conclude that $S_N f(x) - f(x) = O(1/N^{k-1})$ uniformly as $N \to \infty$.

16. *Let $\Delta = \partial^2/\partial x^2 + \partial^2/\partial y^2$ in $\mathbb{R}^2$. For this problem you will have to use the polar coordinates representation

$$\Delta = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial\theta^2}$$

(section 10.2, exercise 9).

a. Show that $r^{|k|}e^{ik\theta}$ is the unique solution of $\Delta u = 0$ in the disc $r < 1$ of the form $g(r)e^{ik\theta}$ with $g(1) = 1$. (Hint: the o.d.e. for $g$ has a two-dimensional space of solutions, but some of these solutions are singular at $r = 0$.)

b. Assuming $u(r, \theta) = \sum_{k=-\infty}^{\infty} g_k(r)e^{ik\theta}$ has a Fourier series expansion for each fixed $r < 1$ and that the series can be differentiated twice and also the limit as $r \to 1$ can be interchanged with the series, show that the Dirichlet problem $\Delta u = 0$ in $r < 1$ and $u(1, \theta) = f(\theta)$ has the unique solution $u(r, \theta) = \sum_{k=-\infty}^{\infty} c_k r^{|k|}e^{ik\theta}$ where $c_k$ are the Fourier coefficients of $f$.

c. Show that the solution can also be written in convolution form, $u(r, \theta) = (1/2\pi)\int_{-\pi}^{\pi} P_r(\theta - \varphi)f(\varphi)\,d\varphi$ where $P_r(\theta) = \sum_{k=-\infty}^{\infty} r^{|k|}e^{ik\theta}$.

d. Evaluate the infinite series for $P_r(\theta)$ to obtain

$$P_r(\theta) = \frac{1 - r^2}{1 - 2r\cos\theta + r^2}.$$

e. Show by direct differentiation that $\Delta u = 0$ if

$$u = \frac{1}{2\pi}\int_{-\pi}^{\pi} P_r(\theta - \varphi)f(\varphi)\,d\varphi$$

for any continuous function $f$ on the circle. Note that $P_r(\theta - \varphi) = (1 - |A|^2)/|A - B|^2$ where $A = re^{i\theta}$ and $B = e^{i\varphi}$.

f. Show by an approximate identity argument that $u(r, \theta) \to f(\theta)$ uniformly as $r \to 1$ if $f$ is continuous.

12.3 Summary

12.1 Origins of Fourier Series

Theorem The solutions to the eigenvalue problem $d^2\psi/dx^2(x) = \lambda\psi(x)$ on $0 \le x \le L$ with $\psi(0) = \psi(L) = 0$ (Dirichlet boundary conditions) are multiples of the functions $\sin(k\pi/L)x$, $k = 1, 2, \ldots$, with $\lambda = -k^2\pi^2/L^2$.

Theorem If $f(x) = \sum_{k=1}^{\infty} b_k\sin(k\pi/L)x$ with uniform convergence on $[0, L]$, then $b_k = (2/L)\int_0^L f(x)\sin(k\pi/L)x\,dx$. Similarly, if $f(x) = a_0/2 + \sum_{k=1}^{\infty} a_k\cos(k\pi/L)x$ with uniform convergence on $[0, L]$, then $a_k = (2/L)\int_0^L f(x)\cos(k\pi/L)x\,dx$. If

$$f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty}\bigl(a_k\cos(k\pi/L)x + b_k\sin(k\pi/L)x\bigr)$$

with uniform convergence on $[-L, L]$, then $a_k = (1/L)\int_{-L}^{L} f(x)\cos(k\pi/L)x\,dx$ and $b_k = (1/L)\int_{-L}^{L} f(x)\sin(k\pi/L)x\,dx$, and in this case we also have $f(x) = \sum_{n=-\infty}^{\infty} c_n e^{i(n\pi/L)x}$ with $c_n = (1/2L)\int_{-L}^{L} f(x)e^{-i(n\pi/L)x}\,dx$.

Theorem The solutions to the character identity $\psi(x + y) = \psi(x)\psi(y)$ that are $C^1$ and periodic of period $2L$ are the functions $e^{i(n\pi/L)x}$, for $n$ an integer.

Theorem The two-dimensional real vector space of linear combinations of $\cos(k\pi/L)x$ and $\sin(k\pi/L)x$ is preserved under translation.

12.2 Convergence of Fourier Series

Definition If $f(x)$ is periodic of period $2\pi$, the Fourier coefficients $c_n$ are defined by $c_n = (1/2\pi)\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$ for $n$ an integer and the formal series $\sum_{n=-\infty}^{\infty} c_n e^{inx}$ is called the Fourier series of $f$. The partial sums $S_N f(x)$ of the Fourier series are defined by $S_N f(x) = \sum_{n=-N}^{N} c_n e^{inx}$. The Fourier series is said to converge (pointwise, uniformly, or absolutely) if $S_N f(x) \to f(x)$ in the desired sense.

Lemma If $f$ is periodic of period $2\pi$, then $\int_{-\pi}^{\pi} f(x)\,dx = \int_a^{a+2\pi} f(x)\,dx$ for any $a$.

Theorem $S_N f(x) = (1/2\pi)\int_{-\pi}^{\pi} f(y)D_N(x - y)\,dy$ where $D_N$ is the Dirichlet kernel

$$D_N(t) = \sum_{n=-N}^{N} e^{int} = \frac{\sin(N + 1/2)t}{\sin t/2}.$$

Theorem 12.2.1 and 12.2.2 If $f$ is periodic of period $2\pi$ and $C^1$, then the Fourier series converges uniformly to $f$ and the rate of convergence is $O(N^{-1/2})$. If $f$ is $C^2$, then the rate of convergence is $O(N^{-1})$.

Definition The Fourier series of $f$ is said to be Cesaro summable to $f$ if $\sigma_N f$ converges to $f$, where

$$\sigma_N f(x) = \sum_{n=-N}^{N} \frac{N + 1 - |n|}{N + 1}\, c_n e^{inx} = \frac{1}{N + 1}\bigl(S_0 f(x) + \cdots + S_N f(x)\bigr).$$

Theorem $\sigma_N f(x) = (1/2\pi)\int_{-\pi}^{\pi} f(x - y)K_N(y)\,dy$ where the Fejér kernel $K_N(y)$ is given by

$$K_N(y) = \frac{1}{N + 1}\sum_{n=0}^{N} D_n(y) = \frac{1}{N + 1}\left(\frac{\sin\frac{N+1}{2}y}{\sin y/2}\right)^2.$$

Theorem 12.2.3 (Fejér) If $f$ is continuous and periodic of period $2\pi$, then the Fourier series of $f$ is uniformly Cesaro summable to $f$.

Theorem (Uniqueness of Fourier Series) If $f$ and $g$ are continuous periodic functions with the same Fourier series, then $f = g$.

Theorem 12.2.4 (Projection) Let $v_1, \ldots, v_N$ be orthonormal vectors in an inner product space $V$ with span $W$. Then $Pu = \sum_{n=1}^{N} \langle u, v_n\rangle v_n$ for any vector $u$ in $V$ is the orthogonal projection of $u$ onto $W$ in the sense that $Pu$ is the unique vector in $W$ such that $u - Pu \perp W$. Furthermore $Pu$ is the unique minimizer in $W$ of the distance to $u$ and $d(u, Pu)^2 = \|u\|^2 - \sum_{n=1}^{N} |\langle u, v_n\rangle|^2$; hence, we have Bessel's inequality $\sum_{n=1}^{N} |\langle u, v_n\rangle|^2 \le \|u\|^2$.

Definition A trigonometric polynomial of degree $N$ is a function of the form $\sum_{n=-N}^{N} a_n e^{inx}$.

Corollary 12.2.1 (Mean-Square Convergence of Fourier Series) Let $f$ be continuous and periodic. Then $S_N f$ minimizes the $L^2$ distance to $f$ among all trigonometric polynomials of degree $N$ and

$$(1/2\pi)\int_{-\pi}^{\pi} |f(x) - S_N f(x)|^2\,dx \to 0 \quad\text{as } N \to \infty.$$

Furthermore, Parseval's identity $\sum_{n=-\infty}^{\infty} |c_n|^2 = (1/2\pi)\int_{-\pi}^{\pi} |f(x)|^2\,dx$ holds or, more generally,

$$\sum_{n=-\infty}^{\infty} c_n(f)\overline{c_n(g)} = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$$

for two such functions.

Lemma (Riemann-Lebesgue) If $f$ is continuous and periodic, $\lim_{n\to\pm\infty} c_n = 0$.


Theorem (Completeness of Trigonometric System) The functions $e^{inx}$ as $n$ varies over the integers form an orthonormal set with respect to the inner product $\langle f, g\rangle = (1/2\pi)\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$, which is complete in the sense that no other non-zero continuous function is orthogonal to all $e^{inx}$.

Theorem The mean-square convergence of Fourier series does not depend on the order of the terms; i.e.,

$$\lim_{N\to\infty} (1/2\pi)\int_{-\pi}^{\pi} |f(x) - P_N f(x)|^2\,dx = 0$$

if $P_N f(x) = \sum_{A_N} c_n e^{inx}$ where $A_N$ is any increasing sequence of subsets of integers such that $\bigcup_{N=1}^{\infty} A_N$ is all integers.

Theorem If $f$ is $C^1$ and periodic with Fourier series $\sum c_n e^{inx}$, then $f'$ has Fourier series $\sum i n c_n e^{inx}$.

Theorem If $f$ is $C^1$ and periodic its Fourier series converges absolutely.

Theorem 12.2.5 (Localization) The convergence or divergence of the Fourier series of a continuous function at a point $x_0$ depends only on the function on any neighborhood of $x_0$.

Lemma The trigonometric polynomials

$$Q_{n,m}(x) = \frac{\cos mx}{n} + \frac{\cos(m+1)x}{n-1} + \cdots + \frac{\cos(m+n-1)x}{1} - \frac{\cos(m+n+1)x}{1} - \frac{\cos(m+n+2)x}{2} - \cdots - \frac{\cos(m+2n)x}{n}$$

are uniformly bounded for all $x$, $n$, $m$, but $S_{n+m}Q_{n,m}(0) > \log n$.

Example For the appropriate choice of sequence $a_k$, $n_k$, $m_k$ (e.g., $a_k = k^{-2}$, $n_k = m_k = 2^{k^3}$) the function $\sum_{k=1}^{\infty} a_k Q_{n_k,m_k}$ is continuous but has divergent (in fact unbounded) Fourier series at $x = 0$.


Example (Gibbs' Phenomenon) Let

$$g(x) = \begin{cases} (\pi - x)/2 & \text{if } 0 \le x \le \pi, \\ -(\pi + x)/2 & \text{if } -\pi \le x < 0. \end{cases}$$

Then $g$ has a jump discontinuity at $x = 0$ with jump $\pi$, but the partial sums of the Fourier series $S_N g$ jump by more than $1.08\pi$ between $\pm\pi/N$ for all $N$.

Theorem (Fourier) If $u(x, t)$ is a $C^2$ function for $t > 0$, periodic of period $2\pi$ in $x$, satisfying the heat equation $\partial u/\partial t = \partial^2 u/\partial x^2$ together with the initial condition $\lim_{t\to 0} u(x, t) = f(x)$ uniformly, then

$$(*)\qquad u(x, t) = \sum_{n=-\infty}^{\infty} c_n e^{-n^2 t} e^{inx}$$

where $c_n$ are the Fourier coefficients of $f$. Conversely, for every $C^1$ function $f$ (actually continuous will suffice), (*) solves the heat equation and initial conditions.


Chapter 13

Implicit Functions, Curves, and Surfaces

13.1 The Implicit Function Theorem

13.1.1 Statement of the Theorem

It frequently happens that a function is not given explicitly as $y = f(x)$ but rather implicitly as the solution to an equation $F(x, y) = 0$. In this section we will describe the Implicit Function Theorem, which gives conditions under which such equations actually do define functions. This is an extremely useful and important theorem, but it is also quite subtle. We have already seen an important case, involving inverse functions. If $y = f(x)$ is given explicitly, the inverse function $f^{-1}(x)$ is defined implicitly as the solution of the equation $x = f(y)$, which we can write $x - f(y) = 0$. Note that this is of the form $F(x, y) = 0$ for $F(x, y) = x - f(y)$. We have proven an inverse function theorem in one dimension, and we exploited it in the definition of the sine and cosine. We will prove an $n$-dimensional version as a corollary of the Implicit Function Theorem.

Another situation in which implicit functions arise naturally is in some of the so-called exact (or cookbook) solutions of o.d.e.'s. For example, the o.d.e.

$$M(x, y)\frac{dy}{dx} + N(x, y) = 0$$


(often written $M(x, y)\,dy + N(x, y)\,dx$) is called exact if there exists $F(x, y)$ such that $M = \partial F/\partial y$ and $N = \partial F/\partial x$ (this will happen if $\partial M/\partial x = \partial N/\partial y$ and the region of definition is reasonable; see the exercises in section 11.3). Even if the o.d.e. is not exact, it can often be made exact by multiplying by an appropriate factor. Once the o.d.e. is exact, it is equivalent to $(\partial/\partial x)F(x, y(x)) = 0$ and, hence, $F(x, y(x)) = c$ for some constant $c$. Thus all solutions of the o.d.e. are solutions of the implicit function equation $F(x, y) - c = 0$. In order for this procedure to produce a solution to the o.d.e. we have to know how to solve the implicit equation.

A third natural situation in which implicit equations arise is in the study of curves and surfaces. We frequently describe such geometric objects as the solution sets of certain equations. For instance, the unit sphere in $\mathbb{R}^3$ is the solution set for $F(x, y, z) = 0$ where $F(x, y, z) = x^2 + y^2 + z^2 - 1$. We can think of this sphere as a two-dimensional surface because we can solve for one of the variables in terms of the others, at least in portions of the sphere, say $z = -\sqrt{1 - x^2 - y^2}$ in the lower hemisphere $z < 0$. Thus $x$ and $y$ provide a pair of coordinates for this hemisphere. But if the function $F$ were more complicated, say

$$x^{27}y^5 - z^{16}x^9 + x^4y^4z^4 - 8xyz - 1,$$

so that we could not solve $F(x, y, z) = 0$ explicitly, could we still maintain that the solution set is a two-dimensional surface? We will see in a later section that the implicit function theorem is the perfect tool for answering this question.

The basic idea of the differential calculus is that if you have a question about a general function in a neighborhood of a point, ask the same question about the best affine approximation at that point and the answers should be qualitatively the same. This perspective will lead us to the correct statement of the implicit function theorem and will give us a hint at how to prove it. Notice that this approach limits us to a local theorem. This turns out to be just right because simple examples show there are no nice global theorems. In this respect the one-dimensional inverse function theorem is very misleading because there is a nice global result.

We want to consider a system of $m$ equations for $m$ unknown functions $y_1, \ldots, y_m$ (each being a function of $n$ variables $x_1, \ldots, x_n$). We will write $y = (y_1, \ldots, y_m)$ and $x = (x_1, \ldots, x_n)$, and we will write the


equations $F(x, y) = 0$ where $F$ is a function defined in an open set in $\mathbb{R}^{n+m}$ taking values in $\mathbb{R}^m$. Notice that we are prejudging the outcome that the number of equations and unknowns should be equal if there is to be any hope of having unique solutions. We are looking for a local theorem, so we consider fixed points $\bar{x}$ and $\bar{y}$ that give one particular solution, $F(\bar{x}, \bar{y}) = 0$, and we ask if there is a way to define functions $y_1, \ldots, y_m$ of $x$ (or $y(x)$ taking values in $\mathbb{R}^m$) in a neighborhood of $\bar{x}$ so that $y(\bar{x}) = \bar{y}$ and $F(x, y(x)) = 0$ for all $x$ in the domain of $y$. Because we want to use differential calculus we make the hypothesis that $F$ be $C^1$ and we also demand that the solution functions $y(x)$ be $C^1$.

If we were to replace $F$ by its best affine approximation at $(\bar{x}, \bar{y})$, then we would be looking at the system

$$\sum_{k=1}^{n} \frac{\partial F_j(\bar{x}, \bar{y})}{\partial x_k}(x_k - \bar{x}_k) + \sum_{k=1}^{m} \frac{\partial F_j(\bar{x}, \bar{y})}{\partial y_k}(y_k - \bar{y}_k) = 0$$

for $j = 1, \ldots, m$, since $F(\bar{x}, \bar{y}) = 0$. We abbreviate this system as

$$F_x(\bar{x}, \bar{y})(x - \bar{x}) + F_y(\bar{x}, \bar{y})(y - \bar{y}) = 0$$

where $F_x$ stands for the $m \times n$ matrix

$$\left\{\frac{\partial F_j}{\partial x_k}\right\}, \quad 1 \le j \le m,\ 1 \le k \le n,$$

and $F_y$ stands for the $m \times m$ matrix

$$\left\{\frac{\partial F_j}{\partial y_k}\right\}, \quad 1 \le j \le m,\ 1 \le k \le m.$$

This is a system of affine linear (more commonly called inhomogeneous linear) equations for $y$, $Ay = b(x)$, where $A$ is the $m \times m$ matrix $F_y(\bar{x}, \bar{y})$ and $b(x)$ is the vector $F_y(\bar{x}, \bar{y})\bar{y} - F_x(\bar{x}, \bar{y})(x - \bar{x})$. The implicit function theorem says essentially that if the affine linear system has a unique solution, then so does the original problem.

At this point we need to review some basic facts from linear algebra concerning matrix equations $Ay = b$. An $m \times m$ matrix is called invertible (or nonsingular) if there exists an $m \times m$ matrix $B$ with $AB = I$, $I$ denoting the identity matrix. If such a matrix exists it is unique, denoted $A^{-1}$, and it is a two-sided inverse, $A^{-1}A = AA^{-1} = I$.


If $A^{-1}$ exists, then $y = A^{-1}b$ is the unique solution of $Ay = b$; and conversely if $Ay = b$ has a unique solution for each $b$, then $A^{-1}$ exists. In fact if $Ay = 0$ has the unique solution $y = 0$ (otherwise said, zero is not an eigenvalue of $A$), then $A$ is invertible. Finally there is the determinant criterion: $A$ is invertible if and only if $\det A \ne 0$, in which case $A^{-1}$ may be explicitly given by Cramer's rule in terms of ratios of determinants of $(m - 1) \times (m - 1)$ submatrices of $A$. This is not a particularly efficient method for computing $A^{-1}$, however.

One crucial fact that we will need is that the set of invertible matrices is an open set in $\mathbb{R}^{m \times m}$; if $A$ is invertible, then any matrix sufficiently close to $A$ is also invertible. One way to see this is via the determinant criterion: the set of matrices where $\det A \ne 0$ is the inverse image under the continuous function $\det : \mathbb{R}^{m \times m} \to \mathbb{R}^1$ of the open set $\mathbb{R}^1 \setminus \{0\}$. However, we prefer a different proof that is more in the spirit of what we will be doing.

Let us write a matrix close to $A$ as $A + B$, where $B$ is a small matrix. The most convenient measure of the "size" of a matrix $M$ is the norm $\|M\|$, which is defined to be the least constant $c$ such that $|Mx| \le c|x|$ for all $x$ in $\mathbb{R}^m$ where $|x|$ denotes the Euclidean norm (warning: $\|M\|$ is not the same as the Euclidean norm on $\mathbb{R}^{m \times m}$). We will frequently use the estimate $|Mx| \le \|M\|\,|x|$ in this section. Here we want to point out that every entry $M_{jk}$ is dominated by the norm, $|M_{jk}| \le \|M\|$ (take $x = (0, \ldots, 0, 1, 0, \ldots, 0)$ with the 1 in the $k$th place, so $|x| = 1$ and $(Mx)_j = M_{jk}$; hence, $|M_{jk}| = |(Mx)_j| \le |Mx| \le \|M\|\,|x| = \|M\|$). On the other hand $\|M\| \le (\sum_{j,k} M_{jk}^2)^{1/2}$ since $|Mx|^2 = \sum_j (\sum_k M_{jk}x_k)^2 \le \sum_j (\sum_k M_{jk}^2 \sum_k x_k^2) = (\sum_{j,k} M_{jk}^2)|x|^2$ by the Cauchy-Schwarz inequality. Thus the norm of a matrix is small if and only if all its entries are small.

Now we will write a perturbation series for $(A + B)^{-1}$. We have

$$A + B = (I + BA^{-1})A,$$

so

$$(A + B)^{-1} = A^{-1}(I + BA^{-1})^{-1} = A^{-1}\sum_{k=0}^{\infty} (-1)^k (BA^{-1})^k = A^{-1} - A^{-1}BA^{-1} + A^{-1}BA^{-1}BA^{-1} - \cdots.$$
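The perturbation series for $(A + B)^{-1}$ can be checked numerically. The following is only a minimal sketch in plain Python, assuming $A^{-1}$ is already known and $\|BA^{-1}\| < 1$; the helper names `matmul` and `inverse_by_series` and the sample matrices are illustrative choices, not taken from the text.

```python
def matmul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inverse_by_series(A_inv, B, terms=60):
    """Approximate (A+B)^{-1} by A^{-1} * sum_{k<terms} (-1)^k (B A^{-1})^k.

    Assumes ||B A^{-1}|| < 1 so that the series converges geometrically.
    """
    n = len(B)
    C = matmul(B, A_inv)  # C = B A^{-1}
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # C^0
    total = [[0.0] * n for _ in range(n)]
    sign = 1.0
    for _ in range(terms):
        for i in range(n):
            for j in range(n):
                total[i][j] += sign * power[i][j]
        power = matmul(power, C)  # next power of C
        sign = -sign
    return matmul(A_inv, total)


# Illustrative data (hypothetical): A = diag(2, 4) and a small perturbation B.
A_inv = [[0.5, 0.0], [0.0, 0.25]]
B = [[0.1, 0.2], [-0.3, 0.1]]
A_plus_B = [[2.1, 0.2], [-0.3, 4.1]]

approx = inverse_by_series(A_inv, B)
residual = matmul(A_plus_B, approx)  # should be close to the identity matrix
```

Multiplying the truncated series by $A + B$ and comparing with the identity matrix is the numerical counterpart of the verification described in the text.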


We need to verify that the infinite series converges, for then it is a simple matter to multiply by $A + B$ and get the identity matrix. But this is easy to show if $\|BA^{-1}\| < 1$. In that case the series converges geometrically, so it gives an efficient method of computing $(A + B)^{-1}$ if $A^{-1}$ is known. For the details, see the exercises. Note that we have actually established a quantitative result: if $A$ is invertible and $\|BA^{-1}\| < 1$, then $A + B$ is invertible. This method works in great generality, even in infinite-dimensional problems where there is no determinant.

We will also need to know that $A^{-1}$ is a continuous function of the entries of $A$. This follows from Cramer's rule, and we can also deduce it from the perturbation series (see exercise set 13.1.3, number 2).

Let us now return to the general implicit equation $F(x, y) = 0$ and the linearized version

$$F_x(\bar{x}, \bar{y})(x - \bar{x}) + F_y(\bar{x}, \bar{y})(y - \bar{y}) = 0$$

near the point $(\bar{x}, \bar{y})$. The condition we want in order that the linearized version have a unique solution is that $F_y(\bar{x}, \bar{y})$ be invertible. This condition then implies that the equation $F(x, y) = 0$ has a local solution.

Theorem 13.1.1 (Implicit Function Theorem) Let $F(x, y)$ be a $C^1$ function defined in a neighborhood of $\bar{x}$ in $\mathbb{R}^n$ and $\bar{y}$ in $\mathbb{R}^m$ taking values in $\mathbb{R}^m$, with $F(\bar{x}, \bar{y}) = c$. Then if $F_y(\bar{x}, \bar{y})$ is invertible there exists a neighborhood $U$ of $\bar{x}$ and a $C^1$ function $y : U \to \mathbb{R}^m$ such that $y(\bar{x}) = \bar{y}$ and $F(x, y(x)) = c$ for every $x$ in $U$. Furthermore, $y$ is unique in that there exists a neighborhood $V$ of $\bar{y}$ ($V$ is the image $y(U)$) such that there is only one solution $z$ in $V$ of $F(x, z) = c$, namely $z = y(x)$. Finally, the differential of $y$ can be computed by implicit differentiation as

$$dy(x) = -F_y(x, y(x))^{-1} F_x(x, y(x)).$$

A special case is the following:

Theorem 13.1.2 (Inverse Function Theorem) Let $f$ be a $C^1$ function defined in a neighborhood of $\bar{y}$ in $\mathbb{R}^n$ taking values in $\mathbb{R}^n$. If $df(\bar{y})$ is invertible, then there exists a neighborhood $U$ of $\bar{x} = f(\bar{y})$ and a $C^1$ function $g : U \to \mathbb{R}^n$ such that $f(g(x)) = x$ for every $x$ in $U$. Furthermore $g$ maps $U$ one-to-one onto a neighborhood $V$ of $\bar{y}$ and


$g(f(y)) = y$ for every $y$ in $V$. The function $g$ is unique in that for any $x$ in $U$ there is only one $z$ in $V$ with $f(z) = x$, namely $z = g(x)$. Finally $dg(x) = df(y)^{-1}$ if $f(y) = x$.

The inverse function theorem is a special case of the implicit function theorem with $F(x, y) = f(y) - x$ and $c = 0$. Notice that $F_y(x, y) = df(y)$, so the hypotheses are the same. There is also a trick for reducing the implicit function theorem to the inverse function theorem: given $F(x, y) = c$ we construct $f : U \to \mathbb{R}^{n+m}$, where $U$ is a neighborhood of $(\bar{x}, \bar{y})$ in $\mathbb{R}^{n+m}$, by $f(x, y) = (x, F(x, y))$. Then

$$df(x, y) = \begin{pmatrix} I & 0 \\ F_x & F_y \end{pmatrix},$$

and this is invertible with inverse

$$\begin{pmatrix} I & 0 \\ -F_y^{-1}F_x & F_y^{-1} \end{pmatrix}$$

if and only if $F_y$ is invertible. Thus the hypothesis of the implicit function theorem for $F$ implies the hypothesis of the inverse function theorem for $f$. If $g$ is the local inverse to $f$, then $g(x, c) = (x, y(x))$ where $y(x)$ is the solution to $F(x, y(x)) = c$. This is the way the implicit function theorem is proved in most texts. However, we will give a direct proof with an explicit algorithm for approximating the solution. Notice that in the above reduction the dimension was increased, so in order to have the one-dimensional implicit function theorem ($n = m = 1$) we need the two-dimensional inverse function theorem.

Before beginning the proof let's look at some examples.

Example 13.1.1 Consider the equation $x^2 + y^2 = 1$ of the unit circle in $\mathbb{R}^2$, as shown in Figure 13.1.1. Here $F(x, y) = x^2 + y^2$ and $F_y(x, y) = 2y$. This $1 \times 1$ matrix is invertible if and only if $y \ne 0$. Thus according to the implicit function theorem, at any point $(x, y)$ on the circle except $(1, 0)$ and $(-1, 0)$, we can locally solve for $y$ as a function of $x$. Of course there are two possible solutions, $y = \sqrt{1 - x^2}$ and $y = -\sqrt{1 - x^2}$, and the point $(x, y)$ will determine which of these it is. At the points $(\pm 1, 0)$ this breaks down, as $y$ can't be defined for $x < -1$ or $x > 1$ and the function $y = \pm\sqrt{1 - x^2}$ does not even have a one-sided derivative at $x = \pm 1$.
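As a hedged illustration of the implicit function theorem on the unit circle $x^2 + y^2 = 1$, the sketch below solves for $y$ near the base point $(0, 1)$ by the successive-approximation map $Ty = y + F_y(\bar{x}, \bar{y})^{-1}(c - F(x, y))$ that is used in the proof later in this section. The function name `solve_circle_y`, the base point, and the sample value $x = 0.3$ are assumptions made only for this example.

```python
import math


def solve_circle_y(x, y_bar=1.0, c=1.0, iterations=50):
    """Iterate T y = y + F_y(x_bar, y_bar)^{-1} (c - F(x, y)) for F(x, y) = x^2 + y^2.

    The derivative F_y = 2y is frozen at the base point (x_bar, y_bar) = (0, y_bar),
    so no matrix (here: number) has to be re-inverted during the iteration.
    """
    fy_at_base = 2.0 * y_bar
    y = y_bar  # first guess: the known solution at x_bar
    for _ in range(iterations):
        y = y + (c - (x * x + y * y)) / fy_at_base
    return y


y_approx = solve_circle_y(0.3)
y_exact = math.sqrt(1.0 - 0.3 ** 2)  # explicit solution on the upper semicircle

# Implicit differentiation: dy/dx = -F_x / F_y = -x / y at the computed point.
slope = -0.3 / y_approx
```

For $x$ close to $0$ the map is a contraction near $y = 1$, so the iterates settle on the upper-semicircle solution; near $(\pm 1, 0)$, where $F_y$ vanishes, no such scheme is available.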


Figure 13.1.1:

Example 13.1.2 We consider what is essentially the function $z^2$ on $\mathbb{C}$. In terms of real variables alone this is the function $f : \mathbb{R}^2 \to \mathbb{R}^2$ given by $f(x, y) = (x^2 - y^2, 2xy)$. We would like to apply the inverse function theorem to this function. We compute

$$df(x, y) = \begin{pmatrix} 2x & -2y \\ 2y & 2x \end{pmatrix}$$

and $\det df(x, y) = 4x^2 + 4y^2$, so $df(x, y)$ is invertible if $(x, y) \ne (0, 0)$. Thus the inverse function theorem says that there exist local complex square roots. But even if we cut away a neighborhood of the origin, we cannot define a global complex square root because the mapping $f$ is not globally one-to-one. Indeed $f(x, y) = f(-x, -y)$. We can visualize the mapping $f$ as follows: cut the plane along the positive real axis, then wrap it twice around (simultaneously stretching $r \to r^2$), and then glue it together again. This example shows that the local nature of the conclusion of the theorems is inherent in the situation.

13.1.2 The Proof*

We come now to the proof. The idea we use can be traced directly back to Isaac Newton, who used it to locate zeros of functions. Newton's method for solving $f(x) = 0$ is to obtain $x$ as a limit of a sequence $x_0, x_1, \ldots$. The first value $x_0$ is chosen close to a root of $f(x) = 0$. We then replace $f$ by its best affine approximation $f(x_0) + f'(x_0)(x - x_0)$, set this equal to zero, and solve $x = x_0 - f'(x_0)^{-1} f(x_0)$. The point we obtain is the intersection of the tangent line to $f$ at $x_0$ with the $x$-axis, as shown in Figure 13.1.2.
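Newton's method for a scalar equation $f(x) = 0$, as discussed in this section, can be sketched in a few lines, together with the simplified iteration in which the derivative is frozen at the starting point $x_0$. This is only an illustrative sketch: the function names `newton` and `frozen_newton`, the test function $f(x) = x^2 - 2$, and the starting value $1.5$ are assumptions chosen for the example.

```python
def newton(f, fprime, x0, steps=20):
    """Newton's method: x_n = x_{n-1} - f'(x_{n-1})^{-1} f(x_{n-1})."""
    x = x0
    for _ in range(steps):
        x = x - f(x) / fprime(x)
    return x


def frozen_newton(f, fprime, x0, steps=200):
    """Simplified iteration: x_n = x_{n-1} - f'(x_0)^{-1} f(x_{n-1}).

    The derivative is evaluated once, at x0; a fixed point still satisfies f(x) = 0.
    """
    slope = fprime(x0)
    x = x0
    for _ in range(steps):
        x = x - f(x) / slope
    return x


def f(x):
    return x * x - 2.0


def fprime(x):
    return 2.0 * x


root_newton = newton(f, fprime, 1.5)
root_frozen = frozen_newton(f, fprime, 1.5)
```

Both iterations have the root of $f$ as a fixed point; the frozen-derivative variant typically converges more slowly, but it is the form that the Contractive Mapping Principle handles most easily.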


Figure 13.1.2:

This is not the solution to $f(x) = 0$ in general, but we take it to be $x_1$. Iterating this process, we define $x_n = x_{n-1} - f'(x_{n-1})^{-1} f(x_{n-1})$. For this to make sense we need $f'$ to be non-zero at all the points considered. Notice also that the iterative equation for $x_n$ in terms of $x_{n-1}$ can be written concisely as $x_n = Tx_{n-1}$ where $T$ is the transformation $Tx = x - f'(x)^{-1} f(x)$. We see that $f(x) = 0$ if and only if $x$ is a fixed point of $T$. This suggests that we should apply the Contractive Mapping Principle. For technical reasons it will be easier to analyse the transformation $\tilde{T}x = x - f'(x_0)^{-1} f(x)$. Notice that it is also true of $\tilde{T}$ that we have $\tilde{T}x = x$ if and only if $f(x) = 0$. Furthermore, if $f'$ doesn't vary too much the difference between $T$ and $\tilde{T}$ will be slight. We will not carry this discussion of Newton's method further, since we intend it only as motivation for the proof.

Here is an outline of the proof:

1. For each point $x$ in $U$ we construct a map $T : V \to V$ with the property that $F(x, y) = c$ if and only if $Ty = y$. This mapping $T$ is a slight variant of the one suggested by Newton's method.

2. We verify the hypotheses of the Contractive Mapping Principle. The fixed point, $y(x)$, is the implicitly defined function.

3. We show that $y(x)$ is differentiable and the derivative at $\bar{x}$ is obtained by differentiating the identity $F(x, y(x)) = c$. This part of the proof only uses the identity $F(x, y(x)) = c$ itself and does not refer to the way we obtained $y(x)$.


4. By varying $\bar{x}$, we establish $dy(x) = -F_y(x, y(x))^{-1} F_x(x, y(x))$ for $x$ in a neighborhood of $\bar{x}$. This formula shows that $y$ is $C^1$.

There are unfortunately several small technical difficulties that make the proof long and pedantic. We mention one at the outset: the sets $U$ and $V$ in the implicit function theorem are required to be open sets, while the Contractive Mapping Principle requires a complete metric space and, hence, closed sets. (Actually this is only a problem for $V$, since $U$ is in the parameter space.) To get around this mismatch, we have to consider both $V$ and its closure $\bar{V}$. If we can show $T : \bar{V} \to V$ and is contractive on $\bar{V}$, then we can apply the Contractive Mapping Theorem to $\bar{V}$. This gives a unique fixed point in $\bar{V}$, but the fixed point must actually lie in $V$ since $T$ maps into $V$. Thus we may conclude the existence of a unique fixed point in $V$.

Proof of the Implicit Function Theorem: We want the solution $y$ to $F(x, y) = c$ to be the limit of successive approximations obtained by iterating a contractive mapping. (The mapping will depend on $x$, which we treat as a constant.) Avoiding subscripts, let us suppose we have found $y$ for which $F(x, y) \approx c$ and we want a better approximation $z$. Replacing $F$ by its best affine approximation at $(x, y)$, which is $F(x, y) + F_y(x, y)(z - y)$ (since we don't vary $x$, there is no $F_x$ term), and setting this equal to $c$, we can solve $F(x, y) + F_y(x, y)(z - y) = c$ to obtain $z = y + F_y(x, y)^{-1}(c - F(x, y))$. Since our invertibility hypothesis concerned $F_y(\bar{x}, \bar{y})$ and $(x, y)$ is a nearby point, we will simplify this to $z = y + F_y(\bar{x}, \bar{y})^{-1}(c - F(x, y))$. Thus we are led to consider the mapping $Ty = y + F_y(\bar{x}, \bar{y})^{-1}(c - F(x, y))$, which is well defined since $F_y(\bar{x}, \bar{y})^{-1}$ is assumed to exist, and has the property that $Ty = y$ if and only if $F(x, y) = c$. We would like to find a neighborhood $V$ of $\bar{y}$, say $V = \{y : |y - \bar{y}| < \delta\}$, on which $T$ is a contractive mapping; for then we will know that $T$ has a unique fixed point and, hence, $F(x, y) = c$ has a unique solution for $y$ in $V$. Presumably we will have to restrict $x$ to lie in a neighborhood of $\bar{x}$ in order for this to work, say $U = \{x : |x - \bar{x}| < \varepsilon\}$.

In order to verify the amended hypotheses of the Contractive Mapping Principle we need to show that $T$ maps $\bar{V}$ to $V$ ($y$ in $\bar{V}$ implies $Ty$ is in $V$) and that $|Ty - Tz| \le \rho|y - z|$ for some $\rho < 1$ for $y$ and $z$ in $\bar{V}$. We are free to take $\delta$ and $\varepsilon$ as small as needed to obtain these


results. First we show that $T$ satisfies the contraction property. Since $F_y(\bar{x}, \bar{y})$ is invertible, we can write

$$Ty - Tz = y - z + F_y(\bar{x}, \bar{y})^{-1}(F(x, z) - F(x, y)) = F_y(\bar{x}, \bar{y})^{-1}[F(x, z) - F(x, y) - F_y(\bar{x}, \bar{y})(z - y)].$$

Now the factor $F_y(\bar{x}, \bar{y})^{-1}$ is a fixed matrix, so if we can control the term in brackets it will only increase the value of $\rho$. The term in brackets looks very much like a comparison of $F(x, z)$ with the best affine approximation to $F$ at $(x, y)$, the only difference being that we are evaluating $F_y$ at the wrong point. However, $F_y$ is continuous so that should not be too serious.

Now if we were in one dimension we could use the mean value theorem on $F(x, z) - F(x, y)$, but in the general case we have to resort to a more technical argument involving the fundamental theorem of the calculus applied to the line segment joining $y$ and $z$. We parametrize this segment $y + t(z - y)$ with $0 \le t \le 1$ and consider the restriction $h(t) = F(x, y + t(z - y))$ of $F$ to this line ($x$ remains constant throughout). Now $F(x, z) - F(x, y) = h(1) - h(0)$ and $h(1) - h(0) = \int_0^1 h'(t)\,dt$ by the fundamental theorem, while $h'(t) = F_y(x, y + t(z - y))(z - y)$ by the chain rule. Altogether $F(x, z) - F(x, y) = \int_0^1 F_y(x, y + t(z - y))(z - y)\,dt$, which enables us to compare it with $F_y(\bar{x}, \bar{y})(z - y)$ by writing

$$(*)\qquad F(x, z) - F(x, y) - F_y(\bar{x}, \bar{y})(z - y) = \int_0^1 \bigl(F_y(x, y + t(z - y)) - F_y(\bar{x}, \bar{y})\bigr)(z - y)\,dt$$

where we have used the linearity of matrix multiplication and the fact that $\int_0^1 dt = 1$ to bring $F_y(\bar{x}, \bar{y})$ inside.

Now we use the hypothesis that $F_y$ is continuous. By taking $\varepsilon$ small enough we can make $x$ close to $\bar{x}$, and by taking $\delta$ small enough we can make $y$ and $z$ close to $\bar{y}$; hence, all points $y + t(z - y)$ on the line segment joining them will be close to $\bar{y}$. By doing this we can make $F_y(x, y + t(z - y))$ close to $F_y(\bar{x}, \bar{y})$, say

$$\|F_y(x, y + t(z - y)) - F_y(\bar{x}, \bar{y})\| \le \lambda$$

for any preassigned $\lambda$. Thus

$$|(F_y(x, y + t(z - y)) - F_y(\bar{x}, \bar{y}))(z - y)| \le \lambda|z - y|;$$


hence, by (*) and Minkowski's inequality (see section 11.1.2)

$$|F(x, z) - F(x, y) - F_y(\bar{x}, \bar{y})(z - y)| \le \int_0^1 \lambda|z - y|\,dt = \lambda|z - y|.$$

So

$$|Ty - Tz| = |F_y(\bar{x}, \bar{y})^{-1}[F(x, z) - F(x, y) - F_y(\bar{x}, \bar{y})(z - y)]| \le M\lambda|z - y|$$

where $M = \|F_y(\bar{x}, \bar{y})^{-1}\|$ is a fixed constant. Thus by choosing $\lambda$ so that $\rho = M\lambda < 1$ and $\varepsilon$ and $\delta$ accordingly we obtain the contractive estimate for $T$ on $\bar{V}$.

The argument that $T$ maps $\bar{V}$ into $V$ is quite similar. We need to show that for $\delta$ small enough, $|y - \bar{y}| \le \delta$ implies $|Ty - \bar{y}| < \delta$. Now

$$Ty - \bar{y} = y - \bar{y} + F_y(\bar{x}, \bar{y})^{-1}(c - F(x, y)) = F_y(\bar{x}, \bar{y})^{-1}[F(\bar{x}, \bar{y}) - F(x, y) + F_y(\bar{x}, \bar{y})(y - \bar{y})]$$

since $c = F(\bar{x}, \bar{y})$. Again the terms in brackets resemble the comparison of $F(x, y)$ with the best affine approximation to $F$ at the point $(\bar{x}, \bar{y})$. We write it as

$$[F(\bar{x}, \bar{y}) + F_x(\bar{x}, \bar{y})(x - \bar{x}) + F_y(\bar{x}, \bar{y})(y - \bar{y}) - F(x, y)] - F_x(\bar{x}, \bar{y})(x - \bar{x}).$$

By the differentiability of $F$ at $(\bar{x}, \bar{y})$ we can arrange to make

$$|F(\bar{x}, \bar{y}) + F_x(\bar{x}, \bar{y})(x - \bar{x}) + F_y(\bar{x}, \bar{y})(y - \bar{y}) - F(x, y)| \le \lambda(\varepsilon + \delta)$$

for any given $\lambda$ by making $\varepsilon$ and $\delta$ small enough, and the second term we estimate by $|F_x(\bar{x}, \bar{y})(x - \bar{x})| \le c\varepsilon$ for $c = \|F_x(\bar{x}, \bar{y})\|$. Combining these we obtain

$$|Ty - \bar{y}| \le M|F(\bar{x}, \bar{y}) - F(x, y) + F_y(\bar{x}, \bar{y})(y - \bar{y})| \le M(\lambda(\varepsilon + \delta) + c\varepsilon)$$

where $M = \|F_y(\bar{x}, \bar{y})^{-1}\|$ as before; and we can make this $< \delta$ by first taking $\lambda$ so that $M\lambda < 1/2$, fixing the required $\delta$, and then choosing $\varepsilon$ so that $M(\lambda + c)\varepsilon < \delta/2$.

We have now completed the most difficult part of the proof. We have shown that there exist neighborhoods $U$ of $\bar{x}$ and $V$ of $\bar{y}$ such


that $T : \bar{V} \to V$ is a contractive mapping for each $x$ in $U$ ($T$ depends on $x$). Therefore $F(x, y) = c$ for fixed $x$ in $U$ has a unique solution $y(x)$ in $V$ that is obtained algorithmically as the limit of iterating $T$ on some initial first guess, say $\bar{y}$, so $y(x) = \lim_{k\to\infty} T^k\bar{y}$. All that remains is to show that $y(x)$ is a $C^1$ function and its differential is obtained by implicit differentiation of the identity $F(x, y(x)) = c$. We will obtain this information directly as a consequence of that identity and the differentiability of $F$.

Now both $F(\bar{x}, \bar{y}) = c$ and $F(x, y(x)) = c$, so $F(x, y(x)) - F(\bar{x}, \bar{y}) = 0$. On the other hand $F$ is differentiable at $(\bar{x}, \bar{y})$, so

$$0 = F(x, y(x)) - F(\bar{x}, \bar{y}) = F_x(\bar{x}, \bar{y})(x - \bar{x}) + F_y(\bar{x}, \bar{y})(y(x) - \bar{y}) + R(x)$$

where $R(x) = o(|x - \bar{x}| + |y(x) - \bar{y}|)$. We can then solve

$$(*)\qquad y(x) - \bar{y} = -F_y(\bar{x}, \bar{y})^{-1} F_x(\bar{x}, \bar{y})(x - \bar{x}) - F_y(\bar{x}, \bar{y})^{-1} R(x).$$

Notice that this will say that $y(x)$ is differentiable at $\bar{x}$ with differential $-F_y(\bar{x}, \bar{y})^{-1} F_x(\bar{x}, \bar{y})$ once we show that

$$F_y(\bar{x}, \bar{y})^{-1} R(x) = o(|x - \bar{x}|).$$

This is not immediately apparent because our estimate for $R(x)$ involves $|y(x) - \bar{y}|$, but we can establish it in two steps.

The first step is to prove the Lipschitz condition $|y(x) - \bar{y}| \le c|x - \bar{x}|$ for some constant $c$ and $x$ near $\bar{x}$. For this we use (*) and estimate

$$|y(x) - \bar{y}| \le |F_y(\bar{x}, \bar{y})^{-1} F_x(\bar{x}, \bar{y})(x - \bar{x})| + |F_y(\bar{x}, \bar{y})^{-1} R(x)| \le M_1|x - \bar{x}| + M_2|R(x)|$$

where $M_1 = \|F_y(\bar{x}, \bar{y})^{-1} F_x(\bar{x}, \bar{y})\|$ and $M_2 = \|F_y(\bar{x}, \bar{y})^{-1}\|$ are fixed constants. Then by taking $x$ close enough to $\bar{x}$ we can make

$$|R(x)| \le (|x - \bar{x}| + |y(x) - \bar{y}|)/2M_2,$$

so

$$|y(x) - \bar{y}| \le M_1|x - \bar{x}| + \tfrac{1}{2}(|x - \bar{x}| + |y(x) - \bar{y}|).$$


We rearrange terms to obtain

$$\tfrac{1}{2}|y(x) - \bar{y}| \le \left(M_1 + \tfrac{1}{2}\right)|x - \bar{x}|,$$

establishing the Lipschitz condition with constant $c = 2M_1 + 1$. The second step is then to argue that given any $\varepsilon > 0$ we can make $|F_y(\bar{x}, \bar{y})^{-1} R(x)| \le \varepsilon|x - \bar{x}|$, since if $x$ is close enough to $\bar{x}$ we can make

$$|R(x)| \le \delta(|x - \bar{x}| + |y(x) - \bar{y}|),$$

and then

$$|F_y(\bar{x}, \bar{y})^{-1} R(x)| \le M_2|R(x)| \le M_2\delta(|x - \bar{x}| + |y(x) - \bar{y}|) \le M_2\delta(|x - \bar{x}| + c|x - \bar{x}|) = M_2\delta(1 + c)|x - \bar{x}|.$$

Then we need only choose $\delta$ so that $M_2\delta(1 + c) = \varepsilon$. This completes the proof of the $o(|x - \bar{x}|)$ estimate for the remainder in (*), thus proving the differentiability of $y(x)$ at $\bar{x}$ with the correct differential.

Finally, we need to vary the point $\bar{x}$. We have seen that the invertibility of $F_y(\bar{x}, \bar{y})$ implies that all nearby matrices are also invertible, so $F_y(x, y(x))^{-1}$ exists for $x$ near $\bar{x}$. Thus the above argument can be repeated at $x$ to show

$$dy(x) = -F_y(x, y(x))^{-1} F_x(x, y(x)),$$

so $y(x)$ is differentiable and, hence, continuous. But this expression for $dy(x)$ is clearly a continuous function. Thus $y(x)$ is $C^1$. QED

Notice that the formula for the differential shows that $y$ is actually $C^k$ if $F$ is $C^k$. It can even be shown that $y$ is analytic if $F$ is analytic, although we will not do this here.

The reader may be struck with the resemblance of the proof of the implicit function theorem and the proof of the existence and uniqueness theorem for o.d.e.'s via Picard iteration. In a sense, the integral equation

$$x(t) = x(t_0) + \int_{t_0}^{t} G(s, x(s))\,ds$$

can be thought of as an infinite-dimensional implicit function equation.


We can now complete the discussion of the exact o.d.e.

$$M(x, y)\frac{dy}{dx} + N(x, y) = 0$$

where $M = F_y$ and $N = F_x$ for some $C^1$ function $F(x, y)$. We saw that the o.d.e. was equivalent to the implicit equation $F(x, y) = c$. We can now apply the implicit function theorem to assert the local existence of solutions of the implicit equation provided $F_y = M$ is non-zero. In other words, if $M(x_0, y_0) \ne 0$, then there exists a unique solution $y(x)$ of the o.d.e. with $y(x_0) = y_0$ defined in a neighborhood of $x_0$. The condition $M(x_0, y_0) \ne 0$ is quite natural, for it means we can put the o.d.e. in normal form,

$$\frac{dy}{dx} = -\frac{N(x, y)}{M(x, y)},$$

in a neighborhood of $(x_0, y_0)$.

13.1.3 Exercises

1. Verify that if $\|BA^{-1}\| \le r < 1$, then the series $A^{-1}\sum_{k=0}^{\infty} (-1)^k (BA^{-1})^k$ converges at a rate that depends only on $r$.

2. Show that $(A + B)^{-1} - A^{-1} = A^{-1}\sum_{k=1}^{\infty} (-1)^k (BA^{-1})^k \to 0$ as $B \to 0$.

3. Write out the proof of the inverse function theorem by modifying the given proof of the implicit function theorem.

4. In the following examples, decide at which points $(\bar{x}, \bar{y})$ the hypotheses of the implicit function theorem are satisfied:

a. $x^4 + xy^6 - 3y^4 = c$.

{sin(x + y¡} + y~ = C1,

b. yr + xyi = C2.

{ yf + 3Y1Y2 = Xl,c. y~ + 4yry~ = X2.

5. Prove that the o.d.e.

$$\left(\sqrt{|y|} + 1 + x^2\right)\frac{dy}{dx} + 2xy = 0$$

has unique local solutions with $y(x_0) = y_0$ for any $x_0$ and $y_0$. Does the existence and uniqueness theorem for o.d.e.'s apply?


6. a. Prove that for every $n \times n$ matrix $M$ sufficiently close to the identity matrix there exists a square-root matrix (solution of $A^2 = M$) and the solution is unique if $A$ is required to be sufficiently close to the identity matrix.

b. Compute the derivative of the square-root function $M \to A$.

c. Show that the binomial expansion can be used to compute $A$ (write $M = I + B$ and substitute $B$ into the power series for $\sqrt{1 + x}$) for $M$ sufficiently close to the identity.

7. a. Show that the autonomous o.d.e. $y' = F(y)$ has a unique solution with initial condition $y(t_0) = y_0$ in a neighborhood of $t_0$ provided $F$ is continuous and $F(y_0) \ne 0$. (Hint: solve the o.d.e. that the inverse function satisfies.)

b. *Let $F : \mathbb{R} \to \mathbb{R}$ be continuous, and let $A = \{y : F(y) = 0\}$. Since $A$ is closed, its complement can be written uniquely as a (finite or countable) disjoint union of open intervals $\bigcup (a_j, b_j)$. Show that there exists a unique solution to $y' = F(y)$ with any initial condition $y(t_0) = y_0$ if and only if all the improper integrals $\int_{a_j}^{m_j} |F(y)|^{-1}\,dy$ and $\int_{m_j}^{b_j} |F(y)|^{-1}\,dy$ diverge, with $m_j$ the midpoint of the interval (if one of the intervals is unbounded, say $(a_j, \infty)$, the condition should read $\int_{a_j}^{a_j + 1} |F(y)|^{-1}\,dy = +\infty$).

13.2 Curves and Surfaces

13.2.1 Motivation and Examples

We want to consider three ways to describe a subset $A$ of $\mathbb{R}^n$:

1. Parametrically, as the image of a function $g : U \to \mathbb{R}^n$ where $U$ is an open subset of $\mathbb{R}^m$, $A = g(U)$. The usual cartesian coordinates $(t_1, \ldots, t_m) = t$ in $U$ give curvilinear coordinates for $A$. Usually the function $g$ is assumed to be one-to-one. Frequently we can only describe part of the set $A$ in this way, but we can describe all of $A$ as a union $A = \bigcup_j g_j(U_j)$ where each $g_j : U_j \to \mathbb{R}^n$ is as above. It is natural to think of such sets $A$ as being $m$-dimensional.


2. Implicitly, as the solution set of an equation or set of equations. That is, if $F : \mathbb{R}^n \to \mathbb{R}^k$, $A = \{x : F(x) = 0\}$. Here we expect $k$ to be the codimension of $A$ (the difference between the dimensions of the ambient space $\mathbb{R}^n$ and the subset $A$), so $m = n - k$ should be the dimension. The domain of $F$ may also be an open subset of $\mathbb{R}^n$.

3. As the graph of a function. We split the variables $(x_1, \ldots, x_n)$ into two groups $t = (t_1, \ldots, t_m)$ and $s = (s_1, \ldots, s_k)$, where $m + k = n$, consider a function $f : \mathbb{R}^m \to \mathbb{R}^k$, and set

$$A = \{(t, s) \text{ in } \mathbb{R}^{m+k} : s = f(t)\}.$$

Again we may also allow the domain of $f$ to be an open subset of $\mathbb{R}^m$.

This is really a special case of part 1, since the $t$ variables can serve as parameters, and $g(t) = (t, f(t))$. In this case the function $g$ is obviously one-to-one.

We will loosely refer to the subsets $A$ that can be described in one of the above manners as $m$-dimensional surfaces. When $m = 1$ we will use the term curve. We will give more precise definitions later. Our goal is to show that all three approaches yield the same class of surfaces. Also, we are interested only in smooth surfaces, so we will assume that all functions involved in the descriptions of $A$ are $C^1$. The graph-of-function description is the simplest from this point of view, since we do not have to impose any further conditions on the function $f$. On the other hand, a graph-of-function description is not so easy to find. The main theorem in section 13.2.3 says that a parametric description leads to a graph-of-function representation, while the main theorem in section 13.2.4 says that an implicit description also leads to a graph-of-function representation. However, it is important to realize that both these theorems impose additional hypotheses on the defining functions. In fact, section 13.2.2 is devoted to explaining the condition we need to impose for parametric representations. Also, the theorems are local in nature; generally speaking, it is only the implicit representation that allows us to describe the whole surface at once.

The intuitive idea is that a smooth $m$-dimensional surface is a set that has an $m$-dimensional tangent space at each point. Unfortunately, there is an ambiguity in how one should define the tangent space at a point $x$: it can be 1) the vector subspace passing through the origin

Chapter 13 Implicit Functions, Curves, and Surfaces 582

Page 602: Strichartz_The Way of Analysis 2000

of all directions that are tangent to A at x; or 2) the affine subspace passing through x lying tangent to A. The tangent space according to definition 2) is just the tangent space according to definition 1) translated by x so as to sit tangent to A. We will use definition 1), but from time to time we will indicate how to obtain the affine tangent space of definition 2). One thing to keep in mind is that with whatever description of A we start (parametric, implicit, or graph-of-function) we will obtain the same kind of description of the tangent space.

Now we look at some examples that reveal some of the difficulties with which we have to deal.

1. The unit circle in $\mathbb{R}^2$ can be described implicitly as the solution set of $x^2 + y^2 - 1 = 0$. We can describe it parametrically by the angular variable $\theta$ and the function $g(\theta) = (\cos\theta, \sin\theta)$. Notice that we cannot find an open interval on which g is both one-to-one and onto (we could take a half-open interval such as $[0, 2\pi)$, but this just hides the difficulties at the endpoint 0). We can, however, cover the circle with two patches such that the parameter function g is one-to-one on each, say $g : (0, 2\pi) \to \mathbb{R}^2$ and $g : (-\pi/2, \pi/2) \to \mathbb{R}^2$, as indicated in Figure 13.2.1.

Figure 13.2.1:

Portions of the circle can be given as graphs of functions, $x = \pm\sqrt{1 - y^2}$ or $y = \pm\sqrt{1 - x^2}$.


Page 603: Strichartz_The Way of Analysis 2000

4. If we intersect the unit sphere in $\mathbb{R}^3$ with the x-y-plane, we obtain the unit circle in $\mathbb{R}^2$ again, but this time as a subset of $\mathbb{R}^3$. Implicitly this means considering the pair of equations $x^2 + y^2 + z^2 - 1 = 0$ and z = 0, or in other words, F(x, y, z) = 0 where $F(x, y, z) = (x^2 + y^2 + z^2 - 1, z)$ is a function from $\mathbb{R}^3$ to $\mathbb{R}^2$. Parametrically we can represent this circle by adding a zero

$(x, y, z) = (\cos\theta\sin\phi, \sin\theta\sin\phi, \cos\phi)$.

Notice that there appears to be a sharp cusp at the point (0, 0) on the curve, even though the functions in the implicit and parametric representations are smooth. We will see that it is the vanishing of certain derivatives that is to be blamed for this. Notice that this problem is apparent in the representation as the graph of $f(x) = x^{2/3}$ because $f'(0)$ does not exist.
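As a quick numerical sanity check of the cusp example, the following Python sketch (the function names are illustrative and do not come from the text) confirms that the parametrization $g(t) = (t^3, t^2)$ stays on the solution set of $y^3 - x^2 = 0$ and that the velocity $g'(t) = (3t^2, 2t)$ vanishes at the parameter value t = 0 of the cusp.

```python
def g(t):
    """Parametrization of the cusp curve y^3 - x^2 = 0."""
    return (t**3, t**2)


def dg(t):
    """Velocity vector g'(t) = (3t^2, 2t)."""
    return (3 * t**2, 2 * t)


def implicit(x, y):
    """Left-hand side of the implicit equation y^3 - x^2 = 0."""
    return y**3 - x**2


# Every sampled point g(t) satisfies the implicit equation.
residuals = [implicit(*g(k / 4)) for k in range(-8, 9)]

# The velocity is the zero vector at t = 0, where the cusp sits.
velocity_at_cusp = dg(0.0)
```

Away from t = 0 the second component 2t of the velocity is non-zero, so the cusp is the only place where the tracing motion comes to a stop.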

3. The unit sphere in $\mathbb{R}^3$ is given implicitly by the equation $x^2 + y^2 + z^2 - 1 = 0$. We can describe each of the six hemispheres as graphs of the functions $x = \pm\sqrt{1 - y^2 - z^2}$, $y = \pm\sqrt{1 - x^2 - z^2}$, $z = \pm\sqrt{1 - x^2 - y^2}$. Another popular parametric representation is the spherical coordinates (latitude-longitude)

Figure 13.2.2:

2. Next consider the solution set in $\mathbb{R}^2$ of the equation $y^3 - x^2 = 0$, as shown in Figure 13.2.2. This can be represented parametrically by $g(t) = (t^3, t^2)$ or as the graph of $y = x^{2/3}$.


Page 604: Strichartz_The Way of Analysis 2000

third coordinate to the $\mathbb{R}^2$ parametrization, $g(\theta) = (\cos\theta, \sin\theta, 0)$. We can also use the graph of $(y, z) = (\pm\sqrt{1 - x^2}, 0)$ or $(x, z) = (\pm\sqrt{1 - y^2}, 0)$ to represent parts of the circle, but we cannot use a graph of a function of z.

5. As a final example consider the implicit equation xy = 0 in $\mathbb{R}^2$, which defines the set consisting of the x- and y-axes. In a neighborhood of the origin where the axes cross, it is impossible to describe this set parametrically in a one-to-one fashion, although we can write the set as a union of the axes, each represented as a graph of a function.

These examples reveal some of the problems with which we are going to have to come to grips in creating a nice theory of surfaces. We can see from the start that we are going to have to describe surfaces locally and that there are really two distinct notions of "local". In the more stringent notion, we take a neighborhood U of a point on the surface in $\mathbb{R}^n$ and demand some property of the intersection of U with the surface; in other words we take a piece of the surface as it sits in $\mathbb{R}^n$. In this sense the crossing of the axes in the last example cannot be cut apart. In the other, less stringent, sense of "local", we ask only that the surface be a union of pieces with certain properties. In this sense the two axes are the union of axes, and each axis is well behaved. The first, stricter notion is the one we shall adopt, since we can obtain a nice result about implicitly defined surfaces with this concept. The technical names for the two concepts are embedded surfaces and immersed surfaces. Although we will not emphasize immersed surfaces, we should point out that they are important in topology and in some ways easier to study. For example, there is a famous theorem of Smale that says the unit sphere in $\mathbb{R}^3$ can be turned inside out (everted) by continuously deforming it through a sequence of immersed surfaces. It is intuitively clear that no such result would be possible using embedded surfaces.

13.2.2 Immersions and Embeddings

The next problem with which we have to deal is the possibility of cusps or other non-smooth behavior of the surface. We want to deal with smooth surfaces, so we can use differential calculus, but the second example shows that rough spots can develop even if the functions


Page 605: Strichartz_The Way of Analysis 2000

defining the surface are $C^1$. Of course we did notice that this problem doesn't seem to arise in the graph-of-function description. One way to evade the problem would be to make a definition in terms of the graph-of-function description: a $C^1$ surface in $\mathbb{R}^n$ is a subset that is locally the graph of a $C^1$ function. This approach seems overly restrictive, however. How do we know there aren't subsets of $\mathbb{R}^n$ that we would intuitively agree are smooth surfaces and yet cannot locally be expressed as a graph of a $C^1$ function? Actually it turns out that there aren't any; but this should be a theorem, not part of the definition. For the definition we should only require some sort of local parametric representation. But not every $C^1$ parametric representation will do, as the cusp example $g(t) = (t^3, t^2)$ shows. So our problem becomes how to restrict parametric representations to ensure "smoothness" of the surface.

To get a feel for the answer, let's look first at the simplest case: curves. The parameter is just a single variable t, which we can assume takes values in an open interval (a, b). Let $g : (a, b) \to \mathbb{R}^n$ be $C^1$. If we think of g(t) as tracing out the curve in $\mathbb{R}^n$ with t as time, then g'(t) represents the velocity of the motion. It will be a vector that is tangent to the curve, as in Figure 13.2.3. If the curve has a

Figure 13.2.3:

tangent, then it is smooth, so what can go wrong? In the example $g(t) = (t^3, t^2)$, the thing that goes wrong is $g'(0) = (0, 0)$. If the velocity vector vanishes, then we don't necessarily have a tangent to the curve and the curve may not be smooth. Our interpretation of g'(t) as velocity means that $g'(t_0) = 0$ is interpreted as saying the tracing comes to a stop at time $t_0$. In other words, you can draw a rough curve in smooth motions if the motion is allowed to stop. Therefore, the condition we want to impose is that g' is never zero. This is not


Page 606: Strichartz_The Way of Analysis 2000

and although it is not zero, it does appear less full of life than it might be. This gives us a good clue toward finding the condition we want.

It is time to appeal once again to the basic principle of differential calculus: do unto the derivative what you would do unto the function. In this case we want the parametrizing function g to be one-to-one, so we should demand that the best affine approximation be one-to-one. Clearly this doesn't depend on the constant term, but only on the linear part, the differential. The question is then whether matrix multiplication by dg is one-to-one. It is easy to see when m = 1 that this is the same as the non-vanishing of dg, but for m > 1 the question is more subtle. We need to recall some basic facts from linear algebra.

Let A denote an n x m matrix, so Ax is a linear transformation from $\mathbb{R}^m$ to $\mathbb{R}^n$. The rank of A is defined to be the dimension of the

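$dg(t, s) = \begin{pmatrix} 3t^2 & 0 \\ 2t & 0 \\ 0 & 1 \end{pmatrix}$

is never zero. Nevertheless, the surface folds at t = 0 and so it is not smooth. The differential when t = 0 is

$dg(0, s) = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix}$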
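A small Python sketch makes the fold example concrete. It assumes a hypothetical helper `rank`, written only for this illustration (plain Gaussian elimination with a tolerance), and checks that for the fold map $g(t, s) = (t^3, t^2, s)$ the differential is never the zero matrix, yet its rank drops from 2 to 1 on the line t = 0.

```python
def rank(matrix, tol=1e-12):
    """Rank of a small dense matrix by Gaussian elimination with partial pivoting."""
    rows = [[float(entry) for entry in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        if r == n_rows:
            break
        # Pick the largest entry in column c among the rows not yet used.
        pivot = max(range(r, n_rows), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) <= tol:
            continue  # no usable pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            for j in range(c, n_cols):
                rows[i][j] -= factor * rows[r][j]
        r += 1
    return r


def fold_dg(t, s):
    """Differential of the fold map g(t, s) = (t^3, t^2, s)."""
    return [[3 * t**2, 0.0], [2 * t, 0.0], [0.0, 1.0]]


rank_off_fold = rank(fold_dg(1.0, 0.0))  # a point with t != 0
rank_on_fold = rank(fold_dg(0.0, 5.0))   # a point on the fold line t = 0
```

The non-vanishing of dg is therefore too weak a condition when m > 1; what fails along the fold is that the two columns stop being linearly independent.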

to say that $g'(t_0) = 0$ necessarily implies that the curve is not smooth at $t_0$. There are many different parametric representations of a curve, and we can always stop along a smooth curve to admire the view (for example, $g(t) = (\cos t^3, \sin t^3)$ parametrizes the circle). Nevertheless, it is plausible that if a curve is smooth we can move along it at a non-zero velocity in a smooth manner.

Since we have seen that for a curve the condition we want to impose on g is the nonvanishing of the derivative, we might be tempted to guess that this is also the right condition in general. However, if we analyse the following simple "fold singularity" we will see that this is not strong enough. We simply take the cusp curve and add a dimension. The parameter space will have two variables t, s; and the two-dimensional surface in $\mathbb{R}^3$ will be parametrized by $g(t, s) = (t^3, t^2, s)$. Note that


Page 607: Strichartz_The Way of Analysis 2000

We now want to consider those $C^1$ functions $g : U \to \mathbb{R}^n$ where U is an open subset of $\mathbb{R}^m$ such that dg(x) has rank m at every point x of U. Such functions are called immersions. We will insist that local parametrizations of $C^1$ surfaces be given by immersions. As expected, immersions will be locally one-to-one; we will not prove this now since it will be a consequence of a later result. An immersion does not have to be globally one-to-one, as the familiar circle parametrization shows. We define an embedding to be an immersion that has two additional properties: $g : U \to \mathbb{R}^n$ is one-to-one, and the inverse map

and we have

The same is true of

has rank 1 and Ax is clearly not one-to-one, as

image of A, which is the linear span of the columns of A. This is also equal to the dimension of the span of the rows of A and the size of the largest invertible square submatrix (a submatrix means we select certain rows and columns, not necessarily in any special order). The kernel of A, also called the null-space, is the set of solutions of Ax = 0; and the linear function Ax is one-to-one if and only if the kernel is {0}. The basic formula dimension of domain = dimension of image + dimension of kernel means that Ax is one-to-one if and only if the rank of A is equal to the dimension of the domain, which we are calling m. This is the largest possible value for the rank, since there are only m columns. Note that if A is one-to-one we necessarily have $m \le n$, for there are only n rows. The matrix


Page 608: Strichartz_The Way of Analysis 2000

because the touching point is no longer a point of g(U).

If $g : U \to \mathbb{R}^n$ is an embedding, we take the usual Euclidean coordinates in $U \subseteq \mathbb{R}^m$ and carry them by g onto a set of coordinates for the image g(U). Fix a point $\tilde{x}$ in U, and let $y = g(\tilde{x})$ be the corresponding point in the image g(U). If we fix all the coordinates of x except for one, say $\tilde{x}_2, \tilde{x}_3, \ldots, \tilde{x}_m$, and vary $x_1$, then we obtain a straight line in U that gets mapped under g into a curve in g(U),

$f(t) = g(t, \tilde{x}_2, \ldots, \tilde{x}_m)$,

Figure 13.2.5:


should illustrate the bad behavior we are trying to avoid. In Figure 13.2.4 the map on the left fails to be one-to-one at the cross point. The map on the right is one-to-one, but the inverse fails to be continuous at the touching point (the map is defined on an open interval, so the touching point is only covered once). However, the map in Figure 13.2.5 does give an embedding

Figure 13.2.4:

$g^{-1} : g(U) \to \mathbb{R}^m$ is continuous (here we are considering g(U) as a metric subspace of $\mathbb{R}^n$). The following pictures of immersions that are not embeddings (m = 1, n = 2)


Page 609: Strichartz_The Way of Analysis 2000

Figure 13.2.6:

If $h : (a, b) \to U$ is any $C^1$ curve passing through $\tilde{x}$, say $h(c) = \tilde{x}$, then $g \circ h : (a, b) \to g(U)$ is a $C^1$ curve in g(U) passing through y, $g \circ h(c) = y$, and the tangent vector

$\frac{d}{dt}\, g \circ h(c) = \sum_{k=1}^{m} \frac{dh_k}{dt}(c)\, \frac{\partial g}{\partial x_k}(\tilde{x})$

is a linear combination of these vectors $\partial g/\partial x_k(\tilde{x})$ and, hence, belongs to the tangent space at y. Thus the tangent space is composed of tangent vectors to curves lying in g(U) and passing through y. Also, because $g^{-1} : g(U) \to \mathbb{R}^m$ is continuous, any curve on g(U) passing through y is the image of a curve in U passing through $\tilde{x}$ as above.
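The chain-rule formula for the tangent vector of a composed curve can be checked numerically. The sketch below is only an illustration: it uses the spherical coordinates map as the embedding g, an arbitrarily chosen (hypothetical) curve h in the parameter domain, and compares the chain-rule expression with a central difference quotient of the composed curve.

```python
import math


def g(theta, phi):
    """Spherical coordinates map into R^3."""
    return (math.cos(theta) * math.sin(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(phi))


def dg_columns(theta, phi):
    """The two columns dg/dtheta and dg/dphi of the differential."""
    d_theta = (-math.sin(theta) * math.sin(phi), math.cos(theta) * math.sin(phi), 0.0)
    d_phi = (math.cos(theta) * math.cos(phi), math.sin(theta) * math.cos(phi), -math.sin(phi))
    return d_theta, d_phi


def h(t):
    """A hypothetical C^1 curve in the (theta, phi) parameter domain."""
    return (2.0 * t, 1.0 + t * t)


def dh(t):
    """Derivative of h."""
    return (2.0, 2.0 * t)


def chain_rule_tangent(c):
    """Sum over k of h_k'(c) times the k-th column of dg at h(c)."""
    d_theta, d_phi = dg_columns(*h(c))
    a, b = dh(c)
    return tuple(a * u + b * v for u, v in zip(d_theta, d_phi))


def finite_difference_tangent(c, eps=1e-6):
    """Central difference approximation to d/dt g(h(t)) at t = c."""
    p, q = g(*h(c + eps)), g(*h(c - eps))
    return tuple((x - y) / (2 * eps) for x, y in zip(p, q))
```

At any parameter value c the two tuples should agree up to the discretization error of the difference quotient, which is what places the tangent vector in the span of the columns of dg.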

passing through y. The tangent vector

$\frac{df}{dt}(\tilde{x}_1) = \frac{\partial g}{\partial x_1}(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_m)$

is non-zero because it is a column of dg. In fact, if we vary each of the coordinates $x_k$ in turn, we get curves in g(U) with tangent vectors at y that are the different columns of $dg(\tilde{x})$ and, hence, are linearly independent. This is indicated in Figure 13.2.6. They span an m-dimensional subspace of $\mathbb{R}^n$ that we will call the tangent space to g(U) at the point y.

Notice that this gives a parametric description of the tangent space, namely all vectors $\sum_{k=1}^{m} c_k\, \partial g/\partial x_k(\tilde{x})$ for any choice of the parameters $c_1, \ldots, c_m$.


Page 610: Strichartz_The Way of Analysis 2000

13.2.3 Parametric Description of Surfaces

Definition 13.2.1 A $C^1$ m-dimensional surface in $\mathbb{R}^n$ (with $m \le n$) is a subset $M_m$ of $\mathbb{R}^n$ such that for every point y in $M_m$ there exists a neighborhood V of y in $\mathbb{R}^n$ and an embedding $g : U \to \mathbb{R}^n$ for U open in $\mathbb{R}^m$ such that $g(U) = V \cap M_m$. We say $M_m$ is of class $C^k$ if each of the embeddings is of class $C^k$.

Our definition thus says that $M_m$ is locally parametrized by $C^1$ embeddings. Next we want to show the fact hinted at before, that this implies that locally $M_m$ is the graph of a $C^1$ function. By this we mean that for each V there is a partition of the variables $y_1, \ldots, y_n$ into two groups, $t_1, \ldots, t_m$ and $s_1, \ldots, s_{n-m}$, such that $V \cap M_m = \{(t, s) : s = f(t) \text{ for } t \text{ in } U\}$ where $f : U \to \mathbb{R}^{n-m}$ is a $C^1$ function. (Actually it may be necessary to decrease the size of the sets V for this to be true.) Note that this is a special case of an embedding, namely g(t) = (t, f(t)). Since the first m rows of dg are the m x m identity matrix, we have rank dg = m. Also g is clearly one-to-one, and the inverse is continuous because it is just the projection onto the t-variables.

Theorem 13.2.1 Let $g : U \to \mathbb{R}^n$ with U an open subset of $\mathbb{R}^m$ be a $C^1$ immersion. Then for any point x in U there exists a neighborhood $\tilde{U}$ of x such that g is one-to-one on $\tilde{U}$ and $g(\tilde{U})$ is the graph of a $C^1$ function.

The theorem holds for all immersions and is local in the domain of g. However, if g is actually an embedding, then $g(\tilde{U})$ is a neighborhood of g(x) in g(U) and so the result is local on the surface. Thus the theorem shows that every surface is locally the graph of a function.

Proof: The idea of the proof is to solve for $x_1, \ldots, x_m$ (variables in the domain of g) as a function of some of the y variables (in the range of g) $t_1, \ldots, t_m$, using the inverse function theorem. Look at the matrix dg(x). By assumption it has rank m, so among the n rows $dg_j(x)$, j = 1, ..., n, there are m linearly independent ones. The corresponding variables $y_j$ may be taken as the t variables. The remaining n - m variables will then be the s variables. In other words, among all possible partitions of the y variables we are allowed to choose any one for which the rows $dg_j(x)$, corresponding to those variables $y_j$ we have called


Page 611: Strichartz_The Way of Analysis 2000

allow us to write $s = \varphi(h^{-1}(t)) = f(t)$ for $f = \varphi \circ h^{-1}$; and f is a composition of $C^1$ functions and, hence, $C^1$. Thus y = g(x) for x in $\tilde{U}$ if and only if y = (t, s) with t in V and s = f(t), so $g(\tilde{U})$ is a graph of f. QED
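The construction in the proof can be tried out numerically on the circle parametrization $g(\theta) = (\cos\theta, \sin\theta)$. In the sketch below, which is only an illustration under stated assumptions, the first coordinate plays the role of the t variable, so $h(\theta) = \cos\theta$ and $\varphi(\theta) = \sin\theta$. Newton's method stands in for the local inverse $h^{-1}$ whose existence the inverse function theorem guarantees, and the base point $\theta_0 = 1$ is an arbitrary choice with $\sin\theta_0 \neq 0$.

```python
import math


def h(theta):
    """The coordinate of g chosen as the t variable."""
    return math.cos(theta)


def dh(theta):
    """Derivative of h; non-zero near theta0 = 1, so h is locally invertible."""
    return -math.sin(theta)


def phi(theta):
    """The remaining coordinate of g, playing the role of the s variable."""
    return math.sin(theta)


def h_inverse(t, theta0=1.0, steps=20):
    """Local inverse of h near theta0, computed by Newton's method (an assumption
    of this sketch; the proof only uses that the inverse exists and is C^1)."""
    theta = theta0
    for _ in range(steps):
        theta -= (h(theta) - t) / dh(theta)
    return theta


def f(t):
    """Graph function f = phi o h^{-1}, valid for t near h(1) = cos(1)."""
    return phi(h_inverse(t))
```

For t near cos(1) the value f(t) should agree with the upper-semicircle branch $\sqrt{1 - t^2}$, because the base point has positive second coordinate.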

t variables, are linearly independent (or equivalently, the submatrix formed by taking the rows corresponding to t variables is invertible). To simplify the notation we now assume that the first m variables $y_1, \ldots, y_m$ are the t variables; we can always arrange this by relabeling the variables.

Now we use the inverse function theorem to solve for $x_1, \ldots, x_m$ as a function of $t_1, \ldots, t_m$. Indeed we are given the equations

$t_j = g_j(x_1, \ldots, x_m)$, j = 1, ..., m,

which we can abbreviate t = h(x) where

$h = (g_1, \ldots, g_m)$,

so dh(x) consists of the first m rows of dg(x) and, hence, is invertible. By the inverse function theorem there exists a neighborhood $\tilde{U}$ of x on which h has a $C^1$ inverse $h^{-1} : V \to \tilde{U}$ so that $x = h^{-1}(t)$ if and only if t = h(x) for x in $\tilde{U}$ and t in V, where V is an open set in the t space. This shows that g is one-to-one on $\tilde{U}$. Now the remaining equations

$s_j = g_{m+j}(x_1, \ldots, x_m)$, j = 1, ..., n - m,

which we can abbreviate as $s = \varphi(x)$ where

$\varphi = (g_{m+1}, \ldots, g_n)$,


Page 612: Strichartz_The Way of Analysis 2000

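Example 13.2.2 Consider the spherical coordinates map

$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = g\begin{pmatrix} \theta \\ \phi \end{pmatrix} = \begin{pmatrix} \cos\theta\sin\phi \\ \sin\theta\sin\phi \\ \cos\phi \end{pmatrix}$,

$g : \mathbb{R}^2 \to \mathbb{R}^3$, with the image of g equal to the unit sphere. We compute

$dg = \begin{pmatrix} -\sin\theta\sin\phi & \cos\theta\cos\phi \\ \cos\theta\sin\phi & \sin\theta\cos\phi \\ 0 & -\sin\phi \end{pmatrix}$.

When $\sin\phi = 0$ the first column is zero and dg has rank 1, which is less than m = 2; so g is not an immersion. However, at all other points dg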

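Example 13.2.1 Consider

$g(\theta) = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}$,

an immersion of $\mathbb{R}^1$ into $\mathbb{R}^2$, the image being the unit circle. Since

$dg(\theta) = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}$,

we can solve for x as a function of y in a neighborhood of any point for which $\partial g_2/\partial\theta = \cos\theta \neq 0$ and for y as a function of x in a neighborhood of any point for which

$\partial g_1/\partial\theta = -\sin\theta \neq 0$.

Choosing a point

$\begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = \begin{pmatrix} \cos\theta_0 \\ \sin\theta_0 \end{pmatrix}$

on the circle with $\sin\theta_0 \neq 0$ (hence $x_0 \neq \pm 1$), we can first solve for $\theta$ as a function of x, $\theta = \arccos x$ (choosing a branch of arccosine so that $\theta_0 = \arccos x_0$) and then $y = \sin(\arccos x)$ gives part of the circle as the graph of a function (depending on the branch of arccosine, we have $\sin(\arccos x) = \pm\sqrt{1 - x^2}$).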


Page 613: Strichartz_The Way of Analysis 2000

so that we can solve for x as a function of y and z if $\cos\theta \neq 0$ or we can solve for y as a function of x and z if $\sin\theta \neq 0$. In the first case we solve $y = \sin\theta\sin\phi$, $z = \cos\phi$ for $\theta$ and $\phi$,

$\phi = \arccos z$,

and

$\theta = \arcsin\left(\frac{y}{\sin\phi}\right) = \arcsin\left(\frac{y}{\sin\arccos z}\right)$.

Then $x = \cos\theta\sin\phi = \cos(\arcsin(y/\sin\arccos z))\,\sin(\arccos z)$, which as we know must simplify to $x = \pm\sqrt{1 - y^2 - z^2}$.

The theorem shows that we could just as well have defined a $C^1$ surface as a set that is locally the graph of a $C^1$ function. However, it is preferable to have the greater flexibility of general embeddings $g : U \to V \cap M_m$. We call the subsets $V \cap M_m$ of the surface coordinate

invertible. Since the determinant of this 2 x 2 matrix is $-\sin\phi\cos\phi$, we need $\cos\phi \neq 0$; in other words we have to stay away from the equator. Then we can solve $x = \cos\theta\sin\phi$, $y = \sin\theta\sin\phi$ for $\theta$ and $\phi$; indeed $x^2 + y^2 = \sin^2\phi$, so $\phi = \arcsin\sqrt{x^2 + y^2}$, and $y/x = \tan\theta$, so $\theta = \arctan y/x$, and then $z = \cos\phi = \cos(\arcsin\sqrt{x^2 + y^2})$, which simplifies to $z = \pm\sqrt{1 - x^2 - y^2}$ depending on the hemisphere. In this case the graph-of-function extends to the north and south poles, which had to be excluded in the spherical coordinate description.
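The two computations in this step can be spot-checked in Python. The sketch below is illustrative only: it evaluates the determinant of the first two rows of dg at an arbitrarily chosen sample point of the northern hemisphere, and it recovers z from (x, y) by the route $\phi = \arcsin\sqrt{x^2 + y^2}$, $z = \cos\phi$. The function names are hypothetical.

```python
import math


def g(theta, phi):
    """Spherical coordinates map."""
    return (math.cos(theta) * math.sin(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(phi))


def top_minor(theta, phi):
    """Determinant of the 2 x 2 matrix formed by the first two rows of dg."""
    a, b = -math.sin(theta) * math.sin(phi), math.cos(theta) * math.cos(phi)
    c, d = math.cos(theta) * math.sin(phi), math.sin(theta) * math.cos(phi)
    return a * d - b * c


def z_from_xy(x, y):
    """Northern-hemisphere branch: phi = arcsin sqrt(x^2 + y^2), then z = cos(phi)."""
    phi = math.asin(math.sqrt(x * x + y * y))
    return math.cos(phi)


theta, phi = 0.8, 0.6  # arbitrary sample with 0 < phi < pi/2
x, y, z = g(theta, phi)
```

On the southern hemisphere the same route gives the value with the opposite sign, which is the "depending on the hemisphere" caveat in the text.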

At points on the equator ($\cos\phi = 0$) we have

$\begin{pmatrix} -\sin\theta\sin\phi & \cos\theta\cos\phi \\ \cos\theta\sin\phi & \sin\theta\cos\phi \end{pmatrix}$

has rank 2, so g will be an immersion if we restrict it to any open set in $(\theta, \phi)$ space for which $\sin\phi$ never vanishes, say the strip $0 < \phi < \pi$.
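One way to see the rank condition numerically is through the cross product of the two columns of dg: a 3 x 2 matrix has rank 2 exactly when that cross product is non-zero. The Python sketch below (helper names are illustrative) computes its length, which for the spherical coordinates map works out to $|\sin\phi|$, so the rank is 2 precisely where $\sin\phi \neq 0$.

```python
import math


def dg_columns(theta, phi):
    """Columns dg/dtheta and dg/dphi of the spherical coordinates differential."""
    d_theta = (-math.sin(theta) * math.sin(phi), math.cos(theta) * math.sin(phi), 0.0)
    d_phi = (math.cos(theta) * math.cos(phi), math.sin(theta) * math.cos(phi), -math.sin(phi))
    return d_theta, d_phi


def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def column_cross_norm(theta, phi):
    """Length of the cross product of the two columns of dg."""
    return math.sqrt(sum(c * c for c in cross(*dg_columns(theta, phi))))
```

Evaluating `column_cross_norm` at points with $\phi = 0$ gives zero, matching the failure of the immersion condition at the poles.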

Then g wraps this strip around the sphere with the north and south poles removed ($\theta$ is the longitude and $\phi - \pi/2$ the latitude). Suppose we want to solve for z as a function of x and y on the sphere. We need to have the first two rows of dg linearly independent or


Page 614: Strichartz_The Way of Analysis 2000

patches and the function $g^{-1} : V \cap M_m \to U$ a local coordinate map since it assigns coordinates (the standard cartesian coordinates in U) to each point in the coordinate patch. The theorem asserts that if we take a small enough coordinate patch we can arrange for the coordinate map to be the projection onto m of the cartesian coordinates of $\mathbb{R}^n$, or in other words we can obtain local coordinates on the surface by selecting m of the cartesian coordinates of $\mathbb{R}^n$.

It is interesting to compare two local coordinate maps on coordinate patches that overlap. Suppose $V_1 \cap M_m$ and $V_2 \cap M_m$ are two patches for which $V_1 \cap V_2 \cap M_m$ is non-empty, as indicated in Figure 13.2.7. On the overlap we have two coordinate maps, $g_1^{-1}$ and $g_2^{-1}$, and let $\tilde{U}_1$ and $\tilde{U}_2$ be the images of the overlap in $U_1$ and $U_2$. Then the maps

$g_2^{-1} \circ g_1 : \tilde{U}_1 \to \tilde{U}_2$ and $g_1^{-1} \circ g_2 : \tilde{U}_2 \to \tilde{U}_1$

Figure 13.2.7:

tell us how to change from one coordinate system to another. We claim these functions are $C^1$. This is not obvious because $g_1^{-1}$ and $g_2^{-1}$ are not defined on an open subset of $\mathbb{R}^n$. However, if $g_2$ happens to be one of the special graph-of-function embeddings, then $g_2^{-1}$ is just the projection onto some of the coordinates, so $g_2^{-1} \circ g_1$ is $C^1$ and then $g_1^{-1} \circ g_2 = (g_2^{-1} \circ g_1)^{-1}$ is $C^1$ by the inverse function theorem. Since by the theorem we can always interpose a third coordinate patch of this special form, it follows that these functions are $C^1$ in general. We


Page 615: Strichartz_The Way of Analysis 2000

$\frac{d}{dt}\, g \circ f(t) = \sum_{k=1}^{m} \frac{df_k}{dt}\, \frac{\partial g}{\partial x_k}(x)$

that lies in this span, and clearly we get every vector in the span in this way for appropriate straight lines f(t). What remains to be seen is that every $C^1$ curve in the patch $V \cap M_m$ is actually obtained in this way. But if h(t) is any $C^1$ curve in the patch, we look at $f(t) = g^{-1} \circ h(t)$, which is clearly a curve in U with $g \circ f(t) = h(t)$, as indicated in Figure 13.2.8. We need to show that f is in fact $C^1$. But this is obvious if the coordinate map $g^{-1}$ is a projection onto Euclidean coordinates, and it follows in general by the above observation that all changes of coordinates are $C^1$. This completes the proof of the equivalence of the two definitions of tangent space.

Using the local coordinate maps we can transfer much of the differential calculus in $\mathbb{R}^m$ to the surface $M_m$. A function f defined on $M_m$ (say taking real values) is said to be $C^1$ if the coordinate version $f \circ g$ is $C^1$ for every coordinate patch. Questions about f can be pulled back

can interpret this as saying that, despite the non-uniqueness of the local parametric representations, any two local parametric representations of the same portion of the surface differ only by a $C^1$ change of variable in the parameters.
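For the unit circle the change of variable between two patches can be written down and tested directly. The Python sketch below is an illustration under the assumption that the two patches are the angular parametrization restricted to $(0, 2\pi)$ and to $(-\pi/2, \pi/2)$; on the second patch the coordinate map is computed with `math.atan2`, which returns the angle in $(-\pi/2, \pi/2)$ whenever x > 0.

```python
import math


def g(theta):
    """Angular parametrization of the unit circle (used on both patches)."""
    return (math.cos(theta), math.sin(theta))


def g2_inverse(point):
    """Coordinate map of the patch g : (-pi/2, pi/2) -> R^2, valid where x > 0."""
    x, y = point
    return math.atan2(y, x)


def transition(theta1):
    """Change of coordinates g2^{-1} o g1 on the overlap, i.e. for theta1 in
    (0, pi/2) or (3*pi/2, 2*pi)."""
    return g2_inverse(g(theta1))


def transition_slope(theta1, eps=1e-6):
    """Central difference estimate of the derivative of the transition map."""
    return (transition(theta1 + eps) - transition(theta1 - eps)) / (2 * eps)
```

On the first piece of the overlap the transition is the identity, on the second it is a shift by $2\pi$; in both cases the derivative is 1, so the change of variable is $C^1$ as claimed.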

Recall that an embedding $g : U \to \mathbb{R}^n$ allows us to define a tangent space to the image g(U) at any point. We want similarly to define the tangent space to a $C^1$ surface at a point to be the tangent space given by one of the local embeddings $g : U \to V \cap M_m$. In order for this to make sense we must verify that we get the same tangent space in the overlap of coordinate patches. We do this by giving a characterization of the tangent space $TM_m(y)$ for y a point in $M_m$ as the set of all tangent vectors to $C^1$ curves lying in $M_m$ at the point y. This characterization is intrinsic in the sense that it does not refer to any coordinate patch. We will show that the two definitions coincide: the set of tangent vectors to $C^1$ curves in $M_m$ at y is equal to the span of $\partial g/\partial x_k(x)$, k = 1, ..., m, where $g : U \to V \cap M_m$ is a local embedding and g(x) = y.

Recall that every $C^1$ curve passing through y that is obtained by composing a $C^1$ curve f(t) in U with the map g has tangent vector


Page 616: Strichartz_The Way of Analysis 2000

to $f \circ g$, which is an ordinary $C^1$ function. It is obvious that if f is the restriction of a $C^1$ function on $\mathbb{R}^n$ to $M_m$, then f is $C^1$, say $f = F|_{M_m}$, and then $f \circ g = F \circ g$, the composition of two $C^1$ functions. However, there are many instances when functions are defined on surfaces $M_m$ without any obvious extension to $\mathbb{R}^n$ (it can be shown that such extensions must exist locally). We can similarly define what is meant by $C^1$ functions from one surface $M_m$ to another surface, $M'_{m'}$; and it is possible to show that the differential df(y) can be sensibly defined as a linear transformation from the tangent space $TM_m(y)$ to the tangent space $TM'_{m'}(f(y))$, but we will not go into the details of this.

Figure 13.2.8:

13.2.4 Implicit Description of Surfaces

We have seen now some of the significance and versatility of the concept of $C^1$ surface from the point of view of parametric and graph-of-function descriptions. We now want to understand how $C^1$ surfaces may be given implicit descriptions as the solution set of F(x) = 0. Here

$F(x) = \begin{pmatrix} F_1(x) \\ \vdots \\ F_{n-m}(x) \end{pmatrix}$

is a $C^1$ function from $\mathbb{R}^n$ to $\mathbb{R}^{n-m}$, and we can think of the sets {x : F(x) = c} as the level sets of F. Suppose for a moment that we knew that one of these level sets, say $M_m = \{x : F(x) = 0\}$, were a $C^1$ m-dimensional surface. Then if x(t) is any curve lying in $M_m$ we have $F_j(x(t)) \equiv 0$, so

$\nabla F_j(x(t)) \cdot \frac{dx}{dt}(t) = 0$;


Page 617: Strichartz_The Way of Analysis 2000

in other words the gradient of $F_j$ is perpendicular to the tangent vector dx/dt for j = 1, ..., n - m. Since this is true for any curve in $M_m$, it follows that $\nabla F_j(x)$ is perpendicular to the tangent space $TM_m(x)$. We call the subspace of all vectors in $\mathbb{R}^n$ that are perpendicular to $TM_m(x)$ the normal space at x, denoted $NM_m(x)$. It is clear from linear algebra that the normal space has dimension n - m and $TM_m(x)$ is the space of all vectors perpendicular to $NM_m(x)$. Thus the normal and tangent spaces determine one another.

We have observed that if a $C^1$ m-dimensional surface is given implicitly by F(x) = 0, then the gradients $\nabla F_j(x)$ give us n - m vectors in the normal space $NM_m(x)$. Since the normal space has dimension n - m, it would be especially convenient if these gradients $\nabla F_j(x)$ were linearly independent because then they would span the normal space and hence determine the tangent space. Since these gradients are the rows of dF(x), the condition that they be linearly independent is the same as saying rank dF(x) = n - m (notice that n - m is the maximal rank possible for an (n - m) x n matrix). If F was actually a linear transformation, then $M_m$ would equal the tangent space, so this condition would be necessary and sufficient for $M_m$ to be m-dimensional. Thus we expect that this condition should imply in general that the level set should be a $C^1$ m-dimensional surface, and the proof should involve the implicit function theorem.

Theorem 13.2.2 Let $F : \mathbb{R}^n \to \mathbb{R}^{n-m}$ be a $C^1$ function, and suppose dF(x) has rank n - m at every point on a level set {x : F(x) = c}. Then this level set is a $C^1$ m-dimensional surface.

Proof: Let x be a point on the level set. Since we are assuming dF(x) has rank n - m, we can find n - m variables $x_j$ such that the columns $\partial F/\partial x_j(x)$ are linearly independent. Call these the s variables and the remaining variables the t variables. We want to show that in a neighborhood V of x the level set is the graph of a $C^1$ function,

$V \cap \{x : F(x) = c\} = \{(t, s) : s = f(t) \text{ for } t \text{ in } U\}$

for a $C^1$ function $f : U \to \mathbb{R}^{n-m}$, U an open set in $\mathbb{R}^m$. For simplicity we assume the first m variables $x_1, \ldots, x_m$ are the t variables. We write the equation of the level set F(t, s) = c. The condition on the linear independence of the vectors $\partial F/\partial s_j$ says that the (n - m) x (n - m)


Page 618: Strichartz_The Way of Analysis 2000

$dF = \begin{pmatrix} 2x & 2y & 2z \\ 0 & 0 & 1 \end{pmatrix}$,

which has rank 2 provided either $x \neq 0$ or $y \neq 0$. We can check that no point of the form (0, 0, z) occurs on the given level set, so the theorem asserts it is a one-dimensional surface. In fact we know it is a circle.
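The rank claim can be verified by listing the three 2 x 2 minors of dF, since a 2 x 3 matrix has rank 2 exactly when at least one of them is non-zero. The Python sketch below uses illustrative helper names and a small tolerance of our own choosing.

```python
def dF(x, y, z):
    """Differential of F(x, y, z) = (x^2 + y^2 + z^2, z)."""
    return [[2.0 * x, 2.0 * y, 2.0 * z], [0.0, 0.0, 1.0]]


def minors_2x2(m):
    """The 2 x 2 minors of a 2 x 3 matrix, for column pairs (0,1), (0,2), (1,2)."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    return [m[0][i] * m[1][j] - m[0][j] * m[1][i] for i, j in pairs]


def has_rank_two(x, y, z, tol=1e-12):
    """True when some 2 x 2 minor of dF(x, y, z) is non-zero."""
    return any(abs(d) > tol for d in minors_2x2(dF(x, y, z)))
```

The minors come out as 0, 2x and 2y, which is exactly the condition "either x or y is non-zero" stated above; the points (0, 0, z), where the rank drops, do not lie on the level set.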

The implicit representation is very convenient because with it we can describe the whole surface at once, whereas with the other representations we usually can only describe it in pieces. Also the implicit description allows us to compute easily the normal space and, hence, the tangent space at each point. For example, for the sphere $x^2 + y^2 + z^2 = 1$, the vector $(2x_0, 2y_0, 2z_0)$ is normal at the point $(x_0, y_0, z_0)$; hence, the tangent plane at that point is given by the equation $2x_0 x + 2y_0 y + 2z_0 z = 0$. Notice that by giving the tangent space as the orthogonal complement to the normal space we are giving an implicit description: the tangent space at x consists of all vectors v that satisfy the equations $v \cdot \nabla F_j(x) = 0$, j = 1, ..., n - m.
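The agreement between the implicit and parametric descriptions of the tangent space of the sphere can be checked numerically. In the Python sketch below (names are illustrative), the gradient of $F(x, y, z) = x^2 + y^2 + z^2$ is evaluated at a point given in spherical coordinates and compared with the two columns of the spherical coordinates differential, which span the tangent space in the parametric description.

```python
import math


def g(theta, phi):
    """Spherical coordinates map onto the unit sphere."""
    return (math.cos(theta) * math.sin(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(phi))


def tangent_basis(theta, phi):
    """Columns of dg: a basis of the tangent space when sin(phi) != 0."""
    d_theta = (-math.sin(theta) * math.sin(phi), math.cos(theta) * math.sin(phi), 0.0)
    d_phi = (math.cos(theta) * math.cos(phi), math.sin(theta) * math.cos(phi), -math.sin(phi))
    return d_theta, d_phi


def grad_F(x, y, z):
    """Gradient of F(x, y, z) = x^2 + y^2 + z^2, a normal vector to the sphere."""
    return (2.0 * x, 2.0 * y, 2.0 * z)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


theta, phi = 1.1, 0.7  # arbitrary sample point away from the poles
normal = grad_F(*g(theta, phi))
products = [dot(normal, v) for v in tangent_basis(theta, phi)]
```

Both dot products vanish up to rounding, so the parametrically defined tangent vectors satisfy the implicit equation $v \cdot \nabla F(x) = 0$.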

There is a more abstract concept, called manifold, which is essentially a surface that is not described as a subset of a Euclidean space. Roughly speaking, a $C^1$ m-dimensional manifold $M_m$ is a metric space that is covered by coordinate patches $V_j$ on which coordinate maps $h_j : V_j \to U_j$ are defined, where $U_j$ are open subsets of Euclidean space. The condition for differentiability takes the form of an assumption that on overlapping patches the change of coordinates functions

matrix $F_s$ is invertible, so we can apply the implicit function theorem to solve for s = f(t) as desired. QED

For example, the unit sphere in $\mathbb{R}^3$ is given implicitly by $x^2 + y^2 + z^2 = 1$. Here $F(x, y, z) = x^2 + y^2 + z^2$ and dF(x, y, z) = (2x, 2y, 2z) has rank 1 provided $(x, y, z) \neq (0, 0, 0)$. Thus by the theorem all the spheres $x^2 + y^2 + z^2 = c$ for c > 0 are $C^1$ two-dimensional surfaces. The theorem does not apply to the level set F(x, y, z) = 0, which is in fact not a two-dimensional surface. If we intersect the sphere with the x-y-plane, we are looking at the pair of equations $x^2 + y^2 + z^2 = 1$ and z = 0, which can be written

$F(x, y, z) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ for $F(x, y, z) = \begin{pmatrix} x^2 + y^2 + z^2 \\ z \end{pmatrix}$.

Then


Page 619: Strichartz_The Way of Analysis 2000

must be $C^1$. We will not go into the details here. There is a theorem to the effect that every $C^1$ manifold can be realized as a $C^1$ surface in a Euclidean space of sufficiently high dimension, so we don't get any new objects by considering manifolds rather than surfaces. Nevertheless, we do gain a more intrinsic and flexible point of view by doing so.

13.2.5 Exercises

1. Decide which of the following maps are immersions, and for those that are decide which variables can be taken to be the independent variable(s) in the description of the image locally as a graph of a function:

a. $g(t) = (\cos t, \sin t, t)$, t in $\mathbb{R}^1$,

b. $g(t, s) = (\cos t, \sin t, s)$, (t, s) in $\mathbb{R}^2$,

c. $g(t, s) = (s\cos t, s\sin t, s)$, (t, s) in $\mathbb{R}^2$,

d. $g(t, s) = (s\cos t, s\sin t, s)$, s > 1, t in $\mathbb{R}^1$.

2. For which values of the constants do the following implicit equations define a $C^1$ surface? For those that do, decide which variables can be taken to be the independent variables in the description of the surface locally as a graph of a function:

a. $x^2 + y^2 - z^2 = c$;

b. $x^2 + y^2 + z^2 = c_1$, $x^2 + y^2 - z^2 = c_2$;

c. $xyz = c$.

3. For each of the surfaces in exercise 2, compute the normal space and tangent space at each point.

4. Let $f : M_m \to \mathbb{R}$ be a $C^1$ function, where $M_m \subseteq \mathbb{R}^n$ is a $C^1$ surface. Prove that every point in $M_m$ lies in a neighborhood U such that there exists a $C^1$ function $F : U \to \mathbb{R}$ that extends f, i.e., F(y) = f(y) for y in $M_m \cap U$. (Hint: use graph-of-function representation.)

5. For each point $(\tilde{x}, \tilde{y}, \tilde{z})$ on the unit sphere $x^2 + y^2 + z^2 = 1$ and each vector $v = (v_1, v_2, v_3)$ in the tangent space to the sphere at $(\tilde{x}, \tilde{y}, \tilde{z})$, construct a $C^1$ curve lying in the sphere whose tangent vector at the point $(\tilde{x}, \tilde{y}, \tilde{z})$ is $(v_1, v_2, v_3)$.


Page 620: Strichartz_The Way of Analysis 2000

6. Show that the set of $m \times n$ matrices of rank $m$ (with $n \ge m$) is an open set in the space of matrices $\mathbb{R}^{m \times n}$.

7. Let $M^2$ be any $C^1$ two-dimensional surface in $\mathbb{R}^3$ that is compact. Show that for every two-dimensional vector space $V$ in $\mathbb{R}^3$, there exists a point $x$ on $M^2$ whose tangent space equals $V$. (Hint: if $u$ is a vector perpendicular to $V$, what happens at points on $M^2$ where $x \cdot u$ achieves a maximum or minimum?)

8. Let $M^2$ be the surface of revolution in $\mathbb{R}^3$ obtained by rotating a circle in the $x$-$z$-plane, that does not intersect the $z$-axis, about the $z$-axis. Show that $M^2$ is a $C^1$ two-dimensional surface, and compute its tangent space at any point. (Note: this surface is called a torus.)

9. Show that the Cartesian product $M^{m_1} \times M^{m_2}$ of two $C^1$ surfaces of dimensions $m_1$ and $m_2$ in $\mathbb{R}^{n_1}$ and $\mathbb{R}^{n_2}$ is a $C^1$ surface of dimension $m_1 + m_2$ in $\mathbb{R}^{n_1+n_2}$. Express the tangent space of $M^{m_1} \times M^{m_2}$ at a point $(x, y)$ in terms of the tangent space of $M^{m_1}$ at $x$ and the tangent space of $M^{m_2}$ at $y$.

10. Show that the equation $x_1^2 + x_2^2 + \cdots + x_n^2 = 1$ defines a $C^1$ $(n-1)$-dimensional surface in $\mathbb{R}^n$ (called the unit sphere). Compute its tangent space at every point.

11. *Show that the orthogonal $n \times n$ matrices form a $C^1$ surface of dimension $n(n-1)/2$ in $\mathbb{R}^{n^2}$.

12. Show that the $n \times n$ matrices with determinant equal to one form a $C^1$ surface of dimension $n^2 - 1$ in $\mathbb{R}^{n^2}$.

13. Let $M^m$ be a $C^1$ $m$-dimensional surface in $\mathbb{R}^n$. Let $TM^m$ be the set of points in $\mathbb{R}^{2n}$ of the form $(x, y)$ with $x$ in $M^m$ and $y$ in the tangent space to $M^m$ at $x$. (This is called the tangent bundle of $M^m$.) Prove that $TM^m$ is a $C^1$ $2m$-dimensional surface in $\mathbb{R}^{2n}$.


13.3 Maxima and Minima on Surfaces

13.3.1 Lagrange Multipliers

As an application of the material in section 13.2.4, we consider the problem of determining the local maxima and minima of a $C^1$ function $f$ defined on a $C^1$ surface $M^m \subseteq \mathbb{R}^n$. Given a local parametric representation $g : U \to \mathbb{R}^n$ with $U$ an open set in $\mathbb{R}^m$, we can find all local maxima and minima on $g(U)$ by finding the local maxima and minima of $f \circ g$ on $U$. This reduces the problem to one we have already solved. However, the procedure is often awkward and long, especially if we have to consider several local parametric representations to cover the surface or if the expressions for the functions $g$ are complicated.

There is an alternate method, called Lagrange multipliers, that works in the situation where the surface is given implicitly and the function $f$ is the restriction to the surface of a function defined on $\mathbb{R}^n$ (or at least a neighborhood of $M^m$ in $\mathbb{R}^n$). Whenever these conditions are fulfilled, the method of Lagrange multipliers is almost always simpler and more direct. Frequently problems solved by Lagrange multipliers are stated in terms of constraints: maximize $f(x)$ defined on $\mathbb{R}^n$ subject to the constraints $G(x) = 0$. We recognize that the constraint equation restricts $x$ to the surface $G(x) = 0$, and in fact the hypotheses on the constraint equation will be exactly what we needed to conclude that the solution set of $G(x) = 0$ is a $C^1$ surface.

The method can be described very elegantly. We assume $f$ and $G$ are $C^1$ functions, and we write

$$G(x) = (G_1(x), \ldots, G_k(x)),$$

where $G_j(x)$ are real-valued $C^1$ functions. We form the function $H(x, \lambda)$ of $n + k$ variables, $H(x, \lambda) = f(x) + \lambda_1 G_1(x) + \cdots + \lambda_k G_k(x)$, and find all critical points of $H$ (points where $\nabla H(x, \lambda) = 0$). Then the local maxima and minima of $f(x)$ on $G(x) = 0$ must occur at the $x$ values of these critical points. The $\lambda$ values are discarded. The method also finds some values of $x$ that are neither maxima nor minima.

Why does the method work? We can break the $n + k$ equations $\nabla H = 0$ into two groups, namely the $x$-derivatives and the $\lambda$-derivatives.


The $\lambda$-derivatives yield $G_1(x) = 0, G_2(x) = 0, \ldots, G_k(x) = 0$, which are the constraint equations, so every solution of $\nabla H(x, \lambda) = 0$ automatically satisfies the constraint equations. The $x$-derivatives yield

$$\frac{\partial f}{\partial x_j}(x) = -\sum_{r=1}^{k} \lambda_r \frac{\partial G_r}{\partial x_j}(x), \qquad j = 1, \ldots, n,$$

which we can abbreviate as

$$\nabla f(x) = -\sum_{r=1}^{k} \lambda_r \nabla G_r(x).$$

This equation says that $\nabla f$ lies in the span of the gradients of the functions $G_r(x)$, or in other words in the normal space to the surface $G(x) = 0$, provided we assume these gradients are linearly independent. This means the components of the gradient of $f$ are zero in the tangential directions, which is what we expect if $f$ has a maximum or minimum on the surface.

Theorem 13.3.1 (Lagrange Multipliers) Let $f : \mathbb{R}^n \to \mathbb{R}$ and $G : \mathbb{R}^n \to \mathbb{R}^k$ be $C^1$ functions, and let $\bar{x}$ be a point where $G(\bar{x}) = 0$ and such that $dG(\bar{x})$ has rank $k$. If $f(\bar{x})$ is a local maximum or minimum for $f$ on $\{x : G(x) = 0\}$, meaning there exists a neighborhood $U$ of $\bar{x}$ in $\mathbb{R}^n$ such that $f(x) \le f(\bar{x})$ (or $f(x) \ge f(\bar{x})$) for all $x$ in $U \cap \{x : G(x) = 0\}$, then there exists $\bar{\lambda}$ in $\mathbb{R}^k$ such that $H(x, \lambda) = f(x) + \lambda \cdot G(x)$ has a critical point at $(\bar{x}, \bar{\lambda})$.

Proof: The condition rank $dG(\bar{x}) = k$ implies that the level set $G(x) = 0$ is a $C^1$ surface (dimension $m = n - k$) in a neighborhood of $\bar{x}$, and since the result is purely local, we can disregard what happens away from $\bar{x}$. We have seen that $\nabla_\lambda H(\bar{x}, \bar{\lambda}) = 0$ just says $G(\bar{x}) = 0$, so it remains to show $\nabla_x H(\bar{x}, \bar{\lambda}) = 0$ for some $\bar{\lambda}$. Now consider any $C^1$ curve $h(t)$ lying in the surface $G(x) = 0$ passing through $\bar{x}$, say $h(t_0) = \bar{x}$. Then $f \circ h(t)$ is a $C^1$ function (the composition of $C^1$ functions) with a local maximum or minimum at $t_0$. Hence $(d/dt) f \circ h(t_0) = 0$ by the one-dimensional theorem. But $(d/dt) f \circ h(t_0) = \nabla f(\bar{x}) \cdot h'(t_0)$, so $\nabla f(\bar{x})$ is perpendicular to the tangent vector $h'(t_0)$ to the curve. By varying the curves we see that $\nabla f(\bar{x})$ is perpendicular to the tangent space of the surface at $\bar{x}$; hence, $\nabla f(\bar{x})$ lies in the normal space. But we


know that the normal space is spanned by $\nabla G_1(\bar{x}), \ldots, \nabla G_k(\bar{x})$; hence, $\nabla f(\bar{x}) = -\bar{\lambda} \cdot \nabla G(\bar{x})$ for some $\bar{\lambda}$ in $\mathbb{R}^k$, and this says $\nabla_x H(\bar{x}, \bar{\lambda}) = 0$. QED

If the rank condition on $G$ is not satisfied, the method may not work. For example, consider $G(x, y) = y^3 - x^2$ in $\mathbb{R}^2$, which gives a curve with a cusp at the origin, as shown in Figure 13.3.1.

Figure 13.3.1:

Clearly the function $f(x, y) = y$ assumes a minimum at the origin, but

$$\nabla f(0, 0) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

is not a linear multiple of

$$dG(0, 0) = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

Thus in using the method of Lagrange multipliers, you must also search for maxima and minima among the points where rank $dG < k$. Fortunately these are usually either isolated points or themselves surfaces of lower dimension to which the method may be applied anew.

The method is also useful for finding maxima and minima of functions over regions given by inequalities: for interior points use the usual $\nabla f = 0$ test, and for boundary points use Lagrange multipliers.


13.3.2 A Second Derivative Test*

It is possible to give a second derivative test to distinguish between maxima, minima, and saddle points. Suppose $\bar{x}$ is a point where

$$1 + \lambda(4x(x^2 - y^2) - 2x) = 0, \qquad 1 + \lambda(-4y(x^2 - y^2) - 2y) = 0, \qquad (x^2 - y^2)^2 = x^2 + y^2.$$

Subtracting the first two equations we obtain

$$\lambda(4(x + y)(x^2 - y^2) - 2(x - y)) = 0.$$

Since $\lambda \ne 0$ by the first equation, we obtain either $x = y = 0$ or $4(x + y)^2 - 2 = 0$ and, hence, $x + y = \pm 1/\sqrt{2}$. The points on the lemniscate where $x + y = \pm 1/\sqrt{2}$ give the maximum and minimum values of $x + y$, since $f(0, 0) = 0$ and $(0, 0)$ is the only point at which the gradient of $(x^2 - y^2)^2 - x^2 - y^2$ vanishes.

From the first equation we obtain $x = 0$ or $\lambda = -1$, and from the second $y = 0$ or $\lambda = 1$, so all solutions $(x, y, \lambda)$ are $(0, \pm 1, 1)$, $(\pm 1, 0, -1)$. Discarding the $\lambda$ values we have the four points $(\pm 1, 0)$ and $(0, \pm 1)$. Finally we evaluate $f$ at the five possible points and find that $(\pm 1, 0)$ are maxima and $(0, \pm 1)$ are minima, but $(0, 0)$ is neither.
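As a numerical cross-check of the unit disk example ($f(x, y) = x^2 - y^2$ with boundary constraint $x^2 + y^2 - 1 = 0$), the following Python sketch confirms that the four boundary points satisfy $\nabla H = 0$ and compares the values of $f$ at the five candidates with a brute-force scan of the unit circle. The names `f`, `grad_H`, and the sample count are choices made for this illustration only and do not come from the text.

```python
import math

def f(x, y):
    """The function being maximized and minimized."""
    return x * x - y * y

def grad_H(x, y, lam):
    """Gradient of H(x, y, lam) = f(x, y) + lam * (x^2 + y^2 - 1)."""
    return (2 * x + 2 * x * lam, -2 * y + 2 * y * lam, x * x + y * y - 1)

# Critical points (x, y, lambda) of H, as listed in the text.
boundary_candidates = [(1, 0, -1), (-1, 0, -1), (0, 1, 1), (0, -1, 1)]
for x, y, lam in boundary_candidates:
    assert grad_H(x, y, lam) == (0, 0, 0)

# Values of f at the interior candidate and the four boundary candidates.
candidates = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
values = {p: f(*p) for p in candidates}

# Independent sanity check: sample f on the boundary circle (3600 points, an arbitrary choice).
samples = [(math.cos(2 * math.pi * i / 3600), math.sin(2 * math.pi * i / 3600))
           for i in range(3600)]
boundary_max = max(f(x, y) for x, y in samples)
boundary_min = min(f(x, y) for x, y in samples)

for p in candidates:
    print(p, values[p])
print("sampled boundary max/min:", boundary_max, boundary_min)
```

The scan is only a sanity check. It is the Lagrange computation that identifies the candidates, and the comparison of values that sorts them into maxima, minima, and neither.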

As another example we consider the problem of maximizing and minimizing $f(x, y) = x + y$ on the lemniscate $(x^2 - y^2)^2 = x^2 + y^2$. Forming $H(x, y, \lambda) = x + y + \lambda((x^2 - y^2)^2 - x^2 - y^2)$ and setting $\nabla H = 0$ we obtain the equations

For example, suppose we want to find the maximum and minimum values of $f(x, y) = x^2 - y^2$ on the unit disk $x^2 + y^2 \le 1$. We first observe that

$$\nabla f(x, y) = \begin{pmatrix} 2x \\ -2y \end{pmatrix}$$

vanishes only at the origin $(0, 0)$, so this is the only possible point in the interior. On the boundary $x^2 + y^2 - 1 = 0$ we solve $\nabla H(x, y, \lambda) = 0$ for $H(x, y, \lambda) = x^2 - y^2 + \lambda(x^2 + y^2 - 1)$, obtaining the equations

$$2x + 2x\lambda = 0, \qquad -2y + 2y\lambda = 0, \qquad x^2 + y^2 - 1 = 0.$$


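$\nabla H(\bar{x}, \bar{\lambda}) = 0$ and $dG(\bar{x})$ has rank $k$. As in the proof of the theorem, let $h(t)$ be a $C^2$ curve lying in the level set $G(x) = 0$ passing through $\bar{x}$, $h(t_0) = \bar{x}$. We assume that $f$ and $G$ are also $C^2$. We want to compute the second derivative $(d^2/dt^2)(f \circ h)$ at $t_0$. This is

$$\sum_{j=1}^{n} \sum_{l=1}^{n} \frac{\partial^2 f}{\partial x_j \partial x_l}(\bar{x})\, h_j'(t_0) h_l'(t_0) + \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}(\bar{x})\, h_j''(t_0). \qquad (*)$$

Now we want to eliminate $h''$ from this expression. We can do this by observing that $G_r(h(t)) \equiv 0$, so $(d^2/dt^2) G_r(h(t_0)) = 0$ for $r = 1, \ldots, k$. Thus

$$\sum_{j=1}^{n} \sum_{l=1}^{n} \frac{\partial^2 G_r}{\partial x_j \partial x_l}(\bar{x})\, h_j'(t_0) h_l'(t_0) + \sum_{j=1}^{n} \frac{\partial G_r}{\partial x_j}(\bar{x})\, h_j''(t_0) = 0. \qquad (**)$$

Now recall that we have $\partial f/\partial x_j(\bar{x}) = -\sum_{r=1}^{k} \bar{\lambda}_r\, \partial G_r/\partial x_j(\bar{x})$, so if we multiply each of the equations $(**)$ by $\bar{\lambda}_r$ and sum we obtain

$$\sum_{j=1}^{n} \frac{\partial f}{\partial x_j}(\bar{x})\, h_j''(t_0) = -\sum_{j=1}^{n} \sum_{r=1}^{k} \bar{\lambda}_r \frac{\partial G_r}{\partial x_j}(\bar{x})\, h_j''(t_0) = \sum_{j=1}^{n} \sum_{l=1}^{n} \sum_{r=1}^{k} \bar{\lambda}_r \frac{\partial^2 G_r}{\partial x_j \partial x_l}(\bar{x})\, h_j'(t_0) h_l'(t_0).$$

Finally we can substitute this back into $(*)$ to obtain

$$\frac{d^2}{dt^2} f \circ h(t_0) = \left\langle \left( d^2 f(\bar{x}) + \bar{\lambda} \cdot d^2 G(\bar{x}) \right) h'(t_0),\, h'(t_0) \right\rangle$$

with the obvious abbreviations. This may seem like a messy expression, but it is easy to interpret. Remember that $h(t)$ is just a curve lying in the surface passing through $\bar{x}$ at $t = t_0$, so $h'(t_0)$ is just an arbitrary vector in the tangent space to the surface at $\bar{x}$. Thus the signs of the second derivatives of $f$ along curves in the surface through $\bar{x}$ are given by the quadratic form associated to the matrix $d^2 f(\bar{x}) + \bar{\lambda} \cdot d^2 G(\bar{x})$ restricted to the tangent space at $\bar{x}$. Notice that the situation has changed in two ways from the unconstrained problem: first, we restrict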


attention to the tangent space to the surface at $\bar{x}$; second, we "correct" the Hessian $d^2 f(\bar{x})$ by adding the term $\bar{\lambda} \cdot d^2 G(\bar{x})$. Here $\bar{\lambda}$ is the vector given by the equation $df(\bar{x}) + \bar{\lambda} \cdot dG(\bar{x}) = 0$. Also the tangent space at $\bar{x}$ is the set of vectors $u$ in $\mathbb{R}^n$ such that $dG(\bar{x}) u = 0$. Thus everything we need is computable from $f$ and $G$.

Now if $f$ has a local maximum (or minimum) at $\bar{x}$, then $(d^2/dt^2) f \circ h(t_0)$ is always $\le 0$ (or $\ge 0$), so

$$\left\langle \left( d^2 f(\bar{x}) + \bar{\lambda} \cdot d^2 G(\bar{x}) \right) u,\, u \right\rangle \le 0 \quad (\text{or } \ge 0)$$

for all $u$ such that $dG(\bar{x}) u = 0$ is a necessary condition for a maximum (or minimum). If we have strict inequality, then we have a local maximum (or minimum); this requires an additional argument similar to the one in the unconstrained case to show that there is a neighborhood of $\bar{x}$ on the surface (as opposed to a union of neighborhoods on the curves passing through $\bar{x}$) such that $f(\bar{x})$ is the maximum (or minimum).

Let's look at some examples. The parabolas $y = \pm x^2$ shown in Figure 13.3.2 have the same tangent space at the origin, but the function $f(x, y) = y$ has a maximum on one and a minimum on the other.

Figure 13.3.2:

Taking $G(x, y) = y \pm x^2$ and applying the method of Lagrange multipliers we set $H(x, y, \lambda) = y + \lambda(y \pm x^2)$. From $\nabla H(x, y, \lambda) = 0$ we


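get $\pm 2x\lambda = 0$, $1 + \lambda = 0$, $y \pm x^2 = 0$, which has the unique solution $(x, y) = (0, 0)$ and $\lambda = -1$. Now we compute the matrix

$$d^2 f + \lambda\, d^2 G = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} + \lambda \begin{pmatrix} \pm 2 & 0 \\ 0 & 0 \end{pmatrix},$$

and so for $\lambda = -1$ this is

$$\begin{pmatrix} \mp 2 & 0 \\ 0 & 0 \end{pmatrix}.$$

Now the tangent space at $(0, 0)$ is the $x$-axis (given by $(u, v) \cdot \nabla G(x, y) = \pm 2xu + v = 0$, which for $(x, y) = 0$ gives $v = 0$). Restricting the matrix

$$\begin{pmatrix} \mp 2 & 0 \\ 0 & 0 \end{pmatrix}$$

to vectors of the form

$$\begin{pmatrix} u \\ 0 \end{pmatrix}$$

gives

$$\left\langle \begin{pmatrix} \mp 2 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} u \\ 0 \end{pmatrix}, \begin{pmatrix} u \\ 0 \end{pmatrix} \right\rangle = \mp 2u^2,$$

so we have a maximum in the first case (lower parabola) and a minimum in the second case.

For another example, consider the function $f(x, y, z) = x - y$ on the unit sphere $x^2 + y^2 + z^2 - 1 = 0$. Then we find from $\nabla H = 0$ that

$$1 + 2\lambda x = 0, \qquad -1 + 2\lambda y = 0, \qquad 2\lambda z = 0, \qquad x^2 + y^2 + z^2 = 1,$$

which has the two solutions

$$(x, y, z, \lambda) = \left( \tfrac{\sqrt{2}}{2}, -\tfrac{\sqrt{2}}{2}, 0, -\tfrac{\sqrt{2}}{2} \right) \quad \text{and} \quad (x, y, z, \lambda) = \left( -\tfrac{\sqrt{2}}{2}, \tfrac{\sqrt{2}}{2}, 0, \tfrac{\sqrt{2}}{2} \right).$$

The matrix $d^2 f + \lambda\, d^2 G$ is

$$\begin{pmatrix} 2\lambda & 0 & 0 \\ 0 & 2\lambda & 0 \\ 0 & 0 & 2\lambda \end{pmatrix},$$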


which is negative definite on all of $\mathbb{R}^3$ in the first case ($\lambda = -\sqrt{2}/2$) and positive definite in the second ($\lambda = \sqrt{2}/2$). It follows that it is negative and positive definite when restricted to the tangent space of the sphere at the appropriate points, so we have a maximum and a minimum at the points $(\sqrt{2}/2, -\sqrt{2}/2, 0)$ and $(-\sqrt{2}/2, \sqrt{2}/2, 0)$.

13.3.3 Exercises

1. Let $M^m$ be a $C^1$ surface in $\mathbb{R}^n$, and let $y$ be a point in $\mathbb{R}^n$ not on $M^m$. If $\bar{x}$ is a point on $M^m$ that minimizes or maximizes the distance to $y$, prove that the line joining $\bar{x}$ and $y$ is perpendicular to the surface at $\bar{x}$ (i.e., perpendicular to the tangent space at $\bar{x}$). (Hint: it is easier to consider the square of the distance.)

2. If $M^m$ and $M'^{m'}$ are two disjoint $C^1$ surfaces in $\mathbb{R}^n$ and if $x$ in $M^m$ and $y$ in $M'^{m'}$ minimize or maximize the distance apart, prove that the line joining them is perpendicular to both surfaces at the intersections.

3. Let $M^m$ be a $C^1$ surface in $\mathbb{R}^n$ that is compact (as a metric subspace of $\mathbb{R}^n$). Prove that there exists a pair of points $x$ and $y$ in $M^m$ that maximize the distance among all such pairs and that the diameter joining them intersects the surface perpendicularly at both points.

4. Use the method of Lagrange multipliers to locate possible local maxima and minima of the function $f$ subject to the conditions $G = 0$ in the following:

a. $f(x, y, z) = x^2 + 4y^2 - z^2$, $G(x, y, z) = x^2 + y^2 + z^2 - 1$.

b. $f(x, y, z) = zx + 2y$, $G_1(x, y, z) = x^2 + y^2 + 2z^2 - 1$, $G_2(x, y, z) = x^2 + y + z$. (Set up the equations to be solved.)

c. $f(x, y, z) = x^2 + y^2 + z^2$, $G(x, y, z) = x^2 + 4y^2 - 2z^2 - 1$.

5. Apply the second derivative test to each of the critical points found in exercise 4.

6. Set up the equations for the maximum and minimum values of $f(x, y) = x^3 - 3yx + y$ on the unit disk $x^2 + y^2 \le 1$.


7. Show that the entropy $-\sum_{j=1}^{n} x_j \log x_j$ is maximized subject to the constraints $\sum_{j=1}^{n} x_j = 1$ and all $x_j > 0$, at the point $(1/n, 1/n, \ldots, 1/n)$.

13.4 Arc Length

13.4.1 Rectifiable Curves

Let $g : [a, b] \to \mathbb{R}^n$ be a continuous curve in $\mathbb{R}^n$. We want to talk about its length. The idea is very simple. If the curve consisted of broken straight line segments, we would simply add the lengths of the segments, measuring the length of a straight line segment by the Pythagorean distance from one endpoint to the other. For a general curve we can approximate the length by choosing a sequence of points lying along the curve and connecting them in order by straight line segments. Since we believe that the shortest distance between two points is a straight line, the length of the approximating broken line curve should be an underapproximation to the length of the curve, as in Figure 13.4.1.

Figure 13.4.1:

We can think of this approximation in practical terms as the result of hammering nails along a curve and pulling a string taut between the nails and measuring the length of the string. As we increase the number


of points along the curve, the length of the approximating broken line curves will increase (by the triangle inequality), and we expect the limit to be the length of the original curve. Because the lengths are increasing we can replace the limit with a sup.

Definition 13.4.1 The length of a continuous curve $g : [a, b] \to \mathbb{R}^n$ is the sup of $\sum_{j=0}^{N-1} |g(t_{j+1}) - g(t_j)|$ taken over all partitions $a = t_0 < t_1 < \cdots < t_N = b$ of the interval $[a, b]$, where $|g(t_{j+1}) - g(t_j)|$ is the distance between the points $g(t_{j+1})$ and $g(t_j)$ in $\mathbb{R}^n$. If the length is finite, we say the curve is rectifiable.

If a curve is rectifiable, then the length can be obtained as a limit of $\sum_{j=0}^{N-1} |g(t_{j+1}) - g(t_j)|$ along a sequence of partitions; and since adding points to the partition only increases the value of the sum, we can assume that the maximum of $t_{j+1} - t_j$ goes to zero. Conversely, if for one sequence of partitions with the maximum of $t_{j+1} - t_j$ going to zero the limit of

$$\sum_{j=0}^{N-1} |g(t_{j+1}) - g(t_j)|$$

exists and is finite, then the curve is rectifiable and the limit is the length. This is not immediately obvious from the definition but requires a proof. The idea is to use the uniform continuity of $g$ to add points to the partition without appreciably increasing the sum.

Suppose $P_1, P_2, \ldots$ is the sequence of partitions of $[a, b]$, and denote by $L(P_k)$ the sum $\sum_{j=0}^{N-1} |g(t_{j+1}) - g(t_j)|$ for the partition $P_k$. (The number of points increases with the partition, so strictly speaking we should write $N_k$ rather than $N$.) Since each $L(P_k) \le L$ where $L$ denotes the length of the curve, we have $\lim_{k \to \infty} L(P_k) \le L$ and so it suffices to prove the reverse inequality. Since $L$ is defined to be the sup of $L(Q)$ over all partitions $Q$, we need to show that for every $Q$ and $\varepsilon > 0$ the inequality $L(Q) \le L(P_k) + \varepsilon$ holds for all sufficiently large $k$; for then $L(Q) \le \lim_{k \to \infty} L(P_k) + \varepsilon$, hence $L \le \lim_{k \to \infty} L(P_k) + \varepsilon$ and, hence, $L \le \lim_{k \to \infty} L(P_k)$. To do this we will throw the points of $Q$ into the partition $P_k$; call the resulting partition $Q \cup P_k$ (we have to rearrange the points in their correct order and discard repeats, of course). Then $L(Q) \le L(Q \cup P_k)$, so it suffices to show $L(Q \cup P_k) \le L(P_k) + \varepsilon$ for all sufficiently large $k$.


We have $L(P_k) \le L(Q \cup P_k)$, but to obtain the reverse inequality $L(Q \cup P_k) \le L(P_k) + \varepsilon$ we will have to reason carefully. Let $N$ denote the number of points in $Q$. This number remains fixed, while by varying $k$ we can make the points in $P_k$ increase indefinitely, with the distance between them very small. Let $s$ denote a point in $Q$. If $s$ is not in $P_k$, then it will fall between two points $t_j$ and $t_{j+1}$ of $P_k$. Thus the sum $L(Q \cup P_k)$ will contain

$$|g(s) - g(t_j)| + |g(t_{j+1}) - g(s)|$$

in place of $|g(t_{j+1}) - g(t_j)|$ in $L(P_k)$. (To avoid technicalities we assume that $k$ is taken large enough so that at most one point of $Q$ lies in each interval $[t_j, t_{j+1}]$. This is possible because the lengths of these intervals are going to zero.) For intervals $[t_j, t_{j+1}]$ of $P_k$ containing no points of $Q$ the same $|g(t_{j+1}) - g(t_j)|$ will occur in both the $L(P_k)$ and $L(Q \cup P_k)$ sums. Thus we have an estimate $L(Q \cup P_k) \le L(P_k) + N\delta$ where $\delta$ is any upper bound for

$$|g(s) - g(t_j)| + |g(t_{j+1}) - g(s)|$$

with $s$ in $[t_j, t_{j+1}]$. By the uniform continuity of $g$ (it is a continuous function on a compact interval) we can make $\delta$ as small as we like, say $\delta < \varepsilon/N$, by making all the intervals $[t_j, t_{j+1}]$ sufficiently small, and we do this by taking $k$ large enough. Then $L(Q \cup P_k) \le L(P_k) + \varepsilon$ and we have $\lim_{k \to \infty} L(P_k) = L$ as claimed. Incidentally, this argument also shows that $\lim_{k \to \infty} L(P_k) = L$ along any sequence of partitions $P_1, P_2, \ldots$ such that the maximum length of subintervals goes to zero. The situation is analogous to that of the Riemann integral.

It is important to observe that the length of a curve is a property of the subset $g([a, b])$ of $\mathbb{R}^n$ and its ordering and does not depend on the particular parametrization given by the function $g$. This is simplest to state if $g$ is assumed to be one-to-one. If $h : [c, d] \to [a, b]$ is a continuous function that is one-to-one and onto, then $g \circ h : [c, d] \to \mathbb{R}^n$ is another one-to-one parametrization of the same subset of $\mathbb{R}^n$. Since $h$ being one-to-one implies that $h$ is either increasing or decreasing, the order of points along $g \circ h$ is either the same or the reverse of the order along $g$, so the lengths $L(P)$ for partitions of $g$ and $g \circ h$ are in one-to-one correspondence and so the lengths of the curves $g$ and $g \circ h$ are the same. It is possible to show further that any other one-to-one continuous


$$L \ge \sum_{k} \left| t_{k+1} \sin\frac{1}{t_{k+1}} - t_k \sin\frac{1}{t_k} \right| = 2 \sum_{k\ \text{odd}} |t_k| = 2 \sum_{k\ \text{odd}} \frac{2}{k\pi} = +\infty$$

and so the length is infinite.
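The divergence for the curve $g(t) = (t, t \sin(1/t))$ can also be observed numerically. The Python sketch below computes the length of the broken line through the points $g(t_k)$ with $t_k = 2/(k\pi)$, $k = 1, \ldots, n$, together with the two endpoints of $[0, 1]$. The helper names and the convention $g(0) = (0, 0)$ are assumptions made for this illustration.

```python
import math

def g(t):
    """The curve g(t) = (t, t sin(1/t)), extended continuously by g(0) = (0, 0)."""
    return (t, t * math.sin(1.0 / t)) if t > 0 else (0.0, 0.0)

def polygonal_length(n):
    """Length of the broken line through g(0), g(t_n), ..., g(t_1), g(1),
    where t_k = 2 / (k * pi)."""
    ts = [0.0] + [2.0 / (k * math.pi) for k in range(n, 0, -1)] + [1.0]
    pts = [g(t) for t in ts]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

# Each partition refines the previous one, so the lengths can only grow.
for n in (10, 100, 1000, 10000):
    print(n, polygonal_length(n))
```

Because the harmonic-type sum over odd $k$ grows only logarithmically, the printed lengths increase slowly, but they increase without bound as $n$ grows.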

parametrization $g_1 : [c, d] \to \mathbb{R}^n$ of the same set, $g_1([c, d]) = g([a, b])$, must be of the form $g_1 = g \circ h$ (see exercises). Thus for simple curves, those that have a one-to-one parametrization, the length is independent of the parametric representation. For curves that intersect themselves in complicated ways, the situation is more complicated and we will not attempt to describe it. There is, however, one case that is especially easy and important. Suppose $g : [a, b] \to \mathbb{R}^n$ with $g(a) = g(b)$ but otherwise $g(s) \ne g(t)$ for $s \ne t$ in $[a, b]$, as in Figure 13.4.2. The image $g([a, b])$ is called a simple closed curve. For these it is again true that the length is independent of the parametric representation, even of the point $g(a) = g(b)$ on the curve.

Figure 13.4.2: (a simple closed curve with the point $g(a) = g(b)$ marked)

It is easy enough to give examples of reasonable curves that are not rectifiable. For example, Figure 13.4.3 shows the graph of $y = x \sin(1/x)$ on $[0, 1]$, given by the parametric representation $g(t) = (t, t \sin(1/t))$ for $t$ in $[0, 1]$. Note that $\lim_{t \to 0} t \sin(1/t) = 0$, so $g$ is continuous and, in fact, $g$ is differentiable at every point of the interior although not at zero. However, if we take the partition points $t_k = 2/(k\pi)$, $k = 1, \ldots, N$, then $\sin(1/t_k) = 0$ for $k$ even and $\pm 1$ for $k$ odd; hence,


Proof: It suffices to prove the formula if $g$ is $C^1$ over the whole interval, since both $L$ and the integral are additive over the finite partition. Then $\int_a^b |g'(t)|\, dt$ is the integral of a continuous function, so it is approximated by the Cauchy sums

$$\sum_{j=0}^{N-1} |g'(t_j)| (t_{j+1} - t_j)$$

13.4.2 The Integral Formula for Arc Length

Next, we would like to derive the familiar calculus formula for the length of a curve under the assumption that $g : [a, b] \to \mathbb{R}^n$ is $C^1$. Actually, it suffices to have $g$ piecewise $C^1$, which is quite handy for applications.

Theorem 13.4.1 Let $g : [a, b] \to \mathbb{R}^n$ be a continuous curve, and assume that there exists a finite partition $a = t_0 < t_1 < \cdots < t_N = b$ of $[a, b]$ such that $g'(t)$ exists and is continuous on $[t_j, t_{j+1}]$ (at the endpoints this means one-sided derivatives) for each $j = 0, \ldots, N - 1$. Then the curve is rectifiable and the length is given by $L = \int_a^b |g'(t)|\, dt$.

If we recall the interpretation of $g'(t)$ as the velocity of the motion along the curve and $|g'(t)|$ as the speed, then this formula says that the distance traveled is obtained by integrating the speed.
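To see the formula $L = \int_a^b |g'(t)|\, dt$ at work, here is a small Python sketch for the helix $g(t) = (\cos t, \sin t, t)$ on $[0, 2\pi]$, a curve chosen purely for illustration. Its speed is $|g'(t)| = \sqrt{\sin^2 t + \cos^2 t + 1} = \sqrt{2}$, so the integral equals $2\pi\sqrt{2}$. The code compares this value with the polygonal sums $\sum |g(t_{j+1}) - g(t_j)|$ from the definition of length; the function names and the use of uniform partitions are assumptions of the sketch.

```python
import math

def helix(t):
    """The curve g(t) = (cos t, sin t, t)."""
    return (math.cos(t), math.sin(t), t)

def polygonal_length(curve, a, b, n):
    """Sum of |g(t_{j+1}) - g(t_j)| over the uniform partition of [a, b] into n pieces."""
    ts = [a + (b - a) * j / n for j in range(n + 1)]
    return sum(math.dist(curve(s), curve(t)) for s, t in zip(ts, ts[1:]))

# Integral of the speed: |g'(t)| = sqrt(2) on [0, 2*pi].
exact = math.sqrt(2.0) * 2.0 * math.pi

# Nested uniform partitions (each refines the previous one).
approximations = [polygonal_length(helix, 0.0, 2.0 * math.pi, n) for n in (4, 16, 64, 256)]

for n, length in zip((4, 16, 64, 256), approximations):
    print(n, length, exact - length)
```

The polygonal sums approach the integral from below, in line with the definition of length as a sup over partitions.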

Figure 13.4.3:


for partitions of the interval (not to be confused with the single partition in the hypothesis of the theorem). We want to say that $|g'(t_j)|(t_{j+1} - t_j)$ is approximately equal to $|g(t_{j+1}) - g(t_j)|$. To do this we will show that the vectors $g'(t_j)(t_{j+1} - t_j)$ and $g(t_{j+1}) - g(t_j)$ are approximately equal and, hence, their lengths are approximately equal. But

$$g(t_{j+1}) - g(t_j) = \int_{t_j}^{t_{j+1}} g'(t)\, dt$$

by the fundamental theorem of the calculus (since $g(t)$ takes values in $\mathbb{R}^n$, we apply the theorem to each coordinate and put them all together). If the interval is small, then $g'(t)$ won't vary much over it, so $\int_{t_j}^{t_{j+1}} g'(t)\, dt$ will be approximately $g'(t_j)(t_{j+1} - t_j)$. More precisely,

$$\left| \int_{t_j}^{t_{j+1}} g'(t)\, dt - g'(t_j)(t_{j+1} - t_j) \right| = \left| \int_{t_j}^{t_{j+1}} \left( g'(t) - g'(t_j) \right) dt \right| \le \int_{t_j}^{t_{j+1}} |g'(t) - g'(t_j)|\, dt$$

by Minkowski's inequality. Thus

$$\left| \left( g(t_{j+1}) - g(t_j) \right) - g'(t_j)(t_{j+1} - t_j) \right| \le \delta (t_{j+1} - t_j)$$

where $\delta$ is the sup of $|g'(t) - g'(t_j)|$ for $t$ in $[t_j, t_{j+1}]$. By the uniform continuity of $g'(t)$, we can make $\delta$ as small as desired by taking the partition fine enough. Using the triangle inequality in the form $\big| |u| - |v| \big| \le |u - v|$ for $u = g(t_{j+1}) - g(t_j)$ and $v = g'(t_j)(t_{j+1} - t_j)$, we also have $\big| |g'(t_j)|(t_{j+1} - t_j) - |g(t_{j+1}) - g(t_j)| \big| \le \delta (t_{j+1} - t_j)$. Adding up, we obtain

$$\left| \sum_{j=0}^{N-1} |g'(t_j)|(t_{j+1} - t_j) - \sum_{j=0}^{N-1} |g(t_{j+1}) - g(t_j)| \right| \le \sum_{j=0}^{N-1} \Big| |g'(t_j)|(t_{j+1} - t_j) - |g(t_{j+1}) - g(t_j)| \Big| \le \delta \sum_{j=0}^{N-1} (t_{j+1} - t_j) = \delta (b - a).$$


Thus the integral and the length are equal. QED

In particular, if the curve is given as the graph of a $C^1$ function of one of the variables, say $(x_1, \ldots, x_{n-1}) = h(x_n)$ for $a \le x_n \le b$, then the length is given by

$$\int_a^b \left( 1 + \sum_{j=1}^{n-1} h_j'(t)^2 \right)^{1/2} dt,$$

since $g(t) = (h_1(t), \ldots, h_{n-1}(t), t)$ is the associated parametrization.

13.4.3 Arc Length Parameterization*

The notion of arc length allows us to choose a particularly apt parametric representation for a simple rectifiable curve, in which arc length is the parameter. Let us say that the curve is initially given by $g : [a, b] \to \mathbb{R}^n$. We then define a function $s(t)$ on $[a, b]$ by setting $s(t_0)$ equal to the length of the curve for $t$ restricted to $[a, t_0]$. Then $s(t)$ takes values in $[0, L]$ and is monotone increasing. Since we are assuming the curve is simple, $g$ is one-to-one and so $s(t)$ is strictly increasing. It is not hard to show that $s(t)$ is continuous because the curve is rectifiable (see exercise set 13.4.4). Thus $s$ is a one-to-one function taking $[a, b]$ onto $[0, L]$, so it has a continuous inverse because the intervals are compact (see exercises). We denote the inverse simply by $t(s)$. Then $g(t(s)) : [0, L] \to \mathbb{R}^n$ is a parametric representation of the curve and the parameter $s$ equals the length of the curve from the initial point corresponding to $s = 0$ to the point with parameter $s$. We can interpret this parametric representation as tracing out the curve with speed equal to one, even if the derivative doesn't exist so that there is no tangent vector. For a simple curve it is easy to see that there are only two ways to parametrize by arc length: starting at either of the ends.

The arc length parametrization has several useful properties. If the curve is a $C^1$ one-dimensional surface so that it is given by a $C^1$ function $g : [a, b] \to \mathbb{R}^n$ with $g'(t) \ne 0$, then the arc length parametrization will be $C^1$ (in fact it will be $C^k$ if $g$ is $C^k$). The tangent vector will then have length equal to one. If the arc length parametrization $x(s)$ is $C^2$, then the acceleration vector $x''(s)$ will be normal to the curve. This follows by differentiating the identity $x'(s) \cdot x'(s) = 1$. This is called


the principal normal. The plane determined by the tangent vector and the principal normal is called the osculating plane, and the reciprocal of the length of the principal normal is called the radius of curvature. The circle of that radius in the osculating plane tangent to the curve at the point (on the appropriate side) is thought of as the circle that best approximates the curve at the point.

13.4.4 Exercises

1. Prove that the shortest curve joining two points is a straight line segment. Can you also give a proof directly from the formula $L = \int_a^b |g'(t)|\, dt$ for those curves to which it applies?

2. Prove that arc length is additive: if the curve is divided into two parts the sum of the lengths of the parts equals the length of the whole.

3. Let $g : [a, b] \to \mathbb{R}^n$ be a rectifiable curve, and let $s(t) : [a, b] \to [0, L]$ be defined by $s(t_0)$ = length of the curve $g$ restricted to $[a, t_0]$. Prove that $s(t)$ is continuous.

4. Let $g_1 : [a, b] \to \mathbb{R}^n$ and $g_2 : [c, d] \to \mathbb{R}^n$ be two one-to-one continuous functions with the same image. Prove that there exists a continuous function $h : [a, b] \to [c, d]$ such that $g_1 = g_2 \circ h$.

5. Prove that a one-to-one onto map of compact intervals (or more generally compact metric spaces) has a continuous inverse.

6. Show that there are exactly two ways to parametrize a rectifiable curve so that the arc length is equal to the parameter.

7. Prove that if a curve is given by a $C^1$ parametrization $g$ with $g'(t) \ne 0$, then the arc length parametrization is also $C^1$ and has tangent vector of length one. Show also that if $g$ is $C^k$, then so is the arc length parametrization. Give an example to show that the condition $g'(t) \ne 0$ is essential.

8. Prove that if a $C^2$ curve lies in a plane in $\mathbb{R}^n$, then the osculating plane to the curve at any point is that plane.

9. Prove that the radius of curvature of any circle is equal to the radius of the circle.


10. Prove that a continuous curve $g : [a, b] \to \mathbb{R}^n$ is rectifiable if and only if each coordinate function $g_k$ is of bounded variation (a function $f : [a, b] \to \mathbb{R}$ is said to be of bounded variation if

$$\sup \sum_{j=0}^{N-1} |f(t_{j+1}) - f(t_j)|$$

is finite, the sup taken over all partitions of $[a, b]$).

11. Prove that a Lipschitz continuous curve $g : [a, b] \to \mathbb{R}^n$ is rectifiable.

12. Let $(r(t), \theta(t))$ for $t$ in $[a, b]$ describe the polar coordinates of a curve in the plane. Assuming $r(t)$ and $\theta(t)$ are $C^1$ functions, find an integral formula for the arc length of the curve.

13.5 Summary

13.1 Implicit Function Theorem

Definition An o.d.e. in the form $M(x, y)\, dy/dx + N(x, y) = 0$ is called exact if there exists $F(x, y)$ such that $M = \partial F/\partial y$ and $N = \partial F/\partial x$.

Theorem $y(x)$ is a solution of the exact o.d.e. $M\, dy/dx + N = 0$ if and only if $F(x, y(x)) = c$ for some constant $c$, for all $x$.

Definition An $m \times m$ matrix $A$ is called invertible (or nonsingular) if there exists an $m \times m$ matrix $B$ with $AB = I$, where $I$ denotes the $m \times m$ identity matrix. We write $B = A^{-1}$. The norm $\|A\|$ of an $m \times m$ matrix is the least constant $c$ such that $|Ax| \le c\,|x|$ for all $x$ in $\mathbb{R}^m$, where $|x|$ denotes the Euclidean norm.

Theorem If $A$ is invertible and $\|BA^{-1}\| < 1$, then $A + B$ is invertible and $(A + B)^{-1} = A^{-1} \sum_{k=0}^{\infty} (-1)^k (BA^{-1})^k$, the series converging.


Theorem 13.1.1 (Implicit Function Theorem) Let $F(x, y)$ be a $C^1$ function defined in a neighborhood of $\bar{x}$ in $\mathbb{R}^n$ and $\bar{y}$ in $\mathbb{R}^m$ taking values in $\mathbb{R}^m$, with $F(\bar{x}, \bar{y}) = c$. Then if $F_y(\bar{x}, \bar{y})$ is invertible there exists a neighborhood $U$ of $\bar{x}$ and a $C^1$ function $y : U \to \mathbb{R}^m$ such that $y(\bar{x}) = \bar{y}$ and $F(x, y(x)) = c$ for every $x$ in $U$. Furthermore $y$ is unique in that there exists a neighborhood $V$ of $\bar{y}$ (the image $y(U)$) such that there is only one solution $z$ in $V$ of $F(x, z) = c$, namely $z = y(x)$. Finally the differential of $y$ can be computed by implicit differentiation,

$$dy(x) = -F_y(x, y(x))^{-1} F_x(x, y(x)).$$

Theorem 13.1.2 (Inverse Function Theorem) Let $f$ be a $C^1$ function defined in a neighborhood of $\bar{y}$ in $\mathbb{R}^n$ taking values in $\mathbb{R}^n$. If $df(\bar{y})$ is invertible, then there exists a neighborhood $U$ of $\bar{x} = f(\bar{y})$ and a $C^1$ function $g : U \to \mathbb{R}^n$ such that $f(g(x)) = x$ for every $x$ in $U$. Furthermore, $g$ maps $U$ one-to-one onto an open neighborhood $V$ of $\bar{y}$ and $g(f(y)) = y$ for every $y$ in $V$. The function $g$ is unique in that for any $x$ in $U$ there is only one $z$ in $V$ with $f(z) = x$, namely $z = g(x)$. Finally $dg(x) = df(y)^{-1}$ if $f(y) = x$.

Remark The inverse function theorem is a special case of the implicit function theorem. Conversely, the implicit function theorem for $F(x, y)$ is a consequence of the inverse function theorem in $\mathbb{R}^{n+m}$ for $f(x, y) = (x, F(x, y))$.

Theorem If $M(x_0, y_0) \ne 0$, then the exact o.d.e. $M\,(dy/dx) + N = 0$ has a unique solution satisfying $y(x_0) = y_0$ in a neighborhood of $x_0$.

13.2 Curves and Surfaces

Definition A subset $A$ of $\mathbb{R}^n$ is given parametrically if $A = \bigcup_j g_j(U_j)$ where $U_j \subseteq \mathbb{R}^m$ is open and $g_j : U_j \to \mathbb{R}^n$ is continuous; implicitly if $A = \{x : F(x) = 0\}$ where $F : \mathbb{R}^n \to \mathbb{R}^k$ is continuous; as the graph of a function if $A = \{(t, s) \text{ in } \mathbb{R}^{m+k} : s = f(t)\}$ where $f : \mathbb{R}^m \to \mathbb{R}^k$ is continuous, $m + k = n$, and the variables $t_1, \ldots, t_m, s_1, \ldots, s_k$ are a permutation of the variables $x_1, \ldots, x_n$.

Example The cusp curve in $\mathbb{R}^2$ given parametrically by $g(t) = (t^3, t^2)$


Deflnition In a el surjace, the subsets V nMm are called coordinatepatches and the [unctions g-l : V nMm --+ U are called local coordinatemaps.

Theorem 13.2.1 Let 9 : U --+ ]Rnwith U ~ ]Rn open be an immersion.For every point x in U there exists a neighborhood Ü 01 x such that 9is one-to-one on Ü and g(Ü) is the graph 01 a el junction.

Deflnition 13.2.1 A elm-dimensional surjace in]Rn (with m ~ n) isa subset Mm 01IRn locally parametrized by embeddings, in the sense thatevery point in Mm lies in a neighborhood V in]Rn such that there existsan embedding 9 : U -+ ]Rnlar U ~ ]Rm open such that g(U) = V nMm.

Definition If $g : U \to \mathbb{R}^n$ is an embedding, the tangent space to $g(U)$ at a point $\tilde y = g(\tilde x)$ in $g(U)$ is the span of the vectors $\partial g/\partial x_k(\tilde x)$, $k = 1, \ldots, m$.

Definition The rank of an $m \times n$ matrix $A$ is defined to be the dimension of the image of $A$ (regarded as a linear transformation from $\mathbb{R}^m$ to $\mathbb{R}^n$). It is equal to the dimension of the span of the columns of $A$ in $\mathbb{R}^n$ and is equal to the dimension of the span of the rows of $A$ in $\mathbb{R}^m$ and equal to the size of the largest invertible square submatrix. A $C^1$ function $g : U \to \mathbb{R}^n$ where $U \subseteq \mathbb{R}^m$ is open is called an immersion if $dg(x)$ has rank $m$ for every $x$ in $U$. If $g$ is also one-to-one and the inverse function $g^{-1} : g(U) \to \mathbb{R}^m$ is continuous it is called an embedding.

Definition If $g : (a, b) \to \mathbb{R}^n$ is a $C^1$ curve, then $g'(t)$ is the tangent vector to the curve at $g(t)$.

Example The unit sphere in $\mathbb{R}^3$ can be represented parametrically by spherical coordinates $(x, y, z) = (\cos\theta \sin\varphi, \sin\theta \sin\varphi, \cos\varphi)$, implicitly by $x^2 + y^2 + z^2 - 1 = 0$, or locally as the graph of a function by $x = \pm\sqrt{1 - y^2 - z^2}$ or $y = \pm\sqrt{1 - x^2 - z^2}$ or $z = \pm\sqrt{1 - x^2 - y^2}$.

for $t$ in $\mathbb{R}^1$, or implicitly by $y^3 - x^2 = 0$, or as the graph of $y = x^{2/3}$, has a nonsmooth point at $(0, 0)$.


Theorem (Second Derivative Test) If $f$ attains a local maximum (resp. minimum) on $G(x) = 0$ at $\tilde x$ and the conditions of the Lagrange Multiplier Theorem hold and if $f$ and $G$ are assumed $C^2$, then $d^2 f(\tilde x) + \tilde\lambda \cdot d^2 G(\tilde x)$ is non-positive (resp. non-negative) definite on the tangent

Example If $G(x, y) = y^3 - x^2$ and $f(x, y) = y$, then $f$ attains its minimum value on $G = 0$ at $(0, 0)$ but $H$ does not have a critical point of the form $(0, 0, \lambda)$. In this case the rank condition of the theorem does not hold.
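
The claim in this example can be checked directly from the function $H$ of the Lagrange multiplier theorem with $k = 1$. The computation below is a worked verification added for illustration, not part of the original summary:

```latex
H(x, y, \lambda) = y + \lambda\,(y^3 - x^2), \qquad
\frac{\partial H}{\partial x} = -2\lambda x, \quad
\frac{\partial H}{\partial y} = 1 + 3\lambda y^2, \quad
\frac{\partial H}{\partial \lambda} = y^3 - x^2 .
```

At $(0, 0, \lambda)$ the middle derivative equals $1$ for every $\lambda$, so no such point is critical. The rank condition fails because $dG(x, y) = (-2x,\ 3y^2)$ vanishes at the origin, so its rank there is $0$ rather than $1$.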

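Theorem 13.3.1 (Lagrange Multipliers) Let $f : \mathbb{R}^n \to \mathbb{R}$ and $G : \mathbb{R}^n \to \mathbb{R}^k$ be $C^1$ functions, and let $\tilde x$ in $\mathbb{R}^n$ be a point where $G(\tilde x) = 0$ and $dG(\tilde x)$ has rank $k$. If $f(\tilde x)$ attains a local maximum or minimum for $f$ on $\{x : G(x) = 0\}$, then there exists $\tilde\lambda$ in $\mathbb{R}^k$ such that $H : \mathbb{R}^{n+k} \to \mathbb{R}$ defined by $H(x, \lambda) = f(x) + \sum_{j=1}^{k} \lambda_j G_j(x)$ has a critical point at $(\tilde x, \tilde\lambda)$.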

13.3 Maxima and Minima on Surfaces

Theorem 13.2.2 Let $F : \mathbb{R}^n \to \mathbb{R}^{n-m}$ be a $C^1$ function with rank $dF(x)$ equal to $n - m$ at every point on a level set $\{x : F(x) = c\}$. Then this level set is an $m$-dimensional $C^1$ surface $M^m$ and $\nabla F_j(x)$ for $j = 1, \ldots, n - m$ span $N_{M^m}(x)$.

Definition The normal space $N_{M^m}(x)$ to a surface $M^m$ at a point $x$ in $M^m$ is the $(n - m)$-dimensional subspace of $\mathbb{R}^n$ of all vectors perpendicular to $T_{M^m}(x)$.

Theorem The tangent space to $g_j(U_j)$ at a point $y$ on a surface $M^m$ does not depend on the coordinate patch and can be taken as the definition of the tangent space $T_{M^m}(y)$ to $M^m$ at $y$. It is equal to the set of tangent vectors $g'(t_0)$ at $g(t_0) = y$ as $g$ ranges over all $C^1$ curves $g : (a, b) \to M^m$ lying in the surface and passing through $y$.

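Theorem If $V_1 \cap M^m$ and $V_2 \cap M^m$ are coordinate patches that intersect and $\tilde U_1$ and $\tilde U_2$ are the images of the intersection $V_1 \cap V_2 \cap M^m$ under $g_1^{-1}$ and $g_2^{-1}$ in $U_1$ and $U_2$, then $g_2^{-1} \circ g_1 : \tilde U_1 \to \tilde U_2$ and $g_1^{-1} \circ g_2 : \tilde U_2 \to \tilde U_1$ are $C^1$ functions.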


Theorem A simple rectifiable curve can be parametrized by arc length: if $L$ denotes the length there exists $h : [0, L] \to \mathbb{R}^n$ onto the curve such that the length of $h$ restricted to $[0, s]$ equals $s$, for $0 \le s \le L$, and there are exactly two possibilities for $h$ (the other being $h(L - s)$).

Theorem 13.4.1 A piecewise $C^1$ continuous curve $g : [a, b] \to \mathbb{R}^n$ (meaning there exists a finite partition $a = t_0 < t_1 < \cdots < t_N = b$ such that $g$ is $C^1$ on each subinterval) is rectifiable, and the length equals $\int_a^b |g'(t)|\,dt$.

Example The graph of $y = x \sin(1/x)$ on $[0, 1]$ is a continuous curve that is not rectifiable.

Theorem The length of a simple curve, or a simple closed curve, does not depend on the parametrization.

Definition A curve $g : [a, b] \to \mathbb{R}^n$ is called simple if $g$ is one-to-one. It is called a simple closed curve if $g$ is one-to-one except $g(a) = g(b)$.

Theorem If $\sum_{j=0}^{N-1} |g(t_{j+1}) - g(t_j)|$ has a finite limit for a sequence of partitions with the maximum of $t_{j+1} - t_j$ going to zero, then the curve is rectifiable and the limit equals the length.

Definition 13.4.1 The length of a continuous curve $g : [a, b] \to \mathbb{R}^n$ is the sup of $\sum_{j=0}^{N-1} |g(t_{j+1}) - g(t_j)|$ over all partitions $a = t_0 < t_1 < \cdots < t_N = b$. If the length is finite the curve is called rectifiable.
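
The polygonal sums in this definition are easy to experiment with numerically. The following is a minimal sketch, assuming uniform partitions and using the unit circle as a test curve; the names `polygonal_length` and `circle` are introduced here for illustration only. Each sum is a lower bound for the length, and refining the partition increases it toward $2\pi$.

```python
import math

def polygonal_length(g, a, b, n):
    """Sum of |g(t_{j+1}) - g(t_j)| over the uniform partition of [a, b] into n pieces."""
    ts = [a + (b - a) * j / n for j in range(n + 1)]
    pts = [g(t) for t in ts]
    # math.dist is the Euclidean distance between two points (Python 3.8+)
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def circle(t):
    """Unit circle, a simple closed curve of length 2*pi."""
    return (math.cos(t), math.sin(t))

# Each partition below refines the previous one, so the sums increase.
lengths = [polygonal_length(circle, 0.0, 2 * math.pi, n) for n in (4, 16, 64, 256)]
for n, length in zip((4, 16, 64, 256), lengths):
    print(n, length)
```

For a curve such as the graph of $y = x \sin(1/x)$ the same sums would grow without bound under refinement, which is what non-rectifiability means.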

13.4 Arc Length

space at $\tilde x$ to $G(x) = 0$. Conversely, if $f$ and $G$ are $C^2$, if $dG$ has rank $k$ at $\tilde x$, if $(\tilde x, \tilde\lambda)$ is a critical point of $H$, and if $d^2 f(\tilde x) + \tilde\lambda \cdot d^2 G(\tilde x)$ is negative (resp. positive) definite on the tangent space to $G(x) = 0$ at $\tilde x$, then $\tilde x$ is a strict local maximum (resp. minimum) of $f$ on $G(x) = 0$.


Chapter 14

The Lebesgue Integral

14.1 The Concept of Measure

14.1.1 Motivation

The idea of the Lebesgue integral is to enlarge the class of integrable functions so that $\int_a^b f(x)\,dx$ will be given a meaning for functions $f$ that are not Riemann integrable. For functions that are Riemann integrable the Lebesgue theory will assign the same numerical value to $\int_a^b f(x)\,dx$ as the Riemann theory. Thus the Lebesgue integration theory can be thought of as a kind of completion of the Riemann integration theory. This can be given a precise sense in terms of the metric $d(f, g) = \int_a^b |f(x) - g(x)|\,dx$ on the continuous functions $C([a, b])$ so that the Lebesgue integrable functions are obtainable from the continuous functions by the same process as the real numbers are obtained from the rational numbers. However, it is best if we observe this fact after we have developed the Lebesgue theory in a more concrete way.

Before beginning on the rather difficult path of developing the Lebesgue theory we will recall some of the weak points of the Riemann theory that can serve as motivation for seeking a better theory. The Riemann theory works well for continuous functions and for uniformly convergent sequences, but one weakness is that it breaks down if we go too far beyond this tame territory. We have seen that a bounded function with a countable set of discontinuities is Riemann integrable,


but Dirichlet's function

$$f(x) = \begin{cases} 0 & \text{if } x \text{ is irrational,} \\ 1 & \text{if } x \text{ is rational} \end{cases}$$

is not, even though it is a pointwise limit of functions

$$f_n(x) = \begin{cases} 1 & \text{if } x = r_1, r_2, \ldots, r_n, \\ 0 & \text{otherwise,} \end{cases}$$

where $r_1, r_2, \ldots$ is an enumeration of the rationals, with $\int_0^1 f_n(x)\,dx = 0$. Now one might object that Dirichlet's example is somewhat artificial, but it is possible to conceive of more natural examples. For example, consider a Fourier series $\sum_{-\infty}^{\infty} a_n e^{inx}$ that is given by specifying the coefficients $a_n$ in some way. Unless we know that $\sum_{-\infty}^{\infty} |a_n|$ is finite we cannot be sure that the series converges to a continuous function. Nevertheless it is tempting to integrate the series term-by-term, because the factor $n$ that integration puts in the denominator helps make the series converge (if we assume $\sum |a_n|^2 < \infty$, which is a weaker condition than $\sum |a_n| < \infty$, we have $\sum_{n \neq 0} |a_n|/|n| < \infty$ by the Cauchy-Schwarz inequality). The Riemann theory of integration, which was invented to handle similar problems in Fourier series, fails to give a meaning to this integration. Certainly the trend in mathematical analysis has been to consider more and more wildly behaved functions. Perhaps this trend has been encouraged by the Lebesgue theory, which allows us to deal with such functions, but it is still a worthwhile goal to pursue the integration of all functions that come along. (It would be nice to have a theory that provides an integral for all functions, but this turns out to be impossible.)


A second weakness of the Riemann theory of integration is the lack of a good convergence theorem. We have seen that the Riemann integral can be interchanged with a uniform limit, but in many applications this is not adequate. For example, with Fourier series we frequently do not have uniform convergence, even if the function is continuous. Of course even in the Lebesgue theory we will not be able to interchange all limits with integration. For example, if

$$f_n(x) = \begin{cases} n & \text{if } 0 < x < 1/n, \\ 0 & \text{otherwise,} \end{cases}$$

then $\int_0^1 f_n(x)\,dx = 1$ but $\lim_{n \to \infty} f_n(x) = 0$ at every point, so

$$\int_0^1 \lim_{n \to \infty} f_n(x)\,dx \neq \lim_{n \to \infty} \int_0^1 f_n(x)\,dx.$$

Nevertheless we will find two rather useful criteria for interchanging limits and integrals: the monotone convergence theorem and the dominated convergence theorem.

A third weakness of the Riemann theory is that improper integrals have to be treated separately. In the Lebesgue theory we will be able to treat absolutely convergent improper integrals on the same footing as proper integrals.

A fourth weakness is that we have no reasonable criterion for deciding whether or not a function is Riemann integrable (Riemann did in fact give such a criterion, but I have not bothered to present it because it seems no easier to apply than to verify the definition of the Riemann integral). With the aid of the Lebesgue theory it is possible to give a criterion for the Riemann integral to exist (although it must be admitted that we don't have a very good criterion for the Lebesgue integral to exist).

Finally a fifth weakness involves the theory of multiple integrals. We have postponed discussing multiple integrals until after the Lebesgue theory because the Riemann theory yields only very awkward and incomplete results.

In addition to overcoming these weaknesses, the Lebesgue theory yields a remarkable bonus: it allows a very far-reaching and fruitful generalization of the concept of integration. We have already observed at least superficially an analogy between infinite series and integrals.
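
The counterexample with $f_n$ equal to $n$ on $(0, 1/n)$ and $0$ elsewhere can be traced with exact arithmetic. This is a small sketch only: the helper names `f` and `integral` are introduced here, and the integral is computed from the area of the rectangle (height $n$, width $1/n$) rather than by a general integration routine.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = n on the open interval (0, 1/n), and 0 elsewhere."""
    return n if 0 < x < Fraction(1, n) else 0

def integral(n):
    """Exact integral of f_n over [0, 1]: a rectangle of height n and width 1/n."""
    return n * Fraction(1, n)

x = Fraction(1, 10)                                  # any fixed point of (0, 1]
values_at_x = [f(n, x) for n in range(1, 15)]        # becomes 0 once 1/n <= x
integrals = [integral(n) for n in range(1, 15)]      # stays equal to 1

print(values_at_x)
print(integrals)
```

At each fixed $x$ the values are eventually $0$, while every integral equals $1$, so the limit of the integrals is $1$ and the integral of the limit is $0$.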


The Lebesgue theory allows us to say that the sum of an absolutely convergent series is a form of integration, and this conceptual framework allows us also to give a foundation to probability theory. Lebesgue's theory stands at the doorway to twentieth century mathematics, and all who enter must pass through this gate.

Much of the motivation we have given for the Lebesgue theory comes after the fact. Historically, Lebesgue was viewed by his contemporaries as almost a crack-pot, or at least as someone who was going very far out on a limb to study esoteric problems of little importance to the mainstream of mathematics. He was once asked why he bothered to study the problem of defining the area of very irregular surfaces since the usual calculus formula was valid for all surfaces that would ever arise in applications. Lebesgue replied by producing a crumpled handkerchief! However, once Lebesgue's results were published they were rapidly recognized as being of great importance.

Lebesgue explains the basic idea of his method by a parable. Suppose a merchant wishes to add the day's receipts. The most straightforward approach would be to add the amounts in the order in which they came, say $5 + 10 + 1 + 1 + 25 + 5 + 10 + 50 + 25 + 10$ (in Lebesgue's day you could buy things for those amounts). But a better approach would be to tally the number of coins of each denomination, 2 pennies, 2 nickels, 3 dimes, 2 quarters, 1 half-dollar, and then compute $2 \times 1 + 2 \times 5 + 3 \times 10 + 2 \times 25 + 1 \times 50 = 2 + 10 + 30 + 50 + 50 = 142$. The Riemann integral is like the first approach; it adds the values of the function $f(x)$ in the order they occur, partitioning the domain. Lebesgue's integral is like the second approach; it first sorts the values of $f(x)$, partitioning the range, and then sums the values multiplied by the size of the set on which they occur. Of course if the function is continuous, or nearly so, the values of $f(x)$ will not vary much over a small interval and the Riemann integral will work well. But if the function $f(x)$ is wildly discontinuous, as the parable suggests it very well might be, then partitioning the domain really makes no sense. Why should $f(z_j)(x_j - x_{j-1})$ for $z_j$ in $[x_{j-1}, x_j]$ represent a good approximation of $\int_{x_{j-1}}^{x_j} f(x)\,dx$ even if the interval is small if the values of $f(x)$ on the interval vary considerably? On the other hand, if $A_j$ denotes the set of $x$ such that $y_{j-1} < f(x) \le y_j$, then $y_j$ times the size of the set $A_j$ will be a good approximation to the contribution to the integral of $f$ over the set $A_j$ provided $y_j - y_{j-1}$ is small. If $y_0 < y_1 < y_2 < \cdots < y_n$ is a


Before we can discuss the concept of Lebesgue measure we need to discover the basic properties of the concept of length of an interval, for these will be the properties we can hope to generalize. If $I$ denotes an interval $(a, b)$ or $[a, b]$ or $(a, b]$ or $[a, b)$, then $b - a$ is its length, which

14.1.2 Properties of Length

If the function is wildly discontinuous, then we can expect even worse trouble. Thus we need to be able to give a numerical value to the size of a set for rather complicated sets, which generalizes the length of an interval. This leads to the concept of Lebesgue measure.

Figure 14.1.1:

partition of the range of $f$ (we assume for simplicity that $f$ is bounded so we can do this), then the Lebesgue approximating sum $\sum_{j=1}^{n} y_j |A_j|$, where $|A_j|$ denotes the size of the set $A_j = \{x : y_{j-1} < f(x) \le y_j\}$, will approximate the Lebesgue integral $\int_a^b f(x)\,dx$.
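
The merchant's two ways of adding, and the approximating sum $\sum_j y_j |A_j|$, can be mimicked on the receipts from the parable. This is a hedged sketch: the "size" of a set is taken to be the number of receipts in it (counting measure, an assumption made only for this illustration), and the helper name `lebesgue_sum` is introduced here.

```python
from collections import Counter

# Receipts from the parable, in the order they arrived (cents).
receipts = [5, 10, 1, 1, 25, 5, 10, 50, 25, 10]

# First approach: add the amounts in order of arrival (partition the domain).
in_order_total = sum(receipts)

# Second approach: tally each denomination, then add value * count
# (partition the range).
tally = Counter(receipts)
by_value_total = sum(value * count for value, count in tally.items())

def lebesgue_sum(values, levels):
    """Sum of y_j * |A_j|, where A_j = {i : y_{j-1} < values[i] <= y_j}.

    The size |A_j| is the number of indices in A_j (counting measure,
    an assumption of this sketch).
    """
    total = 0
    for lo, hi in zip(levels, levels[1:]):
        size = sum(1 for v in values if lo < v <= hi)
        total += hi * size
    return total

fine = lebesgue_sum(receipts, [0, 1, 5, 10, 25, 50])   # levels match the coins
coarse = lebesgue_sum(receipts, [0, 10, 50])           # coarser partition of the range

print(in_order_total, by_value_total, fine, coarse)
```

With levels placed exactly at the coin values the sum reproduces the total; a coarser partition of the range overestimates it, and the error shrinks as $y_j - y_{j-1}$ becomes small.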

To complete the program we have to make precise the notion of the size of the set $A_j$. Here we come against a formidable difficulty. Even if the function $f$ is quite smooth, the set $A_j$ can be quite hairy; if $f(x) = x^2 \sin(1/x)$ (as shown in Figure 14.1.1) the set where $0 < f(x) \le 1$ is a countable union of intervals.


we will denote $|I|$. Notice that if $a = -\infty$ or $b = +\infty$, then $|I| = +\infty$, so that $|I|$ is a non-negative extended real number. Perhaps the most obvious property of the length is its additivity: if $I$ is the union of two disjoint intervals $I = I_1 \cup I_2$ (say $I = (a, b)$ and $I_1 = (a, c)$, $I_2 = [c, b)$ with $a < c < b$), then $|I| = |I_1| + |I_2|$. Notice that this makes sense even if some of the lengths are zero or infinity. By induction we can obtain the finite additivity of length: if $I = I_1 \cup \cdots \cup I_n$ is a disjoint union of intervals, then $|I| = |I_1| + \cdots + |I_n|$. Actually the induction argument is not completely trivial since we need to observe that it is possible to remove one of the intervals $I_k$ (say the one containing an endpoint or a neighborhood of an endpoint if $I$ is open) so that the union of the remaining intervals is still an interval.

An immediate consequence of the finite additivity is the following subadditivity: if $I$ is covered by $I_1, I_2, \ldots, I_n$, not necessarily disjoint, then $|I| \le \sum_{j=1}^{n} |I_j|$. We leave the details of the proof as an exercise, the idea being that we can shrink the intervals $I_j$ to $I_j'$ so that $I$ is the disjoint union $I = I_1' \cup \cdots \cup I_n'$. An important special case, $n = 1$, is called monotonicity: if $I \subseteq J$, then $|I| \le |J|$.

From the finite additivity it is tempting to jump to a general additivity principle that would include infinite unions. Here, however, we come up against a famous paradox that troubled many of the mathematicians who worked on the foundations of calculus: an interval of non-zero length, say $(0, 1)$, is the union of the points it contains, yet each point $c$ is an interval $[c, c]$ of zero length. How can a non-zero value be obtained by summing an infinite number of zeroes? There is no way to make sense out of this, although some have tried by arguing that the zeros are actually infinitesimals. The only way out is to conclude that there is no general principle for additivity of length. However, we observe that in the example we broke up the interval $(0, 1)$ into an uncountable disjoint union of intervals. We know that uncountable sets are apt to cause more trouble than merely countable infinite sets, which are in effect limits of finite sets. Suppose we had a countable disjoint union $I = I_1 \cup I_2 \cup \cdots$ of intervals, with $I$ also an interval. We could hope that $|I| = |I_1| + |I_2| + \cdots$. Such a principle does in fact hold, and we call it countable additivity or $\sigma$-additivity for short. Although it is plausible because a countable union is a limit of finite unions, we cannot prove it quite so easily because there are many complicated ways to write an interval as a countable disjoint union of intervals. Instead


we must resort to trickery and the Heine-Borel theorem.

Lemma 14.1.1 Let $I$ be an interval, $I = I_1 \cup I_2 \cup \cdots$ where $I_1, I_2, \ldots$ are disjoint intervals. Then $|I| = \sum_{j=1}^{\infty} |I_j|$ (we interpret this to mean that if one side is $+\infty$, then so is the other, where $\sum_{j=1}^{\infty} |I_j|$ can be $+\infty$ either because one of the summands is $+\infty$ or because the series diverges).

Proof: Consider first the case when $|I|$ is finite. If we break off the union after a finite number of terms, then $I_1 \cup I_2 \cup \cdots \cup I_n \subseteq I$ and the order of the terms can be rearranged if need be so that $I_{k-1}$ lies to the left of $I_k$ for $2 \le k \le n$. We can then fill in other intervals $J_1, \ldots, J_m$ as needed in between (as indicated in Figure 14.1.2) so that $I = I_1 \cup \cdots \cup I_n \cup J_1 \cup \cdots \cup J_m$ (disjoint) so that $|I| = \sum_{j=1}^{n} |I_j| + \sum_{k=1}^{m} |J_k|$ by the finite additivity, hence $\sum_{j=1}^{n} |I_j| \le |I|$.

Figure 14.1.2:

Letting $n \to \infty$ we obtain $\sum_{j=1}^{\infty} |I_j| \le |I|$. The tricky step is obtaining the reverse inequality. We could try to argue directly that the $J$-intervals in the above picture must get smaller as we add more $I$-intervals, but it is hard to do this convincingly. Instead we resort to subterfuge. We shrink the interval $I$ if need be to make it compact, and we expand the $I_j$ intervals if need be to make them open. We claim we can do this while changing the values of $|I|$ and $\sum_{j=1}^{\infty} |I_j|$ by at most $\varepsilon$. Indeed if $a$ and $b$ are the endpoints of $I$, then set (if $\varepsilon$ is small enough) $I' = [a + \varepsilon/2, b - \varepsilon/2]$; and if $a_j$ and $b_j$ are the endpoints of $I_j$, set $I_j' = (a_j - 2^{-j}\varepsilon/2, b_j + 2^{-j}\varepsilon/2)$. We then have $I'$ compact, $I_j'$ open, and $|I'| = |I| - \varepsilon$, $\sum_{j=1}^{\infty} |I_j'| = \varepsilon + \sum_{j=1}^{\infty} |I_j|$. Because the $I_j$ cover $I$, it follows that the $I_j'$ cover $I'$ since we expanded the coverers and contracted the covered. The $I_j'$ are no longer disjoint, but this doesn't matter. By the Heine-Borel theorem there is a finite subcover, $I' \subseteq \bigcup_{j=1}^{N} I_j'$ for some finite $N$.
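
The bookkeeping behind the enlarged intervals in the proof of Lemma 14.1.1 is a geometric series. As a worked check (a sketch writing $I_j'$ for the interval $I_j$ widened by $2^{-j}\varepsilon/2$ at each end):

```latex
|I_j'| - |I_j| = 2 \cdot \frac{2^{-j}\varepsilon}{2} = 2^{-j}\varepsilon,
\qquad
\sum_{j=1}^{\infty} \bigl( |I_j'| - |I_j| \bigr)
  = \varepsilon \sum_{j=1}^{\infty} 2^{-j} = \varepsilon .
```

So however many intervals there are, the total enlargement is exactly $\varepsilon$ when $\sum_j |I_j|$ is finite, which is why the weights $2^{-j}$ are chosen.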


Now the finite subadditivity implies $|I'| \le \sum_{j=1}^{N} |I_j'|$. Finally, from $|I'| \le \sum_{j=1}^{N} |I_j'|$ and $|I'| = |I| - \varepsilon$, $|I_j'| = |I_j| + 2^{-j}\varepsilon$ we obtain $|I| - \varepsilon \le \varepsilon + \sum_{j=1}^{N} |I_j| \le \varepsilon + \sum_{j=1}^{\infty} |I_j|$ and, since this holds for every $\varepsilon > 0$, $|I| \le \sum_{j=1}^{\infty} |I_j|$. This completes the proof when $|I| < \infty$.

If $|I| = +\infty$ we simply intersect all the intervals with $[-N, N]$ and let $N \to \infty$. By the previous case we have

$$|I \cap [-N, N]| = \sum_{j=1}^{\infty} |I_j \cap [-N, N]| \le \sum_{j=1}^{\infty} |I_j|$$

since $|I_j \cap [-N, N]| \le |I_j|$ by the monotonicity. But

$$\lim_{N \to \infty} |I \cap [-N, N]| = +\infty \quad \text{if } |I| = +\infty$$

(for then $a = -\infty$ or $b = +\infty$), so $\sum_{j=1}^{\infty} |I_j| \ge +\infty$ hence $= +\infty$. QED

It is perhaps worth pointing out a technical aspect of the proof: when we expanded the intervals $I_j$ we threw in a factor of $2^{-j}$ so that the sum of all the errors would remain small. This is a common theme in many arguments in Lebesgue integration theory. We can paraphrase it by saying a countable number of small errors can be made small.

When we give a formula for Lebesgue measure we will need the following corollary to the lemma.

Corollary 14.1.1 If $I$ is any interval, then

$$|I| = \inf \Bigl\{ \sum_{j=1}^{\infty} |I_j| : I \subseteq \bigcup_{j=1}^{\infty} I_j \Bigr\}$$

where $\{I_j\}$ is any countable covering of $I$ by intervals.

Proof: We can always cover $I$ by itself ($I_1 = I$ and $I_j = \emptyset$ for $j > 1$), showing that the inf is at most $|I|$. Thus it suffices to show $I \subseteq \bigcup_{j=1}^{\infty} I_j$ implies $|I| \le \sum_{j=1}^{\infty} |I_j|$. If the intervals $I_j$ are not disjoint we can replace them by smaller intervals that are disjoint, in the process


reducing $\sum_{j=1}^{\infty} |I_j|$. If $I$ is then not equal to $\bigcup_{j=1}^{\infty} I_j$, we can make it so by replacing each interval with its intersection with $I$, again reducing $\sum_{j=1}^{\infty} |I_j|$. After these two reductions we apply the lemma to obtain equality $|I| = \sum_{j=1}^{\infty} |I_j|$, hence we must have had the desired inequality all along. QED

We have now completed a description of all the basic properties of the length of an interval that will be needed for the generalization of the concept of measure. We can summarize them succinctly in three simple statements:

1. $|I|$ is a value in $[0, \infty]$.

2. $|I| = 0$ if $I$ is the empty interval.

3. $|I| = \sum_{j=1}^{\infty} |I_j|$ if $I = \bigcup_{j=1}^{\infty} I_j$ and the $I_j$ are disjoint.

We are not claiming that these three statements determine $|I|$ uniquely; far from it, there are many different ways to define a measure of size $|I|$ for every interval in such a way that these three statements are valid. Rather we are claiming that these represent a sort of minimal distillation of the concept of length, or even of the general concept of measurement of "extent" that would include such things as area, volume, mass, and probability. We will eventually take them to be axioms for the abstract concept of measure, just as we took a few simple properties of distance and used them for axioms for the abstract concept of metric. But before doing this we need to extend our vision beyond the simple world of intervals and come to grips with the question of which kinds of sets we need to measure.

14.1.3 Measurable Sets

Our first choice would be to measure all sets. However this turns out to be impossible if we want to retain the properties listed above. (Actually the situation is more complicated: we need the uncountable axiom of choice to prove the impossibility, and Solovay has shown that without some such strongly nonconstructive axiom it is impossible to prove the impossibility.) This does not turn out to be too devastating a blow to the theory, since we can do analysis with a more restricted class of


sets. It does mean, however, that we have to be more careful than we might prefer to be. Recall that the kind of sets we want to measure are of the form $A = \{x : a < f(x) \le b\}$ for some function $f$ we want to integrate. Because we cannot measure all sets, we cannot integrate all functions; but we would like to integrate as many functions as possible, so we want the collection of measurable sets, those $A$ for which $|A|$ is defined, to be as versatile as possible so that we can manipulate the functions freely. This means we want these sets to be preserved under the usual operations of set theory: union, intersection, complement, difference. Any collection of sets with this property is called a field of sets. Technically it suffices to assume only the following axioms for a field $\mathcal{F}$ of sets:

1. The empty set is in $\mathcal{F}$.

2. If $A$ is in $\mathcal{F}$, then the complement ${}^cA$ is in $\mathcal{F}$.

3. If $A$ and $B$ are in $\mathcal{F}$, then $A \cup B$ is in $\mathcal{F}$.

Here all the sets in $\mathcal{F}$ are subsets of a fixed universe $X$ and the complement is defined with respect to this universe, ${}^cA = \{x \text{ in } X : x \text{ is not in } A\}$. For most of our applications $X$ will be a subset of some Euclidean space. It is a simple exercise to show that these axioms imply that if $A$ and $B$ are in $\mathcal{F}$, then $A \cap B$ and $A \setminus B$ are also in $\mathcal{F}$. The term algebra of sets is sometimes used instead of field. The algebraic terminology derives from the fact that a field forms a Boolean algebra under the operations $A + B = A \triangle B$, $A \cdot B = A \cap B$ where $A \triangle B$ denotes the symmetric difference defined by $A \triangle B = (A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B)$, as indicated by the shaded region in Figure 14.1.3.

An important example of a field of sets is the following: take for $X$ any fixed interval of $\mathbb{R}$ and take for $\mathcal{F}$ all sets that are finite unions of intervals contained in $X$. We leave the simple verification of the axioms as an exercise. It is necessary to consider unions of intervals rather than merely intervals in order to have a field of sets.

So that we will be able to perform limit processes on functions, we have to assume more about the measurable sets, namely that they are preserved under countable set-theoretic operations. For technical reasons it suffices to assume the following additional axiom:

4. If $A_1, A_2, \ldots$ is a sequence of sets in $\mathcal{F}$, then $A = \bigcup_{j=1}^{\infty} A_j$ is in $\mathcal{F}$.
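
The three field axioms and the Boolean-algebra operations $A + B = A \triangle B$, $A \cdot B = A \cap B$ can be checked exhaustively on a small finite example. The sketch below is illustrative only: the four-point universe and the helper names `powerset`, `add`, `mul` and `is_field` are choices made here, not notation from the text.

```python
from itertools import combinations

X = frozenset(range(4))  # a small universe, chosen only for illustration

def powerset(s):
    """All subsets of s, as frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

F = powerset(X)  # the field of all subsets of X

def add(a, b):
    """A + B, the symmetric difference of A and B."""
    return a ^ b

def mul(a, b):
    """A . B, the intersection of A and B."""
    return a & b

def is_field(family, universe):
    """Check the three axioms: empty set, complements, finite unions."""
    fam = set(family)
    return (frozenset() in fam
            and all((universe - a) in fam for a in fam)
            and all((a | b) in fam for a in fam for b in fam))

print(is_field(F, X))                                  # all subsets: a field
print(is_field([frozenset(), frozenset({0})], X))      # complement of {0} is missing
```

On this example one can also confirm the two descriptions of the symmetric difference, that every set is its own additive inverse, and that intersection distributes over symmetric difference, as a Boolean algebra requires.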


Any field of sets that satisfies this condition is called a $\sigma$-field. The identity $\bigcap_{j=1}^{\infty} A_j = {}^c\bigl(\bigcup_{j=1}^{\infty} {}^cA_j\bigr)$ shows that a $\sigma$-field is closed under countable intersections as well. Notice that we only consider countable operations; we do not require that arbitrary uncountably infinite unions of sets in the $\sigma$-field must belong to the $\sigma$-field. Nevertheless a $\sigma$-field is in general a very large collection of sets.

For example, the field $\mathcal{F}$ of finite unions of intervals in $X$ is not a $\sigma$-field. We could try to make it into a $\sigma$-field by considering $\mathcal{F}_1$, the collection of all countable unions of intervals of $X$. This will satisfy the last axiom, but it is not closed under complements (for example, the Cantor set is not in $\mathcal{F}_1$, but its complement is), so it isn't even a field. We could try to fix this by taking $\mathcal{F}_2$ to be all countable intersections of sets in $\mathcal{F}_1$ and then $\mathcal{F}_3$ to be all countable unions of sets in $\mathcal{F}_2$ and so on alternately taking countable intersections and unions and finally setting $\mathcal{F}_\infty = \bigcup_{j=1}^{\infty} \mathcal{F}_j$. It is not hard to show that $\mathcal{F}_\infty$ is again a field, but alas it too is not a $\sigma$-field. We need to repeat this process through a sophisticated transfinite induction up to the first uncountable ordinal in order to obtain a $\sigma$-field! Naturally such a complicated "construction" is of little use and so we rely on a non-constructive description. If $\mathcal{F}$ is any field of sets we define the $\sigma$-field generated by $\mathcal{F}$, denoted by $\mathcal{F}_\sigma$, to be the intersection of all $\sigma$-fields containing $\mathcal{F}$. We leave it as an exercise to verify that this is actually a $\sigma$-field and is the smallest $\sigma$-field containing $\mathcal{F}$, in the sense that any $\sigma$-field containing $\mathcal{F}$ must contain $\mathcal{F}_\sigma$. When $\mathcal{F}$ is the field of finite unions of intervals contained in a fixed interval $X$ of $\mathbb{R}$ we call $\mathcal{F}_\sigma$ the $\sigma$-field of Borel subsets of $X$ and we call sets in $\mathcal{F}_\sigma$ Borel sets. The Borel sets can equally well be described as the smallest $\sigma$-field containing all the open sets (or all the

Figure 14.1.3:


It is by no means clear that there exists such an extension as called for in the definition of Lebesgue measure. We will give a proof that such an extension exists in section 14.2. For the remainder of this section we will derive properties of measures that follow from the above three axioms. These properties will hold for Lebesgue measure, but the proofs are just as easy for general measures. It will also turn out that these properties lead to a formula for Lebesgue measure, which is given in section 14.1.5.

Next we observe some simple properties of measures that are consequences of the defining properties:

1. (non-negativity) $|A|$ is in $[0, \infty]$;

2. $|\emptyset| = 0$ where $\emptyset$ is the empty set;

3. ($\sigma$-additivity) if $A = \bigcup_{j=1}^{\infty} A_j$ with $A_j$ disjoint, $|A| = \sum_{j=1}^{\infty} |A_j|$.

Note that this implies finite additivity simply by taking all but a finite number of the $A_j$ equal to the empty set.

14.1.4 Basic Properties of Measures

We can now state our first main goal as follows: to extend the length measure $|I|$ from the intervals to the $\sigma$-field of Borel sets so as to preserve the basic properties. We will call the extended length measure Lebesgue measure. More generally, we will define a measure to be a function $|A|$ defined on a $\sigma$-field $\mathcal{F}$ of sets (called the measurable sets) satisfying:

closed sets), and this definition makes sense in $\mathbb{R}^n$ or any metric space. The point of the above discussion is that there is no really satisfactory description of what a Borel set is like. However, it is a sufficiently large category of sets so that it contains any set that is describable in conventional mathematical terms. (Actually, many authors use a slightly larger $\sigma$-field of sets, called the Lebesgue sets, but there is not much to be gained by doing this.)

The point of view we will take in the remainder of this chapter is that only the Borel sets are important. In this sense we can say that Lebesgue made Cantor obsolete: Cantor wanted to make set theory the foundation of mathematics, while Lebesgue showed that just about all mathematics can be done within the smaller confines of the Borel sets.


1. A measure is monotone: If $A$ and $B$ are measurable sets with $A \subseteq B$, then $|A| \le |B|$. To see this observe that $B$ is a disjoint union of $A$ and $B \setminus A$, as shown in Figure 14.1.4, so $|B| = |A| + |B \setminus A|$ by additivity and, hence, $|B| \ge |A|$ since $|B \setminus A| \ge 0$.

Figure 14.1.4:

2. Continuity from below: If $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$ is an increasing sequence of measurable sets and $A = \bigcup_{j=1}^{\infty} A_j$, then $|A| = \lim_{j \to \infty} |A_j|$. To see this we define the difference sets $B_k = A_k \setminus A_{k-1}$ for $k = 2, 3, \ldots$ and observe that $A$ is the disjoint union $A = A_1 \cup B_2 \cup B_3 \cup \cdots$, so $|A| = |A_1| + |B_2| + |B_3| + \cdots$ by the $\sigma$-additivity. On the other hand if we terminate the union we have $A_1 \cup B_2 \cup \cdots \cup B_n = A_n$ a disjoint union, so $|A_1| + |B_2| + \cdots + |B_n| = |A_n|$ and so $|A| = \lim_{n \to \infty} |A_n|$.

3. Conditional continuity from above: If $B_1 \supseteq B_2 \supseteq B_3 \supseteq \cdots$ is a decreasing sequence of measurable sets and $B = \bigcap_{j=1}^{\infty} B_j$ and if the measures $|B_j|$ are finite, then $|B| = \lim_{j \to \infty} |B_j|$. To see this we define the difference sets $A_k = B_k \setminus B_{k+1}$ for $k = 1, 2, \ldots$ and observe that $B_1$ can be written as a disjoint union $B_1 = B \cup A_1 \cup A_2 \cup \cdots$ (see Figure 14.1.5); hence, $|B_1| = |B| + |A_1| + |A_2| + \cdots$, which we can write as $|B_1| - |B| = \sum_{j=1}^{\infty} |A_j|$ since $|B_1|$ and, hence, $|B|$ are finite. On the other hand $B_1 = B_n \cup A_1 \cup A_2 \cup \cdots \cup A_{n-1}$ is a disjoint union, so $|B_1| = |B_n| + |A_1| + \cdots + |A_{n-1}|$. Thus $|B_1| - |B_n| = \sum_{j=1}^{n-1} |A_j| \to |B_1| - |B|$ as $n \to \infty$, so $\lim_{n \to \infty} |B_n| = |B|$. The requirement that $|B_j|$ be finite was used in the argument to avoid meaningless manipulations with $\infty - \infty$ expressions. To see that it is actually necessary for the result consider the example of Lebesgue measure on $\mathbb{R}$ with $B_n = (n, \infty)$. Then $\bigcap_n B_n = \emptyset$, but $\lim_{n \to \infty} |B_n| = +\infty$.

4. Subadditivity: If $A_1, \ldots, A_n$ are measurable sets, not necessarily disjoint, then $|A_1 \cup \cdots \cup A_n| \le \sum_{j=1}^{n} |A_j|$. To see this we replace the


Let us return to the special case of Lebesgue measure. Although the proof of the existence of this measure is very difficult, it is not too hard to derive a formula for the measure. The remaining task will then be to show that the three axioms for a measure are indeed satisfied. Suppose that a set $B$ is covered by a countable union of intervals, $B \subseteq \bigcup_{j=1}^{\infty} I_j$. Then if $B$ is to be measurable, the $\sigma$-subadditivity would imply $|B| \le \sum_{j=1}^{\infty} |I_j|$. Notice that here we know what $|I_j|$ is:

14.1.5 A Formula for Lebesgue Measure

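union with an equivalent disjoint union $A_1 \cup \cdots \cup A_n = B_1 \cup B_2 \cup \cdots \cup B_n$ by defining $B_1 = A_1$, $B_2 = A_2 \setminus A_1$, $B_3 = A_3 \setminus (A_1 \cup A_2)$, $\ldots$, $B_n = A_n \setminus (A_1 \cup \cdots \cup A_{n-1})$. Then the $B_j$ are disjoint and $B_j \subseteq A_j$, so $|B_j| \le |A_j|$ by monotonicity. Thus $|A_1 \cup \cdots \cup A_n| = \sum_{j=1}^{n} |B_j| \le \sum_{j=1}^{n} |A_j|$.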
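
The passage from $A_1, \ldots, A_n$ to the disjoint sets $B_k = A_k \setminus (A_1 \cup \cdots \cup A_{k-1})$ is mechanical enough to write down. A minimal sketch on finite sets follows, with the number of elements standing in for the measure (an assumption for illustration); the function name `disjointify` is introduced here.

```python
def disjointify(sets):
    """Return B_1, ..., B_n where B_k is A_k minus (A_1 u ... u A_{k-1})."""
    seen = set()   # union of the sets processed so far
    out = []
    for a in sets:
        out.append(set(a) - seen)
        seen |= set(a)
    return out

A = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 5}]
B = disjointify(A)

union_A = set().union(*A)
union_B = set().union(*B)

print(B)                                    # the pieces B_1, ..., B_4
print(len(union_A), sum(len(b) for b in B), sum(len(a) for a in A))
```

The union is unchanged, the pieces are pairwise disjoint with $B_k \subseteq A_k$, and so the size of the union equals $\sum |B_k|$, which is at most $\sum |A_k|$.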

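5. $\sigma$-Subadditivity: If $A_1, A_2, \ldots$ is a sequence of measurable sets, then $\bigl|\bigcup_{j=1}^{\infty} A_j\bigr| \le \sum_{j=1}^{\infty} |A_j|$. By essentially the same argument as in the finite case we write $\bigcup_{j=1}^{\infty} A_j = \bigcup_{j=1}^{\infty} B_j$ where the $B_j$ are disjoint and $B_j \subseteq A_j$. Then $\bigl|\bigcup_{j=1}^{\infty} A_j\bigr| = \sum_{j=1}^{\infty} |B_j|$ by $\sigma$-additivity, and the result follows.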

Frequently we need to combine subadditivity and monotonicity to obtain the following statement: if $B \subseteq A_1 \cup \cdots \cup A_n$, then $|B| \le \sum_{j=1}^{n} |A_j|$. A similar statement holds for countable unions.

Figure 14.1.5:


We can see immediately some consequences of this definition. First note that the arbitrary countable union of intervals can be replaced by a disjoint union to cover the same set $B$ while reducing the value of $\sum_{j=1}^{\infty} |I_j|$. To do this we consider the differences $I_1, I_2 \setminus I_1, I_3 \setminus (I_1 \cup I_2), \ldots$ as in the proof of the subadditivity. These differences are not necessarily intervals, but they are clearly finite unions of intervals, so altogether we end up with a countable disjoint covering by intervals. Since the definition of $|B|$ involves an infimum, we clearly obtain the same value if we restrict to disjoint coverings. We may also assume that the intervals $I_j$ are open, since we can always expand $I_j$ to make it open,

just the length of the interval $I_j$. Thus any countable covering of a set by intervals gives us some information about the measure of that set. Of course the same would be true about finite coverings of $B$ by intervals. However it is a key observation of the Lebesgue theory that the information obtained from countable coverings is much more precise than that obtained from finite coverings. To understand this we should look at an example. Let $B$ be the set of rational numbers in $[0,1]$. Then if we want to cover $B$ by a finite union of intervals it is not hard to see that we must cover the entire interval $[0,1]$, so the lengths must add up to at least one. Thus we obtain only the estimate $|B| \le 1$. However, using a countable cover gives us greater flexibility. If $r_1, r_2, \ldots$ is an enumeration of the rational numbers in $[0,1]$, set $I_j = (r_j - 2^{-j}\varepsilon, r_j + 2^{-j}\varepsilon)$. Then $\bigcup_{j=1}^{\infty} I_j \supseteq B$ and $\sum_{j=1}^{\infty} |I_j| = 2\varepsilon$, so $|B| \le 2\varepsilon$. Since this is true for every $\varepsilon > 0$, we have $|B| \le 0$ and, hence, $|B| = 0$ (if we are willing to use closed intervals in the cover we can take $I_j = [r_j, r_j]$ and get $|B| \le 0$ immediately). In this case we have determined the value of $|B|$ exactly by considering countable coverings by intervals. It turns out that this is true if $B$ is any Borel set.
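The countable cover of the rationals can be checked with exact arithmetic. The following is only a sketch under stated assumptions: the enumeration (by increasing denominator) and the names `rationals_in_unit_interval` and `cover_total_length` are illustrative choices, not part of the text. Each interval $I_j = (r_j - 2^{-j}\varepsilon, r_j + 2^{-j}\varepsilon)$ has length $2 \cdot 2^{-j}\varepsilon$, so every partial sum of the lengths stays below $2\varepsilon$ no matter how many rationals are covered.

```python
from fractions import Fraction


def rationals_in_unit_interval(count):
    """Return the first `count` distinct rationals in [0, 1].

    The enumeration order (increasing denominator) is an assumption made
    for illustration; any enumeration works for the argument.
    """
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out


def cover_total_length(count, eps):
    """Total length of I_j = (r_j - eps/2^j, r_j + eps/2^j), j = 1..count."""
    return sum(2 * eps / Fraction(2) ** j for j in range(1, count + 1))


rationals = rationals_in_unit_interval(50)
eps = Fraction(1, 100)
total = cover_total_length(len(rationals), eps)
print(total < 2 * eps)
```

The total does not depend on which rationals are covered, only on how many; this is why the bound survives the passage to the full countable cover.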

Suppose $B$ is an arbitrary set. We have seen that if $B$ is to be measurable we must have $|B| \le \sum_{j=1}^{\infty} |I_j|$ where $\bigcup_{j=1}^{\infty} I_j$ is any countable covering of $B$ by intervals. Similarly we must have $|B| \le \inf \sum_{j=1}^{\infty} |I_j|$ where the infimum is taken over all countable coverings of $B$ by intervals. The heart of the proof of the existence of Lebesgue measure is to take this to be an equality, to define

$|B| = \inf \{ \sum_{j=1}^{\infty} |I_j| : B \subseteq \bigcup_{j=1}^{\infty} I_j \}.$


if we adopt the above definition for the measure of open sets. This property is called outer regularity (see exercises for inner regularity). In other words, once we know what the Lebesgue measure is for open sets, the outer regularity property tells us what it is for all Borel sets.

In particular, if $B$ happens to be an interval, then by Corollary 14.1.1 we know that the Lebesgue measure of $B$ coincides with the length of $B$. Thus Lebesgue measure does extend the length measure of intervals.

We now want to look at two other special cases when the definition of Lebesgue measure can be given a significant interpretation. First suppose $\inf \{ \sum_{j=1}^{\infty} |I_j| : B \subseteq \bigcup I_j \} = 0$. Then we have no choice by the $\sigma$-subadditivity but to set $|B| = 0$. Such sets are said to have measure zero. They will play an important role in the integration theory, that of "negligible" sets that can be ignored. It is not difficult to show from the definition that sets of measure zero have the properties they should have: every Borel subset of a set of measure zero has measure zero, and any finite or countable union of sets of measure zero has measure zero. In particular we have a proof of a trivial kind of $\sigma$-additivity for sets of measure zero.

Second, we consider the case of an open set. Recall that we proved a structure theorem for open sets, $B = \bigcup_j I_j$ where the $I_j$ are disjoint open intervals, the union being finite or countable. Furthermore, this decomposition is unique. If we believe the countable additivity of Lebesgue measure, we must have $|B| = \sum_j |I_j|$. Now the definition of Lebesgue measure is somewhat different in that it requires that we take the inf of all such sums over countable covers of $B$ by intervals; the union given by the structure theorem is just one of these covers. It certainly seems plausible that this is the best cover so that the infimum is actually achieved. This can be proved directly without using the $\sigma$-additivity (see exercise set 14.1.7).

Consider a general Borel set $B$. As we observed before, in the definition of $|B|$ we can restrict attention to disjoint open coverings, so $B \subseteq \bigcup I_j$ says exactly $B \subseteq A$ where $A$ is open ($A = \bigcup I_j$) and so $\sum_j |I_j| = |A|$. Thus the definition becomes simply

$|B| = \inf \{ |A| : B \subseteq A, \ A \text{ open} \}$

increasing its length by $\varepsilon 2^{-j}$; and the increase in $\sum_{j=1}^{\infty} |I_j|$ is at most $\varepsilon$.


Although Lebesgue measure on an interval of $\mathbb{R}$ is the principal example of a measure in which we will be interested, there are a few other examples worth keeping in mind:

1. We can define a Lebesgue measure on $\mathbb{R}^n$. In place of the intervals we consider rectangles $I = I_1 \times I_2 \times \cdots \times I_n = \{ x \text{ in } \mathbb{R}^n : x_1 \text{ is in } I_1, x_2 \text{ is in } I_2, \ldots, x_n \text{ is in } I_n \}$ where $I_1, \ldots, I_n$ are intervals in $\mathbb{R}$, with measure (volume) equal to $|I_1| \cdot |I_2| \cdots |I_n|$. The Lebesgue measure of a set $A$ that is a countable disjoint union of rectangles is the sum of the measures of the rectangles, and $|B| = \inf \{ |A| : B \subseteq A \}$ with $A$ as

14.1.6 Other Examples of Measures

(Outer regularity is also an interesting property in its own right, and many other measures also share this property.)

Next we observe that we can replace the infimum by a limit: there must exist a sequence of open sets $A_1, A_2, \ldots$ such that $|B| = \lim_{j \to \infty} |A_j|$. (This is just a consequence of the properties of the inf over any set of real numbers.) By the monotonicity we can obtain the same limit using $A_1, A_1 \cap A_2, A_1 \cap A_2 \cap A_3, \ldots$. Indeed we write $A_n' = A_1 \cap A_2 \cap \cdots \cap A_n$. It is easy to see that $A_n'$ is also a countable disjoint union of open intervals and $B \subseteq A_n'$. Since $A_n' \subseteq A_n$, we have $|A_n'| \le |A_n|$, so $|B| \le \lim_{n \to \infty} |A_n'| \le \lim_{n \to \infty} |A_n| = |B|$. The point of this is that the sequence $A_1', A_2', \ldots$ is decreasing. Finally set $A = \bigcap_{j=1}^{\infty} A_j'$. Since $B \subseteq A_j'$ for all $j$, we have $B \subseteq A$. In summary, for every Borel set $B$, there exists a decreasing sequence $A_1', A_2', \ldots$ of coverings by countable disjoint unions of open intervals, such that $\lim_{j \to \infty} |A_j'|$ is the value we have defined for $|B|$. If we had $B = A$ we would certainly want to take $|B| = \lim_{j \to \infty} |A_j'|$; in fact we would have to take this value by the conditional continuity from above if the measures $|A_j'|$ are finite (this is always the case if the universe $X$ is a bounded interval). In the general case we are merely asserting that $A \setminus B$ has measure zero.

A set that is the countable intersection of open sets is called a $G_\delta$ set (the $G$ stands for "open" and the $\delta$ stands for "intersection"). As the above discussion shows, the definition of the measure of a $G_\delta$ set is perfectly natural. Now the class of $G_\delta$ sets is quite large, but it is not a $\sigma$-field. But if we modify the $G_\delta$ sets by sets of measure zero we do obtain the Borel $\sigma$-field: every Borel set $B$ can be covered by a $G_\delta$ set $A$ such that $A \setminus B$ has measure zero, and $|B| = |A|$.


above for a general Borel set $B$. We will use this measure in the theory of multiple integrals.

2. Let the universe $X$ consist of the positive integers $\{1, 2, 3, \ldots\}$, and define $|A|$ to be the number of points in $A$ ($+\infty$ if $A$ is an infinite set). This is called counting measure, and it is defined on all subsets of $X$. It is trivial to verify that the axioms for a measure are satisfied. We will see that the integration theory associated with this measure is the theory of absolutely convergent series. More generally, counting measure can be defined on any universe $X$.

3. Let the universe $X$ be a finite set $\{x_1, \ldots, x_n\}$. Let $p_1, p_2, \ldots, p_n$ be any non-negative values ($+\infty$ is allowed); and define $|A|$ to be the sum of $p_j$ for all points $x_j$ in $A$, where $A$ is any subset of $X$. Again it is trivial to verify the axioms for a measure. In this case we can also easily show that any measure defined on all subsets of the universe $X$ must have this form. If the values $p_j$ also satisfy $\sum_{j=1}^{n} p_j = 1$ (or equivalently, $|X| = 1$), then we can interpret them as the probabilities. Thus $p_j$ is the probability that $x_j$ occurs; $|A|$ is the probability that one of the $x$'s in $A$ occurs.

4. More generally, for any universe $X$ and any measure such that $|X| = 1$ we can interpret the measure as giving probabilities for "random" events with outcomes in $X$, $|A|$ being the probability that the outcome lies in $A$. Conversely, in the point of view pioneered by Kolmogorov and now almost universally accepted by mathematicians, every description involving probabilities may be cast into this form. To give one important illustration, let us give such a description for the random tossing of a fair coin in an infinite sequence of independent trials. The universe $X$ consists of all possible outcomes, i.e., sequences of heads (H) and tails (T), as HTHT.... If $A$ is a subset of $X$ that is measurable (i.e., belongs to a certain $\sigma$-field of subsets that we will not describe explicitly), then $|A|$ will be interpreted as the probability that a random sequence of tossings will lie in $A$. For example, if $A$ consists of all sequences beginning with H, then $|A| = 1/2$; and more generally if $A$ consists of all sequences whose first $n$ outcomes are specified, then $|A| = 2^{-n}$. However, for more general sets it is not immediately apparent how to define $|A|$. There is a device for doing this, however. We interpret H as 0 and T as 1, and each infinite sequence HT... as a binary expansion .01.... In this way we obtain a mapping of the universe $X$ onto the unit interval $[0,1]$; this mapping is not


6. Prove that a countable union of sets of Lebesgue measure zero has

5. Prove that the Cantor set (delete middle thirds) has Lebesgue measure zero.

4. Prove that if $A = \bigcup_{j=1}^{n} I_j$, a disjoint union of intervals, then $\sum_{j=1}^{n} |I_j|$ is independent of the particular decomposition. Show that if we define $|A| = \sum_{j=1}^{n} |I_j|$, then all the axioms for a measure are satisfied on the field of finite unions of intervals, where $\sigma$-additivity means $|\bigcup_{j=1}^{\infty} A_j| = \sum_{j=1}^{\infty} |A_j|$ if $A_1, A_2, \ldots$ and $\bigcup_{j=1}^{\infty} A_j$ are all in the field. (Hint: use the $\sigma$-additivity on intervals proved in the text.) Why doesn't this argument establish the existence of Lebesgue measure?

3. Prove that the intersection of all $\sigma$-fields containing a field $\mathcal{F}$ is a $\sigma$-field and that it is the smallest $\sigma$-field containing $\mathcal{F}$.

2. Show that the collection of all finite unions of intervals forms a field. Show the same is true for finite unions of intervals in $\mathbb{R}$ that are left open and right closed, $(a, b]$.

1. Show that a field of sets is closed under finite intersections and differences.

14.1.7 Exercises

quite one-to-one because of identifications such as .0111... = .1000..., but there are only a countable set of such exceptions and this set will have measure zero. Furthermore, if we consider the set $A$ in $X$ of all sequences starting with H, this is mapped into the interval $[0,1/2]$, so $|A|$ is equal to the Lebesgue measure of $[0,1/2]$. More generally, if $A$ is the set of all sequences whose first $n$ outcomes are specified, this is mapped into an interval of the form $[k/2^n, (k+1)/2^n]$ and $|A|$ still agrees with the Lebesgue measure of the image of $A$. We can thus define $|A|$ for a general subset of $X$ as the Lebesgue measure of the image of $A$ under the mapping ($A$ will be measurable if its image is a Borel set). In this way we obtain a new interpretation of Lebesgue measure on $[0,1]$ as the probability measure associated with an infinite sequence of independent tosses of a fair coin.
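The correspondence between specified initial tosses and dyadic intervals can be made concrete. The sketch below is illustrative only; the function names `prefix_interval` and `prefix_probability` are hypothetical, and the encoding H = 0, T = 1 is the one described above. A prefix of $n$ tosses, read as binary digits, gives an integer $k$, and the sequences with that prefix map into $[k/2^n, (k+1)/2^n]$, whose length is $2^{-n}$.

```python
from fractions import Fraction


def prefix_interval(prefix):
    """Map a finite toss prefix such as "HTH" to its dyadic interval.

    H is read as the binary digit 0 and T as 1, so the prefix is the
    start of a binary expansion .d1 d2 ... dn.
    """
    k = 0
    for toss in prefix:
        if toss not in "HT":
            raise ValueError("tosses must be 'H' or 'T'")
        k = 2 * k + (1 if toss == "T" else 0)
    n = len(prefix)
    return Fraction(k, 2 ** n), Fraction(k + 1, 2 ** n)


def prefix_probability(prefix):
    """Lebesgue measure (length) of the interval attached to the prefix."""
    left, right = prefix_interval(prefix)
    return right - left


print(prefix_interval("H"), prefix_probability("HTHT"))
```

The length depends only on the number of specified tosses, which matches the probability $2^{-n}$ assigned directly to such sets of sequences.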


16. Let $\mu$ be a measure on a $\sigma$-field $\mathcal{F}$, and let $F$ be a set in $\mathcal{F}$. Define the restriction of $\mu$ to $F$, denoted $\mu|_F$, by $\mu|_F(A) = \mu(F \cap A)$. Prove that $\mu|_F$ is a measure on $\mathcal{F}$.

15. Prove that for compact sets $A$, the Lebesgue measure can be computed using only finite coverings.

14. Prove the following inner regularity for Lebesgue measure: $|B| = \sup \{ |F| : F \subseteq B \text{ is closed} \}$, for all Borel sets $B$. (Hint: if $B$ is contained in $(-N, N)$ use the outer regularity for $(-N, N) \setminus B$.)

13. Prove directly (without assuming $\sigma$-additivity) that $|\bigcup_{j=1}^{\infty} I_j| = \sum_{j=1}^{\infty} |I_j|$ if $\{I_j\}$ are disjoint intervals and $| \cdot |$ denotes Lebesgue measure.

11. Prove that every open subset of $\mathbb{R}^n$ is a countable union of rectangles. Can the rectangles in the union be taken to be disjoint?

12. Prove that any measure on the field of all subsets of a finite set $X$ has the form $|A| = \sum_{x_j \text{ in } A} p_j$ for some values $p_j$ in $[0, \infty]$. What if $X$ is countable?

8. What is the Lebesgue measure of a countable set? Is the same true of a general measure?

9. Prove that a countable intersection of countable unions of intervals is a $G_\delta$ set.

10. Prove that the class of finite unions of rectangles in $\mathbb{R}^n$ is equal to the class of finite disjoint unions of rectangles in $\mathbb{R}^n$ and forms a field.

7. What is the Lebesgue measure of the set of irrational numbers in $[0,1]$?

measure zero, directly from the definition $|B| = 0$ if

$\inf \{ \sum_{j=1}^{\infty} |I_j| : B \subseteq \bigcup_{j=1}^{\infty} I_j \} = 0.$


Notice that axioms 1 and 2 are the same as the corresponding axioms for measures and that axiom 3 is a weakening of $\sigma$-additivity.

3. ($\sigma$-subadditivity) if $A = \bigcup_{j=1}^{\infty} A_j$ with all $A_j$ in $\mathcal{F}$, then $\mu(A) \le \sum_{j=1}^{\infty} \mu(A_j)$;

4. (monotonicity) if $A \subseteq B$ are in $\mathcal{F}$, then $\mu(A) \le \mu(B)$.

2. $\mu(\emptyset) = 0$;

1. (non-negativity) $\mu(A)$ is in $[0, \infty]$ for every $A$ in $\mathcal{F}$;

In this section we will give a proof of the existence of Lebesgue measure, following a method of Carathéodory. As a bonus we will also obtain the existence of other measures, including the Hausdorff measures that are used in the theory of fractals. The strategy of the proof is as follows. First, we weaken the axioms for a measure to conditions that are easy to verify. The resulting object will be called an outer measure. (This is truly dreadful terminology, because an outer measure is not a special case of a measure but something more general.) It will be easy to obtain examples of outer measures; in particular, the definition of Lebesgue measure yields an outer measure.

The second step in the proof is to show that an outer measure does yield a measure if we restrict the $\sigma$-field of sets appropriately. That is, we start with a $\sigma$-field $\mathcal{F}$ and consider a possibly smaller collection $\mathcal{F}_0$ of sets that satisfy a certain "splitting" condition. We prove a general theorem to the effect that $\mathcal{F}_0$ is a $\sigma$-field and the restriction of the outer measure to $\mathcal{F}_0$ is indeed a measure. Such a general theorem could yield a vacuous result in special cases because $\mathcal{F}_0$ could be very small, perhaps consisting of just the empty set and the whole space. To give significance to the general theorem we need the third step in the program, which gives a criterion for concluding $\mathcal{F}_0 = \mathcal{F}$. Fortunately, this criterion is easy to verify for Lebesgue measure and for other measures as well.

To begin the first step of the proof we give the axioms for an outer measure $\mu$ defined on a $\sigma$-field of sets $\mathcal{F}$:

14.2.1 Outer Measures

14.2 Proof of Existence of Measures*


Since the splitting condition is an exact additivity statement for the disjoint sets $B \cap A$ and $B \setminus A$, we cannot expect it to hold very often, unless $\mu$ is a measure, in which case it would always hold. It is obviously true for the empty set and the whole space but may not hold for any other sets. What the next theorem says is that things are very good for the sets that do satisfy the splitting condition.

for every set B in F.

$\mu(B) = \mu(B \cap A) + \mu(B \setminus A)$

Since we have an inequality, it is not necessary to assume the sets $A_j$ are disjoint. Ordinary additivity is not being assumed, but of course we have subadditivity for finite unions as a consequence of axioms 2 and 3 by taking all but a finite number of sets equal to the empty set. We have already noted that axioms 3 and 4 are properties that hold for measures, so every measure is an outer measure. (Note: Most books require that $\mathcal{F}$ be the $\sigma$-field of all subsets of $X$ in the definition of outer measure. This is totally unnecessary and contrary to the philosophy that non-measurable sets should play no role in analysis.)

Now, as promised, I will show that Lebesgue measure, defined by $\mu(A) = \inf \{ \sum_{j=1}^{\infty} |I_j| : A \subseteq \bigcup_{j=1}^{\infty} I_j \}$, is an outer measure on the Borel sets (or any $\sigma$-field, for that matter). Indeed, axioms 1, 2, and 4 are immediate consequences of the definition; and axiom 3 follows easily because if we have countable coverings of each of the sets $A_j$ by intervals, we can take the union of these coverings to obtain a countable covering of $A$ by intervals. Thus if $A_j \subseteq \bigcup_k I_{jk}$, then $A \subseteq \bigcup_j \bigcup_k I_{jk}$, so $\mu(A) \le \sum_j \sum_k |I_{jk}|$; and if we take the coverings of $A_j$ so that $\sum_k |I_{jk}| \le \mu(A_j) + \varepsilon 2^{-j}$ (note that if $\mu(A_j) = +\infty$ for any $j$, then there is nothing to prove) we have $\mu(A) \le \sum_j (\mu(A_j) + \varepsilon 2^{-j}) = \varepsilon + \sum_j \mu(A_j)$. Since this is true for any $\varepsilon > 0$, it is true for $\varepsilon = 0$ and we have the required $\sigma$-subadditivity.

Since we have not worked very hard to establish the outer measure axioms for Lebesgue measure, we do not deserve to obtain very much as a consequence. Still, we get a little more than we deserve out of the following theorem. First we need to define the splitting condition, which will enable us to carry out the second step in the proof.

Definition 14.2.1 Let $\mu$ be an outer measure on a $\sigma$-field $\mathcal{F}$. We say that a set $A$ in $\mathcal{F}$ satisfies the splitting condition if we have


If instead of starting with $B$ we start with $B_1 \cup B_2 \cup B_3$ and repeat the above arguments we obtain

$\mu(B_1 \cup B_2 \cup B_3) = \mu(B_1) + \mu(B_2) + \mu(B_3).$

Now $A_1$ satisfies the splitting condition, so splitting $B$ yields $\mu(B) = \mu(B_1 \cup B_3) + \mu(B_2 \cup B_4)$. Also $A_2$ satisfies the splitting condition, so we can use it to split the sets $B_1 \cup B_3$ and $B_2 \cup B_4$ to obtain $\mu(B_1 \cup B_3) = \mu(B_1) + \mu(B_3)$ and $\mu(B_2 \cup B_4) = \mu(B_2) + \mu(B_4)$. Combining these results yields

$\mu(B) = \mu(B_1) + \mu(B_2) + \mu(B_3) + \mu(B_4).$

All four sets $B_j$ are in $\mathcal{F}$. The splitting condition for $A_1 \cup A_2$ and $B$ that we need to prove is

Figure 14.2.1:


Proof: It is clear from the definition that $A$ is in $\mathcal{F}_0$ if and only if the complement of $A$ is in $\mathcal{F}_0$ because the splitting condition is the same for $A$ and ${}^cA$ ($B \setminus A = B \cap {}^cA$). We first show that $\mathcal{F}_0$ is a field and that $\mu$ is finitely additive on $\mathcal{F}_0$. So let $A_1$ and $A_2$ be in $\mathcal{F}_0$, and take any set $B$ in $\mathcal{F}$. Write $B = B_1 \cup B_2 \cup B_3 \cup B_4$ (disjoint), according to the Venn diagram shown in Figure 14.2.1.

Theorem 14.2.1 Let $\mu$ be an outer measure on the $\sigma$-field $\mathcal{F}$, and let $\mathcal{F}_0$ denote the sets in $\mathcal{F}$ that satisfy the splitting condition. Then $\mathcal{F}_0$ is a $\sigma$-field, and $\mu$ restricted to $\mathcal{F}_0$ is a measure.


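$\mu(B \cap F_n) = \sum_{j=1}^{n} \mu(B \cap A_j)$

by induction on the splitting property since the sets $A_j$ are in $\mathcal{F}_0$ and disjoint. By the $\sigma$-subadditivity (applied to $B \cap A = \bigcup_{j=1}^{\infty} (B \cap A_j)$) we have $\mu(B \cap A) \le \sum_{j=1}^{\infty} \mu(B \cap A_j)$. Altogether we have shown

$\mu(B \cap A) + \mu(B \setminus A)$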

and furthermore

$\mu(B) = \mu(B \cap F_n) + \mu(B \setminus F_n)$

so it suffices to establish the reverse inequality.

Consider the sets $F_n = \bigcup_{j=1}^{n} A_j$. Since $F_n \subseteq A$, we have $B \setminus A \subseteq B \setminus F_n$ and so $\mu(B \setminus A) \le \mu(B \setminus F_n)$ by the monotonicity of $\mu$. On the other hand, $F_n$ is in $\mathcal{F}_0$ because $\mathcal{F}_0$ is a field; so $F_n$ splits $B$, yielding

$\mu(B) \le \mu(B \cap A) + \mu(B \setminus A);$

holds. Now $\mu$ is an outer measure, so by the $\sigma$-subadditivity, which implies finite subadditivity, we already know

$\mu(B) = \mu(B \cap A) + \mu(B \setminus A)$

Thus $\mu$ is finitely additive on $\mathcal{F}_0$.

Next we show that $\mathcal{F}_0$ is a $\sigma$-field. Since we have shown that it is a field, it suffices to show that it is preserved by countable disjoint unions (since any countable union can be replaced by a disjoint one without leaving the field). Suppose $\{A_j\}$ is a disjoint sequence of sets in $\mathcal{F}_0$, and let $A = \bigcup_{j=1}^{\infty} A_j$. Given any $B$ in $\mathcal{F}$, we need to show that the splitting formula

So $\mathcal{F}_0$ is a field. Furthermore, if $A_1$ and $A_2$ in $\mathcal{F}_0$ are disjoint, then using $A_1$ to split $A_1 \cup A_2$ yields

$\mu(A_1 \cup A_2) = \mu(A_1) + \mu(A_2).$

$\mu(B) = (\mu(B_1) + \mu(B_2) + \mu(B_3)) + \mu(B_4)$

$= \mu(B_1 \cup B_2 \cup B_3) + \mu(B_4).$

This proves the desired splitting of B,


if $A$ and $B$ are separated.

$\mu(A \cup B) = \mu(A) + \mu(B),$

Definition 14.2.2 We say two sets $A$ and $B$ in $X$ are separated if the distance from $A$ to $B$ (the inf of $d(x, y)$ for all $x$ in $A$ and $y$ in $B$) is positive. We define a metric outer measure to be an outer measure that is additive on separated sets,

We now come to the third stage in our process. We are going to give a simple criterion for $\mathcal{F}_0$ to equal $\mathcal{F}$ in the last theorem. This will allow us to conclude that $\mu$ is a measure on all of $\mathcal{F}$. We assume that $X$ is a metric space and $\mathcal{F}$ is the $\sigma$-field of Borel sets. Recall that this is defined to be the smallest $\sigma$-field containing the open sets (or closed sets, since these are the complements of open sets).

14.2.2 Metric Outer Measure

which is exactly the $\sigma$-additivity of $\mu$ on $\mathcal{F}_0$. This is the only axiom for a measure not contained in the axioms for an outer measure, so $\mu$ restricted to $\mathcal{F}_0$ is a measure. QED

$\mu(A) = \sum_{j=1}^{\infty} \mu(A_j),$

This completes the verification that $\mathcal{F}_0$ is a $\sigma$-field. But if we look at the last string of inequalities for $B = A$ we find

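$\le \sum_{j=1}^{\infty} \mu(B \cap A_j) + \mu(B \setminus A)$

$= \lim_{n \to \infty} \sum_{j=1}^{n} \mu(B \cap A_j) + \mu(B \setminus A)$

$\le \lim_{n \to \infty} \Big( \sum_{j=1}^{n} \mu(B \cap A_j) + \mu(B \setminus F_n) \Big)$

$= \lim_{n \to \infty} \big( \mu(B \cap F_n) + \mu(B \setminus F_n) \big)$

$= \lim_{n \to \infty} \mu(B) = \mu(B).$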


$B \setminus A = \bigcup_{j=1}^{\infty} R_j$ (disjoint)

for every $B$ in $\mathcal{F}$. Notice that we already know this is true if $B \cap A$ and $B \setminus A$ are separated. But in general, $B \cap A$ and $B \setminus A$ are disjoint but not separated.

Now we want to use the fact that $A$ is closed and we are in a metric space. We break up $B \setminus A$ into "rings", as indicated in Figure 14.2.2, that is

$\mu(B) = \mu(B \cap A) + \mu(B \setminus A)$

Proof: By the previous theorem it suffices to show that $\mathcal{F}_0 = \mathcal{F}$. Since $\mathcal{F}$ is generated by the closed sets and $\mathcal{F}_0$ is a $\sigma$-field, it suffices to show that every closed set $A$ is in $\mathcal{F}_0$, which means

Theorem 14.2.2 (Carathéodory) A metric outer measure is in fact a measure on the Borel sets.

Of course, separated sets are disjoint, but disjoint sets need not be separated (think of adjacent open intervals). It is usually not difficult to verify the metric condition in specific cases. For example, take Lebesgue measure. Notice that in the definition of $\mu(A)$ we can always assume that the covering intervals have length less than $\varepsilon$, for any fixed $\varepsilon$. (If not, break any interval of length greater than $\varepsilon$ into a disjoint union of small intervals; this does not change the sum $\sum |I_j|$.) Given two separated sets $A$ and $B$, say of distance $\varepsilon$ apart, cover $A \cup B$ by $\bigcup I_j$ with $|I_j| < \varepsilon$. Then each $I_j$ can meet only one of the sets $A$ or $B$ (or else the distance apart would be less than $\varepsilon$). Thus we can pick apart the covering of $A \cup B$ into a covering of $A$ and $B$, say $A \subseteq \bigcup I_j'$ and $B \subseteq \bigcup I_j''$, with $\{I_j'\} \cup \{I_j''\} = \{I_j\}$ so that $\sum |I_j| = \sum |I_j'| + \sum |I_j''|$. If we chose the covering $A \cup B \subseteq \bigcup I_j$ so that $\mu(A \cup B) \ge \sum |I_j| - \delta$, then $\mu(A \cup B) \ge \mu(A) + \mu(B) - \delta$. Since this is true for any $\delta > 0$, it is true for $\delta = 0$. This completes the verification that Lebesgue measure is a metric outer measure, since the reverse inequality is automatic by subadditivity. Essentially the same argument shows that Lebesgue measure on $\mathbb{R}^n$ is also a metric outer measure.

Thus the next theorem will complete the proof of the existence of Lebesgue measure.


To complete the proof we only need

$\mu(B) \ge \mu\big( (B \cap A) \cup \bigcup_{j=1}^{n} R_j \big),$

$\mu(B) \ge \mu(B \cap A) + \mu\big( \bigcup_{j=1}^{n} R_j \big).$

so

By monotonicity we have

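Notice that the sets $B \cap A$ and $\bigcup_{j=1}^{n} R_j$ are separated (distance at least $1/n$), so we have

$\mu\big( (B \cap A) \cup \bigcup_{j=1}^{n} R_j \big) = \mu(B \cap A) + \mu\big( \bigcup_{j=1}^{n} R_j \big).$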

Figure 14.2.2:

The fact that $A$ is closed is equivalent to the statement that $d(x, A) > 0$ if $x$ is not in $A$ because $d(x, A) = 0$ says that $x$ is a limit-point of $A$.

where the ring $R_j$ is defined to be the set of points in $B \setminus A$ whose distance to the set $A$ ($d(x, A) = \inf \{ d(x, y) : y \text{ in } A \}$) satisfies

$1/j \le d(x, A) < 1/(j - 1).$


Lebesgue measure on the line has the simple scaling property that if we dilate a set by a factor $t$, then the measure is multiplied by $t$. In $n$ dimensions the corresponding factor is $t^n$ (so if your height doubles, your weight should increase by about a factor of eight, assuming you live in three dimensions). Hausdorff had the brilliant insight that we can construct measures that scale with factor $t^\alpha$ for any positive $\alpha$ by making a simple change in the definition of Lebesgue measure. Hausdorff's construction works in any metric space. In recent years, these Hausdorff measures have played an important role in the development of fractal geometry.

14.2.3 Hausdorff Measures*

which would follow from the convergence of the series $\sum_{j=1}^{\infty} \mu(R_j)$. We need one more trick to get this. Notice that the convergence would follow if we could show $\sum \mu(R_j)$ converges when summed separately over even and odd values of $j$. The point of this is that the sets $R_1, R_3, R_5, \ldots$ are all separated, as are the sets $R_2, R_4, R_6, \ldots$. Thus $\mu(R_1) + \mu(R_3) + \cdots + \mu(R_{2j+1}) = \mu(R_1 \cup R_3 \cup \cdots \cup R_{2j+1})$ by the metric hypothesis on $\mu$. Since $\mu(R_1 \cup R_3 \cup \cdots \cup R_{2j+1}) \le \mu(B \setminus A)$ by monotonicity and we may assume $\mu(B \setminus A) < \infty$ (or else $\mu(B) = +\infty$ by monotonicity and the splitting is obvious), we have the convergence of the odd sums. A similar argument shows that the even sums converge. QED

$\lim_{n \to \infty} \sum_{j=n+1}^{\infty} \mu(R_j) = 0,$

so it suffices to show

is automatic by subadditivity.

But by $\sigma$-subadditivity we know

$\mu(B) \le \mu(B \cap A) + \mu(B \setminus A)$

since the reverse inequality


Proof: First we observe that $\mu_\alpha$ is an outer measure. The proof of the required properties is routine and is left to the exercises. Then we observe that $\mu_\alpha$ is a metric outer measure. The proof is almost the

Theorem 14.2.3 $\mu_\alpha$ is a measure on the Borel sets.

The limit defining $\mu_\alpha(A)$ always exists in the extended real numbers because $\mu_\alpha^{(\varepsilon)}(A)$ increases as $\varepsilon \to 0$ (it is the infimum over a smaller collection of coverings for smaller $\varepsilon$).

If $A$ is a bounded set we will always have a finite value for $\mu_\alpha^{(\varepsilon)}(A)$, but we may nevertheless have $\mu_\alpha(A) = +\infty$. In fact, if $\alpha < 1$ we will usually have $\mu_\alpha(A) = +\infty$ unless $A$ is very thin. For example, if $A$ is a nonempty open interval, then $\mu_\alpha(A) = +\infty$. The reason we require $\alpha \le 1$ is that if $\alpha > 1$, then $\mu_\alpha(A) = 0$ for every set $A$. We leave the details to the exercises.

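Definition 14.2.3 For any fixed $\varepsilon > 0$, define

$\mu_\alpha^{(\varepsilon)}(A) = \inf \{ \sum_{j=1}^{\infty} |I_j|^\alpha : A \subseteq \bigcup_{j=1}^{\infty} I_j \text{ and the intervals } I_j \text{ all satisfy } |I_j| \le \varepsilon \}.$

Then define $\mu_\alpha(A) = \lim_{\varepsilon \to 0} \mu_\alpha^{(\varepsilon)}(A)$.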

For simplicity we begin with the definitions for subsets of the line. If $A$ is a Borel set covered by a countable union of intervals $A \subseteq \bigcup_{j=1}^{\infty} I_j$, we may take $\sum_{j=1}^{\infty} |I_j|^\alpha$ as an upper approximation to the measure $\mu_\alpha$. However, the definition $\inf \{ \sum_{j=1}^{\infty} |I_j|^\alpha \}$ over all such coverings does not work. The reason it fails is that if we split an interval $I$ into pieces, say $I = \bigcup_{j=1}^{k} I_j$ (disjoint), then $|I| = \sum_{j=1}^{k} |I_j|$ but we do not have $|I|^\alpha = \sum_{j=1}^{k} |I_j|^\alpha$ for any $\alpha \ne 1$. In fact, for $0 \le \alpha < 1$, which is the case in which we will be interested, $\sum_{j=1}^{k} |I_j|^\alpha$ is larger than $|I|^\alpha$. This means that if we have a covering $A \subseteq \bigcup_{j=1}^{\infty} I_j$ with $\sum_{j=1}^{\infty} |I_j|^\alpha$ small, we cannot automatically replace it with a covering where all the intervals are small simply by splitting the intervals into small pieces. Since the smallness of the covering intervals was a crucial fact that was used in proving the metric property of Lebesgue measure, we will have to build the corresponding fact into the definition of $\mu_\alpha$.
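A small numerical illustration of why splitting an interval changes the $\alpha$-sum may help. This is only a sketch: the helper names `alpha_sum` and `split_evenly` are hypothetical, and only equal-length splits are tried. Splitting an interval of length 1 into four equal pieces leaves the sum of lengths unchanged for $\alpha = 1$ but doubles the sum for $\alpha = 1/2$.

```python
def alpha_sum(lengths, alpha):
    """Sum of |I_j|^alpha over a list of interval lengths."""
    return sum(length ** alpha for length in lengths)


def split_evenly(length, pieces):
    """Lengths of the pieces when an interval is cut into equal parts."""
    return [length / pieces] * pieces


whole = [1.0]
quarters = split_evenly(1.0, 4)

# alpha = 1: ordinary length is additive, so nothing changes.
print(alpha_sum(whole, 1.0), alpha_sum(quarters, 1.0))
# alpha = 1/2: four pieces of length 1/4 each contribute (1/4)^(1/2) = 1/2.
print(alpha_sum(whole, 0.5), alpha_sum(quarters, 0.5))
```

For $\alpha < 1$ finer splits only make the sum larger, which is why the size restriction on the covering intervals has to be part of the definition rather than something recovered afterwards.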


It is easy to show (see exercises) that $\mu_1$ is Lebesgue measure and $\mu_0$ is counting measure and that $\mu_\alpha$ has the desired scaling property, $\mu_\alpha(\delta_t A) = t^\alpha \mu_\alpha(A)$ for dilations $\delta_t A = \{ tx : x \text{ is in } A \}$. To get a feeling for what $\mu_\alpha$ is like for $0 < \alpha < 1$ consider the middle-third Cantor set $C$. Choose $\alpha = \log 2 / \log 3$. We claim $\mu_\alpha(C) \le 1$. The idea is that we can cover $C$ by $2^n$ intervals of length $1/3^n$ (the intervals that remain from the unit interval after we have removed middle thirds $n$ times). Thus we obtain an upper bound of $2^n \cdot (1/3^n)^\alpha = (2/3^\alpha)^n = 1$ (because $\alpha = \log 2 / \log 3$) for $\mu_\alpha^{(\varepsilon)}(C)$ for $\varepsilon = 1/3^n$, hence $\mu_\alpha(C) \le 1$ in the limit. It is trickier to show $\mu_\alpha(C) = 1$, and we leave the details to the exercises.
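The upper bound for the Cantor set can be checked numerically at any finite stage. The sketch below is an illustration under stated assumptions, not a computation of $\mu_\alpha(C)$ itself: the names `cantor_stage` and `covering_sum` are hypothetical, and the code only evaluates $\sum |I_j|^\alpha$ for the $2^n$ intervals left after $n$ deletions.

```python
import math
from fractions import Fraction

ALPHA = math.log(2) / math.log(3)


def cantor_stage(n):
    """Intervals (left, right) remaining after n middle-third deletions."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for left, right in intervals:
            third = (right - left) / 3
            next_intervals.append((left, left + third))    # keep left third
            next_intervals.append((right - third, right))  # keep right third
        intervals = next_intervals
    return intervals


def covering_sum(intervals, alpha):
    """Sum of |I_j|^alpha over the given covering intervals."""
    return sum(float(right - left) ** alpha for left, right in intervals)


stage = cantor_stage(5)
print(len(stage), covering_sum(stage, ALPHA), covering_sum(stage, 1.0))
```

With $\alpha = \log 2 / \log 3$ the sum stays at 1 (up to rounding) for every stage, while with $\alpha = 1$ it is $(2/3)^n$ and tends to 0, consistent with the Cantor set having Lebesgue measure zero.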

To extend the definition of $\mu_\alpha$ to $\mathbb{R}^n$ or any metric space, we drop the requirement that the covering sets $I_j$ be intervals and allow them to be arbitrary closed sets, interpreting $|I|$ to denote the diameter of the set (the sup of $d(x, y)$ as $x$ and $y$ vary over $I$). The definition is otherwise the same, and the proof that $\mu_\alpha$ is a measure is essentially the same. (In $\mathbb{R}^1$ the two definitions coincide because every closed set is contained in a closed interval of the same diameter.) It is much harder to compute Hausdorff measures in $\mathbb{R}^n$ because we have to consider such general coverings. For example, it is true that $\mu_n$ is equal to a multiple of Lebesgue measure in $\mathbb{R}^n$, and one can even compute the constant ($\mu_n(B_r) = (2r)^n$ for a ball $B_r$ of radius $r$), but this requires a deep geometric fact, the isodiametric theorem: the maximal volume of a set of fixed diameter is attained by a ball. It is usually easy to obtain upper bounds for Hausdorff measure since this only requires finding efficient coverings. Lower bounds are more problematic because they require proving estimates for all possible coverings by quite general sets. For example, if $C$ is a rectifiable curve in $\mathbb{R}^n$, then $\mu_1(C) \le \text{length}(C)$. If

because any countable covering of $A \cup B$ by intervals of length at most $\varepsilon$ splits into a disjoint union of coverings of $A$ and $B$. Taking the limit as $\varepsilon \to 0$ we obtain the additivity for $\mu_\alpha$. Finally, we apply the Carathéodory theorem on metric outer measures. QED

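same as for Lebesgue measure. If $A$ and $B$ have distance apart $\delta$, then for any $\varepsilon < \delta$ we have

$\mu_\alpha^{(\varepsilon)}(A \cup B) = \mu_\alpha^{(\varepsilon)}(A) + \mu_\alpha^{(\varepsilon)}(B)$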


The value $\mu_{\alpha_0}(A)$ may be anything in $[0, \infty]$. We define $\alpha_0$ to be the Hausdorff dimension of $A$. Of course, if we can find $\alpha_0$ such that $\mu_{\alpha_0}(A)$ is finite and positive, then $\alpha_0$ is the Hausdorff dimension of $A$. Thus the Hausdorff dimension of the Cantor set is $\log 2 / \log 3$.

Figure 14.2.3:


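Thus if we graph $\mu_\alpha(A)$ as a function of $\alpha$, there will be a unique value $\alpha_0$ such that $\mu_\alpha(A) = +\infty$ for $\alpha < \alpha_0$ and $\mu_\alpha(A) = 0$ for $\alpha > \alpha_0$.

Proof: Consider any covering $A \subseteq \bigcup_{j=1}^{\infty} I_j$ with $|I_j| \le \varepsilon$. Then if $\beta > \alpha$ we have $\sum_{j=1}^{\infty} |I_j|^\beta = \sum_{j=1}^{\infty} |I_j|^{\beta - \alpha} |I_j|^\alpha \le \varepsilon^{\beta - \alpha} \sum_{j=1}^{\infty} |I_j|^\alpha$, while if $\beta < \alpha$ we have similarly $\sum_{j=1}^{\infty} |I_j|^\beta \ge \varepsilon^{\beta - \alpha} \sum_{j=1}^{\infty} |I_j|^\alpha$. Thus $\mu_\beta^{(\varepsilon)}(A) \le \varepsilon^{\beta - \alpha} \mu_\alpha^{(\varepsilon)}(A)$ if $\beta > \alpha$ and $\mu_\beta^{(\varepsilon)}(A) \ge \varepsilon^{\beta - \alpha} \mu_\alpha^{(\varepsilon)}(A)$ if $\beta < \alpha$. Taking the limit as $\varepsilon \to 0$ we obtain the desired result. QED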

Lemma 14.2.1

a. Suppose $\mu_\alpha(A) < \infty$. Then $\mu_\beta(A) = 0$ for $\beta > \alpha$.

b. Suppose $\mu_\alpha(A) > 0$. Then $\mu_\beta(A) = +\infty$ for $\beta < \alpha$.

Q(t) is an arc length parametrization for $t$ in $[a, b]$, then the coverings by pieces of the curve corresponding to a partition of $[a, b]$ will give this estimate. However, it is necessary to make stronger assumptions on the curve in order to conclude that this is an equality.

Using Hausdorff measures with varying $\alpha$, we can give the definition of Hausdorff dimension. It is based on the following simple lemma.


a. Show that there exist constants $c_1$ and $c_2$ such that $\mu(A) \le c_1 \mu_n(A)$ and $\mu_n(A) \le c_2 \mu(A)$ for all Borel sets $A$.

b. Show that there exists a constant $c$ such that $\mu(R) = c \mu_n(R)$ for all rectangles $R$.

10. *Let $\mu_n$ be $n$-dimensional Hausdorff measure on $\mathbb{R}^n$, and let $\mu$ be Lebesgue measure on $\mathbb{R}^n$.

9. Let $C$ be the Cantor set and $\alpha = \log 2 / \log 3$. Let $\mu$ be the restriction of $\mu_\alpha$ to $C$, so $\mu(A) = \mu_\alpha(A \cap C)$. Show that $\mu(A) \le |A|^\alpha$ (where $|A|$ denotes the diameter of $A$).

8. *Show that $\mu_\alpha(C) = 1$ for the Cantor set $C$ and $\alpha = \log 2 / \log 3$. (Hint: if $I$ is an interval that has the form $I = I_1 \cup I_2 \cup I_3$ where $I_2$ is a deleted interval in the construction of $C$ and $I_1$ and $I_3$ are adjacent to $I_2$ and of smaller length, then $|I_1|^\alpha + |I_3|^\alpha \le |I|^\alpha$.)

7. Show that $\mu_\alpha(\delta_t A) = t^\alpha \mu_\alpha(A)$ for Borel sets in $\mathbb{R}^n$.

6. Show that $\mu_\alpha$ on $\mathbb{R}^n$ is translation invariant, $\mu_\alpha(A + y) = \mu_\alpha(A)$ where $A + y = \{ x + y : x \text{ is in } A \}$.

5. Show that $\mu_1$ on $\mathbb{R}$ equals Lebesgue measure.

4. Show that $\mu_0$ is counting measure on any metric space.

3. Verify that $\mu_\alpha$ is an outer measure.

2. a. Prove that $\mu_\alpha(A) = 0$ for every Borel set $A$ in $\mathbb{R}$ if $\alpha > 1$.

b. Prove the same for every Borel set $A$ in $\mathbb{R}^n$ if $\alpha > n$.

1. Verify that Lebesgue measure on the Borel sets in $\mathbb{R}^n$ is an outer measure, and in fact a metric outer measure, so it is indeed a measure by Carathéodory's theorem.

14.2.4 Exercises


14.3 The Integral

14.3.1 Non-negative Measurable Functions

The Lebesgue integral is an absolutely convergent integral; $\int f(x)\,dx$ will be defined if and only if $\int |f(x)|\,dx$ is defined and finite. Therefore we will concentrate first on defining the integral for a non-negative function. Then for real-valued functions we will split the function into the difference of its positive and negative parts, $f = f^+ - f^-$, where $f^+ = \max(0, f)$ and $f^- = \max(0, -f)$, so that $f^+$ and $f^-$ are non-negative, and then define $\int f(x)\,dx = \int f^+(x)\,dx - \int f^-(x)\,dx$. Similarly for complex-valued functions we will integrate the real and imaginary parts separately.

Let $X$ be a set on which we have defined a measure on a $\sigma$-field of subsets $\mathcal{F}$. We will let $\mu$ or $d\mu$ stand for the measure, as is conventional, so we write $\mu(A)$ for the measure of a set and $\int f\,d\mu$ for the integral with respect to the measure. It is standard terminology to refer to the pair $(X, \mathcal{F})$ as a measurable space and the triple $(X, \mathcal{F}, \mu)$ as a measure space, and we will refer to sets in $\mathcal{F}$ as measurable sets. It is usually safe to adopt the attitude that all sets that you will ever encounter are measurable. If you can write down a description of a set, then it will almost always be measurable, and usually it is a routine exercise to verify this. Of course these remarks apply primarily to Lebesgue measure on an interval of the line (or $\mathbb{R}^n$); there are situations in probability theory (as in the definition of conditional probability) where one deliberately takes a very small $\sigma$-field of sets so that not all reasonable sets are measurable.

Now suppose $f$ is a non-negative function on $X$, in other words, $f(x) \ge 0$ for every $x$ in $X$. The range of $f$ is thus $[0, \infty)$, and for convenience we will even allow the possible value of $+\infty$. Now we partition the range $[0, \infty]$. For convenience let $P_n$ denote the specific partition

$$[0, \infty] = [0, 1/2^n) \cup [1/2^n, 2/2^n) \cup \cdots \cup [(2^{2n} - 1)/2^n, 2^n) \cup [2^n, \infty].$$

The reason for this particular choice is that it gives us a sequence of partitions, each being a refinement of the previous ones, such that in the limit every finite piece of the range is cut up into arbitrarily small


pieces. Of course there are other ways to accomplish the same thing. Then we form the Lebesgue approximate sum

$$L(f, P_n) = \sum_{k=0}^{2^{2n}-1} \frac{k}{2^n}\,\mu(\{x : k/2^n \le f(x) < (k+1)/2^n\}) + 2^n\,\mu(\{x : f(x) \ge 2^n\}).$$

Notice that for this to be well defined the sets $\{x : k/2^n \le f(x) < (k+1)/2^n\}$ must be measurable; this will put some sort of restriction on $f$ but of a very weak nature since we believe most sets are measurable. Given that this condition is met, it is clear that $L(f, P_n)$ represents an underapproximation to the integral because we are multiplying the size $\mu(A)$ of each set $A$ by the minimum value that $f$ assumes on $A$, where $A = \{x : k/2^n \le f(x) < (k+1)/2^n\}$ or $\{x : f(x) \ge 2^n\}$. In other words, $A = f^{-1}(B)$ where $B$ is one of the sets in the partition $P_n$. More generally, for any partition $P$ of $[0, \infty]$ into a finite number of intervals we can define $L(f, P) = \sum_P (\inf B)\,\mu(f^{-1}(B))$ (here $\inf B$ means the lower endpoint of the interval $B$). Notice also that as we refine the partition we increase the Lebesgue approximate sum. If, say, a particular $B_0$ in $P$ splits into $B_1' \cup \cdots \cup B_m'$ (disjoint) in a refined partition $P'$, then $(\inf B_0)\,\mu(f^{-1}(B_0)) \le \sum_{k=1}^m (\inf B_k')\,\mu(f^{-1}(B_k'))$ because $\inf B_0 \le \inf B_k'$ for every $k$ and $\mu(f^{-1}(B_0)) = \sum_{k=1}^m \mu(f^{-1}(B_k'))$ by the additivity of the measure. Since we have chosen the particular sequence of partitions $P_n$ so that each one is a refinement of the previous ones, we have a monotone increasing sequence of Lebesgue approximate sums $\{L(f, P_n)\}$ and can define the integral as the limit, $\int f\,d\mu = \lim_{n\to\infty} L(f, P_n)$. This will be a non-negative extended real number (the value $+\infty$ can occur either because $L(f, P_n)$ increases without bound or because $L(f, P_n) = +\infty$ for some $n$). We can be confident that this gives a reasonable definition because the maximum size of the intervals on any fixed bounded region goes to zero as $n \to \infty$ (since we are partitioning the unbounded range $[0, \infty]$ into a finite number of intervals, we must have one infinite interval).

Before going further with the definition of the integral we briefly pause to fill in the technical details of assuring that $\mu(f^{-1}(B))$ is defined. We say that a function $f : X \to \mathbb{R}$ is measurable if $f^{-1}(B)$ is a measurable subset of $X$ whenever $B$ is a Borel subset of $\mathbb{R}$ (recall that the Borel subsets are the $\sigma$-field generated by the intervals). We leave it as an exercise to verify that it suffices to show that $f^{-1}(B)$ is measurable for every interval $B$ (or even every interval $(a, \infty)$, or $[a, \infty)$,


or $(-\infty, a)$, or $(-\infty, a]$) to conclude that $f$ is measurable. The reason for this is that the intervals generate the Borel sets and $f^{-1}$ preserves set-theoretic operations. It is of course simpler to verify that $f^{-1}(B)$ is measurable for every interval $B$ (or every interval of the form $(a, \infty)$, etc.) than for every Borel set $B$, so this remark is quite useful. We expect that every function we will encounter will be measurable, so it will be mostly a technical nuisance to have to verify it. Nevertheless, the Lebesgue theory of integration is restricted to measurable functions only. We can also define measurable functions taking values in the extended reals, by allowing the intervals $B$ to contain $+\infty$ and $-\infty$. For technical reasons it is often convenient to do this, and we will not explicitly distinguish this minor variant.

There is a superficial resemblance between the definition of "measurable" function and one of the forms of the definition of "continuous" function: you need only interchange the words "measurable" and "open" (here we interpret "measurable" for subsets of the range $\mathbb{R}$ to mean "Borel set"). However, because there are so many more measurable sets than open sets (the axioms for a $\sigma$-field are quite generous), there are many more measurable functions than continuous functions. All the usual operations for generating functions preserve measurability; this explains why you won't "meet" a non-measurable function.

Theorem 14.3.1 If $f$ and $g$ are measurable functions on $X$, then $af + bg$, $f \cdot g$, $f/g$ (if $g \ne 0$), $\max(f, g)$, $\min(f, g)$, and $|f|$ are measurable functions. If $h : \mathbb{R} \to \mathbb{R}$ is measurable (with respect to the $\sigma$-field of Borel sets in $\mathbb{R}$), then $h \circ f$ is measurable. If $f_n$ is a sequence of measurable functions on $X$, then $\sup_n f_n$, $\inf_n f_n$, $\limsup f_n$, $\liminf f_n$, and $\lim f_n$ (if it exists pointwise) are measurable functions.

Proof: We give the proof in some of the cases, leaving the others as exercises. Suppose $h = f + g$ and $f : X \to \mathbb{R}$ and $g : X \to \mathbb{R}$ are measurable. Let us define $F_a = \{x : f(x) > a\}$ and similarly for $G_a$ and $H_a$, so $F_a = f^{-1}((a, \infty))$. We know that $F_a$ and $G_a$ are measurable, and we want to show that $H_a$ is measurable. So we ask: how can $h(x)$ be greater than $a$? Clearly $h(x) > a$ if $f(x) > b$ while $g(x) > a - b$, and this must be true for some $b$ if $h(x) > a$. Thus $H_a = \bigcup_b (F_b \cap G_{a-b})$. This shows how the sets $H_a$ can be constructed out of the sets $F_a$ and $G_a$. Unfortunately this construction involves


an uncountable union that threatens to take us out of the $\sigma$-field of measurable sets. However a closer look at the argument suggests a trick to replace the uncountable union over all real $b$ by a countable union. Since $h(x) > a$ means $f(x) > a - g(x)$, and every open interval of real numbers contains a rational number, there must exist a rational number $b$ such that $f(x) > b > a - g(x)$, that is, $f(x) > b$ and $g(x) > a - b$. Thus we have $H_a = \bigcup_{b\ \mathrm{rational}} (F_b \cap G_{a-b})$ and this exhibits $H_a$ as a measurable set. We have used tricks like this before, and we will have to use them again.

Next suppose $f_n : X \to \mathbb{R}$ are measurable functions, and let us show that $\sup_n f_n(x)$ is measurable. This is easy, since $\sup_n f_n(x) > a$ if and only if $f_n(x) > a$ for some $n$, so $\{x : \sup_n f_n(x) > a\} = \bigcup_n \{x : f_n(x) > a\}$ and a countable union of measurable sets is measurable. A similar argument works for $\inf_n f_n$. Then we obtain the result for $\limsup f_n$ and $\liminf f_n$ since $\limsup f_n = \inf_n(\sup_{k \ge n} f_k)$ and $\liminf f_n = \sup_n(\inf_{k \ge n} f_k)$. The same argument works for $\lim f_n$, if it exists pointwise, because then $\lim f_n = \limsup f_n = \liminf f_n$. QED

In order to use the theorem to show that all "usual" functions are measurable, we have to "get started" with some basic measurable functions. If $X$ is $\mathbb{R}$ or an interval (or $\mathbb{R}^n$), then every continuous function is measurable, since $f^{-1}(B)$ for $B$ open is open, and hence, measurable. Also, the characteristic function of a measurable set (the function that is one on the set and zero off it) is measurable. These functions enable us to get started constructing measurable functions. Finite linear combinations of characteristic functions of measurable sets are called simple functions. We write $f = \sum_{k=1}^N a_k\chi_{A_k}$, where $\chi_A$ denotes the characteristic function of $A$. It is easy to see that we may assume that the measurable sets $A_k$ are disjoint, even if they are not originally given disjoint, and the simple functions are exactly the class of measurable functions that assume only a finite set of values. Simple functions play a key role in our development of the integral, and we will need to establish some elementary facts about them along the way.

In terms of the simple functions, we can give a new interpretation to the Lebesgue approximate sums. Let $f$ be a non-negative measurable function, and let $P$ be a finite partition of $[0, \infty]$ into intervals. For each interval $B$ in the partition, $f^{-1}(B)$ is a measurable set and so $\sum_P (\inf B)\chi_{f^{-1}(B)}$ is a simple function, obtained by replacing $f(x)$ by a possibly smaller value depending on which interval of the partition $f(x)$


lies. Then $L(f, P)$ is a kind of primitive integral of this simple function equal to the sum of the "areas" of the "rectangles" (the graphs of the functions $(\inf B)\chi_{f^{-1}(B)}$), which are the products of the height $\inf B$ with the measure $\mu(f^{-1}(B))$ of the "base" $f^{-1}(B)$. The situation is entirely analogous to the interpretation of the Cauchy approximate sums as integrals of step functions (more precisely the Riemann lower sums), but the simple functions are more versatile than the step functions, since the characteristic function of a measurable set is more general than the characteristic function of an interval. In fact, I dare not draw a picture of a simple function for fear of lulling you into thinking it is "simpler" than it might be; the measurable sets on which the function assumes its values might be Cantor sets, or worse.

Now suppose we consider the particular sequence $P_n$ of partitions of $[0, \infty]$ described above, so $P_n$ consists of $[2^n, \infty]$ together with $[0, 2^n)$ chopped up into intervals of length $1/2^n$, as indicated in Figure 14.3.1.

Figure 14.3.1: [number line for the partition $P_n$, with tick marks at $0, 1/2^n, 2/2^n, \ldots, 2^n$]

Let $f_n = \sum_{P_n} (\inf B)\chi_{f^{-1}(B)}$ be the associated simple function. Then $\{f_n\}$ is monotone increasing and has limit equal to $f$ pointwise. This is clear if we consider an individual point $x$ and the value $f(x)$. For each $n$ we locate the value $f(x)$ in one of the intervals of $P_n$, and then $f_n(x)$ is the inf of that interval. This value clearly increases as the partition is refined and approaches $f(x)$ since the size of the intervals is $1/2^n$ (once $f(x) < 2^n$). In fact it is even true if $f$ is allowed to assume the value $+\infty$, for then $f_n(x) = 2^n$ where $f(x) = +\infty$.

We have thus shown that every non-negative measurable function is obtainable as the pointwise limit of simple functions (it is then straightforward to obtain the same result for real-valued measurable functions). This justifies thinking of simple functions as the basic building blocks out of which general measurable functions are constructed. Actually we have proved a slightly stronger statement: every non-negative measurable function is the pointwise limit of a monotone increasing sequence of non-negative simple functions. Having the sequence monotone increasing may seem like only a minor improvement, but it turns out to be crucial for the theory of the integral. To understand its significance we need to look at some disturbing examples:

1. Let $X = [0, 1]$ with Lebesgue measure and set $f_n = n\chi_{(0,1/n)}$. Then $\lim f_n(x) = 0$ pointwise, but the integral $n\mu(0, 1/n) = 1$ does not approach zero. If instead we take $f_n = n^2\chi_{(0,1/n)}$, then still $\lim f_n(x) = 0$ but now the integral $n^2\mu(0, 1/n) = n$ doesn't even have a limit.

2. Let $X = [0, \infty)$ with Lebesgue measure and set $f_n = (1/n)\chi_{(0,n)}$. Again $\lim f_n(x) = 0$, but the integrals $(1/n)\mu(0, n) = 1$ do not tend to zero.

Of course the limits are not monotone in these examples. The point is that without monotonicity, or at least some such restriction, there need be no relationship between the integral of $f$ and the limit of the integrals of $f_n$, where $f_n$ are simple functions such that $\lim f_n(x) = f(x)$ pointwise.

14.3.2 The Monotone Convergence Theorem

We start with the definition of the integral: If $f = \sum_{k=1}^N a_k\chi_{A_k}$ is a non-negative simple function we define $\int f\,d\mu = \sum_{k=1}^N a_k\mu(A_k)$. It is a simple exercise to verify that this definition is independent of the representation: if $\sum a_k\chi_{A_k} = \sum b_j\chi_{B_j}$, then $\sum a_k\mu(A_k) = \sum b_j\mu(B_j)$. Next if $f$ is a non-negative measurable function we define $\int f\,d\mu = \lim_{n\to\infty} L(f, P_n) = \lim_{n\to\infty} \int f_n\,d\mu$ where $f_n = \sum_{P_n} (\inf B)\chi_{f^{-1}(B)}$. It is again easy to verify that this is consistent with the special case definition: if $f$ is simple, say $f = \sum_{k=1}^N a_k\chi_{A_k}$, then $\lim_{n\to\infty} L(f, P_n) = \sum_{k=1}^N a_k\mu(A_k)$. To see this assume, as we may, that the sets $A_k$ are disjoint and the values $a_k$ distinct. Then for $n$ large enough the values $a_k$ will all be less than $2^n$ and fall into distinct intervals of length $1/2^n$ of $P_n$, so $L(f, P_n) = \sum_{k=1}^N b_k\mu(A_k)$ where $|b_k - a_k| \le 1/2^n$ and so $\lim_{n\to\infty} L(f, P_n) = \sum_{k=1}^N a_k\mu(A_k)$ (note that both sides are $+\infty$ if $\mu(A_k) = +\infty$ for some $k$).

This definition is extremely simple, but it has the defect that it appears to depend on the particular choice of the partitions $P_n$. If this were really the case, it would not be worth very much. Thus our


next goal is to show that the same value for the integral would be obtained from other sequences of partitions. Recall that the sequence $f_n = \sum_{P_n} (\inf B)\chi_{f^{-1}(B)}$ of simple functions associated with the partitions $P_n$ had the property that it was monotone increasing and had limit $f$. Clearly the same will be true if we take any other sequence of partitions such that each is a refinement of the previous one (to get monotonicity) and the maximum length of the subintervals on any bounded region goes to zero (to get the limit). Thus it suffices to show that $\lim_{n\to\infty} \int f_n\,d\mu = \int f\,d\mu$ if $f_n$ is any monotone increasing sequence of non-negative simple functions converging to $f$. It turns out that the same is true even if the $f_n$ are not assumed to be simple. This is the famous Lebesgue monotone convergence theorem. Since the proof is not much harder, we will go directly to the general case.

Theorem 14.3.2 (Monotone Convergence Theorem) Let $0 \le f_1 \le f_2 \le \cdots$ be a monotone increasing sequence of non-negative measurable functions, and let $f = \lim_{k\to\infty} f_k$. Then $\int f\,d\mu = \lim_{k\to\infty} \int f_k\,d\mu$ (both sides may be equal to $+\infty$).

Proof: Since we have defined $\int f\,d\mu = \lim_{n\to\infty} L(f, P_n)$ and $\int f_k\,d\mu = \lim_{n\to\infty} L(f_k, P_n)$, we need to show $\lim_{k\to\infty} \lim_{n\to\infty} L(f_k, P_n) = \lim_{n\to\infty} L(\lim_{k\to\infty} f_k, P_n)$. Now we claim that the inequality $\lim_{k\to\infty} \int f_k\,d\mu \le \int f\,d\mu$ is easy to obtain from the monotonicity. To prove it we observe that since $f_k \le f$, we have $L(f_k, P_n) \le L(f, P_n)$. This follows from the fact that the simple functions $f_k^{(n)} = \sum_{P_n} (\inf B)\chi_{f_k^{-1}(B)}$ and $f^{(n)} = \sum_{P_n} (\inf B)\chi_{f^{-1}(B)}$ (whose integrals give $L(f_k, P_n)$ and $L(f, P_n)$) satisfy $f_k^{(n)} \le f^{(n)}$ and the integral for simple functions is clearly monotone: $f \le g$ implies $\int f\,d\mu \le \int g\,d\mu$ (see exercises). From $L(f_k, P_n) \le L(f, P_n)$ we obtain $\int f_k\,d\mu \le \int f\,d\mu$ by letting $n \to \infty$, and then by letting $k \to \infty$ we obtain $\lim_{k\to\infty} \int f_k\,d\mu \le \int f\,d\mu$.

Now we work on getting the reverse inequality. Note that it suffices to show $\lim_{k\to\infty} \int f_k\,d\mu \ge L(f, P_n)$ for all $n$. Now $L(f, P_n) = \int f^{(n)}\,d\mu$ where $f^{(n)}$ is a simple function, and $\lim_{k\to\infty} f_k = f \ge f^{(n)}$. To simplify notation set $f^{(n)} = g$. Thus to complete the proof we need to show that if $g$ is any simple function such that $\lim_{k\to\infty} f_k \ge g$, then $\lim_{k\to\infty} \int f_k\,d\mu \ge \int g\,d\mu$.

Write $g = \sum_{j=1}^N b_j\chi_{B_j}$ where the sets $B_j$ are disjoint. We then restrict the functions $f_k$ to the sets $B_j$ by multiplying by $\chi_{B_j}$, and observe


that $f_k \ge \sum_{j=1}^N f_k\chi_{B_j}$ by disjointness, so $\int f_k\,d\mu \ge \sum_{j=1}^N \int f_k\chi_{B_j}\,d\mu$. If we can prove $\lim_{k\to\infty} \int f_k\chi_{B_j}\,d\mu \ge b_j\mu(B_j)$ then we will have $\lim_{k\to\infty} \int f_k\,d\mu \ge \int g\,d\mu$ by adding over $j$. In other words, without loss of generality we can assume $N = 1$ and so can drop subscripts and write $g = b\chi_B$. We need to show that $\lim_{k\to\infty} f_k \ge b\chi_B$ implies $\lim_{k\to\infty} \int f_k\,d\mu \ge b\mu(B)$.

Finally we can complete the proof of the theorem by appealing to the continuity from below for the measure. Suppose $b$ and $\mu(B)$ are finite (we leave as an exercise the simple modifications necessary if either is $+\infty$). Fix an error $\varepsilon > 0$ and look at the set $E_k$ where $f_k(x) \ge b - \varepsilon$. Since $\lim_{k\to\infty} f_k(x) \ge b$ on $B$, it follows that $\bigcup_{k=1}^\infty E_k \supseteq B$. Since $f_k$ is increasing, so are the sets $E_k$; hence $\lim_{k\to\infty} \mu(E_k) \ge \mu(B)$ by continuity from below. But since $f_k(x) \ge b - \varepsilon$ on $E_k$, we have $\int f_k\,d\mu \ge (b - \varepsilon)\mu(E_k)$, so $\lim_{k\to\infty} \int f_k\,d\mu \ge (b - \varepsilon)\lim_{k\to\infty} \mu(E_k) \ge (b - \varepsilon)\mu(B)$. Since this is true for every error $\varepsilon$, we have $\lim_{k\to\infty} \int f_k\,d\mu \ge b\mu(B)$ as desired. QED

With the aid of the monotone convergence theorem we can deduce the elementary properties of the integral of non-negative measurable functions quite easily from the corresponding properties of the integral of non-negative simple functions. For example, $\int (f + g)\,d\mu = \int f\,d\mu + \int g\,d\mu$ if $f$ and $g$ are non-negative simple functions. Then if $f$ and $g$ are merely non-negative measurable functions, let $\{f_n\}$ and $\{g_n\}$ be monotone increasing sequences of simple functions approximating $f$ and $g$. Then $\{f_n + g_n\}$ is a monotone increasing sequence of simple functions approximating $f + g$, so from $\int (f_n + g_n)\,d\mu = \int f_n\,d\mu + \int g_n\,d\mu$ we obtain $\int (f + g)\,d\mu = \int f\,d\mu + \int g\,d\mu$ by passing to the limit. In a similar way we can prove the following theorem.

Theorem 14.3.3 The integral of non-negative measurable functions is

a. linear, $\int (af + bg)\,d\mu = a\int f\,d\mu + b\int g\,d\mu$ if $a$ and $b$ are non-negative reals;

b. monotone, $\int f\,d\mu \le \int g\,d\mu$ if $f \le g$;

c. additive, $\int_{A \cup B} f\,d\mu = \int_A f\,d\mu + \int_B f\,d\mu$ for $A$ and $B$ disjoint measurable sets, where $\int_A f\,d\mu$ denotes $\int f\chi_A\,d\mu$.
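The Lebesgue approximate sums $L(f, P_n)$ that enter the definition of the integral can be computed directly when the measure is a finite collection of point masses. The following Python sketch is only an illustration under that assumption; the names `dyadic_floor` and `lebesgue_sum`, the weights, and the sample function are all hypothetical. It shows the sums increasing with $n$ toward the integral $\sum_x f(x)\mu(\{x\})$, as the monotone convergence theorem leads us to expect.

```python
from fractions import Fraction
from math import floor

def dyadic_floor(value, n):
    """Lower endpoint of the interval of P_n that contains `value`."""
    top = Fraction(2) ** n
    if value >= top:                      # value lies in [2^n, infinity]
        return top
    return Fraction(floor(value * 2 ** n), 2 ** n)

def lebesgue_sum(f, weights, n):
    """L(f, P_n) for the measure that gives the point x the mass weights[x]."""
    return sum(dyadic_floor(f[x], n) * weights[x] for x in f)

# A hypothetical three-point measure space and a non-negative function on it.
weights = {"a": Fraction(1, 2), "b": Fraction(2), "c": Fraction(1, 3)}
f = {"a": Fraction(7, 3), "b": Fraction(1, 5), "c": Fraction(10)}

sums = [lebesgue_sum(f, weights, n) for n in range(8)]
exact = sum(f[x] * weights[x] for x in f)   # integral of f for this measure

for n, s in enumerate(sums):
    print(n, s, float(exact - s))
```

Exact rational arithmetic is used so that the monotonicity of the sums is not obscured by rounding. Once $2^n$ exceeds every value of $f$, the gap between $L(f, P_n)$ and the integral is at most $\mu(X)/2^n$.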


Note that if $\mu$ is Lebesgue measure and $A$ is an interval $(a, b)$, then $\int_A f\,d\mu$ plays the role of $\int_a^b f(x)\,dx$ in the Riemann theory of integration. In fact it is easy to see that if $f$ is Riemann integrable, then the two must be equal. Indeed if $f$ is Riemann integrable the Riemann lower and upper sums for any partition of $(a, b)$ are integrals of step functions $g$ and $h$ such that $g \le f \le h$ on $(a, b)$ and the step functions are just simple functions constant on intervals. Thus $\int g\,d\mu \le \int f\chi_{(a,b)}\,d\mu \le \int h\,d\mu$ by the monotonicity of the Lebesgue integral; hence, $\int f\chi_{(a,b)}\,d\mu = \int_a^b f(x)\,dx$ (of course a separate, technical argument must first be given to show that if $f$ is Riemann integrable then $f$ is measurable so $\int f\chi_{(a,b)}\,d\mu$ is defined). Since the Riemann and Lebesgue integrals agree on their common domain of definition, we will not insist on separate notation and will write $\int_a^b f(x)\,dx$ for the Lebesgue integral as well.

The monotone convergence theorem gives us a criterion for interchanging limits and integrals for non-negative functions, and we have seen some examples in which the interchange is not valid. One feature of these examples is that they both result in a "loss of mass" in passing to the limit: the limiting function has a smaller integral than the limit of the integrals. It turns out that this is always the case; it is impossible to gain mass by passing to a limit. This is a special case of a famous theorem of Fatou.

Theorem 14.3.4 (Fatou's Theorem) Let $f = \lim_{n\to\infty} f_n$ where $f_n$ are non-negative measurable functions. If $\lim_{n\to\infty} \int f_n\,d\mu$ exists, then $\int f\,d\mu \le \lim_{n\to\infty} \int f_n\,d\mu$.

Proof: The idea of the proof is to replace the sequence $\{f_n\}$ by a monotone increasing sequence $\{g_n\}$ with the same limit. A little thought shows that $g_n = \inf_{k \ge n} f_k$ will do the job. Note that $g_n \le f_n$, so $\int g_n\,d\mu \le \int f_n\,d\mu$ and passing to the limit $\lim_{n\to\infty} \int g_n\,d\mu \le \lim_{n\to\infty} \int f_n\,d\mu$. But $\lim_{n\to\infty} \int g_n\,d\mu = \int f\,d\mu$ by the monotone convergence theorem. QED

Actually the full Fatou's Theorem says $\int \liminf_{n\to\infty} f_n\,d\mu \le \liminf_{n\to\infty} \int f_n\,d\mu$ even when $\lim f_n$ and $\lim \int f_n\,d\mu$ are not assumed to exist, as long as $f_n$ are assumed non-negative. We leave it as an exercise to verify that essentially the same argument works in this case.
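The "loss of mass" can be seen concretely in the example $f_n = n\chi_{(0,1/n)}$ on $[0, 1]$. The short Python sketch below is an illustration only; the function names `f` and `integral_f` and the sample point are assumptions. It evaluates $f_n$ at a fixed point and computes the integrals exactly: at any fixed $x > 0$ the values are eventually zero, while every integral equals 1, so the inequality in Fatou's theorem is strict here ($0 < 1$).

```python
from fractions import Fraction

def f(n, x):
    """f_n = n * (characteristic function of (0, 1/n)), evaluated at x."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_f(n):
    """Integral of the simple function f_n: the value n times the length 1/n."""
    return n * Fraction(1, n)

x = Fraction(1, 100)                          # a fixed sample point
values_at_x = [f(n, x) for n in (1, 50, 99, 100, 101, 1000)]
integrals = [integral_f(n) for n in (1, 10, 1000)]

print(values_at_x)   # the values vanish as soon as 1/n <= x
print(integrals)     # the integrals never change
```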


14.3.3 Integrable Functions

We have now discussed all the important properties of the integral of non-negative measurable functions, and we would like to pass to the general case of measurable real-valued functions. We can always write $f = f^+ - f^-$ and so we will define $\int f\,d\mu = \int f^+\,d\mu - \int f^-\,d\mu$ provided both terms are finite. For non-negative functions we could afford the luxury of allowing a value of $+\infty$ for the integral; this allowed us to define the integral for every measurable non-negative function and actually simplified the statements of the theorems. For real-valued functions, however, we have to avoid dealing with expressions like $\infty - \infty$ and so make the definition that a measurable function is integrable if $f^+$ and $f^-$ both have finite integrals. We then define the integral $\int f\,d\mu = \int f^+\,d\mu - \int f^-\,d\mu$ for integrable functions only. The terminology is slightly confusing in that we have previously defined an integral for all non-negative measurable functions, including those that are not integrable, and we do not wish to recant. It does mean that you will have to adjust to the possibility that a function that isn't integrable may have an integral (but the integral won't be finite).

Notice that $|f| = f^+ + f^-$, so a measurable function $f$ is integrable if and only if $|f|$ is integrable and we have Minkowski's inequality $|\int f\,d\mu| \le \int |f|\,d\mu$. It is in this sense that we say the Lebesgue integral is an absolutely convergent integral. In particular, certain improper Riemann integrals whose convergence depends on cancellation (such as $\int_0^\infty (\sin x/x)\,dx$) are not subsumed in the Lebesgue theory. If $\mu$ denotes counting measure on $\{1, 2, 3, \ldots\}$, then the integrable functions are the absolutely convergent series, with the integral equal to the sum of the series (see exercise set 14.3.5, number 9).

From the properties of the integral of non-negative functions it is simple to deduce the corresponding properties of the integral of real-valued integrable functions. In particular, it is linear (with real coefficients), monotone, and additive. For example, let's show $\int (f + g)\,d\mu = \int f\,d\mu + \int g\,d\mu$. This would follow immediately if we had $(f + g)^+ = f^+ + g^+$, but this need not be the case since the positive and negative parts of $f$ and $g$ can overlap. Nevertheless, from $f = f^+ - f^-$ and $g = g^+ - g^-$ we obtain $f + g = f^+ + g^+ - (f^- + g^-)$. We also have $f + g = (f + g)^+ - (f + g)^-$, so by equating the two we obtain $f^+ + g^+ - (f^- + g^-) = (f + g)^+ - (f + g)^-$, which


we can rewrite as an equality between sums of non-negative functions, $f^+ + g^+ + (f + g)^- = f^- + g^- + (f + g)^+$. We integrate this equality and use the linearity of the integral of non-negative functions to interchange the finite sums and the integral:

$$\int f^+\,d\mu + \int g^+\,d\mu + \int (f + g)^-\,d\mu = \int f^-\,d\mu + \int g^-\,d\mu + \int (f + g)^+\,d\mu.$$

Finally we rewrite this as

$$\int (f + g)^+\,d\mu - \int (f + g)^-\,d\mu = \left(\int f^+\,d\mu - \int f^-\,d\mu\right) + \left(\int g^+\,d\mu - \int g^-\,d\mu\right),$$

which says $\int (f + g)\,d\mu = \int f\,d\mu + \int g\,d\mu$. (A separate argument to show that all these integrals are finite must be given to justify the rearrangement of terms.) The derivation of the other properties is similar, and we leave it to the exercises. We can also define an integral for complex-valued measurable functions by splitting into real and imaginary parts. Again we leave the details as an exercise.

The monotone convergence theorem, on the other hand, does not extend to real-valued functions. Now some sort of interchange of limit and integral is vital to a good theory; essentially the only way to compute the integral of "new" functions $f$ is to write $f = \lim_{k\to\infty} f_k$ where $f_k$ are functions whose integrals we can compute by elementary means and then to compute $\int f\,d\mu = \lim_{k\to\infty} \int f_k\,d\mu$ by quoting the appropriate theorem. Thus we need to find a good substitute for the monotone convergence theorem. Lebesgue's dominated convergence theorem will serve this purpose very well. This theorem is motivated by the following elementary observation: if $g$ is a non-negative integrable function and $f$ is a measurable real-valued function such that $|f| \le g$, then $f$ is integrable. This follows immediately from the two facts $|f| \le g$ implies $\int |f|\,d\mu \le \int g\,d\mu$ and $\int |f|\,d\mu < \infty$ implies $f$ is integrable. If $|f| \le g$ we say that $g$ dominates $f$. It implies that the graph of $f$ lies between the graphs of $+g$ and $-g$, a region (shown in Figure 14.3.2) that has finite "area" since $g$ is integrable. Now if $\{f_n\}$ is a sequence of measurable functions, each of which is dominated by $g$, then the graphs of $f_n$ are all contained in this region of finite area and so it is plausible that there is no room for any mass to leak out when we pass to the limit.
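The rearrangement used to prove linearity for real-valued functions can be checked by hand on a small example. The Python sketch below assumes counting measure on a four-point set, so that integrals are finite sums; the helper names `pos`, `neg`, and `integral` and the sample values are hypothetical. It confirms that $(f + g)^+$ differs from $f^+ + g^+$ when the positive and negative parts overlap, while the identity between sums of non-negative functions still holds and yields additivity.

```python
def pos(v):
    """Positive part max(0, v)."""
    return max(0, v)

def neg(v):
    """Negative part max(0, -v)."""
    return max(0, -v)

def integral(values):
    """Integral with respect to counting measure on a finite set."""
    return sum(values)

f = [3, -2, 5, -1]
g = [-4, 6, 1, -3]
h = [a + b for a, b in zip(f, g)]            # h = f + g

naive = integral(map(pos, f)) + integral(map(pos, g))   # integral of f+ plus integral of g+
actual = integral(map(pos, h))                          # integral of (f + g)+

# Both sides of f+ + g+ + (f+g)- = f- + g- + (f+g)+, integrated.
lhs = integral(map(pos, f)) + integral(map(pos, g)) + integral(map(neg, h))
rhs = integral(map(neg, f)) + integral(map(neg, g)) + integral(map(pos, h))

print(naive, actual)   # these differ because the parts overlap
print(lhs, rhs)        # these agree
print(integral(h), integral(f) + integral(g))
```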


Figure 14.3.2: [the region between the graphs of $+g$ and $-g$]

If we look back at the examples where the limit and the integral cannot be interchanged, we see that the smallest function that dominates all the $f_n$ is essentially $1/x$, and this function just fails to be integrable.

Theorem 14.3.5 (Dominated Convergence Theorem) Let $\{f_n\}$ be a sequence of measurable functions converging pointwise to $f$. If there exists an integrable function $g$ such that $|f_n(x)| \le g(x)$ for all $n$ and $x$, then $f$ is integrable and $\int f\,d\mu = \lim_{n\to\infty} \int f_n\,d\mu$.

Proof: Assume first $\lim_{n\to\infty} \int f_n\,d\mu$ exists. Then $\{g - f_n\}$ and $\{g + f_n\}$ are sequences of non-negative measurable functions converging to $g - f$ and $g + f$. Since $g$ dominates $f$, $f$ is integrable. By Fatou's theorem

$$\int (g \pm f)\,d\mu \le \lim_{n\to\infty} \int (g \pm f_n)\,d\mu = \lim_{n\to\infty} \left(\int g\,d\mu \pm \int f_n\,d\mu\right) = \int g\,d\mu \pm \lim_{n\to\infty} \int f_n\,d\mu$$

and by subtracting the finite term $\int g\,d\mu$ from both sides of the inequality we obtain $\pm\int f\,d\mu \le \pm\lim_{n\to\infty} \int f_n\,d\mu$, which implies the equality.

Without the assumption that $\int f_n\,d\mu$ converges we first pass to a convergent subsequence (the sequence $\{\int f_n\,d\mu\}$ is bounded since $|\int f_n\,d\mu| \le \int |f_n|\,d\mu \le \int g\,d\mu$) and then conclude $\lim \int f_{n'}\,d\mu = \int f\,d\mu$ along the subsequence. Since this is true for any convergent subsequence, we conclude that the whole sequence converges to $\int f\,d\mu$. QED
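The remark that the smallest function dominating all of the $f_n = n\chi_{(0,1/n)}$ "just fails to be integrable" can be made quantitative. On the interval $[1/(n+1), 1/n)$ the supremum $\sup_k f_k$ equals $n$, so that piece contributes $n(1/n - 1/(n+1)) = 1/(n+1)$ to the integral. The Python sketch below (the function name `envelope_integral` is an assumption for illustration) sums these contributions exactly and shows the partial integrals growing like the harmonic series, comparable to $\int_{1/(N+1)}^1 dx/x = \log(N+1)$.

```python
from fractions import Fraction
from math import log

def envelope_integral(N):
    """Integral over [1/(N+1), 1) of sup_k f_k, where f_k = k * chi_(0, 1/k).

    On [1/(n+1), 1/n) the supremum equals n, so each piece contributes
    n * (1/n - 1/(n+1)) = 1/(n+1).
    """
    return sum(n * (Fraction(1, n) - Fraction(1, n + 1)) for n in range(1, N + 1))

for N in (1, 10, 100, 1000):
    print(N, float(envelope_integral(N)), log(N + 1))
```

The partial integrals increase without bound, so no integrable $g$ can dominate the whole sequence, which is consistent with the failure of the interchange in that example.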


One important special case of this theorem is the following: if the measure of the whole space is finite and the sequence $f_n$ is uniformly bounded, then $\lim_{n\to\infty} \int f_n\,d\mu = \int \lim_{n\to\infty} f_n\,d\mu$. In this case we can take $g \equiv M$ where $M = \sup_{n,x} |f_n(x)|$.

14.3.4 Almost Everywhere

Suppose $f$ and $g$ are two measurable functions that are equal except on a set of measure zero, $f(x) = g(x)$ if $x$ is not in $E$ with $\mu(E) = 0$. Then we say $f = g$ almost everywhere, abbreviated a.e. More generally we say of any property that it holds a.e. if it is true for all $x$ except for $x$ in a set of measure zero. Because sets of measure zero are preserved under finite and even countable unions, the a.e. concept is suitably flexible. For instance, if $f_1 = g_1$ a.e. and $f_2 = g_2$ a.e., then $f_1 + f_2 = g_1 + g_2$ a.e.; or if $f_n = g_n$ a.e. for all $n$, then $\lim_{n\to\infty} f_n = \lim_{n\to\infty} g_n$ a.e. Since sets of measure zero cannot contribute to the integral, we expect that functions equal almost everywhere should be more or less interchangeable when integrated.

Theorem 14.3.6 Let $f$ and $g$ be integrable functions. Then $f = g$ a.e. if and only if $\int_A f\,d\mu = \int_A g\,d\mu$ for every measurable set $A$.

Proof: Suppose $f = g$ a.e. Then we can write $f = g + h$ where $h = 0$ a.e. To show $\int f\,d\mu = \int g\,d\mu$ it suffices to show $\int h\,d\mu = 0$. Now $h = 0$ a.e. implies $h^+ = 0$ a.e. and $h^- = 0$ a.e., so it suffices to show $\int h\,d\mu = 0$ if $h = 0$ a.e. and $h$ is non-negative. But $\int h\,d\mu = \lim_{n\to\infty} \int h_n\,d\mu$ for a sequence of simple non-negative functions increasing monotonically to $h$. Since $0 \le h_n \le h$, we have $h_n = 0$ a.e., so $h_n = \sum_{k=1}^N a_k\chi_{A_k}$ with $\mu(A_k) = 0$. It follows that $\int h_n\,d\mu = \sum_{k=1}^N a_k\mu(A_k) = 0$, hence $\int h\,d\mu = 0$.

More generally, if $A$ is any measurable set, then $f = g$ a.e. implies $f\chi_A = g\chi_A$ a.e., so the above argument shows $\int_A f\,d\mu = \int_A g\,d\mu$.

Conversely, suppose $\int_A f\,d\mu = \int_A g\,d\mu$ for every measurable set $A$. Apply this to the measurable set $A_n = \{x : f(x) \le g(x) - 1/n\}$. Then $\int_{A_n} f\,d\mu \le \int_{A_n} (g - 1/n)\,d\mu$ by monotonicity; hence,

$$\int_{A_n} f\,d\mu \le \int_{A_n} g\,d\mu - \frac{1}{n}\mu(A_n),$$


which contradicts $\int_{A_n} f\,d\mu = \int_{A_n} g\,d\mu$ unless $\mu(A_n) = 0$. Now $\mu(\bigcup_{n=1}^\infty A_n) = 0$ by $\sigma$-additivity, so $f(x) \ge g(x)$ a.e. Interchanging $f$ and $g$ gives $g(x) \ge f(x)$ a.e., hence $f(x) = g(x)$ a.e. QED

This result means that whenever we have a theorem about integrals, we can replace any condition that is supposed to hold for every $x$ by the same condition a.e. For example, in the dominated convergence theorem, we need only assume $\lim_{n\to\infty} f_n(x) = f(x)$ a.e. and $|f_n(x)| \le g(x)$ a.e.

The almost everywhere concept has some rather paradoxical consequences. For example, with respect to Lebesgue measure, an individual point, or even a countable set of points, has measure zero. Thus as far as integration theory is concerned, the value of a function $f(x)$ at an individual point $x$ is immaterial. We can always find $g = f$ a.e. that takes a different value at $x$. If we think of all the functions equal to $f$ a.e. as forming an equivalence class, then there is no meaning within this equivalence class of the value at an individual point. Nevertheless, how can we know the function if we don't know its value at any point? There is a resolution to this paradox, but it is beyond the scope of this book. The Lebesgue Differentiation of the Integral Theorem says that the "average value" $\lim_{r\to 0} \frac{1}{2r}\int_{x-r}^{x+r} f(y)\,dy$ exists for almost every $x$, and this provides a "canonical" choice of value for $f(x)$ for such points. The "average values" are unchanged if we change $f$ on a set of measure zero.

14.3.5 Exercises

1. Let $(X, \mathcal{F})$ be a measurable space. Prove that $f : X \to \mathbb{R}$ is a measurable function if and only if $f^{-1}(B)$ is in $\mathcal{F}$ for every set $B$ in a collection $\mathcal{B}$ of sets with the property that the smallest $\sigma$-field containing $\mathcal{B}$ is the $\sigma$-field of Borel sets. Verify that this is the case if $\mathcal{B}$ consists of all intervals $(a, \infty)$.

2. Prove that if $f : X \to \mathbb{R}$ and $g : X \to \mathbb{R}$ are measurable, then $\max(f, g)$ is measurable.

3. Prove that if $f : X \to \mathbb{R}$ is measurable and $h : \mathbb{R} \to \mathbb{R}$ is measurable (with respect to the $\sigma$-field of Borel sets in $\mathbb{R}$), then $h \circ f$ is measurable.


4. Prove that if $\sum_{k=1}^N a_k\chi_{A_k} = \sum_{j=1}^M b_j\chi_{B_j}$ for every $x$, then $\sum_{k=1}^N a_k\mu(A_k) = \sum_{j=1}^M b_j\mu(B_j)$.

5. Prove directly that if $\sum_{k=1}^N a_k\chi_{A_k} \le \sum_{j=1}^M b_j\chi_{B_j}$ for every $x$, then $\sum_{k=1}^N a_k\mu(A_k) \le \sum_{j=1}^M b_j\mu(B_j)$.

6. Write out a complete proof that the integral is linear, monotone, and additive first for non-negative simple functions, then for non-negative measurable functions, and finally for integrable functions.

7. Prove that a Riemann integrable function on a bounded interval is measurable. (Hint: it is the pointwise limit a.e. of the step functions involved in the Riemann upper and lower approximate sums.)

8. Prove that if $\{f_n\}$ is any sequence of non-negative measurable functions, then $\int \liminf_{n\to\infty} f_n\,d\mu \le \liminf_{n\to\infty} \int f_n\,d\mu$.

9. Prove that a series $\sum_{n=1}^\infty a_n$ is absolutely convergent if and only if the function $n \to a_n$ on the positive integers is integrable with respect to counting measure.

10. Prove that $\sum_{n=1}^\infty \int f_n\,d\mu = \int (\sum_{n=1}^\infty f_n)\,d\mu$ if $f_n$ are non-negative measurable functions.

11. Suppose $f_n$ are measurable functions and $\sum_{n=1}^\infty |f_n|$ is integrable. Prove that $\sum_{n=1}^\infty f_n$ is integrable and $\int (\sum_{n=1}^\infty f_n)\,d\mu = \sum_{n=1}^\infty \int f_n\,d\mu$.

12. Restate the results of problems 10 and 11 when $\mu$ is counting measure on the positive integers in terms of doubly indexed infinite series.

13. Let $F = f + ig$ be a complex-valued function. Define it to be integrable if $f$ and $g$ are integrable, and define $\int (f + ig)\,d\mu = \int f\,d\mu + i\int g\,d\mu$. Prove that $F$ is integrable if and only if $f$ and $g$ are measurable and $|F|$ is integrable, and prove Minkowski's inequality $|\int F\,d\mu| \le \int |F|\,d\mu$.

14. Prove that a non-negative measurable function has integral equal to zero if and only if it is zero a.e.


15. Prove that if $f$ is a nonnegative integrable function, then $\nu(A) = \int_A f\,d\mu$ is a measure on the same $\sigma$-field on which $\mu$ is defined.

16. Prove that under the hypotheses of the Dominated Convergence Theorem one has $\lim_{n\to\infty} \int |f_n - f|\,d\mu = 0$.

17. Explain why Fatou's Theorem shows that the "hard" inequality in the proof of the Monotone Convergence Theorem is always valid, even without the assumption of monotonicity. Did the proof of this part of the theorem use the hypothesis of monotonicity? Why can't we use Fatou's theorem to simplify the proof of the Monotone Convergence Theorem?

14.4 The Lebesgue Spaces $L^1$ and $L^2$

14.4.1 $L^1$ as a Banach Space

We have already seen the utility of thinking of a set of functions as forming a "space" on which certain structures are defined. We particularly made use of the space $C(I)$ of continuous functions on a compact interval $I$, which is a vector space and has a natural metric space structure given by the sup-norm $\|f\|_{\sup} = \sup_I |f(x)|$. One of the reasons the space $C(I)$ with this metric is so useful is that it is complete, so we can apply the contractive mapping principle, for example. We observed that there are also other norms on this space, $\|f\|_1 = \int_I |f(x)|\,dx$ and $\|f\|_2 = \left(\int_I |f(x)|^2\,dx\right)^{1/2}$, the 2-norm even being associated with an inner product $\langle f, g\rangle = \int_I f(x)g(x)\,dx$. But the space $C(I)$ is not complete with respect to these norms. In this section we will see how with the aid of Lebesgue integration we can construct complete spaces with these norms, called $L^1(I)$ and $L^2(I)$. These will be concrete realizations of the "completions" of $C(I)$ with respect to these norms (analogous to the real numbers as completion of the rationals). Recall that a complete normed vector space is called a Banach space. Thus we are constructing two more examples of a Banach space. The study of Banach spaces is one of the central topics in twentieth century analysis. We will also give some applications of the space $L^2([-\pi, \pi])$ to Fourier series. This is not surprising if you recall the role of the inner product in defining Fourier coefficients and in the derivation of Parseval's identity.
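The remark above that $C(I)$ is not complete in the 1-norm can be made concrete with a small numerical sketch. The ramp functions $x \mapsto \mathrm{clamp}(nx, -1, 1)$ on $[-1, 1]$ are an illustrative choice that does not appear in the text, and the helper names are hypothetical. The ramps come together in the 1-norm (the exact distance is $|1/n - 1/m|$), yet stay far apart in the sup-norm, and their pointwise limit is a discontinuous step.

```python
# Sketch (illustrative choice of functions, not taken from the text):
# continuous ramps on [-1, 1] that are Cauchy in the 1-norm but not in the sup-norm.

def ramp(n):
    """Continuous function x -> clamp(n * x, -1, 1)."""
    return lambda x: max(-1.0, min(1.0, n * x))

def one_norm_distance(f, g, a=-1.0, b=1.0, steps=20000):
    """Midpoint-rule approximation of the integral of |f - g| over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += abs(f(x) - g(x))
    return total * h

def sup_norm_distance(f, g, a=-1.0, b=1.0, steps=20000):
    """Maximum of |f - g| over a uniform grid on [a, b]."""
    h = (b - a) / steps
    return max(abs(f(a + i * h) - g(a + i * h)) for i in range(steps + 1))

d1 = one_norm_distance(ramp(50), ramp(100))    # exact value: 1/50 - 1/100 = 0.01
dsup = sup_norm_distance(ramp(50), ramp(100))  # exact value: 1 - 50/100 = 0.5
print(d1, dsup)
```

Doubling both indices halves the 1-norm distance while the sup-norm distance between `ramp(n)` and `ramp(2 * n)` stays at 1/2, which is the behavior one expects from a sequence whose 1-norm limit lies outside $C(I)$.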


Let us consider the space of integrable functions on an interval or more generally on a measure space $(X, \mathcal{F}, \mu)$. Recall the definition: these are functions $f : X \to \mathbb{R}$ that are measurable and for which $\int f^+\,d\mu$ and $\int f^-\,d\mu$ are finite. It is also important sometimes to consider complex-valued functions, $F : X \to \mathbb{C}$ with $F = f + ig$ and $f$ and $g$ real-valued integrable functions. Unless we specify otherwise, everything we say pertains to either case, although we will usually discuss the real-valued case for simplicity. Now it is easy to see that the integrable functions form a vector space, and we can define $\|f\|_1 = \int |f|\,d\mu$. However, when we check the definition of the norm, we find one problem: the statement "$\|f\|_1 = 0$ implies $f = 0$" is not correct. In fact we have established that $\int |f|\,d\mu = 0$ if and only if $f = 0$ a.e., and except in certain special cases (such as counting measure), there are functions $f = 0$ a.e. that are not identically zero.

To overcome this difficulty we are forced to consider equivalence classes of functions that are equal a.e. In other words we say $f$ is equivalent to $g$ if $f = g$ a.e. It is easy to see that this is an equivalence relation, so the measurable functions divide into equivalence classes. We then define $L^1(\mu)$ to be the set of equivalence classes of integrable functions. It is easy to show that the vector space structure of functions respects equivalence classes: $af + bg$ is equivalent to $af_1 + bg_1$ if $f$ is equivalent to $f_1$ and $g$ is equivalent to $g_1$ ($af + bg = af_1 + bg_1$ except on the union of the sets where $f \ne f_1$ and $g \ne g_1$, and the union of two sets of measure zero has measure zero). We can then define the norm of an equivalence class to be $\|f\|_1 = \int |f|\,d\mu$ for any $f$ in the equivalence class since the value of $\|f\|_1$ is constant on the equivalence class. We will follow the usual convention of confounding the equivalence class and a representative function $f$ in the equivalence class. That is, when we say "let $f$ be a function in $L^1(\mu)$", we really mean "let $f$ be a representative function in an equivalence class in $L^1(\mu)$", with the understanding that nothing we do to $f$ depends on the particular choice of representative. This convention spares us from overburdening the notation, but it does mean that one has to be careful interpreting statements.

Now it is a straightforward matter to verify that $\|f\|_1$ satisfies the axioms for a norm on $L^1(\mu)$. The triangle inequality comes from integrating the pointwise inequality $|f(x) + g(x)| \le |f(x)| + |g(x)|$, and the homogeneity is obvious. The troublesome positivity condition, $\|f\|_1 = 0$ implies $f = 0$, is built into the definition because now $f = 0$


means $f$ and $0$ are in the same equivalence class, $f = 0$ a.e. We want to prove that $L^1(\mu)$ is complete in this norm. Recall that this means that every Cauchy sequence converges.

Lemma 14.4.1 (Chebyshev's Inequality) Let $f$ be in $L^1(\mu)$, and let $s > 0$. Then $\mu(E_s) \le \|f\|_1/s$ where $E_s = \{x : |f(x)| \ge s\}$.

Proof: $\int_{E_s} |f|\,d\mu \le \int |f|\,d\mu = \|f\|_1$. On the other hand $s \le |f|$ on $E_s$, so $\int_{E_s} s\,d\mu \le \int_{E_s} |f|\,d\mu$ and $\int_{E_s} s\,d\mu = s\mu(E_s)$. QED

Theorem 14.4.1 $L^1(\mu)$ is complete.

Proof: Let $\{f_k\}$ denote a Cauchy sequence in $L^1(\mu)$. As usual it suffices to show that a subsequence converges, for the convergence of the subsequence and the Cauchy criterion will give the convergence of the original sequence. Now we pass to a subsequence that comes together rapidly, say $\|f'_{k+1} - f'_k\|_1 \le 4^{-k}$ (we choose $4^{-k}$ for convenience only; actually we only need a sequence $\varepsilon_k$ with $\sum \varepsilon_k < \infty$). It is a routine matter to obtain such a subsequence: Given $k$, we choose $j(k)$ depending on $k$ so that $j(k) \ge j(k-1)$ and $\|f_{j(k)} - f_m\|_1 \le 4^{-k}$ for all $m \ge j(k)$, such $j(k)$ existing by the Cauchy criterion, and then set $f'_k = f_{j(k)}$. Since $j(k+1) \ge j(k)$, we have $\|f_{j(k)} - f_{j(k+1)}\|_1 \le 4^{-k}$ as desired. For simplicity of notation we denote the subsequence by $\{f_k\}$.

Now the remarkable fact is that for this subsequence we have convergence at almost every point. This fact will enable us to get a hold of the limit function. To verify it we define the sets

$$E_n = \{x : |f_{k+1}(x) - f_k(x)| \ge n2^{-k} \text{ for some } k\}$$

and estimate the measure of $E_n$ by Chebyshev's inequality as

$$\mu(E_n) \le \sum_{k=1}^{\infty} \mu\{x : |f_{k+1}(x) - f_k(x)| \ge n2^{-k}\} \le \sum_{k=1}^{\infty} \frac{1}{n} 2^k \|f_{k+1} - f_k\|_1 \le \sum_{k=1}^{\infty} \frac{1}{n} 2^k \cdot 4^{-k} = \frac{1}{n}.$$


The measure of $E_n$ is small, but on the complement of $E_n$ the sequence $\{f_k\}$ clearly converges pointwise by comparison; write $g_k = f_k - f_{k-1}$ for $k \ge 2$, $g_1 = f_1$, so $f_k = \sum_{j=1}^{k} g_j$ and $|g_k(x)| \le n2^{-k}$ for every $k$ on the complement of $E_n$ and $\sum_k n2^{-k}$ converges. Thus $\{f_k\}$ converges pointwise on the complement of $\bigcap_{n=1}^{\infty} E_n$, which has measure zero.

We may now set $f(x) = \lim_{k\to\infty} f_k(x) = \sum_{k=1}^{\infty} g_k(x)$ on the set where the limit exists and set $f(x) = 0$ on the set of measure zero where the limit fails to exist. By redefining $f_k(x)$ to be zero also on this set of measure zero (this just means choosing a different representative of the equivalence class) we have $f(x) = \lim_{k\to\infty} f_k(x)$ at every point, so $f$ is a measurable function. We need to show that $f$ is integrable and $\lim_{k\to\infty} \|f_k - f\|_1 = 0$ to complete the proof. To do this we will need both the monotone and dominated convergence theorems. We first consider $g = \sum_{k=1}^{\infty} |g_k|$, which converges a.e. by the above argument. This will essentially be our dominator. Notice that since $\|g_k\|_1 \le 4^{-k}$ for $k = 2, 3, \ldots$ and the $|g_k|$ are non-negative measurable functions, we can apply the monotone convergence theorem to conclude

$$\int g\,d\mu = \sum_{k=1}^{\infty} \int |g_k|\,d\mu = \sum_{k=1}^{\infty} \|g_k\|_1 < \infty,$$

so $g$ is integrable. This shows that $f$ is integrable also since $|f| = \left|\sum_{k=1}^{\infty} g_k\right| \le g$. Finally

$$|f_k - f| \le |f_k| + |f| = \left|\sum_{j=1}^{k} g_j\right| + |f| \le \left(\sum_{j=1}^{k} |g_j|\right) + |f| \le 2|g|,$$

so we may apply the dominated convergence theorem to the sequence $\{|f_k - f|\}$. Since $\lim_{k\to\infty} |f_k - f| = 0$ a.e. we have $\lim_{k\to\infty} \int |f_k - f|\,d\mu = \int 0\,d\mu = 0$ as desired. QED

14.4.2 $L^2$ as a Hilbert Space

Although the space $L^1(\mu)$ of integrable functions is very natural to consider, it turns out for technical reasons to be less useful than the space $L^2(\mu)$ of measurable functions for which $|f|^2$ is integrable. Our discussion of $L^2(\mu)$ follows the same outline as before. We will again be dealing with equivalence classes of functions, although we will not indicate this in the notation. In the case of $L^2(\mu)$, however, we encounter


a new difficulty: it is by no means obvious that $L^2(\mu)$ forms a vector space. If $|f|^2$ and $|g|^2$ are integrable, why is $|f + g|^2$ integrable? Taking the real case for simplicity, we can write $(f + g)^2 = f^2 + 2fg + g^2$, so we need to show that $f \cdot g$ is integrable. If this were the case, then we could define an inner product $\langle f, g\rangle = \int fg\,d\mu$ (put $\bar{g}$ in the complex case), and the norm $\|f\|_2 = \left(\int |f|^2\,d\mu\right)^{1/2}$ would be associated to this inner product. Furthermore, the Cauchy-Schwarz inequality would say

$$\left|\int fg\,d\mu\right| \le \|f\|_2 \|g\|_2,$$

which would in turn imply that $fg$ is integrable since $|f|$ and $|g|$ have the same norm as $f$ and $g$. We seem to be caught in circular reasoning: $f \cdot g$ integrable implies the Cauchy-Schwarz inequality, which implies $f \cdot g$ integrable. The way out of this circle is first to consider simple functions where the integrability of $fg$ is obvious.

Theorem 14.4.2 (Cauchy-Schwarz Inequality) If $|f|^2$ and $|g|^2$ are integrable, then $fg$ is integrable and

$$\left|\int fg\,d\mu\right| \le \left(\int |f|^2\,d\mu\right)^{1/2} \left(\int |g|^2\,d\mu\right)^{1/2}.$$

Proof: Let $V$ denote the vector space of (equivalence classes of) simple functions $\sum_{k=1}^{N} a_k\chi_{A_k}$ where $\mu(A_k) < \infty$ for each $k$. It is straightforward to verify that $V$ is a vector space and $\langle f, g\rangle = \int fg\,d\mu$ defines an inner product on $V$. Thus we have the Cauchy-Schwarz inequality $|\langle f, g\rangle| \le \|f\|_2 \|g\|_2$ for $f$ and $g$ in $V$, and replacing $f$ and $g$ by $|f|$ and $|g|$ we obtain $\int |fg|\,d\mu \le \|f\|_2 \|g\|_2$ on $V$.

Now let $f$ and $g$ belong to $L^2(\mu)$. Then $fg$ is measurable, so we need to show $\int |fg|\,d\mu$ is finite. But if $\{f_n\}$ and $\{g_n\}$ are sequences of non-negative simple functions increasing monotonically to $|f|$ and $|g|$, then $\{f_n g_n\}$ is a sequence of non-negative simple functions increasing monotonically to $|fg|$. Furthermore it is easy to see that $f_n$ and $g_n$ must be in $V$. If say $f_n = \sum_{k=1}^{N} a_k\chi_{A_k}$ with all $a_k \ne 0$, then $f_n^2 = \sum_{k=1}^{N} a_k^2\chi_{A_k}$ and $f_n^2 \le |f|^2$ implies $\int f_n^2\,d\mu \le \int |f|^2\,d\mu < \infty$, so $\mu(A_k) < \infty$ for all $k$. Thus

$$\int |f_n g_n|\,d\mu \le \|f_n\|_2 \|g_n\|_2 \le \|f\|_2 \|g\|_2$$


and taking the limit as $n \to \infty$ gives $\int |fg|\,d\mu \le \|f\|_2 \|g\|_2$, showing that $fg$ is integrable. QED

It is now a straightforward matter to verify that $L^2(\mu)$ forms a vector space and $\langle f, g\rangle = \int fg\,d\mu$ defines an inner product on $L^2(\mu)$. What is the relationship between functions in $L^1(\mu)$ and $L^2(\mu)$? If the measure of the whole space $\mu(X)$ is finite, then we have the containment $L^2(\mu) \subseteq L^1(\mu)$. Indeed taking $g \equiv 1$ in the Cauchy-Schwarz inequality we obtain

$$\int |f|\,d\mu \le \mu(X)^{1/2} \|f\|_2.$$

On the other hand, if $\mu(X) = +\infty$, then there are functions in $L^2(\mu)$ that are not in $L^1(\mu)$. Looking at Lebesgue measure on the line, we can say roughly speaking that to be in $L^2(\mu)$ is a more restrictive condition concerning local behavior of singularities but a less restrictive condition concerning decay at infinity. Thus if $f(x) = (1 + |x|)^{-a}$, then $f$ is in $L^2$ if and only if $a > 1/2$ while $f$ is in $L^1$ if and only if $a > 1$. On the other hand if $f(x) = |x|^{-a}\chi_{|x| \le 1}$, then $f$ is in $L^2$ if and only if $a < 1/2$ while $f$ is in $L^1$ if and only if $a < 1$. Finally, if $\mu$ is counting measure, the containment is reversed: $L^1$ is contained in $L^2$ (the usual notation for counting measure is $\ell^1$ and $\ell^2$).

Theorem 14.4.3 $L^2(\mu)$ is complete.

Proof: The proof follows the same pattern as the proof of the completeness of $L^1(\mu)$. We take a Cauchy sequence $\{f_k\}$ in $L^2(\mu)$, pass to a subsequence such that $\|g_k\|_2 \le 4^{-k}$ where $g_k = f_k - f_{k-1}$, and then show that $f_k = \sum_{j=1}^{k} g_j$ converges a.e. The only difference in the proof is that we use the $L^2$ Chebyshev inequality $\mu\{x : |f(x)| \ge s\} \le (1/s^2)\|f\|_2^2$, which follows by the same reasoning.

We then set $f = \sum_{j=1}^{\infty} g_j$ and $g = \sum_{j=1}^{\infty} |g_j|$ so that $|f|^2 \le |g|^2$, and we prove $|g|^2$ is integrable by the monotone convergence theorem (here we estimate $\left\|\sum_{j=1}^{k} |g_j|\right\|_2 \le \sum_{j=1}^{k} \|g_j\|_2$ by the triangle inequality for the norm). This shows that $f$ is in $L^2(\mu)$ and then $|2g|^2$ serves to dominate the sequence $|f - f_k|^2$, so $\lim_{k\to\infty} \int |f - f_k|^2\,d\mu = 0$ by the dominated convergence theorem; hence $f_k \to f$ in the metric. QED
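As a finite sanity check, the Cauchy-Schwarz inequality and the comparison $\|f\|_1 \le \mu(X)^{1/2}\|f\|_2$ for a finite measure can be tested on a measure made of finitely many point masses. The sketch below assumes illustrative masses and function values that are not from the text, and `lp_norm` is a hypothetical helper name. The last check uses counting measure, where the inclusion runs the other way.

```python
import math

def lp_norm(values, weights, p):
    """(sum_i |f_i|^p * w_i)^(1/p): the L^p norm for a measure with point masses w_i."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1.0 / p)

# A finite measure space: four points with assumed, illustrative masses.
weights = [0.5, 1.5, 2.0, 0.25]
f = [3.0, -1.0, 0.5, 4.0]
g = [1.0, 2.0, -2.0, 0.5]

total_mass = sum(weights)  # mu(X)
inner = sum(a * b * w for a, b, w in zip(f, g, weights))  # <f, g> in the real case

# Cauchy-Schwarz: |<f, g>| <= ||f||_2 ||g||_2
cauchy_schwarz_ok = abs(inner) <= lp_norm(f, weights, 2) * lp_norm(g, weights, 2)

# Taking g = 1: ||f||_1 <= mu(X)^(1/2) ||f||_2, so L^2 sits inside L^1 here.
containment_ok = lp_norm(f, weights, 1) <= math.sqrt(total_mass) * lp_norm(f, weights, 2)

# Counting measure (all masses equal to 1): the inclusion reverses, ||a||_2 <= ||a||_1.
a = [1.0 / (n * n) for n in range(1, 200)]
ones = [1.0] * len(a)
reverse_ok = lp_norm(a, ones, 2) <= lp_norm(a, ones, 1)

print(cauchy_schwarz_ok, containment_ok, reverse_ok)
```

A finite example cannot exhibit a function that is in one space and not the other; it only checks the inequalities that drive the two containments.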


A complete inner product space is called a Hilbert space. $L^2(\mu)$ is a typical example of a Hilbert space. This is the setting for quantum mechanics, where $\mu$ is Lebesgue measure on some Euclidean space (or more generally the classical configuration space of the system being studied).

14.4.3 Fourier Series for $L^2$ Functions

We now consider the case where $X = [-\pi, \pi]$ and $\mu$ is Lebesgue measure, and we study Fourier series for functions in $L^2$. We observe that the definition of the Fourier coefficients

$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$$

makes sense for any $f$ in $L^1$ if we interpret the integral as a Lebesgue integral. Indeed if $f(x)$ is integrable, then $f(x)e^{-inx}$ will also be integrable since it is the product of measurable functions, hence measurable, and $|f(x)e^{-inx}| = |f(x)|$, so

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)e^{-inx}|\,dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|\,dx = \frac{1}{2\pi} \|f\|_1 < \infty.$$

We can then at least write the formal Fourier series $\sum_{-\infty}^{\infty} c_n e^{inx}$ and ask if it converges to $f$ in any sense. Unfortunately there is no good answer to this question in $L^1$, so we restrict attention to $L^2$ functions (recall that $L^2 \subseteq L^1$ here because $\mu(X) = 2\pi$ is finite). Now for continuous functions we have established Parseval's identity

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx = \sum_{-\infty}^{\infty} |c_n|^2$$

and the mean convergence

$$\lim_{N\to\infty} \int_{-\pi}^{\pi} \left|f(x) - \sum_{n=-N}^{N} c_n e^{inx}\right|^2 dx = 0.$$

It would seem plausible that this would continue to hold for $f$ in $L^2$. We will show that this is in fact the case and, furthermore, that every choice of Fourier coefficients for which $\sum |c_n|^2$ is finite corresponds to


some $L^2$ function. What this means is that the Fourier coefficients give a one-to-one correspondence between $L^2$ functions and $\ell^2$ sequences $\{c_n\}$ such that $\sum_{-\infty}^{\infty} |c_n|^2 < \infty$. Furthermore this correspondence is linear and isometric so that in fact it shows the Hilbert space structure of $L^2(-\pi, \pi)$ and $\ell^2$ are isomorphic.

We begin by studying the correspondence in the direction $\{c_n\} \to f$.

Theorem 14.4.4 (Riesz-Fischer) Let complex coefficients $c_n$ be given with $\sum_{-\infty}^{\infty} |c_n|^2 < \infty$. Then there exists complex $f$ in $L^2(-\pi, \pi)$ with $c_n = (1/2\pi) \int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$ such that $(1/2\pi) \int_{-\pi}^{\pi} |f(x)|^2\,dx = \sum_{-\infty}^{\infty} |c_n|^2$ and $\lim_{N\to\infty} \left\|f - \sum_{-N}^{N} c_n e^{inx}\right\|_2 = 0$.

Proof: Let $f_N = \sum_{n=-N}^{N} c_n e^{inx}$. Because of the orthogonality of the functions $e^{inx}$ with respect to the inner product we have

$$\|f_N - f_M\|_2^2 = \left\|\sum_{N < |n| \le M} c_n e^{inx}\right\|_2^2 = 2\pi \sum_{N < |n| \le M} |c_n|^2$$

if $N < M$ (the factor $2\pi$ appears here and not in Chapter 12 because we have not divided by $2\pi$ in defining the inner product). From this and the fact that $\sum_{-\infty}^{\infty} |c_n|^2 < \infty$ it follows easily that $f_N$ is a Cauchy sequence in $L^2$. By the completeness of $L^2$ it converges to some $f$ in $L^2$ in the norm, $\lim_{N\to\infty} \|f - f_N\|_2 = 0$. Since $f - f_N$ and $f_N$ are orthogonal, we obtain

$$\frac{1}{2\pi} \|f\|_2^2 = \lim_{N\to\infty} \frac{1}{2\pi} \left(\|f_N\|_2^2 + \|f - f_N\|_2^2\right) = \sum_{-\infty}^{\infty} |c_n|^2.$$

Finally

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)e^{-inx}\,dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} f_N(x)e^{-inx}\,dx + \frac{1}{2\pi} \int_{-\pi}^{\pi} (f(x) - f_N(x))e^{-inx}\,dx = c_n + \frac{1}{2\pi} \int_{-\pi}^{\pi} (f(x) - f_N(x))e^{-inx}\,dx$$


for any $N \ge n$. But

$$\left|\frac{1}{2\pi} \int_{-\pi}^{\pi} (f(x) - f_N(x))e^{-inx}\,dx\right| \le \frac{1}{2\pi} \left(\int_{-\pi}^{\pi} |f(x) - f_N(x)|^2\,dx\right)^{1/2} \left(\int_{-\pi}^{\pi} 1^2\,dx\right)^{1/2} = \frac{1}{\sqrt{2\pi}} \|f - f_N\|_2 \to 0 \text{ as } N \to \infty,$$

so $c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$. QED

For the converse result we will have to work harder, because it essentially depends on the fact that we haven't left out any functions (note that the Riesz-Fischer theorem remains true if we omit some of the exponentials $e^{inx}$). Now we have already verified this completeness of the exponentials $e^{inx}$ in the context of continuous functions, but there remains the possibility that by enlarging to the class $L^2$ we have created some new functions that are orthogonal to all the exponentials. The reason this is not the case is that the continuous functions are already dense in $L^2$. Therefore we first must prove this density result. Incidentally, the analogous statement is true for $L^1$, and the proof is essentially the same.

Theorem 14.4.5 (Density of Continuous Functions) Let $f$ be in $L^2(-\pi, \pi)$. Then there exists a sequence of continuous functions $f_n$ (with $f_n(-\pi) = f_n(\pi)$, so $f_n$ extends to a continuous periodic function) such that $\|f_n - f\|_2 \to 0$ as $n \to \infty$.

Proof: It clearly suffices to do this for non-negative $f$ (approximate $f^+$ and $f^-$ separately in the real case), and from the definition of $\int f^2\,d\mu$ it is clear that there exists a sequence of simple functions $f_n$ such that $\|f - f_n\|_2 \to 0$. Thus it suffices to show that a simple function can be approximated by continuous functions. Again it is clear that it suffices to do this for characteristic functions $f = \chi_A$ where $A$ is a measurable set.

Now recall that $\mu(A)$ was defined to be the infimum of $\sum_{j=1}^{\infty} |I_j|$ where $\bigcup_{j=1}^{\infty} I_j$ covers $A$ and $I_j$ are intervals. Without loss of generality we may assume the intervals $I_j$ are disjoint, and given any $\varepsilon > 0$ we can choose the intervals so that $\bigcup_{j=1}^{\infty} I_j = A \cup B$, with $A$ and $B$ disjoint, and


$\mu(B) \le \varepsilon$. We can also choose $N$ large enough so that $\sum_{j=N+1}^{\infty} |I_j| \le \varepsilon$. Thus $\chi_A$ and $\sum_{j=1}^{N} \chi_{I_j}$ are equal except on a set of measure at most $2\varepsilon$, so $\left\|\chi_A - \sum_{j=1}^{N} \chi_{I_j}\right\|_2 \le \sqrt{2\varepsilon}$.

Finally $\chi_{I_j}$ can clearly be approximated by continuous functions in the $L^2$-norm, as indicated in Figure 14.4.1, and we can always arrange these approximating functions to vanish at $\pm\pi$.

Figure 14.4.1:

Adding together all these approximations in the usual way we obtain the result. QED

Theorem 14.4.6 (Parseval Identity) Let $f$ be in $L^2(-\pi, \pi)$. Then $\|S_N f - f\|_2 \to 0$ and $(1/2\pi)\|f\|_2^2 = \sum_{-\infty}^{\infty} |c_n|^2$.

Proof: Let $\{f_k\}$ be a sequence of continuous functions such that $\|f_k - f\|_2 \to 0$ as $k \to \infty$. We have already proved the result for the functions $f_k$, so we want to get the result for $f$ by passing to the limit. Given any error $\varepsilon$ we can first find $f_k$ so that $\|f_k - f\|_2 \le \varepsilon/2$ and then choose $N$ large enough so that $\|S_N f_k - f_k\|_2 \le \varepsilon/2$, so $\|S_N f_k - f\|_2 \le \varepsilon$. But we can apply the projection theorem to $f$ to conclude that $S_N f$ minimizes the distance to $f$ among all trigonometric polynomials $\sum_{-N}^{N} a_n e^{inx}$, so $\|S_N f - f\|_2 \le \|S_N f_k - f\|_2 \le \varepsilon$ (recall that for the projection theorem we needed only to have an inner product space, so $L^2(-\pi, \pi)$ will do). This shows $\lim_{N\to\infty} \|S_N f - f\|_2 = 0$ and so also $\|f\|_2^2 = \lim_{N\to\infty} \|S_N f\|_2^2 = \lim_{N\to\infty} 2\pi \sum_{-N}^{N} |c_n|^2 = 2\pi \sum_{-\infty}^{\infty} |c_n|^2$. QED

The Parseval identity gives us the converse of the Riesz-Fischer theorem. If we start with $f \in L^2$ we produce a sequence $\{c_k\}$ in $\ell^2$, the Fourier coefficients, and recover $f$ as the limit of $\sum_{-N}^{N} c_n e^{inx}$


in the $L^2$ norm. The correspondence $f \leftrightarrow \{c_k\}$ is thus an isometric isomorphism, which is onto in both directions. We also obtain as a corollary the completeness of the orthogonal system $e^{inx}$.

Corollary 14.4.1 If $f$ in $L^2$ is orthogonal to all the functions $e^{inx}$, then $f$ is zero.

Proof: The orthogonality means $c_n = 0$ for all $n$, hence $\|f\|_2 = 0$ by the Parseval identity. This means $f = 0$ a.e.

The importance of these results is that they show that Fourier series are well behaved on the space of $L^2$ functions. We can recognize an $L^2$ function from its Fourier coefficients, and we can always interpret the Fourier series as converging in $L^2$ norm. We give one typical application to the heat equation.

Theorem 14.4.7 Let $\{c_n\}$ be any sequence satisfying $\sum_{-\infty}^{\infty} |c_n|^2 < \infty$. Then $u(x, t) = \sum_{-\infty}^{\infty} e^{-n^2 t} c_n e^{inx}$ for $t > 0$ is a solution of the heat equation $\partial u/\partial t = \partial^2 u/\partial x^2$ that is periodic of period $2\pi$ in $x$, and $\sup_{t>0} \int_{-\pi}^{\pi} |u(x, t)|^2\,dx$ is finite. Furthermore $\int_{-\pi}^{\pi} |u(x, t) - f(x)|^2\,dx \to 0$ as $t \to 0^+$ where $f = \sum_{-\infty}^{\infty} c_n e^{inx}$. Conversely, every solution of the heat equation in $t > 0$ that is periodic of period $2\pi$ in $x$ and such that $\sup_{t>0} \int_{-\pi}^{\pi} |u(x, t)|^2\,dx$ is finite has the above form.

Proof: For any fixed $t > 0$ the series $\sum_{-\infty}^{\infty} e^{-n^2 t} c_n e^{inx}$ converges absolutely and uniformly in $x$ by comparison with

$$\sum_{-\infty}^{\infty} |c_n| e^{-n^2 t} \le \left(\sum_{-\infty}^{\infty} |c_n|^2\right)^{1/2} \left(\sum_{-\infty}^{\infty} e^{-2n^2 t}\right)^{1/2} < \infty$$

by the Cauchy-Schwarz inequality. Thus $u$ is continuous. Furthermore if we formally differentiate with respect to $t$ or $x$ any number of times, the series remains uniformly convergent because the derivatives just produce polynomial factors in $n$, and we can repeat the previous argument since $\sum_{-\infty}^{\infty} n^{2k} e^{-2n^2 t}$ is finite for any $k$. Thus $u$ is $C^{\infty}$ in $t > 0$ and the series may be differentiated term-by-term, so $u$ satisfies the heat equation. By Parseval's identity we have

$$\int_{-\pi}^{\pi} |u(x, t)|^2\,dx = 2\pi \sum_{-\infty}^{\infty} |c_n|^2 e^{-2n^2 t},$$


so $\sup_{t>0} \int_{-\pi}^{\pi} |u(x, t)|^2\,dx$ is finite; also by Parseval's identity we have

$$\int_{-\pi}^{\pi} |u(x, t) - f(x)|^2\,dx = 2\pi \sum_{-\infty}^{\infty} |c_n|^2 \left(1 - e^{-n^2 t}\right)^2,$$

and this goes to zero as $t \to 0^+$ by the dominated convergence theorem for counting measure ($|c_n|^2(1 - e^{-n^2 t})^2$ goes to zero pointwise and is dominated by the integrable $|c_n|^2$).

For the converse, assume $u(x, t)$ is a solution to the heat equation in $t > 0$. Then we have already shown in Chapter 12 that $u(x, t) = \sum e^{-n^2 t} c_n e^{inx}$ for some coefficients $c_n$. Now we add the condition that $\int_{-\pi}^{\pi} |u(x, t)|^2\,dx \le M$ for all $t > 0$. By Parseval's identity this is $\sum_{-\infty}^{\infty} |c_n|^2 e^{-2n^2 t} \le M/2\pi$. Finally we let $t \to 0$ and apply the monotone convergence theorem ($e^{-2n^2 t}|c_n|^2 \to |c_n|^2$ monotonically) to conclude that $\sum_{-\infty}^{\infty} |c_n|^2$ must be finite. QED

14.4.4 Exercises

1. Prove the $L^p$ Chebyshev's inequality: if $|f|^p$ is integrable (for fixed $p > 0$), then $\mu\{x : |f(x)| \ge s\} \le (1/s^p) \int |f|^p\,d\mu$.

2. Prove that $\ell^1 \subseteq \ell^2$ (recall $\ell^p$ means $L^p$ for counting measure).

3. Prove that if $f$ is in $L^1$ and $f$ is bounded, then $f$ is in $L^2$.

4. Give an example of a function on the line that is in $L^1$ but not $L^2$ and one that is in $L^2$ but not $L^1$.

5. Prove that if $f$ is any $L^1$ function such that $\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx = 0$ for all $n$, then $f = 0$ a.e.

6. Prove the Riemann-Lebesgue Lemma: if $f$ is in $L^1$, then $\lim_{n\to\pm\infty} c_n = 0$. (Hint: use the density of continuous functions in $L^1$ and Parseval's identity.)

7. Let $f$ be in $L^2(-\pi, \pi)$, and extend $f$ to be periodic of period $2\pi$. Prove the continuity of translation in $L^2$: if $h(t) = \int_{-\pi}^{\pi} |f(x + t) - f(x)|^2\,dx$, then $\lim_{t\to 0} h(t) = 0$. (Hint: use density of continuous functions.) State and prove the analogous result for $L^1$.


8. Prove that if $u(x, t)$ is a solution to the heat equation in $t > 0$ that is periodic of period $2\pi$ and for which $\sup_{t>0} \int_{-\pi}^{\pi} |u(x, t)|^2\,dx$ is finite, then there exists a constant $c$ such that $u(x, t) \to c$ uniformly as $t \to \infty$.

9. Consider the Laplace equation $(\partial^2/\partial x^2 + \partial^2/\partial y^2)u(x, y) = 0$ in $x^2 + y^2 < 1$ subject to the restriction

$$\int_{-\pi}^{\pi} |u(r\cos\theta, r\sin\theta)|^2\,d\theta \le M \quad \text{for } 0 < r < 1.$$

Prove that $u(r\cos\theta, r\sin\theta) = \sum_{-\infty}^{\infty} c_n r^{|n|} e^{in\theta}$ gives a solution for every choice of $c_n$ with $\sum_{-\infty}^{\infty} |c_n|^2$ finite and conversely that all solutions have this form.

10. Give an example of a sequence of functions $f_n(x)$ on $[0, 1]$ such that $\int_0^1 |f_n(x)|\,dx \to 0$ as $n \to \infty$ but $f_n(x)$ does not converge to zero for any $x$ in $[0, 1]$. Can you make the functions $f_n$ continuous?

14.5 Summary

14.1 The Concept of Measure

Example Dirichlet's function on $[0, 1]$

$$f(x) = \begin{cases} 0 & \text{if } x \text{ is irrational}, \\ 1 & \text{if } x \text{ is rational} \end{cases}$$

is not Riemann integrable, even though it is the pointwise limit of Riemann integrable functions.

Example Let

$$f_n(x) = \begin{cases} n & \text{if } 0 < x < 1/n, \\ 0 & \text{otherwise.} \end{cases}$$

Then $\int_0^1 \left(\lim_{n\to\infty} f_n(x)\right) dx \ne \lim_{n\to\infty} \int_0^1 f_n(x)\,dx$.
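The summary example $f_n = n$ on $(0, 1/n)$, zero elsewhere, can be checked with exact arithmetic. The sketch below uses hypothetical function names and evaluates the integral as height times length of the supporting interval, the integral of a simple function. Every $f_n$ has integral exactly 1, while at each fixed $x$ the values $f_n(x)$ are eventually 0, so the integral of the pointwise limit is 0.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = n if 0 < x < 1/n, and 0 otherwise."""
    return Fraction(n) if 0 < x < Fraction(1, n) else Fraction(0)

def integral_of_f(n):
    """Integral of the simple function n * chi_(0, 1/n): height times interval length."""
    return Fraction(n) * Fraction(1, n)

def eventual_value(x, n_start=1, n_stop=2000):
    """Values f_n(x) for n in [n_start, n_stop); the tail is 0 once n >= 1/x."""
    return [f(n, x) for n in range(n_start, n_stop)]

integrals = [integral_of_f(n) for n in (1, 7, 49, 1000)]     # each equals 1
tail_at_sample = eventual_value(Fraction(1, 100), 100, 200)  # all zeros: n >= 100 = 1/x
print(integrals, set(tail_at_sample))
```

No integrable function dominates every $f_n$ here, so the example does not conflict with the dominated convergence theorem.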


Lemma 14.1.1 Let $I_1, I_2, \ldots$ be disjoint intervals such that $I = I_1 \cup I_2 \cup \cdots$ is also an interval. Then $|I| = \sum_{j=1}^{\infty} |I_j|$ where $|I|$ denotes the length of the interval $I$.

Corollary 14.1.1 If $I$ is any interval, then

$$|I| = \inf\left\{\sum_{j=1}^{\infty} |I_j| : I \subseteq \bigcup_{j=1}^{\infty} I_j\right\}.$$

Definition A collection $\mathcal{F}$ of subsets of a fixed set $X$ (the universe) is called a field of sets if

1. the empty set is in $\mathcal{F}$;

2. if $A$ is in $\mathcal{F}$, then the complement of $A$ is in $\mathcal{F}$; and

3. if $A$ and $B$ are in $\mathcal{F}$, then $A \cup B$ is in $\mathcal{F}$.

A field of sets is called a $\sigma$-field if in addition

4. if $A_1, A_2, \ldots$ is a sequence of sets in $\mathcal{F}$, then $\bigcup_{j=1}^{\infty} A_j$ is in $\mathcal{F}$.

Definition The $\sigma$-field generated by the field $\mathcal{F}$, denoted $\mathcal{F}_\sigma$, is the smallest $\sigma$-field containing $\mathcal{F}$ or, equivalently, the intersection of all $\sigma$-fields containing $\mathcal{F}$. The $\sigma$-field generated by the field of finite unions of subintervals of a fixed interval $X$ of the line is called the $\sigma$-field of Borel sets in $X$. The same definition applies to any metric space $X$.

Definition A measure on a $\sigma$-field $\mathcal{F}$ (referred to as the measurable sets) is a function $\mu : \mathcal{F} \to [0, \infty]$ satisfying $\mu(\emptyset) = 0$ and $\sigma$-additivity: if $A = \bigcup_{j=1}^{\infty} A_j$ with $A_j$ disjoint sets in $\mathcal{F}$, then $\mu(A) = \sum_{j=1}^{\infty} \mu(A_j)$.

Theorem Any measure $\mu$ is

1. monotone: $A \subseteq B$ implies $\mu(A) \le \mu(B)$;

2. continuous from below: if $A_1 \subseteq A_2 \subseteq \cdots$ is an increasing sequence of measurable sets, then $\mu\left(\bigcup_{j=1}^{\infty} A_j\right) = \lim_{j\to\infty} \mu(A_j)$;

3. conditionally continuous from above: if $B_1 \supseteq B_2 \supseteq \cdots$ is a decreasing sequence of measurable sets and if the measures $\mu(B_j)$ are finite, then $\mu\left(\bigcap_{j=1}^{\infty} B_j\right) = \lim_{j\to\infty} \mu(B_j)$;


4. subadditive: if $B \subseteq A_1 \cup A_2 \cup \cdots \cup A_n$, then $\mu(B) \le \mu(A_1) + \cdots + \mu(A_n)$;

5. $\sigma$-subadditive: if $B \subseteq \bigcup_{j=1}^{\infty} A_j$, then $\mu(B) \le \sum_{j=1}^{\infty} \mu(A_j)$.

Definition The Lebesgue measure $\mu(B)$ of a Borel set in $\mathbb{R}$ is defined to be $\mu(B) = \inf\left\{\sum_{j=1}^{\infty} |I_j| : B \subseteq \bigcup_{j=1}^{\infty} I_j\right\}$. Without loss of generality we may assume the intervals $I_j$ to be disjoint.

Lemma The Lebesgue measure of an interval is equal to its length.

Theorem Sets of Lebesgue measure zero are preserved under countable unions, and subsets of sets of measure zero also have measure zero.

Theorem Any Borel set $B$ can be covered by a $G_\delta$ set $A$ (a countable intersection of open sets) such that $A \setminus B$ has Lebesgue measure zero, and $|A| = |B|$.

Definition The Lebesgue measure $\mu(B)$ of a Borel set in $\mathbb{R}^n$ is defined to be $\mu(B) = \inf\left\{\sum_{j=1}^{\infty} |R_j| : B \subseteq \bigcup_{j=1}^{\infty} R_j\right\}$ where $R_j$ denotes any rectangle $I_1 \times I_2 \times \cdots \times I_n$ with volume $|R_j| = |I_1||I_2| \cdots |I_n|$.

Example Counting measure is the measure on the $\sigma$-field of all subsets of $X$ that assigns to each set its cardinality (the number of elements it contains).

Definition A probability measure is any measure such that $|X| = 1$.

14.2 Proof of Existence of Measures

Definition An outer measure on a $\sigma$-field $\mathcal{F}$ of sets is a function $\mu(A)$ satisfying:

1. (non-negativity) $\mu : \mathcal{F} \to [0, \infty]$;

2. $\mu(\emptyset) = 0$;

3. ($\sigma$-subadditivity) if $A = \bigcup_{j=1}^{\infty} A_j$, then $\mu(A) \le \sum_{j=1}^{\infty} \mu(A_j)$;

4. (monotonicity) if $A \subseteq B$, then $\mu(A) \le \mu(B)$.
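The covering definition of Lebesgue measure makes it easy to see why a countable set has measure zero: cover the $j$-th point by an interval of length $\varepsilon/2^j$, so the total length stays below $\varepsilon$. The sketch below carries this out for the first few rationals in $[0, 1]$ with exact arithmetic. The enumeration order and helper names are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """First `count` distinct rationals p/q in [0, 1], listed by increasing denominator."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def cover(points, eps):
    """Cover the j-th point (j = 1, 2, ...) by an interval of length eps / 2**j."""
    intervals = []
    for j, x in enumerate(points, start=1):
        half = Fraction(eps) / 2 ** (j + 1)
        intervals.append((x - half, x + half))
    return intervals

eps = Fraction(1, 100)
points = rationals_in_unit_interval(50)
intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)  # eps * (1 - 2**-50), strictly below eps
print(float(total_length))
```

The same geometric series bounds the total length for the full countable enumeration, and since $\varepsilon$ is arbitrary the infimum over covers is zero.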


Theorem Lebesgue measure is an outer measure.

Definition 14.2.1 For $\mu$ an outer measure on $\mathcal{F}$ and $A$ in $\mathcal{F}$, we say $A$ satisfies the splitting condition if $\mu(B) = \mu(B \cap A) + \mu(B \setminus A)$ for every $B$ in $\mathcal{F}$.

Theorem 14.2.1 Let $\mu$ be an outer measure on $\mathcal{F}$. The sets satisfying the splitting condition (denoted $\mathcal{F}_0$) form a $\sigma$-field and $\mu$ restricted to $\mathcal{F}_0$ is a measure.

Definition 14.2.2 The distance between two sets $A$ and $B$ in a metric space is the infimum of $d(x, y)$ for $x$ in $A$ and $y$ in $B$. If this distance is positive we say $A$ and $B$ are separated. An outer measure on the Borel sets of a metric space is said to be a metric outer measure if $\mu(A \cup B) = \mu(A) + \mu(B)$ whenever $A$ and $B$ are separated.

Lemma Lebesgue measure on $\mathbb{R}$ is a metric outer measure.

Theorem 14.2.2 (Carathéodory) A metric outer measure is a measure on the Borel sets.

Definition 14.2.3 Hausdorff measure $\mu_\alpha$ of dimension $\alpha$ on the Borel sets in $\mathbb{R}$ is defined by $\mu_\alpha(A) = \lim_{\varepsilon\to 0} \mu_\alpha^{(\varepsilon)}(A)$ where $\mu_\alpha^{(\varepsilon)}(A) = \inf\left\{\sum_{j=1}^{\infty} |I_j|^\alpha : A \subseteq \bigcup_{j=1}^{\infty} I_j \text{ for intervals } I_j \text{ satisfying } |I_j| \le \varepsilon\right\}$.

Theorem 14.2.3 $\mu_\alpha$ is a measure on the Borel sets.

Definition On an arbitrary metric space, the Hausdorff measure of dimension $\alpha$ is defined as in $\mathbb{R}$ where $I_j$ are allowed to be arbitrary closed sets and $|I_j|$ denotes the diameter of $I_j$ (the supremum of $d(x, y)$ for $x$ and $y$ in $I_j$).

Lemma 14.2.1

a. $\mu_\alpha(A) < \infty$ implies $\mu_\beta(A) = 0$ for $\beta > \alpha$.

b. $\mu_\alpha(A) > 0$ implies $\mu_\beta(A) = +\infty$ for $\beta < \alpha$.
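The dichotomy in Lemma 14.2.1 can be seen numerically on the middle-thirds Cantor set, whose natural stage-$k$ cover consists of $2^k$ intervals of length $3^{-k}$. The sketch below (with a hypothetical function name) evaluates the covering sums $\sum |I_j|^\alpha = (2 \cdot 3^{-\alpha})^k$ from Definition 14.2.3 for these particular covers. They are only upper bounds for $\mu_\alpha^{(\varepsilon)}$, so the computation shows rigorously that $\mu_\alpha$ vanishes above $\log 2/\log 3$, and merely suggests the blow-up below it.

```python
import math

def cantor_cover_sum(alpha, k):
    """Sum of |I_j|**alpha over the 2**k intervals of length 3**-k at stage k."""
    return (2.0 * 3.0 ** (-alpha)) ** k

critical = math.log(2) / math.log(3)  # the value quoted for the Cantor set's dimension

for alpha in (0.5, critical, 0.8):
    sums = [cantor_cover_sum(alpha, k) for k in (1, 10, 40)]
    print(round(alpha, 4), sums)
```

Below the critical exponent the sums grow without bound, at the critical exponent they stay equal to 1 up to rounding, and above it they shrink to 0 as the covers become finer.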


Definition The Hausdorff dimension of $A$ is the unique value $\alpha_0$ such that $\mu_\alpha(A) = +\infty$ for $\alpha < \alpha_0$ and $\mu_\alpha(A) = 0$ for $\alpha > \alpha_0$.

Example The Hausdorff dimension of the Cantor set is $\log 2/\log 3$.

14.3 The Integral

Definition A measurable space is a pair $(X, \mathcal{F})$ where $\mathcal{F}$ is a $\sigma$-field of subsets of $X$. A measure space is a triple $(X, \mathcal{F}, \mu)$ where, in addition to the above, $\mu$ is a measure on $\mathcal{F}$. The sets of $\mathcal{F}$ are called measurable sets.

Definition A function $f : X \to \mathbb{R}$ is said to be measurable if $f^{-1}(B)$ is measurable for every Borel subset $B$ of $\mathbb{R}$.

Lemma If $f : X \to \mathbb{R}$ and $f^{-1}(I)$ is measurable for every interval $I$ (or even for every interval of the form $(a, \infty)$, or $[a, \infty)$, or $(-\infty, a)$, or $(-\infty, a]$), then $f$ is measurable.

Theorem 14.3.1 If $f$ and $g$ are measurable functions, then so are $af + bg$, $f \cdot g$, $f/g$ (if $g \ne 0$), $\max(f, g)$, $\min(f, g)$, and $|f|$. If $h : \mathbb{R} \to \mathbb{R}$ is measurable (with respect to the Borel $\sigma$-field), then $h \circ f$ is measurable. If $f_n$ is a sequence of measurable functions, then $\sup_n f_n$, $\inf_n f_n$, $\limsup f_n$, $\liminf f_n$, and $\lim f_n$ (if it exists pointwise) are measurable functions.

Definition A simple function is a finite linear combination of characteristic functions of measurable sets $f = \sum_{k=1}^{N} a_k\chi_{A_k}$ where $A_k$ are measurable sets, and

$$\chi_A = \begin{cases} 1 & \text{if } x \text{ is in } A, \\ 0 & \text{if } x \text{ is not in } A. \end{cases}$$

Equivalently, a simple function is a measurable function that takes on only a finite set of values.

Lemma Every simple function has a representation $f = \sum a_k\chi_{A_k}$ where the sets $A_k$ are disjoint, and the representation is essentially


Definition 11I : X -+ lR,then 1+= max(f, O)and 1- = max( - 1,O),so I = 1+ - 1- and I± are nonnegative. 11I is mensurable we say I

Theorem 14.3.4 (Fatou's Theorem) 11 I = limn_oo In where In arenon-negative measurable [unciions and il limn_oo f IndJJ exists, thenf Idl-" :::;limn_oo f IndJJ. More generally, il {In} is any sequence 01non-negative functions, then f lim inf IndJJ $ lim inf J Indl-".

Theorem 11I is Riemann integrable, then I is Lebesgue integrable andthe two integrals are equal.

1. linear: f(al + bg)dl-"= a f Idl-"+ b f gdl-"j

2. monotone: f I dI-":::;f gdl-"if I :::;s:

3. additive: fAUB Idl-"= fA Idl-"+ fB IdJJ [or A and B disjoint mea­surable sets, where fA Idl-" denotes f IXAdl-".

Theorem 14.3.3 The integral 01 non-negative mensurable functions is

Theorem 14.3.2 (Monotone Convergence Theorem) 110:::; h :::;h :::;... is a mono tone increasing sequence 01 non-negative mensurable func­tions, then f limn_oo Indl-"= limn_oo f Indl-" (both sides may be +00).

Definition The integral of a non-negative simple function $f = \sum_{k=1}^{N} a_k\chi_{A_k}$ is defined by $\int f\,d\mu = \sum a_k\mu(A_k)$, which does not depend on the representation. The integral of a non-negative measurable function $f$ is defined by $\int f\,d\mu = \lim_{n\to\infty} \int f_n\,d\mu$ where $f_n$ is a sequence of non-negative simple functions increasing monotonically to $f$. The integral does not depend on the sequence. A particular choice is $f_n = \sum_{B \in \mathcal{P}_n} (\inf B)\,\chi_{f^{-1}(B)}$, where $\mathcal{P}_n$ is the partition consisting of the sets $[(k-1)/2^n, k/2^n)$ for $1 \le k \le 2^{2n}$ and $[2^n, \infty]$.
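The dyadic choice of $f_n$ above can be sketched in Python. This is only an illustration under stated assumptions: the helper name `dyadic_simple_approx` is hypothetical, and $f$ is taken to be an ordinary non-negative Python function evaluated pointwise. On each set $f^{-1}(B)$ the approximation takes the value $\inf B$, which amounts to rounding $f(x)$ down to a multiple of $2^{-n}$ and capping it at $2^n$.

```python
import math

def dyadic_simple_approx(f, n):
    """Return f_n, the n-th dyadic simple-function approximation of a non-negative f.

    Hypothetical helper: f_n(x) is the left endpoint of the dyadic interval
    [(k-1)/2^n, k/2^n) containing f(x), or 2^n when f(x) >= 2^n.
    """
    def f_n(x):
        value = f(x)
        if value >= 2 ** n:
            # f(x) lies in [2^n, infinity], whose infimum is 2^n
            return float(2 ** n)
        # round down to the nearest multiple of 2^-n
        return math.floor(value * 2 ** n) / 2 ** n
    return f_n

square = lambda x: x * x
approximations = [dyadic_simple_approx(square, n)(0.9) for n in (1, 3, 6)]
print(approximations)  # an increasing list of values bounded above by 0.81
```

Refining the partition from $\mathcal{P}_n$ to $\mathcal{P}_{n+1}$ only splits each interval in half, which is why the values never decrease as $n$ grows.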

Theorem (Approximation by Simple Functions) Every non-negative measurable function is the pointwise limit of a monotone increasing sequence of simple functions.

unique.


Lemma 14.4.1 (Chebyshev's Inequality) For any $f$ in $L^1(\mu)$ and $s > 0$ we have $\mu(\{x : |f(x)| > s\}) \le (1/s)\|f\|_1$.
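As a quick sanity check of Chebyshev's inequality, here is a minimal sketch for counting measure on a finite set, where the integral is just a finite sum. The function name `chebyshev_sides` is an assumption made for this illustration; it returns the two sides of the inequality so they can be compared.

```python
def chebyshev_sides(values, s):
    """Return (mu({|f| > s}), ||f||_1 / s) for counting measure on a finite set.

    `values` lists f(x) for each point x of the set (hypothetical setup).
    """
    measure_of_large_set = sum(1 for v in values if abs(v) > s)
    l1_norm = sum(abs(v) for v in values)
    return measure_of_large_set, l1_norm / s

left, right = chebyshev_sides([3, -1, 0.5, 4, -2], s=1.5)
print(left, right)  # left side never exceeds the right side
```

Each point counted on the left contributes more than $s$ to the $L^1$ norm, which is the whole content of the lemma.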

Definition We say $f$ is equivalent to $g$ if $f = g$ a.e., where $f$ and $g$ are measurable functions on a measure space. We define $L^1(\mu)$ to be the vector space of equivalence classes of integrable functions, and the $L^1$ norm is defined to be $\|f\|_1 = \int |f|\,d\mu$. This is a norm on $L^1(\mu)$.

14.4 The Lebesgue Spaces L1 and L2

Theorem 14.3.6 Let $f$ and $g$ be integrable functions. Then $f = g$ a.e. if and only if $\int_A f\,d\mu = \int_A g\,d\mu$ for every measurable set $A$.

Definition Any statement about points $x$ in $X$ is said to hold almost everywhere (abbreviated a.e.) if it is true for all $x$ not in a set $E$ of measure zero.

Corollary If $\mu(X)$ is finite and $f_n$ is a uniformly bounded sequence of functions converging pointwise to $f$, then $\int f\,d\mu = \lim_{n\to\infty} \int f_n\,d\mu$.

Theorem 14.3.5 (Dominated Convergence Theorem) If $g$ is a non-negative integrable function with $|f_n(x)| \le g(x)$ for all $x$ and $n$ for a sequence $f_n$ of measurable functions such that $f_n(x) \to f(x)$ as $n \to \infty$ for every $x$, then $f$ is integrable and $\int f\,d\mu = \lim_{n\to\infty} \int f_n\,d\mu$.

Theorem The integral is linear, monotone, and additive on real-valued integrable functions.

Example For counting measure on $\{1, 2, 3, \ldots\}$ the integrable functions are the absolutely convergent series, and the integral is the series sum.

Theorem If $f : X \to \mathbb{R}$ is measurable, then $f$ is integrable if and only if $|f|$ is integrable, and Minkowski's inequality $\left|\int f\,d\mu\right| \le \int |f|\,d\mu$ holds.

is integrable if $\int f^\pm\,d\mu$ are both finite and then define the integral of $f$ by $\int f\,d\mu = \int f^+\,d\mu - \int f^-\,d\mu$.


Corollary 14.4.1 If $f$ is in $L^2$ and orthogonal to all the functions $e^{inx}$, then $f$ is zero.

Theorem 14.4.6 (Parseval Identity) If $f$ is in $L^2(-\pi, \pi)$, then the partial sums of the Fourier series converge to $f$ in $L^2$ norm and Parseval's identity holds. In particular, the correspondence $f \leftrightarrow \{c_n\}$ between $L^2(-\pi, \pi)$ and $\ell^2$ is onto in both directions.

Theorem 14.4.5 (Density of Continuous Functions) The continuous functions on $[-\pi, \pi]$ with $f(-\pi) = f(\pi)$ are dense in $L^2(-\pi, \pi)$.

Theorem 14.4.4 (Riesz-Fischer) For any complex coefficients $c_n$ with $\sum_{n=-\infty}^{\infty} |c_n|^2 < \infty$ there exists a complex-valued function $f$ in $L^2(-\pi, \pi)$ with Fourier coefficients equal to $c_n$: $c_n = (1/2\pi) \int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$. Furthermore, $(1/2\pi) \int_{-\pi}^{\pi} |f(x)|^2\,dx = \sum_{n=-\infty}^{\infty} |c_n|^2$ (Parseval's identity), and the partial sums $S_N f(x) = \sum_{n=-N}^{N} c_n e^{inx}$ converge to $f$ in $L^2$ norm.

Definition A complete normed space is called a Banach space; a complete inner product space is called a Hilbert space.

Theorem 14.4.3 $L^2(\mu)$ is complete.

Theorem If $\mu(X)$ is finite, then $L^2(\mu) \subseteq L^1(\mu)$. For counting measure the spaces are denoted $\ell^1$ and $\ell^2$ and we have $\ell^1 \subseteq \ell^2$.

Theorem 14.4.2 (Cauchy-Schwarz Inequality) If $f$ and $g$ are in $L^2(\mu)$, then $fg$ is integrable, so the inner product is well defined and the Cauchy-Schwarz inequality $\int |fg|\,d\mu \le \left(\int |f|^2\,d\mu\right)^{1/2}\left(\int |g|^2\,d\mu\right)^{1/2}$ holds.

Definition $L^2(\mu)$ is the space of equivalence classes of measurable functions $f$ such that $|f|^2$ is integrable, with $L^2$ norm $\|f\|_2 = \left(\int |f|^2\,d\mu\right)^{1/2}$ and inner product $\langle f, g\rangle = \int fg\,d\mu$ (take $\bar{g}$ in the complex-valued case).

Theorem 14.4.1 $L^1(\mu)$ is complete.


Theorem 14.4.7 If $\sum_{n=-\infty}^{\infty} |c_n|^2$ is finite, then $u(x, t) = \sum_{n=-\infty}^{\infty} e^{-n^2 t} c_n e^{inx}$ for $t > 0$ is a solution of the heat equation $\partial u/\partial t = \partial^2 u/\partial x^2$ that is periodic of period $2\pi$ in $x$, and $\sup_{t>0} \int_{-\pi}^{\pi} |u(x, t)|^2\,dx$ is finite, with $\int_{-\pi}^{\pi} |u(x, t) - f(x)|^2\,dx \to 0$ as $t \to 0^+$ where $f = \sum_{n=-\infty}^{\infty} c_n e^{inx}$. Conversely, every solution of the heat equation in $t > 0$ that is periodic of period $2\pi$ in $x$ and such that $\sup_{t>0} \int_{-\pi}^{\pi} |u(x, t)|^2\,dx$ is finite has the above form.


$$S(f, P) = \sum_{j=1}^{N} \sum_{k=1}^{M} f(\tilde{x}_j, \tilde{y}_k)(x_j - x_{j-1})(y_k - y_{k-1})$$

where $P$ denotes a partition $a = x_0 < x_1 < \cdots < x_N = b$, $c = y_0 < y_1 < \cdots < y_M = d$, of the sides of the rectangle (hence determining a partition of the rectangle into $MN$ little rectangles), and $(\tilde{x}_j, \tilde{y}_k)$ is any point in the rectangle $R_{jk} = [x_{j-1}, x_j] \times [y_{k-1}, y_k]$. The limit is taken as the maximum length of the subintervals $x_j - x_{j-1}$ and $y_k - y_{k-1}$ goes to zero. It is a straightforward matter to show that this limit exists, using the uniform continuity to show that the variation of $f$ over any

The double integral $\iint_R f(x, y)\,dx\,dy$ is defined as the limit of Cauchy sums

$$\int_c^d \left(\int_a^b f(x, y)\,dx\right) dy \quad\text{and}\quad \int_a^b \left(\int_c^d f(x, y)\,dy\right) dx,$$

15.1.1 Integrals of Continuous Functions

An important fact in the calculus of several variables is the equality of iterated integrals and multiple integrals, at least for continuous functions on well-behaved domains. For simplicity of notation we will state all results for $\mathbb{R}^2$, but the generalization to higher dimensions will be evident. Suppose $R = [a, b] \times [c, d]$ is a closed bounded rectangle and $f : R \to \mathbb{R}$ is a continuous function. The iterated integrals are

15.1 Interchange of Integrals

Multiple Integrals

Chapter 15

Page 711: Strichartz_The Way of Analysis 2000

On the other hand a direct calculation of the iterated integral $\int_c^d \left(\int_a^b g(x, y)\,dx\right) dy$ shows it is equal to $S(f, P)$. Thus the iterated and double integral differ by at most $2\varepsilon \cdot \mathrm{area}(R)$, and by letting $\varepsilon \to 0$ we obtain their equality.
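The agreement between Cauchy sums and the iterated integral can be checked numerically. The following is a minimal sketch, assuming a uniform partition and midpoints as the evaluation points; the function name `cauchy_sum` and the test function $f(x, y) = xy^2$ on $[0, 1] \times [0, 2]$ are choices made for this illustration. For this $f$ the iterated integral can be computed by hand: $\int_0^2 \left(\int_0^1 xy^2\,dx\right) dy = \int_0^2 y^2/2\,dy = 4/3$.

```python
def cauchy_sum(f, a, b, c, d, N, M):
    """Cauchy sum S(f, P) over a uniform N-by-M partition of [a, b] x [c, d].

    Assumption for this sketch: the evaluation point in each subrectangle
    is its midpoint (any point of the subrectangle is allowed).
    """
    dx = (b - a) / N
    dy = (d - c) / M
    total = 0.0
    for j in range(1, N + 1):
        x_mid = a + (j - 0.5) * dx
        for k in range(1, M + 1):
            y_mid = c + (k - 0.5) * dy
            total += f(x_mid, y_mid) * dx * dy
    return total

f = lambda x, y: x * y ** 2
exact_iterated = 4.0 / 3.0  # computed by hand as described above

for n in (10, 100):
    print(n, abs(cauchy_sum(f, 0.0, 1.0, 0.0, 2.0, n, n) - exact_iterated))
```

The printed errors shrink as both side lengths of the subrectangles shrink, in line with the requirement that both sides (not just the areas) go to zero.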

For many applications it is desirable to relax the condition that the domain be a rectangle. Suppose the domain is a compact set $K$ in $\mathbb{R}^2$. We can always set it inside some rectangle $R$ and define the double integral $\iint_K f(x, y)\,dx\,dy$ by partitioning $R$ as before and taking Cauchy sums $S(f, P) = \sum\sum f(\tilde{x}_j, \tilde{y}_k)\,\mathrm{area}(R_{jk})$ where the sum extends only over those rectangles $R_{jk}$ that lie entirely in $K$, as indicated in Figure 15.1.1.

$$\left|\int_c^d \left(\int_a^b f(x, y)\,dx\right) dy - \int_c^d \left(\int_a^b g(x, y)\,dx\right) dy\right| \le \varepsilon \cdot \mathrm{area}(R).$$

so

$$\left|\int_a^b f(x, y)\,dx - \int_a^b g(x, y)\,dx\right| \le \varepsilon(b - a),$$

of the subrectangles $R_{jk}$ goes to zero. Perhaps the only subtle point is that we must require that both sides of the rectangles $R_{jk}$ get small; it is not enough that the areas of the rectangles go to zero, for we have no control of the variation of $f$ over long skinny rectangles. We leave the details to the exercises.

We briefly sketch the proof of the equality of the double integral and one of the iterated integrals for a continuous function $f$ on a rectangle $R$. Given any $\varepsilon > 0$ we first choose a partition $P$ of the rectangle $R$ into subrectangles $R_{jk}$ such that $|f(x, y) - f(\tilde{x}, \tilde{y})| \le \varepsilon$ if $(x, y)$ and $(\tilde{x}, \tilde{y})$ belong to the same $R_{jk}$. This is possible by the uniform continuity of $f$. Then $S(f, P)$ for this partition differs from the double integral $\iint_R f(x, y)\,dx\,dy$ by at most $\varepsilon \cdot \mathrm{area}(R)$. This follows by the same argument as in the one-dimensional case. Now consider the function $g$ that is constant on each of the rectangles $R_{jk}$ and equal to $f(\tilde{x}_j, \tilde{y}_k)$ there. $g$ is discontinuous, but it has only a finite number of discontinuities, so the integrals involving $g$ are covered by the theory of the Riemann integral. Notice that $|f(x, y) - g(x, y)| \le \varepsilon$ everywhere on $R$. Thus for any fixed $y$,


Needless to say, the same theory of multiple integration is valid in $\mathbb{R}^n$. If $K \subseteq \mathbb{R}^n$ is a bounded set whose boundary has content zero and $f$ is a continuous function on $K$, then we can define the multiple integral $\int_K f(x)\,dx$ and show that it is equal to the iterated integrals ($n$-fold) in any order. In particular, we can define the volume of $K$, written $\mathrm{vol}(K)$, to be the integral of the function $f \equiv 1$ over $K$.

In order for the double integral to exist and be equal to the iterated integrals for continuous functions on $K$ we need to make some assumptions on $K$ that will control the error that arises from omitting the rectangles that lie partially in $K$. It is clear that this error will be negligible if we assume that the sum of the areas of those rectangles that are partially in $K$ goes to zero as the partitions are refined. It is not difficult to see this is the same as saying the boundary of $K$ has content zero. Recall the boundary of $K$ is defined to be all points of $K$ that are not in the interior (the interior points being those lying in a neighborhood entirely contained in $K$). A subset of the plane is said to have content zero if given any $\varepsilon > 0$ it can be covered by a finite set of rectangles whose areas sum to at most $\varepsilon$. (Notice that this is a stronger condition than having measure zero because the covering must be finite; for measure zero the definition is the same except that countable coverings are allowed.) If the boundary of $K$ is the union of a finite number of smooth curves, it is easy to show that it has content zero and this is the case in the usual calculus applications. We leave the details to the exercises.

Figure 15.1.1:


Then the Lebesgue measure of $A_y$ is $\int_{-\infty}^{\infty} \chi_A(x, y)\,dx$, so we are looking at the iterated integral $\int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} \chi_A(x, y)\,dx\right) dy$. If we were to section first in the $y$-direction (for fixed $x$) and then integrate in $x$, we would obtain the iterated integral in the other order. The double integral $\int \chi_A\,d\mu$ is just $\mu(A)$, so our claim that $\mu(A)$ is equal to the area of $A$

$$\chi_A(x, y) = \begin{cases} 1 & \text{if } (x, y) \text{ is in } A, \\ 0 & \text{if } (x, y) \text{ is not in } A. \end{cases}$$

Let $A$ be a measurable subset of $\mathbb{R}^2$. Then for each $y$, the section $A_y = \{x \text{ in } \mathbb{R} : (x, y) \text{ is in } A\}$ is a subset of $\mathbb{R}$, as shown in Figure 15.1.2. If we take the Lebesgue measure of the section $A_y$ and then integrate with respect to $y$ we should get $\mu(A)$. Notice we use Lebesgue measure on $\mathbb{R}$ twice, once to measure $A_y$ and once to integrate with respect to $y$. We can make this look like an iterated integral if we consider the characteristic function of $A$:

Figure 15.1.2:

We discuss next the equality of iterated and double integrals in the context of Lebesgue integration. (We are deliberately skipping the case of Riemann integrable functions because the results are inconclusive and not very useful.) We let $\mu$ denote Lebesgue measure on $\mathbb{R}^2$. Recall that $\mu$ is the unique measure on the Borel subsets of $\mathbb{R}^2$ that assigns to each rectangle its usual area (it is convenient to adopt the convention $0 \cdot \infty = 0$ in defining areas of rectangles so that the area is defined for unbounded rectangles in a consistent manner). We would like to relate this measure on $\mathbb{R}^2$ to Lebesgue measure on $\mathbb{R}$ by summing the lengths of horizontal or vertical sections.

15.1.2 Fubini's Theorem


which is the countable additivity of $\nu$. Of course our technical assumptions about measurability were used implicitly in this argument.

This result admits an interesting interpretation. Suppose $f : \mathbb{R} \to \mathbb{R}$ is a non-negative measurable function. Then one interpretation of $\int f(x)\,dx$ is as the area under the graph of $f$. Thus let $A$ denote this region under the graph of $f$, $A = \{(x, y) : 0 \le y \le f(x)\}$. If we

$$\int \left(\int \chi_A(x, y)\,dx\right) dy = \sum_{k=1}^{\infty} \int \left(\int \chi_{A_k}(x, y)\,dx\right) dy,$$

$$\int \chi_A(x, y)\,dx = \sum_{k=1}^{\infty} \int \chi_{A_k}(x, y)\,dx$$

for each fixed $y$ by the first application of the monotone convergence theorem. But we can then regard this equation as writing one non-negative function of $y$ as an infinite series of non-negative functions, so by a second application of the theorem

by sectioning is equivalent to the equality of the double integral and the iterated integrals for the particular function $\chi_A$.

There are some technical difficulties with defining the iterated integral for $\chi_A$. First, in order to form $\int_{-\infty}^{\infty} \chi_A(x, y)\,dx$, we need to know that for each fixed $y$ the function $\chi_A(x, y)$ as a function of $x$ is measurable. Then we need to know that $\int_{-\infty}^{\infty} \chi_A(x, y)\,dx$ is measurable as a function of $y$ in order to perform the next integration. The resolution of these difficulties is by no means trivial, and we postpone the details to section 15.1.3. Given that these functions are measurable, we indicate a simple proof that $\int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} \chi_A(x, y)\,dx\right) dy = \mu(A)$. We note that the identity is trivially true when $A$ is a rectangle. Since $\mu$ is characterized as being the unique measure on the Borel sets that gives the usual area for rectangles, it suffices to show that the iterated integral defines a measure. In other words, if we set $\nu(A) = \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} \chi_A(x, y)\,dx\right) dy$, then we have to verify the axioms for a measure for $\nu$, the only non-trivial axiom being countable additivity. Thus we need to show $\nu(A) = \sum_{k=1}^{\infty} \nu(A_k)$ if $A = \bigcup_{k=1}^{\infty} A_k$ is a disjoint union of Borel sets. But we claim this follows easily from two applications of the monotone convergence theorem. Indeed the condition $A = \bigcup_{k=1}^{\infty} A_k$ (with $A_k$ disjoint) translates into $\chi_A = \sum_{k=1}^{\infty} \chi_{A_k}$. Since these are non-negative functions, we have


completing the proof. QED

$$\int \left(\int f_n(x, y)\,dx\right) dy \to \int \left(\int f(x, y)\,dx\right) dy,$$

Proof: Let $f_n$ be a monotone increasing sequence of non-negative simple functions converging to $f$. Then $\int f\,d\mu = \lim_{n\to\infty} \int f_n\,d\mu$ by definition. But we have seen that the double and iterated integrals are equal for characteristic functions and, hence, for simple functions. Thus $\int f_n\,d\mu = \int \left(\int f_n(x, y)\,dx\right) dy$. The last step in the argument is to take the limit. Regarding $f_n(x, y)$ as a function of $x$ (for fixed $y$), we know it is measurable (this will be proved in section 15.1.3); and as we vary $n$ we have an increasing sequence of non-negative functions, so $f(x, y)$ is measurable and $\int f_n(x, y)\,dx \to \int f(x, y)\,dx$ by the monotone convergence theorem. Since $\int f_n(x, y)\,dx$ is measurable, this shows that $\int f(x, y)\,dx$ is measurable, and one more application of the monotone convergence theorem shows

Theorem 15.1.1 (Fubini's Theorem, first version) Let $f$ be a non-negative measurable function on $\mathbb{R}^2$. Then the double integral $\int f\,d\mu$ and the two iterated integrals $\int \left(\int f(x, y)\,dx\right) dy$ and $\int \left(\int f(x, y)\,dy\right) dx$ are all equal (they may be all $+\infty$).

We now turn to the problem of extending the equality of double and iterated integrals to more general functions than characteristic functions. To begin with we study non-negative functions.

Figure 15.1.3:

section $A$ first by varying $y$, as indicated in Figure 15.1.3, then the iterated integral $\int \left(\int \chi_A(x, y)\,dy\right) dx$ is equal to $\int f(x)\,dx$. Thus our result implies the equality of the integral of a non-negative measurable function and the area under its graph.


This result says a lot less than appears since the hypothesis that $f$ be integrable involves the double integral and normally we want to compute it by computing an iterated integral. The existence of the iterated integral, if it involves cancellation, need not imply the existence

Proof: We need only apply the previous version to $f^+$ and $f^-$. Since $\int f^\pm\,d\mu$ is assumed finite, we have $\int \left(\int f^\pm(x, y)\,dx\right) dy = \int f^\pm\,d\mu$ and this implies $\int f^\pm(x, y)\,dx$ is finite for almost every $y$ (or else the $y$ integration would produce $+\infty$). Subtracting and using the linearity of the double integral and the iterated integrals we obtain $\int \left(\int f(x, y)\,dx\right) dy = \int f\,d\mu$. QED

Theorem 15.1.2 (Fubini's Theorem, second version) Let $f$ be a measurable function on $\mathbb{R}^2$, and assume that $f$ is integrable with respect to $\mu$ (this means $\int f^+\,d\mu$ and $\int f^-\,d\mu$ are finite). Then the double integral and iterated integrals are equal. In particular for almost every $y$, $f(x, y)$ as a function of $x$ is integrable and $\int f(x, y)\,dx$ is integrable as a function of $y$, and the same with $x$ and $y$ reversed.

$\int f(x, y)\,dx$ is finite for $y \neq 0$ but $+\infty$ for $y = 0$ since $\int_{-1}^{1} |x|^{-1}\,dx = +\infty$. In this case the double integral is finite, but this is easiest to see using polar coordinates, which we discuss in section 15.2.

It is not necessary that $f$ be defined on all of $\mathbb{R}^2$ to apply the theorem. It suffices to have $f$ defined on a measurable set $A$, for then we can simply extend $f$ to be zero outside of $A$. Notice that we do not need any assumptions about the boundary of $A$.

Next we consider functions that are not necessarily nonnegative.

One word of caution in interpreting this theorem: even though the integral $\int f\,d\mu$ may be finite, some of the single integrals $\int f(x, y)\,dx$ may be infinite. This can only happen for values of $y$ in a set of measure zero, of course. Thus the second integration in the iterated integral may involve a function taking values in the extended reals. For example, suppose $f(x, y) = (x^2 + y^2)^{-1/2}\chi_{x^2+y^2 \le 1}$. Then


Theorem 15.1.4 ($L^1$ Convolution Theorem) Let $f$ and $g$ be integrable functions on the line. Then $f * g$ is defined as an integrable function in the following sense: for almost every $x$, the function $h(y) =$

This is the most useful of the three versions, since the hypothesis is one that we can hope to check by finding an upper bound for the iterated integral. It is significant that we have to deal with only one of the two iterated integrals, since it is often the case that one is easier to estimate.

The consequences of Fubini's theorem are often surprising. We give one application that is typical and an important theorem in its own right. Recall we defined the convolution of two functions $f * g(x) = \int_{-\infty}^{\infty} f(x - y)g(y)\,dy$. We originally defined this in terms of the Riemann integral, but now we can interpret it as a Lebesgue integral and can ask under which conditions on $f$ and $g$ it is defined.

Proof: By the first version applied to $|f|$ we have $\int |f|\,d\mu = \int \left(\int |f(x, y)|\,dx\right) dy$, so $|f|$ is integrable. This means $f$ is integrable, so by the second version the iterated integrals both equal the double integral of $f$. QED

Theorem 15.1.3 (Fubini's Theorem, third version) Let $f$ be a measurable function on $\mathbb{R}^2$, and suppose one of the iterated integrals for $|f|$ exists and is finite; say $\int |f(x, y)|\,dx$ is finite for almost every $y$ and $\int \left(\int |f(x, y)|\,dx\right) dy$ is finite. Then $f$ is integrable with respect to $\mu$ and both iterated integrals for $f$ are equal to $\int f\,d\mu$.

Then $\int f(x, y)\,dx = 0$ for every $y$, so $\int \left(\int f(x, y)\,dx\right) dy = 0$, but $\int f(x, y)\,dy = \pm\infty$ for $x$ in $(-1, 1)$, and of course $f$ is not integrable. With a little more ingenuity one can construct examples with iterated integrals existing in both orders but not equal. All these examples exploit concealed cancellation of $\pm\infty$, as the final version of Fubini's Theorem makes clear.

$$f(x, y) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ -1 & \text{if } -1 < x < 0, \\ 0 & \text{otherwise.} \end{cases}$$

of the double integral. For example, consider


Even the statement that $f * g$ is defined for almost every $x$ is not obvious. It is easy to construct examples where $f * g$ fails to exist at a single point. Take $f = g = |x|^{-1/2}\chi_{|x| \le 1}$. Then $f$ and $g$ are integrable because the singularity at $x = 0$ is not too bad, but

$$f * g(0) = \int_{-1}^{1} |x|^{-1/2} \cdot |x|^{-1/2}\,dx = \int_{-1}^{1} |x|^{-1}\,dx = +\infty.$$

The idea here is simply that the ordinary product of two integrablefunctions need not be integrable. Since the convolution integrates the

by the equality of the iterated integrals. This shows $f * g$ is integrable and establishes the desired estimate. QED
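The estimate $\|f * g\|_1 \le \|f\|_1\|g\|_1$ has a discrete analogue that is easy to test. The sketch below is an illustration only: it replaces Lebesgue measure on the line by counting measure on the integers and uses finitely supported sequences, so the integrals become finite sums. The helper names `convolve` and `l1_norm` are assumptions made for this example.

```python
def convolve(f, g):
    """Discrete convolution (f*g)[n] = sum_k f[n-k] g[k].

    f and g are finitely supported sequences stored as lists indexed from 0
    (an assumption of this sketch).
    """
    result = [0.0] * (len(f) + len(g) - 1)
    for i, f_i in enumerate(f):
        for k, g_k in enumerate(g):
            result[i + k] += f_i * g_k
    return result

def l1_norm(seq):
    """Sum of absolute values: the L^1 norm for counting measure."""
    return sum(abs(v) for v in seq)

f = [1, -2, 3]
g = [1, 1]
h = convolve(f, g)
print(h, l1_norm(h), l1_norm(f) * l1_norm(g))
```

The inequality is strict here because terms of opposite sign cancel inside the convolution sums, whereas the product of the norms ignores all cancellation.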

$$\int |f * g(x)|\,dx \le \int \left(\int |h(x, y)|\,dy\right) dx = \|f\|_1\|g\|_1$$

$\int \left(\int |h(x, y)|\,dx\right) dy = \int |g(y)|\,\|f\|_1\,dy = \|f\|_1\|g\|_1$, which is finite. Thus by Fubini's theorem $\int h(x, y)\,dy = f * g(x)$ is finite for almost every $x$ and

Here we have used the translation invariance of Lebesgue measure.Integrating this identity we obtain

Proof: Since $f$ and $g$ are measurable, it follows easily that $h(x, y) = f(x - y)g(y)$ is a measurable function on $\mathbb{R}^2$. We want to apply the third version of Fubini's theorem to $h(x, y)$, and we will compute the iterated integral in the reverse order to what one might expect. For almost every $y$, $g(y)$ is finite and

$$\begin{aligned} \int |h(x, y)|\,dx &= \int |f(x - y)g(y)|\,dx \\ &= |g(y)| \int |f(x - y)|\,dx \\ &= |g(y)| \int |f(x)|\,dx = |g(y)|\,\|f\|_1. \end{aligned}$$

$f(x - y)g(y)$ is an integrable function and $f * g(x) = \int h(y)\,dy$ so defined for almost every $x$ (it doesn't matter how it is defined on the remaining set of measure zero) is an integrable function. In addition we have the estimate $\|f * g\|_1 \le \|f\|_1\|g\|_1$.


In this section we will explain the technicalities involved in proving measurability in Fubini's theorem. The key idea is a set-theoretic fact

15.1.3 The Monotone Class Lemma*

provided $f$ is non-negative, or integrable, or one iterated integral is finite for $|f|$. This is especially important in probability theory, where the choice of the product measure is interpreted as saying the events represented by $x$ and $y$ are independent. If $\mu$ and $\nu$ are counting measure on the positive integers, then Fubini's theorem is equivalent to the interchange of order of summation for absolutely convergent doubly indexed series.
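For counting measure, the need for absolute convergence can be seen in a standard doubly indexed example, sketched here in Python. The array $a_{jk}$ below (equal to $1$ when $j = k$, $-1$ when $j = k + 1$, and $0$ otherwise) is a choice made for this illustration, not an example from the text, and the function names are hypothetical. Each inner sum has only finitely many nonzero terms, so it can be computed exactly; only the outer sum is truncated.

```python
def a(j, k):
    """Entry of a doubly indexed array with sum of |a(j, k)| infinite."""
    if j == k:
        return 1
    if j == k + 1:
        return -1
    return 0

def sum_over_k_then_j(J):
    # For fixed j the inner sum over all k >= 1 is finite: a(j, k) = 0 for k > j.
    return sum(sum(a(j, k) for k in range(1, j + 1)) for j in range(1, J + 1))

def sum_over_j_then_k(K):
    # For fixed k the inner sum over all j >= 1 is finite: a(j, k) = 0 for j > k + 1.
    return sum(sum(a(j, k) for j in range(1, k + 2)) for k in range(1, K + 1))

print(sum_over_k_then_j(50), sum_over_j_then_k(50))
```

The two orders of summation give different values for every truncation level, which is consistent with Fubini's theorem because the hypothesis that one iterated sum of $|a_{jk}|$ be finite fails for this array.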

$$\int \left(\int f(x, y)\,d\nu(y)\right) d\mu(x) \quad\text{or}\quad \int \left(\int f(x, y)\,d\mu(x)\right) d\nu(y)$$

product of two functions, it would seem plausible that by placing the singularities of $f$ and $g$ so that they reinforce each other in the product $f(x - y)g(y)$ it would be possible to construct integrable functions such that $f * g(x) = +\infty$ for every $x$. The theorem shows that this is not possible. There are many variants of this theorem: for example, if $f$ is in $L^1$ and $g$ is in $L^2$, then $f * g$ is in $L^2$; or if $f$ and $g$ are both in $L^2$, then $f * g$ is bounded and continuous.

Fubini's theorem generalizes not only to $\mathbb{R}^n$ but to any situation where there are functions of several variables. If $(X, \mathcal{F}, \mu)$ and $(Y, \mathcal{G}, \nu)$ are two measure spaces (a certain technical condition is required that $X$ and $Y$ be expressible as a countable union of sets of finite measure, but this condition is satisfied by most examples), then we can consider functions $f(x, y)$ of variables $x$ in $X$ and $y$ in $Y$ to be a function on the Cartesian product $X \times Y$. There is a $\sigma$-field of subsets of $X \times Y$ called $\mathcal{F} \times \mathcal{G}$, which is the $\sigma$-field generated by the "rectangles" $A \times B$ where $A$ is in $\mathcal{F}$ and $B$ is in $\mathcal{G}$ (of course in $\mathbb{R}^2$ these are not the conventional rectangles, since the "sides" are allowed to be arbitrary measurable subsets of $\mathbb{R}$). On this $\sigma$-field there is a product measure $\mu \times \nu$ that gives the correct area of a rectangle: $\mu \times \nu(A \times B) = \mu(A)\nu(B)$. Then Fubini's theorem says the integral $\int f\,d\mu \times \nu$ is equal to the iterated integrals


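Proof: Let $\mathcal{F}_1$ denote the $\sigma$-field generated by $\mathcal{F}$, and let $\mathcal{M}_1$ be the monotone class generated by $\mathcal{F}$ (the smallest monotone class containing $\mathcal{F}$). Clearly $\mathcal{M}_1 \subseteq \mathcal{M}$ since $\mathcal{M}$ is a monotone class containing $\mathcal{F}$. We will show that $\mathcal{M}_1$ is a field, and this will complete the proof because it follows easily that $\mathcal{M}_1$ is a $\sigma$-field and, hence, contains $\mathcal{F}_1$ (actually $\mathcal{M}_1 = \mathcal{F}_1$ since $\mathcal{F}_1$ is also a monotone class).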

For each set $A$ in $\mathcal{M}_1$, define $\mathcal{M}_1(A)$ to be the collection of sets $B$ in $\mathcal{M}_1$ such that $A \cap B$, $A \cup B$, $A \setminus B$, and $B \setminus A$ are also in $\mathcal{M}_1$. (This is the trick!) Notice the symmetry between $A$ and $B$ in the definition, so $B$ is in $\mathcal{M}_1(A)$ if and only if $A$ is in $\mathcal{M}_1(B)$. Of course there is no reason to believe a priori that $\mathcal{M}_1(A)$ contains any sets other than the empty set and $A$. But we will eventually show that $\mathcal{M}_1(A)$ is all of $\mathcal{M}_1$, which will imply that $\mathcal{M}_1$ is a field.

First we claim that $\mathcal{M}_1(A)$ is a monotone class. This is easy because each of the operations that defines $\mathcal{M}_1(A)$ preserves monotone sequences. For example, if $B_1 \subseteq B_2 \subseteq \cdots$, then $A \setminus B_1 \supseteq A \setminus B_2 \supseteq \cdots$ (notice the switch from increasing to decreasing, but this doesn't matter because the definition of monotone classes involves both). This means that if $\mathcal{M}_1(A)$ contains $\mathcal{F}$, then it must be all of $\mathcal{M}_1$. But how do we get started? We use the fact that $\mathcal{F}$ is a field; so if $A$ and $B$ are in $\mathcal{F}$, then $B$ is in $\mathcal{M}_1(A)$. That means for any set $A$ in $\mathcal{F}$, $\mathcal{M}_1(A)$ contains $\mathcal{F}$; hence, $\mathcal{M}_1(A) = \mathcal{M}_1$.

Lemma 15.1.1 (Monotone Class) Let $\mathcal{M}$ be a monotone class that contains a field $\mathcal{F}$. Then $\mathcal{M}$ contains the $\sigma$-field generated by $\mathcal{F}$.

called the Monotone Class Lemma. The proof of this fact is very tricky although not very long. It serves as a device for finessing the difficulty of dealing with differences of sets. We define a monotone class $\mathcal{M}$ to be a collection of sets (subsets of the universe $X$) that is closed under monotone increasing and decreasing sequences: if $A_1 \subseteq A_2 \subseteq \cdots$ with $A_j$ in $\mathcal{M}$, then $\bigcup_{j=1}^{\infty} A_j$ is in $\mathcal{M}$; and if $B_1 \supseteq B_2 \supseteq \cdots$ with $B_j$ in $\mathcal{M}$, then $\bigcap_{j=1}^{\infty} B_j$ is in $\mathcal{M}$. These are conditions that are easy to check, especially using the monotone and dominated convergence theorems. It is easy to see that a $\sigma$-field is a monotone class, but a monotone class need not be a $\sigma$-field since it does not have to be closed under complements or differences. The Monotone Class Lemma gives a simple criterion for a monotone class to contain a $\sigma$-field.


This lemma also implies the measurability of $f(x, y)$ as a function of $x$ for fixed $y$ and $\int f(x, y)\,dx$ as a function of $y$ for any simple function

is measurable, being the limit of measurable functions. Thus the monotone class lemma implies that $\mathcal{M}$ contains all Borel subsets of $[a, b] \times [c, d]$.

For a general Borel subset $A$ of $\mathbb{R}^2$ we write it as the countable disjoint union of $A \cap [m_1, m_1 + 1) \times [m_2, m_2 + 1)$ as $m_1$ and $m_2$ vary over all integers. The conclusions are preserved under countable disjoint unions (the function $|A_y|$ may now assume the value $+\infty$). QED

$$\int_a^b \chi_{\bigcup_j A_j}(x, y)\,dx = \lim_{j\to\infty} \int_a^b \chi_{A_j}(x, y)\,dx$$

Proof: First we prove the analogous result for Borel subsets of a finite rectangle $[a, b] \times [c, d]$. Let $\mathcal{M}$ denote the class of Borel sets $A$ for which the conclusions of the lemma hold. Clearly $\mathcal{M}$ contains the field generated by the rectangles. We claim $\mathcal{M}$ is a monotone class. For this we simply apply the dominated convergence theorem to interchange the integral with limits. We can use a constant function as dominator since we are in a finite rectangle. For example, if $A_1 \subseteq A_2 \subseteq \cdots$, then the same is true for the sections, and

Lemma 15.1.2 Let $A$ be a Borel subset of $\mathbb{R}^2$. Then every section $A_y$ is a Borel subset of $\mathbb{R}$, and the function $|A_y| = \int_{-\infty}^{\infty} \chi_A(x, y)\,dx$ is a measurable function of $y$ (taking values in the extended real numbers).

Now we can fill in the missing arguments in the proof of Fubini's theorem.

We are almost done, but first we need a clever use of the symmetry condition. For any $B$ in $\mathcal{M}_1$ we know that $B$ is in $\mathcal{M}_1(A)$ for $A$ in $\mathcal{F}$, so $A$ is in $\mathcal{M}_1(B)$. That means $\mathcal{M}_1(B)$ contains $\mathcal{F}$; hence, $\mathcal{M}_1(B) = \mathcal{M}_1$. Since $B$ was any set in $\mathcal{M}_1$, we have succeeded in proving our main claim.

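We can now see that $\mathcal{M}_1$ is a field, because $A$ in $\mathcal{M}_1(B)$ for any $A$ and $B$ in $\mathcal{M}_1$ means $A \cap B$, $A \cup B$, $A \setminus B$, and $B \setminus A$ are also in $\mathcal{M}_1$. By using the field properties we can replace any countable union by a countable increasing union, so $\mathcal{M}_1$ is a $\sigma$-field. QED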


15.1.4 Exercises

1. Write out the details of the proof of the existence of the double integral of a continuous function on a rectangle.

In particular, this uniqueness theorem gives another proof of the uniqueness of Lebesgue measure.

It might seem possible to avoid using the Monotone Class Lemma in these applications and argue directly that the classes we called $\mathcal{M}$ are $\sigma$-fields. It isn't! Try it if you don't believe me.

Proof: Suppose first that $\mu$ and $\nu$ are finite measures. Let $\mathcal{M}$ be the collection of sets $A$ in $\mathcal{F}_1$ for which $\mu(A) = \nu(A)$. It is easy to see that $\mathcal{M}$ is a monotone class, so $\mathcal{M} = \mathcal{F}_1$ by the monotone class lemma. (The finiteness of the measures is needed in order to use the continuity from above for monotone decreasing sequences.) For the general case just split the space into a countable union of sets of finite measure for both $\mu$ and $\nu$. A priori these sets may not belong to $\mathcal{F}$, but this can always be arranged (see exercise set 15.1.4 for details). Then apply the special case of finite measures to each of the pieces and sum, using $\sigma$-additivity. QED

Theorem 15.1.5 (Hahn Uniqueness Theorem) Let $\mu$ and $\nu$ be $\sigma$-finite measures on a $\sigma$-field $\mathcal{F}_1$ generated by a field $\mathcal{F}$. If $\mu$ and $\nu$ are equal on $\mathcal{F}$, then they are equal on $\mathcal{F}_1$.

$f$ (just use the analogous facts for the sets $A_k$ in the decomposition $f = \sum_{k=1}^{N} a_k\chi_{A_k}$), which was used in the proof of Fubini's theorem (first version).

The same argument can be used to justify the definition of the product measure $\mu \times \nu$ for any two finite measures $\mu$ and $\nu$, and more generally for the case of $\sigma$-finite measures, which is defined as follows: $\mu$ is $\sigma$-finite if the space $X$ can be written as a countable union $X = \bigcup_{j=1}^{\infty} X_j$ with $\mu(X_j)$ finite. For if $\mu$ and $\nu$ are $\sigma$-finite and we write $\mu = \sum \mu_j$ and $\nu = \sum \nu_k$ where $\mu_j$ and $\nu_k$ are the restrictions of $\mu$ to $X_j$ and $\nu$ to $Y_k$ (we can always take the decompositions $X = \bigcup X_j$ and $Y = \bigcup Y_k$ to be disjoint), then we can define $\mu \times \nu = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} \mu_j \times \nu_k$.

Another important application of the Monotone Class Lemma is the following uniqueness theorem of Hahn:


2. State and prove (by induction) the equality of all iterated integrals (how many are there?) for a continuous function on a rectangular parallelopiped in $\mathbb{R}^n$.

3. a. Prove that if $K$ is a set in $\mathbb{R}^2$ whose boundary has content zero, then the double integral of a continuous function on $K$ exists.

b. Assume also that every section of $K$ is a finite union of intervals. Prove that the iterated integrals are equal to the double integral.

4. Prove that a $C^1$ bounded curve has content zero.

5. Prove that a finite union of sets of content zero is also a set of content zero.

6. Prove that the graph of $f(x) = \sin(1/x)$ on $[0, 1]$ has content zero.

7. Let $f : \mathbb{R} \to \mathbb{R}$ be measurable, and let $A \subseteq \mathbb{R}^2$ be the graph of $f$. Prove that $A$ has measure zero. (Hint: use Fubini's theorem.)

8. Let $A$ be a measurable subset of $\mathbb{R}^2$. Prove that $A$ has measure zero if and only if almost every section $A_y$ has measure zero.

9. Prove that if $f : \mathbb{R}^2 \to \mathbb{R}$ is bounded, measurable, and vanishes outside a bounded region, then the iterated integrals are equal.

10. Prove that the convolution of two $L^2$ functions is bounded.

11. Define a periodic convolution $f * g(x) = \int_{-\pi}^{\pi} f(x - y)g(y)\,dy$ where $f$ and $g$ are periodic of period $2\pi$. Prove that if $f$ and $g$ restricted to $(-\pi, \pi)$ are integrable, then $f * g(x)$ is defined a.e. and is also integrable on $(-\pi, \pi)$.

12. Give an example of a function on $\mathbb{R}^2$ for which both iterated integrals exist and are finite but unequal. Can you do it with a continuous function?

13. State each of the three versions of Fubini's theorem for counting measure, in terms of doubly indexed series.

14. Give an example of a measure that is not $\sigma$-finite.


for suitable functions $f$ and $g$, where $g : D' \to D$ is a one-to-one change of variable ($D'$ and $D$ are suitable regions in $\mathbb{R}^n$). This result is usually stated in calculus courses with an intuitive argument that $|\det dg(x)|$ is the correct magnification factor for relating the volumes in the $x$ and $y = g(x)$ variables. This argument is based on the observation that $|\det dg(x)|$ is the correct magnification factor if $g$ is a linear (or affine) transformation, and the general case follows by localization via the differential. We will follow the same outline, presenting the complete details. It is not an easy theorem to prove, even under strong assumptions on the domains $D$ and $D'$ and the functions $f$ and $g$.

We begin with the case of a linear transformation $g(x) = Ax$, where $A$ is an $n \times n$ matrix. Here $dg(x) = A$ is constant. The claim is that if $R$ is any rectangle, then the volume of $g(R)$ is exactly $|\det A|$ times the volume of $R$.
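In the plane this claim can be checked directly. The sketch below is an illustration under stated assumptions: it works only for $n = 2$, places one corner of the rectangle at the origin, and computes the area of the image parallelogram with the shoelace formula, which is a technique brought in for the check rather than the argument used in the text. All function names are hypothetical.

```python
def det2(A):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    return a * d - b * c

def apply_matrix(A, v):
    """Multiply the 2x2 matrix A by the column vector v = (x, y)."""
    (a, b), (c, d) = A
    x, y = v
    return (a * x + b * y, c * x + d * y)

def polygon_area(vertices):
    """Unsigned area of a simple polygon via the shoelace formula."""
    twice_signed_area = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice_signed_area += x0 * y1 - x1 * y0
    return abs(twice_signed_area) / 2

def image_area_of_rectangle(A, width, height):
    """Area of g(R) for g(x) = Ax and R = [0, width] x [0, height]."""
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    return polygon_area([apply_matrix(A, v) for v in corners])

A = ((2, 1), (1, 3))
print(image_area_of_rectangle(A, 2, 3), abs(det2(A)) * 2 * 3)
```

The sign of the shoelace sum before taking the absolute value records whether $A$ preserves or reverses orientation, which is the role played by the sign of the determinant.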

Let us think of the $n \times n$ matrix $A$ as being composed of $n$ column vectors $A_1, \ldots, A_n$. These vectors are the images of the standard basis $e^{(1)}, \ldots, e^{(n)}$ for $\mathbb{R}^n$ under multiplication by $A$. Now the unit cube consists of all vectors $\sum_{j=1}^n b_j e^{(j)}$ where $0 \le b_j \le 1$, and this is mapped under multiplication by $A$ into the parallelopiped of all vectors of the form $\sum_{j=1}^n b_j A_j$ for $0 \le b_j \le 1$. Let us call this the parallelopiped generated by $A_1, \ldots, A_n$. The unit cube is thus the parallelopiped generated by the standard basis vectors $e^{(1)}, \ldots, e^{(n)}$. The volume of the parallelopiped generated by $A_1, \ldots, A_n$ is clearly the magnification factor for which we are looking, because the unit cube has volume one and it is clear by linear algebra that the same factor relates the volume

$\int_D f(y)\,dy = \int_{D'} f(g(x))\,|\det dg(x)|\,dx$

Our goal is to establish the change of variable formula

15.2.1 Determinants and Volume

15.2 Change of Variable in Multiple Integrals

15. If $\mu$ is a $\sigma$-finite measure on a $\sigma$-field $\mathcal{F}_1$ generated by a field $\mathcal{F}$, show that in the decomposition $X = \bigcup_{j=1}^\infty X_j$ with $\mu(X_j) < \infty$ it is possible to take each $X_j$ in $\mathcal{F}$.


Any function satisfying properties 1, 2, and 3 is called a skew-symmetric multilinear form. Property 4 is just a normalizing condition. A theorem of linear algebra says that the space of skew-symmetric multilinear forms in $n$ vectors from $\mathbb{R}^n$ is one-dimensional, so together with the normalizing condition 4 this uniquely determines the determinant.

In view of this uniqueness result, in order to show that the volume of the parallelopiped generated by $A_1, \ldots, A_n$ is $|\det(A_1, \ldots, A_n)|$ we want to invent a signed volume for parallelopipeds that will have all the properties of the determinant. We give the argument first in the case $n = 2$ where we can draw pictures.

Let $A_1$ and $A_2$ be two vectors in $\mathbb{R}^2$. We define $m(A_1, A_2)$ to be $\pm$ the area of the parallelogram with sides $A_1$ and $A_2$, where we take the plus sign if the angle from $A_1$ to $A_2$ is less than 180° and the minus sign if it is more than 180°, as indicated in Figure 15.2.1.

4. $\det(e^{(1)}, \ldots, e^{(n)}) = 1$.

3. it is skew-symmetric (or sometimes called alternating), meaning for any $j \ne k$

$\det(A_1, \ldots, A_j, \ldots, A_k, \ldots, A_n) = -\det(A_1, \ldots, A_k, \ldots, A_j, \ldots, A_n)$;

2. it is multilinear, meaning that for each $k = 1, \ldots, n$ we have

$\det(A_1, \ldots, aA_k + bB_k, \ldots, A_n) = a\det(A_1, \ldots, A_k, \ldots, A_n) + b\det(A_1, \ldots, B_k, \ldots, A_n)$;

1. $\det(A_1, \ldots, A_n)$ is real-valued;

of $A(R)$ to the volume of $R$ for any rectangle $R$ (just decompose $R$ into an approximate union of cubes).

We need to show that the volume of the parallelopiped generated by $A_1, \ldots, A_n$ is equal to $|\det A|$. The key to understanding why this is so is to get rid of the absolute value. Let us consider $\det A$ as a function of the $n$ column vectors $A_1, \ldots, A_n$ that comprise $A$. With this understanding we will write $\det(A_1, \ldots, A_n)$. We now recall the basic properties of this function, properties that in fact will characterize it uniquely:


If the angle is 0° or 180°, then the area is zero, so we need not determine the sign. Note that this definition automatically makes $m$ skew-symmetric, $m(A_1, A_2) = -m(A_2, A_1)$, and it clearly satisfies the normalization condition $m(e^{(1)}, e^{(2)}) = 1$. Thus the only condition we need to check is the bilinearity; and by the skew-symmetry it suffices to establish it for one of the variables, say the second. Now fix $A_1 \ne 0$ (if $A_1 = 0$ there is nothing to prove), and let $A_2$ be any vector such that $A_1$ and $A_2$ are linearly independent. We claim $m(A_1, a_1A_1 + a_2A_2) = a_2\,m(A_1, A_2)$ for any real $a_1$ and $a_2$, and this will prove the linearity because every vector in $\mathbb{R}^2$ has the form $a_1A_1 + a_2A_2$ and $a_2\,m(A_1, A_2)$ is linear in $(a_1, a_2)$. To prove this identity we compare the parallelograms with sides $A_1, a_2A_2$ and sides $A_1, a_1A_1 + a_2A_2$, as shown in Figure 15.2.2. Notice that they have the same base ($|A_1|$) and the same altitude, hence the same area, and also the angle between the sides is always on the same side of 180°; so we have $m(A_1, a_1A_1 + a_2A_2) = m(A_1, a_2A_2)$. Next it is clear that the factor $a_2$ multiplying $A_2$ changes the area by $|a_2|$ since this is the change in the altitude. Also, the angle remains the same if $a_2 > 0$ and changes by 180° if $a_2 < 0$. Thus $m(A_1, a_2A_2) = a_2\,m(A_1, A_2)$.

By the uniqueness of the determinant we conclude that $m(A_1, A_2) = \det(A_1, A_2)$, which completes the proof that the area of the parallelogram generated by $A_1, A_2$ is $|\det(A_1, A_2)|$. In $\mathbb{R}^n$ the proof is similar. We define $m(A_1, \ldots, A_n)$ to be the volume of the parallelopiped generated by $A_1, \ldots, A_n$ multiplied by $\pm 1$. The easiest way to specify the choice of the sign is to take the sign of $\det(A_1, \ldots, A_n)$. From

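Figure 15.2.1: sign convention for $m(A_1, A_2)$, determined by the angle from $A_1$ to $A_2$ (figure not reproduced).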


Theorem 15.2.1 Let $g(x) = Ax$ where $A$ is an $n \times n$ matrix. If $R$ is any rectangle, then $\mathrm{vol}(g(R)) = |\det A|\,\mathrm{vol}(R)$.
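As a quick numerical illustration of Theorem 15.2.1 in the plane, the following sketch maps the corners of a rectangle by a linear map and compares the area of the image (computed with the shoelace formula) to $|\det A|$ times the area of the rectangle. The particular columns `a1`, `a2` and the rectangle `rect` are hypothetical values chosen only for the illustration; the shoelace formula is a standard tool that is not part of the text's argument.

```python
def det2(a, b):
    """Determinant of the 2x2 matrix whose columns are the vectors a and b."""
    return a[0] * b[1] - a[1] * b[0]


def polygon_area(vertices):
    """Unsigned area of a simple polygon (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def image_of_rectangle(a1, a2, rect):
    """Corners of g(R) for g(x) = A x, where A has columns a1, a2 and
    rect = (x_lo, x_hi, y_lo, y_hi)."""
    x_lo, x_hi, y_lo, y_hi = rect
    corners = [(x_lo, y_lo), (x_hi, y_lo), (x_hi, y_hi), (x_lo, y_hi)]
    return [(a1[0] * x + a2[0] * y, a1[1] * x + a2[1] * y) for x, y in corners]


# Hypothetical data: columns of A and a rectangle of area 2 * 3 = 6.
a1, a2 = (2.0, 1.0), (-1.0, 3.0)
rect = (0.5, 2.5, -1.0, 2.0)
rect_area = (rect[1] - rect[0]) * (rect[3] - rect[2])

area_image = polygon_area(image_of_rectangle(a1, a2, rect))
area_predicted = abs(det2(a1, a2)) * rect_area
print(area_image, area_predicted)
```

Because a linear map sends the rectangle to a parallelogram, the image is again a simple polygon, which is why the corner-based area computation is legitimate here.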

the properties of det this guarantees the skew-symmetry and the correct normalization $m(e^{(1)}, \ldots, e^{(n)}) = 1$. To show the multilinearity it suffices to prove linearity in the last factor because of the skew-symmetry. Let $A_1, \ldots, A_{n-1}$ be fixed vectors that we may assume to be linearly independent, for otherwise $m(A_1, \ldots, A_{n-1}, A_n) = 0$ for any choice of $A_n$. We choose one more vector $A_n$, so $A_1, \ldots, A_n$ form a basis for $\mathbb{R}^n$. We will show $m(A_1, \ldots, A_{n-1}, a_1A_1 + \cdots + a_nA_n) = a_n\,m(A_1, \ldots, A_{n-1}, A_n)$, which implies the linearity. We can use the same argument as before to throw away the terms $a_1A_1 + \cdots + a_{n-1}A_{n-1}$ since they don't change the base (the parallelopiped of dimension $n - 1$ generated by $A_1, \ldots, A_{n-1}$) and they don't change the altitude (the distance from $a_1A_1 + \cdots + a_nA_n$ to the base); hence they leave unchanged the volume = altitude × area of base and also the determinant (using properties of the determinant), so they don't change the sign. So $m(A_1, \ldots, A_{n-1}, a_1A_1 + \cdots + a_nA_n) = m(A_1, \ldots, A_{n-1}, a_nA_n)$. Finally the factor $a_n$ changes the measure by $|a_n|$ since it doesn't change the base but multiplies the altitude by $|a_n|$; and the determinant is changed by a factor of $a_n$, so the sign changes appropriately. Thus we have established:

Figure 15.2.2:


Since we have shown that the absolute value of the determinant of a linear transformation gives the exact magnification factor for volumes under the transformation, the basic principle of differential calculus suggests that for a $C^1$ mapping $g : U \to \mathbb{R}^n$ ($U$ an open set in $\mathbb{R}^n$) the absolute value of the determinant of the derivative $J(x) = |\det dg(x)|$ should give an approximate local magnification factor for volumes under the mapping near $x$. We call $J(x)$ the Jacobian factor. We then expect to get the exact volume of $g(U)$ by integrating $J(x)$ over $U$, if $g$ is one-to-one. It turns out to be extremely difficult to prove this. Before beginning the proof we need to understand some of the difficulties we will encounter.

If we fix a point $\bar{x}$ and use the differentiability of $g$ at $\bar{x}$ to write $g(x) = g(\bar{x}) + dg(\bar{x})(x - \bar{x}) + o(|x - \bar{x}|)$, then the mapping $g_1(x) = g(\bar{x}) + dg(\bar{x})(x - \bar{x})$ satisfies $\mathrm{vol}(g_1(R)) = J(\bar{x})\,\mathrm{vol}(R)$ for any rectangle $R$. We would like to say $\mathrm{vol}(g(R)) \approx \mathrm{vol}(g_1(R))$ with the approximation improving as the size of $R$ decreases. But we will need to have an estimate that is uniform in $x$, and for that we will need to use the continuity of the derivative. We will use this idea to prove the upper bound $\mathrm{vol}(g(U)) \le \int_U J(x)\,dx$. But the lower bound is more delicate. Suppose $g_1(R)$ is a long skinny rectangle, say $[0, \varepsilon^{-1}] \times [0, \varepsilon]$ in $\mathbb{R}^2$, with volume one. Then by moving each point in $g_1(R)$ by a distance of only $\varepsilon$ (send $(x, y)$ to $(x, 0)$) we can squash the rectangle down to a segment of the $x$-axis, $[0, \varepsilon^{-1}] \times 0$, so the volume drops to zero. So even if we know that $g(R)$ is point-by-point close to $g_1(R)$, we cannot conclude that $\mathrm{vol}(g(R))$ is close to $\mathrm{vol}(g_1(R))$. Instead, we will use the inverse function theorem and the upper bound estimate for the inverse mapping, but for this we will have to avoid points where $J(x) = 0$, or even points where $J(x)$ is close to zero. Fortunately these points contribute very little to the integral, so we can omit them when we try to prove the lower bound $\mathrm{vol}(g(U)) \ge \int_U J(x)\,dx$.

We begin with a simple geometric lemma that gives an upper bound for the volume of a neighborhood of a set. We define the $\varepsilon$-neighborhood $A_\varepsilon$ of $A$ to be the set of points of distance at most $\varepsilon$ from points in $A$. This definition makes sense for any set $A$; but we will only need to consider parallelopipeds, which are the images of rectangles under linear mappings.

15.2.2 The Jacobian Factor*


$g(x) - g(\bar{x}) = \int_0^1 dg(\bar{x} + s(x - \bar{x}))(x - \bar{x})\,ds,$

Now let $C_\delta$ denote a cube with center at $\bar{x}$ and side length $\delta$. If $g$ is a $C^1$ mapping in a neighborhood of $\bar{x}$ we want to compare $\mathrm{vol}(g(C_\delta))$ with $J(\bar{x})\delta^n$, which is the volume of $g_1(C_\delta)$, where $g_1(x) = g(\bar{x}) + dg(\bar{x})(x - \bar{x})$ is the best affine approximation to $g$ at $\bar{x}$. We would like to have a good approximation for $\delta$ small, but it is important that we be able to control the size of $\delta$ independent of the point $\bar{x}$. The next lemma tells us how to do this in terms of the variation of $dg(x)$ as $x$ varies over $C_\delta$. Since $dg(x)$ is an $n \times n$ matrix, it is natural to measure its size by the matrix norm (recall that $\|M\|$ is the smallest constant $c$ such that $|Mx| \le c|x|$ for all $x$ in $\mathbb{R}^n$).

Lemma 15.2.2 Suppose $\|dg(x)\| \le M$ and $\|dg(x) - dg(\bar{x})\| \le \varepsilon$ for all $x$ in $C_\delta$. Then $\mathrm{vol}(g(C_\delta)) \le (J(\bar{x}) + c\varepsilon)\delta^n$ where $c$ is a constant that depends only on $n$ and $M$.

Proof: The hypothesis $\|dg(x)\| \le M$ easily implies that $g_1(C_\delta)$ lies in a ball of radius $c\delta$ (here we use the letter $c$ to denote a constant that depends only on $n$ and $M$ with the understanding that different occurrences may stand for different constants). We will apply the previous lemma with $r = c\delta$. To do this we need an estimate for the difference $g(x) - g_1(x)$. This is provided by the fundamental theorem of calculus:

Lemma 15.2.1 Let $A$ be a parallelopiped contained in a ball of radius $r$ in $\mathbb{R}^n$. Then there exists a constant $c$ depending only on $n$ such that

$\mathrm{vol}(A_\varepsilon) \le \mathrm{vol}(A) + c\varepsilon(r + \varepsilon)^{n-1}$ for all $\varepsilon \le 1$.
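For $n = 2$ the bound in Lemma 15.2.1 can be checked against an exact formula. The sketch below assumes Steiner's formula for convex planar sets (area of the $\varepsilon$-neighborhood equals area plus perimeter times $\varepsilon$ plus $\pi\varepsilon^2$), which is a standard fact but is not used in the text, and it takes $c = 8$ as one admissible constant: each side of a parallelogram inside a ball of radius $r$ has length at most $2r$, and $\pi \le 8$. The vectors `u`, `v` are hypothetical.

```python
import math


def parallelogram_data(u, v):
    """Area, perimeter, and circumscribing radius of the parallelogram with
    vertices 0, u, u + v, v (radius measured from its center)."""
    area = abs(u[0] * v[1] - u[1] * v[0])
    perimeter = 2.0 * (math.hypot(u[0], u[1]) + math.hypot(v[0], v[1]))
    # The vertices farthest from the center lie at half of a diagonal.
    half_diag_1 = 0.5 * math.hypot(u[0] + v[0], u[1] + v[1])
    half_diag_2 = 0.5 * math.hypot(u[0] - v[0], u[1] - v[1])
    return area, perimeter, max(half_diag_1, half_diag_2)


def neighborhood_area(area, perimeter, eps):
    """Steiner's formula (assumed) for the eps-neighborhood of a convex planar set."""
    return area + perimeter * eps + math.pi * eps * eps


def lemma_bound(area, r, eps, c=8.0):
    """Right-hand side vol(A) + c * eps * (r + eps)^(n-1) with n = 2."""
    return area + c * eps * (r + eps)


u, v = (3.0, 0.0), (1.0, 2.0)          # hypothetical sides of the parallelogram
area, perimeter, r = parallelogram_data(u, v)
for eps in (0.01, 0.1, 0.5, 1.0):
    print(eps, neighborhood_area(area, perimeter, eps), lemma_bound(area, r, eps))
```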

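Proof: We give the proof for $n = 2$. Then $A$ is a parallelogram with perimeter $P$. Clearly every point in $A_\varepsilon$ that is not in $A$ must be within distance $\varepsilon$ from $P$, so $A_\varepsilon \subseteq A \cup P_\varepsilon$. Thus $\mathrm{vol}(A_\varepsilon) \le \mathrm{vol}(A) + \mathrm{vol}(P_\varepsilon)$, and the result follows from the estimate $\mathrm{vol}(P_\varepsilon) \le c\varepsilon(r + \varepsilon)$, which is essentially trivial since $P$ consists of four line segments each of length at most $2r$, and $\mathrm{vol}(L_\varepsilon) \le (l + 2\varepsilon)\,2\varepsilon$ if $L$ is a line segment of length $l$.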

The argument for general $n$ is similar, using induction. The key fact is again the uniform upper bound for the $(n - 1)$-dimensional volume of the boundary of $A$ that follows from the fact that $A$ is contained in a ball of radius $r$. We leave the details to the exercises. QED


for $x$ and $\bar{x}$ in the same cube, by the uniform continuity of $dg$ on the compact closure of $U$. With these fixed $\varepsilon$ and $\delta$, write $\{C_j\}$ for the

$\|dg(x) - dg(\bar{x})\| \le \varepsilon$

Proof: Since the closure of $U$ is compact, we have a uniform upper bound $\|dg(x)\| \le M$. This enables us to use the previous lemma. It also allows us to conclude that the boundary of $g(U)$ has content zero (see exercises), so the integrals over $g(U)$ are well defined.

We subdivide $U$ into cubes of side length $\delta$, discarding those that do not lie entirely in $U$. Given any $\varepsilon$, we can choose $\delta$ so that

More generally, if $f$ is any non-negative continuous function on the closure of $g(U)$, then $\int_{g(U)} f(x)\,dx \le \int_U f(g(x))J(x)\,dx$.

$\mathrm{vol}(g(U)) \le \int_U J(x)\,dx.$

Lemma 15.2.3 Let $g : U \to \mathbb{R}^n$ be a one-to-one map of a bounded open set $U$ whose boundary has content zero, and assume $g$ is $C^1$ on the closure of $U$. Then

QED

$\mathrm{vol}(g(C_\delta)) \le \mathrm{vol}(g_1(C_\delta)) + c\varepsilon\delta(c\delta + c\varepsilon\delta)^{n-1} \le (J(\bar{x}) + c\varepsilon)\delta^n.$

$|g(x) - g_1(x)| \le c\varepsilon\delta,$

hence $g(C_\delta)$ lies in the $c\varepsilon\delta$-neighborhood of $g_1(C_\delta)$. Thus the previous lemma gives the estimate

Thus we have

$|(dg(\bar{x} + s(x - \bar{x})) - dg(\bar{x}))(x - \bar{x})| \le \varepsilon|x - \bar{x}|.$

$g(x) - g_1(x) = \int_0^1 \left(dg(\bar{x} + s(x - \bar{x})) - dg(\bar{x})\right)(x - \bar{x})\,ds.$

Since $\bar{x} + s(x - \bar{x})$ lies in $C_\delta$, we can use the hypothesis to obtain the estimate

so


$\int_{g(U)} f(x)\,dx \le \int_U f(g(x))J(x)\,dx + c\varepsilon\int_U f(g(x))\,dx,$

and the desired upper bound for the integral follows by letting $\varepsilon \to 0$. QED

As $\delta \to 0$ the sum $\sum_j f(g(x_j))J(x_j)\delta^n$ converges to $\int_U f(g(x))J(x)\,dx$ and $c\varepsilon\sum_j f(g(x_j))\delta^n$ converges to $c\varepsilon\int_U f(g(x))\,dx$. It is also true that $\sum_j f(g(x_j))\,\mathrm{vol}(g(C_j))$ converges to $\int_{g(U)} f(x)\,dx$. We leave the details to the exercises. Thus

$\sum_j f(g(x_j))\,\mathrm{vol}(g(C_j)) \le \sum_j f(g(x_j))J(x_j)\delta^n + c\varepsilon\sum_j f(g(x_j))\delta^n.$

and $\mathrm{vol}(U)$ is finite. Since this holds for any $\varepsilon > 0$, we obtain the desired upper bound for $\mathrm{vol}(g(U))$.

To obtain the upper bound for the integral we simply multiply (*) by $f(g(x_j))$ before summing to obtain

$\mathrm{vol}(g(U)) \le \int_U J(x)\,dx + c\varepsilon\,\mathrm{vol}(U),$

With $\varepsilon$ fixed we can let $\delta \to 0$ (the condition relating $\delta$ to $\varepsilon$ allows all values smaller than a fixed $\delta_0$). The sum $\sum_j J(x_j)\delta^n$ converges to $\int_U J(x)\,dx$ and $c\varepsilon\sum_j \delta^n$ converges to $c\varepsilon\,\mathrm{vol}(U)$. Since the boundary of $U$ has content zero, it is easy to show that $\mathrm{vol}(g(\bigcup_j C_j))$ converges to $\mathrm{vol}(g(U))$ (see exercises for the details). Thus we have the estimate

and summing over j we obtain

(*) $\mathrm{vol}(g(C_j)) \le (J(x_j) + c\varepsilon)\delta^n,$

collection of cubes and $\{x_j\}$ for the center points of the cubes. By the previous lemma we have


$\sum_j \int_{C_j} f(g(x))J(x)\,dx \le \sum_j \int_{g(C_j)} f(x)\,dx \le \int_{g(U)} f(x)\,dx,$

When we sum over all such cubes we obtain

$\int_C f(g(x))J(x)\,dx \le \int_{g(C)} f(x)\,dx.$

$\int_U f(g(x))J(x)\,dx \le \int_{g(U)} f(x)\,dx.$

To deal with the general case we cut up $U$ into a union of small cubes, except for a neighborhood of the boundary that does not contribute to the integral in the limit. For each cube $C$ in the partition, either $J(x)$ does or does not vanish on the closure of $C$. If not, we are in the special case already completed, so

by the inverse function theorem (remember that the product of determinants is the determinant of the product, and $dg^{-1}(x)$ is the inverse matrix of $dg(g^{-1}(x))$). So we have the desired reverse inequality

Proof: It suffices to prove the result for non-negative $f$, and we already have the inequality in one direction. We start by proving the reverse inequality under the additional assumption that $J(x)$ never vanishes on the closure of $U$. Then the inverse function theorem implies that $g^{-1}$ is a $C^1$ map from $g(U)$ to $U$ (remember we are assuming $g$ is one-to-one, which is not a consequence of the inverse function theorem). We may thus apply the inequality of the last lemma to the function $F(x) = f(g(x))J(x)$ on $U = g^{-1}(g(U))$ and the map $g^{-1} : g(U) \to U$. We have $\int_U F(x)\,dx \le \int_{g(U)} F(g^{-1}(x))J_1(x)\,dx$ where $J_1(x) = |\det dg^{-1}(x)|$ is the Jacobian factor for $g^{-1}$. But

$F(g^{-1}(x))J_1(x) = f(x)J(g^{-1}(x))J_1(x) = f(x)$

$\int_{g(U)} f(x)\,dx = \int_U f(g(x))J(x)\,dx.$

Theorem 15.2.2 (Change of Variable Formula) Let $g : U \to \mathbb{R}^n$ be a one-to-one map of a bounded open set $U$ whose boundary has content zero, and assume $g$ is $C^1$ on the closure of $U$. Also let $f$ be any continuous function on the closure of $g(U)$. Then


$\int_{\mathbb{R}^2} f(x, y)\,dx\,dy = \int_{-\pi}^{\pi}\int_0^\infty f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta.$
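The polar formula can be tried out numerically. The following is a minimal sketch, assuming a midpoint rule on a truncated range $0 < r < r_{\max}$ (the truncation and the grid sizes are choices made for the illustration, not part of the text). Applied to $f(x, y) = e^{-x^2 - y^2}$ it should come out close to $\pi$, since the inner integral $\int_0^\infty e^{-r^2} r\,dr$ equals $1/2$.

```python
import math


def polar_integral(f, r_max, n_r, n_theta):
    """Midpoint-rule approximation of the iterated integral
    int_{-pi}^{pi} int_0^{r_max} f(r cos t, r sin t) r dr dt."""
    dr = r_max / n_r
    dt = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            t = -math.pi + (j + 0.5) * dt
            # The extra factor r is the Jacobian factor det dg = r.
            total += f(r * math.cos(t), r * math.sin(t)) * r
    return total * dr * dt


def gaussian(x, y):
    return math.exp(-(x * x + y * y))


approx = polar_integral(gaussian, r_max=8.0, n_r=2000, n_theta=16)
print(approx, math.pi)
```

Only a few angular steps are needed here because the integrand is radial; a non-radial $f$ would need a finer grid in $\theta$.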

so det dg = r and, hence,

$dg = \begin{pmatrix} \cos\theta & \sin\theta \\ -r\sin\theta & r\cos\theta \end{pmatrix},$

in the reverse direction. To get a one-to-one correspondence we must omit the origin in $\mathbb{R}^2$ (a set of content zero that doesn't contribute to the integral) and suitably restrict $\theta$, say $-\pi < \theta \le \pi$. Thus we let $D' = \{(r, \theta) : r > 0 \text{ and } -\pi < \theta \le \pi\}$ and $D = \{(x, y) : (x, y) \ne (0, 0)\}$, with $g : D' \to D$ defined by $g(r, \theta) = (r\cos\theta, r\sin\theta)$. Note that $g$ is one-to-one and onto. (The domain $D'$ is not open because it contains the boundary strip $\theta = \pi$; but since this is a set of content zero, we can ignore it. Technically, to apply the theorems as stated, we should take $D'$ to be given by $-\pi < \theta < \pi$ and $D$ to be $\mathbb{R}^2$ with the negative real axis omitted.) We compute easily

$x = r\cos\theta, \quad y = r\sin\theta$

relating the polar coordinates $r$ and $\theta$ to the Cartesian $x$ and $y$ coordinates, and

$r = (x^2 + y^2)^{1/2}, \quad \theta = \arctan\frac{y}{x},$

As an example we consider polar coordinates. The situation in $\mathbb{R}^2$ is familiar. We have

15.2.3 Polar Coordinates

where the sum extends over the cubes for which $J(x)$ does not vanish. But if we form the lower Riemann sum to approximate the integral $\int_U f(g(x))J(x)\,dx$, the cubes where $J(x)$ vanishes contribute zero while the others contribute at most $\int_{C_j} f(g(x))J(x)\,dx$. Thus the lower Riemann sum is bounded above by $\int_{g(U)} f(x)\,dx$, hence so is the integral. This completes the proof of the reverse estimate in the general case. QED


where the first column is $(\partial x_j/\partial r)$, the second column is $(\partial x_j/\partial\theta_1)$, ..., and the last column is $(\partial x_j/\partial\theta_{n-1})$.

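$\begin{pmatrix}
\cos\theta_1 & -r\sin\theta_1 & 0 & \cdots & 0 \\
\sin\theta_1\cos\theta_2 & r\cos\theta_1\cos\theta_2 & -r\sin\theta_1\sin\theta_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
\sin\theta_1\sin\theta_2\cdots\sin\theta_{n-2}\cos\theta_{n-1} & r\cos\theta_1\sin\theta_2\cdots\sin\theta_{n-2}\cos\theta_{n-1} & \cdots & \cdots & -r\sin\theta_1\cdots\sin\theta_{n-1} \\
\sin\theta_1\sin\theta_2\cdots\sin\theta_{n-2}\sin\theta_{n-1} & r\cos\theta_1\sin\theta_2\cdots\sin\theta_{n-2}\sin\theta_{n-1} & \cdots & \cdots & r\sin\theta_1\cdots\sin\theta_{n-2}\cos\theta_{n-1}
\end{pmatrix}$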

If we take D'{(r,9) : O< r < 00, O< 91 < 11', ••• , O< 9n-2 < 11', -11' <9n-1 < 11'}, then 9 maps D' one-to-one onto D = {xn : Xl ~ O,X2 ~O,... , Xn ~ O},which differs from Rn by a set oí content zero. Next wecompute dg =

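$r = \sqrt{x_1^2 + \cdots + x_n^2}$

$\theta_1 = \arccos\frac{x_1}{r}$

$\theta_2 = \arccos\frac{x_2}{r\sin\theta_1}$

$\vdots$

$\theta_{n-2} = \arccos\frac{x_{n-2}}{r\sin\theta_1\cdots\sin\theta_{n-3}}$

$\theta_{n-1} = \arctan\frac{x_n}{x_{n-1}}.$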

which we write $x = g(r, \theta)$ and in the reverse direction

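$x_1 = r\cos\theta_1$

$x_2 = r\sin\theta_1\cos\theta_2$

$x_3 = r\sin\theta_1\sin\theta_2\cos\theta_3$

$\vdots$

$x_{n-1} = r\sin\theta_1\sin\theta_2\cdots\sin\theta_{n-2}\cos\theta_{n-1}$

$x_n = r\sin\theta_1\sin\theta_2\cdots\sin\theta_{n-2}\sin\theta_{n-1},$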

Next we consider the situation in $\mathbb{R}^n$. We have polar-spherical coordinates $r, \theta_1, \theta_2, \ldots, \theta_{n-1}$ related to Cartesian coordinates $x_1, \ldots, x_n$ by


$\int_{\mathbb{R}^n} f(|x|)\,dx = c_n\int_0^\infty f(r)\,r^{n-1}\,dr.$
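The constant $c_n$ can be computed from the angular part of the Jacobian factor: the variable $\theta_{n-1}$ contributes $2\pi$ and each $\theta_k$ with $1 \le k \le n - 2$ contributes $\int_0^\pi (\sin\theta)^{n-1-k}\,d\theta$. The sketch below approximates these integrals with a midpoint rule and compares the product with the closed form $2\pi^{n/2}/\Gamma(n/2)$ for the area of the unit sphere, a standard formula that the text does not state; the step count is an arbitrary choice.

```python
import math


def sine_power_integral(k, steps=20000):
    """Midpoint-rule approximation of int_0^pi (sin t)^k dt."""
    h = math.pi / steps
    return sum(math.sin((i + 0.5) * h) ** k for i in range(steps)) * h


def angular_constant(n):
    """c_n from the angular factor (sin t_1)^(n-2) ... sin t_(n-2):
    theta_(n-1) contributes 2*pi, theta_k contributes int_0^pi (sin t)^(n-1-k) dt."""
    c = 2.0 * math.pi
    for k in range(1, n - 1):
        c *= sine_power_integral(n - 1 - k)
    return c


def sphere_area(n):
    """Closed form 2 pi^(n/2) / Gamma(n/2), assumed here as the reference value."""
    return 2.0 * math.pi ** (n / 2.0) / math.gamma(n / 2.0)


for n in range(2, 7):
    print(n, angular_constant(n), sphere_area(n))
```

For $n = 2$ and $n = 3$ this gives the familiar values $2\pi$ and $4\pi$.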

This is especially useful if $f$ is a radial function (a function of $r$ alone). Then we can evaluate the polar-spherical integral as an iterated integral, and the $\theta$-integrations just produce a constant $c_n$ depending on the dimension

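$\int_{\mathbb{R}^n} f(x_1, \ldots, x_n)\,dx_1\cdots dx_n = \int_{-\pi}^{\pi}\int_0^\pi\cdots\int_0^\pi\int_0^\infty f(r\cos\theta_1, r\sin\theta_1\cos\theta_2, \ldots, r\sin\theta_1\cdots\sin\theta_{n-1})\,r^{n-1}(\sin\theta_1)^{n-2}(\sin\theta_2)^{n-3}\cdots\sin\theta_{n-2}\,dr\,d\theta_1\,d\theta_2\cdots d\theta_{n-1}.$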

Thus we have the integration formula

QED

$\det dg = r\sin\theta_1\cdots\sin\theta_{n-2}\left(r^{n-2}(\sin\theta_1)^{n-3}\cdots\sin\theta_{n-3}\right) = r^{n-1}(\sin\theta_1)^{n-2}\cdots\sin\theta_{n-2}.$

where $g_{n-1}$ denotes the $(n - 1)$-dimensional case. Note that the expression in brackets equals 1. Then by the induction hypothesis

$\det dg = r\sin\theta_1\cdots\sin\theta_{n-2}\left[\cos\theta_{n-1}\left(\cos\theta_{n-1} + \frac{\sin^2\theta_{n-1}}{\cos\theta_{n-1}}\right)\right]\det dg_{n-1}$

Proof: We prove the result by induction, the case $n = 2$ having been previously computed. Suppose the result is true for $n - 1$. Notice that the last column of $dg$ (the $\theta_{n-1}$ derivative) has only two non-zero entries. We can eliminate one of these non-zero entries without changing the determinant by multiplying the last row by $\sin\theta_{n-1}/\cos\theta_{n-1}$ and adding it to the $(n - 1)$th row. This has the effect of reducing the upper left $(n - 1) \times (n - 1)$ submatrix of $dg$ to that of the $n - 1$ case except that the last row is multiplied by $\cos\theta_{n-1} + \sin^2\theta_{n-1}/\cos\theta_{n-1}$. Thus

Lemma 15.2.4 $\det dg = r^{n-1}(\sin\theta_1)^{n-2}(\sin\theta_2)^{n-3}\cdots\sin\theta_{n-2}$.
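Lemma 15.2.4 can be spot-checked numerically. The sketch below builds the polar-spherical map from the coordinate formulas, approximates $dg$ by central differences, and evaluates the determinant by Gaussian elimination; the step size `h` and the sample point are hypothetical choices, and the finite-difference Jacobian is only an approximation of the exact matrix used in the proof.

```python
import math


def spherical_to_cartesian(r, thetas):
    """x_1 = r cos t_1, x_2 = r sin t_1 cos t_2, ..., x_n = r sin t_1 ... sin t_(n-1)."""
    xs = []
    prefix = r
    for t in thetas:
        xs.append(prefix * math.cos(t))
        prefix *= math.sin(t)
    xs.append(prefix)
    return xs


def numerical_jacobian(r, thetas, h=1e-6):
    """Central-difference approximation of dg; entry [i][j] is d x_i / d u_j
    with u = (r, t_1, ..., t_(n-1))."""
    u = [r] + list(thetas)
    n = len(u)
    cols = []
    for j in range(n):
        up, dn = u[:], u[:]
        up[j] += h
        dn[j] -= h
        xp = spherical_to_cartesian(up[0], up[1:])
        xm = spherical_to_cartesian(dn[0], dn[1:])
        cols.append([(a - b) / (2.0 * h) for a, b in zip(xp, xm)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            return 0.0
        if p != k:
            a[k], a[p] = a[p], a[k]
            det = -det
        det *= a[k][k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return det


def lemma_formula(r, thetas):
    """r^(n-1) (sin t_1)^(n-2) (sin t_2)^(n-3) ... sin t_(n-2)."""
    n = len(thetas) + 1
    value = r ** (n - 1)
    for k, t in enumerate(thetas[:-1], start=1):
        value *= math.sin(t) ** (n - 1 - k)
    return value


r0, thetas0 = 1.7, [0.6, 1.1, -2.0]     # hypothetical sample point, n = 4
print(determinant(numerical_jacobian(r0, thetas0)), lemma_formula(r0, thetas0))
```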


This definition makes sense even if $g$ is not one-to-one, and it is easy to show that $\nu \circ g^{-1}$ is a measure (see exercise set 15.2.5), because taking inverse images commutes with set-theoretic operations. Intuitively, if you think of $\nu$ as a mass distribution on $X$, then $\nu \circ g^{-1}$ moves the mass via $g$ onto $Y$ (if some regions of $Y$ are hit more than once, then the mass transported by $g$ is simply summed).
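The mass-transport picture is easy to see in a discrete setting. The sketch below is a toy model, assuming a measure given by finitely many point masses stored in a dictionary (all names and values are hypothetical): it pushes the masses forward under a map that is not one-to-one and checks that integrating $f$ against the image measure agrees with integrating $f \circ g$ against the original measure.

```python
from collections import defaultdict


def image_measure(nu, g):
    """Push the point masses nu = {x: mass} forward under g, so that the
    mass at y is the total mass of g^{-1}({y})."""
    pushed = defaultdict(float)
    for x, mass in nu.items():
        pushed[g(x)] += mass      # points hit more than once accumulate mass
    return dict(pushed)


def integrate(f, measure):
    """Integral of f against a finite collection of point masses."""
    return sum(f(p) * mass for p, mass in measure.items())


# Hypothetical data: five point masses on the line and the map g(x) = x^2.
nu = {-2: 0.5, -1: 1.0, 0: 0.25, 1: 2.0, 2: 0.75}
g = lambda x: x * x
f = lambda y: 3 * y + 1

pushed = image_measure(nu, g)
lhs = integrate(f, pushed)                     # integral of f d(nu o g^{-1})
rhs = integrate(lambda x: f(g(x)), nu)         # integral of (f o g) d nu
print(pushed, lhs, rhs)
```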

A basic property of the image measure is the integral identity $\int_Y f\,d(\nu \circ g^{-1}) = \int_X f \circ g\,d\nu$ for any non-negative measurable function $f$ on $Y$ (and more generally for any integrable $f$ with respect to the measure $\nu \circ g^{-1}$). Indeed, if $f$ is the characteristic function of a measurable set $A$ in $\mathcal{F}_2$, $f = \chi_A$, then $f \circ g = \chi_{g^{-1}(A)}$ because $x$ is in

In the context of the Lebesgue integral we can obtain a stronger change of variable theorem, allowing a general Lebesgue integrable function $f$ and also weakening the assumption on the domain. Suppose $U$ is an arbitrary open set in $\mathbb{R}^n$ and $g : U \to \mathbb{R}^n$ is a one-to-one $C^1$ mapping. Notice that we are not assuming anything about the behavior of $g$ on the boundary of $U$. In particular, the Jacobian factor $J(x) = |\det dg(x)|$ does not have to be bounded. By the way, the boundary of a general open set can be quite bizarre; it can even have positive Lebesgue measure, so the integral of a function over the closure of $U$ might be different from the integral over $U$. In most applications of the change of variable formula the set $U$ is rather tame, but frequently we need to deal with unbounded sets and mappings $g$ that do not extend continuously to the boundary. Note that our hypotheses do imply that $g(U)$ is an open set (but this is a deep theorem, Brouwer's Invariance of Domain Theorem, which we will not prove here), but the Jacobian factor may have zeros in $U$ (as is the case for $g(x) = x^3$ on $\mathbb{R}$).

To explain the change of variable formula from the Lebesgue point of view we need a simple construction that gives the image of a measure under a mapping. Suppose we have a measurable function $g : X \to Y$ for two measure spaces (so there are $\sigma$-fields $\mathcal{F}_1$ on $X$ and $\mathcal{F}_2$ on $Y$ such that $g^{-1}(A)$ is in $\mathcal{F}_1$ for every $A$ in $\mathcal{F}_2$). In our case $X$ will be $U$ and $Y$ will be $g(U)$, with the $\sigma$-fields of Borel sets. If $\nu$ is a measure on $X$, then define the image measure $\nu \circ g^{-1}$ on $Y$ by $\nu \circ g^{-1}(A) = \nu(g^{-1}(A))$ for every $A$ in $\mathcal{F}_2$.

15.2.4 Change of Variable for Lebesgue Integrals*


$\int_{g(R)} f\,d\mu = \int_R f \circ g\,d\nu$

for any continuous function $f$ on $g(R)$. In particular, $\mu(g(R)) = \nu \circ g^{-1}(g(R))$. Similarly, if $A$ is any subset of $R$ whose boundary has content zero, then $\mu(g(A)) = \nu \circ g^{-1}(g(A))$.

Consider first the special case of a rectangle $R$ such that $J(x) \ne 0$ on $R$. Then the inverse function theorem implies that $g^{-1}$ is a $C^1$ mapping from $g(R)$ to $R$, so any rectangle $B$ contained in $g(R)$ is of the form $B = g(A)$ where $A = g^{-1}(B)$ has boundary with content zero. Thus $\mu$ and $\nu \circ g^{-1}$ agree on rectangles contained in $g(R)$, hence by the Hahn uniqueness theorem they are equal as measures on $g(R)$.

What do we do with the set where $J(x) = 0$? This is called the critical set for the mapping $g$. Notice that it is a closed set, since $J(x)$ is continuous; and if we restrict attention to a bounded subset of $U$, say $U_r = U \cap \{|x| \le r\}$, then it is compact. Given any $\varepsilon$, we can cover the set of points in $U_r$ where $J(x) = 0$ by rectangles on which $J(x) \le \varepsilon$ (because $J(x)$ is continuous), and by compactness can reduce to a finite subcover, say $R_1, \ldots, R_N$. Also, by further decomposition if

Proof: Let $R$ denote any closed rectangle contained in $U$. Then we can apply the change of variable theorem to the mapping $g : R \to g(R)$ to get

Lemma 15.2.5 If $U$ is open in $\mathbb{R}^n$ and $g : U \to \mathbb{R}^n$ is one-to-one and $C^1$, then $\nu \circ g^{-1}$ is equal to Lebesgue measure on $g(U)$, where $\nu = J(x)\,d\mu|_U(x)$.

$g^{-1}(A)$ if and only if $g(x)$ is in $A$. Thus the integral identity reduces to the definition for characteristic functions. It then follows easily that it holds for simple functions and then for non-negative measurable functions by the definition of the integral. The integral identity is in some ways more natural than the definition since it involves $g$ rather than $g^{-1}$.

What does this have to do with the change of variable formula? If we denote by $\mu$ Lebesgue measure on $\mathbb{R}^n$ and $\mu|_U$ its restriction to $U$ and let $\nu = J(x)\,d\mu|_U(x)$, then one side of the change of variable formula is $\int_U f \circ g\,d\nu$, which is just $\int_{g(U)} f\,d(\nu \circ g^{-1})$. Thus the change of variable formula says that $\nu \circ g^{-1}$ is equal to Lebesgue measure restricted to $g(U)$.


One consequence of the change of variable formula is the fact that the image of the critical set under $g$ has Lebesgue measure zero,

QED

$\int_{g(U)} f\,d(\nu \circ g^{-1}) = \int_U f \circ g\,d\nu.$

Proof: This is an immediate consequence of the lemma and the integral identity

Theorem 15.2.3 Let $U \subseteq \mathbb{R}^n$ be open and $g : U \to \mathbb{R}^n$ be one-to-one and $C^1$. Then for any non-negative measurable function $f$ on $g(U)$,

$\int_{g(U)} f\,d\mu = \int_U (f \circ g)\,J\,d\mu.$

More generally, if $f$ is real- or complex-valued and measurable on $g(U)$, then $f$ is integrable if and only if $(f \circ g)J$ is integrable on $U$, in which case the change of variable formula holds.

$|\mu(A) - \nu \circ g^{-1}(A)| \le cr^n\varepsilon$ for any measurable subset $A$ of $g(U_r)$. First let $\varepsilon \to 0$ to get $\mu = \nu \circ g^{-1}$ on $g(U_r)$, and then let $r \to \infty$ to get $\mu = \nu \circ g^{-1}$ on $g(U)$. QED

measures, so

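$\mu\left(\bigcup_k g(R_k)\right) = \nu \circ g^{-1}\left(\bigcup_k g(R_k)\right) \le \varepsilon\sum_k \mu(R_k) = \varepsilon\mu\left(\bigcup_k R_k\right) \le \varepsilon\mu(U_r) \le cr^n\varepsilon$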

since $U_r$ is contained in the ball $\{|x| \le r\}$. Since $J(x) \ne 0$ on $U_r \setminus \bigcup_k R_k$, and we can write this set as a countable union of rectangles, we know that $\mu = \nu \circ g^{-1}$ on $g(U_r \setminus \bigcup_k R_k) = g(U_r) \setminus \bigcup_k g(R_k)$. But $\bigcup_k g(R_k)$ has measure at most $cr^n\varepsilon$ for both

This means

necessary, we can make the interiors of these rectangles disjoint. We can still apply the change of variable theorem on each rectangle $R_k$, so we have


2. Prove that a one-to-one $C^1$ mapping $g : U \to g(U)$ is measure-preserving ($\mu(g(A)) = \mu(A)$ for every measurable subset $A \subseteq U$) if and only if $\det dg(x) = \pm 1$ for every $x$ in $U$.

1. Evaluate $\int_{-\infty}^{\infty} e^{-x^2}\,dx$ by considering $\iint_{\mathbb{R}^2} e^{-x^2 - y^2}\,dx\,dy$ as an iterated integral and in polar coordinates.

15.2.5 Exercises

Theorem 15.2.5 Suppose $f$ is continuous on $\mathbb{R}^n$ except for a finite set of isolated singularities $a_1, \ldots, a_N$. Suppose we have $|f(x)| \le c_k|x - a_k|^{\alpha_k}$ for $x$ near $a_k$ and $|f(x)| \le c|x|^\beta$ for all large $x$, where $\alpha_k > -n$ for all $k$ and $\beta < -n$. Then $f$ is integrable.
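The threshold $\alpha > -n$ near a singularity comes from the radial integral $\int_0^1 r^\alpha r^{n-1}\,dr$. As a small sketch, the function below evaluates the truncated integral $\int_\varepsilon^1 r^{\alpha + n - 1}\,dr$ in closed form (the function name and sample exponents are hypothetical); it stays bounded as $\varepsilon \to 0$ exactly when $\alpha + n > 0$.

```python
import math


def truncated_radial_integral(alpha, n, eps):
    """int_eps^1 r^alpha * r^(n-1) dr, evaluated in closed form."""
    p = alpha + n
    if p == 0:
        return -math.log(eps)          # logarithmic divergence at alpha = -n
    return (1.0 - eps ** p) / p


n = 3
for alpha in (-2.5, -3.0, -3.5):
    values = [truncated_radial_integral(alpha, n, eps) for eps in (1e-3, 1e-6, 1e-12)]
    print(alpha, values)
```

For $\alpha = -2.5$ the values approach $1/(\alpha + n) = 2$, while for $\alpha = -3$ and $\alpha = -3.5$ they grow without bound as the cutoff shrinks.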

Returning to the example of the integration formula for polar coordinates worked out in section 15.2.3, we have the validity of this result for Lebesgue integrals. In particular, if $f(r)$ is a measurable function on $(0, \infty)$ and we consider the corresponding radial function $f(|x|)$ on $\mathbb{R}^n$, which is also measurable, we see that $f(|x|)$ is integrable on $\mathbb{R}^n$ if and only if $r^{n-1}f(r)$ is integrable on $(0, \infty)$. Recall that $r^\alpha$ is integrable near $r = 0$ if and only if $\alpha > -1$ and that $r^\alpha$ is integrable near $r = \infty$ if and only if $\alpha < -1$. That translates to the condition that $|x|^\alpha$ on $\mathbb{R}^n$ is integrable near the origin if and only if $\alpha > -n$ and integrable near infinity if and only if $\alpha < -n$. In particular, no power $|x|^\alpha$ is globally integrable. By cutting and pasting, we obtain the following useful criterion for integrability on $\mathbb{R}^n$:

Theorem 15.2.4 (Sard) If $g : U \to \mathbb{R}^n$ is $C^1$ and $C = \{x : \det dg(x) = 0\}$, then $g(C)$ has Lebesgue measure zero.

$\mu(g(C)) = 0$ where $C = \{x : J(x) = 0\}$. Indeed, $\mu(g(C)) = \nu(C) = \int_C J\,d\mu = 0$ since $J$ is zero on $C$ and part of the proof of the lemma involved covering the set $g(C)$ by sets of small Lebesgue measure (after localizing to a bounded region). An examination of the proof shows that we really do not need the assumption that $g$ is one-to-one, because all we used was the upper bound $\mu(g(R_k)) \le \int_{R_k} J(x)\,d\mu(x)$ and the proof of this in the previous section did not use the one-to-one hypothesis. This result is of importance in the theory of differential topology, where it goes under the name of Sard's Theorem.


11. a. Suppose $U$ is a bounded open set in $\mathbb{R}^n$ whose boundary has content zero. For each fixed $\delta$, decompose $\mathbb{R}^n$ by the standard tiling with cubes of side length $\delta$; and let $\{C_k\}$ be the collection of cubes in the tiling that lie entirely in $U$. Show that $\sum_k \mathrm{vol}(C_k)$ converges to $\mathrm{vol}(U)$ as $\delta \to 0$.

10. Suppose $U$ is a bounded open set in $\mathbb{R}^n$ whose boundary has content zero. If $g : U \to \mathbb{R}^n$ is $C^1$ on the closure of $U$, show that the boundary of $g(U)$ has content zero.

9. Complete the proof of the estimate $\mathrm{vol}(A_\varepsilon) \le \mathrm{vol}(A) + c\varepsilon(r + \varepsilon)^{n-1}$ for $\varepsilon \le 1$ if $A$ is a parallelopiped in $\mathbb{R}^n$ contained in a ball of radius $r$, where $c$ is a constant depending only on $n$.

7. Prove $\int_{\mathbb{R}^n} f(tx)\,dx = t^{-n}\int_{\mathbb{R}^n} f(x)\,dx$ for any $t > 0$ where $tx = (tx_1, tx_2, \ldots, tx_n)$. Show also $\int_{\mathbb{R}^n} f(tx)\,dx/|x|^n = \int_{\mathbb{R}^n} f(x)\,dx/|x|^n$.

8. Prove $\int_{\mathbb{R}^n} f(x/|x|^2)\,dx/|x|^n = \int_{\mathbb{R}^n} f(x)\,dx/|x|^n$.

6. Define a multiplication on $\mathbb{R}^{2n+1}$ as $(x, y, t) \circ (x', y', t') = (x + x', y + y', t + t' + x \cdot y' - x' \cdot y)$ for $x$ in $\mathbb{R}^n$, $y$ in $\mathbb{R}^n$, $t$ in $\mathbb{R}^1$. Prove that $\int_{\mathbb{R}^{2n+1}} f((x, y, t) \circ (x', y', t'))\,dx\,dy\,dt = \int_{\mathbb{R}^{2n+1}} f(x, y, t)\,dx\,dy\,dt$ for any $(x', y', t')$.

for any non-negative measurable function $f : GL(n, \mathbb{R}) \to \mathbb{R}$, where $xy$ denotes matrix multiplication. Is the same true if we replace $xy$ by $yx$?

$\int_{GL(n, \mathbb{R})} f(xy)\,|\det x|^{-n}\,dx = \int_{GL(n, \mathbb{R})} f(x)\,|\det x|^{-n}\,dx$

5. Let $GL(n, \mathbb{R})$ denote the set of $n \times n$ invertible real matrices. Prove that for any $y$ in $GL(n, \mathbb{R})$,

4. Prove that the set of non-invertible $n \times n$ matrices has measure zero in $\mathbb{R}^{n^2}$.

3. Classify all continuous measure-preserving transformations $g : \mathbb{R} \to \mathbb{R}$. Give an example of a discontinuous measure-preserving transformation $g : \mathbb{R} \to \mathbb{R}$.


Corollary The double and iterated Riemann integrals are equal for a continuous function defined on a compact subset of the plane whose boundary has content zero.

Definition A subset of the plane is said to have content zero if for every $\varepsilon > 0$ there exists a finite covering by rectangles whose areas sum to at most $\varepsilon$.

Theorem For a continuous function on a rectangle $R = [a, b] \times [c, d]$, the double integral is equal to each of the iterated integrals $\int_c^d\left(\int_a^b f(x, y)\,dx\right)dy$ and $\int_a^b\left(\int_c^d f(x, y)\,dy\right)dx$.
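As an illustration of how the Cauchy sums approach the iterated integrals, here is a minimal sketch that uses a uniform partition and evaluates $f$ at cell midpoints (one admissible choice of sample points; the function and rectangle are hypothetical). For $f(x, y) = x^2 y$ on $[0, 1] \times [0, 2]$ the iterated integral is $\left(\int_0^1 x^2\,dx\right)\left(\int_0^2 y\,dy\right) = \frac{1}{3} \cdot 2$.

```python
def cauchy_sum(f, a, b, c, d, n, m):
    """S(f, P) for the uniform partition of [a, b] x [c, d] into n x m cells,
    with f evaluated at the midpoint of each cell."""
    dx = (b - a) / n
    dy = (d - c) / m
    total = 0.0
    for j in range(n):
        x = a + (j + 0.5) * dx
        for k in range(m):
            y = c + (k + 0.5) * dy
            total += f(x, y) * dx * dy
    return total


f = lambda x, y: x * x * y
exact = (1.0 / 3.0) * 2.0                      # value of either iterated integral
for cells in (10, 50, 200):
    print(cells, cauchy_sum(f, 0.0, 1.0, 0.0, 2.0, cells, cells), exact)
```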

where $P$ denotes a partition $a = x_0 < x_1 < \cdots < x_N = b$, $c = y_0 < y_1 < \cdots < y_M = d$, of the sides of the rectangle, $x_{j-1} \le \tilde{x}_j \le x_j$, $y_{k-1} \le \tilde{y}_k \le y_k$, and the limit is taken as the maximum lengths of $x_j - x_{j-1}$ and $y_k - y_{k-1}$ both tend to zero.

$S(f, P) = \sum_{j=1}^N \sum_{k=1}^M f(\tilde{x}_j, \tilde{y}_k)(x_j - x_{j-1})(y_k - y_{k-1})$

Definition If $f$ is a continuous function defined on a rectangle $R = [a, b] \times [c, d]$ in $\mathbb{R}^2$, the Riemann double integral $\iint_R f(x, y)\,dx\,dy$ is defined to be the limit of Cauchy sums

15.3 Summary

12. Let $U$ be a bounded open set in $\mathbb{R}^n$ whose boundary has content zero. Consider finite partitions $U = \bigcup_j A_j$ where the sets $A_j$ have disjoint interiors and boundaries with content zero. For $f$ a continuous function on the closure of $U$, form the generalized Cauchy sums $\sum_j f(x_j)\,\mathrm{vol}(A_j)$ where $x_j$ is an arbitrary point in $A_j$. Prove that these sums converge to $\int_U f(x)\,dx$ as the maximum diameter of the sets $A_j$ tends to zero.

b. Suppose that $g : U \to \mathbb{R}^n$ is one-to-one and $C^1$ on the closure of $U$. Show that $\sum_k \mathrm{vol}(g(C_k))$ converges to $\mathrm{vol}(g(U))$ as $\delta \to 0$.


Theorem 15.1.4 ($L^1$ Convolution Theorem) The convolution $f * g(x) = \int f(x - y)g(y)\,dy$ is well defined for $f$ and $g$ in $L^1$ in the sense that for almost every $x$, the function $h(y) = f(x - y)g(y)$ is integrable. Furthermore, $f * g$ is in $L^1$ and $\|f * g\|_1 \le \|f\|_1\|g\|_1$.
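A discrete analogue makes the norm inequality concrete. The sketch below replaces functions on $\mathbb{R}$ by finite sequences and integrals by sums (an assumption made purely for illustration; the theorem itself concerns Lebesgue integrals), and compares $\|f * g\|_1$ with $\|f\|_1\|g\|_1$ for hypothetical data.

```python
def convolve(f, g):
    """Discrete convolution (f * g)[k] = sum over i + j = k of f[i] * g[j]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def l1_norm(f):
    """Sum of absolute values, the discrete counterpart of the L^1 norm."""
    return sum(abs(v) for v in f)


f = [1.0, 2.0, 3.0]      # hypothetical sequences
g = [1.0, -1.0]
h = convolve(f, g)
print(h, l1_norm(h), l1_norm(f) * l1_norm(g))
```

Cancellation between terms of opposite sign is what makes the inequality strict in this example; for non-negative sequences the two sides agree.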

Theorem 15.1.3 (Fubini's Theorem, third version) Let $f$ be measurable on $\mathbb{R}^2$. If one of the iterated integrals of $|f|$ is finite, then $f$ is integrable and the double and iterated Lebesgue integrals (in both orders) of $f$ are equal.

Example The function

$f(x, y) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ -1 & \text{if } -1 < x < 0, \\ 0 & \text{otherwise} \end{cases}$

is not integrable, but the iterated integral $\int\left(\int f(x, y)\,dx\right)dy = 0$.

Theorem 15.1.2 (Fubini's Theorem, second version) Let $f$ be integrable on $\mathbb{R}^2$. Then the double and iterated Lebesgue integrals are equal; and for almost every $y$, $f(x, y)$ as a function of $x$ is integrable and $\int f(x, y)\,dx$ is integrable as a function of $y$, and the same is true with $x$ and $y$ reversed.

Theorem 15.1.1 (Fubini's Theorem, first version) Let I be a non­negative measurable [unction on R2• Then the double and iteratedLebesgue integrals are equal.

Corollary The Lebesgue integral 01 a non-negative [unction I on R isequal to the Lebesque mensure 01 the region under the graph.

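Lemma If $\mu$ denotes Lebesgue measure on $\mathbb{R}^2$, then

$$\mu(A) = \int \left( \int \chi_A(x, y)\,dx \right) dy = \int \left( \int \chi_A(x, y)\,dy \right) dx$$

for any measurable set $A$, where the iterated Lebesgue integrals are well defined.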


Page 743: Strichartz_The Way of Analysis 2000

Lemma 15.2.2 Let $C_\delta$ denote a cube centered at $\bar{x}$ of side length $\delta$. Suppose $g$ is $C^1$ on $C_\delta$ satisfying $\|dg(x)\| \le M$ and $\|dg(x) - dg(\bar{x})\| \le \varepsilon$

Lemma 15.2.1 There exists $c$ depending only on $n$ such that $\mathrm{vol}(A_\varepsilon) \le \mathrm{vol}(A) + c\varepsilon(r + \varepsilon)^{n-1}$ for $\varepsilon \le 1$ for every parallelopiped $A$ contained in a ball of radius $r$.

Definition The $\varepsilon$-neighborhood $A_\varepsilon$ of a set $A$ is the set of points at distance at most $\varepsilon$ from $A$.

Definition If $g : U \to \mathbb{R}^n$ is a $C^1$ function for $U$ an open set in $\mathbb{R}^n$, then the Jacobian factor is $J(x) = |\det dg(x)|$.

Theorem 15.2.1 If $g(x) = Ax$ where $A$ is an $n \times n$ matrix, then $\mathrm{vol}(g(R)) = |\det A|\,\mathrm{vol}(R)$ for any rectangle $R$.
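A quick check of Theorem 15.2.1 in the plane can be sketched in Python. The helper names (`det2`, `shoelace_area`, `image_of_rectangle`) and the particular matrix and rectangle are assumptions chosen for illustration. The image of a rectangle under a linear map is a parallelogram, so its area can be computed independently with the shoelace formula and compared with $|\det A|\,\mathrm{vol}(R)$.

```python
def det2(A):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    return a * d - b * c


def shoelace_area(points):
    """Area of a simple polygon from its vertices listed in order."""
    s = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


def image_of_rectangle(A, x0, x1, y0, y1):
    """Corners of g(R) for g(x) = Ax and R = [x0, x1] x [y0, y1]."""
    (a, b), (c, d) = A
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return [(a * x + b * y, c * x + d * y) for x, y in corners]


A = ((2.0, 1.0), (0.5, 3.0))          # illustrative matrix, det A = 5.5
image_area = shoelace_area(image_of_rectangle(A, 0.0, 2.0, 1.0, 4.0))
predicted = abs(det2(A)) * (2.0 - 0.0) * (4.0 - 1.0)   # |det A| * vol(R)
```

Both quantities equal 33 for this choice, as the theorem predicts.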

Definition Let $A_1, \ldots, A_n$ denote $n$ vectors in $\mathbb{R}^n$. The parallelopiped generated is defined to be the set of vectors of the form $\sum_{j=1}^{n} b_j A_j$ with $0 \le b_j \le 1$.

15.2 Change of Variable in Multiple Integrals

Theorem 15.1.5 (Hahn Uniqueness Theorem) If two $\sigma$-finite measures are equal on a field, they are equal on the $\sigma$-field generated by the field.

Definition A measure $\mu$ on $X$ is $\sigma$-finite if $X = \bigcup_{j=1}^{\infty} X_j$ and $\mu(X_j)$ is finite for all $j$.

Lemma 15.1.2 If $A$ is a Borel subset of $\mathbb{R}^2$, then every section $A_y$ is a Borel set of $\mathbb{R}$ and the Lebesgue measure of $A_y$ is a measurable function of $y$.

Lemma 15.1.1 (Monotone Class) If a monotone class contains a field, it contains the $\sigma$-field generated by the field.

Definition A monotone class $\mathcal{M}$ is a collection of subsets of $X$ that is closed under monotone increasing and decreasing sequences.


Page 744: Strichartz_The Way of Analysis 2000

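Definition Polar-spherical coordinates in $\mathbb{R}^n$ for $n \ge 3$ are $r, \theta_1, \theta_2, \ldots, \theta_{n-1}$ given by

$$\begin{aligned}
x_1 &= r \cos\theta_1 \\
x_2 &= r \sin\theta_1 \cos\theta_2 \\
x_3 &= r \sin\theta_1 \sin\theta_2 \cos\theta_3 \\
&\;\;\vdots \\
x_{n-1} &= r \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{n-2} \cos\theta_{n-1} \\
x_n &= r \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{n-2} \sin\theta_{n-1}.
\end{aligned}$$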

Example (Polar Coordinates)

$$\int_{\mathbb{R}^2} f(x, y)\,dx\,dy = \int_{-\pi}^{\pi} \int_0^{\infty} f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta.$$
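The polar coordinates formula can be tried numerically. The sketch below is an illustration only: the function name `polar_integral`, the truncation radius, and the midpoint sums are assumptions made here. It evaluates the right-hand side for the Gaussian $f(x, y) = e^{-(x^2 + y^2)}$, whose integral over $\mathbb{R}^2$ is known to be $\pi$.

```python
import math


def polar_integral(f, r_max, n_r, n_theta):
    """Midpoint-rule approximation of the polar-coordinate integral
    of f(r cos t, r sin t) * r over 0 < r < r_max, -pi < t < pi.

    The infinite r-range is truncated at r_max, which is reasonable
    only for integrands that decay quickly.
    """
    dr = r_max / n_r
    dtheta = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            theta = -math.pi + (j + 0.5) * dtheta
            # The extra factor r is the Jacobian factor of the map.
            total += f(r * math.cos(theta), r * math.sin(theta)) * r * dr * dtheta
    return total


def gauss(x, y):
    return math.exp(-(x * x + y * y))


value = polar_integral(gauss, 6.0, 600, 64)   # close to math.pi
```

Dropping the factor `r` from the sum gives a visibly different answer, which is a simple way to see why the Jacobian factor is needed.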

Theorem 15.2.2 (Change of Variable Formula) Let $g : U \to \mathbb{R}^n$ be a one-to-one map of a bounded open set $U$ whose boundary has content zero, and assume $g$ is $C^1$ on the closure of $U$. Let $f$ be continuous on the closure of $g(U)$. Then

$$\int_{g(U)} f(x)\,dx = \int_U f(g(x))J(x)\,dx.$$

Lemma 15.2.3 Let $g : U \to \mathbb{R}^n$ be one-to-one on a bounded open set $U$ whose boundary has content zero, and suppose $g$ is $C^1$ on the closure of $U$. Then $\mathrm{vol}(g(U)) \le \int_U J(x)\,dx$ and

$$\int_{g(U)} f(x)\,dx \le \int_U f(g(x))J(x)\,dx$$

for any non-negative continuous function $f$ on the closure of $g(U)$.

for all $x$ in $C_\delta$. Then $\mathrm{vol}(g(C_\delta)) \le (J(\bar{x}) + c\varepsilon)\delta^n$ where $c$ depends only on $n$ and $M$.


Page 745: Strichartz_The Way of Analysis 2000

Theorem 15.2.5 Let $f$ be continuous on $\mathbb{R}^n$ except for a finite set of singularities $a_1, \ldots, a_N$, and suppose $|f(x)| \le c_k|x - a_k|^{\alpha_k}$ for $x$ near $a_k$ and $|f(x)| \le c|x|^{\beta}$ for all large $x$, where $\alpha_k > -n$ for all $k$ and $\beta < -n$. Then $f$ is integrable.

Theorem 15.2.4 (Sard) If $g : U \to \mathbb{R}^n$ is $C^1$ and $C = \{x : \det dg(x) = 0\}$ is the critical set of $g$, then $\mu(g(C)) = 0$.

Theorem 15.2.3 Let $U$ be open and $g : U \to \mathbb{R}^n$ be one-to-one and $C^1$. Then $\int_{g(U)} f\,d\mu = \int_U (f \circ g)\,J\,d\mu$ for any nonnegative measurable function $f$ on $g(U)$. Moreover, $f$ is integrable if and only if $(f \circ g)\,J$ is integrable on $U$, and the same identity holds.

Lemma 15.2.5 Let $\mu$ denote Lebesgue measure on $\mathbb{R}^n$ and $\mu|_U$ denote its restriction to $U$. Suppose $U$ is open and $g : U \to \mathbb{R}^n$ is one-to-one and $C^1$. If $\nu = J(x)\,d\mu|_U$, then $\nu \circ g^{-1} = \mu|_{g(U)}$.

Lemma If $f$ is a non-negative measurable function on $Y$, then $\int_Y f\,d(\nu \circ g^{-1}) = \int_X f \circ g\,d\nu$.

Definition If $g : X \to Y$ is a measurable function (with respect to $\sigma$-fields $\mathcal{F}_1$ on $X$ and $\mathcal{F}_2$ on $Y$) and $\nu$ is a measure on $X$, then the image measure $\nu \circ g^{-1}$ on $Y$ is given by $\nu \circ g^{-1}(A) = \nu(g^{-1}(A))$ for any $A$ in $\mathcal{F}_2$.

Corollary If $f$ is radial (i.e., a function of $r$ alone), then $\int_{\mathbb{R}^n} f(|x|)\,dx = c_n \int_0^{\infty} f(r)\,r^{n-1}\,dr$ where $c_n$ is a constant depending on the dimension $n$ alone.

Lemma 15.2.4 If $g$ denotes the mapping $(r, \theta) \to x$ given by polar-spherical coordinates, then $g$ maps

$$\Omega = \{(r, \theta) : 0 < r < \infty,\ 0 < \theta_j < \pi \text{ for } j \le n - 2,\ -\pi < \theta_{n-1} < \pi\}$$

one-to-one onto $D = \{x : x_j \ne 0 \text{ all } j\}$ and

$$\det dg = r^{n-1} (\sin\theta_1)^{n-2} (\sin\theta_2)^{n-3} \cdots \sin\theta_{n-2}.$$
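For $n = 3$ the determinant formula in Lemma 15.2.4 reduces to $\det dg = r^2 \sin\theta_1$. The Python sketch below compares this with a finite-difference Jacobian. The function names, the step size `h`, and the test point are assumptions made for illustration, and central differences give only an approximation of $dg$.

```python
import math


def spherical(r, t1, t2):
    """Polar-spherical coordinates for n = 3."""
    return (r * math.cos(t1),
            r * math.sin(t1) * math.cos(t2),
            r * math.sin(t1) * math.sin(t2))


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def numerical_jacobian_det(r, t1, t2, h=1e-6):
    """Approximate det dg at (r, t1, t2) by central differences."""
    p = [r, t1, t2]
    rows = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        fp, fm = spherical(*plus), spherical(*minus)
        # rows[i][k] approximates d x_k / d p_i; this is the transpose
        # of dg, which has the same determinant.
        rows.append([(fp[k] - fm[k]) / (2.0 * h) for k in range(3)])
    return det3(rows)


numeric = numerical_jacobian_det(2.0, 0.7, -1.1)
expected = 2.0 ** 2 * math.sin(0.7)   # r^(n-1) * (sin t1)^(n-2) with n = 3
```

The two values agree to within the finite-difference error, and the result does not depend on $\theta_2$, consistent with the formula.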


Page 746: Strichartz_The Way of Analysis 2000

Index

A primer of real analytic functions 294
a.e. 667
absolute convergence of Fourier series 548
absolute convergence 250
absolute value 20, 242
absolutely convergent improper integral 233
absolutely convergent integral 655
absolutely convergent 253, 260, 262, 640, 664
acceleration 177
accumulation point 79
additive group 327
additive identity 41
additive 662, 664
additivity 206, 224, 231, 628, 644
affine approximation 419
affine function 145, 419
algebraic identities 48
algebraically complete 242
algebraic 177
algebra 399, 631
algorithm 66
almost everywhere 550, 667
alternating 706
analytic continuation 281, 285
analytic expression 558
analytic functions 285, 286, 293
angle 360
angular momentum 508
anti-derivative 208, 520
approximate identity lemma 301, 541
approximate identity 297, 330, 533, 558
approximating sum 201
approximation by polynomials 296
Archimedes xiii
arc length parametrization 616, 653
arc length 212, 610, 614
arcsin 338
arctangent integral 346
arctangent 218, 346, 349
arcwise connectedness 393
arc 394
area under the graph 201, 695
area 201, 631
arithmetic 38
Arzela-Ascoli theorem 312, 383, 392, 410, 485, 490, 492
associated 361
associative 41, 298, 356
attractor 412
autonomous 502, 581
average value 668
average 206
axiom of Archimedes 20, 34, 42, 46, 47, 63, 64, 370

Page 747: Strichartz_The Way of Analysis 2000

ball 369
Banach space 355, 377, 670
basis 356
Bernoulli xiii, 515, 547
Bessel functions 285, 348
Bessel's differential equation 501
Bessel's inequality 544, 546
Bessel's o.d.e. 500
best affine approximation 148, 420
best approximation 307
big Oh 147
bilinearity 360
binomial expansion 252, 287
binomial series 295
binomial theorem 279, 324
Bishop 26, 66
blip functions 329
Bolzano's example 403
Bolzano xiv, 25, 403
Boolean algebra 632
Borel sets 633
Borel 332
boundary conditions 464, 519, 520, 529
bounded away from zero 46
bounded from above 75
bounded variation 618
boundedness 382
bounded 40, 84
broken line segment 395
Brouwer fixed point theorem 399
Brouwer's invariance of domain theorem 717
Brouwer 399
Brownian motion 403
$C([a, b])$ 359
$C^1$ 151, 422
$C^2$ 177, 440
$C^\infty$ 281, 440
$C^k$ 440
Cantor set 95, 99, 633, 641, 652, 554, 659
Cantor xiv, 8, 11, 25, 634
Carathéodory theorem 652
Carathéodory 643, 648
cardinality 8
Carleson 550
Cartesian coordinate 581, 715
Cartesian product 10, 13, 22, 601, 700
Cauchy completion of the rationals 26
Cauchy criterion 30, 50, 234, 253, 265, 274, 313, 374
Cauchy data 467
Cauchy initial value conditions 467
Cauchy principal value integrals 234
Cauchy problem 467
Cauchy sequence 25, 31, 34, 36, 50, 62, 246, 374, 376
Cauchy sum 202, 220, 247, 249, 268, 487, 692
Cauchy's method of majorants 496
Cauchy-Kovalevsky theorem 494
Cauchy-Schwartz inequality 361, 362, 365, 479, 674
Cauchy xiv, 25, 30, 268
celestial mechanics 459
center 369
Cesaro summability 538, 550
chain rule 168, 428
change of variable 211, 232, 705, 713, 717, 719
character identity 526
characteristic function 275, 308, 658, 694
characteristic polynomial 445

Page 748: Strichartz_The Way of Analysis 2000

characters 527
Chebyshev inequality 672, 675, 681
choice function 23
chord-length 369
circle 583, 584, 593, 599
cis 343
classical mechanics 505
classical real number system 26
closed interval 86
closed sets 91, 373
closure properties 372
closure 97, 373
cluster point 79, 92
Cohen 12
commutative group 356, 526
commutative 41
commutativity 298
compact domains 131, 391
compact sets 99
compactness 309, 377
comparison test 253, 255, 262
completeness of the reals 51
completeness 28, 34, 50, 246, 374, 547, 680
complete 382, 672
completion 376, 670
complex analysis 247
complex conjugate 250
complex domains 286
complex exponentials 344
complex inner product 365
complex multiplication 347
complex number system 241
complex numbers 54
complex plane 285
complex square root 573
complex vector space 364
complex-valued functions 247, 274
composition 168, 288
compound interest 336
computer experiment 255
Conditional continuity from above 635
conditional probability 655
configuration space 505
connectedness 113, 393
conservation of energy 506
constant coefficient partial differential operator 454
constraints 602
construction 68
constructive analysis 66
constructive content 66
constructive real number system 26
Constructivism 6
content zero 228, 693
content 228
continued fractions 55
continuity from below 635
continuity of translation 681
continuous differentiability 148
continuous from the right 125
continuous functions 387
continuously differentiable 151, 422
continuous 111, 388
Continuum Hypothesis 12
contour map 436
contractive mapping principle 397, 469, 471, 476, 482, 574, 575
contrapositive 2
converge conditionally 258
convergence in the mean 543
convergence 250, 276
convergent subsequence 310
converge unconditionally 258
converge uniformly on compact sets 268
converge 36, 373

Page 749: Strichartz_The Way of Analysis 2000

converse 2
convolution 297, 698
cookbook 463
coordinate components 389
coordinate maps 599
coordinate patches 595, 599
coordinate projections 389
corners 151
cosine 285, 337, 344
countable additivity 628
countable axiom of choice 22, 381
countable coverings 637
countable dense subset 98, 312, 380
countable sets 8
countable 9
counting measure 640, 654
coupled 467
cover 101
Cramer's rule 570
critical point 441, 451
critical set 718, 719
cross product 435
crumpled handkerchief 626
cube root 55, 151
curl 508
curve 394, 568, 581
curvilinear coordinates 581
cusp 584, 585
$D_N$ 533
D'Alembert 518
damped vibrating string equation 529
damping term 518
decreasing 154
Dedekind cuts 56, 59
Dedekind real number system 60
Dedekind xiv, 25, 59
definite integral 201
definition of the integral 660
degree 390
dense 28, 97, 374
density of continuous functions 678
density of the rationals 34, 48
derivative 143, 420
derived set 99
determinants 570, 705
diagonalizable 444
diagonalization argument 11, 81
diagonal 313
diameter 382, 652
difference equation 485
difference operator 439
differentiable 144, 404, 419, 420
differential 420, 597
differentiate 281
differentiation of integrals 207, 432, 668
dilate 650
direct proof 6
directional derivative 423
Dirichlet boundary conditions 521
Dirichlet kernel 533, 539, 551
Dirichlet problem 561
Dirichlet's function 223, 225, 228, 270, 624
Dirichlet 223, 531, 559
discontinuity of the first kind 125
discontinuity of the second kind 125, 275
discriminant 55
distance function 357
diverge to $+\infty$ 251
divergent 251
divide and conquer 53, 76, 130
domain 111
Dominated Convergence theorem 625, 666, 670
dominates 225, 665

Page 750: Strichartz_The Way of Analysis 2000

dot product 360, 435
double indexed series 260, 262, 704
doubling time 336
du Bois Reymond 550
dyadic pieces 255
$e$ 323
$\varepsilon$-neighborhood 709
eigenvalue 444, 521
eigenvector 444, 521
Einstein 459
elementary functions 389
embedded surfaces 585
embeddings 585, 588
empirical evidence 520
energy 505
entropy 610
equality of mixed partials 437
equation of a vibrating string 515
equicontinuity 309
equivalence class 35, 36, 668, 671
equivalent 34, 85, 386, 671
Euclidean distance 357
Euclidean inner product 361
Euclidean norm 358
Euclidean space 355, 368
Euclid 59
Eudoxes 59
Euler identities 345, 525
Euler's method 485, 490
Euler xiii, 74, 517, 525
everted 585
exact 568, 580
examples of measures 639
existence and uniqueness 467
existence of Lebesgue measure 648
existence of measures 643
existential quantifier 3
exponential decay 328
exponential function 285, 323, 459
exponential growth 328
exp 324
extended real number system 74, 657
extensions 524
$F_\sigma$ set 232
fair coin 640
Fatou's theorem 663, 670
Fatou 663
Fejér kernel 539
Fejér 539, 550
Fermat 157
Fibonacci sequence 263
field axioms 41, 241
field 19, 41, 632
filter 23
finite coverings 637
finite intersection property 106
first order linear partial differential equation 506
first uncountable ordinal 633
fixed points 397
flaps 301, 305, 334
fold singularity 587
follow your nose 15
Fourier analysis 527
Fourier coefficients 522, 532, 676
Fourier cosine expansion 523
Fourier series xiii, 112, 255, 267, 270, 276, 515, 624, 625, 670, 676
Fourier transforms 527
Fourier's conjecture 550
Fourier 519, 559
fractal geometry 650
fractals 643
Frankel 25
Frege xiv, 25
Fubini's theorem, first version 696

Page 751: Strichartz_The Way of Analysis 2000

Fubini's theorem, second version 697
Fubini's theorem, third version 698
Fubini's theorem 694, 700, 702
full Fourier series 524
functional-differential equations 460
functional 309
functions of a complex variable 286
functions 111
fundamental theorem of algebra 242
fundamental theorem of the calculus 207, 250, 271, 462
fundamental tone 517
$G_\delta$ set 232
$G_\delta$ set 639, 642
$GL(n, \mathbb{R})$ 721
g.l.b. 77
Gödel 12
Galileo 8
gamma function 348
General Relativity 459
generated 633
geometric average 280
geometric series 251
Gibbs' phenomenon 550, 555
Global Existence and Uniqueness 472
global Lipschitz condition 470, 474
global solutions 461
glue 329
gluing 128
Goldbach's conjecture 5, 66
gradient vector field 502
gradient 421
Gram-Schmidt orthogonalization 522
graph of a function 582
graph-of-function representation 582
graph 113
gravitational forces 459
gravitation 508
greatest lower bound 77
group representations 527
group 527
Hölder condition 123, 126, 194
Hahn uniqueness theorem 703
Hahn 703
half life 336
Hamilton-Jacobi equations 506
Hamiltonian mechanics 505
Hamiltonian 505
harmonic analysis 520, 525
harmonic series 255
hat function 308
Hausdorff dimension 653
Hausdorff distance 411
Hausdorff measures 643, 650
Hausdorff 650
heat equation 519, 555, 680, 682
heat kernel 558
Heine-Borel property 378, 385, 391
Heine-Borel Theorem 103, 230, 274, 379, 629
Heine xiv, 25
Hermitian linearity 365
Hermitian symmetry 365
Hessian 437, 607
higher derivatives 437
higher order equations 481
Hilbert space 355, 377, 673
homogeneity 358
homogeneous linear o.d.e. 473
homomorphisms 527
Hubbard 560
hypergeometric functions 285, 348
$i$ 241
image 111, 132, 387
immersed surfaces 585

Page 752: Strichartz_The Way of Analysis 2000

immersions 585, 588
Implicit Function Theorem 462, 567, 571
implicit description 597
implicit differentiation 571, 578
implicit function theorem 598
implicitly 582
improper integrals 232, 219, 625
improper Riemann integrals 664
increasing 77
indefinite integral 210
independent trials 640
independent 700
indirect proofs 6, 16
infimum 74
infinite decimal expansions 29, 56
infinite matrix 259, 312
infinite sets 8
infinite Taylor expansion 335
infinitesimals 63, 628
inf 74, 391
inhomogeneous linear o.d.e. 473
initial value conditions 463
inner product space 355
inner product 358
inner regularity 638, 642
integral equation 487
integrable functions 664
integrable singularity 233
integrable 664
integral curves 501
integral differential equation 484
integral equation 467, 482, 579
integral remainder formula 210, 218, 248, 249
integral 201, 655
integration by parts 210
integration of the derivative 209
interchange of integrals 214, 691
interior 96, 370
intermediate value property 248, 396
intermediate value theorem 130, 158
interval of convergence 280
intrinsic 596, 600
Intuitionism 6
Inverse Function theorem 171, 571, 592
inverse functions 171, 567
inverse images 121, 387
invertible 569
isodiametric theorem 652
isolated point 118
iterated function system 412
iterated integrals 691, 694, 704
iterated mapping 397
iterate 177
Jacobian 709, 717
jump discontinuity 125, 136, 229, 275
$K_N$ 539
kernel 588
Kolmogorov 550, 640
Krantz 294
$L^1$ convolution theorem 698
$L^1$ metric 376
$L^1$ 670
$L^2$-norm 544
$L^2$ 673
l'Hôpital's Rule 127, 185, 191, 453
l.u.b. 77
Lagrange interpolation polynomial 297
Lagrange interpolation 296
Lagrange multipliers 602, 603
Lagrange remainder formula 188, 193, 248, 452

Page 753: Strichartz_The Way of Analysis 2000

Lagrange 296
Lakatos 268
Laplace equation 682
latitude-longitude 584
law of cosines 348
least upper bound 77
Lebesgue approximate sums 656, 658
Lebesgue Differentiation of the Integral Theorem 668
Lebesgue integral 623
Lebesgue measure on $\mathbb{R}^n$ 639, 648
Lebesgue measure zero 550
Lebesgue measure 627, 630, 634, 636, 641, 643, 648, 652, 654, 718
Lebesgue monotone convergence theorem 661
Lebesgue sets 634
Lebesgue spaces 670
Lebesgue theory of integration 224
Lebesgue 163, 270, 550, 626, 634
Legendre functions 348
Leibniz xiii, 74, 143
lemniscate 605
length 611, 627
level sets 597
lexicographic order 283
liminf 81
limit points 92, 78, 373
limits from above and below 124
limits of functions 111
limit 31, 36, 50, 73, 201, 244, 373
limsup 81
linear algebra 356, 419
linear equations 473
linear spline 275
linearity 206, 224, 231
linearly independent 356
linear 461, 662, 664
Lipschitz condition 123, 129, 138, 139, 149, 163, 232, 274, 397, 421, 469, 578
little oh 147
local coordinate map 595
local existence and uniqueness 476
local extrema 441
local inverse function theorem 175
local Lipschitz condition 470, 476
local maxima and minima 154, 441, 451
local solutions 460
localization theorem 559
localization 549
log-log graph paper 336
logarithm 279, 285, 323, 347
logic of connectives 2
log 326
lower semi-continuity 125
m-dimensional 591
m-dimensional manifold 599
m-dimensional surfaces 582
Majorant lemma 497
majorant 496
manifold 599
mapping 397
mass 631
maxima and minima 602
maximum and minimum problems 432
maximum interval length 202
max 128
mean value theorem 160, 210, 248, 272, 436
mean-square convergence 550
measurable functions 655, 657
measurable sets 631, 634, 655
measurable space 655
measurable 656

Page 754: Strichartz_The Way of Analysis 2000

measure preserving 720
measure space 655
measure zero 638, 667, 693
measure 623, 631, 634
Meray 25
metric outer measure 647
metric space 355, 368
metric 357
midpoint rule 216
Minkowski's inequality 470, 664, 669
min 128
models 12
modulus of continuity 124
modulus 242, 305
moments 305
momentum space 505
momentum 508
monomial 390
Monotone Class Lemma 700
Monotone Convergence Theorem 625, 660, 661, 665
monotone class 701
monotone decreasing 135
monotone function theorem 135
monotone functions 134
monotone increasing 77, 135, 153
monotone 635, 662, 664
monotonicity 643
multi-index 389, 440
multilinear 706
multiple integrals 228, 625, 691
multiplicative group 327
multiplicative inverses 241
multiplicative 42
multiplicativity 242
multiply indexed series 259
n-body problem 508
Napier 350
natural logarithm 323
negative definite 441
negative 20, 41, 46
negligible sets 638
neighborhood 74, 90, 370
nested sequence 385
nested 105
Neumann boundary conditions 522
Newton xiii, 143, 177
Newton's method 573
Newtonian mechanics 505, 516
non-constructive mathematics 21
non-measurable function 657
non-negative definite 441
non-positive definite 441
non-standard analysis 63
non-standard numbers 26, 64
nondegenerate 452
nonmeasurable sets 644
nonnegativity 634, 643
nonsingular 569
normal form 462, 580
normal space 598
normed space 355
norm 358, 361, 364, 570, 670
nowhere differentiable functions 403
null-space 588
numerical integration 214
o.d.e. 460, 567
one-to-one 112
onto 171, 387
open balls 369
open covering 101, 378
open interval 86
open neighborhood 90
open sets 86, 368
operator 166, 438, 520
order of a zero 190

Page 755: Strichartz_The Way of Analysis 2000

order of quantifiers 263
ordered field 20, 27, 34, 38, 46
order 20, 45, 390
ordinary convergence 251
ordinary differential equations 343, 459
Oresme 208
orthogonal matrix 444
orthogonal projection 544
orthogonal 367
orthonormal basis 366
oscillation 83, 221, 248
osculating plane 617
outer measure 643
outer regularity 638
overtone series 517
$\pi$ 349
p-adic integers 386
p-adic metric 385
p-norm 359
p.d.e. 515
parabolas 607
paradox 628
parallelogram law 363, 367
parallelopiped 705
parametric representations 582
parametrically 581
Parks 294
Parseval identity 547, 549, 670, 676, 679
partial derivatives 423
partial differential equations 506, 515
partial sums 250, 532
partition the range 626, 655
partitioning the domain 626
partition 201, 248
path 394
Peano curves 407
Peano existence theorem 490
Peano xiv, 25, 407
perfect sets 93, 96
perimeter 710
periodic convolution 533, 704
periodic extension 524
periodic 218
perturbation series 474, 570
perturbation 469
perturbed equation 474
phase space 505
Picard iteration 471, 482, 579
piecewise constant 275
piecewise linear 275
pointwise convergence 263, 274
polar coordinates 348, 453, 714, 720
polar-spherical coordinates 715
polarization identity 363, 366
polynomials 389, 448, 296
porcupine quills 465
positive definiteness 361, 441, 451
positive 20, 46
positivity 357, 358, 368
potential energy 506
power series 184, 276, 293, 308, 485, 494, 525
power set 10
powers 328
primitive 210
principal normal 617
principal value integral 234
probability 626, 631, 640, 655, 700
product and quotient rules 165
product measure 700, 703
Projection theorem 544
proof by contradiction 16
Proofs and refutations 268
proofs 13
proper integrals 219

Page 756: Strichartz_The Way of Analysis 2000

propositional calculus 2
prosthaphaeresis 350
Pythagorean formula 357
Pythagorean metric 368
Pythagorean theorem 242, 545
quadrants 154
quadratic form 441, 606
quantifiers 1
radial function 716
radius of convergence 276, 277, 286
radius of curvature 617
radius 369, 382
range 111
rank 587
rate of convergence 252
ratio test 254, 262
rational number system 18, 19, 34
Rayleigh quotient 446
real and imaginary parts 244, 247, 258, 262, 274
real number system 26, 36
rearrangements 256, 283, 292
recipes 463
reciprocal 42
rectifiable curves 610
recursive functions 67
recursive sets 10
reductio ad absurdum 17
refinement 204, 656
regular singular points 500
regularization 331
relatively open 370
removable discontinuity 125
representative function 671
retarded differential equations 460
Riemann integrable 669
Riemann integrable 220, 223, 249, 268, 275, 623, 663, 669
Riemann integral 220, 270
Riemann theory of integration 663
Riemann upper and lower sums 202, 220, 247, 269
Riemann zeta function 254
Riemann-Lebesgue lemma 547, 549, 681
Riemann 219, 531
Riesz-Fischer theorem 677, 678
rings 648
Robinson 26, 63, 143
Rolle's theorem 161
root test 254, 262, 278
Russell 25
$\sigma$-additivity 628, 634, 643
$\sigma$-field 633
$\sigma$-finite measures 703
$\sigma$-subadditivity 636, 643
saddle points 448, 605
Sard's theorem 720
scalar field 356
scalar multiplication 356
scalar product 360
scalars 356
scaling property 650
Schonberg 407
secant lines 177
second derivative test 605
second derivative 177, 437
second difference 179
second mean value theorem 164
second-order estimate 216
sections 694, 702
separable 485
separated 647
separately continuous 390

Page 757: Strichartz_The Way of Analysis 2000

sequence of approximations 28
sequences of polynomials 296
shuffled sequence 78
side conditions 463
signed volume 706
simple closed curve 613
simple functions 658
simple zero 190
Simpson's rule 217, 218, 485
sine series 520
sine 285, 337, 344
singularity 287, 500, 675, 699, 720
skew-symmetric multilinear form 706
skew-symmetric 525, 706
Smale 585
Solovay 631
space filling curve 407
spans 356
special functions 285, 348
spectral theory 520
spectral theorem 444, 522
speed of sound 454
sphere 369, 584, 593, 599, 601, 609
spherical coordinates 584, 593
splitting condition 643, 644
square roots 52, 347, 581
standard basis 356
step function 275
Stone-Weierstrass theorem 399, 543
strict local maximum 154
strictly increasing 153
strongly separates points 399
structure theorem for open sets 88, 638
structures 355
Sturm-Liouville operator 530
subadditivity 635
subcover 101, 378
subsequence selection function 80
subsequences 79
subspace 368
summability method 538
summability 537
summation by parts 260
summing along diagonals 259
sup-norm 360, 366, 368, 376, 383, 469, 670
supremum 74
sup 74, 391
surface of revolution 601
surfaces 568, 581
symmetric difference 632
symmetric matrix 441, 522
symmetric second difference 192
symmetry 357, 360, 368, 521
systems of ordinary differential equations 459
tail of the series 256
tangent bundle 601
tangent line 153, 177
tangent plane 421
tangent space 582, 590, 596
taxicab distance 359
Taylor expansion 183, 281, 303, 332, 448
Taylor series 293
Taylor's theorem 167, 181, 210, 218, 248, 249, 448
temperature 519
term-by-term 281
topological space 372
topology 73, 90, 123
torus 601
total derivative 420
trajectory 394
transcendental functions 459
transcendental 325

Page 758: Strichartz_The Way of Analysis 2000

transfinite induction 633
translation invariance 212, 654
translation 298, 525
transpose matrix 367
transversal 147
trapezoidal rule 216, 485
triangle inequality 20, 47, 242, 243, 357, 358, 366, 368
trigonometric polynomial 410, 543
trigonometric functions 213, 244, 337, 520
truth table 2
twice continuously differentiable 177
two dimensional picture 32
ultra metrics 385
ultrafilter 23
unbiased average 206
unbounded from above 75
uncountable axiom of choice 631
uncountable sets 10
uncountable 9
uniform Cauchy criterion 313
uniform continuity theorem 133
uniform continuity 111, 116, 117, 264, 266, 391
uniform convergence 263, 272, 392, 531
uniform limits 263
uniformly bounded 309, 383, 392
uniformly convergent subsequence 309, 392
uniformly differentiable 165
uniformly equicontinuous 311, 383, 385, 392
uniqueness of Fourier series 543
uniqueness of Lebesgue measure 703
unit sphere 568
universal gravitational constant 508
universal quantifier 3
unrestricted convergence 272
upper bounds 75
upper semi-continuity 125
vector addition 356
vector field 501
vector space 355
velocity vector 586
vibrating string equation 454, 518
Viete 350
volume 631, 693, 705
von Neumann 522
Weierstrass approximation theorem 184, 296, 297, 301, 305, 543
Weierstrass xiv, 25, 403, 407
weight function 206
weighted average 298
weighted integral 206
well-ordering 21
Whitehead 25
Zermelo 25
zero 131
zoom 153
zeros of finite order 191

Page 759: Strichartz_The Way of Analysis 2000

ISBN 0-7637-1497-6


Robert S. Strichartz is Professor of Mathematics at Cornell University. He received a Ph.D. from Princeton University in 1966 and has been an active researcher in harmonic analysis, partial differential equations, and geometric analysis. In addition, he is well known for expository writing and is a past recipient of the Lester Ford Award from the Mathematical Association of America.

Jones and Bartlett Publishers
40 Tall Pine Drive
Sudbury, MA 01776
978-443-5000
info@jbpub.com

Mathematics is a way of thought. Attaining a deep understanding of mathematics is more than mastering a collection of theorems, definitions, problems, and techniques; it is understanding how theorems and definitions fit together with the overall strategy of arguments presented. This introduction to real analysis combines thorough and complete proofs with lively and generous explanations to guide the reader through the foundations and the way of analysis. Real analysis, in one and several variables, is developed from the construction of the real number system to an introduction of the Lebesgue integral. Additionally, there are three chapters on application of analysis, ordinary differential equations, Fourier series, and curves and surfaces, to show how the techniques of analysis are used in concrete settings.