
Lecture notes for Math 61CM, Analysis, Version 2018

Lenya Ryzhik∗

December 6, 2018

Nothing found here is original except for a few mistakes and misprints here and there. These notes are simply a record of what I cover in class, to spare the students some of the necessity of taking the lecture notes. The readers should consult the original books for a better presentation and context. The material from the following books will likely be used: L. Simon "An Introduction to Multivariable Mathematics", V. Zorich "Mathematical Analysis I", and maybe W. Rudin "Principles of Mathematical Analysis".

1 The real numbers

In this section, we will introduce the set of real numbers R via a collection of several groups of axioms: we will postulate that the real numbers (1) form what is known as a field, (2) are ordered, and (3) satisfy the completeness axiom. The latter may be less intuitive at first sight but, as we will soon see, is absolutely crucial for the subject known as "mathematical analysis". The intuition behind the axioms is that they should formalize what we know as "real numbers" from "past experience".

1.1 Axioms of addition and multiplication

First, we need to take care of the arithmetic of real numbers. We will assume that the operations of addition and multiplication are defined for real numbers, and satisfy the following properties:
(A1) The addition operation is associative:

(a+ b) + c = a+ (b+ c) for all a, b, c ∈ R,

(A2) There exists a zero element denoted by 0, so that

x+ 0 = 0 + x = x for all x ∈ R.

(A3) For each x ∈ R there exists an element (−x) ∈ R such that

x+ (−x) = 0.

(A4) The addition operation is commutative:

a+ b = b+ a for all a, b ∈ R.

Axioms (A1)-(A3) say that R is a group with respect to addition, and (A4) says that this group is commutative. We will go over the general notion of a group again in the linear algebra portion of the class.

∗ Department of Mathematics, Stanford University, Stanford, CA 94305, USA; [email protected]


The next group of axioms concerns multiplication. We will sometimes write the product of two numbers a, b ∈ R as ab and sometimes as a · b, hoping this causes no confusion.
(M1) The multiplication operation is associative:

(ab)c = a(bc) for all a, b, c ∈ R,

(M2) There exists an identity element denoted by 1, so that 1 ≠ 0, and

x · 1 = 1 · x = x for all x ∈ R.

(M3) For each x ∈ R such that x ≠ 0, there exists an element x^{-1} ∈ R such that

x · x^{-1} = 1.

(M4) The multiplication operation is commutative:

ab = ba for all a, b ∈ R.

These axioms mean that the set R \ {0} is a commutative group with respect to multiplication. The last algebraic axiom, when taken together with the previous axioms, says that R is a field, and is simply the distributive law with respect to addition and multiplication:

(F1) a(b+ c) = ab+ ac for all a, b, c ∈ R.

The familiar arithmetic properties of real numbers follow from these axioms relatively easily. For example, to show that there is only one zero in the set of real numbers, let us assume that 0_1 and 0_2 are both zeros and write:

0_1 = 0_1 + 0_2 = 0_2 + 0_1 = 0_2.

Make sure that you understand which axioms were used in each identity above! To show that each element x ∈ R has a unique negative element, let us assume that for some x ∈ R there exist x_1 and x_2 such that

0 = x + x_1, 0 = x + x_2.

Then we can write:

x_1 = x_1 + 0 = x_1 + (x + x_2) = (x_1 + x) + x_2 = 0 + x_2 = x_2.

Again, make sure that you understand which axioms were used in each identity above. The next exercise lists some of the other elementary arithmetic properties of real numbers.

Exercise 1.1 Show using the above axioms that the following algebraic properties hold:
(1) Given any a, b ∈ R, the equation

a + x = b

has a unique solution x ∈ R.
(2) For each x ∈ R with x ≠ 0 there exists a unique y such that xy = 1.
(3) For each a ≠ 0 and each b ∈ R there exists a unique solution to the equation a · x = b.
(4) For any x ∈ R we have x · 0 = 0.
(5) If xy = 0 then either x = 0 or y = 0.
(6) For any x ∈ R we have −x = (−1) · x.
(7) For any x ∈ R we have (−1) · (−x) = x.
(8) For any x ∈ R we have (−x) · (−x) = x · x.


1.2 The order axioms

The next set of axioms has little to do with arithmetic but rather with ordering: it says that the real numbers form an ordered set. That is, there is a relation ≤ between the real numbers such that
(O1) For each x ∈ R we have x ≤ x.
(O2) If x ≤ y and y ≤ x then x = y.
(O3) If x ≤ y and y ≤ z then x ≤ z.
These axioms mean that R is a (partially) ordered set. We postulate that it is totally ordered:
(O4) For each x, y ∈ R either x ≤ y or y ≤ x.

To connect the axioms of addition and multiplication to the order axioms, we add the following two axioms.
(OA) If x ≤ y and z ∈ R then x + z ≤ y + z.
(OM) If 0 ≤ x and 0 ≤ y then 0 ≤ xy.

If x ≤ y and x ≠ y then we use the notation x < y. The relations ≥ and > are defined in the obvious way.

Note that we have 0 < 1. Indeed, the multiplication axioms imply that 0 ≠ 1. If we have 1 < 0, then adding (−1) to both sides gives 0 < −1, and (OM) implies that 0 < (−1)(−1) = 1, which is a contradiction to the assumption 1 < 0. We used claim (8) of Exercise 1.1 in the last step.

The next exercise shows that ordering is actually a non-trivial structure – not every field can be ordered.

Exercise 1.2 Show that not every set satisfying the addition and multiplication axioms can be ordered. Hint: consider the set {0, 1} with 0 + 0 = 0, 0 + 1 = 1, 1 + 1 = 0 and 0 · 0 = 1 · 0 = 0 · 1 = 0, and 1 · 1 = 1. Show that it satisfies the addition and multiplication axioms but can not be ordered.

1.3 The completeness axiom

As you can readily check, the axioms of addition, multiplication and order do not yet narrow down the intuition we have for the set of real numbers that should, at the very least, include what we know as rational numbers and irrational numbers. Indeed, these axioms are satisfied by the set of rational numbers alone and hence are not yet sufficient. As a side note, strictly speaking, to define the rational numbers we need to know what integers are, but let us disregard this for the moment. What is clear is that we need another axiom that would not be satisfied by what we think of as "rational numbers". This is achieved by the completeness axiom.

The completeness axiom. If X and Y are non-empty sets of real numbers such that for any x ∈ X and y ∈ Y we have x ≤ y, then there exists c ∈ R such that x ≤ c ≤ y for all x ∈ X and y ∈ Y.

In a sense, this axiom says that R has no holes. On the other hand, the set Q of rational numbers does have holes: if we take

X = {r ∈ Q : r^2 < 2}, Y = {r ∈ Q : r^2 > 2 and r > 0},

then there is no c ∈ Q such that for any x ∈ X and y ∈ Y we have x ≤ c ≤ y – this is because √2 is irrational. No rational number squared can equal 2, and if a rational c separating the sets X and Y were to exist, it is not hard to see that it would have to satisfy c^2 = 2 – we will come back to this a little later. Thus, the completeness axiom really distinguishes the reals from the set of rational numbers. To see how this axiom can be used, let us give some definitions.
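As a quick numerical illustration of this "hole" (a small Python sketch, not part of the original notes), decimal truncations of √2 produce rationals x ∈ X and y ∈ Y that get arbitrarily close to each other, while no rational c with c^2 = 2 can sit between X and Y:

    from fractions import Fraction
    from math import isqrt

    # For each number of decimal digits d, x is the largest d-digit decimal with
    # x^2 < 2, and y = x + 10^(-d) then satisfies y^2 > 2.
    for d in range(1, 7):
        x = Fraction(isqrt(2 * 10 ** (2 * d)), 10 ** d)
        y = x + Fraction(1, 10 ** d)
        assert x * x < 2 < y * y
        print(d, float(x), float(y), float(y - x))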

Definition 1.3 (1) A set S ⊆ R is bounded from above if there exists a number K ∈ R such that any x ∈ S satisfies x ≤ K. Such K is called an upper bound for S.
(2) A set S ⊆ R is bounded from below if there exists a number K ∈ R such that any x ∈ S satisfies K ≤ x. Such K is called a lower bound for S.
(3) A set S ⊆ R is bounded if it is bounded both from above and from below.

Exercise 1.4 Let us define |x| as |x| = x if 0 ≤ x and |x| = −x if 0 ≤ −x. Show that a set S is bounded if and only if there exists M ∈ R such that |x| ≤ M for all x ∈ S.

Definition 1.5 We say that x ∈ S is the maximum of a set S if for any y ∈ S we have y ≤ x. Similarly, we say that x ∈ S is the minimum of a set S if for any y ∈ S we have x ≤ y.

Note that the maximum and minimum of a set belong to the set, by definition. However, not every set has a maximum or a minimum.

Exercise 1.6 Show that the set S = {x : 0 < x < 1} has neither a minimum nor a maximum.

Definition 1.7 (1) A number K is the least upper bound for a set S ⊆ R, denoted as K = sup S, if K is an upper bound for S and any upper bound M for S satisfies K ≤ M.
(2) A number K is the greatest lower bound for a set S ⊆ R, denoted as K = inf S, if K is a lower bound for S and any lower bound M for S satisfies M ≤ K.

In other words, sup S is the minimum of the set of the upper bounds for S, and inf S is the maximum of the set of the lower bounds for S. As we have seen, not every set has a maximum or a minimum. The completeness axiom for the real numbers has the following consequence, showing that the set of upper bounds of any non-empty set that is bounded from above has a minimum.

Lemma 1.8 (The least upper bound principle). Every non-empty set of real numbers that is bounded from above has a unique least upper bound.

Proof. Uniqueness of the least upper bound follows from the fact that (if it exists) it is the minimum of the set of all upper bounds, and every set can have at most one minimum. Thus, we only need to show that a least upper bound exists. Let Y be the set of all upper bounds for a non-empty set X. Since X is bounded, the set Y is not empty, and by the definition of an upper bound, we know that for any x ∈ X and y ∈ Y we have x ≤ y. The completeness axiom then implies that there exists c ∈ R such that for any x ∈ X and y ∈ Y we have x ≤ c ≤ y. The first inequality implies that c is an upper bound for X and the second implies that it is the least upper bound. □

Exercise 1.9 Show that the conclusion of Lemma 1.8 does not hold for the set Q of rational numbers. That is, exhibit a set S of rational numbers such that there is no smallest rational upper bound for S.

One may wonder if one can construct a version of the real numbers explicitly and verify that it satisfies the axioms. A standard way is to start with the rational numbers and use what is known as Dedekind cuts. We will not do this here for lack of time, and simply note that an interested reader should have no difficulty finding it in a variety of books or on the Internet.

1.4 The natural numbers and the principle of mathematical induction

We understand commonly that the natural numbers are 1, 1 + 1, 1 + 1 + 1, . . . In order to make this more formal, we give the following definition.

Definition 1.10 A set X ⊆ R is inductive if for each x ∈ X, the set X also contains x+ 1.

Definition 1.11 The set N of natural numbers is the intersection of all inductive sets that contain 1.


Exercise 1.12 Show that N is an inductive set.

Lemma 1.13 The sum of two natural numbers is a natural number.

Proof. Let S be the set of all natural numbers n such that for any m ∈ N we have n + m ∈ N. Then S is not empty because 1 ∈ S (this, in turn, is because N is an inductive set). Moreover, if n ∈ S then n + 1 is also in S because for any m ∈ N we have

(n + 1) + m = n + (m + 1) ∈ N,

because n ∈ S and m + 1 ∈ N. Thus, S is an inductive set, and, as it is a subset of N, it has to be equal to N. □

Exercise 1.14 Show that the product of two natural numbers is a natural number.

Exercise 1.15 Show that if y ∈ N and y ≠ 1 then y − 1 ∈ N. Hint: let E be the set of all real numbers of the form n − 1 where n ∈ N and n > 1. Show that E = N.

Lemma 1.16 If n ∈ N then there are no natural numbers x such that n < x < n+ 1.

Proof. We will show that the set S of all natural numbers for which this assertion holds contains 1 and is an inductive set. To show that 1 ∈ S, let

M = {x ∈ N : x = 1 or x ≥ 2}.

Then, obviously 1 ∈ M and if x ∈ M then either x = 1, so that x + 1 = 2 ∈ M, or x ≥ 2 and then x + 1 ≥ 2, thus x + 1 ∈ M also in that case. Thus, M is a subset of N that is an inductive set containing 1, hence M = N. This means that the assertion of the lemma holds for 1, and 1 ∈ S. Next, assume that n ∈ S – we will show that n + 1 ∈ S. Assume that there is y ∈ N such that n + 1 < y < n + 2. Then y > 1, thus y − 1 is in N and n < y − 1 < n + 1, meaning that n ∉ S, which is a contradiction. □

The set Z of integers consists of numbers x ∈ R such that either x = 0, or x ∈ N, or −x ∈ N. Now, we can define the rational numbers as usual: they have the form m·n^{-1} with m, n ∈ Z, n ≠ 0. The irrational numbers are those that are not rational. In order to show that irrational numbers exist, we make the following observation.

Exercise 1.17 (1) Show that if s > 0 and s^2 < 2 then there exists δ > 0 so that (s + δ)^2 < 2. Hint: you should be able to find an explicit δ > 0 in terms of s.
(2) Show that if s > 0 and s^2 > 2 then there exists δ > 0 so that (s − δ)^2 > 2. Hint: again, pick an explicit δ > 0 in terms of s.
(3) Let X = {x > 0 : x^2 < 2}. Show that X is bounded from above and set s = sup X. Show that s^2 = 2 and that s is irrational.

1.5 The Archimedes principle

The Archimedes principle is a key to the approximation theory, and says the following.

Lemma 1.18 (The Archimedes principle) For any fixed h > 0 and any x ∈ R there exists an integer k ∈ Z so that

(k − 1)h ≤ x < kh.


Proof. We first note that N is not bounded from above. Indeed, if it is bounded, set s = sup N. If s ∉ N, then, by the definition of the supremum, there must be n ∈ N such that n > s − 1, but then s < n + 1 and thus s is not an upper bound for N. However, if s ∈ N then s + 1 ∈ N and s + 1 > s, meaning that s is again not an upper bound for N. This shows that N is not bounded from above, and thus Z is not bounded from above or from below. A similar argument shows that if S is a set of integers bounded from above, it must contain a maximal element, and if it is a set of integers bounded from below, it must contain a minimal element. Now, turning to the proof of the Archimedes principle, let S = {n ∈ Z : x/h < n}. This set is bounded from below and thus contains a minimal element k. Then we have

k − 1 ≤ x/h < k,

and the conclusion of Lemma 1.18 follows. □

Here are some important corollaries of the Archimedes principle.

Corollary 1.19 (i) For any ε > 0 there exists n ∈ N such that 0 < 1/n < ε.
(ii) For any a, b ∈ R such that a < b there exists r ∈ Q such that a < r < b.

Exercise 1.20 Prove the assertions of Corollary 1.19.
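The integer k in Lemma 1.18 is simply ⌊x/h⌋ + 1. Here is a minimal Python sketch (not part of the notes; the function name archimedes_k is only for illustration):

    import math

    def archimedes_k(x, h):
        # Return the integer k with (k - 1) * h <= x < k * h, assuming h > 0.
        k = math.floor(x / h) + 1
        assert (k - 1) * h <= x < k * h
        return k

    print(archimedes_k(math.pi, 0.25))   # 13, since 12 * 0.25 = 3.0 <= pi < 3.25
    print(archimedes_k(-1.3, 0.5))       # -2, since -1.5 <= -1.3 < -1.0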

2 The nested intervals lemma and the Heine-Borel theorem

In this section, we will introduce the first ideas related to the notion of compactness.

2.1 The nested interval lemma

Definition 2.1 An infinite sequence of sets X_1, X_2, . . . , X_n, . . . is nested if for each n ∈ N we have X_{n+1} ⊆ X_n.

A closed interval [a, b] is simply the set [a, b] = {x : a ≤ x ≤ b}, and an open interval is the set (a, b) = {x : a < x < b}.

Lemma 2.2 (The nested intervals lemma). Let I_1 ⊇ I_2 ⊇ · · · ⊇ I_n ⊇ . . . be a nested sequence of closed intervals. Then there exists c ∈ R that belongs to all I_k, k ∈ N. If, in addition, for any ε > 0 there exists k such that |I_k| < ε, then there is exactly one point c common to all intervals.

Proof. Write I_k = [a_k, b_k]. Since I_k is a nested sequence of intervals, we have a_k ≤ a_m ≤ b_m ≤ b_k for all m ≥ k. Thus, the sets A = {a_n : n ∈ N} and B = {b_n : n ∈ N} satisfy the assumptions of the completeness axiom. Hence, there exists c ∈ R such that a_m ≤ c ≤ b_k for all m, k ∈ N, and, in particular, we have a_n ≤ c ≤ b_n for all n ∈ N, thus c ∈ I_n for all n ∈ N.

If there are two points c_1 ≠ c_2 that belong to all I_k, say, with c_1 < c_2, then we have

a_n ≤ c_1 < c_2 ≤ b_n for all n,

hence |b_n − a_n| ≥ c_2 − c_1, and the length of every interval I_n is at least c_2 − c_1 > 0, which contradicts the assumption that the lengths can be made smaller than any ε > 0. □

Exercise 2.3 Where did we use the fact that the intervals I_n are closed in the above proof? Give an example of a nested sequence of open intervals I_n = (a_n, b_n) that has an empty intersection.


2.2 The Heine-Borel lemma

Definition 2.4 A collection F of sets is said to cover a set Y if for every element y ∈ Y there exists a set X ∈ F such that y ∈ X.

A collection of sets G is a subcollection of F if every set X ∈ G is also in the collection F.

Lemma 2.5 (The Heine-Borel lemma) Every collection F of open intervals covering a closed interval [a, b] contains a finite subcollection that also covers [a, b].

Proof. Let F be a collection of open intervals covering a closed interval [a, b] and let S be the set of points y in [a, b] with the following property: the interval [a, y] can be covered by finitely many intervals from F. As the point a itself is covered by some interval (α, β) ∈ F, with α < a < β, we know that S is not empty: it contains a, as well as all points z ∈ [a, b] such that a ≤ z < β. Moreover, the set S is bounded from above: any element y ∈ S satisfies y ≤ b. It is easy to see that if y ∈ S then any x such that a ≤ x ≤ y is also in S. Let s = sup S, and note that s ∈ [a, b]. We claim that s ∈ S and s = b. To see that s ∈ S, assume that s ∉ S, and observe that the point s is covered by some open interval (α′, β′) ∈ F, with α′ < s < β′. Since s is the least upper bound for S, and s ∉ S, there must be a point x ∈ S such that α′ < x ≤ s. Since x ∈ S, we can cover the interval [a, x] by finitely many intervals from F. Adding the interval (α′, β′) ∈ F to this collection gives us a finite collection covering the interval [a, s], contradicting the assumption that s ∉ S. Therefore, we have s ∈ S. However, this also shows that unless s = b, the points in the interval [s, min(β′, b)) are also in S, thus s is not an upper bound for S, which is a contradiction. Thus, we have b = s. As we have shown that s ∈ S, we know that b ∈ S, and we are done, by the definition of the set S. □

Exercise 2.6 Where did we use the fact that the intervals in the covering collection F are open? Where did we use the fact that the interval [a, b] is closed? Construct a collection F of closed intervals [a, b] that covers the closed interval [0, 1] but no finite sub-collection of F covers [0, 1]. Next, construct a collection G of open intervals (a, b) that covers the open interval (0, 1) but no finite sub-collection of G covers (0, 1).

Exercise 2.7 Devise an alternative proof of the Heine-Borel lemma as follows. Assume that there is no finite sub-collection of open intervals from F that covers I_0 = [a, b]. Then either there is no finite sub-collection that covers [a, a_1] or there is no finite sub-collection that covers [a_1, b], where a_1 = (a + b)/2. Choose the interval I_1 out of these two that can not be covered, split it into two sub-intervals, choose a sub-interval I_2 that can not be covered by a finite sub-collection, and repeat this procedure. This will give you a sequence of nested closed intervals I_0 ⊇ I_1 ⊇ I_2 ⊇ · · · ⊇ I_n ⊇ . . . . The Nested Intervals Lemma implies that there is a point c that belongs to all of these intervals. As c ∈ [a, b], it is covered by some open interval I from the collection F. Use this to arrive at a contradiction to the fact that none of the I_k can be covered by finitely many intervals from F.

2.3 Limit points and the Bolzano-Weierstrass lemma

Let us recall that a neighborhood of a point x is an open interval (a, b) that contains x.

Definition 2.8 A point z ∈ R is a limit point of a set X if any neighborhood of z contains infinitely many elements of X.

Exercise 2.9 Show that z is a limit point of a set X if and only if any neighborhood of z contains at least one element of X different from z itself.


As an example, part (i) of Corollary 1.19 of the Archimedes principle shows that z = 0 is a limit point of the set {1/n : n ∈ N}.

Exercise 2.10 Use the Archimedes principle to show that every x ∈ R is a limit point of the set Q of rational numbers.

The following lemma is a cornerstone of many things to come.

Lemma 2.11 (The Bolzano-Weierstrass lemma) Every bounded infinite set of real numbers has at least one limit point.

Proof. Let X be an infinite bounded subset of R. As X is bounded, there exists M > 0 so that X is contained in the interval I = [−M, M]. Assume that no point in I is a limit point of X. Then for each x ∈ I we can find an open interval I_x = (a_x, b_x) that contains x such that there are only finitely many points of X inside I_x. The open intervals I_x form an open cover of the closed interval I, and the Heine-Borel lemma implies that there exists a sub-cover of I by a finite sub-collection of intervals I_{x_k} = (a_{x_k}, b_{x_k}), k = 1, . . . , N, with some N ∈ N. Each interval I_{x_k} contains only finitely many elements of X, hence there are only finitely many elements of X in the union of all these intervals. As no elements of X can lie outside of this union, there are only finitely many elements in X, which is a contradiction. □

Another property we will use often is the following.

Lemma 2.12 Let S be a non-empty set that is bounded from above and does not have a maximum. Then sup S is a limit point of S.

Exercise 2.13 Prove Lemma 2.12.

3 Limits

3.1 Definition of the limit of a sequence

We first define a limit point of a sequence. One may informally think of them as points where the sequence "bunches up". Here is a way to formalize this.

Definition 3.1 We say that z ∈ R is a limit point of a sequence a_n if for any ε > 0 there exist infinitely many k ∈ N such that |a_k − z| < ε.

Note a subtle difference between a limit point of a sequence and a limit point of the set of its values. For instance, for a constant sequence 1, 1, 1, 1, . . . , that is, a_n = 1 for all n ∈ N, the set of its values is {1} – it consists of one point and has no limit points. However, x = 1 is a limit point of this sequence. This small difference will be of no "serious" importance for us but one should keep it in mind.

Exercise 3.2 Show that z ∈ R is a limit point of a sequence a_n if and only if any open interval (c, d) that contains z also contains infinitely many elements of the sequence a_n: there exist infinitely many n so that a_n ∈ (c, d).

A sequence may have more than one limit point. For instance, the sequence 1, −1, 1, −1, 1, . . . , that is, a_n = (−1)^{n+1}, has exactly two limit points, z = 1 and z = −1.

Exercise 3.3 (1) Construct a sequence a_n that has no limit points. (2) Construct a sequence b_k that has infinitely many limit points. (3) Construct a sequence c_k such that any point z ∈ R is a limit point of c_k.


Next, we introduce the notion of the limit of a sequence.

Definition 3.4 A point z ∈ R is the limit of a sequence a_n if for any open interval (c, d) that contains z there exists N so that all a_k with k ≥ N lie inside (c, d). We write this as a_n → z as n → +∞, or as lim_{n→∞} a_n = z.

The following exercise gives a slightly more "practical" definition of the limit.

Exercise 3.5 Show that a point z is the limit of a sequence a_n if and only if for any ε > 0 there exists N(ε) so that |a_k − z| < ε for all k ≥ N(ε).

A small advantage of Definition 3.4 over the one in Exercise 3.5, and the reason to introduce it first, is that it will be easy to generalize to spaces other than R, where the notion of a distance may not be defined but the notion of a neighborhood may be. However, the statement in Exercise 3.5 is much more "practical" and we will mostly use it rather than Definition 3.4 directly.

Important: unless otherwise specified, we will usually refer to Exercise 3.5 as the definition of the limit of a sequence. For the first reading, the reader should really think of Exercise 3.5 as the definition of the limit of a sequence.

The next important exercise gives another informal way to think of the limit of a sequence: this is the only point where the sequence "bunches up".

Exercise 3.6 Show that a point z is the limit of a bounded sequence a_n if and only if z is the only limit point of a_n.

Here are some examples of limits.

Exercise 3.7 Show that
(a) lim_{n→∞} 1/n = 0,
(b) lim_{n→∞} (sin n)/n = 0,
(c) lim_{n→∞} 1/q^n = 0 for any q > 1.
Hints: for part (a) – use part (i) in Corollary 1.19; for part (c) – write q = 1 + δ with δ > 0 and use induction to show that q^n ≥ 1 + nδ. Then apply part (a).
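For a purely numerical sanity check of these limits (a Python sketch, not part of the notes), one can simply print the terms for growing n and watch them approach 0:

    import math

    # Terms of the three sequences in Exercise 3.7 for increasing n; all three
    # columns shrink toward 0. For (c) we take q = 1.5 as a sample value of q > 1.
    for n in [10, 100, 1000, 10000]:
        print(n, 1 / n, math.sin(n) / n, 1.5 ** (-n))

    # For (a), Corollary 1.19(i) is exactly what produces N(eps): for eps = 1e-3,
    # any N > 1000 works, since 1/n <= 1/N < eps for all n >= N.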

3.2 Basic properties of converging sequences

Definition 3.8 (1) A sequence a_n is bounded if there exists M ∈ R so that |a_k| ≤ M for all k ∈ N.
(2) A sequence a_n is bounded from below if there exists M ∈ R so that a_k ≥ M for all k ∈ N.
(3) A sequence a_n is bounded from above if there exists M ∈ R so that a_k ≤ M for all k ∈ N.
(4) A sequence a_n is increasing if a_{k+1} ≥ a_k for all k ∈ N.
(5) A sequence a_n is strictly increasing if a_{k+1} > a_k for all k ∈ N.
(6) A sequence a_n is decreasing if a_{k+1} ≤ a_k for all k ∈ N.
(7) A sequence a_n is strictly decreasing if a_{k+1} < a_k for all k ∈ N.
(8) A sequence a_n is monotone if it is either decreasing or increasing.
(9) A sequence is converging if it has a limit.

Theorem 3.9 A convergent sequence is bounded.


Proof. Let a_n be a convergent sequence, with lim_{n→∞} a_n = ℓ. Take ε = 1 in the definition of the limit of a sequence (again, in Exercise 3.5!) – this implies the existence of N so that |a_k − ℓ| < 1 for all k ≥ N. It follows that all a_k with k ≥ N satisfy ℓ − 1 ≤ a_k ≤ ℓ + 1. There are only finitely many elements a_j with j < N. Thus, if we set

m = min(a_1, . . . , a_{N−1}, ℓ − 1), M = max(a_1, . . . , a_{N−1}, ℓ + 1),

then all a_k satisfy m ≤ a_k ≤ M, hence the sequence a_n is bounded. □

An immediate consequence is that any unbounded sequence can not have a limit.

Exercise 3.10 Not all bounded sequences converge: give an example of a bounded sequence that has no limit.

The next theorem gives a condition for convergence that is actually very useful in many applications, as monotone sequences arise very often.

Theorem 3.11 If a_n is monotone and bounded, then it converges. Moreover, if S = {a_1, a_2, . . . , a_k, . . . }, then
(1) If a_n is increasing and bounded from above then lim_{n→∞} a_n = sup S.
(2) If a_n is decreasing and bounded from below, then lim_{n→∞} a_n = inf S.

Proof. We will only prove (1). Let us assume that a_n is increasing and bounded from above. Then the set S is bounded from above, thus ℓ = sup S exists. Let ε > 0; then, as ℓ − ε is not an upper bound for S (this follows from the definition of sup S), there exists N(ε) so that a_{N(ε)} > ℓ − ε. As the sequence a_n is monotonically increasing, it follows that for all k ≥ N(ε) we have a_k ≥ a_{N(ε)} > ℓ − ε as well. However, as ℓ is an upper bound for the sequence a_n, we also know that a_k ≤ ℓ. Thus, we have |a_k − ℓ| < ε for all k ≥ N(ε) and we are done. □

Theorem 3.12 Let a_n and b_n be convergent sequences with lim_{n→∞} a_n = A and lim_{n→∞} b_n = B.
(1) The sequence a_n + b_n also converges and lim_{n→∞} (a_n + b_n) = A + B.
(2) The sequence a_n b_n also converges and lim_{n→∞} (a_n b_n) = AB.
(3) If, in addition, B ≠ 0, then the sequence a_n/b_n also converges and lim_{n→∞} (a_n/b_n) = A/B.

Proof. We will only prove (2). Let us write

a_n b_n − AB = a_n b_n − a_n B + a_n B − AB = a_n(b_n − B) + (a_n − A)B.

The triangle inequality implies that

|a_n b_n − AB| ≤ |a_n||b_n − B| + |a_n − A||B|.

Since a_n is a convergent sequence, it is bounded: there exists M > 0 so that |a_n| ≤ M for all n ∈ N, which leads to

|a_n b_n − AB| ≤ M|b_n − B| + |a_n − A||B|.

Now, given ε > 0, find N_1 so that |a_n − A| ≤ ε/(2(1 + |B|)) for all n ≥ N_1, and N_2 so that |b_n − B| ≤ ε/(2M) for all n ≥ N_2. Then, for all n ≥ N = max(N_1, N_2), we have

|a_n b_n − AB| ≤ M|b_n − B| + |a_n − A||B| ≤ M · ε/(2M) + (ε/(2(1 + |B|))) · |B| ≤ ε/2 + ε/2 = ε.

Hence, the sequence a_n b_n converges to AB. □


Exercise 3.13 (1) Prove assertion (1) in Theorem 3.12.
(2) Prove assertion (3) in Theorem 3.12. Hint: it is easier to first show that the sequence 1/b_n converges to 1/B.
(3) Let a_n = 1/n; find a sequence b_n such that the limit of a_n/b_n exists, and a sequence d_n such that the limit of a_n/d_n does not exist.
(4) Assume that lim_{n→∞} a_n = A > 0 and lim_{n→∞} b_n = 0. Show that the sequence a_n/b_n does not converge.

Theorem 3.14 Assume that a_n and b_n are two convergent sequences with

lim_{n→∞} a_n = A and lim_{n→∞} b_n = B.

If A < B then there exists N so that a_n < b_n for all n ≥ N.

Exercise 3.15 Prove this theorem.

3.3 The number e

Theorem 3.16 The sequence x_n = (1 + 1/n)^n converges. Its limit is denoted as e:

e = lim_{n→∞} (1 + 1/n)^n.    (3.1)

Proof. We will look instead at the sequence

y_n = (1 + 1/n)^{n+1}

and show that it is decreasing. For that, we need the inequality

(1 + α)^n ≥ 1 + nα    (3.2)

that holds for all α ≥ 0 and all n ∈ N. Recall that we have seen this inequality in Exercise 3.7. Now, we write

y_{n−1}/y_n = (1 + 1/(n−1))^n / (1 + 1/n)^{n+1} = (n^n/(n − 1)^n) · (n^{n+1}/(n + 1)^{n+1}) = (n^{2n}/(n^2 − 1)^n) · (n/(n + 1)) = (n^2/(n^2 − 1))^n · (n/(n + 1)) = (1 + 1/(n^2 − 1))^n · (n/(n + 1)).

Now, we use (3.2) with α = 1/(n^2 − 1), to get

y_{n−1}/y_n ≥ (1 + n/(n^2 − 1)) · (n/(n + 1)) ≥ (1 + 1/n) · (n/(n + 1)) = 1.

Thus, the sequence y_n is a positive decreasing sequence, and Theorem 3.11 implies that lim_{n→∞} y_n exists. Thus, the sequence x_n can be written as a product of two converging sequences:

x_n = (1 + 1/n)^n = (1 + 1/n)^{n+1} · 1/(1 + 1/n),

hence x_n is itself converging. □
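Numerically (a Python sketch, not part of the notes), x_n = (1 + 1/n)^n increases and y_n = (1 + 1/n)^{n+1} decreases, and the two squeeze the limit e between them:

    import math

    for n in [1, 2, 5, 10, 100, 1000, 100000]:
        x = (1 + 1 / n) ** n          # increasing, below e
        y = (1 + 1 / n) ** (n + 1)    # decreasing, above e
        print(n, x, y)

    print("math.e =", math.e)          # both columns approach 2.718281828...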


Let us explain informally why (3.1) agrees with what we may know from a standard calculus course. It follows from (3.1) that we can write

e = (1 + 1/n)^n + α_n,    (3.3)

with a sequence α_n that goes to zero as n → +∞. Taking the logarithm with respect to base e, which we will denote simply by log, gives

1 = n log(1 + 1/n) + β_n,    (3.4)

with

β_n = log[1 + α_n (1 + 1/n)^{−n}].

Exercise 3.17 Show that lim_{n→∞} β_n = 0. Hint: use the fact that α_n → 0.

Using the result of Exercise 3.17 in (3.4) gives

log(1 + 1/n) = 1/n − β_n/n.    (3.5)

Exercise 3.18 Recalling the calculus definition of the derivative (that we officially do not know yet), show that (3.5) implies that if the derivative of f(x) = log x exists at x = 1, then f′(1) = 1. Show that e is the unique number a such that

(d/dx)(log_a x)|_{x=1} = 1.
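A numerical companion to (3.5) and Exercise 3.18 (a Python sketch, not part of the notes): n·log(1 + 1/n) is the difference quotient (log(1 + h) − log 1)/h with h = 1/n, and it approaches 1.

    import math

    for n in [1, 10, 100, 10000, 1000000]:
        print(n, n * math.log(1 + 1 / n))   # tends to 1, i.e. (log x)'|_{x=1} = 1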

3.4 The Cauchy criterion

Definition 3.19 We say that a_n is a Cauchy sequence if for any ε > 0 there exists N ∈ N so that |a_n − a_m| < ε for all n, m ≥ N.

Theorem 3.20 A sequence a_n converges if and only if a_n is a Cauchy sequence.

Proof. One direction is easy. Assume that a_n converges and A = lim_{n→∞} a_n. Given ε > 0 we can find N ∈ N so that |a_n − A| < ε/2 for all n ≥ N. Then, for all n, m ≥ N we have

|a_n − a_m| ≤ |a_n − A| + |A − a_m| < ε/2 + ε/2 = ε,

hence a_n is a Cauchy sequence.

Next, assume that a_n is a Cauchy sequence. First, we claim that a_n is bounded. Indeed, taking ε = 1, we can find N such that |a_n − a_m| < 1 for all n, m ≥ N. In particular, we have a_N − 1 < a_m < a_N + 1 for all m ≥ N. In addition, there are only finitely many elements of the sequence a_n with n < N, so the set {a_1, a_2, . . . , a_{N−1}} is a bounded set. Hence, {a_n} is a union of two bounded sets, hence it is also bounded, and the sequence a_n is bounded as well. Thus, for each n ∈ N we can define

x_n = inf_{k≥n} a_k, y_n = sup_{k≥n} a_k.

It is clear from the definition that x_n ≤ x_{n+1} ≤ y_{n+1} ≤ y_n for all n ∈ N, so that the sequence x_n is increasing and the sequence y_n is decreasing. By the nested intervals lemma there exists a point A common to all intervals [x_n, y_n]:

x_n ≤ A ≤ y_n for all n ∈ N.

In addition, we have

x_n ≤ a_n ≤ y_n for all n ∈ N,

and it follows that

|A − a_n| ≤ |x_n − y_n|.    (3.6)

However, given any ε > 0 we can find N so that for all n, m ≥ N we have

|a_n − a_m| < ε/10,

and in particular,

|a_n − a_N| < ε/10.

Now, it follows from the definition of x_n and y_n that for all n ≥ N we have

|x_n − a_N| ≤ ε/10, |y_n − a_N| ≤ ε/10,

hence

|x_n − y_n| ≤ 2ε/10 < ε.

We conclude from this and (3.6) that |A − a_n| < ε for all n ≥ N, thus a_n converges to A as n → +∞. □

Exercise 3.21 Use the Cauchy criterion to show that the sequence

a_n = 1 + 1/2 + 1/3 + · · · + 1/n

does not converge. Hint: show that a_{2n} − a_n ≥ 1/2 for all n.
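The hint is easy to check numerically (a Python sketch, not part of the notes): the block a_{2n} − a_n = 1/(n+1) + · · · + 1/(2n) never drops below 1/2, so the partial sums of the harmonic series fail the Cauchy criterion.

    def partial_sum(n):
        # a_n = 1 + 1/2 + ... + 1/n
        return sum(1.0 / k for k in range(1, n + 1))

    for n in [1, 10, 100, 1000, 10000]:
        print(n, partial_sum(2 * n) - partial_sum(n))   # always at least 1/2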

Exercise 3.22 (i) Show that the claim of Theorem 3.20 does not hold for the set Q of rational numbers. In other words, show that a Cauchy sequence of rational numbers does not necessarily converge to a rational number. This is another crucial difference between Q and R. (ii) However, show, without using the results about real numbers, that if a sequence a_n of rational numbers is Cauchy and a subsequence a_{n_k} converges to a rational number ℓ ∈ Q, then the sequence a_n converges to ℓ. In other words, a Cauchy sequence that has a converging subsequence must converge.

3.5 The Bolzano-Weierstrass theorem, lim sup and lim inf

Definition 3.23 If x_n is a sequence, and n_k ∈ N is an increasing sequence: n_1 < n_2 < n_3 < · · · < n_k < . . . , then the sequence y_k = x_{n_k} is called a subsequence of x_n.

Theorem 3.24 Every bounded sequence of real numbers contains a convergent subsequence.

Proof. Let E be the set of values of a bounded sequence x_n. If E is finite, then there exists a ∈ R so that x_n = a for infinitely many n. It is then an easy exercise to see that there exists a subsequence x_{n_k} of x_n such that x_{n_k} = a for all k, hence lim_{k→∞} x_{n_k} = a and we are done. If the set E is infinite, then, as x_n is a bounded sequence, E is an infinite bounded set, hence by the Bolzano-Weierstrass Lemma 2.11, it has a limit point A. Then one can choose n_1 so that |x_{n_1} − A| < 1. Next, note that, since A is a limit point of x_n, the interval (A − 1/2, A + 1/2) contains infinitely many elements of the sequence, hence one can choose n_2 > n_1 so that |x_{n_2} − A| < 1/2, and so on, at each step choosing n_{k+1} > n_k so that |x_{n_{k+1}} − A| < 1/(k + 1). As lim_{k→∞} 1/(k + 1) = 0, the sequence x_{n_k} converges to A, and we are done. □


Definition 3.25 If there exists a subsequence x_{n_k} such that ℓ = lim_{k→∞} x_{n_k}, then we say ℓ is a limit of x_n along a subsequence.

We now define lim sup and lim inf of a sequence x_n. As we have done in the proof of the Cauchy criterion for convergence, let us set

a_n = inf_{k≥n} x_k, b_n = sup_{k≥n} x_k.

As we have observed in that proof, if x_n is bounded from below, then the a_n are well-defined, and the sequence a_n is increasing, while if x_n is bounded from above, then the b_n are well-defined, and the sequence b_n is decreasing. We denote the corresponding limits by

lim sup_{n→∞} x_n = lim_{n→∞} (sup_{k≥n} x_k), lim inf_{n→∞} x_n = lim_{n→∞} (inf_{k≥n} x_k).

Proposition 3.26 Let x_n be a bounded sequence. Then lim inf x_n and lim sup x_n are, respectively, the smallest and the largest limits of x_n along a subsequence.

Exercise 3.27 Prove Proposition 3.26.
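As a concrete example (a Python sketch, not part of the notes), take x_n = (−1)^n (1 + 1/n): its lim inf is −1 and its lim sup is 1, and these are exactly the smallest and largest subsequential limits. The finite tails below stand in for the infinite tail infima and suprema.

    def x(n):
        return (-1) ** n * (1 + 1 / n)

    xs = [x(n) for n in range(1, 10001)]   # a long finite piece of the sequence

    for n in [1, 10, 100, 1000]:
        tail = xs[n - 1:]                  # x_n, x_{n+1}, ..., truncated
        print(n, min(tail), max(tail))     # approximates a_n = inf and b_n = sup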

4 Continuity and limits of a function

4.1 Limit of a function at a point

Let f(x) be a function f : [c, d]→ R and a ∈ [c, d].

Definition 4.1 We say that

lim_{x→a} f(x) = A

in the sense of Cauchy if for every ε > 0 there exists δ > 0 so that for any x ≠ a such that |x − a| < δ and x ∈ [c, d] we have |f(x) − A| < ε.

Note that we exclude x = a in the above definition. The reason is that we do not want the value of lim_{x→a} f(x) to depend on the value of f(a). For instance, if f(x) = 1 for x ≠ 0 and f(0) = 0, we would like to have lim_{x→0} f(x) = 1.

An alternative definition is as follows.

Definition 4.2 We say that A is a sequential limit of f(x) as x → a if for any sequence x_n → a, with x_n ≠ a and x_n ∈ [c, d], we have

lim_{n→∞} f(x_n) = A.

Theorem 4.3 The two definitions of the limit of f(x) at a point a are equivalent.

Proof. Let us first assume that

lim_{x→a} f(x) = A

in the sense of Cauchy, and let x_n be a sequence such that x_n → a, with x_n ≠ a and x_n ∈ [c, d]. Given ε > 0 there exists δ > 0 so that for any x ≠ a such that |x − a| < δ we have |f(x) − A| < ε. As x_n → a, given this δ > 0, we can find N so that |x_n − a| < δ for all n ≥ N, and then |f(x_n) − A| < ε for all n ≥ N, by the above, hence f(x_n) → A.


Next, assume that f(x) converges in the sequential sense to A as x → a but that A is not the limit of f(x) in the Cauchy sense as x → a. This means that there exists ε_0 > 0 such that for any δ > 0 there exists x ≠ a such that |x − a| < δ and |f(x) − A| ≥ ε_0. Let us take δ_n = 1/n – this will generate a corresponding sequence x_n. It is easy to see that x_n → a but f(x_n) does not converge to A, which is a contradiction. □

Exercise 4.4 Verify that the usual arithmetic and inequality properties hold for the limit of a function at a point.

4.2 Continuous functions and their basic properties

Definition 4.5 (1) A function f(x) : [c, d] → R is continuous at a point a ∈ [c, d] if

lim_{x→a} f(x) = f(a).

(2) A function f : [c, d] → R is continuous on the interval [c, d] if it is continuous at every point of [c, d].

We summarize some basic properties of continuous functions in the following proposition.

Proposition 4.6 (i) Let f(x), defined on an interval [c, d], be continuous at a point z ∈ [c, d] with f(z) ≠ 0. Then there is a neighborhood U of z such that f(x) has the same sign as f(z) for all x ∈ U such that x ∈ [c, d].
(ii) If the functions f and g defined on [c, d] are continuous at z ∈ [c, d], then so are the functions f + g and fg. Also, the function f/g is continuous at z provided that g(z) ≠ 0.
(iii) Let f be a continuous function defined on an interval [c_1, d_1] and g be a continuous function defined on an interval [c_2, d_2], such that g(x) ∈ [c_1, d_1] for all x ∈ [c_2, d_2], so that the composition (f ◦ g)(x) = f(g(x)) is defined for all x ∈ [c_2, d_2]. Then f ◦ g is continuous on [c_2, d_2].

Exercise 4.7 Prove Proposition 4.6; pay special attention to (i) and (iii).

The next theorem and Corollary 4.9 show that a continuous function "can not skip values".

Theorem 4.8 (Intermediate Value Theorem) Let f be a continuous function on a closed interval [a, b] such that f(a) and f(b) have different signs, that is, f(a)f(b) ≤ 0. Then there exists c ∈ [a, b] such that f(c) = 0.

Proof. If either f(a) = 0 or f(b) = 0, we are done, so let us assume without loss of generality that f(a) < 0 and f(b) > 0. Let us denote I_1 = [a, b] and divide I_1 in half. If the value of f at the mid-point is zero, we are done; otherwise, one of the two closed intervals has the same property: the signs of f at the two end-points are different. Let us call that interval I_2 = [a_1, b_1], divide it at its mid-point, and continue. The process terminates if the value of f at one of the mid-points is zero, which means we are done – we have found a point on [a, b] where f vanishes. Otherwise, we get an infinite sequence of nested intervals I_k = [a_k, b_k] such that f(a_k) and f(b_k) have different signs, and I_{k+1} ⊂ I_k. We also know that |I_k| = |a − b|/2^{k−1} goes to zero as k → +∞. The Nested Intervals Lemma implies that there is a unique point c ∈ [a, b] in the intersection of all I_k. Let a′_k be the endpoint of I_k such that f(a′_k) < 0 and b′_k be the endpoint of I_k such that f(b′_k) > 0. Note that a′_k → c and b′_k → c as k → +∞. As c ∈ [a, b], the function f is continuous at c. Continuity of f at c implies that

f(c) = lim_{k→∞} f(a′_k), f(c) = lim_{k→∞} f(b′_k).

The first equality above implies that f(c) ≤ 0 and the second implies that f(c) ≥ 0. It follows that f(c) = 0. □
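The proof above is exactly the bisection method. A minimal Python sketch (not part of the notes), run on f(x) = x^2 − 2 over [1, 2], where the bracketed root is the number √2 from Exercise 1.17:

    def bisect(f, a, b, n_steps=50):
        # Keep halving an interval on which f changes sign, as in the proof.
        assert f(a) * f(b) <= 0
        for _ in range(n_steps):
            c = (a + b) / 2
            if f(a) * f(c) <= 0:   # the sign change stays in [a, c]
                b = c
            else:                  # otherwise it stays in [c, b]
                a = c
        return (a + b) / 2

    print(bisect(lambda x: x * x - 2, 1.0, 2.0))   # approximately 1.4142135623...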

Corollary 4.9 Let f be a continuous function on a closed interval [a, b] such that f(a) < A and f(b) > A, or the other way around. Then there exists c ∈ [a, b] such that f(c) = A.

Exercise 4.10 Prove Corollary 4.9. Hint: apply the Intermediate Value Theorem to the function g(x) = f(x) − A.

Theorem 4.11 A continuous function on a closed interval [a, b] is bounded and attains both its maximum and its minimum on [a, b].

Proof. Since f is continuous on [a, b], for every point x ∈ [a, b] there exists an open interval I_x containing x such that

f(x) − 1 ≤ f(y) ≤ f(x) + 1 for all y ∈ I_x such that y ∈ [a, b].

The open intervals I_x form a cover of the closed interval [a, b], hence the Heine-Borel lemma implies that there exists a finite sub-collection I_{x_1}, . . . , I_{x_N} that also covers [a, b]. Let

M = 1 + max(f(x_1), . . . , f(x_N)),

and

m = −1 + min(f(x_1), . . . , f(x_N)).

As the intervals I_{x_1}, . . . , I_{x_N} cover [a, b], we know that every point z ∈ [a, b] belongs to some I_{x_k}. It follows that

m ≤ f(z) ≤ M,

for all z ∈ [a, b], hence the function f is bounded on [a, b].

To see that the maximum and minimum are attained, let

M = sup_{x∈[a,b]} f(x)    (4.1)

and assume that f(x) ≠ M for all x ∈ [a, b]. Then the function g(x) = 1/(M − f(x)) is continuous on [a, b], thus, by what we have just proved, it is bounded: there exists K so that

g(x) ≤ K for all x ∈ [a, b].

But then we have

M − f(x) ≥ 1/K for all x ∈ [a, b],

so that

f(x) ≤ M − 1/K for all x ∈ [a, b],

which contradicts the definition of M in (4.1). Thus, there has to exist c ∈ [a, b] such that f(c) = M, hence f attains its maximum on [a, b]. The proof that f attains its minimum on [a, b] is almost verbatim the same. □


Exercise 4.12 Prove that a continuous function on a closed interval [a, b] is bounded and attains its minimum on [a, b].

Exercise 4.13 Give an alternative proof of Theorem 4.11 with the following outline that relies on the sequential definition of continuity: assume that M = sup_{x∈[a,b]} f(x) but f(x) ≠ M for any x ∈ [a, b]. Show that then there exists a sequence x_n such that all x_n ∈ [a, b] and f(x_n) → M as n → +∞. Use the Bolzano-Weierstrass lemma to show that x_n has a subsequence x_{n_k} that converges to some point z ∈ [a, b]. Show that f(z) = M.

Exercise 4.14 (i) Give an example of a continuous function on the open interval (0, 1) that is unbounded. (ii) Give an example of a continuous function on the open interval (0, 1) that is bounded but does not attain its maximum or minimum on (0, 1). (iii) Give an example of a discontinuous function on the closed interval [0, 1] that is unbounded. (iv) Give an example of a discontinuous function on the closed interval [0, 1] that is bounded but does not attain its maximum or minimum on [0, 1].

4.3 Uniform continuity

Definition 4.15 A function f is uniformly continuous on a set E if for every ε > 0 there exists δ > 0 so that |f(x) − f(y)| < ε for all x, y ∈ E such that |x − y| < δ.

Exercise 4.16 Show that the function f(x) = sin(1/x) is continuous but not uniformly continuous on the open interval (0, 1). Hint: consider the points x_k = 1/(2πk + π/2) and y_k = 1/(2πk − π/2) with k ∈ N.
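The hint is easy to see numerically (a Python sketch, not part of the notes): the points x_k and y_k get arbitrarily close together, yet sin(1/x_k) = 1 and sin(1/y_k) = −1 for every k, so no single δ works for, say, ε = 1.

    import math

    for k in [1, 10, 100, 1000]:
        xk = 1 / (2 * math.pi * k + math.pi / 2)
        yk = 1 / (2 * math.pi * k - math.pi / 2)
        print(k, abs(xk - yk), abs(math.sin(1 / xk) - math.sin(1 / yk)))
    # the first column tends to 0 while the second stays equal to 2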

Exercise 4.17 (i) Let a function f be continuous on the open interval (a, b) and assume that f is bounded on (a, b): there exists M so that |f(x)| ≤ M for all x ∈ (a, b), and also that f takes each value y ∈ [−M, M] at most finitely many times. Show that then lim_{x→a} f(x) exists.
(ii) Let a function f be uniformly continuous on the open interval (a, b). Show that then lim_{x→a} f(x) exists.

Proposition 4.18 A uniformly continuous function defined on a bounded set E is bounded.

Proof. Assume that f is unbounded on E. Then there exists a sequence of points x_k ∈ E such that

|f(x_{k+1})| ≥ |f(x_k)| + k.

It follows, in particular, that

|f(x_m) − f(x_k)| ≥ k for any k and m > k.    (4.2)

Uniform continuity of f implies that there exists δ_0 > 0 so that |f(x) − f(y)| < 1 for all x, y ∈ E such that |x − y| < δ_0. The sequence x_k is bounded, hence it has a convergent subsequence x_{n_k}. Therefore, the sequence x_{n_k} is Cauchy. In particular, there exists N such that |x_{n_k} − x_{n_m}| < δ_0 for all k, m ≥ N. But then |f(x_{n_k}) − f(x_{n_m})| < 1 for all k, m ≥ N, which contradicts (4.2). Thus, the function f has to be bounded on E. □

Theorem 4.19 A function that is continuous on a closed interval [a, b] is uniformly continuous on that interval.


Proof. Let f be continuous on [a, b]. Then, given ε > 0, for each x ∈ [a, b] there exists δ_x > 0 such that |f(x) − f(y)| < ε/10 for all y such that |y − x| < δ_x. Let us consider the family of smaller open intervals I_x = (x − δ_x/2, x + δ_x/2). As they cover the closed interval [a, b], it follows from the Heine-Borel lemma that there exists a finite sub-collection I_{x_1}, . . . , I_{x_N} of such intervals that also covers [a, b]. Let us set δ = min(δ_{x_1}/2, . . . , δ_{x_N}/2). We will show that for any x, y ∈ [a, b] such that |x − y| < δ we have |f(x) − f(y)| < ε – this will prove the uniform continuity of f. Given such x and y, since the intervals I_{x_1}, . . . , I_{x_N} cover [a, b], there exists k ∈ {1, . . . , N} such that x ∈ I_{x_k}, and |x_k − x| < δ_{x_k}/2. Then we also have

|y − x_k| ≤ |y − x| + |x − x_k| < δ + δ_{x_k}/2 ≤ δ_{x_k}/2 + δ_{x_k}/2 = δ_{x_k}.

Hence, both x and y belong to the interval (x_k − δ_{x_k}, x_k + δ_{x_k}). It follows that

|f(x) − f(x_k)| < ε/10, |f(y) − f(x_k)| < ε/10.

The triangle inequality now implies that

|f(x) − f(y)| ≤ |f(x) − f(x_k)| + |f(y) − f(x_k)| < ε/10 + ε/10 < ε.

Thus, we have shown that for any ε > 0 we can find δ > 0 so that for any x, y ∈ [a, b] such that |x − y| < δ we have |f(x) − f(y)| < ε, which means that the function f is uniformly continuous on [a, b]. □

5 Open, closed and compact sets in R^n

5.1 Open and closed sets

Let us recall that the distance between two points x = (x_1, . . . , x_n) ∈ R^n and y = (y_1, . . . , y_n) ∈ R^n is

d(x, y) = [(x_1 − y_1)^2 + · · · + (x_n − y_n)^2]^{1/2},

also often denoted as

‖x − y‖ = [(x_1 − y_1)^2 + · · · + (x_n − y_n)^2]^{1/2}.

We will use the notation B(x, r) for the open ball centered at a point x ∈ R^n of radius r:

B(x, r) = {y ∈ R^n : d(x, y) < r},

and B̄(x, r) for the closed ball:

B̄(x, r) = {y ∈ R^n : d(x, y) ≤ r}.
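A minimal computational sketch of these definitions (not part of the notes; the helper names dist and in_open_ball are only for illustration):

    import math

    def dist(x, y):
        # Euclidean distance d(x, y) between two points of R^n given as tuples.
        return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

    def in_open_ball(y, center, r):
        return dist(y, center) < r

    x = (0.0, 0.0, 0.0)
    print(dist(x, (1.0, 2.0, 2.0)))               # 3.0
    print(in_open_ball((1.0, 2.0, 2.0), x, 3.0))  # False: the point lies on the boundary sphere
    print(in_open_ball((1.0, 2.0, 2.0), x, 3.5))  # True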

Definition 5.1 A set U ⊂ R^n is open if for every x ∈ U there is an open ball B(x, r) that is contained in U.

Definition 5.2 A point x ∈ R^n is a limit point of a set G if for any open ball B(x, r) with r > 0 there exists a point y ∈ B(x, r) such that y ∈ G and y ≠ x.

Definition 5.3 A set C ⊂ R^n is closed if all limit points of C are contained in C.

Definition 5.4 The complement of a set S ⊂ R^n is the set S^c = R^n \ S.


Exercise 5.5 (i) Let F be a collection of sets and let U = ⋃_{S∈F} S. Show that U^c = ⋂_{S∈F} S^c.
(ii) Let F be a collection of sets and let G = ⋂_{S∈F} S. Show that G^c = ⋃_{S∈F} S^c.

Theorem 5.6 A set C ⊂ R^n is closed if and only if its complement U = R^n \ C is open.

Proof. Let C be a closed set and x ∈ U = R^n \ C. As C is closed, we know that x is not a limit point of C. Thus, there exists a ball B(x, r) such that there is no point from C, except, possibly, x, in B(x, r). As x ∈ U = R^n \ C, we know that x ∉ C, hence the ball B(x, r) contains no points from C, hence B(x, r) ⊂ U. This shows that U is an open set.

Next, assume that a set C is such that U = R^n \ C is an open set, and let x be a limit point of C. We will show that x ∈ C. Indeed, if x ∉ C, then x ∈ U and, as U is open, there exists a ball B(x, r) that is contained in U. Hence, no points of B(x, r) are in C. This contradicts the assumption that x is a limit point of C. Thus, C contains all its limit points and is closed. □

Theorem 5.7 (i) The union of any collection of open sets is itself an open set.
(ii) The intersection of any collection of closed sets is a closed set.
(iii) The intersection ⋂_{i=1}^N U_i of a finite collection of open sets U_1, . . . , U_N is an open set.
(iv) The union ⋃_{i=1}^N F_i of a finite collection of closed sets F_1, . . . , F_N is a closed set.

Proof. Let us prove (i). Let U_α, α ∈ A (here, A is just some set of indices, finite or infinite) be a collection of open sets and let x ∈ U = ⋃_{α∈A} U_α. Then there exists some U_α with α ∈ A such that x ∈ U_α. As U_α is an open set, there exists r > 0 such that the ball B(x, r) is contained in U_α. But then B(x, r) ⊆ U, hence U is an open set. Note that (i) is equivalent to (ii) because of Theorem 5.6 and Exercise 5.5, hence (ii) is also proven.

To prove (iii), let the sets U_1, . . . , U_N be open, and take x ∈ U = ⋂_{i=1}^N U_i. As all U_i are open sets, there exist r_1, . . . , r_N > 0 such that B(x, r_k) ⊆ U_k for all k ∈ {1, . . . , N}. Let r = min(r_1, . . . , r_N); then B(x, r) ⊆ U, hence U is an open set. Note that (iii) is equivalent to (iv) because of Theorem 5.6 and Exercise 5.5, hence (iv) is also proven. □

Here is a bunch of related definitions.

Definition 5.8 (i) A point x is an interior point of a set U ⊂ R^n if there is a ball B(x, r) that is contained in U.
(ii) A point x is a boundary point of a set U ⊂ R^n if for any r > 0 the ball B(x, r) contains both points in U and points not in U. The boundary of U is denoted as ∂U.
(iii) The closure Ū of a set U is the union of U and the set of its limit points.

Exercise 5.9 Show that Ū is the union of U and the boundary ∂U.

Exercise 5.10 Show that the closure S̄ of any set S is a closed set.

Exercise 5.11 Show that a set F is closed if and only if F̄ = F.


5.2 Compact sets in R^n

Definition 5.12 A set K ⊂ R^n is compact if from every covering of K by sets that are open in R^n, one can extract a finite sub-covering of K.

The Heine-Borel lemma implies that a closed interval [a, b] is a compact set in R. Here is a generalization of this result to R^n. A closed n-dimensional interval is a set of the form

I_{ab} = {x ∈ R^n : a_i ≤ x_i ≤ b_i, i = 1, . . . , n},

with some fixed numbers a_i < b_i. We may also define the open n-dimensional interval as

I^o_{ab} = {x ∈ R^n : a_i < x_i < b_i, i = 1, . . . , n}.

Exercise 5.13 A closed n-dimensional interval is a closed set in R^n.

Exercise 5.14 An open n-dimensional interval is an open set in R^n.

Proposition 5.15 A closed n-dimensional interval is a compact set in R^n.

Proof. Let us fix a_i < b_i, i = 1, . . . , n, and consider the corresponding closed n-dimensional interval I_{ab}. Assume that there is a covering G of I_{ab} by open sets such that no finite sub-collection of G covers I_{ab}. Let us bisect each interval a_i ≤ x_i ≤ b_i in half – this partitions I_{ab} into 2^n closed n-dimensional sub-intervals. Note that at least one of these n-dimensional sub-intervals can not be covered by a finite sub-collection of G. Let us call this n-dimensional sub-interval I^{(1)}. Continuing this process, we obtain a nested sequence of closed n-dimensional intervals I^{(k)} such that I^{(k+1)} is contained in I^{(k)}, none of which admits a finite sub-covering. Each I^{(k)} has the form

I^{(k)} = {x ∈ R^n : a_i^{(k)} ≤ x_i ≤ b_i^{(k)}, i = 1, . . . , n},

with some a_i^{(k)} < b_i^{(k)}, i ∈ {1, . . . , n}. By construction, for each i ∈ {1, . . . , n}, the intervals [a_i^{(k)}, b_i^{(k)}] form a nested collection of closed intervals whose lengths go to zero as k → +∞. Hence, by the Nested Intervals Lemma, for each i there exists a unique c_i ∈ R such that c_i belongs to all intervals [a_i^{(k)}, b_i^{(k)}], for all k ≥ 1. Then, the point c = (c_1, . . . , c_n) ∈ I_{ab} belongs to all n-dimensional intervals I^{(k)}, for all k ≥ 1. Since c ∈ I_{ab}, there exists an open set U ∈ G so that c ∈ U. As U is an open set, there exists r > 0 so that the ball B(c, r) is contained in U. However, by construction, there then exists N ∈ N so that all I^{(k)} with k ≥ N are contained in U.

Exercise 5.16 Prove this last assertion. Hint: first, show that a_i^{(k)} → c_i and b_i^{(k)} → c_i as k → +∞ for each i ∈ {1, . . . , n}.

This contradicts the fact that none of the I^{(k)} can be covered by finitely many sets from G. Hence, a finite sub-cover of I_{ab} by sets from G has to exist, thus the set I_{ab} is compact. □

Proposition 5.17 Any compact set K in Rn is closed.

Proof. Let K be a compact set and z be not in K. For each x ∈ K, we take rx = ‖x − z‖/3and consider the open ball B(x, rx). Note that z 6∈ B(x, rx) for any x ∈ K and, moreover, theballs B(x, rx) and B(z, rx) do not intersect for any x ∈ K. On the other hand, the open balls B(x, rx)form a cover G of K by open sets. As the set K is compact, there exists a finite sub-cover of K by thesets in G. That is, there exist x1, . . . , xN ∈ K such that K is contained in the union

⋃Ni=1B(xi, rxi).

20

Page 21: Lecture notes for Math 61CM, Analysis, Version 2018math.stanford.edu/.../STANF61CM-18/lecture-notes-61cm-18-analysi… · Lecture notes for Math 61CM, Analysis, Version 2018 Lenya

Let R = min(r1, . . . , rN ), then, by construction, the ball B(z,R) does not intersect any of theballs B(xk, rxk), k = 1, . . . , N . As these balls cover K, it follows that the ball B(z,R) contains nopoints from K, hence z is not a limit point of K. Thus, K is a closed set. �

It is not true that all closed sets are compact in Rn. For example, the whole space Rn is a closed set that is not compact, as is the closed half-space {x = (x1, . . . , xn) ∈ Rn : x1 ≥ 0}, or the complement of the open unit ball, {x ∈ Rn : |x| ≥ 1}. The next proposition shows that a closed set that is contained in some compact set is itself compact.

Proposition 5.18 Let a set K ⊂ Rn be compact and K1 be a subset of K that is a closed set. Then K1 is a compact set.

Proof. Let K1 ⊂ Rn be a closed set that is a subset of a compact set K ⊂ Rn, and let G1 be a collection of open sets that covers K1. Since K1 is a closed set, the set U1 = K1^c = Rn \ K1 is open. Let us add this set to the collection G1 – the resulting collection G covers not just K1 but also the set K. Since K is compact, there is a finite sub-collection O1, . . . , ON of sets in G that covers K. If none of the sets Ok is U1, then this is a sub-collection not only of G but also of the original collection G1. Moreover, as this sub-collection covers K, it also covers K1 – thus we have found a finite sub-collection of sets in G1 that covers K1. On the other hand, if one of the sets Ok, with k = 1, . . . , N, say, ON is U1, then we can remove it and consider the collection O1, . . . , O_{N−1}. Note that, since the collection O1, . . . , ON covers K, it also covers K1. Therefore, for any x ∈ K1 we can find k = 1, . . . , N such that x ∈ Ok. However, since U1 does not intersect K1 and we assume that ON = U1, it is impossible that k = N for any x ∈ K1. Therefore, the collection O1, . . . , O_{N−1} covers K1. However, all sets Ok, k = 1, . . . , N − 1, are actually not only in G but also in the original collection G1. Thus, we have found a finite sub-collection of G1 that covers K1. Since G1 is an arbitrary collection of open sets that covers K1, it follows that K1 is a compact set. �

Definition 5.19 A set S ⊂ Rn is bounded if there exists R > 0 so that S is contained in theball B(0, R).

In other words, a set S is bounded if there exists R > 0 so that for any x ∈ S we have |x| < R.

Proposition 5.20 If a set K ⊂ Rn is compact then K is bounded.

Proof. Let K be a compact set, and consider the cover of K by open balls of the form B(x, 1), for all x ∈ K. As the set K is compact, there exists a finite sub-cover of K by such open balls, that is, a finite collection of open balls B(x1, 1), B(x2, 1), . . . , B(xN, 1) with xk ∈ K, that covers K. Let us set

R = 2 + max_{1≤i≤N} |xi|.

As the balls B(xk, 1), k = 1, . . . , N, cover K, for any x ∈ K we can find k such that x ∈ B(xk, 1). Then we have, by the triangle inequality,

|x| ≤ |x − xk| + |xk| ≤ 1 + max_{1≤i≤N} |xi| < R,

hence K is a bounded set. �

Theorem 5.21 A set K ⊂ Rn is compact if and only if it is closed and bounded.

Proof. Propositions 5.17 and 5.20 show that if K ⊂ Rn is a compact set then it is closed andbounded.

Let us now assume that a set K ⊂ Rn is closed and bounded. Since K is bounded, it is containedin some closed n-dimensional interval I. Proposition 5.15 shows that I is a compact set. Thus, Kis a closed (by assumption) subset of a compact set. Now, Proposition 5.18 implies that K is acompact set and we are done. �


6 Continuous mappings in Rn

6.1 Convergence of sequences in Rm

Let x(n) ∈ Rm be a sequence with values in Rm. In other words, x(n) = (x(n)_1, x(n)_2, . . . , x(n)_m) is an m-tuple of real-valued sequences x(n)_1, x(n)_2, . . . , x(n)_m.

Definition 6.1 Let ℓ ∈ Rm. We say that lim_{n→∞} x(n) = ℓ if for every ε > 0 there exists N so that for all n ≥ N we have ‖x(n) − ℓ‖ < ε.

Note that this definition is equivalent to the following: lim_{n→∞} x(n) = ℓ if

lim_{n→∞} ‖x(n) − ℓ‖ = 0,

with the limit understood simply in the sense of convergence of the sequence ‖x(n) − ℓ‖ of real numbers to zero.

Exercise 6.2 Show that a sequence {x(1), x(2), . . . , x(n), . . . } in Rm converges to ℓ = (ℓ1, . . . , ℓm) ∈ Rm if and only if

lim_{n→∞} x(n)_k = ℓ_k, for all 1 ≤ k ≤ m.

In other words, convergence of sequences with values in Rm is equivalent to the convergence of all m components of the sequence. The key to this exercise is the following pair of inequalities that hold for all x, y ∈ Rm:

|xi − yi| ≤ ‖x − y‖ for all 1 ≤ i ≤ m,    ‖x − y‖ ≤ √m max_{1≤i≤m} |xi − yi|. (6.1)

The first inequality in (6.1) is obvious from the definition of ‖x − y‖ and the second also follows immediately from this definition:

‖x − y‖^2 = (x1 − y1)^2 + · · · + (xm − ym)^2 ≤ m max_{1≤i≤m} |xi − yi|^2.

Taking the square root of both sides gives the second inequality in (6.1).

Definition 6.3 Let x(n) be a sequence in Rm. We say that x(n) is Cauchy if for every ε > 0 thereexists N ∈ N so that for all j, k ≥ N we have ‖x(j) − x(k)‖ < ε.

Proposition 6.4 A sequence {x(n)} in Rm is Cauchy if and only if it is convergent.

Exercise 6.5 Prove this proposition, using the inequalities (6.1) and the fact that sequences of real numbers are Cauchy if and only if they are convergent.

6.2 Continuity of maps from Rn to Rm

We will now consider maps f : Rn → Rm and generalize the notions of continuity and convergence that we have defined for real-valued functions on Rn and sequences of real numbers. Such a map is simply a collection of m real-valued functions f1(x), . . . , fm(x), each of which maps Rn to R. For example,

f(x1, x2) = (x1^2 + x2^2, x2)

is a map from R2 to R2 with f1(x1, x2) = x1^2 + x2^2 and f2(x1, x2) = x2, while

g(x1, x2, x3) = (x1 + x2, cos(x1 + x3), e^{x1+x2+x3}, sin(x3))

is a map from R3 to R4 with

g1(x1, x2, x3) = x1 + x2, g2(x1, x2, x3) = cos(x1 + x3), g3(x1, x2, x3) = e^{x1+x2+x3}, g4(x1, x2, x3) = sin(x3).

We define the limit of a mapping f : Rn → Rm similarly to what we did for real-valued functions.

Definition 6.6 Let f : Rn → Rm be a mapping, a ∈ Rn, and ℓ ∈ Rm. We say that lim_{x→a} f(x) = ℓ if for every ε > 0 there exists δ > 0 so that for all x ∈ B(a, δ) such that x ≠ a we have f(x) ∈ B(ℓ, ε).

Note that we use a slightly different notation than in R: instead of writing ‖x − a‖ < δ we write x ∈ B(a, δ), and instead of saying ‖f(x) − ℓ‖ < ε we say f(x) ∈ B(ℓ, ε), but the meaning is literally the same. The reason for this choice is to emphasize the geometry of what is going on.

Definition 6.7 We say that a mapping f : U → Rm, with U ⊂ Rn, is continuous at x ∈ U if

lim_{x′→x} f(x′) = f(x).

This definition is, of course, exactly the same as in R. The same proofs as in R show that if a map f : U → Rm is continuous at x ∈ U and λ ∈ R then λf is continuous at x, and that if two maps f : U → Rm and g : U → Rm are both continuous at x ∈ U then so is f + g. The same is true for the composition of continuous maps.

Proposition 6.8 Let f : E → Rm be a map defined on an open set E ⊂ Rn, and g : U → Rk be a map defined on an open set U ⊂ Rm. Assume that f is continuous at x ∈ E, that y = f(x) is in U, and that g is continuous at y = f(x). Then the composition g ◦ f is defined in a ball around x and is continuous at x.

Proof. Given ε > 0, since g is continuous at y we can find r > 0 so that ‖g(y′) − g(y)‖ < ε for all y′ ∈ B(y, r). Moreover, as f is continuous at x, we can find δ > 0 so that ‖f(x′) − y‖ < r for all x′ ∈ B(x, δ). It follows that ‖g(f(x′)) − g(f(x))‖ < ε for all x′ ∈ B(x, δ) and we are done. �

6.3 Successive limits vs. the limit

One should not think that the limit of a function of several variables can be computed by taking successively the limits in each variable. The standard example when this fails is

f(x1, x2) = x1 x2/(x1^2 + x2^2), if (x1, x2) ≠ (0, 0), and f(x1, x2) = 0, if (x1, x2) = (0, 0).

Then we have lim_{x2→0} f(x1, x2) = 0 for any x1 ≠ 0, and lim_{x1→0} f(x1, x2) = 0 for any x2 ≠ 0, so that

lim_{x2→0} ( lim_{x1→0} f(x1, x2) ) = 0,    lim_{x1→0} ( lim_{x2→0} f(x1, x2) ) = 0,

but f(x, x) = 1/2 for all x ≠ 0, so that the limit lim_{(x1,x2)→(0,0)} f(x1, x2) does not exist.
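Here is a quick numerical sketch of this example (Python; the helper f below simply re-implements the function defined above):

    def f(x1, x2):
        # the function from the example above, with value 0 at the origin
        return 0.0 if (x1, x2) == (0.0, 0.0) else x1 * x2 / (x1**2 + x2**2)

    for t in [0.1, 0.01, 0.001]:
        print(f(t, 0.0), f(0.0, t), f(t, t))  # the axes give 0.0, the diagonal gives 0.5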

Here are some other variations on this theme. Consider

f(x1, x2) = (x1^2 − x2^2)/(x1^2 + x2^2), if (x1, x2) ≠ (0, 0), and f(x1, x2) = 0, if (x1, x2) = (0, 0).

Then each of the successive limits exists but they are different:

lim_{x2→0} f(x1, x2) = x1^2/x1^2 = 1 for any x1 ≠ 0,

and

lim_{x1→0} f(x1, x2) = −x2^2/x2^2 = −1 for any x2 ≠ 0,

so that

lim_{x2→0} ( lim_{x1→0} f(x1, x2) ) = −1,    lim_{x1→0} ( lim_{x2→0} f(x1, x2) ) = 1.

It is easy to see that lim_{(x1,x2)→(0,0)} f(x1, x2) does not exist.

The next interesting example is

f(x1, x2) = x1^2 x2/(x1^4 + x2^2), if (x1, x2) ≠ (0, 0), and f(x1, x2) = 0, if (x1, x2) = (0, 0).

Then the limit of f(x1, x2) is zero along any ray x1 = λt, x2 = µt, in the sense that for any λ, µ ∈ R fixed, with at least one of λ, µ not equal to zero, we have

lim_{t→0} f(λt, µt) = lim_{t→0} λ^2 µ t^3/(λ^4 t^4 + µ^2 t^2) = 0.

However, if we take x1 = t, x2 = t^2, then f(t, t^2) = 1/2, hence, again, lim_{(x1,x2)→(0,0)} f(x1, x2) does not exist.
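The same kind of numerical sketch (Python; g below re-implements the function above) shows the values dying out along rays while staying at 1/2 along the parabola x2 = x1^2:

    def g(x1, x2):
        # the function x1^2 x2 / (x1^4 + x2^2), set to 0 at the origin
        return 0.0 if (x1, x2) == (0.0, 0.0) else x1**2 * x2 / (x1**4 + x2**2)

    for t in [0.1, 0.01, 0.001]:
        print(g(t, 2 * t), g(t, -t), g(t, t**2))  # the rays tend to 0, the parabola stays at 0.5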

Finally, consider the function

f(x1, x2) = x1 + x2 sin(1/x1), if x1 ≠ 0, and f(x1, x2) = 0, if x1 = 0.

Then we have |f(x1, x2)| ≤ |x1| + |x2|, which immediately implies that the limit lim_{(x1,x2)→(0,0)} f(x1, x2) exists and equals zero. However, the successive limit

lim_{x2→0} ( lim_{x1→0} f(x1, x2) )

does not exist at all, simply because the limit

lim_{x1→0} f(x1, x2)

does not exist for any x2 ≠ 0.


6.4 Continuous functions on compact sets

Definition 6.9 We say that a function f : E → Rm is uniformly continuous on a set E ⊂ Rn if for every ε > 0 there exists δ > 0 so that ‖f(x1) − f(x2)‖ < ε for all x1, x2 ∈ E such that ‖x1 − x2‖ < δ.

Proposition 6.10 If a mapping f : K → Rm is continuous on a compact set K ⊂ Rn then it is uniformly continuous on K.

Proposition 6.11 If a mapping f : K → Rm is continuous on a compact set K ⊂ Rn then it is bounded on K.

Proposition 6.12 If a function f : K → R is continuous on a compact set K ⊂ Rn then it attains its maximum and minimum on K.

The proofs of all of the above propositions are exactly the same as for continuous real-valued functions defined on a closed interval in R: these proofs used nothing but the Heine-Borel lemma for closed intervals, and that is exactly the definition of a compact set. Check this!

6.5 Connected sets

In order to generalize the intermediate value theorem, we will need the notion of a connected set.

Definition 6.13 A continuous path is a continuous function Γ : [α, β] → Rm. We say that a set E ⊂ Rm is pathwise connected if for every x, y ∈ E there exists a continuous path Γ : [α, β] → Rm such that Γ(α) = x, Γ(β) = y and Γ(t) ∈ E for all α ≤ t ≤ β. A domain U ⊆ Rm is a pathwise connected set that is open.

Note that by re-defining the path as Γ̃(t) = Γ((1 − t)α + tβ) for 0 ≤ t ≤ 1, we can always assume that α = 0 and β = 1 in the definition of the path.

An open ball B(a, r) and a closed ball B̄(a, r) are both pathwise connected sets in Rm, for any a ∈ Rm and r > 0. Indeed, given any two points x, y ∈ B(a, r) we can define the path s(t) = (1 − t)x + ty, with 0 ≤ t ≤ 1. Then s(0) = x, s(1) = y and for each t ∈ [0, 1] we have

‖s(t)− a‖2 = (s1(t)− a1)2 + · · ·+ (sm(t)− am)2

= ((1− t)x1 + ty1 − a1)2 + · · ·+ ((1− t)xm + tym − am)2

= ((1− t)(x1 − a1) + t(y1 − a1))2 + · · ·+ ((1− t)(xm − am) + t(ym − am))2.

We now use the inequality

((1 − t)a + tb)2 ≤ (1 − t)a2 + tb2, (6.2)

that holds for all a, b ∈ R, and 0 ≤ t ≤ 1, to get

‖s(t)− a‖2 ≤ (1− t)(x1 − a1)2 + t(y1 − a1)2 + · · ·+ (1− t)(xm − am)2 + t(ym − am)2

= (1− t)‖x− a‖2 + t‖y − a‖2 ≤ (1− t)r2 + tr2 = r2.

It follows that s(t) ∈ B(a, r) for all t, which proves that B(a, r) is a pathwise connected set. The proof for the closed ball B̄(a, r) is identical.

Exercise 6.14 Prove the inequality (6.2). Hint: use the fact that y = x2 is a convex function for x ∈ R.


The following version of the intermediate value theorem holds for continuous functions defined on pathwise connected sets.

Proposition 6.15 Let E ⊂ Rn be a pathwise connected set, and f : E → R be a function that is continuous on E. Assume that there exist x ∈ E and y ∈ E such that f(x) = A and f(y) = B. Then, for any C ∈ R that is between A and B there exists z ∈ E such that f(z) = C.

Proof. Since E is a pathwise connected set and x, y ∈ E, there exists a path Γ : [0, 1] → E such that Γ(0) = x and Γ(1) = y. The function g(t) = f(Γ(t)), defined for 0 ≤ t ≤ 1, is continuous, as a composition of the two continuous functions Γ : [0, 1] → E and f : E → R. In addition, we have g(0) = f(x) = A and g(1) = f(y) = B. The intermediate value theorem for real-valued functions on [0, 1] then implies that there exists t0 ∈ [0, 1] such that g(t0) = C. This means that f(z) = C, with z = Γ(t0). �

7 Basic properties of metric spaces

We now take a detour to discuss some very basic properties of metric spaces, generalizing what weknow about R and Rn.

7.1 Convergence in metric spaces

If we go back to what we have really used about ‖x‖ in Rn, we will discover that what we really needed in most situations was that the function d(x, y) = ‖x − y‖ was non-negative, equal to 0 only if x = y, symmetric, so that d(x, y) = d(y, x), and satisfied the triangle inequality. We thus make the following definition.

Definition 7.1 A metric space (X, d) is a set X together with a map d : X ×X → R (called adistance function) such that

1. d(x, y) ≥ 0 for all x, y ∈ X, and d(x, y) = 0 if and only if x = y.

2. d(x, y) = d(y, x) for all x, y ∈ X,

3. (Triangle inequality) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X.

Recall that a norm ‖.‖ : V → R on a vector space V over R (or C) is a map that is

1. positive definite, i.e. ‖x‖ ≥ 0 for all x ∈ V with equality if and only if x = 0,

2. absolutely homogeneous, i.e. ‖λx‖ = |λ| ‖x‖ for λ ∈ R (or C) and x ∈ V ,

3. satisfies the triangle inequality, i.e. ‖x+ y‖ ≤ ‖x‖+ ‖y‖ for all x, y ∈ V .

Then one easily checks that every normed vector space is a metric space with the induced metricd(x, y) = ‖x− y‖; for instance the triangle inequality for the metric follows from

d(x, z) = ‖x− z‖ = ‖(x− y) + (y − z)‖ ≤ ‖x− y‖+ ‖y − z‖ = d(x, y) + d(y, z),

where the inequality in the middle is the triangle inequality for norms.

There are many interesting metric spaces that are not normed vector spaces, for the simple reason that the definition of a distance function does not require that X is a vector space. For instance, for any set X we may define a metric on it by setting d(x, y) = 0 if x = y, and d(x, y) = 1 if x ≠ y.


Another example of a metric space we will use often is the space C(K) of continuous real-valued functions f : K → R, where K is a compact subset of Rn. The distance is defined as

d(f, g) = sup_{x∈K} |f(x) − g(x)|.

We will often simply look at K = [a, b], a closed interval in R.

Convergence in metric spaces is defined exactly as in Rn, except we use the metric d(x, y) rather than ‖x − y‖.
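The supremum in the definition of d(f, g) can only be approximated in a computation; the sketch below (Python, sampling a uniform grid, so it gives an approximation of the supremum rather than its exact value) shows how the distance on C[0, 1] is used in practice:

    import math

    def dist(f, g, n=10001):
        # approximate sup_{x in [0,1]} |f(x) - g(x)| by sampling a uniform grid
        return max(abs(f(k / (n - 1)) - g(k / (n - 1))) for k in range(n))

    print(dist(math.sin, math.cos))           # the sup is attained at x = 0 and equals 1
    print(dist(lambda x: x, lambda x: x**2))  # sup of x - x^2 over [0,1] is 1/4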

Definition 7.2 A sequence xn of points in a metric space (X, d) converges to x ∈ X, denotedas limxn = x, if for all ε > 0 there exists N ∈ N such that for all n ≥ N we have d(xn, x) < ε.

Exercise 7.3 Show that the limit of a sequence xn is unique (if it exists).

Exercise 7.4 Show that if X is a normed space and xn → x in X, then ‖xn‖ → ‖x‖. Hint: show that | ‖xn‖ − ‖x‖ | ≤ ‖x − xn‖.

7.2 Continuous mappings between metric spaces

We now turn to continuity of functions f : X → Y where (X, dX), (Y, dY ) are metric spaces,generalizing what we have done for maps from Rn to Rm.

Definition 7.5 Suppose (X, dX), (Y, dY) are metric spaces. A function f : X → Y is continuous at a point a ∈ X if for all ε > 0 there exists δ > 0 such that for all x ∈ X with dX(x, a) < δ we have dY(f(x), f(a)) < ε. A function is called continuous on X if it is continuous at all a ∈ X.

As you recall, for functions on R we have shown that f is continuous at a ∈ R if and only if for any sequence xn → a we have f(xn) → f(a). In general metric spaces, we have the same property.

Definition 7.6 Suppose (X, dX), (Y, dY ) are metric spaces. A function f : X → Y is sequen-tially continuous at the point a ∈ X if for every sequence xn in X which converges to a, we havelimn→∞ f(xn) = f(a).

We then have

Lemma 7.7 Suppose (X, dX), (Y, dY ) are metric spaces. A function f : X → Y is continuousat a ∈ X if and only if it is sequentially continuous at a ∈ X.

Proof. The proof is verbatim as in R, check this! �

Let us give some examples. Let X = C[0, 1] and consider F : X → R defined as F(f) = f(3/4).

A sequence of functions fn ∈ C[0, 1] converges to f ∈ C[0, 1] if

sup_{x∈[0,1]} |fn(x) − f(x)| → 0 as n → +∞.

In other words, for any ε > 0 there exists N so that

sup_{x∈[0,1]} |fn(x) − f(x)| < ε for all n ≥ N.

Rephrasing this again, this is equivalent to saying that for any ε > 0 there exists N so that

|fn(x)− f(x)| < ε for all n ≥ N and all x ∈ [0, 1].


It follows, in particular, that for any ε > 0 there exists N so that

|fn(3/4)− f(3/4)| < ε for all n ≥ N ,

which is nothing but

|F(fn) − F(f)| < ε for all n ≥ N,

which means that F(fn) converges to F(f), thus F is continuous at all f ∈ C[0, 1].

The Bolzano-Weierstrass theorem on R stated that bounded sequences have convergent subsequences. In particular, if xn is a sequence of points in a closed interval [a, b] ⊂ R, then it has a subsequence x_{n_k} converging to some x ∈ [a, b]. This motivates the following generalization.

Definition 7.8 A metric space (X, dX) is sequentially compact if every sequence xn of pointsin (X, dX) has a convergent subsequence that converges to a point in X.

Here the word ‘sequentially’ refers to the fact that there is an equivalent definition (in the setting of metric spaces) in terms of the Heine-Borel property, which is usually called compactness. However, sequential compactness, which is, again, equivalent in the setting of metric spaces, is equally convenient.

Definition 7.9 A subset K of a metric space (X, dX) is sequentially compact if every sequence xnof points in K has a convergent subsequence that converges to a point in K.

Recall that in Rn a set is compact if and only if it is closed and bounded. In general metric spacesonly one implication is true.

Proposition 7.10 Let K be a sequentially compact subset of a metric space (X, d), then K is closedand bounded.

Proof. First, let us show that K is closed. Consider a limit point y ∈ X of the set K. Then there is a sequence xn ∈ K such that xn → y. As K is sequentially compact, there is a subsequence x_{n_k} that converges to a point in K. Since it is a subsequence of a convergent sequence, it must converge to the same limit, hence y ∈ K, which shows that K is closed. To show that K is bounded, assume it is not. Then for any z ∈ X there is a sequence xn ∈ K such that d(xn, z) ≥ n. Let us fix some z ∈ X. As K is sequentially compact, xn should have a convergent subsequence x_{n_k} → y ∈ K. However, we have, by the triangle inequality,

d(x_{n_k}, z) − d(y, x_{n_k}) ≤ d(y, z) ≤ d(y, x_{n_k}) + d(x_{n_k}, z).

Hence, as d(y, x_{n_k}) → 0 as k → ∞, there exists N so that for all k ≥ N we have d(y, x_{n_k}) ≤ 1, and then

d(y, z) ≥ d(x_{n_k}, z) − 1 ≥ n_k − 1,

which is a contradiction. Thus, K must be bounded. �

The next proposition shows that the other direction fails in general.

Proposition 7.11 The closed unit ball B(0, 1) ⊂ C[0, 1] is a closed and bounded set that is notsequentially compact.

Proof. The closed unit ball in C[0, 1] is the set

B(0, 1) = {f ∈ C[0, 1] : sup_{x∈[0,1]} |f(x)| ≤ 1},


that is, the set of all f ∈ C[0, 1] such that −1 ≤ f(x) ≤ 1 for all x ∈ [0, 1]. Consider the following sequence of functions:

fn(x) = 1 for 0 ≤ x ≤ 1/2 − 1/n,    fn(x) = 2n(1/2 − 1/(2n) − x) for 1/2 − 1/n ≤ x ≤ 1/2 − 1/(2n),    fn(x) = 0 for 1/2 − 1/(2n) ≤ x ≤ 1.

Then all fn lie in B(0, 1) ⊂ C[0, 1], but for m > 2n + 1 we have fm(1/2 − 1/(2n)) = 1 while fn(1/2 − 1/(2n)) = 0, so that ‖fn − fm‖ = 1. Therefore, no subsequence of fn is Cauchy, so the sequence fn cannot have a convergent subsequence, and B(0, 1) is not a sequentially compact set. �
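A short numerical check (Python; the helper below implements the piecewise linear functions fn from the proof) confirms that these functions stay in the unit ball while far-apart members of the sequence are at sup-distance 1 from each other, so no subsequence can be Cauchy:

    def f(n, x):
        # the piecewise linear function fn from the proof above
        if x <= 0.5 - 1.0 / n:
            return 1.0
        if x <= 0.5 - 1.0 / (2 * n):
            return 2 * n * (0.5 - 1.0 / (2 * n) - x)
        return 0.0

    def sup_dist(n, m, pts=100000):
        # approximate the sup-distance between fn and fm by sampling [0, 1]
        return max(abs(f(n, k / pts) - f(m, k / pts)) for k in range(pts + 1))

    print(sup_dist(5, 50), sup_dist(10, 100))  # both values are 1.0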

Theorem 7.12 Suppose (X, d) is a sequentially compact metric space, and f : X → R is continuous.Then f is bounded, and it attains its maximum and minimum, i.e. there exist points a, b ∈ X suchthat f(a) = sup{f(x) : x ∈ X}, f(b) = inf{f(x) : x ∈ X}.

Proof. The proof is verbatim as in Rn, except that in Rn we relied on the Heine-Borel definition of compactness, so let us show how this is done using sequential compactness. Let us show first that f(x) is bounded from above: there exists M > 0 so that f(x) ≤ M for all x ∈ X. Indeed, suppose it is not, i.e. there is no upper bound for f(X). Then, for any n ∈ N, there exists xn ∈ X such that f(xn) > n. By the sequential compactness of X, the sequence xn has a convergent subsequence x_{n_k}. Let us say x = lim_{k→∞} x_{n_k} ∈ X. Due to its continuity, f is sequentially continuous at x, so

lim_{k→∞} f(x_{n_k}) = f(x).

In particular, applying the definition of convergence with ε = 1, there exists N such that k ≥ N implies |f(x_{n_k}) − f(x)| < 1. But then

|f(x_{n_k})| = |(f(x_{n_k}) − f(x)) + f(x)| ≤ |f(x_{n_k}) − f(x)| + |f(x)| < |f(x)| + 1

for all k ≥ N. Since f(x_{n_k}) > n_k ≥ k by the very choice of x_{n_k} (note n_k ≥ k is true for any subsequence), this is a contradiction: choose any k > max(N, |f(x)| + 1), and then the two inequalities are contradictory. This completes the boundedness from above claim; a completely analogous argument shows boundedness from below.

Let us show that the maximum is actually attained; the case of the minimum is identical. The proof that we used in R works verbatim, but let us give a slightly different proof. Now that we know that f(x) is bounded from above, let M = sup_{x∈X} f(x). Then for all n ∈ N, M − 1/n is not an upper bound for f(x), so there exists xn ∈ X such that f(xn) > M − 1/n. The sequence xn has a convergent subsequence, by the sequential compactness of X, and we let

a = lim_{k→∞} x_{n_k} ∈ X.

We claim that f(a) = M. Indeed, due to its continuity, f is sequentially continuous at a, so

lim_{k→∞} f(x_{n_k}) = f(a).

On the other hand, since

M ≥ f(x_{n_k}) > M − 1/n_k ≥ M − 1/k,

by the very choice of the xn, we have by the sandwich theorem that

lim_{k→∞} f(x_{n_k}) = M.

In combination with the just observed sequential continuity, this gives f(a) = M, as desired. �

A very similar proof gives


Theorem 7.13 Suppose (X, dX), (Y, dY ) are metric spaces, X is sequentially compact, and f :X → Y is continuous. Then (f(X), dY ) is sequentially compact.

Proof. Suppose that yn is a sequence in f(X), that is, yn = f(xn) for some xn. We need to find asubsequence of yn which converges to a point in f(X) in the metric dY .

Since X is sequentially compact, xn has a convergent subsequence x_{n_k}. We let

x = lim_{k→∞} x_{n_k} ∈ X.

We claim that y_{n_k} converges to y = f(x) ∈ f(X) in the metric dY. Indeed, by the continuity of f, we know that f is sequentially continuous at x, so

lim_{k→∞} f(x_{n_k}) = f(x).

As f(x_{n_k}) = y_{n_k}, this proves the claim, and thus the theorem. �

8 Basic facts about series

8.1 Convergence of a series

Given real numbers an, we will use the following definition to make sense of an infinite sum

a1 + a2 + · · ·+ an + . . . ,

that we will denote as ∑_{k=1}^∞ a_k, or sometimes simply as ∑_k a_k. We call the a_k the individual terms of the series. Note that there is no meaning yet assigned to the sum above; at the moment it is just a notation. The n-th partial sum of the series ∑_{k=1}^∞ a_k is

S_n = a_1 + · · · + a_n = ∑_{k=1}^n a_k.

Definition 8.1 We say that a series ∑_{k=1}^∞ a_k converges if the sequence S_n of its partial sums converges, and call the limit of the sequence S_n the sum of the series ∑_{k=1}^∞ a_k.

An immediate consequence of the definition is the Cauchy criterion for convergence that simply restates the Cauchy criterion for the sequence S_n of the partial sums, using the fact that for all m > n we have

S_m − S_n = ∑_{k=n+1}^m a_k.

Theorem 8.2 (The Cauchy criterion for convergence) A series ∑_{k=1}^∞ a_k is convergent if and only if for every ε > 0 there exists N so that for all m > n ≥ N we have

|a_{n+1} + · · · + a_m| < ε.

Here are two important immediate consequences of the Cauchy criterion for convergence of a series.

Corollary 8.3 If only finitely many terms of a series are changed then the new series is convergentif and only if the original series is convergent.


The proof is a simple exercise: be careful about how you choose N given ε > 0 in the definition ofthe Cauchy property.

Corollary 8.4 If a series ∑_{n=1}^∞ a_n converges then lim_{n→∞} a_n = 0.

The proof is, again, an exercise in the application of the Cauchy property: take m = n+ 1 and lookat what Sm − Sn is.

The geometric series ∑_{k=1}^∞ r^k plays an incredibly important role in analysis, and especially in complex analysis, which is outside the scope of this class.

Exercise 8.5 (i) Show that the series ∑_{k=0}^∞ r^k converges if |r| < 1 and diverges if |r| > 1. Hint: recall that

∑_{k=0}^n r^k = (1 − r^{n+1})/(1 − r).

(ii) Show that the series ∑_{k=0}^∞ r^k diverges both when r = −1 and r = 1. Now, consider the function S(r) = ∑_{k=1}^∞ r^k on the open interval r ∈ (−1, 1). Show that lim_{r→1} S(r) does not exist but lim_{r→−1} S(r) exists. Why does this not contradict the divergence of the series ∑_{k=1}^∞ (−1)^k?

(iii) Show that the series ∑_{k=1}^∞ 1/k diverges. Hint: note that we have

1/(n + 1) + · · · + 1/(2n) ≥ 1/(2n) + · · · + 1/(2n) = n/(2n) = 1/2,

and use the Cauchy criterion.
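A numerical sketch of the partial sums (Python) illustrates both behaviours in this exercise: the geometric partial sums stabilize for |r| < 1, while the harmonic partial sums keep growing, roughly like log n:

    import math

    def partial_sum(term, n):
        return sum(term(k) for k in range(1, n + 1))

    for n in [10, 100, 1000]:
        geo = partial_sum(lambda k: 0.5**k, n)    # tends to 0.5/(1 - 0.5) = 1
        harm = partial_sum(lambda k: 1.0 / k, n)  # keeps growing
        print(n, round(geo, 6), round(harm, 3), round(math.log(n), 3))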

8.2 Absolute convergence of a series

Definition 8.6 We say that a series ∑_{k=1}^∞ a_k converges absolutely if the series ∑_{k=1}^∞ |a_k| converges.

Note that since

|a_n + · · · + a_m| ≤ |a_n| + · · · + |a_m|,

the Cauchy criterion implies that if a series converges absolutely, then it converges. The converse is not true. As an example, consider the series

1 − 1 + 1/2 − 1/2 + 1/3 − 1/3 + . . .

that has partial sums equal either to zero or to 1/n. Thus, it converges (to zero). However, part (iii) of Exercise 8.5 shows that it does not converge absolutely.


Theorem 8.7 Let a_n ≥ 0. Then the series ∑_{n=1}^∞ a_n converges if and only if the sequence of its partial sums S_n = ∑_{k=1}^n a_k is bounded.

Proof. The sequence S_n is non-decreasing because a_n ≥ 0. Thus, S_n converges if and only if it is bounded. �

Theorem 8.8 (Comparison theorem) Assume that 0 ≤ a_n ≤ b_n. Then (i) if the series ∑_{n=1}^∞ b_n converges then so does the series ∑_{n=1}^∞ a_n, and (ii) if the series ∑_{n=1}^∞ a_n diverges then so does the series ∑_{n=1}^∞ b_n.

Exercise 8.9 Prove this theorem in two ways: (i) using the Cauchy criterion for convergence,and (ii) using Theorem 8.7.

Theorem 8.10 (The Weierstrass test) Assume that 0 ≤ |a_n| ≤ b_n. If the series ∑_{n=1}^∞ b_n converges, then the series ∑_{n=1}^∞ a_n converges absolutely.

This is an immediate consequence of Theorem 8.8.

Criteria for series convergence

We now discuss some criteria for the convergence of a series, based on the Weierstrass test and the convergence of the geometric series ∑_n r^n with |r| < 1.

Theorem 8.11 (The Cauchy test) Let ∑ a_n be a series and set

α = lim sup_{n→∞} |a_n|^{1/n}.

Then (i) if α < 1 then the series ∑ a_n converges absolutely, and (ii) if α > 1 then the series diverges.

Note that if α = 1 then the Cauchy test says nothing about the convergence. As an example of a divergent series with α = 1, consider ∑_n (1/n); then

α = lim sup_{n→∞} 1/n^{1/n} = 1,

but the series diverges. On the other hand, the series ∑_n (1/n^2) has

α = lim sup_{n→∞} 1/n^{2/n} = 1,

but the series converges. This is because for n ≥ 2 we have

1/n^2 ≤ 1/(n(n − 1)) = 1/(n − 1) − 1/n,


and the series ∑_{n=2}^∞ (1/(n − 1) − 1/n) converges because its partial sums are explicit:

S_n = (1 − 1/2) + (1/2 − 1/3) + (1/3 − 1/4) + · · · + (1/(n − 1) − 1/n) = 1 − 1/n,

and S_n → 1 as n → +∞.

Proof of Theorem 8.11. Assume that α < 1. Then there exists N so that for all n > N we have

|a_n|^{1/n} ≤ β = α + (1 − α)/2 = (1 + α)/2 < 1,

so that for all n > N we have |a_n| ≤ β^n. As 0 < β < 1, the series ∑_n β^n converges, hence the Weierstrass test implies that the series ∑ a_n converges absolutely.

On the other hand, if α > 1 then, by the definition of the lim sup, there are infinitely many n for which

|a_n|^{1/n} > γ = α − (α − 1)/2 = (1 + α)/2 > 1,

hence |a_n| > γ^n > 1 for infinitely many n. It follows that it is not true that a_n → 0 as n → +∞, hence the series ∑_n a_n diverges. �

Here is another way to compare with a geometric series.

Theorem 8.12 (The d’Alembert test) Suppose that the limit

α = lim_{n→+∞} |a_{n+1}/a_n|

exists for a series ∑_n a_n. Then (i) if α < 1, the series converges absolutely, and (ii) if α > 1, the series diverges.

Proof. (i) If α < 1 then there exists N so that for all n ≥ N we have

|a_{n+1}| ≤ r |a_n|,

where r = (1 + α)/2 < 1. Hence, for all n ≥ N we have |a_n| ≤ |a_N| r^{n−N}. As r < 1, the series ∑_n a_n converges absolutely, by comparison with a geometric series.

(ii) If α > 1, then there exists N so that for all n ≥ N we have

|a_{n+1}| ≥ r |a_n|,

where r = (1 + α)/2 > 1. Hence, for all n > N we have |a_n| ≥ |a_N| r^{n−N} with r > 1, hence it is impossible that a_n → 0 as n → +∞, because |a_n| ≥ |a_N| > 0. �

The next theorem is important, and its proof uses a very useful tool.

Theorem 8.13 The series ∑_n (1/n^p) converges for all p > 1.

Proof. According to Theorem 8.7, it suffices to prove that the sequence of partial sums

S_n = ∑_{k=1}^n 1/k^p


is bounded. As a_n ≥ 0, it is actually sufficient to show that the sequence S_{2^n} is bounded. Note that a_n = 1/n^p is a decreasing sequence. Hence we may write

S_{2^n} = a_1 + a_2 + a_3 + a_4 + · · · + a_{2^n} ≤ a_1 + (a_2 + a_2) + (a_4 + a_4 + a_4 + a_4) + · · · + 2^n a_{2^n} = a_1 + 2a_2 + 4a_4 + 8a_8 + · · · + 2^n a_{2^n}.

Recalling that a_n = 1/n^p, we have

S_{2^n} ≤ ∑_{k=0}^n 2^k · 1/(2^k)^p = ∑_{k=0}^n 1/(2^k)^{p−1} = ∑_{k=0}^n (1/2^{p−1})^k = (1 − r^{n+1})/(1 − r) ≤ 1/(1 − r),

with r = 1/2^{p−1} < 1 – here we use the fact that p > 1. Therefore, the partial sums S_{2^n} are bounded, and hence so are the partial sums S_n, and the series converges. �
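The condensation bound used in the proof is easy to observe numerically; the sketch below (Python, with p = 1.5 as a sample value) compares the partial sums S_{2^n} with the geometric bound 1/(1 − r), r = 1/2^{p−1}:

    def S(N, p):
        # partial sum of the series sum_k 1/k^p up to k = N
        return sum(1.0 / k**p for k in range(1, N + 1))

    p = 1.5
    bound = 1.0 / (1.0 - 2.0**(1 - p))   # the geometric bound from the proof
    for n in [4, 8, 16]:
        print(2**n, round(S(2**n, p), 4), round(bound, 4))  # the partial sums stay below the bound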

8.3 Uniform convergence of functions and series

We now discuss the notion of uniform convergence of functions.

Definition 8.14 A sequence of functions fn : E → R converges to a function f : E → R uniformlyon E if for any ε > 0 there exists N so that |f(x) − fn(x)| < ε for all n ≥ N and all x ∈ E. Wesometimes use the notation fn ⇒ f on E.

As an example of a uniformly convergent sequence, consider fn(x) ≡ 1/n on R, or on any other set E ⊂ R. The sequence fn(x) = x/n converges uniformly on [0, 1] to f ≡ 0 but is not uniformly convergent on R, thus the notion of uniform convergence very much depends on the set E on which we consider the convergence.
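A small numerical sketch (Python) of the sup of |fn − 0| over two different sets shows why fn(x) = x/n converges uniformly on [0, 1] but not on R:

    def sup_on_interval(n, a, b, pts=1000):
        # approximate sup_{x in [a, b]} |x/n| by sampling the interval
        return max(abs((a + (b - a) * k / pts) / n) for k in range(pts + 1))

    for n in [10, 100, 1000]:
        print(n, sup_on_interval(n, 0.0, 1.0), sup_on_interval(n, 0.0, 10.0**6))
        # the first column of sups tends to 0, the second one stays enormous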

Definition 8.15 A sequence of functions fn : E → R is uniformly Cauchy on E if for any ε > 0there exists N so that |fm(x)− fn(x)| < ε for all n,m ≥ N and all x ∈ E.

Here is a useful criterion for the uniform convergence.

Theorem 8.16 A sequence of functions fn : E → R converges uniformly on E to some f : E → R if and only if fn is uniformly Cauchy on E.

Proof. First, let fn ⇒ f on a set E. Then for any ε > 0 there exists N so that |f(x) − fn(x)| < ε/10 for all n ≥ N and all x ∈ E. The triangle inequality implies that for any n, m ≥ N and all x ∈ E we have

|fn(x) − fm(x)| ≤ |fn(x) − f(x)| + |f(x) − fm(x)| ≤ ε/10 + ε/10 < ε,

hence fn is a uniformly Cauchy sequence on E.

Next, assume that fn is a uniformly Cauchy sequence on E. Then, for each x ∈ E fixed, the sequence of numbers fn(x) is Cauchy, thus it converges to some limit that we denote by f(x). As the sequence fn is uniformly Cauchy on E, given ε > 0 there exists N so that for all n, m ≥ N and all x ∈ E we have

|fn(x) − fm(x)| < ε/10.

Fixing m ≥ N and letting n → ∞ above, we deduce that

|f(x) − fm(x)| ≤ ε/10 < ε, for all m ≥ N and all x ∈ E,

which means that fn ⇒ f on E. �


Theorem 8.17 Let fn : E → R be continuous functions on E. If fn converge uniformly to f on E, then f is continuous.

Proof. This is a homework problem; the proof will be filled in later.

In the same way, we can talk about uniform convergence of a series of functions.

Definition 8.18 Let an : E → R be functions from a set E to R. The series ∑_n a_n(x) converges uniformly on E if the corresponding sequence of partial sums S_n(x) = ∑_{k=1}^n a_k(x) converges uniformly on E.

Restating Theorem 8.16 for the uniform convergence of a series gives the following.

Theorem 8.19 A series ∑_n a_n(x) converges uniformly on E if and only if the corresponding sequence of partial sums S_n(x) = ∑_{k=1}^n a_k(x) is uniformly Cauchy on E.

Restating Theorem 8.17 for the uniform convergence of series of continuous functions gives thenext theorem.

Theorem 8.20 Let an : E → R be continuous functions from a set E to R. If a series ∑_n a_n(x) converges uniformly on E to S(x) = ∑_{n=1}^∞ a_n(x), then the function S(x) is continuous.

Here is a very useful criterion for the uniform convergence of a series.

Theorem 8.21 (The Weierstrass test) Let an : E → R be functions from a set E to R. If there exists a sequence of numbers bn ≥ 0 such that |a_n(x)| < b_n for all x ∈ E, and the series ∑_n b_n converges, then the series ∑_n a_n(x) converges uniformly on E. It also converges absolutely for each x ∈ E. If, in addition, each function a_n(x) is continuous, then the sum S(x) = ∑_{n=1}^∞ a_n(x) is a continuous function.

Proof. The absolute convergence follows simply from Theorem 8.10 (another Weierstrass test). For the uniform convergence, note that the partial sums

S_n(x) = ∑_{k=1}^n a_k(x)

satisfy, for n > m,

|S_n(x) − S_m(x)| = |∑_{k=m+1}^n a_k(x)| ≤ ∑_{k=m+1}^n |a_k(x)| ≤ ∑_{k=m+1}^n b_k,

for all x ∈ E. As the series ∑ b_k is convergent, it is Cauchy, thus given any ε > 0 we can find N so that for all n > m ≥ N we have

∑_{k=m+1}^n b_k < ε.

It follows that then

|S_n(x) − S_m(x)| < ε

for all x ∈ E. Thus, the series ∑ a_n(x) is uniformly Cauchy, hence it is uniformly convergent. Finally, the continuity of S(x) follows from the uniform convergence of the series ∑_n a_n(x) on the set E, which we have just proved, and Theorem 8.20. �


8.4 Convergence of power series

A series of the form

a0 + a1(x− x0) + a2(x− x0)2 + · · ·+ an(x− x0)n + . . .

is called a power series around x0, or simply a power series. We are interested in two questions: first, whether the series converges and, second, whether the sum is a continuous function. This is answered by the following theorem.

Theorem 8.22 Let

R^{−1} = lim sup_{n→∞} |a_n|^{1/n}

if the lim sup on the right side is different from zero, and otherwise set R = +∞. Then the series

∑_{n=0}^∞ a_n(x − x0)^n

converges for all x such that |x − x0| < R and diverges for each x such that |x − x0| > R. Moreover, the convergence is uniform on any interval |x − x0| ≤ r with 0 < r < R, and the sum

S(x) = ∑_{n=0}^∞ a_n(x − x0)^n

is continuous on the open interval |x − x0| < R.

Proof. We will set x0 = 0 without loss of generality. Note that the individual terms of the series are c_n(x) = a_n x^n, and

|c_n(x)|^{1/n} = |x| |a_n|^{1/n},

so that

lim sup_{n→∞} |c_n(x)|^{1/n} = |x| lim sup_{n→∞} |a_n|^{1/n} = |x|/R.

Now the Cauchy test, Theorem 8.11, tells us that the series converges if |x| < R and diverges if |x| > R, which is the first claim of the theorem. To show that the series converges uniformly on any interval [−r, r] with r < R, note that we can find N so that for all n > N we have

|a_n|^{1/n} ≤ 1/((R + r)/2) = 2/(R + r),

so that

|a_n| ≤ (2/(R + r))^n.

Then, for all n > N and all x ∈ [−r, r] we have

|a_n| |x|^n ≤ (2/(R + r))^n |x|^n ≤ (2r/(R + r))^n = s^n, with s = 2r/(R + r).

As r < R, we know that 0 < s < 1, so that the series ∑ s^n converges. Now, the Weierstrass test, Theorem 8.21, implies that the series converges uniformly and the sum is a continuous function on [−r, r]. As this is true for all 0 < r < R, the sum is continuous on the whole open interval (−R, R). �


Exercise 8.23 (i) Show that the power series

1 + x + x^2 + · · · + x^n + . . .

converges for |x| < 1.

(ii) Show that the power series

1 + x + x^2/2! + x^3/3! + · · · + x^n/n! + . . .

converges for all x ∈ R.

(iii) Show that the power series

1 + x + x^2/2 + · · · + x^n/n + . . .

converges for |x| < 1.
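The radii of convergence in this exercise can be guessed numerically from Theorem 8.22, since R^{−1} = lim sup |a_n|^{1/n}; the sketch below (Python, with n = 100 standing in for the limit) computes |a_n|^{1/n} for the three series:

    import math

    n = 100
    print(1.0**(1.0 / n))                        # a_n = 1:    tends to 1, so R = 1
    print((1.0 / math.factorial(n))**(1.0 / n))  # a_n = 1/n!: tends to 0, so R = +infinity
    print((1.0 / n)**(1.0 / n))                  # a_n = 1/n:  tends to 1, so R = 1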

9 Differentiability in Rn

9.1 Differentiability in R

In this section, we deal with differentiability of functions defined on the real line, so x ∈ R throughout.

Definition 9.1 Let f be a function from an open set U ⊂ R to R, and a ∈ U. If the limit

f′(a) = lim_{x→a} (f(x) − f(a))/(x − a)

exists, then f′(a) is called the derivative of f at a.

Equivalently, we say that f(x) is differentiable at a if

(f(x) − f(a))/(x − a) = f′(a) + α(x), (9.1)

and α(x) → 0 as x → a. The notation is α(x) = o(1) as x → a. We say that α(x) = o(|x − a|^m) as x → a if

lim_{x→a} α(x)/|x − a|^m = 0.

We say that α(x) = O(|x − a|^m) as x → a if

lim sup_{x→a} |α(x)|/|x − a|^m < +∞.

In other words, α(x) = O(|x − a|^m) as x → a if there exist r0 > 0 and C > 0 so that

|α(x)|/|x − a|^m < C for all x such that 0 < |x − a| < r0.

In other words, α(x) = o(|x − a|^m) if α(x) → 0 ”faster” than |x − a|^m, and α(x) = O(|x − a|^m) if α(x) → 0 ”at least as fast” as |x − a|^m.

Now, we can restate (9.1) as

f(x) = f(a) + f ′(a)(x− a) + o(|x− a|), (9.2)

that is, f(x) is approximated, to leading order, by the linear function f(a) + f′(a)(x − a). We often write this as

f(a+ h) = f(a) + f ′(a)h+ o(|h|). (9.3)
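The content of (9.3) can be checked numerically; the sketch below (Python, with f(x) = sin x at a = 0.3 as a sample choice) shows the error of the linear approximation shrinking faster than h:

    import math

    a = 0.3
    f, fprime = math.sin, math.cos   # a sample function with a known derivative

    for h in [0.1, 0.01, 0.001, 0.0001]:
        err = f(a + h) - f(a) - fprime(a) * h
        print(h, err / h)   # the ratio tends to 0, so the error is o(|h|)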


Exercise 9.2 An important observation is that if α(h) = o(|h|) and β(h) is bounded, then (αβ)(h) =o(|h|) as well. Check this!

Theorem 9.3 Let U be an open set. If f : U → R is differentiable at a ∈ U then f is continuous at a.

Proof. This follows immediately from (9.2). �

Theorem 9.4 Let U be an open set. If f : U → R and g : U → R are differentiable at a ∈ U then f + g and fg are differentiable at a, with

(f + g)′(a) = f ′(a) + g′(a), (fg)′(a) = f(a)g′(a) + f ′(a)g(a).

If, in addition, g(a) ≠ 0, then f/g is differentiable at a and

(f/g)′(a) = (f′(a)g(a) − f(a)g′(a))/g^2(a).

Proof. We will only prove this for fg and for 1/g, and use the o(h) notation, to get the reader used to it:

(fg)(a + h) − (fg)(a) = (f(a) + f′(a)h + o(|h|))(g(a) + g′(a)h + o(|h|)) − f(a)g(a) = (f′(a)g(a) + f(a)g′(a))h + o(|h|).

For 1/g we note that for any fixed β ∈ R we have

1/(1 + βh + o(|h|)) = 1 − βh + o(|h|).

Warning: the o(|h|) in the left and the right sides above are not the same functions. This identity means that if α(h)/h → 0 as h → 0, then

1/(1 + βh + α(h)) − (1 − βh) = hγ(h),

where γ(h)→ 0 as h→ 0. Make sure you understand this! Now, we can write

1/g(a + h) − 1/g(a) = 1/(g(a) + g′(a)h + o(|h|)) − 1/g(a)
= (1/g(a)) · 1/(1 + (g′(a)/g(a))h + o(|h|)) − 1/g(a)
= (1/g(a)) (1 − (g′(a)/g(a))h + o(|h|)) − 1/g(a)
= −(g′(a)/g^2(a)) h + o(|h|),

so that

(1/g)′(a) = −g′(a)/g^2(a).

Theorem 9.5 Let X and Y be open subsets of R. If f : X → R is differentiable at a point x ∈ X,and f(x) ∈ Y , and g : Y → R is differentiable at y = f(x), then the composition g◦f is differentiableat x and (g ◦ f)′(x) = g′(f(x))f ′(x).

Proof. We have

f(x + h) = f(x) + f′(x)h + o(|h|) as h → 0,

and

g(y + t) = g(y) + g′(y)t + o(|t|) as t → 0. (9.4)


Then we have

(g ◦ f)(x + h) = g(f(x) + f′(x)h + o(|h|)) = g(y + t(h)), (9.5)

where t(h) = f′(x)h + o(|h|). Note that t(h) → 0 as h → 0 and there exists r0 > 0 so that |t(h)| ≤ (|f′(x)| + 1)|h| for |h| < r0. If we write out the meaning of (9.4), it says that for any ε > 0 there exists δ > 0 so that if 0 < |t| < δ, then

|g(y + t) − (g(y) + g′(y)t)|/|t| < ε. (9.6)

Choosing |h| sufficiently small, we may ensure that |t(h)| < δ, so that

|g(y + t(h)) − (g(y) + g′(y)t(h))|/|t(h)| < ε, (9.7)

and then

|g(y + t(h)) − (g(y) + g′(y)t(h))| ≤ ε|t(h)| ≤ ε(|f′(x)| + 1)|h|. (9.8)

Going back to (9.5), it follows that

|(g ◦ f)(x + h) − g(y) − g′(y)t(h)| ≤ ε(|f′(x)| + 1)|h|. (9.9)

Now, we recall that y = f(x) and t(h) = f′(x)h + o(|h|), which gives

|(g ◦ f)(x + h) − g(f(x)) − g′(f(x))f′(x)h − g′(f(x))o(|h|)| ≤ ε(|f′(x)| + 1)|h|. (9.10)

It follows that

(g ◦ f)(x + h) − g(f(x)) − g′(f(x))f′(x)h = o(|h|), (9.11)

and we are done. �

Exercise 9.6 Go over the proof and make sure you understand at each step how the notation o(|h|) is used and why it makes sense and is all rigorous!

Theorem 9.7 (Rolle’s theorem) If f : [a, b]→ R is continuous, f(a) = f(b) and f is differentiableat each point x ∈ (a, b), then there is c ∈ (a, b) such that f ′(c) = 0.

Proof. If f(x) ≡ f(a) on [a, b] then f′(x) = 0 for all x ∈ (a, b). Let us assume that f is not identically equal to a constant on [a, b]. Since f(a) = f(b), it then attains either its maximum or its minimum over [a, b] at some interior point c ∈ (a, b). Say c is a maximum point; then the ratios

(f(c + h) − f(c))/h

are ≤ 0 for h > 0 and ≥ 0 for h < 0, so letting h → 0 gives f′(c) = 0. The case of a minimum is identical. �

Theorem 9.8 (Mean Value theorem) If f : [a, b] → R is continuous, and f is differentiable at each point x ∈ (a, b), then there is c ∈ (a, b) such that

f′(c) = (f(b) − f(a))/(b − a).

Proof. Apply Rolle’s theorem to the function

g(x) = f(x) − ((f(b) − f(a))/(b − a)) (x − a).

Note that g(a) = g(b) = f(a). �


9.2 The Taylor series representation

We now ask the question of when we can approximate a function to a higher order than linearly. Sofar, we have seen that if f ′(x0) exists then we can approximate f(x+ h) by a linear function, up toan error of the order o(h). Here is a more general result.

Theorem 9.9 Let f be a function differentiable up to order m + 1 on an interval (x0 − r, x0 + r). Then for any x such that |x − x0| < r, there exists c between x0 and x such that

f(x) = ∑_{n=0}^m (f^(n)(x0)/n!) (x − x0)^n + (f^(m+1)(c)/(m + 1)!) (x − x0)^{m+1}.

Proof. Fix some x such that |x − x0| < r and consider the function g(t) defined on |t − x0| < r by

g(t) = f(t) − ∑_{n=0}^m (f^(n)(x0)/n!) (t − x0)^n − M(t − x0)^{m+1},

with the constant M chosen so that g(x) = 0 (keep in mind that both x and x0 are fixed; the independent variable is t):

M = (1/(x − x0)^{m+1}) (f(x) − ∑_{n=0}^m (f^(n)(x0)/n!) (x − x0)^n). (9.12)

As g(x0) = 0 and g(x) = 0, we deduce from Rolle's theorem that there exists c1 between x0 and x such that g′(c1) = 0. However, we also have g′(x0) = 0 (check this!) – hence there exists c2 between x0 and c1 (and still between x0 and x) such that g′′(c2) = 0. But we also have g′′(x0) = 0 (check this!), so there exists c3 between x0 and c2 (and still between x0 and x) such that g′′′(c3) = 0. Note that

g(k)(x0) = 0, for all 0 ≤ k ≤ m,

hence we can continue this argument until step m + 1, finding a point c_{m+1} between x0 and x so that g^(m+1)(c_{m+1}) = 0. Note that for any |t − x0| < r we have

g^(m+1)(t) = f^(m+1)(t) − M(m + 1)!,

hence

M = f^(m+1)(c_{m+1})/(m + 1)!.

Using this in (9.12) says that

f(x) = ∑_{n=0}^m (f^(n)(x0)/n!) (x − x0)^n + (f^(m+1)(c_{m+1})/(m + 1)!) (x − x0)^{m+1},

and we are done. �
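The size of the remainder in Theorem 9.9 can be observed numerically; the sketch below (Python, with f(x) = e^x, x0 = 0 and m = 2 as sample choices) shows that the error of the degree-m Taylor polynomial behaves like (x − x0)^{m+1}:

    import math

    def taylor_exp(x, m):
        # degree-m Taylor polynomial of e^x around x0 = 0
        return sum(x**n / math.factorial(n) for n in range(m + 1))

    m = 2
    for h in [0.1, 0.05, 0.025]:
        err = math.exp(h) - taylor_exp(h, m)
        print(h, err / h**(m + 1))   # the ratio stays close to 1/(m+1)! = 1/6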

Theorem 9.10 Let f(x) be differentiable to all orders in an interval |x − x0| < r and assume that there exists a constant C > 0 so that

|f^(n)(x)/n!| r^n ≤ C, for all n ∈ N and all x such that |x − x0| < r.

Then we have

f(x) = ∑_{n=0}^∞ (f^(n)(x0)/n!) (x − x0)^n, for all x such that |x − x0| < r.


Proof. By the assumptions of this theorem and by Theorem 9.9, for any m ∈ N and any x ∈ (x0 − r, x0 + r), we can find c_m(x) between x0 and x so that

f(x) − ∑_{n=0}^m (f^(n)(x0)/n!) (x − x0)^n = (f^(m+1)(c_m(x))/(m + 1)!) (x − x0)^{m+1}.

Using the assumptions of the present theorem, we see that

|f(x) − ∑_{n=0}^m (f^(n)(x0)/n!) (x − x0)^n| ≤ (|f^(m+1)(c_m(x))|/(m + 1)!) |x − x0|^{m+1}
= (|f^(m+1)(c_m(x))|/(m + 1)!) r^{m+1} · |x − x0|^{m+1}/r^{m+1} ≤ C |(x − x0)/r|^{m+1} → 0 as m → +∞,

and we are done. �

9.3 Differentiability in Rn

Recall that for a scalar function f : R → R we say that f is differentiable at x if there exists a number a ∈ R so that

f(x+ h) = f(x) + ah+ o(|h|), as h→ 0,

and then we say that a = f′(x). For a scalar-valued function f : Rn → R the definition is very similar. It is differentiable at a point x ∈ Rn if there exists a linear function L : Rn → R such that

f(x+ h) = f(x) + L(h) + o(‖h‖), as h→ 0,

in the sense that

lim_{h→0} ‖f(x + h) − (f(x) + L(h))‖/‖h‖ = 0.

A linear function L : Rn → R must have the form L(h) = (v · h) with some v ∈ Rn. We call thisvector the gradient of f at x:

v = ∇f(x).

In other words, a scalar-valued function f : Rn → R is differentiable at a point x ∈ Rn if

f(x+ h) = f(x) + (∇f(x) · h) + o(‖h‖), as h→ 0.

This definition can be extended to mappings f : Rn → Rm easily: a mapping f : Rn → Rm isdifferentiable at a point x ∈ Rn if there exists a linear map L : Rn → Rm such that

f(x+ h) = f(x) + L(h) + o(‖h‖), as h→ 0,

in the sense that

lim_{h→0} ‖f(x + h) − (f(x) + L(h))‖/‖h‖ = 0.

Recall that a linear map L : Rn → Rm must have the form L(h) = Ah with some m × n matrix A. We call this matrix the derivative matrix of f (or the Jacobian matrix):

A = Df(x).

Lemma 9.11 If U ⊂ Rn and f : U → Rm is differentiable at a point x0 ∈ U , then f is continuousat x0.

Exercise 9.12 Adapt the proof of this fact for scalar valued functions of a real variable to proveLemma 9.11.

Definition 9.13 For a scalar-valued function f(x), the gradient of f at x0 is the vector Df(x0) ifit exists.


9.4 Directional derivatives

We now interpret the derivative matrix Df(x) in more concrete terms. Let U ⊂ Rn be an open set, and f a map from U to Rm. Then, for each x ∈ U we can find a ball B(x, r) that is contained in U. This means that for any v ∈ Rn, v ≠ 0, the point x + tv is in U as long as |t| < r/‖v‖, so that the function

g(t) = f(x+ tv)

of a real variable t ∈ R is defined for |t| < r/‖v‖.

Definition 9.14 Given a vector v ∈ Rn, v ≠ 0, the directional derivative Dvf(x0) is

Dvf(x0) = lim_{t→0} (1/t)(f(x0 + tv) − f(x0)),

whenever that limit exists.

Note that if f maps U to Rm then Dvf(x0) is a (row) vector in Rm as well.

Exercise 9.15 Fix some λ ∈ R and v ∈ Rn, and let w = λv. Express Dwf(x0) in terms of Dvf(x0).

In the special case when v = ej, the j-th vector of the standard basis, we denote D_{ej}f(x0) as

D_{ej}f(x0) = ∂f(x0)/∂xj.

Let us show that if the derivative matrix Df(x0) exists then the directional derivatives can all be computed from it.

Theorem 9.16 Let U ⊂ Rn be an open set and f : U → Rm be a map that is differentiable at a point x0 ∈ U. Then the directional derivative in a direction v ∈ Rn, v ≠ 0, has the form

Dvf(x0) = (Df(x0))v,

or, written component-wise,

Dvfk(x0) = ∑_{j=1}^n vj ∂fk(x0)/∂xj, k = 1, . . . , m.

Proof. First, as U is an open set, there exists r > 0 so that x0 + tv is in U for |t| < r, and the function

g(t) = f(x0 + tv)

is defined as a mapping from (−r, r) to Rm. Also, as f is differentiable at x0, we have

f(x0 + h) − f(x0) = (Df(x0))h + o(‖h‖), as h → 0,

with h ∈ Rn. Note that if some function α(h) = o(‖h‖) as h → 0, then α(tv) = o(t) for any v ∈ Rn fixed (check this!), hence using h = tv above, we have

with t ∈ (−r, r). This means exactly that

limt→0

f(x0 + tv)− f(x0)

t= (Df(x0))v,


hence Dvf(x0) = (Df(x0))v, as claimed. Going back to (9.13) and writing it separately for each coordinate fk, k = 1, . . . , m, gives

fk(x0 + tv) − fk(x0) = (Dfk(x0))(tv) + o(t), as t → 0. (9.14)

However, Dfk(x0) is simply a row vector in Rn and (Dfk(x0))v is the dot-product 〈Dfk(x0), v〉. Taking v = ej shows that

(Dfk(x0))_j = ∂fk(x0)/∂xj.

It then follows that

Dvfk(x0) = (Dfk(x0))v = ∑_{j=1}^n vj (Dfk(x0))_j = ∑_{j=1}^n vj ∂fk(x0)/∂xj,

and we are done. �

Note that the existence of the partial derivatives of f at x0 does not imply that the derivative matrix

Df(x0) exists. To see that, we may borrow an example from Section 6.3. Consider a functionf : R2 → R defined by

f(x1, x2) = x1 + x2 sin(1/x1), if x1 ≠ 0, and f(x1, x2) = 0, if x1 = 0.

Then we have f(x1, 0) = x1 and f(0, x2) = 0, which means that

∂f/∂x1(0, 0) = 1, ∂f/∂x2(0, 0) = 0,

so if Df(0, 0) exists then Df(0, 0) = (1, 0), but if we look at x1 = x2 = h, then

f(h, h) − f(0, 0) = h + h sin(1/h),

and the right side is not of the form 〈(1, 0), (h, h)〉 + o(‖h‖).

Here is an extra condition that ensures that the existence of partial derivatives implies the existence of

the derivative matrix.

Theorem 9.17 Let f : U → Rm, x0 ∈ U ⊂ Rn, and assume that the partial derivatives Dkf(x), k = 1, . . . , n, exist for all x in a ball B(x0, ρ) with some ρ > 0 and are continuous at x0. Then f is differentiable at x0.

Proof. We will assume without loss of generality that m = 1, so that f is real-valued and the derivative ”matrix” is simply the gradient row-vector (if it exists). For general m > 1 one argues simply component by component. Take h = (h1, . . . , hn) with ‖h‖ < ρ and write

f(x0 + h) − f(x0) = (f(x0 + h1e1) − f(x0)) + (f(x0 + h1e1 + h2e2) − f(x0 + h1e1)) + . . .
+ (f(x0 + h1e1 + · · · + hnen) − f(x0 + h1e1 + · · · + h_{n−1}e_{n−1})). (9.15)

The mean value theorem for functions of one variable implies that there exist ξ1 ∈ B(x0, ‖h‖), ξ2 ∈ B(x0, ‖h‖), . . . , ξn ∈ B(x0, ‖h‖), such that

f(x0 + h1e1) − f(x0) = (∂f(ξ1)/∂x1) h1,
f(x0 + h1e1 + h2e2) − f(x0 + h1e1) = (∂f(ξ2)/∂x2) h2,
. . . ,
f(x0 + h1e1 + · · · + hnen) − f(x0 + h1e1 + · · · + h_{n−1}e_{n−1}) = (∂f(ξn)/∂xn) hn.


Using this in (9.15) gives

f(x0 + h) − f(x0) = (∂f(ξ1)/∂x1) h1 + (∂f(ξ2)/∂x2) h2 + · · · + (∂f(ξn)/∂xn) hn. (9.16)

However, as each ξk is contained in the ball B(x0, ‖h‖) and each partial derivative ∂f/∂xk is continuous at x0, we know that

∂f(ξk)/∂xk = ∂f(x0)/∂xk + o(1), as h → 0,

so that, as h_k o(1) = o(‖h‖) for any 1 ≤ k ≤ n fixed (check this!), we have

f(x0 + h) − f(x0) = (∂f(x0)/∂x1) h1 + (∂f(x0)/∂x2) h2 + · · · + (∂f(x0)/∂xn) hn + o(‖h‖), as h → 0. (9.17)

This means precisely that f is differentiable at x0 and

Df(x0) = (∂f(x0)/∂x1, . . . , ∂f(x0)/∂xn),

and we are done. �
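For a concrete smooth function the conclusion can be checked numerically: the derivative matrix is the row vector of partial derivatives, and centered finite differences recover it. The sketch below (Python; the function f is a sample choice, not one from the notes) does exactly that at the point (1, 2):

    import math

    def f(x1, x2):
        # a sample smooth function of two variables
        return x1**2 * x2 + math.sin(x2)

    x0 = (1.0, 2.0)
    h = 1e-6
    d1 = (f(x0[0] + h, x0[1]) - f(x0[0] - h, x0[1])) / (2 * h)  # approximates df/dx1 = 2 x1 x2
    d2 = (f(x0[0], x0[1] + h) - f(x0[0], x0[1] - h)) / (2 * h)  # approximates df/dx2 = x1^2 + cos(x2)
    print(d1, d2, 2 * x0[0] * x0[1], x0[0]**2 + math.cos(x0[1]))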

9.5 The chain rule

We now consider the chain rule for maps. Let U ⊂ Rn and V ⊂ Rm be two open sets, f be a map from U to V and g a map from V to Rk. Assume that f is differentiable at x0 ∈ U and g is differentiable at y0 = f(x0). Then the composition g ◦ f is a map from U to Rk. We will show that g ◦ f is differentiable at x0 and that the derivative matrix of g ◦ f at x0 is

D(g ◦ f)(x0) = Dg(y0)Df(x0). (9.18)

Note that Df(x0) is an m × n matrix, while Dg(y0) is a k × m matrix, so that the product Dg(y0)Df(x0) makes sense and is a k × n matrix, as it should be for the derivative matrix of a map from Rn to Rk. The proof is quite similar to that for real-valued functions. First, note that the continuity of f and the fact that V is open imply that there exists r > 0 small so that f(x) ∈ V for all x ∈ B(x0, r), so that g ◦ f is defined for all x ∈ B(x0, r). Next, we write

f(x0 + h) = f(x0) +Df(x0)h+ o(‖h‖), as h→ 0, (9.19)

with h ∈ Rn, and

g(y0 + t) = g(y0) + Dg(y0)t + o(‖t‖), as t → 0, (9.20)

with t ∈ Rm. Given h, let us take t = f(x0 + h) − f(x0); then, as y0 = f(x0), (9.19) together with (9.20) say that

(g ◦ f)(x0 + h) = g(f(x0 + h)) = g(f(x0)) + Dg(y0)(f(x0 + h) − f(x0)) + r(h), (9.21)

with a function r(h) such that

‖r(h)‖/‖f(x0 + h) − f(x0)‖ → 0 as h → 0. (9.22)

It follows from (9.22) and (9.19) that actually

‖r(h)‖/‖h‖ → 0 as h → 0, (9.23)


that is, r(h) = o(‖h‖). Thus, (9.21) says that

g(f(x0 + h)) = g(f(x0)) + Dg(y0)(f(x0 + h) − f(x0)) + o(‖h‖). (9.24)

Next, use (9.19) for the difference f(x0 + h)− f(x0) in the right side above:

g(f(x0 + h)) = g(f(x0)) + Dg(y0)(Df(x0)h + o(‖h‖)) + o(‖h‖). (9.25)

Exercise 9.18 Go over the details on how we obtained (9.25), in particular, how (9.19) and (9.22)imply (9.23).

It remains to observe that

Dg(y0) o(‖h‖) = o(‖h‖),

to conclude that

g(f(x0 + h)) = g(f(x0)) + Dg(y0)Df(x0)h + o(‖h‖), (9.26)

from which we conclude that g ◦ f is differentiable at x0 and

D(g ◦ f)(x0) = Dg(y0)Df(x0), (9.27)

which is the chain rule.

An important special case is when g : Rm → R is a real-valued function. Let us define

p(x) = g(f(x)) = g(f1(x), f2(x), . . . , fm(x)).

Then the chain rule says that

Dp(x) = D(g ◦ f) = Dg(f(x))Df(x),

where Dg is a row-vector of length m, and Df is an m×n matrix, so the product Dp is a row vectorof length n, with the entries

∂p(x)/∂xj = ∑_{l=1}^m (∂g(f(x))/∂yl) (∂fl(x)/∂xj). (9.28)
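Formula (9.28) is easy to sanity-check numerically; the sketch below (Python; the maps f and g are sample choices) compares a direct finite-difference derivative of p = g ◦ f with the sum on the right side of (9.28), computed from finite-difference partials of g and f:

    import math

    def f(x1, x2):                   # a sample map from R^2 to R^2
        return (x1 * x2, x1 + math.sin(x2))

    def g(y1, y2):                   # a sample real-valued function on R^2
        return y1**2 + math.exp(y2)

    def p(x1, x2):
        return g(*f(x1, x2))

    x = (0.7, 1.3)
    h = 1e-6

    # left side of (9.28): direct finite difference of p in the x1 direction
    lhs = (p(x[0] + h, x[1]) - p(x[0] - h, x[1])) / (2 * h)

    # right side of (9.28): sum over l of dg/dy_l * df_l/dx1, all by finite differences
    y = f(*x)
    dg = [(g(y[0] + h, y[1]) - g(y[0] - h, y[1])) / (2 * h),
          (g(y[0], y[1] + h) - g(y[0], y[1] - h)) / (2 * h)]
    df = [(f(x[0] + h, x[1])[l] - f(x[0] - h, x[1])[l]) / (2 * h) for l in range(2)]
    rhs = sum(dg[l] * df[l] for l in range(2))
    print(lhs, rhs)   # the two numbers agree up to the finite-difference error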

9.6 Higher order derivatives in Rn

In this section, we will look at higher order derivatives of a map f : Rn → Rm and show that ”the order of differentiation does not matter”. As by now we know that the derivative matrix Df has rows that are the gradient vectors of each component fk : Rn → R, and that differentiability of f is equivalent to differentiability of all components of f, we will assume that f : Rn → R is a scalar-valued function. We say that a function f is twice differentiable at a point x0 if the gradient vector Df, viewed as a map from Rn to Rn, is a differentiable map. We denote the second order partial derivatives of f as

∂^2 f/∂xi ∂xj,

or as DiDjf, and, more generally, the k-th order partial derivatives of f as

∂^k f/∂x_{i1} ∂x_{i2} . . . ∂x_{ik},

or as D_{i1}D_{i2} . . . D_{ik}f, with some collection of indices 1 ≤ i1, . . . , ik ≤ n.


Theorem 9.19 Let U ⊂ Rn be an open set, with x0 ∈ U . Assume that a function f is twicedifferentiable at x0, and that the partial derivatives DiDjf and DjDif are continuous at x0. ThenDiDjf(x0) = DjDif(x0).

Proof. Let ei and ej be the standard basis vectors, and for h, t ∈ R sufficiently small, consider thefour points x0, x0 + tej , x0 + hei, and x0 + tej + hei. Then the ”second order difference”

φ(h, t) = f(x0 + hei + tej) + f(x0)− f(x0 + hei)− f(x0 + tej)

can be written in two ways: first, as

φ(h, t) = [f(x0 + hei + tej) − f(x0 + hei)] − [f(x0 + tej) − f(x0)], (9.29)

but also as

φ(h, t) = [f(x0 + hei + tej) − f(x0 + tej)] − [f(x0 + hei) − f(x0)]. (9.30)

Looking at (9.30), let us define the function g(x), with h fixed, as

g(x) = f(x+ hei)− f(x), (9.31)

so that (9.30) says that

φ(h, t) = g(x0 + tej) − g(x0). (9.32)

The mean value theorem applied to (9.32) implies that there exists t1 between 0 and t so that

φ(h, t) = (Djg)(x0 + t1ej)t. (9.33)

Note that, differentiating (9.31) gives

Djg(x) = Djf(x+ hei)−Djf(x), (9.34)

Again, as Djf is differentiable, the mean value theorem applied to Djf implies that the difference in the right side of (9.34) can be written as

Djg(x) = hDiDjf(x+ h1ei), (9.35)

with some h1 between 0 and h. Using this in (9.33) with x = x0 + t1ej gives

φ(h, t) = ht(DiDjf)(x0 + t1ej + h1ei). (9.36)

Next, let us use (9.29) rather than (9.30). Define the function p(x), with t fixed, as

p(x) = f(x+ tej)− f(x), (9.37)

so that (9.29) says that

φ(h, t) = p(x0 + hei) − p(x0). (9.38)

The mean value theorem implies that there exists h2 between 0 and h so that

φ(h, t) = (Dip)(x0 + h2ei)h. (9.39)

Note that, differentiating (9.37) gives

Dip(x) = Dif(x+ tej)−Dif(x). (9.40)


Again, as Dif is differentiable, the mean value theorem implies that the difference in the right side can be written as

Dip(x) = tDjDif(x+ t2ej), (9.41)

with some t2 between 0 and t. Using this in (9.39) with x = x0 + h2ei gives

φ(h, t) = ht(DjDif)(x0 + t2ej + h2ei). (9.42)

Now, comparing (9.36) and (9.42) we deduce that

(DiDjf)(x0 + t1ej + h1ei) = (DjDif)(x0 + t2ej + h2ei). (9.43)

Note that t1, h1, t2, h2 in (9.43) all depend on t and h but tend to zero as t, h → 0. Moreover, both DiDjf and DjDif are continuous at x0. Passing to the limit in (9.43) gives, therefore:

(DiDjf)(x0) = (DjDif)(x0), (9.44)

and we are done. �

As a corollary, we deduce the following: if f is k times differentiable at a point x0, and all of its

derivatives of order k are continuous, then the partial derivative

Di1Di2 . . . Dikf

does not depend on the order in which these derivatives are taken.
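As a numerical illustration of Theorem 9.19 (an aside, using a hypothetical smooth function on R2), the symmetric second order difference φ(h, t)/(ht) used in the proof approximates both DiDjf and DjDif at once:

import numpy as np

# A hypothetical smooth function on R^2.
f = lambda x, y: np.exp(x * y) + x * np.sin(y)

def phi(f, x, y, h, t):
    # the "second order difference" from the proof of Theorem 9.19
    return f(x + h, y + t) + f(x, y) - f(x + h, y) - f(x, y + t)

h = t = 1e-4
approx = phi(f, 0.3, 0.7, h, t) / (h * t)                      # ~ D1 D2 f = D2 D1 f at (0.3, 0.7)
exact = np.exp(0.3 * 0.7) * (1.0 + 0.3 * 0.7) + np.cos(0.7)    # mixed partial computed by hand
print(approx, exact)                                           # the two values agree to several digits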

9.7 Critical points in R

Let us now investigate the critical points of a function f : R → R, a real-valued function of a real variable. We say that x0 is a local maximum of a function f(x) if there exists R0 > 0 so that we have f(x) ≤ f(x0) for all x ∈ (x0 − R0, x0 + R0). Similarly, x0 is a local minimum of a function f(x) if there exists R0 > 0 so that we have f(x) ≥ f(x0) for all x ∈ (x0 − R0, x0 + R0). In other words, there exists an interval I around x0 so that f attains its maximum (minimum) on I at the point x0. Recall that we have already proved that if f attains its maximum or minimum over an open interval I at a point x0 ∈ I, then f′(x0) = 0. In order to be able to say whether x0 is a maximum or a minimum, we use the second derivative test. In order to make the presentation very easily adaptable to functions on Rn, we begin with the following lemma.

Lemma 9.20 Let f be twice continuously differentiable on an interval (a, b) and x0 ∈ (a, b). Assumethat f ′(x0) = f ′′(x0) = 0, then

f(x0 + h)− f(x0) = o(h2).

Proof. The mean value theorem implies that, given h, we can find ξ1(h) that is between x0 and x0 + h so that

f(x0 + h) − f(x0) = f′(ξ1(h))h. (9.45)

Next, the mean value theorem applied to the function f′(x) implies that there exists ξ2(h) that is between x0 and ξ1(h) so that

f′(ξ1(h)) − f′(x0) = f′′(ξ2(h))(ξ1(h) − x0). (9.46)

As f′(x0) = 0, this is simply

f′(ξ1(h)) = f′′(ξ2(h))(ξ1(h) − x0). (9.47)


Using this in (9.45) gives

f(x0 + h)− f(x0) = f ′′(ξ2(h))h(ξ1(h)− x0). (9.48)

As |ξ1(h) − x0| ≤ |h|, we deduce that

|f(x0 + h) − f(x0)| ≤ |f′′(ξ2(h))|h2, (9.49)

so that

|f(x0 + h) − f(x0)|/h2 ≤ |f′′(ξ2(h))|. (9.50)

Now, as f ′′(x) is continuous at x0, and ξ2(h)→ x0 as h→ 0, we know that f ′′(ξ2(h))→ f ′′(x0) = 0as h→ 0. Thus, we have

f(x0 + h)− f(x0) = o(h2),

and we are done. �

Theorem 9.21 Let f be twice continuously differentiable at a point x0. Assume that f′(x0) = 0. Then (i) if f′′(x0) > 0 then x0 is a local minimum of f, and (ii) if f′′(x0) < 0 then x0 is a local maximum of f.

Proof. Let us assume that f ′′(x0) > 0. Consider the function

g(x) = f(x0) + (1/2) f′′(x0)(x − x0)2,

and set

p(x) = f(x) − g(x).

Note that g′(x0) = 0 and g′′(x0) = f′′(x0), so that

p(x0) = 0, p′(x0) = 0, p′′(x0) = 0.

It follows from Lemma 9.20 that p(x0 + h) = o(h2), so that

f(x0 + h) = g(x0 + h) + o(h2),

so that

r(h) = f(x0 + h) − f(x0) − (1/2) f′′(x0)h2 = o(h2). (9.51)

Thus, we can choose R0 so that for all h ∈ (−R0,+R0) we have

|r(h)| ≤ (1/10) f′′(x0)h2.

Using this in (9.51) gives

|f(x0 + h) − f(x0) − (1/2) f′′(x0)h2| ≤ (1/10) f′′(x0)h2, for all |h| < R0,

and thus

f(x0 + h) − f(x0) ≥ (1/2) f′′(x0)h2 − (1/10) f′′(x0)h2 > 0, for all 0 < |h| < R0,

thus x0 is a local minimum of f if f′′(x0) > 0. The case f′′(x0) < 0 is essentially identical. �
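As a quick illustration (a hypothetical example, not from the text), one can test Theorem 9.21 on f(x) = x^3 − 3x, whose critical points are x = ±1 with f′′(1) = 6 and f′′(−1) = −6:

f = lambda x: x**3 - 3.0 * x
fpp = lambda x: 6.0 * x                   # f'' computed by hand

for x0 in (1.0, -1.0):
    kind = "local minimum" if fpp(x0) > 0 else "local maximum"
    diffs = [f(x0 + h) - f(x0) for h in (-0.1, -0.01, 0.01, 0.1)]
    print(x0, kind, diffs)                # the sign of f(x0+h) - f(x0) matches the verdict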


9.8 A digression on the quadratic forms

A quadratic form is an expression of the form

F(ξ) = Σ_{i,j=1}^n qij ξi ξj .

Here, ξ ∈ Rn is a vector that we think of as an ”independent variable”, and the n × n symmetric matrix Q with the entries qij is fixed.

Exercise 9.22 Why can we assume that qij = qji so that the matrix Q is symmetric?

This is a natural generalization of the quadratic functions in R that do not have a linear part: in that case, F(ξ) = qξ2, where q ∈ R is a number, and ξ ∈ R is an independent variable. Notice that, depending on the sign of the number q, we either have F(ξ) ≥ 0 for all ξ ∈ R or F(ξ) ≤ 0 for all ξ ∈ R. In higher dimensions n > 1, it is also possible that a quadratic form is nonnegative for all ξ ∈ Rn, for instance for

F(ξ) = a1ξ1² + a2ξ2² + · · · + anξn²,

with all aj ≥ 0, 1 ≤ j ≤ n. It may also happen that a quadratic form is nonpositive for all ξ ∈ Rn, for instance for

F(ξ) = a1ξ1² + a2ξ2² + · · · + anξn²,

with all aj ≤ 0, 1 ≤ j ≤ n. It is also possible that a quadratic form may take both positive and negative values: take n = 2 and consider

F(ξ) = ξ1² − ξ2², ξ = (ξ1, ξ2) ∈ R2.

Then, for instance, F(1, 0) = 1 > 0 and F(0, 1) = −1 < 0.

In order to understand what is happening in general, let us write

F(ξ) = Σ_{i,j=1}^n qij ξi ξj = Σ_{i=1}^n ( Σ_{j=1}^n qij ξj ) ξi = Σ_{i=1}^n vi ξi, (9.52)

with the vector v = (v1, . . . , vn) ∈ Rn having the components

vi = Σ_{j=1}^n qij ξj , i = 1, . . . , n.

Note that v = Qξ, so that we can re-write (9.52) as

F(ξ) = (Qξ · ξ). (9.53)

The matrix Q is an n × n symmetric matrix. The spectral theorem tells us that it has n eigenvalues λ1, . . . , λn taken with multiplicities. We denote the corresponding eigenvectors v1, . . . , vn:

Qvk = λkvk. (9.54)

Note that if λk ≠ λj then the eigenvectors vk and vj are orthogonal. To see that, write

Qvk = λkvk, Qvj = λjvj , (9.55)


and take the dot product of the first equation with vj and of the second with vk. This gives

(Qvk · vj) = λk(vk · vj), (Qvj · vk) = λj(vj · vk). (9.56)

Note that as the matrix Q is symmetric, we have QT = Q, so that

(Qvk · vj) = (vk · (QT vj)) = (vk ·Qvj) = (Qvj · vk).

With this identity in hand, subtracting the second equation in (9.56) from the first gives

(λk − λj)(vk · vj) = 0.

As λk ≠ λj , it follows that (vk · vj) = 0.

Exercise 9.23 Use the above to show that if Q is symmetric then there exists an orthonormal basis{v1, . . . , vn} of Rn so that each vj is an eigenvector of Q.

This basis is extremely useful for the quadratic form F(ξ) = (Qξ · ξ). Given a vector ξ, let c1(ξ), . . . , cn(ξ) be its coordinates in that basis:

ξ = c1(ξ)v1 + · · · + cn(ξ)vn.

As the basis vk is orthonormal, we have (vk · vj) = 0 if k ≠ j and (vk · vk) = 1 for all 1 ≤ k ≤ n, so that

|ξ|2 = (ξ · ξ) = |c1(ξ)|2 + · · · + |cn(ξ)|2. (9.57)

Furthermore, we may write Qξ as

Qξ = c1(ξ)Qv1 + · · ·+ cn(ξ)Qvn = c1(ξ)λ1v1 + · · ·+ cn(ξ)λnvn,

thus

F(ξ) = (Qξ · ξ) = (c1(ξ)λ1v1 + · · · + cn(ξ)λnvn) · (c1(ξ)v1 + · · · + cn(ξ)vn). (9.58)

As the basis vk is orthonormal, we have (vk · vj) = 0 if k ≠ j and (vk · vk) = 1 for all 1 ≤ k ≤ n. Using this in (9.58) gives

F(ξ) = (Qξ · ξ) = λ1|c1(ξ)|2 + · · · + λn|cn(ξ)|2. (9.59)

It follows that if we set

m = min_{1≤k≤n} λk, M = max_{1≤k≤n} λk, (9.60)

then

m(|c1(ξ)|2 + · · · + |cn(ξ)|2) ≤ F(ξ) = (Qξ · ξ) ≤ M(|c1(ξ)|2 + · · · + |cn(ξ)|2). (9.61)

With the help of (9.57), this becomes

m‖ξ‖2 ≤ F(ξ) = (Qξ · ξ) ≤ M‖ξ‖2, (9.62)

with m and M as in (9.60). Let us summarize this as the following theorem.


Theorem 9.24 Let Q be an n × n symmetric matrix with eigenvalues λ1, . . . , λn (taken with multiplicities), and set

m = min_{1≤k≤n} λk, M = max_{1≤k≤n} λk, (9.63)

then for any ξ ∈ Rn we have

m‖ξ‖2 ≤ (Qξ · ξ) ≤ M‖ξ‖2. (9.64)

Moreover, there exists ξ ∈ Rn, ξ ≠ 0, so that (Qξ · ξ) = m‖ξ‖2, and η ∈ Rn, η ≠ 0, so that (Qη · η) = M‖η‖2.
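The bounds (9.64) are easy to test numerically; the sketch below (an illustrative aside using numpy) draws a random symmetric matrix, computes its eigenvalues with numpy.linalg.eigvalsh, and checks the inequality on random vectors.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = (A + A.T) / 2                        # a random symmetric 4 x 4 matrix

lam = np.linalg.eigvalsh(Q)              # eigenvalues of the symmetric matrix Q
m, M = lam.min(), lam.max()

for _ in range(1000):
    xi = rng.standard_normal(4)
    F = xi @ Q @ xi                      # the quadratic form (Q xi . xi)
    n2 = xi @ xi                         # ||xi||^2
    assert m * n2 - 1e-9 <= F <= M * n2 + 1e-9

print("the bounds (9.64) hold; m =", m, ", M =", M)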

Definition 9.25 We say that a quadratic form F (ξ) on Rn is positive definite if there exists m > 0so that for any ξ ∈ Rn we have

m|ξ|2 ≤ F (ξ). (9.65)

We say that a quadratic form F (ξ) on Rn is negative definite if there exists m < 0 so that for anyξ ∈ Rn we have

m|ξ|2 ≥ F (ξ). (9.66)

Corollary 9.26 (i) A quadratic form F(ξ) defined on Rn is positive definite if all eigenvalues of the corresponding symmetric n × n matrix Q are positive.
(ii) A quadratic form F(ξ) defined on Rn is negative definite if all eigenvalues of the corresponding symmetric n × n matrix Q are negative.

Another function one can associate to a symmetric n × n matrix Q is a bilinear form G(ξ, η) defined for a pair of vectors ξ ∈ Rn and η ∈ Rn as

G(ξ, η) = Σ_{i,j=1}^n qij ξi ηj = Σ_{i=1}^n ( Σ_{j=1}^n qij ξj ) ηi = (Qξ · η). (9.67)

Since Q is symmetric, we have the identity

(Qξ · η) = (1/4)[(Q(ξ + η) · (ξ + η)) − (Q(ξ − η) · (ξ − η))],

which allows us to write

G(ξ, η) = (1/4)[F(ξ + η) − F(ξ − η)],

hence the values of the bilinear form are determined by the values of the quadratic form. Note that the corresponding quadratic form is simply F(ξ) = G(ξ, ξ).

Exercise 9.27 Show that the entries of the matrix Q can be expressed in terms of the bilinear form as

qij = (1/4)[F(ei + ej) − F(ei − ej)].

9.9 The second order Taylor polynomial in Rn

Let us now consider the analog of the Taylor series in Rn. Recall that for an (m + 1)-times differentiable function f(x) of a single variable we have the Taylor formula

f(x0 + h) = Pm(h) + f^(m+1)(ξ) h^{m+1}/(m + 1)!,


where ξ is a point between x0 and x0 + h, and the Taylor polynomial is

Pm(h) = Σ_{k=0}^m f^(k)(x0) h^k/k! .

The polynomial Pm(h) was constructed so that for all 0 ≤ k ≤ m we have

d^k/dh^k f(x0 + h)|_{h=0} = d^k/dh^k Pm(h)|_{h=0} .

We can do a similar procedure for functions on Rn but we will only do this in detail for the quadratic Taylor polynomials. We are given a function f : B(x0, r) → R that is twice differentiable in a ball B(x0, r) and such that its second derivative matrix D2f is continuous in B(x0, r). Our goal is to find a vector b and a matrix Q with entries qij such that we have

f(x0 + h) = f(x0) + (b · h) + (1/2)(Qh · h) + o(‖h‖2). (9.68)

Note that

(1/2)(Qh · h) + o(‖h‖2) = o(‖h‖),

hence we must have b = Df(x0), so that (9.68) becomes

f(x0 + h) = f(x0) + (Df(x0) · h) + (1/2)(Qh · h) + o(‖h‖2). (9.69)

If we blindly differentiate both sides of (9.68) twice in the directions ei and ej and manage to arguethat we may pass to the limit h → 0, disregarding the derivative of the o(‖h‖2)-term in (9.68), wewould get that

∂2f(x0)/∂xi∂xj = qij , (9.70)

that is, the matrix Q is Q = D2f(x0). Now, the question is whether (9.68) does, indeed, hold with b = Df(x0) and Q = D2f(x0). Let us start with the following lemma.

Lemma 9.28 Let f be twice differentiable in a ball B(x0, r), and D2f(x) be continuous in B(x0, r). Assume that Df(x0) = 0 and D2f(x0) = 0, then

f(x0 + h) − f(x0) = o(‖h‖2), as h → 0. (9.71)

Proof. Let us fix h ∈ Rn with ‖h‖ < r and consider the function

g(t) = f(x0 + th)− f(x0), t ∈ [−1, 1],

of a scalar variable t. Note that g(0) = 0 and, using the definition of the partial derivatives of f andalso the partial derivatives of ∂f/∂xj , we can compute

g′(t) = (h · Df(x0 + th)) = Σ_{j=1}^n hj ∂f(x0 + th)/∂xj ,

g′′(t) = Σ_{j=1}^n hj (h · D(∂f(x0 + th)/∂xj)) = Σ_{j=1}^n hj Σ_{i=1}^n hi ∂2f(x0 + th)/∂xi∂xj = Σ_{i,j=1}^n hi hj ∂2f(x0 + th)/∂xi∂xj . (9.72)


The mean value theorem tells us that

g(1) − g(0) = g′(c1), (9.73)

with some c1 ∈ (0, 1). Applying the mean value theorem again to g′, and recalling that g′(0) = (h · Df(x0)) = 0, we conclude that

g′(c1) = g′′(c2)c1, (9.74)

with some c2 ∈ (0, c1). Using this in (9.73) gives

g(1) − g(0) = g′′(c2)c1. (9.75)

Let us use the definition of g(t) and the fact that g(0) = 0 in (9.75):

f(x0 + h) − f(x0) = g′′(c2)c1. (9.76)

Now, let us replace g′′(c2) by the corresponding expression in (9.72):

f(x0 + h) − f(x0) = c1 Σ_{i,j=1}^n hi hj ∂2f(x0 + c2h)/∂xi∂xj . (9.77)

As D2f(x0) = 0 and D2f(x) is continuous at x0, given any ε > 0 we can find δ > 0 so that

|∂2f(x0 + z)/∂xi∂xj| < ε for all z ∈ Rn such that ‖z‖ < δ. (9.78)

As 0 < c1, c2 < 1, as soon as ‖h‖ < δ, we have

|f(x0 + h) − f(x0)| ≤ Σ_{i,j=1}^n |hi hj| |∂2f(x0 + c2h)/∂xi∂xj| ≤ Σ_{i,j=1}^n ‖h‖2 ε = n2‖h‖2ε. (9.79)

It follows that |f(x0 + h) − f(x0)| = o(‖h‖2), and we are done. �

Now we can prove the second order approximation theorem.

Theorem 9.29 Let f be twice differentiable in a ball B(x0, r), and D2f(x) be continuous in B(x0, r). Then

f(x0 + h) = f(x0) + (Df(x0) · h) + (1/2)(D2f(x0)h · h) + o(‖h‖2), as h → 0. (9.80)

Proof. This is a simple consequence of Lemma 9.28. Define the function

f̃(x) = f(x) − f(x0) − (Df(x0) · (x − x0)) − (1/2)(D2f(x0)(x − x0) · (x − x0)).

Note that f̃(x0) = 0, Df̃(x0) = 0 and D2f̃(x0) = 0. Moreover, D2f̃(x) is continuous at x0. It follows from Lemma 9.28 that f̃(x) = o(‖x − x0‖2), which is exactly the claim of the present theorem. �
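As an illustrative numerical check of (9.80) (with a hypothetical test function, not from the notes), the remainder divided by ‖h‖2 should tend to zero as h → 0:

import numpy as np

# A hypothetical test function on R^2 with gradient and Hessian computed by hand.
f = lambda x: np.exp(x[0]) * np.cos(x[1])
grad = lambda x: np.array([np.exp(x[0]) * np.cos(x[1]), -np.exp(x[0]) * np.sin(x[1])])
hess = lambda x: np.array([[np.exp(x[0]) * np.cos(x[1]), -np.exp(x[0]) * np.sin(x[1])],
                           [-np.exp(x[0]) * np.sin(x[1]), -np.exp(x[0]) * np.cos(x[1])]])

x0 = np.array([0.2, 0.5])
d = np.array([1.0, -2.0]) / np.sqrt(5.0)     # a fixed unit direction

for s in (1e-1, 1e-2, 1e-3):
    h = s * d
    taylor = f(x0) + grad(x0) @ h + 0.5 * h @ hess(x0) @ h
    r = f(x0 + h) - taylor                   # the remainder in (9.80)
    print(s, r / s**2)                       # the ratio r/||h||^2 tends to 0 as h -> 0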


9.10 The critical points of a function and the second derivative test

Now, we are ready to discuss conditions for a function f : Rn → R to attain a local maximum or a local minimum at a point x0. Recall that a function f defined on an open set U attains a local maximum at a point x0 ∈ U if there exists r > 0 so that

f(x) ≤ f(x0) for all x ∈ U such that |x − x0| < r. (9.81)

Similarly, f attains a local minimum at a point x0 ∈ U if there exists r > 0 so that

f(x) ≥ f(x0) for all x ∈ U such that |x − x0| < r. (9.82)

Note that if f is differentiable at a point x0 and attains a local maximum or a local minimum at x0, then we must have Dkf(x0) = 0 for all 1 ≤ k ≤ n, which means that Df(x0) = 0. To see that, assume that f attains a local maximum at x0, and for a given k define the function

g(t) = f(x0 + tek),

of a real variable t ∈ (−r, r). Then g(t) attains its maximum on the interval (−r, r) at the point t = 0, hence g′(0) = 0, which exactly means that Dkf(x0) = 0. The question is whether we can determine if a given x0 such that Df(x0) = 0 is a local maximum, a local minimum or neither. Note that all three possibilities can occur: simply look at f(x) = x1² + x2², g(x) = −x1² − x2², and p(x) = x1² − x2² at x0 = (0, 0) ∈ R2. All three functions have a gradient equal to zero at x0, and f attains a local minimum at x0, g attains a local maximum at x0, but for p(x) the point x0 is neither a local maximum nor a local minimum.
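Anticipating the second derivative test in Rn treated in the references that follow, these three examples can be sorted out by the eigenvalues of their (constant) Hessian matrices; the following Python sketch is only an illustration of that bookkeeping.

import numpy as np

# The constant Hessians of the three examples at x0 = (0, 0), computed by hand:
# f = x1^2 + x2^2, g = -x1^2 - x2^2, p = x1^2 - x2^2.
examples = {
    "f = x1^2 + x2^2":  np.diag([2.0, 2.0]),
    "g = -x1^2 - x2^2": np.diag([-2.0, -2.0]),
    "p = x1^2 - x2^2":  np.diag([2.0, -2.0]),
}

for name, H in examples.items():
    lam = np.linalg.eigvalsh(H)
    if lam.min() > 0:
        verdict = "local minimum"
    elif lam.max() < 0:
        verdict = "local maximum"
    else:
        verdict = "neither (saddle point)"
    print(name, lam, verdict)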

For the rest of this discussion, see Section 7 in Chapter 2 of the Simon book, and Sections 8.4.3–8.4.5 in Zorich.

10 A crash course on integrals

10.1 Extension of a continuous function from a dense subset

Before we turn to the integral, let us consider the following question. Let X be a metric space, and S be a dense subset of X. Suppose we have a real-valued continuous function f that is defined on S, f : S → R; the question is whether we can extend f to all of X so that the extension is continuous. In other words, can we construct a function f̄ : X → R such that f̄ is continuous on X and for s ∈ S we have f̄(s) = f(s)?

First, let us find a candidate for what f̄ should be. Given x ∈ X, since S is a dense subset of X, there exists a sequence sn ∈ S with sn → x. If f̄ is continuous on X, then we must have f̄(sn) → f̄(x) as n → ∞. Since f̄ = f on S, we should have f̄(sn) = f(sn). This suggests the following possibility: let us define f̄(x) for x ∉ S as follows – take a sequence sn → x and set

f̄(x) = lim_{n→∞} f(sn). (10.1)

This brings two immediate questions: (i) does the limit in (10.1) exist, and (ii) if the limit exists, will it be the same for all sequences sn that converge to a given point x ∉ S?

Let us make a stronger assumption than that f is continuous on S: let us assume that f is uniformly continuous on S, so that for any ε > 0 there exists δε > 0 such that for any s1, s2 ∈ S with d(s1, s2) < δε we have |f(s1) − f(s2)| < ε. Now, as the sequence sn converges to x, this sequence is Cauchy, hence for any ε > 0 there exists Nε so that d(sn, sm) < δε for all n, m ≥ Nε, with δε as above. Then we


have |f(sn) − f(sm)| < ε for all n, m ≥ Nε. It follows that the sequence f(sn) is Cauchy, hence the limit in (10.1) exists. This answers (i). Furthermore, (ii) has also been answered – indeed, take another sequence s′n ∈ S that converges to x and consider the alternating sequence s1, s′1, s2, s′2, . . . , that is, the sequence s′′k with s′′2j−1 = sj and s′′2j = s′j. The sequence s′′k also converges to x, hence, by what we have just shown, the limit of f(s′′k) exists. But then the limits of f(sk) and f(s′k) must coincide.

Exercise 10.1 Fill in the details in the proof of the claim that

lim_{n→∞} f(sn) = lim_{n→∞} f(s′n)

above.

Now we know that f̄(x) is well-defined for x ∉ S (and for x ∈ S we simply set f̄(x) = f(x)). It remains to show that f̄ is continuous on X. Since f is uniformly continuous on S, given ε > 0, we can choose δ > 0 so that for all s1, s2 ∈ S such that d(s1, s2) < δ we have |f(s1) − f(s2)| < ε/4. Let us take x, y ∈ X such that d(x, y) < δ/4. As S is dense in X, we can find two sequences sn, s′n ∈ S such that

lim_{n→∞} sn = x, lim_{n→∞} s′n = y,

and then

f̄(x) = lim_{n→∞} f(sn), f̄(y) = lim_{n→∞} f(s′n).

Let us choose N sufficiently large, so that all of the following hold:

d(sN , x) < δ/4, d(s′N , y) < δ/4, |f̄(x) − f(sN)| < ε/4, |f̄(y) − f(s′N)| < ε/4. (10.2)

The first two inequalities in (10.2) imply that

d(sN , s′N) ≤ d(sN , x) + d(x, y) + d(y, s′N) < δ/4 + δ/4 + δ/4 < δ,

thus the way we have chosen δ implies that |f(sN) − f(s′N)| < ε/4. Now, the last two inequalities in (10.2) imply that

|f̄(x) − f̄(y)| ≤ |f̄(x) − f(sN)| + |f(sN) − f(s′N)| + |f̄(y) − f(s′N)| < ε/4 + ε/4 + ε/4 < ε.

It follows that the function f̄ is uniformly continuous on X. Thus, we have proved the following theorem.

Theorem 10.2 Let X be a metric space and S a dense subset of X. Suppose that a function f : S → R is uniformly continuous on S. Then there exists a function f̄ : X → R that is uniformly continuous on X and such that f̄(s) = f(s) for all s ∈ S.

10.2 The integral of a continuous function

We now give a brief and short-cutty way to define the integral of a continuous function on an interval [a, b] using the above strategy of extension. There is a class of functions on which one ”knows” what the integral should be if we think of the integral as the area under the graph of a function. Namely, if f : [a, b] → R is an affine function, that is,

f(t) = αt+ β

with some α, β ∈ R, then by the area of a triangle formula we should have

∫_a^b f(x)dx = (α(a + b)/2 + β)(b − a). (10.3)

Here, (b − a)β is the area of the rectangle under the graph, and α(b − a)(a + b)/2 is the area of the triangle – draw a picture to clarify this!


The integral of a piecewise affine continuous function

The next clear property the integral should have is that the integral is additive with respect to ”interval splitting”: for a < b < c we should have

∫_a^c f(x)dx = ∫_a^b f(x)dx + ∫_b^c f(x)dx. (10.4)

This tells us what the integral must be for piecewise affine functions. A function is piecewise affine if there exists a partition a = t0 < t1 < . . . < tn = b of the interval [a, b] so that f(t) = αit + βi on each interval t ∈ [ti−1, ti], with αi, βi ∈ R. For piecewise affine functions we should have

∫_a^b f(x)dx = Σ_{i=1}^n (αi(ti + ti−1)/2 + βi)(ti − ti−1). (10.5)

Note that we require that the piecewise affine functions are continuous: the values at ti on the left and on the right must match. Our goal is to extend the notion of the integral from piecewise affine to all continuous functions.

It is not completely obvious that the above definition of the integral makes sense even for piecewiseaffine functions. The issue is that if f is piecewise affine, there may be many ways of choosing thepartition points ti. For instance, if one such partition works, one could always refine it by addingmore division points. One then has to show that the result is independent of the particular choiceof division points, subject to f being affine on each subinterval.

Exercise 10.3 Show that if f is a piecewise affine function then the integral defined by the right side of (10.5) does not depend on the choice of the partition. Hint: given two such partitions, take their common refinement, consisting of all the division points in either, and then show that the integrals defined using the original two partitions are equal to that defined using their common refinement, and thus to each other.
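Since a continuous piecewise affine function is determined by its values at the partition points, and on [ti−1, ti] one has αi(ti−1 + ti)/2 + βi = (f(ti−1) + f(ti))/2, formula (10.5) is just a sum of trapezoid areas. Here is a minimal Python sketch of this computation (the function name integral_piecewise_affine is introduced here only for illustration):

# A minimal sketch of (10.5): the integral of a continuous piecewise affine
# function given the partition points t_0 < ... < t_n and the nodal values f(t_i).
def integral_piecewise_affine(t, ft):
    assert len(t) == len(ft) and len(t) >= 2
    return sum((ft[i - 1] + ft[i]) / 2.0 * (t[i] - t[i - 1]) for i in range(1, len(t)))

# Example: f(x) = |x - 1| on [0, 2] is piecewise affine with a kink at 1; its integral is 1.
print(integral_piecewise_affine([0.0, 1.0, 2.0], [1.0, 0.0, 1.0]))    # prints 1.0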

Once one knows that the integral (10.5) is well-defined on the set D = D([a, b]) of piecewise affinefunctions, one can show the following basic properties of the integral for piecewise affine functions.

Proposition 10.4 (i) Linearity: let f1, f2 ∈ D([a, b]) and c1, c2 ∈ R, then

∫_a^b (c1f1(x) + c2f2(x))dx = c1 ∫_a^b f1(x)dx + c2 ∫_a^b f2(x)dx.

(ii) Positivity: if f ∈ D([a, b]), and f(x) ≥ 0 for all x ∈ [a, b], then

∫_a^b f(x)dx ≥ 0.

(iii) Boundedness: if f ∈ D([a, b]) then

|∫_a^b f(x)dx| ≤ ∫_a^b |f(x)|dx ≤ (b − a)‖f‖, (10.6)

where ‖f‖ is the norm on C([a, b]): ‖f‖ = sup{|f(x)| : x ∈ [a, b]}.
(iv) Additivity: if a < b < c and f ∈ D([a, c]) then

∫_a^b f(x)dx + ∫_b^c f(x)dx = ∫_a^c f(x)dx.


Exercise 10.5 Prove Proposition 10.4. Hint: for (i) you may notice that c1f1 + c2f2 is a piecewise affine function with respect to the partition that is a common refinement of the partitions for f1 and f2, and compute the integral with respect to that partition – recall that the integral does not depend on the partition. In parts (ii)-(iv) you should use the definition of the integral directly.

Positivity plus linearity imply that if f, g ∈ D and f(x) ≥ g(x) for all x ∈ [a, b], then

∫_a^b f(x)dx ≥ ∫_a^b g(x)dx.

Indeed, we have

∫_a^b f(x)dx − ∫_a^b g(x)dx = ∫_a^b (f(x) − g(x))dx ≥ 0,

where the equality uses the linearity of the integral and the inequality the positivity of the integral of f − g ≥ 0.

Approximation of continuous functions by piecewise affine functions

Next, we will extend the definition of the integral from piecewise affine functions on [a, b] to all continuous functions on [a, b]. First, we need to show that a continuous function can be approximated by a sequence of piecewise affine functions.

Lemma 10.6 Let f ∈ C[a, b] be a continuous function. There is a sequence of piecewise affinefunctions fn ∈ C[a, b] such that fn → f in C[a, b].

Proof. As f is continuous on a closed interval [a, b], f is uniformly continuous on that interval. Therefore, given n ∈ N we can find δn so that |f(x) − f(y)| < 1/n for all x, y such that |x − y| < δn. Consider now a partition a = t0 < t1 < · · · < tk = b of the interval [a, b] such that |tj − tj−1| < δn for all 1 ≤ j ≤ k. Consider a piecewise affine function fn so that fn(t) is affine on each interval [tj−1, tj] and fn(tj) = f(tj) for all 0 ≤ j ≤ k – draw a picture to visualize fn! Note that fn depends on n through the partition. Then, for each t ∈ [a, b], we can find 1 ≤ j ≤ k so that t ∈ [tj−1, tj]. Then, we have

|fn(t)− f(t)| ≤ |fn(t)− fn(tj)|+ |fn(tj)− f(tj)|+ |f(tj)− f(t)|. (10.7)

The basic properties of linear functions imply that the first term in (10.7) can be estimated as

|fn(t) − fn(tj)| ≤ |fn(tj−1) − fn(tj)| = |f(tj−1) − f(tj)| ≤ 1/n, (10.8)

since |tj−1 − tj| < δn for all j. The second term in the right side of (10.7) vanishes since fn(tj) = f(tj), and the last one can be bounded by

|f(tj) − f(t)| < 1/n, (10.9)

because |tj − t| < δn. It follows that for all t ∈ [a, b] we have

|fn(t) − f(t)| ≤ 2/n, (10.10)

hence ‖fn − f‖ → 0 as n→ +∞, thus fn → f in C[a, b]. �


Defining the integral of a continuous function

Suppose f ∈ C([a, b]), and let fn ∈ D[a, b] be a sequence such that fn → f in C[a, b]. Note that by Lemma 10.6 at least one such sequence exists, but of course it is not unique. We would like to show, first, that

lim_{n→∞} ∫_a^b fn(x)dx

exists and, second, that it is independent of the choice of the particular sequence in D[a, b] that converges to f. If we show both of these properties, then we can define

∫_a^b f(x)dx = lim_{n→∞} ∫_a^b fn(x)dx. (10.11)

Lemma 10.7 Let fn ∈ D[a, b] be a sequence of piecewise affine functions that converges to a function f ∈ C[a, b], then the limit

lim_{n→∞} ∫_a^b fn(x)dx

exists.

Proof. As the sequence fn converges in C[a, b], it is a Cauchy sequence in C[a, b]. Thus, for all ε > 0 there is N such that n, m ≥ N imply ‖fn − fm‖ < ε/(b − a). The linearity property (i) and the boundedness property (iii) in Proposition 10.4 imply that

|∫_a^b fn − ∫_a^b fm| = |∫_a^b (fn − fm)| ≤ (b − a)‖fn − fm‖ < ε.

It follows that the sequence of numbers

In = ∫_a^b fn(x)dx

is a Cauchy sequence in R, thus it converges. �

Lemma 10.8 Let fn ∈ D[a, b] and gn ∈ D[a, b] be two sequences of piecewise affine functions in D[a, b] that both converge to the same function f ∈ C[a, b], then

lim_{n→∞} ∫_a^b fn(x)dx = lim_{n→∞} ∫_a^b gn(x)dx. (10.12)

Proof. Consider the alternating sequence f1, g1, f2, g2, f3, g3 . . ., that is, the sequence hn with

h2k−1 = fk, h2k = gk.

This sequence also converges to f, as follows easily from the definition of convergence. Thus, Lemma 10.7 implies that

lim_{n→∞} ∫_a^b hn(x)dx

exists, but then all subsequences of

∫_a^b hn(x)dx

converge to this very same limit, and (10.12) follows. �


Putting these together, we can define

∫_a^b f

for f ∈ C([a, b]) as follows: take any sequence fn in D[a, b] such that fn → f in C[a, b], and set

∫_a^b f = lim_{n→∞} ∫_a^b fn. (10.13)

This argument remains valid if the target space R of f is replaced by any complete normed vector space V, such as Rm. Complete normed vector spaces are called Banach spaces.
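To see the definition (10.13) in action numerically, one can integrate the piecewise affine interpolants of a concrete continuous function (here cos on [0, π/2], a hypothetical example) and watch the values converge to the expected limit:

import math

def trapezoid_integral(f, a, b, k):
    # the integral of the piecewise affine interpolant of f on a uniform
    # partition of [a, b] with k subintervals, as constructed in Lemma 10.6
    t = [a + (b - a) * j / k for j in range(k + 1)]
    return sum((f(t[j - 1]) + f(t[j])) / 2.0 * (t[j] - t[j - 1]) for j in range(1, k + 1))

# The integrals of the interpolants of cos on [0, pi/2] converge to 1 = sin(pi/2),
# in line with the definition (10.13).
for k in (4, 16, 64, 256):
    print(k, trapezoid_integral(math.cos, 0.0, math.pi / 2, k))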

It is straightforward to check that the integral inherits the properties from D[a, b] in Proposi-tion 10.4.

Proposition 10.9 (i) Linearity: let f1, f2 ∈ C([a, b]) and c1, c2 ∈ R, then

∫_a^b (c1f1(x) + c2f2(x))dx = c1 ∫_a^b f1(x)dx + c2 ∫_a^b f2(x)dx. (10.14)

(ii) Positivity: if f ∈ C([a, b]), and f(x) ≥ 0 for all x ∈ [a, b], then

∫_a^b f(x)dx ≥ 0. (10.15)

(iii) Boundedness: if f ∈ C([a, b]) then

|∫_a^b f(x)dx| ≤ ∫_a^b |f(x)|dx ≤ (b − a)‖f‖, (10.16)

where ‖f‖ is the norm on C([a, b]): ‖f‖ = sup{|f(x)| : x ∈ [a, b]}.
(iv) Additivity: if a < b < c and f ∈ C([a, c]) then

∫_a^b f(x)dx + ∫_b^c f(x)dx = ∫_a^c f(x)dx. (10.17)

Proof. Linearity follows from the linearity of the limit and of the integral on D[a, b]: if hn → f1 and gn → f2, with hn, gn ∈ D[a, b], then

c1 ∫_a^b f1(x)dx + c2 ∫_a^b f2(x)dx = c1 lim_{n→∞} ∫_a^b hn(x)dx + c2 lim_{n→∞} ∫_a^b gn(x)dx
= lim_{n→∞} ( c1 ∫_a^b hn(x)dx + c2 ∫_a^b gn(x)dx ) = lim_{n→∞} ∫_a^b (c1hn(x) + c2gn(x))dx
= ∫_a^b (c1f1(x) + c2f2(x))dx.

The first equality uses the definition of the integral, the second is the linearity of the limit in R, the third is the linearity of the integral on D[a, b], and the fourth uses the linearity of the limit in the vector space C([a, b]), so that c1hn + c2gn converges to c1f1 + c2f2 in C[a, b]. The proof of additivity is completely similar.

To show the boundedness property, note that if fn → f in C[a, b] with fn ∈ D[a, b], then also |fn| → |f| in C[a, b]. This is because

| |fn(x)| − |f(x)| | ≤ |fn(x) − f(x)|, (10.18)

so that if the right side converges uniformly on [a, b] to zero as n→∞, then so does the left side.


Exercise 10.10 Check this!

Now, the first inequality in (10.16) is easy: by boundedness in D, we have

|∫_a^b fn(x)dx| ≤ ∫_a^b |fn(x)|dx, (10.19)

and the left side in (10.19) converges:

|∫_a^b fn(x)dx| → |∫_a^b f(x)dx| (10.20)

by the definition of the integral. As |fn| → |f| in C[a, b] and |fn| ∈ D[a, b], the right side of (10.19) converges to

∫_a^b |f(x)|dx, (10.21)

and the first inequality in (10.16) now follows from (10.19), (10.20) and (10.21).

To show the second inequality in (10.16), let ε > 0. Then there exists N such that

sup_{x∈[a,b]} |fn(x)| ≤ sup_{x∈[a,b]} |f(x)| + ε,

for all n ≥ N, and |fn| ∈ D[a, b], so the second inequality in (10.6), the boundedness on D[a, b], shows that

∫_a^b |fn(x)|dx ≤ (b − a) sup_{x∈[a,b]} |fn(x)| ≤ (b − a)(sup |f| + ε)

for n ≥ N. The integral in the left side above converges to

∫_a^b |f(x)|dx,

while the right side does not depend on n, hence

∫_a^b |f(x)|dx ≤ (b − a)(sup |f| + ε). (10.22)

As ε > 0 was arbitrary, this shows that

∫_a^b |f(x)|dx ≤ (b − a) sup_{x∈[a,b]} |f(x)|,

as claimed.

Finally, positivity follows the same way as the bound we just observed: if f ≥ 0 and fn → f, then for ε > 0 there exists N such that for n ≥ N, fn > −ε, and thus

∫_a^b fn(x)dx ≥ −ε ∫_a^b 1 · dx = −(b − a)ε.

Note that 1 ∈ D, and

∫_a^b 1 · dx = b − a,


from the definition. As a consequence, we have

∫_a^b f(x)dx ≥ −(b − a)ε.

Since ε > 0 was arbitrary, we deduce that

∫_a^b f(x)dx ≥ 0,

finishing the proof. �

In summary, we have shown the existence part of the following theorem.

Theorem 10.11 Given a < b, there exists a unique linear map I_a^b : C[a, b] → R such that for all f ∈ D[a, b] we have

I_a^b(f) = ∫_a^b f(x)dx, (10.23)

and such that there is a constant K0 such that

|I_a^b f| ≤ K0‖f‖, f ∈ C([a, b]). (10.24)

Furthermore, this unique linear map satisfies properties (10.14)-(10.17).

One writes

∫_a^b f(x)dx = I_a^b(f).

Proof. As mentioned, we have already proved existence. To show uniqueness, assume that I_a^b is a linear map from C[a, b] to R that satisfies (10.24), and has the property that (10.23) holds for all f ∈ D[a, b]. Take any f ∈ C[a, b] and a sequence fn ∈ D[a, b] such that fn → f as n → ∞. Then, by linearity of I_a^b, we have

I_a^b(f) − I_a^b(fn) = I_a^b(f − fn).

The estimate (10.24) implies then that

|I_a^b(f) − I_a^b(fn)| ≤ K0‖f − fn‖ → 0,

as n → +∞. We conclude that

I_a^b f = lim_{n→∞} ∫_a^b fn,

thus I_a^b is uniquely determined by its values on D[a, b]. �

10.3 The fundamental theorem of calculus

We now prove the Newton-Leibniz theorem, also known as the fundamental theorem of calculus. For this purpose it is convenient to define

∫_a^a f = 0.

The first part of the theorem says that the integral gives an antiderivative.


Theorem 10.12 (The fundamental theorem of calculus, part I) Suppose f : [a, b] → R is continuous, and let

F(t) = ∫_a^t f(x)dx, t ∈ [a, b].

Then the function F is continuously differentiable on (a, b), and F′(x) = f(x) for all x ∈ (a, b). In addition, the right derivative of F at x = a and the left derivative of F at b exist and are F′r(a) = f(a) and F′l(b) = f(b).

Proof: It suffices to show that F is differentiable with F′ = f, since the continuity of F′ then follows from that of f. Let t ∈ [a, b) be fixed and consider h > 0, the case of t ∈ (a, b] and h < 0 being similar. Then, for 0 < h < b − t, we have, using the basic properties of the integral:

F(t + h) − F(t) = ∫_a^{t+h} f(x)dx − ∫_a^t f(x)dx = ∫_t^{t+h} f(x)dx
= ∫_t^{t+h} f(t)dx + ∫_t^{t+h} (f(x) − f(t))dx = hf(t) + ∫_t^{t+h} (f(x) − f(t))dx, (10.25)

so that

F(t + h) − F(t) − hf(t) = ∫_t^{t+h} (f(x) − f(t))dx. (10.26)

Exercise 10.13 Check carefully which properties of the integral we are using in each step in (10.25).

Using the boundedness property (10.16) of the integral, we deduce from (10.26) that

|F(t + h) − F(t) − hf(t)| ≤ h sup_{t≤x≤t+h} |f(x) − f(t)|.

By the continuity of f at t, given ε > 0 there exists δ > 0 such that |x−t| < δ implies |f(x)−f(t)| < ε.Thus, for 0 < h < min(δ, b− t) we have

|F (t+ h)− F (t)− hf(t)| ≤ εh.

Together with the analogous argument for h < 0, this is exactly the definition of differentiability at t, with F′(t) = f(t), proving the theorem. �
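A small numerical sanity check of Theorem 10.12 (an aside; the quadrature routine below simply integrates a piecewise affine interpolant, as in Section 10.2): the difference quotients of F approach f(t).

import math

def integral(f, a, b, k=20000):
    # integrate the piecewise affine interpolant of f on a uniform partition (cf. Section 10.2)
    h = (b - a) / k
    return sum((f(a + i * h) + f(a + (i + 1) * h)) / 2.0 * h for i in range(k))

f = math.cos
F = lambda t: integral(f, 0.0, t)        # F(t) = int_0^t f(x)dx, as in Theorem 10.12

t, h = 0.8, 1e-4
print((F(t + h) - F(t)) / h)             # difference quotient of F at t
print(f(t))                              # F'(t) = f(t); the two numbers are close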

In order to prove the second part of the fundamental theorem of calculus, we need an observation.

Proposition 10.14 If f ∈ C([a, b]) is differentiable on (a, b) and f ′(x) = 0 for all x ∈ (a, b) then fis constant: f(x) = f(a) for all x ∈ [a, b].

Proof. Let a ≤ x1 < x2 ≤ b, then by the mean value theorem on [x1, x2], there is c ∈ (x1, x2) such that

f′(c) = (f(x2) − f(x1))/(x2 − x1).

As, by assumption, we have f′(c) = 0, we conclude that f(x1) = f(x2). Thus, for all x ∈ [a, b], we have f(x) = f(a), so that f is a constant function. �

Theorem 10.15 (Fundamental theorem of calculus, part II) If f ∈ C1([a, b]) then

f(b) − f(a) = ∫_a^b f′(x)dx.


This is the part of the theorem that is actually used to evaluate integrals explicitly: to find the integral

∫_a^b g(x)dx,

for a given function g ∈ C([a, b]), one looks for f ∈ C1([a, b]) such that f′ = g, and then

∫_a^b g(x)dx = f(b) − f(a).

Note that part I of the fundamental theorem says that the indefinite integral of g is such a function, but does not give an explicit evaluation – you need a different method of finding the antiderivative for the explicit calculation. For instance, if g(t) = t^n, then you may check that f(t) = t^{n+1}/(n + 1) satisfies f′(t) = g(t) using the product rule for differentiation, and then apply the second part of the fundamental theorem of calculus to find the integral of g(x).

Proof. Let us set

F(t) = ∫_a^t f′(x)dx.

By the first part of the fundamental theorem of calculus proven above, F is continuously differentiable with F′(t) = f′(t) for all t ∈ [a, b]. Thus, the function g = f − F satisfies g ∈ C1([a, b]) and g′(t) = 0 for all t ∈ [a, b]. Proposition 10.14 implies that g(b) = g(a), so that

f(b)− F (b) = f(a)− F (a),

which is exactly

f(b) − f(a) = F(b) − F(a) = ∫_a^b f′(x)dx,

completing the proof. �

A comment on the integration by parts

Note that integration by parts is a simple consequence of the fundamental theorem of calculus and the product rule for differentiation:

(fg)′ = f′g + fg′.

Indeed, for f, g ∈ C1([a, b]), we have

f(b)g(b) − f(a)g(a) = (fg)(b) − (fg)(a) = ∫_a^b (f(x)g(x))′dx = ∫_a^b (f′(x)g(x) + f(x)g′(x))dx
= ∫_a^b f′(x)g(x)dx + ∫_a^b f(x)g′(x)dx,

and now a simple rearrangement gives the integration by parts formula:

∫_a^b f′(x)g(x)dx = fg|_a^b − ∫_a^b f(x)g′(x)dx, fg|_a^b = f(b)g(b) − f(a)g(a).

Consider, for example, the integral

∫_a^b x e^x dx.

If we know that (e^x)′ = e^x, but cannot guess the anti-derivative of x e^x, we can integrate it by parts as follows:

∫_a^b x e^x dx = ∫_a^b x (e^x)′dx = x e^x|_a^b − ∫_a^b 1 · e^x dx = be^b − ae^a − e^b + e^a.

Essentially, this discovers the fact that

x e^x = d/dx [(x − 1)e^x].

Or we can use the fact that (log x)′ = 1/x to compute, for 0 < a < b:

∫_a^b x log x dx = ∫_a^b (log x)(x²/2)′dx = (b²/2) log b − (a²/2) log a − ∫_a^b (1/x)·(x²/2) dx
= (b²/2) log b − (a²/2) log a − b²/4 + a²/4.
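Both formulas above are easy to confirm numerically; the sketch below (an aside, using a crude trapezoid approximation of the integrals) compares the left and right sides on the hypothetical interval [1, 2].

import math

def integral(f, a, b, k=20000):
    # crude trapezoid approximation of the integral, good enough for a sanity check
    h = (b - a) / k
    return sum((f(a + i * h) + f(a + (i + 1) * h)) / 2.0 * h for i in range(k))

a, b = 1.0, 2.0

lhs1 = integral(lambda x: x * math.exp(x), a, b)
rhs1 = b * math.exp(b) - a * math.exp(a) - math.exp(b) + math.exp(a)
print(lhs1, rhs1)                        # the x e^x example

lhs2 = integral(lambda x: x * math.log(x), a, b)
rhs2 = b**2 / 2 * math.log(b) - a**2 / 2 * math.log(a) - b**2 / 4 + a**2 / 4
print(lhs2, rhs2)                        # the x log x example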

It may also help to understand convergence of improper integrals. Recall that if a function f(x) is continuous on [a, +∞), then we set

∫_a^∞ f(x)dx = lim_{R→∞} ∫_a^R f(x)dx, (10.27)

provided that the limit in the right side exists. Let us look at

∫_1^∞ (sin x/x^p) dx.

Note that if p > 1, then we have, for 1 ≤ R1 < R2,

|∫_{R1}^{R2} (sin x/x^p) dx| ≤ ∫_{R1}^{R2} (|sin x|/x^p) dx ≤ ∫_{R1}^{R2} (1/x^p) dx = (1/(p − 1))(1/R1^{p−1} − 1/R2^{p−1}) ≤ (1/(p − 1)) · 1/R1^{p−1}, (10.28)

thus the limit

lim_{R→∞} ∫_1^R (sin x/x^p) dx

exists, and the integral

∫_1^∞ (sin x/x^p) dx

is well-defined.

Exercise 10.16 Make the above argument precise: use (10.28) to show that if p > 1 then for any sequence Rn → +∞ the limit

lim_{n→+∞} ∫_1^{Rn} (sin x/x^p) dx

exists and does not depend on the choice of the sequence Rn → +∞. Hint: show that the sequence

In = ∫_1^{Rn} (sin x/x^p) dx

is Cauchy.

The question we would like to answer is whether the integral converges for p ∈ (0, 1). Note that the integral has to diverge for p ≤ 0 since the integrand does not tend to zero as x → +∞.


Exercise 10.17 Let the function f be continuously differentiable on [1, +∞), and assume that the integral

∫_1^∞ f(x)dx

exists, that is, the limit in the right side of (10.27) exists with a = 1. Assume, in addition, that

there exists M > 0 so that |f′(x)| ≤ M for all x ≥ 1. (10.29)

(i) Show that then

lim_{x→+∞} f(x) = 0.

(ii) Show that this conclusion may be false without assumption (10.29).

Let us then consider the case 0 < p < 1. We have to use the cancellation coming from the sign changes of sin x, because of the following.

Exercise 10.18 Show that the integral

∫_1^∞ (|sin x|/x^p) dx

does not converge for p ∈ (0, 1).

Consider next

I(R) = ∫_1^R (sin x/x^p) dx,

with the idea to let R → +∞ eventually. Note that sin x = (− cos x)′ and (1/x^p)′ = −p/x^{p+1}, hence we can integrate by parts as follows:

I(R) = ∫_1^R (sin x/x^p) dx = ∫_1^R ((− cos x)′/x^p) dx = −cos R/R^p + cos 1 − ∫_1^R (1/x^p)′(− cos x)dx
= −cos R/R^p + cos 1 − p ∫_1^R (cos x/x^{p+1}) dx. (10.30)

As in Exercise 10.16, the integral

∫_1^∞ (cos x/x^m) dx = lim_{R→+∞} ∫_1^R (cos x/x^m) dx

exists for all m > 1, thus the limit

lim_{R→+∞} ∫_1^R (cos x/x^{p+1}) dx

exists for all p > 0 and equals

∫_1^∞ (cos x/x^{p+1}) dx.

Going back to (10.30), noting that cos R/R^p → 0 as R → +∞, and passing to the limit R → +∞, we conclude that the integral

∫_1^∞ (sin x/x^p) dx = cos 1 − p ∫_1^∞ (cos x/x^{p+1}) dx

also exists. Note how integration by parts allowed us to increase the power of x in the denominator, and reduce a problem that involves the integral of a function that decays slowly, so the integral is not absolutely convergent (this terminology is similar to the absolute convergence of a series), to an integrand that decays faster by one power of x, so that the integral converges absolutely.


11 The contraction mapping principle

The contraction mapping principle is the most basic tool that can be used to prove existence ofsolutions in many situations in analysis. Many problems can be formulated in the form

f(x) = y0, (11.1)

where y0 is an element of some metric space X, f is a mapping from X to X, and x is the unknownthat we need to find. We can reformulate it as

f(x) + x− y0 = x.

The advantage of the latter formulation is that now we have what is known as a fixed point problem.These are equations of the form

F (x) = x, (11.2)

where F is a mapping from a metric space X to itself, and x is an unknown point x ∈ X. A solutionof (11.2) is known as a fixed point of the mapping F . In other words, (11.1) is equivalent to (11.2)with

F (x) = f(x) + x− y0. (11.3)

Of course, there is a serious difference: the notion of a fixed point as a solution to (11.2) requires only that F maps X to itself, and the notion of a solution of (11.1) also requires only that f maps X to itself and y0 ∈ X. However, to pass from (11.1) to (11.2) we need to introduce the mapping F in (11.3), which requires an additional structure on X: we need X to be a vector space to be able to do that. However, often X is a vector space, so that issue is not a problem.

An important class of mappings of a metric space X onto itself are contractions. We say that a mapping f : X → X is a contraction if there exists a number q ∈ (0, 1) so that for any x1, x2 ∈ X we have

d(f(x1), f(x2)) ≤ qd(x1, x2). (11.4)

We also need the following definition: a metric space X is complete if any Cauchy sequence in Xconverges. The spaces Rn are complete. Before we proceed further with the contraction mappingtheorem, let us show that the space C[0, 1] with the norm

‖f‖ = sup_{0≤x≤1} |f(x)|

is a complete metric space.

Theorem 11.1 The metric space C[0, 1] is complete.

Proof. Let fn be a Cauchy sequence in C[0, 1]. This means that for any ε > 0 there exists N sothat for any n,m ≥ N we have

‖fn − fm‖ < ε.

In other words, we have

sup_{0≤x≤1} |fn(x) − fm(x)| < ε,

so that

|fn(x) − fm(x)| < ε,

for all n, m ≥ N and all x ∈ [0, 1]. This means that the sequence fn is uniformly Cauchy on [0, 1]. Recall that Theorem 8.16 implies that then fn is uniformly convergent, and by Theorem 8.17 the limit is a continuous function. Thus, the space C[0, 1] is a complete metric space. �

The contraction mapping principle says the following.


Theorem 11.2 Let X be a complete metric space, and f : X → X be a contraction, then f has a unique fixed point a in X. Moreover, for any x0 ∈ X, the sequence defined recursively by xk+1 = f(xk), with x1 = f(x0), converges to a as k → +∞. The rate of convergence can be estimated by

d(xn, a) ≤ (q^n/(1 − q)) d(x1, x0). (11.5)

Note that the theorem provides an algorithm to compute the unique fixed point, and that the rate of convergence in (11.5) depends on how close q is to 0 or 1: it gets faster for q close to 0 and slower for q close to 1.

Proof. We will show that the sequence xk is a Cauchy sequence. Note that

d(xn+1, xn) = d(f(xn), f(xn−1)) ≤ qd(xn, xn−1), (11.6)

so that an induction argument shows that

d(xn+1, xn) ≤ qnd(x1, x0). (11.7)

Now, by the triangle inequality and (11.7), we have

d(xn+k, xn) ≤ d(xn+k, xn+k−1) + d(xn+k−1, xn+k−2) + · · · + d(xn+1, xn)
≤ (q^{n+k−1} + q^{n+k−2} + · · · + q^n) d(x1, x0) ≤ (q^n/(1 − q)) d(x1, x0). (11.8)

It follows that if 0 < q < 1, then the sequence xn is Cauchy. Since the space X is complete, the limit of xn exists, and we set

a = lim_{n→∞} xn.

Now, as f is a continuous map, passing to the limit n→∞ in the recursion relation xn = f(xn−1),we arrive at a = f(a), hence a is a fixed point of f .

The reason the fixed point is unique is that f is a contraction. Indeed, if a1 and a2 are two fixedpoints, so that a1 = f(a1) and a2 = f(a2), then by the contraction property we have

d(f(a1), f(a2)) ≤ qd(a1, a2).

However, as both a1 and a2 are fixed points of f, the left side above equals d(a1, a2). Since q ∈ (0, 1), we deduce that d(a1, a2) = 0 and a1 = a2. Finally, the error bound (11.5) follows from (11.8) by letting k → +∞. �
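To see Theorem 11.2 at work numerically, here is a short Python sketch for the hypothetical example f(x) = cos x on X = [0, 1], which is a contraction with constant q = sin(1) < 1 (by the mean value theorem, and cos maps [0, 1] into [cos 1, 1] ⊂ [0, 1]):

import math

# Fixed-point iteration for f(x) = cos x on X = [0, 1]; by the mean value
# theorem f is a contraction there with constant q = sin(1) < 1.
f = math.cos
q = math.sin(1.0)

x_prev, x = 0.0, f(0.0)                 # x_0 = 0 and x_1 = f(x_0)
d10 = abs(x - x_prev)                   # d(x_1, x_0)

for n in range(1, 31):
    if n % 5 == 0:
        bound = q**n / (1.0 - q) * d10  # the a priori estimate (11.5) for d(x_n, a)
        print(n, x, bound)
    x = f(x)

# The iterates approach the unique fixed point a = 0.7390851...,
# and the true error |x_n - a| stays below the printed bound.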

Existence theorem for ordinary differential equations

Let us consider an ordinary differential equation (ODE) for an unknown function y(x)

y′ = f(x, y) (11.9)

supplemented by the initial condition

y(x0) = y0, (11.10)

with some given x0 ∈ R and y0 ∈ R. We assume that the function f(x, y) is continuous in (x, y) andLipschitz in y: there exists a constant M > 0 so that

|f(x, y1)− f(x, y2)| ≤M |y1 − y2| for all x ∈ R and y1, y2 ∈ R. (11.11)

Theorem 11.3 Under the above assumptions on f(x, y), there exists an interval (x0 − δ0, x0 + δ0),so that the problem (11.9)-(11.10) has a unique solution y(x) on the interval (x0 − δ0, x0 + δ0).


Proof. We can write (11.9)-(11.10) together, using the fundamental theorem of calculus, as

y(x) = y0 + ∫_{x0}^x f(t, y(t))dt. (11.12)

Let us define the map A that maps a function y(x) to a function A[y] via

(A[y])(x) = y0 + ∫_{x0}^x f(t, y(t))dt. (11.13)

Then (11.12) can be written as

y(x) = A[y](x), (11.14)

so that y(x) is a solution to (11.12), or, equivalently, to (11.9)-(11.10) if and only if the function yis a fixed point of the mapping A. The first step is to show that A maps the space C[x0 − δ, x0 + δ]to itself.

Lemma 11.4 If y ∈ C[x0 − δ, x0 + δ] for some δ > 0, then A[y] is also in C[x0 − δ, x0 + δ].

Proof of Lemma. Let y(x) be a continuous function on the closed interval [x0 − δ, x0 + δ] for some given δ > 0. Then for any x1, x2 ∈ [x0 − δ, x0 + δ], we have

A[y](x1) − A[y](x2) = y0 + ∫_{x0}^{x1} f(t, y(t))dt − y0 − ∫_{x0}^{x2} f(t, y(t))dt = ∫_{x2}^{x1} f(t, y(t))dt. (11.15)

The function y is continuous on [x0 − δ, x0 + δ], hence it is bounded on that interval: there exists K such that |y(t)| ≤ K for all x0 − δ ≤ t ≤ x0 + δ. As the function f is continuous on [x0 − δ, x0 + δ] × R, there exists M1 > 0 so that |f(x, y)| ≤ M1 for all x ∈ [x0 − δ, x0 + δ] and y ∈ [−K, K]. It follows that |f(t, y(t))| ≤ M1 for all t ∈ [x0 − δ, x0 + δ]. Using this in (11.15) gives

|A[y](x1) − A[y](x2)| ≤ |∫_{x2}^{x1} |f(t, y(t))|dt| ≤ M1|x1 − x2|. (11.16)

It follows that the function A[y] is continuous. �

We return to the proof of the theorem. Our goal is to show that if δ is sufficiently small, then A

is a contraction on C[x0 − δ, x0 + δ]. To this end, let us take two functions y1, y2 ∈ C[x0 − δ, x0 + δ]and write

A[y1](x) − A[y2](x) = y0 + ∫_{x0}^x f(t, y1(t))dt − y0 − ∫_{x0}^x f(t, y2(t))dt = ∫_{x0}^x [f(t, y1(t)) − f(t, y2(t))]dt. (11.17)

We will now use the Lipschitz property (11.11) of the function f(x, y):

|A[y1](x) − A[y2](x)| ≤ ∫_{x0}^x |f(t, y1(t)) − f(t, y2(t))|dt ≤ ∫_{x0}^x M|y1(t) − y2(t)|dt. (11.18)

Note that for all t ∈ [x0, x] we have

|y1(t) − y2(t)| ≤ sup_{x0−δ≤x≤x0+δ} |y1(x) − y2(x)| = ‖y1 − y2‖.

Using this in (11.18), we arrive at

|A[y1](x) − A[y2](x)| ≤ ∫_{x0}^x M|y1(t) − y2(t)|dt ≤ ∫_{x0}^x M‖y1 − y2‖dt = M|x − x0| ‖y1 − y2‖ ≤ Mδ‖y1 − y2‖. (11.19)


Taking supremum over all x ∈ [x0 − δ, x0 + δ] gives

‖A[y1] − A[y2]‖ = sup_{x∈[x0−δ,x0+δ]} |A[y1](x) − A[y2](x)| ≤ Mδ‖y1 − y2‖. (11.20)

Therefore, if Mδ < 1 then A is a contraction on C[x0 − δ, x0 + δ]. It follows that A has a unique fixed point y in C[x0 − δ, x0 + δ]. This means that the function y(x) satisfies (11.12):

y(x) = y0 + ∫_{x0}^x f(t, y(t))dt. (11.21)

It follows immediately that y(x0) = y0. Moreover, as the function y(t) is continuous, and f(t, y) iscontinuous in both variables, it follows that p(t) = f(t, y(t)) is continuous in t. The fundamentaltheorem of calculus implies then that y(x) is differentiable and

y′(x) = f(x, y(x)). (11.22)

This finishes the proof. �

Let us now combine the existence theorem for ODEs with the construction of the fixed point of

a contraction mapping in the proof of the existence theorem of a fixed point. Consider an ODE

y′(t) = y(t), y(0) = 1,

and write it, as in (11.12), in the form

y(t) = 1 + ∫_0^t y(s)ds.

The mapping A is now defined via

A[y](t) = 1 + ∫_0^t y(s)ds.

Consider the recursive sequence yn(t), with y0(t) = 1, and

yn+1(t) = 1 + ∫_0^t yn(s)ds.

Exercise 11.5 Show by induction that

yn(t) = Σ_{k=0}^n t^k/k! .

We see that the unique solution is y(t) = e^t.
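The Picard iterates above are easy to generate by machine as well; the sketch below stores yn as a list of polynomial coefficients and applies the map A repeatedly (the helper picard_step is a name introduced here only for illustration).

import math

# Picard iteration for y' = y, y(0) = 1, on polynomial coefficients:
# if y_n(t) = sum_k c_k t^k, then (A[y_n])(t) = 1 + sum_k c_k t^{k+1}/(k+1).
def picard_step(c):
    return [1.0] + [ck / (k + 1) for k, ck in enumerate(c)]

c = [1.0]                                # y_0(t) = 1
for _ in range(6):
    c = picard_step(c)                   # coefficients of y_1, y_2, ..., y_6

print(c)                                 # 1, 1, 1/2!, 1/3!, ..., 1/6!
evaluate = lambda c, t: sum(ck * t**k for k, ck in enumerate(c))
print(evaluate(c, 1.0), math.e)          # y_6(1) is already close to e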

12 The implicit function theorem

The implicit function theorem addresses the question of when an equation of the form f(x, y) = 0 uniquely defines x in terms of y, or y in terms of x. The former question means that given y, we are trying to ”solve the equation f(x, y) = 0 for an unknown x”. The latter asks when an equation of the form f(x, y) = 0 uniquely defines a function y(x) – hence the name, the implicit function theorem. The two questions are totally equivalent but the points of view are slightly different.


Let us consider a very simple example: the equation

x2 − y = 0. (12.1)

Then, for y > 0 this equation has two solutions x = ±√y, for y = 0 it has one solution x = 0, and for y < 0 it has no real solutions. Let us change our perspective somewhat. Suppose we know a particular solution (x0, y0) – that is, x0² = y0 – and we ask: given a y close to y0, can we find a unique x close to x0 so that x² = y? That is, if we perturb y0 slightly, can we still find a unique solution of (12.1) close to the original solution x0? Note that we have y0 ≥ 0 automatically, simply because x0² = y0. The answer to our question is that if y0 > 0 and, say, x0 > 0, then, indeed, for y close to y0 we still have a solution to (12.1) that is close to x0: x = √y. Similarly, if we have x0 < 0, then we still have a solution of (12.1) that is close to x0: x = −√y. On the other hand, if x0 = 0, so that y0 = 0, then in any interval y ∈ (−δ, δ) around y0 = 0 and any interval (−δ′, δ′) around x0 = 0 we can find y < 0 for which the equation x² = y has no solutions and y > 0 for which x² = y has two solutions in the interval (−δ′, δ′) – we just need to take 0 < y < (δ′)². Thus, there is a difference between x0 = 0, y0 = 0 and other points on the graph of y = x² – we can invert the relationship around the latter but not the former. The implicit function theorem generalizes this trivial observation.

12.1 The inverse function theorem on R

We begin with the inverse function theorem, which looks not at an ”implicit” equation f(x, y) = 0 but at the simpler problem of solving an equation of the form f(x) = y. In one dimension the situation is quite simple.

Proposition 12.1 Let f(x) be continuously differentiable on an interval [a, b], and set

m = inf_{a≤x≤b} f(x), M = sup_{a≤x≤b} f(x).

(i) Then f is a one-to-one map from [a, b] to [m, M] if and only if f is strictly monotonic on [a, b]. (ii) In addition, if f is strictly monotonic on [a, b] then the inverse function g = f−1 : [m, M] → [a, b] is continuously differentiable at y0 ∈ [m, M] if and only if f′(g(y0)) ≠ 0. In that case, g′(y0) = 1/f′(g(y0)).

Exercise 12.2 Prove the first statement (i) in the above proposition.

To prove (ii), assume first that x0 ∈ (a, b) and f′(x0) ≠ 0. Without loss of generality, we may assume that f′(x0) > 0. As the function f′ is continuous at x0, there exists δ > 0 so that

f ′(x) > f ′(x0)/2 for all x ∈ (x0 − δ, x0 + δ), (12.2)

thus f is strictly increasing on that interval. Let us set α = f(x0 − δ), β = f(x0 + δ) and takesome y ∈ (α, β), so that y = f(x) for some x ∈ (x0−δ, x0+δ), that is, x = g(y) – recall that g = f−1.Our goal is to show that the function g(y) is differentiable at y0 and g′(y0) = 1/f ′(g(y0)). As f isdifferentiable on (x0 − δ, x0 + δ), there exists c between x and x0 so that we have

f(x)− f(x0) = f ′(c)(x− x0), (12.3)

and, because of (12.2), we know that f ′(c) > f ′(x0)/2 > 0. Take now some y ∈ (f(x0−δ), f(x0+δ)),and use (12.3) with x = g(y) ∈ (x0 − δ, x0 + δ). This gives

y − y0 = f ′(c)(g(y)− g(y0)), (12.4)


so that

(g(y) − g(y0))/(y − y0) = 1/f′(c), (12.5)

with c between x = g(y) and x0 = g(y0). We now let y → y0 in (12.5). As c lies between g(y)and g(y0), and the function g(y) is continuous, we know that c→ g(y0) as y → y0. Hence, the limitof the right side of (12.5) exists, and thus so does the limit of the left side, and

lim_{y→y0} (g(y) − g(y0))/(y − y0) = 1/f′(x0), (12.6)

which means exactly that g′(y0) = 1/f ′(g(y0)), as claimed. �

12.2 The inverse function theorem for maps Rn → Rn: an outline

Before we proceed with the inverse function theorem for maps from Rn to Rn, that is, n × n systems of equations

F1(x1, . . . , xn) = y1,
. . . . . .
Fn(x1, . . . , xn) = yn, (12.7)

let us first consider the special case when F : Rn → Rn is an affine map, that is, there exist an n × n matrix A and a vector q ∈ Rn so that

F (x) = Ax+ q. (12.8)

In that case, the equation F (x) = y takes the form

Ax+ q = y, (12.9)

and has an explicit solution

x = A−1(y − q),

provided that the matrix A is invertible. A general differentiable map F can be approximated in a neighborhood of a point x0 as

F (x) ≈ F (x0) + [DF (x0)](x− x0), (12.10)

in the sense that

F(x) − (F(x0) + [DF(x0)](x − x0)) = o(‖x − x0‖). (12.11)

Here, DF (x0) is the derivative matrix of F at x0. Now, for y close to y0 = F (x0), let us replace theexact equation

F (x) = y (12.12)

by an approximate equation

F(x0) + [DF(x0)](x − x0) = y. (12.13)

Warning: at the moment, we do not really know that solutions of (12.12) and (12.13) are close, weare simply trying to understand what should be important for (12.12) to have a solution x closeto x0. The question you may want to keep in the back of your mind is why the solution to anapproximate equation (12.13) is close to a solution of the true equation (12.12) – this should become


clear from the proof of Theorem 12.7. Note that the approximate equation (12.13) has the familiarform (12.9), and its solution is explicit:

x = x0 + [DF (x0)]−1(y − y0). (12.14)

Recall that y0 = F(x0). For (12.14) to make sense, we must know that the matrix DF(x0) is invertible. This is a generalization of the condition f′(f−1(y0)) ≠ 0 in Proposition 12.1. A natural guess then is that a map F : Rn → Rn is invertible in a neighborhood of a point y0 = F(x0) if the derivative matrix DF(x0) is an invertible matrix. This will be our goal.
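To make the heuristic (12.13)-(12.14) concrete, here is a small Python sketch (with a hypothetical two-dimensional map F, not from the text) that solves F(x) = y for y close to y0 by repeatedly applying the linearized update x ↦ x + [DF(x0)]−1(y − F(x)); this is only an illustration of the idea, not the proof of Theorem 12.7.

import numpy as np

# A hypothetical map F : R^2 -> R^2 with an invertible derivative at x0.
F = lambda x: np.array([x[0] + 0.2 * np.sin(x[1]), x[1] + 0.2 * x[0] ** 2])
DF = lambda x: np.array([[1.0, 0.2 * np.cos(x[1])],
                         [0.4 * x[0], 1.0]])

x0 = np.array([0.3, -0.4])
y0 = F(x0)
y = y0 + np.array([0.05, -0.03])         # a target value close to y0

A_inv = np.linalg.inv(DF(x0))            # the frozen derivative, as in (12.14)
x = x0
for _ in range(30):
    x = x + A_inv @ (y - F(x))           # repeatedly solve the linearized equation (12.13)

print(F(x) - y)                          # the residual is essentially zero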

12.3 Some preliminaries

We will need the following auxiliary lemmas for the proof of the inverse function theorem.

Lemma 12.3 Let A be an n × n matrix, U ⊂ Rn be an open set, and a map f : U → Rn becontinuously differentiable on U . Set g(x) = Af(x), then g : U → Rn is a continuously differentiablemap with

Dg(x) = ADf(x). (12.15)

Proof. Recall that the entries of the matrix Dg are

[Dg(x)]ij = ∂gi(x)/∂xj ,

and

gi(x) = Σ_{k=1}^n Aik fk(x).

It follows that

∂gi(x)/∂xj = Σ_{k=1}^n Aik ∂fk(x)/∂xj = Σ_{k=1}^n Aik [Df(x)]kj = (ADf(x))ij ,

thus Dg(x) = ADf(x), and we are done. �

Lemma 12.4 Let A be an n × n matrix with entries such that |Aij| ≤ ε for all 1 ≤ i, j ≤ n, then for any vector v ∈ Rn we have ‖Av‖ ≤ nε‖v‖.

Proof. First, we recall the inequality

(x1 + · · · + xn)² ≤ n(x1² + · · · + xn²). (12.16)

To see that (12.16) holds, write

(x1 + · · ·+ xn)2 = x21 + · · ·+ x2n + 2∑

1≤i<j≤nxixj ≤ x21 + · · ·+ x2n +

∑1≤i<j≤n

(x2i + x2j )

= x21 + · · ·+ x2n + (n− 1)(x21 + · · ·+ x2n) = n(x21 + · · ·+ x2n).

Alternatively, one can use the Cauchy-Schwarz inequality: observe that

x1 + x2 + · · · + xn = (1, 1, . . . , 1) · (x1, x2, . . . , xn),

thus

|x1 + x2 + · · · + xn| ≤ ‖(1, 1, . . . , 1)‖ ‖x‖ = √n ‖x‖,

which, after squaring, is (12.16). Note that for each 1 ≤ i ≤ n we have

|(Av)i| = |Σ_{j=1}^n Aij vj| ≤ Σ_{j=1}^n |Aij| |vj| ≤ ε Σ_{j=1}^n |vj|,

so that, using (12.16), we get

|(Av)i|² ≤ ε² (Σ_{j=1}^n |vj|)² ≤ nε² Σ_{j=1}^n |vj|² = nε²‖v‖².

Next, summing over i, we get

‖Av‖² = Σ_{i=1}^n |(Av)i|² ≤ n²ε²‖v‖²,

and the claim of the lemma follows. □

Lemma 12.5 Let A be an n × n invertible matrix. Then there exist αA > 0 and βA > 0 so that

βA‖x‖ ≤ ‖Ax‖ ≤ αA‖x‖, for all x ∈ Rn. (12.17)

Proof. The function G(x) = ‖Ax‖ is continuous on Rn, and the unit sphere S = {x ∈ Rn : ‖x‖ = 1} is a compact subset of Rn. Hence, G attains its maximum αA and minimum βA on S. Moreover, as the matrix A is invertible, G(x) ≠ 0 for all x ∈ S, thus αA > 0 and βA > 0. Now, each x ∈ Rn can be written as x = ry, with ‖y‖ = 1 and r = ‖x‖, so that y ∈ S and r ≥ 0. Then, we have

‖Ax‖ = ‖A(ry)‖ = ‖rA(y)‖ = r‖A(y)‖ = rG(y) = ‖x‖G(y),

thus ‖Ax‖ ≤ αA‖x‖ and ‖Ax‖ ≥ βA‖x‖, finishing the proof. □
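The constants αA and βA are simply the maximum and minimum of ‖Ax‖ over the unit sphere, so they can be estimated numerically by sampling. The sketch below is only an illustration (it assumes numpy; the 3 × 3 matrix and the sample size are arbitrary choices); the cross-check at the end uses the standard linear-algebra fact that the extrema of ‖Ax‖ on the unit sphere are the largest and smallest singular values of A, a fact not needed anywhere in these notes.

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 3.0]])        # an invertible 3x3 matrix

# Estimate alpha_A = max ||Ax|| and beta_A = min ||Ax|| over the unit sphere by sampling.
xs = rng.normal(size=(10000, 3))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)   # project the samples onto the unit sphere
norms = np.linalg.norm(xs @ A.T, axis=1)
print(norms.min(), norms.max())        # rough estimates of beta_A and alpha_A

# Cross-check: the exact extrema are the smallest and largest singular values of A.
print(np.linalg.svd(A, compute_uv=False)[[-1, 0]])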

Corollary 12.6 Let A be an n × n invertible matrix; then the map F(x) = Ax maps any open set U ⊂ Rn to an open set.

Proof. Lemma 12.5 applied to the matrix A−1 implies that there exists α > 0 so that

‖A−1w‖ ≤ α‖w‖ for all w ∈ Rn. (12.18)

Let now U be any open set and V = F[U] be the image of U under F. Take any y0 ∈ F[U], so that y0 = Ax0, with x0 ∈ U. As the set U is open, it contains a ball B(x0, r) with some r > 0. Consider any z ∈ B(y0, ρ), with ρ < r/α. Then z = F(x), with x = A−1(z), and we have, using (12.18):

‖x− x0‖ = ‖A−1(z)−A−1(y0)‖ = ‖A−1(z − y0)‖ ≤ α‖z − y0‖ ≤ αρ < r, (12.19)

so that x ∈ B(x0, r), and thus x ∈ U. It follows that z ∈ F[U], thus the ball B(y0, ρ) is contained in V and the set V is open. □


12.4 The inverse function theorem for maps Rn → Rn

We will now prove the inverse function theorem for maps from Rn to Rn.

Theorem 12.7 Let U ⊂ Rn be an open set, and x0 ∈ U. Let f : U → Rn be a continuously differentiable map, and set y0 = f(x0). Suppose that the derivative matrix Df(x0) is invertible. Then there exist an open set V ⊂ U such that x0 ∈ V, and an open set W ⊂ Rn such that y0 ∈ W, so that f is a one-to-one map from V onto W. Moreover, the inverse map g = f−1 : W → V is also continuously differentiable and for y ∈ W we have Dg(y) = [Df(g(y))]−1.

Proof. Step 1. Reduction to the case Df(x0) = I. We first note that it suffices to prove the theorem under the additional assumption that

Df(x0) = In, (12.20)

the n × n identity matrix In. Indeed, if Df(x0) ≠ In, we consider the map

f̃(x) = [Df(x0)]−1f(x).

Then the derivative matrix of f̃ at the point x0 is In:

Df̃(x0) = [Df(x0)]−1Df(x0) = In,

as follows from Lemma 12.3. Since the matrix Df(x0) is invertible, the function f is one-to-one from a neighborhood V of x0 to a neighborhood W of y0 = f(x0) if and only if the function f̃ is a one-to-one map from V to W̃ = [Df(x0)]−1W, and W̃ is a neighborhood of the point ỹ0 = f̃(x0). This is a consequence of Corollary 12.6. Again, as the matrix Df(x0) is invertible, and by Lemma 12.3 we have

Df̃(x) = [Df(x0)]−1Df(x),

the function f is continuously differentiable if and only if f̃ is continuously differentiable. Hence, we may assume without any loss of generality that f satisfies (12.20), and this is what we will do for the rest of the proof.

As the map f(x) is continuously differentiable in U, and f satisfies (12.20), for any ε > 0 there exists r > 0 so that

|∂fi(x)/∂xj − δij| < ε    (12.21)

for all x ∈ B(x0, r). Here, δij is the Kronecker delta: δij = 1 if i = j and δij = 0 if i ≠ j. In other words, for each x ∈ B(x0, r), we have

Df(x) = I + E(x), (12.22)

with the matrix E(x) such that

|Eij(x)| ≤ ε for all 1 ≤ i, j ≤ n. (12.23)

It follows that for every v ∈ Rn and all x ∈ B(x0, r) we have, by the triangle inequality

‖Df(x)v‖ = ‖(I + E(x))v‖ = ‖v + E(x)v‖ ≥ ‖v‖ − ‖E(x)v‖.

Now, Lemma 12.4, with A = E(x), implies that for every v ∈ Rn and all x ∈ B(x0, r) we have

‖Df(x)v‖ ≥ ‖v‖ − εn‖v‖ ≥ (1/2)‖v‖,

as long as ε < 1/(2n). In particular, it follows that the kernel of the matrix Df(x) is {0}, and the matrix Df(x) is invertible for all x ∈ B(x0, r).

Step 2. Reformulation as a fixed point problem for a contraction mapping. Now, we turn to solving

f(x) = y (12.24)

for y close to y0 = f(x0), with the assumption Df(x0) = In. Note that, because of (12.20), the approximate linear equation (12.13) in this case takes the particularly simple form

f(x0) + x− x0 = y, (12.25)

with y0 = f(x0), that, naturally, has a unique solution

x = x0 + y − y0.

We assume that y ∈ B(y0, ρ) and will later see how small ρ needs to be. In order to turn (12.24) into a contraction mapping question, let us reformulate (12.24) as a fixed point problem by setting

F (x) = x− f(x) + y, (12.26)

again, with y ∈ B(y0, ρ) fixed. The map F depends on y – we will drop it in the notation but you should keep this in mind. Note that x satisfies (12.24) if and only if it satisfies

F (x) = x. (12.27)

This is a fixed point problem and we will address it using the contraction mapping principle. The following lemma is a key part of the proof.

Lemma 12.8 There exist ρ > 0 and r1 > 0 so that for each y ∈ B(y0, ρ) the map F maps the closed ball B(x0, r1) to itself and is a contraction on B(x0, r1).

As for each y ∈ B(y0, ρ) the map F is a contraction, the contraction mapping principle will then imply that for each y ∈ B(y0, ρ) there exists a unique x ∈ B(x0, r1) such that F(x) = x, which means that there is a unique solution to (12.24) in B(x0, r1) for each y ∈ B(y0, ρ). In other words, we may define the inverse map f−1 that maps B(y0, ρ) to B(x0, r1).
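Before proving the lemma, note that this construction is completely explicit: starting from x0 and iterating x ↦ F(x) = x − f(x) + y drives x to the fixed point, that is, to f−1(y). The sketch below (an aside, assuming numpy; the map f, which is a small perturbation of the identity so that Df is close to In near x0 = 0, the target y, and the number of iterations are arbitrary illustrative choices) shows the iteration in action.

import numpy as np

# A map f : R^2 -> R^2 which is a small perturbation of the identity, so that
# F(x) = x - f(x) + y is a contraction near x0 = 0 (cf. Lemma 12.8).
def f(x):
    return x + 0.1 * np.array([np.sin(x[0] + x[1]), x[0] * x[1]])

x0 = np.zeros(2)
y0 = f(x0)                               # here y0 = (0, 0)
y = y0 + np.array([0.05, -0.03])         # a target value close to y0

x = x0.copy()
for _ in range(50):                      # fixed-point iteration x <- F(x) = x - f(x) + y
    x = x - f(x) + y

print(x, np.linalg.norm(f(x) - y))       # x approximates f^{-1}(y); the residual is tiny

Each iteration contracts the distance to the fixed point by roughly the factor q from (12.28), so the residual decreases geometrically.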

Proof of Lemma 12.8. The proof requires us to verify two conditions: first, that F maps the closed ball B(x0, r1) to itself, and, second, that there exists q ∈ (0, 1) so that for any z, w ∈ B(x0, r1) we have

‖F (z)− F (w)‖ ≤ q‖z − w‖. (12.28)

We will start with (12.28). Note that, as Df(x0) = In, we have

DF (x0) = 0. (12.29)

To verify that F satisfies (12.28), let us take z, w ∈ B(x0, r), with r determined by the condition that (12.21) holds in B(x0, r), and define

g(t) = F (z + t(w − z)),

for 0 ≤ t ≤ 1, so that F(z) = g(0) and F(w) = g(1). The fundamental theorem of calculus implies that

Fk(w) − Fk(z) = ∫_0^1 g′k(t) dt,  1 ≤ k ≤ n.    (12.30)


We compute the derivative g′k(t) using the chain rule:

dgk(t)/dt = Σ_{j=1}^n ∂Fk(z + t(w − z))/∂xj · (wj − zj).    (12.31)

The definition (12.26) of F (x) implies that

∂Fk(x)/∂xj = δkj − ∂fk(x)/∂xj.

Recalling (12.22)-(12.23), we see that the derivative matrix DF(x) = −E(x), and thus the entries of the matrix DF(x) satisfy

|∂Fi(x)/∂xj| ≤ ε for all 1 ≤ i, j ≤ n.    (12.32)

This is why we modified f(x) to make sure that Df(x0) = I at the beginning of the proof. Now, using (12.32) in (12.31) gives

|dgk(t)/dt| ≤ Σ_{j=1}^n ε|wj − zj| ≤ nε‖w − z‖,    (12.33)

for all 1 ≤ k ≤ n. Using this in (12.30) gives

|Fk(w) − Fk(z)| ≤ ∫_0^1 |g′k(t)| dt ≤ ∫_0^1 nε‖w − z‖ dt = nε‖w − z‖,    (12.34)

for all 1 ≤ k ≤ n. It follows from (12.34) that

‖F(w) − F(z)‖ ≤ (Σ_{k=1}^n |Fk(w) − Fk(z)|²)^{1/2} ≤ (n · n²ε²‖w − z‖²)^{1/2} = εn^{3/2}‖w − z‖.    (12.35)

Therefore, F satisfies (12.28) with q = 1/4 if we take ε = 1/(4n^{3/2}) and then take r1 > 0 so small that (12.21) holds with this ε in B(x0, r1). Thus, we have established that

‖F(w) − F(z)‖ ≤ (1/4)‖w − z‖, for all z, w ∈ B(x0, r1),    (12.36)

if r1 is sufficiently small. Note that even though the map F depends on y, this estimate holds for all y ∈ B(y0, ρ): this will be important in Step 3 below.

Next, we need to verify that F maps B(x0, r1) to itself if ρ is sufficiently small and y ∈ B(y0, ρ). Note that

F (x0) = x0 − f(x0) + y = x0 − y0 + y, (12.37)

hence we have

‖F(x0) − x0‖ = ‖y − y0‖ ≤ ρ < r1/2,    (12.38)

provided we take ρ < r1/2, thus, in particular, F(x0) ∈ B(x0, r1). Next, we take x ∈ B(x0, r1) and write, using the triangle inequality

‖F (x)− x0‖ = ‖F (x)− F (x0) + F (x0)− x0‖ ≤ ‖F (x)− F (x0)‖+ ‖F (x0)− x0‖. (12.39)

The first term on the right can be estimated using (12.36), and the second using (12.38), to give

‖F(x) − x0‖ ≤ ‖F(x) − F(x0)‖ + ‖F(x0) − x0‖ ≤ (1/4)‖x − x0‖ + r1/2 ≤ r1,    (12.40)

finishing the proof of Lemma 12.8. □


Exercise 12.9 Use the above argument to show that there exists r0 so that for any r < r0 the image of the ball B(x0, r) under f is an open set: it contains a ball B(y0, ρ) centered around y0 = f(x0), with ρ > 0 that depends on r.

Step 3: continuity of the inverse map. We now show that the inverse map g = f−1 is a continuous map from B(y0, ρ) to B(x0, r1), provided that ρ and r1 are sufficiently small. Recall that for a given y ∈ B(y0, ρ), the point g(y) ∈ B(x0, r1) is the unique fixed point in B(x0, r1) of the map

F (x) = x− f(x) + y,

that depends on y, and that is how g(y) depends on y. To show that g(y) is continuous, we need to show that the fixed point of F depends continuously on y. Let us take y, y′ ∈ B(y0, ρ) and consider the corresponding maps

F (x) = x− f(x) + y, F ′(x) = x− f(x) + y′.

Then, for any z, w ∈ B(x0, r1), we have

F(z) − F′(w) = F(z) − F(w) + F(w) − F′(w) = F(z) − F(w) + w − f(w) + y − (w − f(w) + y′)
= F(z) − F(w) + y − y′,    (12.41)

hence

‖F(z) − F′(w)‖ ≤ ‖F(z) − F(w)‖ + ‖y − y′‖.    (12.42)

Using (12.36), we get

‖F(z) − F′(w)‖ ≤ (1/4)‖z − w‖ + ‖y − y′‖.    (12.43)

Let us now take z = g(y) and w = g(y′), so that F(z) = z and F′(w) = w. It follows from (12.43) that

‖z − w‖ ≤ (1/4)‖z − w‖ + ‖y − y′‖,    (12.44)

thus

‖z − w‖ ≤ (4/3)‖y − y′‖.    (12.45)

In other words, we have shown that the map g satisfies

‖g(y) − g(y′)‖ ≤ (4/3)‖y − y′‖, for all y, y′ ∈ B(y0, ρ).    (12.46)

It follows that g is a continuous map on B(y0, ρ).

Step 4: computing the derivative of the inverse map. Now that we have shown that the inverse map g = f−1 : B(y0, ρ) → B(x0, r1) is well defined and continuous, provided that ρ and r1 are sufficiently small, it remains to show that g is continuously differentiable and for y ∈ B(y0, ρ) we have

Dg(y) = [Df(g(y))]−1. (12.47)

Note that it actually suffices to prove differentiability and relation (12.47) only for y = y0 = f(x0), since x0 was an arbitrary point at which Df(x0) is an invertible matrix.

In this step of the proof we will no longer assume that Df(x0) is the n × n identity matrix In, and will denote A = Df(x0). For a given ε > 0 we can find δ > 0 such that δ < r0, with r0 as in Exercise 12.9, and if ‖x − x0‖ < δ, then

‖f(x)− f(x0)−A(x− x0)‖ < ε‖x− x0‖. (12.48)


The result of Exercise 12.9 shows that the image of the ball B(x0, δ) under f is an open set, hence it contains a ball B(y0, ρ). Let us take any y ∈ B(y0, ρ). Then, we have y = f(x) with some x ∈ B(x0, δ), thus we may use (12.48) with y = f(x) and y0 = f(x0), and x = g(y), x0 = g(y0), to get

‖y − y0 −A(g(y)− g(y0))‖ < ε‖g(y)− g(y0)‖. (12.49)

Let us write

y − y0 − A(g(y) − g(y0)) = A[A−1(y − y0) − (g(y) − g(y0))],    (12.50)

and recall that by Lemma 12.5 there exists β > 0 so that

‖Aw‖ ≥ β‖w‖ for all w ∈ Rn. (12.51)

Using this in (12.50) gives

‖y − y0 − A(g(y) − g(y0))‖ = ‖A[A−1(y − y0) − (g(y) − g(y0))]‖ ≥ β‖A−1(y − y0) − (g(y) − g(y0))‖
= β‖g(y) − g(y0) − A−1(y − y0)‖.    (12.52)

Inserting this into (12.49) gives

‖g(y) − g(y0) − A−1(y − y0)‖ ≤ (ε/β)‖g(y) − g(y0)‖,    (12.53)

for all y ∈ B(y0, ρ). This is almost what we need to say that Dg(y0) = A−1, except that in the right side we have ‖g(y) − g(y0)‖ rather than the ‖y − y0‖ that we need. However, we can bootstrap: (12.53) implies that

‖g(y) − g(y0)‖ − ‖A−1(y − y0)‖ ≤ (ε/β)‖g(y) − g(y0)‖,    (12.54)

so that

(1 − ε/β)‖g(y) − g(y0)‖ ≤ ‖A−1(y − y0)‖.    (12.55)

Lemma 12.5 tells us that there exists α > 0 so that

‖A−1w‖ ≤ α‖w‖ for all w ∈ Rn. (12.56)

Using (12.56) in (12.55) gives us

(1 − ε/β)‖g(y) − g(y0)‖ ≤ α‖y − y0‖.    (12.57)

Now, (12.53) and (12.57) together imply that

‖g(y) − g(y0) − A−1(y − y0)‖ ≤ (1 − ε/β)^{-1} (α/β) ε ‖y − y0‖,    (12.58)

for all y ∈ B(y0, ρ), provided that ε < β, which we may assume since ε > 0 can be taken as small as we wish.

It follows that g is differentiable at y0 and Dg(y0) = A−1.

The final step, continuity of Dg(y), is surprisingly easy: we know that Dg(y) = [Df(g(y))]−1, and the matrix Df(x) is invertible at x = x0. It follows that det Df(x) ≠ 0 in a ball around x0. Then, the explicit formula for the inverse of a matrix in terms of its minors implies that the inverse matrix [Df(x)]−1 is a continuous function of x, hence its composition [Df(g(y))]−1 with the continuous map g(y) is also continuous. This completes the proof of the inverse function theorem. □

Let us note that, once we know that the inverse map g = f−1 is differentiable, its derivative matrix can be computed as follows. We start with the identity

f(g(y)) = y

and write it out in components as

fk(g1(y), g2(y), . . . , gn(y)) = yk, 1 ≤ k ≤ n.

Differentiating with respect to yj and using the chain rule gives

Σ_{i=1}^n ∂fk(g(y))/∂xi · ∂gi(y)/∂yj = δkj, 1 ≤ k, j ≤ n,

which in matrix form reads Df(g(y)) Dg(y) = In, so that Dg(y) = [Df(g(y))]−1.
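As a numerical sanity check of Dg(y) = [Df(g(y))]−1 (again an aside, assuming numpy; the toy map f below, a small perturbation of the identity, is an arbitrary choice), one can compute g by the fixed-point iteration from Step 2, approximate Dg(y0) by centered finite differences, and compare with [Df(x0)]−1.

import numpy as np

def f(x):
    return x + 0.1 * np.array([np.sin(x[0] + x[1]), x[0] * x[1]])

def Df(x):
    c = 0.1 * np.cos(x[0] + x[1])
    return np.array([[1.0 + c, c],
                     [0.1 * x[1], 1.0 + 0.1 * x[0]]])

def g(y, iters=200):
    """Approximate f^{-1}(y) near x0 = 0 by the fixed-point iteration of Step 2."""
    x = np.zeros(2)
    for _ in range(iters):
        x = x - f(x) + y
    return x

x0 = np.zeros(2)
y0 = f(x0)
h = 1e-6

# Columns of Dg(y0), approximated by centered finite differences in each coordinate of y.
Dg_fd = np.column_stack([(g(y0 + h * e) - g(y0 - h * e)) / (2 * h) for e in np.eye(2)])

print(Dg_fd)
print(np.linalg.inv(Df(x0)))             # the two matrices agree to many digits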


12.5 The implicit function theorem

The implicit function theorem starts with a system of n equations for n unknowns x = (x1, . . . , xn), parametrized by y ∈ Rm:

G1(x1, . . . , xn, y) = 0,
. . . . . .
Gn(x1, . . . , xn, y) = 0.    (12.59)

Let us assume that z = (z1, . . . , zn) is a solution of this system for some y0 ∈ Rm:

G(z, y0) = 0, (12.60)

or, in the system form,

G1(z1, . . . , zn, y0) = 0,
. . . . . .
Gn(z1, . . . , zn, y0) = 0.    (12.61)

The question is whether, if we take y close to y0, we can find a solution x of (12.59) that is close to z. The system (12.7), addressed by the inverse function theorem, is a special case of this problem, with y ∈ Rn and G(x, y) = F(x) − y. There is a nice way to understand the general case via an application of the inverse function theorem. One obstacle to applying this theorem is that G(x, y) maps Rn+m to Rn and not to Rn+m, and in the inverse function theorem we need the domain of the map and the image of the map to have the same dimension. To fix this, consider the map

G̃(x, y) = (G(x, y), y),    (12.62)

that maps Rn+m to Rn+m. This simply means that we re-write (12.59) by adding to it m equations of the form yk = yk, and the system (12.59) is equivalent to

G̃(x, y) = (0, y).    (12.63)

We know from (12.60) that the point (z, y0) satisfies

G̃(z, y0) = (0, y0).    (12.64)

In addition, the derivative matrix DG̃(x, y) has the block form

DG̃(x, y) = ( DxG(x, y)  DyG(x, y)
              0          Im×m ).    (12.65)

Thus, the matrix DG̃(x, y) is invertible if and only if the matrix DxG(x, y) is invertible. Let us assume that the matrix DxG(z, y0) is invertible, so that DG̃(z, y0) is invertible as well, and take y close to y0. Now, the inverse function theorem, applied to the map G̃, implies that for all y close to y0 the system (12.63) has a unique solution x close to z – this is the implicit function theorem. Let us formulate it precisely.

Theorem 12.10 Let a map G : Rn+m → Rn satisfy G(x0, y0) = 0 for some x0 ∈ Rn and y0 ∈ Rm. Assume that G is continuously differentiable in a neighborhood of (x0, y0) and the n × n matrix DxG(x0, y0) is invertible. Then there exist r > 0 and ρ > 0 so that for every y ∈ B(y0, ρ) there exists a unique x ∈ B(x0, r) such that G(x, y) = 0.
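A numerical illustration of Theorem 12.10 (an aside, assuming numpy; the particular G, the starting point, and the use of Newton's method in the x variable, rather than the fixed-point construction used in the proofs above, are all illustrative choices): for y near y0 we solve G(x, y) = 0 in x starting from x0, and observe that the solution varies continuously with y. The invertibility of DxG is exactly what makes each Newton step well defined.

import numpy as np

# G : R^2 x R -> R^2 with G(x, y) = (x1^2 + x2^2 - 1 - y, x1 - x2).
def G(x, y):
    return np.array([x[0]**2 + x[1]**2 - 1.0 - y, x[0] - x[1]])

def DxG(x, y):
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0, -1.0]])

x0 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # G(x0, 0) = 0 and DxG(x0, 0) is invertible
y0 = 0.0

def solve_x(y, x_start=x0, iters=20):
    """Newton iteration in x with y frozen: x <- x - [DxG(x,y)]^{-1} G(x,y)."""
    x = x_start.copy()
    for _ in range(iters):
        x = x - np.linalg.solve(DxG(x, y), G(x, y))
    return x

for y in [0.0, 0.05, -0.05]:               # values of y close to y0 = 0
    x = solve_x(y)
    print(y, x, np.linalg.norm(G(x, y)))   # x depends continuously on y; residuals are ~0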
