
MAT 473 — DIFFERENTIATION

JOHN QUIGG

Contents

1. Introduction
2. Linear maps
3. Derivatives
4. Partial derivatives
5. Mean value
6. Inverse functions
7. Implicit functions
8. Higher-order derivatives
Index

Date: October 4, 2004.


1. Introduction

These lecture notes will develop "intermediate" mathematical analysis in n-dimensional Euclidean space. What does this mean? Well, first of all, the material will contain all of what's typically covered in "multi-variable advanced calculus", which is the rigorous treatment of calculus in R^n. The main topics are differentiation and integration of functions of n variables. The highlights are the Implicit Function Theorem, giving general conditions under which a system of nonlinear equations can be solved differentiably for some variables in terms of the others, convergence theorems for Lebesgue integrals, and the Change of Variables Theorem, giving a general process for transforming multiple integrals. Yes, that's right: we'll do Lebesgue rather than Riemann integration. On the real line, it makes sense to start with Riemann integration, not only because it's a lot easier to develop than Lebesgue integration, but also because it's historically important to know about the basic theory of the Riemann integral on an interval [a, b]. However, in higher dimensions the Riemann integral no longer has any advantage — it's a pain in the neck to develop, and its properties are no match for those of the Lebesgue integral. In the real world, everyone uses Lebesgue integration, so that's what you should learn.

The style will be conversational, rather than formal. These lecture notes are written for you (the student), rather than for the instructor or other professional mathematicians. What does this mean? I emphasize the word style here — the mathematics will be correct, and the proofs complete. But I've tried to write the stuff in "plain language", the way I would expect you to write it; there won't be any definition or theorem I wouldn't expect you to be able to state for me if I ask for it.

I assume you’ve had a preparatory course covering basic analysis inmetric spaces, as well as analysis on the real line — for example, seemy Lecture Notes for MAT 472, the prequel to this course. However,I won’t assume any specific knowledge of multivariable calculus. Youshould also be comfortable with linear algebra.

Advice for the student: These lecture notes are meant to be read carefully! You'll have to read each bit several times, with pen and paper to write down what your brain is telling (asking?) you. There are lots of exercises. A small number will be formally assigned as homework to be turned in, but all of them are important; my advice is to try them all, but at least you should be familiar with what they all say. At any point in the lecture notes, anything that's said in any exercise which comes before can be used. Also, for the solution of any exercise,


anything that appears in any other exercise which comes before can be used.

There’s a fairly complete index — use it! Also, get a lot of practicewriting down from memory the statements of definitions, results, andexamples. And you should know the results by their names.


2. Linear maps

Before we can study derivatives of functions of several variables, we first need to do some analysis of linear maps¹.

Notation and Terminology 2.1. Some notation for linear maps:

(i) L(R^n, R^m) denotes the set of linear maps from R^n to R^m.
(ii) L(R^n) = L(R^n, R^n).
(iii) For a linear map A : R^n → R^m we'll typically omit parentheses, writing Ax for A(x).

Every A ∈ L(R^n, R^m) is uniquely represented by an m × n matrix [A] = [a_{ij}] relative to the standard bases of R^n and R^m, so that A(x₁, . . . , x_n)_i = ∑_{j=1}^n a_{ij} x_j. The map A ↦ [A] is an isomorphism of L(R^n, R^m) onto the vector space R^{m×n} of m × n matrices. If we identify R^{m×n} with R^{mn} in one of the obvious (but it doesn't really matter which) ways, the Euclidean norm on R^{mn} gives a norm on the matrices, hence a norm

‖A‖₂ := (∑_{i,j} a_{ij}²)^{1/2}

on L(R^n, R^m).

Lemma 2.2. If A ∈ L(R^n, R^m) then ‖Ax‖ ≤ ‖A‖₂ for all ‖x‖ = 1.

Proof.

‖Ax‖² = ∑_{i=1}^m (∑_{j=1}^n a_{ij} x_j)² ≤ ∑_{i=1}^m (∑_{j=1}^n a_{ij}²)(∑_{j=1}^n x_j²)   (Cauchy–Schwarz Inequality)

= ∑_{i,j} a_{ij}² = ‖A‖₂²,

since ‖x‖ = 1. □

Actually, for most purposes we want a different norm on linear maps:

Definition 2.3. The operator norm of A ∈ L(R^n, R^m) is ‖A‖ := sup{‖Ax‖ | ‖x‖ = 1}.

By the above lemma, this sup exists and ‖A‖ ≤ ‖A‖₂. There's another handy way to characterize ‖A‖:

¹ Note that we're using "map" as a synonym for "function".


Lemma 2.4. If A ∈ L(R^n, R^m) then ‖A‖ = min{c ∈ R | ‖Ax‖ ≤ c‖x‖ for all x ∈ R^n}.

Proof. Put S = {c ∈ R | ‖Ax‖ ≤ c‖x‖ for all x ∈ R^n}, and note that it's enough to check the inequality for x ≠ 0. If x ≠ 0 then

‖Ax‖ = ‖x‖ ‖A(x/‖x‖)‖ ≤ ‖x‖‖A‖,

so ‖A‖ ∈ S. On the other hand, if c ∈ S then ‖Ax‖ ≤ c for all ‖x‖ = 1, so ‖A‖ ≤ c. □

Observation 2.5. Every A ∈ L(R^n, R^m) is uniformly continuous, because²

‖Ax − Ay‖ = ‖A(x − y)‖ ≤ ‖A‖‖x − y‖.

We use the above lemma to verify that we really do have a norm on linear maps:

Lemma 2.6. ‖ · ‖ is a norm on L(R^n, R^m).

Proof. ‖A‖ ≥ 0 since ‖Ax‖ ≥ 0 for all x ∈ R^n, and if ‖A‖ = 0 then Ax = 0 for every unit vector x, hence A = 0 by linearity. If c ∈ R and ‖x‖ = 1 then ‖cAx‖ = |c|‖Ax‖; taking the supremum over x gives ‖cA‖ = |c|‖A‖. Finally, if ‖x‖ = 1 then

‖(A + B)x‖ = ‖Ax + Bx‖ ≤ ‖Ax‖ + ‖Bx‖ ≤ ‖A‖ + ‖B‖,

so ‖A + B‖ ≤ ‖A‖ + ‖B‖. □

Lemma 2.7. ‖AB‖ ≤ ‖A‖‖B‖ for all A ∈ L(R^n, R^m), B ∈ L(R^s, R^n).

Proof. ‖ABx‖ ≤ ‖A‖‖Bx‖ ≤ ‖A‖‖B‖‖x‖, so ‖AB‖ ≤ ‖A‖‖B‖. □

Example 2.8. The norm of the identity operator I on R^n is 1.

Exercise 2.9. Fix x ∈ R^n, and define A ∈ L(R, R^n) and B ∈ L(R^n, R) by At = tx and By = x · y. Prove that

‖A‖ = ‖B‖ = ‖x‖.

We saw above that ‖A‖ is dominated by the Euclidean norm ‖A‖₂, and it'll be important for us that we have something similar in the opposite direction:

Proposition 2.10. ‖A‖₂ ≤ √(nm) ‖A‖ for all A ∈ L(R^n, R^m).

² The inequality in fact shows that A is Lipschitz continuous.


Proof. If {e_j}_{j=1}^n and {u_i}_{i=1}^m are the standard bases for R^n and R^m, then

|a_{ij}| = |(Ae_j) · u_i| ≤ ‖Ae_j‖‖u_i‖ ≤ ‖A‖ for all i, j,

so

‖A‖₂² = ∑_{i,j} a_{ij}² ≤ nm‖A‖². □

Consequently, any of the natural isomorphisms of L(R^n, R^m) onto R^{nm} is a homeomorphism³.

Corollary 2.11. det : L(R^n) → R is continuous.

Proof. Identifying the n × n matrices with R^{n²}, det is a polynomial, hence continuous. Composing with the homeomorphism of L(R^n) onto R^{n²}, det is continuous on L(R^n). □

Notation and Terminology 2.12. GL(n) denotes the set of invertible elements of L(R^n).

Corollary 2.13. GL(n) is open in L(R^n), and the map A ↦ A⁻¹ on GL(n) is a homeomorphism.

Proof. First, GL(n) = det⁻¹(R \ {0}), the pre-image of an open set under a continuous function, hence is open. For the other part, by the adjoint formula for inverting matrices, each entry of the inverse matrix [A]⁻¹ is a rational, hence continuous, function of the entries of [A]. Therefore, inversion is continuous on GL(n). □

Since the identity I of L(R^n) is invertible and GL(n) is open, there exists ε > 0 such that B_ε(I) ⊂ GL(n). How big can we take ε? Since ‖I‖ = 1 and the linear map 0 is noninvertible, certainly ε ≤ 1. It will be important for us to know that in fact we can take ε = 1:

Proposition 2.14. For all A ∈ L(R^n), if ‖A − I‖ < 1 then A is invertible.

Proof. It suffices to show that A is 1-1, and for this it suffices to show that ker A = {0}. If x ≠ 0 then

‖Ax‖ = ‖Ix + Ax − Ix‖ ≥ ‖x‖ − ‖(A − I)x‖ ≥ ‖x‖ − ‖A − I‖‖x‖ > ‖x‖ − ‖x‖ = 0. □
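A quick numerical sanity check of Proposition 2.14, on a hypothetical 2 × 2 matrix: since the operator norm is dominated by ‖ · ‖₂ (Lemma 2.2 and Definition 2.3), ‖A − I‖₂ < 1 already forces ‖A − I‖ < 1, hence a nonzero determinant.

```python
import math

# Hypothetical matrix close to the identity.
A = [[1.0, 0.3],
     [-0.2, 1.1]]

def euclidean_norm(M):
    # ||M||_2, which dominates the operator norm ||M||
    return math.sqrt(sum(a * a for row in M for a in row))

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# D = A - I
D = [[A[0][0] - 1.0, A[0][1]],
     [A[1][0], A[1][1] - 1.0]]

# ||A - I|| <= ||A - I||_2 < 1, so Proposition 2.14 predicts det(A) != 0
print(euclidean_norm(D))  # ≈ 0.374, safely below 1
print(det2(A))            # ≈ 1.16, nonzero as predicted
```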

Exercise 2.15. Let m < n, and let 1 ≤ j₁ < j₂ < · · · < j_m ≤ n.

³ A homeomorphism is a continuous function between metric spaces which has a continuous inverse.


Prove that the set

{A ∈ R^{m×n} | A_{j₁}, . . . , A_{j_m} are linearly independent}

is open in R^{m×n}, where A_j denotes the jth column of A.


3. Derivatives

Definition 3.1. Let E ⊂ R^n, f : E → R^m, and a ∈ E°. The derivative of f at a is a linear map f′(a) : R^n → R^m such that

lim_{x→0} (f(a + x) − f(a) − f′(a)x)/‖x‖ = 0.

Most students are initially confused by this definition. Why does it look so different from the derivative in single-variable calculus? Well, the 1-variable case is special because real numbers can be divided. You can't divide vectors in higher dimensions; in particular you can't divide f(a + x) − f(a) by x when x ∈ R^n for n > 1. Anyway, you'll recall that one of the most important uses of the derivative for a 1-variable function is to approximate changes in the function by changes along the tangent line. What's really going on here is that you're approximating f(a + x) − f(a) by f′(a)x, and this latter is a linear⁴ function of x.
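The approximation point of view can be tested numerically. A minimal sketch, on the hypothetical function f(x, y) = x²y at a = (1, 2), with candidate derivative the linear map whose row matrix is [4 1] (the two partials, computed by hand): the quotient in Definition 3.1 should tend to 0.

```python
import math

# Hypothetical check of Definition 3.1 for f(x, y) = x^2 * y at a = (1, 2).
def f(x, y):
    return x * x * y

a = (1.0, 2.0)
A = (4.0, 1.0)  # row matrix of the candidate derivative f'(a)

def remainder_quotient(h):
    # ||f(a + h) - f(a) - f'(a)h|| / ||h||
    num = abs(f(a[0] + h[0], a[1] + h[1]) - f(a[0], a[1])
              - (A[0] * h[0] + A[1] * h[1]))
    return num / math.hypot(h[0], h[1])

for t in (1e-1, 1e-3, 1e-5):
    print(remainder_quotient((t, -t)))  # tends to 0 with t
```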

Definition 3.2. (i) The above linear map f′(a) is unique if it exists, in which case f is differentiable at a.
(ii) f is differentiable if it is differentiable at each element of its domain.

Thus, by definition a differentiable function must have open domain.

Examples 3.3. (i) If A ∈ L(R^n, R^m) then A′(x) = A for all x ∈ R^n.
(ii) If U ⊂ R^n is open and f : U → R^m is constant then f′(x) = 0 for all x ∈ U.

Lemma 3.4. f = (f₁, . . . , f_m) : E → R^m is differentiable at a if and only if each component function f_i is, in which case f′(a) = (f′₁(a), . . . , f′_m(a)).

Proof. If A = (A₁, . . . , A_m) ∈ L(R^n, R^m) then

(f(a + x) − f(a) − Ax)/‖x‖ = ((f₁(a + x) − f₁(a) − A₁x)/‖x‖, . . . , (f_m(a + x) − f_m(a) − A_mx)/‖x‖),

and the result follows since limits of vectors can be taken coordinatewise. □

⁴ In the technical sense, not the "freshman calculus" sense that its graph is a straight line!


Lemma 3.5. f is differentiable at a, with f′(a) = A, if and only if there exists a function q defined on some neighborhood⁵ U of 0 such that lim_{x→0} q(x) = 0 and

f(a + x) = f(a) + Ax + q(x)‖x‖ for all x ∈ U.

Proof. For each x ∈ R^n such that a + x ∈ dom f, define

q(x) = (f(a + x) − f(a) − Ax)/‖x‖ if x ≠ 0, and q(0) = 0.

The result follows from the definition of derivative. □

Proposition 3.6. If f is differentiable at a, then f is continuous at a.

Proof. With q as in Lemma 3.5,

f(a + x) = f(a) + f′(a)x + q(x)‖x‖ → f(a) as x → 0,

since f′(a) is continuous. □

Proposition 3.7 (Arithmetic of derivatives). If f and g are both differentiable at a, then:

(i) (f + g)′(a) = f′(a) + g′(a);
(ii) (fg)′(a)x = (f′(a)x)g(a) + f(a)(g′(a)x) if f or g is real-valued;
(iii) (cf)′(a) = cf′(a) if c ∈ R;
(iv) (f · g)′(a)x = (f′(a)x) · g(a) + f(a) · (g′(a)x) if both f and g are R^m-valued.

Proof. Use Lemma 3.5 to write

f(a + x) = f(a) + f′(a)x + q(x)‖x‖
g(a + x) = g(a) + g′(a)x + r(x)‖x‖

with q(x), r(x) → 0 as x → 0.

(i) We have

(f + g)(a + x) = f(a) + f′(a)x + q(x)‖x‖ + g(a) + g′(a)x + r(x)‖x‖ = (f + g)(a) + (f′(a) + g′(a))x + (q(x) + r(x))‖x‖,

and q(x) + r(x) → 0, so the result follows from Lemma 3.5.

⁵ A neighborhood of a point is a set containing the point in its interior.


(ii) Without loss of generality f is real-valued. Then

(fg)(a + x) = (f(a) + f′(a)x + q(x)‖x‖)(g(a) + g′(a)x + r(x)‖x‖)
= f(a)g(a) + (f′(a)x)g(a) + f(a)(g′(a)x) + ((f′(a)x)(g′(a)x)/‖x‖)‖x‖ + (f(a) + f′(a)x)r(x)‖x‖ + q(x)‖x‖ g(a + x).

We have

‖(f′(a)x)(g′(a)x)‖/‖x‖ ≤ ‖f′(a)‖‖g′(a)‖‖x‖ → 0.

Since f(a) + f′(a)x → f(a) and r(x) → 0,

(f(a) + f′(a)x)r(x) → 0.

Since q(x) → 0 and g(a + x) → g(a) by continuity,

q(x)g(a + x) → 0.

The result now follows from Lemma 3.5.

(iii) Immediate from (ii), since the derivative of a constant function is 0.

(iv) Similar to (ii). □

Proposition 3.8 (Chain Rule). If f is differentiable at a and g is differentiable at f(a), then g ◦ f is differentiable at a and

(g ◦ f)′(a) = g′(f(a))f′(a).

Proof. Use Lemma 3.5 to write

f(a + x) = f(a) + f′(a)x + q(x)‖x‖
g(f(a) + y) = g(f(a)) + g′(f(a))y + r(y)‖y‖

with lim_{x→0} q(x) = 0 and lim_{y→0} r(y) = 0. Putting y = f(a + x) − f(a), we have

g ◦ f(a + x) = g(f(a)) + g′(f(a))f′(a)x + g′(f(a))q(x)‖x‖ + r(y) · (‖f′(a)x + q(x)‖x‖‖ / ‖x‖) · ‖x‖.

Since q(x) → 0, so does g′(f(a))q(x). We have

‖f′(a)x + q(x)‖x‖‖ / ‖x‖ ≤ ‖f′(a)‖ + ‖q(x)‖ → ‖f′(a)‖ as x → 0,


and r(y) → 0 as x → 0 because f is continuous at a. Thus

r(y) · (‖f′(a)x + q(x)‖x‖‖ / ‖x‖) → 0 as x → 0,

so we are done by Lemma 3.5. □

Exercise 3.9. Let f : R^n × R^m → R^k be bilinear, that is, for each fixed y ∈ R^m the function x ↦ f(x, y) is linear, and similarly y ↦ f(x, y) is linear for each x.

(a) Prove that there exists M ∈ R such that

‖f(x, y)‖ ≤ M‖x‖‖y‖ for all (x, y) ∈ R^n × R^m.

Hint: write x = ∑_i x_i e_i, where {e_i} is the standard basis for R^n, and use the inequality |x_i| ≤ ‖x‖, and similarly for y.

(b) Prove that f is differentiable at each (a, b) ∈ R^n × R^m, with

f′(a, b)(x, y) = f(a, y) + f(x, b) for all (x, y) ∈ R^n × R^m.

The derivative formula for bilinear maps in the preceding exercise can be regarded as a very general version of the product rule. In fact, it's the "mother of all product rules". For example:

Exercise 3.10. Use the derivative formula for bilinear maps and the Chain Rule to deduce Proposition 3.7 (ii) and (iv).


4. Partial derivatives

Definition 4.1. Let E ⊂ R^n, f : E → R, a ∈ E°, and j = 1, . . . , n. The partial derivative of f at a with respect to the jth variable is

D_j f(a) = lim_{t→0} (f(a + te_j) − f(a))/t,

provided this limit exists, where e₁, . . . , e_n denotes the standard basis for R^n.

We have

D_j f(a) = (d/dt) f(a + te_j) |_{t=0} = (d/dx_j) f(a₁, . . . , x_j, . . . , a_n) |_{x_j = a_j},

which we can interpret as the derivative at a_j of the one-variable function we get from f by holding all the other variables {x_k | k ≠ j} constant.

Notation and Terminology 4.2. In simple situations it's frequently convenient to use notation like

(∂/∂x_j) f(x₁, . . . , x_n) = D_j f(x₁, . . . , x_n) or ∂f/∂x_j = D_j f,

especially when we don't want to make up a name for the function and/or numbers for the variables.

Example 4.3. (∂/∂x) x²y = 2xy.

If f = (f₁, . . . , f_m) : R^n → R^m, we can consider the partial derivatives D_j f_i of the component functions f₁, . . . , f_m.

Proposition 4.4. If f is differentiable at a, then D_j f_i(a) exists for all i, j, and f′(a) is represented by the matrix⁶ [D_j f_i(a)].

Proof. By the Chain Rule,

D_j f_i(a) = (d/dt) f_i(a + te_j) |_{t=0} = f′_i(a + te_j)e_j |_{t=0} = f′_i(a)e_j,

the ij-entry of the matrix representing f′(a). □

Thus, if f is differentiable then all its partial derivatives D_j f_i exist. But not conversely. In fact:

⁶ Some people call the following matrix the Jacobian of f at a, while others call it the Jacobian matrix, and its determinant the Jacobian. Due to this confusion in the terminology, we'll just eschew it altogether.


Exercise 4.5. Define f : R² → R by

f(x, y) = xy/(x² + y²) if (x, y) ≠ (0, 0), and f(0, 0) = 0.

Show that at (0, 0), both partial derivatives of f exist, but f is not even continuous.
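A numerical illustration (not a substitute for the proof the exercise asks for): this f vanishes on both coordinate axes, so the difference quotients defining the partials at the origin are identically 0, yet along the diagonal y = x the function is constantly 1/2.

```python
# Illustration of Exercise 4.5: partials at the origin exist, continuity fails.
def f(x, y):
    return x * y / (x * x + y * y) if (x, y) != (0.0, 0.0) else 0.0

for t in (1e-1, 1e-4, 1e-8):
    print(f(t, t))   # 0.5 every time, no matter how small t is

# difference quotients for D1 f(0,0) and D2 f(0,0): both identically 0
print(f(1e-8, 0.0) / 1e-8, f(0.0, 1e-8) / 1e-8)
```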

Even if f is continuous, existence of the partial derivatives doesn't guarantee differentiability, as you'll show in the following exercise. But first, note that if all the partial derivatives D_j f_i(a) exist, then f′(a) exists if and only if (f(a + x) − f(a) − Ax)/‖x‖ → 0 as x → 0, where A is the linear map represented by the matrix of partials.

Exercise 4.6. Change the above f so f(x, y) = x²y/(x² + y²) when (x, y) ≠ (0, 0). Show that at (0, 0), f is continuous and both partial derivatives exist, but f is not differentiable at (0, 0).

There’s another way to detect nondifferentiability in the above ex-ercise, involving a new gadget we introduce in the following definition.The partial derivatives measure the rate of change in the directionsof the standard basis vectors e1, . . . , en. Here’s how to do it in anydirection:

Definition 4.7. Let E ⊂ R^n, f : E → R, and a ∈ E°, and let u be a unit vector⁷ in R^n. The directional derivative of f at a in the direction u is

D_u f(a) := lim_{t→0} (f(a + tu) − f(a))/t.

Thus

D_u f(a) = (d/dt) f(a + tu) |_{t=0}.

By the Chain Rule, if f is differentiable at a, then every directional derivative exists, with D_u f(a) = f′(a)u. In particular, if f′(a) = 0, then D_u f(a) = 0 for every unit vector u. In the above exercise, at (0, 0) both partial derivatives are 0, but the directional derivative in the direction u = (1/√2)(1, 1) is

lim_{t→0} f(tu)/t = lim_{t→0} (1/t) · (t²/2)(t/√2)/t² = 1/(2√2).

But all the directional derivatives could have been 0 without implying differentiability:

⁷ A unit vector is a vector with norm 1.


Exercise 4.8. Let

f(x, y) = x²y√(x² + y²)/(x⁴ + y²) if (x, y) ≠ (0, 0), and f(0, 0) = 0.

(a) Prove that all the directional derivatives of f at (0, 0) are 0.
(b) Prove that f is nondifferentiable at (0, 0). Hint: try going to (0, 0) along a parabola.

In Lemma 5.6 we'll fix the situation by showing that if the partial derivatives are continuous then f is differentiable.

Examples 4.9. (i) If E ⊂ R is an open interval and g = (g₁, . . . , g_n) : E → R^n is differentiable, then for each t ∈ E the derivative

g′(t) = lim_{x→0} (g(t + x) − g(t))/x

is represented by the column matrix with ith entry g′_i(t). Identify n × 1 matrices with elements of R^n. Then g′(t) = (g′₁(t), . . . , g′_n(t)), and this is called the tangent vector to the curve g at t.

(ii) If U ⊂ R^n is open and f : U → R is differentiable, then for each a ∈ U the derivative f′(a) is represented by the row matrix with jth entry D_j f(a). Associate to this the gradient vector ∇f(a) := (D₁f(a), . . . , D_n f(a)). Then

f′(a)x = ∇f(a) · x = ∑_{j=1}^n D_j f(a) x_j.

(iii) Now combine (i)–(ii): if t ∈ E then

(f ◦ g)′(t) = f′(g(t))g′(t) = ∇f(g(t)) · g′(t) = ∑_{j=1}^n D_j f(g(t)) g′_j(t).

Thus, for example, if f is differentiable at a and u is a unit vector, then the directional derivative of f at a in the direction u is

D_u f(a) = ∇f(a) · u.

At the point a, the maximum directional derivative is ‖∇f(a)‖, in the direction ∇f(a). Also, ∇f(a) is orthogonal to the level hypersurface S := {x ∈ R^n | f(x) = f(a)} of f through a, since if g : E → S is any differentiable curve in S with g(t) = a then f ◦ g is constant, hence

0 = (f ◦ g)′(t) = ∇f(a) · g′(t).
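The identity D_u f(a) = ∇f(a) · u can be checked numerically. A sketch on the hypothetical function f(x, y) = sin x + xy² at a = (1, 2), whose gradient ∇f(a) = (cos 1 + 4, 4) is computed by hand:

```python
import math

# Hypothetical check of D_u f(a) = ∇f(a) · u (Examples 4.9 (iii)).
def f(x, y):
    return math.sin(x) + x * y * y

a = (1.0, 2.0)
grad = (math.cos(1.0) + 4.0, 4.0)   # ∇f(a), computed by hand

u = (1 / math.sqrt(2), 1 / math.sqrt(2))  # a unit vector

h = 1e-6
numeric = (f(a[0] + h * u[0], a[1] + h * u[1]) -
           f(a[0] - h * u[0], a[1] - h * u[1])) / (2 * h)
predicted = grad[0] * u[0] + grad[1] * u[1]
print(numeric, predicted)  # agree to several digits

# no directional derivative can exceed ||∇f(a)|| (Cauchy-Schwarz)
print(math.hypot(grad[0], grad[1]))
```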


Example 4.10. For all r ∈ R and all x ≠ 0 we have

∇‖x‖^r = r‖x‖^{r−2} x,

because

D_i ‖x‖^r = (∂/∂x_i)(∑_{j=1}^n x_j²)^{r/2} = (r/2)(∑_{j=1}^n x_j²)^{r/2 − 1} 2x_i = r‖x‖^{r−2} x_i.

Exercise 4.11. Let f : R → R^n be differentiable, and suppose there exists r > 0 such that for all t ∈ R we have f(t) ∈ ∂B_r(0) (the boundary of the ball of radius r centered at 0 in R^n). Prove that for all t ∈ R, the vectors f(t) and f′(t) are orthogonal, that is, f(t) · f′(t) = 0.

Exercise 4.12. Prove that if E ⊂ R^n, and f : E → R is differentiable at x and has a maximum or minimum at x, then f′(x) = 0. Hint: you know it's true when n = 1.

Exercise 4.13. Let f, g : R² → R be differentiable and satisfy the Cauchy-Riemann equations, that is,

D₁f = D₂g and D₂f = −D₁g

on R². Define u, v : R² → R by

u(r, θ) = f(r cos θ, r sin θ) and v(r, θ) = g(r cos θ, r sin θ).

Prove that u and v satisfy

D₁u(r, θ) = (1/r) D₂v(r, θ) and D₁v(r, θ) = −(1/r) D₂u(r, θ)

for all (r, θ) ∈ R² with r ≠ 0.


5. Mean value

Theorem 5.1 (Mean Value Theorem). Let U ⊂ R^n be open and convex, and let f : U → R be differentiable. Then for all x, y ∈ U there exists z on the line segment joining x and y such that

f(x) − f(y) = f′(z)(x − y).

Proof. Fix x, y ∈ U , and define g : [0, 1] → U by

g(t) = tx + (1− t)y,

so that the range of g is the line segment joining x and y. Then f ◦ gis continuous on [0, 1] and differentiable on (0, 1). By the one-variableMean Value Theorem, there exists c ∈ (0, 1) such that

f(x)− f(y) = f ◦ g(1)− f ◦ g(0) = (f ◦ g)′(c)(1− 0)

= f ′(g(c))g′(c)

= f ′(cx + (1− c)y

)(x− y)

so we can take z = cx + (1− c)y. �

Perhaps surprisingly, there is no general version of the Mean Value Theorem for functions f : U → R^m:

Exercise 5.2. Define f : R → R² by f(x) = (cos x, sin x). Prove that there does not exist z between 0 and 2π such that

f(2π) − f(0) = 2πf′(z).

However, at least we get an inequality, which turns out to be good enough for most purposes:

Corollary 5.3 (Mean Value Inequality). Let U ⊂ R^n be open and convex, and let f : U → R^m be differentiable. Suppose ‖f′(x)‖ ≤ M for all x ∈ U. Then

‖f(x) − f(y)‖ ≤ M‖x − y‖ for all x, y ∈ U.

Proof. Fix x, y ∈ U and a unit vector u ∈ R^m, and define A ∈ L(R^m, R) by

A(v) = u · v.

By the Mean Value Theorem there exists z on the line segment joining x and y such that

u · (f(x) − f(y)) = A ◦ f(x) − A ◦ f(y) = (A ◦ f)′(z)(x − y) = u · f′(z)(x − y).

Page 17: MAT 473 — DIFFERENTIATION · MAT 473 — DIFFERENTIATION JOHN QUIGG Contents 1. Introduction 2 2. Linear maps 4 3. Derivatives 8 4. Partial derivatives 12 5. Mean value 16 6. Inverse

MAT 473 — DIFFERENTIATION 17

Hence

|u · (f(x) − f(y))| ≤ ‖u‖‖f′(z)‖‖x − y‖ ≤ M‖x − y‖.

Taking the sup over u yields the desired inequality. □
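The inequality is easy to probe numerically with the curve from Exercise 5.2: for f(x) = (cos x, sin x) we have ‖f′(x)‖ = ‖(−sin x, cos x)‖ = 1 for every x (Exercise 2.9), so the corollary gives ‖f(x) − f(y)‖ ≤ |x − y|, even though no mean value equality is available for this f.

```python
import math

# Numeric check of the Mean Value Inequality for f(x) = (cos x, sin x), M = 1.
def f(x):
    return (math.cos(x), math.sin(x))

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

pairs = [(0.0, 2 * math.pi), (0.3, 1.7), (-2.0, 5.0)]
for x, y in pairs:
    print(dist(f(x), f(y)), abs(x - y))  # first number never exceeds the second
```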

Definition 5.4. f is continuously differentiable, or C¹, if it is differentiable and its derivative is continuous.

In more detail, if U ⊂ R^n is open and f : U → R^m is differentiable, then f is C¹ if f′ : U → L(R^n, R^m) is continuous. Why do we have the superscript 1 on the C? Well, as you might guess, this is "first-order" continuous differentiability; we'll increase our superscript later when we discuss higher-order derivatives.

Observation 5.5. f = (f₁, . . . , f_m) is C¹ if and only if each component function f_i is.

Lemma 5.6. With the above notation, f is C¹ if and only if every partial derivative D_j f_i exists and is continuous on U.

Proof. Without loss of generality f is real-valued. If f is C¹, then the entries D_j f of the matrix function representing f′ are continuous.

Conversely, assume the condition regarding the partials. It suffices to show f is differentiable, for then the entries of the matrix representing f′ are continuous, hence f′ itself is continuous. Let a ∈ U, choose r > 0 such that B_r(a) ⊂ U, and then take any x ∈ R^n with ‖x‖ < r. Define points a₀, . . . , a_n ∈ B_r(a) by

a₀ = a and a_j = a + ∑_{i=1}^j x_i e_i for j = 1, . . . , n.

Then

f(a + x) − f(a) = ∑_{j=1}^n (f(a_j) − f(a_{j−1})) = ∑_{j=1}^n D_j f(a_{j−1} + t_j e_j) x_j

for some t_j between 0 and x_j, so

|f(a + x) − f(a) − ∇f(a) · x| / ‖x‖ = (1/‖x‖) |∑_{j=1}^n (D_j f(a_{j−1} + t_j e_j) − D_j f(a)) x_j|
≤ ∑_{j=1}^n |D_j f(a_{j−1} + t_j e_j) − D_j f(a)| → 0 as x → 0


by continuity of the D_j f, since

‖a_{j−1} + t_j e_j − a‖ = ‖∑_{i=1}^{j−1} x_i e_i + t_j e_j‖ ≤ ∑_{i=1}^{j−1} |x_i| + |t_j| ≤ ∑_{i=1}^n |x_i| ≤ n‖x‖ → 0.

Thus f is differentiable at a by Lemma 3.5. □

The above result gives a sufficient, but not necessary, condition for differentiability:

Exercise 5.7. Define f : R → R by

f(x) = x² sin(1/x) if x ≠ 0, and f(0) = 0.

Prove that f is differentiable on R, but f′ is discontinuous at 0.

However, the above lemma applies in an overwhelming majority of cases; for example, it immediately implies that any rational function⁸ is differentiable.

Exercise 5.8. Let U = {(x, y) ∈ R² | y ≠ 0}, and define q : U → R by

q(x, y) = x/y.

(a) Find a formula for q′(a, b) for (a, b) ∈ U.
(b) Use the formula you found in part (a) together with the Chain Rule to derive a "quotient rule" for

(f/g)′(a)x

where f, g : E → R, E ⊂ R^n, a ∈ E°, both f and g are differentiable at a, and 0 ∉ ran g.

Exercise 5.9. Let U ⊂ R^n be open, and let f : U → R^m. Suppose every partial derivative D_j f_i is bounded on U. Prove that f is continuous on U. Hint: the Mean Value Inequality does not apply. Use the technique of the proof that if every D_j f_i is continuous then f is differentiable.

⁸ A rational function is a quotient of two polynomials, and a polynomial on R^n is a linear combination of monomials x₁^{k₁} · · · x_n^{k_n} (where each k_i is a nonnegative integer).


6. Inverse functions

The Inverse Function Theorem from 1-variable calculus says that if U is an open interval in R, f : U → R is 1-1 and continuous, a ∈ U, and f is differentiable at a with f′(a) ≠ 0, then the inverse f⁻¹ is differentiable at f(a) and

(f⁻¹)′(f(a)) = 1/f′(a).

The n-dimensional version (Theorem 6.2 below) requires a much stronger assumption on the derivative of f: it must exist and in fact be continuous on some neighborhood of a. As a consolation, we can deduce that f is 1-1 near a. But before we get into the theorem itself, let's observe that the (appropriate n-dimensional version of the) formula for the derivative of the inverse follows immediately from the Chain Rule:

Exercise 6.1. Let E ⊂ R^n, f : E → R^n, and a ∈ E°. Assume f is differentiable at a. Also assume f is 1-1, so that we have an inverse function f⁻¹ : f(E) → E. Finally, assume f⁻¹ is differentiable at f(a).

(a) Prove that f′(a) is invertible and

(f⁻¹)′(f(a)) = f′(a)⁻¹.

(b) Carefully explain how part (a) implies that if f′ is continuous at a then (f⁻¹)′ is continuous at f(a).

It is one of the miracles of calculus that (roughly speaking) continuity and invertibility of the derivative make everything else happen:

Theorem 6.2 (Inverse Function Theorem). Let U ⊂ R^n be open, f : U → R^n be C¹, and a ∈ U. If f′(a) is invertible, then there exist open sets V, W ⊂ R^n such that a ∈ V ⊂ U, f(a) ∈ W, and f : V → W is 1-1 onto with C¹ inverse.

Proof. We can simplify things considerably with some preliminaries. We'll choose V in such a way that f′(x) will be invertible for all x ∈ V. Consequently, to conclude that f(V) is open, it suffices to show that f(a) is an interior point of f(U), and for this it's enough to have f(a) interior to f(V) for some V ⊂ U. Thus, once we have an open subset V of U such that f is 1-1 on V and f′(x) is invertible for all x ∈ V, we'll know that f(V) is open. For the same reason, once we have such a set V, to see that f⁻¹ is C¹ on f(V) it suffices to show that f⁻¹ is differentiable at f(a), because the above exercise will then imply that (f⁻¹)′ is continuous at f(a), and this holds for every a at which the derivative of f is invertible, hence at every element of V.


Replacing f by f′(a)⁻¹ ◦ f, without loss of generality f′(a) = I. Then replacing f by x ↦ f(x + a) − f(a), without loss of generality a = 0 and f(0) = 0.

We get our subset V as follows: since f is C¹, there exists ε > 0 such that V := B_ε(0) ⊂ U and

‖f′(x) − I‖ < 1/2 for all x ∈ V.

In particular, f′(x) is invertible for all x ∈ V.

Since V is convex and open, the Mean Value Inequality tells us that for all x, z ∈ V we have

‖(f − I)(x) − (f − I)(z)‖ ≤ ‖x − z‖/2,

hence

(1)   ‖f(x) − f(z)‖ ≥ ‖x − z‖ − ‖(f − I)(x) − (f − I)(z)‖ ≥ ‖x − z‖/2.

Thus f is 1-1 on V, and moreover f⁻¹ is continuous on W := f(V). For the rest of the proof we ignore the original U and regard f as a function from V 1-1 onto W.

As we discussed at the start of the proof, it remains to show that 0 ∈ W° and f⁻¹ is differentiable at 0.

For the first, put B = B_{ε/2}(0), an open ball containing 0 whose closure B̄ is contained in V. We have 0 ∉ f(∂B) because 0 ∉ ∂B and f is 1-1 on V. Since ∂B is compact and z ↦ ‖f(z)‖ is continuous, there exists δ > 0 such that

‖f(z)‖ ≥ 2δ for all z ∈ ∂B.

Claim: B_δ(0) ⊂ W. Let w ∈ B_δ(0), and define g : B̄ → R by g(z) = ‖f(z) − w‖². Since B̄ is compact and g is continuous, g has a minimum at some p ∈ B̄. Since g(0) = ‖w‖² < δ² and g(z) ≥ δ² for all z ∈ ∂B, we have p ∈ B. Thus for all h ∈ R^n we have

0 = g′(p)h = 2(f(p) − w) · f′(p)h.

Since f′(p) is invertible, there exists h ∈ R^n such that f′(p)h = f(p) − w, so we must have f(p) − w = 0, proving the claim.


It remains to show f⁻¹ is differentiable at 0. Let y ∈ W \ {0} and x = f⁻¹(y). Then x ∈ V \ {0}, so

‖f⁻¹(y) − f⁻¹(0) − Iy‖/‖y‖ = (‖x‖/‖y‖)(‖x − f(x)‖/‖x‖) ≤ 2‖f(x) − x‖/‖x‖,

since ‖y‖ ≥ ‖x‖/2 by (1). This latter inequality also implies that as y → 0 we have x → 0, hence ‖f(x) − x‖/‖x‖ → 0 since f′(0) = I, and we've shown that (f⁻¹)′(0) = I. □

Continuity of the derivative was crucial in the Inverse Function Theorem. We only assumed f′ was invertible at one point, but then by continuity we knew f′ must be invertible nearby. The theorem would definitely become false if we didn't assume f′ is continuous, even in the 1-variable case:

Exercise 6.3. Define f : R → R by

f(x) = x + 2x² sin(1/x) for x ≠ 0,    and f(0) = 0.

Show that f is differentiable on R and f′(0) ≠ 0, but f is not 1-1 on any open interval containing 0.
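This is not a proof, but a quick numerical look suggests what goes wrong: f′(0) = 1, yet f′ takes negative values at points arbitrarily close to 0 (at xₙ = 1/(2πn) one computes f′(xₙ) = −1 by hand), so f is not monotone on any interval around 0. A minimal Python sketch, with the derivative for x ≠ 0 computed by hand:

```python
import math

def fprime(x):
    # derivative of f(x) = x + 2x^2 sin(1/x) for x != 0, computed by hand
    return 1 + 4 * x * math.sin(1 / x) - 2 * math.cos(1 / x)

# sample f' at the points 1/(2*pi*n), where sin(1/x) = 0 and cos(1/x) = 1,
# so f' should come out very close to 1 - 2 = -1
samples = [fprime(1 / (2 * math.pi * n)) for n in range(1, 1000, 50)]
has_negative = any(v < 0 for v in samples)
```

Since f′(0) = 1 > 0 while f′ is negative on points accumulating at 0, f cannot be increasing on any interval containing 0.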

Exercise 6.4. Let U = {(x, y) ∈ R2 : y ≠ 0}, and define f : U → R2 by

f(x, y) = (eˣ + xy², (2 sin πx)/y).

Note that f(1, 1) = (e + 1, 0).

(a) Use the Inverse Function Theorem to show that f is invertible near (1, 1), and find a formula for (the matrix representing) (f−1)′(e + 1, 0).

(b) Why does the Inverse Function Theorem not apply to the question of whether f is invertible near (0, 1)?

It follows from the Inverse Function Theorem that if f is C1 and f′(x) is invertible for all x then ran f is open. However, f need not be 1-1; the Inverse Function Theorem only tells us f is locally 1-1.

Exercise 6.5. Define f : R2 → R2 by

f(x, y) = (eˣ cos y, eˣ sin y).

Prove that f is C1 and f′(x, y) is invertible for all (x, y) ∈ R2, but f is not 1-1. What is the range of f?


7. Implicit functions

Notation and Terminology 7.1. For n, m ∈ N, it is often convenient to abuse notation by identifying Rn × Rm with Rn+m, so that if x = (x1, . . . , xn) and y = (y1, . . . , ym) then we allow ourselves to write

(x, y) = (x1, . . . , xn, y1, . . . , ym).

Similarly, if f is a C1 function defined on an open subset of Rn+m, then for (a, b) ∈ Rn × Rm = Rn+m it is sometimes convenient to write

f′(a, b) = [ DIf(a, b)   DIIf(a, b) ],

where DIf means the derivative of f with respect to the first n coordinates and DIIf the derivative with respect to the last m coordinates. Then

f′(a, b)(x, y) = DIf(a, b)x + DIIf(a, b)y for all (x, y) ∈ Rn × Rm.

If f is an Rm-valued function defined on a subset of Rn+m, then for c ∈ Rm the equation f(x, y) = c is really a system of m equations in n + m unknowns. If n = 0 the Inverse Function Theorem gives a sufficient condition for the equation to have a unique solution. More generally, the following theorem, certainly one of the most important in multivariable calculus, gives a sufficient condition for us to solve the equation for y as a function of x:

Theorem 7.2 (Implicit Function Theorem). Let U ⊂ Rn × Rm be open, f : U → Rm be C1, and (a, b) ∈ U. Suppose f(a, b) = 0 and DIIf(a, b) is invertible. Then there exists an open set V ⊂ U such that (a, b) ∈ V and V ∩ f−1(0) is the graph of a C1 function g defined on some open subset of Rn.

Proof. More precisely, we must show that there exist open sets V ⊂ U and W ⊂ Rn and a C1 function g : W → Rm such that for all (x, y) ∈ V we have f(x, y) = 0 if and only if x ∈ W and y = g(x).

We use a common trick of introducing an auxiliary function: define f̃ : U → Rn × Rm by f̃(x, y) = (x, f(x, y)). Then f̃ is C1 and

f̃′(a, b) = [ I            0
             DIf(a, b)    DIIf(a, b) ]

is invertible, being block lower triangular with invertible diagonal blocks I and DIIf(a, b). By the Inverse Function Theorem there exists an open set V ⊂ U such that (a, b) ∈ V and f̃ is 1-1 on V with C1 inverse.

Note that for all (s, t) ∈ f̃(V), f̃−1(s, t) is the unique (x, y) ∈ V such that

(s, t) = f̃(x, y) = (x, f(x, y)).


Thus s = x, and

f̃−1(x, t) = (x, h(x, t))

for a unique function h : f̃(V) → Rm, which is C1 since f̃−1 is. Now put

W = {x ∈ Rn | (x, 0) ∈ f̃(V)}.

Then W is open and a ∈ W. Define g : W → Rm by

g(x) = h(x, 0).

Then g is C1 since h is.

Claim: V ∩ f−1(0) coincides with the graph of g. To see this, first suppose (x, y) ∈ V and f(x, y) = 0. Then f̃(x, y) = (x, 0). Hence x ∈ W and (x, y) = f̃−1(x, 0), so that y = h(x, 0) = g(x). Conversely, we can just reverse these steps to argue that if x ∈ W and y = g(x), then (x, y) = (x, h(x, 0)) = f̃−1(x, 0), hence (x, y) ∈ V and (x, 0) = f̃(x, y), thus f(x, y) = 0. □

In the statement of the Implicit Function Theorem, there's nothing magic about the value 0 — no matter what the value of f at (a, b) is, we can solve the equation f(x, y) = f(a, b) in a continuously differentiable manner for y as a function of x for (x, y) near (a, b); this follows immediately from the Implicit Function Theorem upon replacing f by f − f(a, b). Nor is there anything magic about solving for the last m variables in terms of the first n variables — if we assume that f is C1 and f′(a, b) is onto Rm, then there are m linearly independent columns in the matrix [f′(a, b)], so we can solve for the corresponding variables in terms of the others; this follows immediately from the Implicit Function Theorem by composing f with a rearrangement of coordinates in Rn+m. In particular, if m = 1 then we just need f′(a, b) ≠ 0.

What about the "implicit differentiation" you did in your first calculus course? The Implicit Function Theorem justifies doing it, and here's what you were really doing:

Exercise 7.3. Let U ⊂ Rn × Rm be open, f : U → Rm be C1, and (a, b) ∈ U. Suppose f(a, b) = 0 and DIIf(a, b) is invertible. Let g be the unique C1 solution near (a, b) of the equation f(x, y) = 0 for y in terms of x, guaranteed by the Implicit Function Theorem, defined on an open set W containing a, so that g(a) = b and f(x, g(x)) = 0 for all x ∈ W.

(a) Define h : W → Rn × Rm by h(x) = (x, g(x)). Prove that

h′(a)x = (x, g′(a)x) for all x ∈ Rn.


(b) Show that

(f ◦ h)′(a)x = DIf(a, b)x + DIIf(a, b)g′(a)x for all x ∈ Rn,

where as before in this situation we write DIf for the derivative of f with respect to the first n variables and DIIf for the derivative with respect to the last m variables.

(c) Deduce the formula

g′(a) = −DIIf(a, b)−1DIf(a, b).
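The formula in (c) is easy to sanity-check numerically. A minimal Python sketch for the hypothetical case f(x, y) = x² + y² − 1 with n = m = 1 and (a, b) = (0.6, 0.8), where the implicit solution near (a, b) is the explicit function g(x) = √(1 − x²), and the formula predicts g′(a) = −(2b)⁻¹(2a) = −a/b = −0.75:

```python
import math

a, b = 0.6, 0.8          # a point with f(a, b) = a^2 + b^2 - 1 = 0
D_I = 2 * a              # derivative of f in x at (a, b)
D_II = 2 * b             # derivative of f in y at (a, b); invertible since b != 0
formula = -D_I / D_II    # g'(a) = -D_II f(a,b)^{-1} D_I f(a,b)

g = lambda x: math.sqrt(1 - x * x)   # the implicit function, known explicitly here
h = 1e-6
finite_diff = (g(a + h) - g(a - h)) / (2 * h)  # central difference for g'(a)
```

The two numbers agree to roughly the accuracy of the central difference.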

Exercise 7.4. Prove that there exist r > 0 and continuously differentiable real-valued functions u, v, w defined on the open ball Br(1, 1) in R2 such that u(1, 1) = 1, v(1, 1) = 1, w(1, 1) = −1, and for all (x, y) ∈ Br(1, 1),

u⁵ + xv² − y + w = 0
v⁵ + yu² − x + w = 0
w⁴ + y⁵ − x⁴ = 1.

Also, find DIIv(1, 1).
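As a quick sanity check (not part of the exercise), one can verify that the proposed values u = v = 1, w = −1 do satisfy all three equations at (x, y) = (1, 1):

```python
def F(x, y, u, v, w):
    # the three equations of the system, each moved to the form F = 0
    return (u**5 + x * v**2 - y + w,
            v**5 + y * u**2 - x + w,
            w**4 + y**5 - x**4 - 1)

residuals = F(1.0, 1.0, 1.0, 1.0, -1.0)
```

All three residuals should be exactly 0 at the base point.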

Exercise 7.5. Let f : R2 → R be C1 and (a, b) ∈ R2. Assume f′(a, b) ≠ 0. Use the Implicit Function Theorem to prove that f is not 1-1.

We observed earlier that the Inverse Function Theorem implies that a C1 function whose derivative is never singular⁹ has open range; we can generalize this:

Exercise 7.6. Let U ⊂ Rn+m be open and f : U → Rm be C1, with f′(x, y) of rank m for all (x, y) ∈ U. Use the Implicit Function Theorem (or the methods of its proof) to prove that the range of f is open. Hint: you might find it useful to prove that the projection

p : Rn+m = Rn × Rm → Rm

defined by p(x, y) = y takes open sets to open sets.

Exercise 7.7. Let U be an open subset of Rn, let f : U → Rm be C1, and let a ∈ U. Suppose that f′(a) is onto Rm, and that n > m. Prove that there exists an open subset V of U containing a such that f′(x) is onto Rm for all x ∈ V.

⁹ A square matrix is singular if it is noninvertible.


8. Higher-order derivatives

How can we get our hands on higher-order derivatives? Remember: for a differentiable multi-variable function f : U → Rm (with U ⊂ Rn), the derivative f′(x) at any point x ∈ U is a linear map. So what would the derivative f′′ of f′ be? Well, we have f′ : U → L(Rn, Rm), so presumably f′′(x) would be a linear map from Rn to L(Rn, Rm). As any fool can see, this is getting out of hand. So, we bid a sad adieu to higher-order derivatives in this naive sense¹⁰, and content ourselves with differentiating the partials:

Definition 8.1. Let U ⊂ Rn be open and f : U → R. For each k-tuple (i1, . . . , ik) ∈ {1, . . . , n}k of integers between 1 and n, define

Di1···ikf = Di1 · · · Dikf.

What does the above equation mean? We apply the operators from right to left, so the expression "Di1 · · · Dikf" means: first take the partial derivative Dikf, then the partial derivative Dik−1(Dikf) of that, and call this Dik−1ikf, and continue in this way, eventually taking the partial derivative Di1(Di2···ikf) of Di2···ikf.

Definition 8.2. Di1···ikf is a kth-order partial derivative of f .

These are the higher-order derivatives we'll use. What about "higher-order continuous differentiability"? We won't try to make up a name for it, but we'll use a convenient notation:

Definition 8.3. f is Ck if Di1···ikf is continuous for every k-tuple(i1, . . . , ik).

Observation 8.4. If f is Ck, then it is Cj for every j < k: for every (k − 1)-tuple (i2, . . . , ik), the partial derivatives

Di1Di2···ikf = Di1···ikf

are continuous for all i1, so Di2···ikf is differentiable, hence continuous. This means f is Ck−1, and we can continue inductively.

The notation Di1···ikf gives an explicit order for taking partial deriv-atives, and this is important:

Exercise 8.5. Define f : R2 → R by

f(x, y) = xy³/(x² + y²) for (x, y) ≠ (0, 0),    and f(0, 0) = 0.

10Actually, for some (but not our) purposes, higher derivatives of f are usedprofitably as multilinear maps.


Show that D12f(0, 0) and D21f(0, 0) both exist, but are not equal.
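A numerical illustration (not a substitute for the hand computation the exercise asks for): approximating the two mixed partials at the origin by nested central differences, with a much smaller inner step so the inner difference is accurate, shows them landing at visibly different values.

```python
def f(x, y):
    return x * y**3 / (x * x + y * y) if (x, y) != (0.0, 0.0) else 0.0

h, k = 1e-4, 1e-8   # outer step and (much smaller) inner step

def D1(g, x, y):    # central difference in the first variable
    return (g(x + k, y) - g(x - k, y)) / (2 * k)

def D2(g, x, y):    # central difference in the second variable
    return (g(x, y + k) - g(x, y - k)) / (2 * k)

# D12 f = D1(D2 f) and D21 f = D2(D1 f), approximated at the origin:
D12 = (D2(f, h, 0.0) - D2(f, -h, 0.0)) / (2 * h)
D21 = (D1(f, 0.0, h) - D1(f, 0.0, -h)) / (2 * h)
```

The two approximations differ by about 1, which is exactly the discrepancy the exercise asks you to establish.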

However, if the partials are continuous, the order doesn’t matter:

Theorem 8.6 (Clairaut’s Theorem). If f is Ck, then Dj1···jkf = Di1···ikf

for every rearrangement (j1, . . . , jk) of (i1, . . . , ik).

Proof. Since every rearrangement of (i1, . . . , ik) can be obtained by switching pairs of coordinates finitely many times, and since partial derivatives are computed by holding the other coordinates constant, without loss of generality n = k = 2. Fix (a, b) ∈ U. Choose open intervals I and J such that

(a, b) ∈ I × J ⊂ U,

and define g : I × J → R by

g(x, y) = f(x, y) − f(a, y) − f(x, b) + f(a, b) = h(y) − h(b),

where h : J → R is defined by

h(y) = f(x, y) − f(a, y).

By the one-variable Mean Value Theorem, for all (x, y) ∈ I × J there exists t between b and y such that

g(x, y) = h′(t)(y − b) = (D2f(x, t) − D2f(a, t))(y − b),

and then for the same reason there exists s between a and x such that

g(x, y) = D1D2f(s, t)(x − a)(y − b).

As (x, y) → (a, b), so does (s, t), hence

g(x, y)/((x − a)(y − b)) = D12f(s, t) → D12f(a, b)

by continuity. By symmetry we also have

g(x, y)/((x − a)(y − b)) → D21f(a, b),

so we are done. □

As in 1-variable calculus, higher-order derivatives are used mainly via Taylor's Theorem. But what should Taylor's Theorem look like for multi-variable functions? Well, first of all we'll only do it for real-valued functions. And even then it'll look pretty messy:

Theorem 8.7 (Taylor's Theorem). Let U ⊂ Rn be open and convex, let k ∈ N, and let f : U → R be Ck. Then for all a, x such that a, a + x ∈ U, there exists c on the line segment joining a and a + x such that

f(a + x) = f(a) + Σ_{j=1}^{k−1} (1/j!) Σ_{i1,...,ij=1}^{n} Di1···ijf(a) xi1 · · · xij
         + (1/k!) Σ_{i1,...,ik=1}^{n} Di1···ikf(c) xi1 · · · xik.

Proof. Define g : R → Rn by g(t) = a + tx, and put V = g−1(U), which is an open set containing [0, 1]. Then f ◦ g : V → R is Ck. By the one-variable Taylor Theorem, there exists s ∈ [0, 1] such that

f(a + x) − f(a) = f ◦ g(1) − f ◦ g(0) = Σ_{j=1}^{k−1} (f ◦ g)^(j)(0)/j! + (f ◦ g)^(k)(s)/k!.

Thus, it suffices to show that if 1 ≤ j ≤ k and t ∈ V, then

(f ◦ g)^(j)(t) = Σ_{i1,...,ij} Di1···ijf(g(t)) xi1 · · · xij.

The equality holds for j = 1 by the Chain Rule. Let 1 ≤ j < k, and assume the equality holds for j. Differentiating both sides of the above equation with respect to t, with i0 denoting the new summation index and using g′i0(t) = xi0, we get

(f ◦ g)^(j+1)(t) = Σ_{i1,...,ij} ((Di1···ijf) ◦ g)′(t) xi1 · · · xij
= Σ_{i1,...,ij} Σ_{i0} Di0Di1···ijf(g(t)) g′i0(t) xi1 · · · xij
= Σ_{i1,...,ij} Σ_{i0} Di0i1···ijf(g(t)) xi0 xi1 · · · xij
= Σ_{i1,...,ij+1} Di1···ij+1f(g(t)) xi1 · · · xij+1,

where in the last equality we relabeled the indices and used Clairaut's Theorem. □

When k = 1, Taylor’s Theorem is just the Mean Value Theorem(and we don’t need f ′ to be continuous, since we don’t need Clairaut’sTheorem to rearrange partial derivatives). Let’s examine the case k =2: Taylor’s Theorem says

f(a + x) = f(a) + f ′(a)x +1

2

∑i,j

Dijf(c)xixj


for some c on the line segment joining a and a + x. So what? Since we have assumed something about the 2nd-order partial derivatives of f, you might expect to get a "second-order" generalization of Lemma 3.5:

Exercise 8.8. Let U ⊂ Rn be open, f : U → R be C2, and a ∈ U. Prove that there exists a function r defined on some neighborhood V of 0 such that limx→0 r(x) = 0 and

f(a + x) = f(a) + f′(a)x + (1/2) Σ_{i,j=1}^{n} Dijf(a) xixj + r(x)‖x‖² for all x ∈ V.

In the preceding exercise, the function

x ↦ Σ_{i,j} Dijf(a) xixj

is a quadratic form, and the n × n matrix with ij-entry Dijf(a) is the Hessian of f at a. Thus the quadratic form can be written xᵀHx, where H is the Hessian. Note that (tx)ᵀH(tx) = t²xᵀHx for all t ∈ R. In the following exercise you'll prove another property of quadratic forms:

Exercise 8.9. Let H be an n × n matrix such that xᵀHx > 0 for all nonzero vectors x ∈ Rn. Prove that there exists c > 0 such that

xᵀHx ≥ c‖x‖² for all x ∈ Rn.

Observation 8.10. Similarly, if xᵀHx < 0 for all x ≠ 0 then xᵀHx ≤ c‖x‖² for some c < 0 and all x ∈ Rn.
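To make the Hessian concrete, here is a small Python sketch; the function f(x, y) = sin(x)eʸ is a hypothetical example, not from the notes. It approximates the Hessian by central second differences, and the off-diagonal entries come out (nearly) equal, as Clairaut's Theorem predicts for a C2 function:

```python
import math

def f(x, y):
    # hypothetical C^2 example; exact Hessian entries are known in closed form
    return math.sin(x) * math.exp(y)

def hessian(g, a, b, h=1e-4):
    # 2x2 matrix of second partials of g at (a, b), by central second differences
    def d2(i, j):
        e = [(h, 0.0), (0.0, h)]
        dx1, dy1 = e[i]
        dx2, dy2 = e[j]
        return (g(a + dx1 + dx2, b + dy1 + dy2) - g(a + dx1 - dx2, b + dy1 - dy2)
                - g(a - dx1 + dx2, b - dy1 + dy2) + g(a - dx1 - dx2, b - dy1 - dy2)) / (4 * h * h)
    return [[d2(0, 0), d2(0, 1)], [d2(1, 0), d2(1, 1)]]

H = hessian(f, 0.5, 0.0)
# exact Hessian at (0.5, 0) is [[-sin 0.5, cos 0.5], [cos 0.5, sin 0.5]]
```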

We can use the preceding exercise to get a multi-variable version ofthe Second Derivative Test:

Exercise 8.11 (Second Derivative Test). Let U ⊂ Rn be open, f : U → R be C2, and a ∈ U. Assume f′(a) = 0. Let H be the Hessian of f at a. Prove the following:

(a) If xᵀHx > 0 for all x ≠ 0, then there exists ε > 0 such that f(a + x) > f(a) whenever 0 < ‖x‖ < ε.

(b) If xᵀHx < 0 for all x ≠ 0, then there exists ε > 0 such that f(a + x) < f(a) whenever 0 < ‖x‖ < ε.

(c) If xᵀHx takes both positive and negative values, then for all ε > 0 there exist x, y ∈ Bε(0) such that f(a + x) > f(a) and f(a + y) < f(a).
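For intuition, here is a small numerical sketch (the two functions are hypothetical examples): f1(x, y) = x² + y² and f2(x, y) = x² − y² both have vanishing derivative at the origin, but their Hessians there, [[2, 0], [0, 2]] and [[2, 0], [0, −2]], put them in cases (a) and (c) respectively:

```python
def quad(H, v):
    # the quadratic form v^T H v, for a 2x2 matrix given as nested lists
    return sum(H[i][j] * v[i] * v[j] for i in range(2) for j in range(2))

H1 = [[2.0, 0.0], [0.0, 2.0]]    # Hessian of f1 at (0, 0): case (a), local minimum
H2 = [[2.0, 0.0], [0.0, -2.0]]   # Hessian of f2 at (0, 0): case (c), saddle

directions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.3, -0.7)]
spot_check_positive = all(quad(H1, v) > 0 for v in directions)  # a spot check, not a proof
indefinite = quad(H2, (1.0, 0.0)) > 0 and quad(H2, (0.0, 1.0)) < 0
```

Of course, checking positivity on finitely many directions is only a spot check; Exercise 8.9 is what turns pointwise positivity into the uniform bound the proof of (a) needs.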

Here’s a curious application of Taylor’s Theorem for k = 3:


Exercise 8.12. Suppose U is an open subset of R2, (a, b) ∈ U, and f : U → R is C3. Prove that

lim_{r→0} (4/(πr²)) ∫₀^{2π} f(a + r cos θ, b + r sin θ) cos 2θ dθ = D11f(a, b) − D22f(a, b).

Hint: use Taylor's Theorem with k = 3. Also, you may use the following standard integrals from calculus:

∫₀^{2π} cos 2θ dθ = ∫₀^{2π} cos θ cos 2θ dθ = ∫₀^{2π} sin θ cos 2θ dθ = ∫₀^{2π} cos θ sin θ cos 2θ dθ = 0,

∫₀^{2π} cos² θ cos 2θ dθ = π/2,    ∫₀^{2π} sin² θ cos 2θ dθ = −π/2.
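The limit is easy to test numerically. In this Python sketch the test function is a hypothetical quadratic, f(x, y) = 3x² − y² + xy, for which D11f − D22f = 6 − (−2) = 8 everywhere; the ring integral is approximated by a midpoint Riemann sum:

```python
import math

def f(x, y):
    # hypothetical test function: D11 f = 6 and D22 f = -2 everywhere
    return 3 * x * x - y * y + x * y

def ring_average(a, b, r, n=2000):
    # (4 / (pi r^2)) * integral over [0, 2pi] of f(a + r cos t, b + r sin t) cos 2t dt,
    # approximated by a midpoint Riemann sum with n points
    s = 0.0
    for i in range(n):
        t = 2 * math.pi * (i + 0.5) / n
        s += f(a + r * math.cos(t), b + r * math.sin(t)) * math.cos(2 * t)
    s *= 2 * math.pi / n
    return 4 * s / (math.pi * r * r)

approx = ring_average(0.3, -0.2, 1e-2)
```

For a quadratic f the constant and linear parts integrate against cos 2θ to 0, so the answer should be very close to 8 even before taking r → 0.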


Index

C1, 17
Chain Rule, 10
Clairaut's Theorem, 26
continuously differentiable, 17
curve, 14
derivative, 8
  directional, 13
  partial, 12
differentiable, 8
  at a point, 8
  continuously, 17
directional derivative, 13
gradient vector, 14
Hessian, 28
higher order partial derivative, 25
Implicit Function Theorem, 22
Inverse Function Theorem, 19
L(Rn, Rm), 4
level hypersurface, 14
Mean Value Inequality, 16
Mean Value Theorem, 16
norm of linear map
  Euclidean norm, 4
  operator norm, 4
operator norm, 4
partial derivative, 12
quadratic form, 28
Second Derivative Test, 28
tangent vector, 14
Taylor's Theorem, 26