gunning: functions of several real variables




AN INTRODUCTION TO FUNCTIONS OF

SEVERAL REAL VARIABLES

Robert C. Gunning

© Robert C. Gunning






Introduction

I have taught the honors version of calculus of several variables at Princeton University from time to time over a number of years. Usually I had used Michael Spivak’s pathbreaking and highly original Calculus on Manifolds as the course textbook, supplemented by one or another of the available excellent more advanced textbooks covering the standard approaches and including a wide range of examples and problems. As usual though, for anyone who has taught a course from the same textbook more than a couple of times, there are some parts of Spivak’s treatment that I would like to expand, and some parts that I would like to contract or handle differently. During the 2007-2008 academic year Lillian Pierce and I cotaught the course, while she was a graduate student here, and reorganized the presentation of the material and the problem assignments; her suggestions were substantial and she played a significant part in working through the revisions that year. The goal was to continue the rather abstract treatment of the underlying mathematical topics but to supplement it by more emphasis on using the theoretical material in solving problems and on looking at applications. The problems were divided into two categories: a first group of problems testing a basic understanding of the essential topics discussed and their applications, problems that almost all serious students should be able to solve without much difficulty; and a second group of problems covering more theoretical aspects and tougher calculations, challenging the students but still not particularly difficult. The temptation to include a third category of optional very challenging problems, to introduce interested students to a wider range of other theoretical and practical aspects, was frustrated by a resounding lack of interest on the part of the students. A team of Princeton undergraduate mathematics majors consisting of Robert Haraway, Adam Hesterberg, Jay Holt and Alex Schiller took careful notes of the lectures that both Lillian and I gave. In the subsequent years I have used these notes when teaching the course again, modified each year with revisions and additions using the very helpful corrections and suggestions from the students. There is a good deal of material to cover, so the course meets four hours each week, one of which is devoted principally to examples and illustrative calculations, while the discussion of the problems is left to office hours outside the regular class schedule. It has been possible at least to discuss all the topics covered in general terms, focusing on the most difficult proofs and leaving it to the students to read the details of some of the proofs in the notes.


I would like to express here my sincere thanks to the students who compiled the lecture notes that were the basis for these notes, Robert Haraway, Adam Hesterberg, Jay Holt and Alex Schiller; to Lillian Pierce for her suggestions and great help in reorganizing the course; and to the students who have taken the course since its reorganization for a number of corrections and suggestions for greater clarity in the notes. The remaining errors and confusion are my own responsibility though.

Robert C. Gunning
Fine Hall, Princeton University, January 2011



Contents

Introduction

1 Background
1.1 Norms
1.2 Topological Preliminaries
1.3 Continuous Mappings

2 Differentiable Mappings
2.1 The Derivative
2.2 The Chain Rule
2.3 Higher Derivatives
2.4 Functions

3 The Rank Theorem
3.1 The Inverse Mapping Theorem
3.2 The Implicit Function Theorem
3.3 The Rank Theorem

4 Integration
4.1 Riemann Integral
4.2 Fubini’s Theorem
4.3 Limits and Improper Integrals
4.4 Change of Variables

5 Differential Forms
5.1 Line Integrals
5.2 Differential Forms
5.3 Integrals of Differential Forms
5.4 Stokes’s Theorem

A Exterior Algebras




Chapter 1

Background

1.1 Norms

Some fairly standard set-theoretic notation and terminology will be used consistently. In particular, $a \in A$ indicates that $a$ is an element of, or a point in, the set $A$, while $A \subset B$ indicates that $A$ is a subset of $B$, possibly equal to $B$. The union $A \cup B$ of sets $A$ and $B$ consists of all elements that belong to either $A$ or $B$ or both $A$ and $B$, while the intersection $A \cap B$ consists of all elements that belong to both $A$ and $B$. The complement $A \sim B$ consists of all elements of $A$ that do not belong to $B$, whether or not $B$ is a subset of $A$. If the set $A$ is understood from context it may be omitted; so $\sim B$ consists of all elements in a set $A$ that are not in $B$, where the set $A$ is understood, and if $A$ and $B$ are both understood to be subsets of a set $E$ then $A \sim B = A \cap (\sim B)$.

A mapping $f : A \longrightarrow B$ associates to each element $a \in A$ an element $f(a) \in B$, the image of the point $a$. This mapping $f$ is injective if distinct elements of $A$ have distinct images in $B$, is surjective if every element of $B$ is the image of some element of $A$, and is bijective if it is both injective and surjective; thus $f$ is bijective if and only if it is a one-to-one correspondence between the sets $A$ and $B$, and consequently has an inverse mapping $f^{-1} : B \longrightarrow A$ that is also bijective.
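For finite sets the three properties of a mapping can be checked directly. The following is only an illustrative sketch, assuming a mapping is represented as a Python dictionary from elements of $A$ to elements of $B$; the helper names `is_injective`, `is_surjective`, `is_bijective` and `inverse` are hypothetical and are not part of these notes.

```python
def is_injective(f):
    """Distinct elements of the domain have distinct images."""
    images = list(f.values())
    return len(images) == len(set(images))


def is_surjective(f, codomain):
    """Every element of the codomain is the image of some element."""
    return set(f.values()) == set(codomain)


def is_bijective(f, codomain):
    """Both injective and surjective."""
    return is_injective(f) and is_surjective(f, codomain)


def inverse(f):
    """Inverse mapping; only meaningful when f is bijective."""
    return {b: a for a, b in f.items()}


B = {"p", "q", "r"}
f = {1: "q", 2: "r", 3: "p"}   # a one-to-one correspondence
g = {1: "p", 2: "p", 3: "q"}   # neither injective nor surjective onto B

print(is_bijective(f, B))
print(is_bijective(g, B))
print(inverse(inverse(f)) == f)
```

The last line illustrates that the inverse of a bijective mapping is again bijective, with the original mapping as its inverse.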

The $n$-dimensional real vector space will be denoted by $\mathbb{R}^n$, following the Bourbaki convention. It will be viewed as the space of real column vectors of length $n$, elements of which are

\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}; \]

but for notational convenience a vector $x \in \mathbb{R}^n$ sometimes will be indicated by listing its coordinates in the form $x = \{x_j\} = \{x_1, \ldots, x_n\}$. The origin in $\mathbb{R}^n$ is the vector $0 = \{0\} = \{0, 0, \ldots, 0\}$. When considering a function $f$, the notation $f(x_1, \ldots, x_n)$ also will be used when viewing coordinates of the vector $x$ as individual variables; but the vector $x$ still will be viewed as a column vector.


Addition is the usual addition of vectors by adding their coordinates; and scalar multiplication amounts to multiplying all the entries of the vector by a scalar, so that

\[ a x = \begin{pmatrix} a x_1 \\ \vdots \\ a x_n \end{pmatrix} \quad \text{for any } a \in \mathbb{R}. \]

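Linear mappings between vector spaces are described by matrix multiplication; for example, a $2 \times 3$ matrix $A = \{a_{ij}\}$ describes the mapping $A : \mathbb{R}^3 \longrightarrow \mathbb{R}^2$ that takes a vector $x = \{x_j\} \in \mathbb{R}^3$ to the vector

\[ Ax = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 \end{pmatrix} \in \mathbb{R}^2. \]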

There are various norms measuring the sizes or lengths of vectors in $\mathbb{R}^n$, only two of which will be considered here:

• the Euclidean norm or Cartesian norm of a vector $x = \{x_j\}$ is defined by $\|x\|_2 = \sqrt{\sum_{j=1}^n x_j^2} = \sqrt{x_1^2 + \cdots + x_n^2}$ (with the non-negative square root);

• the supremum norm or sup norm of a vector $x = \{x_j\}$ is defined by $\|x\|_\infty = \max_{1 \le j \le n} |x_j|$.
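The two norms just defined are easy to compute directly from their definitions. The following is a minimal sketch using only the Python standard library; the function names `euclidean_norm` and `sup_norm` are illustrative choices, and vectors are assumed to be given as lists of floats.

```python
import math


def euclidean_norm(x):
    """Non-negative square root of the sum of the squared coordinates."""
    return math.sqrt(sum(xj * xj for xj in x))


def sup_norm(x):
    """Largest absolute value among the coordinates."""
    return max(abs(xj) for xj in x)


x = [3.0, -4.0, 0.0]
print(euclidean_norm(x))  # 5.0
print(sup_norm(x))        # 4.0
```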

In general, a norm on the vector space $\mathbb{R}^n$ is a mapping $\mathbb{R}^n \longrightarrow \mathbb{R}$ that associates to any $x \in \mathbb{R}^n$ a real number $\|x\| \in \mathbb{R}$ and that has the following properties:

1. positivity: $\|x\| \ge 0$ and $\|x\| = 0$ if and only if $x = 0$;

2. homogeneity: $\|cx\| = |c|\,\|x\|$ for any $c \in \mathbb{R}$;

3. the triangle inequality: $\|x + y\| \le \|x\| + \|y\|$.

That the supremum norm satisfies these three properties is obvious, except perhaps for the triangle inequality; to verify that, if $x, y \in \mathbb{R}^n$ then since $|x_j| \le \|x\|_\infty$ and $|y_j| \le \|y\|_\infty$ for $1 \le j \le n$ it follows that $|x_j + y_j| \le |x_j| + |y_j| \le \|x\|_\infty + \|y\|_\infty$ for $1 \le j \le n$ and consequently that $\|x + y\|_\infty = \max_{1 \le j \le n} |x_j + y_j| \le \|x\|_\infty + \|y\|_\infty$. That the Euclidean norm satisfies these three properties also is obvious, except for the triangle inequality. It is convenient to demonstrate that together with an inequality involving the inner product of two vectors, which is defined by

\[ (x, y) = \sum_{j=1}^n x_j y_j \quad \text{for vectors } x = \{x_j\} \text{ and } y = \{y_j\} \in \mathbb{R}^n; \tag{1.1} \]

often this is written $(x, y) = x \cdot y$ and called the dot product of the two vectors $x$ and $y$. It is clear from its definition that the inner or dot product has the following properties:

1. linearity: $(c_1 x_1 + c_2 x_2, y) = c_1 (x_1, y) + c_2 (x_2, y)$ for any vectors $x_1, x_2 \in \mathbb{R}^n$ and any $c_1, c_2 \in \mathbb{R}$;

2. symmetry: $(x, y) = (y, x)$;

3. positivity: $(x, x) \ge 0$ and $(x, x) = 0$ if and only if $x = 0$.

It is apparent from symmetry that the inner product $(x, y)$ is also a linear function of the vector $y$. The Euclidean norm can be defined in terms of the inner product by

\[ \|x\|_2 = \sqrt{(x, x)} \quad \text{(with the non-negative square root)}, \tag{1.2} \]

as is also quite clear from the definition. Conversely the inner product can be defined in terms of the Euclidean norm by

\[ (x, y) = \tfrac{1}{4} \|x + y\|_2^2 - \tfrac{1}{4} \|x - y\|_2^2 \tag{1.3} \]

since

\begin{align*}
\|x + y\|_2^2 - \|x - y\|_2^2 &= (x + y, x + y) - (x - y, x - y) \\
&= \bigl( (x, x) + 2(x, y) + (y, y) \bigr) - \bigl( (x, x) - 2(x, y) + (y, y) \bigr) \\
&= 4(x, y);
\end{align*}

equation (1.3) is called the polarization identity.
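The polarization identity can be checked numerically on a sample pair of vectors. This is only a sanity check under floating-point arithmetic, not a proof; the helper names `inner`, `norm2` and `polarization` are illustrative.

```python
import math


def inner(x, y):
    """Inner product (1.1): sum of products of coordinates."""
    return sum(xj * yj for xj, yj in zip(x, y))


def norm2(x):
    """Euclidean norm (1.2), expressed through the inner product."""
    return math.sqrt(inner(x, x))


def polarization(x, y):
    """Right-hand side of (1.3), computed from Euclidean norms only."""
    plus = [xj + yj for xj, yj in zip(x, y)]
    minus = [xj - yj for xj, yj in zip(x, y)]
    return 0.25 * norm2(plus) ** 2 - 0.25 * norm2(minus) ** 2


x = [1.0, 2.0, 3.0]
y = [4.0, -5.0, 6.0]
print(inner(x, y), polarization(x, y))  # the two values agree up to rounding
```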

Theorem 1.1 (i) $|(x, y)| \le \|x\|_2 \|y\|_2$ for any $x, y \in \mathbb{R}^n$; and this is an equality if and only if the two vectors are linearly dependent.

(ii) $\|x + y\|_2 \le \|x\|_2 + \|y\|_2$ for any $x, y \in \mathbb{R}^n$; and this is an equality if and only if one of the two vectors is a non-negative multiple of the other.

Proof: If $x, y \in \mathbb{R}^n$ where $y \ne 0$ and $t \in \mathbb{R}$ introduce the continuous function $f(t)$ of the variable $t$ defined by

\begin{align*}
f(t) = \|x + ty\|_2^2 &= \sum_{j=1}^n (x_j + t y_j)^2 \tag{1.4} \\
&= \sum_{j=1}^n x_j^2 + 2t \sum_{j=1}^n x_j y_j + t^2 \sum_{j=1}^n y_j^2 \\
&= \|x\|_2^2 + 2t (x, y) + t^2 \|y\|_2^2.
\end{align*}

The function $f(t)$ becomes large for $|t|$ large, since $\|y\|_2 > 0$ by assumption, so it must take a minimum value at some point. Since

\[ f'(t) = 2(x, y) + 2t \|y\|_2^2 \]

there is actually just a single point at which $f'(t) = 0$, the point $t_0 = -\dfrac{(x, y)}{\|y\|_2^2}$, so this must be the point at which the function $f(t)$ takes its minimum value; and the minimum value is

\[ f(t_0) = \|x\|_2^2 + 2 \left( -\frac{(x, y)}{\|y\|_2^2} \right) (x, y) + \frac{(x, y)^2}{\|y\|_2^4} \|y\|_2^2 = \frac{\|x\|_2^2 \|y\|_2^2 - (x, y)^2}{\|y\|_2^2}. \tag{1.5} \]

It is clear from (1.4) that $f(t) \ge 0$ at all points $t \in \mathbb{R}$, so in particular at the point $t_0$, and that yields the inequality (i). It is clear from (1.5) that this inequality is an equality if and only if $f(t_0) = 0$, hence if and only if $x + t_0 y = 0$ since $f(t_0) = \|x + t_0 y\|_2^2$; and since $y \ne 0$ that is just the condition that the vectors $x$ and $y$ are linearly dependent. Next from (1.4) with $t = 1$ and from the inequality (i) it follows that

\[ \|x + y\|_2^2 = \|x\|_2^2 + 2(x, y) + \|y\|_2^2 \le \|x\|_2^2 + 2 \|x\|_2 \|y\|_2 + \|y\|_2^2 = \bigl( \|x\|_2 + \|y\|_2 \bigr)^2, \]

which yields the inequality (ii). This inequality is an equality if and only if $(x, y) = \|x\|_2 \|y\|_2$, or equivalently if and only if (i) is an equality and $(x, y) \ge 0$; and since $y \ne 0$ it follows from what has already been demonstrated that the inequality (i) is an equality if and only if $x = c\,y$ for some real number $c$, and then $(x, y) = c \|y\|_2^2 \ge 0$ if and only if $c \ge 0$. That suffices for the proof.
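The argument in the proof can be traced numerically: for a sample pair of vectors the quadratic $f(t) = \|x + ty\|_2^2$ is minimized at $t_0 = -(x, y)/\|y\|_2^2$, the minimum value matches the closed form $(\|x\|_2^2\|y\|_2^2 - (x, y)^2)/\|y\|_2^2$, and its non-negativity is the Cauchy-Schwarz inequality. The sketch below is a floating-point illustration only; the sample vectors and the helper names are arbitrary choices.

```python
import math


def inner(x, y):
    return sum(xj * yj for xj, yj in zip(x, y))


def f(x, y, t):
    """f(t) = ||x + t y||_2^2, computed coordinate by coordinate."""
    return sum((xj + t * yj) ** 2 for xj, yj in zip(x, y))


x = [1.0, 2.0, 3.0]
y = [4.0, -5.0, 6.0]          # must be nonzero

t0 = -inner(x, y) / inner(y, y)                      # the unique critical point
closed_form = (inner(x, x) * inner(y, y) - inner(x, y) ** 2) / inner(y, y)

print(f(x, y, t0), closed_form)                      # agree up to rounding
print(abs(inner(x, y)) <= math.sqrt(inner(x, x)) * math.sqrt(inner(y, y)))
```

Moving $t$ away from $t_0$ in either direction increases $f$, consistent with $t_0$ being the point where the minimum is taken.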

The very useful inequality (i) in the preceding theorem, called the Cauchy-Schwarz inequality, can be written

\[ \frac{|(x, y)|}{\|x\|_2 \|y\|_2} \le 1 \quad \text{if } x \ne 0,\ y \ne 0; \]

as a consequence there is an angle $\theta$, called the angle between the nonzero vectors $x$ and $y$, that is determined uniquely up to sign and a multiple of $2\pi$ by

\[ \cos \theta = \frac{(x, y)}{\|x\|_2 \|y\|_2}. \tag{1.6} \]

In particular if $\theta = 0$ or $\pi$ so $\cos \theta = \pm 1$ then by Theorem 1.1 (i) the two vectors $x$ and $y$ are linearly dependent; they are parallel and in the same direction if $\theta = 0$ so that $(x, y) > 0$, or parallel and in the opposite direction if $\theta = \pi$ so that $(x, y) < 0$. The two vectors are orthogonal or perpendicular to one another when the angle is either $\pi/2$ or $3\pi/2$ so that $(x, y) = 0$. The geometrical interpretation of the norm $\|x\|_2$ in terms of the inner product makes that norm particularly useful in many applications.
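The angle defined by (1.6) can be computed with the inverse cosine. The sketch below returns the representative of the angle in the interval $[0, \pi]$, which is the convention of `math.acos`; the clamping step is an assumption added here to guard against rounding that pushes the quotient slightly outside $[-1, 1]$.

```python
import math


def inner(x, y):
    return sum(xj * yj for xj, yj in zip(x, y))


def norm2(x):
    return math.sqrt(inner(x, x))


def angle(x, y):
    """Angle in [0, pi] between nonzero vectors x and y, from (1.6)."""
    c = inner(x, y) / (norm2(x) * norm2(y))
    c = max(-1.0, min(1.0, c))   # guard against floating-point overshoot
    return math.acos(c)


print(angle([1.0, 0.0], [0.0, 1.0]))    # orthogonal vectors: about pi/2
print(angle([1.0, 0.0], [-2.0, 0.0]))   # opposite directions: about pi
```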

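In general, two norms $\|x\|_a$ and $\|x\|_b$ on a vector space $\mathbb{R}^n$ are equivalent if there are positive constants $c_a$ and $c_b$ such that $\|x\|_a \le c_a \|x\|_b$ and $\|x\|_b \le c_b \|x\|_a$ for all $x \in \mathbb{R}^n$. The Euclidean and supremum norms are equivalent, since they are related by the very useful inequalities

\[ \|x\|_\infty \le \|x\|_2 \le \sqrt{n}\, \|x\|_\infty \quad \text{for any } x \in \mathbb{R}^n. \tag{1.7} \]

To verify these inequalities, if $x = \{x_j\} \in \mathbb{R}^n$ then $|x_j| \le \|x\|_\infty$ for $1 \le j \le n$ so

\[ \|x\|_2^2 = \sum_{j=1}^n x_j^2 \le \sum_{j=1}^n \|x\|_\infty^2 = n \|x\|_\infty^2 \quad \text{hence} \quad \|x\|_2 \le \sqrt{n}\, \|x\|_\infty; \]

on the other hand

\[ \|x\|_2^2 = \sum_{k=1}^n x_k^2 \ge x_j^2 \quad \text{so} \quad \|x\|_2 \ge |x_j| \text{ for } 1 \le j \le n \quad \text{hence} \quad \|x\|_2 \ge \|x\|_\infty. \]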
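The inequalities (1.7) can be spot-checked on sample vectors, and both bounds are attained: a coordinate vector gives equality on the left, and a vector with all coordinates equal gives equality on the right. The sketch below is a numerical illustration rather than a proof; the random samples, the fixed seed and the small tolerance `tol` are assumptions made for this example.

```python
import math
import random


def norm2(x):
    return math.sqrt(sum(xj * xj for xj in x))


def norm_inf(x):
    return max(abs(xj) for xj in x)


def check_equivalence(x, tol=1e-12):
    """Test (1.7) for one vector, allowing a tiny rounding tolerance."""
    n = len(x)
    return (norm_inf(x) <= norm2(x) + tol
            and norm2(x) <= math.sqrt(n) * norm_inf(x) + tol)


random.seed(0)
samples = [[random.uniform(-10.0, 10.0) for _ in range(n)]
           for n in range(1, 8) for _ in range(100)]
all_hold = all(check_equivalence(x) for x in samples)
print(all_hold)

e1 = [1.0, 0.0, 0.0, 0.0]     # equality in the left inequality
ones = [1.0, 1.0, 1.0, 1.0]   # equality in the right inequality (n = 4)
print(norm_inf(e1), norm2(e1))
print(norm2(ones), math.sqrt(4) * norm_inf(ones))
```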

It follows from (1.7) that whenever $\|x\|_2$ is small then $\|x\|_\infty$ is also small, and conversely; so for many purposes the two norms are interchangeable. If it is not really necessary to specify which norm is meant the notation $\|x\|$ will be used, meaning either $\|x\|_2$ or $\|x\|_\infty$. Of course some care must be taken; in particular $\|x\|$ should have the same meaning in any single equation, or usually throughout any single proof, for quite obvious reasons. Any norm can be used to define a distance function $d(x, y)$ by

\[ d(x, y) = \|x - y\|. \tag{1.8} \]

As an immediate consequence of the properties of norms it follows that $d(x, y) = 0$ if and only if $x = y$, that $d(x, y) = d(y, x)$ and that the distance function satisfies the triangle inequality $d(x, y) \le d(x, z) + d(z, y)$, so this notion of distance does have the expected properties. Equivalent norms determine distance functions that are equivalent in the corresponding sense, as is quite evident from the definitions.

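For some purposes it is convenient to view an $m \times n$ matrix

\[ A = \{ a_{ij} \mid 1 \le i \le m,\ 1 \le j \le n \} \in \mathbb{R}^{m \times n} \]

as a vector in $\mathbb{R}^{mn}$, and to consider its norm as a vector; thus

\[ \|A\|_\infty = \max_{\substack{1 \le i \le m \\ 1 \le j \le n}} |a_{ij}| \quad \text{and} \quad \|A\|_2 = \sqrt{\sum_{\substack{1 \le i \le m \\ 1 \le j \le n}} a_{ij}^2} \quad \text{(the nonnegative square root)}. \tag{1.9} \]

If $x \in \mathbb{R}^n$ then $Ax \in \mathbb{R}^m$, and since

\[ \Bigl| \sum_{j=1}^n a_{ij} x_j \Bigr| \le \sum_{j=1}^n |a_{ij}|\,|x_j| \le \sum_{j=1}^n \|A\|_\infty \|x\|_\infty = n \|A\|_\infty \|x\|_\infty \]

it follows that

\[ \|Ax\|_\infty = \max_{1 \le i \le m} \Bigl| \sum_{j=1}^n a_{ij} x_j \Bigr| \le n \|A\|_\infty \|x\|_\infty. \tag{1.10} \]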
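The bound (1.10) can be illustrated on a small example. In this sketch a matrix is assumed to be stored as a list of rows, and the entries are chosen as integers so that the arithmetic is exact; the names `mat_vec`, `sup_norm` and `mat_sup_norm` are illustrative.

```python
def mat_vec(A, x):
    """Matrix-vector product Ax for a matrix given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def sup_norm(v):
    return max(abs(vj) for vj in v)


def mat_sup_norm(A):
    """Supremum norm of the matrix viewed as a vector of m*n entries."""
    return max(abs(aij) for row in A for aij in row)


A = [[1, -2, 3],
     [0, 4, -1]]          # a 2 x 3 matrix, so n = 3
x = [1, -1, 2]

lhs = sup_norm(mat_vec(A, x))
rhs = len(x) * mat_sup_norm(A) * sup_norm(x)
print(lhs, rhs)  # 9 24
```

Here $Ax = (9, -6)$, so $\|Ax\|_\infty = 9$, while $n\|A\|_\infty\|x\|_\infty = 3 \cdot 4 \cdot 2 = 24$.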




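An $\epsilon$-neighborhood $N_\epsilon(a)$ of a point $a \in \mathbb{R}^n$, an open $\epsilon$-neighborhood to be more specific, is a subset of $\mathbb{R}^n$ defined by

\[ N_\epsilon(a) = \{ x \in \mathbb{R}^n \mid \|x - a\| < \epsilon \}; \tag{1.11} \]

and a closed $\epsilon$-neighborhood is defined correspondingly by

\[ \overline{N}_\epsilon(a) = \{ x \in \mathbb{R}^n \mid \|x - a\| \le \epsilon \}. \tag{1.12} \]

The boundary of an open or closed $\epsilon$-neighborhood is the closed set

\[ \partial N_\epsilon(a) = \partial \overline{N}_\epsilon(a) = \{ x \in \mathbb{R}^n \mid \|x - a\| = \epsilon \}. \tag{1.13} \]

If it is necessary or convenient to be specific, an open $\epsilon$-neighborhood in the Euclidean norm is denoted by $N_{\epsilon,2}(a)$ while an open $\epsilon$-neighborhood in the supremum norm is denoted by $N_{\epsilon,\infty}(a)$. It is clear from the inequality (1.7) that

\[ N_{\epsilon,2}(a) \subset N_{\epsilon,\infty}(a) \subset N_{\sqrt{n}\,\epsilon,2}(a) \quad \text{in } \mathbb{R}^n. \tag{1.14} \]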

The geometrical shapes of these neighborhoods depend on the norms used in their definitions; for example in the plane neighborhoods have the shapes as shown in Figure 1.1.

Figure 1.1: Neighborhoods $N_\epsilon(a) \subset \mathbb{R}^2$. [Panels: Euclidean norm; Supremum norm.]
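The chain of inclusions between Euclidean and supremum neighborhoods, $N_{\epsilon,2}(a) \subset N_{\epsilon,\infty}(a) \subset N_{\sqrt{n}\,\epsilon,2}(a)$, can be tested on a grid of sample points in the plane. The sketch below is an illustration only: the grid, the center $a = (0, 0)$, the radius $\epsilon = 1$ and the helper `in_ball` are choices made for this example.

```python
import math


def norm2(x):
    return math.sqrt(sum(c * c for c in x))


def norm_inf(x):
    return max(abs(c) for c in x)


def in_ball(x, a, eps, norm):
    """Membership in the open eps-neighborhood of a for the given norm."""
    return norm([xi - ai for xi, ai in zip(x, a)]) < eps


a = (0.0, 0.0)
eps = 1.0
n = 2
# Grid of points with coordinates -2, -1.75, ..., 2 (exact in binary).
grid = [(-2 + 0.25 * i, -2 + 0.25 * j) for i in range(17) for j in range(17)]

inclusions_hold = all(
    (not in_ball(x, a, eps, norm2) or in_ball(x, a, eps, norm_inf))
    and (not in_ball(x, a, eps, norm_inf)
         or in_ball(x, a, math.sqrt(n) * eps, norm2))
    for x in grid
)
print(inclusions_hold)

# A point of the square neighborhood that lies outside the round one,
# so the first inclusion is proper.
witness = (0.75, 0.75)
print(in_ball(witness, a, eps, norm_inf), in_ball(witness, a, eps, norm2))
```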

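Other useful subsets of $\mathbb{R}^n$ are open cells, subsets of the form

\[ \Delta = \{ x = \{x_j\} \in \mathbb{R}^n \mid a_j < x_j < b_j \text{ for } 1 \le j \le n \} \tag{1.15} \]

for arbitrary real numbers $a_j < b_j$, and closed cells, subsets of the form

\[ \overline{\Delta} = \{ x = \{x_j\} \in \mathbb{R}^n \mid a_j \le x_j \le b_j \text{ for } 1 \le j \le n \} \tag{1.16} \]

for arbitrary real numbers $a_j \le b_j$. Closed cells may have $a_j = b_j$ for some indices $j$, while of course open cells for which $a_j = b_j$ for any index $j$ can only be the empty set. The boundary of an open or closed cell is the set

\[ \partial \Delta = \partial \overline{\Delta} = \left\{ x \in \mathbb{R}^n \;\middle|\; \begin{array}{l} a_j \le x_j \le b_j \text{ for } 1 \le j \le n, \text{ and} \\ \text{at least one inequality is an equality} \end{array} \right\}. \tag{1.17} \]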




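A customary notation for cells in $\mathbb{R}^1$, which are also called intervals, is

\[ (a, b) = \{ x \in \mathbb{R}^1 \mid a < x < b \} \quad \text{and} \quad [a, b] = \{ x \in \mathbb{R}^1 \mid a \le x \le b \}; \tag{1.18} \]

these can be mixed, as for instance $(a, b] = \{ x \in \mathbb{R}^1 \mid a < x \le b \}$.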
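Membership in open cells, closed cells and their boundaries follows directly from the coordinate inequalities in the definitions. The following sketch assumes a cell is described by two tuples `a` and `b` of lower and upper bounds; the function names are hypothetical.

```python
def in_open_cell(x, a, b):
    """a_j < x_j < b_j for every coordinate."""
    return all(aj < xj < bj for xj, aj, bj in zip(x, a, b))


def in_closed_cell(x, a, b):
    """a_j <= x_j <= b_j for every coordinate."""
    return all(aj <= xj <= bj for xj, aj, bj in zip(x, a, b))


def on_cell_boundary(x, a, b):
    """In the closed cell, with at least one inequality an equality."""
    return in_closed_cell(x, a, b) and any(
        xj == aj or xj == bj for xj, aj, bj in zip(x, a, b)
    )


a, b = (0.0, 0.0), (1.0, 2.0)
print(in_open_cell((0.5, 1.0), a, b))      # interior point
print(on_cell_boundary((0.0, 1.0), a, b))  # lies on the face x_1 = 0
print(in_closed_cell((1.5, 1.0), a, b))    # outside the cell
```

With these definitions a point of the closed cell is either in the open cell or on the boundary, never both.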

1.2 Topological Preliminaries

Some familiarity with the basic topological notions for subsets of the real line is assumed; but a brief review is included here, focusing on the extension of these notions to subsets of $\mathbb{R}^n$, and it may be a sufficient introduction even for those seeing these concepts for the first time. However if these notions are not entirely familiar it is advisable as an additional exercise to prove explicitly and in detail those statements that are labeled “clear” or “evident” in the discussion here.

The basic topological property of the real number system is its completeness, the property that any nonempty set $S$ of real numbers that is bounded from above has a least upper bound or supremum, denoted by $\sup S$, or equivalently that any nonempty set $S$ of real numbers that is bounded from below has a greatest lower bound or infimum, denoted by $\inf S$; an alternative statement of this property is that any Cauchy sequence of real numbers converges to a real limit.

There are many other topological notions and terms that are in common use. A subset $U \subset \mathbb{R}^n$ is said to be open if for each $a \in U$ there is an $\epsilon > 0$ such that $N_\epsilon(a) \subset U$. It is clear from the inequalities (1.7) that the condition that a set be open is independent of which of the two norms $\|x\|_2$ or $\|x\|_\infty$ is used to define the neighborhood $N_\epsilon(a)$. Intuitively a set $U \subset \mathbb{R}^n$ is open if for any point $a \in U$ all points that are near enough to $a$ are also in the set $U$. For example an open $\epsilon$-neighborhood $N_\epsilon(a)$ of a point $a \in \mathbb{R}^n$ and an open cell $\Delta \subset \mathbb{R}^n$ are open subsets. An open subset of $\mathbb{R}^n$ containing a point $a$ is often called an open neighborhood of the point $a$; an open $\epsilon$-neighborhood $N_\epsilon(a)$ is thus an open neighborhood of the point $a$ in this sense as well. The collection $\mathcal{T}$ of open subsets of $\mathbb{R}^n$ has the following characteristic properties:

1. if $U_\alpha \in \mathcal{T}$ then $\bigcup_\alpha U_\alpha \in \mathcal{T}$;

2. if $U_i \in \mathcal{T}$ for $1 \le i \le N$ for some $N$ then $\bigcap_{i=1}^N U_i \in \mathcal{T}$;

3. $\emptyset \in \mathcal{T}$ and $\mathbb{R}^n \in \mathcal{T}$.

These properties can be summarized in the statement that an arbitrary union of open sets is open, a finite intersection of open sets is open, and the empty set and the set $\mathbb{R}^n$ itself are open. That the open subsets of $\mathbb{R}^n$ satisfy these three conditions is quite clear. It is also clear that an infinite intersection of open sets is not necessarily open; for instance $\bigcap_{\nu=1}^\infty N_{1/\nu}(a) = \{a\}$ and a single point is not open. A collection $\mathcal{T}$ of subsets of an arbitrary abstract set $S$ having these three characteristic properties is said to be a topology on the set $S$, and the sets $U \in \mathcal{T}$ are said to be the open sets in this topology. For many purposes the topological properties of a space can be described just in terms of the collection of open sets, rather than in terms of a norm or distance function on the space.




A point $a \in \mathbb{R}^n$ is a limit point of a subset $E \subset \mathbb{R}^n$ if for any $\epsilon > 0$ the intersection $N_\epsilon(a) \cap E$ contains points of $E$ other than $a$; clearly that is equivalent to the condition that for all $\epsilon > 0$ the intersection $N_\epsilon(a) \cap E$ is an infinite set of points. The set of limit points of $E$ is denoted by $E'$ and is called the derived set of $E$. The set $E$ is said to be closed if $E' \subset E$. For example a closed neighborhood $\overline{N}_\epsilon(a)$ of a point $a \in \mathbb{R}^n$ and a closed cell $\overline{\Delta}$ in $\mathbb{R}^n$ are closed subsets. The closure of a set $E$, denoted by $\overline{E}$, is defined to be the union $\overline{E} = E \cup E'$, and is readily seen to be a closed set. A closed neighborhood $\overline{N}_\epsilon(a)$ of a point $a \in \mathbb{R}^n$ is the closure of the open neighborhood $N_\epsilon(a)$ and a closed cell $\overline{\Delta}$ in $\mathbb{R}^n$ is the closure of the open cell $\Delta$ provided that $\Delta \ne \emptyset$. A basic topological result is that a subset $E \subset \mathbb{R}^n$ is closed if and only if its complement $F = \mathbb{R}^n \sim E$ is open. To verify that, if $E$ is closed and $a \notin E$ then $a \notin E'$ so $a$ is not a limit point of $E$; hence there must be some $\epsilon > 0$ such that $N_\epsilon(a) \cap E = \emptyset$; consequently $N_\epsilon(a) \subset F$ so $F$ is open. Conversely if $E$ is open, with complement $F = \mathbb{R}^n \sim E$, then no point of $E$ can be a limit point of $F$, since for any point $a \in E$ there is an $\epsilon > 0$ such that $N_\epsilon(a) \subset E$ hence that $N_\epsilon(a) \cap F = \emptyset$; consequently all the limit points of $F$ are contained in $F$ so $F$ is closed. It is clear that the closed sets have the properties that a finite union of closed sets is closed, that any intersection of closed sets is closed, and that the empty set and the full set $\mathbb{R}^n$ are both closed; but an arbitrary union of closed sets is not necessarily closed, since any set can be written as a union of its points and each point is a closed set. A topology on a space can be defined equivalently by giving a collection of sets satisfying these three conditions and defining the open sets to be their complements, although it is customary to define topologies directly in terms of the open sets.

A subset $S \subset \mathbb{R}^n$ can be viewed as a set in itself, without regard to the ambient space $\mathbb{R}^n$ in which it is contained; in a sense that amounts to considering the set $S$ itself as the entire universe, rather than viewing it as a subset contained in the universe $\mathbb{R}^n$. The restriction of a norm on $\mathbb{R}^n$ to the subset $S$ can be used to describe a distance function on the set $S$ itself, and open and closed subsets of $S$ then can be defined in terms of this distance when an $\epsilon$-neighborhood of a point $a$ in the set $S$ is taken to be the intersection $N_\epsilon(a) \cap S$. Thus a subset $E \subset S$ is said to be relatively open if for any point $a \in E$ there is an $\epsilon > 0$ such that $N_\epsilon(a) \cap S \subset E$, where $N_\epsilon(a) \cap S$ consists of those points of $S$ that are at distance less than $\epsilon$ from the point $a$. A point $a \in S$ is a relative limit point of $E$ in $S$ if for any $\epsilon > 0$ the intersection $N_\epsilon(a) \cap S$ contains a point of $E$ other than $a$, or equivalently, if for any $\epsilon > 0$ the intersection $N_\epsilon(a) \cap E$ is infinite; and that is just the condition that $a \in E' \cap S$ where $E'$ is the derived set of $E$ as a subset of $\mathbb{R}^n$, so a relative limit point of $E$ is a limit point of $E$ in the ordinary sense but one that is also contained in $S$. A subset $E \subset S$ is said to be relatively closed if it contains all its relative limit points. Clearly a relatively open subset of $S$ need not be an open subset of $\mathbb{R}^n$; if $S$ is a coordinate axis in the plane $\mathbb{R}^2$ an open interval of $S$ is relatively open in $S$ but is not open in $\mathbb{R}^2$. Equally clearly a relatively closed subset of $S$ need not be a closed subset of $\mathbb{R}^n$; for although $S \subset \mathbb{R}^n$ may not be closed in $\mathbb{R}^n$ it is still relatively closed in itself. However a subset $E \subset S$ is relatively open if and only if it is the intersection $E = U \cap S$ of $S$ with an open subset $U \subset \mathbb{R}^n$, and is relatively closed if and only if it is the intersection $E = C \cap S$ of $S$ with a closed subset $C \subset \mathbb{R}^n$. Indeed it is clear that if $E = U \cap S$ where $U \subset \mathbb{R}^n$ is open then $E$ is relatively open; and if $E \subset S$ is relatively open then by definition for each point $a \in E$ there is an $\epsilon_a > 0$ such that $N_{\epsilon_a}(a) \cap S \subset E$, and then $U = \bigcup_{a \in E} N_{\epsilon_a}(a)$ is an open subset of $\mathbb{R}^n$ such that $E = U \cap S$. Similarly if $E = C \cap S$ for some closed subset $C \subset \mathbb{R}^n$ then $E' \subset C' \subset C$ so $E' \cap S \subset C \cap S = E$, which shows that $E$ is relatively closed; and if $E \subset S$ is a relatively closed subset of $S$ then $C = E \cup E'$ is a closed subset of $\mathbb{R}^n$ and $C \cap S = (E \cap S) \cup (E' \cap S) \subset E$, hence $C \cap S = E$.

The boundary of a subset $E \subset \mathbb{R}^n$ is the closed set $\partial E = \overline{E} \cap \overline{(\sim E)}$. The boundaries of neighborhoods $N_\epsilon(a)$ and of cells $\Delta$ as defined earlier are also their boundaries in this sense. For more general sets some caution is necessary, since the boundary may not always correspond to what naively might be viewed as the boundary of the set; for example if $E$ is the set of all points in an open cell $\Delta$ that have rational coordinates then $\partial E = \overline{\Delta}$.

Two subsets $E, F \subset \mathbb{R}^n$ are separated if $\overline{E} \cap F = E \cap \overline{F} = \emptyset$. This is somewhat stronger than that the two sets are disjoint; for example $E = [0, 1)$ and $F = [1, 2]$ are disjoint subsets of $\mathbb{R}$ but are not separated since $\overline{E} \cap F = \{1\}$, but $E = [0, 1)$ and $G = (1, 2]$ are separated subsets of $\mathbb{R}$. A subset $S \subset \mathbb{R}^n$ is connected if it cannot be written as the union of two nonempty separated subsets. In terms of open sets alone, a topological space $S$, such as a subset of $\mathbb{R}^n$ with the relative topology, is connected if it cannot be written as a disjoint union of two nonempty relatively open subsets of $S$, or equivalently if there is no subset of $S$ that is both relatively open and relatively closed, other than the empty set and all of $S$. The equivalence of these conditions is quite obvious. It is easy to see that the entire space $\mathbb{R}^n$ is connected. Indeed if $U \subset \mathbb{R}^n$ is a subset other than the empty set or all of $\mathbb{R}^n$ that is both open and closed, choose a point $a \in U$ and a point $b \in \mathbb{R}^n \sim U$. The set of real numbers $s$ such that $a + t(b - a) \in U$ for $0 \le t < s$ is nonempty, since $U$ is open, and it is bounded above since $b \notin U$, so this set has a least upper bound $s_0$, one of the characteristic properties of the real number system. Since $U$ is closed it must be the case that $a + s_0(b - a) \in U$, for this is the limit of the points $a + s(b - a) \in U$ as $s \to s_0$ from below; but then since $U$ is open it also must be the case that $a + s(b - a) \in U$ for all real numbers $s$ in some interval $s_0 \le s < s_0 + \delta$, contradicting the choice of $s_0$ as a least upper bound. The same argument shows that an open neighborhood $N_r(a)$ and an open cell in $\mathbb{R}^n$ are connected sets. On the other hand the subset $\mathbb{Q} \subset \mathbb{R}$ of rational points with the relative topology it inherits from $\mathbb{R}$ is not a connected set; indeed the set $[\sqrt{2}, \sqrt{3}] \cap \mathbb{Q}$ is a nonempty proper subset of $\mathbb{Q}$ that is both relatively open and relatively closed.

An open covering of a subset $E \subset \mathbb{R}^n$ is a collection of open sets $\{U_\alpha\}$, not necessarily a countable collection, such that $E \subset \bigcup_\alpha U_\alpha$. If some of the sets $U_\alpha$ are redundant they can be eliminated and the remaining sets are also an open covering of $E$, called a subcovering of the set $E$. A set $E$ is said to be compact if every open covering of $E$ has a finite subcovering, that is, if for any open covering $\{U_\alpha\}$ of $E$ finitely many of the sets $U_\alpha$ actually cover all of $E$. This is a rather subtle notion, but is very important and frequently used. An example of a non-compact subset of $\mathbb{R}^n$ is the nonempty open neighborhood $N_1(0)$ of the origin in $\mathbb{R}^n$; this set is covered by the open subsets $U_\nu = N_{1 - \frac{1}{\nu}}(0)$ for $\nu = 1, 2, 3, \ldots$, but the union of any finite collection of these subsets will just be the set $U_\nu$ for the largest $\nu$ in the collection, and that is a proper subset of $N_1(0)$. An open cell is also noncompact, for essentially the same reason. A closed cell is an example of a compact subset; but the proof is a bit subtler and rests on the basic topological properties of the real number system, just as did the proof that $\mathbb{R}^n$ is connected. For the proof it is convenient to define the edgesize of an open or closed cell $\Delta = \{ x = \{x_j\} \in \mathbb{R}^n \mid a_j \le x_j \le b_j \} \subset \mathbb{R}^n$ to be the nonnegative number $d(\Delta) = \max_{1 \le j \le n} (b_j - a_j)$.
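The non-compactness example above can be made concrete in dimension one, where $N_1(0)$ is the interval $(-1, 1)$ and $U_\nu = N_{1 - 1/\nu}(0)$. For any finite list of indices the sketch below produces a point of $(-1, 1)$ that none of the chosen sets contains. The function names and the particular choice of witness point are assumptions made for illustration.

```python
def covered(x, nus):
    """True if x lies in U_nu = N_{1 - 1/nu}(0) for some nu in the list."""
    return any(abs(x) < 1.0 - 1.0 / nu for nu in nus)


def uncovered_point(nus):
    """A point of N_1(0) in R^1 missed by every U_nu with nu in the list.

    The union of finitely many of the sets is U_nu for the largest nu,
    so any point between 1 - 1/nu_max and 1 will do.
    """
    nu_max = max(nus)
    return 1.0 - 1.0 / (2 * nu_max)


nus = [2, 5, 10]
x = uncovered_point(nus)
print(abs(x) < 1.0)        # x belongs to N_1(0)
print(covered(x, nus))     # but to none of the chosen sets
print(covered(x, [21]))    # a set with larger index does contain x
```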

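Lemma 1.2 If $\overline{\Delta}_\nu \subset \mathbb{R}^n$ are closed cells in $\mathbb{R}^n$ for $\nu = 1, 2, \ldots$ such that $\overline{\Delta}_{\nu+1} \subset \overline{\Delta}_\nu$ and $\lim_{\nu \to \infty} d(\overline{\Delta}_\nu) = 0$ then $\bigcap_\nu \overline{\Delta}_\nu$ is a single point of $\mathbb{R}^n$.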

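Proof: If $\overline{\Delta}_\nu = \{ x = \{x_j\} \mid a_j^\nu \le x_j \le b_j^\nu \}$ then for each $j$ clearly $a_j^{\nu+1} \ge a_j^\nu$ and $b_j^{\nu+1} \le b_j^\nu$; and since $\overline{\Delta}_\nu \subset \overline{\Delta}_1$ it is also the case that $a_j^\nu \le b_j^1$ and $b_j^\nu \ge a_j^1$. The basic completeness property of the real number system implies that any increasing sequence of real numbers bounded from above and any decreasing sequence of real numbers bounded from below have limiting values; therefore $\lim_{\nu \to \infty} a_j^\nu = a_j$ and $\lim_{\nu \to \infty} b_j^\nu = b_j$ for some uniquely determined real numbers $a_j, b_j$, and it is clear that the cell $\overline{\Delta} = \{ x = \{x_j\} \mid a_j \le x_j \le b_j \}$ is contained in the intersection $\bigcap_\nu \overline{\Delta}_\nu$; indeed it is the whole intersection, since any point $x$ of the intersection satisfies $a_j^\nu \le x_j \le b_j^\nu$ for all $\nu$ and hence $a_j \le x_j \le b_j$. On the other hand since $(b_j - a_j) \le (b_j^\nu - a_j^\nu) \le d(\overline{\Delta}_\nu)$ and $\lim_{\nu \to \infty} d(\overline{\Delta}_\nu) = 0$ it must be the case that $b_j = a_j$ so the limiting cell $\overline{\Delta}$ is just a single point of $\mathbb{R}^n$, which concludes the proof.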

Theorem 1.3 A closed cell in $\mathbb{R}^n$ is compact.

Proof: If a closed cell $\overline{\Delta} = \{ x = \{x_j\} \mid a_j \le x_j \le b_j \}$ is not compact there is an open covering $\{U_\alpha\}$ of $\overline{\Delta}$ that does not admit any finite subcovering. The cell $\overline{\Delta}$ can be written as the union of the closed cells arising from bisecting each of its sides. If finitely many of the sets $\{U_\alpha\}$ covered each of the subcells then finitely many would cover the entire set $\overline{\Delta}$, which is not the case; hence at least one of the subcells cannot be covered by finitely many of the sets $\{U_\alpha\}$. Then bisect each of the sides of that subcell, and repeat the process. The result is that there is a collection of closed cells $\overline{\Delta}_\nu$ which cannot be covered by finitely many of the open sets $\{U_\alpha\}$ and for which $\overline{\Delta}_{\nu+1} \subset \overline{\Delta}_\nu \subset \overline{\Delta}$, and $\lim_{\nu \to \infty} d(\overline{\Delta}_\nu) = 0$. It then follows from the preceding lemma that $\bigcap_\nu \overline{\Delta}_\nu = a$, a single point in $\mathbb{R}^n$. This point must be contained within one of the sets $U_{\alpha_0}$, and since $U_{\alpha_0}$ is open and the edgesizes tend to zero, if $\nu$ is sufficiently large then $\overline{\Delta}_\nu \subset U_{\alpha_0}$ as well; but that is a contradiction, since the cells $\overline{\Delta}_\nu$ were chosen so that none of them could be covered by finitely many of the sets $\{U_\alpha\}$. Therefore the cell $\overline{\Delta}$ is compact, which concludes the proof.
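The repeated bisection used in this proof produces nested closed cells whose edgesize is halved at every step. The sketch below imitates only that geometric part of the argument. In the proof the subcell kept at each step is one that cannot be covered by finitely many of the open sets; here, as a stand-in assumption, the subcell containing a fixed hypothetical `target` point is kept instead. Cells are represented as lists of `(a_j, b_j)` pairs.

```python
def edge_size(cell):
    """d(cell) = max over j of (b_j - a_j)."""
    return max(b - a for a, b in cell)


def contains(cell, x):
    return all(a <= xj <= b for (a, b), xj in zip(cell, x))


def bisect_toward(cell, target):
    """Bisect every side and keep the closed subcell containing target."""
    sub = []
    for (a, b), t in zip(cell, target):
        mid = (a + b) / 2
        sub.append((a, mid) if t <= mid else (mid, b))
    return sub


cell = [(0.0, 1.0), (0.0, 2.0)]
target = (0.3, 1.7)            # hypothetical point guiding the choice

cells = [cell]
for _ in range(10):
    cells.append(bisect_toward(cells[-1], target))

print([edge_size(c) for c in cells[:4]])  # [2.0, 1.0, 0.5, 0.25]
print(contains(cells[-1], target))
```

The cells are nested, their edgesizes tend to zero, and the target point lies in every one of them, which is the situation described by Lemma 1.2.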

Theorem 1.4 A closed subset of a compact set is compact.

Proof: Suppose that $E \subset F$ where $F$ is compact and $E$ is closed, and that $\{U_\alpha\}$ is an open covering of the set $E$. The sets $U_\alpha$ together with the open set $\mathbb{R}^n \sim E$ form a covering of the compact set $F$, so finitely many of these sets cover $F$ hence also cover $E$. Clearly the set $\mathbb{R}^n \sim E$ covers none of the points of $E$, so the remaining finitely many sets $U_\alpha$ necessarily cover $E$. Therefore $E$ is compact, which suffices for the proof.




Theorem 1.5 (Heine-Borel Theorem) A subset $E \subset \mathbb{R}^n$ is compact if and only if it is closed and bounded.

Proof: A bounded subset $E \subset \mathbb{R}^n$ is contained in a sufficiently large closed cell $\overline{\Delta} \subset \mathbb{R}^n$; the set $\overline{\Delta}$ is compact by Theorem 1.3, so if $E$ is also closed then by Theorem 1.4 it is compact. Conversely suppose that $E$ is a compact set. If $E$ is not bounded it can be covered by the collection of open neighborhoods $N_\nu(0)$ for $\nu = 1, 2, \ldots$, but it is not covered by any finite set of these neighborhoods; that contradicts the compactness of $E$, so a compact set is bounded. If $E$ is not closed then there is a limit point $a \in E'$ that is not contained in $E$. Each closed neighborhood $\overline{N}_{1/\nu}(a)$ of the point $a$ must contain a point of $E$, since $a \in E'$, but the intersection of all of these neighborhoods consists of the point $a$ itself, which is not contained in $E$. The complements $U_\nu = \mathbb{R}^n \sim \overline{N}_{1/\nu}(a)$ then form an open covering of $E$, but no finite number of these sets suffice to cover $E$ since the union of finitely many such sets must be one of the sets $U_\nu$ and hence does not cover the points of $E$ contained in $\overline{N}_{1/\nu}(a)$; and that again is a contradiction, which suffices to conclude the proof.

Theorem 1.6 (Casorati-Weierstrass Theorem) A subset $E \subset \mathbb{R}^n$ is compact if and only if every sequence of distinct points in $E$ has a limit point in $E$.

Proof: Suppose that $E$ is compact. If $\{a_\nu\} \subset E$ is a collection of distinct points of $E$ with no limit points in $E$ then the points $a_\nu$ can have no limit points in $\mathbb{R}^n$, for since $E$ is closed these limit points would necessarily lie in $E$; in particular the set $\bigcup_\nu a_\nu$ is a closed set. Each point $a_\nu$ has an open neighborhood $U_\nu$ that contains none of the other points, since otherwise $a_\nu$ would itself be a limit point of this collection of points. These open sets $U_\nu$ together with the open set $\mathbb{R}^n \sim (\bigcup_\nu a_\nu)$ form an open covering of $E$; and since $E$ is compact finitely many of these sets already cover $E$. That is a contradiction, since the set $\mathbb{R}^n \sim (\bigcup_\nu a_\nu)$ does not cover any of the points $a_\nu$ and no finite collection of the sets $U_\nu$ covers all the points $a_\nu$. On the other hand, suppose that $E \subset \mathbb{R}^n$ is not compact. Then by the Heine-Borel Theorem either $E$ is not bounded or $E$ is not closed. If $E$ is not bounded it must contain a sequence of distinct points $a_\nu$ such that $\|a_\nu\|$ is a strictly increasing sequence of real numbers with no finite limit, and this sequence can have no limit point in $E$. On the other hand if $E$ is not closed it contains a sequence of distinct points $a_\nu$ converging to a limit point of $E$ that is not contained in $E$, and such a sequence has no limit point in $E$. That suffices to conclude the proof.

As already noted, for any set $S \subset \mathbb{R}^n$ a subset $E \subset S$ that is relatively open in $S$ need not be open in $\mathbb{R}^n$ and a subset $F \subset S$ that is relatively closed in $S$ need not be closed in $\mathbb{R}^n$. However a subset $E \subset S$ is compact in the relative topology of $S$, in which the definition of compactness is expressed in terms of open coverings by relatively open sets in $S$, if and only if $E$ is compact as a subset of $\mathbb{R}^n$. Indeed that is quite clear, since the relatively open sets $A_\alpha$ in $S$ are the intersections $A_\alpha = S \cap U_\alpha$ of $S$ with open subsets $U_\alpha$ in $\mathbb{R}^n$ so that $E \subset \bigcup_\alpha A_\alpha$ if and only if $E \subset \bigcup_\alpha U_\alpha$. Thus the property that a set be compact is a much more intrinsic property of that set than that it be either open or closed. Note carefully though that the Heine-Borel Theorem only holds for subsets of the entire space $\mathbb{R}^n$; the proof quite explicitly rested on properties of closed cells in $\mathbb{R}^n$, and it is clear that a subset of an open cell $\Delta \subset \mathbb{R}^n$ can be relatively closed and bounded in $\Delta$ but not closed in $\mathbb{R}^n$ hence not compact. However it is clear that the Casorati-Weierstrass Theorem holds in any subset $S \subset \mathbb{R}^n$ in terms of the relative topology of $S$, since the limit points of a subset $E \subset S$ are just the points of the intersection $E' \cap S$.

1.3 Continuous Mappings

A mapping $f$ from a subset $U \subset \mathbb{R}^m$ to a subset $V \subset \mathbb{R}^n$ associates to each point $x = \{x_j\} \in U$ a point $f(x) = y = \{y_j\} \in V$; the coordinates $y_j$ depend on the point $x$ so can be viewed as given by functions $y_j = f_j(x)$, which are the coordinate functions of the mapping $f$. The mapping $f$ is continuous at a point $a \in U$ if for every $\epsilon > 0$ there is a $\delta > 0$ such that $\|f(a) - f(x)\| \le \epsilon$ whenever $x \in U$ and $\|x - a\| \le \delta$, or equivalently, for every $\epsilon > 0$ there is a $\delta > 0$ such that $f(x) \in V \cap N_\epsilon(f(a))$ whenever $x \in U \cap N_\delta(a)$. It is clear from the inequalities (1.7) that in the definition of continuity the norm $\|x\|$ can be either the Euclidean norm $\|x\|_2$ or the supremum norm $\|x\|_\infty$. It is also clear that the mapping $f$ is continuous at a point $a = \{a_j\}$ if and only if each of its coordinate functions $f_j$ is continuous at the point $a = \{a_j\}$. The mapping is said to be continuous in the subset $U$ if it is continuous at each point of $U$.

Theorem 1.7 A mapping $f : S \longrightarrow \mathbb{R}^n$ from a subset $S \subset \mathbb{R}^m$ into $\mathbb{R}^n$ is continuous in $S$ if and only if $f^{-1}(U)$ is a relatively open subset of $S$ for any open subset $U \subset \mathbb{R}^n$.

Proof: If $f : S \longrightarrow \mathbb{R}^n$ is continuous, $U \subset \mathbb{R}^n$ is an open subset and $a \in f^{-1}(U)$ then $f(a) = b \in U$; and since $U$ is open there is an open neighborhood $N_\epsilon(b)$ of the point $b$ such that $N_\epsilon(b) \subset U$. Since $f$ is continuous there is a $\delta > 0$ such that $f(x) \in N_\epsilon(b) \subset U$ whenever $x \in S \cap N_\delta(a)$, so that $S \cap N_\delta(a) \subset f^{-1}(U)$; that is the case for any point $a \in f^{-1}(U)$, hence $f^{-1}(U)$ is a relatively open subset of $S$. On the other hand if $f^{-1}(U)$ is a relatively open subset of $S$ for any open subset $U \subset \mathbb{R}^n$ then for any point $a \in S$ with image $f(a) = b \in \mathbb{R}^n$ and for any $\epsilon > 0$ the set $f^{-1}(N_\epsilon(b))$ is a relatively open subset of $S$ containing $a$, so $S \cap N_\delta(a) \subset f^{-1}(N_\epsilon(b))$ for some $\delta > 0$; thus $f(x) \in N_\epsilon(b)$ whenever $x \in S \cap N_\delta(a)$, and consequently $f$ is continuous at the point $a$. That suffices for the proof.

Corollary 1.8 A mapping $f : S \longrightarrow \mathbb{R}^n$ from a subset $S \subset \mathbb{R}^m$ into $\mathbb{R}^n$ is continuous in $S$ if and only if $f^{-1}(E)$ is a closed subset of $S$ in the relative topology of $S$ for any closed subset $E \subset \mathbb{R}^n$.

Proof: This is an immediate consequence of the preceding theorem, since a subset of $S$ is closed in the relative topology of $S$ if and only if its complement in $S$ is relatively open and $f^{-1}(\mathbb{R}^n \sim E) = S \sim f^{-1}(E)$ for any subset $E \subset \mathbb{R}^n$. That suffices for the proof.

These results show that the continuity of a mapping in a set $S$ really is a property of the topology of $S$. A simple consequence of either result is that if $g : U \longrightarrow V$ and $f : V \longrightarrow W$ are continuous mappings between subsets $U, V, W$ of some Euclidean spaces then the composition $f \circ g : U \longrightarrow W$ that takes a point $x \in U$ to the point $(f \circ g)(x) = f\bigl(g(x)\bigr)$ is also continuous; indeed if $E \subset W$ is open then $f^{-1}(E)$ is open since $f$ is continuous and $(f \circ g)^{-1}(E) = g^{-1}\bigl(f^{-1}(E)\bigr)$ is open since $g$ is continuous, and consequently $f \circ g$ is continuous. The results in Theorem 1.7 and Corollary 1.8 involve the inverse image of a set under a mapping $f$; the image of an open set under a continuous mapping is not necessarily open, and the image of a closed set under a continuous mapping is not necessarily closed. For instance if $f : \mathbb{R} \longrightarrow \mathbb{R}$ is the continuous mapping $f(x) = e^{-x^2}$ then $f(\mathbb{R}) = \{\, x \mid 0 < x \le 1 \,\}$, which is neither open nor closed although $\mathbb{R}$ itself is both open and closed.

Theorem 1.9 If $f : S \longrightarrow \mathbb{R}^n$ is a continuous mapping from a subset $S \subset \mathbb{R}^m$ into $\mathbb{R}^n$ then the image $f(E)$ of any compact subset $E \subset S$ is a compact subset of $\mathbb{R}^n$.

Proof: If $E \subset S$ is compact and $f(E)$ is contained in a union of open sets $U_\alpha$ then $E$ is contained in the union of the relatively open sets $f^{-1}(U_\alpha)$; and since $E$ is compact it is contained in a union of finitely many of the sets $f^{-1}(U_\alpha)$, so $f(E)$ is contained in the union of the corresponding finitely many open sets $U_\alpha$. That suffices for the proof.

Since a compact subset of $\mathbb{R}^n$ is closed, by the Heine-Borel Theorem, the inverse image under a continuous mapping of a compact set is necessarily closed in the relative topology of the domain, by Corollary 1.8; but it is not necessarily compact. For example the inverse image of the set $[-1, 1] \subset \mathbb{R}$ under the mapping $f : \mathbb{R} \longrightarrow \mathbb{R}$ given by $f(x) = \sin x$ is an unbounded set hence is not compact.

Corollary 1.10 If $f$ is a continuous function on a compact set $U \subset \mathbb{R}^n$ then there are points $a, b \in U$ such that

(1.19) $f(a) = \sup_{x \in U} f(x)$ and $f(b) = \inf_{x \in U} f(x)$.

Proof: The image $f(U) \subset \mathbb{R}$ is compact by the preceding theorem, hence is a closed set; so if $\alpha = \sup_{x \in U} f(x)$ then since $\alpha$ is a limit point of the set $f(U)$ it must be contained in the set $f(U)$ hence $\alpha = f(a)$ for some point $a \in U$, and correspondingly for $\beta = \inf_{x \in U} f(x)$. That suffices for the proof.

Theorem 1.11 A one-to-one continuous mapping from a compact subset $U \subset \mathbb{R}^m$ onto a subset $V \subset \mathbb{R}^n$ has a continuous inverse.

Proof: If the mapping $f : U \longrightarrow V$ is one-to-one it has a well defined inverse mapping $g : V \longrightarrow U$. To show that $g$ is continuous it suffices to show that $g^{-1}(E)$ is closed for any closed subset $E \subset U$, in view of Corollary 1.8. If $E$ is closed then by Theorem 1.4 it is compact, since $U$ is compact; and then $g^{-1}(E) = f(E)$ is compact by Theorem 1.9, hence is closed by the Heine-Borel Theorem, and that suffices for the proof.

The assumption of compactness is essential in the preceding theorem. For example the mapping $f : [0, 2\pi) \longrightarrow \mathbb{R}^2$ defined by

$f(t) = (\cos t, \sin t) \in \mathbb{R}^2$

is clearly one-to-one and continuous; but the inverse mapping fails to be continuous at the point $f(0) = (1, 0)$. Some properties of continuity are not purely topological properties, in the sense that they cannot be stated purely in terms of open and closed sets, but are metric properties, involving the norms used to define continuity. By definition a mapping $f : U \longrightarrow W$ between two subsets of Euclidean spaces is continuous at a point $a \in U$ if and only if for every $\epsilon > 0$ there is a $\delta > 0$ such that $\|f(a) - f(x)\| \le \epsilon$ whenever $\|x - a\| \le \delta$. The mapping $f$ is continuous in the set $U$ if for each point $a \in U$ and any $\epsilon > 0$ there is a $\delta_a > 0$, which may depend on the point $a$, such that $\|f(a) - f(x)\| \le \epsilon$ whenever $\|x - a\| \le \delta_a$. The mapping $f$ is uniformly continuous in $U$ if it is possible to find values $\delta_a$ that are independent of the point $a \in U$. Equivalently the mapping $f$ is uniformly continuous in $U$ if for any $\epsilon > 0$ there exists $\delta > 0$ such that $\|f(x) - f(y)\| < \epsilon$ whenever $\|x - y\| < \delta$. It should be kept in mind that the two norms are in different spaces; and it is evident from the inequalities (1.7) that they may be different norms as well. Not all continuous mappings are uniformly continuous, as for example the mapping $f : \mathbb{R} \longrightarrow \mathbb{R}$ defined by $f(x) = x^2$; but in some circumstances continuous mappings are automatically uniformly continuous.
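The failure of uniform continuity for $f(x) = x^2$ can be seen numerically. The following is only an illustrative sketch; the helper name `gap` and the sample points are choices made here, not part of the text. For a fixed step $\delta$ the difference $|f(x + \delta) - f(x)| = 2x\delta + \delta^2$ grows without bound as $x$ grows, so no single $\delta$ works for all points.

```python
def f(x):
    return x * x

def gap(x, delta):
    """Return |f(x + delta) - f(x)| for a fixed step delta."""
    return abs(f(x + delta) - f(x))

delta = 1e-3
sample_points = (1.0, 10.0, 100.0, 1000.0, 10000.0)
gaps = [gap(x, delta) for x in sample_points]

# For f(x) = x^2 the gap equals 2*x*delta + delta^2, so it grows with x
for x, g in zip(sample_points, gaps):
    print(f"x = {x:8.1f}   |f(x + delta) - f(x)| = {g:.6f}")
```

On a compact interval such as $[0, 1]$ the same gap is bounded by $2\delta + \delta^2$, which is consistent with the theorem that follows.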

Theorem 1.12 A continuous mapping $f : U \longrightarrow \mathbb{R}^n$ from a compact subset $U \subset \mathbb{R}^m$ into $\mathbb{R}^n$ is uniformly continuous.

Proof: If $f : U \longrightarrow \mathbb{R}^n$ is continuous and $\epsilon > 0$ then for any point $a \in U$ there is $\delta_a > 0$ such that $\|f(x) - f(a)\| < \frac{1}{2}\epsilon$ whenever $x \in U \cap N_{\delta_a}(a)$. The collection of neighborhoods $N_{\frac{1}{2}\delta_a}(a)$ for all points $a \in U$ is an open covering of $U$, and if $U$ is compact finitely many of these neighborhoods cover all of $U$. If $\delta > 0$ is the minimum of the finitely many positive numbers $\delta_a$ for this finite covering, and if $x, y \in U$ are any two points such that $\|x - y\| < \frac{1}{2}\delta$, then $x \in N_{\frac{1}{2}\delta_a}(a)$ for one of these neighborhoods, and since $\|x - y\| < \frac{1}{2}\delta \le \frac{1}{2}\delta_a$ it is also the case that $y \in N_{\delta_a}(a)$. It follows that

$\|f(x) - f(y)\| \le \|f(x) - f(a)\| + \|f(a) - f(y)\| < \frac{1}{2}\epsilon + \frac{1}{2}\epsilon = \epsilon$,

so $f$ is uniformly continuous, which concludes the proof.




PROBLEMS: GROUP 1

(1) Sketch the following subsets of $\mathbb{R}^2 = \{\, x = (x_1, x_2) \,\}$, and for each subset indicate whether it is open, closed, or compact, and find its boundary.

(i) $\{\, x \mid 0 < \|x\|_\infty \le 1 \,\}$

(ii) $\{\, x \mid 0 < x_1 \le 1,\ x_2 = \sin\frac{1}{x_1} \,\}$

(iii) x x∞ ≤ 1 ∼ x

0 < x2 < 1

(iv) x |x1| ≤ 1 ∩ x

x2 ≤ 1

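(2) Indicate the points at which each of the following functions in $\mathbb{R}^2$ is continuous.

(i) $f(x) = \begin{cases} x_2 - \sin x_1 & \text{if } x_1 \ge 0, \\ x_1 + x_2 & \text{if } x_1 < 0; \end{cases}$

(ii) $g(x) = \begin{cases} x_1 + x_2 & \text{if } \|x\|_\infty \le 1, \\ x_1^2 + x_2^2 & \text{if } \|x\|_\infty > 1. \end{cases}$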

(3) Show that if $f$ and $g$ are continuous functions in an open subset $U \subset \mathbb{R}^n$ then $\max(f, g)$, $\min(f, g)$, and $|f|$ are also continuous; show that the converses do not hold, so if $\max(f, g)$ is continuous it is not necessarily the case that $f$ and $g$ are continuous, and similarly for $\min(f, g)$ and $|f|$.

(4) Show that the function $\|x\|_1 = \sum_{j=1}^n |x_j|$ of a vector $x = \{x_j\} \in \mathbb{R}^n$ is also a norm. Show that this norm is equivalent to the Euclidean and supremum norms. What is the shape of an open neighborhood of the origin in the plane $\mathbb{R}^2$ defined by this norm?

PROBLEMS: GROUP 2

(5) Show that a norm $\|x\|_0$ on vectors in $\mathbb{R}^n$ can be defined by an inner product $(x_1, x_2)_0$ on $\mathbb{R}^n$, in the sense that $\|x\|_0^2 = (x, x)_0$, if and only if the norm satisfies the parallelogram law:

$\|x + y\|_0^2 + \|x - y\|_0^2 = 2\|x\|_0^2 + 2\|y\|_0^2$.

As a consequence, show that the supremum norm cannot be defined by an inner product.

(6) The set of all real polynomials in a single variable $x$ forms a real vector space $P$, but not a vector space of finite dimension; nonetheless it is possible to define norms on this vector space. Show that the expressions $\|p(x)\|_\infty$ and $\|p(x)\|_c$ defined for a polynomial $p(x) = a_0 + a_1 x + a_2 x^2 + \cdots$ by

$\|p(x)\|_\infty = \sup_{-\frac{1}{2} \le x \le \frac{1}{2}} |p(x)|$ and $\|p(x)\|_c = \max\bigl(|a_0|, |a_1|, |a_2|, \cdots\bigr)$

are norms on the vector space $P$, but that these two norms are not equivalent norms.

(7) Show that $(E')' \subset E'$ for any subset $E \subset \mathbb{R}^n$, but that the containment is not necessarily an equality.

(8) Describe the open and closed subsets of the following sets $S \subset \mathbb{R}^n$ in the relative topology induced by the natural topology on $\mathbb{R}^n$:
(i) $S = \mathbb{R}^m$ for $1 \le m < n$, viewed as a linear subspace of $\mathbb{R}^n$.
(ii) $S$ is the set of points of $\mathbb{R}^n$, all coordinates of which are integers.

(9) Show that any two norms on Rn are equivalent.

(10) Show that if $f(x_1, x_2)$ is a continuous function in the unit cell $\Delta \subset \mathbb{R}^2$ then the function $g(x_1) = \sup_{0 \le x_2 \le 1} f(x_1, x_2)$ is a continuous function on the unit interval $[0, 1]$. Show that this is not necessarily the case if it is just assumed that $f(x_1, x_2)$ is a bounded continuous function in the open unit cell $\Delta \subset \mathbb{R}^2$.

[Suggestion: consider the function f (x1, x2) = 1 − x1/x21 .]



Chapter 2

Differentiable Mappings

2.1 The Derivative

A mapping $f : U \longrightarrow \mathbb{R}^n$ from an open subset $U \subset \mathbb{R}^m$ into $\mathbb{R}^n$ is differentiable at a point $a \in U$ if there is a linear mapping $A : \mathbb{R}^m \longrightarrow \mathbb{R}^n$ described by an $n \times m$ matrix $A$ such that for all $h$ in an open neighborhood of the origin in $\mathbb{R}^m$

(2.1) $f(a + h) = f(a) + Ah + \epsilon(h)$ where $\lim_{h \to 0} \dfrac{\|\epsilon(h)\|}{\|h\|} = 0$.

Here $h \in \mathbb{R}^m$ so $Ah \in \mathbb{R}^n$, while $f(a), f(a + h), \epsilon(h) \in \mathbb{R}^n$. This definition is independent of the norm chosen; for if $\lim_{h \to 0} \frac{\|\epsilon(h)\|_2}{\|h\|_2} = 0$ then from the inequalities (1.7) it follows that $\frac{\|\epsilon(h)\|_\infty}{\|h\|_\infty} \le \frac{\|\epsilon(h)\|_2}{\|h\|_2} \sqrt{m}$ so $\lim_{h \to 0} \frac{\|\epsilon(h)\|_\infty}{\|h\|_\infty} = 0$ as well, and the converse holds similarly. If $f$ is differentiable at $a$ it is continuous at $a$, since $\lim_{h \to 0} Ah = 0$ and $\lim_{h \to 0} \epsilon(h) = 0$. For example, if $m = 3$ and $n = 2$ so that $f : \mathbb{R}^3 \to \mathbb{R}^2$ then (2.1) takes the form

$$\begin{pmatrix} f_1(a + h) \\ f_2(a + h) \end{pmatrix} = \begin{pmatrix} f_1(a) \\ f_2(a) \end{pmatrix} + \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \\ h_3 \end{pmatrix} + \begin{pmatrix} \epsilon_1(h) \\ \epsilon_2(h) \end{pmatrix}.$$

If $f$ is a constant mapping then $f(a + h) = f(a)$, which is (2.1) for $A = 0$ and $\epsilon(h) = 0$; so any constant mapping is differentiable. If $f$ is the linear mapping $f(x) = Ax$ for a matrix $A$ then $f(a + h) = A(a + h) = f(a) + Ah$, which is (2.1) for the matrix $A$ and $\epsilon(h) = 0$, so a linear mapping also is differentiable. If $m = n = 1$ it is possible to divide (2.1) by the real number $h$ and to rewrite that equation in the form

$$\lim_{h \to 0} \left( \frac{f(a + h) - f(a)}{h} - A \right) = \lim_{h \to 0} \frac{\epsilon(h)}{h} = 0;$$

this is a form of the familiar definition that the real-valued function $f(x)$ of the variable $x \in \mathbb{R}$ is differentiable at the point $a$ and that its derivative at that point is the real number $A$.
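The defining condition (2.1) can be checked numerically for a specific mapping. The sketch below is illustrative only: the mapping $f(x_1, x_2) = (x_1 x_2, \sin x_1 + x_2^2)$, the base point and the helper names are assumptions chosen here, and the matrix returned by `derivative` is computed by hand for this particular mapping. The ratio $\|\epsilon(h)\|_\infty / \|h\|_\infty$ should shrink as $h$ shrinks.

```python
import math

def f(x1, x2):
    # hypothetical sample mapping R^2 -> R^2
    return (x1 * x2, math.sin(x1) + x2 * x2)

def derivative(x1, x2):
    # candidate matrix A: rows are coordinate functions, columns are variables
    return ((x2, x1), (math.cos(x1), 2.0 * x2))

def remainder_ratio(a, h):
    """Return ||eps(h)||_inf / ||h||_inf where eps(h) = f(a+h) - f(a) - A h."""
    A = derivative(*a)
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    eps = [fah[i] - fa[i] - (A[i][0] * h[0] + A[i][1] * h[1]) for i in range(2)]
    return max(abs(e) for e in eps) / max(abs(c) for c in h)

a = (1.0, 2.0)
ratios = [remainder_ratio(a, (t, t)) for t in (1e-1, 1e-2, 1e-3)]
for t, r in zip((1e-1, 1e-2, 1e-3), ratios):
    print(f"t = {t:.0e}   ratio = {r:.3e}")
```

For this mapping the ratio decreases roughly in proportion to $t$, as expected when the remainder is of second order in $h$.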


Theorem 2.1 A mapping $f : \mathbb{R}^m \longrightarrow \mathbb{R}^n$ is differentiable at a point $a$ if and only if each of the coordinate functions $f_i$ of the mapping $f$ is differentiable at that point.

Proof: If $f$ is a differentiable mapping it follows from (2.1) that each coordinate function $f_i$ satisfies

(2.2) $f_i(a + h) = f_i(a) + \sum_{j=1}^m a_{ij} h_j + \epsilon_i(h)$ where $\lim_{h \to 0} \dfrac{|\epsilon_i(h)|}{\|h\|} = 0$,

since $\frac{|\epsilon_i(h)|}{\|h\|_\infty} \le \frac{\|\epsilon(h)\|_\infty}{\|h\|_\infty}$; and this is just the condition that each of the coordinate functions $f_i$ of the mapping $f$ is differentiable at the point $a$. Conversely if each of the coordinate mappings $f_i$ is differentiable at the point $a$ then (2.2) holds for $1 \le i \le n$. The collection of these $n$ equations taken together forms the equation (2.1) in which $\epsilon(h) = \{\epsilon_i(h)\}$; and since $\frac{\|\epsilon(h)\|_\infty}{\|h\|_\infty} = \max_{1 \le i \le n} \frac{|\epsilon_i(h)|}{\|h\|_\infty}$ it follows that $\lim_{h \to 0} \frac{\|\epsilon(h)\|_\infty}{\|h\|_\infty} = 0$, so the mapping $f$ is differentiable. That concludes the proof.

For the special case of a vector $h = \{h_j\}$ for which $h_j = 0$ for $j \ne k$ for some index $k$, equation (2.2) takes the form

$f_i(a_1, \ldots, a_k + h_k, \ldots, a_m) = f_i(a_1, \ldots, a_k, \ldots, a_m) + a_{ik} h_k + \epsilon_i(h_k)$

where $\lim_{h_k \to 0} \frac{|\epsilon_i(h_k)|}{|h_k|} = 0$; that is just the condition that $f_i(x)$, viewed as a function of the variable $x_k$ alone for fixed values $x_j = a_j$ for the remaining variables for $j \ne k$, is a differentiable function of that variable $x_k$ and that its derivative at the point $x_k = a_k$ is the real number $a_{ik}$. The constant $a_{ik}$ is called the partial derivative of the function $f_i$ with respect to the variable $x_k$ at the point $a$, and is denoted by $a_{ik} = \partial_k f_i(a)$. It follows that the entries in the matrix $A = \{a_{ik}\}$ are the uniquely determined partial derivatives of the coordinate functions of the mapping; this matrix is called the derivative of the mapping $f$ at the point $a$ and is denoted by $f'(a)$, so

(2.3) $f'(a) = \bigl\{\, f'(a)_{ik} = \partial_k f_i(a) \bigm| 1 \le k \le m,\ 1 \le i \le n \,\bigr\}$.

Generally it is a fairly straightforward matter to calculate the partial derivatives of a function; merely consider all the variables except one of them as constants, and apply the familiar techniques for calculating derivatives of a function of one variable. That can be applied to each coordinate function of a mapping, to yield the derivative of that mapping. It is evident from (2.1) that a linear combination $c_1 f_1 + c_2 f_2$ of two mappings $f_1, f_2 : \mathbb{R}^m \longrightarrow \mathbb{R}^n$ that are differentiable at a point $a$, for any constants $c_1, c_2 \in \mathbb{R}$, is again a differentiable mapping at that point, and it follows from (2.3) that differentiation is linear in the sense that $(c_1 f_1 + c_2 f_2)'(a) = c_1 f_1'(a) + c_2 f_2'(a)$. There are various alternative notations for derivatives and partial derivatives of functions of several variables that are in common use. For instance $Df(a)$ is often used for $f'(a)$, and $D_k f(a)$ or $\frac{\partial f}{\partial x_k}(a)$ for $\partial_k f(a)$ when $f(x)$ is a real-valued function of the variable $x \in \mathbb{R}^n$; and when the mapping $f : \mathbb{R}^m \longrightarrow \mathbb{R}^n$ is viewed as giving the coordinates $y_i$ of a point $y \in \mathbb{R}^n$ as functions $y_i(x)$ of the coordinates $x_j$ of points $x \in \mathbb{R}^m$ the notation $\frac{\partial y_i}{\partial x_j}$ is quite commonly used for $\partial_j y_i$.

If a mapping $f : \mathbb{R}^m \longrightarrow \mathbb{R}^n$ is differentiable at a point $a \in \mathbb{R}^m$ then it has partial derivatives $\partial_k f_i(a)$ with respect to each variable $x_k$ at that point; but it is not true conversely that if the coordinate functions of a mapping $f$ have partial derivatives at a point $a$ with respect to each variable $x_k$ then $f$ is a differentiable mapping. For example the mapping $f : \mathbb{R}^2 \longrightarrow \mathbb{R}$ defined by

$$f(x) = \begin{cases} \dfrac{x_1 x_2}{x_1^2 + x_2^2} & \text{if } x \ne 0, \\[1ex] 0 & \text{if } x = 0 \end{cases}$$

vanishes identically in the variable $x_2$ if $x_1 = 0$, so $\partial_2 f(0, 0) = 0$, and similarly $\partial_1 f(0, 0) = 0$. This function is not even continuous at the origin, since for instance it takes the value $\frac{1}{2}$ whenever $x_1 = x_2$ except at the origin where it takes the value $0$; hence it is not differentiable at the origin. However if the partial derivatives of a mapping not only exist but also are continuous then the mapping is differentiable.
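The behavior of the function $x_1 x_2 / (x_1^2 + x_2^2)$ can be illustrated numerically. This is a minimal sketch; the helper name `axis_quotient` and the sample step sizes are assumptions made here. The difference quotients at the origin along both coordinate axes vanish, yet along the diagonal $x_1 = x_2$ the function stays at $\frac{1}{2}$ however close the point is to the origin.

```python
def f(x1, x2):
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 * x2 / (x1 * x1 + x2 * x2)

def axis_quotient(k, t):
    """Difference quotient of f at the origin along coordinate axis k (1 or 2)."""
    if k == 1:
        return (f(t, 0.0) - f(0.0, 0.0)) / t
    return (f(0.0, t) - f(0.0, 0.0)) / t

steps = (1e-1, 1e-4, 1e-8)
quotients = [axis_quotient(k, t) for k in (1, 2) for t in steps]
diagonal = [f(t, t) for t in steps]

print("axis difference quotients:", quotients)  # all zero: both partials are 0
print("values on the diagonal:   ", diagonal)   # stay near 1/2, not near f(0,0) = 0
```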

Theorem 2.2 If the partial derivatives of a mapping $f : \mathbb{R}^m \to \mathbb{R}^n$ exist at all points near $a$ and are continuous at the point $a$ then the mapping $f$ is differentiable at the point $a$.

Proof: In view of Theorem 2.1 it is enough to prove this for the special case that $n = 1$, in which case the mapping $f : \mathbb{R}^m \to \mathbb{R}$ is just a real valued function; and for convenience only the case $m = 2$ will be demonstrated in detail, since it is easier to follow the proof in the simpler case and all the essential ideas are present. Assume that the partial derivatives $\partial_k f(x)$ exist for all points $x$ near $a$ and are continuous at $a = \{a_j\}$, and consider a fixed vector $h = \{h_j\}$. When one of the variables is held fixed and $f$ is viewed as a function of the remaining variable it is a differentiable function of a single variable. The mean value theorem for functions of a single variable asserts that if $f(x)$ is continuous in a closed interval $[a, b]$ and is differentiable at each point of the open interval $(a, b)$ then $f(b) - f(a) = f'(c)(b - a)$ for some point $c \in (a, b)$; this can be applied to the function $f(x_1, a_2)$ of the single variable $x_1$ in the interval between $a_1$ and $a_1 + h_1$, and to the function $f(a_1 + h_1, x_2)$ of the single variable $x_2$ in the interval between $a_2$ and $a_2 + h_2$ if $h$ is sufficiently small; as a consequence there exist values $\alpha_1$ between $a_1$ and $a_1 + h_1$ and $\alpha_2$ between $a_2$ and $a_2 + h_2$ such that

$f(a_1 + h_1, a_2) - f(a_1, a_2) = h_1\, \partial_1 f(\alpha_1, a_2)$ and

$f(a_1 + h_1, a_2 + h_2) - f(a_1 + h_1, a_2) = h_2\, \partial_2 f(a_1 + h_1, \alpha_2)$.

Then

$$\begin{aligned} f(a + h) - f(a) &= f(a_1 + h_1, a_2 + h_2) - f(a_1, a_2) \\ &= \bigl( f(a_1 + h_1, a_2 + h_2) - f(a_1 + h_1, a_2) \bigr) + \bigl( f(a_1 + h_1, a_2) - f(a_1, a_2) \bigr) \\ &= h_2\, \partial_2 f(a_1 + h_1, \alpha_2) + h_1\, \partial_1 f(\alpha_1, a_2) \\ &= h_2\, \partial_2 f(a_1, a_2) + h_1\, \partial_1 f(a_1, a_2) + \epsilon(h) \end{aligned}$$

where

$\epsilon(h) = h_2 \bigl( \partial_2 f(a_1 + h_1, \alpha_2) - \partial_2 f(a_1, a_2) \bigr) + h_1 \bigl( \partial_1 f(\alpha_1, a_2) - \partial_1 f(a_1, a_2) \bigr)$.

By the triangle inequality

$\dfrac{|\epsilon(h)|}{\|h\|_\infty} \le \bigl| \partial_2 f(a_1 + h_1, \alpha_2) - \partial_2 f(a_1, a_2) \bigr| + \bigl| \partial_1 f(\alpha_1, a_2) - \partial_1 f(a_1, a_2) \bigr|$

since $\frac{|h_1|}{\|h\|_\infty} \le 1$ and $\frac{|h_2|}{\|h\|_\infty} \le 1$; and since the partial derivatives are assumed to be continuous at the point $a$ it follows that $\lim_{h \to 0} \frac{|\epsilon(h)|}{\|h\|_\infty} = 0$. That shows that the mapping $f : \mathbb{R}^2 \longrightarrow \mathbb{R}$ is differentiable at the point $(a_1, a_2)$, which suffices to conclude the proof.

If a mapping $f : \mathbb{R}^m \longrightarrow \mathbb{R}^n$ is differentiable at all points of an open subset $U \subset \mathbb{R}^m$ the partial derivatives of the coordinate functions of the mapping $f$ exist at each point of $U$, but need not be continuous functions in $U$. The standard example for functions of a single variable is the function

$$f(x) = \begin{cases} x^2 \sin \dfrac{1}{x^2} & \text{if } x \ne 0, \\[1ex] 0 & \text{if } x = 0, \end{cases}$$

which is differentiable at all points $x \in \mathbb{R}$ but for which the derivative is not continuous at the point $0$. Indeed if $x \ne 0$ then

$f'(x) = 2x \sin \dfrac{1}{x^2} - \dfrac{2}{x} \cos \dfrac{1}{x^2}$,

which is not bounded as $x$ tends to zero; however

(2.4) $f'(0) = \lim_{h \to 0} \dfrac{h^2 \sin \frac{1}{h^2} - 0}{h - 0} = \lim_{h \to 0} h \sin \dfrac{1}{h^2} = 0$

since $|\sin \frac{1}{h^2}| \le 1$ for $h \ne 0$. Thus the hypothesis in the preceding theorem is strictly stronger than just the assumption that the partial derivatives of the coordinate functions of the mapping exist. A mapping $f : U \longrightarrow \mathbb{R}^n$ defined in an open subset $U \subset \mathbb{R}^m$ is said to be continuously differentiable or of class $C^1$ in $U$ if the partial derivatives of its coordinate functions exist and are continuous throughout the set $U$. By the preceding theorem, that is precisely equivalent to the condition that the mapping $f$ is differentiable at each point $a \in U$ and the matrix function $f'(a)$ is a continuous function of the variable $a \in U$.
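The contrast between the bounded difference quotients at $0$ and the unbounded derivative near $0$ for $x^2 \sin(1/x^2)$ can be seen in a short computation. This is an illustrative sketch only; the sample points $x_n = 1/\sqrt{2\pi n}$ are chosen here because $\sin(1/x_n^2) = 0$ and $\cos(1/x_n^2) = 1$ there, so that $f'(x_n) = -2/x_n$ up to rounding.

```python
import math

def f(x):
    return x * x * math.sin(1.0 / (x * x)) if x != 0.0 else 0.0

def fprime(x):
    """Derivative for x != 0, from the product and chain rules."""
    return 2.0 * x * math.sin(1.0 / (x * x)) - (2.0 / x) * math.cos(1.0 / (x * x))

# Difference quotients at 0 are bounded by |h|, so they tend to f'(0) = 0 ...
steps = (1e-1, 1e-2, 1e-3)
quotients = [abs(f(h) / h) for h in steps]

# ... but along x_n = 1/sqrt(2*pi*n) the derivative is about -2*sqrt(2*pi*n).
points = [1.0 / math.sqrt(2.0 * math.pi * n) for n in (1, 100, 10000)]
slopes = [fprime(x) for x in points]

for x, s in zip(points, slopes):
    print(f"x = {x:.6f}   f'(x) = {s:.3f}")
```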




2.2 The Chain Rule

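If $g : U \longrightarrow \mathbb{R}^m$ is a mapping defined in an open neighborhood $U$ of a point $a \in \mathbb{R}^l$ and $f : V \longrightarrow \mathbb{R}^n$ is a mapping defined in an open neighborhood $V$ of the point $b = g(a) \in \mathbb{R}^m$, where $g(U) \subset V$, the composition $\phi = f \circ g : U \longrightarrow \mathbb{R}^n$ is the mapping defined by $\phi(x) = f\bigl(g(x)\bigr)$ for any $x \in U$; this situation is described in the following diagram.

(2.5)
$$\begin{array}{ccccc} \mathbb{R}^l & & \mathbb{R}^m & & \mathbb{R}^n \\ U & \stackrel{g}{\longrightarrow} & V & \stackrel{f}{\longrightarrow} & W \\ a & \stackrel{g}{\longrightarrow} & b = g(a) & \stackrel{f}{\longrightarrow} & f(b) = \phi(a) \end{array} \qquad \phi = f \circ g$$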

Theorem 2.3 If the mapping $g$ is differentiable at the point $a$ and the mapping $f$ is differentiable at the point $b = g(a)$ then the composite function $\phi = f \circ g$ is differentiable at the point $a$ and $\phi'(a) = f'\bigl(g(a)\bigr) \cdot g'(a)$.

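Proof: Since the mapping $f$ is differentiable at the point $b$

(2.6) $f(b + h) = f(b) + f'(b)h + \epsilon_f(h)$ where $\lim_{h \to 0} \dfrac{\|\epsilon_f(h)\|_\infty}{\|h\|_\infty} = 0$,

and since the mapping $g$ is differentiable at the point $a$

(2.7) $g(a + k) = \underbrace{g(a)}_{b} + \underbrace{g'(a)k + \epsilon_g(k)}_{h}$ where $\lim_{k \to 0} \dfrac{\|\epsilon_g(k)\|_\infty}{\|k\|_\infty} = 0$.

Substituting (2.7) into (2.6) where $b = g(a)$ and $h = g'(a)k + \epsilon_g(k)$ leads to the result that

$$\begin{aligned} \phi(a + k) = f\bigl(g(a + k)\bigr) = f(b + h) &= f(b) + f'(b)h + \epsilon_f(h) \\ &= \phi(a) + f'(b)\bigl( g'(a)k + \epsilon_g(k) \bigr) + \epsilon_f(h) \end{aligned}$$

hence that

(2.8) $\phi(a + k) = \phi(a) + f'(b)g'(a)k + \epsilon(k)$

where $\epsilon(k) = f'(b)\epsilon_g(k) + \epsilon_f(h)$. From the inequality (1.10) and the triangle inequality it follows that

$$\begin{aligned} \frac{\|\epsilon(k)\|_\infty}{\|k\|_\infty} &\le \frac{\|f'(b)\epsilon_g(k)\|_\infty}{\|k\|_\infty} + \frac{\|\epsilon_f(h)\|_\infty}{\|h\|_\infty} \cdot \frac{\|g'(a)k + \epsilon_g(k)\|_\infty}{\|k\|_\infty} \\ &\le n \|f'(b)\|_\infty \frac{\|\epsilon_g(k)\|_\infty}{\|k\|_\infty} + \frac{\|\epsilon_f(h)\|_\infty}{\|h\|_\infty} \cdot \left( n \|g'(a)\|_\infty + \frac{\|\epsilon_g(k)\|_\infty}{\|k\|_\infty} \right). \end{aligned}$$

Since $\lim_{k \to 0} \frac{\|\epsilon_g(k)\|_\infty}{\|k\|_\infty} = 0$ and $\lim_{h \to 0} \frac{\|\epsilon_f(h)\|_\infty}{\|h\|_\infty} = 0$ while $\lim_{k \to 0} h = 0$ it follows from the preceding inequality that $\lim_{k \to 0} \frac{\|\epsilon(k)\|_\infty}{\|k\|_\infty} = 0$, and it then follows from (2.8) that $\phi = f \circ g$ is differentiable at the point $a$ and that $\phi'(a) = f'\bigl(g(a)\bigr) \cdot g'(a)$, which concludes the proof.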

For a simple application of the chain rule, since the function $g(y_1, y_2) = y_1 y_2$ is differentiable at any point $(y_1, y_2) \in \mathbb{R}^2$ and $g'(y_1, y_2) = (y_2 \ \ y_1)$ it follows that for any differentiable mapping $f = \{f_1, f_2\} : U \longrightarrow \mathbb{R}^2$ in an open subset $U \subset \mathbb{R}^m$ the composition $\phi = g \circ f : U \longrightarrow \mathbb{R}$ is a differentiable mapping and

$$\phi'(x) = g'\bigl(f(x)\bigr) f'(x) = \bigl( f_2(x) \ \ f_1(x) \bigr) \cdot \begin{pmatrix} f_1'(x) \\ f_2'(x) \end{pmatrix} = f_2(x) f_1'(x) + f_1(x) f_2'(x);$$

this is just an extension of the familiar product rule for differentiating functions from the case of functions of a single variable to the case of functions of several variables, since $\phi(x) = f_1(x) f_2(x)$. The entries in the matrix $\phi'(x)$ thus have the form

$\partial_k \phi(x) = f_2(x) \partial_k f_1(x) + f_1(x) \partial_k f_2(x)$.

The corresponding argument shows that the quotient $f_1 / f_2$ of two differentiable functions is differentiable at any point $x$ at which $f_2(x) \ne 0$, and that its derivative has the expected form. For another application of the chain rule, if $f : U \longrightarrow V$ is a one-to-one mapping between two open subsets $U, V \subset \mathbb{R}^m$ and if $g : V \longrightarrow U$ is the inverse mapping then $g \circ f : U \longrightarrow U$ is the identity mapping $(g \circ f)(x) = x$, so $(g \circ f)'(x) = I$ where $I$ is the $m \times m$ identity matrix. If the mappings $f$ and $g$ are continuously differentiable then by the chain rule $g'\bigl(f(x)\bigr) \cdot f'(x) = (g \circ f)'(x) = I$; thus the matrix $g'\bigl(f(x)\bigr)$ is the inverse of the matrix $f'(x)$ at each point $x \in U$, so both matrices are nonsingular matrices at each point $x \in U$.
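The chain rule of Theorem 2.3 can be checked numerically by comparing the matrix product $f'(g(a)) \cdot g'(a)$ with finite differences of the composition. The following is a sketch under assumptions made here: the sample mappings $g(t_1, t_2) = (t_1 t_2, t_1 + t_2^2)$ and $f(y_1, y_2) = \sin y_1 + y_1 y_2$, the base point, and the central difference step are all illustrative choices, and the derivative matrices are computed by hand for these mappings.

```python
import math

def g(t1, t2):
    return (t1 * t2, t1 + t2 * t2)

def g_prime(t1, t2):
    # 2 x 2 matrix of partial derivatives of g
    return ((t2, t1), (1.0, 2.0 * t2))

def f(y1, y2):
    return math.sin(y1) + y1 * y2

def f_prime(y1, y2):
    # 1 x 2 matrix (row vector) of partial derivatives of f
    return (math.cos(y1) + y2, y1)

def phi(t1, t2):
    return f(*g(t1, t2))

def chain_rule(t1, t2):
    """Row vector f'(g(t)) * g'(t) computed as a matrix product."""
    row = f_prime(*g(t1, t2))
    G = g_prime(t1, t2)
    return tuple(row[0] * G[0][k] + row[1] * G[1][k] for k in range(2))

def central_differences(t1, t2, step=1e-6):
    d1 = (phi(t1 + step, t2) - phi(t1 - step, t2)) / (2.0 * step)
    d2 = (phi(t1, t2 + step) - phi(t1, t2 - step)) / (2.0 * step)
    return (d1, d2)

a = (0.7, -1.3)
exact = chain_rule(*a)
approx = central_differences(*a)
print("chain rule:        ", exact)
print("finite differences:", approx)
```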

An alternative notation for the chain rule is suggestive and sometimes quite useful. When mappings $g : \mathbb{R}^l \longrightarrow \mathbb{R}^m$ and $f : \mathbb{R}^m \longrightarrow \mathbb{R}^n$ are described in terms of the coordinates $t = \{t_1, \ldots, t_l\} \in \mathbb{R}^l$, $x = \{x_1, \ldots, x_m\} \in \mathbb{R}^m$ and $y = \{y_1, \ldots, y_n\} \in \mathbb{R}^n$, the coordinate functions of the mappings $f$, $g$ and $\phi = f \circ g$ have the form $y_i = f_i(x) = \phi_i(t)$ and $x_j = g_j(t)$. The partial derivatives are sometimes denoted by

$(\phi')_{ik} = \partial_k \phi_i = \dfrac{\partial y_i}{\partial t_k}, \quad (f')_{ij} = \partial_j f_i(x) = \dfrac{\partial y_i}{\partial x_j}, \quad (g')_{jk} = \partial_k g_j(t) = \dfrac{\partial x_j}{\partial t_k}$.

By the preceding theorem the derivative of the composite function $\phi = f \circ g$ is the matrix product $\phi' = f' g'$, which in terms of the entries of these matrices is $(\phi')_{ik} = \sum_{j=1}^m (f')_{ij} (g')_{jk}$ or equivalently $\partial_k \phi_i = \sum_{j=1}^m \partial_j f_i \cdot \partial_k g_j$; and in the alternative notation this takes the form

(2.9) $\dfrac{\partial y_i}{\partial t_k} = \sum_{j=1}^m \dfrac{\partial y_i}{\partial x_j} \cdot \dfrac{\partial x_j}{\partial t_k}$.

This is the extension to mappings in several variables of the traditional formulation of the chain rule for functions of a single variable as the identity $\frac{dy}{dt} = \frac{dy}{dx} \cdot \frac{dx}{dt}$; this form of the chain rule is in some ways easier to remember, and with some caution, easier to use, than the version of the chain rule in the preceding theorem. It is customary however to omit any explicit mention of the points at which the derivatives are taken; so some care must be taken to remember that the derivative $\frac{\partial y_i}{\partial x_j}$ is evaluated at the point $x$ while the derivatives $\frac{\partial y_i}{\partial t_k}$ and $\frac{\partial x_j}{\partial t_k}$ are evaluated at the point $t$. This lack of clarity means that some caution must be taken when this notation is used.

Some care also must be taken with the chain rule in those cases where the compositions are not quite so straightforward. For instance if $\phi(x_1, x_2) = f\bigl(x_1, x_2, g(x_1, x_2)\bigr)$ for a function $f(x_1, x_2, x_3)$ of three variables and a function $g(x_1, x_2)$ of two variables, the function $\phi$ is really the composition $\phi = f \circ G$ of the mapping $f : \mathbb{R}^3 \longrightarrow \mathbb{R}^1$ given by the function $f$ and the mapping $G : \mathbb{R}^2 \longrightarrow \mathbb{R}^3$ given by

$$G(x_1, x_2) = \begin{pmatrix} x_1 \\ x_2 \\ g(x_1, x_2) \end{pmatrix},$$

so if the function $g$ is differentiable it follows from the chain rule that

$$\begin{aligned} \phi'(x) = f'\bigl(G(x)\bigr) G'(x) &= \Bigl( \partial_1 f\bigl(G(x)\bigr) \ \ \partial_2 f\bigl(G(x)\bigr) \ \ \partial_3 f\bigl(G(x)\bigr) \Bigr) \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ \partial_1 g(x) & \partial_2 g(x) \end{pmatrix} \\ &= \Bigl( \partial_1 f\bigl(G(x)\bigr) + \partial_3 f\bigl(G(x)\bigr) \partial_1 g(x) \ \ \ \partial_2 f\bigl(G(x)\bigr) + \partial_3 f\bigl(G(x)\bigr) \partial_2 g(x) \Bigr) \end{aligned}$$

hence the entries of the matrix $\phi'(x)$ are

$\partial_1 \phi(x) = \partial_1 f\bigl(G(x)\bigr) + \partial_3 f\bigl(G(x)\bigr) \partial_1 g(x)$,

$\partial_2 \phi(x) = \partial_2 f\bigl(G(x)\bigr) + \partial_3 f\bigl(G(x)\bigr) \partial_2 g(x)$.

This amounts to calculating the partial derivative $\partial_1 \phi(x)$ as the sum of the partial derivatives of the function $f\bigl(x_1, x_2, g(x_1, x_2)\bigr)$ with respect to each of its three variables, and multiplying each of these derivatives by the derivative of what is in the place of that variable with respect to the variable $x_1$. Some practice, checked by going back to the form of the chain rule given in Theorem 2.3, may prove helpful; and if there are any doubts about an application of the chain rule they can be cleared up by identifying the function as an explicit composition of mappings. It should be noted that in this case the meaning of the expression

$\dfrac{\partial}{\partial x_1} f(x_1, x_2, x_3)$ where $x_3 = g(x_1, x_2)$

is not clear; it may mean either the derivative of the function $f$ with respect to its first variable or the derivative of the composite function of the two variables $x_1, x_2$ with respect to the variable $x_1$, while $\partial_1 f(x_1, x_2, x_3)$ where $x_3 = g(x_1, x_2)$ is less ambiguous.

The chain rule also is useful in deriving information about the derivatives of functions that are defined only implicitly. For example if a function $f(x_1, x_2)$ satisfies the equation

$f(x_1, x_2)^5 + x_1 f(x_1, x_2) + f(x_1, x_2) = 2x_1 + 3x_2$

and the initial condition that $f(0, 0) = 0$ then the values of that function are determined implicitly but not explicitly by the preceding equation. This equation is the condition that the composition of the mapping $F : \mathbb{R}^2 \longrightarrow \mathbb{R}^3$ defined by $F(x_1, x_2) = \bigl(x_1, x_2, f(x_1, x_2)\bigr)$ and the mapping $G : \mathbb{R}^3 \longrightarrow \mathbb{R}$ defined by $G(x_1, x_2, y) = y^5 + x_1 y + y - 2x_1 - 3x_2$ is the trivial mapping $(G \circ F)(x_1, x_2) = 0$, so that $(G \circ F)'(0, 0) = 0$; and if the function $f$ is differentiable it follows from the chain rule that

$\partial_1 (G \circ F) = 5 f(x_1, x_2)^4\, \partial_1 f(x_1, x_2) + x_1 \partial_1 f(x_1, x_2) + f(x_1, x_2) + \partial_1 f(x_1, x_2) - 2$,

so since $\partial_1 (G \circ F) = 0$ and $f(0, 0) = 0$ the preceding equation reduces to $\partial_1 f(0, 0) = 2$. A similar calculation yields the value of $\partial_2 f(0, 0)$.
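The implicit differentiation above can be cross-checked numerically. The sketch below is an illustration under assumptions made here: the values of $f$ are obtained by a Newton iteration in $y$ started at $y = 0$ (the helper `solve_f` is hypothetical, not part of the text), and the partial derivatives at the origin are approximated by central differences. The same chain rule calculation applied to the variable $x_2$ gives $\partial_2 f(0, 0) = 3$, which the computation also reproduces.

```python
def solve_f(x1, x2, iterations=50):
    """Newton iteration for y^5 + x1*y + y = 2*x1 + 3*x2, started at y = 0."""
    y = 0.0
    for _ in range(iterations):
        value = y**5 + x1 * y + y - 2.0 * x1 - 3.0 * x2
        slope = 5.0 * y**4 + x1 + 1.0  # derivative in y; positive when x1 > -1
        y -= value / slope
    return y

step = 1e-5
# central differences approximating the partial derivatives of f at the origin
d1 = (solve_f(step, 0.0) - solve_f(-step, 0.0)) / (2.0 * step)
d2 = (solve_f(0.0, step) - solve_f(0.0, -step)) / (2.0 * step)

print("approximate d1 f(0,0):", d1)
print("approximate d2 f(0,0):", d2)
```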

2.3 Higher Derivatives

If $f : U \longrightarrow \mathbb{R}$ is a function defined in an open set $U \subset \mathbb{R}^2$ and if the partial derivative $\partial_{j_1} f(x)$ exists at all points $x \in U$ the function $\partial_{j_1} f(x)$ may itself have partial derivatives, such as $\partial_{j_2}\bigl(\partial_{j_1} f(x)\bigr)$, which for convenience is shortened to $\partial_{j_2} \partial_{j_1} f(x)$; and the process may continue, leading to $\partial_{j_3} \partial_{j_2} \partial_{j_1} f(x)$ and so on. The order in which successive derivatives are taken may be significant; for example, a straightforward calculation for the function

$$f(x_1, x_2) = \begin{cases} \dfrac{x_1 x_2 (x_1^2 - x_2^2)}{x_1^2 + x_2^2} & \text{if } (x_1, x_2) \ne (0, 0), \\[1ex] 0 & \text{if } (x_1, x_2) = (0, 0) \end{cases}$$

shows that $\partial_1 \partial_2 f(0, 0) = 1$ but $\partial_2 \partial_1 f(0, 0) = -1$. However for sufficiently regular functions the order of differentiation is irrelevant.
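The asymmetry of the mixed partial derivatives in this example can be observed with nested difference quotients. This is only a numerical sketch; the step sizes `INNER` and `OUTER` and the helper names are choices made here, with the inner step taken much smaller than the outer one so that the first partial derivatives are resolved before the second difference is formed.

```python
def f(x1, x2):
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 * x2 * (x1 * x1 - x2 * x2) / (x1 * x1 + x2 * x2)

INNER = 1e-6  # step for the first partial derivative
OUTER = 1e-2  # step for the second partial derivative

def d1(x1, x2):
    """Central difference approximation of the partial derivative in x1."""
    return (f(x1 + INNER, x2) - f(x1 - INNER, x2)) / (2.0 * INNER)

def d2(x1, x2):
    """Central difference approximation of the partial derivative in x2."""
    return (f(x1, x2 + INNER) - f(x1, x2 - INNER)) / (2.0 * INNER)

# differentiate in x1 first, then in x2: approximates d2 d1 f(0,0)
d1_then_2 = (d1(0.0, OUTER) - d1(0.0, -OUTER)) / (2.0 * OUTER)
# differentiate in x2 first, then in x1: approximates d1 d2 f(0,0)
d2_then_1 = (d2(OUTER, 0.0) - d2(-OUTER, 0.0)) / (2.0 * OUTER)

print("d2 d1 f(0,0) is approximately", d1_then_2)
print("d1 d2 f(0,0) is approximately", d2_then_1)
```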

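Theorem 2.4 If $f : U \longrightarrow \mathbb{R}$ is a function in an open subset $U \subset \mathbb{R}^2$, if the partial derivatives $\partial_1 f(x)$, $\partial_2 f(x)$, $\partial_1 \partial_2 f(x)$, $\partial_2 \partial_1 f(x)$ exist at all points $x \in U$, and if the mixed partial derivatives $\partial_1 \partial_2 f(x)$, $\partial_2 \partial_1 f(x)$ are continuous at a point $a \in U$, then $\partial_1 \partial_2 f(a) = \partial_2 \partial_1 f(a)$.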

Proof: By an application of the mean value theorem for functions of one variable it follows that

$$\begin{aligned} \Delta &= \bigl( f(a_1 + h_1, a_2 + h_2) - f(a_1 + h_1, a_2) \bigr) - \bigl( f(a_1, a_2 + h_2) - f(a_1, a_2) \bigr) \\ &= \phi(a_1 + h_1) - \phi(a_1) \quad \text{where } \phi(x_1) = f(x_1, a_2 + h_2) - f(x_1, a_2) \\ &= \phi'(\alpha_1) h_1 \quad \text{where } \alpha_1 \text{ is between } a_1 \text{ and } a_1 + h_1 \end{aligned}$$

if $h$ is sufficiently small, and by another application of the mean value theorem for functions of one variable it further follows that

$$\begin{aligned} \phi'(\alpha_1) &= \partial_1 f(\alpha_1, a_2 + h_2) - \partial_1 f(\alpha_1, a_2) \\ &= \partial_2 \partial_1 f(\alpha_1, \alpha_2) h_2 \quad \text{where } \alpha_2 \text{ is between } a_2 \text{ and } a_2 + h_2 \end{aligned}$$

if $h$ is sufficiently small; consequently

$\Delta = \partial_2 \partial_1 f(\alpha_1, \alpha_2) h_2 h_1$.

On the other hand it is possible to group the terms in the equation for $\Delta$ in another way, so that

$\Delta = \bigl( f(a_1 + h_1, a_2 + h_2) - f(a_1, a_2 + h_2) \bigr) - \bigl( f(a_1 + h_1, a_2) - f(a_1, a_2) \bigr)$.

The same argument used for the first grouping when applied to this grouping amounts to interchanging the roles of the two variables, which leads to the result that

$\Delta = \partial_1 \partial_2 f(\beta_1, \beta_2) h_1 h_2$

for some points $\beta_1$ between $a_1$ and $a_1 + h_1$ and $\beta_2$ between $a_2$ and $a_2 + h_2$ if $h$ is sufficiently small; the points $\beta_1$ and $\beta_2$ are not necessarily the same as the points $\alpha_1$ and $\alpha_2$ since they are derived by different applications of the mean value theorem in one variable. Comparing the two expressions for $\Delta$ and dividing by $h_1 h_2$ shows that

$\partial_2 \partial_1 f(\alpha_1, \alpha_2) = \partial_1 \partial_2 f(\beta_1, \beta_2)$.

Since the functions $\partial_1 \partial_2 f(x_1, x_2)$ and $\partial_2 \partial_1 f(x_1, x_2)$ are both continuous at the point $a$ and $\alpha_1$ and $\beta_1$ both approach $a_1$ as $h_1$ tends to $0$, and correspondingly for the points $\alpha_2$ and $\beta_2$, it follows in the limit that $\partial_2 \partial_1 f(a_1, a_2) = \partial_1 \partial_2 f(a_1, a_2)$, which suffices to conclude the proof.

The preceding result shows that so long as the mixed partial derivatives exist and are continuous functions the order in which partial derivatives are taken is immaterial; in particular that is the case for $C^2$ functions, those for which all the second partial derivatives exist and are continuous. In such cases the notation can be simplified, for example by writing $\partial_1^2 \partial_2 f(x)$ in place of $\partial_1 \partial_2 \partial_1 f(x)$ or $\partial_2 \partial_1 \partial_1 f(x)$, or by writing $\partial_{j_1 j_2} f(x)$ in place of $\partial_{j_1} \partial_{j_2} f(x)$. It is sometimes convenient to use the multi-index notation, in which $\partial^I f(x)$ where $I = (3, 2, 1)$ stands for $\partial_1^3 \partial_2^2 \partial_3 f(x)$ for instance. Another notation frequently used is

$\dfrac{\partial^3 f(x)}{\partial x_1^2 \partial x_2} = \partial_1^2 \partial_2 f(x)$.

The definition (2.1) of differentiability involved an approximation of a function by a polynomial of degree 1 for sufficiently small values of the auxiliary variable $h$. The existence of higher order derivatives can be interpreted correspondingly as involving an approximation of a function by polynomials of higher degree for sufficiently small values of the auxiliary variable $h$. Since this topic is sometimes omitted in treatments of functions of a single variable the discussion here will begin with the results for functions of one variable, from which the general result follows readily.

Theorem 2.5 (Taylor expansion in one variable) If a function $f : U \longrightarrow \mathbb{R}^1$ has derivatives up to order $k + 1$ in an open neighborhood $U \subset \mathbb{R}^1$ of a point $a \in \mathbb{R}^1$ then for any $h \in \mathbb{R}^1$ sufficiently small

(2.10)
$$f(a + h) = f(a) + f'(a)h + \frac{1}{2!} f''(a)h^2 + \frac{1}{3!} f'''(a)h^3 + \cdots + \frac{1}{k!} f^{(k)}(a)h^k + \frac{1}{(k + 1)!} f^{(k+1)}(\alpha)h^{k+1}$$

where $\alpha$ is between $a$ and $a + h$.

Proof: For a fixed $h \in \mathbb{R}$ sufficiently small that the closed interval from $a$ to $a+h$ is contained in $U$ let

\[ R(x) = f(a+x) - f(a) - f'(a)x - \frac{1}{2!}f''(a)x^2 - \frac{1}{3!}f'''(a)x^3 - \cdots - \frac{1}{k!}f^{(k)}(a)x^k - \frac{1}{(k+1)!}c\,x^{k+1} \tag{2.11} \]

where $c \in \mathbb{R}$ is chosen so that $R(h) = 0$. Clearly $R(0) = 0$ as well, so it follows from an application of the mean value theorem for functions of a single variable that $R'(\alpha_1) = 0$ for some value $\alpha_1$ between $0$ and $h$. Note that

\[ R'(x) = f'(a+x) - f'(a) - f''(a)x - \frac{1}{2!}f'''(a)x^2 - \cdots - \frac{1}{(k-1)!}f^{(k)}(a)x^{k-1} - \frac{1}{k!}c\,x^k, \tag{2.12} \]

which is much the same as (2.11) but for the function $f'$ in place of $f$ and the index $k-1$ in place of the index $k$. In particular $R'(0) = 0$, and since $R'(\alpha_1) = 0$ it follows from another application of the mean value theorem for functions of a single variable that $R''(\alpha_2) = 0$ for some $\alpha_2$ between $0$ and $\alpha_1$. The process continues, with further differentiation of the expression (2.11), yielding a sequence of points $\alpha_1, \alpha_2, \ldots, \alpha_{k+1}$ where $\alpha_{i+1}$ is between $0$ and $\alpha_i$, hence is between $0$ and $h$, and $R^{(i)}(\alpha_i) = 0$. Finally for the case $i = k+1$ it follows that

\[ R^{(k+1)}(x) = f^{(k+1)}(a+x) - c, \]

and since $R^{(k+1)}(\alpha_{k+1}) = 0$ it follows that $c = f^{(k+1)}(a + \alpha_{k+1})$. Since $R(h) = 0$ this is the expansion (2.10) with $\alpha = a + \alpha_{k+1}$, and that suffices for the proof.
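As a numerical illustration of Lagrange's form of the remainder, the following sketch takes the exponential function at $a = 0$ (a choice made here purely for illustration, not one used in the text) and solves the remainder identity for the intermediate point $\alpha$. The helper names `taylor_poly_exp` and `lagrange_point_exp` are hypothetical.

```python
import math


def taylor_poly_exp(h, k):
    """Taylor polynomial of exp about a = 0, up to and including order k."""
    return sum(h ** i / math.factorial(i) for i in range(k + 1))


def lagrange_point_exp(h, k):
    """Solve exp(h) - P_k(h) = exp(alpha) * h**(k+1) / (k+1)! for alpha.

    Assumes h > 0, so that the remainder is positive and the logarithm exists.
    """
    remainder = math.exp(h) - taylor_poly_exp(h, k)
    return math.log(remainder * math.factorial(k + 1) / h ** (k + 1))


h, k = 0.5, 3
alpha = lagrange_point_exp(h, k)
# The theorem predicts that alpha lies between a = 0 and a + h = 0.5.
print("alpha =", alpha, "lies in (0, h):", 0.0 < alpha < h)
```

Because every derivative of the exponential is again the exponential, the equation for $\alpha$ can be solved in closed form here; for a general function only the existence of $\alpha$ is asserted.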

The corresponding result in several variables can be deduced from the result in a single variable by an application of the chain rule.




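Theorem 2.6 (Taylor expansion in several variables) If $f : U \longrightarrow \mathbb{R}$ has continuous partial derivatives up to order $k+1$ in an open neighborhood $U \subset \mathbb{R}^m$ of a point $a \in \mathbb{R}^m$ then for any $h = \{h_j\} \in \mathbb{R}^m$ sufficiently small

\[ \begin{aligned} f(a+h) = f(a) &+ \sum_{j=1}^m \partial_j f(a)h_j + \frac{1}{2!}\sum_{j_1,j_2=1}^m \partial_{j_1 j_2}f(a)h_{j_1}h_{j_2} + \cdots \\ &\cdots + \frac{1}{k!}\sum_{j_1,\ldots,j_k=1}^m \partial_{j_1\ldots j_k}f(a)h_{j_1}\cdots h_{j_k} \\ &+ \frac{1}{(k+1)!}\sum_{j_1,\ldots,j_{k+1}=1}^m \partial_{j_1\ldots j_{k+1}}f(\alpha)h_{j_1}\cdots h_{j_{k+1}} \end{aligned} \tag{2.13} \]

where $\alpha$ is between $a$ and $a+h$ on the line segment connecting them.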

Proof: Let $\varphi(t) = a + th$ for any $t \in \mathbb{R}$ and consider the function $g(t) = f(\varphi(t))$, for which $g(0) = f(\varphi(0)) = f(a)$ and $g(1) = f(\varphi(1)) = f(a+h)$. By the Taylor expansion (2.10) of the function $g(t)$

\[ g(1) = g(0) + g'(0) + \frac{1}{2!}g''(0) + \cdots + \frac{1}{k!}g^{(k)}(0) + \frac{1}{(k+1)!}g^{(k+1)}(\tau) \tag{2.14} \]

for some $\tau \in (0,1)$. By repeated applications of the chain rule

\[ g'(t) = \sum_{j_1=1}^m \partial_{j_1}f(a+th)h_{j_1}, \qquad g''(t) = \sum_{j_1,j_2=1}^m \partial_{j_1 j_2}f(a+th)h_{j_1}h_{j_2} \]

and in general

\[ g^{(\nu)}(t) = \sum_{j_1,j_2,\ldots,j_\nu=1}^m \partial_{j_1 j_2\cdots j_\nu}f(a+th)h_{j_1}h_{j_2}\cdots h_{j_\nu}; \]

substituting these values into (2.14) for $t = 0$ yields (2.13) for $\alpha = a + \tau h$ and thereby concludes the proof.

As a consequence, if a function $f(x)$ is twice continuously differentiable near a point $a \in \mathbb{R}^m$ and if $h \in \mathbb{R}^m$ is sufficiently small then by (2.13) for $k = 1$

\[ f(a+h) = f(a) + \sum_{j=1}^m \partial_j f(a)h_j + \frac{1}{2!}\sum_{j_1,j_2=1}^m \partial_{j_1 j_2}f(\alpha)h_{j_1}h_{j_2} \tag{2.15} \]

for some point $\alpha$ between $a$ and $a+h$ on the line segment connecting them. The expression $\sum_{j_1,j_2=1}^m \partial_{j_1 j_2}f(\alpha)h_{j_1}h_{j_2}$ is a quadratic form in the variables $h_j$ defined by the matrix $\{\partial_{j_1}\partial_{j_2}f(\alpha)\}$; this matrix, which by Theorem 2.4 is a symmetric matrix, is called the Hessian of the function $f(x)$ at the point $\alpha$. The last term in the Taylor expansions (2.10) and (2.13) is called Lagrange's form of the remainder; there are various alternative forms for the last term that are often used, but they will not be needed in the discussion here.
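The role of the Hessian in the Taylor expansion can be checked numerically. The sketch below uses the sample function $f(x_1, x_2) = \sin(x_1)\,e^{x_2}$, chosen here only for illustration, with its gradient and Hessian written out by hand; it compares $f(a+h)$ with the quadratic Taylor polynomial at $a$ and reports how the error changes when $h$ is halved. Since the neglected terms are of third order in $h$, the error should drop by a factor of roughly 8.

```python
import math


def f(x1, x2):
    return math.sin(x1) * math.exp(x2)


def gradient(x1, x2):
    return [math.cos(x1) * math.exp(x2), math.sin(x1) * math.exp(x2)]


def hessian(x1, x2):
    s, c, e = math.sin(x1), math.cos(x1), math.exp(x2)
    return [[-s * e, c * e],
            [c * e, s * e]]


def quadratic_taylor(a, h):
    """f(a) + sum_j d_j f(a) h_j + (1/2!) sum_{j1,j2} d_{j1 j2} f(a) h_j1 h_j2."""
    g = gradient(*a)
    H = hessian(*a)
    linear = sum(g[j] * h[j] for j in range(2))
    quad = sum(H[i][j] * h[i] * h[j] for i in range(2) for j in range(2))
    return f(*a) + linear + 0.5 * quad


def taylor_error(a, h):
    return abs(f(a[0] + h[0], a[1] + h[1]) - quadratic_taylor(a, h))


a = (0.3, -0.2)
h = (0.1, 0.05)
e1 = taylor_error(a, h)
e2 = taylor_error(a, (h[0] / 2, h[1] / 2))
print("error at h:", e1, " error at h/2:", e2, " ratio:", e1 / e2)
```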




2.4 Functions

The special case of ordinary real-valued functions of $m$ variables is particularly important and will be considered in more detail here. The derivative $f'(x)$ of a function is a $1 \times m$ matrix, so it is not a vector in the sense used for points $x \in \mathbb{R}^m$, which are always viewed as column vectors; but the transpose ${}^t\!f'(x)$ is a column vector, so it can be viewed as an ordinary vector in $\mathbb{R}^m$. To avoid confusion and keep these distinctions clear, the transpose of the derivative is often denoted by $\nabla f(x)$, although an earlier but still used alternative notation is $\operatorname{grad} f(x)$; and this vector is called the gradient of the function $f(x)$. Thus for a function $f(x_1, x_2)$ of two variables

\[ f'(x) = \begin{pmatrix} \partial_1 f(x) & \partial_2 f(x) \end{pmatrix} \quad\text{and}\quad \nabla f(x) = \operatorname{grad} f(x) = \begin{pmatrix} \partial_1 f(x) \\ \partial_2 f(x) \end{pmatrix}. \]

Although this is really a rather trivial point, the convention that the gradient of a function defined in a subset of a vector space $\mathbb{R}^m$ is a vector in the same sense as all other vectors in $\mathbb{R}^m$ is quite standard and convenient; $\nabla f(x)$ is a vector that can be used in the same context as the vector $x$, either in the addition of vectors or in the dot or inner product of two vectors.

As a first example of this usage, a straight line through a point $a \in \mathbb{R}^m$ in the direction of a unit vector $u$ can be described parametrically as the set of points $x = \varphi(t)$ for $t \in \mathbb{R}$, where $\varphi : \mathbb{R} \longrightarrow \mathbb{R}^m$ is the mapping $\varphi(t) = a + tu$. If $f : U \longrightarrow \mathbb{R}$ is a differentiable function in an open set $U \subset \mathbb{R}^m$ containing the point $a$ the restriction of $f$ to this straight line can be viewed as a function $(f \circ \varphi)(t) = f(a + tu)$ of the parameter $t \in \mathbb{R}$ near the origin. The derivative of this restriction at $t = 0$ is called the directional derivative of the function $f$ at the point $a$ in the direction of the unit vector $u$, and is denoted by $\partial_u f(a)$. It follows from the chain rule that $\partial_u f(a) = f'(a)\varphi'(0)$; the coordinate functions of the mapping $\varphi(t)$ are $\varphi_j(t) = a_j + u_j t$ so the derivative $\varphi'(0)$ is just the $m \times 1$ matrix or vector $\varphi'(0) = \{u_j\}$, hence

\[ \partial_u f(a) = \sum_{j=1}^m \partial_j f(a)u_j = \nabla f(a) \cdot u. \tag{2.16} \]

Thus the directional derivative of the function $f$ in the direction of a unit vector $u$ is the dot product of the gradient of the function $f$ at the point $a$ with the unit vector $u$. The angle $\theta$ between the vectors $\nabla f(a)$ and $u$ is determined by

\[ \cos\theta = \frac{\nabla f(a) \cdot u}{\|\nabla f(a)\|_2} = \frac{1}{\|\nabla f(a)\|_2}\,\partial_u f(a) \tag{2.17} \]

since $u$ is a unit vector; so it follows that the directional derivative is $\partial_u f(a) = \|\nabla f(a)\|_2 \cos\theta$, the length of the projection of the gradient vector $\nabla f(a)$ in the direction of the unit vector $u$. Thus the maximal directional derivative of the function $f(x)$ at the point $a$ is in the direction of the vector $\nabla f(a)$ and is equal to $\|\nabla f(a)\|_2$, while the minimal directional derivative is in the direction of the vector $-\nabla f(a)$ and is equal to $-\|\nabla f(a)\|_2$; and $\partial_u f(a) = 0$ for any vector $u$ orthogonal to the gradient $\nabla f(a)$. The gradient $\nabla f(a)$ of a function $f$ at a point $a$ is a vector in the direction in which $f$ is increasing most rapidly; and the length $\|\nabla f(a)\|_2$ of the gradient vector is the maximum rate of increase of the function $f$.
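The formula $\partial_u f(a) = \nabla f(a) \cdot u$ and the statement about the direction of steepest increase can be illustrated with a short computation. The sketch below assumes the sample function $f(x) = x_1^2 + 3x_1x_2$ and the point $a = (1, 2)$, both chosen here for illustration; it compares the dot product formula with a central finite difference and scans unit vectors at one-degree steps to locate the direction of the largest directional derivative.

```python
import math


def f(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1]


def grad_f(x):
    # Partial derivatives of the sample function, computed by hand.
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]


def directional_derivative(grad, u):
    """Dot product of the gradient with the unit vector u, as in (2.16)."""
    return sum(g * c for g, c in zip(grad, u))


def finite_difference(func, a, u, t=1e-6):
    """Central difference approximation to the derivative of t -> func(a + t u) at t = 0."""
    forward = func([ai + t * ui for ai, ui in zip(a, u)])
    backward = func([ai - t * ui for ai, ui in zip(a, u)])
    return (forward - backward) / (2.0 * t)


a = [1.0, 2.0]
g = grad_f(a)
norm_g = math.hypot(g[0], g[1])

u = [0.6, 0.8]  # a unit vector
print(directional_derivative(g, u), finite_difference(f, a, u))

# Scan unit vectors (cos t, sin t) and pick the one with the largest derivative.
angles = [2.0 * math.pi * i / 360 for i in range(360)]
best = max(angles, key=lambda t: directional_derivative(g, [math.cos(t), math.sin(t)]))
print("best angle:", best, " gradient angle:", math.atan2(g[1], g[0]))
```

The scan is only as fine as its one-degree grid, so the best angle agrees with the angle of the gradient to within that resolution.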

The gradient notation is also commonly used in examining extremal values of a differentiable function $f$ defined in an open set $U \subset \mathbb{R}^m$. A point $a \in U$ is a local maximum (plural: local maxima) of $f$ if $f(a) \geq f(x)$ for all points $x$ near $a$, and is a local minimum (plural: local minima) of the function $f$ if $f(a) \leq f(x)$ for all points $x$ near $a$; in either case the point $a$ is called a local extremum (plural: local extrema) of the function $f$. It is familiar that local extrema of differentiable functions $f(x)$ of a single variable occur among those points $a$ at which $f'(a) = 0$. Local extrema of a differentiable function $f$ of several variables occur among those points $a$ at which $\nabla f(a) = 0$, which points are called the critical points of the function $f$. That is quite clear, since if $a$ is a local extremum then it is a local extremum of the function $f(x)$ as a function of any one coordinate $x_j$ when the remaining coordinates are held fixed, so by the one-variable result $\partial_j f(a) = 0$; and since that is the case for all indices $j$ it follows that $f'(a) = \{\partial_j f(a)\} = 0$ hence the point $a$ is a critical point. Not all critical points are local extrema though; the origin is a critical point of the function $f(x) = x^3$ of a single variable but is neither a local maximum nor a local minimum. The process of finding maxima or minima of functions defined in an open subset $U \subset \mathbb{R}^m$ begins with the identification of the critical points of the function, which are the candidates for at least local extrema. In many cases it is fairly easy to tell by close inspection whether a local extremum is a local maximum or local minimum; and if that fails there is an analogue for functions of several variables of the second-derivative test familiar from the examination of extrema for functions of a single variable. If $f$ is a twice continuously differentiable function with a critical point $a \in \mathbb{R}^m$ then since $f'(a) = 0$ the Taylor expansion (2.15) takes the form

\[ f(a+h) = f(a) + \frac{1}{2!}\sum_{j_1,j_2=1}^m \partial_{j_1 j_2}f(\alpha)h_{j_1}h_{j_2} \]

where $\alpha$ is a point between $a$ and $a+h$ on the line joining these two points. If the Hessian $\{\partial_{j_1 j_2}f(a)\}$ is positive definite then by continuity the Hessian $\{\partial_{j_1 j_2}f(\alpha)\}$ is positive definite for all points $\alpha \in U$ for an open neighborhood $U$ of $a$, so that

\[ \sum_{j_1,j_2=1}^m \partial_{j_1 j_2}f(\alpha)h_{j_1}h_{j_2} \geq 0 \quad\text{for all } \alpha \in U; \]

consequently $f(a+h) \geq f(a)$ so $a$ is a local minimum of the function $f$. Similarly if the Hessian $\{\partial_{j_1 j_2}f(a)\}$ is negative definite then $a$ is a local maximum. If the Hessian is neither positive definite nor negative definite the second derivative test generally provides no information; but if the Hessian has some strictly positive and some strictly negative eigenvalues then $a$ is neither a local maximum nor local minimum, but is a point called a saddle point of the function. Geometrically the graph of a function of two variables near such a point looks something like an ordinary saddle; it is concave upwards in one direction and concave downwards in another direction. The difficulty lies in determining whether the Hessian is positive or negative definite. The simplest general result is that a symmetric $n \times n$ matrix is positive definite if and only if the principal minors (the determinants of all submatrices formed from the first $k$ rows and columns for $1 \leq k \leq n$) are all strictly positive, and the matrix is negative definite if and only if the principal minors alternate in sign beginning with a negative sign, so the $1 \times 1$ minor is negative, the $2 \times 2$ minor is positive, and so on. For the case of functions in $\mathbb{R}^m$ for sufficiently small $m$ this is fairly easy to apply: for example if $m = 2$ the principal minors of a matrix $M = \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix}$ are $m_{11}$ and $m_{11}m_{22} - m_{12}m_{21}$, so it is fairly easy to see whether the critical point is a local maximum or local minimum, although it does involve calculating some second derivatives. It is sometimes tempting to think that a function $f$ has a local minimum at $a \in \mathbb{R}^m$ if $\partial_j^2 f(a) \geq 0$ for $1 \leq j \leq m$; but that is not necessarily the case, since for example the matrix $\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ has strictly positive diagonal entries but is not positive definite since the two principal minors are $1$ and $-3$.
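The principal minor criterion is mechanical enough to be written out as a short program. The following is a minimal sketch, with hypothetical helper names `determinant`, `leading_minors` and `classify`; the determinant is computed by cofactor expansion, which is adequate only for the small matrices that arise in hand calculations.

```python
def determinant(matrix):
    """Determinant by cofactor expansion along the first row."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    total = 0
    for col in range(n):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * matrix[0][col] * determinant(minor)
    return total


def leading_minors(matrix):
    """Determinants of the submatrices formed from the first k rows and columns."""
    n = len(matrix)
    return [determinant([row[:k] for row in matrix[:k]]) for k in range(1, n + 1)]


def classify(matrix):
    """Apply the sign test on the principal minors of a symmetric matrix."""
    minors = leading_minors(matrix)
    if all(d > 0 for d in minors):
        return "positive definite"
    # Negative definite: signs alternate, starting with a negative 1 x 1 minor.
    if all((d < 0) if k % 2 == 1 else (d > 0)
           for k, d in enumerate(minors, start=1)):
        return "negative definite"
    return "not definite"


example = [[1, 2], [2, 1]]
print(leading_minors(example), classify(example))
```

For the matrix with rows $(1, 2)$ and $(2, 1)$ discussed above the minors are $1$ and $-3$, so the test reports that the matrix is not definite even though its diagonal entries are positive.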

The condition that a function $f$ is differentiable at a point $a$ can be viewed as the condition that $f$ can be approximated near the point $a$ by an affine function, a polynomial of degree 1 in the variables in $\mathbb{R}^m$, and that the error in this approximation is fairly small; this interpretation is often quite useful in practice. It is often expressed as the condition that for small changes $\Delta x$ in the coordinates of a point in $\mathbb{R}^m$ the change $\Delta f$ in the value of the function is approximately a linear function of $\Delta x$; for (2.1) can be written

\[ \Delta f(x) = f(x + \Delta x) - f(x) = f'(x)\Delta x + \epsilon(\Delta x), \tag{2.18} \]

so the change $\Delta f(x)$ in the value of the function $f(x)$ is approximately equal to the linear function $f'(x)\Delta x$ of the change $\Delta x$ in the variable $x$, and the error $\epsilon(\Delta x)$ is much smaller than the change $\Delta x$ in the variable since

\[ \lim_{\Delta x \to 0} \frac{\|\epsilon(\Delta x)\|}{\|\Delta x\|} = 0. \]

If the function $f(x)$ is $C^2$ in an open neighborhood $U$ of the point $x$ and if $M = \sup_{t \in U}\|\partial_{j_1 j_2}f(t)\|_\infty$ is a bound on the Hessian of the function $f(t)$ for $t \in U$ then

\[ \Bigl|\sum_{j_1,j_2=1}^m \partial_{j_1 j_2}f(t)h_{j_1}h_{j_2}\Bigr| \leq M m^2\|h\|_\infty^2 \]

for all points $t \in U$, and it follows from (2.15) that

\[ |\epsilon(\Delta x)| \leq M m^2\|\Delta x\|_\infty^2; \tag{2.19} \]

this is a rough but sometimes useful estimate for the size of the error term in (2.1). In practice it is often convenient to have a quick approximation for the values of a function $f$ in a neighborhood of a point when the value of the function at that point is known.

Theorem 2.7 (The Mean Value Theorem) If $f : U \longrightarrow \mathbb{R}$ is a differentiable function in an open set $U \subset \mathbb{R}^m$ and if two points $a, b$ and the line $\lambda$ joining them lie in $U$ then

\[ f(b) - f(a) = \nabla f(c) \cdot (b - a) \tag{2.20} \]

for some point $c$ between $a$ and $b$ on the line $\lambda$ joining these two points.

Proof: The function $g(t) = f\bigl(a + t(b - a)\bigr)$ is a differentiable function of the variable $t$ in an open neighborhood of the interval $[0, 1]$, and by the mean value theorem for functions of one variable $g(1) - g(0) = g'(\tau)$ for a point $\tau \in (0, 1)$. Since $g(1) = f(b)$ and $g(0) = f(a)$, while by the chain rule

\[ g'(t) = \sum_{j=1}^m \partial_j f\bigl(a + t(b - a)\bigr)(b_j - a_j), \]

it follows that

\[ f(b) - f(a) = g(1) - g(0) = g'(\tau) = \sum_{j=1}^m \partial_j f\bigl(a + \tau(b - a)\bigr)(b_j - a_j) = \sum_{j=1}^m \partial_j f(c)(b_j - a_j) = \nabla f(c) \cdot (b - a) \]

where $c = a + \tau(b - a)$, and that suffices for the proof.
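The proof reduces the theorem to the one-variable function $g(t) = f(a + t(b-a))$, and for a concrete function the intermediate point can be located numerically. The sketch below assumes the sample function $f(x) = x_1^3 + x_2$ with $a = (0, 0)$ and $b = (1, 1)$, chosen here for illustration; for this choice $g'(t) = 3t^2 + 1$ is increasing on $[0, 1]$, which is the assumption that makes a simple bisection valid.

```python
def f(x):
    return x[0] ** 3 + x[1]


def grad_f(x):
    return [3.0 * x[0] ** 2, 1.0]


def g_prime(t, a, b):
    """Derivative of g(t) = f(a + t (b - a)) by the chain rule."""
    point = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
    return sum(gj * (bj - aj) for gj, aj, bj in zip(grad_f(point), a, b))


def mean_value_parameter(a, b, tol=1e-12):
    """Bisection for tau in (0, 1) with g'(tau) = f(b) - f(a).

    Assumes g' is increasing on [0, 1] and crosses the target value there,
    which holds for the sample function above but not in general.
    """
    target = f(b) - f(a)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g_prime(mid, a, b) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a, b = [0.0, 0.0], [1.0, 1.0]
tau = mean_value_parameter(a, b)
c = [ai + tau * (bi - ai) for ai, bi in zip(a, b)]
print("tau =", tau, " c =", c)
```

Here the equation $3\tau^2 + 1 = 2$ can also be solved by hand, giving $\tau = 1/\sqrt{3}$, which provides a check on the bisection.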

The mean value theorem does not extend directly to mappings $f : U \longrightarrow \mathbb{R}^n$ for $n > 1$; for although the preceding theorem can be applied to each coordinate function of the mapping $f$, the points at which the derivatives of the different coordinate functions are evaluated may be different. However for some purposes an estimate is useful enough.
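A standard example of this failure, not taken from the text, is the mapping $f(t) = (\cos t, \sin t)$ from $\mathbb{R}^1$ to $\mathbb{R}^2$ on the interval from $a = 0$ to $b = 2\pi$: the difference $f(b) - f(a)$ vanishes, so an equation $f(b) - f(a) = f'(c)(b - a)$ would require $f'(c) = 0$, while $f'(t)$ has length 1 at every point. The following sketch checks both facts numerically on a sample of points.

```python
import math


def f(t):
    return (math.cos(t), math.sin(t))


def f_prime(t):
    return (-math.sin(t), math.cos(t))


a, b = 0.0, 2.0 * math.pi

# f(b) - f(a) is zero up to rounding error.
difference = tuple(fb - fa for fa, fb in zip(f(a), f(b)))

# The length of f'(t) at 1001 equally spaced points of [a, b].
speeds = [math.hypot(*f_prime(a + (b - a) * i / 1000)) for i in range(1001)]

print("f(b) - f(a) =", difference)
print("smallest length of f'(t) on the sample:", min(speeds))
```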

Corollary 2.8 (The Mean Value Inequality) If $f : U \longrightarrow \mathbb{R}^n$ is a differentiable mapping in an open set $U \subset \mathbb{R}^m$ and if two points $a, b$ and the line $\lambda$ joining them lie in $U$ then

\[ \|f(b) - f(a)\|_\infty \leq m\sqrt{n}\,\|f'(c)\|_\infty\|b - a\|_\infty \tag{2.21} \]

for some point $c$ between $a$ and $b$ on the line $\lambda$.

Proof: For any vector $u \in \mathbb{R}^n$ for which $\|u\|_2 = 1$ the dot product $f_u(x) = u \cdot f(x)$ is a real-valued function to which the Mean Value Theorem can be applied, so

\[ \begin{aligned} f_u(b) - f_u(a) &= \nabla f_u(c) \cdot (b - a) = \sum_{j=1}^m \partial_j(u \cdot f)(c)\,(b_j - a_j) \\ &= \sum_{i=1}^n\sum_{j=1}^m u_i\,\partial_j f_i(c)(b_j - a_j) = \sum_{i=1}^n\sum_{j=1}^m u_i \cdot f'(c)_{ij}(b_j - a_j) \\ &= u \cdot \bigl(f'(c)(b - a)\bigr) \end{aligned} \]

for some point $c$ between $a$ and $b$ on the line $\lambda$ joining these two points. If the unit vector $u$ is chosen to lie in the direction of the vector $f(b) - f(a)$ then

\[ \|f(b) - f(a)\|_2 = |f_u(b) - f_u(a)| = \bigl|u \cdot \bigl(f'(c)(b - a)\bigr)\bigr| \leq \|f'(c)(b - a)\|_2 \]

by the Cauchy-Schwarz inequality, so by (1.7) and (1.10)

\[ \|f(b) - f(a)\|_\infty \leq \|f(b) - f(a)\|_2 \leq \|f'(c)(b - a)\|_2 \leq \sqrt{n}\,\|f'(c)(b - a)\|_\infty \leq m\sqrt{n}\,\|f'(c)\|_\infty\|b - a\|_\infty, \]

which suffices for the proof.

Corollary 2.9 If $f : \Delta \longrightarrow \mathbb{R}^n$ is a differentiable mapping in a cell $\Delta \subset \mathbb{R}^m$ such that $\|f'(x)\|_\infty \leq M$ for all $x \in \Delta$ then the mapping $f$ is uniformly continuous in $\Delta$.

Proof: If $\|f'(x)\|_\infty \leq M$ for all $x \in \Delta$ then for any two points $x, y \in \Delta$ the line joining them is contained in $\Delta$ so it follows from the Mean Value Inequality (2.21) that

\[ \|f(x) - f(y)\|_\infty \leq m\sqrt{n}\,M\,\|x - y\|_\infty, \]

which shows that the mapping $f$ is uniformly continuous in $\Delta$ and thereby concludes the proof.

One way in which to avoid the question whether a vector is a row vector or column vector is to write the vector as an explicit linear combination of basis vectors in the vector space $\mathbb{R}^m$. If $f(x) = x_j$ then

\[ f'(x) = \begin{pmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{pmatrix}, \]

where the entry 1 is in column $j$; thus $f'(x)$ is independent of $x$ and actually is one of the standard basis vectors for the vector space $\mathbb{R}^m$. It is tempting to avoid introducing a separate notation and to denote this function just by $x_j$. The standard notation for its derivative then would be $x'_j$, which is somewhat confusing since $x'_j$ commonly is used to denote another set of variables in $\mathbb{R}^n$; it is clearer and rather more customary to denote the derivative of the function $x_j$ by $dx_j$, so that

\[ dx_j = \begin{pmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{pmatrix} \tag{2.22} \]

with the entry 1 in column $j$ and all other entries 0. With this notation the vector $f'(x)$ for an arbitrary differentiable function $f$ can be denoted correspondingly by $df(x)$ and written in terms of the basis (2.22) as

\[ df(x) = \sum_{j=1}^m \partial_j f(x)\,dx_j. \tag{2.23} \]

This form for the derivative $f'(x)$ is called a differential form, or to be more explicit, a differential form of degree 1. More generally a mapping $f : U \longrightarrow \mathbb{R}^m$ when viewed as associating to a point $x \in \mathbb{R}^m$ a vector also of dimension $m$ is called a vector field on the subset $U \subset \mathbb{R}^m$; this is familiar from physics, for the electric, magnetic and gravitational fields. A vector field $f$ in an open subset $U \subset \mathbb{R}^m$ with the coordinate functions $f_j(x)$ also can be written as a linear combination of the standard basis vectors in $\mathbb{R}^m$ and consequently as a differential form of degree 1, explicitly

\[ \omega_f(x) = \sum_{j=1}^m f_j(x)\,dx_j. \tag{2.24} \]

The discussion of differential forms will be taken up again in Chapter 5.




PROBLEMS: GROUP 1

(1) Find the derivatives of the following mappings:

(i) $f : \mathbb{R}^3 \longrightarrow \mathbb{R}^3$ where $f(x) = \begin{pmatrix} x_1 + x_2 \\ x_2^3 + x_3^2 \\ x_1x_3 \end{pmatrix}$

(ii) $f : \mathbb{R}^2 \longrightarrow \mathbb{R}^1$ where $f(x) = \sin(x_1^2 + x_2)$

(iii) $f : \mathbb{R}^1 \longrightarrow \mathbb{R}^2$ where $f(x) = \begin{pmatrix} \cos(\sin x) \\ e^{\sin x} \end{pmatrix}$

(2) Find the following iterated partial derivatives:

(i) $\partial_1\partial_2 \sin(x_1x_2)$, (ii) $\partial_1^2 \sin(x_1x_2)$, (iii) $\partial_1\partial_2\partial_3\, e^{x_1+x_2+x_3}$.

(3) Find $\dfrac{\partial^2}{\partial r^2} f(r\cos\theta, r\sin\theta)$ where $f(x_1, x_2)$ is a twice continuously differentiable function of two variables.

(4) Find the local maxima and minima of the function

\[ f(x) = \log(x_1^2 + x_2^2 + 1) \]

in the plane $\mathbb{R}^2$.

(5) Show that the function

\[ f(x_1, x_2) = (x_1^2 + 2x_2^2)\,e^{-x_1^2 - x_2^2} \]

has a minimal and a maximal value in $\mathbb{R}^2$; find the points at which these extrema are attained, and the values of the function $f$ at those points.

(6) Find the directional derivative

\[ \partial_u\bigl(x_1^2x_2 + \sin(\pi x_1x_3)\bigr) \]

at the point $(0, 1, 2)$ in the direction of the unit vector $u = \frac{1}{\sqrt{14}}(1, 2, 3)$.

(7) Find the direction in which the function

\[ f(x) = x_1x_2 + x_2x_3 + x_3x_1 \]

in $\mathbb{R}^3$ is increasing and decreasing most rapidly at the point $(1, 1, 2)$.




PROBLEMS: GROUP 2

(8) Show that the function $f(x) = \sqrt{|x_1x_2|}$ (the non-negative square root) is not differentiable at the origin. Show that a function $f(x)$ such that $|f(x)| \leq \|x\|_2^2$ in an open neighborhood of the origin is differentiable at the origin, and find its derivative at the origin. If a function $f(x)$ satisfies $|f(x)| \leq \|x\|_2$ in an open neighborhood of the origin is it necessarily differentiable at the origin? (Either show that it is differentiable or give a counterexample.)

(9) If $f_1, f_2$ are $C^1$ functions defined in an open neighborhood of the point $(2, 1, -1, -2)$ in $\mathbb{R}^4$ such that $f_1(2, 1, -1, -2) = 4$ and $f_2(2, 1, -1, -2) = 3$, and that

\[ f_1^2 + f_2^2 + 4 = 29 \quad\text{and}\quad \frac{f_1^2}{x_1^2} + \frac{f_2^2}{x_2^2} + 4 = 17, \]

find $\partial_1 f_2(2, 1, -1, -2)$ and $\partial_2 f_1(2, 1, -1, -2)$.

(10) If $f$ is a $C^1$ real-valued function in $\mathbb{R}^2$ such that

\[ f(1, 2) = f(1, 4) = f(2, 1) = f(4, 1) = 1 \]

and

\[ \begin{aligned} \partial_1 f(1, 1) &= -1, & \partial_1 f(1, 2) &= 1, & \partial_1 f(1, 4) &= 3, & \partial_1 f(2, 1) &= 4, & \partial_1 f(4, 1) &= 2, \\ \partial_2 f(1, 1) &= -1, & \partial_2 f(1, 2) &= 5, & \partial_2 f(1, 4) &= -1, & \partial_2 f(2, 1) &= 2, & \partial_2 f(4, 1) &= -3, \end{aligned} \]

find $\partial_1 g(1, 2)$ where

\[ g(x_1, x_2) = f\bigl(f(x_1, x_2),\, f(x_2^2, x_1^3)\bigr). \]

(11) If $g$ is a $C^1$ function in $\mathbb{R}^2$ and

\[ f(x_1, x_2, x_3) = g\bigl(x_1,\, g(x_1, g(x_2, x_3))\bigr), \]

find the derivative of the function $f$ in terms of the partial derivatives of the function $g$.

(12) Show that the set $\mathcal{M}$ of real $2 \times 2$ matrices of the form

\[ M(a, b) = \begin{pmatrix} a & b \\ -b & a \end{pmatrix} \]

forms a real algebra, in the sense that the sum and product of two matrices of this form have the same form. Show that the mapping that associates to a complex number $z = a + ib \in \mathbb{C}$ the matrix $M(z) = M(a, b) \in \mathcal{M}$ has the property that $M(z_1 + z_2) = M(z_1) + M(z_2)$ and $M(z_1z_2) = M(z_1)M(z_2)$. (The algebra of such matrices thus is isomorphic to the algebra of complex numbers.) Show that the set of continuously differentiable mappings $f : \mathbb{R}^2 \longrightarrow \mathbb{R}^2$ satisfying the partial differential equation $f'(x) \in \mathcal{M}$ is closed under composition, in the sense that if $f$ and $g$ are solutions of this partial differential equation then so is the composition $f \circ g$. Show that the identity mapping is a solution of this partial differential equation, but that the mapping $(x_1, x_2) \longrightarrow (x_1, -x_2)$ is not. (This partial differential equation is called the Cauchy-Riemann equation; the solutions are those functions that are limits of complex polynomials in the complex variable $z = x + iy$.) Describe the set of mappings $f : \mathbb{R}^2 \longrightarrow \mathbb{R}^2$ satisfying the partial differential equation $f'(x) \in \mathcal{N}$ for the algebra $\mathcal{N}$ of matrices of the form

\[ \begin{pmatrix} a & b \\ 0 & d \end{pmatrix}. \]

(13) Show that the function

\[ f(x) = \begin{cases} \dfrac{x_1|x_2|}{\sqrt{x_1^2 + x_2^2}} & \text{if } (x_1, x_2) \neq (0, 0), \text{ positive square root}, \\[2ex] 0 & \text{if } (x_1, x_2) = (0, 0), \end{cases} \]

has a well defined directional derivative in each direction at the origin but is not differentiable at the origin.

(14) Find real numbers $a, b$ such that the line $y = ax + b$ in the plane is the best fit to $n$ points $(x_i, y_i) \in \mathbb{R}^2$, in the sense that $\sum_{i=1}^n (y_i - ax_i - b)^2$ is minimized among all possible values of $a$ and $b$.

(15) Show that a strictly subharmonic function in the plane, a function $f(x)$ such that $\partial_1^2 f(x) + \partial_2^2 f(x) > 0$ for all points $x \in \mathbb{R}^2$, cannot have a maximum at any point in the plane. Can such a function have a minimum in the plane? Why?

(16) Show that a twice continuously differentiable function $f(x_1, x_2)$ in $\mathbb{R}^2$ satisfies $\partial_1\partial_2 f(x_1, x_2) = 0$ for all points $x \in \mathbb{R}^2$ if and only if there are twice continuously differentiable functions $g_1, g_2$ in $\mathbb{R}$ such that $f(x_1, x_2) = g_1(x_1) + g_2(x_2)$. To what extent are the functions $g_1(x)$ and $g_2(x)$ uniquely determined?

(17) Show that if $f(x_1, x_2)$ is three times continuously differentiable in an open neighborhood of the origin, if the origin is a critical point at which the Hessian is also zero, and if at least one third partial derivative of $f$ at the origin is nonzero, then the origin is not a local maximum or local minimum of $f$.



Chapter 3

The Rank Theorem

3.1 The Inverse Mapping Theorem

A continuous one-to-one mapping between two open subsets of $\mathbb{R}^n$ with a continuous inverse, a mapping called a homeomorphism between these two sets, really identifies the two sets as far as topological properties are concerned. If the mapping and its inverse are both continuously differentiable, in which case the mapping is called a $C^1$ homeomorphism or sometimes a $C^1$ diffeomorphism, this identification also extends to properties of the derivatives of functions; and if the mapping is a $C^r$ homeomorphism for any $r > 0$ this identification extends correspondingly to properties of all derivatives of order at most $r$. It is a familiar result from calculus in one variable that if $f : I \longrightarrow \mathbb{R}^1$ is a continuously differentiable function in an open interval $I \subset \mathbb{R}^1$ such that $f'(x) \neq 0$ for all $x \in I$ then the mapping $f$ is a $C^1$ homeomorphism from $I$ to its image $f(I)$. The proof of this result is often skipped or treated casually; but it is a model for one proof of the corresponding result in higher dimensions, so it may be helpful to review that special case first.

Suppose that $f : [a, b] \longrightarrow [c, d]$ is a continuous mapping between two closed intervals in $\mathbb{R}^1$ such that $f$ is $C^1$ in $(a, b)$ and $f'(x) \neq 0$ at each point $x \in (a, b)$; since the image $f([a, b])$ of the compact connected set $[a, b]$ is necessarily also compact and connected it must be a closed interval, so it can be assumed that $f([a, b]) = [c, d]$. The set of points where $f'(x) > 0$ and the set of points at which $f'(x) < 0$ are open subsets of $(a, b)$, the union of which is the entire interval $(a, b)$; and since an interval is connected it must be the case that one of these two sets is empty, so either $f'(x) > 0$ at all points $x \in (a, b)$ or $f'(x) < 0$ at all points $x \in (a, b)$. Suppose first that $f'(x) > 0$ at all points $x \in (a, b)$. It follows from the mean value theorem that whenever $a \leq x_1 < x_2 \leq b$ there is a point $\xi$ such that $x_1 < \xi < x_2$ and $f(x_2) - f(x_1) = f'(\xi)(x_2 - x_1) > 0$; thus $f$ is a strictly increasing function in $[a, b]$, so it is a one-to-one mapping from the closed interval $[a, b]$ onto the closed interval $[c, d]$ and has a well defined inverse mapping $g : [c, d] \longrightarrow [a, b]$. For any $\delta > 0$ the continuous positive function $f'(x)$ attains its minimum value at a point in the compact subset $[a + \delta, b - \delta]$, and that minimum must be a positive number $m > 0$ since $f'(x) > 0$ everywhere; so $f'(x) \geq m > 0$ for all $x \in [a + \delta, b - \delta]$. Therefore whenever $a + \delta \leq x_1 < x_2 \leq b - \delta$ it follows from the mean value theorem that there is a point $\xi$ such that $x_1 < \xi < x_2$ and $f(x_2) - f(x_1) = f'(\xi)(x_2 - x_1) \geq m(x_2 - x_1)$. If $y_i = f(x_i)$ then $x_i = g(y_i)$ and the preceding inequality can be written $y_2 - y_1 \geq m\bigl(g(y_2) - g(y_1)\bigr)$. That is the case for any points $y_i$ for which $f(a + \delta) \leq y_1 < y_2 \leq f(b - \delta)$, which shows that the inverse function $g$ is continuous in the interval $[f(a + \delta), f(b - \delta)]$; and since that is the case for any $\delta > 0$ it follows that the function $g$ is continuous in $(c, d)$. Furthermore in the interval $c \leq y_1 < y_2 \leq d$ if $x_i = g(y_i)$ then

\[ \frac{g(y_2) - g(y_1)}{y_2 - y_1} = \frac{x_2 - x_1}{f(x_2) - f(x_1)}; \]

since $g$ is continuous, $x_2$ tends to $x_1$ as $y_2$ tends to $y_1$, so in the limit the preceding equation reduces to $g'(y_1) = 1/f'(x_1)$, showing that the function $g$ is differentiable and that its derivative is also continuous. The corresponding result of course holds if $f'(x) < 0$ in $(a, b)$. This argument critically used the hypothesis that the derivative $f'(x)$ is a continuous function, and the result does not hold without that hypothesis. Indeed the function

\[ h(x) = \tfrac{1}{2}x + f(x) = \tfrac{1}{2}x + x^2\sin\frac{1}{x}, \]

where $f(x)$ is the function (2.4) considered earlier, is differentiable at all points $x \in \mathbb{R}^1$ and $h'(0) = \frac{1}{2}$; but the derivative of this function takes both strictly positive and strictly negative values in any open neighborhood of $0$, so $h$ cannot be a one-to-one mapping in any open neighborhood of $0$.
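The sign changes of the derivative of $h$ can be seen explicitly. Assuming the formula $h(x) = \frac{1}{2}x + x^2\sin(1/x)$ for $x \neq 0$ displayed above, the derivative for $x \neq 0$ is $h'(x) = \frac{1}{2} + 2x\sin(1/x) - \cos(1/x)$, and the sketch below evaluates it at the points $1/(2\pi n)$ and $1/((2n+1)\pi)$, which tend to $0$ as $n$ grows. The function names are illustrative.

```python
import math


def h(x):
    if x == 0.0:
        return 0.0
    return 0.5 * x + x * x * math.sin(1.0 / x)


def h_prime(x):
    """Derivative of h; the value at 0 comes from the difference quotient."""
    if x == 0.0:
        return 0.5
    return 0.5 + 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)


def sample_points(n):
    """Two points near 0: cos(1/x) = 1 at the first and cos(1/x) = -1 at the second."""
    x_neg = 1.0 / (2.0 * math.pi * n)
    x_pos = 1.0 / ((2.0 * n + 1.0) * math.pi)
    return x_neg, x_pos


for n in (1, 10, 100):
    x_neg, x_pos = sample_points(n)
    print(n, h_prime(x_neg), h_prime(x_pos))
```

At the first family of points the derivative is close to $-\frac{1}{2}$ and at the second it is close to $\frac{3}{2}$, so $h$ is neither increasing nor decreasing on any interval around the origin.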

If $f : U \longrightarrow V$ is a $C^1$ homeomorphism between two open subsets $U, V \subset \mathbb{R}^n$ and if $g : V \longrightarrow U$ is its inverse then $g\bigl(f(x)\bigr) = x$, and an application of the chain rule as on page 21 shows that $g'\bigl(f(x)\bigr)f'(x) = I$, the identity matrix, and consequently that $\det f'(x) \neq 0$ at each point $x \in U$. Thus a necessary condition for the mapping $f$ to be a $C^1$ homeomorphism is that $\det f'(x) \neq 0$ for all $x \in U$; the matrix $f'$ is called the Jacobian matrix of the mapping $f$, and its determinant is called the Jacobian of the mapping $f$. This necessary condition is sufficient locally. It is convenient first to establish a special case of that result, from which the general case follows easily.

Theorem 3.1 If $f : W \longrightarrow \mathbb{R}^n$ is a continuously differentiable mapping defined in an open neighborhood $W \subset \mathbb{R}^n$ of the origin such that $f(0) = 0$ and $f'(0) = I$, the identity matrix, then for any sufficiently small open subneighborhood $U \subset W$ of the origin the restriction of the mapping $f$ to $U$ is a $C^1$ homeomorphism $f : U \longrightarrow V$ between $U$ and an open neighborhood $V \subset \mathbb{R}^n$ of the origin.

Proof: The mapping $r(x) = f(x) - x$ is a continuously differentiable mapping such that $r(0) = 0$ and $r'(0) = 0$; therefore there is a closed cell $\Delta \subset W \subset \mathbb{R}^n$ centered at the origin $0$ sufficiently small that the following conditions are satisfied:

(i) $\|r'(x)\|_\infty \leq \dfrac{1}{2n^2}$ for all $x \in \Delta$,

(ii) $\det f'(x) \neq 0$ for all $x \in \Delta$.

The first step in the proof is to establish a basic inequality on which the remainder of the proof rests. From the Mean Value Inequality (2.21) and (i) it follows that for any points $a, b \in \Delta$ there is a point $c$ on the line connecting those two points for which

\[ \|r(b) - r(a)\|_\infty \leq n\sqrt{n}\,\|r'(c)\|_\infty\|b - a\|_\infty \leq \frac{1}{2\sqrt{n}}\|b - a\|_\infty; \]

so by (1.7)

\[ \|r(b) - r(a)\|_2 \leq \sqrt{n}\,\|r(b) - r(a)\|_\infty \leq \tfrac{1}{2}\|b - a\|_\infty \leq \tfrac{1}{2}\|b - a\|_2. \]

Then since $b - a = \bigl(f(b) - f(a)\bigr) - \bigl(r(b) - r(a)\bigr)$, from this inequality and the triangle inequality it follows that

\[ \|b - a\|_2 \leq \|r(b) - r(a)\|_2 + \|f(b) - f(a)\|_2 \leq \tfrac{1}{2}\|b - a\|_2 + \|f(b) - f(a)\|_2 \]

hence that

\[ \|f(b) - f(a)\|_2 \geq \tfrac{1}{2}\|b - a\|_2, \tag{3.1} \]

which is the basic inequality.

It follows from (3.1) that the mapping $f : \Delta \longrightarrow \mathbb{R}^n$ is an injective mapping; for if $f(b) = f(a)$ that inequality shows that $\|b - a\|_2 = 0$. In particular $f(x) \neq f(0) = 0$ for any point $x \in \partial\Delta$, the boundary of the cell $\Delta$; so since $\partial\Delta$ is compact there is a positive number $d > 0$ such that

\[ \|f(x)\|_2 \geq d > 0 \quad\text{for all } x \in \partial\Delta. \tag{3.2} \]

The next step is to show that the open neighborhood $V$ of the origin defined by

\[ V = \Bigl\{\, y \in \mathbb{R}^n \;\Bigm|\; \|y\|_2 < \frac{d}{2} \,\Bigr\} \tag{3.3} \]

is contained in the image $f(\Delta)$. To demonstrate that, for any point $b \in V$ the function $\psi(x) = \|f(x) - b\|_2$ is a continuous function on $\Delta$, and it is just necessary to show that $\psi(a) = 0$ for some point $a \in \Delta$. Since $b \in V$ then $\psi(0) = \|b\|_2 < \frac{d}{2}$ by (3.3), while if $x \in \partial\Delta$ then in view of (3.2) and the triangle inequality

\[ d \leq \|f(x)\|_2 \leq \|f(x) - b\|_2 + \|b\|_2 = \psi(x) + \|b\|_2 < \psi(x) + \frac{d}{2} \]

so $\psi(x) > \frac{d}{2}$. Thus $\psi(0) < \psi(x)$ for any point $x \in \partial\Delta$, so the function $\psi(x)$ must take its minimum value at some interior point $a \in \Delta$; and that point is the obvious candidate to be a zero of the function $\psi(x)$. The square of this function, the function $\psi(x)^2 = \sum_{i=1}^n \bigl(f_i(x) - b_i\bigr)^2$, then also takes its minimum value at the point $a \in \Delta$, hence that point is a critical point of $\psi(x)^2$, a zero of all of its partial derivatives $\partial_j\psi(x)^2 = \sum_{i=1}^n 2\bigl(f_i(x) - b_i\bigr)\partial_j f_i(x)$; thus $0 = \sum_{i=1}^n 2\bigl(f_i(a) - b_i\bigr)\partial_j f_i(a)$ for $1 \leq j \leq n$, which can be written equivalently in terms of the transpose of the matrix $f'(a)$ as

\[ 2\;{}^t\!f'(a)\bigl(f(a) - b\bigr) = 0. \]

However the matrix $f'(a)$ is nonsingular by (ii), hence $f(a) - b = 0$ as desired.

What has been shown so far is that the mapping $f : \Delta \longrightarrow \mathbb{R}^n$ is an injective mapping, and that $V \subset f(\Delta)$. Consequently the subset $U = f^{-1}(V) \subset \Delta$ is an open neighborhood of the origin for which $f$ is a one-to-one mapping $f : U \longrightarrow V$ from $U$ onto $V$; so there is a well defined inverse mapping $g = f^{-1} : V \longrightarrow U$ from $V$ onto $U$. To show that this mapping $g$ is continuously differentiable, for any points $y, y + k \in V$ let $x, x + h$ be their images under the mapping $g : V \longrightarrow U$, so that

\[ \begin{aligned} x &= g(y) & &\text{and} & x + h &= g(y + k), \\ y &= f(x) & &\text{and} & y + k &= f(x + h). \end{aligned} \tag{3.4} \]

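By the basic inequality (3.1)

\[ \|g(y + k) - g(y)\|_2 = \|(x + h) - x\|_2 \leq 2\|f(x + h) - f(x)\|_2 = 2\|(y + k) - y\|_2 = 2\|k\|_2, \]

hence $g$ is continuous. Since $f$ is differentiable

\[ k = f(x + h) - f(x) = f'(x)h + \epsilon_f(h) \quad\text{where}\quad \lim_{h \to 0}\frac{\|\epsilon_f(h)\|_2}{\|h\|_2} = 0, \]

from which it follows that

\[ g(y + k) - g(y) = h = f'(x)^{-1}k + \epsilon_g(k) \tag{3.5} \]

where $\epsilon_g(k) = -f'(x)^{-1}\epsilon_f(h)$. It follows from (1.10) that $\|f'(x)^{-1}k\|_\infty \leq A\|k\|_\infty$ where $A = n\|f'(x)^{-1}\|_\infty$, so by (1.7) equivalently $\|f'(x)^{-1}k\|_2 \leq B\|k\|_2$ where $B = \sqrt{n}A$, and similarly $\|\epsilon_g(k)\|_2 = \|{-f'(x)^{-1}}\epsilon_f(h)\|_2 \leq B\|\epsilon_f(h)\|_2$; altogether then

\[ \frac{\|\epsilon_g(k)\|_2}{\|k\|_2} \leq B\,\frac{\|\epsilon_f(h)\|_2}{\|h\|_2} \cdot \frac{\|h\|_2}{\|k\|_2} \leq 2B\,\frac{\|\epsilon_f(h)\|_2}{\|h\|_2} \tag{3.6} \]

since by the fundamental inequality (3.1) yet again

\[ \frac{\|h\|_2}{\|k\|_2} = \frac{\|(x + h) - x\|_2}{\|f(x + h) - f(x)\|_2} \leq 2. \]

Since $\lim_{h \to 0}\frac{\|\epsilon_f(h)\|_2}{\|h\|_2} = 0$ while $\lim_{k \to 0} h = 0$ as a consequence of the continuity of the mapping $g$, it follows from (3.6) that $\lim_{k \to 0}\frac{\|\epsilon_g(k)\|_2}{\|k\|_2} = 0$ and consequently from (3.5) that the mapping $g$ is differentiable, and moreover that $g'(y) = f'(x)^{-1}$ so the derivative $g'(y)$ is also continuous; that suffices for the proof.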




Corollary 3.2 (Inverse Mapping Theorem) If $f : W \longrightarrow \mathbb{R}^n$ is a continuously differentiable mapping in an open neighborhood $W \subset \mathbb{R}^n$ of a point $a \in \mathbb{R}^n$ such that $\det f'(a) \neq 0$ then for any sufficiently small open subneighborhood $U \subset W$ of the point $a$ the restriction of the mapping $f$ to $U$ is a $C^1$ homeomorphism $f : U \longrightarrow V$ between $U$ and an open neighborhood $V$ of the image point $b = f(a)$. If $g : V \longrightarrow U$ is the inverse mapping then $g'(y) = f'\bigl(g(y)\bigr)^{-1}$ for each point $y \in V$.

Proof: Introduce the invertible affine mappings $\varphi, \psi : \mathbb{R}^n \longrightarrow \mathbb{R}^n$ defined by

\[ \varphi(x) = f'(a)^{-1}x + a, \qquad \psi(x) = x - b. \]

The composition $F = \psi \circ f \circ \varphi$ is then a continuously differentiable mapping from an open neighborhood of the origin $0 \in \mathbb{R}^n$ into $\mathbb{R}^n$ such that $F(0) = 0$, and by the chain rule $F'(0) = \psi'(b)f'(a)\varphi'(0) = I \cdot f'(a)f'(a)^{-1} = I$. The preceding theorem shows that the mapping $F$ is invertible, and that its inverse $G$ is a continuously differentiable mapping from an open neighborhood of the origin into $\mathbb{R}^n$. The composition $g = \varphi \circ G \circ \psi$ is a continuously differentiable mapping from an open neighborhood of $b$ into $\mathbb{R}^n$, and since $f = \psi^{-1} \circ F \circ \varphi^{-1}$ it follows that $g \circ f = \varphi \circ G \circ \psi \circ \psi^{-1} \circ F \circ \varphi^{-1}$ is the identity mapping so that $g$ is the inverse mapping to $f$. Finally since $y = f\bigl(g(y)\bigr)$ for any point $y \in V$ it follows from the chain rule that $I = f'\bigl(g(y)\bigr)g'(y)$ where $I$ is the identity matrix, so that $g'(y) = f'\bigl(g(y)\bigr)^{-1}$, and that concludes the proof.

One consequence of the Inverse Mapping Theorem is that if $f : U \to \mathbb{R}^n$ is a continuously differentiable mapping defined in an open subset $U \subset \mathbb{R}^n$ such that $\det f'(x) \neq 0$ for all points $x \in U$ then $f$ is an open mapping, in the sense that the image of any open subset $U_0 \subset U$ is an open subset $f(U_0) \subset \mathbb{R}^n$. The inverse image of any open set under a continuous mapping is always open; but the image of an open subset under a general continuous mapping is not necessarily open, so open continuous mappings are a special subclass of continuous mappings.

If $f : U \to \mathbb{R}^n$ is an invertible continuously differentiable mapping from an open subset $U \subset \mathbb{R}^n$ with coordinates $(t_1, \ldots, t_n)$ onto an open subset $V \subset \mathbb{R}^n$ with coordinates $(x_1, \ldots, x_n)$, points in $V$ can be described either in terms of the coordinates $(x_1, \ldots, x_n)$ in $\mathbb{R}^n$ or alternatively in terms of the coordinates $(t_1, \ldots, t_n) \in U \subset \mathbb{R}^n$; for this reason the parameters $(t_1, \ldots, t_n)$ are often described as being another local coordinate system in $V$, and the mapping $f$ is called a local change of coordinates in $V$. The usefulness of a change of coordinates is that it may be possible to describe the local geometry much more simply in terms of the coordinates $(t_1, \ldots, t_n)$ than in terms of the initial coordinates $(x_1, \ldots, x_n)$. For example, an arc of a circle of radius 1 centered at the origin in the plane with a coordinate system $(x_1, x_2)$ can be described locally as a straight line segment $r = 1$ in terms of polar coordinates $(r, \theta)$. It should be emphasized though that the Inverse Mapping Theorem is only a local result; a mapping $f : U \to \mathbb{R}^n$ for which $\det f'(x) \neq 0$ at all points $x \in U \subset \mathbb{R}^n$ need not be invertible if $n > 0$. For example the derivative of the mapping




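$f : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $f(r, \theta) = (r \cos\theta, r \sin\theta)$ is the matrix

\[
f'(r, \theta) =
\begin{pmatrix}
\dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\[2mm]
\dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta}
\end{pmatrix}
=
\begin{pmatrix}
\cos\theta & -r \sin\theta \\
\sin\theta & r \cos\theta
\end{pmatrix}
\tag{3.7}
\]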

for which $\det f'(r, \theta) = r$. The Inverse Mapping Theorem asserts that the mapping $f$ is locally invertible near any point $(r, \theta) \in \mathbb{R}^2$ for which $r \neq 0$; but $f$ is clearly not a one-to-one mapping in the open subset $U = \{ (r, \theta) \mid r > 0 \}$ since $f(r, \theta) = f(r, \theta + 2n\pi)$ for any integer $n$. The problem of finding general necessary and sufficient conditions for a mapping to be globally invertible is a famous and difficult one¹. Points at which the Jacobian of a differentiable mapping $f : \mathbb{R}^n \to \mathbb{R}^n$ vanishes are known as singular points or singularities of the mapping. Mappings can be very complicated indeed near their singular points. For a simple example, consider again the mapping $f : \mathbb{R}^2 \to \mathbb{R}^2$ given by $f(r, \theta) = (r \cos\theta, r \sin\theta)$; the origin $r = \theta = 0$ is a singular point since $f'(0, 0) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$. The mapping is sketched in the accompanying Figure 3.1. A cell centered at the origin is mapped to the bow-tie region, and the mapping is locally one-to-one except along the axis $r = 0$; that entire axis is mapped to a single point, the origin. The general study of the geometry of singularities of mappings is quite extensive, including what is known as the theory of catastrophes².
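The three observations about the polar coordinate mapping (the determinant of its derivative is $r$, the mapping is not globally one-to-one, and the axis $r = 0$ collapses to the origin) can be confirmed at a few sample points. This is only an illustrative sketch; the sample points and the helper `close` are arbitrary choices, not taken from the text.

```python
import math

def f(r, theta):
    """Polar coordinate mapping f(r, theta) = (r cos theta, r sin theta)."""
    return (r * math.cos(theta), r * math.sin(theta))

def det_jacobian(r, theta):
    """Determinant of the matrix (3.7), computed entry by entry."""
    a, b = math.cos(theta), -r * math.sin(theta)
    c, d = math.sin(theta), r * math.cos(theta)
    return a * d - b * c

def close(p, q, tol=1e-12):
    """True if two points agree up to rounding error."""
    return all(abs(u - v) <= tol for u, v in zip(p, q))

samples = [(0.5, 0.3), (2.0, -1.2), (3.0, 4.0)]

# det f'(r, theta) = r at every sample point
dets_equal_r = all(abs(det_jacobian(r, t) - r) < 1e-12 for r, t in samples)

# f is not one-to-one on r > 0: theta and theta + 2 pi have the same image
same_image = close(f(2.0, 0.3), f(2.0, 0.3 + 2.0 * math.pi))

# the whole axis r = 0 is mapped to the origin
axis_collapses = all(close(f(0.0, t), (0.0, 0.0)) for t in (-1.0, 0.0, 2.5))
```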

An important and sometimes somewhat confusing concept is that of orientation. If $f : U \to \mathbb{R}^n$ is a continuously differentiable mapping from an open subset $U \subset \mathbb{R}^n$ into $\mathbb{R}^n$ and if $\det f'(a) \neq 0$ at a point $a \in U$ then either $\det f'(a) > 0$, in which case the mapping $f$ is said to be orientation preserving at the point $a$, or $\det f'(a) < 0$, in which case the mapping $f$ is said to be

¹ The Jacobian Problem is the question whether a mapping $f : \mathbb{C}^n \to \mathbb{C}^n$ from the $n$-dimensional complex vector space to itself for $n > 1$ defined by polynomial equations such that $\det f'(x) = 1$ for all points $x \in \mathbb{C}^n$ is necessarily an invertible mapping from $\mathbb{C}^n$ onto itself. This is a long standing open problem in mathematics, having been raised by O. H. Keller in 1939. It is known to be true for polynomial mappings $f : \mathbb{C}^2 \to \mathbb{C}^2$ of degree at most 100, but not in general, at least at the time of this writing. It is known to be false for finite fields, rather than the complex numbers. It is also false for functions more general than polynomials; the mapping $f : \mathbb{C}^2 \to \mathbb{C}^2$ defined by $f(x, y) = (e^x, e^{-x} y)$ has $\det f'(x, y) = 1$ at all points $(x, y)$ but its image excludes the axis $(0, y)$ and the mapping is not one-to-one. A famous example due to Fatou and Bieberbach, and discussed for example in the book Several Complex Variables by S. Bochner and W. T. Martin, Princeton University Press, 1948, is an injective mapping $f : \mathbb{C}^2 \to \mathbb{C}^2$ given by functions that are everywhere analytic and with $\det f'(x) = 1$ at all points $x \in \mathbb{C}^2$; but the image omits an open subset of $\mathbb{C}^2$. A survey of the problem and some approaches, and some examples of some false proofs, can be found in the paper by H. Bass, E. H. Connell, D. Wright in the Bulletin of the American Mathematical Society, volume 7, 1982. If it is just assumed that $\det f'(x) \neq 0$ for all points $x \in \mathbb{C}^2$, the "strong" Jacobian problem, there is a counterexample for real mappings $f : \mathbb{R}^2 \to \mathbb{R}^2$ by S. Pinchuk in Mathematische Zeitschrift, volume 217, 1994.

² See for example the books Catastrophe Theory and Its Applications by Tim Poston and Ian Stewart, New York: Dover, 1998, or Catastrophe Theory by Vladimir Arnold, Berlin: Springer-Verlag, 1992.




Figure 3.1: The mapping $f : (r, \theta) \to (r \cos\theta, r \sin\theta)$ in an open neighborhood of the origin.

orientation reversing near the point $a$. If $U$ is a connected open set and the Jacobian of the mapping $f$ is nowhere vanishing then the mapping $f$ is either orientation preserving at all points of $U$ or orientation reversing at all points of $U$. In the case $n = 1$ an orientation preserving mapping is a monotonically increasing mapping, and an orientation reversing mapping is a monotonically decreasing mapping; the orientation of $\mathbb{R}^1$, identified with the set of real numbers, is the direction of increasing numbers. If $n > 1$ each coordinate axis in $\mathbb{R}^n$ is naturally oriented in the direction in which the variable increases, and the orientation of $\mathbb{R}^n$ itself is defined by the choice of an order of the coordinate axes. For example in the case $n = 2$ with the variables $x_1, x_2$ it is customary to specify an orientation as either clockwise or counterclockwise, where clockwise determines the order $x_1, x_2$ of the axes while counterclockwise determines the reversed order $x_2, x_1$ of the axes; and in the case $n = 3$ the right-hand rule familiar from physics specifies the order of the axes as that determined by the order of the thumb and first two fingers on the right hand, when these fingers are viewed as lying along three coordinate axes. When the variables are denoted by $x_1, x_2, \ldots, x_n$ the natural order is the corresponding order of the axes of these variables, first the $x_1$ axis, then the $x_2$ axis, and so on. However for variables such as $r, \theta$ in $\mathbb{R}^2$ or $\alpha, a, \aleph$ in $\mathbb{R}^3$ there is no natural order so it is necessary to specify an order of the axes. Two different orders though may lead to the same orientation. A permutation of the variables is a linear transformation described by a permutation matrix $M$, a matrix that has a single entry of 1 in each row and column but all other entries 0; the linear transformation described by a permutation matrix, a permutation of the coordinates, is orientation preserving if $\det M = +1$ but orientation reversing if $\det M = -1$. Thus for $n = 3$ for example the orders $x_1, x_2, x_3$ and $x_2, x_3, x_1$ and $x_3, x_1, x_2$ determine the same orientation since these orders arise from the first by repeated applications of the permutation matrix

\[
M_+ = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\]

for which $\det M_+ = +1$; on the other hand the orders $x_2, x_1, x_3$ and $x_3, x_2, x_1$ and $x_1, x_3, x_2$ arise from the three previous orders by applications of the permutation matrix

\[
M_- = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]

for which $\det M_- = -1$, the permutation that switches the first two coordinates. A permutation that interchanges any two consecutive vectors is orientation reversing, since $\det \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = -1$, and any permutation can be written as a successive application of such interchanges; so beginning with any chosen order of the coordinates, those orders that arise by an even number of such interchanges beginning with the initially chosen order have the same orientation as the initially chosen order, while those that arise by an odd number of such interchanges have the opposite orientation. Thus there are always just two possible orientations of the vector space $\mathbb{R}^n$ for any $n > 0$. For example, the vector space $\mathbb{R}^2$ with coordinates $r, \theta$ can be given the orientation $r, \theta$ and the vector space $\mathbb{R}^2$ with coordinates $x, y$ can be given the orientation $x, y$. The mapping $f : \mathbb{R}^2(r, \theta) \to \mathbb{R}^2(x, y)$ that takes a point $(r, \theta)$ to the point $f(r, \theta) = (r \cos\theta, r \sin\theta)$ has the Jacobian $\det f'(r, \theta) = r$, and consequently

if $r > 0$ then $\det f'(r, \theta) > 0$ and $f$ is orientation preserving,
if $r < 0$ then $\det f'(r, \theta) < 0$ and $f$ is orientation reversing,
if $r = 0$ then $\det f'(r, \theta) = 0$ and $f$ is singular,

as indicated in the accompanying Figure 3.2. Orientation can be described

Figure 3.2: The mapping $f : (r, \theta) \to (r \cos\theta, r \sin\theta)$ preserves the orientation on the set where $r > 0$ but reverses the orientation on the set where $r < 0$.

alternatively in terms of the unit vectors $dx_i$ along the coordinate axes rather than in terms of the coordinate axes themselves; and the equivalence of different choices of the order of the unit vectors can be handled through a special case of the exterior algebra generated by these vectors as discussed in the Appendix. Associate to any order of the basis vectors $dx_i$ such as $dx_{i_1}, dx_{i_2}, \ldots, dx_{i_n}$ a formal vector denoted by

\[
dx_{i_1} \wedge dx_{i_2} \wedge \cdots \wedge dx_{i_n}, \tag{3.8}
\]

with the convention that when the order of two adjacent indices is reversed this vector changes sign; thus

\[
dx_{i_1} \wedge \cdots \wedge dx_{i_k} \wedge dx_{i_{k+1}} \wedge \cdots \wedge dx_{i_n}
= - dx_{i_1} \wedge \cdots \wedge dx_{i_{k+1}} \wedge dx_{i_k} \wedge \cdots \wedge dx_{i_n}. \tag{3.9}
\]

This vector is not to be interpreted as a vector in the space $\mathbb{R}^n$, but as a basis vector in the abstract one-dimensional vector space consisting of the products $t\, dx_1 \wedge dx_2 \wedge \cdots \wedge dx_n$ for all $t \in \mathbb{R}$. All the orders of the axes, or equivalently all the orders of the basis vectors $dx_i$, that determine the same vector (3.8) describe the same orientation of $\mathbb{R}^n$, and those that determine the negative of the vector (3.8) describe the opposite orientation; thus an orientation of $\mathbb{R}^n$ can be specified by the choice of a vector $dx_1 \wedge dx_2 \wedge \cdots \wedge dx_n$. For example, in $\mathbb{R}^2$ with the coordinates $r, \theta$ one orientation is specified by the vector $dr \wedge d\theta$ and the other possible orientation is specified by the vector $d\theta \wedge dr = -dr \wedge d\theta$.
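The sign rule for reordering the vectors $dx_i$ amounts to computing the parity of a permutation, which can be done by counting inversions (each interchange of adjacent indices changes the number of inversions by one). The sketch below is an illustration under that assumption; the function name `sign` is hypothetical. It sorts the six orders of three axes into the two orientation classes.

```python
from itertools import permutations

def sign(order):
    """Sign of an ordering of distinct indices.

    +1 if the ordering has an even number of inversions, -1 if odd;
    this is the factor by which dx_{i1} ^ ... ^ dx_{in} differs from
    the vector with the indices in increasing order.
    """
    n = len(order)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if order[i] > order[j])
    return 1 if inversions % 2 == 0 else -1

# Orders of the axes x1, x2, x3 giving the same orientation as (1, 2, 3)
same = [p for p in permutations((1, 2, 3)) if sign(p) == 1]
# Orders giving the opposite orientation
opposite = [p for p in permutations((1, 2, 3)) if sign(p) == -1]
```

Here `same` contains the three cyclic orders and `opposite` the three orders obtained from them by one interchange, in agreement with the lists given for $n = 3$.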

Locally any $C^1$ homeomorphism can be decomposed into a composition of simpler $C^1$ homeomorphisms; that can be useful in reducing the examination of local properties of $C^1$ homeomorphisms to an examination of the properties of some simpler $C^1$ homeomorphisms. The permutations of the variables in $\mathbb{R}^n$ and translations in $\mathbb{R}^n$ are two particularly simple $C^1$ homeomorphisms; a somewhat less simple but useful $C^1$ homeomorphism is one that changes only a single variable, so one for which the coordinate functions $f_i(x)$ are such that $f_i(x) = x_i$ for $i \neq j$ while $f_j(x)$ is a function of all the variables.

Theorem 3.3 (Decomposition Theorem) A $C^1$ mapping $f : U \to \mathbb{R}^n$ defined in an open neighborhood $U$ of the origin in $\mathbb{R}^n$ and such that $\det f'(0) \neq 0$ can be written in any sufficiently small open subneighborhood $V \subset U$ of the origin as the composition of $C^1$ homeomorphisms that are either (i) translations, or (ii) permutations of the variables in $\mathbb{R}^n$, or (iii) mappings that change only a single variable.

Proof: The composition of the mapping $f$ and the translation $T y = y - f(0)$ yields a mapping $Tf$ such that $Tf(0) = 0$. Suppose that the coordinate functions $f_i$ of the mapping $Tf$ have the property that $f_i(x) = x_i$ whenever $i < r$ for some index $r$; for convenience such a mapping will be called a mapping of type $r$, but just for the course of the present proof. Any mapping is of type 1, so there is no loss of generality in assuming that the mapping $Tf$ is of type $r$ for some $r \geq 1$; but only the identity mapping is of type $r$ for any $r > n$. It is also convenient to simplify the notation in the proof by writing points $x \in \mathbb{R}^n$ in the form

\[
x = \begin{pmatrix} x_I \\ x_r \\ x_{II} \end{pmatrix}
\quad\text{where}\quad
\begin{array}{ll}
x_I \in \mathbb{R}^{r-1}, & I = (1, 2, \ldots, r-1), \\
x_r \in \mathbb{R}^1, & \\
x_{II} \in \mathbb{R}^{n-r}, & II = (r+1, r+2, \ldots, n),
\end{array}
\]

and to write the mapping $Tf$ correspondingly in the form

\[
Tf(x) = \begin{pmatrix} x_I \\ f_r(x) \\ f_{II}(x) \end{pmatrix}
\quad\text{where}\quad
\begin{array}{ll}
x_I \in \mathbb{R}^{r-1}, & I = (1, 2, \ldots, r-1), \\
f_r(x) \in \mathbb{R}^1, & \\
f_{II}(x) \in \mathbb{R}^{n-r}, & II = (r+1, r+2, \ldots, n).
\end{array}
\]

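In these terms the derivative of the mapping $Tf$ is the $n \times n$ matrix

\[
(Tf)'(x) =
\begin{pmatrix}
I_{r-1} & 0 & 0 \\
\partial_I f_r(x) & \partial_r f_r(x) & \partial_{II} f_r(x) \\
\partial_I f_{II}(x) & \partial_r f_{II}(x) & \partial_{II} f_{II}(x)
\end{pmatrix}
\tag{3.10}
\]

in which $I_{r-1}$ is the $(r-1) \times (r-1)$ identity matrix while

\[
\partial_I f_r(x) = \big( \partial_1 f_r(x) \ \cdots \ \partial_{r-1} f_r(x) \big), \quad \text{a } 1 \times (r-1) \text{ matrix},
\]
\[
\partial_I f_{II}(x) = \big\{ \partial_j f_i(x) \ \big|\ r+1 \leq i \leq n,\ 1 \leq j \leq r-1 \big\}, \quad \text{an } (n-r) \times (r-1) \text{ matrix},
\]

and similarly for the other components of the matrix (3.10). Since $\det (Tf)'(0) = \det f'(0) \neq 0$, not all the entries in column $r$ of this matrix can be zero; so if $P_r$ is a permutation matrix that interchanges rows $r$ and $j$ for a suitable index $r < j \leq n$ the derivative $(P_r T f)'(x)$ of the mapping $P_r T f$ also has the form (3.10) but with the additional property that $\partial_r f_r(0) \neq 0$. Let $\phi_r : U \to \mathbb{R}^n$ be the mapping defined by

\[
\phi_r \begin{pmatrix} x_I \\ x_r \\ x_{II} \end{pmatrix}
= \begin{pmatrix} x_I \\ f_r(x) \\ x_{II} \end{pmatrix},
\]

for which

\[
\phi_r'(x) =
\begin{pmatrix}
I_{r-1} & 0 & 0 \\
\partial_I f_r(x) & \partial_r f_r(x) & \partial_{II} f_r(x) \\
0 & 0 & I_{n-r}
\end{pmatrix}.
\tag{3.11}
\]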

Since $\det \phi_r'(0) = \partial_r f_r(0) \neq 0$ it follows from the Inverse Mapping Theorem that the mapping $\phi_r$ is invertible in a sufficiently small open neighborhood $U_r \subset U$ of the origin. The inverse mapping of course has the corresponding form

\[
\phi_r^{-1} \begin{pmatrix} y_I \\ y_r \\ y_{II} \end{pmatrix}
= \begin{pmatrix} y_I \\ g_r(y) \\ y_{II} \end{pmatrix}
\]

for some function $g_r$; and since $\phi_r^{-1} \circ \phi_r$ is the identity mapping then in particular

\[
x_r = g_r\big(x_I, f_r(x), x_{II}\big),
\]

hence

\[
\phi_r^{-1} P_r T f(x)
= \begin{pmatrix} x_I \\ g_r\big(x_I, f_r(x), x_{II}\big) \\ f_{II}(x) \end{pmatrix}
= \begin{pmatrix} x_I \\ x_r \\ f_{II}(x) \end{pmatrix}.
\]

Thus the mapping $\phi_r^{-1} P_r T f$ is of type $r + 1$. This procedure can be repeated, and finally the composition

\[
(\phi_n^{-1} P_n)\, (\phi_{n-1}^{-1} P_{n-1}) \cdots (\phi_r^{-1} P_r T f)
\]

is a mapping of type $n + 1$ so is the identity mapping; consequently

\[
f = T^{-1} P_r^{-1} \phi_r \cdots P_{n-1}^{-1} \phi_{n-1} P_n^{-1} \phi_n,
\]

which exhibits the mapping $f$ in a sufficiently small neighborhood of the origin as the composition of mappings of type (i), (ii) and (iii) as in the statement of the theorem, thereby concluding the proof.

3.2 The Implicit Function Theorem

The Inverse Mapping Theorem can be applied to examine mappings between spaces of different dimensions; the general result will be discussed in Section 3.3, but a special case of particular importance and usefulness will be discussed in this section.




Theorem 3.4 If $f : U \to \mathbb{R}^n$ is a $C^1$ mapping defined in an open neighborhood $U$ of the origin $0 \in \mathbb{R}^m$ where $m > n$, and if $f(0) = 0$ and the $n \times n$ matrix consisting of the first $n$ columns of the $n \times m$ matrix $f'(0)$ is of rank $n$, then after shrinking the neighborhood $U$ if necessary there is a $C^1$ homeomorphism $\phi : V \to U$ between an open neighborhood $V$ of the origin $0 \in \mathbb{R}^m$ and the open neighborhood $U$, where

\[
\phi(t_1, \ldots, t_m) = \big( \underbrace{g_1(t), \ldots, g_n(t)}_{g(t)},\ t_{n+1}, \ldots, t_m \big) \tag{3.12}
\]

for some $C^1$ functions $g_i(t)$ of the variables $t \in V$, such that $g_i(0) = 0$ and

\[
(f \circ \phi)(t_1, \ldots, t_m) = (t_1, \ldots, t_n). \tag{3.13}
\]

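Proof: The assertion of the theorem can be summarized as the existence of the mapping $\phi$ in the diagram

\[
\mathbb{R}^m \supset V \xrightarrow{\ \phi\ } U \xrightarrow{\ f\ } \mathbb{R}^n, \qquad U \subset \mathbb{R}^m,
\]

with the properties (3.12) and (3.13). To simplify the notation write points $x \in \mathbb{R}^m$ in the form

\[
x = \begin{pmatrix} x_I \\ x_{II} \end{pmatrix}
\quad\text{where}\quad
\begin{array}{ll}
x_I \in \mathbb{R}^n, & I = (1, 2, \ldots, n), \\
x_{II} \in \mathbb{R}^{m-n}, & II = (n+1, n+2, \ldots, m).
\end{array}
\]

The $n \times m$ matrix $f'(x)$ can be written

\[
f'(x) = \big( \partial_I f(x) \quad \partial_{II} f(x) \big)
\]

in terms of the $n \times n$ matrix $\partial_I f(x)$ having the entries $\big(\partial_I f(x)\big)_{ij} = \partial_j f_i(x)$ for $1 \leq i, j \leq n$ and the $n \times (m-n)$ matrix $\partial_{II} f(x)$ with entries $\big(\partial_{II} f(x)\big)_{ij} = \partial_j f_i(x)$ for $1 \leq i \leq n$ and $n+1 \leq j \leq m$. Introduce the $C^1$ mapping $\psi : U \to \mathbb{R}^m$ defined by

\[
\psi(x) = \psi \begin{pmatrix} x_I \\ x_{II} \end{pmatrix} = \begin{pmatrix} f(x) \\ x_{II} \end{pmatrix}
\]

and note that

\[
\psi'(x) =
\begin{pmatrix} \partial_I f(x) & \partial_{II} f(x) \\ \partial_I x_{II} & \partial_{II} x_{II} \end{pmatrix}
=
\begin{pmatrix} \partial_I f(x) & \partial_{II} f(x) \\ 0 & I_{m-n} \end{pmatrix}
\]

where $I_{m-n}$ is the identity $(m-n) \times (m-n)$ matrix; consequently $\det \psi'(0) = \det \partial_I f(0) \neq 0$, so it follows from the Inverse Mapping Theorem that after shrinking the neighborhood $U$ if necessary the mapping $\psi$ is a one-to-one mapping from $U$ onto an open neighborhood $V$ of the image $\psi(0) = 0 \in \mathbb{R}^m$ with a $C^1$ inverse $\phi = \psi^{-1}$, which of course has the form

\[
\phi(t) = \begin{pmatrix} g(t) \\ t_{II} \end{pmatrix}
\]

for a mapping $g : V \to \mathbb{R}^n$ for which $g(0) = 0$ since $\phi(0) = 0$; so $\phi$ is a mapping of the form (3.12). Then

\[
\begin{pmatrix} t_I \\ t_{II} \end{pmatrix}
= (\psi \circ \phi) \begin{pmatrix} t_I \\ t_{II} \end{pmatrix}
= \psi \begin{pmatrix} g(t) \\ t_{II} \end{pmatrix}
= \begin{pmatrix} f\big(\phi(t)\big) \\ t_{II} \end{pmatrix}
\]

so $f\big(\phi(t)\big) = t_I$, which demonstrates (3.13) and thereby concludes the proof.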
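For a concrete instance of the construction in this proof, the mappings $\psi$ and $\phi$ can be written out in closed form in a simple case. The following sketch assumes the illustrative function $f(x_1, x_2) = x_1 + x_1 x_2 + x_2^2$ with $m = 2$ and $n = 1$, which is not taken from the text; here $\partial_1 f(0) = 1 \neq 0$, and solving $t_1 = f(x_1, x_2)$, $t_2 = x_2$ for $x_1$ gives $\phi$ explicitly near the origin.

```python
def f(x1, x2):
    """Illustrative function with f(0, 0) = 0 and d1 f(0, 0) = 1."""
    return x1 + x1 * x2 + x2 ** 2

def psi(x1, x2):
    """psi(x) = (f(x), x_II), as in the proof of Theorem 3.4."""
    return (f(x1, x2), x2)

def phi(t1, t2):
    """Inverse of psi near the origin (valid for t2 != -1).

    Has the form (3.12): phi(t) = (g(t), t2) with g(t) = (t1 - t2^2) / (1 + t2).
    """
    return ((t1 - t2 ** 2) / (1.0 + t2), t2)

points = [(0.0, 0.0), (0.1, -0.2), (-0.3, 0.25), (0.05, 0.4)]

# Property (3.13): f(phi(t)) = t1
max_error = max(abs(f(*phi(t1, t2)) - t1) for t1, t2 in points)

# phi really is the inverse of psi: phi(psi(x)) = x
roundtrip_error = max(
    max(abs(u - v) for u, v in zip(phi(*psi(x1, x2)), (x1, x2)))
    for x1, x2 in points
)
```

Both errors are at the level of rounding, so in the coordinates $t$ the function $f$ is simply the projection onto the first coordinate.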

Corollary 3.5 If $f : U \to \mathbb{R}^n$ is a $C^1$ mapping defined in an open subset $U \subset \mathbb{R}^m$ where $m > n$, and if $\operatorname{rank} f'(a) = n$ at a point $a \in U$, then there is a local coordinate system in an open neighborhood of the point $a$ such that the mapping $f - f(a)$ is a linear mapping described by a matrix of rank $n$ in terms of these coordinates.

Proof: The mapping $\tilde f(x) = f(x + a) - f(a)$ is $C^1$ in an open neighborhood of the origin, and $\operatorname{rank} \tilde f'(0) = \operatorname{rank} f'(a) = n$ while $\tilde f(0) = 0$. By a permutation of the coordinates it can be supposed that the first $n$ columns of the $n \times m$ matrix $\tilde f'(0)$ are linearly independent. The mapping $\tilde f$ thus satisfies the hypotheses of the preceding theorem, so after a further $C^1$ change of coordinates near the origin it is a linear mapping of the form (3.13), a linear mapping of rank $n$; and since $f(x) - f(a) = \tilde f(x - a)$ that suffices for the proof.

Corollary 3.6 If $f : U \to \mathbb{R}^n$ is a continuously differentiable mapping defined in an open subset $U \subset \mathbb{R}^m$ where $m > n$, and if $\operatorname{rank} f'(a) = n$ at a point $a \in U$, then there is a local coordinate system in an open neighborhood $U_a$ of the point $a$ in terms of which the set $\{ x \in U_a \mid f(x) = f(a) \}$ is a linear subspace of dimension $m - n$.

Proof: It follows from the preceding Corollary 3.5 that there is a local coordinate system in an open neighborhood $U_a$ of the point $a$ such that the mapping $f - f(a)$ is a linear mapping of rank $n$ in terms of these coordinates; consequently the set of points in $U_a$ at which the mapping $f - f(a)$ vanishes is a linear subspace of dimension $m - n$, which suffices for the proof.

The preceding corollaries involved only the assumption that $\operatorname{rank} f'(a) = n$; but one of the most important consequences of Theorem 3.4 involves the more precise assumption that a particular submatrix of $f'(0)$ is nonsingular. The first $n \times n$ submatrix of the derivative $f'(0)$ consists of the $n$ columns $\partial_i f(0)$ for $1 \leq i \leq n$; so the property of the matrix $f'(0)$ in the hypothesis of Theorem 3.4 can be restated as the assumption that the matrix $\big( \partial_1 f(0) \ \cdots \ \partial_n f(0) \big)$ is of its maximal rank $n$.




Corollary 3.7 (Implicit Function Theorem) If $f : U \to \mathbb{R}^n$ is a $C^1$ mapping defined in an open neighborhood $U$ of the origin $0 \in \mathbb{R}^m$ where $m > n$, and if $f(0) = 0$ and $\operatorname{rank} \big( \partial_1 f(0) \ \cdots \ \partial_n f(0) \big) = n$, then in a sufficiently small open cell $\Delta = \Delta' \times \Delta'' \subset U$ containing the origin, written as the product of a cell $\Delta' \subset \mathbb{R}^n$ in the space of the first $n$ variables $x' = (x_1, \ldots, x_n)$ and a cell $\Delta'' \subset \mathbb{R}^{m-n}$ in the space of the last $m - n$ variables $x'' = (x_{n+1}, \ldots, x_m)$, there is a mapping $h : \Delta'' \to \Delta'$ such that $f(x) = 0$ for a point $x = (x', x'') \in \Delta' \times \Delta''$ if and only if $x' = h(x'')$.

Proof: It follows from Theorem 3.4 that if the neighborhood $U$ is sufficiently small there is a $C^1$ homeomorphism $\phi : V \to U$ from an open neighborhood $V$ of the origin in the space $\mathbb{R}^m$ of the variables $t_1, \ldots, t_m$ onto $U$ such that (3.12) and (3.13) are satisfied. Any point $x \in U$ is the image $x = \phi(t)$ of a unique point $t \in V$, and it follows from (3.13) that $f(x) = 0$ if and only if $(f \circ \phi)(t) = (t_1, \ldots, t_n) = 0$, or equivalently if and only if $t' = 0$ when the point $t \in \mathbb{R}^m$ is written $t = (t', t'')$ where $t' \in \mathbb{R}^n$ and $t'' \in \mathbb{R}^{m-n}$; and in that case it follows from (3.12) that $(x', x'') = x = \phi(t) = \big( g(0, t''), t'' \big)$, so that actually $t'' = x''$ and consequently $x = \big( g(0, x''), x'' \big)$ or equivalently $x' = g(0, x'')$, which suffices for the proof.

The argument in the proof of the preceding corollary is illustrated in the accompanying Figure 3.3. The zeros of the function $f$ form the image of the linear subspace defined by the equations $t_1 = \cdots = t_n = 0$ under the $C^1$ mapping $\phi : \mathbb{R}^m \to \mathbb{R}^m$ that describes the change of coordinates in an open neighborhood of the point $a$. The mapping $\phi$ does not change the last $m - n$ coordinates but only the first $n$ coordinates of the points; so the mapping preserves the "height" of points above the coordinate axis of the first $n$ coordinates, but moves the points by changing their first $n$ coordinates. Points on the linear subspace are described by the parameters $t_i = x_i$ for $n + 1 \leq i \leq m$ while the coordinates $t_i$ are held constant for $1 \leq i \leq n$; the zeros of the function $f$ are also described by the parameters $t_i = x_i$ for $n + 1 \leq i \leq m$.

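Figure 3.3: Illustration of the proof of the Implicit Function Theorem. (The mapping $\phi$ carries the linear subspace $t_1 = \cdots = t_n = 0$, on which $f(\phi(t)) = 0$, to the zero locus $f(x) = 0$; the horizontal axes are $t_1, \ldots, t_n$ and $x_1, \ldots, x_n$, the vertical axes $t_{n+1}, \ldots, t_m$ and $x_{n+1}, \ldots, x_m$.)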

In the Implicit Function Theorem, the equation $f(x) = 0$ can be viewed as a description of relations among the coordinates $x_1, \ldots, x_m$ in $\mathbb{R}^m$, which implicitly describe some of the coordinates as functions of the remaining coordinates; the theorem provides conditions determining which coordinates really can be viewed as explicit functions of the remaining coordinates, at least locally. For example, if $f : U \to \mathbb{R}^1$ is a continuously differentiable function in an open subset $U \subset \mathbb{R}^m$ and if $f(a) = 0$ and $\partial_1 f(a) \neq 0$ at some point $a \in U$ then by the Implicit Function Theorem in an open cell $\Delta \subset U$ containing the point $a$ there is a function $g(x_2, \ldots, x_m)$ such that $f(x_1, \ldots, x_m) = 0$ for a point $(x_1, \ldots, x_m) \in \Delta$ if and only if $x_1 = g(x_2, \ldots, x_m)$. The equation $f(x_1, x_2, \ldots, x_m) = 0$ defines $x_1$ implicitly as a function of the remaining variables, and in an open neighborhood of any point $a$ at which $\partial_1 f(a) \neq 0$ this equation also defines $x_1$ explicitly as a function $x_1 = g(x_2, \ldots, x_m)$ of the remaining variables. It is obvious that some condition on the function $f$ is necessary for this to be true; for instance without any condition the function $f(x_1, \ldots, x_m)$ might be independent of the variable $x_1$, in which case it can say nothing at all about that variable. The example in which $f(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 - 1$ illustrates another way in which some condition on the function $f$ is necessary. The equation $f(x) = 0$ can be viewed as defining $x_1$ as the function $x_1 = \pm\sqrt{1 - x_2^2 - x_3^2}$, where the square root has two well defined locally continuous values so long as $x_2^2 + x_3^2 < 1$; but $x_1$ is not a well defined function in an open neighborhood of any point at which $x_2^2 + x_3^2 = 1$, or equivalently at which $\partial_1 f(x_1, x_2, x_3) = 0$. If the variable $x_1$ is defined explicitly by $x_1 = g(x_2, x_3)$ then $f\big( g(x_2, x_3), x_2, x_3 \big) = 0$ identically in the variables $x_2, x_3$; and the chain rule can be applied to calculate the derivatives of the function $g(x_2, x_3)$, since

\[
0 = \frac{\partial}{\partial x_2} f\big( g(x_2, x_3), x_2, x_3 \big)
= \partial_1 f\big( g(x_2, x_3), x_2, x_3 \big)\, \partial_1 g(x_2, x_3) + \partial_2 f\big( g(x_2, x_3), x_2, x_3 \big),
\]

where $\partial_1 g$ denotes the partial derivative of $g$ with respect to its first argument $x_2$. Thus the partial derivatives of $g$ can be expressed in terms of the partial derivatives of $f$, and similarly for higher derivatives. Of course the same argument can be applied to another variable $x_i$, expressing it as a function of the remaining variables provided that $\partial_i f(x) \neq 0$. If $f_1, f_2$ are continuously differentiable functions in an open subset $U \subset \mathbb{R}^4$ and if

\[
\det \begin{pmatrix} \partial_1 f_1(a) & \partial_2 f_1(a) \\ \partial_1 f_2(a) & \partial_2 f_2(a) \end{pmatrix} \neq 0 \tag{3.14}
\]

then the set of points at which $f(x) = 0$ near $a$ can be described explicitly as the set of points at which $x_1$ and $x_2$ are explicitly given functions of the remaining variables, say $x_1 = g_1(x_3, x_4)$ and $x_2 = g_2(x_3, x_4)$. Thus there are the identities $f_1\big( g_1(x_3, x_4), g_2(x_3, x_4), x_3, x_4 \big) = f_2\big( g_1(x_3, x_4), g_2(x_3, x_4), x_3, x_4 \big) = 0$; and the partial derivatives of these identities yield two linear equations relating the partial derivatives of the functions $g_1(x_3, x_4)$ and $g_2(x_3, x_4)$, which can be solved explicitly in view of (3.14).
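The chain rule computation for the sphere example can be checked numerically. The sketch below is illustrative only: it takes the positive branch $x_1 = g(x_2, x_3) = \sqrt{1 - x_2^2 - x_3^2}$, evaluates the implicit formula $-\partial_2 f / \partial_1 f$ at a sample point chosen arbitrarily, and compares it with a central difference quotient of $g$; the function names are hypothetical.

```python
import math

def f(x1, x2, x3):
    """f(x) = x1^2 + x2^2 + x3^2 - 1, whose zero locus is the unit sphere."""
    return x1 ** 2 + x2 ** 2 + x3 ** 2 - 1.0

def g(x2, x3):
    """Positive branch x1 = g(x2, x3), valid where x2^2 + x3^2 < 1."""
    return math.sqrt(1.0 - x2 ** 2 - x3 ** 2)

def dg_dx2_implicit(x2, x3):
    """Derivative of g in x2 from the chain rule: -(d2 f) / (d1 f)."""
    x1 = g(x2, x3)
    d1f = 2.0 * x1          # partial derivative of f in x1
    d2f = 2.0 * x2          # partial derivative of f in x2
    return -d2f / d1f

def dg_dx2_numeric(x2, x3, h=1e-6):
    """Central difference approximation to the same derivative."""
    return (g(x2 + h, x3) - g(x2 - h, x3)) / (2.0 * h)

x2, x3 = 0.3, 0.4
residual = f(g(x2, x3), x2, x3)     # should vanish identically
gap = abs(dg_dx2_implicit(x2, x3) - dg_dx2_numeric(x2, x3))
```

The implicit formula breaks down exactly where $\partial_1 f = 2 x_1 = 0$, that is on the circle $x_2^2 + x_3^2 = 1$, in agreement with the discussion above.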

The set $V = \{ x \in U \mid f(x) = 0 \}$ of zeros of a mapping $f : U \to \mathbb{R}^n$ defined in an open subset $U \subset \mathbb{R}^m$ is called the zero locus of the mapping $f$. If the mapping is continuous its zero locus is a relatively closed subset of $U$, since it is the inverse image of the closed set $\{0\} \subset \mathbb{R}^n$ under a continuous mapping. A relatively closed subset $V \subset U$ of an open set $U \subset \mathbb{R}^n$ is called a $k$-dimensional submanifold of $U$ if for any point $a \in V$ there are an open neighborhood $U_a \subset U$ and a local change of coordinates in $U_a$ such that $V \cap U_a$ is a linear subspace of dimension $k$ in terms of these coordinates. If the change of coordinates is described by a continuously differentiable mapping the subset $V$ is said somewhat more precisely to be a $C^1$ submanifold; similarly it is said to be a $C^2$ submanifold if the change of coordinates is described by a $C^2$ mapping, and so on. Corollary 3.6 can be interpreted as the assertion that if $f : U \to \mathbb{R}^n$ is a $C^1$ mapping from an open subset $U \subset \mathbb{R}^m$ into $\mathbb{R}^n$ for $m > n$ with the zero locus $V \subset U$, and if $\operatorname{rank} f'(x) = n$ at each point $x \in V$, then $V$ is an $(m - n)$-dimensional $C^1$ submanifold of $U$. It is quite common to rephrase this result just in terms of the coordinate functions $f_1, \ldots, f_n$ of the mapping $f$ and to speak of the set of common zeros of these functions. In these terms, Corollary 3.6 asserts that if the gradient vectors $\nabla f_1, \ldots, \nabla f_n$ are linearly independent vectors at each point of the set $V$ of the common zeros of these functions then $V$ is a $C^1$ submanifold of dimension $m - n$ in $U$. For example the circle

\[
V = \big\{ (x_1, x_2) \in \mathbb{R}^2 \ \big|\ x_1^2 + x_2^2 = 1 \big\} \subset \mathbb{R}^2 \tag{3.15}
\]

can be defined as the zero locus of the function $f(x) = x_1^2 + x_2^2 - 1$ in the plane, and since $f'(x) = \big( 2x_1 \ \ 2x_2 \big)$ is a matrix of rank 1 at each point $x \in V$, at which not both $x_1 = 0$ and $x_2 = 0$, the zero locus $V$ is a one-dimensional submanifold of the plane. Indeed it can be described locally as the linear space $r = 1$ in terms of polar coordinates $(r, \theta)$ in the plane in a neighborhood of each point of $V$; but of course there is no single coordinate neighborhood in which the entire circle can be described by the equation $r = 1$, since the other coordinate $\theta$ is not a single-valued function on the full circle. For some purposes an alternative characterization of submanifolds is quite convenient.

Corollary 3.8 A relatively closed subset $V \subset U$ of an open set $U \subset \mathbb{R}^m$ is an $(m - n)$-dimensional $C^1$ submanifold of $U$ if and only if for any point $a \in V$, after a permutation of the coordinates in $\mathbb{R}^m$ if necessary, there is an open cell $\Delta_a = \Delta_{I,a} \times \Delta_{II,a}$ centered at the point $a \in V$, a product of cells $\Delta_{I,a} \subset \mathbb{R}^n$ and $\Delta_{II,a} \subset \mathbb{R}^{m-n}$, such that the intersection $V \cap \Delta_a$ is described by the equations

\[
V \cap \Delta_a = \big\{ x \in \Delta_a \ \big|\ x_i = g_i(x_{n+1}, \ldots, x_m),\ 1 \leq i \leq n \big\} \tag{3.16}
\]

for some $C^1$ functions $g_i$ in the cell $\Delta_{II,a} \subset \mathbb{R}^{m-n}$.

Proof: If $V$ is an $(m - n)$-dimensional $C^1$ submanifold of $U$ it follows from the Implicit Function Theorem that $V$ can be described in an open neighborhood of any point $a \in V$ by the equations (3.16). Conversely if $V$ is defined by the equations (3.16) these equations can be rewritten $f_i(x) = x_i - g_i(x_{n+1}, \ldots, x_m) = 0$ for $1 \leq i \leq n$, and since $\nabla f_i(x)$ are clearly linearly independent vectors it follows that $V$ is an $(m - n)$-dimensional $C^1$ submanifold. That suffices for the proof.




The condition that a subset $V \subset U$ of an open set $U \subset \mathbb{R}^n$ be a submanifold is a local condition; so if a relatively closed subset $V$ is a submanifold in a neighborhood of each of its points it is a submanifold in $U$. Two $C^1$ submanifolds $V_1 \subset U_1$, $V_2 \subset U_2$ of open sets $U_1, U_2 \subset \mathbb{R}^m$ are called homeomorphic $C^1$ submanifolds if there is a $C^1$ homeomorphism $\phi : U_1 \to U_2$ such that $\phi(V_1) = V_2$; this is clearly an equivalence relation that can be viewed as identifying two submanifolds that are essentially the same. The corresponding notion holds for topological submanifolds, for which the equivalence mapping is merely a homeomorphism, or for $C^r$ submanifolds for $r > 1$. This equivalence relation really involves not just the point set $V$ but the particular way in which that set is imbedded in the open set $U \subset \mathbb{R}^m$. For example a circle in $\mathbb{R}^3$ and a knotted curve in $\mathbb{R}^3$ are both submanifolds of $\mathbb{R}^3$; but they are not homeomorphic topological submanifolds, since there is no homeomorphism of $\mathbb{R}^3$ to itself that transforms a circle into a knotted curve³.

The set of common zeros of a collection of functions $f_1, \ldots, f_n$ for which the gradients $\nabla f_1(x), \ldots, \nabla f_n(x)$ are not linearly independent at some points is not necessarily a submanifold at these exceptional points but may have singularities of one sort or another. Of course if the gradients of these functions are linearly independent the squares $f_i(x)^2$ of these functions have the same set of common zeros as do the original functions, forming a submanifold; but the gradients of the squares of these functions vanish at all their common zeros. The study of singularities of differentiable mappings has been and still is a very active area of mathematical research⁴.
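The remark about squares follows from the chain rule, $\nabla (f^2) = 2 f\, \nabla f$, which vanishes wherever $f$ does. A small numerical illustration, assuming the circle function $f(x) = x_1^2 + x_2^2 - 1$ of (3.15) as the test case and a few arbitrarily chosen points on the circle:

```python
import math

def f(x1, x2):
    """f(x) = x1^2 + x2^2 - 1, whose zero locus is the unit circle."""
    return x1 ** 2 + x2 ** 2 - 1.0

def grad_f(x1, x2):
    """Gradient of f."""
    return (2.0 * x1, 2.0 * x2)

def grad_f_squared(x1, x2):
    """Gradient of f^2, equal to 2 f grad f by the chain rule."""
    value = f(x1, x2)
    gx, gy = grad_f(x1, x2)
    return (2.0 * value * gx, 2.0 * value * gy)

# A few points on the zero locus of f (and of f^2)
circle = [(math.cos(t), math.sin(t)) for t in (0.0, 0.7, 2.0, 4.5)]

# The gradient of f has length 2 at every point of the circle ...
min_grad_f = min(math.hypot(*grad_f(*p)) for p in circle)
# ... while the gradient of f^2 vanishes there, up to rounding
max_grad_f2 = max(math.hypot(*grad_f_squared(*p)) for p in circle)
```

So the rank condition is sufficient but not necessary: $f$ and $f^2$ have the same zero locus, a submanifold, although only $f$ satisfies the hypothesis of Corollary 3.6.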

³ The study of knots is an old subject in mathematics, going back to work of P. G. Tait in the 1870's, but it is still a very active subject, with recent ties to mathematical physics and representation theory among other topics. Classical knot theory focuses on the classification of knots and the algebraic invariants that distinguish different knots. Further information can be found in the book by R. H. Crowell and R. H. Fox, Introduction to Knot Theory, Springer, 1963.

⁴ Some classical examples of singularities are isolated singularities of algebraic curves, the topological properties of which were investigated by K. Brauner (Abh. Math. Sem. Hamburg, vol. 6 (1928), pages 8 - 54). A polynomial function $p(z_1, z_2)$ of two complex variables amounts to a pair of real functions of four real variables, the real and imaginary parts of the variables and of the function; so the set $V$ of points at which $p(z_1, z_2) = 0$ is a candidate to be a submanifold of dimension 2 in $\mathbb{R}^4$. For a linear polynomial $p(z_1, z_2) = a_1 z_1 + a_2 z_2$ the set $V$ is a linear submanifold, and the intersection $V \cap S^3$ of the set $V$ and a sphere $S^3$ centered at the origin, the set of points $(z_1, z_2)$ at which $|z_1|^2 + |z_2|^2$ takes a fixed positive value, is a one-dimensional submanifold of the sphere, a linear slice of the sphere so just an ordinary circle. If the polynomial $p(z_1, z_2)$ has no constant term but a nontrivial linear term then again the set $V$ is a 2-dimensional submanifold near the origin, and the intersection $V \cap S^3$ is a submanifold of the sphere that is $C^1$ homeomorphic to a circle in the sphere. The set $V$ of zeros of the polynomial $p(z_1, z_2) = z_1^2 + z_2^2$ is a 2-dimensional submanifold near the origin except at the origin itself, which is a singularity; actually this polynomial splits as the product $p(z_1, z_2) = (z_1 + i z_2)(z_1 - i z_2)$ of two linear functions both of which vanish at the origin, so near the origin the set $V$ is the union of two linear spaces of dimension 2, the linear spaces $z_1 + i z_2 = 0$ and $z_1 - i z_2 = 0$, which meet only at a single point, the origin. In this case the intersection $V \cap S^3$ is a submanifold consisting of two disjoint circles which are not linked in the three-dimensional sphere. For the polynomial $p(z_1, z_2) = z_1^2 + z_2^3$ the origin is again a singularity, and in this case the intersection $V \cap S^3$ as a submanifold of the sphere is a nontrivial knot in the sphere, which is fairly easy to describe. In general the intersection $V \cap S^3$ is a submanifold of the sphere that is a knot of a special




3.3 The Rank Theorem

The Inverse Mapping Theorem can be applied to describe mappings between spaces in situations more general than those to which the Implicit Function Theorem applies, in particular to describe mappings $f : \mathbb{R}^m \to \mathbb{R}^n$ where $m < n$.

Theorem 3.9 (Rank Theorem) If $f : U \to \mathbb{R}^n$ is a continuously differentiable mapping defined in an open subset $U \subset \mathbb{R}^m$ and if $\operatorname{rank} f'(x) = r$ for all points in an open neighborhood of a point $a \in U$ then in suitable local coordinates near the points $a \in \mathbb{R}^m$ and $f(a) \in \mathbb{R}^n$ the mapping $f$ is a linear mapping of rank $r$.

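Proof: This theorem involves a change of local coordinates in both the domain and range of the mapping $f$, so requires the existence at least locally of $C^1$ homeomorphisms $\psi$ and $\theta$ in the diagram

\[
\mathbb{R}^m \xrightarrow{\ \psi\ } \mathbb{R}^m \xrightarrow{\ f\ } \mathbb{R}^n \xrightarrow{\ \theta\ } \mathbb{R}^n
\]

so that the composition $\theta \circ f \circ \psi : \mathbb{R}^m \to \mathbb{R}^n$ is a linear mapping of rank $r$. If $\operatorname{rank} f'(a) = r$ then by suitably relabeling the variables in the domain $\mathbb{R}^m$ and range $\mathbb{R}^n$ of the mapping $f$ it can be assumed that the leading $r \times r$ submatrix of $f'(a)$ is a nonsingular $r \times r$ matrix; and by continuity that is also the case for the matrix $f'(x)$ for all points $x \in U_a \subset \mathbb{R}^m$ for a sufficiently small open neighborhood $U_a$ of the point $a$. As before it is convenient to write points $x \in \mathbb{R}^m$ in the form $x = \begin{pmatrix} x_I \\ x_{II} \end{pmatrix}$ where $x_I \in \mathbb{R}^r$ and $x_{II} \in \mathbb{R}^{m-r}$, and correspondingly to write points $y \in \mathbb{R}^n$ in the form $y = \begin{pmatrix} y_I \\ y_{II} \end{pmatrix}$ where $y_I \in \mathbb{R}^r$ and $y_{II} \in \mathbb{R}^{n-r}$. The mapping $f$ can then be written

\[
f \begin{pmatrix} x_I \\ x_{II} \end{pmatrix} = \begin{pmatrix} f_I(x) \\ f_{II}(x) \end{pmatrix}
\]

where $x_I, f_I(x) \in \mathbb{R}^r$ while $x_{II} \in \mathbb{R}^{m-r}$ and $f_{II}(x) \in \mathbb{R}^{n-r}$; and the matrix $f'(x)$ then can be decomposed into the submatrices

\[
f'(x) = \begin{pmatrix} \partial_I f_I(x) & \partial_{II} f_I(x) \\ \partial_I f_{II}(x) & \partial_{II} f_{II}(x) \end{pmatrix}
\]

where $\partial_I f_I(x)$ is an $r \times r$ matrix which is nonsingular for all points $x \in U_a$. In these terms, introduce the mapping $\phi : U_a \to \mathbb{R}^m$ defined by

\[
\phi \begin{pmatrix} x_I \\ x_{II} \end{pmatrix} = \begin{pmatrix} f_I(x) \\ x_{II} \end{pmatrix},
\]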

type, and its topological structure determines a good deal of the structure of the set $V$ at its singular point. There are analogous topological sets that arise in the examination of isolated singularities of the sets of zeros of polynomials in several variables, a fascinating subject in its own right; these examples can be used to introduce the exotic differentiable structures on higher dimensional spheres, as discovered by J. W. Milnor. A more detailed discussion can be found in Milnor's book Singular Points of Complex Hypersurfaces, Annals of Mathematics Studies, number 61, Princeton University Press, 1968.




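so that

$$\phi'(x) = \begin{pmatrix} \partial_I f_I(x) & \partial_{II} f_I(x) \\ 0 & I_{m-r} \end{pmatrix}$$

where $I_{m-r}$ is the $(m-r) \times (m-r)$ identity matrix; since the matrix $\partial_I f_I(x)$ is nonsingular for all points $x \in U_a$ it follows that the matrix $\phi'(x)$ also is nonsingular for all points $x \in U_a$. If the neighborhood $U_a$ is sufficiently small it follows from the Inverse Mapping Theorem that the restriction of the mapping $\phi$ to $U_a$ is a $C^1$ homeomorphism $\phi : U_a \to V$ between the open sets $U_a$ and $V \subset \mathbb{R}^m$. Clearly the inverse mapping $\psi : V \to U_a$ has the form

$$\psi\begin{pmatrix} t_I \\ t_{II} \end{pmatrix} = \begin{pmatrix} g_I(t) \\ t_{II} \end{pmatrix} \quad \text{for a } C^1 \text{ mapping } g_I : V \to \mathbb{R}^r,$$

and then

$$\begin{pmatrix} t_I \\ t_{II} \end{pmatrix} = t = \phi\bigl(\psi(t)\bigr) = \begin{pmatrix} f_I\bigl(\psi(t)\bigr) \\ t_{II} \end{pmatrix} \quad \text{hence} \quad f_I\bigl(\psi(t)\bigr) = t_I;$$

consequently the composite mapping $h(t) = f\bigl(\psi(t)\bigr)$ has the form

$$h(t) = f\bigl(\psi(t)\bigr) = \begin{pmatrix} f_I\bigl(\psi(t)\bigr) \\ f_{II}\bigl(\psi(t)\bigr) \end{pmatrix} = \begin{pmatrix} t_I \\ h_{II}(t) \end{pmatrix} \quad \text{where } h_{II}(t) = f_{II}\bigl(\psi(t)\bigr) \in \mathbb{R}^{n-r}, \tag{3.17}$$

hence

$$h'(t) = \begin{pmatrix} I_r & 0 \\ \partial_I h_{II}(t) & \partial_{II} h_{II}(t) \end{pmatrix} \tag{3.18}$$

where $I_r$ is the $r \times r$ identity matrix. On the other hand since $h = f \circ \psi$ then by the chain rule $h'(t) = f'\bigl(\psi(t)\bigr) \cdot \psi'(t)$; the $m \times m$ matrix $\psi'(t)$ is nonsingular for all $t \in V$, while by hypothesis $\operatorname{rank} f'\bigl(\psi(t)\bigr) = r$ for all $t \in V$, and consequently $\operatorname{rank} h'(t) = r$ for all points $t \in V$ as well. However it is evident from (3.18) that $\operatorname{rank} h'(t) = r$ if and only if $\partial_{II} h_{II}(t) = 0$; that is the case for all points $t \in V$, so the mapping $h_{II}$ is a function just of the variables $t_I \in \mathbb{R}^r$. The mapping $\theta : \mathbb{R}^n \to \mathbb{R}^n$ defined by

$$\theta\begin{pmatrix} y_I \\ y_{II} \end{pmatrix} = \begin{pmatrix} y_I \\ y_{II} - h_{II}(y_I) \end{pmatrix}$$

has the derivative

$$\theta'(y) = \begin{pmatrix} I_r & 0 \\ -\partial_I h_{II}(y_I) & I_{n-r} \end{pmatrix},$$

so after restricting the neighborhood $U_a$ further if necessary this mapping can be viewed as a local change of coordinates in that neighborhood by the Inverse Mapping Theorem; and it is evident from the definition of this mapping and the explicit form (3.17) of the mapping $h$, in which $h_{II}(t)$ is a function of the variables $t_I$ alone, that

$$(\theta \circ f \circ \psi)(t) = \theta\bigl(h(t)\bigr) = \theta\begin{pmatrix} t_I \\ h_{II}(t) \end{pmatrix} = \begin{pmatrix} t_I \\ h_{II}(t) - h_{II}(t_I) \end{pmatrix} = \begin{pmatrix} t_I \\ 0 \end{pmatrix},$$

which is a linear mapping of rank $r$. That suffices for the proof.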
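The construction in the proof can be traced on a small concrete case. The following sketch is only an illustration under stated assumptions: the mapping f(x1, x2) = (x1 + x2, (x1 + x2)^2) is not taken from the text, and the names `psi`, `h_II`, `theta` and `normal_form` are chosen here to mirror the symbols of the proof. This mapping has rank 1 at every point, the mapping φ of the proof is φ(x1, x2) = (x1 + x2, x2), and the composition θ ∘ f ∘ ψ should reduce to t ↦ (t1, 0).

```python
# Illustrative mapping (an assumption, not from the text):
# f(x1, x2) = (x1 + x2, (x1 + x2)**2); its derivative is [[1, 1], [2s, 2s]]
# with s = x1 + x2, so rank f'(x) = 1 at every point.

def f(x1, x2):
    s = x1 + x2
    return (s, s * s)

def psi(t1, t2):
    # Inverse of phi(x1, x2) = (f_I(x), x2) = (x1 + x2, x2).
    return (t1 - t2, t2)

def h_II(t1):
    # Second component of h = f o psi; it depends on t1 alone.
    return t1 * t1

def theta(y1, y2):
    # theta(y_I, y_II) = (y_I, y_II - h_II(y_I)).
    return (y1, y2 - h_II(y1))

def normal_form(t1, t2):
    # The composition theta o f o psi from the proof.
    return theta(*f(*psi(t1, t2)))

samples = [(-1.5, 0.25), (0.0, 2.0), (0.75, -3.0)]
results = [normal_form(t1, t2) for (t1, t2) in samples]
for t, u in zip(samples, results):
    print(t, "->", u)
```

In each case the second output coordinate vanishes and the first equals t1, which is the linear mapping of rank 1 that the theorem predicts for this example.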

Corollary 3.10 If $f : U \to \mathbb{R}^n$ is a continuously differentiable mapping defined in an open subset $U \subset \mathbb{R}^m$ and if $\operatorname{rank} f'(x) = r$ for all points $x \in U$ then

(i) the image $f(U_a)$ of any sufficiently small open neighborhood $U_a \subset U$ of a point $a \in U$ is a $C^1$ submanifold of dimension $r$ of an open neighborhood of the point $f(a) \in \mathbb{R}^n$;

(ii) for any point $b \in f(U) \subset \mathbb{R}^n$ the inverse image $f^{-1}(b)$ is a $C^1$ submanifold of dimension $m - r$ in $U$.

Proof: It follows from the preceding theorem that after suitable changes of coordinates in open neighborhoods $U_a$ of the point $a \in U \subset \mathbb{R}^m$ and $V_b$ of the point $b = f(a) \in \mathbb{R}^n$ the mapping $f$ is a linear mapping of rank $r$; consequently the image $f(U_a)$ is a linear subspace of dimension $r$ in terms of the new coordinates in $\mathbb{R}^n$ and the inverse image $f^{-1}(y) \subset \mathbb{R}^m$ of any point $y \in f(U_a)$ is a linear subspace of dimension $m - r$ in terms of the new coordinates in $\mathbb{R}^m$. Consequently the image $f(U_a)$ is an $r$-dimensional submanifold of $V_b$ in terms of the original coordinates in $\mathbb{R}^n$, and the inverse image $f^{-1}(y) \cap U_a$ of any point $y \in f(U_a)$ is an $(m-r)$-dimensional $C^1$ submanifold of $U_a$ in terms of the original coordinates in $\mathbb{R}^m$. The latter statement holds for any point in $f^{-1}(y)$; so since the inverse image $f^{-1}(y) \subset U$ is a relatively closed subset of $U$ which is locally a submanifold of $U$ it must be a submanifold of $U$, and that suffices for the proof.

Part (i) of the preceding corollary can only be expected to hold locally. For instance it is quite possible that the image of an open interval $I \subset \mathbb{R}^1$ under a continuously differentiable mapping $\phi : I \to \mathbb{R}^n$ is a curve that intersects itself at some points, so that although the image is locally a one-dimensional submanifold the full image has singularities; and the image of the open interval need not even be a closed subset of $\mathbb{R}^n$. The situation in higher dimensions of course is even more complicated; but the local result is of considerable use. For the Inverse Mapping Theorem and the Implicit Function Theorem the hypothesis only required that the mapping be of maximal rank at a point $a$, or equivalently that there is a square submatrix of maximal size in the Jacobian matrix of the mapping so that the determinant of that submatrix at the point $a$ is nonzero; and since the Jacobian matrix of a $C^1$ mapping is a continuous function of the point, that determinant is necessarily nonzero at all points near enough to $a$. However if the rank of the Jacobian matrix is not maximal at a point $a$, its rank may well be greater at points arbitrarily close to $a$. That is the reason that the hypothesis in the Rank Theorem requires that the rank be locally constant. In some applications of the Rank Theorem it is still the case that the rank is maximal; and if the rank is maximal at a point it is maximal at all nearby points.

Corollary 3.11 If $\phi : U \to \mathbb{R}^n$ is a continuously differentiable mapping from an open set $U \subset \mathbb{R}^r$ into $\mathbb{R}^n$ where $r < n$ and if $\operatorname{rank} \phi'(x) = r$ at each point $x \in U$ then the image under $\phi$ of any sufficiently small open neighborhood of any point $a \in U$ is a $C^1$ submanifold of dimension $r$ in an open neighborhood of the point $\phi(a) \in \mathbb{R}^n$.




Proof: If $\operatorname{rank} \phi'(a) = r$ at a point $a \in U$ for a mapping $\phi : \mathbb{R}^r \to \mathbb{R}^n$ where $r < n$ then $\operatorname{rank} \phi'(x) = r$ for all points $x$ near $a$, so the desired result follows immediately from Corollary 3.10, and that suffices for the proof.

For example a continuously differentiable mapping $f : [0,1] \to \mathbb{R}^m$ can be viewed as a parametrized curve in the vector space $\mathbb{R}^m$, beginning at the point $f(0)$ and ending at the point $f(1)$; and if $f'(t) \neq 0$ at each point $t \in [0,1]$ then by Corollary 3.11 the image of a sufficiently small open neighborhood of any point $t_0 \in (0,1)$ is a 1-dimensional submanifold of an open neighborhood of the point $f(t_0) \in \mathbb{R}^m$. That the mapping $f$ is differentiable at the point $t_0$ means that

$$f(t) = f(t_0) + f'(t_0)(t - t_0) + \epsilon(t) \quad \text{where} \quad \lim_{t \to t_0} \frac{\epsilon(t)}{t - t_0} = 0$$

for all points $t$ sufficiently near $t_0$. The parametrized curve

$$f_0(t) = f(t_0) + f'(t_0)(t - t_0)$$

is a line in $\mathbb{R}^m$ passing through the point $f(t_0)$, called the tangent line to the parametrized curve $f$ at the point $f(t_0) = a_0$, and its image is of course another submanifold of $\mathbb{R}^m$; the vector $f'(t_0)$ is called the tangent vector to the parametrized curve $f$ at the point $f(t_0) = a_0$. It follows from (2.1) and the uniqueness of the derivative that the tangent line to the parametrized curve $f$ at the point $a_0$ is the parametrized line through that point that is the best linear approximation to the curve $f$ near that point. It is common to change the parametrization of the tangent line so that the point $a_0$ corresponds to the parameter value $t = 0$, so to describe the tangent line as the parametrized curve $f_0(t) = f(t_0) + f'(t_0)\,t$. A continuously differentiable mapping $\phi : U \to \mathbb{R}^n$ from an open subset $U \subset \mathbb{R}^m$ into $\mathbb{R}^n$ takes a parametrized curve $f$ to the parametrized curve $\phi \circ f$, and by the chain rule the tangent vector to the image $\phi \circ f$ at the point $\phi(a_0)$ is the vector $(\phi \circ f)'(t_0) = \phi'(a_0) f'(t_0)$. Thus the derivative $\phi'(a_0)$ of the mapping $\phi$ at the point $a_0$ can be interpreted as the linear mapping that takes tangent vectors to parametrized curves through the point $a_0$ to tangent vectors to the image parametrized curves through the point $\phi(a_0)$.
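The interpretation of the derivative as a mapping of tangent vectors can be checked numerically. The following is a minimal sketch under stated assumptions: the curve (the unit circle), the mapping `phi`, the parameter value `t0` and the helper names are all chosen here for illustration and do not come from the text. It compares the chain-rule vector φ'(a0) f'(t0) with a central difference quotient of the image curve φ ∘ f.

```python
import math

def f(t):
    # Parametrized curve in R^2 (the unit circle), chosen for illustration.
    return (math.cos(t), math.sin(t))

def f_prime(t):
    # Tangent vector f'(t) of the curve above.
    return (-math.sin(t), math.cos(t))

def phi(x):
    # A C^1 mapping R^2 -> R^2, chosen for illustration.
    return (x[0] * x[0] - x[1], x[0] * x[1])

def phi_prime(x):
    # Jacobian matrix of phi at x, stored as a list of rows.
    return [[2.0 * x[0], -1.0], [x[1], x[0]]]

def mat_vec(matrix, vector):
    # Matrix-vector product.
    return tuple(sum(a * b for a, b in zip(row, vector)) for row in matrix)

def diff_quotient(curve, t0, step=1e-6):
    # Central difference approximation to the tangent vector of a curve.
    plus, minus = curve(t0 + step), curve(t0 - step)
    return tuple((p - m) / (2.0 * step) for p, m in zip(plus, minus))

t0 = 0.7
a0 = f(t0)
chain_rule = mat_vec(phi_prime(a0), f_prime(t0))      # phi'(a0) f'(t0)
numeric = diff_quotient(lambda t: phi(f(t)), t0)      # approximates (phi o f)'(t0)
print(chain_rule, numeric)
```

The two vectors agree up to the error of the difference quotient, which is all that a numerical check of this kind can show.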

If $V$ is a $C^1$ submanifold of an open subset $U \subset \mathbb{R}^m$ then through each point $a \in V$ it is possible to pass a continuously differentiable parametrized curve contained entirely in the set $V$ near that point; for in some local coordinates near the point $a$ the subset $V$ is a linear subspace, which contains straight lines through that point. Moreover if $\dim V = r$ there are parametrized curves through that point, the tangent vectors to which span a linear subspace of dimension $r$; for in some local coordinates near the point $a$ the subset $V$ is a linear subspace of dimension $r$, and there are $r$ linearly independent straight lines through any point of a linear space of dimension $r$. For a submanifold $V \subset U$ of dimension $r$ in an open set $U \subset \mathbb{R}^m$ defined by equations

$$V = \bigl\{\, x \in U \bigm| f_1(x) = \cdots = f_{m-r}(x) = 0 \,\bigr\} \tag{3.19}$$

where the vectors $\nabla f_i(x)$ are linearly independent at each point $x \in V$, the linear subspace of dimension $r$ consisting of the tangent vectors at the point $a$ of all continuously differentiable parametrized curves contained in $V$ and passing through that point is called the tangent space to the submanifold $V$ at the point $a$ and is denoted by $T_a(V)$. If $\phi : I \to V$ is a $C^1$ parametrized curve for which $\phi(0) = a$, then since $f_i\bigl(\phi(t)\bigr) = 0$ identically in $t$ it follows from the chain rule that $f_i'(a)\,\phi'(0) = 0$, or equivalently that $\nabla f_i(a) \cdot \phi'(0) = 0$; thus the tangent vector to the parametrized curve $\phi$ is perpendicular to all of the gradient vectors $\nabla f_i(a)$, hence the tangent space to $V$ at the point $a \in V$ is the linear subspace consisting of all points $x \in \mathbb{R}^m$ such that the vector $x - a$ is perpendicular to the vectors $\nabla f_i(a)$. The equation of the tangent space therefore is

$$T_a(V) = \bigl\{\, x \in \mathbb{R}^m \bigm| (x - a) \cdot \nabla f_i(a) = 0 \ \text{for } 1 \le i \le m - r \,\bigr\}. \tag{3.20}$$

The tangent space is the linear subspace of $\mathbb{R}^m$ that is the best local approximation to the submanifold $V$ near that point, just as is the tangent line to a parametrized curve. For example, if $V \subset \mathbb{R}^3$ is the surface defined by

$$V = \bigl\{\, x \in \mathbb{R}^3 \bigm| f(x) = x_1^2 + x_2^3 + x_3^4 - 3 = 0 \,\bigr\}$$

then $f'(1,1,1) = (2 \ \ 3 \ \ 4)$, so the equation of the tangent plane to $V$ at the point $(1,1,1)$ is $(2,3,4) \cdot (x_1 - 1, x_2 - 1, x_3 - 1) = 0$ or equivalently $2x_1 + 3x_2 + 4x_3 = 9$.
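The normal vector and the constant term of this tangent plane can be recomputed with a few lines of code. This is only a sketch under stated assumptions: the function `g` below is the polynomial part x1^2 + x2^3 + x3^4 of the defining equation (the additive constant does not affect the gradient), and the helper `gradient` uses central differences rather than exact derivatives, so its output is an approximation.

```python
def g(x):
    # Polynomial part of the defining equation; an additive constant
    # would not change the gradient computed below.
    return x[0] ** 2 + x[1] ** 3 + x[2] ** 4

def gradient(func, a, step=1e-6):
    # Central difference approximation to the gradient of func at a.
    grad = []
    for i in range(len(a)):
        up, down = list(a), list(a)
        up[i] += step
        down[i] -= step
        grad.append((func(up) - func(down)) / (2.0 * step))
    return grad

a = (1.0, 1.0, 1.0)
normal = gradient(g, a)                           # approximates (2, 3, 4)
offset = sum(n * c for n, c in zip(normal, a))    # right-hand side of normal . x = offset
print(normal, offset)
```

The tangent plane to the level set of g through the point a is then the set of points x with normal · x = offset.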

To determine the extrema of a $C^1$ function $g$ defined on a $k$-dimensional submanifold $V \subset \mathbb{R}^m$ it is always possible theoretically to introduce local coordinates in an open neighborhood $U$ of any point $a \in V$ so that $V \cap U$ is a $k$-dimensional linear subspace in terms of those coordinates; the restriction of the function $g$ to $V$ then is locally just a function on a $k$-dimensional linear subspace of $\mathbb{R}^m$, so can be viewed as a function on an open subset of $\mathbb{R}^k$ and its critical points are candidates for points at which the function has an extremal value. However that can be a rather arduous calculation in all but the simplest situations, since determining the local coordinates in terms of which the submanifold is linear can be quite difficult to do explicitly. Fortunately the implicit function theorem provides a much simpler and widely used method for finding local extrema on a submanifold. For the submanifold $V$ defined by (3.19) where the vectors $\nabla f_i(x)$ are linearly independent at all points $x \in V$, if a point $a \in V$ is a local extremum of the restriction of the continuously differentiable function $g(x)$ to the submanifold $V$ then it is a local extremum of the restriction of the function $g(x)$ to any continuously differentiable parametrized curve lying in $V$ and passing through the point $a$. Then for any $C^1$ parametrized curve $\phi : U \to \mathbb{R}^m$ in an open neighborhood $U \subset \mathbb{R}^1$ of the origin $t = 0$ passing through the point $a$, so that $\phi(0) = a$, and contained in $V$, so that $f_i\bigl(\phi(t)\bigr) = 0$ for all $t \in U$, the origin $t = 0$ is a local extremum for the composite function $g\bigl(\phi(t)\bigr)$ and consequently

$$0 = \frac{d}{dt}\, g\bigl(\phi(t)\bigr) \Big|_{t=0} = g'(a)\,\phi'(0) = \nabla g(a) \cdot \phi'(0);$$




thus $\nabla g(a)$ must be perpendicular to the tangent line to the parametrized curve, and since that is the case for all parametrized curves in $V$ the vector $\nabla g(a)$ must be perpendicular to the tangent space $T_a(V)$ to the submanifold $V$ at the point $a$. However the tangent space is characterized in (3.20) as the plane perpendicular to the gradients $\nabla f_i(a)$, so the vector $\nabla g(a)$ must be contained in the span of the vectors $\nabla f_i(a)$, consequently must be expressible as a linear combination

$$\nabla g(a) = \sum_{i=1}^{m-r} \lambda_i \nabla f_i(a) \tag{3.21}$$

for some parameters $\lambda_i \in \mathbb{R}$. This set of equations, combined with the equations (3.16) for the submanifold $V$, is an explicit set of equations that can determine the local extrema of the function $g(x)$, a technique called the method of Lagrange multipliers.

As an example of the use of this method, find the minimum of $x + y + z$ subject to the restrictions that $\frac{a}{x} + \frac{b}{y} + \frac{c}{z} = 1$ and $x$, $y$ and $z$ are positive, where $a$, $b$ and $c$ are fixed and positive. The submanifold $V$ is defined by the equation $f(x,y,z) = \frac{a}{x} + \frac{b}{y} + \frac{c}{z} - 1 = 0$, and the function of interest is $g(x,y,z) = x + y + z$, where

$$\nabla g = (1, 1, 1), \qquad \nabla f = \Bigl( -\frac{a}{x^2},\ -\frac{b}{y^2},\ -\frac{c}{z^2} \Bigr).$$

The Lagrange formula is the equation $\nabla g = \lambda \cdot \nabla f$; the first component of this vector equation is $\lambda = -x^2/a$, and upon substituting this into the other two components there result the equations

$$y = x \sqrt{\frac{b}{a}} \quad \text{and} \quad z = x \sqrt{\frac{c}{a}}.$$

Since $a$, $b$ and $c$ are fixed there results a one-parameter set of solutions, parametrized by $x$. When restricted to the set $S = \{ (x,y,z) : f(x,y,z) = 0,\ x > 0,\ y > 0,\ z > 0 \}$ this is

$$\frac{a}{x} + \frac{b}{x\sqrt{b/a}} + \frac{c}{x\sqrt{c/a}} = \frac{\sqrt{a}}{x} \bigl( \sqrt{a} + \sqrt{b} + \sqrt{c} \bigr) = 1 \ \Longrightarrow\ (x, y, z) = \bigl( \sqrt{a} + \sqrt{b} + \sqrt{c} \bigr) \bigl( \sqrt{a}, \sqrt{b}, \sqrt{c} \bigr).$$

Thus the extremum associated to this point is $g(x,y,z) = \bigl( \sqrt{a} + \sqrt{b} + \sqrt{c} \bigr)^2$.

To see that this is the minimum, note that the set $S$ is noncompact but the function $g$ must have a minimum on any compact subset of $S$. The function $g$ increases whenever one of the coordinates $x$, $y$, or $z$ becomes very large. Since $S$ is contained entirely in the first octant, it is only necessary to minimize $g$ on a reasonable compact subset of $S$ near the origin, where the minimum is the solution found.
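The closed form found above can be sanity-checked numerically. The following sketch rests on assumptions made only for illustration: the sample values a = 1, b = 4, c = 9, the grid of test points, and the helper names `candidate`, `constraint` and `point_on_surface` are not from the text. It evaluates the candidate point given by the Lagrange equations and compares the value of g there with the values of g at other points of the constraint surface.

```python
import math

def candidate(a, b, c):
    # Critical point given by the Lagrange equations:
    # (x, y, z) = (sqrt(a) + sqrt(b) + sqrt(c)) * (sqrt(a), sqrt(b), sqrt(c)).
    s = math.sqrt(a) + math.sqrt(b) + math.sqrt(c)
    return (s * math.sqrt(a), s * math.sqrt(b), s * math.sqrt(c))

def constraint(p, a, b, c):
    # f(x, y, z) = a/x + b/y + c/z - 1, which vanishes on the surface.
    x, y, z = p
    return a / x + b / y + c / z - 1.0

def g(p):
    # The function being minimized, g(x, y, z) = x + y + z.
    return sum(p)

a, b, c = 1.0, 4.0, 9.0          # sample values, chosen for illustration
p = candidate(a, b, c)
value = g(p)                     # should equal (sqrt(a) + sqrt(b) + sqrt(c))**2

def point_on_surface(x, y):
    # Solve the constraint for z when a positive solution exists.
    rest = 1.0 - a / x - b / y
    return (x, y, c / rest) if rest > 0 else None

# Values of g at a grid of other points on the constraint surface.
others = []
for i in range(1, 60):
    for j in range(1, 60):
        q = point_on_surface(0.5 * i, 0.5 * j)
        if q is not None:
            others.append(g(q))

print(p, value, min(others))
```

A grid comparison of this kind is evidence rather than proof; the argument in the text is what establishes that the critical point is the minimum.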

When finding the extrema of a continuously differentiable function $g(x)$ on a set such as the closed semidisc

$$D = \bigl\{\, x \in \mathbb{R}^2 \bigm| x_1^2 + x_2^2 \le 1,\ x_2 \ge 0 \,\bigr\}$$

in the plane, the procedure is first to examine critical points in the interior of the semidisc, then to examine the two pieces of the boundary of the semidisc that are local submanifolds, the semicircle $x_1^2 + x_2^2 = 1$, $x_2 \ge 0$ and the axis $-1 \le x_1 \le 1$, $x_2 = 0$, and then the two remaining points $(-1, 0)$ and $(1, 0)$ that are points where the boundary fails to be a submanifold. In general, the procedure is to examine critical points in the interior of the region, and then to examine the pieces of the boundary that are submanifolds and apply the method of Lagrange multipliers in all these cases.

If $f : U \to \mathbb{R}^n$ is a continuously differentiable mapping defined in an open subset $U \subset \mathbb{R}^m$ where $m \ge n$ then $\operatorname{rank} f'(a) \le n$ at any point $a \in U$; and if $\operatorname{rank} f'(x) = n$ at a point $x = a \in U$ then it is also the case that $\operatorname{rank} f'(x) = n$ for all points $x$ in an open neighborhood $V$ of the point $a$, since if some $n \times n$ submatrix of $f'(a)$ has a nonzero determinant then by continuity that submatrix of $f'(x)$ will also have a nonzero determinant for all points $x$ near enough to $a$. The Rank Theorem asserts that the mapping $f$ is equivalent to a linear mapping of rank $n$ in an open neighborhood of each point $a \in V$, and consequently the image of the mapping $f$ contains a full open neighborhood of the point $f(a)$. In particular the image of the mapping $f$ is not contained within any proper linear subspace of the image space $\mathbb{R}^n$, so the coordinate functions $f_i$ of the mapping $f$ are linearly independent functions at each point $a \in V$. On the other hand if $\operatorname{rank} f'(x) = r$ at all points $x \in U$ where $r < n$, then by the Rank Theorem the mapping $f$ will be a linear mapping of rank $r$ in suitable local coordinates in open neighborhoods of each point $a \in U \subset \mathbb{R}^m$ and of its image $f(a) \in \mathbb{R}^n$; that means that the image of the restriction of the mapping $f$ to an open neighborhood $V_a$ of any point $a \in U$ is contained in a submanifold of dimension $r$ in an open neighborhood $V_{f(a)} \subset \mathbb{R}^n$ of the image point $f(a)$. This submanifold is described as the set of common zeros of $n - r$ linear functions in some local coordinates near the point $f(a)$, and these functions will at least be some continuously differentiable functions $h_\nu$ near the point $f(a)$ in the initial coordinates in $\mathbb{R}^n$. Therefore the component functions $f_i$ of the mapping $f$ satisfy a collection of $n - r$ nontrivial equations $h_\nu\bigl( f_1(x), f_2(x), \ldots, f_n(x) \bigr) = 0$ for $1 \le \nu \le n - r$; that is usually expressed by saying that the functions $f_i$ are functionally dependent.

The preceding discussion really focused only on local properties of submanifolds of open subsets $U \subset \mathbb{R}^n$; the definition and characterization of submanifolds are both local conditions. There is also an extensive global theory of submanifolds, and of a weaker equivalence class of submanifolds known as manifolds. Although these topics will not be pursued here, it may be of some interest at least to see the basic concepts involved. If $V$ is a topological submanifold of an open subset $U \subset \mathbb{R}^m$ it has the relative topology induced by the topology of $U$, in which the open subsets of $V$ are the intersections of $V$ with the open

subsets of $U$. Continuous functions on $V$ can be defined as usual in terms of the open subsets of $V$. If $V_1 \subset U_1$ and $V_2 \subset U_2$ are topological submanifolds of open subsets $U_1 \subset \mathbb{R}^{n_1}$ and $U_2 \subset \mathbb{R}^{n_2}$ a mapping $\phi : V_1 \to V_2$ is continuous in the relative topologies if and only if $\phi^{-1}(W) \subset V_1$ is an open subset of $V_1$ whenever $W \subset V_2$ is an open subset of $V_2$; and the submanifolds $V_1$ and $V_2$ are weakly homeomorphic if and only if there is a continuous one-to-one mapping $\phi : V_1 \to V_2$ with a continuous inverse. This must be distinguished from the notion that the two submanifolds $V_1$ and $V_2$ are homeomorphic, as defined on page 52; that refers to the way in which the submanifolds are imbedded in the open subsets $U_i$, so the submanifolds are equivalent if and only if there is a homeomorphism $\psi : U_1 \to U_2$ such that $\psi(V_1) = V_2$. Two submanifolds that are weakly homeomorphic need not be submanifolds of subsets $U_i$ in spaces $\mathbb{R}^{m_i}$ of the same dimension. For instance, a circle in the plane and a knotted curve in space are weakly homeomorphic, when the knotted curve is described as the image of a one-to-one $C^1$ mapping of the circle into $\mathbb{R}^3$ with a nontrivial derivative; but they are certainly not homeomorphic as submanifolds, since they are imbedded in spaces of different dimensions. Weak equivalence of submanifolds is an equivalence relation in the customary sense; and a weak equivalence class is called a manifold, in this case a topological manifold. A given submanifold $V \subset U$ in an open subset $U \subset \mathbb{R}^m$ thus determines a manifold, an equivalence class of submanifolds under weak equivalence that includes the submanifold $V$. It is perhaps more common to speak of submanifolds being equivalent as manifolds, rather than as weakly equivalent submanifolds.

As might be expected, it is also possible to define manifolds even more intrinsically, without considering their embeddings. If $V \subset U$ is a $k$-dimensional topological submanifold of an open set $U \subset \mathbb{R}^m$ then by definition each point $a \in V$ is contained in an open subset $U_a \subset \mathbb{R}^m$ for which there is a homeomorphism $\phi_a : U_a \to W_a$ between the open subset $U_a \subset \mathbb{R}^m$ and another open subset $W_a \subset \mathbb{R}^m$ such that $\phi_a(V \cap U_a)$ is a linear subspace $L_a \subset W_a$, which can be viewed as an open subset $L_a \subset \mathbb{R}^k$; the restriction $\phi_a : V \cap U_a \to L_a$ is a homeomorphism between the relatively open subset $V_a = V \cap U_a \subset V$ and the open subset $L_a \subset \mathbb{R}^k$, which can be considered as describing a system of local coordinates in the relatively open neighborhood $V_a$ of the point $a \in V$, the local coordinates in the open subset $L_a \subset \mathbb{R}^k$. The corresponding construction can be carried out for any other topological submanifold $V' \subset U'$ that is homeomorphic to $V$ through a homeomorphism $\psi : U \to U'$ such that $\psi(V) = V'$; if $a' = \psi(a)$ there is an open subset $U'_{a'}$ with a homeomorphism $\phi'_{a'} : U'_{a'} \to W'_{a'}$ such that the restriction $\phi'_{a'} : V' \cap U'_{a'} \to L'_{a'}$ is a homeomorphism between the relatively open subset $V'_{a'} = V' \cap U'_{a'} \subset V'$ and the open subset $L'_{a'} \subset \mathbb{R}^k$, so the composition $\phi'_{a'} \circ \psi : V_a \to L'_{a'}$ can be considered as describing another system of local coordinates in the relatively open neighborhood $V_a$ of the point $a \in V$. These two sets of local coordinates near $a$ are related by the homeomorphism $\phi'_{a'} \circ \psi \circ (\phi_a | V_a)^{-1} : L_a \to L'_{a'}$. Thus the systems of local coordinates in neighborhoods of a point $a \in V$ provided by equivalent topological $k$-dimensional submanifolds of $\mathbb{R}^m$ are all homeomorphic to one another. In particular if $V_a$ and $V_b$ are any two overlapping systems of local coordinates in $V$ then they determine homeomorphic systems of local coordinates in an open neighborhood of any point $c \in V_a \cap V_b$; thus the manifold $V$ can be described by a collection of relatively open coordinate neighborhoods $V_a \subset V$ which determine equivalent coordinate neighborhoods in any intersection $V_a \cap V_b$. Weak homeomorphisms between two submanifolds representing the same manifold can be described entirely in terms of these local coordinate neighborhoods. The same construction can be carried out for $C^1$ submanifolds, providing systems of local coordinates that are related by $C^1$ homeomorphisms in the intersections of coordinate neighborhoods, and of course also for $C^r$ submanifolds for any $r$. The notion of a manifold can be extended to those defined more abstractly just by coverings of a topological space $V$ by coordinate neighborhoods. Traditionally a $k$-dimensional manifold is defined as a topological space $M$ such that each point $p \in M$ has an open neighborhood $U_p \subset M$ with a one-to-one continuous mapping $f_p : U_p \to V_p$ onto an open subset $V_p \subset \mathbb{R}^k$; the sets $U_p$ are called coordinate neighborhoods on the manifold $M$, and the parameters in the image set $V_p \subset \mathbb{R}^k$ are called local coordinates on the manifold $M$ in $U_p$. Two $k$-dimensional manifolds $M$ and $M'$ are homeomorphic if there is a continuous invertible one-to-one mapping $\phi : M \to M'$; this is an equivalence relation in the traditional sense, identifying manifolds that are topologically the same. It is customary to impose at least some restrictions on the topology of the set $M$ to avoid odd and irregular cases⁵; thus the topology is usually assumed to be a Hausdorff topology, defined as a topology such that any two distinct points of $M$ have disjoint open neighborhoods. It is possible to speak of continuous functions on a manifold, since the manifold has a topology, but it is not possible to speak of differentiable functions; local coordinate neighborhoods look like pieces of $\mathbb{R}^k$, so that the notion of a function being differentiable in each of those coordinate neighborhoods makes sense, but the notions of differentiability in two overlapping coordinate neighborhoods may be quite different, since it is only required that the mappings $f_p : U_p \to V_p$ be continuous. However if the manifold $M$ has the additional property that in any intersection $U_p \cap U_q$ of coordinate neighborhoods of $M$ the composition

$$f_p \circ f_q^{-1} : f_q(U_p \cap U_q) \to f_p(U_p \cap U_q)$$

is a $C^r$ mapping between the open subsets $f_q(U_p \cap U_q) \subset \mathbb{R}^k$ and $f_p(U_p \cap U_q) \subset \mathbb{R}^k$, then the notion of a function being $C^r$ is well defined; a manifold $M$ with this property is called a differentiable manifold of class $C^r$, or just a $C^r$ manifold for short. The Whitney Embedding Theorem⁶ shows that any abstractly defined manifold can be realized by a submanifold of some space $\mathbb{R}^n$, so that it is possible to define manifolds either as equivalence classes of submanifolds or as abstractly defined manifolds.

⁵ For example the real axis together with one additional point $0'$, where an open neighborhood of the point $0$ is defined as a set of real numbers $\{ x \in \mathbb{R} \mid |x| < \epsilon \}$ and an open neighborhood of the additional point $0'$ is defined to be that point together with a set of real numbers $\{ x \in \mathbb{R} \mid 0 < |x| < \epsilon \}$, is a one-dimensional manifold; but the points $0$ and $0'$ do not have disjoint open neighborhoods.

⁶ The general result was proved by H. Whitney, Annals of Mathematics, vol. 37 (1936), pages 645-680. The result is discussed in a number of articles and books, for example in the article by Howard Jacobowitz on "The global isometric embedding problem" in Contemp. Math., vol. 332, Amer. Math. Soc., Providence, RI, 2003, pages 131-137.




PROBLEMS: GROUP 1

(1) Find the Jacobian of the mapping $f : \mathbb{R}^2 \to \mathbb{R}^2$ defined by

$$f(x) = (e^{x_1} \cos x_2,\ e^{x_1} \sin x_2).$$

Find formulas for the local inverse of this mapping at those points at which it exists. Does this mapping have a global inverse $g : \mathbb{R}^2 \to \mathbb{R}^2$?

(2)(i) At which points is the plane curve defined by the equation

$$f(x) = x_1^3 + x_2^3 - 3x_1x_2 = 0$$

a submanifold of $\mathbb{R}^2$?
(ii) Find the tangent vector to this curve at the point $(\frac{3}{2}, \frac{3}{2})$ and the equation of the tangent line to the curve at this point.

(3) Find the tangent vector to the curve $C \subset \mathbb{R}^3$ defined parametrically by

$$f(t) = \bigl( e^{(t-1)(t-2)},\ t^2 - 4,\ t^2 - 1 \bigr)$$

at the point on the curve corresponding to the parameter value $t = 1$, and the equation of the tangent line to the curve $C$ at that point.

(4) Find a normal vector and the equation of the tangent plane to the surface $x_1^2 x_2 + x_2 x_3 + x_4 = 3$ at the point $(1, 1, 1, 1)$.

(5) Find all points on the paraboloid $x_1^2 + x_2^2 - x_3^2 = 1$ where the tangent plane is parallel to the plane $x_1 + x_2 + x_3 = 1$, and find the equations of the tangent planes at these points.

(6) Find the maximal and minimal values of the function

$$f(x) = 3x_1^2 - 2x_2^2 + 2x_2$$

on the closed unit disc $\{ x \in \mathbb{R}^2 \mid x_1^2 + x_2^2 \le 1 \}$, by the method of Lagrange multipliers, and determine the points at which these extreme values are taken.




PROBLEMS: GROUP 2

(7) Show that any $2 \times 2$ matrix $A$ sufficiently close to the identity matrix has a unique square root near the identity matrix, a solution $X$ of the equation $X^2 = A$ that is also near the identity matrix. This solution can be viewed as a function $\sqrt{A}$ of the $2 \times 2$ matrices $A$ near the identity matrix $I$ for which $\sqrt{I} = I$. What is the derivative of this function $\sqrt{A}$ in a neighborhood of the identity matrix $I$?

(8)(i) Find the maximum and minimum values of the function

$$f(x) = x_1^2 + x_1x_2 + x_2^2 + x_2x_3 + x_3^2$$

on the spherical surface $S$ defined by $x_1^2 + x_2^2 + x_3^2 = 1$, and determine the points at which these extreme values are attained.
(ii) Find the maximum and minimum values of the function $f$ on the intersection $S \cap T$ of the spherical surface $S$ and the plane $T$ defined by $c_1x_1 + c_2x_2 + c_3x_3 = 0$, where $(c_1, c_2, c_3) \in S$ is a point at which the function $f$ takes its maximum value on $S$.

(9) For which $C^1$ functions $f(x)$ on $\mathbb{R}^2$ such that $f(0,0) = 0$ is there a $C^1$ function $g(x_1)$ in an open neighborhood of the origin $0 \in \mathbb{R}^1$ such that $g(0) = 0$ and $f\bigl( f(x_1, g(x_1)),\ g(x_1) \bigr) = 0$?

(10)(i) If $S$ is the surface in $\mathbb{R}^3$ defined by the equation

$$x_1^2 - x_2^3 + 3x_1x_2x_3 = -1,$$

find the equation of the tangent plane to $S$ at the point $(2, -1, 1)$.
(ii) Find the points on $S$ at which the normal to $S$ is a vector parallel to the vector $(1, 1, 0)$.
(iii) Show that $S$ is a submanifold of $\mathbb{R}^3$.

(11) Show that the two equations

sin(x1 + 2x2) + cos(3x3 + 4x4) = 1, ex1x3 + ex2x4 = 2

define $x_1$ and $x_3$ as functions of $x_2$ and $x_4$ near the origin $0 = (0, 0, 0, 0)$, and determine the value of $\frac{\partial x_1}{\partial x_2}$ at that point. Show that these two equations also define $x_2$ and $x_4$ as functions of $x_1$ and $x_3$ near the origin, and determine the value of $\frac{\partial x_2}{\partial x_1}$.

(12) Show that the union of the two positive coordinate axes in $\mathbb{R}^2$, the set $\{ (x_1, x_2) \in \mathbb{R}^2 \mid x_1x_2 = 0,\ x_1 \ge 0,\ x_2 \ge 0 \}$, is a continuous submanifold of $\mathbb{R}^2$ but is not a $C^1$ submanifold.




(13) Suppose that $S_1$ and $S_2$ are two smooth surfaces in $\mathbb{R}^3$ (2-dimensional submanifolds) and that $\lambda$ is a line segment from a point $a_1 \in S_1$ to a point $a_2 \in S_2$ that has the minimal length among all line segments joining points of these two surfaces. Show that the line $\lambda$ is perpendicular to the tangent planes to the surfaces $S_1$ and $S_2$ at the points $a_1$ and $a_2$.



Chapter 4

Integration

4.1 The Riemann Integral

The definition of the integral of a function over a cell $\Delta \subset \mathbb{R}^n$ is a straightforward extension of the definition of the integral of a function over an interval $I \subset \mathbb{R}^1$. The content of a closed cell

$$\Delta = \bigl\{\, x \in \mathbb{R}^n \bigm| a_j \le x_j \le b_j \ \text{for } 1 \le j \le n \,\bigr\}$$

is defined by

$$|\Delta| = \prod_{j=1}^{n} (b_j - a_j), \tag{4.1}$$

and is also considered as the content $|\Delta|$ of the corresponding open cell $\Delta$. The content of a cell or interval in $\mathbb{R}^1$ is its length, the content of a cell in $\mathbb{R}^2$ is its area, and the content of a cell in $\mathbb{R}^3$ is its volume; so the content of a cell is just a generalization of the area or volume to arbitrary dimensions. The cell can be decomposed as a union of smaller closed cells by adjoining finitely many points of subdivision

$$a_j = c_{j,0} < c_{j,1} < \cdots < c_{j,n_j} = b_j$$

of each of its sides and writing $\Delta$ as the union of the closed subcells

$$\bigl\{\, x \in \mathbb{R}^n \bigm| c_{j,k_j-1} \le x_j \le c_{j,k_j} \ \text{for } 1 \le j \le n \,\bigr\}, \qquad 1 \le k_j \le n_j;$$

such a decomposition $\Delta = \bigcup_i \Delta_i$ of the cell is called a partition $P$ of the cell. The subcells of the partition are not disjoint, but only have portions of their boundaries in common. For the case $n = 1$ this is the decomposition of an interval, a cell in $\mathbb{R}^1$, as a union of subintervals; and for $n = 2$ it is illustrated in the accompanying Figure 4.1. It is clear that $|\Delta| = \sum_i |\Delta_i|$ for any partition of a cell $\Delta$. If $f$ is a bounded function on a cell $\Delta \subset \mathbb{R}^n$, its least upper bound


Figure 4.1: partition of cell $\Delta \subset \mathbb{R}^2$

and greatest lower bound in the set $\Delta$ are well defined finite real numbers; the upper sum of the function $f$ for the partition $P$ is defined by

$$S^*(f, P) = \sum_i \sup_{x \in \Delta_i} f(x)\, |\Delta_i|, \tag{4.2}$$

and similarly the lower sum is defined by

$$S_*(f, P) = \sum_i \inf_{x \in \Delta_i} f(x)\, |\Delta_i|. \tag{4.3}$$

It is clear that

$$S_*(f, P) \le S^*(f, P) \tag{4.4}$$

for any partition $P$ of the cell $\Delta$. Adding further points of subdivision to the sides of the cell leads to a finer decomposition of the cell as a union of subcells, called a refinement of the partition. If $P_2$ is a refinement of a partition $P_1$ of a cell $\Delta$ then

$$S^*(f, P_2) \le S^*(f, P_1) \quad \text{and} \quad S_*(f, P_2) \ge S_*(f, P_1); \tag{4.5}$$

for if a cell $\Delta_i$ in the partition $P_1$ is decomposed into a union $\Delta_i = \bigcup_j \Delta_{ij}$ of cells $\Delta_{ij}$ in the partition $P_2$ then

$$\sup_{x \in \Delta_{ij}} f(x) \le \sup_{x \in \Delta_i} f(x) \quad \text{and} \quad \sum_j |\Delta_{ij}| = |\Delta_i|$$

and consequently

$$S^*(f, P_2) = \sum_{i,j} \sup_{x \in \Delta_{ij}} f(x)\, |\Delta_{ij}| \le \sum_{i,j} \sup_{x \in \Delta_i} f(x)\, |\Delta_{ij}| = \sum_i \sup_{x \in \Delta_i} f(x)\, |\Delta_i| = S^*(f, P_1),$$




and correspondingly for the lower sums with the reversed inequalities. It follows that if $P_1$ and $P_2$ are any two partitions of a cell $\Delta$ then

$$S_*(f, P_1) \le S^*(f, P_2); \tag{4.6}$$

for by considering all the points of subdivision of the sides of the cell $\Delta$ from both partitions as defining a further partition $P$, a common refinement of both of these partitions, then by (4.4) and (4.5)

$$S_*(f, P_1) \le S_*(f, P) \le S^*(f, P) \le S^*(f, P_2). \tag{4.7}$$

In view of (4.7) the upper sums $S^*(f, P)$ for all of the partitions $P$ of a cell $\Delta$ are bounded below by any lower sum $S_*(f, P_1)$, and the lower sums $S_*(f, P)$ for all of the partitions $P$ of a cell $\Delta$ are bounded above by any upper sum $S^*(f, P_2)$; so it is possible to define the upper integral and lower integral of the function $f$ over the cell $\Delta$ by

$$\int^{*}_{\Delta} f = \inf_P S^*(f, P) \quad \text{and} \quad \int_{*\Delta} f = \sup_P S_*(f, P), \tag{4.8}$$

where the infimum and supremum are extended over all partitions $P$ of the cell $\Delta$, and clearly

$$\int_{*\Delta} f \le \int^{*}_{\Delta} f. \tag{4.9}$$
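The behavior of upper and lower sums under refinement can be seen in a small computation. The following sketch is an illustration under stated assumptions: the function f(x, y) = xy on the unit square, the uniform partitions, and the helper name `upper_lower_sums` are chosen here and are not from the text. Since this f is increasing in each variable on the square, its supremum over a subcell is attained at the upper right corner and its infimum at the lower left corner, which is what makes the sums computable exactly.

```python
def upper_lower_sums(k):
    """Upper and lower sums of f(x, y) = x*y on [0, 1]^2 for the uniform
    partition into k-by-k subcells (an illustrative choice of f and P)."""
    width = 1.0 / k
    content = width * width              # |Delta_i| is the same for every subcell
    upper = 0.0
    lower = 0.0
    for i in range(k):
        for j in range(k):
            # inf of x*y over the subcell: value at the lower left corner
            lower += (i * width) * (j * width) * content
            # sup of x*y over the subcell: value at the upper right corner
            upper += ((i + 1) * width) * ((j + 1) * width) * content
    return upper, lower

# Each partition in this chain is a refinement of the previous one.
sums = [upper_lower_sums(k) for k in (1, 2, 4, 8, 16)]
for k, (upper, lower) in zip((1, 2, 4, 8, 16), sums):
    print(k, lower, upper)
```

Along this chain of refinements the upper sums decrease and the lower sums increase, as in (4.5), and every lower sum lies below every upper sum, as in (4.6).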

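It may be the case though that the upper integral is strictly greater than the lower integral. For example, if $f$ is the function on the real line $\mathbb{R}$ defined by

\[
f(x) = \begin{cases} 0 & \text{if } x \text{ is rational},\\ 1 & \text{if } x \text{ is irrational}, \end{cases} \tag{4.10}
\]

then $\sup_{x\in\Delta_i} f(x) = 1$ and $\inf_{x\in\Delta_i} f(x) = 0$ for any partition $P$ of a cell $\Delta$ into a union $\Delta = \bigcup_i \Delta_i$, and consequently $S^*(f,P) = |\Delta|$ and $S_*(f,P) = 0$; and since that is the case for any partition $P$ it follows that $\overline{\int}_\Delta f = |\Delta|$ and $\underline{\int}_\Delta f = 0$, so $\overline{\int}_\Delta f > \underline{\int}_\Delta f$ in this case. On the other hand, it is clear that if $f(x) = c$ at all points $x \in \Delta$ then $\overline{\int}_\Delta f = \underline{\int}_\Delta f = c|\Delta|$. More generally if $f$ is a continuous function in a cell $\Delta$ then $\overline{\int}_\Delta f = \underline{\int}_\Delta f$. Indeed if $f$ is continuous in $\Delta$ then it is uniformly continuous since $\Delta$ is compact, so for any $\epsilon > 0$ there is a partition $P$ of the cell as a union $\Delta = \bigcup_i \Delta_i$ where the cells $\Delta_i$ are sufficiently small that $\sup_{x\in\Delta_i} f(x) - \inf_{x\in\Delta_i} f(x) \le \epsilon$ for each cell $\Delta_i$. Then

\[
S^*(f,P) - S_*(f,P) = \sum_i \Bigl( \sup_{x\in\Delta_i} f(x)\,|\Delta_i| - \inf_{x\in\Delta_i} f(x)\,|\Delta_i| \Bigr) \le \sum_i \epsilon\,|\Delta_i| = \epsilon|\Delta|; \tag{4.11}
\]

since $\overline{\int}_\Delta f \le S^*(f,P)$ and $\underline{\int}_\Delta f \ge S_*(f,P)$ for any partition $P$, it follows that $\overline{\int}_\Delta f - \underline{\int}_\Delta f \le \epsilon|\Delta|$ for any $\epsilon > 0$, and consequently $\overline{\int}_\Delta f = \underline{\int}_\Delta f$. This proof can be tweaked to yield a more general result that is a complete characterization of those functions for which the upper and lower integrals are actually equal; but that requires a digression to discuss some further properties of functions and point sets.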
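As a small numerical illustration of the squeeze between lower and upper sums for a continuous function, the following sketch computes both sums exactly for the sample function $f(x) = x^2$ on $[0,1]$ with uniform partitions. The function name `upper_lower_sums` and the choice of integrand are assumptions made only for this illustration; the shortcut of evaluating at endpoints is valid only because the sample function is nondecreasing.

```python
from fractions import Fraction

def upper_lower_sums(f, a, b, n):
    """Upper and lower sums of a nondecreasing function f on [a, b]
    for the uniform partition into n subintervals.

    For a nondecreasing f the supremum on each subinterval is the value
    at the right endpoint and the infimum is the value at the left one.
    """
    h = Fraction(b - a, n)                      # exact subinterval length
    points = [a + i * h for i in range(n + 1)]  # points of subdivision
    upper = sum(f(points[i + 1]) * h for i in range(n))
    lower = sum(f(points[i]) * h for i in range(n))
    return upper, lower

def square(x):
    return x * x

for n in (10, 100, 1000):
    upper, lower = upper_lower_sums(square, 0, 1, n)
    # the gap equals (f(1) - f(0)) / n, so it shrinks as the partition is refined
    print(n, float(lower), float(upper), float(upper - lower))
```

For a monotone function the difference of the two sums telescopes to $(f(b)-f(a))(b-a)/n$, which is why the gap can be checked exactly here; for a general continuous function one only has the bound $\epsilon|\Delta|$ of (4.11).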

The oscillation of a bounded function $f$ in a subset $S \subset \mathbb{R}^n$ is defined by

\[
o_f(S) = \sup_{x\in S} f(x) - \inf_{x\in S} f(x); \tag{4.12}
\]

this is well defined, since the supremum and infimum of any bounded collection of real numbers $f(x)$ are well defined. In view of (4.11) it is clear that

\[
S^*(f,P) - S_*(f,P) = \sum_i o_f(\Delta_i)\,|\Delta_i| \tag{4.13}
\]

for any partition $P$, so the notion of oscillation is closely related to integrability. It is evident that if $S_1 \subset S_2$ then $o_f(S_1) \le o_f(S_2)$. In particular if a function $f$ is defined and bounded in an open neighborhood of a point $a$ then $o_f\bigl(N_{r_1}(a)\bigr) \le o_f\bigl(N_{r_2}(a)\bigr)$ whenever $r_1 \le r_2$ and $r_2$ is sufficiently small that these neighborhoods $N_r(a)$ are contained in the set where $f$ is defined and bounded, so the limit

\[
o_f(a) = \lim_{r\to 0} o_f\bigl(N_r(a)\bigr) \tag{4.14}
\]

is well defined; it is called the oscillation of the function $f(x)$ at the point $a$.

Lemma 4.1 A bounded function $f$ in an open neighborhood of a point $a \in \mathbb{R}^n$ is continuous at the point $a$ if and only if $o_f(a) = 0$.

Proof: If $o_f(a) = 0$ then for any $\epsilon > 0$ there is a $\delta > 0$ such that $o_f\bigl(N_\delta(a)\bigr) < \epsilon$, so in particular $|f(x) - f(a)| < \epsilon$ whenever $x \in N_\delta(a)$ and consequently $f$ is continuous at the point $a$. Conversely if $f$ is continuous at $a$ then for any $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x) - f(a)| < \frac{1}{2}\epsilon$ whenever $x \in N_\delta(a)$; therefore $f(a) - \frac{1}{2}\epsilon < f(x) < f(a) + \frac{1}{2}\epsilon$ for any $x \in N_\delta(a)$, hence $o_f\bigl(N_\delta(a)\bigr) \le \epsilon$ and consequently $o_f(a) = 0$. That suffices for the proof.

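Lemma 4.2 If $f(x)$ is a bounded function in a subset $S \subset \mathbb{R}^n$ then

\[
E_\epsilon = \bigl\{\, x \in S \;\big|\; o_f(x) < \epsilon \,\bigr\}
\]

is a relatively open subset of $S$ for any $\epsilon > 0$.

Proof: If $a \in E_\epsilon$ then $o_f(a) < \epsilon$, so by definition $o_f\bigl(N_\delta(a) \cap S\bigr) < \epsilon$ if $\delta$ is sufficiently small. If $b \in N_{\frac{1}{2}\delta}(a) \cap S$ then $N_{\frac{1}{2}\delta}(b) \cap S \subset N_\delta(a) \cap S$ so $o_f\bigl(N_{\frac{1}{2}\delta}(b) \cap S\bigr) \le o_f\bigl(N_\delta(a) \cap S\bigr) < \epsilon$, hence $o_f(b) < \epsilon$ and therefore $N_{\frac{1}{2}\delta}(a) \cap S \subset E_\epsilon$. That shows that $E_\epsilon$ is a relatively open subset of $S$ and thereby concludes the proof.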




A subset $S \subset \mathbb{R}^n$ has content zero if for any $\epsilon > 0$ there are finitely many open cells $\Delta_i$ such that $S \subset \bigcup_i \Delta_i$ and $\sum_i |\Delta_i| < \epsilon$; the set $S$ has measure zero if for any $\epsilon > 0$ there are countably many open cells $\Delta_i$ such that $S \subset \bigcup_i \Delta_i$ and $\sum_i |\Delta_i| < \epsilon$. Clearly any set of content zero is also a set of measure zero; and a compact set of measure zero is also of content zero, since if $S$ is compact and $S \subset \bigcup_i \Delta_i$ where $\sum_i |\Delta_i| < \epsilon$ then $S$ is already contained in the union of finitely many of these cells $\Delta_i$.

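Lemma 4.3 If countably many subsets $E_i \subset \mathbb{R}^n$ each have measure zero then their union $E = \bigcup_{i=1}^\infty E_i$ also has measure zero.

Proof: Since $E_i$ has measure zero then for any $\epsilon > 0$ there are countably many open cells $\Delta_{ij}$ for $j = 1, 2, \ldots$ such that $E_i \subset \bigcup_{j=1}^\infty \Delta_{ij}$ and $\sum_{j=1}^\infty |\Delta_{ij}| < \frac{\epsilon}{2^i}$; then $E = \bigcup_{i=1}^\infty E_i \subset \bigcup_{i,j=1}^\infty \Delta_{ij}$, which is also a countable union of open cells, and $\sum_{i,j=1}^\infty |\Delta_{ij}| = \sum_{i=1}^\infty \sum_{j=1}^\infty |\Delta_{ij}| \le \sum_{i=1}^\infty \frac{\epsilon}{2^i} = \epsilon$, which suffices for the proof.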
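The $\epsilon/2^i$ device in the preceding proof can be made concrete for a countable set of single points. The sketch below is only an illustration under stated assumptions: it enumerates the rationals in $[0,1]$ in one particular order (the helper names `rationals_in_unit_interval` and `cover` are hypothetical), places an open interval of length $\epsilon/2^k$ around the $k$-th rational, and checks with exact arithmetic that the total length of any finite number of these intervals stays below $\epsilon$.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def rationals_in_unit_interval():
    """Enumerate the rationals p/q in [0, 1] in lowest terms, each exactly once."""
    q = 1
    while True:
        for p in range(q + 1):
            if gcd(p, q) == 1:          # skip fractions that are not in lowest terms
                yield Fraction(p, q)
        q += 1

def cover(eps, count):
    """Open intervals around the first `count` rationals; the k-th rational
    (k = 1, 2, ...) gets an interval of length eps / 2**k centered on it."""
    intervals = []
    for k, r in enumerate(islice(rationals_in_unit_interval(), count), start=1):
        half = eps / 2 ** (k + 1)
        intervals.append((r - half, r + half))
    return intervals

eps = Fraction(1, 100)
intervals = cover(eps, 200)
total_length = sum(right - left for left, right in intervals)
print(total_length < eps)   # the partial sums of eps/2**k never reach eps
```

The same bookkeeping works for any countable union of sets of measure zero, with the single points replaced by the sets $E_i$ and the single intervals by the coverings $\Delta_{ij}$.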

For example, the set $\mathbb{Q}$ of rational numbers does not have content zero, since no finite number of finite open intervals can cover all the rational numbers; but it does have measure zero as a consequence of the preceding lemma, since each individual point of course has measure zero. There are noncountable sets of measure zero as well, for any proper submanifold of an open subset $U \subset \mathbb{R}^n$ is a set of measure zero and is a noncountable set if its dimension is greater than 0. Indeed if $V \subset U$ is a submanifold of dimension $k$ where $1 \le k < n$ then by Corollary 3.8 for any point $a \in V$, after relabeling the coordinates if necessary, the intersection $V \cap \Delta_a$ of the submanifold $V$ with a sufficiently small cell $\Delta_a = \Delta'_a \times \Delta''_a$ centered at the point $a$, where $\Delta'_a$ is in the subspace $\mathbb{R}^{n-k}$ consisting of the first $n-k$ coordinates $x_1, \ldots, x_{n-k}$ and $\Delta''_a$ is in the subspace $\mathbb{R}^k$ consisting of the last $k$ coordinates $x_{n-k+1}, \ldots, x_n$, is described by the equations

\[
V \cap \Delta_a = \bigl\{\, x \in \Delta_a \;\big|\; x_i = g_i(x_{n-k+1}, \ldots, x_n) \text{ for } 1 \le i \le n-k \,\bigr\}
\]

for some continuously differentiable functions $g_i(x'')$ of the variables $x'' \in \Delta''_a$, as in the accompanying Figure 4.2. Each point $x \in V \cap \Delta_a$ can be enclosed in a small cell $\Delta_i = \Delta'_i \times \Delta''_i$ where $|\Delta'_i| < \epsilon$, so that $\sum_i |\Delta_i| < \epsilon\,|\Delta''_a|$, from which it is evident that the set $V \cap \Delta_a$ is of measure 0. The entire submanifold $V$ can be covered by countably many such cells $\Delta_a$, so by Lemma 4.3 the set $V$ is also of measure 0.

Figure 4.2: submanifold of a cell, a set of measure 0 (axes $x_1, \ldots, x_{n-k}$ and $x_{n-k+1}, \ldots, x_n$)

A function $f$ is continuous almost everywhere in a set $S \subset \mathbb{R}^n$ if the subset $E \subset S$ of points where $f$ fails to be continuous is a set of measure zero. For example, the function $f$ on $\mathbb{R}$ defined by

\[
f(x) = \begin{cases} \dfrac{1}{q} & \text{if } x = \dfrac{p}{q} \text{ where } p, q \text{ are coprime integers with } q > 0,\\[1ex] 0 & \text{if } x \text{ is irrational}, \end{cases} \tag{4.15}
\]

is continuous at the irrational numbers but is discontinuous at any rational number, so it is continuous almost everywhere. It is an amusing point, but a bit harder to prove, that there are no functions that are continuous at the rational numbers but discontinuous at any irrational number.[1]

Theorem 4.4 (Theorem of Riemann and Lebesgue) If $f$ is a bounded function in a cell $\Delta \subset \mathbb{R}^n$ then $\overline{\int}_\Delta f = \underline{\int}_\Delta f$ if and only if $f$ is continuous almost everywhere in $\Delta$.

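Proof: As preliminary observations, for any bounded function $f$ in a cell $\Delta$ and for any $\epsilon > 0$ let

\[
E_\epsilon = \bigl\{\, x \in \Delta \;\big|\; o_f(x) < \epsilon \,\bigr\} \quad\text{and}\quad F_\epsilon = \bigl\{\, x \in \Delta \;\big|\; o_f(x) \ge \epsilon \,\bigr\},
\]

so $F_\epsilon = \Delta \sim E_\epsilon$. By Lemma 4.2 the set $E_\epsilon$ is relatively open in $\Delta$, so its complement $F_\epsilon$ is a relatively closed subset of $\Delta$; and since $\Delta$ is compact it follows that the closed subset $F_\epsilon \subset \Delta$ is compact. By Lemma 4.1 the function $f$ is continuous at a point $a \in \Delta$ if and only if $o_f(a) = 0$; so the set of points in $\Delta$ at which $f$ is not continuous is the set $F = \{\, x \in \Delta \mid o_f(x) > 0 \,\}$, and $F_\epsilon \subset F$ for any $\epsilon > 0$ while $F = \bigcup_{n=1}^\infty F_{1/n}$.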

To turn to the proof itself, first suppose that the function $f$ is bounded and continuous almost everywhere in $\Delta$. The set $F$ then is of measure zero, as are the subsets $F_\epsilon$; and since the sets $F_\epsilon$ are compact, they even are of content zero. Therefore there are finitely many open cells $\Delta'_j \subset \mathbb{R}^n$ such that $F_\epsilon \subset \bigcup_j \Delta'_j$ and $\sum_j |\Delta'_j| < \epsilon$. The complement $K = \Delta \sim \bigcup_j \Delta'_j$ is a closed and consequently compact subset of $\Delta$, and if $a \in K$ then $a \notin F_\epsilon$ so $o_f(a) < \epsilon$; hence for any point $a \in K$ there is an open cell $\Delta''_a$ containing that point such that $o_f(\Delta''_a) < \epsilon$. Since $K$ is compact finitely many of these open cells $\Delta''_{a_k}$ serve to cover the compact set $K$. Let $P$ be the partition of $\Delta$ determined by the points of subdivision of the finitely many cells $\Delta'_j$ and $\Delta''_{a_k}$. Let $I'$ be the set of those indices $i$ such that the cell $\Delta_i$ is contained within some of the cells $\Delta'_j$, and let $I''$ be the remaining set of indices. Thus $F_\epsilon \subset \bigcup_{i\in I'} \Delta_i$ and $\sum_{i\in I'} |\Delta_i| < \epsilon$, while if $i \in I''$ then $\Delta_i \subset \Delta''_{a_k}$ for some of the cells $\Delta''_{a_k}$ so $o_f(\Delta_i) < \epsilon$. If $|f(x)| \le M$ for all points $x \in \Delta$ then $o_f(D) \le 2M$ for any subset $D \subset \Delta$ and consequently

\[
\begin{aligned}
S^*(f,P) - S_*(f,P) &= \sum_i o_f(\Delta_i)\,|\Delta_i| = \sum_{i\in I'} o_f(\Delta_i)\,|\Delta_i| + \sum_{i\in I''} o_f(\Delta_i)\,|\Delta_i| \\
&\le 2M \sum_{i\in I'} |\Delta_i| + \epsilon \sum_{i\in I''} |\Delta_i| \le 2M\epsilon + \epsilon|\Delta|.
\end{aligned}
\]

Since that is the case for any $\epsilon > 0$ it follows that $\overline{\int}_\Delta f = \underline{\int}_\Delta f$.

[1] This result follows from an application of the Baire Category Theorem, which is discussed in most of the standard textbooks on real analysis.

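Next suppose that $f$ is a bounded function in $\Delta$ such that $\overline{\int}_\Delta f = \underline{\int}_\Delta f$. For any $\eta > 0$ there is a partition $P$ of the cell $\Delta$ such that

\[
\sum_i o_f(\Delta_i)\,|\Delta_i| = S^*(f,P) - S_*(f,P) < \eta.
\]

Let $I'$ be the set of those indices $i$ such that $\Delta_i \cap F_\epsilon \ne \emptyset$ and let $I''$ be the remaining indices; thus $F_\epsilon \subset \bigcup_{i\in I'} \Delta_i$, and if $i \in I'$ there is a point $a_i \in \Delta_i$ such that $o_f(a_i) \ge \epsilon$, and consequently $o_f(\Delta_i) \ge \epsilon$. Then

\[
\eta > \sum_{i\in I'} o_f(\Delta_i)\,|\Delta_i| + \sum_{i\in I''} o_f(\Delta_i)\,|\Delta_i| \ge \sum_{i\in I'} o_f(\Delta_i)\,|\Delta_i| \ge \epsilon \sum_{i\in I'} |\Delta_i|
\]

so that $\sum_{i\in I'} |\Delta_i| < \eta/\epsilon$; and since $F_\epsilon \subset \bigcup_{i\in I'} \Delta_i$ for all $\eta > 0$ it follows that $F_\epsilon$ is a set of measure zero. Finally since $F \subset \bigcup_{n=1}^\infty F_{1/n}$ and all the sets $F_{1/n}$ are of measure zero the set $F$ also is of measure zero, which suffices for the proof.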

A function $f$ is Riemann integrable over a cell $\Delta \subset \mathbb{R}^n$ if it is bounded in $\Delta$ and is continuous almost everywhere in $\Delta$. It follows from the preceding theorem that $\overline{\int}_\Delta f = \underline{\int}_\Delta f$ for any function $f$ that is Riemann integrable over the cell $\Delta$. The common value of these upper and lower integrals of the function $f$ is called the Riemann integral of the function $f$ over the cell $\Delta$ and is denoted by $\int_\Delta f$, so that

\[
\int_\Delta f = \overline{\int}_\Delta f = \underline{\int}_\Delta f; \tag{4.16}
\]

for convenience this is often called just the integral of the function $f$, although some caution is necessary since there are other common notions of integrals, such as the Lebesgue or Denjoy integrals.

Although only the integral of the function $f$ over a cell has been defined so far, it is possible to extend the notion of the integral of a Riemann integrable function to more general sets than just cells. A bounded subset $D \subset \mathbb{R}^n$ is called a Jordan domain if its boundary $\partial D$ is a set of measure zero; since $\partial D$ is a closed bounded set, and hence a compact set, then actually $\partial D$ is a set of content zero for any Jordan domain $D$. It is easier to show that a domain is a Jordan domain by using the characterization that the boundary is a set of measure zero; but for many results about Jordan domains it is more convenient to use the equivalent characterization that the boundary is a set of content zero. The equivalence of these two characterizations should be kept in mind in the subsequent discussion. The definition of a Jordan domain does not impose any further restrictions on the set $D$; so a Jordan domain may be open or closed or even a more general point set.[2] An open or closed cell $\Delta$ is a Jordan domain, since as noted earlier $\partial\Delta$ is a set of measure zero for any cell $\Delta$. The intersection and union of two Jordan domains are again Jordan domains. To verify that first for the union, if $D_1$ and $D_2$ are Jordan domains then $\overline{D_1 \cup D_2} \subset \overline{D_1} \cup \overline{D_2}$ and $\overline{\sim(D_1 \cup D_2)} = \overline{\sim D_1 \cap \sim D_2} \subset \overline{\sim D_1} \cap \overline{\sim D_2}$ and consequently

\[
\partial(D_1 \cup D_2) = \overline{D_1 \cup D_2} \cap \overline{\sim(D_1 \cup D_2)}
\subset \Bigl( \overline{D_1} \cap \bigl( \overline{\sim D_1} \cap \overline{\sim D_2} \bigr) \Bigr) \cup \Bigl( \overline{D_2} \cap \bigl( \overline{\sim D_1} \cap \overline{\sim D_2} \bigr) \Bigr),
\]

which is a set of measure zero since $\overline{D_1} \cap \overline{\sim D_1} = \partial D_1$ and $\overline{D_2} \cap \overline{\sim D_2} = \partial D_2$ are sets of measure zero. If $D \subset \Delta$ is a Jordan domain then its complement $\Delta \sim D$ is also a Jordan domain since $\partial(\Delta \sim D) \subset \partial\Delta \cup \partial D$; consequently the intersection of Jordan domains also is a Jordan domain.

A Jordan domain $D \subset \mathbb{R}^n$ is a bounded subset of $\mathbb{R}^n$ by definition, so $D \subset \Delta$ for some cell $\Delta \subset \mathbb{R}^n$. A Riemann integrable function $f$ in the Jordan domain $D$ can be extended to a function $\tilde f$ in the cell $\Delta$ by setting

\[
\tilde f(x) = \begin{cases} f(x) & \text{if } x \in D,\\ 0 & \text{if } x \in \Delta \sim D. \end{cases} \tag{4.17}
\]

The extended function $\tilde f$ of course is bounded in $\Delta$. It is continuous almost everywhere in the interior of $D$, since the function $f$ is continuous almost everywhere in $D$ by assumption, and it is also continuous in the complement $\Delta \sim \overline{D}$ of the closure of $D$, where it vanishes identically by definition; and since $\partial D$ is a set of measure 0, by the definition of a Jordan domain, it follows that the function $\tilde f$ is also continuous almost everywhere in $\Delta$. Consequently the function $\tilde f$ is Riemann integrable in the cell $\Delta$, so it is possible to define the integral of the initial function $f$ over the Jordan domain $D$ by

\[
\int_D f = \int_\Delta \tilde f. \tag{4.18}
\]

It is evident from the definition of the Riemann integral that the value of the integral $\int_D f$ is independent of the choice of the cell $\Delta$; for if $\Delta \subset \Delta'$ then in calculating the integral $\int_{\Delta'} \tilde f$ it is always possible to take partitions such that each cell $\Delta'_i$ in the partition is either contained in $\Delta$ or is disjoint from $\Delta$ or is one of a finite number of cells covering $\partial\Delta$ and of total content less than any given $\epsilon$. As a special case, if $f$ is a Riemann integrable function in a closed cell $\Delta$ its restriction $f|D$ to a Jordan domain $D \subset \Delta$ is a Riemann integrable function in $D$ and the extension $\widetilde{f|D}$ can be identified with the product $\chi_D f$, where $\chi_D$ is the characteristic function of the set $D$, the function defined by

\[
\chi_D(x) = \begin{cases} 1 & \text{if } x \in D,\\ 0 & \text{if } x \notin D. \end{cases} \tag{4.19}
\]

[2] It is more common to call such sets Jordan measurable sets; but since the interest here is not in Jordan measure as such the modified terminology seems more appropriate. The term domain is sometimes used just to denote an open set, but it is convenient to use that term somewhat more generally.

Thus if $f$ is a Riemann integrable function in a closed cell $\Delta$ then its integral over any Jordan domain $D \subset \Delta$ can be defined equivalently by

\[
\int_D f = \int_\Delta \chi_D f. \tag{4.20}
\]

In particular the content of a Jordan domain $D \subset \Delta$ can be defined as the integral of the constant function 1 over the set $D$, so by the integral

\[
|D| = \int_\Delta \chi_D. \tag{4.21}
\]

When $D = \Delta$ this definition coincides with the previous definition of the content of a cell, since for any partition $P$ of a cell $\Delta$ it follows immediately from the definition of the upper sum that $S^*(\chi_\Delta, P) = \sum_i |\Delta_i| = |\Delta|$ and consequently that $\int_\Delta \chi_\Delta = |\Delta|$.
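To see definition (4.21) at work on a set that is not a cell, the following sketch computes the upper and lower sums of the characteristic function of the closed unit disc in $\mathbb{R}^2$ over uniform partitions of the cell $[-1,1]^2$. The disc, the function name `disc_content_bounds`, and the uniform partitions are choices made only for this illustration. The upper sum adds the areas of the cells that meet the disc, the lower sum the areas of the cells contained in it, and the content $\pi$ of the disc lies between the two.

```python
import math

def interval_near_far(lo, hi):
    """Smallest and largest absolute value attained on the interval [lo, hi]."""
    near = 0.0 if lo <= 0.0 <= hi else min(abs(lo), abs(hi))
    far = max(abs(lo), abs(hi))
    return near, far

def disc_content_bounds(n):
    """Upper and lower sums of the characteristic function of the closed unit
    disc over the uniform partition of [-1, 1]^2 into n * n square cells."""
    h = 2.0 / n
    meets = inside = 0
    for i in range(n):
        near_x, far_x = interval_near_far(-1.0 + i * h, -1.0 + (i + 1) * h)
        for j in range(n):
            near_y, far_y = interval_near_far(-1.0 + j * h, -1.0 + (j + 1) * h)
            if near_x ** 2 + near_y ** 2 <= 1.0:   # cell meets the disc: sup is 1
                meets += 1
            if far_x ** 2 + far_y ** 2 <= 1.0:     # cell lies in the disc: inf is 1
                inside += 1
    return meets * h * h, inside * h * h

for n in (10, 50, 200):
    upper, lower = disc_content_bounds(n)
    print(n, lower, upper, upper - lower)
print(math.pi)
```

The gap between the two sums is the total area of the cells meeting the boundary circle, which shrinks with the mesh because the circle is a set of content zero.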

Theorem 4.5 If $f_1$ and $f_2$ are Riemann integrable functions in a Jordan domain $D$ then so is any linear combination $f = c_1 f_1 + c_2 f_2$ for real constants $c_1, c_2$ and

\[
\int_D f = c_1 \int_D f_1 + c_2 \int_D f_2. \tag{4.22}
\]

Proof: It is clear that any linear combination of functions that are bounded and continuous almost everywhere in a Jordan domain $D$ is also bounded and continuous almost everywhere in $D$, so any linear combination of Riemann integrable functions over $D$ is Riemann integrable over $D$. The integrals over $D$ of Riemann integrable functions are defined as the integrals over any cell $\Delta$ containing $D$ of the extended functions (4.17); so it is sufficient just to prove the theorem for linear combinations of Riemann integrable functions over a cell $\Delta$. It is also clear from the definition of the Riemann integral over a cell that the integral of the product $cf$ of a constant $c$ and a Riemann integrable function $f$ is the product of $c$ and the integral of $f$; so finally it is sufficient just to show that the integral over a cell $\Delta$ of the sum of two Riemann integrable functions is the sum of their integrals. If $\Delta = \bigcup_i \Delta_i$ is a partition $P$ of the cell $\Delta$ then for any bounded functions $f_1$ and $f_2$

\[
\sup_{x\in\Delta_i} \bigl( f_1(x) + f_2(x) \bigr) \le \sup_{x\in\Delta_i} f_1(x) + \sup_{x\in\Delta_i} f_2(x);
\]

therefore $S^*(f_1 + f_2, P) \le S^*(f_1, P) + S^*(f_2, P)$, and since that is the case for any partition it follows that $\overline{\int}_\Delta (f_1 + f_2) \le \overline{\int}_\Delta f_1 + \overline{\int}_\Delta f_2$. The corresponding argument for the lower sums shows that $\underline{\int}_\Delta (f_1 + f_2) \ge \underline{\int}_\Delta f_1 + \underline{\int}_\Delta f_2$. Altogether then

\[
\underline{\int}_\Delta f_1 + \underline{\int}_\Delta f_2 \le \underline{\int}_\Delta (f_1 + f_2) \le \overline{\int}_\Delta (f_1 + f_2) \le \overline{\int}_\Delta f_1 + \overline{\int}_\Delta f_2;
\]

and since $f_1$ and $f_2$ are Riemann integrable $\overline{\int}_\Delta f_1 = \underline{\int}_\Delta f_1 = \int_\Delta f_1$ and $\overline{\int}_\Delta f_2 = \underline{\int}_\Delta f_2 = \int_\Delta f_2$, so it follows from the preceding inequalities that

\[
\overline{\int}_\Delta (f_1 + f_2) = \underline{\int}_\Delta (f_1 + f_2) = \int_\Delta f_1 + \int_\Delta f_2,
\]

which suffices for the proof.


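Integrability is really required for the preceding result to hold. For the function (4.10) for example $\overline{\int}_\Delta f = |\Delta|$ and $\underline{\int}_\Delta f = 0$, and it is clear that the same argument applied to the function $g(x) = 1 - f(x)$ shows that $\overline{\int}_\Delta g = |\Delta|$ and $\underline{\int}_\Delta g = 0$; consequently $\overline{\int}_\Delta f + \overline{\int}_\Delta g = 2|\Delta|$ while $\underline{\int}_\Delta f + \underline{\int}_\Delta g = 0$. On the other hand $f + g = 1$ so $\overline{\int}_\Delta (f+g) = \underline{\int}_\Delta (f+g) = \int_\Delta (f+g) = |\Delta|$. Thus it is neither true that $\overline{\int}_\Delta (f+g) = \overline{\int}_\Delta f + \overline{\int}_\Delta g$ nor that $\underline{\int}_\Delta (f+g) = \underline{\int}_\Delta f + \underline{\int}_\Delta g$.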

Theorem 4.6 Let $f$ and $g$ be Riemann integrable functions over a Jordan domain $D$.

(i) If $f(x) \ge 0$ for all $x \in D$ then $\int_D f \ge 0$.

(ii) If $f(x) \ge g(x)$ for all $x \in D$ then $\int_D f \ge \int_D g$.

(iii) $\bigl| \int_D f \bigr| \le \int_D |f|$.

Proof: Again it is sufficient just to prove these assertions for Riemann integrable functions over a cell $\Delta$. If $f(x) \ge 0$ for all $x \in \Delta$ then $S^*(f,P) \ge 0$ for any partition $P$ of the cell $\Delta$ so (i) holds. Then (ii) follows by applying (i) to the difference $f - g$ and using the preceding Theorem 4.5, while (iii) follows by applying (ii) to the functions $|f|$ and $\pm f$ since $|f(x)| \ge f(x)$ and $|f(x)| \ge -f(x)$ for all $x \in \Delta$; and that suffices for the proof.

A mapping $f : D \longrightarrow \mathbb{R}^n$ from a Jordan domain $D \subset \mathbb{R}^m$ into $\mathbb{R}^n$ is said to be Riemann integrable if each of its coordinate functions $f_i$ is Riemann integrable over $D$, and the integral $\int_D f$ is defined to be the vector having the coordinates $\int_D f_i$. The two preceding theorems extend quite directly from functions to mappings, with the appropriate modification for Theorem 4.6.




Theorem 4.7 (i) Any linear combination $f = c_1 f_1 + c_2 f_2$ of Riemann integrable mappings $f_1$ and $f_2$ from a Jordan domain $D \subset \mathbb{R}^m$ into $\mathbb{R}^n$ is Riemann integrable and

\[
\int_D (c_1 f_1 + c_2 f_2) = c_1 \int_D f_1 + c_2 \int_D f_2. \tag{4.23}
\]

(ii) If $f$ is a Riemann integrable mapping from a Jordan domain $D \subset \mathbb{R}^m$ into $\mathbb{R}^n$ then

\[
\Bigl\| \int_D f \Bigr\|_2 \le \int_D \|f\|_2. \tag{4.24}
\]

Proof: (i) This is an immediate consequence of Theorem 4.5, since it just involves the individual coordinate functions of the mapping $f$.

(ii) If $u \in \mathbb{R}^n$ is a unit vector in the direction of the vector $\int_D f \in \mathbb{R}^n$ then it follows from the Cauchy-Schwarz inequality and Theorems 4.5 and 4.6 for the function $u \cdot f$ that

\[
\Bigl\| \int_D f \Bigr\|_2 = \Bigl| u \cdot \int_D f \Bigr| = \Bigl| \int_D u \cdot f \Bigr| \le \int_D |u \cdot f| \le \int_D \|f\|_2,
\]

which suffices for the proof.

Theorem 4.8 If $f$ is a bounded function in a Jordan domain $D \subset \mathbb{R}^n$ such that $f(x) = 0$ for all points $x \in (D \sim E)$, where $E \subset D$ is a set of content 0, then $f$ is Riemann integrable over $D$ and $\int_D f = 0$.

Proof: The Jordan domain $D$ is bounded, so $D \subset \Delta$ for some cell $\Delta$. The extended function $\tilde f$ in $\Delta$ of course is zero for points $x \in (\Delta \sim E)$. Since $E$ has content 0 then for any $\epsilon > 0$ it is possible to find a partition $P$ of the cell $\Delta$ as a union $\Delta = \bigcup_{i\in I} \Delta_i$ so that $E \subset \bigcup_{i\in I'} \Delta_i$ for a subset $I' \subset I$ of the indices, where $\sum_{i\in I'} |\Delta_i| < \epsilon$, while $\Delta_i \cap E = \emptyset$ if $i \in I''$ for the complementary set of indices $I''$. Then $|\tilde f(x)| \le M$ for all points $x \in \Delta_i$ if $i \in I'$, where $M$ is a bound of the function $f$, while $\tilde f(x) = 0$ for all points $x \in \Delta_i$ if $i \in I''$; consequently

\[
S^*(\tilde f, P) = \sum_{i\in I'} \sup_{x\in\Delta_i} \tilde f(x)\,|\Delta_i| + \sum_{i\in I''} \sup_{x\in\Delta_i} \tilde f(x)\,|\Delta_i| \le \sum_{i\in I'} M|\Delta_i| + 0 \le M\epsilon,
\]

and similarly $S_*(\tilde f, P) \ge -M\epsilon$. Thus for any $\epsilon > 0$ there is a partition $P$ of the cell $\Delta$ such that

\[
S^*(\tilde f, P) - S_*(\tilde f, P) \le 2M\epsilon,
\]

and therefore $\tilde f$ is integrable over $\Delta$ and $\int_D f = \int_\Delta \tilde f = 0$, which suffices for the proof.




The preceding result fails if the exceptional set is only of measure 0 rather than of content 0; for a function $f$ such that $f(x) = 1$ at a countable dense set of points of a Jordan domain $D$ and $f(x) = -1$ at another countable dense set of points of $D$ while $f(x) = 0$ otherwise does have the property that it vanishes outside of a set of measure 0, but it is not Riemann integrable over $D$.

Corollary 4.9 If $f$ is a Riemann integrable function over a Jordan domain $D$, and if $g$ is a bounded function in $D$ such that $g(x) = f(x)$ at all points $x \in (D \sim E)$ for a subset $E \subset D$ of content zero, then $g$ is also Riemann integrable over $D$ and $\int_D g = \int_D f$.

Proof: The function $f - g$ is bounded in $D$ and vanishes on the set $D \sim E$ where $E \subset D$ is a subset of content zero, so by the preceding theorem this function is integrable in $D$ and $\int_D (f - g) = 0$. The difference $g = f - (f - g)$ of these two integrable functions is then also integrable, and it follows from Theorem 4.7 (i) that $\int_D g = \int_D f - \int_D (f - g) = \int_D f$, which suffices for the proof.

A consequence of the preceding corollary is that a Riemann integrable function $f$ in a Jordan domain $D$ can be modified on any subset of content 0 in $D$ in any arbitrary way, so long as it remains bounded, and the modified function remains Riemann integrable with the same integral as the function $f$. The integral of a function $f$ thus is not so much a property of the individual function $f$, but rather is a property of an equivalence class of functions, where two Riemann integrable functions in $D$ are equivalent if and only if they are equal outside a set of content 0 in $D$. In particular when considering integrals or integrability of functions over cells or Jordan domains, the values of a function at the boundary of the set are immaterial, provided of course that the function remains bounded; so as far as integration is concerned, it is immaterial whether closed or open Jordan domains are considered, and of course the same is true for cells.

Theorem 4.10 If two Jordan domains $D_1, D_2$ are disjoint except possibly for a subset

\[
E = D_1 \cap D_2 \subset \partial D_1 \cup \partial D_2 \tag{4.25}
\]

and if $f$ is a Riemann integrable function in the union $D_1 \cup D_2$ then

\[
\int_{D_1 \cup D_2} f = \int_{D_1} f + \int_{D_2} f; \tag{4.26}
\]

in particular

\[
|D_1 \cup D_2| = |D_1| + |D_2|. \tag{4.27}
\]

Proof: The union $D_1 \cup D_2$ of two Jordan domains $D_1, D_2$ is also a Jordan domain, as noted above, so it is contained in a cell $\Delta$; and then $\int_{D_1 \cup D_2} f = \int_\Delta \chi_{D_1 \cup D_2} f$. The characteristic function of the union of the two Jordan domains is $\chi_{D_1 \cup D_2}(x) = \chi_{D_1}(x) + \chi_{D_2}(x) - \chi_E(x)$, where the intersection $E = D_1 \cap D_2$ is also a Jordan domain; consequently

\[
\begin{aligned}
\int_{D_1 \cup D_2} f &= \int_\Delta (\chi_{D_1} + \chi_{D_2} - \chi_E) f \\
&= \int_\Delta \chi_{D_1} f + \int_\Delta \chi_{D_2} f - \int_\Delta \chi_E f \\
&= \int_{D_1} f + \int_{D_2} f
\end{aligned}
\]

for $E \subset (\partial D_1 \cup \partial D_2)$ and the sets $\partial D_1$ and $\partial D_2$ are sets of measure 0 by the definition of a Jordan domain so that $|E| = 0$ and therefore $\int_\Delta \chi_E f = 0$ by Corollary 4.9. That demonstrates (4.26), which holds in particular for $f = 1$ and therefore implies (4.27), thereby concluding the proof.

A subset $D \subset \Delta$ has content zero if and only if $\overline{\int}_\Delta \chi_D = 0$, since for any partition $P$ of the cell $\Delta$ as the union $\Delta = \bigcup_{i\in I} \Delta_i$ it is clear that

\[
S^*(\chi_D, P) = \sum_{i\in I,\; D \cap \Delta_i \ne \emptyset} |\Delta_i|.
\]

That suggests that the notion of the content of a Jordan domain might be extended to the notion of the content of an arbitrary subset $D \subset \Delta$ by setting $|D| = \overline{\int}_\Delta \chi_D$. However that does lead to some problems; for as already observed, the linearity of integration requires that functions be Riemann integrable, so it is not necessarily the case that $|D_1 \cup D_2| = |D_1| + |D_2|$ for general disjoint subsets $D_1, D_2 \subset \Delta$. The Lebesgue integral does provide an extension of the notion of the content to more general sets than Jordan domains; but there are limits even there to the additivity of the sizes of arbitrary sets.




4.2 Fubini’s Theorem

The explicit calculation of integrals in several variables can be reduced to a succession of calculations of integrals in a single variable. A cell $\Delta \subset \mathbb{R}^n = \mathbb{R}^{n'} \times \mathbb{R}^{n''}$ where $n = n' + n''$ can be written as a product $\Delta = \Delta' \times \Delta''$ of cells $\Delta' \subset \mathbb{R}^{n'}$ and $\Delta'' \subset \mathbb{R}^{n''}$, so a function $f$ in the cell $\Delta \subset \mathbb{R}^n$ can be written as a function $f(x', x'')$ of variables $x' \in \Delta' \subset \mathbb{R}^{n'}$ and $x'' \in \Delta'' \subset \mathbb{R}^{n''}$. For any fixed point $x'' \in \Delta''$ the function $f(x', x'')$ can be considered as a function of the variable $x' \in \mathbb{R}^{n'}$ alone; so if $f$ is bounded it is possible to consider the upper integral of this function of the variable $x'$, denoted by $\overline{\int}_{x'\in\Delta'} f(x', x'')$; this upper integral then is a well defined bounded function of the variable $x'' \in \Delta''$, so it is possible to consider the upper integral of this function of the variable $x''$, denoted by

\[
\overline{\int}_{x''\in\Delta''} \Bigl( \overline{\int}_{x'\in\Delta'} f(x', x'') \Bigr)
\]

and called an iterated integral. Of course it is possible to use the lower rather than upper integral for one or another or both of these integrals, leading to iterated integrals such as $\underline{\int}_{x''\in\Delta''} \bigl( \overline{\int}_{x'\in\Delta'} f(x', x'') \bigr)$; and it is possible to proceed in the other order, integrating first with respect to the variable $x''$ and then the variable $x'$, leading to iterated integrals such as $\overline{\int}_{x'\in\Delta'} \bigl( \underline{\int}_{x''\in\Delta''} f(x', x'') \bigr)$.

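Theorem 4.11 (Fubini's Theorem) If $f$ is a Riemann integrable function in a product cell $\Delta = \Delta' \times \Delta''$ the integrals $\overline{\int}_{x'\in\Delta'} f(x', x'')$ and $\underline{\int}_{x'\in\Delta'} f(x', x'')$ are Riemann integrable functions of the variable $x'' \in \Delta''$ and

\[
\int_\Delta f = \int_{x''\in\Delta''} \Bigl( \overline{\int}_{x'\in\Delta'} f(x', x'') \Bigr) = \int_{x''\in\Delta''} \Bigl( \underline{\int}_{x'\in\Delta'} f(x', x'') \Bigr). \tag{4.28}
\]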

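Proof: If $P'$ is a partition $\Delta' = \bigcup_i \Delta'_i$ of the cell $\Delta'$ and $P''$ is a partition $\Delta'' = \bigcup_j \Delta''_j$ of the cell $\Delta''$ these two separate partitions determine a partition $P = P' \times P''$ of the cell $\Delta$ and

\[
S^*(f,P) = \sum_{i,j} \sup_{\substack{x'\in\Delta'_i \\ x''\in\Delta''_j}} f(x', x'')\,|\Delta'_i \times \Delta''_j|
= \sum_j \Bigl( \sum_i \sup_{\substack{x'\in\Delta'_i \\ x''\in\Delta''_j}} f(x', x'')\,|\Delta'_i| \Bigr) |\Delta''_j|.
\]

For any fixed point $x''_0 \in \Delta''_j$

\[
\sum_i \sup_{\substack{x'\in\Delta'_i \\ x''\in\Delta''_j}} f(x', x'')\,|\Delta'_i| \ge \sum_i \sup_{x'\in\Delta'_i} f(x', x''_0)\,|\Delta'_i| = S^*\bigl( f(x', x''_0), P' \bigr) \ge F(x''_0)
\]

where $F(x''_0) = \overline{\int}_{x'\in\Delta'} f(x', x''_0)$. That is the case for any fixed point $x''_0 \in \Delta''_j$ so the same result holds for the supremum over all such points; hence

\[
S^*(f,P) \ge \sum_j \sup_{x''_0\in\Delta''_j} F(x''_0)\,|\Delta''_j| = S^*(F, P'') \ge \overline{\int}_{x''\in\Delta''} F(x'')
\]

so that

\[
\overline{\int}_\Delta f \ge \overline{\int}_{x''\in\Delta''} \Bigl( \overline{\int}_{x'\in\Delta'} f(x', x'') \Bigr).
\]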

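The corresponding argument for the lower integral leads to the reversed inequality, so altogether there is the chain of inequalities

\[
\overline{\int}_\Delta f \ge \overline{\int}_{x''\in\Delta''} \Bigl( \overline{\int}_{x'\in\Delta'} f(x', x'') \Bigr) \ge \underline{\int}_{x''\in\Delta''} \Bigl( \overline{\int}_{x'\in\Delta'} f(x', x'') \Bigr) \ge \underline{\int}_{x''\in\Delta''} \Bigl( \underline{\int}_{x'\in\Delta'} f(x', x'') \Bigr) \ge \underline{\int}_\Delta f,
\]

and the same chain holds with $\overline{\int}_{x''\in\Delta''} \bigl( \underline{\int}_{x'\in\Delta'} f(x', x'') \bigr)$ as the middle term. Since the function $f$ is Riemann integrable $\overline{\int}_\Delta f = \underline{\int}_\Delta f$ and consequently all of the preceding inequalities must be equalities, which concludes the proof.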

Corollary 4.12 If $f$ is a continuous function in a product cell $\Delta = \Delta' \times \Delta''$ then

\[
\int_\Delta f(x) = \int_{x''\in\Delta''} \Bigl( \int_{x'\in\Delta'} f(x', x'') \Bigr) = \int_{x'\in\Delta'} \Bigl( \int_{x''\in\Delta''} f(x', x'') \Bigr). \tag{4.29}
\]

Proof: If $f$ is continuous in $\Delta$ then for each fixed point $x'' \in \Delta''$ the function $f(x', x'')$ is a continuous function of $x' \in \Delta'$ so the upper integral or lower integral in (4.28) can be replaced by the ordinary integral; and the same holds for iterated integrals in the reverse order, so that suffices for the proof.

It is customary also to set

\[
\int_{x''\in\Delta''} \Bigl( \int_{x'\in\Delta'} f(x', x'') \Bigr) = \int_{\Delta''} \int_{\Delta'} f(x', x'')\,dx'\,dx'',
\]

another commonly used notation for an iterated integral, where the inner integral, that with respect to the inner variable $x'$, is calculated first, and the result is then integrated over the outer integral, that with respect to the outer variable $x''$. Integrals over Jordan domains $D \subset \Delta$ are calculated similarly, by reducing them to integrals over cells; but some care often is required to keep the regions of integration straight. The process can be repeated, with more iterations; so an integral over a Jordan domain in $\mathbb{R}^n$ can be reduced to the calculation of $n$ separate integrals of functions of a single variable.

For examples of the application of Fubini's Theorem, first let $D \subset \mathbb{R}^2$ be the region bounded by the lines $x_1 + x_2 = 0$, $x_1 - 2x_2 = 0$, and $x_2 = 1$ as in the accompanying Figure 4.3.

Figure 4.3: an example of Fubini's theorem in $\mathbb{R}^2$

An application of Fubini's Theorem shows that for any continuous function $f(x)$ in $D$

\[
\int_D f(x) = \int_0^1 \int_{-x_2}^{2x_2} f(x_1, x_2)\,dx_1\,dx_2,
\]

thus reducing the integral to an iteration of integrals of functions of a single variable. The region $D$ can be split into the region $D_1$ consisting of those points $x \in D$ for which $-1 \le x_1 \le 0$ and the region $D_2$ consisting of those points $x \in D$ for which $0 \le x_1 \le 2$, and then an application of Fubini's Theorem shows that for any continuous function $f(x)$ in $D$

\[
\int_D f(x) = \int_{-1}^0 \int_{-x_1}^1 f(x_1, x_2)\,dx_2\,dx_1 + \int_0^2 \int_{x_1/2}^1 f(x_1, x_2)\,dx_2\,dx_1.
\]
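The two descriptions of the triangular region can be compared numerically. The sketch below uses a plain midpoint rule for iterated integrals; the helper `iterated_integral` and the sample integrand $f(x_1,x_2) = x_1 + x_2$ are assumptions chosen only for illustration. For this integrand a direct hand calculation in either order gives $3/2$, and both numerical approximations should come out close to that value.

```python
def iterated_integral(g, a, b, lo, hi, n=200):
    """Midpoint-rule approximation of the iterated integral
    int_a^b ( int_{lo(t)}^{hi(t)} g(t, s) ds ) dt,
    where t is the outer variable and s the inner variable."""
    ht = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * ht
        s0, s1 = lo(t), hi(t)
        hs = (s1 - s0) / n
        inner = sum(g(t, s0 + (j + 0.5) * hs) for j in range(n)) * hs
        total += inner * ht
    return total

def f(x1, x2):
    # sample integrand, chosen only for illustration
    return x1 + x2

# inner variable x1: 0 <= x2 <= 1 and -x2 <= x1 <= 2*x2
first = iterated_integral(lambda x2, x1: f(x1, x2), 0.0, 1.0,
                          lambda x2: -x2, lambda x2: 2.0 * x2)

# inner variable x2, with the region split into D1 and D2
second = (
    iterated_integral(f, -1.0, 0.0, lambda x1: -x1, lambda x1: 1.0)
    + iterated_integral(f, 0.0, 2.0, lambda x1: x1 / 2.0, lambda x1: 1.0)
)
print(first, second)
```

Keeping the outer variable as the first argument of `g` makes the limits of the inner integral depend only on the outer variable, which mirrors the way the limits are read off from the figure.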

The order in which Fubini's Theorem is applied can make a difference in the calculation. The iterated integral

\[
\int_0^1 \int_{\sqrt{x_1}}^1 \cos(x_2^3 + 1)\,dx_2\,dx_1
\]

involves a rather complicated trigonometric integral with respect to the variable $x_2$, but the equal iterated integral involves only simple integrations, since with the change of variables $t = x_2^3$

\[
\begin{aligned}
\int_0^1 \int_{\sqrt{x_1}}^1 \cos(x_2^3 + 1)\,dx_2\,dx_1 &= \int_0^1 \int_0^{x_2^2} \cos(x_2^3 + 1)\,dx_1\,dx_2 \\
&= \int_0^1 x_2^2 \cos(x_2^3 + 1)\,dx_2 = \int_0^1 \cos(t + 1)\,\tfrac{1}{3}\,dt = \tfrac{1}{3}\bigl( \sin 2 - \sin 1 \bigr).
\end{aligned}
\]
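As a sanity check on the change of order, the original iterated integral can be approximated directly and compared with the closed form $\frac{1}{3}(\sin 2 - \sin 1)$. The following is a minimal sketch using a midpoint rule in both variables; the function name `hard_order` and the resolution `n` are assumptions for the illustration, and the agreement is only up to the discretization error of the rule.

```python
import math

def hard_order(n=400):
    """Midpoint-rule approximation of
    int_0^1 ( int_{sqrt(x1)}^1 cos(x2**3 + 1) dx2 ) dx1,
    integrating with respect to x2 first."""
    h1 = 1.0 / n
    total = 0.0
    for i in range(n):
        x1 = (i + 0.5) * h1
        lower_limit = math.sqrt(x1)
        h2 = (1.0 - lower_limit) / n
        inner = 0.0
        for j in range(n):
            x2 = lower_limit + (j + 0.5) * h2
            inner += math.cos(x2 ** 3 + 1.0) * h2
        total += inner * h1
    return total

closed_form = (math.sin(2.0) - math.sin(1.0)) / 3.0
approximation = hard_order()
print(approximation, closed_form)
```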




There are some cases in which a Riemann integrable function $f(x', x'')$ in a product $\Delta = \Delta' \times \Delta''$ is not a Riemann integrable function of one variable when the other variable is held fixed. For instance the function $f(x_1, x_2)$ of variables $x_1, x_2 \in \mathbb{R}^1$ defined in a cell $\Delta = \Delta' \times \Delta'' \subset \mathbb{R}^2$ by

\[
f(x', x'') = \begin{cases} 0 & \text{if either } x' \text{ or } x'' \text{ is irrational},\\[0.5ex] \dfrac{1}{q} & \text{if both } x' \text{ and } x'' \text{ are rational and } x'' = \dfrac{p}{q} \end{cases}
\]

for coprime integers $p$ and $q$ depends on the variable $x'$ only through whether $x'$ is rational, and as a function of the variable $x''$ it is continuous whenever $x''$ is irrational but discontinuous whenever $x''$ is rational. As a function of two variables its points of discontinuity are the products $\Delta' \times \frac{p}{q}$ for all rational points $\frac{p}{q}$, a countable union of intervals; and since intervals are sets of measure zero in $\mathbb{R}^2$ the points of discontinuity form a set of measure zero, so the function $f$ is Riemann integrable. Of course it is also Riemann integrable as a function of the variable $x''$ for each fixed point $x' \in \Delta'$; but for any fixed rational point $x'' = \frac{p}{q}$ the function takes the values 0 or $\frac{1}{q}$ on dense subsets of the interval $\Delta'$ so the function is not Riemann integrable as a function of the variable $x'$. Fortunately in this case the integration can be taken first with respect to the variable $x''$, and the simpler version of Fubini's Theorem holds.

It should be noted that in Fubini's Theorem it is necessary to assume that a function $f$ on the product cell $\Delta = \Delta' \times \Delta''$ is Riemann integrable; there are functions that are Riemann integrable in the variable $x' \in \Delta'$ for each fixed point $x'' \in \Delta''$, and conversely are Riemann integrable in the variable $x'' \in \Delta''$ for each fixed point $x' \in \Delta'$, but are not Riemann integrable in the product cell $\Delta$. For example let $f(x', x'')$ be a function on the cell $\Delta' \times \Delta'' \subset \mathbb{R}^2$ that takes the value 1 at the center of the square, that takes the value 1 at each of four points in the subsquares when the square is split in half on each side, but where these points are chosen so that no two lie on the same horizontal or vertical line, and so on; and takes the value 0 otherwise. Thus this function takes each of the values 1 and 0 on dense subsets of the square, so it is not Riemann integrable; but on each horizontal or vertical line the function takes the value 1 at no more than a single point and is 0 otherwise, so is Riemann integrable.




4.3 Limits and Improper Integrals

It is familiar that pointwise limits of continuous functions need no longer be continuous; for example the functions $f_\nu(x) = x^\nu$ in the closed interval $[0, 1]$ are continuous and

\[
f(x) = \lim_{\nu\to\infty} f_\nu(x) = \begin{cases} 0 & \text{if } 0 \le x < 1,\\ 1 & \text{if } x = 1, \end{cases}
\]

which fails to be continuous at the point $x = 1$. It is even the case that limits of integrable functions need no longer be integrable; for example it is a straightforward matter to verify that

\[
\lim_{m\to\infty} \Bigl( \lim_{n\to\infty} \bigl( \cos \pi m!\,x \bigr)^{2n} \Bigr) = \begin{cases} 1 & \text{if } x \text{ is rational},\\ 0 & \text{if } x \text{ is irrational}, \end{cases}
\]

which is not integrable.[3] However a stronger notion of convergence does preserve continuity. A sequence of mappings $f_\nu : U \longrightarrow \mathbb{R}^n$ from a subset $U \subset \mathbb{R}^m$ into $\mathbb{R}^n$ is uniformly convergent to a mapping $f$ in $U$ if for any $\epsilon > 0$ there is an integer $N$ such that $\|f_\nu(x) - f(x)\| < \epsilon$ for all $x \in U$ whenever $\nu > N$. A standard notation is to write $f_\nu \longrightarrow f$ to indicate that the sequence of functions $f_\nu$ converges pointwise to the function $f$, and to write $f_\nu \underset{U}{\Longrightarrow} f$ to indicate that the sequence of functions $f_\nu$ converges uniformly to the function $f$ in the set $U$. It is evident from the inequalities (1.7) that the notion of uniform convergence is independent of which norm $\|x\|$ is used in the definition.

Theorem 4.13 The limit of a uniformly convergent sequence of continuous mappings from a subset $U \subset \mathbb{R}^m$ into $\mathbb{R}^n$ is continuous in $U$.

Proof: If $f_\nu \underset{U}{\Longrightarrow} f$ then for any $\epsilon > 0$ there is an integer $N$ sufficiently large that $\|f_N(x) - f(x)\| < \frac{1}{3}\epsilon$ for all $x \in U$. For any point $a \in U$ there is a $\delta > 0$ such that $\|f_N(a) - f_N(x)\| < \frac{1}{3}\epsilon$ whenever $\|x - a\| < \delta$, since the mapping $f_N$ is continuous at the point $a$. Then

\[
\|f(a) - f(x)\| \le \|f(a) - f_N(a)\| + \|f_N(a) - f_N(x)\| + \|f_N(x) - f(x)\| < \tfrac{1}{3}\epsilon + \tfrac{1}{3}\epsilon + \tfrac{1}{3}\epsilon = \epsilon
\]

whenever $\|x - a\| < \delta$, so the mapping $f$ is continuous at the point $a$; and since that is the case at any point $a \in U$ the mapping $f$ is continuous in $U$, which concludes the proof.

[3] It is amusing, although somewhat more difficult to demonstrate, that it does require two consecutive limiting processes to yield this function, which cannot be realized as a simple limit of continuous functions. R. Baire introduced a classification of functions, under which continuous functions are of class 0, functions that are not continuous but are pointwise limits of continuous functions are of class 1, functions that are not of class 1 but are pointwise limits of functions of class 1 are of class 2, and so on. He showed that there are functions of all these "Baire classes". This is discussed in some detail in C. Caratheodory's book Vorlesungen über Reelle Funktionen, 1935 (English translation Real Functions, Chelsea 1946) and F. Hausdorff's book Grundzüge der Mengenlehre, 1937 (English translation Set Theory, AMS Chelsea 1957). The subject is often treated in books on real analysis through a set of problems.

Figure 4.4: an example (graph of $f_\nu(x)$ on $[0, 1]$, with peak value $2\nu$ at $x = \frac{1}{2\nu}$ and value 0 for $x \ge \frac{1}{\nu}$)

Even if a sequence of integrable functions converges pointwise to an integrable function, the integral of the limit is not necessarily equal to the limit of the integrals. For instance consider the sequence of continuous functions $f_\nu(x)$ on the unit interval $[0, 1]$ where $f_\nu(0) = 0$, $f_\nu(\frac{1}{2\nu}) = 2\nu$, $f_\nu(\frac{1}{\nu}) = 0$, $f_\nu(1) = 0$, and the function $f_\nu$ is extended to be linear in the intervals between these points, as in the accompanying Figure 4.4. It is easy to see that $\lim_{\nu\to\infty} f_\nu(x) = 0$ for all points $x \in [0, 1]$ but that $\int_0^1 f_\nu = 1$ for every $\nu$.
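The behaviour of these spike functions can be checked numerically. The following is only an illustrative sketch: the helper names `spike` and `midpoint_integral`, the sample point, and the grid size are assumptions chosen for the illustration and are not part of the text.

```python
# Illustrative sketch; all names here are hypothetical helpers.

def spike(nu, x):
    """Piecewise linear f_nu on [0, 1]: 0 at 0, 2*nu at 1/(2*nu), 0 from 1/nu on."""
    peak = 1.0 / (2 * nu)
    if x <= 0.0 or x >= 2 * peak:
        return 0.0
    if x <= peak:
        return 2 * nu * x / peak            # rising edge
    return 2 * nu * (2 * peak - x) / peak   # falling edge

def midpoint_integral(g, a, b, cells):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / cells
    return h * sum(g(a + (i + 0.5) * h) for i in range(cells))

# At a fixed point the values are eventually 0 (pointwise convergence to 0) ...
x0 = 0.03
values_at_x0 = [spike(nu, x0) for nu in (10, 100, 1000)]

# ... while every integral over [0, 1] stays equal to 1.
integrals = [midpoint_integral(lambda x, n=nu: spike(n, x), 0.0, 1.0, 20_000)
             for nu in (1, 10, 100)]
```

The integrals do not tend to the integral of the limit function, which is 0; the convergence here is pointwise but not uniform, since the maximum value $2\nu$ is unbounded.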

Theorem 4.14 If $f_\nu$ is a uniformly convergent sequence of integrable functions in a cell $\Delta \subset \mathbb{R}^n$ then its limit $f$ is an integrable function in $\Delta$ and

$$\lim_{\nu\to\infty} \int_\Delta f_\nu = \int_\Delta f.$$

Proof: Since the functions $f_\nu$ converge uniformly to $f$ there is an integer $N$ such that $|f(x) - f_N(x)| < 1$ for all $x \in \Delta$; the function $f_N$ is integrable by hypothesis so it is bounded by some constant $M$ in $\Delta$, and then

$$|f(x)| \le |f(x) - f_N(x)| + |f_N(x)| \le 1 + M$$

for all $x \in \Delta$, so the limit function is bounded in $\Delta$. The set $E_\nu \subset \Delta$ of points at which the function $f_\nu$ fails to be continuous is a set of measure zero, since the functions $f_\nu$ are integrable by hypothesis; so the union $E = \bigcup_{\nu=1}^{\infty} E_\nu$ is also a set of measure zero, and all the functions $f_\nu$ are continuous in $\Delta \sim E$. The functions $f_\nu$ converge uniformly to $f$ in $\Delta \sim E$ so it follows from the preceding Theorem 4.13 that the limit function $f$ is also continuous in $\Delta \sim E$, hence that $f$ is integrable. For any $\epsilon > 0$ there is an $N$ such that $|f(x) - f_\nu(x)| < \epsilon$ for all $x \in \Delta$ whenever $\nu > N$; since the functions $f$ and $f_\nu$ are integrable it then follows from Theorem 4.6 that for any $\nu > N$

$$\Big| \int_\Delta f - \int_\Delta f_\nu \Big| = \Big| \int_\Delta (f - f_\nu) \Big| \le \int_\Delta |f - f_\nu| \le \int_\Delta \epsilon = \epsilon\,|\Delta|$$

and consequently that $\lim_{\nu\to\infty} \int_\Delta f_\nu = \int_\Delta f$, which concludes the proof.

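Uniform limits of differentiable functions are less well behaved: the limit is not necessarily differentiable, and even when it is, its derivative is not necessarily the limit of the derivatives. For example the functions $f_n(x) = \frac{1}{n} \sin n^2 x$ converge uniformly to zero, but $f_n'(0) = n \longrightarrow \infty$.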
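A quick numerical look at this example may help. The sketch below is only illustrative: the sampling grid, the step size of the difference quotient, and the helper names are assumptions made for the illustration.

```python
import math

def f(n, x):
    """The function f_n(x) = sin(n^2 x) / n from the example."""
    return math.sin(n * n * x) / n

def central_difference(g, x, h=1e-8):
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

ns = (1, 10, 100)

# Sampled sup norms on [-1, 1]; each is bounded by 1/n, so f_n tends to 0 uniformly.
sup_norms = [max(abs(f(n, k / 1000.0)) for k in range(-1000, 1001)) for n in ns]

# Slopes at the origin; these grow like n instead of tending to 0.
slopes_at_zero = [central_difference(lambda x, n=n: f(n, x), 0.0) for n in ns]
```

The sampled sup norms shrink while the slopes at the origin grow, which is the failure of convergence of the derivatives described above.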

Theorem 4.15 If $f_\nu$ are differentiable functions in a cell $\Delta \subset \mathbb{R}^n$ which converge at a point $a \in \Delta$, and if the derivatives $f'_\nu$ converge uniformly to a function $g$ in $\Delta$, then the functions $f_\nu$ actually converge uniformly in $\Delta$ to a differentiable function $f$ in $\Delta$ for which $f' = g$.

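Proof: For any $\epsilon > 0$ there is an $N$ sufficiently large that $\|f'_\mu(x) - f'_\nu(x)\|_\infty < \epsilon$ for all $x \in \Delta$ and $\|f_\mu(a) - f_\nu(a)\|_\infty < \epsilon$ whenever $\mu, \nu > N$. It then follows from the Mean Value Inequality, Corollary 2.8, that for any point $x \in \Delta$ there is a point $y \in \Delta$ such that whenever $\mu, \nu > N$

$$\big\| \big(f_\mu(x) - f_\nu(x)\big) - \big(f_\mu(a) - f_\nu(a)\big) \big\|_\infty \le n \|f'_\mu(y) - f'_\nu(y)\|_\infty \|x - a\|_\infty < n \epsilon \|x - a\|_\infty,$$

hence that

$$\|f_\mu(x) - f_\nu(x)\|_\infty \le \|f_\mu(a) - f_\nu(a)\|_\infty + n \epsilon \|x - a\|_\infty < (1 + nd)\epsilon$$

where $d = \sup_{x \in \Delta} \|x - a\|_\infty$; consequently the functions $f_\nu$ converge uniformly to a function $f$ in $\Delta$. For any point $x \in \Delta$ and any $h \in \mathbb{R}^n$ the function

(4.30) $$\eta_\nu(h) = \begin{cases} \dfrac{1}{\|h\|_\infty} \Big( f_\nu(x + h) - f_\nu(x) - f'_\nu(x) h \Big) & \text{if } h \ne 0, \\[1ex] 0 & \text{if } h = 0, \end{cases}$$

is a continuous function of the variable $h$ in an open neighborhood of the origin, since $f_\nu$ is differentiable. If $h \ne 0$ then

$$\|\eta_\mu(h) - \eta_\nu(h)\|_\infty \le \frac{1}{\|h\|_\infty} \big\| \big(f_\mu(x + h) - f_\nu(x + h)\big) - \big(f_\mu(x) - f_\nu(x)\big) \big\|_\infty + \|f'_\mu(x) - f'_\nu(x)\|_\infty,$$

and then from the Mean Value Inequality again for some point $y$ between $x$ and $x + h$

$$\|\eta_\mu(h) - \eta_\nu(h)\|_\infty \le n \|f'_\mu(y) - f'_\nu(y)\|_\infty + \|f'_\mu(x) - f'_\nu(x)\|_\infty,$$

and consequently $\|\eta_\mu(h) - \eta_\nu(h)\|_\infty < (n + 1)\epsilon$ whenever $\mu, \nu > N$, which also holds trivially for $h = 0$. Therefore the functions $\eta_\nu(h)$ are uniformly convergent in an open neighborhood of the origin to a continuous limit function $\eta(h)$, for which $\lim_{h \to 0} \eta(h) = \eta(0) = 0$. For any fixed $h \ne 0$ it follows from (4.30) that

$$\eta(h) = \frac{1}{\|h\|_\infty} \Big( f(x + h) - f(x) - g(x) h \Big)$$

since the functions $f_\nu$ converge to $f$ and the functions $f'_\nu$ converge to $g$; but for $h = 0$ it follows from (4.30) that $\eta(0) = 0$. Therefore

$$\lim_{h \to 0} \eta(h) = \lim_{h \to 0} \frac{1}{\|h\|_\infty} \Big( f(x + h) - f(x) - g(x) h \Big) = 0,$$

which shows that $f'(x) = g(x)$ and thereby concludes the proof.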

Since the derivatives $f'_\nu$ determine the functions $f_\nu$ only up to additive constants it is apparent that it is not enough in the preceding theorem just to assume something about the convergence of the derivatives $f'_\nu$ and nothing about the convergence of the functions $f_\nu$ themselves; but by that theorem it is enough to assume, in addition to the uniform convergence of the derivatives, that the functions $f_\nu$ converge at a single point. The preceding theorem does not require any regularity properties of the derivatives $f'_\nu$; but if the derivatives are assumed to be continuous there is a considerably simpler proof of the preceding theorem. For functions of a single variable $f_\nu(x) - f_\nu(a) = \int_a^x f'_\nu$ by the fundamental theorem of the calculus, so the convergence of the functions $f_\nu$ follows directly from the convergence of the values $f_\nu(a)$ and the uniform convergence of the functions $f'_\nu$, and the limit satisfies $f(x) - f(a) = \lim_{\nu \to \infty} \int_a^x f'_\nu = \int_a^x g$ by Theorem 4.14, so $f' = g$. The same argument can be applied in each variable for functions of several variables.

Another limiting process is the integral transformation defined by a kernel function $k(x, y)$ of the variables $x \in D \subset \mathbb{R}^m$ and $y \in E \subset \mathbb{R}^n$ for compact Jordan domains $D$ and $E$. An integrable function $f(y)$ of the variable $y \in E$ can be transformed to the function

(4.31) $$F(x) = \int_{y \in E} k(x, y) f(y)$$

of the variable $x \in D$, provided that the kernel $k(x, y)$ is an integrable function of the variable $y \in E$ for each point $x \in D$. Of course the properties of the transform $F(x)$ depend on the properties of the kernel $k(x, y)$ as a function of the variable $x \in D$; a sufficiently regular kernel $k(x, y)$ can transform a merely integrable function $f(y)$ of the variable $y \in E$ to a continuous or differentiable function $F(x)$ of the variable $x \in D$. The regularity conditions to be considered here are first that the kernel $k(x, y)$ is a continuous function of both variables $(x, y) \in D \times E$, and second that in addition the kernel $k(x, y)$ is a differentiable function of the variable $x$ and the derivative is continuous in both variables; more precisely the second condition is that for each fixed point $y \in E$ the function $k(x, y)$ is a differentiable function of the variable $x$ in an open neighborhood of the set $D$ and the gradient of $k(x, y)$ when that function is viewed as a function of the variable $x$ alone for a fixed point $y$, which is a well defined vector-valued function $\nabla_x k(x, y)$ of the variables $(x, y)$, is a continuous function in that neighborhood.

Theorem 4.16 (i) If $D \subset \mathbb{R}^m$ and $E \subset \mathbb{R}^n$ are compact Jordan domains, $k(x, y)$ is a continuous function of the variables $x \in D$, $y \in E$, and $f(y)$ is a Riemann integrable function of the variable $y \in E$, then the transform $F(x) = \int_{y \in E} k(x, y) f(y)$ is a continuous function of the variable $x \in D$.

(ii) If in addition $k(x, y)$ is continuous in both variables and differentiable in the variable $x$ in an open neighborhood of $D \times E$ with derivatives that are continuous in both variables, then the transform $F(x) = \int_{y \in E} k(x, y) f(y)$ is a continuously differentiable function of the variable $x \in D$ and

(4.32) $$\nabla F(x) = \int_{y \in E} \nabla_x k(x, y)\, f(y).$$

Proof: (i) The integrable function $f(y)$ is bounded, so there is a constant $M$ such that $|f(y)| \le M$ for all $y \in E$. The continuous function $k(x, y)$ in the compact set $D \times E$ is uniformly continuous, so for any $\epsilon > 0$ there is a $\delta > 0$ such that $|k(x_1, y_1) - k(x_2, y_2)| < \epsilon$ whenever $\|(x_1, y_1) - (x_2, y_2)\|_2 < \delta$. Therefore if $x \in D$ and $h \in \mathbb{R}^m$ satisfies $\|h\|_2 < \delta$ then

$$|F(x + h) - F(x)| = \Big| \int_{y \in E} \big( k(x + h, y) - k(x, y) \big) f(y) \Big| \le \int_{y \in E} |k(x + h, y) - k(x, y)|\, |f(y)| \le \int_{y \in E} \epsilon M \le \epsilon M |E|,$$

so $F(x)$ is a continuous function of the variable $x \in D$.

(ii) By hypothesis the kernel $k(x, y)$ is continuous in the closure of an open neighborhood $U$ of the set $D \times E$ and is a differentiable function of the variable $x$ in $U$; so by the mean value theorem, Theorem 2.7, for any point $x \in D$ and for any $h \in \mathbb{R}^m$ sufficiently small

$$k(x + h, y) - k(x, y) = \nabla_x k(z, y) \cdot h$$

for some point $z$ between $x$ and $x + h$ on the line joining these two points. Then

$$F(x + h) - F(x) - \int_{y \in E} \nabla_x k(x, y) \cdot h\, f(y) = \int_{y \in E} \Big( k(x + h, y) - k(x, y) - \nabla_x k(x, y) \cdot h \Big) f(y) = \int_{y \in E} \Big( \nabla_x k(z, y) - \nabla_x k(x, y) \Big) \cdot h\, f(y).$$

Since the function $\nabla_x k(x, y)$ is assumed to be continuous on the compact set $\overline{U}$ it is uniformly continuous, and consequently for any $\epsilon > 0$ there is a $\delta > 0$ sufficiently small that whenever $\|h\|_2 < \delta$

$$\|\nabla_x k(z, y) - \nabla_x k(x, y)\|_2 < \epsilon$$

and hence by the Cauchy-Schwarz inequality

$$\Big| \big( \nabla_x k(z, y) - \nabla_x k(x, y) \big) \cdot h \Big| < \epsilon \|h\|_2.$$

Thus if $|f(y)| \le M$ for all $y \in E$ then

$$\Big| F(x + h) - F(x) - \int_{y \in E} \nabla_x k(x, y) \cdot h\, f(y) \Big| \le \int_{y \in E} \epsilon M \|h\|_2 \le \epsilon M |E|\, \|h\|_2,$$

which shows that $F(x)$ is differentiable and that its derivative is given by (4.32), which suffices for the proof.
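Formula (4.32) can be tested numerically in the simplest case $m = n = 1$. The following sketch is only an illustration under stated assumptions: the kernel $k(x, y) = \cos(xy)$, the discontinuous step function $f$, the domain $E = [0, 1]$, and all helper names are choices made here and do not come from the text.

```python
import math

def f(y):
    """A merely integrable (discontinuous) function on E = [0, 1]; an assumed example."""
    return 1.0 if y < 0.5 else -2.0

def k(x, y):
    """An assumed smooth kernel."""
    return math.cos(x * y)

def dk_dx(x, y):
    """Partial derivative of the kernel with respect to x."""
    return -y * math.sin(x * y)

def integrate_over_E(g, cells=2000):
    """Midpoint rule over E = [0, 1]."""
    h = 1.0 / cells
    return h * sum(g((i + 0.5) * h) for i in range(cells))

def F(x):
    return integrate_over_E(lambda y: k(x, y) * f(y))

def derivative_by_formula(x):
    # Right-hand side of (4.32): integrate the x-derivative of the kernel against f.
    return integrate_over_E(lambda y: dk_dx(x, y) * f(y))

x, step = 0.7, 1e-5
difference_quotient = (F(x + step) - F(x - step)) / (2 * step)
formula_value = derivative_by_formula(x)
```

The difference quotient of the transform and the integral of the differentiated kernel agree to within the accuracy of the finite difference, even though $f$ itself is not continuous.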

The integral has been defined so far for integrable functions in a Jordan domain in $\mathbb{R}^m$. It is possible to extend the integral as an improper integral to a wider class of functions over a wider class of regions of integration. The terminology is not to suggest anything really improper, but just to indicate that the extension does not necessarily satisfy the results established for the integrals over Jordan domains. A subset $D \subset \mathbb{R}^n$ is an extended Jordan domain if $D = \bigcup_{i=1}^{\infty} D_i$ where $D_i$ are open Jordan domains such that $\overline{D}_i \subset D_{i+1}$ for all indices $i$. For any continuous function $f$ in $D$ set

(4.33) $f^+(x) = \max(f(x), 0)$ and $f^-(x) = \max(-f(x), 0)$

for any point $x \in D$. Clearly $f^+(x) \ge 0$ and $f^-(x) \ge 0$ at all points $x \in D$, and $f(x) = f^+(x) - f^-(x)$ is a unique decomposition of the function $f$ as a difference of two non-negative functions; and both functions $f^+$ and $f^-$ are also continuous in $D$. For any integer $i > 0$ the functions $\min(f^+(x), i)$ and $\min(f^-(x), i)$ also are bounded and continuous in $D$, and consequently they are integrable over the Jordan domains $D_j$. Moreover since these are non-negative functions and $\min(f^+(x), i) \le \min(f^+(x), i + 1)$ while $D_i \subset D_{i+1}$ it follows that

$$\int_{D_i} \min(f^+(x), i) \le \int_{D_{i+1}} \min(f^+(x), i + 1);$$

consequently the sequence of real numbers $\int_{D_i} \min(f^+(x), i)$ for $i = 1, 2, \ldots$ has a well defined limit as $i \to \infty$, either a real number or $+\infty$, so it is possible to define

(4.34) $$\int_D f^+ = \lim_{i \to \infty} \int_{D_i} \min(f^+(x), i)$$

and correspondingly of course

(4.35) $$\int_D f^- = \lim_{i \to \infty} \int_{D_i} \min(f^-(x), i).$$

If at least one of these two integrals is finite the improper integral of the function $f$ is defined to be

(4.36) $$\int_D f = \int_D f^+ - \int_D f^-$$

which is either a real number or $+\infty$ or $-\infty$. This appears to depend on the particular choice of the sequence of Jordan domains $D_i$; but the value of the improper integral is actually independent of that choice. Indeed if the set $D$ also is the union $D = \bigcup_{j=1}^{\infty} E_j$ for some open Jordan domains $E_j \subset \mathbb{R}^n$ such that $\overline{E}_j \subset E_{j+1}$ then the same construction leads to other integrals, which for convenience will be denoted by $\int_E f^+$ and $\int_E f^-$; of course $E = D$ as sets, but that common set is represented as an increasing union of Jordan domains in two distinct ways. Since each set $\overline{D}_i$ is compact and $\overline{D}_i \subset \bigcup_{j=1}^{\infty} E_j$ for the open sets $E_j$ then $\overline{D}_i$ is actually contained in a finite union of the sets $E_j$, and consequently in the largest of the sets $E_j$ in this finite union, say the set $E_{j_i}$; and since $D_i$ is contained in $E_j$ whenever $j > j_i$ it is possible to assume that $j_i \ge i$. Therefore

$$\int_{D_i} \min(f^+, i) \le \int_{E_{j_i}} \min(f^+, j_i) \le \int_E f^+,$$

and since that is the case for any $i$ it follows that $\int_D f^+ \le \int_E f^+$; and of course the same holds for the integrals of the function $f^-$. On the other hand the argument can be reversed, so that the opposite inequality $\int_E f^+ \le \int_D f^+$ also holds, and consequently $\int_E f^+ = \int_D f^+$; and again the same holds for integrals of the function $f^-$, so altogether $\int_E f = \int_D f$.

Theorem 4.17 Any open subset $U \subset \mathbb{R}^n$ is an extended Jordan domain, for which the approximating Jordan domains are unions of cells in a partition of $\mathbb{R}^n$.

Proof: Let $P_r$ be a partition of the entire space $\mathbb{R}^n$ into cells having as their sides intervals of length $2^{-r}$. For a sufficiently large index $r_1$ there will be some closed cells in this partition that are contained in the open set $U$; the union of their interiors $D_1$ then is an open Jordan domain contained in $U$. For a sufficiently large index $r_2 > r_1$ each point on the compact boundary $\partial D_1$ will be contained in some open cells of the partition $P_{r_2}$, and the closures of these cells also will be contained within $U$; the union $D_2$ of all of the open cells of this partition that are contained within $U$ is another Jordan domain such that $\overline{D}_1 \subset D_2$. The process continues, yielding a sequence of open Jordan domains $D_i$ such that $\overline{D}_i \subset D_{i+1}$; and since the sizes of the cells in these partitions tend to 0 any point in $U$ will eventually lie in one of the Jordan domains $D_i$, which suffices to conclude the proof.

Although any bounded open subset of $\mathbb{R}^n$ is an extended Jordan domain, bounded open subsets of $\mathbb{R}^n$ are not necessarily Jordan domains themselves. For an example even in $\mathbb{R}^1$, let $r_j$ be a list of the rational numbers in the open unit interval $(0, 1)$ where $j \ge 1$, let $\Delta_j$ be an open interval of length at most $|\Delta_j| \le \epsilon / 2^j$ contained in $(0, 1)$ and containing the point $r_j$, where $\epsilon < \frac{1}{2}$, and let $U = \bigcup_j \Delta_j$. Clearly $\overline{U} = [0, 1]$, and it follows that $\partial U = [0, 1] \sim U$. If $U$ were a Jordan domain its boundary would be a compact set of content 0, which could be covered by finitely many open intervals of total length less than $\epsilon$; thus there would be finitely many open intervals $I_j \subset \mathbb{R}^1$ such that $([0, 1] \sim U) \subset \bigcup_{j=1}^{N} I_j$ and $\sum_{j=1}^{N} |I_j| < \epsilon$, and consequently

$$[0, 1] \subset \Big( \bigcup_{j=1}^{N} I_j \Big) \cup \Big( \bigcup_{i=1}^{\infty} \Delta_i \Big).$$

However the compact set $[0, 1]$ would be covered by finitely many of these open intervals, so that actually

$$[0, 1] \subset \Big( \bigcup_{j=1}^{N} I_j \Big) \cup \Big( \bigcup_{i=1}^{M} \Delta_i \Big),$$

which is impossible since $\sum_{j=1}^{N} |I_j| + \sum_{i=1}^{M} |\Delta_i| < 2\epsilon < 1$; consequently $U$ cannot be a Jordan domain.




4.4 Change of Variables

In discussing the effects of a change of variables in integration it is convenient to invoke the notion of an oriented integral. That notion is familiar in the study of integrals of functions of a single variable; but it is customarily merely mentioned in passing just as a matter of notation if it is noted explicitly at all. A cell in $\mathbb{R}^1$ is an interval, customarily denoted by $[a, b]$ where $a \le b$, and oriented in the direction from $a$ to $b$, the customary orientation of the real line by increasing values of the real numbers. The interval bounded by two real numbers $a$ and $b$ can also be denoted by $\Delta(a, b)$, viewed as an unoriented cell and consequently as a cell that is defined independently of which of the two boundary points $a$ or $b$ is the larger number, so that $\Delta(a, b) = \Delta(b, a)$. The content of this cell is $|\Delta(a, b)| = |b - a|$, and the Riemann integral $\int_{\Delta(a,b)} f$ is defined as in (4.8). The oriented integral is defined in terms of the Riemann integral by

(4.37) $$\int_a^b f(x)\,dx = \begin{cases} \int_{\Delta(a,b)} f & \text{if } a \le b, \\[0.5ex] -\int_{\Delta(a,b)} f & \text{if } a > b. \end{cases}$$

This is the integral over the cell $\Delta(a, b) = \Delta(b, a)$ when that cell is oriented in the direction of the change of the variable $x$ as it moves from the boundary point $a$ to the boundary point $b$, so in the positive direction of the natural orientation of the real line of the variable $x$ if $a < b$ but in the negative direction, the inverse of the natural orientation of the real line of the variable $x$, if $a > b$. It is clear from (4.37) that

(4.38) $$\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$$

for any order of the real numbers $a$ and $b$. For example, the Riemann integral of a constant function $f(x) = k$ is $\int_{\Delta(a,b)} k = k\,|b - a|$, a result that is independent of which of the two real numbers $a$ or $b$ is the larger; on the other hand it follows readily from (4.37) that the oriented integral is $\int_a^b k\,dx = k(b - a)$, a formula that also holds whichever of the two real numbers is larger. In particular the integrals over $\Delta(a, x)$ for any fixed point $a$ can be viewed as defining functions of the real variable $x \in \mathbb{R}^1$, and

(4.39) $$\int_{\Delta(a,x)} k = k\,|x - a| \quad \text{and} \quad \int_a^x k\,dx = k(x - a);$$

both of these integrals are continuous functions of the variable $x \in \mathbb{R}^1$, but the oriented integral is differentiable at the point $a$ while the Riemann integral is not, one way in which the oriented integral is more convenient. In general, if $f : \mathbb{R} \to \mathbb{R}$ is a continuous function on the real line then for the Riemann




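integral it follows readily that

(4.40) $$\int_{\Delta(a,b)} f + \int_{\Delta(b,c)} f = \begin{cases} \int_{\Delta(a,c)} f & \text{if } a < b < c, \\[0.5ex] \int_{\Delta(a,c)} f + 2 \int_{\Delta(b,c)} f & \text{if } a < c < b, \end{cases}$$

while on the other hand for the oriented integral it follows equally readily that

(4.41) $$\int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx \quad \text{for any order of } a, b, c.$$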
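The contrast between (4.40) and (4.41) can be illustrated with a small numerical sketch. The helper names `riemann` and `oriented`, the sample integrand, and the sample points are assumptions made here for illustration; the midpoint rule merely stands in for the Riemann integral.

```python
def riemann(f, a, b, cells=10_000):
    """Unoriented integral over the cell Delta(a, b) = Delta(b, a), midpoint rule."""
    lo, hi = min(a, b), max(a, b)
    h = (hi - lo) / cells
    return h * sum(f(lo + (i + 0.5) * h) for i in range(cells))

def oriented(f, a, b, cells=10_000):
    """Oriented integral from a to b, following the sign convention of (4.37)."""
    value = riemann(f, a, b, cells)
    return value if a <= b else -value

f = lambda x: x * x + 1.0
a, b, c = 0.0, 3.0, 1.0   # sample points in the order a < c < b

# (4.41): additivity of the oriented integral holds for any order of a, b, c.
oriented_sum = oriented(f, a, b) + oriented(f, b, c)
oriented_direct = oriented(f, a, c)

# (4.40), second case: the unoriented integrals pick up an extra term.
riemann_sum = riemann(f, a, b) + riemann(f, b, c)
riemann_rhs = riemann(f, a, c) + 2 * riemann(f, b, c)
```

For this choice the oriented sum reproduces the integral from $a$ to $c$ directly, while the unoriented sum overshoots by twice the integral over $\Delta(b, c)$.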

Theorem 4.18 (Fundamental Theorem of Calculus) If $f$ is a continuous function on an interval $[a, b] \subset \mathbb{R}^1$ then $F_0(x) = \int_a^x f(t)\,dt$ is a continuously differentiable function of the variable $x \in (a, b)$ and $F_0'(x) = f(x)$; and conversely if $F(x)$ is a continuously differentiable function in $(a, b)$ such that $F'(x) = f(x)$ then $\int_a^x f(t)\,dt = F(x) - F(a)$ for $x \in (a, b)$.

Proof: If $F_0(x) = \int_a^x f(t)\,dt$ it follows from (4.41) that $F_0(x + h) - F_0(x) = \int_x^{x+h} f(t)\,dt$ whenever $x, x + h \in [a, b]$, where $h$ can be either positive or negative. Since $f$ is continuous then for any fixed point $x \in [a, b]$ and any $\epsilon > 0$ if $h$ is sufficiently small

$$f(x) - \epsilon \le f(t) \le f(x) + \epsilon \quad \text{for all } t \in \Delta(x, x + h);$$

the oriented integral of this inequality over the interval $\Delta(x, x + h)$ yields the inequality

$$\big( f(x) - \epsilon \big) h \le F_0(x + h) - F_0(x) \le \big( f(x) + \epsilon \big) h,$$

and dividing by $h$ and taking the limit as $h$ tends to 0 shows that

$$f(x) - \epsilon \le F_0'(x) \le f(x) + \epsilon;$$

since that is the case for all $\epsilon > 0$ it follows that $F_0'(x) = f(x)$. If $F(x)$ is a differentiable function in $[a, b]$ such that $F'(x) = f(x)$ then $F(x) = F_0(x) + c$ for some constant $c$, since $\frac{d}{dx} \big( F(x) - F_0(x) \big) = 0$, and consequently $\int_a^x f(t)\,dt = F_0(x) - F_0(a) = F(x) - F(a)$, which suffices for the proof.

A $C^1$ homeomorphism $\phi : [a, b] \to \Delta\big(\phi(a), \phi(b)\big)$ either is orientation preserving, so that $\phi(a) \le \phi(b)$ and $\phi'(x) \ge 0$ for all points $x \in (a, b)$, or is orientation reversing, so that $\phi(a) \ge \phi(b)$ and $\phi'(x) \le 0$ for all points $x \in (a, b)$. In either case, if $f$ is a continuous function in $[c, d] = \Delta\big(\phi(a), \phi(b)\big)$ then for any differentiable function $F$ in $[c, d]$ such that $F'(x) = f(x)$ it follows from Theorem 4.18 that

$$\int_{\phi(a)}^{\phi(b)} f(x)\,dx = F\big(\phi(b)\big) - F\big(\phi(a)\big).$$

By the chain rule $(F \circ \phi)' = (f \circ \phi)\,\phi'$, so by Theorem 4.18 again

$$\int_a^b f\big(\phi(x)\big)\,\phi'(x)\,dx = F\big(\phi(b)\big) - F\big(\phi(a)\big).$$




Comparing the two preceding observations shows that

(4.42) $$\int_{\phi(a)}^{\phi(b)} f(x)\,dx = \int_a^b f\big(\phi(x)\big)\,\phi'(x)\,dx.$$

This is the familiar formula for the change of variables in an oriented integral; actually it is customary to set $y = \phi(x)$ and rewrite the preceding equation in the form

(4.43) $$\int_{y(a)}^{y(b)} f(y)\,dy = \int_a^b f(y)\,\frac{dy}{dx}\,dx.$$

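On the other hand for the Riemann integral it follows from (4.37) that

(4.44) $$\int_{\Delta(a,b)} f = \int_a^b f(x)\,dx \quad \text{since } a \le b$$

and

(4.45) $$\int_{\Delta(\phi(a),\phi(b))} f = \begin{cases} \int_{\phi(a)}^{\phi(b)} f(x)\,dx & \text{if } \phi(a) \le \phi(b), \\[0.5ex] -\int_{\phi(a)}^{\phi(b)} f(x)\,dx & \text{if } \phi(a) \ge \phi(b); \end{cases}$$

consequently in terms of the unoriented integral (4.42) takes the form

(4.46) $$\int_{\Delta(c,d)} f = \begin{cases} \int_{\Delta(a,b)} (f \circ \phi)\,\phi' & \text{if } \phi(a) \le \phi(b), \\[0.5ex] -\int_{\Delta(a,b)} (f \circ \phi)\,\phi' & \text{if } \phi(a) \ge \phi(b). \end{cases}$$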

In the first case in (4.46) the mapping $\phi$ is orientation-preserving and $\phi'(x) \ge 0$, while in the second case in (4.46) the mapping $\phi$ is orientation-reversing and $\phi'(x) \le 0$, so both cases can be subsumed under the general formula

(4.47) $$\int_{\phi(\Delta)} f = \int_\Delta (f \circ \phi)\,|\phi'|$$

for any cell $\Delta \subset \mathbb{R}^1$. This is the form in which the result extends to mappings in higher dimensions.
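As a sanity check on the role of the absolute value in (4.47), here is a small numerical sketch with an orientation-reversing map. The particular map $\phi(x) = 2 - x^2$ on $[0.5, 1.5]$, the integrand $e^y$, and the helper name `midpoint` are assumptions chosen for the illustration.

```python
import math

def midpoint(g, lo, hi, cells=20_000):
    """Midpoint-rule approximation to the unoriented integral over [lo, hi]."""
    h = (hi - lo) / cells
    return h * sum(g(lo + (i + 0.5) * h) for i in range(cells))

phi = lambda x: 2.0 - x * x     # orientation reversing on [0.5, 1.5]
dphi = lambda x: -2.0 * x       # its derivative, negative on that interval
f = math.exp

a, b = 0.5, 1.5
image_lo, image_hi = sorted((phi(a), phi(b)))   # the cell phi(Delta)

lhs = midpoint(f, image_lo, image_hi)                           # integral of f over phi(Delta)
rhs = midpoint(lambda x: f(phi(x)) * abs(dphi(x)), a, b)        # right-hand side of (4.47)
signed = midpoint(lambda x: f(phi(x)) * dphi(x), a, b)          # same without the absolute value
```

With the absolute value the two sides agree; without it the right-hand side has the opposite sign, which is the second case of (4.46).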

Theorem 4.19 If $\phi : D \to E$ is a $C^1$ homeomorphism between two open subsets of $\mathbb{R}^n$ and if $f$ is a function that is continuous almost everywhere in $E$ then $(f \circ \phi)\,|\det \phi'|$ is continuous almost everywhere in $D$ and

(4.48) $$\int_E f = \int_D (f \circ \phi)\,|\det \phi'|.$$




Proof: As a preliminary observation, if $\phi : D \to E$ is a $C^1$ homeomorphism between two open subsets $D, E \subset \mathbb{R}^n$ then for any cell $\Delta \subset \overline{\Delta} \subset D$ the boundary of the image of $\Delta$ is the image of the boundary of $\Delta$, that is, $\partial \phi(\Delta) = \phi(\partial \Delta)$. Since each face of $\Delta$ is locally a submanifold of $\mathbb{R}^n$ of dimension $n - 1$ it follows from the Rank Theorem, Theorem 3.9, that the boundary of the image $\phi(\Delta)$ consists of a collection of pieces that are also locally submanifolds of $\mathbb{R}^n$ of dimension $n - 1$, so are subsets of measure zero as noted in the discussion on page 70; consequently $\partial \phi(\Delta)$ is a set of measure zero so the image $\phi(\Delta)$ is a Jordan domain in $\mathbb{R}^n$. A set that is the image of a cell $\Delta$ under a $C^1$ homeomorphism of an open neighborhood of the closure $\overline{\Delta}$ will be called a special Jordan domain, although only for the duration of this proof. It is evident from this definition that the image of a special Jordan domain under a $C^1$ homeomorphism defined in an open neighborhood of its closure is also a special Jordan domain.

To turn to the proof of the theorem itself, first it will be demonstrated that for any cell $\Delta \subset \overline{\Delta} \subset E$ and for any constant $c$

(4.49) $$\int_\Delta c = \int_{\phi^{-1}(\Delta)} c\,|\det \phi'|$$

provided that the mapping $\phi$ has any one of the following three special forms.

(i) If the mapping $\phi : D \to E$ is a translation then $|\det \phi'| = 1$, and it is obvious that any partition of the cell $\Delta$ translates to a corresponding partition of the translated cell $\phi^{-1}(\Delta)$ for which the upper sums and consequently the upper integrals of the constant $c$ over the partitions of $\Delta$ and $\phi^{-1}(\Delta)$ are equal; and since $c$ is Riemann integrable it follows that $\int_\Delta c = \int_{\phi^{-1}(\Delta)} c$, which is (4.49) in this case.

(ii) If the mapping $\phi : D \to E$ is a permutation of the variables then $|\det \phi'| = 1$, and since the upper sums and hence the upper integrals of the constant function $c$ for any partition of the cell $\Delta$ are unchanged by a relabeling of the variables it follows again that $\int_\Delta c = \int_{\phi^{-1}(\Delta)} c$, which is (4.49) in this case.

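(iii) If the mapping $\phi : D \to E$ changes only a single variable, then in view of (ii) there is no loss of generality in assuming that it changes only the variable $x_1$. When points $x \in D$ and $y \in E$ are written

$$x = \begin{pmatrix} x_1 \\ x_{II} \end{pmatrix} \text{ where } x_1 \in \mathbb{R}^1,\ x_{II} \in \mathbb{R}^{n-1}, \quad \text{and} \quad y = \begin{pmatrix} y_1 \\ y_{II} \end{pmatrix} \text{ where } y_1 \in \mathbb{R}^1,\ y_{II} \in \mathbb{R}^{n-1},$$

the mapping $\phi$ has the form

$$\phi(x) = \begin{pmatrix} \phi_1(x) \\ x_{II} \end{pmatrix} \text{ where } \phi_1(x) \in \mathbb{R}^1,\ x_{II} \in \mathbb{R}^{n-1};$$

therefore

$$\phi'(x) = \begin{pmatrix} \partial_1 \phi_1(x) & \partial_{II} \phi_1(x) \\ 0 & I_{n-1} \end{pmatrix}$$

where $I_{n-1}$ is the $(n - 1) \times (n - 1)$ identity matrix, so $\det \phi'(x) = \partial_1 \phi_1(x)$. The cell $\Delta \subset E$ is the product of an interval $\Delta_1$ in the $y_1$-axis and a cell $\Delta_{II}$ in the space $\mathbb{R}^{n-1}$ of the variables $y_{II}$, so it is the union of the one-dimensional slices $\Delta(y_{II}) = \Delta_1 \times y_{II}$ for points $y_{II} \in \Delta_{II}$. The image of the cell $\Delta$ under the inverse homeomorphism $\phi^{-1} : E \to D$ is the special Jordan domain $\phi^{-1}(\Delta) \subset D$, which correspondingly is the union of the one-dimensional slices

$$\phi^{-1}\big(\Delta(y_{II})\big) = \Big\{\, x = (x_1, x_{II}) \in \phi^{-1}(\Delta) \;\Big|\; x_{II} = y_{II} \,\Big\}$$

as sketched in the accompanying Figure 4.5.

Figure 4.5: Mappings that change a single variable

For the constant function $f = c$ in the cell $\Delta$ it follows from Fubini's Theorem and the change of variables formula (4.42) for the restriction $\phi : \phi^{-1}\big(\Delta(y_{II})\big) \to \Delta(y_{II})$ of the mapping $\phi$ to the one-dimensional slices that

$$\int_\Delta c = \int_{y_{II} \in \Delta_{II}} \int_{y_1 \in \Delta(y_{II})} c = \int_{x_{II} \in \Delta_{II}} \int_{x_1 \in \phi^{-1}(\Delta(x_{II}))} c\,|\partial_1 \phi_1(x)| = \int_{x_{II} \in \Delta_{II}} \int_{x_1 \in \phi^{-1}(\Delta(x_{II}))} c\,|\det \phi'(x)| = \int_{\phi^{-1}(\Delta)} c\,|\det \phi'(x)|,$$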

which is equation (4.49) in this case.

Next it will be demonstrated that for any cell $\Delta \subset \overline{\Delta} \subset E$ and for any function $f$ that is bounded in $\Delta$ and is continuous almost everywhere in $\Delta$ the induced function $(f \circ \phi)\,|\det \phi'|$ is bounded in $\phi^{-1}(\Delta)$ and is continuous almost everywhere in $\phi^{-1}(\Delta)$ and

(4.50) $$\int_\Delta f = \int_{\phi^{-1}(\Delta)} (f \circ \phi)\,|\det \phi'|,$$

provided again that the mapping $\phi : D \to E$ has any one of the preceding three special forms. Of course it is clear that $(f \circ \phi)\,|\det \phi'|$ is bounded in $\phi^{-1}(\Delta)$, but the remainder is to be proved. For any partition $P$ of the cell $\Delta$ as the union $\Delta = \bigcup_i \Delta_i$, introduce the function $f^*$ in $\Delta$ defined by

(4.51) $$f^*(y) = \begin{cases} \sup_{z \in \Delta_i} f(z) & \text{if } y \in \Delta_i, \\[0.5ex] M & \text{if } y \in \bigcup_j \partial \Delta_j, \end{cases}$$


where $M$ is an upper bound for $|f(x)|$ in $\Delta$. The function $f^*$ is constant in each open cell $\Delta_i$ of the partition $P$, so for any mapping $\phi : D \to E$ of one of the three special forms considered here it follows from what was demonstrated in the preceding paragraph that

(4.52) $$\int_{\Delta_i} f^* = \int_{\phi^{-1}(\Delta_i)} (f^* \circ \phi)\,|\det \phi'|.$$

The function $f^*$ is continuous except on the union of the boundaries of the cells $\Delta_i$, and that union is a compact set of measure 0 so $f^*$ is Riemann integrable over $\Delta$, as is $(f^* \circ \phi)\,|\det \phi'|$ over $\phi^{-1}(\Delta)$. The cells $\Delta_i$ are disjoint except for their boundaries, as are the special Jordan domains $\phi^{-1}(\Delta_i)$, so it follows from (4.52) and Theorem 4.10 that

(4.53) $$\int_\Delta f^* = \sum_i \int_{\Delta_i} f^* = \sum_i \int_{\phi^{-1}(\Delta_i)} (f^* \circ \phi)\,|\det \phi'| = \int_{\phi^{-1}(\Delta)} (f^* \circ \phi)\,|\det \phi'|.$$

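From the definition (4.51) it follows that $\int_{\Delta_i} f^* = \big( \sup_{y \in \Delta_i} f(y) \big)\,|\Delta_i|$, hence

(4.54) $$\int_\Delta f^* = \sum_i \int_{\Delta_i} f^* = \sum_i \Big( \sup_{y \in \Delta_i} f(y) \Big)\,|\Delta_i| = S^*(f, P).$$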

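It is also clear from (4.51) that $(f^* \circ \phi)(x)\,|\det \phi'(x)| \ge (f \circ \phi)(x)\,|\det \phi'(x)|$ at all points $x \in \phi^{-1}(\Delta)$, so the upper sums of these two functions for any partition of a cell containing the special Jordan domain $\phi^{-1}(\Delta)$ satisfy the corresponding inequality and consequently so do the upper integrals, hence

(4.55) $$\int_{\phi^{-1}(\Delta)} (f^* \circ \phi)\,|\det \phi'| = \int^{*}_{\phi^{-1}(\Delta)} (f^* \circ \phi)\,|\det \phi'| \ge \int^{*}_{\phi^{-1}(\Delta)} (f \circ \phi)(x)\,|\det \phi'(x)|.$$

It then follows from (4.53), (4.54) and (4.55) that

(4.56) $$S^*(f, P) \ge \int^{*}_{\phi^{-1}(\Delta)} (f \circ \phi)(x)\,|\det \phi'(x)|.$$

The corresponding argument for the lower integral, for which the inequality is reversed, shows that

$$S_*(f, P) \le \int_{*\,\phi^{-1}(\Delta)} (f \circ \phi)(x)\,|\det \phi'(x)|,$$

so altogether

(4.57) $$S^*(f, P) \ge \int^{*}_{\phi^{-1}(\Delta)} (f \circ \phi)(x)\,|\det \phi'(x)| \ge \int_{*\,\phi^{-1}(\Delta)} (f \circ \phi)(x)\,|\det \phi'(x)| \ge S_*(f, P).$$

Since this holds for any partition $P$ of the cell $\Delta$ and by hypothesis the function $f$ is Riemann integrable it must be the case that

(4.58) $$\int_\Delta f \ge \int^{*}_{\phi^{-1}(\Delta)} (f \circ \phi)(x)\,|\det \phi'(x)| \ge \int_{*\,\phi^{-1}(\Delta)} (f \circ \phi)(x)\,|\det \phi'(x)| \ge \int_\Delta f,$$

which in view of Theorem 4.4 shows that $(f \circ \phi)\,|\det \phi'|$ is Riemann integrable and in addition demonstrates (4.50).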

It is then a straightforward matter to show that if $\phi : D \to E$ is an arbitrary $C^1$ homeomorphism and if $f$ is a function that is continuous almost everywhere in $E$ then for any point $b_0 \in E$

(4.59) $$\int_N f = \int_{\phi^{-1}(N)} (f \circ \phi)\,|\det \phi'|$$

for any sufficiently small open special Jordan domain $N$ such that $b_0 \in N \subset \overline{N} \subset E$. Indeed since $\det \phi' \ne 0$ for a $C^1$ homeomorphism $\phi : D \to E$ it follows from Theorem 3.3 that over a sufficiently small open neighborhood $U_0$ of the point $b_0$ the mapping $\phi$ can be written as the composition

(4.60) $$\phi = \phi_r \circ \phi_{r-1} \circ \cdots \circ \phi_2 \circ \phi_1$$

of $C^1$ homeomorphisms that are either translations, permutations of the variables, or mappings that change only a single variable; thus there results the chain of $C^1$ homeomorphisms

$$U_r \xrightarrow{\ \phi_r\ } U_{r-1} \xrightarrow{\ \phi_{r-1}\ } \cdots\ U_i \xrightarrow{\ \phi_i\ } U_{i-1}\ \cdots \xrightarrow{\ \phi_2\ } U_1 \xrightarrow{\ \phi_1\ } U_0$$

between open subsets of $\mathbb{R}^n$, where $b_0 \in U_0 \subset E$ and $a_0 = \phi^{-1}(b_0) \in U_r \subset D$, such that each of the $C^1$ homeomorphisms $\phi_i$ is either a translation, a permutation of the variables, or a mapping that changes only a single variable. If $N \subset \overline{N} \subset U_0$ is an open cell containing the point $b_0$ then the inverse images $\phi_i^{-1} \circ \cdots \circ \phi_1^{-1}(N) = N_i \subset U_i$ are special Jordan domains; and if $N$ is sufficiently small then there will be cells $\Delta_i \subset U_i$ such that $N_i \subset \Delta_i \subset \overline{\Delta}_i \subset U_i$ for each set $N_i$. The restriction $f|N_0$ is a bounded function in $N_0$ that is continuous almost everywhere in $N_0$; and the induced functions

$$f_i = (f|N_0) \circ \phi_1 \circ \cdots \circ \phi_i$$

then are bounded functions in $N_i$ that are continuous almost everywhere in $N_i$, and $f_{i-1} \circ \phi_i = f_i$ as in the accompanying diagram.

$$\begin{array}{ccc} U_i & \xrightarrow{\ \phi_i\ } & U_{i-1} \\ \cup & & \cup \\ N_i & \longrightarrow & N_{i-1} \end{array} \qquad f_i = f_{i-1} \circ \phi_i$$

It follows from (4.50) that

(4.61) $$\int_{N_{i-1}} f_{i-1} = \int_{N_i} (f_{i-1} \circ \phi_i)\,|\det \phi_i'|$$




for each $i$, since the $C^1$ homeomorphisms are of the form for which that result applies. Actually that result was only demonstrated for mappings for which the image is a cell; but the functions $f_i$ can of course be extended by 0 to functions $f_i$ in the cells $\Delta_i$, for which $\int_{\Delta_i} f_i = \int_{N_i} f_i$, and (4.50) can be applied to these extensions. That shows that (4.59) holds for each of these separate mappings $\phi_i$, the composition of all of which is the mapping $\phi$ as in (4.60). To demonstrate (4.59) for the mapping $\phi$ it suffices just to show that if (4.59) holds for two $C^1$ homeomorphisms $\phi : D \to E$ and $\psi : E \to F$ then it also holds for the composition $\psi \circ \phi : D \to F$. For that purpose, if $f$ is a Riemann integrable function in $F$ then by (4.59) applied to the two mappings $\phi$ and $\psi$ separately

$$\int_F f = \int_E (f \circ \psi)\,|\det \psi'| = \int_D \big( (f \circ \psi) \circ \phi \big)\,|\det \psi'(\phi)|\,|\det \phi'| = \int_D \big( f \circ (\psi \circ \phi) \big)\,|\det (\psi \circ \phi)'|$$

since $(\psi \circ \phi)' = \psi'(\phi)\,\phi'$ by the chain rule for differentiation.

Finally to complete the proof of the theorem itself, by Theorem 4.17 any open set $E$ is an extended Jordan domain which can be written as the union of an increasing sequence of open Jordan domains $E_i$, each of which in turn is a union $E_i = \bigcup_j \Delta_{ij}$ of cells $\Delta_{ij}$; and by passing to refinements of these cells it can be assumed by what already has been demonstrated that (4.48) holds for each of the cells $\Delta_{ij}$ and for any bounded function that is continuous almost everywhere in D. The inverse images of these cells under the mapping $\phi$ are special Jordan domains, and since these cells are disjoint aside from subsets of their boundaries, which have content 0, it follows from Theorem 4.10 that $\int_{E_i} f = \sum_j \int_{\Delta_{ij}} f$; and (4.48) holds for the integrals over each cell $\Delta_{ij}$ so it holds for the integrals over the Jordan domains $E_i$. The integral over $E$ is the limit of integrals over the Jordan domains $E_i$ as in (4.34) and (4.35), and that suffices for the proof.

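For an explicit example, consider the integral $\int_E \log(x_1^2 + x_2^2)$ where

$$E = \Big\{\, (x_1, x_2) \in \mathbb{R}^2 \;\Big|\; a^2 \le x_1^2 + x_2^2 \le b^2,\ x_1 \ge 0,\ x_2 \ge 0 \,\Big\}$$

for constants $0 < a < b$. This is most conveniently handled by introducing polar coordinates through the change of coordinates

$$\phi : \mathbb{R}^2(r, \theta) \to \mathbb{R}^2(x_1, x_2)$$

where $x_1 = r \cos\theta$, $x_2 = r \sin\theta$. The mapping $\phi$ establishes a one-to-one mapping $\phi : D \to E$ where

$$D = \Big\{\, (r, \theta) \in \mathbb{R}^2 \;\Big|\; a \le r \le b,\ 0 \le \theta \le \tfrac{\pi}{2} \,\Big\};$$

and

$$\phi'(r, \theta) = \frac{\partial(x_1, x_2)}{\partial(r, \theta)} = \begin{pmatrix} \dfrac{\partial x_1}{\partial r} & \dfrac{\partial x_1}{\partial \theta} \\[1.5ex] \dfrac{\partial x_2}{\partial r} & \dfrac{\partial x_2}{\partial \theta} \end{pmatrix} = \begin{pmatrix} \cos\theta & -r \sin\theta \\ \sin\theta & r \cos\theta \end{pmatrix}$$

so $|\det \phi'(r, \theta)| = r$. The formula for the change of variables and an application of Fubini's Theorem show that

$$\int_E \log(x_1^2 + x_2^2) = \int_D \log(r^2)\, r = \int_a^b \int_0^{\pi/2} 2 r \log r \,d\theta\,dr = \frac{\pi}{2} \int_a^b 2 r \log r \,dr = \frac{\pi}{2}\, r^2 \Big( \log r - \frac{1}{2} \Big) \Big|_a^b = \frac{\pi}{2} \big( b^2 \log b - a^2 \log a \big) - \frac{\pi}{4} \big( b^2 - a^2 \big).$$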
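The closed form above can be compared with direct numerical integration. This is only a rough sketch under stated assumptions: the radii $a = 1$, $b = 2$ are sample values, the region is read as $a \le r \le b$ in polar coordinates as in the set $D$, and the helper names and grid sizes are chosen here for illustration.

```python
import math

a, b = 1.0, 2.0   # sample radii with 0 < a < b, an assumption for this illustration

def polar_side(cells=2000):
    """Integral of log(r^2) * r over D; the theta integral contributes a factor pi/2."""
    h = (b - a) / cells
    radial = h * sum(math.log(r * r) * r
                     for r in (a + (i + 0.5) * h for i in range(cells)))
    return (math.pi / 2) * radial

def cartesian_side(cells=400):
    """Crude midpoint sum of log(x1^2 + x2^2) over the quarter annulus E."""
    h = b / cells
    total = 0.0
    for i in range(cells):
        x1 = (i + 0.5) * h
        for j in range(cells):
            x2 = (j + 0.5) * h
            s = x1 * x1 + x2 * x2
            if a * a <= s <= b * b:
                total += math.log(s)
    return total * h * h

closed_form = ((math.pi / 2) * (b * b * math.log(b) - a * a * math.log(a))
               - (math.pi / 4) * (b * b - a * a))
polar_value = polar_side()
cartesian_value = cartesian_side()
```

The polar computation matches the closed form closely, while the Cartesian sum agrees only roughly because the grid cells do not fit the curved boundary of $E$, which is one practical reason for changing variables.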




PROBLEMS: GROUP 1

(1) Evaluate the following integrals:

(i) $\int_E x_1^2$ where $E$ is the set $x_1 \ge 0$, $1 \le x_1^2 + x_2^2 \le 2$.

(ii) $\int_E x_1 e^{x_2}$ where $E \subset \mathbb{R}^2$ is the triangle with vertices $(0, 0)$, $(0, 2)$, $(2, 2)$.

(iii) $\int_E x_2 e^{x_1 x_2}$ where $E \subset \mathbb{R}^2$ is the set defined by the conditions $1 \le x_1 \le 2$, $x_2 \le 1$, $x_1 x_2 \ge 1$.

(2) Show that the integral $\int_{\mathbb{R}^2} x_1 x_2 e^{-x_1^2 - x_2^2}$ converges, and evaluate it.

(3) Consider the integral $\int_E x_3$ where $E \subset \mathbb{R}^3$ is the set defined by the conditions $x_1 \ge 0$, $x_2 \ge 0$, $0 \le x_3 \le 1$, $x_1^2 + x_2^2 \le 1$.

(i) Write this integral as an iterated integral in 3 different orders.

(ii) Evaluate this integral, by an iterated integration in any order you choose.

(4) Evaluate the following integrals, using a change of variables to simplify the calculation:

(i) $\int_E e^{-x_1^2 - x_2^2}$ where $E$ is the set of points $(x_1, x_2)$ in the unit disc in $\mathbb{R}^2$ for which $x_2 \ge 0$.

(ii) $\int_E \frac{1}{\sqrt{x_1^2 + x_2^2}}$ where $E$ is the set of points $x = (x_1, x_2)$ in the plane $\mathbb{R}^2$ for which $1 \le \|x\| \le 2$.

(5) Spherical coordinates in $\mathbb{R}^3$ are defined by

$$x_1 = r \sin\phi \cos\theta, \quad x_2 = r \sin\phi \sin\theta, \quad x_3 = r \cos\phi.$$

Find explicitly the formula for the change of variables expressing integration in $\mathbb{R}^3$ in terms of spherical coordinates. Use this to evaluate the integral $\int_E r$ of the function $r = \sqrt{x_1^2 + x_2^2 + x_3^2} \ge 0$ over the solid ball $E$ of radius 1.




PROBLEMS: GROUP 2

(6) Show that if $E \subset \mathbb{R}^n$ has content 0 then its boundary $\partial E$ also has content 0, but that if $E \subset \mathbb{R}^n$ has measure 0 its boundary $\partial E$ does not necessarily have measure 0.

(7) Show that if $f$ is a non-negative integrable function in a cell $\Delta$ such that $\int_\Delta f = 0$ then $\{\, x \in \Delta \mid f(x) \ne 0 \,\}$ is a set of measure 0.

(8) Show that if $f$ is a monotonically increasing real-valued function in an interval $[a, b] \subset \mathbb{R}^1$ then $f$ is necessarily continuous almost everywhere in $[a, b]$, and consequently is an integrable function.

(9) Let $E \subset \Delta$ be a set of content 0 contained in the cell

$$\Delta = \Big\{\, (x_1, x_2) \in \mathbb{R}^2 \;\Big|\; a_1 \le x_1 \le b_1,\ a_2 \le x_2 \le b_2 \,\Big\},$$

and let $F \subset [a_1, b_1]$ be the set of those points $x_1 \in [a_1, b_1]$ such that the set

$$F_{x_1} = \Big\{\, x_2 \in [a_2, b_2] \;\Big|\; (x_1, x_2) \in E \,\Big\} \subset [a_2, b_2]$$

as a subset of the one-dimensional cell $[a_2, b_2]$ does not have content 0. Show that $F$ is a set of measure 0 in the interval $[a_1, b_1]$. [Suggestion: the characteristic function $\chi_E$ is integrable, and Fubini's Theorem can be applied to it.]

(10) Show that if $E \subset \Delta$ is a Jordan domain then for any $\epsilon > 0$ there exists a compact Jordan domain $F \subset E$ such that $|E \sim F| < \epsilon$.

(11) Use polar coordinates to evaluate the integral $\int_E (1 + x_1^2 + x_2^2)^{-2}$ over the triangular region bounded by the triangle with the vertices $(0, 0)$, $(2, 0)$, $(1, \sqrt{3})$. (There are other ways of evaluating this integral of course, but for this problem please use polar coordinates.)



Chapter 5

Differential Forms

5.1 Line Integrals

A particularly interesting special case of integration over proper subsets of $\mathbb{R}^n$ is that of integrals over curves in $\mathbb{R}^n$. As on page 55, a parametrized curve in an open subset $U \subset \mathbb{R}^n$ is a continuous mapping $\phi : [a,b] \longrightarrow U$ from a closed interval into $\mathbb{R}^n$; the image of this mapping is a curve in $U$, a rather special type of subset $\phi([a,b]) \subset U$. It is useful to distinguish between a mapping and its image, so to distinguish between parametrized curves and curves; a parametrized curve has as its image a unique curve, but a curve can be the image of a great many different parametrized curves. Much of the subsequent discussion will focus on $C^1$ parametrized curves $\phi$, those for which the mapping extends to a $C^1$ mapping $\phi : (a - \epsilon, b + \epsilon) \longrightarrow U$ for some $\epsilon > 0$; but it is common also to consider $C^k$ parametrized curves for any integer $k > 0$, or even $C^\infty$ parametrized curves, those which are $C^k$ parametrized curves for all $k > 0$. Unless explicitly assumed otherwise, however, parametrized curves are merely continuous mappings. Also unless explicitly assumed otherwise, parametrized curves are not required to be one-to-one mappings $\phi : [a,b] \longrightarrow U$, nor are $C^1$ parametrized curves required to have a nonvanishing derivative $\phi'$; thus the images of $C^1$ parametrized curves are considerably more general than one-dimensional $C^1$ submanifolds.

Two parametrized curves $\phi : [a,b] \longrightarrow U$ and $\psi : [c,d] \longrightarrow U$ are equivalent parametrized curves if there is a homeomorphism $h : [a,b] \longrightarrow [c,d]$ such that $\phi(t) = \psi\big(h(t)\big)$ for all $t \in [a,b]$; this is a natural equivalence relation among parametrized curves. It is clear that if $\phi : [a,b] \longrightarrow U$ and $\psi : [c,d] \longrightarrow U$ are equivalent parametrized curves then $\phi([a,b]) = \psi([c,d])$; thus equivalent parametrized curves have the same image curves, but of course parametrized curves having the same image curves are not necessarily equivalent parametrized curves. In a natural extension of this definition, two $C^1$ parametrized curves $\phi : [a,b] \longrightarrow U$ and $\psi : [c,d] \longrightarrow U$ are $C^1$-equivalent parametrized curves if the mapping $h : [a,b] \longrightarrow [c,d]$ is a $C^1$ homeomorphism, and of course correspondingly for $C^k$ parametrized curves. It is often assumed as a standard convention that the parameter interval for a curve is the unit interval $[0,1]$, although other normalizations are sometimes more appropriate.

If $\phi : [0,1] \longrightarrow U$ is a parametrized curve in an open subset $U \subset \mathbb{R}^n$ and if $f : U \longrightarrow \mathbb{R}$ is a function in $U$, the composition $f \circ \phi$, also denoted by $\phi^*(f)$, is a well defined function on the parameter interval $[0,1]$ called the induced function or the pullback of the function $f$ under the mapping $\phi$; and if the function $f$ is continuous then the induced function $\phi^*(f)$ is a continuous function in $[0,1]$. The integral of a continuous function $f$ in $U$ on the parametrized curve $\phi : [0,1] \longrightarrow U$ is defined by

\[ \int_\phi f = \int_{[0,1]} \phi^*(f), \tag{5.1} \]

and has a well defined value for any parametrized curve. The integrals of a continuous function $f$ along two equivalent parametrized curves in $U$ need not be the same though. Indeed if $\phi : [0,1] \longrightarrow \mathbb{R}^n$ and $\psi : [0,1] \longrightarrow \mathbb{R}^n$ are $C^1$ equivalent parametrized curves then $\phi = \psi \circ h$ for a continuously differentiable mapping $h : [0,1] \longrightarrow [0,1]$ and it follows from the change of variables theorem, Theorem 4.19, that

\[ \int_\psi f = \int_{[0,1]} f \circ \psi = \int_{[0,1]} \big( (f \circ \psi) \circ h \big) \, |h'| = \int_{[0,1]} (f \circ \phi) \, |h'| \]

is not necessarily equal to $\int_{[0,1]} f \circ \phi = \int_\phi f$. Hence something further must be done in order to obtain an intrinsic integral of a function along a parametrized curve.

One way to obtain a more intrinsic integral is to use an intrinsic parametrization of the curve. If $\phi : [0,1] \longrightarrow U$ is a parametrized curve with image curve $\gamma = \phi([0,1]) \subset U$, then for any partition $P$ of the interval $[0,1]$ by points of subdivision $0 = t_0 < t_1 < \cdots < t_m = 1$ the parametrized curve $\phi$ can be approximated by the piecewise linear parametrized curve $\phi_P$ for which $\phi_P(t_i) = \phi(t_i)$ for each point of subdivision while the mapping $\phi_P$ is extended to be a linear mapping between the points $\phi(t_{i-1})$ and $\phi(t_i)$, so that if $t_{i-1} \le t \le t_i$ then

\[ \phi_P(t) = \phi(t_{i-1}) + \frac{t - t_{i-1}}{t_i - t_{i-1}} \big( \phi(t_i) - \phi(t_{i-1}) \big). \tag{5.2} \]

This is rather like approximating the curve $\gamma$ by the polygon joining the successive points $\phi(t_i)$; but the mapping $\phi$ may not be a one-to-one mapping, for instance it may go back and forth over various segments of the curve $\gamma$, and consequently the approximation of the mapping $\phi$ by the piecewise linear mapping $\phi_P$ may differ significantly from a polygonal approximation of the point set $\gamma$. The length of the piecewise linear approximation $\phi_P$ to the parametrized curve $\phi$ is

\[ L(\phi, P) = \sum_{i=1}^m \|\phi(t_i) - \phi(t_{i-1})\|_2, \tag{5.3} \]

which can be considered as an approximation to the length of the parametrized curve $\phi$. If $P'$ is a refinement of the partition $P$, obtained by adding a point of subdivision $t'$ between $t_{i-1}$ and $t_i$, the length $\|\phi(t_i) - \phi(t_{i-1})\|_2$ of the linear segment between the points $\phi(t_i)$ and $\phi(t_{i-1})$ of the piecewise linear approximation $\phi_P$ is replaced by the larger sum $\|\phi(t_i) - \phi(t')\|_2 + \|\phi(t') - \phi(t_{i-1})\|_2$ in the piecewise linear approximation $\phi_{P'}$, so that $L(\phi, P') \ge L(\phi, P)$; iterating this observation shows that the same inequality holds for any refinement $P'$ of the partition $P$. The arc length of the parametrized curve $\phi$ is defined by

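\[ L(\phi) = \sup_P L(\phi, P), \tag{5.4} \]

where the supremum is extended over all partitions $P$ of the parameter interval $[0,1]$. Of course it may be the case that this supremum is infinite; those parametrized curves for which the supremum is finite are called rectifiable curves, and for such curves the arc length (5.4) has a uniquely determined finite value. There are parametrized curves that are not rectifiable, such as the parametrized curve in $\mathbb{R}^2$ given by

\[ \phi(t) = \begin{cases} \big( t, \ t \sin \dfrac{\pi}{2t} \big) & \text{for } 0 < t \le 1, \\[1ex] (0, 0) & \text{for } t = 0. \end{cases} \tag{5.5} \]

If $P_n$ is the partition given by $0 < \frac{1}{n} < \frac{1}{n-1} < \cdots < \frac{1}{2} < 1$ then since

\[ \Big\| \phi\Big(\frac{1}{i}\Big) - \phi\Big(\frac{1}{i+1}\Big) \Big\|_2 \ge \Big| \frac{1}{i} \sin \frac{\pi i}{2} - \frac{1}{i+1} \sin \frac{\pi (i+1)}{2} \Big| = \begin{cases} \dfrac{1}{i+1} & \text{if } i \text{ is even,} \\[1ex] \dfrac{1}{i} & \text{if } i \text{ is odd,} \end{cases} \]

it is apparent that $\lim_{n \to \infty} L(\phi, P_n) = \infty$, by comparison with the harmonic series, hence that this parametrized curve is not rectifiable.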
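The divergence of the polygonal lengths can also be observed numerically. The following is a minimal sketch in Python, assuming the second coordinate of the curve is $t \sin(\pi/(2t))$, which is the form the estimate with $\sin(\pi i/2)$ uses; the names `phi`, `polygonal_length` and `partition_P` are illustrative choices and not notation from the text.

```python
import math

def phi(t):
    """Curve of (5.5), assumed here as (t, t*sin(pi/(2t))) for t > 0 and (0, 0) at t = 0."""
    if t == 0.0:
        return (0.0, 0.0)
    return (t, t * math.sin(math.pi / (2.0 * t)))

def polygonal_length(curve, points):
    """L(curve, P) as in (5.3): sum of Euclidean distances between successive points."""
    total = 0.0
    for t_prev, t_next in zip(points, points[1:]):
        total += math.dist(curve(t_prev), curve(t_next))
    return total

def partition_P(n):
    """Partition 0 < 1/n < 1/(n-1) < ... < 1/2 < 1 of the interval [0, 1]."""
    return [0.0] + [1.0 / i for i in range(n, 0, -1)]

# The lengths keep growing, roughly like log(n), instead of levelling off.
for n in (10, 100, 1000, 10000):
    print(n, polygonal_length(phi, partition_P(n)))
```

Since each $P_n$ with larger $n$ refines the earlier ones, the printed values are increasing, and their slow logarithmic growth reflects the comparison with the harmonic series.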

Lemma 5.1 If $\phi : [a,b] \longrightarrow U$ and $\psi : [c,d] \longrightarrow U$ are two equivalent parametrized curves and if $\phi$ is rectifiable then $\psi$ also is rectifiable and $L(\phi) = L(\psi)$.

Proof: Since the parametrized curves $\phi$ and $\psi$ are equivalent there is a homeomorphism $h : [a,b] \longrightarrow [c,d]$ for which $\phi = \psi \circ h$. Any partition

\[ P : \ a = t_0 < t_1 < \cdots < t_m = b \]

of the interval $[a,b]$ determines a partition

\[ h(P) : \ c = h(a) = h(t_0) < h(t_1) < \cdots < h(t_m) = h(b) = d \]

of the interval $[c,d]$, and conversely any partition of the interval $[c,d]$ arises in this way from a partition of the interval $[a,b]$. Since $\psi\big(h(t_i)\big) = \phi(t_i)$ it is clear from the definition (5.3) that $L\big(\psi, h(P)\big) = L(\phi, P)$, hence that $L(\psi) = \sup_P L\big(\psi, h(P)\big) = \sup_P L(\phi, P) = L(\phi)$ and consequently that $\psi$ also is rectifiable. That suffices for the proof.




Lemma 5.2 If $\phi : [0,1] \longrightarrow \mathbb{R}^n$ is a rectifiable parametrized curve and $L(\phi, [a,b])$ is the arc length of the parametrized curve defined by the restriction $\phi : [a,b] \longrightarrow \mathbb{R}^n$ for any subinterval $[a,b] \subset [0,1]$ then

(i) $L(\phi, [a,b]) \ge \|\phi(b) - \phi(a)\|_2$ if $0 \le a \le b \le 1$; and

(ii) $L(\phi, [a,b]) = L(\phi, [a,c]) + L(\phi, [c,b])$ if $0 \le a \le c \le b \le 1$.

Proof: (i) If $0 \le a \le b \le 1$ then the two points $a \le b$ form a particular partition $P$ of the interval $[a,b]$, so it follows from (5.3) and (5.4) for this partition that $\|\phi(b) - \phi(a)\|_2 = L(\phi, P) \le L(\phi, [a,b])$.

(ii) For any partition $P$ of the interval $[a,b]$ the length $L(\phi, P)$ is not decreased by adjoining the point $c$ to the other points of subdivision; so when examining the supremum of the lengths $L(\phi, P)$ for partitions $P$ of the interval $[a,b]$ it always can be assumed that $c$ is one of the points of subdivision of the partition $P$, and consequently that the partition $P$ consists of a partition $P_1$ of the interval $[a,c]$ and a partition $P_2$ of the interval $[c,b]$. Therefore $L(\phi, P) = L(\phi, P_1) + L(\phi, P_2)$ for all such partitions, so $L(\phi, [a,b]) = L(\phi, [a,c]) + L(\phi, [c,b])$, which suffices for the proof.

Lemma 5.3 If $\phi : [0,1] \longrightarrow \mathbb{R}^n$ is a rectifiable parametrized curve and $s(t) = L(\phi, [0,t])$ for any $t \in [0,1]$ then

(i) $s(t)$ is a monotonically increasing continuous function for which $s(0) = 0$ and $s(1) = l = L(\phi)$, and

(ii) $s(t_1) = s(t_2)$ for some parameter values $0 \le t_1 < t_2 \le 1$ if and only if the mapping $\phi$ is constant in the interval $[t_1, t_2]$.

Proof: (i) It follows immediately from Lemma 5.2 (ii) that if $0 \le t' \le t'' \le 1$ then $s(t'') = s(t') + L(\phi, [t', t'']) \ge s(t')$, so the function $s(t)$ is monotonically increasing. For any $\epsilon > 0$ there is a $\delta > 0$ such that

\[ \|\phi(t') - \phi(t'')\|_2 < \epsilon \quad \text{whenever } |t' - t''| < \delta \tag{5.6} \]

since the mapping $\phi$ is uniformly continuous on the compact interval $[0,1]$; moreover there is a partition $P$ of the interval $[0,1]$ such that

\[ 0 \le L(\phi) - L(\phi, P) < \epsilon \tag{5.7} \]

since the parametrized curve $\phi$ is rectifiable, and of course the same inequality holds for any refinement of the partition $P$. Hence if $t_0 \in [0,1]$ then after a refinement of the partition $P$ if necessary it can be assumed that $t_0 = t_k$ is one of the points of subdivision of the partition $P$ and that $|t_k - t_{k-1}| < \delta$ and $|t_{k+1} - t_k| < \delta$. For any point $t \in [t_{k-1}, t_k]$ adding $t$ to the points of subdivision of $P$ yields a refinement $P'$ of $P$ such that $t = t_{l-1}$ and $t_0 = t_l$ for some index $l$; and it follows from (5.7), which holds for the refinement $P'$, and from (5.6) and Lemma 5.2 that

\[ \epsilon > L(\phi) - L(\phi, P') = \sum_i \Big( L(\phi, [t_{i-1}, t_i]) - \|\phi(t_i) - \phi(t_{i-1})\|_2 \Big) \ge L(\phi, [t, t_0]) - \|\phi(t_0) - \phi(t)\|_2 \ge s(t_0) - s(t) - \epsilon \]

and consequently that $0 \le s(t_0) - s(t) \le 2\epsilon$. Of course the corresponding argument shows that $0 \le s(t) - s(t_0) \le 2\epsilon$ for points $t \in [t_k, t_{k+1}]$, so the function $s(t)$ is continuous.

(ii) If $\phi(t) = \phi(t_1)$ for all parameter values $t \in [t_1, t_2]$ it is clear from the definition that $L(\phi, P) = 0$ for any partition $P$ of the interval $[t_1, t_2]$ and consequently that $0 = L(\phi, [t_1, t_2]) = L(\phi, [0, t_2]) - L(\phi, [0, t_1]) = s(t_2) - s(t_1)$. On the other hand if $s(t_1) = s(t_2)$ then since $s(t)$ is continuous and monotonically increasing it follows that $s(t_1) = s(t)$ whenever $t_1 \le t \le t_2$, hence Lemma 5.2 (i) shows that $0 = s(t) - s(t_1) = L(\phi, [t_1, t]) \ge \|\phi(t) - \phi(t_1)\|_2$ and consequently that $\phi(t) = \phi(t_1)$. That suffices for the proof.

If $\phi : [0,1] \longrightarrow \mathbb{R}^n$ is a parametrized curve for which $s(t) = L(\phi, [0,t])$ is strictly increasing then the mapping $s : [0,1] \longrightarrow [0,l]$ is a homeomorphism, since it is a continuous one-to-one mapping between two compact sets; and if $t : [0,l] \longrightarrow [0,1]$ is the inverse homeomorphism then the composition $\phi_0(s) = \phi\big(t(s)\big)$ is a parametrized curve that is equivalent to the curve $\phi$. From Lemma 5.1 it follows that

\[ L(\phi_0, [0,s]) = L\big(\phi, [0, t(s)]\big) = s, \tag{5.8} \]

so the parametrized curve $\phi_0$ is parametrized by arc length. On the other hand if the function $s(t)$ fails to be strictly increasing then by Lemma 5.3 (ii) that is because the mapping $\phi$ is constant on some subintervals of $[0,1]$; and the mapping $\phi$ can be modified, essentially by shrinking such subintervals to points, so that $s(t)$ becomes strictly increasing. In more detail, $s : [0,1] \longrightarrow [0,l]$ is a surjective mapping, since the image of the connected set $[0,1]$ under the continuous mapping $s$ is also connected and contains the points $s(0) = 0$ and $s(1) = l$. Therefore for any point $s_0 \in [0,l]$ there is some point $t_0 \in [0,1]$ for which $s(t_0) = s_0$; and if there is another point $t_0'$ for which $s(t_0') = s_0$ it follows from Lemma 5.3 (ii) that $\phi(t_0) = \phi(t_0')$, so setting $\psi(s_0) = \phi(t_0)$ yields a well-defined mapping $\psi : [0,l] \longrightarrow \mathbb{R}^n$, independent of the choice of the particular point $t_0$, such that $\psi\big(s(t)\big) = \phi(t)$. This mapping $\psi$ is a continuous mapping, since if $0 \le s_1 \le s_2 \le l$ and if $t_1, t_2 \in [0,1]$ are any parameter values for which $s(t_1) = s_1$ and $s(t_2) = s_2$ then by Lemma 5.2 (i)

\[ \|\psi(s_2) - \psi(s_1)\|_2 = \|\phi(t_2) - \phi(t_1)\|_2 \le L(\phi, [t_1, t_2]) = s(t_2) - s(t_1) = s_2 - s_1. \]

Further $L(\psi, [0,s])$ is a strictly increasing function of $s$. Indeed if $L(\psi, [0,s_1]) = L(\psi, [0,s_2])$ where $s_1 < s_2$ then by Lemma 5.3 (ii) the mapping $\psi$ must be constant in the interval $[s_1, s_2]$ so in particular $\psi(s_1) = \psi(s_2)$; and if $s_1 = s(t_1)$ and $s_2 = s(t_2)$ then $\phi(t_1) = \psi\big(s(t_1)\big) = \psi\big(s(t_2)\big) = \phi(t_2)$ and consequently $s_1 = s(t_1) = s(t_2) = s_2$ by Lemma 5.3 (ii) again, a contradiction. Consequently $\psi$ also is equivalent to a parametrized curve $\phi_0$ that is parametrized by arc length. Thus to any rectifiable parametrized curve $\phi$ there is associated by the appropriate one of these two constructions a unique rectifiable parametrized curve $\phi_0$ that is parametrized by arc length, called the parametrization of the curve $\phi$ by arc length; and it is evident from the construction that equivalent rectifiable parametrized curves have the same parametrization by arc length. It is then possible to associate to any continuous function $f$ in an open neighborhood of the image $\phi([0,1])$ of a rectifiable curve $\phi$ a unique integral defined by

\[ \int_\phi f = \int_{[0,l]} f\big(\phi_0(s)\big) \tag{5.9} \]

where $\phi_0$ is the parametrization of the curve $\phi$ by arc length; this integral often is denoted by $\int_\phi f(s) \, ds$, with the convention that $ds$ always denotes integration with respect to arc length.

Theorem 5.4 A $C^1$ parametrized curve $\phi : [0,1] \longrightarrow U$ in an open subset $U \subset \mathbb{R}^n$ is rectifiable, and its arc length is

\[ L(\phi) = \int_{[0,1]} \|\phi'\|_2. \tag{5.10} \]

Proof: For any partition $P$ of the interval $[0,1]$ it follows from the fundamental theorem of calculus in one variable applied to each coordinate function of the mapping $\phi$ and the inequality (4.24) that

\[ \begin{aligned} L(\phi, P) &= \sum_{i=1}^m \|\phi(t_i) - \phi(t_{i-1})\|_2 = \sum_{i=1}^m \Big\| \int_{[t_{i-1}, t_i]} \phi' \Big\|_2 \\ &\le \sum_{i=1}^m \int_{[t_{i-1}, t_i]} \|\phi'\|_2 = \int_{[0,1]} \|\phi'\|_2; \end{aligned} \tag{5.11} \]

and since $\|\phi'\|_2$ is continuous on the full interval $[0,1]$ this integral is finite; hence the curve $\phi$ is rectifiable. For any point $t \in [0,1)$ and any $h > 0$ sufficiently small that $t + h \le 1$ it follows from (5.11) applied to the restriction of the mapping $\phi$ to the interval $[t, t+h]$ that

\[ L(\phi, [t, t+h]) \le \int_{[t, t+h]} \|\phi'\|_2. \tag{5.12} \]

On the other hand for any unit vector $u \in \mathbb{R}^n$ it follows from Lemma 5.2 (i), the Cauchy-Schwarz inequality and the mean value theorem applied to the real-valued function $u \cdot \phi(t)$ that

\[ L(\phi, [t, t+h]) \ge \|\phi(t+h) - \phi(t)\|_2 \ge \big| u \cdot \big( \phi(t+h) - \phi(t) \big) \big| = |u \cdot \phi'(\tau)| \, h \tag{5.13} \]

for some point $\tau$ in the interval $t < \tau < t + h$. Since $L(\phi, [t, t+h]) = L(\phi, [0, t+h]) - L(\phi, [0,t]) = s(t+h) - s(t)$ by Lemma 5.2 (ii) it follows from the inequalities (5.12) and (5.13) that

\[ |u \cdot \phi'(\tau)| \le \frac{1}{h} \big( s(t+h) - s(t) \big) \le \frac{1}{h} \int_{[t, t+h]} \|\phi'\|_2, \]

which in the limit as $h$ tends to 0, and consequently $\tau$ tends to $t$, becomes the inequality

\[ |u \cdot \phi'(t)| \le \lim_{h \to 0} \frac{1}{h} \big( s(t+h) - s(t) \big) \le \|\phi'(t)\|_2; \]

in particular since $u \cdot \phi'(t) = \|\phi'(t)\|_2$ when $u$ is the unit vector in the direction of the vector $\phi'(t)$, the preceding inequality shows that $\lim_{h \to 0} \frac{1}{h} \big( s(t+h) - s(t) \big) = \|\phi'(t)\|_2$. The corresponding argument for $h < 0$ yields the same result, hence the function $s(t)$ is differentiable and

\[ \frac{d}{dt} s(t) = \|\phi'(t)\|_2. \tag{5.14} \]

It finally follows from the fundamental theorem of calculus that

\[ L(\phi) = s(1) = \int_{[0,1]} \frac{d}{dt} s(t) = \int_{[0,1]} \|\phi'\|_2, \]

and that concludes the proof.

For example if $\phi : [0,3] \longrightarrow \mathbb{R}^2$ is the parametrized curve described by the mapping $\phi(t) = \big( \tfrac{1}{3} t^3 - t, \ t^2 \big)$ then $\phi'(t) = (t^2 - 1, \ 2t)$ so $\|\phi'(t)\|_2^2 = (t^2 - 1)^2 + 4t^2 = (t^2 + 1)^2$ and consequently the arc length is $L(\phi) = \int_{[0,3]} (t^2 + 1) = 12$. For any $C^1$ parametrized curve $\phi$ it follows from (5.14) that its arc length $s(t) = L(\phi, [0,t])$ is a $C^1$ mapping $s : [0,1] \longrightarrow [0,l]$ where $s(1) = l = L(\phi)$ is the arc length of $\phi$; and if it is also assumed that $\phi'(t) \ne 0$ at all points $t \in (0,1)$ then the function $s(t)$ is strictly increasing, and by the inverse mapping theorem it is a $C^1$ homeomorphism so the curve $\phi$ is $C^1$ equivalent to a curve that is parametrized by arc length.
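The value 12 in the example can be checked against both descriptions of arc length, the supremum of polygonal lengths (5.4) and the integral (5.10). The following is a minimal numerical sketch in Python; the uniform partitions, the midpoint rule and all function names are illustrative assumptions rather than anything prescribed in the text.

```python
import math

def phi(t):
    """The curve phi(t) = (t^3/3 - t, t^2) of the example, for t in [0, 3]."""
    return (t**3 / 3.0 - t, t**2)

def speed(t):
    """||phi'(t)||_2 for phi'(t) = (t^2 - 1, 2t); equals t^2 + 1."""
    return math.hypot(t**2 - 1.0, 2.0 * t)

def polygonal_length(curve, a, b, m):
    """L(curve, P) for the uniform partition of [a, b] into m subintervals."""
    pts = [a + (b - a) * i / m for i in range(m + 1)]
    return sum(math.dist(curve(u), curve(v)) for u, v in zip(pts, pts[1:]))

def midpoint_integral(g, a, b, m):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / m
    return sum(g(a + (i + 0.5) * h) for i in range(m)) * h

# Polygonal lengths approach 12 from below; the integral of the speed is 12.
for m in (10, 100, 1000):
    print(m, polygonal_length(phi, 0.0, 3.0, m), midpoint_integral(speed, 0.0, 3.0, m))
```

The polygonal lengths never exceed the integral, in line with the inequality (5.11).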

Corollary 5.5 A $C^1$ parametrized curve $\phi : [0,l] \longrightarrow \gamma \subset \mathbb{R}^n$ is the parametrization of a curve by arc length if and only if $\|\phi'(s)\|_2 = 1$ for all points $s \in [0,l]$.

Proof: If $\phi : [0,l] \longrightarrow \mathbb{R}^n$ is a $C^1$ curve parametrized by arc length then $s(t) = L(\phi, [0,t]) = t$ and it follows from (5.14) that $\|\phi'(t)\|_2 = s'(t) = 1$. Conversely if $\phi : [0,l] \longrightarrow \mathbb{R}^n$ is a $C^1$ parametrized curve for which $\|\phi'(s)\|_2 = 1$ then it follows from (5.10) that $L(\phi, [0,t]) = \int_{[0,t]} \|\phi'\|_2 = \int_{[0,t]} 1 = t$, and that suffices for the proof.

If $\gamma \subset \mathbb{R}^n$ is a $C^1$ curve parametrized by arc length under the mapping $\phi : [0,l] \longrightarrow \gamma \subset \mathbb{R}^n$, it follows from Corollary 3.10 to the Rank Theorem that the image of an open neighborhood of any parameter value $s_0 \in [0,l]$ is a $C^1$ submanifold of an open neighborhood of the point $\phi(s_0) \in U$. The vector $\phi'(s_0)$ is a tangent vector to that submanifold, as in the discussion on page 57, and has length $\|\phi'(s_0)\|_2 = 1$, by the preceding Corollary 5.5; thus it is the unit tangent vector to the local piece of the curve $\gamma$ at the point $\phi(s_0)$, and is denoted by $\tau(s_0) = \phi'(s_0)$. The curve $\gamma$ may intersect itself at some points, at which there are unit tangent vectors in various different directions, corresponding to the unit tangent vectors to the various local components of the curve $\gamma$. If $f$ is a continuous vector field in an open neighborhood of $\gamma$, the dot product $f\big(\phi(s)\big) \cdot \tau(s) = f\big(\phi(s)\big) \cdot \phi'(s)$ is a well defined continuous function of the variable $s \in [0,l]$ so it has a well defined integral

\[ \int_\gamma f \cdot \tau = \int_{[0,l]} f\big(\phi(s)\big) \cdot \phi'(s), \tag{5.15} \]

called the line integral of the vector field $f$ along the curve $\gamma$. This integral is not only intrinsically defined, but is actually independent of the parametrization of the curve.

Theorem 5.6 If $\psi : [a,b] \longrightarrow \mathbb{R}^n$ is a $C^1$ parametrized curve for which $\psi'(t) \ne 0$ for $a \le t \le b$ then

\[ \int_\psi f \cdot \tau = \int_{[a,b]} f\big(\psi(t)\big) \cdot \psi'(t) \tag{5.16} \]

for any continuous vector field $f$ in $U$.

Proof: The parametrized curve $\psi$ is equivalent to a curve $\phi$ parametrized by arc length, since $\psi'(t) \ne 0$; and the line integral of the vector field $f$ along the parametrized curve $\psi$ is defined in terms of the parametrized curve $\phi$. If $h : [a,b] \longrightarrow [0,l]$ is a $C^1$ homeomorphism such that $\psi(t) = \phi\big(h(t)\big)$ then by the chain rule $\psi'(t) = \phi'\big(h(t)\big) h'(t)$, so by the formula for the change of variables in integration for functions of a single variable it follows as in (5.15) that

\[ \int_\gamma f \cdot \tau = \int_{[0,l]} f\big(\phi(s)\big) \cdot \phi'(s) = \int_{[a,b]} f\big(\phi(h(t))\big) \cdot \phi'\big(h(t)\big) \, h'(t) = \int_{[a,b]} f\big(\psi(t)\big) \cdot \psi'(t), \]

which suffices for the proof.

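For example, consider the integral of the vector field $f(x) = (x_1^2, \ x_1 x_2)$ in terms of the variables $x_1, x_2$ in $\mathbb{R}^2$ over the parabola described parametrically by the mapping $\phi : [0,1] \longrightarrow \mathbb{R}^2$ for which $\phi(t) = (t, t^2)$. Since $\phi'(t) = (1, 2t)$ and $f\big(\phi(t)\big) = (t^2, t^3)$ it follows that $f\big(\phi(t)\big) \cdot \phi'(t) = t^2 + 2t^4$ and consequently

\[ \int_\gamma f \cdot \tau = \int_{[0,1]} f\big(\phi(t)\big) \cdot \phi'(t) = \int_{[0,1]} (t^2 + 2t^4) = \Big( \frac{1}{3} t^3 + \frac{2}{5} t^5 \Big) \Big|_0^1 = \frac{11}{15}. \]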
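The same computation can be carried out numerically from formula (5.16). The sketch below, in Python, approximates the integral of $f(\phi(t)) \cdot \phi'(t)$ by the midpoint rule; the helper `line_integral` and its arguments are hypothetical names introduced only for this illustration, and the derivative of the curve is supplied by hand.

```python
def line_integral(field, curve, velocity, a, b, m=10000):
    """Midpoint-rule approximation of the integral over [a, b] of field(curve(t)) . velocity(t)."""
    h = (b - a) / m
    total = 0.0
    for i in range(m):
        t = a + (i + 0.5) * h
        fx = field(curve(t))
        v = velocity(t)
        total += sum(fi * vi for fi, vi in zip(fx, v)) * h
    return total

f = lambda x: (x[0] ** 2, x[0] * x[1])   # vector field f(x) = (x1^2, x1*x2)
phi = lambda t: (t, t ** 2)              # parabola phi(t) = (t, t^2)
dphi = lambda t: (1.0, 2.0 * t)          # its derivative phi'(t) = (1, 2t)

value = line_integral(f, phi, dphi, 0.0, 1.0)
print(value, 11 / 15)
```

The two printed numbers agree to within the accuracy of the quadrature rule.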

A more interesting example is the line integral of a vector field that is the gradient $f(x) = \nabla h(x)$ of a function $h$ in an open set $U \subset \mathbb{R}^n$, a vector field called a conservative vector field with the potential $h$.




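Theorem 5.7 If $f(x) = \nabla h(x)$ is a conservative vector field in a connected open subset $U \subset \mathbb{R}^n$ with a $C^1$ potential $h$, and if $\psi : [0,1] \longrightarrow U$ is a $C^1$ curve in $U$ from a point $\psi(0) = a \in U$ to a point $\psi(1) = b \in U$ then

\[ \int_\psi f \cdot \tau = h(b) - h(a). \tag{5.17} \]

Proof: In view of the preceding theorem it is sufficient to demonstrate this result just for a curve parametrized by arc length; so let $\phi : [0,l] \longrightarrow U$ be the parametrization of a curve by arc length, where $\phi(s) = \big( \phi_j(s) \big)$ and $\phi(0) = a$, $\phi(l) = b$. By the chain rule

\[ \frac{d}{ds} h\big(\phi(s)\big) = h'\big(\phi(s)\big) \, \phi'(s) = \sum_{j=1}^n \partial_j h\big(\phi(s)\big) \, \phi_j'(s) = \sum_{j=1}^n f_j\big(\phi(s)\big) \, \phi_j'(s) = f\big(\phi(s)\big) \cdot \phi'(s), \]

and consequently

\[ \int_\gamma f \cdot \tau \, ds = \int_0^l f\big(\phi(s)\big) \cdot \phi'(s) \, ds = \int_0^l \frac{d}{ds} h\big(\phi(s)\big) \, ds = h\big(\phi(l)\big) - h\big(\phi(0)\big) = h(b) - h(a), \]

which suffices for the proof.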

Thus the integral of a conservative vector field along a curve really depends only on the beginning and end points of the curve, and is quite independent of the particular path taken between these two points. There are vector fields that are not conservative, so for which the line integral between two points depends on the path chosen between those two points. For example for the vector field $f(x) = (x_2, -x_1)$ in $\mathbb{R}^2$, and the parametrized curves $\psi_1(t) = (t, -t)$ and $\psi_2(t) = (t, -t^2)$ for $t \in [0,1]$,

\[ \int_{\psi_1} f \cdot \tau = \int_{[0,1]} (-t, -t) \cdot (1, -1) \, dt = \int_{[0,1]} 0 \, dt = 0, \]

\[ \int_{\psi_2} f \cdot \tau = \int_{[0,1]} (-t^2, -t) \cdot (1, -2t) \, dt = \int_{[0,1]} t^2 \, dt = \frac{1}{3}. \]
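The contrast between the two situations can be seen numerically. The following Python sketch integrates the field $(x_2, -x_1)$ of the example along $\psi_1$ and $\psi_2$, and for comparison also integrates a gradient field along the same two paths. The potential $h(x) = x_1 x_2$ is an assumption chosen here only for illustration and does not appear in the text, and `line_integral` is a hypothetical helper implementing formula (5.16) by the midpoint rule.

```python
def line_integral(field, curve, velocity, a=0.0, b=1.0, m=10000):
    """Midpoint-rule approximation of the integral over [a, b] of field(curve(t)) . velocity(t)."""
    h = (b - a) / m
    total = 0.0
    for i in range(m):
        t = a + (i + 0.5) * h
        total += sum(p * q for p, q in zip(field(curve(t)), velocity(t))) * h
    return total

# Two paths from (0, 0) to (1, -1), with their derivatives.
psi1, dpsi1 = (lambda t: (t, -t)), (lambda t: (1.0, -1.0))
psi2, dpsi2 = (lambda t: (t, -t * t)), (lambda t: (1.0, -2.0 * t))

rot = lambda x: (x[1], -x[0])    # the non-conservative field (x2, -x1) of the example
grad = lambda x: (x[1], x[0])    # gradient of the illustrative potential h(x) = x1*x2

rot1, rot2 = line_integral(rot, psi1, dpsi1), line_integral(rot, psi2, dpsi2)
grad1, grad2 = line_integral(grad, psi1, dpsi1), line_integral(grad, psi2, dpsi2)

print(rot1, rot2)      # path dependent: approximately 0 and 1/3
print(grad1, grad2)    # path independent: both approximately h(1, -1) - h(0, 0) = -1
```

For the gradient field both paths give the difference of the potential at the end points, as in (5.17), while the first field gives two different values.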

Actually the condition that the line integral of a vector field is independent of the path characterizes conservative vector fields.

Theorem 5.8 A $C^1$ vector field $f : U \longrightarrow \mathbb{R}^n$ in a connected open subset $U \subset \mathbb{R}^n$ is conservative if and only if the line integrals $\int_\phi f \cdot \tau$ depend only on the end points of $C^1$ parametrized curves $\phi$ in $U$.

Proof: The preceding theorem demonstrated that if $f$ is a conservative vector field then the line integral $\int_\phi f \cdot \tau$ depends just on the beginning and end points of any $C^1$ parametrized curve $\phi$ in $U$. Conversely suppose that $f$ is a $C^1$ vector field in a connected open subset $U \subset \mathbb{R}^n$ such that the line integrals $\int_\phi f \cdot \tau$ depend only on the beginning and end points of $C^1$ parametrized curves $\phi : [0,1] \longrightarrow U$. For a fixed base point $a \in U$ the integral $h(x) = \int_{\phi_x} f \cdot \tau$ along any $C^1$ parametrized curve $\phi_x$ from the point $a$ to the point $x$ is a well defined function of the point $x = (x_1, \ldots, x_n) \in U$, independent of the particular choice of the parametrized curve $\phi_x$. In particular consider a parametrized curve $\phi_x$ that describes a path from $a$ to the straight line $\lambda_i$ through the point $x$ parallel to the coordinate axis of the variable $x_i$ and then along $\lambda_i$ to the point $x$, where near the point $x$ the parametrization has the form

\[ \phi_x(t) = (x_1, \ldots, x_{i-1}, t, x_{i+1}, \ldots, x_n) \quad \text{for } b_i \le t \le x_i \]

so $\phi_x' = \delta_i$ is the unit vector parallel to $\lambda_i$. The function $h(x)$ then can be written as the sum of the value $A$ of the integral up to a point $\phi_x(b_i)$ on the line $\lambda_i$ and the integral

\[ h(x) - A = \int_{t \in [b_i, x_i]} f\big(\phi_x(t)\big) \cdot \tau = \int_{t \in [b_i, x_i]} f_i(x_1, \ldots, x_{i-1}, t, x_{i+1}, \ldots, x_n) \]

along $\lambda_i$; and it then follows from the fundamental theorem of calculus in one variable that $\partial_i h(x) = f_i(x)$. The corresponding argument for paths ending parallel to other coordinate axes in $\mathbb{R}^n$ shows that $\nabla h(x) = f(x)$, hence that the vector field $f$ is conservative, thus concluding the proof.




5.2 Differential forms

The differentials $dx_j$ of the coordinate functions $x_j$ in $\mathbb{R}^n$ are a basis for the vector space $\mathbb{R}^n$, as in Section 2.4. Vector fields over an open subset $U \subset \mathbb{R}^n$, mappings $f : U \longrightarrow \mathbb{R}^n$ with the coordinate functions $f(x) = \big( f_j(x) \big)$, when written as the sums

\[ \omega(x) = \sum_{j=1}^n f_j(x) \, dx_j \tag{5.18} \]

of multiples of the basis vectors $dx_j$, are called differential forms of degree 1 in $U$, or for short just differential 1-forms in $U$. Mappings $f : U \longrightarrow \mathbb{R}^{\binom{n}{r}}$ for indices $2 \le r \le n$ with the coordinate functions $f = \big( f_{j_1 \cdots j_r}(x) \big)$ for $1 \le j_1 < j_2 < \cdots < j_r \le n$, when written as the sums

\[ \omega(x) = \sum_{j_1 < \cdots < j_r} f_{j_1 \cdots j_r}(x) \, dx_{j_1} \wedge \cdots \wedge dx_{j_r} \tag{5.19} \]

of multiples of the basis vectors $dx_{j_1} \wedge \cdots \wedge dx_{j_r}$ that span the exterior algebra $\Lambda^r$ of $r$-vectors over the vector space $\mathbb{R}^n$ as discussed in Appendix A, are called differential forms¹ of degree $r$ in $U$, or for short just differential $r$-forms in $U$. These are continuous differential forms if the coefficients $f_j(x)$ or $f_{j_1 \cdots j_r}(x)$ are continuous functions in $U$, and are $C^k$ differential forms if these coefficients are $C^k$ functions in $U$. For differential forms of degree $r > 1$ it is sometimes convenient to introduce the additional vectors $dx_{i_1} \wedge \cdots \wedge dx_{i_r}$ for any order of the indices $i_1, \ldots, i_r$, where these satisfy the skew-symmetry conditions (A.2); thus each of these additional vectors is equal to either one of the basis vectors of $\Lambda^r$ or the negative of one of these basis vectors. Differential $r$-forms in $U$ can be written alternatively in terms of these vectors as sums

\[ \omega(x) = \sum_{i_1, \ldots, i_r = 1}^n g_{i_1 \cdots i_r}(x) \, dx_{i_1} \wedge \cdots \wedge dx_{i_r}. \tag{5.20} \]
¹ There is often some mystery made about what a differential form really is. Differential forms in open subsets of $\mathbb{R}^n$ are merely special vector-valued functions, described explicitly in terms of basis vectors associated to the basis vectors in $\mathbb{R}^n$; the detailed definitions are chosen to simplify the numerous calculations in linear algebra arising from the change of variables formula in various contexts, and to make the results of these calculations easier to remember and to use. The properties of differential forms under changes of coordinates in $\mathbb{R}^n$ show that they are really rather intrinsically defined, which of course raises the temptation to define them intrinsically. Unfortunately the traditional and perhaps easiest intrinsic approach is first through the dual vector space, and then through a more general discussion of the relations between the exterior algebras over dual vector spaces; for a first approach in multivariable calculus that seems rather unnecessarily complicated and distracting, which is the motivation for the alternative though nonintrinsic approach adopted in these notes.

The coefficients $f_{j_1 \cdots j_r}(x)$ in (5.19) are of course uniquely determined, while the coefficients $g_{i_1 \cdots i_r}(x)$ in (5.20) are not uniquely determined. For example a differential form of degree 2 in $\mathbb{R}^3$ can be written

\[ \begin{aligned} \omega(x) = {} & g_{12}(x) \, dx_1 \wedge dx_2 + g_{13}(x) \, dx_1 \wedge dx_3 \\ & + g_{21}(x) \, dx_2 \wedge dx_1 + g_{23}(x) \, dx_2 \wedge dx_3 \\ & + g_{31}(x) \, dx_3 \wedge dx_1 + g_{32}(x) \, dx_3 \wedge dx_2, \end{aligned} \tag{5.21} \]

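or since $dx_{i_1} \wedge dx_{i_2} = -dx_{i_2} \wedge dx_{i_1}$ it can be written alternatively

\[ \begin{aligned} \omega(x) = {} & \big( g_{12}(x) - g_{21}(x) \big) \, dx_1 \wedge dx_2 + \big( g_{13}(x) - g_{31}(x) \big) \, dx_1 \wedge dx_3 \\ & + \big( g_{23}(x) - g_{32}(x) \big) \, dx_2 \wedge dx_3, \end{aligned} \tag{5.22} \]

thus as the differential form

\[ \omega(x) = f_{12}(x) \, dx_1 \wedge dx_2 + f_{13}(x) \, dx_1 \wedge dx_3 + f_{23}(x) \, dx_2 \wedge dx_3 \tag{5.23} \]

where

\[ f_{12}(x) = g_{12}(x) - g_{21}(x), \quad f_{13}(x) = g_{13}(x) - g_{31}(x), \quad f_{23}(x) = g_{23}(x) - g_{32}(x). \tag{5.24} \]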

The functions $f_{j_1 j_2}$ are uniquely determined; but it is possible for instance to replace $g_{12}(x)$ and $g_{21}(x)$ by $g_{12}(x) + g(x)$ and $g_{21}(x) + g(x)$ for any function $g(x)$ and still have the same differential 2-form. Of course since $dx_1 \wedge dx_1 = dx_2 \wedge dx_2 = dx_3 \wedge dx_3 = 0$ the coefficients $g_{ii}(x)$ are totally irrelevant. A differential form written as in (5.19) is said to be in the reduced form, while a differential form written as in (5.20) is said to be in the unreduced form. It is often convenient to use multi-index notation, writing the unreduced form (5.20) as

\[ \omega(x) = \sum_I g_I(x) \, dx_I \tag{5.25} \]

where $I = (i_1, i_2, \ldots, i_r)$ and $g_I(x)$ is an abbreviation for $g_{i_1 \cdots i_r}(x)$ while $dx_I$ is an abbreviation for $dx_{i_1} \wedge \cdots \wedge dx_{i_r}$, and writing the reduced form (5.19) as

\[ \omega(x) = {\sum_I}' f_I(x) \, dx_I \tag{5.26} \]

with the corresponding meaning of the terms; thus $\sum_I$ denotes a sum over all indices $i_1, \ldots, i_r$ in the range $1 \le i_1, \ldots, i_r \le n$ while ${\sum_I}'$ denotes a sum over all ordered sets of indices $1 \le i_1 < \cdots < i_r \le n$. The set of $C^k$ differential forms of degree $r$ in the set $U$ is denoted by $\Lambda^r_{(k)}(U)$; however it is quite common to drop the subscript $k$ when the regularity of the differential forms is either clear from the context or irrelevant. By definition the set $\Lambda^0_{(k)}(U)$ of differential forms of degree 0 in $U$ is just the set of $C^k$ functions in $U$, while the set $\Lambda^n_{(k)}(U)$ consists of differential forms of degree $n$ in $U$, each of which can be written uniquely in the reduced form as

\[ \omega(x) = f_{12 \cdots n}(x) \, dx_1 \wedge \cdots \wedge dx_n; \tag{5.27} \]




this set of differential forms also can be identified with the set of $C^k$ functions in $U$ by identifying the differential form (5.27) with the function $f_{12 \cdots n}(x)$. Of course $\Lambda^r_{(k)}(U) = 0$ whenever $r > n$, as a consequence of (A.2).

Differential forms can be combined algebraically, either by linear combinations or by exterior multiplication. The linear combination $a\omega(x) + b\sigma(x)$ of two differential $r$-forms, written in reduced form as $\omega(x) = {\sum_I}' f_I(x) \, dx_I$ and $\sigma(x) = {\sum_I}' g_I(x) \, dx_I$, for $a, b \in \mathbb{R}$ is defined to be the differential $r$-form

\[ \big( a\omega + b\sigma \big)(x) = {\sum_I}' \big( a f_I(x) + b g_I(x) \big) \, dx_I; \tag{5.28} \]

under this operation the set $\Lambda^r_{(k)}(U)$ has the natural structure of a real vector space, although an infinite-dimensional vector space. The exterior product $\omega(x) \wedge \sigma(x)$ of a differential $r$-form $\omega(x) = {\sum_I}' f_I(x) \, dx_I \in \Lambda^r_{(k)}(U)$ and a differential $s$-form $\sigma(x) = {\sum_J}' g_J(x) \, dx_J \in \Lambda^s_{(k)}(U)$ is defined to be the differential $(r+s)$-form

\[ (\omega \wedge \sigma)(x) = {\sum_{I,J}}' f_I(x) g_J(x) \, dx_I \wedge dx_J \in \Lambda^{r+s}_{(k)}(U). \tag{5.29} \]

This operation is clearly associative, in the sense that

\[ \omega_1 \wedge \big( \omega_2 \wedge \omega_3 \big) = \big( \omega_1 \wedge \omega_2 \big) \wedge \omega_3 \tag{5.30} \]

for any differential forms $\omega_i \in \Lambda^{r_i}_{(k)}(U)$. However it is not commutative but rather satisfies the condition that

\[ (\omega \wedge \sigma)(x) = (-1)^{rs} (\sigma \wedge \omega)(x) \quad \text{if } \omega \in \Lambda^r_{(k)}(U) \text{ and } \sigma \in \Lambda^s_{(k)}(U); \tag{5.31} \]

for $dx_I \wedge dx_J = (-1)^{rs} dx_J \wedge dx_I$ since to rearrange the left-hand side each differential $dx_j$ must be permuted with each of the $r$ differentials $dx_i$, a change of sign $(-1)^r$ for each of the $s$ differentials $dx_j$. In particular the exterior product of a 0-form, a function $f(x)$, and an $r$-form $\omega(x) = {\sum_J}' g_J(x) \, dx_J$ is the $r$-form $(f\omega)(x) = {\sum_J}' f(x) g_J(x) \, dx_J$. Exterior products and linear operations are related by the distributive condition that

\[ (a_1 \omega_1 + a_2 \omega_2) \wedge \sigma = a_1 \, \omega_1 \wedge \sigma + a_2 \, \omega_2 \wedge \sigma \tag{5.32} \]

for differential forms $\omega_i \in \Lambda^r_{(k)}(U)$ and $\sigma \in \Lambda^s_{(k)}(U)$, as is evident from the definitions.
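The sign rule (5.31) and the associativity (5.30) can be tested on concrete examples. The following Python sketch represents the value of a differential form at a single point by a dictionary from increasing index tuples to numerical coefficients, that is, by its reduced form; this representation and the names `sort_sign` and `wedge` are assumptions made for illustration only.

```python
def sort_sign(indices):
    """Sort a tuple of indices; return (sign, sorted_tuple), with sign 0 if an index repeats."""
    idx = list(indices)
    if len(set(idx)) != len(idx):
        return 0, ()
    sign = 1
    # Bubble sort, flipping the sign once for every transposition of neighbours.
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)

def wedge(omega, sigma):
    """Exterior product of two forms given in reduced form, as in (5.29)."""
    result = {}
    for I, f in omega.items():
        for J, g in sigma.items():
            sign, K = sort_sign(I + J)
            if sign != 0:
                result[K] = result.get(K, 0) + sign * f * g
    return {K: c for K, c in result.items() if c != 0}

alpha = {(1,): 1, (2,): 2}        # 1-form  dx1 + 2 dx2
beta = {(1,): 3, (3,): 4}         # 1-form  3 dx1 + 4 dx3
sigma = {(2, 4): 3, (1, 2): 5}    # 2-form  3 dx2^dx4 + 5 dx1^dx2

print(wedge(alpha, beta))    # r = s = 1: changes sign when the factors are exchanged
print(wedge(beta, alpha))
print(wedge(alpha, sigma))   # r = 1, s = 2: unchanged when the factors are exchanged
print(wedge(sigma, alpha))
```

Functions as coefficients could be handled in the same way by evaluating them at a point first; only the bookkeeping of signs is illustrated here.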

These algebraic operations on differential forms are quite familiar in the special case of three dimensional vector spaces. A vector field in an open set $U \subset \mathbb{R}^3$ is naturally identified with a differential 1-form, as already discussed; but since $\dim \Lambda^2(\mathbb{R}^3) = 3$ a vector field also can be identified with a differential 2-form. Thus to a vector field $f(x) = \big( f_j(x) \big)$ in $U \subset \mathbb{R}^3$ there are naturally associated the two differential forms

\[ \begin{aligned} \omega_f(x) &= f_1(x) \, dx_1 + f_2(x) \, dx_2 + f_3(x) \, dx_3 \in \Lambda^1(U) \quad \text{and} \\ \Omega_f(x) &= f_1(x) \, dx_2 \wedge dx_3 + f_2(x) \, dx_3 \wedge dx_1 + f_3(x) \, dx_1 \wedge dx_2 \in \Lambda^2(U), \end{aligned} \tag{5.33} \]



where the notation for the 2-form $\Omega_f(x)$, involving cyclic permutations of the indices, is chosen to fit into the classical form for the algebraic operations. The exterior product of the 1-form associated to a vector field $f : U \longrightarrow \mathbb{R}^3$ and the 2-form associated to a vector field $g : U \longrightarrow \mathbb{R}^3$ is easily seen to be the 3-form

\[ \omega_f(x) \wedge \Omega_g(x) = \big( f_1(x) g_1(x) + f_2(x) g_2(x) + f_3(x) g_3(x) \big) \, dx_1 \wedge dx_2 \wedge dx_3; \tag{5.34} \]

this is the 3-form associated to the dot product or inner product $f(x) \cdot g(x)$ of the two vector fields $f(x)$ and $g(x)$, so that

\[ \omega_f(x) \wedge \Omega_g(x) = f(x) \cdot g(x) \, dx_1 \wedge dx_2 \wedge dx_3, \tag{5.35} \]

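and in this case the exterior product is commutative. On the other hand a straightforward calculation shows that the exterior product of the 1-forms associated to two vector fields $f : U \longrightarrow \mathbb{R}^3$ and $g : U \longrightarrow \mathbb{R}^3$ is the 2-form

\[ \begin{aligned} \omega_f(x) \wedge \omega_g(x) = {} & \det \begin{pmatrix} f_2(x) & f_3(x) \\ g_2(x) & g_3(x) \end{pmatrix} dx_2 \wedge dx_3 + \det \begin{pmatrix} f_3(x) & f_1(x) \\ g_3(x) & g_1(x) \end{pmatrix} dx_3 \wedge dx_1 \\ & + \det \begin{pmatrix} f_1(x) & f_2(x) \\ g_1(x) & g_2(x) \end{pmatrix} dx_1 \wedge dx_2, \end{aligned} \tag{5.36} \]

which is the 2-form associated to the vector field $f(x) \times g(x)$, traditionally called the cross-product of the two vector fields $f(x)$ and $g(x)$, defined by

\[ f(x) \times g(x) = \left( \det \begin{pmatrix} f_2(x) & f_3(x) \\ g_2(x) & g_3(x) \end{pmatrix}, \ \det \begin{pmatrix} f_3(x) & f_1(x) \\ g_3(x) & g_1(x) \end{pmatrix}, \ \det \begin{pmatrix} f_1(x) & f_2(x) \\ g_1(x) & g_2(x) \end{pmatrix} \right); \tag{5.37} \]

when this vector is expressed as a linear combination of some basis vectors $\epsilon_1, \epsilon_2, \epsilon_3$ it can be written symbolically as the determinant

\[ f(x) \times g(x) = \det \begin{pmatrix} \epsilon_1 & \epsilon_2 & \epsilon_3 \\ f_1(x) & f_2(x) & f_3(x) \\ g_1(x) & g_2(x) & g_3(x) \end{pmatrix}, \tag{5.38} \]

which is the easiest form in which to remember the defining equation (5.37). In these terms then

\[ \omega_f(x) \wedge \omega_g(x) = \Omega_{f \times g}(x); \tag{5.39} \]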

the exterior product of 1-forms is skew-symmetric, as is the cross-product, so

(5.40) ωg(x) ∧ ωf (x) = −ωf (x) ∧ ωg(x) and f (x) × g(x) = −g(x) × f (x).

It is clear from (5.37) that the cross-product $f(x) \times g(x)$ is trivial at a point $x$ precisely when the two vectors $f(x)$ and $g(x)$ are linearly dependent. It is also clear from (5.37) that for any other vector field $h(x)$ in $U$

(5.41) $$h(x) \cdot \bigl( f(x) \times g(x) \bigr) = \det\begin{pmatrix} h_1(x) & h_2(x) & h_3(x) \\ f_1(x) & f_2(x) & f_3(x) \\ g_1(x) & g_2(x) & g_3(x) \end{pmatrix},$$

which provides an alternative characterization of the cross-product. An evident consequence of (5.41) is that

$f(x) \cdot \bigl( f(x) \times g(x) \bigr) = g(x) \cdot \bigl( f(x) \times g(x) \bigr) = 0$;

hence the cross-product $f(x) \times g(x)$ is a vector in $\mathbb{R}^3$ that is perpendicular to both $f(x)$ and $g(x)$ if these vectors are linearly independent, so its direction is determined uniquely up to sign. As for the sign, it follows from (5.37) that if $f(x_0) = (1, 0, 0)$ and $g(x_0) = (0, 1, 0)$ at a point $x_0 \in U$ then $f(x_0) \times g(x_0) = (0, 0, 1)$; consequently the three vectors $f(x_0)$, $g(x_0)$, $f(x_0) \times g(x_0)$ have the same orientation as the three basis vectors $dx_1, dx_2, dx_3 \in \mathbb{R}^3$, the natural orientation of the vector space $\mathbb{R}^3$ associated with this choice of coordinates.
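The properties of the cross-product just listed can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the function names `det2`, `cross`, `dot` and `det3` are illustrative choices, and the sample vectors are arbitrary. It builds the cross-product from the three cyclic 2x2 determinants of (5.37) and tests (5.40), (5.41) and the orientation convention.

```python
def det2(a, b, c, d):
    """Determinant of the 2x2 matrix with rows (a, b) and (c, d)."""
    return a * d - b * c


def cross(f, g):
    """Cross-product of two 3-vectors via the cyclic determinants of (5.37)."""
    return (
        det2(f[1], f[2], g[1], g[2]),
        det2(f[2], f[0], g[2], g[0]),
        det2(f[0], f[1], g[0], g[1]),
    )


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def det3(h, f, g):
    """Determinant of the 3x3 matrix with rows h, f, g, written out explicitly."""
    return (
        h[0] * (f[1] * g[2] - f[2] * g[1])
        - h[1] * (f[0] * g[2] - f[2] * g[0])
        + h[2] * (f[0] * g[1] - f[1] * g[0])
    )


f, g, h = (1, 2, 3), (4, 5, 6), (7, 8, 10)
fxg = cross(f, g)
print(fxg)                                # the vector f x g
print(dot(f, fxg), dot(g, fxg))           # both vanish: f x g is perpendicular to f and g
print(dot(h, fxg) == det3(h, f, g))       # the triple product formula (5.41)
print(cross((1, 0, 0), (0, 1, 0)))        # the orientation convention
```

Exact integer arithmetic is used so that the identities hold without rounding error.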

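Another algebraic operation is Hodge duality, the linear mapping

(5.42) $* : \Lambda^r(U) \to \Lambda^{n-r}(U)$ for open sets $U \subset \mathbb{R}^n$

that associates to a differential form $\omega = \sum_I f_I\, dx_I \in \Lambda^r(U)$ the differential form

(5.43) $$*\omega = \frac{1}{r!\,(n-r)!} \sum_{I,J} \epsilon(I,J)\, f_I\, dx_J = \sum_{\substack{i_1 < \cdots < i_r \\ j_1 < \cdots < j_{n-r}}} \epsilon(I,J)\, f_I\, dx_J$$

where $\epsilon(I,J)$ is the sign of the permutation $I, J$ of the integers $1, 2, \ldots, n$. Thus for example in $\mathbb{R}^2$

(5.44) $$*(f_1\, dx_1 + f_2\, dx_2) = \sum_{1 \le i,j \le 2} \epsilon(i,j)\, f_i\, dx_j = f_1\, dx_2 - f_2\, dx_1$$

and in $\mathbb{R}^3$

(5.45) $$\begin{aligned} *(f_1\, dx_1 + f_2\, dx_2 + f_3\, dx_3) &= \frac{1}{2} \sum_{1 \le i, j_1, j_2 \le 3} \epsilon(i, j_1, j_2)\, f_i\, dx_{j_1} \wedge dx_{j_2} \\ &= \sum_{1 \le i \le 3}\ \sum_{1 \le j_1 < j_2 \le 3} \epsilon(i, j_1, j_2)\, f_i\, dx_{j_1} \wedge dx_{j_2} \\ &= f_1\, dx_2 \wedge dx_3 - f_2\, dx_1 \wedge dx_3 + f_3\, dx_1 \wedge dx_2 \end{aligned}$$

or equivalently $*\omega_f = \Omega_f$ for the differential forms associated to any vector field $f$ in an open subset $U \subset \mathbb{R}^3$. This operation has the property that

(5.46) $* * \omega = (-1)^{r(n-r)}\, \omega$ for any differential $r$-form $\omega$ in $\mathbb{R}^n$.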

To demonstrate this, if $\omega = \sum_I f_I\, dx_I$ is an $r$-form in $\mathbb{R}^n$ then by definition

(5.47) $$*(*\omega) = * \sum_{I,J} \epsilon(I,J)\, f_I\, dx_J = \sum_{I,J,K} \epsilon(J,K)\, \epsilon(I,J)\, f_I\, dx_K,$$

where $\epsilon$ denotes the sign of the permutation as in (5.43). For any ordered set of $r$ indices $1 \le k_1 < \cdots < k_r \le n$ there is a unique complementary ordered set of $n - r$ indices $1 \le j_1 < \cdots < j_{n-r} \le n$ such that $(J, K)$ is a permutation of the integers $(1, 2, \ldots, n)$, and then there is a further unique complementary ordered set of $r$ indices $1 \le i_1 < \cdots < i_r \le n$ such that $(I, J)$ also is a permutation of the integers $(1, 2, \ldots, n)$, for which clearly $I = K$; thus the only nontrivial terms in the sum (5.47) are those for which the indices $J$ and $I$ are uniquely determined by the indices $K$, indeed for which $I = K$, and since $\epsilon(I,J)\,\epsilon(J,K) = \epsilon(K,J)\,\epsilon(J,K) = (-1)^{r(n-r)}$ just as in the earlier examination (5.31) of exterior products it follows that (5.47) reduces to (5.46) as desired. For instance since it was already noted in (5.45) that $*\omega_f = \Omega_f$ for any vector field $f$ in $\mathbb{R}^3$, it follows from (5.46) that $*\Omega_f = * * \omega_f = \omega_f$.

The most characteristic property of Hodge duality² though is that if $\omega = \sum_I f_I(x)\, dx_I$ and $\sigma = \sum_J g_J(x)\, dx_J$ are two $r$-forms in $\mathbb{R}^n$ then

(5.48) $\omega \wedge *\sigma = \sum_I f_I(x)\, g_I(x)\, dx_1 \wedge \cdots \wedge dx_n$.

Indeed by definition

(5.49) $$\omega \wedge *\sigma = \sum_{I,J,K} f_I(x)\, dx_I \wedge \epsilon(J,K)\, g_J(x)\, dx_K = \sum_{I,J,K} \epsilon(J,K)\, f_I(x)\, g_J(x)\, dx_I \wedge dx_K,$$

where $\epsilon$ denotes the sign of the permutation as in (5.43). For any ordered set of $r$ indices $1 \le i_1 < \cdots < i_r \le n$ there is a unique ordered set of $n - r$ indices $1 \le k_1 < \cdots < k_{n-r} \le n$ such that $dx_I \wedge dx_K$ is nontrivial, and for that ordered set of indices $dx_I \wedge dx_K = \epsilon(I,K)\, dx_1 \wedge dx_2 \wedge \cdots \wedge dx_n$; then there is a unique ordered set of $r$ indices $1 \le j_1 < \cdots < j_r \le n$ such that $\epsilon(J,K) \ne 0$, and clearly $J = I$. Thus the choice of an ordered set of indices $I$ determines unique ordered sets of indices $J$ and $K$ for which the associated term in the sum (5.49) is nontrivial, and since $\epsilon(I,K)\, dx_I \wedge dx_K = dx_1 \wedge dx_2 \wedge \cdots \wedge dx_n$ that sum reduces to (5.48).
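The Hodge duality mapping is easy to experiment with when a form with constant coefficients is stored as a table of coefficients. The following is a minimal sketch in Python under that assumption, not part of the original text: a form is a dictionary from increasing index tuples $I$ to coefficients $f_I$, and the names `perm_sign`, `hodge_star` and `double_star_sign_ok` are illustrative. It reproduces (5.44) and (5.45) and tests (5.46) on every basis form.

```python
from itertools import combinations


def perm_sign(seq):
    """Sign of a permutation of distinct integers, found by counting inversions."""
    inversions = sum(
        1
        for a in range(len(seq))
        for b in range(a + 1, len(seq))
        if seq[a] > seq[b]
    )
    return -1 if inversions % 2 else 1


def hodge_star(form, n):
    """Apply * in R^n to a form stored as {increasing index tuple I: coefficient}."""
    result = {}
    for I, coeff in form.items():
        # J is the increasing complement of I in (1, ..., n).
        J = tuple(j for j in range(1, n + 1) if j not in I)
        result[J] = result.get(J, 0) + perm_sign(I + J) * coeff
    return result


def double_star_sign_ok(n):
    """Check that ** multiplies each basis r-form dx_I by (-1)^(r(n-r))."""
    for r in range(n + 1):
        for I in combinations(range(1, n + 1), r):
            twice = hodge_star(hodge_star({I: 1}, n), n)
            if twice != {I: (-1) ** (r * (n - r))}:
                return False
    return True


# (5.44): *(3 dx1 + 5 dx2) = 3 dx2 - 5 dx1 in the plane.
print(hodge_star({(1,): 3, (2,): 5}, 2))
# (5.45): the 1-form of the vector (2, 3, 5) goes to its associated 2-form.
print(hodge_star({(1,): 2, (2,): 3, (3,): 5}, 3))
print(all(double_star_sign_ok(n) for n in range(1, 6)))
```

Only increasing index sets are stored, which corresponds to the second sum in (5.43).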

If $\phi : D \to E$ is a mapping from an open subset $D \subset \mathbb{R}^m$ into an open subset $E \subset \mathbb{R}^n$ then to any function $f$ in the set $E$ there can be associated the induced function $\phi^*(f)$ in the set $D$, the function defined by

(5.50) $\phi^*(f)(x) = f\bigl(\phi(x)\bigr)$ for all points $x \in D$.

It is clear that $\phi^*(c_1 f_1 + c_2 f_2) = c_1 \phi^*(f_1) + c_2 \phi^*(f_2)$ for any functions $f_1, f_2$ in $E$ and any constants $c_1, c_2 \in \mathbb{R}$, and that if the mapping $\phi$ and the function $f$ are $C^k$ then so is the induced function $\phi^*(f)$; thus associating to a function $f \in \Lambda^0_{(k)}(E)$ the induced function $\phi^*(f) \in \Lambda^0_{(k)}(D)$ is a linear mapping

(5.51) $\phi^* : \Lambda^0_{(k)}(E) \to \Lambda^0_{(k)}(D)$.

2 The version of Hodge duality discussed here is a special case of the general Hodge duality, for which the characteristic property is that

$\omega \wedge *\sigma = \sum_{I,J} h_{IJ}\, f_I(x)\, g_J(x)\, dx_1 \wedge \cdots \wedge dx_n$

where $h_{IJ} = h_{i_1 j_1} h_{i_2 j_2} \cdots h_{i_r j_r}$ for some positive definite symmetric matrix $h_{ij}$; actually for some applications in physics the matrix $h_{ij}$ is not even necessarily positive definite, but is a Lorentz metric, having one negative and $n - 1$ positive eigenvalues.




More generally to any mapping $f : E \to \mathbb{R}^N$ there can be associated the induced mapping $\phi^*(f)$ in the set $D$, the mapping defined by

(5.52) $\phi^*(f)(x) = f\bigl(\phi(x)\bigr)$ for all points $x \in D$;

and again if the mappings $\phi$ and $f$ are $C^k$ then so is the induced mapping $\phi^*(f)$. Differential forms $\omega \in \Lambda^r(E)$ in an open set $E \subset \mathbb{R}^n$ are just mappings from $E$ into a vector space $\mathbb{R}^N$, expressed in terms of a basis for $\mathbb{R}^N$ that is determined by the basis $dy_1, \ldots, dy_n$ for the vector space $\mathbb{R}^n$ if $y_1, \ldots, y_n$ are the coordinates in $\mathbb{R}^n$; on the other hand differential forms $\omega \in \Lambda^r(D)$ in an open set $D \subset \mathbb{R}^m$ are mappings from $D$ into a vector space $\mathbb{R}^M$, expressed in terms of a basis for $\mathbb{R}^M$ that is determined by the basis $dx_1, \ldots, dx_m$ for the vector space $\mathbb{R}^m$ if $x_1, \ldots, x_m$ are the coordinates in $\mathbb{R}^m$. The induced differential form under the mapping $\phi : D \to E$ again is just the induced mapping as in (5.52), but must also reflect the change in the bases for the two vector spaces $\mathbb{R}^m$ and $\mathbb{R}^n$ defined by the mapping $\phi$. If the mapping $\phi$ is at least $C^1$ then the coordinates $y_i$ of the point $\phi(x)$ are continuously differentiable functions of the variables $x_j$, and the derivatives of these functions are the row vectors

$$y_i'(x) = \left( \frac{\partial y_i}{\partial x_1},\ \frac{\partial y_i}{\partial x_2},\ \cdots,\ \frac{\partial y_i}{\partial x_m} \right),$$

which can be written equivalently in terms of the bases $dx_j$ of the space $\mathbb{R}^m$ and $dy_i$ of the space $\mathbb{R}^n$ as the linear combination

(5.53) $$dy_i = \sum_{j=1}^m \frac{\partial y_i}{\partial x_j}\, dx_j$$

of the basis vectors $dx_j$. The induced differential form $\phi^*(\omega) \in \Lambda^r(D)$ on $D$ consequently is defined by

(5.54) $$\begin{aligned} \phi^*\Bigl( \sum_I f_I(y)\, dy_I \Bigr) &= \sum_{I,J} f_I\bigl(\phi(x)\bigr)\, \frac{\partial y_{i_1}}{\partial x_{j_1}}\, dx_{j_1} \wedge \cdots \wedge \frac{\partial y_{i_r}}{\partial x_{j_r}}\, dx_{j_r} \\ &= \sum_{I,J} f_I\bigl(\phi(x)\bigr)\, \frac{\partial y_{i_1}}{\partial x_{j_1}} \cdots \frac{\partial y_{i_r}}{\partial x_{j_r}}\, dx_{j_1} \wedge \cdots \wedge dx_{j_r}. \end{aligned}$$

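This is sometimes also referred to as the pullback of the differential form $\omega$ from $E$ to $D$. It is clear that $\phi^*\bigl( \Lambda^r_{(k)}(E) \bigr) \subset \Lambda^r_{(k-1)}(D)$ and that

(5.55) $\phi^*(c_1 \omega_1 + c_2 \omega_2) = c_1 \phi^*(\omega_1) + c_2 \phi^*(\omega_2)$ for any $\omega_1, \omega_2 \in \Lambda^r(E)$ and $c_1, c_2 \in \mathbb{R}$,

so the mapping $\phi^*$ is a linear mapping

(5.56) $\phi^* : \Lambda^r_{(k)}(E) \to \Lambda^r_{(k-1)}(D)$;

and it is also clear that

(5.57) $\phi^*(\omega \wedge \sigma) = \phi^*(\omega) \wedge \phi^*(\sigma)$ for any $\omega \in \Lambda^r(E)$ and $\sigma \in \Lambda^s(E)$.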




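If $D, E \subset \mathbb{R}^n$ are both subsets of vector spaces of the same dimension and $\omega \in \Lambda^n_{(1)}(E)$ is a differential $n$-form in $E$ then $\omega(y) = f(y)\, dy_1 \wedge \cdots \wedge dy_n$ for some function $f$ in $E$ and (5.54) reduces to

(5.58) $$\begin{aligned} \phi^*\bigl( f(y)\, dy_1 \wedge \cdots \wedge dy_n \bigr) &= \sum_J f\bigl(\phi(x)\bigr)\, \frac{\partial y_1}{\partial x_{j_1}} \cdots \frac{\partial y_n}{\partial x_{j_n}}\, dx_{j_1} \wedge \cdots \wedge dx_{j_n} \\ &= f\bigl(\phi(x)\bigr)\, \det \phi'(x)\, dx_1 \wedge \cdots \wedge dx_n. \end{aligned}$$

This illustrates the role of exterior algebras as convenient tools in handling determinants, and in a sense underlies all the applications of differential forms in analysis and geometry.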
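The passage from the sum over $J$ to the determinant in (5.58) can be illustrated numerically. The following is a minimal sketch in Python, not part of the original text; the function names and the sample matrices are illustrative assumptions. Each permutation $J = (j_1, \ldots, j_n)$ contributes the product of the entries $\partial y_i / \partial x_{j_i}$ times the sign that reorders $dx_{j_1} \wedge \cdots \wedge dx_{j_n}$ into $dx_1 \wedge \cdots \wedge dx_n$, and the total is compared with a determinant computed independently by expansion along the first row.

```python
import math
from itertools import permutations


def perm_sign(p):
    """Sign of a permutation given as a sequence, found by counting inversions."""
    inversions = sum(
        1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b]
    )
    return -1 if inversions % 2 else 1


def pullback_coefficient(jac):
    """Coefficient of dx_1 ^ ... ^ dx_n in the pullback of dy_1 ^ ... ^ dy_n.

    jac[i][j] stands for the partial derivative of y_(i+1) by x_(j+1).
    Index tuples with a repeated entry give a vanishing wedge product,
    so only permutations contribute to the sum.
    """
    n = len(jac)
    total = 0
    for J in permutations(range(n)):
        term = perm_sign(J)
        for i in range(n):
            term *= jac[i][J[i]]
        total += term
    return total


def det_laplace(m):
    """Determinant by expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det_laplace([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def polar_jacobian(r, theta):
    """Jacobian matrix of the mapping (r, theta) -> (r cos theta, r sin theta)."""
    return [
        [math.cos(theta), -r * math.sin(theta)],
        [math.sin(theta), r * math.cos(theta)],
    ]


sample = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
print(pullback_coefficient(sample), det_laplace(sample))
# For polar coordinates the pullback of dy1 ^ dy2 is r dr ^ dtheta.
print(pullback_coefficient(polar_jacobian(2.0, 0.7)))
```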

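Theorem 5.9 If $\phi : D \to E$ is a $C^k$ mapping between open subsets $D \subset \mathbb{R}^m$ and $E \subset \mathbb{R}^n$ while $\psi : E \to F$ is a $C^{k-1}$ mapping between open subsets $E \subset \mathbb{R}^n$ and $F \subset \mathbb{R}^p$ then $(\psi \circ \phi)^* : \Lambda^r_{(k)}(F) \to \Lambda^r_{(k-2)}(D)$ is the composition $(\psi \circ \phi)^* = \phi^* \circ \psi^*$ of the mapping $\phi^* : \Lambda^r_{(k)}(E) \to \Lambda^r_{(k)}(D)$ and the mapping $\psi^* : \Lambda^r_{(k)}(F) \to \Lambda^r_{(k)}(E)$.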

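Proof: If the coordinates in $D$ are $t_1, \ldots, t_m$, the coordinates in $E$ are $x_1, \ldots, x_n$, and the coordinates in $F$ are $y_1, \ldots, y_p$, then for any differential form $\omega(y) = \sum_I f_I(y)\, dy_I$ it follows that

$$\psi^*(\omega)(x) = \sum_{I,J} f_I\bigl(\psi(x)\bigr)\, \frac{\partial y_I}{\partial x_J}(x)\, dx_J$$

and

$$\phi^*\bigl(\psi^*(\omega)\bigr)(t) = \sum_{I,J,K} f_I\bigl(\psi(\phi(t))\bigr)\, \frac{\partial y_I}{\partial x_J}\bigl(\phi(t)\bigr)\, \frac{\partial x_J}{\partial t_K}(t)\, dt_K,$$

where $\partial y_I / \partial x_J$ abbreviates the product $\frac{\partial y_{i_1}}{\partial x_{j_1}} \cdots \frac{\partial y_{i_r}}{\partial x_{j_r}}$ as in (5.54). On the other hand for the composite mapping $\psi \circ \phi$

$$(\psi \circ \phi)^*(\omega)(t) = \sum_{I,K} f_I\bigl((\psi \circ \phi)(t)\bigr)\, \frac{\partial y_I}{\partial t_K}(t)\, dt_K,$$

and it follows from the chain rule that

$$\frac{\partial y_I}{\partial t_K} = \sum_J \frac{\partial y_I}{\partial x_J}\bigl(\phi(t)\bigr)\, \frac{\partial x_J}{\partial t_K}(t).$$

Consequently $(\psi \circ \phi)^*(\omega) = \phi^*\bigl(\psi^*(\omega)\bigr)$, which suffices for the proof.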

In addition to the algebraic operations and change of variables formula for differential forms discussed so far, there are analytic operations of differentiation and integration of differential forms. The exterior derivative of a differential form $\omega(x) \in \Lambda^r_{(k)}(U)$ of degree $r$, for $k > 0$, is defined to be a differential form $d\omega(x) \in \Lambda^{r+1}_{(k-1)}(U)$ of degree $r + 1$. Explicitly, if $f(x) \in \Lambda^0_{(k)}(U)$ is a differential form of degree 0, a function defined in the subset $U \subset \mathbb{R}^n$, its exterior derivative is defined to be the derivative of that function expressed as a differential form, so

(5.59) $df(x) = \sum_{j=1}^n \partial_j f(x)\, dx_j$ if $f \in \Lambda^0_{(k)}(U)$.

If $\omega(x) = \sum_I f_I(x)\, dx_I \in \Lambda^r_{(k)}(U)$ is a differential form of degree $r > 0$ its exterior derivative is defined by

(5.60) $d\omega(x) = \sum_I df_I(x) \wedge dx_I$,

so that

(5.61) $$d \sum_{i_1, \ldots, i_r = 1}^n f_{i_1 \cdots i_r}(x)\, dx_{i_1} \wedge \cdots \wedge dx_{i_r} = \sum_{j, i_1, \ldots, i_r = 1}^n \partial_j f_{i_1 \cdots i_r}(x)\, dx_j \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_r}.$$

A clear consequence of this is that $d\omega = 0$ for any $n$-form $\omega$ in an open subset $U \subset \mathbb{R}^n$.

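Theorem 5.10 Exterior differentiation is a linear mapping

(5.62) $d : \Lambda^r_{(k)}(U) \to \Lambda^{r+1}_{(k-1)}(U)$

of $C^k$ differential $r$-forms in an open subset $U \subset \mathbb{R}^n$ for $k \ge 1$, such that

(i) $d\bigl( \omega(x) \wedge \sigma(x) \bigr) = d\omega(x) \wedge \sigma(x) + (-1)^r\, \omega(x) \wedge d\sigma(x)$ if $\omega(x) \in \Lambda^r_{(k)}(U)$ and $\sigma(x) \in \Lambda^s_{(k)}(U)$ for $k \ge 1$, while

(ii) $d\, d\omega(x) = 0$ for any differential form $\omega \in \Lambda^r_{(2)}(U)$.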

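Proof: That exterior differentiation is a linear mapping on $r$-forms is an immediate consequence of the definitions (5.59) and (5.60).

(i) For functions it follows from the usual rules for differentiation of a product that

(5.63) $d\bigl( f(x) g(x) \bigr) = f(x)\, dg(x) + g(x)\, df(x)$.

In general if $\omega = \sum_I f_I(x)\, dx_I \in \Lambda^r_{(1)}(U)$ and $\sigma = \sum_J g_J(x)\, dx_J \in \Lambda^s_{(1)}(U)$ it follows from (5.63) that

$$\begin{aligned} d\bigl( \omega(x) \wedge \sigma(x) \bigr) &= d \sum_{I,J} f_I(x)\, g_J(x)\, dx_I \wedge dx_J \\ &= \sum_{I,J} d\bigl( f_I(x)\, g_J(x) \bigr) \wedge dx_I \wedge dx_J \\ &= \sum_{I,J} \bigl( g_J(x)\, df_I(x) + f_I(x)\, dg_J(x) \bigr) \wedge dx_I \wedge dx_J \\ &= \sum_{I,J} df_I(x) \wedge dx_I \wedge g_J(x)\, dx_J + (-1)^r \sum_{I,J} f_I(x)\, dx_I \wedge dg_J(x) \wedge dx_J \\ &= d\omega(x) \wedge \sigma(x) + (-1)^r\, \omega(x) \wedge d\sigma(x), \end{aligned}$$

where the sign $(-1)^r$ arises since the 1-form $dg_J(x)$ must be moved across the $r$ differentials in $dx_I$.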

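(ii) If $\omega(x) = \sum_I f_I(x)\, dx_I \in \Lambda^r_{(2)}$ then

$$d\bigl( d\omega(x) \bigr) = d \Bigl( \sum_I df_I(x) \wedge dx_I \Bigr) = d \Bigl( \sum_{I,j} \frac{\partial f_I(x)}{\partial x_j}\, dx_j \wedge dx_I \Bigr) = \sum_{I,j,k} \frac{\partial^2 f_I(x)}{\partial x_j\, \partial x_k}\, dx_k \wedge dx_j \wedge dx_I,$$

which is zero since $\partial_{jk} f_I(x)$ is symmetric in the indices $j, k$ by Theorem 2.4 while $dx_k \wedge dx_j$ is skew-symmetric in these indices. That suffices for the proof.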

A differential form $\omega(x)$ is said to be closed if $d\omega(x) = 0$, and it is said to be exact if $\omega(x) = d\sigma(x)$ for some differential form $\sigma(x)$. A consequence of Theorem 5.10 (ii) is that any exact differential form is necessarily a closed differential form; but the converse is not true in general. For example, in the complement of the origin in the plane $\mathbb{R}^2$ of the variables $x_1, x_2$ the function $\theta(x_1, x_2) = \tan^{-1}(x_2 / x_1)$, the angle in polar coordinates, is well defined in any half-plane bounded by a line through the origin, such as the half-planes $x_1 > 0$ and $x_1 < 0$, but it cannot be defined in the entire complement of the origin as a single-valued function. However

$$\omega(x) = d\theta(x_1, x_2) = -\frac{x_2}{x_1^2 + x_2^2}\, dx_1 + \frac{x_1}{x_1^2 + x_2^2}\, dx_2$$

is a well defined closed differential 1-form in the complement of the origin in $\mathbb{R}^2$. If $\omega(x)$ were an exact differential form in the full complement of the origin it would be the exterior derivative $\omega(x) = df(x)$ of a well defined function $f(x)$ in the complement of the origin; but then $df(x) = d\theta(x)$ so that $d\bigl( f(x) - \theta(x) \bigr) = 0$, hence $f(x) - \theta(x) = c$ is a constant in the complement of the origin, which would imply that $\theta(x) = f(x) - c$ is a well defined function in the complement of the origin, a contradiction. However it is at least true locally that any closed form is exact.
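The failure of exactness can also be seen numerically, anticipating the line integrals of Section 5.3: the integral of an exact 1-form $df$ around a closed curve vanishes, since it is the difference of the values of $f$ at coinciding end points, whereas the integral of the form $\omega$ above around a circle centered at the origin does not. The following is a minimal sketch in Python, not part of the original text; the function names, the midpoint rule and the step count are illustrative choices.

```python
import math


def omega(x1, x2):
    """Coefficients (a1, a2) of omega = a1 dx1 + a2 dx2 away from the origin."""
    r2 = x1 * x1 + x2 * x2
    return (-x2 / r2, x1 / r2)


def circle_integral(radius=1.0, steps=10_000):
    """Midpoint-rule integral of omega over the circle of the given radius."""
    total = 0.0
    dt = 2.0 * math.pi / steps
    for k in range(steps):
        t = (k + 0.5) * dt
        x1, x2 = radius * math.cos(t), radius * math.sin(t)
        # Derivative of the parametrization t -> (radius cos t, radius sin t).
        dx1, dx2 = -radius * math.sin(t), radius * math.cos(t)
        a1, a2 = omega(x1, x2)
        total += (a1 * dx1 + a2 * dx2) * dt
    return total


print(circle_integral())            # close to 2 * pi, not 0
print(circle_integral(radius=3.0))  # the same value for any radius
```

The pulled-back integrand is identically 1, so the integral is the total change $2\pi$ of the angle around the circle.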

Theorem 5.11 (Poincaré's Lemma) If $\omega(x) \in \Lambda^r_{(1)}(\Delta)$ is a closed differential form of degree $r > 0$ in the standard cell $\Delta \subset \mathbb{R}^n$ then there is a differential form $\sigma(x) \in \Lambda^{r-1}_{(2)}(\Delta)$ of degree $r - 1$ in $\Delta$ such that $\omega(x) = d\sigma(x)$.

Proof: Consider differential $r$-forms $\omega(x) = \sum_I f_I(x)\, dx_I$ that involve only the differentials $dx_1, dx_2, \ldots, dx_k$ for some index $1 \le k \le n$, so that

$$\omega(x) = \sum_{1 \le i_1 < \cdots < i_r \le k} f_{i_1, \ldots, i_r}(x)\, dx_{i_1} \wedge \cdots \wedge dx_{i_r}.$$

The theorem will be demonstrated by induction on the index $k$. If $k = 1$ then $\omega(x) = f(x)\, dx_1$ and $0 = d\omega(x) = \sum_{i=2}^n \partial_i f(x)\, dx_i \wedge dx_1$ so $\partial_i f(x) = 0$ for $i > 1$ and consequently $f(x) = f(x_1)$ is a function of the variable $x_1$ alone, where $x_1 \in [0, 1]$. The integral $g(x_1) = \int_0^{x_1} f$ is a $C^2$ function of the variable $x_1 \in [0, 1]$ for which $dg(x) = dg(x_1) = f(x_1)\, dx_1 = \omega(x)$, which establishes the desired result in this special case. Assume then that the theorem has been demonstrated for an index $k$, and consider a differential $r$-form $\omega(x)$ that involves only the differentials $dx_1, \ldots, dx_k, dx_{k+1}$. This differential form can be written

$\omega(x) = \omega_1(x) \wedge dx_{k+1} + \omega_2(x)$

where the differential $(r-1)$-form $\omega_1(x)$ and the differential $r$-form $\omega_2(x)$ are in reduced form and involve only the differentials $dx_1, \ldots, dx_k$, so that

$\omega_1(x) = \sum_I f_I(x)\, dx_I$ and $\omega_2(x) = \sum_J g_J(x)\, dx_J$

for indices

$1 \le i_1 < i_2 < \cdots < i_{r-1} \le k$ and $1 \le j_1 < j_2 < \cdots < j_r \le k$.

By assumption

$$0 = d\omega(x) = d\omega_1(x) \wedge dx_{k+1} + d\omega_2(x) = \sum_I \sum_{i=1}^n \partial_i f_I(x)\, dx_i \wedge dx_I \wedge dx_{k+1} + \sum_J \sum_{j=1}^n \partial_j g_J(x)\, dx_j \wedge dx_J.$$

The only terms in this expression that involve $dx_{k+1}$ and $dx_l$ for $l > k + 1$ are just in the first sum and just for the index $i = l$; and the basis vectors $dx_l \wedge dx_I \wedge dx_{k+1}$ are linearly independent since $1 \le i_1 < i_2 < \cdots < i_{r-1} \le k$ and $k < k + 1 < l$. Consequently $\partial_l f_I(x) = 0$ for all $l > k + 1$, so $f_I(x) = f_I(x_1, \ldots, x_{k+1})$ is a function just of the variables $x_1, \ldots, x_{k+1}$, where $0 \le x_i \le 1$. The integrals

$$h_I(x_1, \ldots, x_{k+1}) = \int_0^{x_{k+1}} f_I(x_1, \ldots, x_k, t)\, dt$$

then are $C^2$ functions in the cell $\Delta$ for which $\partial_{k+1} h_I(x) = f_I(x)$, by the fundamental theorem of calculus for functions of a single variable; hence the differential $(r-1)$-form

$\omega_3(x) = \sum_I h_I(x)\, dx_I$,
which involves just the differentials dx1, . . . , d xk, satisfies

dω3(x) =

I

ki=1 ∂ ihI (x)dxi ∧ dxI + ∂ k+1hI (x)dxk+1 ∧ dxI

=

I

ki=1 ∂ ihI (x)dxi ∧ dxI + f I (x)dxk+1 ∧ dxI

= ω4(x) + (−1)kω1(x) ∧ dxk+1

for a differential form ω4(x) that involves only the differentials dx1, . . . , d xk.Then

ω(x) − (−1)kdω3(x) = ω1(x) ∧ dxk+1 + ω2(x) − (−1)kω4(x) − ω1(x) ∧ dxk+1

= ω2(x) − (−1)kω4(x)




involves only the differentials dx1, . . . , d xk and

dω(x) − (−1)kdω3(x) = 0;

by the induction hypothesis then there is a differential form ω5(x)(x) in the cell∆ such that

ω(x) − (−1)kdω3(x) = dω5(x),

hence such that ω(x) = dσ(x) where σ (x) = (−1)kω3(x) + ω5(x). That estab-lishes the induction step and thereby concludes the proof.

The preceding theorem is an example of an integrability theorem for a linear system of partial differential equations. For example, the existence of a function $h$ such that $dh = \sigma$ for a 1-form $\sigma = \sum_{j=1}^n f_j(x)\, dx_j$ in $\mathbb{R}^n$ amounts to the existence of a solution of the system of partial differential equations $\partial_i h(x) = f_i(x)$ for $1 \le i \le n$. It is clear that if there is a $C^2$ solution $h(x)$ then $\partial_j f_i(x) = \partial_j \partial_i h(x) = \partial_i \partial_j h(x) = \partial_i f_j(x)$; that is just the condition that $d\sigma(x) = 0$, which by Poincaré's Lemma is also a sufficient condition for the existence of the solution $h(x)$. There are corresponding interpretations for Poincaré's Lemma for differential forms of higher degree, although they are a bit more complicated to state without using the machinery of differential forms.

Another important property of exterior differentiation is its invariance under the pullback of differential forms.

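Theorem 5.12 If $\phi : D \to E$ is a $C^k$ mapping from an open subset $D \subset \mathbb{R}^m$ into an open subset $E \subset \mathbb{R}^n$ then for any differential form $\omega \in \Lambda^r_{(k)}(E)$

(5.64) $d\phi^*(\omega) = \phi^*(d\omega) \in \Lambda^{r+1}_{(k-1)}(D)$.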

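Proof: For a differential form $\omega \in \Lambda^r(E)$ written explicitly as $\omega = \sum_I f_I(y)\, dy_I$ it follows from the definition (5.54) that

$$\phi^*(\omega) = \sum_{I,J} f_I\bigl(\phi(x)\bigr)\, \frac{\partial y_{i_1}}{\partial x_{j_1}} \cdots \frac{\partial y_{i_r}}{\partial x_{j_r}}\, dx_{j_1} \wedge \cdots \wedge dx_{j_r}.$$

The exterior derivative of $\omega$ is

$$d\omega = \sum_{I,k} \frac{\partial f_I}{\partial y_k}\, dy_k \wedge dy_I$$

and it follows from the definition (5.54) that

$$\phi^*(d\omega) = \sum_{I,J,k,l} \frac{\partial f_I}{\partial y_k}\bigl(\phi(x)\bigr)\, \frac{\partial y_k}{\partial x_l}\, \frac{\partial y_{i_1}}{\partial x_{j_1}} \cdots \frac{\partial y_{i_r}}{\partial x_{j_r}}\, dx_l \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_r}.$$

On the other hand

$$\begin{aligned} d\phi^*(\omega) &= \sum_{I,J,l} \frac{\partial}{\partial x_l} \Bigl( f_I\bigl(\phi(x)\bigr)\, \frac{\partial y_{i_1}}{\partial x_{j_1}} \cdots \frac{\partial y_{i_r}}{\partial x_{j_r}} \Bigr)\, dx_l \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_r} \\ &= \sum_{I,J,k,l} \frac{\partial f_I}{\partial y_k}\bigl(\phi(x)\bigr)\, \frac{\partial y_k}{\partial x_l}\, \frac{\partial y_{i_1}}{\partial x_{j_1}} \cdots \frac{\partial y_{i_r}}{\partial x_{j_r}}\, dx_l \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_r} \\ &\quad + \sum_{I,J,l} f_I\bigl(\phi(x)\bigr)\, \frac{\partial^2 y_{i_1}}{\partial x_l\, \partial x_{j_1}}\, \frac{\partial y_{i_2}}{\partial x_{j_2}} \cdots \frac{\partial y_{i_r}}{\partial x_{j_r}}\, dx_l \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_r} + \cdots \end{aligned}$$

where the remaining terms involve successively the second derivatives $\frac{\partial^2 y_{i_2}}{\partial x_l\, \partial x_{j_2}}$, $\frac{\partial^2 y_{i_3}}{\partial x_l\, \partial x_{j_3}}$, and so on. These terms that involve the second derivatives all vanish, since for example $\frac{\partial^2 y_{i_1}}{\partial x_l\, \partial x_{j_1}}$ is symmetric in the indices $l$ and $j_1$ while $dx_l \wedge dx_{j_1}$ is skew-symmetric in these indices. That suffices for the proof.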

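The exterior derivatives of differential forms in $\mathbb{R}^3$ have classical forms, which are possibly quite familiar from physics. For the 2-form $\Omega_f(x)$ associated to a vector field $f(x)$ in an open set $U \subset \mathbb{R}^3$ as in (5.33) it follows quite directly from the definition of exterior derivative that

(5.65) $$d\Omega_f(x) = \left( \frac{\partial f_1(x)}{\partial x_1} + \frac{\partial f_2(x)}{\partial x_2} + \frac{\partial f_3(x)}{\partial x_3} \right) dx_1 \wedge dx_2 \wedge dx_3.$$

The coefficient of this differential 3-form is known as the divergence of the vector field $f$ and is often denoted by $\operatorname{div} f(x)$, so by definition it is the function

(5.66) $$\operatorname{div} f(x) = \frac{\partial f_1(x)}{\partial x_1} + \frac{\partial f_2(x)}{\partial x_2} + \frac{\partial f_3(x)}{\partial x_3}.$$

A traditional alternative notation is expressed in terms of the vector differential operator

(5.67) $$\nabla = \left( \frac{\partial}{\partial x_1},\ \frac{\partial}{\partial x_2},\ \frac{\partial}{\partial x_3} \right).$$

When applied to functions $\nabla f(x)$ is the gradient of the function $f(x)$, as discussed on page 28; and the dot product of this operator with a vector field is the function

(5.68) $$\nabla \cdot f(x) = \frac{\partial}{\partial x_1} f_1(x) + \frac{\partial}{\partial x_2} f_2(x) + \frac{\partial}{\partial x_3} f_3(x) = \operatorname{div} f(x).$$

In terms of this operator (5.65) can be rewritten

(5.69) $d\Omega_f(x) = \operatorname{div} f(x)\, dx_1 \wedge dx_2 \wedge dx_3$ or $d\Omega_f(x) = \nabla \cdot f(x)\, dx_1 \wedge dx_2 \wedge dx_3$.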



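For the 1-form $\omega_f(x)$ associated to a vector field $f(x)$ in an open set $U \subset \mathbb{R}^3$ as in (5.33) it follows by a straightforward calculation that

(5.70) $$d\omega_f(x) = \bigl( \partial_2 f_3(x) - \partial_3 f_2(x) \bigr)\, dx_2 \wedge dx_3 + \bigl( \partial_3 f_1(x) - \partial_1 f_3(x) \bigr)\, dx_3 \wedge dx_1 + \bigl( \partial_1 f_2(x) - \partial_2 f_1(x) \bigr)\, dx_1 \wedge dx_2;$$

this is the 2-form associated to the vector field called the curl of the vector field $f(x)$, so by definition it is the vector field

(5.71) $\operatorname{curl} f(x) = \bigl( \partial_2 f_3(x) - \partial_3 f_2(x),\ \partial_3 f_1(x) - \partial_1 f_3(x),\ \partial_1 f_2(x) - \partial_2 f_1(x) \bigr)$.

It follows from (5.37) that the curl of a vector field can be written in terms of the operator $\nabla$ as

(5.72) $\operatorname{curl} f(x) = \nabla \times f(x)$,

so that

(5.73) $d\omega_f(x) = \Omega_{\operatorname{curl} f}(x)$ or $d\omega_f(x) = \Omega_{\nabla \times f}(x)$.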

The condition that $d\, d\omega = 0$ for any differential form $\omega$, as in Theorem 5.10 (ii), has classical interpretations as well. For any $C^2$ function $f(x)$ in $\mathbb{R}^3$, since $df(x) = \omega_{\nabla f}(x)$, another interpretation of the first line of (5.33), it follows from (5.73) that $0 = d\, df(x) = d\omega_{\nabla f}(x) = \Omega_{\nabla \times \nabla f}(x)$ and consequently that

(5.74) $\nabla \times \nabla f(x) = 0$ or alternatively $\operatorname{curl} \operatorname{grad} f(x) = 0$.

On the other hand for any $C^2$ vector field $f(x) = \bigl( f_1(x), f_2(x), f_3(x) \bigr)$, since $d\omega_f(x) = \Omega_{\nabla \times f}(x)$ by (5.73) it follows from (5.65) that $0 = d\, d\omega_f(x) = d\Omega_{\nabla \times f}(x) = \nabla \cdot \bigl( \nabla \times f(x) \bigr)\, dx_1 \wedge dx_2 \wedge dx_3$ and consequently that

(5.75) $\nabla \cdot \bigl( \nabla \times f(x) \bigr) = 0$ or alternatively $\operatorname{div} \operatorname{curl} f(x) = 0$.

Both (5.74) and (5.75) of course can be demonstrated by direct calculation from the definition of the differential operator $\nabla$. However Poincaré's Lemma is a far less trivial result, so its classical consequences cannot be proved merely by a direct calculation, as should be expected. If $f(x)$ is a $C^1$ vector field in $\mathbb{R}^3$ the condition that $f(x) = \nabla g(x)$ for a function $g(x)$ in $\Delta$ means in terms of differential forms that $\omega_f(x) = dg(x)$; and by Poincaré's Lemma that is the case for a vector field in a cell $\Delta \subset \mathbb{R}^3$ if $0 = d\omega_f(x) = \Omega_{\nabla \times f}(x)$, consequently

(5.76) $f(x) = \nabla g(x)$ for a function $g(x)$ in $\Delta$ if $\nabla \times f(x) = 0$,

or alternatively in the even more classical notation

(5.77) $f(x) = \operatorname{grad} g(x)$ for a function $g(x)$ in $\Delta$ if $\operatorname{curl} f(x) = 0$.

On the other hand that $f(x) = \nabla \times g(x)$ for a vector field $g(x)$ in $\Delta$ means in terms of differential forms that $\Omega_f(x) = d\omega_g(x)$; and by Poincaré's Lemma that is the case for differential forms in a cell $\Delta \subset \mathbb{R}^3$ if $0 = d\Omega_f(x) = \nabla \cdot f(x)\, dx_1 \wedge dx_2 \wedge dx_3$, and consequently

(5.78) $f(x) = \nabla \times g(x)$ for a vector field $g(x)$ in $\Delta$ if $\nabla \cdot f(x) = 0$

or alternatively

(5.79) $f(x) = \operatorname{curl} g(x)$ for a vector field $g(x)$ in $\Delta$ if $\operatorname{div} f(x) = 0$.




5.3 Integrals of Differential Forms

Integrals of differential $n$-forms in open subsets $E \subset \mathbb{R}^n$ can be defined, but only in terms of the orientation of the vector space $\mathbb{R}^n$; these integrals are oriented integrals, extending that notion from spaces of dimension 1 as defined in (4.37) to spaces of dimensions $n > 1$. The integral of a continuous $n$-form $\omega(y) = f(y)\, dy_1 \wedge \cdots \wedge dy_n$ over an open subset $E \subset \mathbb{R}^n$ of an oriented vector space $\mathbb{R}^n$ with the orientation $dy_1 \wedge \cdots \wedge dy_n$ in terms of coordinates $y_1, \ldots, y_n$ is defined by

(5.80) $$\int_E f(y)\, dy_1 \wedge \cdots \wedge dy_n = \int_E f,$$

so that

(5.81) $$\int_E f(y)\, dy_{i_1} \wedge \cdots \wedge dy_{i_n} = \operatorname{sgn}\begin{pmatrix} i_1 & i_2 & \cdots & i_n \\ 1 & 2 & \cdots & n \end{pmatrix} \int_E f$$

where $\operatorname{sgn}\begin{pmatrix} i_1 & i_2 & \cdots & i_n \\ 1 & 2 & \cdots & n \end{pmatrix}$ is the sign of the permutation $(i_1\ i_2\ \cdots\ i_n)$ of the integers $(1\ 2\ \cdots\ n)$; thus under a permutation of the variables the integral of a differential form changes by the sign of the permutation.

Theorem 5.13 If $\phi : D \to E$ is a $C^1$ homeomorphism between two connected open subsets $D, E \subset \mathbb{R}^n$ then for any continuous differential $n$-form $\omega$ in $E$

(5.82) $$\int_E \omega = \pm \int_D \phi^*(\omega),$$

where the sign is $+$ if the mapping $\phi$ preserves the orientation and $-$ if the mapping $\phi$ reverses the orientation.

Proof: Suppose that the vector space $\mathbb{R}^n$ containing $D$ is oriented by the differential form $dx_1 \wedge \cdots \wedge dx_n$ in terms of coordinates $x_1, \ldots, x_n$, and that the vector space $\mathbb{R}^n$ containing $E$ is oriented by the differential form $dy_1 \wedge \cdots \wedge dy_n$ in terms of coordinates $y_1, \ldots, y_n$. The integral of a continuous differential $n$-form $\omega(y) = f(y)\, dy_1 \wedge \cdots \wedge dy_n$ in $E$ is

(5.83) $$\int_E \omega = \int_E f,$$

as in (5.80); and the integral of the induced differential form $\phi^*(\omega)$ in $D$, which by (5.58) has the explicit form $\phi^*(\omega)(x) = f\bigl(\phi(x)\bigr)\, \det \phi'(x)\, dx_1 \wedge \cdots \wedge dx_n$, is

(5.84) $$\int_D \phi^*(\omega) = \int_D (f \circ \phi)\, \det \phi'.$$

If the mapping $\phi$ preserves orientation then $\det \phi'(x) > 0$ at all points $x \in D$ and consequently $\det \phi'(x) = |\det \phi'(x)|$, while if the mapping $\phi$ reverses orientation then $\det \phi'(x) < 0$ at all points $x \in D$ and consequently $\det \phi'(x) = -|\det \phi'(x)|$; therefore (5.84) can be rewritten

(5.85) $$\int_D \phi^*(\omega) = \pm \int_D (f \circ \phi)\, |\det \phi'|$$

where the sign is $+$ if the mapping $\phi$ preserves orientation and $-$ if the mapping $\phi$ reverses orientation. By the change of variables formula (4.48) in Theorem 4.19

(5.86) $$\int_E f = \int_D (f \circ \phi)\, |\det \phi'|,$$

and (5.82) follows immediately by substituting (5.83) and (5.85) into (5.86), to conclude the proof.

There also are oriented integrals of $m$-forms over suitable lower-dimensional subsets of the spaces $\mathbb{R}^n$. An $m$-dimensional singular cell in an open subset $U \subset \mathbb{R}^n$ is a continuous mapping $\phi : \Delta \to U$ from an oriented closed cell $\Delta \subset \mathbb{R}^m$ into the open subset $U \subset \mathbb{R}^n$. It is often assumed as a standard convention that the parameter cells are unit cells

(5.87) $\Delta^m = \bigl\{ (x_1, \ldots, x_m) \in \mathbb{R}^m \ \big|\ 0 \le x_j \le 1 \text{ for } 1 \le j \le m \bigr\}$,

although other normalizations are sometimes more appropriate. Much of the subsequent discussion will focus on $C^1$ singular cells, those for which the mappings $\phi$ extend to $C^1$ mappings from an open neighborhood of the closed cell $\Delta$ into $U$. It is also common to consider $C^k$ singular cells for any integer $k > 0$ or even $C^\infty$ singular cells, those for which the mappings $\phi$ are of class $C^k$ for all integers $k > 0$. Unless explicitly assumed otherwise, though, singular cells are just continuous mappings. A 1-dimensional singular cell is a parametrized curve, as defined on page 55, since an interval $[0, 1]$ is a 1-dimensional cell that is naturally oriented. For singular cells of dimension $m > 1$ however the orientation always must be specified explicitly. As in the case of curves, the mappings describing singular cells are not required to be one-to-one mappings, nor for $C^k$ singular cells is it required that the differential $\phi'$ of the mapping have maximal rank; so the images of singular cells are not necessarily submanifolds of $U$ even locally, and that is the reason for the adjective singular. It is even possible for the image of a singular $m$-cell to be a submanifold of $\mathbb{R}^n$ of a dimension lower than $m$. However for some purposes it is necessary or convenient to consider more regular singular cells. A simple singular cell is a continuous mapping $\phi : \Delta \to \mathbb{R}^n$ that is a homeomorphism between the cell $\Delta \subset \mathbb{R}^m$ and its image $\Gamma = \phi(\Delta) \subset \mathbb{R}^n$, where $\Gamma$ has the induced topology as a subset of $\mathbb{R}^n$.

Two $m$-dimensional singular cells $\phi_1 : \Delta_1 \to \mathbb{R}^{n_1}$ and $\phi_2 : \Delta_2 \to \mathbb{R}^{n_2}$ are equivalent singular cells if there is an orientation-preserving homeomorphism $h : \Delta_1 \to \Delta_2$ such that $\phi_1(t) = \phi_2\bigl(h(t)\bigr)$; this is a natural equivalence relation between singular cells. Equivalent singular cells of course are of the same dimension and have the same image; but in general singular cells having the same image are not necessarily equivalent singular cells and are not necessarily of the same dimension. However two simple singular cells are equivalent if and only if they have the same image; for if $\phi_1 : \Delta_1 \to \mathbb{R}^n$ and $\phi_2 : \Delta_2 \to \mathbb{R}^p$ are singular cells having the same image $\phi_1(\Delta_1) = \phi_2(\Delta_2)$ then $\phi_1(t) = \phi_2\bigl( (\phi_2^{-1} \circ \phi_1)(t) \bigr)$, and $\phi_2^{-1} \circ \phi_1$ is a homeomorphism so the singular cells $\phi_1$ and $\phi_2$ are equivalent singular cells. Two $C^k$ singular cells $\phi_1 : \Delta_1 \to \mathbb{R}^{n_1}$ and $\phi_2 : \Delta_2 \to \mathbb{R}^{n_2}$ are $C^k$-equivalent singular cells if there is an orientation-preserving $C^k$ homeomorphism $h : V_1 \to V_2$ between open neighborhoods $V_1$ of the cell $\Delta_1$ and $V_2$ of the cell $\Delta_2$ such that $\phi_1(t) = \phi_2\bigl(h(t)\bigr)$ for all $t \in V_1$.

If $\phi : \Delta \to U$ is an $m$-dimensional $C^1$ singular cell in an open subset $U \subset \mathbb{R}^n$, and if $\omega \in \Lambda^m(U)$ is a continuous differential $m$-form in the open set $U$, the integral of the differential form $\omega$ over the singular cell $\phi$ is defined by

(5.88) $$\int_\phi \omega = \int_\Delta \phi^*(\omega).$$

If $\phi_1 : \Delta_1 \to U$ and $\phi_2 : \Delta_2 \to U$ are $C^1$-equivalent singular cells, so that $\phi_1(t) = \phi_2\bigl(h(t)\bigr)$ for an orientation-preserving $C^1$ homeomorphism $h : V_1 \to V_2$ between open neighborhoods $V_i$ of the cells $\Delta_i$, then by Theorems 5.9 and 5.13

(5.89) $$\int_{\Delta_1} \phi_1^*(\omega) = \int_{\Delta_1} h^*\bigl( \phi_2^*(\omega) \bigr) = \int_{\Delta_2} \phi_2^*(\omega);$$

thus the integrals of a differential form over $C^1$-equivalent singular $m$-cells are equal. Since simple singular cells $\phi_1$ and $\phi_2$ are equivalent if and only if they have the same image $\Gamma = \phi_1(\Delta_1) = \phi_2(\Delta_2)$, the integral of a differential form $\omega$ along a simple singular cell is determined uniquely by the image $\Gamma = \phi(\Delta)$; consequently for simple singular cells it is customary to set

(5.90) $$\int_\phi \omega = \int_\Gamma \omega \quad \text{where } \Gamma = \phi(\Delta).$$

This notation of course is limited to the case of simple singular cells.

Line integrals in $\mathbb{R}^n$ are special cases of integrals of differential forms over singular cells, since a parametrized curve is a special case of a singular cell. If $\phi : [0, 1] \to U$ is a $C^1$ parametrized 1-cell in an open subset $U \subset \mathbb{R}^n$, where the parameter in $[0, 1]$ is $t$ and the parameters in $\mathbb{R}^n$ are $x_1, \ldots, x_n$, and if $\omega(x) = \sum_{j=1}^n f_j(x)\, dx_j$ is a continuous differential form in $U$, the induced differential form is $\phi^*(\omega)(t) = \sum_{j=1}^n f_j\bigl(\phi(t)\bigr)\, \frac{dx_j}{dt}\, dt$; and if $f(x) = \{ f_j(x) \}$ is the vector field with component functions $f_j(x)$ this induced differential form can be written as the dot product $\phi^*(\omega)(t) = f\bigl(\phi(t)\bigr) \cdot \phi'(t)\, dt$. The integral of the differential 1-form $\omega(x)$ over the singular 1-cell $\phi$ hence is just the familiar line integral

(5.91) $$\int_\phi \omega = \int_\phi f \cdot \tau\, ds$$

as in (5.15); and for simple parametrized curves the integral is determined by the image curve $\gamma = \phi([0, 1])$ so it is often denoted just by $\int_\gamma \omega = \int_\gamma f \cdot \tau\, ds$.

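For example if $\gamma \subset \mathbb{R}^3$ is the curve from the origin $a = (0, 0, 0)$ to the point $b = (1, 1, 1)$ that is the image of the mapping $\phi : [0, 1] \to \mathbb{R}^3$ given by $\phi(t) = (t, t^2, t^3)$, and if $f$ is the vector field $f(x) = (x_1^2,\ x_2,\ x_1 x_2 x_3)$, then

$$\begin{aligned} \int_\gamma f \cdot \tau\, ds &= \int_\gamma \bigl( x_1^2\, dx_1 + x_2\, dx_2 + x_1 x_2 x_3\, dx_3 \bigr) \\ &= \int_{[0,1]} \bigl( t^2 \cdot dt + t^2 \cdot 2t\, dt + t^6 \cdot 3t^2\, dt \bigr) \\ &= \int_{[0,1]} \bigl( t^2 + 2t^3 + 3t^8 \bigr)\, dt = \frac{1}{3} + \frac{1}{2} + \frac{1}{3} = \frac{7}{6}. \end{aligned}$$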
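The computation in this example can be cross-checked numerically by pulling the 1-form back to the parameter interval and integrating. The following is a minimal sketch in Python, not part of the original text; the function name `line_integral`, the use of the composite Simpson rule and the step count are illustrative choices. Term by term the pulled-back integrand is $t^2 + 2t^3 + 3t^8$, whose integral over $[0, 1]$ is $1/3 + 1/2 + 1/3 = 7/6$.

```python
def line_integral(steps=1000):
    """Composite Simpson rule for the pulled-back integrand on [0, 1].

    The number of steps must be even.
    """

    def integrand(t):
        x1, x2, x3 = t, t ** 2, t ** 3            # the point phi(t)
        dx1, dx2, dx3 = 1.0, 2 * t, 3 * t ** 2    # the derivative phi'(t)
        f1, f2, f3 = x1 ** 2, x2, x1 * x2 * x3    # the vector field f at phi(t)
        return f1 * dx1 + f2 * dx2 + f3 * dx3     # f(phi(t)) . phi'(t)

    h = 1.0 / steps
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0


value = line_integral()
print(value, 7.0 / 6.0)
```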

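As this example illustrates, the explicit calculation of line integrals is often simpler when these integrals are expressed in terms of differential 1-forms.

Another commonly considered special case is that of the integration of differential 2-forms over 2-dimensional singular cells in $\mathbb{R}^3$. Let $\phi : \Delta \to U$ be a $C^1$ singular 2-cell in an open subset $U \subset \mathbb{R}^3$, where the parameters in $\mathbb{R}^2$ are $t_1, t_2$ with the natural orientation $dt_1 \wedge dt_2$ and the parameters in $\mathbb{R}^3$ are $x_1, x_2, x_3$ with the natural orientation $dx_1 \wedge dx_2 \wedge dx_3$. A continuous 2-form in $\mathbb{R}^3$ can be viewed as the differential form

$\Omega_f(x) = f_1(x)\, dx_2 \wedge dx_3 + f_2(x)\, dx_3 \wedge dx_1 + f_3(x)\, dx_1 \wedge dx_2$

associated to the vector field $f(x)$, with the usual cyclic notation as in (5.33). The induced differential form under the mapping $\phi$ is the differential 2-form

$$\begin{aligned} \phi^*(\Omega_f)(t) &= \sum_{j_1, j_2 = 1}^2 \left( f_1\bigl(\phi(t)\bigr) \frac{\partial x_2}{\partial t_{j_1}} \frac{\partial x_3}{\partial t_{j_2}} + f_2\bigl(\phi(t)\bigr) \frac{\partial x_3}{\partial t_{j_1}} \frac{\partial x_1}{\partial t_{j_2}} + f_3\bigl(\phi(t)\bigr) \frac{\partial x_1}{\partial t_{j_1}} \frac{\partial x_2}{\partial t_{j_2}} \right) dt_{j_1} \wedge dt_{j_2} \\ &= \Biggl( f_1\bigl(\phi(t)\bigr) \left( \frac{\partial x_2}{\partial t_1} \frac{\partial x_3}{\partial t_2} - \frac{\partial x_2}{\partial t_2} \frac{\partial x_3}{\partial t_1} \right) + f_2\bigl(\phi(t)\bigr) \left( \frac{\partial x_3}{\partial t_1} \frac{\partial x_1}{\partial t_2} - \frac{\partial x_3}{\partial t_2} \frac{\partial x_1}{\partial t_1} \right) \\ &\qquad + f_3\bigl(\phi(t)\bigr) \left( \frac{\partial x_1}{\partial t_1} \frac{\partial x_2}{\partial t_2} - \frac{\partial x_1}{\partial t_2} \frac{\partial x_2}{\partial t_1} \right) \Biggr)\, dt_1 \wedge dt_2, \end{aligned}$$

which can be written more succinctly as

(5.92) $$\phi^*(\Omega_f)(t) = \det\begin{pmatrix} f_1\bigl(\phi(t)\bigr) & f_2\bigl(\phi(t)\bigr) & f_3\bigl(\phi(t)\bigr) \\ \frac{\partial x_1}{\partial t_1} & \frac{\partial x_2}{\partial t_1} & \frac{\partial x_3}{\partial t_1} \\ \frac{\partial x_1}{\partial t_2} & \frac{\partial x_2}{\partial t_2} & \frac{\partial x_3}{\partial t_2} \end{pmatrix} dt_1 \wedge dt_2,$$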



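or using the cross-product introduced in (5.37) as

(5.93) $$\phi^*(\Omega_f)(t) = f\bigl(\phi(t)\bigr) \cdot \left( \frac{\partial x}{\partial t_1} \times \frac{\partial x}{\partial t_2} \right) dt_1 \wedge dt_2$$

in terms of the vector fields

$$\frac{\partial x}{\partial t_j} = \left( \frac{\partial x_1}{\partial t_j},\ \frac{\partial x_2}{\partial t_j},\ \frac{\partial x_3}{\partial t_j} \right) \quad \text{for } j = 1, 2.$$

Thus the integral of the differential 2-form $\Omega_f$ on the singular 2-cell $\phi$ is given by

(5.94) $$\int_\phi \Omega_f = \int_\Delta f\bigl(\phi(t)\bigr) \cdot \left( \frac{\partial x}{\partial t_1} \times \frac{\partial x}{\partial t_2} \right) dt_1 \wedge dt_2;$$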

and if $\phi$ is a simple 2-cell with image $\phi(\Delta) = \Gamma \subset U$ this can be viewed as the integral $\int_\Gamma \Omega_f$. In either case the value of this integral depends on both the orientation of the space $\mathbb{R}^3$, which determines the order of the columns in (5.92), and the orientation of the space $\mathbb{R}^2$, which determines the order of the rows in (5.92). If the two vectors $\partial x / \partial t_1$ and $\partial x / \partial t_2$ are linearly independent at a point $t \in \Delta$ then $\operatorname{rank} \phi'(t) = 2$ and it follows from the rank theorem that the image of an open neighborhood of the point $t \in \Delta$ is a 2-dimensional submanifold of $U$; but of course this may not be all of the image $\Gamma$ near the point $\phi(t) \in U$ if the 2-cell is not a simple 2-cell. The three vectors

$$\frac{\partial x}{\partial t_1},\quad \frac{\partial x}{\partial t_2},\quad \frac{\partial x}{\partial t_1} \times \frac{\partial x}{\partial t_2}$$

have the orientation compatible with the chosen orientation of $\mathbb{R}^3$, described by the order $dx_1, dx_2, dx_3$ of the basis vectors along the coordinate axes. The unit vector in the direction of the cross-product $\frac{\partial x}{\partial t_1} \times \frac{\partial x}{\partial t_2}$ is the unit normal vector to that oriented submanifold, which is denoted by $\nu(t)$; and in these terms

(5.95) $$\frac{\partial x}{\partial t_1} \times \frac{\partial x}{\partial t_2} = \left\| \frac{\partial x}{\partial t_1} \times \frac{\partial x}{\partial t_2} \right\|_2 \nu(t).$$

In parallel to the interpretation of line integrals in (5.91), the integral (5.94) can be written

(5.96) $$\int_\Gamma \Omega_f = \int_\Gamma f \cdot \nu\, dS$$

and can be viewed as the integral over the surface $\Gamma$ of the dot product of the vector field $f$ and the unit normal vector $\nu$ to $\Gamma$, integrated with respect to the surface area of $\Gamma$, where the surface area $|\Gamma|$ is defined by³

(5.97) $$|\Gamma| = \int_\Delta \left\| \frac{\partial x}{\partial t_1} \times \frac{\partial x}{\partial t_2} \right\|_2.$$

3The direct definition of surface area is an extremely complicated matter, as discussed indetail for instance in the classical book by T. Radoo Length and Area , American Mathematical




The calculation of the surface area can be a rather complicated one, involvingexplicit integrals of the form

(5.98)

|Γ| =

∆

det

∂x2∂t1

∂x3∂t1

∂x2∂t2

∂x3∂t2

2

+ det

∂x3∂t1

∂x1∂t1

∂x3∂t2

∂x1∂t2

2

+ det

∂x1∂t1

∂x2∂t1

∂x1∂t2

∂x2∂t2

2

.

As in the case of line integrals it is generally much much simpler to calculatethe integral (5.94) as the integral of the differential form Ω f in terms of anyparametrization of the singular cell Γ, without calculating the surface area.

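For example the cylindrical surface $\Gamma \subset \mathbb{R}^3$ defined by $x_1^2 + x_2^2 = 1$, $0 \le x_3 \le 1$, can be described as the image of the singular 2-cell $\phi : \Delta^2 \longrightarrow \Gamma$ given by $\phi(t) = (\cos t_1, \sin t_1, t_2)$, where the orientation of the cell is described by the differential form $dt_1 \wedge dt_2$. At the point $(1, 0, 0) = \phi(0, 0)$ on the boundary of the surface $\Gamma$

\[ \frac{\partial x}{\partial t_1} = (-\sin t_1, \cos t_1, 0)\Big|_{t_1 = t_2 = 0} = (0, 1, 0), \qquad \frac{\partial x}{\partial t_2} = (0, 0, 1), \]

hence

\[ \frac{\partial x}{\partial t_1} \times \frac{\partial x}{\partial t_2} = (1, 0, 0), \]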

which points outward from the interior of the cylinder. (For the opposite orientation of the cylinder the orientation of the unit cell $\Delta^2$ would be described by the differential form $dt_2 \wedge dt_1$.) The differential 2-form

\[ \Omega_f = x_1\, dx_2 \wedge dx_3 + x_2\, dx_3 \wedge dx_1 + x_3\, dx_1 \wedge dx_2, \]

Society Colloquium Publications, New York, 1948. Those difficulties are finessed by using the pattern of the calculation of arc length through line integrals to define surface area, reversing the order of the discussion in Section 5.15. It is customary to attempt to justify this definition of surface area by observing that the length $\left\| \frac{\partial x}{\partial t_1} \times \frac{\partial x}{\partial t_2} \right\|_2$ of the cross product of the two vectors $\frac{\partial x}{\partial t_1}$ and $\frac{\partial x}{\partial t_2}$ is the area of the parallelogram spanned by these two vectors, which can be viewed as a planar approximation to the surface since the vectors are tangent vectors approximating the lengths of the two curves that are the image of the $t_1$ and $t_2$ coordinate axes under the mapping $\phi : \mathbb{R}^2 \longrightarrow \mathbb{R}^3$. What is perhaps more convincing though is that the integral $\int_\Gamma dS$ for some standard surfaces yields the usual values for surface areas.




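is that described by the vector field $f$ where $f(x) = (x_1, x_2, x_3)$, and its integral is

\[ \int_\Gamma f \cdot \nu \, dS = \int_\Gamma \Omega_f = \int_\Delta \big( x_1\, dx_2 \wedge dx_3 + x_2\, dx_3 \wedge dx_1 + x_3\, dx_1 \wedge dx_2 \big) \]

\[ = \int_\Delta \det\begin{pmatrix} \cos t_1 & \sin t_1 & t_2 \\ -\sin t_1 & \cos t_1 & 0 \\ 0 & 0 & 1 \end{pmatrix} dt_1 \wedge dt_2 = \int_{t_1 \in [0, 2\pi]} \int_{t_2 \in [0, 1]} dt_2\, dt_1 = 2\pi. \]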

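Furthermore since

\[ \frac{\partial x}{\partial t_1} \times \frac{\partial x}{\partial t_2} = \cos t_1\, dx_1 + \sin t_1\, dx_2 + 0\, dx_3 \]

when expressed in terms of the basis vectors $dx_1, dx_2, dx_3$ it follows that

\[ \left\| \frac{\partial x}{\partial t_1} \times \frac{\partial x}{\partial t_2} \right\|_2 = 1 \]

so the area of the cylinder $\Gamma$ is

\[ |\Gamma| = \int_\Gamma 1 \, dS = \int_\Delta 1 = 2\pi. \]

That is of course a familiar result, and it should be somewhat comforting that the general formula does work in this case.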
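The cylinder calculation can also be checked numerically. The following is only a rough sketch, assuming a midpoint rule on the parameter rectangle $[0, 2\pi] \times [0, 1]$; the helper names `phi`, `tangent_vectors` and `integrate` are illustrative choices and do not come from the text.

```python
import math

def phi(t1, t2):
    """Parametrization of the cylinder x1^2 + x2^2 = 1, 0 <= x3 <= 1."""
    return (math.cos(t1), math.sin(t1), t2)

def tangent_vectors(t1, t2):
    """Partial derivatives of phi with respect to t1 and t2."""
    return (-math.sin(t1), math.cos(t1), 0.0), (0.0, 0.0, 1.0)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def integrate(n=200):
    """Midpoint rule on [0, 2*pi] x [0, 1]; returns (flux, area)."""
    h1, h2 = 2 * math.pi / n, 1.0 / n
    flux = area = 0.0
    for i in range(n):
        t1 = (i + 0.5) * h1
        for j in range(n):
            t2 = (j + 0.5) * h2
            normal = cross(*tangent_vectors(t1, t2))
            # Vector field f(x) = x, evaluated on the surface.
            flux += dot(phi(t1, t2), normal) * h1 * h2
            area += math.sqrt(dot(normal, normal)) * h1 * h2
    return flux, area

flux, area = integrate()
print(flux, area, 2 * math.pi)
```

Both approximations should agree with $2\pi$ up to rounding, since the integrands reduce to the constant 1 on this surface.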




5.4 Stokes’s Theorem

For many purposes it is necessary or useful to consider formal sums of singular cells. For example, curves often are expressed as the union of segments, such as when a curve that is not differentiable everywhere is expressed as a union of pieces which are differentiable, or when the curve bounding a region in the plane such as an annulus is a union of separate curves; and in these cases the curves can be viewed as images of formal sums of singular cells. In general, a singular m-chain in an open subset $U \subset \mathbb{R}^n$ is defined to be a finite formal sum

\[ C = \sum_{j=1}^{r} \nu_j \cdot \phi_j \quad \text{for integers } \nu_j \in \mathbb{Z} \text{ and singular } m\text{-cells } \phi_j \text{ in } U; \tag{5.99} \]

for some purposes it is useful also to consider a real singular m-chain, a formal sum (5.99) for real numbers $\nu_j \in \mathbb{R}$ rather than just integers.

As an illustrative example, the point set boundary of the standard unit cell

\[ \Delta^2 = \{ (x_1, x_2) \mid 0 \le x_1, x_2 \le 1 \} \subset \mathbb{R}^2 \tag{5.100} \]

in the plane of the variables $x_1, x_2$ consists of those points $(x_1, x_2) \in \Delta^2$ such that at least one of the variables $x_1, x_2$ takes on the value 0 or 1; thus it naturally consists of the 4 segments

\[ \begin{aligned} \gamma_{1,0} &= \{ (0, x_2) \mid 0 \le x_2 \le 1 \}, & \gamma_{1,1} &= \{ (1, x_2) \mid 0 \le x_2 \le 1 \}, \\ \gamma_{2,0} &= \{ (x_1, 0) \mid 0 \le x_1 \le 1 \}, & \gamma_{2,1} &= \{ (x_1, 1) \mid 0 \le x_1 \le 1 \}. \end{aligned} \tag{5.101} \]

These segments are the images $\gamma_{i,\epsilon} = \eta_{i,\epsilon}([0, 1])$ of four simple singular 1-cells $\eta_{i,\epsilon}$ for $i = 1, 2$ and $\epsilon = 0, 1$, the mappings defined by

\[ \eta_{i,\epsilon}(t) = \begin{cases} (\epsilon, t) \in \mathbb{R}^2 & \text{for } i = 1,\ \epsilon = 0, 1,\ 0 \le t \le 1, \\ (t, \epsilon) \in \mathbb{R}^2 & \text{for } i = 2,\ \epsilon = 0, 1,\ 0 \le t \le 1. \end{cases} \tag{5.102} \]

When these 1-cells are oriented as usual by increasing values of the parameter $t$, the images $\gamma_{i,\epsilon}$ are oriented line segments as indicated in the accompanying Figure 5.1; but these orientations do not reflect the customary practice of viewing the boundary of the 2-cell $\Delta^2$ as a single curve traversed in a counterclockwise direction. For this reason the boundary chain of the singular cell $\Delta^2$ is defined to be the singular 1-chain

\[ \partial \Delta^2 = \eta_{2,0} + \eta_{1,1} - \eta_{2,1} - \eta_{1,0}. \tag{5.103} \]




Since the singular cells $\eta_{i,\epsilon}$ are simple singular cells, they are described alternatively by their images $\gamma_{i,\epsilon} = \eta_{i,\epsilon}([0, 1])$; hence the singular 1-chain $\partial \Delta^2$ can be described equivalently as

\[ \partial \Delta^2 = \gamma_{2,0} + \gamma_{1,1} - \gamma_{2,1} - \gamma_{1,0}, \tag{5.104} \]

which can be viewed as the traditional boundary curve of the cell $\Delta^2$ in Figure 5.1, oriented counterclockwise and decomposed into the union of separate oriented pieces consisting of the images $\gamma_{i,\epsilon}$ of the simple singular 1-cells $\eta_{i,\epsilon}$.

Figure 5.1: The singular cells forming the boundary of a cell $\Delta^2 \subset \mathbb{R}^2$. [Edges labeled $\gamma_{1,0}$, $\gamma_{1,1}$, $\gamma_{2,0}$, $\gamma_{2,1}$.]

In general the point set boundary of the standard unit cell $\Delta^m \subset \mathbb{R}^m$ in the m-dimensional vector space of the variables $x_1, \ldots, x_m$ consists of the $2m$ segments

\[ \Gamma_{i,\epsilon} = \{ (x_1, \ldots, x_{i-1}, \epsilon, x_{i+1}, \ldots, x_m) \mid 0 \le x_k \le 1 \} \tag{5.105} \]

for $1 \le i \le m$ and $\epsilon = 0, 1$. Each of these segments is the image of a simple $(m-1)$-dimensional singular cell $\eta_{i,\epsilon}$, the mapping from the standard unit cell $\Delta^{m-1} \subset \mathbb{R}^{m-1}$ in the $(m-1)$-dimensional vector space of the variables $t_1, \ldots, t_{m-1}$ given by

\[ \eta_{i,\epsilon}(t_1, \ldots, t_{m-1}) = (t_1, \ldots, t_{i-1}, \epsilon, t_i, \ldots, t_{m-1}) \in \Gamma_{i,\epsilon}, \tag{5.106} \]

where the orientation of $\Delta^{m-1}$ is described by the differential form $dt_1 \wedge dt_2 \wedge \cdots \wedge dt_{m-1}$. The boundary chain of the singular cell $\Delta^m$ of an arbitrary dimension $m$ is defined to be the singular $(m-1)$-chain

\[ \partial \Delta^m = \sum_{\substack{1 \le i \le m \\ \epsilon = 0, 1}} (-1)^{i+\epsilon}\, \eta_{i,\epsilon}; \tag{5.107} \]




and since the singular cells $\eta_{i,\epsilon}$ are simple, they can be described fully by their images $\eta_{i,\epsilon}(\Delta^{m-1}) = \Gamma_{i,\epsilon}$, so the boundary chain of the singular cell $\Delta^m$ can be described alternatively in terms of these images by

\[ \partial \Delta^m = \sum_{\substack{1 \le i \le m \\ \epsilon = 0, 1}} (-1)^{i+\epsilon}\, \Gamma_{i,\epsilon}. \tag{5.108} \]

The notation $\partial \Delta$ is the same as that used for the point set boundary of $\Delta$; but generally it should be clear from the context whether $\partial \Delta$ is to be viewed as a point set or as a singular chain, and if necessary to avoid confusion it will be called either the point set boundary or the boundary chain.

Figure 5.2: The boundary of a cell $\Delta^3 \subset \mathbb{R}^3$. [Axes labeled $u_1$, $u_2$, $u_3$; faces labeled $\Gamma_{1,1}$, $\Gamma_{2,1}$, $\Gamma_{3,1}$.]

For $m = 1$ the boundary chain of the unit 1-cell $\Delta^1 = [0, 1]$ is the 0-chain $\partial [0, 1] = 1 - 0$, since $\Gamma_{1,\epsilon}$ is but a single point on the line, while for $m = 2$ the boundary chain of the unit 2-cell $\Delta^2$ is the 1-chain (5.104), the usual boundary curve with the counterclockwise orientation. For $m = 3$ the boundary chain of the unit 3-cell $\Delta^3$ is the 2-chain

\[ \partial \Delta^3 = -\Gamma_{1,0} + \Gamma_{1,1} + \Gamma_{2,0} - \Gamma_{2,1} - \Gamma_{3,0} + \Gamma_{3,1}. \tag{5.109} \]

In more detail, let $\Delta^3$ be the 3-cell in the vector space $\mathbb{R}^3$ of the variables $x_1, x_2, x_3$ and let $u_1, u_2, u_3$ be unit vectors along these coordinate axes, as in Figure 5.2. The boundary segment $\Gamma_{3,1}$ is the top side $x_3 = 1$ and the boundary segment $\Gamma_{3,0}$ is the bottom side $x_3 = 0$ of the cell $\Delta^3$, the images of the simple singular 2-cells $\eta_{3,\epsilon} : \Delta^2 \longrightarrow \mathbb{R}^3$ where $\eta_{3,\epsilon}(t_1, t_2) = (t_1, t_2, \epsilon)$ for $\epsilon = 0, 1$. The orientation of the cell $\Delta^2$ is described by the differential form $dt_1 \wedge dt_2$ or equivalently by the order of the vectors $u_1, u_2$, the images of the vectors $dt_1, dt_2$; but it is more convenient to describe the orientation alternatively by the cross product $u_1 \times u_2 = u_3$. In the boundary chain $\partial \Delta^3$ the segment $\Gamma_{3,1}$ appears with the positive sign, so with the orientation described by the vector $u_3$, the vector pointing outward from the cell $\Delta^3$; but the segment $\Gamma_{3,0}$ appears with the negative sign, so with the orientation described by the vector $-u_3$, again the vector pointing outward from the cell $\Delta^3$.

Similarly the boundary segment $\Gamma_{1,1}$ is the front side $x_1 = 1$ and the boundary segment $\Gamma_{1,0}$ is the back side $x_1 = 0$ of the cell $\Delta^3$, the images of the simple singular 2-cells $\eta_{1,\epsilon} : \Delta^2 \longrightarrow \mathbb{R}^3$ where $\eta_{1,\epsilon}(t_1, t_2) = (\epsilon, t_1, t_2)$ for $\epsilon = 0, 1$. The orientation of the cell $\Delta^2$ is described by the differential form $dt_1 \wedge dt_2$ or equivalently by the order of the vectors $u_2, u_3$, the images of the vectors $dt_1, dt_2$; but it also can be described by the cross product $u_2 \times u_3 = u_1$. In the boundary chain $\partial \Delta^3$ the segment $\Gamma_{1,1}$ appears with the positive sign, so with the orientation described by the vector $u_1$, the vector pointing outward from the cell $\Delta^3$; but the segment $\Gamma_{1,0}$ appears with the negative sign, so with the orientation described by the vector $-u_1$, again the vector pointing outward from the cell $\Delta^3$.

Finally the boundary segment $\Gamma_{2,1}$ is the right-hand side $x_2 = 1$ and the boundary segment $\Gamma_{2,0}$ is the left-hand side $x_2 = 0$ of the cell $\Delta^3$, the images of the simple singular 2-cells $\eta_{2,\epsilon} : \Delta^2 \longrightarrow \mathbb{R}^3$ where $\eta_{2,\epsilon}(t_1, t_2) = (t_1, \epsilon, t_2)$ for $\epsilon = 0, 1$. The orientation of the cell $\Delta^2$ is described by the differential form $dt_1 \wedge dt_2$ or equivalently by the order of the vectors $u_1, u_3$, the images of the vectors $dt_1, dt_2$, or yet again alternatively by the cross product $u_1 \times u_3 = -u_2$. In the boundary chain $\partial \Delta^3$ the segment $\Gamma_{2,1}$ appears with the negative sign, so with the orientation described by the vector $u_2$, the vector pointing outward from the cell $\Delta^3$; but the segment $\Gamma_{2,0}$ appears with the positive sign, so with the orientation described by the vector $-u_2$, again the vector pointing outward from the cell $\Delta^3$.

Altogether then, the boundary segments of the cell $\Delta^3$ are oriented so that the cross product of the unit vectors describing the orientation by their order is pointed outwards, an easy way to remember the orientation and to use it in practice.
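The outward-normal rule for the six faces of the cube can be verified mechanically. The sketch below is a hypothetical illustration (the function names are not from the text): it builds each face map $\eta_{i,\epsilon}$, takes the cross product of the images of the two parameter directions, multiplies by the sign $(-1)^{i+\epsilon}$ from the boundary chain, and compares the result with the outward unit normal.

```python
from itertools import product

def face_map(i, eps):
    """eta_{i,eps}: insert the constant eps as the i-th coordinate (1-based)."""
    def eta(t):
        return tuple(t[:i - 1]) + (eps,) + tuple(t[i - 1:])
    return eta

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def oriented_normal(i, eps):
    """Sign (-1)^(i+eps) times the cross product of the images of dt1, dt2."""
    eta = face_map(i, eps)
    origin = eta((0, 0))
    # Images of the two basis directions under the affine map eta.
    v1 = tuple(p - q for p, q in zip(eta((1, 0)), origin))
    v2 = tuple(p - q for p, q in zip(eta((0, 1)), origin))
    sign = (-1) ** (i + eps)
    return tuple(sign * c for c in cross(v1, v2))

def outward_normal(i, eps):
    """Unit vector pointing out of the cube through the face x_i = eps."""
    n = [0, 0, 0]
    n[i - 1] = 1 if eps == 1 else -1
    return tuple(n)

for i, eps in product((1, 2, 3), (0, 1)):
    print(i, eps, oriented_normal(i, eps), outward_normal(i, eps))
```

For every pair $(i, \epsilon)$ the two printed vectors should coincide, which is the content of the rule stated above.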

The boundary chain of the standard unit m-cell $\Delta^m$ is the singular chain (5.107), a formal sum of mappings from the $(m-1)$-cell $\Delta^{m-1}$ into $\Delta^m$, actually into the point set boundary of $\Delta^m$. A general singular m-cell in $\mathbb{R}^n$ is a continuous mapping $\phi : \Delta^m \longrightarrow \mathbb{R}^n$, and its boundary chain is defined to be the singular $(m-1)$-chain

\[ \partial \phi = \sum_{\substack{1 \le i \le m \\ \epsilon = 0, 1}} (-1)^{i+\epsilon}\, \phi \circ \eta_{i,\epsilon}, \tag{5.110} \]

where $\phi \circ \eta_{i,\epsilon}$ is the composition of the mappings $\eta_{i,\epsilon} : \Delta^{m-1} \longrightarrow \Delta^m$ and the mapping $\phi : \Delta^m \longrightarrow \mathbb{R}^n$. If the singular cell $\phi$ is a simple singular cell it is determined uniquely by its image $\phi(\Delta^m)$, indeed often is identified carelessly with its image; and in that case the singular $(m-1)$-cells $\phi \circ \eta_{i,\epsilon}$ also are simple singular cells which are determined uniquely by their images, so they too can be identified with their images $\Gamma_{i,\epsilon} = \phi \circ \eta_{i,\epsilon}(\Delta^{m-1})$ and consequently the boundary chain can be described by the formal sum

\[ \partial \phi(\Delta^m) = \sum_{\substack{1 \le i \le m \\ \epsilon = 0, 1}} (-1)^{i+\epsilon}\, \Gamma_{i,\epsilon}. \tag{5.111} \]

This is illustrated in the example of a 2-dimensional singular cell in Figure 5.3. Of course this interpretation is limited to the case of a simple singular cell, and must be used with caution otherwise. The boundary chain of a general singular chain $C = \sum_i \nu_i \phi_i$ is defined as the singular chain

\[ \partial C = \partial \Big( \sum_i \nu_i \phi_i \Big) = \sum_i \nu_i \cdot \partial \phi_i. \tag{5.112} \]

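Figure 5.3: The boundary of a cell and of a singular cell. [The cell $\Delta$ with edges $\gamma_{1,0}$, $\gamma_{1,1}$, $\gamma_{2,0}$, $\gamma_{2,1}$ and its image $\phi(\Delta)$ with edges $\phi(\gamma_{1,0})$, $\phi(\gamma_{1,1})$, $\phi(\gamma_{2,0})$, $\phi(\gamma_{2,1})$.]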

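A singular m-chain $C$ is said to be a cycle if $\partial C = 0$, and it is said to be a boundary if $C = \partial C_0$ for some singular $(m+1)$-chain $C_0$. For example, the boundary chain of a 1-cell $[a, b] \subset \mathbb{R}^1$ is the singular 0-chain $\partial [a, b] = b - a$, hence the boundary chain of the singular chain (5.104) is

\[ \begin{aligned} \partial \big( \partial \Delta^2 \big) &= \partial \gamma_{2,0} + \partial \gamma_{1,1} - \partial \gamma_{2,1} - \partial \gamma_{1,0} \\ &= \big( (1, 0) - (0, 0) \big) + \big( (1, 1) - (1, 0) \big) - \big( (1, 1) - (0, 1) \big) - \big( (0, 1) - (0, 0) \big) \\ &= 0; \end{aligned} \tag{5.113} \]

thus the boundary chain $\partial \Delta^2$ is a cycle, and that is the general situation.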

Theorem 5.14 The boundary chain of a singular chain is a cycle, or equivalently $\partial (\partial C) = 0$ for any singular chain $C$.

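Proof: First the boundary chain of the standard unit cell $\Delta^m$ as defined in (5.107) is the formal sum

\[ \partial \Delta^m = \sum_{\substack{1 \le i \le m \\ \epsilon = 0, 1}} (-1)^{i+\epsilon}\, \eta_{i,\epsilon,m-1} \tag{5.114} \]

of the mappings $\eta_{i,\epsilon,m-1} : \Delta^{m-1} \longrightarrow \Delta^m$ defined in (5.106), where the additional subscript $m - 1$ emphasizes that these are mappings from the standard unit $(m-1)$-cell. The boundary chain of the chain (5.114) as defined in (5.112) is then the formal sum

\[ \partial (\partial \Delta^m) = \sum_{\substack{1 \le i \le m \\ \epsilon = 0, 1}} (-1)^{i+\epsilon}\, \partial \eta_{i,\epsilon,m-1} \tag{5.115} \]

where as in (5.110)

\[ \partial \eta_{i,\epsilon,m-1} = \sum_{\substack{1 \le j \le m-1 \\ \delta = 0, 1}} (-1)^{j+\delta}\, \eta_{i,\epsilon,m-1} \circ \eta_{j,\delta,m-2}. \tag{5.116} \]

Altogether then

\[ \partial (\partial \Delta^m) = \sum_{\substack{1 \le i \le m \\ 1 \le j \le m-1 \\ \epsilon, \delta = 0, 1}} (-1)^{i+j+\epsilon+\delta}\, \eta_{i,\epsilon,m-1} \circ \eta_{j,\delta,m-2}. \tag{5.117} \]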

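By (5.106)

\[ \eta_{j,\delta,m-2}(s_1, \ldots, s_{m-2}) = (s_1, \ldots, s_{j-1}, \delta, s_j, \ldots, s_{m-2}) = (t_1, \ldots, t_{m-1}) \tag{5.118} \]

and correspondingly

\[ \eta_{i,\epsilon,m-1}(t_1, \ldots, t_{m-1}) = (t_1, \ldots, t_{i-1}, \epsilon, t_i, \ldots, t_{m-1}) = (x_1, \ldots, x_m). \tag{5.119} \]

The images of the mappings (5.118) and (5.119) in the separate cases $j < i$ and $j \ge i$ are as follows, where the first row lists the coordinates in terms of the variables $t_k$ and the second row in terms of the variables $s_k$: first if $j < i$

\[ \begin{array}{l} (t_1, \ldots, t_{j-1}, t_j, t_{j+1}, \ldots, t_{i-1}, \epsilon, t_i, \ldots, t_{m-1}) \\ (s_1, \ldots, s_{j-1}, \delta, s_j, \ldots, s_{i-2}, \epsilon, s_{i-1}, \ldots, s_{m-2}), \end{array} \tag{5.120} \]

while if $j \ge i$

\[ \begin{array}{l} (t_1, \ldots, t_{i-1}, \epsilon, t_i, \ldots, t_{j-1}, t_j, t_{j+1}, \ldots, t_{m-1}) \\ (s_1, \ldots, s_{i-1}, \epsilon, s_i, \ldots, s_{j-1}, \delta, s_j, \ldots, s_{m-2}); \end{array} \tag{5.121} \]

this second array can be rewritten more symmetrically by setting $j' = j + 1$, so for $j' > i$

\[ \begin{array}{l} (t_1, \ldots, t_{i-1}, \epsilon, t_i, \ldots, t_{j'-2}, t_{j'-1}, t_{j'}, \ldots, t_{m-1}) \\ (s_1, \ldots, s_{i-1}, \epsilon, s_i, \ldots, s_{j'-2}, \delta, s_{j'-1}, \ldots, s_{m-2}). \end{array} \tag{5.122} \]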




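From (5.120) and (5.122), after replacing $j'$ by $j$ in (5.122), it follows that

\[ \begin{aligned} \partial (\partial \Delta^m) = {} & \sum_{\substack{j < i \\ \epsilon, \delta = 0, 1}} (-1)^{i+j+\epsilon+\delta}\, (s_1, \ldots, s_{j-1}, \delta, s_j, \ldots, s_{i-2}, \epsilon, s_{i-1}, \ldots, s_{m-2}) \\ & - \sum_{\substack{j > i \\ \epsilon, \delta = 0, 1}} (-1)^{i+j+\epsilon+\delta}\, (s_1, \ldots, s_{i-1}, \epsilon, s_i, \ldots, s_{j-2}, \delta, s_{j-1}, \ldots, s_{m-2}), \end{aligned} \]

where the negative sign in the second line reflects the change of the index of summation from $j$ to $j' = j + 1$ in the sign. The two lines are equal except for the sign, as is evident upon interchanging $i$ and $j$ and interchanging $\epsilon$ and $\delta$, and therefore $\partial (\partial \Delta^m) = 0$. Next the boundary chain of a general singular m-cell $\phi : \Delta^m \longrightarrow \mathbb{R}^n$ is defined to be $\partial \phi = \sum_{1 \le i \le m,\ \epsilon = 0, 1} (-1)^{i+\epsilon}\, \phi \circ \eta_{i,\epsilon}$, as in (5.110), where $\phi \circ \eta_{i,\epsilon}$ is a singular $(m-1)$-cell; so by (5.112) and (5.110) again

\[ \begin{aligned} \partial (\partial \phi) &= \sum_{\substack{1 \le i \le m \\ \epsilon = 0, 1}} (-1)^{i+\epsilon}\, \partial \big( \phi \circ \eta_{i,\epsilon,m-1} \big) \\ &= \sum_{\substack{1 \le i \le m \\ \epsilon = 0, 1}} \ \sum_{\substack{1 \le j \le m-1 \\ \delta = 0, 1}} (-1)^{i+j+\epsilon+\delta}\, \phi \circ \eta_{i,\epsilon,m-1} \circ \eta_{j,\delta,m-2} \\ &= \phi \big( \partial (\partial \Delta^m) \big) = \phi(0) = 0, \end{aligned} \]

upon applying what has been demonstrated for the special case of the standard unit cell $\Delta^m$. Finally by (5.112) the boundary chain of the singular m-chain $C = \sum_i \nu_i \cdot \phi_i$ for singular m-cells $\phi_i$ is defined by $\partial C = \sum_i \nu_i \cdot \partial \phi_i$, and also $\partial (\partial C) = \sum_i \nu_i\, \partial (\partial \phi_i) = 0$, since it was just demonstrated that $\partial (\partial \phi_i) = 0$ for any singular cell $\phi_i$. That suffices to conclude the proof.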
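The cancellation in the proof can be observed directly for small dimensions. The following sketch rests on an encoding that is an assumption of this illustration and not of the text: a face of the cube is a tuple whose entries are 0, 1 or `None` (a free coordinate), so that a composition of face maps $\eta_{i,\epsilon} \circ \eta_{j,\delta}$ is recorded by the coordinates it fixes. A chain is a dictionary from faces to integer coefficients.

```python
from collections import defaultdict

def boundary_of_face(face):
    """Boundary chain of one face, using the signs (-1)^(j+delta) of (5.110)."""
    free = [k for k, v in enumerate(face) if v is None]
    chain = defaultdict(int)
    for j, pos in enumerate(free, start=1):
        for delta in (0, 1):
            # Fix the j-th free coordinate of the face to the value delta.
            sub = face[:pos] + (delta,) + face[pos + 1:]
            chain[sub] += (-1) ** (j + delta)
    return chain

def boundary(chain):
    """Extend boundary_of_face linearly to chains, dropping zero terms."""
    result = defaultdict(int)
    for face, coeff in chain.items():
        for sub, sign in boundary_of_face(face).items():
            result[sub] += coeff * sign
    return {f: c for f, c in result.items() if c != 0}

def unit_cell(m):
    """The standard unit m-cell: every coordinate is free."""
    return {(None,) * m: 1}

for m in range(1, 6):
    print(m, len(boundary(unit_cell(m))), boundary(boundary(unit_cell(m))))
```

Each line of output should list $2m$ faces in the boundary chain and an empty dictionary for the boundary of the boundary, in agreement with Theorem 5.14 for the standard unit cells.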

The set of real singular m-chains in an open subset $U \subset \mathbb{R}^n$ forms a real vector space $C_m(U, \mathbb{R})$, where the sum of two chains $C' = \sum_i \nu'_i \phi'_i$ and $C'' = \sum_j \nu''_j \phi''_j$ is defined to be the chain $C' + C'' = \sum_i \nu'_i \phi'_i + \sum_j \nu''_j \phi''_j$. It is clear from the definition of the boundary chain that $\partial (C' + C'') = \partial C' + \partial C''$, so the set of real singular m-cycles forms a vector subspace $Z_m(U, \mathbb{R}) \subset C_m(U, \mathbb{R})$; and it follows from the preceding theorem that the set of real singular m-boundaries forms a vector subspace $B_m(U, \mathbb{R}) \subset Z_m(U, \mathbb{R})$. The quotient vector space

\[ H_m(U, \mathbb{R}) = \frac{Z_m(U, \mathbb{R})}{B_m(U, \mathbb{R})} \tag{5.123} \]

is called the real singular homology group of dimension $m$ of the set $U$. The same construction can be carried out for ordinary singular chains, but in terms of abelian groups rather than of vector spaces; and the corresponding quotient $H_m(U, \mathbb{Z})$ is called the singular homology group of dimension $m$ of the set $U$. These are algebraic constructions that reflect the geometrical properties of the set $U$, and can be defined on arbitrary manifolds or even more general topological spaces⁴.

The integral of a differential m-form $\omega$ over a singular m-chain is defined by

\[ \int_C \omega = \sum_{j=1}^{r} \nu_j \int_{\phi_j} \omega \quad \text{for the chain } C = \sum_{j=1}^{r} \nu_j \cdot \phi_j, \tag{5.124} \]

whether the coefficients $\nu_j$ are integers or real numbers. For example, twice the integral of a differential form $\omega$ over a singular cell $\phi$ can be viewed alternatively as the integral of that differential form over the singular chain $C = 2 \cdot \phi$; and the negative of the integral of $\omega$ over $\phi$, or what is the same thing the integral of $\omega$ over $\phi$ with the opposite orientation, can be viewed alternatively as the integral of that differential form over the singular chain $C = -1 \cdot \phi = -\phi$. The fundamental theorem of calculus for functions of a single variable extends to functions of several variables in the form of Stokes’s Theorem, which is expressed most naturally in terms of the integrals of differential forms over singular chains; but the proof really is but a straightforward application of the classical fundamental theorem of calculus in a single variable. It actually suffices to prove the result in the following special case.

Theorem 5.15 If $\omega$ is a $C^1$ differential form of degree $m - 1$ defined in an open neighborhood of a cell $\Delta^m \subset \mathbb{R}^m$ then

\[ \int_{\partial \Delta^m} \omega = \int_{\Delta^m} d\omega. \tag{5.125} \]

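Proof: The differential $(m-1)$-form

\[ \omega = f(x)\, dx_1 \wedge \cdots \wedge dx_{r-1} \wedge dx_{r+1} \wedge \cdots \wedge dx_m \tag{5.126} \]

is nontrivial only on the boundary segments $\Gamma_{i,\epsilon}$ for which $i = r$, since if $i \ne r$ the variable $x_i$ is constant on $\Gamma_{i,\epsilon}$ so the differential $dx_i$ vanishes identically on that segment of the boundary; thus

\[ \int_{\partial \Delta^m} \omega = (-1)^r \int_{\Gamma_{r,0}} \omega + (-1)^{r+1} \int_{\Gamma_{r,1}} \omega. \]

The segments $\Gamma_{r,\epsilon}$ are the images

\[ \eta_{r,\epsilon}(\Delta^{m-1}) = \{ (x_1, \ldots, x_{r-1}, \epsilon, x_{r+1}, \ldots, x_m) \mid 0 \le x_i \le 1 \} \]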

⁴ Homology groups are discussed in detail in most books on algebraic topology, as for instance in the book by Allen Hatcher, Algebraic Topology (Cambridge University Press, 2002).




oriented by the differential form $dx_1 \wedge \cdots \wedge dx_{r-1} \wedge dx_{r+1} \wedge \cdots \wedge dx_m$; so the integral of the differential form $\omega$ over the boundary chain $\partial \Delta^m$ is just

\[ \int_{\partial \Delta^m} \omega = \int_{\eta_{r,\epsilon}(\Delta^{m-1})} \Big( (-1)^r f(x_1, \ldots, x_{r-1}, 0, x_{r+1}, \ldots, x_m) + (-1)^{r+1} f(x_1, \ldots, x_{r-1}, 1, x_{r+1}, \ldots, x_m) \Big). \]

By the fundamental theorem of calculus in the single variable $x_r$ the integrand in the preceding integral is just $(-1)^{r-1} \int_{[0,1]} \partial_r f(x)$; substituting this into the preceding integral and using Fubini’s Theorem shows that

\[ \int_{\partial \Delta^m} \omega = (-1)^{r-1} \int_{\eta_{r,\epsilon}(\Delta^{m-1})} \int_{[0,1]} \partial_r f(x) = (-1)^{r-1} \int_{\Delta^m} \partial_r f(x). \]

However

\[ \begin{aligned} d\omega &= \partial_r f(x)\, dx_r \wedge dx_1 \wedge \cdots \wedge dx_{r-1} \wedge dx_{r+1} \wedge \cdots \wedge dx_m \\ &= (-1)^{r-1} \partial_r f(x)\, dx_1 \wedge \cdots \wedge dx_m \end{aligned} \]

so

\[ \int_{\Delta^m} d\omega = (-1)^{r-1} \int_{\Delta^m} \partial_r f(x), \]

which shows that (5.125) holds for differential forms of the special type (5.126). Any differential form can be written as a sum of differential forms of the type (5.126) for various values of the index $r$, and both sides of (5.125) are clearly linear functions of the differential form $\omega$; hence (5.125) holds for arbitrary differential forms, and that suffices for the proof.
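For $m = 2$ the identity (5.125) can be tested numerically. The sketch below makes several assumptions for the sake of illustration: the form $\omega = P\,dx_1 + Q\,dx_2$ uses coefficient functions `P` and `Q` chosen arbitrarily, the exterior derivative is entered by hand, and all integrals are approximated by the midpoint rule. The boundary integral follows the chain $\partial \Delta^2 = \eta_{2,0} + \eta_{1,1} - \eta_{2,1} - \eta_{1,0}$ of (5.103).

```python
import math

def P(x1, x2):
    return x2 ** 3 + x1 * x2

def Q(x1, x2):
    return x1 ** 2 * x2 + math.exp(x1)

def d_omega(x1, x2):
    """Coefficient of dx1 ^ dx2 in d(omega): dQ/dx1 - dP/dx2, computed by hand."""
    return (2 * x1 * x2 + math.exp(x1)) - (3 * x2 ** 2 + x1)

def midpoints(n):
    return [(k + 0.5) / n for k in range(n)]

def boundary_integral(n=400):
    """Integral of omega over eta_{2,0} + eta_{1,1} - eta_{2,1} - eta_{1,0}."""
    total = 0.0
    for t in midpoints(n):
        total += P(t, 0.0)   # eta_{2,0}(t) = (t, 0) pulls omega back to P(t, 0) dt
        total += Q(1.0, t)   # eta_{1,1}(t) = (1, t) pulls omega back to Q(1, t) dt
        total -= P(t, 1.0)   # eta_{2,1}(t) = (t, 1), with sign -1
        total -= Q(0.0, t)   # eta_{1,0}(t) = (0, t), with sign -1
    return total / n

def interior_integral(n=400):
    """Integral of d(omega) over the unit square."""
    ts = midpoints(n)
    return sum(d_omega(x1, x2) for x1 in ts for x2 in ts) / n ** 2

print(boundary_integral(), interior_integral(), math.e - 2)
```

For this particular choice of `P` and `Q` a hand calculation gives $e - 2$ for both sides, so the three printed numbers should agree to within the quadrature error.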

Corollary 5.16 (Stokes’s Theorem) If $C \subset U$ is a $C^1$ m-dimensional singular chain in an open subset $U \subset \mathbb{R}^n$ and ω ∈ Λm(1)(U) is a $C^1$ differential $(m-1)$-form in $U$ then

\[ \int_{\partial C} \omega = \int_C d\omega. \tag{5.127} \]

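Proof: First for the special case of a singular cell $\phi : \Delta^m \longrightarrow U$ in an open subset $U \subset \mathbb{R}^n$, since $\partial \phi = \sum_{1 \le i \le m,\ \epsilon = 0, 1} (-1)^{i+\epsilon}\, \phi \circ \eta_{i,\epsilon}$ for the mappings $\phi \circ \eta_{i,\epsilon}$ it follows from the definition (5.88) of the integral over a singular cell, Theorem 5.12 and the preceding theorem that for any $C^1$ differential $(m-1)$-form $\omega$ in $U$

\[ \begin{aligned} \int_{\partial \phi} \omega &= \sum_{\substack{1 \le i \le m \\ \epsilon = 0, 1}} (-1)^{i+\epsilon} \int_{\phi \circ \eta_{i,\epsilon}} \omega = \sum_{\substack{1 \le i \le m \\ \epsilon = 0, 1}} (-1)^{i+\epsilon} \int_{\Delta^{m-1}} (\phi \circ \eta_{i,\epsilon})^*(\omega) \\ &= \sum_{\substack{1 \le i \le m \\ \epsilon = 0, 1}} (-1)^{i+\epsilon} \int_{\Delta^{m-1}} \eta_{i,\epsilon}^*\, \phi^*(\omega) = \int_{\partial \Delta^m} \phi^*(\omega) \\ &= \int_{\Delta^m} d\phi^*(\omega) = \int_{\Delta^m} \phi^*(d\omega) = \int_\phi d\omega. \end{aligned} \]

Then if $C = \sum_i \nu_i \cdot \phi_i$ is a singular chain where $\phi_i$ are singular cells in $U$ and if $\omega$ is a $C^1$ differential $(m-1)$-form it follows from what has just been demonstrated and the definition (5.124) of the integral over a singular chain that

\[ \int_{\partial C} \omega = \sum_i \nu_i \int_{\partial \phi_i} \omega = \sum_i \nu_i \int_{\phi_i} d\omega = \int_{\sum_i \nu_i \cdot \phi_i} d\omega = \int_C d\omega, \]

and that suffices for the proof.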

Examples of the application of Stokes’s Theorem abound. It can be used for instance to complement the discussion of conservative vector fields in Section 5.1. It was noted there that any closed 1-form is locally the exterior derivative of a function, indeed is the exterior derivative of a function in any cell but not necessarily so in more general regions; but it is the exterior derivative of a function in a class of regions that can be described topologically.

Corollary 5.17⁵ If $U \subset \mathbb{R}^n$ is an open subset such that $H_1(U, \mathbb{R}) = 0$ then any closed $C^1$ differential 1-form in $U$ is the exterior derivative of a function in $U$.

Proof: It follows from Theorem 5.8 that it is sufficient to show that the integral $\int_\gamma \omega$ of any closed 1-form $\omega$ on curves $\gamma$ between two points $a, b \in U$ depends only on the points $a, b$ and not on the choice of the curve $\gamma$ from $a$ to $b$. If $\gamma_1, \gamma_2$ are two curves from $a$ to $b$, viewed as singular 1-chains, then $\gamma = \gamma_1 - \gamma_2$ is a closed singular 1-chain in $U$ so as a consequence of the hypothesis it is the

⁵ The proof of this corollary obviously extends directly to show that if $U \subset \mathbb{R}^n$ is an open subset for which $H_m(U, \mathbb{R}) = 0$ then any closed differential m-form in $U$ is also exact. The set of exact differential m-forms in $U$ forms a vector subspace of the vector space of closed differential m-forms, and the quotient space $\mathcal{H}^m(U)$ is called the deRham group of the set $U$. In these terms the extension of the corollary can be stated as the assertion that if $H_m(U, \mathbb{R}) = 0$ then $\mathcal{H}^m(U) = 0$. The deRham theorem shows that in many circumstances the homology space $H_m(U, \mathbb{R})$ actually is isomorphic to the deRham group $\mathcal{H}^m(U)$, providing an analytic interpretation of a geometric property of the set $U$; this result is particularly useful in examining general manifolds, and is discussed further in the book From Calculus to Cohomology: De Rham Cohomology and Characteristic Classes by Ib H. Madsen and Jørgen Tornehave (Cambridge University Press, 1997), among other places.




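boundary $\gamma = \partial C$ of a singular 2-chain $C \subset U$; then by Stokes’s Theorem

\[ \int_{\gamma_1} \omega - \int_{\gamma_2} \omega = \int_\gamma \omega = \int_{\partial C} \omega = \int_C d\omega = \int_C 0 = 0. \]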


A number of applications of Stokes’s Theorem can be restated in more classical terms in the special cases of simple singular n-chains in open subsets $U \subset \mathbb{R}^n$, which are defined to be chains of the special form $C = \sum_i \phi_i$ in which $\phi_i$ are $C^1$ singular cells, the images of which are Jordan domains $\phi_i(\Delta^n) = \Gamma_i \subset U$ that are pairwise disjoint except for their boundaries. The support of a simple singular chain $C = \sum_i \phi_i$ is the subset $|C| = \bigcup_i \phi_i(\Delta^n) \subset U$.

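Theorem 5.18 If $D = |C| \subset \mathbb{R}^n$ is the support of a $C^1$ simple singular n-chain $C$ in an open subset $U \subset \mathbb{R}^n$ and if

\[ \omega = \sum_{i=1}^{n} f_i\, dx_1 \wedge \cdots \wedge dx_{i-1} \wedge dx_{i+1} \wedge \cdots \wedge dx_n \tag{5.128} \]

is a $C^1$ differential $(n-1)$-form in $U$ then

\[ \int_{\partial C} \omega = \int_D \sum_{i=1}^{n} (-1)^{i-1} \frac{\partial f_i}{\partial x_i}. \tag{5.129} \]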

Proof: By Stokes’s Theorem $\int_{\partial C} \omega = \int_C d\omega$ for the differential form (5.128) and the simple singular n-chain $C$. The integral of any differential n-form $\sigma = h(x)\, dx_1 \wedge \cdots \wedge dx_n$ over the simple singular n-chain $C = \sum_i \phi_i$ is $\int_C \sigma = \sum_i \int_{\phi_i} \sigma = \sum_i \int_{\Delta^n} \phi_i^*(\sigma)$. Since the $C^1$ mappings $\phi_i : \Delta^n \longrightarrow \Gamma_i$ are orientation-preserving homeomorphisms by definition, it follows from Theorem 5.13 that $\int_{\Delta^n} \phi_i^*(\sigma) = \int_{\Gamma_i} \sigma$; and $\int_{\Gamma_i} \sigma = \int_{\Gamma_i} h$ as in (5.80). The Jordan domains $\Gamma_i$ are disjoint, except for their boundaries, so from Theorem 4.10 it follows that $\sum_i \int_{\Gamma_i} h = \int_{|C|} h = \int_D h$; altogether therefore $\int_C \sigma = \int_D h$. In particular for the differential form $\sigma = d\omega$ in which $h = \sum_{i=1}^{n} (-1)^{i-1} \partial f_i / \partial x_i$ it follows that (5.129) holds as desired, to conclude the proof.

Corollary 5.19 (Green’s Theorem) If $D \subset U$ is the support of a simple singular 2-chain $C$ in an open subset $U \subset \mathbb{R}^2$ then

\[ \int_{\partial C} f_1\, dx_1 + f_2\, dx_2 = \int_D \left( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \right) \tag{5.130} \]

for any $C^1$ vector field $f = (f_1, f_2)$ in $U$.

Proof: This is just the special case of the preceding theorem for the differential form $\omega_f$ in $\mathbb{R}^2$, so no further proof is required.




Figure 5.4: An annular region viewed as the support of a singular chain $C = \phi_1 + \phi_2$. [The cells $\Delta_1$, $\Delta_2$ are mapped by $\phi_1$, $\phi_2$ onto the two halves $\Gamma_1$, $\Gamma_2$ of the annulus; the outer and inner circles are labeled $\gamma_1$ and $\gamma_2$.]

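Green’s Theorem can be applied for example to the annular region $D$ in Figure 5.4, which is exhibited as the support of the simple singular 2-chain $C = \phi_1 + \phi_2$; the boundary $\partial C$ consists of the outer circle $\gamma_1$ oriented in the counterclockwise direction and the inner circle $\gamma_2$ oriented in the clockwise direction. If $\omega = f_1\, dx_1 + f_2\, dx_2$ is a $C^1$ differential 1-form in an open neighborhood of $D$ then Green’s Theorem asserts that $\int_{\gamma_1} \omega - \int_{\gamma_2} \omega = \int_D \left( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \right)$, where both circles are traversed counterclockwise in these two line integrals. In particular it follows that

\[ \int_{\gamma_1} x_1\, dx_2 - \int_{\gamma_2} x_1\, dx_2 = \int_D dx_1 \wedge dx_2 = |D|, \tag{5.131} \]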

where $|D|$ is the area of the annular region $D$. Of course the same formula can be applied to any set that is the support of a simple singular chain, showing that the area of any such set can be calculated from its boundary alone⁶.

⁶ This method for calculating the area of a region from its boundary is embodied in mechanical and electronic machines, called planimeters, that have long been used in engineering and surveying; these machines are discussed for example in the book by John Bryant and Christopher J. Sangwin, How round is your circle? Where engineering and mathematics meet, Princeton University Press, 2008.

As a more interesting example of this technique consider the hypocycloid curve in Figure 5.5, defined by the equation

\[ x_1^{2/3} + x_2^{2/3} = a^{2/3} \]

for some constant $a > 0$. This curve can be viewed as the boundary of a simple singular chain $C$ consisting of four simple singular 2-cells, the images of which are the segments in the four quadrants in Figure 5.5. The boundary chain reduces to the union of the four curved arcs, traversed in the counterclockwise direction, so is the image of the singular 1-cell

\[ \phi(t) = (a \cos^3 t,\ a \sin^3 t) \quad \text{for } 0 \le t \le 2\pi; \]

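consequently, since $d(x_1\, dx_2) = d(-x_2\, dx_1) = dx_1 \wedge dx_2$ so that the area can be calculated as the integral of $\tfrac{1}{2}(x_1\, dx_2 - x_2\, dx_1)$ over the boundary, the area $|D|$ of the region enclosed by the hypocycloid is

\[ \begin{aligned} |D| &= \frac{1}{2} \int_{[0, 2\pi]} (x_1\, dx_2 - x_2\, dx_1) \\ &= \frac{1}{2} \int_{[0, 2\pi]} \Big( a \cos^3 t \cdot 3a \sin^2 t \cos t - a \sin^3 t \cdot 3a \cos^2 t\, (-\sin t) \Big) \\ &= \frac{1}{2} \int_{[0, 2\pi]} 3a^2 \sin^2 t \cos^2 t = \frac{1}{2} \int_{[0, 2\pi]} \frac{3}{4} a^2 \sin^2 2t \\ &= \frac{1}{2} \int_{[0, 2\pi]} \frac{3}{8} a^2 (1 - \cos 4t) = \frac{3}{8} \pi a^2. \end{aligned} \]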
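The boundary formula for area lends itself to a quick numerical check. The sketch below is one possible illustration, assuming the closed curve is replaced by an inscribed polygon, for which the integral of $\tfrac{1}{2}(x_1\,dx_2 - x_2\,dx_1)$ reduces to the familiar shoelace sum; the function names are hypothetical.

```python
import math

def astroid(t, a=1.0):
    """The hypocycloid parametrization (a cos^3 t, a sin^3 t)."""
    return a * math.cos(t) ** 3, a * math.sin(t) ** 3

def circle(t):
    return math.cos(t), math.sin(t)

def area_from_boundary(curve, n=20000):
    """Approximate (1/2) * integral of (x1 dx2 - x2 dx1) over a closed curve.

    The curve is sampled at n parameter values in [0, 2*pi] and replaced by
    the inscribed polygon, so the integral becomes the shoelace sum.
    """
    total = 0.0
    prev = curve(0.0)
    for k in range(1, n + 1):
        cur = curve(2 * math.pi * k / n)
        total += prev[0] * cur[1] - cur[0] * prev[1]
        prev = cur
    return 0.5 * total

print(area_from_boundary(astroid), 3 * math.pi / 8)
print(area_from_boundary(circle), math.pi)
```

With $a = 1$ the first line should print two numbers close to $3\pi/8$, and the second line checks the same routine against the area of the unit disc.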

Figure 5.5: A hypocycloid. [Axes labeled $x_1$, $x_2$.]

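Green’s Theorem can be written in terms of the differential 1-form $\omega_f = f_1\, dx_1 + f_2\, dx_2$ associated to a vector field $f$ as the line integral

\[ \int_\gamma f \cdot \tau \, ds = \int_D \left( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \right) \tag{5.132} \]

where $\tau$ is the unit tangent vector to $\gamma$ and $ds$ indicates integration with respect to the arc length of $\gamma$. Green’s Theorem also can be written in terms of the Hodge dual $*\omega_f = f_1\, dx_2 - f_2\, dx_1$ of the differential form $\omega_f$, and since $d{*\omega_f} = (\partial_1 f_1 + \partial_2 f_2)\, dx_1 \wedge dx_2$

\[ \int_\gamma *\omega_f = \int_D \left( \frac{\partial f_1}{\partial x_1} + \frac{\partial f_2}{\partial x_2} \right). \tag{5.133} \]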

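If the curve $\gamma$ is parametrized by a mapping $\phi : [0, 1] \longrightarrow U$ the integral of the differential form $*\omega_f$ is given by

\[ \int_\gamma *\omega_f = \int_{t \in [0, 1]} \left( f_1 \frac{dx_2}{dt} - f_2 \frac{dx_1}{dt} \right). \tag{5.134} \]

This can be viewed as the integral over the curve $\gamma$ of the inner product $f \cdot v$ of the vector field $f$ and the vector field

\[ v = \left( \frac{dx_2}{dt},\ -\frac{dx_1}{dt} \right); \tag{5.135} \]

and since $\|v\|_2^2 = \left( \frac{dx_2}{dt} \right)^2 + \left( \frac{dx_1}{dt} \right)^2$ then with the conventions adopted in the discussion of line integrals the integral (5.134) can be rewritten in terms of the unit vector $\nu = v / \|v\|_2$ as the line integral

\[ \int_\gamma *\omega_f = \int_\gamma f \cdot \nu \, ds \tag{5.136} \]

with respect to the arc length of the curve $\gamma$. The vector field $\nu$ is orthogonal to the tangent vector field $\tau$ along the curve, since $\tau$ is a positive multiple of the velocity vector $dx/dt$ and

\[ v \cdot \frac{dx}{dt} = \left( \frac{dx_2}{dt},\ -\frac{dx_1}{dt} \right) \cdot \left( \frac{dx_1}{dt},\ \frac{dx_2}{dt} \right) = 0; \]

and when $\tau = (1, 0)$ then $\nu = (0, -1)$, so the pair of orthogonal vectors $\nu, \tau$ has the orientation of the vector space $\mathbb{R}^2$. In these terms the version (5.133) of Green’s Theorem takes the form

\[ \int_\gamma f \cdot \nu \, ds = \int_D \left( \frac{\partial f_1}{\partial x_1} + \frac{\partial f_2}{\partial x_2} \right). \tag{5.137} \]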

While the line integral $\int_\gamma f \cdot \tau \, ds$ measures the component of the vector field $f$ in the direction of the tangent vector to $\gamma$, the integral $\int_\gamma f \cdot \nu \, ds$ measures the component of the vector field $f$ in the direction of the outward normal vector to $\gamma$. The first integral sometimes is interpreted as the work done by the vector field $f$ in pushing a particle along the curve $\gamma$, while the second integral sometimes is interpreted as the total flow across the curve $\gamma$ of a substance moving with the velocity at any point $x$ described by the vector $f(x)$. For a conservative vector field $f = \nabla h$ with the potential function $h$ Green’s Theorem in these two cases takes the forms

\[ \int_\gamma \nabla h \cdot \tau \, ds = \int_D \left( \frac{\partial^2 h}{\partial x_1 \partial x_2} - \frac{\partial^2 h}{\partial x_2 \partial x_1} \right) = 0, \]

\[ \int_\gamma \nabla h \cdot \nu \, ds = \int_D \left( \frac{\partial^2 h}{\partial x_1^2} + \frac{\partial^2 h}{\partial x_2^2} \right) = \int_D \Delta h \tag{5.138} \]

where $\Delta h$ is the Laplacian of the function $h$, defined as $\Delta h(x) = \frac{\partial^2 h(x)}{\partial x_1^2} + \frac{\partial^2 h(x)}{\partial x_2^2}$.

Another classical special case of Theorem 5.18 arises in three dimensions, and has a great number of applications in physics.

Corollary 5.20 (Gauss’s Theorem) If $D \subset U$ is the support of a simple singular 3-chain $C$ in $U \subset \mathbb{R}^3$ then

\[ \int_{\partial C} \big( f_1\, dx_2 \wedge dx_3 + f_2\, dx_3 \wedge dx_1 + f_3\, dx_1 \wedge dx_2 \big) = \int_D \sum_{i=1}^{3} \frac{\partial f_i}{\partial x_i} \tag{5.139} \]

for any $C^1$ vector field $f = (f_1, f_2, f_3)$ in $U$.

Proof: This is just the special case of Theorem 5.18 for the differential form $\Omega_f$ in $\mathbb{R}^3$, so no further proof is required.

For the differential form $\Omega_f$ associated to a vector field $f$ the surface integral in (5.139) can be written as in (5.96), while the integrand in the integral over the set $D$ can be written as in (5.68), so Gauss’s Theorem can be written in the alternative form

\[ \int_\Gamma f \cdot \nu \, dS = \int_D \nabla \cdot f = \int_D \operatorname{div} f(x) \tag{5.140} \]

where $\Gamma$ is the boundary chain of the simple singular 3-chain $C$ and $D = |C|$ is the support of the chain $C$. As in the discussion of Green’s Theorem, the surface integral can be interpreted as the total flow across the boundary $\Gamma$ of a substance moving with the velocity at a point $x \in \mathbb{R}^3$ described by the vector $f(x)$. With this interpretation, $\nabla \cdot f(x) = \operatorname{div} f(x)$ describes the amount of this mysterious substance created at the point $x$, so the volume integral can be interpreted as the total amount of the substance being created within $D$. Gauss’s Theorem asserts that the amount of substance being created in $D$ is equal to the amount flowing outward across the boundary $\Gamma$ of $D$, the amount of material diverging from the domain $D$ and measured locally by $\operatorname{div} f(x)$. For the particular case in which the vector field $f$ is conservative, so is the gradient $f = \nabla h$ of a potential function, Gauss’s Theorem takes the form

\[ \int_\Gamma \nabla h(x) \cdot \nu \, dS = \int_D \nabla \cdot \nabla h(x) = \int_D \Delta h(x) \tag{5.141} \]



in terms of the harmonic differential operator $\Delta h = \sum_{i=1}^{3} \frac{\partial^2 h}{\partial x_i^2}$ in $\mathbb{R}^3$. For a simple example of Gauss’s Theorem, to integrate the normal component of the vector field $f = (2x_1, x_2^2, x_3^2)$ over the boundary $\Gamma$ of the unit ball $D = \{ x \in \mathbb{R}^3 \mid \|x\|_2 \le 1 \}$, since $\nabla \cdot f = 2 + 2x_2 + 2x_3$ it follows from (5.140) that $\int_\Gamma f \cdot \nu \, dS = 2 \int_D (1 + x_2 + x_3)$. The unit ball is symmetric with respect to the variables $x_2$ and $x_3$ so the second two terms in the integral yield 0 and consequently $\int_\Gamma f \cdot \nu \, dS = 2 \int_D 1 = \frac{8}{3} \pi$.
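The value $\tfrac{8}{3}\pi$ can be compared with a direct evaluation of the surface integral. The following sketch assumes the standard spherical parametrization of the unit sphere, for which the outward unit normal at a point is the point itself and $dS = \sin\theta \, d\theta \, d\varphi$; the midpoint rule and the function names are illustrative choices.

```python
import math

def f(x):
    """The vector field f = (2 x1, x2^2, x3^2) of the example."""
    x1, x2, x3 = x
    return (2 * x1, x2 ** 2, x3 ** 2)

def flux_through_unit_sphere(n=200):
    """Midpoint-rule approximation of the integral of f . nu dS over the sphere."""
    h_theta, h_phi = math.pi / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h_theta
        for j in range(n):
            phi = (j + 0.5) * h_phi
            # On the unit sphere the outward unit normal equals the point itself.
            nu = (math.sin(theta) * math.cos(phi),
                  math.sin(theta) * math.sin(phi),
                  math.cos(theta))
            integrand = sum(a * b for a, b in zip(f(nu), nu))
            total += integrand * math.sin(theta) * h_theta * h_phi
    return total

print(flux_through_unit_sphere(), 8 * math.pi / 3)
```

The two printed numbers should agree closely, which illustrates how much shorter the calculation through the divergence is than the direct surface integral.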

A final classical interpretation of the general Stokes’s Theorem of Corol-lary 5.16, which was traditionally called Stokes’s Theorem before that term wasadopted for the general case, involves line integrals in R3 that are the boundariesof singular 2-chains.

Corollary 5.21 If $C \subset U$ is a continuously differentiable singular 2-chain in an open subset $U \subset \mathbb{R}^3$ then

(5.142) $\int_{\partial C} f(x) \cdot \tau \, ds = \int_C \big( \nabla \times f(x) \big) \cdot \nu \, dS = \int_C \operatorname{curl} f(x) \cdot \nu \, dS$

for any $C^1$ vector field $f(x)$ in $U$.

Proof: This is just the special case of Corollary 5.16 for the differential form $\omega_f$, so no further proof is necessary.

For a simple example of the classical Stokes’s Theorem, let $\gamma$ be the intersection of the unit sphere $S \subset \mathbb{R}^3$, the set $S = \{ (x_1, x_2, x_3) \mid x_1^2 + x_2^2 + x_3^2 = 1 \}$, with the plane $x_3 = 0$ and $f(x)$ be the vector field $f(x) = (2x_1, x_2^2, x_3^2)$; since the curve $\gamma$ is the boundary of the disc $D = \{ (x_1, x_2, x_3) \mid x_1^2 + x_2^2 \le 1, \ x_3 = 0 \}$ when the orientation of the disc is described by the differential form $dx_2 \wedge dx_1$ and $\nabla \times f(x) = 0$, Stokes’s Theorem asserts that $\int_\gamma f(x) \cdot \tau \, ds = \int_D (\nabla \times f(x)) \cdot \nu \, dS = 0$.

For a perhaps more interesting observation though, if $\Omega$ is a closed 2-form in $\mathbb{R}^3$ and $S$ is the unit sphere in $\mathbb{R}^3$, so that $S$ is the boundary $S = \partial D$ of the unit ball, then by Gauss’s Theorem $\int_S \Omega = \int_D d\Omega = 0$; but that does require that the differential form $\Omega$ is defined and closed in the entire ball $D$, just as with the corresponding example in $\mathbb{R}^2$ discussed on page 120. However if $\Omega = d\omega$ is an exact differential form defined just in an open neighborhood of the unit sphere $S$, and not necessarily in the entire ball $D$, it is actually the case that $\int_S \Omega = 0$. Indeed if $\gamma$ is a circle on $S$, the intersection of $S$ with a plane in $\mathbb{R}^3$, then $\gamma$ splits the sphere $S$ into two pieces, $S = S_1 \cup S_2$, where $\gamma$ is the boundary of both $S_1$ and $S_2$; but if $S_1$ and $S_2$ have the same orientation as $S$ and if $\gamma$ is oriented so that $\partial S_1 = \gamma$ then clearly $\partial S_2 = -\gamma$. Then by the classical Stokes’s Theorem $\int_S \Omega = \int_{S_1} \Omega + \int_{S_2} \Omega = \int_\gamma \omega - \int_\gamma \omega = 0$.

The classical Stokes’s Theorem was also used to give a geometrical interpretation of the curl of a vector field. If $f(x)$ is a $C^1$ vector field in an open neighborhood $U$ of a point $a \in \mathbb{R}^3$, if $D$ is a disc of radius $\epsilon$ centered at the point $a$ in a plane $L \subset \mathbb{R}^3$ through the point $a$, with boundary $\gamma = \partial D$ in $L$, and if the plane $L$ is chosen so that the unit normal vector $\nu$ to the plane $L$ is parallel to the vector $\operatorname{curl} f(a)$, then by Stokes’s Theorem $\int_\gamma f(x) \cdot \tau \, ds = \int_D \operatorname{curl} f(x) \cdot \nu \, dS$; and since the area of the disc $D$ is $|D| = \int_D dS = \pi \epsilon^2$ it follows that

$\frac{1}{\pi \epsilon^2} \int_\gamma f(x) \cdot \tau \, ds = \frac{\int_D \operatorname{curl} f(x) \cdot \nu \, dS}{\int_D dS}.$

The function $\operatorname{curl} f(x) \cdot \nu$ is a continuous function of the point $x$, so its average over the disc $D$ approaches the value $\operatorname{curl} f(a) \cdot \nu = \| \operatorname{curl} f(a) \|_2$ in the limit as $\epsilon$ tends to zero; consequently

(5.143) $\| \operatorname{curl} f(a) \|_2 = \lim_{\epsilon \to 0} \frac{1}{\pi \epsilon^2} \int_\gamma f(x) \cdot \tau \, ds,$

so that the length of the vector $\operatorname{curl} f(a)$ is the limit, as $\epsilon$ tends to zero, of the line integral of the vector $f(x)$ around the curve $\gamma$ divided by the area $\pi \epsilon^2$ enclosed by $\gamma$, where this integral can be viewed as describing the curl of the vector field $f(x)$ at the point $a$.
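The limit in (5.143) can be illustrated numerically. The sketch below is an illustration under stated assumptions: the sample field $f(x) = (-x_2 + x_1 x_3, \ x_1 + x_2^2, \ x_3)$ is a hypothetical choice made here, for which $\operatorname{curl} f(x) = (0, x_1, 2)$, so that $\operatorname{curl} f(0) = (0, 0, 2)$ and the appropriate plane $L$ through the origin is $x_3 = 0$.

```python
import math

def f(x):
    # Hypothetical sample field; its curl is (0, x1, 2), so curl f(0) = (0, 0, 2).
    x1, x2, x3 = x
    return (-x2 + x1 * x3, x1 + x2 ** 2, x3)

def circulation_over_area(eps, n=2000):
    """Approximate (1 / (pi eps^2)) times the line integral of f . tau ds
    around the circle of radius eps centered at the origin in the plane x3 = 0."""
    total = 0.0
    for k in range(n):
        theta = 2 * math.pi * (k + 0.5) / n
        point = (eps * math.cos(theta), eps * math.sin(theta), 0.0)
        # derivative of the parametrization with respect to theta
        velocity = (-eps * math.sin(theta), eps * math.cos(theta), 0.0)
        value = f(point)
        total += sum(v * w for v, w in zip(value, velocity))
    total *= 2 * math.pi / n  # step size of the midpoint rule in theta
    return total / (math.pi * eps ** 2)

ratios = [circulation_over_area(eps) for eps in (0.5, 0.1, 0.01)]
print(ratios)
```

For this particular field the quotient already agrees with $\| \operatorname{curl} f(0) \|_2 = 2$ for every radius, since the higher order term in the line integral averages to zero around the circle; for a general $C^1$ field the agreement holds only in the limit.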




PROBLEMS: GROUP I

(1) Find the arc length of the curve $\gamma \subset \mathbb{R}^2$ described parametrically by $f(t) = \big( \tfrac{1}{3} t^3 - t, \ t^2 \big)$ where $0 \le t \le 2$.

(2) Evaluate $\int_\gamma (x_1 + x_2) \, dx_1 + dx_2$ along the curve given parametrically by $(x_1, x_2) = (t, t^2)$ for $0 \le t \le 1$.

(3) In $\mathbb{R}^4$ consider the differential forms

$\omega_1 = x_1 \, dx_2 + x_3 \, dx_4,$

$\omega_2 = (x_1^2 + x_3^2) \, dx_1 \wedge dx_2 + (x_2^2 + x_4^2) \, dx_3 \wedge dx_4,$

$\omega_3 = dx_1 \wedge dx_3 + x_2 \, dx_1 \wedge dx_4 + x_1 \, dx_2 \wedge dx_4.$

(i) Find the exterior products $\omega_1 \wedge \omega_1$, $\omega_1 \wedge \omega_2$ and $\omega_2 \wedge \omega_2$ in reduced form.

(ii) Find the exterior derivatives $d\omega_1$, $d\omega_2$ and $d\omega_3$ in reduced form.

(iii) Find the Hodge duals $*\omega_1$ and $*\omega_3$ in reduced form.

(iv) Which of these three differential forms are closed, that is, satisfy $d\omega = 0$?

(v) For each of the differential forms $\omega$ that is closed find a differential form $\sigma$ such that $\omega = d\sigma$. Having found one such form $\sigma$, what is the most general such form? (Answer this part of the question without giving explicit formulas for all possible solutions, but just by characterizing all possible solutions using your general knowledge of the properties of differential forms.)

(4) Evaluate $\int_\gamma x_1 \, dx_1 + x_1^2 \, dx_2 + x_2 \, dx_3$ along the curve $x_1 = x_2 = x_3$ from $(0, 0, 0)$ to $(1, 1, 1)$.

(5) Show that the integral

$\int_\gamma \sin x_2 \sin x_3 \, dx_1 + x_1 \cos x_2 \sin x_3 \, dx_2 + x_1 \sin x_2 \cos x_3 \, dx_3$

along a path from the point $(0, 0, 0)$ to the point $(1, 1, 1)$ in $\mathbb{R}^3$ is independent of the choice of the path $\gamma$.

(6) Show that the differential form

$\omega(x) = 2 x_1 x_2 \, dx_1 + (x_1^2 + x_3^2) \, dx_2 + 2 x_2 x_3 \, dx_3$

in $\mathbb{R}^3$ is closed, and find a function $f(x)$ such that $df(x) = \omega(x)$. What is the most general function $f(x)$ with this property?

(7) Let $S$ be the cylindrical surface in $\mathbb{R}^3$ defined by the equations $x_1^2 + x_2^2 = 1$, $0 \le x_3 \le 1$, so described parametrically by $(x_1, x_2, x_3) = (\cos \theta, \sin \theta, x_3)$ for $0 \le \theta \le 2\pi$, $0 \le x_3 \le 1$, with the orientation described by the differential form $d\theta \wedge dx_3$; and let $\tilde{S}$ be the union of the cylindrical surface $S$ and the discs closing up the top and bottom of the cylinder $S$, so that $\tilde{S}$ is the boundary of a solid cylinder, oriented with the orientation of $S$ extended to an orientation of $\tilde{S}$ as the full boundary of the solid cylinder. Consider also the differential form

$\omega = x_3 \, dx_1 \wedge dx_2 + x_1 x_3 \, dx_2 \wedge dx_3 + x_3 \, dx_1 \wedge dx_3.$

(i) Evaluate $\int_S \omega$.

(ii) Evaluate $\int_{\tilde{S}} \omega$, using Stokes’s Theorem.

PROBLEMS: GROUP II

(8) Find a curve $\gamma$ in the plane that has length 1 and has the property that $\int_\gamma f = 0$ for the vector field $f(x) = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$.

(9) Let $\gamma$ be a $C^3$ curve in $\mathbb{R}^3$ parametrized by arc length by a $C^3$ mapping $f : [0, L] \longrightarrow \mathbb{R}^3$, and let $\boldsymbol{\tau}(t) = f'(t)$ be the unit tangent vector to $\gamma$ at the point $f(t)$.
(i) Show that if $\boldsymbol{\tau}'(t) \ne 0$ then that vector is perpendicular to $\boldsymbol{\tau}(t)$.

The length $\| \boldsymbol{\tau}'(t) \|_2 = \kappa(t)$ of this vector is called the curvature of $\gamma$ at the point $f(t)$, and its inverse $1/\kappa(t)$ is called the radius of curvature of $\gamma$ at the point $f(t)$.
(ii) Show that $\kappa(t) = 0$ for all $t \in [0, L]$ if and only if $\gamma$ is a straight line in $\mathbb{R}^3$.
(iii) Show that the radius of curvature of a circle in a plane in $\mathbb{R}^3$ is equal to the radius of the circle.

If $\boldsymbol{\tau}'(t) \ne 0$ the unit vector $\boldsymbol{\nu}(t) = \frac{1}{\kappa(t)} \boldsymbol{\tau}'(t)$ is called the principal normal vector to the curve $\gamma$. The vector $\boldsymbol{\beta}(t) = \boldsymbol{\tau}(t) \times \boldsymbol{\nu}(t)$, called the binormal vector to the curve $\gamma$ at the point $f(t)$, is a unit vector that is perpendicular both to the tangent vector $\boldsymbol{\tau}(t)$ and to the principal normal vector $\boldsymbol{\nu}(t)$.
(iv) Show that $\boldsymbol{\beta}'(t) \cdot \boldsymbol{\beta}(t) = \boldsymbol{\beta}'(t) \cdot f'(t) = 0$ and consequently that $\boldsymbol{\beta}'(t) = -\tau(t) \boldsymbol{\nu}(t)$ for a real-valued function $\tau(t)$ and that $\boldsymbol{\nu}'(t) = -\kappa(t) \boldsymbol{\tau}(t) + \tau(t) \boldsymbol{\beta}(t)$.

The function $\tau(t)$ is called the torsion of the curve $\gamma$. The set of differential equations in (iv), called the Serret-Frenet equations, can be written equivalently

$\begin{pmatrix} \boldsymbol{\tau}'(t) & \boldsymbol{\nu}'(t) & \boldsymbol{\beta}'(t) \end{pmatrix} = \begin{pmatrix} \boldsymbol{\tau}(t) & \boldsymbol{\nu}(t) & \boldsymbol{\beta}(t) \end{pmatrix} \begin{pmatrix} 0 & -\kappa(t) & 0 \\ \kappa(t) & 0 & -\tau(t) \\ 0 & \tau(t) & 0 \end{pmatrix}.$

(v) Show that $\tau(t) = 0$ if and only if the curve $\gamma$ is contained in a 2-dimensional linear subspace of $\mathbb{R}^3$.
(vi) Assuming the general result from the elementary theory of ordinary differential equations that all parametrized curves for which the parametrization $f(t)$ satisfies $f_i'(t) = \sum_{j=1}^{3} m_{ij}(t) f_j(t)$ for some given functions $m_{ij}(t)$ differ by a linear transformation of $\mathbb{R}^3$, show that any two curves with the same curvature and torsion can be transformed into one another by a linear transformation in $\mathbb{R}^3$. (Curvature and torsion are sometimes called the intrinsic equations of space curves.)

The cross-product $\alpha \times \beta$ of vectors is defined in (5.31) in the course notes; it is a vector that is perpendicular to $\alpha$ and $\beta$, and oriented so that $\alpha, \beta, \alpha \times \beta$ is the usual orientation of $\mathbb{R}^3$, and the cross-product of two perpendicular unit vectors is also a unit vector as a consequence of (5.34) in the course notes. This algebraic operation on vectors in $\mathbb{R}^3$ is standard and will be used in the later discussion, so this problem presents an opportunity to get acquainted with it. For the geometry of curves see for example volume II of Spivak’s Differential Geometry.

(10) To a vector field $f = \{ f_i \}$ in $\mathbb{R}^4$ there can be associated the differential forms

$\omega_f = f_1 \, dx_1 + f_2 \, dx_2 + f_3 \, dx_3 + f_4 \, dx_4,$

$\Omega_f = f_1 \, dx_2 \wedge dx_3 \wedge dx_4 + f_2 \, dx_3 \wedge dx_4 \wedge dx_1 + \cdots .$

(i) For any 3 vector fields $f^i$ show that the vector field $F$ for which $\Omega_F = \omega_{f^1} \wedge \omega_{f^2} \wedge \omega_{f^3}$ can be interpreted as an analogue $F = f^1 \times f^2 \times f^3$ of the cross-product in $\mathbb{R}^3$, so as a vector field perpendicular to the vector fields $f^i$ with the appropriate orientation.
(ii) If $\sigma$ and $\tau$ are 2-forms in $\mathbb{R}^4$ show that $\sigma \wedge \tau$ can be interpreted as the dot product of two vectors in $\mathbb{R}^6$.
(iii) If $f$ is a vector field and $\sigma$ is a 2-form in $\mathbb{R}^4$ show that $\omega_f \wedge \sigma$ can be interpreted as a collection of dot products of vectors in $\mathbb{R}^3$.

(11) For the differential 1-form $\omega = \omega_f$ associated to a $C^2$ vector field $f$ in $\mathbb{R}^3$ calculate and interpret (in terms of the vector field $f$) the results of the differential operators

d ω = ∗d ∗ ω, ω = d d ω (x) + d d ω(x).

(The operator is Hodge’s harmonic operator.)




(12) Compute the surface integral

$\int_X (x_1^2 + x_2^2) \, dx_1 \wedge dx_2 + x_1 \, dx_2 \wedge dx_3$

where $X$ is the curved part of the boundary of the cylinder of height 1 with base the unit circle at the origin in the plane of the variables $x_1, x_2$.

(13) Calculate $\int_X (\nabla \times f) \cdot \nu \, dS$ for the vector field $f(x) = (x_2, \ x_1 x_2 x_3, \ x_2)$ where $X$ is the piece of the paraboloid $x_3 = 3 - x_1^2 - x_2^2$ for which $x_3 \ge 0$, and $\nu$ is the unit outward normal vector to $X$.

(14) In $\mathbb{R}^3$ let $S$ be the surface $x_1^2 + x_2^2 = 1 - x_3$, $0 \le x_3 \le 1$, let $S_0$ be the disc $x_1^2 + x_2^2 \le 1$, $x_3 = 0$ and let $R$ be the solid region $x_1^2 + x_2^2 \le 1 - x_3$, $0 \le x_3$; assume the usual orientation in $\mathbb{R}^3$, so that $dx_1 \wedge dx_2 \wedge dx_3$ is the volume element, and orient the surfaces $S$ and $S_0$ so that the boundary of $R$ is $\partial R = S + S_0$.

(i) Find $\int_S x_3 \, dx_1 \wedge dx_2$.

(ii) Find $\int_{\partial R} x_3 \, dx_1 \wedge dx_2 + x_2 \, dx_2 \wedge dx_3$.

(iii) Find $\int_S F \cdot \nu \, dS$ for the vector field $F = (x_1, \ x_2, \ 2x_3 - x_1 - x_2)$ where $\nu$ is the outward normal to the surface $S$ and $dS$ is the element of area of $S$.

(iv) Find $\int_\gamma x_1 \, dx_1 + 2 x_2 \, dx_2 + 3 x_3 \, dx_3$ where $\gamma \subset S$ is the curve in $R$ that begins at the point $(1, 0, 0)$, continues around the boundary $\partial S_0$ in the direction of the natural orientation of that boundary to the point $(0, 1, 0)$, continues along the intersection of the surface $S$ and the plane $x_1 = 0$ to the point $\big( 0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{2} \big)$, continues then in a straight line in the plane $x_3 = \tfrac{1}{2}$ to the central line $x_1 = x_2 = 0$ and then along that line to the vertex $(0, 0, 1)$; and do it the easy way.

(15) Express the integral $\int_S f \cdot \nu \, dS$, the area integral of the dot product of the unit normal vector $\nu$ and the vector $f(x_1, x_2, x_3) = (x_1^2, \ x_3, \ -x_2)$ along the surface $S$ of the unit sphere $x_1^2 + x_2^2 + x_3^2 = 1$, as an integral of an appropriate differential form on $S$. Evaluate the integral by using Stokes’s Theorem.

(16) If $\omega$ is a closed differential 1-form on the complement of the origin in $\mathbb{R}^2$ show that there is a unique constant $c$ such that $\omega = c \, d\theta + df$ where $\theta$ is the angle (or argument) in polar coordinates and $f$ is a differentiable function in the complement of the origin in $\mathbb{R}^2$.

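(17) Let $\Lambda^1_*(\Delta)$ be the set of $C^2$ differential 1-forms in the cell $\Delta \subset \mathbb{R}^2$ that are identically 0 in an open neighborhood of the boundary $\partial \Delta$. If $\omega_1, \omega_2 \in \Lambda^1_*(\Delta)$ show that

$(\omega_1, \omega_2) = \int_\Delta \omega_1 \wedge * \omega_2$

is an inner product, where $*$ is the Hodge duality operator. Why is the integral $\int_\Delta \omega_1 \wedge \omega_2$ not a reasonable candidate for an inner product? Show however that for any continuous function $f$ in $\Delta$ and any 1-form $\omega \in \Lambda^1_*(\Delta)$

$\int_\Delta df \wedge \omega = - \int_\Delta f \, d\omega.$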

(18) A function $f$ in $\mathbb{R}^n$ is said to have compact support if there is a compact set $S \subset \mathbb{R}^n$ such that $f(x) = 0$ whenever $x \notin S$.
(i) Show that if $f_i$ for $1 \le i \le n$ are continuously differentiable functions in $\mathbb{R}^n$ and at least one of them has compact support then $\int_{\mathbb{R}^n} df_1 \wedge df_2 \wedge \cdots \wedge df_n = 0$.

(ii) Find an explicit example of continuously differentiable functions $f_i$ in $\mathbb{R}^n$, of course none of which has compact support, such that

$\lim_{r \to \infty} \int_{B_r} df_1 \wedge df_2 \wedge \cdots \wedge df_n = \infty$

where $B_r = \{ x \in \mathbb{R}^n \mid \|x\|_2 \le r \}$.

(iii) Show that if $f_1, f_2$ are $C^1$ functions in $\mathbb{R}^n$ and at least one of them has compact support then

$\int_{\mathbb{R}^n} f_1(x) \, df_2(x) \wedge dx_2 \wedge \cdots \wedge dx_n = - \int_{\mathbb{R}^n} f_2(x) \, df_1(x) \wedge dx_2 \wedge \cdots \wedge dx_n.$

(19) Let $f = (f_1, f_2, f_3)$ be a $C^1$ vector field and $g$ be a $C^2$ function in an open subset $U \subset \mathbb{R}^3$; and let $D$ be a simple singular cell in $U$, such that its boundary $\partial D$ is a $C^1$ submanifold of $U$.
(i) Find $d * \omega_f$, where $*$ indicates the Hodge dual form.
(ii) Find $* \omega_f \wedge dg$.
(iii) Show that if $f(x)$ is a tangent vector to the surface $\partial D$ at each point $x \in \partial D$ then

$\int_D \big( f_1 \partial_1 g + f_2 \partial_2 g + f_3 \partial_3 g \big) = - \int_D \big( \partial_1 f_1 + \partial_2 f_2 + \partial_3 f_3 \big) g.$

(iv) Show that if $\nabla g(x)$ is a tangent vector to $\partial D$ at each point $x \in \partial D$ and if $\Delta g(x) = \partial_1^2 g(x) + \partial_2^2 g(x) + \partial_3^2 g(x) = 0$ at each point $x \in D$ then $g(x)$ is constant in $D$.



Appendix A

Exterior Algebras

Algebra is the study of a variety of structures: groups, rings, algebras, fields, and so on. A structure of considerable use in the calculus of several variables is exterior algebra, which is really a topic in linear algebra, or in multilinear algebra to be more precise, but which is not covered in most standard courses in linear algebra; so a short survey of the relevant material in an elementary form sufficient for purposes here is included in this appendix.

If $V$ is a real vector space of dimension $n$ with the basis $\xi_1, \ldots, \xi_n$, then to any integer $k$ in the range $1 \le k \le n$ there is associated another vector space $\Lambda^k(V)$, the basis of which consists of a set of vectors called basic k-vectors and denoted by

(A.1) $\xi_{i_1} \wedge \xi_{i_2} \wedge \cdots \wedge \xi_{i_k}$ where $1 \le i_1 < i_2 < \cdots < i_k \le n$.

For $k = 1$ the basic 1-vectors thus are just the basis vectors $\xi_i$ themselves, so that $\Lambda^1(V) = V$ and consequently $\dim \Lambda^1(V) = n$. For $k = n$ on the other hand there is just a single basic n-vector $\xi_1 \wedge \xi_2 \wedge \cdots \wedge \xi_n$, so $\dim \Lambda^n(V) = 1$. In general for $1 \le k \le n$ the basic k-vectors are described by the sets of k-tuples of integers from 1 to $n$ in ascending order, so by the number of distinct subsets of $k$ integers from the set of the first $n$ integers, and consequently $\dim \Lambda^k(V) = \binom{n}{k}$.
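The count of basic k-vectors can be made concrete by listing their index sets. The following is a small sketch; the function name `basic_k_vectors` and the choice $n = 4$ are illustrative assumptions made here.

```python
from itertools import combinations
from math import comb

def basic_k_vectors(n, k):
    """Index tuples (i1 < i2 < ... < ik) labelling the basic k-vectors
    of a vector space with basis xi_1, ..., xi_n."""
    return list(combinations(range(1, n + 1), k))

n = 4
dimensions = {k: len(basic_k_vectors(n, k)) for k in range(1, n + 1)}
print(basic_k_vectors(n, 2))  # the six index pairs for Lambda^2
print(dimensions)
```

Each dimension computed this way agrees with the binomial coefficient `comb(n, k)`.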

It is customary to extend this set of vector spaces by setting $\Lambda^0(V) = \mathbb{R}$. It is also customary, and very convenient, to introduce the formal k-vectors $\xi_{i_1} \wedge \xi_{i_2} \wedge \cdots \wedge \xi_{i_k}$ for arbitrary sets of $k$ indices $i_j$ in the range $1 \le i_j \le n$, with the understanding that

(A.2)
$\xi_{i_1} \wedge \xi_{i_2} \wedge \cdots \wedge \xi_{i_k} = 0$ if the indices $i_1, i_2, \ldots, i_k$ are not distinct,
$\xi_{i_1} \wedge \xi_{i_2} \wedge \cdots \wedge \xi_{i_k} = (\operatorname{sgn} \pi) \, \xi_{j_1} \wedge \cdots \wedge \xi_{j_k}$ if $j_1, \ldots, j_k$ is a permutation $\pi$ of $i_1, \ldots, i_k$,

where $\operatorname{sgn} \pi = \pm 1$ is the sign of the permutation $\pi$. If the permutation is described by a permutation matrix $M_\pi$, with a single entry of 1 in each row and column and entries of 0 elsewhere, then $\operatorname{sgn} \pi = \det M_\pi$; equivalently $\operatorname{sgn} \pi = (-1)^m$ where $m$ is the number of transpositions of adjacent integers that transform the set $j_1, \ldots, j_k$ to the set $i_1, \ldots, i_k$. Any nonzero formal k-vector thus is equal to either a unique basic k-vector or the negative of a unique basic k-vector. A vector $\omega \in \Lambda^k(V)$ can be written as a unique linear combination of the basic k-vectors

(A.3) $\omega = \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} a_{i_1 i_2 \cdots i_k} \, \xi_{i_1} \wedge \xi_{i_2} \wedge \cdots \wedge \xi_{i_k}$

for some real numbers $a_{i_1 i_2 \cdots i_k}$; this is called the reduced form for the vector $\omega$. Alternatively the vector $\omega$ can be written in various ways as a linear combination of the formal k-vectors

(A.4) $\omega = \sum_{i_1, i_2, \ldots, i_k = 1}^{n} b_{i_1 i_2 \cdots i_k} \, \xi_{i_1} \wedge \xi_{i_2} \wedge \cdots \wedge \xi_{i_k}$

for some real numbers $b_{i_1 i_2 \cdots i_k}$; and this is called the unreduced form for that vector. By using (A.2) an unreduced form for a vector $\omega$ can be reduced to a unique reduced form for that vector. For example if $\dim V = 3$ a basis for the 3-dimensional vector space $\Lambda^2(V)$ consists of the basic 2-vectors $\xi_1 \wedge \xi_2$, $\xi_1 \wedge \xi_3$ and $\xi_2 \wedge \xi_3$; a vector $\omega$ in the unreduced form

$\omega = b_{12} \, \xi_1 \wedge \xi_2 + b_{13} \, \xi_1 \wedge \xi_3 + b_{23} \, \xi_2 \wedge \xi_3 + b_{21} \, \xi_2 \wedge \xi_1 + b_{31} \, \xi_3 \wedge \xi_1 + b_{32} \, \xi_3 \wedge \xi_2$

is in the reduced form

$\omega = (b_{12} - b_{21}) \, \xi_1 \wedge \xi_2 + (b_{13} - b_{31}) \, \xi_1 \wedge \xi_3 + (b_{23} - b_{32}) \, \xi_2 \wedge \xi_3.$

The addition of vectors can be carried out in either the reduced or unreduced form, whichever is more convenient.
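The passage from an unreduced form to the reduced form by means of (A.2) can be sketched in a few lines. This is only an illustration: the representation of a vector as a dictionary from index tuples to coefficients, the function names, and the sample coefficients are all assumptions made here.

```python
def reduce_formal(indices):
    """Apply (A.2) to a formal k-vector given by its index tuple.
    Returns (sign, sorted index tuple), or (0, None) if an index repeats."""
    if len(set(indices)) != len(indices):
        return 0, None
    # the parity of the number of inversions gives the sign of the sorting permutation
    inversions = sum(
        1
        for p in range(len(indices))
        for q in range(p + 1, len(indices))
        if indices[p] > indices[q]
    )
    sign = -1 if inversions % 2 else 1
    return sign, tuple(sorted(indices))

def reduced_form(unreduced):
    """Collect the coefficients of an unreduced form onto the basic k-vectors."""
    result = {}
    for indices, coeff in unreduced.items():
        sign, basic = reduce_formal(indices)
        if sign:
            result[basic] = result.get(basic, 0) + sign * coeff
    return result

# arbitrary sample coefficients b_ij for a 2-vector when dim V = 3
b = {(1, 2): 5, (1, 3): 7, (2, 3): 11, (2, 1): 2, (3, 1): 3, (3, 2): 4}
reduced = reduced_form(b)
print(reduced)
```

With these sample values the result has the coefficients $b_{12} - b_{21} = 3$, $b_{13} - b_{31} = 4$ and $b_{23} - b_{32} = 7$, matching the reduced form displayed above.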

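There is an additional algebraic operation on vectors in the spaces $\Lambda^k(V)$ that associates to an ordered pair of vectors $\omega \in \Lambda^k(V)$ and $\sigma \in \Lambda^l(V)$ the exterior product $\omega \wedge \sigma \in \Lambda^{k+l}(V)$ of these two vectors; explicitly if

(A.5) $\omega = \sum_{i_1, \ldots, i_k = 1}^{n} a_{i_1 \cdots i_k} \, \xi_{i_1} \wedge \cdots \wedge \xi_{i_k} \in \Lambda^k(V)$

and

(A.6) $\sigma = \sum_{j_1, \ldots, j_l = 1}^{n} b_{j_1 \cdots j_l} \, \xi_{j_1} \wedge \cdots \wedge \xi_{j_l} \in \Lambda^l(V)$

then

(A.7) $\omega \wedge \sigma = \sum_{i_1, \ldots, i_k = 1}^{n} \ \sum_{j_1, \ldots, j_l = 1}^{n} a_{i_1 \cdots i_k} b_{j_1 \cdots j_l} \, \xi_{i_1} \wedge \cdots \wedge \xi_{i_k} \wedge \xi_{j_1} \wedge \cdots \wedge \xi_{j_l}.$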




The unreduced form of the vectors is clearly very convenient for handling wedge products. The order of the vectors $\omega$ and $\sigma$ is important, since

(A.8) $\omega \wedge \sigma = (-1)^{kl} \, \sigma \wedge \omega.$

To demonstrate that, when the vectors $\omega$ and $\sigma$ in (A.7) are reversed of course $b_{j_1 \cdots j_l} a_{i_1 \cdots i_k} = a_{i_1 \cdots i_k} b_{j_1 \cdots j_l}$ since these are merely real numbers; but the permutation that exchanges

$\xi_{j_1} \wedge \cdots \wedge \xi_{j_l} \wedge \xi_{i_1} \wedge \cdots \wedge \xi_{i_k}$ and $\xi_{i_1} \wedge \cdots \wedge \xi_{i_k} \wedge \xi_{j_1} \wedge \cdots \wedge \xi_{j_l}$

can be written as the composition first of the permutation that transfers $\xi_{i_1}$ across $\xi_{j_1} \wedge \cdots \wedge \xi_{j_l}$, a permutation with sign $(-1)^l$, followed by the permutation that transfers $\xi_{i_2}$ across $\xi_{j_1} \wedge \cdots \wedge \xi_{j_l}$, another permutation of sign $(-1)^l$, and so on, and this product of $k$ permutations each of sign $(-1)^l$ is a permutation of sign $(-1)^{kl}$. In the special case in which $k = 0$ a vector $\omega \in \Lambda^0(V)$ is just a real number $a$, and the wedge product $\omega \wedge \sigma$ with the vector (A.6) is the vector $a\sigma$. Of course $\omega \wedge \sigma = 0$ if $k + l > n$. The formal union $\Lambda(V) = \bigcup_{k=0}^{n} \Lambda^k(V)$, or alternatively the direct sum $\Lambda(V) = \bigoplus_{k=0}^{n} \Lambda^k(V)$, of the vector spaces $\Lambda^k(V)$ thus has two algebraic operations, the addition of vectors in each vector space $\Lambda^k(V)$ separately and the exterior product of vectors in any two of the vector spaces $\Lambda^k(V)$ and $\Lambda^l(V)$; the set $\Lambda(V)$ with this algebraic structure is called an exterior algebra, or more precisely the exterior algebra over a real vector space of dimension $n$.

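The basic role of the exterior product is to handle a number of calculations involving determinants. For example if $\omega_1 = \sum_{j=1}^{n} a_{1j} \xi_j$ and $\omega_2 = \sum_{k=1}^{n} a_{2k} \xi_k$ then

$\omega_1 \wedge \omega_2 = \sum_{j,k=1}^{n} a_{1j} a_{2k} \, \xi_j \wedge \xi_k = \sum_{1 \le j < k \le n} (a_{1j} a_{2k} - a_{1k} a_{2j}) \, \xi_j \wedge \xi_k = \sum_{1 \le j < k \le n} \det \begin{pmatrix} a_{1j} & a_{1k} \\ a_{2j} & a_{2k} \end{pmatrix} \xi_j \wedge \xi_k,$

so $\omega_1 \wedge \omega_2$ is the vector consisting of all the distinct determinants of $2 \times 2$ submatrices of the $2 \times n$ matrix $\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \end{pmatrix}$. For another example,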

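if $\omega_i$ are any $n$ vectors in $V$ written explicitly in terms of the basis vectors $\xi_1, \ldots, \xi_n$ as

$\omega_i = \sum_{j_i = 1}^{n} a_{i j_i} \xi_{j_i}$ for $1 \le i \le n$,

then

$\omega_1 \wedge \cdots \wedge \omega_n = \sum_{j_1, \ldots, j_n = 1}^{n} a_{1 j_1} a_{2 j_2} \cdots a_{n j_n} \, \xi_{j_1} \wedge \xi_{j_2} \wedge \cdots \wedge \xi_{j_n} = \sum_{j_1, \ldots, j_n = 1}^{n} a_{1 j_1} a_{2 j_2} \cdots a_{n j_n} \operatorname{sgn} \begin{pmatrix} j_1 & j_2 & \cdots & j_n \\ 1 & 2 & \cdots & n \end{pmatrix} \xi_1 \wedge \xi_2 \wedge \cdots \wedge \xi_n$

where the terms in which the indices $j_1, \ldots, j_n$ are not distinct vanish and $\operatorname{sgn} \begin{pmatrix} j_1 & j_2 & \cdots & j_n \\ 1 & 2 & \cdots & n \end{pmatrix}$ is the sign of the permutation that takes the ordered set of integers $j_1, j_2, \ldots, j_n$ to the ordered set of integers $1, 2, \ldots, n$; and as is familiar from linear algebra the preceding equation can be rewritten

(A.9) $\omega_1 \wedge \cdots \wedge \omega_n = \det A \ \xi_1 \wedge \xi_2 \wedge \cdots \wedge \xi_n$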

for the matrix $A = \{ a_{ij} \}$. To any $n \times n$ matrix $A = \{ a_{ij} \}$ there is naturally associated the 2-form $\Omega_A = \sum_{i,j=1}^{n} a_{ij} \, \xi_i \wedge \xi_j$; and if $n = 2m$ is an even integer the m-th exterior product of the 2-form $\Omega_A$ with itself is an n-form

(A.10) $\Omega_A^m = P(A) \, \xi_1 \wedge \xi_2 \wedge \cdots \wedge \xi_n.$

The coefficient $P(A)$ is a polynomial in the entries of the matrix $A$ which is nontrivial if the matrix $A$ is skew-symmetric; in that case it is called the Pfaffian of the matrix $A$, and is another invariant attached to that matrix, with the property that $P(A)^2 = \det A$ up to the constant factor $(2^m m!)^2$ arising from this normalization.

The special cases of vector spaces $V$ of dimensions 2 and 3 arise in a great many applications. If $\dim V = 2$ then $\dim \Lambda^0(V) = 1$ since $\Lambda^0(V) = \mathbb{R}$ while $\dim \Lambda^1(V) = 2$ with the basis $\xi_1, \xi_2$ and $\dim \Lambda^2(V) = 1$ with the basis $\xi_1 \wedge \xi_2$, which is the only nontrivial wedge product. For any two vectors $\omega = a_1 \xi_1 + a_2 \xi_2$ and $\sigma = b_1 \xi_1 + b_2 \xi_2$ then as above $\omega \wedge \sigma = \det \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \xi_1 \wedge \xi_2$. If $\dim V = 3$ then

$\dim \Lambda^0(V) = 1$ since $\Lambda^0(V) = \mathbb{R}$ while $\dim \Lambda^1(V) = 3$ with the basis $\xi_1, \xi_2, \xi_3$ and $\dim \Lambda^2(V) = 3$ with the basis $\xi_1 \wedge \xi_2$, $\xi_1 \wedge \xi_3$, $\xi_2 \wedge \xi_3$ and $\dim \Lambda^3(V) = 1$ with the basis $\xi_1 \wedge \xi_2 \wedge \xi_3$. What is particularly interesting in this case is that both $\Lambda^1(V) = V$ and $\Lambda^2(V)$ are vector spaces of dimension 3, so an element of $\Lambda^2(V)$ also can be identified with a vector in the space $V$; it is traditional and natural to make this identification by associating to a vector $a \in \mathbb{R}^3$ the vector $\omega_a \in \Lambda^1(V)$ defined by

(A.11) $\omega_a = a_1 \xi_1 + a_2 \xi_2 + a_3 \xi_3$

and the vector $\Omega_a \in \Lambda^2(V)$ defined by

(A.12) $\Omega_a = a_1 \, \xi_2 \wedge \xi_3 + a_2 \, \xi_3 \wedge \xi_1 + a_3 \, \xi_1 \wedge \xi_2,$

the obvious cyclic order. Various wedge products then introduce various algebraic operations between pairs of vectors in $V$. Thus

(A.13) $\omega_a \wedge \Omega_b = (a_1 b_1 + a_2 b_2 + a_3 b_3) \, \xi_1 \wedge \xi_2 \wedge \xi_3;$

this exterior product when identified with a real number is just the dot product of the two vectors $a$ and $b$. On the other hand

$\omega_a \wedge \omega_b = \sum_{j,k=1}^{3} a_j b_k \, \xi_j \wedge \xi_k = (a_2 b_3 - a_3 b_2) \, \xi_2 \wedge \xi_3 + (a_3 b_1 - a_1 b_3) \, \xi_3 \wedge \xi_1 + (a_1 b_2 - a_2 b_1) \, \xi_1 \wedge \xi_2 = \Omega_c,$

for the vector $c = (c_1, c_2, c_3) \in \mathbb{R}^3$ with the components

(A.14) $c_1 = \det \begin{pmatrix} a_2 & a_3 \\ b_2 & b_3 \end{pmatrix}, \quad c_2 = \det \begin{pmatrix} a_3 & a_1 \\ b_3 & b_1 \end{pmatrix}, \quad c_3 = \det \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix};$

this is just the cross-product $c = a \times b$ of the two vectors $a$ and $b$, so $\omega_a \wedge \omega_b = \Omega_{a \times b}$. It is an amusing exercise to develop the corresponding algebraic operations on 4-dimensional vector spaces.
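The identification of $\omega_a \wedge \omega_b$ with the cross-product can be checked by expanding the double sum $\sum_{j,k} a_j b_k \, \xi_j \wedge \xi_k$ directly and collecting terms on the cyclic basis $\xi_2 \wedge \xi_3$, $\xi_3 \wedge \xi_1$, $\xi_1 \wedge \xi_2$. The following is a sketch only; the function names and the sample vectors are assumptions made for illustration.

```python
def wedge_one_forms(a, b):
    """Coefficients of omega_a ^ omega_b on the cyclic basis
    (xi_2 ^ xi_3, xi_3 ^ xi_1, xi_1 ^ xi_2), found by expanding the double sum."""
    cyclic = [(1, 2), (2, 0), (0, 1)]  # 0-based index pairs of the cyclic basis
    coeffs = [0, 0, 0]
    for j in range(3):
        for k in range(3):
            if j == k:
                continue  # xi_j ^ xi_j = 0
            for slot, (p, q) in enumerate(cyclic):
                if (j, k) == (p, q):
                    coeffs[slot] += a[j] * b[k]
                elif (j, k) == (q, p):
                    coeffs[slot] -= a[j] * b[k]  # xi_q ^ xi_p = -xi_p ^ xi_q
    return tuple(coeffs)

def cross(a, b):
    """The usual cross-product in R^3, with the components of (A.14)."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

a, b = (1, 2, 3), (4, 5, 6)
print(wedge_one_forms(a, b), cross(a, b))
```

For these sample vectors both computations give the same triple, and exchanging `a` and `b` changes the sign of every coefficient, in agreement with (A.8) for $k = l = 1$.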



Index

C1 equivalent parametrized curves, 101

algebraexterior, 156, 157

almost everywhere, 69angle, between vectors, 4

arc length, 103parametrization by, 105

basic k-vectors, 155boundary, 6, 9, 137boundary chain

of singular cell, 136

Cartesian norm, 2Casorati-Weierstrass Theorem, 11Cauchy-Schwarz inequality, 4cell, 6

simple singular, 127cell, singular, 127chain

simple singular, 143singular, 133

change of local coordinates, 41characteristic function of a set, 73closed

relatively, 8closed set, 8closure

of a set, 8compact set, 9

complete, 7connected set, 9conservative vector field, 108content

of set, 73content zero, 68

continuous, 12almost everywhere, 69uniformly, 14

continuously differentiable, 20convergent

uniformly, 82coordinate functions

of a mapping, 12coordinates

local, change of, 41coordinats

local, 41critical point, 29cross-product

of vector fields in R3, 114curl

of vector field, 124curve, 101

Ck

, 101parametrized, 101curve, parametrized, 56cycle, 137

Decomposition Theorem, 45degree

of a differential form, 111dependent

functionally, 59derivative, 18

directional, 28exterior, 140

partial, 18derivative, exterior, 118derived set, 8diffeomorphism, 37differentiable, 17

continuously, 20

161



162 INDEX

differential form, 33, 111induced under a mapping, 117

pullback under a mapping, 117reduced, 111reduced and unreduced forms, 112unreduced, 111

differential formsexterior products, 113linear combinations, 113

directional derivative, 28distance function, 5divergence

of vector field, 123domain, Jordan, 71dot product, 3duality

of differential forms, 115

edgesizeof cell, 10

equivalentas manifolds, 60norms, 4singular cells, 128

equivalent parametrized curves, 101Euclidean norm, 2extended Jordan domain, 87

exterior algebra, 156, 157exterior derivative, 118, 140exterior product, 156

of differential forms, 113

field,vector, 33formal k-vectors, 155Fubini’s Theorem, 78function

induced, 102induced under a mapping, 116pullback, 102

functionally dependent, 59

functionsdcoordinate,of a mapping, 12

Gauss’s Theorem, 147gradient, 28greatest lower bound, 7

Green’s Theorem, 143group

singular homology, 139

Heine-Borel Theorem, 10Hessian, 27, 29Hodge duality, 115homeomorphic submanifolds, 52homeomorphism, 37homology group

singular, 139

improper integral, 87Index of notation

(x, y), inner product, 2

E , derived set or set of limit pointsof E , 8

L(φ), arc length of parametrizedcurve φ, 103

S ∗(f, P ), upper sum of function f for partition P , 66

S ∗(f, P ), upper sum of function f for partition P , 66

∆f (x), change in function f (x),30

∆, cell, 6∆(a, b) unoriented interval between

a and b, 90Λ(V ), exterior algebra on vectorspace V , 156

Λk(V ), space spanned by the basick-vectors in V , 155

ν , unit normal vector to singular2-cell, 130

φ∗(f ), induced function under themapping φ, 116

φ∗(f ω), induced differential r-formunder the mapping φ, 117

τ , unit tangent vector to a curve,108

x · y, dot product, 2χD

(x), characteristic function of D, 73

-neighborhood, 5inf S , infimum or greatest lower

bound of set S , 7



INDEX 163

∆

f , Riemann integral, 67

∇f (x), gradient of function f (x),

28∇xk(x, y), gradient of a function

k(x, y) viewed as a functionof x alone for each fixed y, 85

ω ∧σ, exterior product of differen-tial forms, 113

E , closure of set E , 8∂S , boundary of set S , 6, 9∂ uf (a), directional derivative of func-

tion f , 28sup S , supremum or least upper

bound of set S , 7

d(x, y), distance between points xand y, 5f n −→ f , f n converge to f , 82

f n=⇒U f , f n converge uniformly to

f in U , 82of (S ), oscillation of function f in

a set S , 68of (a), oscillation of function f at

a point a, 68curlf (x) = ∇× f (x), 124grad f (x), gradient of function f (x),

28

C∞ homeomorphism, 37

C1, class of continuously differen-tiable mappings, 20

C2, functions for which all secondpartial derivatives exist andare continuous, 25

N (a), neighborhood of a, 5∇, differential operator, 123divf (x) = ∇ · f (x), 123

induced differential formunder a mapping, 117

induced function, 102under a mapping, 116

infimum, 7inner product, 2integrable

Riemann, 71integrable, Riemann, 74integral

improper, 87iterated, 78

lower, 67on a parametrized curve, 102oriented, 90, 126Riemann, 67, 71upper, 67

integral, line, 108interval, 7Inverse Mapping Theorem, 40iterated integral, 78

Jacobian, 38Jacobian matrix, 38Jordan domain, 71

extended, 87

k-vectorsbasic, 155formal, 155

Lagrange multipliers, 58least upper bound, 7length, arc, 103limit point, 7

relative, 8line integral, 108

linear combinationof differential forms, 113local

change of coordinates, 41coordinate system, 41

locus, zero, 50lower integral, 67lower sum, 66

manifoldcontinuous, 59

mapping, 12open, 41

matrixJacobian, 38

maximumlocal, 29

Mean Value Inequality, 31Mean Value Theorem, 31



164 INDEX

measure zero, 68minimum

local, 29multi-index notation, 112

norm, 2equivalent, 4

notationmulti-index, 112

openrelatively, 8

open covering, 9open mapping, 41open neighborhood, 7open set, 7orientaion, 42oriented integral, 90, 126orthogonal, 4oscillation, 68

parametrization by arc length, 105parametrized curve, 56parametrized curves

equivalent, 101partial derivative, 18partition, 65

pfaffian, 158pointsingular, 42

polarization identity, 3potential, 108pullback function, 102pullback of a differential form

under a mapping, 117

Rank Theorem, 53reduced

form of differential form, 112reduced form, 111

of vector in ΛK

(V ), 156refinement, 66relative

limit point, 8relatively closed, 8relatively open, 8

Riemann integrable, 71Riemann integral, 67, 71

separated sets, 9simple

singular chain, 143simple singular cell, 127singular cell, 127

simple, 127singular chain, 133

simple, 143singular homology group, 139singular point, 42singularities

of submanifolds, 52singularity

of a mapping, 42Stokes’s Theorem, 140subcovering, 9submanifold, 50

weakly homeomorphic, 59sum

lower, 66upper, 66

supportof a singular chain, 143

supremum, 7

tangent line, 56tangent space, 57tangent vector, 56Taylor expansion, 26Theorem of Riemann and Lebesgue, 70topology, 7topology, on a set, 7

uniformly continuous, 14uniformly convergent, 82unit normal vector

to singular 2-cell, 130

unit tangent vector, 108unreduced

form of differential form, 112unreduced form, 111

of vector in ΛK (V ), 156upper integral, 67



INDEX 165

upper sum, 66

vectortangent, 56unit tangent, 108

vector field, 33conservative, 108

weakly homeomorphic submanifolds, 59

zero locus, 50
