
Lecture Notes on Mathematics for Economists 1

by Takashi Kunimoto

First Version: August 9, 2007
This Version: May 18, 2010

Summer 2010, Department of Economics, McGill University
August 16 - 27 (tentative): Monday - Friday, 10:00am - 1:00pm; at TBA

Instructor: Takashi Kunimoto
Email: [email protected]
Class Web: http://people.mcgill.ca/takashi.kunimoto/?View=Publications
Office: Leacock 438

COURSE DESCRIPTION: This course is designed to provide you with mathematical tools that are extensively used in graduate economics courses. The topics which will be covered are:

• Sets and Functions;

• Topology in the Euclidean Space;

• Linear Algebra;

• Multivariate Calculus;

• Static Optimization;

• (Optional) Correspondences and Fixed Points; and

• (Optional) The First-Order Differential Equations in One Variable.

A good comprehension of the material covered in the notes is essential for successful graduate studies in economics. Since we are seriously time constrained – which you might not believe – it would be very useful for you to keep one of the books listed below as a reference after you start graduate school in September.

READING:

The main textbook of this course is “Further Mathematics for Economic Analysis.” I mostly use this book for the course. However, if you don’t find the main textbook helpful enough, I strongly recommend that you buy at least one of the other books listed below in addition to “Further Mathematics for Economic Analysis.” Of course, you can buy any math book which you find useful.

1 I am thankful to the students for their comments, questions, and suggestions. Yet, I believe that there are still many errors in this manuscript. Of course, all remaining errors are my own.

• “Further Mathematics for Economic Analysis,” by Knut Sydsaeter, Peter Hammond, Atle Seierstad, and Atle Strom, Prentice Hall, 2005 (Main Textbook. If you don’t have any math book or are not confident about your math skills, this book will help you a lot.)

• “Mathematical Appendix,” in Advanced Microeconomic Theory, Second Edition, by Geoffrey A. Jehle and Philip J. Reny, (2000), Addison Wesley (Supplementary. This is the main textbook for Econ 610, but the mathematical appendix of this book is often too concise.)

• “Mathematics for Economists,” by Simon and Blume, Norton, (1994). (Supplementary. This book is a popular math book in many Ph.D. programs in economics. There has to be a reason for that, although I don’t know the true one.)

• “Fundamental Methods of Mathematical Economics,” by A. Chiang, McGraw-Hill. (More elementary and supplementary.)

• “Introductory Real Analysis,” by A. N. Kolmogorov and S. V. Fomin, Dover Publications (Very, very advanced and supplementary. If you really like math, this is the book for you.)

OFFICE HOURS: Wednesday and Friday, 2:00pm - 3:00pm

PROBLEM SETS: There will be several problem sets. Problem sets are essential to help you understand the course and to develop your skills in analyzing economic problems.

ASSESSMENT: No grade will be assigned. However, you are expected to do the assigned problem sets.

Contents

1 Introduction 6

2 Preliminaries 9
  2.1 Logic 9
    2.1.1 Necessity and Sufficiency 9
    2.1.2 Theorems and Proofs 10
  2.2 Set Theory 11
  2.3 Relations 13
    2.3.1 Preference Relations 13
  2.4 Functions 14
    2.4.1 Least Upper Bound Principle 15

3 Topology in Rn 17
  3.1 Sequences on R 17
    3.1.1 Subsequences 18
    3.1.2 Cauchy Sequences 19
    3.1.3 Upper and Lower Limits 20
    3.1.4 Infimum and Supremum of Functions 21
    3.1.5 Indexed Sets 21
  3.2 Point Set Topology in Rn 21
  3.3 Topology and Convergence 23
  3.4 Properties of Sequences in Rn 24
  3.5 Continuous Functions 26

4 Linear Algebra 29
  4.1 Basic Concepts in Linear Algebra 29
  4.2 Determinants and Matrix Inverses 32
    4.2.1 Determinants 32
    4.2.2 Matrix Inverses 32
    4.2.3 Cramer's Rule 33
  4.3 Vectors 34
  4.4 Linear Independence 35
    4.4.1 Linear Dependence and Systems of Linear Equations 36
  4.5 Eigenvalues 37
    4.5.1 Motivations 37
    4.5.2 How to Find Eigenvalues 39
  4.6 Diagonalization 40
  4.7 Quadratic Forms 42
  4.8 Appendix 1: Farkas Lemma 44
    4.8.1 Preliminaries 45
    4.8.2 Fundamental Theorem of Linear Algebra 46
    4.8.3 Linear Inequalities 47
    4.8.4 Non-Negative Solutions 47
    4.8.5 The General Case 49
  4.9 Appendix 2: Linear Spaces 49
    4.9.1 Number Fields 49
    4.9.2 Definitions 51
    4.9.3 Bases, Components, Dimension 52
    4.9.4 Subspaces 53
    4.9.5 Morphisms of Linear Spaces 54

5 Calculus 55
  5.1 Functions of a Single Variable 55
  5.2 Real-Valued Functions of Several Variables 56
  5.3 Gradients 56
  5.4 The Directional Derivative 57
  5.5 Convex Sets 58
    5.5.1 Upper Contour Sets 59
  5.6 Concave and Convex Functions 59
  5.7 Concavity/Convexity for C2 Functions 60
    5.7.1 Jensen's Inequality 63
  5.8 Quasiconcave and Quasiconvex Functions 64
  5.9 Total Differentiation 68
    5.9.1 Linear Approximations and Differentiability 69
  5.10 The Inverse of a Transformation 72
  5.11 Implicit Function Theorems 73

6 Static Optimization 77
  6.1 Unconstrained Optimization 77
    6.1.1 Extreme Points 77
    6.1.2 Envelope Theorems for Unconstrained Maxima 78
    6.1.3 Local Extreme Points 79
    6.1.4 Necessary Conditions for Local Extreme Points 80
  6.2 Constrained Optimization 81
    6.2.1 Equality Constraints: The Lagrange Problem 81
    6.2.2 Lagrange Multipliers as Shadow Prices 84
    6.2.3 Tangent Hyperplane 84
    6.2.4 Local First-Order Necessary Conditions 85
    6.2.5 Second-Order Necessary and Sufficient Conditions for Local Extreme Points 85
    6.2.6 Envelope Result for Lagrange Problems 87
  6.3 Inequality Constraints: Nonlinear Programming 88
  6.4 Properties of the Value Function 90
  6.5 Constraint Qualifications 91
  6.6 Nonnegativity Constraints 94
  6.7 Concave Programming Problems 95
  6.8 Quasiconcave Programming 96
  6.9 Appendix: Linear Programming 96

7 Differential Equations 97
  7.1 Introduction 97

8 Fixed Point Theorems 98
  8.1 Banach Fixed Point Theorem 98
  8.2 Brouwer Fixed Point Theorem 99

9 Topics on Convex Sets 102
  9.1 Separation Theorems 102
  9.2 Polyhedrons and Polytopes 104
  9.3 Dimension of a Set 105
  9.4 Properties of Convex Sets 105

Chapter 1

Introduction

I start my lecture with Rakesh Vohra’s message about what economic theory is. He is a professor at Northwestern University.1

All of economic theorizing reduces, in the end, to the solution of one of three problems.

Given a function f and a set S:

1. Find an x such that f(x) is in S. This is the feasibility question.

2. Find an x in S that optimizes f(x). This is the problem of optimality.

3. Find an x in S such that f(x) = x. This is the fixed point problem.

These three problems are, in general, quite difficult. However, if one is prepared to make assumptions about the nature of the underlying function (say it is linear, convex or continuous) and the nature of the set S (convex, compact, etc.), it is possible to provide answers and very nice ones at that.

I think this is the biggest picture of economic theory you could have as you go through this course. Whenever you are at a loss, please come back to this message.

We build our theory on individuals. Assume that all commodities are traded in centralized markets. Throughout Econ 610 and 620, we assume that each individual (consumer and firm) takes prices as given. We call this the price-taking behavior assumption. You might ask why individuals are price takers. My answer would be “why not?” Let us go as far as we can with this behavioral assumption and thereafter try to see the limitations of the assumption. However, you have to wait for Econ 611 and 621 to see how to relax this assumption. So, stick with this assumption. For each consumer, we want to know

1. What is the set of “physically” feasible bundles? Is there any such bundle at all (feasibility)? We call this set the consumption set.

2. What is the set of “financially” feasible bundles? Is there any such bundle at all (feasibility)? We call this set the budget set.

3. What is the best bundle for the consumer among all feasible bundles (optimality)? We call this bundle the consumer’s demand.

1 See Preface of Advanced Mathematical Economics by Rakesh V. Vohra.

We can make the exact parallel argument for the firm. What is the set of “technically” feasible inputs (feasibility)? We call this the production set of the firm. What is the best combination of inputs to maximize its profit (optimality)? We call this the firm’s supply. Once we figure out what the feasible and best choices are for each consumer and each firm under any possible circumstance, we want to know if there is any coherent state of affairs where everybody makes her best choice. In particular, all markets must clear. We call this coherent state a “competitive (Walrasian) equilibrium” (a fixed point).

If we move from microeconomics to macroeconomics, we must pay special attention to time. Now, each individual’s budget set does depend upon time. At each point in time, he can change his asset portfolio so that he smoothes out his consumption plan and/or production plan over time. If you know exactly when you die, there is no problem, because you just leave no money when you die, unless you want to leave some money to your kids (i.e., altruistic preferences). This is called the finite time horizon problem. What if you might live longer than you expected, with no money left? Then, what do you do? So, in reality, you don’t know exactly when you die. This situation can be formulated as the infinite time horizon problem. Do you see why? To deal with the infinite horizon problem, we use the transversality condition as the terminal condition of the feasible set. Moreover, his optimization must take time into account. You can also analogously define a sequence of competitive equilibria of the economy.

How can we summarize what we discussed above? Given a (per capita) consumption stream {ct}∞t=0, (per capita) capital accumulation {kt}∞t=0, (per capita) GDP stream {f(kt)}∞t=0, capital depreciation rate δ, the population growth rate n, (per capita) consumption growth g, instantaneous utility function of the representative consumer u(·), the effective discount rate of the representative consumer β > 0, (per capita) wage profile {wt}∞t=0, and capital interest rate profile {rt}∞t=0:

1. Find a {ct}∞t=0 such that k̇t = f(kt) − ct − (δ + g + n)kt holds at each t ≥ 1 and k0 > 0 is exogenously given. This is the feasibility question. Any such {ct} is called a feasible consumption stream.

2. Find a feasible consumption stream {ct}∞t=0 that maximizes V0 = ∫0∞ e−βt u(ct) dt. This is the problem of optimality. I assume that V0 < ∞.

3. Find a {rt, wt}∞t=0 such that V0 (the planner’s optimization) is sustained through market economies where k̇t = (rt − n − g)kt + wt − ct holds at each t ≥ 1 and another condition limt→∞ λt e−βt kt = 0 holds. This latter condition is sometimes called the transversality condition. This is the fixed point problem. This, in fact, can be done by choosing rt = f′(kt) − δ and wt = f(kt) − f′(kt)kt at each t ≥ 1.

With appropriate re-interpretations, the above is exactly what we had at the beginning, except for the transversality condition, which is a genuine feature of macroeconomics.

Chapter 2

Preliminaries

2.1 Logic

Theorems provide a compact and precise format for presenting the assumptions and important conclusions of sometimes lengthy arguments, and so help identify immediately the scope and limitations of the result presented. Theorems must be proved, and a proof consists of establishing the validity of the statement in the theorem in a way that is consistent with the rules of logic.

2.1.1 Necessity and Sufficiency

Consider any two statements, p and q. When we say “p is necessary for q,” we mean that p must be true for q to be true. For q to be true requires p to be true, so whenever q is true, we know that p must also be true. So we might have said, instead, that “p is true if q is true,” or simply that “p is implied by q” (p ⇐ q).

Suppose we know that “p ⇐ q” is a true statement. What if p is not true? Because p is necessary for q, when p is not true, then q cannot be true, either. But doesn’t this just say that “q not true” is necessary for “p not true”? Or that “not-q” is implied by “not-p” (¬q ⇐ ¬p)? This latter form of the original statement is called the contrapositive form.

Let’s consider a simple illustration of these ideas. Let p be the statement “x is an integer less than 10.” Let q be the statement “x is an integer less than 8.” Clearly, p is necessary for q (q ⇒ p). If we form the contrapositive of these two statements, the statement ¬p becomes “x is not an integer less than 10,” and ¬q becomes “x is not an integer less than 8.” Then, observe that ¬q ⇐ ¬p. However, ¬p ⇐ ¬q is false. The value of x could well be 9.

The notion of necessity is distinct from that of sufficiency. When we say “p is sufficient for q,” we mean that whenever p holds, q must hold. We can say, “p is true only if q is true,” or that “p implies q” (p ⇒ q). Once again, whenever the statement “p ⇒ q” is true, the contrapositive statement “¬q ⇒ ¬p” is also true.

Two implications, “p ⇐ q” and “p ⇒ q,” can both be true. When this is so, I say that “p is necessary and sufficient for q,” or “p is true if and only if q is true,” or “p iff q.” When “p is necessary and sufficient for q,” we say that the statements p and q are equivalent and write “p ⇔ q.”

To illustrate briefly, suppose that p and q are the following statements:

• p ≡ “X is yellow,”

• q ≡ “X is a lemon.”

Certainly, if X is a lemon, then X is yellow. Here, p is necessary for q. At the same time, just because X is yellow does not mean that it must be a lemon. It could be a banana. So p is not sufficient for q.

2.1.2 Theorems and Proofs

Mathematical theorems usually have the form of an implication or an equivalence, where one or more statements are alleged to be related in particular ways. Suppose we have the theorem “p ⇒ q.” Here, p is the assumption and q is the conclusion. To prove a theorem is to establish the validity of its conclusion given the truth of its assumption, and several methods can be used to do that.

1. In a constructive proof, we assume that p is true, deduce various consequences of that, and use them to show that q must also hold. This is also sometimes called a direct proof, for obvious reasons.

2. In a contrapositive proof, we assume that q does not hold, then show that p cannot hold. This approach takes advantage of the logical equivalence between the claims “p ⇒ q” and “¬q ⇒ ¬p” noted earlier, and essentially involves a constructive proof of the contrapositive to the original statement.

3. In a proof by contradiction, the strategy is to assume that p is true, assume that q is not true, and attempt to derive a logical contradiction. This approach relies on the fact that “p ⇒ q” is equivalent to the claim that p and ¬q cannot both be true; so if assuming “p and ¬q” leads to a contradiction, then “p ⇒ q” must be true.

4. In a proof by mathematical induction, I have a statement H(k) which depends upon a natural number k. What I want to show is that the statement H(k) is true for each k = 1, 2, . . . . First, I show that H(1) is true. This step is usually easy to establish. Next, I show that H(k) =⇒ H(k + 1), i.e., if H(k) is true, then H(k + 1) is also true. These two steps allow me to claim that I am done. (A short worked example follows this list.)
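To make the induction template in item 4 concrete, here is a brief worked example of my own (not from the original notes), written as a LaTeX fragment. The statement H(k) is the familiar formula for the sum of the first k natural numbers.

    % A worked proof by mathematical induction.
    % H(k): 1 + 2 + ... + k = k(k+1)/2.
    \textbf{Base step.} $H(1)$ holds, since $1 = \tfrac{1 \cdot 2}{2}$.
    \textbf{Induction step.} Assume $H(k)$, i.e., $\sum_{i=1}^{k} i = \tfrac{k(k+1)}{2}$. Then
    \[
      \sum_{i=1}^{k+1} i \;=\; \frac{k(k+1)}{2} + (k+1) \;=\; \frac{(k+1)(k+2)}{2},
    \]
    which is exactly $H(k+1)$. By induction, $H(k)$ is true for every $k = 1, 2, \ldots$.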

If I assert that p is necessary and sufficient for q, or that “p ⇔ q,” we must give a proof in “both directions.” That is, both “p ⇒ q” and “q ⇒ p” must be established before a complete proof of the assertion has been achieved.

It is important to keep in mind the old saying that goes, “Proof by example is no proof.” Suppose the following two statements are given:

• p ≡ “x is a student,”

• q ≡ “x has red hair.”

Assume further that we make the assertion “p ⇒ q.” Then clearly finding one student with red hair and pointing him out to you is not going to convince you of anything. Examples are good for illustrating but typically not for proving.

Finally, a sort of converse to the old saying about examples and proofs should be noted. Whereas citing a hundred examples can never prove that a certain property always holds, citing one solitary counterexample can disprove that the property always holds. For instance, to disprove the assertion about the color of students’ hair, you need simply point out one student with brown hair. A counterexample proves that the claim cannot always be true because you have found at least one case where it is not.

2.2 Set Theory

A set is any collection of elements. Sets of objects will usually be denoted by capital letters, A, S, T for example, while their members by lower case letters, a, s, t for example (English or Greek). A set S is a subset of another set T if every element of S is also an element of T. We write S ⊂ T. If S ⊂ T, then x ∈ S ⇒ x ∈ T. The set S is a proper subset of T if S ⊂ T and S ≠ T; sometimes one writes S ⊊ T in this case. Two sets are equal sets if they each contain exactly the same elements. We write S = T whenever x ∈ S ⇒ x ∈ T and x ∈ T ⇒ x ∈ S. The number of elements in a set S, its cardinality, is denoted |S|. The upside down “A,” ∀, means “for all,” while the backward “E,” ∃, means “there exists.”

A set S is empty or is an empty set if it contains no elements at all. It is a subset of “every” set. For example, if A = {x | x² = 0, x > 1}, then A is empty. We denote the empty set by the symbol ∅. The complement of a set S in a universal set U is the set of all elements in U that are not in S and is denoted Sc. For any two sets S and T in a universal set U, we define the set difference, denoted S\T, as all elements in the set S that are not elements of T. Thus, we can think of Sc = U\S. The symmetric difference S△T = (S\T) ∪ (T\S) is the set of all elements that belong to exactly one of the sets S and T. Note that if S = T, then S△T = ∅.

For two sets S and T, we define the union of S and T as the set S ∪ T ≡ {x | x ∈ S or x ∈ T}. We define the intersection of S and T as the set S ∩ T ≡ {x | x ∈ S and x ∈ T}. Let Λ ≡ {1, 2, 3, . . . } be an index set. Instead of writing {S1, S2, S3, . . . }, we can write {Sλ}λ∈Λ. We denote the union of all sets in the collection by ⋃λ∈Λ Sλ, and the intersection of all sets in the collection by ⋂λ∈Λ Sλ.

The following are some important identities involving the operations defined above.

• A ∪ B = B ∪ A, (A ∪ B) ∪ C = A ∪ (B ∪ C), A ∪ ∅ = A

• A ∩ B = B ∩ A, (A ∩ B) ∩ C = A ∩ (B ∩ C), A ∩ ∅ = ∅

• A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C), A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) (Distributive laws)

• A\(B ∪ C) = (A\B) ∩ (A\C), A\(B ∩ C) = (A\B) ∪ (A\C) (De Morgan’s laws)

• A△B = B△A, (A△B)△C = A△(B△C), A△∅ = A
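These identities are easy to spot-check on small finite sets. Below is a quick sanity check of my own (not part of the original notes) using Python’s built-in set type, where |, &, -, and ^ denote union, intersection, set difference, and symmetric difference, respectively.

    # Spot-check the distributive, De Morgan, and symmetric-difference identities
    # on arbitrary small sets A, B, C.
    A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 5}

    assert A | (B & C) == (A | B) & (A | C)   # distributive law
    assert A & (B | C) == (A & B) | (A & C)   # distributive law
    assert A - (B | C) == (A - B) & (A - C)   # De Morgan's law
    assert A - (B & C) == (A - B) | (A - C)   # De Morgan's law
    assert (A ^ B) ^ C == A ^ (B ^ C)         # symmetric difference is associative
    assert A ^ set() == A                     # A triangle-difference with the empty set is A
    print("all identities hold for this choice of A, B, C")

Of course, passing on one example proves nothing in general (recall “proof by example is no proof”); the point is only to make the identities concrete.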

The collection of all subsets of a set A is also a set, called the power set of A and denoted by P(A). Thus, B ∈ P(A) ⇐⇒ B ⊂ A.

Example 2.1 Let A = {a, b, c}. Then, P(A) = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}.

The previous argument reveals that the order of the elements in a set specification does not matter. In particular, {a, b} = {b, a}. However, on many occasions, one is interested in distinguishing between the first and the second elements of a pair. One such example is the coordinates of a point in the x-y plane. These coordinates are given as an ordered pair (a, b) of real numbers. The important property of ordered pairs is that (a, b) = (c, d) if and only if a = c and b = d. The product of two sets S and T is the set of “ordered pairs” of the form (s, t), where the first element in the pair is a member of S and the second is a member of T. The product of S and T is denoted

S × T ≡ {(s, t)| s ∈ S, t ∈ T}.

The set of real numbers is denoted by the special symbol R and is defined as

R ≡ {x| −∞ < x < ∞}.

Any n-tuple, or vector, is just an n-dimensional ordered tuple (x1, . . . , xn) and can be thought of as a “point” in n-dimensional Euclidean space. This space is defined as the set product

Rn ≡ R × · · · × R (n times) ≡ {(x1, . . . , xn) | xi ∈ R, i = 1, . . . , n}.

Often, we want to restrict our attention to a subset of Rn, called the “nonnegative orthant” and denoted Rn+, where

Rn+ ≡ {(x1, . . . , xn) | xi ≥ 0, i = 1, . . . , n} ⊂ Rn.

Furthermore, we sometimes talk about the strictly “positive orthant” of Rn,

Rn++ ≡ {(x1, . . . , xn) | xi > 0, i = 1, . . . , n} ⊂ Rn+.

2.3 Relations

Any ordered pair (s, t) associates an element s ∈ S to an element t ∈ T. Any collection of ordered pairs is said to constitute a binary relation between the sets S and T. Many familiar binary relations are contained in the product of one set with itself. For example, let X be the closed unit interval, X = [0, 1]. Then the binary relation ≥ consists of all ordered pairs of numbers in X where the first one in the pair is greater than or equal to the second one. When, as here, a binary relation is a subset of the product of one set X with itself, we say that it is a relation on the set X. A binary relation R on X is represented by the subset of X × X, i.e., R ⊂ X × X. We can build more structure for a binary relation on some set by requiring that it possesses certain properties.

Definition 2.1 A relation R on X is reflexive if xRx for all x ∈ X.

For example, ≥ and = on R are reflexive, while > is not.

Definition 2.2 A relation R on X is complete if, for all elements x and y in X, xRy or yRx.

For example, ≥ on R is complete, while > and = are not. Note that R on X is reflexive if it is complete.

Definition 2.3 A relation R on X is transitive if, for any three elements x, y, and z ∈ X, xRy and yRz implies xRz.

For instance, all ≥,=, > on R are transitive.

Definition 2.4 A relation R on X is symmetric if xRy implies yRx and it is anti-symmetric if xRy and yRx implies x = y.

For example, = is symmetric, while ≥ and > are not. However, ≥ is anti-symmetric; > is anti-symmetric as well, though only vacuously, since xRy and yRx can never hold at the same time. A relation R is said to be a partial ordering on X if it is reflexive, transitive, and anti-symmetric. If a partial ordering is complete, it is called a linear ordering. For instance, the relation ≥ on R is a linear ordering.

For n ≥ 2, the relation ≥ on Rn is defined by (x1, . . . , xn) ≥ (y1, . . . , yn) if and only if xk ≥ yk for k = 1, . . . , n. There is also a strict inequality relation ≫, which is given by (x1, . . . , xn) ≫ (y1, . . . , yn) if and only if xk > yk for all k = 1, . . . , n.
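As an aside of my own (not part of the original notes), the defining properties above are mechanical enough to verify by brute force on a finite set. The sketch below checks them for the relation ≥ on X = {0, 1, 2, 3}, stored as a set of ordered pairs.

    # Check the defining properties of a binary relation R on a finite set X.
    # Here R is the relation >= on X = {0, 1, 2, 3}, stored as ordered pairs.
    X = {0, 1, 2, 3}
    R = {(x, y) for x in X for y in X if x >= y}

    reflexive     = all((x, x) in R for x in X)
    complete      = all((x, y) in R or (y, x) in R for x in X for y in X)
    transitive    = all((x, z) in R
                        for x in X for y in X for z in X
                        if (x, y) in R and (y, z) in R)
    symmetric     = all((y, x) in R for (x, y) in R)
    antisymmetric = all(x == y for (x, y) in R if (y, x) in R)

    # >= comes out reflexive, complete, transitive, and anti-symmetric, but not
    # symmetric, so it is a linear ordering on X.
    print(reflexive, complete, transitive, symmetric, antisymmetric)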

2.3.1 Preference Relations

I now talk a little bit about economics. Here I apply the concept of relations to the consumer choice problem. The number of commodities is finite and equal to n. Each commodity is measured in some infinitely divisible units. Let x = (x1, . . . , xn) ∈ Rn+ be a consumption bundle. Let Rn+ be the consumption set, that is, the set of bundles the consumer can conceive of. We represent the consumer’s preferences by a binary relation, ≿, defined on the consumption set, Rn+. If x ≿ x′, we say that “x is at least as good as x′” for this consumer.

Definition 2.5 The binary relation ≻ on X is said to be the strict preference relation if x ≻ x′ if and only if x ≿ x′ but not x′ ≿ x.

Definition 2.6 The binary relation ∼ on X is said to be the indifference relation if x ∼ x′ if and only if x ≿ x′ and x′ ≿ x.

Exercise 2.1 Show the following:

1. ≿ on Rn+ is reflexive if it is complete.

2. ≿ on Rn+ is not symmetric.

3. ∼ on Rn+ is symmetric.

2.4 Functions

A function is a relation that associates each element of one set with a single, unique element of another set. We say that the function f is a mapping, map, or transformation from one set D to another set R and write f : D → R. We call the set D the domain and the set R the range of the mapping. If y is the point in the range mapped into by the point x in the domain, we write y = f(x). In set-theoretic terms, f is a relation from D to R with the property that for each x ∈ D, there is exactly one y ∈ R such that xfy (x is related to y via f).

The image of f is that set of points in the range into which some point in the domain is mapped, i.e.,

I ≡ {y | y = f(x) for some x ∈ D} ⊂ R.

The inverse image of a set of points S ⊂ I is defined as

f−1(S) ≡ {x | x ∈ D, f(x) ∈ S} .

The graph of the function f is the set of ordered pairs

G ≡ {(x, y) | x ∈ D, y = f(x)} .

If f(x) = y, one also writes x ↦ y. The squaring function s : R → R, for example, can then be written as s : x ↦ x². Thus, ↦ indicates the effect of the function on an element of the domain. If f : A → B is a function and S ⊂ A, the restriction of f to S is the function f|S defined by f|S(x) = f(x) for every x ∈ S. There is nothing in the definition of a function that prohibits more than one element in the domain from being mapped into the same element in the range. If, however, every point in the range is assigned to “at most” a single point in the domain, the function is said to be one-to-one, that is, for all x, x′ ∈ D, whenever f(x) = f(x′), then x = x′. If the image is equal to the range - if for every y ∈ R, there is x ∈ D such that f(x) = y - the function is said to be onto. If a function is one-to-one and onto (sometimes called bijective), then an inverse function f−1 : R → D exists that is also one-to-one and onto. The composition of a function f : A → B and a function g : B → C is the function g ◦ f : A → C given by (g ◦ f)(a) = g(f(a)) for all a ∈ A.
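Because these notions are purely set-theoretic, they can also be illustrated on small finite sets. The following sketch is my own illustration (not part of the original notes); the particular sets and assignments are arbitrary.

    # Represent a function on finite sets as a Python dict and check the definitions above.
    D = {1, 2, 3}
    R = {"a", "b", "c"}
    f = {1: "a", 2: "b", 3: "c"}           # f : D -> R

    image = set(f.values())
    one_to_one = len(image) == len(f)      # no two points of D share the same value
    onto = image == R                      # the image equals the range

    # Since f is one-to-one and onto (bijective), it has an inverse from R to D.
    f_inv = {y: x for x, y in f.items()}

    # Composition g o f, where g : R -> {0, 1, 2}: (g o f)(x) = g(f(x)).
    g = {"a": 0, "b": 1, "c": 2}
    g_after_f = {x: g[f[x]] for x in D}

    print(one_to_one, onto, f_inv, g_after_f)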

Exercise 2.2 Show that f(x) = x2 is not a one-to-one mapping.

2.4.1 Least Upper Bound Principle

A set S of real numbers is bounded above if there exists a real number b such that b ≥ x for all x ∈ S. This number b is called an upper bound for S. A set that is bounded above has many upper bounds. A least upper bound for the set S is a number b∗ that is an upper bound for S and is such that b∗ ≤ b for every upper bound b. The existence of a least upper bound is a basic and non-trivial property of the real number system.

Fact 2.1 (Least Upper Bound Principle) Any nonempty set of real numbers that is bounded above has a least upper bound.

This principle is rather an axiom of the real numbers. A set S can have at most one least upper bound, because if b∗1 and b∗2 are both least upper bounds for S, then b∗1 ≤ b∗2 and b∗2 ≤ b∗1, which thus implies that b∗1 = b∗2. The least upper bound b∗ of S is often called the supremum of S. We write b∗ = sup S or b∗ = supx∈S x.

Example 2.2 The set S = (0, 5), consisting of all x such that 0 < x < 5, has many upper bounds, some of which are 100, 6.73, and 5. Clearly no number smaller than 5 can be an upper bound, so 5 is the least upper bound. Thus, sup S = 5.

A set S is bounded below if there exists a real number a such that x ≥ a for all x ∈ S. The number a is a lower bound for S. A set S that is bounded below has a greatest lower bound a∗, with the property a∗ ≤ x for all x ∈ S, and a∗ ≥ a for all lower bounds a. The number a∗ is called the infimum of S and we write a∗ = inf S or a∗ = infx∈S x. Thus, we summarize

• sup S = the least number greater than or equal to all numbers in S; and

• inf S = the greatest number less than or equal to all numbers in S.

Theorem 2.1 Let S be a set of real numbers and b∗ a real number. Then sup S = b∗ if and only if the following two conditions are satisfied:

1. x ≤ b∗ for all x ∈ S.

2. For each ε > 0, there exists an x ∈ S such that x > b∗ − ε.

Proof of Theorem 2.1: (=⇒) Since b∗ is an upper bound for S, by definition, property 1 holds, that is, x ≤ b∗ for all x ∈ S. Suppose, on the other hand, that there is some ε > 0 such that x ≤ b∗ − ε for all x ∈ S. Define b∗∗ = b∗ − ε. This implies that b∗∗ is also an upper bound for S and b∗∗ < b∗. This contradicts our hypothesis that b∗ is a least upper bound for S. (⇐=) Property 1 says that b∗ is an upper bound for S. Suppose, on the contrary, that b∗ is not a least upper bound. That is, there is some other b such that x ≤ b < b∗ for all x ∈ S. Define ε = b∗ − b. Then, we obtain that x ≤ b∗ − ε for all x ∈ S. This contradicts property 2. □
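To see how the two conditions of Theorem 2.1 are used in practice, here is Example 2.2 revisited; this verification is my own addition (not in the original notes), written as a LaTeX fragment.

    % Verifying sup S = 5 for S = (0,5) via Theorem 2.1.
    Let $S = (0,5)$ and $b^* = 5$.
    \begin{enumerate}
      \item $x \le 5$ for all $x \in S$, so condition 1 holds.
      \item Given any $\varepsilon > 0$, the point $x = 5 - \min\{\varepsilon, 5\}/2$
            belongs to $S$ and satisfies $x > 5 - \varepsilon$, so condition 2 holds.
    \end{enumerate}
    Hence $\sup S = 5$.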


Chapter 3

Topology in Rn

3.1 Sequences on R

A sequence is a function k ↦ x(k) whose domain is the set {1, 2, 3, . . . } of all positive integers. I denote the set of natural numbers by N = {1, 2, . . . }. The terms x(1), x(2), . . . , x(k), . . . of the sequence are usually denoted by using subscripts: x1, x2, . . . , xk, . . . . We shall use the notation {xk}∞k=1, or simply {xk}, to indicate an arbitrary sequence of real numbers. A sequence {xk} of real numbers is said to be

1. nondecreasing if xk ≤ xk+1 for k = 1, 2, . . .

2. strictly increasing if xk < xk+1 for k = 1, 2, . . .

3. nonincreasing if xk ≥ xk+1 for k = 1, 2, . . .

4. strictly decreasing if xk > xk+1 for k = 1, 2, . . .

A sequence that is nondecreasing or nonincreasing is called monotone. A sequence {xk} is said to converge to a number x if xk becomes arbitrarily close to x for all sufficiently large k. We write limk→∞ xk = x or xk → x as k → ∞. The precise definition of convergence is as follows:

Definition 3.1 The sequence {xk} converges to x if for every ε > 0, there exists a natural number Nε such that |xk − x| < ε for all k > Nε. The number x is called the limit of the sequence {xk}. A convergent sequence is one that converges to some number.

Note that the limit of a convergent sequence is unique. A sequence that does not converge to any real number is said to diverge. In some cases we use the notation limk→∞ xk even if the sequence {xk} is divergent. For example, we say that xk → ∞ as k → ∞. A sequence {xk} is bounded if there exists a number M such that |xk| ≤ M for all k = 1, 2, . . . . It is easy to see that every convergent sequence is bounded: If xk → x, by the definition of convergence, only finitely many terms of the sequence can lie outside the interval I = (x − 1, x + 1). The set I is bounded and the finite set of points from the sequence that are not in I is bounded, so {xk} must be bounded. On the other hand, is every bounded sequence convergent? No. For example, the sequence {yk} = {(−1)k} is bounded but not convergent.
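Definition 3.1 is easy to explore numerically. The sketch below is my own illustration (not part of the notes): it takes xk = 1/k, which converges to 0, and for a few values of ε exhibits an Nε beyond which every term is within ε of the limit.

    # Numerically illustrate Definition 3.1 for the sequence x_k = 1/k with limit x = 0.
    def x(k):
        return 1.0 / k

    limit = 0.0
    for eps in (0.1, 0.01, 0.001):
        N = int(1.0 / eps) + 1     # any natural number N >= 1/eps works for this sequence
        # spot-check the next 1000 indices past N
        assert all(abs(x(k) - limit) < eps for k in range(N + 1, N + 1001))
        print(f"eps = {eps}: |x_k - 0| < eps for all k > N = {N}")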

Theorem 3.1 Every bounded monotone sequence is convergent.

Proof of Theorem 3.1: Suppose, without loss of generality, that {xk} is nondecreasing and bounded. Let b∗ be the least upper bound of the set X = {xk | k = 1, 2, . . . }, and let ε > 0 be an arbitrary number. Theorem 2.1 already showed that there must be a term xN of the sequence for which xN > b∗ − ε. Because the sequence is nondecreasing, b∗ − ε < xN ≤ xk for all k > N. But the xk are all less than or equal to b∗ because of boundedness, so we have b∗ − ε < xk ≤ b∗. Thus, for any ε > 0, there exists a number N such that |xk − b∗| < ε for all k > N. Hence, {xk} converges to b∗. □

Theorem 3.2 Suppose that the sequences {xk} and {yk} converge to x and y, respectively. Then,

1. limk→∞(xk ± yk) = x ± y

2. limk→∞(xk · yk) = x · y

3. limk→∞(xk/yk) = x/y, assuming that yk ≠ 0 for all k and y ≠ 0.

Exercise 3.1 Prove Theorem 3.2.

3.1.1 Subsequences

Let {xk} be a sequence. Consider a strictly increasing sequence of natural numbers

k1 < k2 < k3 < · · ·

and form a new sequence {yj}∞j=1, where yj = xkj for j = 1, 2, . . . . The sequence {yj}j = {xkj}j is called a subsequence of {xk}.

Theorem 3.3 Every subsequence of a convergent sequence is itself convergent, and has the same limit as the original sequence.

Proof of Theorem 3.3: It is trivial. □

Theorem 3.4 If the sequence {xk} is bounded, then it contains a convergent subsequence.

Proof of Theorem 3.4: Since {xk} is bounded, we can assume that there exists some M ∈ R such that |xk| ≤ M for all k ∈ N. Let yn = sup{xk | k ≥ n} for n ∈ N. By construction, {yn} is a nonincreasing sequence because the set {xk | k ≥ n} shrinks as n increases. The sequence {yn} is also bounded because −M ≤ yn ≤ M. Theorem 3.1 already showed that the sequence {yn} is convergent. Let x = limn→∞ yn. By the definition of yn, we can choose a term xkn from the original sequence {xk} (with kn ≥ n) satisfying |yn − xkn| < 1/n. Then

|x − xkn| = |x − yn + yn − xkn| ≤ |x − yn| + |yn − xkn| < |x − yn| + 1/n.

This shows that xkn → x as n → ∞. □

3.1.2 Cauchy Sequences

I have defined the concept of convergence of sequences. Then, a natural question arises as to how we can check if a given sequence is convergent. The concept of a Cauchy sequence, indeed, enables us to do so.

Definition 3.2 A sequence {xk} of real numbers is called a Cauchy sequence if for every ε > 0, there exists a natural number Nε such that |xm − xn| < ε for all m, n > Nε.

The theorem below is a characterization of convergent sequences.

Theorem 3.5 A sequence is convergent if and only if it is a Cauchy sequence.

Proof of Theorem 3.5: (=⇒) Suppose that {xk} converges to x. Given ε > 0, we can choose a natural number N such that |xn − x| < ε/2 for all n > N. Then, for m, n > N,

|xm − xn| = |xm − x + x − xn| ≤ |xm − x| + |x − xn| < ε/2 + ε/2 = ε.

Therefore, {xk} is a Cauchy sequence. (⇐=) Suppose that {xk} is a Cauchy sequence. First, we shall show that the sequence is bounded. By the Cauchy property, there is a number M such that |xk − xM| < 1 for k > M. Moreover, the finite set {x1, x2, . . . , xM−1} is clearly bounded. Hence, {xk} is bounded. Theorem 3.4 showed that the bounded sequence {xk} has a convergent subsequence {xkj}. Let x = limj xkj. Because {xk} is a Cauchy sequence, for every ε > 0, there is a natural number N such that |xm − xn| < ε/2 for all m, n > N. If we take J sufficiently large, we have |xkj − x| < ε/2 for all j > J. Then for k > N and j > max{N, J},

|xk − x| = |xk − xkj + xkj − x| ≤ |xk − xkj| + |xkj − x| < ε/2 + ε/2 = ε.

Hence xk → x as k → ∞. □

Exercise 3.2 Consider the sequence {xk} with the generic term

xk = 1/1² + 1/2² + · · · + 1/k².

Prove that this sequence is a Cauchy sequence. Hint:

1/(n+1)² + 1/(n+2)² + · · · + 1/(n+k)² < 1/(n(n+1)) + 1/((n+1)(n+2)) + · · · + 1/((n+k−1)(n+k)) = (1/n − 1/(n+1)) + (1/(n+1) − 1/(n+2)) + · · · + (1/(n+k−1) − 1/(n+k)).
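A numerical spot-check of my own (not part of the notes): for the partial sums above, the differences |xm − xn| with m, n beyond a threshold N stay below the telescoping bound suggested by the hint, which is what the Cauchy property requires.

    # Partial sums x_k = 1/1^2 + ... + 1/k^2; watch how |x_m - x_n| behaves beyond N.
    def x(k):
        return sum(1.0 / i**2 for i in range(1, k + 1))

    for N in (10, 100, 1000):
        # largest gap among a finite batch of terms past N (a spot-check, not a proof)
        gap = max(abs(x(m) - x(N + 1)) for m in range(N + 1, N + 50))
        print(f"N = {N}: max gap observed = {gap:.6f}, telescoping bound 1/N = {1.0 / N:.6f}")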

Exercise 3.3 Prove that a sequence can have at most one limit. Use proof by contradiction. Namely, you first suppose, by way of contradiction, that there are two limit points.

3.1.3 Upper and Lower Limits

Let {xk} be a sequence that is bounded above, and define yn = sup{xk | k ≥ n} for n = 1, 2, . . . . Each yn is a finite number and {yn} is a nonincreasing sequence. Then either limn→∞ yn exists or is −∞. We call this limit the upper limit (or lim sup) of the sequence {xk}, and we introduce the following notation:

lim supk→∞ xk = limn→∞ (sup{xk | k ≥ n})

If {xk} is not bounded above, we write lim supk→∞ xk = ∞. Similarly, if {xk} is bounded below, its lower limit (or lim inf) is defined as

lim infk→∞ xk = limn→∞ (inf{xk | k ≥ n})

If {xk} is not bounded below, we write lim infk→∞ xk = −∞.

Theorem 3.6 If the sequence {xk} is convergent, then

lim supk→∞ xk = lim infk→∞ xk = limk→∞ xk.

On the other hand, if lim supk→∞ xk = lim infk→∞ xk, then {xk} is convergent.

I omit the proof of Theorem 3.6.

Exercise 3.4 Determine the lim sup and lim inf of the following sequences.

1. {xk} = {(−1)k}

2. {xk} = {(−1)k (2 + 1/k) + 1}


3.1.4 Infimum and Supremum of Functions

Suppose that f(x) is defined for all x ∈ B, where B ⊂ Rn. We define the infimum and supremum of the function f over B by

infx∈B f(x) = inf{f(x) | x ∈ B},  supx∈B f(x) = sup{f(x) | x ∈ B}.

If a function f is defined over a set B, if infx∈B f(x) = y, and if there exists a c ∈ B such that f(c) = y, then we say that the infimum is attained (at the point c) in B. In this case the infimum y is called the minimum of f over B, and we often write “min” instead of “inf.” In the same way we write “max” instead of “sup” when the supremum of f over B is attained in B, and so becomes the maximum.

3.1.5 Indexed Sets

Suppose that, for each λ ∈ Λ, we specify an object aλ. Then, these objects form an indexed set {aλ}λ∈Λ with Λ as its index set. In formal terms, an indexed set is a function whose domain is the index set. For example, a sequence is an indexed set {ak}k∈N with the set N of natural numbers as its index set. Instead of {ak}k∈N one often writes {ak}∞k=1.

A set whose elements are sets is often called a family of sets, and so an indexed set of sets is also called an indexed family of sets. Consider a nonempty indexed family {Aλ}λ∈Λ of sets. The union and the intersection of this family are the sets

• ⋃λ∈Λ Aλ = the set consisting of all x that belong to Aλ for at least one λ ∈ Λ

• ⋂λ∈Λ Aλ = the set consisting of all x that belong to Aλ for all λ ∈ Λ.

The distributive laws can be generalized to

A ∪ (⋂λ∈Λ Bλ) = ⋂λ∈Λ (A ∪ Bλ),  A ∩ (⋃λ∈Λ Bλ) = ⋃λ∈Λ (A ∩ Bλ)

and De Morgan’s laws to

A\(⋃λ∈Λ Bλ) = ⋂λ∈Λ (A\Bλ),  A\(⋂λ∈Λ Bλ) = ⋃λ∈Λ (A\Bλ)

The union and the intersection of a sequence {An}n∈N = {An}∞n=1 of sets are often written as ⋃∞n=1 An and ⋂∞n=1 An.

3.2 Point Set Topology in Rn

Consider the n-dimensional Euclidean space Rn, whose elements, or points, are n-vectors x = (x1, . . . , xn). The Euclidean distance d(x, y) between any two points x = (x1, . . . , xn) and y = (y1, . . . , yn) in Rn is the norm ‖x − y‖ of the vector difference between x and y. Thus,

d(x, y) = ‖x − y‖ = √((x1 − y1)² + · · · + (xn − yn)²)

If x, y, and z are points in Rn, then

d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)

If x0 is a point in Rn and r is a positive real number, then the set of all points x ∈ Rn whose distance from x0 is less than r is called the open ball around x0 with radius r. This open ball is denoted by Br(x0). Thus,

Br(x0) = {x ∈ Rn | d(x0, x) < r}

Definition 3.3 A set S ⊂ Rn is open if, for all x0 ∈ S, there exists some ε > 0 such that Bε(x0) ⊂ S.

On the real line R, the simplest type of open set is an open interval. Let S be any subset of Rn. A point x0 ∈ S is called an interior point of S if there is some ε > 0 such that Bε(x0) ⊂ S. The set of all interior points of S is called the interior of S, and is denoted int(S). A set S is said to be a neighborhood of x0 if x0 is an interior point of S, that is, if S contains some open ball Bε(x0) (i.e., Bε(x0) ⊂ S) for some ε > 0.

Theorem 3.7 1. The entire space Rn and the empty set ∅ are both open.

2. Arbitrary unions of open sets are open: Let Λ be an arbitrary index set. If Aλ is open for each λ ∈ Λ, then ⋃λ∈Λ Aλ is also open.

3. The intersection of finitely many open sets is open: Let Λ be a finite set. If Aλ is open for each λ ∈ Λ, then ⋂λ∈Λ Aλ is open.

Proof of Theorem 3.7: (1) It is clear that B1(x) ⊂ Rn for all x ∈ Rn, so Rn is open. The empty set ∅ is open because it has no elements, so, vacuously, every one of its members is an interior point.

(2) Let {Uλ}λ∈Λ be an arbitrary family of open sets in Rn, and let U∗ = ⋃λ∈Λ Uλ be the union of the whole family. For each x ∈ U∗, there is at least one λ ∈ Λ such that x ∈ Uλ. Since Uλ is open by our hypothesis, there exists ε > 0 such that Bε(x) ⊂ Uλ ⊂ U∗. Hence, U∗ is open.

(3) Let {Uk}Kk=1 be a finite collection of open sets in Rn, and let U∗ = ⋂Kk=1 Uk be the intersection of all these sets. Let x be any point in U∗. Since Uk is open for each k, there is εk > 0 such that Bεk(x) ⊂ Uk for each k. Let ε∗ = min{ε1, . . . , εK}. This is well defined because of finiteness. Then, Bε∗(x) ⊂ Uk for each k, which implies that Bε∗(x) ⊂ U∗. Hence, U∗ is open. □
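Finiteness in part 3 is essential. A standard counterexample, added here as my own remark (not in the original notes), shows that an infinite intersection of open sets need not be open:

    % Why finiteness matters in part 3 of Theorem 3.7.
    \[
      \bigcap_{k=1}^{\infty} \left( -\tfrac{1}{k}, \tfrac{1}{k} \right) = \{0\}.
    \]
    % Each interval (-1/k, 1/k) is open, but the singleton {0} is not open in R:
    % no open ball around 0 is contained in {0}.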

Exercise 3.5 There are two questions. First, draw the graph of S = {(x, y) ∈ R2 | 2x − y < 2 and x − 3y < 5}. Second, prove that S is open in R2.

Definition 3.4 A set S is closed if its complement, Rn\S is open.

A point x0 ∈ Rn is said to be a boundary point of the set S ⊂ Rn if Bε(x0) ∩ Sc ≠ ∅ and Bε(x0) ∩ S ≠ ∅ for every ε > 0. Here, Sc = Rn\S. In general, a set may include none, some, or all of its boundary points. An open set, for instance, contains none of its boundary points.

Each point in a set is either an interior point or a boundary point of the set. The set of all boundary points of a set S is said to be the boundary of S and is denoted ∂S or bd(S). Note that, given any set S ⊂ Rn, there is a corresponding partition of Rn into three mutually disjoint sets (some of which may be empty), namely:

1. the interior of S, which consists of all points x ∈ Rn such that N ⊂ S for some neighborhood N of x;

2. the exterior of S, which consists of all points x ∈ Rn for which there exists some neighborhood N of x such that N ⊂ Rn\S;

3. the boundary of S, which consists of all points x ∈ Rn with the property that every neighborhood N of x intersects both S and its complement Rn\S.

A set S ⊂ Rn is said to be closed if it contains all its boundary points. The union of S and its boundary (S ∪ ∂S) is called the closure of S, denoted by S̄. A point x belongs to S̄ if and only if Bε(x) ∩ S ≠ ∅ for every ε > 0. The closure S̄ of any set S is indeed closed. In fact, S̄ is the smallest closed set containing S.

Theorem 3.8 1. The whole space Rn and the empty set ∅ are both closed.

2. Arbitrary intersections of closed sets are closed.

3. The union of finitely many closed sets is closed.

Exercise 3.6 Prove Theorem 3.8. Use the fact that the complement of an open set is closed, together with Theorem 3.7.

In topology, any set containing some of its boundary points but not all of them is neither open nor closed. The half-open intervals [a, b) and (a, b], for example, are neither open nor closed. Hence, “open” and “closed” are not exhaustive categories; and since Rn and ∅ are both open and closed, they are not mutually exclusive either.

3.3 Topology and Convergence

I want to generalize the argument in Section 3.1 to Rn. The basic idea is to apply the previous argument coordinate-wise. A sequence {xk}∞k=1 in Rn is a function that for each natural number k yields a corresponding point xk in Rn.

Definition 3.5 A sequence {xk} in Rn converges to a point x ∈ Rn if for each ε > 0, there exists a natural number N such that xk ∈ Bε(x) for all k ≥ N, or equivalently, if d(xk, x) → 0 as k → ∞.

Theorem 3.9 Let {xk} be a sequence in Rn. Then, {xk} converges to the vector x ∈ Rn if and only if for each j = 1, . . . , n, the real number sequence {xk(j)}∞k=1, consisting of the jth component of each vector xk, converges to x(j) ∈ R, the jth component of x.

Proof of Theorem 3.9: (=⇒) For every k and every j, one has d(xk, x) = ‖xk − x‖ ≥ |xk(j) − x(j)|. It follows that if xk → x, then xk(j) → x(j) for each j. (⇐=) Suppose that xk(j) → x(j) as k → ∞ for j = 1, . . . , n. Then, given any ε > 0, for each j = 1, . . . , n, there exists a number Nj such that |xk(j) − x(j)| < ε/√n for all k > Nj. It follows that

d(xk, x) = √(|xk(1) − x(1)|² + · · · + |xk(n) − x(n)|²) < √(ε²/n + · · · + ε²/n) = ε,

for all k > max{N1, . . . , Nn}. This is well defined because of the finiteness of n. Therefore, xk → x as k → ∞. □
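To see Theorem 3.9 at work numerically, here is a sketch of my own (not from the notes): a sequence in R2 whose coordinates converge to 0 and 2, so that the Euclidean distance to the limit x = (0, 2) also shrinks.

    import numpy as np

    # x_k = (1/k, 2 + (-1)^k / k^2): each coordinate converges, so x_k -> (0, 2) in R^2.
    def x(k):
        return np.array([1.0 / k, 2.0 + (-1) ** k / k**2])

    limit = np.array([0.0, 2.0])
    for k in (10, 100, 1000, 10000):
        coord_err = np.abs(x(k) - limit)      # coordinate-wise errors
        dist = np.linalg.norm(x(k) - limit)   # Euclidean distance d(x_k, x)
        print(k, coord_err, dist)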

Definition 3.6 A sequence {xk} in Rn is said to be a Cauchy sequence if for every ε > 0, there exists a number N such that d(xm, xn) < ε for all m, n > N.

Theorem 3.10 A sequence {xk} in Rn is convergent if and only if it is a Cauchy sequence.

Exercise 3.7 Prove Theorem 3.10. Apply the same argument as in Theorem 3.5 to each coordinate.

3.4 Properties of Sequences in Rn

Theorem 3.11 1. For any set S ⊂ Rn, a point x ∈ Rn belongs to S̄ if and only if there exists a sequence {xk} with xk ∈ S for all k such that xk → x as k → ∞.

2. A set S ⊂ Rn is closed if and only if every convergent sequence of points in S has its limit in S.

Proof of Theorem 3.11: (=⇒ of Property 1) Let x ∈ S̄. Regardless of whether x ∈ int S or x ∈ ∂S, for each k ∈ N we can construct xk such that xk ∈ B1/k(x) ∩ S (in particular, if x ∈ S, take xk = x for each k). Then xk → x as k → ∞. (⇐= of Property 1) Suppose that {xk} is a convergent sequence for which xk ∈ S for each k and x = limk→∞ xk. We claim that x ∈ S̄. For any ε > 0, there is a number N ∈ N such that xk ∈ Bε(x) for all k > N. Since xk ∈ S for each k, it follows that Bε(x) ∩ S ≠ ∅. Suppose, on the other hand, that x ∉ S̄. Since S̄ is closed, there is some ε > 0 such that Bε(x) ∩ S̄ = ∅. This contradicts the conclusion I just drew that Bε(x) ∩ S ≠ ∅ for any ε > 0. Hence, x ∈ S̄. (=⇒ of Property 2) Assume that S is closed and let {xk} be a convergent sequence such that xk ∈ S for each k, with limit x. Note that x ∈ S̄ by property 1. Since S̄ = S if S is closed, it follows that x ∈ S. (⇐= of Property 2) By property 1, for any point x ∈ S̄, there is some sequence {xk} for which xk ∈ S for each k and limk→∞ xk = x. By our hypothesis, x ∈ S. This shows that x ∈ S̄ implies x ∈ S, i.e., S̄ ⊂ S. By definition, S ⊂ S̄ for any S. Hence S̄ = S, that is, S is closed. □

Definition 3.7 A set S in Rn is bounded if there exists a number M ∈ R such that ‖x‖ ≤ M for all x ∈ S. A set that is not bounded is called unbounded. Here ‖x‖ = d(x, 0) = √(x1² + · · · + xn²) is called the Euclidean norm.

Similarly, a sequence {xk} in Rn is bounded if the set {xk|k = 1, 2, . . . } is bounded.

Lemma 3.1 Any convergent sequence {xk} in Rn is bounded.

Proof of Lemma 3.1: If xk → x, then only finitely many terms of the sequence can lie outside the ball B1(x). The ball B1(x) is bounded and any finite set of points is bounded, so {xk} must be bounded. □

On the other hand, a bounded sequence {xk} in Rn is not necessarily convergent. This is the same as for sequences in R. The theorem below gives us a characterization of boundedness of a set in terms of sequences.

Theorem 3.12 A subset S of Rn is bounded if and only if every sequence of points in S has a convergent subsequence.

I omit the proof of this theorem. Although it is not a difficult proof, it is tedious. The concept of compact sets, introduced next, is used extensively in both mathematics and economics.

Definition 3.8 A set S in Rn is compact if it is closed and bounded.

The theorem below is a characterization of compact sets in terms of sequences.

Theorem 3.13 (Bolzano-Weierstrass) A subset S of Rn is compact if and only if every sequence of points in S has a subsequence that converges to a point in S.

Proof of Bolzano-Weierstrass’s theorem: (=⇒) Suppose that S is compact and let {xk} be a sequence such that xk ∈ S for each k. Since S is bounded, there is a convergent subsequence {yn} = {xkn}. Furthermore, limn→∞ yn = y ∈ S because S is closed. (⇐=) Suppose that every sequence of points in S has a subsequence that converges to a point in S. This, together with the previous theorem (Theorem 3.12), already shows that S is bounded. It remains to show that S is closed. Let {xk} be any convergent sequence for which xk ∈ S for each k and x = limk→∞ xk. By assumption, {xk} has a subsequence {xkj} that converges to limj→∞ xkj = x′ ∈ S. But {xkj} also converges to x. Hence, the limit points must be the same, that is, x = x′ ∈ S. By Theorem 3.11, S is therefore closed. □

Exercise 3.8 Let the number of commodities in the competitive market be n. Let pi > 0 be the price of commodity i for each i = 1, . . . , n. Let y > 0 be the consumer’s income. Define the consumer’s budget set B(p, y) as

B(p, y) ≡ {x = (x1, . . . , xn) ∈ Rn+ | p1x1 + · · · + pnxn ≤ y}.

Show that B(p, y) is nonempty and compact.

3.5 Continuous Functions

Consider first a real-valued function z = f(x) = f(x1, . . . , xn) of n variables. Roughly speaking, f is continuous if small changes in the independent variables cause only small changes in the function value.

Definition 3.9 A function f : S → R with domain S ⊂ Rn is continuous at a point x0 in S if for every ε > 0 there exists a δ > 0 such that

|f(x) − f(x0)| < ε for all x ∈ S with ‖x − x0‖ < δ.

If f is continuous at every point in a set S, we simply say that f is continuous on S.

Exercise 3.9 Let f(x) = √x be a function f : R+ → R. Prove that f is continuous.

Exercise 3.10 Let f : R+ → R be given below.

f(x) = 1 if x ≥ 1, and f(x) = 0 if x < 1.

Show that f is not a continuous function.

Consider next the general case of vector-valued functions.

Definition 3.10 A function f = (f1, . . . , fm) from a subset S of Rn to Rm is said to be continuous at x0 in S if for every ε > 0, there exists a δ > 0 such that d(f(x), f(x0)) < ε for all x ∈ S with d(x, x0) < δ, or equivalently, such that f(Bδ(x0) ∩ S) ⊂ Bε(f(x0)).

The next theorem shows that the continuity of a vector-valued function reduces to the continuity of each component (coordinate) function, and vice versa.

Theorem 3.14 A function f = (f1, . . . , fm) from S ⊂ Rn to Rm is continuous at a point x0 in S if and only if each component function fj : S → R, j = 1, . . . , m, is continuous at x0.

Proof of Theorem 3.14: (=⇒) Suppose f is continuous at x0. Then, for every ε > 0, there exists a δ > 0 such that

|fj(x) − fj(x0)| ≤ d(f(x), f(x0)) < ε

for every x ∈ S with d(x, x0) < δ. Hence, fj is continuous at x0 for j = 1, . . . , m. (⇐=) Suppose that each component fj is continuous at x0. Then, for every ε > 0 and every j = 1, . . . , m, there exists δj > 0 such that |fj(x) − fj(x0)| < ε/√m for every point x ∈ S with d(x, x0) < δj. Let δ = min{δ1, . . . , δm}. Then x ∈ Bδ(x0) ∩ S implies that

d(f(x), f(x0)) = √(|f1(x) − f1(x0)|² + · · · + |fm(x) − fm(x0)|²) < √(ε²/m + · · · + ε²/m) = ε.

This proves that f is continuous at x0. □

Here, I want to characterize the continuity of the functions in terms of sequences.

Theorem 3.15 A function f from S ⊂ Rn into Rm is continuous at a point x0 in S if and only if f(xk) → f(x0) for every sequence {xk} of points in S that converges to x0.

Proof of Theorem 3.15: (=⇒) Suppose that f is continuous at x0 and let {xk} be a sequence for which xk ∈ S for each k and limk→∞ xk = x0. Let ε > 0 be given. Because of the continuity of f, there exists δ > 0 such that d(f(x), f(x0)) < ε whenever x ∈ Bδ(x0) ∩ S. Since xk → x0, there exists a number N ∈ N such that d(xk, x0) < δ for all k > N. But then xk ∈ Bδ(x0) ∩ S and so d(f(xk), f(x0)) < ε for all k > N. This implies that f(xk) → f(x0). (⇐=) Suppose that f(xk) → f(x0) for every sequence {xk} of points in S converging to x0, and suppose, by way of contradiction, that f is not continuous at x0. Then there is some ε > 0 such that for every δ > 0 there exists a point x ∈ Bδ(x0) ∩ S with d(f(x), f(x0)) ≥ ε. In particular, for each k ∈ N we can choose xk ∈ B1/k(x0) ∩ S with d(f(xk), f(x0)) ≥ ε. Then xk → x0, but f(xk) does not converge to f(x0). This contradicts our hypothesis. Hence, f is continuous at x0. □

The theorem below shows that continuous mappings preserve the compactness of a set.

Theorem 3.16 Let S ⊂ Rn and let f : S → Rm be continuous. Then f(K) = {f(x) | x ∈ K} is compact for every compact subset K of S.

Proof of Theorem 3.16: Let {yk} be any sequence in f(K). By definition, for each k, there is a point xk ∈ K such that yk = f(xk). Because K is compact, by Bolzano-Weierstrass’s theorem, the sequence {xk} has a subsequence {xkj} with the property that xkj ∈ K for each j and limj→∞ xkj = x0 ∈ K. Because f is continuous, by the previous theorem (Theorem 3.15), f(xkj) → f(x0) as j → ∞, where f(x0) ∈ f(K) because x0 ∈ K. But then {ykj} is a subsequence of {yk} that converges to a point f(x0) ∈ f(K). So, we have proved that any sequence in f(K) has a subsequence converging to a point of f(K); by Bolzano-Weierstrass’s theorem again, f(K) is compact. □

27

Suppose that f is a continuous function from Rn to Rm. If V is an open set in Rn,the image f(V ) = {f(x)|x ∈ V } of V need not be open in Rm. Nor need f(C) be closedif C is closed. Nevertheless, the inverse image f−1(U) = {x|f(x) ∈ U} of an open set Uunder continuous function f is always open. Similarly, the inverse image of any closedset must be closed.

Theorem 3.17 Let f be any function from Rn to Rm. Then f is continuous if and only if either of the following equivalent conditions is satisfied.

1. f−1(U) is open for each open set U in Rm.

2. f−1(F ) is closed for each closed set F in Rm.

I omit the proof of Theorem 3.17 because it is conceptually involved; just accept the result.

Theorem 3.18 Let S be a compact set in R, let x_* be the greatest lower bound of S, and let x^* be the least upper bound of S. Then x_* ∈ S and x^* ∈ S.

Proof of Theorem 3.18: Let S ⊂ R be closed and bounded and let x^* be the least upper bound of S. By the definition of an upper bound, x^* ≥ x for all x ∈ S. If x^* = x for some x ∈ S, we are done. Suppose, therefore, that x^* > x for all x ∈ S. Then x^* ∉ S, so x^* ∈ R\S. Since S is closed, R\S is open. By the definition of open sets, there exists some ε > 0 such that Bε(x^*) = (x^* − ε, x^* + ε) ⊂ R\S. Since Bε(x^*) ∩ S = ∅, no point of S exceeds x^* − ε: if some x ∈ S satisfied x > x^* − ε, then x ∈ (x^* − ε, x^*) ⊂ Bε(x^*) ⊂ R\S, a contradiction. Hence x^* − ε/2 is an upper bound of S that is strictly smaller than x^*, which contradicts the hypothesis that x^* is the least upper bound of S. Thus, we must conclude that x^* ∈ S. The same argument applies to the greatest lower bound x_*. □

Theorem 3.19 (Weierstrass's Theorem) Let f : S → R be a continuous real-valued mapping where S is a nonempty compact subset of Rn. Then there exist vectors x_*, x^* ∈ S such that for all x ∈ S,

f(x_*) ≤ f(x) ≤ f(x^*).

Proof of Weierstrass's Theorem: By Theorem 3.16, f(S) is a compact subset of R; by Theorem 3.18, f(S) therefore contains its greatest lower bound and its least upper bound, so there are x_*, x^* ∈ S with f(x_*) ≤ f(x) ≤ f(x^*) for all x ∈ S. □


Chapter 4

Linear Algebra

4.1 Basic Concepts in Linear Algebra

Let f : Rn → Rm be a mapping (transformation). A mapping f is said to be linear if for any x, y ∈ Rn and any α ∈ R, the following two conditions are satisfied: (1) f(x + y) = f(x) + f(y) and (2) f(αx) = αf(x). For any linear mapping f : Rn → Rm, there exists a unique m × n matrix A such that f(x) = Ax for all x ∈ Rn.¹ An m × n matrix is a rectangular array with m rows and n columns:

A = (aij)m×n =
[ a11  a12  · · ·  a1n
  a21  a22  · · ·  a2n
   ⋮     ⋮    ⋱     ⋮
  am1  am2  · · ·  amn ]

Here aij denotes the element in the ith row and the jth column. With this notation, we can express f(x) = Ax as below:

f(x) =
[ f(1)(x)
   ⋮
  f(j)(x)
   ⋮
  f(m)(x) ]
=
[ a11  a12  · · ·  a1n
  a21  a22  · · ·  a2n
   ⋮     ⋮    ⋱     ⋮
  am1  am2  · · ·  amn ]
[ x1
  x2
   ⋮
  xn ].

Exercise 4.1 Show that any linear mapping f : Rn → Rm is continuous.

If A = (aij)m×n, B = (bij)m×n, and α is a scalar, we define

• A + B = (aij + bij)m×n,

• αA = (αaij)m×n,

¹This is a non-trivial statement, but I take this one-to-one correspondence between linear mappings and matrix representations as a fact, with no proof provided.


• A − B = A + (−1)B = (aij − bij)m×n.

Let f : Rn → Rm and g : Rm → Rp be linear mappings. Then we can associate an m × n matrix A = (aij)m×n with f and a p × m matrix B = (bij)p×m with g. Consider the composite mapping g ◦ f(x) = g(f(x)). What I would like to have is the requirement on the product of matrices that g ◦ f ≡ BA. Then the product C = BA is defined as the p × n matrix C = (cij)p×n, whose element in the ith row and the jth column is the inner product of the ith row of B and the jth column of A. That is,

cij = Σ_{r=1}^{m} b_{ir} a_{rj} = b_{i1}a_{1j} + b_{i2}a_{2j} + · · · + b_{ik}a_{kj} + · · · + b_{im}a_{mj}   (m terms)

It is important to note that the product BA is well defined only if the number ofcolumns in B is equal to the number of rows in A.

If A, B, and C are matrices whose dimensions are such that the given operations are well defined, then the basic properties of matrix multiplication are:

• (AB)C = A(BC) (associative law)

• A(B + C) = AB + AC (left distributive law)

• (A + B)C = AC + BC (right distributive law)

Exercise 4.2 Show the above three properties when we consider 2 × 2 matrices.

However, matrix multiplication is not commutative. In fact,

• AB ≠ BA, except in special cases

• AB = 0 does not imply that A or B is 0

• AB = AC and A ≠ 0 do not imply that B = C

Exercise 4.3 Confirm the above three points by example.
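For instance, Exercise 4.3 can be checked quickly with a numerical sketch (Python with numpy is assumed to be available; the particular matrices below are just illustrative choices):

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 0.0]])
B = np.array([[1.0, 0.0],
              [1.0, 0.0]])
print(A @ B)    # [[2. 0.] [0. 0.]]
print(B @ A)    # [[1. 1.] [1. 1.]]  -- so AB and BA differ

C = np.array([[0.0, 1.0],
              [0.0, 0.0]])
print(C @ C)    # the zero matrix, although C itself is not zero

D = np.array([[1.0, 0.0],
              [0.0, 0.0]])
E = np.array([[1.0, 0.0],
              [0.0, 1.0]])
F = np.array([[1.0, 0.0],
              [0.0, 2.0]])
print(np.array_equal(D @ E, D @ F))   # True, even though E != F and D != 0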

By using matrix multiplication, one can write a general system of linear equations in a very concise way. Specifically, the system

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
· · · · · · · · · · · · · · · · · · · · ·
am1x1 + am2x2 + · · · + amnxn = bm

can be written as Ax = b if we define

A =
[ a11  a12  · · ·  a1n
  a21  a22  · · ·  a2n
   ⋮     ⋮    ⋱     ⋮
  am1  am2  · · ·  amn ],   x =
[ x1
  x2
   ⋮
  xn ],   b =
[ b1
  b2
   ⋮
  bm ].


A matrix is square if it has an equal number of rows and columns. If A is a square matrix and n is a positive integer, we define the nth power of A in the obvious way:

A^n = AA · · · A   (n factors)

For diagonal matrices it is particularly easy to compute powers:

D =
[ d1  0   · · ·  0
  0   d2  · · ·  0
  ⋮    ⋮    ⋱    ⋮
  0   0   · · ·  dm ]
   =⇒   D^n =
[ d1^n  0     · · ·  0
  0     d2^n  · · ·  0
  ⋮      ⋮      ⋱    ⋮
  0     0     · · ·  dm^n ]
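A quick numerical illustration of this fact (a sketch with numpy; the diagonal entries are arbitrary):

import numpy as np

D = np.diag([2.0, 3.0, 5.0])
D3_by_multiplication = np.linalg.matrix_power(D, 3)
D3_entrywise = np.diag(np.diag(D) ** 3)
print(np.allclose(D3_by_multiplication, D3_entrywise))   # True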

The identity matrix of order n, denoted by In, is the n × n matrix having ones along the main diagonal and zeros elsewhere:

In =
[ 1  0  · · ·  0
  0  1  · · ·  0
  ⋮   ⋮   ⋱   ⋮
  0  0  · · ·  1 ]   (identity matrix)

If A is any m × n matrix, then AIn = A = ImA. In particular,

AIn = InA = A for every n × n matrix A

If A = (aij)m×n is any matrix, the transpose of A is defined as AT = (aji)n×m. The subscripts i and j are interchanged because every row of A becomes a column of AT, and every column of A becomes a row of AT. The following rules apply to matrix transposition:

1. (AT )T = A

2. (A + B)T = AT + BT

3. (αA)T = αAT

4. (AB)T = BTAT

Exercise 4.4 Prove the above four properties when we consider 2 × 2 matrices.
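As a quick numerical sanity check of rule 4 (a sketch with numpy; the matrices are random and only illustrative):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
print(np.allclose((A @ B).T, B.T @ A.T))   # True: the order reverses under transposition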

A square matrix is said to be symmetric if A = AT .


4.2 Determinants and Matrix Inverses

4.2.1 Determinants

Recall that the determinants |A| of 2 × 2 and 3 × 3 matrices are defined by

|A| =
| a11  a12 |
| a21  a22 |
= a11a22 − a12a21

|A| =
| a11  a12  a13 |
| a21  a22  a23 |
| a31  a32  a33 |
= a11a22a33 + a12a23a31 + a13a21a32 − a11a23a32 − a12a21a33 − a13a22a31

For a general n × n matrix A = (aij), the determinant |A| can be defined recursively. In fact,

|A| = ai1Ai1 + ai2Ai2 + · · · + aijAij + · · · + ainAin

where the cofactors Aij are determinants of (n − 1) × (n − 1) matrices given by

Aij = (−1)^{i+j} · |the (n − 1) × (n − 1) matrix obtained from A by deleting row i and column j|

Here row i and column j are deleted from the matrix A to produce the cofactor Aij.
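The recursive definition translates directly into code. Here is a minimal sketch in Python (cofactor expansion along the first row; it mirrors the definition and is not meant to be efficient for large n):

def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0.0
    for j in range(n):
        # Minor: delete row 0 and column j, then apply the sign (-1)^(0+j).
        minor = [row[:j] + row[j+1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1.0, 2.0], [3.0, 4.0]]))   # -2.0, i.e. a11*a22 - a12*a21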

Proposition 4.1 Let A and B be n × n matrices. Then |AB| = |A||B|.

Exercise 4.5 Prove Proposition 4.1 when n = 2.

4.2.2 Matrix Inverses

The inverse A−1 of an n × n matrix A has the following properties:

• B = A−1 ⇐⇒ AB = In ⇐⇒ BA = In

• A−1 exists ⇐⇒ |A| ≠ 0

If A = (aij)n×n and |A| ≠ 0, the unique inverse of A is given by

A−1 = (1/|A|) adj(A),   where   adj(A) =
[ A11  A21  · · ·  An1
  A12  A22  · · ·  An2
   ⋮     ⋮     ⋱    ⋮
  A1n  A2n  · · ·  Ann ]

with Aij the cofactor of the element aij. Note carefully the order of the indices in the adjoint matrix adj(A), with the column number preceding the row number. The matrix (Aij)n×n is called the cofactor matrix, whose transpose is the adjoint matrix.
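The adjugate formula can also be checked numerically. Below is a small sketch in Python using numpy (assuming |A| ≠ 0; np.linalg.inv is used only to confirm the answer):

import numpy as np

def cofactor(A, i, j):
    """(-1)^(i+j) times the determinant of A with row i and column j deleted."""
    minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

def inverse_via_adjugate(A):
    """A^{-1} = adj(A)/|A|, where adj(A)[i, j] is the cofactor of the element a_{ji}."""
    A = np.asarray(A, dtype=float)
    d = np.linalg.det(A)
    n = A.shape[0]
    adj = np.array([[cofactor(A, j, i) for j in range(n)] for i in range(n)])
    return adj / d

A = np.array([[2.0, 1.0], [5.0, 3.0]])
print(inverse_via_adjugate(A))   # [[ 3. -1.] [-5.  2.]]
print(np.linalg.inv(A))          # the same matrix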


Exercise 4.6 Define A as

A =
[ a11  a12  a13
  a21  a22  a23
  a31  a32  a33 ]

Then derive A−1. Assume that |A| ≠ 0.

Lemma 4.1 The following rules for inverses can be established.

• (A−1)−1 = A,

• (AB)−1 = B−1A−1,

• (AT )−1 = (A−1)T ,

• (αA)−1 = α−1A−1, where α ∈ R.

Exercise 4.7 Prove Lemma 4.1 when n = 2.

Proposition 4.2 Let A be an invertible n × n matrix. Then |A−1| = 1/|A|.

Exercise 4.8 Prove Proposition 4.2 when n = 2.

4.2.3 Cramer’s Rule

A linear system of n equations and n unknowns,

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2            (∗)
· · · · · · · · · · · · · · · · · · · · ·
an1x1 + an2x2 + · · · + annxn = bn

has a unique solution if and only if |A| ≠ 0. The solution is then

xj = |Aj| / |A|,   j = 1, . . . , n

where the determinant

where the determinant

|Aj| =
| a11  · · ·  a1,j−1  b1  a1,j+1  · · ·  a1n |
| a21  · · ·  a2,j−1  b2  a2,j+1  · · ·  a2n |
|  ⋮     ⋱      ⋮      ⋮     ⋮      ⋱     ⋮  |
| an1  · · ·  an,j−1  bn  an,j+1  · · ·  ann |

is obtained by replacing the jth column of |A| by the column whose components are b1, b2, . . . , bn. If the right-hand side of the equation system (∗) consists only of zeros, so that it can be written in matrix form as Ax = 0, the system is called homogeneous. A homogeneous system always has the trivial solution x1 = x2 = · · · = xn = 0.
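Cramer's rule can be coded directly. A minimal sketch in Python with numpy (it assumes |A| ≠ 0 and is intended for small systems; in practice np.linalg.solve is preferable):

import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b via x_j = |A_j| / |A|."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b                 # replace the jth column of A by b
        x[j] = np.linalg.det(Aj) / d
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
print(cramer_solve(A, b))        # [0.8 1.4]
print(np.linalg.solve(A, b))     # the same solution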


Lemma 4.2 Ax = 0 has nontrivial solutions if and only if |A| = 0.

I omit the proof of Lemma 4.2.

Exercise 4.9 Use Cramer’s rule to solve the following system of equations:

2x1 − 3x2 = 2
4x1 − 6x2 + x3 = 7
x1 + 10x2 = 1.

4.3 Vectors

An n-vector is an ordered n-tuple of numbers. It is often convenient to regard the rows and columns of a matrix as vectors, and an n-vector can be understood either as a 1 × n matrix a = (a1, a2, . . . , an) (a row vector) or as an n × 1 matrix aT = (a1, a2, . . . , an)T (a column vector). The operations of addition, subtraction, and multiplication by scalars of vectors are defined in the obvious way. The dot product (or inner product) of the n-vectors a = (a1, a2, . . . , an) and b = (b1, b2, . . . , bn) is defined as

a · b = a1b1 + a2b2 + · · · + anbn = Σ_{i=1}^{n} aibi

Proposition 4.3 If a, b, and c are n-vectors and α is a scalar, then

1. a · b = b · a,

2. a · (b + c) = a · b + a · c,

3. (αa) · b = a · (αb) = α(a · b),

4. a · a = 0 =⇒ a = 0,

5. (a + b) · (a + b) = a · a + 2(a · b) + b · b.

Exercise 4.10 Prove Proposition 4.3. If you find it difficult to do so, focus on vectors in R2.

The Euclidean norm or length of the vector a = (a1, a2, . . . , an) is

‖a‖ = √(a · a) = √(a1² + a2² + · · · + an²)

Note that ‖αa‖ = |α|‖a‖ for all scalars and vectors.

Lemma 4.3 The following useful inequalities hold.

1. |a · b| ≤ ‖a‖ · ‖b‖ (Cauchy-Schwarz inequality)


2. ‖a + b‖ ≤ ‖a‖ + ‖b‖ (Minkowski inequality)

Proof of the Cauchy-Schwarz inequality: If a = 0, both sides are zero and the inequality holds trivially, so assume a ≠ 0. Define f(t) as

f(t) = (ta + b) · (ta + b),   where t ∈ R.

Because of the definition of dot products, we have f(t) ≥ 0 for any t ∈ R. Expanding,

f(t) = t²‖a‖² + 2t(a · b) + ‖b‖².

This is a quadratic in t with positive leading coefficient ‖a‖² that never becomes negative, so it has at most one real root. Its discriminant must therefore be nonpositive:

4(a · b)² − 4‖a‖²‖b‖² ≤ 0,

that is, (a · b)² ≤ ‖a‖²‖b‖², and hence

|a · b| ≤ ‖a‖‖b‖. □

Exercise 4.11 Prove property 2 in Lemma 4.3. (Hint: It suffices to show that ‖a+b‖2 ≤(‖a‖ + ‖b‖)2)

The Cauchy-Schwarz inequality implies that, for any nonzero a, b ∈ Rn,

−1 ≤ (a · b)/(‖a‖‖b‖) ≤ 1.

Thus, the angle θ between nonzero vectors a and b ∈ Rn is defined by

cos θ = (a · b)/(‖a‖ · ‖b‖),   θ ∈ [0, π]

This definition reveals that cos θ = 0 if and only if a · b = 0, in which case θ = π/2. In symbols,

a ⊥ b ⇐⇒ a · b = 0

The hyperplane in Rn that passes through the point a = (a1, . . . , an) and is orthogonal to the nonzero vector p = (p1, . . . , pn) is the set of all points x = (x1, . . . , xn) such that

p · (x − a) = 0

4.4 Linear Independence

Definition 4.1 The n vectors a1, a2, . . . , an in Rm are linearly dependent if there exist numbers c1, c2, . . . , cn, not all zero, such that

c1a1 + c2a2 + · · · + cnan = 0

If this equation holds only when c1 = c2 = · · · = cn = 0, then the vectors are linearly independent.


Exercise 4.12 Let a1 = (1, 2), a2 = (1, 1), and a3 = (5, 1) ∈ R2. Show that a1, a2, a3 are linearly dependent.

Let a1, a2, . . . , an ∈ Rn\{0}. Suppose that, for each i = 1, . . . , n, we have ai ≠ Σ_{j≠i} λjaj for every choice of λ1, . . . , λi−1, λi+1, . . . , λn ∈ R, i.e., no ai is a linear combination of the others. Then the entire space Rn is spanned by the set of all linear combinations of a1, . . . , an.

Lemma 4.4 A set of n vectors a1, a2, . . . , an in Rm is linearly dependent if and only if at least one of them can be written as a linear combination of the others. Or equivalently: a set of vectors a1, a2, . . . , an in Rm is linearly independent if and only if none of them can be written as a linear combination of the others.

Proof of Lemma 4.4: Suppose that a1, a2, . . . , an are linearly dependent. Then the equation c1a1 + · · · + cnan = 0 holds with at least one of the coefficients ci different from 0. We can, without loss of generality, assume that c1 ≠ 0. Solving the equation for a1 yields

a1 = −(c2/c1)a2 − · · · − (cn/c1)an.

Thus, a1 is a linear combination of the other vectors. Conversely, if some ai is a linear combination of the others, say ai = Σ_{j≠i} λjaj, then moving all terms to one side gives a vanishing linear combination in which the coefficient of ai is 1 ≠ 0, so the vectors are linearly dependent. □

4.4.1 Linear Dependence and Systems of Linear Equations

Consider the general system of m equations in n unknowns:

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2                ⇐⇒   x1a1 + · · · + xnan = b   (∗)
· · · · · · · · · · · · · · · · · · · · ·
am1x1 + am2x2 + · · · + amnxn = bm

Here a1, . . . , an are the column vectors of coefficients, and b is the column vector with components b1, . . . , bm.

Suppose that (∗) has two solutions (u1, . . . , un) and (v1, . . . , vn). Then,

u1a1 + · · · + unan = b and v1a1 + · · · + vnan = b

Subtracting the second equation from the first yields

(u1 − v1)a1 + · · · + (un − vn)an = 0.

Let c1 = u1 − v1, . . . , cn = un − vn. The two solutions are different if and only if c1, . . . , cn are not all equal to 0. We conclude that if system (∗) has more than one solution, then the column vectors a1, . . . , an are linearly dependent.² Equivalently, if the column vectors a1, . . . , an are linearly independent, then system (∗) has "at most" one solution.³

²Recall Lemma 4.4.
³Is there anything to say when there is no solution? The answer is yes. I can use the Farkas Lemma to check whether there is any solution to the system. See Appendix 1 in this chapter for the Farkas Lemma.


Theorem 4.1 The n column vectors a1, a2, . . . , an of the n × n matrix

A =
[ a11  a12  · · ·  a1n
  a21  a22  · · ·  a2n
   ⋮     ⋮    ⋱     ⋮
  an1  an2  · · ·  ann ],   where   aj =
[ a1j
  a2j
   ⋮
  anj ],   j = 1, . . . , n,

are linearly independent iff |A| ≠ 0.

Proof of Theorem 4.1: The vectors a1, . . . , an are linearly independent iff the vector equation x1a1 + · · · + xnan = 0 has only the trivial solution x1 = · · · = xn = 0. This vector equation is equivalent to a homogeneous system of equations, and therefore, by Lemma 4.2, it has only the trivial solution iff |A| ≠ 0.⁴ □

Definition 4.2 The rank of a matrix A, written r(A), is the maximum number oflinearly independent column vectors in A. If A is the 0 matrix, we put r(A) = 0.
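Linear dependence and rank are easy to check numerically. A small sketch with numpy, using the vectors of Exercise 4.12 (np.linalg.matrix_rank estimates the rank from the singular values):

import numpy as np

# Columns: a1 = (1, 2), a2 = (1, 1), a3 = (5, 1).
A = np.array([[1.0, 1.0, 5.0],
              [2.0, 1.0, 1.0]])
print(np.linalg.matrix_rank(A))   # 2: three vectors in R^2 must be dependent

# For a square matrix, full rank is equivalent to |A| != 0 (Theorem 4.1).
B = np.array([[1.0, 1.0],
              [2.0, 1.0]])
print(np.linalg.det(B))           # -1.0, nonzero, so a1 and a2 are independent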

4.5 Eigenvalues

4.5.1 Motivations

Consider the matrix A below.

A =
[ 2  0
  0  3 ]

Let x, y be vectors in R2. Suppose that y is derived from Ax as follows:

( y1 )   ( 2  0 ) ( x1 )   ( 2x1 )
( y2 ) = ( 0  3 ) ( x2 ) = ( 3x2 )     ⇐⇒   y = Ax

The linear transformation (matrix) A stretches x1 into 2x1 along the x1 axis and x2 into 3x2 along the x2 axis. Importantly, there is no interaction between x1 and x2 under the linear transformation A. This is, I believe, a straightforward extension of a linear transformation of R to the case of Rn. Define e1 = (1, 0) and e2 = (0, 1) as the unit vectors in R2. Then x = x1e1 + x2e2 and y = 2x1e1 + 3x2e2. In other words, (e1, e2) are the unit vectors in the original space and (2e1, 3e2) are the corresponding vectors in the space transformed through A. Next, consider the matrix B as follows.

B =
[ 1   1
 −2   4 ]

Now we don't have a clear picture of what is going on under the linear transformation B. However, consider the following different unit vectors f1 = (1, 1) and

4Check Section 4.2.3 for this argument. Recall that the system of linear equations is homogeneous ifit is expressed by Ax = 0.


f2 = (1, 2). Then,

B f1 = (1·1 + 1·1, −2·1 + 4·1) = (2, 2) = 2 f1,    and    B f2 = (1·1 + 1·2, −2·1 + 4·2) = (3, 6) = 3 f2.

This shows that once we take f1 and f2 as the new coordinate system, the linear transformation B acts the same way as A, but now along the f1 and f2 axes, respectively. Finally, consider the matrix C below.

C =
[ 2  −3
  4   2 ]

It turns out that there is no way of finding a new coordinate system in which the linear transformation C can be seen as either stretching or shrinking the vectors along each new axis. The reason we cannot find such a coordinate system is that we restrict our attention to Rn. Once we allow the unit vectors in the new system to have complex entries, we will again succeed in finding a coordinate system in which everything is easy to understand.⁵

Consider the different unit vectors

f1 = (1, (2√3/3) i)   and   f2 = (1, −(2√3/3) i).

Then,

C f1 = (2 − 2√3 i) f1,    and    C f2 = (2 + 2√3 i) f2.

I want to generalize the above argument. Suppose there happens to be a scalar λ with the special property that

Ax = λx.   (∗)

In this case, we would have A²x = A(Ax) = A(λx) = λAx = λλx = λ²x and, in general, Aⁿx = λⁿx.

⁵Those who are interested in the definition of complex numbers are referred to Appendix 2 in this chapter.


Definition 4.3 If A is an n × n matrix, then a scalar λ is an eigenvalue of A if there is a nonzero vector x ∈ Rn such that

Ax = λx.

Then x is an eigenvector of A (associated with λ).

4.5.2 How to Find Eigenvalues

The eigenvalue equation can be written as

(A − λI)x = 0

where I denotes the identity matrix of order n. Note that this linear system of equations has a solution x ≠ 0 if and only if the coefficient matrix has determinant equal to 0 – that is, iff |A − λI| = 0. Letting p(λ) = |A − λI|, where A = (aij)n×n, we have the equation

p(λ) = |A − λI| =
| a11 − λ   a12       · · ·  a1n     |
| a21       a22 − λ   · · ·  a2n     |
|  ⋮          ⋮         ⋱      ⋮     |
| an1       an2       · · ·  ann − λ |
= 0.

This is called the characteristic equation of A. From the definition of the determinant, it follows that p(λ) is a polynomial of degree n in λ. According to the fundamental theorem of algebra, it has exactly n roots (real or complex), provided that any multiple roots are counted appropriately.

Theorem 4.2 (The Fundamental Theorem of Algebra, Gauss (1799)) Consider a polynomial equation of degree n in z:

zⁿ + a_{n−1}z^{n−1} + · · · + a1z + a0 = 0,   (∗)

where a0, . . . , a_{n−1} ∈ C. Then (∗) has n solutions z∗1, . . . , z∗n with the property that z∗i ∈ C for each i = 1, . . . , n. This includes the case in which z∗i = z∗j for some i ≠ j.

Exercise 4.13 Find the eigenvalues and the associated eigenvectors of the matrices A and B.

A =
[ 1  2
  3  0 ]

B =
[ 0   1
 −1   0 ]

In fact, it is convenient to write the characteristic polynomial as a polynomial in −λ:

p(λ) = (−λ)ⁿ + b_{n−1}(−λ)^{n−1} + · · · + b1(−λ) + b0


The zeros of this characteristic polynomial are precisely the eigenvalues of A. Denoting the eigenvalues by λ1, λ2, . . . , λn ∈ C, we have

p(λ) = (−1)ⁿ(λ − λ1)(λ − λ2) · · · (λ − λn)

Theorem 4.3 If A is an n × n matrix with eigenvalues λ1, λ2, . . . , λn, then

1. |A| = λ1λ2 · · ·λn

2. tr(A) = a11 + a22 + · · · + ann = λ1 + λ2 + · · · + λn

Proof of Theorem 4.3: Putting λ = 0 in the polynomial form gives p(0) = b0, while the factored form gives p(0) = (−1)ⁿ(−1)ⁿλ1λ2 · · · λn. Since (−1)ⁿ(−1)ⁿ = ((−1)ⁿ)² = 1 and p(0) = |A|, we have b0 = |A| = λ1λ2 · · · λn. For the trace, the product of the elements on the main diagonal of |A − λI| is

(a11 − λ)(a22 − λ) · · · (ann − λ).

If we choose ajj from one of these parentheses and −λ from the remaining n − 1, then add over j = 1, . . . , n, we obtain the term

(a11 + a22 + · · · + ann)(−λ)^{n−1}.

Since no other terms with (−λ)^{n−1} can arise in the expansion of the determinant, we conclude that b_{n−1} = a11 + a22 + · · · + ann, the trace of A. On the other hand, expanding the factored form p(λ) = (−1)ⁿ(λ − λ1) · · · (λ − λn) shows that the coefficient of (−λ)^{n−1} is λ1 + λ2 + · · · + λn. Hence tr(A) = λ1 + λ2 + · · · + λn. □
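A numerical check of Theorem 4.3 (a sketch with numpy; the matrix is the A of Exercise 4.13):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 0.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(np.isclose(np.prod(eigenvalues), np.linalg.det(A)))   # True: |A| equals the product
print(np.isclose(np.sum(eigenvalues), np.trace(A)))         # True: tr(A) equals the sum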

4.6 Diagonalization

Let A and P be n × n matrices with P invertible. Then A and P−1AP have the same eigenvalues. This is true because the two matrices have the same characteristic polynomial:

|P−1AP − λI| = |P−1AP − P−1(λI)P| = |P−1(A − λI)P| = |P−1||A − λI||P| = |A − λI|,

where we use the facts that |P−1| = 1/|P| and |AB| = |A||B| (see Propositions 4.1 and 4.2).

An n × n matrix A is diagonalizable if there exists an invertible n × n matrix P and a diagonal matrix D such that P−1AP = D.

Theorem 4.4 (Diagonalization Theorem) An n × n matrix A is diagonalizable if and only if it has a set of n linearly independent eigenvectors x1, . . . , xn ∈ Cn. In this case,

P−1AP = diag(λ1, . . . , λn),

where P is the matrix with x1, . . . , xn ∈ Cn as its columns, and λ1, . . . , λn ∈ C are the corresponding eigenvalues.


Proof of the Diagonalization Theorem: (⇐=) Suppose that A has n linearly independent eigenvectors x1, . . . , xn, with corresponding eigenvalues λ1, . . . , λn. Let P denote the matrix whose columns are x1, . . . , xn and let D = diag(λ1, . . . , λn). Column by column, the equations Axj = λjxj are exactly the matrix equation AP = PD (multiplying P on the right by the diagonal matrix D scales the jth column of P by λj). Because the eigenvectors are linearly independent, P is invertible, so P−1AP = D. (=⇒) If A is diagonalizable, there exists an invertible n × n matrix P such that P−1AP = D for some diagonal matrix D, and hence AP = PD. Reading this equation column by column, Axj = djjxj, where xj is the jth column of P and djj is the jth diagonal element of D. Since P is invertible, its columns are nonzero and linearly independent, so they form a set of n linearly independent eigenvectors of A, and the diagonal elements of D are the corresponding eigenvalues. □
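A numerical illustration of the theorem (a sketch with numpy, using the matrix B = [[1, 1], [−2, 4]] from the motivating example above):

import numpy as np

B = np.array([[1.0, 1.0],
              [-2.0, 4.0]])
eigenvalues, P = np.linalg.eig(B)       # columns of P are eigenvectors
D = np.diag(eigenvalues)
print(np.allclose(np.linalg.inv(P) @ B @ P, D))   # True: P^{-1} B P = D
print(eigenvalues)                                # 2. and 3., possibly in either order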

A matrix P is said to be orthogonal if P^T = P−1, i.e., P^T P = I. If x1, . . . , xn are the n column vectors of P, then x1^T, . . . , xn^T are the row vectors of the transpose P^T. The condition P^T P = I then reduces to the n² equations xi^T xj = 1 if i = j and xi^T xj = 0 if i ≠ j.

Theorem 4.5 If the matrix A = (aij)n×n is symmetric, then:

1. All the n eigenvalues λ1, . . . , λn are real.

2. Eigenvectors that correspond to different eigenvalues are orthogonal.

3. There exists an orthogonal matrix P (i.e., P^T = P−1) such that

P−1AP =
[ λ1  0   · · ·  0
  0   λ2  · · ·  0
  ⋮    ⋮    ⋱    ⋮
  0   0   · · ·  λn ]

The columns v1, v2, . . . , vn of the matrix P are eigenvectors of unit length corresponding to the eigenvalues λ1, λ2, . . . , λn.

Proof of Theorem 4.5: (1) We show this for n = 2. The eigenvalues of the 2 × 2 matrix A are given by the quadratic equation

|A − λI| =
| a11 − λ   a12     |
| a21       a22 − λ |
= λ² − (a11 + a22)λ + (a11a22 − a12a21) = 0.   (∗)

The roots of the quadratic equation (∗) are

λ = [(a11 + a22) ± √((a11 + a22)² − 4(a11a22 − a12a21))] / 2.

These roots are real if and only if (a11 + a22)² ≥ 4(a11a22 − a12a21), which is equivalent to

(a11 − a22)² + 4a12a21 ≥ 0,

and, since a12 = a21 by the symmetry of A, this becomes

(a11 − a22)² + 4a12² ≥ 0,

which is indeed the case. (2) Suppose that Axi = λixi and Axj = λjxj with λi ≠ λj. Multiplying the first equality from the left by xj^T gives xj^T A xi = λi xj^T xi. Multiplying the second from the left by xi^T and then transposing both sides gives xj^T A^T xi = λj xj^T xi. Since A is symmetric, A = A^T, so

xj^T A xi = λi xj^T xi   and   xj^T A xi = λj xj^T xi.

Subtracting, (λi − λj) xj^T xi = 0. Since λi − λj ≠ 0 by our hypothesis, we must have xj^T xi = 0, and thus xi and xj are orthogonal. (3) Suppose all the (real, by part (1)) eigenvalues are different.⁶ Then, by part (2), the associated eigenvectors are mutually orthogonal; hence they are linearly independent. We can choose each eigenvector to have length 1 by replacing xj with xj/‖xj‖. Defining P as the matrix whose columns are these normalized eigenvectors x1, . . . , xn, it follows that P^T P = I, i.e., P−1 = P^T, and by the Diagonalization Theorem (Theorem 4.4) P−1AP = diag(λ1, . . . , λn). □
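Numerically, the orthogonal diagonalization of a symmetric matrix is what numpy.linalg.eigh computes. A sketch using the matrix of Exercise 4.14:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, P = np.linalg.eigh(A)      # eigh is specialized to symmetric matrices
print(eigenvalues)                                          # [1. 3.]: real, as part 1 asserts
print(np.allclose(P.T @ P, np.eye(2)))                      # True: P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(eigenvalues)))       # True: P^T A P = diag(1, 3)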

Exercise 4.14 Let a 2 × 2 symmetric matrix A be given by

A =
[ 2  1
  1  2 ]

Compute the matrix P described in Theorem 4.5.

4.7 Quadratic Forms

A quadratic form in n variables is a function Q of the form

Q(x1, . . . , xn) = Σ_{i=1}^{n} Σ_{j=1}^{n} aij xi xj = a11x1² + a12x1x2 + · · · + aijxixj + · · · + annxn²,

where the aij are constants. Suppose we put x = (x1, . . . , xn)T and A = (aij). Then it follows from the definition of matrix multiplication that

Q(x1, . . . , xn) = Q(x) = x^T Ax.

Of course, xixj = xjxi, so we can write aijxixj + ajixjxi = (aij + aji)xixj. If we replace aij and aji by (aij + aji)/2, then the new numbers aij and aji become equal without changing Q(x). Thus, we can assume that aij = aji for all i and j, which means that the matrix A is symmetric. Then A is called the symmetric matrix associated with Q, and Q is called a symmetric quadratic form.

6We omit the case where some of the eigenvalues are equal.


Definition 4.4 A quadratic form Q(x) = x^T Ax, as well as its associated symmetric matrix A, is said to be positive definite, positive semidefinite, negative definite, or negative semidefinite according as

Q(x) > 0,   Q(x) ≥ 0,   Q(x) < 0,   Q(x) ≤ 0,

for all x ∈ Rn\{0}. The quadratic form Q(x) is indefinite if there exist vectors x∗ and y∗ such that Q(x∗) < 0 and Q(y∗) > 0.

Let A = (aij) be any n × n matrix. An arbitrary principal minor of order r is the determinant of the matrix obtained by deleting all but r rows and r columns in A with the same numbers. In particular, a principal minor of order r always includes exactly r elements of the main (principal) diagonal. We call the determinant |A| itself a principal minor (no rows and columns are deleted). A principal minor is said to be a leading principal minor of order r (1 ≤ r ≤ n) if it consists of the first ("leading") r rows and columns of |A|.

Suppose A is an arbitrary n × n matrix. The leading principal minors of A are

Dk =
| a11  a12  · · ·  a1k |
| a21  a22  · · ·  a2k |
|  ⋮     ⋮    ⋱     ⋮  |
| ak1  ak2  · · ·  akk |,   k = 1, . . . , n

Exercise 4.15 Consider a 3 × 3 matrix A:

A =
[ a11  a12  a13
  a21  a22  a23
  a31  a32  a33 ].

Compute all the principal minors of A.

Theorem 4.6 Consider the quadratic form

Q(x) = Σ_{i=1}^{n} Σ_{j=1}^{n} aij xi xj   (aij = aji)

with the associated symmetric matrix A = (aij)n×n. Let Dk be the leading principal minor of A of order k and let Δk denote an arbitrary principal minor of order k. Then we have

1. Q is positive definite ⇐⇒ Dk > 0 for k = 1, . . . , n.

2. Q is positive semidefinite ⇐⇒ Δk ≥ 0 for all principal minors of order k = 1, . . . , n.

3. Q is negative definite ⇐⇒ (−1)^k Dk > 0 for k = 1, . . . , n.

4. Q is negative semidefinite ⇐⇒ (−1)^k Δk ≥ 0 for all principal minors of order k = 1, . . . , n.

Proof of Theorem 4.6: We only prove the definiteness claims for n = 2. Then the quadratic form is

Q(x1, x2) = a11x1² + 2a12x1x2 + a22x2².

Assuming a11 ≠ 0 and completing the square, we obtain

Q(x1, x2) = a11 (x1 + (a12/a11)x2)² + ((a11a22 − a12²)/a11) x2².

Thus we obtain

• Q is positive definite ⇐⇒ a11 > 0 and a11a22 − a12² > 0, i.e., D1 > 0 and D2 > 0.

• Q is negative definite ⇐⇒ a11 < 0 and a11a22 − a12² > 0, i.e., −D1 > 0 and D2 > 0. □

Theorem 4.7 Let Q = x^T Ax be a quadratic form, where the matrix A is symmetric, and let λ1, . . . , λn be the (real) eigenvalues of A. Then,

1. Q is positive definite ⇐⇒ λ1 > 0, . . . , λn > 0

2. Q is positive semidefinite ⇐⇒ λ1 ≥ 0, . . . , λn ≥ 0

3. Q is negative definite ⇐⇒ λ1 < 0, . . . , λn < 0

4. Q is negative semidefinite ⇐⇒ λ1 ≤ 0, . . . , λn ≤ 0

5. Q is indefinite ⇐⇒ A has eigenvalues with opposite signs.

Proof of Theorem 4.7: According to Theorem 4.5, there exists an orthogonal matrix P such that P^T AP = diag(λ1, . . . , λn). Let y = (y1, . . . , yn)^T be the n × 1 vector defined by y = P^T x. Then x = Py, so that

x^T Ax = (Py)^T A(Py) = y^T P^T APy = y^T diag(λ1, . . . , λn) y = λ1y1² + λ2y2² + · · · + λnyn².

Also, x = 0 iff y = 0. This completes the proof. □
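Theorem 4.7 is the basis of the usual numerical test for definiteness: compute the eigenvalues of the symmetric matrix and inspect their signs. A sketch with numpy (the example matrices are arbitrary, and the tolerance is a pragmatic choice):

import numpy as np

def definiteness(A, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues (Theorem 4.7)."""
    lam = np.linalg.eigvalsh(A)          # real eigenvalues of a symmetric matrix
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    if np.all(lam < -tol):
        return "negative definite"
    if np.all(lam <= tol):
        return "negative semidefinite"
    return "indefinite"

print(definiteness(np.array([[2.0, 1.0], [1.0, 2.0]])))   # positive definite
print(definiteness(np.array([[1.0, 2.0], [2.0, 1.0]])))   # indefinite (eigenvalues 3 and -1)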

4.8 Appendix 1: Farkas Lemma

In this Appendix, I follow “Advanced Mathematical Economics,” by Rakesh Vohra.


4.8.1 Preliminaries

Definition 4.5 A vector y can be expressed as a linear combination of the vectors in S = {x1, x2, . . .} if there are real numbers {λj}j∈S such that

y = Σ_{j∈S} λjxj.

The set of all vectors that can be expressed as a linear combination of vectors in S is called the span of S and denoted span(S).

Definition 4.6 The rank of a (not necessarily finite) set S of vectors is the size of thelargest subset of linearly independent vectors in S.

Definition 4.7 Let S be a set of vectors and B ⊂ S be finite and linearly independent. The set B of vectors is said to be a maximal linearly independent set if the set B ∪ {x} is linearly dependent for all vectors x ∈ S\B. A maximal linearly independent subset of S is called a basis of S.

Theorem 4.8 Every S ⊂ Rn has a basis. If B is a basis of S, then span(S) = span(B).

Theorem 4.9 Let S ⊂ Rn. If B and B′ are two bases of S, then |B| = |B′|.

From this theorem, one can see that if S has a basis B, then the rank of S equals |B|.

Definition 4.8 Let S be a set of vectors. The dimension of span(S) is the rank of S.

Definition 4.9 The kernel or null space of A is the set {x ∈ Rn | Ax = 0}.

The following theorem summarizes the relationship between the span of A and its kernel.

Theorem 4.10 If A is an m×n matrix, then the dimension of span(A) plus the dimen-sion of the kernel of A is n.

This is sometimes written as

dim[span(A)] + dim[ker(A)] = rank(A) + dim[ker(A)] = n.

The column rank of a matrix is the dimension of the span of its columns. Similarly,the row rank is the dimension of the span of its row.

Theorem 4.11 Let A be an m × n matrix. Then the column ranks of A and AT (the transpose of A) are the same.

Thus, the column and row rank of A are equal. This allows us to define the rank ofa matrix A to be the dimension of span(A).


4.8.2 Fundamental Theorem of Linear Algebra

Let A be an m × n matrix of real numbers. We will be interested in problems of thefollowing kind:

Given b ∈ Rm, find an x ∈ Rn such that Ax = b or prove that no such x exists.

Convincing another person that Ax = b has a solution (when it does) is easy: one merely exhibits a solution, and they can verify that it does indeed satisfy the equations. What if the system Ax = b does not admit a solution? By framing the problem in the right way, we can bring to bear the machinery of linear algebra. Specifically, given b ∈ Rm, the problem of finding an x ∈ Rn such that Ax = b can be stated as: is b ∈ span(A)?

Theorem 4.12 (Gauss) Let A be an m × n matrix, b ∈ Rm, and F = {x ∈ Rn | Ax = b}. Then either F ≠ ∅ or there exists y ∈ Rm such that yA = 0 and yb ≠ 0, but not both.

Suppose F = ∅. Then b is not in the span of the columns of A. If I think of the span of the columns of A as a plane, then b is a vector pointing out of the plane. Thus, we can find a vector y orthogonal to this plane (and so to every column of A) that has a non-zero dot product with b. Now for an algebraic interpretation. Take any linear combination of the equations in the system Ax = b. This linear combination can be obtained by pre-multiplying each side of the equation by a suitable vector y, i.e., yAx = yb. Suppose there is a solution x∗ to the system, i.e., Ax∗ = b. Any linear combination of these equations results in an equation that x∗ satisfies as well: x∗ must also be a solution to yAx = yb. So if I can find a vector y such that yA = 0 but yb ≠ 0, then clearly the original system Ax = b cannot have a solution.

Proof: First, we prove the "not both" part. Suppose that F ≠ ∅ and that such a y exists. Choose any x ∈ F. Then, for that y ∈ Rm, we have

yb = yAx = (yA)x = 0,

which contradicts yb ≠ 0.

If F ≠ ∅, we are done. Suppose, then, that F = ∅. Hence b cannot be in the span of the columns of A. Thus the rank r′ of C = [A, b] is one larger than the rank r of A; that is, r′ = r + 1. Since C is an m × (n + 1) matrix,

rank(C^T) + dim[ker(C^T)] = m = rank(A^T) + dim[ker(A^T)].

Using the fact that the rank of a matrix and its transpose coincide, we have

r′ + dim[ker(C^T)] = m = r + dim[ker(A^T)],

i.e., dim[ker(C^T)] = dim[ker(A^T)] − 1. Since the dimension of ker(C^T) is one smaller than the dimension of ker(A^T), we can find a y ∈ ker(A^T) that is not in ker(C^T). Hence yA = 0 but yb ≠ 0. □


4.8.3 Linear Inequalities

Now consider the following problem:

Given a b ∈ Rm, find an x ∈ Rn such that Ax ≤ b or show that no such x exists.

The problem differs from the earlier one in that “=” has been replaced by “≤.”

4.8.4 Non-Negative Solutions

I focus on finding a non-negative x ∈ Rn such that Ax = b, or showing that no such x exists. Observe that if b = 0 the problem is trivial, so I assume that b ≠ 0.

Definition 4.10 A set C of vectors is called a cone if λx ∈ C whenever x ∈ C andλ > 0.

Definition 4.11 The set of all non-negative linear combinations of the columns of A iscalled the finite cone generated by the columns of A. It is denoted cone(A).

Note the difference between span(A) and cone(A) below:

span(A) = {y ∈ Rm | y = Ax for some x ∈ Rn}

and

cone(A) = {y ∈ Rm | y = Ax for some x ∈ Rn₊}.

Theorem 4.13 (Farkas Lemma (1902)) Let A be an m × n matrix, b ∈ Rm and F = {x ∈ Rn | Ax = b, x ≥ 0}. Then either F ≠ ∅ or there exists y ∈ Rm such that yA ≥ 0 and y · b < 0, but not both.

Take any linear combination of the equations in Ax = b to get yAx = yb. A non-negative solution to the first system is a solution to the second. If we can choose y so that yA ≥ 0 and y · b < 0, then the left-hand side of the single equation yAx = yb is at least zero (since x ≥ 0 and yA ≥ 0) while the right-hand side is negative, a contradiction. Thus the first system cannot have a non-negative solution.

Proof: First we prove that both statements cannot hold simultaneously. Suppose not. Let x∗ ≥ 0 be a solution to Ax = b and y∗ a solution to yA ≥ 0 such that y∗b < 0. Notice that x∗ must be a solution to y∗Ax = y∗b. Thus y∗Ax∗ = y∗b. Then 0 ≤ y∗Ax∗ = y∗b < 0, which is a contradiction.

If b ∉ span(A) (i.e., there is no x at all such that Ax = b), then by the previous theorem (Theorem 4.12) there is a y ∈ Rm such that yA = 0 and yb ≠ 0. If it so happens that the given y has the property that yb < 0, we are done. If yb > 0, then negate y and again we are done. So we may suppose that b ∈ span(A) but b ∉ cone(A), i.e., F = ∅.


Let r be the rank of A. Note that n ≥ r. Since A contains r linearly independent column vectors and b ∈ span(A), we can express b as a linear combination of an r-subset D of linearly independent columns of A. Let D = {a^{i1}, . . . , a^{ir}} and b = Σ_{t=1}^{r} λ_{it} a^{it}. Note that D is linearly independent. Since b ∉ cone(A), at least one of {λ_{it}}_{t≥1} is negative.

Now apply the following four-step procedure repeatedly. Subsequently, we show that the procedure must terminate.

1. Choose the smallest index h amongst {i1, . . . , ir} with λh < 0.

2. Choose y so that y · a = 0 for all a ∈ D\{a^h} and y · a^h ≠ 0. This can be done by the previous theorem (Theorem 4.12) because a^h ∉ span(D\{a^h}). Normalize y so that y · a^h = 1. Observe that y · b = λh < 0.

3. If y · a^j ≥ 0 for all columns a^j of A, stop: the proof is complete.

4. Otherwise, choose the smallest index w amongst {1, . . . , n} such that y · a^w < 0. Note that a^w ∉ D\{a^h}. Replace D by (D\{a^h}) ∪ {a^w}, i.e., exchange a^h for a^w.

To complete the proof, we must show that the procedure terminates. Let Dk denote the set D at the start of the kth iteration of the four-step procedure described above. If the procedure does not terminate, there is a pair (k, ℓ) with k < ℓ such that Dk = Dℓ, i.e., the procedure cycles.

Let s be the largest index for which a^s has been removed from D at the end of one of the iterations k, k + 1, . . . , ℓ − 1, say at iteration p. Since Dℓ = Dk, there is a q such that a^s is inserted into Dq at the end of iteration q, where k ≤ q < ℓ. No assumption is made about whether p < q or p > q. Notice that

Dp ∩ {a^{s+1}, . . . , a^n} = Dq ∩ {a^{s+1}, . . . , a^n}.

Let Dp = {a^{i1}, . . . , a^{ir}}, b = λ_{i1}a^{i1} + · · · + λ_{ir}a^{ir}, and let y′ be the vector found in Step 2 of iteration q. Then:

0 > y′ · b = y′ · (λ_{i1}a^{i1} + · · · + λ_{ir}a^{ir}) = λ_{i1}(y′ · a^{i1}) + · · · + λ_{ir}(y′ · a^{ir}) > 0,

which is a contradiction. The first inequality is the observation made in Step 2 of iteration q that y′ · b < 0. To see why the last inequality must be true:

• When ij < s, we have from Step 1 of iteration p that λ_{ij} ≥ 0, and from Step 4 of iteration q that y′ · a^{ij} ≥ 0.

• When ij = s, we have from Step 1 of iteration p that λ_{ij} < 0, and from Step 4 of iteration q that y′ · a^{ij} < 0.

• When ij > s, we have from Dp ∩ {a^{s+1}, . . . , a^n} = Dq ∩ {a^{s+1}, . . . , a^n} and Step 2 of iteration q that y′ · a^{ij} = 0.

This completes the proof. □


4.8.5 The General Case

The problem of deciding whether the system {x ∈ Rn| Ax ≤ b} has a solution can bereduced to the problem of deciding if Bz = b, z ≥ 0 has a solution for a suitable matrixB.

First observe that any inequality of the form Σ_j aij xj ≥ bi can be turned into an equation by the subtraction of a surplus variable s. That is, define a new variable si ≥ 0 such that

Σ_j aij xj − si = bi.

Similarly, an inequality of the form Σ_j aij xj ≤ bi can be converted into an equation by the addition of a slack variable si ≥ 0 as follows:

Σ_j aij xj + si = bi.

A variable xj that is unrestricted in sign can be replaced by two non-negative variables zj and z′j by setting xj = zj − z′j. In this way any inequality system can be converted into an equality system with non-negative variables. We will refer to this as converting into standard form.

As an example, we derive the Farkas alternative for the system {x | Ax ≤ b, x ≥ 0}. Deciding solvability of Ax ≤ b for x ≥ 0 is equivalent to deciding solvability of Ax + Is = b where x, s ≥ 0. Set B = [A | I] and z = (x, s)^T, and we can write the system as Bz = b, z ≥ 0. Now apply the Farkas lemma to this system: the alternative is yB ≥ 0, yb < 0. Since 0 ≤ yB = y[A | I] implies yA ≥ 0 and y ≥ 0, the Farkas alternative is {y | yA ≥ 0, y ≥ 0, yb < 0}. The principle here is that by a judicious use of auxiliary variables, one can convert almost anything into standard form.
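Which side of the Farkas alternative holds can be checked numerically with a linear program. Below is a minimal sketch (it assumes scipy is available; feasibility of Ax = b, x ≥ 0 is tested by minimizing a zero objective, and the data are arbitrary):

import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
b = np.array([1.0, 3.0])    # the two equations are inconsistent, so F will be empty

res = linprog(c=np.zeros(A.shape[1]), A_eq=A, b_eq=b,
              bounds=[(0, None)] * A.shape[1])
if res.success:
    print("F is nonempty; one solution is", res.x)
else:
    print("F is empty, so some y with yA >= 0 and y.b < 0 exists (Farkas Lemma)")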

4.9 Appendix 2: Linear Spaces

4.9.1 Number Fields

Linear algebra makes use of number systems (number fields). By a number field I meanany set K of objects, called “numbers,” which, when subjected to four arithmetic op-erations again give elements of K. More exactly, these operations have the followingproperties F1, F2, and F3 (field axioms):

F1: To every pair of numbers α and β in K, there corresponds a (unique) numberα + β in K, called the sum of α and β, where

1. α + β = β + α ∀α, β ∈ K (addition is commutative);


2. (α + β) + γ = α + (β + γ) ∀α, β, γ ∈ K (addition is associative);

3. There exists a number 0 (zero) in K such that 0 + α = α ∀α ∈ K;

4. For every α ∈ K, there exists a number (negative element) γ ∈ K such thatα + γ = 0.

The solvability of the equation α + γ = 0 ∀α allows us to carry out the operation ofsubtraction, by defining the difference β−α as the sum of the number β and the solutionγ of the equation α + γ = 0.

F2: To every pair of numbers α, β ∈ K, there corresponds a (unique) number α · β (orαβ) in K, called the product of α and β, where

1. αβ = βα ∀α, β ∈ K (multiplication is commutative);

2. (αβ)γ = α(βγ) ∀α, β, γ ∈ K (multiplication is associative);

3. There exists a number 1 (≠ 0) in K such that 1 · α = α ∀α ∈ K;

4. For every α ≠ 0 in K, there exists a number (reciprocal element) γ ∈ K such that αγ = 1.

F3: Multiplication is distributive over addition, i.e., for every α, β, γ ∈ K,

α(β + γ) = αβ + αγ.

The solvability of the equation αγ = 1 for every α ≠ 0 allows us to carry out the operation of division, by defining the quotient β/α as the product of the number β and the solution γ of the equation αγ = 1.

The numbers 1, 1 + 1 = 2, 2 + 1 = 3, etc. are said to be natural; it is assumed that none of these numbers is zero. The set of natural numbers is denoted as N. By the integers in a field K, we mean the set of all natural numbers together with their negatives and the number zero. The set of integers is denoted as Z. By the rational numbers in a field K, we mean the set of all quotients p/q, where p and q are integers and q ≠ 0. The set of rational numbers is denoted as Q.

Two fields K and K′ are said to be isomorphic if we can set up a one-to-one correspondence between K and K′ such that the number associated with every sum (or product) of numbers in K is the sum (or product) of the corresponding numbers in K′. The number associated with every difference (or quotient) of numbers in K will then be the difference (or quotient) of the corresponding numbers in K′.

The most commonly encountered concrete examples of number fields are the follow-ing:


1. The field of rational numbers, i.e., of quotients p/q, where p and q ≠ 0 are ordinary integers, subject to the ordinary operations of arithmetic. It should be noted that the integers by themselves do not form a field, since they do not satisfy axiom F2-4. It follows that every field K has a subset isomorphic to the field of rational numbers.

2. The field of real numbers, having the set of all points of the real line as its geometriccounterpart. The set of real numbers is denoted as R. An axiomatic treatment ofthe field of real numbers is achieved by supplementing axioms F1, F2, F3 with theaxioms of order and the least upper bound principle.

3. The field of complex numbers of the form a + ib, where a and b are real numbers (i is not a real number), equipped with the following operations of addition and multiplication:

(a1 + ib1) + (a2 + ib2) = (a1 + a2) + i(b1 + b2),
(a1 + ib1)(a2 + ib2) = (a1a2 − b1b2) + i(a1b2 + a2b1).

The set of complex numbers is denoted as C. For numbers of the form a + i0, these operations reduce to the corresponding operations for real numbers; briefly, I write a + i0 = a and call complex numbers of this form real. Thus, it can be said that the field of complex numbers has a subset isomorphic to the field of real numbers. Complex numbers of the form 0 + ib are said to be (purely) imaginary and are designated briefly by ib. It follows from the multiplication rule that

i² = i · i = (0 + i1)(0 + i1) = −1.

4.9.2 Definitions

The concept of a linear space generalizes that of the set of all vectors. The generalizationconsists first in getting away from the concrete nature of the objects involved (directedline segments) without changing the properties of the operations on the objects, andsecondly in getting away from the concrete nature of the admissible numerical factors(real numbers). This leads the following definition.

Definition 4.12 A set V is called a linear (or affine) space over a field K if

1. Given any two elements x, y ∈ V , there is a rule (the addition rule) leading to a(unique) element x + y ∈ V , called the sum of x and y;

2. Given any element x ∈ V and any number λ ∈ K, there is a rule (multiplication by a number) leading to a (unique) element λx ∈ V, called the product of the element x and the number λ;

3. These two rules obey the axioms listed below, VS1 and VS2.

VS 1: The addition rule has the following properties:


1. x + y = y + x for every x, y ∈ V ;

2. (x + y) + z = x + (y + z) for every x, y, z ∈ V ;

3. There exists an element 0 ∈ V (the zero vector) such that x + 0 = x for everyx ∈ V ;

4. For every x ∈ V , there exists an element y ∈ V (the negative element) such thatx + y = 0.

VS 2: The rule for multiplication by a number has the following properties:

1. 1 · x = x for every x ∈ V ;

2. α(βx) = (αβ)x for every x ∈ V and α, β ∈ K;

3. (α + β)x = αx + βx for every x ∈ V and α, β ∈ K;

4. α(x + y) = αx + αy for every x ∈ V and every α ∈ K.

4.9.3 Bases, Components, Dimension

Definition 4.13 A system of linearly independent vectors e1, e2, . . . , en in a linear space V over a field K is called a basis for V if, given any x ∈ V, there exists an expansion

x = ξ1e1 + ξ2e2 + · · · + ξnen   (∗)

where ξj ∈ K for every j = 1, . . . , n.

It is easy to see that under these conditions, the coefficients in the expansion (∗) areuniquely determined. In fact, we can write two expansions

x = ξ1e1 + ξ2e2 + · · · + ξnen,

x = η1e1 + η2e2 + · · · + ηnen

for a vector x, then, subtracting them term by term, we obtain the relation

0 = (ξ1 − η1)e1 + (ξ2 − η2)e2 + · · · + (ξn − ηn)en,

from which, by the assumption that the vectors e1, e2, . . . , en are linearly independent,we find that

ξ1 = η1, ξ2 = η2, . . . , ξn = ηn.

The uniquely defined numbers ξ1, . . . , ξn are called the components of the vector x withrespect to the basis e1, . . . , en

The fundamental significance of the concept of a basis for a linear space consistsin the fact that when a basis is specified, the originally abstract linear operations inthe space become ordinary linear operations with numbers, i.e., the components of thevectors with respect to the given basis. In fact, we have the following.


Theorem 4.14 When two vectors of a linear space V are added, their components (withrespect to any basis) are added. When a vector is multiplied by a number λ, all itscomponents are multiplied by λ.

If, in a linear space V , we can find n linearly independent vectors while every n + 1vectors of the space are linearly dependent, then the number n is called the dimensionof the space V and the space V itself is called n-dimensional. A linear space in whichwe can find an arbitrarily large number of linearly independent vectors is called infinite-dimensional.

Theorem 4.15 In a space V of dimension n, there exists a basis consisting of n vectors.Moreover, any set of n linearly independent vectors of the space V is a basis for the space.

Theorem 4.16 If there is a basis in the space V , then the dimension of V equals thenumber of basis vectors.

4.9.4 Subspaces

Definition 4.14 (Subspaces) Suppose that a set W of elements of a linear space Vhas the following properties:

1. If x, y ∈ W , then x + y ∈ W ;

2. If x ∈ W and λ is an element of the field K, then λx ∈ W .

Then, every set W ⊂ V with properties 1 and 2 above is called linear subspace (orsimply a subspace) of the space V .

Definition 4.15 (The Direct Sum) A linear space W is the direct sum of givensubspaces W1, . . . ,Wm ⊂ W if the following two conditions are satisfied:

1. For every x ∈ W , there exists an expansion

x = x1 + · · · + xm,

where x1 ∈ W1, . . . , xm ∈ Wm;

2. This expansion is unique, i.e., if

x = x1 + · · · + xm = y1 + · · · + ym

where xj, yj ∈ Wj(j = 1, . . . ,m), then

x1 = y1, . . . , xm = ym.

Theorem 4.17 Let W1 be a fixed subspace of an n-dimensional space Vn. Then, therealways exists a subspace W2 ⊂ Vn such that the whole space Vn is the direct sum of W1

and W2.


4.9.5 Morphisms of Linear Spaces

Definition 4.16 Let ϕ be a rule which assigns to every vector x of a linear space V a vector x′ = ϕ(x) in a linear space V′. Then ϕ is called a morphism (or linear operator) if the following two conditions hold:

following two conditions hold:

1. ϕ(x + y) = ϕ(x) + ϕ(y) for every x, y ∈ V ;

2. ϕ(αx) = αϕ(x) for every x ∈ V and every α ∈ K.

A morphism ϕ mapping V onto all of V′ in a one-to-one fashion is called an isomorphism, and the spaces V and V′ themselves are said to be isomorphic (more exactly, K-isomorphic).

Theorem 4.18 Any two n-dimensional spaces V and V′ (over the same field K) are K-isomorphic.

Corollary 4.1 Every n-dimensional linear space over a field K is K-isomorphic to thespace Kn. In particular, every n-dimensional complex space is C-isomorphic to the spaceCn, and every n-dimensional real space is R-isomorphic to the space Rn.


Chapter 5

Calculus

5.1 Functions of a Single Variable

Roughly speaking, a function y = f(x) is differentiable if it is both continuous and "smooth," with no breaks or kinks. The derivative of f is a function giving, at each value of x, the slope of f at that point. We sometimes write

dy/dx = f′(x)

to indicate that f′(x) gives us the (instantaneous) amount, dy, by which y changes per unit change, dx, in x. If the first derivative is itself a differentiable function, we can take its derivative, which gives the second derivative of the original function:

d²y/dx² = f″(x).

If a function possesses continuous derivatives f′, f″, . . . , f⁽ⁿ⁾, it is called n-times continuously differentiable, or a Cⁿ function. Some rules of differentiation are provided below:

• For constants α: d/dx(α) = 0.

• For sums: d/dx[f(x) ± g(x)] = f′(x) ± g′(x).

• Power rule: d/dx(αxⁿ) = nαx^{n−1}.

• Product rule: d/dx[f(x)g(x)] = f(x)g′(x) + f′(x)g(x).

• Quotient rule: d/dx[f(x)/g(x)] = (g(x)f′(x) − f(x)g′(x))/[g(x)]².

• Chain rule: d/dx[f(g(x))] = f′(g(x))g′(x).

Later in these notes, when we turn to multivariate calculus, we will discuss some of the above properties in more detail from a more general perspective. Until then, just remember them so that you can use them at any time.
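These rules are easy to sanity-check numerically with a finite-difference approximation of the derivative. A minimal sketch in Python (the test functions, the point x0, and the step size h are arbitrary choices):

import math

def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f, fprime = math.sin, math.cos
g = lambda x: x ** 3
x0 = 1.2

# Product rule: (f g)' = f g' + f' g
lhs = numerical_derivative(lambda x: f(x) * g(x), x0)
rhs = f(x0) * 3 * x0 ** 2 + fprime(x0) * g(x0)
print(abs(lhs - rhs) < 1e-6)   # True

# Chain rule: (f o g)' = f'(g(x)) g'(x)
lhs = numerical_derivative(lambda x: f(g(x)), x0)
rhs = fprime(g(x0)) * 3 * x0 ** 2
print(abs(lhs - rhs) < 1e-6)   # True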


5.2 Real-Valued Functions of Several Variables

A function f : D → R is said to be a real-valued function if D is any set and the range R is a subset of the real line R. For vectors, define the following: x ≥ y if xi ≥ yi for every i = 1, . . . , n; and x ≫ y if xi > yi for every i = 1, . . . , n.

Definition 5.1 Let f : D → R, where D is a subset of Rn. Then f is nondecreasing if f(x) ≥ f(y) whenever x ≥ y. If, in addition, the inequality is strict whenever x ≫ y, then we say that f is increasing. If, instead, f(x) > f(y) whenever x ≥ y and x ≠ y, then we say that f is strongly increasing.

Rather than having a single slope, a function of n variables can be thought to haven partial slopes, each giving only the rate at which y would change if one xi, alone, wereto change. Each of these partial slopes is called partial derivative.

Definition 5.2 Let y = f(x1, . . . , xn). The partial derivative of f with respect to xi is defined as

∂f(x)/∂xi ≡ lim_{h→0} [f(x1, . . . , xi + h, . . . , xn) − f(x1, . . . , xi, . . . , xn)] / h

∂y/∂xi or fi(x) are also used to denote partial derivatives.

5.3 Gradients

If z = F (x, y) and C is any number, we call the graph of the equation F (x, y) = C alevel curve for F . The slope of the level curve F (x, y) = C at a point (x, y) is given bythe formula

F(x, y) = C   =⇒   y′ = dy/dx = −(∂F(x, y)/∂x)/(∂F(x, y)/∂y) = −F1(x, y)/F2(x, y)

If (x0, y0) is a particular point on the level curve F(x, y) = C, the slope at (x0, y0) is −F1(x0, y0)/F2(x0, y0). The equation for the tangent hyperplane T is

y − y0 = −[F1(x0, y0)/F2(x0, y0)](x − x0)

or, rearranging,

F1(x0, y0)(x − x0) + F2(x0, y0)(y − y0) = 0.

Recalling the inner product, the equation can be written as

(F1(x0, y0), F2(x0, y0)) · (x − x0, y − y0) = 0.

The vector (F1(x0, y0), F2(x0, y0)) is said to be the gradient of F at (x0, y0) and is often denoted by ∇F(x0, y0) (∇ is pronounced "nabla"). The vector (x − x0, y − y0) is a vector


on the tangent hyperplane T which implies that ∇F (x0, y0) is orthogonal to the tangenthyperplane T at (x0, y0).

Suppose more generally that F(x) = F(x1, . . . , xn) is a function of n variables defined on an open set A in Rn, and let x0 = (x0₁, . . . , x0ₙ) be a point in A. The gradient of F at x0 is the vector

∇F(x0) = (∂F(x0)/∂x1, · · · , ∂F(x0)/∂xn)

of first-order partial derivatives.

5.4 The Directional Derivative

Let z = f(x) be a function of n variables. The partial derivative ∂f/∂xi measures the rate of change of f(x) in the direction parallel to the i-th coordinate axis; each partial derivative says nothing about the behavior of f in other directions. We introduce the concept of the directional derivative in order to measure the rate of change of f in an arbitrary direction.

Consider the vector x = (x1, . . . , xn) and let a = (a1, . . . , an) ∈ Rn\{0} be a given vector. If we move a distance h‖a‖ > 0 from x in the direction given by a, we arrive at x + ha. The average rate of change of f from x to x + ha is then (f(x + ha) − f(x))/h. We define the derivative of f along the vector a by

f′_a(x) = lim_{h→0} [f(x + ha) − f(x)] / h

or, with components,

f′_a(x1, . . . , xn) = lim_{h→0} [f(x1 + ha1, . . . , xn + han) − f(x1, . . . , xn)] / h

We assume that x + ha lies in the domain of f for all sufficiently small h. This is one reason why the domain is generally assumed to be open. In particular, with ai = 1 and aj = 0 for all j ≠ i, this derivative is the partial derivative of f with respect to xi.

Suppose f is C¹ in a set A,¹ and let x be an interior point of A. For an arbitrary vector a, define the function g by

g(h) = f(x + ha) = f(x1 + ha1, . . . , xn + han).

¹A function f : Rⁿ → R is continuously differentiable (or C¹) on an open set A ⊂ Rⁿ if, for each i = 1, . . . , n, (∂f/∂xi)(x) exists for all x ∈ A and is continuous in x. f is k-times continuously differentiable, or Cᵏ, on A if all the derivatives of f of order less than or equal to k (≥ 1) exist and are continuous on A.


Then (g(h) − g(0))/h = (f(x + ha) − f(x))/h. Letting h tend to 0, we have g′(0) = f′_a(x). Since g′(h) = Σ_{i=1}^{n} fi(x + ha)ai, we get g′(0) = Σ_{i=1}^{n} fi(x)ai. Hence,

f′_a(x) = Σ_{i=1}^{n} fi(x)ai = ∇f(x) · a.

This equation shows that the derivative of f along the vector a is equal to the inner product of the gradient of f and a. If ‖a‖ = 1, the number f′_a(x) is called the directional derivative of f at x in the direction a.
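The identity f′_a(x) = ∇f(x) · a is easy to verify numerically. A minimal sketch in Python (the function f, the point x, and the direction a are arbitrary illustrative choices):

import numpy as np

def f(v):
    return v[0] ** 2 + 3.0 * v[0] * v[1]

def grad_f(v):
    # Analytical gradient of the example function above.
    return np.array([2.0 * v[0] + 3.0 * v[1], 3.0 * v[0]])

x = np.array([1.0, 2.0])
a = np.array([3.0, 4.0])
a = a / np.linalg.norm(a)            # unit length, so this is a directional derivative

h = 1e-6
print((f(x + h * a) - f(x)) / h)     # difference quotient along a: about 7.2
print(grad_f(x) @ a)                 # gradient dotted with a: exactly 7.2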

Theorem 5.1 Suppose that f(x) = f(x1, . . . , xn) is C1 in an open set A. Then, atpoints x where ∇f (x) ∈ Rn\{0}, the gradient ∇f (x) = (f1(x), . . . , fn(x)) satisfies:

1. ∇f (x) is orthogonal to the level surface through x.

2. ∇f (x) points in the direction of maximal increase of f .

3. ‖∇f(x)‖ measures how fast the function increases in the direction of maximal in-crease.

Proof of Theorem 5.1: Introducing θ as the angle between the vectors ∇f(x) and a, we have

f′_a(x) = ∇f(x) · a = ‖∇f(x)‖ ‖a‖ cos θ.

Note that cos θ ≤ 1 for all θ and cos 0 = 1. So when ‖a‖ = 1, it follows that at points where ∇f(x) ≠ 0, the number f′_a(x) is largest when θ = 0, i.e., when a points in the same direction as ∇f(x), while f′_a(x) is smallest when θ = π (cos π = −1), i.e., when a points in the opposite direction to ∇f(x). Moreover, it follows that the length of ∇f(x) equals the magnitude of the maximum directional derivative. □

when a points in the opposite direction to ∇f (x). Moreover, it follows that the lengthof ∇f (x) equals the magnitude of the maximum directional derivative. �

Theorem 5.2 (The Mean-Value Theorem) Suppose that f : Rn → R is C1 in anopen set containing [x, y]. Then there exists a point w ∈ (x, y) such that

f(x) − f(y) = ∇f (w) · (x − y).

Proof of the mean-value theorem: We assume that the mean-value theorem forfunctions of one variable is correct. Define ϕ(λ) = f(λx + (1 − λ)y). Then, using thechain rule which we will cover later, ϕ

′(λ) = ∇f (λx + (1 − λ)y) · (x − y). According to

the mean-value theorem for functions of one variable, there exists a number λ0 ∈ (0, 1)such that ϕ(1) − ϕ(0) = ϕ

′(λ0). Putting w = λ0x + (1 − λ0)y, the theorem follows. �

5.5 Convex Sets

Convex sets are basic building blocks in virtually every area of microeconomic the-ory. Convexity is most often assumed to guarantee that the analysis is mathematicallytractable and that the results are clear-cut and “well-behaved.”


Definition 5.3 S ⊂ Rn is a convex set if for all x, y ∈ S, we have

αx + (1 − α)y ∈ S

for all α ∈ [0, 1].

We say that z is a convex combination of x and y if z = αx + (1 − α)y for some α ∈ [0, 1]. We have a very simple and intuitive rule defining convex sets: a set is convex if and only if we can connect any two points in the set by a straight line segment that lies entirely within the set.

Exercise 5.1 Suppose that p ≫ 0 and y ≥ 0. Let B(p, y) = {x ∈ Rn₊ | p · x ≤ y} be the budget set of the consumer. Show that B(p, y) is convex.

Theorem 5.3 Let S and T be convex sets in Rn. Then S ∩ T is a convex set.

Proof of Theorem 5.3: Let x and y be any two points in S∩T . Because x ∈ S∩T ,we have x ∈ S and x ∈ T . Similarly, we have y ∈ S and y ∈ T . Let z = αx + (1 − α)yfor some α ∈ [0, 1] be any convex combination of x and y. z ∈ S because S is convexand z ∈ T because T is convex. Thus, z ∈ S ∩ T . �

Exercise 5.2 Construct an example in which two sets S and T are convex but S ∪ T isnot convex.

5.5.1 Upper Contour Sets

Let u(·) : Rn → R be a utility function. Define UC(x0) = {x ∈ Rn₊ | u(x) ≥ u(x0)}. This set UC(x0) is called the upper contour set; it consists of all commodity vectors x that the individual regards as at least as good as x0. In consumer theory, we usually assume that UC(x0) is convex for every x0 ∈ Rn₊.

5.6 Concave and Convex Functions

A C² function of one variable y = f(x) is said to be concave (convex) on the interval I if f″(x) ≤ 0 (≥ 0) for all x ∈ I.

Definition 5.4 A function f(x) = f(x1, . . . , xn) defined on a convex set S is concave (convex) on S if

f(λx + (1 − λ)x′) ≥ (≤) λf(x) + (1 − λ)f(x′)

for all x, x′ ∈ S and all λ ∈ [0, 1].

Definition 5.5 A function f(x) = f(x1, . . . , xn) defined on a convex set S is strictly concave (convex) on S if

f(λx + (1 − λ)x′) > (<) λf(x) + (1 − λ)f(x′)

for all x, x′ ∈ S with x ≠ x′ and all λ ∈ (0, 1).


5.7 Concavity/Convexity for C2 Functions

Suppose that z = f(x) = f(x1, . . . , xn) is a C² function in an open convex set S in Rn. The matrix

D²f(x) = (fij(x))n×n

is called the Hessian (matrix) of f at x, and the n determinants

|D²₍ᵣ₎f(x)| =
| f11(x)  f12(x)  · · ·  f1r(x) |
| f21(x)  f22(x)  · · ·  f2r(x) |
|   ⋮       ⋮       ⋱      ⋮    |
| fr1(x)  fr2(x)  · · ·  frr(x) |,   r = 1, . . . , n

are the leading principal minors of D²f(x) of order r. Here fij(x) = ∂²f(x)/∂xi∂xj for any i, j = 1, . . . , r.

Theorem 5.4 (Second-Order Characterization of Concave (Convex) Functions) Suppose that f(x) = f(x1, . . . , xn) is a C² function defined on an open, convex set S in Rn. Let Δ²₍ᵣ₎f(x) denote a generic principal minor of order r of the Hessian matrix. Then

1. f is convex in S ⇐⇒ Δ²₍ᵣ₎f(x) ≥ 0 for all x ∈ S and all principal minors Δ²₍ᵣ₎f(x), r = 1, . . . , n.

2. f is concave in S ⇐⇒ (−1)ʳ Δ²₍ᵣ₎f(x) ≥ 0 for all x ∈ S and all principal minors Δ²₍ᵣ₎f(x), r = 1, . . . , n.

Proof of Theorem 5.4: (⇐=) The proof relies on the chain rule (Theorem 5.16), which we will cover later in the course; just take it for granted until then. Take two points x, x0 ∈ S and let t ∈ [0, 1]. Define

g(t) = f(x0 + t(x − x0)) = f(tx + (1 − t)x0).

The chain rule for functions of several variables gives

g′(t) = (x − x0)T [∇f(x0 + t(x − x0))] = ∑_{i=1}^n fi(x0 + t(x − x0))(xi − x0i).

Using the chain rule again, we get

g′′(t) = (x − x0)T [D2f(x0 + t(x − x0))] (x − x0) = ∑_{i=1}^n ∑_{j=1}^n fij(x0 + t(x − x0))(xi − x0i)(xj − x0j).

By our hypothesis together with Theorem 4.6 on quadratic forms, g′′(t) ≥ 0 for any t ∈ [0, 1]. This shows that g(·) is convex. In particular, we have

g(t) = g(t · 1 + (1 − t) · 0) ≤ tg(1) + (1 − t)g(0) = tf(x) + (1 − t)f(x0).

Since g(t) = f(tx + (1 − t)x0), this shows that f(·) is convex. The concavity part follows easily by replacing f with −f.

(=⇒) Suppose f(·) is convex. According to Theorem 4.6 on quadratic forms, it suffices to show that for all x ∈ S and all h1, . . . , hn, we have

Q = ∑_{i=1}^n ∑_{j=1}^n fij(x)hihj ≥ 0.

Now S is an open set, so if x ∈ S and h = (h1, . . . , hn) is an arbitrary vector, there exists a positive number a such that x + th ∈ S for all t with |t| < a. Let I = (−a, a). Define the function p on I by p(t) = f(x + th). Since p(·) is convex in I,

p′′(t) = ∑_{i=1}^n ∑_{j=1}^n fij(x + th)hihj ≥ 0

for all t ∈ I. Putting t = 0, it follows that Q ≥ 0. This completes the proof. □

Corollary 5.1 Let z = f(x, y) be a C2 function defined on an open convex set S ⊂ R2. Then,

1. f is convex ⇐⇒ f11 ≥ 0, f22 ≥ 0, and f11f22 − (f12)^2 ≥ 0.

2. f is concave ⇐⇒ f11 ≤ 0, f22 ≤ 0, and f11f22 − (f12)^2 ≥ 0.
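The second-order conditions above are easy to check numerically. The following is a minimal sketch (not part of the original notes), assuming NumPy is available; the helper names numerical_hessian and leading_principal_minors are illustrative, not a standard library API. It approximates the Hessian of a two-variable function by central differences and prints its leading principal minors and eigenvalues, which is the practical counterpart of Theorem 5.5 / Corollary 5.2.

import numpy as np

def numerical_hessian(f, x, h=1e-5):
    """Central-difference approximation of the Hessian D^2 f(x)."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.zeros(n), np.zeros(n)
            e_i[i], e_j[j] = h, h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H

def leading_principal_minors(H):
    """Return |D^2_(r) f(x)| for r = 1, ..., n."""
    return [np.linalg.det(H[:r, :r]) for r in range(1, H.shape[0] + 1)]

# f(x, y) = x^2 + x*y + y^2 is strictly convex: its Hessian is constant and positive definite.
f = lambda v: v[0] ** 2 + v[0] * v[1] + v[1] ** 2
H = numerical_hessian(f, np.array([0.3, -1.2]))
print(leading_principal_minors(H))   # both leading minors > 0
print(np.linalg.eigvalsh(H))         # all eigenvalues > 0: positive definite

For a concave function the leading minors would alternate in sign, starting negative, as in part 2 of Corollary 5.2.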

Exercise 5.3 Let f(x, y) = 2x − y − x^2 + 2xy − y^2 for all (x, y) ∈ R2. Check whether f is concave, convex, or neither.

Exercise 5.4 The CES (Constant Elasticity of Substitution) function f is defined for K > 0, L > 0 by

f(K, L) = A[δK^{−ρ} + (1 − δ)L^{−ρ}]^{−1/ρ},

where A > 0, ρ ≠ 0, and 0 ≤ δ ≤ 1. Show that f is concave if ρ ≥ −1 and convex if ρ ≤ −1.

Theorem 5.5 (Second-Order (Partial) Characterization of Strict Concavity) Suppose that f(x) = f(x1, . . . , xn) is a C2 function defined on an open, convex set S in Rn. Let |D2(r)f(x)| be the leading principal minors defined above. Then

1. |D2(r)f(x)| > 0 for all x ∈ S and all r = 1, . . . , n =⇒ f is strictly convex.

2. (−1)r|D2(r)f(x)| > 0 for all x ∈ S and all r = 1, . . . , n =⇒ f is strictly concave.

Proof of Theorem 5.5: Define the function g(·) as in the proof of Theorem 5.4 above. If the specified conditions are satisfied, the Hessian matrix D2f(x) is positive definite by Theorem 4.6 on quadratic forms. So, for x ≠ x0, g′′(t) > 0 for all t ∈ [0, 1]. It follows that g(·) is strictly convex. Then we have

g(t) = g(t · 1 + (1 − t) · 0) < tg(1) + (1 − t)g(0) = tf(x) + (1 − t)f(x0)

for all t ∈ (0, 1), which shows that f is strictly convex. The strict concavity part is obtained by replacing f with −f. □


Corollary 5.2 Let z = f(x, y) be a C2 function defined on an open convex set S ⊂ R2. Then,

1. f11 > 0 and f11f22 − (f12)^2 > 0 =⇒ f is strictly convex.

2. f11 < 0 and f11f22 − (f12)^2 > 0 =⇒ f is strictly concave.

Theorem 5.6 (First-Order Characterization of Concavity) Suppose that f(·) is a C1 function defined on an open, convex set S in Rn. Then

1. f is concave in S if and only if

f(x) − f(x0) ≤ ∇f(x0) · (x − x0) = ∑_{i=1}^n (∂f(x0)/∂xi)(xi − x0i)

for all x, x0 ∈ S.

2. f(·) is strictly concave iff the above inequality is always strict when x ≠ x0.

3. The corresponding result for convex (strictly convex) functions is obtained by changing ≤ to ≥ (< to >) in the above inequality.

Proof of Theorem 5.6: (1) (=⇒) Let x, x0 ∈ S. Since f is concave,

λf(x) + (1 − λ)f(x0) ≤ f(λx + (1 − λ)x0)

for all λ ∈ (0, 1). Rearranging the above inequality, for all λ ∈ (0, 1), we obtain

f(x) − f(x0) ≤ [f(x0 + λ(x − x0)) − f(x0)] / λ.    (∗)

Let λ → 0. The right hand side of (∗) then approaches ∇f(x0) · (x − x0).

(⇐=) Let x, x0 ∈ S and λ ∈ (0, 1). Define z = λx + (1 − λ)x0. Notice that z ∈ S because S is convex. By our hypothesis, we have

f(x) − f(z) ≤ ∇f(z) · (x − z)      (i)
f(x0) − f(z) ≤ ∇f(z) · (x0 − z)    (ii)

Multiplying the inequality in (i) by λ > 0 and the inequality in (ii) by 1 − λ > 0, we obtain

λ(f(x) − f(z)) + (1 − λ)(f(x0) − f(z)) ≤ ∇f(z) · [λ(x − z) + (1 − λ)(x0 − z)].    (iii)

Here λ(x − z) + (1 − λ)(x0 − z) = λx + (1 − λ)x0 − z = 0, so the right hand side of (iii) is 0. Thus, rearranging (iii) gives

λf(x) + (1 − λ)f(x0) ≤ f(z) = f(λx + (1 − λ)x0)

because z = λx + (1 − λ)x0. This shows that f is concave.

(2) (=⇒) Suppose that f is strictly concave in S. Then inequality (∗) is strict for x ≠ x0. (⇐=) With z = x0 + λ(x − x0), we have

f(x) − f(x0) < [f(z) − f(x0)] / λ ≤ [∇f(x0) · (z − x0)] / λ = ∇f(x0) · (x − x0),

where we used the inequality in (1), which we have already proved, and the fact that z − x0 = λ(x − x0). This shows that the inequality in (1) holds with strict inequality.

(3) This part is trivial. Do you agree with me? □

5.7.1 Jensen’s Inequality

Theorem 5.7 (Jensen’s Inequality) A function f(·) is concave on a convex set S in Rn if and only if

f(λ1x1 + · · · + λnxn) ≥ λ1f(x1) + · · · + λnf(xn)

holds for all x1, . . . , xn ∈ S and all λi ≥ 0, i = 1, . . . , n, with ∑_{i=1}^n λi = 1.

Proof of Jensen’s Inequality: For any k ≥ 2, we propose the hypothesis H(k) as follows:

H(k) :  f( ∑_{h=1}^k λh xh ) ≥ ∑_{h=1}^k λh f(xh).

H(2) is true because it is precisely the definition of concavity of f. Now we show that H(k) =⇒ H(k + 1). We execute a series of computations below.

f( ∑_{h=1}^{k+1} λh xh ) = f( (∑_{h=1}^k λh) [ ∑_{h=1}^k (λh / ∑_{h=1}^k λh) xh ] + λ_{k+1} x_{k+1} )

≥ (∑_{h=1}^k λh) f( ∑_{h=1}^k (λh / ∑_{h=1}^k λh) xh ) + λ_{k+1} f(x_{k+1})    (because of H(2))

≥ (∑_{h=1}^k λh) ∑_{h=1}^k (λh / ∑_{h=1}^k λh) f(xh) + λ_{k+1} f(x_{k+1})    (because H(k) is true under the inductive hypothesis)

= ∑_{h=1}^{k+1} λh f(xh). □

One can extend Jensen’s inequality to the continuum. Let X be a random variable which takes values on the real line R, and let g : R → R be its probability density function. Then the continuous version of Jensen’s inequality is the following.

Jensen’s Inequality (Continuum Version): A function f(·) is concave on R if and only if

f( ∫_{−∞}^{∞} x g(x) dx ) ≥ ∫_{−∞}^{∞} f(x) g(x) dx

for any probability density function g(·) (whenever both integrals exist).
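The continuum version can be illustrated by simulation. A minimal sketch (assuming NumPy; not part of the original notes): draw a large sample from a positive random variable, take f = log (concave on the positive axis), and compare f(E[X]) with E[f(X)].

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200_000)   # a positive random variable X
f = np.log                                      # log is concave on (0, infinity)

lhs = f(x.mean())        # f(E[X])
rhs = f(x).mean()        # E[f(X)]
print(lhs, rhs, lhs >= rhs)   # Jensen: f(E[X]) >= E[f(X)] for concave f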

5.8 Quasiconcave and Quasiconvex Functions

Definition 5.6 A function f, defined over a convex set S ⊂ Rn, is quasiconcave if the upper level set Pα = {x ∈ S | f(x) ≥ α} is convex for each α ∈ R. We say that f is quasiconvex if −f is quasiconcave. Equivalently, f is quasiconvex iff the lower level set {x ∈ S | f(x) ≤ α} is convex for each α ∈ R.

Proposition 5.1 If f(·) is concave, then it is quasiconcave. Similarly, if f(·) is convex,then it is quasiconvex.

Exercise 5.5 Prove Proposition 5.1.

Theorem 5.8 Let f(·) be a function of n variables defined on a convex set S in Rn. Then f is quasiconcave if and only if either of the following conditions is satisfied for all x, x′ ∈ S and all λ ∈ [0, 1]:

1. f(λx + (1 − λ)x′) ≥ min{f(x), f(x′)}

2. f(x′) ≥ f(x) =⇒ f(λx + (1 − λ)x′) ≥ f(x)

Proof of Theorem 5.8: (1) (=⇒) Suppose that f(·) is quasiconcave. Let x, x′ ∈ S and λ ∈ [0, 1], and define a = min{f(x), f(x′)}. Then

x, x′ ∈ Pa = {x ∈ S | f(x) ≥ a}.

Since Pa is convex by our hypothesis, λx + (1 − λ)x′ ∈ Pa for any λ ∈ [0, 1]. This implies that f(λx + (1 − λ)x′) ≥ a = min{f(x), f(x′)}.

(⇐=) Suppose that the inequality in (1) is valid and let a be an arbitrary number. We must show that Pa is convex. Take any two points x, x′ ∈ Pa. Then f(x) ≥ a and f(x′) ≥ a. Also, for all λ ∈ (0, 1), the inequality in (1) implies that

f(λx + (1 − λ)x′) ≥ min{f(x), f(x′)} ≥ a.

Thus, λx + (1 − λ)x′ ∈ Pa. This proves that Pa is convex. We leave the rest of the proof as an exercise. □
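Condition (1) of Theorem 5.8 is also convenient for a quick numerical sanity check. A minimal sketch (assuming NumPy; the example function is my own illustration, not from the notes): f(x, y) = xy on the positive quadrant is quasiconcave but not concave, and the min-characterization should never be violated on random pairs of points.

import numpy as np

rng = np.random.default_rng(1)
f = lambda v: v[0] * v[1]   # x*y on the positive quadrant: quasiconcave, not concave

violations = 0
for _ in range(10_000):
    x, xp = rng.uniform(0.1, 5.0, 2), rng.uniform(0.1, 5.0, 2)
    lam = rng.uniform()
    z = lam * x + (1 - lam) * xp
    if f(z) < min(f(x), f(xp)) - 1e-12:   # condition (1) of Theorem 5.8
        violations += 1
print("violations:", violations)           # expected: 0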

Exercise 5.6 Prove the second part of Theorem 5.8.

A function F : R → R is said to be strictly increasing if F(x) > F(y) whenever x > y.


Theorem 5.9 (Quasiconcavity is preserved under positive monotone transformations) Let f(·) be defined on a convex set S in Rn and let F be a function of one variable whose domain includes f(S). If f(·) is quasiconcave (quasiconvex) and F is strictly increasing, then F(f(·)) is quasiconcave (quasiconvex).

Proof of Theorem 5.9: Suppose f(·) is quasiconcave. Using the previous theorem (Theorem 5.8), we must have

f(λx + (1 − λ)x′) ≥ min{f(x), f(x′)}.

Since F(·) is strictly increasing,

F(f(λx + (1 − λ)x′)) ≥ F(min{f(x), f(x′)}) = min{F(f(x)), F(f(x′))}.

It follows that F ◦ f is quasiconcave. The argument in the quasiconvex case is entirely similar, replacing ≥ with ≤ and min with max. □

Definition 5.7 A function f(·) defined on a convex set S ⊂ Rn is said to be strictly quasiconcave if

f(λx + (1 − λ)x′) > min{f(x), f(x′)}

for all x, x′ ∈ S with x ≠ x′ and all λ ∈ (0, 1). The function f is strictly quasiconvex if −f is strictly quasiconcave.

Exercise 5.7 Consider the Cobb-Douglas function f(x) = A x1^{α1} · · · xn^{αn}, defined for x1 > 0, . . . , xn > 0, with A > 0 and αi > 0 for all i = 1, . . . , n. Show the following.

1. f(·) is quasiconcave for all α1, . . . , αn.

2. f(·) is concave for α1 + · · · + αn ≤ 1.

3. f(·) is strictly concave for α1 + · · · + αn < 1.

Theorem 5.10 (First-Order Characterization of Quasiconcavity) Let f(·) be a C1 function of n variables defined on an open convex set S in Rn. Then f(·) is quasiconcave on S if and only if for all x, x0 ∈ S,

f(x) ≥ f(x0) =⇒ ∇f(x0) · (x − x0) = ∑_{i=1}^n (∂f(x0)/∂xi)(xi − x0i) ≥ 0.

Proof of Theorem 5.10: (=⇒) Suppose f(·) is quasiconcave. Let x, x0 ∈ S and define the function g(·) on [0, 1] by

g(t) = f((1 − t)x0 + tx) = f(x0 + t(x − x0)).

Then, using the chain rule (Theorem 5.16) which will be shown later, we have

g′(t) = ∇f(x0 + t(x − x0)) · (x − x0).

Suppose f(x) ≥ f(x0). By Theorem 5.8, g(t) ≥ g(0) for all t ∈ [0, 1]. For any t ∈ (0, 1], we have

[g(t) − g(0)] / t ≥ 0.

Letting t → 0, we obtain

lim_{t→0} [g(t) − g(0)] / t = g′(0) ≥ 0.

This implies

g′(0) = ∇f(x0) · (x − x0) ≥ 0.

(⇐=) We will be satisfied with the figure for this part. □

The content of Theorem 5.10 is that for any quasiconcave function f(·) and any pair of points x and x0 with f(x) ≥ f(x0), the gradient vector ∇f(x0) and the vector (x − x0) must form an acute (non-obtuse) angle.

Theorem 5.11 (Second-Order Characterization of Quasiconcavity) Let f(·) bea C2 function defined on an open, convex set S in Rn. Then, f(·) is quasiconcave ifand only if, for every x ∈ S, the Hessian matrix D2f(x) is negative semidefinite in thesubspace {z ∈ Rn|∇f(x) · z = 0}, that is,

zT D2f(x)z ≤ 0 whenever ∇f (x) · z = 0

for every x ∈ S. If the Hessian matrix D2f(x) is negative definite in the subspace{z ∈ Rn|∇f(x) · z = 0} for every x ∈ S, then f(·) is strictly quasiconcave.

Proof of Theorem 5.11: (=⇒) Suppose f(·) is quasiconcave. Let x ∈ S. Choose x′ ∈ S such that ∇f(x) · (x′ − x) = 0. Since f(·) is quasiconcave, f(x′) ≤ f(x) (to see this, draw the figure). Then

f(x′) − f(x) ≤ ∇f(x) · (x′ − x) = 0.

By Theorem 5.6, f(·) behaves like a concave function on the subspace for which ∇f(x) · (x′ − x) = 0. With Theorem 5.4 and Theorem 4.6, concavity of f(·) on that subspace is equivalent to negative semidefiniteness of the Hessian matrix there. Then the conclusion follows.

(⇐=) This proof is based on “A Characterization of Quasi-Concave Functions,” by Kiyoshi Otani, Journal of Economic Theory, vol. 31 (1983), 194-196. Let x, x′ ∈ S be such that f(x′) ≥ f(x); this choice entails no loss of generality. For λ ∈ [0, 1], define

g(λ) = f(x + λ(x′ − x)).

Note that g(0) = f(x), g(1) = f(x′), and g(1) ≥ g(0) because f(x′) ≥ f(x) by our hypothesis. What we want to show is that g(λ) ≥ g(0) for any λ ∈ [0, 1]. By the mean-value theorem (Theorem 5.2), there exists λ0 ∈ (0, 1) such that g′(λ0) = 0. Let x0 = x + λ0(x′ − x). Then g′(λ0) = 0 ⇔ ∇f(x0) · (x′ − x) = 0. Assume that ∇f(x) ≠ 0 for any x ∈ S; this strikes us as being innocuous. Let p denote ∇f(x0) for notational simplicity.

Since pTp > 0, there exists a C2 function β : R → R, defined for sufficiently small |α|, such that

β(0) = 0 and f(β(α)p + α(x′ − x) + x0) = f(x0) for all small α.

Again, for notational simplicity, we denote β(α)p + α(x′ − x) + x0 by z(α). By differentiating f(z(α)) = f(x0), we have

∇f(z(α)) [β′(α)p + (x′ − x)] = 0,    (∗)

and by differentiating once more, we have

[β′(α)p + (x′ − x)]T D2f(z(α)) [β′(α)p + (x′ − x)] + ∇f(z(α)) β′′(α)p = 0.    (∗∗)

Since pT(x′ − x) = 0 and ∇f(z(0))p = pTp > 0, (∗) implies β′(0) = 0. Moreover, our hypothesis requires

[β′(α)p + (x′ − x)]T D2f(z(α)) [β′(α)p + (x′ − x)] ≤ 0.

Then we must have β′′(α)∇f(z(α))p ≥ 0. When α is sufficiently close to zero, z(α) is very close to x0 and so ∇f(z(α))p > 0 because p ≠ 0. Then β′′(α) ≥ 0 for α with |α| sufficiently small.

For sufficiently small |λ − λ0|, we have

∇f((λ − λ0)(x′ − x) + x0) p > 0

because ∇f(x0)Tp = pTp > 0 and (λ − λ0)(x′ − x) + x0 is very close to x0. Hence, for λ sufficiently close to λ0, we have

g(λ) = f(x + λ(x′ − x)) = f((λ − λ0)(x′ − x) + x0) ≤ f(β(λ − λ0)p + (λ − λ0)(x′ − x) + x0) = f(x0) = g(λ0)

because β(λ − λ0) ≥ 0 for λ sufficiently close to λ0. Accordingly, g(·) does not have an interior minimum in [0, 1] unless it is constant. Hence, g(λ) ≥ g(0) for any λ ∈ [0, 1]. The last step is based on Corollary 4.3 in “Nine Kinds of Quasiconcavity and Concavity,” by Diewert, Avriel, and Zang, Journal of Economic Theory, vol. 25 (1981), 397-420.

Corollary 4.3 (Diewert, Avriel, and Zang (1981)): A differentiable function f defined over an open set S is quasiconcave if and only if, for any x0 ∈ S and any v ∈ Rn with vTv = 1,

vT∇f(x0) = 0 implies that g(t) ≡ f(x0 + tv) does not attain a (semistrict) local minimum at t = 0.

This completes the proof. □

Theorem 5.12 (A Characterization through the Bordered Hessian) Let f be a C2 function defined on an open, convex set S in Rn. Define the bordered Hessian determinants Br(x) as follows: for each r = 1, . . . , n,

Br(x) =
|   0      f1(x)   · · ·  fr(x)  |
| f1(x)   f11(x)   · · ·  f1r(x) |
|  ...      ...    . . .    ...  |
| fr(x)   fr1(x)   · · ·  frr(x) |

Then,

1. A necessary condition for f to be quasiconcave is that (−1)rBr(x) ≥ 0 for all x ∈ S and all r = 1, . . . , n.

2. A sufficient condition for f to be strictly quasiconcave is that (−1)rBr(x) > 0 for all x ∈ S and all r = 1, . . . , n.
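The bordered Hessian determinants are tedious by hand but easy to generate symbolically. A minimal sketch (assuming SymPy; not part of the original notes, and the helper B is my own illustrative construction): build Br(x) for a Cobb-Douglas function and check that (−1)^r Br(x) is nonnegative on the positive quadrant, as Theorem 5.12 requires of a quasiconcave function.

import sympy as sp

x, y = sp.symbols('x y', positive=True)
a, b = sp.Rational(1, 2), sp.Rational(1, 3)
f = x**a * y**b                      # Cobb-Douglas, quasiconcave on the positive quadrant

grad = [sp.diff(f, v) for v in (x, y)]
H = sp.hessian(f, [x, y])

def B(r):
    """Bordered Hessian determinant B_r(x) as defined in Theorem 5.12."""
    M = sp.zeros(r + 1, r + 1)
    for i in range(r):
        M[0, i + 1] = grad[i]
        M[i + 1, 0] = grad[i]
        for j in range(r):
            M[i + 1, j + 1] = H[i, j]
    return sp.simplify(M.det())

for r in (1, 2):
    print(r, sp.simplify((-1)**r * B(r)))   # both expressions are >= 0 for x, y > 0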

5.9 Total Differentiation

Consider functions that map points (vectors) in Rn to points (vectors) in Rm. Such functions are often called transformations (or operators). A transformation f : Rn → Rm is said to be linear if

f(x1 + x2) = f(x1) + f(x2) and f(αx1) = αf(x1)

for all x1, x2 ∈ Rn and all scalars α ∈ R. Our knowledge of linear algebra tells us that for every linear transformation f : Rn → Rm, there is a unique m × n matrix A = (aji) such that f(x) = Ax for all x ∈ Rn. Componentwise, writing f = (f(1), . . . , f(m)),

f(j)(x) = aj1x1 + aj2x2 + · · · + ajnxn = ∑_{i=1}^n aji xi,  j = 1, . . . , m.


5.9.1 Linear Approximations and Differentiability

If a one-variable function f is differentiable at a point x0, then the linear approximation to f around x0 is given by

f(x0 + h) ≈ f(x0) + f′(x0)h

for small values of h. Here ≈ stands for “approximately equal to.” This is useful because the approximation error

O(h) = true value − approximate value = f(x0 + h) − f(x0) − f′(x0)h

becomes negligible for sufficiently small h; namely, O(h) → 0 as h → 0. More importantly, however, O(h) also becomes small in comparison with h, that is,

lim_{h→0} O(h)/h = lim_{h→0} ( [f(x0 + h) − f(x0)]/h − f′(x0) ) = 0.

Moreover, f(·) is differentiable at x0 if and only if there exists a number c ∈ R such that

lim_{h→0} [f(x0 + h) − f(x0) − ch]/h = 0.

If such a number c ∈ R exists, it is unique and c = f′(x0). These arguments generalize straightforwardly to higher dimensional spaces. In particular, a transformation f(·) is differentiable at a point x0 if it admits a linear approximation around x0:

Definition 5.8 If f : A → Rm is a transformation defined on a subset A of Rn and x0 is an interior point of A, then f is said to be differentiable at x0 if there exists an m × n matrix C such that

lim_{h→0, h∈Rn} ‖f(x0 + h) − f(x0) − Ch‖ / ‖h‖ = 0.

If such a matrix C exists, it is called the total derivative of f(·) at x0, and is denoted by Df(x0).

If we restrict attention to real-valued functions, the following theorem establishes the link between total differentiability and derivatives along every vector.

Theorem 5.13 If f : A → R is defined on a subset A of Rn and f is differentiable at an interior point x ∈ A, then f has a derivative f′a(x) along every n-vector a, and f′a(x) = ∇f(x) · a.

Proof of Theorem 5.13: The derivative along a is

f′a(x) = lim_{h→0} ( [f(x + ha) − f(x) − ∇f(x) · ah]/h + ∇f(x) · a ) = 0 + ∇f(x) · a.

In particular, if ej = (0, . . . , 1, . . . , 0) is the jth standard unit vector in Rn (with the 1 in the jth place), then ∇f(x) · ej is the partial derivative ∂f(x)/∂xj = fj(x) with respect to the jth variable. On the other hand, ∇f(x) · ej is the jth component of ∇f(x). Hence, ∇f(x) is the row vector

∇f(x) = (∇f(x) · e1, . . . , ∇f(x) · en) = (f1(x), . . . , fn(x)). □

Suppose we are interested in checking whether a given transformation f : Rn → Rm is differentiable. The next theorem shows that it suffices to check whether each component real-valued function f(j) : Rn → R is differentiable (j = 1, . . . , m).2

Theorem 5.14 A transformation f = (f(1), . . . , f(m)) from a subset A of Rn into Rm is differentiable at an interior point x ∈ A if and only if each component function f(j) : A → R, j = 1, . . . , m, is differentiable at x. Moreover,

Df(x) =
⎛ ∇f(1)(x) ⎞   ⎛ ∂f(1)(x)/∂x1  · · ·  ∂f(1)(x)/∂xn ⎞
⎜ ∇f(2)(x) ⎟ = ⎜ ∂f(2)(x)/∂x1  · · ·  ∂f(2)(x)/∂xn ⎟
⎜    ...   ⎟   ⎜      ...       . . .      ...      ⎟
⎝ ∇f(m)(x) ⎠   ⎝ ∂f(m)(x)/∂x1  · · ·  ∂f(m)(x)/∂xn ⎠

This matrix is called the Jacobian matrix of f(·) at x. Its rows are the gradients of the component functions of f(·).

Proof of Theorem 5.14: Let C be an m × n matrix and let O(h) = f(x + h) − f(x) − Ch, where h ∈ Rn. Componentwise, for j = 1, . . . , m,

Oj(h) = f(j)(x + h) − f(j)(x) − Cjh,

where Cj is the jth row of C. For each j,

|Oj(h)| ≤ ‖O(h)‖ ≤ |O1(h)| + · · · + |Om(h)|.

It follows that

lim_{h→0} ‖O(h)‖/‖h‖ = 0 ⇐⇒ lim_{h→0} |Oj(h)|/‖h‖ = 0 for all j = 1, . . . , m.

Hence, f(·) is differentiable at x if and only if each f(j) is differentiable at x. Also, the jth row of the matrix C = Df(x) is the derivative of f(j), that is, Cj = ∇f(j)(x). □

The next theorem confirms our intuition about differentiation: if a transformation (or function) is differentiable, then it is in particular continuous.

2See the similar argument in Theorem 3.14, in which we show that a function f : Rn → Rm is continuous if and only if each component function f(j) : Rn → R is continuous (j = 1, . . . , m).


Theorem 5.15 (Differentiability =⇒ Continuity) If a transformation f from A ⊂ Rn into Rm is differentiable at an interior point x0 ∈ A, then f is continuous at x0.

Proof of Theorem 5.15: Let C = Df(x0). Then, for small but nonzero h ∈ Rn, the triangle inequality yields

‖f(x0 + h) − f(x0)‖ = ‖f(x0 + h) − f(x0) − Ch + Ch‖
≤ ‖f(x0 + h) − f(x0) − Ch‖ + ‖Ch‖
= ‖h‖ ( ‖f(x0 + h) − f(x0) − Ch‖ / ‖h‖ ) + ‖Ch‖.

Since f is differentiable at x0,

‖f(x0 + h) − f(x0) − Ch‖ / ‖h‖ → 0 as h → 0, and ‖Ch‖ → 0 as h → 0.

Hence, f(x0 + h) → f(x0) as h → 0. □

The next theorem shows that the order of the following two operations does not matter for the final product: (1) construct the composite mapping and then differentiate it; or (2) differentiate each mapping and then compose the two derivatives.

Theorem 5.16 (The Chain Rule) Suppose f : A → Rm and g : B → Rp are defined on A ⊂ Rn and B ⊂ Rm, with f(A) ⊂ B, and suppose that f and g are differentiable at x and f(x), respectively. Then the composite transformation g ◦ f : A → Rp defined by (g ◦ f)(x) = g(f(x)) is differentiable at x, and

D(g ◦ f)(x) = Dg(f(x)) Df(x),

where D(g ◦ f)(x) is p × n, Dg(f(x)) is p × m, and Df(x) is m × n.

Proof of the Chain Rule: Define

k(h) = f(x + h) − f(x) = Df(x)h + ef(h), where ‖ef(h)‖/‖h‖ → 0 as h → 0.

Also,

g(f(x + h)) − g(f(x)) = Dg(f(x))k + eg(k), where ‖eg(k)‖/‖k‖ → 0 as k → 0.

Note that there exists some fixed constant K such that ‖k(h)‖ ≤ K‖h‖ for all small h; otherwise f would not be differentiable. Note also that for all ε > 0, ‖eg(k)‖ < ε‖k‖ for k small, because g is differentiable. Thus, when h is small, we can summarize:

‖eg(k(h))‖ < ε‖k(h)‖ ≤ εK‖h‖.

Hence,

‖eg(k(h))‖/‖h‖ → 0 as h → 0.

Then we execute a series of computations below. Let e(h) = g(f(x + h)) − g(f(x)) − Dg(f(x))Df(x)h. Then

e(h) = g(f(x) + k(h)) − g(f(x)) − Dg(f(x))Df(x)h
     = Dg(f(x))k(h) + eg(k(h)) − Dg(f(x))Df(x)h
     = Dg(f(x)) (k(h) − Df(x)h) + eg(k(h))
     = Dg(f(x)) ef(h) + eg(k(h))        (∵ k(h) = Df(x)h + ef(h))

And, moreover,

‖e(h)‖/‖h‖ = ‖Dg(f(x))ef(h) + eg(k(h))‖ / ‖h‖
           ≤ ‖Dg(f(x))ef(h)‖/‖h‖ + ‖eg(k(h))‖/‖h‖        (∵ triangle inequality)
           ≤ ‖Dg(f(x))‖ ‖ef(h)‖/‖h‖ + ‖eg(k(h))‖/‖h‖     (∵ Cauchy-Schwarz inequality)

Since ‖ef(h)‖/‖h‖ → 0 and ‖eg(k(h))‖/‖h‖ → 0 as h → 0, we conclude that ‖e(h)‖/‖h‖ → 0 as h → 0. □
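The matrix identity D(g ◦ f)(x) = Dg(f(x)) Df(x) is easy to verify numerically. A minimal sketch (assuming NumPy; the maps f and g below are my own illustrative examples, and jacobian_fd is an illustrative helper, not a library routine): approximate each Jacobian by central differences and compare the two sides.

import numpy as np

def jacobian_fd(F, x, h=1e-6):
    """Finite-difference Jacobian DF(x) of a transformation F: R^n -> R^m."""
    n = x.size
    cols = []
    for i in range(n):
        e = np.zeros(n); e[i] = h
        cols.append((F(x + e) - F(x - e)) / (2 * h))
    return np.column_stack(cols)

# f: R^2 -> R^3 and g: R^3 -> R^2 (arbitrary smooth maps for illustration)
f = lambda v: np.array([v[0] * v[1], np.sin(v[0]), v[1] ** 2])
g = lambda w: np.array([w[0] + w[1] * w[2], np.exp(w[0]) - w[2]])

x0 = np.array([0.7, -1.3])
lhs = jacobian_fd(lambda v: g(f(v)), x0)             # D(g o f)(x0)
rhs = jacobian_fd(g, f(x0)) @ jacobian_fd(f, x0)     # Dg(f(x0)) Df(x0)
print(np.allclose(lhs, rhs, atol=1e-5))              # True up to discretization error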

5.10 The Inverse of a Transformation

Consider a transformation f : A → B where A ⊂ Rn and B ⊂ Rm. Suppose the range of f is the whole of B, i.e., f(A) = B. Recall that f is one-to-one if f(x) = f(x′) =⇒ x = x′. In sum, f is bijective. In this case, for each point y ∈ B there is exactly one point x ∈ A such that f(x) = y, and the inverse of f is the transformation f−1 : B → A which maps each y ∈ B to precisely that point x ∈ A for which f(x) = y.

If f : U → V and g : V → U are differentiable and mutually inverse transformations between open sets U and V in Rn, then g ◦ f is the identity transformation on U, and therefore D(g ◦ f)(x) = In for all x ∈ U. The chain rule then gives Dg(f(x))Df(x) = In. This means that the Jacobian matrix Df(x) must be nonsingular, so |Df(x)| ≠ 0. Also, Dg(f(x)) is the inverse matrix Df(x)−1.

Theorem 5.17 (Inverse Function Theorem) Consider a transformation f = (f1, . . . , fn) from A ⊂ Rn into Rn and assume that f is Ck (k ≥ 1) in an open set containing x0 = (x01, . . . , x0n). Furthermore, suppose that |Df(x)| ≠ 0 at x = x0. Let y0 = f(x0). Then there exists an open set U around x0 such that f maps U one-to-one onto an open set V around y0, and there is an inverse mapping g = f−1 : V → U which is also Ck. Moreover, for all y ∈ V, we have

Dg(y) = Df(x)−1, where x = g(y) ∈ U.

“Sketch” of the Proof of the Inverse Function Theorem: The proof consists of 5 steps. However, the proof of each step will be either briefly sketched or completely skipped due to its technical difficulty. For simplicity, we assume that x0 = 0 and f(x0) = y0 = 0.

Step 1: There is no loss of generality in assuming that Df(0) = In.

Proof of Step 1: Let Df(0) = A. Since A is nonsingular, A−1 exists. Let g : Rn → Rn be the linear mapping associated with A−1, that is, g(x) = A−1x for any x ∈ Rn. Note that g(0) = 0 and Dg(0) = A−1. The Jacobian matrix of f ◦ g : Rn → Rn is given by D(f ◦ g)(0) = Df(0)Dg(0) = AA−1 = In. If we can prove the theorem for f ◦ g, the same conclusion follows for f, because g is a linear map associated with A−1 and hence Ck. Therefore, we may work with f ◦ g instead of f, so that Df(0) = In can be assumed with no loss of generality. □

Step 2: There exists an open set U containing 0 such that f|U : U → Rn is one-to-one.

Step 3: There exists an open set V such that f|U : U → V is onto; that is, for any y ∈ V, there exists x ∈ U such that f(x) = y.

Step 4: f−1|V : V → U is one-to-one and onto.

Step 5: f−1|V : V → U is Ck. □
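The formula Dg(y0) = Df(x0)^{-1} can also be checked numerically when no closed-form inverse is available. A minimal sketch (assuming NumPy and SciPy; the map f is my own illustrative example and g is evaluated by a numerical root finder, not an exact inverse):

import numpy as np
from scipy.optimize import fsolve

# f: R^2 -> R^2 with nonsingular Jacobian near x0, so the inverse function theorem applies locally.
f = lambda v: np.array([v[0] + v[1] ** 3, v[0] ** 3 + v[1]])
x0 = np.array([0.2, 0.3])
y0 = f(x0)

Df = np.array([[1.0, 3 * x0[1] ** 2],
               [3 * x0[0] ** 2, 1.0]])          # analytic Jacobian at x0; its determinant is nonzero here

def g(y):
    """Evaluate the local inverse f^{-1}(y) by solving f(x) = y numerically."""
    return fsolve(lambda v: f(v) - y, x0)

h = 1e-4
Dg = np.column_stack([(g(y0 + h * e) - g(y0 - h * e)) / (2 * h) for e in np.eye(2)])
print(np.allclose(Dg, np.linalg.inv(Df), atol=1e-3))   # Dg(y0) = Df(x0)^{-1} up to numerical error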

5.11 Implicit Function Theorems

Consider a system of m equations in the variables x = (x1, . . . , xn) and y = (y1, . . . , ym):

f1(x1, . . . , xn, y1, . . . , ym) = 0
· · · · · ·                                  ⇐⇒ f(x, y) = 0,
fm(x1, . . . , xn, y1, . . . , ym) = 0

with f = (f1, . . . , fm)′. The Jacobian of f with respect to x is

Dxf(x, y) =
⎛ ∂f1/∂x1  · · ·  ∂f1/∂xn ⎞
⎜   ...    . . .    ...   ⎟
⎝ ∂fm/∂x1  · · ·  ∂fm/∂xn ⎠

and Dyf(x, y) is defined analogously with respect to y.

Theorem 5.18 (Implicit Function Theorem) Suppose f = (f1, . . . , fm) is C1 in an open set A ⊂ Rn+m, and consider the vector equation f(x, y) = 0, where x ∈ Rn and y ∈ Rm. Let (x0, y0) be an interior point of A satisfying f(x0, y0) = 0. Suppose that the Jacobian determinant of f with respect to y is different from 0 at (x0, y0), i.e., |Dyf(x, y)| ≠ 0 at (x, y) = (x0, y0). Then there exist open balls B1 and B2 around x0 and y0, respectively, with B1 × B2 ⊂ A, such that |Dyf(x, y)| ≠ 0 in B1 × B2, and such that for each x ∈ B1, there is a unique y ∈ B2 with f(x, y) = 0. In this way, y is defined on B1 as a C1 function g(·) of x. The Jacobian matrix Dg(x) can be found by implicit differentiation of f(x, y) = 0:

dy/dx = (∂yj/∂xi)m×n = Dg(x) = −[Dyf(x, y)]−1 Dxf(x, y).

Proof of the Implicit Function Theorem: We define the norm of vectors in Rn as follows:

‖x‖ ≡ max_{1≤i≤n} |xi|.

This is the norm we use for the implicit function theorem. The proof relies on the following three lemmas (Lemmas 5.1, 5.2, and 5.3). We will not provide their proofs here.

Lemma 5.1 Let K be a compact set in Rn and let {hk(x)}∞k=1 be a sequence of continuous functions K → Rm. Suppose that for any ε > 0, there exists a number N ∈ N such that

max_{x∈K} ‖hm(x) − hn(x)‖ < ε

for all m, n > N. Then there exists a unique continuous function h : K → Rm such that

lim_{k→∞} { max_{x∈K} ‖hk(x) − h(x)‖ } = 0.

With Weierstrass’s Theorem (Theorem 3.19) and the concept of a Cauchy sequence in Rn (Definition 3.6), the above lemma should be easy to establish. For the next lemma, define the following sets:

Dα = {x ∈ Rn | ‖x − x0‖ ≤ α},
Dβ = {y ∈ Rm | ‖y − y0‖ ≤ β}.

Let ξ(x, y) be a continuous mapping from Dα × Dβ to Rm with the property that ξ(x0, y0) = 0. Notice that Dα × Dβ is a compact set by construction.

Lemma 5.2 (Lipschitz Continuity) There exists a number K ∈ (0, 1) such that, for all y, y′ ∈ Dβ,

‖ξ(x, y) − ξ(x, y′)‖ < K‖y − y′‖.

We argue that Lemma 5.2 enables us to construct the sequence of continuous functions needed for Lemma 5.1. Again, we take Lemma 5.2 for granted. Let y0(x) = y0 and define yk+1(x) = y0 + ξ(x, yk(x)) for k ≥ 0. Since ξ(x0, y0) = 0, we have

‖ξ(x, y0)‖ ≤ (1 − K)β

for x ∈ Dα with α > 0 sufficiently small. We execute a series of computations below.

‖yk+1(x) − y0‖ = ‖ξ(x, yk(x))‖
= ‖ξ(x, yk(x)) − ξ(x, y0) + ξ(x, y0)‖
≤ ‖ξ(x, yk(x)) − ξ(x, y0)‖ + ‖ξ(x, y0)‖
< K‖yk(x) − y0‖ + (1 − K)β

Fix m ∈ N. Then

‖yk+m(x) − yk(x)‖ = ‖ξ(x, yk+m−1(x)) − ξ(x, yk−1(x))‖
≤ K‖yk+m−1(x) − yk−1(x)‖
≤ · · · ≤ Kk‖ym(x) − y0‖ → 0 as k → ∞.

This means that max_{x∈Dα} ‖ym(x) − yn(x)‖ → 0 as m, n → ∞.

Lemma 5.3 Let ξ(x, y) be a continuous mapping from Dα × Dβ to Rm with the property that ξ(x0, y0) = 0. Furthermore, suppose there exists a number K ∈ (0, 1) such that, for all y, y′ ∈ Dβ,

‖ξ(x, y) − ξ(x, y′)‖ < K‖y − y′‖.

Then there exists a unique continuous mapping ϕ : Dα → Rm for which

ϕ(x) − y0 = ξ(x, ϕ(x))

for x ∈ Dα with α > 0 sufficiently small.

With the help of all three lemmas above, we complete the proof of the Implicit Function Theorem. Define a function g(x, y) : Dα × Dβ → Rm by the equation

f(x, y) = Dyf(x0, y0)(y − y0) + g(x, y),

where f(x, y) and g(x, y) are m × 1 and Dyf(x0, y0) is m × m. Since |Dyf(x0, y0)| ≠ 0, the equation f(x, y) = 0 is equivalent to

y − y0 = −[Dyf(x0, y0)]−1 g(x, y).

By construction of g, we have g(x0, y0) = 0 and Dyg(x0, y0) = 0. Define

ξ(x, y) = −[Dyf(x0, y0)]−1 g(x, y).

Note also that ξ(x0, y0) = 0. By the mean-value theorem (Theorem 5.2), applied componentwise,

ξ(x, y) − ξ(x, y′) = −[Dyf(x0, y0)]−1 [Dyg(x, ỹ)] (y − y′)
= −[Dyf(x0, y0)]−1 [Dyf(x, ỹ) − Dyf(x0, y0)] (y − y′)
= {Im − [Dyf(x0, y0)]−1 Dyf(x, ỹ)} (y − y′),

where ỹ lies on the segment between y and y′. If we choose α > 0 and β > 0 small enough that (x, y) stays very close to (x0, y0), there exists K ∈ (0, 1) such that

‖ξ(x, y) − ξ(x, y′)‖ < K‖y − y′‖.

Now we can take two open balls B1 and B2 small enough that B1 ⊂ Dα and B2 ⊂ Dβ, as needed for the theorem. Then we can use Lemma 5.3, which completes the proof. □

Corollary 5.3 (A Version of the Implicit Function Theorem) Suppose f : R2 → R is C1 in an open set A containing (x0, y0), with f(x0, y0) = 0 and ∂f(x0, y0)/∂y ≠ 0. Then there exist an interval Ix = (x0 − δ, x0 + δ) and an interval Iy = (y0 − ε, y0 + ε) (with δ > 0 and ε > 0) such that Ix × Iy ⊂ A and:

1. for every x ∈ Ix, the equation f(x, y) = 0 has a unique solution in Iy, which defines y as a function y = ϕ(x) on Ix;

2. ϕ is C1 in Ix = (x0 − δ, x0 + δ), with derivative

dy/dx = ϕ′(x) = − [∂f(x, ϕ(x))/∂x] / [∂f(x, ϕ(x))/∂y].

Exercise 5.8 The point P = (x, y, z, u, v, w) = (1, 1, 0, −1, 0, 1) satisfies all of the equations

y^2 − z + u − v − w^3 = −1
−2x + y − z^2 + u + v^3 − w = −3
x^2 + z − u − v + w^3 = 3

Using the implicit function theorem, find du/dx, dv/dx, and dw/dx at P.


Chapter 6

Static Optimization

6.1 Unconstrained Optimization

6.1.1 Extreme Points

Let f(·) be a real-valued function of n variables x1, . . . , xn defined on a set S in Rn. Suppose that the point x∗ = (x∗1, . . . , x∗n) belongs to S and that the value of f at x∗ is greater than or equal to the values attained by f at all other points x = (x1, . . . , xn) ∈ S. Thus,

f(x∗) ≥ f(x) for all x ∈ S.    (∗)

Then x∗ is called a (global) maximum point for f in S and f(x∗) is called the maximum value. If the inequality (∗) is strict for all x ≠ x∗, then x∗ is a strict maximum point for f(·) in S. We define (strict) minimum point and minimum value by reversing the inequality sign in (∗). As collective names, we use extreme points and extreme values to cover both maxima and minima.

Theorem 6.1 Let f(·) be defined on a set S in Rn and let x∗ = (x∗1, . . . , x∗n) be an interior point in S at which f(·) has partial derivatives. A necessary condition for x∗ to be an extreme point for f is that x∗ is a stationary point for f(·), that is, it satisfies the equations

∇f(x∗) = 0 ⇐⇒ ∂f(x∗)/∂xi = 0 for i = 1, . . . , n.

Proof of Theorem 6.1: Suppose, on the contrary, that x∗ is a maximum point but not a stationary point for f(·). Then there is no loss of generality in assuming that there exists at least one i such that fi(x∗) > 0. Define x∗∗ = (x∗1, . . . , x∗i + ε, . . . , x∗n). Since x∗ is an interior point of S, one can ensure that x∗∗ ∈ S by choosing ε > 0 sufficiently small. Then, for ε small enough,

f(x∗∗) ≈ f(x∗) + ∇f(x∗) · (0, . . . , 0, ε, 0, . . . , 0) = f(x∗) + εfi(x∗) > f(x∗),

where the ε sits in the ith place. However, this contradicts the hypothesis that x∗ is a maximum point for f(·). □

The next theorem clarifies under what conditions the converse of the previous theorem (Theorem 6.1) holds.

Theorem 6.2 Suppose that the function f(·) is defined on a convex set S ⊂ Rn and let x∗ be an interior point of S. Assume that f(·) is C1 in a ball around x∗.

1. If f(·) is concave in S, then x∗ is a (global) maximum point for f(·) in S if and only if x∗ is a stationary point for f(·).

2. If f(·) is convex in S, then x∗ is a (global) minimum point for f(·) in S if and only if x∗ is a stationary point for f(·).

Proof of Theorem 6.2: We focus on the first part of the theorem; the second part follows once we take into account that −f is concave. (=⇒) This follows from Theorem 6.1 above. (⇐=) Suppose that x∗ is a stationary point for f(·) and that f(·) is concave. Recall the inequality in Theorem 5.6 (first-order characterization of concave functions). For any x ∈ S,

f(x) − f(x∗) ≤ ∇f(x∗) · (x − x∗) = 0    (∵ ∇f(x∗) = 0).

Thus, we have f(x) ≤ f(x∗) for any x ∈ S, as desired. □

6.1.2 Envelope Theorems for Unconstrained Maxima

Consider an objective function with a parameter vector r, of the form f(x, r) = f(x1, . . . , xn, r1, . . . , rk), where x ∈ S ⊂ Rn and r ∈ Rk. For each fixed r, suppose we have found the maximum of f(x, r) when x varies in S. The maximum value of f(x, r) usually depends on r. We denote this value by f∗(r) and call f∗ the value function. Thus,

f∗(r) = max_{x∈S} f(x, r).

The vector x that maximizes f(x, r) depends on r and is therefore denoted by x∗(r). Then f∗(r) = f(x∗(r), r).

Theorem 6.3 (Envelope Theorem) In the maximization problem max_{x∈S} f(x, r), where S ⊂ Rn and r ∈ Rk, suppose that there is a maximum point x∗(r) ∈ S for every r ∈ Bδ(r∗) with some δ > 0. Furthermore, assume that the mappings r → f(x∗(r∗), r) and r → f∗(r) are differentiable at r∗. Then

∇rf∗(r∗) = ( ∂f(x, r)/∂r1, ∂f(x, r)/∂r2, . . . , ∂f(x, r)/∂rk ) evaluated at x = x∗(r∗), r = r∗.

A change in r affects the value function f∗ both directly and indirectly through x∗(r). The envelope theorem says that we can ignore the indirect effect.

Proof of the Envelope Theorem: Define the function

ϕ(r) = f(x∗(r∗), r) − f∗(r).

Because x∗(r∗) is a maximum point of f(x, r) when r = r∗, one has ϕ(r∗) = 0 and ϕ(r) ≤ 0 for all r ∈ Bδ(r∗). Since ϕ attains a maximum at r∗, the following first-order condition is satisfied (because of Theorem 6.1):

∇rϕ(r)|_{r=r∗} = 0 ⇐⇒ ∂ϕ(r)/∂rj |_{r=r∗} = 0 for all j = 1, . . . , k.

That is,

∂ϕ(r)/∂rj = ∂f(x∗(r∗), r)/∂rj |_{r=r∗} − ∂f∗(r)/∂rj |_{r=r∗} = 0 for all j = 1, . . . , k. □

6.1.3 Local Extreme Points

The point x∗ is a local maximum point of f(·) in S if there exists an ε > 0 such that f(x) ≤ f(x∗) for all x ∈ Bε(x∗) ∩ S. If the inequality is strict for all x ≠ x∗ in Bε(x∗) ∩ S, then x∗ is a strict local maximum point for f(·) in S. A (strict) local minimum point is defined in the obvious way, and it should be clear what we mean by local maximum and minimum values, local extreme points, and local extreme values. A stationary point x∗ of f(·) that is neither a local maximum point nor a local minimum point is called a saddle point of f(·).

Before stating the next result, recall the n leading principal minors of the Hessian matrix D2f(x):

|D2(k)f(x)| =
| f11(x)  f12(x)  · · ·  f1k(x) |
| f21(x)  f22(x)  · · ·  f2k(x) |
|   ...     ...   . . .    ...  |
| fk1(x)  fk2(x)  · · ·  fkk(x) | ,   k = 1, . . . , n.

Theorem 6.4 (Sufficient Conditions for Local Extreme Points) Suppose that f(x) = f(x1, . . . , xn) is defined on a set S ⊂ Rn and that x∗ is an interior stationary point. Assume also that f(·) is C2 in an open ball around x∗. Then,

1. D2f(x∗) is positive definite =⇒ x∗ is a local minimum point.

2. D2f(x∗) is negative definite =⇒ x∗ is a local maximum point.


Proof of Theorem 6.4: We only focus on the first part of the theorem; the second part can be proved by replacing f(·) with −f(·). Since each fij(x) is continuous in x (because f(·) is C2), each leading principal minor is a continuous function of x. Therefore, if |D2(k)f(x∗)| > 0 for all k, it is possible to find a ball Bε(x∗) with ε > 0 so small that |D2(k)f(x)| > 0 for all x ∈ Bε(x∗) and all k = 1, . . . , n. By Theorem 4.6, the corresponding quadratic form is positive definite for all x ∈ Bε(x∗). It follows from Theorem 5.5 that f(·) is strictly convex in Bε(x∗). Then Theorem 6.2 shows that the stationary point x∗ is a minimum point for f in Bε(x∗). Hence, x∗ is a local minimum point for f(·). □

Lemma 6.1 If x∗ is an interior stationary point of f(·) such that |D2f(x∗)| ≠ 0 and D2f(x∗) is neither positive definite nor negative definite, then x∗ is a saddle point.
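The classification in Theorem 6.4 and Lemma 6.1 can be automated. A minimal sketch (assuming SymPy; the classify helper and the three example functions are my own illustrations): evaluate the Hessian at a stationary point and inspect the signs of its eigenvalues, which is equivalent to checking definiteness.

import sympy as sp

x, y = sp.symbols('x y')

def classify(f, point):
    """Classify an interior stationary point by the definiteness of the Hessian."""
    H = sp.hessian(f, [x, y]).subs({x: point[0], y: point[1]})
    eigs = [float(v) for v in H.eigenvals()]
    if all(v > 0 for v in eigs):
        return "local minimum"
    if all(v < 0 for v in eigs):
        return "local maximum"
    if any(v > 0 for v in eigs) and any(v < 0 for v in eigs):
        return "saddle point"
    return "test is inconclusive"

print(classify(x**2 + 2*y**2, (0, 0)))             # local minimum
print(classify(-x**2 - y**2 + x*y/2, (0, 0)))      # local maximum
print(classify(x**2 - y**2, (0, 0)))               # saddle point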

6.1.4 Necessary Conditions for Local Extreme Points

To study the behavior of f(·) near x∗, fix an arbitrary vector h in Rn with length 1, so ‖h‖ = 1, and define

g(t) = f(x∗ + th) = f(x∗1 + th1, . . . , x∗n + thn).

The function g(·) describes the behavior of f(·) along the straight line through x∗ parallel to the vector h ∈ Rn.

We have the following characterization of local extreme points.

Theorem 6.5 (Necessary Conditions for Local Extreme Points) Suppose that f(x) = f(x1, . . . , xn) is defined on a set S ⊂ Rn, and x∗ is an interior stationary point in S. Assume that f is C2 in a ball around x∗. Then,

1. x∗ is a local minimum point =⇒ D2f(x∗) is positive semidefinite.

2. x∗ is a local maximum point =⇒ D2f(x∗) is negative semidefinite.

Proof of Theorem 6.5: Suppose that x∗ is an interior local maximum point for f(·). Then, if ε > 0 is small enough, Bε(x∗) ⊂ S and f(x) ≤ f(x∗) for all x ∈ Bε(x∗). If t ∈ (−ε, ε), then x∗ + th ∈ Bε(x∗) because ‖(x∗ + th) − x∗‖ = ‖th‖ = |t| < ε. Then, for all t ∈ (−ε, ε), we have

f(x∗ + th) ≤ f(x∗) ⇐⇒ g(t) ≤ g(0).

Thus, the function g(·) has an interior maximum at t = 0. Using the chain rule, we obtain

g′(t) = ∑_{i=1}^n fi(x∗ + th)hi,
g′′(t) = ∑_{i=1}^n ∑_{j=1}^n fij(x∗ + th)hihj.

The condition g′′(0) ≤ 0 yields

∑_{i=1}^n ∑_{j=1}^n fij(x∗)hihj ≤ 0 for all h = (h1, . . . , hn) with ‖h‖ = 1.

This implies that the Hessian matrix D2f(x∗) is negative semidefinite. Theorem 4.6 shows that this is equivalent to checking the signs of all principal minors. The same argument can be used to establish the necessary condition for x∗ to be a local minimum point for f(·). □

Exercise 6.1 Find the local extreme values and classify the stationary points as maxima, minima, or neither.

1. f(x1, x2) = 2x1 − x1^2 − x2^2

2. f(x1, x2) = x1^2 + 2x2^2 − 4x2

3. f(x1, x2) = x1^3 − x2^2 + 2x2

4. f(x1, x2) = 4x1 + 2x2 − x1^2 + x1x2 − x2^2

5. f(x1, x2) = x1^3 − 6x1x2 + x2^3

6.2 Constrained Optimization

6.2.1 Equality Constraints: The Lagrange Problem

A general maximization problem with equality constraints is of the form

max_{x=(x1,...,xn)} f(x1, . . . , xn) subject to gj(x) = 0 for all j = 1, . . . , m (m < n).    (∗)

Define the Lagrangian

L(x) = f(x) − λ1g1(x) − · · · − λmgm(x),

where λ1, . . . , λm are called Lagrange multipliers. The necessary first-order conditions for optimality are then

∇L(x) = ∇f(x) − λDg(x) = 0 ⇐⇒ ∂L(x)/∂xi = ∂f(x)/∂xi − ∑_{j=1}^m λj ∂gj(x)/∂xi = 0 for all i = 1, . . . , n.    (∗∗)

Theorem 6.6 (N&S Conditions for Extreme Points with Equality Constraints) The following establishes necessary and sufficient conditions for the Lagrangian method.

1. (Necessity) Suppose that the functions f and g1, . . . , gm are defined on a set S in Rn and x∗ = (x∗1, . . . , x∗n) is an interior point of S that solves the maximization problem (∗). Assume further that f and g1, . . . , gm are C1 in a ball around x∗, and that the m × n Jacobian matrix

Dg(x∗) =
⎛ ∂g1(x∗)/∂x1  · · ·  ∂g1(x∗)/∂xn ⎞
⎜      ...      . . .      ...     ⎟
⎝ ∂gm(x∗)/∂x1  · · ·  ∂gm(x∗)/∂xn ⎠

has rank m. Then there exist unique numbers λ1, . . . , λm such that the first-order conditions (∗∗) are valid.

2. (Sufficiency) If there exist numbers λ1, . . . , λm and a feasible x∗ which together satisfy the first-order conditions (∗∗), and if the Lagrangian L(x) is concave in x, then x∗ solves the maximization problem (∗).

Proof of Theorem 6.6: (Necessity) The proof of the necessity part consists of three steps.

Step 1: Construction of an unconstrained maximization problem.

Since the m × n matrix Dg(x∗) is assumed to have rank m, it has an invertible (nonsingular) m × m submatrix. After renumbering the variables, if necessary, we can assume that it consists of the first m columns. By the implicit function theorem (Theorem 5.18), the m constraints g1, . . . , gm define x1, . . . , xm as C1 functions of the remaining variables x̃ = (xm+1, . . . , xn) in some open ball around x∗, i.e., Bε(x∗) with ε > 0 sufficiently small, so we can write

xj = hj(xm+1, . . . , xn) = hj(x̃), j = 1, . . . , m.

Then f(x1, . . . , xn) reduces to a composite function

ξ(x̃) = f(h1(x̃), . . . , hm(x̃), x̃)

of x̃ only. Now the maximization problem with equality constraints is translated into the unconstrained maximization of ξ(x̃) over the x̃ for which (h(x̃), x̃) ∈ Bε(x∗). Since x∗ is a local extreme point for f subject to the given constraints, ξ must have an unconstrained local extreme point at x̃∗ = (x∗m+1, . . . , x∗n). Hence, the partial derivatives of ξ with respect to xm+1, . . . , xn must be 0:

∂ξ(x̃∗)/∂xk = ( ∂f(x∗)/∂x1 · ∂h1/∂xk + · · · + ∂f(x∗)/∂xm · ∂hm/∂xk ) + ∂f(x∗)/∂xk = 0,    (1)

where the term in parentheses collects the indirect effects, the last term is the direct effect, and k = m + 1, . . . , n.

Step 2: Express ∂h1/∂xk, . . . , ∂hm/∂xk in terms of ∇gj(x∗).

For all x̃ in the ball, we have

gj(h1(x̃), . . . , hm(x̃), x̃) = 0, j = 1, . . . , m.

Differentiating this with respect to xk gives

∑_{s=1}^m (∂gj/∂xs)(∂hs/∂xk) + ∂gj/∂xk = 0,

for k = m + 1, . . . , n and j = 1, . . . , m. In particular, this is valid at x∗. Now, multiplying each of the m equations above by a scalar λj and adding these equations over j, we obtain

∑_{j=1}^m λj ( ∑_{s=1}^m (∂gj/∂xs)(∂hs/∂xk) ) + ∑_{j=1}^m λj ∂gj/∂xk = 0,    (2)

where k = m + 1, . . . , n. Next, subtracting (2) from (1),

∑_{s=1}^m ( ∂f(x∗)/∂xs − ∑_{j=1}^m λj ∂gj/∂xs ) ∂hs/∂xk + ∂f(x∗)/∂xk − ∑_{j=1}^m λj ∂gj/∂xk = 0,

where k = m + 1, . . . , n. These equations are valid for all choices of λ1, . . . , λm when the partial derivatives are evaluated at x∗. Suppose that we can prove the existence of numbers λ1, . . . , λm such that

∂f(x∗)/∂xs − ∑_{j=1}^m λj ∂gj/∂xs = 0, s = 1, . . . , m.    (3)

If (3) is satisfied, the displayed equation above implies that we also have

∂f(x∗)/∂xk − ∑_{j=1}^m λj ∂gj/∂xk = 0 for all k = m + 1, . . . , n.

Thus, the first-order necessary conditions (∗∗) for the Lagrangian are satisfied.

Step 3: Existence of λ1, . . . , λm satisfying (3).

We rewrite the system (3) as

(∂g1/∂x1)λ1 + (∂g2/∂x1)λ2 + · · · + (∂gm/∂x1)λm = ∂f/∂x1
(∂g1/∂x2)λ1 + (∂g2/∂x2)λ2 + · · · + (∂gm/∂x2)λm = ∂f/∂x2
· · · · · · · · ·
(∂g1/∂xm)λ1 + (∂g2/∂xm)λ2 + · · · + (∂gm/∂xm)λm = ∂f/∂xm,

all partial derivatives being evaluated at x∗. The coefficient matrix of this linear system is the transpose of the m × m submatrix of Dg(x∗) formed by its first m columns, which is invertible (nonsingular) by assumption. Hence the system has a unique solution λ1, . . . , λm. This completes the proof of the necessity part.

(Sufficiency) Suppose that the Lagrangian L(x) is concave. The first-order conditions (∗∗) imply that the Lagrangian is stationary at x∗. Then, by Theorem 6.2 (sufficiency for a global maximum),

L(x∗) = f(x∗) − ∑_{j=1}^m λjgj(x∗) ≥ f(x) − ∑_{j=1}^m λjgj(x) = L(x) for all x ∈ S.

But for all feasible x, we have gj(x) = 0, and of course gj(x∗) = 0, for all j = 1, . . . , m. Hence, this implies that f(x∗) ≥ f(x). Thus, x∗ solves the maximization problem (∗). □
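In small examples the first-order conditions (∗∗) can be solved symbolically. A minimal sketch (assuming SymPy; the problem max xy subject to x + y = 2 is my own illustration, not an exercise from the notes):

import sympy as sp

x, y, lam = sp.symbols('x y lambda_', real=True)
f = x * y                      # objective
g = x + y - 2                  # single equality constraint g(x, y) = 0

L = f - lam * g                # Lagrangian, as in (**)
foc = [sp.diff(L, v) for v in (x, y)] + [g]
sol = sp.solve(foc, (x, y, lam), dict=True)
print(sol)                     # x = y = 1 and lambda = 1

Because the Lagrangian is concave here on the relevant region, the sufficiency part of Theorem 6.6 confirms that (1, 1) is indeed the constrained maximum.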

6.2.2 Lagrange Multipliers as Shadow Prices

The optimal values of x1, . . . , xn in the maximization problem (∗) will, in general, depend upon a parameter vector r = (r1, . . . , rk) ∈ Rk. If x∗(r) = (x∗1(r), . . . , x∗n(r)) denotes the vector of optimal values of the choice variables, then the corresponding value

f∗(r) = f(x∗1(r), . . . , x∗n(r), r)

of f(·) is called the (optimal) value function for the maximization problem (∗). The values of the Lagrange multipliers will also depend on r; we write λj = λj(r) for j = 1, . . . , m. Let L(x, r) = f(x, r) − ∑_{j=1}^m λjgj(x, r) be the Lagrangian. Under certain conditions, we have

∂f∗(r)/∂ri = ( ∂L(x, r)/∂ri )|_{x=x∗(r)}, i = 1, . . . , k.

6.2.3 Tangent Hyperplane

A set of equality constraints in Rn,

g1(x) = 0, g2(x) = 0, . . . , gm(x) = 0,

defines a subset of Rn which is best viewed as a hypersurface. If, as we assume in this section, the functions gj, j = 1, . . . , m, belong to C1, the surface defined by them is said to be smooth. We introduce the tangent hyperplane M below:

M = {y ∈ Rn | Dg(x∗)y = 0}.

Note that the tangent hyperplane is a subspace of Rn.

Definition 6.1 A point x∗ satisfying the constraint g(x∗) = 0 is said to be a regular point of the constraint if the gradient vectors ∇g1(x∗), . . . , ∇gm(x∗) are linearly independent, that is, Rank(Dg(x∗)) = m.

6.2.4 Local First-Order Necessary Conditions

Lemma 6.2 Let x∗ be a regular point of the constraints g(x) = 0 and a local extreme point of f(·) subject to these constraints. Then, for any y ∈ Rn,

∇f(x∗)y = 0 whenever Dg(x∗)y = 0.

Proof of Lemma 6.2: Let y = (y1, . . . , yn) with ‖y‖ = 1 satisfy Dg(x∗)y = 0. Since x∗ is a regular point, the tangent hyperplane coincides with the set of such y’s, and there is a smooth curve x(t) on the constraint surface, with g(x(t)) = 0 for all t ∈ (−ε, ε) for some ε > 0, passing through x(0) = x∗ with derivative x′(0) = y. Then, since x∗ is a constrained local extreme point of f(·), we have

(d/dt) f(x(t))|_{t=0} = 0 =⇒ ∇f(x∗)x′(0) = 0,

equivalently, ∇f(x∗)y = 0. □

The above lemma says that ∇f(x∗) is orthogonal to the tangent hyperplane.

6.2.5 Second-Order Necessary and Sufficient Conditions for Local Ex-treme Points

Theorem 6.7 (Necessity for a Local Maximum) Suppose that x∗ is a local maximum of f(·) subject to g(x) = 0 and that x∗ is a regular point of these constraints. Then there is a λ ∈ Rm such that

∇f(x∗) − λDg(x∗) = 0.

If we denote by M the tangent hyperplane M = {h ∈ Rn | Dg(x∗)h = 0}, then the matrix

D2L(x∗) = D2f(x∗) − λD2g(x∗)

is negative semidefinite on M, that is,

hT D2L(x∗)h ≤ 0 for all h ∈ M.

Proof of Theorem 6.7: The first part follows from Theorem 6.6. We focus on the second part. Let h = (h1, . . . , hn) ∈ M with ‖h‖ = 1, and let x(t) be a smooth curve on the constraint surface g(x(t)) = 0 passing through x(0) = x∗ with derivative x′(0) = h. Since x∗ is a local maximum point for f subject to g(x) = 0, if ε > 0 is small enough,

L(x(t)) ≤ L(x∗) ⇐⇒ f(x(t)) − λg(x(t)) ≤ f(x∗) − λg(x∗)

for all t ∈ (−ε, ε). Define the function ϕ(t) = L(x(t)). Then, for all t ∈ (−ε, ε), we have

L(x(t)) ≤ L(x∗) ⇐⇒ ϕ(t) ≤ ϕ(0).

Thus, the function ϕ has an interior maximum at t = 0. Using the chain rule (Theorem 5.16), we obtain

ϕ′(t) = ∇L(x(t))x′(t), so ϕ′(0) = ∇L(x∗)h = ∇f(x∗)h − λDg(x∗)h = 0

because h ∈ M, so that ∇f(x∗)h = 0 and Dg(x∗)h = 0. Differentiating once more and using ∇L(x∗) = 0 from the first part,

ϕ′′(0) = hT D2L(x∗)h.

The hypothesis that ϕ has an interior local maximum at t = 0 means ϕ′′(0) ≤ 0. Thus,

hT D2L(x∗)h ≤ 0 ⇐⇒ ∑_{i=1}^n ∑_{j=1}^n Lij(x∗)hihj ≤ 0.

This implies that the Hessian matrix D2L(x∗) is negative semidefinite on M. □

Theorem 6.8 (Sufficiency for a Local Maximum) Suppose there is a point x∗ ∈ Rn satisfying g(x∗) = 0 and a λ ∈ Rm such that

∇f(x∗) − λDg(x∗) = 0.

Suppose also that the matrix D2L(x∗) = D2f(x∗) − λD2g(x∗) is negative definite on M = {y ∈ Rn | Dg(x∗)y = 0}, that is, for y ∈ M with y ≠ 0, yT D2L(x∗)y < 0. Then x∗ is a strict local maximum of f(·) subject to g(x) = 0.

Proof of Theorem 6.8: Define the Lagrangian

L(x) = f(x) − λg(x).

Differentiating this with respect to x and evaluating at x∗, we obtain

∇L(x∗) = ∇f(x∗) − λDg(x∗) = 0.

By our hypothesis, D2L(x∗) is negative definite on M; applying the argument of Theorem 6.4 restricted to feasible directions (which lie, to first order, in M), x∗ is a strict local maximum point of L(x) on the constraint set. Since L(x) = f(x) for every feasible x, this implies that x∗ is a strict local maximum of f(·) subject to g(x) = 0. □

Exercise 6.2 Solve the problem

max {x + 4y + z} subject to x^2 + y^2 + z^2 = 216 and x + 2y + 3z = 0.

Exercise 6.3 Consider the problem (assuming m ≥ 4)

max U(x1, x2) = (1/2) ln(1 + x1) + (1/4) ln(1 + x2) subject to 2x1 + 3x2 = m.

Answer the following questions.

1. Let x∗1(m) and x∗2(m) denote the values of x1 and x2 that solve the above maximization problem. Find these functions and the corresponding Lagrange multiplier.

2. The optimal value U∗ of U(x1, x2) is a function of m. Find an explicit expression for U∗(m), and show that dU∗/dm = λ.

6.2.6 Envelope Result for Lagrange Problems

In economic optimization problems, the objective function as well as the constraint functions (such as those defining the budget set) will often depend on parameters. These parameters are held constant when optimizing (remember the price-taking behavior assumption), but they can vary with the economic situation. We might want to know what happens to the optimal value function when the parameters change.

Consider the following general Lagrange problem:

max_{x∈S} f(x, r) subject to gj(x, r) = 0, j = 1, . . . , m,

where r = (r1, . . . , rk) is a vector of parameters. The values of x1, . . . , xn that solve the maximization problem will be functions of r. If we denote them by x∗1(r), . . . , x∗n(r), then

f∗(r) = f(x∗1(r), . . . , x∗n(r), r)

is called the value function. Suppose that λi = λi(r), i = 1, . . . , m, are the Lagrange multipliers in the first-order conditions for the maximization problem and let

L(x, r) = f(x, r) − λ · g(x, r)

be the Lagrangian, where λ = (λ1, . . . , λm) and g(x, r) = (g1(x, r), . . . , gm(x, r)). As in Section 6.2.2, under suitable regularity conditions the envelope result is

∂f∗(r)/∂ri = ( ∂L(x, r)/∂ri )|_{x=x∗(r)}, i = 1, . . . , k.

6.3 Inequality Constraints: Nonlinear Programming

Consider the problem

max_{x∈S} f(x) subject to g1(x1, . . . , xn) ≤ 0, g2(x1, . . . , xn) ≤ 0, . . . , gm(x1, . . . , xn) ≤ 0.

A vector x = (x1, . . . , xn) that satisfies all the constraints is called feasible. The set of all feasible vectors is called the feasible set. We assume that f(·) and all the gj functions are C1. In the case of equality constraints, the number of constraints was assumed to be strictly less than the number of variables; this is not necessary in the case of inequality constraints. An inequality constraint gj(x) ≤ 0 is said to be active (binding) at x if gj(x) = 0 and inactive (non-binding) at x if gj(x) < 0.

Note that minimizing f(x) is equivalent to maximizing −f(x). Moreover, an inequality constraint of the form gj(x) ≥ 0 can be rewritten as −gj(x) ≤ 0. In this way, most constrained optimization problems can be expressed in the above form.

We define the Lagrangian exactly as before:

L(x) = f(x) − λ · g(x) = f(x) − ∑_{j=1}^m λjgj(x),

where λ = (λ1, . . . , λm) ∈ Rm are the Lagrange multipliers. Again the first-order partial derivatives of the Lagrangian are equated to 0:

∇L(x) = ∇f(x) − λDg(x) = 0 ⇐⇒ ∂L(x)/∂xi = ∂f(x)/∂xi − ∑_{j=1}^m λj ∂gj(x)/∂xi = 0 for all i = 1, . . . , n.    (∗)

In addition, we introduce the complementary slackness conditions: for all j = 1, . . . , m,

λj ≥ 0, and λj = 0 if gj(x) < 0.    (∗∗)

An alternative formulation of this condition is that for any j = 1, . . . , m,

λj ≥ 0 and λjgj(x) = 0.

In particular, if λj > 0, we must have gj(x) = 0. However, it is perfectly possible to have both λj = 0 and gj(x) = 0.

Conditions (∗) and (∗∗) are often called the Kuhn-Tucker conditions. They are (essentially but not quite) necessary conditions for a feasible vector to solve the maximization problem. In general, they are definitely not sufficient on their own. Suppose one can find a point x∗ at which f(·) is stationary and gj(x∗) < 0 for all j = 1, . . . , m. Then the Kuhn-Tucker conditions will automatically be satisfied by x∗ together with all the Lagrange multipliers λj = 0 for j = 1, . . . , m.

Theorem 6.9 (Sufficiency for the Kuhn-Tucker Conditions I) Consider the maximization problem and suppose that x∗ is feasible and satisfies conditions (∗) and (∗∗). If the Lagrangian L(x) = f(x) − λ · g(x) (with the λ values obtained from the recipe) is concave, then x∗ is optimal.

Proof of Theorem 6.9: This is very much the same as the sufficiency part of the Lagrange problem in Theorem 6.6. Since L(x) is concave by assumption and ∇L(x∗) = 0 from (∗), by Theorem 6.2, x∗ is a global maximum point of L(x). Hence, for all x ∈ S,

f(x∗) − ∑_{j=1}^m λjgj(x∗) ≥ f(x) − ∑_{j=1}^m λjgj(x).

Rearranging gives the equivalent inequality

f(x∗) − f(x) ≥ ∑_{j=1}^m λj (gj(x∗) − gj(x)).

Thus, it suffices to show that

∑_{j=1}^m λj (gj(x∗) − gj(x)) ≥ 0

for all feasible x, because this will imply that x∗ solves the maximization problem. Suppose that gj(x∗) < 0. Then (∗∗) shows that λj = 0, so the jth term vanishes. Suppose that gj(x∗) = 0. Then λj(gj(x∗) − gj(x)) = −λjgj(x) ≥ 0 because x is feasible, i.e., gj(x) ≤ 0, and λj ≥ 0. Hence ∑_{j=1}^m λj(gj(x∗) − gj(x)) ≥ 0, as desired. □

Theorem 6.10 (Sufficiency for the Kuhn-Tucker Conditions II) Consider the maximization problem and suppose that x∗ is feasible and satisfies conditions (∗) and (∗∗). If f(·) is concave and each λjgj(x) (with the λ values obtained from the recipe) is quasiconvex, then x∗ is optimal.

Proof of Theorem 6.10: We want to show that f(x) − f(x∗) ≤ 0 for all feasible x. Since f(·) is concave, according to Theorem 5.6 (first-order characterization of concavity),

f(x) − f(x∗) ≤ ∇f(x∗) · (x − x∗) = ∑_{j=1}^m λj∇gj(x∗) · (x − x∗),

where the equality uses the first-order condition (∗). It therefore suffices to show that for all j = 1, . . . , m and all feasible x,

λj∇gj(x∗) · (x − x∗) ≤ 0.

The above inequality is satisfied for those j such that gj(x∗) < 0, because then λj = 0 from the complementary slackness condition (∗∗). For those j such that gj(x∗) = 0, we have gj(x) ≤ gj(x∗) (because x is feasible), and hence −λjgj(x) ≥ −λjgj(x∗) because λj ≥ 0. Since the function −λjgj(x) is quasiconcave (because λjgj(x) is quasiconvex), it follows from Theorem 5.10 (first-order characterization of quasiconcavity) that ∇(−λjgj(x∗)) · (x − x∗) ≥ 0, and thus λj∇gj(x∗) · (x − x∗) ≤ 0. □

Exercise 6.4 Reformulate the problem

min 4 ln(x^2 + 2) + y^2 subject to x^2 + y ≥ 2, x ≥ 1

as a standard Kuhn-Tucker maximization problem and write down the necessary Kuhn-Tucker conditions. Moreover, find the solution of the problem (take it for granted that there is a solution).

6.4 Properties of the Value Function

Consider the standard nonlinear programming problem

max f(x) subject to gj(x) ≤ bj for all j = 1, . . . , m.

The optimal value of the objective f(x) obviously depends upon the parameter vector b ∈ Rm appearing in the constraint set. The function defined by

f∗(b) = max {f(x) | gj(x) ≤ bj, j = 1, . . . , m}

assigns to each b = (b1, . . . , bm) the optimal value f∗(b) of f(·). It is called the value function for the problem. Let the optimal choice of x in the constrained optimization problem be denoted by x∗(b), and assume that it is unique. Let λj(b), j = 1, . . . , m, be the corresponding Lagrange multipliers. Then, if ∂f∗(b)/∂bj exists,

∂f∗(b)/∂bj = λj(b) for all j = 1, . . . , m.

The value function f∗ is not necessarily C1. The next proposition characterizes a geometric property of the value function.

Proposition 6.1 If f(x) is concave and gj(x) is convex for each j = 1, . . . , m, then f∗(b) is concave.

Proof of Proposition 6.1: Suppose that b′ and b′′ are two arbitrary parameter vectors, and let f∗(b′) = f(x∗(b′)) and f∗(b′′) = f(x∗(b′′)), with gj(x∗(b′)) ≤ b′j and gj(x∗(b′′)) ≤ b′′j for j = 1, . . . , m. Let α ∈ [0, 1]. Corresponding to the vector αb′ + (1 − α)b′′, there exists an optimal solution x∗(αb′ + (1 − α)b′′), and

f∗(αb′ + (1 − α)b′′) = f(x∗(αb′ + (1 − α)b′′)).

Define xα = αx∗(b′) + (1 − α)x∗(b′′). Then convexity of gj for j = 1, . . . , m implies that

gj(xα) ≤ αgj(x∗(b′)) + (1 − α)gj(x∗(b′′)) ≤ αb′j + (1 − α)b′′j.

Thus, xα is feasible in the problem with parameter αb′ + (1 − α)b′′, a problem in which x∗(αb′ + (1 − α)b′′) is optimal. It follows that

f(xα) ≤ f(x∗(αb′ + (1 − α)b′′)) = f∗(αb′ + (1 − α)b′′).

On the other hand, concavity of f implies that

f(xα) ≥ αf(x∗(b′)) + (1 − α)f(x∗(b′′)) = αf∗(b′) + (1 − α)f∗(b′′).

In sum,

f∗(αb′ + (1 − α)b′′) ≥ αf∗(b′) + (1 − α)f∗(b′′).

This shows that f∗(b) is concave. □

6.5 Constraint Qualifications

Consider the maximization problem

max f(x) subject to gj(x) ≤ 0, j = 1, . . . , m.

Definition 6.2 The constrained maximization problem satisfies the constraint qualification at x∗ if the gradient vectors ∇gj(x∗) (1 ≤ j ≤ m) corresponding to those constraints that are active (binding) at x∗ are linearly independent.

An alternative formulation of this condition is: delete all rows of the Jacobian matrix Dg(x∗) that correspond to constraints that are inactive (not binding) at x∗; the remaining matrix should have rank equal to its number of rows.

Theorem 6.11 (Kuhn-Tucker Necessary Conditions) Suppose that x∗ = (x∗1, . . . , x∗n) solves the constrained maximization problem, where f(·) and g1(·), . . . , gm(·) are C1 functions. Suppose furthermore that the maximization problem satisfies the constraint qualification at x∗. Then there exist unique numbers λ1, . . . , λm such that the Kuhn-Tucker conditions (∗) and (∗∗) hold at x = x∗.

Proof of the Kuhn-Tucker Necessary Conditions: We assume the following:

1. x∗ ∈ Rn maximizes f on the constraint set gj(x) ≤ 0 for all j = 1, . . . , m;

2. only g1, . . . , gk are binding at x∗, where k ≤ m;

3. the k × n Jacobian matrix Dgk(x∗) of the binding constraints has maximal rank k, that is,

k = Rank[Dgk(x∗)] = Rank
⎛ ∂g1(x∗)/∂x1  · · ·  ∂g1(x∗)/∂xn ⎞
⎜      ...      . . .      ...     ⎟
⎝ ∂gk(x∗)/∂x1  · · ·  ∂gk(x∗)/∂xn ⎠ .

The proof consists of two steps.

Step 1: ∇L(x, λ) = 0 and λg(x) = 0

Since each gj(·) is a continuous function, there is a open ball Bε(x∗) such that gj(x) <0 for all x ∈ Bε(x∗) and for j = k + 1, . . . ,m. We will work in the open ball Bε(x∗) forthe rest of proof.

Note that x∗ maximizes f(·) in Bε(x∗) over the constraint set that gj(x) = 0 forj = 1, . . . , k. By assumption, Theorem 6.7 (Necessity for Optimization with EqualityConstraints) applies and therefore, there exist μ∗

1, . . . , μ∗k such that

∇L(x∗, μ∗) = 0 and gj(x∗) = 0 ∀j = 1, . . . , k

where L(x, μ) ≡ f(x) −∑kj=1 μjg

j(x) as the restricted Lagrangian.

Consider the usual Lagrangian

L(x, λ1, . . . , λm) ≡ f(x) − ∑_{j=1}^m λjgj(x).

Let λ∗i = μ∗i for i = 1, . . . , k and λ∗i = 0 for i = k + 1, . . . ,m. Then, we see that (x∗, λ∗) is a solution of the n + m equations in n + m unknowns:

∂L/∂xi (x∗, λ∗) = 0  ∀i = 1, . . . , n
λ∗jgj(x∗) = 0  ∀j = 1, . . . ,m

Step 2: λj ≥ 0 for all j

There is a C1 curve x(t) defined for t ∈ [0, ε) such that x(0) = x∗ and, for all t ∈ [0, ε),

g1(x(t)) = −t and gj(x(t)) = 0 for j = 2, . . . , k

By the implicit function theorem (Theorem 5.17), we can still solve the constrained optimization problem in Bε(x∗) even if we slightly perturb the constraint set; this is what guarantees the existence of such a curve. Let h = x′(0). Using the chain rule (Theorem 5.15), we conclude that

∇g1(x∗)h = −1, ∇gj(x∗)h = 0  ∀j = 2, . . . , k

Since x(t) lies in the constraint set for all t and x∗ maximizes f(·) on the constraint set, f(·) must be nonincreasing along x(t). Therefore,

d/dt f(x(t)) |_{t=0} = ∇f(x∗)h ≤ 0

By our first-order conditions, we execute a series of computations:

0 = ∇L(x∗, λ∗)h
  = ∇f(x∗)h − ∑_{j=1}^k λj∇gj(x∗)h
  = ∇f(x∗)h − λ1∇g1(x∗)h
  = ∇f(x∗)h + λ1

Since ∇f(x∗)h ≤ 0, we conclude that λ1 ≥ 0. A similar argument shows that λj ≥ 0 for j = 2, . . . , k. This completes the proof. □
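As a small sanity check of the theorem, the following sketch verifies the Kuhn-Tucker conditions at the known solution of a toy problem of my own (maximize x + y over the unit disk), where x∗ = (1/√2, 1/√2) and λ = 1/√2; the example is not taken from the notes.

```python
import numpy as np

# Toy problem (my own): maximize f(x, y) = x + y subject to g(x, y) = x^2 + y^2 - 1 <= 0.

f_grad = np.array([1.0, 1.0])                  # gradient of f (constant)
x_star = np.array([1.0, 1.0]) / np.sqrt(2.0)   # candidate maximizer
lam = 1.0 / np.sqrt(2.0)                       # candidate multiplier
g_grad = 2.0 * x_star                          # gradient of g at x*

# Stationarity: grad f - lambda * grad g = 0
print("stationarity residual:", f_grad - lam * g_grad)

# Sign condition and complementary slackness
g_val = x_star @ x_star - 1.0
print("lambda >= 0:", lam >= 0, " lambda * g(x*):", lam * g_val)

# Brute-force comparison over feasible grid points
grid = np.linspace(-1, 1, 401)
X, Y = np.meshgrid(grid, grid)
feasible = X**2 + Y**2 <= 1.0
print("grid max of f:", (X + Y)[feasible].max(), " f(x*):", x_star.sum())
```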

Theorem 6.12 (Kuhn-Tucker N & S Conditions) Assume that a feasible vector x∗ and a set of multipliers λ1, . . . , λm satisfy the Kuhn-Tucker necessary conditions (∗) and (∗∗) for the constrained maximization problem. Define J = {j | gj(x∗) = 0}, the set of active (binding) constraints, and assume that λj > 0 for all j ∈ J. Consider the Lagrangian problem

max f(x) subject to gj(x) = 0 ∀j ∈ J

Then, x∗ satisfies

∇L(x∗) = ∇f(x∗) − ∑_{j∈J} λj∇gj(x∗) = 0

for the given multipliers λj for j ∈ J. If D2L(x∗) is negative definite on M, then x∗ is a strict local maximum point for the original constrained maximization problem. Here

M = {h ∈ Rn | ∇gj(x∗)h = 0 ∀j ∈ J}

Proof of Kuhn-Tucker N & S Conditions: Suppose, on the contrary, that x∗ is not a strict local maximum point of the constrained optimization problem. Then, we can consider a sequence {yk} of feasible points converging to x∗ such that f(yk) ≥ f(x∗) for each k. More specifically, for each k, write yk = x∗ + εkhk with ‖hk‖ = 1 and εk > 0. We may assume that εk → 0 and hk → h∗ as k → ∞. Using the linear approximation through differentiability,

f(yk) ≈ f(x∗) + ∇f(x∗) · (yk − x∗) = f(x∗) + εk∇f(x∗) · hk

for k large enough. Letting k → ∞, because ∇f(x∗)hk is linear (hence continuous) in hk, we must have ∇f(x∗)h∗ ≥ 0 from f(yk) ≥ f(x∗). Also, for each binding (active) constraint gj, we have

gj(yk) ≤ gj(x∗)

Again, using the linear approximation through differentiability,

gj(yk) ≈ gj(x∗) + ∇gj(x∗) · (yk − x∗) = gj(x∗) + εk∇gj(x∗) · hk

for k large enough. Then, we must have ∇gj(x∗)h∗ ≤ 0 because ∇gj(x∗)hk is linear and continuous in hk and gj(yk) ≤ gj(x∗) for each k.

If ∇gj(x∗)h∗ = 0 for all j ∈ J, then the proof goes through just as in Theorem 6.8. Therefore, suppose that there exists at least one j ∈ J such that ∇gj(x∗)h∗ < 0. Then, we obtain

∇f(x∗)h∗ − ∑_{j∈J} λj∇gj(x∗)h∗ > 0, because λj > 0 for all j ∈ J,

that is,

[∇f(x∗) − ∑_{j∈J} λj∇gj(x∗)] h∗ > 0,

where the term in brackets equals 0. This, however, contradicts the condition ∇f(x∗) − ∑_{j∈J} λj∇gj(x∗) = 0 satisfied at x∗. This completes the proof. □

Exercise 6.5 Consider the following constrained maximization problem.

max f(x, y) = x subject to g(x, y) = x^3 + y^2 = 0.

Show that this problem does not satisfy the constraint qualification.

6.6 Nonnegativity Constraints

Consider the nonlinear programming problem with nonnegativity constraints:

max f(x) subject to gj(x) ≤ 0 ∀j = 1, . . . ,m and xi ≥ 0 for all i = 1, . . . , n

We introduce n new constraints in addition to the m original ones:

gm+1(x) = −x1 ≤ 0
gm+2(x) = −x2 ≤ 0
        ...
gm+n(x) = −xn ≤ 0

We introduce the Lagrange multipliers μ1, . . . , μn to go with the new constraints and form the extended Lagrangian

L1(x) = f(x) − ∑_{j=1}^m λjgj(x) − ∑_{i=1}^n μi(−xi)

The necessary conditions for x∗ to solve the problem are

∂f(x∗)/∂xi − ∑_{j=1}^m λj ∂gj(x∗)/∂xi + μi = 0,  ∀i = 1, . . . , n

λj ≥ 0 and λj = 0 if gj(x∗) < 0,  ∀j = 1, . . . ,m

μi ≥ 0 and μi = 0 if x∗i > 0,  ∀i = 1, . . . , n


To avoid carrying around m + n constraints and m + n Lagrange multipliers, the necessary conditions for the optimization problem are sometimes formulated slightly differently, as follows:

∂f(x∗)/∂xi − ∑_{j=1}^m λj ∂gj(x∗)/∂xi ≤ 0 (= 0 if x∗i > 0),  ∀i = 1, . . . , n

This formulation follows from the first-order condition

∂f(x∗)/∂xi − ∑_{j=1}^m λj ∂gj(x∗)/∂xi = −μi,  ∀i = 1, . . . , n

together with the requirement that μi ≥ 0, and μi = 0 if x∗i > 0.
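The reduced formulation is easy to verify numerically at a candidate solution. The toy problem below is my own illustration, not one from the notes: maximize ln(1 + x1) + 2x2 subject to x1 + x2 ≤ 1 and x1, x2 ≥ 0, with candidate x∗ = (0, 1) and multiplier λ = 2.

```python
import numpy as np

# Sketch (my own toy problem) of the reduced conditions: maximize
# f(x1, x2) = ln(1 + x1) + 2*x2  subject to  g(x) = x1 + x2 - 1 <= 0, x1, x2 >= 0.

x_star = np.array([0.0, 1.0])
lam = 2.0
grad_f = np.array([1.0 / (1.0 + x_star[0]), 2.0])   # gradient of f at x*
grad_g = np.array([1.0, 1.0])                        # gradient of g

reduced = grad_f - lam * grad_g                      # should be <= 0, with = 0 where x*_i > 0
print("reduced condition:", reduced)                 # [-1.  0.]
print("<= 0 everywhere, = 0 where x*_i > 0:",
      np.all(reduced <= 1e-12) and abs(reduced[1]) < 1e-12)
print("complementary slackness lambda*g(x*):", lam * (x_star.sum() - 1.0))
```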

6.7 Concave Programming Problems

The constrained maximization problem is said to be a concave programming problem when f(·) is concave and each gj is a convex function. In this case, the set of feasible vectors satisfying the m constraints is convex. We write the concave program as follows:

max f(x) subject to g(x) ≤ 0

where g(x) = (g1(x), . . . , gm(x)) and 0 = (0, . . . , 0).

Definition 6.3 Let f(·) be concave on a convex set S ⊂ Rn, and let x0 be an interior point of S. Then, there exists a vector p ∈ Rn such that for all x ∈ S,

f(x) − f(x0) ≤ p · (x − x0).

A vector p that satisfies the above inequality is called a supergradient for f at x0.

Definition 6.4 The nonlinear programming problem satisfies the Slater qualification if there exists a vector z ∈ Rn such that g(z) ≪ 0, i.e., gj(z) < 0 for all j.

Theorem 6.13 (Necessary Conditions for Concave Programming) Suppose that the nonlinear programming problem is a concave programming problem satisfying the Slater constraint qualification. Then, the optimal value function f∗(c) is defined for (at least) all c ≥ g(z), and has a supergradient at 0. Furthermore, if λ is any supergradient of f∗ at 0, then λ ≥ 0, and any solution x∗ of the concave programming problem is an unconstrained maximum point of the Lagrangian L(x, λ) = f(x) − λ · g(x) which also satisfies λ · g(x∗) = 0 (the complementary slackness condition).

Proof of Necessity for Concave Programming: We consider only the special but usual case where, for all c ∈ Rm, the feasible set of points x that satisfy g(x) ≤ c is bounded, and hence compact because the functions gj are C1, i.e., continuous (see Theorem 3.16). In this case, f∗(c) is defined as a maximum value whenever there exists at least one x satisfying g(x) ≤ c, which is certainly true when c ≥ g(z). Then, f∗ is defined for all c ≥ g(z).


Theorem 6.14 (Sufficient Conditions for Concave Programming) Consider the nonlinear programming problem with f(·) concave and g(·) convex, and assume that there exist a vector λ ≥ 0 and a feasible vector x∗ which together have the property that x∗ maximizes f(x) − λ · g(x) among all x ∈ Rn, and λ · g(x∗) = 0. Then, x∗ solves the original concave problem and λ is a supergradient for f∗ at 0.
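Because the Lagrangian of a concave program is concave in x whenever λ ≥ 0, the hypotheses of this sufficiency theorem can be checked directly. The one-dimensional example below is my own sketch of such a check, not an example from the notes: f(x) = −(x − 2)^2, g(x) = x − 1, candidate x∗ = 1 and λ = 2.

```python
import numpy as np

# Sketch of the sufficiency conditions (my own toy problem):
#   max f(x) = -(x - 2)^2   subject to   g(x) = x - 1 <= 0.
# L(x, lambda) = f(x) - lambda*g(x) is concave in x, so a grid search suffices
# to confirm that x* = 1 is its unconstrained maximizer when lambda = 2.

lam = 2.0
x_star = 1.0
L = lambda x: -(x - 2.0) ** 2 - lam * (x - 1.0)

grid = np.linspace(-10.0, 10.0, 200001)
print("x* maximizes L over the grid:", abs(grid[np.argmax(L(grid))] - x_star) < 1e-3)
print("lambda >= 0 and lambda*g(x*) = 0:", lam >= 0.0, lam * (x_star - 1.0) == 0.0)
# By the theorem, x* = 1 therefore solves the constrained problem.
```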

6.8 Quasiconcave Programming

The following theorem is important for economists, because in many economic optimization problems the objective function is assumed to be quasiconcave, rather than concave.

Theorem 6.15 (Arrow and Enthoven (1961) in Econometrica) (Sufficient Conditions for Quasiconcave Programming): Consider the constrained optimization problem where the objective function f(·) is C1 and quasiconcave. Assume that there exist numbers λ1, . . . , λm and a vector x∗ such that

1. x∗ is feasible and satisfies the Kuhn-Tucker conditions.

2. ∇f(x∗) ≠ 0.

3. λjgj(x) is quasiconvex for each j = 1, . . . ,m.

Then, x∗ is optimal.

6.9 Appendix: Linear Programming


Chapter 7

Differential Equations

7.1 Introduction

What is a differential equation? As the name suggests, it is an equation. Unlike ordinary algebraic equations, in a differential equation:

• The unknown is a function, not a number.

• The equation includes one or more of the derivatives of the function.

An ordinary differential equation is one for which the unknown is a function of only one variable. Partial differential equations are equations where the unknown is a function of two or more variables, and one or more of the partial derivatives of the function are included. In this chapter, I restrict attention to first-order ordinary differential equations – that is, equations that involve the first-order derivative of an unknown function of one variable.

Consider the following differential equation:

ẋ = dx/dt = ax   (∗)

where x = x(t) is a real-valued function of t ∈ R and ẋ = dx/dt is the first-order derivative of x(t). The above equation says that for any t ∈ R,

x′(t) = ax(t),

where a is some constant. I propose here x(t) = Ke^{at}, where K is an arbitrary constant, as a solution to the differential equation.
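A quick numerical check, with constants a and K chosen arbitrarily by me, confirms that this proposed solution satisfies the equation: a finite-difference derivative of x(t) agrees with ax(t).

```python
import numpy as np

# Check (arbitrary constants of my own) that x(t) = K*exp(a*t) solves x' = a*x.
a, K = 0.7, 3.0
x = lambda t: K * np.exp(a * t)

t = np.linspace(0.0, 2.0, 9)
h = 1e-6
x_dot = (x(t + h) - x(t - h)) / (2.0 * h)     # centered finite-difference derivative
print(np.max(np.abs(x_dot - a * x(t))))        # close to 0, so the equation is satisfied
```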


Chapter 8

Fixed Point Theorems

The fixed point problem is

Given a set S ⊂ Rn and a function f : S → S, is there an x ∈ S such that f(x) = x?

The problem of finding a zero of a function f(·), that is, an x ∈ S such that f(x) = 0, can be converted into a fixed point problem. To see this, observe that f(x) = 0 iff g(x) = x, where g(x) = f(x) + x. Unconstrained optimization with a concave objective function is a special case: there the optimal solution is found by solving ∇f(x) = 0, which is exactly such a zero-finding problem.

8.1 Banach Fixed Point Theorem

Definition 8.1 A function f : S → S is said to be a contraction mapping if d(f(x), f(y)) ≤ θd(x, y) for all x, y ∈ S, where 0 ≤ θ < 1 is a fixed constant.

Theorem 8.1 (Banach Fixed Point Theorem) Let S ⊂ Rn be closed and f : S → S a contraction mapping. Then, there exists a unique x ∈ S such that f(x) = x.

Proof of Banach Fixed Point Theorem: We define the norm of vectors in Rn as follows:

‖x‖ ≡ max_{1≤i≤n} |xi|

This is the norm used in the proof of the implicit function theorem (Theorem 5.17). Choose any x0 ∈ S and let xk = f(xk−1). If the sequence {xk} has a limit x∗, then x∗ ∈ S because S is closed, and f(x∗) = x∗ because a contraction mapping is continuous. Therefore, it suffices to prove that {xk} has a limit. We use the Cauchy criterion. Pick q > p. Then,

‖xq − xp‖ = ‖∑_{k=p}^{q−1} (xk+1 − xk)‖ ≤ ∑_{k=p}^{q−1} ‖xk+1 − xk‖,

where the inequality is the triangle (Minkowski) inequality.


But

‖xk+1 − xk‖ = ‖f(xk) − f(xk−1)‖ ≤ θ‖xk − xk−1‖.

Repeated application of the above yields

‖xk+1 − xk‖ ≤ θ^k ‖x1 − x0‖

Hence

‖xq − xp‖ ≤ ∑_{k=p}^{q−1} θ^k ‖x1 − x0‖ ≤ ‖x1 − x0‖(θ^p + θ^{p+1} + · · ·) = ‖x1 − x0‖ θ^p/(1 − θ) → 0 as p, q → ∞,

because θ^p → 0 as p → ∞, since θ < 1. Thus {xk} is a Cauchy sequence and therefore converges. □
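The proof is constructive: iterating f from any starting point converges to the fixed point. The following sketch runs this iteration for a contraction of my own choosing, f(x) = 0.5 cos(x) on R, which satisfies the definition with θ = 1/2.

```python
import numpy as np

# Fixed-point iteration x_k = f(x_{k-1}) for the contraction f(x) = 0.5*cos(x),
# which satisfies |f(x) - f(y)| <= 0.5*|x - y|  (theta = 0.5 < 1).

f = lambda x: 0.5 * np.cos(x)

x = 10.0                       # arbitrary starting point x0
for _ in range(60):
    x = f(x)

print("approximate fixed point:", x)
print("residual |f(x) - x|:", abs(f(x) - x))   # essentially zero
```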

8.2 Brouwer Fixed Point Theorem

Lemma 8.1 If f : [0, 1] → [0, 1] is continuous, there exists x ∈ [0, 1] such that f(x) = x.

Proof of Lemma 8.1: Each x ∈ [0, 1] can be represented as a convex combination of the end points of the interval:

x = (1 − x) × 0 + x × 1

The same is true for f(x). So we express each x ∈ [0, 1] as a pair of nonnegative numbers (x1, x2) = (1 − x, x) that add to one. When expressing f(x) in this way, we write it as (f1(x), f2(x)) = (1 − f(x), f(x)). Suppose, for a contradiction, that f has no fixed point.

Since f : [0, 1] → [0, 1], we can think of the function f(·) as moving each point x ∈ [0, 1] either to the right (if f(x) > x) or to the left (if f(x) < x). The assumption that f(·) has no fixed point eliminates the possibility that f(·) leaves the position of x unchanged.

Given any x ∈ [0, 1], we label it with a “+” if f1(x) < x1 (move to the right) and label it “−” if f1(x) > x1 (move to the left). The assumption of no fixed point implies f1(x) ≠ x1 for all x ∈ [0, 1]. Thus, the labeling scheme is well defined. Notice that the point 0 will be labeled (+) and the point 1 will be labeled (−).

Choose any finite partition, Π0, of the interval [0, 1] into smaller intervals.

Claim 8.1 The partition Π0 must contain a subinterval [x0, y0] whose endpoints have different labels.


Proof of Claim 8.1: Every endpoint of these subintervals is labeled either (+) or (−). The point “0”, which must be an endpoint of some subinterval of Π0, has label (+). The point “1” has label (−). As we travel from 0 to 1 (left to right), we leave a point labeled (+) and arrive at a point labeled (−). At some point, we must pass through a subinterval whose endpoints have different labels. □

Now take the partition Π0 and form a new partition Π1, finer than the first, by taking all the subintervals in Π0 whose endpoints have different labels and cutting them in half. In Π1, there must exist at least one subinterval [x1, y1] with endpoints having different labels. Repeat this procedure indefinitely.

This produces an infinite sequence of subintervals {(xk, yk)}, shrinking in size, with different labels at the endpoints. Furthermore, we can choose a subsequence of them so that the left-hand endpoint xk is labeled (+) and the right-hand endpoint yk is labeled (−). Since these intervals live in [0, 1], the sequences of endpoints are bounded. Therefore, by the Bolzano-Weierstrass theorem (Theorem 3.13), there is a convergent subsequence, with |xk − yk| → 0 as k → ∞. By continuity of f(·), |f(xk) − f(yk)| → 0 as k → ∞.

Let z be the common limit point of {xk} and {yk}. By continuity, f(xk) and f(yk) both converge to f(z). Since each xk is labeled (+) and each yk is labeled (−), for each k we have f1(xk) < (xk)1 and, in the limit, f1(z) ≤ z1. For each k, we have f1(yk) > (yk)1 and, in the limit, f1(z) ≥ z1. Thus, f1(z) ≤ z1 and f1(z) ≥ z1. This implies that f1(z) = z1, i.e., f(z) = z, a fixed point. This is a contradiction. □
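The labeling argument suggests a simple computational procedure: repeatedly bisect an interval whose endpoints carry different labels, i.e., different signs of f(x) − x. The sketch below (with f(x) = cos x, my own choice of a continuous self-map of [0, 1]) implements that idea.

```python
import numpy as np

# Computational analogue of the labeling argument (my own sketch): track the sign of
# f(x) - x, which is >= 0 at 0 ("+") and <= 0 at 1 ("-"), and bisect on the sign change.

def fixed_point_bisection(f, tol=1e-10):
    lo, hi = 0.0, 1.0                     # f(lo) - lo >= 0 and f(hi) - hi <= 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) - mid >= 0.0:           # "+" label: f moves mid to the right
            lo = mid
        else:                             # "-" label: f moves mid to the left
            hi = mid
    return 0.5 * (lo + hi)

f = lambda x: np.cos(x)                   # continuous map from [0, 1] into [0, 1]
z = fixed_point_bisection(f)
print(z, abs(f(z) - z))                   # approx 0.739085, residual essentially zero
```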

Definition 8.2 The n-simplex is the set Δn = {x ∈ Rn | ∑_{i=1}^n xi = 1 and xi ≥ 0 for all i = 1, . . . , n}.

From the definition, Δn is convex and compact. We also see that it is an (n − 1)-dimensional object.

Lemma 8.2 If f : Δn → Δn is a continuous function, then there exists x ∈ Δn such that f(x) = x.

We skip the proof of Lemma 8.2. We should note that Lemma 8.1 is a special case of Lemma 8.2. Before proving the Brouwer fixed point theorem, we need some preliminaries.

Definition 8.3 A set A is topologically equivalent to a set B if there exists a continuous function g with continuous inverse such that g(A) = B and g−1(B) = A.

Observe that topological equivalence is a weaker requirement than the one in the inverse function theorem. Do you see why? The closed n-ball with center x0 in Rn is the set {x ∈ Rn | d(x, x0) ≤ 1}. Note that a closed n-ball is of dimension n.

Theorem 8.2 A nonempty compact convex set S ⊂ Rn of dimension m ≤ n is topologically equivalent to a closed ball in Rm.


We skip the proof of Theorem 8.2. Now, it is time to prove the Brouwer fixed point theorem.

Theorem 8.3 (Brouwer Fixed Point Theorem (1912)) If S ⊂ Rn is compact and convex and f : S → S is continuous, there exists x ∈ S such that f(x) = x.

Proof of Brouwer Fixed Point Theorem: Lemma 8.2 shows that a continuous function f : Δn → Δn must have a fixed point. Then, it only remains to prove that there is no loss of generality in taking S = Δn rather than an arbitrary compact convex set of dimension n − 1 in Rn. To do so, we make use of the topological equivalence of compact convex sets.

If S is a compact convex set of dimension n − 1, we know from Theorem 8.2 that both S and Δn are topologically equivalent to a closed ball in Rn−1, and hence to each other: there are continuous maps g : S → Δn and g−1 : Δn → S. Define h : Δn → Δn as follows:

h(x) = g[f(g−1(x))].

Since h(·) is continuous, by Lemma 8.2, it has a fixed point x∗. Therefore, h(x∗) = g[f(g−1(x∗))] = x∗. Applying g−1 to both sides, we have f(g−1(x∗)) = g−1(x∗). So, g−1(x∗) is a fixed point of f.


Chapter 9

Topics on Convex Sets

9.1 Separation Theorems

If a is a nonzero vector in Rn and α is a real number, then the set

H = {x ∈ Rn | a · x = α}

is a hyperplane in Rn, with a as its normal. Moreover, the hyperplane H separates Rn into two closed half-spaces.

If S and T are subsets of Rn, then H is said to separate S and T if S is contained in one of the closed half-spaces determined by H and T is contained in the other. In other words, S and T can be separated by a hyperplane if there exists a vector a ≠ 0 and a scalar α such that for all x ∈ S and y ∈ T,

a · x ≤ α ≤ a · y

If both inequalities are strict, then the hyperplane H = {x ∈ Rn | a · x = α} strictly separates S and T.

Theorem 9.1 (Strict Separation Theorem) Let S be a closed convex set in Rn, and let y be a point in Rn that does not belong to S. Then there exist a nonzero vector a ∈ Rn\{0} and a number α ∈ R such that

a · x < α < a · y

for all x ∈ S. For every such α, the hyperplane H = {x ∈ Rn | a · x = α} strictly separates S and y.

Proof of Theorem 9.1: Because S is a closed set, among all the points of S there is one, w = (w1, . . . , wn), that is closest to y. (To see this, suppose there were no such closest point; closedness of S then yields a contradiction. You should fill in the gap in this argument.) Let a = y − w. Since w ∈ S and y ∉ S, it follows that a ≠ 0. Note that a · (y − w) = a · a > 0, and so a · w < a · y. Suppose we prove that

a · x ≤ a · w ∀x ∈ S (∗)


Then, the theorem is true for every number α ∈ (a · w, a · y). Now, it remains to show (∗). Let x be any point in S. Since S is convex, λx + (1 − λ)w ∈ S for each λ ∈ [0, 1]. Now define g(λ) as the square of the distance from λx + (1 − λ)w to the point y:

g(λ) = ‖y − (λx + (1 − λ)w)‖^2 = ‖y − w + λ(w − x)‖^2

Using the chain rule, we obtain g′(λ) = 2(y − w + λ(w − x)) · (w − x). Also g(0) = ‖y − w‖^2, the square of the distance between y and w. But w is the point in S that is closest to y, so g(λ) ≥ g(0) for all λ ∈ [0, 1]. It follows that 0 ≤ g′(0) = 2(y − w) · (w − x) = 2a · (w − x), so a · x ≤ a · w, which proves (∗). □
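The proof is constructive whenever the closest point w can be computed. In the sketch below, S is a box of my own choosing (not an example from the notes), so the closest point is obtained by clipping, and the resulting hyperplane is checked on random points of S.

```python
import numpy as np

# Sketch of the construction in the proof: S = [0,1] x [0,1] (closed, convex) and
# y = (2, 3) not in S.  The closest point w of S to y is the coordinatewise clip of y,
# and a = y - w is the normal of a strictly separating hyperplane.

y = np.array([2.0, 3.0])
w = np.clip(y, 0.0, 1.0)          # projection of y onto the box S; here w = (1, 1)
a = y - w                         # a = (1, 2)
alpha = 0.5 * (a @ w + a @ y)     # any number strictly between a.w and a.y works

rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 1.0, size=(1000, 2))       # random points of S
print("max a.x over sample:", (sample @ a).max())    # <= a.w = 3
print("alpha:", alpha, " a.y:", a @ y)               # 3 < alpha = 5.5 < 8
```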

In the proof of the above theorem, it was essential that y did not belong to S. If S is an arbitrary convex set (not necessarily closed), and if y is not an interior point of S, then it seems plausible that y can still be separated from S by a hyperplane. If y is a boundary point of S, such a hyperplane is called a supporting hyperplane to S at y.

Theorem 9.2 (Separating Hyperplane) Let S be a convex set in Rn and suppose y = (y1, . . . , yn) is not an interior point of S. Then, there exists a nonzero vector a ∈ Rn such that

a · x ≤ a · y for every x ∈ S

Proof of Theorem 9.2: Let S̄ be the closure of S. Because S is convex, so is S̄. Do you see why? Because y is not an interior point of S and S is convex, y is not an interior point of S̄. Hence, there is a sequence {yk} of points for which yk ∉ S̄ for each k and yk → y as k → ∞. Now, yk ∉ S̄ and S̄ is closed and convex, so according to the previous theorem (Theorem 9.1), for each k there exists a vector ak ≠ 0 such that ak · x < ak · yk for all x ∈ S̄. Without loss of generality, we can assume that ‖ak‖ = 1 for each k. Then, {ak} is a sequence of vectors in {x ∈ Rn | ‖x‖ = 1}, which is a compact set. The Bolzano-Weierstrass theorem (Theorem 3.13) shows that {ak} has a convergent subsequence {akj}. Let a = limj→∞ akj. Then, a · x = limj→∞ akj · x ≤ limj→∞ akj · ykj = a · y for every x ∈ S, as required. Here we make use of the continuity of linear functions. Moreover, we can confirm that a ≠ 0 because ‖a‖ = limj→∞ ‖akj‖ = 1. □

Theorem 9.3 (Separating Hyperplane Theorem) Let S and T be two disjoint nonempty convex sets in Rn. Then, there exist a nonzero vector a ∈ Rn and a scalar α ∈ R such that

a · x ≤ α ≤ a · y for all x ∈ S and all y ∈ T

Thus, S and T are separated by the hyperplane H = {z ∈ Rn | a · z = α}.

Proof of Separating Hyperplane Theorem: Let W = S − T be the vector difference of the two convex sets S and T. Since S and T are disjoint, 0 ∉ W. First, I claim that W is convex.

Claim 9.1 W is convex.


Proof of Claim 9.1: Let w, w′ ∈ W. By definition of W, there are s, s′ ∈ S and t, t′ ∈ T such that w = s − t and w′ = s′ − t′. Let α ∈ [0, 1]. What we want to show is that αw + (1 − α)w′ ∈ W. We compute the convex combination below.

αw + (1 − α)w′ = αs − αt + (1 − α)s′ − (1 − α)t′
               = [αs + (1 − α)s′] − [αt + (1 − α)t′]

Since S and T are convex, αs + (1 − α)s′ ∈ S and αt + (1 − α)t′ ∈ T. This implies that αw + (1 − α)w′ ∈ S − T = W. □

Hence, by the previous theorem (Theorem 9.2), there exists an a ≠ 0 such that a · w ≤ a · 0 = 0 for all w ∈ W. Let x ∈ S and y ∈ T be any two points of these sets. Then w = x − y ∈ W by definition, so a · (x − y) ≤ 0. Hence

a · x ≤ a · y for all x ∈ S and all y ∈ T (∗∗)

From (∗∗) it follows that the set A = {a · x | x ∈ S} is bounded above by a · y for any y ∈ T. By Fact 2.1 (Least Upper Bound Principle), A has a supremum α. Since α is the least upper bound of A, it follows that α ≤ a · y for every y ∈ T. Therefore, a · x ≤ α ≤ a · y for all x ∈ S and all y ∈ T. Thus, S and T are separated by the hyperplane {z ∈ Rn | a · z = α}. □

Theorem 9.4 Let S and T be two disjoint, nonempty, closed, convex sets in Rn with S bounded. Then, there exist a nonzero vector a ∈ Rn and a scalar α ∈ R such that

a · x > α > a · y for all x ∈ S and all y ∈ T .

Lemma 9.1 Let A be an m × n matrix. Then cone(A) is a convex set.

Lemma 9.2 Let A be an m × n matrix. Then cone(A) is a closed set.

Theorem 9.5 (Farkas Lemma) Let A be an m × n matrix, b ∈ Rm, and F = {x ∈ Rn | Ax = b, x ≥ 0}. Then either F ≠ ∅ or there exists y ∈ Rm such that yA ≥ 0 and y · b < 0, but not both.

Proof of Farkas Lemma:
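As a concrete illustration of the two alternatives (the data below are my own choice, not from the notes), the system Ax = b in the sketch has no nonnegative solution, and a vector y certifying this fact is exhibited and verified.

```python
import numpy as np

# Illustration of the Farkas alternative.  Ax = b forces x2 = 2 and x1 = -1 < 0,
# so F = {x >= 0 : Ax = b} is empty; y = (1, -1) is a certificate: yA >= 0, y.b < 0.

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
b = np.array([1.0, 2.0])
y = np.array([1.0, -1.0])

print("yA >= 0:", np.all(y @ A >= 0.0), " y.b:", y @ b)   # True, -1.0
# Conversely, if b were (1.0, 0.5), then x = (0.5, 0.5) >= 0 solves Ax = b and,
# consistent with the lemma, no such certificate y can exist.
```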

9.2 Polyhedrons and Polytopes

Recall that a set C of vectors is called a cone if λx ∈ C whenever x ∈ C and λ > 0.

Definition 9.1 A cone C ⊂ Rn is polyhedral if there is a matrix A such that C = {x ∈ Rn | Ax ≤ 0}.

Geometrically, a polyhedral cone is the intersection of a finite number of half-spaces through the origin.

Theorem 9.6 (Farkas-Minkowski-Weyl) A cone C is polyhedral if and only if there is a matrix A with finitely many columns such that C = cone(A).


9.3 Dimension of a Set

9.4 Properties of Convex Sets
